Fex is a powerful field extraction tool. Fex provides a very concise
language for tokenizing strings and extracting fields.
The basic usage model is that you provide a series of delimiter and field
selection pairs. Delimiters can be any character, while field selections have a
specific syntax.
There are a few ways to specify field selections.
- Just a number, picks the Nth field.
- Comma-separated list inside curly braces: {1,2,3}
- Colon-delimited range inside curly braces: {N:M}. Examples: {1:3}, {1:}, or {:3}. If M is omitted, as in {N:}, the range runs from N to the end. If N is omitted, as in {:M}, N is assumed to be 1 (start of the string). If both are omitted, {:}, the entire string is selected.
Notes:
- Negative numbers are treated as a negative offset against the end of the string, so -1 is the last field
- The number '0' is special and means the entire string, as does {:}
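The selection rules above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described here, not fex's actual implementation: the function name `select` is made up for illustration, it works on a list of fields that has already been split, and for 0 and {:} it returns every field rather than the original string. It does not guard against out-of-range numbers.

```python
def select(fields, selector):
    """Return the fields picked by a selector such as '2', '-1', '{1,3}' or '{2:}'.

    Hypothetical helper that models the documented rules; not part of fex.
    """
    n = len(fields)

    def index(num):
        # Fields are numbered from 1; negative numbers count back from the end.
        return num - 1 if num > 0 else n + num

    if selector.startswith("{") and selector.endswith("}"):
        body = selector[1:-1]
        if ":" in body:
            lo, hi = body.split(":", 1)
            start = index(int(lo)) if lo else 0      # missing N means field 1
            stop = index(int(hi)) if hi else n - 1   # missing M means the last field
            return fields[start:stop + 1]
        # Comma-separated list: pick each listed field in order.
        return [fields[index(int(part))] for part in body.split(",")]

    num = int(selector)
    if num == 0:
        return list(fields)  # 0 selects everything, like {:}
    return [fields[index(num)]]


fields = ["usr", "local", "bin", "firefox"]
print(select(fields, "1"))       # ['usr']
print(select(fields, "{2:3}"))   # ['local', 'bin']
print(select(fields, "{1,-1}"))  # ['usr', 'firefox']
```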
The default behavior is to ignore empty fields. That is, a string
"foo...bar" would only have two fields when split by "." rather than four.
If you want fex to keep empty fields, prefix your field
selection with "?", as in {?6}.
# Greedy (default)
% echo "foo.....bar..baz.fizz" | fex .2
bar
# Nongreedy
% echo "foo.....bar..baz.fizz" | fex '.{?6}'
bar
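To see why field 2 and field 6 name the same token in the example above, here is a small Python model of the two splitting modes. It is a sketch under the assumption that "ignoring empty fields" simply means dropping zero-length tokens after splitting; `split_fields` and `keep_empty` are illustrative names, not anything fex exposes.

```python
def split_fields(text, delimiter, keep_empty=False):
    """Split text on delimiter, dropping empty fields unless keep_empty is set."""
    parts = text.split(delimiter)
    if keep_empty:
        return parts
    return [p for p in parts if p != ""]


line = "foo.....bar..baz.fizz"
greedy = split_fields(line, ".")                      # models the default behaviour
nongreedy = split_fields(line, ".", keep_empty=True)  # models the "?" prefix

print(greedy)        # ['foo', 'bar', 'baz', 'fizz']
print(greedy[1])     # bar  (field 2)
print(nongreedy[5])  # bar  (field 6)
```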
You can specify multiple, independent field selectors on the command line.
Each argument is treated as a standalone field selector. Selectors are
split by spaces on output (though I am open to changing this).
For example, to output the IP and URL from an Apache request log:
echo '208.36.144.8 - - [22/Aug/2007:23:39:05 -0400] "GET /svnweb/logwatch/tags/?pathrev=420 HTTP/1.0" 200 3595' \
| fex 1 '"2 2'
208.36.144.8 /svnweb/logwatch/tags/?pathrev=420
- Simple splitting
Input: "/usr/local/bin/firefox"
fex /1 == "usr"
fex /{2:3} == "local/bin"
fex /{1,-1} == "usr/firefox"
fex /-1 == "firefox"
fex /{:} == "/usr/local/bin/firefox"
fex /0 == "/usr/local/bin/firefox"
- Greedy vs nongreedy splitting
Input: "a:b::c:::d"
fex :{1:3} == "a:b:c"
fex :{?1:3} == "a:b:"
fex :{3} == "c"
fex :{?3} == "" (empty result)
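The range results above can be reproduced with a short Python model. As before, this is a sketch of the documented behaviour rather than fex's code: `select_range` is a made-up helper, and it assumes the selected fields are rejoined with the same delimiter they were split on, which is what the outputs above suggest.

```python
def select_range(text, delimiter, first, last, keep_empty=False):
    """Return fields first..last (numbered from 1), rejoined with the delimiter."""
    fields = text.split(delimiter)
    if not keep_empty:
        # Default mode: empty fields are ignored before counting.
        fields = [f for f in fields if f != ""]
    return delimiter.join(fields[first - 1:last])


text = "a:b::c:::d"
print(select_range(text, ":", 1, 3))                   # a:b:c
print(select_range(text, ":", 1, 3, keep_empty=True))  # a:b:
print(select_range(text, ":", 3, 3))                   # c
```

With `keep_empty=True` the third field is the empty string between the two colons, so the range 1 to 3 ends with a trailing delimiter and field 3 alone is empty.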
Here's a simple example, to find which root directories contain home directories:
% ./fex '0:-2/1' < /etc/passwd | sort | uniq -c
3 bin
1 dev
4 home
2 nonexistent
1 root
2 usr
14 var
The string '0:-2/1' means:
- 0 - the full string, "root:x:0:0:root:/root:/bin/bash"
"0" here uses awk semantics where $0 in awk is the full record and $1 is the first field.
- : - split by colons
- -2 - take the 2nd to last token, "/root"
- / - split that by "/"
- 1 - take the 1st token, "root"
The output is the top-level directory of each user's home directory.
Doing this in awk, cut, perl, or any other tool would take much more typing.
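For comparison, the chain '0:-2/1' can be spelled out step by step in Python. This is a sketch that mirrors the breakdown above, assuming empty fields are ignored at every step; `extract` and its list of (delimiter, number) pairs are illustrative and not how fex parses its arguments. The leading 0, which selects the full line, is simply the starting value here.

```python
def extract(line, steps):
    """Apply (delimiter, number) steps in order, ignoring empty fields."""
    current = line
    for delimiter, number in steps:
        if number == 0:
            continue  # 0 means the whole string, so nothing to narrow down
        fields = [f for f in current.split(delimiter) if f != ""]
        # Numbers start at 1; negative numbers count back from the end.
        current = fields[number - 1] if number > 0 else fields[number]
    return current


line = "root:x:0:0:root:/root:/bin/bash"
print(extract(line, [(":", -2)]))            # /root
print(extract(line, [(":", -2), ("/", 1)]))  # root
```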
You can also specify multiple field extractions on a single invocation:
# Take the first and 2nd to last token split by colon
% ./fex '0:1' '0:-2' < /etc/passwd
root /root
daemon /usr/sbin
bin /bin
# Alternatively, {x,y,z,...} syntax selects multiple tokens
# note that the output is joined by colons.
# Again, this is a feature unavailable in xapply's subfield extraction
% ./fex '0:{1,-2}' < /etc/passwd
root:/root
daemon:/usr/sbin
bin:/bin
# Parse URLs out of Apache logs:
% ./fex '0"2 2' < access | head -4
/
/icons/blank.gif
/icons/folder.gif
/favicon.ico
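The two /etc/passwd examples above differ only in how the results are joined: separate selectors are printed with a space between them, while a {x,y} list is rejoined with the delimiter. A rough Python model of that difference follows. The helper names and the sample passwd line are assumptions made for illustration; only the two outputs are taken from the examples.

```python
def fields_of(line, delimiter):
    """Split a line on delimiter, ignoring empty fields (the default mode)."""
    return [f for f in line.split(delimiter) if f != ""]


def pick(fields, number):
    """Pick one field; numbers start at 1 and negatives count from the end."""
    return fields[number - 1] if number > 0 else fields[number]


# Sample line, assumed for illustration.
line = "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
fields = fields_of(line, ":")

# Two selectors, '0:1' and '0:-2': results are separated by a space.
print(" ".join([pick(fields, 1), pick(fields, -2)]))  # daemon /usr/sbin

# One selector, '0:{1,-2}': results are rejoined with the delimiter.
print(":".join([pick(fields, 1), pick(fields, -2)]))  # daemon:/usr/sbin
```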
fex-20071119.tar.gz
Looking for an older version? Try the fex release archive.
Usage Examples
Simple splitting
Input: "/usr/local/bin/firefox"
fex /1 == "usr"
Why does this not return an empty string? Since 0 is a special character to represent the entire string, 1, then, must be the first element before the delimiter. In this case, the delimiter is '/' and the first string would be empty.
[1]/[2]usr/[3]local/[4]bin/[5]firefox
If I have a CSV where some records do not have the first element defined and some do, is there any way to reliably grab the first element for only the ones where it is defined using fex?