The gist of this is that you specify a sequence of field number, separator, field number, separator, and so on, to get very quick tokenization that pulls out the specific data you want. Basically, it gives you *extremely* concise syntax for a subset of the features provided by cut(1).
My tool expands on this a bit further. It's best shown by example:
% ./fex '0:-2/1' < /etc/passwd | sort | uniq -c
3 bin
1 dev
4 home
2 nonexistent
1 root
2 usr
14 var
The string '0:-2/1' means:
- 0 - the full string (aka "root:x:0:0:root:/root:/bin/bash").
  "0" here uses awk semantics, where $0 in awk is the full record and $1 is the first field.
- : - split by colons
- -2 - take the 2nd to last token (by colon) (aka "/root").
  Negative offsets aren't available in xapply, but are valid here.
- / - split that by "/"
- 1 - take the 1st token (aka "root")
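For anyone who prefers to read this as code, the steps above can be sketched roughly in Python. This is only an illustration of the semantics described here, not how fex itself is written: the function name `extract` is made up, and the sketch assumes empty tokens are skipped, which is what would make "root" the first token of "/root".

```python
import re

def extract(spec, line):
    """Evaluate a fex-style spec such as '0:-2/1' against one line.

    Illustrative sketch only; `extract` is a hypothetical name.
    """
    if not spec.startswith("0"):
        raise ValueError("this sketch expects the spec to start with 0")
    value = line  # field 0 is the whole line, as with awk's $0
    # After the leading 0, the spec is a run of (separator, field number) pairs.
    for sep, num in re.findall(r"(.)(-?\d+)", spec[1:]):
        field = int(num)
        if field == 0:
            continue  # 0 keeps the current string unchanged
        # Assumption: empty tokens are dropped, so "/root" split by "/" is ["root"].
        tokens = [tok for tok in value.split(sep) if tok]
        # Positive fields count from 1; negative fields count from the end.
        value = tokens[field - 1] if field > 0 else tokens[field]
    return value

print(extract("0:-2/1", "root:x:0:0:root:/root:/bin/bash"))  # prints: root
```

The sketch does no error handling for out-of-range fields; it only mirrors the happy path walked through above.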
You can also specify multiple field extractions on a single invocation:
# Take the first and 2nd to last token split by colon
% ./fex '0:1' '0:-2' < /etc/passwd
root /root
daemon /usr/sbin
bin /bin
# Alternatively, {x,y,z,...} syntax selects multiple tokens
# note that the output is joined by colons.
# Again, this is a feature unavailable in xapply's subfield extraction
% ./fex '0:{1,-2}' < /etc/passwd
root:/root
daemon:/usr/sbin
bin:/bin
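As a rough illustration of what the {x,y,z,...} form does, here is a small Python sketch that picks several tokens from one split and joins them with the same separator. The helper name `pick` is hypothetical, and skipping empty tokens is an assumption of the sketch rather than a documented fex behavior.

```python
def pick(line, sep, fields):
    """Select several 1-based (or negative) fields and rejoin them with sep.

    Illustrative sketch only; `pick` is a hypothetical helper.
    """
    # Assumption: empty tokens are dropped before counting fields.
    tokens = [tok for tok in line.split(sep) if tok]
    # Positive fields count from 1; negative fields count from the end.
    chosen = [tokens[f - 1] if f > 0 else tokens[f] for f in fields]
    return sep.join(chosen)

# Equivalent in spirit to the spec '0:{1,-2}'
print(pick("root:x:0:0:root:/root:/bin/bash", ":", [1, -2]))  # prints: root:/root
```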
# Parse URLs out of Apache logs:
% ./fex '0"2 2' < access | head -4
/
/icons/blank.gif
/icons/folder.gif
/favicon.ico
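The spec '0"2 2' reads as: split the line on double quotes, take the 2nd token (the request string), split that on spaces, and take the 2nd token (the path). Spelled out in plain Python, that might look like the sketch below. The log line is a made-up sample in the common Apache log format, not taken from a real file.

```python
# Hypothetical sample line in Apache's common log format.
line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /icons/blank.gif HTTP/1.0" 200 2326'

# Split on double quotes and take the 2nd token: the quoted request string.
request = line.split('"')[1]   # 'GET /icons/blank.gif HTTP/1.0'

# Split the request on spaces and take the 2nd token: the requested path.
path = request.split(' ')[1]

print(path)  # prints: /icons/blank.gif
```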
I still have tests to write and bugs to fix, so you won't find a release yet.