Search this site





New fex version available (20071119)

Hop on over to the fex project page and download the new version.


20071119 -
  - Add nongreedy tokenizer. Same semantics of strtok_r(), but doesn't skip
    empty tokens.
  - Renamed tokenizer to split, since really that's what it was doing.
  - You can invoke the nongreedy tokenizer by using '?' as the first character
    of a {} set:
     args: :{?4,6}
     input: one:::four::six
     output: four:six

New fex version available (20071026)

Hop on over to the fex project page and download the new version.


20071026 - First major release
  - Added some tests
  - If you want to specify a different first split token, the first character 
    can be any non-digit character which is not '-' or '{'.

    These are now equivalent:
    % echo "foo/bar/baz" | fex 0/2
    % echo "foo/bar/baz" | fex /2

    Previously, this would give an error due to a design decision.

fex - field extraction tool

I recently posted about a tokenizing tool based somewhat on xapply's field extraction. I think it's polished enough for a release.


Field extraction tool

Tonight was spent implementing and extending one of my favorite features of xapply: its subfield extracting feature, aka this syntax: %[1,2:1]

The gist of this is that you specify a sequence of field number, separator, field number, separator, etc, to get some very quick tokenization to pull the specific data you want. Basically it gives you *extremely* concise syntax for the a subset of the features provided by cut(1).

My tool expands on this a bit further. It's best shown by example:

% ./fex '0:-2/1' < /etc/passwd | sort  | uniq -c
      3 bin 
      1 dev 
      4 home 
      2 nonexistent 
      1 root 
      2 usr 
     14 var 
The string '0:-2/1' means:
  • 0 - the full string (aka "root:x:0:0:root:/root:/bin/bash".
    "0" here uses awk semantics where $0 in awk is the full record and $1 is the first field.
  • : - split by colons
  • -2 - take the 2nd to last token (by colon) (aka "root")
    Negative offsets aren't available in xapply, but are valid here.
  • / - split that by "/"
  • 1 - take the 1st token (aka "root")
The output is essentially the root directory for everyone's home directories. Doing this in awk, cut, perl, or any other tool would be much more typing.

You can also specify multiple field extractions on a single invocation:

# Take the first and 2nd to last token split by colon
% ./fex '0:1' '0:-2' < /etc/passwd  
root /root 
daemon /usr/sbin 
bin /bin 

# Alternatively, {x,y,z,...} syntax selects multiple tokens
# note that the output is joined by colons.
# Again, this is a feature unavailable in xapply's subfield extraction
% ./fex '0:{1,-2}' < /etc/passwd

# Parse urls out of apache logs:
% ./fex '0"2 2' < access | head -4

I still have tests to write and bugs to fix, so you won't find a release yet.