Jordan Sissel
geek

Tue, 20 Nov 2007

New fex version available (20071119)

Hop on over to the fex project page and download the new version.

Changelist:

20071119 -
  - Add nongreedy tokenizer. Same semantics as strtok_r(), except that it
    doesn't skip empty tokens.
  - Renamed tokenizer to split, since really that's what it was doing.
  - You can invoke the nongreedy tokenizer by using '?' as the first character
    of a {} set:
     args: :{?4,6}
     input: one:::four::six
     output: four:six
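
To make the difference concrete, here is a rough Python sketch of the two splitting behaviors. It is not fex's actual C code; the function names are made up for illustration, and it assumes single-character separators.

```python
def split_greedy(text, sep):
    """strtok_r()-style split: runs of separators collapse, so empty tokens vanish."""
    return [tok for tok in text.split(sep) if tok != ""]


def split_nongreedy(text, sep):
    """Split that keeps empty tokens, so field numbers follow separator positions."""
    return text.split(sep)


def select(tokens, indexes, sep):
    """Pick 1-based fields and join them with the separator."""
    return sep.join(tokens[i - 1] for i in indexes)


line = "one:::four::six"
print(split_greedy(line, ":"))     # ['one', 'four', 'six']
print(split_nongreedy(line, ":"))  # ['one', '', '', 'four', '', 'six']
print(select(split_nongreedy(line, ":"), [4, 6], ":"))  # four:six
```

With the greedy split there is no field 4 or 6 at all on this input, which is why the '?' form is needed.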

Permalink: /geekery/fex-20071119
posted at: 00:15

Fri, 26 Oct 2007

New fex version available (20071026)

Hop on over to the fex project page and download the new version.

Changelist:

20071026 - First major release
  - Added some tests
  - The leading field number is now optional: the first character can be the
    split token itself, as long as it is a non-digit character other than
    '-' or '{'.

    These are now equivalent:
    % echo "foo/bar/baz" | fex 0/2
    bar
    % echo "foo/bar/baz" | fex /2
    bar

    Previously, the second form would give an error due to a design decision.
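
One way to read the rule above: if the spec starts with a split token, fex behaves as if a 0 had been written in front of it. A minimal Python sketch of that reading follows; `normalize` is a hypothetical helper, not a function in fex.

```python
def normalize(spec):
    """Prepend the implied '0' when a spec starts with a split token."""
    first = spec[:1]
    # A digit, '-' (negative field number) or '{' (field set) already starts a field.
    if first and not first.isdigit() and first not in "-{":
        return "0" + spec
    return spec


print(normalize("/2"))   # 0/2
print(normalize("0/2"))  # 0/2
```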

Permalink: /geekery/fex-20071026
posted at: 03:58

Mon, 30 Jul 2007

fex - field extraction tool

I recently posted about a tokenizing tool based somewhat on xapply's field extraction. I think it's polished enough for a release.

fex-20070729

Permalink: /geekery/field-extraction-tool-fex-release
posted at: 01:58

Wed, 25 Jul 2007

Field extraction tool

Tonight was spent implementing and extending one of my favorite features of xapply: its subfield extraction feature, which uses this syntax: %[1,2:1]

The gist is that you specify a sequence of field number, separator, field number, separator, and so on, which gives you very quick tokenization to pull out the specific data you want. Basically, it gives you *extremely* concise syntax for a subset of the features provided by cut(1).

My tool expands on this a bit further. It's best shown by example:

% ./fex '0:-2/1' < /etc/passwd | sort  | uniq -c
      3 bin 
      1 dev 
      4 home 
      2 nonexistent 
      1 root 
      2 usr 
     14 var 
The string '0:-2/1' means:
  • 0 - the full string (aka "root:x:0:0:root:/root:/bin/bash").
    "0" here uses awk semantics where $0 in awk is the full record and $1 is the first field.
  • : - split by colons
  • -2 - take the 2nd to last token (by colon) (aka "/root")
    Negative offsets aren't available in xapply, but are valid here.
  • / - split that by "/"
  • 1 - take the 1st token (aka "root"; the empty token before the leading "/" is skipped)
fex prints the top-level directory of each user's home directory, and sort | uniq -c counts how many users share each one. Doing this in awk, cut, perl, or any other tool would be much more typing.
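
The walk through '0:-2/1' above can be sketched in a few lines of Python. This only illustrates the semantics as described here and is not how fex itself is implemented; the function name `fex` and the regex-based parsing are assumptions, and the sketch only handles single-character separators and plain field numbers (no {} sets).

```python
import re


def fex(spec, line):
    """Apply a spec such as '0:-2/1' to one line of input."""
    current, tokens = line, None
    # Break the spec into field numbers (possibly negative) and separators.
    for part in re.findall(r"-?\d+|.", spec):
        if re.fullmatch(r"-?\d+", part):
            n = int(part)
            if n != 0:  # 0 means "the whole current string", as with awk's $0
                if tokens is None:
                    raise ValueError("field number without a preceding split")
                # Positive numbers are 1-based; negative ones count from the end.
                current = tokens[n - 1] if n > 0 else tokens[n]
            tokens = None
        else:
            # Greedy split: empty tokens are dropped.
            tokens = [tok for tok in current.split(part) if tok]
    return current


print(fex("0:-2/1", "root:x:0:0:root:/root:/bin/bash"))  # root
```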

You can also specify multiple field extractions on a single invocation:

# Take the first and 2nd to last token split by colon
% ./fex '0:1' '0:-2' < /etc/passwd  
root /root 
daemon /usr/sbin 
bin /bin 

# Alternatively, {x,y,z,...} syntax selects multiple tokens
# note that the output is joined by colons.
# Again, this is a feature unavailable in xapply's subfield extraction
% ./fex '0:{1,-2}' < /etc/passwd
root:/root
daemon:/usr/sbin
bin:/bin

# Parse urls out of apache logs:
% ./fex '0"2 2' < access | head -4
/
/icons/blank.gif
/icons/folder.gif
/favicon.ico
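
The {x,y,z} selection shown above can be sketched the same way in Python. Again, this is only an illustration of the behavior described in this post: `pick` is a made-up name, the daemon and bin passwd lines are assumed sample data (only the root line appears earlier in the post), and field 0 is not handled.

```python
def pick(line, sep, indexes):
    """Return the selected fields of line, joined by the same separator."""
    # Greedy split: empty tokens are dropped.
    tokens = [tok for tok in line.split(sep) if tok]
    # Positive numbers are 1-based; negative ones count from the end.
    return sep.join(tokens[i - 1] if i > 0 else tokens[i] for i in indexes)


passwd = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",  # assumed sample line
    "bin:x:2:2:bin:/bin:/usr/sbin/nologin",             # assumed sample line
]
for entry in passwd:
    print(pick(entry, ":", [1, -2]))
```

Joining with the separator that was used for the split keeps the output in the same shape as the input, so it can be fed to another fex invocation.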

I still have tests to write and bugs to fix, so you won't find a release yet.

Permalink: /geekery/field-extraction-tool
posted at: 04:04
