photo
Jordan Sissel
geek

Tue, 03 Apr 2007

Grok gets more ridiculous

In my last post, I discussed a way to perform additional assertions on a given captured group while still inside of the regex state machine.

I spent some time today and implemented it in grok and it works like a charm! This kind of functionality gives you extreme power in the kind of matches you can specify.

Here's an example: How can we find the first number on a line that's greater than 20?

% jot 50 | xargs -n15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50
Using the above as our sample input:
% jot 50 | xargs -n15 | perl grok -m "%NUMBER>20%" -r "%NUMBER%"
21
31
46
The assertion '>20' is applied during the regular expression match. This is sweet!

Another example would be to 'grep' the output of 'ls -l' for any lines containing a number greater than 5000:

% ls -l | perl grok -m "%INT>5000%"
-rw-r--r--  1 jls  jls   21590 Apr  2 20:17 foo
-rw-r--r--  1 jls  jls   22451 Apr  2 23:52 grok
-rw-r--r--  1 jls  jls  129536 Apr  2 23:47 grok-20070402.tar.gz
-rw-r--r--  1 jls  jls   13539 Mar 22 14:22 grok.1

The equivalent awk statement would look like this:

% ls -l | awk '$5 > 5000'
But if you look at the awk statement, you have to count columns. With grok's solution, all you need to know is you want any line with an integer greater than 5000. You are able to specify the specifics of your match without having to know the precise layout of the data you are matching.

What do you get if you can chain predicates? I haven't added that functionality yet, but it would be trivial to add, so expect it soon.

If you're interested, please try the latest version.

Comments: 0 (view comments)
Tags: , ,
Permalink: /geekery/grok-gets-more-ridiculous
posted at: 01:05


0 responses to 'Grok gets more ridiculous'


Leave a reply

You need javascript enabled to use this form. Anti-spam efforts ongoing. Also, if the comment doesn't show up, it's because the form expired. Go back and copy your comment, reload the form, and resubmit. Apologies if this is a hassle, I'm just playing with antispam methods right now. If this insists on not working, please email me about it.

Name (required)
E-mail (optional. Not posted or recorded anywhere, ever)
URL (also optional)
Comment:


Search this site

Navigation

Metadata

Home About Resume My Code

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials

Tag Cloud

Calendar

< April 2007 >
SuMoTuWeThFrSa
1 2 3 4 5 6 7
8 91011121314
15161718192021
22232425262728
2930     

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati