Search this site


Metadata

Articles

Projects

Presentations

Grok gets more ridiculous

In my last post, I discussed a way to perform additional assertions on a given captured group while still inside of the regex state machine.

I spent some time today and implemented it in grok and it works like a charm! This kind of functionality gives you extreme power in the kind of matches you can specify.

Here's an example: How can we find the first number on a line that's greater than 20?

% jot 50 | xargs -n15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50
Using the above as our sample input:
% jot 50 | xargs -n15 | perl grok -m "%NUMBER>20%" -r "%NUMBER%"
21
31
46
The assertion '>20' is applied during the regular expression match. This is sweet!

Another example would be to 'grep' the output of 'ls -l' for any lines containing a number greater than 5000:

% ls -l | perl grok -m "%INT>5000%"
-rw-r--r--  1 jls  jls   21590 Apr  2 20:17 foo
-rw-r--r--  1 jls  jls   22451 Apr  2 23:52 grok
-rw-r--r--  1 jls  jls  129536 Apr  2 23:47 grok-20070402.tar.gz
-rw-r--r--  1 jls  jls   13539 Mar 22 14:22 grok.1

The equivalent awk statement would look like this:

% ls -l | awk '$5 > 5000'
But if you look at the awk statement, you have to count columns. With grok's solution, all you need to know is you want any line with an integer greater than 5000. You are able to specify the specifics of your match without having to know the precise layout of the data you are matching.

What do you get if you can chain predicates? I haven't added that functionality yet, but it would be trivial to add, so expect it soon.

If you're interested, please try the latest version.