photo
Jordan Sissel
geek

Wed, 04 Jun 2008

Grok + PCRE

Perl grok was great. I learned a lot about how far beyond normal I could take regular expressions. I ported much of perl grok to C++ using Boost Xpressive, but Boost has a lot of baggage with it. I didn't like the feel of Xpressive, Boost is huge, compiling takes forever and a day (thanks C++ Templates!), and binaries are almost guaranteed to be 1meg or more.

That said, I think I might be reinventing the wheel again by trying to see what grok in C with libpcre feels like. Sample code line:

  re = pcre_compile("([0-9]+)(?C1)", 0, &errptr, &erroffset, NULL);
(?C1) is PCRE-syntax for "call callback #1" - the callback I wrote converts the last capture into a number and only succeeds if the value is greater than 5. It'll succeed once that precondition passes:
% ./a.out "foo 2 4 6 8"     
Trying: 2
Trying: 4
Trying: 6
Found: 6
All with a single regular expression + callouts. This feature (called callouts by PCRE) is what allows me (and you) to use predicates in grok. PCRE passes the first test.

A few hours later, I had pattern injection working (Turning %FOO% into it's regular expression) and could parse logs with ease.

I couldn't help pitting the boost and pcre versions against eachother, even though the feature set isn't the same, yet. pcregrok processed 37000lines/sec of apachelog (the most complex regexp I have), versus 6200/sec from c++/boost grok.

Comments: 0 (view comments)
Tags: ,
Permalink: /geekery/grok-pcre-sneak
posted at: 02:32


0 responses to 'Grok + PCRE'


Leave a reply

You need javascript enabled to use this form. Anti-spam efforts ongoing. Also, if the comment doesn't show up, it's because the form expired. Go back and copy your comment, reload the form, and resubmit. Apologies if this is a hassle, I'm just playing with antispam methods right now. If this insists on not working, please email me about it.

Name (required)
E-mail (optional, if you want me to be able to email you back)
URL (also optional)
Comment:


Search this site

Navigation

Metadata

Home About Resume My Code (SVN)

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '08 Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials SSH Tunneling (Video)

Tag Cloud

Calendar

< June 2008 >
SuMoTuWeThFrSa
1 2 3 4 5 6 7
8 91011121314
15161718192021
22232425262728
2930     

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati