That said, I think I might be reinventing the wheel again by trying to see what grok in C with libpcre feels like. Sample code line:
re = pcre_compile("([0-9]+)(?C1)", 0, &errptr, &erroffset, NULL);
(?C1) is PCRE syntax for "call callback #1". The callback I wrote
converts the last capture into a number and only succeeds if the value is
greater than 5. The match succeeds as soon as a number passes that test:
% ./a.out "foo 2 4 6 8"
Trying: 2
Trying: 4
Trying: 6
Found: 6

All with a single regular expression + callouts. This feature (called callouts by PCRE) is what allows me (and you) to use predicates in grok. PCRE passes the first test.
A few hours later, I had pattern injection working (turning %FOO% into its regular expression) and could parse logs with ease.
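The idea behind pattern injection can be sketched in plain C, with no PCRE needed for the substitution step. This is only an illustration under my own assumptions, not the real implementation: the patterns table, its NUMBER and WORD entries, and the function name expand_patterns are all hypothetical, and unknown %NAME% tokens are simply copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern table; a real one would be loaded from a file. */
struct pattern { const char *name; const char *regex; };

static const struct pattern patterns[] = {
    { "NUMBER", "[0-9]+" },
    { "WORD",   "[A-Za-z]+" },
};

/* Look up a name of the given length; returns NULL if it is unknown. */
static const char *lookup(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        if (strlen(patterns[i].name) == len &&
            memcmp(patterns[i].name, name, len) == 0)
            return patterns[i].regex;
    return NULL;
}

/* Replace every known %NAME% in `in` with its regex, writing to `out`.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_patterns(const char *in, char *out, size_t outlen)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*in) {
        const char *src = in;   /* default: copy one character as-is */
        size_t n = 1, consumed = 1;

        if (*in == '%') {
            const char *close = strchr(in + 1, '%');
            const char *repl = close
                ? lookup(in + 1, (size_t)(close - in - 1)) : NULL;
            if (repl != NULL) { /* known name: inject its regex */
                src = repl;
                n = strlen(repl);
                consumed = (size_t)(close - in) + 1;
            }
        }
        if (o + n >= outlen)
            return -1;          /* keep room for the terminator */
        memcpy(out + o, src, n);
        o += n;
        in += consumed;
    }
    if (outlen == 0)
        return -1;
    out[o] = '\0';
    return 0;
}
```

With this table, expand_patterns("%WORD% %NUMBER%", buf, sizeof buf) leaves "[A-Za-z]+ [0-9]+" in buf, which can then be handed to pcre_compile.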
I couldn't help pitting the boost and pcre versions against each other, even though the feature set isn't the same yet. pcregrok processed 37,000 lines/sec of apachelog (the most complex regexp I have), versus 6,200 lines/sec from c++/boost grok.