Grok porting to C++

I've got pattern generation working.
% ./test '%NUMBER%' "hello 45.04" "-1.34"
Testing: %NUMBER%
Appending pattern 'NUMBER'
Test str: '(?:[+-]?(?:(?:[0-9]+(?:\.[0-9]*)?)|(?:\.[0-9]+)))'
regexid: 0x692840
Match: 1 / '45.04'
Match: 1 / '-1.34'
I'm pretty sure this is the 4th language I've at least started implementing grok in. The list so far: perl, python, ruby, C++. I stopped working on the ruby one because ruby's regexp engine lacks some useful features (*). The python port of grok was written before I added advanced predicates, which is why the ruby port was halted quickly.

(*) I opened a ruby feature request explaining a few problems I'd found with ruby's regexp feature. I even offered to help fix some of them. Circular discussions happened and I basically gave up on the idea of moving to ruby after ruby's own creator expressed a defeatist attitude about adding such a feature. My patches are still available. I don't particularly care that my request hasn't gone anywhere, so don't ask me about it, as I've happily moved on :)

Assuming I do this right, this should give grok a serious boost in speed.

Boost xpressive dynamic regexp with custom assertions

As it turns out, xpressive is (so far) exactly what I'm looking for.

In Xpressive's docs, a 'dynamic regular expression' means that the regex object comes from compiling a regex string, as opposed to a static regular expression, which is written directly in C++. Very fortunately, you can mix dynamic and static expressions, since both end up turning into the same objects!

What I wanted was dynamic regexps with custom assertions, and here's how you do it:

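#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp> /* for check() */
using namespace boost::xpressive;

struct is_private {
  bool operator()(ssub_match const &sub) const {
    /* Some test on 'sub'; return true to accept the match, false to reject it */
    return true; /* placeholder */
  }
};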

/* somewhere in your code ... */
sregex ip_re = sregex::compile("(?:[0-9]+\\.){3}(?:[0-9]+)");
sregex priv_ip_re = ip_re[ check(is_private()) ];
This is excellent because this was one of the features of perl that kept me from making grok available in any other language.

I have a working demo you can download. I've tested it on Linux and FreeBSD with success. It requires boost 1.34.1 and xpressive 2.0.1. The version of xpressive that comes with boost 1.34.1 is insufficient; you must separately download the latest version of xpressive. I installed it by unzipping it and copying boost/xpressive/* to /usr/local/include/boost/xpressive/, which overwrote the old copy of xpressive I had installed.

Compile with (on FreeBSD, the -I flag is needed so g++ finds the headers under /usr/local/include):

g++ -I/usr/local/include -c -o boost_xpressive_test.o boost_xpressive_test.cpp
g++  boost_xpressive_test.o -o xpressivetest
Running it:
% ./xpressivetest 
RFC1918 test on '1.2.3.4': fail
RFC1918 test on '4.5.6.7': fail
RFC1918 test on '192.168.0.5': pass
Match on test1: 192.168.0.5
RFC1918 test on '129.21.60.0': fail
RFC1918 test on '29.21.60.0': fail
RFC1918 test on '9.21.60.0': fail
RFC1918 test on '172.17.44.25': pass
Match on test2: 172.17.44.25
This is exactly the behavior I expected.