Adventures in SWIG and Boost::Python

I spent much of tonight trying to do the least amount of work and get some kind of python bindings available from the C++ version of Grok.


I ran into problem after problem with SWIG, all likely because I chose to write c++grok with templates. After failing on that repeatedly, I decided to try out Boost::Python. Also failure. I wasn't able to find docs explaining how to use boost::python without Boost's retarded bjam build system! Fine, so I tried to use bjam. After repeatedly failing to get even a hello world example working with bjam, I think I'm giving up for tonight.

Here's a request: Don't make me use your retarded build system.

I fully admit I haven't spent half a lifetime poring over the Boost::Python documentation, but should I really have to learn an entirely new make(1)-like system just to compile things? With SWIG, at least the errors were readable and I was able to get things to compile without issue. I just couldn't quickly figure out how to expose the few templated classes c++grok has.

I'm closer to a working python module with SWIG, but the Boost::Python syntax is quite nice and is in pure C++ from what I can tell.

Ugh! Maybe I'll have better luck next time.

I found template instantiation in SWIG:

%template(SGrokRegex) GrokRegex<sregex>;
%template(SGrokMatch) GrokMatch<sregex>;
But compiling this breaks because mark_tag in xpressive seems to lack a default constructor, and the swig-generated code wants to use it?
grok_wrap.cpp:3987: error: no matching function for call to 'boost::xpressive::detail::mark_tag::mark_tag()'
/usr/include/boost/xpressive/regex_primitives.hpp:41: note: candidates are: boost::xpressive::detail::mark_tag::mark_tag(int)
/usr/include/boost/xpressive/regex_primitives.hpp:40: note:                 boost::xpressive::detail::mark_tag::mark_tag(const boost::xpressive::detail::mark_tag&)
Several of the above errors are emitted when compiling... I'll try more tomorrow.
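The compiler message itself is the ordinary C++ complaint about a type with no default constructor. Here is a minimal sketch of that situation, assuming a stand-in type: MarkTag and make_tag are hypothetical names of mine, not xpressive's real mark_tag, and nothing here is what SWIG actually generates.

```cpp
#include <type_traits>

// Hypothetical stand-in for a type like mark_tag: it has an int
// constructor and an implicit copy constructor, but no default constructor.
struct MarkTag {
  MarkTag(int i) : index(i) {}
  int index;
};

// These mirror the two candidates the compiler listed in the error.
static_assert(!std::is_default_constructible<MarkTag>::value,
              "MarkTag cannot be default-constructed");
static_assert(std::is_copy_constructible<MarkTag>::value,
              "MarkTag can be copied");

// MarkTag broken;  // would fail: no matching function for call to 'MarkTag::MarkTag()'

// Constructing with an explicit argument is fine.
MarkTag make_tag(int i) { return MarkTag(i); }
```

Any wrapper code that declares such a value before assigning to it will hit this error, which is consistent with what the compiler reported above.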

Grok porting to C++

I've got pattern generation working.
% ./test '%NUMBER%' "hello 45.04" "-1.34"
Testing: %NUMBER%
Appending pattern 'NUMBER'
Test str: '(?:[+-]?(?:(?:[0-9]+(?:\.[0-9]*)?)|(?:\.[0-9]+)))'
regexid: 0x692840
Match: 1 / '45.04'
Match: 1 / '-1.34'
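For illustration, the expansion step shown above could be sketched roughly like this. This is not the actual grok source: expand_patterns and first_match are hypothetical names, and I'm using std::regex here instead of xpressive just to keep the sketch free of dependencies.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: replace every %NAME% in 'input' with "(?:<pattern>)"
// looked up in 'patterns'. Unknown names are copied through unchanged.
std::string expand_patterns(const std::string &input,
                            const std::map<std::string, std::string> &patterns) {
  std::string out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    std::size_t open = input.find('%', pos);
    if (open == std::string::npos) break;
    std::size_t close = input.find('%', open + 1);
    if (close == std::string::npos) break;
    std::string name = input.substr(open + 1, close - open - 1);
    out.append(input, pos, open - pos);  // literal text before the %NAME%
    std::map<std::string, std::string>::const_iterator it = patterns.find(name);
    if (it != patterns.end()) {
      out += "(?:" + it->second + ")";   // wrap in a non-capturing group
    } else {
      out.append(input, open, close - open + 1);
    }
    pos = close + 1;
  }
  out.append(input, pos, std::string::npos);  // whatever is left over
  return out;
}

// Hypothetical helper: return the first match of 'regex_text' in 'subject',
// or an empty string if there is none.
std::string first_match(const std::string &regex_text, const std::string &subject) {
  std::regex re(regex_text);
  std::smatch m;
  return std::regex_search(subject, m, re) ? m.str(0) : std::string();
}
```

With a NUMBER entry holding the number pattern, expanding '%NUMBER%' produces the same 'Test str' as in the output above, and searching "hello 45.04" with it finds '45.04'.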
I'm pretty sure this is the 4th time I've at least started implementing grok in any given language. The total so far has been: perl, python, ruby, C++. I stopped working on the one in ruby because ruby's regexp engine is lacking in some useful features (*). The python port of grok was written before I added advanced predicates, which is why the ruby port was halted quickly.

(*) I opened a ruby feature request explaining a few problems I'd found with ruby's regexp feature. I even offered to help fix some of them. Circular discussions happened and I basically gave up on the idea of moving to ruby after ruby's own creator expressed a defeatist attitude about adding such a feature. My patches are still available. I don't particularly care that my request hasn't gone anywhere, so don't ask me about it, as I've happily moved on :)

Assuming I do this right, this should give grok a serious boost in speed.

Boost xpressive dynamic regexp with custom assertions

As it turns out, xpressive is (so far) exactly what I'm looking for.

A 'dynamic regular expression' in Xpressive's docs means that the regex object comes from compiling a regex string at runtime, as opposed to a static regular expression, which is written directly in C++ code. Very fortunately, you can mix dynamic and static expressions, since both end up turning into the same objects!

What I wanted was dynamic regexps with custom assertions, and here's how you do it:

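struct is_private {
  bool operator()(ssub_match const &sub) const {
    /* Some test on 'sub'; return true to accept the match, false to reject it */
    return true; /* placeholder result */
  }
};

/* somewhere in your code ... */
/* dynamic regex, compiled from a string at runtime */
sregex ip_re = sregex::compile("(?:[0-9]+\\.){3}(?:[0-9]+)");
/* wrap it in a static expression that adds the custom assertion */
sregex priv_ip_re = ip_re[ check(is_private()) ];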
This is excellent because this was one of the features of perl that kept me from making grok available in any other language.

I have a working demo you can download. I've tested it on Linux and FreeBSD with success. It requires boost 1.34.1 and xpressive 2.0.1. The version of xpressive that comes with boost 1.34.1 is insufficient; you must separately download the latest version of xpressive. I installed it by unzipping it and copying boost/xpressive/* to /usr/local/include/boost/xpressive/, which overwrote the old copy of xpressive I had installed.

Compile with (on FreeBSD, note the -I flag):

g++ -I/usr/local/include -c -o boost_xpressive_test.o boost_xpressive_test.cpp
g++  boost_xpressive_test.o -o xpressivetest
Running it:
% ./xpressivetest 
RFC1918 test on '': fail
RFC1918 test on '': fail
RFC1918 test on '': pass
Match on test1:
RFC1918 test on '': fail
RFC1918 test on '': fail
RFC1918 test on '': fail
RFC1918 test on '': pass
Match on test2:
This is exactly the behavior I expected.
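The demo's source isn't reproduced in this post. As a rough sketch of the kind of test a predicate like is_private might run on the matched text, here is an RFC 1918 check in plain C++. The name is_rfc1918 is hypothetical, and it takes a std::string rather than an ssub_match so that it stands alone without Boost.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if 'ip' is a dotted quad inside one of the
// RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
// It assumes the input is digits and dots only, as the IP regex guarantees.
bool is_rfc1918(const std::string &ip) {
  unsigned a, b, c, d;
  char extra;
  // Exactly four numbers must parse; a fifth conversion means trailing junk.
  if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
    return false;
  if (a > 255 || b > 255 || c > 255 || d > 255) return false;
  if (a == 10) return true;                        // 10.0.0.0/8
  if (a == 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a == 192 && b == 168) return true;           // 192.168.0.0/16
  return false;
}
```

A predicate's operator() could then hand the text of its sub-match to a function like this and return the result, so the regex engine keeps searching until it finds an address that passes.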

Boost xpressive library supports user-defined assertions

See this doc

Basically this regex library (Boost.Xpressive) supports what I like about perl's regex engine: The (??{ code }) feature (except with different syntax). This means what I had to hack around in grok-perl I can easily express in C++ code. Awesome.

The docs only show examples of using static regexes with this great feature. I'm going to try using it with dynamic regexes. If it works, I'll be converting grok to C++.