% python example.py "%SYSLOGDATE%" < /var/log/messages | head -1
{'MONTH': 'Mar', '=LINE': 'Mar 23 06:47:03 snack syslogd 1.4.1#21ubuntu3: restart.', '=MATCH': 'Mar 23 06:47:03', 'TIME': '06:47:03', 'SYSLOGDATE': 'Mar 23 06:47:03', 'MONTHDAY': '23'}
That's right. I can now use C++Grok from python.
After I saw it work, I immediately ran a time check against the perl version:
% seq 20000 > /tmp/x % time python example.py "%NUMBER>5000%" < /tmp/x > /tmp/x.python 0.59s user 0.00s system 99% cpu 0.595 total % time perl grok -m "%NUMBER>5000%" -r "%NUMBER%" < /tmp/x > /tmp/x.perl 4.86s user 0.94s system 18% cpu 31.647 totalThe same basic operation is 50x faster in python with c++grok bindings than the pure perl version. Excellent. Sample python code:
g = pygrok.GrokRegex()
g.add_patterns( <dictionary of patterns> )
g.set_regex("%NUMBER>5000%")
match = g.search("hello there 123 456 7890 pants")
if match:
print match["NUMBER"]
# prints '7890'
I knew I wasn't doing reference counting properly, so to test that I ran the
python code against an input set of 1000000 lines and watched the memory usage,
which clearly showed leaking. I quickly read up on ref counting in Python and
what functions return new or borrowed references. A few keystrokes later my
memory leaks were gone. After that I put python in the test suite and am read
to push a new version of c++grok.
Download: cgrok-20080327.tar.gz
Python Build instructions:
% cd pygrok % python setup.py install # make sure it's working properly % python -c 'import pygrok'There is an example and some docs in the pygrok directory.
Let me know what you think :)