photo
Jordan Sissel
geek

Fri, 28 Mar 2008

C++Grok bindings working in Python

% python example.py "%SYSLOGDATE%" < /var/log/messages | head -1
{'MONTH': 'Mar', '=LINE': 'Mar 23 06:47:03 snack syslogd 1.4.1#21ubuntu3: restart.', '=MATCH': 'Mar 23 06:47:03', 'TIME': '06:47:03', 'SYSLOGDATE': 'Mar 23 06:47:03', 'MONTHDAY': '23'}
That's right. I can now use C++Grok from python.

After I saw it work, I immediately ran a time check against the perl version:

% seq 20000 > /tmp/x
% time python example.py "%NUMBER>5000%" < /tmp/x > /tmp/x.python
0.59s user 0.00s system 99% cpu 0.595 total
% time perl grok -m "%NUMBER>5000%" -r "%NUMBER%" < /tmp/x  > /tmp/x.perl
4.86s user 0.94s system 18% cpu 31.647 total
The same basic operation is 50x faster in python with c++grok bindings than the pure perl version. Excellent. Sample python code:
g = pygrok.GrokRegex()
g.add_patterns( <dictionary of patterns> )
g.set_regex("%NUMBER>5000%")
match = g.search("hello there 123 456 7890 pants")
if match:
  print match["NUMBER"]
# prints '7890'
I knew I wasn't doing reference counting properly, so to test that I ran the python code against an input set of 1000000 lines and watched the memory usage, which clearly showed leaking. I quickly read up on ref counting in Python and what functions return new or borrowed references. A few keystrokes later my memory leaks were gone. After that I put python in the test suite and am read to push a new version of c++grok.

Download: cgrok-20080327.tar.gz

Python Build instructions:

% cd pygrok
% python setup.py install

# make sure it's working properly
% python -c 'import pygrok'
There is an example and some docs in the pygrok directory.

Let me know what you think :)

Comments: 0 (view comments)
Tags: , , , , ,
Permalink: /geekery/python-cppgrok-bindings
posted at: 01:31

Search this site

Navigation

Metadata

Home About Resume My Code (SVN)

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '08 Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials SSH Tunneling (Video)

Tag Cloud

Calendar

< March 2008 >
SuMoTuWeThFrSa
       1
2 3 4 5 6 7 8
9101112131415
16171819202122
23242526272829
3031     

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati