Jordan Sissel
geek

Fri, 28 Mar 2008

C++Grok bindings working in Python

% python example.py "%SYSLOGDATE%" < /var/log/messages | head -1
{'MONTH': 'Mar', '=LINE': 'Mar 23 06:47:03 snack syslogd 1.4.1#21ubuntu3: restart.', '=MATCH': 'Mar 23 06:47:03', 'TIME': '06:47:03', 'SYSLOGDATE': 'Mar 23 06:47:03', 'MONTHDAY': '23'}

That's right. I can now use C++Grok from python.

After I saw it work, I immediately ran a time check against the perl version:

% seq 20000 > /tmp/x
% time python example.py "%NUMBER>5000%" < /tmp/x > /tmp/x.python
0.59s user 0.00s system 99% cpu 0.595 total
% time perl grok -m "%NUMBER>5000%" -r "%NUMBER%" < /tmp/x  > /tmp/x.perl
4.86s user 0.94s system 18% cpu 31.647 total

The same basic operation is about 50x faster by wall-clock time (0.6s versus 31.6s total) in python with c++grok bindings than in the pure perl version. Excellent. Sample python code:

import pygrok

g = pygrok.GrokRegex()
g.add_patterns( <dictionary of patterns> )
# match a NUMBER whose value is greater than 5000
g.set_regex("%NUMBER>5000%")
match = g.search("hello there 123 456 7890 pants")
if match:
  print match["NUMBER"]
# prints '7890'
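
For readers without pygrok installed, the behaviour of the sample above can be approximated in pure python. This is only a rough sketch built on the standard `re` module, not how c++grok is implemented: the `PATTERNS` dictionary, the `search` helper, and the way the `>` predicate is handled are all assumptions made for illustration. Nested patterns (such as %SYSLOGDATE% containing %MONTH%) are not handled.

```python
import re

# Hypothetical pattern dictionary; the real one ships with grok.
PATTERNS = {"NUMBER": r"\d+"}

# Matches %NAME% or %NAME>123% (only the '>' predicate is sketched here).
_TOKEN = re.compile(r"%(\w+)(?:>(\d+))?%")

def search(expr, text, patterns=PATTERNS):
    """Return a dict of named captures for the first match that passes all predicates."""
    predicates = {}

    def expand(token):
        name, bound = token.group(1), token.group(2)
        if bound is not None:
            predicates[name] = int(bound)
        # Turn %NAME% into a named group wrapping the pattern's regex.
        return "(?P<%s>%s)" % (name, patterns[name])

    regex = re.compile(_TOKEN.sub(expand, expr))
    for match in regex.finditer(text):
        # Assumption: a failed predicate simply moves on to the next candidate match.
        if all(int(match.group(n)) > b for n, b in predicates.items()):
            return match.groupdict()
    return None

match = search("%NUMBER>5000%", "hello there 123 456 7890 pants")
if match:
    print(match["NUMBER"])
```

Here 123 and 456 match NUMBER but fail the predicate, so the first result returned is 7890, the same value the pygrok sample prints.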

I knew I wasn't doing reference counting properly, so to test that I ran the python code against an input set of 1000000 lines and watched the memory usage, which clearly showed leaking. I quickly read up on ref counting in Python and which functions return new or borrowed references. A few keystrokes later my memory leaks were gone. After that I put python in the test suite and am ready to push a new version of c++grok.
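
A complementary check to watching process memory is to watch the reference count of an object passed into the extension. The sketch below is an assumption about how one might test this from python, not the test actually used for c++grok. `refcount_growth`, `clean_call` and `leaky_call` are hypothetical names, and `sys.getrefcount` is specific to CPython.

```python
import sys

def refcount_growth(func, arg, iterations=1000):
    """Return how much arg's reference count grew after calling func(arg) repeatedly."""
    before = sys.getrefcount(arg)
    for _ in range(iterations):
        func(arg)
    return sys.getrefcount(arg) - before

# Stand-ins for a binding call: one well behaved, one that keeps a stray reference.
def clean_call(value):
    return len(value)

_hoard = []

def leaky_call(value):
    _hoard.append(value)  # simulates a forgotten Py_DECREF in C code
    return len(value)

# Built at runtime so it is an ordinary, non-interned string object.
line = " ".join(["hello", "there", "123"])
print(refcount_growth(clean_call, line))
print(refcount_growth(leaky_call, line))
```

A result that grows with the iteration count suggests each call keeps a reference it should have released. In a real test, `func` would be something like the `search` method of a `pygrok.GrokRegex` object.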

Download: cgrok-20080327.tar.gz

Python Build instructions:

% cd pygrok
% python setup.py install

# make sure it's working properly
% python -c 'import pygrok'

There is an example and some docs in the pygrok directory.

Let me know what you think :)

Permalink: /geekery/python-cppgrok-bindings
posted at: 01:31

Mon, 24 Mar 2008

Python C++ Grok bindings

I've gotten quite a bit further tonight on making c++grok's functionality available in python.

Mostly tonight's efforts have been spent learning the python C api and learning how to add new objects and methods. I'm planning to have this ready for BarCampRochester3 in two weeks.

So far I can make new GrokRegex objects and call set_regex() and search() on them. Next time I'll be implementing GrokMatch objects (like in the C++ version) and a few other small things. Fun fun :)

Permalink: /geekery/python-cgrok-bindings-2
posted at: 13:36

Tue, 11 Mar 2008

Adventures in SWIG and Boost::Python

I spent much of tonight trying to do the least amount of work and get some kind of python bindings available from the C++ version of Grok.

Fail.

I ran into problem after problem with SWIG, all likely because I chose to write c++grok with templates. After failing on that repeatedly, I decided to try out Boost::Python. Also failure. I wasn't able to find docs explaining how to use boost::python without Boost's painful bjam build system! Fine, so I try to use bjam. After repeatedly failing to get even a hello world example working with bjam, I think I'm giving up for tonight.

Here's a request: Don't make me use your custom build system.

I fully admit I haven't spent half a lifetime poring over the Boost::Python documentation, but should I really have to learn an entirely new make(1)-like system just to compile things? With SWIG, at least the errors were readable and I was able to get things to compile without issue. I just couldn't quickly figure out how to expose the few templated classes c++grok has.

I'm closer to a working python module with SWIG, but the Boost::Python syntax is quite nice and is in pure C++ from what I can tell.

Ugh! Maybe I'll have better luck next time.

I found template instantiation in SWIG:

%template(SGrokRegex) GrokRegex<sregex>;
%template(SGrokMatch) GrokMatch<sregex>;
...

But compiling this breaks because mark_tag in xpressive seems to lack a default constructor, and the swig-generated code wants to use it?

grok_wrap.cpp:3987: error: no matching function for call to 'boost::xpressive::detail::mark_tag::mark_tag()'
/usr/include/boost/xpressive/regex_primitives.hpp:41: note: candidates are: boost::xpressive::detail::mark_tag::mark_tag(int)
/usr/include/boost/xpressive/regex_primitives.hpp:40: note:                 boost::xpressive::detail::mark_tag::mark_tag(const boost::xpressive::detail::mark_tag&)

Several of the above errors are emitted when compiling... I'll try more tomorrow.

Permalink: /geekery/python-cgrok-bindings
posted at: 04:23

Sorting MASTER_SITES_* values by ping time in FreeBSD

I wrote a script that will go through every variable named MASTER_SITE_[something] in bsd.sites.mk and basically sort each variable's contents by ping time.

The implementation uses asyncore in python, and uses tcp connections instead of pinging (pinging is not guaranteed to work). The output is sorted by fastest response time and in a format suitable for Makefile (and thus /etc/make.conf).
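
The idea of timing TCP connections and sorting by the result can be sketched as follows. This is not the code from the script itself: the original uses asyncore, which was removed from the standard library in Python 3.12, so this sketch swaps in blocking sockets on a thread pool. The function names, the default-port table, and the two-second timeout are assumptions for illustration.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Assumed default ports for URLs that do not name one.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443}

def connect_time(url, timeout=2.0):
    """Seconds taken to open a TCP connection to url's host, or None on failure."""
    parts = urlparse(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 80)
    start = time.monotonic()
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers timeouts, refused connections and DNS failures
        return None

def sort_sites(urls, timer=connect_time, workers=16):
    """Return the reachable urls ordered from fastest to slowest connect time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        times = list(pool.map(timer, urls))
    reachable = [(t, u) for t, u in zip(times, urls) if t is not None]
    return [u for _, u in sorted(reachable)]

def format_make_variable(name, urls):
    """Render a variable assignment with backslash-continued lines for make.conf."""
    return name + "=\\\n" + " \\\n".join("\t" + u for u in urls) + "\n"
```

Calling `format_make_variable("MASTER_SITE_GENTOO", sort_sites(urls))` on a list of mirror URLs would produce a block in the same shape as the output below. Unreachable servers are dropped, not listed last, which is a design choice of this sketch.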

% python fastest_site.py
 => Checking servers for MASTER_SITE_GENTOO (77 servers)
MASTER_SITE_GENTOO=\
        ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/%SUBDIR%/ \
        ftp://gentoo.kems.net/pub/mirrors/gentoo/%SUBDIR%/ \
        ftp://files.gentoo.gr/%SUBDIR%/ \
... <output cut> ...

I recommend that you send the output to a separate file, such as /usr/local/etc/ports_sites.conf, and add the following line to /etc/make.conf:

.include "/usr/local/etc/ports_sites.conf"

Now generate the file:

% python fastest_site.py > /usr/local/etc/ports_sites.conf
 => Checking servers for MASTER_SITE_GENTOO (77 servers)
 => Checking servers for MASTER_SITE_TCLTK (11 servers)
 => Checking servers for MASTER_SITE_GET_E (11 servers)
 => Checking servers for MASTER_SITE_BERLIOS (4 servers)
...
Download: fastest_sites.py

Permalink: /geekery/freebsd-ports-master-sites-sorting
posted at: 00:37

Mon, 10 Mar 2008

FreeBSD development

I've had a src commit bit in freebsd for a while, and I haven't done much with it. Yes, I suck. I'm working on getting my mouse code into the tree, finally, after almost 2 years of its life and almost 2 years of my slacking off on getting it ready for submission.

I think one of the main reasons I've directed energy elsewhere is because there's a (from my perception) thick metawork process to get real work done. Culture shock, mostly. Almost all of the tools and methods are different from my own. My experience at Google has given me good practice in dealing with systems foreign to me, so why do I hesitate to work on FreeBSD stuff?

Outside of the processes involved in getting code into the FreeBSD source tree, one of the main problems I've had working specifically on kernel changes in FreeBSD is that I haven't come up with a good solution for separating workspaces other than creating a new virtual machine for each logical workspace. In Perforce, you can create multiple clients and work on independent changes in each client. In userland code, you can simply build a new binary in a different directory, and you can test both binaries independently.

With kernels, I have a hard time multitasking. Not specifically multitasking different kernels, but if I'm making kernel and userland changes which are unrelated to each other, I can't safely test a new kernel on the same system as a userland change. Isolating these is as easy as making a new virtual machine, but copying virtual machines is not as fast and easy as, say, making a new perforce client.

I haven't come up with a good solution yet, but I'm sure someone else has and perhaps I'll build on that. Maybe some kind of hack where I would use a pristine, read only system image and all changes would be written to a memory filesystem on top of that pristine image? But this basically means all systems have to have the same pristine image (copying the image is nontrivial in time)...

Hopefully some of this makes sense. I'm open to suggestions :)

Permalink: /geekery/freebsd-development
posted at: 02:24
