I finished up work on the pattern discovery feature for the C++ port of grok. As
soon as it was finished, I wanted to see the speed difference between the perl
and C++ versions.
- Perl grok: 6 lines analyzed per second
- C++ grok: 130 lines analyzed per second
The feature tested here was the one detailed in this post.
130 lines per second isn't fantastic, but it's nearly 22 times faster than the perl
version, and that's huge.
I still have to implement a few other features to make the C++ version
equivalent to the perl version:
- config file (same format, ideally, as the perl version)
- filters, like %SYSLOGDATE|parsedate%
Tags: grok, c++grok, C++, perl, performance
Permalink: /geekery/cgrok-vs-grok-pattern-discovery
posted at: 19:07
I just finished implementing predicates in c++grok (tentative name) and wanted
to compare the performance against perl grok.
The input was 50000 lines of apache logfile, amounting to 9.7 megs of data.
I initially attempted this using the regex predicate %IP~/^129% but I realized
that perl grok compiles the predicate regex every time it is executed, so it
wasn't a fair test. I switched to %IP>=129% instead, which converts the
match to an integer first (so 129.21.60.9 turns into 129, for example). That
seems like more equal ground given the implementations in both perl and C++.
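To make the integer conversion concrete, here is a minimal C++ sketch of what a predicate like %IP>=129% has to do with a captured string. The names leading_integer and at_least are made up for illustration; this is not the code from either grok implementation.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the leading integer of a captured string,
// so "129.21.60.9" becomes 129. strtol stops at the first character that
// cannot be part of an integer; a capture with no leading number yields 0.
long leading_integer(const std::string &capture) {
    return std::strtol(capture.c_str(), nullptr, 10);
}

// Hypothetical predicate for something like %IP>=129%.
bool at_least(const std::string &capture, long threshold) {
    return leading_integer(capture) >= threshold;
}
```

With this, at_least("129.21.60.9", 129) is true and at_least("66.249.1.1", 129) is false. There is no regex work per line beyond the original match, which is why it felt like a fairer comparison.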
# C++ Grok
% /usr/bin/time ./test_patterns "%IP>=129%" < /tmp/access.50klines > /dev/null
2.56user 0.14system 0:02.92elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+408minor)pagefaults 0swaps
# Perl Grok
% /usr/bin/time perl grok -m "%IP>=129/%" -r "%IP%" < /tmp/access.50klines > /dev/null
8.87user 1.24system 0:25.94elapsed 39%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+17721minor)pagefaults 0swaps
One trend remains consistent: the more complexity I add, the wider the C++ version's speed margin over the perl version.
- Using strict %FOO% patterns with no predicates, the C++ version is 6 to 7 times faster than the perl equivalent in grok.
- Using predicates shows the C++ version running roughly 9 times faster (0:25.94 vs 0:02.92 elapsed in the run above).
I still need to write test cases for the C++ version in addition to porting the
pattern discovery portion from perl.
Exciting :)
Tags: grok, perl, C++, benchmark, performance
Permalink: /geekery/grok-predicates-perl-vs-cplusplus
posted at: 02:31
If you've ever used templates in C++, you've probably gone blind trying to read the compiler errors.
grokmatch.hpp:7: error: 'typedef class std::map<std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> >,
std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char>
> >, std::allocator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> > > > >
GrokMatch<boost::xpressive::basic_regex<__gnu_cxx::__normal_iterator<const
char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >
> >::match_map_type' is private
I'm supposed to read all that crap? Especially since 99% of the data isn't
useful in most cases. The following vim script sanitizes this output:
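function! GPPErrorFilter()
  " Hide '->' so its '>' is not treated as a closing angle bracket.
  silent! %s/->/ARROW/g
  " Delete each <...> template argument list, from '<' to its match.
  while search("<", "wc")
    let l:line = getline(".")
    let l:col = col(".")
    let l:char = l:line[l:col - 1]
    if l:char == "<"
      normal d%
    else
      break
    endif
  endwhile
  " Restore the arrows.
  silent! %s/ARROW/->/g
  " Print a separator line before each ': In ...' context line.
  silent %!awk '/: In/ { print "---------------"; print }; \!/: In/ {print }'
endfunction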
If I dump the output of make to a file (including stderr), and run the function while in vim, using ':call GPPErrorFilter()', the output turns into this:
g++ -g -I/usr/local/include -c -o main.o main.cpp
---------------
grokmatch.hpp: In function 'int main(int, char**)':
grokmatch.hpp:7: error: 'typedef class std::map GrokMatch::match_map_type' is private
main.cpp:43: error: within this context
make: *** [main.o] Error 1
So much better... Now I know I'm clearly trying to access a private typedef.
Sanity++
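The same idea works outside of vim. Here is a rough C++ sketch that drops every balanced <...> span from a line of g++ output. The function name strip_template_args is made up for this example, and the depth counter is a swapped-in technique, not a translation of the vim function.

```cpp
#include <string>

// Remove every balanced <...> span from a line of compiler output.
// "->" is handled first so its '>' never counts as a closing bracket.
std::string strip_template_args(const std::string &line) {
    std::string out;
    int depth = 0;  // how many '<' are currently open
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (c == '-' && i + 1 < line.size() && line[i + 1] == '>') {
            if (depth == 0) out += "->";
            ++i;  // skip the '>' of the arrow
            continue;
        }
        if (c == '<') { ++depth; continue; }
        if (c == '>' && depth > 0) { --depth; continue; }
        if (depth == 0) out += c;  // keep only text outside brackets
    }
    return out;
}
```

Feeding it a line like "std::map<std::string, int> Foo<Bar<char> >::type" gives "std::map Foo::type". It will get confused by an unbalanced '<' (a less-than comparison or operator< in a message, say), since everything after it is treated as being inside a template argument list.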
Tags: vim, g++, c++, templates, sanity
Permalink: /geekery/vim-function-to-make-errors-readable
posted at: 00:39
I've got pattern generation working.
% ./test '%NUMBER%' "hello 45.04" "-1.34"
Testing: %NUMBER%
Appending pattern 'NUMBER'
Test str: '(?:[+-]?(?:(?:[0-9]+(?:\.[0-9]*)?)|(?:\.[0-9]+)))'
regexid: 0x692840
Match: 1 / '45.04'
Match: 1 / '-1.34'
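For anyone wondering what "pattern generation" means here: each %NAME% token is replaced by a regex fragment before the whole thing is compiled. The sketch below shows the general idea using std::regex from the standard library rather than xpressive, and the names expand_patterns and first_match are invented for the example. It is not the actual c++grok code, and it does not handle predicates.

```cpp
#include <map>
#include <regex>
#include <string>

// Replace each %NAME% token with its regex fragment from `patterns`.
// Tokens whose name is not in the map are copied through unchanged.
std::string expand_patterns(const std::string &input,
                            const std::map<std::string, std::string> &patterns) {
    static const std::regex token("%([A-Z0-9_]+)%");
    std::string out;
    std::size_t last = 0;  // end of the previous token in `input`
    auto end = std::sregex_iterator();
    for (auto it = std::sregex_iterator(input.begin(), input.end(), token);
         it != end; ++it) {
        const std::smatch &m = *it;
        std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(input, last, pos - last);  // text before the token
        auto found = patterns.find(m[1].str());
        out += (found != patterns.end()) ? found->second : m.str(0);
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(input, last, std::string::npos);  // text after the last token
    return out;
}

// Return the first match of `pattern` in `text`, or "" if there is none.
std::string first_match(const std::string &pattern, const std::string &text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) return m.str(0);
    return std::string();
}
```

With a map containing NUMBER and the regex string printed above, expanding "%NUMBER%" yields that regex, and first_match on "hello 45.04" and "-1.34" returns '45.04' and '-1.34', the same matches as the test run.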
I'm pretty sure this is the 4th language I've at least started implementing grok
in. The list so far: perl, python, ruby, C++. I
stopped working on the one in ruby because ruby's regexp engine is lacking in
some useful features (*). The python port of grok was written before I added
advanced predicates, which is why the ruby port was halted quickly.
(*) I opened a ruby feature request explaining a few problems I'd found with
ruby's regexp feature. I even offered to help fix some of them. Circular discussions
happened and I basically gave up on the idea of moving to ruby after ruby's own
creator expressed a defeatist attitude about adding such a feature. My patches are
still available. I don't particularly care that my request hasn't gone
anywhere, so don't ask me about it, as I've happily moved on :)
Assuming I do this right, this should give grok a serious boost in speed.
Tags: grok, C++, boost, xpressive
Permalink: /geekery/grok-porting-to-c++
posted at: 06:38
As it turns out, xpressive is (so far) exactly what I'm looking for.
A 'dynamic regular expression' in xpressive's docs means that the regex
object comes from compiling a regex string at runtime, as opposed to a static
regular expression, which is written directly in C++ code. Very fortunately, you
can mix dynamic and static expressions, since both end up turning
into the same objects!
What I wanted was dynamic regexps with custom assertions, and here's how you do it:
struct is_private {
    bool operator()(ssub_match const &sub) const {
        /* Some test on 'sub' */
    }
};

/* somewhere in your code ... */
sregex ip_re = sregex::compile("(?:[0-9]+\\.){3}(?:[0-9]+)");
sregex priv_ip_re = ip_re[ check(is_private()) ];
This is excellent because this was one of the features of perl that kept me
from making grok available in any other language.
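As an illustration of what the "some test on 'sub'" part could look like, here is a minimal sketch of an RFC1918 check on a dotted-quad string. The function name is_rfc1918 is invented for this example and uses only the standard library; inside the functor you would presumably hand it the matched text (something like sub.str()), but treat that wiring as an assumption rather than the code from my demo.

```cpp
#include <cstdio>
#include <string>

// Hypothetical assertion body: true when `ip` is a dotted quad inside
// 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 (the RFC 1918 ranges).
bool is_rfc1918(const std::string &ip) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    // Anything that does not start with four dot-separated numbers fails.
    if (std::sscanf(ip.c_str(), "%u.%u.%u.%u", &a, &b, &c, &d) != 4) {
        return false;
    }
    if (a > 255 || b > 255 || c > 255 || d > 255) return false;
    if (a == 10) return true;
    if (a == 172 && b >= 16 && b <= 31) return true;
    return a == 192 && b == 168;
}
```

For example, is_rfc1918("192.168.0.5") and is_rfc1918("172.17.44.25") are true, while is_rfc1918("1.2.3.4") and is_rfc1918("129.21.60.0") are false.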
I have a working demo you can download. I've tested on Linux and FreeBSD with
success. It requires boost 1.34.1 and xpressive 2.0.1. The version of xpressive
that comes with boost 1.34.1 is insufficient; you must separately download the
latest version of xpressive. I installed it by unzipping it and copying
boost/xpressive/* to /usr/local/include/boost/xpressive/ - this overwrote the
old copy of xpressive I had installed.
Compile with (on FreeBSD, the -I flag points g++ at /usr/local/include):
g++ -I/usr/local/include -c -o boost_xpressive_test.o boost_xpressive_test.cpp
g++ boost_xpressive_test.o -o xpressivetest
Running it:
% ./xpressivetest
RFC1918 test on '1.2.3.4': fail
RFC1918 test on '4.5.6.7': fail
RFC1918 test on '192.168.0.5': pass
Match on test1: 192.168.0.5
RFC1918 test on '129.21.60.0': fail
RFC1918 test on '29.21.60.0': fail
RFC1918 test on '9.21.60.0': fail
RFC1918 test on '172.17.44.25': pass
Match on test2: 172.17.44.25
This is exactly the behavior I expected.
Tags: boost, xpressive, C++, regexp, assertions
Permalink: /geekery/boost-xpressive-testing
posted at: 02:12
A few weeks ago I installed Vmware Server 2.0 Beta 1. I noted a regression from
vmware server 1.3 (and 1.2) that "raw disks" were seemingly not supported. The
workaround was to manually edit the 'vmx' file for the virtual machine to add
the old entries which exposed raw disks to vmware.
Tonight, I rebooted my server after accidentally powering it off while cleaning dust off of the intake vents, and vmware didn't start back up. Technically, all of the startup scripts (/etc/init.d/vmware) ran fine and reported no errors, but I couldn't connect to the management interface on port 8333. Netstat output confirmed that nothing was listening on this port. Crap.
After grepping around in various places, I figured out that the tomcat server
that comes with vmware (named webAccess) had no intention of listening on port
8333, and that this was normal. I checked /var/log/ for anything useful and found /var/log/vmware. In this directory was a set of hostd-N.log files, where N is a number. In hostd-0.log was this entry (truncated below for readability):
[2008-01-08 21:31:23.790 'vm:/vmdisks/vms/filer (solaris 64bit)/filer (solaris 64bit).vmx' 47879793637584 warning] Disk was not opened successfully. Backing type unknown: 0
[2008-01-08 21:31:23.790 'vm:/vmdisks/vms/filer (solaris 64bit)/filer (solaris 64bit).vmx' 47879793637584 warning] Disk was not opened successfully. Backing type unknown: 0
[2008-01-08 21:31:23.791 'App' 47879793637584 error]
Exception: ASSERT /build/mts/release/bora-63231/bfg-atlantis/bora/vim/hostd/vmsvc/vmConfigReader.cpp:3251
[2008-01-08 21:31:23.794 'App' 47879793637584 error] Backtrace:
<actual backtrace snipped>
Keep in mind that even though vmware-hostd was failing, /etc/init.d/vmware
reported success for every operation. Eek.
So, I went to my filer vmx file and commented out the rawDisk entries and restarted vmware (with the init script). No more failures were logged in hostd-0.log, and a subsequent netstat showed vmware-hostd listening on port 8333. Peachy.
Back on my windows box, I ran the vmware console, and guess what happens... I
can now manage my vmware sessions again.
I can only hope that VMware decides to allow raw, local disk access in the
finished version of vmware 2.0, because I am rather dependent on it. If they
don't, I might be able to get away with moving the data out of the zfs pool,
initializing the drives with some random linux file system, and creating a
500gig vmware virtual drive on each disk, and finally telling Solaris to fix
its zfs stuff. Since I don't have too much data there, I might be able to get
away with draining one disk out of the zfs pool, and doing the conversion from
raw to virtual disk one physical disk at a time. Might be a useful exercise in
learning zfs more.
I'll cross that bridge when I get to it.
Tags: vmware
Permalink: /geekery/vmware-server-2.0-startup-problems
posted at: 01:22
For whatever reason, I decided to play with oniguruma tonight (a newish regular
expression library). I'm considering an effort to port some of grok's
functionality to C or C++ for speed reasons. Doing it in C++ would require me
to re-learn C++.
The docs are pretty complete, but not very helpful with respect to examples. I
wasn't able to find very many useful examples on google, but the API docs are
quite good. What wasn't answered by the docs was answered by reading header
files. Excellent.
The result of this adventure is this:
# regex: ^(?<test>.*?)( (?<word2>.*))?$
# input: "hello there"
% gcc -I/usr/local/include -L/usr/local/lib -lonig oniguruma_named_captures.c
% ./a.out "hello there"
word2 = there
test = hello
% ./a.out "foobarbazfizz"
word2 =
test = foobarbazfizz
Download the code
Tags: oniguruma, regex, regexp, C, examples
Permalink: /geekery/oniguruma-named-capture-example
posted at: 04:41
I finally picked up some small parts (leds, breadboard, leads, multimeter) from
Fry's to start work on my universal remote project. Yes, I know you can buy
universal remotes. I want to learn more electronics, so why not use this as a
starter project?
Since my Soekris box is now free of its router duties, I can use it for this
project. To do this, I'll need two pieces: an infrared receiver, and an
infrared emitter. I bought an infrared emitter led today, and I just need to
buy a receiver online (they're like $2).
Before getting there, I needed to learn how to drive the GPIO ports on the net4501.
It was pretty simple to do, and after hooking up a few wires I had an LED that
blinked after a few hours of reading and hacking. The code itself was trivial
to write, I just had to learn how to talk to the GPIO ports.
After writing the blinking code, I decided the next step was clearly to add
fading to the LED. This is commonly done with PWM (pulse-width modulation).
Apple made this technique famous with its "breathing" LED lights on the
monitors and laptops when the devices were in sleep mode.
At any rate, I have successfully written code that makes both the error led and
the PIO5 (GPIO 0) pin "breathe".
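For the curious, the timing math behind a software-PWM "breathing" effect can be sketched roughly like this. This is not glow.c: the names PwmSlice, pwm_slice, and breathe_duty are made up for the example, the triangle-wave ramp is just one simple choice, and the actual GPIO writes are left out entirely.

```cpp
#include <cmath>

// On/off durations for one PWM period, in microseconds.
struct PwmSlice {
    unsigned on_us;
    unsigned off_us;
};

// Split one period into on-time and off-time for a duty cycle in [0, 1].
// Out-of-range duty values are clamped.
PwmSlice pwm_slice(double duty, unsigned period_us) {
    if (duty < 0.0) duty = 0.0;
    if (duty > 1.0) duty = 1.0;
    unsigned on = static_cast<unsigned>(std::lround(duty * period_us));
    return PwmSlice{on, period_us - on};
}

// Triangle-wave duty: ramps 0 -> 1 over the first half of the cycle and
// 1 -> 0 over the second half. steps_per_cycle must be greater than zero.
double breathe_duty(unsigned step, unsigned steps_per_cycle) {
    unsigned phase = step % steps_per_cycle;
    double x = static_cast<double>(phase) / steps_per_cycle;
    return x < 0.5 ? 2.0 * x : 2.0 * (1.0 - x);
}
```

A driver loop would call breathe_duty for each step, turn the pin on for on_us, then off for off_us, and repeat. With a short enough period the eye sees the average brightness instead of the flicker.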
Download glow.c
Tags: soekris, net4501, gpio, freebsd, C
Permalink: /geekery/soekris-gpio
posted at: 06:54
I've definitely spent 10+ hours in the past 2 days trying to get this system upgraded from 6.0 to 7.0...
Why? First, I was using the wrong power supply, so it would randomly reboot during startup (it took 4 hours before I figured that one out, after compiling zillions of kernels, even trying GENERIC from 6.2, 6.3, and 7.0). Then, I couldn't get a stable system image that would boot successfully. It would halt trying to run /sbin/init. Not sure why.
I was initially making my own image with mdconfig, and rsyncing the entire system into a file-backed fs. Then I dd'd this image to my compactflash and put it in the net4501. I gave up on that after several hours, and did this instead:
- Bring up a new vmware instance with 2 disks. 1 800mb disk and 1 5gb disk.
- Install 7.0-RC1, only install kernel and base.
- Mount the 5gb disk (newfs /dev/sd1, mount /dev/sd1 /usr/src) and unpack
the kernel sources there. Build kernel (For the soekris box).
- Make any necessary config changes (serial console, etc)
- Shutdown machine
- Use qemu-img to convert the vmware disk into a raw disk image
- dd the new image to compact flash.
- Rejoice. I have a working net4501 now.
In testing in qemu, I get random timeouts talking to ad0. On the soekris board,
I get random timeouts to ad0. wtf :(
Boot into safe-mode, and everything seems fine. (Safe mode turns off DMA, iirc)
Tags: soekris, net4501, freebsd
Permalink: /geekery/victory-soekris-install
posted at: 06:49