Jordan Sissel
geek

Sat, 29 Sep 2007

Grok speed improvements

The benchmark for testing speed improvements was to process 3000 lines for '%SYSLOGBASE% .*? %IPORHOST%'. The profiler was Devel::Profile.

Before:

time running program:  11.9738  (85.07%)
number of calls:       409362

%Time    Sec.     #calls   sec/call  F  name
64.52    7.6336     2870   0.002660     main::meta2string
10.00    1.1838     5740   0.000206     main::filter
 8.88    1.0507     3000   0.000350     main::handle
 3.14    0.3710    54828   0.000007     main::handle_capture
 0.77    0.0908     2870   0.000032     main::react

After:

time running program:  2.5216  (82.73%)
number of calls:       105152

%Time    Sec.     #calls   sec/call  F  name
40.56    1.0228     3000   0.000341     main::handle
15.22    0.3838    54828   0.000007     main::handle_capture
 4.47    0.1128     2870   0.000039     main::meta2string
 3.31    0.0834     2870   0.000029     main::react
 2.61    0.0658    14747   0.000004     main::debug
 1.81    0.0456      237   0.000192     main::readlog
The primary changes were mostly to pregenerate a few regular expressions. Previously, I was generating the same regex every time filter() or meta2string() was called. These small changes gave grok a serious boost in speed: what was taking 12 seconds now takes 2.5 seconds.

One example of a simple optimization is this:

before: my $re = join("|", map { qr/(?:$_)/ } keys(%$filters));
after:  our $re ||= join("|", map { qr/(?:$_)/ } keys(%$filters));
This may decrease readability, but it sets $re only once no matter how many times that line of code is executed. This change was in the filter() function. Making this one simple change reduced the runtime of each filter() call by 97%, which makes it trivially small by comparison.
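
For readers who don't speak perl, here is a minimal Python sketch of the same "build the regex once" idea. The names filter_regex and _filter_re and the sample filter table are made up for illustration; this is not grok's code. Note the trade-off it shares with the perl one-liner: once cached, the regex does not notice later changes to the filter table.

```python
import re

_filter_re = None  # module-level cache, playing the role of `our $re`


def filter_regex(filters):
    """Return the combined alternation regex, compiling it only on first use."""
    global _filter_re
    if _filter_re is None:
        # Built once; every later call reuses the compiled object.
        _filter_re = re.compile("|".join(f"(?:{name})" for name in filters))
    return _filter_re


filters = {"foo": None, "ba[rz]": None}  # hypothetical filter table
first = filter_regex(filters)
second = filter_regex(filters)
print(first is second)              # True: nothing was recompiled
print(bool(first.search("a baz")))  # True: the 'ba[rz]' alternative matches
```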

This is where I was very happy to have written tests for grok. To verify that grok still behaved properly after making these speed improvements, I just ran the test suite. It passed, giving me some confidence in the stability of the changes.

Permalink: /geekery/grok-speed-improvements
posted at: 03:27

Sun, 23 Sep 2007

Google webmaster tools tip

Google knows a lot about the web. The webmaster tools let me find out how much google knows about my site, in addition to some other cool features.

One of these pieces of data is "what sites are linking to me." Google webmaster tools offers this data in CSV format for offline consumption. I downloaded it and wanted to see who was linking to me, sorted by source url:

sed -re 's@([^,]+),([^,]+),(.*$)@\3,\2,\1@' \
| awk '
  $2 ~ /^[0-9],$/ { $2 = "0"$2 } 
  { 
    split($0, a, ","); 
    split($3, b, ","); 
    $3 = b[1]; ref=a[3]; url=a[4]; 
    printf("%s %-130s %s\n", $1" "$2" "$3, ref, url)
  }' \
| sort | sort -k4 | less
Yes, the above code could probably be better, but I'm not interested in elegance: I want data. This lets me get a good overview of who is linking to me and to what specific url they are linking.

Permalink: /geekery/google-webmaster-tools
posted at: 05:39

Mon, 17 Sep 2007

Boredom, vmware cpu performance, and /dev/random

These are strictly cpu-bound tests using 'openssl speed'. I didn't compile any of the openssl binaries here, so it's possible that differences in compilation caused the differences in the numbers.

I've never noticed a performance difference between host and guest systems in vmware, and here's data confirming my suspicions.

Versions:
guest/solaris10    OpenSSL 0.9.8e 23 Feb 2007
guest/freebsd6.2   OpenSSL 0.9.7e-p1 25 Oct 2004
host/linux         OpenSSL 0.9.8c 05 Sep 2006

'openssl speed blowfish'
                   type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
host/linux         blowfish cbc     72062.94k    77117.35k    78280.70k    78680.96k    79309.48k
guest/freebsd6.2   blowfish cbc     68236.69k    73335.83k    74060.50k    74423.40k    74703.29k
guest/solaris10    blowfish cbc     64182.15k    73944.47k    75952.21k    76199.94k    76931.07k

'openssl speed rsa'
                                      sign    verify    sign/s verify/s
host/linux         rsa  512 bits 0.000308s 0.000020s   3244.3  49418.3
guest/freebsd6.2   rsa  512 bits   0.0003s   0.0000s   3343.5  41600.1
guest/solaris10    rsa  512 bits 0.001289s 0.000116s    775.6   8630.8

host/linux         rsa 1024 bits 0.000965s 0.000049s   1036.7  20409.8
guest/freebsd6.2   rsa 1024 bits   0.0009s   0.0001s   1160.0  18894.2
guest/solaris10    rsa 1024 bits 0.007152s 0.000369s    139.8   2708.1

host/linux         rsa 2048 bits 0.004819s 0.000135s    207.5   7414.4
guest/freebsd6.2   rsa 2048 bits   0.0045s   0.0001s    222.8   6951.1
guest/solaris10    rsa 2048 bits 0.045780s 0.001334s     21.8    749.8

host/linux         rsa 4096 bits 0.028600s 0.000422s     35.0   2371.3
guest/freebsd6.2   rsa 4096 bits   0.0279s   0.0004s     35.8   2271.4
guest/solaris10    rsa 4096 bits 0.317812s 0.004828s      3.1    207.1
It's interesting that performance on blowfish was pretty close, but rsa was wildly different. The freebsd guest outperformed the linux host in signing by roughly 2% to 12% depending on key size, but fell behind in verification by roughly 4% to 16%. Solaris performed abysmally. The freebsd-guest vs linux-host data tells me that the cpu speed difference between guest and host environments is probably zero, which is good.
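
The guest-vs-host comparison can be recomputed from the sign/s and verify/s columns of the rsa table. A small Python sketch follows; the numbers are copied from the table, and the helper name relative is mine.

```python
# (sign/s, verify/s) per key size, copied from the 'openssl speed rsa' table
linux = {512: (3244.3, 49418.3), 1024: (1036.7, 20409.8),
         2048: (207.5, 7414.4), 4096: (35.0, 2371.3)}
freebsd = {512: (3343.5, 41600.1), 1024: (1160.0, 18894.2),
           2048: (222.8, 6951.1), 4096: (35.8, 2271.4)}
solaris = {512: (775.6, 8630.8), 1024: (139.8, 2708.1),
           2048: (21.8, 749.8), 4096: (3.1, 207.1)}


def relative(guest, host):
    """Return {bits: (sign ratio, verify ratio)}; 1.0 means same speed as host."""
    return {bits: (guest[bits][0] / host[bits][0],
                   guest[bits][1] / host[bits][1])
            for bits in host}


for name, guest in (("freebsd", freebsd), ("solaris", solaris)):
    for bits, (sign, verify) in relative(guest, linux).items():
        print(f"{name:8s} rsa {bits:4d}: sign {sign:.2f}x  verify {verify:.2f}x")
```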

Again, the compilation options for each openssl binary probably played a large part in the performance here. I'm not familiar with SunFreeware's compile options for openssl (the binary I used came from there).

Either way, the point here was not to compare speeds across different platforms, but to in some small way compare cpu performance between host and guest systems. There are too many uncontrolled variables in this experiment to consider it valid, but it is interesting data and put me on another path to learn about why they were different.

My crypto is rusty, but I recall that rsa may need a fair bit of entropy to pick a big prime. Maybe solaris' entropy system is slower than freebsd's or linux's system? This led me to poke at /dev/random on each system. I wrote a small perl script to read from /dev/random as fast as possible.

host/linux        82 bytes in 5.01 seconds: 16.383394 bytes/sec
guest/solaris10   57200 bytes in 5.01 seconds: 11410.838461 bytes/sec
guest/freebsd6.2  210333696 bytes in 5.01 seconds: 41947398.850271 bytes/sec
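
The perl script itself isn't shown here. As a rough idea of what such a test does, here is a Python sketch (not the original devrandom.pl; the function name read_rate and the chunk size are my own choices). A read that blocks can push the elapsed time slightly past the deadline.

```python
import os
import time


def read_rate(path, seconds=5.0, chunk=4096):
    """Read from path for up to `seconds`; return (total_bytes, elapsed_seconds)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < seconds:
            data = os.read(fd, chunk)  # may block if the device has nothing ready
            if not data:               # EOF; only happens on regular files
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, time.monotonic() - start


if __name__ == "__main__":
    # /dev/urandom is used here so the demo never blocks; swap in /dev/random
    # to repeat the experiment.
    total, elapsed = read_rate("/dev/urandom", seconds=0.2)
    print(f"{total} bytes in {elapsed:.2f} seconds: {total / elapsed:f} bytes/sec")
```
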
I then ran the same test on the host/linux machine while feeding /dev/random on the host with entropy from the freebsd machine:
% ssh jls@teabag 'cat /dev/random' > /dev/random &
% perl devrandom.pl                                  
448 bytes in 5.00 seconds: 89.563136 bytes/sec

# Kill that /dev/random feeder, and now look:
% perl devrandom.pl
61 bytes in 5.01 seconds: 12.185872 bytes/sec
When speed is often a trade-off for security, are FreeBSD's and Solaris's /dev/random features more insecure than Linux's? Or, is Linux just being dumb?

Googling finds data indicating that /dev/random on linux will block until entropy is available, so let's retry with /dev/urandom instead.

host/linux        29405184 bytes in 5.01 seconds: 5874687.437817 bytes/sec
guest/solaris10   70579600 bytes in 5.00 seconds: 14121588.405586 bytes/sec
guest/freebsd6.2  208445440 bytes in 5.02 seconds: 41502600.216189 bytes/sec
FreeBSD's /dev/urandom is a symlink to /dev/random, so the same throughput appearing here is expected. FreeBSD still wins by a landslide. Why? Then again, maybe that's not a useful question. How often do you need 40MB/sec of random data?

Back to the rsa question: if solaris' random generator is faster than linux's in all cases, then why is 'openssl speed rsa' slower on solaris than linux? Compile time differences? Perhaps it's some other system bottleneck I haven't explored yet.

Permalink: /geekery/vmware-cpu-performance
posted at: 01:45

Sun, 16 Sep 2007

Ruby/Oniguruma code block patches

I love perl's (?{ code }) feature. I want it in other languages.

I spent some time on hacking this into ruby a few weeks ago. I finally got around to making patches.

In FreeBSD ports, I chose to build Ruby 1.8.6 with oniguruma for the regex engine. After doing 'make configure' you can apply these patches:

I haven't tested this on other platforms, and it's not feature complete, but it's close.

Permalink: /geekery/ruby-oniguruma-codeblock-patches
posted at: 00:38

Sat, 15 Sep 2007

new grok version available (20070915)

Hop on over to the grok project page and download the new version.

The changelist from the previous announced release is as follows:

Changes for 20070915:

  * Added 'grok_patfind.pl' which adds 'grok -F' functionality. Read about it
    here:
    http://www.semicomplete.com/blog/geekery/grok-pattern-autodiscovery.html

  * Proper shutdown is called to kill the "hack" subprocess
  * Add 'shell' option to 'type' sections; not currently used.
  * Warn if we're trying to override an existing filter.
  * Added more perlthoughts to t/theory/

  * Numerous pattern regex changes:
    - NUMBER no longer uses Regexp::Common's real number regex since that one
      matches '.' and my thoughts are that '.' is not a number
    - Added POSITIVENUM pattern
    - Fix HOSTNAME to match hostnames starting with a number (Again,
      Regexp::Common fails me)
    - Add path patterns: UNIXPATH, WINPATH, URI
    - MONTH no longer matches 0 through 12
    - DAY no longer matches 0 through 6
    - SYSLOGPROG is more specific now, since valid prog names can have dashes.

Permalink: /geekery/grok-20070915
posted at: 04:13

Mon, 03 Sep 2007

New project: liboverride (20070903)

Last month, I wrote about overriding shared library functions. I spent time today working on that project and it's to the point where I want to put it out for consumption. It's not perfect, but I've used it to easily override both libc and libX11 functions with great results.

Download: liboverride-20070903.tar.gz

Permalink: /geekery/liboverride-20070903
posted at: 23:52

new keynav version available (20070903)

Hop on over to the keynav project page and download the new version.

The changelist from the previous announced release is as follows:

20070903:
  - Drag is now working. Problem was KeyEvent.state contains masks such as
    | Button1Mask which is set when mouse button 1 is held, so keybindings stopped
    | working. Ignoring Button[1-5]Mask in this value fixes the problem.
  - Drag takes two optional arguments: a button followed by a keysequence to fire.
    | 'drag 1 alt' will do an alt+leftclick drag.
    | 'drag 2' will do a middleclick drag.
  - sync to xdotool@20070903
  - Fix a bug in parse_mods and parse_keysym where it was destructively changing the string.
  - Fix a bug where I was using the loop iterator 'i' inside another for loop. Oops.
  - Add to defaults my nethack-vi-style diagonal keybindings

Permalink: /geekery/keynav-20070903
posted at: 18:56

new xdotool version available (20070903)

Hop on over to the xdotool project page and download the new version.

The changelist from the previous announced release is as follows:

20070903:
  * Add xdo_mousemove_relative for relative mouse movements
  * Add xdolib.sh. This is a script library to help with features xdo does not
    explicitly implement, such as querying window attributes, or fetching the
    root window id. An example which uses this is: examples/move_window_away.sh

Permalink: /geekery/xdotool-20070903
posted at: 18:27

Dear Xbox Live and Shadowrun,

Let me first open by saying I think it's neat that gamers from both Vista and Xbox360 worlds can play with each other online. Cross-platform anything is cool. However, "neat" isn't always something you push to production without considering the drawbacks.

My complaint is regarding this feature and the first person shooter (FPS) genre. This complaint is about the effect that this feature has on gaming. Allow me to be clear that I'm not upset that I suck at Shadowrun and die a lot. Heck, it's fun to play even when I die constantly. Except against users who are clearly average or better gamers on Vista.

As someone who grew up playing FPSs on the computer, my first distaste for the console FPS was the aiming system. The mouse allows you to more quickly and accurately input directional data to the game than does an analog thumbstick. A mouse lets you turn around instantly and accurately with the trained flick of a wrist. Thumbsticks are a far cry from this. Everyone knows this.

So why, then, do we bridge the gaming worlds of PCs and Consoles? Why, when the aiming device (mouse, trackball, whatever) has several distinct advantages over the thumbstick? Who knows.

All I know is, I'm tired of teleporting 3 times across someone's view and his shots follow me exactly, presumably because he's using a mouse. Maybe he's really good, and I know I'm really bad, but my gut tells me that most of the time these players are on PCs.

For other games, sure, let's join the worlds, where the tools on both sides don't grant significant advantage.

All I want is a checkbox that says: "Only play with gamers on my platform."

If I played on a PC, I'd ask for this same feature, because it would be an absolute slaughter: me against console players.

Pretty please? A checkbox isn't so hard, is it?

Love,
Me, an Xbox Live and Shadowrun fan.

Permalink: /geekery/dear-xbox-live-and-shadowrun
posted at: 17:49

Sun, 02 Sep 2007

Grok and automatic log pattern discovery

My todo list has had "grok - magical 'pattern finder' thing" since May of this year. I added it after I wrote up some thoughts on pattern matching, string sequencing, and automation.

I spent many hours on that problem tonight.

Initially, I wrote a python script which would compare each line in a file against every other line in the file. Using difflib and string split, I could figure out what words were changed between two lines. If the lines were similar, I could compute the difference and replace the differences with a token placeholder, such as "WORD".

Download this python script

Here's a sample output of the tool where it compares one line against another, computes the difference and makes a new string that will match both lines:

0.91: error: PAM: authentication error for illegal user karinebeers
    : error: PAM: authentication error for illegal user leshana
  ==> error: PAM: authentication error for illegal user %WORD%
This script is fairly primitive in execution. It only compares whole tokens, which are delimited by space. This was good, but not enough. It doesn't know about compound patterns such as quoted strings, or complex patterns such as those matching an IP address or a file path.
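
The downloadable script isn't reproduced here, but the core idea fits in a few lines. This is a minimal sketch under my own assumptions (the function name merge_lines and the choice to collapse every differing run into a single placeholder are mine), not the original script:

```python
import difflib


def merge_lines(a, b, placeholder="%WORD%"):
    """Compare two lines token by token; return (similarity, merged pattern)."""
    ta, tb = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, ta, tb)
    merged = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            merged.extend(ta[i1:i2])
        else:
            # A replaced, inserted, or deleted run of tokens becomes one placeholder.
            merged.append(placeholder)
    return matcher.ratio(), " ".join(merged)


score, pattern = merge_lines(
    "error: PAM: authentication error for illegal user karinebeers",
    "error: PAM: authentication error for illegal user leshana",
)
print(pattern)  # error: PAM: authentication error for illegal user %WORD%
```

The similarity score is there so a caller can skip pairs of lines that are too different to merge; the right threshold is a judgment call.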

How do we consider complex patterns like hostnames and quoted strings? Turns out, most of the pattern matching wizardry is in grok. Grok knows about named patterns, and we can abuse this to use grok in a different way - instead of parsing text, we're going to use it to turn input text into a grok pattern.

Example:

% echo '/foo/bar/baz "foo bar baz" hello ther 1234 www.google.com' \
| perl grok -F 
%UNIXPATH% %QS% hello ther 1234 %IPORHOST%
What did it do?
input:  /foo/bar/baz "foo bar baz" hello ther 1234 www.google.com
output: %UNIXPATH%   %QS%          hello ther 1234 %IPORHOST%
Using a new hack on top of grok, we can now turn an unknown plaintext input into a pattern that is reusable and human-meaningful. This is totally awesome.

This hack only considers complex tokens for simplicity's sake; that is, tokens only containing letters and numbers are ignored. Why? Is 'foo1234' a word or a hostname? Is 1997 a number or a year? Grok allows you to make these distinctions, but I skip simple tokens so I don't have to programmatically weight random patterns. Note the above example, where '1234' was not replaced with '%NUMBER%' or something similar.
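
To make the idea concrete, here is a toy Python version of "turn text into a pattern". The three regexes are simplified stand-ins I made up for illustration; they are not grok's real UNIXPATH, QS, or IPORHOST definitions, and grok -F itself is perl.

```python
import re

# Hypothetical, simplified patterns. Order matters: quoted strings are tried first.
NAMED = [
    ("QS", r'"(?:[^"\\]|\\.)*"'),
    ("UNIXPATH", r"(?:/[\w.-]+)+"),
    ("IPORHOST",
     r"(?:\d{1,3}\.){3}\d{1,3}|[A-Za-z0-9][\w-]*(?:\.[A-Za-z0-9][\w-]*)+"),
]

# Only whole whitespace-delimited tokens may match. Plain words and numbers
# contain no quote, slash, or dot, so they are left untouched.
TOKEN_RE = re.compile(
    r"(?<!\S)(?:"
    + "|".join(f"(?P<{name}>{rx})" for name, rx in NAMED)
    + r")(?!\S)"
)


def discover(line):
    """Replace each complex token with the name of the pattern that matched it."""
    return TOKEN_RE.sub(lambda m: f"%{m.lastgroup}%", line)


print(discover('/foo/bar/baz "foo bar baz" hello ther 1234 www.google.com'))
# %UNIXPATH% %QS% hello ther 1234 %IPORHOST%
```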

So, I run this new grok hack on my /var/log/messages file. Here's a sampling of output from grok:

# grok -F < /var/log/messages
%SYSLOGBASE% logfile turned over due to size>100K
%SYSLOGBASE% kernel time sync enabled 6001
%SYSLOGBASE% error: PAM: authentication error for testuser from %IPORHOST%
%SYSLOGBASE% Limiting closed port RST response from 380 to 200 packets/sec
%SYSLOGBASE% error: PAM: authentication error for admin from %IPORHOST%
%SYSLOGBASE% error: PAM: authentication error for pants from %IPORHOST%
%SYSLOGBASE% kernel time sync enabled 6001
%SYSLOGBASE% kernel time sync enabled 2001
This is pretty close to a reusable pattern that captures all the changing data. You'll notice there are commonalities among certain lines such as the 'authentication error' lines where only the username is different. Let's run the 'authentication error' lines only through the python script and see what happens:
# messages.patterns contains the output of 'grok -F'
% grep 'authentication error' messages.patterns > testinput
% python distance_sort.py  testinput | sort | uniq
%SYSLOGBASE% error: PAM: authentication error for %WORD% from %IPORHOST%
%SYSLOGBASE% error: PAM: authentication error for illegal user %WORD% from %IPORHOST%
Wow, that's exactly what I wanted out of this process. Now we have some useful patterns, generated with almost zero effort. Let's apply this to our log file and see what we can get. How about counting, by user, failed login attempts?
pattern="%SYSLOGBASE% error: PAM: authentication error for %WORD% from %IPORHOST%"
./grok -m "$pattern" -r "%WORD%" < messages \
| sort | uniq -c | sort -n | tail -5
   5 backup
   9 angel
  10 pgsql
  12 mail
  17 ident
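
For comparison, the counting step (sort | uniq -c | sort -n) looks like this in Python. The regex is a hand-written stand-in for the grok pattern, and the sample log lines are made up; only the user name is captured.

```python
import re
from collections import Counter

# Hypothetical stand-in for
# "%SYSLOGBASE% error: PAM: authentication error for %WORD% from %IPORHOST%"
AUTH_FAIL = re.compile(r"error: PAM: authentication error for (\S+) from \S+")


def failed_logins(lines):
    """Count failed login attempts per user name."""
    counts = Counter()
    for line in lines:
        m = AUTH_FAIL.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [  # made-up log lines, for illustration only
    "Sep  2 02:00:01 host sshd[1]: error: PAM: authentication error for mail from 10.0.0.1",
    "Sep  2 02:00:02 host sshd[2]: error: PAM: authentication error for ident from 10.0.0.1",
    "Sep  2 02:00:03 host sshd[3]: error: PAM: authentication error for ident from 10.0.0.2",
    "Sep  2 02:00:04 host sshd[4]: error: PAM: authentication error for illegal user bob from 10.0.0.2",
    "Sep  2 02:00:05 host kernel: kernel time sync enabled 6001",
]
for user, n in failed_logins(sample).most_common(5):
    print(f"{n:4d} {user}")
```

Lines for "illegal user" do not match, just as they would not match the %WORD% pattern, because more than one token sits between "for" and "from".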

So what have we really accomplished?

You can take any log file and run it through this process. The output of this process is a set of patterns you can use to query data in that log file (and others like it). The real power here is not that patterns are generated, it's that named, queryable patterns are generated. That's awesome.

Permalink: /geekery/grok-pattern-autodiscovery
posted at: 02:23
