Search this site


Metadata

Articles

Projects

Presentations

GDB for poking at libc to test random things

I wanted to test something quickly out in C, but didn't want to write the 5 line of code to do it. Having done some fun ruby debugging with gdb recently, I decided to go with that.
% gdb -q `which sleep` --args `which sleep` 60000
(gdb) break nanosleep
(gdb) run
Starting program: /bin/sleep 60000
[Thread debugging using libthread_db enabled]
[New Thread 0x7f8c40bc46f0 (LWP 6504)]
[Switching to Thread 0x7f8c40bc46f0 (LWP 6504)]

Breakpoint 1, 0x00007f8c404f7ce0 in nanosleep () from /lib/libc.so.6
(gdb) call strcspn("hello world", "w")
$1 = 6
I don't know why I didn't think about this before. This is nicely useful, allowing me to easily test any simple function call unrelated.

Ruby's DateTime::strptime vs libc strptime

A project I'm working on has some odd slowness about it. Using ruby-prof, I found that String#scan was consuming most of the time, but ruby-prof didn't tell me where it was coming from. A quick hack that replaced String#scan with my own method showed who was calling it, DateTime.strptime -
class String
  def scan(*args)
    raise
  end
end
I tried using the ruby debugger to break on String#scan, but it didn't seem to work. PEBCAK, probably, which is why I used the solution above to just toss an exception when that function was called.

Back at the point, DateTime.strptime is slow. Looking at the underlying code shows you why: date/format.rb - the _strptime_i method.

Lots of string shuffling, regular expressions to match field specifiers (%d, etc), string modification with more regexps, etc. The code is pretty easy to read, but it's still doing a lot of work it doesn't need to be doing. Luckily, libc comes with a method for parsing times in the same way: strptime.

So, I started working on an extension to the Time class that invokes libc's strptime and returns a Time instance: ruby-ctime. The usage is simple once you have the module:

require "CTime"

puts Time.strptime("%Y", "2009")
# outputs 'Wed Jan 00 00:00:00 +0000 2009'
The one major holdback from strptime is that there's no wide support for timezones. Format strings like %Z and %z work with strftime, but generally are unsupported by strptime; exceptions that do support %z are glibc, and freebsd appears to support both %Z and %z. Nothing reliably cross-platform. This is a historical problem due to the fact that the 'struct tm' structure has no timezone field (glibc and the bsds add 'long tm_gmtoff' to support timezones).

This means we'll have to correct for this by extending strptime to support it, but I'm not there yet.

Anyway, short benchmarking for features supported by both libc strptime and DateTime strptime shows libc a massive winner:

snack(~/projects/ruby-ctime) % ruby test.rb
Iterations: 10000
datetime: 7.680928 (1301.92601727291/sec)
my_strptime: 0.126583 (78999.5497025667/sec)
A 60x speedup using the new C code vs DateTime.strptime. This is a great start, but we still need timezone support. I need to hack timezone support into this, which probably means I'll start with glibc's strptime implementation.

XSendEvent + LD_PRELOAD == win

As far as feature requests come, for xdotool, one of the more common ones is to have the ability to send key or mouse events to a specific window, not just the active one. XTEST (what xdotool uses for key/mouse currently) doesn't let you specify a window to send events. XSendEvent(3) lets you send hand-crafted events to a specific window, but most applications ignore these sent events.

The XEvent struct has a member 'send_event' which is true if the event came from an XSendEvent call and false otherwise. Programs like firefox and xterm (by default) ignore many events that have 'send_event' set to true.

Enter LD_PRELOAD.

Writing a custom shared library that overrides the default XNextEvent and XPeekEvent functions allows us to force 'send_event' to always be false, so an application with this library loaded will happily handle keyboard/mouse events generated with XSendEvent. I already have a helpful project that lets me write such a shared library: liboverride.

#include <stdio.h>
#include <X11/Xlib.h>

void hack_send_event(XEvent *ev) {
  switch (ev->type) {
    case KeyPress: case KeyRelease: case ButtonPress: case ButtonRelease:
      ev->xany.send_event = False;
      break;
  }
}

override(`XNextEvent', `
  {
    real_func(display, event_return);
    hack_send_event(event_return);
    return;
  }
')
This small bit of liboverride code will give me a shared library I can preload with LD_PRELOAD. Doing so will ensure that send_event is false for any key or mouse button events.

Works well. Now that we have a reliable way to allow XSendEvent I think it's worth putting this into xdotool.

libevent bufferevevents on pipe(2)

From the libevent documentation:
A new bufferevent is created by bufferevent_new(). The parameter fd specifies the file descriptor from which data is read and written to. This file descrip- tor is not allowed to be a pipe(2).
It says it's not allowed to be a pipe, but it works just fine in Linux with epoll, poll, and select.

I'm certain I've just unleashed some evil demons that'll make my code crash in unexpectedly wonderful ways as a result... ;)

Watching process output with libevent

I started really learning libevent today. The API docs for libevent are pretty good, but I'm having trouble finding good examples that use it. To that end, I'll provide my own. This one uses a bufferevent to watch stdout on a child process.

Code: libevent-execwatch.c

libc's tree functions, function pointers, and bdb.

Whoever wrote the tsearch/tfind/tdelete/twalk functions for libc clearly stopped thinking about how it would be used. The only way I can see to iterate the tree is to use twalk, which doesn't let you pass any extra arguments to your provided action method.

This sucks if, for example, you wanted to get a list of all entries in the tree in a threadsafe or multiplicity-safe way.

Some workarounds include:

  • Every time you insert, add the same structure to an array.
  • Use something that supports sane iteration (bdb, for example).
I looked into using bdb for some things, but the tree I wanted to iterate over most was a structure that held, among other things, a function pointer. Function pointers are magical and special and are held in a place in memory you can't simply make a copy of. If you try to store a function pointer in an in-memory BDB database, the value that comes out of your query will be different than the function pointer.

This bdb code sample attempts to store a function pointer in bdb. The output is:

stored: 607a70
actual: 4005e8
The value changed because copying a function pointer doesn't work.

There's a workaround here that might be useful - dlopen(). Any functions I want to store in bdb, I would store by string name and fetch the function pointer with dlsym().

This dlopen example shows how to dlopen yourself and fetch a function by string name.

Fun with pointers.

C vs Python with Berkeley DB

I've got a stable, threaded version of this fancydb tool I've been working on. However, the performance of insertions is less than optimal.

Then again, how much should insert performance matter on a monitoring tool? For data that comes into it gradually, speed doesn't matter much. For bulk inserts, speed matters if you want to get your work done quickly. I haven't decided if bulk insertions are necessary use case for this tool. Despite that, I'm still interested in what the limits are.

I have experimented with many different implementations of parallelism, buffering, caching, etc in the name of making insertion to a fancydb with 10 rules fast. The fastest I've gotten it was 10000/sec, but that was on an implementation that wasn't threadsafe (and used threads).

My most-recent implementation (which should be threadsafe) can do reads and writes at 30000/sec. With evaluation rules the write rate drops to about 10000/sec.

The next task was to figure out what I was doing wrong. For comparison, I wrote two vanilla bdb accessing programs. One in C and one in Python. The output of these two follows:

# The args for each program is: insertions page_size cache_size
% sh runtest.sh
Running: ./test 2000000 8192 10485760
  => 2000000 inserts + 1 fullread: 209205.020921/sec
Running: ./py-bsddb.py 2000000 8192 10485760
  => 2000000 inserts + 1 fullread: 123304.562269/sec
As expected, C clearly outperforms Python here, but the margin is pretty small (C is 69% faster for this test). Given the 120000/sec rate from Python, the poor input rate of my tool seems to be blamed on me. Is my additional code here really the reason that I can only write at 30000 per second? I may need to revisit how I'm implementing things in python. I'm not clear right now where I'm losing so much throughput.

So I use hotshot (python standard profiler) and I find that most of the time is spent in my iterator method. This method is a generator method which uses yield and loops over a cursor.

It's important to note that my python bdb 'speed test' above did not use generators, it used a plain while loop over the cursor. So, I wrote another test that uses generators. First, let's try just inserts, no reading of data:

Running: ./test 1000000 8192 10485760
  => 1000000 inserts: 261096.605744/sec
Running: ./py-bsddb.py 1000000 8192 10485760
  => 1000000 inserts: 166389.351082/sec
Now let's try with 3 different python reading methods: while loop across a cursor, generator function (using yield), and an iterator class (implementing __iter__):
Running: ./py-bsddb.py 4000000 8192 10485760
  => 1 fullread of 4000000 entries: 8.660000
Running: ./py-bsddb_generator.py 4000000 8192 10485760
  => 1 fullread of 4000000 entries: 9.124000
Running: ./py-bsddb_iterable_class.py 4000000 8192 10485760
  => 1 fullread of 4000000 entries: 13.130000
I'm not sure why implementing an iterator is so much slower (in general) than a yield-generator is. Seems strange, perhaps my testing code is busted. Either way, I'm not really closer to finding the slowness.

get this code here

Field extraction tool

Tonight was spent implementing and extending one of my favorite features of xapply: its subfield extracting feature, aka this syntax: %[1,2:1]

The gist of this is that you specify a sequence of field number, separator, field number, separator, etc, to get some very quick tokenization to pull the specific data you want. Basically it gives you *extremely* concise syntax for the a subset of the features provided by cut(1).

My tool expands on this a bit further. It's best shown by example:

% ./fex '0:-2/1' < /etc/passwd | sort  | uniq -c
      3 bin 
      1 dev 
      4 home 
      2 nonexistent 
      1 root 
      2 usr 
     14 var 
The string '0:-2/1' means:
  • 0 - the full string (aka "root:x:0:0:root:/root:/bin/bash".
    "0" here uses awk semantics where $0 in awk is the full record and $1 is the first field.
  • : - split by colons
  • -2 - take the 2nd to last token (by colon) (aka "root")
    Negative offsets aren't available in xapply, but are valid here.
  • / - split that by "/"
  • 1 - take the 1st token (aka "root")
The output is essentially the root directory for everyone's home directories. Doing this in awk, cut, perl, or any other tool would be much more typing.

You can also specify multiple field extractions on a single invocation:

# Take the first and 2nd to last token split by colon
% ./fex '0:1' '0:-2' < /etc/passwd  
root /root 
daemon /usr/sbin 
bin /bin 

# Alternatively, {x,y,z,...} syntax selects multiple tokens
# note that the output is joined by colons.
# Again, this is a feature unavailable in xapply's subfield extraction
% ./fex '0:{1,-2}' < /etc/passwd
root:/root
daemon:/usr/sbin
bin:/bin

# Parse urls out of apache logs:
% ./fex '0"2 2' < access | head -4
/
/icons/blank.gif
/icons/folder.gif
/favicon.ico

I still have tests to write and bugs to fix, so you won't find a release yet.

Eliminiating special cases in strtok loops

strtok has a "first case" and "other case" usage. The first time you call strtok, you pass it the string. Future calls must pass NULL for this same session. This leads to this kind of code:
void foo(char *mystr) {
  char *tok;

  tok = strtok(mystr, " ");
  while (tok != NULL) {
    // Do something with tok

    tok = strtok(NULL, " ");
  }
}
Notice the above duplicate code. You can use pointers properly and achieve this same result with only one line using strtok:
void foo(char *mystr) {
  char *tok;
  char *strptr = mystr;

  while ( (tok = strtok(strptr, " ")) != NULL ) {
    if (strptr != NULL)
      strptr = NULL;
    // Do something with tok
  }
}
This method lets you still invoke both first and nonfirst cases of strtok but you only have one line of code using strtok, making your code more maintainable and readable. This way has the great benefit of being able to use 'continue' inside your loop and you still move on to the next token.