GDB for poking at libc to test random things

I wanted to quickly test something out in C, but didn't want to write the 5 lines of code needed to do it. Having done some fun ruby debugging with gdb recently, I decided to go with that.
% gdb -q `which sleep` --args `which sleep` 60000
(gdb) break nanosleep
(gdb) run
Starting program: /bin/sleep 60000
[Thread debugging using libthread_db enabled]
[New Thread 0x7f8c40bc46f0 (LWP 6504)]
[Switching to Thread 0x7f8c40bc46f0 (LWP 6504)]

Breakpoint 1, 0x00007f8c404f7ce0 in nanosleep () from /lib/libc.so.6
(gdb) call strcspn("hello world", "w")
$1 = 6
I don't know why I didn't think of this before. It's nicely useful, letting me easily test any simple function call, even ones completely unrelated to the program I'm debugging.

Parallelization with /bin/sh

I have 89 log files. The average file size is 100ish megs. I want to parse all of the logs into something else useful. Processing 9.1 gigs of logs is not my idea of a good time, nor is it a good application for a single CPU to handle. Let's parallelize it.

I abuse /bin/sh's ability to background processes and wait for children to finish. I have a script that takes a pool of available computers and sends tasks to them. These tasks are just "process this apache log" - but the speedup of parallel execution over a single process is incredible, and it's very simple to do in the shell.

The script to perform this parallelization is here: parallelize.sh

I define a list of hosts to use in the script and pass a list of logs to process on the command line. The host list is repeated until it is longer than the list of logs. Each log is then sent off to a host over ssh, which runs a script that writes its results to stdout. That output is captured locally into a file named with the hostname and the pid.
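
A rough sketch of the idea - not the real parallelize.sh; the host names, the remote parselog.sh script, and the output file naming are stand-ins for illustration:
#!/bin/sh
# Stand-in host pool; the real script defines its own list.
HOSTS="host1 host2 host3 host4"

# Repeat the host list until there is at least one entry per log file.
POOL="$HOSTS"
while [ `echo $POOL | wc -w` -lt $# ] ; do
  POOL="$POOL $HOSTS"
done

# Hand each log to the next host in the pool and background the ssh.
# (The output file here uses a job counter; the real script uses the pid.)
i=0
for log in "$@" ; do
  i=`expr $i + 1`
  host=`echo $POOL | cut -d' ' -f$i`
  ssh "$host" parselog.sh "$log" > "output.$host.$i" &
done

# Wait for all backgrounded children to finish.
wait
Run it as something like: sh parallelize-sketch.sh /path/to/access_log.*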

I didn't run it single-process in full to compare running times, however, parallel execution gets *much* farther in 10 minutes than single proc does. Sweet :)

Some of the log files are *enormous* - taking up 1 gig alone. I'm experimenting with split(1) to break these into chunks of 100,000 lines each. The problem right now is that all of the tasks finish except for the 4 processes handling the 1-gig log files (there are 4 of them). Splitting will make the individual jobs smaller, letting us process them faster because the work load is more even across processes.
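
Something along these lines, where the log name is just a placeholder:
# break one huge log into 100,000-line pieces named huge.log.aa, huge.log.ab, ...
split -l 100000 huge.log huge.log.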

So, a simple application benefiting from parallelization is solved by using simple, standard tools. Sexy.

Parsing nfsstat(1) for only version X information

nfsstat | sed -ne '/Version 3/,/^$/p'
sed++

When I was bored (at 4 am, no less), I kept trying to parse this information out using some crazy tricks with 'x' (swap pattern/hold space) and other stuff, but I had forgotten that regexps are valid addresses. So we can print everything between 'Version 3' and the next blank line, wherever it appears in the output.
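
Since the version is just part of the regexp address, the same trick works for whatever version you're after - for example, with a shell variable (purely illustrative):
version=4
nfsstat | sed -ne "/Version $version/,/^\$/p"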

The next thing I want to try with this is to automagically parse nfsstat output into a format that is more machine readable. That will probably mean awk or perl, seeing as how doing it with sed may hurt my brain a bit. Furthermore, trying to read the sed that did said operations would be somewhat intense ;)

The output looks something like this, on Solaris 9:

Version 3: (535958 calls)
null        getattr     setattr     lookup      access      readlink    
0 0%        242223 45%  20606 3%    52504 9%    20025 3%    41 0%       
read        write       create      mkdir       symlink     mknod       
14138 2%    146618 27%  5525 1%     145 0%      337 0%      0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
6279 1%     7 0%        1539 0%     1518 0%     1606 0%     6587 1%     
Parsing this would mean generating a tree-like dictionary. In perl, it may look like:
%foo = (
	'Version 3' => {
		null => 0,
		getattr => 242223,
		setattr => 20606,
		lookup => 52504,
		# .... etc ...
		}
	);
Should be simple enough; we'll see what happens next time I get bored.
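
As a rough, untested sketch against the sample output above (the alternating name/value row layout is assumed from that sample), an awk pipeline could at least pair each operation with its call count:
nfsstat | sed -ne '/Version 3/,/^$/p' | awk '
  /^Version/ { next }    # skip the "Version 3:" header line
  NF == 0    { next }    # skip blank lines
  !havenames { n = NF; for (i = 1; i <= n; i++) name[i] = $i; havenames = 1; next }
  { for (i = 1; i <= n; i++) print name[i], $(2*i - 1); havenames = 0 }
'
That only spits out flat "name count" pairs rather than the nested structure above, but it's a start.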