photo
Jordan Sissel
geek

Wed, 23 Apr 2008

Non-compiler caching

I setup ccache again (trivial) to help me with building freebsd repeatedly. I noticed that much of the time spent in the kernel build process was in building dependency lists using awk.

Why couldn't we apply the ccache idea to everything else? If the same input always means the same output, then we could cache it if it is computationally expensive to compute that output.

Script: dcache.sh

Above is a hack that runs like ccache, but tracks all files created by the process (and its subprocesses). Here's a sample run, of counting the number of lines in a file with awk and outputing the result (within awk) to another file.

% /usr/bin/time ./dcache.sh awk '{x++} END { print "Total records: " x > "/tmp/hello"}' bigdata
Running original...
        1.60 real         0.05 user         0.74 sys
% cat /tmp/hello
Total records: 1000000

# Remove the old output file
% rm /tmp/hello

# Rerun it again, unmodified, and it will use the cached output.
% /usr/bin/time ./dcache.sh awk '{x++} END { print "Total records: " x > "/tmp/hello"}' bigdata
Using cache...
        0.06 real         0.00 user         0.06 sys
% cat /tmp/hello
Total records: 1000000
It doesn't work with everything just yet, but the problems seem to be with truss's behavior and not the script's fault, like sometimes truss hangs, or it doesn't follow a fork like it should. Beyond truss problems, the scripts doesn't track file renames. It also doesn't understand how to figure out what the input files for each command is. Ideally it would checksum any inputs and use that as the cache key; currently it only checksums the commandline arguments and not the external files being used (such as 'bigfile').

I started initially without using truss, but awk doesn't call open(2) via libc when it opens files, for some reason, and I can't figure out a clean way to capture specific function calls from a process (even a child process).

Dtrace would be sexy here, but it is unavailable in the main freebsd trunk.

The speedup is pretty obvious for cpu-intensive things, but the real test is to see how it performs when working properly and hooked into the freebsd kernel build.

Comments: 1 (view comments)
Tags: , , , ,
Permalink: /geekery/non-compiler-caching
posted at: 04:49

Search this site

Navigation

Metadata

Home About Resume My Code (SVN)

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '08 Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials SSH Tunneling (Video)

Tag Cloud

Calendar

< April 2008 >
SuMoTuWeThFrSa
   1 2 3 4 5
6 7 8 9101112
13141516171819
20212223242526
27282930   

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati