photo
Jordan Sissel
geek

Wed, 23 Apr 2008

Non-compiler caching

I setup ccache again (trivial) to help me with building freebsd repeatedly. I noticed that much of the time spent in the kernel build process was in building dependency lists using awk.

Why couldn't we apply the ccache idea to everything else? If the same input always means the same output, then we could cache it if it is computationally expensive to compute that output.

Script: dcache.sh

Above is a hack that runs like ccache, but tracks all files created by the process (and its subprocesses). Here's a sample run, of counting the number of lines in a file with awk and outputing the result (within awk) to another file.

% /usr/bin/time ./dcache.sh awk '{x++} END { print "Total records: " x > "/tmp/hello"}' bigdata
Running original...
        1.60 real         0.05 user         0.74 sys
% cat /tmp/hello
Total records: 1000000

# Remove the old output file
% rm /tmp/hello

# Rerun it again, unmodified, and it will use the cached output.
% /usr/bin/time ./dcache.sh awk '{x++} END { print "Total records: " x > "/tmp/hello"}' bigdata
Using cache...
        0.06 real         0.00 user         0.06 sys
% cat /tmp/hello
Total records: 1000000
It doesn't work with everything just yet, but the problems seem to be with truss's behavior and not the script's fault, like sometimes truss hangs, or it doesn't follow a fork like it should. Beyond truss problems, the scripts doesn't track file renames. It also doesn't understand how to figure out what the input files for each command is. Ideally it would checksum any inputs and use that as the cache key; currently it only checksums the commandline arguments and not the external files being used (such as 'bigfile').

I started initially without using truss, but awk doesn't call open(2) via libc when it opens files, for some reason, and I can't figure out a clean way to capture specific function calls from a process (even a child process).

Dtrace would be sexy here, but it is unavailable in the main freebsd trunk.

The speedup is pretty obvious for cpu-intensive things, but the real test is to see how it performs when working properly and hooked into the freebsd kernel build.

Comments: 1 (view comments)
Tags: , , , ,
Permalink: /geekery/non-compiler-caching
posted at: 04:49

Sun, 23 Sep 2007

Google webmaster tools tip

Google knows a lot about the web. The webmaster tools allows me to find out how much google knows about my site, in addition to some other cool features..

One of these pieces of data is "what sites are linking to me" which google webmaster tools gives you. It offers this data in a CSV format for offline consumption. I downloaded this, and wanted to see who was linking to me sorted by source url:

sed -re 's@([^,]+),([^,]+),(.*$)@\3,\2,\1@' \
| awk '
  $2 ~ /^[0-9],$/ { $2 = "0"$2 } 
  { 
    split($0, a, ","); 
    split($3, b, ","); 
    $3 = b[1]; ref=a[3]; url=a[4]; 
    printf("%s %-130s %s\n", $1" "$2" "$3, ref, url)
  }' \
| sort | sort -k4 | less
Yes, the above code could probably be better, but I'm not interested in elegance: I want data. This lets me get a good overview of who is linking to me and to what specific url they are linking.

Comments: 0 (view comments)
Tags: , , , ,
Permalink: /geekery/google-webmaster-tools
posted at: 05:39

Tue, 22 May 2007

Week of unix tools; day 3: awk!

Day 3 is ready for viewing. It's about awk.

This article has lots of usage examples for the many ways you can use awk to do hard work for you. Check out the article here:

day 3: awk

Comments: 0 (view comments)
Tags: , ,
Permalink: /tools/week-of-unix-day-3
posted at: 04:25

Search this site

Navigation

Metadata

Home About Resume My Code

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials

Tag Cloud

Calendar

< April 2008 >
SuMoTuWeThFrSa
   1 2 3 4 5
6 7 8 9101112
13141516171819
20212223242526
27282930   

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati