Google webmaster tools tip

Google knows a lot about the web. Its webmaster tools let me find out how much Google knows about my site, in addition to some other cool features.

One of these pieces of data is "what sites are linking to me", which Google webmaster tools gives you. It offers this data in CSV format for offline consumption. I downloaded it and wanted to see who was linking to me, sorted by source URL:

# Reads the downloaded CSV on stdin.
sed -re 's@^([^,]+),([^,]+),(.*$)@\3,\2,\1@' \
| awk '
  $2 ~ /^[0-9],$/ { $2 = "0"$2 }
  {
    split($0, a, ",");
    split($3, b, ",");
    $3 = b[1]; ref=a[3]; url=a[4];
    printf("%s %-130s %s\n", $1" "$2" "$3, ref, url)
  }' \
| sort | sort -k4 | less
Yes, the above code could probably be better, but I'm not interested in elegance: I want data. This lets me get a good overview of who is linking to me and to what specific url they are linking.

Week of unix tools; day 1: Sed!

Day 1 is ready for viewing. It's about sed, something I feel many sysadmins (and others) neglect in favor of perl, awk, or other tools. It's a super useful tool. Check out the article here:

Day 1: sed

Strip XML comments with sed

sed -ne '/<!--/ { :c; /-->/! { N; b c; }; /-->/s/<!--.*-->//g }; /^  *$/!p;'
You might also consider stripping blank lines and/or filtering the result through xmllint --format to pretty-print the XML.

Parsing nfsstat(1) for only version X information

nfsstat | sed -ne '/Version 3/,/^$/p'

When I was bored (at 4 am, no less), I kept trying to parse this information out using some crazy tricks with 'x' (swap pattern/hold space) and other stuff, but I had forgotten that regexps are valid addresses. So we can print every range that starts at a line matching 'Version 3' and ends at the next blank line, anywhere in the output.
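To see the regexp address range in isolation, here is a minimal sketch that runs the same sed expression over a few made-up lines. The sample_stats and version3_block function names and the shortened sample text are assumptions for illustration, not real nfsstat output.

```shell
# Hypothetical, shortened stand-in for nfsstat output.
sample_stats() {
  printf '%s\n' \
    'Version 2: (0 calls)' \
    'null        getattr' \
    '0 0%        0 0%' \
    '' \
    'Version 3: (535958 calls)' \
    'null        getattr' \
    '0 0%        242223 45%' \
    '' \
    'Version 4: (0 calls)'
}

# Print from a line matching "Version 3" through the next empty line.
# Both ends of the range are regexps; -n suppresses everything else.
version3_block() {
  sed -n '/Version 3/,/^$/p'
}

sample_stats | version3_block
```

The closing blank line is part of the range, so it is printed too. The Version 2 and Version 4 lines fall outside the range and are dropped.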

The next thing I want to try is to automagically parse nfsstat output into a more machine-readable format. This will probably use awk or perl, seeing as how doing it with sed may hurt my brain a bit. Furthermore, trying to read the sed that did said operations would be somewhat intense ;)

The output looks something like this, on Solaris 9:

Version 3: (535958 calls)
null        getattr     setattr     lookup      access      readlink    
0 0%        242223 45%  20606 3%    52504 9%    20025 3%    41 0%       
read        write       create      mkdir       symlink     mknod       
14138 2%    146618 27%  5525 1%     145 0%      337 0%      0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
6279 1%     7 0%        1539 0%     1518 0%     1606 0%     6587 1%     
Parsing this would mean generating a tree-like dictionary. In Perl, it might look like:
%foo = (
	'Version 3' => {
		null => 0,
		getattr => 242223,
		setattr => 20606,
		lookup => 52504,
		# .... etc ...
	},
);
Should be simple enough; we'll see what happens next time I get bored.
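As a rough sketch of the awk route, the function below flattens that layout into "version, operation, count" rows separated by tabs. It assumes the format shown above: a "Version N:" header, then alternating lines of operation names and "count percent" pairs, ending at a blank line. The nfsstat_to_tsv name is made up for this example, and the script has only been reasoned through against the sample above, not against other nfsstat versions.

```shell
# Emit "version<TAB>operation<TAB>count" rows from nfsstat-style text on stdin.
# Assumes name lines and value lines alternate inside each Version block.
nfsstat_to_tsv() {
  awk '
    # Block header, e.g. "Version 3: (535958 calls)" -> "Version 3"
    /^Version [0-9]+:/ { version = $1 " " $2; sub(/:$/, "", version); have_names = 0; next }
    # A blank line ends the current block.
    NF == 0            { version = ""; next }
    # Ignore anything outside a Version block.
    version == ""      { next }
    # First line of each pair: remember the operation names.
    !have_names        { n = split($0, names, " "); have_names = 1; next }
    # Second line of each pair: fields alternate count, percent.
    {
      for (i = 1; i <= n; i++)
        printf "%s\t%s\t%s\n", version, names[i], $(2 * i - 1)
      have_names = 0
    }
  '
}

# Small demo using two line pairs from the sample above.
nfsstat_to_tsv <<'EOF'
Version 3: (535958 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        242223 45%  20606 3%    52504 9%    20025 3%    41 0%
read        write       create      mkdir       symlink     mknod
14138 2%    146618 27%  5525 1%     145 0%      337 0%      0 0%

EOF
```

On a real machine this would be fed with something like nfsstat | nfsstat_to_tsv. One flat row per counter is easier to sort, grep, or load into the nested hash than the column layout.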