Google Webmaster Tools tip

Google knows a lot about the web. Google Webmaster Tools lets me find out how much Google knows about my site, and it has some other useful features too.

One of the things it reports is which sites link to me, and it offers this data as a CSV download for offline use. I downloaded the file and wanted to see who was linking to me, sorted by source URL:

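# Reads the downloaded CSV on standard input; sed turns "a,b,rest" into "rest,b,a".
sed -re 's@([^,]+),([^,]+),(.*$)@\3,\2,\1@' \
| awk '
  # Zero-pad a single-digit second field (e.g. "5," becomes "05,") so columns line up
  $2 ~ /^[0-9],$/ { $2 = "0"$2 }
  {
    split($0, a, ",");
    split($3, b, ",");
    # Trim $3 at its first comma; ref is the linking page, url the page it links to
    $3 = b[1]; ref=a[3]; url=a[4];
    printf("%s %-130s %s\n", $1" "$2" "$3, ref, url)
  }' \
| sort -k4 | less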

Yes, this could probably be written more elegantly, but I'm not interested in elegance: I want data. The output gives me a good overview of who is linking to me and which specific URL on my site each of them links to.
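To see what each stage does, here is a minimal sketch that wraps the pipeline in a function (the name `sort_links_by_source` is hypothetical) and runs it on two made-up rows. It assumes each CSV row looks like "target URL,source URL,date", where the date carries its own comma (e.g. "Jan 5, 2010"); that layout is inferred from the code, not taken from the export, so check your file's header first. It uses `sed -E` instead of `-r` so it also runs on BSD sed, and drops `less` so the output can be checked.

```shell
#!/bin/sh
# ASSUMPTION: rows are "target URL,source URL,Mon D, YYYY"; verify against your export.
# After the awk step, fields 1-3 are the date, so `sort -k4` sorts by source URL.
sort_links_by_source() {
  # Move the trailing date to the front: target,source,date -> date,source,target
  sed -E 's@([^,]+),([^,]+),(.*$)@\3,\2,\1@' |
  awk '
    # Zero-pad a one-digit day ("5," becomes "05,") so the dates line up
    $2 ~ /^[0-9],$/ { $2 = "0"$2 }
    {
      split($0, a, ",")   # a[1]="Mon DD", a[2]=" YYYY", a[3]=source, a[4]=target
      split($3, b, ",")   # b[1] is the year without the URLs glued onto it
      $3 = b[1]; ref = a[3]; url = a[4]
      printf("%s %-130s %s\n", $1" "$2" "$3, ref, url)
    }' |
  sort -k4
}

# Hypothetical sample rows using example.* domains, not real data
printf '%s\n' \
  'http://example.com/a,http://zeta.example.org/post,Jan 5, 2010' \
  'http://example.com/b,http://alpha.example.net/page,Dec 17, 2009' |
  sort_links_by_source
```

With these two rows the `alpha.example.net` line comes out ahead of the `zeta.example.org` line even though its date is earlier, because the sort key starts at the source URL, and "Jan 5" is printed as "Jan 05".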