grok for apache log analysis

I recently made a small change to my RSS and Atom feeds: I added a tracker image to the content of each entry. It looks like this:
<img src="/images/spacer.gif?path_of_a_blog_entry">
Any time someone views a post in a feed reader, that image is loaded, the client browser happily reports the referrer URL, and I get to track you! Wee.
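If you want to do the same in your own feed generator, building the tag is about this much work. This is only a sketch: the function name tracker_img is made up for illustration, and it assumes the entry path goes straight into the query string the way mine does.

```python
from html import escape
from urllib.parse import quote


def tracker_img(entry_path):
    """Return a tracker <img> tag for one feed entry (hypothetical helper)."""
    # Percent-encode the path so it is safe as a query string, then
    # HTML-escape the whole URL before placing it in the src attribute.
    url = "/images/spacer.gif?" + quote(entry_path, safe="/")
    return '<img src="{}">'.format(escape(url, quote=True))
```

Appending the returned tag to each entry's content is all the "tracking" there is; the web server's access log does the rest.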

This is in an effort to find out how many people actually read my blog. Now that I can track viewership of the RSS/Atom feeds, how do I go about analyzing it? grok to the rescue:

% grep -F 'spacer.gif?' accesslog \
   | perl grok -m '%APACHELOG%' -r '%IP% %QUOTEDSTRING:REFERRER%' \
   | sort | uniq -c | sort -n
<IP addresses blotted out, only a few entries shown>
   1 XX.XX.XX.XX "http://www.google.com/reader/view/"
   9 XX.XX.XX.XX "http://whack.livejournal.com/friends"
  10 XX.XX.XXX.XXX "http://www.bloglines.com/myblogs_display?sub=44737984&site=6302113"
  10 XX.XXX.XXX.XX "http://www.semicomplete.com/?flav=rss20"
  27 XX.XXX.XXX.XX "http://whack.livejournal.com/friends"
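Each line represents a unique viewer (an IP address and referrer pair), prefixed with the number of times that viewer loaded the tracker image. The referrer tells me what the reader was using to view the feed.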
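If you don't have grok handy, the same tally can be approximated in plain Python. This is a minimal sketch and not how grok works internally: it assumes the log is in Apache's "combined" format, the regex is a simplified one I wrote for illustration (it does not handle escaped quotes inside fields), and the name count_feed_views is made up.

```python
import re
from collections import Counter

# Simplified pattern for Apache "combined" log lines:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def count_feed_views(lines):
    """Count tracker-image hits per (ip, referrer) pair, smallest count first."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line, skip it
        if "spacer.gif?" not in match.group("request"):
            continue  # keep only tracker-image requests with a query string
        counts[(match.group("ip"), match.group("referrer"))] += 1
    # Ascending by hit count, like the final `sort -n` in the pipeline
    return sorted(counts.items(), key=lambda item: item[1])
```

Passing it an open file, for example count_feed_views(open("accesslog")), gives back a list of ((ip, referrer), hits) pairs that you can print however you like.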

Yay grok.