grok for apache log analysis
Posted Mon, 19 Feb 2007
I recently made a small change to my RSS and Atom feeds: I added a tracker image to the content of each entry. It looks like this:
<img src="/images/spacer.gif?path_of_a_blog_entry">
Any time someone views a post in one of the feeds, that image is loaded, the client browser happily reports the referrer URL, and I get to track you! Wee.
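For illustration, here is a minimal sketch of generating that tag per entry, assuming a Python feed generator (the post doesn't say how the feeds are built, and tracker_img is a hypothetical helper). Percent-encoding the entry path keeps characters such as & or # from mangling the query string that later shows up in the access log.

```python
from html import escape
from urllib.parse import quote

# Image path from the post; the query string after "?" names the entry.
TRACKER_IMAGE = "/images/spacer.gif"


def tracker_img(entry_path: str) -> str:
    """Build the tracker <img> tag for one entry (hypothetical helper)."""
    # Percent-encode the entry path but leave "/" readable, so the
    # access log shows e.g. spacer.gif?/some/entry.
    src = f"{TRACKER_IMAGE}?{quote(entry_path, safe='/')}"
    # HTML-escape the URL before placing it in the src attribute.
    return f'<img src="{escape(src, quote=True)}">'


if __name__ == "__main__":
    # "/some/entry" is a made-up path for illustration.
    print(tracker_img("/some/entry"))
    # prints <img src="/images/spacer.gif?/some/entry">
```

The tag is then appended to each entry's HTML before the feed is rendered; the feed generator is responsible for any XML escaping of that content.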
The point of the tracker is to find out how many people actually read my blog. Now that I can track viewership of the RSS/Atom feeds, how do I go about analyzing it? grok to the rescue:
% grep -F 'spacer.gif?' accesslog \
    | perl grok -m '%APACHELOG%' -r '%IP% %QUOTEDSTRING:REFERRER%' \
    | sort | uniq -c | sort -n
<IP addresses blotted out, only a few entries shown>
      1 XX.XX.XX.XX "http://www.google.com/reader/view/"
      9 XX.XX.XX.XX "http://whack.livejournal.com/friends"
     10 XX.XX.XXX.XXX "http://www.bloglines.com/myblogs_display?sub=44737984&site=6302113"
     10 XX.XXX.XXX.XX "http://www.semicomplete.com/?flav=rss20"
     27 XX.XXX.XXX.XX "http://whack.livejournal.com/friends"
Each line represents a unique viewer (more precisely, a distinct IP address and referrer pair). The number in front is how many times that viewer loaded the tracker image, and the referrer tells me what the reader was using to view the feed.
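For anyone without grok handy, a rough Python equivalent of the pipeline above might look like this. It assumes the access log uses Apache's combined LogFormat (the format that records the referrer), and the function names are illustrative, not part of grok. It checks for spacer.gif? in the request line only, rather than anywhere in the log line, and counts hits per (IP, referrer) pair the way sort | uniq -c | sort -n does.

```python
import re
import sys
from collections import Counter

# Apache "combined" LogFormat (assumed):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Quoted fields may contain backslash-escaped quotes, hence (?:[^"\\]|\\.)*.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" \d{3} \S+ '
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" "(?:[^"\\]|\\.)*"'
)


def tally_tracker_hits(lines):
    """Count tracker-image hits per (IP, referrer) pair."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # not a combined-format line; skip it
        # Only requests for the tracker image, i.e. spacer.gif with a query string.
        if "spacer.gif?" not in m.group("request"):
            continue
        counts[(m.group("ip"), m.group("referrer"))] += 1
    return counts


def format_report(counts):
    """Format rows like `uniq -c | sort -n`: smallest counts first."""
    rows = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return [f'{n:7d} {ip} "{ref}"' for (ip, ref), n in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tracker_report.py accesslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for row in format_report(tally_tracker_hits(fh)):
            print(row)
```

Parsing the request field, instead of matching anywhere in the line, avoids counting log entries where spacer.gif? merely appears in the referrer or user agent.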