Search this site


Metadata

Articles

Projects

Presentations

Playing with graphing; matplotlib

webhits.data contains updates of this format:
[email protected]:1
[email protected]:1
[email protected]:1
[email protected]:5
The values are hits seen in a single second to this website. This particular data set includes only the past month's worth of data.

Let's graph "total hits per hour" over time.

% ./evtool.py update /tmp/webhits.db - < webhits.data
% ./evtool.py fetchsum /tmp/webhits.db $((60 * 60)) http.hit
60*60 is 3600, aka 1 hour. hits, 1 hour. I also reran it with 60*60*24 aka 24 hour totals. hits, 1 day.

The data aggregation may be incorrect; not sure if I really got 12K hits on each of the first few days this month. However, using fex+awk+sort on the logfiles themselves shows basically the same data:

 % cat access.* | ~/projects/fex/fex '[2 1:1' | countby 0  | sort -k2 | head -3
 11534 01/Nov/2007
 11488 02/Nov/2007
 11571 03/Nov/2007
Actually looking at the logs shows 5K hits from a single IP on 01/Nov/2007, and it's the googlebot.