Search this site





Matplotlib makes me hate.

Let me caveat this rant with the fact that I've only been playing with matplotlib for approximately a week.

All the demos made matplotlib (a python module) look like a great tool that I should want to use to graph things, then I started trying to actually write code and it all went downhill.

Almost all of the functions operate on some mystical global scope, meaning they are by design not threadsafe. Probably not a big deal, I guess, but it certainly feels like an alien world especially given all the object oriented code in use in python.

If this culture shock wasn't bad enough, it went ahead and decided to use inches and ratios as the standard units of measure. You make a figure of a set width and height (in inches) and you can put stuff in that figure given ratio offsets. An offset of '.5' would put your left-bound in the middle. Weird and unexpected. Perhaps not bad. Still, I'm used to pixels, not inches.

Some of the arguments are just looney:

The docs say this "subplot(211) # 2 rows, 1 column, first (upper) plot". Base10 flag system? What. the. F. I'm at a loss as to why this was ever a good idea. Let's make it hard to add plots? Looks like you can use 'subplot(rows, cols, plotnum)' which is the sensible solution, but all the demos use the integer syntax, and it makes me sad.

You can't easily put the legend outside the graph.

Setting the default font size means you have to set at least 6 things. Make sure you note the excessive use of different tokens for the same freaking setting: labelsize, titlesize, size, and fontsize.

rc("axes", labelsize=10, titlesize=10)
rc("xtick", labelsize=10)
rc("ytick", labelsize=10)
rc("font", size=10)
rc("legend", fontsize=10)

I have code that looks like this:

  fig = figure()
  p = subplot(111)
  line = p.plot_date(dates, values)
  fig.savefig('foo.png', format='png')
Notice my entertaining leaps between OOP and WTF. Other cute nuances are that the docs/examples are littered with:
  ax = subplot(111)
You might think that the name 'ax' means 'axis' and that subplot returns an axis. No. You might ask python with type() and it would say '<type 'instance>'. Helpful. If you just print ax you'll see it is matplotlib.axes.Subplot. I'm trying hard to not get hung up on semantics, but 'axis' to me is very different from a plot. Plot seems like a visual representation, and an axis is a single dimension of a graph (aka a plot).

After several days of playing with this tool, I am frustrated and disheartened. It has such powerful features like tick rules: You can trivially specify "Put one major tick every 3rd week". However, the api is half OO half globally-scoped-procedural. Maybe this is my fault. The docs constantly mix 'matplotlib' and 'pylab' methods. Perhaps you can use just the matplotlib functions by themselves and you don't need pylab? Pylab, by the way, is what provides these awkward global functions and in theory only exists as a pure wrapper on top of matplotlib.

Playing with graphing; matplotlib contains updates of this format:
[email protected]:1
[email protected]:1
[email protected]:1
[email protected]:5
The values are hits seen in a single second to this website. This particular data set includes only the past month's worth of data.

Let's graph "total hits per hour" over time.

% ./ update /tmp/webhits.db - <
% ./ fetchsum /tmp/webhits.db $((60 * 60)) http.hit
60*60 is 3600, aka 1 hour. hits, 1 hour. I also reran it with 60*60*24 aka 24 hour totals. hits, 1 day.

The data aggregation may be incorrect; not sure if I really got 12K hits on each of the first few days this month. However, using fex+awk+sort on the logfiles themselves shows basically the same data:

 % cat access.* | ~/projects/fex/fex '[2 1:1' | countby 0  | sort -k2 | head -3
 11534 01/Nov/2007
 11488 02/Nov/2007
 11571 03/Nov/2007
Actually looking at the logs shows 5K hits from a single IP on 01/Nov/2007, and it's the googlebot.