Less bullshit, more graph.
Posted Sat, 17 Mar 2007
You have 1500 machines you want in cacti. How do you do it?
My take is that you shouldn't ever need to preregister data types or data sources. Have a system that you simply throw data at, and it stores it so you can get a graph of it later. All I need to do, to graph new data, is simply write a script that produces that data and sends it to the collector.
The collector is a Python CGI script that acts as a frontend to rrdtool. It takes all CGI parameters and stores them as values, with a few exceptions:
- machine=XX - Spoof machine to store data for. If not given, defaults to REMOTE_ADDR. Useful if you need to proxy data through another machine, or are reporting data about another machine you are probing.
- timestamp=XX - Override default timestamp ("now").
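Here's a rough sketch of how that parameter handling could look. This isn't the actual collector code. The function name `split_params` and its `now` argument are made up for illustration. Only the `machine` and `timestamp` parameters and the REMOTE_ADDR fallback come from the description above.

```python
import time
from urllib.parse import parse_qsl


def split_params(query_string, remote_addr, now=None):
    """Split a CGI query string into (machine, timestamp, values).

    Hypothetical helper: 'machine' and 'timestamp' are the two special
    parameters; every other key=value pair is treated as data to store.
    """
    params = dict(parse_qsl(query_string))
    # machine defaults to the address of whoever sent the request
    machine = params.pop("machine", remote_addr)
    # timestamp defaults to "now"
    default_ts = now if now is not None else time.time()
    timestamp = int(params.pop("timestamp", default_ts))
    # whatever is left is data; values are kept as the strings we received
    return machine, timestamp, params


if __name__ == "__main__":
    print(split_params("machine=web01&C_system_calls=42", "10.0.0.5"))
```

In a real CGI script the query string and client address would come from the QUERY_STRING and REMOTE_ADDR environment variables.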
kenya(/mnt/rrds/22.214.171.124) % ls
C_bytes_per_page.rrd          C_pages_inactive.rrd
C_cpu_context_switches.rrd    C_rfork_calls.rrd
... etc ...

All of those rrds are created by simply throwing data at the python cgi script. The source of the data is a script that runs 'vmstat -s' and turns it into key-value pairs.
Why are the files prefixed with "C_"? The data I am feeding in comes from counters, and therefore should be stored as counter datatypes in rrdtool. The 'C_' prefix is a hint: if the variable needs an rrd created for it, the DS type should be COUNTER. The default without this prefix is GAUGE.
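The prefix rule is simple enough to sketch. The following is a guess at the shape of it, not the real collector: the DS name `value`, the 60 second step, the heartbeat of twice the step, and the single AVERAGE RRA are all assumptions I picked for the example. The `rrdtool create` DS and RRA syntax itself is standard.

```python
import os


def ds_type_for(name):
    """'C_' prefix means COUNTER; anything else is a GAUGE."""
    return "COUNTER" if name.startswith("C_") else "GAUGE"


def create_command(rrd_dir, machine, name, step=60):
    """Build an 'rrdtool create' argument list for one variable.

    Assumed layout: <rrd_dir>/<machine>/<name>.rrd, one DS per file.
    """
    path = os.path.join(rrd_dir, machine, name + ".rrd")
    heartbeat = step * 2  # assumption: tolerate one missed update
    return [
        "rrdtool", "create", path,
        "--step", str(step),
        # DS:<ds-name>:<type>:<heartbeat>:<min>:<max>, U = unknown/unbounded
        "DS:value:%s:%d:U:U" % (ds_type_for(name), heartbeat),
        # assumption: keep 1440 one-step averages (a day at 60s steps)
        "RRA:AVERAGE:0.5:1:1440",
    ]


if __name__ == "__main__":
    print(" ".join(create_command("/mnt/rrds", "web01", "C_rfork_calls")))
```

The collector would only need to run this when the rrd file doesn't exist yet, then fall through to a normal update.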
Sample update http request:
vmstat -s looks like this:
456846233 cpu context switches
3220655757 device interrupts
17964606 software interrupts
... etc ...

It's trivial to turn this into key-value pairs. If this were Cacti (or similar system) I would have to go through every line of vmstat -s and create a new data type/source/thing for each one, then create one per host. Screw that. Keep in mind my experience with Cacti is pretty small - I saw I had to register data sources and graphs and such manually and left it alone after that.
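To show just how trivial, here's a minimal sketch of the conversion. The function name `parse_vmstat` is made up, and blindly prefixing every key with "C_" is an assumption based on the file names in the rrd directory listing. The real feeder script may do it differently.

```python
import re
from urllib.parse import urlencode


def parse_vmstat(text, prefix="C_"):
    """Turn 'vmstat -s' style lines ("<number> <description>") into a dict.

    Keys are the description with non-word characters collapsed to
    underscores, plus a prefix; values stay as strings.
    """
    pairs = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if not match:
            continue  # skip blank or unexpected lines
        value, label = match.groups()
        key = prefix + re.sub(r"\W+", "_", label).strip("_")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    sample = "456846233 cpu context switches\n3220655757 device interrupts\n"
    pairs = parse_vmstat(sample)
    # the pairs can go straight into a query string for the collector
    print(urlencode(pairs))
```

Feeding the result to `urlencode` gives you the parameter string to send to the collector. Where you send it depends on where your collector lives.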
Anyway, back to the problem. Now how do I graph it? The interface isn't the best, but we use a cgi script again:
Show me all the machines with 'C_system_calls' graphed over the past 15 minutes:
This kind of system has the feature that you never need to explicitly define data input variables or data input sources - all you need is to hack together a script that can pump out key-value pairs. No documentation to read. No time consumed registering 500 new servers in your graph system.
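For the graphing side, something like the following would cover the "one variable, all machines, last 15 minutes" case. Again this is a sketch under assumptions, not the actual graph cgi: the function name, the color list, the DS name `value`, and the one-directory-per-machine layout are all mine. The `DEF` and `LINE1` arguments are standard `rrdtool graph` syntax.

```python
import os

# assumption: a small fixed palette, reused if there are more machines
COLORS = ["#FF0000", "#00AA00", "#0000FF", "#AA00AA", "#FF8800"]


def graph_command(rrd_paths, out_png, minutes=15):
    """Build an 'rrdtool graph' argument list with one line per rrd file.

    The machine name is taken from the directory holding each rrd.
    Paths or names containing ':' would need escaping, which this
    sketch does not do.
    """
    # a negative --start is seconds relative to now
    cmd = ["rrdtool", "graph", out_png, "--start", "-%d" % (minutes * 60)]
    for i, path in enumerate(rrd_paths):
        machine = os.path.basename(os.path.dirname(path))
        cmd.append("DEF:v%d=%s:value:AVERAGE" % (i, path))
        cmd.append("LINE1:v%d%s:%s" % (i, COLORS[i % len(COLORS)], machine))
    return cmd


if __name__ == "__main__":
    paths = ["/mnt/rrds/web01/C_system_calls.rrd",
             "/mnt/rrds/web02/C_system_calls.rrd"]
    print(" ".join(graph_command(paths, "out.png")))
```

Finding the list of paths is just a glob for the variable's file name across every machine directory, so adding a machine never means touching the graph side either.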