photo
Jordan Sissel
geek

Wed, 21 Nov 2007

Python, event tracking, and bdb.

In previous postings, I've put thoughts on monitoring and data aggregation. Last night, I started working on prototyping a python tool to record arbitrary data. Basically it aims to solve the problem of "I want to store data" in a simple way, rather than having to setup a mysql database, user access, tables, a web server, etc, all in the name of viewing graphs and data reports.

The requirements are pretty simple and generic:

  • Storage will be mapping key+timestamp => value
  • Timestamps should be 64bit values, so we can use millisecond values (52 bits to represent current unix epoch in milliseconds)
  • Access method is assumed to be random access by key, but reading multiple timestamp entries for a single key is expected to be sequential.
  • Key names must be arbitrary length
  • Storage must be space-efficient on key names
  • Values are arbitrary.
  • Minimal setup overhead (aka, you don't have to setup mysql)
The goal of this is to provide a simple way to store and retrieve timestamped key-value pairs. It's a small piece of the hope that there can be a monitoring tool set that is trivial to configure, easy to manipulate, easy to extend, and dynamic. I don't want to pre-declare data sources and data types (rrdtool, cacti), or make it difficult to add new collection sources, etc. I'm hoping that relying on the unix methodology (small tools that do one or two things well) that this can be achieved. The next steps in this adventure of a monitoring system are:
  • a graphing system
  • aggregation functions
  • templating system (web, email, etc?)
Space efficiency on key names is achieved with a secondary storage containing a list of key to keyid mappings. Key IDs are 64bit values. The first value is 1. We could use a hash function here, but I want a guarantee of zero collisions. However, this means that keys are specifically stored as key ids in insertion order, not lexigraphical order.

This may become a problem if we want to read keys sequentially. However, if we scan the secondary database (one mapping key => 64bit_keyid) we can get keys in lexigraphical order for free. So iterating over all keys starting with the string 'networking' is still possible, but it will result in random-access reads on the primary database. This may be undesirable, so I'll have to think about whether or not this use case is necessary. There are some simple solutions to this, but I'm not sure which one best solves the general case.

Arbitrary key length is a requirement because I found the limitations of RRDTool annoying, where data source names cannot be more than 19 characters - lame! We end up being more space efficient (8 bytes per name) for any length of data source name at the cost of doing a lookup finding the 64bit key id from the name.

I have some of the code written, and a demo that runs 'netstat -s' a once a second for 5 seconds and records total ip packets inbound. The key is 'ip.in.packets'

ip.in.packets[1195634167358035]: 1697541
ip.in.packets[1195634168364598]: 1698058
ip.in.packets[1195634169368830]: 1698566
ip.in.packets[1195634170372823]: 1699079
ip.in.packets[1195634171376553]: 1699590

Comments: 0 (view comments)
Tags: , , , ,
Permalink: /geekery/python-data-bdb-and-friends
posted at: 03:41


0 responses to 'Python, event tracking, and bdb.'


Leave a reply

You need javascript enabled to use this form. Anti-spam efforts ongoing. Also, if the comment doesn't show up, it's because the form expired. Go back and copy your comment, reload the form, and resubmit. Apologies if this is a hassle, I'm just playing with antispam methods right now. If this insists on not working, please email me about it.

Name (required)
E-mail (optional, if you want me to be able to email you back)
URL (also optional)
Comment:


Search this site

Navigation

Metadata

Home About Resume My Code (SVN)

Articles

ARP Security Dynamic DNS with DHCP OpenLDAP+Kerberos+SASL PPP over SSH SSH Security: /bin/false Week of Unix Tools Work Efficiency

Projects

fex firefox tabsearch firefox urledit grok keynav liboverride newpsm (FreeBSD) nis2ldap pam_captcha poor man's backup Solaris audio utility xboxproxy xdotool xmlpresenter xpathtool misc scripts

Presentations

Yahoo! Hack Day '06 Unix Essentials Vi/Vim Essentials

Tag Cloud

Calendar

< November 2007 >
SuMoTuWeThFrSa
     1 2 3
4 5 6 7 8 910
11121314151617
18192021222324
252627282930 

Friends

BarCamp Kent Brewster Tantek Çelik John Resig Wesley Shields Tyler Shields

Technorati