The requirements are pretty simple and generic:
- Storage will be mapping key+timestamp => value
- Timestamps should be 64-bit values, so we can use millisecond values (the current unix epoch in milliseconds needs about 41 bits; in microseconds, about 51)
- Access method is assumed to be random access by key, but reading multiple timestamp entries for a single key is expected to be sequential.
- Key names can be of arbitrary length
- Storage must be space-efficient on key names
- Values are arbitrary.
- Minimal setup overhead (i.e., you don't have to set up MySQL)
- a graphing system
- aggregation functions
- templating system (web, email, etc?)
This may become a problem if we want to read keys sequentially. However, if we scan the secondary database (the one mapping key => 64bit_keyid), we get keys in lexicographical order for free. So iterating over all keys starting with the string 'networking' is still possible, but it will result in random-access reads on the primary database. This may be undesirable, so I'll have to think about whether or not this use case is necessary. There are some simple solutions to this, but I'm not sure which one best solves the general case.
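To make the tradeoff concrete, here is a minimal Python sketch of a prefix scan over the secondary database. It is not the actual implementation: a plain dict stands in for the key => 64bit_keyid database, and the key names, the ids, and the keys_with_prefix helper are all made up for illustration.

```python
import bisect

def keys_with_prefix(sorted_names, prefix):
    """Yield names starting with prefix from a lexicographically sorted list."""
    # Binary search to the first name >= prefix, then walk forward
    # until the prefix no longer matches.
    start = bisect.bisect_left(sorted_names, prefix)
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        yield name

# Hypothetical secondary database: key name -> 64-bit key id.
key_ids = {
    "ip.in.packets": 3,
    "networking.eth0.rx": 7,
    "networking.eth0.tx": 1,
    "networking.eth1.rx": 9,
    "uptime": 2,
}

sorted_names = sorted(key_ids)
matches = list(keys_with_prefix(sorted_names, "networking"))

# The names come back in sorted order, but their ids do not.
ids_in_scan_order = [key_ids[name] for name in matches]
```

The names come back sorted, but the ids (7, 1, 9 in this made-up data) do not, which is why the follow-up reads on the primary database would hop around instead of running sequentially.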
Arbitrary key length is a requirement because I found the limitations of RRDTool annoying, where data source names cannot be more than 19 characters - lame! We end up being more space efficient (8 bytes per name) for any length of data source name, at the cost of a lookup to find the 64-bit key id from the name.
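A rough sketch of that lookup and of what a primary database key could look like, in Python. Everything here is an assumption for illustration: the KeyIndex class, the primary_key function, ids handed out sequentially from 1, and the big-endian packing are my guesses at one reasonable layout, not a description of the real code.

```python
import struct

class KeyIndex:
    """Hypothetical name -> 64-bit id table (stands in for the secondary database)."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def id_for(self, name):
        # Allocate a new id the first time a name is seen, then reuse it.
        if name not in self._ids:
            self._ids[name] = self._next_id
            self._next_id += 1
        return self._ids[name]

def primary_key(key_id, timestamp):
    """Pack a 64-bit key id and a 64-bit timestamp into 16 bytes."""
    # Big-endian unsigned values compare bytewise in the same order as the
    # numbers themselves, so all entries for one key id are adjacent and
    # ordered by timestamp in any store that sorts keys bytewise.
    return struct.pack(">QQ", key_id, timestamp)

index = KeyIndex()
long_name = "networking.interfaces.eth0.packets.inbound.total"
key = primary_key(index.id_for(long_name), 1195634167358035)
```

However long the name is, each stored entry carries only the 8-byte id plus the 8-byte timestamp, so the name itself is stored once.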
I have some of the code written, and a demo that runs 'netstat -s' once a second for 5 seconds and records total ip packets inbound. The key is 'ip.in.packets':
    ip.in.packets[1195634167358035]: 1697541
    ip.in.packets[1195634168364598]: 1698058
    ip.in.packets[1195634169368830]: 1698566
    ip.in.packets[1195634170372823]: 1699079
    ip.in.packets[1195634171376553]: 1699590
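As a quick sanity check on that output, here is a small Python sketch that parses entries in the key[timestamp]: value form and turns consecutive samples into packets per second. The parse_entry and rates_per_second helpers are hypothetical, and the sketch assumes the timestamps are microseconds, which fits the roughly 1,000,000-tick gap between the once-a-second samples.

```python
import re

# Matches lines of the form: key[timestamp]: value
ENTRY = re.compile(r"^(?P<key>[^\[\]]+)\[(?P<ts>\d+)\]:\s*(?P<value>\d+)$")

def parse_entry(line):
    """Split one output line into (key, timestamp, value)."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    return m.group("key"), int(m.group("ts")), int(m.group("value"))

def rates_per_second(entries, ticks_per_second=1_000_000):
    """Counter increase per second between consecutive samples.

    ticks_per_second=1_000_000 assumes microsecond timestamps.
    """
    rates = []
    for (_, t0, v0), (_, t1, v1) in zip(entries, entries[1:]):
        rates.append((v1 - v0) * ticks_per_second / (t1 - t0))
    return rates

demo_output = """\
ip.in.packets[1195634167358035]: 1697541
ip.in.packets[1195634168364598]: 1698058
ip.in.packets[1195634169368830]: 1698566
ip.in.packets[1195634170372823]: 1699079
ip.in.packets[1195634171376553]: 1699590
"""

entries = [parse_entry(line) for line in demo_output.splitlines()]
rates = rates_per_second(entries)
```

Five samples give four intervals, each a little over one second long, so this is also an easy way to see how much the sampling loop drifts.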