
Mysql slave server-id selection

For a current project, I need the ability to dynamically grow and shrink a pool of mysql slaves. In order for replication to work properly, every slave must have a unique server id. When you want to grow another slave, how do you choose the server id?

Two slaves with the same server id will replicate successfully, but when they reach the end of the master's binary log, something freaks out and forces them to disconnect. This causes both slaves to reconnect, sync (no data needed), and have the connection die off quickly again. The result of this is rapid connection/disconnection by both slaves driving the load to 1+ on both slaves, and to around .3 on the master even in a completely idle system. This is bad. Therefore, server id collisions are bad.

A simple approach might be to pick a random number. However, depending on your range, collisions may still occur. If there's even a slight chance of collision, you have to detect that collision and try a new number. Collision detection is expensive and can be done in one of a few ways:

  • query all slaves with "SHOW GLOBAL VARIABLES LIKE 'server_id'" and compare each result against the chosen id. This costs O(n) queries for every new slave and doesn't scale.
  • set the server id to whatever you picked at random, and run a heuristic tool that detects the behavior that happens when two server ids collide. This is obviously a horrible idea.
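To put a number on "depending on your range": if n slaves each draw an id uniformly at random from N possible values, the chance that at least two match is the birthday-problem probability, 1 minus the product of (N - i)/N for i = 0 .. n-1. The sketch below just evaluates that product with awk; the function name collision_prob is made up for illustration.

```shell
#!/bin/sh
# collision_prob N n  (hypothetical helper)
# Probability that n slaves, each picking a server id uniformly at random
# from N possible values, end up with at least one duplicate.
# Birthday problem: 1 - prod_{i=0}^{n-1} (N - i) / N
collision_prob() {
  awk -v N="$1" -v n="$2" 'BEGIN {
    p = 1                                   # P(all ids so far are distinct)
    for (i = 0; i < n; i++) p *= (N - i) / N
    printf "%.3g\n", 1 - p
  }'
}

collision_prob 1000 30          # ids drawn from 1..1000, 30 slaves
collision_prob 4294967295 100   # full 32-bit server-id range, 100 slaves
```

With a range of 1,000 and 30 slaves the result is roughly 0.36, better than a one-in-three chance of a clash. Over the full 32-bit range with 100 slaves it drops to about 1.15e-06, roughly one in 870,000: tiny, but not zero, so some form of collision detection would still be needed.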
Random choice doesn't seem to be very good. Scanning all slaves and picking an id that isn't in the set of known ids is also bad, as mentioned above. So what now?

We need a number that will never repeat. You might think about keeping a small table on the master with an auto_increment column and always fetching a new id from it, but why? Time is always increasing. As a bonus, mysql's server-id is an unsigned 32-bit value, so Unix epoch timestamps will fit until early 2106, when the seconds count passes 2^32 - 1 (4294967295).

A trivial script can generate your my.cnf with the current time as the server id whenever you bring up a new slave, and you're pretty much guaranteed never to have a collision unless you bring up two slaves within the same second (how likely is that?).

Simple mysql config:

# my.cnf.in
[mysqld]
server-id=SERVERID
Simple script to generate a config with a proper server id:
#!/bin/sh

m4 -DSERVERID=`date +%s` my.cnf.in > /etc/my.cnf
Make this part of your "add a new mysql slave" setup and you'll have a scalable server-id selection system.
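If m4 isn't handy, the same step can be done in plain sh. This is a sketch, not a drop-in tool: the function name make_server_id_cnf is hypothetical, it writes the [mysqld] header itself, and it refuses values MySQL can't use as a server-id (the valid range is 1 to 4294967295; an id of 0 makes a slave refuse to replicate).

```shell
#!/bin/sh
# make_server_id_cnf [id]  (hypothetical helper)
# Print a my.cnf fragment whose server-id is the current Unix time, or the
# id given as $1. Rejects anything outside MySQL's 1..4294967295 range.
make_server_id_cnf() {
  id=${1:-$(date +%s)}                 # default to "now" in epoch seconds
  case $id in
    ''|*[!0-9]*) echo "server-id must be a positive integer: '$id'" >&2; return 1 ;;
  esac
  if [ "$id" -lt 1 ] || [ "$id" -gt 4294967295 ]; then
    echo "server-id out of range: $id" >&2
    return 1
  fi
  printf '[mysqld]\nserver-id=%s\n' "$id"
}

make_server_id_cnf
```

Redirect its output wherever the m4 version wrote, e.g. make_server_id_cnf > /etc/my.cnf. With date +%s as the source, the range check mostly matters for ids passed in by hand.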

Alternatively, since mysql server-id values are, again, 32 bit, you can simply use the IP address of the machine itself. Something like this:

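#!/usr/bin/perl
# Turn an IP into an integer for use with mysql server IDs (or whatever)
use strict;
use warnings;

my $ip = shift or die "usage: $0 a.b.c.d\n";    # first argument is $ARGV[0]
my ($x, $exp) = (0, 3);
$x += $_ * (2 ** (8 * $exp--)) for split /\./, $ip;   # a*2^24 + b*2^16 + c*2^8 + d
print "$x\n";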
I named it ip2int.pl. You can get the same result with the Socket module's inet_aton and unpack: unpack("N", inet_aton($ip)).
./ip2int.pl 129.21.60.5
2165652485
Since IPs are, in theory, unique, you can use the IP of the mysql server as its own server ID.
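If you'd rather not depend on Perl in a provisioning script, the same arithmetic fits in POSIX sh. A minimal sketch (the function name ip2int is reused from the Perl script for convenience; the input checks are an addition of this sketch): it rejects anything that isn't four decimal octets in 0..255, including octets with leading zeros, because shell arithmetic would read those as octal.

```shell
#!/bin/sh
# ip2int: convert a dotted-quad IPv4 address to a 32-bit integer,
# a*2^24 + b*2^16 + c*2^8 + d, the same math as ip2int.pl.
# The body runs in a subshell ( ... ) so the IFS change cannot leak.
ip2int() (
  case $1 in
    ''|*[!0-9.]*|*..*|.*|*.) echo "not a dotted quad: '$1'" >&2; exit 1 ;;
  esac
  IFS=.
  set -- $1                              # split on '.' into $1 .. $4
  [ $# -eq 4 ] || { echo "need 4 octets, got $#" >&2; exit 1; }
  for octet in "$@"; do
    case $octet in
      0[0-9]*) echo "leading zero in '$octet'" >&2; exit 1 ;;  # would be octal
    esac
    [ "$octet" -le 255 ] || { echo "octet out of range: $octet" >&2; exit 1; }
  done
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

ip2int 129.21.60.5
```

For 129.21.60.5 this prints 2165652485, matching the Perl version.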

Mysql prepare'd queries aren't cached, ever.

There once was a database named MySQL.

It had a query cache, because caching helps performance.

It also had queries you could "prepare" on the server side, with the hope that your database server can make some smart decisions about what to do with a query you're going to execute N times during this session.

I told mysql to enable its query cache and use a magic value of 1GB for memory storage. Much to my surprise, I see the following statistics after testing an application:

mysql> show status like 'Qcache_%';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| Qcache_free_blocks      | 1          | 
| Qcache_free_memory      | 1073732648 | 
| Qcache_hits             | 0          | 
| Qcache_inserts          | 0          | 
| Qcache_lowmem_prunes    | 0          | 
| Qcache_not_cached       | 814702     | 
| Qcache_queries_in_cache | 0          | 
| Qcache_total_blocks     | 1          | 
+-------------------------+------------+
8 rows in set (0.00 sec)
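To watch these counters while re-running a test, one option is to ask the mysql client for batch output (-N skips column names, -B prints tab-separated rows) and let awk boil them down to a single line. This is a sketch: qcache_summary is a made-up name, and treating hits + inserts + not_cached as "all queries the cache looked at" is a rough approximation, not an official metric.

```shell
#!/bin/sh
# qcache_summary: condense Qcache_* status counters into one line.
# Expects "Name Value" rows on stdin, e.g. from:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Qcache_%'" | qcache_summary
qcache_summary() {
  awk '
    $1 == "Qcache_hits"       { hits = $2 }
    $1 == "Qcache_inserts"    { inserts = $2 }
    $1 == "Qcache_not_cached" { not_cached = $2 }
    END {
      total = hits + inserts + not_cached
      if (total == 0) { print "no queries counted yet"; exit }
      printf "hits=%d inserts=%d not_cached=%d (%.1f%% not cached)\n",
             hits, inserts, not_cached, 100 * not_cached / total
    }'
}

# The counters from the run above:
printf 'Qcache_hits\t0\nQcache_inserts\t0\nQcache_not_cached\t814702\n' | qcache_summary
```

Fed the numbers above, it prints hits=0 inserts=0 not_cached=814702 (100.0% not cached).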
Why are so many (all!?) of the queries not cached? Surely I must be doing something wrong. The documentation on caching explains what I can only understand as a complete lapse of judgement on the part of the MySQL developers:
from http://dev.mysql.com/doc/refman/5.0/en/query-cache.html
Note: The query cache is not used for server-side prepared statements. If you're using server-side prepared statements consider that these statement won't be satisfied by the query cache. See Section 22.2.4, C API Prepared Statements.
Any database performance guide anywhere will tell you to use prepared statements. They're useful from both a security and performance perspective.

Security, because you feed the prepared query data and it knows what data types to expect, erroring when you pass something invalid. It also handles strings properly, so you worry less about SQL injection. You also get convenience, in that you don't have to escape your data.

Performance, because telling the database what you are about to do lets it optimize the query.

This performance is defeated, however, if you want to use caching. So, I've got a dilemma! There are two mutually exclusive (because MySQL sucks) performance-enhancing options available to me: using prepared statements or using caching.

Prepared statements give you two performance benefits (maybe more?). The first is that the server parses the query string when you prepare it, and executes the "parsed" version whenever you invoke it. This saves parsing time; parsing text is expensive. The second is that, if your database is nice, it will try to optimize your queries before execution. Using prepared statements lets the server optimize query execution once and then remember it. Good, right?

Prepared statements improve CPU utilization, in that the CPU can do less work because you're telling the database what's coming next. Cached query responses improve disk utilization and, depending on implementation, should vastly outperform most (all?) of the gains from prepared statements. This rests on the assumption that disk is slow and CPU is fast.

Cached queries will (should?) cache the results of complex queries. This means that a select query with multiple complex joins should be cached by mapping the query string to the result. No amount of statement preparation will speed up complex queries, because they still have to hit disk. Large joins require lots of disk access and are therefore slow. Remembering that "this complex query" returned "this happy result" is fast regardless of whether it's stored on disk or in memory. Caching also saves CPU.

I can't believe preparing a query will prevent it from being pulled from the query cache, but this is clearly the case. Thanks, MySQL, for making a stupid design decision.

Maybe there's some useful JDBC (oh yeah, the app I'm testing is written in Java) function that'll give you all the convenience/security benefits of prepare, but without the server-side bits, and thus let you use the query cache.