MySQL 5.0 'read-only' permits uncommitted writes

I recently had to do a master failover in MySQL, bringing up a new master to replace an older one.

The switchover went awry shortly after we told the old master to start slaving off of the new master. The output of 'show slave status' indicated that replication had halted on a key constraint violation: a duplicate insert attempt on an auto_increment primary key. How did this happen? I'm not sure yet; still digging.

This puzzle made me wonder how we got into that state, given that I had put the old master in 'read only' mode before doing the switch. It turns out there are some edge cases permitted even in read-only mode. The docs have this to say:

When it is enabled, the server allows no updates except from users that have the SUPER privilege or (on a slave server) from updates performed by slave threads.

The above exceptions sound pretty reasonable, but I found an undocumented exception: uncommitted transactions can be committed even in read-only mode.

This breaks my expectation that, as soon as read_only goes true, all writes will fail. It means your backups aren't consistent when relying on read_only alone; you'd need to lock all your tables during the backup as well. Additionally, commits made after read_only is set still advance the binlog position, so if your backup procedure is "set read_only, copy master status, mysqldump", your mysqldump may contain data from after the master status you recorded.

It looks like this is fixed in MySQL >= 5.1.15. The docs say that in later versions, setting read_only blocks while there are pending transactions.

Related to the original problem (replication failover), I think we should have locked all the tables and then restarted the old master in read_only mode, rather than simply setting read_only.
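In retrospect, something like this (a sketch; FLUSH TABLES WITH READ LOCK is the usual way to quiesce writes before recording a binlog position):

mysql> FLUSH TABLES WITH READ LOCK;
mysql> SET GLOBAL read_only = 1;
mysql> SHOW MASTER STATUS;
# Record File/Position, repoint the slaves, then:
mysql> UNLOCK TABLES;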

Below is an example of 'read_only' being set and a transaction commit as a non-superuser resulting in a data write. The two sessions are shown interleaved, in the order the statements were executed.

Note that the master status binary log position changed after the commit, as expected after a normal database write. You can also see the table was actually updated. I wasn't expecting any writes to succeed while read_only is set:

# Session 1, as the unprivileged user 'test':
mysql> SET AUTOCOMMIT=0;
mysql> CREATE TABLE foo.foo (a int) ENGINE INNODB;
mysql> START TRANSACTION;
mysql> SELECT * FROM foo.foo;
Empty set (0.00 sec)

# So far so good, let's insert.
mysql> INSERT INTO foo.foo (a) VALUES (2);
mysql> SELECT * FROM foo.foo;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

# Session 2, as root. This is an empty result, since session 1
# hasn't yet committed.
root> SELECT * FROM foo.foo;
Empty set (0.00 sec)

root> SET GLOBAL READ_ONLY = TRUE;
root> SHOW MASTER STATUS \G
*************************** 1. row ***************************
            File: mysql-bin.000001
        Position: 644
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

# Back in session 1, as 'test':
mysql> INSERT INTO foo.foo (a) VALUES (12345);
ERROR 1290 (HY000): The MySQL server is running with the --read-only option so it cannot execute this statement

# We expected the above error, but can we commit our previous
# insert from before read_only was set?
mysql> COMMIT;
Query OK, 0 rows affected (0.01 sec)

# Session 2, as root: the binlog position has advanced, and the
# committed row is now visible.
root> SHOW MASTER STATUS \G
*************************** 1. row ***************************
            File: mysql-bin.000001
        Position: 834
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

root> SELECT * FROM foo.foo;
+------+
| a    |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

Mysql slave server-id selection

For a current project, I need the ability to dynamically grow and shrink a pool of mysql slaves. In order for replication to work properly, every slave must have a unique server id. When you want to grow another slave, how do you choose the server id?

Two slaves with the same server id will replicate successfully, but when they reach the end of the master's binary log, something freaks out and forces them both to disconnect. Both slaves then reconnect, sync (no data needed), and have the connection quickly die off again. The result is rapid connecting and disconnecting by both slaves, driving the load to 1+ on both slaves and to around 0.3 on the master, even on a completely idle system. This is bad. Therefore, server id collisions are bad.

A simple approach might be to pick a random number. However, depending on your range, collisions may still occur. If there's even a slight chance of collision, you have to detect it and try a new number. Collision detection is expensive and can be done in one of a few ways:

  • Query all slaves with "SHOW GLOBAL VARIABLES LIKE 'server_id'" and compare each result against the chosen id (a sketch of this follows the list). This has O(n) runtime, and doesn't scale.
  • Set the server id to whatever you picked at random, and have a heuristic tool that detects the thrashing behavior that happens when two server ids collide. This is obviously a horrible idea.
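The scan from the first option might look like this (a sketch; the slave hostnames are hypothetical and credentials are assumed to come from ~/.my.cnf):

#!/bin/sh
# Ask every known slave for its server_id and compare against a candidate.
# O(n) queries; this is the approach that doesn't scale.
candidate=12345
for host in slave1 slave2 slave3; do
  id=`mysql -h $host -N -B -e "SHOW GLOBAL VARIABLES LIKE 'server_id'" | cut -f2`
  if [ "$id" = "$candidate" ]; then
    echo "collision with $host"
    exit 1
  fi
done
echo "server-id $candidate is unused"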
Random choice doesn't seem to be very good. Scanning all slaves and picking an id that isn't in the set of known ids is also bad, as mentioned above. So what now?

We need a number that will never repeat. You might think about using a small table on the master with an auto_increment column and getting a new id from it each time, but why bother? Time is always increasing. As a bonus, mysql's server-id is an unsigned 32-bit value, so unix epoch values will fit until the year 2106.

A trivial script can generate your my.cnf with the current time as the server id whenever you bring up a new slave, and you're pretty much guaranteed never to have a collision unless you bring up two slaves in the same second (how likely is that?).

Simple mysql config:

# my.cnf.in
server-id=SERVERID

Simple script to generate a config with a proper server-id:

#!/bin/sh
# Substitute the current unix time for SERVERID in the template.
m4 -DSERVERID=`date +%s` my.cnf.in > /etc/my.cnf

Make this part of your "add a new mysql slave" setup and you'll have a scalable server-id selection system.

Alternatively, since mysql server-id values are, again, 32-bit, you can simply use the IP address of the machine itself. Something like this:

#!/usr/bin/perl
# Turn an IP into an integer for use with mysql server IDs (or whatever).
# Each octet contributes octet * 256^position, most significant octet first.

$exp = 3;
map { $x += $_ * (2 ** (8 * $exp--)) } split(/\./, $ARGV[0]);
print "$x\n";
I named it ip2int.pl:

./ip2int.pl 129.21.60.5
2165652485
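You can use Socket's inet_aton and unpack to achieve the same result: inet_aton packs the address into four network-order bytes, and unpack("N") reads them back as an unsigned 32-bit integer. A one-liner sketch:

perl -MSocket -le 'print unpack("N", inet_aton($ARGV[0]))' 129.21.60.5
2165652485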
Since IPs are in theory unique, you can use the IP of the mysql server as its own server id.

Mysql prepare'd queries aren't cached, ever.

There once was a database named MySQL.

It had a query cache, because caching helps performance.

It also had queries you could "prepare" on the server side, in the hope that your database server could make some smart decisions about what to do with a query you're going to execute N times during a session.

I told mysql to enable its query cache with a magic value of 1GB of memory for storage. In my.cnf, that looks something like this (the stock MySQL 5.0 query cache settings):
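# my.cnf: turn the query cache on and give it 1GB.
query_cache_type = 1
query_cache_size = 1073741824

Much to my surprise, I saw the following statistics after testing an application: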

mysql> show status like 'Qcache_%';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| Qcache_free_blocks      | 1          | 
| Qcache_free_memory      | 1073732648 | 
| Qcache_hits             | 0          | 
| Qcache_inserts          | 0          | 
| Qcache_lowmem_prunes    | 0          | 
| Qcache_not_cached       | 814702     | 
| Qcache_queries_in_cache | 0          | 
| Qcache_total_blocks     | 1          | 
+-------------------------+------------+
8 rows in set (0.00 sec)
Why are so many (all!?) of the queries not cached? Surely I must be doing something wrong. Reading the docs on caching explained what I can only understand as a complete lapse of judgement on the part of the MySQL developers:
From http://dev.mysql.com/doc/refman/5.0/en/query-cache.html:

Note: The query cache is not used for server-side prepared statements. If you're using server-side prepared statements consider that these statements won't be satisfied by the query cache. See Section 22.2.4, C API Prepared Statements.
Any database performance guide anywhere will tell you to use prepared statements. They're useful from both a security and performance perspective.

Security, because you feed the prepared query your data and it knows what data types to expect, erroring when you pass something invalid. It also handles strings properly, so you worry less about sql injection. You get convenience too, in that you don't have to escape your data.

Performance, because telling the database what you are about to do lets it optimize the query.
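To be clear, "server-side prepared statements" are the thing at issue. You can exercise them from the mysql client with the SQL-level PREPARE syntax; a quick sketch (the table here is illustrative):

mysql> PREPARE stmt FROM 'SELECT * FROM foo.foo WHERE a = ?';
mysql> SET @val = 2;
mysql> EXECUTE stmt USING @val;
mysql> DEALLOCATE PREPARE stmt;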

This performance is defeated, however, if you want to use caching. So I've got a dilemma! There are two mutually-exclusive (because MySQL sucks) performance-enhancing options available to me: prepared statements or query caching.

Prepared statements give you two performance benefits (maybe more?). The first is that the server parses the query string when you prepare it and executes the parsed version whenever you invoke it; this saves parsing time, and parsing text is expensive. The second is that, if your database is nice, it will try to optimize your queries before execution. Preparing a statement lets the server optimize the query once and remember the result. Good, right?

Prepared statements improve CPU utilization: the cpu can work less because you're teaching the database about what's coming. Cached query responses improve disk utilization and, depending on implementation, should vastly outperform most (all?) of the gains from prepared statements. This rests on the assumption that disk is slow and cpu is fast.

Cached queries will (should?) cache the results of complex queries. This means a select with multiple, complex joins should be cached, mapping the query string to the result. No amount of statement preparation will improve complex queries, because they still have to hit disk. Large joins require lots of disk access and are therefore slow. Remembering that "this complex query" returned "this happy result" is fast regardless of whether the cached result is stored on disk or in memory. Caching also saves cpu.

I can't believe preparing a query will prevent it from being pulled from the query cache, but this is clearly the case. Thanks, MySQL, for making a stupid design decision.

Maybe there's some useful JDBC (oh yeah, the app I'm testing is written in Java) option that gives you all the convenience/security benefits of prepare without the server-side bits, and thus lets you use the query cache.
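If the app uses Connector/J, one candidate is the driver's useServerPrepStmts connection property, which controls whether prepares happen server-side or are emulated client-side in the driver; setting it false should keep PreparedStatement's escaping and convenience while sending plain text queries the cache can serve. Something like this in the JDBC URL (host and database names are placeholders):

jdbc:mysql://dbhost/mydb?useServerPrepStmts=false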