Drizzle multi-master testing!

So, it has been a while since I’ve blogged.  As some of you may have read, I have a new job and Stewart and I have been busy planning all kinds of testing goodness for Percona >: ) (I’ve also been recovering from trying to keep up with Stewart!)

Rest assured, gentle readers, that I have not forgotten everyone’s favorite modular, community-driven database ; )  Not by a long-shot.  I have some major improvements to dbqp getting ready for a merge (think randgen in-tree / additional testing modes / multiple basedirs of multiple types).  Additionally, I’ve been cooking up some code to test the mighty Mr. Shrews’ multi-master code (mwa ha ha!)

What I’ve done is allow for a new option to be used with a test’s .cnf file (this is a dbqp thing, won’t work with standard drizzle-test-run).  If the runner sees this request, it will generate a multi-master config file from the specified servers’ individual slave.cnf files. 

Here is a sample config:

[test_servers]
servers = [[--innodb.replication-log],[--innodb.replication-log],[--plugin-add=slave --slave.config-file=$MASTER_SERVER_SLAVE_CONFIG]]

[s2]
# we tell the system that we want
# to generate a multi-master cnf file
# for the 3rd server to use, that
# has the first two servers as masters
# the final file is written to the first
# server's general slave.cnf file
gen_multi_master_cnf= 0,1

A good rundown of the file’s contents can be found on Shrews’ blog here, but the end result looks like this:

ignore-errors

[master1]
master-host=127.0.0.1
master-port=9306
master-user=root
master-pass=''

[master2]
master-host=127.0.0.1
master-port=9312
master-user=root
master-pass=''

I tried cooking up a basic test case where we spin up 3 servers – 2 masters and one slave.  One master 1, we create table t1:


CREATE TABLE t1 (a int not null auto_increment, primary key(a));

On master 2, table t2:


CREATE TABLE t2 (a int not null auto_increment, primary key(a));

We insert some records into both tables, then check that our slave has everything! Sounds simple, right?

Sigh. If only. It seems that we are running into some issues when we try to record the test – you can read the bug here

We see some interesting output in the slave’s logs before it crashes:

$ cat workdir/bot0/s2/var/log/s2.err
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: 127 rollback segment(s) active.
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
(SQLSTATE 00000) Duplicate entry '772-1' for key 'PRIMARY'
Failure while executing:
INSERT INTO `sys_replication`.`queue` (`master_id`, `trx_id`, `seg_id`, `commit_order`, `originating_server_uuid`, `originating_commit_id`, `msg`) VALUES (2, 772, 1, 1, 'ac9c8ac0-8f10-474b-9bbd-b61d2cdb2b93' , 1, 'transaction_context {
server_id: 1
transaction_id: 772
start_timestamp: 1317760732106016
end_timestamp: 1317760732106017
}
event {
type: STARTUP
}
segment_id: 1
end_segment: true
')

Replication slave: Unable to insert into queue.
Replication slave: drizzle_state_read:lost connection to server (EOF)
Lost connection to master. Reconnecting.
Replication slave: drizzle_state_connect:could not connect
111004 16:39:05 InnoDB: Starting shutdown...

Additionally, you can just try the setup with –start-and-exit:

$ ./dbqp --suite=slave --start-and-exit multi_master_basic
20111004-170033 INFO Using Drizzle source tree:

20111004-170033 INFO Taking clean db snapshot...
20111004-170033 INFO Taking clean db snapshot...
20111004-170033 INFO Taking clean db snapshot...
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s0
20111004-170035 INFO MASTER_PORT: 9306
20111004-170035 INFO DRIZZLE_TCP_PORT: 9307
20111004-170035 INFO MC_PORT: 9308
20111004-170035 INFO PBMS_PORT: 9309
20111004-170035 INFO RABBITMQ_NODE_PORT: 9310
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s0/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s1
20111004-170035 INFO MASTER_PORT: 9312
20111004-170035 INFO DRIZZLE_TCP_PORT: 9313
20111004-170035 INFO MC_PORT: 9314
20111004-170035 INFO PBMS_PORT: 9315
20111004-170035 INFO RABBITMQ_NODE_PORT: 9316
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s1/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s2
20111004-170035 INFO MASTER_PORT: 9318
20111004-170035 INFO DRIZZLE_TCP_PORT: 9319
20111004-170035 INFO MC_PORT: 9320
20111004-170035 INFO PBMS_PORT: 9321
20111004-170035 INFO RABBITMQ_NODE_PORT: 9322
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s2/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO User specified --start-and-exit. dbqp.py exiting and leaving servers running...
pcrews@mister:/drizzle_mm_test/tests$ ps -al
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 18652 1 2 80 0 - 112094 poll_s pts/2 00:00:00 lt-drizzled
0 S 1000 18688 1 3 80 0 - 112096 poll_s pts/2 00:00:00 lt-drizzled
0 S 1000 18721 1 3 80 0 - 156326 poll_s pts/2 00:00:00 lt-drizzled
0 R 1000 18780 15985 0 80 0 - 3375 - pts/2 00:00:00 ps
0 S 1000 32463 30047 0 80 0 - 11272 poll_s pts/1 00:00:01 ssh

From here, we can connect to the slave and check out sys_replication.applier_state:

$ drizzle -uroot -p9318 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the Drizzle client.. Commands end with ; or \g.
Your Drizzle connection id is 216
Connection protocol: mysql
Server version: 2011.09.26.2427 Source distribution (drizzle_mm_test)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> use sys_replication;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Schema changed
drizzle> show tables;
+---------------------------+
| Tables_in_sys_replication |
+---------------------------+
| applier_state |
| io_state |
| queue |
+---------------------------+
3 rows in set (0.001641 sec)

drizzle> select * from applier_state;
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
| master_id | last_applied_commit_id | originating_server_uuid | originating_commit_id | status | error_msg |
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
| 1 | 0 | f716781f-8c00-4b81-82c6-62039136d616 | 0 | RUNNING | |
| 2 | 3 | df7f2f6e-dba4-43ea-a674-fa4a3709865b | 3 | RUNNING | |
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
2 rows in set (0.000928 sec)

drizzle> select * from io_state;
+-----------+---------+-----------+
| master_id | status | error_msg |
+-----------+---------+-----------+
| 1 | STOPPED | |
| 2 | RUNNING | |
+-----------+---------+-----------+
2 rows in set (0.000839 sec)

drizzle>

So, it looks like the slave knows about both masters, but for some reason, the applier from master 1 is stopped : (
At any rate, there is a bug open on this and it could be something in my config(?) It’s been a while since I’ve played with replication and I know there has been some tinkering under the hood since then : )

The branch with the test code can be found here:
lp:~patrick-crews/drizzle/dbqp_multi_master_test

At the very least, we can now create tests that use this feature, which will help ensure that it stays on the path of solid code in the future! How about anyone out there? Has anyone been using multi-master? If so, can you share any setups / tests? Extra information would be most appreciated : )