Salting the cloud for fun and profit

It has been quite some time since I posted.  This is largely because I have been busy digging into the wonderful world of OpenStack over at HP Cloud Services.

One of the things I have been working on is the new Load Balancing as a Service (LBaaS) offering with former Drizzle teammates Andrew Hutchings and David Shrewsbury.  You can find out more details and whatnot for LBaaS here, but this post isn’t so much about the details of load balancing as it is about the neat CI / CD tricks we are accomplishing via Jenkins and Saltstack.

What we needed was a way to spin up, configure, and manage test slaves without necessarily wanting to manage them forever and always.  I’m sure many Jenkins users out there have dealt with borked, messy, misconfigured slave machines that cost a lot of time and effort to get sorted.  So how to solve this? Hey, I know!  We have a cloud, so let’s use it.  Let’s create on-demand machines that we can configure as needed and then blow away.  Sounds good, right? Well, like many things in life, it is easier said than done.

One of the first things we tried was the jclouds plugin for Jenkins.  While it is capable of some interesting tricks, we never really felt it integrated nicely into Jenkins (or maybe I didn’t properly grok its paradigm).  One can create Jenkins jobs to create VMs, but those new VMs kind of just float there.  You have to pre-configure the VM (creating a base image) at least a little bit (to allow Jenkins or anyone else to get into it) and perform a variety of other tricks to really ‘own’ the machines.  My experiments with this were rather messy and frustrating.

Enter Saltstack + salt-cloud.  Much like Puppet or Chef (or any number of other tools), salt aims to provide a means of creating, configuring, and controlling other machines.

What I like about it is:

  • it is written in Python
  • it is rather easy to grasp (ymmv)
  • it provides a one-stop shop for creating, configuring, and controlling machines
  • it is written in Python

What salt-cloud does is provide cloud control / integration for salt.  One can define config files with cloud credentials, profile files that define VMs, and map files that define sets of VMs based on those profiles.  From there, one can use the tool to create individual VMs from the command line, or a number of VMs from a map file.  The VMs that are created are auto-registered with the salt-master, and once they are up and running, they are full and proper salt-minions that you can use to do your bidding immediately >:)
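For the curious, the moving pieces are plain config files.  Here is a hypothetical sketch of a profile and a map file (the names and values are invented for illustration, not our actual setup):

```yaml
# /etc/salt/cloud.profiles -- a profile describing one kind of VM
hp_az3_large:
  provider: openstack
  size: standard.large
  image: Ubuntu Precise 12.04 LTS Server 64-bit
  ssh_key_name: lbaas-saltmaster

# /etc/salt/cloud.map -- a map file describing a set of VMs built from profiles
hp_az3_large:
  - lbaas-client-install-tester1
  - lbaas-client-install-tester2
```

From there, `salt-cloud -p <profile> <name>` builds a single VM, while `salt-cloud -m <mapfile>` builds the whole set.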

We use the tool for a variety of purposes, but one of the more interesting applications is testing.  Below is sample output from one of our Jenkins jobs.  What we do is create a VM via a Jenkins slave that runs our salt-master, then configure it with python-libraclient and the LBaaS test suite.  From there, we use salt to call the test suite and report on the results.  Once the test is done, we can blow away the VM and repeat the process as needed.  We no longer have to manage a lot of Jenkins slaves, just a few key ones, and we have a practically unlimited array of virtual machines for whatever purposes we may have.

Here, at the start of our test, we simply create a new base VM.  The -p argument specifies the profile to use.  One may have a variety of profiles using different OSes, sizes, etc.


+ sudo salt-cloud -p hp_az3_large lbaas-client-install-tester1
[INFO ] Loaded configuration file: /etc/salt/cloud
[INFO ] salt-cloud starting
[WARNING ] 'AWS.id' not found in options. Not loading module.
[WARNING ] 'EC2.id' not found in options. Not loading module.
[INFO ] Creating Cloud VM lbaas-client-install-tester1
[WARNING ] Private IPs returned, but not public... checking for misidentified IPs
[WARNING ] 10.2.154.133 is a private ip
[WARNING ] 15.185.228.34 is a public ip
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added '15.185.228.34' (ECDSA) to the list of known hosts.
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added '15.185.228.34' (ECDSA) to the list of known hosts.
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added '15.185.228.34' (ECDSA) to the list of known hosts.
* INFO: /tmp/deploy.sh -- Version 1.5.1
* WARN: Running the unstable version of bootstrap-salt.sh
* INFO: System Information:
* INFO: CPU: GenuineIntel
* INFO: CPU Arch: x86_64
* INFO: OS Name: Linux
* INFO: OS Version: 3.2.0-32-virtual
* INFO: Distribution: Ubuntu 12.04
* INFO: Installing minion
* INFO: Found function install_ubuntu_deps
* INFO: Found function config_salt
* INFO: Found function install_ubuntu_stable
* INFO: Found function install_ubuntu_restart_daemons
* INFO: Running install_ubuntu_deps()

[INFO ] Salt installed on lbaas-client-install-tester1
[INFO ] Created Cloud VM lbaas-client-install-tester1 with the following values:
[INFO ] private_ips: [u'10.2.154.133']
[INFO ] extra: {'updated': u'2013-04-11T16:02:30Z', 'hostId': u'', 'created': u'2013-04-11T16:02:29Z', 'key_name': u'lbaas-saltmaster', 'uri': u'https://az-3.********.compute.hpcloudsvc.com/v1.1/42064206420642/servers/891293', 'imageId': u'48335', 'metadata': {}, 'password': u'thismachinewontlivelongenoughforyoutouseit', 'flavorId': u'103', 'tenantId': u'42064206420642'}
[INFO ] image: None
[INFO ] _uuid: None
[INFO ] driver:
[INFO ] state: 4
[INFO ] public_ips: [u'15.185.228.34']
[INFO ] size: None
[INFO ] id: 891293
[INFO ] name: lbaas-client-install-tester1

Once the VM has been created and salt has been installed, we can call state.highstate to configure the machine, installing the LBaaS client and test suite:


+ sudo salt lbaas-client-install-tester1 state.highstate
lbaas-client-install-tester1:
----------
State: - pkg
Name: required_packages
Function: installed
Result: True
Comment: The following package(s) were installed/updated: python-pip, python-novaclient, git, python-requests, python-prettytable.
Changes: python-novaclient: {'new': '2012.1-0ubuntu1', 'old': ''}
python-setuptools: {'new': '0.6.24-1ubuntu1', 'old': ''}
git: {'new': '1:1.7.9.5-1', 'old': ''}
liberror-perl: {'new': '0.17-1', 'old': ''}
python-pip: {'new': '1.0-1build1', 'old': ''}
python-distribute: {'new': '1', 'old': ''}
python-requests: {'new': '0.8.2-1', 'old': ''}
git-man: {'new': '1:1.7.9.5-1', 'old': ''}
python-greenlet: {'new': '0.3.1-1ubuntu5.1', 'old': ''}
python-gevent: {'new': '0.13.6-1ubuntu1', 'old': ''}
git-completion: {'new': '1', 'old': ''}
python-prettytable: {'new': '0.5-1ubuntu2', 'old': ''}
----------
State: - git
Name: https://github.com/pcrews/libra-integration-tests.git
Function: latest
Result: True
Comment: Repository https://github.com/pcrews/libra-integration-tests.git cloned to /root/libra-integration-tests
Changes: new: https://github.com/pcrews/libra-integration-tests.git
revision: f6290d551188c9239248f0cd0ddcf22470c444d3
...
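For reference, a highstate like the one above could be driven by an SLS file along these lines (a sketch based on the output shown, not our actual state files):

```yaml
# /srv/salt/lbaas-client.sls (hypothetical layout)
required_packages:
  pkg.installed:
    - pkgs:
      - python-pip
      - python-novaclient
      - git
      - python-requests
      - python-prettytable

https://github.com/pcrews/libra-integration-tests.git:
  git.latest:
    - target: /root/libra-integration-tests
    - require:
      - pkg: required_packages
```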

From there, we can use salt to execute the test suite through the client on the vm and get the results back:

sudo salt lbaas-client-install-tester1 cmd.run_all cwd='/root/libra-integration-tests' 'python loadbalancer_integration.py --os_username=******** --os_password=******** --os_tenant_name=********-tenant --os_auth_url=https://********.identity.hpcloudsvc.com:35357/v2.0/ --os_region_name=******** --driver=python-client --max_backend_nodes=50 '
0
lbaas-client-install-tester1:
----------
pid:
6567
retcode:
0
stderr:
test_createLoadBalancer (tests.create_loadbalancer.testCreateLoadBalancer)
test creation of loadbalancers for libra ... 20130411-160445 PM Setting up for testcase:
20130411-160445 PM - test_description: basic_positive_name
20130411-160445 PM - lb_name: the quick, brown fox jumps over the lazy dog.
20130411-160445 PM - nodes: [{'port': '80', 'address': '15.185.42.06'}, {'port': '80', 'address': '15.185.42.07'}]
20130411-160445 PM - expected_status: 200
20130411-160445 PM load balancer id: 132651
20130411-160445 PM load balancer ip addr: 15.185.226.182
20130411-160445 PM Validating load balancer detail...
20130411-160446 PM Validating load balancer list...
20130411-160447 PM Validating load balancer nodes url...
20130411-160447 PM testing loadbalancer function...
20130411-160447 PM gathering backend node etags...
20130411-160447 PM testing lb for function...
20130411-160447 PM Deleting loadbalancer: 132651
ok
...

 

After the tests are finished, we simply blow the VM away, again via the magic of salt-cloud:

+ sudo salt-cloud -y -d lbaas-client-install-tester1
[INFO ] Loaded configuration file: /etc/salt/cloud
[INFO ] salt-cloud starting
[WARNING ] 'AWS.id' not found in options. Not loading module.
[WARNING ] 'EC2.id' not found in options. Not loading module.
[INFO ] Destroying VM: lbaas-client-install-tester1
[INFO ] Destroyed VM: lbaas-client-install-tester1
- True
Finished: SUCCESS


From there, it is easy to create new variations of this work – one can create multiple client minions for stress testing, clients running different sets of tests, etc.  In addition to this, we are doing some neat tricks with Jenkins + salt for monitoring and management.  If you guys are interested in hearing more and learning some nifty cloud / salt voodoo, let me know in the comments section 😉

Speaking at the Percona Live MySQL Conference and Expo

A number of people have already mentioned this, but the Percona Live MySQL Conference and Expo is just around the corner.
As Stewart has already blogged, there are a number of great sessions this year and I’m looking forward to several of them.

I’ll be giving a talk there as well –
It’s essentially all in the abstract, but I’ll be speaking about various functional testing tools that exist for MySQL-based systems.
Come to learn more about the random query generator, MTR, and kewpie and how they might be of use to you.

Speaking of kewpie, I’ll also be presenting about it at Drizzle Developer Day, which is on 4/13, the day after the main conference.
If you are interested in learning more about Drizzle, whether it be from hearing the various presentations to having a chance to chat with the developers and fellow enthusiasts, you should check it out.  Much like Stewart, I’m quite psyched about the event and doubly excited that my employer, Percona, is sponsoring it.

Additionally, there are other events that day, as Peter mentions here.

dbqp being renamed

One of the best things that can happen to a piece of software is for people to actually use it.

I’ve been fortunate enough to have received feedback on the tool from several members of both the Percona and Drizzle teams.  The most common and strongly emphasized comments were in regards to what a terrible, terrible name dbqp really is in terms of saying, seeing, and typing it ; )

As that isn’t something that can be disputed (it’s really annoying to use in conversations *and* to type several dozen times a day), the project has been renamed to kewpie.  For those that follow such things, I did present on another tool with that name at the last MySQL Conference, but *that* tool is a nice-to-have, while the test-runner sees daily use.  Better to save the good names for software that actually stands a chance of being used, I say : )

While there are probably 1*10^6 other things I need to do (Stewart is a merciless slave driver as a boss, btw…heheh), the fact that we are merging the tool into the various Percona branches meant it should be done sooner rather than later.  The tool is currently in our 5.1 branch and I have merge requests up for both Drizzle and Xtrabackup (dbqp was living there too).

I have several other interesting things going on with the tests and tool, which I’ll be blogging about over at MySQL Performance Blog.  Later this week, I’ll be talking about what we’ve been doing to work on this bug ; )

 

Also, the Percona Live MySQL Conference in DC is just around the corner.  There are going to be some great speakers and attendees

dbqp and Xtrabackup testing

So I’m back from the Percona dev team’s recent meeting.  While there, we spent a fair bit of time discussing Xtrabackup development.  One of our challenges is that as we add richer features to the tool, we need equivalent testing capabilities.  However, it seems a constant in the MySQL world that available QA tools often leave something to be desired.  The randgen is a literal wonder-tool for database testing, but it is also occasionally frustrating / doesn’t scratch every testing itch.  It is based on technology SQL Server was using in 1998 (MySQL began using it in ~2007, IIRC).  So this is no knock, it is merely meant to be an example of a poor QA engineer’s frustrations ; )  While the current Xtrabackup test suite is commendable, it also has its limitations. Enter the flexible, adaptable, and expressive answer: dbqp.

One of my demos at the dev meeting was showing how we can set up tests for Xtrabackup using the unittest paradigm.  While this sounds fancy, basically we take advantage of Python’s unittest module and write test classes that build on its code.  The biggest bits dbqp handles are searching the specified server code (to make sure we have everything we should), allocating and managing servers as requested by the test cases, and doing some reporting and management of the test cases.  As the tool matures, I will be striving to let more of the work be done by unittest code rather than things I have written : )
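Mechanically, the “let unittest do the work” idea boils down to loading TestCase classes and running them programmatically.  A minimal sketch of that mechanism (this is illustrative code, not dbqp’s actual internals, and the class and helper names are invented):

```python
import unittest

# Hypothetical stand-in for a dbqp-managed test case; the real ones
# allocate servers and shell out to tools like innobackupex.
class ExampleCase(unittest.TestCase):
    def test_retcode(self):
        retcode, output = 0, "backup completed OK"  # stubbed command result
        self.assertEqual(retcode, 0, msg=output)

def run_module(case_class):
    # what a runner does: load the TestCase, run it, report pass/fail
    suite = unittest.TestLoader().loadTestsFromTestCase(case_class)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()

if __name__ == "__main__":
    print(run_module(ExampleCase))
```

The runner only needs the boolean and the per-test output; everything interesting lives in the TestCase.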

To return to my main point, we now have two basic tests of xtrabackup:

Basic test of backup + restore:

  1. Populate server
  2. Take a validation snapshot (mysqldump)
  3. Take the backup (via innobackupex)
  4. Clean datadir
  5. Restore from backup
  6. Take restored state snapshot and compare to original state

Slave setup

  1. Similar to our basic test except we create a slave from the backup, replicating from the backed up server.
  2. After the initial setup, we ensure replication is set up ok, then we do additional work on the master and compare master and slave states
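The first outline above can be sketched as unittest code with the real commands stubbed out (the helpers here are hypothetical stand-ins, not the actual suite):

```python
import unittest

# Structural sketch of the basic backup + restore test; in the real suite
# these helpers shell out to mysqldump and innobackupex.
def take_snapshot(server):    # stand-in for a mysqldump validation snapshot
    return server["data"].copy()

def take_backup(server):      # stand-in for innobackupex backup + prepare
    return server["data"].copy()

def restore(server, backup):  # stand-in for clean datadir + restore
    server["data"] = backup.copy()

class BasicBackupTest(unittest.TestCase):
    def test_backup_restore(self):
        server = {"data": {"t1": [1, 2, 3]}}   # 1) populate server
        before = take_snapshot(server)         # 2) validation snapshot
        backup = take_backup(server)           # 3) take the backup
        server["data"] = {}                    # 4) clean datadir
        restore(server, backup)                # 5) restore from backup
        after = take_snapshot(server)          # 6) compare to original state
        self.assertEqual(before, after)

if __name__ == "__main__":
    unittest.main()
```

The structure and the final assertion are the point; swap the stubs for real commands and a failed step fails the test right there.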

One of the great things about this is that we have the magic of assertions.  We can insert them at any point of the test we feel like validating and the test will fail with useful output at that stage.  The backup didn’t take correctly?  No point going through any other steps — FAIL! : )  The assertion methods just make it easy to express what behavior we are looking for.  We want the innobackupex prepare call to run without error?
Boom goes the dynamite!:

# prepare our backup
cmd = ("%s --apply-log --no-timestamp --use-memory=500M "
       "--ibbackup=%s %s" % (innobackupex, xtrabackup, backup_path))
retcode, output = execute_cmd(cmd, output_path, exec_path, True)
self.assertEqual(retcode, 0, msg=output)

From these basic tests, it will be easy to craft more complex test cases.  Creating the slave test was simply a matter of adapting the initial basic test case slightly.  Our plans include: *heavy* crash testing of both xtrabackup and the server, enhancing / expanding replication tests by creating heavy randgen loads against the master during backup and slave setup, and other assorted crimes against database software.  We will also be porting the existing test suite to use dbqp entirely…who knows, we may even start working on Windows one day ; )

These tests are by no means the be-all-end-all, but I think they do represent an interesting step forward.  We can now write actual, honest-to-goodness Python code to test the server.  On top of that, we can make use of the included unittest module to give us all sorts of assertive goodness to express what we are looking for.  We will need to and plan to refine things as time moves forward, but at the moment, we are able to do some cool testing tricks that weren’t easily do-able before.

If you’d like to try these tests out, you will need the following:
* dbqp (bzr branch lp:dbqp)
* DBD::mysql installed (the tests use the randgen and this is required…hey, it is a WONDER-tool!) : )
* Innobackupex, a MySQL / Percona server, and the appropriate xtrabackup binary.

The tests live in dbqp/percona_tests/xtrabackup_basic and are named basic_test.py and slave_test.py, respectively.

To run them:
$./dbqp.py --suite=xtrabackup_basic --basedir=/path/to/mysql --xtrabackup-path=/mah/path --innobackupex-path=/mah/other/path --default-server-type=mysql --no-shm

Some next steps for dbqp include:
1)  Improved docs
2)  Merging into the Percona Server trees
3)  Setting up test jobs in Jenkins (crashme / sqlbench / randgen)
4)  Other assorted awesomeness

Naturally, this testing goodness will also find its way into Drizzle (which currently has a 7.1 beta out).  We definitely need to see some Xtrabackup test cases for Drizzle’s version of the tool (mwa ha ha!) >: )

Drizzle / dbqp updates

Just wanted to blog about some of the latest updates to dbqp.  We just merged some interesting changes into Drizzle (just in time for the impending Fremont beta).  In addition to general code cleanup / reorganization, we have the following goodies:

Randgen in the Drizzle tree

One of the biggest things is that the random query generator (aka randgen) is now part of the Drizzle tree.  While I did some of the work here, the major drivers of this happening were Brian and Stewart:

  1. Brian makes a fair argument that the easier / more convenient it is to run a test, the greater the likelihood of it being run.  Additional tools to install, etc = not so much.  Having something right there and ready to go = win!
  2. Stewart is also a fan of convenience, lotsa testing, and working smarter, not harder.  As a result, he did the initial legwork on merging the randgen.  I do suspect there is still much for me to learn about properly bzr joining trees and whatnot, but we’ll get it right soon enough ; )

This doesn’t mean we won’t be contributing any changes we make back to the main randgen project / branch, it is strictly to facilitate more testing for Drizzle.  As we already have our randgen tests packaged into dbqp-runnable suites, running these tests is even easier : )

--libeatmydata

Another request fulfilled in this update is the ability to use Stewart’s libeatmydata to speed up testing.  By default, dbqp uses shared memory as a workdir, similar to mysql-test-run’s --mem option (this can be bypassed in dbqp with --no-shm, fyi).  However, this isn’t always perfect or desirable to do.

An alternative is to use libeatmydata, which disables fsync() calls.  As the name implies, you don’t want to use it if you care about your data, but for general testing purposes, it can greatly speed up test execution.

If you have the library installed / on your machine, you can use it like so:  ./dbqp --libeatmydata [--libeatmydata-path <path>] …

By default, libeatmydata-path is /usr/local/lib/libeatmydata.so (as if you had used make install).

Multiple server types

IMHO, this is one of the coolest new tricks.  dbqp can now handle more than just Drizzle servers / source!  The ultimate idea is to allow tests that utilize more than one type / version of a server to have more interesting tests : )  This will be useful for scenarios like testing Drizzledump migration as we can feed in one (or more) MySQL servers and a Drizzle tree and make sure we can migrate data from all of them.

We also intend to utilize dbqp for testing a variety of Percona products, and it is kind of handy to be able to run the code you are testing ; )  I already have the tool running Percona / MySQL servers and have some randgen tests working:


$ ./dbqp.py --default_server_type=mysql --basedir=/percona-server/Percona-Server --mode=randgen
Setting --no-secure-file-priv=True for randgen usage...
20111013-163443 INFO Linking workdir /dbqp/workdir to /dev/shm/dbqp_workdir_pcrews_9dbc7e8a-2872-45a9-8a07-f347f6184246
20111013-163443 INFO Using mysql source tree:
20111013-163443 INFO basedir: /percona-server/Percona-Server
20111013-163443 INFO clientbindir: /percona-server/Percona-Server/client
20111013-163443 INFO testdir: /dbqp
20111013-163443 INFO server_version: 5.5.16-rel21.0
20111013-163443 INFO server_compile_os: Linux
20111013-163443 INFO server_platform: x86_64
20111013-163443 INFO server_comment: (Percona Server with XtraDB (GPL), Release rel21.0, Revision 188)
20111013-163443 INFO Using default-storage-engine: innodb
20111013-163443 INFO Using testing mode: randgen
20111013-163443 INFO Processing test suites...
20111013-163443 INFO Found 5 test(s) for execution
20111013-163443 INFO Creating 1 bot(s)
20111013-163449 INFO Taking clean db snapshot...
20111013-163452 INFO bot0 server:
20111013-163452 INFO NAME: s0
20111013-163452 INFO MASTER_PORT: 9307
20111013-163452 INFO SOCKET_FILE: /dbqp/workdir/bot0/s0/var/s0.sock
20111013-163452 INFO VARDIR: /dbqp/workdir/bot0/s0/var
20111013-163452 INFO STATUS: 1
20111013-163506 ===============================================================
20111013-163506 TEST NAME [ RESULT ] TIME (ms)
20111013-163506 ===============================================================
20111013-163506 main.blob [ pass ] 8624
20111013-163516 main.create_drop [ pass ] 2862
20111013-163524 main.many_indexes [ pass ] 1429
20111013-163547 main.optimizer_subquery [ pass ] 17153
20111013-163558 main.outer_join [ pass ] 4243
20111013-163558 ===============================================================
20111013-163558 INFO Test execution complete in 69 seconds
20111013-163558 INFO Summary report:
20111013-163558 INFO Executed 5/5 test cases, 100.00 percent
20111013-163558 INFO STATUS: PASS, 5/5 test cases, 100.00 percent executed
20111013-163558 INFO Spent 34 / 69 seconds on: TEST(s)
20111013-163558 INFO Test execution complete
20111013-163558 INFO Stopping all running servers...

Expect to see this up and running tests against Percona Server in the next week or so.  I’ll be writing more about this soon.

Native / unittest mode

This hasn’t made it into the Drizzle tree yet.  To ease merging the code with Percona Server / Xtrabackup, I’ve created a separate launchpad project.  One of the things we needed was the ability to write complex tests directly.  It is currently easy to plug new tools into dbqp, but we essentially needed a new tool for certain testing needs.

Our solution for this was to allow dbqp to run python unittest modules.  We still have a bit of work to do before we have some demo tests ready, but we will be creating some expanded Xtrabackup tests using this system very soon.  So far, it is turning out to be pretty neat:


./dbqp.py --default_server_type=mysql --basedir=/percona-server/Percona-Server --mode=native
20111013-190744 INFO Killing pid 1747 from /dbqp/workdir/bot0/s0/var/run/s0.pid
20111013-190744 INFO Linking workdir /dbqp/workdir to /dev/shm/dbqp_workdir_pcrews_9dbc7e8a-2872-45a9-8a07-f347f6184246
20111013-190744 INFO Using mysql source tree:
20111013-190744 INFO basedir: /percona-server/Percona-Server
20111013-190744 INFO clientbindir: /percona-server/Percona-Server/client
20111013-190744 INFO testdir: /dbqp
20111013-190744 INFO server_version: 5.5.16-rel21.0
20111013-190744 INFO server_compile_os: Linux
20111013-190744 INFO server_platform: x86_64
20111013-190744 INFO server_comment: (Percona Server with XtraDB (GPL), Release rel21.0, Revision 188)
20111013-190744 INFO Using default-storage-engine: innodb
20111013-190744 INFO Using testing mode: native
20111013-190744 INFO Processing test suites...
20111013-190744 INFO Found 1 test(s) for execution
20111013-190744 INFO Creating 1 bot(s)
20111013-190749 INFO Taking clean db snapshot...
20111013-190750 INFO bot0 server:
20111013-190750 INFO NAME: s0
20111013-190750 INFO MASTER_PORT: 9306
20111013-190750 INFO SOCKET_FILE: /dbqp/workdir/bot0/s0/var/s0.sock
20111013-190750 INFO VARDIR: /dbqp/workdir/bot0/s0/var
20111013-190750 INFO STATUS: 1
20111013-190756 ===============================================================
20111013-190756 TEST NAME [ RESULT ] TIME (ms)
20111013-190756 ===============================================================
20111013-190756 main.example_test [ pass ] 1
20111013-190756 test_choice (example_test.TestSequenceFunctions) ... ok
20111013-190756 test_sample (example_test.TestSequenceFunctions) ... ok
20111013-190756 test_shuffle (example_test.TestSequenceFunctions) ... ok
20111013-190756
20111013-190756 ----------------------------------------------------------------------
20111013-190756 Ran 3 tests in 0.000s
20111013-190756
20111013-190756 OK
20111013-190756
20111013-190756 ===============================================================
20111013-190756 INFO Test execution complete in 6 seconds
20111013-190756 INFO Summary report:
20111013-190756 INFO Executed 1/1 test cases, 100.00 percent
20111013-190756 INFO STATUS: PASS, 1/1 test cases, 100.00 percent executed
20111013-190756 INFO Spent 0 / 6 seconds on: TEST(s)
20111013-190756 INFO Test execution complete
20111013-190756 INFO Stopping all running servers...
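Incidentally, the example_test module in the run above looks like the stock TestSequenceFunctions example from the Python unittest docs; a sketch of such a module (my reconstruction, not the exact file in the tree):

```python
import random
import unittest

# The classic unittest demo: three tests over a shuffled sequence.
class TestSequenceFunctions(unittest.TestCase):
    def setUp(self):
        self.seq = list(range(10))

    def test_choice(self):
        # a randomly chosen element must come from the sequence
        element = random.choice(self.seq)
        self.assertTrue(element in self.seq)

    def test_sample(self):
        # every sampled element must come from the sequence
        for element in random.sample(self.seq, 5):
            self.assertTrue(element in self.seq)

    def test_shuffle(self):
        # shuffling then sorting must restore the original sequence
        random.shuffle(self.seq)
        self.seq.sort()
        self.assertEqual(self.seq, list(range(10)))

if __name__ == "__main__":
    unittest.main()
```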

This really only scratches the surface of what can happen, but I’ll be writing more in-depth articles on what kind of tricks we can pull off as the code gets more polished.

Three non-testing bits:

1)  Percona Live London is just around the corner and members of the Drizzle team will be there.

2)  We are *this* close to Fremont beta being ready.  The contributions and feedback have been most welcome.  Any additional testing / etc are most appreciated.

3)  Drizzle is now part of the SPI!

 

Drizzle multi-master testing!

So, it has been a while since I’ve blogged.  As some of you may have read, I have a new job and Stewart and I have been busy planning all kinds of testing goodness for Percona >: ) (I’ve also been recovering from trying to keep up with Stewart!)

Rest assured, gentle readers, that I have not forgotten everyone’s favorite modular, community-driven database ; )  Not by a long-shot.  I have some major improvements to dbqp getting ready for a merge (think randgen in-tree / additional testing modes / multiple basedirs of multiple types).  Additionally, I’ve been cooking up some code to test the mighty Mr. Shrews’ multi-master code (mwa ha ha!)

What I’ve done is allow for a new option to be used with a test’s .cnf file (this is a dbqp thing and won’t work with standard drizzle-test-run).  If the runner sees this request, it will generate a multi-master config file from the specified servers’ individual slave.cnf files.

Here is a sample config:

[test_servers]
servers = [[--innodb.replication-log],[--innodb.replication-log],[--plugin-add=slave --slave.config-file=$MASTER_SERVER_SLAVE_CONFIG]]

[s2]
# we tell the system that we want
# to generate a multi-master cnf file
# for the 3rd server to use, that
# has the first two servers as masters
# the final file is written to the first
# server's general slave.cnf file
gen_multi_master_cnf= 0,1

A good rundown of the file’s contents can be found on Shrews’ blog here, but the end result looks like this:

ignore-errors

[master1]
master-host=127.0.0.1
master-port=9306
master-user=root
master-pass=''

[master2]
master-host=127.0.0.1
master-port=9312
master-user=root
master-pass=''
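The generation step can be sketched as follows; this is a hypothetical helper (not dbqp’s actual code) that produces the same layout as the sample file above from a list of master ports:

```python
# Sketch: build a multi-master slave config from running masters' ports.
# The real dbqp code pulls these values from each server's slave.cnf.
def gen_multi_master_cnf(master_ports, host="127.0.0.1"):
    lines = ["ignore-errors", ""]
    for idx, port in enumerate(master_ports, start=1):
        lines += ["[master%d]" % idx,
                  "master-host=%s" % host,
                  "master-port=%d" % port,
                  "master-user=root",
                  "master-pass=''",
                  ""]
    return "\n".join(lines)

if __name__ == "__main__":
    # the two masters from the example: ports 9306 and 9312
    print(gen_multi_master_cnf([9306, 9312]))
```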

I tried cooking up a basic test case where we spin up 3 servers – 2 masters and one slave.  On master 1, we create table t1:


CREATE TABLE t1 (a int not null auto_increment, primary key(a));

On master 2, table t2:


CREATE TABLE t2 (a int not null auto_increment, primary key(a));

We insert some records into both tables, then check that our slave has everything! Sounds simple, right?

Sigh. If only. It seems that we are running into some issues when we try to record the test – you can read the bug here

We see some interesting output in the slave’s logs before it crashes:

$ cat workdir/bot0/s2/var/log/s2.err
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: 127 rollback segment(s) active.
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
(SQLSTATE 00000) Duplicate entry '772-1' for key 'PRIMARY'
Failure while executing:
INSERT INTO `sys_replication`.`queue` (`master_id`, `trx_id`, `seg_id`, `commit_order`, `originating_server_uuid`, `originating_commit_id`, `msg`) VALUES (2, 772, 1, 1, 'ac9c8ac0-8f10-474b-9bbd-b61d2cdb2b93' , 1, 'transaction_context {
server_id: 1
transaction_id: 772
start_timestamp: 1317760732106016
end_timestamp: 1317760732106017
}
event {
type: STARTUP
}
segment_id: 1
end_segment: true
')

Replication slave: Unable to insert into queue.
Replication slave: drizzle_state_read:lost connection to server (EOF)
Lost connection to master. Reconnecting.
Replication slave: drizzle_state_connect:could not connect
111004 16:39:05 InnoDB: Starting shutdown...

Additionally, you can just try the setup with --start-and-exit:

$ ./dbqp --suite=slave --start-and-exit multi_master_basic
20111004-170033 INFO Using Drizzle source tree:

20111004-170033 INFO Taking clean db snapshot...
20111004-170033 INFO Taking clean db snapshot...
20111004-170033 INFO Taking clean db snapshot...
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s0
20111004-170035 INFO MASTER_PORT: 9306
20111004-170035 INFO DRIZZLE_TCP_PORT: 9307
20111004-170035 INFO MC_PORT: 9308
20111004-170035 INFO PBMS_PORT: 9309
20111004-170035 INFO RABBITMQ_NODE_PORT: 9310
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s0/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s1
20111004-170035 INFO MASTER_PORT: 9312
20111004-170035 INFO DRIZZLE_TCP_PORT: 9313
20111004-170035 INFO MC_PORT: 9314
20111004-170035 INFO PBMS_PORT: 9315
20111004-170035 INFO RABBITMQ_NODE_PORT: 9316
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s1/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO bot0 server:
20111004-170035 INFO NAME: s2
20111004-170035 INFO MASTER_PORT: 9318
20111004-170035 INFO DRIZZLE_TCP_PORT: 9319
20111004-170035 INFO MC_PORT: 9320
20111004-170035 INFO PBMS_PORT: 9321
20111004-170035 INFO RABBITMQ_NODE_PORT: 9322
20111004-170035 INFO VARDIR: /drizzle_mm_test/tests/workdir/bot0/s2/var
20111004-170035 INFO STATUS: 1
20111004-170035 INFO User specified --start-and-exit. dbqp.py exiting and leaving servers running...
pcrews@mister:/drizzle_mm_test/tests$ ps -al
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 18652 1 2 80 0 - 112094 poll_s pts/2 00:00:00 lt-drizzled
0 S 1000 18688 1 3 80 0 - 112096 poll_s pts/2 00:00:00 lt-drizzled
0 S 1000 18721 1 3 80 0 - 156326 poll_s pts/2 00:00:00 lt-drizzled
0 R 1000 18780 15985 0 80 0 - 3375 - pts/2 00:00:00 ps
0 S 1000 32463 30047 0 80 0 - 11272 poll_s pts/1 00:00:01 ssh

From here, we can connect to the slave and check out sys_replication.applier_state:

$ drizzle -uroot -p9318 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the Drizzle client.. Commands end with ; or \g.
Your Drizzle connection id is 216
Connection protocol: mysql
Server version: 2011.09.26.2427 Source distribution (drizzle_mm_test)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> use sys_replication;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Schema changed
drizzle> show tables;
+---------------------------+
| Tables_in_sys_replication |
+---------------------------+
| applier_state |
| io_state |
| queue |
+---------------------------+
3 rows in set (0.001641 sec)

drizzle> select * from applier_state;
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
| master_id | last_applied_commit_id | originating_server_uuid | originating_commit_id | status | error_msg |
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
| 1 | 0 | f716781f-8c00-4b81-82c6-62039136d616 | 0 | RUNNING | |
| 2 | 3 | df7f2f6e-dba4-43ea-a674-fa4a3709865b | 3 | RUNNING | |
+-----------+------------------------+--------------------------------------+-----------------------+---------+-----------+
2 rows in set (0.000928 sec)

drizzle> select * from io_state;
+-----------+---------+-----------+
| master_id | status | error_msg |
+-----------+---------+-----------+
| 1 | STOPPED | |
| 2 | RUNNING | |
+-----------+---------+-----------+
2 rows in set (0.000839 sec)

drizzle>

So, it looks like the slave knows about both masters, but for some reason, replication from master 1 is stopped : ( (Note that applier_state shows both appliers as RUNNING; it is io_state that reports master 1 as STOPPED.)
At any rate, there is a bug open on this, though it could be something in my config(?). It’s been a while since I’ve played with replication and I know there has been some tinkering under the hood since then : )

The branch with the test code can be found here:
lp:~patrick-crews/drizzle/dbqp_multi_master_test

At the very least, we can now create tests that use this feature, which will help ensure that it stays on the path of solid code in the future! How about anyone out there? Has anyone been using multi-master? If so, can you share any setups / tests? Extra information would be most appreciated : )
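A multi-master test will eventually want to make exactly this kind of check programmatically. Here is a minimal sketch of the decision logic such a test might use, written as a pure function over rows shaped like the io_state / applier_state output above (the function name and row format are my own, not part of dbqp):

```python
def replication_problems(io_rows, applier_rows):
    """Return a list of human-readable problems for a multi-master slave,
    given rows from sys_replication.io_state and applier_state.
    Each row is a dict with at least 'master_id' and 'status'."""
    problems = []
    for table, rows in (("io_state", io_rows), ("applier_state", applier_rows)):
        for row in rows:
            if row["status"] != "RUNNING":
                problems.append("%s: master %s is %s"
                                % (table, row["master_id"], row["status"]))
    return problems

# Mirroring the session above: both appliers RUNNING, io_state for master 1 STOPPED
io_rows = [{"master_id": 1, "status": "STOPPED"},
           {"master_id": 2, "status": "RUNNING"}]
applier_rows = [{"master_id": 1, "status": "RUNNING"},
                {"master_id": 2, "status": "RUNNING"}]
print(replication_problems(io_rows, applier_rows))
# -> ['io_state: master 1 is STOPPED']
```

A test would simply assert that this list is empty once replication is expected to be caught up.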

Drizzle’s Jenkins system using dbqp for randgen and crashme testing

Well, that’s pretty much it, thanks for stopping by ; )

In all seriousness, it’s kind of neat that we’re using dbqp to run some of our staging tests and we gain a few neat things:

Speed

Here are the trend charts for randgen and crashme.  While it doesn’t look like randgen is showing much of an improvement, it is worth mentioning that this job now runs both the standard and the transaction log tests in a single run >: )  Previously, we had a separate drizzle-automation job for the transaction log.  Just the trx_log tests took ~30 minutes to run (plus build time).  Long story short, we’re saving about 30-40 minutes on randgen testing per staging run and only needing to build once!

Maintainability

The jobs we run are in the tree and anyone can easily repeat them.  While drizzle-automation kicks major butt (and I have taken many ideas from it), it is a separate piece of software that requires setup and maintenance.  Basing things around an in-tree setup means that you only need the code and any required bits and pieces.  Now if we need to set up a new randgen machine, we only need the randgen and DBD::drizzle installed (and we plan on including randgen in-tree soon, so you won’t even need that!).  If we need to set up a new crash-me machine, we only need DBD::drizzle – and everyone should have DBD::drizzle installed! ; )

Ease of use

Pretty much all tests provide the same standard output:

dtr mode

From the command:

./dbqp

Our default mode is dtr (aka using drizzletest.cc to execute standard .test files). To run all available tests, use the make target: make test-dbqp.

20110621-081404  trigger_dictionary.loaded                  [ pass ]       43
20110621-081408  logging_stats.cumulative                   [ pass ]     1045
20110621-081412  errmsg_stderr.stderr                       [ pass ]       36
20110621-081412  ===============================================================
20110621-081412 INFO Test execution complete in 496 seconds
20110621-081412 INFO Summary report:
20110621-081412 INFO Executed 566/566 test cases, 100.00 percent
20110621-081412 INFO STATUS: PASS, 566/566 test cases, 100.00 percent executed
20110621-081412 INFO Spent 254 / 496 seconds on: TEST(s)
20110621-081412 INFO Test execution complete
20110621-081412 INFO Stopping all running servers...

randgen mode

From the command:

./dbqp --mode=randgen --randgen-path=/path/to/your/randgen


20110621-170141  main.subquery                              [ pass ]     3780
20110621-170148  main.subquery_semijoin                     [ pass ]     3016
20110621-170156  main.subquery_semijoin_nested              [ pass ]     3750
20110621-170202  main.varchar                               [ pass ]     2658
20110621-170202  ===============================================================
20110621-170202 INFO Test execution complete in 147 seconds
20110621-170202 INFO Summary report:
20110621-170202 INFO Executed 19/19 test cases, 100.00 percent
20110621-170202 INFO STATUS: PASS, 19/19 test cases, 100.00 percent executed
20110621-170202 INFO Spent 77 / 147 seconds on: TEST(s)
20110621-170202 INFO Test execution complete
20110621-170202 INFO Stopping all running servers...

crashme mode

From the command:

./dbqp --mode=crashme


20110621-181515  main.crashme                               [ fail ]   149840
20110621-181515  func_extra_to_days=error        # Function TO_DAYS
20110621-181515  ###
20110621-181515  ###<select to_days('1996-01-01') from crash_me_d
20110621-181515  ###>2450084
20110621-181515  ###We expected '729024' but got '2450084'
20110621-181515  func_odbc_timestampadd=error        # Function TIMESTAMPADD
20110621-181515  ###
20110621-181515  ###<select timestampadd(SQL_TSI_SECOND,1,'1997-01-01 00:00:00')
20110621-181515  ###>1997-01-01 00:00:01.000000
20110621-181515  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'
20110621-181515  ###
20110621-181515  ###<select {fn timestampadd(SQL_TSI_SECOND,1,{ts '1997-01-01 00:00:00'}) }
20110621-181515  ###>1997-01-01 00:00:01.000000
20110621-181515  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'
20110621-181515
20110621-181515 ERROR Failed test.  Use --force to execute beyond the first test failure
20110621-181515  ===============================================================
20110621-181515 INFO Test execution complete in 153 seconds
20110621-181515 INFO Summary report:
20110621-181515 INFO Executed 1/1 test cases, 100.00 percent
20110621-181515 INFO STATUS: FAIL, 1/1 test cases, 100.00 percent executed
20110621-181515 INFO FAIL tests: main.crashme
20110621-181515 INFO Spent 149 / 153 seconds on: TEST(s)
20110621-181515 INFO Test execution complete
20110621-181515 INFO Stopping all running servers...

While this isn’t a huge feature, it is nice to have a standardized report for knowing if something failed, what failed and how (we always dump test tool output on test failures).  Why is this nice?  Well, the world is a busy place and only needing to know one way of reading test output simplifies things just a teensy little bit.  This small improvement becomes a huge benefit over time if you happen to spend good chunks of your day looking at test output like me : )

Other than that, I’m still working on teaching dbqp interesting new tricks that will help me in testing SkySQL‘s Reference Architecture – expect to hear more about that next month!

Drizzle testing – now with more server stressing goodness!

One of the long term testing goals for Drizzle is to move all of our test logic directly in-tree.  Currently, we use a system called drizzle-automation to execute a variety of tests for our staging branch.  This is the final set of tests patches must pass before being allowed to merge into Drizzle trunk and includes things like sysbench, dbt2, the randgen, etc.  With the development of dbqp, we can now move this testing logic directly into the tree (and even move some of the testing tools there as well).  Of course, I’ve rambled on about this before, but I personally think it is cool and useful ; )  However enough of the sales pitch, on to the new modes!

sysbench mode

With but a simple incantation of ./dbqp --mode=sysbench [--suite=readonly|readwrite], you too can invoke the mighty sysbench configurations that we use to ensure each and every Drizzle patch is worth its salt!

Basically, each test case is a sysbench command line for a certain concurrency:


sysbench --max-time=240 --max-requests=0 --test=oltp --db-ps-mode=disable --drizzle-table-engine=innodb --oltp-read-only=off --oltp-table-size=1000000 --drizzle-mysql=on --drizzle-user=root --drizzle-db=test --drizzle-port=$MASTER_MYPORT --drizzle-host=localhost --db-driver=drizzle --num-threads=32

The readonly and readwrite suites differ only in the --oltp-read-only switch being on or off.
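In other words, a suite is just a set of concurrency levels expanded into command lines around that flag set. A rough sketch of how such test cases could be generated (the function name and concurrency list are illustrative, not dbqp’s actual internals):

```python
# Flags taken from the sysbench command line shown above
BASE_FLAGS = ("--max-time=240 --max-requests=0 --test=oltp "
              "--db-ps-mode=disable --drizzle-table-engine=innodb "
              "--oltp-table-size=1000000 --drizzle-mysql=on --drizzle-user=root "
              "--drizzle-db=test --drizzle-host=localhost --db-driver=drizzle")

def sysbench_cases(suite, port, concurrencies=(16, 32, 64)):
    """Expand a suite into (test_name, command_line) pairs, one per
    concurrency level. 'suite' selects the --oltp-read-only value."""
    read_only = "on" if suite == "readonly" else "off"
    for threads in concurrencies:
        name = "%s.concurrency_%d" % (suite, threads)
        cmd = ("sysbench %s --oltp-read-only=%s --drizzle-port=%d "
               "--num-threads=%d" % (BASE_FLAGS, read_only, port, threads))
        yield name, cmd

for name, cmd in sysbench_cases("readonly", 9306):
    print(name)  # readonly.concurrency_16, _32, _64
```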

The output looks like this (at present):

20110601-191706  ===============================================================
20110601-191706  TEST NAME                                  [ RESULT ] TIME (ms)
20110601-191706  ===============================================================
20110601-191706  readonly.concurrency_16                    [ pass ]   240019
20110601-191706  max_req_lat_ms: 21.44
20110601-191706  rwreqps: 4208.2
20110601-191706  min_req_lat_ms: 6.31
20110601-191706  deadlocksps: 0.0
20110601-191706  tps: 150.29
20110601-191706  avg_req_lat_ms: 6.65
20110601-191706  95p_req_lat_ms: 7.02
20110601-191706  ===============================================================
20110601-191706 INFO Test execution complete in 275 seconds
20110601-191706 INFO Summary report:
20110601-191706 INFO Executed 1/1 test cases, 100.00 percent
20110601-191706 INFO STATUS: PASS, 1/1 test cases, 100.00 percent executed
20110601-191706 INFO Spent 240 / 275 seconds on: TEST(s)
20110601-191706 INFO Test execution complete
20110601-191706 INFO Stopping all running servers...

This is probably the most ‘work-in-progress’ mode we have.  The reason for this is that our Jenkins system uses a database of previous results for comparison / emailing and we need to come up with some way to keep this bit working properly.  I’m still collaborating with the mighty computing wizard Monty Taylor on this.  One of the possibilities we’ve discussed is the use of the Phoronix Test Suite.  Personally, I think this looks pretty interesting / promising and if any php gurus want to assist here, we will compose ballads to honor your awesomeness.

sqlbench mode

Technically, sqlbench and crashme modes are both tied to the sql-bench test suite, however, they do different things and produce different output, so I will discuss them separately.

The biggest thing to note is that sql-bench is now in-tree.  You can read a bit more about this tool here and here

This mode basically calls the run-all-tests sql-bench script.  This executes all of the available sql-bench tests and reports on the results (dbqp will fail if any sql-bench test does).  NOTE: this takes some time (~45 minutes on my laptop)
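Since run-all-tests prints an ‘All N test executed successfully’ summary on a clean run (you can see it in the sample output further down), the pass/fail decision can be a simple scan of its output. A hedged sketch of that check (the function is my own; dbqp’s actual logic may differ):

```python
import re

def sqlbench_passed(output):
    """Pass only if run-all-tests reports that every test executed
    successfully (the summary reads e.g. 'All 9 test executed successfully')."""
    return re.search(r"All \d+ tests? executed successfully", output) is not None

print(sqlbench_passed("All 9 test executed successfully"))  # True
print(sqlbench_passed("Got error ... aborting"))            # False
```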

To use it:
./dbqp --mode=sqlbench

Output:

20110608-135645  ===============================================================
20110608-135645  TEST NAME                                  [ RESULT ] TIME (ms)
20110608-135645  ===============================================================
20110608-135645  main.all_sqlbench_tests                    [ pass ]  2732007
20110608-135645  Test finished. You can find the result in:
20110608-135645  drizzle/tests/workdir/RUN-drizzle-Linux_2.6.38_9_generic_x86_64
20110608-135645  Benchmark DBD suite: 2.15
20110608-135645  Date of test:        2011-06-08 13:11:10
20110608-135645  Running tests on:    Linux 2.6.38-9-generic x86_64
20110608-135645  Arguments:           --connect-options=port=9306 --create-options=ENGINE=innodb
20110608-135645  Comments:
20110608-135645  Limits from:
20110608-135645  Server version:      Drizzle 2011.06.19.2325
20110608-135645  Optimization:        None
20110608-135645  Hardware:
20110608-135645
20110608-135645  alter-table: Total time: 42 wallclock secs ( 0.06 usr  0.04 sys +  0.00 cusr  0.00 csys =  0.10 CPU)
20110608-135645  ATIS: Total time: 22 wallclock secs ( 4.01 usr  0.26 sys +  0.00 cusr  0.00 csys =  4.27 CPU)
20110608-135645  big-tables: Total time: 24 wallclock secs ( 4.16 usr  0.22 sys +  0.00 cusr  0.00 csys =  4.38 CPU)
20110608-135645  connect: Total time: 31 wallclock secs ( 6.81 usr  4.50 sys +  0.00 cusr  0.00 csys = 11.31 CPU)
20110608-135645  create: Total time: 59 wallclock secs ( 2.93 usr  1.65 sys +  0.00 cusr  0.00 csys =  4.58 CPU)
20110608-135645  insert: Total time: 1962 wallclock secs (270.53 usr 66.35 sys +  0.00 cusr  0.00 csys = 336.88 CPU)
20110608-135645  select: Total time: 560 wallclock secs (23.12 usr  4.62 sys +  0.00 cusr  0.00 csys = 27.74 CPU)
20110608-135645  transactions: Total time: 21 wallclock secs ( 2.43 usr  1.98 sys +  0.00 cusr  0.00 csys =  4.41 CPU)
20110608-135645  wisconsin: Total time: 10 wallclock secs ( 2.11 usr  0.52 sys +  0.00 cusr  0.00 csys =  2.63 CPU)
20110608-135645
20110608-135645  All 9 test executed successfully
20110608-135645
20110608-135645  Totals per operation:
20110608-135645  Operation             seconds     usr     sys     cpu   tests
20110608-135645  alter_table_add                       18.00    0.02    0.00    0.02     100
20110608-135645  alter_table_drop                      17.00    0.02    0.01    0.03      91
20110608-135645  connect                                2.00    1.02    0.51    1.53    2000
<snip>
20110608-135645  update_rollback                        3.00    0.26    0.23    0.49     100
20110608-135645  update_with_key                       73.00    6.70    5.23   11.93  300000
20110608-135645  update_with_key_prefix                34.00    4.45    2.30    6.75  100000
20110608-135645  wisc_benchmark                         2.00    1.49    0.00    1.49     114
20110608-135645  TOTALS                              2865.00  310.26   79.94  390.20 2974250
20110608-135645
20110608-135645  ===============================================================
20110608-135645 INFO Test execution complete in 2735 seconds
20110608-135645 INFO Summary report:
20110608-135645 INFO Executed 1/1 test cases, 100.00 percent
20110608-135645 INFO STATUS: PASS, 1/1 test cases, 100.00 percent executed
20110608-135645 INFO Spent 2732 / 2735 seconds on: TEST(s)
20110608-135645 INFO Test execution complete
20110608-135645 INFO Stopping all running servers...

crashme mode

This mode is also provided thanks to the sql-bench suite, but the output and processing are different, thus a separate mode and section : )

Anyway, there is a script called crash-me that is provided with sql-bench.  We execute this script, look for any test failures in the output and report pass/fail.
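The scan itself can be quite simple: crash-me marks each failed probe with ‘=error’, and the ‘###’ lines that follow are detail. A minimal sketch of such a parser, using names of my own invention:

```python
def crashme_failures(output):
    """Collect the names of crash-me probes that ended in error.
    Failing lines look like: 'func_extra_to_days=error  # Function TO_DAYS'."""
    failures = []
    for line in output.splitlines():
        line = line.strip()
        # '###' lines are detail for the preceding failure, not new failures
        if "=error" in line and not line.startswith("###"):
            failures.append(line.split("=error")[0].strip())
    return failures

sample = """\
func_extra_to_days=error        # Function TO_DAYS
###
###<select to_days('1996-01-01') from crash_me_d
func_odbc_timestampadd=error        # Function TIMESTAMPADD
"""
print(crashme_failures(sample))
# -> ['func_extra_to_days', 'func_odbc_timestampadd']
```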

There is an interesting story around these tests (and the sample output)- our Jenkins crashme slave has been down / having problems for a while.  Due to life and whatnot, we’ve had some issues getting it sorted.  However, once I got this mode up and running, I discovered that we were failing some tests:

20110608-152759  ===============================================================
20110608-152759  TEST NAME                                  [ RESULT ] TIME (ms)
20110608-152759  ===============================================================
20110608-152759  main.crashme                               [ fail ]   155298
20110608-152759  func_extra_to_days=error           # Function TO_DAYS
20110608-152759  ###
20110608-152759  ###<select to_days('1996-01-01') from crash_me_d
20110608-152759  ###>2450084
20110608-152759  ###We expected '729024' but got '2450084'
20110608-152759  func_odbc_timestampadd=error               # Function TIMESTAMPADD
20110608-152759  ###
20110608-152759  ###<select timestampadd(SQL_TSI_SECOND,1,'1997-01-01 00:00:00')
20110608-152759  ###>1997-01-01 00:00:01.000000
20110608-152759  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'
20110608-152759  ###
20110608-152759  ###<select {fn timestampadd(SQL_TSI_SECOND,1,{ts '1997-01-01 00:00:00'}) }
20110608-152759  ###>1997-01-01 00:00:01.000000
20110608-152759  ###We expected '1997-01-01 00:00:01' but got '1997-01-01 00:00:01.000000'
20110608-152759
20110608-152759 ERROR Failed test.  Use --force to execute beyond the first test failure
20110608-152759  ===============================================================
20110608-152759 INFO Test execution complete in 158 seconds
20110608-152759 INFO Summary report:
20110608-152759 INFO Executed 1/1 test cases, 100.00 percent
20110608-152759 INFO STATUS: FAIL, 1/1 test cases, 100.00 percent executed
20110608-152759 INFO FAIL tests: main.crashme
20110608-152759 INFO Spent 155 / 158 seconds on: TEST(s)
20110608-152759 INFO Test execution complete

So, while our tests were down, an ugly bug crept into the works.  Of course, it is terrible that we have a bug, but we can always bzrfind our way to the culprit code (expect a mode for that soon!) and we see the value of constant testing!  At any rate, we can now get our Jenkins slave back in working order and any developer or user that wants to stress their server now has an easy way to do so : )

Upcoming work

I’ve also been doing some cleaning up / reorganizing of dbqp code to allow for other neat tricks.  These changes will enable it to run other servers such as MySQL and allow it to serve as the basis of test suites for tools like mydumper and xtrabackup – I’ve already been discussing things with Stewart and Andrew about this and will be blogging / demoing the code very soon.

Additionally, we’re also going to see about moving the randgen into the Drizzle tree.  We use it for a significant portion of our testing and, through the magic of bzr join, it will be easy to provide this tool for everyone (provided they have DBD::drizzle installed, of course).  Stewart was kind enough to set up an initial tree; I’ve just been too busy with SkySQL work to get it done this week.

Finally, we’re still moving forward with making dbqp *the* Drizzle test runner.  This is naturally happening in baby steps, but expect to see some changes in the next month or so.

With that said, I hope that people will enjoy playing with the new toys and I look forward to providing more fun ways of making your favorite dbms sweat in the near future >: )

Hello, SkySQL!

So, as LinuxJedi so eloquently noted here, Rackspace and Drizzle are parting ways.  While they were kind enough to offer other opportunities with them, my preferences were similar to Andrew’s – to remain in the MySQL/Drizzle world.

I was fortunate enough that SkySQL had need of my services and am happy to announce that today marks my first day as a Senior QA Engineer with them.  I am very honored to join such a promising and skilled group and am very excited about the opportunities ahead.

My work will have me developing QA solutions (tests, code, whatever) for a wide variety of things.  Naturally, this includes SkySQL’s Reference Architecture where I will work on tests to ensure the delivered packages work perfectly for our customers (among many, many other things).  Additionally, I’ll be working with tools such as LinuxJedi’s modifications to Domas’ oh-so-tasty mydumper.  Basically, a little bit of everything ; )  One of the first things I’ll be doing is adapting dbqp to work with MySQL.  This will allow us to handle a wide variety of testing challenges from a single, adaptable platform in the MySQL world as well.  Drizzle will be replacing the legacy test-runner with this tool in the very near future.

Speaking of which, I’ve also been encouraged to continue contributing to Drizzle.  I have been speaking with Stewart recently about beefing up xtrabackup tests and putting the random query generator directly in the Drizzle tree.  It’s quite amazing to be in a position where I can collaborate on projects across company lines.  I also think it is particularly cool that Henrik notes SkySQL were the first to do a production install of Drizzle for someone!

With that said, I have plenty on my plate already and it’s time to get back to work : )

New dbqp feature – using pre-created datadirs for tests

Why would one want to do this, you may ask?  Well, for starters, it makes a great ‘canary-in-the-coal-mine‘ with regard to backwards compatibility!

For Drizzle, we’ve created some tables (via the randgen’s data generator if you are curious), saved a copy of the datadir, and then created a test case that uses said datadir for the test server.  The test executes some simple SQL queries to make sure we can read the tables properly.  This way, if we ever do something to either the server or .dfe format (data format exchange – had a most enlightening conversation with the team about this format’s history at the MySQL UC), we’ll have a broken test that cries about it.  From there, we’ll know we have to take some action.  The always-amazing Stewart Smith has also created some foreign key backwards compatibility tests, which I believe marks further progress towards the magical goodness that is catalogs!

We signal that we want to do this by using a .cnf file:


[test_servers]
servers = [[]]

[s0]
load-datadir=backwards_compat_data

Each server is named s0, s1, ... sN.  If a server name is contained in the .cnf file, the test-runner will do the appropriate magic to use the specified datadir for that server.  The argument to load-datadir is the name of the directory that is intended for use in the test.  All datadirs are expected to live in drizzle/tests/std_data.  Tests that do use a .cnf file, like main.backwards_compatibility and slave.basic, are skipped by test-run.pl automatically (you *can’t* run them via test-run.pl).
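To make the mechanics concrete, here is a rough sketch of how a runner could resolve each server’s datadir from such a .cnf file. This is an illustration using Python 3’s configparser module, not dbqp’s actual code (its internals likely differ):

```python
import configparser
import os

def server_datadirs(cnf_path, std_data_dir):
    """Map each server section (s0, s1, ... sN) in a test's .cnf file to
    the datadir it should be started with (None if no load-datadir)."""
    parser = configparser.ConfigParser()
    parser.read(cnf_path)
    datadirs = {}
    for section in parser.sections():
        # server sections are named s0, s1, ... sN
        if section.startswith("s") and section[1:].isdigit():
            datadir = parser.get(section, "load-datadir", fallback=None)
            if datadir is not None:
                # all datadirs are expected to live under tests/std_data
                datadir = os.path.join(std_data_dir, datadir)
            datadirs[section] = datadir
    return datadirs
```

Given the .cnf above, s0 would map to tests/std_data/backwards_compat_data, and the runner would copy that directory into place before starting the server.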

This is something that I don’t believe could be accomplished with the old test runner, or at least not *easily* done (see Rube Goldberg) ; ).  At some point, we will switch over to dbqp entirely and remove test-run.pl.  Seeing comments like this makes me happy and makes me think things are on track.

dbqp was created with the idea that it should be easy to express complex testing setups (multiple servers, using a preloaded datadir, etc, etc) and it looks like the incubation is starting to pay some benefits.  In addition to allowing this voodoo to happen, the code I’ve added to the test runner will allow us to start doing proper tests of the super Mr. Shrewsbury’s multi-master replication.  Joe Daly has also been doing some very promising work for hierarchical replication based on Dave’s tree.  I’ll be creating some example tests for these badass features soon.  The moral of the story is that by rethinking our test-runner, one tiny bit of code helps us move the ball forward on testing replication, backwards compatibility, and catalogs.

It’s honestly one of the best parts of working on the Drizzle project – being encouraged to experiment and rethink problems has enabled all sorts of innovation (but one example of Monty Taylor’s computing wizardry!) and cool features.  Thanks to this freedom to experiment, we now have even more ways of making sure we are producing quality code.

My view of QA is that we do help test, but that we also help other people answer their own questions about quality (via tools, documentation, examples, etc).  Ultimately, a test is a question – “Do you return the right answer for this query?”, “Can you survive a beating from the randgen?”, etc – and asking questions should be easy and informative.  QA shouldn’t be the sole province of some obscure priest class, but everyone’s playground.  When I see developers like Stewart writing interesting test cases and even contributing to the test tool itself, I’m even happier than when I find a bug (and finding bugs is quite awesome!).

Anyway, the code is proposed for a merge to trunk and documentation is available (under testing/writing test cases).  I hope that this makes trying to break things even more fun for people >: )