Testing status report – Drizzle’s transaction log

It’s been a while since I’ve blogged about the work we are doing on the transaction log.  Basically, our priority has been to ensure that the log and its supporting code are rock-solid before we move further along with replication.  The intent is to allow for a wide variety of replication solutions, all of which will be built on the log’s contents.  We’re very concerned with giving developers and users a solid foundation for whatever solution they may use.

In my last post on this topic, we had just created tests for the test-suite and had started beating on the log with the randgen in single-user scenarios.  This was important as it helped us catch basic bugs before we moved on to more complicated testing.  We have since moved on to high-concurrency testing.  We use the randgen to generate a wide variety of queries, using 5+ connections.  Once all of the queries have been executed, we use the transaction_reader utility to generate SQL from the log file’s contents.  We use this SQL to populate a validation server.  From there, we compare drizzledump output from the two servers and report an error if any difference is found.
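
The final comparison step can be sketched in a few lines of Python.  This is only an illustration and not our actual randgen validator: the function name diff_dumps is hypothetical, and it assumes the dump output from each server has already been captured as text.

```python
import difflib


def diff_dumps(master_dump: str, validation_dump: str) -> list[str]:
    """Return unified-diff lines between two dump texts.

    Hypothetical helper: both arguments are assumed to hold dump output
    that was already captured from the master and validation servers.
    An empty list means the two dumps are identical.
    """
    return list(
        difflib.unified_diff(
            master_dump.splitlines(),
            validation_dump.splitlines(),
            fromfile="master",
            tofile="validation",
            lineterm="",
        )
    )
```

A test harness along these lines would report a failure whenever the returned list is non-empty, and could print the diff lines to show where the two servers diverged.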

Our randgen grammars use varying levels of ‘noise’.  We issue some pretty awful SQL at times, but when we consulted with the DBAs at Rackspace, they said they see such things regularly so our log had better be able to handle it : )  We found a number of bugs by throwing fuzzy queries at the server.  Most of these were issues where one query out of several within a transaction would fail and this would cause problems for the entire transaction.  Fortunately, David Shrewsbury and Joe Daly are very devoted to killing any such bugs I may find : )

We have now automated our randgen tests for the transaction log.  That means that these tests will be run against every patch before it can be pushed to trunk; we’ll have early feedback if something breaks.  We also have a param-build job that runs these tests.  If a developer has been working on this code, they can run the tests against their branch to find out if they have broken anything.

At the time of this writing, I would say that the log is pretty solid.  We do have a couple of troublesome outstanding bugs that show up in concurrent testing:

  • Differences between slave and master in concurrent testing scenarios – randgen tests using many threads to operate on the same set of tables are producing differences between the master server and a validation server populated from the transaction log’s contents.  We are still tracking down the exact interaction that is causing this to fail.
  • Transaction ID not unique – we are seeing cases where different transactions in a concurrent environment are using the same transaction ID.
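
To illustrate the kind of check that catches the second bug, here is a minimal Python sketch.  It is not taken from our test code: the function name and the (transaction ID, label) record format are assumptions made for the example, and it assumes each record stands for one distinct transaction.

```python
from collections import defaultdict


def find_duplicate_transaction_ids(records):
    """Map each reused transaction ID to the labels that used it.

    Hypothetical helper: `records` is assumed to be an iterable of
    (transaction_id, label) pairs, one pair per distinct transaction,
    where the label identifies the transaction (for example, the
    connection that issued it).
    """
    seen = defaultdict(list)
    for txn_id, label in records:
        seen[txn_id].append(label)
    # Keep only the IDs that were claimed by more than one transaction.
    return {txn_id: labels for txn_id, labels in seen.items() if len(labels) > 1}
```

An empty result means every transaction had its own ID; anything else points at the transactions that collided.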

We are still in the process of testing things, but David Shrewsbury and Marcus Ericsson have been making progress with the Tungsten Replicator.  We’ll be working on testing scenarios using that solution once it is ready.  Any developers interested in replication are encouraged to give the transaction log a spin with their favorite solution.  The basics definitely work well, and now would be the time to chime in with your thoughts / needs for the log.  We realize that the concurrency problems are an issue and we’re actively working on resolving these, but things are in a state where one could start testing basic functionality as they saw fit.

As always, anyone with questions, recommendations, or anything else is welcome to contact us via IRC or the mailing list.
