EXECUTE testing with randgen Transforms

A little while ago, we added EXECUTE functionality to the server.  This lets us run a query held in a string, either directly or through a variable:

  • EXECUTE "SELECT * FROM TEST_TABLE";
  • SET @var = "SELECT * FROM TEST_TABLE"; EXECUTE @var;

We have added a new suite to our test tree and we’ve also started testing this with the randgen.  The random query generator allows for code known as a Validator.  This code executes on a per-query basis and can do whatever you need it to in order to verify a query.  Some examples:

  • ResultSetComparator – compares result sets between two different servers for the same query.  Useful for checking result set correctness against a trusted validation server.  This tool, combined with the optimizer* grammars, quickly found over 30 optimizer bugs in MySQL >:-)
  • ExecutionTimeComparator  – compares execution times between two servers.  This is useful when checking a patch for a regression, especially in the optimizer.
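
To make the comparison idea concrete, here is a minimal sketch of what a result set comparison might look like.  This is not the randgen's code (the randgen is written in Perl); the function name and the `ordered` flag are made up for illustration, and it assumes each row has already been fetched as a tuple of plain values.

```python
from collections import Counter


def compare_result_sets(rows_a, rows_b, ordered=False):
    """Return True when two result sets contain the same rows.

    Hypothetical helper, not part of the randgen. Without an ORDER BY the
    server may return rows in any order, so by default rows are compared
    as multisets (duplicates count, order does not).
    """
    if ordered:
        return [tuple(r) for r in rows_a] == [tuple(r) for r in rows_b]
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


# The same query run against a test server and a trusted validation server.
test_rows = [(1, "a"), (2, "b")]
validation_rows = [(2, "b"), (1, "a")]
same = compare_result_sets(test_rows, validation_rows)
```

Comparing as multisets rather than sets matters here: a server that returns a duplicate row too many (or too few) should still be flagged.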

A special type of Validator is the Transformer, which applies one or more Transforms to a query.  Each Transform alters the original query in some way and states how the result set of the altered query should relate to the result set of the original.  For example, a TRANSFORM_OUTCOME_SUBSET is expected when tacking on a LIMIT clause.  Some Transforms:

  • ExecuteAsSPTwice – This takes the original query, creates a MySQL stored procedure from it, then executes it twice in a row.  This was developed due to a MySQL bug
  • InlineSubqueries – Converts SELECT…WHERE col_int IN (SELECT col_int…) -> SELECT…WHERE col_int IN (1,3,5), i.e. it substitutes the actual values returned from the subquery
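
As a rough illustration of how a Transform pairs a rewritten query with an expected outcome, the sketch below tacks a LIMIT clause onto a query and checks the subset relationship.  It uses an in-memory SQLite database purely as a stand-in for a server, and `limit_transform` and `is_subset` are hypothetical names, not randgen functions.

```python
import sqlite3
from collections import Counter


def limit_transform(query, limit=2):
    """Hypothetical Transform: tack a LIMIT clause onto the original query."""
    return f"{query} LIMIT {limit}"


def is_subset(transformed_rows, original_rows):
    """True when every transformed row (counting duplicates) is in the original."""
    remaining = Counter(original_rows)
    remaining.subtract(Counter(transformed_rows))
    return all(count >= 0 for count in remaining.values())


# SQLite stands in for the server under test (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (col_int INTEGER)")
conn.executemany("INSERT INTO test_table VALUES (?)", [(1,), (3,), (5,), (5,)])

original = "SELECT col_int FROM test_table"
original_rows = conn.execute(original).fetchall()
transformed_rows = conn.execute(limit_transform(original)).fetchall()

# The LIMIT transform expects a subset outcome.
subset_ok = is_subset(transformed_rows, original_rows)
```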

For Drizzle, we have created two new Transforms.  For each SELECT query generated by a given grammar, the randgen EXECUTEs it both as a string and as a variable.  I’m happy to report that the tests are passing with flying colors and will be added to our automated tests.
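
The two rewrites are simple string manipulations.  Here is a sketch of the idea in Python; the real Transforms live in the randgen and are not shown here.  The function names are invented, and the quoting rule (double-quoted literal with backslash escapes) is an assumption for illustration rather than a statement about Drizzle's parser.

```python
def quote_sql_string(query):
    """Wrap a query in double quotes.

    Assumption: backslash escaping of embedded backslashes and double
    quotes is acceptable to the server; adjust to the real quoting rules.
    """
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def execute_as_string(query):
    """Hypothetical transform: run the original query via EXECUTE "<query>"."""
    return f"EXECUTE {quote_sql_string(query)};"


def execute_as_variable(query, var_name="var"):
    """Hypothetical transform: store the query in a variable, then EXECUTE it."""
    return f"SET @{var_name} = {quote_sql_string(query)}; EXECUTE @{var_name};"


original = "SELECT * FROM TEST_TABLE"
as_string = execute_as_string(original)
as_variable = execute_as_variable(original)
```

In both cases the expected outcome is that the result set is unchanged from the original query.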

It was incredibly easy to create these new Transforms for the randgen.  Now, we get to try the functionality out against every SELECT we can generate via the randgen – we get to cover a lot more ground this way versus trying to craft these tests by hand (though we have added several such tests as previously noted).

Anyway, please feel free to kick the tires on this feature.  I leave it to you to check out EXECUTE…CONCURRENT ; )

Testing status report – Drizzle’s transaction log

It’s been a while since I’ve blogged about the work we are doing on the transaction log.  Basically, our priority has been to ensure that the log and supporting code is rock-solid before we move further along with replication.  The intent is to allow for a wide variety of replication solutions, all of which will be built on the log’s contents.  We’re very concerned with giving developers and users a solid foundation for whatever solution they may use.

In my last post on this topic, we had just created tests for the test-suite and had started beating on the log with the randgen in single-user scenarios.  This was important as it helped us catch basic bugs before we moved on to more complicated testing.  We have since moved on to high-concurrency testing.  We use the randgen to generate a wide variety of queries, using 5+ connections.  Once all of the queries have been executed, we use the transaction_reader utility to generate SQL from the log file’s contents.  We use that SQL to populate a validation server.  From there, we compare the drizzledump output of the two servers and report an error if any difference is found.
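
The final comparison step can be pictured with a few lines of Python.  This is only a sketch: it assumes the two dumps have already been captured as text, and `diff_dumps` is a made-up helper, not part of our test tooling.

```python
import difflib


def diff_dumps(master_dump, validation_dump):
    """Return unified-diff lines between two dump texts.

    Hypothetical helper: an empty list means the dumps are identical,
    anything else would be reported as a test failure.
    """
    return list(
        difflib.unified_diff(
            master_dump.splitlines(),
            validation_dump.splitlines(),
            fromfile="master",
            tofile="validation",
            lineterm="",
        )
    )


# Toy dump contents, invented for the example.
master = "CREATE TABLE t1 (a INT);\nINSERT INTO t1 VALUES (1);"
validation = "CREATE TABLE t1 (a INT);\nINSERT INTO t1 VALUES (2);"

differences = diff_dumps(master, validation)
test_failed = bool(differences)
```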

Our randgen grammars use varying levels of ‘noise’.  We issue some pretty awful SQL at times, but when we consulted with the DBAs at Rackspace, they said they see such things regularly, so our log had better be able to handle it : )  We found a number of bugs by throwing fuzzy queries at the server.  Most of these were issues where one query out of several within a transaction would fail and this would cause problems for the entire transaction.  Fortunately, David Shrewsbury and Joe Daly are very devoted to killing any such bugs I may find : )

We have now automated our randgen tests for the transaction log.  That means that these tests will be run against every patch before it can be pushed to trunk; we’ll have early feedback if something breaks.  We also have a param-build job that runs these tests.  If a developer has been working on this code, they can run the tests against their branch to find out if they have broken anything.

At the time of this writing, I would say that the log is pretty solid.  We do have a couple of troublesome outstanding bugs that show up in concurrent testing:

  • Differences between slave and master in concurrent testing scenarios – randgen tests using many threads to operate on the same set of tables are producing differences between the master server and a validation server populated from the transaction log’s contents.  Still tracking down the exact interaction that is causing this to fail.
  • Transaction ID not unique – we are seeing cases where different transactions in a concurrent environment are using the same transaction IDs.
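
For the second bug, the check itself is easy to state.  The sketch below assumes the transaction IDs have already been pulled out of the log into a list (how that extraction works is not shown), and `duplicate_transaction_ids` is a hypothetical name.

```python
from collections import Counter


def duplicate_transaction_ids(transaction_ids):
    """Return the sorted IDs that appear more than once.

    Hypothetical check: each entry is the ID of one distinct transaction,
    so any repeated value means two transactions shared an ID.
    """
    counts = Counter(transaction_ids)
    return sorted(tid for tid, seen in counts.items() if seen > 1)


# IDs as they might be collected from a concurrent run (invented values).
ids_from_log = [1, 2, 3, 2, 5, 1]
duplicates = duplicate_transaction_ids(ids_from_log)
```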

We are still in the process of testing things, but David Shrewsbury and Marcus Ericsson have been making progress with the Tungsten Replicator.  We’ll be working on testing scenarios using that solution once it is ready.  Any developers interested in replication are encouraged to give the transaction log a spin with their favorite solution.  The basics definitely work well, and now would be the time to chime in with your thoughts / needs for the log.  We realize that the concurrency problems are an issue and we’re actively working on resolving them, but things are in a state where one could start testing basic functionality as one sees fit.

As always, anyone with questions, recommendations, or whatever else is welcome to contact us via IRC or the mailing list.

Sphinx documentation for Drizzle’s test-runner now available

In case you missed it, Drizzle is now using Sphinx to produce our documentation.  If you have Sphinx installed (version 1+), you can generate the docs yourself with `make html`.  It is easy to work with (it’s Python, after all) and creates some very nice looking docs.

For those of you familiar with MySQL, test-run is similar to mysql-test-run, but with some adjustments for Drizzle.  It allows a user to run the test suite to ensure the system is performing correctly.  You can view the code coverage we achieve here.

One of the most important things people can do to help us move Drizzle from beta to GA is to try it out.  We do test very heavily, but extra sets of eyes are always helpful.  Let us know if things are broken or if you have thoughts on how things could work better; we welcome the feedback.

In the future, I intend to expand on the testing documents to include writing test cases and documenting the language features that are available in test-run.  Additionally, I will be writing up docs on how to use the randgen with Drizzle.  Please let us know via the mailing list / IRC / whatever if you have any specific information you’d like to see documented.

Also, if you are interested in contributing, but don’t necessarily want to hack on the code, I encourage you to tinker with the documentation – we are more than happy to accept patches : )  You can find the source files in drizzle/docs.