Basic replication from Drizzle’s transaction log now being tested.

Just a quick update on the state of Drizzle’s transaction log as it’s been a while since I’ve mentioned it.

As I wrote earlier, we have already spent some time on basic tests of the transaction log structure – i.e. can we capture INSERT/UPDATE/DELETE/etc correctly?

Well, the next phase of testing is underway and we’re now beating up on things with the randgen!

At present, we are working with a single thread and throwing a large number of randomly generated queries at the log. We have the randgen set up so that it runs 20-30 queries per cycle and we run several hundred cycles. Once these queries have been executed, we make use of drizzled/message/transaction_reader to create SQL from the transaction log contents.

The example below assumes you’ve started the server via ./test-run --start-and-exit --mysqld=--transaction-log.enable and have created some tables to play with.  We call the transaction_reader like this:

drizzled/message/transaction_reader var/master-data/transaction.log

As an example, a query like:

UPDATE dd SET col_bigint=5 LIMIT 5;

will show up in the transaction_reader output as:

START TRANSACTION;
UPDATE `test`.`dd` SET `col_bigint`=5 WHERE `pk`=3389;
UPDATE `test`.`dd` SET `col_bigint`=5 WHERE `pk`=2329;
UPDATE `test`.`dd` SET `col_bigint`=5 WHERE `pk`=3634;
UPDATE `test`.`dd` SET `col_bigint`=5 WHERE `pk`=2369;
UPDATE `test`.`dd` SET `col_bigint`=5 WHERE `pk`=3674;
COMMIT;

We then send this SQL to a validation server (just another instance of Drizzle) and compare drizzledump output from the master and the validation slave.
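For the curious, the replay-and-compare step boils down to something like the sketch below (the ports, file names, and the use of the drizzle / drizzledump clients are my shorthand here, not the exact harness commands):

drizzled/message/transaction_reader var/master-data/transaction.log > replay.sql
drizzle --port=9307 test < replay.sql        # apply the decoded log to the validation server
drizzledump --port=9306 test > master.dump   # dump the original server
drizzledump --port=9307 test > slave.dump    # dump the validation server
diff master.dump slave.dump                  # no output means the two servers match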

So far, we’ve found a couple of new crashes, some minor issues with the transaction_reader program, and a couple of cases where the log actually fails to capture data (only UPDATEs have failed in this way so far). I’d like to give a special mention to David Shrewsbury and Joe Daly for their awesomely fast responses to the bugs I’ve found so far : ) We maintain a list of all the transaction log bugs found by this testing on the testing blueprint.  Most of these bugs are already closed (thanks Dave and Joe!).

Our next steps will be to tweak our single-threaded grammars a bit further, then we will move on to concurrent testing. We’ll be repeating the testing process I laid out above, except that we will let the multiple threads run to completion, say five to ten thousand queries apiece, and then replicate and validate. At the moment, we’re shooting for testing to be complete in time for next week’s milestone release.

Code coverage – now with branches!

We have now upgraded our lcov testing to take advantage of lcov 1.9’s branch coverage.

While code coverage isn’t the be-all and end-all of testing, it is a very useful tool in helping us target areas that could use more tender, loving care (by which I mean beating them mercilessly with our test tools).  It doesn’t prove completeness of testing – it merely helps us move in that direction : )  The addition of branch-level coverage gives us another dimension to help us expand our testing.  We’re also making use of the --demangle-cpp option to produce neater function names for the function-level coverage.

You can check out the updated reports here.  We gather the code coverage of our test suite with every push to trunk and store the data for general analysis.
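If you want to produce a similar report against your own tree, the basic recipe looks roughly like this (a sketch only – exact flags and directories can differ between lcov versions and build setups):

lcov --zerocounters --directory .                         # reset any old counters
# build with gcc coverage flags (e.g. --coverage), run the test suite, then:
lcov --capture --directory . --output-file drizzle.info
genhtml drizzle.info --demangle-cpp --output-directory coverage-report
# with lcov 1.9, branch data is gathered alongside the line/function data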

In other news, Brian has added two new functions for testing.  These changes were a part of another blueprint to remove print_stack_trace and stack dump. While I haven’t had a chance to do anything with them yet, I think these will be very useful and look forward to seeing what kinds of tests our community will be able to cook up : )

  • Crash the server:

% ./drizzled/drizzled --plugin-add=crash_function

select crash();

  • Shutdown the server:

% ./drizzled/drizzled --plugin-add=shutdown_function

select shutdown();
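As a rough illustration of how these might be scripted from a test harness (the client option and port below are assumptions on my part, not anything from the blueprint):

% ./drizzled/drizzled --plugin-add=shutdown_function &
% drizzle --port=4427 --execute="select shutdown()"   # once the server is up, this should stop it cleanly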

Finally, I have been playing with the syslog plugin and it is awesome!  It is easy to set up and very easy to use.  The data produced will be very useful in testing as well.  An example produced via a randgen run for your enjoyment:

Sep  2 20:06:16 mahmachine drizzled[4340]: thread_id=4 query_id=200380 db="test" query="SELECT    SUM(  table1 . `col_int_key` ) AS field1 FROM  c AS table1  RIGHT  JOIN g AS table2 ON  table1 . `col_varchar_1024` =  table2 . `col_varchar_1024_key`  WHERE table1 . `col_int` <> table1 . `col_int_key`   ORDER BY field1" command="Query" t_connect=1283472376187915 t_start=486 t_lock=367 rows_sent=1 rows_examined=2 tmp_table=0 total_warn_count=0
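Loading it follows the same --plugin-add pattern as the functions above – note that "syslog" as the plugin name here is my assumption, so check the plugin directory for the exact name:

% ./drizzled/drizzled --plugin-add=syslog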



Well, that’s it for now.  We have some exciting work coming up and we look forward to seeing what kinds of awesome plugins are being developed out there.   Now back to hacking.

Testing the data dictionary in a concurrent environment

So, Brian filed a bug the other day, asking me to do more testing of Drizzle’s data dictionary.  Specifically, we wanted to look at how things behave in a concurrent environment, as concurrency is often a killer for table functions and is the thing people are most likely to forget to test.



What we came up with was the following plan:
1)  Generate a test that only looks at data dictionary tables, with several users generating the same queries.
2)  If step 1 looks good, we will slowly introduce background workloads (SELECT / UPDATE / etc) while we continue with the workload from step 1.



This resulted in a couple of new randgen grammars:
data_dict_concurrent_drizzle.yy – this grammar generates nothing but queries against the data dictionary tables.  At present, these are mostly of the variety:
  • SELECT * FROM data_dictionary_table
  • SHOW PROCESSLIST | VARIABLES | TABLE STATUS | etc
This is designed to stress the data dictionary, either alone or with another randgen process generating a background workload.



proclist_subquery_drizzle.yy – this grammar is the same as optimizer_subquery_drizzle (generating *nasty*, subquery-heavy SELECTs), but also allows for SHOW PROCESSLIST commands. This is mainly designed to stress the server / PROCESSLIST.  This grammar is nice as it is a single test that can just be run with several threads.
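To give a flavour of how these grammars get exercised, a run looks roughly like the sketch below (the DSN, grammar paths, and thread/query counts are illustrative values, not our exact harness settings):

# data dictionary stress on its own (step 1 of the plan above)
./gentest.pl --dsn=dbi:drizzle:host=localhost:port=9306:user=root:database=test \
    --grammar=conf/drizzle/data_dict_concurrent_drizzle.yy --threads=50 --queries=100000

# proclist_subquery_drizzle.yy is self-contained – one grammar, several threads
./gentest.pl --dsn=dbi:drizzle:host=localhost:port=9306:user=root:database=test \
    --grammar=conf/drizzle/proclist_subquery_drizzle.yy --threads=25 --queries=100000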



I am happy to report that in a data dictionary-only environment, the server was able to handle things very well.  I ran up to 100 connections at 100k queries per connection, and things looked good.



The other tests are another matter.  While these are somewhat simple tests, they have proven highly effective so far:
  • Bug #627733: Crash in InnodbTrxTool::Generator::populate_innodb_locks (this=0x7f26140046f0) at plugin/innobase/handler/data_dictionary.cc:269
  • Bug #627742: Assertion failed – in drizzled::plugin::TableFunction::Generator::push (this=0x23876c0, arg=<value optimized out>, length=<value optimized out>) at drizzled/plugin/table_function.cc:185
  • Bug #628398: Crash / segfault in copy_fields (join=0x284f868, end_of_records=false) at drizzled/sql_select.cc:6228
  • Bug #628891: Crash / assertion failed – in drizzled::Diagnostics_area::set_eof_status (this=0x7f9f3c2c4258, session=0x7f9f3c2c3b10) at drizzled/diagnostics_area.cc:120



For each of these bugs, data dictionary queries were being executed while another query was also being processed.  It should be noted that our newest team member, Andrew Hutchings, had Bug#627742 fixed in less than 24 hours : )



I still have a few more scenarios to run through, but it appears that we have shaken out most of the bugs in this area.  Our next steps will be to install such tests in our build and test system to prevent regressions / catch new bugs and to fix the remaining crashes noted above.