More on kewpie (the query probulator)

My presentation from the MySQL UC didn’t give a lot of detail on the actual tool I have hacked up, nor did it go into how to play with it / try it out. I figured I should rectify that (at least one person seemed interested in trying it out <g>)

To begin with, you should have the random query generator installed (see the docs for handling that). Besides being *the* cutting edge, production-ready testing tool in the open-source dbms world, it comes with a handy data generator.

One of the key features of kewpie, is that it can easily generate test queries against any test bed. A standard randgen practice is to develop grammars and gendata files (which generates a user-specified test-bed) that are designed to work together. By knowing which tables and columns will be available to the randgen at runtime, the grammar writer can tune the randgen to produce a higher percentage of valid queries.

It is possible to just use the built in _field element, which will randomly retrieve some available field, however, being able to match columns by data type (for both joins and comparisons) results in much more interesting queries >:) At some point, the randgen will likely be improved to overcome this, but it is a sad fact of qa that we often spend more time producing tests, than working on beefing up our testing infrastructure.

At any rate, the kewpie demos are designed to work with the random data generator. It is a very cool tool, and one can also use it with –start-and-exit to have a populated test server.

Requirements:

randgen
dbd::drizzle (see randgen docs)
MySQLDB (Drizzle speaks the MySQL protocol. MySQLDB enables us to play well with SQLAlchemy too!)
the demo branch from launchpad: bzr branch lp:~patrick-crews/drizzle/dbqp_kewpie_demo

It is important to remember that kewpie is more of a science project than something you’ll use for testing any time soon. It is meant to help illustrate the power behind large-scale automated query generation and analysis, but it pales in comparison to the mighty, mighty randgen. However, if you are interested, please read on : )

Config files

kewpie runs are controlled via a .cnf file. Python has a very nice ConfigParser library and it seems like a solid way of organizing this information and getting at it. Also, the very well-designed drizzle-automation uses similar files. I’ll just digress a bit here to give big props to Jay Pipes of the fu for his work here. It has informed a lot of the infrastructure work I’ve been doing for Drizzle. : )

test_info section:

[test_info] comment = basic test of kewpie seeding test_schema = test init_query_count = 2 init_table_count = 3 init_column_count = 4 init_where_count = 0 # limits for various values max_table_count = 10 max_column_count = 25 max_where_count = 10 max_mutate_count = 3

This section seeds the initial query population. In the example above, we produce 2 queries that each have 4 columns and use 2 tables (and no WHERE clause). It is an eventual dream to have more fine-grained control over such things, but this was a proof-of-concept as much as anything.

Next we have limits. We don’t want to go over 10 tables, 25 columns (in the SELECT clause), or 10 conditions in the WHERE clause. We also set max_mutate_count so that only 3 mutant queries will ever be produced from a seed. Setting it higher = more variants that are possible from each query.

mutators section:

[mutators] add_table = 5 add_column = 1 add_where = 3

At the moment, kewpie only has 3 ways to manipulate a query – add_table, add_column, and add_where. These should be fairly self-explanatory ; ) The vision is that these will eventually have a variety of parameters that can be set, so that we can one day ask that we only add conditions to the WHERE clause that use an integer column, for example. The numeric values following each mutator name is how we stack the deck in favor of one mutator over another. When we evaluate this section, we create a python list object that contains N occurances of each mutator name, when it comes time to mutate a query, we randomly choose one mutator from the list and then call that method against the query.

test_servers section:

[test_servers] servers = [[--innodb.replication-log]]

As we do in other dbqp tests, we provide a list of python lists. Each sublist represents the server options we want to use for the test server. At present, there is no need to start more than 1 server, though there may be value in altering certain options.

evaluators section:

[evaluators] row_count = True explain_output = False

Currently, we only have the row_count evaluator. This ensures that at least one row of data was returned for a given query. It is surprising how valuable just this tiny filter can be. In Microsoft’s research, they found that purely random systems only produced valid queries 50% of the time. The remainder tended to short out at the parser level. The evaluator is what helps us produce useful queries, the mutators are what help the system hit its evaluation targets (whatever they may be).

Future evaluators can measure code-coverage, server variable effect, log file effect, pretty much anything. We want testing to be flexible and have borrowed heavily from the modular Validator and Reporter design of the randgen.

Now to see it in action!

We are going to take our join.cnf file and seed it so we create 2 initial queries, with 4 columns and 3 tables each. We run this in conjunction with the conf/drizzle/drizzle.zz gendata file (sort of our go-to test bed for the randgen).

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/drizzle.zz join --verbose Setting --no-secure-file-priv=True for randgen mode... 21 Apr 2011 11:38:59 VERBOSE: Initializing system manager... 21 Apr 2011 11:38:59 VERBOSE: Processing source tree under test... 21 Apr 2011 11:38:59 INFO: Using Drizzle source tree: 21 Apr 2011 11:38:59 INFO: basedir: /home/user/repos/kewpie_demo <snip> 21 Apr 2011 11:39:00 INFO: Taking clean db snapshot... 21 Apr 2011 11:39:00 VERBOSE: Starting executor: bot0 21 Apr 2011 11:39:00 VERBOSE: Executor: bot0 beginning test execution... 21 Apr 2011 11:39:00 VERBOSE: Restoring from db snapshot 21 Apr 2011 11:39:00 VERBOSE: Starting server: bot0.s0 21 Apr 2011 11:39:00 INFO: bot0 server: 21 Apr 2011 11:39:00 INFO: NAME: s0 21 Apr 2011 11:39:00 INFO: MASTER_PORT: 9306 21 Apr 2011 11:39:00 INFO: DRIZZLE_TCP_PORT: 9307 21 Apr 2011 11:39:00 INFO: MC_PORT: 9308 21 Apr 2011 11:39:00 INFO: PBMS_PORT: 9309 21 Apr 2011 11:39:00 INFO: RABBITMQ_NODE_PORT: 9310 21 Apr 2011 11:39:00 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var 21 Apr 2011 11:39:00 INFO: STATUS: 1 # 2011-04-21T11:39:00 Default schema: test # 2011-04-21T11:39:00 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 () # 2011-04-21T11:39:00 # Creating Drizzle table: test.A; engine: ; rows: 0 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.B; engine: ; rows: 0 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.C; engine: ; rows: 1 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.D; engine: ; rows: 1 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.AA; engine: ; rows: 10 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.BB; engine: ; rows: 10 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.CC; engine: ; rows: 100 . # 2011-04-21T11:39:00 # Creating Drizzle table: test.DD; engine: ; rows: 100 . 21 Apr 2011 11:39:01 INFO: Executing query: SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text 21 Apr 2011 11:39:01 INFO: EVALUATOR: row_count STATUS: True EXTRA: 1 21 Apr 2011 11:39:01 VERBOSE: ORIG QUERY: SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text 21 Apr 2011 11:39:01 VERBOSE: USING ADD_TABLE mutation 21 Apr 2011 11:39:01 VERBOSE: MUTANT QUERY: SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text RIGHT JOIN B AS table_4 ON table_3.col_text_key = table_4.col_text_not_null <snip>

From this output we can see how the query was executed, evaluated, and mutated. As we wanted, we have 4 columns and 3 tables in the original query and we add extra tables to queries that evaluate well.

Now let’s see what happens when we use a different gendata file. We’ll use one called varchar_drizzle.zz which, surprisingly enough, only uses varchars:

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/varchar_drizzle.zz join --verbose Setting --no-secure-file-priv=True for randgen mode... 21 Apr 2011 11:44:20 VERBOSE: Initializing system manager... 21 Apr 2011 11:44:20 VERBOSE: Processing source tree under test... 21 Apr 2011 11:44:20 INFO: Using Drizzle source tree: 21 Apr 2011 11:44:20 INFO: basedir: /home/user/repos/kewpie_demo <snip> 21 Apr 2011 11:44:20 INFO: Taking clean db snapshot... 21 Apr 2011 11:44:20 VERBOSE: Starting executor: bot0 21 Apr 2011 11:44:20 VERBOSE: Executor: bot0 beginning test execution... 21 Apr 2011 11:44:20 VERBOSE: Restoring from db snapshot 21 Apr 2011 11:44:20 VERBOSE: Starting server: bot0.s0 21 Apr 2011 11:44:20 INFO: bot0 server: 21 Apr 2011 11:44:20 INFO: NAME: s0 21 Apr 2011 11:44:20 INFO: MASTER_PORT: 9306 21 Apr 2011 11:44:20 INFO: DRIZZLE_TCP_PORT: 9307 21 Apr 2011 11:44:20 INFO: MC_PORT: 9308 21 Apr 2011 11:44:20 INFO: PBMS_PORT: 9309 21 Apr 2011 11:44:20 INFO: RABBITMQ_NODE_PORT: 9310 21 Apr 2011 11:44:20 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var 21 Apr 2011 11:44:20 INFO: STATUS: 1 # 2011-04-21T11:44:20 Default schema: test # 2011-04-21T11:44:20 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 () # 2011-04-21T11:44:20 # Creating Drizzle table: test.table0_varchar_150_not_null; engine: ; rows: 0 . # 2011-04-21T11:44:20 # Creating Drizzle table: test.table1_varchar_150_not_null; engine: ; rows: 1 . # 2011-04-21T11:44:20 # Creating Drizzle table: test.table2_varchar_150_not_null; engine: ; rows: 2 . # 2011-04-21T11:44:20 # Creating Drizzle table: test.table10_varchar_150_not_null; engine: ; rows: 10 . # 2011-04-21T11:44:20 # Creating Drizzle table: test.table100_varchar_150_not_null; engine: ; rows: 100 . 21 Apr 2011 11:44:20 INFO: Executing query: SELECT table_1.col_varchar_1024 AS column_1, table_3.pk AS column_2, table_2.col_varchar_1024_key AS column_3, table_3.col_varchar_1024_not_null AS column_4 FROM table10_varchar_150_not_null AS table_1 RIGHT OUTER JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024 = table_2.col_varchar_1024_not_null LEFT JOIN table10_varchar_150_not_null AS table_3 ON table_1.col_varchar_1024_key = table_3.pk 21 Apr 2011 11:44:20 INFO: EVALUATOR: row_count STATUS: True EXTRA: 2 21 Apr 2011 11:44:20 VERBOSE: ORIG QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null 21 Apr 2011 11:44:20 VERBOSE: USING ADD_TABLE mutation 21 Apr 2011 11:44:21 VERBOSE: MUTANT QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null RIGHT JOIN table100_varchar_150_not_null AS table_4 ON table_3.col_varchar_1024_key = table_4.col_varchar_1024_not_null

As you can see, the testbed (created / populated tables have changed). As a result, the generated queries have changed as well. Allowing this kind of flexibility will allow qa engineers to not only look for good queries, but also for interesting query / test bed combinations (sometimes optimizations and code paths executed rely on both) in an easy and automated manner.

Next, we’ll take a look at how to add other things into the mix. Suppose you want to also add WHERE conditions to your generated queries – it is as simple as tweaking the following line in join.cnf from:
[mutators] add_table = 5 add_column = 0 add_where = 0

[mutators] add_table = 2 add_column = 0 add_where = 4

We are now twice as likely to add a WHERE condition as we are to add a table to a query. Let’s see what happens from the exact same command line:

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/varchar_drizzle.zz join --verbose Setting --no-secure-file-priv=True for randgen mode... 21 Apr 2011 11:50:16 VERBOSE: Initializing system manager... 21 Apr 2011 11:50:16 VERBOSE: Processing source tree under test... 21 Apr 2011 11:50:16 INFO: Using Drizzle source tree: 21 Apr 2011 11:50:16 INFO: basedir: /home/user/repos/kewpie_demo <snip> 21 Apr 2011 11:50:16 INFO: Taking clean db snapshot... 21 Apr 2011 11:50:16 VERBOSE: Starting executor: bot0 21 Apr 2011 11:50:16 VERBOSE: Executor: bot0 beginning test execution... 21 Apr 2011 11:50:16 VERBOSE: Restoring from db snapshot 21 Apr 2011 11:50:16 VERBOSE: Starting server: bot0.s0 21 Apr 2011 11:50:16 INFO: bot0 server: 21 Apr 2011 11:50:16 INFO: NAME: s0 21 Apr 2011 11:50:16 INFO: MASTER_PORT: 9306 21 Apr 2011 11:50:16 INFO: DRIZZLE_TCP_PORT: 9307 21 Apr 2011 11:50:16 INFO: MC_PORT: 9308 21 Apr 2011 11:50:16 INFO: PBMS_PORT: 9309 21 Apr 2011 11:50:16 INFO: RABBITMQ_NODE_PORT: 9310 21 Apr 2011 11:50:16 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var 21 Apr 2011 11:50:16 INFO: STATUS: 1 # 2011-04-21T11:50:16 Default schema: test # 2011-04-21T11:50:16 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 () # 2011-04-21T11:50:16 # Creating Drizzle table: test.table0_varchar_150_not_null; engine: ; rows: 0 . # 2011-04-21T11:50:16 # Creating Drizzle table: test.table1_varchar_150_not_null; engine: ; rows: 1 . # 2011-04-21T11:50:16 # Creating Drizzle table: test.table2_varchar_150_not_null; engine: ; rows: 2 . # 2011-04-21T11:50:16 # Creating Drizzle table: test.table10_varchar_150_not_null; engine: ; rows: 10 . # 2011-04-21T11:50:16 # Creating Drizzle table: test.table100_varchar_150_not_null; engine: ; rows: 100 . <snip> 21 Apr 2011 11:50:17 INFO: Executing query: SELECT table_1.col_varchar_1024 AS column_1, table_3.pk AS column_2, table_2.col_varchar_1024_key AS column_3, table_3.col_varchar_1024_not_null AS column_4 FROM table10_varchar_150_not_null AS table_1 RIGHT OUTER JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024 = table_2.col_varchar_1024_not_null LEFT JOIN table10_varchar_150_not_null AS table_3 ON table_1.col_varchar_1024_key = table_3.pk 21 Apr 2011 11:50:17 INFO: EVALUATOR: row_count STATUS: True EXTRA: 2 21 Apr 2011 11:50:17 VERBOSE: ORIG QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null 21 Apr 2011 11:50:17 VERBOSE: USING ADD_WHERE mutation 21 Apr 2011 11:50:17 VERBOSE: MUTANT QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null WHERE table_1.pk >= 'W'

As I said, it is still beta software ; ) However, in all seriousness, we want to be able to generate ‘bad’ queries, but to have the option of not using them and filtering them out of a test if they serve no purpose.

Hopefully, this will give anyone that is interested a better idea of how to play with the code. Development will likely continue, but this is still more of a prototype of how things could be. If you *really* want to test a database, I still highly recommend the amazing random query generator – it is good for blowing things up!

I really want this picture to be shown when anyone searches for 'randgen' ; )

workcenter220

a little bit of everything