Tuesday, April 20, 2010

Restoring XML-formatted MySQL dumps

EAVB_VFZUHIARHI To whom it may concern -

The mysqldump program can be used to make logical database backups. Although the vast majority of people use it to create SQL dumps, it is possible to dump both schema structure and data in XML format. There are a few bugs (#52792, #52793) in this feature, but these are not the topic of this post.

XML output from mysqldump

Dumping in XML format is done with the --xml or -X option. In addition, you should use the --hex-blob option otherwise the BLOB data will be dumped as raw binary data, which usually results in characters that are not valid, either according to the XML spec or according to the UTF-8 encoding. (Arguably, this is also a bug. I haven't filed it though.)

For example, a line like:

mysqldump -uroot -pmysql -X --hex-blob --databases sakila
dumps the sakila database to the following XML format:

<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="sakila">
<table_structure name="actor">
<field Field="actor_id" Type="smallint(5) unsigned" Null="NO" Key="PRI" Extra="auto_increment" />
<field Field="first_name" Type="varchar(45)" Null="NO" Key="" Extra="" />
<field Field="last_name" Type="varchar(45)" Null="NO" Key="MUL" Extra="" />
<field Field="last_update" Type="timestamp" Null="NO" Key="" Default="CURRENT_TIMESTAMP" Extra="on update CURRENT_TIMESTAMP" />
<key Table="actor" Non_unique="0" Key_name="PRIMARY" Seq_in_index="1" Column_name="actor_id" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<key Table="actor" Non_unique="1" Key_name="idx_actor_last_name" Seq_in_index="1" Column_name="last_name" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<options Name="actor" Engine="InnoDB" Version="10" Row_format="Compact" Rows="200" Avg_row_length="81" Data_length="16384" Max_data_length="0" Index_length="16384" Data_free="233832448" Auto_increment="201" Create_time="2009-10-10 10:04:56" Collation="utf8_general_ci" Create_options="" Comment="" />
</table_structure>
<table_data name="actor">
<row>
<field name="actor_id">1</field>
<field name="first_name">PENELOPE</field>
<field name="last_name">GUINESS</field>
<field name="last_update">2006-02-15 03:34:33</field>
</row>

...many more rows and table structures...

</database>
</mysqldump>
I don't want to spend too much time discussing why it would be useful to make backups in this way. There are definitely a few drawbacks - for example, for sakila, the plain SQL dump, even with --hex-blob is 3.26 MB (3.429.358 bytes), whereas the XML output is 13.7 MB (14,415,665 bytes). Even after zip compression, the XML formatted dump is still one third larger than the plain SQL dump: 936 kB versus 695 kB.

Restoring XML output from mysqldump

A more serious problem is that MySQL doesn't seem to offer any tool to restore XML formatted dumps. The LOAD XML feature, kindly contributed by Erik Wetterberg could be used to some extent for this purpose. However, this feature is not yet available (it will be available in the upcoming version MySQL 5.5), and from what I can tell, it can only load data - not restore tables or databases. I also believe that this feature does not (yet) provide any way to properly restore hex-dumped BLOB data, but I really should test it to know for sure.

Anyway.

In between sessions of the past MySQL users conference I cobbled up an XSLT stylesheet that can convert mysqldump's XML output back to SQL script output. It is available under the LGPL license, and it is hosted on google code as the mysqldump-x-restore project. To get started, you need to download the mysqldump-xml-to-sql.xslt XSLT stylesheet. You also need a command line XSLT processor, like xsltproc. This utility is part of the Gnome libxslt project, and is included in packages for most linux distributions. There is a windows port available for which you can download the binaries.

Assuming that xsltproc is in your path, and the XML dump and the mysqldump-xml-to-sql.xslt are in the current working directory, you can use this command to convert the XML dump to SQL:

xsltproc mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql
On Unix-based systems you should be able to directly pipline the SQL into mysql using

mysql -uroot -pmysql < `xsltproc mysqldump-xml-to-sql.xslt sakila.xml`
The stylesheet comes with a number of options, which can be set through xsltproc's --stringparam option. For example, setting the schema parameter to N will result in an SQL script that only contains DML statements:

xsltproc --stringparam schema N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql
Setting the data option to N will result in an SQL script that only contains DDL statements:

xsltproc --stringparam data N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql
. There are additional options to control how often a COMMIT should be issued, whether to add DROP statements, whether to generate single row INSERT statements, and to set the max_allowed_packet size.

What's next?

Nothing much really. I don't really recommend people to use mysqldump's XML output. I wrote mysqldump-x-restore for those people that inherited a bunch of XML formatted dumps, and don't know what to do with them. I haven't thouroughly tested it - please file a bug if you find one. If you actually think it's useful and you want more features, please let me know, and I'll look into it. I don't have much use for this myself, so if you have great ideas to move this forward, I'll let you have commit access.

That is all.

Wednesday, April 14, 2010

MySQL Conference 2010 Presentation: Optimizing Stored Routines

Yesterday I delivered my presentation for the MySQL User Conference and Expo 2010: Optimizing MySQL Stored Routines. If you are interested in the slides, you can find them on both the MySQL conference site as well as on slideshare.net. Here's the abstract of my presentation so you can decide if this is interesting for you:
MySQL stored routines (functions, procedures, triggers and events) can be useful. But many casually written stored routines are unnecessarily slow. The main reason is that MySQL does not apply even simple code optimizations to stored routine code. Many developers are not aware of this, and as a result, write stored routine code that can quite easily be tuned, increasing performance by 50%-100% by only applying very straightforward code optimizations.
It was very pleased to see so many people attend: I had the impression that MySQL stored routines are quite impopular, due to performance issues, and a syntax that is often regarded as "clunky", so I didn't expect more than about 20 people to show up. Much to my pleasure, the ballroom was filled for about two-thirds, and I estimate there were 70-something people in the room.

A quick survey of the audience indicated that all of them were in fact using stored routines in production, so I assume they didn't show up out of morbid curiosity :) Interestingly, only few people reported performance issues. It would be interesting to do more research to find out what people are in fact doing with MySQL stored routines. Among yesterday's attendees, there were people using MySQL stored routines for managing user privileges, processing astronomical data, and checking complex dynamic business rules. To be sure - these were all different users - not just one isolated fanatic going wild with stored routines.

Co-incidentally, Domas Mituzas from facebook also mentioned stored routines in his presentation on high concurrency MySQL as a way to reduce the lock gap when performing multiple changes in a single transaction. I'm just saying - perhaps MySQL stored routines aren't that bad at all, they just need more love and dedication from the MySQL developers so they can mature and gain wider applicability.

Recently, I already wrote about a recent improvement in MySQL 5.5, the long anticipated SIGNAL /RESIGNAL syntax. I hope more improvements will follow soon now the dust is settling after Oracle's acquisition of Sun. After hearing Edward Screven unfold Oracle's strategy for MySQL in yesterday's keynote, I can tell you without reservation that I am quite optimistic :)

Anyway - that is all for now. Two days of conference ahead :)

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...