The mysqldump program can be used to make logical database backups. Although the vast majority of people use it to create SQL dumps, it is possible to dump both schema structure and data in XML format. There are a few bugs (#52792, #52793) in this feature, but these are not the topic of this post.
XML output from mysqldump
Dumping in XML format is done with the --xml or -X option. In addition, you should use the --hex-blob option, otherwise BLOB data will be dumped as raw binary data, which usually results in characters that are not valid according to either the XML spec or the UTF-8 encoding. (Arguably, this is also a bug. I haven't filed it though.) For example, a line like:

mysqldump -uroot -pmysql -X --hex-blob --databases sakila

dumps the sakila database to the following XML format:
<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="sakila">
<table_structure name="actor">
<field Field="actor_id" Type="smallint(5) unsigned" Null="NO" Key="PRI" Extra="auto_increment" />
<field Field="first_name" Type="varchar(45)" Null="NO" Key="" Extra="" />
<field Field="last_name" Type="varchar(45)" Null="NO" Key="MUL" Extra="" />
<field Field="last_update" Type="timestamp" Null="NO" Key="" Default="CURRENT_TIMESTAMP" Extra="on update CURRENT_TIMESTAMP" />
<key Table="actor" Non_unique="0" Key_name="PRIMARY" Seq_in_index="1" Column_name="actor_id" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<key Table="actor" Non_unique="1" Key_name="idx_actor_last_name" Seq_in_index="1" Column_name="last_name" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<options Name="actor" Engine="InnoDB" Version="10" Row_format="Compact" Rows="200" Avg_row_length="81" Data_length="16384" Max_data_length="0" Index_length="16384" Data_free="233832448" Auto_increment="201" Create_time="2009-10-10 10:04:56" Collation="utf8_general_ci" Create_options="" Comment="" />
</table_structure>
<table_data name="actor">
<row>
<field name="actor_id">1</field>
<field name="first_name">PENELOPE</field>
<field name="last_name">GUINESS</field>
<field name="last_update">2006-02-15 03:34:33</field>
</row>
...many more rows and table structures...
</database>
</mysqldump>
I don't want to spend too much time discussing why it would be useful to make backups in this way. There are definitely a few drawbacks - for example, for sakila, the plain SQL dump, even with --hex-blob, is 3.26 MB (3,429,358 bytes), whereas the XML output is 13.7 MB (14,415,665 bytes). Even after zip compression, the XML-formatted dump is still one third larger than the plain SQL dump: 936 kB versus 695 kB.
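For reference, a quick way to reproduce this size comparison yourself, assuming the sakila sample database is installed:

# dump sakila as plain SQL and as XML, then compare raw and zipped sizes
mysqldump -uroot -pmysql --hex-blob --databases sakila > sakila.sql
mysqldump -uroot -pmysql -X --hex-blob --databases sakila > sakila.xml
zip sakila-sql.zip sakila.sql
zip sakila-xml.zip sakila.xml
ls -l sakila.sql sakila.xml sakila-sql.zip sakila-xml.zip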
Restoring XML output from mysqldump

A more serious problem is that MySQL doesn't seem to offer any tool to restore XML-formatted dumps. The LOAD XML feature, kindly contributed by Erik Wetterberg, could be used to some extent for this purpose. However, this feature is not yet available (it will be available in the upcoming version MySQL 5.5), and from what I can tell, it can only load data - not restore tables or databases. I also believe that this feature does not (yet) provide any way to properly restore hex-dumped BLOB data, but I really should test it to know for sure. Anyway.
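To give an impression: once 5.5 is out, loading a single table's rows back from an XML dump should look roughly like this sketch (untested; note that the target table must already exist, which is exactly the data-only limitation described above):

# dump just the actor table, then load its rows back with LOAD XML
mysqldump -uroot -pmysql -X --hex-blob sakila actor > actor.xml
mysql --local-infile=1 -uroot -pmysql sakila \
  -e "LOAD XML LOCAL INFILE 'actor.xml' INTO TABLE actor ROWS IDENTIFIED BY '<row>'"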
In between sessions of the past MySQL users conference, I cobbled up an XSLT stylesheet that can convert mysqldump's XML output back to SQL script output. It is available under the LGPL license, and it is hosted on Google Code as the mysqldump-x-restore project. To get started, you need to download the mysqldump-xml-to-sql.xslt XSLT stylesheet. You also need a command-line XSLT processor, like xsltproc. This utility is part of the Gnome libxslt project, and is included in packages for most Linux distributions. There is a Windows port available for which you can download the binaries.

Assuming that xsltproc is in your path, and that the XML dump and the mysqldump-xml-to-sql.xslt stylesheet are in the current working directory, you can use this command to convert the XML dump to SQL:

xsltproc mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql

On Unix-based systems you should be able to directly pipe the SQL into mysql using:
xsltproc mysqldump-xml-to-sql.xslt sakila.xml | mysql -uroot -pmysql

The stylesheet comes with a number of options, which can be set through xsltproc's --stringparam option. For example, setting the schema parameter to N will result in an SQL script that only contains DML statements:
xsltproc --stringparam schema N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql
Setting the data option to N will result in an SQL script that only contains DDL statements:

xsltproc --stringparam data N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql

There are additional options to control how often a COMMIT should be issued, whether to add DROP statements, whether to generate single-row INSERT statements, and to set the max_allowed_packet size.
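Combining a few of those might look like the following (the parameter names commit-count and add-drop are illustrative only - check the stylesheet source for the exact names):

# parameter names here are illustrative, not necessarily the real ones
xsltproc --stringparam commit-count 1000 --stringparam add-drop Y mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql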
What's next?
Nothing much, really. I don't recommend that people use mysqldump's XML output. I wrote mysqldump-x-restore for those people that inherited a bunch of XML-formatted dumps and don't know what to do with them. I haven't thoroughly tested it - please file a bug if you find one. If you actually think it's useful and you want more features, please let me know, and I'll look into it. I don't have much use for this myself, so if you have great ideas to move this forward, I'll let you have commit access.

That is all.
25 comments:
I'm running your script and all I get is a bunch of "Object is a string : " messages in the output.
Any ideas what might be wrong?
I'll do some more testing.
Erik, is that output from xsltproc? Please mail me the command line, a small test XML dump to reproduce and the output at roland.bouman@gmail.com
TIA, Roland.
Hi, I am getting the following error:
Unexpected data type .
Unexpected data type .
Unexpected data type .
@Anonymous, thanks! Strange, the name of the type should be in the message.
Can you send me the dump? Only the schema, no data please. Send it to roland.bouman@gmail.com with some reasonable subject line and body text so you won't get rounded up by the gmail spam filter.
Hi Roland,
very nice, very informative, very competent site (not only this article). In particular this XML-Dump_back_to_SQL-Dump-XSLT hint is right what I was looking for (could have found it using a search engine, but sometimes you errm... "don't find a single tree within the woods", as we say in Germany). I fully agree with your advice not to use XML dumps for 1:1 backups (should be obvious... but XML sounds "good" at first, and it surely is, but not for this kind of purpose).
Also, those other articles dealing with natively implemented MySQL/XPath constructs are just gold! The point is that all of this may be found within the official MySQL docs - but it's nice to find it "rehashed" like this on a silver platter.
So thanks for sharing all this (it happened I bookmarked this site twice already and I do not regret this :) ) !
Wolfram
Hi Wolfram!
thank you very, very much for your kind words! It's very rewarding to get this kind of feedback - much appreciated!
kind regards,
Roland.
The problem with using the XSLT file to transform the XML dump file to SQL is that it doesn't handle large files very well. I have an 877M XML dump file that I would like to convert. I ran out of memory on a Linux box with 4G of physical memory + 6G of VM.
I was hoping the XSLT file would be better than using simplexml_load_file() in PHP (which has the same problem of loading the entire file into memory).
Hi Anonymous!
yes, most XSLT implementations first need to parse the XML and have some in-memory representation of the document to apply the stylesheet to. There is something called streaming XSLT, and I believe the latest release of Saxon implements it. However, I imagine not all types of documents and stylesheets lend themselves to this technique.
Personally, in your case, I would probably try to parse the XML using a SAX parser. To keep it simple I would probably try to use the SAX handlers to generate lines in CSV (or similar) format. You can then load the CSV with for example LOAD DATA INFILE.
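The loading step would then be something along these lines (a sketch; actor.csv stands for a hypothetical file produced by such a SAX-based converter):

# load the CSV produced by the (hypothetical) SAX-based converter
mysql --local-infile=1 -uroot -pmysql sakila \
  -e "LOAD DATA LOCAL INFILE 'actor.csv' INTO TABLE actor FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n'"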
If you don't want to create a custom program for this kind of thing, you could give Kettle (Pentaho data integration) a go. It has a Get Data From XML step, which may help here. Kettle doesn't like large XML files either (or rather, it handles them fine, but it also requires the memory); however, it can use a file buffer to compensate (at a cost in performance). For more info on this approach, see:
http://wiki.pentaho.com/display/EAI/Get+Data+From+XML
extra info for handling large files:
http://wiki.pentaho.com/display/EAI/Get+Data+from+XML+-+Handling+Large+Files
I have an XML file that looks like this (this is what I have to work with) and all I get is a create statement for database "mediaDb". How can I get it to create tables and data rows?
http://www.4shared.com/document/B7FsPIwJ/mediaDb_Sample.html
Hi Steve!
(next time, please use the issue tracker in the google code project to post issues. Thanks in advance)
I downloaded your file and I can confirm your result. But the structure of your XML is different from what I get when I use mysqldump with the -X option. What version of mysqldump did you create this dump with?
Meanwhile, I can cobble up something to make it work for your example - would you like that? But then it would be cool if you could test it and give me some feedback on whether it works. Ok?
Ok Steve, I took a look and I think it can be done, but there's a huge challenge since the format you posted has zero information on data types.
Anyway, I just committed a new version (revision 4) which I hope will fix your problem. But please be aware that I don't see any way to restore the table structures going by this format. I did a best guess to distinguish between numbers and strings; I hope this helps for now.
Hi Roland,
Can I know if there is any particular format for the XML? I tried to do the sqldump from my SQL file, but I got a syntax error.
syamimi,
this is not helpful. If you think you found a bug, please use the tracker on the google code project.
Please provide a description of what you did, and what happened, and - very important - any error messages you received. Would also be great if you could attach the input file, or even better, a small sample from it that still reproduces that error.
Thanks in advance, and kind regards,
Roland.
Hi Roland,
Your script literally saved the day for me.
My task was to develop a script that would scramble confidential information such as personal identity numbers, names, and the like from a MySQL dump taken from production. When scrambled, this data would then be transferred to the test environment for use.
The data contained both simple values in table columns such as 'personalIdentityNumber', and (for the most part) lots of XML in CLOB columns. The confidential fields that had to be mangled were identified from the dump file using regular expressions. This would have been very difficult against the traditional MySQL dump file. The XML, however, provided enough boilerplate to get reliable regexp matches. Indeed, it works fine against simple columns such as
020372-XXXX
and fields containing xml such as
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<personalIdentityNumber>020372-XXXX</personalIdentityNumber>
...
But the problem was that we're running MySQL 5.1, which doesn't yet support XML import. And I couldn't figure out a schema-agnostic (or almost agnostic) way of doing this using MySQL's EXTRACTVALUE etc. functions alone. So I was about to start writing the XSLT, but luckily stumbled upon your blog first!
Thanks!
Toni
Hi Toni!
I'm glad the script works for you. But I didn't quite understand this statement:
"the problem was that we're running MySQL 5.1., which doesn't yet support xml import."
I believe 5.1 is the first version to introduce ExtractValue() and UpdateXML()
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html
Hi Roland,
Yes, EXTRACTVALUE is available in 5.1. Our database schema is still evolving rapidly, and a db import solution based on EXTRACTVALUE and stored procs would require updates to follow the schema evolution. This is much easier when operating on the MySQL XML dump and a set of regular expression templates for the mangling. All I have to do is provide a list of column names (which happen to be the same across several tables in my case) and XML tag names used, and a simple Python script will create and process regular expressions (and invoke mapping functions for substitutions) out of templates such as:
"()([^<]+)()" (for simple values)
and
"(<{tagname}>)([^&]+)(</{tagname}>)" (for xml content)
BTW: I also came across this issue: http://code.google.com/p/mysqldump-x-restore/issues/detail?id=1 when running the script with larger datasets. In our case it was ok to simply substitute single quotes with double quotes, which seemed to work.
The third thing I encountered was that MySQL's XML export is not able to deal with bit(1) datatypes properly. The XML contained strange bytes in place of the bits: 0x0a01 for 1 and 0x0a00 for 0. Substituting these with 0 (0x0a30) and 1 (0x0a31) did the trick.
Toni
Hi Toni!
thanks for the feedback. Do you mean that MySQL generates XML that is not well-formed when outputting bit data type values? If that's the case, it's a bug and it should be reported.
Hi Roland!
In more detail, mysqldump seems to emit a byte 0x00 for "false" and 0x01 for "true" when the column type is bit(1), at least when dumping in XML format. These are clearly non-printable chars, and the resulting file is no longer well-formed XML, at least according to xmllint. Since MySQL allows the use of the decimal digits 0 and 1 in inserts to bit(1) columns, a simple tr replacement did the trick:
cat dump.xml | tr "\000" 0 | tr "\001" 1 > dump2.xml
bit(1) seems to be the default mapping for JPA/Hibernate entities' boolean fields in MySQL.
I'll have a look at bugs.mysql.com and see if I can report the bug.
Toni
Roland,
Actually, the mysqldump flag --hex-blob also works for binary and varbinary datatypes (bit(1), for example), creating a hex string instead of breaking the well-formedness of the XML. So I guess it's the same bug you mentioned in the post.
Toni
Hi Toni,
thanks for investigating and sharing the result. Yeah, I guess hex-blob really should be automatic and mandatory when dumping to xml. Good to know that it works for bit fields as well.
Thanks so much Roland,
This was a great solution for me. I think the "Unexpected data type" errors some received occur when parsing indexes and foreign keys. I received those errors, and they went away when my schema only included columns.
I was upgrading MediaWiki on a Fedora box, and mistakenly saved the database as --xml with the mysqldump program, and of course I noticed my mistake after I dropped all tables in the database! Too bad I was following their upgrade instructions without thinking about what I was doing!
I tried your mysqldump-xml-to-sql.xslt file with the mediawiki database, and obtained the following output.
xsltproc mysqldump-xml-to-sql.xslt file.xml > file.sql
file.xml:22681: parser error : Char 0x0 out of allowed range
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : AttValue: ' expected
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : attributes construct error
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : Couldn't find end of Start Tag field line 22681
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : Premature end of data in tag table_structure line 22674
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : Premature end of data in tag database line 22673
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
file.xml:22681: parser error : Premature end of data in tag mysqldump line 2
<field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="
^
unable to parse file.xml
I know this likely isn't very helpful, but given the fact that it happened to me while upgrading MediaWiki and following their instructions, I'm sure it has happened to many others as well!
Thank you for your help. Just for the record, I was lucky enough to have a tarball backup of the entire system that was only a few days old, and was able to recover the database from the /var/lib/mysql directory in the tar backup, and only lost a *FEW* days of updates!
Live and learn!
Cheers,
Dan
Hi Dan!
"file.xml:22681: parser error : Char 0x0 out of allowed range" < looks like the dump is not valid xml? Can you check with a hex editor if the error message is true? If it is, a bug should be filed for the dump program.
Dumb question: is that line 22681, or character position 22681?
Oddly enough, it appears that the 22681st line does in fact contain NULL characters...
My Notepad++ wouldn't let me copy and paste them here, so I changed them to the word "NUL" just for clarity and example...
The database came from a working MediaWiki installation.
Strange indeed!
Cheers,
Dan