tag:blogger.com,1999:blog-15319370.post5022183796037634328..comments2024-03-05T11:16:00.846+01:00Comments on Roland Bouman's blog: Restoring XML-formatted MySQL dumpsrpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-15319370.post-15238258787206339292015-11-14T22:01:35.617+01:002015-11-14T22:01:35.617+01:00Oddly enough, it appears that the 22681st line doe...Oddly enough, it appears that the 22681st line does in fact contain NULL characters...<br />My notepad++ would let me copy and paste them here, so I changed them from a to the word "NUL" just for clarity and example...<br />The database came from a working wikimedia installation.<br /><br /><br /><br />Strange indeed!<br /><br />Cheers,<br />Dan<br />tekvaxhttps://www.blogger.com/profile/12684431390309709661noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-69979528391572613362015-11-14T21:39:13.139+01:002015-11-14T21:39:13.139+01:00dumb question is that line 22681?
or character po...dumb question is that line 22681? <br />or character position 22681?tekvaxhttps://www.blogger.com/profile/12684431390309709661noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-17835132741977528842015-11-14T17:31:07.820+01:002015-11-14T17:31:07.820+01:00Hi Dan!
"file.xml:22681: parser error : Char...Hi Dan!<br /><br />"file.xml:22681: parser error : Char 0x0 out of allowed range" < looks like the dump is not valid xml? Can you check with a hex editor if the error message is true? If it is, a bug should be filed for the dump program.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-87534674552210750622015-11-14T01:49:32.675+01:002015-11-14T01:49:32.675+01:00I was upgrading mediawiki on a fedora box, and mis...I was upgrading mediawiki on a fedora box, and mistakenly saved the database as --xml with the mymysqldump program, and of course I noticed my mistake after I dropped all tables in the database! Too bad I was following their upgrade instructions without thinking what I was doing!<br /><br />I tried your mysqldump-xml-to-sql.xslt file with the mediawiki database, and obtained the following output.<br /><br /> xsltproc mysqldump-xml-to-sql.xslt file.xml > file.sql<br />file.xml:22681: parser error : Char 0x0 out of allowed range<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : AttValue: ' expected<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : attributes construct error<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : Couldn't find end of Start Tag field line 22681<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : Premature end of data in tag table_structure line 22674<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : Premature end of data in tag database line 22673<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />file.xml:22681: parser error : Premature end of data in tag mysqldump line 2<br /> <field Field="ar_timestamp" Type="binary(14)" Null="NO" Key="" Default="<br /> ^<br />unable to parse file.xml<br /><br />I know this likely isn't very helpful, but given the fact that it happened to me while upgrading the mediawiki, and following their instructions, I'm sure it has happened to many others as well!<br /><br />Thank you for you help. Just for the record, I was lucky enough to have a tarball backup of the entire system that was only a few days old, and was able to recover the database from the /var/lib/mysql directory in the tar backup, and only lost a *FEW* days of updates!<br /><br />Live and learn! <br /><br />Cheers,<br />Dan <br />tekvaxhttps://www.blogger.com/profile/12684431390309709661noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-54860231311156133172014-08-28T19:09:34.307+02:002014-08-28T19:09:34.307+02:00Thanks so much Roland,
This was a great solution ...Thanks so much Roland,<br /><br />This was a great solution for me. I think the errors some received of "Unexpected data type" occur with errors parsing indexes and foreign keys. I received those errors which went away when my schema only included columns.Lewis Thomasnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-5538524166781869432013-02-12T21:40:06.025+01:002013-02-12T21:40:06.025+01:00Hi Toni,
thanks for investigating and sharing the...Hi Toni,<br /><br />thanks for investigating and sharing the result. Yeah, I guess hex-blob really should be automatic and mandatory when dumping to xml. Good to know that it works for bit fields as well.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-80897061006090041552013-02-12T17:58:11.213+01:002013-02-12T17:58:11.213+01:00Roland,
Actually mysqldump flag --hex-blob also w...Roland,<br /><br />Actually mysqldump flag --hex-blob also works for binary and varbinary datatypes (bit(1) for example), creating a hex-blob instead of breaking well-formedness of the xml. So I guess it's the same bug you mentioned in the post.<br /><br />Toni<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-5132621730750569242013-02-12T16:58:32.740+01:002013-02-12T16:58:32.740+01:00Hi Roland!
In more detail, mysqldump seems to emi...Hi Roland!<br /><br />In more detail, mysqldump seems to emit a byte 0x00 for "false" and 0x01 for "true" when column type is bit(1), at least when dumping in xml format. These are clearly nonprintable chars and the resulting file is no longer well-formed xml at least according to xmllint. Since MySQL allows use of 0 and 1 decimal digits in inserts to bit(1) columns, a simple tr replacement did the trick:<br /><br />cat dump.xml | tr "\000" 0 | tr "\001" 1 > dump2.xml<br /><br />bit(1) seems to be the default mapping for a JPA/Hibernate entities' boolean fields in MySQL.<br /><br />I'll have a look at bugs.mysql.com and see if I can report the bug.<br /><br />Toni<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-34982059223179767942013-02-12T01:36:03.818+01:002013-02-12T01:36:03.818+01:00Hi Toni!
thanks for the feedback. Do you mean tha...Hi Toni!<br /><br />thanks for the feedback. Do you mean that MySQL generates XML that is not well-formed when outputting bit data type values? If that's the case, it's a bug and it should be reported.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-89412863030304072832013-02-12T00:59:09.512+01:002013-02-12T00:59:09.512+01:00Hi Roland,
Yes, EXTRACTVALUE is available in 5.1....Hi Roland,<br /><br />Yes, EXTRACTVALUE is available in 5.1. Our database schema is still evolving rapidly, and a db import solution based on EXTRACTVALUE and stored procs would require updates to follow db schema evolution. This is much easier when operating on MySQL xml dump and a set of regular expression templates for the mangling. All I have to do is to provide a list of column names (which happen to be the same across several tables in my case) and xml tag names used, and a simple python script will create and process regular expressions (and invoke mapping functions for substitutions) out of templates such as:<br /><br />"()([^<]+)()" (for simple values)<br /><br />and<br /><br />"(<{tagname}>)([^&]+)(</{tagname}>)" (for xml content)<br /><br />BTW: I also came across this issue: http://code.google.com/p/mysqldump-x-restore/issues/detail?id=1 when running the script with larger datasets. In our case it was ok to simply substitute single quotes with double quotes, which seemed to work.<br /><br />Third thing encountered was that MySQL's xml export is not able to deal with bit(1) datatypes properly. XML contained strange bytes in place of the bits: 0x0a01 for 1 and 0x0a00 for 0. Substituting these with 0 (0x0a30) and 1 (0x0a31) did the trick.<br /><br />Toni<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-18347239074525011772013-02-11T22:55:37.307+01:002013-02-11T22:55:37.307+01:00Hi Toni!
I'm glad the script works for you. B...Hi Toni!<br /><br />I'm glad the script works for you. But I didn't quite understand this statement:<br /><br />"the problem was that we're running MySQL 5.1., which doesn't yet support xml import."<br /><br />I believe 5.1 is the first version to introduce ExtractValue() and UpdateXML() <br /><br />http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html<br /><br />rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-71833156246938497282013-02-11T16:50:15.044+01:002013-02-11T16:50:15.044+01:00Hi Roland,
Your script literally saved the day fo...Hi Roland,<br /><br />Your script literally saved the day for myself.<br /><br />My task was to develop a script that would scramble confidential information such as personal identity numbers, names, and the like from a MySQL dump taken from production. When scrambled, this data would then be transferred to test environment for use.<br /><br />The data contained both simple values in table columns such as 'personalIdentityNumber', and (for the most part) lots of xml in clob columns. The confidential fields that had to be mangled were identified from the dump file using regular expressions. This would have been very difficult against the traditional MySQL dump file. The xml, however, provided enough boiler plate to get reliable regexp matches. Indeed it works fine against simple columns such as<br /><br />020372-XXXX<br /><br />and fields containing xml such as<br /><br /><?xml version="1.0" encoding="UTF-8" standalone="yes"?><br /><personalIdentityNumber>020372-XXXX</personalIdentityNumber><br />...<br /><br />But the problem was that we're running MySQL 5.1., which doesn't yet support xml import. And I couldn't figure out a schema-agnostic (or almost agnostic) way of doing this using MySQL's EXTRACTVALUE etc functions alone. So I was about to start writing the xslt, but luckily stumbled upon your blog first!<br /><br />Thanks!<br /><br />ToniAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-36885238806006704912011-11-17T09:26:32.990+01:002011-11-17T09:26:32.990+01:00syamimi,
this is not helpful. If you think you fo...syamimi,<br /><br />this is not helpful. If you think you found a bug, please use the tracker on the google code project. <br /><br />Please provide a description of what you did, and what happened, and - very important - any error messages you received. Would also be great if you could attach the input file, or even better, a small sample from it that still reproduces that error. <br /><br />Thanks in advance, and kind regards,<br />Roland.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-72683536286886264482011-11-17T07:21:08.986+01:002011-11-17T07:21:08.986+01:00Hi Roland,
can i know if there is any format for ...Hi Roland,<br /><br />can i know if there is any format for xml? I try to do the sqldump from my sql file but I got a syntax error..syamimihttps://www.blogger.com/profile/16646952193279462053noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-17062277600789116132011-01-18T23:24:32.088+01:002011-01-18T23:24:32.088+01:00Ok steve, I took a look and I think it can be done...Ok steve, I took a look and I think it can be done, but there's a huge challenge since the format you posted has zero information on data types. <br /><br />Anyway, I just committed a new version (revision 4) which I hope will fix your problem. But please be aware that I don't see anyway to restore the table structures going by this format. I did a best guess to distinguish between numbers and strings, I hope this helps for now.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-11863791466402336222011-01-18T21:57:29.040+01:002011-01-18T21:57:29.040+01:00Hi steve!
(next time, please use the issue track...Hi steve! <br /><br />(next time, please use the issue tracker in the google code project to post issues. Thanks in advance)<br /><br />I downloaded your file and I can confirm your result. But the structure of your xml is different from what I get when I use mysqldump with the -X option. What version of mysqldump did you create this dump with?<br /><br />Mean while, I can cobble up something to make it work for your example, would you like that? But then it would be cool if you could test it and give me some feedback if it works. Ok?rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-89982170293315145812011-01-18T21:32:17.293+01:002011-01-18T21:32:17.293+01:00I have a xml that looks like this (this is what I ...I have a xml that looks like this (this is what I have to work with) and all I get is a create statement for database "mediaDb". How can I get it to create tables and data rows?<br /><br />http://www.4shared.com/document/B7FsPIwJ/mediaDb_Sample.htmlstevehttps://www.blogger.com/profile/13041804963458286186noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-71199134168053585102010-11-16T23:38:00.929+01:002010-11-16T23:38:00.929+01:00Hi Anonymous!
yes, most XSLT implementations firs...Hi Anonymous!<br /><br />yes, most XSLT implementations first need to parse the XML and have some in-memory representation of the document to apply the stylesheet to. There is something called streaming XSLT, and I believe the latest release of Saxon implements it. However, I imagine not all types of documents and stylesheets lend themselves to this technique.<br /><br />Personally, in your case, I would probably try to parse the XML using a SAX parser. To keep it simple I would probably try to use the SAX handlers to generate lines in CSV (or similar) format. You can then load the CSV with for example LOAD DATA INFILE.<br /><br />If you don't want to create a custom program for this kind of thing, you could give Kettle (pentaho data integration) a go. It has a Load data from XML step, which may help here. Kettle doesn't like large XML files too (or actually, it handles them fine, but also requires the memory) however, it can use a file buffer to compensate (but of course, it costs performance). For more info on this approach, see:<br />http://wiki.pentaho.com/display/EAI/Get+Data+From+XML <br />extra info for handling large files:<br />http://wiki.pentaho.com/display/EAI/Get+Data+from+XML+-+Handling+Large+Filesrpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-910174038058847732010-11-16T22:46:22.552+01:002010-11-16T22:46:22.552+01:00The problem with using the XSLT file to transform ...The problem with using the XSLT file to transform the XML dump file to SQL is that it doesn't handle large files very well. I have an 877M XML dump file that I would like to convert. I ran out of memory on a linux box with 4G of physical memory + 6G of VM.<br /><br />I was hoping the XSLT file would be better than using simplexml_load_file() in PHP (which has the same problem of opening the entire file in memory).Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-34375993770268970792010-06-07T20:19:06.569+02:002010-06-07T20:19:06.569+02:00Hi Wolfram!
thank you very, very much for your k...Hi Wolfram! <br /><br />thank you very, very much for your kind words! It's very rewarding to get this kind of feedback - much appreciated! <br /><br />kind regards, <br /><br />Roland.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-7573279607839320762010-06-07T19:14:41.332+02:002010-06-07T19:14:41.332+02:00Hi Roland,
very nice, very informative, very compe...Hi Roland,<br />very nice, very informative, very competent site (not only this article). In particular this XML-Dump_back_to_SQL-Dump-XSLT hint is right what I was looking for (could have found it using a search engine but sometimes you errm... "don't find a single tree within the woods" as we use to say in Germany). I fully agree with your advice not to use XML-dumps for 1:1 backups (should be obvious... but XML sounds "good" at first, and it surely is, but not for this kind of purpose). <br />Also those other articles dealing with native implemented MySQL/XPath constructs are just gold! The point is that any of this may be found within the official MySQL docs - but it's nice to find it "rehashed" like this on a silver tablet.<br />So thanks for sharing all this (it happened I bookmarked this site twice already and I do not regret this :) ) !<br />WolframAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-66698303473328589762010-05-26T16:55:51.408+02:002010-05-26T16:55:51.408+02:00@Anonymous, thanks! Strange, the name of the type ...@Anonymous, thanks! Strange, the name of the type should be in the message. <br /><br />Can you send me the dump? Only the schema, no data please. send it to roland.bouman@gmail.com with some reasonable subject line and body text so you won't get round up by gmail spamfilter.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-52044632320339696142010-05-26T15:14:13.150+02:002010-05-26T15:14:13.150+02:00Hi, I am getting following error
Unexpected data ...Hi, I am getting following error<br /><br />Unexpected data type .<br /><br />Unexpected data type .<br /><br />Unexpected data type .Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-15319370.post-67654416329212971692010-04-22T01:38:08.292+02:002010-04-22T01:38:08.292+02:00Erik, is that output from xsltproc? Please mail me...Erik, is that output from xsltproc? Please mail me the command line, a small test XML dump to reproduce and the output at roland.bouman@gmail.com <br /><br />TIA, Roland.rpboumanhttps://www.blogger.com/profile/13365137747952711328noreply@blogger.comtag:blogger.com,1999:blog-15319370.post-22504841849986674122010-04-21T22:31:17.928+02:002010-04-21T22:31:17.928+02:00I'm running your script and all i get is a bun...I'm running your script and all i get is a bunch of "Object is a string : " messages in the output.<br /><br />Any ideas what might be wrong?<br /><br />I'll do some more testing.Erikhttps://www.blogger.com/profile/09997025677558009213noreply@blogger.com