Tuesday, March 30, 2010

MySQL: Partition-wise backups with mysqldump

To whom it may concern,

in response to a query from André Simões (also known as ITXpander), I slapped together a MySQL script that outputs mysqldump commands for backing up individual partitions of the tables in the current schema. The script is maintained as a snippet at MySQL Forge a gist on github.

How it works

The script works by querying the information_schema.PARTITIONS system view to generate an appropriate expression for mysqldump's --where option. The generated command also redirects the output to a file with this name pattern:
<schema>.<table>.<partition-name>.sql
For example, for this table (taken from the MySQL reference manual):
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
the script generates the following commands:
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) < 1960" test members > test.members.p0.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1960 and YEAR(joined) < 1970" test members > test.members.p1.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1970 and YEAR(joined) < 1980" test members > test.members.p2.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1980 and YEAR(joined) < 1990" test members > test.members.p3.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1990 and YEAR(joined) < 18446744073709551615" test members > test.members.p4.sql
Tip: in order to obtain directly executable output from the mysql command line tool, run the script with the --skip-column-names (or -N) option.

Features

Currently, the script supports the following partitioning methods:

Limitations

The LINEAR HASH method is currently not supported, but I may implement that in the future.

Currently I do not have plans to implement the KEY and LINEAR KEY partitioning methods, but I may reconsider if and when I have more information about the storage-engine specific partitioning functions used by these methods.

Finally, I should point out that querying the information_schema.PARTITIONS table is dog-slow. This may not be too big of an issue, however it is pretty annoying. If anybody has some tips to increase performance, please let me know.

Acknowledgements

Thanks to André for posing the problem. I had a fun hour of procrastination to implement this, and it made me read part of the MySQL reference manual on partitioning.

I also would like to thank Giuseppe Maxia (the Datacharmer) for providing valuable feedback. If you're interested in either partitioning or the mysql command line, you should visit his tutorials at the MySQL conference, april 12-15, 2010.

7 comments:

Unknown said...

Hi, if you change the where clause from
"YEAR(joined) < 1960"
"YEAR(joined) >= 1960 and YEAR(joined) < 1970"
to
"joined < '1960-01-01' or YEAR(joined) is NULL"
"joined >= '1960-01-01' and joined < '1970-01-01' AND YEAR(joined) is not NULL"

Then it would also make use of pruning for all partitions accept the first.

For YEAR, 'is [not] NULL' is not neccessary, but for TO_DAYS it is, since all rows where the partitioning expression evaluates to NULL will end up in the first partition.

(Have not tried it though...)

rpbouman said...

Mattias,

thanks for this suggestion! It is a very interesting plan.

Currently I use the partitioning expression as-is from the information_schema.PARTITIONS table.


I guess that I would need to parse those expressions and rewrite them - I am not sure this will be so simple. What worries me a bit is that a simplistic rewrite could introduce an error and mess up the backup.

Do you have a suggestion to solve this issue?

Thanks in advance, and kind regards.

Anonymous said...

The script is where ?

rpbouman said...

Anonymous, the script is at MySQL Forge:

http://forge.mysql.com/tools/tool.php?id=258

Unknown said...

set global innodb_stats_on_metadata=0

see http://www.mysqlperformanceblog.com/2011/12/23/solving-information_schema-slowness/

Unknown said...

set global innodb_stats_on_metadata=0

see http://www.mysqlperformanceblog.com/2011/12/23/solving-information_schema-slowness/

Lee said...

I'm interested in your partition backup script, but the page on forge.mysql.com is no longer available.

Can you post it somewhere else?

Thanks

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...