Friday, February 23, 2007

Cluster Certification and the MySQL User's Conference

So far, I've never blogged about the work I've been doing for the MySQL Certification Team. Time to change that: I'd like to tell you a little bit more about the things I've been doing since I joined MySQL AB.

I started in July 2006 and from the on, I've been almost exclusively occupied with activities to develop certification for MySQL 5.1 Cluster.
A summary of the things I have done so far:

  • Working with the training department to design an outline for the cluster exam

  • Interviewing MySQL Cluster developers to check all kinds of facts and details of the behaviour of MySQL 5.1 Cluster

  • Creating, reviewing and editing numerous items for the cluster exam

  • Working with the documentation department to align the exam with the development of a MySQL 5.1 Cluster Certification Study Guide

  • Reviewing chapters and exercises for the MySQL 5.1 Cluster Certification Study Guide


Cluster Certification and the MySQL User's Conference


I'm also very excited to be speaking about MySQL Cluster Certification at the MySQL User Conference and Expo (April 23 through 26, in Santa Clara, CA).



I'll be giving a MySQL Cluster Certification Primer on the Monday morning. Kai Voigt, who was an enormous help during the development of the cluster exam, will be giving a MySQL 5.0 DBA I Certification Primer, and Sarah Sproehnle will be giving the MySQL 5.0 DBA II Certification Primer.

If you like, you can get certified at the conference. At the UC, Certification exams cost 25 US$. That's just a little more than a tenth of normal price (200 US$)! I will be around to proctor various MySQL Certification Exams, including the MySQL 5.1. Certified Cluster DBA Exam.

I'll be writing some more about the new MySQL Cluster Certification and the corresponding study guide in a few days. In the mean time, check out the Cluster Section in the MySQL Certification Candidate Guide.

See you at the UC!

Tuesday, February 20, 2007

Updated procedure for creating MySQL FEDERATED tables

Three months ago, I wrote about a procedure to create MySQL FEDERATED tables.

I just added a quick fix that should make the procedure less susceptible to issues relating to bug #23856. To do that, I had to remove the ORDER BY ordinal_position bits in all the calls to GROUP_CONCAT over the rows in the information_schema.COLUMNS table.

If you include the ORDER BY ordinal_position clause, the concatenation result will mess up sometimes. The behaviour can be attenuated somewhat by decreasing value for the group_concat_max_len server variable, but so far, I have not seen a sufficiently large valy of group_concat_max_len that does not display this behaviour.

Omitting the ORDER BY ordinal_position clause does not seem change the order. Rows from information_schema.COLUMNS seem to be ordered by TABLE_SCHEMA, TABLE_NAME, COLUMN and ORDINAL_POSITION anyhow. I hope I can rely on that order, as all sorts of trouble are to be expected when the column order of the local table differs from that of the remote table.

I also added some output so you can see what the procedure is doing. Some steps, like getting the remote metadata, take rather long, and I found some output to be helpful.

The updated procedure can be found here in the usual spot at MySQLForge.

The updated procedure does not yet support creating federated tables using a separate SERVER schema object (see the manual).

Please, try it, and report bugs in the procedure by adding a comment to this blog entry. Thank you!

Sunday, February 11, 2007

MySQL: Transactions and Autocommit

Some people believe that the ability to commit or rollback multiple statements as a single unit is a defining characteristic of the concept of transactions. They draw the -false- conclusion that enabling autocommit is the same as disabling transactions. It is easy to demonstrate why this is not true.

Autocommit


With autocommit enabled, every statement is wrapped within its own transaction. Successful execution of a statement is implicitly followed by a COMMIT, and the occurrence of an error aborts the transaction, rolling back any changes.

By default, autocommit is enabled in MySQL. You can check the current setting by executing the following statement:
mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)

The @@ prefix denotes a server variable. The 1 means that autocommit is currently enabled. Let's leave it enabled so we can test the difference with not using transactions.

Transactions and Storage Engines


First, we create a table backed by a storage engine that is capable of handling transactions. I'm using the new FALCON engine in these examples, but you can try it with any of the storage engines that is capable of handling tranactions, such as InnoDB, ndbcluster, PBXT or SolidDB. If you want to try with falcon, please download and install MySQL 5.2.

Let's create that FALCON table in the test database:
mysql> use test;
Database changed
mysql> create table test_falcon(i int primary key)
-> engine = falcon
-> ;
Query OK, 0 rows affected (0.70 sec)

A word of warning: be sure to check the message after creating the table. If you accidentally type the name of the storage engine wrong, a table might be created anyway. If you see a warning after creating the table, always check the warning:
mysql> create table t(i int) engine = ferrari;
Query OK, 0 rows affected, 1 warning (0.14 sec)

mysql> show warnings;
+-------+------+--------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------+
| Error | 1286 | Unknown table engine 'ferrari' |
+-------+------+--------------------------------+
1 row in set (0.00 sec)

You can completely avoid this problem by including NO_ENGINE_SUBSTITUTION in the SQL_MODE. (In fact, I recommend to include it and make sure the proper mode is set when starting the server by including it in the [mysqld] section of your my.ini or my.cnf option file)

Testing Atomicity


Now, we execute a single statement that inserts two rows of data into the table:
mysql> insert into test_falcon values (1),(2);
Query OK, 2 rows affected (0.05 sec)
Records: 2 Duplicates: 0 Warnings: 0

The query executed succesfully, so we can assume two rows are now stored in the table. We can of course attempt to roll back to undo these changes:
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

Of course, when we check, the rows are still there:
mysql> select * from test_falcon;
+---+
| i |
+---+
| 1 |
| 2 |
+---+
2 rows in set (0.02 sec)

This is completely normal and expected behaviour. The rows were automatically committed right after the insert statement was executed: that is the meaning of autocommit.

We will now attempt to add three more rows to the table:
mysql> insert into test_falcon values (3),(4),(1);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

Whoops! This statement did not execute succesfully. We attempted to insert three rows, but at least one of them would violate the PRIMARY KEY constraint if it would be inserted. So at least, the violating row is rejected. But what about the other rows? Let's find out:
mysql> select * from test_falcon;
+---+
| i |
+---+
| 1 |
| 2 |
+---+
2 rows in set (0.00 sec)

Aha! The statement did not execute successfully, and therefore, all changes that would result from executing that statement are undone. With a posh word, we can say that the statement was executed atomically. The property of transactions that allows for atomic execution of statements is called atomicity

To say that the statement was executed atomically just means that the entire statement, and all of the changes it would achieve is executed as one single unit, undividable. If it succeeds, all associated changes on all rows it touches are made permanent. If it fails, none of the changes are made.

So, Atomicity ensures that statement execution is always complete: It either completely succeeds, or it completely fails. How does this compare to not using transactions at all? Well, we can test this with the MyISAM storage engine. The MyISAM storage engine does not implement transactions, and has no understanding of the autocommit setting. It is completely ignorant of it, because it can't commit or rollback anyway.

So, let's create a table again, this time using the MyISAM storage engine:
mysql> create table test_myisam (i int primary key)
-> engine = myisam
-> ;
Query OK, 0 rows affected (0.00 sec)

Again, we insert two rows in a single statement.
mysql> insert into test_myisam values (1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

Now for the interesting bit. Again, we execute one statement in order to insert multiple rows, and again, we deliberately construct the statement in a manner to violate the primary key constraint:
mysql> insert into test_myisam values (3),(4),(1),(5);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

Like with the test_falcon table, this statement is not acceptable, because there is again at least one row that would violate the PRIMARY KEY constraint if it would be inserted. So, how is this different from the previous test? Let's find out by looking what's in the table...
mysql> select * from test_myisam;
+---+
| i |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
4 rows in set (0.00 sec)

Boom! The last statement attempted to insert no less than four rows, yet it added only two! Why only two? Well, after inserting two rows (3 and 4) succefully, a row was encountered that would violate the primary key constraint if it would be inserted (the value 1 was already present). At that point, an error condition was raised, and execution stopped. The remainder of the rows (5) was not even processed.

It's impossible give a clear answer to the question whether the statement succeeded or failed. We ordered four rows to be inserted, yet it failed to insert two of them. Clearly, the statement did not succeed completely. It did not fail completely either, because it succeeded in inserting two of the four rows.

This indeterminate state is the consequence of the lack of atomicity. The MyISAM engine does not support atomic execution of statements. The statement may succeed for some of the rows, and fail for others. This is simply impossible for the transactional engines. Statements on transactional statements are always executed atomically, regardless of the setting of autocommit.

Disabling Autocommit


So, I hope it's clear now why autocommit has everything to do with transactions. Still for a lot of people, the added value for using transactions is the ability to execute multiple statements within a single transactions. To do that, one needs to disable autocommit.

Within an application's session it is pretty easy to disable autocommit. One just needs to execute the proper SET statement:
SET autocommit=0

However, a lot of people are not really satisfied with this solution. Especially those that are used to Oracle, is seems daft to have to explicitly specify it.

There is a trick one can use to automatically disable autocommit for each newly created session. In the my.ini or my.cnf configuration file, you can specify the init_connect option. This option can be used to automatically execute a statement or sequence of statements at the beginning of each new session.
[mysqld]
init_connect='SET autocommit=0'

It's documented in the manual, and the example happens to use autocommit. Take heed though: the statements specified for the init_connect option are not executed for users that have the SUPER privilege.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...