Tuesday, February 26, 2008

Online VMWare image Creator Service

Note to self: don't hack up .vmx files manually anymore. Instead, generate the entire image including vmdk's here: http://www.easyvmx.com/

Saturday, February 16, 2008

c,mm,n - Open Source defining the future of Mobility

cmmn-logo-1Today was at the faculty of Industrial Design in Delft to attend the first 'c,mm,n Garage' event. c,mm,n - pronounced as common - is a project to develop sustainable mobility. It is led by Stichting Natuur en Milieu, The Netherlands Society for Nature and Environment and currently the main participants are students from a number of Dutch technical universities.

Right now, engineering efforts are focused on a clean, remanufacturable car that runs on electricity generated by a hydrogen fuel cell: Cmmn
The car is almost completely built out of biodegradable plastics, making it extremely light and environmentally friendly.
There is a YouTube Video available that shows off the exterior design and which gives you a good impression of how open and spacious the design is. For example, the windshield extends very far to the front, giving somewhat the impression if sitting inside a helicopter.

One of the coolest things of the project is that it is open source. For example, the car's blueprints, composition of materials, construction of components, etc. is released under an open source license. (Not sure which one exactly). Check out the Developer's Wiki.

Another cool thing of this project is that it doesn't just focus on products like vehicles. Instead, the people involved in the project are trying to define mobility services (the lease company Athlon is already experimenting with this), and also how community participation can change the ways in which we are mobile.

The idea behind a 'mobility community' is something that very much intrigues me. It is kind of hard to explain the ambitions of the project, as they are very far-stretching. But let me try and give a tangible example.

One of the things that will be implemented in this car is that it will have on-board internet access to exchange routes with other people in your part of the mobility community. Through the network, you can catch up with each other and participate in a so-called 'platoon': basically, several cars form a train, which is controlled by the car in front. The other cars switch on an automatic pilot, allowing the fellow travelers to stop driving and do some work or chat or whatever. The interior of the car is designed in a manner that it allows the seats to rotate in order to support this.

Allowing the other drivers to do something more useful than driving is of course great in itself, but what is really terrific is that the car is designed so that riding in a platoon will actually reduce fuel consumption even more. So it is not only convenient, it will actually be more efficient. I can see all kinds of applications for this platoon-riding. Think of a taxi service or hotel shuttle service...Instead of being cramped up with many people in one van, dropping off everybody sequentially, you can now leave as train (or even form one as you are going) and let the cars with individual destinations bud off the train, taking those passengers to the exact desired location, after which the cars travel back to join a new train that's underway.

Monday, February 11, 2008

Reporting MySQL Internals with Information Schema plug-ins

Last week, I described how to use the MySQL plug-in API to write a minimal 'Hello world!' information schema plug-in. The main purpose of that plug-in is to illustrate the bare essentials of the MySQL information schema plug-in interface.

In this article, I'd like to take that to the next level and demonstrate how to write an information schema plug-in that can access some of the internals of the MySQL server. For this particular purpose, we will focus on a plug-in that reports all the SAVEPOINTs available in the current session. This MYSQL_SAVEPOINTS plug-in may be of some value when debugging scripts and stored routines that rely on complex scenarios using transactions and savepoints.

In a forthcoming article, I will describe a few information schema plug-ins that are arguably more interesting, such as a plug-in to list the currently existing TEMPORARY tables, user-defined variables, and the contents of the query cache. Although the plug-in described in this article may be of some use, its main purpose is to illustrate the minimal requirements for plug-ins that can access the server's internals.

A Quick Recapitulation

You might recall that:

  • The MySQL plug-in API is one of the new features in MySQL 5.1, and forms a generic extension point of the MySQL database server, allowing privileged database users to add functionality to the MySQL Server by loading a shared library from the plug-in directory

  • Loading and unloading a plug-in is a completely dynamic process controlled using the INSTALL PLUGIN and UNINSTALL PLUGIN syntax, and does not involve compiling the server or even restarting it

  • There are several types of plug-ins, the most well-known being storage engines and full-text parsers. Less well-known types include information schema and daemon plug-ins.

  • An information schema plug-in provides the implementation of a table (or actually, a system view) in the information_schema database

  • Plug-ins are usually implemented in C/C++. To implement a plug-in, the implementor must include the header file plugin.h and provide an initialized instance of the st_mysql_plugin structure. In addition, the implementor must provide code to implement the plug-in type dependent part of the interface

  • The plug-in type dependent part of the interface for information schema plug-ins consists of two things: the column definitions of the information schema table and a fill_table function that is called whenever the server wants to retrieve the rows of data from that table.

How Information Schema plug-ins can access MySQL Server internals

Before we discuss the MYSQL_SAVEPOINTS information schema plug-in in detail, let's take a look at the way information schema plug-ins can obtain access to the internals of the MySQL server.

Like I just recapitulated from last week's article, the plug-in type dependent inferface for information schema plug-ins consists of two things:
  • An array of ST_FIELD_INFO structures, each of which defines a column of the information schema table
  • A fill_table function that is called by the server when it needs to retrieve the data from table
The column definitions may be considered a static part of the interface - they simply define the structure of the table - no more, no less. The fill_table function is a different matter. Let's take a look at the signature of the signature of that function:

int fill_table(THD *thd, TABLE_LIST *tables, COND *cond);
The TABLE_LIST *tables argument provides the handle to the information schema table that is being filled, and the COND *cond argument represents the WHERE condition of the SQL statement that is currently being handled, allowing the fill_table function to directly filter rows (instead of relying on the query execution engine to do that). As such, these arguments are occupied with the actual delivery of rows of data to the server.

The first argument to the fill_table function offers all kinds of interesting opportunities to see what is going on inside the server. We will discuss it in more detail in the next section.

Public accessors to the current THD instance

The first argument to the fill_table function is THD *thd. This is the so-callled thread descriptor - something that is best thought of as a handle to the current session. Note that in a MySQL context, the terms connection, thread and session are often used interchangeably. However, I find the term thread too broad, and the term connection too narrow. As there are many parts in THD that maintain state regarding the events that occurred since a connection is established, it seems most sensible to think of THD as the server-side implementation of a session.

This session handle or thread descriptor has the form of a pointer to an instance of the THD class. The plugin.h header file contains a forward declaration to this class, but the actual declaration is contained in sql/sql_class.h. The THD class is one of the key data structures in understanding the workings of the MySQL server as it is passed as an argument to many internal server functions. Consequently it provides a wealth of possibilities to create interesting new information schema plug-ins. In fact, the number of possibilities are so great, that a number of common usages has been explicitly set aside in the plugin.h header file.

The plugin.h header file contains a number of function declarations and macros that provide access to the members of a THD instance. I will not discuss all of them here, but highlight just a few just to give you an idea:

  • thd_test_options() - Find out which options are set. This can be used to find out whether a number of boolean options like big_tables, (general and binary) logging, and autocommit are enabled or disabled.

  • thd_proc_info() - Should be used by the plug-in implementor before starting a potentially time-consuming operation so the rest of the world can monitor what this session doing. The code set here corresponds to the value reported in the State column by the SHOW PROCESSLIST statement

  • thd_killed() - Can be used by the plug-in implementor to find out if the thread in which this session lives was killed. If the plug-in is involved in a potentially time-consuming process, the plugin-in implementor should periodically check this and gracefully abort the plugin-ins work when it detects that the thread was killed.

  • thd_alloc() - Allocates some memory from this session's memory pool. If the plug-in requires some small amount of memory, plug-in implementors should use this rather than the standard malloc() function. Calling thd_alloc(); is likely to be faster because it takes memory out of a pre-allocated pool, reducing contention. In addition, it is more convenient because the memory need not be explicitly freed: it is automatically reclaimed by the pool after handling the current statement.

If you are interested in seeing all these declarations, just open plugin.h and look for comments like this:

Miscellaneous functions for plugin implementors

plugin.h describes a public interface

In plugin.h, the declarations as described in the previous section together form a public interface to the current session. They are there for the convenience of plug-in implementors and represent a 'safe' way to work with the THD pointer passed to the fill_table function.

To say that these form a public interface is to stay that these are officially supported by MySQL AB. That is: they will be supported officially once the MySQL 5.1 Server is a generally available release. From that point on you can rely on these functions when writing plug-ins in the sense that you do not have to be afraid that they will change. At least, the public interface will remain the same for all forthcoming builds of the 5.1 server. Any interface changes in future releases will involve a proper process, giving everybody the chance to update their code well in time.

Unfortunately, not every function declaration in plugin.h has source code comments. This means that for now, you sometimes need to do some digging in the server's source code to find out what you can do with them. I admit that this situation is not exactly perfect. However, the matter has been reported as a bug, and hopefully, it will be adressed soon.

Beyond the public interface

I just described the public interface plug-in implementors can rely on. A distinct advantage of the public interface is that it takes away a lot of the complexity of the underlying internals of the MySQL Server. However, there will always be cases where the public interface does not offer the features you really need. In those cases, you simply need to be able to work directly on the server internals.

The advantage of directly referencing the server's internals is that you can access all the interesting nuts and bolts and bits and pieces. The downside is that there is absolutely no guarantee that your code will work in another version of the server. The internals are by definition the parts that are not meant to be exposed. As such, it is possible that your code does not work or behaves unexpectedly in another version of the server.

Let's not dwell too long on the disadvantages. Instead, let's focus on the merits of pluggable information schema tables. Granted, it is inconvenient that we may need to make our code resilient to each different build of the server. However, for many applications, it is not very likely that we have to constantly do that.

Even if we do have to change our code, the burden will be on the developer of the plug-in. For each specific build of the server, your code may need to be different. Even if the code itself does not change, you will probably at least have to recompile your plug-in for each specific build of the server. However, your users are still not required to recompile the server itself. They can still install the plug-in without stopping or restarting the server, which in many cases seems more important than bearing the burden of changing the code.

You need to break some eggs to bake an omelet - so if you're hungry, you better get over it and start breaking some eggs ;-)


In order to access the server's internals beyond the public interface, we need to use some C/C++ preprocessor magic and define MYSQL_SERVER. This define needs to be present before we include any MySQL header (or source) files:

Throughout the MySQL codebase, there are many sections that are conditionally included or excluded depending on whether MYSQL_SERVER is defined. It is hard to pinpoint the exact effect of adding this definition, because there many spots that use this definition to control conditional compilation.

Normally the MYSQL_SERVER definition need be present only when compiling the server proper, but in this case we need it to let the plug-in code work with internal structures such as THD instances directly, that is, without using the accessors provided by the public interface.

To be absolutely clear: using the MySQL_SERVER define in your code does not mean you must compile your plug-in as part of the server. On the contrary - you can compile your plug-ins separately from the server, and still (un)install them at runtime. The only thing the MySQL_SERVER define does, is pull in the declarations that are normally considered to be 'internal'. They will for example allow us to work directly with the members of the THD class instead of being required to use the public accessors defined in plugin.h.

Implementing the MYSQL_SAVEPOINTS Information Schema plug-in

Now that we sketched the backgrounds, we can quickly proceed and discuss the implementation of the MYSQL_SAVEPOINTS information schema plug-in. (Note that you can download the source code file mysql_is_savepoints.cc here.)

Most of the things are rather similar to what was described in the article describing the MYSQL_HELLO plug-in, for which you can still download the mysql_is_hello.cc source code.

We will do like we did last week and assume the following things are in place on your system:

Creating the source file

First, we need to create a C++ source file. We will assume that the working directory is ~/mysql_is_savepoints/, and that the source file is called mysql_is_savepoints.cc.


Like we explained in the previous sections, we need to define MYSQL_SERVER so we can directly reference the members of the THD class passed to our fill_table function.

Because this affects how the included files are processed, we do this at the very top of our source file.

Include files

We can use the same list of includes we used for the MYSQL_HELLO plug-in - the MYSQL_SERVER define is responsible for including all the additional things we require to write the MYSQL_SAVEPOINTS plug-ins.

#include <mysql_priv.h>
#include <stdlib.h>
#include <ctype.h>
#include <mysql_version.h>
#include <mysql/plugin.h>
#include <my_global.h>
#include <my_dir.h>

Defining the columns

For the MYSQL_SAVEPOINTS plug-in, we will define two columns: SAVEPOINT_ID and SAVEPOINT_NAME. At the SQL level, it will look something like this:

| Field | Type | Null | Key | Default | Extra |
| SAVEPOINT_ID | bigint(0) | NO | | 0 | |
| SAVEPOINT_NAME | varchar(64) | NO | | | |
...and this is what it looks like in the C/C++ source file:


static ST_FIELD_INFO mysql_is_savepoints_field_info[]=
{"SAVEPOINT_ID", 0, MYSQL_TYPE_LONGLONG, 0, 0, "Savepoint Id"},
{"SAVEPOINT_NAME", 64, MYSQL_TYPE_STRING, 0, 0, "Savepoint Name"},
This time, in addition to creating the ST_FIELD_INFO array of column definitions, we also create #defines for the array entry indexes. The defines allow us to refer to the column definitions using the names rather than the raw, literal integer array indexes. This has the advantage that we do not have to change code should we want to change the positions of the columns. Another advantage is that our fill_table code will be easier to read: by consistently referring to COLUMN_SAVEPOINT_ID and COLUMN_SAVEPOINT_NAME rather than 0 and 1 it will be much easier to see what is going on.

Filling the table

Now we come to the heart of the matter: generating a row for each SQL SAVEPOINT available in the current session.

The savepoints for the current session are available in the transaction member of the THD class. The transaction member is an instance of the st_transactions struct, which is declared locally inside the THD class:

class THD :public Statement,
public Open_tables_state

...many, many lines here...


struct st_transactions {
SAVEPOINT *savepoints;

...a few more lines here...

} transaction;

...many, many more lines here ...

Now you might recall that the THD class is declared in sql/sql_class.h. However, you might have some trouble locating the transaction member, because the declaration of the THD class is extremely large and long-winded: in the MySQL 5.1.22-rc source distribution, it ranges from lines 960 to 1886(!!) - and those 900 something lines make up only the declaration!

(Although the official explanation for the name of the THD class is that it is an acronym for THread Descriptor, some developers* explained that it is one of the few class names that is spelled in capitals because it is so incredibly heavy. According to this anecdote, its name should be pronounced as "...THUD!!...THUD!!..." because of the sound it makes each time it is dumped into the argument list of a function that makes up the servers source code.

* = Thanks to Eric Herman for painting this creative and tangible likeness ;-)

Anyway, you will find it easier when you look for st_transactions, or go directly to line 1149, but note that the line number is likely to be different in other versions of the server code.

Now, we can see that the st_transaction struct contains a pointer to a SAVEPOINT pointer called savepoints. As we shall see later, this is actually the list of savepoints we need. But what kind of type is this SAVEPOINT exactly?

Well, to get past this point, you really need some patience and a set of tools that allow you to search the source code. In the case of SAVEPOINT, it turns out that this is actually a typedef for the st_savepoint structure. Now, the odd thing is that this typedef appears in sql/handler.h:

typedef struct st_savepoint SAVEPOINT;
But the structure st_savepoint itself is declared in sql/sql_class.h - that is, the same file that declares THD, which seems to prefer SAVEPOINT rather than st_savepoint!

Well - it is beyond me why it was done like this. For our purpose it doesn't really matter though, let's examine the declaration of st_savepoint instead:
struct st_savepoint {
struct st_savepoint *prev;
char *name;
uint length, nht;
Here we can see that each st_savepoint has a char * member called name, which is presumably whatever name the user provided in the savepoint syntax:

mysql> SAVEPOINT my_savepoint;
So in this case, the name member of the st_savepoint instance corresponding to this SQL SAVEPOINT will point to the character string "my_savepoint".

Apart from the name we can also see that each st_savepoint has itself a pointer to another st_savepoint called prev. This suggests a single linked list of savepoints.

This is about all the information we need to implement the fill_table function. So, here it is:

int mysql_is_savepoints_fill_table(THD *thd, TABLE_LIST *tables, COND *cond)
int status = 0; /* return value for this func, 0=success, 1=error*/
CHARSET_INFO *scs= system_charset_info; /* need this to store field into table */
TABLE *table= (TABLE *)tables->table; /* handle to the I_S table. class declared in table.h */
uint savepoint_id = 0;

SAVEPOINT *sv= thd->transaction.savepoints;

while(sv && !status)
/* store the savepoint sequence into the table column */
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
/* store the savepoint name into the table column */
table->field[COLUMN_SAVEPOINT_NAME]->store(sv->name, strlen(sv->name), scs);

status= schema_table_store_record(thd, table);
sv= sv->prev;
return status;
The biggest difference with the fill_table function we used in the MYSQL_HELLO example is that instead of just storing one single row, we have a loop, storing one row for each iteration. The loop is initialized by assigning the SAVEPOINT pointer from the transaction member from the THD instance that is passed as the first argument to the fill_table function to a local sv variable:

SAVEPOINT *sv= thd->transaction.savepoints;
Of course, it is possible that there are no savepoints in the current session, in which case thd->transaction.savepoints will be the NULL pointer. However, if there are savepoints, a pointer to the last savepoint that was created in the current session will now be stored in sv. We can now set up the actual loop:

while(sv && !status)

...lines here...

status= schema_table_store_record(thd, table);
sv= sv->prev;
Note that the loop will be entered only if sv points to a savepoint. If it does, data from the savepoint is written to the columns of our information schema table.

In the bottom of the loop, we store the current record using the schema_table_store_record, which we discussed already for the MYSQL_HELLO example:

status= schema_table_store_record(thd, table);
Interestingly, we were required to make a forward declaration to it in the MYSQL_HELLO example. Now, we don't have to do this, presumably because we defined MYSQL_SERVER.

You might recall that schema_table_store_record function returns 0 in case of success and 1 instead of failure. Note that if a failure occurs at this point, the loop will not iterate again, as the while condition requires status to be not true (that is, zero).

After storing the row, the last step of the loop is to move back and examine the previous savepoint:

sv= sv->prev;
If the end of the list is reached, sv will be NULL, preventing the loop to iterate again. However, if there is in fact a previous savepoint, the loop will run once again and create a new row for that savepoint too, on and on until we reach the end of the list of savepoints.

In the top of the loop, we store data into the columns of our information schema table:

/* store the savepoint sequence into the table column */
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
/* store the savepoint name into the table column */
table->field[COLUMN_SAVEPOINT_NAME]->store(sv->name, strlen(sv->name), scs);
This time, we use our defines COLUMN_SAVEPOINT_ID and COLUMN_SAVEPOINT_NAME instead of the literal numerical field indexes. We already demonstrated in the MYSQL_HELLO example how to store a string, so we won't discuss the line that stores the savepoint's name. Instead, let's find out how we can store an integer value by looking at the line that stores the savepoint id.

As you can see, we stipulate the value for the SAVEPOINT_ID column ourselves by simply adding one for each row:

table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
Savepoints by no means have a numerical ID of their own, but it makes sense to make one up in order to unambigously indicate the order in which the savepoints were created during this session. Note the second argument to the store method, which is always 0 here:

table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);

This second argument is there to tell the store method whether the value represents a signed or an unsigned value. In this case, we are storing an unsigned value - it should be 1 for an unsigned value.

The rest of the implementation

The remainder of the implementation is quite similar to what was discussed for the MySQL_HELLO example. The most important difference is actually the plug-in name, but otherwise the implementation is identical. Therefore, it is not discussed here further.

Building and Installing

The build and install process is pretty much similar to that for the MYSQL_HELLO plug-in.


We can compile the plug-in like this:
g++ -DMYSQL_DYNAMIC_PLUGIN -Wall -shared
-o mysql_is_savepoints.so mysql_is_savepoints.cc
This will create the shared library mysql_is_savepoints.so.

Installing the plug-in

You might recall that the shared library needs to be moved to the plug-in directory. After that, we can install the plug-in using the INSTALL PLUGIN syntax:

mysql> INSTALL PLUGIN MYSQL_SAVEPOINTS soname 'mysql_is_savepoints.so';
Query OK, 0 rows affected (0.00 sec)

Using the plug-in

Now we can finally see our plug-in in action. At first, there will be no savepoints present:
mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
Empty set (0.02 sec)
Even if we set one, we won't see it immediately:

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
Empty set (0.00 sec)
This is beause by default, the session has autocommit enabled. As each statement is wrapped in its own transaction, we will never be able to see any savepoints. So we disable autocommit:

mysql> SET autocommit = OFF;
Query OK, 0 rows affected (0.00 sec)
And now we can really witness the behaviour of our plug-in:

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
| 1 | A |
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
| 1 | B |
| 2 | A |
2 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
| 1 | A |
1 row in set (0.00 sec)

Learn More

This has been quite a ride! In this article it was demonstrated how we can use information schema plug-ins to report some of the things that are going on inside the current session. As such, the MYSQL_SAVEPOINTS plug-in is a big step forward compared to the MYSQL_HELLO plug-in.

However, this is just the tip of the iceberg - the current session harbours much more interesting information about the current session, and in a forthcoming article I will demonstrate a number of other usages. In particular, I will present a plug-in to report the user variables in the current session, and the temporary tables defined in the current session.

In another article, we will also see that it is sometimes possible to look beyond the current session and report on the status of server-wide structures, such as the query cache.

Meet us at the user's conference

When I discussed the MYSQL_HELLO plug-in, I already hinted that there will be a lot there for those people that want to learn more about extending the server with (information schema) plug-ins. You can find all those links in the bottom of that article.

In addition, you can learn a lot about the MySQL Server internals. And...you can learn it from one of the founding fathers: Monty himself will be doing A tour into MySQL's internals. So, I guess that's going to be one of those occasions where you get the opportunity to clear up some of those details in the source code you never quite managed to wrap your head around.

If you register before the 26th of februari, you'll get a $200 discount. There are more discounts available depending on whether you attended before, or if you register together with a number of colleagues; there's special student and non-profit discounts too - check it out here.

See you at the conference! (Bonus points for the first one to ask Monty in the Q&A why SAVEPOINT is typedef-ed in sql/handler.h instead of sql/sql_class.h; double bonus points for the first one to ask Monty if THD is really called like that because it is so heavy ;-)

Wednesday, February 06, 2008

The most misunderstood character in MySQL's SQL-Dialect

Many MySQL users like to use the MySQL monitor. The MySQL monitor is the executable bin/mysql (or if you are on MS Windows, bin\mysql.exe) and is perhaps better known as the MySQL command-line tool or MySQL command-line client.

The MySQL monitor

Basically, the MySQL monitor is a standard, text-based terminal that connects to a running MySQL Server when it is started. Once the connection is established, the user can enter text via the computer keyboard, which is at some point sent to the server. The server attempts to interpret the received text as SQL, which is then executed, sending the result back to the MySQL monitor. The MySQL monitor then prints the received results on the screen:

mysql> SELECT COUNT(*) FROM world.City;
| COUNT(*) |
| 4079 |
1 row in set (0.02 sec)
Here we see that immediately after the mysql> prompt, a line of text was entered. In the next few lines we see the result returned by the server. Or, to be really precise, the server returned a resultset consisting of one column and one row having the value 4079. The MySQL monitor is responsible for printing the ASCII art and the padding, printing this in a nice, human readable manner.

Statement terminator

This little dialog may seem monumentally uninteresting, but I hope to show that there really is something here to be learned. Let's focus for a moment on relating this dialog with the general description I gave of the MySQL client/server communication right preceding this example.

I explained that the client sends the text to the server, which interprets it as SQL and returns the result. The 'interesting' thing is this: if the server is the one that interprets text as SQL, how then does the client know when to send the entered text to server?

For many people the answer will be obvious: it is the semi-colon that appears as the last character on the first line of our example, right after the entered text. So the MySQL monitor does not know how to recognize SQL, but it does know how to recognize a semi-colon, and interprets that as a statement-terminator. So, that semi-colon actually never reaches the server - the MySQL monitor just sends the text that appears before the poition where the semi-colon was detected.

The DELIMITER command

We can tell the MySQL monitor to not scan for semi-colons, but for something else entirely. This is done using the DELIMITER command:

mysql> DELIMITER go
mysql> SELECT COUNT(*) FROM world.City
-> go
| COUNT(*) |
| 4079 |
1 row in set (0.02 sec)
So here we see that the MySQL monitor is instructed to scan for go rather than the semi-colon. The line that goes like DELIMITER go is not an SQL-statement: it is a command recognized by the MySQL monitor, and it is never sent to the server.

The semi-colon in stored routines

Most MySQL users that have created stored routines and the like will be familiar with this DELIMITER command.

The reason is that in many cases, stored routines are composed of typical procedural constructs such as BEGIN...END compound statements and flow-control statements such as IF...THEN and WHILE...DO. These types of statements can themselves contain a list of statements, and the server requires each of these statements to be terminated (or actually, separated) by the semi-colon.

Obviously, if stored routine statements like these would be entered through the MySQL monitor, it is no use if the MySQL monitor still scans for semi-colons. It will just send text prematurely to the server, which in turn tries to interpret each as a well-formed SQL statement, usually resulting in a frenzy of errors being returned back to the MySQL monitor.

So, this is probably one of the most abundant usages of the DELIMITER command: to allow stored routines to be entered through the MySQL monitor.

The semi-colon outside stored routines

What many MySQL users don't realize is that it is perfectly valid to separate normal, non-procedureal SQL statements with the semi-colon. Like when entering stored routines, the MySQL monitor needs to have its delimiter set to something other than the semi-colon, but after having done so, you can enter a batch of statements to be sent to the server at once:

mysql> DELIMITER go
mysql> SELECT COUNT(*)
-> FROM world.Country;
-> FROM world.City;
-> go
| COUNT(*) |
| 239 |
1 row in set (0.00 sec)

| COUNT(*) |
| 4079 |
1 row in set (0.00 sec)
This is not some trick of the MySQL monitor: the batch is sent as a whole, and the results are received as a whole too, all in one network rountrip.

Tuesday, February 05, 2008

Errata page for the MySQL 5.1 Cluster Certification Study Guide

ccsg-coverI am very pleased to announce that we - that is, our excellent documentation team - set up the errata page for the MySQL 5.1 Cluster Certification Study Guide!. For those that don't know, this guide is the official and authoritative guide to prepare for the Certified MySQL 5.1 Cluster DBA exam. Apart from being a study guide, it is probably the most up-to-date text on MySQL Cluster.

As more people are buying a copy, we are receiving more and more feedback. It's mostly positive, but one of the things that is becoming more and more clear is that a number of the illustrations aren't as clear in print as they should be. So we decided to put most of these on the errata page, along with corrections to any textual errors.

So, if you bought a copy, take a look. Or if you are just curious what the pictures look like and want to get a feel for what this book is about, feel free to take a look too. Of course, if you have the book and want to notify us of some error or irregularity, feel free to contact me or the team. You can email us at certification at mysql dot com.

Friday, February 01, 2008

MySQL Information Schema Plugins: the best kept secret of MySQL 5.1

MySQL 5.1 offers an extremely useful feature called information schema plug-ins. This feature allows dynamic runtime loading of a shared library into the MySQL server to implement a table in the information_schema database. The SQL standard (ISO/IEC 9075-11:2003) allows database implementations to extend the information_schema. MySQL 5.1 transfers the possibility to do this directly to privileged database users so they can extend the information_schema themselves, in any way they see fit.

In this article, we will demonstrate how to create a minimal "Hello, World!" MySQL information schema plugin. In a forthcoming article, we'll demonstrate how information schema plugins may be used to report some of the server's internals such as the contents of the query cache, session level objects such as the currently defined TEMPORARY tables, user-defined variables and SAVEPOINTs.

The MySQL Plug-in API

Information Schema plug-ins are a subfeature of the MySQL plug-in API.
The plug-in API is one of the new features in the upcoming release of the MySQL database server, MySQL 5.1 (which is currently a release candidate). In essence, the MySQL plugin API provides a generic extension point to the MySQL server. It allows users to load a shared library in order to add new functionality to the server.

Plug-ins can be loaded and unloaded using the MySQL specific INSTALL PLUGIN and UNINSTALL PLUGIN syntax respectively. A key feature is that this process is completely dynamic - the server need not be re-compiled and need not be stopped in order to benefit from the functionality of a new plugin. Hence, new functionality can be added without suffering any downtime.

In some respects, the new plugin feature resembles the since long supported user-defined function (UDF) feature. Both plugins and UDFs involve dynamically loading a shared library to extend the server's functionality. Like UDFs, plug-ins are usually written in C/C++. The difference between UDFs and plug-ins is that the UDF feature can be used only for adding new functions to use within the server's SQL dialect. The concept of a plug-in is more broadly applicable and can be used to extend the server in more ways.

Currently, plug-ins are not supported on Microsoft Windows. MySQL is working to lift this limitation but it is at present unclear when this feature will be available for windows.

Currently supported plug-in types

Currently, the MySQL plug-in API supports the following types of plugins:
  • Storage engine: can be used to implement special-purpose row stores for data, which can then be accessed through SQL. Arguably, the Pluggable Storeage Engine Interface is one of the key benefits of MySQL 5.1
  • Full-Text parser: can be used for custom indexing of text-data as well as specialized handling of FULLTEXT query-expressions.
  • Daemon: a daemon plug-in can be used to execute a background process.
  • Information Schema table: are used to implement a 'virtual' table in the MySQL information_schema to report the status of for example the operating system or the server's internals
Currently the plug-in API does not provide support for UDFs but it is expected that in due time, the current user-defined function feature will be merged into the plug-in API.

A closer look at the plug-in API

(Note: in this article I will repeatedly refer to a number of C/C++ header and source files that are part of the MySQL 5.1.22-rc source distribution. Any path that starts with mysql-5.1.22-rc/ should be taken to mean the corresponding path beneath the root of an unpacked MySQL 5.1.22-rc source distribution.)

For all plug-in types, the interface comprises a generic plug-in description structure, which is a struct called st_mysql_plugin. This struct is defined in the plugin.h header file, located in the mysql-5.1.22-rc/include/mysql directory.

The declaration of this structure is as follows:

Plugin description structure.

struct st_mysql_plugin
int type; /* the plugin type (a MYSQL_XXX_PLUGIN value) */
void *info; /* pointer to type-specific plugin descriptor */
const char *name; /* plugin name */
const char *author; /* plugin author (for SHOW PLUGINS) */
const char *descr; /* general descriptive text (for SHOW PLUGINS ) */
int license; /* the plugin license (PLUGIN_LICENSE_XXX) */
int (*init)(void *); /* the function to invoke when plugin is loaded */
int (*deinit)(void *);/* the function to invoke when plugin is unloaded */
unsigned int version; /* plugin version (for SHOW PLUGINS) */
struct st_mysql_show_var *status_vars;
struct st_mysql_sys_var **system_vars;
void * __reserved1; /* reserved for dependency checking */
Through this structure, the plug-in implementor provides the following things:

  • A number of data members that convey some meta data about the plug-in. These members are used to list the plug-in in the information_schema.PLUGINS system view once it is loaded. Among these is the member char *name. The value that the plug-in implementor provides for the name member is the name that is to be used for the INSTALL PLUGIN and UNINSTALL PLUGIN SQL syntax. For information schema plug-ins, the name member is also used as the table name of the information schema table implemented by the plug-in.

  • Optionally, an array of system variables through which the plug-in may be controlled, and an array of status variables so the plug-in may report its (dynamic) status to the server.

  • pointers to functions that are called upon when the plug-in is loaded (the plugin_init function) and unloaded (the plugin_deinit function)

  • A pointer to an object or structure that provides the actual functionality to implement a plug-in of the specific type

The type-dependent part of a plug-in implementation is usally a struct containing a number of function pointers ('hooks') which are called in a particular sequence that is appropriate for that particular type of plug-in.

Information Schema Plugins

So what is an information_schema plug-in exactly? Well, the information_schema is a virtual database which is primarily intended as a meta data facility: it acts as a container for a collection of read-only 'tables' or rather system views that serve to provide data about the database server itself. As such, it is defined as part of the SQL standard (ISO/IEC 9075-11:2003).

The SQL standard describes a number of views that should appear in the information_schema, and mysql provides partial built-in support for these standard information schema views. The standard also expressly allows database implementors to extend the information_schema by adding new views, or extending the specified tables by adding columns. MySQL information schema plug-ins simply form an interface to allow privileged database users to extend the information schema themselves by writing their own information schema table implementations.

The type-specific API for Information Schema plug-ins

The type-specific part of the plug-in API for information schema plug-ins is formed by the struct called ST_SCHEMA_TABLE, which is defined in mysql-5.1.22-rc/sql/table.h.

When the plug-in is loaded, the server calls the plugin_init function associated with the int (*init)(void *) member of the st_mysql_plugin struct. When called, the server passes a pointer to an instance of a ST_SCHEMA_TABLE struct to the init function. Inside the plugin_init function, it is expected that the plug-in implementor initializes two members of that structure:
  • fields_info - an array of ST_FIELD_INFO structures. Each entry in this array corresponds to a column in the information schema table. The ST_FIELD_INFO structure is also declared in mysql-5.1.22-rc/sql/table.h
  • fill_table - a pointer to a function that is called whenever the server wants to obtain rows from the information_schema table
The ST_SCHEMA_TABLE struct has more members still, but only these two need to be initialized by the plug-in implementor. As long as the plug-in is loaded, the information_schema contains a new table that has a definition corresponding to the column definitions provided by the array assigned to the fields_info member. Whenever that table is queried, the function assigned to the fill_table member is called to actually construct the table's rows.

These two elements really is all there is to the specific API for information schema plug-ins.

Writing an Information Schema Plugin

In this section, I will demonstrate how to write a minimal "Hello World!" MySQL information schema plugin. If you like, you can download the C++ source code.

Apart from a text editor, writing a simple, bare bones information schema plug-in requires no more than a C++ compiler and a number of MySQL's C/C++ header files.

The following examples assume a GNU/Linux environment, a simple text editor, the g++ compiler and the MySQL header files. In order to get the necessary header files it is best to obtain a MySQL source distribution.

(For the purpose of this article, we will assume MySQL 5.1.22-rc. Until MySQL 5.1 is generally available, you are advised to always compile your plug-ins using only the header files shipped with the version of the product whereto the plug-in will be deployed.)

Creating the source file

First, we need to create a C++ source file. We will assume that the working directory is ~/mysql_is_hello/, and that the source file is called mysql_is_hello.cc


In the top of our source file, we need the include the following header files:

#include <mysql_priv.h>
#include <stdlib.h>
#include <ctype.h>
#include <mysql_version.h>
#include <mysql/plugin.h>
#include <my_global.h>
#include <my_dir.h>

The inclusion of mysql/plugin.h is most important, but currently the other includes are pulled in at some point. So far, this set of includes seems to do the trick.

Defining the columns

It was just mentioned that at some point, the plug-in implementor must provide an array of ST_FIELD_INFO structures that defines the columns of the information schema table. For this example, we'll set up a minimal table definition consisting of one VARCHAR(64) column in order to show a "Hello, world" message. If we flash-forward for a moment - this is the structure of the table we'd like to define:

mysql> desc information_schema.MYSQL_HELLO;
| Field | Type | Null | Key | Default | Extra |
| HELLO | varchar(64) | NO | | | |
1 row in set (0.00 sec)
In order to achieve that, we need to declare our ST_FIELD_INFO array like this:

static ST_FIELD_INFO mysql_is_hello_field_info[]=
{"HELLO", 64, MYSQL_TYPE_VARCHAR, 0, 0, "Hello", NULL},
The last entry of this array is a dummy that serves as a marker for the end of the array. It is very important to always conclude the array with one such entry. Without such a trailing entry, the server does not know where the array ends. This would of course be a very bad thing, and is likely to result in a crash as soon as the plugin is loaded or the information schema table is accessed.

Now, the first ST_FIELD_INFO entry in the array defines the actual column. When we look at the declaration of ST_FIELD_INFO in mysql-5.1.22-rc/sql/table.h we get a better idea of what an individual column definition is made of:

typedef struct st_field_info
const char* field_name;
uint field_length;
enum enum_field_types field_type;
int value;
uint field_flags; // Field atributes(maybe_null, signed, unsigned etc.)
const char* old_name;
uint open_method;

  • field_name - this is used as column name

  • field_length - for string-type columns, this is the maximum number of characters. Otherwise, it is the 'display-length' for the column

  • field_type - a value drawn from enum_field_types. This enum is declared in mysql-5.1.22-rc/include/mysql_com.h and denotes a data type for the column. For the most part, there seems to be one entry in the enum for each SQL data type, although there seem to be a number of additional entries in the enum.

  • value - Unfortunately, I haven't got the faintest idea what this is supposed to do. It does not seem to be used by any of the built-in information schema tales

  • field_flags - This is used to set column attributes. By default, columns are NOT NULL and SIGNED, and you can deviate from the default by setting the appopriate flags. You can use either one of the flags MY_I_S_MAYBE_NULL and MY_I_S_UNSIGNED or combine them using the bitwise or operator |. Both flags are defined in mysql-5.1.22-rec/sql/table.h.
  • old_name - I believe this is used by the built-in information schema tables to denote the column name for the corresponding SHOW statement. It's not really applicable to plug-ins I guess, but I always provide a value here, usually a lower case version of the value for the name member
  • open_method - this should be one of SKIP_OPEN_TABLE, OPEN_FRM_ONLY or OPEN_FULL_TABLE. I admit I don't really understand when to choose which option here. It seems likely that it defines in what manner the server must interact with the table, but I can't boast any detailed knowledge in this matter. However, I do know that we can simply use SKIP_OPEN_TABLE for this simple example.

Filling the table

We explained that apart from defining the table columns, the plug-in implementor must also provide a function to actually deliver the rows. Before we can implement the actual fill_table function, we first need a forward declaration of a function that stores one single row in an information schema table:

bool schema_table_store_record(THD *thd, TABLE *table);
This function is defined in mysql-5.1.22-rc/sql/sql_show.cc. It's role will become clear in a moment.

Now we can create the actual fill_table function:
int mysql_is_hello_fill_table(
THD *thd
, TABLE_LIST *tables
, COND *cond
int status;
CHARSET_INFO *scs= system_charset_info;
TABLE *table= (TABLE *)tables->table;

const char *str = "plugin: hello, information_schema!!!";
, strlen(str)
, scs

status = schema_table_store_record(
, table

return status;
The server passes a number of arguments to the fill_table function:

  • THD *thd - this is a pointer to an instance of the thread descriptor class. In practice, you can think of this as a direct handle to the current session. The THD is declared in mysql-5.1.22-dev/sql/sql_class.h.

  • TABLE_LIST *tables - This is a list of st_table objects. The first entry in that list corresponds to the runtime representation of the table we are implementing as it appears in a query. TABLE_LIST and st_table are defined in mysql-5.1.22-rc/sql/table.h. From what I could infer, it seems that in many cases TABLE (which is a typedef of st_table defined in mysql-5.1.22-dev/sql/handler.h) is used rather than st_table.
  • COND - This is used to pass the instance of the Item class that holds the internal representation of the WHERE-clause. This could then be used by the plugin to return only the rows that are required as per the WHERE condition - something like a pushed down condition. We will ignore this argument for now - the plug-in implementor may use it to optimize the fill_table function, but is not required to do so. So ignoring this argument will not lead to wrong results.
The fill method first obtains a handle to the runtime representation of the information schema table proper:

TABLE *table= (TABLE *)tables->table;
Note that TABLE or st_table structure is the runtime representation of a table, i.e. a table as it appears in a SQL query. This is different from the ST_SCHEMA_TABLE structure which is used to define an information schema table.

We then write the message "plugin: hello, information_schema!!!" to the table's column:

char *str = "plugin: hello, information_schema!!!";
, strlen(str)
, scs
So, each TABLE structure (a.k.a st_table) has an array of Field instances which is the runtime representations of table's columns. Here, one of the store methods is called on the Field entry at the 0th index in the field array. Note that the Field at the 0th index corresponds to the 1st column in our SQL table.

Like we just saw for ST_SCHEMA_TABLE versus TABLE, the Field class is a runtime representation of a table column, not to be confused with the ST_FIELD_INFO which merely defines a column.

The Field class is declared in mysql-5.1.22-rc/sql/field.h. It declares a number of overloaded methods to store a value into the current row. In this example we use a variant that is appropriate to store a character string:

/* Store functions returns 1 on overflow and -1 on fatal error */
virtual int store(const char *to, uint length,CHARSET_INFO *cs)=0;
Finally, we get to call the schema_table_store_record function. This finalizes the process of storing a row into the information schema table. This method returns either 0 (in case of success) or 1 (in case of error). Because our example plugin stores just one row, we can simply conclude by returning the result of schema_table_store_record as the result of our fill_table function.

In most practical applications, one would likely have a loop to repeatedly store a row in the table, and one would have to interrupt the normal completion of that loop as soon as the schema_table_store_record returns something not equal to 0.

Putting together the plug-in type-specific implementation

At this point we've taken care of the individial elements to implement an information schema table: we created a ST_FIELD_INFO array to define the columns of the table, and we provided an implementation for the fill_table function to actually deliver the rows. What we still need to do though is hook these elements up with the ST_SCHEMA_TABLE structure that forms the type-specific part of an information schema plug-in.

Like mentioned before, a pointer to a ST_SCHEMA_TABLE instance is passed to the plug-in's init_plugin function. We can now implement the init_plugin function for our specfic plug-in and set it up to use our fields_info and fill_table implementations:

static int mysql_is_hello_plugin_init(void *p)

schema->fields_info= mysql_is_hello_field_info;
schema->fill_table= mysql_is_hello_fill_table;

return 0;
As we can see, there is very little to do here - we simply assign our implementations to the appropriate members of the ST_SCHEMA_TABLE instance, and return 0 to indicate a successful initialization.

We can immediately take care of the plugin_deinit function too:

static int mysql_is_hello_plugin_deinit(void *p)
return 0;
In this case, we can get away with this simple dummy implementation.

In real-world examples, a plug-in might require some resource like memory or a file. In these cases, the plugin_init function would claim these resources, and the plugin_deinit function would be used to free up those resources again.

Putting together the generic plug-in implementation

With the previous step, we concluded the process of creating the plug-in type specific implementation for an information_schema plug-in. The final touch is that we have to hook this up to the generic part of plug-in interface, that is, we have to provide a st_mysql_plugin structure for our plug-in.

Rather than doing so directly, we use the predefined macros mysql_declare_plugin and mysql_declare_plugin_end for that. (Both these macros are defined in mysql-5.1.22-rc/include/mysql/plugin.h):

struct st_mysql_information_schema mysql_is_hello_plugin=

&mysql_is_hello_plugin, /* type descriptor */
"MYSQL_HELLO", /* Name */
"Roland Bouman (http://rpbouman.blogspot.com/)", /* Author */
"Says hello.", /* Description */
mysql_is_hello_plugin_init, /* Init function */
mysql_is_hello_plugin_deinit, /* Deinit function */
0x0010, /* Version (1.0) */
NULL, /* status variables */
NULL, /* system variables */
NULL /* config options */
We pass mysql_is_hello to the mysql_declare_plugin macro, and end our plug-in descriptor with the mysql_declare_plugin_end. As far as I can see, these macros are there to take care of some plumbing to allow definition of multiple plug-ins within the same source file.

An important element in putting together the plug-in descriptor is the assignment of the plugin_init and plugin_deinit functions which we discussed in the previous section. Assigning them here to the appropriate members of the plug-in descriptor ensures that the server knows what it must do in order to instantiate the plug-in.

Another important element is assigning the plug-in name, which we chose to be MYSQL_HELLO. We already explained that this name is later used in the INSTALL PLUGIN and DEINSTALL PLUGIN syntax, and that it is also used as the table name for the information_schema table.

Building and installing the plugin

Now that we have created the source file we must compile it and then install the plugin into our server.

Compiling the plugin source file

Assuming the source file mysql_is_hello.cc is located in the current working directory and /home/user/mysql-5.1.22-rc is the path to the MySQL 5.1.22 source distribution, the following line can be used to compile the source file:

g++ -DMYSQL_DYNAMIC_PLUGIN -Wall -shared
-o mysql_is_hello.so mysql_is_hello.cc

Note that this is all on one line - I added line breaks to make it easier to read. If all goes well, this should result in a shared object file called mysql_is_hello.so.

Apart from the common g++ options -I (include path), -W (warnings, -Wall means "all warnings"), -shared (to compile as a shared, dynamically linkable library), and -o (output file) we see the specific -DMYSQL_DYNAMIC_PLUGIN. The -D option is there to define a constant (like a #define directive). -DMYSQL_DYNAMIC_PLUGIN triggers some conditional compilation magic that allows the relevant plug-in definitions to be exposed so they are visible from a program that dynamically links the shared object file. So, this option is required to make the plug-in pluggable.

The plugin directory

Once we obtained the mysql_is_hello.so shared object file, we must copy it to the plug-in directory of our installed MySQL 5.1.22 binary. Note that it is not necessary to build MySQL from source - you should be able to install the plug-in in a pre-built MySQL installed from a binary distribution.

The exact location of the plug-in directory is dependent upon specific MySQL distribution and configuration. You can find out its current location by querying the value of the plugin_dir system variable:

mysql> show variables like 'plugin_dir';
| Variable_name | Value |
| plugin_dir | /home/roland/mysql-5.1.22-dev/lib/mysql |
1 row in set (0.01 sec)
So in this case, the shared library mysql_is_hello.so should be copied to /home/roland/mysql-5.1.22-dev/lib/mysql.

Installing the plugin

Once the shared library is in place we can install it using the INSTALL PLUGIN syntax:

mysql> install plugin MYSQL_HELLO soname 'mysql_is_hello.so';
Query OK, 0 rows affected (0.00 sec)
Note that we use the name MYSQL_HELLO for the plugin: this is what we defined earlier as the name member of the st_mysql_plugin plug-in descriptor. Likewise, we use mysql_is_hello.so, which is the file name of our shared object file as soname. The plug-in directory is implied - it should not be possible to install a shared library located at any place outside the plug-in directory.

In order to install a plug-in in this manner, the user has to have privileges to INSERT into the mysql.plugin table, or have the SUPER privilege.

There is a common problem that might occur at this point:

ERROR 1127 (HY000): Can't find symbol '_mysql_plugin_interface_version_' in library
If you see a message like this, it is likely that you forgot to include the -DMYSQL_DYNAMIC_PLUGIN option when compiling the plugin. Adding this option to the g++ compile line is required to create a dynamically loadable plug-in.

Verifying installation

We can now check if the plug-in is correctly installed. We do this by querying the PLUGINS table in the information_schema:

mysql> select * from information_schema.plugins
-> where plugin_name = 'MYSQL_HELLO'\G
*************************** 1. row ***************************
PLUGIN_LIBRARY: mysql_is_hello.so
PLUGIN_AUTHOR: Roland Bouman (http://rpbouman.blogspot.com/)
1 row in set (0.00 sec)

Using the plug-in

Finally, we get to test our plug-in ;-)

mysql> select * from information_schema.mysql_hello;
| plugin: hello, information_schema!!! |
1 row in set (0.00 sec)
Of course, this is a gloriously useless application of information schema plug-ins. In a next installment I'll demonstrate that you can do pretty cool stuff with these information schema plug-ins, such as peeking inside the query cache, listing the currently defined savepoints, temporary tables, user variables and much more.

Uninstalling the plugin

When you get tired of the plugin you can uninstall it using the UNINSTALL PLUGIN syntax:

mysql> uninstall plugin MYSQL_HELLO;
Query OK, 0 rows affected (0.00 sec)
Note that currently, due to a bug, you must be sure to use the exact same name for uninstalling the plugin as you did for installing it. I suspect this will be fixed soon, but for now it is best to simply always use the same name, for example the exact name used in the code, MYSQL_HELLO.

Learn More

I will be posting more about information schema plugins shortly. In particular, I will demonstrate how you can report status on server internals such as the query cache to discover which queries are in the cache, the number of blocks they are using and the number of bytes they actually occupying. However, the best way to learn more about extending the MySQL server, the MySQL plug-in API, and the MySQL information_schema is to visit the MySQL user's conference, April 14-17 2008 in Santa Clara CA, USA.

There is a number of great sessions on this and related topics:Of course, once you are at the conference, there will be many MySQL server developers giving you ample opportunity to ask them about some particular details, or maybe have them look at your code. And if you Register by February 26, 2008 you'll save up to $200.

If you can't wait: other great sources of information may be found in the list below:

Cheers, and happy hacking ;-)

DuckDB bag of tricks: Processing PGN chess games with DuckDB - Rolling up each game's lines into a single game row (6/6)

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is the sixth installment of a s...