In this article, I'd like to take that to the next level and demonstrate how to write an information schema plug-in that can access some of the internals of the MySQL server. For this particular purpose, we will focus on a plug-in that reports all the
SAVEPOINT
s available in the current session. This MYSQL_SAVEPOINTS
plug-in may be of some value when debugging scripts and stored routines that rely on complex scenarios using transactions and savepoints. In a forthcoming article, I will describe a few information schema plug-ins that are arguably more interesting, such as a plug-in to list the currently existing
TEMPORARY
tables, user-defined variables, and the contents of the query cache. Although the plug-in described in this article may be of some use, its main purpose is to illustrate the minimal requirements for plug-ins that can access the server's internals.A Quick Recapitulation
You might recall that:
- The MySQL plug-in API is one of the new features in MySQL 5.1, and forms a generic extension point of the MySQL database server, allowing privileged database users to add functionality to the MySQL Server by loading a shared library from the plug-in directory
- Loading and unloading a plug-in is a completely dynamic process controlled using the
INSTALL PLUGIN
andUNINSTALL PLUGIN
syntax, and does not involve compiling the server or even restarting it - There are several types of plug-ins, the most well-known being storage engines and full-text parsers. Less well-known types include information schema and daemon plug-ins.
- An information schema plug-in provides the implementation of a table (or actually, a system view) in the information_schema database
- Plug-ins are usually implemented in C/C++. To implement a plug-in, the implementor must include the header file
plugin.h
and provide an initialized instance of thest_mysql_plugin
structure. In addition, the implementor must provide code to implement the plug-in type dependent part of the interface - The plug-in type dependent part of the interface for information schema plug-ins consists of two things: the column definitions of the information schema table and a
fill_table
function that is called whenever the server wants to retrieve the rows of data from that table.
How Information Schema plug-ins can access MySQL Server internals
Before we discuss theMYSQL_SAVEPOINTS
information schema plug-in in detail, let's take a look at the way information schema plug-ins can obtain access to the internals of the MySQL server.Like I just recapitulated from last week's article, the plug-in type dependent inferface for information schema plug-ins consists of two things:
- An array of
ST_FIELD_INFO
structures, each of which defines a column of the information schema table - A
fill_table
function that is called by the server when it needs to retrieve the data from table
fill_table
function is a different matter. Let's take a look at the signature of the signature of that function:The
int fill_table(THD *thd, TABLE_LIST *tables, COND *cond);
TABLE_LIST *tables
argument provides the handle to the information schema table that is being filled, and the COND *cond
argument represents the WHERE
condition of the SQL statement that is currently being handled, allowing the fill_table
function to directly filter rows (instead of relying on the query execution engine to do that). As such, these arguments are occupied with the actual delivery of rows of data to the server.The first argument to the
fill_table
function offers all kinds of interesting opportunities to see what is going on inside the server. We will discuss it in more detail in the next section.Public accessors to the current THD
instance
The first argument to the fill_table
function is THD *thd
. This is the so-callled thread descriptor - something that is best thought of as a handle to the current session. Note that in a MySQL context, the terms connection, thread and session are often used interchangeably. However, I find the term thread too broad, and the term connection too narrow. As there are many parts in THD
that maintain state regarding the events that occurred since a connection is established, it seems most sensible to think of THD
as the server-side implementation of a session.This session handle or thread descriptor has the form of a pointer to an instance of the
THD
class. The plugin.h
header file contains a forward declaration to this class, but the actual declaration is contained in sql/sql_class.h
. The THD
class is one of the key data structures in understanding the workings of the MySQL server as it is passed as an argument to many internal server functions. Consequently it provides a wealth of possibilities to create interesting new information schema plug-ins. In fact, the number of possibilities are so great, that a number of common usages has been explicitly set aside in the plugin.h
header file.The
plugin.h
header file contains a number of function declarations and macros that provide access to the members of a THD
instance. I will not discuss all of them here, but highlight just a few just to give you an idea:thd_test_options()
- Find out which options are set. This can be used to find out whether a number of boolean options likebig_tables
, (general and binary) logging, andautocommit
are enabled or disabled.thd_proc_info()
- Should be used by the plug-in implementor before starting a potentially time-consuming operation so the rest of the world can monitor what this session doing. The code set here corresponds to the value reported in theState
column by theSHOW PROCESSLIST
statementthd_killed()
- Can be used by the plug-in implementor to find out if the thread in which this session lives was killed. If the plug-in is involved in a potentially time-consuming process, the plugin-in implementor should periodically check this and gracefully abort the plugin-ins work when it detects that the thread was killed.thd_alloc()
- Allocates some memory from this session's memory pool. If the plug-in requires some small amount of memory, plug-in implementors should use this rather than the standardmalloc()
function. Callingthd_alloc();
is likely to be faster because it takes memory out of a pre-allocated pool, reducing contention. In addition, it is more convenient because the memory need not be explicitly freed: it is automatically reclaimed by the pool after handling the current statement.
plugin.h
and look for comments like this:
/*************************************************************************
Miscellaneous functions for plugin implementors
*/
plugin.h
describes a public interface
In plugin.h
, the declarations as described in the previous section together form a public interface to the current session. They are there for the convenience of plug-in implementors and represent a 'safe' way to work with the THD
pointer passed to the fill_table
function.To say that these form a public interface is to stay that these are officially supported by MySQL AB. That is: they will be supported officially once the MySQL 5.1 Server is a generally available release. From that point on you can rely on these functions when writing plug-ins in the sense that you do not have to be afraid that they will change. At least, the public interface will remain the same for all forthcoming builds of the 5.1 server. Any interface changes in future releases will involve a proper process, giving everybody the chance to update their code well in time.
Unfortunately, not every function declaration in
plugin.h
has source code comments. This means that for now, you sometimes need to do some digging in the server's source code to find out what you can do with them. I admit that this situation is not exactly perfect. However, the matter has been reported as a bug, and hopefully, it will be adressed soon.Beyond the public interface
I just described the public interface plug-in implementors can rely on. A distinct advantage of the public interface is that it takes away a lot of the complexity of the underlying internals of the MySQL Server. However, there will always be cases where the public interface does not offer the features you really need. In those cases, you simply need to be able to work directly on the server internals.The advantage of directly referencing the server's internals is that you can access all the interesting nuts and bolts and bits and pieces. The downside is that there is absolutely no guarantee that your code will work in another version of the server. The internals are by definition the parts that are not meant to be exposed. As such, it is possible that your code does not work or behaves unexpectedly in another version of the server.
Let's not dwell too long on the disadvantages. Instead, let's focus on the merits of pluggable information schema tables. Granted, it is inconvenient that we may need to make our code resilient to each different build of the server. However, for many applications, it is not very likely that we have to constantly do that.
Even if we do have to change our code, the burden will be on the developer of the plug-in. For each specific build of the server, your code may need to be different. Even if the code itself does not change, you will probably at least have to recompile your plug-in for each specific build of the server. However, your users are still not required to recompile the server itself. They can still install the plug-in without stopping or restarting the server, which in many cases seems more important than bearing the burden of changing the code.
You need to break some eggs to bake an omelet - so if you're hungry, you better get over it and start breaking some eggs ;-)
The MYSQL_SERVER
define
In order to access the server's internals beyond the public interface, we need to use some C/C++ preprocessor magic and define MYSQL_SERVER
. This define needs to be present before we include any MySQL header (or source) files:Throughout the MySQL codebase, there are many sections that are conditionally included or excluded depending on whether
#ifndef MYSQL_SERVER
#define MYSQL_SERVER
#endif
MYSQL_SERVER
is defined. It is hard to pinpoint the exact effect of adding this definition, because there many spots that use this definition to control conditional compilation. Normally the
MYSQL_SERVER
definition need be present only when compiling the server proper, but in this case we need it to let the plug-in code work with internal structures such as THD
instances directly, that is, without using the accessors provided by the public interface.To be absolutely clear: using the
MySQL_SERVER
define in your code does not mean you must compile your plug-in as part of the server. On the contrary - you can compile your plug-ins separately from the server, and still (un)install them at runtime. The only thing the MySQL_SERVER
define does, is pull in the declarations that are normally considered to be 'internal'. They will for example allow us to work directly with the members of the THD
class instead of being required to use the public accessors defined in plugin.h
.Implementing the MYSQL_SAVEPOINTS
Information Schema plug-in
Now that we sketched the backgrounds, we can quickly proceed and discuss the implementation of the MYSQL_SAVEPOINTS
information schema plug-in. (Note that you can download the source code file mysql_is_savepoints.cc
here.)Most of the things are rather similar to what was described in the article describing the
MYSQL_HELLO
plug-in, for which you can still download the mysql_is_hello.cc
source code.We will do like we did last week and assume the following things are in place on your system:
- g++, the GNU C++ compiler
- The MySQL 5.1.22 source distribution - we need to include some of the header files
- A text editor or IDE (like Eclipse with CDT)
Creating the source file
First, we need to create a C++ source file. We will assume that the working directory is ~/mysql_is_savepoints/, and that the source file is called mysql_is_savepoints.cc.The MYSQL_SERVER define
Like we explained in the previous sections, we need to defineMYSQL_SERVER
so we can directly reference the members of the THD
class passed to our fill_table
function.Because this affects how the included files are processed, we do this at the very top of our source file.
#ifndef MYSQL_SERVER
#define MYSQL_SERVER
#endif
Include files
We can use the same list of includes we used for theMYSQL_HELLO
plug-in - the MYSQL_SERVER
define is responsible for including all the additional things we require to write the MYSQL_SAVEPOINTS
plug-ins.
#include <mysql_priv.h>
#include <stdlib.h>
#include <ctype.h>
#include <mysql_version.h>
#include <mysql/plugin.h>
#include <my_global.h>
#include <my_dir.h>
Defining the columns
For theMYSQL_SAVEPOINTS
plug-in, we will define two columns: SAVEPOINT_ID
and SAVEPOINT_NAME
. At the SQL level, it will look something like this:...and this is what it looks like in the C/C++ source file:
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| SAVEPOINT_ID | bigint(0) | NO | | 0 | |
| SAVEPOINT_NAME | varchar(64) | NO | | | |
+----------------+-------------+------+-----+---------+-------+
This time, in addition to creating the
#define COLUMN_SAVEPOINT_ID 0
#define COLUMN_SAVEPOINT_NAME 1
static ST_FIELD_INFO mysql_is_savepoints_field_info[]=
{
{"SAVEPOINT_ID", 0, MYSQL_TYPE_LONGLONG, 0, 0, "Savepoint Id"},
{"SAVEPOINT_NAME", 64, MYSQL_TYPE_STRING, 0, 0, "Savepoint Name"},
{NULL, 0, MYSQL_TYPE_NULL, 0, 0, NULL, 0}
};
ST_FIELD_INFO
array of column definitions, we also create #define
s for the array entry indexes. The defines allow us to refer to the column definitions using the names rather than the raw, literal integer array indexes. This has the advantage that we do not have to change code should we want to change the positions of the columns. Another advantage is that our fill_table
code will be easier to read: by consistently referring to COLUMN_SAVEPOINT_ID
and COLUMN_SAVEPOINT_NAME
rather than 0
and 1
it will be much easier to see what is going on.Filling the table
Now we come to the heart of the matter: generating a row for each SQLSAVEPOINT
available in the current session.The savepoints for the current session are available in the
transaction
member of the THD
class. The transaction
member is an instance of the st_transactions
struct, which is declared locally inside the THD
class:Now you might recall that the
class THD :public Statement,
public Open_tables_state
{
...many, many lines here...
public:
struct st_transactions {
SAVEPOINT *savepoints
;
...a few more lines here...
} transaction;
...many, many more lines here ...
};
THD
class is declared in sql/sql_class.h
. However, you might have some trouble locating the transaction
member, because the declaration of the THD
class is extremely large and long-winded: in the MySQL 5.1.22-rc source distribution, it ranges from lines 960 to 1886(!!) - and those 900 something lines make up only the declaration!(Although the official explanation for the name of the
THD
class is that it is an acronym for THread Descriptor, some developers* explained that it is one of the few class names that is spelled in capitals because it is so incredibly heavy. According to this anecdote, its name should be pronounced as "...THUD!!...THUD!!..." because of the sound it makes each time it is dumped into the argument list of a function that makes up the servers source code. * = Thanks to Eric Herman for painting this creative and tangible likeness ;-)
Anyway, you will find it easier when you look for
st_transactions
, or go directly to line 1149, but note that the line number is likely to be different in other versions of the server code.Now, we can see that the
st_transaction
struct contains a pointer to a SAVEPOINT
pointer called savepoints
. As we shall see later, this is actually the list of savepoints we need. But what kind of type is this SAVEPOINT
exactly?Well, to get past this point, you really need some patience and a set of tools that allow you to search the source code. In the case of
SAVEPOINT
, it turns out that this is actually a typedef
for the st_savepoint
structure. Now, the odd thing is that this typedef
appears in sql/handler.h
:But the structure
typedef struct st_savepoint SAVEPOINT;
st_savepoint
itself is declared in sql/sql_class.h
- that is, the same file that declares THD
, which seems to prefer SAVEPOINT
rather than st_savepoint
!Well - it is beyond me why it was done like this. For our purpose it doesn't really matter though, let's examine the declaration of
st_savepoint
instead: struct st_savepoint {Here we can see that each
struct st_savepoint *prev;
char *name;
uint length, nht;
};
st_savepoint
has a char *
member called name
, which is presumably whatever name the user provided in the savepoint syntax:So in this case, the
mysql> SAVEPOINT my_savepoint;
name
member of the st_savepoint
instance corresponding to this SQL SAVEPOINT
will point to the character string "my_savepoint"
.Apart from the
name
we can also see that each st_savepoint
has itself a pointer to another st_savepoint
called prev
. This suggests a single linked list of savepoints.This is about all the information we need to implement the
fill_table
function. So, here it is:The biggest difference with the
int mysql_is_savepoints_fill_table(THD *thd, TABLE_LIST *tables, COND *cond)
{
int status = 0; /* return value for this func, 0=success, 1=error*/
CHARSET_INFO *scs= system_charset_info; /* need this to store field into table */
TABLE *table= (TABLE *)tables->table; /* handle to the I_S table. class declared in table.h */
uint savepoint_id = 0;
SAVEPOINT *sv= thd->transaction.savepoints;
while(sv && !status)
{
/* store the savepoint sequence into the table column */
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
/* store the savepoint name into the table column */
table->field[COLUMN_SAVEPOINT_NAME]->store(sv->name, strlen(sv->name), scs);
status= schema_table_store_record(thd, table);
sv= sv->prev;
}
return status;
}
fill_table
function we used in the MYSQL_HELLO
example is that instead of just storing one single row, we have a loop, storing one row for each iteration. The loop is initialized by assigning the SAVEPOINT
pointer from the transaction
member from the THD
instance that is passed as the first argument to the fill_table
function to a local sv
variable:Of course, it is possible that there are no savepoints in the current session, in which case
SAVEPOINT *sv= thd->transaction.savepoints;
thd->transaction.savepoints
will be the NULL
pointer. However, if there are savepoints, a pointer to the last savepoint that was created in the current session will now be stored in sv
. We can now set up the actual loop:Note that the loop will be entered only if
while(sv && !status)
{
...lines here...
status= schema_table_store_record(thd, table);
sv= sv->prev;
}
sv
points to a savepoint. If it does, data from the savepoint is written to the columns of our information schema table. In the bottom of the loop, we store the current record using the
schema_table_store_record
, which we discussed already for the MYSQL_HELLO
example:Interestingly, we were required to make a forward declaration to it in the
status= schema_table_store_record(thd, table);
MYSQL_HELLO
example. Now, we don't have to do this, presumably because we defined MYSQL_SERVER
. You might recall that
schema_table_store_record
function returns 0
in case of success and 1
instead of failure. Note that if a failure occurs at this point, the loop will not iterate again, as the while
condition requires status
to be not true (that is, zero). After storing the row, the last step of the loop is to move back and examine the previous savepoint:
If the end of the list is reached,
sv= sv->prev;
sv
will be NULL
, preventing the loop to iterate again. However, if there is in fact a previous savepoint, the loop will run once again and create a new row for that savepoint too, on and on until we reach the end of the list of savepoints.In the top of the loop, we store data into the columns of our information schema table:
This time, we use our defines
/* store the savepoint sequence into the table column */
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
/* store the savepoint name into the table column */
table->field[COLUMN_SAVEPOINT_NAME]->store(sv->name, strlen(sv->name), scs);
COLUMN_SAVEPOINT_ID
and COLUMN_SAVEPOINT_NAME
instead of the literal numerical field indexes. We already demonstrated in the MYSQL_HELLO example how to store a string, so we won't discuss the line that stores the savepoint's name. Instead, let's find out how we can store an integer value by looking at the line that stores the savepoint id.As you can see, we stipulate the value for the
SAVEPOINT_ID
column ourselves by simply adding one for each row:Savepoints by no means have a numerical ID of their own, but it makes sense to make one up in order to unambigously indicate the order in which the savepoints were created during this session. Note the second argument to the
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
store
method, which is always 0
here:
table->field[COLUMN_SAVEPOINT_ID]->store(++savepoint_id, 0);
This second argument is there to tell the
store
method whether the value represents a signed or an unsigned value. In this case, we are storing an unsigned
value - it should be 1
for an unsigned value.The rest of the implementation
The remainder of the implementation is quite similar to what was discussed for theMySQL_HELLO
example. The most important difference is actually the plug-in name, but otherwise the implementation is identical. Therefore, it is not discussed here further.Building and Installing
The build and install process is pretty much similar to that for theMYSQL_HELLO
plug-in.Compiling
We can compile the plug-in like this:g++ -DMYSQL_DYNAMIC_PLUGIN -Wall -sharedThis will create the shared library
-I/home/user/mysql-5.1.22-rc/include
-I/home/user/mysql-5.1.22-rc/regex
-I/home/user/mysql-5.1.22-rc/sql
-o mysql_is_savepoints.so mysql_is_savepoints.cc
mysql_is_savepoints.so
.Installing the plug-in
You might recall that the shared library needs to be moved to the plug-in directory. After that, we can install the plug-in using theINSTALL PLUGIN
syntax:
mysql> INSTALL PLUGIN MYSQL_SAVEPOINTS soname 'mysql_is_savepoints.so';
Query OK, 0 rows affected (0.00 sec)
Using the plug-in
Now we can finally see our plug-in in action. At first, there will be no savepoints present:mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;Even if we set one, we won't see it immediately:
Empty set (0.02 sec)
This is beause by default, the session has autocommit enabled. As each statement is wrapped in its own transaction, we will never be able to see any savepoints. So we disable autocommit:
mysql> SAVEPOINT A;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
Empty set (0.00 sec)
And now we can really witness the behaviour of our plug-in:
mysql> SET autocommit = OFF;
Query OK, 0 rows affected (0.00 sec)
mysql> SAVEPOINT A;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
| SAVEPOINT_ID | SAVEPOINT_NAME |
+--------------+----------------+
| 1 | A |
+--------------+----------------+
1 row in set (0.00 sec)
mysql> SAVEPOINT B;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
| SAVEPOINT_ID | SAVEPOINT_NAME |
+--------------+----------------+
| 1 | B |
| 2 | A |
+--------------+----------------+
2 rows in set (0.00 sec)
mysql> ROLLBACK TO SAVEPOINT A;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM information_schema.MYSQL_SAVEPOINTS;
+--------------+----------------+
| SAVEPOINT_ID | SAVEPOINT_NAME |
+--------------+----------------+
| 1 | A |
+--------------+----------------+
1 row in set (0.00 sec)
Learn More
This has been quite a ride! In this article it was demonstrated how we can use information schema plug-ins to report some of the things that are going on inside the current session. As such, theMYSQL_SAVEPOINTS
plug-in is a big step forward compared to the MYSQL_HELLO
plug-in. However, this is just the tip of the iceberg - the current session harbours much more interesting information about the current session, and in a forthcoming article I will demonstrate a number of other usages. In particular, I will present a plug-in to report the user variables in the current session, and the temporary tables defined in the current session.
In another article, we will also see that it is sometimes possible to look beyond the current session and report on the status of server-wide structures, such as the query cache.
Meet us at the user's conference
When I discussed theMYSQL_HELLO
plug-in, I already hinted that there will be a lot there for those people that want to learn more about extending the server with (information schema) plug-ins. You can find all those links in the bottom of that article.In addition, you can learn a lot about the MySQL Server internals. And...you can learn it from one of the founding fathers: Monty himself will be doing A tour into MySQL's internals. So, I guess that's going to be one of those occasions where you get the opportunity to clear up some of those details in the source code you never quite managed to wrap your head around.
If you register before the 26th of februari, you'll get a $200 discount. There are more discounts available depending on whether you attended before, or if you register together with a number of colleagues; there's special student and non-profit discounts too - check it out here.
See you at the conference! (Bonus points for the first one to ask Monty in the Q&A why
SAVEPOINT
is typedef-ed in sql/handler.h
instead of sql/sql_class.h
; double bonus points for the first one to ask Monty if THD
is really called like that because it is so heavy ;-)
No comments:
Post a Comment