Roland Bouman's blog: November 2005

Tuesday, November 29, 2005

This is TOO Wicked!

Wow, this really is too cool to believe!

I'm one of the lucky three grand prize winners in the You Make MySQL 5.0 Rock! contest. I bet that sounds cheesy, but I really never imagined I would be one of those lucky three. It goes without saying I'm very very pleased indeed! Brilliant!

I just want to say: Thank You! to the MySQL team, it's a great honour. And hey, Arjen, I didn't mind sucking up at all ;-), as a matter of fact, I can't wait until the next contest.

Well, now that I'm at it, I want to speak up in favour of a couple of MySQL community personalities that I respect and think highly of. To a large extent, my motivation to occupy myself with MySQL has to do with being inspired reading their posts on various MySQL forums, their blogs and in an isolated case, their book. So, here we go:

Andrew Gilfrin: Andrew barely needs introduction: without doubt, one of the people most present on various of the MySQL forums, and of course, the man behind www.mysqldevelopment.com. But hey, this guy is so busy, isn't there anyone out there that can help out, and contribute to the site? Come take a look and help to make it even better still!
Markus Popp: The brains behind that wonderful and unique initiative, www.db4free.net, making open source database product accessible to literally everyone (with a connection and an email adress, that is). Congrats to you, dude!
Jay Pipes: Apart from being a very helpful and knowledgable MySQL forum member, Jay's also one of the Authors of Pro MySQL. I think that book's just great stuff, taking the principle perspective of MySQL as a professional database product.
Felix Geerinckx: I've seldom seen a forum member that was so accurate and quick in answering forum questions. And that's exactly what Felix does. I bet he's the modest type too, because unlike myself, you can never catch him plugging his own pages into the forum. Felix, where's your blog?
Beat Vontobel: Well, who could've missed that, a MySQL Stored Procedure debugger. Amazing! Congratulations Beat! The day will come that somebody will aggregate your blogs in a Internet Explorer friendly page ;D)
Giuseppe Maxia: Finally, someone senior enough has taken it upon himself to create some structure and organize the various MySQL general purpose routines that are scattered around. So, come on, check out the MySQL General Purpose Stored Routines Library and post your comments and contributions.
Arjen Lentz: Maybe I'm not the best judge, but it is my impression that Arjen is one of the major reasons why this community is thriving, and literally alive and kicking. For one, the PlanetMySQL initiative is really great. Just one page to see all those interesting items and blogs. This is daily digest for me

So there you have it. You may think it's corny, or you may think it's a matter of blunt, bad taste advertising...so be it. I benefit a lot from these people, and to me, it just seems the proper thing to publicly pay them tribute. Props to all of you!

Meanwhile, work has to be done too.

For one, Markus and Andrew and myself have been, and still are, quite busy writing an series of articles on connecting to MySQL from an application programming language for mysqldevelopment.com. You can expect to see those soon over there.

Second, I'm getting kinda hooked writing UDF's and delving into the associated materials. The C language (after about 8 years, it's coming all back to me), some of the MySQL source (up till now, only the headers). I took it upon me to write a library to implement most of the Oracle functions. Why? Because I might need it for a migration, and because it's fun!
I decided to make a plan to do it too, although I can't dislose too much about that right now. Let me just say I'm implementing them in alphabetical order (about to do DECODE() now), until there are no more entries on the list ;-).

Bye and greetings from one happy Roland.

Wednesday, November 23, 2005

Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers

Yes, this is really a feature that would be most welcome.

Ever since I started following the MySQL Stored Procedure/Functions and Trigger Forims (say, June 2005) this feature, or rather, the lack thereof, has been the subject of a substantial amount of threads (see: http://forums.mysql.com/read.php?98,24044,24100, http://forums.mysql.com/read.php?99,22523,23797#msg-23797 or more recent: http://forums.mysql.com/read.php?99,55108,55541)

Luckily, this has been recognized by the MySQL developers, and there's an entry in the MySQL 5.0 Technical FAQ that addresses the issue.

But, there's always people that can't wait. For example, Scott Maxwell devised a most creative solution. Props to you Scott!

Well, I scribbled up a little solution that takes another aproach to the problem. I won't pretend I'm the first one that thought of this. It was actually suggested quite a few times, here and there. I must say, I did not see anyone offering the code to do it, but this maybe just ignorance on my part. At least it gave me the opportunity to touch a whole new field of MySQL development: User Defined Functions or UDFs.

Using an UDF

A UDF is an external function that's dynamically linked to the MySQL server software. You can look it all up in the reference manual. There are some obscurities, especially when you're not an C programmer.

(However, the obscurities could be intentional. Maybe it's meant that way, just to scare people off that don't know what they're doing. Well, I'm the first to admit it: I am not an experienced C programmer, and most of the time (well...) I don't know what I'm doing (not just in C, applies to almost any programming language). I guess the difference between starting to do something not knowing what you're doing and finishing doing that is called experience. Or failure. Anyway, the key is: not letting other people know which one it is, and at the same time pretending you do know the difference yourself.)

First, you need have the object code that contains the UDF. The object code is just the compiled, binary code that contains your functions. Now this compilation must be done in a certain manner so that it results in a library that can be dynamically linked to the MySQL Server process. I don't know how this works out for a linux box (but it has something to do with .so files, your guess is probably better than mine is), but in Windows Speak this is called a DLL (Dynamicaly Linked Library).

Once you have the binary, it must be placed in some location that's accessible to MySQL (I'll tell you in a bit where i put it). Finally, you must issue some specialized DDL command so that MySQL knows it exists (yep, I'll tell you that too). From that point on, you will have this UDF accessible throughout the server. Just like the builtin functions, you will never have to prefix it with a database name. You will never have to grant it to anyone either. It's just like using builtin functions, such as LOWER() or GREATEST().

I'll try the best I can to describe what I did to get it working. I really hope it helps.

Setting up a Project

We have a very modest goal here. We just need one little function that will do nothing but raise an error. Normally, I'm quite good at breaking stuff, and my code gives runtime errors all the time, so this shouldnt be too hard. So let's get started and code some stuff.

Well, the relevant pages in the manual say that you should write the function in C. You probably could do it in other languages as well as long as calling conventions, stack properties and who knows whatnot will be the same. But, I'm not Indiana Jones: just writing a little bit of C is adventure enough for now.

I just used Visual Studio C++ 2003 for no particular reason. It's just what I happen to run. You could probably achieve the same with another IDE and Compiler. But hey, I don't know those, so I won't describe that (Maybe when I get Eclipse hooked up to a freely obtainable compiler I'll relive the experience). BTW, you can download the free (as in bear) VC++ express edition from you know who here. Mind you, it's not entirely the same, but it should support the basic features I'm using here.

I opened a new, blank solution and added a win32 empty DLL project: "MySQLRaiseErrorUDF.vcproj". I added one source C/C++ source file: "MySQLRaiseErrorUDF.cpp", and one Module Definition file: "MySQLRaiseErrorUDF.def". At this stage, both are still empty. However, Visual Studio knows that it should offer C/C++ specific property pages in the Project options dialog, which is good, cause that's what we'll be setting up next.

(Note that the .cpp extension is usually associated with C++ source files. However, we do not use any C++ specific stuff. It just happend to be the only sort of source file I could create using the VS wizard.)

I right-clicked the project to modify the following options:

In the "General" page of the "C/C++" options folder, I modified the "Additional Include Directories" property to set it to the "include" folder under the MySQL installation directory (available after a full or custom/development install). We need this to be able to refer to the "mysql.h" header file in our source.

In the "Input" page of "Linker" folder, I included the "MySQLRaiseErrorUDF.def" file we just added to the project.

Coding the DLL

Now, we can start coding. First, our C source needs to include the "mysql.h" header file. This contains references to "winsock.h" which is included in the windows platform sdk. Most probably, the latter has a different counterpart on other operating systems. (Anyone that can enlighten me concerning this, and even help making the source compilable for multiple platforms, is most invited.) So, we start like this:


#include <winsock.h>
#include <mysql.h>

A simple, non-aggregate function that is to be called from the MySQL side corresonds to three functions in the C code. Say, we want to create a function called: raise_error(). Then, our C file must at least contain the code for that, and optionally contain a raise_error_init() and a raise_error_deinit() function. (look it up in the manual)

We'll just rush on to coding the raise_error_init function. Once we got that, you'll see that the raise_error() is really very trivial.

Now, raise_error_init() functions MUST always have this signature:
(Look here for the relevant page in the reference manual.)


my_bool raise_error_init(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *message
);

There's not much to explain as to why it must be like this: that's just the way MySQL was built. I can tell what this signature contains, and hope that that'll be enough of an explanation.

`initid`

The initid is declared as a pointer to a struct of the UDF_INIT type. This is defined in mysql_com.h (ln 369..376). It doesnt really matter right now what a pointer is, or what a struct is. What does matter, is that the initid variable is like a composite variable. You can think of it like a record which has a couple of fields (members is the appropriate term here) that contain information about our function.

On the one hand, mysql passes initid to our raise_error_init() function so that we get some information on the context that's calling our raise_error() function; On the other hand, we can change the values in the fields of initid to provide information to our raise_error(), because the very same initid will be passed on by MySQL to the raise_error() after raise_error_init() finishes.

`args`

Like initid, args is a pointer to a particular struct, this time, the UDF_ARGS struct. That's defined in mysql_com.h (ln 356..365).

args provides information on the arguments passed to our raise_error() function. Again, we can both read and write data to and from the members of this struct just before the raise_error() itself is called.

`message`

Now we are arriving at the heart of the matter.
The message variable is again a pointer (man, these guys just love pointers). Maybe now it's a good moment to explain what a pointer is exactly.

A pointer is actually just an integer value. But it's a special integer value: it's an integer that identifies, or "points to", a position in the computer's memory. Quite literlly, the pointer's integer value is the address of a byte in the computer's RAM.

Pointers have some added semantics. Their type definition also includes the datatype of the stuff that's stored at that address. Or more appropriately, it includes the datatype that will be used to interpret whatever happens to be in the computer's memory starting from the address identified by the pointer integer value. For example, the message argument is a char pointer, notated as char * in C.

Now, the contract of the XXX_init() function holds that message points to a buffer of a particular size, MYSQL_ERRMSG_SIZE bytes long. Knowing what we know now, we can also say that message identifies an address that demarcates a stretch of consecutive bytes that are reserved.

Furthermore, XXX_init() expects us to write to this memory a string of characters that human users may read and recognize as a meaninful error message text. At least, if we can detect an error right now.

The main reason to use this feature for an ordinary function would be to check that the UDF has been called with the right amount and types of arguments in order to exit in case of invalid arguments. Now, for the function we're writing now it's easy: raise_error() should always generate an error; that's the entire reason for it's existence. And, while we're at it, it would be cool to use the value passed by the user as argument as error message. That would pretty much conclude the functionality of raise_error().

Returning from `raise_error_init()`

Actually, the message argument to raise_error_init() is only half of the story concerning error generation. (see the manual for more details) In order to let MySQL know that an error occurred we MUST return a 1 (one). Normally, a zero (0) should be returned, indicating that execution may proceed execution.

`raise_error_init()`

So, finally, here's the code for raise_error_init():


my_bool raise_error_init(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *message
){ 
    unsigned int argStringLength;
    if(args->arg_count==1
    && args->arg_type[0]==STRING_RESULT)
    {
        argStringLength = strlen(args->args[0]) 
                        + 1
                        ;
        memcpy(
            message
        ,   args->args[0]
        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength
        );
    } else {
        memcpy(
            message
        ,   "Unspecified error raised"
        ,   25
        );
    }    
    return 1;
}

Basically, what happens is that we check if a string argument was passed:


    if(args->arg_count==1
    && args->arg_type[0]==STRING_RESULT
    && args->args[0]!=NULL)

and if so, we calculate the number of bytes needed to hold that string


        argStringLength = strlen(args->args[0]) 
                        + 1
                        ;

(Mind you, the + 1 is there because strings are null-terminated (\0 in C))

Then, this information is used to copy the stretch of memory pointed to by the argment to the strecth of memory pointed to by our message:


        memcpy(
            message
        ,   args->args[0]
        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength
        );

and here, the precaution has been taken to restrict the number of bytes written to message:


        ,   argStringLength>MYSQL_ERRMSG_SIZE
        ?   MYSQL_ERRMSG_SIZE
        :   argStringLength

Now there's also the possibility that the wrong number and types of argument were passed to our UDF. In that case, we just copy a default message to message:


        memcpy(
            message
        ,   "Unspecified error raised"
        ,   25
        );

The only thing we have to do now is return a 1 (one), and our raise_error_init() is ready:


    return 1;

`raise_error()`

For the actual function it isn't really important what we put in there, or what return type it will have. Remember, the function will never be executed because raise_error_init() already stopped before control is passed to raise_error(). Let's leave the details undiscussed right now. Let's just say that our raise_error() really does not do anything.


long long raise_error(
    UDF_INIT *initid
,   UDF_ARGS *args
,   char *is_null
,   char *error
){
    return 0;
}

so, here it is, doing nothing.

Module Definiton and Compilation

So far we just typed C code. But what we need is that this is all compiled into a DLL, in such a manner that other programs, like MySQL Server, will be able to invoke these functions. We must expose our functions using the module definition file. We created that as "MySQLRaiseErrorUDF.def". It's done like this:


LIBRARY MySQLRaiseErrorUDF
EXPORTS
    raise_error_init
    raise_error

That's it. Not too hard eh?
Now, hit the compile button and generate a DLL.

Deployment

I put the DLL in the bin directory under the MySQL installation directory. Then, on the command line i did:


mysql> create function raise_error returns int soname 'MySQLRaiseErrorUDF.dll';
Query OK, 0 rows affected (0.04 sec)
mysql> select raise_error();
ERROR:
Unspecified error raised

as you can see, an error is raised.


mysql> select raise_error('Whoops!');
ERROR:
Whoops!
mysql>

Getting rid of the function is also easy:


mysql> drop function raise_error;
Query OK, 0 rows affected (0.00 sec)

(You need to do that before you can overwrite a DLL that's linked to the server via a UDF).

Limitations

Right now there are some limitations to this approach.
The most important is that our argument really must be a constant value in order to return it:


mysql> set @err:='Whoops'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> select raise_error(@Whoops);
ERROR:
Unspecified error raised

This occurs because of the way the UDF_ARGS is initialized when it is passed to raise_error_init(). The argument values are only initialized for constant argument expressions.

Of course, we can get around it using prepared statements:


create procedure raise_application_error(
    errno int
,   errm  varchar(255)
) begin
    set @errno := errno;
    set @errm  := errm;
    set @stmt  := concat(
        'select raise_error'
    ,   '(''',errm,''')'
    );
    prepare stmt from @stmt;
    execute stmt;
    deallocate prepare stmt;
end;
//

mysql> set @msg := 'Whoops, my bad, sorry!'
    -> //
Query OK, 0 rows affected (0.00 sec)

mysql> call raise_application_error(-20000,@msg);
    -> //
ERROR:
Whoops, my bad, sorry!
mysql>

This example also proves we can raise the error from inside a procdure. By the way, it works for triggers too, I checked.

Now what?!

Well, I would really like to do it the other way around too: reporting the current error via an UDF. But from what I've seen so far, this is not possible.

Anyway,I'm really glad I got my first UDF up an running. It took me some time (especially the module definition was not clear from any background docs i know), but at least now I know some of the pitfalls etc.

Saturday, November 12, 2005

OSDBConsortium? Cool!

I just read this very interesting post on Kaj Arnö's blog.

Word is, the major database open-sourcerers ;-) (MySQL, PostgreSQL, SQLite, Firebird and others) might start a consortium, and collaborate more where it is convenient.

I find this very interesting. If it works out, it may be a new milestone in professionalism and standardistation, and help out people choosing the right product.

Already, a link is given for a putative website. Right now, it's nothing, but I'll be putting a link up in my sidebar as soon as there's more news.

Friday, November 11, 2005

SQL 2003 drafts for download

Funny how seem to find stuff by just looking out for it...(duh!) like these drafts of the 2003 version of the sql standard.

Wednesday, November 09, 2005

Download ISO 9075:1999.....Again!

Yep, I just spotted another location where you can download most of the 1999 version of the SQL Standard.

It's right here: http://www.cse.iitb.ac.in/dbms/Data/Papers-Other/SQL1999/.

As far as I can see, here the 3rd part is missing too, like in the location I blogged about before.

Not that I would encourage anyone to illegally download it, of course. I mean, altough it's a standard, I really think you should pay money to ISO and IEC, even though it´s a teeny weeny outdated.

Tuesday, November 08, 2005

Powerful MySQL Aggregate Functions

When I started exploring the MySQL SQL Dialect, one of the first things that struck me was the quite support for some quite exotic aggregate functions. Some of the are really useful too, and frankly, I'm at a loss why products like Microsoft SQL Server and Oracle don't support them too.

Just for those who don't know what I'm talking about, here are some examples of my favourites, based on the sakila sample database.

`COUNT(DISTINCT expr1[,expr2,...,exprN])`

Most RDBMS-es support COUNT(DISTINCT expr1). That returns the number of distinct values among all occurrences of expr1. MySQL takes this one step further, and allows you to count the number of distinct combinations of values.
For example, to count the number of film category/actor combinations, we can just do:


SELECT     count(DISTINCT f.category_id, fa.actor_id)
FROM       film f
INNER JOIN film_actor fa
ON         f.film_id = fa.film_id

The only way to achieve this result with the traditional version of COUNT(DISTINCT expr1) is to concatenate all the different expressions into on big expression. Believe me, it's a hassle, because you can't just concatenate. Just consider these category_id / film_id combinations: 1 - 22 and 12 - 2. So, the concatenation approach only works when you use some kind of separator...Which is not a solution, but just another hassle, because you have to think of a separator that does not appear within the data we're concatenating.

`GROUP_CONCAT([DISTINCT] expr1 [ORDER BY expr1[,..,exprN] [SEPARATOR expr1])`

Group_Concat performs string concatenation of the occurrences of expr1 (or distinct occurrences, if applicable). You can sort the occurrences that make up the aggregate inline using a normal ORDER BY syntax. Last but not least, you can separate the occurrences with a particular separator (the default is a comma).
For example, it allows us to write a query that shows us the film title and the list of actors, and all on the same line:


SELECT     f.title
,          group_concat(a.last_name)
FROM       film f
INNER JOIN film_actor fa
ON         f.film_id = fa.film_id
INNER JOIN actor a
ON         fa.actor_id = a.actor_id
GROUP BY   f.title

This really is an extraordinarily and remarkably powerful feature, which is very useful!
In other rdbms-es, you should either do this on the client side or write your own function to do this, and by the looks of it, quite a bunch of people think they have to do this in MySQL too (see: http://forums.mysql.com/read.php?98,53349,53363, http://forums.mysql.com/read.php?102,53112,53262). Such a function would use cursor traversal to loop throuh the list of actors given a particular film, and inside the loop, ordinary string concatenation would be used to deliver the final result.
This wouldnt be so bad if you'd only have to write that function just once. Of course, you can't, not in a straightforward way anyhow because the cursor is probably different for each case where you'd need it. Besides, this approach will usually perform quite poorly, especially when you'd use it for something like the film / actors example, becuase the cursor has to be opened for each film record.

So, here's my big thumbs up for the clever guy or gall that came up with this feature. Thank you!

Saturday, November 05, 2005

MySQL 5: Prepared statement syntax and Dynamic SQL

Every now and then, people are puzzled by the precise status and extent of MySQL support for dynamic SQL.

Statement Handling

MySQL support the prepared statement syntax. For the better part, a prepared statement is much like a 'normal', immediate statement. The main difference is seen in the way the statement is processed by the server.

Immediate Statements

When an immediate statement is issued, it is processed directly. Processing comprises the following steps:

Parsing: lexical and syntactic analysis of the statement

Planning: optimizer devises a strategy to realise the required result or action, the execution plan or query plan

Execution: retrieval/seeking, writing and reading of data and, if applicable, the construction of a resultset

After these steps, the server responds to the request by sending the client a resultset (if applicable), or an acknowledgement that the statement was executed. Of course, all these actions are performed in concert, and the client is not aware of these different steps taking place. This becomes clear when typing the following statement into the MySQL command line client tool:


mysql> select count(*) from information_schema.schemata;

+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.10 sec)

Immediately (well, almost) the request to select the number of schemata is responded to by returning the resultset.

Prepared Statements

A prepared statement is initiated by the PREPARE statement. A preparation for a query equivalent to previous one could look like this:


mysql> prepare stmt from 
    -> 'select count(*) from information_schema.schemata';
Query OK, 0 rows affected (0.00 sec)
Statement prepared

This time, we didn't get a resultset.
The PREPARE statement instructs the server to parse the query, and possibly, to devise the execution plan. PREPARE associates an identifier with the statement, stmt, wich acts as a handle to refer to the statement and the corresponding execution plan.

Actual execution is postponed until called for by the EXECUTE statement, using the handle to identify the prepared statment to execute.


mysql> execute stmt;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)

Which is the same result as we got when we issued the immediate statement.

This seems like a overly complex way to do what we could also do with just one statement. It is - until we execute it again:


mysql> execute stmt;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.00 sec)

....and again and again and again.

This actually touches upon the major purpose of the prepared statement concept: when a statement is to be repeatedly executed, a prepared statement is potentially more efficient. Because only the execution step needs to be repeated, there is less time wasted on parsing and building the query plan each time the result is required.
Increased efficiency is even more evident for statements that do not return a resultset (such as INSERT, UPDATE and DELETE) because the actions needed to construct a resultset are generally more timeconsuming than parsing and creating an execution plan.

Now, we need to elaborate just a little on this efficiency argument. In the Reference manual, you will bump in to this phrase pretty quickly:

MySQL 5.0 provides support for server-side prepared statements. This...takes advantage of the efficient client/server binary protocol...provided that you use an appropriate client programming interface. Candidate interfaces include the MySQL C API client library (for C programs), MySQL Connector/J (for Java programs), and MySQL Connector/NET.

An alternative SQL interface to prepared statements is available. This interface is not as efficient as using...a prepared statement API...

A little further on, the manual explains that the prepared statement syntax is available from within the SQL language primarily for development purposes; NOT to gain efficiency.

(I did some tests that suggest that the SQL prepared statement syntax is slower than immediate statements, but I don't know if this has to do with the query cache. I used INSERT statements BTW)

Using Parameters

A very powerful feature of prepared statments is the possibility to bind parameters to it. Parameters are specified by writing special ? placholders inside the sql statement that is to be prepared:


mysql> prepare stmt from 
    -> 'select count(*) from information_schema.schemata where schema_name = ?'
;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

When EXECUTE-ing the statement, these placeholders must be bound to user variables with the USING syntax:


mysql> set @schema := 'test';
Query OK, 0 rows affected (0.00 sec)

mysql> execute stmt using @schema;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

So, even though the statement has been prepared, we can still enjoy the flexibility of controlling the query result.

One thing to keep in mind is that parameters are not implemented using simple string substitution. For example, the placeholder in the previous example is not quoted inside the statement. It merely provides a slot for a value, and the binding process takes care of transferring that value to the slot inside the statement. Quotes are merely syntactic methods to distinguish a string from the surrounding code. Because the binding process is way beyond the level of parsing, it does not make sense to use quotes.

You can't just put a placeholder wherever you like. Parameter placeholders can appear only in those places inside the statement where you would normally expect an expression. In particular, you cannot use parameter placeholders to parameterize identifiers or entire statement structures. So, the following attempt fails with a syntax error, because the parameter placeholder appears where you would normally put an identifier:


mysql> prepare stmt from 'create table ? (id int unsigned)';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your
MySQL server version for the right syntax to use near '? (id int unsigned)' at line 1

However, is is not impossible to paramterize identifiers using the prepared statement syntax. I'll show that in a bit. It's just that you can't do it using parameters, because parameters are just a special cases of expressions, like column references, literals, etc.

Multiple Parameters

We are not restricted to just one parameter, we can use several:


mysql> prepare stmt from 
    -> 'select count(*) 
    -> from information_schema.schemata 
    -> where schema_name = ? or schema_name = ?'
;
Query OK, 0 rows affected (0.00 sec)
Statement prepared
mysql> execute stmt 
    -> using @schema1,@schema2
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

Parameter binding occurs in a positional manner.
Each ? placeholder must be matched bij exactly one user variable in the USING clause, where the first placeholder matches the first user variable, the second placeholder matches the second user variable and so on.

You really must match each placeholder by exactly one user variable, or else you will encounter an error:


1210 (HY000): Incorrect arguments to EXECUTE

Dynamic SQL

For no particular reason, PREPARE accepts either a string literal, or a user-defined variable to define the statement to prepare. It would've been just as conceivable to accept just a statement, like so:


mysql> prepare stmt from 
    -> select count(*) from information_schema.schemata;

But no, this does not work. This just results in a syntax error.

Anyway, we already saw how PREPARE accepts a statement in the form of a string literal. It's pretty much the same for a global user variable:


mysql> set @sql_text := 'select count(*) from information_schema.schemata';

mysql> prepare stmt from 
    -> @sql_text

Query OK, 0 rows affected (0.00 sec)
Statement prepared

Because we can freely assign whatever value we want to @sql_text user variable, we can use this as a device to employ dynamic SQL.

Now we know how to dynamically manipulate our identifiers too: we just manipulate our string before assigning it to the user variable, like so:


mysql> set @table_name := 'mytable';
Query OK, 0 rows affected (0.02 sec)

mysql> set @sql_text:=concat('create table ',@table_name,'(id int unsigned)');
Query OK, 0 rows affected (0.00 sec)

Cleaning up

There's one extra step in the process of using prepared statements that I did not yet mention. That's cleaning up (I guess that tells you something about me, right?). Once you've prepared a statement, the handle and the associated objects on the server's side will remain to exist until the client's session is over. This means that client will keep some of the server's resources occupied. Therefore, it's good practice to clean up afterwards. When you're sure you're done working with the statement, you should DEALLOCATE it:


mysql> deallocate prepare stmt;

This just tells the server to get rid of all the resources associated with the statement handle, and to forget about the statement handle as well.

Some links

Although it does not seem that the prepared statement syntax was designed for it (word is there will be true dynamic sql support in MySQL 5.1 in the form of the EXECUTE IMMEDIATE syntax), it sure does the job. If you want to read some articles on actually using this feature for a real purpose, check out these links:

"The Wizard revisited": By Beat Vontobel
"The power of dynamic queries": by Giuseppe Maxia
"Scheduling Stored Procedure Execution": by yours truly

Tuesday, November 01, 2005

Some more thoughts on Crosstabs

I noticed my previous blog on creating crosstabs in MySQL has generated some attention. It made me find out that I didn't do my homework as well as I would've liked.

Whilst writing it, I googled a bit for solutions, and I did stumble into a PERL solution written by Giuseppe Maxia. However, I totally missed Giuseppe's excellent article that is referred to by Beat Vontobel's blog entry describing a stored procedure that solves this problem for the general case.

Thank you guys, I learnt a lot from those!

Reading all that made me rethink the problem. In doing so, I thought of a little syntax that I would like to use to define a crosstab query. I don't intend to do anything with this right away, but I'm just curious what other people think. So, applied to the example from my earlier blog entry, I would like to write this:


select     f.category_id
,          f.film_id
,          f.title
,          i.store_id
,          r.inventory_id   
from       film f
inner join inventory i
on         f.film_id = i.film_id
left join  rental r
on         i.inventory_id = r.inventory_id
RENDER AS CROSSTAB
ROWS
    LEVEL(category_id) as category
,   LEVEL(film_id,title) as film
COLUMNS
    LEVEL(store_id) as store
CELLS (category,store)
    count(distinct film_id) as count_films
,   count(inventory_id) as count_rentals
CELLS (film,store)
    count(inventory_id) as count_rentals

As for the semantics, ROWS is like a operator that generates rows by applying some sort of 'cascading GROUP BY' to the rows from the SELECT expression using the LEVEL specifications, like this:

category LEVEL: GROUP BY category_id
film LEVEL: GROUP BY category_id, film_id, film_title

The COLUMNS operator is like ROWS, but it generates columns that slice up the rows generated by ROWS.

Finally, CELLS generates a group of cells at the intersections of the row/column pairs of LEVEL specifications between the parentheses. In the intersections, the values appear of the expressions in the comma separated list following the intersection specification.

This would generate a crosstab with this structure:


                                  +-------------------------------------------------------+
                                  |                         store                         |
                                  +---------------------------+---------------------------+
                                  |    1                      |    2                      |
--------+-------------------------+------------+--------------+------------+--------------+
category|film                     |            |              |            |              |
--------+-------+-----------------+count_films |count_rentals |count_films |count_rentals |
        |film_id|title            |            |              |            |              |
--------+-------+-----------------+------------+--------------+------------+--------------+
       1|                         |     29     |     103      |     35     |     125      |
--------+-------------------------+---------------------------+------------+--------------+
       1|     19|AMADEUS HOLY     |            |      13      |            |       0      |
        .       .                 .            .              .            .              .
        ...more rows here...            .              .            .              .
        .       .                 .            .              .            .              .
       1|    991|WORST BANGER     |            |       8      |            |      14      |
--------+-------+-----------------+------------+--------------+------------+--------------+
        .       .                 .            .              .            .              .
 some more rows are here   .            .              .            .              .
        .       .                 .            .              .            .              .

(This is only the structure, I have to think a bit how this would have to be rendered in a resultset)