Yes, this is really a feature that would be most welcome.
Ever since I started following the MySQL Stored Procedure/Functions and Trigger Forims (say, June 2005) this feature, or rather, the lack thereof, has been the subject of a substantial amount of threads (see:
http://forums.mysql.com/read.php?98,24044,24100,
http://forums.mysql.com/read.php?99,22523,23797#msg-23797 or more recent:
http://forums.mysql.com/read.php?99,55108,55541)
Luckily, this has been recognized by the MySQL developers, and there's an entry in the
MySQL 5.0 Technical FAQ that addresses the issue.
But, there's always people that can't wait. For example, Scott Maxwell devised a
most creative solution. Props to you Scott!
Well, I scribbled up a little solution that takes another aproach to the problem. I won't pretend I'm the first one that thought of this. It was actually suggested quite a few times, here and there. I must say, I did not see anyone offering the code to do it, but this maybe just ignorance on my part. At least it gave me the opportunity to touch a whole new field of MySQL development: User Defined Functions or UDFs.
Using an UDF
A UDF is an external function that's dynamically linked to the MySQL server software. You can look it all up in the
reference manual. There are some obscurities, especially when you're not an C programmer.
(However, the obscurities could be intentional. Maybe it's meant that way, just to scare people off that don't know what they're doing. Well, I'm the first to admit it: I am not an experienced C programmer, and most of the time (well...) I don't know what I'm doing (not just in C, applies to almost any programming language). I guess the difference between starting to do something not knowing what you're doing and finishing doing that is called experience. Or failure. Anyway, the key is: not letting other people know which one it is, and at the same time pretending you do know the difference yourself.)
First, you need have the
object code that contains the UDF. The object code is just the compiled, binary code that contains your functions. Now this compilation must be done in a certain manner so that it results in a library that can be dynamically linked to the MySQL Server process. I don't know how this works out for a linux box (but it has something to do with .so files, your guess is probably better than mine is), but in Windows Speak this is called a
DLL (Dynamicaly Linked Library).
Once you have the binary, it must be placed in some location that's accessible to MySQL (I'll tell you in a bit where i put it). Finally, you must issue some specialized DDL command so that MySQL knows it exists (yep, I'll tell you that too). From that point on, you will have this UDF accessible throughout the server. Just like the builtin functions, you will never have to prefix it with a database name. You will never have to grant it to anyone either. It's just like using builtin functions, such as
LOWER()
or
GREATEST()
.
I'll try the best I can to describe what I did to get it working. I really hope it helps.
Setting up a Project
We have a very modest goal here. We just need one little function that will do nothing but raise an error. Normally, I'm quite good at breaking stuff, and my code gives runtime errors all the time, so this shouldnt be too hard. So let's get started and code some stuff.
Well, the relevant pages in the manual say that you should write the function in C. You probably could do it in other languages as well as long as calling conventions, stack properties and who knows whatnot will be the same. But, I'm not Indiana Jones: just writing a little bit of C is adventure enough for now.
I just used Visual Studio C++ 2003 for no particular reason. It's just what I happen to run. You could probably achieve the same with another IDE and Compiler. But hey, I don't know those, so I won't describe that (Maybe when I get Eclipse hooked up to a freely obtainable compiler I'll relive the experience). BTW, you can download the free (as in bear) VC++ express edition from you know who
here. Mind you, it's not entirely the same, but it should support the basic features I'm using here.
I opened a new, blank solution and added a win32 empty DLL project: "MySQLRaiseErrorUDF.vcproj". I added one source C/C++ source file: "MySQLRaiseErrorUDF.cpp", and one Module Definition file: "MySQLRaiseErrorUDF.def". At this stage, both are still empty. However, Visual Studio knows that it should offer C/C++ specific property pages in the Project options dialog, which is good, cause that's what we'll be setting up next.
(Note that the .cpp extension is usually associated with C++ source files. However, we do not use any C++ specific stuff. It just happend to be the only sort of source file I could create using the VS wizard.)
I right-clicked the project to modify the following options:
- In the "General" page of the "C/C++" options folder, I modified the "Additional Include Directories" property to set it to the "include" folder under the MySQL installation directory (available after a full or custom/development install). We need this to be able to refer to the "mysql.h" header file in our source.
- In the "Input" page of "Linker" folder, I included the "MySQLRaiseErrorUDF.def" file we just added to the project.
Coding the DLL
Now, we can start coding. First, our C source needs to include the "mysql.h" header file. This contains references to "winsock.h" which is included in the windows platform sdk. Most probably, the latter has a different counterpart on other operating systems. (Anyone that can enlighten me concerning this, and even help making the source compilable for multiple platforms, is most invited.) So, we start like this:
#include <winsock.h>
#include <mysql.h>
A simple, non-aggregate function that is to be called from the MySQL side corresonds to three functions in the C code. Say, we want to create a function called:
raise_error()
. Then, our C file must at least contain the code for that, and optionally contain a
raise_error_init()
and a
raise_error_deinit()
function. (look it up in
the manual)
We'll just rush on to coding the
raise_error_init
function. Once we got that, you'll see that the
raise_error()
is really very trivial.
Now,
raise_error_init()
functions MUST always have this signature:
(Look
here for the relevant page in the reference manual.)
my_bool raise_error_init(
UDF_INIT *initid
, UDF_ARGS *args
, char *message
);
There's not much to explain as to why it must be like this: that's just the way MySQL was built. I can tell what this signature contains, and hope that that'll be enough of an explanation.
initid
The
initid
is declared as a
pointer to a
struct
of the
UDF_INIT
type. This is defined in
mysql_com.h
(ln 369..376). It doesnt really matter right now what a pointer is, or what a struct is. What does matter, is that the
initid
variable is like a composite variable. You can think of it like a record which has a couple of fields (members is the appropriate term here) that contain information about our function.
On the one hand, mysql passes
initid
to our
raise_error_init()
function so that we get some information on the context that's calling our
raise_error()
function; On the other hand, we can change the values in the fields of
initid
to provide information to our
raise_error()
, because the very same
initid
will be passed on by MySQL to the
raise_error()
after
raise_error_init()
finishes.
args
Like
initid
,
args
is a pointer to a particular struct, this time, the
UDF_ARGS
struct. That's defined in
mysql_com.h
(ln 356..365).
args
provides information on the arguments passed to our
raise_error()
function. Again, we can both read and write data to and from the members of this struct just before the
raise_error()
itself is called.
message
Now we are arriving at the heart of the matter.
The
message
variable is again a pointer (man, these guys just love pointers). Maybe now it's a good moment to explain what a pointer is
exactly.
A pointer is actually just an integer value. But it's a special integer value: it's an integer that identifies, or "points to", a position in the computer's memory. Quite literlly, the pointer's integer value is the address of a byte in the computer's RAM.
Pointers have some added semantics. Their type definition also includes the datatype of the stuff that's stored at that address. Or more appropriately, it includes the datatype that will be used to interpret whatever happens to be in the computer's memory starting from the address identified by the pointer integer value. For example, the
message
argument is a
char
pointer, notated as
char *
in C.
Now, the contract of the
XXX_init()
function holds that
message
points to a
buffer of a particular size,
MYSQL_ERRMSG_SIZE
bytes long. Knowing what we know now, we can also say that
message
identifies an address that demarcates a stretch of consecutive bytes that are reserved.
Furthermore,
XXX_init()
expects us to write to this memory a string of characters that human users may read and recognize as a meaninful error message text. At least, if we can detect an error right now.
The main reason to use this feature for an ordinary function would be to check that the UDF has been called with the right amount and types of arguments in order to exit in case of invalid arguments. Now, for the function we're writing now it's easy:
raise_error()
should
always generate an error; that's the entire reason for it's existence. And, while we're at it, it would be cool to use the value passed by the user as argument as error message. That would pretty much conclude the functionality of
raise_error()
.
Returning from raise_error_init()
Actually, the
message
argument to
raise_error_init()
is only half of the story concerning error generation. (see
the manual for more details) In order to let MySQL know that an error occurred we MUST return a 1 (one). Normally, a zero (0) should be returned, indicating that execution may proceed execution.
raise_error_init()
So, finally, here's the code for
raise_error_init()
:
my_bool raise_error_init(
UDF_INIT *initid
, UDF_ARGS *args
, char *message
){
unsigned int argStringLength;
if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT)
{
argStringLength = strlen(args->args[0])
+ 1
;
memcpy(
message
, args->args[0]
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
);
} else {
memcpy(
message
, "Unspecified error raised"
, 25
);
}
return 1;
}
Basically, what happens is that we check if a string argument was passed:
if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT
&& args->args[0]!=NULL)
and if so, we calculate the number of bytes needed to hold that string
argStringLength = strlen(args->args[0])
+ 1
;
(Mind you, the + 1 is there because strings are null-terminated (\0 in C))
Then, this information is used to copy the stretch of memory pointed to by the argment to the strecth of memory pointed to by our
message
:
memcpy(
message
, args->args[0]
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
);
and here, the precaution has been taken to restrict the number of bytes written to
message
:
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
Now there's also the possibility that the wrong number and types of argument were passed to our UDF. In that case, we just copy a default message to
message
:
memcpy(
message
, "Unspecified error raised"
, 25
);
The only thing we have to do now is return a 1 (one), and our
raise_error_init() is ready:
return 1;
raise_error()
For the actual function it isn't really important what we put in there, or what return type it will have. Remember, the function will never be executed because
raise_error_init()
already stopped before control is passed to
raise_error()
. Let's leave the details undiscussed right now. Let's just say that our
raise_error()
really does not do anything.
long long raise_error(
UDF_INIT *initid
, UDF_ARGS *args
, char *is_null
, char *error
){
return 0;
}
so, here it is, doing nothing.
Module Definiton and Compilation
So far we just typed C code. But what we need is that this is all compiled into a DLL, in such a manner that other programs, like MySQL Server, will be able to invoke these functions. We must
expose our functions using the module definition file. We created that as "MySQLRaiseErrorUDF.def". It's done like this:
LIBRARY MySQLRaiseErrorUDF
EXPORTS
raise_error_init
raise_error
That's it. Not too hard eh?
Now, hit the compile button and generate a DLL.
Deployment
I put the DLL in the bin directory under the MySQL installation directory. Then, on the command line i did:
mysql> create function raise_error returns int soname 'MySQLRaiseErrorUDF.dll';
Query OK, 0 rows affected (0.04 sec)
mysql> select raise_error();
ERROR:
Unspecified error raised
as you can see, an error is raised.
mysql> select raise_error('Whoops!');
ERROR:
Whoops!
mysql>
Getting rid of the function is also easy:
mysql> drop function raise_error;
Query OK, 0 rows affected (0.00 sec)
(You need to do that before you can overwrite a DLL that's linked to the server via a UDF).
Limitations
Right now there are some limitations to this approach.
The most important is that our argument really must be a constant value in order to return it:
mysql> set @err:='Whoops'
-> ;
Query OK, 0 rows affected (0.00 sec)
mysql> select raise_error(@Whoops);
ERROR:
Unspecified error raised
This occurs because of the way the
UDF_ARGS
is initialized when it is passed to
raise_error_init()
. The argument values are only initialized for constant argument expressions.
Of course, we can get around it using
prepared statements:
create procedure raise_application_error(
errno int
, errm varchar(255)
) begin
set @errno := errno;
set @errm := errm;
set @stmt := concat(
'select raise_error'
, '(''',errm,''')'
);
prepare stmt from @stmt;
execute stmt;
deallocate prepare stmt;
end;
//
mysql> set @msg := 'Whoops, my bad, sorry!'
-> //
Query OK, 0 rows affected (0.00 sec)
mysql> call raise_application_error(-20000,@msg);
-> //
ERROR:
Whoops, my bad, sorry!
mysql>
This example also proves we can raise the error from inside a procdure. By the way, it works for triggers too, I checked.
Now what?!
Well, I would really like to do it the other way around too: reporting the current error via an UDF. But from what I've seen so far, this is not possible.
Anyway,I'm really glad I got my first UDF up an running. It took me some time (especially the module definition was not clear from any background docs i know), but at least now I know some of the pitfalls etc.