Wednesday, November 23, 2005

Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers

Yes, this is really a feature that would be most welcome.

Ever since I started following the MySQL Stored Procedure/Functions and Trigger Forims (say, June 2005) this feature, or rather, the lack thereof, has been the subject of a substantial amount of threads (see: http://forums.mysql.com/read.php?98,24044,24100, http://forums.mysql.com/read.php?99,22523,23797#msg-23797 or more recent: http://forums.mysql.com/read.php?99,55108,55541)

Luckily, this has been recognized by the MySQL developers, and there's an entry in the MySQL 5.0 Technical FAQ that addresses the issue.

But, there's always people that can't wait. For example, Scott Maxwell devised a most creative solution. Props to you Scott!

Well, I scribbled up a little solution that takes another aproach to the problem. I won't pretend I'm the first one that thought of this. It was actually suggested quite a few times, here and there. I must say, I did not see anyone offering the code to do it, but this maybe just ignorance on my part. At least it gave me the opportunity to touch a whole new field of MySQL development: User Defined Functions or UDFs.

Using an UDF


A UDF is an external function that's dynamically linked to the MySQL server software. You can look it all up in the reference manual. There are some obscurities, especially when you're not an C programmer.

(However, the obscurities could be intentional. Maybe it's meant that way, just to scare people off that don't know what they're doing. Well, I'm the first to admit it: I am not an experienced C programmer, and most of the time (well...) I don't know what I'm doing (not just in C, applies to almost any programming language). I guess the difference between starting to do something not knowing what you're doing and finishing doing that is called experience. Or failure. Anyway, the key is: not letting other people know which one it is, and at the same time pretending you do know the difference yourself.)

First, you need have the object code that contains the UDF. The object code is just the compiled, binary code that contains your functions. Now this compilation must be done in a certain manner so that it results in a library that can be dynamically linked to the MySQL Server process. I don't know how this works out for a linux box (but it has something to do with .so files, your guess is probably better than mine is), but in Windows Speak this is called a DLL (Dynamicaly Linked Library).

Once you have the binary, it must be placed in some location that's accessible to MySQL (I'll tell you in a bit where i put it). Finally, you must issue some specialized DDL command so that MySQL knows it exists (yep, I'll tell you that too). From that point on, you will have this UDF accessible throughout the server. Just like the builtin functions, you will never have to prefix it with a database name. You will never have to grant it to anyone either. It's just like using builtin functions, such as LOWER() or GREATEST().

I'll try the best I can to describe what I did to get it working. I really hope it helps.

Setting up a Project


We have a very modest goal here. We just need one little function that will do nothing but raise an error. Normally, I'm quite good at breaking stuff, and my code gives runtime errors all the time, so this shouldnt be too hard. So let's get started and code some stuff.

Well, the relevant pages in the manual say that you should write the function in C. You probably could do it in other languages as well as long as calling conventions, stack properties and who knows whatnot will be the same. But, I'm not Indiana Jones: just writing a little bit of C is adventure enough for now.

I just used Visual Studio C++ 2003 for no particular reason. It's just what I happen to run. You could probably achieve the same with another IDE and Compiler. But hey, I don't know those, so I won't describe that (Maybe when I get Eclipse hooked up to a freely obtainable compiler I'll relive the experience). BTW, you can download the free (as in bear) VC++ express edition from you know who here. Mind you, it's not entirely the same, but it should support the basic features I'm using here.

I opened a new, blank solution and added a win32 empty DLL project: "MySQLRaiseErrorUDF.vcproj". I added one source C/C++ source file: "MySQLRaiseErrorUDF.cpp", and one Module Definition file: "MySQLRaiseErrorUDF.def". At this stage, both are still empty. However, Visual Studio knows that it should offer C/C++ specific property pages in the Project options dialog, which is good, cause that's what we'll be setting up next.

(Note that the .cpp extension is usually associated with C++ source files. However, we do not use any C++ specific stuff. It just happend to be the only sort of source file I could create using the VS wizard.)

I right-clicked the project to modify the following options:

  • In the "General" page of the "C/C++" options folder, I modified the "Additional Include Directories" property to set it to the "include" folder under the MySQL installation directory (available after a full or custom/development install). We need this to be able to refer to the "mysql.h" header file in our source.

  • In the "Input" page of "Linker" folder, I included the "MySQLRaiseErrorUDF.def" file we just added to the project.



Coding the DLL


Now, we can start coding. First, our C source needs to include the "mysql.h" header file. This contains references to "winsock.h" which is included in the windows platform sdk. Most probably, the latter has a different counterpart on other operating systems. (Anyone that can enlighten me concerning this, and even help making the source compilable for multiple platforms, is most invited.) So, we start like this:


#include <winsock.h>
#include <mysql.h>


A simple, non-aggregate function that is to be called from the MySQL side corresonds to three functions in the C code. Say, we want to create a function called: raise_error(). Then, our C file must at least contain the code for that, and optionally contain a raise_error_init() and a raise_error_deinit() function. (look it up in the manual)

We'll just rush on to coding the raise_error_init function. Once we got that, you'll see that the raise_error() is really very trivial.

Now, raise_error_init() functions MUST always have this signature:
(Look here for the relevant page in the reference manual.)


my_bool raise_error_init(
UDF_INIT *initid
, UDF_ARGS *args
, char *message
);


There's not much to explain as to why it must be like this: that's just the way MySQL was built. I can tell what this signature contains, and hope that that'll be enough of an explanation.

initid


The initid is declared as a pointer to a struct of the UDF_INIT type. This is defined in mysql_com.h (ln 369..376). It doesnt really matter right now what a pointer is, or what a struct is. What does matter, is that the initid variable is like a composite variable. You can think of it like a record which has a couple of fields (members is the appropriate term here) that contain information about our function.

On the one hand, mysql passes initid to our raise_error_init() function so that we get some information on the context that's calling our raise_error() function; On the other hand, we can change the values in the fields of initid to provide information to our raise_error(), because the very same initid will be passed on by MySQL to the raise_error() after raise_error_init() finishes.

args


Like initid, args is a pointer to a particular struct, this time, the UDF_ARGS struct. That's defined in mysql_com.h (ln 356..365).

args provides information on the arguments passed to our raise_error() function. Again, we can both read and write data to and from the members of this struct just before the raise_error() itself is called.

message


Now we are arriving at the heart of the matter.
The message variable is again a pointer (man, these guys just love pointers). Maybe now it's a good moment to explain what a pointer is exactly.

A pointer is actually just an integer value. But it's a special integer value: it's an integer that identifies, or "points to", a position in the computer's memory. Quite literlly, the pointer's integer value is the address of a byte in the computer's RAM.

Pointers have some added semantics. Their type definition also includes the datatype of the stuff that's stored at that address. Or more appropriately, it includes the datatype that will be used to interpret whatever happens to be in the computer's memory starting from the address identified by the pointer integer value. For example, the message argument is a char pointer, notated as char * in C.

Now, the contract of the XXX_init() function holds that message points to a buffer of a particular size, MYSQL_ERRMSG_SIZE bytes long. Knowing what we know now, we can also say that message identifies an address that demarcates a stretch of consecutive bytes that are reserved.

Furthermore, XXX_init() expects us to write to this memory a string of characters that human users may read and recognize as a meaninful error message text. At least, if we can detect an error right now.

The main reason to use this feature for an ordinary function would be to check that the UDF has been called with the right amount and types of arguments in order to exit in case of invalid arguments. Now, for the function we're writing now it's easy: raise_error() should always generate an error; that's the entire reason for it's existence. And, while we're at it, it would be cool to use the value passed by the user as argument as error message. That would pretty much conclude the functionality of raise_error().

Returning from raise_error_init()


Actually, the message argument to raise_error_init() is only half of the story concerning error generation. (see the manual for more details) In order to let MySQL know that an error occurred we MUST return a 1 (one). Normally, a zero (0) should be returned, indicating that execution may proceed execution.

raise_error_init()


So, finally, here's the code for raise_error_init():


my_bool raise_error_init(
UDF_INIT *initid
, UDF_ARGS *args
, char *message
){
unsigned int argStringLength;
if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT)
{
argStringLength = strlen(args->args[0])
+ 1
;
memcpy(
message
, args->args[0]
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
);
} else {
memcpy(
message
, "Unspecified error raised"
, 25
);
}
return 1;
}


Basically, what happens is that we check if a string argument was passed:


if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT
&& args->args[0]!=NULL)


and if so, we calculate the number of bytes needed to hold that string


argStringLength = strlen(args->args[0])
+ 1
;


(Mind you, the + 1 is there because strings are null-terminated (\0 in C))

Then, this information is used to copy the stretch of memory pointed to by the argment to the strecth of memory pointed to by our message:


memcpy(
message
, args->args[0]
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
);


and here, the precaution has been taken to restrict the number of bytes written to message:


, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength


Now there's also the possibility that the wrong number and types of argument were passed to our UDF. In that case, we just copy a default message to message:


memcpy(
message
, "Unspecified error raised"
, 25
);


The only thing we have to do now is return a 1 (one), and our raise_error_init() is ready:


return 1;


raise_error()


For the actual function it isn't really important what we put in there, or what return type it will have. Remember, the function will never be executed because raise_error_init() already stopped before control is passed to raise_error(). Let's leave the details undiscussed right now. Let's just say that our raise_error() really does not do anything.


long long raise_error(
UDF_INIT *initid
, UDF_ARGS *args
, char *is_null
, char *error
){
return 0;
}


so, here it is, doing nothing.

Module Definiton and Compilation


So far we just typed C code. But what we need is that this is all compiled into a DLL, in such a manner that other programs, like MySQL Server, will be able to invoke these functions. We must expose our functions using the module definition file. We created that as "MySQLRaiseErrorUDF.def". It's done like this:


LIBRARY MySQLRaiseErrorUDF
EXPORTS
raise_error_init
raise_error


That's it. Not too hard eh?
Now, hit the compile button and generate a DLL.

Deployment


I put the DLL in the bin directory under the MySQL installation directory. Then, on the command line i did:


mysql> create function raise_error returns int soname 'MySQLRaiseErrorUDF.dll';
Query OK, 0 rows affected (0.04 sec)
mysql> select raise_error();
ERROR:
Unspecified error raised


as you can see, an error is raised.


mysql> select raise_error('Whoops!');
ERROR:
Whoops!
mysql>


Getting rid of the function is also easy:


mysql> drop function raise_error;
Query OK, 0 rows affected (0.00 sec)


(You need to do that before you can overwrite a DLL that's linked to the server via a UDF).

Limitations


Right now there are some limitations to this approach.
The most important is that our argument really must be a constant value in order to return it:


mysql> set @err:='Whoops'
-> ;
Query OK, 0 rows affected (0.00 sec)

mysql> select raise_error(@Whoops);
ERROR:
Unspecified error raised


This occurs because of the way the UDF_ARGS is initialized when it is passed to raise_error_init(). The argument values are only initialized for constant argument expressions.

Of course, we can get around it using prepared statements:


create procedure raise_application_error(
errno int
, errm varchar(255)
) begin
set @errno := errno;
set @errm := errm;
set @stmt := concat(
'select raise_error'
, '(''',errm,''')'
);
prepare stmt from @stmt;
execute stmt;
deallocate prepare stmt;
end;
//

mysql> set @msg := 'Whoops, my bad, sorry!'
-> //
Query OK, 0 rows affected (0.00 sec)

mysql> call raise_application_error(-20000,@msg);
-> //
ERROR:
Whoops, my bad, sorry!
mysql>


This example also proves we can raise the error from inside a procdure. By the way, it works for triggers too, I checked.

Now what?!


Well, I would really like to do it the other way around too: reporting the current error via an UDF. But from what I've seen so far, this is not possible.

Anyway,I'm really glad I got my first UDF up an running. It took me some time (especially the module definition was not clear from any background docs i know), but at least now I know some of the pitfalls etc.

16 comments:

Anonymous said...

Thank you.

But why this can't be used to emulate SIGNAL?

rpbouman said...

Hi anonymous!

I don't know what you mean exactly, but I'll make a guess.

According to SQL 2003 PSM (part 4), SIGNAL syntax is like this:

<signal statement> ::=
SIGNAL <signal value> [ <set signal information> ]

So at least we will have to accept that we must use enclosing parenthesis.

The spec also sais this:

<signal value> ::=
<condition name>
| <sqlstate value>

well, we can pass whatever condition name or sqlstate to whatever UDF we can think of....there will never be a HANDLER clever enough to recognize that is should react specifically to our UDF raising an error.

To exen come close, you must be able to set the internal mysql error number from within the UDF; of course, for the same session that runs the UDF.

I looked into that, and as far as i can see this is not possible. I would like to be able too of course, also for the other way around: reporting the exact error in case of handelr for SQLException

rpbouman said...

Yes that's true, you can setup a context using user variables such as i did with @sqlerrm and @sqlcode, mimicking the Oracle stuff.

But what I meant by returning the error code from an UDF is this. Suppose I have BEGIN..END block, in which all kinds of stuff could go wrong: constraint violations, NO DATA FOUND, tables does not exist...you name it. Suppose these are all genuine error situations that i cannot compensate. Well, what I would want to do is something like this:

BEGIN
DECLARE EXIT HANDLER
FOR SQLEXCEPTION
INSERT
INTO ERROR_LOG(
proc
, block
, error_number
, error_message
, sql_state
) VALUES (
current_proc()
, current_block()
, mysql_error_number()
, mysql_error_message()
, sql_state()
);
/*
here comes the stuff
that could go wrong
*/
END;

Now, I can live without current_proc() and current_block(), but it would be nice to have at least one function that returns the errornumber or the sqlstate. And, from as far as I can see, you can't do that, not even with an UDF.

While it's true I can emulate something like this if the only errors would be raised by the UDF raise_error, I still need to be able to do the same for 'normal' errors.

Another thing we can't do is define a CONDITION for an arbitrary expression:

BEGIN
DECLARE v_error bool default false;
DECLARE myCondition
CONDITION FOR
v_error = TRUE;
DECLARE CONTINUE HANDLER
FOR myCondition
BEGIN
call compensate();
set v_error := FALSE;
END;
/* do stuff */
END;

Anonymous said...

Very good article!
I use ubuntu and this is what I had to do to make it work. Install libmysql++-dev.

Replace winsock.h with string.h in the include statement.

As root, run the command:
gcc -shared libmysqlraiseerror.c -o libmysqlraiseerror.so -I /usr/include/mysql

Where libmysqlraiseerror.c is the C source file and libmysqlraiseerror.so is the output file name.

copy libmysqlraiseerror.so into the /usr/lib

rpbouman said...

Thank you for your kind comments! Glad you could get it to work too :)

Unknown said...

I'm trying to compile in OpenSolaris 2008.05, but I always receive the error "ERROR 1123 (HY000): Can't initialize function 'raise_error';".
Do you have any tips/tricks?
I've compiled ir with no problems, and also the function loading is working ok.
I'm using OpenSolaris 64 bit.

rpbouman said...

Hi Edson,


Do you have the raise_error_init function defined?

rpbouman said...

Oh Edson, sorry my comment does not make sense I guess. I am wondering did you use exactly the code I have for raise_error_init()?

I don't have Solari64 bit running so I can't quickly check it out myself. But if you have the time, please send me the code, and I'll have a look. Maybe I see something you accidentally overlooked.

Thanks!

Roland

Unknown said...

I've changed the code little bit during my tests. This is the last version:

#include <mysql.h>

#include <string.h>


my_bool raise_error_init(UDF_INIT *initid, UDF_ARGS *args, char *message);

long long raise_error(UDF_INIT *initid __attribute__((unused)), UDF_ARGS *args,
char *is_null __attribute__((unused)),
char *error __attribute__((unused)));


my_bool raise_error_init(
UDF_INIT *initid

, UDF_ARGS *args

, char *message
){
unsigned int argStringLength;
if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT)
{
argStringLength = strlen(args->args[0])
+ 1
;
memcpy(
message
, args->args[0]
, argStringLength>MYSQL_ERRMSG_SIZE
? MYSQL_ERRMSG_SIZE
: argStringLength
);
} else {
memcpy(
message
, "Unspecified error raised"
, 25
);
}
return 1;
}

long long raise_error(UDF_INIT *initid __attribute__((unused)), UDF_ARGS *args,
char *is_null __attribute__((unused)),
char *error __attribute__((unused))) {
return 0;
}

int main(int argc, char **argv)
{
return 0;
}


compiled with:

gcc -shared -R/usr/sfw/lib/64 -m64 libmysqlraiseerror.c -o libmysqlraiseerror.so -I /opt/mysql/mysql/include -fPIC

I suspect the problem is "long long", because the machine already is 64bit... But I'm not sure - I just leaved C programming 15 years ago...

Thanks for your kind support.

rpbouman said...

My initial though too was that it might be the long long, and if you really want to you can use my_longlong or my_ulonglong instead.

However it doesn't really add up since the error message says

"ERROR 1123 (HY000): Can't initialize function 'raise_error';"

Well, since you return 1 from the init function *this is exactly the expected result*.

What is unexpected is that you do not see the argument as message.

Now, I do see a problem in your init function. This line:

argStringLength = strlen(args->args[0])

is not good. You cannot (should not) attempt to determine the argument length using strlen. You see, both UDF args as well as return value (for string UDFs) can convey a binary string. So, these strings are not \0 terminated.

Use args->lengths[index] to determine the length instead. (see http://dev.mysql.com/doc/refman/5.1/en/udf-arguments.html)

So, something like:

my_bool raise_error_init(
UDF_INIT *initid
, UDF_ARGS *args
, char *message
){
if(args->arg_count==1
&& args->arg_type[0]==STRING_RESULT){
int length = (args->lengths[0] > (MYSQL_ERRMSG_SIZE - 1)) ? MYSQL_ERRMSG_SIZE - 1: args->lengths[0];
memcpy(
message
, args->args[0]
, length
);
message[length] = '\0';
} else {
memcpy(
message
, "Unspecified error raised"
, 25
);
}
return 1;
}

should work.

Unknown said...

Thanks, Roland!
You did the trick.
Interesting to note, MySQL 5.1.25 produces a little different error message as result:

mysql> select raise_error('Whoops!');
ERROR 1123 (HY000): Can't initialize function 'raise_error'; Whoops!

So, it's working, besides the error appears with the "Can't initialize function 'raise_error'; in the message.

There is any chance to implement this same procedure but using the raise_error (returning the message, and then "return 0" in body of main function)?

Actually, this is just a brainstorm...

Thanks for your support so far!

Edson Richter

rpbouman said...

Hi Edson,

I'm so glad it is working now ;)

Yeah, I think I noticed a change in the message too, maybe in 5.1.23 already. I don't think you can change that easily. Maybe it is possible to hack the message files although that is kind of dodgy.

re. returning 0 - no, that won't work. You signal an error by returning something else than zero - that's just how the interface works.

To tell you the truth, I think MySQLs current behaviour is actually sane - the error message makes a lot of sense if you really do have a problem with a failed UDF. The UDF to raise an error is basically a hack, just like any of the other methods used to force a runtime error.

Anyway, I do think there is something you can do. In your application or better, db access layer, you can check for the code, message *and function name*. As long as you reserve "raise_error" for this trick, you can distinguish between a real UDF failure and a raised error at the application level. Yes, I know - ugly but it will work, and it will be way less cumbersome than for example causing a duplicate key or not null problem. An important thing to note is that with the UDF you at least have a fighting chance to work around it - the other methods to raise a runtime error don't have any method to let you discover an raised error occurred.

I hope this helps,

Roland

Anonymous said...

hi Roland, can i have copyed this code to vis2005 and i havent compile this code, i have get alot of error. can i get a complied file for linux from you.thanx alot. my email hasimahmet@hotmail.com

rpbouman said...

Hi hasimahmet@hotmail.com,

Please goto

http://www.mysqludf.org/lib_mysqludf_udf/index.php

and download

http://www.mysqludf.org/lib_mysqludf_udf/lib_mysqludf_udf_0.0.3.tar.gz

Read instructions here to compile on windows:

http://rpbouman.blogspot.com/2007/09/creating-mysql-udfs-with-microsoft.html

After compiling, read and use http://www.mysqludf.org/lib_mysqludf_udf/index.php#udf_init_error

Good luck!

Anonymous said...

I'm litle iddle !
can you post the binary DLL ?

kutil kelamin said...

I read few articles on this site and I think that your web blog is real interesting and Power to the People of excellent information.
Keep up the excellent work

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...