Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Thursday, June 18, 2009

MySQL Stored Functions: Impact of DECLARE HANDLER on Performance

Hi again!

Just a moment ago, I read this post by Peter Gulutzan. In this post, Peter explains a little trick that allows you to capture the SQL state in a variable whenever an error occurs in your MySQL stored routine code.

MySQL CONDITIONs and HANDLERs

For the uninitiated: in MySQL stored routines, you can declare HANDLERs which are pieces of code that are executed only in case a particular CONDITION occurs. This device serves the same purpose as a try ... catch block which is supported in many popular programming languages like Java, JavaScript, C++, PHP5 and C#.

Now, one of the long-standing problems with MySQL HANDLERs - the inability to explicitly raise a CONDITION - has recently been solved by implementing the SIGNAL syntax. The most important issue that remains is the inability to discover exactly which error occurred. You need this especially when you are writing a generic 'catch all' HANDLER (for example, on that is triggered in response to SQLEXCEPTION) and you want to write information regarding the error to a log.

Peter's Trick to Capture SQLSTATE

To cut a long story short, Peter's solution is based on writing a HANDLER for all known CONDITIONs in advance. Here's a fragment of his code to explain:
BEGIN
...
DECLARE EXIT HANDLER FOR SQLSTATE '01000' BEGIN SET @e='01000'; RESIGNAL; END;
...
DECLARE EXIT HANDLER FOR SQLSTATE 'XAE09' BEGIN SET @e='XAE09'; RESIGNAL; END;

...remainder of code...

END;
As Peter points out, it's tedious, but it works. (That is to say, it works better than not having anything)

Performance?

Now, one particular paragraph in Peter's post caught my eye:
I added 38 DECLARE EXIT HANDLER statements at the start of my procedure, just after the variable declarations. These lines are always the same for any procedure. They’re not executed unless an error happens so I don’t worry about speed.
I respect Peter a great deal - if he's got something to say you do well to listen and take his word for it. However, this time I was curious to find out if I could measure the effect of the HANDLER code at all.

Method

The code I used is a simplification of Peter's code. I tested it on MySQL 5.1 because I was just interested in the impact of a DECLARE HANDLER statement. In my case, the BEGIN...END block of the handler does not contain the RESIGNAL statement, and my function does not drop a table but simply returns 1. This is important, as none of the HANDLERs is ever triggered by my code.

Last week I wrote how seemingly small changes in MySQL stored routine code can have a surprisingly large impact on performance. In that particular case, I already had a hunch about which things could be improved. In this case, I just didn't know so I created a series of functions with 2, 4, 8, 16, 32 and 38 DECLARE HANDLER statements, and again I used the BENCHMARK() function to measure the time it takes to execute it 100,000 times. I did warm-up runs, and repeated the measurement 5 times for each function variant.

Results

The graph below summarizes my observations:
handler
The squares in the graph are the measurements - each one represents a version of the function. Along the horizontal axis you see the number of DECLARE HANDLER statements in that particular version of the function. The number of seconds it took to execute the function 100,000 times on my laptop using BENCHMARK() is on the vertical axis.
As you can see, there seems to be a linear relationship between the number of DECLARE HANDLER statements and the time it takes to execute the function. This is in itself a surprise. I mean, I would expect a little overhead per DECLARE statement when the function is compiled initially. After that, it is cached at the session level, and beyond that point I would not expect statements that do not execute to have any impact.

So how badly do the HANDLER declarations slow our function down? Well, I measured an average of 0.38 seconds for 2, and an average of 0.55 seconds for 38 DECLARE HANDLER statements respectively. The difference, 0.18 seconds is a little less than 50% of the function variant with 2 DECLARE HANDLER statements, and a little more than 30% of the function having 38 DECLARE HANDLER statements.

Conclusion

To be fair, the function I tested doesn't actually do anything, and if your function or stored procedure does some real processing, the overhead may be neglectable. However, you can clearly see that even just declaring a handler has a measurable negative impact on performance. The essence of Peter's trick is to actually always write a DECLARE HANDLER for each possible condition and to do this for each stored routine. You will certainly suffer a peformance hit for small functions, esp. if they get called a lot.

5 comments:

Peter Gulutzan said...

I suppose I should worry more before saying "don't worry".
I've edited my blog posting to refer to the above testing.

Peter Gulutzan
Sun Microsystems / MySQL

Shlomi Noach said...

Yet again, great post.

Though I have no information on how the HANDLER is implemented in MySQL stored routines, I can project from my Java and C++ knowledge, that there is indeed an overhead for handling errors, and per error for that matter.
In java bytecode, for example, you need to code information about the scope of code for which each 'catch block' (HANDLER in our case) is responsible for.
This adds not only more bytes to the code, but also search logic when something does occur.

I suppose naive implementations may choose to verify per block if it's required to enter it or not (verify for each HANDLER is it should spawn) - and hopefully MySQL does not implement it this way.

Roland Bouman said...

Hi!

@PeterG: Worry is too much, I would say it's good to be aware of the impact. Your workaround is still valid and usable - one should just use it wisely. For stored procedures that need to do a lot of IO anyway, a few extra HANDLERs won't hurt. For a small function, it's perhaps better to trap errors in the calling context, not for each function call.

@Shlomi: thanks, glad you liked it :) And thanks for your explanations of where the overhead might come from. I was thinking that perhaps simply code size was the culprit.

Anyway - all thanks for your comments, and kind regards,

Roland

Luc Boudreau said...

I'd be curious to have benchmarks of your tests for cases when an exception does occur. Declaring 38 handlers might have a slight and neglectable impact on the performance when nothing turns sour, then again, what happens if the error handling mechanism of stored procedures *does* have to go through all those declarations and find the proper execution path.

Roland Bouman said...

Hi Luc!

it's an interesting point - I haven't tested it so I don't know.

Perhaps if I have some time later on, I'll post the results here.