Tuesday, December 15, 2009

Validating MySQL data entry with triggers: A quick look at the SIGNAL syntax

The latest MySQL 5.5 milestone release offers support for an ANSI/ISO standard feature called the SIGNAL syntax. You can use this syntax inside stored routines (including triggers) to raise an error condition which can be used to invoke specific error handling, or otherwise abort the stored routine. In addition, you can use the SIGNAL syntax to convey information about what went wrong, which may be used by the caller to handle the error.

I have written about MySQL data entry validation procedures in the past. At the time, MySQL did not support any proper means to raise an error condition inside a stored routine, and one had to work around that by deliberatly causing a runtime error, for example by referring to a non-existent table, setting a non-nullable column to null, or ]Yting a specially crafted UDF. In this artcle, I'm taking a closer look at how to implement more robust data entry validation in MySQL using the SIGNAL syntax.

Triggers


For those of you that are unfamiliar with the subject, MySQL offers support for triggers as of version 5.0. Triggers are stored routines that are executed automatically right before or after data change events like a row being inserted, updated or deleted. Because triggers are executed as part of the SQL statement (and its containing transaction) causing the row change event, and because the trigger code has direct access to the changed row, you could in theory use them to correct or reject invalid data.

Example data validation problem


Let's take a quick look at the following example. Suppose you have a table called person to store data about persons:

CREATE TABLE person (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, first_name VARCHAR(64) NOT NULL
, last_name VARCHAR(64) NOT NULL
, initials VARCHAR(8)
)

Let's consider a simple validation method for the initials column:

  • We require the first letter of the value for the initials column to match the first letter of the value of the first_name column.

  • In addition, we require that the values for the initials column consists of uppercase letters separated by periods.


In order to implement this, we design the following algorithm:

  • If the value for first_name is NULL, we do nothing. The NOT NULL table constraint will prevent the data from being entered anyway, so further attempts at validation or correction are pointless.

  • If the value for initials is NULL, we correct it by automatically filling in the first character of the value for first_name.

  • If the values for first_name as well as initials are both not NULL, we require that the first character of the value for first_name equals the first character of the value for initials.

  • Finally, we use a regular expression to check if the value for initials matches the desired pattern of uppercase letters separated by periods.


A data validation stored procedure


Let's start by creating a stored procedure to perform this algorithm. Here's the code for the p_validate_initials procedure which validates and possibly corrects the initials value based on the value for first_name:

DELIMITER go

CREATE PROCEDURE p_validate_initials(
IN p_first_name VARCHAR(64)
, INOUT p_initials VARCHAR(64)
)
DETERMINISTIC -- same arguments yield same result, always
NO SQL -- does not execute SQL statements, only procedural logic
_main: BEGIN

DECLARE WARN_CORRECTED_INITIALS CONDITION FOR SQLSTATE '01000';
DECLARE ERR_INITIALS_DONT_MATCH_FIRSTNAME CONDITION FOR SQLSTATE '45000';
DECLARE ERR_INITIALS_ILLFORMATTED CONDITION FOR SQLSTATE '45000';

IF p_first_name IS NULL THEN
LEAVE _main; -- nothing to validate
ELSEIF p_initials IS NULL THEN -- initials are NULL, correct:
SET p_initials := CONCAT(LEFT(p_first_name, 1), '.'); -- take the first letter of first_name
SIGNAL WARN_CORRECTED_INITIALS -- warn about the corrective measure
SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';
ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THEN -- initials don't match first_name
SIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME -- raise an error
SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name';
END IF;
IF NOT p_initials REGEXP '^([A-Z][.])+$' THEN -- if initials don't match the correct pattern
SIGNAL ERR_INITIALS_ILLFORMATTED -- raise an error
SET MESSAGE_TEXT = 'The value for initials must consist of upper case letters separated by periods.';
END IF;
END;
go

DELIMITER ;

Let's take a look at how this procedure works.

How to issue warnings


First, let's pass NULL for the initials to see if they are properly corrected:

mysql> set @initials := null;
Query OK, 0 rows affected (0.00 sec)

mysql> call p_validate_initials('Roland', @initials);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select @initials;
+-----------+
| @initials |
+-----------+
| R. |
+-----------+
1 row in set (0.00 sec)

Note that executing the procedure placed the correct value into the @initials user-defined variable. Also note that a warning was issued.

In this case, the procedure ran through the following branch of the first IF statement:

...
ELSEIF p_initials IS NULL THEN
SET p_initials := CONCAT(LEFT(p_first_name, 1), '.');
SIGNAL WARN_CORRECTED_INITIALS
SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';

ELSEIF ...

The warning is caused by the SIGNAL statement (which appears in bold text in the snippet above). The general syntax for the SIGNAL statement is:

SIGNAL condition_value
[SET signal_information [, signal_information] ...]

condition_value:
SQLSTATE [VALUE] sqlstate_value
| condition_name

signal_information:
condition_information_item = simple_value_specification

So, in this case, we used the condition name WARN_CORRECTED_INITIALS as condition_value. This condition is declared in the top of the procedure:

DECLARE WARN_CORRECTED_INITIALS CONDITION FOR SQLSTATE '01000';

Basically, condition declarations like these serve simply to tie a human readable name to otherwise obscure SQLSTATE values. As per the SQL standard, SQLSTATE values are strings of 5 digits. The prefix 01 indicates a warning.

(Condition declarations like these are not only useful to clarify the meaning of your SIGNAL statements, you can also use them to declare error HANDLERs)

An alternative syntax for SIGNAL allows you to directly refer to the SQLSTATE without explicitly declaring a CONDITION. So, if you feel that declaring explicit conditions is too much trouble, you can also omit that and write:

SIGNAL SQLSTATE '01000'
...

(However, I like using explicit condition names better because it does a better job of explaining the intention of the code.)

Conveying SIGNAL context information


The SIGNAL statement also features a SET-clause which is used to convey signal information. In our example the set clause was:

SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';

As you can see, the SET-clause in the example contains an assignment. In the context of the SIGNAL statemntt, such an assignment is referred to as signal_information. The left hand side of the assignments must be one of the predefined condition_information_items. The SET-clause can have multiple of these signal_information items which can be used to capture and communicate program state to the client program.

In the case of the example we can demonstrate how this works using the MySQL command-line client. By issuing a SHOW WARNINGS statement, we can see the message text was conveyed with with the signal_information item:

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------+
| Error | 1644 | The value for initials must consist of upper case letters separated by periods. |
+-------+------+---------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Unfortunately, MySQL does not currently support a way for any calling stored routines to capture the signal_information items. They are currently only available in the APIs you use to communicate with MySQL, so you can capture them in your application code.

(A bug has been filed to ask for the ability to refer to signal information items in stored routines. This should become available whenever MySQL implements a DIAGNOSTICS feature)

Predefined condition information items


I just mentioned that the left-hand side of the signal_information item assignment must be one of the predefined condition_information_items. These are dictated by the standard, and although MySQL allows all of the standard condition_information_items, only two of them are currently relevant: MESSAGE_TEXT and MYSQL_ERRNO. We already illustrated using MESSAGE_TEXT. The MYSQL_ERRNO is a non-standard condition information item that can be used to convey custom error codes.

This leaves currently three variables to convey information about the context of the SIGNAL statement:

  • SQLSTATE: available in the C API as mysql_sqlstate()

  • MYSQL_ERRNO: available in the C API as mysql_errno()

  • MESSAGE_TEXT: available in the C API as mysql_error()


(You should be able to obtain the information also in PHP through the corresponding mysqli_, pdo_ and mysql_ functions.)

How to issue errors


We just discussed how to cause your stored routine to issue warnings. Issuing errors is exactly the same process, it just relies on a different class of SQLSTATE values (as determined by the code prefix). Let's see the errors in action:

mysql> set @initials := 'r';
Query OK, 0 rows affected (0.00 sec)

mysql> call p_validate_initials('Roland', @initials);
ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

In this case, the stored routine ran through the last branch of the first IF statement:

ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THEN
SIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME
SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name';
END IF;

As you can see, the syntax for the actual SIGNAL statement is exactly similar to what we saw in the example illustrating warnings. The most important difference is that in this case, the condition that is being signalled is declared with a SQLSTATE value of 45000:

DECLARE ERR_INITIALS_DONT_MATCH_FIRSTNAME CONDITION FOR SQLSTATE '45000';

The SQLSTATE code 45000 is a special and indicates a general user-defined exception.

Using the procedure in a trigger


All we need to do now is create our triggers on the person table that call the procedure to perform the actual validation. We need to apply the validation when data is inserted into the table, but also when data is updated. If it turns out the data is invalid, we need to reject the change. For this reason, we want to create triggers that fire before the data change is applied to the table.

So, to enforce validation, we need two triggers: one that fires BEFORE INSERT events, and one that fires BEFORE UPDATE events. Because the validation process itself is the same regardless of the type of change event, both triggers can call the p_validate_initials procedure to perform the actual validation. This allows us to write (and maintain!) the validation logic only once, and reuse it whenever we need it.

DELIMITER go

CREATE TRIGGER bir_person
BEFORE INSERT ON person
FOR EACH ROW
BEGIN
CALL p_validate_initials(
NEW.first_name
, NEW.initials
);
END;
go

CREATE TRIGGER bur_person
BEFORE UPDATE ON person
FOR EACH ROW
BEGIN
CALL p_validate_initials(
NEW.first_name
, NEW.initials
);
END;
go

DELIMITER ;

A quick check indicates that data validation is now enforced as intended:

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', 'r');
ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', 'R');
ERROR 1644 (45000): The value for initials must consist of upper case letters separated by periods.

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', NULL);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------+
| Warning | 1642 | Corrected NULL value for initials to match value for first_name. |
+---------+------+------------------------------------------------------------------+
1 row in set (0.00 sec)

Final words


For more information on MySQL 5.5, check out Giuseppe Maxia's article on Getting Started with MySQL 5.5. Detailed information on the SIGNAL syntax is available in the reference manual here: http://dev.mysql.com/doc/refman/5.5/en/signal-resignal.html.

24 comments:

Shlomi Noach said...

As always, thorough and informative!

LenZ said...

Great article. Thanks a lot for this introduction, Roland! This is very useful.

Matthew Montgomery said...

This is a great article and I look forward to using SIGNAL/RESIGNAL. One problem I see in relying on these procedures is that since mysqld does not support a comprehensive ALTER PROCEDURE or atomic CREATE OR REPLACE PROCEDURE syntax, maintenance of procedures requires separate DROP and CREATE commands. This then requires that you block writes to these tables while the procedure is being re-made, otherwise they could throw errors.

This old outstanding bug/feature really should get more attention: http://bugs.mysql.com/bug.php?id=9588

rpbouman said...

Hi All, thanks for the kind words.

Matthew, perhaps you can use a work around: LOCK the table, then recreate the procedure?

Shlomi Noach said...

Roland,
Not sure about routines, but definitely won't work with triggers, as my experience tells. While LOCKing tables, triggers cannot be dropped; being the late hour it is, I don't have the link to the bugs system; but this is reported.

Shlomi

rpbouman said...

Hi Shlomi!

odd, I seem to be able to do it anyway:

mysql> lock tables person write;
Query OK, 0 rows affected (0.01 sec)

mysql> drop trigger bir_person;
Query OK, 0 rows affected (0.01 sec)

mysql> DELIMITER go
mysql>
mysql> CREATE TRIGGER bir_person
-> BEFORE INSERT ON person
-> FOR EACH ROW
-> BEGIN
-> CALL p_validate_initials(
-> NEW.first_name
-> , NEW.initials
-> );
-> END;
-> go
Query OK, 0 rows affected (0.09 sec)

mysql> DELIMITER ;
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

Anonymous said...

Hi Roland! As is typical, great article :) One little typo, though:

"The MESSAGE_ERRNO is a non-standard condition information..."

I believe you meant MYSQL_ERRNO, not MESSAGE_ERRNO :)

Cheers!

Jay

rpbouman said...

Hi Jay!

thanks for the kind words, I appreciate it a lot. And, thanks for spotting the typo - you are right.
Fixed now!

thanks again,

Roland.

Matthew Montgomery said...

Roland,

Sorry, I'm talking about maintenance of the procedure itself not the trigger. LOCK TABLE won't work here.

mysql> create table t1 (a int);
Query OK, 0 rows affected (0.01 sec)

mysql> lock table t1 read;
Query OK, 0 rows affected (0.00 sec)

mysql> delimiter ;;
mysql> create procedure proc1 () begin select 1; end;;
ERROR 1100 (HY000): Table 'proc' was not locked with LOCK TABLES
mysql> lock tables t1 read, mysql.proc read;;
Query OK, 0 rows affected (0.00 sec)

mysql> create procedure proc1 () begin select 1; end;;
ERROR 1099 (HY000): Table 'proc' was locked with a READ lock and can't be updated

mysql> lock tables t1 write, mysql.proc write;;
ERROR 1428 (HY000): You can't combine write-locking of system tables with other tables or lock types

-- Trying to drop the procedure under table lock you get --

mysql> unlock tables;;
Query OK, 0 rows affected (0.00 sec)

mysql> create procedure proc1 () begin select 1; end;;
Query OK, 0 rows affected (0.00 sec)

mysql> lock tables t1 read;;
Query OK, 0 rows affected (0.00 sec)

mysql> drop procedure proc1;;
ERROR 1100 (HY000): Table 'user' was not locked with LOCK TABLES
mysql> lock tables t1 write, mysql.user write;;
Query OK, 0 rows affected (0.00 sec)

mysql> drop procedure proc1;;
ERROR 1100 (HY000): Table 'db' was not locked with LOCK TABLES

... on and on until you've added all grants tables + proc ...

mysql> lock tables t1 write, mysql.user write, mysql.db write, mysql.tables_priv write, mysql.columns_priv write , mysql.procs_priv write, mysql.proc write ;;
ERROR 1428 (HY000): You can't combine write-locking of system tables with other tables or lock types

This makes me question why this error didn't show up when I locked test.t1 and mysql.user at the same time :-/

rpbouman said...

Hi Matthew,

ok I see. That would be nice indeed.

But still, this particular case - validating data with a trigger - seems to be possible.

The write lock simply locks everybody out, you then modify the triggers so they check data according to whatever new validation rules, recreate the trigger and unlock to give everyone access again.

I just checked it on MySQL 5.1 too - works exactly the same.

Matthew Montgomery said...

Yea, doing all the validation with a trigger works but also results in a lot of duplication for every table that needs to use the same validation. BTW, #49744 is submitted to address the inconsistent system tables locking behavior.

I suppose a work-around (and probably smarter solition) is to:

create a new procedure with a different name (p_validate_initials_v2), lock the table, modify the trigger to point to the new procedure, unlock tables, drop the old procedure or keep it around if you need to roll back.

rpbouman said...

Hi Matthew!

Thanks for submitting a FR.

And,... I agree. when I wrote "modify the triggers" in my comment I didn't exclude "point to a new procedure" ;)

Marc Alff said...

Hi Roland.

Great post, thanks for putting that together.

For completeness, using SIGNAL for data validation can also be extended to *relations* between tables.

See in the test suite how signal_demo1 implements Foreign Keys constraints without foreign keys :-)

Regards,
-- Marc

rpbouman said...

Hi Marc!

thanks! And, thank *you* for caring about bug #11661 and implementing SIGNAL.

(Please extend my thanks to Peter G too - I know you worked together on this one)

Roland

Shlomi Noach said...

Hi Roland,
Odd indeed. I'll have to review this; perhaps a fixed bug, then?
Thanks for pointing this out.

rpbouman said...

Shlomi, could you locate the bug?

I can post a comment on it, and then the support team can figure out if the bug was fixed or if there is another bug thatt accidentally allows me to drop the trigger while the table is locked (I hope that's not the case though!)

Roland

Anonymous said...

how to call shell script from mysql trigger code

rpbouman said...

@Anonymous,

""

Use a UDF. The sys_exec() or sys_eval() udfs in the lib_mysqludf_sys library will do this.

See:
http://www.mysqludf.org/lib_mysqludf_sys/index.php

Data entry India said...

Great post. I like the way you have explained the mysql queries. I have not learned about this anymore. Please continue writing...

Thanks

Brianbruk said...

Hi Roland,
Thank you for giving through and informative information about mysql data entry.

Anonymous said...

Hi Roland,

Nice article, but is there also a way to trigger/validation an insert/update based on the values already in that column?

kind regards,
Joop

rpbouman said...

Joop, not sure what you mean. Example?

Anonymous said...

Roland,

An example:

One table has for example a field like Endusertype which contains values like: 'Commercial',' Educational' or 'Governmental'

Now when i add a new record and Endusertype field will get 'Commercial | Governement' value.

A trigger needs to popup an reject this update until i accept it to be added to the table. When accepted it comes in the table and all other record with the same value will be accpted after accpetion.

I Hope is clear it up a little..

kind regards,

Joop

rpbouman said...

Hi Joop,

" reject this update until i accept it to be added to the table. When accepted it comes in the table"

So you want it to be rejected, but also to be remembered so it can be added later on?

Make a new table similar to the original target table. in the before insert / before update trigger, check the value. If it's valid do nothing, if not, store the row in the reject table, and fire a warning. Note that you cannot fire an error since that would not allow the current transaction to be committed (in which case you can't store the reject).

Create another trigger on the parent table that holds the values you want to check, and use that to move the rejects that are now accepted to the target table.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...