Tuesday, December 15, 2009

Validating MySQL data entry with triggers: A quick look at the SIGNAL syntax

The latest MySQL 5.5 milestone release offers support for an ANSI/ISO standard feature called the SIGNAL syntax. You can use this syntax inside stored routines (including triggers) to raise an error condition which can be used to invoke specific error handling, or otherwise abort the stored routine. In addition, you can use the SIGNAL syntax to convey information about what went wrong, which may be used by the caller to handle the error.

I have written about MySQL data entry validation procedures in the past. At the time, MySQL did not support any proper means to raise an error condition inside a stored routine, and one had to work around that by deliberatly causing a runtime error, for example by referring to a non-existent table, setting a non-nullable column to null, or ]Yting a specially crafted UDF. In this artcle, I'm taking a closer look at how to implement more robust data entry validation in MySQL using the SIGNAL syntax.

Triggers


For those of you that are unfamiliar with the subject, MySQL offers support for triggers as of version 5.0. Triggers are stored routines that are executed automatically right before or after data change events like a row being inserted, updated or deleted. Because triggers are executed as part of the SQL statement (and its containing transaction) causing the row change event, and because the trigger code has direct access to the changed row, you could in theory use them to correct or reject invalid data.

Example data validation problem


Let's take a quick look at the following example. Suppose you have a table called person to store data about persons:

CREATE TABLE person (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, first_name VARCHAR(64) NOT NULL
, last_name VARCHAR(64) NOT NULL
, initials VARCHAR(8)
)

Let's consider a simple validation method for the initials column:

  • We require the first letter of the value for the initials column to match the first letter of the value of the first_name column.

  • In addition, we require that the values for the initials column consists of uppercase letters separated by periods.


In order to implement this, we design the following algorithm:

  • If the value for first_name is NULL, we do nothing. The NOT NULL table constraint will prevent the data from being entered anyway, so further attempts at validation or correction are pointless.

  • If the value for initials is NULL, we correct it by automatically filling in the first character of the value for first_name.

  • If the values for first_name as well as initials are both not NULL, we require that the first character of the value for first_name equals the first character of the value for initials.

  • Finally, we use a regular expression to check if the value for initials matches the desired pattern of uppercase letters separated by periods.


A data validation stored procedure


Let's start by creating a stored procedure to perform this algorithm. Here's the code for the p_validate_initials procedure which validates and possibly corrects the initials value based on the value for first_name:

DELIMITER go

CREATE PROCEDURE p_validate_initials(
IN p_first_name VARCHAR(64)
, INOUT p_initials VARCHAR(64)
)
DETERMINISTIC -- same arguments yield same result, always
NO SQL -- does not execute SQL statements, only procedural logic
_main: BEGIN

DECLARE WARN_CORRECTED_INITIALS CONDITION FOR SQLSTATE '01000';
DECLARE ERR_INITIALS_DONT_MATCH_FIRSTNAME CONDITION FOR SQLSTATE '45000';
DECLARE ERR_INITIALS_ILLFORMATTED CONDITION FOR SQLSTATE '45000';

IF p_first_name IS NULL THEN
LEAVE _main; -- nothing to validate
ELSEIF p_initials IS NULL THEN -- initials are NULL, correct:
SET p_initials := CONCAT(LEFT(p_first_name, 1), '.'); -- take the first letter of first_name
SIGNAL WARN_CORRECTED_INITIALS -- warn about the corrective measure
SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';
ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THEN -- initials don't match first_name
SIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME -- raise an error
SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name';
END IF;
IF NOT p_initials REGEXP '^([A-Z][.])+$' THEN -- if initials don't match the correct pattern
SIGNAL ERR_INITIALS_ILLFORMATTED -- raise an error
SET MESSAGE_TEXT = 'The value for initials must consist of upper case letters separated by periods.';
END IF;
END;
go

DELIMITER ;

Let's take a look at how this procedure works.

How to issue warnings


First, let's pass NULL for the initials to see if they are properly corrected:

mysql> set @initials := null;
Query OK, 0 rows affected (0.00 sec)

mysql> call p_validate_initials('Roland', @initials);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select @initials;
+-----------+
| @initials |
+-----------+
| R. |
+-----------+
1 row in set (0.00 sec)

Note that executing the procedure placed the correct value into the @initials user-defined variable. Also note that a warning was issued.

In this case, the procedure ran through the following branch of the first IF statement:

...
ELSEIF p_initials IS NULL THEN
SET p_initials := CONCAT(LEFT(p_first_name, 1), '.');
SIGNAL WARN_CORRECTED_INITIALS
SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';

ELSEIF ...

The warning is caused by the SIGNAL statement (which appears in bold text in the snippet above). The general syntax for the SIGNAL statement is:

SIGNAL condition_value
[SET signal_information [, signal_information] ...]

condition_value:
SQLSTATE [VALUE] sqlstate_value
| condition_name

signal_information:
condition_information_item = simple_value_specification

So, in this case, we used the condition name WARN_CORRECTED_INITIALS as condition_value. This condition is declared in the top of the procedure:

DECLARE WARN_CORRECTED_INITIALS CONDITION FOR SQLSTATE '01000';

Basically, condition declarations like these serve simply to tie a human readable name to otherwise obscure SQLSTATE values. As per the SQL standard, SQLSTATE values are strings of 5 digits. The prefix 01 indicates a warning.

(Condition declarations like these are not only useful to clarify the meaning of your SIGNAL statements, you can also use them to declare error HANDLERs)

An alternative syntax for SIGNAL allows you to directly refer to the SQLSTATE without explicitly declaring a CONDITION. So, if you feel that declaring explicit conditions is too much trouble, you can also omit that and write:

SIGNAL SQLSTATE '01000'
...

(However, I like using explicit condition names better because it does a better job of explaining the intention of the code.)

Conveying SIGNAL context information


The SIGNAL statement also features a SET-clause which is used to convey signal information. In our example the set clause was:

SET MESSAGE_TEXT = 'Corrected NULL value for initials to match value for first_name.';

As you can see, the SET-clause in the example contains an assignment. In the context of the SIGNAL statemntt, such an assignment is referred to as signal_information. The left hand side of the assignments must be one of the predefined condition_information_items. The SET-clause can have multiple of these signal_information items which can be used to capture and communicate program state to the client program.

In the case of the example we can demonstrate how this works using the MySQL command-line client. By issuing a SHOW WARNINGS statement, we can see the message text was conveyed with with the signal_information item:

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------+
| Error | 1644 | The value for initials must consist of upper case letters separated by periods. |
+-------+------+---------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Unfortunately, MySQL does not currently support a way for any calling stored routines to capture the signal_information items. They are currently only available in the APIs you use to communicate with MySQL, so you can capture them in your application code.

(A bug has been filed to ask for the ability to refer to signal information items in stored routines. This should become available whenever MySQL implements a DIAGNOSTICS feature)

Predefined condition information items


I just mentioned that the left-hand side of the signal_information item assignment must be one of the predefined condition_information_items. These are dictated by the standard, and although MySQL allows all of the standard condition_information_items, only two of them are currently relevant: MESSAGE_TEXT and MYSQL_ERRNO. We already illustrated using MESSAGE_TEXT. The MYSQL_ERRNO is a non-standard condition information item that can be used to convey custom error codes.

This leaves currently three variables to convey information about the context of the SIGNAL statement:

  • SQLSTATE: available in the C API as mysql_sqlstate()

  • MYSQL_ERRNO: available in the C API as mysql_errno()

  • MESSAGE_TEXT: available in the C API as mysql_error()


(You should be able to obtain the information also in PHP through the corresponding mysqli_, pdo_ and mysql_ functions.)

How to issue errors


We just discussed how to cause your stored routine to issue warnings. Issuing errors is exactly the same process, it just relies on a different class of SQLSTATE values (as determined by the code prefix). Let's see the errors in action:

mysql> set @initials := 'r';
Query OK, 0 rows affected (0.00 sec)

mysql> call p_validate_initials('Roland', @initials);
ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

In this case, the stored routine ran through the last branch of the first IF statement:

ELSEIF BINARY LEFT(p_first_name, 1) != LEFT(p_initials, 1) THEN
SIGNAL ERR_INITIALS_DONT_MATCH_FIRSTNAME
SET MESSAGE_TEXT = 'The first letter of the value for initials does not match the first letter of the value for first_name';
END IF;

As you can see, the syntax for the actual SIGNAL statement is exactly similar to what we saw in the example illustrating warnings. The most important difference is that in this case, the condition that is being signalled is declared with a SQLSTATE value of 45000:

DECLARE ERR_INITIALS_DONT_MATCH_FIRSTNAME CONDITION FOR SQLSTATE '45000';

The SQLSTATE code 45000 is a special and indicates a general user-defined exception.

Using the procedure in a trigger


All we need to do now is create our triggers on the person table that call the procedure to perform the actual validation. We need to apply the validation when data is inserted into the table, but also when data is updated. If it turns out the data is invalid, we need to reject the change. For this reason, we want to create triggers that fire before the data change is applied to the table.

So, to enforce validation, we need two triggers: one that fires BEFORE INSERT events, and one that fires BEFORE UPDATE events. Because the validation process itself is the same regardless of the type of change event, both triggers can call the p_validate_initials procedure to perform the actual validation. This allows us to write (and maintain!) the validation logic only once, and reuse it whenever we need it.

DELIMITER go

CREATE TRIGGER bir_person
BEFORE INSERT ON person
FOR EACH ROW
BEGIN
CALL p_validate_initials(
NEW.first_name
, NEW.initials
);
END;
go

CREATE TRIGGER bur_person
BEFORE UPDATE ON person
FOR EACH ROW
BEGIN
CALL p_validate_initials(
NEW.first_name
, NEW.initials
);
END;
go

DELIMITER ;

A quick check indicates that data validation is now enforced as intended:

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', 'r');
ERROR 1644 (45000): The first letter of the value for initials does not match the first letter of the value for first_name

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', 'R');
ERROR 1644 (45000): The value for initials must consist of upper case letters separated by periods.

mysql> INSERT INTO person (id, first_name, last_name, initials)
-> VALUES (2, 'Roland', 'Bouman', NULL);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------+
| Warning | 1642 | Corrected NULL value for initials to match value for first_name. |
+---------+------+------------------------------------------------------------------+
1 row in set (0.00 sec)

Final words


For more information on MySQL 5.5, check out Giuseppe Maxia's article on Getting Started with MySQL 5.5. Detailed information on the SIGNAL syntax is available in the reference manual here: http://dev.mysql.com/doc/refman/5.5/en/signal-resignal.html.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...