Friday, October 21, 2011

MySQL Hacks: Preventing deletion of specific rows

Recently, someone emailed me:
I have a requirement in MYSQL as follows:
we have a table EMP and we have to restrict the users not delete employees with DEPT_ID = 10. If user executes a DELETE statement without giving any WHERE condition all the rows should be deleted except those with DEPT_ID = 10.

We are trying to write a BEFORE DELETE trigger but we are not able to get this functionality.

I have seen your blog where you explained about Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers. Will it helps me to get this functionality? Could you suggest if we have any other alternatives to do this as well?
Frankly, I usually refer people that write me these things to a public forum, but this time I felt like giving it a go. I figured it would be nice to share my solution and I'm also curious if others found other solutions still.

(Oh, I should point out that I haven't asked what the underlying reasons are for this somewhat extraordinary requirement. I normally would do that if I would be confronted with sucha a requirement in a professional setting. In this case I'm only interested in finding a crazy hack)

Attempt 1: Re-insert deleted rows with a trigger

My first suggestion was:
Raising the error won't help you achieve your goal: as soon as you raise the error, the statement will either abort (in case of a non-transactional table) or rollback all row changes made up to raising the error (in case of a transactional table)

Although I find the requirement strange, here's a trick you could try:

write a AFTER DELETE FOR EACH ROW trigger that re-inserts the rows back into the table in case the condition DEPT_ID = 10 is met.

Hope this helps...

Alas, I should've actually tried it myself before replying, because it doesn't work. If you do try it, a DELETEresults in this runtime error:
Can't update table 'emp' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
This is also known as "the mutating table problem".

Attempt 2: Re-insert deleted rows into a FEDERATED table

As it turns out, there is a workaround that meets all of the original demands. The workaround relies on the FEDERATED storage engine, which we can use to trick MySQL into thinking we're manipulating a different table than the one that fires the trigger. My first attempt went something like this:

CREATE TABLE t (
id INT AUTO_INCREMENT PRIMARY KEY,
dept_id INT,
INDEX(dept_id)
);

CREATE TABLE federated_t (
id INT AUTO_INCREMENT PRIMARY KEY,
dept_id INT,
INDEX(dept_id)
)
ENGINE FEDERATED
CONNECTION = 'mysql://root@localhost:3306/test/t';

DELIMITER //

CREATE TRIGGER adr_t
AFTER DELETE ON t
FOR EACH ROW
IF old.dept_id = 10 THEN
INSERT INTO t_federated
VALUES (old.id, old.dept_id);
END IF;
//

DELIMITER ;
So the idea is to let the trigger re-insert the deleted rows back into the federated table, which in turn points to the original table that fired the trigger to fool MySQL into thinking it isn't touching the mutating table. Although this does prevent one from deleting any rows that satisfy the DEPT_ID = 10 condition, it does not work as intended:

mysql> INSERT INTO t VALUES (1,10), (2,20), (3,30);
Query OK, 3 rows affected (0.11 sec)

mysql> DELETE FROM t;
ERROR 1159 (08S01): Got timeout reading communication packets

mysql> SELECT * FROM t;
+----+---------+
| id | dept_id |
+----+---------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----+---------+
3 rows in set (0.00 sec)
At this point I can only make an educated guess about the actual underlying reason for this failure. It could be that the deletion is locking the rows or even the table, thereby blocking the insert into the federated table until we get a timeout. Or maybe MySQL enters into an infinite loop of deletions and insertions until we hit a timeout. I didn't investigate, so I don't know, but it seems clear this naive solution doesn't solve he problem.

Attempt 3: Deleting from the FEDERATED table and re-inserting into the underlying table

It turns out that we can solve it with a FEDERATED table by turning the problem around: Instead of manipulating the original table, we can INSERT and DELETE from the FEDERATED table, and have an AFTER DELETE trigger on the FEDERATED table re-insert the deleted rows back into the original table:

DROP TRIGGER adr_t;

DELIMITER //

CREATE TRIGGER adr_federated_t
AFTER DELETE ON federated_t
FOR EACH ROW
IF old.dept_id = 10 THEN
INSERT INTO t
VALUES (old.id, old.dept_id);
END IF;
//

DELIMITER ;
Now, the DELETE does work as intended:

mysql> DELETE FROM federated_t;
Query OK, 3 rows affected (0.14 sec)

mysql> SELECT * FROM federated_t;
+----+---------+
| id | dept_id |
+----+---------+
| 1 | 10 |
+----+---------+
1 row in set (0.00 sec)
Of course, to actually use this solution, one would grant applications access only to the federated table, and "hide" the underlying table so they can't bypass the trigger by deleting rows directly from the underlying table.

Now, even though this solution does seem to fit the original requirements, I would not recommend it for several reasons:
  • It uses the FEDERATED storage engine, which hasn't been well supported. For that reason, it isn't enabled by default, and you need access to the MySQL configuration to enable it, limiting the applicability of this solution. Also, you could run into some nasty performance problems with the FEDERATED storage engine
  • The solution relies on a trigger. In MySQL, triggers can really limit performance
  • Perhaps the most important reason is that this solution performs "magic" by altering the behaviour of SQL statements. Arguably, this is not so much the fault of the solution as it is of the original requirement.

An Alternative without relying on magic: a foreign key constraint

If I were to encounter the original requirement in a professional situation, I would argue that we should not desire to alter the semantics of SQL commands. If we tell the RDBMS to delete all rows from a table, it should either succeed and result in all rows being deleted, or it should fail and fail completely, leaving the data unchanged.

So how would we go about implementing a solution for this changed requirement?

We certainly could try the approach that was suggested in the original request: create a trigger that raises an exception whenever we find the row should not be deleted. However, this would still rely on a trigger (which is slow). And if you're not on MySQL 5.5 (or higher), you would have to use one of the ugly hacks to raise an exception.

As it turns out, there is a very simple solution that does not rely on triggers. We can create a "guard table" that references the table we want to protect using a foreign key constraint:

mysql> CREATE TABLE t_guard (
-> dept_id INT PRIMARY KEY,
-> FOREIGN KEY (dept_id)
-> REFERENCES t(dept_id)
-> );
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t_guard values (10);
Query OK, 1 row affected (0.08 sec)

mysql> DELETE FROM t;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t_guard`, CONSTRAINT `t_guard_ibfk_1` FOREIGN KEY (`dept_id`) REFERENCES `t` (`dept_id`))
mysql> DELETE FROM t WHERE dept_id != 10;
Query OK, 2 rows affected (0.05 sec)
(Like in the prior example with the federated table, the guard table would not be accessible to the application, and the "guard rows" would have to be inserted by a privileged user)

Finally: what a quirkyy foreign key constraint!

You might have noticed that there's something quite peculiar about the foreign key constraint: typically, foreign key constraints serve to relate "child" rows to their respective "parent" row. To do that, the foreign key would typically point to a column (or set of columns) that make up either the primary key or a unique constraint in the parent table. But in this case, the referenced column dept_id in the t table is contained only in an index which is not unique. Strange as it may seem, this is allowed by MySQL (or rather, InnoDB). In this particular case, this flexibility (or is it a bug?) serves us quite well, and it allows us to guard many rows in the t table with dept_id = 10 with just one single row in the guard table.

13 comments:

Marc Alff said...

Hi Roland.

Have you considered a SIGNAL statement inside a BEFORE DELETE trigger ?

Regards,
-- Marc

rpbouman said...

Hi Marc,

well, the original requirement was that a DELETE that would include the rows that are to be retained should not fail, rather it should "simply" remove only the other rows. Like I pointed out in my initial reply:

"Raising the error won't help you achieve your goal: as soon as you raise the error, the statement will either abort (in case of a non-transactional table) or rollback all row changes made up to raising the error (in case of a transactional table)"

Of course, if the requirement would be changed to allow failure of the entire statement in case a row that matches the condition would have been deleted, and if one would decide to implement this with a trigger on MySQL 5.5 or higher, then I would most definitely recommend using SIGNAL to raise a custom error rather than some hack.

If we would accept a change of requirements then I am not sure what I'd choose - the solution with the trigger or with the foreign key. Although a bit quirky, I feel the foreign key has a few things going for it. It's quite flexible, as we only need to add or delete guard rows as needed. I also assume it delivers better performance, I really should check that some time.

Unknown said...

I admit that this is a novel solution to the users particular problem, but where I have a fault with it is the fact the situation even came up. If a user or program is able to do a delete all like that, then already the DBA gave away too much power, and something like that might be better handled through a stored procedure.

rpbouman said...

@Unknown: I agree. I just enjoy finding a hack :)

Madhivanan said...

How about creating a view that excludes data of deptid=10 and give users access to only this view and not the underlying table? This way you don't need additional logics implemented

rpbouman said...

@Madhivanan, yeah, that's a good point, and this is probably what I'd do if I had such a requirement.

I guess I was primed by the exact problem statement, and the OP reporting that they couldn't get it to work with a trigger made me explore if I could figure out the underlying problem behind that, rather than providing a functionally equivalent (and probably superior) solution.

Xevi said...

Why not go further, and do it even more tricky?
Your idea is fucking cool, but you do not really need two tables. One is enough.

Here's a snippet with a little gear more
Hope you like it.

rpbouman said...

Hi Xevi,

ok - I see what you mean. Instead of having a separate guard table, you add a nullable "guard column" to the table itself, which has a foreign key constraint pointing to the primary key of the table itself. The rows that maybe freely deleted have a NULL in the guard column, the rows that musn't be deleted reference themselves, correct?

I think it's a creative solution. Of course you'd need to remove write privileges to the guard column from unprivileged users.

You should however be vary careful to use this solution outside MySQL or actually InnoDB. InnoDB will prevent deletion of a self-referencing row, but for example Oracle will not (correct me if I'm wrong). As long as the row only references itself, it can be deleted without ever violating referential integrity (since there are no orphans if the row would simple be deleted).

Xevi said...

You're terribly right

Xevi said...

Have you seen this?
http://forums.mysql.com/read.php?105,593422,593422#msg-593422

I've discovered by chance this morning.

rpbouman said...

Xevi, I haven't. Looks like a bug. Goto bugs.mysql.com, and file it.

Anonymous said...

@Madhivanan how can I prevent deletion from a view? Do you have any idea?

rpbouman said...

> @Madhivanan how can I prevent deletion from a view? Do you have any idea?

Why not simply use the GRANT/privileges system for that? No hacking needed.

SAP HANA Trick: DISTINCT STRING_AGG

Nowadays, many SQL implementations offer some form of aggregate string concatenation function. Being an aggregate function, it has the effe...