Roland Bouman's blog: Ms SQL Server

Showing posts with label Ms SQL Server. Show all posts

Friday, October 21, 2011

MySQL Hacks: Preventing deletion of specific rows

Recently, someone emailed me:

I have a requirement in MYSQL as follows:
we have a table EMP and we have to restrict the users not delete employees with DEPT_ID = 10. If user executes a DELETE statement without giving any WHERE condition all the rows should be deleted except those with DEPT_ID = 10.

We are trying to write a BEFORE DELETE trigger but we are not able to get this functionality.

I have seen your blog where you explained about Using an UDF to Raise Errors from inside MySQL Procedures and/or Triggers. Will it helps me to get this functionality? Could you suggest if we have any other alternatives to do this as well?

Frankly, I usually refer people that write me these things to a public forum, but this time I felt like giving it a go. I figured it would be nice to share my solution and I'm also curious if others found other solutions still.

(Oh, I should point out that I haven't asked what the underlying reasons are for this somewhat extraordinary requirement. I normally would do that if I would be confronted with sucha a requirement in a professional setting. In this case I'm only interested in finding a crazy hack)

Attempt 1: Re-insert deleted rows with a trigger

My first suggestion was:

Raising the error won't help you achieve your goal: as soon as you raise the error, the statement will either abort (in case of a non-transactional table) or rollback all row changes made up to raising the error (in case of a transactional table)

Although I find the requirement strange, here's a trick you could try:

write a AFTER DELETE FOR EACH ROW trigger that re-inserts the rows back into the table in case the condition DEPT_ID = 10 is met.

Hope this helps...

Alas, I should've actually tried it myself before replying, because it doesn't work. If you do try it, a DELETEresults in this runtime error:

Can't update table 'emp' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.

This is also known as "the mutating table problem".

Attempt 2: Re-insert deleted rows into a FEDERATED table

As it turns out, there is a workaround that meets all of the original demands. The workaround relies on the FEDERATED storage engine, which we can use to trick MySQL into thinking we're manipulating a different table than the one that fires the trigger. My first attempt went something like this:


CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    dept_id INT,
    INDEX(dept_id)
);

CREATE TABLE federated_t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    dept_id INT,
    INDEX(dept_id)
)
ENGINE FEDERATED
CONNECTION = 'mysql://root@localhost:3306/test/t';

DELIMITER //

CREATE TRIGGER adr_t
AFTER DELETE ON t
FOR EACH ROW
IF old.dept_id = 10 THEN
    INSERT INTO t_federated
    VALUES  (old.id, old.dept_id);
END IF;
//

DELIMITER ;

So the idea is to let the trigger re-insert the deleted rows back into the federated table, which in turn points to the original table that fired the trigger to fool MySQL into thinking it isn't touching the mutating table. Although this does prevent one from deleting any rows that satisfy the DEPT_ID = 10 condition, it does not work as intended:


mysql> INSERT INTO t VALUES (1,10), (2,20), (3,30);
Query OK, 3 rows affected (0.11 sec)

mysql> DELETE FROM t;
ERROR 1159 (08S01): Got timeout reading communication packets

mysql> SELECT * FROM t;
+----+---------+
| id | dept_id |
+----+---------+
|  1 |      10 |
|  2 |      20 |
|  3 |      30 |
+----+---------+
3 rows in set (0.00 sec)

At this point I can only make an educated guess about the actual underlying reason for this failure. It could be that the deletion is locking the rows or even the table, thereby blocking the insert into the federated table until we get a timeout. Or maybe MySQL enters into an infinite loop of deletions and insertions until we hit a timeout. I didn't investigate, so I don't know, but it seems clear this naive solution doesn't solve he problem.

Attempt 3: Deleting from the FEDERATED table and re-inserting into the underlying table

It turns out that we can solve it with a FEDERATED table by turning the problem around: Instead of manipulating the original table, we can INSERT and DELETE from the FEDERATED table, and have an AFTER DELETE trigger on the FEDERATED table re-insert the deleted rows back into the original table:


DROP TRIGGER adr_t;

DELIMITER //

CREATE TRIGGER adr_federated_t
AFTER DELETE ON federated_t
FOR EACH ROW
IF old.dept_id = 10 THEN
    INSERT INTO t
    VALUES  (old.id, old.dept_id);
END IF;
//

DELIMITER ;

Now, the DELETE does work as intended:


mysql> DELETE FROM federated_t;
Query OK, 3 rows affected (0.14 sec)

mysql> SELECT * FROM federated_t;
+----+---------+
| id | dept_id |
+----+---------+
|  1 |      10 |
+----+---------+
1 row in set (0.00 sec)

Of course, to actually use this solution, one would grant applications access only to the federated table, and "hide" the underlying table so they can't bypass the trigger by deleting rows directly from the underlying table.

Now, even though this solution does seem to fit the original requirements, I would not recommend it for several reasons:

It uses the FEDERATED storage engine, which hasn't been well supported. For that reason, it isn't enabled by default, and you need access to the MySQL configuration to enable it, limiting the applicability of this solution. Also, you could run into some nasty performance problems with the FEDERATED storage engine
The solution relies on a trigger. In MySQL, triggers can really limit performance
Perhaps the most important reason is that this solution performs "magic" by altering the behaviour of SQL statements. Arguably, this is not so much the fault of the solution as it is of the original requirement.

An Alternative without relying on magic: a foreign key constraint

If I were to encounter the original requirement in a professional situation, I would argue that we should not desire to alter the semantics of SQL commands. If we tell the RDBMS to delete all rows from a table, it should either succeed and result in all rows being deleted, or it should fail and fail completely, leaving the data unchanged.

So how would we go about implementing a solution for this changed requirement?

We certainly could try the approach that was suggested in the original request: create a trigger that raises an exception whenever we find the row should not be deleted. However, this would still rely on a trigger (which is slow). And if you're not on MySQL 5.5 (or higher), you would have to use one of the ugly hacks to raise an exception.

As it turns out, there is a very simple solution that does not rely on triggers. We can create a "guard table" that references the table we want to protect using a foreign key constraint:


mysql> CREATE TABLE t_guard (
    ->     dept_id INT PRIMARY KEY,
    ->     FOREIGN KEY (dept_id)
    ->         REFERENCES t(dept_id)
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t_guard values (10);
Query OK, 1 row affected (0.08 sec)

mysql> DELETE FROM t;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t_guard`, CONSTRAINT `t_guard_ibfk_1` FOREIGN KEY (`dept_id`) REFERENCES `t` (`dept_id`))
mysql> DELETE FROM t WHERE dept_id != 10;
Query OK, 2 rows affected (0.05 sec)

(Like in the prior example with the federated table, the guard table would not be accessible to the application, and the "guard rows" would have to be inserted by a privileged user)

Finally: what a quirkyy foreign key constraint!

You might have noticed that there's something quite peculiar about the foreign key constraint: typically, foreign key constraints serve to relate "child" rows to their respective "parent" row. To do that, the foreign key would typically point to a column (or set of columns) that make up either the primary key or a unique constraint in the parent table. But in this case, the referenced column dept_id in the t table is contained only in an index which is not unique. Strange as it may seem, this is allowed by MySQL (or rather, InnoDB). In this particular case, this flexibility (or is it a bug?) serves us quite well, and it allows us to guard many rows in the t table with dept_id = 10 with just one single row in the guard table.

Wednesday, May 26, 2010

A small issue of SQL standards

From a functional perspective, the core SQL support in all major and minor RDBMS-es is reasonably similar. In this light, it's sometimes quite disturbing to find how some very basic things work so differently across different products. Consider this simple statement:

SELECT  'a' /* this is a comment */ 'b'
FROM    onerow

What should the result be? (You can assume that onerow is an existing table that contains one row)

It turns out popular RDBMS-es mostly disagree with one another.

In Oracle XE, we get this:

SELECT  'a' /* comment */ 'b'
                         *
ERROR at line 1:
ORA-00923: FROM keyword not found where expected

PostgreSQL 8.4 also treats it as a syntax error, and thus seems compatible with Oracle's behavior:

ERROR:  syntax error at or near "'b'"
LINE 1: SELECT  'a' /* this is a comment */ 'b'

In Microsoft SQL Server 2008 we get:

b
-
a

(1 rows affected)

As you can see, MS SQL treats the query as SELECT 'a' AS b FROM onerow.

Finally, in MySQL 5.1, we get:

+----+
| a  |
+----+
| ab |
+----+
1 row in set (0.00 sec)

So in MySQL, its as if the comment isn't there at all, and as if the string literals 'a' and 'b' are actually just one string literal 'ab'.

So what does the SQL standard say? In my copy of the 2003 edition, I find this (ISO/IEC 9075-2:2003 (E) 5.3 <literal>, page 145):

Syntax Rules
1) In a <character string literal> or <national character string literal>, the sequence:
<quote> <character representation>... <quote> <separator> <quote> <character representation>... <quote>
is equivalent to the sequence
<quote> <character representation>... <character representation>... <quote>

If we lookup the definition of <separator>, it reads:

<separator> ::= { <comment> | <white space> }...

So in this case, MySQL does the "right" thing, and basically ignores the comment, treating 'a' and 'b' as a single string constant 'ab'.

UPDATE 1: As always, the devil is in the details. And trust me, the SQL standard has many of them (details that is - I'll leave it up to the reader to decide for the devils, although I have a suspicion in a particular direction). Read on, and make sure to read Nick's comment on this post - it turns out PostgreSQL seems to behave exactly according to the standard in this case.

UPDATE 2: Serg also posted a comment citing yet another part of the standard that states that all comments implicitly count as a newline. This would mean that there doesn't have to be a literal newline character in or following the comment. In this case, my original remark that MySQL got it right would hold again.

I should state that I think very highly of both Nick and Serg, and as far as I am concerned, they're both right. I can't help but seeing this as yet more support for my statement that the SQL standard is so complex it is almost or perhaps completely impossible to get it right.

Do you find this too bold? If so, I'd really love to hear your thoughts on it. Please help us solve this paradox, I only want to understand what the standard really says.

If you try the same thing with a single line comment, all products mentioned react the same as with the initial query, except for PostgreSQL, which now treats the query according to the standard.

Now don't get me wrong. This post is not designed to bash or glorify any of the products mentioned. I think all of them are great in their own way. I am quite aware that although MySQL happens to adhere to the standard here, it violates it in other places. Finally, I should point out that I don't have a specific opinion on what the right behavior should be. I just want it to be the same on all platforms.

At the same time, I realize that for SQL it's probably too late - up to an extent, incompatibility is considered normal, and database professionals tend to be specialized in particular products anyway. So I'm not holding my breath for the grand unification of SQL dialects.

When I encountered this issue, I did have to think about that other rathole of incompatibilities I have to deal with professionally, which is web-browsers. An interesting development there is the HTML 5 specification, which actually defines an algorithm for parsing HTML - even invalid HTML. This is quite different from the approach taken by most standards, which typically define only an abstract grammar, but leave the implementation entirely up to the vendors. In theory, providing parsing instructions as detailed as done in HTML 5 should make it easier to create correct parsers, and hopefully this will contribute to a more robust web.

Anyway. That was all. Back to work...

UPDATE: I just heard that Sybase (unsurprisingly) behaves similar to MS SQL for this query (that is, query is valid, and returns 'a' in a column called b). I checked SQLite myself, which is also in that camp.

Nick also pointed out that LucidDB also provides a standard compliant implementation, in other words, it behaves exactly like PostgreSQL for this particular query. However, Julian, who was and is closely involved in LucidDB agrees with Serg that the comment should probably count as a newline, and filed a bug for LucidDB.

I checked Firebird 2.1.3, and they are in the Oracle camp: in both cases, the query gives a syntax error.

Friday, January 02, 2009

Writing a Book: Building Pentaho Solutions

Ok - this has been stewing for some time now, and I think now is the right time to announce that I am working together with Jos van Dongen from Tholis Consulting to create a book for Wiley with the tentative title "Building Pentaho Solutions".

My personal aim is to make this book the primary point of reference for DBAs and Application Developers that are familiar with Open Source products like MySQL and PostgreSQL but have no prior BI skills, as well as BI professionals that are familiar with closed source BI products like Microsoft BI and Business Objects that want to learn how to get things done with Pentaho.

The book will cover all distinct components and sub-products that make up the Pentaho BI Suite. For each component (and where applicable), installation, usage and maintenance are discussed and illustrated.

Background theory is given as needed to provide context for those readers with no prior BI knowledge or experience. This means we will cover topics such as dimensional modeling, data warehousing, data integration and much more. At the same time, the book will have a strong "hands-on" focus, and for that purpose, Jos and I have put together a fairly realistic online DVD Rental Company, www.worldclassmovies.com.

We went as far as creating different database schemas for the operational applications (for customer orders, inventory management, and purchase orders), and generating non-trivial volumes of example data for it (tables with 100k to 10m rows), covering over 7 years of DVD rental business - and this is just the operational system. And although generated, it's not just random data - things like customer age and location distribution, per year and per week ordering behaviour etc. - it's all in our sample data.

In our schemas and sample solutions, we are making an effort to cater to MS SQL, MySQL, Oracle and PostgreSQL users. We will also explore a few highly interesting open source database products like LucidDB and MonetDB.

Anyway - I will soon post some more details about the book and it's contents as we progress. I have taken Baron Schwartz's advice on writing a book to heart, and we are working from a pretty detailed outline trying to meet a darn-tight schedule.

If all goes well, we should end up with a 450-500 page book in a store near you by August 2009.

UPDATE:
Jos and I are in the final stages of writing the book. We will then have to process reviewer comments and set up the book's website (including all code samples).

We've had a few changes to our schedule. Instead of a 400-500 page book, it will be a 500-600 page book. It was due to be available late August, but this will now be Early September.

The book can be pre-ordered on Amazon. More information on the book can be found on the Wiley website

UPDATE2: The book is in print and available since late August 2009! So far, we've had a couple of very nice reviews.

The book is available directly from the Wiley site and also from Amazon. Wiley also provides an e-book. If the link to the e-book is not working or claiming the book is not available as e-book then you most likely ended up on the European version of the website. To fix that and order the e-book anyway, click the "choose location" link on the Wiley homepage, and set it to United States and try again.