Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Saturday, August 20, 2005

MySQL: Create Dirty Little Tricker FOR SELECT ON view

Wow, i never thought it would work, but it sure seems to be the case. I stumbled over a post at the MySQL Triggers forum by Greg Chandler.

Greg would've liked to increment a number that's stored in the database. That's not an uncommon thing to want, but what makes Greg's question interesting in my opinion, is that he would like this to happen in the event that a user only reads the database field that stores the number. Now, I don't wan't to go into details as to why he needs it, but feel free and follow the link to find out all about it.

Suppose you'd want to implement this, what you'd need is something like:


create trigger {trigger-name}
for select on {table-name}[(
{columnlist}
)]
for each row
begin
..increase that number...
end;


Of course, I made this syntax up to illustrate the resemblance with an ordinary triggered action. Like a stored procedure, a trigger is associated with a sequence of statements that's persistently stored in the database. This sequence can be executed repeatedly afterwards. Unlike a stored procedure, a trigger can't be called or evaluated directly at the user's request. Rather, a triggers is fired in response to some event occurring in the database.

The classical trigger events are associated with a database base table. These events correspond directly to the occurrence of one of the fundamental operations that can be performed to manipulate the contents of a table: INSERT, UPDATE and DELETE.

Of course, the concept of an event that fires execution of a sequence of statements can be extended to all kinds of objects, and depending upon the type of object, you could think of appropriate events for that object. For starters, quite some database products offer similar functionality for views as well as for base tables. And Oracle even supports triggers that react to operations that alter the structure of a table, as well as events that occur to the system as a whole (system shutdown etc.), not just a particular part of the system.

What I think is interesting about Gregs original problem is that there's no RDBMS product that I know that supports the trigger concept for the SELECT operation. I can think of at least one purpose for such functionality. Suppose we'd have a collection of views, for example as part of a reporting environment. Now, it would be really cool to know which views are used a lot, how many rows a query on the view generally yields and the like.

Now, there's this dirty little trick that seems to do the job. I don't think that this trick reflects intended usage of the MySQL database product, and I wouldn't be suprised if it doesn't survive the 5.0.10 version I'm currently running. But I'm so excited about it, I just want to show it off. And of course, I hope that it will be of some use to you, dear reader.

The trick is implemented by creating a function that performs the desired actions. Now, function evaluation is just a particular kind of expression. This means we can write a view that references our function, thus executing whatever statements we put in the function. Whenever the user select from the view, our function would be evaluated, and thus, the actions are performed.

Let's stick to the 'logging view usage' case. I'll be using the test database for all this, so let's make it the current database:


use test;


First, we'll create a table to log usage in:


CREATE TABLE select_log (
id int unsigned NOT NULL auto_increment PRIMARY KEY
, log_who varchar(16) NOT NULL -- the user that performs the SELECT
, log_conn int unsigned NOT NULL -- the id of the connection
, log_when timestamp NOT NULL -- the moment the operation is performed
, log_schema varchar(64) NOT NULL -- the schema in which the view resides
, log_view varchar(64) NOT NULL -- the view
)
;

Here's the function that performs the logging:

delimiter $$
create function f_log_select(
p_log_schema varchar(64)
, p_log_view varchar(64)
)
returns int unsigned
NOT DETERMINISTIC
begin
insert
into select_log(
log_who
, log_conn
, log_when
, log_schema
, log_view
) values (
current_user()
, connection_id()
, now()
, p_log_schema
, p_log_view
);
return last_insert_id();
end;
$$


So far so good. Now, we need a table that represents the actual data for wich we can write a view that 'fires' our function:


CREATE TABLE table1 (
number int(11) default
, clientid int(11) default
, datebegin datetime default
, dateend datetime default
, data char(1) default
)


And we'll need some data to populate it:


+--------+----------+---------------------+---------------------+------+
| number | clientid | datebegin | dateend | data |
+--------+----------+---------------------+---------------------+------+
| 1 | 1 | 2005-01-01 00:00:00 | 2005-01-02 00:00:00 | A |
| 2 | 1 | 2005-01-02 00:00:00 | 2005-01-04 00:00:00 | b |
| 3 | 1 | 2005-01-04 00:00:00 | 2005-01-05 00:00:00 | c |
+--------+----------+---------------------+---------------------+------+


Finally, we're arriving at the heart of the matter: the view definition. We want the view to select all the rows from table1, and we want to make sure that our f_log_select function is called in the process. This is what I came up with:


create or replace view v_table1
as
select *
from table1
where f_log_select(
'test'
, 'table1'
)


Now, lets select from our view, and see what happens:


mysql> select * from v_table1;
+--------+----------+---------------------+---------------------+------+
| number | clientid | datebegin | dateend | data |
+--------+----------+---------------------+---------------------+------+
| 1 | 1 | 2005-01-01 00:00:00 | 2005-01-02 00:00:00 | A |
| 2 | 1 | 2005-01-02 00:00:00 | 2005-01-04 00:00:00 | b |
| 3 | 1 | 2005-01-04 00:00:00 | 2005-01-05 00:00:00 | c |
+--------+----------+---------------------+---------------------+------+
3 rows in set (0.00 sec)

mysql> select * from select_log;
+----+----------------+----------+---------------------+------------+----------+
| id | log_who | log_conn | log_when | log_schema | log_view |
+----+----------------+----------+---------------------+------------+----------+
| 1 | root@localhost | 1 | 2005-08-20 09:39:03 | test | v_table1 |
+----+----------------+----------+---------------------+------------+----------+
1 row in set (0.00 sec)


Wow! It works...or does it? I was a bit puzzled at first why there's only one row in the log table, whereas there are three rows in the view. I'd expected to see three rows in the log table as well. Giving it a bit more thought, I must conclude that the results are in fact correct. The f_log_select function has about the same status as a call to a built in function like NOW(). So, whenever a select is issued against the view, f_log_select is evaluated exactly once. The evaluation is totally independant of the number of rows selected from the view:


mysql> select * from v_table1 where number = 0;
Empty set (0.01 sec)

mysql> select * from select_log;
+----+----------------+----------+---------------------+------------+----------+
| id | log_who | log_conn | log_when | log_schema | log_view |
+----+----------------+----------+---------------------+------------+----------+
| 1 | root@localhost | 1 | 2005-08-20 09:39:03 | test | v_table1 |
| 2 | root@localhost | 1 | 2005-08-20 09:47:26 | test | v_table1 |
+----+----------------+----------+---------------------+------------+----------+
2 rows in set (0.01 sec)


Having thought about this, I thought that it would be really cool to log both the occurrence of a select statement as well as the event of ann individual row being selected. So, I extended my log table a bit:


alter table select_log
add log_type enum('ROW','STATEMENT')
not null
default 'STATEMENT'
;


(I used 'STATEMENT' as default because I'm to lazy to rewrite f_log_select at this stage).

Ok, so what would we do to trick MySQL into evaluating a function for each row selected by the view? Actually, it's not that hard. We must pass a value from our record to the function, so that MySQL is forced to re-evaluate for each row. So, in addition to f_log_select, I write this:


DELIMITER $$

DROP FUNCTION IF EXISTS `test`.`f_log_select_row`$$
CREATE FUNCTION `test`.`f_log_select_row`(
p_log_schema varchar(64)
, p_log_view varchar(64)
, p_expr varchar(1)
) RETURNS int(10) unsigned
begin
insert
into select_log(
log_who
, log_conn
, log_when
, log_schema
, log_view
, log_type
) values (
current_user()
, connection_id()
, now()
, p_log_schema
, p_log_view
, 'ROW'
);
return last_insert_id();
end$$

DELIMITER ;


And, I rewrote the view:


create or replace view v_table1
as
select *
from table1
where f_log_select(
'test'
, 'table1'
)
and f_log_select_row(
'test'
, 'table1'
, data
)


When we clear the log, select from the view and then see what's in the log we get this:


+----+----------------+----------+---------------------+------------+----------+-----------+
| id | log_who | log_conn | log_when | log_schema | log_view | log_type |
+----+----------------+----------+---------------------+------------+----------+-----------+
| 3 | root@localhost | 1 | 2005-08-20 10:14:19 | test | v_table1 | STATEMENT |
| 4 | root@localhost | 1 | 2005-08-20 10:14:19 | test | v_table1 | ROW |
| 5 | root@localhost | 1 | 2005-08-20 10:14:19 | test | v_table1 | ROW |
| 6 | root@localhost | 1 | 2005-08-20 10:14:19 | test | v_table1 | ROW |
+----+----------------+----------+---------------------+------------+----------+-----------+


So, it definitely seems to work.

Well, I already said that I regard this as rather dirty little trick, but I had fun devising it, and it learnt me a bit about function evaluation within views.

Maybe this isn't a trick after all, and maybe it's even useful for a real purpose. If you find such a purpose, go ahead, and try if it works for you. I'll be most interested to hear from anyone who tries.

1 comment:

Anonymous said...

I can think of a use.

Say you want to check the data in your table to see that its up to date, and if not, fetch it from an external source. Or maybe do some other type of housekeeping.

Checking a date field (to make sure the data is up to date) before returning the data would be a great example of a use for this particular trick.