Percona Live MySQL User's Conference, San Francisco, April 10-12th, 2012 Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Wednesday, May 10, 2006

MySQL View Metadata: original source, algorithm, and status

Views are a great feature which is available as of MySQL 5. A view can be seen as a named and persistently stored SELECT expression. After defining the view it can be referenced just like a table in other queries.

Metadata for MySQL Views can be queried using the TABLES and VIEWS tables from the information_schema. This can be very helpful when maintaining or developing views.

However, this metadata would be even more valueable if it would provide a few more items:

Status

A view becomes broken when the database objects on which it depends (tables, functions, other views) are modified or dropped. It would be nice to be able to run a query to identify exactly those views.

The original sourcecode

Right now, the view source is parsed and becomes rewritten, stripping out any comments and indentation. The information_schema only provides access to this rewritten statement in the VIEW_DEFINITION column of the information_schema.VIEWS table; an unpleasant suprise if you spent a lot of time commenting your view code.

Algorithm

In the CREATE VIEW statement, one can specify the Algorithm that should be used to implement the view. Because the algorithm can have a big impact on the performance it would be nice to be able to query that.


The View Status


Although it's not explicitly supported, it's actually possible to create a conventional query that returns the status for a given view (or set of views). The solution relies on the value of the TABLE_COMMENT column of the information_schema.TABLES table.

For base tables, this column contains the text specified for the COMMENT table option specified in the CREATE TABLE or ALTER TABLE statement. For views, this normally contains the text 'VIEW':

mysql> USE test;
Database changed
mysql> CREATE TABLE t(
-> id int
-> )
-> ENGINE = MyISAM
-> COMMENT = 'Comments on this table'
-> ;
Query OK, 0 rows affected (0.10 sec)

mysql> CREATE VIEW v
-> AS
-> SELECT *
-> FROM t;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT table_name
-> , table_type
-> , table_comment
-> FROM information_schema.tables
-> WHERE table_schema = schema()
-> AND table_name IN ('t','v')
-> ;
+------------+------------+------------------------+
| table_name | table_type | table_comment |
+------------+------------+------------------------+
| t | BASE TABLE | Comments on this table |
| v | VIEW | VIEW |
+------------+------------+------------------------+
2 rows in set (0.00 sec)

Now, we will deliberately break the view by dropping the table upon which it depends:

mysql> DROP TABLE t;
Query OK, 0 rows affected (0.04 sec)

When we run our select again, this is what we get:

+------------+------------+----------------------------------------------------------------------------------+
| table_name | table_type | table_comment |
+------------+------------+----------------------------------------------------------------------------------+
| v | VIEW | View 'test.v' references invalid table(s) or column(s) or function(s) or defin |
+------------+------------+----------------------------------------------------------------------------------+

So, we're witnessing the fact that MySQL is using this column in it's very own way. Because you cannot specify a COMMENT table option when creating or altering a view, MySQL can pretty much decide for itself how to use this column, and it turns out we can take advantage of this.

(Another case where MySQL uses it for it's own means is seen in InnoDB tables. In those cases, some of the tail of the comment is used to log some innodb specific info, possibly truncating the actually entered comment text.)

Now that we know this, it won't be too hard to write a query that identifies broken views:

SELECT *
FROM information_schema.tables
WHERE table_type = 'VIEW'
AND table_type != table_comment

(Later on, we'll modify the query in order to have a VIEW_STATUS column report a value of either 'VALID' or 'INVALID', indicating wheter the view is broken or not. Of course, that modification is trivial.)

Under the Hood: Algorithm and Sourcecode


MySQL stores the view definition in a .frm file in the database directory beneath the data directory, just like it does for tables. Let's take a look at the v.frm file: :

TYPE=VIEW
query=select `test`.`t`.`id` AS `id` from `test`.`t`
md5=25ace41c7f6b0927b2c4ed5227e64ad0
updatable=1
algorithm=0
definer_user=root
definer_host=localhost
suid=1
with_check_option=0
revision=1
timestamp=2006-05-09 22:34:46
create-version=1
source=select *\nfrom t

I've highlighted some items of interest:

  • the query is the code after MySQL's done rewriting the query.

  • Here, the algorithm item is associated with the number 0. By creating views with explicit ALGORITHM specifications, we can quickly learn that this can have either on of the values 0,1 or 2, corresponding to the algorithms UNDEFINED, TEMPTABLE and MERGE.

  • the source item is almost exactly the code as it was entered by the user. The whitespace is still there, only the newlines are escaped with an escape sequence (\n)


Now that we know that it's stored over there, we can try to use it in a query. To do that, we need to be able to somehow access and parse (or actually scan rather than parse) the file contents...And we can, with the LOAD_FILE() function!

Using LOAD_FILE()


The LOAD_FILE() function accepts a filename as a string argument. It tries to read the specified file, and returns the contents as a string. You need the FILE privilege to do this, and the file must not be larger than max_allowed_packet bytes.

On my Windows XP system, the data directory is located at D:\MySQL\MySQL Server 5.1\data. The view v resides in the test database, so I can access the contents of the corresponding .frm file using this query:


SELECT load_file(
'D:\\MySQL\\MySQL Server 5.1\\data\\test\\v.frm'
)
;


In a string literal, a backslash introduces an escape sequence for special characters. So, in order to get a literal backslash, the backslash itself needs to be escaped by typing two backslashes. Tip for windows users: LOAD_FILE will also accept forward slashes as directory separator, so this works equally well (better, actually):


SELECT load_file(
'D:/MySQL/MySQL Server 5.1/data/test/v.frm'
);


Now, we want to be able to drive this using data from the information_schema.VIEWS table, so the following modification is quite straightforward:


SELECT load_file(
concat(
'D:/MySQL/MySQL Server 5.1/data/'
, table_schema
, '/'
, table_name
, '.frm'
)
) file_contents
FROM information_schema.VIEWS
;


Of course, we would also like to be able to parameterize the value for the data directory. At first, it looks as if you can:


mysql> show variables like '%dir%';
+----------------------------+-------------------------------------------+
| Variable_name | Value |
+----------------------------+-------------------------------------------+
| basedir | D:\MySQL\MySQL Server 5.1\ |
| bdb_logdir | |
| bdb_tmpdir | |
| character_sets_dir | D:\MySQL\MySQL Server 5.1\share\charsets\ |
| datadir | D:\MySQL\MySQL Server 5.1\Data\ |
| innodb_data_home_dir | |
| innodb_log_arch_dir | |
| innodb_log_group_home_dir | .\ |
| innodb_max_dirty_pages_pct | 90 |
| plugin_dir | D:\MySQL\MySQL Server 5.1\lib/ |
| slave_load_tmpdir | D:\MySQL\Datatemp |
| tmpdir | D:/MySQL/Datatemp |
+----------------------------+-------------------------------------------+
12 rows in set (0.00 sec)

But alas, it doesn't work that way:

mysql> select @@datadir;
ERROR 1193 (HY000): Unknown system variable 'datadir'

I'm not really sure if this behaviour constitutes a bug, but it doesn't make much sense to me. A quick search in bugs.mysql.com did find me a report about this (bug #1039). As I cannot change this behaviour, I'm settling for the literal right now. If anyone's got a tip to improve on this, I'll be most glad to hear about it.

(Actually, the minor improvement I came up with was using a user-defined variable but this got me into trouble again because of an entirely different bug alltogether)

'Parsing' the file contents


Now that we can load the view's .frm file, we can try to pick out the items of interest.

(I wouldn't call it parsing exactly, it's more of a scanning/recognition process)

For example, suppose we want to extract the algorithm item from the file contents. First, we'll create a few different views to be able to test the effect:

create algorithm=undefined view v_undefined as select * from t;
create algorithm=merge view v_merge as select * from t;
create algorithm=temptable view v_temptable as select * from t;


Now we can write the actual query. First, we want to look for the string algorithm=; then, our value is sitting there right after that string as the remainder of the line. In MySQL, we can solve this 'excision' SUBSTRING() and INSTR() and the like, but I find that a combination SUBSTRING_INDEX() is really much, much more elegant, as well as less error-prone. That's because you don't have to repeat or duplicate any of the strings that determine the excision: I need only one occurrence of the input string, one occurrence of the starting delimiter and one occurrence of the ending delimiter. (A variant of this theme let's you select the n-th item in a separated list by modifying one integer only). Anyway, this is what I'm doing now:


select v.table_schema as schema_name
, v.table_name as view_name
, substring_index(
substring_index(
load_file(
concat(
'D:/MySQL/MySQL Server 5.1/data/'
, v.table_schema
, '/'
, v.table_name
, '.frm'
)
)
, 'algorithm='
, -1
)
, '\n'
, 1
) as algorithm
from information_schema.views v
where v.table_schema = schema()
and v.table_name like 'v\_%'


and this gives us the following result:


+-------------+---------------+-----------+
| schema_name | view_name | algorithm |
+-------------+---------------+-----------+
| test | v_merge | 2 |
| test | v_temptable | 1 |
| test | v_undefined | 0 |
+-------------+---------------+-----------+


Ok, we can make the output pretty later on, so people can actually read what algorithm was used. We'll take care of that after we're done extracting the original source code.

By itself, excising the souce code is even easier than what we already saw for the algorithm item, because the source code happens to be the last item in the .frm file. There's just one complicating factor, and that's that some special characters in the orignal statement appear as plain, escaped text there. So, in order to get to the actual source code, you need to apply the REPLACE() function. This worked for me:

replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
substring_index(
load_file(
concat(
'D:/MySQL/MySQL Server 5.1/data/'
, v.table_schema
, '/'
, v.table_name
, '.frm'
)
)
, '\nsource='
, -1
)
, '\\_','\_')
, '\\%','\%')
, '\\\\','\\')
, '\\Z','\Z')
, '\\t','\t')
, '\\r','\r')
, '\\n','\n')
, '\\b','\b')
, '\\\"','\"')
, '\\\'','\'')
, '\\0','\0')

Putting it together


Now, let's finish what we've started. Suppose we want to create a query that shows both the algorithm, as well as the original source code. Well, obviously, we can take the expressions for the algorithm and source, just as described before, and SELECT both of them. This means we will be loading the file 2 times (That is, unless the optimizer is clever enough to exclude it).

I don't know if it makes a difference performance-wise, but I use a little trick to prevent duplicate loads of the file. I assign the result of LOAD_FILE() to a (session) user variable, and reference that wherever I need the file contents. Here's the final query:


select v.table_schema as schema_name
, v.table_name as view_name
, case
substring_index(
substring_index(
@frm:=load_file(
concat(
'D:/MySQL/MySQL Server 5.1/data/'
, v.table_schema
, '/'
, v.table_name
, '.frm'
)
)
, 'algorithm='
, -1
)
, '\n'
, 1
)
when 0 then 'UNDEFINED'
when 1 then 'TEMPTABLE'
when 2 then 'MERGE'
else 'UNRECOGNIZED'
end as algorithm
, replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
substring_index(
@frm
, '\nsource='
, -1
)
, '\\_','\_')
, '\\%','\%')
, '\\\\','\\')
, '\\Z','\Z')
, '\\t','\t')
, '\\r','\r')
, '\\n','\n')
, '\\b','\b')
, '\\\"','\"')
, '\\\'','\'')
, '\\0','\0') as view_source
, if(t.table_type=table_comment,'YES'
, 'NO'
) as view_status
from information_schema.views v
inner join information_schema.tables t
on v.table_schema = t.table_schema
and v.table_name = t.table_name


This includes the extra modifications to create the view_status column and the algorithm column.

4 comments:

Martijn Tonies said...

Right now, the view source is parsed and becomes rewritten, stripping out any comments and indentation.

I'll say.

And I complained about that when MySQL 5 was still in Alpha.

Only thing I got was: "but the view itself works, so it's not a bug".

Right.

--
Martijn Tonies
http://www.upscene.com

Roland Bouman said...

Hi Martijn,

Yeah I know about some of your trials and tribulations. Actually, I already borrowed from your comments and contributions to the MySQL internals list before (Formatting SQL code made easy).

What can I say? I hope this will be fixed soon too. It doesn't seem like a hard thing to accomplish and I can only conclude that it's not high on the list of priorities.

Anyway, thanks for your comment.

kind regards,

Roland

Martijn Tonies said...

Hey, you did :-)

This post indeed reminded me of that thread... The request is in the tracker and well, that's that. So it seems.

I wish more people picked this up.

--
Martijn Tonies
http://www.upscene.com

Jayant Kumar said...

you have a very nice technical blog...
I will give some benchmark figures for mysql versus lucene fulltext search soon.