
Roland Bouman's blog

Saturday, March 29, 2008

MySQL information_schema: Identifying rows from TABLE_CONSTRAINTS

Yesterday, I set out a little quiz about the TABLE_CONSTRAINTS table in the MySQL information_schema. The task was:

  • Specify a minimal set of columns of the information_schema.TABLE_CONSTRAINTS table that is sufficient to reliably identify a single row in the information_schema.TABLE_CONSTRAINTS table.
  • Argue why these columns are necessary and sufficient to identify a row, and why a smaller set of columns does not exist

Short Answer


For MySQL there are two such column sets:

  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, and CONSTRAINT_TYPE
  • TABLE_NAME, TABLE_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE

Explanation


According to the ISO SQL specification (ISO/IEC 9075-11:2003 (E) 6.48 TABLE_CONSTRAINTS base table), the "conceptual primary key" of a standard implementation of the TABLE_CONSTRAINTS system view should be (CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA, CONSTRAINT_NAME). MySQL does not support catalogs (all %_CATALOG columns in the MySQL information_schema are always NULL), so we can ignore the CONSTRAINT_CATALOG column. That would leave us with (CONSTRAINT_SCHEMA, CONSTRAINT_NAME); that is, table constraint names would be unique within a schema. However, in MySQL, constraint names need not be unique within a schema.

MySQL supports three types of table constraints:

  • PRIMARY KEY
  • UNIQUE
  • FOREIGN KEY (support depends on the storage engine)

Identifying FOREIGN KEY constraints


FOREIGN KEY constraints live in their own per-schema namespace, so at a glance it may seem that the combination of (CONSTRAINT_SCHEMA, CONSTRAINT_NAME) is sufficient to identify a FOREIGN KEY constraint. However, because PRIMARY KEY and UNIQUE constraints are not in the same namespace as FOREIGN KEY constraints, a single schema may contain a FOREIGN KEY constraint and a non-FOREIGN KEY constraint that have the same CONSTRAINT_NAME.

From this it follows that we need to take the CONSTRAINT_TYPE into account too.

So in order to reliably identify a FOREIGN KEY constraint, we need to have at least CONSTRAINT_SCHEMA, CONSTRAINT_NAME *and* require that CONSTRAINT_TYPE equals 'FOREIGN KEY'.

(Among the answers was a suggestion that the CONSTRAINT_TYPE is not necessary, because the CONSTRAINT_NAME would provide the necessary differentiation. Although this is true for the identifiers that MySQL generates in the absence of explicit identifiers, it is not true in the general case.)
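To make the namespace clash concrete, here is a minimal sketch. The table names parent and child, the index name idx_parent_id, and the constraint name shared_name are all made up for illustration, and the sketch assumes InnoDB so that the FOREIGN KEY is actually recorded. Exact behavior may vary between MySQL versions.

```sql
-- Hypothetical tables: a UNIQUE constraint and a FOREIGN KEY constraint
-- in the same schema deliberately share the name shared_name.
CREATE TABLE parent (
  id   INT NOT NULL,
  code INT NOT NULL,
  PRIMARY KEY (id),
  CONSTRAINT shared_name UNIQUE (code)
) ENGINE = InnoDB;

CREATE TABLE child (
  id        INT NOT NULL,
  parent_id INT NOT NULL,
  PRIMARY KEY (id),
  -- explicit index, so that no index named after the constraint is created implicitly
  KEY idx_parent_id (parent_id),
  CONSTRAINT shared_name FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE = InnoDB;

-- List every table constraint in the current schema that carries that name.
SELECT CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE, TABLE_NAME
FROM   information_schema.TABLE_CONSTRAINTS
WHERE  CONSTRAINT_SCHEMA = DATABASE()
AND    CONSTRAINT_NAME   = 'shared_name';
```

If both statements succeed, the final query should return two rows that agree on CONSTRAINT_SCHEMA and CONSTRAINT_NAME and differ in CONSTRAINT_TYPE.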

Identifying PRIMARY KEY constraints


So, would (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE) be sufficient to identify any of the other types of table constraints?

Unfortunately not. Almost everybody who is somewhat familiar with MySQL knows that in MySQL a PRIMARY KEY constraint doesn't really have its own identifier. Rather, the CONSTRAINT_NAME for a PRIMARY KEY constraint is always 'PRIMARY'. In fact, the identifier 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints, and may not be used for any other type of constraint.

From this, it follows that there can be multiple rows in information_schema.TABLE_CONSTRAINTS that have the same combination of values in the columns (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE). In fact, for one particular value of CONSTRAINT_SCHEMA there will be just as many occurrences of the combination ('PRIMARY', 'PRIMARY KEY') in the columns (CONSTRAINT_NAME, CONSTRAINT_TYPE) as there are PRIMARY KEY constraints in that schema.
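A quick way to see these duplicates on your own server is to group on the three columns and keep only the combinations that occur more than once. This is just a sketch of such a check; the alias occurrences is my own choice.

```sql
-- Combinations of (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE)
-- that match more than one row, and therefore cannot identify a single row.
SELECT   CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE,
         COUNT(*) AS occurrences
FROM     information_schema.TABLE_CONSTRAINTS
GROUP BY CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE
HAVING   COUNT(*) > 1;
```

Any schema that contains more than one table with a PRIMARY KEY should show up here with the combination ('PRIMARY', 'PRIMARY KEY').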

Of course, we know that there can be only one PRIMARY KEY per table. From that, it follows that we can identify a PRIMARY KEY constraint if we know to which table it belongs.

Tables (and views) reside in their own per-schema namespace, so the combination of (TABLE_SCHEMA, TABLE_NAME) is sufficient to identify a table. That means that the combination of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE) is sufficient to identify a PRIMARY KEY constraint. Once we have identified a particular table using (TABLE_SCHEMA, TABLE_NAME), we know that the table constraint with the CONSTRAINT_TYPE equal to 'PRIMARY KEY' is the PRIMARY KEY constraint.

Of course, because the name 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints we could've equally well settled for (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME) although personally I prefer to use CONSTRAINT_TYPE.
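As an illustration, both ways of looking up a PRIMARY KEY constraint might look like this. The schema sakila and the table rental are only example values; substitute your own.

```sql
-- Identify the PRIMARY KEY constraint of one table by its type ...
SELECT *
FROM   information_schema.TABLE_CONSTRAINTS
WHERE  TABLE_SCHEMA    = 'sakila'   -- example schema
AND    TABLE_NAME      = 'rental'   -- example table
AND    CONSTRAINT_TYPE = 'PRIMARY KEY';

-- ... or by its reserved name.
SELECT *
FROM   information_schema.TABLE_CONSTRAINTS
WHERE  TABLE_SCHEMA    = 'sakila'
AND    TABLE_NAME      = 'rental'
AND    CONSTRAINT_NAME = 'PRIMARY';
```

Each query should return at most one row, since a table can have only one PRIMARY KEY.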

Identifying UNIQUE constraints


This leaves us with the UNIQUE constraints. UNIQUE constraints must have a unique name per table, and the name must not be equal to 'PRIMARY' (which is reserved for PRIMARY KEY constraints). From this we must conclude that we need no less than the combination of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE, and CONSTRAINT_NAME) to identify a UNIQUE constraint.

We must have (TABLE_SCHEMA, TABLE_NAME) to identify the table because the UNIQUE constraints have a unique name per table. Now we can take a shortcut: within one table we can distinguish between PRIMARY KEY and UNIQUE constraints by virtue of the fact that the CONSTRAINT_NAME for PRIMARY KEYs will always be 'PRIMARY'. However, if we need to distinguish between FOREIGN KEY and UNIQUE constraints, we still need to look at the CONSTRAINT_TYPE too.

Summary


To recapitulate:
  • we need CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE to identify a FOREIGN KEY constraint
  • we need TABLE_SCHEMA, TABLE_NAME, and CONSTRAINT_TYPE to identify a PRIMARY KEY constraint.
  • Alternatively we could identify PRIMARY KEY constraints by looking at TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME by virtue of the fact that all PRIMARY KEYs always have a CONSTRAINT_NAME equal to 'PRIMARY'
  • we need TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE and CONSTRAINT_NAME to identify a UNIQUE constraint

From this we could conclude that the minimal set of columns required to identify an arbitrary table constraint consists of:
  • CONSTRAINT_SCHEMA
  • CONSTRAINT_NAME
  • CONSTRAINT_TYPE
  • TABLE_SCHEMA
  • TABLE_NAME

However, in MySQL, a table constraint is 'owned' by the table on which it is defined, and a table constraint defined on a particular table cannot reside in a schema different from the schema of its table. This means that for any row in information_schema.TABLE_CONSTRAINTS, the columns TABLE_SCHEMA and CONSTRAINT_SCHEMA will always have the same value.
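If you want to check this claim against your own server, a sketch of such a sanity check could be the following (the alias mismatches is arbitrary):

```sql
-- Count table constraints whose schema differs from the schema of their table.
SELECT COUNT(*) AS mismatches
FROM   information_schema.TABLE_CONSTRAINTS
WHERE  TABLE_SCHEMA <> CONSTRAINT_SCHEMA;
```

If the two columns are indeed always equal, the count should be zero.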

This brings us to the final conclusion that there are two minimal sets of columns that may be used to identify an arbitrary constraint in the information_schema.TABLE_CONSTRAINTS table:
  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
  • TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE

As CONSTRAINT_SCHEMA and TABLE_SCHEMA are completely interchangeable as far as the result is concerned, there is no specific reason to generically prefer either combination.
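The sufficiency half of this conclusion can be tested empirically with a grouping query. This is only a sketch, and it can only show that no counterexample exists on the server at hand; it cannot prove minimality.

```sql
-- Look for combinations of the four identifying columns that match
-- more than one row in TABLE_CONSTRAINTS.
SELECT   TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE,
         COUNT(*) AS occurrences
FROM     information_schema.TABLE_CONSTRAINTS
GROUP BY TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
HAVING   COUNT(*) > 1;
```

If the four columns really do identify a row, this query should come back empty. Replacing TABLE_SCHEMA with CONSTRAINT_SCHEMA throughout should give the same result.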

To learn more about this and other traps in the MySQL information schema, come meet me at the MySQL Users Conference. I will be your guide on the Grand Tour of the Information Schema and its Applications, and explain more of these gotchas.


Tuesday, March 25, 2008

MySQL Information Schema applications at the UC2008

Last week I blogged about the upcoming MySQL Users conference, in particular about the Writing MySQL UDFs tutorial that I will be delivering.

I will also be doing the Grand Tour of the Information Schema and its Applications.


I will discuss the elements in the MySQL information schema, and provide tips to write queries against it. Most of the talk will revolve around a number of scripts I have developed over the past few years:


  • Generating a history/audit database - generate all the code you need to keep track of all changes occurring in your on-line database. Use it to audit or implement 'flashback'

  • Checking foreign key violations - disabling foreign key checks may be useful, but is dangerous. This script helps you find problems with foreign key constraints

  • Creating federated tables - FEDERATED tables are useful, but tedious and error-prone to create. Let this script do the work instead

  • Checking for duplicated and redundant indexes - Redundant or duplicate indexes can slow down your database performance. Find them with this script (Note: the original script I blogged about earlier contains a critical flaw - this is a completely new version)

I also have a few new information schema tricks up my sleeve. Without giving away too much, take a look at this little conversation I just had with my MySQL command line client:

mysql> call qwz.qwz(null);
+-----+------------------------------------------------------------------+
| | Welcome to the command line Query Wizard for MySQL!!! |
+-----+------------------------------------------------------------------+
| 1 | Set the schema (current: world) |
| --- | ---------------------------------------------------------------- |
| 2 | Choose a table |
+-----+------------------------------------------------------------------+
3 rows in set (0.00 sec)

Yes! It's an interactive query wizard for the command line. I don't want to spoil too much, but I can unveil that I will be presenting a full-fledged interactive query wizard that operates completely from the MySQL command line. It's just a stored procedure - no proxy, no UDFs, no plugins or whatever.

To give you a taste of the possibilities, take a look at the continuation of my session with the query wizard. First let's choose the schema:

mysql> call qwz.qwz(1);
+---+------------------------+
|   | Set the default schema |
+---+------------------------+
| 1 | information_schema     |
| 2 | mysql                  |
| 3 | qwz                    |
| 4 | sakila                 |
| 5 | world                  |
+---+------------------------+

Let's settle for the sakila schema:

mysql> call qwz.qwz(4);

This will prompt us to choose a table from the sakila schema:

+-----+------------------------------------------------------------------+
| | The command line Query Wizard for MySQL |
+-----+------------------------------------------------------------------+
| 1 | Set the schema (current: sakila) |
| --- | ---------------------------------------------------------------- |
| 2 | actor |
| 3 | actor_info |
| 4 | address |
| 5 | category |
| 6 | city |
| 7 | country |
| 8 | customer |
| 9 | customer_list |
| 10 | film |
| 11 | film_actor |
| 12 | film_category |
| 13 | film_list |
| 14 | film_text |
| 15 | inventory |
| 16 | language |
| 17 | nicer_but_slower_film_list |
| 18 | payment |
| 19 | rental |
| 20 | sales_by_film_category |
| 21 | sales_by_store |
| 22 | staff |
| 23 | staff_list |
| 24 | store |
+-----+------------------------------------------------------------------+
26 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Let's pick the rental table:

mysql> call qwz.qwz(19);

Once we have selected the initial table, we can build a join path, and we are prompted to pick any of the tables related to rental:

+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | + sakila.customer (fk_rental_customer) |
| 2 | + sakila.inventory (fk_rental_inventory) |
| 3 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 4 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 5 | New Query |
+-----+------------------------------------------------------------------+
8 rows in set (0.20 sec)

Query OK, 0 rows affected (0.20 sec)
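The wizard's own code is not shown here, so the following is not its implementation. It is merely a sketch of how tables related through foreign keys might be looked up in the information schema, using the REFERENCED_% columns of the KEY_COLUMN_USAGE table:

```sql
-- Tables that sakila.rental points to through a foreign key,
-- together with the name of the foreign key constraint.
SELECT DISTINCT
       REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, CONSTRAINT_NAME
FROM   information_schema.KEY_COLUMN_USAGE
WHERE  TABLE_SCHEMA = 'sakila'
AND    TABLE_NAME   = 'rental'
AND    REFERENCED_TABLE_NAME IS NOT NULL   -- keep foreign keys only
ORDER BY CONSTRAINT_NAME;
```

The DISTINCT is there because KEY_COLUMN_USAGE holds one row per constrained column, so a multi-column foreign key would otherwise be listed more than once.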

Let's extend the join path with the customer table:

mysql> call qwz.qwz(1);
+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | - sakila.customer (fk_rental_customer) |
| 2 | + sakila.address (fk_customer_address) |
| 3 | + sakila.store (fk_customer_store) |
| 4 | + sakila.inventory (fk_rental_inventory) |
| 5 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 6 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 7 | New Query |
+-----+------------------------------------------------------------------+
10 rows in set (0.09 sec)

Query OK, 0 rows affected (0.09 sec)

Let's throw in the inventory table in the mix too:

mysql> call qwz.qwz(4);
+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | - sakila.customer (fk_rental_customer) |
| 2 | + sakila.address (fk_customer_address) |
| 3 | + sakila.store (fk_customer_store) |
| 4 | - sakila.inventory (fk_rental_inventory) |
| 5 | + sakila.film (fk_inventory_film) |
| 6 | + sakila.store (fk_inventory_store) |
| 7 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 8 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 9 | New Query |
+-----+------------------------------------------------------------------+
12 rows in set (0.10 sec)

Query OK, 0 rows affected (0.10 sec)

We can keep this up quite a long time, and we can remove tables from our join path in a similar way. We can then start specifying some columns:

mysql> call qwz.qwz(8);
+-----+------------------------------------------------------------------+
| | Choose columns for the SELECT list |
+-----+------------------------------------------------------------------+
| | sakila.rental.* |
| | --------------- |
| 1 | + rental_id |
| 2 | + rental_date |
| 3 | + inventory_id |
| 4 | + customer_id |
| 5 | + return_date |
| 6 | + staff_id |
| 7 | + last_update |
| | --------------- |
| | sakila.customer.* (fk_rental_customer) |
| | -------------------------------------- |
| 8 | + customer_id |
| 9 | + store_id |
| 10 | + first_name |
| 11 | + last_name |
| 12 | + email |
| 13 | + address_id |
| 14 | + active |
| 15 | + create_date |
| 16 | + last_update |
| | -------------------------------------- |
| | sakila.inventory.* (fk_rental_inventory) |
| | ---------------------------------------- |
| 17 | + inventory_id |
| 18 | + film_id |
| 19 | + store_id |
| 20 | + last_update |
| --- | ---------------------------------------------------------------- |
| 21 | New Query |
+-----+------------------------------------------------------------------+
30 rows in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)
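A column menu like this one can be fed from the information_schema.COLUMNS table. Again, this is only a sketch under my own assumptions, not the wizard's actual code:

```sql
-- Columns of the tables in the join path, in their defined order per table.
SELECT   TABLE_NAME, ORDINAL_POSITION, COLUMN_NAME
FROM     information_schema.COLUMNS
WHERE    TABLE_SCHEMA = 'sakila'
AND      TABLE_NAME IN ('rental', 'customer', 'inventory')
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

Note that this sorts the tables alphabetically, whereas the menu above lists them in join path order.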

Currently I'm still working on a multiple select interface, but for now I'll snip out all but the last individual column selection:

mysql> call qwz.qwz(18);
+-----+------------------------------------------------------------------+
| | SELECT list: + columns are added / - columns are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental.* |
| | --------------- |
| 1 | + rental_id |
| 2 | - rental_date |
| 3 | + inventory_id |
| 4 | + customer_id |
| 5 | + return_date |
| 6 | + staff_id |
| 7 | + last_update |
| | --------------- |
| | sakila.customer.* (fk_rental_customer) |
| | -------------------------------------- |
| 8 | + customer_id |
| 9 | + store_id |
| 10 | - first_name |
| 11 | - last_name |
| 12 | + email |
| 13 | + address_id |
| 14 | + active |
| 15 | + create_date |
| 16 | + last_update |
| | -------------------------------------- |
| | sakila.inventory.* (fk_rental_inventory) |
| | ---------------------------------------- |
| 17 | + inventory_id |
| 18 | - film_id |
| 19 | + store_id |
| 20 | + last_update |
| --- | ---------------------------------------------------------------- |
| 21 | Show SQL |
| --- | ---------------------------------------------------------------- |
| 22 | Explain SQL |
| --- | ---------------------------------------------------------------- |
| 23 | Execute SQL |
| --- | ---------------------------------------------------------------- |
| 24 | New Query |
+-----+------------------------------------------------------------------+
36 rows in set (0.03 sec)

Query OK, 0 rows affected (0.03 sec)

Now is a good moment to inspect the SQL underlying our query:

mysql> call qwz.qwz(21);

+-----+----------------------------------------------------------------------------+
| | The command line Query Wizard for MySQL |
+-----+----------------------------------------------------------------------------+
| | SELECT t1.rental_date |
| | ,t2.first_name |
| | ,t2.last_name |
| | ,t8.film_id |
| | FROM sakila.rental AS t1 |
| | INNER JOIN sakila.customer AS t2 ON (t1.customer_id) = (t2.customer_id) |
| | INNER JOIN sakila.inventory AS t8 ON (t1.inventory_id) = (t8.inventory_id) |
| --- | ---------------------------------------------------------------- |
| 1 | Execute SQL |
| --- | ---------------------------------------------------------------- |
| 2 | Explain SQL |
| --- | ---------------------------------------------------------------- |
| 3 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 4 | New Query |
+-----+----------------------------------------------------------------------------+
15 rows in set (1.21 sec)

Query OK, 0 rows affected (1.21 sec)

And we can execute it, or run EXPLAIN on it:

mysql> call qwz.qwz(2);
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
| 1 | SIMPLE | t8 | index | PRIMARY | PRIMARY | 3 | NULL | 4673 | |
| 1 | SIMPLE | t1 | ref | idx_fk_inventory_id,idx_fk_customer_id | idx_fk_inventory_id | 3 | sakila.t8.inventory_id | 1 | |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 2 | sakila.t1.customer_id | 1 | |
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
3 rows in set (0.00 sec)

As you can imagine, there is quite a good deal of information schema hacking going on here.

Code will be released in full immediately after my talk, and if there is sufficient interest I will discuss the internals of the qwz stored procedure in full.
