Saturday, March 29, 2008

MySQL information_schema: Identifying rows from TABLE_CONSTRAINTS

Yesterday, I set out a little quiz about the TABLE_CONSTRAINTS table in the MySQL information_schema. The task was:
  • Specify a minimal set of columns of the information_schema.TABLE_CONSTRAINTS table that is sufficient to reliably identify a single row in the information_schema.TABLE_CONSTRAINTS table.
  • Argue why these columns are necessary and sufficient to identify a row, and why a smaller set of columns does not exist

Short Answer


For MySQL there are two such column sets:

  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, and CONSTRAINT_TYPE
  • TABLE_NAME, TABLE_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE

Explanation


According to the ISO SQL specification (ISO/IEC 9075-11:2003 (E) 6.48 TABLE_CONSTRAINTS base table), we can derive that the "conceptual primary key" of a standard implementation of the TABLE_CONSTRAINTS system view should be (CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA, and CONSTRAINT_NAME). In the absence of support for catalogs, we can ignore the CONSTRAINT_CATALOG column, and this also applies to MySQL (all %_CATALOG columns in the MySQL information_schema are always NULL). That would leave us with (CONSTRAINT_SCHEMA and CONSTRAINT_NAME) - that is, table constraint names would be unique within a schema. However, in MySQL, constraint names need not be unique within a schema.

MySQL supports three types of table constraints:

  • PRIMARY KEY
  • UNIQUE
  • FOREIGN KEY (that is, they are supported dependent upon the storage engine)
.

Identifying FOREIGN KEY constraints


FOREIGN KEYs are in their own per schema namespace, so at a glance it may seem that the combination of (CONSTRAINT_SCHEMA and CONSTRAINT_NAME) is sufficient to identify a FOREIGN KEY constraint. However, because PRIMARY KEY and UNIQUE constraints are not within the same namespace as FOREIGN KEY constraints, a single schema may contain a FOREIGN KEY constraint and a non-FOREIGN KEY constraint that have the same CONSTRAINT_NAME.

From this it follows that we need to take the CONSTRAINT_TYPE into account too.

So in order to reliably identify a FOREIGN KEY constraint, we need to have at least CONSTRAINT_SCHEMA, CONSTRAINT_NAME *and* require that CONSTRAINT_TYPE equals 'FOREIGN KEY'.

(Among the answers was a suggestion that the CONSTRAINT_TYPE is not necessary, because the CONSTRAINT_NAME would provide the necessary differentiation. Although this is true for the identifiers generated by MySQL in the absence of explicit identifiers, it is not true in the general case)

Identifying PRIMARY KEY constraints


So, would (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE) be sufficient to identify any of the other types of table constraints?

Unfortunately not. Almost everybody that is somewhat familiar with MySQL knows that in MySQL a PRIMARY KEY constraint doesn't really have its own identifier. Rather, the CONSTRAINT_NAME for a PRIMARY KEY constraint is always 'PRIMARY'. In fact, the identifier 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints, and may not be used for any other type of constraint.

From this, it follows that there can be multiple rows in information_schema.TABLE_CONSTRAINTS that have the same combination of values in the columns (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE). In fact, for one particular value of CONSTRAINT_SCHEMA there will be just as many occurrences of the combination ('PRIMARY', 'PRIMARY KEY') in the columns (CONSTRAINT_NAME, CONSTRAINT_TYPE) as there are PRIMARY KEY constraints in that schema.

Of course, we know that there can be only one PRIMARY KEY per table. From that, it follows that we can identify a PRIMARY KEY constraint if we know to which table it belongs.

Tables (and views) reside in their own per-schema namespace, so the combination of (TABLE_SCHEMA and TABLE_NAME) is sufficient to identify a table. That means that the combination of (TABLE_SCHEMA, TABLE_NAME, and CONSTRAINT_TYPE) is sufficient to identify a PRIMARY KEY constraint. Once we identified a particular table using (TABLE_SCHEMA, TABLE_NAME) we know that the table constraint with the CONSTRAINT_TYPE equal to 'PRIMARY KEY' is the PRIMARY KEY constraint.

Of course, because the name 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints we could've equally well settled for (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME) although personally I prefer to use CONSTRAINT_TYPE.

Identifying UNIQUE constraints


This leaves us with the UNIQUE constraints. UNIQUE constraints must have a unique name per table, and the name must not be equal to 'PRIMARY' (which is reserved for PRIMARY KEY constraints). From this we must conclude that we need no less than the combination of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE, and CONSTRAINT_NAME) to identify a UNIQUE constraint.

We must have (TABLE_SCHEMA, TABLE_NAME) to identify the table because the UNIQUE constraints have a unique name per table. Now we can take a shortcut: within one table we can distinguish between PRIMARY KEY and UNIQUE constraints by virtue of the fact that the CONSTRAINT_NAME for PRIMARY KEYs will always be 'PRIMARY'. However, if we need to distinguish between FOREIGN KEY and UNIQUE constraints, we still need to look at the CONSTRAINT_TYPE too.

Summary


To recapitulate:
  • we need CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE to identify a FOREIGN KEY constraint
  • we need TABLE_SCHEMA, TABLE_NAME, and CONSTRAINT_TYPE to identify a PRIMARY KEY constraint.
  • Alternatively we could identify PRIMARY KEY constraints by looking at TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME by virtue of the fact that all PRIMARY KEYs always have a CONSTRAINT_NAME equal to 'PRIMARY'
  • we need TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE and CONSTRAINT_NAME to identify a UNIQUE constraint

From this we could conclude that the minimal set of columns required to identify an arbitrary table constraint consists of:
  • CONSTRAINT_SCHEMA
  • CONSTRAINT_NAME
  • CONSTRAINT_TYPE
  • TABLE_SCHEMA
  • TABLE_NAME

However, in MySQL, a table constraint is 'owned' by the table on which it is defined and a table constraint defined on particular table cannot reside in a schema different from the schema its table. This means that for any row in information_schema.TABLE_CONSTRAINTS, the columns TABLE_SCHEMA and CONSTRAINT_SCHEMA will always have the same value.

This brings us to the final conclusion that there are two minimal set of columns that may be used to identify an arbitrary constraint in the information_schema.TABLE_CONSTRAINTS table:
  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
  • TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE

As CONSTRAINT_SCHEMA and TABLE_SCHEMA are completely interchangeable as far as the result is concerned, there is no specific reason to generically prefer either combination.

To learn more about this and other traps in the MySQL information schema, come meet me at the MySQL User's conference. I will be your guide on the Grand Tour of the Information Schema and its Applications, and explain more of these gotcha's.

4 comments:

jefftanner said...

I am trying to find the PRIMARY_KEY for a table to optimize replication from one host to another.

To find the PRIMARY_KEY, I am using the query:

'SELECT k.column_name
FROM information_schema.table_constraints t
JOIN information_schema.key_column_usage k
USING(constraint_name,table_schema,table_name)
WHERE t.constraint_type='PRIMARY KEY'
AND t.table_schema='db_name'
AND t.table_name='tbl_name';'

However, two tables within information_schema that are taking a long time to process 'SELECT *', 4+ minutes.

All the other tables in this schema respond promptly with 'SELECT *'.

'SELECT * FROM information_schema.TABLE_CONSTRAINTS' takes 3+ minutes
'SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS' = 232

'SELECT * FROM information_schema.KEY_COLUMN_USAGE' takes 3+ minutes
'SELECT COUNT(*) FROM information_schema.KEY_COLUMN_USAGE' = 291

Ideas?

Thanks
Jeff in Seattle

jefftanner said...

I am trying to find the PRIMARY_KEY for a table to optimize replication from one host to another.

To find the PRIMARY_KEY, I am using the query:

'SELECT k.column_name
FROM information_schema.table_constraints t
JOIN information_schema.key_column_usage k
USING(constraint_name,table_schema,table_name)
WHERE t.constraint_type='PRIMARY KEY'
AND t.table_schema='db_name'
AND t.table_name='tbl_name';'

However, two tables within information_schema that are taking a long time to process 'SELECT *', 4+ minutes.

All the other tables in this schema respond promptly with 'SELECT *'.

'SELECT * FROM information_schema.TABLE_CONSTRAINTS' takes 3+ minutes
'SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS' = 232

'SELECT * FROM information_schema.KEY_COLUMN_USAGE' takes 3+ minutes
'SELECT COUNT(*) FROM information_schema.KEY_COLUMN_USAGE' = 291

Ideas?

Thanks
Jeff in Seattle

rpbouman said...

Hi Jeff,

your query looks good, you can try to see if things speed up by putting everything in the JOIN condition:

INNER JOIN
ON tcs.table_schema = '[the schema]'
AND tcs.table_name = '[the table]'
AND tcs.table_schema = kcu.table_schema
AND tcs.table_name = kcu.table_name
AND tcs.constraint_type = 'PRIMARY KEY'

That said, information_schema performance is sadly quite poor in some cases: see http://bugs.mysql.com/bug.php?id=19588

There is some improvement in 5.1

Does your replication optimization require instant information on the primary key? In that case it's probably better build some kind of cache for the metadata you need, or obtain it in a different way.

(I know, that sucks. But sadly that's all I can think of. Please add your comment to the bug report - MySQL needs to know people are suffering from the current performance problems)

jefftanner said...

Figured it out:

information_schema.TABLE_CONSTRAINTS AS `tcs`

information_schema.KEY_COLUMN_USAGE AS `kcu`

SAP HANA Trick: DISTINCT STRING_AGG

Nowadays, many SQL implementations offer some form of aggregate string concatenation function. Being an aggregate function, it has the effe...