.comment-link {margin-left:.6em;}

Roland Bouman's blog

Sunday, May 11, 2008

Getting terrorists to talk...

Another quote from slashdot that cracked me up...This time it's gbjbaanb who install MS silverlight to keep his machine from whining and moaning, as found in the recent Silverlight thread:

I installed it in the end just to shut the damn thing up, and even then it refused to install. I almost cried with the frustration! The CIA could use this technique to get their terrorist suspects to talk.

I must admit that I have not installed Silverlight, but out of curiosity, I just might...

Saturday, May 03, 2008

Random slahsdot comment gem

By vux984 in a thread about the article Twitter Said To Be Abandoning Ruby on Rails:

The rails framework is aptly named. Its like driving a train. You follow the rails. Its easy, simple, and those are its strengths. But if one day, you decide you want to cut across a field save a few hours of travel, well, you probably shouldn't have chosen 'train' as your mode of transportation.

Not trying to bash anyone here. This is just a well formulated comment. Maybe the guy took it from somewhere, I dunno. It just stuck on me.

Feel free to post your flames below, maybe I'll pass em on to vux984....

Monday, April 21, 2008

MySQL UC2008 presentation "Grand Tour of the information schema" now online

Yup, the presentation slides as well as the scripts are now available online on the conference website.

The stuff you will find in there:


Information_schema.pdf
Diagram of the information schema

I_S_VC_Slides.pdf
Slides for the UC presentation, "Grand Tour of the Information Schema"

I_S_INDEXES2.sql
script, returns one row for each index (rollup of information_schema.STATISTICS)

I_S_REDUNDANT_INDEXES2.sql
script, lists all redundant indexes. Redundancy rules:

  • two indexes with the same columns, type and uniqueness are interchangeable. The one with the largest index name is listed as redundant

  • if there is a unique index and a non unique index with the same columns and type exists, the non-unique one is considered redundant

  • A non-unique btree index is the leftmost prefix of another btree index (that may or may not be unique)
TODO: indexes created on the referencing columns for InnoDB foreign keys are often redundant. It would be nice if the script could list that the FK currently supports an FK

p_check_fk_constraint_violations.sql
stored procedure Detect probles after using @@foreign_key_checks=1. Takes a schema name LIKE pattern and a table name LIKE pattern and reports any foreign keys constraints for which violations exist. TODO: if the referenced table in the foreign key was dropped, the script will encounter a runtime error. Would be nice to properly report instead

qwz.sql
Stored procedure, The Query wizard for the mysql command line pronounced "cues". To use it, run the script, and it starts a new session with the query wizard. Alternatively, do: USE qwz; CALL qwz(NULL); and follow the on-screen instructions.
schema_doc.sql
script, Generates HTML documentation for the current schema. Run using a command like: mysql -uroot -Dsakila -Nrs -e"source schema_doc.sql" > schema_doc.html

schema_doc.sh
shell, Linux example for running schema_doc.sql

gen_fks.sql
script, Generates trigger code to emulate foreign keys. Set a schema containing foreign key constraints as the default database. Run the script. The output forms trigger code that may be used to recreate the foreign keys functionality in a database that has the same tables but no declarative foreign keys. For foreign keys with the RESTRICT or NO ACTION rules, a non-existent stored procedure will be called: p_child_row_exists(). For orphan INSERTs and UPDATEs, a call is made to p_no_parent_row(). Currently this is the best we can do - it will result in a runtime error and the action causing it will not complete but it is not ideal of course.Corrective rules (CASCADE etc) should just work.

p_create_federated_table.sql
stored procedure, Creates a federated table (residing on a 5.0 or better remote host)

Labels: , , , , , ,

Saturday, March 29, 2008

MySQL information_schema: Identifying rows from TABLE_CONSTRAINTS

Yesterday, I set out a little quiz about the TABLE_CONSTRAINTS table in the MySQL information_schema. The task was:

  • Specify a minimal set of columns of the information_schema.TABLE_CONSTRAINTS table that is sufficient to reliably identify a single row in the information_schema.TABLE_CONSTRAINTS table.
  • Argue why these columns are necessary and sufficient to identify a row, and why a smaller set of columns does not exist

Short Answer


For MySQL there are two such column sets:

  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, and CONSTRAINT_TYPE
  • TABLE_NAME, TABLE_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE

Explanation


According to the ISO SQL specification (ISO/IEC 9075-11:2003 (E) 6.48 TABLE_CONSTRAINTS base table), we can derive that the "conceptual primary key" of a standard implementation of the TABLE_CONSTRAINTS system view should be (CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA, and CONSTRAINT_NAME). In the absence of support for catalogs, we can ignore the CONSTRAINT_CATALOG column, and this also applies to MySQL (all %_CATALOG columns in the MySQL information_schema are always NULL). That would leave us with (CONSTRAINT_SCHEMA and CONSTRAINT_NAME) - that is, table constraint names would be unique within a schema. However, in MySQL, constraint names need not be unique within a schema.

MySQL supports three types of table constraints:

  • PRIMARY KEY
  • UNIQUE
  • FOREIGN KEY (that is, they are supported dependent upon the storage engine)
.

Identifying FOREIGN KEY constraints


FOREIGN KEYs are in their own per schema namespace, so at a glance it may seem that the combination of (CONSTRAINT_SCHEMA and CONSTRAINT_NAME) is sufficient to identify a FOREIGN KEY constraint. However, because PRIMARY KEY and UNIQUE constraints are not within the same namespace as FOREIGN KEY constraints, a single schema may contain a FOREIGN KEY constraint and a non-FOREIGN KEY constraint that have the same CONSTRAINT_NAME.

From this it follows that we need to take the CONSTRAINT_TYPE into account too.

So in order to reliably identify a FOREIGN KEY constraint, we need to have at least CONSTRAINT_SCHEMA, CONSTRAINT_NAME *and* require that CONSTRAINT_TYPE equals 'FOREIGN KEY'.

(Among the answers was a suggestion that the CONSTRAINT_TYPE is not necessary, because the CONSTRAINT_NAME would provide the necessary differentiation. Although this is true for the identifiers generated by MySQL in the absence of explicit identifiers, it is not true in the general case)

Identifying PRIMARY KEY constraints


So, would (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE) be sufficient to identify any of the other types of table constraints?

Unfortunately not. Almost everybody that is somewhat familiar with MySQL knows that in MySQL a PRIMARY KEY constraint doesn't really have its own identifier. Rather, the CONSTRAINT_NAME for a PRIMARY KEY constraint is always 'PRIMARY'. In fact, the identifier 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints, and may not be used for any other type of constraint.

From this, it follows that there can be multiple rows in information_schema.TABLE_CONSTRAINTS that have the same combination of values in the columns (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, and CONSTRAINT_TYPE). In fact, for one particular value of CONSTRAINT_SCHEMA there will be just as many occurrences of the combination ('PRIMARY', 'PRIMARY KEY') in the columns (CONSTRAINT_NAME, CONSTRAINT_TYPE) as there are PRIMARY KEY constraints in that schema.

Of course, we know that there can be only one PRIMARY KEY per table. From that, it follows that we can identify a PRIMARY KEY constraint if we know to which table it belongs.

Tables (and views) reside in their own per-schema namespace, so the combination of (TABLE_SCHEMA and TABLE_NAME) is sufficient to identify a table. That means that the combination of (TABLE_SCHEMA, TABLE_NAME, and CONSTRAINT_TYPE) is sufficient to identify a PRIMARY KEY constraint. Once we identified a particular table using (TABLE_SCHEMA, TABLE_NAME) we know that the table constraint with the CONSTRAINT_TYPE equal to 'PRIMARY KEY' is the PRIMARY KEY constraint.

Of course, because the name 'PRIMARY' is reserved exclusively for PRIMARY KEY constraints we could've equally well settled for (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME) although personally I prefer to use CONSTRAINT_TYPE.

Identifying UNIQUE constraints


This leaves us with the UNIQUE constraints. UNIQUE constraints must have a unique name per table, and the name must not be equal to 'PRIMARY' (which is reserved for PRIMARY KEY constraints). From this we must conclude that we need no less than the combination of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE, and CONSTRAINT_NAME) to identify a UNIQUE constraint.

We must have (TABLE_SCHEMA, TABLE_NAME) to identify the table because the UNIQUE constraints have a unique name per table. Now we can take a shortcut: within one table we can distinguish between PRIMARY KEY and UNIQUE constraints by virtue of the fact that the CONSTRAINT_NAME for PRIMARY KEYs will always be 'PRIMARY'. However, if we need to distinguish between FOREIGN KEY and UNIQUE constraints, we still need to look at the CONSTRAINT_TYPE too.

Summary


To recapitulate:
  • we need CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CONSTRAINT_TYPE to identify a FOREIGN KEY constraint
  • we need TABLE_SCHEMA, TABLE_NAME, and CONSTRAINT_TYPE to identify a PRIMARY KEY constraint.
  • Alternatively we could identify PRIMARY KEY constraints by looking at TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME by virtue of the fact that all PRIMARY KEYs always have a CONSTRAINT_NAME equal to 'PRIMARY'
  • we need TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE and CONSTRAINT_NAME to identify a UNIQUE constraint

From this we could conclude that the minimal set of columns required to identify an arbitrary table constraint consists of:
  • CONSTRAINT_SCHEMA
  • CONSTRAINT_NAME
  • CONSTRAINT_TYPE
  • TABLE_SCHEMA
  • TABLE_NAME

However, in MySQL, a table constraint is 'owned' by the table on which it is defined and a table constraint defined on particular table cannot reside in a schema different from the schema its table. This means that for any row in information_schema.TABLE_CONSTRAINTS, the columns TABLE_SCHEMA and CONSTRAINT_SCHEMA will always have the same value.

This brings us to the final conclusion that there are two minimal set of columns that may be used to identify an arbitrary constraint in the information_schema.TABLE_CONSTRAINTS table:
  • CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
  • TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE

As CONSTRAINT_SCHEMA and TABLE_SCHEMA are completely interchangeable as far as the result is concerned, there is no specific reason to generically prefer either combination.

To learn more about this and other traps in the MySQL information schema, come meet me at the MySQL User's conference. I will be your guide on the Grand Tour of the Information Schema and its Applications, and explain more of these gotcha's.

Labels: , ,

Thursday, March 27, 2008

"Me Too" MySQL Information Schema popquiz

A few days ago, I wrote how I will be your guide to the Grand Tour of the Information Schema and its Applications which is one of the two talks I will be doing at the upcoming MySQL User's Conference.

In view of the popularity of "Pop Quiz" format so successfully used by Carsten, I feel compelled to imitation, and as a primer to my talk, I'd like to offer you my "Me Too" MySQL Information Schema popquiz. So, here goes...


The MySQL information_schema contains a number of tables. Among them, one is called TABLE_CONSTRAINTS. Unsurprisingly, each row in TABLE_CONSTRAINTS table represents a single table constraint. The TABLE_CONSTRAINTS table has the following columns:

+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| CONSTRAINT_CATALOG | varchar(512) | YES | | NULL | |
| CONSTRAINT_SCHEMA | varchar(64) | NO | | | |
| CONSTRAINT_NAME | varchar(64) | NO | | | |
| TABLE_SCHEMA | varchar(64) | NO | | | |
| TABLE_NAME | varchar(64) | NO | | | |
| CONSTRAINT_TYPE | varchar(64) | NO | | | |
+--------------------+--------------+------+-----+---------+-------+


  • Specify a minimal set of columns of the information_schema.TABLE_CONSTRAINTS table that is sufficient to reliably identify a single row in the information_schema.TABLE_CONSTRAINTS table.

  • Argue why these columns are necessary and sufficient to identify a row, and why a smaller set of columns does not exist



Feel free to post the answer as a comment. Every correct answer will earn a 20% discount to the admission fee. You can get bonus points for nice elegant argumentation, and also if you specify more than one minimal set of columns.

I will elect three to five of the best answers and make sure that whoever posts it gets a laminated printout of the diagram of the MySQL information schema.

(Please make sure you post me your email address if you think you are eligible to either the discount code, the laminated diagram or both)

Labels: , , ,

Tuesday, March 25, 2008

MySQL Information Schema applications at the UC2008

Last week I blogged about the upcoming MySQL Users conference, in particular about the Writing MySQL UDFs tutorial that I will be delivering.

I will also be doing the Grand Tour of the Information Schema and its Applications.


I will discuss the elements in the MySQL information schema, and provide tips to write queries against it. Most of the talk will revolve around a number of scripts I have developed over the past few years:


  • Generating a history/audit database - generate all the code you need to keep track of all changes occurring in your on-line database. Use it to audit or implement 'flashback'

  • Checking foreign key violations - disabling foreign key checks may be useful, but is dangerous. This script helps you find problems with foreign key constraints

  • Creating federated tables - FEDERATED tables are useful, but tedious and error-prone to create. Let this script do the work instead

  • Checking for duplicated and redundant indexes - Redundant or duplicate indexes can slow down your database performance. Find them with this script (Note: the original script I blogged about earlier contains a critical flaw - this is a completely new version)

I also have a few new information schema tricks up my sleeve. Without giving away too much, take a look at this little conversation I just had with my MySQL command line client:

mysql> call qwz.qwz(null);
+-----+------------------------------------------------------------------+
| | Welcome to the command line Query Wizard for MySQL!!! |
+-----+------------------------------------------------------------------+
| 1 | Set the schema (current: world) |
| --- | ---------------------------------------------------------------- |
| 2 | Choose a table |
+-----+------------------------------------------------------------------+
3 rows in set (0.00 sec)

Yes! It's an interactive query wizard for the command line. I don't want to spoil too much, but I can unveil that I will be presenting a full-fledged interactive query wizard that operates completely from the MySQL command line. It's just a stored procedure - no proxy, no UDFs, no plugins or whatever.

To give you a taste of the possibilities, take a look at the continuation of my session with the query wizard. First let's choose the schema:

mysql> call qwz.qwz(1);
+----+------------------------+
| | Set the default schema |
+---+------------------------+
| 1 | information_schema |
| 2 | mysql |
| 3 | qwz |
| 4 | sakila |
| 5 | world |
+---+------------------------+

Let's settle for the sakila schema:

mysql> call qwz.qwz(4);

This will prompt us to choose a table from the sakila schema:

+-----+------------------------------------------------------------------+
| | The command line Query Wizard for MySQL |
+-----+------------------------------------------------------------------+
| 1 | Set the schema (current: sakila) |
| --- | ---------------------------------------------------------------- |
| 2 | actor |
| 3 | actor_info |
| 4 | address |
| 5 | category |
| 6 | city |
| 7 | country |
| 8 | customer |
| 9 | customer_list |
| 10 | film |
| 11 | film_actor |
| 12 | film_category |
| 13 | film_list |
| 14 | film_text |
| 15 | inventory |
| 16 | language |
| 17 | nicer_but_slower_film_list |
| 18 | payment |
| 19 | rental |
| 20 | sales_by_film_category |
| 21 | sales_by_store |
| 22 | staff |
| 23 | staff_list |
| 24 | store |
+-----+------------------------------------------------------------------+
26 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Let's pick the rental table:

mysql> call qwz.qwz(19);

Once we selected the initial table, we can now build a join path, and we are prompted to pick any of the tables related to rental:

+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | + sakila.customer (fk_rental_customer) |
| 2 | + sakila.inventory (fk_rental_inventory) |
| 3 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 4 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 5 | New Query |
+-----+------------------------------------------------------------------+
8 rows in set (0.20 sec)

Query OK, 0 rows affected (0.20 sec)

Let's extend the join path with the customer table:

mysql> call qwz.qwz(1);
+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | - sakila.customer (fk_rental_customer) |
| 2 | + sakila.address (fk_customer_address) |
| 3 | + sakila.store (fk_customer_store) |
| 4 | + sakila.inventory (fk_rental_inventory) |
| 5 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 6 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 7 | New Query |
+-----+------------------------------------------------------------------+
10 rows in set (0.09 sec)

Query OK, 0 rows affected (0.09 sec)

Let's throw in the inventory table in the mix too:

mysql> call qwz.qwz(4);
+-----+------------------------------------------------------------------+
| | Join path: + tables are added / - tables are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental |
| 1 | - sakila.customer (fk_rental_customer) |
| 2 | + sakila.address (fk_customer_address) |
| 3 | + sakila.store (fk_customer_store) |
| 4 | - sakila.inventory (fk_rental_inventory) |
| 5 | + sakila.film (fk_inventory_film) |
| 6 | + sakila.store (fk_inventory_store) |
| 7 | + sakila.staff (fk_rental_staff) |
| --- | ---------------------------------------------------------------- |
| 8 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 9 | New Query |
+-----+------------------------------------------------------------------+
12 rows in set (0.10 sec)

Query OK, 0 rows affected (0.10 sec)

We can keep this up quite a long time, and we can remove tables from our join path in a similar way. We can then start specifying some columns:

mysql> call qwz.qwz(8);
+-----+------------------------------------------------------------------+
| | Choose columns for the SELECT list |
+-----+------------------------------------------------------------------+
| | sakila.rental.* |
| | --------------- |
| 1 | + rental_id |
| 2 | + rental_date |
| 3 | + inventory_id |
| 4 | + customer_id |
| 5 | + return_date |
| 6 | + staff_id |
| 7 | + last_update |
| | --------------- |
| | sakila.customer.* (fk_rental_customer) |
| | -------------------------------------- |
| 8 | + customer_id |
| 9 | + store_id |
| 10 | + first_name |
| 11 | + last_name |
| 12 | + email |
| 13 | + address_id |
| 14 | + active |
| 15 | + create_date |
| 16 | + last_update |
| | -------------------------------------- |
| | sakila.inventory.* (fk_rental_inventory) |
| | ---------------------------------------- |
| 17 | + inventory_id |
| 18 | + film_id |
| 19 | + store_id |
| 20 | + last_update |
| --- | ---------------------------------------------------------------- |
| 21 | New Query |
+-----+------------------------------------------------------------------+
30 rows in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)

Currently I'm still working on a multiple select interface, but for now I'll snip out all but the last individual column selection:

mysql> call qwz.qwz(18);
+-----+------------------------------------------------------------------+
| | SELECT list: + columns are added / - columns are removed |
+-----+------------------------------------------------------------------+
| | sakila.rental.* |
| | --------------- |
| 1 | + rental_id |
| 2 | - rental_date |
| 3 | + inventory_id |
| 4 | + customer_id |
| 5 | + return_date |
| 6 | + staff_id |
| 7 | + last_update |
| | --------------- |
| | sakila.customer.* (fk_rental_customer) |
| | -------------------------------------- |
| 8 | + customer_id |
| 9 | + store_id |
| 10 | - first_name |
| 11 | - last_name |
| 12 | + email |
| 13 | + address_id |
| 14 | + active |
| 15 | + create_date |
| 16 | + last_update |
| | -------------------------------------- |
| | sakila.inventory.* (fk_rental_inventory) |
| | ---------------------------------------- |
| 17 | + inventory_id |
| 18 | - film_id |
| 19 | + store_id |
| 20 | + last_update |
| --- | ---------------------------------------------------------------- |
| 21 | Show SQL |
| --- | ---------------------------------------------------------------- |
| 22 | Explain SQL |
| --- | ---------------------------------------------------------------- |
| 23 | Execute SQL |
| --- | ---------------------------------------------------------------- |
| 24 | New Query |
+-----+------------------------------------------------------------------+
36 rows in set (0.03 sec)

Query OK, 0 rows affected (0.03 sec)

Now is a good moment to inspect the SQL underlying our query:

mysql> call qwz.qwz(21);

+-----+----------------------------------------------------------------------------+
| | The command line Query Wizard for MySQL |
+-----+----------------------------------------------------------------------------+
| | SELECT t1.rental_date |
| | ,t2.first_name |
| | ,t2.last_name |
| | ,t8.film_id |
| | FROM sakila.rental AS t1 |
| | INNER JOIN sakila.customer AS t2 ON (t1.customer_id) = (t2.customer_id) |
| | INNER JOIN sakila.inventory AS t8 ON (t1.inventory_id) = (t8.inventory_id) |
| --- | ---------------------------------------------------------------- |
| 1 | Execute SQL |
| --- | ---------------------------------------------------------------- |
| 2 | Explain SQL |
| --- | ---------------------------------------------------------------- |
| 3 | Choose columns |
| --- | ---------------------------------------------------------------- |
| 4 | New Query |
+-----+----------------------------------------------------------------------------+
15 rows in set (1.21 sec)

Query OK, 0 rows affected (1.21 sec)

And we can execute it, or run EXPLAIN on it:

mysql> call qwz.qwz(2);
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
| 1 | SIMPLE | t8 | index | PRIMARY | PRIMARY | 3 | NULL | 4673 | |
| 1 | SIMPLE | t1 | ref | idx_fk_inventory_id,idx_fk_customer_id | idx_fk_inventory_id | 3 | sakila.t8.inventory_id | 1 | |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 2 | sakila.t1.customer_id | 1 | |
+----+-------------+-------+--------+----------------------------------------+---------------------+---------+------------------------+------+-------+
3 rows in set (0.00 sec)

As you can imagine, quite a good deal of information schema hacking going on here.

Code will be released in full immediately after my talk, and if there is sufficient interest I will discuss the internals of the qwz stored procedure in full.

Labels: , , , ,

Friday, March 21, 2008

Sun/MySQL

Today I spent the larger part of the afternoon at the Sun office in Amersfoort. And,

Yes, I signed ;-)
and joined the now largest open source company in the world.