Monday, October 17, 2005

Know Thy Schema...without typing "information_schema.%" so much

The MySQL information_schema (wich is available as of version 5.0.2) is a very powerful feature that can help you maintain or administrate your databases. You can use it to investigate the existence and status of database objects (such as tables, constraints and indexes) and their components (such as columns).

I like to use the MySQL command line tool for lots of common tasks. From the command line tool, I frequently feel the need to query the information_schema. In most cases, I just need to have a quick overview of a database's assets. In fact, I took up the habit in my job as an Oracle developer. In the Oracle command line tool, SQL*Plus, I'd do something like:


select object_name
, object_type
, created
from all_objects o
where owner = 'SCOTT'
order by object_type
, object_name


and this would give me a list of all the objects accessible to the current user that reside in the schema owned by the user SCOTT. For each object, the schema in wich the object resides would be listed, along the object's name and the date (actually the datetime, but the formatting leaves out the time component) the object was created:


OBJECT_NAME OBJECT_TYPE CREATED
------------------------------ ------------------ --------
PK_DEPT INDEX 12-05-02
PK_EMP INDEX 12-05-02
PK_NODE INDEX 18-08-05
SYS_C003897 INDEX 09-08-05
UK_NODE1 INDEX 18-08-05
UK_NODE2 INDEX 18-08-05
EMPREPORT PACKAGE 20-03-05
EMPREPORT PACKAGE BODY 20-03-05
BONUS TABLE 12-05-02
DEPT TABLE 12-05-02
EMP TABLE 12-05-02
IMAGES TABLE 20-03-05
NODE TABLE 18-08-05
PK TABLE 09-08-05
PRODUCT TABLE 18-08-05
SALGRADE TABLE 12-05-02


With the mysql information_schema database, we can achieve the same sort of list. However, there's no table there that contains all these different types of objects.

Another thing that can be a bit frustrating is that such a query can be quite lengthy. The identifiers used throughout the information_schema database are quit long and verbose.

select avg(
character_length(
identifiers.identifier
)
) avg_character_length
, min(
character_length(
identifiers.identifier
)
) min_character_length
, max(
character_length(
identifiers.identifier
)
) max_character_length
from (
select table_name identifier
from information_schema.tables
where table_schema = 'information_schema'
union all
select column_name
from information_schema.columns
where table_schema = 'information_schema'
) identifiers
;

+----------------------+----------------------+----------------------+
| avg_character_length | min_character_length | max_character_length |
+----------------------+----------------------+----------------------+
| 13.1279 | 2 | 37 |
+----------------------+----------------------+----------------------+


On the one hand, these elaborate identifiers make it easy to understand the internal structure of the information_schema and the queries that are built upon it, wich is good. On the other hand, it easily becomes a nuisance when you have to type them over and over again.

On top of that, I usually query the information_schema database in a situation where it is not the current database. Lazy me just wants to perform an ad hoc query to get some information on the current database, and not switch databases first with a use command.

All in all, in it's raw form, the information_schema is not that suitable for quickly inspecting a database. I decided to make life easier for myself, and some time ago, I created a database to store all kinds of general purpose utilities.


create database utils
;


Among my utilities are a couple of views built on the information_schema. In these views, I tried to find a balance between ease of use and comprehensibility. I'm printing them all here, hoping someone else will benefit.

The objs View


First of all, there's the objs view:


create or replace
view
utils.objs (
db
, namespace
, name
, type
, extra
, created
, comments
)
as
select trigger_schema
, _utf8'TRIGGERS'
, trigger_name
, _utf8'TRIGGER'
, concat(
action_timing
, _utf8' '
, event_manipulation
, _utf8' ON '
, table_name
, _utf8' FOR EACH '
, action_orientation
) as extra
, created
, _utf8''
from information_schema.triggers
union all
select routine_schema
, _utf8'ROUTINES'
, routine_name
, routine_type
, concat(
if(is_deterministic='NO','NOT','')
, _utf8' DETERMINISTIC '
, ' SQL SECURITY '
, security_type
, sql_data_access
) as extra
, created
, routine_comment
from information_schema.routines
union all
select table_schema
, _utf8'TABLES'
, table_name
, table_type
, concat(
'ENGINE='
, engine
, ' ROW_FORMAT='
, row_format
) as extra
, create_time
, table_comment
from information_schema.tables


The objs view yields one row for each database object. I decided to include the following objects: base tables, (system) views, procedures, functions and triggers. I decided to leave out indexes, because you can't identify an index by name in a given database: you'll also need the table name to be absolutely sure.

The columns are:

db

The database (schema) in wich the object resides.

namespace

The namespace. Currently, there are three namespaces: 'TABLES' (base tables, views, system views); 'ROUTINES' (functions, stored procedures) and 'TRIGGERS'

name

The name of the object, wich is unique within the namespace given a particular database

type

The actual object type. Values include: 'BASE TABLE', 'VIEW' and 'SYSTEM VIEW' for objects in the 'TABLES' namespace; 'FUNCTION' or 'BASE TABLE', 'VIEW' and 'SYSTEM VIEW'for the 'ROUTINES'namespace; and 'TRIGGER' for the 'TRIGGERS'namespace

extra

This column shows some extra information, dependant upon the object type.

created

The datetime corresponding to the that the object was created

comments

The comments describing the object




Now, to display a list of all the objects in the current database, just do:


select *
from utils.objs
where db = schema()
;


The refs View


The refs view lists information about foreign keys. Here's the code:


create
or replace
view
utils.refs (
db
, cons
, tab
, ref_db
, ref_cons
, ref_type
, ref_tab
, nulls
, idpart
)
as
select c.constraint_schema
, c.constraint_name
, c.table_name
, p.constraint_schema
, p.constraint_name
, p.constraint_type
, p.table_name
, max(ctcol.is_nullable)
, case count(cidcon.constraint_name)
when 0 then 'NO'
else 'YES'
end
from information_schema.table_constraints c
inner join information_schema.key_column_usage kc
on c.table_schema = kc.table_schema
and c.table_name = kc.table_name
and c.constraint_name = kc.constraint_name
inner join information_schema.columns ctcol
on kc.table_schema = ctcol.table_schema
and kc.table_name = ctcol.table_name
and kc.column_name = ctcol.column_name
left join information_schema.key_column_usage cidcol
on kc.table_schema = cidcol.table_schema
and kc.table_name = cidcol.table_name
and kc.column_name = cidcol.column_name
left join information_schema.table_constraints cidcon
on cidcol.constraint_schema = cidcon.constraint_schema
and cidcol.constraint_name = cidcon.constraint_name
and cidcol.table_name = cidcon.table_name
and cidcon.constraint_type != 'FOREIGN KEY'
inner join information_schema.key_column_usage kp
on kc.referenced_table_schema = kp.table_schema
and kc.referenced_table_name = kp.table_name
and kc.referenced_column_name = kp.column_name
and kc.position_in_unique_constraint = kp.ordinal_position
inner join information_schema.table_constraints p
on kp.table_schema = p.table_schema
and kp.table_name = p.table_name
and kp.constraint_name = p.constraint_name
where p.constraint_type != 'FOREIGN KEY'
and c.constraint_type = 'FOREIGN KEY'
group by c.constraint_schema
, c.constraint_name
, c.table_name
, p.constraint_schema
, p.constraint_name
, p.constraint_type
, p.table_name
;


Apart from the foreign key name and table, the table that is referred to is listed too, along with the primary key or unique constraint of wich the columns are referenced by the foreign key columns. As a bonus, some extra information is shown about the type of relationship implemented by the foreign key.


db

The database in wich the foreign key table (and thus, the foreign key constraint) resides.

cons

The name of the foreign key constraint

tab

The name of the table with the foreign key constraint a.k.a. the child or detail table

ref_db

The database that contains the referenced table

ref_cons

The unique or primary key constraint of the referenced table that contains the columns that are referred by the foreign key columns

ref_type

The type of the referenced constraint: either 'PRIMARY KEY' or 'UNIQUE'

ref_tab

The referenced (a.k.a master or lookup) table

nulls

Whether the foreign key columns accept NULL values: 'YES' (foreign key columns accept nulls, the relationship is not mandatory) or 'NO' (foreign key columns cannot have nulls, thus the relationship is mandatory)

idpart

Wheter the foreign key columns are also part of a 'PRIMARY KEY' or 'UNIQUE' constraint in the referencing table: 'YES' (foreign key columns are partially identifying) or 'NO' (foreign key columns are not part of an identifier)



The refs view does not take into account that MySQL/innodb supports foreign keys that do not refer to either a 'PRIMARY KEY' or a 'UNIQUE' constraint.

(Q: Wow, can MySQL do that? A: Yes, it can! Q: Why would you want to do that anyway? A: uhmm, don't really know, but you can post on this subject if you have a suggestion).

The idxs View


The idxs view lists information about indexes. Here's the code:


create
or replace
view
utils.idxs (
db
, tab
, name
, type
, id
, nulls
, cols
, maxs
, mins
)
as
select t.table_schema
, t.table_name
, s.index_name
, s.index_type
, if(s.non_unique=0,_utf8'YES',_utf8'NO')
, if(s.nullable='YES',_utf8'YES',_utf8'NO')
, count(seq_in_index)
, round(max(s.cardinality)/t.table_rows,2)
, round(min(s.cardinality)/t.table_rows,2)
from information_schema.tables t
inner join information_schema.statistics s
on t.table_schema = s.table_schema
and t.table_name = s.table_name
group by t.table_schema
, t.table_name
, s.index_name
, s.index_type
, s.non_unique
, t.table_rows
;


This view yields one row for each index.


db

The database in wich the index table (and thus, the index) resides.

name

The name of the index

tab

The name of the table that defines the index

id

Whether the index accepts duplicate values or value combinations: 'YES' when this is an unique index, else 'NO' if the index accepts duplicates

nulls

Whether the index columns accept NULL values: 'YES' (index columns accept nulls) or 'NO' (index columns are mandatory)

cols

The number of columns in de index

maxs

The maximum selectivity for all columns in the index

mins

The minimum selectivity for all columns in the index



So, good luck with this everyone. I hope this will be as useful to you as it was to me.

6 comments:

Andrew Gilfrin said...

Yet again a brilliant blog. The great thing about your blogs Roland is the way in which you find a way to actually use the new features of 5.0 in a practical environment.

rpbouman said...

Andrew, thank you for the encouragement!

Of course, these views are still just a tools you can use while developing some real application, but they've been very useful to me.

taw said...

Roland,

Thanks very much for this - it's saved me much time in creating something similar for myself!

I'll be adding a slightly more verbose index view, which shows each column in each index, but that's relatively straightforward given the leg-up your work represents.

Anonymous said...

Hi Roland,

Three years later and this post is very relevant so something I was checking about. Great work!

Shlomi Noach

Shlomi Noach said...

Hi Roland,

Quick note: perhaps it would be better to use aliases in the views definitions (some columns are named like "case ... when ..." etc.), and also avoid duplicate column names (as in the refs view). I realize this post is almost 4 years old, but perhaps it's still a good idea to update the text.

Regards

rpbouman said...

Hi Shlomi,

thanks for the comment. However, don't agree.

The views do not have duplicate column names - you wouldn't be able to create the view if you had.

Just look at the CREATE VIEW statements - I used column name lists to explicitly name each column in the target view, regardless of the name that's derived from the column expression.

I used this technique primarily because of the utils.objs view, because with this technique, you can freely change the order of the SELECT expressions that make up the UNION ALL without having to add column aliases to all SELECT expressions.

For consistency I stuck to that syntax for the other views mentioned in this article, and given that it solves the problem, I don't think that adding aliases also to the individual items in the SELECT lists is worth the effort.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...