Roland Bouman's blog: Know Thy Schema...without typing "information

The MySQL information_schema (wich is available as of version 5.0.2) is a very powerful feature that can help you maintain or administrate your databases. You can use it to investigate the existence and status of database objects (such as tables, constraints and indexes) and their components (such as columns).

I like to use the MySQL command line tool for lots of common tasks. From the command line tool, I frequently feel the need to query the information_schema. In most cases, I just need to have a quick overview of a database's assets. In fact, I took up the habit in my job as an Oracle developer. In the Oracle command line tool, SQL*Plus, I'd do something like:


select   object_name
,        object_type
,        created
from     all_objects o
where    owner = 'SCOTT'
order by object_type
,        object_name

and this would give me a list of all the objects accessible to the current user that reside in the schema owned by the user SCOTT. For each object, the schema in wich the object resides would be listed, along the object's name and the date (actually the datetime, but the formatting leaves out the time component) the object was created:


OBJECT_NAME                    OBJECT_TYPE        CREATED
------------------------------ ------------------ --------
PK_DEPT                        INDEX              12-05-02
PK_EMP                         INDEX              12-05-02
PK_NODE                        INDEX              18-08-05
SYS_C003897                    INDEX              09-08-05
UK_NODE1                       INDEX              18-08-05
UK_NODE2                       INDEX              18-08-05
EMPREPORT                      PACKAGE            20-03-05
EMPREPORT                      PACKAGE BODY       20-03-05
BONUS                          TABLE              12-05-02
DEPT                           TABLE              12-05-02
EMP                            TABLE              12-05-02
IMAGES                         TABLE              20-03-05
NODE                           TABLE              18-08-05
PK                             TABLE              09-08-05
PRODUCT                        TABLE              18-08-05
SALGRADE                       TABLE              12-05-02

With the mysql information_schema database, we can achieve the same sort of list. However, there's no table there that contains all these different types of objects.

Another thing that can be a bit frustrating is that such a query can be quite lengthy. The identifiers used throughout the information_schema database are quit long and verbose.


select  avg(
            character_length(
                identifiers.identifier
            )
        )       avg_character_length
,       min(
            character_length(
                identifiers.identifier
            )
        )       min_character_length
,       max(
            character_length(
                identifiers.identifier
            )
        )       max_character_length
from    (
        select  table_name identifier
        from    information_schema.tables
        where   table_schema = 'information_schema'
        union all
        select  column_name
        from    information_schema.columns
        where   table_schema = 'information_schema'
        )       identifiers
;

+----------------------+----------------------+----------------------+
| avg_character_length | min_character_length | max_character_length |
+----------------------+----------------------+----------------------+
|              13.1279 |                    2 |                   37 |
+----------------------+----------------------+----------------------+

On the one hand, these elaborate identifiers make it easy to understand the internal structure of the information_schema and the queries that are built upon it, wich is good. On the other hand, it easily becomes a nuisance when you have to type them over and over again.

On top of that, I usually query the information_schema database in a situation where it is not the current database. Lazy me just wants to perform an ad hoc query to get some information on the current database, and not switch databases first with a use command.

All in all, in it's raw form, the information_schema is not that suitable for quickly inspecting a database. I decided to make life easier for myself, and some time ago, I created a database to store all kinds of general purpose utilities.


create database utils
;

Among my utilities are a couple of views built on the information_schema. In these views, I tried to find a balance between ease of use and comprehensibility. I'm printing them all here, hoping someone else will benefit.

The `objs` View

First of all, there's the objs view:


create or replace
view 
utils.objs (
    db
,   namespace
,   name
,   type
,   extra
,   created
,   comments
)
as
select   trigger_schema   
,        _utf8'TRIGGERS'
,        trigger_name
,        _utf8'TRIGGER'
,        concat(
             action_timing
         ,   _utf8' '
         ,   event_manipulation
         ,   _utf8' ON '
         ,   table_name
         ,   _utf8' FOR EACH '
         ,   action_orientation
         ) as extra 
,        created       
,        _utf8''
from     information_schema.triggers
union all
select   routine_schema
,        _utf8'ROUTINES'
,        routine_name
,        routine_type
,        concat(
             if(is_deterministic='NO','NOT','')
         ,   _utf8' DETERMINISTIC '
         ,   ' SQL SECURITY '
         ,   security_type
         ,   sql_data_access
         ) as extra
,        created
,        routine_comment
from     information_schema.routines
union all
select   table_schema     
,        _utf8'TABLES'
,        table_name       
,        table_type
,        concat(
             'ENGINE='
         ,   engine
         ,   ' ROW_FORMAT='
         ,   row_format
         ) as extra
,        create_time      
,        table_comment
from     information_schema.tables

The objs view yields one row for each database object. I decided to include the following objects: base tables, (system) views, procedures, functions and triggers. I decided to leave out indexes, because you can't identify an index by name in a given database: you'll also need the table name to be absolutely sure.

The columns are:

db: The database (schema) in wich the object resides.
namespace: The namespace. Currently, there are three namespaces: 'TABLES' (base tables, views, system views); 'ROUTINES' (functions, stored procedures) and 'TRIGGERS'
name: The name of the object, wich is unique within the namespace given a particular database
type: The actual object type. Values include: 'BASE TABLE', 'VIEW' and 'SYSTEM VIEW' for objects in the 'TABLES' namespace; 'FUNCTION' or 'BASE TABLE', 'VIEW' and 'SYSTEM VIEW'for the 'ROUTINES'namespace; and 'TRIGGER' for the 'TRIGGERS'namespace
extra: This column shows some extra information, dependant upon the object type.
created: The datetime corresponding to the that the object was created
comments: The comments describing the object

Now, to display a list of all the objects in the current database, just do:


select *
from   utils.objs
where  db = schema()
;

The `refs` View

The refs view lists information about foreign keys. Here's the code:


create 
or replace 
view 
utils.refs (
    db
,   cons
,   tab
,   ref_db
,   ref_cons
,   ref_type
,   ref_tab
,   nulls
,   idpart
)
as
select     c.constraint_schema                
,          c.constraint_name                  
,          c.table_name                       
,          p.constraint_schema                
,          p.constraint_name                  
,          p.constraint_type                  
,          p.table_name                       
,          max(ctcol.is_nullable)             
,          case count(cidcon.constraint_name)       
               when 0 then 'NO'
               else        'YES'
           end                                
from       information_schema.table_constraints c
inner join information_schema.key_column_usage  kc
on         c.table_schema                   = kc.table_schema
and        c.table_name                     = kc.table_name
and        c.constraint_name                = kc.constraint_name
inner join information_schema.columns           ctcol
on         kc.table_schema                  = ctcol.table_schema
and        kc.table_name                    = ctcol.table_name
and        kc.column_name                   = ctcol.column_name
left join  information_schema.key_column_usage  cidcol
on         kc.table_schema                  = cidcol.table_schema
and        kc.table_name                    = cidcol.table_name
and        kc.column_name                   = cidcol.column_name
left join  information_schema.table_constraints cidcon
on         cidcol.constraint_schema         = cidcon.constraint_schema
and        cidcol.constraint_name           = cidcon.constraint_name
and        cidcol.table_name                = cidcon.table_name
and        cidcon.constraint_type          != 'FOREIGN KEY'
inner join information_schema.key_column_usage  kp
on         kc.referenced_table_schema       = kp.table_schema
and        kc.referenced_table_name         = kp.table_name
and        kc.referenced_column_name        = kp.column_name
and        kc.position_in_unique_constraint = kp.ordinal_position
inner join information_schema.table_constraints p
on         kp.table_schema                  = p.table_schema
and        kp.table_name                    = p.table_name
and        kp.constraint_name               = p.constraint_name
where      p.constraint_type               != 'FOREIGN KEY'
and        c.constraint_type                = 'FOREIGN KEY'
group by   c.constraint_schema
,          c.constraint_name
,          c.table_name
,          p.constraint_schema
,          p.constraint_name
,          p.constraint_type
,          p.table_name
;

Apart from the foreign key name and table, the table that is referred to is listed too, along with the primary key or unique constraint of wich the columns are referenced by the foreign key columns. As a bonus, some extra information is shown about the type of relationship implemented by the foreign key.

db: The database in wich the foreign key table (and thus, the foreign key constraint) resides.
cons: The name of the foreign key constraint
tab: The name of the table with the foreign key constraint a.k.a. the child or detail table
ref_db: The database that contains the referenced table
ref_cons: The unique or primary key constraint of the referenced table that contains the columns that are referred by the foreign key columns
ref_type: The type of the referenced constraint: either 'PRIMARY KEY' or 'UNIQUE'
ref_tab: The referenced (a.k.a master or lookup) table
nulls: Whether the foreign key columns accept NULL values: 'YES' (foreign key columns accept nulls, the relationship is not mandatory) or 'NO' (foreign key columns cannot have nulls, thus the relationship is mandatory)
idpart: Wheter the foreign key columns are also part of a 'PRIMARY KEY' or 'UNIQUE' constraint in the referencing table: 'YES' (foreign key columns are partially identifying) or 'NO' (foreign key columns are not part of an identifier)

The refs view does not take into account that MySQL/innodb supports foreign keys that do not refer to either a 'PRIMARY KEY' or a 'UNIQUE' constraint.

(Q: Wow, can MySQL do that? A: Yes, it can! Q: Why would you want to do that anyway? A: uhmm, don't really know, but you can post on this subject if you have a suggestion).

The `idxs` View

The idxs view lists information about indexes. Here's the code:


create
or replace 
view 
utils.idxs (
    db
,   tab
,   name
,   type
,   id   
,   nulls
,   cols
,   maxs
,   mins
)
as
select     t.table_schema
,          t.table_name
,          s.index_name
,          s.index_type
,          if(s.non_unique=0,_utf8'YES',_utf8'NO')
,          if(s.nullable='YES',_utf8'YES',_utf8'NO')
,          count(seq_in_index)  
,          round(max(s.cardinality)/t.table_rows,2)
,          round(min(s.cardinality)/t.table_rows,2) 
from       information_schema.tables       t
inner join information_schema.statistics   s
on         t.table_schema = s.table_schema
and        t.table_name   = s.table_name 
group by   t.table_schema
,          t.table_name
,          s.index_name
,          s.index_type
,          s.non_unique
,          t.table_rows
;

This view yields one row for each index.

db: The database in wich the index table (and thus, the index) resides.
name: The name of the index
tab: The name of the table that defines the index
id: Whether the index accepts duplicate values or value combinations: 'YES' when this is an unique index, else 'NO' if the index accepts duplicates
nulls: Whether the index columns accept NULL values: 'YES' (index columns accept nulls) or 'NO' (index columns are mandatory)
cols: The number of columns in de index
maxs: The maximum selectivity for all columns in the index
mins: The minimum selectivity for all columns in the index

So, good luck with this everyone. I hope this will be as useful to you as it was to me.

6 comments:

Andrew Gilfrin said...: Yet again a brilliant blog. The great thing about your blogs Roland is the way in which you find a way to actually use the new features of 5.0 in a practical environment.; Tuesday, October 18, 2005 at 11:00:00 AM GMT+2
rpbouman said...: Andrew, thank you for the encouragement!

Of course, these views are still just a tools you can use while developing some real application, but they've been very useful to me.; Wednesday, October 19, 2005 at 2:11:00 PM GMT+2
taw said...: Roland,

Thanks very much for this - it's saved me much time in creating something similar for myself!

I'll be adding a slightly more verbose index view, which shows each column in each index, but that's relatively straightforward given the leg-up your work represents.; Wednesday, April 9, 2008 at 2:51:00 AM GMT+2
Anonymous said...: Hi Roland,

Three years later and this post is very relevant so something I was checking about. Great work!

Shlomi Noach; Sunday, December 21, 2008 at 2:25:00 PM GMT+1
Shlomi Noach said...: Hi Roland,

Quick note: perhaps it would be better to use aliases in the views definitions (some columns are named like "case ... when ..." etc.), and also avoid duplicate column names (as in the refs view). I realize this post is almost 4 years old, but perhaps it's still a good idea to update the text.

Regards; Wednesday, May 6, 2009 at 9:11:00 AM GMT+2
rpbouman said...: Hi Shlomi,

thanks for the comment. However, don't agree.

The views do not have duplicate column names - you wouldn't be able to create the view if you had.

Just look at the CREATE VIEW statements - I used column name lists to explicitly name each column in the target view, regardless of the name that's derived from the column expression.

I used this technique primarily because of the utils.objs view, because with this technique, you can freely change the order of the SELECT expressions that make up the UNION ALL without having to add column aliases to all SELECT expressions.

For consistency I stuck to that syntax for the other views mentioned in this article, and given that it solves the problem, I don't think that adding aliases also to the individual items in the SELECT lists is worth the effort.; Wednesday, May 6, 2009 at 9:34:00 AM GMT+2

Roland Bouman's blog

Monday, October 17, 2005

Know Thy Schema...without typing "information_schema.%" so much

The `objs` View

The `refs` View

The `idxs` View

6 comments:

SAP HANA Trick: DISTINCT STRING_AGG

Monday, October 17, 2005

Know Thy Schema...without typing "information_schema.%" so much

The objs View

The refs View

The idxs View

6 comments:

SAP HANA Trick: DISTINCT STRING_AGG

The `objs` View

The `refs` View

The `idxs` View