In relation to MySQL Connector/Python, I just read a post from Geert about how he uses the MySQL
information_schema
to generate some Python code. In this particular case, he needs the data from the COLLATIONS
table to maintain a data structure that describes all collations supported by MySQL. For some reasons that I cannot fathom, Geert needed to generate a structure for each possible collation, not just the ones for which the
COLLATIONS
table contains a row. To do this, he wrote a stored procedure that uses a cursor to loop through the COLLATIONS
table. In the loop, he detects it whenever there's a gap in the sequence of values from the ID
column, and then starts a new loop to "fill the gaps". For each iteration of the outer cursor loop, a piece of text is emitted that conforms to the syntax of a Python tuple describing the collation, and each iteration of the inner loop generates the text None
, a Python built-in constant.The final result of the procedure is a snippet of Python code shown below (abbreviated):
..
("cp1251","cp1251_bulgarian_ci"), # 14
("latin1","latin1_danish_ci"), # 15
("hebrew","hebrew_general_ci"), # 16
None,
("tis620","tis620_thai_ci"), # 18
("euckr","euckr_korean_ci"), # 19
..
In the final code, these lines are themselves used to form yet another tuple:
desc = (
None,
("big5","big5_chinese_ci"), # 1
("latin2","latin2_czech_cs"), # 2
("dec8","dec8_swedish_ci"), # 3
("cp850","cp850_general_ci"), # 4
..
This is excellent use of the information schema! However, I am not too thrilled about using a stored routine for this. Enter my fosdem talk about refactoring stored routines.
In this case, performance is not really an issue, so I won't play that card. But many people that do need well-performing stored procedures might start out like Geert and write a cursor loop, and perhaps do some looping inside that loop. One of the big take-aways in my presentation is to become aware of the ways that you can avoid a stored procedure. Geerts procedure is an excellent candidate to illustrate the point. As a bonus, I'm adding the code that is necessary to generate the entire snippet, not just the collection of tuples inside the outer pair of parenthesis.
So, here goes:
set group_concat_max_len := @@max_allowed_packet;
select concat('desc = (',
group_concat('\n '
, if( collations.id is null, 'None',
concat('(', '"', character_set_name, '"',
',', '"', collation_name, '"', ')')
)
, if(ids.id=255, '', ','), ' #', ids.id
order by ids.id
separator ''
), '\n)'
)
from (select (t0.id<<0) + (t1.id<<1) + (t2.id<<2)
+ (t3.id<<3) + (t4.id<<4) + (t5.id<<5)
+ (t6.id<<6) + (t7.id<<7) id
from (select 0 id union all select 1) t0
, (select 0 id union all select 1) t1
, (select 0 id union all select 1) t2
, (select 0 id union all select 1) t3
, (select 0 id union all select 1) t4
, (select 0 id union all select 1) t5
, (select 0 id union all select 1) t6
, (select 0 id union all select 1) t7) ids
left join information_schema.collations on ids.id = collations.id;
This query works first by generating 256 rows having id's ranging from 0 to 255. (I think I recall Alexander Barkov mentioning that this is currently the maximum number of collations that MySQL supports - perhaps I am wronge there). This is done by cross-joining a simple derived table that generates two rows:
(select 0 id union all select 1)
So, one row that yields
0
, and one that yields 1
. By cross-joining 8 of these derived tables, we get 2 to the 8th power rows, which equals 256. In the SELECT
-list, I use the left bitshift operator <<
to shift the original 0
and 1
0, 1, 2 and so on up to 7 positions. By then adding those values together, we fill up exactly one byte, and gain all possible values from 0 through 255:
(select (t0.id<<0) + (t1.id<<1) + (t2.id<<2)
+ (t3.id<<3) + (t4.id<<4) + (t5.id<<5)
+ (t6.id<<6) + (t7.id<<7) id
from (select 0 id union all select 1) t0
, ... t1
, ...
, (select 0 id union all select 1) t7) ids
Once we have this, the rest is straightforward - all we have to do now is use a
LEFT JOIN
to find any collations from the information_schema.COLLATIONS
table in case the value of its ID
column matches the value we computed with the bit-shifting jiggery-pokery. For the matching rows, we use CONCAT
to generate a Python tuple describing the collation, and for the non-matching rows, we generate None
:
if( collations.id is null, 'None',
concat('(', '"', character_set_name, '"',
',', '"', collation_name, '"', ')')
)
The final touch is a
GROUP_CONCAT
that we use to bunch these up into a comma separated list that is used as entries for the outer tuple. As always, you should set the value of the group_concat_max_len
server variable to a sufficiently high value to hold the contents of the generated string, and if you want to be on the safe side and not run the risk of getting a truncated result, you should use max_allowed_packet
.I have the honour of speaking at the MySQL user conference, april 12-15 later this year. There, I will be doing a related talk called Optimizing MySQL Stored Routines. In this talk, I will explain how stored routines impact performance, and provide some tips on how you can avoid them, but also on how to improve your stored procedure code in case you really do need them.