Tuesday, October 25, 2005

Formatting MySQL code for the Web...in My:SPL

Among the more time consuming things maintaining this blog is formatting the code examples. I decided it was time for me to take that to the next level so I wrote a HTML formatter for mysql code. In case you're interested, it's available for download (zip 7kb). After downloading, just unzip it to your favourite folder, read the readme.txt, and everything should be fine after that.
I will spend the rest of this blog entry on some explanation and examples.

The Function format_mysql_as_html()


Here's the function that's responsible for actually creating html from mysql code. It is passed a string parameter via p_sql, and cuts that to bits: the tokens.
Dependant upon the type of the token maintained by v_state, it's wrapped into a <span> tag or left just as it is. The <span> is given a class attribute containing part of the name of the type of token, so we can mark it up through css later on.
There's one special type of token, the 'ACCEPT_OTHER' type. A token of this type is checked against the mysql.help_keyword table to see if it is a reserved word, in wich case it gets an extra distinguishing css class. (Alas, this table does not seem to contain all the words that are considered keywords; for example, it does not contain DECLARE or ELSEIF. But, for now, I'll settle for this.)
Finally, a token is produced of the type 'ACCEPT_END_OF_INPUT', wich terminates the function, returning a string of html.

CREATE FUNCTION `utils`.`format_mysql_as_html`(
p_sql longtext
)
RETURNS longtext
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
BLOCK_MAIN: begin
declare v_line int unsigned default 1;
declare v_position int unsigned default 1;
declare v_token varchar(65535) ;
declare v_html_token varchar(65535);
declare v_html longtext default '';

declare v_from_index int unsigned default 1;
declare v_to_index int unsigned default 1;
declare v_state enum(
'ACCEPT_BACKTICK'
, 'ACCEPT_APOS'
, 'ACCEPT_QUOT'
, 'ACCEPT_WHITESPACE'
, 'ACCEPT_SINGLE_LINE_COMMENT'
, 'ACCEPT_MULTI_LINE_COMMENT'
, 'ACCEPT_OTHER'
, 'ACCEPT_END_OF_INPUT'
, 'ACCEPT_ENDLINE'
, 'ACCEPT_DELIMITER'
, 'ACCEPT_AT'
, 'ACCEPT_ATAT'
);

set v_html := concat(v_html,'\n','<pre>');
LOOP_TOKENS: loop
call next_mysql_token(
p_sql
, v_to_index
, v_state
);
set v_token := substring(
p_sql
, v_from_index
, v_to_index - v_from_index
);
set v_html_token := html_entities(v_token);
if v_state = 'ACCEPT_END_OF_INPUT' then
leave LOOP_TOKENS;
elseif v_state in (
'ACCEPT_WHITESPACE'
, 'ACCEPT_ENDLINE'
, 'ACCEPT_DELIMITER'
) then
set v_html := concat(v_html,v_html_token);
else
set v_html := concat(v_html,'<span class="',substring(v_state,8));
if v_state = 'ACCEPT_OTHER' then
if exists (
select null
from mysql.help_keyword
where name = v_token
) then
set v_html := concat(v_html,' RESERVED');
end if;
end if;
set v_html := concat(v_html,'">',v_html_token,'</span>');
end if;
set v_from_index := v_to_index;
end loop LOOP_TOKENS;
set v_html := concat(v_html,'\n','</pre>');

return v_html;
end BLOCK_MAIN


Most of the work is of course done by the next_mysql_token() (Im just printing the signature of that one, not the body. Look it up in the dist. if you're curious):

CREATE PROCEDURE `utils`.`next_mysql_token`(
IN p_text longtext
, INOUT p_index int unsigned
, OUT p_current_state enum(
'ACCEPT_BACKTICK'
, 'ACCEPT_APOS'
, 'ACCEPT_QUOT'
, 'ACCEPT_WHITESPACE'
, 'ACCEPT_SINGLE_LINE_COMMENT'
, 'ACCEPT_MULTI_LINE_COMMENT'
, 'ACCEPT_OTHER'
, 'ACCEPT_END_OF_INPUT'
, 'ACCEPT_ENDLINE'
, 'ACCEPT_DELIMITER'
, 'ACCEPT_AT'
, 'ACCEPT_ATAT'
)
)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER

This function implements a deterministic state machine. It starts off reading p_text from position p_current_index. Then, it reads characters from p_text, one by one. For each character, the internal state is updated untill an accepting state is reached. Then, it returns.
When it returns, it will have updated the p_current_state as well as p_current_index, wich is advanced to the point right after the previous token. The token itself is easliy excised from the text by taking the substring between the positions indicated by p_current_index before and after the call.

Formatting entire routines


The information_schema database contains a ROUTINES table, providing us with lots of code to feed into format_mysql_as_html(). However, there's a small problem with that. It does not contain information about the parameterlist of the routine. Lucky for us, there's the mysql.proc table (Thanks Pro MySQL, didn't know that one before) that does contain all the information. So, I came up with the get_routine_ddl() function:

CREATE FUNCTION `utils`.`get_routine_ddl`(
p_schema varchar(64)
, p_name varchar(64)
)
RETURNS longtext
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
begin
declare v_routine_ddl longtext;
select concat(
'CREATE ',p.type
, ' `',p.db,'`.`',p.name,'`'
, '(',convert(p.param_list using utf8),')','\n'
, if( p.type='FUNCTION'
, concat('RETURNS ',p.returns,'\n')
, ''
)
, 'LANGUAGE ',p.language,'\n'
, if( p.is_deterministic='NO'
, 'NOT '
, ''
), 'DETERMINISTIC','\n'
, replace(p.sql_data_access,'_',' '),'\n'
, 'SQL SECURITY ',p.security_type,'\n'
, convert(p.body using utf8)
)
into v_routine_ddl
from mysql.proc p
where p.db = p_schema
and p.name = p_name
;
return v_routine_ddl
;
end


This function takes the name of a schema and the name of a function or stored procedure, and returns the complete DDL for it. Because it needs to read from the mysql.proc table, I decided it would be best to create it with SQL SECURITY DEFINER. Then it can be created by a user that has access to that table, but can be executed by anyone that is granted execution of the function.
To top it off, I decided it would be fun to be able to write the formatted ddl directly to file. This can be done with the SELECT...INTO OUTFILE syntax, but sadly that does not accept a variable to specify the filename.
No sweat, since MySQL 5.0.13, the prepared statement syntax is available inside stored procedures! This comes in handy in the next function:

CREATE PROCEDURE `utils`.`write_routine_html_file`(
p_schema varchar(64)
, p_name varchar(64)
, p_outfile varchar(255)
)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
begin
declare v_quote char(1) default '''';

set @statement_text := concat(
'select '
, 'utils.format_mysql_as_html('
, 'utils.get_routine_ddl(?,?)'
, ')'
, ' into outfile '
, v_quote
, p_outfile
, v_quote
, ' fields escaped by '
, v_quote,v_quote
, ' lines terminated by '
, v_quote,v_quote
);

set @p_schema := p_schema;
set @p_name := p_name;

prepare stmt
from @statement_text
;
execute stmt
using @p_schema
, @p_name
;
deallocate prepare stmt
;
end


This function just wraps the get_routine_ddl() function inside a prepared SELECT...INTO OUTFILE allowing for a variable filename.

Now, using a statement like:

call utils.write_routine_html_file(
'utils'
, 'format_mysql_as_html'
, 'D:/temp/format_mysql_as_html.txt'
);


generates the listing found on top of this blog entry...effortlessly.

1 comment:

Anonymous said...

This is very interesting work! Thanks for the idea.

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...