Showing posts with label Stored Procedure. Show all posts
Showing posts with label Stored Procedure. Show all posts

Wednesday, March 19, 2008

UDFs at the MySQL User's conference

The MySQL User's conference will be held in less than a month from now!!!

This year there is quite a good number of sessions on adding your own functions and procedures, such as:

I will be doing the 3 hour tutorial on writing user-defined functions, and I am currently adding the last few final touches to my slides.

My tutorial will be very much a hands-on experience. The ambition is to allow people with some programming skills in either C/C++, PHP or Java to leave the room with a bunch of UDFs they created themselves during one of labs. With that and the supporting materials (slides and a handout with detailed instructions) the attendees will be able to write UDFs themselves, and have the knowledge that allows them to make a sensible decision when they have to choose between stored SQL functions, UDFs or raw expressions built of built-in functions.

Allow me to tell you a bit about UDFs - you might decide you want to take my tutorial ;)

User-defined functions


User-defined functions or UDFs are often confused with Stored SQL functions but unrightly so. Other than the fact that stored functions and user-defined functions can be created by the user (as opposed to hard-wired, built-in functions), they have little in common. If you want to know exactly what the difference is and what the strengths and weaknesses are of either feature, my tutorial is for you.

Features


For now I don't want unveil too much but I think that it might be good to point out a number of key advantages of using UDF's over stored SQL functions:

  • Flexible argumentlists - UDFs allow a flexible number of dynamically typed arguments, and it is possible to identify arguments by name (rather than position).

  • The ability to leverage the functionality from external libraries - UDFs can be linked to existing libraries in order to expose their functionality to SQL statements

  • Aggregate functions - UDFs can compute a value for a collection of rows just like the built-in functions COUNT and GROUP_CONCAT


Neither of these features is available to stored SQL functions, and a generic workaround is simply impossible or far from trivial. So - UDFs might be for you if you need one of these things.

Performance


Another reason to use UDFs might be performance. UDFs are much, much better than stored SQL functions when it comes to computation. To prove this fact, I'd like to share a number of benchmarks I did.

For the benchmarks I use a stored procedure like this:

create procedure sp_<SOME-NAME>_benchmark(p_num int unsigned)
begin
declare v_num int unsigned default 0;
declare v_return <SOME-DATA-TYPE>;
declare v_begin int unsigned default unix_timestamp();
while v_num < p_num do
set v_return := <SOME-EXPRESSION>;
set v_num := v_num + 1;
end while;
call sp_store_function_benchmark(
'<SOME-NAME>'
, p_num
, unix_timestamp() - v_begin
);
end;

As you can see, the procedure repeats the evaluation of <SOME-EXPRESSION>, depending on the specified number of iterations passed to the parameter p_num. Depending on the benchmark, an expression is plugged in place of <SOME-EXPRESSION>. The variable v_return is used to capture the evaluation result, and a suitable data type is used in place of <SOME-DATA-TYPE>. Before running the loop, the time is recorded. After the loop, the elapsed time, the number of repetitions and the benchmark name are stored in a table for later analysis. This allows me to test a series of increasing repetitions of the same expression.

I also use a sp_noop_benchmark() procedure that is identical to the one described above, except that it omits the assignment of the expresion. So, it lacks this line:

set v_return := <SOME-EXPRESSION>;

This allows us to isolate the time spent on evaluating the expression and doing the assignment. I also use another control measurement that uses an assignment like this:

set v_return := <SOME-LITERAL-VALUE>;

This allows us to estimate the time spent on the value assignment, which can be used to get a bit closer to the actual time spent only on expression evaluation alone. (Of course, this assumes that the time spent on the assignment alone is comparable for expressions, function calls and literals).

All these measurements were made by comparing single, complete calls many times. All tests were done on Ubuntu Gutsy Gibbon using MySQL 5.1.23. Compilation of UDFs was performed using gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2). No optimizations were employed.

Stored functions and UDFs


To compare UDFs and Stored functions, I first used simple "Hello World" expressions that all evaluate to a VARCHAR(14). This was measured:

  • String literal sp_string_benchmark. The expression was:
    v_return := 'Hello, String!';

  • Stored SQL function sp_ssf_benchmark. The expresion was:
    v_return := ssf_hello_world();
    The ssf_hello_world() itself was:

    create function ssf_hello_world()
    returns varchar(14)
    return 'Hello, SSFs!!!';

  • User defined function sp_udf_benchmark. The expression was:
    v_return := udf_hello_world();
    The UDF itself was:

    my_bool udf_hello_world_init(
    UDF_INIT *initid, UDF_ARGS *args,
    char *message
    ){
    return 0;
    }
    char *udf_hello_world(
    UDF_INIT *initid, UDF_ARGS *args
    , char* result, unsigned long* length
    , char *is_null, char *error
    ){
    *length = 14;
    return "Hello, UDFs!!!";
    }

Here are the raw results:

+----------+------+--------+-----+-----+
| repeated | noop | string | UDF | SSF |
+----------+------+--------+-----+-----+
| 0 | 0 | 0 | 0 | 0 |
| 500000 | 3 | 4 | 6 | 8 |
| 1000000 | 7 | 9 | 10 | 16 |
| 1500000 | 10 | 13 | 16 | 25 |
| 2000000 | 13 | 17 | 21 | 33 |
| 2500000 | 16 | 22 | 25 | 42 |
| 3000000 | 19 | 27 | 31 | 50 |
| 3500000 | 23 | 31 | 36 | 58 |
| 4000000 | 26 | 36 | 41 | 67 |
| 4500000 | 30 | 40 | 45 | 75 |
| 5000000 | 33 | 44 | 51 | 83 |
| 5500000 | 36 | 49 | 55 | 90 |
| 6000000 | 39 | 53 | 61 | 99 |
| 6500000 | 42 | 57 | 66 | 107 |
| 7000000 | 45 | 62 | 71 | 115 |
| 7500000 | 49 | 66 | 76 | 123 |
| 8000000 | 52 | 71 | 81 | 131 |
| 8500000 | 56 | 75 | 86 | 140 |
| 9000000 | 58 | 80 | 90 | 147 |
| 9500000 | 62 | 83 | 96 | 156 |
| 10000000 | 64 | 88 | 101 | 164 |
+----------+------+--------+-----+-----+

And here is a graph:
udf-ssf-benchmark
At a glance we see that doing nothing is the fastest, followed by the assignment of a literal string. After that, comes the UDF and the stored SQL function is by far the slowest. Another thing that is immediately apparent is that the execution time increases proportionally as the number of repetitions is increased. This means that we can essentially forget the entire series and focus on the measurements with the highest number of repetitions (which is presumably more accurate than the earlier measurements).

But exactly how much faster is the UDF as compared to the SQL function? Well, it depends on how you look at it. If we just divide the raw execution time of the stored function by that of the UDF, we see that the ratio is:

164/101 = 1.60

This would tempt us into thinking that UDF's are about 60% faster.

However, when we calculate it like this, we are also counting the overhead of the benchmark procedures themselves. We get a better result if we focus on only the expression assignments. We can express the performance in terms of the time required to execute the no-operation benchmark. So:

noop / udf = 64 / 101 = .63
and

noop / ssf = 64 / 164 = .39

This correction indicates that the UDF performs at 63% of the no-operation benchmark, whereas the stored SQL function performs at 39% of the no-operation benchmark. We could say that relative to the no-operation benchmark, UDFs are 24% faster as compared to SQL functions

Now we could stretch it a bit more and try to isolate only the time spent on evaluating the functions. To do that, we'd have to assume that "assignment" costs the same, no matter if we are assigning from a stored SQL function, user-defined function or string literal. If we do that, we get this figure:

literal / udf = 88 / 101 = .87
and

literal / ssf = 88 / 164 = .54

This correction indicates that the UDF performs at 87% of the string-literal benchmark, whereas the stored SQL function performs at 54% of the string-literal benchmark. This would indicate that relative to the string literal assignment, UDFs are 33% faster than SQL functions.

Now, I can imagine that a lot of people will have some reservations against this method of correcting the measurements. Personally I think the method is valid, although the result itself is not that useful. I mean, in real life, the fact is that we will be assigning the value of the function, and from that perspective it doesn't help us much to know how fast it would have been if we didn't perform an assignment. Another reservation that we may have is that this benchmark still doesn't tell us much more than the relative differences in overhead. Basically, both the UDF and the stored function are empty - so this benchmark can tell us nothing about performance in a more realistic case where the function is actually doing some work.

Addition


In an attempt to measure at least some basic processing I decided to benchmark addition too. I took a INTEGER as data type and did the following benchmarks:

  • Integer literal: sp_num_benchmark with the expression:
    v_return := 3;

  • Addition operator: sp_opr_benchmark with the expression:
    v_return := 1+2;

  • UDF: sp_udf_benchmark with the expression:
    v_return := UDF_ADD(1,2);
    The code for the UDF is
    my_bool udf_add_init(
    UDF_INIT *initid
    , UDF_ARGS *args
    , char *message
    ){
    if(args->arg_count!=2){
    strcpy(message, "Require two arguments");
    return 1;
    } else {
    args->arg_type[0]= INT_RESULT;
    args->arg_type[1]= INT_RESULT;
    }
    initid->maybe_null= 1;
    return 0;
    }

    long long udf_add(
    UDF_INIT *initid
    , UDF_ARGS *args
    , char *is_null
    , char *error
    ){
    if((args->args[0]==NULL)||(args->args[1]==NULL)){
    *is_null= 1;
    return 0;
    } else {
    return (*((long long*)args->args[0])) + (*((long long*)args->args[1]));
    }
    }

  • Stored SQL function: sp_ssf_benchmark with the expression:
    v_return := SSF_ADD(1,2);
    The code for the function is

    create function ssf_add(l int, r int)
    returns int
    return l+r;
Here is the raw data:

+--------------+------+-----+-----+-----+-----+
| repeat_count | noop | num | opr | udf | ssf |
+--------------+----- +-----+-----+-----+-----+
| 0 | 0 | 0 | 0 | 0 | 0 |
| 500000 | 3 | 5 | 5 | 6 | 9 |
| 1000000 | 7 | 9 | 9 | 11 | 18 |
| 1500000 | 10 | 14 | 13 | 16 | 27 |
| 2000000 | 13 | 19 | 18 | 21 | 35 |
| 2500000 | 17 | 23 | 22 | 26 | 47 |
| 3000000 | 20 | 27 | 27 | 33 | 56 |
| 3500000 | 23 | 33 | 33 | 38 | 62 |
| 4000000 | 27 | 38 | 36 | 43 | 74 |
| 4500000 | 30 | 42 | 42 | 50 | 83 |
| 5000000 | 34 | 46 | 47 | 54 | 91 |
| 5500000 | 36 | 51 | 60 | 68 | 108 |
| 6000000 | 42 | 56 | 59 | 67 | 115 |
| 6500000 | 45 | 60 | 62 | 71 | 123 |
| 7000000 | 47 | 65 | 69 | 78 | 132 |
| 7500000 | 50 | 67 | 71 | 84 | 143 |
| 8000000 | 54 | 72 | 78 | 89 | 149 |
| 8000000 | 54 | 73 | 78 | 89 | 149 |
| 8500000 | 59 | 78 | 79 | 92 | 157 |
| 9000000 | 60 | 85 | 85 | 99 | 168 |
| 9500000 | 65 | 89 | 92 | 104 | 178 |
| 10000000 | 67 | 91 | 95 | 111 | 189 |
+--------------+------+-----+-----+-----+-----+
And here is the graph of the results:
addition-benchmark
We can immediately see that the result is quite similar to what we saw in the previous measurements. Let's look at the performance relative to assigning a literal integer:

int / opr = 91 / 95 = 0.96
and

int / udf = 91 / 111 = 0.82
and

int / ssf = 91 / 189 = 0.48
So, A direct addition operator is less than 5% slower than a literal assignment. The UDF 18% slower than the literal assignment and 14% slower than the addition operator. The stored SQL function is quite a good deal slower: 48% slower than the direct addition operator and 34% slower than the UDF. If we want to compare only UDFs vs stored SQL functions, we should compare relative to the addition operator, which gives a minutely different result:

opr / udf = 95 / 111 = 0.83
and

opr / ssf = 95 / 189 = 0.5

Built-in functions and UDFs


While I was busy doing benchmarks, I also figured that it'd be nice to figure out how UDFs and built-in functions compare performance-wise. To make things slightly more realistic than simply returning a value, I chose to implement my own version of CONCAT().

Because CONCAT can take on a number of arguments I decided to measure the effect of a varying number of arguments too. I ended up choosing VARCHAR(10) for the data type, and tested these different benchmarks:

  • Built-in function sp_bif<1..10>_benchmark - where <1..10> is a number from 1 to 10, and where the expression was:
    v_return := CONCAT(0[,1[,...,9]);
    That is, 10 different CONCAT measurements, from CONCAT(0) to CONCAT(0,1,2,3,4,5,6,7,8,9)

  • UDF sp_udf<1..10>_benchmark - where <1..10> is a number from 1 to 10, and where the expression was:
    v_return := UDF_CONCAT(0[,1[,...,9]);
    That is, 10 different measurements of my UDF implementation of concat, from UDF_CONCAT(0) to UDF_CONCAT(0,1,2,3,4,5,6,7,8,9).The code for my UDF_CONCAT is:

    my_bool udf_concat_init(
    UDF_INIT *initid, UDF_ARGS *args
    , char *message
    ){
    int i;
    size_t bytes = 0;
    for(i=0;iarg_count; i++){
    args->arg_type[i] = STRING_RESULT;
    bytes += args->lengths[i];
    }
    if(!(initid->ptr= malloc(bytes))){
    strcpy(message, "Error allocating memory.");
    return 1;
    } else {
    return 0;
    }
    }

    char *udf_concat(
    UDF_INIT *initid, UDF_ARGS *args
    , char* result, unsigned long* length
    , char *is_null, char *error
    ){
    int i;
    char *buff = initid->ptr;
    *length = 0;
    for(i=0;iarg_count; i++){
    if(args->args[i]==NULL){
    *is_null = 1;
    break;
    } else {
    memcpy(buff + *length, args->args[i], args->lengths[i]);
    *length += args->lengths[i];
    }
    }
    return buff;
    }

    void udf_concat_deinit(
    UDF_INIT *initid
    ){
    if(initid->ptr){
    free(initid->ptr);
    }
    }

I also measured the same string literal assignment and the noop benchmarks as control measurements. I won't give the raw results here - send me an email or post a comment if you'd like to have them. I do have a graph of the result here:
udf-builtin-benchmarkNote that for this graph I already subtracted the no-operation benchmark. This was done to make it easier to examine the data sets of the UDFs and Built-in functions. The graph does include the literal string benchmark, which is the orange line nearest to the X axis.

Now, what do we see here apart from the control benchmark? Basically, we see two bundles of series. The bundle nearest to the X axis corresponds with the built-in CONCAT() benchmarks. The bundle above that corresponds to the UDF benchmarks. So, this tells us that on every occasion, the built-in CONCAT() was faster than the UDF.

We can also see that there is a relatively small but measurable effect for passing more arguments. The calls with fewer arguments are consistently faster than the calls with more arguments within both the group of built-in and UDF calls.

Frankly, I didn't expect to see this. I had imagined that the effect of passing more arguments would be larger, so I expected to see pairs of UDF/Built-in function calls having the same number of parameters. Clearly, I was wrong.

So how much faster are UDFs? Well, if we look at the raw ratios of execution times for the largest number of repetitions, we see:

UDF_CONCAT(0) / CONCAT(0) = 91 / 107 = .85

Interestingly, if we compare that for the calls with 10 arguments we see the same ratio:

UDF_CONCAT(0,1,2,3,4,5,6,7,8,9) / CONCAT(0,1,2,3,4,5,6,7,8,9) = 91 / 107 = .85

So, UDF_CONCAT() would seem to be 15% slower than the built-in CONCAT(). Interestingly, there seems to be no difference between the call with the larger number of arguments, indicating that the performance impact of passing another argument is about the same for UDFs as it is for built-in functions.

What happens if we express the execution time relative to the no-operation benchmark? Well, we get:

noop / UDF_CONCAT(0) = 64 / 107 = .60
noop / CONCAT(0) = 64 / 91 = .70

and

noop / UDF_CONCAT(0,1,2,3,4,5,6,7,8,9) = 64 / 122 = .52
noop / CONCAT(0,1,2,3,4,5,6,7,8,9) = 64 / 104 = .62

Here we see that relative to the no-operation benchmark, the UDF is 10% slower than the built-in function. We could say that 10% is quite a lot, but it is more than half as large as the difference between UDFs and stored SQL functions.

We see something interesting if we look at the execution time relative to the execution time spent on the string literal assignment:

string / UDF_CONCAT(0) = 88 / 107 = .82
string / CONCAT(0) = 88 / 91 = .87

and

string / UDF_CONCAT(0,1,2,3,4,5,6,7,8,9) = 88 / 122 = .72
string / CONCAT(0,1,2,3,4,5,6,7,8,9) = 88 / 104 = .85

Here, the difference between the single argument calls is only 5%, wereas the difference becomes more 12% for the calls with 10 arguments. This is interesting because in the comparison with the no-operation benchmark we did not detect a difference caused by the the amount of arguments.

My current suspicion is that we are actually witnessing the effect of the length of the left hand expression in the assignment operation. Our string literal is a VARCHAR(14), and it should probably be a CHAR(1) for the comparison to the single argument calls and a CHAR(10) for the comparison with the 10 argument calls.

Conclusion


The UDFs I tested are

  • somewhere around and between 25% to 33% faster than stored SQL functions

  • somewhere around and between 5% to 20% slower than built-in functions


Another interesting finding is that argument passing seems to have about the same impact on UDFs as it has on built-in functions. A single argument seems to cost about 1% of the function performance in both cases. For stored SQL functions no such data is currently available.

Discussion


Currently, all functions call are single complete function calls. That means that for UDFs, the initialization function and the row-level function are both called exactly once per UDF instance. When using UDFs in multi-row SQL statements, the initialization function will be called only once per occurence of the UDF, and the row-level function will be called for each row. For functions where the initialization function is relatively expensive, this means that the performance observed in these benchmarks is probably poorer that it can be. A new set of benchmarks should be done to measure the performance of functions in multi-row SQL statements.

Finally


So, are you interested? If you are, go and register for the conference. You can contact me and get a 20% discount! Of course I'd love to see you attend my tutorial too!

See you at the conference!

Tuesday, February 20, 2007

Updated procedure for creating MySQL FEDERATED tables

Three months ago, I wrote about a procedure to create MySQL FEDERATED tables.

I just added a quick fix that should make the procedure less susceptible to issues relating to bug #23856. To do that, I had to remove the ORDER BY ordinal_position bits in all the calls to GROUP_CONCAT over the rows in the information_schema.COLUMNS table.

If you include the ORDER BY ordinal_position clause, the concatenation result will mess up sometimes. The behaviour can be attenuated somewhat by decreasing value for the group_concat_max_len server variable, but so far, I have not seen a sufficiently large valy of group_concat_max_len that does not display this behaviour.

Omitting the ORDER BY ordinal_position clause does not seem change the order. Rows from information_schema.COLUMNS seem to be ordered by TABLE_SCHEMA, TABLE_NAME, COLUMN and ORDINAL_POSITION anyhow. I hope I can rely on that order, as all sorts of trouble are to be expected when the column order of the local table differs from that of the remote table.

I also added some output so you can see what the procedure is doing. Some steps, like getting the remote metadata, take rather long, and I found some output to be helpful.

The updated procedure can be found here in the usual spot at MySQLForge.

The updated procedure does not yet support creating federated tables using a separate SERVER schema object (see the manual).

Please, try it, and report bugs in the procedure by adding a comment to this blog entry. Thank you!

Wednesday, November 22, 2006

MySQL: A Stored Procedure to Create FEDERATED Tables.

The FEDERATED storage engine is a MySQL feature that lets you access the tables (and views) on a remote MySQL server. In terms of purpose and functionality, it best compares to a "linked table" in MS Access, and to a lesser extent, to Oracle database links.

Creating a federated table is tedious, because in the local database, you need to define a table with exactly the same structure as the remote table. That includes the index definitions, because these are used by the local optimizer to create a query plan (although the statistics for the index will not be locally available)

I created a procedure to make life easier (well, at least as far as creating FEDERATED/ tables is concerned). You can grab if here:

http://forge.mysql.com/snippets/view.php?id=54 https://github.com/sordidfellow/MysqlSnippets/blob/master/p_create_federated_table.sql

Sample1:


call p_create_federated_table(
'192.168.0.100'
, 3306
, 'my_remote_username'
, 'my_remote_password'
, 'my_remote_schema'
, 'my_remote_table'
, 'my_local_schema'
, 'my_local_table'
);


This sample will create a FEDERATED table named 'my_local_table' in the schema 'my_local_schema'. The created table will point to the table 'my_remote_table' in the schema 'my_remote_schema' on the remote MySQL Server on the host 192.168.0.100 that is listening on port 3306

Sample2:


call p_create_federated_table(
'myhost.mydomain.org'
, NULL
, NULL
, NULL
, 'my_schema'
, 'my_table'
, NULL
, NULL
);


This sample will create a FEDERATED table with the same name as the specified remote table, and in the local schema with the same name as the specified remote schema. The remote mysql server is listening on the default mysql port (usually 3306) on the host myhost.mydomain.org'.

If you want to know how it works exactly: the procedure has got pretty good comments. Go get it at http://forge.mysql.com/snippets/view.php?id=54 github, and let me know what you think.

Code

Stephen kindly went into the wayback machine and retrieved the code that used to be on the now defunct MySQL Forge website. Thank you stephen! You can also view the code on github.
delimiter $$

drop  procedure if exists 
p_create_federated_table
$$
create procedure 
p_create_federated_table

(
    -- the ip address or name of the remote mysql server host (if NULL, 'localhost')
    p_remote_host     varchar(32)   
    -- the port where the remote  mysql server is listening (if NULL, 3306)
,   p_remote_port     int unsigned  
    -- the user on the remote server that accesses the table (if NULL, 'root') 
,   p_remote_user     varchar(16)   -- 
    -- the password for the remote user (if NULL, omitted)
,   p_remote_password varchar(32)
    -- the schema in which the remote table resides
,   p_remote_schema   varchar(64)
    -- the name of the remote table
,   p_remote_table    varchar(64)
    -- the local schema to create the local FEDERATED table (if NULL, p_remote_schema)
,   p_local_schema    varchar(64)
    -- the name of the local FEDERATED table (if NULL, p_remote_table)
,   p_local_table     varchar(64)
)
LANGUAGE SQL
NOT DETERMINISTIC
MODIFIES SQL DATA
SQL SECURITY INVOKER
COMMENT 'Creates a FEDERATED table.'
/*
    Changelog
    WHEN?      WHO? WHAT?
    -----------------------------------------------------------
    2007-02-20 RPB  added handler to ignore warning 1366. Not sure why this appears.
                    lowered the group_concat_max_len (bug #23856)
                    removed the ORDER BY clauses from the GROUP_CONCATs on COLUMN_TYPE (bug #23856)
                    added output so we can see what's taking so long
                    added changelog
    2006-11-20 RPB  Created Initial version  
*/ 
begin
    -- size used for the GROUP_CONCAT buffer if current is lower
    -- please see http://bugs.mysql.com/bug.php?id=23856
    declare v_group_concat_max_len smallint 
        default 16384;
    -- stores the original size of the GROUP_CONCAT buffer to restore it
    declare v_old_group_concat_max_len int unsigned
        default @@group_concat_max_len;
    -- stores the original sql_mode
    declare v_old_sql_mode varchar(255)
        default @@sql_mode;
    -- Used to drop temporary tables
    declare v_drop_table_name varchar(64);

    -- set the GROUP_CONCAT buffer sufficiently large
    set @@group_concat_max_len := greatest(
        v_group_concat_max_len
    ,   v_old_group_concat_max_len
    );
    -- set the sql_mode to default to prevent invalid column defaults
    set @@sql_mode := '';

    -- The following block contains all the meat
    -- We use a separate block to allow for proper error handling.
    -- All errors that might be expected are handled inside this block
    -- This should guarantee that the outer block is always completed.
    -- That is necessary, beause we need to do a little cleaning up 
    -- before exiting the procedure.
    begin
        -- Used for defaulting the specified host
        declare v_remote_host     varchar(32) 
            default coalesce(p_remote_host,'localhost');
        -- Used for defaulting the specified user
        declare v_remote_user     varchar(16) 
            default coalesce(p_remote_user,'root');
        -- Used for defaulting the local schema
        declare v_local_schema    varchar(64)
            default coalesce(p_local_schema,p_remote_schema);
        -- Used for defaulting the local schema
        declare v_local_table     varchar(64)
            default coalesce(p_local_table,p_remote_table);
        -- Holds the connectstring prefix for FEDERATED tables
        declare v_connectstring   varchar(255)
            default concat(
                'mysql://'
            ,   v_remote_user
            ,   if(p_remote_password is null
                ,   ''
                ,   concat(':',p_remote_password)
                )
            ,   '@'
            ,   v_remote_host
            ,   if(p_remote_port is null
                ,   ''
                ,   concat(':',p_remote_port)
                )
            ,   '/'
            );

        -- Various conditions we might encounter
        -- We rename them just for clarity
        declare TABLE_EXISTS_ERROR condition FOR 1050;
        declare UNKNOWN_COLUMN_ERROR condition FOR 1054;
        declare SYNTAX_ERROR condition FOR 1064;
        declare GROUP_CONCAT_TRUNCATION_ERROR condition FOR 1260;
        declare INCORRECT_VALUE_ERROR condition FOR 1366;
        declare TRUNCATION_ERROR condition FOR 1406;
        declare FEDERATION_ERROR condition FOR 1429;

        -- Various handlers. These will execute if one of the conditions occur.
        -- They all show a friendly error message and the exit the inner block.
        -- Execution is resumed at the clean up code, just before 
        -- the end of the proceudre
        declare exit handler for FEDERATION_ERROR 
            select  'Federation error' error_type
            ,       'Check the connectstring.' error_message
            ,       v_connectstring connectstring
            ;
        declare exit handler for GROUP_CONCAT_TRUNCATION_ERROR 
            select  'GROUP_CONCAT Truncation' error_type
            ,       'Increase GROUP_CONCAT_MAX_LEN.' error_message
            ,      @@group_concat_max_len group_concat_max_len
            ;
/*
        declare exit handler for SYNTAX_ERROR 
            select 'Syntax Error' error_type
            ,      'Check this statement.' error_message
            ,      @create_table_statment statement
            ;
*/
        declare exit handler for TABLE_EXISTS_ERROR 
            select 'Table Exists' error_type
            ,      'Drop the table first.' error_message
            ,      concat(v_local_schema,'.',v_local_table) table_identifier
            ;
        declare exit handler for UNKNOWN_COLUMN_ERROR 
            select 'Unknown Column' error_type
            ,      'Unexpected error, check the statement.' error_message
            ,      @create_table_statement table_identifier
            ;
        declare exit handler for NOT FOUND 
            select 'No such Table' error_type
            ,      'The requested table was not found on the remote host.' error_message
            ,      concat(p_remote_schema,'.',p_remote_table) table_identifier
            ;
 declare continue handler for INCORRECT_VALUE_ERROR 
            select 'We hit warning 1366. It''s probably nothing serious.'
            ;
        declare exit handler for SQLEXCEPTION
            select 'SQL Exception' error_type
            ,      'Unexpected generic error. Debug the procedure.' error_message
            ,      concat(
                           'CALL ',schema(),'.', p_create_federated_table,'(' 
                   ,       if( p_remote_host is null
                           ,   'NULL'
                           ,   concat('''',p_remote_host,'''')
                           )
                   ,',',   if( p_remote_port is null
                           ,   'NULL'
                           ,   p_remote_port
                           )
                   ,',',   if( p_remote_user is null
                           ,   'NULL'
                           ,   concat('''',p_remote_user,'''')
                           )
                   ,',',   if( p_remote_password is null
                           ,   'NULL'
                           ,   concat('''',p_remote_password,'''')
                           )
                   ,',',   if(
                               p_remote_schema is null
                           ,   'NULL'
                           ,   concat('''',p_remote_schema,'''')
                           )
                   ,',',   if( p_remote_table is null
                           ,   'NULL'
                           ,   concat('''',p_remote_table,'''')
                           )
                   ,',',   if( p_local_schema is null
                           ,   'NULL'
                           ,   concat('''',p_local_schema,'''')
                           )
                   ,',',   if( p_local_table is null
                           ,   'NULL'
                           ,   concat('''',p_local_table,'''')
                           )
                   ,   ')'
                   ) call_command
            ; 

        -- The following block creates temporary federated tables 
        -- on the remote information_schema. 
        -- These are needed to generate the structure 
        -- of the local federated table. 
 select 'Getting remote metadata...';
        begin
            -- Prefix used for the temporary tables
            declare v_temp_table_prefix char(25)
                default 'p_create_federated_table$';
            -- Used to fetch the generated DDL from the cursor
            declare v_create_table_statment text;
            -- Cursor loop control variable
            declare v_no_more_rows boolean
                default FALSE;
            -- Cursor generates DDL for createing temporary federated
            -- tables on the remote information_schema.
            -- We need this to generate the DDL to create the actual 
            -- federated table specified by the user.
            declare csr_metadata cursor for 
                select      table_name
                ,           concat(
                                'create temporary table'
                            ,'\n',schema(),'.'
                            ,v_temp_table_prefix,table_name,'('
                            ,'\n',group_concat(
                                    column_name
                                ,   ' '
                                ,   column_type
                                ,   if(
                                        character_set_name is null
                                    ,   ''
                                    ,   concat(
                                            ' character set '
                                        ,   character_set_name
                                        ,   ' collate '
                                        ,   collation_name
                                        )
                                    )
                                ,   if( is_nullable='NO'
                                    ,   ' NOT NULL'
                                    ,   ''
                                    )
                                    separator '\n,'
                                )
                            ,'\n',')'
                            ,'\n','engine = federated'
                            ,'\n','connection = '
                            ,'\n',''''
                            ,     v_connectstring
                            ,     table_schema
                            ,     '/'
                            ,     table_name
                            ,     '''' 
                            )
                from        information_schema.columns
                where       table_schema = 'information_schema'
                and         table_name IN (
                                'COLUMNS'
                            ,   'STATISTICS'
                            ,   'TABLE_CONSTRAINTS'
                            )
                group by    table_schema
                ,           table_name
                ;
            -- handler to control the cursor loop
            declare continue handler for NOT FOUND 
                set v_no_more_rows := TRUE;

            set @drop_temporary_tables_statement := null;

            -- loop through the cursor
            open csr_metadata;
            my_loop: loop
                -- get the DDL for the temporary federated 
                -- information_schema table
                fetch csr_metadata 
                into  v_drop_table_name
                ,     v_create_table_statment;
  
                -- basic cursor loop control exits if cursor is exhausted
                if v_no_more_rows then
                    close csr_metadata;
                    leave my_loop;
                end if;
                -- build a statement to drop all temporary tables
                set @drop_temporary_tables_statement := if (
                    @drop_temporary_tables_statement is null
                ,   concat(
                        'DROP TEMPORARY TABLE IF EXISTS '
                    ,   v_temp_table_prefix
                    ,   v_drop_table_name
                    )
                ,   concat(
                        @drop_temporary_tables_statement
                    ,   ','
                    ,   v_temp_table_prefix
                    ,   v_drop_table_name
                    )
                );

                -- kludge: need a user variable to execute the DDL string
                -- dynamically with the PREPARE syntax
                set @create_table_statment := v_create_table_statment;

                -- create the temporary federated information_schema table 
                prepare stmt from @create_table_statment;
                execute stmt;
                deallocate prepare stmt;

            end loop;
        end;
 -- Reset the variable. Mainly to simplify debugging
 select 'Generating CREATE TABLE statement for FEDERATED table...';
 set @create_table_statment:='...generating statement...';
        -- This creates the actual ddl for the requested local FEDERATED table.
        -- It selects the DDL directly into the user variable. 
        -- It does this by querying the remote information_schema.
        -- This DDL includes the index definitions of the remote table.
        select      concat(
                        'create table if not exists'
                    ,'\n','`',v_local_schema,'`'
                    ,'.' ,'`',v_local_table,'`'
                    ,     '('
                    ,'\n',column_definitions
                    ,     coalesce(index_definitions,'')
                    ,'\n',')'
                    ,'\n','engine = federated'
                    ,'\n','connection = '
                    ,'\n',''''
                    ,     v_connectstring
                    ,     column_definitions.table_schema
                    ,     '/'
                    ,     column_definitions.table_name
                    ,     '''' 
                    ) stmt
        into        @create_table_statement
        from        (
                    select      table_schema
                    ,           table_name 
                    ,           group_concat(
                                        '`',column_name,'` '
                                    ,   column_type
                                    ,   if(
                                            character_set_name is null
                                        ,   ''
                                        ,   concat(
                                                ' character set '
                                            ,   character_set_name
                                            ,   ' collate '
                                            ,   collation_name
                                            )
                                        )
                                    ,   if( is_nullable='NO'
                                        ,   ' not null'
                                        ,   ''
                                        )
                                    ,   if( column_default is null
                                        ,   ''
                                        ,   concat(
                                                ' default '
                                            ,   case
                                                    when data_type = 'TIMESTAMP' 
                                                    and  column_default = 'CURRENT_TIMESTAMP'
                                                        then column_default
                                                    when data_type like '%char' 
                                                    or   data_type like 'date%'
                                                    or   data_type like 'time%'
                                                    or   data_type in ('set','enum')
                                                        then concat('''',column_default,'''')
                                                    else column_default
                                                end
                                            )
                                        )
                                    ,   if(extra='','',concat(' ',extra))
                                    ,   ' comment '
                                    ,   '''',column_comment,''''                                        
                                        separator '\n,'
                                ) as column_definitions
                    from        p_create_federated_table$columns
                    where       table_schema = p_remote_schema
                    and         table_name   = p_remote_table
                    group by    table_schema
                    ,           table_name
                    ) column_definitions
        left join   (
                    select      index_definitions.table_schema
                    ,           index_definitions.table_name 
                    ,           concat(
                                    '\n,'
                                ,   group_concat(
                                        case c.constraint_type 
                                            when 'PRIMARY KEY' then 
                                                constraint_type
                                            when 'UNIQUE' then 
                                                concat(
                                                    'CONSTRAINT '
                                                ,   constraint_name
                                                ,   ' '
                                                ,   constraint_type
                                                )
                                            else
                                                concat(
                                                    if( index_type in (
                                                            'FULLTEXT'
                                                        ,   'SPATIAL'
                                                        )
                                                    ,   concat(
                                                            index_type
                                                        ,   ' '
                                                        )
                                                    ,   ''
                                                    ) 
                                                ,   if( non_unique
                                                    ,   ''
                                                    ,   'UNIQUE '
                                                    )
                                                ,   'INDEX '
                                                ,   '`',index_name,'`'
                                                )
                                        end
                                    ,   index_columns
                                    ,   if( index_type in (
                                                'BTREE'
                                            ,   'HASH'
                                            )
                                        ,   concat(
                                                ' USING '
                                            ,   index_type
                                            )
                                        ,   ''
                                        )
                                        order by c.constraint_type
                                        separator '\n,'
                                    )
                                )    as index_definitions
                    from        (
                                select      table_schema
                                ,           table_name
                                ,           index_name
                                ,           index_type
                                ,           non_unique
                                ,           concat(
                                                '('
                                            ,   group_concat(
                                                    '`',column_name,'`'
                                                ,   if( sub_part is null
                                                    ,   ''
                                                    ,   concat(
                                                            '(',sub_part,')'
                                                        )
                                                    )
                                                    order by seq_in_index
                                                )
                                            ,   ')'
                                            ) index_columns
                                from        p_create_federated_table$statistics
                                where       table_schema = p_remote_schema
                                and         table_name   = p_remote_table
                                and         index_type not in ('FULLTEXT')
                                group by    table_schema
                                ,           table_name
                                ,           index_name
                                ,           index_type
                                ,           non_unique
                                ) index_definitions
                    left join   (
                                select      table_schema
                                ,           table_name
                                ,           constraint_name
                                ,           constraint_type
                                from        p_create_federated_table$table_constraints c
                                where       table_schema    = p_remote_schema
                                and         table_name      = p_remote_table
                                and         constraint_type in (
                                                'PRIMARY KEY'
                                            ,   'UNIQUE'
                                            )
                                group by    table_schema
                                ,           table_name
                                ,           constraint_name
                                ,           constraint_type
                                ) c
                    on          index_definitions.table_schema = c.table_schema
                    and         index_definitions.table_name   = c.table_name
                    and         index_definitions.index_name   = c.constraint_name
                    group by    table_schema
                    ,           table_name
                    ) index_definitions
        on          column_definitions.table_schema = index_definitions.table_schema
        and         column_definitions.table_name = index_definitions.table_name
        ;

        -- Create the actual FEDERATED table by dynamically executing
        -- the generated DDL for the requested FFEDERATED table. 
 select 'Creating FEDERATED table...';

        prepare stmt from @create_table_statement;
        execute stmt;
        deallocate prepare stmt;

        -- Print a friendly message that we succeeded
        select   'Success' completion_type
        ,        concat(
                        'Created FEDERATED table '
                    ,   v_connectstring,'/',p_remote_schema,'/',p_remote_table
                    ) completion_message
        ,        concat(v_local_schema,'.',v_local_table) table_identifier
        ;
    end;

    -- Cleanup: restore the original sql mode 
    set @@sql_mode := v_old_sql_mode;
    -- Cleanup: restore the original GROUP_CONCAT buffer size
    set @@group_concat_max_len := v_old_group_concat_max_len;

    -- Cleanup: drop the temporary federated information_schema tables.
    prepare stmt from @drop_temporary_tables_statement;
    execute stmt;
    deallocate prepare stmt;

    -- Cleanup: reset the user defined variables.
    set @create_table_statment := null
    ,   @drop_temporary_tables_statement := null
    ;
end;
$$

delimiter ;

DataZen winter meetup 2025

The DataZen winter meetup 2025 is nigh! Join us 18 - 20 February 2025 for 3 days of expert-led sessions on AI, LLM, ChatGPT, Big Data, M...