Friday, September 30, 2005

Why REPEAT and WHILE are usually not handy to handle MySQL CURSORs

Most procedural languages offer some kind of syntax that influences program flow so that a statement or statement-sequence is allowed to be repeated. The MySQL stored procedure language (I think I'll start calling that MySPL from now on) offers three such language constructs: LOOP..END LOOP, REPEAT..UNTIL..END REPEAT and WHILE..DO..END WHILE.

Now, each of these essentially do the same: they denote a region in the stored procedure that will repeated at runtime. The differences between these constructs have to do with the way the repetition is terminated.

Unstructured loops: LOOP..END LOOP


LOOP..END LOOP is the simplest of the three loop constructs. LOOP denotes the start of the repetitive region, END LOOP denotes the end of the repetitive region, and anything inbetween those two markers is repeated. Period.


procedure p_count()
begin
declare
v_counter
int unsigned
default 0
;
loop -- start a repetitive region
set v_counter := v_counter + 1; -- just keep counting
end loop; -- end the repetitive region
end;


If the programmer does not take any special precautions, the repetition will be inifinte, which is not a good thing.

Lucky for the programmer, MySPL defines the LEAVE construct. At runtime LEAVE performs a kind of jump outside a region of code, resuming program execution after that particular region. When the LEAVE is executed from inside the body of a LOOP..END LOOP, the LEAVE can be used to jump right behind the END LOOP, effectively terminating repetition. Of course, just writing LEAVE is no good, as it effectively undoes the repetitive effect of the loop we wanted to achieve in the first place. But a conditionally executed LEAVE is very useful. Check out this snippet:


procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
myLoop: loop -- start the repetitive region named 'myLoop'
if -- check if we have counted enough
v_counter = v_max_counter
then
leave myLoop; -- terminate repetition of region 'myLoop'
end if;
set v_counter := v_counter + 1; -- just keep counting
end loop myLoop; -- end the repetitive region 'myLoop'
-- when the loop has been left, execution continues from here
end;


This snippet differs in two ways from the previous one. Apart from the introduction of the IF..THEN..LEAVE..END IF, this snippet explicitly labels the repeated region myLoop. The region is labeled by prefixing the start of the region (the LOOP keyword in this case) by the identifier for the region ('myLoop') followed by a colon. We really must label the LOOP..END LOOP, else LEAVE does not know where to jump to, or rather, what to 'jump out of' (=leave).

Because the code that handles loop termination (IF..THEN..LEAVE..END IF in our snippt) is syntactically not part of the LOOP..END LOOP construct, LOOP..END LOOP is considered to be an unstructured loop construct.

Structured loops


The other MySQL loop constructs, REPEAT..UNTIL..END REPEAT and WHILE..DO..END WHILE are structured loop constructs. Their syntax expliclty requires a boolean expression, a condition, that controls loop termination. This compels the programmer to think about loop termination right away, providing some protection against writing infinite loops. Also, these loops have a well-defined, immutable exit point.

For a REPEAT..UNTIL..END REPEAT loop, the condition is checked after execution of the statements between the REPEAT and UNTIL markers. If the condition evaluates to TRUE, the loop terminates, and program execution resumes after the END REPEAT marker. If the condition does not evaluate to TRUE, that is, if it evaluates to either FALSE or NULL, program execution is repeated starting at the statement after the REPEAT.


procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
repeat -- marks the start of the repeat loop
set v_counter := v_counter + 1; -- just keep counting
until -- marks the end of the loop body
v_counter = v_max_counter -- this boolean expression constitutes the loop control condition
end repeat; -- marks the end of the loop
end;


For a WHILE..DO..END WHILE loop, the condition is checked before execution of the statements between the DO and END WHILE markers. If the condition does not evaluate to TRUE, that is, if it evaluates to FALSE or NULL, the loop terminates, and program execution resumes after the END WHILE marker. If the condition evaluates to TRUE, the loop is entered and the statements between the DO and END WHILE markers are executed. When program execution reaches de END WHILE marker, the condition between the WHILE and DO markers is re-evaluated, and it all starts all over again.


procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
while -- marks the start of the loop
v_counter <> v_max_counter -- this boolean expression constitutes the loop control condition
do -- marks the start of the loop body
set v_counter := v_counter + 1; -- just keep counting
end while; -- marks the end of the loop body
end;


Like with LOOP..END LOOP, you can still write LEAVE inside such a loop, but this is considered bad practice. In general, Good Code Writing Rules demand that code blocks, including loops, have a single point of entry, and a single exit. By adhering to this rule, you ensure that the loops are easier to maintain, and easier to debug. Ignoring this rule can lead to 'spaghetti' code, that's code that, instead of maintaining a clear sequential flow of program execution, jumps back and forth.

Why people favour structured loops


Generally, people favour structured loops over unstructured ones. The syntactic design of the structured loop constructs makes it easy to see where the repetition terminates and what condition controls termination. The only things that the programmer needs to do is make sure the condition is right. Also, the programmer must refrain from writing some additional LEAVE inside the loop. So, that's one thing programmers must do, and one thing the must not do.

For an unstructured loop, the programmer must write some kind of conditional statement that checks the condition. Apart from checking the condition, the programmer must also actually write the code that terminates the loop. In an unstructured loop, the condition check could appear anywhere inside the loop, so one really must read the entire loop code to see where the exit point is and what condition controls it. As for structured loops, programmers should refrain from writing multiple condition checks and exit points. This holds true for both unstructured and structured loops.

Loops and cursors


In my previous post, I illustrated CURSOR traversal in both Oracle and MySQL. I used an unstructured LOOP..END LOOP construct there in both examples. In fact, I usually handle cursors in MySPL using the unstructured loop construct. I'll hope I can explain why.

Take a look at the following snippet. I didn't make it up myself, I took it from the Cursors section in the MySQL Reference manual.
UPDATE: David Beroff kindly pointed out to me that at some point during the past decade that this post has been online, the example in the MySQL manual changed and now uses an LOOP...END LOOP construct. I don't know if this post had anything to do with that, but I'm happy it was changed anyway. Yay!
I did simplify it a little to keep a focus on the problem at hand, using a loop and a cursor:


CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;

OPEN cur1;

REPEAT
FETCH cur1 INTO a, b;
IF NOT done THEN
INSERT
INTO test.t3
VALUES (a,b);
END IF;
UNTIL
done
END REPEAT;

CLOSE cur1;
END;


Basically, what's happening here's that we have a cursor, cur1 that selects data from the columns id and data of the table t1. The cursor is traversed using an REPEAT..UNTIL..END REPEAT loop, and for each FETCH, data that is taken from the cursor is used to perform an INSERT into table t3. Of course, we could've achieved this in a single INSERT INTO..SELECT..FROM statement, but that's beside the point. I want to focus on the REPEAT..UNTIL..END REPEAT loop and the CURSOR traversal.

So, how is this loop controlled? Well, being a REPEAT..UNTIL..END REPEAT loop, the loop body is entered at least once, and repeated until the condition becomes TRUE. Here, the condition itself is very simple, it consists of a simple evaluation of the INT type variable done. By default, done is zero, wich is equivalent to FALSE when evaluated as a boolean value. The entire control of the loop depends on the value of the done variable being set to an equivalent of TRUE when the cursor is exhausted. When that is the case, the very next time that the UNTIL condition is evaluated, the loop will terminate.

How is cursor exhaustion detected in MySPL? Well, I described in my previous post that you need a CONTINUE HANDLER for the builtin NOT FOUND condition to do this. In this snippet, this very device is used, with a slight modification: instead of the builtin NOT FOUND condition, a SQLSTATE '02000' is used. This covers the same situation as the builtin NOT FOUND condition.

The handler statement itself is very simple. It just assigns a value of 1 to the done variable. Because 1 is an equivalent of TRUE, the loop will terminate the next time that the UNTIL condition is evaluated.

Superficially all this seems to make sense, but I'll try and convince you that this is actually all wrong. Maybe that's a bit too strong. Let's say that I think that this is not as good as it could be.

What is really happening here?


Take a look at the body of the loop; I mean the statements inbetween the REPEAT and the UNTIL markers. First, a record is fetched from the cursor. Then, an IF..THEN statement checks if the done variable is still equivalent to FALSE, and if so, the INSERT is performed.

Now, why would it be neccessary to use an IF statement after the FETCH? Well, suppose the resultset retrieved by the cursor comprises exactly one record. When the cursor is opened, the cursor's internal record pointer will be positioned right before that first and only record. Then, the loop is entered, and the fetch is performed for the first time. The record's field values are transferred to the variables, and the cursor's record pointer will be moved to the next position, right after the record. Execution continues with the IF. The tested condition, NOT done, will evaluate to TRUE, because the done variable is still zero. So, in the first loop cycle, the INSERT is performed. We have arrived at the end of the loop body, reaching the UNTIL. Because the done variable is still equivalent to FALSE, the loop is re-entered, and it starts all over again right after the REPEAT marker defining the beginning of the loop.

We are now about to FETCH for the second time. However, once we attempt that, an error occurs because the cursor is exhausted. Remember, the query retrieved only one record, which was fetched in the first loop cycle. But wait, didn't we declare a HANDLER to deal with this situation? Yes, we did indeed! So, procedure execution is resumed at the handler statement, which assigns a one to the done variable. Once the assignment is done, the procedure execution resumes right after the FETCH that raised the error condition. This is because we are dealing with a CONTINUE handler here. After procedure execution resumes, the IF is executed for the second time. This time NOT done will evaluate to FALSE, so the INSERT is not performed this time. This makes sense, after all, we already exhausted our cursor during the first loop cycle.

Now we know why the IF's there. When we would just write an unconditionally executed INSERT right after the fetch, we would be inserting a record anyway, even though the cursor's already exhausted. The values that we would be inserting are in fact the same values that were inserted during the first loop cycle, because the FETCH is the only statement that assigns any values to the a and b variables.

After the IF..END IF, we encounter the UNTIL for the second time. This time, the done variable will evaluate to TRUE wich will terminate the loop.

Sorry, but why is this so bad?


So, what's not good about this? There are two things that I'm thinking of. One is a matter of style, and not that important, altough I think most programmers can appreciate it. The second has to do with efficiency, and is as far as I can see not really open to any debate. (But hey, feel free to do post a comment; as some advertising bots seem to have discovered, comments on this blog can be made anonymously).

It's 'just' a matter of style, but I feel that REPEAT is a construct you typically use to do some work that has to be done at least one time. After that, the condition may require the loop to be re-entered until all of the work has been done. REPEAT is so suitable for this type of problem because it checks the loop control condition after the loop body, so you are guaranteed to have at least one loop cycle. In my opinion, cursor loops usually do not match this pattern. Most of the times, you can't tell on beforehand if the cursor will retrieve any records at all, and in those cases, I do not want to enter the loop at all. Of course, the snippet is resilient to the case, so it's really a matter of style, not of correctness. If the cursor query would not return any rows at all, the snippet would still work, but to me, a REPEAT is like a marker saying: "Ho there, here's some work that needs to be done, and maybe we'll have to repeat it a couple of times". Not surprisingly, I really almost never have the need to write any loop like this.

I feel quite a bit more strong about another thing. I think that it is really not very good to have to different checks that test the same condition. I mean, we have both the IF and the UNTIL checking essentially an inverse condition. And it's not just something that has to do with this particular snippet; it really has to do with this type of cursor loop in general. Because the work you do in the loop relies on a successful FETCH, you realy need the IF. To me, it seems as if all the structural benefits of the REPEAT loop are superceded by essentially being forced to duplicate the loop control check. What makes it a bit worse, is that we really have to check inverse conditions, because the UNTIL wants the condition to be FALSE in order to repeat the loop and keep going, whereas the IF wants the condition to be TRUE to keep going.

Can't you solve this with a WHILE loop?


Well, in part, but I still feel we can do even better. Let's give it I go, I'll show you what I mean. Here's an equivalent snippet, with the REPEAT rewritten using WHILE:


CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;

OPEN cur1;

FETCH cur1 INTO a, b;
WHILE NOT done DO

INSERT
INTO test.t3
VALUES (a,b);

FETCH cur1 INTO a, b;
END WHILE;

CLOSE cur1;
END;


So, how is this loop controlled in this case? Being a WHILE..DO..END WHILE loop, the loop body is entered only when the condition evaluates to TRUE. Repetition will likewise occur only if the condition is still TRUE when re-entering the loop. So this solves that "style thingie" I just mentioned. What's also good about this solution is this: Because we know that the cursor is not exhausted once we're inside the loop, we can do away with the IF alltogether. That's one inverse check less.

So what's the objection now then?


Although this is an improvement over REPEAT, there's still two things bothering me here.

One is really really minor, and has more to do with my translation of REPEAT into a WHILE than with anything else. For simplicity, I retained the original done variable. However, this makes me test NOT done in the WHILE. This is not bad, but we could do a lot better by reversing the semantics here. If we would alter it's declaration to, say

DECLARE hasMoreRows BOOL DEFAULT TRUE;

and likewise redeclare the handler statements to:

SET hasMoreRows = FALSE;

we could rewrite the loop start as

WHILE hasMoreRows DO

So that's quite nicely solved I think:


CREATE PROCEDURE curdemo()
BEGIN
DECLARE hasMoreRows BOOL DEFAULT TRUE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET hasMoreRows = FALSE;

OPEN cur1;

FETCH cur1 INTO a, b;
WHILE hasMoreRows DO

INSERT
INTO test.t3
VALUES (a,b);

FETCH cur1 INTO a, b;
END WHILE;

CLOSE cur1;
END;


What's still bothering me though is the handling of the FETCH statement. The snippet uses two FETCH statements. One's inside the loop body, wich is good, because we set up the loop in the first place to browse through the records in the cursor. The other one's outside of the loop, just before the we enter it. Why do we need two FETCH statements then? Let's concentrate on the one inside the loop body, and forget about the other one for a while. We already established we really need one inside the loop in order to traverse the cursor, and you will see why we need one outside the loop in a few moments.

First thing that we can observe about the 2nd FETCH is it's placement. It's immediately at the bottom of the loop body, right before the END WHILE marker. This is different from the situation in the REPEAT..UNTIL..END REPEAT snippet. There, the FETCH statement was at the top of the loop body, right behind the REPEAT marker. Does it make sense to put it at the bottom? When we reason from the point of view of the work that has to be done inside the loop, it certainly doesn't. Remember, we are browsing through all the records in the cursor, and INSERTing data from the cursor record into another table. So how does it make sense to FETCH the data after the INSERT?

Imagine the FETCH would be at the top of the loop, right where it was when we had the REPEAT loop, and pretend that the FETCH immediately preceding the loop isn't there. Now, suppose that query we used to define the cursor yields no records at all. The cursor would be opened, and because the hasMoreRows variable defaults to TRUE, the loop would be entered. Now, we are pretending the FETCH is right on top of the loop body, so we do the FETCH, and this results in an error, because the cursor's exhausted. Procedure execution would then be transferred to the HANDLER statement, wich SETs the hasMoreRows variable to FALSE, after wich procedure execution is resumed...right after the FETCH. Because we are already inside the loop, there's nothing that prevents the INSERT from being executed, wich is clearly not what we intended to do at all. Repetition is terminated only after the first loop cycle completes, wich simply is too late by one loop cycle.

So, now we know why the FETCH must be placed at the bottom of the loop. If it is placed there, cursor exhaustion is immediately handled by not entering the next loop cycle, and without executing any intervening statements. This also explains why we must have a FETCH immediately preceding the entire loop. If we place the FETCH that we need inside the loop at the bottom, we still need to have FETCHed at least once before entering the first loop cycle (or alternatively, not entering the loop at all) to initialize all the variables properly.

The downside is of course that we need to duplicate the fetch. This would not be so bad if it were'nt for all those columns. Suppose you have 20 columns, and you accidentally swap a few columns, fetching the wrong values into your variables...

How an unstructured loop remedies all this


Generally, I do not like to write unstructured loops, but in this case, I keep coming to the conclusion that a simple LOOP..END LOOP is just The Best Solution. Check it out:


CREATE PROCEDURE curdemo()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;

OPEN cur1;

myLoop: LOOP
FETCH cur1 INTO a, b;

IF done THEN
CLOSE cur1;
LEAVE myLoop;
END IF;

INSERT
INTO test.t3
VALUES (a,b);

END LOOP;

END;


It's not really difficult to see how this loop is controlled. As in the REPEAT example, the loop is always entered. Right after it is entered, the FETCH is executed, and should the cursor be exhausted, the HANDLER is executed, setting the done variable to TRUE. Until now, this is similar to the REPEAT example. Then, the IF is executed. Instead of conditionally doing the actual work when NOT done as we saw in the REPEAT example, we use it to CLOSE the cursor, and LEAVE the loop only in case we're done. This pattern ensures that we keep all the statements that have to do with cursor handling together. After the END IF, we can do all the real work we need to perform inside the loop.

Summary


So, to summarize it all, we don't need the double check like we saw with the REPEAT; also, we don't need the extra FETCH we saw with the WHILE. The price we have to pay is that the loop is unstructured: your fellow programmers will sue you, and call you names behind your back.

Seriously, as long as you keep the OPENstatment, the start of the LOOP and the IF..END IF together, and make sure that the IF..END IF contains only those statements that have to do with cleaning up after the cursor traversal, you should be allright. One other concern is the loop label. When handling multiple or nesting loops, you should ensure not to mismatch the labels, but that's about it.

70 comments:

Anonymous said...

Sorry, but this article is SOOOOOO BOOOOOOOOOOOOOOOOOORRRRRIIIIIIINNNNGG! Realize that your audience are programmers, not pre-school children. One could write this in two paragraphs. Ugh...

rpbouman said...

Hey anonymous,

thanks for the constructive criticisms!

I can only admire your insight in my audience, and it's only understandable that you chose not to sign your truly terrific comments on this lousy blog with your name.

So thank you again, and post back anytime.

Roland

Anonymous said...

Very detailed tutorial, I found it very useful, Thank you very much

Anonymous said...

This is great advice, since I'm still a novice in MySQL. If you would've written as anonymous wanted, I don't think I would've understood this as much as now.

keep up the good work!

rpbouman said...

Hi aki-matti itkonen,

I'm glad you like it, thanks for your kind words!

I did this writeup a while ago because I could just tell by looking at the mysql forums that there were a lot of people having trouble with this. One of the things that seems to go wrong most of the time is having a cusor that unexpectedly does not yield any records. As the article explains, REPEAT is particularly bad with this, and REPEAT is what's used in the MySQL documentation (manual + .pdf introductory document)

Of course, I my can never reach the level of understanding mastered by the anonymous MySQL Expert. Words fall short when I try to explain how deeply and profoundly guilty I feel to have burdened his fine and exquisite genius with my totally and utterly worthless boring blog which was so brutally forced upon him (or her).

People like us are probably forever doomed to wander in the dark and murky un-knowingness of matters which form a mere trifle for his superior and truly brilliant mind, and it is only fit that we are left awestruck, numbed when witnessing his divine intelligence

Anonymous said...

Hi, nice tutorial. As you said, cursors are one of the difficult and limted things in MySQL. I dont´t know if you have plans to write about it or if there are some web pages anywhere that can help. Thanks.

Rafael Montoya.
Lima - Peru

rpbouman said...

Thank you Rafael!

Actually, this is just one blog entry about Cursors.

I did a write up about nesting cursors (http://rpbouman.blogspot.com/2005/10/nesting-mysql-cursor-loops.html)
and a very general one (http://rpbouman.blogspot.com/2005/09/want-to-write-cursor-loop-with-mysql.html).

You can also check out the MySQL forums, there's one especially devoted to cursors:

http://forums.mysql.com/list.php?102

Good luck!

Anonymous said...

I updated the mySQL-Server and it seemed as if the new one didn't understand my old stored procedures.

And indeed while 5.0.21 could handle repeat cursors - 5.0.22 only got along with unstructured loop cursors. I had to change the SPs

strange thing ... but it helped me

cu and thanks

rpbouman said...

mmm, that sounds very strange to me.

Do you mind posting a sample of the code? - this really intrigues me.

TIA,

Roland.

Anonymous said...

Thank You for this experience-based tutorial. I'm having a real difficult time trying to execute a cursor based SP which performs selects from far too many tables. I keep getting "Incorrct number of FETCH variables". But your page is a huge help in picking up where the docs leave off.
Regards!

rpbouman said...

Peter,

You are getting this error because the number of expressions in the SELECT list of your cursor does not match the list of variables in the FETCH statement.

This problem is usually due to forgetting to type a comma between one or more expressions in the SELECT list of the cursor.

For example, suppose you want this SELECT statement in your cursor:

SELECT col1, col2
FROM table1

But you accidentally forgot the comma:

SELECT col1 col2
FROM table1

Then MySQL (and most other rdbms-es,too) thinks you want to select col1, and give it an alias (a local name) col2.

Hope this helps, and good luck.

BTW: are you sure you need a cursor? Google: "Roland Bouman" "Are you sure you need a Cursor"

Anonymous said...

Nice write-up, thanks. I’ve found very little info on cursors and Google keeps turning up your articles, and for a good reason. Great depth. I leave your webpage with a much better understand then when I depart the MySQL official docs :)

Anonymous said...

The article was very useful and I think it is always worth to well document any subject. I'm a programmer and the topic covered pretty much 100% of what I was in trouble. For me it was enough to go over the example scripts and seeing the emphasized lines. However this wouldn't be a good article if only fragments of code were put together. Nice job!

Unknown said...

Thanks for this marvelous post. It helped me further in a little date + cursor problem I faced.

Anonymous said...

hi, roland

i have some idea to deal with nested cursor with 'WHILE' statement. here some of the code :

begin
declare var1 int;
declare var2 int;

declare cursor1 cursor for
select t.col1 from table1 as t;

declare cursor2 cursor for
select r.col2 from table2 as r;

declare continue handler for sqlstate '02000' set done = 1;

open cursor1;
fetch cursor1 into var1;

while not done do

open cursor2;
fetch cursor2 into var2;
while not done do

-- insert into something, update, etc

fetch cursor2 into var2;
end while;
close listid_tmp;

set done = 0; -- set handler to 0 again,
-- must before fetch the outer cursor
fetch fetch cursor1 into var1;
end while;
close cursor1;

end;

The idea is to set cursor handler to false / 0 before fetching outer cursor. So, the code looks more structured and easier to read. (i no good with 'loop' :D )

rpbouman said...

Hi Agung Prihanggoro,

Thanks for you comments!

What you are describing here is two things actually: one method to deal with nested cursor loops, and how to use while to write a cursor loop.

Your method to deal with nesting, resetting the loop control variable, is good. I wrote about this and other techniques earlier in my article Nesting MySQL Cursor Loops elsewhere on this blog.

The pattern to use WHILE is also correct and valid, and it is in fact discussed in my article (see the heading: "Can't you solve this with a WHILE loop?"). The WHILE solution is better than the REPEAT solution, because you can do away with one condition check inside the loop body.

However, the problem with this approach is that you need to duplicate the FETCH statement. You need to FETCH at least once before entering the loop, and of course, you need to do the same fetch again at the end of the loop as well.

The duplication of the FETCH clause is undesirable, it makes it harder to maintain the code. If the cursor changes (for example, add a column to the SELECT list, change the order of the columns) the code needs to change accordingly in both FETCH statements. This is not the case for the unstructured LOOP approach.

Another thing about the WHILE solution is that the second FETCH statement appears separated from the other code that handles the cursor. It's largely a matter of taste, but I like it better when all statements that deal with the cursor and the loop control are in the same spot. For the LOOP solution, the END LOOP is the only statement that is (necessarily) in another place, all other statements appear nicely in sequence.

kind regards,

Roland

Anonymous said...

Hi thanks for detailed explanations.
It is so usefull for me.

But i have a new problem while trying what you explained here
I try to check that in which tables,
a given column name exists

DELIMITER $$

DROP PROCEDURE IF EXISTS find_column $$

CREATE PROCEDURE find_column (IN col_name varchar(50))
BEGIN

declare v_col1 varchar(50);
declare no_more_rows boolean default FALSE;
declare cursor1 cursor for
Show tables;

declare continue handler for not found

set no_more_rows := TRUE;

open cursor1;

LOOP1: loop

fetch cursor1 into v_col1;
if no_more_rows then
close cursor1;
leave LOOP1;
end if;

show columns from v_col1 where Field = col_name;

end loop LOOP1;

END $$


DELIMITER ;


In the last query
show columns from v_col1 where Field = col_name;

i go throuh the table names (for this insance it comes from v_col1 and in each table i check whether col_name exists)

Bu when i call procedure like
call find_columns('USERNAME')

it gives error vcol1 doesn't exists. I thinks it is a table name.

In oracle it is simple like For rec IN (Show tables)
loop
blah blahh
end loop

But i couldn't make it with mysql.

rpbouman said...

Hi Jazzy!

Glad to hear the article was useful to you ;)

Actually, I think the problem you are experiencing is down to the cursor declarations. As far as I know, MySQL cursors can be defined only using SELECT statement - not SHOW statements. Although the SHOW statements return a resultset, they are intended mostly for inspection by humans - you cannot use SHOW statements as if they are tables or subqueries (although you can use the resultset returned to the client by SHOW statements just like any other resultset).

There is a solution however: the information returned by the SHOW commands is in many cases available in the information_schema. The result returned by SHOW TABLES is roughly equivalent to a

SELECT * FROM information_schema.TABLES

statement, and the result returned by a SHOW FULL COLUMNS statement is roughly equivalent to the result returned by

SELECT * FROM information_schema.COLUMNS

Becuase information_schema.TABLES and information_schema.COLUMNS are actual tables, you can use these to define the cursors.

hope this helps, post back if you have any further questions :)

kind regards,

Roland Bouman

So, instead of using the

Anonymous said...

Thank very much you for your reply.

Yes with information_table it is very easy.
It is just one query.

select * from COLUMNS WHERE TABLE_SCHEMA = 'my_db_name'
AND
COLUMN_NAME = 'my_column_name'


But if i want to create a stored procedure for information_schema every time it gives the error "No database selected."
I begin with use information_schema
it gives the same error.
But for my other databases which i created before i can crate stored procedure..

For information_schema database i can not create a stored procedure.

Thank yo very much..

rpbouman said...

Hi jazzy!

You mean, a stored procedure inside the information schema? That's not possible I'm afraid. It is a read-only database.

Or do you mean that you do a

USE information_schema

right at the top of the procedure, but that the cursor declarations don't pick that up? I would have to look into that.

My tip for you to is to just define the cursors with explicitly qualified table names, so

SELECT * FROM information_schema.COLUMNS

instead of

SELECT * FROM COLUMNS

Cheers,

Roland

Naveen said...

hello sir

Its really a very useful tutorial. I just want to know if it is possible to perform update row wise in a loop?

Naveen

rpbouman said...

Naveen,

thanks for your kind words.

Yes, it is possible to perform an update inside the loop, but you need to do it with an ordinary update statement - you cannot use a WHERE CURRENT OF syntax like you would in Oracle.

Anonymous said...

Hi Roland,

Thanks for the detailed article.

I have spent most of my programming life steering away from loops of the kind you advocate, but I can clearly see your reasons for using such a loop, and I must agree with you in this instance.

I had a stored procedure (using a cursor) which was duplicating the final row of the results; I traced this back to the use of REPEAT...UNTIL. As soon as I implemented your approach, the problem was solved.

Who says you can't teach an old dog new tricks?!?!

Very much appreciated. Thanks.

Best regards,
Mike.

rpbouman said...

Mike,

I understand your reservation towards using unstructured loops, and I agree. The way I see it, the real problem is that MySQL does not implement cursor FOR loops.

FOR (var1, ..., varN) IN cursorname LOOP
...
END LOOP;

This would solve a lot of problems that are now reponsible for a lot of clumsy code.

(bye-bye OPEN/CLOSE, HANDLER FOR NOT FOUND, v_done, LEAVE, FETCH)

Anonymous said...

Great article Roland, It helped me a lot with my work. As a C++/assembler programmer focused on hardware, always had a bad time with SQL and stuff.

I have this one that is driving me crazy, no matter what I do, loop quits always after the first iteration. I can't understand why, I´ve tested my temporary table and there are like 400 registers there, so, why it quits after first register?? take a look:

DECLARE done INT DEFAULT 0;
DECLARE idx, ssr,stoptime, stflag, ssrtst,
backssr,backstop,backidx,numpars INT(11);
DECLARE cur1 CURSOR FOR SELECT * FROM copiaeventos;
DECLARE CONTINUE HANDLER FOR 1329 SET done = 1; -- nodata handler

DROP TEMPORARY TABLE IF EXISTS copiaeventos;
CREATE TEMPORARY TABLE copiaeventos AS
SELECT eventos.Indice, eventos.SSR, eventos.Stop
FROM eventos
WHERE eventos.Data = '2007/12/07' and eventos.FDU = 1;

set stflag=0;
set numpars=0;
OPEN cur1;

-- pega evento
FETCH cur1 INTO idx,ssr,stoptime;
while not done do

select done,idx,ssr,stoptime;

-- testa se o codigo e uma parada
select paradas.Codigo into ssrtst from paradas WHERE
paradas.Efeito = 1 and paradas.Tipo = 1 and paradas.Codigo = ssr;

if (ssrtst = ssr) then -- se for parada..
set numpars = numpars + 1;
select numpars;
if (stflag = 0) THEN
set backssr = ssr;
set backidx = idx;
set backstop = stoptime;
set stflag=1; -- marca a parada
else
-- guarda o tempo da parada na bd de eventos
update eventos set eventos.FailTime = stoptime - backstop
where eventos.Indice = backidx;

-- guarda os valores atuais (tem parada)
set backssr = ssr;
set backidx = idx;
set backstop = stoptime;
end IF;
elseif (stflag =1) THEN
update eventos set eventos.FailTime = stoptime - backstop
where eventos.Indice = backidx;
set stflag = 0; -- nao tem mais parada
end if;

FETCH cur1 INTO idx,ssr,stoptime;
select done,idx,ssr,stoptime;

end while;

CLOSE cur1;

select numpars, idx, ssr, stoptime;
END

rpbouman said...

Hi Ricardo,

I'm glad the article was useful for you. Now considering your issue:

The most likely explanation is that this:

-- testa se o codigo e uma parada
select paradas.Codigo into ssrtst from paradas WHERE
paradas.Efeito = 1 and paradas.Tipo = 1 and paradas.Codigo = ssr;

statement selects now rows. That will raise the no data found condition too. The handler does not know whether it was triggered by the FETCH or by that SELECT - it doesn't care *which* data wasn't found.

That said - it looks to me like you are making it too complex. Take a look at my "Refactoring Cursors" article and you'll probably see what I mean:

http://rpbouman.blogspot.com/2006/09/refactoring-mysql-cursors.html

You will recognize the pattern I describe in that article in your code. Let me know if that was helpful for you.

kind regards,

Roland.

Anonymous said...

Hi Roland,

Thanks a lot for your help, really appreciate your efforts in using your spare time to help others, very kind of you.

On my SP, I didn't know that ANY select would trigger the NO DATA condition, Is a little misleading that acts that way cause its a NULL result from a SELECT statement, nothing to do with an end of file event. I tought that NO DATA condition were something like EOF in C++.

This was my very first SP ever, and I have to admit that was very hard to implement due to my lack of experience in SQL.

On your other article, was really very helpful, it explained everything that I needed to finish my clumsy SP. Again, many thanks and continue your splendid job, maybe you should be writing a book on MYSQL!!

Regards from sunny Brazil,

Ricardo.

Anonymous said...

Thanks for some excellent posts about mysql.

You wrote that the handler is looking on the session not at the actually query, so if you have a select statement in your cursor look that returns no rows, the cursor loop ends.

Is there a solution to this? I mean, if it possible to have a select statement that potential can return no rows in the loop?

Please write a short example.

rpbouman said...

"You wrote that the handler is looking on the session not at the actually query"

No - not the session. It works like this: any SELECT statement or FETCH statement can cause the NOT FOUND condition to arise (which happens when there are no rows). Then, a sort of search process starts looking upward, from the inner to the outer BEGIN..END blocks to find the 'nearest' HANDLER for NOT FOUND. If one is found, then the HANDLER action is executed.

"so if you have a select statement in your cursor look that returns no rows, the cursor loop ends"

That is correct - if you do not take any precautions, the loop will end in a typical cursor loop.

"Is there a solution to this? I mean, if it possible to have a select statement that potential can return no rows in the loop?"

Oh yes ;-)

See my "nesting cursor loops" blog

http://rpbouman.blogspot.com/2005/10/nesting-mysql-cursor-loops.html

Let me know if you need more help

Ben said...

I'd like to offer up a counter-example which I think is simpler and more intuitive, at least for someone with a background in procedural/OO programming, using WHILE.

The main idea is to grab the first row out of the cursor before the start of the loop, use a WHILE, and get the next one at the bottom of the loop. If there are no rows to get, the loop body won't execute. Also note the close moved out of the loop, no need to have it in there.

This works well (please forgive the complexity of the query I am using, I left it in for completeness). Sure, you have to repeat your FETCH, but I think overall it's a lot simpler and therefore easier to maintain.

DELIMITER //
CREATE PROCEDURE updateCounts()
BEGIN
  DECLARE done BOOL DEFAULT FALSE;
  DECLARE tid int;
  DECLARE cc int;
  DECLARE rec datetime;
  DECLARE pcreat datetime;

  DECLARE cur CURSOR FOR
    SELECT rest of query removed ...;

  -- triggered at end of cursor result set
  DECLARE CONTINUE HANDLER FOR NOT FOUND
     SET done = TRUE;

  OPEN cur;
-- get first row
  FETCH cur INTO tid, cc, rec, pcreat;
  WHILE NOT done DO
    UPDATE CommentTopics
    SET  commentCount=cc, updated = rec
      WHERE commentTopicId = tid;

    -- get next row
    FETCH cur INTO tid, cc, rec, pcreat;
  END WHILE;
  
  CLOSE cur;
END;
//

rpbouman said...

Hi!

Thanks for offering your point of view. Indeed I wrote in the article that WHILE is a lot better than REPEAT for cursor loops, the main improvement of course being the single check of the loop control variable (2 checks for REPEAT).

Although I appreciate your point of view in preferring WHILE, I still favor LOOP. De repeated FETCH may not seem a big deal, but becomes more of a problem as the list of columns/variables grows.

With regard to the CLOSE being inside the loop: I understand your objection, and I can imagine some people favouring a style that keeps the CLOSE out of the loop.

While I agree that from a structural point of view it might not make sense to keep the CLOSE inside the loop, it does so from a practical point of view, as all of the loop set up and tear down reside in the same region of the code.

Anonymous said...

how can i alter table
from cursorsor value.
For example,

declare cur1 cursor for select
table_name from information_schema.tables
where table_schema='db_name';
fetch cur1 into abc;
alter table abc ....;
here mysql not interpret value of abc.
it interpret abc as a table name.
What's the solution

rpbouman said...

Hi Ashik,

use dynamic SQL. You can do that using the PREPARE syntax.

I wrote about it here.

Anonymous said...

Hi Roland,

Thanks for your notes,

i do work in postgres database and a part of a open source promotion society, i had a training session in Mysql for college teachers. while preparing for the training i was bit confused about the cursors comparing with other databases. by the time i thru your blog and its really worth for a fresher like me in mysql.

Thanks,

Riaoss

Eduardo Rossi said...

My dear Holand.

Do U really live in this planet? (rsrs) Just kidding... Your blog is amazing! Like others programmers, DBA's and so on, I must learn about MySQL and forget all about MSSQL (sad, isn't it?)

One more time, congratulations!

:)

Anonymous said...

Great article, helped me finish my assignment!

Todor

EthR said...

Excellent post - thank you very much

stephen b said...

Roland,

just a comment and a small time suggestion:
I tried loading your examples and it did not work with a script window from the mysql windows gui nor with emacs. after 1 hour of playing with it, I finally tried with a delimiter change and it worked.
If the examples work only under a delimiter change on all systems, please, consider showing that explicitly. maybe it will save time for dummies like me.

rpbouman said...

Hi Stephen!

the delimiter is not shown, because it is not part of the stored procedure syntax. It is something that your client tool uses to figure out when to send the text to the server. I suppose it could be included, but I didn't on purpose because I wanted to focus on the stored procedure syntax only.

I agree it can be confusing for newbies, but the article is intended for people that already know how to create them, and just have trouble setting up a cursor loop.

Anyway, thanks for the suggestion, and kind regards,

Roland.

Anonymous said...

Nice write up. Im not an sproc junky but I needed this bit of info and its nice to see someone obsess on stuff like this. And yeah why not put it in the manual?

Anonymous said...

first i said it a very usefull articles and i like it, comments, and your replies.

second i have a question : can you give me sample(s) mysql sp that kind like a bulk collect in oracle ? can i use cursor with variable ?
ex:
procedure proc1( in V_NAME varchar(20), in V_CITY varchar(20))
...
declare cursor cur1 for select * from mytable where name=V_NAME and city=V_CITY
...
...
begin
..
end;

?
thanks

rpbouman said...

Anonymous,

thanks! glad you like it.

MySQL does not have a bulk collect feature. You can do bulk load though from a text file using LOAD DATA INFILE. I believe that does not work from inside a stored routine though.

You can write variable references in the cursor SELECT statement. In the future, you should probably just try those things out yourself.

hth,

Roland.

Anonymous said...

Bulk Collect.... Need Example

hi...

How to implement the bulk collect like oracle in MySQL?
If any one know.. Give me the example code..

Thanks
fredy

rpbouman said...

Fredy, MySQL does not have bulk collect. Google LOAD DATA INFILE if you want to learrn about mysql bulk load.

Anonymous said...

thanks Roland,
forgive me sent you again the same question with name fredy, that's me.

So will you give me example for that LOAD DATA INFILE ??

Thanks again

Fredy

rpbouman said...

Sure! Here's a link to the example:

http://lmgtfy.com/?q=LOAD+DATA+INFILE&l=1

PrinceDhampir said...

Hi Roland,

Interesting post i have a problem with a cursor loop, the loop ends after the first fetch everytime i call the sotre procedure, in example

DECLARE done boolean default false;
DECLARE cursor1 CURSOR FOR
SELECT ... FROM [tables]
Where condition;

DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = true;

OPEN cursor1;

cursor1_loop:LOOP
fetch cursor1 into [variables]
IF done THEN
select done;
LEAVE cursor1_loop;
END IF;
[operations]
END LOOP cursor1_loop;
Close cursor1;

I have read about some problem with empty rows in the query that sets the done variable to true even if the query stilll have rows so what can i do to solve it?.

Thanks in advance.

Giovanni

rpbouman said...

Hi PrinceDhampir,

"I have read about some problem with empty rows in the query that sets the done variable to true even if the query stilll have rows"

never heard of that. link?

It's possible that in you [operations] block, a no data found exception is raised, inadvertently setting the done variable to true. You can quickly check that by doing a

SELECT done;

right before the END LOOP; If that is the case you can start troubleshooting what statement causes that. To prevent it, please check out my "NESTING CURSOR LOOPS" post (http://rpbouman.blogspot.com/2005/10/nesting-mysql-cursor-loops.html) to get a few ideas on how to remedy that. I would also recommend to read my "Refactoring cursors" post (http://rpbouman.blogspot.com/2006/09/refactoring-mysql-cursors.html) if you really do have a nested cursor (or a select statement inside the cursor loop).

HTH,

Roland

Doug said...

Hi Roland,

As a C/C++/other languages programmer, the more I use SQL stored procedures, the less I like them.

Using exception handling is ugly, to put it mildly. Is there any way to place the updating of the done flag with the code where it belongs? Even for the simple code given here and in s. 12.6.6 of the manual, it took a while to see what was going on (on the surface, it looks like you have infinite loops.)

Surely there should be some good, though less verbose, alternative to SQL. This reminds me of all the reasons I hate COBOL!

rpbouman said...

Doug, it is what it is. I don't like standard SQL cursor handling syntax either. In Oracle you can write it like

for rec in (
select col1, col2, ...
from ...
) loop

..code goes here, use rec.field to fetch values...

end;

Which is a lot nicer. But alas, MySQL sticks to the standard which happens to be really ugly.

Anonymous said...

Hi Roland,

How can I run/execute stored procedure. I am using wamp Server. Your answer will help lot.

Thanks in advance

Anonymous said...

Hi Roland,

Your Article really help me to understand SP concept. Thanks for posting this.

I have to find product Likes score in one of my project. Formula is as below:

Like Score = Like count / Total Votes

Likes table has three fields:
id int(11), likes(enum: 'like','dislike'), productid int(11);

Product table has two fields (for test):
productid int(11), pname varchar(25);


I want to write SP using cursor to execute above formula. And product Like score would be update for each product in different table.

How can I write store procedure for above requirement. Please help me for same.

Thanks in advance

rpbouman said...

@anonymous1: you can execute stored procedures on WAMp just like you can on regular MySQL. Just open a command line and do:

call spname();

where spname is the name of the stored procedure in the current schema or

call schemaname.spname()

to run the procedure called spname in the database (schema) with the name schemaname.

@anonymous2: why would you want to use a cursor for that? Also, what do you mean exaclty by "update for each product in different table."?

suppose you have a table to store the likes as measured by a particular point in time, you could do:

CREATE TABLE product_likes(
productid int
, as_of date
, like_score decimal(3,2)
)

insert
into product_likes
([product_id, as_of, like_score)
select product_id, current_date(), sum(if(likes='like',1,0))/count(likes)
from likes_table
group by productid

Rumah Dijual Jakarta said...

Its like you read my mind! You seem to know so much about this, like you wrote the book in it or something. I think that you can do with some pics to drive the message home a little bit, but instead of that, this is great blog. A fantastic read. I'll certainly be back.

Anonymous said...

But It shows a warning when we execute the whole statement.How to deal with this problem?

rpbouman said...

@anonymous, what executes which warning?

Walter said...

It was a bit long. Either way, I used the WHILE, because it made better sense than the REPEAT for cursors. But I too was discontent with the two FETCH. I know why it was needed, but still, there had to be a better way.

Thanks for the LOOP suggestion. Makes more sense.

Walter said...

It was a bit long. Either way, I used the WHILE, because it made better sense than the REPEAT for cursors. But I too was discontent with the two FETCH. I know why it was needed, but still, there had to be a better way.

Thanks for the LOOP suggestion. Makes more sense.

Walter said...

It was a bit long. Either way, I used the WHILE, because it made better sense than the REPEAT for cursors. But I too was discontent with the two FETCH. I know why it was needed, but still, there had to be a better way.

Thanks for the LOOP suggestion. Makes more sense.

Unknown said...

Thanks for this... okay this is now over 7yrs old but hey ho, glad it was here :)

Unknown said...

Beautiful tutorial.

Very detailed. Thank you very much.

Even after 7 years, it solve my doubts.

Thank you for share this knowledge.

David Beroff said...

Great post; thank you!

Take a look at the following snippet. I didn't make it up myself, I took it from the Cursors section in the MySQL Reference manual.

At some point in the last ten years, the maintainers of said manual eventually came to agree with your conclusions! :-) (About the only substantial difference is they close the cursor(s) after the loop, rather than in the conditional, so as to balance the open and close at the same level of indentation, and I must admit that I prefer their approach.)

rpbouman said...

@David Beroff,

thank you so much for pointing this out! I didn't realize the manual was changed, but I'm happy they did. I modified the blog post text to inform the reader. Thanks again!

KasH. said...

This may or may not already have been mentioned..;

In your last and final example you open the cursor outside the loop, and close it inside. May i suggest closing the cursor at the same level you opened it?

So..;


OPEN cur;

blah: LOOP
IF condition THEN
LEAVE blah;
END IF

-- Do some work with cur...
END LOOP

CLOSE cur;

rpbouman said...

KasH, thanks for the comment. Of course you may suggest it :)

You don't give an explicit argument as to why you think your suggestion is better but I will try and guess.

I am assuming you feel the structural entry of the loop, formed by the LOOP keyword must coincide with the opening of the cursor, since the loop exists only to traverse the cursor. And of course, I agree. I'm guessing the next step in your thinking is that the structural end of the loop should coincide with closing the cursor. And again I would I agree.

Where we probably differ in opinion is what exactly constitures the "end of the loop". You feel it should be the END LOOP keywords, because syntactically, LOOP...END LOOP forms one statement block. I on the other hand am not that interested in the syntactic structure. I am interested in the actual process of looping. So I feel that the actual, functional end of the loop is tied to the LEAVE statement. To me, END LOOP is only the demarcation of the loop body. It tells us nothing where the loop actually ends.

So, for me it is more natural to close the cursor in the same block of code that determines that the cursor is exhausted.

My style has what I feel is an extra advantage: all of the cursor control, and all of the loop control form one contiuous segment of code. So I have no trouble seeing both the OPEN and CLOSE of the cursor, and it is very easy to convince myself they match the same cursor variable. If you CLOSE the cursor after END LOOP, you might need to scroll the code, or use an editor that supports code folding to see this.

I hope this helps.

David Beroff said...

That's kind of what I was trying to say in the last sentence of my last post, as well.

If code is so large that one has to scroll or fold to see that, then maybe it's time to break some of it out into a separate component.

rpbouman said...

That's ok. I still disagree though :)

END LOOP is just the syntactical end of the loop. What actually ends at END LOOP is the loop body, not the loop itself. The loop or rather, the looping ends at LEAVE. So that's where I want to tear down and clean up.

We can agree to disagree.

Wes said...

Nice article, though it has not persuaded me to abandon my decades-long practice of using the while-with-duplicate-fetch solution. :-) Your example of the unstructured-loop solution does show that all loop control is kept nicely in one place, but due to its small size it does not illustrate the true danger of unstructured loops: every single line of code in the loop must be examined to be sure there is no other exit, and the number of lines of code that can appear in a loop is in effect unbounded.

The duplication of the fetch is a price I am willing to pay for the lack of a proper cursor control statement in MySQL (as you also observe). In contrast to the unstructured-loop approach, only two lines of code must be examined to check for correctness.

One thing I particularly appreciate about your article is its thoughtful attention to style. So I feel I must address a stylistic issue in your recommended change from "done" to "hasMoreRows". It requires you to initially assert "DECLARE hasMoreRows BOOL DEFAULT TRUE;", which is not, in general, the case. A more accurate name would be "fetchHasNotFailed", but that seems a tad cumbersome compared to "NOT done".

Eduardo Cooper said...

Very nice article! Thank you a lot.

Van said...

Very nicely written article. Even years later it's useful as I try to track down why I'm getting a "call out of order" error message. Thanks!

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...