LOOP..END LOOP, REPEAT..UNTIL..END REPEAT and WHILE..DO..END WHILE.Now, each of these essentially do the same: they denote a region in the stored procedure that will repeated at runtime. The differences between these constructs have to do with the way the repetition is terminated.
Unstructured loops: LOOP..END LOOP
LOOP..END LOOP is the simplest of the three loop constructs. LOOP denotes the start of the repetitive region, END LOOP denotes the end of the repetitive region, and anything inbetween those two markers is repeated. Period.
procedure p_count()
begin
declare
v_counter
int unsigned
default 0
;
loop -- start a repetitive region
set v_counter := v_counter + 1; -- just keep counting
end loop; -- end the repetitive region
end;
If the programmer does not take any special precautions, the repetition will be inifinte, which is not a good thing.
Lucky for the programmer, MySPL defines the
LEAVE construct. At runtime LEAVE performs a kind of jump outside a region of code, resuming program execution after that particular region. When the LEAVE is executed from inside the body of a LOOP..END LOOP, the LEAVE can be used to jump right behind the END LOOP, effectively terminating repetition. Of course, just writing LEAVE is no good, as it effectively undoes the repetitive effect of the loop we wanted to achieve in the first place. But a conditionally executed LEAVE is very useful. Check out this snippet:
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
myLoop: loop -- start the repetitive region named 'myLoop'
if -- check if we have counted enough
v_counter = v_max_counter
then
leave myLoop; -- terminate repetition of region 'myLoop'
end if;
set v_counter := v_counter + 1; -- just keep counting
end loop myLoop; -- end the repetitive region 'myLoop'
-- when the loop has been left, execution continues from here
end;
This snippet differs in two ways from the previous one. Apart from the introduction of the
IF..THEN..LEAVE..END IF, this snippet explicitly labels the repeated region myLoop. The region is labeled by prefixing the start of the region (the LOOP keyword in this case) by the identifier for the region ('myLoop') followed by a colon. We really must label the LOOP..END LOOP, else LEAVE does not know where to jump to, or rather, what to 'jump out of' (=leave).Because the code that handles loop termination (
IF..THEN..LEAVE..END IF in our snippt) is syntactically not part of the LOOP..END LOOP construct, LOOP..END LOOP is considered to be an unstructured loop construct.Structured loops
The other MySQL loop constructs,
REPEAT..UNTIL..END REPEAT and WHILE..DO..END WHILE are structured loop constructs. Their syntax expliclty requires a boolean expression, a condition, that controls loop termination. This compels the programmer to think about loop termination right away, providing some protection against writing infinite loops. Also, these loops have a well-defined, immutable exit point.For a
REPEAT..UNTIL..END REPEAT loop, the condition is checked after execution of the statements between the REPEAT and UNTIL markers. If the condition evaluates to TRUE, the loop terminates, and program execution resumes after the END REPEAT marker. If the condition does not evaluate to TRUE, that is, if it evaluates to either FALSE or NULL, program execution is repeated starting at the statement after the REPEAT.
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
repeat -- marks the start of the repeat loop
set v_counter := v_counter + 1; -- just keep counting
until -- marks the end of the loop body
v_counter = v_max_counter -- this boolean expression constitutes the loop control condition
end repeat; -- marks the end of the loop
end;
For a
WHILE..DO..END WHILE loop, the condition is checked before execution of the statements between the DO and END WHILE markers. If the condition does not evaluate to TRUE, that is, if it evaluates to FALSE or NULL, the loop terminates, and program execution resumes after the END WHILE marker. If the condition evaluates to TRUE, the loop is entered and the statements between the DO and END WHILE markers are executed. When program execution reaches de END WHILE marker, the condition between the WHILE and DO markers is re-evaluated, and it all starts all over again.
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
while -- marks the start of the loop
v_counter <> v_max_counter -- this boolean expression constitutes the loop control condition
do -- marks the start of the loop body
set v_counter := v_counter + 1; -- just keep counting
end while; -- marks the end of the loop body
end;
Like with
LOOP..END LOOP, you can still write LEAVE inside such a loop, but this is considered bad practice. In general, Good Code Writing Rules demand that code blocks, including loops, have a single point of entry, and a single exit. By adhering to this rule, you ensure that the loops are easier to maintain, and easier to debug. Ignoring this rule can lead to 'spaghetti' code, that's code that, instead of maintaining a clear sequential flow of program execution, jumps back and forth.Why people favour structured loops
Generally, people favour structured loops over unstructured ones. The syntactic design of the structured loop constructs makes it easy to see where the repetition terminates and what condition controls termination. The only things that the programmer needs to do is make sure the condition is right. Also, the programmer must refrain from writing some additional
LEAVE inside the loop. So, that's one thing programmers must do, and one thing the must not do.For an unstructured loop, the programmer must write some kind of conditional statement that checks the condition. Apart from checking the condition, the programmer must also actually write the code that terminates the loop. In an unstructured loop, the condition check could appear anywhere inside the loop, so one really must read the entire loop code to see where the exit point is and what condition controls it. As for structured loops, programmers should refrain from writing multiple condition checks and exit points. This holds true for both unstructured and structured loops.
Loops and cursors
In my previous post, I illustrated
CURSOR traversal in both Oracle and MySQL. I used an unstructured LOOP..END LOOP construct there in both examples. In fact, I usually handle cursors in MySPL using the unstructured loop construct. I'll hope I can explain why.Take a look at the following snippet. I didn't make it up myself, I took it from the Cursors section in the MySQL Reference manual.
UPDATE: David Beroff kindly pointed out to me that at some point during the past decade that this post has been online, the example in the MySQL manual changed and now uses an
I did simplify it a little to keep a focus on the problem at hand, using a loop and a cursor:LOOP...END LOOP construct. I don't know if this post had anything to do with that, but I'm happy it was changed anyway. Yay!
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;
OPEN cur1;
REPEAT
FETCH cur1 INTO a, b;
IF NOT done THEN
INSERT
INTO test.t3
VALUES (a,b);
END IF;
UNTIL
done
END REPEAT;
CLOSE cur1;
END;
Basically, what's happening here's that we have a cursor,
cur1 that selects data from the columns id and data of the table t1. The cursor is traversed using an REPEAT..UNTIL..END REPEAT loop, and for each FETCH, data that is taken from the cursor is used to perform an INSERT into table t3. Of course, we could've achieved this in a single INSERT INTO..SELECT..FROM statement, but that's beside the point. I want to focus on the REPEAT..UNTIL..END REPEAT loop and the CURSOR traversal.So, how is this loop controlled? Well, being a
REPEAT..UNTIL..END REPEAT loop, the loop body is entered at least once, and repeated until the condition becomes TRUE. Here, the condition itself is very simple, it consists of a simple evaluation of the INT type variable done. By default, done is zero, wich is equivalent to FALSE when evaluated as a boolean value. The entire control of the loop depends on the value of the done variable being set to an equivalent of TRUE when the cursor is exhausted. When that is the case, the very next time that the UNTIL condition is evaluated, the loop will terminate.How is cursor exhaustion detected in MySPL? Well, I described in my previous post that you need a
CONTINUE HANDLER for the builtin NOT FOUND condition to do this. In this snippet, this very device is used, with a slight modification: instead of the builtin NOT FOUND condition, a SQLSTATE '02000' is used. This covers the same situation as the builtin NOT FOUND condition.The handler statement itself is very simple. It just assigns a value of 1 to the
done variable. Because 1 is an equivalent of TRUE, the loop will terminate the next time that the UNTIL condition is evaluated.Superficially all this seems to make sense, but I'll try and convince you that this is actually all wrong. Maybe that's a bit too strong. Let's say that I think that this is not as good as it could be.
What is really happening here?
Take a look at the body of the loop; I mean the statements inbetween the
REPEAT and the UNTIL markers. First, a record is fetched from the cursor. Then, an IF..THEN statement checks if the done variable is still equivalent to FALSE, and if so, the INSERT is performed.Now, why would it be neccessary to use an
IF statement after the FETCH? Well, suppose the resultset retrieved by the cursor comprises exactly one record. When the cursor is opened, the cursor's internal record pointer will be positioned right before that first and only record. Then, the loop is entered, and the fetch is performed for the first time. The record's field values are transferred to the variables, and the cursor's record pointer will be moved to the next position, right after the record. Execution continues with the IF. The tested condition, NOT done, will evaluate to TRUE, because the done variable is still zero. So, in the first loop cycle, the INSERT is performed. We have arrived at the end of the loop body, reaching the UNTIL. Because the done variable is still equivalent to FALSE, the loop is re-entered, and it starts all over again right after the REPEAT marker defining the beginning of the loop.We are now about to
FETCH for the second time. However, once we attempt that, an error occurs because the cursor is exhausted. Remember, the query retrieved only one record, which was fetched in the first loop cycle. But wait, didn't we declare a HANDLER to deal with this situation? Yes, we did indeed! So, procedure execution is resumed at the handler statement, which assigns a one to the done variable. Once the assignment is done, the procedure execution resumes right after the FETCH that raised the error condition. This is because we are dealing with a CONTINUE handler here. After procedure execution resumes, the IF is executed for the second time. This time NOT done will evaluate to FALSE, so the INSERT is not performed this time. This makes sense, after all, we already exhausted our cursor during the first loop cycle.Now we know why the
IF's there. When we would just write an unconditionally executed INSERT right after the fetch, we would be inserting a record anyway, even though the cursor's already exhausted. The values that we would be inserting are in fact the same values that were inserted during the first loop cycle, because the FETCH is the only statement that assigns any values to the a and b variables.After the
IF..END IF, we encounter the UNTIL for the second time. This time, the done variable will evaluate to TRUE wich will terminate the loop.Sorry, but why is this so bad?
So, what's not good about this? There are two things that I'm thinking of. One is a matter of style, and not that important, altough I think most programmers can appreciate it. The second has to do with efficiency, and is as far as I can see not really open to any debate. (But hey, feel free to do post a comment; as some advertising bots seem to have discovered, comments on this blog can be made anonymously).
It's 'just' a matter of style, but I feel that
REPEAT is a construct you typically use to do some work that has to be done at least one time. After that, the condition may require the loop to be re-entered until all of the work has been done. REPEAT is so suitable for this type of problem because it checks the loop control condition after the loop body, so you are guaranteed to have at least one loop cycle. In my opinion, cursor loops usually do not match this pattern. Most of the times, you can't tell on beforehand if the cursor will retrieve any records at all, and in those cases, I do not want to enter the loop at all. Of course, the snippet is resilient to the case, so it's really a matter of style, not of correctness. If the cursor query would not return any rows at all, the snippet would still work, but to me, a REPEAT is like a marker saying: "Ho there, here's some work that needs to be done, and maybe we'll have to repeat it a couple of times". Not surprisingly, I really almost never have the need to write any loop like this.I feel quite a bit more strong about another thing. I think that it is really not very good to have to different checks that test the same condition. I mean, we have both the
IF and the UNTIL checking essentially an inverse condition. And it's not just something that has to do with this particular snippet; it really has to do with this type of cursor loop in general. Because the work you do in the loop relies on a successful FETCH, you realy need the IF. To me, it seems as if all the structural benefits of the REPEAT loop are superceded by essentially being forced to duplicate the loop control check. What makes it a bit worse, is that we really have to check inverse conditions, because the UNTIL wants the condition to be FALSE in order to repeat the loop and keep going, whereas the IF wants the condition to be TRUE to keep going.Can't you solve this with a WHILE loop?
Well, in part, but I still feel we can do even better. Let's give it I go, I'll show you what I mean. Here's an equivalent snippet, with the
REPEAT rewritten using WHILE:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;
OPEN cur1;
FETCH cur1 INTO a, b;
WHILE NOT done DO
INSERT
INTO test.t3
VALUES (a,b);
FETCH cur1 INTO a, b;
END WHILE;
CLOSE cur1;
END;
So, how is this loop controlled in this case? Being a
WHILE..DO..END WHILE loop, the loop body is entered only when the condition evaluates to TRUE. Repetition will likewise occur only if the condition is still TRUE when re-entering the loop. So this solves that "style thingie" I just mentioned. What's also good about this solution is this: Because we know that the cursor is not exhausted once we're inside the loop, we can do away with the IF alltogether. That's one inverse check less. So what's the objection now then?
Although this is an improvement over
REPEAT, there's still two things bothering me here. One is really really minor, and has more to do with my translation of
REPEAT into a WHILE than with anything else. For simplicity, I retained the original done variable. However, this makes me test NOT done in the WHILE. This is not bad, but we could do a lot better by reversing the semantics here. If we would alter it's declaration to, say DECLARE hasMoreRows BOOL DEFAULT TRUE;and likewise redeclare the handler statements to:
SET hasMoreRows = FALSE;we could rewrite the loop start as
WHILE hasMoreRows DOSo that's quite nicely solved I think:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE hasMoreRows BOOL DEFAULT TRUE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET hasMoreRows = FALSE;
OPEN cur1;
FETCH cur1 INTO a, b;
WHILE hasMoreRows DO
INSERT
INTO test.t3
VALUES (a,b);
FETCH cur1 INTO a, b;
END WHILE;
CLOSE cur1;
END;
What's still bothering me though is the handling of the
FETCH statement. The snippet uses two FETCH statements. One's inside the loop body, wich is good, because we set up the loop in the first place to browse through the records in the cursor. The other one's outside of the loop, just before the we enter it. Why do we need two FETCH statements then? Let's concentrate on the one inside the loop body, and forget about the other one for a while. We already established we really need one inside the loop in order to traverse the cursor, and you will see why we need one outside the loop in a few moments.First thing that we can observe about the 2nd
FETCH is it's placement. It's immediately at the bottom of the loop body, right before the END WHILE marker. This is different from the situation in the REPEAT..UNTIL..END REPEAT snippet. There, the FETCH statement was at the top of the loop body, right behind the REPEAT marker. Does it make sense to put it at the bottom? When we reason from the point of view of the work that has to be done inside the loop, it certainly doesn't. Remember, we are browsing through all the records in the cursor, and INSERTing data from the cursor record into another table. So how does it make sense to FETCH the data after the INSERT? Imagine the
FETCH would be at the top of the loop, right where it was when we had the REPEAT loop, and pretend that the FETCH immediately preceding the loop isn't there. Now, suppose that query we used to define the cursor yields no records at all. The cursor would be opened, and because the hasMoreRows variable defaults to TRUE, the loop would be entered. Now, we are pretending the FETCH is right on top of the loop body, so we do the FETCH, and this results in an error, because the cursor's exhausted. Procedure execution would then be transferred to the HANDLER statement, wich SETs the hasMoreRows variable to FALSE, after wich procedure execution is resumed...right after the FETCH. Because we are already inside the loop, there's nothing that prevents the INSERT from being executed, wich is clearly not what we intended to do at all. Repetition is terminated only after the first loop cycle completes, wich simply is too late by one loop cycle.So, now we know why the
FETCH must be placed at the bottom of the loop. If it is placed there, cursor exhaustion is immediately handled by not entering the next loop cycle, and without executing any intervening statements. This also explains why we must have a FETCH immediately preceding the entire loop. If we place the FETCH that we need inside the loop at the bottom, we still need to have FETCHed at least once before entering the first loop cycle (or alternatively, not entering the loop at all) to initialize all the variables properly. The downside is of course that we need to duplicate the fetch. This would not be so bad if it were'nt for all those columns. Suppose you have 20 columns, and you accidentally swap a few columns, fetching the wrong values into your variables...
How an unstructured loop remedies all this
Generally, I do not like to write unstructured loops, but in this case, I keep coming to the conclusion that a simple
LOOP..END LOOP is just The Best Solution. Check it out:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;
OPEN cur1;
myLoop: LOOP
FETCH cur1 INTO a, b;
IF done THEN
CLOSE cur1;
LEAVE myLoop;
END IF;
INSERT
INTO test.t3
VALUES (a,b);
END LOOP;
END;
It's not really difficult to see how this loop is controlled. As in the
REPEAT example, the loop is always entered. Right after it is entered, the FETCH is executed, and should the cursor be exhausted, the HANDLER is executed, setting the done variable to TRUE. Until now, this is similar to the REPEAT example. Then, the IF is executed. Instead of conditionally doing the actual work when NOT done as we saw in the REPEAT example, we use it to CLOSE the cursor, and LEAVE the loop only in case we're done. This pattern ensures that we keep all the statements that have to do with cursor handling together. After the END IF, we can do all the real work we need to perform inside the loop.Summary
So, to summarize it all, we don't need the double check like we saw with the
REPEAT; also, we don't need the extra FETCH we saw with the WHILE. The price we have to pay is that the loop is unstructured: your fellow programmers will sue you, and call you names behind your back. Seriously, as long as you keep the
OPENstatment, the start of the LOOP and the IF..END IF together, and make sure that the IF..END IF contains only those statements that have to do with cleaning up after the cursor traversal, you should be allright. One other concern is the loop label. When handling multiple or nesting loops, you should ensure not to mismatch the labels, but that's about it.