LOOP..END LOOP
, REPEAT..UNTIL..END REPEAT
and WHILE..DO..END WHILE
.Now, each of these essentially do the same: they denote a region in the stored procedure that will repeated at runtime. The differences between these constructs have to do with the way the repetition is terminated.
Unstructured loops: LOOP..END LOOP
LOOP..END LOOP
is the simplest of the three loop constructs. LOOP
denotes the start of the repetitive region, END LOOP
denotes the end of the repetitive region, and anything inbetween those two markers is repeated. Period.
procedure p_count()
begin
declare
v_counter
int unsigned
default 0
;
loop -- start a repetitive region
set v_counter := v_counter + 1; -- just keep counting
end loop; -- end the repetitive region
end;
If the programmer does not take any special precautions, the repetition will be inifinte, which is not a good thing.
Lucky for the programmer, MySPL defines the
LEAVE
construct. At runtime LEAVE
performs a kind of jump outside a region of code, resuming program execution after that particular region. When the LEAVE
is executed from inside the body of a LOOP..END LOOP
, the LEAVE
can be used to jump right behind the END LOOP
, effectively terminating repetition. Of course, just writing LEAVE
is no good, as it effectively undoes the repetitive effect of the loop we wanted to achieve in the first place. But a conditionally executed LEAVE
is very useful. Check out this snippet:
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
myLoop: loop -- start the repetitive region named 'myLoop'
if -- check if we have counted enough
v_counter = v_max_counter
then
leave myLoop; -- terminate repetition of region 'myLoop'
end if;
set v_counter := v_counter + 1; -- just keep counting
end loop myLoop; -- end the repetitive region 'myLoop'
-- when the loop has been left, execution continues from here
end;
This snippet differs in two ways from the previous one. Apart from the introduction of the
IF..THEN..LEAVE..END IF
, this snippet explicitly labels the repeated region myLoop
. The region is labeled by prefixing the start of the region (the LOOP
keyword in this case) by the identifier for the region ('myLoop') followed by a colon. We really must label the LOOP..END LOOP
, else LEAVE
does not know where to jump to, or rather, what to 'jump out of' (=leave).Because the code that handles loop termination (
IF..THEN..LEAVE..END IF
in our snippt) is syntactically not part of the LOOP..END LOOP
construct, LOOP..END LOOP
is considered to be an unstructured loop construct.Structured loops
The other MySQL loop constructs,
REPEAT..UNTIL..END REPEAT
and WHILE..DO..END WHILE
are structured loop constructs. Their syntax expliclty requires a boolean expression, a condition, that controls loop termination. This compels the programmer to think about loop termination right away, providing some protection against writing infinite loops. Also, these loops have a well-defined, immutable exit point.For a
REPEAT..UNTIL..END REPEAT
loop, the condition is checked after execution of the statements between the REPEAT
and UNTIL
markers. If the condition evaluates to TRUE
, the loop terminates, and program execution resumes after the END REPEAT
marker. If the condition does not evaluate to TRUE
, that is, if it evaluates to either FALSE
or NULL
, program execution is repeated starting at the statement after the REPEAT
.
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
repeat -- marks the start of the repeat loop
set v_counter := v_counter + 1; -- just keep counting
until -- marks the end of the loop body
v_counter = v_max_counter -- this boolean expression constitutes the loop control condition
end repeat; -- marks the end of the loop
end;
For a
WHILE..DO..END WHILE
loop, the condition is checked before execution of the statements between the DO
and END WHILE
markers. If the condition does not evaluate to TRUE
, that is, if it evaluates to FALSE
or NULL
, the loop terminates, and program execution resumes after the END WHILE
marker. If the condition evaluates to TRUE
, the loop is entered and the statements between the DO
and END WHILE
markers are executed. When program execution reaches de END WHILE
marker, the condition between the WHILE
and DO
markers is re-evaluated, and it all starts all over again.
procedure p_count()
begin
declare
v_max_counter
int unsigned
default 10
;
declare
v_counter
int unsigned
default 0
;
while -- marks the start of the loop
v_counter <> v_max_counter -- this boolean expression constitutes the loop control condition
do -- marks the start of the loop body
set v_counter := v_counter + 1; -- just keep counting
end while; -- marks the end of the loop body
end;
Like with
LOOP..END LOOP
, you can still write LEAVE
inside such a loop, but this is considered bad practice. In general, Good Code Writing Rules demand that code blocks, including loops, have a single point of entry, and a single exit. By adhering to this rule, you ensure that the loops are easier to maintain, and easier to debug. Ignoring this rule can lead to 'spaghetti' code, that's code that, instead of maintaining a clear sequential flow of program execution, jumps back and forth.Why people favour structured loops
Generally, people favour structured loops over unstructured ones. The syntactic design of the structured loop constructs makes it easy to see where the repetition terminates and what condition controls termination. The only things that the programmer needs to do is make sure the condition is right. Also, the programmer must refrain from writing some additional
LEAVE
inside the loop. So, that's one thing programmers must do, and one thing the must not do.For an unstructured loop, the programmer must write some kind of conditional statement that checks the condition. Apart from checking the condition, the programmer must also actually write the code that terminates the loop. In an unstructured loop, the condition check could appear anywhere inside the loop, so one really must read the entire loop code to see where the exit point is and what condition controls it. As for structured loops, programmers should refrain from writing multiple condition checks and exit points. This holds true for both unstructured and structured loops.
Loops and cursors
In my previous post, I illustrated
CURSOR
traversal in both Oracle and MySQL. I used an unstructured LOOP..END LOOP
construct there in both examples. In fact, I usually handle cursors in MySPL using the unstructured loop construct. I'll hope I can explain why.Take a look at the following snippet. I didn't make it up myself, I took it from the Cursors section in the MySQL Reference manual.
UPDATE: David Beroff kindly pointed out to me that at some point during the past decade that this post has been online, the example in the MySQL manual changed and now uses an
I did simplify it a little to keep a focus on the problem at hand, using a loop and a cursor:LOOP...END LOOP
construct. I don't know if this post had anything to do with that, but I'm happy it was changed anyway. Yay!
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;
OPEN cur1;
REPEAT
FETCH cur1 INTO a, b;
IF NOT done THEN
INSERT
INTO test.t3
VALUES (a,b);
END IF;
UNTIL
done
END REPEAT;
CLOSE cur1;
END;
Basically, what's happening here's that we have a cursor,
cur1
that selects data from the columns id
and data
of the table t1
. The cursor is traversed using an REPEAT..UNTIL..END REPEAT
loop, and for each FETCH
, data that is taken from the cursor is used to perform an INSERT
into table t3
. Of course, we could've achieved this in a single INSERT INTO..SELECT..FROM
statement, but that's beside the point. I want to focus on the REPEAT..UNTIL..END REPEAT
loop and the CURSOR
traversal.So, how is this loop controlled? Well, being a
REPEAT..UNTIL..END REPEAT
loop, the loop body is entered at least once, and repeated until the condition becomes TRUE
. Here, the condition itself is very simple, it consists of a simple evaluation of the INT
type variable done
. By default, done
is zero, wich is equivalent to FALSE
when evaluated as a boolean value. The entire control of the loop depends on the value of the done
variable being set to an equivalent of TRUE
when the cursor is exhausted. When that is the case, the very next time that the UNTIL
condition is evaluated, the loop will terminate.How is cursor exhaustion detected in MySPL? Well, I described in my previous post that you need a
CONTINUE HANDLER
for the builtin NOT FOUND
condition to do this. In this snippet, this very device is used, with a slight modification: instead of the builtin NOT FOUND
condition, a SQLSTATE '02000'
is used. This covers the same situation as the builtin NOT FOUND
condition.The handler statement itself is very simple. It just assigns a value of 1 to the
done
variable. Because 1 is an equivalent of TRUE
, the loop will terminate the next time that the UNTIL
condition is evaluated.Superficially all this seems to make sense, but I'll try and convince you that this is actually all wrong. Maybe that's a bit too strong. Let's say that I think that this is not as good as it could be.
What is really happening here?
Take a look at the body of the loop; I mean the statements inbetween the
REPEAT
and the UNTIL
markers. First, a record is fetched from the cursor. Then, an IF..THEN
statement checks if the done
variable is still equivalent to FALSE
, and if so, the INSERT
is performed.Now, why would it be neccessary to use an
IF
statement after the FETCH
? Well, suppose the resultset retrieved by the cursor comprises exactly one record. When the cursor is opened, the cursor's internal record pointer will be positioned right before that first and only record. Then, the loop is entered, and the fetch is performed for the first time. The record's field values are transferred to the variables, and the cursor's record pointer will be moved to the next position, right after the record. Execution continues with the IF
. The tested condition, NOT done
, will evaluate to TRUE
, because the done
variable is still zero. So, in the first loop cycle, the INSERT
is performed. We have arrived at the end of the loop body, reaching the UNTIL
. Because the done
variable is still equivalent to FALSE
, the loop is re-entered, and it starts all over again right after the REPEAT
marker defining the beginning of the loop.We are now about to
FETCH
for the second time. However, once we attempt that, an error occurs because the cursor is exhausted. Remember, the query retrieved only one record, which was fetched in the first loop cycle. But wait, didn't we declare a HANDLER
to deal with this situation? Yes, we did indeed! So, procedure execution is resumed at the handler statement, which assigns a one to the done
variable. Once the assignment is done, the procedure execution resumes right after the FETCH
that raised the error condition. This is because we are dealing with a CONTINUE
handler here. After procedure execution resumes, the IF
is executed for the second time. This time NOT done
will evaluate to FALSE
, so the INSERT
is not performed this time. This makes sense, after all, we already exhausted our cursor during the first loop cycle.Now we know why the
IF
's there. When we would just write an unconditionally executed INSERT
right after the fetch, we would be inserting a record anyway, even though the cursor's already exhausted. The values that we would be inserting are in fact the same values that were inserted during the first loop cycle, because the FETCH
is the only statement that assigns any values to the a
and b
variables.After the
IF..END IF
, we encounter the UNTIL
for the second time. This time, the done
variable will evaluate to TRUE
wich will terminate the loop.Sorry, but why is this so bad?
So, what's not good about this? There are two things that I'm thinking of. One is a matter of style, and not that important, altough I think most programmers can appreciate it. The second has to do with efficiency, and is as far as I can see not really open to any debate. (But hey, feel free to do post a comment; as some advertising bots seem to have discovered, comments on this blog can be made anonymously).
It's 'just' a matter of style, but I feel that
REPEAT
is a construct you typically use to do some work that has to be done at least one time. After that, the condition may require the loop to be re-entered until all of the work has been done. REPEAT
is so suitable for this type of problem because it checks the loop control condition after the loop body, so you are guaranteed to have at least one loop cycle. In my opinion, cursor loops usually do not match this pattern. Most of the times, you can't tell on beforehand if the cursor will retrieve any records at all, and in those cases, I do not want to enter the loop at all. Of course, the snippet is resilient to the case, so it's really a matter of style, not of correctness. If the cursor query would not return any rows at all, the snippet would still work, but to me, a REPEAT
is like a marker saying: "Ho there, here's some work that needs to be done, and maybe we'll have to repeat it a couple of times". Not surprisingly, I really almost never have the need to write any loop like this.I feel quite a bit more strong about another thing. I think that it is really not very good to have to different checks that test the same condition. I mean, we have both the
IF
and the UNTIL
checking essentially an inverse condition. And it's not just something that has to do with this particular snippet; it really has to do with this type of cursor loop in general. Because the work you do in the loop relies on a successful FETCH
, you realy need the IF
. To me, it seems as if all the structural benefits of the REPEAT
loop are superceded by essentially being forced to duplicate the loop control check. What makes it a bit worse, is that we really have to check inverse conditions, because the UNTIL
wants the condition to be FALSE
in order to repeat the loop and keep going, whereas the IF
wants the condition to be TRUE
to keep going.Can't you solve this with a WHILE
loop?
Well, in part, but I still feel we can do even better. Let's give it I go, I'll show you what I mean. Here's an equivalent snippet, with the
REPEAT
rewritten using WHILE
:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = 1;
OPEN cur1;
FETCH cur1 INTO a, b;
WHILE NOT done DO
INSERT
INTO test.t3
VALUES (a,b);
FETCH cur1 INTO a, b;
END WHILE;
CLOSE cur1;
END;
So, how is this loop controlled in this case? Being a
WHILE..DO..END WHILE
loop, the loop body is entered only when the condition evaluates to TRUE
. Repetition will likewise occur only if the condition is still TRUE
when re-entering the loop. So this solves that "style thingie" I just mentioned. What's also good about this solution is this: Because we know that the cursor is not exhausted once we're inside the loop, we can do away with the IF
alltogether. That's one inverse check less. So what's the objection now then?
Although this is an improvement over
REPEAT
, there's still two things bothering me here. One is really really minor, and has more to do with my translation of
REPEAT
into a WHILE
than with anything else. For simplicity, I retained the original done
variable. However, this makes me test NOT done
in the WHILE
. This is not bad, but we could do a lot better by reversing the semantics here. If we would alter it's declaration to, say DECLARE hasMoreRows BOOL DEFAULT TRUE;
and likewise redeclare the handler statements to:
SET hasMoreRows = FALSE
;we could rewrite the loop start as
WHILE hasMoreRows DO
So that's quite nicely solved I think:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE hasMoreRows BOOL DEFAULT TRUE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET hasMoreRows = FALSE;
OPEN cur1;
FETCH cur1 INTO a, b;
WHILE hasMoreRows DO
INSERT
INTO test.t3
VALUES (a,b);
FETCH cur1 INTO a, b;
END WHILE;
CLOSE cur1;
END;
What's still bothering me though is the handling of the
FETCH
statement. The snippet uses two FETCH
statements. One's inside the loop body, wich is good, because we set up the loop in the first place to browse through the records in the cursor. The other one's outside of the loop, just before the we enter it. Why do we need two FETCH
statements then? Let's concentrate on the one inside the loop body, and forget about the other one for a while. We already established we really need one inside the loop in order to traverse the cursor, and you will see why we need one outside the loop in a few moments.First thing that we can observe about the 2nd
FETCH
is it's placement. It's immediately at the bottom of the loop body, right before the END WHILE
marker. This is different from the situation in the REPEAT..UNTIL..END REPEAT
snippet. There, the FETCH
statement was at the top of the loop body, right behind the REPEAT
marker. Does it make sense to put it at the bottom? When we reason from the point of view of the work that has to be done inside the loop, it certainly doesn't. Remember, we are browsing through all the records in the cursor, and INSERT
ing data from the cursor record into another table. So how does it make sense to FETCH
the data after the INSERT
? Imagine the
FETCH
would be at the top of the loop, right where it was when we had the REPEAT
loop, and pretend that the FETCH
immediately preceding the loop isn't there. Now, suppose that query we used to define the cursor yields no records at all. The cursor would be opened, and because the hasMoreRows
variable defaults to TRUE
, the loop would be entered. Now, we are pretending the FETCH
is right on top of the loop body, so we do the FETCH
, and this results in an error, because the cursor's exhausted. Procedure execution would then be transferred to the HANDLER
statement, wich SET
s the hasMoreRows
variable to FALSE
, after wich procedure execution is resumed...right after the FETCH
. Because we are already inside the loop, there's nothing that prevents the INSERT
from being executed, wich is clearly not what we intended to do at all. Repetition is terminated only after the first loop cycle completes, wich simply is too late by one loop cycle.So, now we know why the
FETCH
must be placed at the bottom of the loop. If it is placed there, cursor exhaustion is immediately handled by not entering the next loop cycle, and without executing any intervening statements. This also explains why we must have a FETCH
immediately preceding the entire loop. If we place the FETCH
that we need inside the loop at the bottom, we still need to have FETCH
ed at least once before entering the first loop cycle (or alternatively, not entering the loop at all) to initialize all the variables properly. The downside is of course that we need to duplicate the fetch. This would not be so bad if it were'nt for all those columns. Suppose you have 20 columns, and you accidentally swap a few columns, fetching the wrong values into your variables...
How an unstructured loop remedies all this
Generally, I do not like to write unstructured loops, but in this case, I keep coming to the conclusion that a simple
LOOP..END LOOP
is just The Best Solution. Check it out:
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE a CHAR(16);
DECLARE b INT;
DECLARE cur1
CURSOR FOR
SELECT id,data
FROM test.t1;
DECLARE
CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;
OPEN cur1;
myLoop: LOOP
FETCH cur1 INTO a, b;
IF done THEN
CLOSE cur1;
LEAVE myLoop;
END IF;
INSERT
INTO test.t3
VALUES (a,b);
END LOOP;
END;
It's not really difficult to see how this loop is controlled. As in the
REPEAT
example, the loop is always entered. Right after it is entered, the FETCH
is executed, and should the cursor be exhausted, the HANDLER
is executed, setting the done
variable to TRUE
. Until now, this is similar to the REPEAT
example. Then, the IF
is executed. Instead of conditionally doing the actual work when NOT done
as we saw in the REPEAT
example, we use it to CLOSE
the cursor, and LEAVE
the loop only in case we're done
. This pattern ensures that we keep all the statements that have to do with cursor handling together. After the END IF
, we can do all the real work we need to perform inside the loop.Summary
So, to summarize it all, we don't need the double check like we saw with the
REPEAT
; also, we don't need the extra FETCH
we saw with the WHILE
. The price we have to pay is that the loop is unstructured: your fellow programmers will sue you, and call you names behind your back. Seriously, as long as you keep the
OPEN
statment, the start of the LOOP
and the IF..END IF
together, and make sure that the IF..END IF
contains only those statements that have to do with cleaning up after the cursor traversal, you should be allright. One other concern is the loop label. When handling multiple or nesting loops, you should ensure not to mismatch the labels, but that's about it.