You may have heard of the so-called
dangling else problem. The text
if e1 then if e2 then s1 else s2
can reasonably mean either of two possible things:
When mixed with other statements without any obvious end, things can get crazy:
if e1 then while e2 do if e3 then s1 else s2
The dangling-else problem arises in languages with ambiguous grammars, such as the Pascal-like:
Stmt = Assignment | IfStmt | WhileStmt | Block | ...
IfStmt = "if" Exp "then" Stmt ("else" Stmt)?
WhileStmt = "while" Exp "do" Stmt
Block = "begin" Stmt* "end"
or the C-like:
Stmt = Assignment | IfStmt | WhileStmt | Block | ...
IfStmt = "if" "(" Exp ")" Stmt ("else" Stmt)?
WhileStmt = "while" "(" Exp ")" Stmt
Block = "{" Stmt* "}"
There are many ways to deal with this problem, but most lead to rather unsatisfactory ways to handle the general multi-way conditional. Is there a design that avoids dangling else problems
and forces programmers to write nice multiway conditional statements?
Resolve with a semantic rule
In Pascal and C, the syntax is ambiguous, but a
semantic rule says that each
else must match the closest
if,
regardless of indentation. This can be terribly confusing to beginners; the code below on the left doesn't mean what novices might think:
if cond1 then
if cond2 then
stmt1
else
stmt2
|
if cond1 then begin
if cond2 then
stmt1
end else
stmt2
|
One objection to this approach is that purists will probably hate how this solution relies on a semantic rule to handle something so inherently structural. A second objection is that multiway conditionals don’t really exist—the syntax can only describe deeply nested individual conditionals. So programmers end up playing games with their indentation to simulate multiway branching:
if cond1 then
stmt1
else if cond2 then
stmt2
else if cond3 then
stmt3
else if cond4 then
stmt4
else
stmt5
| if cond1 then begin
stmts1
end else if cond2 then begin
stmts2
end else if cond3 then begin
stmts3
end else if cond4 then begin
stmts4
end else begin
stmts5
end
|
This is ugly—but it could be a lot worse: programmers often mix up their conditional arms by sometimes using blocks and sometimes not, and get the ugly code they deserve. The worst part of the “semantic rule solution” dangling else syntax is that there seem to be hundreds of different ways to format multiway if statements. Take a look at some old Pascal textbooks if you want to be grossed out.
Complicate the grammar to remove the ambiguity
It is possible, with a lot of work, to force all
elses to be paired with the nearest
then. Call an
open statement one with a sub-statement that dangles at the end, and a
closed statement one that is not open. Then:
Stmt = OpenStmt | ClosedStmt
OpenStmt = IfThenStmt | IfThenElseStmt | WhileStmt | ForStmt
ClosedStmt = Assignment | Call | BreakStmt | ReturnStmt
NoShortIfStmt = ClosedStmt | NoShortIfIfThenElse
| NoShortIfWhile | NoShortIfFor
IfThenStmt = "if" Exp "then" Stmt
IfThenElseStmt = "if" Exp "then" NoShortIfStmt "else" Stmt
NoShortIfIfThenElse =
"if" EXP "then" NoShortIfStmt "else" NoShortIfStmt
NoShortIfWhile = "while" Exp "do" NoShortIfStmt
NoShortIfFor = "for" Var "in" Range "do" NoShortIfStmt
etc.
This grammar is no longer ambiguous, and every
else is automatically connected to the nearest
if. But it does nothing to prevent the confusion arising from misindented code we saw above.
Require the else part
If there always
has to be an else part, there is never any ambiguity, but you’ll have funny looking code when your
else part is missing: you'll need some empty statement or a special
null keyword.
if cond1 then
if cond2 then
stmt1
else
null
else
stmt2
| if cond1 then
if cond2 then
stmt1
else
stmt2
else
null
|
Not pretty. It works, though, even if the
while statement (and other compound statements) aren’t fully bracketed, for example:
if e1 then while e2 do if e3 then s1 else null else s2
and
if e1 then while e2 do if e3 then s1 else s2 else null
Require bracketing
Ultimately, languages that have compound statements where the bodies are “trailing statements” leave too many formatting choices and look ugly and unbalanced to many people. Defining a language’s syntax to
require bracketed compound statements is a Good Thing for that reason, and as a bonus it
removes the dangling else problem completely. The idea is that a block
should not be a kind of statement, and that all compound statements use blocks for bodies, never simple statements! This grammar fragment:
Stmt = Assignment | IfStmt | WhileStmt | ...
IfStmt = "if" "(" Exp ")" Block ("else" Block)?
WhileStmt = "while" "(" Exp ")" Block
Block = "{" Stmt* "}"
yields code like:
if (cond1) {
if (cond2) {
stmts1
} else {
stmts2
}
}
| if (cond1) {
if (cond2) {
stmts1
}
} else {
stmts2
}
|
Bracketing can also be done with just a terminating
end instead of curly braces or
begin-
end pairs:
Stmt = Assignment | IfStmt | WhileStmt | ...
IfStmt = "if" Exp "then" Stmt+ ("else" STMT+)? end
WhileStmt = "while" Exp "do" Stmt+ "end"
yielding rather clean code like this:
if cond1 then
if cond2 then
stmts1
else
stmts2
end
end
| if cond1 then
if cond2 then
stmts1
end
else
stmts2
end
|
But, wait—if we really require bracketing, won’t that make multiway conditionals ugly? Either you start indenting too much or you get a bunch of
}s (or
ends) at the end. Like this, right?
if cond1 {
stmts1
} else {
if cond2 {
stmts2
} else {
if cond3 {
stmts3
} else {
if cond4 {
stmts4
} else {
stmts5
}
}
}
}
| if cond1 {
stmts1
} else { if cond2 {
stmts2
} else { if cond3 {
stmts3
} else { if cond4 {
stmts4
} else {
stmts5
}}}}
|
Make indentation matter
In Python, there is no dangling else problem since the indentation makes things clear:
if cond1:
if cond2:
stmts1
else
stmts2
| if cond1:
if cond2:
stmts1
else
stmts2
|
But, wouldn't required indentation lead to funny looking code in multiway conditionals?
if cond1:
stmts1
else:
if cond2:
stmts2
else
if cond3:
stmts3
else
if cond4:
stmts4
else
stmts5
While the code above is legal Python, real Python programmers use the next solution.
Introduce a new keyword to simplify required bracketing or indentation
In most languages where bracketing (or indentation) is required, the multiway conditional is described syntactically as a single if-statement, usually with the help of a special keyword (called
elsif in Ada, Ruby, and Perl;
elif in bash and Python; and
elseif in PHP. We have both a curly-brace style:
IfStmt = "if" "(" Exp ")" Block
("elsif" "(" Exp ")" Block)*
("else" Block)?
Block = "{" STMT* "}"
and terminating-end style:
IfStmt = "if" EXP "then" Stmt +
("elsif" Exp "then" Stmt+)*
("else" Stmt+)?
"end"
Code looks like this:
if cond1
stmts1
elsif cond2
stmts2
elsif cond3
stmts3
elsif cond4
stmts4
else
stmts5
end
|
if cond1:
stmts1
elif cond2:
stmts2
elif cond3:
stmts3
elif cond4:
stmts4
else
stmts5
|
if (cond1) {
stmts1
} elseif (cond2) {
stmts2
} elseif (cond3) {
stmts3
} elseif (cond4) {
stmts4
} else {
stmts5
}
|
Lisp’s COND
Because Lisp is basically written in abstract syntax trees, you won’t ever have a dangling else, and the multiway conditional is already handled by
COND:
(COND
(condition1 block1)
(condition2 block2)
(condition3 block3)
(condition4 block4)
(T block5))
Interestingly, Erlang uses the same idea, though with a terminating-end syntax. There are no explicit elses; Erlang’s if is inherently multiway: |
if
condition1 -> body1;
condition2 -> body2;
condition3 -> body3;
condition4 -> body4;
end.
Require bracketing but accept lookahead
There is a way to require bracketing without resorting to special words like
elsif or
elif and without ending up with a whole mess of terminators at the end. The solution is amazingly simple:
IfStmt = "if" "(" Exp ")" Block
("else" "if" "(" Exp ")" Block)*
('else' Block)?
Block = "{" Stmt* '}'
I haven't seen this before. Why is this not popular? The only thing I can think of is that top-down parsers will need a two-token lookahead when encountering an
else. Why should this be a big deal? Just peek ahead to see if the next token is an
if (or a
{).
How can we do this for a terminating-end syntax? This doesn't work very well:
IfStmt = "if" Exp "then" Stmt+
("else" "if" Exp "then" Stmt+)*
("else" Stmt+)?
"end"
because the
amount of required lookahead is infinite. What about rejecting 'if' statements in the final
else-part?
Stmt = IfStmt | NonIfStmt
IfStmt = "if" Exp "then" Stmt+
("else" "if" Exp "then" Stmt+)*
("else" NonIfStmt Stmt*)?
"end"
There we go! As long as
NonIfStmt cannot be empty and cannot start with
if, we can parse top-down with a lookahead of 2.
Novel syntactic forms
If the objection to
elsif and related words is just that they are made up, we could give the
if-statement a make over, using more reasonable words, or symbols, even::
when cond1 do
stmts1
or when cond2 do
stmts2
or when cond3 do
stmts3
or when cond4 do
stmts
or else do
stmts
end
| try cond1 do
stmts1
or cond2 do
stmts2
or cond3 do
stmts3
or cond4 do
stmts
otherwise
stmts
end
| try [ cond1 ] =>
stmts1
[ cond2 ] =>
stmts2
[ cond3 ] =>
stmts3
[ cond4 ] =>
stmts
[] =>
stmts
end
|
No comments:
Post a Comment