2010-02-08

Else ifs

You may have heard of the so-called dangling else problem. The text
if e1 then if e2 then s1 else s2
can reasonably mean either of two possible things:
danglingelse.png
When mixed with other statements without any obvious end, things can get crazy:
if e1 then while e2 do if e3 then s1 else s2
The dangling-else problem arises in languages with ambiguous grammars, such as the Pascal-like:
Stmt      = Assignment | IfStmt | WhileStmt | Block | ...
IfStmt    = "if" Exp "then" Stmt ("else" Stmt)?
WhileStmt = "while" Exp "do" Stmt
Block     = "begin" Stmt* "end"
or the C-like:
Stmt      = Assignment | IfStmt | WhileStmt | Block | ...
IfStmt    = "if" "(" Exp ")" Stmt ("else" Stmt)?
WhileStmt = "while" "(" Exp ")" Stmt
Block     = "{" Stmt* "}"
There are many ways to deal with this problem, but most lead to rather unsatisfactory ways to handle the general multi-way conditional. Is there a design that avoids dangling else problems and forces programmers to write nice multiway conditional statements?

Resolve with a semantic rule

In Pascal and C, the syntax is ambiguous, but a semantic rule says that each else must match the closest if, regardless of indentation. This can be terribly confusing to beginners; the code below on the left doesn't mean what novices might think:
(* Without blocks, 'else'
tied to second 'if' *)
if cond1 then
    if cond2 then
        stmt1
else (* Sorry! Goes w/ cond2! *)
    stmt2
(* MUST use block to tie
'else' to first 'if' *)
if cond1 then begin
    if cond2 then
        stmt1
end else (* Now it goes w/ cond1. *)
    stmt2
One objection to this approach is that purists will probably hate how this solution relies on a semantic rule to handle something so inherently structural. A second objection is that multiway conditionals don’t really exist—the syntax can only describe deeply nested individual conditionals. So programmers end up playing games with their indentation to simulate multiway branching:
if cond1 then
    stmt1
else if cond2 then
    stmt2
else if cond3 then
    stmt3
else if cond4 then
    stmt4
else
    stmt5

if cond1 then begin
    stmts1
end else if cond2 then begin
    stmts2
end else if cond3 then begin
    stmts3
end else if cond4 then begin
    stmts4
end else begin
    stmts5
end
This is ugly—but it could be a lot worse: programmers often mix up their conditional arms by sometimes using blocks and sometimes not, and get the ugly code they deserve. The worst part of the “semantic rule solution” dangling else syntax is that there seem to be hundreds of different ways to format multiway if statements. Take a look at some old Pascal textbooks if you want to be grossed out.

Complicate the grammar to remove the ambiguity

It is possible, with a lot of work, to force all elses to be paired with the nearest then. Call an open statement one with a sub-statement that dangles at the end, and a closed statement one that is not open. Then:
Stmt           = OpenStmt | ClosedStmt
OpenStmt       = IfThenStmt | IfThenElseStmt | WhileStmt | ForStmt
ClosedStmt     = Assignment | Call | BreakStmt | ReturnStmt
NoShortIfStmt  = ClosedStmt | NoShortIfIfThenElse
               | NoShortIfWhile | NoShortIfFor
IfThenStmt     = "if" Exp "then" Stmt
IfThenElseStmt = "if" Exp "then" NoShortIfStmt "else" Stmt
NoShortIfIfThenElse = 
    "if" EXP "then" NoShortIfStmt "else" NoShortIfStmt
NoShortIfWhile = "while" Exp "do" NoShortIfStmt
NoShortIfFor   = "for" Var "in" Range "do" NoShortIfStmt
etc.
This grammar is no longer ambiguous, and every else is automatically connected to the nearest if. But it does nothing to prevent the confusion arising from misindented code we saw above.

Require the else part

If there always has to be an else part, there is never any ambiguity, but you’ll have funny looking code when your else part is missing: you'll need some empty statement or a special null keyword.
if cond1 then
    if cond2 then
        stmt1
    else
        null
else
    stmt2
if cond1 then
    if cond2 then
        stmt1
    else
        stmt2
else
    null
Not pretty. It works, though, even if the while statement (and other compound statements) aren’t fully bracketed, for example:
if e1 then while e2 do if e3 then s1 else null else s2
and
if e1 then while e2 do if e3 then s1 else s2 else null

Require bracketing

Ultimately, languages that have compound statements where the bodies are “trailing statements” leave too many formatting choices and look ugly and unbalanced to many people. Defining a language’s syntax to require bracketed compound statements is a Good Thing for that reason, and as a bonus it removes the dangling else problem completely. The idea is that a block should not be a kind of statement, and that all compound statements use blocks for bodies, never simple statements! This grammar fragment:
Stmt      = Assignment | IfStmt | WhileStmt | ...
IfStmt    = "if" "(" Exp ")" Block ("else" Block)?
WhileStmt = "while" "(" Exp ")" Block
Block     = "{" Stmt* "}"
yields code like:
if (cond1) {
    if (cond2) {
        stmts1
    } else {
        stmts2
    }
}
if (cond1) {
    if (cond2) {
        stmts1
    }
} else {
    stmts2
}
Bracketing can also be done with just a terminating end instead of curly braces or begin-end pairs:
Stmt      = Assignment | IfStmt | WhileStmt | ...
IfStmt    = "if" Exp "then" Stmt+ ("else" STMT+)? end
WhileStmt = "while" Exp "do" Stmt+ "end"
yielding rather clean code like this:
if cond1 then
    if cond2 then
        stmts1
    else
        stmts2
    end
end
if cond1 then
    if cond2 then
        stmts1
    end
else
    stmts2
end
But, wait—if we really require bracketing, won’t that make multiway conditionals ugly? Either you start indenting too much or you get a bunch of }s (or ends) at the end. Like this, right?
if cond1 {
    stmts1
} else {
    if cond2 {
        stmts2
    } else {
        if cond3 {
            stmts3
        } else {
            if cond4 {
                stmts4
            } else {
                stmts5
            }
        }
    }
}
if cond1 {
    stmts1
} else { if cond2 {
    stmts2
} else { if cond3 {
    stmts3
} else { if cond4 {
    stmts4
} else {
    stmts5
}}}}






Make indentation matter

In Python, there is no dangling else problem since the indentation makes things clear:
if cond1:
    if cond2:
        stmts1
else
   stmts2
if cond1:
    if cond2:
        stmts1
    else
        stmts2
But, wouldn't required indentation lead to funny looking code in multiway conditionals?
if cond1:
    stmts1
else:
    if cond2:
        stmts2
    else
        if cond3:
            stmts3
        else
            if cond4:
                stmts4
            else
                stmts5
While the code above is legal Python, real Python programmers use the next solution.

Introduce a new keyword to simplify required bracketing or indentation

In most languages where bracketing (or indentation) is required, the multiway conditional is described syntactically as a single if-statement, usually with the help of a special keyword (called elsif in Ada, Ruby, and Perl; elif in bash and Python; and elseif in PHP. We have both a curly-brace style:
IfStmt =  "if" "(" Exp ")" Block
          ("elsif" "(" Exp ")" Block)*
          ("else" Block)?
Block  =  "{" STMT* "}"
and terminating-end style:
IfStmt = "if" EXP "then" Stmt +
          ("elsif"  Exp "then" Stmt+)*
          ("else" Stmt+)?
          "end"
Code looks like this:
# Ruby
if cond1
  stmts1
elsif cond2
  stmts2
elsif cond3
  stmts3
elsif cond4
  stmts4
else
  stmts5
end
# Python
if cond1:
    stmts1
elif cond2:
    stmts2
elif cond3:
    stmts3
elif cond4:
    stmts4
else
    stmts5

// PHP
if (cond1) {
    stmts1
} elseif (cond2) {
    stmts2
} elseif (cond3) {
    stmts3
} elseif (cond4) {
    stmts4
} else {
    stmts5
}

Lisp’s COND

Because Lisp is basically written in abstract syntax trees, you won’t ever have a dangling else, and the multiway conditional is already handled by COND:
(COND
  (condition1 block1)
  (condition2 block2)
  (condition3 block3)
  (condition4 block4)
  (T block5))
Interestingly, Erlang uses the same idea, though with a terminating-end syntax. There are no explicit elses; Erlang’s if is inherently multiway:
if
  condition1 -> body1;
  condition2 -> body2;
  condition3 -> body3;
  condition4 -> body4;
end.

Require bracketing but accept lookahead

There is a way to require bracketing without resorting to special words like elsif or elif and without ending up with a whole mess of terminators at the end. The solution is amazingly simple:
IfStmt = "if" "(" Exp ")" Block
         ("else" "if" "(" Exp ")" Block)*
         ('else' Block)?
Block  = "{" Stmt* '}'
I haven't seen this before. Why is this not popular? The only thing I can think of is that top-down parsers will need a two-token lookahead when encountering an else. Why should this be a big deal? Just peek ahead to see if the next token is an if (or a {).
How can we do this for a terminating-end syntax? This doesn't work very well:
IfStmt = "if" Exp "then" Stmt+
         ("else" "if" Exp "then" Stmt+)*
         ("else" Stmt+)?
         "end"
because the amount of required lookahead is infinite. What about rejecting 'if' statements in the final else-part?
Stmt   = IfStmt | NonIfStmt
IfStmt = "if" Exp "then" Stmt+
         ("else" "if" Exp "then" Stmt+)*
         ("else" NonIfStmt Stmt*)?
         "end"
There we go! As long as NonIfStmt cannot be empty and cannot start with if, we can parse top-down with a lookahead of 2.

Novel syntactic forms

If the objection to elsif and related words is just that they are made up, we could give the if-statement a make over, using more reasonable words, or symbols, even::
when cond1 do
    stmts1
or when cond2 do
    stmts2
or when cond3 do
    stmts3
or when cond4 do
    stmts
or else do
    stmts
end
try cond1 do
    stmts1
or cond2 do
    stmts2
or cond3 do
    stmts3
or cond4 do
    stmts
otherwise
    stmts
end
try [ cond1 ] =>
    stmts1
[ cond2 ] =>
    stmts2
[ cond3 ] =>
    stmts3
[ cond4 ] =>
    stmts
[] =>
    stmts
end

No comments:

Post a Comment