2010-02-11

Bracketing

At some point in the history of programming languages, the idea of code as lists of statements with jumps or gotos gave way to the idea of nested (hierarchical) declarations and control structures, as in:

ast

So programs are really trees, but we still need a string representation for storage and transmission. How can we capture trees in text? There are two main approaches. The first is to just give a direct representation of the syntax tree. The other is to be creative with syntax, making something that a human would like to read.

Here are several sketches of representations of the above tree.

S-Expressions

The usual direct representation of trees in strings is that of "S-expressions" — parenthesized forms such as (A B C D) where A is the root and B, C, and D are A's (ordered) children. B, C, and D can be primitive values or trees.

(function f (params x y)
    (block 
        (var b 3)
        (if (< x (* a (+ b 7)))
            (while found
                (block
                    (for i (range 2 5) (print i))
                    (if (== a 2) (break))))
            (= p (cos 5)))
        (return (/ a 5))))

Languages like LISP, Scheme, and Clojure use S-expressions (though with a few little extensions here and there).

In the example above, some of the objects aren't actually evaluated, so the distinction between symbols and values sometimes comes up. Occasionally this is made explicit in the syntax, for example:

(function 'f (params 'x 'y)
    (block 
        (var 'b 3)
        (if (< x (* a (+ b 7)))
            (while found
                (block
                    (for 'i (range 2 5) (print i))
                    (if (== a 2) (break))))
            (= p (cos 5)))
        (return (/ a 5))))

Instead of the quote character for symbols, the colon or ampersand (or other character) could be used.

XML Style

The tree can be given directly in XML, too. This is probably too verbose, not only because we have some markup characters, but because we have to invent elements for leaf nodes.

<function name="f" params="x,y">
  <block>
    <var id="b"><intlit value="3"/></var>
    <if>
      <less>
        <ref id="x"/>
        <times>
          <ref id="a"/>
          <plus><ref id="b"/><intlit value="7"></plus>
        </times>
      </less> 
      <while>
        <ref id="found"/>
        <block>
          <for var="i">
            <range><intlit value="2"/><intlit value="5"/></range>
            <print><ref id="i"/></print>
          </for>
          <if>
            <eq><ref id="a"/><intlit value="2"/></eq>
            <break/>
          </if>
        </block>
      </while>
      <assign>
        <ref id="p"/>
        <cos><intlit value="5"/></cos>
      </assign>
    </if>
    <return>5</return>
  </block>
</function>

Reader exercise: Could this have been done without the ref and intlit elements?

JSON Style

Another form of direct tree representation is an encoding in JSON. This gets pretty verbose as well because in JSON the properties are unordered, so naming them seems to be the most reasonable approach. And again, we have to name everything:

{type: "function", name: "f", params: ["x", "y"], body: {
    type: "block", statements: [
        {type: "var", id: "b", init: 3},
        {type: "if", 
            condition: {type: "<", left: "x", right: {
                type: "*", 
                left: "a", 
                right: {type: "+", left: "b", right: 7}
            }}, 
            truePart: {type: "while",
                condition: "found", body: {type: "block", statements: [
                    {type: "for", var: "i", 
                        range: {type: "range", low: 2, high: 5}, 
                        body: {type: "print", args: ["i"]}
                    },
                    {type: "if",
                        condition: {type: "==", left: "a", right: 2},
                        truePart: {type: "break"}
                    }
                ]}
            }, 
            falsePart: {type: "assign", 
                target: "p", 
                source: {type: "cos", args: [5]}
            }
        },
        {type: "return", value: 5}
    ]
}}

Indentation

Okay, let's move away from direct representations and use some real syntax. By real syntax we mean cleverly arranging symbols in a pleasing and readable way but still having it be possible to unambiguously recapture the tree structure.

function f (x, y)
    var b = 3
    if x < a * (b + 7)
        while found
            for i in 2..5
                print i
            if a == 2
                break
    else
        p = cos(5)
    return a / 5

Curly Braces

The use of curly braces to define subtrees is, for some reason, very popular.

function f (x) {
    var b = 3;
    if (x < a * (b + 7)) {
        while (found) {
            for (i in 2..5) {
                print i;
            }
            if (a == 2) {
                break;
            }
        }
    } else {
        p = cos(5);
    }
    return a / 5;
}

Terminating Keywords

Rather than marking the beginning and end of a block, as is done with the curly brace style and its equivalents, we can get away with just marking the end. This can be done simply with the word end:

function f (x, y)
    var b = 3
    if x < a * (b + 7)
        while found
            for i in 2..5
                print i
            end
            if a == 2
                break
            end
        end
    else
        p = cos(5)
    end
    return a / 5
end

or by spelling the initial word backwards (yes, many languages do this):

function f (x, y)
    var b = 3
    if x < a * (b + 7)
        while found
            for i in 2..5
                print i
            rof
            if a == 2
                break
            fi
        elihw
    else
        p = cos(5)
    fi
    return a / 5
noitcnuf

Postfix Style

Here's a sketch using the postfix style. Here it's definitely important, I think, to distinguish symbols from variables that need to be evaluated. I used square brackets to bundle code, as should be obvious:

[
    'x' param 'y' param
    3 'b' var
    x a b 7 + * <
    [
        found
        [
            'i' 2 5 [i print] for
            2 a == [break] [] if
        ]
    while]
    [5 cos p assign]
    if
    a 5 / return
] 'f' function

Functional Style, with 'let' and Infix Operators

Here's a style that borrows from Standard ML. Block scope is introduced with the let...in...end expression, and expression sequences use semicolons as separators. Expression sequences are allowed in the body of a let, or wherever a regular expression can go, provided the sequence is enclosed in parentheses.

function f (x, y) =
    let 
        b = 3 
    in
        if x < a * (b + 7) then
            while found do (
                for i in 2..5
                    print i
                ;
                if a == 2 then
                    break
                else
                    nil
            )
        else
            p = cos(5)
        ;
        a / 5
    end

No comments:

Post a Comment