At some point in the history of programming languages, the idea of code as lists of statements with jumps or gotos gave way to the idea of nested (hierarchical) declarations and control structures, as in:
So programs are really trees, but we still need a string representation for storage and transmission. How can we capture trees in text? There are two main approaches. The first is to just give a direct representation of the syntax tree. The other is to be creative with syntax, making something that a human would like to read.
Here are several sketches of representations of the above tree.
S-Expressions
The usual direct representation of trees in strings is that of "S-expressions" — parenthesized forms such as (A B C D) where A is the root and B, C, and D are A's (ordered) children. B, C, and D can be primitive values or trees.
(function f (params x y) (block (var b 3) (if (< x (* a (+ b 7))) (while found (block (for i (range 2 5) (print i)) (if (== a 2) (break)))) (= p (cos 5))) (return (/ a 5))))
Languages like LISP, Scheme, and Clojure use S-expressions (though with a few little extensions here and there).
In the example above, some of the objects aren't actually evaluated, so the distinction between symbols and values sometimes comes up. Occasionally this is made explicit in the syntax, for example:
(function 'f (params 'x 'y) (block (var 'b 3) (if (< x (* a (+ b 7))) (while found (block (for 'i (range 2 5) (print i)) (if (== a 2) (break)))) (= p (cos 5))) (return (/ a 5))))
Instead of the quote character for symbols, the colon or ampersand (or other character) could be used.
XML Style
The tree can be given directly in XML, too. This is probably too verbose, not only because we have some markup characters, but because we have to invent elements for leaf nodes.
<function name="f" params="x,y"> <block> <var id="b"><intlit value="3"/></var> <if> <less> <ref id="x"/> <times> <ref id="a"/> <plus><ref id="b"/><intlit value="7"></plus> </times> </less> <while> <ref id="found"/> <block> <for var="i"> <range><intlit value="2"/><intlit value="5"/></range> <print><ref id="i"/></print> </for> <if> <eq><ref id="a"/><intlit value="2"/></eq> <break/> </if> </block> </while> <assign> <ref id="p"/> <cos><intlit value="5"/></cos> </assign> </if> <return>5</return> </block> </function>
Reader exercise: Could this have been done without the ref and intlit elements?
JSON Style
Another form of direct tree representation is an encoding in JSON. This gets pretty verbose as well because in JSON the properties are unordered, so naming them seems to be the most reasonable approach. And again, we have to name everything:
{type: "function", name: "f", params: ["x", "y"], body: { type: "block", statements: [ {type: "var", id: "b", init: 3}, {type: "if", condition: {type: "<", left: "x", right: { type: "*", left: "a", right: {type: "+", left: "b", right: 7} }}, truePart: {type: "while", condition: "found", body: {type: "block", statements: [ {type: "for", var: "i", range: {type: "range", low: 2, high: 5}, body: {type: "print", args: ["i"]} }, {type: "if", condition: {type: "==", left: "a", right: 2}, truePart: {type: "break"} } ]} }, falsePart: {type: "assign", target: "p", source: {type: "cos", args: [5]} } }, {type: "return", value: 5} ] }}
Indentation
Okay, let's move away from direct representations and use some real syntax. By real syntax we mean cleverly arranging symbols in a pleasing and readable way but still having it be possible to unambiguously recapture the tree structure.
function f (x, y) var b = 3 if x < a * (b + 7) while found for i in 2..5 print i if a == 2 break else p = cos(5) return a / 5
Curly Braces
The use of curly braces to define subtrees is, for some reason, very popular.
function f (x) { var b = 3; if (x < a * (b + 7)) { while (found) { for (i in 2..5) { print i; } if (a == 2) { break; } } } else { p = cos(5); } return a / 5; }
Terminating Keywords
Rather than marking the beginning and end of a block, as is done with the curly brace style and its equivalents, we can get away with just marking the end. This can be done simply with the word end:
function f (x, y) var b = 3 if x < a * (b + 7) while found for i in 2..5 print i end if a == 2 break end end else p = cos(5) end return a / 5 end
or by spelling the initial word backwards (yes, many languages do this):
function f (x, y) var b = 3 if x < a * (b + 7) while found for i in 2..5 print i rof if a == 2 break fi elihw else p = cos(5) fi return a / 5 noitcnuf
Postfix Style
Here's a sketch using the postfix style. Here it's definitely important, I think, to distinguish symbols from variables that need to be evaluated. I used square brackets to bundle code, as should be obvious:
[ 'x' param 'y' param 3 'b' var x a b 7 + * < [ found [ 'i' 2 5 [i print] for 2 a == [break] [] if ] while] [5 cos p assign] if a 5 / return ] 'f' function
Functional Style, with 'let' and Infix Operators
Here's a style that borrows from Standard ML. Block scope is introduced with the let...in...end expression, and expression sequences use semicolons as separators. Expression sequences are allowed in the body of a let, or wherever a regular expression can go, provided the sequence is enclosed in parentheses.
function f (x, y) = let b = 3 in if x < a * (b + 7) then while found do ( for i in 2..5 print i ; if a == 2 then break else nil ) else p = cos(5) ; a / 5 end
No comments:
Post a Comment