2010-05-02

Reserved Words

Many languages reserve words for their own use, meaning that you, the programmer, cannot use these words for your own purposes. Reserved words are related to keywords, which are words the language uses for certain purposes but which you can also use as you wish.

What is the motivation for a language to have reserved words? Are reserved words a good thing or not? Is there such a thing as different degrees of being reserved?

Why reserve words at all?

A language might want to reserve words to limit confusion. The following are legal statements in PL/I, a language that chose not to reserve IF, THEN, ELSE, DO, and similar keywords:

IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
IF IF THEN ELSE = IF; ELSE THEN = ELSE;
DO WHILE (WHILE = DO); END = WHILE + DO; END;

Would anyone really write code like that? Even if someone felt the need to abuse keywords like that, simply reserving them doesn't completely prevent programmers from writing confusing code in other ways. There's no stopping this:

int subtract(int x, int y) {
    return x + y;
}

Perhaps the real reason languages reserve words is to make things easier for the compiler, or to give the language authors an outlet in which to be funny. For example, the word goto is reserved in Java, even though it isn't used for anything. Of course, nothing prevents a Java programmer from using the word in comments and string literals. (Package names don't work, since each part of a package name has to be a legal identifier.) And, as Java is case-sensitive, there's always class Goto ... .

Why not reserve words?

There are a few reasons for not reserving words:

  • Programmers should have the freedom to choose their own identifiers. It's the compiler's job to figure out when words are used in a particular context.
  • It's annoying to programmers. There are thousands of Java programs in the world with variables and parameters named klass, clazz, or c, because class is verboten. (Awesome hackers might even use clаss; haha, the third letter there is a Cyrillic small letter a, not a Latin one!)
  • As a language evolves, the addition of new reserved words will break old code.
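
The last point is easy to see in Python, where async became a reserved word in version 3.7. Here is a minimal sketch, assuming it runs on Python 3.7 or later; the helper still_compiles and the sample source strings are made up for illustration:

```python
import keyword

def still_compiles(source):
    """Return True if the running Python's parser accepts the source text."""
    try:
        compile(source, "<old-code>", "exec")
        return True
    except SyntaxError:
        return False

# This line was legal back when async was an ordinary name.
old_code = "async = True"

# Assumption: Python 3.7 or later, where async is reserved.
print(keyword.iskeyword("async"))       # True
print(still_compiles(old_code))         # False: the old code no longer parses
print(still_compiles("async_ = True"))  # True: the usual workaround is a renamed variable
```

The code did nothing wrong; the language just moved underneath it.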

Keywords versus symbols

Here are three alternative syntaxes for a certain statement:

if greater(x, y) and w then begin assign x plus 4 to z; print z; end;
if (x > y && w) {z = x + 4; print z}
(x > y && w ==> z = x + 4, !z)

The obvious things to notice are that too many keywords make code verbose and hurt readability, while too few give you cryptic code. Most people understand + and * and /, but really, a ! for "print"? In fact, most symbols aren't obvious at first. Different languages use = differently. And what about these things: <=>, =~, %, ===, ^, *., #, ??, and who knows what else? Anyone use APL?

Symbols are sometimes preferred to keywords for another reason: if keywords are in a language you don't speak, they might be harder to learn than symbols. Programmers who know Spanish but not English may prefer 'y' to 'and' and 'mientras' to 'while'. This is one use for macros, I suppose. In C:

#define mientras while

Symbols as keywords

In ML, identifiers can be alphanumeric or symbolic. So +, <=*=> and ====> are all identifiers. You can define functions with these names. And you can define + to do subtraction. You can even do that in Ruby. Is this okay? The choices seem to be:

  • Allow symbolic identifiers, but treat some as reserved words so that no programmer can override such basic operators as +.
  • Allow symbolic identifiers but place no restrictions on them. After all, the reason languages allow "operator overloading" is to allow addition on vectors and other structured objects.
  • Disallow symbolic identifiers but permit operator overloading (for the fixed set of predefined symbolic operators).
  • Prohibit both symbolic identifiers and operator overloading.
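
Python, for what it's worth, takes the third route: you can't invent an operator like <=*=>, and you can't change what + means for the built-in integers, but your own classes can overload the predefined operators. Here is a small sketch (the classes Vector and Perverse are made up for illustration) showing both the intended use and the abuse that this choice still permits:

```python
class Vector:
    """Two-dimensional vector with a sensible +."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Component-wise addition, the textbook reason for overloading +.
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


class Perverse(int):
    """An integer type whose + subtracts. Legal, but please don't."""

    def __add__(self, other):
        return Perverse(int(self) - int(other))


print(Vector(1, 2) + Vector(3, 4) == Vector(4, 6))  # True
print(Perverse(5) + Perverse(3))                    # 2
print(5 + 3)                                        # 8, built-in + is untouched
```

So restricting the set of operator symbols does not stop anyone from making + subtract; it only confines the damage to types the programmer defines.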

How reserved is reserved?

When a language specification says that a word is reserved, it generally means one of two things:

  • The word cannot, under any circumstances, be used as a name for any programmer-defined entity (e.g., a variable, constant, function, type, object property name, method, function parameter, or statement label).
  • The word can be used to name some programmer-defined entities, but its position in the source code text is restricted to simplify language processing (specifically, tokenization).

The former case may be stricter than necessary because quoting, for one thing, works pretty well for object properties (at least):

point["long"] = 118.3532;

You can't stop people from using reserved words in string literals, after all. Similar to quoting is escaping; for example, in C# you would write something like:

var @while = 2;

The @ is not part of the variable name; it's just an escape character that tells the compiler the variable is really named 'while' but we're going to use it as an identifier and not as the beginning of a while statement.

The second case is interesting. JavaScript (as specified in ECMAScript 3), for example, reserves the word "long", and while it allows

point["long"] = 118.3532;

it does not allow

point.long = 118.3532;

The probable reason for this is that it wants its lexical analyzer to be independent of context: for it to simply look at the above text and tokenize it into

    IDENTIFIER ("point")
    DOT
    KEYWORD_LONG
    ASSIGNMENT_OPERATOR
    NUMBER ("118.3532")

But is that really important in practice anymore? Lexical analyzers can easily choose a different meaning for a token if, say, the previous token is a dot! That would allow point.long. Now what about var point = {lat: 40, long: -110}? It would seem we'd have to allow the lexical analyzer to have access to the parser's state, but really, just wait until the semantic analysis phase and allow certain known keyword tokens to be just as good as identifiers when it comes to object property names.
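
To show how little it takes, here is a rough sketch in Python of a lexical analyzer that remembers only the previous token. It is not how any real JavaScript engine works; the RESERVED set, the token names, and the tokenize function are all made up for illustration, and it handles only the point.long case, not the object literal.

```python
import re

# Hypothetical reserved-word set, for illustration only.
RESERVED = {"long", "while", "if", "var"}

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<WORD>[A-Za-z_]\w*)
  | (?P<DOT>\.)
  | (?P<ASSIGNMENT_OPERATOR>=)
  | (?P<SEMICOLON>;)
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(source):
    """Return (kind, text) pairs; a reserved word right after a dot is an identifier."""
    tokens = []
    previous_kind = None
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace does not change the previous token
        if kind == "WORD":
            # The one piece of context: was the previous token a dot?
            if text in RESERVED and previous_kind != "DOT":
                kind = "KEYWORD_" + text.upper()
            else:
                kind = "IDENTIFIER"
        tokens.append((kind, text))
        previous_kind = kind
    return tokens

for token in tokenize("point.long = 118.3532;"):
    print(token)
# ('IDENTIFIER', 'point')
# ('DOT', '.')
# ('IDENTIFIER', 'long')
# ('ASSIGNMENT_OPERATOR', '=')
# ('NUMBER', '118.3532')
# ('SEMICOLON', ';')
```

Outside of that position, tokenize("long") still yields KEYWORD_LONG, so the word stays reserved everywhere else.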
