Programming Language Design Sketchbook: April 2010

2010-04-29

Object Construction

There are countless ways to build objects. Here are a few.

Classic Constructors

Many class-based object-oriented languages have constructors that look suspiciously like methods. Unlike instance methods, which operate on existing objects, constructors bring new objects into existence, and generally use a special syntax (such as an operator called new) to distinguish their invocation from that of a method.

new Point("Pantages Theater", 34.1017, -118.3255, 
    "Hollywood", "Los Angeles", "CA", "us", "90028", 
    new LocalDate(1930, 6, 4), Status.ACTIVE)

If there are too many parameters, you can pass in a map (dictionary, associative array, hash), or your language might support named parameter association.
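
Here is a minimal JavaScript sketch of the map-passing idea. The function makePoint, its field names, and the defaults are all hypothetical, chosen only to mirror the example above. A single options object lets the caller name each value, give the values in any order, and leave out the ones that have defaults.

```javascript
// Hypothetical defaults; the text above does not say what they should be.
var POINT_DEFAULTS = {country: "us", status: "ACTIVE"};

// Builds a point from one "options" object instead of positional arguments.
function makePoint(options) {
    var point = {};
    var key;
    // Start from the defaults...
    for (key in POINT_DEFAULTS) {
        if (Object.prototype.hasOwnProperty.call(POINT_DEFAULTS, key)) {
            point[key] = POINT_DEFAULTS[key];
        }
    }
    // ...then overwrite them with whatever the caller supplied.
    for (key in options) {
        if (Object.prototype.hasOwnProperty.call(options, key)) {
            point[key] = options[key];
        }
    }
    return point;
}

// Order does not matter, and country and status are simply omitted.
var pantages = makePoint({
    name: "Pantages Theater",
    city: "Los Angeles",
    latitude: 34.1017,
    longitude: -118.3255
});
```

The cost is that a misspelled key is silently accepted, which a language with real named parameters would catch.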

Setters

With setter methods, you first construct an object in which all of its fields are set to default values, then you customize the object by setting just those fields that differ from the defaults.

Point p = new Point();
p.setName("Pantages Theater");
p.setLatitude(34.1017);
p.setLongitude(-118.3255);
p.setNeighborhood("Hollywood");
p.setCity("Los Angeles");
p.setState("CA");
p.setCountry("us");
p.setPostalCode("90028");
p.setEstablished(new LocalDate(1930, 6, 4));
p.setStatus(Status.ACTIVE);

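This approach solves three problems with long constructor calls: you no longer have to supply values for fields that keep their defaults, the order of the values no longer matters, and the reader can see what each value means. It introduces a new problem, though: the object being constructed can't be immutable. How can you get around this?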

Some languages allow fields to be marked "final" so that after the first set, they can't be set again.
In some languages, you can scope the setter methods so that only the parts of the code that are supposed to construct the objects can see the setters. Since objects should be able to be constructed anywhere, this all but requires the programmer to factor out construction code into its own module, which could be unwieldy. But there are ways to do it nicely in general; one way is the builder, described next.

Builders

Builder objects are common too. They have the three advantages of the setter approach (no need to supply values for fields that keep their defaults, order doesn't matter, and the reader knows what the arguments mean), while also allowing the newly constructed object to be immutable.

Point p = new PointBuilder()
    .name("Pantages Theater")
    .latitude(34.1017)
    .longitude(-118.3255)
    .neighborhood("Hollywood")
    .city("Los Angeles")
    .state("CA")
    .country("us")
    .postalCode("90028")
    .established(new LocalDate(1930, 6, 4))
    .status(Status.ACTIVE)
    .build();
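
To see how a builder gets immutability, here is a small JavaScript sketch. The PointBuilder below is hypothetical and handles only a few of the fields, and the "us" default is an assumption. Each method records a value and returns the builder so calls can be chained; build() copies the values into a fresh object and freezes it.

```javascript
// Hypothetical builder: collects field values, then hands back a frozen object.
function PointBuilder() {
    this.fields = {country: "us"};    // assumed default, for illustration only
}

// Generate one chainable method per field name.
["name", "latitude", "longitude", "city", "country"].forEach(function (field) {
    PointBuilder.prototype[field] = function (value) {
        this.fields[field] = value;
        return this;                  // returning the builder enables chaining
    };
});

PointBuilder.prototype.build = function () {
    var point = {};
    for (var key in this.fields) {
        if (Object.prototype.hasOwnProperty.call(this.fields, key)) {
            point[key] = this.fields[key];
        }
    }
    return Object.freeze(point);      // the finished object cannot be modified
};

var p = new PointBuilder()
    .name("Pantages Theater")
    .latitude(34.1017)
    .longitude(-118.3255)
    .city("Los Angeles")
    .build();
```

The builder itself stays mutable, which is fine: it is a throwaway, and only the object that comes out of build() needs to be protected.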

Factory Methods

A factory method is any regular method (or regular function) that creates and returns a new object. There's no special syntax here; the body of the factory method simply calls a constructor, builder, or other device. In Java:

public class Card {
    private Suit suit;
    private Rank rank;
    private Card(Suit s, Rank r) {       // Constructor must be private
        suit = s;                        // to force all clients to use
        rank = r;                        // the factory method below
    }
    public static Card fromRankAndSuit(Rank r, Suit s) {
        return new Card(s, r);
    }
    . . .
}
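
The same idea can be sketched in JavaScript, which has no private constructors. This is only one possible arrangement, and the name CardImpl is made up for the example: the constructor lives inside a closure and only the factory function is exported.

```javascript
// A module-like wrapper: only the factory function is exported.
var Card = (function () {
    // This constructor cannot be named from outside the closure
    // (although card.constructor still exposes it to determined clients).
    function CardImpl(suit, rank) {
        this.suit = suit;
        this.rank = rank;
    }
    CardImpl.prototype.toString = function () {
        return this.rank + " of " + this.suit;
    };

    return {
        // The factory method is an ordinary function that returns a new object.
        fromRankAndSuit: function (rank, suit) {
            return new CardImpl(suit, rank);
        }
    };
}());

var card = Card.fromRankAndSuit("Queen", "Hearts");
```

Because the factory is just a function, it is free to do things a constructor can't, such as returning a cached object instead of a new one.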

Literals

In languages without classes, or when classes are not too important (say, when there is duck typing), object literals create new objects. Here's a JavaScript example. Field names are strings, but we get to omit the quotes if the field name is a simple word that is not a reserved word:

{name: "Pantages Theater", lat: 34.1017, "long": -118.3255, 
    neighborhood: "Hollywood", "state or province": "CA",
    country: "us", "postal code": "90028", city: "Los Angeles", 
    established: new LocalDate("1930-06-04"), status: Status.ACTIVE}

Object literals are also possible in languages with statically typed, closed classes. In Ada:

type Point is record
    X: Integer;
    Y: Integer;
end record;
. . .
P: Point := Point'(6, 5);
Q: Point := Point'(Y => 5, X => 6);

Construction in Prototypal Languages

Modern (ECMAScript 5-based) JavaScript has nice features for creating objects with a given prototype. I came up with this pattern:

/*
 * An immutable circle datatype.  Synopsis:
 *
 * var c = Circle.create(5, {x: 1, y: 4});
 * c.radius      => 5
 * c.center      => {x: 1, y: 4}
 * c.area()      => 25π
 * c.perimeter() => 10π
 */
var Circle = {
    prototype: {
        area: function () {return Math.PI * this.radius * this.radius;},
        perimeter: function () {return 2 * Math.PI * this.radius;}
    },

    create: function (r, c) {
        return Object.create(this.prototype, {
            center: {value: c},
            radius: {value: r}
        });
    }
};

The circles we create with Circle.create are immutable because we use JavaScript's default property descriptors for center and radius: they're not writable, not enumerable, we can't delete them, nor can we change these settings. The same isn't true for the Circle properties (prototype and create), nor for the prototype's own properties. Is it worth using property descriptors on these too?
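
One possible answer to that question, offered as a sketch and not a recommendation, is to skip the individual descriptors and call Object.freeze on both Circle and its prototype. The code below repeats the pattern so that it runs on its own, then checks the descriptor of radius and tries a couple of writes.

```javascript
var Circle = {
    prototype: {
        area: function () {return Math.PI * this.radius * this.radius;},
        perimeter: function () {return 2 * Math.PI * this.radius;}
    },
    create: function (r, c) {
        return Object.create(this.prototype, {
            center: {value: c},
            radius: {value: r}
        });
    }
};

// Freeze the prototype's own properties and the Circle object itself.
Object.freeze(Circle.prototype);
Object.freeze(Circle);

var c = Circle.create(5, {x: 1, y: 4});

// writable, enumerable and configurable all default to false here.
var descriptor = Object.getOwnPropertyDescriptor(c, "radius");

// These writes are ignored in sloppy mode and throw a TypeError in strict mode.
try { c.radius = 100; } catch (e) { /* strict mode */ }
try { Circle.prototype.area = function () {return 0;}; } catch (e) { /* strict mode */ }
```

Note that all of this is shallow: the object passed in as the center can still be changed by whoever holds a reference to it, unless it is frozen too.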

2010-04-17

Global Variables

The concept of a global variable seems pretty simple, especially to beginning programmers writing a single-file program in a block-structured language. But for languages with modules, or languages that support multithreading or multiprocessing, or with scripting languages embedded in other systems that expose "host" objects via script variables, things are not so clear.

Most programmers would warn that globals should be used sparingly, for various reasons:

Because they can be written to from anywhere, it's harder to tell what might be going on just by looking at local portions of code. Code is harder to debug.
Globals make multithreaded programming harder to reason about, and make race conditions more likely to pop up.
Parameters are almost always a more readable and understandable way to "share" data among routines.

How do we live with, or manage, global variables? What constructs can a language have to mitigate these problems?

Languages that Require Global Variables

There's an interesting distinction between languages that allow global variables and those that require them. If a language allows pieces of a program to be brought together from different physical components (such as files), but does not provide a programming language construct like a module or package, then global variables are basically required to share information. Even if you adopt a message-passing type of communication between functions in different files, without a nice module construct, those functions are referenced with global variables.

Modules

Many languages have modules for the express purpose of limiting the number of global variables and hence minimizing the number of name collisions. The simplest example is that of a math module, found in most modern languages. Instead of global variables for E, PI, sin, cos, tan, etc., these values are wrapped in a module construct (usually called Math), which is the only top-level name exposed to the rest of the code. Examples:

# Ruby-like
module Math
  def self.sin(x) ... end
  def self.cos(x) ... end
  PI = 3.141592653589793
  E = 2.718281828459045
end

// JavaScript-like
var Math = {
    sin: function (x) {...},
    cos: function (x) {...},
    PI: 3.141592653589793,
    E: 2.718281828459045
};

// Java-like
public class Math {
    private Math() {}
    public static double sin(double x) {...}
    public static double cos(double x) {...}
    public static final double PI = 3.141592653589793;
    public static final double E = 2.718281828459045;
}

Modules create a namespace for these values which would otherwise be global. If there is a chance of having too many modules (that is, a possibility of having names of modules collide), modules can be grouped into some higher level named collection (Java calls this a package). Whether or not the additional construct is used, one can always use a hierarchical naming convention for global entities, starting, for instance, with a (reversed) DNS name that the programmer "owns."
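
In a language without packages, the hierarchical convention can be imitated with nested objects. Here is a JavaScript sketch; the com.example.geometry name and the distance function are invented for the illustration.

```javascript
// Hypothetical reversed-DNS namespace: the whole hierarchy hangs off one global, com.
var com = com || {};
com.example = com.example || {};
com.example.geometry = com.example.geometry || {};

// Everything else is a property somewhere inside that single global.
com.example.geometry.distance = function (p, q) {
    var dx = p.x - q.x;
    var dy = p.y - q.y;
    return Math.sqrt(dx * dx + dy * dy);
};
```

The "x = x || {}" idiom lets several files contribute to the same namespace without clobbering each other, whatever order they are loaded in.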

Multithreading

The dangers of unprotected global variables in multithreaded code (e.g., race conditions, lost updates) are so well known we don't need to explain them here. If you are stuck with a global shared variable for communication between threads, your language might provide some mechanism for atomic updating. For example:

atomic var balance = 0;

AtomicInteger balance = new AtomicInteger(0);
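
For comparison, later editions of ECMAScript added an Atomics object that works on integers stored in shared memory. The sketch below runs in a single thread, so it only shows the shape of the API; the variable names are mine, and the sharing with worker threads is left out.

```javascript
// A shared 32-bit integer. SharedArrayBuffer memory can be handed to worker threads.
var shared = new SharedArrayBuffer(4);
var balance = new Int32Array(shared);

// Atomics.add does the read-modify-write as one indivisible step
// and returns the value that was stored before the addition.
var before = Atomics.add(balance, 0, 100);

// Atomics.sub is the matching subtraction.
Atomics.sub(balance, 0, 30);

// Atomics.load reads the current value atomically.
var current = Atomics.load(balance, 0);
```

As with AtomicInteger, this protects single updates only; a sequence such as "check the balance, then withdraw" still needs some larger form of synchronization.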

Internet Explorer Events

In most event-driven systems, an event object is passed to event handlers. If you are using JavaScript, and are writing event handlers to function on any browser except Internet Explorer, this will be the case. You write code like:

document.body.onclick = function (e) {
    alert("Clicked at (" + e.clientX + "," + e.clientY + ")");
}

With Internet Explorer, the event object is not passed to your handler; instead, the most recently fired event is accessed by a global variable (no kidding!) called event. IE gets away with this because client-side JavaScript is single-threaded: all events are queued and are handled sequentially. To make your code work on multiple browsers you can use the following idiom:

document.body.onclick = function (e) {
    if (!e) e = event;
    alert("Clicked at (" + e.clientX + "," + e.clientY + ")");
}

Sigils

Because local variables are usually better than global variables, one interesting design choice is to make locals the default and make globals ugly. Ruby does exactly this: global variables begin with a $ character. That's a sigil — a character used within a variable name to indicate its type, category, or (in this case) scope.

Example:

$x = 10
def f
  x = 3
  puts x
  puts $x
end
def g
  puts $x
end
def h
  puts x
end
f        # writes 3 then 10
g        # writes 10
puts $x  # writes 10
h        # raises a NameError

Another example:

x = 5
def f
  puts x
end
f        # raises a NameError

Hiding in Closures and Anonymous Functions

TODO