2010-09-30

Values and No Values

Consider something as simple as recording a property for an object, say, a person's supervisor. What kind of possibilities exist? Here are some:

  1. I have a supervisor and her name is Alice.
  2. I definitely have no supervisor.
  3. I may or may not have a supervisor; I really don't know.
  4. I do know whether I have a supervisor, but I don't care to make this information public; in other words, it's none of your business.

How can we capture these cases in a programming language?

Strings Only

For languages with unsophisticated type systems (say, strings or symbols only), you might try:

var supervisor1 = "Alice";
var supervisor2 = "None";
var supervisor3 = "Unknown";
var supervisor4 = "Private Info";

The obvious problem here is that people can have any name at all, even "None", "Nobody", "N/A", "Robert'); drop table students;--", or even "". You'd have to express in comments, and add logic to your application, that certain strings really aren't names. This isn't what we'd call a clean solution. What we want is a solution in which the values representing no (or an unknown) supervisor belong to a type or types other than plain string.
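To make the collision concrete, here is a minimal sketch. The SENTINELS list and the hasSupervisor helper are hypothetical names, introduced only to show how string sentinels misfire:

```javascript
// Hypothetical list of strings the application agrees to treat as "not a name".
var SENTINELS = ["None", "Unknown", "Private Info"];

// Hypothetical helper: true when the string is taken to be a real name.
function hasSupervisor(supervisor) {
  return SENTINELS.indexOf(supervisor) === -1;
}

var ordinary = hasSupervisor("Alice"); // true
// A supervisor whose name really is "None" is indistinguishable from
// the sentinel, so this is false even for a real person.
var collision = hasSupervisor("None"); // false
```

Nothing in the type of the value tells the two meanings of "None" apart; only the convention in SENTINELS does.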

Disjoint Sum Types

ML and related languages feature a very clean syntax for defining flexible datatypes:

datatype Person = None | Unknown | WillNotSay | Actual of string;
val supervisor1 = Actual "Alice";
val supervisor2 = None;
val supervisor3 = Unknown;
val supervisor4 = WillNotSay;

This works very nicely: the type of None is Person, not string. In fact, the type of each supervisor variable above is (inferred to be) Person. While this is a nice statically-typed solution, a programmer would need to add the special constants None, Unknown, and WillNotSay to all types for which this kind of information is relevant. You can do this "once" with a polymorphic type:

datatype 'a info = None | Unknown | WillNotSay | Actual of 'a;
val supervisor1 = Actual "Alice";
val supervisor2: string info = None;
val supervisor3: string info = Unknown;
val supervisor4: string info = WillNotSay;

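Here we've given explicit types to supervisor2, supervisor3, and supervisor4 because the None, Unknown, and WillNotSay constructors are polymorphic: without the annotations, each variable would be inferred to have type 'a info rather than string info.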

JavaScript's null and undefined

In a dynamically typed language like JavaScript we often have values in distinct types that represent nothingness or lack of knowledge. In JavaScript, for example:

  • null is the sole member of the Null type and represents the certainty of having no real value.
  • undefined is the sole member of the Undefined type and represents the lack of knowledge about the real value.
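A short sketch of how the two values behave in practice (the variable names are only for illustration):

```javascript
var declared; // declared but never assigned, so its value is undefined

var isUndefined = declared === undefined;  // true
var strictlyEqual = null === undefined;    // false: they are different values
var looselyEqual = null == undefined;      // true: loose equality treats them alike

var typeOfUndefined = typeof undefined;    // "undefined"
var typeOfNull = typeof null;              // "object", a long-standing quirk
```

Because loose equality lumps the two together, use === when the difference between "no value" and "don't know" matters.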

JavaScript doesn't natively distinguish between cases 3 and 4 (lack of knowledge versus the refusal to share that knowledge). You could argue that the distinction rarely matters anyway. If you really wanted to make it, you could do something like this:

var alice = {name: "Alice", supervisor: null};
var bob = {name: "Bob", supervisor: alice};
var eve = {name: "Eve", supervisor: undefined};
var mallory = {name: "Mallory"};

Here the absence of a supervisor property on mallory says that she makes no claim about having a supervisor and isn't telling you whether she has one at all. This approach is a little ugly in practice since, in JavaScript anyway, evaluating mallory.supervisor produces undefined, exactly as eve.supervisor does! You'd have to dig into the object and examine its properties in order to pick up on the difference.
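One way to do that digging, sketched with the in operator and hasOwnProperty (the result variable names are only for illustration):

```javascript
var eve = {name: "Eve", supervisor: undefined};
var mallory = {name: "Mallory"};

// Plain property access cannot tell the two objects apart.
var sameOnAccess = eve.supervisor === mallory.supervisor; // true

// The in operator asks whether the property exists at all.
var eveHasProperty = "supervisor" in eve;         // true
var malloryHasProperty = "supervisor" in mallory; // false

// hasOwnProperty does the same but ignores inherited properties.
var malloryOwnProperty = mallory.hasOwnProperty("supervisor"); // false
```

Every piece of code that reads supervisor has to remember to make this extra check, which is what makes the encoding fragile.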

Marker Objects

Another approach that works well in a dynamically-typed language is to create the special values yourself, as simple, plain old objects. In JavaScript:

var NONE = {};
var UNKNOWN = {};
var WILL_NOT_SAY = {};

It should be fairly easy to figure out how to use these values.
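For instance, a minimal sketch of one way to use them. The describeSupervisor helper and the sample people are hypothetical, not part of any library:

```javascript
var NONE = {};
var UNKNOWN = {};
var WILL_NOT_SAY = {};

// Hypothetical helper: turns a person's supervisor value into a sentence.
// Objects are compared by identity, so the three markers are distinct
// from each other even though each is an empty object.
function describeSupervisor(person) {
  var s = person.supervisor;
  if (s === NONE) return person.name + " has no supervisor";
  if (s === UNKNOWN) return person.name + " does not know";
  if (s === WILL_NOT_SAY) return person.name + " will not say";
  return person.name + " reports to " + s.name;
}

var alice = {name: "Alice", supervisor: NONE};
var bob = {name: "Bob", supervisor: alice};
var mallory = {name: "Mallory", supervisor: WILL_NOT_SAY};

var bobLine = describeSupervisor(bob); // "Bob reports to Alice"
```

Unlike the string sentinels, a marker can never be confused with a real supervisor, whatever that supervisor's name is.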

This approach doesn't work quite as well in a statically typed language. In Java, for example, marker objects would be implemented like this:

public static final Object NONE = new Object();
public static final Object UNKNOWN = new Object();
public static final Object WILL_NOT_SAY = new Object();

But this means, of course, that properties such as supervisor have to be given the Java type Object, which defeats the point of using a statically typed language. ML's disjoint sum types are a better fit for statically typed languages.
