Programming Language Design Sketchbook: Global Variables

The concept of a global variable seems pretty simple, especially to beginning programmers writing a single-file program in a block-structured language. But for languages with modules, or languages that support multithreading or multiprocessing, or with scripting languages embedded in other systems that expose "host" objects via script variables, things are not so clear.

Most programmers would warn that globals should be used sparingly, for various reasons:

Because they can be written to from anywhere, it's harder to tell what might be going on just by looking at local portions of code. Code is harder to debug.
Globals make multithreaded programming harder to reason about, and make race conditions more likely to pop up.
Parameters are almost always a more readable and understandable way to "share" data among routines.

How do we live with, or manage, global variables? What constructs can a language have to mitigate these problems?

Languages that Require Global Variables

There's an interesting distinction between languages that allow global variables and those that require them. If a language allows pieces of a program to be brought together from different physical components (such as files), but does not provide a programming language construct like a module or package, then global variables are basically required to share information. Even if you adopt a message-passing type of communication between functions in different files, without a nice module construct, those functions are referenced with global variables.

Modules

Many languages have modules for the express purpose of limiting the number of global variables and hence minimizing the number of name collisions. The simplest example is that of a math module, found in most modern languages. Instead of global variables for E, PI, sin, cos, tan, etc., these values are wrapped in a module construct (usually called Math), which is the only top-level name exposed to the rest of the code. Examples:

# Ruby-like
module Math
  def sin(x) ... end
  def cos(x) ... end
  PI = 3.141592653589793
  E = 2.718281828459045
end

// JavaScript-like
var Math = {
    sin: function (x) {...},
    cos: function (x) {...},
    PI: 3.141592653589793,
    E: 2.718281828459045
};

# Java-like
public class Math {
    private Math() {}
    public static double sin(double x) {...}
    public static double cos(double x) {...}
    public static double PI = 3.141592653589793;
    public static double E = 2.718281828459045;
}

Modules create a namespace for these values which would otherwise be global. If there is a chance of having too many modules (that is, a possibility of having names of modules collide), modules can be grouped into some higher level named collection (Java calls this a package). Whether or not the additional construct is used, one can always use a hierarchical naming convention for global entities, starting, for instance, with a (reversed) DNS name that the programmer "owns."

Multithreading

The dangers of unprotected global variables in multithreading code (e.g., race conditions, lost updates) are so well known we don't need to explain them here. If you are stuck with a global, shared, variable for communication between threads, your language might provide some mechanism for atomic updating. For example:

atomic var balance = 0;

AtomicInteger balance = new AtomicInteger(0);

Internet Explorer Events

In most event-driven systems, an event object is passed to event handlers. If you are using JavaScript, and are writing event handlers to function on any browser except Internet Explorer, this will be the case. You write code like:

document.body.onclick = function (e) {
    alert("Clicked at (" + e.clientX + "," + e.clientY + ")");
}

With Internet Explorer, the event object is not passed to your handler; instead, the most recently fired event is accessed by a global variable (no kidding!) called event. IE gets away with this because client-side JavaScript is single-threaded: all events are queued and are handled sequentially. To make your code work on multiple browsers you can use the following idiom:

document.body.onclick = function (e) {
    if (!e) e = event;
    alert("Clicked at (" + e.clientX + "," + e.clientY + ")");
}

Sigils

Because local variables are usually better than global variables, one interesting design choice is to make locals the default and make globals ugly. Ruby does exactly this: global variables begin with a $ character. That's a sigil — a character used within a variable name to indicate its type, category, or (in this case) scope.

Example:

$x = 10
def f
  x = 3
  puts x
  puts $x
end
def g
  puts $x
end
def h
  puts x
end
f        # writes 3 then 10
g        # writes 10
puts $x  # writes 10
h        # raises a NameError

Another example:

x = 5
def f
  puts x
end
f        # raises a NameError

Hiding in Closures and Anonymous Functions

TODO

Programming Language Design Sketchbook

2010-04-17

Global Variables