The Pizza Compiler

The Pizza language is an extension to Java with three new features:
- Generics (aka Parametric polymorphism)
- Function pointers (aka First-class functions)
- Class cases and pattern matching (aka Algebraic types)

Pizza Example: Enumerator

 

 

 

The Problem
This problem shows how Pizza's first-class functions and parametric polymorphism help code reuse. We are looking for solutions to two somewhat related problems:
  1. Read lines from standard input, capitalize each line as you read it, and print out the capitalized line. Input is terminated by an empty line.
  2. Read integers from standard input, one on each line, again terminated by an empty line. Print the sum of all the numbers read.
The individual problems are each trivial. The challenge is to build up a class library that helps in reusing code between these two problems, as well as others like them.


The Java Solution
An element common to both problems is the task of reading individual lines of input. The Java solution factors out this task using an enumeration. The class Lines implements the Java enumeration interface by providing the two procedures
boolean hasMoreElements()
and
Object nextElement() .
In fact the implementation of the first method is trivial, and the second is only slightly more interesting. Since the standard Java class DataInputStream detects the end of a file using exceptions, we simply return true every time the first method is called:
    public boolean hasMoreElements() {
	return true;
    }

The second method is also quite simple. We attempt to read in a line of text, and catch any exceptions which would indicate reading past the file's end:

    public Object nextElement() {
	try {
	    return in.readLine();
	} catch (IOException ex) {
	    throw new NoSuchElementException();
	}
    }
For simplicity, we convert all I/O errors into the single exception indicating that no more elements may be read from the enumerator.

The two programs Caps and Adder each use the Lines class to read successive lines. The main routines of the two classes have similar structure. In each, we begin by initializing the enumerator with the standard keyboard input. The main loop of the program consists of reading and manipulating the input. To fetch the input we use the nextElement method, whose result is of type Object and which must be converted to String by the narrowing cast

    line = (String)nextElement
Then after reading the input in each iteration, we perform some manipulation to the input, and use it to update some part of the state of the system. The loops are exited upon reading a blank line. Following each loop, the methods may each have some shutdown operations. In other words, each main routine has the rough form
     public static void main(String[] argv) {
	Enumeration lines = new Lines(new DataInputStream(System.in));
        while (lines.hasMoreElements()) {
	    String line = (String)lines.nextElement();
	    if (nonEmpty(line)) {
		update(line.manipulate());
	    } else {
		break;
	    }
	}
	shutdown();
    }

In the main routine of the Caps class, the manipulation is with the toUpperCase method, the state update is by printing (thus changing the state of the global output), and the shutdown operation is empty:

     public static void main(String[] argv) {
	Enumeration lines = new Lines(new DataInputStream(System.in));
	while (lines.hasMoreElements()) {
	    String line = (String)lines.nextElement();
	    if (nonEmpty(line)) {
		System.out.println(line.toUpperCase());
	    } else {
		break;
	    }
	}
	/* No shutdown operation */
    }

In the main routine of the Adder class, the manipulation is by converting the string to a number, the state change is addition into a an accumulator variable, and the shutdown operation is to print the final value of the accumulator:

     public static void main(String[] argv) {
	Enumeration lines = new Lines(new DataInputStream(System.in));
	int sum = 0;
	while (lines.hasMoreElements()) {
	    String line = (String)lines.nextElement();
	    if (nonEmpty(line)) {
		sum = sum + Integer.parseInt(line);
	    } else {
		break;
	    }
	}
	System.out.println(sum);
    }

The coding of this example is somewhat awkward, a point we expand upon in a footnote. The coding style anticipates the discussion below by bringing several issues together in a textually small space. The footnote shows how a less contorted coding style will still allow the same benefits in the transition from Java to Pizza.


The Pizza Solution
Before we present the Pizza solution, let's consider the less satisfying aspects of the Java code above. Although the main routines of both the Adder and Caps classes have the same pattern, it would be difficult at best to abstract this behavior without incurring a large interpretive overhead. As a result, most of the two routines - the part we wrote in black above - are duplicates of each other. Such duplication of code clearly reduces programmer productivity as well as increasing the chance that some subtle coding error might be introduced. The problem is one of abstraction: simple object-orientation allows us to encapsulate the particulars of reading from a file in the Enumerator class, but it is more difficult to modify existing Enumerator class objects for different patterns of control.

One possible solution is creation of additional subclasses of the Lines enumerator which override the appropriate methods. While effective, this approach has two drawbacks. First, such extra classes greatly increase code size, much of which is again duplication, even though one of our original criticisms was the size and duplication within the original code. It is likely that the new classes will be used only once each, meaning that the extra code length gives little gain in functionality.

A second drawback which also affects code readability and maintenance is that these new classes are textually separated from the main routines which use them. Understandability is decreased since the reader must flip back and forth between classes to determine the overall effect of a piece of code; maintenance is more difficult since one must consider the effects of changing a much greater amount of code which may or may not be used in yet more places in the program.

A final minor problem with the Java solution is the need for a type cast. Since it would be difficult for the system to decide the subclass of Object to which the result should be expressed, we must make this conversion explicit. Such conversions are standard under the Java type system, but we will see how the proper use of type polymorphism can make such casts unnecessary.

The Pizza solution makes use of the abstract class net.sf.pizzacompiler.util.Enumerator, which extends the standard net.sf.pizzacompiler.util.Enumeration interface. The Pizza version of both classes are parameterized with nextElement()'s result type. Hence, the narrowing casts of the Java solution are no longer necessary. Aside from the type polymorhism, the basic Pizza Enumeration class is essentially identical to the original Java version.

It is in the Enumerator class that we make best use of Pizza's more expressive constructs. In the Java solution, we found it necessary to repeat common control patterns in the two main routines. In Pizza, we can standardize the control pattern, and pass the particular behavior as a parameter. The two main routines which we consider here exhibit three distinct flavors of abstracted behavior: the termination condition for the loop, the conversion from the original input lines and the state manipulation based on the converted input.

All three flavors are similarly derived. The Pizza version of the plain Lines enumerator features additional methods which returns new enumerators computed as a variation on a given enumerator object. For example, the basic Lines class specifies a trivial routine for the hasMoreElements method, but the termination condition in our two examples is different. We can use the takeWhile method to derive a new enumerator whose behavior is just the same as the originally instantiated Lines object, but with a different termination condition.

We specify the termination condition using a first-class function value. In a language such as Java where function values are not considered first-class, an object's methods may be accessed in only one way: by calling them. For example, in both of the main classes of our example we have a static nonEmpty method which indicates when we have received all of the input we wish to consider. In Java, we can only call this method, but in Pizza, we can also pass the uncalled method as a parameter when we call another method. Thus where

    new Lines(new DataInputStream(System.in))
returns an instantiation of the Lines class to read lines of standard input until an exception is raised,
    new Lines(new DataInputStream(System.in))
	    .takeWhile(nonEmpty)
instantiates a second enumerator based on the first which produces output only when nonEmpty returns true. In other words, on the first element where nonEmpty returns false, the modified hasMoreElements will return false and further calls to nextElement will raise an exception - both whether or not there are more lines to read on the standard input.

We implement the second flavor of behavior by another method which receives a function value. The enumerator map method takes a function f and returns a new enumerator whose output is the result of applying f to the original input. That is, if the originally instantiated enumerator

    new Lines(new DataInputStream(System.in))
	    .takeWhile(nonEmpty)
returns some value V at a particular call, then the enumerator
    new Lines(new DataInputStream(System.in))
	    .takeWhile(nonEmpty)
            .map(f)
returns f(V) at the corresponding call.

In the Pizza version of the Caps class, we wish to capitalize each element of the input. We have a static method toUpperCase which performs this manipulation, and we pass this function as the argument of map. So the call

    new Lines(new DataInputStream(System.in))
	    .takeWhile(nonEmpty)
            .map(toUpperCase)
returns capitalized versions of successive lines of input.

For the Adder class, we take advantage of another aspect of first-class functions in order to produce an even more concise main routine. In Caps, we explicitly named the function which we passed to map. If we were passing, say, a constant integer or string to a method we might consider it excessive to give the constant a name, especially if that use of the constant were independant from all other uses of the same constant in the program. But since we consider functions to be first-class citizens of our language, we can use a function value without giving it a name, just as we can do with any other value.

Function values are created using the fun keyword. The expression

    fun(String s) -> int { return Integer.parseInt(s); }
denotes a function which maps a single parameter s of type String to the integer value expressed in digits by that string. So where the enumerator
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
will give lines of string input until an empty string is read, the modification
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
        .map(fun(String s) -> int { return Integer.parseInt(s);
})
will convert each string to the appropriate integer before returning it.

The final flavor of behavior concerns what we do with the enumerated values. The map method alters each enumeration, but does not actually run through the enumerated values. A call to the function named by map is made only when each element is requested, but mapping a function onto an enumeration does not by itself cause the program to loop through the enumerated elements, disposing of each as appropriate.

In the Caps class, we want to print each capitalized line to the standard output. Again, we define a static method println which simply passes its value to the standard system routine for text output:

    static void println(String s) {
	System.out.println(s);
    }
We can then pass this routine to the enumerator method forall. Once again, this method expects a function as its parameter. The function should accept values of the enumerated type - here, String - and should not return anything. That is, the result type of the function should be void. So where the enumerator
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
	.map(toUpperCase);
returns successive capitalized lines of input, the call
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
	.map(toUpperCase)
	.forall(println);
will read each line of input, passing each in turn to the println function. The type of this latter expression is void: we have told the compiler when to stop generating items, how to adjust each one, and what to do with the adjusted versions, so it can process the enumeration without further instructions.

The enumerators in the Adder class are processed slightly differently: for Caps, each line of input affects only the corresponding line of output. Once we have printed the capitalized line, we can safely discard it. In Adder, each number does affect subsequent behavior. The numbers are to be summed, and only the final output printed out. Intuitively, we want to take the series of enumerated values

V1, V2, V3, ... , Vn-1, Vn.
and replace each comma ``,'' with an addition symbol ``+''
V1 + V2 + V3 + ... + Vn-1 + Vn.
and then print the result.

The intuitive description leaves two unanswered questions: should the addition associate to the right or left, and how should the replacement behave when there are no numbers in the enumeration. Addition of course follows the associativity axiom

(x + y) + z = x + (y + z) ,
but associativity does still matter, first since passed functions may have side effects which are not associative, and second since we want our abstraction to be more generally applicable. So we answer the first question by simply providing two such reduce methods, one for each associativity.

For the second question, we will expect an initial value to be passed to the behavior abstractor along with the behavior function. When the enumeration has no elements, the initial value will be immediately returned; otherwise we begin with this value when we add (or otherwise operate) the elements of the operation. So the two possible reduce methods would transform an enumeration

V1, V2, V3, ... , Vn-1, Vn.
and default value DEFAULT into either a left-associative reduction
((( ... (((DEFAULT + V1) + V2) + V3) + ... ) + Vn-1) + Vn)
or a right-associative reduction
(V1 + (V2 + (V3 + ( ... + (Vn-1 + (Vn + DEFAULT)) ... )))) .
So where the enumeration
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
        .map(fun(String s) -> int { return Integer.parseInt(s);
})
generates integers corresponding to those typed as digits on the standard input, the call
    new Lines(new DataInputStream(System.in))
	.takeWhile(nonEmpty)
        .map(fun(String s) -> int { return Integer.parseInt(s); })
	.reduceLeft(0, fun(int x, int y)->int { return x + y;
}));
will sum the integers, beginning with the initial subtotal 0 (zero), returning the final result. We want to print this final total out, so the call
	System.out.println(
	    new Lines(new DataInputStream(System.in))
	        .takeWhile(nonEmpty)
                .map(fun(String s) -> int { return
Integer.parseInt(s); })
	        .reduceLeft(0, fun(int x, int y) -> int { return x +
y; }));
completes the main routine.


Some Final Remarks
The methods map, takeWhile, forAll and reduceLeft are known as higher-order functions, since they are functions which accept (and could conceivably return) other function values. The higher-order methods we provide with the Enumerator class are modelled after the corresponding functions in a list class of a typical functional language. Enumerator objects are very similar to the lazy list data structures of nonstrict functional languages such as Haskell. Data structures in many languages are strict: the entire structure is immediately computed. On the other hand, both Enumerator sequences and lazy lists are computed only one node at a time, and only when each element is explicitly required. A more thorough discussion of the applications of nonstrict data structures may be found in most textbooks on functional languages, for example that of Bird and Wadler.

Students of design patterns will find much of this discussion familiar. Behavioral patterns such as Iterator, Strategy and Template Method are concerned with exactly the sort of pattern discussed in this example: higher-order functions and the separation of a typical flow of control from the particulars of specific instances of that flow. By augmenting simple object-orientation with first-class functions and a richer notion of type polymorphism, we can more readily identify such patterns, and express them in much more concise code.


References

R. Bird and P. Wadler, FILL IN.

E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns, Addison-Wesley Professional Computing Series, 1995.