This problem shows how Pizza's first-class functions and parametric
polymorphism help code reuse. We are looking for solutions to two
somewhat related problems:
- Read lines from standard input, capitalize each line as you read
it, and print out the capitalized line. Input is terminated by an
empty line.
- Read integers from standard input, one on each line, again
terminated by an empty line. Print the sum of all the numbers read.
The individual problems are each trivial. The challenge is to build
up a class library that helps in reusing code between these two
problems, as well as others like them.
An element common to both problems is the task of reading individual
lines of input. The Java solution factors out this task
using an enumeration. The class Lines implements the Java enumeration
interface by providing the two procedures
boolean hasMoreElements()
and
Object nextElement() .
In fact the implementation of the first method is trivial, and the
second is only slightly more interesting. Since the standard Java
class DataInputStream detects the end of a file using
exceptions, we simply return true every time the first method is called:
public boolean hasMoreElements() {
return true;
}
The second method is also quite simple. We
attempt to read in a line of text, and catch any exceptions which
would indicate reading past the file's end:
public Object nextElement() {
try {
return in.readLine();
} catch (IOException ex) {
throw new NoSuchElementException();
}
}
For simplicity, we convert all I/O errors into the single exception
indicating that no more elements may be read from the enumerator.
The two programs Caps and Adder
each use the Lines class to read successive lines. The main
routines of the two classes have similar structure. In each, we begin
by initializing the enumerator with the standard keyboard input. The
main loop of the program consists of reading and manipulating the
input. To fetch the input we use the nextElement method,
whose result is of type Object and which must be converted to
String by the narrowing cast
line = (String)nextElement
Then after reading the input in each iteration, we perform some manipulation to the input, and use it to update some part of the state of the system.
The loops are exited upon reading a blank line. Following each loop,
the methods may each have some shutdown
operations. In other words, each main routine has the rough
form
public static void main(String[] argv) {
Enumeration lines = new Lines(new DataInputStream(System.in));
while (lines.hasMoreElements()) {
String line = (String)lines.nextElement();
if (nonEmpty(line)) {
update(line.manipulate());
} else {
break;
}
}
shutdown();
}
In the main routine of the Caps
class, the manipulation is with the toUpperCase method, the
state update is by printing (thus changing the state of the global
output), and the shutdown operation is empty:
public static void main(String[] argv) {
Enumeration lines = new Lines(new DataInputStream(System.in));
while (lines.hasMoreElements()) {
String line = (String)lines.nextElement();
if (nonEmpty(line)) {
System.out.println(line.toUpperCase());
} else {
break;
}
}
/* No shutdown operation */
}
In the main routine of the Adder
class, the manipulation is by converting the string to a number, the
state change is addition into a an accumulator variable, and the
shutdown operation is to print the final value of the accumulator:
public static void main(String[] argv) {
Enumeration lines = new Lines(new DataInputStream(System.in));
int sum = 0;
while (lines.hasMoreElements()) {
String line = (String)lines.nextElement();
if (nonEmpty(line)) {
sum = sum + Integer.parseInt(line);
} else {
break;
}
}
System.out.println(sum);
}
The coding of this example is somewhat awkward, a point
we expand upon in a footnote.
The coding style anticipates the discussion below by bringing several
issues together in a textually small space. The footnote shows how a
less contorted coding style will still allow the same benefits in the
transition from Java to Pizza.
Before we present the Pizza solution, let's consider the less
satisfying aspects of the Java code above. Although the main routines
of both the Adder and Caps classes have the same
pattern, it would be difficult at best to abstract this behavior
without incurring a large interpretive overhead. As a result, most of
the two routines - the part we wrote in black above - are duplicates
of each other. Such duplication of code clearly reduces programmer
productivity as well as increasing the chance that some subtle coding
error might be introduced. The problem is one of abstraction: simple
object-orientation allows us to encapsulate the particulars of reading
from a file in the Enumerator class, but it is more difficult
to modify existing Enumerator class objects for different
patterns of control.
One possible solution is creation of additional subclasses of the
Lines enumerator which override the appropriate methods.
While effective, this approach has two drawbacks. First, such
extra classes greatly increase code size, much of which is again
duplication, even though one of our original criticisms was the size
and duplication within the original code. It is likely that the new
classes will be used only once each, meaning that the extra code
length gives little gain in functionality.
A second drawback which also affects code readability and
maintenance is that these new classes are textually separated from the
main routines which use them. Understandability is decreased since
the reader must flip back and forth between classes to determine the
overall effect of a piece of code; maintenance is more difficult since
one must consider the effects of changing a much greater amount of
code which may or may not be used in yet more places in the program.
A final minor problem with the Java solution is the need for a
type cast. Since it would be difficult for the system to decide the
subclass of Object to which the result should be expressed,
we must make this conversion explicit. Such conversions are standard
under the Java type system, but we will see how the proper use of type
polymorphism can make such casts unnecessary.
The Pizza solution makes use of the abstract class net.sf.pizzacompiler.util.Enumerator, which
extends the standard net.sf.pizzacompiler.util.Enumeration
interface. The Pizza version of both classes are parameterized with
nextElement()'s result type. Hence, the narrowing casts of
the Java solution are no longer necessary. Aside from the type
polymorhism, the basic Pizza Enumeration class is essentially
identical to the original Java version.
It is in the Enumerator class that we make best use of
Pizza's more expressive constructs. In the Java solution, we found it
necessary to repeat common control patterns in the two main routines.
In Pizza, we can standardize the control pattern, and pass the
particular behavior as a parameter. The two main routines which we
consider here exhibit three distinct flavors of abstracted behavior:
the termination condition for the loop, the conversion from the
original input lines and the state manipulation based on the converted
input.
All three flavors are similarly derived. The Pizza version of the
plain Lines enumerator features additional methods which
returns new enumerators computed as a variation on a given enumerator
object. For example, the basic Lines class specifies a
trivial routine for the hasMoreElements method, but the
termination condition in our two examples is different. We can use
the takeWhile method to derive a new enumerator whose
behavior is just the same as the originally instantiated
Lines object, but with a different termination condition.
We specify the termination condition using a first-class
function value. In a language such as Java where function values
are not considered first-class, an object's methods may be accessed in
only one way: by calling them. For example, in both of the main
classes of our example we have a static nonEmpty method which
indicates when we have received all of the input we wish to consider.
In Java, we can only call this method, but in Pizza, we can also pass
the uncalled method as a parameter when we call another method. Thus
where
new Lines(new DataInputStream(System.in))
returns an instantiation of the Lines class to read lines of
standard input until an exception is raised,
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
instantiates a second enumerator based on the first which produces
output only when nonEmpty returns true. In other
words, on the first element where nonEmpty returns
false, the modified hasMoreElements will return
false and further calls to nextElement will raise an
exception - both whether or not there are more lines to read on the
standard input.
We implement the second flavor of behavior by another method which
receives a function value. The enumerator map method takes a
function f and returns a new enumerator whose output is
the result of applying f to the original input. That is, if
the originally instantiated enumerator
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
returns some value V at a particular call, then the
enumerator
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(f)
returns f(V) at the corresponding call.
In the Pizza version of the Caps class, we wish to
capitalize each element of the input. We have a static method
toUpperCase which performs this manipulation, and we pass
this function as the argument of map. So the call
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(toUpperCase)
returns capitalized versions of successive lines of input.
For the Adder class, we take advantage of another aspect
of first-class functions in order to produce an even more concise main
routine. In Caps, we explicitly named the function
which we passed to map. If we were passing, say, a constant
integer or string to a method we might consider it excessive to give
the constant a name, especially if that use of the constant were
independant from all other uses of the same constant in the program.
But since we consider functions to be first-class citizens of our
language, we can use a function value without giving it a
name, just as we can do with any other value.
Function values are created using the fun keyword. The
expression
fun(String s) -> int { return Integer.parseInt(s); }
denotes a function which maps a single parameter s of type
String to the integer value expressed in digits by that
string. So where the enumerator
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
will give lines of string input until an empty string is read, the
modification
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(fun(String s) -> int { return Integer.parseInt(s);
})
will convert each string to the appropriate integer before returning
it.
The final flavor of behavior concerns what we do with the
enumerated values. The map method alters each enumeration,
but does not actually run through the enumerated values. A call to
the function named by map is made only when each element is
requested, but mapping a function onto an enumeration does not by
itself cause the program to loop through the enumerated elements,
disposing of each as appropriate.
In the Caps class, we want to print each capitalized line
to the standard output. Again, we define a static method
println which simply passes its value to the standard system
routine for text output:
static void println(String s) {
System.out.println(s);
}
We can then pass this routine to the enumerator method
forall. Once again, this method expects a function as its
parameter. The function should accept values of the enumerated type -
here, String - and should not return anything. That is, the
result type of the function should be void. So where the
enumerator
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(toUpperCase);
returns successive capitalized lines of input, the call
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(toUpperCase)
.forall(println);
will read each line of input, passing each in turn to the
println function. The type of this latter expression is
void: we have told the compiler when to stop generating
items, how to adjust each one, and what to do with the adjusted
versions, so it can process the enumeration without further
instructions.
The enumerators in the Adder class are processed slightly
differently: for Caps, each line of input affects only the
corresponding line of output. Once we have printed the capitalized
line, we can safely discard it. In Adder, each number
does affect subsequent behavior. The numbers are to be
summed, and only the final output printed out. Intuitively, we want
to take the series of enumerated values
V1,
V2,
V3,
... ,
Vn-1,
Vn.
and replace each comma ``,'' with an addition symbol ``+''
V1 +
V2 +
V3 +
... +
Vn-1 +
Vn.
and then print the result.
The intuitive description leaves two unanswered questions: should
the addition associate to the right or left, and how should the
replacement behave when there are no numbers in the enumeration.
Addition of course follows the associativity axiom
(x + y) + z = x + (y + z) ,
but associativity does still matter, first since passed functions may
have side effects which are not associative, and second since we want
our abstraction to be more generally applicable. So we answer the
first question by simply providing two such reduce methods,
one for each associativity.
For the second question, we will expect an initial value to be
passed to the behavior abstractor along with the behavior function.
When the enumeration has no elements, the initial value will be
immediately returned; otherwise we begin with this value when we add
(or otherwise operate) the elements of the operation. So the two
possible reduce methods would transform an enumeration
V1,
V2,
V3,
... ,
Vn-1,
Vn.
and default value DEFAULT into either a left-associative
reduction
((( ... (((DEFAULT +
V1) +
V2) +
V3) +
... ) +
Vn-1) +
Vn)
or a right-associative reduction
(V1 +
(V2 +
(V3 +
( ... +
(Vn-1 +
(Vn +
DEFAULT)) ... )))) .
So where the enumeration
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(fun(String s) -> int { return Integer.parseInt(s);
})
generates integers corresponding to those typed as digits on the
standard input, the call
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(fun(String s) -> int { return Integer.parseInt(s); })
.reduceLeft(0, fun(int x, int y)->int { return x + y;
}));
will sum the integers, beginning with the initial subtotal 0 (zero),
returning the final result. We want to print this final total out, so
the call
System.out.println(
new Lines(new DataInputStream(System.in))
.takeWhile(nonEmpty)
.map(fun(String s) -> int { return
Integer.parseInt(s); })
.reduceLeft(0, fun(int x, int y) -> int { return x +
y; }));
completes the main routine.
The methods map, takeWhile, forAll and
reduceLeft are known as higher-order functions, since they
are functions which accept (and could conceivably return) other
function values. The higher-order methods we provide with the
Enumerator class are modelled after the corresponding
functions in a list class of a typical functional language.
Enumerator objects are very similar to the lazy list
data structures of nonstrict functional languages such as Haskell.
Data structures in many languages are strict: the entire structure is
immediately computed. On the other hand, both Enumerator
sequences and lazy lists are computed only one node at a time, and
only when each element is explicitly required. A more thorough
discussion of the applications of nonstrict data structures may be
found in most textbooks on functional languages, for example that of
Bird and Wadler.
Students of design patterns will find much of this
discussion familiar. Behavioral patterns such as Iterator, Strategy
and Template Method are concerned with exactly the sort of pattern
discussed in this example: higher-order functions and the separation
of a typical flow of control from the particulars of specific
instances of that flow. By augmenting simple object-orientation with
first-class functions and a richer notion of type polymorphism, we can
more readily identify such patterns, and express them in much more
concise code.
References
R. Bird and P. Wadler, FILL IN.
E. Gamma, R. Helm, R. Johnson and
J. Vlissides, Design Patterns, Addison-Wesley
Professional Computing Series, 1995.