The Pizza Compiler

The Pizza language is an extension to Java with three new features:
- Generics (aka Parametric polymorphism)
- Function pointers (aka First-class functions)
- Class cases and pattern matching (aka Algebraic types)

Pizza Example: Quicksort

 

 

 

The Problem

Quicksort is a well-known sorting algorithm, first described by Hoare. The algorithm is very efficient in the average case, and is quite frequently used. This problem is a good example because the details of the algorithm are almost universally taught in computer science education, because the implementation is large enought to be interesting without drowning us in details, and (of course!) because the new features of Pizza improve the possible codings of the algorithm in a number of ways.

The Java Solution

The main advantage of coding the algorithm in Pizza rather than Java is that the same code can sort arrays of many different types. Here, the modularity arises from the use of first-class functions and polymorphism. Our top-level sorting function will take two parameters: not only the array to be sorted (as usual), but also a function which compares two elements, returning true if the former is less than (or equal to) the latter. Without the first-class functions, one would have either a separate sorting routine for every different sorting criterion, or else a single routine able to sort only arrays of Object which have some comparison method. The disadvantage of the former is clear, as large amounts of code would need to be duplicated. In the latter, we have two main disadvantages.

The first disadvantage is simple: we cannot cast arrays of primitive elements into arrays of Object, so we will still require separate sorters for each primitive type.

The second disadvantage of sorting based on a comparison method of Object's is that we will casts in the general comparison method: objects which are (for example) strings must first be cast to strings, and then compared using the string comparison function. Casts have two serious problems. First, there will be an additional time cost associated with the casting: since the comparison method is called frequently, this penalty is quite serious. The second problem is a loss of type safety: since the sorting function will accept arrays of any objects at compile-time, and check that the objects are of the proper type only at comparison-time, the program may fail in the middle of execution. Since polymorphism is strongly typed at compile-time, the Pizza compiler can detect immediately whether a type error would occur at run-time, leading to a smaller debugging cycle.

The Pizza Solution

This link summons the Pizza code for Quicksort to a new browser window. You may even download the QuickSorter<A> source code.

The first obvious departure from the ordinary Java language is the use of a type parameter in the class definition (red highlighting). The type variable A indicates that each instantiation of a member of the class may use a different type in that position, but usage within each such object must be consistant. That is, if we instantiate one object to sort arrays of integers, it will be a detectable (compile-time) type error to apply that sorter to an array of (say) characters. In this example, we will use single upper-case letters as our type variables. Of course it is legal to use any valid identifier name for the type variables, but the readability of programs is improved if the identifiers for types and ordinary values are distinctly written.

An object of the quicksorter class stores the function to be used for comparing elements of the array under the private name less. The type variable A is part of this definition, since the type of this function may differ among instantiations. However, for every instantiation to a type M, the value of less will always have type (M,M)-->boolean.

The main sorting routine shows how we use the value of the stored functions (green highlighting). Within the body of the class definition, less refers to a function, so we can call it just like any other function or static member. Note that, just like an uninitialized value, calling a reference to an uninitialized function causes an error. The remainder of the method is absolutely standard: the only new aspect of the sorting loop is its use of the stored function, rather than a single comparison predicate hardcoded into the source code of the sorter.

The QuickSortTest class shows how we may use the sorting methods. We use a few ``helper'' functions: less and greater compare two strings in the usual ways, and showArray prints out the elements of an array of strings. The main routine sorts strings given as command-line arguments. We first sort the strings in increasing order. To do so, we generate a new QuickSorter object q (purple highlighting) with the appropriate comparison function. Once we have built the new object, we can use its unary sort method to sort the argument array.

The next few lines of the main routine illustrate another way to use the sorting routines. Although we have separated the task of choosing a sorting function from the task of sorting the array in the implementation, there is no need to require that these tasks be separated everytime we wish to sort. The static method sort of two arguments combines the two steps is the obvious manner, hiding the details. Here, we sort the array into decreasing order.

A second type variable (purple highlighting) appears in the definition of this static sort method. Since there will not necessarily be an instantiated class object when we call the static method, the type variable A which we associate with the whole class will not be resolved. Since different calls to the static method may of course sort arrays of different types, we need a local type variable for the static method. Even though each call to the method may involve arrays of different types of elements, each call must give an array and function whose types ``match'': the function must take pairs of values whose type is the same as the elements of the array. The local quantification of the type variable is shown by the <B> tag before the name of the method.

Clearly this static sorting method is useful because of its brevity, but the final bit of the example main method shows why the two-step approach is useful as well. To re-sort the array in increasing order, we can use the same object q again. The reuse of this closure saves the resources involved in creating an object and discarding it after the single call of the static method.

Exercise

Rewrite the QuickSorter class so that it does not store the comparison function: there should be only two methods, a public static routine with the same argument and result types as the current static sort method which calls a recursive, private static method with the additional bounds arguments.

References

C.A.R. Hoare, Quicksort, Comput. J., 5 (1962), p. 10-15. Any good textbook on data structures will cover this algorithm; see for example Sedgewick (1990).

R. Sedgewick, Algorithms in C, Addison-Wesley, 1990.