The Pizza Compiler

The Pizza language is an extension to Java with three new features:
- Generics (aka Parametric polymorphism)
- Function pointers (aka First-class functions)
- Class cases and pattern matching (aka Algebraic types)

Pizza Tutorial

 

 

 

This section of the tutorial is an introduction to the features the Pizza language adds to Java 1.1. We assume you know at least the basics about Java programming. The Pizza language is fully compatible with the Java 1.1 language and thus you can compile all your Java sources with the Pizza compiler without any changes.

We'll look first at the features Pizza adds to Java. Then we'll start to combine them and see how much easier Java programming becomes with Pizza.

But first of all you should know that all Pizza source files should use the extension `.pizza'.


Parametric Polymorphism

Parametric Polymorphism gives us the possibility to write classes that operate on data without specifing the data's type. (C++ programmers know the template mechanism that does something similar but without typechecking.)

To illustrate a very simple example we'll write a class that stores an element (source: StoreSomething.pizza):

class StoreSomething<A> {
     A something;

     StoreSomething(A something) {
         this.something = something;
     }

     void set(A something) {
         this.something = something;
     }

     A get() {
         return something;
     }
}
We read the source as a normal Java class StoreSomething that takes the type parameter A. A can be any legal Java (or Pizza) type. Whenever we write an A the Pizza compiler fills in the type we specifiy at the point where we instantiate the class:
StoreSomething<String> a = new StoreSomething("I'm a string!");
StoreSomething<int> b = new StoreSomething(17+4);

b.set(9);

int i = b.get();
String s = a.get();
Note that when allocating an Object of a parametric type, we don't need to pass the instance type. The Pizza compiler infers this type from the constructor arguments.

You might say we could have done it using Java's Object class. In that case we wouldn't be able to store a variable of Java's basic types (as int). Furthermore we would be forced to use several type casts to get the correct types when we use the get() method.

We can extend this class as any other Java class:

class StoreString extends StoreSomething<String> { 
    StoreString(String something) {
        super(something);
    }
}
gives us a specialised class that only stores objects of type String.

class StoreSomething2<A> extends StoreSomething<A> {

    StoreSomething2(A something) {
        super(something);
    }

    void print() {
        System.out.println(""+something);
    }
}
This gives us a parameterised class extending the other parameterised class.

Ok, ok the StoreSomething class is not really helpfull. Let's write a class to store two bits of data:

public class Pair<A, B> {

    public A fst;
    public B snd;

    public Pair(A fst, B snd) {
        this.fst = fst;
        this.snd = snd;
    }
}
This class becomes very handy, if you need a method to return two values:
Pair<String, int> getValues() {
    return new Pair(aString, anInt);
}
An because we know how handy it is, we integrated it to the standard Pizza classes as class pizza.Pair!

All the things we did to classes can also be done to interfaces. As an example we consider this interface to sets of elements of a parameterised type:

interface Set<A> {
     boolean contains(A x);
     Set<A> include(A x);
}
The method include(A) shows us the sets defined by this interface are persistent: Every time we include an element we get a new set (it doesn't change the old one).

A common implementation of a set is a linked list storing the elements. To implement the funcionality of a set we'll use two classes: EmptyList to represent a list without any elements and NonEmptyList. Both extend this abstract list class (source: Polymorph.pizza):

     abstract class LinkedList<A> implements Set<A> {

         public abstract boolean contains(A x);

         public LinkedList<A> insert(A x) {
             if (contains(x)) return this;
             else return new NonEmptyList(x, this);
         }

         public Set<A> include(A x) {
             return insert(x);
         }
     }

     class EmptyList<A> extends LinkedList<A> {

         public boolean contains(A x) {
             return false;
         }
     }

     class NonEmptyList<A> extends LinkedList<A> {

         private A elem;
         private LinkedList<A> rest;

         public NonEmptyList(A elem, LinkedList<A> rest) {
             this.elem = elem;
             this.rest = rest;
         }

         public boolean contains(A x) {
             return ((Object)elem).equals((Object)x) || 
             rest.contains(x);
         }
     }
If this example doesn't look to handy to you: wait until you learn about algebraic types! And if the code for method contains looks a little bit strange to you, wait until you learn about Pizza's typecasts!

As an example for the usage we consider this program which tests whether its first command line argument is repeated in subsequent arguments.

     public class TestForDuplicates {

         public static void main(String[] args) {
             Set<String> set = new EmptyList();
             for (int i = 1; i < args.length; i++)
                 set = set.include(args[i]);
             System.out.println(set.contains(args[0]));
         }
     }

Bounded Polymorphism

Our next task is to implement sets using a binary tree. To implement a binary tree we have to know how to compare the elements. That's why we have to set up a constraint on our parameter type - or to use the correct expression we have to bind the type.

So we set up an interface to wich we'll bind our parameter types:

     interface Comparable {
         boolean equals(Object x);
         boolean less(Object x);
     }
And we limit the elements of our set to it:
     interface Set<A implements Comparable> {
         boolean contains(A x);
         Set<A> include(A x);
     }
This declaration means we can only use types as parameters to the interface Set that implement the interface Comparable.

Our implementation of trees consists again of an abstract class, a class for empty trees and a class for non-empty trees (source: Bounded.pizza):

     abstract class Tree<A implements Comparable>
                                    implements Set<A> {
         public abstract boolean contains(A x);
         public abstract Tree<A> insert (A x);

         public Set<A> include(A x) {
             return insert(x);
         }
     }

     class EmptyTree<A implements Comparable> extends Tree<A> {
         public boolean contains(A x) {
             return false;
         }
         public Tree<A> insert(A x) {
             return new Branch(x, this, this);
         }
     }


     class Branch<A implements Comparable> extends Tree<A> {
         A elem;
         Tree<A> left;
         Tree<A> right;
         
         public Branch(A elem, Tree<A> left, Tree<A> right) {
             this.elem = elem;
             this.left = left;
             this.right = right;
         }

         public boolean contains(A x) {
             if (x.less(elem)) 
                 return left.contains(x);
             else if (elem.less(x))
                 return right.contains(x);
             else // we assume total ordering, therefore x.equals(elem)
                 return true;
         }

         public Tree<A> insert(A x) {
             if (x.less(elem)) 
                 return new Branch(elem, left.insert(x), right);
             else if (elem.less(x))
                 return new Branch(elem, left, right.insert(x));
             else // we assume total ordering, therefore x.equals(elem)
                 return this;
         }
     }
When you try to use this tree implementation you'll find out none of Java classes implements our interface Comparable! To use these classes with the tree we have to wrap it with a class implementing our interface (however, Pizza gives us a way to do it without wrapping classes):
     class ComparableString implements Comparable {

         private String s;

         ComparableString(String s) {
             this.s = s;
         }

         public boolean less(Object other) {
             if (other instanceof ComparableString) 
                 return s.compareTo(((ComparableString)other).s) < 0;
             else
                 throw new Error("can't compare to non-strings");
         }
     }
To be able to include argument strings in sets, we need to wrap them in a ComparableString. To show a sample usage of the tree impementation for sets we use again our test-for-duplicates program:
     public class TestForDuplicates {

         public static void main(String[] args) {
             Set<ComparableString> set = new EmptyTree();
             for (int i = 1; i < args.length; i++)
                 set.include(new ComparableString(args[i]));
             System.out.println(
                 set.contains(new ComparableString(args[0])));
         }
     }

F-bounded Polymorphism

In our implementation of less in the CompareableString class you find again some nasty type checks that make the code less readable and may result in exceptions during runtime. To avoid this the Comparable interface can be refined with a type parameter. We have to refine the Set interface to make sure these type parameters will be the same (source: FBounded.pizza):
     interface Comparable<A> {
         boolean less(A x);
     }

     interface Set<A implements Comparable<A>> {
         boolean contains(A x);
         Set<A> include(A x);
     }
Now ComparableString can be defined as follows:
class ComparableString implements Comparable<ComparableString> {

    private String s;

    ComparableString(String s) { this.s = s; }

    public boolean less(ComparableString other) {
        return s.compareTo(other.s) < 0;
    }
}
Note that the type variable A appears recursively in its own bound in the class definition of Set.

Polymorphic Functions

Type-variables which are type-parameters may only be used in dynamic methods, blocks, and initializers. The reason is that for static code, there is no current instance object, and hence no current instance-type for the type parameter.

Sometimes, it is neccessary to also have polymorphic static functions. Or, one wants a dynamic method which is polymorphic independently of any type parameters. This can be done by prefixing the method return type with a type-variable section:

     class TreeUtils {
         static <A implements Comparable<A>> 
             Tree<A> insertArray(Tree<A> tree, A[] elems) {

             for (int i = 0; i < elems.length; i++)
                 tree = tree.insert(elems[i]);
             return tree;
         }
     }

Implementing parametric types

Restrictions on parametric types


First class functions

First class functions let us use functions as special kinds of data types. We can store a funcion in a variable and access it without any knowledge about the class it's defined in. (If you know C++ you know about function pointers.)

With first class functions we can avoid the effort of wrapping classes to make them implement a special interface. We define all the functions we need to be passed as parameters to our methods. (Yes, we pass a function as parameter to a function!)

The syntax for function types is:
(argtype, ..., argtype) -> resulttype.

Or, if exceptions are thrown by the function:
(argtype, ..., argtype) throws exception, ..., exception -> resulttype.

Any method with a matching signature may be substituted for a function parameter. So a function boolean doIt(int, String) matches (int, String) -> boolean or int read(InputStream) throws IOException matches (InputStream) throws IOException -> int.

As an example we consider the compareTo() method of the Java class String. This method compares to Strings lexically and case sensitve. Now we want to sort Strings. We want to change the behaviour of the sorting by providing the function that compares to Strings:

class PrintSortedStrings {

    static void print(String[] args, (String, String) -> int compare) {
        ....
        if (compare(args[i], args[j]) > 0) 
        ....
    }
}
To uses the compareTo method of the class String in one case and a case insensitive version in another we can now write:
int myCompare(String s1, String s2) {
    return s1.compareTo(s2);
}

int myCompareNoCase(String s1, String s2) {
    return s1.toLowerCase().compareTo(s2.toLowerCase());
}

String[] myStrings = { "These", "Strings", "will", "be", "sorted", "!"};
PrintSortedStrings.print(myStrings, myCompare);
PrintSortedStrings.print(myStrings, myCompareNoCase);

The use of first class functions becomes much more important if you are using parametric polymorphism.

So we go back to our tree example. We needed the method less which was introduced via an interface. We now refine our class Tree to take a function as parameter to the methods contains and insert:

abstract class Tree<A> {
    public abstract boolean contains((A,A)->boolean less, A x);
    public abstract Tree<A> insert((A,A)->boolean less, A x);
}
These methods can now use the variable less to use it's function, as if it was defined locally:
    public boolean contains((A, A) -> boolean less, A x) {
        if (less(x, elem)) 
            return left.contains(less, x);
        else if (less(elem, x))
            return right.contains(less, x);
        else // we assume total ordering, therefore x.equals(elem)
            return true;
    }
We can now implement a tree class to store elements of all types without any restrictions. Even Java's basic type can be used again, which was not possible in the F-bounded version via the interface. (source: FirstClass.pizza)
class Branch<A> extends Tree<A> {
    A elem;
    Tree<A> left;
    Tree<A> right;
    
    public Branch(A elem, Tree<A> left, Tree<A> right) {
        this.elem = elem;
        this.left = left;
        this.right = right;
    }

    public boolean contains((A, A) -> boolean less, A x) {
        if (less(x, elem)) 
            return left.contains(less, x);
        else if (less(elem, x))
            return right.contains(less, x);
        else // we assume total ordering, therefore x.equals(elem)
            return true;
    }

    public Tree<A> insert((A, A) -> boolean less, A x) {
        if (less(x, elem)) 
            return new Branch(elem, left.insert(less, x), right);
        else if (less(elem, x))
            return new Branch(elem, left, right.insert(less, x));
        else 
            return this;
    }
}
To get a tree that stores Strings we can now extend the Branch class and overload the methods contains and insert:

class StringBranch extends Branch<String> {

    StringBranch(String elem, Tree<String> left, Tree<String> right) {
        super(elem, left, right);
    }

    private static boolean stringLess(String x, String y) {
        return x.compareTo(y) > 0;
    }

    public boolean contains(String x) {
        return contains(stringLess, x);
    }

    public Tree<String> insert(String x) {
        return insert(stringLess, x);
    }
}
As second example we implement mutable sets. (Our first implementation had persistent sets.) (source: MutableSet.pizza)
interface Set<A> {
    boolean contains(A x);
    void include(A x);
}

class TreeSet<A> implements Set<A> {
    private (A, A) -> boolean less;
    Tree<A> elems = new EmptyTree();

    public TreeSet((A, A) -> boolean less) {
        this.less = less;
    }
    public boolean contains(A x) {
        return elems.contains(less, x);
    }
    public void include(A x) {
        elems = elems.insert(less, x);
    }
}
We don't have to extend a class that uses first class functions as parameters as we did with the StringTree class. In this example we define the needed function locally to the use of the Set class (in the next section you'll learn about a third way!):
public class TestForDuplicates {

    static boolean stringLess(String x, String y) {
        return x.compareTo(y) < 0;
    }

    public static void main(String[] args) {
        Set<String> s = new TreeSet(stringLess);
        for (int i = 1; i < args.length; i++)
            s.include(args[i]);
        System.out.println(s.contains(args[0]));
    }
}

Anonymous Functions

Instead of defining a method belonging to a class Pizza allows you to define functions locally. These functions don't have a name that's why they are known as anonymous functions.

Pizza's syntax to define an anonymous function is:

fun(arguments) -> resulttype { statements }
With anonumous functions we can recode our test-for-duplicates example (same source: MutableSet.pizza):
public class TestForDuplicatesAnonymous {

    public static void main(String[] args) {
        Set<String> s = 
          new TreeSet(
            fun(String x, String y) -> boolean { 
              return x.compareTo(y) < 0; 
            }
          );
        for (int i = 1; i < args.length; i++)
            s.include(args[i]);
        System.out.println(s.contains(args[0]));
    }
}
We can declare inside other functions (even in other anonymous functions). They may access all variables and methods visible to normal methods at that point.

As a short example we consider an incrementer which will return consecutive integer values on each call. We can easily instantiate several counters of this kind:

      () -> int makeIncr() {
          int count = 0;
          return fun() -> int {
              return count++;
          }
      }
As you see the return value is a function (with the signature () -> int). Whenever we need a incrementer we call this method.
     () -> int incr1 = makeIncr();
     () -> int incr2 = makeIncr();
     System.out.println(
          incr1() + " " + incr1() + " " + incr2() + " " + incr1());
This example will print 0 1 0 2.

We could also write incrementers with a startvalue as parameter:

      () -> int makeIncr(int start) {
          int count = start;
          return fun() -> int {
              return count++;
          }
      }
You have to understand where the parameter acutally goes, it is not a parameter to the anonymous function!

The following function applies a given function in infix order to all elements of a binary tree.

void <A> traverse(Tree<A> t, (A)->void proc) {
    if (t instanceof Branch) {
        Branch b = (Branch)t;
        traverse(b.left, proc);
        proc(b.elem);
        traverse(b.right, proc);
    }
}
We can use this method to sum up all elements of a tree of integers:
int sum(Tree<int> t) {
    int n = 0;
    traverse(t, fun(int x) -> void { n += x; });
    return n;
}
The technique in traverse is one of the most commonly used benefits of first class functions. You find numerous examples in Pizza's List class.

Algebraic Data Types

In the list implementation of sets we used one abstract base class and two classes to represent list fields. We often have this situation representing a data abstraction by an abstract base class and some subclasses which represent individual cases. Depending on the type of the subclass we change the behaviour.

But if we want to examine a structure (like our List) from outside we need to use type tests and type casts (like in the traverse example in the last section).

Pizza gives us an easy to use syntax to avoid the complicated abstract class/concrete cases construction: algebraic data types

In Pizza class can contain declarations of these forms:

case ident(arguments);
case ident;
With this syntax our Tree class becomes:
class Tree<A> implements Set<A> {
    case Empty;
    case Branch(A elem, Tree<A> left, Tree<A> right);
}
This example defines one class Tree which provides the variable Empty representing empty branches and a method Branch(A, Tree<A>, Tree<A>) instantiating a representation for a non-empty branch.

To instantiate a tree with three elements we can write:

Tree<int> t = Tree.Branch(2,
     Tree.Branch(1, Tree.Empty, Tree.Empty),
     Tree.Branch(3, Tree.Empty, Tree.Empty));
To avoid the need of using the classname Tree in every instantiation Pizza allows us to import all cases of a class. Just as an import from a Java package we write an import statement in the file header (source: Algebraic.pizza):
import Tree.*;

class Tree<A> {
    case Empty;
    case Branch(A elem, Tree<A> left, Tree<A> right);
}

public class Test {
    public static void main(String[] args) {
        Tree<int> t = Branch(2,
            Branch(1, Empty, Empty),
            Branch(3, Empty, Empty));
    }
}

Pattern matching

Until now we only know how to create a tree, but we can access none of the tree's data. Pizza internally generates inner classes to represent algebraic cases of a class. In these classes the arguments we provide in the case syntax become variables. We could access the element stored in a branch this way:
Tree<String> t = ... ;
String s;
if (t instanceof Branch) {
    s = (Branch)t.elem;
} else {
    s = "(empty)";
}
This, again, seems very complicated in standard Java syntax with all it's typechecks. Pizza provides us a very simple way to access instances of an algebraic class similar to Java's switch statement (source: Pattern.pizza).
class Tree<A> {
    case Empty;
    case Branch(A elem, Tree<A> left, Tree<A> right);

    public boolean contains((A, A) -> boolean less, A x) {
        switch (this) {
        case Empty:
            return false;
        case Branch(A elem, Tree<A> l, Tree<A> r):
            if (less(x, elem)) 
                return l.contains(less, x);
            else if (less(elem, x))
                return r.contains(less, x);
            else 
                return true;
        }
    }

    public Tree<A> insert((A, A) -> boolean less, A x) {
        switch (this) {
        case Empty:
            return Branch(x, Empty, Empty);
        case Branch(A elem, Tree<A> l, Tree<A> r):
            if (less(x, elem)) 
                return Branch(elem, l.insert(less, x), r);
            else if (less(elem, x))
                return Branch(elem, l, r.insert(less, x));
            else 
                return this;
        }
    }
}
As we see the switch (var) statement for algebraic types is very powerful. It does two things for us: It selects the case corresponding to the case of the instance of var. And it assigns all variables in a constructed case with the corresponding fields (in our example l matches left, because it's on the same position in the argument list).

In the matching patterns we can replace parameters we don't want to access with a single '_':

    Tree<A> goRight() {
        switch (this) {
        case Empty:
            return this;
        case Branch(_, _, Tree<A> r):
            return r.goRight();
        }
    }

Deep pattern matching

Pattern matching is not restricted to just one level of variables. If you really want to understand it examine this complicated example:
    static <A,B> List<C> mapPairs((A,B)->C f, List<A> xs, List<B> ys) {

        List<C> result;
        switch (Pair.Pair(xs, ys)) {
        case Pair(List.Nil, _):
        case Pair(_, List.Nil):
            result = List.Nil;
            break;
        case Pair(List.Cons(A x, List<A> xs1),
                  List.Cons(B y, List<B> ys1)):
            result = List.Cons(f(x, y), mapPairs(f, xs1, ys1));
            break;
        case _: System.out.println("case 2"); break;
        }
        return result;
    }

Special case: Classes with a single constructor

The algebraic class syntax is so convenient that it's useful even if there are no sub-cases. An example is the class Pair:

class Pair<A,B> {
    case Pair(A fst, B snd);
}
is a shorthand for
class Pair<A,B> {
    A fst;
    B snd;
    Pair(A fst, B snd) {
        this.fst = fst;
        this.snd = snd;
    }
}
If there's only one constructor, no separate class for the case will be produced. Therefore, it's OK to use the same name Pair for the class and the case.

Special case: Enums

By defining a class with only constant subcases we get enumeration types, which are missing in Java (compared to C):
class Color {
    case Red;
    case Green;
    case Blue;

    String toString() {
        switch (this) {
        case Red: return "Red";
        case Green: return "Green";
        case Blue: return "Blue";
        }
    }
}
This is often preferable to using sets of integer constants since a typecheck by the compiler is provided and you can't use an undefined value.

Type casts in Pizza

To use parametric polymorphism even for Java's basic types we sometimes need the ability to cast between objects and basic types. Pizza provides typecasts from basic types to their boxing classes and back:
int basicX;
Integer integerX;
Object objectX;

basicX = (int)integerX;
basicX = (int)objectX;
integerX = (Integer)basicX;
objectX = (Object)basicX;
However, as always typecasts can go wrong during runtime!

Tail Call Recursion

If you use too deep recursion the execution of your program might stop with an StackOverflowException. The reason is all return addresses, local variables and method parameters are stored on the stack. There is a case in which the Pizza compiler can avoid the enormous stack usage. If the last command in a recursively called method is the call to itself we call it tail call recursion. In this case you can tell the Pizza compiler to generate code that does not use the stack. The method itself has to be marked with continue and the recursive call to itself has to be done with the goto statement (source: TailCall.pizza):
continue int veryRecursive(int howDeep) {
    if (howDeep == 0) {
        System.out.println("I'm there -- finally!");
        return -1;
    } else return goto veryRecursive(howDeep - 1);
}
However, this approach results in code many times slower than the normal solution via the stack. So you should use it very carefully.

Pizza's API Classes

To get into programming with Pizza you should have a closer look to it's standard API classes listed here. You'll find the documentation to all the classes in your Pizza distribution in the directory pizza/doc. These HTML pages are automatically generated by Pizzadoc.

Package pizza.lang:

List<A> Store lists of elements and can apply functions to the elements in many ways. Offer functional-programming-like access to data.
ListBuffer<A> Buffers to which elements can be appended in constant time, and which can be converted to lists. Modelled after java.lang.StringBuffer.
Cell<A> Reference cells holding a single mutable item of type A.
Pair<A,B> Pairs of values.

Package pizza.util:

Enumeration<A> An interface for type-safe enumeration, modelled after java.util.Enumeration.
Enumerator<A> An abstract class which defines additional functionality for enumerations. E.g. map, filter, fold, concat.
CompositeEnumerator<A>
EmptyEnumerator<A>
ListEnumerator<A>
SingletonEnumerator<A>
MapEnumerator<A>
SetEnumerator<A>
Concrete implementations of enumerators.
Hashtable<Key,Value> Type-safe hashtables, modelled after java.util.Hashtable.
Set<Elem> (Mutable) sets.
Option<A> Optional occurence: Either an A or a value for nothing.


Further Information

Pizza implements everything on top of Java. You can translate your Pizza sources to Java sources by providing the command line parameter -s to the compiler. All internally generated classes will be written as Java sources to the class directory. This is the best way to find out how Pizza creates these additional features.



Tutorial author: Enno Runne.