noodling towards a functional brain

Saturday, February 18, 2012

Correcting the Visitor pattern.

I'm writing this post because I want to address a problem that I see time and time again: people who are trying to figure out how to encode algebraic data types in languages that do not support them come up with all kinds of crazy solutions. But, there is a simple and effective encoding of algebraic data types that everyone knows about, but had just been doing wrong: the Visitor pattern. At this point, I expect many devotees of functional programming to start screaming "but... mutability!!!" And, yes, the mutability required by the textbook definition of the Visitor pattern is indeed a major problem - the problem that I intend to show how to correct right here. The solution is remarkably simple. Here's a tree traversal using the Visitor pattern, implemented *correctly* in Java:
public interface Tree<A> {
  public <B> B accept(TreeVisitor<A, B> v);
}

public class Empty<A> implements Tree<A> {
  public <B> B accept(TreeVisitor<A, B> v) {
    return v.visitEmpty();
  }
}

public class Leaf<A> implements Tree<A> {
  public final A value;

  public Leaf(A value) {
    this.value = value;
  }

  public <B> B accept(TreeVisitor<A, B> v) {
    return v.visitLeaf(this);
  }
}

public class Node<A> implements Tree<A> {
  public final Tree<A> left;
  public final Tree<A> right;

  public Node(Tree<A> left, Tree<A> right) {
    this.left = left;
    this.right = right;
  }

  public <B> B accept(TreeVisitor<A, B> v) {
    return v.visitNode(this);
  }
}

public interface TreeVisitor<A, B> {
  public B visitEmpty();
  public B visitLeaf(Leaf<A> t);
  public B visitNode(Node<A> t);
}
This is exactly the traditional Visitor pattern, with one minor variation: the 'accept' method is generic (parametrically polymorphic), returning whatever return type is defined by a particular TreeVisitor instance instead of void. This is a vitally important distinction; by making accept polymorphic and non-void returning, it allows you to escape the curse of being forced to rely on mutability to accumulate a result. Here's an example of the implementation of the 'depth' method from the previously mentioned blog post, and an example of its use. You'll note that no mutable variables were harmed (or indeed used) in the creation of this example:
public class TreeUtil {
  public static final <A> TreeVisitor<A, Integer> depth() {
    return new TreeVisitor<A, Integer>() {
      public Integer visitEmpty() {
        return 0;
      }

      public Integer visitLeaf(Leaf<A> t) {
        return 1;
      }

      public Integer visitNode(Node<A> t) {
        int leftDepth = t.left.accept(this);
        int rightDepth = t.right.accept(this);
        return (leftDepth > rightDepth) ? leftDepth + 1 : rightDepth + 1; 
      }
    };
  }
}

public class Example {
  public static void main(String[] argv) {
    Tree<String> leftBiased = new Node<String>(
      new Node<String>(
        new Node<String>(
          new Leaf<String>("hello"),
          new Empty<String>()
        ),
        new Leaf<String>("world")
      ),
      new Empty<String>()
    );

    assert(leftBiased.accept(TreeUtil.<String>depth()) == 3);
  }
}
TreeVisitor encodes the f-algebra for the Tree data type; the accept method is the catamorphism for Tree. Moreover, given this definition, you can also see that the visitor forms a monad (example in Scala), giving rise to lots of nice compositional properties. Implemented in this fashion, Visitor is actually nothing more (and nothing less) than a multiple dispatch function over the algebraic data type in question. So stop returning void from your visitors, and ramp up their power in the process!

10 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. (sorry for weird blogspot formatting)
    This is a very good solution. But one question, is there any reason for giving individual names to visit methods? Why not do like this:

    public interface TreeVisitor<A, B> {
    public B visit(Empty e);
    public B visit(Leaf<A> aLeaf);
    public B visit(Node<A> t);
    }

    ReplyDelete
  3. The reason that I prefer giving individual names to the visit methods is that I see reusing the 'visit' name to be a somewhat spurious use of overloading. It makes the visitor pattern look more complex than it actually is, and it causes some people to believe that the pattern is not applicable in languages (like Ruby) that don't support overloading. In general I'm not a big fan of overloading, particularly since I spend most of my time in Scala (which supports named and default arguments as a superior alternative to the common uses of overloading.)

    ReplyDelete
  4. Anonymous1:52 PM

    Why is 'A' required as Type Argument in visitor ?

    How about something like this ?
    public T accept(TreeVisitor<.T> v);

    Since we want states to be immutable, I guess accept method will always have to be
    return v.visitNode(this);

    That means we only need 1 Type argument

    ReplyDelete
  5. Anon: Well, try it, and see whether you can get the compiler to like it! The reason that there are two type arguments is that you need to represent both the value type of the Tree and the result type of the computation.

    ReplyDelete
  6. Multiple dispatch or pattern match?

    ReplyDelete
  7. public B empty(), public B leaf(A value), public B tree(Tree[A] left, Tree[A] right) might be slightly less noisy way.

    ReplyDelete
  8. Anonymous5:29 AM

    This blog post is pretty good tbh, I wish someone would fix wikipedia too

    ReplyDelete
  9. crossreference http://stackoverflow.com/a/43815348/6996876 and pingback

    ReplyDelete
  10. Anonymous10:14 AM

    One thing I cannot understand: Why must the return types of accept() and visit() be void? If they were Object, as in
    public Object accept()
    doesnt the problem go away?
    -Saru

    ReplyDelete

Note: Only a member of this blog may post a comment.

About Me

My photo
aspiring to elegant simplicity