Guava Collections Subtleties


At Palantir, we write most of our code in Java, but we miss functional language features like map and filter for working with collections of objects. To make up for that, we use Google’s Guava libraries, but occasionally they don’t behave quite how we expect.

Read on to see one of our Guava puzzlers.

The Puzzle

Can you figure out why the following test doesn’t pass?

public void testTreeBuild() {
        // Create root.
        Node root = new Node(0L);
        // Create a list of ids.size() children.
        Collection<Long> ids = Lists.newArrayList(1L);
        Collection<Node> children =
        Collections2.transform(ids, new Function<Long, Node>(){
        public Node apply(Long id) {
            Node n = new Node(id);
            return n;
        }});
        // Give every child a grandchild.
        for (Node child : children) {
            Node grandchild = new Node(2L);
            child.addChild(grandchild);
        }
        // Attach every child to root.
        for (Node child : children) {
            root.addChild(child);
        }
        // Validate that root has children.
        assertEquals(1, root.numChildren());
        // Validate that children have children.
        for (Node child : root.children) {
            assertEquals(1, child.numChildren());
        }
    }
    private static class Node {
        public long id;
        public List<Node> children = Lists.newArrayList();
        public Node(long id) {
            this.id = id;
        }
        public void addChild(Node child) {
            this.children.add(child);
        }
        public int numChildren() {
            return children.size();
        }
    }

The Solution

Many of Guava’s library functions, including Collections2.transform(), provide a live view of the underlying collection. Most of the time this is a feature—when you update the underlying collection, you get an updated derived collection. The downside is that every time you perform an operation on the derived collection, it performs the operation on the underlying collection and then executes the function on that result. In this case, when we iterate over the children collection to give every child a grandchild, we invoke the node creation function once per id and create one new node. Later, when we attach each child to the root, we create new nodes again for each id, so the nodes that have grandchildren are not the same as the nodes that are attached to the root. This means that the last assertion of this test fails.

Best practices for derived collections that at compute views on underlying collections:

  • Don’t use functions that have side effects
  • Don’t use functions that are slow
  • Don’t create objects in functions
  • If you want to use a function that doesn’t obey the above rules, immediately create a new collection so that the function is invoked exactly once per object, for example: Lists.newArrayList(Collections2.transform(ids, function))