I had to parse a rather big XML document in Java that looked something like this:

<root>
	<child1>
		<a>I need this element</a>
		<b>but not this one</b>
		<c>this one is also irrelevant</c>
	</child1>
	<child2>
		<a></a>
		<b></b>
		<c></c>
	</child2>
	<!-- and so on, to fill more than 4 MB -->
</root>

There are hundreds of first-level child nodes, but each one has only a few children.

Given any child node, I had to get one of its own children by name. I had two options:

  • iterate over the list of children, and check each one to see if it’s the one I am looking for (5 lines of code);
  • or be lazy and use a very simple XPath expression (1 line of code).

Being lazy, and since each context node had only a few child nodes, I decided to go with XPath. I knew there would be some overhead, but it shouldn’t be very big, right?

Wrong.

My code wasn’t doing anything fancy, but it took 40 seconds to complete. Curious about the overhead introduced by XPath, I replaced each XPath call with a loop over node.getChildNodes(). The total time dropped to 0.1 seconds.

I wrote a small benchmark to illustrate this problem. It boils down to these two methods:

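/**
 * Returns the direct child of parent with the given name using XPath.
 * The expression is just the element name, evaluated relative to parent;
 * xpath is an XPath instance shared by the benchmark.
 */
private static Node getChildXPath(Node parent, String name) throws Exception {
	return (Node)xpath.evaluate(name, parent, XPathConstants.NODE);
}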

/**
 * Returns the direct child of parent with the given name using getChildNodes(),
 * or null if there is no such child
 */
private static Node getChildDom(Node parent, String name) throws Exception {
	NodeList list = parent.getChildNodes();
	for(int i=0; i<list.getLength(); i++) {
		Node child = list.item(i);
		if(child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
			return child;
		}
	}
	return null;
}
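
To make the comparison easier to reproduce, here is a minimal self-contained sketch of how the two methods could be wired into a timing loop. It is not the exact benchmark from the repository: the class name XPathVsDomSketch, the buildDocument helper and the 500-child synthetic document are assumptions made for illustration, and the timings it prints will vary with the JDK and the machine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathVsDomSketch {
    // One XPath instance reused for every lookup (assumed setup).
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Hypothetical helper: builds <root><child1><a/><b/><c/></child1>...</root> in memory. */
    public static Document buildDocument(int childCount) throws Exception {
        StringBuilder xml = new StringBuilder("<root>");
        for (int i = 1; i <= childCount; i++) {
            xml.append("<child").append(i).append(">")
               .append("<a>a").append(i).append("</a>")
               .append("<b>b</b><c>c</c>")
               .append("</child").append(i).append(">");
        }
        xml.append("</root>");
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml.toString())));
    }

    /** Direct child lookup through XPath, as in the post. */
    public static Node getChildXPath(Node parent, String name) throws Exception {
        return (Node) XPATH.evaluate(name, parent, XPathConstants.NODE);
    }

    /** Direct child lookup by walking getChildNodes(), as in the post. */
    public static Node getChildDom(Node parent, String name) {
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node child = list.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDocument(500);
        NodeList children = doc.getDocumentElement().getChildNodes();

        // Time the plain DOM lookup over every first-level child.
        long start = System.nanoTime();
        for (int i = 0; i < children.getLength(); i++) {
            getChildDom(children.item(i), "a");
        }
        long domMs = (System.nanoTime() - start) / 1_000_000;

        // Time the same lookups through XPath.
        start = System.nanoTime();
        for (int i = 0; i < children.getLength(); i++) {
            getChildXPath(children.item(i), "a");
        }
        long xpathMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Children were found using DOM in " + domMs + " ms");
        System.out.println("Children were found using XPath in " + xpathMs + " ms");
    }
}
```

The synthetic document contains no whitespace between elements, so every first-level child is an element node; a real file with indentation would also contain text nodes, which the DOM version skips thanks to its node type check.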

The full code is available on GitHub.

On my Linux box with OpenJDK 1.6.0_20, I got the following results:

Children were found using DOM in 2 ms
Children were found using XPath in 4016 ms

Lesson learned: javax.xml.xpath is slow as hell, even for very simple expressions.