- Jérôme Nègre's piece of web -

JUN 26 2011 Beware of javax.xml.xpath

I had to parse a rather big XML document in Java. It looked something like this:

<root>
	<child1>
		<a>I need this element</a>
		<b>but not this one</b>
		<c>this one is also irrelevant</c>
	</child1>
	<child2>
		<a></a>
		<b></b>
		<c></c>
	</child2>
	<!-- and so on to fill more than 4 MB -->
</root>

There are hundreds of first-level child nodes, but each one has only a few children.
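
To make the shape of the data concrete, here is a minimal sketch of loading such a document into a DOM and walking its first-level children. The class and method names (FirstLevelChildren, parse, firstLevelChildren) are made up for this illustration and are not part of the benchmark; the XML string is a shortened stand-in for the real file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstLevelChildren {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Collects the element children of the root, skipping text nodes and comments. */
    static List<Element> firstLevelChildren(Document doc) {
        List<Element> result = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the real document (assumption for illustration).
        String xml = "<root>"
                + "<child1><a>I need this element</a><b>but not this one</b></child1>"
                + "<child2><a></a><b></b></child2>"
                + "<!-- and so on -->"
                + "</root>";
        for (Element child : firstLevelChildren(parse(xml))) {
            System.out.println(child.getNodeName());
        }
    }
}
```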

Given any child node, I had to get one of its own children by name. I had two options: evaluate an XPath expression against the node, or loop over node.getChildNodes() myself.

Being lazy, and since each context node had only a few children, I decided to go with XPath. I knew there would be some overhead, but it shouldn't be very big, right?

Wrong.

My code wasn't doing anything fancy, but it took 40 seconds to complete. Curious about the overhead introduced by XPath, I replaced it with a handful of node.getChildNodes() calls. The total time dropped to 0.1 second.

I wrote a small benchmark to illustrate this problem. It boils down to these two methods:

/**
 * Returns the direct child of parent with the given name using XPath
 */
private static Node getChildXPath(Node parent, String name) throws Exception {
	// xpath is a javax.xml.xpath.XPath instance created elsewhere in the class
	return (Node)xpath.evaluate(name, parent, XPathConstants.NODE);
}

/**
 * Returns the direct child of parent with the given name using getChildNodes()
 */
private static Node getChildDom(Node parent, String name) throws Exception {
	NodeList list = parent.getChildNodes();
	for(int i=0; i<list.getLength(); i++) {
		Node child = list.item(i);
		if(child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
			return child;
		}
	}
	return null;
}

The full code is available on GitHub.
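
For readers who just want something to paste and run, here is a rough self-contained sketch built around the same two lookups. It is not the benchmark from the repository: the class name XPathVsDomSketch, the buildDocument helper, the synthetic document and the count of 500 children are all assumptions made for this illustration, and the timings it prints will vary with the JDK and the machine.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XPathVsDomSketch {

    // One shared XPath instance, reused for every lookup (not thread-safe).
    static final XPath xpath = XPathFactory.newInstance().newXPath();

    /** Returns the first direct child of parent with the given name, using XPath. */
    static Node getChildXPath(Node parent, String name) throws Exception {
        return (Node) xpath.evaluate(name, parent, XPathConstants.NODE);
    }

    /** Returns the first direct child of parent with the given name, using getChildNodes(). */
    static Node getChildDom(Node parent, String name) {
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node child = list.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    /** Builds a synthetic root with `count` children, each holding a, b and c (hypothetical data). */
    static Document buildDocument(int count) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < count; i++) {
            Element child = doc.createElement("child" + i);
            for (String name : new String[] {"a", "b", "c"}) {
                Element leaf = doc.createElement(name);
                leaf.setTextContent(name + i);
                child.appendChild(leaf);
            }
            root.appendChild(child);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDocument(500); // arbitrary size, chosen for illustration
        NodeList children = doc.getDocumentElement().getChildNodes();

        long start = System.nanoTime();
        for (int i = 0; i < children.getLength(); i++) {
            getChildDom(children.item(i), "a");
        }
        System.out.println("DOM:   " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        for (int i = 0; i < children.getLength(); i++) {
            getChildXPath(children.item(i), "a");
        }
        System.out.println("XPath: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Both loops do the same work on the same nodes, so any gap between the two printed times comes from the lookup method itself. A single cold run like this is only a rough indication, not a rigorous measurement.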

On my Linux box with OpenJDK 1.6.0_20, I got the following results:

Children were found using DOM in 2 ms
Children were found using XPath in 4016 ms

Lesson learned: javax.xml.xpath is slow as hell, even for very simple expressions.

Last modified: Jan. 23, 2012 at 19:47:37 CET