Friday, May 22, 2009

Xalan-J Serialization Performance hindered by Flushing

Following the "Chaining Transformations" approach I described in XML and XSLT Tips and Tricks for Java, I developed a performance-sensitive system centered on XML processing. "Pipelining" the various steps reduces both memory use and execution time. The example in my previous post used only System.out as a sample final destination. Unfortunately, an issue quickly appeared once a similar approach was used in a real-world situation where the output went to a higher-latency destination. The approach was and still is correct, but a workaround is currently necessary to avoid a bug in the Apache Xalan/Serializer implementation that would otherwise cause a severe performance penalty.

Between 2001 and 2003, the XALANJ-78 bug report discussed when flush() should be called on the result. The overall consensus was that it is, and should be, called only from endDocument(). That would mean only one flush operation per document, which seems acceptable.

However, I found that flush() is called much more often, at least in versions of Xalan-J from 2.6.0 (used in Java 1.5/5.0 - 1.6/6.0) through the latest, 2.7.1. It seems that any call to TransformerIdentityImpl.startPrefixMapping(…) calls ContentHandler.startPrefixMapping(…), which has no overloads in the public API. This is implemented by ToStream.startPrefixMapping(String prefix, String uri), which in turn calls the non-API method ToStream.startPrefixMapping(String prefix, String uri, boolean shouldFlush) with "shouldFlush" always true. This in itself seems to be correct, in that "shouldFlush" affects other logic beyond just flushing the output stream. However, this always calls flushPending(), which then flushes the actual output stream.

The result? The output stream or writer may be flushed as often as once per XML element written. I reported this in XALANJ-2500, along with an example that demonstrates 100 XML elements being written and flush() being called the same number of times. That particular example requires namespaced XML elements. However, I first ran into the problem with an XSL that used XML namespaces for parameter names, while the generated document was completely within the default namespace.

Assume that the output destination has a latency of even just 50ms per flush. Writing just the small sample document of 100 elements will then take 5 seconds! In some related scenarios, wrapping the OutputStream or Writer in a BufferedOutputStream or BufferedWriter can improve performance by allowing the caller to write without causing a call to the underlying system for each write. Unfortunately, that does not help here: each call to flush() on the buffered implementations simply writes the buffer to the underlying output and then flushes that output as well.
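
To make the buffering problem concrete, here is a minimal sketch that does not involve Xalan at all. It simulates a serializer that flushes after every element and counts how many flush() calls pass through a BufferedOutputStream to the destination. The class names (FlushPropagationDemo, CountingOutputStream) and the "<item/>" payload are made up for illustration; the 50ms figure is the assumed latency from above.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class FlushPropagationDemo {

    /** Hypothetical stand-in for a slow destination; it only counts flush() calls. */
    static class CountingOutputStream extends ByteArrayOutputStream {
        int flushCount = 0;

        @Override
        public void flush() throws IOException {
            flushCount++;
            super.flush();
        }
    }

    /** Writes the given number of elements, flushing after each one, as the serializer effectively does. */
    static int flushesReachingDestination(int elements) throws IOException {
        CountingOutputStream destination = new CountingOutputStream();
        OutputStream out = new BufferedOutputStream(destination, 8192);
        byte[] element = "<item/>".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < elements; i++) {
            out.write(element); // stays in the buffer
            out.flush();        // empties the buffer AND flushes the destination
        }
        return destination.flushCount;
    }

    public static void main(String[] args) throws IOException {
        int flushes = flushesReachingDestination(100);
        System.out.println("flush() calls reaching destination: " + flushes);
        System.out.println("estimated cost at 50 ms each: " + (flushes * 50) + " ms");
    }
}
```

Every one of the 100 flush() calls reaches the destination, so the buffer on its own saves nothing in this scenario.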

The only solution I'm aware of at the moment is the one I mentioned in the bug report: use a subclass of BufferedOutputStream or BufferedWriter, with flush() overridden to do essentially nothing. (See my NoFlushBufferedOutputStream and NoFlushBufferedWriter classes in MarkUtils-IO for an implementation.)
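
For readers who just want the idea, a rough sketch of such a stream might look like the following. This is not the MarkUtils-IO code; it only borrows the class name, and the reallyFlush() method, the NoFlushDemo wrapper and the counting destination are my own assumptions for illustration. One detail worth noting: close() has to push the buffer out explicitly, because the inherited close() relies on flush(), which is now a no-op.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class NoFlushDemo {

    /** Sketch only: buffers like BufferedOutputStream but ignores flush() requests. */
    static class NoFlushBufferedOutputStream extends BufferedOutputStream {
        NoFlushBufferedOutputStream(OutputStream out, int size) {
            super(out, size);
        }

        @Override
        public void flush() {
            // Intentionally empty: the buffer is written when full or on close().
        }

        /** Hypothetical escape hatch for callers that really need a flush. */
        public void reallyFlush() throws IOException {
            super.flush();
        }

        @Override
        public void close() throws IOException {
            try {
                super.flush(); // write buffered bytes; the inherited close() would skip them
            } finally {
                super.close();
            }
        }
    }

    /** Hypothetical stand-in for a slow destination; it only counts flush() calls. */
    static class CountingOutputStream extends ByteArrayOutputStream {
        int flushCount = 0;

        @Override
        public void flush() throws IOException {
            flushCount++;
            super.flush();
        }
    }

    /** Returns {flushes before close, flushes after close, bytes delivered}. */
    static int[] run(int elements) throws IOException {
        CountingOutputStream destination = new CountingOutputStream();
        OutputStream out = new NoFlushBufferedOutputStream(destination, 8192);
        byte[] element = "<item/>".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < elements; i++) {
            out.write(element);
            out.flush(); // ignored by the wrapper
        }
        int before = destination.flushCount;
        out.close();
        return new int[] { before, destination.flushCount, destination.size() };
    }

    public static void main(String[] args) throws IOException {
        int[] r = run(100);
        System.out.println("flushes before close: " + r[0]);
        System.out.println("flushes after close: " + r[1]);
        System.out.println("bytes delivered: " + r[2]);
    }
}
```

In a transformation, the wrapper would sit between the StreamResult and the real destination, and the caller closes it once the transform has finished. The trade-off is that nothing is guaranteed to reach the destination until the buffer fills or the stream is closed, so this is only appropriate when nobody downstream depends on intermediate flushes.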
