Thursday, January 28, 2010

Redundant argument validation code in Java IO classes

Today I noticed some almost comical redundant checks in a number of the java.io InputStream, OutputStream, Reader, and Writer classes. I'm looking at the read and write methods with a signature of ([] x, int off, int len), where "[] x" is either a byte or char array, and "x" is named "b" for a byte array, or "c" or "cbuf" for a char array. One example is InputStream.read(byte[], int, int).

Below is a portion of the source code used within StringReader, StringWriter, BufferedReader, BufferedWriter, CharArrayReader, CharArrayWriter, and ByteArrayOutputStream, identical between all versions of Java checked from 1.3 - 1.6 (6.0), and only slightly reformatted for readability:

if ( (off < 0) || (off > x.length) || (len < 0)
    || ((off + len) > x.length) || ((off + len) < 0) ) {
  throw new IndexOutOfBoundsException();
}

Keep in mind that these methods are typically called within a loop, and depending upon the chosen array/buffer size and the amount of data being read or written, these methods and the shown checks may be executed many times. Any unnecessary statements will only hinder performance. Notice any redundancies in the check?

First, if neither "off" nor "len" are less than 0 (both already checked), it is guaranteed that the sum of "off" and "len" will also never be less than 0. This makes the "((off + len) < 0)" check completely redundant, and it should be removed.

Second, if the sum of "off" and "len" is not greater than the length of the array, and if both "off" and "len" are positive (all already checked), it is guaranteed that "off" alone will also never be greater than the length of the array. This makes the "(off > cbuf.length)" check completely redundant, and it should be removed.

This would simplify and shorten the above checks to the following:

if ( (off < 0) || (len < 0) || ((off + len) > x.length) ) {
  throw new IndexOutOfBoundsException();
}

While I haven't checked, I suppose it is possible that this type of issue could be optimized away by the compiler or even the JVM at runtime - but I wouldn't hold my breath.

BufferedOutputStream in all the same versions works a little differently and doesn't perform any of its own argument validation.

Someone apparently realized this issue on ByteArrayInputStream, and made an optimization. In Java 1.5 / 5.0 and previous, the check was the same as all the others above. In Java 1.6 / 6.0, the check is now written as:

if ( off < 0 || len < 0 || len > x.length - off ) {
  throw new IndexOutOfBoundsException();
}

This is basically identical to my optimized version above, just with "off" moved to the other side of the comparison, with the operator properly switched from '+' to '-' to match. However, all versions of the read method in ByteArrayInputStream add an unnecessary check for "(b == null)", only to manually throw a NullPointerException. An identical NullPointerException will already be thrown on the attempt to read the "length" property from the array if the array is null in the above check. This makes the additional check redundant, and it should be removed.

BufferedInputStream takes an interesting, alternative approach:

if ( (off | len | (off + len) | (x.length - (off + len))) < 0) {
  throw new IndexOutOfBoundsException();
}

Notice that bitwise ORs are being used instead of logical ORs. Any negative value bitwise OR'd with any other value(s) will produce a negative result. However, this code is performing another redundant operation. Again, if neither "off" nor "len" are less than 0 (both already checked), it is guaranteed that the sum of "off" and "len" will also never be less than 0. This makes the "(off + len))" check completely redundant, and should be removed. This can simplify and shorten to:

if ( (off | len | (x.length - (off + len))) < 0 ) {
  throw new IndexOutOfBoundsException();
}

I'm not certain which of the approaches (bitwise or logical ORs) should be faster. Successive logical ORs can be skipped once an earlier portion evaluates to true. However, for almost all (assuming valid) calls to these methods, these expressions should almost always evaluate to false, which then requires an entire logical OR expression to be evaluated regardless. I'd be interested to hear any detailed arguments for one way or another.

I reported this to the Sun (Oracle) bug database (internal review ID 1705260), and will post an update if and when I receive a public bug ID.

1 comment: