Tuesday, September 22, 2009

MarkUtils-Codec: Base64, URL, and other byte/char conversions

This is an overdue introduction of my latest addition to MarkUtils. MarkUtils-Codec could be considered a high-performance replacement for Apache Commons Codec. Like Commons Codec, this implementation supports Base64, URL (a.k.a. percent encoding, covered previously), and hexadecimal encoding and decoding. Also like Commons Codec, it uses a number of interfaces that allow the various codecs to be used interchangeably. Unlike Commons Codec, this implementation is designed for higher performance: it is written for streaming use with the java.nio Buffer classes. The most significant advantage of this design is lower memory usage, especially when working with long data.
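To illustrate why buffer-based streaming keeps memory bounded, here is a minimal sketch of a hex encoder that works one buffer at a time, loosely modeled on the CharsetEncoder style of consuming as much input as fits into the output and leaving the rest for the next call. The class and method names (HexStreamingSketch, encodeStep, encodeAll) are hypothetical and are not the MarkUtils-Codec API; the point is only that the output side never needs more than one small, reused CharBuffer, no matter how long the input is.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

/** Hypothetical sketch; not part of MarkUtils-Codec. */
public class HexStreamingSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /**
     * One buffer-to-buffer step, loosely modeled on CharsetEncoder.encode(in, out, endOfInput):
     * encodes as many bytes as fit into "out" and leaves any remainder in "in".
     */
    static void encodeStep(ByteBuffer in, CharBuffer out) {
        // Each input byte produces exactly two hex characters.
        while (in.hasRemaining() && out.remaining() >= 2) {
            int b = in.get() & 0xFF;
            out.put(DIGITS[b >>> 4]);
            out.put(DIGITS[b & 0x0F]);
        }
    }

    /** Encodes all of "in" to "sink", reusing one output buffer of the given capacity. */
    static void encodeAll(ByteBuffer in, Writer sink, int outCapacity) throws IOException {
        if (outCapacity < 2) {
            throw new IllegalArgumentException("Output buffer must hold at least one encoded byte");
        }
        CharBuffer out = CharBuffer.allocate(outCapacity);
        while (in.hasRemaining()) {
            encodeStep(in, out);
            out.flip();        // switch the buffer from writing to reading
            sink.append(out);  // hand this chunk to the downstream consumer
            out.clear();       // reuse the same memory for the next chunk
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        encodeAll(ByteBuffer.wrap(new byte[]{0, 1, (byte) 0xAB, (byte) 0xFF}), sink, 4);
        System.out.println(sink); // prints 0001abff
    }
}
```

With a 4-character output buffer, the four input bytes are encoded in two chunks of two bytes each, and the same CharBuffer is reused for both.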

MarkUtils-Codec is really a follow-up to one of my previous posts, Improving URLEncoder/URLDecoder Performance in Java. While the API I proposed and the sample code I provided solved an immediate need, the lack of proper interfaces made it difficult to swap in other codecs, such as Base64. The options for plugging into other standard streaming classes were also limited. For example, there was no clear way to create an InputStream that would read decoded data from encoded data. This library is meant as a complete replacement, and I have placed the "urlCodec" library in archival status.

Until I have a suitable place to host the Javadocs online, please reference them in the downloads available at ziesemer.dev.java.net.

The highest-level API interface is com.ziesemer.utils.codec.ICoder. Verbatim from the Javadoc, this is the "Base API for high-performance encoding and decoding between various Buffers. Supports conversions between ByteBuffers and CharBuffers through the IByteToCharEncoder and ICharToByteDecoder child interfaces. This API is similar in design to CharsetEncoder and CharsetDecoder."

Do note that the relation to the Charset classes may seem a bit backwards. When a character set is decoded, the input is bytes and the output is characters. The purpose of this library is to encode arbitrary data (as bytes) into character data that can safely be sent through various non-byte transports, e.g. HTTP forms. For this purpose, decoding takes characters as input and produces bytes as output.
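The difference in direction can be seen using only standard JDK classes. This sketch uses java.nio.charset.CharsetDecoder and java.util.Base64 (the latter requires Java 8 or later) as stand-ins; it does not use MarkUtils-Codec, and the helper names are made up for illustration. A charset decoder turns bytes into characters, while a binary-to-text encoder also turns bytes into characters, so the two "decode" operations point in opposite directions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration using JDK classes only. */
public class DirectionDemo {
    /** Charset direction: DECODING turns bytes into characters. */
    static String charsetDecode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.US_ASCII.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Binary-to-text direction: ENCODING turns bytes into characters... */
    static String textEncode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** ...and DECODING turns characters back into bytes. */
    static byte[] textDecode(String chars) {
        return Base64.getDecoder().decode(chars);
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = {72, 105}; // the ASCII bytes for "Hi"
        System.out.println(charsetDecode(bytes));                 // prints Hi
        System.out.println(textEncode(bytes));                    // prints SGk=
        System.out.println(textDecode(textEncode(bytes)).length); // prints 2
    }
}
```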

Here is a simple example of supported direct usage, taking no advantage of streaming capabilities. This is included as one of the JUnit tests within the com.ziesemer.utils.codec.DemoTest class:

/**
 * Simple usage, taking no advantage of streaming capabilities.
 */
@Test
public void testDirectSimple() throws Exception{
  IByteToCharEncoder encoder = new URLEncoder();
  ICharToByteDecoder decoder = new URLDecoder();
  
  // Random test data.
  byte[] rawData = new byte[1 << 10];
  new Random().nextBytes(rawData);
  
  // Encode.
  CharBuffer cbOut = encoder.code(ByteBuffer.wrap(rawData));
  
  // Decode (round-trip).
  ByteBuffer bbOut = decoder.code(cbOut);
  
  // Verify.
  byte[] result = new byte[bbOut.remaining()];
  bbOut.get(result);
  Assert.assertArrayEquals(rawData, result);
}

Or, for an even simpler example, use the convenience methods. Note that the Base64 codec can be replaced with URL (percent), Hex, or another provided codec:

byte[] sampleBytes = new byte[]{0, 1, 2, 3};
String enc = new Base64Encoder().encodeToString(sampleBytes);
System.out.println(enc); // Yields: AAECAw==
byte[] dec = new Base64Decoder().decodeToBytes(enc);
System.out.println(Arrays.equals(sampleBytes, dec)); // Yields: true

A number of input/output wrappers are also included in the "com.ziesemer.utils.codec.io" package, allowing for transparent use as a standard Java IO reader, writer, or stream. The signatures of the required constructors are also shown. Each class also provides an alternate constructor that can be used to fine-tune the read buffer size.

  • CharDecoderInputStream(ICharToByteDecoder decoder, Reader reader)

    Reads raw bytes from encoded characters. Counterpart to CharEncoderReader. This is a pull-interface; CharDecoderWriter is the equivalent push-interface.

    Can be adapted to read characters to the consumer (instead of raw bytes) by wrapping in an InputStreamReader. This is only valid if the decoded form of the data is known to contain only valid characters. Can also be adapted to read bytes from a provider (instead of characters) by using an InputStreamReader as the Reader.

  • CharDecoderWriter(ICharToByteDecoder decoder, OutputStream outputStream)

    Accepts encoded characters, and writes the raw bytes. Counterpart to CharEncoderOutputStream. This is a push-interface; CharDecoderInputStream is the equivalent pull-interface.

    Can be adapted to accept bytes from a provider (instead of characters) by wrapping in an OutputStreamWriter.

  • CharEncoderOutputStream(IByteToCharEncoder encoder, Writer writer)

    Accepts raw bytes, and writes the encoded characters. Counterpart to CharDecoderWriter. This is a push-interface; CharEncoderReader is the equivalent pull-interface.

    Can be adapted to write bytes to the consumer (instead of characters) by using an OutputStreamWriter as the Writer. Can also be adapted to accept characters from the provider (instead of raw bytes) by wrapping in an OutputStreamWriter.

  • CharEncoderReader(IByteToCharEncoder encoder, InputStream reader)

    Reads encoded characters from raw bytes. Counterpart to CharDecoderInputStream. This is a pull-interface; CharEncoderOutputStream is the equivalent push-interface.

    Can be adapted to read characters to the consumer (instead of raw bytes) by wrapping in an InputStreamReader.
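As a rough illustration of the pull-interface idea behind CharDecoderInputStream, here is a minimal, self-contained sketch of an InputStream that reads hex characters from a Reader and returns the decoded raw bytes. HexDecoderInputStreamSketch is a hypothetical class written only for this example; it decodes one byte per read() call for clarity, whereas the library itself is designed around buffers. The main method also shows the InputStreamReader adaptation described above, which is valid here only because the decoded bytes are known to be ASCII text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical pull-style hex decoder; not the MarkUtils-Codec implementation. */
public class HexDecoderInputStreamSketch extends InputStream {
    private final Reader reader;

    public HexDecoderInputStreamSketch(Reader reader) {
        this.reader = reader;
    }

    @Override
    public int read() throws IOException {
        int hi = reader.read();
        if (hi == -1) {
            return -1; // clean end of the encoded input
        }
        int lo = reader.read();
        if (lo == -1) {
            throw new IOException("Odd number of hex characters");
        }
        return (digit(hi) << 4) | digit(lo); // 0..255, as InputStream.read() requires
    }

    /** Overridden so decoding errors are not swallowed by the default bulk read. */
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int v = read();
            if (v == -1) {
                break;
            }
            b[off + n++] = (byte) v;
        }
        return n == 0 ? -1 : n;
    }

    private static int digit(int c) throws IOException {
        int d = Character.digit(c, 16);
        if (d < 0) {
            throw new IOException("Not a hex character: " + (char) c);
        }
        return d;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        // Pull raw bytes out of hex-encoded characters ("4869" is ASCII "Hi").
        InputStream raw = new HexDecoderInputStreamSketch(new StringReader("4869"));
        // Wrap in an InputStreamReader to hand characters to the consumer instead of bytes.
        try (Reader text = new InputStreamReader(raw, StandardCharsets.US_ASCII)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = text.read()) != -1) {
                sb.append((char) c);
            }
            System.out.println(sb); // prints Hi
        }
    }
}
```

A push-style counterpart (in the spirit of CharDecoderWriter) would invert the control flow: the producer calls write() with encoded characters, and the decoder forwards raw bytes to an OutputStream.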

Also included are a number of "character lists" (in the com.ziesemer.utils.codec.charLists package), particularly to support the different Base64 variations.
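The Base64 variations differ mainly in the last two alphabet characters and in whether "=" padding is emitted. This sketch uses the JDK's java.util.Base64 (Java 8 or later) rather than the MarkUtils-Codec character lists, purely to show what such variations look like; the helper names are hypothetical. The bytes 0xFB 0xFF map to alphabet indices 62, 63, and 60, which makes the difference between the standard and URL-safe alphabets visible.

```java
import java.util.Base64;

/** Hypothetical illustration of Base64 variants using JDK classes only. */
public class Base64VariantsDemo {
    /** Standard alphabet (A-Z, a-z, 0-9, '+', '/') with '=' padding. */
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** URL- and filename-safe alphabet ('-' and '_' replace '+' and '/'), still padded. */
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    /** URL-safe alphabet without trailing '=' padding. */
    static String urlSafeNoPadding(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] sample = {0, 1, 2, 3};
        byte[] highBits = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(sample));           // prints AAECAw==
        System.out.println(standard(highBits));         // prints +/8=
        System.out.println(urlSafe(highBits));          // prints -_8=
        System.out.println(urlSafeNoPadding(highBits)); // prints -_8
    }
}
```

The first line matches the output of the Base64Encoder convenience example earlier in this post, since both use the standard alphabet with padding.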

Please refer to the included JUnit tests (currently 169) for usage examples.

Download

com.ziesemer.utils.codec is available on ziesemer.java.net under the GPL license, complete with source code, a compiled .jar, generated JavaDocs, and a suite of JUnit tests. Download the com.ziesemer.utils.codec-*.zip distribution from here. Please report any bugs or feature requests on the java.net Issue Tracker.

4 comments:

Unknown said...

The link to download your codec doesn't seem to be working. Can you repost the link?

Thanks,
Matt

Mark A. Ziesemer said...

java.net recently migrated to a new platform, and I see there are still a few quirks to work out. I'll work on updating all my links. However, going forward, http://java.net/projects/ziesemer/downloads/directory/Releases should get you what you're looking for.

AlexR said...

Mark, please ignore my previous comment. The link became available several minutes after I posted my comment.

Mark A. Ziesemer said...

Alex - as I think you've already seen, java.net had moved their URLs a while ago, such that ziesemer.java.net is now the correct address - not ziesemer.dev.java.net. This was already correct here, but not yet fixed at /2009/05/improving-url-coder-performance-java.html - which I am now correcting.