Saturday, July 26, 2008

Java Collections Listeners

Maybe I've been looking at various JavaScript frameworks, etc., for too long, but I'm surprised that there's really no built-in way to listen for add, remove, or update events within Java's Collections Framework.

Assumptions / Use Case:

  1. Implement the Map interface. This allows the class to be passed into other methods that only accept a Map implementation, and provides a consistent and familiar API to other developers.
  2. Add some additional processing, e.g.:
    • Disallowing certain keys or values from being inserted.
    • Keeping one or more associated maps or sets for performance, e.g. a map of lists when an entry needs to be obtained by value rather than key.
    • Extending HashMap into an "OrderedMap", where unique keys are still guaranteed, but the order of insertions is also kept.

Considerations

Clearly, public interfaces are designed to be implemented, and public abstract classes are available to extend. Sun's "Custom Implementations" lesson in the Collections tutorial demonstrates this. As the lesson describes, there are several abstract classes available to serve as a starting point for a custom collection implementation. However, extending these to match the full functionality and efficiency of the existing concrete implementations is not a small effort, especially for Map implementations. The most significant features to note are the methods that return a "view" of a different part or representation of the map, e.g. keySet(), values(), and entrySet(). All are documented to return a view of the Map, such that changes in the view are reflected in the map, and vice versa.

If that's not already enough extra work, consider the methods available on the returned views that should also remain functional, such as the remove() method of an iterator(), or worse, the List.listIterator() method. Sure, many of these are listed as "optional operations", which simple implementations may handle by just throwing an UnsupportedOperationException. However, this can be frustrating to developers attempting to use these methods, and some existing code may depend on them working.

One solution to save a lot of work is to extend an existing concrete implementation, e.g. HashMap. However, there always seems to be some debate over extending a non-abstract class. My view is that if the class wasn't meant to be extended, it should be marked final or have less-than-public constructors. (As a compromise, the class could remain open to extension, while protecting various methods by marking them final.) java.util.Properties is one example of a public class that extends java.util.Hashtable, another concrete, public class.

Considerable progress seems possible on a HashMap subclass by simply overriding the put(K, V), remove(Object), and clear() methods. However, as described above, the "view" methods provide alternate access points to modify the underlying collection data, not all of which chain down to the overridden methods. This can quickly lead to an incomplete and buggy class.
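To make the problem concrete, here is a small sketch that overrides only remove(Object) on a HashMap subclass. The class name NaiveListeningMap and its removeCalls counter are hypothetical stand-ins for a real listener hook. In OpenJDK's HashMap, keySet().remove(key) goes straight to internal removal code rather than calling remove(Object), so the override never sees that removal:

```java
import java.util.HashMap;

// Hypothetical example class: counts calls to remove(Object) as a stand-in
// for notifying listeners, to show which removals actually reach it.
class NaiveListeningMap<K, V> extends HashMap<K, V> {
    int removeCalls = 0; // how many times the overridden remove(Object) ran

    @Override
    public V remove(Object key) {
        removeCalls++; // a listener notification would go here
        return super.remove(key);
    }

    public static void main(String[] args) {
        NaiveListeningMap<String, Integer> map = new NaiveListeningMap<>();
        map.put("a", 1);
        map.put("b", 2);

        map.remove("a");          // goes through the override
        map.keySet().remove("b"); // removes "b" via HashMap internals, skipping the override

        System.out.println(map.isEmpty());    // true
        System.out.println(map.removeCalls);  // 1
    }
}
```

Removals through the iterators of keySet(), values(), and entrySet() skip remove(Object) the same way, so a "listener" built this way silently misses them.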

My working solution

The best approach I've found is to follow the recommendation in the tutorial and extend AbstractMap. A child HashMap can then be held as an instance variable, with most of the Map methods delegated to it. The "view" methods can return read-only views by wrapping them with the Collections.unmodifiable*(…) wrapper methods. While this may lead to some of the frustrations described above, it provides a solid class that still properly implements the core collection methods.
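A minimal sketch of this approach follows. It assumes a simple listener interface of my own invention: ListenableMap and its nested Listener are hypothetical names, not part of java.util. Mutations only happen through put, remove, and clear, which notify listeners. putAll is inherited from AbstractMap and calls put for each entry. The views come from a single Collections.unmodifiableMap(delegate) wrapper rather than Collections.unmodifiableSet(delegate.entrySet()), because only the former also blocks Map.Entry.setValue:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an AbstractMap backed by a private HashMap whose only
// mutation paths (put/remove/clear) notify listeners; all views are read-only.
class ListenableMap<K, V> extends AbstractMap<K, V> {

    /** Hypothetical listener interface; not part of java.util. */
    public interface Listener<K, V> {
        void entryPut(K key, V oldValue, V newValue);
        void entryRemoved(Object key, V oldValue);
    }

    private final Map<K, V> delegate = new HashMap<K, V>();
    // Read-only view of the delegate; its entrySet() also rejects Map.Entry.setValue.
    private final Map<K, V> readOnly = Collections.unmodifiableMap(delegate);
    private final List<Listener<K, V>> listeners = new ArrayList<Listener<K, V>>();

    public void addListener(Listener<K, V> listener) {
        listeners.add(listener);
    }

    @Override
    public V put(K key, V value) {
        V old = delegate.put(key, value);
        for (Listener<K, V> l : listeners) {
            l.entryPut(key, old, value);
        }
        return old;
    }

    @Override
    public V remove(Object key) {
        if (!delegate.containsKey(key)) {
            return null; // nothing removed, nothing to report
        }
        V old = delegate.remove(key);
        for (Listener<K, V> l : listeners) {
            l.entryRemoved(key, old);
        }
        return old;
    }

    @Override
    public void clear() {
        // Copy first so listeners can be told about each removed entry.
        Map<K, V> removed = new HashMap<K, V>(delegate);
        delegate.clear();
        for (Map.Entry<K, V> e : removed.entrySet()) {
            for (Listener<K, V> l : listeners) {
                l.entryRemoved(e.getKey(), e.getValue());
            }
        }
    }

    // Read operations delegate directly for HashMap-level performance.
    @Override public V get(Object key) { return delegate.get(key); }
    @Override public boolean containsKey(Object key) { return delegate.containsKey(key); }
    @Override public int size() { return delegate.size(); }

    // Views are unmodifiable, so they cannot bypass the listeners.
    @Override public Set<Map.Entry<K, V>> entrySet() { return readOnly.entrySet(); }
    @Override public Set<K> keySet() { return readOnly.keySet(); }
    @Override public Collection<V> values() { return readOnly.values(); }
}
```

With this design, keySet().remove(key) throws UnsupportedOperationException instead of silently skipping the listeners. The same goes for Java 8's default Map.replaceAll(…), which writes through the entries' setValue.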

I did find one Sun bug report that is a request for enhancement regarding this issue: 5078552 - "(coll) ChangeListener, VetoableChangeListener for Collections, Lists and others". It was submitted in July 2004 and hasn't received any attention since.

Gmail migration with IMAP and the Java Mail API

As I previously posted, Gmail has IMAP support. I've been using it for a while now. It's a great way to use Gmail with a desktop email client such as Mozilla Thunderbird, as well as to have offline access to your email.

IMAP also offers another significant feature to Gmail users: The ability to easily migrate messages between accounts - either between different Gmail accounts or from another source into Gmail. Google offers their own tool for this, but it's only available for commercial and educational accounts - not for their personal accounts like the ones I have.

I attempted to run my transfer using the configured accounts in Microsoft Outlook, very much like Ashish Mohta described on his blog. Things were looking good until I started getting the following error:

The current command did not succeed.
The mail server responded: Unable to append message to folder (Failure).

The first thing I suspected was an Outlook issue, so I switched to Thunderbird but got the same result. Next I suspected I might be hitting some sort of transfer limit, so I tried downloading everything from my first account to a local folder first. That worked, but I still wasn't able to upload from the local folder into the second account. A Google search shows a number of other people having the same issue. Google's help pages have two answers (1, 2) that primarily blame formatting incompatibilities, but as other results in the Google search indicate, this definitely appears to be the result of exceeding some unpublished transfer limit.

There didn't seem to be any way to configure either Outlook or Thunderbird to "slow things down", short of manually moving only a few messages at a time, which would be tedious and error-prone. Additionally, I guessed that both clients were performing additional tasks behind the scenes that made the issue worse, such as needlessly refreshing folders.

JavaMail API

I figured that my best chance of getting everything how I wanted it was to script the transfer. For this, I turned to the JavaMail API. I've used it before, but mostly just for sending messages over SMTP. Surprisingly, the JavaMail API FAQ even covers using it with Gmail.

Connecting was quite simple. The only property that I needed to pass into Session.getInstance(…) was a value of "imaps" for "mail.store.protocol". For simplicity, I then used the Store's connect(String host, String user, String password) method to connect.

To open the "All Mail" folder in Gmail, either .getFolder("[Gmail]").getFolder("All Mail") or .getFolder("[Gmail]/All Mail") works.
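Putting those steps together, a minimal sketch might look like the following. imap.gmail.com is Gmail's IMAP server. The class and method names are hypothetical, and the credentials are left to the caller:

```java
import java.util.Properties;
import javax.mail.Folder;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Store;

// Hypothetical helper: connect to Gmail over IMAP/SSL and open "All Mail".
class GmailConnect {
    static final String GMAIL_IMAP_HOST = "imap.gmail.com";
    static final String ALL_MAIL = "[Gmail]/All Mail";

    /** The only property needed: use the "imaps" store protocol. */
    static Properties imapsProperties() {
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");
        return props;
    }

    static Folder openAllMail(String user, String password) throws MessagingException {
        Session session = Session.getInstance(imapsProperties());
        Store store = session.getStore(); // picks "imaps" from mail.store.protocol
        store.connect(GMAIL_IMAP_HOST, user, password);
        Folder allMail = store.getFolder(ALL_MAIL); // same as getFolder("[Gmail]").getFolder("All Mail")
        allMail.open(Folder.READ_ONLY);
        return allMail;
    }
}
```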

I did my transfer in two separate steps: the download, then the upload. This was mostly because I also wanted a local backup copy to keep. For each Message returned by the Folder's getMessages(), the Message's writeTo(OutputStream) method saves the entire message, including headers and attachments, in MIME format. The messages can then be recreated using the MimeMessage(Session, InputStream) constructor and imported into an IMAP folder using the appendMessages(Message[]) method.
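Here is a sketch of those download and upload steps. It assumes one .eml file per message in a local directory; the file naming and the MessageArchive class are my own illustration. Given an ordinary (non-shared) stream, the MimeMessage(Session, InputStream) constructor reads the whole message, so the file can be closed as soon as it returns:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.internet.MimeMessage;

// Hypothetical helper for a two-step transfer: download to .eml files, then upload.
class MessageArchive {

    /** Saves one message, headers and attachments included, in MIME format. */
    static void save(Message message, File file) throws IOException, MessagingException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
            message.writeTo(out);
        }
    }

    /** Recreates a message from a file written by save(). */
    static MimeMessage load(Session session, File file) throws IOException, MessagingException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return new MimeMessage(session, in);
        }
    }

    /** Saves every message in an open folder as dir/1.eml, dir/2.eml, ... */
    static int downloadAll(Folder source, File dir, long pauseMillis)
            throws IOException, MessagingException, InterruptedException {
        Message[] messages = source.getMessages();
        for (int i = 0; i < messages.length; i++) {
            save(messages[i], new File(dir, (i + 1) + ".eml"));
            Thread.sleep(pauseMillis); // pause between downloads to stay under Gmail's apparent limits
        }
        return messages.length;
    }

    /** Appends one saved message to the destination folder. */
    static void upload(Session session, File file, Folder destination)
            throws IOException, MessagingException {
        destination.appendMessages(new Message[] { load(session, file) });
    }
}
```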

Alternatively - though I haven't tried it - opening a session to each account and using the Folder's copyMessages(Message[], Folder) method looks like it would also be convenient. Note that the "this" Folder is the source, and the Folder parameter is the destination. However, while the method can copy many messages at once, it may need to be called with only one message at a time so that a pause can be inserted between transfers to account for the Gmail restrictions described above.

To slow down the transfers and keep the Gmail servers happy, I simply called Thread.sleep(long millis) after each download or upload. I started out at 2,500 ms and had worked down to 500 ms by the time I finished, without any further issues.
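The one-message-at-a-time copyMessages variant suggested above can be combined with a Thread.sleep pause after each transfer. Since I haven't tried copyMessages myself, treat the following as an untested sketch. The class, the Step interface, and the method names are hypothetical, and the pause length is the caller's choice:

```java
import java.util.Arrays;
import javax.mail.Folder;
import javax.mail.Message;

// Hypothetical helper: run one transfer at a time with a pause after each.
class ThrottledCopy {

    /** One unit of work; may throw, e.g. a MessagingException from JavaMail. */
    interface Step<T> {
        void run(T item) throws Exception;
    }

    /** Runs step on each item in order, sleeping pauseMillis after each one. */
    static <T> int forEachWithPause(Iterable<T> items, long pauseMillis, Step<T> step)
            throws Exception {
        int done = 0;
        for (T item : items) {
            step.run(item);
            done++;
            Thread.sleep(pauseMillis);
        }
        return done;
    }

    /** Copies messages from source ("this" Folder) to destination, one per call. */
    static int copyOneAtATime(Folder source, Message[] messages, Folder destination,
                              long pauseMillis) throws Exception {
        return forEachWithPause(Arrays.asList(messages), pauseMillis,
                m -> source.copyMessages(new Message[] { m }, destination));
    }
}
```

Keeping the pause logic in a generic helper means the same loop can throttle downloads, appends, or copies.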

Sent Mail

Another issue I had was that I wanted/needed to re-populate my "Sent Mail" folder. This may not always be necessary, but for any number of reasons in my case, Gmail no longer recognized any of my sent messages as "sent". While Gmail seems to treat "Sent Mail" just like any other "label", the Gmail interface doesn't provide any options to add or remove messages from it, short of deletion. However, it can be modified through IMAP. While copying in each new message, I checked whether the getSender() method matched one of my email addresses. If so, the message was copied into the "Sent Mail" folder rather than "All Mail". Note the difference between using the sender and using the "From" attribute from getFrom(). The latter would typically include items that were actually received, probably from an automated sender, where the message was only made to appear "from" my own email address.
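The sender check could be sketched as follows. The class and method names are hypothetical, and the folder names assume an English-language Gmail account. getSender() is defined on MimeMessage, not Message. It returns null when a message has no Sender header, in which case the message goes to "All Mail". The set of addresses is assumed to be lower-case:

```java
import java.util.Locale;
import java.util.Set;
import javax.mail.Address;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

// Hypothetical helper: choose the destination folder based on the Sender header.
class SentMailRouter {
    static final String ALL_MAIL = "[Gmail]/All Mail";
    static final String SENT_MAIL = "[Gmail]/Sent Mail";

    /** True if the Sender header (not From) is one of my (lower-case) addresses. */
    static boolean sentByMe(Message message, Set<String> myAddresses) throws MessagingException {
        if (!(message instanceof MimeMessage)) {
            return false;
        }
        Address sender = ((MimeMessage) message).getSender(); // null if no Sender header
        if (!(sender instanceof InternetAddress)) {
            return false;
        }
        String address = ((InternetAddress) sender).getAddress();
        return address != null && myAddresses.contains(address.toLowerCase(Locale.ROOT));
    }

    static String destinationFor(Message message, Set<String> myAddresses)
            throws MessagingException {
        return sentByMe(message, myAddresses) ? SENT_MAIL : ALL_MAIL;
    }
}
```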

Originally, I simply imported all mail into the "All Mail" folder, then copied it into "Sent Mail" as needed. However, passing the same MimeMessage into Folder.appendMessages(Message[]) and then into copyMessages(Message[], Folder) for "Sent Mail" results in a ClassCastException, as copyMessages expects an IMAPMessage, i.e. a message that already exists in the IMAP store. The solution would be to obtain a proper reference to the inserted message to copy. The Message[] com.sun.mail.imap.IMAPFolder#addMessages(Message[]) method would allow for exactly this, but it appears that Gmail doesn't currently support the required UIDPLUS extension from RFC 2359. The only apparent remaining option would be to call getMessages() again, then iterate through the results and find the message matching the one inserted, probably by getMessageID().
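The following is a sketch of that fallback, using hypothetical names. It compares the Message-ID header of each message returned by getMessages() with the ID of the message just appended, then copies the match into "Sent Mail". Fetching headers one message at a time over IMAP can be slow on a large folder. Folder.search with a javax.mail.search.MessageIDTerm might let the server do the matching instead, though I haven't checked how Gmail handles it. Depending on the server, the folder may also need to be reopened before a newly appended message shows up:

```java
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.internet.MimeMessage;

// Hypothetical helper: find the server-side copy of an appended message by Message-ID.
class MessageIdLookup {

    /** Returns the first candidate whose Message-ID header equals messageId, or null. */
    static Message findByMessageId(Message[] candidates, String messageId)
            throws MessagingException {
        for (Message candidate : candidates) {
            String[] ids = candidate.getHeader("Message-ID");
            if (ids != null && ids.length > 0 && messageId.equals(ids[0].trim())) {
                return candidate;
            }
        }
        return null;
    }

    /**
     * After "appended" was added to allMail (which must be open), locates the
     * stored copy and copies it into sentMail. Returns false if it isn't found.
     */
    static boolean copyToSentMail(MimeMessage appended, Folder allMail, Folder sentMail)
            throws MessagingException {
        String id = appended.getMessageID();
        if (id == null) {
            return false;
        }
        Message stored = findByMessageId(allMail.getMessages(), id.trim());
        if (stored == null) {
            return false;
        }
        allMail.copyMessages(new Message[] { stored }, sentMail); // stored is an IMAPMessage
        return true;
    }
}
```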

Even after all this work, I still ran into an issue with the Gmail web interface showing all my messages in "Sent Mail" as "To: me", rather than the actual recipients. The issue wasn't related to the transfer at all, but to my Gmail account settings. I had assumed that Gmail calculated "me" as mail sent to any address listed in the "Send mail as" list. Instead - and probably more reasonably - it is almost the opposite. If the email is sent to any address not listed in the "Send mail as" list, Gmail assumes it to be "To: me".

Other tools

After I successfully finished my migration using the above method, I did find some other tools that appear to solve many of the same issues. Two that I found through related Google searches are imapsync and IMAPSize. (I've not downloaded or tried either of these.)