Monday, May 25, 2009

MarkUtils-PacProxySelector for Java

Many computer networks make use of proxy servers for web and Internet connectivity. This is especially true for business and other organizational networks, where their use is required by security and other policies. Because of how they are typically used, proxy servers are often thought of in terms of "restricting access". They are better thought of as a means of "providing access". Even outside of typical corporate environments, proxy servers can be invaluable for testing and debugging, and can also serve as a type of VPN between private networks.

Most web browsers and other networked applications support directing traffic through one or more proxy servers. The typical configuration dialog looks like this, as shown in Mozilla Firefox:

Mozilla Firefox Connection Settings

Of the "manual" options, "HTTP" and "SSL" (TLS) are the most basic and common, followed by "FTP". "Gopher" is rarely used anymore - it has already been dropped by Microsoft Internet Explorer, and may be dropped in Mozilla Firefox 4.0. "SOCKS" is arguably the most powerful, supporting any of the above protocols in addition to any other TCP- or UDP-based protocol. (If SOCKS is configured and supported, none of the other protocols need to be configured.)

Unfortunately, directing an entire network's traffic through a single proxy server can quickly create a bottleneck and a single point of failure. This is especially true when all LAN traffic is also sent through the proxy. (In ideal network traffic patterns, LAN traffic should exceed Internet traffic many times over.) Some of the performance and availability concerns can be addressed through DNS or other load balancing. Another common approach is to configure the "no proxy for" / exception list. Unfortunately, there are some severe limitations to this design. First, the list must be kept up-to-date as the network configuration changes. (There are various tools for this.) More significantly, there are many desired configurations that cannot be accounted for. For example, what if the list needs to name the servers whose traffic should be sent to a proxy server, rather than those that should bypass it (reverse logic)? Or what if traffic must be split among multiple proxy servers, depending upon the destination or other parameters?

The solution to all of the above and other similar concerns is the last option shown above, and probably the most overlooked: "Automatic proxy configuration URL". This option is also known as proxy auto-config, or PAC, and was introduced into Netscape Navigator as early as 1996. A PAC file only needs to contain an implementation of a JavaScript function, FindProxyForURL(url, host). From here, the full power of JavaScript can be used, including regular expressions, associative arrays, and closures, as well as a number of predefined helper functions specific to PAC. Within the PAC function, various load balancing and black- or white-listing tasks can be performed, optionally by maintaining internal state. A list of multiple proxies may also be returned for the client to try in order. The PAC file is loaded from a URL (including local file:// URLs), where it can be centrally maintained and updated. The PAC file may be cached by the web browser or other client, but the client should respect the cache settings sent in the HTTP headers if the file is retrieved through HTTP. Alternatively, a chrome:// URL can even be used, allowing for the PAC file to be maintained within a Firefox extension, and updated through Firefox's standard auto-update process for extensions.

Java support

Java supports many of the proxy options above, mostly through the use of system properties. For full details, see the tech note at java.sun.com, Java Networking and Proxies. These settings affect any communications made through URLConnection, Socket, and possibly other network-related classes. Previously, the proxy configuration options were limited to the "manual" options listed above, with separate options for HTTP, HTTPS, FTP, and SOCKS. However, Java 1.5/5.0 introduced the Proxy and ProxySelector classes. A default ProxySelector can be configured for the current JVM by calling ProxySelector.setDefault(ProxySelector).
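To make the ProxySelector mechanism concrete, here is a minimal sketch of a hand-written selector installed with ProxySelector.setDefault. It is not part of MarkUtils-PacProxySelector; the class name SuffixProxySelector, the ".example.com" suffix, and the proxy host and port are all made-up placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

/** Hypothetical example: hosts ending in a given suffix use one HTTP proxy, all others connect directly. */
class SuffixProxySelector extends ProxySelector {
  private final String hostSuffix;
  private final Proxy proxy;

  SuffixProxySelector(String hostSuffix, String proxyHost, int proxyPort) {
    this.hostSuffix = hostSuffix;
    // createUnresolved avoids a DNS lookup while the selector is being built.
    this.proxy = new Proxy(Proxy.Type.HTTP,
        InetSocketAddress.createUnresolved(proxyHost, proxyPort));
  }

  @Override
  public List<Proxy> select(URI uri) {
    if (uri == null) {
      throw new IllegalArgumentException("uri must not be null");
    }
    String host = uri.getHost();
    if (host != null && host.endsWith(hostSuffix)) {
      return Collections.singletonList(proxy);
    }
    return Collections.singletonList(Proxy.NO_PROXY);
  }

  @Override
  public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
    // This sketch keeps no state, so failures are ignored.
  }

  public static void main(String[] args) {
    // Placeholder host names; substitute real values for an actual network.
    ProxySelector.setDefault(
        new SuffixProxySelector(".example.com", "proxy.example.net", 8080));
    System.out.println(ProxySelector.getDefault().select(URI.create("http://www.example.com/")));
    System.out.println(ProxySelector.getDefault().select(URI.create("http://intranet.local/")));
  }
}
```

Once installed, the selector is consulted by the JVM's standard networking classes, so the rest of the application does not need to know about it.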

Unfortunately, Java does not currently provide any visible support for proxy auto-config (PAC) files. However, the ProxySelector's List<Proxy> select(URI uri) method looks and works very similarly to the PAC's FindProxyForURL(url, host) function. The most notable difference is that it is strongly typed, using standard Java classes. As part of my MarkUtils collection, I created MarkUtils-PacProxySelector to provide a ProxySelector implementation that works with PAC files.
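The correspondence between the two can be sketched by converting the string that FindProxyForURL returns (for example "PROXY host:port; SOCKS host:port; DIRECT") into the List<Proxy> that select must return. The following is only an illustration under stated assumptions, not the code used in PacProxySelector: the class name PacResultParser is hypothetical, only the PROXY, SOCKS, and DIRECT keywords are handled, and falling back to a direct connection when nothing can be parsed is a choice made for this sketch.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: maps a PAC result string onto java.net.Proxy instances. */
final class PacResultParser {
  private PacResultParser() {}

  static List<Proxy> parse(String pacResult) {
    List<Proxy> proxies = new ArrayList<>();
    String[] entries = (pacResult == null) ? new String[0] : pacResult.split(";");
    for (String entry : entries) {
      // Each entry is a keyword, optionally followed by "host:port".
      String[] parts = entry.trim().split("\\s+");
      String keyword = parts[0].toUpperCase(Locale.ROOT);
      if (keyword.equals("DIRECT")) {
        proxies.add(Proxy.NO_PROXY);
        continue;
      }
      Proxy.Type type;
      if (keyword.equals("PROXY")) {
        type = Proxy.Type.HTTP;
      } else if (keyword.equals("SOCKS")) {
        type = Proxy.Type.SOCKS;
      } else {
        continue; // Empty or unrecognized entry: skipped in this sketch.
      }
      if (parts.length < 2) {
        continue;
      }
      int colon = parts[1].lastIndexOf(':');
      if (colon <= 0) {
        continue; // No port given: skipped rather than guessing a default.
      }
      try {
        int port = Integer.parseInt(parts[1].substring(colon + 1));
        String host = parts[1].substring(0, colon);
        proxies.add(new Proxy(type, InetSocketAddress.createUnresolved(host, port)));
      } catch (IllegalArgumentException e) {
        // Covers a non-numeric port (NumberFormatException) and an out-of-range port.
      }
    }
    if (proxies.isEmpty()) {
      // Assumption of this sketch: nothing usable means "connect directly".
      proxies.add(Proxy.NO_PROXY);
    }
    return proxies;
  }

  public static void main(String[] args) {
    // Placeholder host names for demonstration only.
    System.out.println(parse("PROXY proxy1.example.com:8080; SOCKS socks.example.com:1080; DIRECT"));
  }
}
```

The order of the returned list matters, since a client is expected to try the entries from first to last.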

Since PAC files are based on JavaScript, the ability to evaluate JavaScript is required. Fortunately, this is easily done through Java, especially with the introduction of the Java Scripting API in Java 1.6/6.0 (JSR-223). Java 1.6 bundles an internal version of the Mozilla Rhino implementation of JavaScript for Java, based on 1.6R2. Unfortunately, Java doesn't expose all the features of JavaScript or Rhino directly through the scripting API, some of which are required to implement the PAC functionality in a compatible fashion. This includes defining top-level bindings in the JavaScript environment to Java functions, which is directly supported in Rhino by adding a binding to a FunctionObject - a class for which there is no publicly visible equivalent in the JDK. While it is probably possible to hack a work-around to this, my current implementation utilizes Rhino directly. Besides taking advantage of the improvements in the latest version of Rhino (currently 1.7R2), this allows the utility to be easily used with both Java 1.5/5.0 and 1.6/6.0. (However, note that JSR-223 is unofficially supported under Java 1.5/5.0 as well by downloading and including the .jar files from the reference implementation.) Using Rhino directly also avoids some potential security issues, which I reported in Sun Bug 6782031 and Mozilla Bug 468385.

As commented in the pom.xml file, Mozilla Rhino is currently not available through the central repository, a Mozilla repository, or any other "official" repository. I've added a dependency to it as "org.mozilla.javascript : rhino : 1.7R2". For this to work properly, Rhino will need to be downloaded and installed into a local repository as named above.

In addition to the standard PAC methods, PacProxySelector supports an added function called "connectFailed" to take advantage of the connectFailed(URI, SocketAddress, IOException) functionality on ProxySelector. The JavaScript method is called with the same arguments as on ProxySelector, just with the .toString() representations of each of the three parameters. The PAC file could then store this information within internal state to possibly affect future calls to FindProxyForURL.
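The idea of keeping failure information as internal state can be illustrated on the Java side as well. The sketch below is not how PacProxySelector works (there, the state would live in the PAC file's JavaScript); it is a hypothetical FailureAwareProxySelector that remembers addresses reported through connectFailed and moves the matching proxies to the end of the list on later select calls. Matching relies on SocketAddress equality, which is an assumption that holds when the caller reports the same address object the proxy was built with.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical example: proxies that have failed are tried last on later lookups. */
class FailureAwareProxySelector extends ProxySelector {
  private final List<Proxy> candidates;
  private final Set<SocketAddress> failed = ConcurrentHashMap.newKeySet();

  FailureAwareProxySelector(List<Proxy> candidates) {
    this.candidates = new ArrayList<>(candidates);
  }

  @Override
  public List<Proxy> select(URI uri) {
    if (uri == null) {
      throw new IllegalArgumentException("uri must not be null");
    }
    List<Proxy> result = new ArrayList<>();
    List<Proxy> demoted = new ArrayList<>();
    for (Proxy p : candidates) {
      // Proxy.NO_PROXY has a null address, and the concurrent set rejects null lookups.
      boolean hasFailed = p.address() != null && failed.contains(p.address());
      (hasFailed ? demoted : result).add(p);
    }
    result.addAll(demoted);
    return result;
  }

  @Override
  public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
    if (sa != null) {
      failed.add(sa);
    }
  }

  public static void main(String[] args) {
    // Placeholder proxies for demonstration only.
    Proxy a = new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("proxy-a.example.net", 8080));
    Proxy b = new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("proxy-b.example.net", 8080));
    FailureAwareProxySelector selector = new FailureAwareProxySelector(List.of(a, b));
    URI uri = URI.create("http://www.example.com/");
    selector.connectFailed(uri, a.address(), new IOException("simulated failure"));
    System.out.println(selector.select(uri)); // proxy-b is now listed before proxy-a
  }
}
```

A real implementation would probably also expire old failures after some time, which this sketch leaves out.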

For the most flexibility, the constructor to PacProxySelector accepts a Reader, which should read from a PAC file. There is also a public static configureFromProperties() method that returns a ProxySelector, assuming that the path to a PAC file is stored as either a Java system or environment property named "proxy.autoConfig", similar to the other network properties. After obtaining an instance from either the constructor or the method, it should be passed to ProxySelector.setDefault(ProxySelector), unless otherwise used directly. Alternatively, a setDefaultFromProperties() convenience method is provided to do this in one call.

I wrote this with plugging into other Java applications in mind. Ideally, the JDK would provide a system property that accepts the classname for the default ProxySelector or some other method for setting the default outside of a function call within the code, but this is currently not the case. However, all that is needed is a way to execute one of the above configuration options from the desired Java application before network access is attempted. I've successfully written a plugin for Oracle SQL Developer that does exactly this. The same is also possible for Eclipse, though it requires patching some of the plugins due to the current infrastructure. (See Eclipse bug 257443.) Alternatively, PacProxySelector provides a main method that calls setDefaultFromProperties() before chaining execution to another program's main method. See the included Javadoc for details.
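The "chaining" approach can be sketched with plain reflection. This is an illustration of the general technique rather than the actual main method shipped with PacProxySelector: the ChainingLauncher class, its launch method, and the nested Echo target are all hypothetical, and the configuration step is left as a hook where a call such as setDefaultFromProperties() would go.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical launcher: run a setup step, then hand control to another class's main method. */
class ChainingLauncher {

  /** Runs the setup hook, then invokes the static main(String[]) of the named class. */
  static void launch(Runnable configure, String mainClassName, String... args)
      throws ReflectiveOperationException {
    configure.run();
    Method main = Class.forName(mainClassName).getMethod("main", String[].class);
    // The cast passes the array as the single String[] argument instead of spreading it.
    main.invoke(null, (Object) args);
  }

  /** Tiny stand-in for "another program", used only to demonstrate the chaining. */
  static final class Echo {
    static volatile String[] lastArgs;

    public static void main(String[] args) {
      lastArgs = args;
      System.out.println("Echo received " + Arrays.toString(args));
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.err.println("usage: ChainingLauncher <main-class> [args...]");
      return;
    }
    Runnable configure = () -> {
      // Assumption: the real proxy setup (for example, installing a default
      // ProxySelector) would be performed here before any network access.
    };
    // The first argument names the target class; the rest are passed through unchanged.
    launch(configure, args[0], Arrays.copyOfRange(args, 1, args.length));
  }
}
```

Because the setup runs before the target's main method is even loaded, the target application needs no changes, as long as the launcher is what gets started.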

Download

com.ziesemer.utils.pacProxySelector is available on ziesemer.java.net under the GPL license, complete with source code, a compiled .jar, generated JavaDocs, and a suite of JUnit tests. Download the com.ziesemer.utils.pacProxySelector-*.zip distribution from here. Please report any bugs or feature requests on the java.net Issue Tracker.

11 comments:

bcoppens said...

This is really a great tool. To my knowledge, it is the only one to provide this kind of functionality. Thanks a lot!

William Wang said...

Thanks a lot for the article and tool. I'm wondering about avoiding the dependency on a separate Rhino by just importing java.net.InetAddress to implement a dnsResolve method in PacUtils.js. What do you think?

Like this:
function dnsResolve(host){
  new JavaImporter(java.net, java.net.InetAddress);
  return java.net.InetAddress.getByName(host).getHostAddress();
}

Mark A. Ziesemer said...

William - I'm not exactly sure what you're trying to do here. Rhino or another JavaScript interpreter is required in order to read the PAC files, so the dependency really cannot be avoided.

While having the dnsResolve function declared within the JavaScript source rather than declared by the Java code is possible, I would not advise it. It is also not the only method that requires Rhino to be specifically used, instead of the JSR-223 interfaces. Special care was taken in this implementation to prevent arbitrary Java code being called from JavaScript for security and other reasons. This includes disabling calls to all Java classes, as currently implemented by the PacClassShutter class.

William Wang said...

Thanks for the quick reply. I see your point. To make it secure, you are avoiding Java calls from JavaScript. It makes sense. In my case, though, I'm trying to make the package as small as possible, which is why I'm trying to avoid carrying the whole Rhino package. It looks to me like dnsResolve() is the only unresolved method referenced from the FindProxyForURL function.

Mark A. Ziesemer said...

William - It is not only security. A PAC file is JavaScript, and meant to be cross-compatible across all browsers and other networked applications. Including calls to Java code completely defeats this. (Granted, the file you are referring to is the PacUtils.js file, not the PAC file that will be used to define FindProxyForURL.)

I think I see what you mean, in that dnsResolve is the only method that may appear to not be self-resolvable within PacUtils.js. However, I don't see how this is an issue. The "deliverable" of this project is a Java ProxySelector that is driven by a PAC file, the format of which specifies for a number of helper functions to be available. Many of these are easily implemented in JavaScript. dnsResolve has to be implemented at a lower level - in this case, Java. Regardless, I don't see how Rhino can be eliminated - Rhino or another JavaScript interpreter is required in order to read the PAC files as FindProxyForURL is implemented as a JavaScript method.

What exactly are you trying to do?

William Wang said...

What I'm trying to do is get the proxy settings from a PAC script, but I only want to use the JavaScript engine (adapted Rhino) inside the JRE, rather than the full Rhino package. So I can't register functions as you do in PacProxySelector. The dnsResolve function can't be easily implemented in JavaScript either; I'm not sure if it's possible to do that in JavaScript at all. I hope this explains what I'm trying to do. I didn't mean that it's an issue for the project. Just for my purpose, I think there is an alternative way to have the dnsResolve function available in the JavaScript engine. Thanks.

Mark A. Ziesemer said...

William - thanks for the clarification. Yes, this was my original goal as well - to not depend on Rhino and just use the JSR-223 implementation. However, besides dnsResolve, there are also the myIpAddress and alert functions that should be provided. I also believe the implementation is more efficient using the classes and interfaces that are available through Rhino but without a current equivalent through JSR-223. I then tried to write this to allow either implementation (Rhino or JSR-223), but the inability to provide proper security through the JSR-223 was a show-stopper. I will revisit once/if Sun resolves the bug I submitted (above), or if you or someone else provides an equal solution.

Martin Krauskopf said...

Hi Mark. There is an open issue in NB regarding the PAC files support. From what I've found so far your work seems to be the best solution which I would like to utilize to implement the NB RFE (see my comment there).
There is no license in your source-code so I suppose I might reuse it. Also if you found out some better solution in the meantime, please let me know. Thanks.

PS: I'm not an official NB developer anymore; I just need this functionality in an application built on top of the NB platform.

Mark A. Ziesemer said...

Martin - as far as I'm concerned, my PacProxySelector is under the GNU LGPLv3 license, as shown at http://ziesemer.dev.java.net, and in the pom.xml file in the distribution. (I know, the license visibility probably needs work, but it's working for now...) I recently converted it from GPLv3 due to another request. Since the included PacUtils.js is really from Mozilla Firefox, this likely requires even further review.

I think it'd be great if you were able to re-use this from within NetBeans. (As I mentioned in the post, I had been working towards the same for Eclipse.)

Franciscofs said...

Mark,

This was very useful. The only thing that was a little hard was figuring out how to authenticate with the user name and password. Here is how I did it.

InputStreamReader isr = new InputStreamReader(new FileInputStream("proxy.pac"));
Authenticator.setDefault(new ProxyAuthenticator("myUsert", "myPassword"));

ProxySelector ps = new PacProxySelector(isr);
ProxySelector.setDefault(ps);

Regards,
Francisco

franciscofs said...

Ah sorry I forgot to include the ProxyAuthenticator class.

import java.net.Authenticator;
import java.net.PasswordAuthentication;

class ProxyAuthenticator extends Authenticator {

  private final String user, password;

  public ProxyAuthenticator(String user, String password) {
    this.user = user;
    this.password = password;
  }

  @Override
  protected PasswordAuthentication getPasswordAuthentication() {
    return new PasswordAuthentication(user, password.toCharArray());
  }
}