Mark A. Ziesemer

Sunday, August 6, 2017

A faster, "complete" Java HttpServletRequest.getRequestUrl() replacement

As another performance-focused method for reuse: It should be easier to obtain the "complete" URL from a HttpServletRequest in Java. This is even a popular question on Stack Overflow: HttpServletRequest to complete URL - though I have various issues with each of the current answers.

Following is a self-written version that I've been using for a number of years now. I previously had posted it as a Gist, and since incorporated it into my MarkUtils-Web library in WebUtils - Checkstyle-approved and JUnit-tested:

import javax.servlet.http.HttpServletRequest; public class GetRequestUrl{ /** *

A faster replacement for {@link HttpServletRequest#getRequestURL()} * (returns a {@link String} instead of a {@link StringBuffer} - and internally uses a {@link StringBuilder}) * that also includes the {@linkplain HttpServletRequest#getQueryString() query string}.

https://gist.github.com/ziesemer/700376d8da8c60585438

* @author Mark A. Ziesemer * <www.ziesemer.com> */ public String getRequestUrl(final HttpServletRequest req){ final String scheme = req.getScheme(); final int port = req.getServerPort(); final StringBuilder url = new StringBuilder(256); url.append(scheme); url.append("://"); url.append(req.getServerName()); if(!(("http".equals(scheme) && (port == 0 || port == 80)) || ("https".equals(scheme) && port == 443))){ url.append(':'); url.append(port); } url.append(req.getRequestURI()); final String qs = req.getQueryString(); if(qs != null){ url.append('?'); url.append(qs); } final String result = url.toString(); return result; } }

MarkUtils-CLI: Annotations (and more) for Apache Commons CLI

So much of Java development in the enterprise often seems to be focused around web applications and other aspects of JEE. Sometimes it is almost comical to watch another developer who has typically been focused on this type of work try to develop a stand-alone Java program. One of the challenges faced here is typically proper acceptance, handling, and validation of command-line arguments.

Fortunately, Apache Commmons CLI exists to help with this effort. The summary from their home page:

The Apache Commons CLI library provides an API for parsing command line options passed to programs. It's also able to print help messages detailing the options available for a command line tool.

Commons CLI supports different types of options:

POSIX like options (ie. tar -zxvf foo.tar.gz)
GNU like long options (ie. du --human-readable --max-depth=1)
Java like properties (ie. java -Djava.awt.headless=true -Djava.net.useSystemProxies=true Foo)
Short options with value attached (ie. gcc -O2 foo.c)
long options with single hyphen (ie. ant -projecthelp)

If anything, I feel that the Apache Commons CLI project is selling themselves short. I've found it to be a very comprehensive, well-designed library for effectively parsing the command-line. The only shortcoming I've observed is that the project was developed before Java 5 - and annotations - were available. As such, the library doesn't offer support for any features that annotations have to offer.

Introducing the latest addition to MarkUtils: MarkUtils-CLI is a library that provides an effective bridge between Apache Commons CLI and Java annotations - without replacing the mature Commons CLI library. Originally developed in 2013 with Commons CLI 1.2, the stability of and between both libraries across multiple releases has been a proven success. Like all of the MarkUtils libraries, there are minimal project dependencies: just commons-cli and slf4j-api.

Of the three stages to command line processing, MarkUtils-CLI should most commonly be used to:

Replace the "Definition Stage" - wrapping it with functionality driven by annotations.
Leave the "Parsing Stage" intact - while also offering a convenient wrapper to help "glue" everything together into a seamless process.
Wrap the "Interrogation Stage" - automatically setting the annotated fields or calling the annotated methods as configured with the values received from the command line.

classParser

As with most languages implementing a "main" method, all command-line arguments coming into a Java program are received simply as Strings. Apache Commons CLI doesn't provide any assistance with parsing or converting these String values to the other data types that may be expected or required by the program being developed. MarkUtils-CLI contains a child "classParser" library to effectively bridging this gap - automatically converting one or more String values to the value type being expected, including:

All primitive types: boolean, char, byte, short, int, long, float, and double.
Number Object types: Byte, Short, Integer, Long, Float, Double, BigInteger, BigDecimal.
Direct assignment to the abstract Number type - using BigDecimal unless a number with a radix != 10 is specified.
Support for alternate number bases for all integral types listed above:
- Hexadecimal: Prefixes 0x, 0X, 0h, or 0H.
- Octal: Prefix 0o.
Boolean
Character
String and CharSequence
java.io.File
java.net.URI
java.net.URL
java.util.UUID
Identifiers to java.nio.charset.Charset
Identifiers to java.util.TimeZone
Enum types, using the enum type's named identifiers.
Class (using forName)
Object (using default instantiation of the passed-in class name, similar to Class above - 0-parameter or default constructor required.)
java.net.InetAddress and java.net.InetSocketAddress, and their respective subtypes - such as java.net.Inet4Address and java.net.Inet6Address.
- InetSocketAddresses are expected in the form of <InetSocketAddress>:<int port>. For example: "127.0.0.1:80" or "::1:80".

Boolean (either to the primitive or Object forms) conversion builds upon the default Boolean.valueOf logic, which evaluates to true if and only if the value is case-insensitive equals to "true". The conversion accepts the following values:

true: Y, YES, T, TRUE, 1, -1, X
false: N, NO, F, FALSE, 0
Any other values will result in a detailed ClassParseException.

The parsers are split into a few different groups, with each class implementing IClassParser. ClassParserChain simply chains together these groups, and also implements IClassParser itself. Each parser may either return a definitive result (including null), throw an Exception which will bubble-up to the caller, or return IClassParser.NO_RESULT to defer processing to a later parser in the chain. UnhandledParser is a parser that always throws a detailed "Unhandled type" ClassParseException. ClassParserChain defines a default chain that can be obtained by calling getDefault(), and contains all available parsers within the library (those listed above) - ending with UnhandledParser.

If something in the default parse chain doesn't work for you (or if you simply don't like it) - simply create and use your own chain. For simplicity, the default chain could be obtained and prepended to in order to add a custom parser that takes precedence over all the following (default) parsers - or appended in order to handle any conversions that aren't already handled by the former (default) parsers.

Nothing in the "classParser" package is tied to any of the CLI logic - either that of Commons CLI, or the CLI-specific logic within MarkUtils-CLI. As such, this code may be appropriately reused for other purposes beyond parsing of command lines. (If appropriate, this code may be factored into a separate project outside of MarkUtils-CLI, which would then become a dependency of MarkUtils-CLI.)

cli

Now that we have a comprehensive set of code to handle the String-to-type conversions, it's time to put it to work.

The Parameter annotation can be assigned to any field or method - and provides attributes that map to most of the available options in the Commons CLI Option class. If a parameter name is not specifically defined, the base field/method name is used by default - following standard JavaBean conventions. Additional ParameterGroup and ParameterGroups annotations are used to work with the Commons CLI OptionGroup functionality. To eliminate needing to deal with security manager concerns, all fields or methods must be declared public to be accessed by this library. Proper design of the application code - including using separate "plain-old Java objects" ("POJOs") for containing the mapped command-line arguments - should eliminate most concerns with this.

Mappings support 1 or more argument values, acceptable by a suitable Array type as either a field, or an equivalent set method - including allowing for varargs. If the annotated class wishes to add any special processing - such as additional validations, or storing multiple values in a Collection type instead of an Array - simply annotate an appropriate set method instead of a field, and implement accordingly.

Most of the actual "magic" happens within ClassOptions. To allow for future extensibility and customization, all methods here are non-static, so an instance of this class must be created before use (using the default constructor). General usage is as follows:

ClassOptionsData get(Class<?>)
- Pass-in the type of the "config" class, containing the @Parameter annotations. The returned ClassOptionsData instance contains mappings from each Commons CLI Option to the reflection Member for efficiency - especially if multiple executions are to be evaluated from within the same program (sure, this may be rare - but it is accounted for). Use of this composite class also allows for future, internal extensions - hopefully without breaking the public API.
- This essentially wraps the Commons CLI "Definition Stage".
Options getOptions(ClassOptionsData cod)
- Using the ClassOptionsData obtained above, returns a fresh Commons CLI Options instance - for use by the Commons CLI CommandLineParser - completing the Commons CLI "Parsing Stage".
void autoMap(Object target, ClassOptionsData, CommandLine, IClassParser)
- Given an instance of the actual "config" class, along with the ClassOptionsData, a Commons CLI CommandLine, and an IClassParser - will read the values from the parsed command-line and automatically map the values to the annotated fields / methods in the "config" target.
- This completed the Commons CLI "Interrogation Stage" - but besides simply providing the necessary details from the command-line, this should hopefully eliminate most of the "boilerplate" coding that would have to be done otherwise.

CliRunner

As part of the cli package, the CliRunner class aims to further simplify the above. While the separate calls to the various ClassOptions methods provide for almost unbounded flexibility, and may still be required in some cases, CliRunner should help further eliminate any remaining "boilerplate" code.

Click here to view the CliRunner Javadoc. All that is required is instantiation using one of the constructors, and calling its run method. As such, the complete code necessary to handle command-line processing - using both Apache Commons CLI and MarkUtils-CLI - can be as simple as:

package com.ziesemer.utils.cli;

import java.util.concurrent.Callable;

public class CliRunnerTestConfigTarget implements Callable<Integer>{
  
  @Parameter
  public int testReturn;

  public Integer call() throws Exception{
    return testReturn;
  }
  
}

package com.ziesemer.utils.cli.examples;

import org.apache.commons.cli.PosixParser;

import com.ziesemer.utils.cli.CliRunner;
import com.ziesemer.utils.cli.CliRunnerTestConfigTarget;

public class CliRunnerExample{

  public static void main(String[] args){
    CliRunner<CliRunnerTestConfigTarget> runner = new CliRunner<CliRunnerTestConfigTarget>(
      new PosixParser(), CliRunnerTestConfigTarget.class);
    CliRunnerTestConfigTarget configTarget = new CliRunnerTestConfigTarget();
    runner.run(configTarget, configTarget, args);
  }

}

Note that here, "configTarget" is being used both as the "configuration" object - as well as the target to be executed. I.E., CliRunnerTestConfigTarget has @Parameter annotations, and implements Callable<Integer>. The Integer result returned by the callable is used as the return code / exit status for the program (assuming everything else executes "normally" - further details below).

There are a significant number of getters and setters provided that can be used to customize the operation of CliRunner. Any unit of significant processing within CliRunner is factored into its own method - allowing for sub-classing and overriding as desired / required. Many of these methods are specifically documented as extension points. Please view the Javadocs (and possibly the source code) for additional details.

As noted above, an additional advantage of using CliRunner is that it can handle the return code / exit status of the program without any additional coding. As noted in the Javadocs, the following codes are used by default (all of which can be overridden):

If the command line is successfully parsed, the result returned by the target. Otherwise:
- 1 if the default help was requested.
- 254 if a ParseException is generated.
- 255 for any other exception (considered the "worst").

Note that to properly handle the return code / exit status for most cases, System.exit() will be called (by default).

Download

com.ziesemer.utils.cli is available on java.ziesemer.com under the GPL license, complete with source code, a compiled .jar, generated Javadocs, and a suite of 95+ JUnit tests. Please report any bugs or feature requests on the GitHub Issue Tracker.

Monday, May 29, 2017

MarkUtils Refresh, New Hosting

It's been a few years, but I just completed a much-needed refresh to my open source Java libraries ("MarkUtils").

The code is vastly unchanged, and has held up incredibly well over the Java 1.6, 1.7, and 1.8 upgrades (and soon, 1.9) - probably mostly due to 90%+ unit test coverage on everything. The main driving factor here was to get things re-hosted following the shutdown of java.net. In the process, I rebased against against all the latest dependencies and Maven plugins, and issued a coordinated release. Everything is built on the latest version of Java (and only publicly available secure version), 1.8.0_131 as of this writing - but should continue to run fine on older runtimes as documented and as needed for other purposes. Change logs are included in the Maven site reports of each project. Also included are some new projects, as well as some past efforts not previously hosted - neither of which have been blogged about in further detail here yet.

Downloads and additional details here: https://java.ziesemer.com. HTTPS/TLS/SSL, DNSSEC, CDN, and IPv6 support all now included.

As time permits, I'll work through my collection of old blog posts here to update any broken links to java.net - as well as to the legacy downloads across java.net and Google Code. Future efforts may include making the source code available directly on GitHub, as well as an accessible online Maven repository - either as a hosted directory at the above site, or possibly through Bintray (any input is welcome).

Saturday, February 23, 2013

How to (not) Validate an Email Address

A common requirement to a software developer often follows the lines of "accept a valid email address". Somewhat appropriately, use of a regular expression is typically considered to fulfill this requirement. Unfortunately, many times this is made more complicated than necessary - and only causes additional issues due to becoming too restrictive.

This scenario is best explained by "I Knew How To Validate An Email Address Until I Read The RFC" (Phil Haack, 2007-08-21, haacked.com). I highly encourage you to read through the entire article, but here are a few significant quotes:

What I found out was surprising. Nearly 100% of regular expressions on the web purporting to validate an email address are too strict.

These are all valid email addresses!
"Abc\@def"@example.com "Fred Bloggs"@example.com "Joe\\Blow"@example.com "Abc@def"@example.com customer/department=shipping@example.com $A12345@example.com !def!xyz%abc@example.com _somename@example.com

The article goes on to suggest the following regular expression (complete with unit tests!):

^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~] |(?@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$

However - this still misrepresents the real issue. There is only one way to validate an email address: Send something containing a cryptographically secure, one-time token to the email address in question, and prompt the user to provide it (either by input back to the web page, and/or by clicking on a link to invoke the validation). If and only if the user is able to complete this step, it can be reasonably assured that the user owns or at least maintains control of the provided email address.

As such, any related requirement should be relaxed to only ensure that whatever is being entered at least "looks" like an email address. This isn't so much for the purpose of ensuring that the address is valid - but that the proper data is being entered into the correct fields, such that an email address is being entered when prompted, instead of a name or a phone number, for example. To accomplish this, the following regular expression is sufficient:

.+@.+\..+

This basically indicates that in order for a given text to match against this pattern, it must contain one or more characters, followed by the '@' sign, followed by one or more characters, followed by a dot ('.'), followed by one or more characters. (Any of the "one or more characters" will also allow for one or more additional '@' signs or dots ('.').)

So for example, the below is an example validation implemented in JavaScript that "makes sense":

var emailValidation = /.+@.+\..+/;
console.log(emailValidation.exec("user@example.com") != null); // Result: "true"
console.log(emailValidation.exec("www.example.com") != null); // Result: "false"

The only exception to this should probably be for a system that needs to validate email addresses for registration into an email system itself. I.E., for registration of an address into a company's webmail system, and it is well-known that this particular system doesn't accept special characters in the email address, for example. Otherwise, why should a given site be coded to reject what otherwise would be a perfectly valid and legitimate email address, just because the local system's requirements don't "like" certain characters or formats? (Hint: If this is the case, the local system's requirements are probably due for review.)

Real-world issues

One of my reminders of this issue, and part of this inspiration of this post, comes from Liferay Portal. In older versions of Liferay, this was the regular expression (implemented within Java code) for validating an email address:

"([\\w-]+\\.)*[\\w-]+@([\\w-]+\\.)+[A-Za-z]+

This incorrectly caused email addresses containing a '+' sign to be rejected - a practice somewhat common with Gmail, for example. As an enterprise Liferay customer, I requested a patch, which replaced the above with the following regular expression to provide support for the '+' sign (and some other things):

([\\w!#%&-/=_`~\\Q.$*+?^{|}\\E]+)*@([\\w-]+\\.)+[A-Za-z]+

Unfortunately, this patch only led to worse issues. At least with the newer version of the pattern, it is a non-optimized regular expression and is subject to catastrophic backtracking. (Please refer to http://www.regular-expressions.info/catastrophic.html and/or http://www.codinghorror.com/blog/2006/01/regex-performance.html for additional details regarding catastrophic backtracking.) In particular, in these situations, each additional character that needs to be backtracked over will cause an exponential growth in required CPU iterations. This expression seems to "fall apart" starting at about 25 characters. This caused us to run into periodic instances (both production and non-production) where our Liferay JVM would completely hang, causing 100% CPU utilization. I captured a stack trace that a provided to them - but the JVM was basically getting stuck on the following call: com.liferay.portal.kernel.util.Validator.isEmailAddress(String).

I proceeded to escalate this as a critical security issue, due to the potential of an easy-to-execute denial-of-service (DOS) attack for anyone running versions of Liferay using this patch or newer versions containing this code.

A minimal example that can be used to reproduce the issue outside of Liferay - but using the same later regular expression shown above:

 Pattern emailAddressPattern = Pattern.compile("([\\w!#%&-/=_`~\\Q.$*+?^{|}\\E]+)*@([\\w-]+\\.)+[A-Za-z]+");
 // "Accidentally" type an email address in with the wrong punctuation.
 Matcher m = emailAddressPattern.matcher("mark.ziesemer.myexamplecompany@com");
 System.out.println(m.matches());

I'm not even exactly sure what this new regular expression was aiming to accomplish, but quite honestly, I believe Liferay went the wrong direction / took the wrong approach with this modification. I also provided them with what I've provided above, including the minimal regular expression. My advice was not immediately accepted. The latest patch uses the following pattern:

[\w!#$%&'*+/=?^_`{|}~-]+(?:\.[\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\w](?:[\w-]*[\w])?\.)+[\w](?:[\w-]*[\w])?

... which is almost as bad as Phil Haack's example (above) - actually, probably worse, since I doubt it was tested to the same standards as his (if Liferay even has unit tests around this at all...) To their credit, they did file a feature request around this (LPS-30849) - though it doesn't include any of the rationale for its existence, nor has any effort been demonstrated on the ticket yet.

Saturday, August 11, 2012

parent-updates-maven-plugin

I've had the privilege of working with Apache Maven (official site) in most of my projects almost since its official release in 2005. Maven is so much more than a build automation tool - providing dependency resolution, project configurations, and automatically-generated project reports / web sites, just to name a few features. It isn't often that I find something that I can't accomplish with Maven - but hopefully this is my chance to make a slight contribution back to the project.

One of the many plugins I've been using for a while now is the versions-maven-plugin - specifically to find and report upon any available dependencies (artifacts used by a project), as well as plugins (artifacts used by the Maven build for the project). There are goals to display the results of these update checks during the build - see "Checking for new dependency updates" and "Checking for new plugin updates". The plugin also provides even further detailed report pages. For examples of these, see one of the plugin's own reports - which shows available dependency updates for the plugin itself when it was last built: "Dependency Updates Report".

However, one notable omission seems to be a goal to display any available updates to the parent POM (parent project) - which isn't reported as part of either the available dependencies or plugins. I had reported this as a feature request back in March as MVERSIONS-181: Versions Plugin should have a "display-parent-updates" goal:

Similar to how we have "display-dependency-updates", "display-plugin-updates", and "display-property-updates" goals, there should also be a "display-parent-updates" goal.

For example, we have a corporate root POM. This POM already runs "display-dependency-updates" and "display-plugin-updates" as part of its and all child builds. However, if the root POM is updated, any child builds aren't reporting that an updated parent POM is available. I believe that a "display-parent-updates" goal would help with this.

There is already the "update-parent" goal - but there isn't any equivalent "report/display only" functionality.

Unfortunately, since then and as of this writing, the request hasn't seen any activity. In the spirit of open source, I decided to see what it would take to implement this myself. The result is my own Maven plugin that adds "display-parent-updates" and "parent-updates-report" goals.

Shown below are the results of these efforts, as demonstrated against a small set of mock test projects. Each test project was chosen to go against a different parent. Choosing 2 versions of the Apache Maven project itself as parent examples seemed appropriate - and produced good results for demonstration purposes. I also included my own parent project - which had no parent updates available at the time, and a very simple project that doesn't even use a parent project. Click through the provided links, and observe the console output, as well as the generated report for each example. Note that the report template changes between projects, as the theme can be controlled by the parent project:

maven-3.0 | maven-RELEASE | MarkUtils-RELEASE | no-parent

[INFO] --- parent-updates-maven-plugin:2012.08.08:display-parent-updates (default) @ maven-3.0 ---
[INFO] 
[WARNING] The parent project has a newer version:
[WARNING]   org.apache.maven:maven .................................. 3.0 -> 3.0.4
[WARNING] The parent project has a newer version:
[WARNING]   org.apache.maven:maven-parent ............................... 15 -> 21
[WARNING] The parent project has a newer version:
[WARNING]   org.apache:apache ............................................ 6 -> 11

[INFO] --- parent-updates-maven-plugin:2012.08.08:display-parent-updates (default) @ maven-RELEASE ---
[INFO] 
[INFO] The parent project is the newest version:
[INFO]   org.apache.maven:maven ......................................... 3.0.4
[INFO] The parent project is the newest version:
[INFO]   org.apache.maven:maven-parent ..................................... 21
[WARNING] The parent project has a newer version:
[WARNING]   org.apache:apache ........................................... 10 -> 11

[INFO] --- parent-updates-maven-plugin:2012.08.08:display-parent-updates (default) @ MarkUtils-RELEASE ---
[INFO] 
[INFO] artifact com.ziesemer.utils:com.ziesemer.utils.pom: checking for updates from central
[INFO] The parent project is the newest version:
[INFO]   com.ziesemer.utils:com.ziesemer.utils.pom ................. 2011.08.05

[INFO] --- parent-updates-maven-plugin:2012.08.08:display-parent-updates (default) @ no-parent ---
[INFO] Project does not have a parent

The sources for these mock test projects are checked-in and distributed with the plugin sources, as the best alternative to proper automated testing that I was able to put together for this project. I had hoped to include a proper suite of JUnit tests - but only found disappointment in the current state of available testing frameworks for Maven plugins - some details of which are covered at http://docs.codehaus.org/display/MAVENUSER/Review+of+Plugin+Testing+Strategies.

Implementation notes: This plugin actually extends the org.codehaus.mojo.versions.DisplayDependencyUpdatesMojo, DependencyUpdatesReport, and AbstractVersionsReportRenderer classes from versions-maven-plugin. For the display and report goals, I would have liked to have just extended their parent classes - AbstractVersionsUpdaterMojo and AbstractVersionsReport, respectively. However, due to the issues described at MPLUGIN-164, the Javadoc annotations are not visible across projects, causing fields on the parent classes to not be set - resulting in NullPointerExceptions, etc. The work-around I'm using is to use the maven-inherit-plugin. Unfortunately, it only support extending the classes actually bound to goals - and not their parent classes. Fortunately, the goals I'm extending were written such that I could access and override the methods needed to achieve the desired functionality. The only real side-effect of this is that there are some extra configuration options that are exposed from the goals I'm extending, that don't serve any purpose or have any impact on the goals I implemented. As the base versions-maven-plugin maintains compatibility with Java 1.4, I ensured that my parent-updates-maven-plugin does so as well. However, hopefully once everyone can agree to drop support for Java 1.4 and have a minimum Java version requirement of 1.5, proper Java 5 (non-Javadoc) annotations can be used to eliminate these issues, as detailed at MPLUGIN-189.

My intent for this release is to provide a temporary work-around for this missing functionality. I hope that my efforts may be included in a future version of the proper versions-maven-plugin - which would also eliminate some of the issues I noted above.

parent-updates-maven-plugin is available on ziesemer.java.net. It is released with a dual license - GNU GPL v3, as well as The Apache Software License, Version 2.0 - to match the base versions-maven-plugin that this project extends. Download the parent-updates-maven-plugin-*-dist.zip distribution from here. Please report any bugs or feature requests on the java.net Issue Tracker.

Saturday, March 31, 2012

"Connection Reset" errors, MTU, DHCP, and Time Warner Cable

So long, AT&T DSL

Not too long ago, I made the move from AT&T DSL to Time Warner Cable for my family's home Internet connection. AT&T's pricing was no longer competitive, and their terms of service were nothing to be proud of.

Hopefully most readers have heard about the recent AT&T policies regarding 150 GB data caps for DSL connections. While most people have recently been complaining about similar data caps for mobile data plans - these caps are something I largely agree with, but only for wireless. There is only so much wireless spectrum available to share for everyone within a given area - and the laws of physics don't make these limits the easiest to overcome, at least without having towers on every street corner. (This is what we have Wi-Fi for.) While it is great that the wireless industry is marketing their mobile video capabilities, etc. - they need to ensure that they are offering the actual network services to match, instead of balancing on the edge of false advertising. However, for wired connections, there should be few excuses - as a wire can always be upgraded, or another wire (or fiber!) can always be added. Especially as my family is watching more Netflix and other video content online - combined with regular remote sessions to work and other computer-related activities, hitting 150 GB would not be too difficult. Personally, I also don't expect to be able to watch Netflix videos on mobile - I don't even have a mobile data plan. There is no place for data caps on wired Internet connections.

The other concern that I had was a provision that allowed AT&T to forcefully upgrade us from our DSL account to a higher-priced U-verse account at their discretion. I agree that U-verse is cool, but I'd really like to see it stand up to competition by also having something like Version FiOS available to the same customers. Verizon FiOS is not currently available in my area. I had also expressed some related thoughts in a previous post, when Appleton was considering a bid for Google Fiber. In that post, I had also stated some concerns with Time Warner Cable (TWC) - but they seem to have cleaned-up their act a bit since then - including with a new, impressive local retail presence.

"Connection Reset" errors, MTU, and DHCP with Time Warner Cable

So we switched to TWC's Road Runner service for our Internet service. Of course, it would be too easy if this was without issue. Without any other changes in my computer or network configurations, I noticed "The Connection was reset" errors the very first night with the service. Unfortunately, these issues were very difficult to troubleshoot, as the errors were quite sporadic, and I wasn't able to reproduce on-demand. The issues also were more prevalent on some devices than others - even though all devices worked just fine when connected to other networks. Interestingly, Google sites and services also seemed to be affected more than others. I'm guessing this had to do with most of Google's services being accessed over https:// - which increased the packet sizes and likely led to some of the issues. Having the issues happen the most often with encrypted connections also made troubleshooting with Wireshark, etc., quite difficult.

I had called TWC on this, but got the typical run-around. (Obligatory technical support comic: http://xkcd.com/806/.) They don't see any issues with their service - simply saying that it had to be an issue with my computer or router. They were somewhat correct - but only because my router was following TWC's direction.

eth1      Link encap:Ethernet  HWaddr **:**:**:**:**:**
          inet addr:75.87.***.***  Bcast:255.255.255.255  Mask:255.255.240.0
          UP BROADCAST RUNNING MULTICAST  MTU:576  Metric:1
          RX packets:807 errors:0 dropped:0 overruns:0 frame:0
          TX packets:482 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:85701 (85.7 KB)  TX bytes:77492 (77.4 KB)

See the issue? Frustratingly, it had taken me at least another week to spot it. Per TWC's own Road Runner help pages, the Ethernet standard for the Maximum Transmission Unit (MTU) is Ethernet standard of 1,500 bytes. So why was my router being configured for an MTU of only 576 bytes? Here was the latest from my /var/lib/dhcp3/dhclient.eth1.leases file after reproducing the issue:

lease {
  interface "eth1";
  fixed-address 75.87.***.***;
  option subnet-mask 255.255.240.0;
  option routers 75.87.192.1;
  option dhcp-lease-time 43200;
  option dhcp-message-type 5;
  option dhcp-server-identifier 10.65.64.1;
  option interface-mtu 576;
  option broadcast-address 255.255.255.255;
  option host-name "********";
  renew 0 2012/04/01 05:31:32;
  rebind 0 2012/04/01 10:41:37;
  expire 0 2012/04/01 12:11:37;
}

As I mentioned, my router was only doing "as it was told". I'm not exactly sure what is acting as the DHCP server or where it is located, but as the 10.65.64.1 address is not resolvable nor does it appear active, my guess is that this is being served from the cable modem itself, and a function of firmware loaded and controlled by TWC's Road Runner's servers. I'm also guessing that this option is not requested by or respected by most other devices (including SOHO routers and Microsoft Windows, etc.) - otherwise I'm sure TWC would have recognized this and corrected it by now. Fortunately, dhclient under Linux provides an easy work-around - with "work-around" probably being too drastic of a name, as it is really just a simple configuration change. Here is a portion of the default /etc/dhcp/dhclient.conf on my distribution, with the critical detail highlighted:

request subnet-mask, broadcast-address, time-offset, routers,
  domain-name, domain-name-servers, domain-search, host-name,
  netbios-name-servers, netbios-scope, interface-mtu,
  rfc3442-classless-static-routes, ntp-servers,
  dhcp6.domain-search, dhcp6.fqdn,
  dhcp6.name-servers, dhcp6.sntp-servers;

I simply commented-out this field, and my Internet connection has been stable since.

I called back TWC to report my findings, in an effort to hopefully help others having the same issue. Realizing that they refused to recognize that anything was wrong, I also wrote-up a summary of the issue and hand-delivered it to the local TWC retail office - including an invite to contact me with any needed requests for further details - hoping that it would be directed to someone who would be empowered to fix this. Needless to say, 4 months later, this issue still has not been addressed or resolved by TWC.

Thursday, February 2, 2012

Prepping Cygwin for a multi-user installation

I was working on installing Cygwin on a base Windows 7 image that will be used by a significant number of developers. As much as I like Cygwin, following the same experience I previously documented in Fixing Cygwin's user groups, sometimes things just don't work as smoothly as one would hope.

First, even though this will be a "install once" situation that will be copied as part of the image, I still needed to select the packages to install. Having previously installed Cygwin on a number of individual desktops and other images, I finally needed a better way to choose the packages and other installation options. Fortunately, Cygwin's setup.exe has very comprehensive command-line support. The following replicates what I've been using for all of my recent installs:

setup.exe -l C:\cygwin\Setup ^
-s http://sourceware.mirrors.tds.net/pub/sourceware.org/cygwin/ ^
-P cygrunsrv,^
cabextract,p7zip,sharutils,tnef,unace,unzip,zip,^
binutils,cvs,cvsutils,gcc,make,subversion,^
hexedit,mc,nano,vim,^
bind-utils,curl,inetutils,openldap,openssh,rsync,^
bsdiff,patch,patchutils,time,wdiff,which,xdelta,^
tidy,wget,^
xinit,xterm

(The lines are separated by command-line option, and by installation category used in the package manager.)

After installation, there is an issue to address regarding the permissions of the installed Cygwin files. Following Using Windows security in Cygwin, Cygwin operates with all file permissions using the POSIX security model. This leads to some interesting issues, and Cygwin only uses the user/group/other permissions - where as Windows would allow for more than 3 access control entries (ACE) in the access control list (ACL). This results in many files, including /etc/passwd and /etc/group being editable only to the the user that installed Cygwin. This happens even when the default "Install For - All Users" option during installation is kept, vs. selecting the "Just Me" option. (I could have seen choosing "Just Me" affecting this, as its description includes "Only select this if you lack Administrator privileges..." - but this isn't the case.)

The result of this is that once installed, many files can't be edited by another user. This includes later upgrading Cygwin using setup.exe as another user. This could impact systems installed for planned multi-user access - or even just a system that is pre-installed by an Administrator for hand-off to another user. It doesn't matter if these other users are also members of the "Administrators" group, as resulting file permissions look like the following:

$ icacls C:/cygwin/etc/passwd
C:/cygwin/etc/passwd <COMPUTERNAME>\Administrator:(R,W,D,WDAC,WO)
                     <COMPUTERNAME>\None:(R)
                     Everyone:(R)

$ ls -l /etc/passwd
-rw-r--r-- 1 Administrator None 437 Feb  2 21:37 /etc/passwd

This is particularly interesting as once each new user attempts to use Cygwin for the first time, they are greeted with the following (generated from the bottom of /etc/profile):

Your group is currently "mkpasswd".  This indicates that your
gid is not in /etc/group and your uid is not in /etc/passwd.

The /etc/passwd (and possibly /etc/group) files should be rebuilt.
See the man pages for mkpasswd and mkgroup then, for example, run

mkpasswd -l [-d] > /etc/passwd
mkgroup  -l [-d] > /etc/group

Note that the -d switch is necessary for domain users.

Without accounting for these permission issues, the user will have no way of doing this, without resetting the file permissions on these files to allow access.

My current work-around for this involves running the following after installing Cygwin or running any updates using setup.exe:

find /bin /dev /etc /lib /tmp /usr /var -user 1000 -exec chown -h Administrators '{}' +
chown -h Administrators /home

This results in the same /etc/passwd file now looking like the following:

$ icacls C:/cygwin/etc/passwd
C:/cygwin/etc/passwd BUILTIN\Administrators:(R,W,D,WDAC,WO)
                     <COMPUTERNAME>\None:(R)
                     Everyone:(R)

$ ls -l /etc/passwd
-rw-r--r-- 1 Administrators None 437 Feb  2 21:37 /etc/passwd

Note that this keeps the ACE entries in the same ordering of user/group/other, which prevents what could be implications as detailed in the same http://cygwin.com/cygwin-ug-net/ntsec.html#ntsec-mapping URL linked to above. Cygwin doesn't seem to mind that "Administrators" is actually a group instead of a user, and I've not yet seen any side-effects to doing this.

Now ideally, the Cygwin installation could also be modified to automatically add new users to /etc/passwd to avoid the "Your group is currently "mkpasswd"" prompt when they first use Cygwin. Without editing /etc/profile directly (which would cause it to be skipped in any future updates), the only place I could find to tie into this is in /etc/profile.d/. However, this is interesting in that it is dependent upon the shell being used. I.E., if using bash, only *.sh files will be sourced, and if using ZSH, only *.zsh files will be sourced. Assuming that only bash will be used (at least for each initial login), I added the following as /etc/profile.d/mkpasswd.sh:

case "$(id -ng)" in
mkpasswd )
 echo
 echo "/etc/profile.d/mkpasswd.sh: Running mkpasswd for new user $(id -un)..."
 mkpasswd -c >> /etc/passwd
 echo "Please press any key to exit.  You will need to manually restart Cygwin."
 read -sn 1
 exit
 ;;
esac

I'm not too crazy about requiring Cygwin to be restarted, and furthermore for requiring the user to assist, but this should do the job and eliminate some support requests.

I reported a summary of this to the Cygwin mailing list.

Sunday, October 30, 2011

Ubuntu Linux 11.10 Install Notes

This past week, I finally got around to installing the latest version of Ubuntu Linux on my home server - 11.10, "Oneiric Ocelot" (released 2011-10-13). There shouldn't be anything too significant here - these are mainly notes for myself, but posted here in case something is useful to others. I always do a full, clean re-installation - so several of the notes listed here won't be of concern for an in-place upgrade.

For some additional reference, this is the same server that I have setup as part of my Ubuntu Linux Router project, is setup with LDAP, and is the subject of several other Linux-related posts.

As I am using and now dependent on VLANs on this server for my network setup, I need to have a few necessary related packages readily available for the upgraded OS before reinstalling. These are not currently included on either the CD or DVD installation images, and I need them to be able to regain Internet connectivity - so simply using "apt-get" to re-install them from the Internet is clearly not an option. For the 11.10 upgrade, I downloaded the following amd64 *.deb packages:

After the fresh installation, the server fails to boot. This isn't anything new to to 11.10, but has been something to remember since I converted to a GUID Partition Table (GPT) from the legacy MBR partitioning scheme. I already have the special BIOS boot partition, as detailed under "GPT" at http://www.gnu.org/software/grub/manual/grub.html#BIOS-installation, which doesn't need to be re-configured with an OS upgrade. However, after the Ubuntu installation finishes, I do need to reset this partition as "active" in the legacy MBR partition table using fdisk - as I apparently have one of the Intel motherboards with a "buggy BIOS" that otherwise shouldn't require this. Additional details are at http://ubuntuforums.org/showpost.php?p=9787109&postcount=22 and http://www.rodsbooks.com/gdisk/bios.html.

All of my posts for the Ubuntu Linux Router project, while written for 8.04, are still almost completely valid and applicable for all releases since - including 11.10. One edit for Configuring Persistent PPP is that instead of attempting to disable NetworkManager, it is easier to just remove it: "apt-get remove network-manager". Install the networking *.deb packages downloaded above, then configure the DSL. Uncomment the "usepeerdns" line from the /etc/ppp/peers/dsl-provider restored from backup until BIND can be setup.

Once online, download and install some additional packages:

apt-get install vim build-essential ssh gnome-panel nautilus-open-terminal dkms linux-server
apt-get upgrade

After getting LDAP re-configured, 11.10 has some problems with graphical / xorg logins, as detailed at https://bugs.launchpad.net/ubuntu/+source/at-spi2-core/+bug/870874, and somewhat, "Problems with LightDm on Ubuntu 11.10 Oneiric? Here's how to solve" (2011-10-16, linux-software-news-tutorials.blogspot.com). Unfortunately, in this case, both LightDm and gdm exhibit the same issue - so my current work-around is to disable the problem script, /etc/X11/Xsession.d/90qt-a11y, by prefixing its name with a non-numeric character.

Finally, I don't need the extra help that Ubuntu provides at every login, so I disable it:

$ sudo chmod -x /etc/update-motd.d/10-help-text

Saturday, October 1, 2011

Fixing UPnP/DLNA sharing on 2nd drive under Windows 7

Background

Though I may be a few years late to the party, I finally have a capable Home Theater PC (HTPC) and HDTV setup in my living room. For simplicity for the entire family, it's running Windows 7's Media Center.

My favorite feature of a HTPC-solution is the digital video recorder (DVR) functionality - including the options to pause and replay live TV. My family previously had a hard-drive enabled DVD player / DVR with this functionality built-in - but without an ATSC digital tuner, its use is now limited to analog cable. I since had a newer Philips DVD player / DVR with similar features - including an ATSC digital tuner - but it was only standard-definition (SD), not high-definition (HD), and recorded content had issues with volume levels as well as not supporting closed captioning and other broadcast features.

Disappointingly, these types of devices are appearing to be pushed out of the market in favor of versions that require monthly fees. Other than a new subscription to Netflix and our DVDs, most of what we watch is free broadcast television - without paying monthly fees for cable or satellite television service. I'm excited to have a HTPC-based DVR that is also fee-free. Much of the broadcast television we watch is also in HD, for free - something most paid-for TV services are still behind in for not offering without yet additional fees!

Besides having a direct HDMI connection between the PC and the TV, both devices support UPnP/DLNA. I'll prefer the HDMI connection as the primary link between the HTPC and its connected TV - mostly as it doesn't require to have an extra encode/decode step to stream the video over Ethernet. The DLNA method is also missing a few features that are natively available from the HTPC - including some simpler features such as closed captioning. However, the DLNA feature is certainly nice to have as a convenience for quickly streaming videos, pictures, or music from any other computer connected to the network - without needing to first add a wired connection, or otherwise transfer the media to a USB flash drive or another connected device.

UPnP/DLNA issue

I ran into some difficulties, however, when trying to stream content from my primary laptop - also running Windows 7 x64, Professional. My laptop simply would not show itself as a media server device. It wasn't a problem with the TV, as other clients on the network could also see all the other servers - just not my laptop. I eventually found that I could create new user accounts in Windows, and then my laptop became visible as a server - but only for the new user accounts, and not my primary user account. Unfortunately, this was something a bit more complicated than adjusting the "Media streaming options" in the Control Panel for my user account.

UPnP uses Simple Service Discovery Protocol (SSDP) for discovery. Not finding anything significant within the Windows Event Log, and not finding any other tools within Windows that could help troubleshoot the issue, I used Wireshark to try to determine what was causing the issue. I found several SSDP HTTP NOTIFY packets, with a URL of something like http://[fe80::5ddf:5cb8:a7e1:bbcb]:2869/upnphost/udhisapi.dll?content=uuid:eb1c9c01-e5c3-4eea-bea2-d9f0a6fb2bcf. Following the URL, XML was returned that described the media servers for all other media servers and users on the network - including from my own laptop, but not for my primary user account. This confirmed that the issue was with my server, and not the network or clients.

I eventually ran across several posts of other users having very similar symptoms, including:

Cannot "Turn on media streaming" - another Windows 7 user with the same problem, and 'suggested fixes' not working. - 2010-09-30, answers.microsoft.com
Windows 7 Media Streaming won't work after moving User directory with MKLINK - 2011-07-23, social.technet.microsoft.com

Unfortunately, they were all unanswered (both replied to - especially with many non-solutions to the first). However, they all had something in common with my laptop setup: The user profile directory was moved off of the system drive. This should be a common setup - using disk partitions to separate the user data from the program data. Another common scenario for this today is to have the OS on a high-performance SSD, and a larger HDD for the user data. However, as Windows doesn't easily provide for configuring this setup, it isn't surprising that this isn't a more visible issue.

My Solution

In both threads, the users seemed to infer that the problem must have been in the file copy / move of the user profile directory between drives. Fortunately, at least in my case, this had nothing to do with the content of the files - but with the security permissions / Access Control Lists (ACLs).

Even after moving my user profile to a new drive / partition, folders such as "My Videos" have an added "Read" permission granted to "WMPNetworkSvc" (which is obviously the account for the "Windows Media Player network service"). This permission does not exist on (nor does it need to exist on) parent folders. However, there is a quirk in Windows where at least limited access needs to be granted to the root of the hosting drive, otherwise accounts can't access any data within the drive - regardless of what child ACLs may be present. This was not an issue on the C:\ drive, as by default, the "Read" permission is granted to "Users", and additionally has other special permissions granted to "Authenticated Users" - both which should apply to the "WMPNetworkSvc" account. Such permissions, however, did not exist on my data drive - which only had permissions defined for my personal account, the "Administrators" group, and the "SYSTEM" group.

The fix is rather simple - give the "WMPNetworkSvc" limited access to the root of the alternate drive hosting the user profile. By limited, I'm not kidding - and keeping access to the absolute minimum only follows best security practices. (You'll need to use the "Advanced" button to open the "Advanced Security Settings" dialog for this):

From the drive properties, "Security" tab, click "Advanced".
From the "Advanced Security Settings" dialog, on the bottom of the "Permissions" tab, click "Change Permissions...".
On the 2nd dialog that pops-up, click "Add...".
Type in "NT Service\WMPNetworkSvc" (without the quotes), click "Check Names" to make sure it is recognized, then click "OK".
- The "NT Service" prefix is necessary to resolve the Windows service account. There is no other known way to be able to select it through the UI options.
In the "Permission Entry" dialog that pops-up:
1. Under "Apply to", select "This folder only".
2. Under "Permissions:", check only the "Allow" checkbox for "Traverse folder / execute file".
3. Click "OK".
Click "OK" to close all 3 remaining dialogs.
Restart the "Windows Media Player Network Sharing Service" (WMPNetworkSvc) Windows service.

All of the above UI steps can also be easily replaced with the following commands. Simply replace "D:\" with the drive letter that the Windows Media Player network service needs access to:

icacls D:\ /grant "NT Service\WMPNetworkSvc":(X)
net stop WMPNetworkSvc
net start WMPNetworkSvc

This is certainly not meant to nor will this help fix the many different types of issues that exist with network media sharing. However, hopefully this will benefit those who have organized their systems on to separate drives or partitions, and are experiencing the same issue.

HP LaserJet PCL Errors and Driver Availability

My primary printer for the past 10 years has been a trusted Hewlett-Packard (HP) LaserJet 2200D, with an added network adapter making it the equivalent of a 2200DN. 12,600 pages later, it is still working like new (though recognizing in a shared office environment, this same print volume would probably be used within a fraction of a year.)

After recently completing a Windows 7 x64 reinstall, I installed the latest driver linked to from the product support page, which is the Universal Print Driver (UPD), version 5.3.1. Interestingly, the "Previous version" column on the download page is empty - and I was unable to find any archive of driver downloads on HP's site.

This wouldn't have been a problem - except for having the following printed in place of the expected output whenever I attempted to print:

PCL XL error

        Subsystem:  KERNEL

        Error:      UnsupportedProtocol

        Operator:   0x0

        Position:   0

Fortunately, I still had the last known-working version of the driver saved, 5.0.3.37 from December 2009. This worked, but now I was curious as to if this was an issue new to the 5.3.1 drivers and if there were still other updates I could be benefiting from.

Kudos to HP for still releasing drivers for a 10-year-old product, and for the concept of a universal driver. However, these versions shouldn't replace existing ones on the product support page - especially if there isn't at least minimal testing with the related product to ensure that it still works.

From HP's Readme for the HP Universal Print Driver v5.3.1.10527:

Each release is a full release of the product for all printer description languages for both 32bit and 64bit platform. Version history of UPD builds released on CD-ROM or posted to www.hp.com/go/upd.

Unfortunately, there is nothing visible at www.hp.com/go/upd for previous versions of the drivers. Even after searching HP's FTP site, I couldn't find any "official" archive of the UPD driver for versions prior to 5.3.1. However, I did find copies elsewhere on the Internet, and here are some details of each that should be useful to anyone else trying to find the same versions. Note that these are for Windows x64, PCL 6 only:

Version	File Name	Size (bytes)	SHA-1 Checksum
5.0.3.37 / December 2009	`upd-503-pcl664.exe`	16,289,336	`733cd725b962b99a25b9b7c94cdd5aa3a06a7116`
5.1.1.8232 / August 2010	`upd-5_1_1_8232-pcl6_winxp-vista-x64.exe`	16,329,016	`2ead57774607bec316e55ad707b841fe8dfe6cfa`
5.2.6.9321 / February 2011	`upd-PCL6-X64-5_2_6_9321.exe`	16,633,400	`bcb70eb684d1f5b684bcc2873a30df72293416b0`
5.3.1.10527 / July 2011	`upd-pcl6-x64-5.3.1.10527.exe`	17,794,616	`a789f2bab5bf9d48897c8dd99dbae9a46bd7031f`

None of the drivers newer than 5.0.3.37 printed anything other than the "UnsupportedProtocol" error shown above on my 2200. This seems to correlate to HP's version history, which shows that Unidrv*.dll - which was at version 6.0.6001.22127 from 4.7.0.30 in November 2008 through 5.0.3.37, was updated to 6.1.7600.16385. It seems probable that the upgrade of Unidrv*.dll caused an incompatibility with the printer.

I've emailed this to HP support (incident # 14612070), and beyond receiving the automated "One of our Technical Support Specialists will be responding to your inquiry" message, I'm skeptical that I'll receive a response, or that this issue will be addressed in future versions of the UPD driver.

I did receive a few responses:

According to your product serial number, the warranty on your HP LaserJet 2300d printer has expired.
- My response:
  The drivers listed on your web site at http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=28862&prodTypeId=18972&prodSeriesId=28861&swLang=8&taskId=135&swEnvOID=4063 list the UPD 5.3.1 driver as being the "best available driver" that worked with the LaserJet 2200D. This is not correct, since these drivers are not working with this printer. As such, this page should either be updated to include the 5.0 drivers, which is the last version of the UPD that works with the printer - or this page should simply be removed entirely, optionally with a "no longer supported" message. (Of course, the preference would be to update the page to include a link to the working 5.0 drivers.)
Thank you for contacting Hewlett-Packard's Commercial Solutions Center.

Mark, unfortunately, Hewlett-Packard does not offer support via e-mail for your product.

We are sorry for the inconvenience caused to you in this regard.

The issue seems to be with the printer driver you are using. Hence, to resolve the issue, please install the UPD driver which mmight resolve the issue.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=18972&prodSeriesId=28861&prodNameId=28862&swEnvOID=4063&swLang=8&mode=2&taskId=135&swItem=lj-95991-2

We hope that the above information provides a quick solution to your inquiry.

Thank you again for contacting HP e-Solutions.
- (Apparently, HP didn't comprehend the part about the problem being with latest version of the UPD driver - and that no previous versions were available for download from HP.)
- My response:
  
  I understand that my previous email was not even read, beyond the fact that yes, I'm using an out-of-warranty LaserJet 2200D. However, I'm not looking for support - I just want you to update your web site to reflect correct information.
  
  The problem that I'm attempting to bring to your attention is that I'm not the only one having this issue, and that the product page for this product that HP is still hosting should either be updated to reflect this, or remove the page entirely. I already found the solution - using the 5.0 version of the UPD instead of using the 5.1.3 one. Simply, please update the page at http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=28862&prodTypeId=18972&prodSeriesId=28861&swLang=8&taskId=135&swEnvOID=4063 to reflect this, and to assist other users who may be having the same problem - or remove the page as to not promote dis-information.
Thank you for contacting HP eSolutions.

We really apologize for the inconveniance caused to you in regards to this issue.

Thank you for updating on this. We will escalate this to the concerned team.

If you have any other queries, please feel free to get back to us and we would be glad to be of assistance.

Once again, thank you for contacting HP eSolutions.
- (Received 2011-09-19. No communications since.)

Saturday, April 16, 2011

Improving code and quality with Checkstyle

I've always been picky about the quality my code and the code that I work with. For good reason. It makes the code easier to read and work with. Consistency makes the program flow easier to understand, bugs and other potential issues easier to spot, and difference comparisons between files and versions more effective. Sun Microsystems (original designer of the Java platform, now part of Oracle) seems to agree. From their "Code Conventions for the Java Programming Language":

This Code Conventions for the Java Programming Language document contains the standard conventions that we at Sun follow and recommend that others follow. It covers filenames, file organization, indentation, comments, declarations, statements, white space, naming conventions, programming practices and includes a code example.

80% of the lifetime cost of a piece of software goes to maintenance.
Hardly any software is maintained for its whole life by the original author.
Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.

For years, I haven't really used any special tools to assist with code formatting and standards, other than the Eclipse Java Formatter for automatically and completely reformatting code as needed, and Eclipse's Compiler Errors/Warnings for pointing out potential issues. One limitation with the compiler errors/warnings in Eclipse is that they really don't include any formatting-specific checks.

I've had experience with Checkstyle in the past (official site | Wikipedia). However, I'm not sure how much thought was put into the configuration of the standards that were in place, and combined with a high level of false positives, most of the developers found ourselves working against the tool rather than working with it. Standards are great, but they need to be thought through, and have buy-in from the project team.

Over the past few months, I rediscovered a need for Checkstyle for the group of developers I was working with. We were doing an satisfactory job addressing the compiler errors and warnings, but the code itself was a sore sight to look at. The code formatting was addressed during code reviews, but without having the formatting issues also appear as warnings in the Eclipse "Problems" view as it was being developed, the formatting issues were quickly "out of sight, out of mind", and quickly forgotten by much of the team. (I'm not sure how the group would have managed with a language like Python, where proper indentation of the code is actually part of the language syntax.)

Fortunately, since I last worked with Checkstyle, the tools have only improved. eclipse-cs now integrates seamlessly with both Eclipse and the Checkstyle XML configurations, and is fully configurable to projects on both local and global levels - using either internal, external, remote, or project-relative configurations.

For my own reference, as well as others to benefit from, I'm sharing the Checkstyle configuration I've worked with over the past few months. The surprising thing is how many inconsistencies were reported and that I was able to fix even in my own code after running the Checkstyle validations.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE module PUBLIC "-//Puppy Crawl//DTD Check Configuration 1.3//EN" "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<!--
    This configuration file was written by the eclipse-cs plugin configuration editor
-->
<!--
    Checkstyle-Configuration: MAZ
    Description: none
-->
<module name="Checker">
  <property name="severity" value="warning"/>
  <module name="TreeWalker">
    <property name="tabWidth" value="2"/>
    <module name="JavadocMethod">
      <property name="severity" value="ignore"/>
      <property name="suppressLoadErrors" value="true"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocType">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocVariable">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocStyle">
      <property name="checkFirstSentence" value="false"/>
    </module>
    <module name="ConstantName"/>
    <module name="LocalFinalVariableName"/>
    <module name="LocalVariableName"/>
    <module name="MemberName"/>
    <module name="MethodName"/>
    <module name="PackageName"/>
    <module name="ParameterName"/>
    <module name="StaticVariableName">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="TypeName"/>
    <module name="AvoidStarImport"/>
    <module name="IllegalImport"/>
    <module name="RedundantImport"/>
    <module name="UnusedImports"/>
    <module name="LineLength">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MethodLength">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="ParameterNumber">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="EmptyForIteratorPad"/>
    <module name="MethodParamPad"/>
    <module name="NoWhitespaceAfter">
      <property name="tokens" value="BNOT,DEC,DOT,INC,LNOT,UNARY_MINUS,UNARY_PLUS"/>
    </module>
    <module name="NoWhitespaceBefore"/>
    <module name="OperatorWrap"/>
    <module name="ParenPad"/>
    <module name="TypecastParenPad"/>
    <module name="WhitespaceAfter">
      <property name="tokens" value="COMMA,SEMI"/>
    </module>
    <module name="WhitespaceAround">
      <property name="tokens" value="BAND,BAND_ASSIGN,BOR,BOR_ASSIGN,BSR,BSR_ASSIGN,BXOR,BXOR_ASSIGN,COLON,DIV,DIV_ASSIGN,EQUAL,GE,GT,LAND,LE,LOR,LT,MINUS,MINUS_ASSIGN,MOD,MOD_ASSIGN,NOT_EQUAL,PLUS,PLUS_ASSIGN,QUESTION,SL,SL_ASSIGN,SR,SR_ASSIGN,STAR,STAR_ASSIGN,TYPE_EXTENSION_AND"/>
    </module>
    <module name="ModifierOrder"/>
    <module name="RedundantModifier"/>
    <module name="AvoidNestedBlocks">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="EmptyBlock">
      <property name="option" value="text"/>
      <property name="tokens" value="LITERAL_DO,LITERAL_ELSE,LITERAL_FINALLY,LITERAL_IF,LITERAL_FOR,LITERAL_TRY,LITERAL_WHILE,STATIC_INIT"/>
    </module>
    <module name="LeftCurly"/>
    <module name="NeedBraces"/>
    <module name="RightCurly"/>
    <module name="AvoidInlineConditionals">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="DoubleCheckedLocking"/>
    <module name="EmptyStatement"/>
    <module name="EqualsHashCode"/>
    <module name="HiddenField">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="IllegalInstantiation"/>
    <module name="InnerAssignment">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MagicNumber">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MissingSwitchDefault">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="RedundantThrows">
      <property name="suppressLoadErrors" value="true"/>
    </module>
    <module name="SimplifyBooleanExpression"/>
    <module name="SimplifyBooleanReturn"/>
    <module name="DesignForExtension">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="FinalClass"/>
    <module name="HideUtilityClassConstructor">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ: Disabled, until this module provides an option to ignore abstract classes."/>
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="InterfaceIsType"/>
    <module name="VisibilityModifier">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="ArrayTypeStyle"/>
    <module name="FinalParameters">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="TodoComment">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="UpperEll"/>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\t* +\t*\S"/>
      <property name="message" value="Line has leading space characters; indentation should be performed with tabs only."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^[^&quot;]*(catch|for|if|switch|synchronized|while)\s+\("/>
      <property name="message" value="No whitespace after block keyword."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\s*(&quot;([^&quot;])*(?&lt;!\\)&quot;|[^&quot;\t])+\)\s+\{"/>
      <property name="message" value="No whitespace before '{'."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\s*(&quot;([^&quot;])*(?&lt;!\\)&quot;|[^&quot;\t])+(\t|  )"/>
      <property name="message" value="No tabs or multiple whitespace beyond indentation."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="AnnotationUseStyle">
      <metadata name="net.sf.eclipsecs.core.comment" value="javac currently does not accept a trailing comma (http://bugs.sun.com/view_bug.do?bug_id=6337964)."/>
      <property name="severity" value="error"/>
      <property name="elementStyle" value="ignore"/>
      <property name="closingParens" value="ignore"/>
    </module>
    <module name="MissingDeprecated"/>
    <module name="MissingOverride"/>
  </module>
  <module name="JavadocPackage">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="NewlineAtEndOfFile">
    <property name="lineSeparator" value="lf"/>
  </module>
  <module name="Translation"/>
  <module name="FileLength">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="FileTabCharacter">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="RegexpSingleline">
    <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
    <property name="format" value="[^\s]+\s+$"/>
    <property name="message" value="Line has trailing whitespace."/>
  </module>
</module>

This configuration is based off of the "Sun Checks" configuration, with a few enhancements, and tweaks for personal preference.

One of the noteworthy settings is "trailingArrayComma" setting under Annotations/Annotation Use Style. Per the Java Language Specification (JLS), a trailing comma in an annotation array should be allowed / ignored. The Eclipse compiler respects this, but attempting to use javac (e.g. as part of a Maven build) with a trailing comma results in a "illegal start of expression" failure, due to an acknowledged bug with javac. Using Checkstyle here helps to ensure consistent, stable builds across compilers.

Additionally, I extended the configuration to perform limited checks on non-Java files as well. First, there is a "exclude from checking" configuration section that includes a "all files types except:" rule that is enabled by default, and set to "java, properties". This filter needs to be expanded to include any file types that should even be considered for processing by Checkstyle. For basic web development, I added "htm, html, js, css, jsp". Next, disable the "Use simple configuration" option, which enables "Advanced - configure file sets to control which files get checked by which configuration". I then configured 3 file sets:

Java - Uses the configuration shown above. Includes regular expression matches for "\.java$" and "\.properties$".
Web - Provides basic whitespace rules for non-Java files. Includes regular expression matches for each of the "web development" types mentioned above, or "\.(html?|js|jsp|css)$" as a version to accomplish it with one expression.
JSP - Provides additional checks specific to the scriptlet tags used in JSP files. Includes a regular expression match for "\.jsp$". Note that the above Web file set is configured to also apply to JSP files.

In each of these configurations are a number of additional regular expression checks, that primarily focus on whitespace rules that aren't currently checked for or configurable by the other Checkstyle rules:

File Set	Module	Pattern	Message
Java	RegexpSingleLineJava	`^\t* +\t*\S`	Line has leading space characters; indentation should be performed with tabs only.
Java	RegexpSingleLineJava	`^\s("([^"])(?<!\\)"\|[^"\t])+(\t\| )`	No tabs or multiple whitespace beyond indentation.
Java	RegexpSingleLine	`[^\s]+\s+$`	Line has trailing whitespace.
Java	RegexpSingleLineJava	`^\s("([^"])(?<!\\)"\|[^"\t])+\)\s+\{`	No whitespace before '{'.
Java	RegexpSingleLineJava	`^\s*}\s+`	No whitespace after '}'.
Java	RegexpSingleLineJava	`^[^"]*(catch\|for\|if\|switch\|synchronized\|while)\s+\(`	No whitespace after block keyword.
Web	RegexpSingleLine	`^\t* +(?!\)+\t\S`	Line has leading space characters; indentation should be performed with tabs only.
Web	RegexpSingleLine	`^\s(?!\)([^\s"]\|"[^"]*(?<!\\)")+(\t\| )`	No tabs or multiple whitespace beyond indentation.
Web	RegexpSingleLine	`[^\s]+\s+$`	Line has trailing whitespace.
JSP	RegexpSingleLine	`(?<=.+)(?<!%>)<%(?!=\|.+;%>)`	Scriptlet begin tag must be at beginning of line.
JSP	RegexpSingleLine	`<%[^@=](?!;%>)+$`	Content after scriptlet begin tag must be on new line.

These rules are not all infallible. They are designed to catch the majority of the targeted issues, but without causing false positives in the process. (For example, ignoring spacing within multi-line comments in JSP files - as Checkstyle's Java interpretation doesn't work in JSP files, and forbidden sequences that are allowed within String constants are a bit tricky to work with.) In order to prevent regressions as the rules are expanded upon, all of these regular expressions are included in a local suite of JUnit tests that I plan to make available at a future date. There are currently 22 tests in 3 test classes (one per file set), executing 33 base comparisons that result in a total of 993 assertions. Recently, some issues with the JSP rules were observed, causing Checkstyle to fail on "minified" JavaScript files that consisted of single lines of 10,000+ characters. Recursion within Java's regular expression support resulted in stack overflow exceptions, until I tweaked the JSP expressions to make use of different zero-width lookahead and lookbehind constructs. A test case on 10,000+ characters in a single line still takes several seconds to execute, but at least it consistently completes without exception now.

After getting the source code and formatting in order (and to not drown developers in too many warnings at once), additional tools can be considered for use, e.g. FindBugs (official site | Wikipedia). While I have nothing against FindBugs, I just haven't seen as urgent of a need for it. Most of the bugs I've seen are either due to things that are already covered by properly-configured errors and warnings in the Eclipse compiler, or are functional type issues (not properly following business requirements) or other higher-level issues that not even FindBugs can do a satisfactory job with detecting - but are things that are better caught during a proper code review process and comprehensive unit testing. Additionally, FindBugs only searches against the compiled Java bytecode, so it can't help report the source code level issues, like Checkstyle.

Mark A. Ziesemer

Sunday, August 6, 2017

A faster, "complete" Java HttpServletRequest.getRequestUrl() replacement

MarkUtils-CLI: Annotations (and more) for Apache Commons CLI

classParser

cli

CliRunner

Download

Monday, May 29, 2017

MarkUtils Refresh, New Hosting

Saturday, February 23, 2013

How to (not) Validate an Email Address

Real-world issues

Saturday, August 11, 2012

parent-updates-maven-plugin

Saturday, March 31, 2012

"Connection Reset" errors, MTU, DHCP, and Time Warner Cable

So long, AT&T DSL

"Connection Reset" errors, MTU, and DHCP with Time Warner Cable

Thursday, February 2, 2012

Prepping Cygwin for a multi-user installation

Sunday, October 30, 2011

Ubuntu Linux 11.10 Install Notes

Saturday, October 1, 2011

Fixing UPnP/DLNA sharing on 2nd drive under Windows 7

Background

UPnP/DLNA issue

My Solution

HP LaserJet PCL Errors and Driver Availability

Saturday, April 16, 2011

Improving code and quality with Checkstyle

About Me

Blog Archive

Labels

Notes

Search

Followers

Mark A. Ziesemer

Sunday, August 6, 2017

classParser

cli

CliRunner

Download

Monday, May 29, 2017

Saturday, February 23, 2013

Real-world issues

Saturday, August 11, 2012

Saturday, March 31, 2012

So long, AT&T DSL

"Connection Reset" errors, MTU, and DHCP with Time Warner Cable

Thursday, February 2, 2012

Sunday, October 30, 2011

Saturday, October 1, 2011

Background

UPnP/DLNA issue

My Solution

Saturday, April 16, 2011

About Me

Blog Archive

Labels

Notes

Search

Followers

Subscribe To