Saturday, April 16, 2011

Improving code and quality with Checkstyle

I've always been picky about the quality my code and the code that I work with. For good reason. It makes the code easier to read and work with. Consistency makes the program flow easier to understand, bugs and other potential issues easier to spot, and difference comparisons between files and versions more effective. Sun Microsystems (original designer of the Java platform, now part of Oracle) seems to agree. From their "Code Conventions for the Java Programming Language":

This Code Conventions for the Java Programming Language document contains the standard conventions that we at Sun follow and recommend that others follow. It covers filenames, file organization, indentation, comments, declarations, statements, white space, naming conventions, programming practices and includes a code example.

  • 80% of the lifetime cost of a piece of software goes to maintenance.
  • Hardly any software is maintained for its whole life by the original author.
  • Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.

For years, I haven't really used any special tools to assist with code formatting and standards, other than the Eclipse Java Formatter for automatically and completely reformatting code as needed, and Eclipse's Compiler Errors/Warnings for pointing out potential issues. One limitation with the compiler errors/warnings in Eclipse is that they really don't include any formatting-specific checks.

I've had experience with Checkstyle in the past (official site | Wikipedia). However, I'm not sure how much thought was put into the configuration of the standards that were in place, and combined with a high level of false positives, most of the developers found ourselves working against the tool rather than working with it. Standards are great, but they need to be thought through, and have buy-in from the project team.

Over the past few months, I rediscovered a need for Checkstyle for the group of developers I was working with. We were doing an satisfactory job addressing the compiler errors and warnings, but the code itself was a sore sight to look at. The code formatting was addressed during code reviews, but without having the formatting issues also appear as warnings in the Eclipse "Problems" view as it was being developed, the formatting issues were quickly "out of sight, out of mind", and quickly forgotten by much of the team. (I'm not sure how the group would have managed with a language like Python, where proper indentation of the code is actually part of the language syntax.)

Fortunately, since I last worked with Checkstyle, the tools have only improved. eclipse-cs now integrates seamlessly with both Eclipse and the Checkstyle XML configurations, and is fully configurable to projects on both local and global levels - using either internal, external, remote, or project-relative configurations.

For my own reference, as well as others to benefit from, I'm sharing the Checkstyle configuration I've worked with over the past few months. The surprising thing is how many inconsistencies were reported and that I was able to fix even in my own code after running the Checkstyle validations.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE module PUBLIC "-//Puppy Crawl//DTD Check Configuration 1.3//EN" "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<!--
    This configuration file was written by the eclipse-cs plugin configuration editor
-->
<!--
    Checkstyle-Configuration: MAZ
    Description: none
-->
<module name="Checker">
  <property name="severity" value="warning"/>
  <module name="TreeWalker">
    <property name="tabWidth" value="2"/>
    <module name="JavadocMethod">
      <property name="severity" value="ignore"/>
      <property name="suppressLoadErrors" value="true"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocType">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocVariable">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="JavadocStyle">
      <property name="checkFirstSentence" value="false"/>
    </module>
    <module name="ConstantName"/>
    <module name="LocalFinalVariableName"/>
    <module name="LocalVariableName"/>
    <module name="MemberName"/>
    <module name="MethodName"/>
    <module name="PackageName"/>
    <module name="ParameterName"/>
    <module name="StaticVariableName">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="TypeName"/>
    <module name="AvoidStarImport"/>
    <module name="IllegalImport"/>
    <module name="RedundantImport"/>
    <module name="UnusedImports"/>
    <module name="LineLength">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MethodLength">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="ParameterNumber">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="EmptyForIteratorPad"/>
    <module name="MethodParamPad"/>
    <module name="NoWhitespaceAfter">
      <property name="tokens" value="BNOT,DEC,DOT,INC,LNOT,UNARY_MINUS,UNARY_PLUS"/>
    </module>
    <module name="NoWhitespaceBefore"/>
    <module name="OperatorWrap"/>
    <module name="ParenPad"/>
    <module name="TypecastParenPad"/>
    <module name="WhitespaceAfter">
      <property name="tokens" value="COMMA,SEMI"/>
    </module>
    <module name="WhitespaceAround">
      <property name="tokens" value="BAND,BAND_ASSIGN,BOR,BOR_ASSIGN,BSR,BSR_ASSIGN,BXOR,BXOR_ASSIGN,COLON,DIV,DIV_ASSIGN,EQUAL,GE,GT,LAND,LE,LOR,LT,MINUS,MINUS_ASSIGN,MOD,MOD_ASSIGN,NOT_EQUAL,PLUS,PLUS_ASSIGN,QUESTION,SL,SL_ASSIGN,SR,SR_ASSIGN,STAR,STAR_ASSIGN,TYPE_EXTENSION_AND"/>
    </module>
    <module name="ModifierOrder"/>
    <module name="RedundantModifier"/>
    <module name="AvoidNestedBlocks">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="EmptyBlock">
      <property name="option" value="text"/>
      <property name="tokens" value="LITERAL_DO,LITERAL_ELSE,LITERAL_FINALLY,LITERAL_IF,LITERAL_FOR,LITERAL_TRY,LITERAL_WHILE,STATIC_INIT"/>
    </module>
    <module name="LeftCurly"/>
    <module name="NeedBraces"/>
    <module name="RightCurly"/>
    <module name="AvoidInlineConditionals">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="DoubleCheckedLocking"/>
    <module name="EmptyStatement"/>
    <module name="EqualsHashCode"/>
    <module name="HiddenField">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="IllegalInstantiation"/>
    <module name="InnerAssignment">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MagicNumber">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="MissingSwitchDefault">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="RedundantThrows">
      <property name="suppressLoadErrors" value="true"/>
    </module>
    <module name="SimplifyBooleanExpression"/>
    <module name="SimplifyBooleanReturn"/>
    <module name="DesignForExtension">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="FinalClass"/>
    <module name="HideUtilityClassConstructor">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ: Disabled, until this module provides an option to ignore abstract classes."/>
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="InterfaceIsType"/>
    <module name="VisibilityModifier">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="ArrayTypeStyle"/>
    <module name="FinalParameters">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="TodoComment">
      <property name="severity" value="ignore"/>
      <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
    </module>
    <module name="UpperEll"/>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\t* +\t*\S"/>
      <property name="message" value="Line has leading space characters; indentation should be performed with tabs only."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^[^&quot;]*(catch|for|if|switch|synchronized|while)\s+\("/>
      <property name="message" value="No whitespace after block keyword."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\s*(&quot;([^&quot;])*(?&lt;!\\)&quot;|[^&quot;\t])+\)\s+\{"/>
      <property name="message" value="No whitespace before '{'."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="RegexpSinglelineJava">
      <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
      <property name="format" value="^\s*(&quot;([^&quot;])*(?&lt;!\\)&quot;|[^&quot;\t])+(\t|  )"/>
      <property name="message" value="No tabs or multiple whitespace beyond indentation."/>
      <property name="ignoreComments" value="true"/>
    </module>
    <module name="AnnotationUseStyle">
      <metadata name="net.sf.eclipsecs.core.comment" value="javac currently does not accept a trailing comma (http://bugs.sun.com/view_bug.do?bug_id=6337964)."/>
      <property name="severity" value="error"/>
      <property name="elementStyle" value="ignore"/>
      <property name="closingParens" value="ignore"/>
    </module>
    <module name="MissingDeprecated"/>
    <module name="MissingOverride"/>
  </module>
  <module name="JavadocPackage">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="NewlineAtEndOfFile">
    <property name="lineSeparator" value="lf"/>
  </module>
  <module name="Translation"/>
  <module name="FileLength">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="FileTabCharacter">
    <property name="severity" value="ignore"/>
    <metadata name="net.sf.eclipsecs.core.lastEnabledSeverity" value="inherit"/>
  </module>
  <module name="RegexpSingleline">
    <metadata name="net.sf.eclipsecs.core.comment" value="MAZ"/>
    <property name="format" value="[^\s]+\s+$"/>
    <property name="message" value="Line has trailing whitespace."/>
  </module>
</module>

This configuration is based off of the "Sun Checks" configuration, with a few enhancements, and tweaks for personal preference.

One of the noteworthy settings is "trailingArrayComma" setting under Annotations/Annotation Use Style. Per the Java Language Specification (JLS), a trailing comma in an annotation array should be allowed / ignored. The Eclipse compiler respects this, but attempting to use javac (e.g. as part of a Maven build) with a trailing comma results in a "illegal start of expression" failure, due to an acknowledged bug with javac. Using Checkstyle here helps to ensure consistent, stable builds across compilers.

Additionally, I extended the configuration to perform limited checks on non-Java files as well. First, there is a "exclude from checking" configuration section that includes a "all files types except:" rule that is enabled by default, and set to "java, properties". This filter needs to be expanded to include any file types that should even be considered for processing by Checkstyle. For basic web development, I added "htm, html, js, css, jsp". Next, disable the "Use simple configuration" option, which enables "Advanced - configure file sets to control which files get checked by which configuration". I then configured 3 file sets:

  • Java - Uses the configuration shown above. Includes regular expression matches for "\.java$" and "\.properties$".
  • Web - Provides basic whitespace rules for non-Java files. Includes regular expression matches for each of the "web development" types mentioned above, or "\.(html?|js|jsp|css)$" as a version to accomplish it with one expression.
  • JSP - Provides additional checks specific to the scriptlet tags used in JSP files. Includes a regular expression match for "\.jsp$". Note that the above Web file set is configured to also apply to JSP files.

In each of these configurations are a number of additional regular expression checks, that primarily focus on whitespace rules that aren't currently checked for or configurable by the other Checkstyle rules:

File Set Module Pattern Message
Java RegexpSingleLineJava ^\t* +\t*\S Line has leading space characters; indentation should be performed with tabs only.
Java RegexpSingleLineJava ^\s*("([^"])*(?<!\\)"|[^"\t])+(\t|  ) No tabs or multiple whitespace beyond indentation.
Java RegexpSingleLine [^\s]+\s+$ Line has trailing whitespace.
Java RegexpSingleLineJava ^\s*("([^"])*(?<!\\)"|[^"\t])+\)\s+\{ No whitespace before '{'.
Java RegexpSingleLineJava ^\s*}\s+ No whitespace after '}'.
Java RegexpSingleLineJava ^[^"]*(catch|for|if|switch|synchronized|while)\s+\( No whitespace after block keyword.
Web RegexpSingleLine ^\t* +(?!\*)+\t*\S Line has leading space characters; indentation should be performed with tabs only.
Web RegexpSingleLine ^\s*(?!\*)([^\s"]|"[^"]*(?<!\\)")+(\t|  ) No tabs or multiple whitespace beyond indentation.
Web RegexpSingleLine [^\s]+\s+$ Line has trailing whitespace.
JSP RegexpSingleLine (?<=.+)(?<!%>)<%(?!=|.+;%>) Scriptlet begin tag must be at beginning of line.
JSP RegexpSingleLine <%[^@=](?!;%>)+$ Content after scriptlet begin tag must be on new line.

These rules are not all infallible. They are designed to catch the majority of the targeted issues, but without causing false positives in the process. (For example, ignoring spacing within multi-line comments in JSP files - as Checkstyle's Java interpretation doesn't work in JSP files, and forbidden sequences that are allowed within String constants are a bit tricky to work with.) In order to prevent regressions as the rules are expanded upon, all of these regular expressions are included in a local suite of JUnit tests that I plan to make available at a future date. There are currently 22 tests in 3 test classes (one per file set), executing 33 base comparisons that result in a total of 993 assertions. Recently, some issues with the JSP rules were observed, causing Checkstyle to fail on "minified" JavaScript files that consisted of single lines of 10,000+ characters. Recursion within Java's regular expression support resulted in stack overflow exceptions, until I tweaked the JSP expressions to make use of different zero-width lookahead and lookbehind constructs. A test case on 10,000+ characters in a single line still takes several seconds to execute, but at least it consistently completes without exception now.

After getting the source code and formatting in order (and to not drown developers in too many warnings at once), additional tools can be considered for use, e.g. FindBugs (official site | Wikipedia). While I have nothing against FindBugs, I just haven't seen as urgent of a need for it. Most of the bugs I've seen are either due to things that are already covered by properly-configured errors and warnings in the Eclipse compiler, or are functional type issues (not properly following business requirements) or other higher-level issues that not even FindBugs can do a satisfactory job with detecting - but are things that are better caught during a proper code review process and comprehensive unit testing. Additionally, FindBugs only searches against the compiled Java bytecode, so it can't help report the source code level issues, like Checkstyle.

No comments: