Managing Complexity: Keeping a Large Java Project on Track

by Tom Copeland

Software Projects Need Constant Attention

Managing a large software project is a challenge. Code is constantly being changed: bugs surface and are fixed, branches are created and merged — it's no small task to keep things coherent. It's even trickier if the developers are geographically distributed; if a development team keeps their code away from everyone else for a month or two, it's likely that when they check it in you'll enter integration hell, with painful merges, conflicting interfaces, and a bevy of dependency problems.

There's no silver bullet for this problem. However, one practice that may help keep things under control is continuous integration. That sounds nice, since the software has to be integrated at some point — but what does continuous integration mean practically, on a day-to-day basis?

UltraLog is a large Defense Advanced Research Projects Agency (DARPA) project. The purpose of the project is to "extend the open source COUGAAR cognitive agent architecture using a layered, integrated approach with technologies in robustness, security, stability, and scalability." More importantly for this article's purposes, the UltraLog project is written in Java, by developers from over a dozen companies distributed around the United States. We needed something to help avoid integration problems; we needed a status page. So we put together the "Dashboard."

Figure 1. The Dashboard

The Dashboard — What's Not on There?

There's lots of stuff related to the UltraLog project that, while important in its own right, doesn't make it onto the Dashboard.

The Dashboard — What's on There?

So, what does the dashboard show?

That's an overview. But if you've put together an hourly build, you know that there are a lot of details involved. Let's look at some of the technical issues we encountered while hooking things together.

Ruby and Ant

First of all, the whole process is driven by Ruby scripts and a Jakarta Ant build template. There's no room here to discuss those products, so I'll summarize by saying that Ruby is an excellent open source scripting language and that Ant is an excellent Java build tool.

The most important item on the Dashboard is compilation success/failure. If we can't compile the code, we can't do much else. So compilation status is displayed in color — a strategy that has added phrases like "[some project] is back in the green" to our lexicon. This allows someone to come to the Dashboard and see at a glance whose code is not integrating cleanly. It's an excellent motivational tool.

Technically, the Ant javac task takes care of the compilation. Once it's done and the report is written to an XML file, the Ruby script that drives the process parses that report, determines success or failure, and counts the number of deprecated methods. This allows a developer to see that while the code may have been compiled, there may be newer methods to use. It helps to reduce the integration load when new releases occur.
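As a rough illustration of that parsing step, here is a minimal Ruby sketch. It assumes the report has the shape written by Ant's XML logger: a build root element that carries an error attribute only when the build fails, with compiler output in nested message elements. The compile_status method and the "deprecat" text match are hypothetical, not the Dashboard's actual code.

```ruby
require 'rexml/document'

# Hypothetical helper: summarizes an Ant XML build log.
# Assumes a <build> root that has an "error" attribute only on failure,
# and compiler output captured in nested <message> elements.
def compile_status(xml_text)
  doc = REXML::Document.new(xml_text)
  failed = !doc.root.attributes["error"].nil?
  deprecations = 0
  doc.elements.each("//message") do |msg|
    # Count each message that mentions a deprecation once
    deprecations += 1 if msg.text.to_s =~ /deprecat/i
  end
  { :success => !failed, :deprecations => deprecations }
end

# Inlined sample log; a real script would read the file Ant wrote.
sample = <<XML
<build time="2 seconds">
  <target name="compile">
    <task name="javac">
      <message priority="warn">Foo.java:12: warning: [deprecation] bar() in Baz has been deprecated</message>
      <message priority="info">Compiling 3 source files</message>
    </task>
  </target>
</build>
XML

status = compile_status(sample)
puts status[:success] ? "green" : "red"
puts "deprecated calls: #{status[:deprecations]}"
```

Returning a small hash keeps the color decision and the deprecation count together, so the page template can use both.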


Ikko is a little templating engine written in Ruby. It's fine for small projects that don't get heavy traffic. In the case of the Dashboard, the web page gets rebuilt once an hour, so template caching is not a requirement.

You can see examples of Ikko's simple operation on the Ikko home page. Here's an example of loading a file and plugging in a couple values:

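require 'ikko'
# The line that creates fm is missing here; Ikko::FragmentManager is assumed
fm = Ikko::FragmentManager.new
puts fm["people.html", {"name"=>"Fred", "age"=>"25"}]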

Here's the HTML file for the above snippet:

<!--Fragment key="name"--> is <!--Fragment key="age"--> years old.

Specify a file name and a Ruby Hash object and you have templated HTML.
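To make the substitution concrete, the following sketch mimics the behavior in plain Ruby. It is not Ikko's implementation; the fill_fragments method is a hypothetical stand-in that handles only the Fragment markers shown above.

```ruby
# Hypothetical stand-in for a fragment-based template engine (not Ikko itself).
# Replaces each <!--Fragment key="..."--> marker with the matching hash value;
# markers with no matching key are replaced with an empty string.
def fill_fragments(template, values)
  template.gsub(/<!--Fragment key="([^"]+)"-->/) { values[$1].to_s }
end

template = '<!--Fragment key="name"--> is <!--Fragment key="age"--> years old.'
puts fill_fragments(template, {"name"=>"Fred", "age"=>"25"})
# prints: Fred is 25 years old.
```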


JavaNCSS provides a command-line interface and an XML output format. We used the Unix find utility to gather up a list of files:

$ find . -name '*.java' > files.txt

and then executed JavaNCSS with the XML flag:

$ javancss -xml @files.txt > report.xml

With the help of the Ruby REXML library, the result can be parsed in a line of Ruby:

require 'rexml/document'
ncss = REXML::Document.new(File.new("report.xml")).elements["//ncss"].text

and then it's plugged into an HTML template for display in the final report page.

This process is representative of how the other items are handled. The Ruby script invokes an Ant target or a command-line tool, a report is generated, and the Ruby script parses the result and plugs it into the HTML page.
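The pattern can be sketched as a small table of steps, each pairing a label with a parser for its report. Everything named here (Step, render_rows, the sample report text and its javancss root element) is an assumption for illustration rather than the Dashboard's actual code.

```ruby
require 'rexml/document'

# Hypothetical description of one Dashboard item: a label plus a
# callable that extracts a display value from the raw report text.
Step = Struct.new(:label, :parser)

# Builds one HTML table row per step from already-generated reports.
def render_rows(steps, reports)
  steps.map do |step|
    value = step.parser.call(reports.fetch(step.label))
    "<tr><td>#{step.label}</td><td>#{value}</td></tr>"
  end.join("\n")
end

steps = [
  Step.new("NCSS", lambda { |xml| REXML::Document.new(xml).elements["//ncss"].text }),
  Step.new("Tests run", lambda { |txt| txt[/Tests run: (\d+)/, 1] })
]

# In a real build these strings would be read from the files that the
# Ant targets and command-line tools wrote; here they are inlined samples.
reports = {
  "NCSS" => "<javancss><ncss>1234</ncss></javancss>",
  "Tests run" => "Tests run: 12, Failures: 0, Errors: 0"
}

puts render_rows(steps, reports)
```

Keeping the parsers in a list means a new Dashboard item needs only one more entry, not another branch in the driver script.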

Recent CVS Activity

Showing recent CVS history proved a bit tricky. To display commits by branch, we had to use an open source Perl script that wraps the output of the CVS log command. We then wrapped that in a homegrown Ruby CGI script that renders the recent commit history as HTML.
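The rendering half of that job might look something like the sketch below. The commit records, branch names, and method names are all hypothetical; the real script works from the wrapped CVS log output, whose format isn't shown here.

```ruby
# Hypothetical commit records; a real script would build these from
# the wrapped "cvs log" output, whose exact format is not shown here.
commits = [
  { :branch => "HEAD",  :author => "alice", :message => "Fix <init> ordering" },
  { :branch => "B10_2", :author => "bob",   :message => "Merge from HEAD" },
  { :branch => "HEAD",  :author => "carol", :message => "Add tests" }
]

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }

# Escapes text so that log messages cannot break the page markup.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Groups commits by branch and renders one HTML list per branch.
def commits_to_html(commits)
  commits.group_by { |c| c[:branch] }.map do |branch, list|
    items = list.map do |c|
      "<li>#{escape_html(c[:author])}: #{escape_html(c[:message])}</li>"
    end
    "<h3>#{escape_html(branch)}</h3>\n<ul>\n#{items.join("\n")}\n</ul>"
  end.join("\n")
end

puts commits_to_html(commits)
```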

CVS Charts and Graphs

Since we use CVS for revision tracking, there are numerous open source tools available to create reports. We use the StatCVS tool to generate charts and graphs of CVS history. Again, we use a small Ruby script to drive the report generation. Here's the line of code that runs StatCVS itself:

$ java -jar statcvs.jar -output-dir path/to/html/dir/ \
	project_name project_module/cvslog project_module

Since StatCVS comes in one .jar file, there are no dependencies to track. We run this report nightly, since it takes about 20 minutes to run on all our repositories.
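A driver script could assemble that command along the following lines. This is only a sketch: statcvs_command is a hypothetical helper, the argument order simply mirrors the command shown above, and the jar location and directory names are placeholders.

```ruby
# Hypothetical helper that assembles the StatCVS command line shown above.
# The argument order mirrors the article's example; the jar location and
# directory names are placeholders.
def statcvs_command(project_name, module_dir, output_dir)
  [
    "java", "-jar", "statcvs.jar",
    "-output-dir", output_dir,
    project_name,
    File.join(module_dir, "cvslog"),
    module_dir
  ]
end

cmd = statcvs_command("project_name", "project_module", "path/to/html/dir/")
puts cmd.join(" ")
# A driver script would then run it without a shell, for example:
#   system(*cmd) or warn "StatCVS failed for project_name"
```

Building the command as an array and passing it to system avoids shell quoting problems when a module path contains spaces.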

Coding Guidelines

PMD is a Java static analysis tool that checks for unused code, empty catch blocks, and so forth. We run a subset of the standard PMD rules, and we've also written a couple of custom rules to check for Thread creation, Socket creation, and various other coding practices that are not appropriate for this project. The documentation for the PMD Ant task is straightforward, but one thing we found helpful was to always delete the report file from the previous hour before generating a new one. That way, if the code being checked goes from five errors to zero errors and no new file is generated, the previous file won't linger around.
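The delete-first idea can be sketched in a few lines of Ruby. The fresh_report method is hypothetical, and its block stands in for invoking the PMD Ant task, which may or may not write a new report in a given hour.

```ruby
require 'tmpdir'

# Hypothetical wrapper around one hourly PMD run. The block stands in for
# invoking the PMD Ant task; it may or may not write a new report file.
def fresh_report(path)
  File.delete(path) if File.exist?(path)     # drop last hour's report first
  yield path
  File.exist?(path) ? File.read(path) : nil  # nil means no report this hour
end

Dir.mktmpdir do |dir|
  report = File.join(dir, "pmd_report.html")
  File.write(report, "<td >stale violation</td>")  # left over from last hour

  # This hour the run writes nothing, so the stale file must not be reused.
  result = fresh_report(report) { |p| }
  puts result.nil? ? "no violations reported" : "report found"
end
```

With the stale file gone, a missing report means "nothing to show" and can't be mistaken for last hour's violations.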

The Dashboard Ruby script then parses the PMD HTML report and determines the number of errors by simply counting the number of rows, as illustrated in this snippet:

count=0
# Split the report at each "<td "; a violation row has four cells
File.new("pmd_report.html").each("<td ") {|x| count += 1}
ruleViolations=(count/4) unless count==0

This result is then displayed on the front page and hyperlinked to the full report.

Duplicate Code

CPD is a Java duplicated-code checker that comes bundled with PMD. We run CPD to check for sequences of more than one hundred duplicate tokens — quite a few, considering that CPD discards whitespace, comments, and various uninteresting sequences like import and package statements. Since CPD has an Ant task, integrating it into the build was similar to integrating PMD.

Note that several articles have been published on both PMD and CPD, so there's a lot of information out there on both tools.

JUnit Test Results

JUnit is a popular Java unit testing tool. Some of the developers have begun to write JUnit tests for their code. To encourage this, we run those tests and post the results on the Dashboard. In order to standardize a bit, all tests are to be named by appending Test to the class name (e.g., FooTest), and placed in a separate, parallel directory tree. This lets the Ant task easily find the tests, and it keeps test code separate from the production code. After the tests are run and the results sent to an XML file via the JUnit Ant task's <formatter type="xml"/> element, the Ruby script parses out the number of tests passed/failed:

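require 'rexml/document'
# Sum the counts from log lines like "Tests run: 3, Failures: 0, Errors: 0"
def parseJunitFile(filename, result)
  doc = REXML::Document.new(File.new(filename))
  doc.elements.each("build/target/task/message[@priority='info']") do |info|
    if info.text =~ /Tests run: /
      tmp = info.text.split
      result.testsTotal=result.testsTotal.to_i + tmp[2].to_i
      result.testsFailed=result.testsFailed.to_i + tmp[4].to_i + tmp[6].to_i
    end
  end
end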

This allows the totals to be displayed neatly on the Dashboard.
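The naming convention described above is mechanical enough to express in code. In this sketch the src and test root directory names are assumptions, and test_path_for is a hypothetical helper that maps a production class to the test file the convention implies.

```ruby
# Hypothetical layout: production code under "src", tests under "test".
# Maps a production source file to the test file the convention implies.
def test_path_for(source_path, src_root = "src", test_root = "test")
  relative = source_path.sub(/\A#{Regexp.escape(src_root)}\//, "")
  dir, file = File.split(relative)
  test_file = file.sub(/\.java\z/, "Test.java")
  dir == "." ? File.join(test_root, test_file) : File.join(test_root, dir, test_file)
end

puts test_path_for("src/org/example/Foo.java")
# prints: test/org/example/FooTest.java
```

A check like this could also flag production classes that have no matching test file yet.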


Generating Javadocs is also a straightforward operation with Ant. It's a fairly time-consuming task, though, so we only run it every four hours. Note that Javadocs can consume a considerable amount of disk space; all of the Javadocs on the Dashboard together take up around 500 MB.
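Because different steps run at different intervals, the driver needs some notion of a schedule. Here is one minimal way to sketch it. The four-hour Javadoc and nightly StatCVS intervals come from the text, while the step names, the SCHEDULE table, and the choice of midnight for the nightly run are assumptions.

```ruby
# Hypothetical schedule table: how often each Dashboard step runs, in hours.
# The four-hour Javadoc and nightly StatCVS intervals come from the text;
# the step names and the choice of midnight for "nightly" are assumptions.
SCHEDULE = {
  :compile => 1,
  :javadoc => 4,
  :statcvs => 24
}

# Returns the steps due at the given hour of the day (0-23).
def steps_due(hour)
  SCHEDULE.select { |_step, interval| hour % interval == 0 }.keys
end

p steps_due(0)   # all three steps
p steps_due(4)   # compile and javadoc
p steps_due(5)   # compile only
```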

Hourly .jar Files of Source and Classfiles

We've found it handy to build an hourly drop of the class files and source files in case someone wants to browse or run the latest code without checking it out and compiling it. Since the code has to be compiled anyway, creating these .jars is a simple matter of using the Ant zip and jar tasks:

<target name="srczip" if="">
  <delete file="${}"/>
  <zip destfile="${}" basedir="${}"/>
</target>
<target name="jar" depends="compile">
  <jar jarfile="${}" basedir="${buildDir}"/>
  <signjar jar="${}" keystore="/var/build/signingCA_keystore"
                   alias="privileged" storepass="keystore"/>
</target>

Note the dependency in the jar target; there's no need to attempt to jar things up if the compilation step fails.

Future Plans

What else could be added to the hourly build page? In some projects, a test coverage report (a report on the percentage of the code that the unit tests actually cover) has been found useful. Several tools exist to provide such a report — Clover comes to mind. Of course, such a report isn't very useful unless a decent number of unit tests have been written.

Folks who are familiar with the Jakarta open source build tool Maven may notice some similarities. It might be possible to use Maven to do some of the things the Dashboard does, but Maven was not very far along when we first began putting the Dashboard together. It might be worth revisiting Maven to see if that's possible now.


We've discussed some things that can make a large Java project hard to manage. We've looked at one large Java project, UltraLog, and how an hourly build status page helped keep things under control. We've also done a quick overview of some open source tools that you may find to be a useful part of your hourly build page. Give them a try!


Thanks to all of the folks who have donated their time and energy towards the various open source tools mentioned in this article.


Tom Copeland started programming on a TRS-80 Model III, but demand for that skill has waned and he now programs mostly in Java and Ruby.

Copyright © 2017 O'Reilly Media, Inc.