Duplicate code

Duplicate code detection finds code that was produced by copy-and-paste programming. Duplicate code typically leads to higher maintenance costs because each bug must be fixed in every copy, more code needs to be tested, and so on.

There are many trade-offs when writing a duplicate code detection tool. Some of the conflicting goals are:

  • Fast
  • Low memory usage
  • Avoid false alarms
  • Support multiple/arbitrary languages (Java, JSP, C++, ...)
  • Support fuzzy matches (comments, whitespace, line breaks, variable renaming, etc.)

The check provided here, StrictDuplicateCode, is fast enough to check very large code bases in acceptable time (minutes). It consumes very little memory, and false alarms are impossible. It supports multiple languages, but it does not support fuzzy matches (which is why it is called Strict).

Note that there are brilliant commercial implementations of duplicate code detection tools. One that is particularly noteworthy is Simian from RedHill Consulting, Inc. Simian has managed to find a very good balance of the above trade-offs. It is superior to the checks in this package in many respects. Simian is reasonably priced (free for noncommercial projects) and includes a Checkstyle plugin.

The following table summarizes the characteristics of the available Checkstyle plugins for duplicate code detection:

Name                 Speed      Memory Usage  False Alarms                Supported languages                          Fuzzy matches
StrictDuplicateCode  High       Very Low      Impossible                  any language                                 No
Simian               Very high  Low           Possible but very unlikely  many languages, including Java and C/C++/C#  Limited support

We encourage all users of Checkstyle to evaluate Simian as an alternative to the Checks we offer in our distribution.

StrictDuplicateCode

Performs a line-by-line comparison of all code lines and reports duplicate code if two sequences of lines differ only in indentation. All import statements in Java code are ignored. Every other line (including javadoc, whitespace lines between methods, etc.) is considered, which is why the check is called strict.
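The comparison described above can be illustrated with a minimal Java sketch. It is not the check's actual implementation: the class name StrictDuplicateSketch, the method names, and the quadratic table-based matching are assumptions made for illustration only, and treating an import line as a line that never matches is just one possible reading of "ignored". The sketch strips leading whitespace from each line, then reports every maximal run of at least min identical lines shared by two files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code used by StrictDuplicateCode.
class StrictDuplicateSketch {

    // Strips indentation. Returns null for Java import lines, which this
    // sketch treats as lines that never match anything (an assumption).
    public static String normalize(String line) {
        String stripped = line.replaceFirst("^\\s+", "");
        return stripped.startsWith("import ") ? null : stripped;
    }

    // Returns one {startA, startB, length} triple (0-based line indices)
    // for every maximal run of at least min equal lines.
    public static List<int[]> findDuplicates(List<String> a, List<String> b, int min) {
        int n = a.size();
        int m = b.size();
        // run[i][j] = length of the matching run ending at a[i-1] and b[j-1]
        int[][] run = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            String x = normalize(a.get(i - 1));
            for (int j = 1; j <= m; j++) {
                String y = normalize(b.get(j - 1));
                if (x != null && x.equals(y)) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                }
            }
        }
        List<int[]> result = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int len = run[i][j];
                // A run is maximal when it cannot be extended by one more line.
                boolean ends = i == n || j == m || run[i + 1][j + 1] == 0;
                if (ends && len >= min) {
                    result.add(new int[] {i - len, j - len, len});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList(
            "import java.util.List;", "int x = 1;", "  int y = 2;", "return x + y;");
        List<String> second = Arrays.asList(
            "import java.util.List;", "    int x = 1;", "int y = 2;", "return x + y;", "// end");
        for (int[] d : findDuplicates(first, second, 3)) {
            System.out.println("duplicate of " + d[2] + " lines at " + d[0] + " and " + d[1]);
        }
    }
}
```

In this sketch blank lines normalize to the empty string and therefore count toward a run, which mirrors the statement that whitespace lines between methods are considered. A table of size n by m is far too expensive for very large code bases, so a practical tool would need a cheaper strategy such as comparing per-line checksums.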

Properties

name            description                                                type        default value
min             how many lines must be equal to be considered a duplicate  int         12
fileExtensions  file type extension of files to process                    String Set  {}

Examples

To configure the check:

 <module name="StrictDuplicateCode"/>
 

To configure the check so that it allows larger equivalent blocks:

 <module name="StrictDuplicateCode">
   <property name="min" value="15"/>
 </module>
 

Package

com.puppycrawl.tools.checkstyle.checks.duplicates

Parent Module

Checker