Detecting Duplicate Code

by Dion Almaer

Tom Copeland has written a nice piece on the open source CPD (copy/paste detector). Having tools like this can really help out, and seeing the amount of copy/paste in the JDK source itself is scary.
However, what do I *really* want tools to find out...

I want them to go above and beyond "this code exists in two places". I want them to tell me "This piece of functionality has been duplicated". In large projects, many core utilities get rewritten by different people.

For example, when working on a system that interfaced with a COBOL application on some of IBM's big iron, we found that a function that cleared the screen had been rewritten more times than you would believe. Over the many years, when new employees came in, they just wrote their own functions to work with.
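To make the gap concrete, here is a minimal sketch of purely textual detection: hash every run of a few whitespace-normalized lines and report the runs that show up more than once. This is not CPD's actual algorithm, just an illustration, and the file names, the `window` size, and the toy clear-screen functions are all made up for the example.

```python
import hashlib
from collections import defaultdict

def normalize(line):
    # Collapse whitespace so indentation changes do not hide a copy.
    return " ".join(line.split())

def find_copied_blocks(sources, window=3):
    """Map a hash of each repeated run of `window` lines to where it occurs.

    `sources` is a dict of {name: source text}. Positions are indexes into
    the non-blank lines of each source, not original line numbers.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        lines = [normalize(l) for l in text.splitlines()]
        lines = [l for l in lines if l]  # ignore blank lines
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(lines[i:i + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((name, i))
    # Keep only the runs that were seen in more than one place.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}

# Hypothetical inputs: two pasted copies and one independent rewrite.
sources = {
    "a.py": "def clear():\n    for row in screen:\n        row.fill(' ')\n    refresh()\n",
    "b.py": "def wipe():\n  for row in screen:\n    row.fill(' ')\n  refresh()\n",
    "c.py": "def blank():\n    screen[:] = [[' '] * 80 for _ in range(24)]\n    redraw()\n",
}
dupes = find_copied_blocks(sources)
```

The pasted body in `a.py` and `b.py` is caught even though the indentation differs, but `c.py`, which does the same job with different text, goes unnoticed. That is exactly the clear-screen situation above.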

It would be great if you wrote some code for an app, and were told by a program "*ahem*, I know what you are writing here, but just use the functionality that Bob wrote 5 years ago located here", and "Interesting, how about you refactor code X instead of reinventing the wheel there mate".

Now that would be cool!
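One naive way to approximate the "just use what Bob wrote" hint is to compare behavior rather than text: run the new function and every known utility on the same probe inputs and flag the ones that agree. The sketch below assumes small, pure functions with no side effects; `fingerprint`, `suggest_existing`, the probe values, and the example functions are all hypothetical names invented for illustration.

```python
def fingerprint(func, probes):
    """Record what `func` returns (or raises) for each probe input."""
    results = []
    for args in probes:
        try:
            results.append(("ok", repr(func(*args))))
        except Exception as exc:
            results.append(("error", type(exc).__name__))
    return tuple(results)

def suggest_existing(new_func, library, probes):
    """Return names of library functions that behave like `new_func` on the probes."""
    target = fingerprint(new_func, probes)
    return [name for name, f in library.items() if fingerprint(f, probes) == target]

# Hypothetical existing utilities.
def bobs_blank_line(width):
    return " " * width

def dashes(width):
    return "-" * width

# The function someone is about to write again.
def pad(width):
    return "".join(" " for _ in range(width))

library = {"bobs_blank_line": bobs_blank_line, "dashes": dashes}
probes = [(0,), (1,), (80,)]
matches = suggest_existing(pad, library, probes)
```

Here `matches` names `bobs_blank_line` and skips `dashes`. Agreement on a handful of probes is only a hint, not proof of equivalence, and something like a real clear-screen routine works through side effects that this kind of check cannot see.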


2003-03-13 07:14:24
Please, not through a UI agent!
"It appears that you're writing a method..."

"...would you like me to take the NIH scales from your eyes?"

2003-03-13 11:11:05
Behavioral, not Technical?
I'm doubtful that a computer can make the intuitive leap to see that two bits of code are refactorable down to one bit. Would some of the Extreme Programming practices work better than an automated process?

Specifically, with collective code ownership, merciless refactoring, and pair programming, at any point there should be someone who can say, "We're writing code awfully similar to something we already have. As soon as it passes the test, let's see if we can generalize and simplify."

Of course, getting an existing application to the state of simplicity where that feels natural is another trick altogether.

2003-03-15 14:52:14
Tools for Duplicate code detection
There is a product available from a company called Semantic Designs that can detect functional duplication of code (rather than just copy and paste). It's called CloneDR.