The Trouble with Searching for Open Source Code

by Christopher Diggins

Related link:

Here is a reprint of my blog entry at

I frequently encounter open-source code that reimplements code that already exists elsewhere (and usually does so badly). When everyone is busy reinventing the wheel, no one has the time to build a cart.

Even though some developers are guilty of simply not doing research, part of the problem is that finding open-source code for a particular purpose is hard. Search engines are well suited for finding text, but not source code. This is because:

  • Source code files are seldom published directly on the web; they are usually distributed inside compressed packages.

  • Documentation and source code are often kept apart, and robots have trouble linking a piece of documentation to the source code it describes.

  • Comments in source code are given the same priority as function and variable names, so neither is weighted appropriately in the index.
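The first point is easy to see in miniature. Here is a minimal sketch, assuming a package shipped as a zip archive, of what an indexer would have to do before it could read any source at all. The function names and the list of file suffixes are hypothetical, chosen only for illustration.

```python
import io
import zipfile

# Assumed list of suffixes that count as "source code" for this sketch.
SOURCE_SUFFIXES = (".py", ".c", ".h", ".cpp", ".java")


def source_files_in_zip(data: bytes) -> dict:
    """Return {member name: decoded text} for source-like members of a zip archive."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith(SOURCE_SUFFIXES):
                # Decode leniently: real packages mix encodings.
                found[name] = archive.read(name).decode("utf-8", errors="replace")
    return found


def make_demo_zip() -> bytes:
    """Build a tiny in-memory package so the sketch runs without network access."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("pkg/README.txt", "A demo package.")
        archive.writestr("pkg/wheel.py", "def spin():\n    return 'round'\n")
    return buf.getvalue()


files = source_files_in_zip(make_demo_zip())
print(sorted(files))
```

A general-purpose crawler that stops at the download link never sees `pkg/wheel.py`; only a crawler that unpacks the archive can index it.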

So how does this get solved? Well, I can see two ways:

  1. Search engines start applying specialized techniques for parsing and indexing source code.

  2. Open-source developers come up with a new, standardized, language-independent format for distributing source code (perhaps Open-Source-XML?).
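To make the first option a little more concrete, here is a rough sketch of source-aware indexing. It assumes Python source files and uses the standard `ast` and `tokenize` modules; the weights, the helper names, and the decision to rank identifiers above comments are all assumptions made for illustration, not a description of how any real search engine works.

```python
import ast
import io
import re
import tokenize
from collections import defaultdict

# Hypothetical weights: a match on a definition name counts more than a
# match in a docstring, which counts more than a match in a comment.
WEIGHTS = {"name": 3.0, "doc": 2.0, "comment": 1.0}


def words(text):
    """Split text into lowercase alphabetic words (underscores act as separators)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_terms(source):
    """Yield (word, kind) pairs, where kind is 'name', 'doc', or 'comment'."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for w in words(node.name):
                yield w, "name"
            doc = ast.get_docstring(node)
            if doc:
                for w in words(doc):
                    yield w, "doc"
    # Comments are not part of the syntax tree, so read them from the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            for w in words(tok.string):
                yield w, "comment"


def build_index(files):
    """Map each word to {path: accumulated weight}."""
    index = defaultdict(lambda: defaultdict(float))
    for path, source in files.items():
        for word, kind in extract_terms(source):
            index[word][path] += WEIGHTS[kind]
    return index


def search(index, query):
    """Return paths ordered by total weight of the query words they contain."""
    scores = defaultdict(float)
    for word in words(query):
        for path, weight in index.get(word, {}).items():
            scores[path] += weight
    return sorted(scores, key=scores.get, reverse=True)


demo_files = {
    "a.py": 'def parse_config(path):\n    """Read settings."""\n    return path\n',
    "b.py": "# TODO: parse config later\nx = 1\n",
}
demo_index = build_index(demo_files)
print(search(demo_index, "parse config"))
```

With these assumed weights, a file that defines `parse_config` ranks above a file that merely mentions "parse config" in a comment, which is the kind of distinction a plain-text index cannot make.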

I think either (or both) of these approaches could have a significant impact on moving software technology forward.

How can we improve searching for source-code?


2005-10-28 07:34:18
What trouble?
2. Browse repositories online.

No offense, but where did the confusion come from again?

(Don't hate me if this gets double-posted. When I hit "Submit" it just brought up the article again without any indication of what happened...)