The Trouble with Searching for Open Source Code
by Christopher Diggins
Related link: http://www.artima.com/weblogs/viewpost.jsp?thread=134186
Here is a reprint of my blog entry at Artima.com:
I frequently encounter open-source code that reimplements code that already exists elsewhere (and usually does so badly). When everyone is busy reinventing the wheel, no one has the time to build a cart.
Even though some developers are guilty of simply not doing research, part of the problem is that finding open-source code for a particular purpose is hard. Search engines are well suited for finding text, but not source code. This is because:
- Source code files are rarely published directly on the web; they are usually distributed inside compressed packages.
- Documentation and source code are often kept separate, and crawlers have trouble linking the documentation to the source code it describes.
- Comments in source code are treated with the same priority as function and variable names, so neither is indexed with the weight it deserves.
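To make the first and third points concrete, here is a minimal sketch of what a code-aware crawler would have to do that a plain text crawler does not: look inside a compressed package, and tell identifiers apart from comments so they can be weighted differently. It only handles Python files in a gzip-compressed tar archive, and the function names (`split_tokens`, `index_package`, `score`, `make_sample_package`) and the weights are assumptions made up for illustration, not a description of how any real search engine works.

```python
import io
import keyword
import tarfile
import tokenize
from collections import Counter


def split_tokens(source):
    """Return (identifiers, comment_words) counted from Python source text."""
    identifiers, comment_words = Counter(), Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            identifiers[tok.string.lower()] += 1
        elif tok.type == tokenize.COMMENT:
            for word in tok.string.lstrip("#").split():
                word = word.lower().strip(".,:;()")
                if word:
                    comment_words[word] += 1
    return identifiers, comment_words


def index_package(archive_bytes):
    """Index every .py member of a gzip-compressed tar archive held in memory."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".py"):
                text = tar.extractfile(member).read().decode("utf-8")
                index[member.name] = split_tokens(text)
    return index


def score(index, term, name_weight=3.0, comment_weight=1.0):
    """Rank files for a term. The weights are arbitrary illustrative values."""
    term = term.lower()
    scores = {}
    for path, (names, comments) in index.items():
        value = name_weight * names[term] + comment_weight * comments[term]
        if value:
            scores[path] = value
    return sorted(scores.items(), key=lambda item: -item[1])


def make_sample_package(files):
    """Build a small .tar.gz in memory from a {path: source_text} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    package = make_sample_package({
        "pkg/sorting.py": "def quicksort(items):\n    # sort the items\n    return sorted(items)\n",
        "pkg/notes.py": "# quicksort is discussed here\nvalue = 1\n",
    })
    print(score(index_package(package), "quicksort"))
```

With these made-up weights, the file that actually defines `quicksort` ranks above the file that merely mentions it in a comment. A crawler that treats both files as flat text has no basis for making that distinction.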
So how does this get solved? Well, I can see two ways:
- Search engines start applying specialized techniques for parsing and indexing source code.
- Open-source developers come up with a new, standardized, language-independent format for distributing source code (perhaps Open-Source-XML?).
I think either (or both) of these approaches could have a significant impact on moving software technology forward.
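As a rough illustration of the second idea, the sketch below turns a Python module into an XML description that a search engine could index without knowing the language's syntax. No "Open-Source-XML" schema exists; the element and attribute names used here (`module`, `function`, `param`, `doc`, `body`) and the helper `describe_module` are purely hypothetical, and a real format would need to cover far more than top-level functions.

```python
import ast
import xml.etree.ElementTree as ET


def describe_module(name, source):
    """Build a hypothetical XML description of a Python module's top-level functions."""
    tree = ast.parse(source)
    module = ET.Element("module", {"name": name, "language": "python"})

    module_doc = ast.get_docstring(tree)
    if module_doc:
        ET.SubElement(module, "doc").text = module_doc

    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            function = ET.SubElement(module, "function", {"name": node.name})
            for arg in node.args.args:
                ET.SubElement(function, "param", {"name": arg.arg})
            function_doc = ast.get_docstring(node)
            if function_doc:
                # Documentation travels with the code it describes.
                ET.SubElement(function, "doc").text = function_doc
            # Keep the original source text so nothing is lost in distribution.
            ET.SubElement(function, "body").text = ast.get_source_segment(source, node)
    return module


if __name__ == "__main__":
    sample = (
        '"""Sorting helpers."""\n'
        "\n"
        "def quicksort(items):\n"
        '    """Sort a list."""\n'
        "    return sorted(items)\n"
    )
    print(ET.tostring(describe_module("sorting", sample), encoding="unicode"))
```

Because names, parameters, and documentation each get their own element, an indexer could weight them separately, and the documentation stays attached to the code it describes.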
How can we improve searching for source-code?
2. Browse repositories online.