OsXml: Publishing Open-Source Code

by Christopher Diggins

Related link: http://www.osxml.org

If you have used open-source code as much as I have, you have probably come to the same realization: the wheel has been reinvented thousands of times, and it is rarely round.

Colorful analogies aside, my point is that most, but not all, open-source code sucks. The majority of open-source code is untested, unproven, and redundant. I would estimate from my own experience that for every line of good open-source code there are nine lines of bad copies of the original. Often these copies occur coincidentally because the coders don't know the originals exist, or because the programmers don't understand the issues that went into making the mature, stable code the way it is today. Mature code is almost never as pretty as naive first versions.

I propose that by using a standardized method of publishing open-source code, such as through an XML schema, it can become easier for people to publish, find, evaluate, and use open-source code. I believe this would reduce redundancy and improve open-source quality.

My suggestion is a format called OsXml, which is in a rough first-draft stage. I would appreciate some help in trying to make this format viable.

Do you think OsXml, or something like it, could help improve open-source quality? What else can be done?


2005-11-08 08:29:39
Are you familiar with koders.com? It indexes and makes searchable a ton of open source code, including the stuff at sf.net.
2005-11-08 09:13:11
This looks very similar to DOAP (http://usefulinc.com/doap). What differences and similarities does OsXml have to DOAP? In what circumstances would I use one instead of the other? Do we need both?
2005-11-08 09:44:19
Are you perhaps suggesting that the author is recursively making the same mistake he's trying to fix -- reinventing the wheel?

... a new, redundant XML format -- "Often these copies occur coincidentally because the coders don't know the originals exist."


2005-11-08 13:15:27
DOAP is intended to describe entire packages, whereas OsXml is intended for individual source files. There is also an issue of complexity: DOAP parsing is a nightmare, whereas OsXml is far simpler, using an attributeless subset of XML (XML--).
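
To make the "far simpler" claim concrete, here is a minimal sketch of a parser for an attribute-free XML subset. The exact rules of XML-- are not spelled out in this thread, so the grammar below is an assumption (plain open tags, close tags, and text; no attributes, comments, or entities), and the element names in the sample are hypothetical rather than taken from the OsXml draft.

```python
import re

# Assumed grammar (not the official XML-- rules): <name>, </name>, and text only.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def parse(text):
    """Parse attribute-free markup into nested (name, children) tuples.

    Text nodes are stored as stripped strings; whitespace-only text is dropped.
    """
    root = ("", [])
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # Anything else (attributes, comments, ...) is outside the subset.
            raise ValueError("unsupported markup at offset %d" % pos)
        closing, name, chars = m.groups()
        if chars is not None:
            if chars.strip():
                stack[-1][1].append(chars.strip())
        elif closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError("unexpected </%s>" % name)
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unclosed <%s>" % stack[-1][0])
    return root[1]


# Hypothetical per-file metadata; these element names are illustrative only.
sample = "<file><name>stack.hpp</name><license>public domain</license></file>"
doc = parse(sample)
```

The whole parser is one regular expression and a stack, which is the kind of footprint the comment above is arguing for. It rejects anything outside the assumed subset instead of trying to recover.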
2005-11-08 13:19:02
Koders is good, but the technology could be much better. If there were more structure to how source files are published, searching and evaluating could be done more effectively. Right now the big problem is that documentation and source are separate.
2005-11-09 07:37:17
"DOAP parsing is a nightmare". This is a joke, right? Get better tools rather than creating worse XML formats.

Getting a DOAP home page using Amara:

print doap_doc.Person.homepage.resource

Or using plain 4Suite:

print doap_doc.xpath(u'd:Person/d:homepage/@rdf:resource')

Where's the nightmare? In addition, DOAP allows me to be reasonably internationalized. Getting the French description of the project:

print doap_doc.xpath(u'string(d:Person/d:shortdesc[@xml:lang="fr"])')

As for "OsXml is intended for individual source files", I guess I'll worry about that when I care about that, but personally I think that such files are too closely tied to their context within a project to need individual "publishing". I can understand publishing code recipes, but that's a different matter, and DOAP is fine for that.
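
For readers without Amara or 4Suite installed, roughly the same two lookups can be sketched with Python's standard library alone. The sample document below is made up for illustration, and it hangs `homepage` and `shortdesc` off a DOAP `Project` element, which is an assumption about the document shape and not the exact paths used in the snippets above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for DOAP and RDF; the prefixes "d" and "rdf" are local choices.
NS = {
    "d": "http://usefulinc.com/ns/doap#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical DOAP file, invented for this example.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <Project>
    <homepage rdf:resource="http://example.org/"/>
    <shortdesc xml:lang="en">An example project</shortdesc>
    <shortdesc xml:lang="fr">Un projet d'exemple</shortdesc>
  </Project>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
project = root.find("d:Project", NS)

# Home page: read the rdf:resource attribute of <homepage>.
homepage = project.find("d:homepage", NS).get("{%s}resource" % NS["rdf"])

# French short description: filter <shortdesc> elements by xml:lang.
french = [
    e.text for e in project.findall("d:shortdesc", NS)
    if e.get(XML_LANG) == "fr"
]
```

This still depends on a full XML parser underneath, so it speaks to the convenience of the tooling and not to the parser-size objection raised elsewhere in the thread.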

2005-11-09 08:39:39
""DOAP parsing is a nightmare". This is a joke, right? Get better tools rather than creating worse XML formats."

What is so much better about using slow, complicated, and memory-intensive components? Obviously the components are not of poor quality, but the XML specification is schizophrenic and horrendously complex; a true parser is necessarily much slower than an XML-- parser. Or have you never written an XML parser? How many lines of code are the Amara and 4Suite components? I am not going to want to embed Amara and a Python binding library into a C++ application just to parse a little bit of trivial meta-information, which happens to be needlessly complicated. Finally, what is "worse" about using a strict subset of XML which is much easier to parse?