Applying Distributed XML to The Open Source Paradigm Shift

by Steve Mallett

Tim O'Reilly has written and spoken often on what he calls “The Open Source Paradigm Shift”. I've heard Tim give this speech a few times, and read it a few times to boot. The one major point that sticks with me is that the software we “use” is no longer just what sits on our desktops and laptops, but also the software of the internet that we use every day, à la Google, eBay, and Amazon, to use his prime examples.

Tim goes on to point out that this software, which exists only through our browsers or APIs, doesn't play by the same rules as software that we download and use on our own machines. If I download the source code to the Apache HTTP server I can then compile it and use it in accordance with its open source license. This does not apply to a Google or an eBay. Even if you could download the code that runs Google, you couldn't just stick it in your home directory and start it up... there's no value there. It's not the same thing at all. It's Infoware.

This is the point in Tim's speech where the brakes go on for me. For me open source is two things: one practical, the other touchy-feely. The first is that open source gives me a practical benefit in that it works better. The other is the trust it gives me. The code is open, and it can be forked at will when someone does something evil. Those two characteristics in combination make my wheels turn for open source software. So, what happens when the software I depend on slowly shifts to Infoware that I can never really touch and that, while still immediately practical, gives me no assurance that it can't be taken away or misused at will, with no recourse available to me?

I think we can apply the same principles to the data as we have to the source code. Google, eBay, Amazon, et al. are really only as useful as we allow them to be through the information we give them. We still hold the cards here which means we have options.

My proposed solution is based on the backlash against social networking sites and on some XML-based projects I follow. Social networking sites, like Friendster, Orkut, etc., are really the ultimate in Infoware. There is no value whatsoever in the sites without the data we supply. In this case it is our network of acquaintances... our friends in XML.

When the first social networking site came out we all saw some value in it. It genuinely would be helpful to be able to reach out through people we know to find the perfect match for some need. Then came the copycats. “Are you my friend?” quickly became a joke and people tired of giving up their info. At about the same time, those who continued to like the idea of social networks devised a project named FOAF (Friend of a Friend). The concept here is that the owner of the data (that's you) creates one XML file containing their acquaintances (the info in 'Infoware') and distributes it as they like.
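To make the FOAF idea concrete, here is a minimal sketch of generating such a file with Python's standard library. The RDF and FOAF namespace URIs and the `foaf:Person`, `foaf:name`, and `foaf:knows` terms come from the FOAF vocabulary; the function name `build_foaf` and the sample names are assumptions made up for illustration, and a real FOAF file would usually carry more detail.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Use the conventional prefixes when the tree is serialized.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def build_foaf(owner_name, friend_names):
    """Return a small FOAF document (RDF/XML) as a string.

    build_foaf is a hypothetical helper, not part of any FOAF tooling.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = owner_name
    for friend in friend_names:
        # Each acquaintance is a foaf:Person nested inside foaf:knows.
        knows = ET.SubElement(person, f"{{{FOAF_NS}}}knows")
        friend_el = ET.SubElement(knows, f"{{{FOAF_NS}}}Person")
        ET.SubElement(friend_el, f"{{{FOAF_NS}}}name").text = friend
    return ET.tostring(root, encoding="unicode")


# Placeholder names; the owner decides where this file is published.
foaf_xml = build_foaf("Example Owner", ["Friend One", "Friend Two"])
print(foaf_xml)
```

The point of the sketch is only that the file lives wherever its owner puts it. Any site that wants the network has to come and read it.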

Another XML-based project I've been following is DOAP (Description of a Project). Edd Dumbill wanted to apply the same idea as FOAF to software projects. A DOAP file is an XML file that holds all the info you'd ever want to know about a software project in one place, so it doesn't have to be duplicated by hand on the handful of open source project sites.

Both of these projects are based on reducing the bother of an activity centered around the Infoware concept. But there is a further benefit to following this model: we own and control the data. The info in Infoware is ours, and we dictate the terms of its use.

Let's apply this to a sample case. A good one is Google. You can, and sometimes do, tell Google to go bug someone else. You do it with a robots.txt file on your web server. For those unfamiliar, Google looks for this file on each website, and if it says “Google, bugger off!”, it does.
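For the curious, the robots.txt mechanism can be tried out with Python's standard `urllib.robotparser` module. This is a sketch under stated assumptions: the rules below are an example file that turns away Google's crawler (user agent `Googlebot`) and lets everyone else in, and `is_allowed`, `OtherBot`, and the example.org URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt: Googlebot is shut out, every other crawler is welcome.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", "http://example.org/index.html")
other_allowed = is_allowed(ROBOTS_TXT, "OtherBot", "http://example.org/index.html")
print(googlebot_allowed, other_allowed)
```

Note that robots.txt is a request, not a lock. It works because well-behaved crawlers choose to honor it.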

I could extend this model to an Amazon, or whoever challenges it (Amazone), with the data I provide in the form of book reviews. Here I register as an Amazone user, tell it where it can find my bookreview.xml file, and go my merry way, knowing that if Amazone decides to pull a fast one in the future I can cut off its access to that information and give it to a competitor, thus 'forking' them.
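The grant-and-revoke idea might be sketched like this. Everything here is hypothetical: the `ReviewFeed` class, the `<reviews>`/`<review>` element names, and the consumer labels are invented for illustration, and a real setup would still need some way to authenticate whoever is asking for the file.

```python
import xml.etree.ElementTree as ET


class ReviewFeed:
    """Hypothetical owner-controlled feed of book reviews."""

    def __init__(self, reviews):
        self.reviews = list(reviews)  # (book title, review text) pairs
        self.granted = set()          # consumers currently allowed to read

    def grant(self, consumer):
        self.granted.add(consumer)

    def revoke(self, consumer):
        self.granted.discard(consumer)

    def serve(self, consumer):
        """Return the reviews as XML, or None if access is not granted."""
        if consumer not in self.granted:
            return None
        root = ET.Element("reviews")
        for title, text in self.reviews:
            review = ET.SubElement(root, "review", {"title": title})
            review.text = text
        return ET.tostring(root, encoding="unicode")


feed = ReviewFeed([("Some Book", "Worth reading twice.")])
feed.grant("amazone")
before = feed.serve("amazone")

# Amazone pulls a fast one: cut it off and hand the data to a competitor.
feed.revoke("amazone")
feed.grant("competitor")
after = feed.serve("amazone")
forked = feed.serve("competitor")
```

The data never moves out of the owner's hands. Only the list of who may read it changes.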

[This would have been an extremely useful feature this week with one of Friendster's employees being fired for blogging. We could have collectively pulled our network of friends in the blink of an eye, but as is, they 0wn J00!]

What led me to think about this was GPX files. These are data files that describe GPS location co-ordinates. They are written in XML to ensure interoperability of the data among GPS handhelds and software. There is another website that is basic Infoware: it specializes in collecting information from its users and distributing it. They haven't done anything evil that I'm aware of, but they don't make GPX files freely available. People upload GPS information in a web form, the website turns it into a GPX file, and hides it behind a specialty 'service'. It does make the information available in a normal web page form, but this still seemed a bit weird to me and a first step towards begging to be forked. Plus, I'd like to make GPX files from scratch. How do I make them distributable? Like DOAP and FOAF.
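Making a GPX file from scratch takes very little. The sketch below assumes GPX 1.1, whose root `<gpx>` element carries `version` and `creator` attributes and whose waypoints are `<wpt lat="..." lon="...">` elements. The helper name `build_gpx`, the creator string, and the sample waypoints are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"

# Serialize GPX elements in the default namespace (no prefix).
ET.register_namespace("", GPX_NS)


def build_gpx(waypoints, creator="homemade-gpx-sketch"):
    """Return a GPX 1.1 document containing the given (name, lat, lon) waypoints.

    build_gpx and the default creator string are hypothetical.
    """
    root = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": creator})
    for name, lat, lon in waypoints:
        wpt = ET.SubElement(
            root,
            f"{{{GPX_NS}}}wpt",
            {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"},
        )
        ET.SubElement(wpt, f"{{{GPX_NS}}}name").text = name
    return ET.tostring(root, encoding="unicode")


# Sample co-ordinates, made up for the example.
gpx_xml = build_gpx([("Trailhead", 44.6488, -63.5752), ("Lookout", 44.66, -63.6)])
print(gpx_xml)
```

Once the file exists, distributing it is the same story as FOAF and DOAP: put it somewhere you control and tell interested parties where to find it.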

So let's apply my homemade GPX files to the Open Source Paradigm Shift. I create the valuable data, and I tell those who are interested in it where it is, on the condition that it is theirs only as long as I choose to grant it to them. That's to say, conduct yourself so as to make me want to continue to help make your Infoware useful.

In this model I believe that the freedom to innovate and improve data, as opposed to software code, is best served by being largely distributed and in the hands of the many.

There are some legal considerations here, along the lines of granting copyright of the information to one large organization to fight on one's behalf, as the FSF encourages, and some attribution rights that people would want preserved. I think this could be best addressed with a very simple combination of the FSF copyright assignment and a Creative Commons attribution license. We'll leave that racket for another day.

This essay is available for further editing at mod_foo in the editorial queue. If you have anything to add or detract I'd love to see your editorial comments there as it goes to publication.


2006-08-01 06:14:28
You post it, they harvest it, they process it, they sell it.

You signed up for this. XML won't protect you because it doesn't care. You are caught in the feedback loop that drives the evolution of networked systems. To have a sharable data format, you need standard objects that process it. They may be de facto or public but you can't decouple completely from the application.

As long as that is true, you don't own the data. You publish it. Now, do you feel that Data Rights Management Systems are Bad Things or Good Things?

2006-08-01 06:24:01
"As long as that is true, you don't own the data. You publish it. Now, do you feel that Data Rights Management Systems are Bad Things or Good Things?"

I don't have a problem with Data Rights Management Systems... they just need to be a non-hassle.

2006-08-01 14:18:56

As long as the author/originator can specify in a meaningful and easily understood way their intentions for the reuse of their works, it does good. If it becomes a means to deprive the commons of what is actually common, it does harm.

These conflict when one cannot determine the correctness of the claims. For example, are ideas harvested from open email lists copyrightable if rewritten and stripped of the original source? If so, then it will be necessary for some to get off these lists and move to publishing papers. At that point, innovation slows down and wealth accumulates. The alternative is to vet every paper and claim against a search engine that can ontologically classify contributions and show a timeline regardless of proof of access. That will map the emergence in the network and, depending on the claims, may be evidence of independent invention. Going by TimBL's axioms, that would be a good thing altogether.

Sad but so.
