Thought Experiment: Science as an Open Source Project

by Timo Hannay

Come with me to Athens. It's the fifth century B.C., around the time of Socrates, and the first-ever proto-hacker--let's call him Taikalukathes--is doing what hackers do best: He's thinking. As he ponders the fauna of his student bedsit, he considers an old wives' tale that says fleas cause rats to scratch. As many have done before him, he wonders about the truth of the tale. But unlike his predecessors, Taikalukathes also thinks hard about how he might go about testing it and, in a flash of inspiration, he comes up with the most wonderful means: He will take a rat that he knows to have fleas, and another that he knows to be flea-free, and he'll count the number of times that they each scratch in a given time period. His results support the old wives' tale and Taikalukathes becomes just a little bit wiser than any other man.

If that were the end of the tale, this wouldn't be a story worth telling, but Taikalukathes does one more remarkable thing: He shares his newfound knowledge with his fellow students. Pretty soon, Taikalukathes' friends are doing their own experiments--on other animals and sometimes even on themselves--and from all this hard work and head-scratching emerges a general theory of itching that uses a very small number of facts to explain the whole universe of dermatological irritation.

And so it was that the Kernel of Knowledge v.0.01 was born.

Cut to the early years of the 21st century. Taikalukathes' intellectual descendants have now built up the Kernel--known these days as "Science"--into a vast, interconnected array of theories that, on the whole, are both surprisingly consistent and surprisingly useful in making life more comfortable for everyone, geeks and non-geeks alike. Their scope has now spread from skin infection to almost every aspect of the natural world, and most modern practitioners have forgotten how this whole thing began (though a vestigial collective memory still resides in the habit of responding to a highly successful experiment with the exclamation, "Taikalukathes Results!").

Of course, there are imposters and charlatans, some of them abusing the Science brand to give their own shaky and improperly tested theories an aura of truth. But fortunately, not too many people mistake, say, the Social Science distribution for the real thing. Still fewer would ever entertain using those terminally buggy versions floating around that include infamous patches such as "Cold Fusion" or "Non-HIV AIDS."

Why? Because all patches (read "papers") have to jump through a series of hoops before they can be committed to the Science source-code repository. In particular, they have to gain the considered approval of a gatekeeper ("editor") and, usually, a couple of other hackers ("scientists") with special expertise in that particular part of the Kernel of Knowledge. These people check for a number of things:

  • The submitted code should not cause the system to crash. (The scientific conclusions should not fly in the face of existing scientific knowledge.)

  • It should add potentially useful functionality or increase the elegance of the code. (It should increase the range of phenomena explained or reduce the number of facts we need to explain all understood phenomena.)

  • It should be written in a robust, logically consistent way. (It should be supported by the data presented.)

  • It should be properly documented so that any competent programmer can understand it. (It should be accompanied by methods detailed enough for a competent scientist to replicate the experiment.)

Sometimes there are complaints that this system tends to suppress patches that contradict the prejudices of existing Kernel architects, even when the new ways are better. But this danger is usually considered a small price to pay in order to keep the whole system reasonably bug-free. Major system rewrites won't happen unless you submit some damn good code. Or, as scientists would put it, extraordinary claims require extraordinary evidence. Dissenters are always free to fork the code--as the Cold Fusion and Non-HIV AIDS people already have--and one day, one of these fringe distros might even become mainstream. But don't hold your breath.

OK, enough of cartoon-like analogies. This note does have a more serious aim. Open source software development has had huge success in a relatively short period of time. As I've tried to describe above, it also bears some similarities to scientific research (which has had even huger success over a much longer period of time). Science has a process, based on peer review, for deciding what should--and what should not--be incorporated into the corpus of knowledge. This process has evolved over centuries and generally works pretty well. But it isn't tailored to the online world, and it's quite possible that the current "phase transition" in information dissemination brought about by Internet technologies will lead scientists to handle peer review very differently. Open source software development also has processes, many of them based on peer review, for deciding what should be incorporated into the code base. These processes are tailored to the online world because that's where open source development has always happened.

So my question is this: What lessons have been learned within open source software development that might be directly applicable to scientific peer review in the online world?

Timo Hannay is head of new technologies at the Nature Publishing Group, publishers of the science journal Nature.