Why I'm Not Supporting the Open Informatics Petition

by Andrew Dalke

Editor's note: For an opposing viewpoint, read The Open Informatics Petition by Jason E. Stewart and Harry Mangalam.

I am an advocate for open source and have been so since at least 1996. I have released several packages under open source licenses, including new-style BSD and LGPL, and even placed code into the public domain. I have used and helped in the development of many other open source software projects. I am secretary of the Open Bioinformatics Foundation, a nonprofit organization that helps support the 'Bio*' projects. (These are open source projects for bioinformatics and related fields; they include Biopython, Bioperl, BioJava, BioCORBA, and Biodas.) I have even started my own company partially so I can be free to release source code as I see fit.

I like helping people. I like writing software. I like the sense of discovery and creativity in science. So I enjoy writing usable software that lets researchers focus on doing new science. I do not like it when people are forced to waste time and effort worrying about the niggling details of file conversions and command-line options and database APIs, nor do I like it when they rewrite code others have written hundreds of times before.

I see open source software as a way that I can help others and reduce that waste. A couple of hours after writing this, I will contribute a new set of modules to the Biopython project, and shortly after that people around the world will start using them as part of their research and work.

Yet I also believe non-open source software can achieve those goals. I have a few ideas for non-open source projects. By restricting redistribution rights my company can receive additional revenue and spend it on a more user-centered design, additional testing, better documentation, and other tasks that help people do science. We will even spend it on developing and promoting open source software.

I am against the Open Informatics petition. Paraphrasing somewhat, if put into practice, the petition would require that publicly funded researchers publish any source code under an open source or free software license.

I am against the petition because this requirement will hinder the development of science. When I was in an academic lab we had access to many packages either "free for academic and nonprofit use" or at a relatively low cost. It was a structural biology lab, so the packages included CHARMM, XPLOR, and DSSP. Other fields have similarly licensed software.


These packages are distributed with source code but are not open source because there are restrictions on redistributing the code. We had access to the code and rights to modify it, which meant we could inspect the code for bugs, learn from it, and even try out new ideas.

Suppose the petition is in place and I want to experiment with a modification to the DSSP algorithm. (DSSP was developed at EMBL in Europe so changes in the policies of the funding source would not affect its license status.) The easiest way to do that is to get the DSSP source code and make the needed changes. The new code almost certainly contains parts of the old code, so under the DSSP license it cannot be redistributed. This means I cannot release the software under an open source license, and so I cannot do this research under a publicly funded grant. I must either find a private grant, rewrite DSSP completely, or simply not do that science. In any case, it hinders my research.

Whole labs are based on modified versions of non-open source software, originating perhaps from other labs or even commercial organizations. Some of the software is in excess of a million lines of code that evolved over decades of work. They cannot redistribute the software, but they are doing good science. Why should they be forced to completely revamp their research? Does it really help the public to have people spend another decade rewriting existing software solely so it can be released with an open source license?

The inability to redistribute modified software isn't solely a problem with non-open source software. It even happens when combining open source packages. Two open source packages may be incompatible, meaning their licensing requirements are such that no software can be distributed which is derived from a combination of the two.

This is not a theoretical problem. I usually write software for the Python programming language. For a while, Python was distributed under an open source license that was incompatible with the General Public License (GPL) encouraged by the Free Software Foundation. Or rather, the lawyers behind Python thought the two licenses were compatible, but the people behind the Free Software Foundation said they weren't. So, many Linux distributions, which often require GPL compatibility, refused to include the newer version. Were I doing work based on that version of Python and GPL'ed software, I would not (according to the Free Software Foundation) have the legal right to distribute the changes. Luckily, the licensing issue has since been resolved. But were I doing publicly funded work, this petition would have denied me either the freedom to upgrade my Python installation or the ability to do my research.


For more of Andrew's thoughts on open source read his Further Discussion on the Meaning of Open Source.

I am against the Open Informatics petition because the originators do not fully understand what open source means. Open source licenses rarely require that local changes be distributed. Open source licenses do not set a limit on the fees charged. Open source licenses set no restriction on when, how, or where the source is distributed (with minor exceptions). As an open source publisher I am free to release my source code only once a year, at a charge of $1 million paid at least two months in advance, and you have to accept it on paper tape while we are both standing under the Eiffel Tower. (I'll cover my own travel arrangements if you take me up on this.) If I am the original copyright holder I'm even allowed to obfuscate the code by removing comments, using nonsense variable names, and other tricks.

But the originators want the source code available for "verifiable, peer-reviewed research." If the source isn't released, which is allowed under open source licenses, then this doesn't happen. If it is absurdly expensive, again, allowed under open source licenses, then this doesn't happen. The petition really argues for mandatory and low-cost publishing of software through easily accessible channels, all distributed under an open source license.

Jason and Harry argue that these additional requirements fall naturally from what they see as the philosophical underpinnings of the petition:

"If the public pays for the research, it should have access to the results of that research."

I cannot disagree with that statement, because it is very vague. "Access" could mean that I, as a taxpayer, have the right to call up any publicly funded researcher and ask questions. Nor do I see how that statement necessarily implies that software be published as open source. Indeed, the U.S. Constitution seems to disagree with their conclusion:

"To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;"

- Article I, Section 8, Clause 8

That seems to me a very clear statement that a public good--the Progress of Science--can be achieved by keeping "Writings" exclusive to the author.

Finally, the petition equates open source with peer review. I am not allowed to take a peer-reviewed paper, modify a paragraph, and republish it. Unless I have permission from the copyright owners, this violates copyright law just as surely as taking someone's source code, changing a few lines of code, and redistributing it does. If the source code of a program and a published, peer-reviewed paper are truly equal, then there is justification for source code inspection, perhaps tied to a publication, but there is no justification for a mandated redistributable, open source license.

In various discussions of this topic elsewhere I have brought up additional objections to the petition, some technical, others philosophical, and still others tongue-in-cheek. I have also countered that the arguments in favor of the petition, like encouraging standardization and supporting incremental improvements, are not as strong as the originators claim. Because of all these problems I see in the Open Informatics petition, I find that I cannot sign it.

Andrew Dalke is a supporter of free software and open standards for life science research, and a cofounder of the Biopython project.