Treasure Trove Looted

by John Macdonald

[Editor's Note: The following article describes a serious policy and legal issue in publishing. Programmers and Web site developers will be particularly interested, especially if they are thinking of writing books based on the content of their Web sites. We'd like to thank Jarkko Hietaniemi for bringing this issue to light, and John Macdonald for writing the following article. Both Jarkko and John are coauthors, with Jon Orwant, of Mastering Algorithms with Perl.]

For years, Eric Weisstein's Treasure Trove Web site, more recently named MathWorld, was a valuable resource on the Web. It contained, among other things, information and code on a wide variety of computer algorithms. It was recently available at:

Prior to that, it was available as Eric's Treasure Trove at Eric's personal Web site.

The Wolfram MathWorld site now displays a message that the site is no longer available. This is in response to a court order obtained by CRC Press, a publishing company known for its technical reference books.

Eric Weisstein, the creator of Eric's Treasure Trove, turned the content of the site into a book, The CRC Concise Encyclopedia of Mathematics, published by CRC Press. In the process, CRC Press acquired copyright ownership of that material, but the extent of what it acquired is the central issue of the dispute.

For a while, Eric's Web site remained available. Then, at the request of CRC Press, access was limited to entries beginning with ten letters of the alphabet at a time, with the set of ten letters changing each day. Later that limit was removed, but the site has now been shut down completely.

Eric's Web site was active both before and after the book was published. It was continually updated with contributions from its readers, as is typical of open source projects.

The Web uses a definition of publishing that does not match the traditional book industry's. A Web page is published dynamically: it is available to readers even while it continues to change. A book is published by printing the content on paper and binding it into what we know as a book. If the content changes, a new edition of the book is published; small corrections might be made in a later print run without waiting for a new edition. Readers can buy the new edition or keep using the old one. Changes to a Web site that has also become a book can range from minor fixes that might go into a reprint, to major changes and additions that would require a new edition, and such changes may happen continually.

When a book publisher acquires an exclusive copyright to a book, it ensures that no other publisher can publish a copy of the same book. However, anyone who bought copies previously published by a different publisher can still use them.

A Web user expects to be able to return to a published Web page whenever he needs to. When a Web site goes away, it's akin to having a book removed from the user's library. Not only is he unable to get the latest information, but he also loses access to the original material.

According to the legal documents displayed on the Wolfram Web site, the key issue appears to be the distinction between a base work and a derivative work. CRC Press claims that the Web site is a derivative work that conflicts with its copyright, and that the contract for the book transferred the complete copyright to the work. Wolfram claims that the book is an authorized derivative work and that the Web site is the original work. Further, Wolfram claims that the copyright for the Web site is distinct from that of the book, citing as evidence the fact that CRC Press obtained a separate copyright for the CD-ROM that followed the printed book. In turn, CRC Press points to Wolfram's attempt to obtain permission for the Web site as evidence that Wolfram agrees, at least to some extent, that CRC Press has a legitimate claim to the copyright.

Even when the material in a book is produced specifically for print, not all book publishers exercise strict control over its publication. It is quite common for books to have associated Web sites that contain code, errata, updates, and, occasionally, the full content of the book. These sites are often maintained by the author rather than the publisher.

I suspect that, for the most part, Web dwellers will treat CRC Press as damage and route around it. Numerous people have already stated that they will never again buy books from CRC Press. If this situation is not resolved soon, a different site will likely replace MathWorld with a shared, dynamic algorithm database.