No Central Organization to Collect Failure Data in Open Source?

by Todd Ogasawara

Microsoft's anandeep posts a thoughtful Port 25 blog item titled...

Who really needs to gather crash information and what do they need to do with it?

...relating his recent visit to the Microsoft Cambridge Lab and a recent conference there focused on the topic of software reliability. He closes his blog by writing...

Open Source would have much the same issues but for the fact that there is not a central organization that collects all this failure data. The situation in Open Source may be the reverse of the situation for proprietary software makers in that the failure data is collected at the IT organization level and not centrally. How does this failure data really result in code defect corrections? I guess that it is either pre-analyzed and submitted as a bug or people patch their own instances of the source code. But my opinion is that eventually open source software systems will have to build central repositories of failure data in much the same way that commercial software vendors have built them.

Let me preface my comments by admitting that I have never made a significant contribution to a major Open Source project. But I am a long-time FOSS Lurker (FOSSL, pronounced fossil? :-). I have been an end-user of FOSS software such as GNU EMACS, GNU C, and Perl since the mid-1980s, before the term Open Source was coined, and I installed Linux from floppies downloaded in increments of uuencoded files from USENET newsgroups.

My gut instinct is that the large Open Source projects that I follow and use (Apache httpd, Firefox, Thunderbird, MySQL, PHP, Ruby, Python, Zope, etc.), as well as many of the smaller projects, already do pretty well at reporting and correcting errors. The communities around healthy FOSS projects are, it seems to me, extremely knowledgeable about the products they use and proactive about bubbling issues up to the FOSS project members. There are chat rooms, bulletin boards, user groups, and formal error reporting procedures.

The FOSS communities have, it seems to me, performed a remarkable job of identifying and responding to product reliability issues.

However, we have been watching the emergence of FOSS business models over the past few years that may change the complexion of these communities. For me, the first change came when Red Hat stopped providing free ISO distributions after Red Hat 9. More recently, MySQL forked their product into Community and Enterprise Editions and reduced Community RPM releases to twice a year. In their case, however, the MySQL Community Edition source code releases configure and compile easily on a Linux system, making it easy to stay up to date (I haven't tried building from source under Windows). These notable (to me personally) changes are understandable from a revenue-driven point of view. But I wonder whether they signal, perhaps, the need for these more commercial FOSS projects to focus more on centralized repositories of failure data, as anandeep suggests.


Englishman in Ireland
2007-03-13 01:24:22
Hey.. as a long-standing member of the 'closed' community and born-again 'FOSSL' (??) I kind of agree, but my experience is that even the commercial vendors don't do failure collection as well as they could - certainly no better than (e.g.) Mozilla.
Aaron 'Teejay' Trevena
2007-03-13 06:24:21
I don't see his point..

Currently, bugs in my Perl software are reported through email and http://, and I get some automated crash reports through CPAN Testers, who smoke test Perl modules on different platforms and versions of Perl.

I report bugs in software I use directly to the author, frequently with lengthy email exchanges, or through Bugzilla, etc.

Red Hat already provides a centralised issue tracking system for the HUGE amount of software bundled in its distribution and provides a one-stop-shop for reporting, support and updates - all the other major distros provide the same. All the distros also work with 'upstream' vendors like MySQL to ensure that their bug fixes, reports, etc. make it back to the original software creators. I don't see any problems there.

The only automated crash reporting I've used has been in Firefox, Mozilla and a couple of GNOME applications. As a user I've found it unhelpful - essentially it makes it even slower to continue working after a failure, and it is highly unlikely to provide useful information to developers IMHO.

As a developer I don't believe I would find it helpful to have crashes reported to me for widely distributed software of mine - it's much quicker and easier when you have direct contact with the user and can reproduce problems in as simple a context as possible.