Error-full systems emerge from single-strategy maintenance regimes
by Rick Jelliffe
I don't know whether Joel was correct in his assessment, or whether Microsoft have a different strategy now. But clearly the mid-term impact of such a strategy would be a buggy code base, with entrenched workarounds, combinatorial explosions of symptoms that prevent diagnosis, and an inadequate foundation for preventing major errors. Not to mention a sudden exposure to loss of market share when the market saturates and stops growing: when a sucker isn't born every minute.
Sun's Java effort has been suffering similarly of late: they have a nice-looking error process based on people voting for the bugs they consider critical. Now, whether Sun actually use this list to determine which bugs they fix first, or whether they use the vote to justify ignoring bugs they are not interested in, the result is probably the same: a system with lots of known bugs.
There are lots of other single-strategy methodologies: risk-based analysis, ISO 9126 software quality analysis, weighting bugs by their depth in the call stack so that library bugs are fixed at high priority, metrics, test-driven programming, and so on. I see no reason to be confident that any of them will not, over time, systematically fail to address some kinds of errors. Which will bite us.
So is a better approach to just fix bugs randomly? Pick a bug from a hat? Well, maybe....
Perhaps we should say that each maintenance methodology, applied singly over time, will result in an accumulation of unaddressed errors in some aspect of the system.
Part of the problem is human: people have interests and pressures and viewpoints. So democracies solve this by what Lee Teng-Hui (the Taiwanese president who secretly funded the opposition parties) called "the regular alternation of power": term limits, shifting jobs, even sabbaticals.
Part of the problem, as I see it, is with simple prioritization of bugs. Sometimes it is better to treat each module as a whole, allocate quality requirements for that module, and then handle each bug according to its module's priority. For example, Sun could say "we don't treat text.html as a priority module, but we do treat 3D rendering as one". Apply this to voting, and two votes for an HTML bug would then be required to equal one vote for a 3D bug.
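To make the weighting idea concrete, here is a minimal Python sketch; the module names, weights, and bug records are all hypothetical, chosen so that two votes on a text.html bug equal one vote on a 3D bug:

```python
# Hypothetical sketch of module-weighted bug voting.
# Each module gets a priority weight, and raw votes are scaled by it.

MODULE_WEIGHTS = {
    "text.html": 0.5,  # not a priority module
    "3d": 1.0,         # priority module
}

def weighted_score(module: str, raw_votes: int) -> float:
    """Scale raw user votes by the module's priority weight."""
    return raw_votes * MODULE_WEIGHTS.get(module, 0.75)  # default middling weight

bugs = [
    {"id": 101, "module": "text.html", "votes": 10},
    {"id": 202, "module": "3d", "votes": 6},
]

# Order the backlog by weighted score, highest first: the 3D bug (6.0)
# now outranks the HTML bug (5.0) despite having fewer raw votes.
queue = sorted(bugs, key=lambda b: weighted_score(b["module"], b["votes"]),
               reverse=True)
```

The point of the sketch is that the vote itself is untouched; only the ranking function changes, module by module.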
That is a more complex strategy, to be sure, but it is still a single strategy.
A better way of doing things may be to divide the debugging/maintenance/natural-enhancement effort into independent efforts. For example, have the mainstream process use immediate criteria: rational economic effect, risk, or deadlines. But also have a background effort that alternates between different strategies: systematic audits for internationalization, performance, standards compliance, transparency, integrity, resource utilization, and other quality concerns. And have another background effort that uses weighted voting and different criteria, and that accepts minor Requests for Enhancement as well as bugs.
And even, for one in a hundred bug fixes, do pick a bug out of the hat, on the grounds that you don't have 100% confidence that even the multi-criteria maintenance will prevent the emergence of a nasty clump of errors in some aspect. Shake it up.
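That mix of alternating background strategies plus the occasional draw from the hat can be sketched as code. In this hypothetical Python fragment, the strategy names, the per-bug scoring shape, and the one-in-a-hundred threshold are all illustrative assumptions, not anyone's actual process:

```python
import random

# Hypothetical sketch of a multi-criteria maintenance loop:
# background strategies alternate in rotation, and roughly one pick in a
# hundred ignores every criterion and draws a bug from the hat.

STRATEGIES = [
    "internationalization", "performance", "standards-compliance",
    "transparency", "integrity", "resource-utilization",
]

def pick_next_bug(bugs, fixes_so_far, rng=random.random):
    """Pick the next bug to work on.

    bugs: list of dicts like {"id": ..., "score": {strategy: number}}.
    fixes_so_far: count of completed fixes; drives the strategy rotation.
    rng: source of randomness, injectable so the logic can be tested.
    """
    if rng() < 0.01:  # ~1 in 100: shake it up, pure luck of the draw
        return random.choice(bugs)
    strategy = STRATEGIES[fixes_so_far % len(STRATEGIES)]  # alternate strategies
    # Within the current strategy, take the bug it scores highest.
    return max(bugs, key=lambda b: b["score"].get(strategy, 0))
```

The random branch is deliberately criterion-free: its whole value is that it cannot share the blind spots of the scored strategies.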
"have main stream process use immediate rational economic effect, risk or deadline criteria. But also have a background effort that alternates between different strategies: systematic audits for internationalization, performance, standards-compliance, transparency, integrity, resource utilization, and other quality concerns." That's more or less how things are done at Microsoft. There are quite different processes (and sometimes different teams) for shipping and "sustained engineering". There are all sorts of oversight groups with specific charters and processes for security, localization, etc. Then there are the various customer-driven field organizations that push for fixes affecting specific customers.
It sounds corny, but bug fixing *can* get old very quickly, and I definitely would advocate the sort of approach you'd take in a storefront retail business: "October is Customer Satisfaction Month! November is Performance Issue Month!" Planned out ahead of time, the month could begin with an informal discussion or specific training in a focus area. If your employees are motivated by bug-stats, in-focus bugs might count as two if closed within the prescribed time period. Shake it up indeed!
Do you think companies could adopt a flipside of the Google 20% Project to this approach (instead of 20% of work time devoted to individual development of new features, 20% of work time devoted to individual bug squashing and refactoring)?
Michael, if you have two customers who use an API, and one realizes there is a bug and writes their application to utilize the bug, while the other writes the application thinking that they get what the documentation says, are you saying that the first user should be preferred to the second? The first user makes their bed, and they can sleep in it, to some extent; the second will be delighted at risks reducing and quality improving. Isn't what you are suggesting really that after an API has been released and used, the documentation should be corrected to reflect what the software actually does rather than what it was initially specced to do? (Like the Confucian Chinese idea of the Rectification of Names) However, of course, I understand this is a lifecycle thing: as an API matures it solidifies (and finally fossilizes?) so bugs need to be fixed early because they can become entrenched otherwise. On the other hand, I am not sure I would want to use any product where the documentation was retroactively changed in preference to fixing genuine bugs, no matter how longstanding.