Microsoft XML parser developer looks back

by Simon St. Laurent

Derek Denny-Brown, a key developer of the MSXML parser and a variety of other XML-related tools from Microsoft, takes a look at "Where XML goes astray." He finds three key areas where XML caused him trouble: allowed characters, whitespace, and namespaces.
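For readers who haven't run into these three areas firsthand, here is a minimal sketch of each, using Python's standard `xml.etree.ElementTree`. The sample documents, the `is_well_formed` helper, and the `urn:example:ns` namespace are my own illustrations, not examples from Denny-Brown's post, and they only scratch the surface of what he describes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Allowed characters: XML 1.0 forbids most control characters, even when
# they are written as numeric character references.
tab_ok = is_well_formed("<a>&#9;</a>")    # tab is an allowed character
ctrl_ok = is_well_formed("<a>&#1;</a>")   # U+0001 is not

# Whitespace: the parser cannot tell that indentation is "just formatting",
# so it reports it as character data.
doc = ET.fromstring("<list>\n  <item>x</item>\n</list>")
indent_text = doc.text       # whitespace between <list> and <item>
item_tail = doc[0].tail      # whitespace between </item> and </list>

# Namespaces: the prefix is only a local alias. The name that matters is
# the (namespace URI, local name) pair, so these two elements are "the same".
first = ET.fromstring('<p:a xmlns:p="urn:example:ns"/>')
second = ET.fromstring('<q:a xmlns:q="urn:example:ns"/>')
same_name = first.tag == second.tag
```

Each case is small on its own, but every tool that sits on top of a parser has to decide what to do about it, which is presumably why these are the areas a parser developer remembers.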

Denny-Brown's core list strikes at the heart of the difficulties facing people who write parsers, though I have my own lists of issues at other levels of XML 1.0 for people exchanging documents.

I also find his comments about where XML has landed to be worth careful consideration:

XML was primarily intended to support taking a stream of text intended to be interpreted as a human readable document, and delineate portions according to some role. This sequence of characters is a paragraph. That sequence should be displayed with a link to some other information. Et cetera, et cetera. Much of the process in defining XML [was] based on the assumption that the text in an XML document would eventually be exposed for human consumption....

XML was not designed with the SOAP scenarios in mind. Other examples of popular scenarios which deviate [from] XML’s original goals are configuration files, quick-n-dirty databases, and RDF. I’ll call these ‘data’ scenarios, as opposed to the ‘document’ scenarios for which XML was originally intended. In fact, I think it is safe to say that there is more usage of XML for ‘data’ scenarios than for ‘document’ scenarios, today.

I'm also impressed by Denny-Brown's calm conclusion, in which he acknowledges that:

Note that nowhere above do I talk about how XML should have handled these issues. In most cases, when the original decisions were made and they made sense to me. I like to believe that I have learned a lesson or two since, but who knows.

To the extent that XML works as well as it does, it undoubtedly owes its success to years of experience with SGML. All of the issues Denny-Brown raises were new with XML, namespaces especially. Maybe after a few more years someone can apply more recent experience to XML, doing to the XML family of specifications what XML did to SGML.

How long does it take to work out the bugs in a new specification?