XML Schemas: guaranteed non-interoperability as a design methodology?

by Rick Jelliffe

The vogue quip that "a camel is a horse designed by committee" probably makes more sense to people who don't live in a desert country. From here in Australia, camels seem to be a very plausible design. It is the speaker, actually, who is wrong: what you need is a camel when you are in the desert, a horse on the planes, a yak in the mountains, perhaps a porpoise in the sea, and an elephant in the jungle.

The ongoing XML Schemas trainwreck shows little sign of improvement; that users have so repeatedly stated their problem and received no satisfaction from the W3C shows how disenfranchised they are. I am thinking about these things again this week for three reasons.

First, I saw (only 2 years too late) the AT&T-originated guidelines on XML Schemas Best Practices, which underlie a best-practices checker tool at Java.net. The document goes through the capabilities of a particular class of application (while assuming that everyone is interested in the same class of applications, grrr: "XML" is not just what one set of software uses) and gives a list of what will cause problems or be unportable. Some recommendations (like deprecating <appinfo>) are dubious, but most seem well-founded. It is a good document for anyone to read.

The tables in A.2 and A.3 are especially interesting, or horrific in practical terms. None of the software supported derivation of complex types by restriction fully, most not at all. None fully supported ID datatypes. Only one implementation fully supported enumerations. Basically, type derivation of complex types was a complete non-starter.

The second reason I am thinking about it is work. A customer wants to use MS InfoPath with a schema I have been working on. But, predictably, InfoPath has a range of things it doesn't support. Many of the fixes (such as replacing "unbounded" for the cardinality of choice groups with some reasonable number) are trivial, but it is the same issue.
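
To give a feel for how trivial (and how annoying) that kind of workaround is, here is a minimal sketch in Python. It assumes the schema uses the usual XSD namespace, and the cap of 100 and the function name `bound_unbounded_choices` are my own inventions for illustration, not anything InfoPath documents.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def bound_unbounded_choices(xsd_text, limit=100):
    """Cap every xs:choice with maxOccurs="unbounded" at `limit`.

    Returns (rewritten_schema_text, number_of_choices_changed).
    `limit` is an arbitrary illustrative value.
    """
    # Keep the conventional "xs" prefix on output so that QName values
    # inside attributes (e.g. type="xs:string") still resolve.
    ET.register_namespace("xs", XS)
    root = ET.fromstring(xsd_text)
    changed = 0
    for choice in root.iter(f"{{{XS}}}choice"):
        if choice.get("maxOccurs") == "unbounded":
            choice.set("maxOccurs", str(limit))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="doc">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="para" type="xs:string"/>
        <xs:element name="note" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    new_text, count = bound_unbounded_choices(SAMPLE)
    print(count, "choice group(s) rewritten")
    print(new_text)
```

The catch, of course, is that the rewritten schema no longer accepts exactly the same documents as the original, which is precisely the interoperability problem in miniature.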

A little over a year ago, Paul Klee had a great summary article on XML.COM, XML Schemas Profile. It mentions the 2005 W3C Workshop on XML Schema 1.0 User Experiences, and the do-nothing Chair's report ("No-one wants anything, and if they do they don't agree, and if they agree it cannot be done, and if it could be done other people don't want it, and if other people do want it they actually want something else, and if they don't want something else it would be confusing."). It looks like very strong leadership for inertia, and it cheeses me off that their laziness affects me and my clients at the end of 2007.

One positive thing that has come out has been the W3C Basic XML Schemas Databinding Patterns, which lists various XPaths identifying the schema patterns that databinding tools can be expected to handle. (It mentions how to use these in Schematron, which is good too!) But it doesn't come up to the level of a profile. (And, to be fair, the W3C Schema WG has also upgraded XSD to reduce some gotchas that have been reported, such as allowing unbounded on all groups.)
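
The pattern-by-XPath idea is easy to try out. The sketch below is only an illustration of the approach: the pattern names and path expressions are hypothetical ones I made up (they are not taken from the W3C document), and it uses the limited XPath subset in Python's standard library rather than a full XPath or Schematron engine.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical pattern names and paths, for illustration only.
PATTERNS = {
    "ComplexTypeRestriction": ".//xs:complexType/xs:complexContent/xs:restriction",
    "UnboundedChoice": ".//xs:choice[@maxOccurs='unbounded']",
    "Enumeration": ".//xs:simpleType/xs:restriction/xs:enumeration",
    "AppInfo": ".//xs:annotation/xs:appinfo",
}


def detect_patterns(xsd_text):
    """Return {pattern_name: match_count} for each pattern found in the schema."""
    root = ET.fromstring(xsd_text)
    found = {}
    for name, path in PATTERNS.items():
        matches = root.findall(path, NS)
        if matches:
            found[name] = len(matches)
    return found


SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Colour">
    <xs:restriction base="xs:string">
      <xs:enumeration value="red"/>
      <xs:enumeration value="green"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="Narrow">
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
        <xs:choice maxOccurs="unbounded">
          <xs:element name="a" type="Colour"/>
        </xs:choice>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""

if __name__ == "__main__":
    for name, count in detect_patterns(SAMPLE).items():
        print(name, count)
```

A real profile would go one step further and say which of the detected patterns a given class of tools is required to support; a report like this only tells you what you are up against.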

Why not? Because, as far as I can make out, the idea is that if we all pretend that XML Schemas is a unified and whole specification, one size that can fit all, then somehow it will magically happen. But fantasy is a really poor substitute for reality. Time and time again I have seen clients happy about XML Schemas and its promises, only to have their hopes dashed as they realize that as soon as they need to start deploying they have to use subsets and there is no support from "standards" to help interoperability.

The third thing? DIS29500 gave XML Schemas that worked in MSXML, but failed in Xerces. This was raised as an issue (by Japan among others) and the schema is being reworked to support Xerces. (The issue is to do with circular imports IIRC: I think the new schemas will be in a single file per namespace and that will help the RELAX NG conversion too.) Again, this is an issue we are dealing with in late 2007.
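
If the culprit really is circular imports (and remember that is only my recollection), it is at least a problem you can check for mechanically. Here is a minimal sketch, assuming the schemas are available as a mapping from file name to text and that imports name each other by `schemaLocation`; the file names and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def import_graph(schemas):
    """Map each schema name to the schemaLocation values of its top-level xs:import elements."""
    graph = {}
    for name, text in schemas.items():
        root = ET.fromstring(text)
        graph[name] = [
            imp.get("schemaLocation")
            for imp in root.findall(XS + "import")
            if imp.get("schemaLocation")
        ]
    return graph


def find_cycle(graph):
    """Return one import cycle as a list of names (first == last), or None."""
    in_progress, done, path = set(), set(), []

    def visit(node):
        in_progress.add(node)
        path.append(node)
        for target in graph.get(node, []):
            if target in in_progress:
                # Back edge: the cycle runs from the earlier visit of target.
                return path[path.index(target):] + [target]
            if target not in done:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in graph:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical two-file example in which each schema imports the other.
SCHEMAS = {
    "a.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
             '<xs:import namespace="urn:b" schemaLocation="b.xsd"/></xs:schema>',
    "b.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
             '<xs:import namespace="urn:a" schemaLocation="a.xsd"/></xs:schema>',
}

if __name__ == "__main__":
    print(find_cycle(import_graph(SCHEMAS)))
```

Whether a cycle like this is an error is exactly the kind of thing on which implementations differ, which is why merging to a single file per namespace is the pragmatic fix.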

And that is what you get when you have a large standard that is not sufficiently modular and focussed to support its main applications: guaranteed non-interoperability. This lack of modularity is an issue that has been relentlessly pointed out to the W3C XML Schema Working Group and just as relentlessly ignored: and the result is that it is surprising if we find a schema that works out-of-the-box with the particular tools desired for a job.

Why is it that we are going into 2008 and we still have exactly the same kinds of problems that were clearly expressed as real problems in the 2005 experience workshop, and which were predicted vociferously before then?

5 Comments

Mark
2007-12-31 05:10:21
>> It is the speaker, actually, who is wrong: what you need is a camel when you are in the desrt, a horse on the planes [sic], a yak in the mountains, perhaps a porpoise in the sea, and an elephant in the jungle.


Do horses have to stow their saddles under the seats? ;-)


btw: Keep up the great work. Your work is invaluable.





Rick Jelliffe
2008-01-01 07:47:05
Mark: I am thinking, of course, of the new Airbus 380s which are so large you need a horse to get to the swimming pool.
Paul Kiel
2008-01-03 05:53:03
Here is a link to the Profiling XML Schema article you mention. (And my name is Paul Kiel, not Paul Klee. I'm not as artistic as he ;-)
Article linked here
Rick Jelliffe
2008-01-06 00:14:58
Paul: Sorry about the typo. Must be some kind of optical illusion (a euphemism for my visual stupidity!) I have been trying to figure out why on earth anyone would build a Klee sequencer which is why Klee is in my mind, I guess.
Jesper Lund Stocholm
2008-01-09 23:59:14
Hi Rick,



The third thing? DIS29500 gave XML Schemas that worked in MSXML, but failed in Xerces. This was raised as an issue (by Japan among others) and the schema is being reworked to support Xerces.


Are you referring to Japan's comments 70, 71, 74 and 78?


PS: Am I the only one having problems with commenting on your blog using Firefox?