Why we will always have problems with XML Schemas, even when the bugs go

by Rick Jelliffe

Eagerness and laziness just don't mix.

There are basically two kinds of XML Schema applications. One kind relies on having all the various schema documents present up front; the other kind lazily locates and loads the schema for a namespace only when an element or attribute in that namespace turns up in a document. Examples of the first kind include XML IDEs and XML databinding tools; examples of the second kind include most server-based library tools, such as Apache Xerces and MSXML.

The reason is obvious: speed. An IDE needs to have all the information at the user's fingertips. A server-based validator needs to avoid loading spurious schemas for namespaces that are possible but not actually used in the instance.

But the two are incompatible. The trouble comes whenever you have standard envelope elements, or multi-vocabulary documents that can start in any of several vocabularies. The IDE kind of application resolves all the imports in the schemas eagerly; the server kind of application reads the declarations in a schema, including the import elements, lazily.

XML Schemas allows both behaviours. So you can have a set of schemas that your IDE says is complete and OK, and which validates your sample documents, then pass the same schemas to a server validator and get reports that certain schemas are not available. HUH, BUT I CAN SEE THE IMPORT STATEMENTS?? (I'll give a little example later, to help clarify it.)
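
To make the mismatch concrete, here is a toy Python model. It is not the author's own example and not how Xerces or MSXML are actually implemented; it assumes two hypothetical schema files, s1.xsd for urn:ns1 (which imports urn:ns2) and s2.xsd for urn:ns2 (which imports nothing). The eager resolver chases every import from the schemas it was given; the lazy resolver loads a schema only when an element in that namespace appears, and can only use an import once the schema containing it has been loaded.

```python
from dataclasses import dataclass, field

@dataclass
class SchemaDoc:
    """Toy stand-in for a schema document: its targetNamespace plus its
    xs:import elements, as a {namespace: schemaLocation} map."""
    target_ns: str
    imports: dict = field(default_factory=dict)

# Hypothetical schema files: s1.xsd imports ns2; s2.xsd imports nothing.
CATALOG = {
    "s1.xsd": SchemaDoc("urn:ns1", {"urn:ns2": "s2.xsd"}),
    "s2.xsd": SchemaDoc("urn:ns2"),
}

def eager_namespaces(entry_locations):
    """IDE-style: open the given schema documents and chase every import up front."""
    known, pending = set(), list(entry_locations)
    while pending:
        doc = CATALOG[pending.pop()]
        if doc.target_ns not in known:
            known.add(doc.target_ns)
            pending.extend(doc.imports.values())
    return known

def lazy_missing(element_namespaces, hints):
    """Server-style: load a schema only when an element in its namespace turns up.
    A location may come from an instance hint (xsi:schemaLocation) or from an
    xs:import in a schema that has already been loaded.
    Returns the namespaces for which no schema could be found."""
    loaded, missing, locations = set(), [], dict(hints)
    for ns in element_namespaces:  # element namespaces in document order
        if ns in loaded or ns in missing:
            continue
        if ns not in locations:
            missing.append(ns)  # "schema not available"
            continue
        doc = CATALOG[locations[ns]]
        loaded.add(doc.target_ns)
        # An import only becomes visible once its importing schema is loaded.
        for imp_ns, loc in doc.imports.items():
            locations.setdefault(imp_ns, loc)
    return missing

# Instance rooted in ns2 with an ns1 child; only the root namespace has a hint.
order = ["urn:ns2", "urn:ns1"]
print(sorted(eager_namespaces(["s1.xsd"])))  # ['urn:ns1', 'urn:ns2']
print(lazy_missing(order, {"urn:ns2": "s2.xsd"}))  # ['urn:ns1']
print(lazy_missing(["urn:ns1", "urn:ns2"], {"urn:ns1": "s1.xsd"}))  # []
print(lazy_missing(order, {"urn:ns2": "s2.xsd", "urn:ns1": "s1.xsd"}))  # []
```

Rooted in ns1, the document passes on both sides. Rooted in ns2, the lazy side never reads the import in s1.xsd, so it reports urn:ns1 as unavailable even though the IDE has both schemas; supplying an instance hint for urn:ns1 makes the lazy side succeed too.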

So what can you do? For a start, if you create documents using an application that has all schemas loaded, you need to be aware that there is a large chance (if you have multiple schemas, standard envelopes, etc.) that your schemas will not run successfully in applications of the other kind. There are a couple of remedies: importing everything from everywhere is one ugly one; having facade schemas is another. But you should probably think in terms of "document type", just like in the old days of DTDs: base the document type on the namespace of the top-level element, and if several top-level namespaces are possible, make up a separate schema for each document type that invokes the common components directly.
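
The facade and document-type remedies can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under assumptions: the namespaces, file names, and the DOCUMENT_TYPES table are hypothetical, and the "facade" here is simply a schema with no targetNamespace that xs:imports every schema a document type may use, selected by the namespace of the instance's top-level element.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize schema elements with the xs: prefix

# Hypothetical document types, keyed by the namespace of the top-level element.
# Each lists every namespace (and schema file) that type may contain.
DOCUMENT_TYPES = {
    "urn:env": {"urn:env": "env.xsd", "urn:ns1": "s1.xsd", "urn:ns2": "s2.xsd"},
    "urn:ns2": {"urn:ns2": "s2.xsd", "urn:ns1": "s1.xsd"},
}

def facade_schema(imports):
    """Build a facade schema with no targetNamespace whose only content is one
    xs:import per namespace, so the validator is handed the whole set up front."""
    schema = ET.Element(f"{{{XS}}}schema")
    for ns, location in sorted(imports.items()):
        ET.SubElement(schema, f"{{{XS}}}import",
                      {"namespace": ns, "schemaLocation": location})
    return ET.tostring(schema, encoding="unicode")

def facade_for(instance_xml):
    """Choose the document type from the namespace of the root element."""
    tag = ET.fromstring(instance_xml).tag  # "{namespace}local" when namespaced
    ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return facade_schema(DOCUMENT_TYPES[ns])

print(facade_for('<e2 xmlns="urn:ns2"><e1 xmlns="urn:ns1"/></e2>'))
```

The generated facade, rather than an individual vocabulary schema, is what gets handed to the validator, so the set of imports no longer depends on which vocabulary the document happens to start in. An unknown root namespace raises KeyError in this sketch, which is arguably the right outcome for an undeclared document type.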

5 Comments

Uche
2006-08-10 17:58:29
Interesting, Rick. I suspect most people would just go with the facade schema workaround on this one. I did want to point out a few typos that make the above harder to follow:

Should be xmlns:ns1 and xmlns:ns2

"But it does not load ns2, so it does not know..."

Should be

"But it does not load s2, so it does not know..."

Uche
2006-08-10 18:02:45
Gah! Forgot to escape. I meant

<ns1:e1 xmlns:e1="...">
<ns2:e2 xmlns:e2="..."/>
</ns1:e1>

Should be xmlns:ns1 and xmlns:ns2

Rick Jelliffe
2006-08-11 00:17:57
Thanks Uche, I've fixed the incorrect prefixes now.

len
2006-08-11 06:10:37
I'm forwarding the link to this article to the X3D public list, where a debate on the value of validation (is it just a spell checker?) is occurring.

Once outside the XML community itself, there is surprisingly little penetration of non-DTD/XSD thinking, particularly an understanding of multiple-namespace instances. Each linguistic community tends to think of its own language as dominating the instance, and seldom in terms of mashed-up integrated documents (e.g., XHTML + VML or SVG, X3D + SVG and so on).

The object model incompatibilities are the principal technical reason for that, I'd hazard to guess. At the end of the day, it is not of much use to mix and validate XML vocabularies if the rendering/behavior frameworks and applications don't support them. Somehow, the old NOTATION challenge never goes away.

John
2006-09-14 08:54:42
I am not sure that your issue is with XML Schema; it may instead be with buggy validation code. Under XML Schema Part 1 (http://www.w3.org/TR/xmlschema-1/) Section 4.3.2, paragraph 4 on xsi:schemaLocation, the schema location for any element can be placed just about anywhere (even on the element itself) as long as it comes early enough, and it is an error not to have done so by the time you get to that element. In other words, the lazily assembled schema should not have a problem if your XML document were properly supplied with xsi:schemaLocation attributes. At worst, you could put one on each element. So if the validator ever gets to a point where it says "I don't have a schema for that namespace", it's the XML instance's fault, not the validator's.
It is also an error for your eager validator not to have found the error. But then XML Schema has never really defined a validation test for compliant software, so that is an opinion.
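
The xsi:schemaLocation remedy described in this comment can be sketched in Python with xml.etree.ElementTree: collect every namespace used by an element and put a single xsi:schemaLocation on the root, which is as early as a hint can go. The function name and the locations map are hypothetical; attribute namespaces are ignored, and since the attribute is only a hint, a processor is not obliged to follow it.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialize the hint as xsi:schemaLocation

def add_schema_location_hints(instance_xml, locations):
    """Put one xsi:schemaLocation on the root listing a location for every
    namespace used by an element (not attributes), in document order, so a
    lazy validator never meets an element namespace it has no hint for."""
    root = ET.fromstring(instance_xml)
    used = []
    for el in root.iter():  # root first, then descendants in document order
        if el.tag.startswith("{"):
            ns = el.tag[1:].split("}", 1)[0]
            if ns not in used:
                used.append(ns)
    # The attribute value is whitespace-separated "namespace location" pairs.
    root.set(f"{{{XSI}}}schemaLocation",
             " ".join(f"{ns} {locations[ns]}" for ns in used))
    return ET.tostring(root, encoding="unicode")

print(add_schema_location_hints(
    '<e2 xmlns="urn:ns2"><e1 xmlns="urn:ns1"/></e2>',
    {"urn:ns1": "s1.xsd", "urn:ns2": "s2.xsd"},
))
```

ElementTree does not preserve the original prefixes when it reserializes, so the output uses generated prefixes such as ns0; the namespaces themselves are unchanged.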