Derivation by Implied Restriction: a missing derivation type for XML Schemas?

by Rick Jelliffe

By now XSD users are well aware of the severe limitations in the complex type derivation mechanisms provided by XML Schemas. Leaving aside the question of whether these mechanisms should be there at all, rather than being treated as a kind of validation issue as they are in RELAX NG, there are basically two problems. First, "derivation by extension" only allows new elements at the end of the content model ("extension by suffixation"), so I cannot extend <name><first>Rick</first><last>Jelliffe</last></name> to be <name><first>Rick</first><middle>Alan</middle><last>Jelliffe</last></name> using derivation by extension (I need to change the base schema). Second, I cannot use derivation by restriction to remove or make optional an element that is required in the base (I need to change the base if I want to remove a required middle name, for example).

Now this is not to say that the definitions of complex type derivation by extension and by restriction are not logical. It is just that they are not useful, or are too strong, in many important situations. The W3C XML Schemas Working Group has indeed worked on finding better definitions for them, while maintaining the core concept that a type derived by restriction is valid against the base type.

But I suggest that there may be other kinds of derivation which are useful. One that I would suggest might be called "Derivation by Implied Restriction". This is where there are two complex types and neither is the base type for the other, but there is clearly some family resemblance. Rather than creating an explicit base type, I wonder whether it would be useful to ask a lesser question of them: could there be a base type created (automagically or notionally) against which both content models were valid by restriction, and in which there was only a single particle for each duplicated particle in the source content models? The implied base type could then be given a name for the derived types to refer to, but it would not be specified (declared, defined) explicitly anywhere.

So if one content model said (first, last, gender?) and the second said (first, middle, last), the implied base content model would be (first, middle?, last, gender?). However, if one content model said (first, middle, last) and the second said (first, last, middle), there would be no implied base type, because (first, ((middle, last) | (last, middle))) has duplicated particles. (I haven't thought wildcards through.)
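
As a rough sketch of how such an implied base might be computed, here is an illustration in Python. It is only a sketch under strong assumptions: it handles flat sequences of uniquely named element particles with an optional flag, and ignores choices, repetition, wildcards and attributes. The names Particle, implied_base and show are invented for this example and are not part of XSD or of any library.

```python
from typing import List, NamedTuple, Optional


class Particle(NamedTuple):
    """One element particle in a flat sequence content model (hypothetical)."""
    name: str
    optional: bool = False


def implied_base(a: List[Particle], b: List[Particle]) -> Optional[List[Particle]]:
    """Return a merged sequence that both models restrict, or None."""
    names_a = [p.name for p in a]
    names_b = [p.name for p in b]
    set_a, set_b = set(names_a), set(names_b)

    # Assumption of this sketch: each name occurs at most once per model.
    if len(set_a) != len(names_a) or len(set_b) != len(names_b):
        return None

    # Shared particles must come in the same relative order in both models;
    # otherwise the base would need a duplicated particle.
    if [n for n in names_a if n in set_b] != [n for n in names_b if n in set_a]:
        return None

    base: List[Particle] = []
    i = j = 0
    while i < len(a) or j < len(b):
        if i < len(a) and a[i].name not in set_b:
            # Present only in the first model: optional in the base.
            base.append(Particle(a[i].name, True))
            i += 1
        elif j < len(b) and b[j].name not in set_a:
            # Present only in the second model: optional in the base.
            base.append(Particle(b[j].name, True))
            j += 1
        else:
            # Both cursors sit on the same shared particle.
            base.append(Particle(a[i].name, a[i].optional or b[j].optional))
            i += 1
            j += 1
    return base


def show(model: List[Particle]) -> str:
    """Render a model in the (first, middle?, last) notation used here."""
    return "(" + ", ".join(p.name + ("?" if p.optional else "") for p in model) + ")"


if __name__ == "__main__":
    one = [Particle("first"), Particle("last"), Particle("gender", True)]
    two = [Particle("first"), Particle("middle"), Particle("last")]
    print(show(implied_base(one, two)))
```

For models such as (first, last, gender?) and (first, middle, last) this yields (first, middle?, last, gender?), while (first, middle, last) against (first, last, middle) yields None because the shared particles are ordered differently. When two unshared particles could go in either order, the sketch simply places the first model's particle first; that tie-break is an arbitrary choice of the example, not something the proposal settles.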

In other words, really the type is being derived from the instances, backwards, and if no derivation is possible then the instances are not related by an implied complex type.

I suspect this derivation type (and I am sure there are more) would reduce the complexity of XSD development from the user's point of view. It would be something more constrained than ALL but less constrained than current type derivation.


Daniel Gabi
2007-07-31 06:30:58
there should be a construct in XMLSchema that would allow complexTypes to be defined (similar to all the possibilities with elements, like sequences, choices between global or local elements, groups,... based on global or local types, derivation and extension, etc.) as 'child nodes' of elements or complexTypes. the instance elements could then be:
- all with the same name but different types,
- of the same type but having different names,
- of another type, because an xsd:anyCXType was set in the schema
- the only element in this childNode set, by default based on the 'foo' type, because the schema says that the foo type is mandatory once and that other elements / complexTypes are possible

similar to element substitution groups, type substitution groups would be collections of possible complexTypes to match, all based on the same (abstract) base type.

thanks to you. daniel

Rick Jelliffe
2007-08-03 20:12:13
Daniel: Indeed, there are lots of possibilities. I lost confidence in the grammar-based approach because it seems that the theoretical infrastructure swamps the ease-of-use.

But clearly XSD has a weak spot when used, say, for whole-of-government use with a master schema and scores of actual schemas that need to be derived from it: it seems that a master schema should be as wild-carded and broad as possible, just a vocabulary, with the structural and constraint information removed as far as possible.

Daniel Gabi
2007-09-27 09:13:55
great. i read about it in the XSDL 1.1 Draft: conditional type assignment: the type of an element is set depending on some given attribute (value). there are other improvements in XMLSchema 1.1, e.g. with xsd:any.