Converting Content Models to Schematron

by Rick Jelliffe

Regular grammars model constraints in the form "when we have an element with particular preceding siblings, which elements can be the following sibling?"

To do this in Schematron, we make assertions of the form:

<rule context="element[ preceding-siblings ]">
  <assert test="immediately-following-siblings">The following siblings are
  expected after an <name/>.</assert>
</rule>
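As a rough sketch of how the template above could be generated mechanically, here is a small Python function. The function name `sequence_to_schematron` and the parent element name are hypothetical, and this is not Topologi's implementation. It handles only the simplest case, a plain sequence of distinct element names, where the element name alone tells us where we are in the content model, so the rule context needs no preceding-sibling predicate. Choices, repetitions and repeated names would need exactly that preceding-sibling context. The output is an unqualified fragment (no ISO Schematron namespace), like the template itself.

```python
def sequence_to_schematron(parent, names):
    """Emit a Schematron <pattern> fragment for the content model
    (names[0], names[1], ..., names[-1]) inside element `parent`.

    Sketch only: assumes a plain sequence of distinct element names, so
    the element name alone fixes its position and the rule context needs
    no preceding-sibling predicate.
    """
    if not names or len(set(names)) != len(names):
        raise ValueError("expected a non-empty sequence of distinct names")

    rules = [
        # The parent must begin with the first element of the sequence.
        f'  <rule context="{parent}">\n'
        f'    <assert test="*[1][self::{names[0]}]">'
        f'A <name/> should start with {names[0]}.</assert>\n'
        f'  </rule>'
    ]
    # Each element must be immediately followed by the next one in the model.
    for current, following in zip(names, names[1:]):
        rules.append(
            f'  <rule context="{parent}/{current}">\n'
            f'    <assert test="following-sibling::*[1][self::{following}]">'
            f'{following} is expected after <name/>.</assert>\n'
            f'  </rule>'
        )
    # The last element in the model must have no following siblings.
    rules.append(
        f'  <rule context="{parent}/{names[-1]}">\n'
        f'    <assert test="not(following-sibling::*)">'
        f'Nothing is expected after <name/>.</assert>\n'
        f'  </rule>'
    )
    return "<pattern>\n" + "\n".join(rules) + "\n</pattern>"


print(sequence_to_schematron("doc", ["a", "b", "c"]))
```

Each message is ordinary text in the schema, so it can be reworded for a particular audience without recompiling anything.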

2 Comments

anon
2007-09-11 00:29:32
Why would you want to convert XML Schemas to Schematron? Is this only an (interesting!) academic exercise or is there a business case for it?
Rick Jelliffe
2007-09-11 01:42:44
Anon: Actually, my company Topologi is working on an implementation of this for a customer at this very moment. The ideas are largely based on the outline above, and I will be presenting the details in due course.


There are several reasons. First, it allows more scope for customized diagnostics: XSD (and grammars generally) are notoriously unhelpful in the messages they provide. Implementations such as Xerces do allow you to compile in different error messages, but that is a compilation job rather than a scripting job, and you usually need to rebuild your applications before they will use the customized messages.


But the error messages themselves are not necessarily useful. For example, if you have a content model (a, (b, c)?, c, d) and the document is <a/><c/><c/><d/>, the error message will be something like "unexpected c found: expecting d", pointing at the second c. But the real error is clearly that the b is missing.


A Schematron implementation that, for example, used partial strings would be able to say that if there are two c elements, there should have been a b element, and so identify the exact problem.
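Here is one way to see the difference, simulated in Python rather than Schematron. The content model is written as a regular expression over child element names; the wrapper element `x` and the function names are hypothetical. The targeted check uses simple counts, which is only an illustration of the idea, not necessarily what "partial strings" would look like in a real implementation. In Schematron itself a similar check could be written in the parent's context as something like `<report test="count(c) = 2 and not(b)">`.

```python
import re
import xml.etree.ElementTree as ET

# The content model (a, (b, c)?, c, d) as a regular expression over
# child element names, each followed by a single space.
CONTENT_MODEL = re.compile(r"a (b c )?c d ")


def child_names(xml_text):
    """Return the element names of the root's children, in order."""
    return [child.tag for child in ET.fromstring(xml_text)]


def grammar_valid(names):
    """Grammar-style check: does the whole sequence match the model?"""
    return CONTENT_MODEL.fullmatch("".join(name + " " for name in names)) is not None


def diagnose(names):
    """Targeted, Schematron-style check with a user-oriented message."""
    messages = []
    if names.count("c") == 2 and "b" not in names:
        messages.append(
            "There are two c elements, so a b element is missing "
            "between the a and the first c."
        )
    return messages


names = child_names("<x><a/><c/><c/><d/></x>")
print(grammar_valid(names))  # False: the grammar only knows the sequence failed
print(diagnose(names))
```

The grammar check can only say that matching failed somewhere; the targeted check knows what the user most likely did wrong.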


Now when we combine Schematron's phase mechanism with its ability to model the document from the point of view of what users typically care about in different use cases, we get a pretty compelling picture. The user could say "I am not interested in missing elements, just the order of the existing ones" and be given the equivalent of Jing's feasible validation (which I asked James Clark to add to Jing, to better support 'progressive validation'). Or the user could say, "I am not interested in foreign elements or content models, just in validating simple types."
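The phase idea can be simulated in a few lines of Python. This is not a Schematron engine: the pattern functions, the phase id `order-only` and the content model (a, b, c, d) are all hypothetical. The `#ALL` name mirrors the ISO Schematron convention for running every pattern. The `order-only` phase roughly corresponds to "I am not interested in missing elements, just the order of the existing ones"; it resembles, but is not the same as, Jing's feasible validation.

```python
# Hypothetical content model: the sequence (a, b, c, d), every element required.
MODEL = ["a", "b", "c", "d"]


def order_pattern(names):
    """Check only that the elements present are known and in model order."""
    messages = []
    positions = []
    for name in names:
        if name in MODEL:
            positions.append(MODEL.index(name))
        else:
            messages.append(f"Unexpected element {name}.")
    # Positions must strictly increase; equal positions mean a repeat.
    if any(later <= earlier for earlier, later in zip(positions, positions[1:])):
        messages.append("Elements are out of order or repeated.")
    return messages


def presence_pattern(names):
    """Check only that every required element is present."""
    return [f"Missing {required}." for required in MODEL if required not in names]


# Patterns grouped into phases, loosely after Schematron's <phase>/<active>.
PATTERNS = {"order": order_pattern, "presence": presence_pattern}
PHASES = {"#ALL": ["order", "presence"], "order-only": ["order"]}


def validate(names, phase="#ALL"):
    """Run only the patterns that are active in the chosen phase."""
    messages = []
    for pattern_id in PHASES[phase]:
        messages.extend(PATTERNS[pattern_id](names))
    return messages


print(validate(["a", "c"]))                # ['Missing b.', 'Missing d.']
print(validate(["a", "c"], "order-only"))  # []
```

The same partial document is an error under one phase and acceptable under another, which is what progressive validation of a document under construction needs.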


Obviously, you could make an XSD validator with these features too. But people have not, and they will not, because grammar-based systems are not user-centric, and people realize that it would just be dressing mutton up as lamb.


Another reason for doing it in Schematron is that it reduces double handling: validation can potentially be integrated with transformation.


Finally, XSLT, which Schematron is typically implemented in, is built into a different range of applications than XSD is: browsers, for example.