A DSRL script for mapping from Schematron 1.n to ISO Schematron

by Rick Jelliffe

ISO Document Schema Renaming Language (DSRL) is one of Martin Bryan's contributions to the ISO Document Schema Definition Languages (DSDL) project at JTC1 SC34 WG1. The project brings together various technologies by Murata Makoto, James Clark, Martin Duerst, Jenni Tennison, and others (including me) to try to build a layered solution to validation using a variety of "little languages".

I don't need to go into the advantages of little languages, though I will say that one major concern is that large languages disenfranchise the solo and part-time developer. This is perhaps no concern if you are a large corporation (though it will become one as the maintenance crunch sets in), but it is a definite issue otherwise. Of course, there are disadvantages too: we might hope that a little language would be easier to reason about than a large one, but little languages may concentrate on depth rather than breadth, and this extra bang-per-buck can make every case harder to understand. Furthermore, the little languages still need to be combined, and this has its own perils. But admitting these possibilities does not diminish the usefulness of the approach.

A common issue with standards is how to cope with changes from the pre-standard technology to the standard one. Schematron is a typical case: moving from Schematron 1.6 to ISO Schematron involved:

  • Swapping to a new namespace

  • In the pattern element, replacing the name attribute with a title subelement

  • Removing the sch:key element but recommending xsl:key instead.



All these changes are cosmetic as far as functionality is concerned, but they prevent a Schematron 1.n schema from being a valid ISO Schematron schema.

This kind of renaming problem is not just reserved for the initial step of making a standard. During the life of a schema, different values and names may come into fashion. Sometimes people decide to take a broom through a schema to consolidate names and allowed values.

And this is where DSRL (pronounced DISRULE, as in being against a central authority) comes in. It is a simple declarative language that maps "from" names and values to "to" names and values. You can make maps for namespaces, element names, attribute names, PI targets, element values and attribute values (including token lists).

Most topically, given the recent ODF discussions, you can also declare maps for the default values of attributes and elements: in fact, it is now looking as if the ODF facilities for DTD-compatible attribute default declarations are so fraught with complexity and ugliness that they should be avoided. One really interesting, but problematic, feature is the ability to provide declarations for undeclared entity references in the document (a feature often requested by the publishing industry) and to rename entity references. This may be quite useful now that SC34 has given the ISO standard entity sets for special characters to the W3C MathML group to maintain: that group places a high premium on HTML compatibility, even where HTML is wrong.
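To make the kinds of map described above a little more concrete, here is a minimal Python sketch of what applying name, value and default maps to a document involves. It is not DSRL syntax and not Martin Bryan's implementation: the tables (ELEMENT_MAP, ATTRIBUTE_MAP, VALUE_MAP, DEFAULTS), the element and attribute names in them, and the apply_maps function are all hypothetical, and it only covers element names, attribute names, attribute values (token by token) and attribute defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical "from -> to" tables in the spirit of DSRL maps (not DSRL syntax).
ELEMENT_MAP = {"para": "p"}                                # element names
ATTRIBUTE_MAP = {"kind": "type"}                           # attribute names
VALUE_MAP = {"type": {"warn": "warning", "err": "error"}}  # values, keyed by new attribute name
DEFAULTS = {"p": {"type": "normal"}}                       # defaults, keyed by new element name


def apply_maps(elem):
    """Apply the maps to elem and all its descendants, in place."""
    elem.tag = ELEMENT_MAP.get(elem.tag, elem.tag)
    # Rename attributes, keeping their values.
    for old in list(elem.attrib):
        new = ATTRIBUTE_MAP.get(old, old)
        if new != old:
            elem.attrib[new] = elem.attrib.pop(old)
    # Map values token by token, so a token list such as "warn err" is handled too.
    for name, table in VALUE_MAP.items():
        if name in elem.attrib:
            tokens = elem.attrib[name].split()
            elem.attrib[name] = " ".join(table.get(t, t) for t in tokens)
    # Supply a default only where the attribute is absent.
    for name, value in DEFAULTS.get(elem.tag, {}).items():
        elem.attrib.setdefault(name, value)
    for child in elem:
        apply_maps(child)
    return elem


doc = ET.fromstring('<doc><para kind="warn err">Hi</para><para>Plain</para></doc>')
print(ET.tostring(apply_maps(doc), encoding="unicode"))
```

Because each table is just data, the same tables could equally drive a renaming of schemas or a reverse mapping; the function is only one way of consuming them.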


DSRL is now at a very late draft stage, and I expect it to be finalized this year. DSRL is declarative: it provides mappings, and although it could be used to rename items in schemas, Martin Bryan's open source XSLT implementation takes the more direct route of renaming the document itself. The implementation is available in the ZIP file at the DSDL.ORG site.

For a flavour, here are the renaming rules for the changes from Schematron 1.n to ISO Schematron listed above.


<dsrl:maps
    xmlns:dsrl="http://purl.oclc.org/dsdl/dsrl"
    xmlns:sch="http://www.ascc.net/xml/schematron"
    xmlns:iso="http://purl.oclc.org/dsdl/schematron"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <dsrl:element-map>
    <dsrl:from>sch:schema</dsrl:from> <dsrl:to>iso:schema</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:title</dsrl:from> <dsrl:to>iso:title</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:phase</dsrl:from> <dsrl:to>iso:phase</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:active</dsrl:from> <dsrl:to>iso:active</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:pattern</dsrl:from> <dsrl:to>iso:pattern</dsrl:to>
    <!-- strip the name attribute: DSRL cannot turn it into a title element -->
    <dsrl:attribute-map> <dsrl:name>name</dsrl:name></dsrl:attribute-map>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:rule</dsrl:from> <dsrl:to>iso:rule</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:extends</dsrl:from> <dsrl:to>iso:extends</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:assert</dsrl:from> <dsrl:to>iso:assert</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:report</dsrl:from> <dsrl:to>iso:report</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:diagnostics</dsrl:from> <dsrl:to>iso:diagnostics</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:diagnostic</dsrl:from> <dsrl:to>iso:diagnostic</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:let</dsrl:from> <dsrl:to>iso:let</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:p</dsrl:from> <dsrl:to>iso:p</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:span</dsrl:from> <dsrl:to>iso:span</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:value-of</dsrl:from> <dsrl:to>iso:value-of</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:name</dsrl:from> <dsrl:to>iso:name</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:dir</dsrl:from> <dsrl:to>iso:dir</dsrl:to>
  </dsrl:element-map>
  <dsrl:element-map>
    <dsrl:from>sch:emph</dsrl:from> <dsrl:to>iso:emph</dsrl:to>
  </dsrl:element-map>
  <!-- ISO Schematron has no key element: use XSLT's key instead -->
  <dsrl:element-map>
    <dsrl:from>sch:key</dsrl:from> <dsrl:to>xsl:key</dsrl:to>
  </dsrl:element-map>
</dsrl:maps>


What does it do? Replacing a whole namespace is quite rare, so the declaration is not as simple as one might imagine: you rename each element explicitly. The last entry handles the special case of sch:key.
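Since every element has to be listed, the repetitive entries are easier to generate than to type. The Python sketch below is a hypothetical helper (element_maps and LOCAL_NAMES are my own names, not part of DSRL or any DSDL tool) that writes one element-map per local name, using the namespaces declared in the maps above; the pattern attribute-map and the sch:key entry would still be added by hand.

```python
import xml.etree.ElementTree as ET

DSRL_NS = "http://purl.oclc.org/dsdl/dsrl"
OLD_NS = "http://www.ascc.net/xml/schematron"    # Schematron 1.n
NEW_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron

# Local names that only change namespace (taken from the maps above).
LOCAL_NAMES = ["schema", "title", "phase", "active", "pattern", "rule",
               "extends", "assert", "report", "diagnostics", "diagnostic",
               "let", "p", "span", "value-of", "name", "dir", "emph"]


def element_maps(local_names):
    """Return DSRL text with one element-map per name, moving it from sch: to iso:."""
    out = [f'<dsrl:maps xmlns:dsrl="{DSRL_NS}"',
           f'    xmlns:sch="{OLD_NS}" xmlns:iso="{NEW_NS}">']
    for name in local_names:
        out.append("  <dsrl:element-map>")
        out.append(f"    <dsrl:from>sch:{name}</dsrl:from> <dsrl:to>iso:{name}</dsrl:to>")
        out.append("  </dsrl:element-map>")
    out.append("</dsrl:maps>")
    return "\n".join(out)


text = element_maps(LOCAL_NAMES)
ET.fromstring(text)  # raises ParseError if the generated XML is not well-formed
print(text)
```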

The sch:pattern element has a name attribute, which ISO Schematron regularized into a title element, but there is no way to declare this in DSRL. DSRL is not a general-purpose transformation language like XSLT (though it can be translated into XSLT, as Martin's implementation, which follows the Schematron pattern, does). In fact, it is just as convenient for the reverse direction (mapping documents that use the new names back to the old ones), or, with a suitable implementation, for renaming schemas rather than documents: in a sense, it specifies the mapping, not the transformation. So the best we can do is simply strip that attribute out: it is not required for validation.
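The point that a mapping can be run in either direction, while stripping an attribute cannot be undone, can be illustrated with a small Python sketch. It is not Martin Bryan's XSLT implementation: FORWARD, invert, to_iso and to_old are hypothetical names, and only a few of the element maps are included. Inverting a one-to-one name map gives the reverse mapping for free, but the round trip cannot restore the dropped pattern/@name.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefixes, i.e. {namespace-uri}local-name.
SCH = "{http://www.ascc.net/xml/schematron}"
ISO = "{http://purl.oclc.org/dsdl/schematron}"
XSL = "{http://www.w3.org/1999/XSL/Transform}"

# A few of the element maps, plus the sch:key special case.
FORWARD = {SCH + n: ISO + n for n in ("schema", "pattern", "rule", "assert", "report")}
FORWARD[SCH + "key"] = XSL + "key"


def invert(mapping):
    """Reverse a name map; this only works if the map is one-to-one."""
    reverse = {to: frm for frm, to in mapping.items()}
    if len(reverse) != len(mapping):
        raise ValueError("map is not one-to-one, so it cannot be reversed")
    return reverse


BACKWARD = invert(FORWARD)


def to_iso(root):
    """Rename old elements to ISO ones, in place."""
    for e in root.iter():
        e.tag = FORWARD.get(e.tag, e.tag)
        if e.tag == ISO + "pattern":
            e.attrib.pop("name", None)  # lossy: the old name attribute is simply dropped
    return root


def to_old(root):
    """Rename ISO elements back to the old ones, in place."""
    for e in root.iter():
        e.tag = BACKWARD.get(e.tag, e.tag)
    return root


doc = ET.fromstring('<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">'
                    '<sch:pattern name="p1"><sch:rule context="a"/></sch:pattern>'
                    '</sch:schema>')
to_old(to_iso(doc))
print(doc[0].tag, doc[0].attrib)
```

After the round trip the element names are back in the Schematron 1.n namespace, but the pattern has no name attribute any more.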

I think an important aspect of DSRL is that it shows that SC34 WG1 is asking fundamentally different questions from the W3C XML Schema WG, which is not to say that one is necessarily asking better questions at all! In XSD you have various facilities such as import, redefine, substitution groups, and type derivation by restriction and extension, but there is no systematic facility for name and value mapping: for saying "What I used to call xxx:yyy I am now calling aaa:bbb!" XSD is not interested in PIs or entities, of course.

So where is WG1 going with all this? The DSDL project is taking time: there has been no shortage of distractions. It has no support from large companies, much as we would welcome it, and no publicity or marketing budget, so it has to stand or fall squarely on its technical merits, in a market that would really prefer some way to shoehorn XSD into doing all this. Of course, in a rational world the large corporate (open and closed source) developers would see DSRL as a simple pre-processor for XSD that can help with many migration and maintenance issues: as an adjunct. But we are not holding our breath!

But my vision is that in the near term, with DSRL completing the base DSDL quartet of RELAX NG, NVDL, DSRL and Schematron, standards developers will start to take them on board as a package:

  • ISO NVDL selecting the particular schemas for different namespaces and culling foreign elements as desired

  • ISO DSRL renaming, localization and providing default values to handle common evolution cases

  • ISO RELAX NG performing grammar-based validation, extended with its XSD data types

  • ISO Schematron performing more complex and detailed validation
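The division of labour in that list can be pictured as a simple pipeline. This Python sketch only illustrates the ordering; every stage is a toy stand-in (the functions nvdl_cull, dsrl_rename, grammar_check and rule_check and the urn:example:doc namespace are hypothetical), not a real NVDL, DSRL, RELAX NG or Schematron processor. Real NVDL dispatches sections to different schemas; here it merely culls foreign elements. The point it shows is that because renaming runs before grammar validation, a document still using an old name can validate against the new schema.

```python
import xml.etree.ElementTree as ET

NS = "{urn:example:doc}"  # hypothetical vocabulary namespace

# Toy stand-ins for each layer; none of these is a real DSDL processor.
RENAMES = {NS + "para": NS + "p"}  # old name -> new name
ALLOWED = {NS + "doc", NS + "p"}   # the only elements the "grammar" declares


def nvdl_cull(root):
    """NVDL-like step: remove elements from foreign namespaces."""
    for parent in list(root.iter()):
        for child in list(parent):
            if not child.tag.startswith(NS):
                parent.remove(child)
    return root


def dsrl_rename(root):
    """DSRL-like step: map old element names to new ones."""
    for e in root.iter():
        e.tag = RENAMES.get(e.tag, e.tag)
    return root


def grammar_check(root):
    """RELAX NG stand-in: report any element the grammar does not declare."""
    return [f"undeclared element {e.tag}" for e in root.iter() if e.tag not in ALLOWED]


def rule_check(root):
    """Schematron stand-in: assert that every p element has some text."""
    return ["p element has no text" for e in root.iter(NS + "p")
            if not (e.text or "").strip()]


def validate(xml_text):
    root = dsrl_rename(nvdl_cull(ET.fromstring(xml_text)))
    return grammar_check(root) + rule_check(root)


doc = ('<doc xmlns="urn:example:doc" xmlns:x="urn:example:other">'
       '<para>Old name</para><x:note>foreign</x:note><p/></doc>')
print(validate(doc))
# Without the renaming step, the old name fails the grammar check:
print(grammar_check(nvdl_cull(ET.fromstring(doc))))
```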



A couple of years ago we finally reached the point where people had a pretty realistic appreciation of the proper limits of XSD functionality, and I think we are now arriving at the same level of maturity with RELAX NG. As these limits become common knowledge, I think the need for NVDL and DSRL (for both XSD and RELAX NG) will similarly become better known.

My prediction is that it will increasingly occur to community standards bodies that their standards have quite a number of constraints or gotchas that are poorly expressed in English but much clearer (and machine-verifiable) when expressed using DSRL (and NVDL and Schematron).

1 Comment

bryan
2008-06-04 10:16:38
ok well I think I'll stick with the 4-5 line transformation to do the same thing. in a dsl I appreciate code brevity.