A simple ISO NVDL script for preparing ODF XML for validation

by Rick Jelliffe

ISO Namespace-based Validation Dispatching Language (NVDL) is a little language for taking an XML document, dividing it into single-namespace sections, attaching or detaching these sections in various ways, and then sending the resulting sections to the appropriate validation scripts.

NVDL solves several problems that come up with namespaces and, like DSRL, takes a very different approach from XSD (I am not saying one is better or worse: they have different capabilities and so may even be used together). One of these problems is that the official schema often has a wildcard that says "at this point you can put any element", but you really want to limit this to your own elements only, without editing the official schema (and thereby creating versioning and configuration issues).
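As a rough illustration of the wildcard problem, a script along the following lines could leave the official schema untouched while rejecting everything except your own extension namespace. This is only a sketch: the namespace URIs, the mode names and the schema file name `official.rng` are made up for the example, and it is untested.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">

  <mode name="root">
    <!-- Hypothetical official vocabulary, validated with its unedited schema -->
    <namespace ns="http://example.org/official">
      <validate schema="official.rng" useMode="body"/>
    </namespace>
  </mode>

  <mode name="body">
    <!-- Nested sections in the official namespace rejoin their parent -->
    <namespace ns="http://example.org/official">
      <attach/>
    </namespace>

    <!-- Our own (hypothetical) extension elements are attached too,
         so the official schema sees them where its wildcard allows them -->
    <namespace ns="http://example.com/my-extensions">
      <attach/>
    </namespace>

    <!-- Elements from any other namespace are reported as errors -->
    <anyNamespace>
      <reject/>
    </anyNamespace>
  </mode>

</rules>
```

The restriction lives entirely in the NVDL script, so a new version of the official schema can be dropped in without redoing any edits.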

Another of these problems can be found in ODF. It allows foreign elements anywhere, and in order to validate against the schemas you have to strip these out. However, this does not mean just removing each foreign element together with its children: you have to leave the non-foreign descendants in place.
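To make the difference concrete, here is a small made-up fragment before and after stripping. The `my:tracked` element and its namespace are hypothetical; the `text` namespace is the real ODF one.

```xml
<!-- Input: a hypothetical foreign element wrapping ODF content -->
<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        xmlns:my="http://example.com/my-extension">
  <my:tracked>
    <text:span>still here</text:span>
  </my:tracked>
</text:p>

<!-- What the ODF schema should see: my:tracked is gone, its ODF child stays -->
<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:span>still here</text:span>
</text:p>
```

Deleting the whole `my:tracked` subtree would instead leave an empty paragraph, which is not the document the constraint asks you to validate.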

Now this is something that W3C XSD cannot really handle well. You can have a wildcard to allow foreign elements and process them laxly, so that validation starts again when you reach an element in an ODF namespace, but you have no way to check that these elements are correct against the content model you want on the parent of the wildcard. You lose sync.
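A cut-down XSD sketch shows the gap. The `http://example.org/host` vocabulary and its element names are invented for the illustration; they stand in for the ODF schemas.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:h="http://example.org/host"
           targetNamespace="http://example.org/host"
           elementFormDefault="qualified">

  <!-- h:para may hold h:span children and, via the wildcard, foreign elements -->
  <xs:element name="para">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="h:span"/>
        <xs:any namespace="##other" processContents="lax"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <xs:element name="span" type="xs:string"/>

  <!-- Declared globally, but NOT allowed directly inside h:para -->
  <xs:element name="heading" type="xs:string"/>

</xs:schema>
```

With this schema, an `h:heading` placed directly inside `h:para` is an error, but the same `h:heading` wrapped in an undeclared foreign element is accepted: lax processing validates it against its own global declaration and never against the content model of `h:para`.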

Here is the section of ODF 1.1 clause 1.5 which gives the constraint:


Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.

Conforming applications either shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed, or shall write documents that are valid against the OpenDocument schema if all foreign elements are removed before validation takes place.


Hmmm, seems like a job for NVDL.

Here is a rough NVDL script to do this. (It is untested, but thanks to members of the DSDL mailing list for vetting it.)

This script just takes the content.xml file and removes all elements from a foreign namespace, using a few namespace wildcards to decide what counts as foreign. Then it sends the result to be validated using the schema. Note that this is a very coarse sieve: there is no need to get too smart about which namespaces are actually allowed under the main office namespace, because validation will handle that. The purpose of the script is to minimally preprocess the file so that the right elements get dispatched to the appropriate validator.

<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">

  <mode name="root">

    <!-- Validation for content.xml -->
    <namespace ns="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
      <validate schema="super-odf.rng" useMode="odf"/>
    </namespace>

  </mode>

  <mode name="odf">

    <!-- Namespaces used by ODF: keep these sections attached to their parent -->
    <namespace ns="urn:oasis:names:*">
      <attach/>
    </namespace>

    <namespace ns="http://purl.org/*">
      <attach/>
    </namespace>

    <namespace ns="http://www.w3.org/*">
      <attach/>
    </namespace>

    <!-- Anything else is foreign: drop the element, keep its descendants -->
    <anyNamespace>
      <unwrap/>
    </anyNamespace>

  </mode>

</rules>


So there you have it: a nice declarative way to specify the validation pre-processing, which can actually be run with the various NVDL processors around the place.

Now we could duplicate this script to handle the other XML files in an ODF ZIP archive: to say that the styles file should start with the appropriate namespace, and so on. (I think it would be possible to combine them all into one file, actually, so that different root namespaces would cause the stripped document to be dispatched to be validated by different schemas as appropriate.)
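A combined script might look something like the following sketch. It is as untested as the first one, and it rests on some assumptions: `odf-manifest.rng` is a made-up file name for the manifest schema, and the manifest is validated as-is, without foreign-element stripping. Note that content.xml, styles.xml, meta.xml and settings.xml all have a root element in the office namespace, so dispatching on the root namespace alone only separates them from the manifest.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">

  <mode name="root">

    <!-- content.xml, styles.xml, meta.xml, settings.xml -->
    <namespace ns="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
      <validate schema="super-odf.rng" useMode="odf"/>
    </namespace>

    <!-- META-INF/manifest.xml; the schema file name is an assumption -->
    <namespace ns="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
      <validate schema="odf-manifest.rng"/>
    </namespace>

  </mode>

  <mode name="odf">

    <namespace ns="urn:oasis:names:*">
      <attach/>
    </namespace>

    <namespace ns="http://purl.org/*">
      <attach/>
    </namespace>

    <namespace ns="http://www.w3.org/*">
      <attach/>
    </namespace>

    <!-- Foreign elements are unwrapped, as in the first script -->
    <anyNamespace>
      <unwrap/>
    </anyNamespace>

  </mode>

</rules>
```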
Now