Validating Ant with Schematron

by Rick Jelliffe

Charles Goldfarb's idea of using grammars to represent documents has proven itself useful in many situations, and the DTD legacy lives on in ISO RELAX NG and W3C XSD. However, there are many structures that regular grammars, as conventionally implemented, cannot cope with. And it is possible to get a certain cart-before-the-horse mentality about grammars, where any structure that cannot be represented by a grammar is regarded as bad ipso facto.

However, we need to be striving towards systems that free us so that what is congenial to the mind is easy to do on the computer.

I was looking at Ant files recently, and they provide another good example. Ant files are configuration files for a modern make system, open source through Apache and most associated with Java development. They mostly use a defined set of elements and attributes, for which you could write a grammar-based schema quite easily.

But you can extend the set of elements inline, in the document itself. For example, I am updating Christopher Lauret and Willy Ekasalim's Ant task for Schematron, to be available as an Ant extension. In Ant, you just need this:


<target name="test-fileset" description="Test with a Fileset">
  <taskdef name="schematron" classname="com.schematron.ant.SchematronTask"
           classpath="../lib/ant-schematron.jar"/>
  <schematron schema="../schemas/test.sch" failonerror="true" debugmode="false">
    <fileset dir="../xml" includes="*.xml"/>
  </schematron>
</target>


Here the taskdef element declares that there is a task called schematron, which can then be used as an element later.

In Schematron you could validate this by the following:

<sch:pattern>
  <sch:title>Check allowed elements</sch:title>

  <!-- Rule 1: children of target whose name matches an in-scope taskdef -->
  <sch:rule context="target/*[name() = ancestor::*/taskdef/@name]">
    <sch:assert test="true()">
      The target element may contain user-defined tasks.
    </sch:assert>
  </sch:rule>

  <!-- Rule 2: every other child of target must be a built-in task -->
  <sch:rule context="target/*">
    <sch:assert test="self::bunzip2 or self::bzip2 or self::depend or self::javac or ..."
                diagnostics="unknown-name">
      The target element should only have built-in Ant tasks apart from user-defined tasks.
    </sch:assert>
  </sch:rule>

</sch:pattern>
...

<sch:diagnostic id="unknown-name">
  The element <sch:name/> is not one of the built-in types in Ant (at least, as at Ant 1.7.0).
</sch:diagnostic>


Unless I have made a mistake with the XPath, what this does is:

  • The first rule finds every element that is a child of target and for which there is an in-scope taskdef element with that name. In scope means any taskdef that is a child of any ancestor. The assertion in this rule can never fail; it just filters out properly defined extension elements so that they do not fire the second rule.

  • The second rule, which applies to any other element under target, checks against the full list of the built-in Ant tasks.
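To make the two rules concrete, here is a small build file fragment annotated with the rule each child of target would hit. It is only a sketch: the project and target names and the misspelled jvac element are made up for illustration, and the comment on taskdef assumes that taskdef appears in the elided part of the built-in list.

```xml
<project name="demo" default="build">
  <target name="build">
    <!-- Built-in task named in the second rule's list: assertion passes. -->
    <javac srcdir="src" destdir="classes"/>

    <!-- Also a built-in task; assumed to be in the elided part of the list. -->
    <taskdef name="schematron" classname="com.schematron.ant.SchematronTask"
             classpath="../lib/ant-schematron.jar"/>

    <!-- Declared by the taskdef above: matched by the first rule,
         so it never reaches the second rule. -->
    <schematron schema="../schemas/test.sch">
      <fileset dir="../xml" includes="*.xml"/>
    </schematron>

    <!-- Hypothetical typo: no taskdef declares it and it is not built in,
         so the second rule's assertion fails and the diagnostic names it. -->
    <jvac srcdir="src" destdir="classes"/>
  </target>
</project>
```

Two things to notice. The nested fileset is not checked at all, because both rule contexts only select direct children of target. And the XPath pays no attention to document order, so a taskdef that appears after the element it declares would still count as in scope.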



That grammars cannot represent this is not just a lost opportunity for better validation; after all, the Ant program itself can generate messages. It is also a real shortfall for documentation: I cannot find any one place in the Ant documentation where all the structural rules are consolidated. I suppose if you are not used to going to a schema first, then you might not miss it, but I think one of the major conveniences of DTDs, RELAX NG compact syntax, and Schematron is the terse collection of structural rules in one place, like a help card for programmers.

I have added a little diagnostic message too: just to let the user know what the unexpected element actually was. It isn't part of the main assertion so that the assertions are "pure" positive descriptions of what should be.
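For readers who want to see how the assertion and the diagnostic are wired together, here is a minimal sketch of a complete ISO Schematron schema around the fragments shown earlier. The schema title is my own, and the list of built-in tasks is deliberately cut down to a handful of names for illustration (including taskdef, which the earlier fragment does not spell out); a usable schema would need the full list.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Sketch: allowed children of an Ant target</sch:title>

  <sch:pattern>
    <sch:title>Check allowed elements</sch:title>

    <!-- Extension tasks declared by an in-scope taskdef always pass. -->
    <sch:rule context="target/*[name() = ancestor::*/taskdef/@name]">
      <sch:assert test="true()">
        The target element may contain user-defined tasks.
      </sch:assert>
    </sch:rule>

    <!-- Illustrative subset only, NOT the full list of built-in tasks. -->
    <sch:rule context="target/*">
      <sch:assert test="self::bunzip2 or self::bzip2 or self::depend
                        or self::javac or self::taskdef"
                  diagnostics="unknown-name">
        The target element should only have built-in Ant tasks apart from user-defined tasks.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Diagnostics sit after the patterns and are referenced by id. -->
  <sch:diagnostics>
    <sch:diagnostic id="unknown-name">
      The element <sch:name/> is not one of the built-in types in Ant (at least, as at Ant 1.7.0).
    </sch:diagnostic>
  </sch:diagnostics>
</sch:schema>
```

The assertion text stays a positive statement of the rule, while the diagnostic, attached through the diagnostics attribute, carries the instance-specific detail by way of sch:name.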

Now, let's assume you are a Vigorous Grammar Fanboy (VGF). You object: why not just have a container element like user-task for all the points where you want these, along the lines of the CustomXML elements in OOXML, where the name of the desired element is effectively in an attribute rather than in the actual element name? First, because it is ugly. Second, because it emphasizes that this is an extension element, which is of interest during setup and is extraneous information afterwards. Third, because a grammar uses the element name to determine the contents of the element, so a generic container leaves you stuck on that front anyway. And fourth, because it is not what the original writers found idiomatic, direct and minimal. Or was that point one again?
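To show what the VGF is asking for, here is the earlier target rewritten with such a container. This is purely hypothetical markup: Ant has no user-task element, and the name attribute is my guess at how the idea would look.

```xml
<target name="test-fileset" description="Test with a Fileset">
  <!-- Hypothetical container: the real task name moves into an attribute. -->
  <user-task name="schematron" schema="../schemas/test.sch"
             failonerror="true" debugmode="false">
    <fileset dir="../xml" includes="*.xml"/>
  </user-task>
</target>
```

A grammar can now declare user-task once, but every extension task shares that one declaration, so the grammar has to allow the union of all their attributes and children. The content model no longer says anything specific about schematron.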

But you, the VGF, are not content with that. Oh no, you are relentless, like a killer whale attacking a seal pup on the beach. You say "Err, isn't this what namespaces are for?" And, indeed, Ant is starting to add support for namespaces, which may in time supersede this. My answer: namespaces are difficult for the kind of developers who are making Ant tasks, who are probably not addressing XML problems at all. And namespaces pose more problems for users. In fact, the Ant declaration system is one of binding a local name to a class, and so it is no more prone to name clashes than if namespaces had been used (i.e. conflicts with the same element name are no different from conflicts with the same prefix).
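For comparison, this is roughly what the namespaced form looks like, as far as I understand Ant's namespace support (Ant 1.6 and later), where taskdef accepts a uri attribute. The URI urn:example:schematron-task and the sch prefix are placeholders of mine, not anything the Schematron task defines.

```xml
<project name="demo" default="test-fileset"
         xmlns:sch="urn:example:schematron-task">

  <!-- Bind the task to a namespace URI instead of a bare local name. -->
  <taskdef name="schematron" uri="urn:example:schematron-task"
           classname="com.schematron.ant.SchematronTask"
           classpath="../lib/ant-schematron.jar"/>

  <target name="test-fileset" description="Test with a Fileset">
    <sch:schematron schema="../schemas/test.sch">
      <!-- How nested core types are resolved inside a namespaced task
           varies between Ant versions; check the manual for yours. -->
      <fileset dir="../xml" includes="*.xml"/>
    </sch:schematron>
  </target>
</project>
```

The user still has to pick a prefix and keep it from colliding with other prefixes in the same file, which is the same bookkeeping as picking a task name in the plain taskdef form.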

So a quick comment to developers: if you have used XML for configuration files or other things, and then found that XSD doesn't have enough power to represent what you have, it is most likely that ISO Schematron can do the job, and do it with clearer diagnostics.

6 Comments

Dan McCreary
2008-05-29 18:52:51
Nice post! I have been wanting to validate Ant files myself for a long time. But I could never find an ant.xsd. Schematron to the rescue again. Schematron really is more powerful and you can create much more user-friendly error messages. So is there a full Schematron rules file for ant yet? ;-)
Rick Jelliffe
2008-05-29 19:20:07
Dan: Nope, I got too bored wading through the Ant documentation to do it. If anyone wants to do it, I am happy to post it here: it shouldn't be difficult. I think using the same model as the second rule to handle all containment should be enough for something useful: I don't think Ant has fiddly sequence or cardinality rules (but, without a schema, who knows, actually?)
Marc van Grootel
2008-05-30 00:43:39
There is this core Ant task called which generates the DTD for an Ant buildfile (containing the rules for all defined tasks).
Marc van Grootel
2008-05-30 00:44:04
There is this core Ant task called which generates the DTD for an Ant buildfile (containing the rules for all defined tasks).
Marc van Grootel
2008-05-30 00:46:02
Sorry for the comment spam ;-) The name of the task got filtered out. The Ant task is "antstructure".
Rick Jelliffe
2008-06-01 23:43:00
Marc: Thanks for the heads-up! Yes, you can retrofit a DTD when you know what the extension elements are (or if you use the ANY content type, perhaps) but the Schematron schema works out-of-the-box before you know what your extensions are.