Expressing untested and untestable constraints in Schematron

by Rick Jelliffe

Schematron is an ISO standard schema language for making assertions about the presence or absence of patterns in XML documents. It is fairly widely used, in fields from publishing and transport to finance, insurance, and health systems, but is not yet supported by the major vendors. Schematron aims to be a general-purpose (rather than domain-specific) rules language, both for expressing the kinds of complex structural rules that are beyond the reach of XML Schemas and for expressing simple business rules. Most people use my open source XSLT implementations of Schematron 1.5 (at http://www.ascc.net/xml/schematron), which are being upgraded to the ISO spec (at http://www.schematron.com), but versions exist from other developers in Python, Perl, C#, and Java.

One of the aims of Schematron was to allow all the constraints in a system to be printed out in bullet list form: literate programming comes to schemas. ISO Schematron allows you to put requirements in free text paragraphs (the customer's view), then to put the natural language assertions that test these in bullet point form (the analyst's view), then to arrange these assertions and mark them up with the appropriate IDs and XPaths (the developer's view). This can improve traceability from requirements to analysis to implementation for validators.
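
A minimal sketch of how the three views can sit together in one pattern. The invoice vocabulary (invoice, total, item, amount), the IDs, and the requirement text are all hypothetical, made up for illustration only:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="invoice-totals">
    <sch:title>Invoice totals</sch:title>

    <!-- Customer's view: the requirement as a free text paragraph -->
    <sch:p>Every invoice must state a total, and the total must agree
      with the line items.</sch:p>

    <sch:rule context="invoice">
      <!-- Analyst's view: the natural language text of each assertion.
           Developer's view: the id and test attributes. -->
      <sch:assert id="inv-total-present" test="total">An invoice shall
        have a total.</sch:assert>
      <sch:assert id="inv-total-sum" test="total = sum(item/amount)">The
        total shall equal the sum of the item amounts.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Printing only the sch:p and assertion text gives the bullet list; the IDs let a failed assertion be traced back to the requirement it came from.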

But one persistent problem has been that business requirements are often untestable. For example, a business quality requirement that "The document shall be maintainable." is legitimate, but not necessarily the kind of thing that you would use a schema to test. (Actually, now that I think about it, I wonder whether it is possible to express the Document Structure Complexity Metric as an XPath that an assertion tests... hmmm)

And there is another kind of constraint that is not tested but will be testable later: perhaps you haven't got the XPath skills to create the test, or perhaps it is based on some future event, such as "All dates in this document must be during the US presidency of G.W.Bush."

So must these kinds of constraints stay out of a Schematron schema altogether, or remain only as comment-like paragraphs?

What we can do is have dummy assertions, which never fail and provide a place to park these kinds of constraints. Let's make up a pattern for them, and we will use two roles, "Untestable" and "Unimplemented", to distinguish some of the reasons why the assertion does not have a fallible test.


<sch:pattern>
<sch:title>Untested Assertions</sch:title>

<sch:rule context="/">
<sch:assert test="true()" role="Untestable" >The document shall be maintainable</sch:assert>
<sch:assert test="true()" role="Unimplemented" >All dates must be during the term of G. W. Bush.</sch:assert>
</sch:rule>
</sch:pattern>


Now the constraints are "part of the system" the same as testable constraints, and their status as untested or untestable (by Schematron) is explicit. There might be other roles too: "RequiresCustomTestApplication" for example.
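
As a rough sketch, here is the same kind of pattern placed in a complete schema with the ISO Schematron namespace declared, plus a third placeholder using the "RequiresCustomTestApplication" role. The pattern id and the wording of the third constraint are hypothetical:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="untested">
    <sch:title>Untested Assertions</sch:title>

    <sch:rule context="/">
      <!-- Each test is true(), so none of these assertions can fail -->
      <sch:assert test="true()" role="Untestable">The document shall
        be maintainable</sch:assert>
      <sch:assert test="true()" role="Unimplemented">All dates must be
        during the term of G. W. Bush.</sch:assert>
      <!-- Hypothetical constraint that needs a checker outside Schematron -->
      <sch:assert test="true()" role="RequiresCustomTestApplication">Every
        linked image shall exist on the publishing server.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```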

4 Comments

Lars
2007-03-20 08:00:33
Sounds good! In the in-house modeling software we're developing, we have rules in our Schematron schema, which, like you said, just haven't yet been implemented (they depend on external infrastructure that's not yet in place), or we haven't figured out whether/how we're going to implement them. But we needed to get them written down in a place where we wouldn't lose them: our Schematron schema. Right now, they're in the form of XML comments. Converting them to asserts would be a definite gain, so that they can be seen outside of the XML document... even if they can't be automatically checked.


One can envision Schematron validation reports that would summarize the results of automated tests, and would also list the non-automated tests under a section heading or some other decoration that showed them to be unchecked.


Two questions/thoughts:


- What is the default role? "Implemented"?


- Why "role"? Is "role" an attribute that already exists in Schematron, and you're retrofitting this concept onto it? In the general case, the term 'role' might fit this concept, but it doesn't seem like "Untestable" or "Unimplemented" are roles in the intuitive sense of the word 'role'. I guess it makes sense if you think of it as "what do I do with this constraint?" In that case, "untestable" and "unimplemented" are *characteristics* of these constraints that *imply* a role, namely, for human consumption and not automated evaluation. So the role of the constraint could be "documentation", for example; a default role would be "automatedEvaluation". (bleah!)


Or more to the point, you could have a boolean attribute automated="yes|no" (default yes); and for the non-automated ones, if desired, you could have another attribute describing why (e.g. whyNotAutomated="unimplemented|untestable").


You might also have "disabled" as a value for whyNotAutomated: you have an XPath expression that can check the constraint, but since you've just changed the rules, none of your data passes the test. So you disable the new rule for a while so that the rest of the rules are still usable.


Thanks again for a great tool. We get a lot of mileage out of Schematron. We have a complicated data model, and we'd have to spend a lot of time debugging our mistakes if we didn't have a good constraint checker.

Rick Jelliffe
2007-03-20 20:03:17
(sch:assert|sch:report|sch:rule)/@role is an attribute defined by ISO Schematron (and before!).


The meaning of role is fairly open ended: it can be used to describe the role of the assertion in the schema, or the role of the subject nodes of the rule or assertions. (There is an optional attribute @subject which is a relative XPath that can be used to identify the specific node to which an assertion applies, to help this.) So you can get various kinds of arc and node labelling, particularly when used with sch:report, as well as "warning|caution|note" kinds of augmentation.
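
A small sketch of @role and @subject on both sch:assert and sch:report. The section, title, and para element names, the threshold of 20, and the role values chosen are only illustrative assumptions:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="section">
      <!-- role labels the severity of the assertion -->
      <sch:assert test="title" role="caution">A section shall have
        a title.</sch:assert>

      <!-- subject is an XPath relative to the rule context; here it
           points at the first paragraph beyond the limit -->
      <sch:report test="count(para) > 20" role="warning"
                  subject="para[21]">This section has more than 20
        paragraphs and may be hard to maintain.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```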


There is no default role.


Strictly, "Untestable" and "Unimplemented" are not roles in the sense of labelling an identified subject node. However, the concept is orthogonal (no test, no possible subject node!) so I don't see that it does any violence. And ISO Schematron in general errs on the side of under-specificity in order to allow this kind of growth. And I take your point that there may be more declarative and less functional names that are appropriate: "Documentation" and "Placeholder", for example.


For disabling specific rules, there is nothing stopping an implementation from turning off rules (it would still need to evaluate the contexts to make the rule switch work). I was actually thinking of putting this into the ISO Schematron implementation. Or an implementation could just enable or disable assertions based on their @role, for that matter.
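
A different mechanism that ISO Schematron already defines is the phase, which selects the patterns that are active for a validation run. Phases work on whole patterns, not on individual rules or roles, so the sketch below is only an approximation of per-rule disabling. The pattern and phase IDs and the section/title element names are hypothetical:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            defaultPhase="checked">

  <!-- Default run: only the pattern with real tests -->
  <sch:phase id="checked">
    <sch:active pattern="structure"/>
  </sch:phase>

  <!-- Full run: also report on the placeholder pattern -->
  <sch:phase id="everything">
    <sch:active pattern="structure"/>
    <sch:active pattern="untested"/>
  </sch:phase>

  <sch:pattern id="structure">
    <sch:rule context="section">
      <sch:assert test="title">A section shall have a title.</sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="untested">
    <sch:rule context="/">
      <sch:assert test="true()" role="Unimplemented">All dates must be
        during the term of G. W. Bush.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```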


The other interesting attribute in ISO Schematron is the flag attribute. I'll write a blog post about it sometime soon.


The ISO Schematron schema is open to other namespaces, so you can add your own custom elements and attributes as you need them.
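
For instance, the automated and whyNotAutomated attributes suggested above could be carried in a foreign namespace. This is only a sketch: the ex prefix, its namespace URI, and the attribute names are hypothetical and not part of ISO Schematron, and a validator would ignore them unless it were written to look for them:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:ex="http://example.com/ns/schematron-extensions">
  <sch:pattern>
    <sch:title>Untested Assertions</sch:title>
    <sch:rule context="/">
      <!-- ex:* attributes are custom annotations in a foreign namespace -->
      <sch:assert test="true()" role="Unimplemented"
                  ex:automated="no"
                  ex:whyNotAutomated="unimplemented">All dates must be
        during the term of G. W. Bush.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```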

Lars
2007-03-21 15:04:58
By the way, excuse my ignorance, but is there a way to add a prose description to a sch:rule, as there is with sch:assert?

Rick Jelliffe
2007-03-22 02:44:46
Lars: No, not for rules. You would have to put it in pattern-level documentation, using the sch:pattern/sch:p elements. Or you can link from a rule using sch:rule/@see.
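
A short sketch of both options. The pattern id, the element names, and the URL in @see are placeholders, not real documentation:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="section-structure">
    <sch:title>Section structure</sch:title>

    <!-- Pattern-level prose that describes the rules below -->
    <sch:p>Sections are the unit of reuse, so each one needs a title
      that can be shown in a table of contents.</sch:p>

    <!-- see links the rule to external documentation -->
    <sch:rule context="section"
              see="http://example.com/docs/rules.html#sections">
      <sch:assert test="title">A section shall have a title.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```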


This may be one of the differences between "types" and "patterns", in that XSD-style types adhere to an element (or attribute) and describe its allowed children or contents. A pattern is a grouping of assertions that make some compound structure or represent some abstraction from the mind of the schema creator. In a type system, the subject node (i.e. the element being declared to have a certain type) is the organizing principle for schemas; in a pattern system, the various subjects (i.e. the context nodes, usually) are not being defined as having a type but as participating in something bigger (the pattern). (Of course, both end up giving constraints to nodes.)


The schema for ISO Schematron is at http://www.schematron.com/iso/iso-schematron.rnc