Why Schematron (or something like it) will win

by Rick Jelliffe

The more I think about it, the more I think that the reason Schematron (or something like it) will ultimately win (i.e. consolidate into a mainstream place as a schema language with broad vendor support) is that it asks a fundamentally different, and more important, question about documents than the grammar-based schema languages, such as DTDs and XSD. The question it asks is "What information does a schema need to provide to the user and application?" A related question is "How do we group that information so that it can be sequenced to the user and application in some useful order?", which is just as important when there is a large amount of information.
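To make those two questions concrete, here is a minimal sketch in Python. It is not Schematron, and it is not how any Schematron implementation works; it only mimics the idea of assertions that carry a human-readable message and are grouped into named phases. The names `Assertion`, `validate`, `RULES` and the sample order document are all hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assertion:
    context: str                        # tag name of the elements to check
    test: Callable[[ET.Element], bool]  # returns True when the element is acceptable
    message: str                        # domain-specific text reported on failure
    phase: str                          # named group, so messages can be sequenced


def total_matches(order: ET.Element) -> bool:
    """True when <total> equals the sum of qty * price over the order's items."""
    expected = sum(int(i.get("qty")) * float(i.get("price")) for i in order.iter("item"))
    return abs(expected - float(order.findtext("total"))) < 0.005


# Hypothetical rules: each one answers "what should the user be told?"
RULES = [
    Assertion("item", lambda el: int(el.get("qty")) > 0,
              "Each item must have a quantity of at least 1.", "data-entry"),
    Assertion("order", total_matches,
              "The order total must equal the sum of its line items.", "accounting"),
]

# Hypothetical sample document, assumed well-formed with qty/price on every item.
ORDER = ('<order><item qty="0" price="9.50"/>'
         '<item qty="2" price="3.00"/><total>6.00</total></order>')


def validate(xml_text: str, rules: List[Assertion], phase: str) -> List[str]:
    """Return the messages of every failed assertion in the chosen phase."""
    root = ET.fromstring(xml_text)
    return [rule.message
            for rule in rules if rule.phase == phase
            for el in root.iter(rule.context) if not rule.test(el)]


if __name__ == "__main__":
    for phase in ("data-entry", "accounting"):
        print(phase, validate(ORDER, RULES, phase))
```

The point of the sketch is that the outcome of validation is a list of messages written by the schema author for a particular audience, selected by phase, rather than a fixed valid/invalid verdict.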

Contrast this with, say, XSD, where the question gets morphed into "What canned outcomes can validation have, independent of the schema's actual semantics and domain?" XSD is crippled in this regard because there is not enough guidance about the appinfo and documentation annotation elements: are they for human end-users, for schema management, or what? They are wasted elements.

Now I guess underlying this is the idea that the central issue with building computer systems is ultimately how to relate information to humans in ways they will understand. (I want to use the dreaded word "empower" here, to my shame.) That includes developers, but developers are only the initial target group, not the only ones.

5 Comments

Dan McCreary
2007-07-05 15:09:01
Are you saying that Schematron should replace XML Schemas or should they just supplement XML Schemas?


We use XML Schemas extensively to capture document structure. We empower our users to draw "pictures" using XMLSpy that become the requirements for the structure of a document.


XML Schemas seem to be a very fast way to validate XML documents. I just finished up a project in which we validated over 100,000 documents in under two minutes using XML Schema.


But XML Schemas do have their limitations. And perhaps this is OK. Little languages promote separation of concerns. So why not just add Schematron as a separate pass for validating complex business rules, where the rules are not just leaf-level checks but complex XPath expressions?

Phil Fearon
2007-07-05 16:03:07
Whether a standard 'wins' is sometimes more down to the availability of tools that support the standard, or the role for which the standard is promoted (the 'killer app'), rather than the technical merits of the standard itself.
I mention this because I'm a fan of 'as-you-type' validation at the editing front end, so document authors get useful warnings before they publish if the data structure in their document is inconsistent with business rules and the like. Schematron's flexibility seems to lend itself well to this, and it would be good to see more tools that assist with this.

Rick Jelliffe
2007-07-08 00:04:34
Dan: In the short term, of course they will only supplement for most people, who have lined up for the grammar Kool-Aid. And, yes, separation of concerns is one good thing in the trade-off; but unified models have good aspects too.


But in the long-term, I think the whole notion of validation will re-align itself along lines that Schematron also travels. (I am actually going to make a new blog article with material that I wrote in response, here.)


If you validated 100,000 documents in under two minutes, that is remarkable. Great! It would be interesting to know which implementation of XSD you used and what size the documents were. I would certainly expect that you have validated your validation process, by including a bad document at the end. There is a notorious flaw in XSD: if there is a namespace error, it is possible for a document to be marked valid without actually being checked against any schema.

piers
2007-07-10 13:19:33
Grammars are hierarchical, so they implicitly attempt to provide a complete and hegemonic definition of potential data. If and when they fail to do so, or you find yourself in a situation where the same potential data has been defined (differently) by two groups of people, a system like Schematron provides a very timely partial solution. I have found one of the great things about Schematron is it allows for a separation between technical and business constraints on data, for instance, so that multiple working groups can develop partial solutions to their data needs. Is it necessary to begin with a grammar, and add shots of Schematron to the boilermaker as required? Likely not, but it is understandable if people cling to the illusion of a perfect definition that a grammar presents.

Rick Jelliffe
2007-07-10 22:43:39
Piers: Yes, any modeling abstraction (grammars, paths) will be suitable for *something* (every dog has its day!). The thing is figuring out what problem actually needs addressing. My feeling is that the problem is how to convey (and therefore how to capture or specify) meaningful domain-specific messages about documents to humans and other agents. XSD and the other grammar-based languages (as currently formulated) clearly address other issues and just have hand-flapping about the human aspect.