UBL Methodology for Code-list and Value Validation

by Rick Jelliffe

Ken Holman sent me a copy of the latest draft of the OASIS/UBL Methodology for Code-list and Value Validation, which makes pretty good use of Schematron. It looks like a neat and workable solution to a problem that is somewhere between baroque and a hard place using XSD.

Imagine you are a trading company: you have documents with various fields for countries: countries you can send from, countries you can send to, countries the US won't allow you to export to, countries you can use as hubs, countries with regional offices, etc. And you also have lots of other documents with similar or different sets of countries. And countries are only the start: you also have product codes where different fields can have different sets of codes, and so on. And this may vary according to where the document came from (the Libyan branch office may have different rules from the Alaskan branch office). And, of course, the values of codes may have interdependencies, such as "the source must be different from the destination."

So lots of uses of a standard vocabulary, but lots of local and changing subsets that are much closer to "business rules" than "datatypes".

If you used XML Schemas, you could theoretically derive by restriction all the different subset codes, then use "redefine" on every top-level element that used the subsets. (You'd have to do this redefine on base types where possible, so that subsequent derived types would inherit the restriction, perhaps, except then you'd have to check that any subsequent derived types that themselves define restrictions are indeed subsets. Have a breakdown and a good cup of tea.)
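To give a feel for the bookkeeping in that parenthesis, here is a rough Python sketch of the "is each restriction really a subset of its base?" check. It models enumerations as plain sets rather than reading real XSD, and the type names (CountryCode, ShipToCountry, HubCountry) and codes are made up for illustration.

```python
def check_restriction_chain(chain):
    """Return (derived, base, extra_codes) for every derived type that
    allows codes its immediate base type does not.

    chain: list of (type_name, set_of_codes), base type first.
    """
    problems = []
    for (base_name, base_codes), (derived_name, derived_codes) in zip(chain, chain[1:]):
        extra = derived_codes - base_codes
        if extra:
            problems.append((derived_name, base_name, sorted(extra)))
    return problems


# Hypothetical derivation chain: each type restricts the one before it.
chain = [
    ("CountryCode", {"AU", "CA", "LY", "US"}),
    ("ShipToCountry", {"AU", "CA", "US"}),
    ("HubCountry", {"AU", "LY"}),  # LY is not allowed by ShipToCountry
]
problems = check_restriction_chain(chain)
print(problems)
```

Every time a subset changes, a check like this would have to be rerun for everything derived from it, which is where the cup of tea comes in.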

With the Schematron approach, you select the items you want from the code list, and some magic tool provided by the methodology generates the Schematron code, which just uses simple XPaths (i.e. what processing software probably uses). You could still use an XML Schema, just to constrain the lexical space very broadly, but the Schematron constraints would check the values against the list.
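As a rough illustration of the idea (not the methodology's actual tool, which I am not reproducing here), a generator of this kind might look like the Python sketch below. The element names Order, ShipFrom and ShipTo, the country subset, and the choice of the ISO Schematron namespace are all assumptions made for the example, and it assumes the codes are simple tokens containing no quote characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumption: ISO Schematron namespace; the draft may target another version.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def code_list_rule(context, allowed, label):
    """Rule asserting that the context node's value is one of the allowed codes."""
    codes = sorted(allowed)
    # Simple XPath of the form: . = 'AU' or . = 'CA' ...
    test = " or ".join(f". = '{code}'" for code in codes)
    message = f"{label} must be one of: {', '.join(codes)}"
    return (
        f"    <rule context={quoteattr(context)}>\n"
        f"      <assert test={quoteattr(test)}>{escape(message)}</assert>\n"
        f"    </rule>"
    )


def cross_field_rule(context, test, message):
    """Rule for an interdependency between fields, given as a ready-made XPath."""
    return (
        f"    <rule context={quoteattr(context)}>\n"
        f"      <assert test={quoteattr(test)}>{escape(message)}</assert>\n"
        f"    </rule>"
    )


def build_schema(rules):
    """Wrap the generated rules in a single pattern inside a schema element."""
    body = "\n".join(rules)
    return f'<schema xmlns="{SCH_NS}">\n  <pattern>\n{body}\n  </pattern>\n</schema>'


# Hypothetical subset chosen for one branch office's ShipTo field.
schema_text = build_schema([
    code_list_rule("Order/ShipTo", {"US", "AU", "CA"}, "Destination country"),
    cross_field_rule("Order", "not(ShipFrom = ShipTo)",
                     "The source must be different from the destination"),
])

root = ET.fromstring(schema_text)  # sanity check: the output is well-formed XML
print(schema_text)
```

Changing a subset then means editing a list of codes and regenerating, rather than reworking a type hierarchy.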


2007-01-04 09:50:53
>>> somewhere between baroque and a hard place

Argh! I don't think I can Handel it! Thursday morning just became a funnier place...

I have found that my life has improved immensely since I started using XML Schema for the very broadest of strokes and Schematron for everything else, especially in conjunction with data which is incomplete, in progress, or baroque. My guess is people who attempt to constrain this sort of data with a grammar haven't tried using Schematron, unfortunately, or have fallen into the trap of assuming one tool is appropriate for every task.