Fake real-time blog from XTech 2007: Henri Sivonen's Implementing an HTML5 conformance checker using XML tools

by Rick Jelliffe

I wasn't there, but the XTech 2007 Conference seems to have its presentations online already: fast!

Scanning through them, I found one that made me really happy. It was Henri's talk on the WHATWG's HTML5 validation efforts. Actually, "I've won!" flashed through my mind. It was not because the HTML5 group had started to use multiple validation languages, along the layered or progressive lines that I (and the DSDL rabble) have been advocating. Nor was it because they were using Schematron. Nor was it even because Henri says that Schematron and RELAX NG, while better than XSD, were not as good as they had expected (thereby giving me a challenge to show how they could do it in Schematron with the correct idiom, and so make me appear well smart).

No, what made me happy was a little line towards the end where the issue of generating usable user messages was raised (p41). This is the most important part of Schematron, not the use of paths or assertions or phases or flags or any of the mechanics, nice though they may be. The "big idea" behind Schematron, such as it is, is that the problem of validation is just as much (indeed, more) one of communicating constraints (and therefore unmatched constraints) to users as it is about representing them to machines. Validation is not just binary, or even a set of fixed outcomes: it is about determining, locating and communicating the status of a document and its parts.

This matters especially because the user often experiences the document mediated through some user interface, not as elements and attributes. Validation messages given in terms of elements and attributes, rather than in terms of the information model or the user interface, will just be mystifying. They are especially confusing when they report where the problem was found, not what caused it: for example, when an element is missing and the error message says "Found unexpected XXX" rather than "YYY is missing".
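To make the "YYY is missing" style concrete, here is a minimal sketch in Python rather than in Schematron itself. The rule table, the element names (section, title, figure, caption) and the message wording are all made up for illustration; the point is only that each assertion carries its own message, written in the user's terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (context element, required child, message for the user).
# None of these names come from a real schema; they are for illustration only.
RULES = [
    ("section", "title", "Every section needs a heading."),
    ("figure", "caption", "Every figure needs a caption."),
]

def check(xml_text):
    """Return one human-oriented message for each unmet assertion."""
    root = ET.fromstring(xml_text)
    messages = []
    for context, required, message in RULES:
        # Number the context elements in document order so the user can find them.
        for index, node in enumerate(root.iter(context), start=1):
            if node.find(required) is None:
                messages.append(f"{context} {index}: {message}")
    return messages

DOC = (
    "<doc>"
    "<section><title>Intro</title></section>"
    "<section><para>text</para></section>"
    "</doc>"
)

for line in check(DOC):
    print(line)
```

A grammar-based validator would typically complain about the unexpected para element; here the report says what is missing from the second section.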

I am a bit of a broken record on this, but I think a relentless emphasis on the human user is really important for standards: XML succeeded by providing not only simplicity but native-language markup.


Edd Dumbill
2007-05-23 00:10:13
We missed you, Rick!

Henri Sivonen
2007-05-25 09:02:10
I'm glad to see that my presentation gets blogged about. Thank you.

The conditions detected by the table integrity checker are enumerated in my master's thesis. (Some of them could be checked in Schematron as well.) My understanding is that Schematron (with XPath as the query language) cannot check for overlapping table cells. I'm also very curious about how you have managed to check that the columns are occupied exactly when there are cells spanning multiple rows.
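For readers unfamiliar with the overlap problem, it can be pictured as bookkeeping over a grid of slots. What follows is a minimal sketch in Python under simplifying assumptions: each row is given as a list of hypothetical (colspan, rowspan) pairs, and special cases such as rowspan="0" and row groups are ignored. It is not the checker's actual implementation, nor the full HTML5 table algorithm.

```python
def find_overlaps(rows):
    """Return the (row, column) slots claimed by more than one cell.

    rows is a list of table rows; each row is a list of
    (colspan, rowspan) tuples, one per cell, in document order.
    """
    occupied = set()   # slots already claimed by some cell
    overlaps = []
    for r, row in enumerate(rows):
        col = 0
        for colspan, rowspan in row:
            # A cell starts at the first slot not taken by a cell
            # spanning down from an earlier row.
            while (r, col) in occupied:
                col += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    slot = (r + dr, col + dc)
                    if slot in occupied:
                        overlaps.append(slot)   # two cells claim this slot
                    else:
                        occupied.add(slot)
            col += colspan
    return overlaps

# Row 0: a plain cell, then a cell spanning two rows.
# Row 1: a cell spanning two columns, which runs into the cell from row 0.
table = [
    [(1, 1), (1, 2)],
    [(2, 1)],
]
print(find_overlaps(table))
```

The check depends on state carried from one row to the next, which is the kind of bookkeeping that is awkward to express as a single path expression.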