Extreme Markup, Day Last

by Simon St. Laurent

Related link: http://extrememarkup.com/

The brain-stretching continues, with one last morning of high-end talks at Extreme.

(Again, I'll be updating this over the course of the morning. Fortunately, I'm largely sneeze- and Sudafed-free today.)

The first speaker, Erik Hennum, is presenting a "unified type hierarchy" for DITA, an "information typing architecture" that combines topics, maps, and context but is not Topic Maps.

DITA comes with a set of basic types for topics, but deriving new topic types permits much more specific content structures (tasks instead of paragraphs, for instance). More specialized types also let applications create more specific interfaces.

Currently, derivation works through architectural class attributes: DITA processing matches on class attributes rather than on element names. The system can be extended, but only by substitution; a derived type can't add new content.
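To make class-based matching concrete, here is a minimal Python sketch (not DITA Open Toolkit code). The fragment and its class values are hypothetical, modeled on DITA's convention of listing an element's ancestor types in its class attribute. The handler finds list items by their `topic/li` base type, so it picks up specialized `<step>` elements without ever checking the element name. XSLT-based DITA processors typically express the same test as `contains(@class, ' topic/li ')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized task fragment. Each class attribute lists the
# base type first, then the specialized type (following DITA convention).
DOC = """
<task class="- topic/topic task/task ">
  <steps class="- topic/ol task/steps ">
    <step class="- topic/li task/step ">Unplug the device.</step>
    <step class="- topic/li task/step ">Open the cover.</step>
  </steps>
</task>
"""


def has_type(elem, type_name):
    """True if the element's class attribute lists type_name as a token."""
    return type_name in elem.get("class", "").split()


def list_item_texts(root):
    """Generic handler for anything derived from topic/li.

    It never looks at elem.tag, so <li>, <step>, or any further
    specialization is handled the same way.
    """
    return [elem.text for elem in root.iter() if has_type(elem, "topic/li")]


root = ET.fromstring(DOC.strip())
print(list_item_texts(root))  # ['Unplug the device.', 'Open the cover.']
```

The design choice is the point: a stylesheet written once for the base type keeps working as vocabulary designers derive new element names from it.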

As Hennum put it, "that's working, but it could work better." He went through a list of requests from DITA vocabulary designers, many of them about content models and attributes. To address them, Hennum proposes substantially extending the existing type system. He suggests that the model could be represented in UML or OWL, while being implemented separately in XML Schema or RELAX NG for validating instances if needed. He's also showing how this might work in XSLT 2.0.


Next up is Eric van der Vlist, who is demonstrating XML/RDF querying. His presentation, "100% angle brackets," consists entirely of examples rather than slides.

The particular data van der Vlist is showing comes from LDAP, and is a mixture of tree structure and graphs. He proposed exporting a graph view of LDAP to RDF, "using RDF outside of the domain of the Semantic Web." The LDAP tree turned into a very flat structure, though van der Vlist tried to minimize the "RDF tax," keeping it readable as XML.
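A rough sketch of what "tree to flat" means here, assuming hypothetical LDAP entries keyed by distinguished name (DN): each attribute value becomes a (subject, predicate, object) triple, and the directory hierarchy survives only as explicit `parent` edges (a made-up predicate). This uses plain Python tuples rather than van der Vlist's RDF/XML, so it says nothing about how he kept the XML readable.

```python
# Hypothetical LDAP entries keyed by DN; each attribute maps to a list of
# values, since LDAP attributes can be multi-valued.
ENTRIES = {
    "dc=example,dc=com": {"objectClass": ["domain"]},
    "ou=people,dc=example,dc=com": {"objectClass": ["organizationalUnit"]},
    "uid=alice,ou=people,dc=example,dc=com": {
        "objectClass": ["person"],
        "cn": ["Alice Smith"],
        "mail": ["alice@example.com"],
    },
}


def parent_dn(dn):
    """Naive parent lookup: drop the first RDN. Ignores escaped commas."""
    _, _, rest = dn.partition(",")
    return rest or None


def ldap_to_triples(entries):
    """Flatten the directory tree into (subject, predicate, object) triples."""
    triples = []
    for dn, attrs in entries.items():
        parent = parent_dn(dn)
        if parent in entries:
            # The hierarchy becomes an ordinary edge, like any other property.
            triples.append((dn, "parent", parent))
        for name, values in attrs.items():
            for value in values:
                triples.append((dn, name, value))
    return triples


for triple in ldap_to_triples(ENTRIES):
    print(triple)
```

Every entry ends up at the same level; nesting is recovered only by following `parent` edges, which is also what makes graph-style links between entries easy to add.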

The user needed to query the data, and van der Vlist explored options, including LDAP filters, XQuery, and the W3C's SPARQL query language for RDF, but all of them seemed too complicated for the task at hand.

Instead, van der Vlist turned to query by example (QBE). Starting from simply showing the query engine what he hopes to retrieve, he developed the idea into a more robust approach with functions, joins, and conditions. There have been a few odd issues with RDF syntax expectations, but they haven't been hard to work around.
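Van der Vlist's queries were themselves XML documents; the following is not his syntax, just a toy Python illustration of the query-by-example idea over triples. The example is a list of patterns in which strings starting with `?` are variables. Reusing a variable across patterns acts as a join, and an optional `condition` callable plays the role of a filter. The data and all names are hypothetical.

```python
# Toy triple store; the data is hypothetical.
TRIPLES = [
    ("uid=alice", "cn", "Alice Smith"),
    ("uid=alice", "memberOf", "cn=admins"),
    ("uid=bob", "cn", "Bob Jones"),
    ("uid=bob", "memberOf", "cn=users"),
    ("cn=admins", "description", "Administrators"),
]


def is_var(term):
    return term.startswith("?")


def unify(pattern, triple, bindings):
    """Match one example pattern against one triple, extending bindings.

    Returns the extended bindings, or None if the triple doesn't fit.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def query_by_example(example, triples, condition=None):
    """Return all bindings that satisfy every pattern in the example.

    A variable shared between patterns acts as a join; `condition` is an
    optional filter over each complete set of bindings.
    """
    solutions = [{}]
    for pattern in example:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = unify(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    if condition is not None:
        solutions = [b for b in solutions if condition(b)]
    return solutions


# "Show me the names of people in the group described as Administrators."
example = [
    ("?person", "cn", "?name"),
    ("?person", "memberOf", "?group"),
    ("?group", "description", "Administrators"),
]
for b in query_by_example(example, TRIPLES):
    print(b["?name"])  # Alice Smith
```

The appeal of QBE is visible even in this toy: the query looks like the data being asked for, with blanks where the answers go.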


I almost made it through the conference without missing a session, but not quite: I was checking out during the second-to-last session, unfortunately.

Jeff Beck reported on the challenges involved in creating PubMed Central (PMC), a system for giving the public access to the results of research funded by the National Institutes of Health (NIH). There's an underlying XML repository, as well as a process for submitting material and adding information to the system. One piece that stood out: they use tagging guidelines that are more restrictive than the DTD they validate against, and use XSLT in a Schematron-like way as a "style checker." Integrating the material with links to publishers' sites also looks like a challenge. They've also used XSL-FO to generate PDFs. One interesting policy wrinkle is that the grantees, not the publishers, submit the material, and they do so voluntarily. So far they have about 1000 submissions, but the system is designed to support many more.
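The "style checker" idea can be sketched as Schematron-style assertions: each rule names a context, a test, and a message, and the checker reports failures on a document that may already pass DTD validation (no DTD validation happens here). This Python version is illustrative only. The element names are loosely modeled on journal-article markup, with a plain `href` attribute for simplicity, and the rules are invented stand-ins for guidelines stricter than a DTD, not PMC's actual checks, which per the talk are written in XSLT.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, test applied to each context node, message).
# These rules are hypothetical stand-ins for house tagging guidelines
# that are stricter than what the DTD enforces.
RULES = [
    (".//article-meta", lambda e: e.find("abstract") is not None,
     "should contain an abstract"),
    (".//ext-link", lambda e: bool(e.get("href", "").strip()),
     "needs a non-empty href"),
    (".//title", lambda e: bool((e.text or "").strip()),
     "must not be empty"),
]


def style_check(xml_text):
    """Return a list of guideline violations, Schematron-assert style."""
    root = ET.fromstring(xml_text)
    report = []
    for context, test, message in RULES:
        for node in root.iterfind(context):
            if not test(node):
                report.append(f"{node.tag}: {message}")
    return report


# A hypothetical document that a permissive DTD might accept,
# but that the guidelines reject.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title>Results</title>
    </article-meta>
  </front>
  <body>
    <p>See <ext-link href="">the data</ext-link>.</p>
  </body>
</article>"""

for problem in style_check(SAMPLE):
    print(problem)
```

Keeping the guidelines in a separate rule layer means the DTD can stay broad enough for many publishers while the house style is still enforced.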

Closing the conference, as he always does, C. Michael Sperberg-McQueen spoke on "Getting it in writing." The session description reads "The letter killeth, but the spirit giveth life. Or was it the other way around?" (The quote is from 2 Corinthians 3:6.)

He started with a story about jazz great Charlie Mingus coming to a session with a few measures left blank for improvisation, and being told he was getting lazy. Next he turned to the notion that "getting it in writing" signals a lack of trust, and connected this (as well as the UK's unwritten constitution) to the W3C's early hope of getting by without a formal process, which hasn't proven workable.

Sperberg-McQueen worried about an "endemic mistrust of democracy on the part of technical people," perhaps brought on by experiences in high school, but also noted that people mistrust writing things down. Sometimes, he said, the spirit is more important than the letter, but not always. He recounted the story in Plato's Phaedrus in which Thoth presents writing to a king, who sees that it fosters reminiscence, not memory. Readers seem omniscient but know nothing, and a written text gives the same answers regardless of circumstance or audience.

However, in Sperberg-McQueen's view, "Individual memory is weaker, but the system is stronger." "Until artificial intelligence bears fruit, if it ever does.... markup makes computers look well-informed." He concluded with some discussion of "true names," something I think should be left strictly to magicians, and then the conference was over.
