I'll use this entry as an anchor for my observations on the final day of Extreme Markup Languages. I'll update it with a note each time a new talk begins, but I'll add my comments on each talk in the comments section. The talks are numbered so that comments can be correlated with them.
If you happen to be reading this in an aggregator, much of the meat is in the comments, so you might want to click through.
D4.1. "Lessons from monitoring the hedge funds: Markup identifies and delineates. Does it give your position to the enemy?", Walter Perry
D4.2. "Declarative specification of XML document fixup", Henry S. Thompson
D4.3. "Topic maps, RDF, and mushroom lasagne", C.M. Sperberg-McQueen
Walter Perry is here to say terrible things about ontologies.
He argues (as he always has) that standard data vocabularies are evil, delusional, and the source of wailing and gnashing of teeth.
He is working on an effort to regulate hedge funds and private equity on the same footing as everyone else. Citing that very day's headlines from the financial markets, Walter notes that "not only is this subject late breaking, but it's just broken into pieces and lying on the floor."
Markup is identification, consisting of specification and delineation. Markup resembles the process of regulation, and is the first step in enforcing it.
To a hedge fund, regulation is just a risk that can be swapped, sidestepped or otherwise offset.
Ontologies are subject to malevolent intent. They are a priori predication, where what we need is a posteriori predication. This is the concept of "double-entry" data semantics: semantic assertions should be applied only to the audited survey of process at two separate entities, not imposed by both entities accepting a centralized ontology.
Where interests differ, the mapping of inputs should be treated skeptically, because neither entity can be trusted not to obscure its inputs in order to avoid losing a peculiar advantage.
D4.2. The TAG is discussing what it would take to reconcile tag soup with well-formed/valid XML. The likely approach will be to describe common browser error-recovery mechanisms. This will be key input to HTML5.
They are leaning towards TagSoup (hear hear! http://www.ibm.com/developerworks/xml/library/x-tiptagsoup.html ). TagSoup's tokenization phase outputs McGrath's PYX (see http://www.xml.com/pub/a/2000/03/15/feature/index.html ). The rectification phase takes this and turns it into a new, corrected PYX stream.
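For anyone who hasn't seen PYX: it's a line-oriented rendering of the parse-event stream, with the first character of each line marking the event kind ("(" for element start, ")" for end, "A" for attribute, "-" for character data). As a quick illustration (my own, not from the talk), here is a minimal Python sketch that emits PYX from well-formed XML using the standard library's expat bindings:

```python
import xml.parsers.expat

def to_pyx(xml_bytes):
    """Emit PYX: one parse event per line, first char marks the event kind."""
    lines = []
    p = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        lines.append("(" + name)
        for k, v in attrs.items():
            lines.append("A" + k + " " + v)

    def end(name):
        lines.append(")" + name)

    def chars(data):
        # PYX escapes literal newlines in character data as \n
        lines.append("-" + data.replace("\n", "\\n"))

    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = chars
    p.Parse(xml_bytes, True)
    return "\n".join(lines)

print(to_pyx(b'<p class="x">Hi<br/>there</p>'))
```

The point of the format is that TagSoup's tokenizer produces this same shape of stream even from non-well-formed HTML, which is what lets the rectification phase work as a simple line-by-line stream transform.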
Discussed issues with TagSoup, for example its idiosyncratic namespace handling and the fact that rectification is not declarative.
HT is proposing PYXup. It stays with TagSoup's conventions: two stages, streaming, no lookahead in the rectification phase and minimal lookahead in tokenization. The fixup controls are exposed declaratively as much as possible. HT uses the TagSoup tokenizer unchanged. Fixup is driven by a schema and an error-recovery spec; the schema specifies only the element vocabulary and immediate dominance (including whether text content is allowed).
Discussed some of the error conditions and recovery rules implemented in PYXup.
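To give the flavor of such rules, here is a hypothetical sketch of my own (not PYXup's actual spec format or rule set): a toy schema maps each element to the children it may immediately dominate, and the rectifier streams over PYX lines with no lookahead, auto-closing open elements when an incoming start event doesn't fit and dropping stray end tags.

```python
# Toy immediate-dominance schema (hypothetical vocabulary, not PYXup's).
SCHEMA = {
    "ul": {"li"},
    "li": {"b", "#text"},
    "b":  {"#text"},
}

def rectify(pyx_lines, root="ul"):
    """Stream over PYX lines, emitting corrected PYX (no lookahead)."""
    out, stack = ["(" + root], [root]
    for line in pyx_lines:
        kind, rest = line[0], line[1:]
        if kind == "(":
            # Recovery rule: close open elements until one can contain this.
            while stack and rest not in SCHEMA.get(stack[-1], set()):
                out.append(")" + stack.pop())
            out.append(line)
            stack.append(rest)
        elif kind == ")":
            # Recovery rule: close down to the matching open element;
            # a stray end tag with no matching open element is dropped.
            while stack and stack[-1] != rest:
                out.append(")" + stack.pop())
            if stack:
                out.append(line)
                stack.pop()
        else:
            out.append(line)  # text and attribute lines pass through
    while stack:
        out.append(")" + stack.pop())  # close everything still open
    return out

# An unclosed <li> followed by another <li> gets auto-closed:
print(rectify(["(li", "-one", "(li", "-two"]))
```

The appeal of the declarative approach is visible even in this toy: the recovery behavior lives in the schema table, not in per-element code.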
Notes that HTML table and form are particularly hard cases, and that Tidy and TagSoup both do something very unhelpful.
Pointing out a very strange fixup by Tidy, John Cowan said: "It does that because it has a DOM and can move stuff around. I don't have a DOM, so I can't make such mistakes."
Also highlighted the nightmare of script.
HT says PYXup is implemented in Python, which means "it's easy, but it's slow". John Cowan applauded this, saying "I've always wanted a Python port of TagSoup, but I'm not competent to do it. I'm glad someone is expressing my elegant ideas in an elegant language".
D4.3. I learn that it's a long tradition for CMSQ to close this conference, as he closed its predecessors. I can see why. He's tied all the conference events together in a grand philosophical arc that is satisfying, but hard to report.
Thanks for writing all of these out, especially D3.7. Extreme is always held when my family is at the beach, but reading through your comments I get a very good sense of what was going on.