A blast from the past

by Rick Jelliffe

I'm just heading off to Thailand for a week: I am speaking on Monday at the "Interoperable ICT Systems Seminar", with speakers from NECTEC, CompTIA and Microsoft, and with me as Dr Strangelove. James Clark has threatened to be there and ask hard questions: scary! He lives in Thailand and has been promoting open source software there for several years.

Going over the Office Open XML schemas to prepare for the seminar, I've been struck by the similarity with the early-90s SGML "big system" approaches of the HyTime era. Interesting to see old approaches reborn: the HTML generation of systems went a different way... small documents, no link integrity control, no reuse of links, no semantic labelling, indirection handled by servers not documents. MIME, HTTP, REST: the WWW was about how you could take lots of small dumb documents and build a big dumb eco-system, which turned out to be a fine and practical approach for many things.

I remember Dave Peterson suggesting that tables as we know them (HTML-style, CALS-style) were bad because they mixed presentation with content, for example: instead the data should be maintained in a separate semantical structure, and included by reference; so in SpreadsheetML, data and strings can be maintained separately.
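
To make the "included by reference" idea concrete, here is a rough Python sketch of a shared-string table of the general kind SpreadsheetML uses, where a string cell stores an index rather than the text itself. The data layout and the names `shared_strings`, `sheet`, `resolve_cell` and `render` are my own simplifications for illustration, not the actual schema.

```python
# Hypothetical, simplified model: strings live in one table,
# and cells refer to them by position instead of holding the text.
shared_strings = ["Region", "North", "South"]

# Each cell is (kind, value). Kind "s" means the value is an index
# into shared_strings; kind "n" means the value is a plain number.
sheet = [
    [("s", 0), ("s", 1), ("s", 2)],
    [("n", 10), ("n", 20), ("n", 30)],
]


def resolve_cell(cell, strings):
    """Return the displayable value of one cell."""
    kind, value = cell
    return strings[value] if kind == "s" else value


def render(rows, strings):
    """Resolve every cell reference in a grid of rows."""
    return [[resolve_cell(cell, strings) for cell in row] for row in rows]
```

Swapping in a different string table changes what every referring cell shows, without the grid itself being edited: that is the separation Peterson was arguing for.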

Eliot Kimber has often argued that there are many "difficult" problems with handling large dynamic document sets that go away with a suitable, simple indirection method: hence his XIndirect, and indeed OASIS SGML/XML catalogs and even ISO DSDL's Document Schema Renaming Language (DSRL), which comes through Martin Bryan. The relationship system in the Open Packaging Conventions seems similar.
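
As a rough illustration of that kind of indirection, the Python sketch below has content refer to targets only through relationship IDs, with a separate table mapping each ID to an actual location. It is loosely in the spirit of the Open Packaging Conventions relationship parts, but the data structures, the part names and the functions `dangling` and `resolve` are assumptions of mine, not anything taken from the specifications.

```python
# Hypothetical relationship table: ID -> target part.
relationships = {
    "rId1": "media/logo.png",
    "rId2": "charts/sales.xml",
}

# Content never names a target directly, only a relationship ID.
body = ["Quarterly report", {"ref": "rId1"}, "Sales by region", {"ref": "rId2"}]


def dangling(content, rels):
    """List the relationship IDs used in content that have no entry in rels."""
    return [
        item["ref"]
        for item in content
        if isinstance(item, dict) and item["ref"] not in rels
    ]


def resolve(content, rels):
    """Replace each reference with its target, refusing dangling references."""
    missing = dangling(content, rels)
    if missing:
        raise KeyError(f"unresolved relationship IDs: {missing}")
    return [rels[item["ref"]] if isinstance(item, dict) else item for item in content]
```

Moving a target means editing one entry in the table rather than every document that points at it, and a link integrity check becomes a simple lookup.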

It is an interesting thought, though: at what point of complexity/maintainability does it become a requirement to add extra levels of indirection? I can see that both extremes are appealing: the one that says "just make do with simplicity" and the other that says "build in moderate indirection because it is easier to have it there when you need it and impossible to retrofit."


2007-04-28 09:02:37
Have a good time. We'll see you when you get back.
2007-04-30 18:42:58
It has something to do with address maintenance and energy costs (where energy is not electricity but some measure of effort). One question to ask (an old one): where do you declare structures, ideas, concepts, addressable units to be atomic (no outbound references; all references are inbound)? It varies by system and language and application, but the concept should be reasonably the same in all document systems. Is there a heat or thermodynamic component that makes one preferable to another?

So one for you: is a scene-graph a document? Does it make a difference if it is real-time as a synth circuit is? When we talk about the Scene Access Interface (the API for X3D), some want to talk about it as a scene-graph DOM. Yet when working with an embedded X3D viewer object in an HTML page, the same kinds of element/ID relationships work but the API language is quite different.

I claim a scene-graph is not a document model. It is much more like a network of components and more similar to the circuit because of its explicit real-time event routing. This is one reason I think the separation of content and presentation is entirely artificial and a side-effect of thinking in terms of static documents (the XML mind cul de sac). XML being a syntax doesn't care, but being a structure, it has certain inefficiencies (the well-known element/attribute impedance mismatch). But that notion of adding real-time to a document model blows the separation out of the water, and replacing it with semantic loading fixes nothing.