Standardize the jellybeans not the jars

by Rick Jelliffe

There is something about XML that makes people go crazy, in particular people trying to make standards: it's that ol' tag fever agin, Maude. I think I know what that thing is. It is the equation of standards with good, combined with the desire for complete schemas and the idea that organizing schemas by namespace is the way to shoehorn in requirements (rather than a way of expressing results).

The result is vocabularies that impose unnecessary ordering and structural constraints. You can tell when a standard schema is over-specified, because people using it will just snip out the low-level elements they need and plonk them into their own home-made container elements.

I have noticed this in a few schemas I have been working with recently. In fact, the trend I notice is that people start off with their own home-made schema, then "adopt" the standard by finding any elements whose semantics are close to their home-made elements and renaming the home-made elements to the standard names. SVG in ODF looks like an example of this, and another standard I have been working with recently has the same issue. When you adopt arbitrary portions of a cohesive standard, are you really using that standard, or abusing it?

I suppose there is a case to be made that transitional schemas should be treated seriously.

One software engineering idea that has stuck with me over the years (which I wrote about in The XML & SGML Cookbook) is the twinning of cohesion and coupling. Basically, when some information is highly cohesive (think of Eve Maler's Information Units), i.e., it belongs together semantically and would not make much sense in isolation, it deserves an official container.

Conversely, you should try to reduce coupling of information that is not cohesive.

A rule of thumb for many situations is that industry standards groups (and, indeed, in-house schema developers) may be well advised to standardize data elements eagerly but container elements suspiciously: standardize the jellybeans, not the jars. The next bloke may like your jellybeans but have his own jars.

Various approaches come to mind: think in terms of creating a vocabulary rather than a language; split your industry standard in two, with the tightly coupled elements in one normative section and the loosely coupled elements in another, non-normative section, perhaps even with different namespaces; use open content models and order-independence for loosely coupled elements.
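To make the last of those approaches concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not taken from any standard. It checks standardized data elements wherever they turn up, in any order and at any depth, and lets home-made container elements through untouched. The namespace `urn:example:beans`, the `RULES` table and the `check_beans` function are all hypothetical; a real standard would more likely express the same idea with open content models in a schema language.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace for the standardized data elements ("jellybeans").
BEANS = "urn:example:beans"

# Hypothetical rules: each standard data element maps to a pattern its text must match.
RULES = {
    f"{{{BEANS}}}date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    f"{{{BEANS}}}price": re.compile(r"\d+(\.\d{2})?"),
}


def check_beans(xml_text: str) -> list[str]:
    """Check standard data elements wherever they appear; ignore the containers."""
    root = ET.fromstring(xml_text)
    errors = []
    # iter() walks the whole tree, so position, nesting and sibling order do not matter.
    for elem in root.iter():
        rule = RULES.get(elem.tag)
        if rule is None:
            # Unknown elements in the standard namespace are errors;
            # anything else is a home-made jar and is allowed (open content).
            if elem.tag.startswith(f"{{{BEANS}}}"):
                errors.append(f"unknown standard element {elem.tag}")
            continue
        text = (elem.text or "").strip()
        if not rule.fullmatch(text):
            errors.append(f"bad value {text!r} in {elem.tag}")
    return errors


doc = f"""<myOrder xmlns:b="{BEANS}">
  <lineItems>
    <item><b:price>12.50</b:price><b:date>2007-11-21</b:date></item>
    <item><b:date>21/11/2007</b:date></item>
  </lineItems>
</myOrder>"""
print(check_beans(doc))  # ["bad value '21/11/2007' in {urn:example:beans}date"]
```

Because the checker never looks at parents or sibling order, the `myOrder`, `lineItems` and `item` jars can be whatever the user likes; only the jellybeans are constrained. Unknown elements in the standard namespace are still flagged, so the vocabulary itself stays closed.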

Another upside of this approach is that it reduces the number of trivial issues for committee members to get excited about.


2007-11-21 06:09:24
The unsurprising part of this is that many SGMLers came to these conclusions over a decade ago (lots of little schemas/DTDs), although that led to some of the wrapper approaches where entities were not well supported (the Navy Work Package and its European cousins come to mind). I liked the frame approach, and that was later replaced with divs, oddly enough by the same people who disliked frames.

The advice has been repeated in many projects and forums. Yet the one-size-fits-all schema mentality persists. One might suggest lack of experience, but I don't think that is the cause. Some other fruit is tempting the language designers or their bosses.

I like the jellybeans/jars analogy. I suggested the approach you advocate for X3D because it was obvious that some parts of the 3D standard (e.g., geometry) would be useful without the overhead of the behavioral model and the transform model. I lost that to the profile approach, and today one of the main criticisms from game language designers is the heaviness of the language and, as noted before, the use of XML. Round 4 of graphics vs. markup.

I wonder if we would see the same results if XML didn't require a root. Probably yes.