First impressions of Open XML: revisited

by Rick Jelliffe

There has been so much disinformation put out about the limited review time for OpenXML that it might be salutary for people to revisit a review of the Open XML draft I put on this blog dated Thursday May 25, 2006.

You read it: May 2006. That is 22 months ago! Not "5 months", not even 9 months as the claptrappists say. June, July, August, September, October, November, December, January 2007, February, March, April, May, June, July, August, September, October, November, December, January 2008, February, March.

To the people who are saying they have not had enough time in 22 months, I have no sympathy: you should have been reading my blog! :-)

I want to go through all of what I said then. I think it holds up really well.

A new draft of Open XML came out on my birthday. 4081 pages of PDF, and very impressive for anyone who has worked on specification and standards. Two things stick out: first how horrible XML Schema fragments are when stuck inline to document structure; second, how the implementation-neutral tone of the introduction is at odds with the elements for various kinds of Active X embedded objects. I suspect people would be a lot more comfortable if the elements for Active X embedded objects were in a different namespace, and gathered into an appendix of some kind. Antiques and curios. It will be interesting to see what the extensibility strategy will be (it hasn't been released in this draft.)

By halfway through the Ecma period, extra material had doubled the size of the spec from Microsoft's original submission of about 2000 pages. In the subsequent six months it increased by the same amount. So much for the idea that Ecma TC45 merely rubberstamped the original submission from Microsoft.

The comment about the horribleness of XML Schema fragments is still one I'd make. The BRM at least made them non-normative, but it did not agree to remove them entirely. I expect when people see the new generation of multi-format standards that some SC34 people are championing, where you can turn on and off normative sections, we can see the end of this clutter at reader request, which is perhaps the sweet spot.

The comment about Active X of course later became a mantra, with various demands that either DIS 29500 should have no normative reference to proprietary binaries or that it should have more (to bring them under the OSP). But it was an important issue that was addressed during the BRM and can benefit from continuing vigilance. The idea of gathering legacy proprietary elements into some kind of appendix is exactly what happened, at least for the compatibility elements, at the BRM. (I don't know that many of the participants at the BRM would have been comfortable with namespace-based notions of conformance; I didn't get the impression that using namespaces or schemas as tools was on many delegates' radars, no disrespect intended.)

The extensibility strategy came out as a separate part, with no significant trouble as a technology. Some people have subsequently discovered, though, that extensibility and "openness" (meaning guaranteed receipt) do conflict. This is something I have repeatedly talked about: the need for profiles. On the general subject of extensibility and interoperability, Joel Spolsky has another good article this week: Martian Headsets.

On the technical merits, well actually I don't know if they matter much. I say potato. Exporting to HTML or XHTML gives people base-level interoperability for most documents, which neither ODF nor Open XML will challenge; at the high end the solution is exporting to XML using a domain-specific schema (e.g. S1000D for military & aerospace) and not ODF or Open XML at all; in the casual middle we will have ISO ODF available, perhaps as the interchange format of choice, as well as ISO Open XML (if it is accepted) for when you need to track MS Office's capabilities closely. I think there is substantial value in a standard XML format for MS Office documents even within organizations that will mandate ODF for interchange and archiving. The availability of the alternatives reduces the need for ODF or Open XML to be the one true interchange format.

I think I still agree with everything there. (By technical merits, my point is not to do with the state of the draft, but about doctrinaire views on optimal technology which are ultimately subjective, and the benefits of plurality.)

I still think ODF is the appropriate format of choice for level-playing-field document interchange, especially for governments, though it seems ODF 1.2 and 2009 are the more realistic time-frames for this. And don't forget about HTML!

Probably coming from the industrial publishing background biases me here: the need for dumbed down interchange formats is real sure enough, but the need for intricate close-to-the-metal feature-exposing typesetting feature access is also important for different contexts. Word's binary formats and RTF's weaknesses have long held Microsoft's applications back from being happily usable in serious industrial publishing systems (or, at least, have often held back the people who adopted them.)



Gary McGath
2008-03-28 09:49:35
That's "Office Open XML" or "OOXML," please, not "OpenXML."

2008-03-28 15:08:01
Hi Rick

I am not so sure that all the horribles have been made non-normative. As far as I remember from the BRM there are now two conformance classes, strict and transitional. Both are normative. I worked hard, together with the rep from France, to make at least the strict conformance class somehow vendor neutral. We failed. That for me was the greatest failing of the BRM in terms of what it could have achieved.

There was an opportunity to create a new standard based, not on a structured hierarchical view of a document, but on an unstructured linked list of text runs. Not a brilliant view of a document but something with some sort of a case. I wasted too much time on this fruitless exercise. I would rather have been fishing with the kids.


Rick Jelliffe
2008-03-28 22:08:42
Gary: Yes, that is a better name now. But OpenXML is what the original blog had and I was quoting that verbatim. (Actually, I am not sure whether the original Ecma draft was just called OpenXML at that point too.)

One thing I always try to do is to distinguish between the technology and the draft or standard text. So there is no contradiction between "I want a standard for OOXML" and "I don't support DIS 29500 mark I as a standard" (which was the position I came to). Similarly there is no contradiction from someone saying "I don't support a standard for OOXML" and "I do acknowledge that DIS 29500 mark II meets the technical requirements for a standard."

Rick Jelliffe
2008-03-28 22:23:58
Bob: That "horrible" goes to scope. I certainly agree that the "strict" scope would be better expressed in a product-independent way based on technological characteristics (e.g. of linear structures for WP and so on) to distinguish it from ODF more clearly, and it is not out of the question that if the mark II draft is accepted, Part 1 will be maintained along those lines.

I mentioned this to several people at the BRM, but not with any great buy-in: in the start-game delegates were trying to figure out how the game is played, in the middle game they were doing work on their issues, and in the end-game they had swung towards realising that they could miss the low-hanging fruit unless they were focussed.

However, I don't think that the Transitional conformance needs to be remotely vendor neutral: the scope was to bring out everything that Office actually has, warts and all. (My joke is that OOXML is open in the sense that a flasher's raincoat is open: openness is no guarantee that you will like what you see!) However, I'd see the transitional spec as evolving to include more than just transitional Office features (what I call "antiques and curios" above): it already has a couple of WordPerfect things, and I think ultimately it could usefully evolve into a listing of all the kinds of legacy features that anyone anywhere used: the various odd options in troff (the open source groff version was written by SC34's James Clark) and TeX (open source by Donald Knuth). All in nice namespaces so that they can be included as properties whether in OOXML or ODF or their successors.

So I'd see the transitional Part's shortcoming not as having too much vendor-specific information, but as being too limited in only having one vendor's detritus when it would be better to have more.