10 Corrections to Open XML

by Rick Jelliffe

This is a simple list of 10 general corrections to Open XML. There have been comments recently that pro-Open XML people are not contributing any fixes, so here are my big ticket items. They flow from the principles I have mentioned in earlier blog entries, from other discussions, and from my distaste for unverifiable specifications.

(If Australia decided to become a P-country and vote for ISO Open XML, these are the general corrections that I would submit to Standards Australia, apart from specific typos and unclear sentences. Whether they formed part of a "Yes with comments" or a "No with comments" wouldn't bother me, since all of them are fixable.)

It is available here as a PDF, and some blog feeds will also show it below in the extended entry.

19 Comments


2007-07-16 07:03:54
rick said:
"layout and other algorithms should not be specified (or specified non-normative text in functional or general terms only), and implementation examples, samples and exemplars should be clearly marked as non-normative text."


"Obsolescent technologies should be removed to informative annexes. In particular this is VML."


"Platform-dependent or binary technology references should be removed to informative annexes. In particular, this relates to clipboard formats and printer drivers."


"Hints concerning legacy behaviours should be partitioned using the extension mechanism"

"However, as before, particular algorithms must not be given in normative text."


"This may mean that large sections of the text that is currently phrased normatively should be either reduced in strength (from shall to should, for example) or be made informative."



I have my big ticket item too, inspired by Rick's recent contributions:


1. Mark all of DIS 29500 as informative text. Annex a copy of MS Office 2007 as the normative part of DIS 29500.


Ready! ;-)


--omz



Rick Jelliffe
2007-07-16 09:35:08
omz: That is more like it! One of my suggestions early on was to entirely remove all the text, and just have the schemas. In standards for languages, the syntax (notation, schema, etc.) should have the primacy of place. In standards for applications, the (abstract?) test suite should have primacy of place. The trouble is that the two get combined willy-nilly, because people want to provide semantic information on their languages, but without test suites. This means that standards end up with lots of unverifiable requirements (unverifiable in terms of the verification mechanisms the standard provides, not unverifiable in absolute terms.)

2007-07-17 00:53:32
I would say items 6 and 7 on that list are desirable rather than necessary, as the ISO fast-track procedures actually allow documentation in a non-ISO-conforming style, bearing in mind that maintenance of the standard lies with Ecma rather than with ISO.
I think the rest of the list sums up a lot of things I suggested earlier in comments on Brian Jones's blog.


Besides the corrections that the Ecma TC can make, I think Microsoft could state its commitment to developing future versions and its intention to keep its contributions to the spec free of patent claims.


"One of my suggestions early on was to entirely remove all the text, and just have the schemas"

Ah, the sad life of a schema expert. Semantic information is a vital part of a language. The syntax is only a shell.

hAl
2007-07-17 00:58:20
Tsss, there is a visual split between the comment form fields: the name field is to the right of the article and the comment text field is below it (using IE6). That is fairly annoying.

2007-07-17 07:18:30
Surely to guarantee your changes are incorporated you would need to vote "no, with comments"?
Alex Brown
2007-07-17 08:14:50
Rick hi


I believe Standards Australia can vote, because the ballot is open to "all ISO member bodies", not just P-members (Directive, 9.5).


If fewer than 50% of those eligible to vote do so, the Fast Track process terminates immediately.


- Alex.


Alex Brown
2007-07-17 08:21:15
Rick hi


*** Correction to the above: the 50% rule only applies to P-members!


- Alex.

Rick Jelliffe
2007-07-17 14:36:40
Anonymous: "Sad life of the schema expert" Thanks for your sympathy. It means a lot to me.


I think it comes down to expectations too: if you think a standard should state all aspects of the technology in question, then either you get text full of unverifiable (or verifiable only by human inspection) statements (counter to the ISO requirement for verifiability) or you have to define application behaviours and therefore test suites (counter to the purpose of an application-independent description of a file format.) I think ISO Open XML would serve its purpose if it just had formal descriptions (the current XSD/RELAX NG schemas, the NVDL schema, some extra Schematron schemas as needed, plus formal descriptions of other notations used), with the text on intended semantics coming from other sources: Ecma, MS, reverse engineering.

hAl
2007-07-17 15:06:51
"Surely to guarantee your changes are incorporated you would need to vote "no, with comments"?"


If the comments/changes are ones you support, you might vote "no with comments".
If they are comments/changes you do not support, you would not support such a vote.


I have already seen strange comments, for instance in a panel from a national body, about changing the name of the spec or splitting it into 3 separate parts to be standardized separately. These seem to me ridiculous changes to demand in this process, and I would not support that kind of comment, but the comments made here by Rick seem pretty good for improving the spec. I hope that some national body reads this and considers it an example of a workable discussion.

Rick Jelliffe
2007-07-17 15:30:50
Alex: Thanks for that.


Anonymous: I have no problem with a "contingent no" on these. There is no "poison pill" (an impossible requirement slipped in, such as requiring that the spec also include complete documentation on all 26 versions of the .DOC format and the mappings to the XML, all versions of RTF and the mappings to XML, complete descriptions of all line-breaking algorithms, documentation of all graphics formats ever invented, and so on.)


hAl: Actually, I asked the Director of the Bureau of Indian Standards, who is intimately knowledgeable about JTC1 issues, whether splitting up the standard was a possible outcome of ballot resolution, and he said he believed it is not allowed under the rules. Otherwise I would certainly support it, but it seems it is not possible.


I was actually the one who suggested the name change (and improvements to the scope section) to Patrick Durusau, and we sounded out a dozen or so different choices without particularly agreeing on one. The trouble is that "Office Open XML" is clear but doesn't reflect the scope particularly well. The name of the standard can change without altering the name of the technology or its branding. Adopting the standard under some really pissweak name will please neither MS nor the anti-Open XML people, but it actually gives them both the core of what they want and may be a good outcome politically: Open XML becomes an ISO standard without mischievous technical interference, but the ISO standard clearly distinguishes itself, and the reasons it was standardized (e.g. we may vote for it just for legacy reasons, whereas MS wants it for future reasons), from ODF.

Rick Jelliffe
2007-07-17 16:42:28
Anonymous: Further to your comment that "The syntax is only a shell": for verifying/validating a document format, syntax is all you have.


Testing anything beyond syntax goes into the area of application conformance, and requires test suites and procedures. Consequently, because ISO standards are only supposed to have verifiable statements (i.e. no fluff), a document format standard should try to limit its normative text to syntactic statements.


It is indeed odd, but the logical consequence of trying to have application-independent (which is different from application-neutral, particularly in the case of Open XML) descriptions of formats is that syntax becomes king. Consequently, forms of markup and schema languages that allow the syntax to be cleanly, accurately and precisely defined become super important. Document standards are not specifications for applications. Their primary role is to let you take a document, in isolation from any application, and say "does this conform to the standard?" If a standard also gives hints and explanations of the default semantics, that is sometimes because the schema language used is not powerful enough, but also because there is no separate specification describing an abstract application (which in turn should have an abstract test suite.)


For example, take tables in WordprocessingML. The schemas for tables give some syntactical information, but the constraints would be much more complete and testable if there were also Schematron schemas: the goal should be that every constraint on a table can be tested using schemas with no need to open the document up and check anything visually. Now as a separate specification (or part) to the document format, there should also be an abstract description of the semantics (what table spanning and formatting looks like) with (as a separate specification or part) an abstract test suite. So the Schematron schema would test "The number of cells minus the number of horizontally spanned cells should equal the number of columns" while the application test suite would say "A horizontal spanning test will check, for each of some reasonable number of table columns, spans of 0, 1, all-remaining and all-remaining+1 that spans of 0, 1, complete and error are rendered." (Bad wording, but you get the drift.)
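To make the table example a little more concrete, here is a minimal sketch of that kind of document-only check. Rick names Schematron as the tool; this uses Python's standard `xml.etree.ElementTree` purely as a stand-in. It assumes the usual WordprocessingML table markup (`w:tblGrid/w:gridCol` for the column grid, `w:tcPr/w:gridSpan` for horizontal spans) and reads the rule as "the columns covered by the cells in each row must equal the number of grid columns". The function name `check_table_spans` and the sample table are hypothetical, and the sketch deliberately ignores `w:gridBefore`/`w:gridAfter` and vertical merges.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (as used by Ecma-376).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def check_table_spans(tbl):
    """Return messages for rows whose spanned width differs from the table grid.

    Hypothetical helper. Assumption: a row's width is the sum of its cells'
    w:gridSpan values (default 1 when absent); w:gridBefore, w:gridAfter and
    vertical merges are not considered in this sketch.
    """
    grid_cols = len(tbl.findall("w:tblGrid/w:gridCol", NS))
    errors = []
    for i, tr in enumerate(tbl.findall("w:tr", NS), start=1):
        width = 0
        for tc in tr.findall("w:tc", NS):
            span = tc.find("w:tcPr/w:gridSpan", NS)
            # The w:val attribute is namespace-qualified, hence the {ns}val key.
            width += int(span.get(f"{{{W}}}val")) if span is not None else 1
        if width != grid_cols:
            errors.append(f"row {i}: spans {width} columns, grid has {grid_cols}")
    return errors


# Hypothetical sample: a 3-column grid; row 3 over-spans by one column.
SAMPLE = f"""<w:tbl xmlns:w="{W}">
  <w:tblGrid><w:gridCol/><w:gridCol/><w:gridCol/></w:tblGrid>
  <w:tr><w:tc/><w:tc/><w:tc/></w:tr>
  <w:tr><w:tc><w:tcPr><w:gridSpan w:val="2"/></w:tcPr></w:tc><w:tc/></w:tr>
  <w:tr><w:tc><w:tcPr><w:gridSpan w:val="3"/></w:tcPr></w:tc><w:tc/></w:tr>
</w:tbl>"""

if __name__ == "__main__":
    tbl = ET.fromstring(SAMPLE)
    for msg in check_table_spans(tbl):
        print(msg)  # reports the bad row only
```

The point of the design is that the check runs against the document alone: nothing is rendered and no application behaviour is assumed, so it belongs on the "document" side of the split rather than the "abstract application" side.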


This is a general objection I have to both ODF and Open XML, that they try to have it both ways, claiming to be document formats while adding a whole lot of unverifiable application semantics.

Rick Jelliffe
2007-07-17 16:57:11
Anonymous (continued): Continuing this thought: in fact, if a constraint or semantic cannot be expressed in a reasonably powerful schema/syntax language (i.e. BNF, XSD, the ISO DSDL languages) it may indicate that the technology is baroque. You might be aware of Peter Sefton's criticism of the ODF table model, for example, that (IIRC) you cannot readily convert ODF tables to HTML tables without, in effect, building a complete renderer for them: presuming that this means that the constraints are complex, perhaps the discipline of saying "We won't specify any technology that cannot be readily validated" might have set off warning lights for the standards committee involved.

2007-07-19 06:15:37
"This is a general objection I have to both ODF and Open XML, that they try to have it both ways, claiming to be document formats while adding a whole lot of unverifiable application semantics."


That assumption has counterexamples. The X3D standard is an ISO standard and contains two encodings (for XML and for Classic VRML). The VRML experience was that for real-time 3D, a syntax model was not sufficient where the goal was the same rendering and behavior on two different implementations. In X3D, there is an object model, and over time, with cooperation, that has been tightened up. So there is a counterexample in which syntax is insufficient, given an application where the goals include dynamic behavior (real-time) and rendering fidelity (the color models are central).


For some subset of the existing content, this has worked. Unfortunately I can point to my own content where it does not. It is still not difficult to write content that passes a syntax check but causes the implementation to fail dramatically.


Syntax is not enough in all applications and the position that a document is independent of its object model doesn't work in all cases.


As to the ODF vs OOXML war, the ending of this will be fatigue. There are too many vested interests here who not only can't reach agreement but don't want it. But that is quite different from the technical assumption that syntax alone can provide a complete standard in all cases, unless we more narrowly restrict the scope of the term "document".


2007-07-19 06:17:07
Apologies. I failed to sign that last post. Too little caffeine.


len

Rick Jelliffe
2007-07-19 07:20:02
Len: I think my position is evolving into this: what people call "syntax" and "semantics" are better, for the purpose of standards, thought of as "documents" and "abstract applications" and treated as separate concerns as much as possible. (Obviously it is sometimes convenient for the purposes of communication to interleave the two in a document, though.) The rule of thumb for knowing whether some idea belongs to the world of documents or to the world of abstract applications is to ask "how do we verify this?" If it can be verified syntactically (by a grammar or constraint checker, such as Schematron or XSD) then it belongs to the world of documents. If it needs some kind of (abstract) test suite, then it belongs to the world of (abstract) applications.


In effect, it means that when you have a standard with text that relates to abstract applications but which does not have a test suite for them, that text should be informative (with the understanding that "informative" does not mean "do your own thing").


2007-07-19 11:00:08
Aha. I get that. And that is how it is worked by the Web3DC/ISO partnership. A test suite has been developed and is being refined. VRML taught us we couldn't actually have just a syntax standard and meet the goals, and you are right that the stated goals make all the difference in the outcomes. We treat them separately by the way the object model works with the syntaxes it can consume. The consumability of the data/document is the best guarantor of its long-term viability. That is where one might want to place the bets, rather than on who has the most current desktops. Lock-in is not a long-term condition in permeable ecosystems.


That consortia/standardsOrg partnership seems to be working. The trouble there is the perception that virtual worlds are representative of 3D on the web, so all the talk is about WoW, SL, etc., and all the standards are X3D, Collada, etc. That is a confused market vis-à-vis standards, made worse by the sudden incursion of IBM, which has no products and little business in the market but is willing to write checks. That situation will become very messy, although there is a trickle of momentum in the direction of the standards for the new below-the-radar applications. This is one where the 'crowd wisdom' may prove to be wrong.


I think it is tougher for the office document world. No standard can fly in under the radar there these days, given it is the majority application (try to work without a word processor); hence my pessimism about the notion of convergence. The idea of a separate standard to cover the majority of the document instances out there doesn't mean another can't exist. If it did, by majority rule we'd have to kill the Sun/IBM proposals and give Microsoft the nod. Regardless of what is going on in the committee politics, the market has indeed spoken. Call it what they like, the numbers are what they are. Yet only if the MS-sponsored candidate has the best consumability characteristics will it win this in the long term.


2007-07-23 05:05:19
I need an editor for open office to help with XML files. Anyone know where to find one?

2007-08-08 12:29:28
Okay, and Microsoft should just sign the same patent license SUN provided.