Reasonable principles for reviewing Open XML and other standards

by Rick Jelliffe

I think by now most people are pretty phlegmatic about accepting various assertions about Open XML and ODF at face value. The sky is not falling. The boy is crying wolf. A sieve is full of holes no matter how loudly someone shouts that it is a bucket.

When, for example, one side says "Open XML normatively refers to MS' proprietary WMF" and the other side says "Err, where? Not in the Normative References sections" and the first side says "Err, then there is an *implied* normative reference because a mention is made of it elsewhere as a possible kind of graphic that may come in from the clipboard" and the other side says "The ISO usage of 'normative' revolves around indispensability: isn't 'possible' the opposite of 'indispensable'?..." disinterested observers may think: surely there is a more constructive approach? These silly examples are distractions from serious concerns.

So here is what I suggest, for national bodies reviewing Open XML: adopt a set of general principles and apply them (to Open XML, ODF, and whatever). When someone raises a specific issue, verify that the issue indeed is as claimed, find the general principle, and base your responses on that, with the particular flaw as an exemplar. The tactic adopted by some activists is to read the draft text, think of the worst possible interpretation and ramification, then insist it is the case: the "normative reference" example is a good case of this. The trouble with this approach is that it won't work; impartial reviewers will note that there is some kind of concern but that the actual issue raised is not a problem. The result will be frustration and a lack of a "meeting of the minds". Indeed the legitimate issues that underlie some of the anti-OpenXML comments risk being unaddressed.

What kind of principles would there be? Here are a few off the top of my head:

Principle 1: A schema must allow standard data notations for atomic, embedded data fields, where the standards exist, and may also allow local, common, optimised or legacy notations.



Applying this to Open XML, for example, it would mean that where DrawingML uses EMU coordinates, it should also allow inches, cm and points. And where SpreadsheetML allows numbers for date indexes, it should also allow direct ISO 8601 dates. Do you see the difference between saying "Open XML should be banned because it uses EMU" and "Open XML should be improved to allow more than EMU"? The most important thing is that this is a superficial change to the exchange language, not to the underlying model: it doesn't force MS to adopt a different model or require them to generate standard units. (That is a different issue: the issue of profiles or application conformance.)
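To show how superficial such a change is, here is a minimal Python sketch that maps the local notations onto standard ones. The function names are mine, not anything from the specification. The EMU constants are the usual ones (914400 per inch); the date conversion assumes the legacy "1900 date system" as it is commonly implemented, in which serial 1 is 1900-01-01 and serial 60 stands for a 29 February 1900 that never existed. Check the draft text before relying on any of this.

```python
from datetime import date, timedelta

# English Metric Units: chosen so that inches, centimetres and points
# all divide evenly.
EMU_PER_INCH = 914_400
EMU_PER_CM = 360_000
EMU_PER_POINT = 12_700


def emu_to_units(emu: int) -> dict:
    """Express one EMU length in the three standard units."""
    return {
        "in": emu / EMU_PER_INCH,
        "cm": emu / EMU_PER_CM,
        "pt": emu / EMU_PER_POINT,
    }


def serial_to_iso(serial: int) -> str:
    """Convert a whole-day date serial (assumed 1900 date system) to ISO 8601."""
    if serial < 1:
        raise ValueError("serials start at 1 in this sketch")
    if serial == 60:
        # The legacy system counts a 29 February 1900, which is not a real date.
        raise ValueError("serial 60 has no ISO 8601 equivalent")
    # Serials after the fictitious day are one ahead of the real calendar,
    # so they need an epoch one day earlier.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return (epoch + timedelta(days=serial)).isoformat()


print(emu_to_units(914_400))
print(serial_to_iso(59), serial_to_iso(61))
```

Nothing here touches the underlying model: the same length or day is simply written in a second notation.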

Principle 2: A schema should allow direct representation of data fields, and may allow optimised forms as well



Applying this to Open XML, we see that the string approach taken by SpreadsheetML conforms: you can have text directly or index to a shared string table. Adopting this principle lets a National Body vet the issues: if someone says "This doesn't look like HTML! Therefore it is bad!" the NB can say "We adopt the principle that optimized references can be allowed as long as literal content is allowed too".
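As a rough illustration of the two forms, the following Python sketch reads one cell that points into a shared string table and one that carries its text inline. The sample row and the table contents are invented for the example, and the helper name is hypothetical; the element and attribute names are the SpreadsheetML ones as I understand them.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Invented sample data: a two-entry shared string table and one row.
SHARED_STRINGS = ["Quarter", "Revenue"]

ROW_XML = """
<row xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <c r="A1" t="s"><v>1</v></c>
  <c r="B1" t="inlineStr"><is><t>Revenue</t></is></c>
</row>
"""


def cell_text(cell, shared_strings):
    """Return the text of a string cell, whichever form it uses."""
    kind = cell.get("t")
    if kind == "s":
        # Optimised form: the value is an index into the shared table.
        return shared_strings[int(cell.find("m:v", NS).text)]
    if kind == "inlineStr":
        # Direct form: the text sits inside the cell itself.
        return cell.find("m:is/m:t", NS).text
    raise ValueError(f"not a string cell: t={kind!r}")


row = ET.fromstring(ROW_XML)
texts = [cell_text(c, SHARED_STRINGS) for c in row.findall("m:c", NS)]
print(texts)
```

Both cells yield the same text, which is the point of the principle: the optimised form is acceptable because the literal form is available too.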

Principle 3: A schema language for compound documents should support an indirect or over-riding reference mechanism for entities or resources, and may disallow a direct mechanism.



SGML and XML DTDs have a mechanism called Entities that allows indirect references. This is really important for maintenance of large documents, because it disconnects references from names: you can update a graphics file by changing a single reference. Applying this to Open XML, OPC meets the criterion. OASIS catalogs would also probably fit the bill.

Following from principles 1 and 2, an indirect reference mechanism should allow the standard notation (IRIs) but may also allow a local or optimized form. Applying this to Open XML, this principle would mean double checking that IRIs are allowed (I will check this sometime) in OPC; I don't think that OPC uses a local, optimized or legacy form (I will check this sometime.)
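The indirection behind Principle 3 can be sketched in a few lines of Python. This is a toy model, not OPC's actual syntax: the relationship id and file names are invented, and a plain dictionary stands in for whatever table (entity declarations, a relationships part, a catalog) does the mapping.

```python
# One table maps a stable id to the current location of the resource.
relationships = {"rId1": "media/logo-2006.png"}

# The document body mentions only the id, however many times it uses the image.
document_refs = ["rId1", "rId1", "rId1"]


def resolve(ref_id, rels):
    """Follow an indirect reference to the resource it currently names."""
    return rels[ref_id]


before = [resolve(ref, relationships) for ref in document_refs]

# Swapping the graphic is one edit to the table; no reference in the body changes.
relationships["rId1"] = "media/logo-2007.png"
after = [resolve(ref, relationships) for ref in document_refs]

print(before)
print(after)
```

With direct references, the same change would mean finding and editing every place the file name appears.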

Principle 4: Notations for legacy or obsolescent technologies may be included in a standard, but should be in an informative part, clause, namespace or annex.



Applying this to Open XML, the sections on VML would be marked "informative".

Principle 5: A standard should be arranged as a modular, simply layered container, to allow plurality and evolution



I am not sure of the ramifications for Open XML: I need to check part 5 of the standard, which deals with extensions and future-proofing. Certainly the use of MIME types in OPC follows this principle, but it goes further than that: could DrawingML be augmented or replaced by SVG for example? (I will check this sometime)

Principle 6: A standard core should be platform-neutral and may allow optional platform-dependent extensions, in a separate annex, namespace or clause where appropriate



I think Open XML is OK in this regard: it allows Word macros, Java, and other scripts, but these are not required and, IIRC, are partitioned off.

Principle 7: A standard should address a market requirement, and the availability of a standard for one market or set of standards does not preclude the development of a standard for a different market or set of requirements



In other words, no standard should be denied merely on the grounds of "My requirements are more important than yours". In the case of Open XML, it means that "don't ignore the elephant in the room" arguments —that the needs for level-playing field basic document exchange by governments and suite vendors (ODF's supposed sweet spot) trump the needs of integrators, archivists, and so on for Office's format to be standardized— would be rejected. (Not rejected from all consideration of course, but relegated to their proper place, which is for legislators, regulators, CIO policy makers, and profile makers, not ISO.)

Whither Interoperability



When a standard follows the kinds of principles above, it allows both full-fidelity (the main principle behind the design of Open XML) to meet round-tripping/API-replacement/archiving requirements, and it sets the stage for interoperability between different systems: this is where in addition to the broad requirements of the standard, specific limitations are imposed so that all the different kinds of local, legacy, optimized, common-but-non-standard, and platform-dependent notations, media types, scripts and so on are avoided. ODF has just as much need for these kinds of profiles as Open XML does, as far as document interchange goes, by the way.
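A profile in this sense is little more than an allow-list laid over the broad standard. The Python sketch below is purely illustrative: the category names, the sample profile and the sample document are all invented, and no real profile of Open XML or ODF is being described.

```python
# Hypothetical interchange profile: only standard units, two media types,
# and no scripts at all.
PROFILE_ALLOWED = {
    "units": {"in", "cm", "pt"},
    "media_types": {"image/png", "image/svg+xml"},
    "scripts": set(),
}


def profile_violations(used, allowed):
    """List every feature a document uses that the profile does not permit."""
    violations = []
    for category, values in used.items():
        extra = set(values) - allowed.get(category, set())
        violations.extend(f"{category}: {value}" for value in sorted(extra))
    return violations


# Features found in some document that is valid against the broad standard.
document_features = {
    "units": {"emu", "pt"},
    "media_types": {"image/png", "image/x-wmf"},
    "scripts": {"vba"},
}

print(profile_violations(document_features, PROFILE_ALLOWED))
```

The document remains perfectly valid against the full standard; it merely falls outside this particular interchange profile.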

It is a kind of paradox: an "open" data format must be extensible, but the more that extensions are used, the more that a closed range of applications will be able to use the document; a document format that is "open" in the sense of having a fixed definition that allows guaranteed document interchange actually must be a "closed" (non-extensible) format! The solution? The long-standing policy of SC34 is to standardise "enabling technologies" and to leave profiles to user groups and industry consortia: XML itself is an example of this. ISO SGML allows many different delimiters; the industry consortium W3C picked a particular set of delimiters and features, added some internationalization features, and re-branded their profile "XML", which gives simpler interoperability.

In the absence of these kinds of principles, what we have is a line of argument that reduces to "Microsoft is bad, therefore anything they do or make is bad", even when Microsoft is forced to backflip and to start doing the opposite of what they previously did: in this case, abandoning closed, binary formats. Ten years ago, Bill Gates was saying they would be crazy to open up their file formats; now they are doing it. If users and, most importantly, system integrators, keep on encouraging them to further open up and adopt a more modular architecture, it bodes well for where we will be in ten years' time. The future is mix and match.

8 Comments

len
2007-05-10 20:31:09
Good approach, Rick. Fair and future-proof.


I love it when the pros are able to articulate positions that enable the standard to both function and breathe.


++1


len

hAl
2007-05-25 08:29:41
[blockquote]could DrawingML be augmented or replaced by SVG for example? (I will check this sometime) [/blockquote]


Probably DrawingML has several more features than SVG does. Even in ODF they have altered the standard SVG for use in Office documents.


I do like your description of a method. Many an anti-OOXML zealot might want to read up on that and look at the proposed standard as something which can be improved into a very good standard, and not as something to slander as a poor standard just because MS was involved in producing it.

Nik
2007-06-25 20:17:27
Hi Rick,


I found your appraisal here quite interesting, and overall quite reasonable.


I certainly agree that many objections to OpenXML could be fixed with minor changes to OpenXML, but isn't that missing the point? Surely these changes need to be made before it is accepted as a standard? My understanding is that the ISO bodies are being asked to vote on OpenXML as it currently stands, not on what it could be with some number of improvements. And if they are so simple, why haven't these changes been made already, so the bodies could vote on that? In reality, the standard will be whatever is ratified in the vote. So if OpenXML is currently broken in various areas, and is accepted as an ISO standard, then all we have as a result is a broken standard.


I also don't understand your argument that "the needs of integrators, archivists, and so on for Office's format to be standardized" requires making OpenXML an ISO standard.


Surely integrators and archivists will need to convert existing documents from their legacy formats into the new standard format, whether the legacy format is Word, WordPerfect, or whatever, and the new standard be OpenXML, ODF, or something completely different?


I am strongly opposed to OpenXML becoming a standard, primarily because I don't see "legacy support" as a valid reason for crippling a new standard. As a single example, consider the part of the OpenXML standard which specifies incorrect date interpretations for a small period of time in 1900, for the sake of legacy Microsoft documents and applications.


Surely the new standard should provide for consistent and non-broken representations, and rely on the conversion process from legacy to the new standard to take care of correcting flaws in the legacy representation?


I really believe that a new standard should represent current best-practice and be forward-looking, rather than represent past mistakes and be backward-looking.


Thanks for listening.


Cheers!
Nik

Rick Jelliffe
2007-06-26 05:27:36
Nik: I think your understanding is indeed wrong. There are various options for voting: the one that is most often used when there is a lot of interest in a specification is "no with comments" which means "yes if these improvements are made". It is not a simple up/down vote. If there are not enough simple yes votes to get accepted (which I don't expect and would not welcome) but a certain number of "no with comments", then there is a ballot resolution meeting (BRM) in which editing instructions are developed based on the "no with comments" votes. If the instructions can be adopted (and the resulting spec is still acceptable to the proposing body Ecma) then after another vote (at the BRM) the thing is accepted.


The BRM is the big chance to fix problems. Where did you get the idea that there would be no forum for fixing up problems?


I would expect any standard to have changes during the process and then to have other changes (corrigenda and addenda) even after adoption. Standards that are not maintained or improved are dead standards.


Please spare me the cut-and-pasted boilerplate talking point about Open XML. The two-month period more than a century ago in which there can be an out-by-1 error if dumb formatting is used in one of the spreadsheet date formats is simply not a big enough problem to prevent ISO standardization. It is a trivial edge case, of academic interest, not a showstopping flaw.


On the issue that a standard should represent best-practice, the trouble is that there are many other places in Open XML where there are best practices: so should we have a trade-off so that one flaw cancels one best practice, or should we judge how trivial or important the flaw is in context?

Nik
2007-06-26 20:32:07
Hi Rick,


Thanks for your thoughtful response, and for clarifying the voting and editing process. However, I still fail to understand why ECMA couldn't have addressed the problems before making the proposal and asking for a fast-track decision. Why the rush?


You state that you expect that the vote will be "no with comments". What happens if that isn't the case, and the broken proposal is accepted as it stands? This surely is a possibility? And why would ECMA put this proposal forward in its current (broken) state unless they wanted it to be accepted in this state?


You state: "I would expect any standard to have changes during the process...". Why? And perhaps more to the point: which part of the process?


I understood the purpose of all the working groups (the formulating part of the process) to be the place to sort out the obvious problems in the proposal. This is where the industry experts are supposed to get involved to ensure the standard does represent best practice and isn't broken. This part of the process would seem to be over, and we are now in the part where the standards body is being asked to accept the proposal. I don't understand why you would expect this part of the process to be responsible for fixing problems that are already apparent.


You then go on to say: "Standards that are not maintained or improved are dead standards." But this is a completely different thing to fixing a standards proposal before it is a standard. I agree that standards should be maintained once they have become a standard, but that doesn't mean that standards bodies should be asked to accept broken proposals (which is what ECMA is asking) or to fix broken proposals in the voting process.


My apologies for what you consider the "boilerplate" flaw I used as an example. I cited a single example which I felt clarified my point - I never claimed that this point alone should be a show-stopper.


You then state: "so should we have a trade-off so that one flaw cancels one best practice, or should we judge how trivial or important the flaw is in context". I don't understand how you conclude these are the only alternatives. Why can't we have a standard with only best practice, and no flaws?


There is a lot in OpenXML which is already supported in the existing ODF standard. The primary points of difference in OpenXML are all the legacy support for the Microsoft formats. In terms of best practice, surely the best way forward is to take the existing ODF standard as a starting point, add whatever best-practice parts of OpenXML that are not supported by the existing ODF standard, and discard all the legacy parts of OpenXML.


I also didn't find any response from you to my question as to why there is a requirement to make OpenXML a standard at all. Did you answer that point, and I just failed to understand that, or am I correct that you didn't answer that point?


Thanks again for your response and clarifications.


Cheers!
Nik


Rick Jelliffe
2007-06-27 05:12:41
Nik: I think there is absolutely no chance the existing text will not have many improvements, either from "no with comments" or even "yes with comments" (which I only found out was available today: it would be used for non-showstopping fixes). The process is there to help make sure the issues that national bodies raise get addressed one way or another.


Which part of the process? Well, for fast track, this is the only place where changes can be made: not changes to the semantics of the technology (that flies in the face of reason: this kind of standard is useful to the extent that it accurately reflects the external reality not because it dictates reality) though. Even in the normal standards process, you would expect comments right up to the final vote: it is not an easy process to make a good standard.


Why should it become a standard? Because there is a market requirement (e.g. integrators who currently work with .DOC binary formats, archivists, and remember that it was the EU that asked MS to submit their formats for international standardization in the first place). Because the final text will be of an acceptable quality and the IP issues will be sorted out (though I expect there are people who will never be satisfied in both areas). Because it doesn't conflict with ODF: the drivers for ODF adoption are different from the drivers for Open XML adoption, and the latter will not cancel out the former. Because Open XML is good for the open source world and open-source-using developers (such as myself, where my company uses Java on Linux for a large part), allowing better reach into MS-dominated sites.


And because the underlying data models of Open XML and ODF are different enough both at the heart and in their details that they are not substitutes for each other in their current or short-term forms. Check out ODF editor Patrick Durusau's recent comments at INCITS in this regard.

Nik
2007-06-27 18:21:42
Rick,


Thanks for your answers. It's nice to get a clear response rather than zealotry.


However, I still don't understand how OpenXML represents a benefit to integrators and archivists that ODF does not. Surely if existing .DOC documents were stored in ODF, integrators such as yourself, as well as archivists etc., would get all the same benefits you attribute to OpenXML.


I had understood from your original post that you felt integrators and archivists needed OpenXML, but you didn't say why. In your response you have repeated the assertion, but still haven't actually said why. I understand from the rest of your response that you believe that an existing .DOC document could not be converted to ODF with the same fidelity as with OpenXML. Have I correctly understood your position?


I would find that surprising given the success I've had converting to and from .DOC documents (and particularly old ones) using OpenOffice, but at this stage I would have to bow to your greater knowledge on this topic. I obviously need to research this further. On that point, thank you for the cited reference to Patrick Durusau.


One question though, in the hope you know the answer: Does the perceived ability of OpenXML to represent .DOC documents with greater fidelity than ODF rely on those parts of the OpenXML standard which don't state the behaviour explicitly, but instead state that the new application must replicate the behaviour of the old Microsoft application?


The reason I ask is because I would expect that those parts of OpenXML are most likely to feature in the "no with comments" responses, and so if they are removed from the standard, then does the original claim that OpenXML can represent a .DOC document with greater fidelity than ODF still hold true?


One other interesting point: You state: "[...]remember that it was the EU that asked MS to submit their formats for international standardization in the first place". Is that actually the case? From the reports I read, I understood that Microsoft had only been asked to open up their formats - in other words to publish the details. I didn't read anything about EU asking that they be made a standard. In addition, I understood that the directive (I thought it was a directive, not a request) applied to the existing binary formats (eg Word6, Word95) as much or more than a new XML format.


Thanks again for your helpful responses.


Cheers!
Nik.

Rick Jelliffe
2007-06-27 23:04:12
Iceberg: The trouble is that stripping out things in Open XML that are not in ODF leaves you pretty much with...ODF. It is the differences from ODF, or at least, the completeness of Open XML, that is the value of ISO Open XML.


For reviewers, it becomes difficult to say on one hand that the specification needs to be complete ("we need more") but also on the other hand that it needs to have obsolescent parts like VML entirely removed ("we need less").


I don't think that people realize it, but MS was pushed on the Open XML path by the EU, who asked them to open up their existing formats as XML and to submit them to an international standards body. In part, they are opening up in order to mollify the Europeans, it seems.