An astounding offer

by Rick Jelliffe

From the Official Blog of the Open Document Foundation. [Speaking, I suppose, to Microsoft:]
I have a counter offer that ISO/IEC might consider: Give us the keys to those legacy binaries and the documentation for the new MSXML InfoSet binaries that first appeared in Microsoft Office Excel 2007, and we'll give you international standardization for EOOXML. A fair trade, I think, because it will break the monopolist's grip, level the competitive playing field, and restore competition wherever desktop, server and device systems need to interconnect and exchange information.

Wait a minute. I thought OOXML had so many technical flaws it shouldn't become a standard! Now you are saying that these alleged flaws can be brushed aside by some other horse trade? Err, doesn't that mean they are not, in fact, showstoppers to you at all? If Microsoft gives you something else, magically they will become features not bugs? Breathtaking cynicism.

20 Comments

bryan
2007-01-24 05:46:56
come on, the argument from the odf folks has always been that only someone with the access to the legacy codebase of MS could implement EOOXML.
Preston L. Bannister
2007-01-24 08:08:07
Cynicism or pragmatism? The difference here is between new implementation, and old implementation. ODF has a nice clean definition that most folk would rather use for any new implementation. On the other hand, EOOXML is arguably meant to describe Microsoft's legacy implementation. In effect, with EOOXML the "real" standard is in Microsoft's implementation.


If Microsoft's implementation is the "real" standard, and you want the widest possible access, re-use of Microsoft's implementation is one path.

Rob
2007-01-24 08:12:16
I think the logic is something like this: Microsoft claims that the primary virtue of OOXML is its compatibility with their legacy binary documents. This is the justification for many peculiarities in the format, like calling 1900 a leap year. At all costs, OOXML was required to remain backwards compatible with the legacy binary formats.
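[Editorial illustration: the "1900 is a leap year" peculiarity mentioned above can be shown in a few lines. This is only a sketch; the function names are hypothetical and do not come from either specification.]

```python
import calendar


def is_leap_gregorian(year: int) -> bool:
    """Standard Gregorian rule: century years are leap only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_legacy_1900(year: int) -> bool:
    """Hypothetical model of the legacy spreadsheet quirk: 1900 is treated as a leap year."""
    return year == 1900 or is_leap_gregorian(year)


# The hand-written Gregorian rule agrees with the standard library.
assert is_leap_gregorian(1900) == calendar.isleap(1900)

for y in (1900, 1904, 2000, 2007):
    print(y, is_leap_gregorian(y), is_leap_legacy_1900(y))
```

The two rules disagree only for 1900, which is why the quirk matters only for dates at the very start of the legacy date range.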


However, Microsoft is the only one who has access to the specification for these legacy binary formats. No one else has access to the technical information needed, without undue experimentation, to create an XML format compatible with things like Word 95's line spacing bugs. So, by having exclusive access to that proprietary format, they are able to project forward an exclusive ability to create an XML format that is 100% compatible with these same binary files.


(Note that I'd argue vigorously that OOXML has numerous faults that cannot be explained purely by compatibility reasons, and that in fact OOXML has not succeeded at being compatible with legacy formats, but take the above as a proposition for the sake of the argument.)


So, if Microsoft would open up their binary formats, then this would allow some real competition in XML formats, where they would compete based on their elegance, their expressibility, how well they lend themselves to tooling, how intuitive they are, how extensible they are, how easy they are to implement, etc.


But I don't really agree with the Foundation's stated quid pro quo. I'd suggest something even more ambitious. Remove OOXML from Fast Track. Throw it in SC34/WG1. Take the ODF 1.2 that will be submitted to SC34 later this year. Maybe get the Chinese in there as well with their UOF format. Throw it into SC34/WG1 as well. Persuade Microsoft to make their binary format documentation that is necessary for backwards compatibility publicly available under their Open Specification Promise. Hash out in WG1 a new format that is the successor to all three formats, meeting all needs. There is no reason why there cannot be a single format that meets all of the needs. And there is no reason why this format cannot be a lot simpler, a lot better specified, and a lot easier to use than OOXML.


Would this take a lot of time and resources? Absolutely. But if you look at the time needed to review ODF, OOXML, UOF specifications, and their inevitable refreshes, and the effort needed to implement these formats in desktop applications, and other tools, and the effort needed to write converters among these formats, I'd suggest that the cost to harmonize these standards is less than the cost to perpetuate disorder in this area.

Rick Jelliffe
2007-01-24 08:28:38
Preston: I take your point, but standardization shifts power back towards users. That is one reason why the big vendors don't like it, and try to make sure it only happens under their terms. Hence the current bunfight. ODF and OOXML have schemas that can be used to test the documents and to verify that products actually generate and accept conforming data.


If they change the application output to be incompatible with the standard, then organizations which mandate that standard will reject the application. Which is why so much of the spec (see Ecma 376 page 6001 et seq) is taken up with issues of handling forward and backward compatibility.

Rick Jelliffe
2007-01-24 08:43:53
Rob: So you are saying that OOXML should be rejected because it is not big enough? :-)


Rob
2007-01-24 08:57:58
Rick,


You know better than that. Information content and page length are not 100% correlated. By analogy, a person can be both obese and malnourished.


I'm saying there should be a single interoperable document format, and that it should be open, flexible, extensible, cross-platform, multi-vendor, expressive, easy to tool, well-specified and be respectful of existing related standards.


What is the argument against this? What is in the best interest of the public? I accept the fact that Ecma serves only the vendors. But ISO, I expect more there. Maybe I'm an idealist, but ISO NB's vote out of their national sovereignty, and although JTC1 NB's sometimes delegate to trade associations for their decisions (like we do in the US), there still is (or should be) some obligation to do what is best for the national good, not just for the big vendors.

orcmid
2007-01-24 09:33:44
Well, the only thing that's official about this offer is that it is a genuine product of the Gary Edwards - Sam Hiser unreality amplifier. fr0mat.net indeed.


At least it is finally conceded that Microsoft was on board at OASIS and did not interfere in any way with ODF progress, although this grudging concession seems made only to forward the argument that Microsoft was then secretly developing a proprietary response (with the innuendo, I suppose, that MSFT was stealing ODF's secret sauce, except how do open standardization processes have secret sauce? I guess it is the "legendary" universal translatability of ODF as an intermediate between all comers).


This offer seems to resonate with Rob Weir's insistence that the legacy binary formats be disclosed. I think the idea is to convert all of those successfully to ODF without ever dealing with Office Open XML and, heaven forbid, ever exporting it. Oh, Rob is speaking for himself on the matter right here, so I will say no more about that.


I'll go to Rob's latest comment though. It presumes to know what "best" is and for some reason doesn't believe that this should be worked out through what we learn from adoption and take-up of the different approaches.


Hey, so what if OOXML becomes the PL/I (or even the more-successful COBOL or the XML Schema) of the format standards universe? Let's find out rather than attempt to short-circuit real-world adoption experience and refinement of standards/specifications over time.


If ODF is good enough, time will tell and one standard will languish and maybe move into a backwater as others have in the past. I'm betting that neither disappears and that we need a lot more experience and the maturation of both before the dust settles.

Rick Jelliffe
2007-01-24 09:45:42
Rob: But you are not saying there should be a single interoperable document format, and that it should be open, flexible, extensible, cross-platform, multi-vendor, expressive, easy to tool, well-specified and be respectful of existing related standards. We already have that.


You are saying there should be a sole interoperable document format, and that it should be open, flexible, extensible, cross-platform, multi-vendor, expressive, easy to tool, well-specified and be respectful of existing related standards.

orcmid
2007-01-24 09:46:09
I was looking down the "by now its clear" link to the grokdoc EOOXML contradictions site and things like the confusion of day numbers with Gregorian Dates. I figure the best response by ISO might be "'a pox on both your houses,' the official ISO document formats are ODA (mapped to XML from ASN.1 BER) and SGML." That should fix us.
Micah Dubinko
2007-01-24 10:13:48
Why don't they just ISO standardize their binary formats? That's "backwards compatibility" for ya. -m
Rob
2007-01-24 10:32:21
Rick, you lost me there. I'm making a simple observation, not trying to be obtuse. I'm suggesting that the public is better served if ODF and OOXML work together in JTC1/WG1 to harmonize the two specifications, so we have a single format that meets a wider set of joint needs. I'm also suggesting that if this is done, the quality of the specification would be improved, which benefits interoperability as well. I'm also suggesting that this will require less net JTC1 and SC34 resources and attention over the next several years than if it had to maintain multiple, inconsistent, overlapping specifications. And I'm suggesting that the economic cost to the industry, to all vendors, and to users, governments and academic institutions would be less than if two formats ended up as standards.


Sure, we can never prevent applications like Microsoft Office and OpenOffice from diverging their feature sets over time. In fact, they should do this, competing vigorously in the market for their users based on the innovations the applications bring. But we must also acknowledge that there is much in spreadsheets and word processors that is pretty much common practice and has been set convention for 15 years or so. There is a vast range of commonality between these two applications and others as well (WordPerfect, AbiWord, KOffice, etc.). The parts that are common, the base consensus of the industry of what a spreadsheet formula is, or how to specify a numbered list, should be specified once and standardized. There is no reason for anyone to attempt to innovate a novel vocabulary for concepts that have been stable for 15 years.


So I'm suggesting an extensible document format, where for interoperability the vendors share a common base set of functionality, but are able to extend, through namespace differentiation or other means, to add vendor-specific innovations. We can disagree on the edges, but it is silly to disagree about the commonalities and have inconsistent representations for the stuff where there is already a broad consensus.

orcmid
2007-01-24 10:35:32
I was browsing through the goodies at fr0mat.net, and I ran across this amazing statement: http://docs.google.com/View?docid=dghfk5w9_826sm2v#Three_Conversion_Approaches_to_C


"Since EOOXML was made expressly and specifically for mapping Microsoft Office IMBR to, you better get perfect fidelity.


"What about ODF? Yes, you can get the same perfect fidelity. The flexibility is there, and has been there since the February 2003 addition of the <foreign element> tags, section 1.5 of the ODF v1.0 standard (casually referred to as the <microsoft tags;> because of what they can do).


"So yes, if you can break the secret of the proprietary IMBR, understand their hidden structure and function, you absolutely can get perfect-fidelity conversions to ODF and EOOXML."


Well, here's one reason there's all this kvetching about Microsoft binary formats and even Office's in-memory formats.


But what would have been a coffee-snorting disaster if my cup hadn't just been emptied, was the homage paid to "foreign elements" in ODF. Foreign elements are by their nature *outside*of*the*spec. So, the wonder of full fidelity is to be achieved by escaping the bounds of ODF and using foreign elements, which every conformant ODF processor is permitted to ignore. Now that's a hat trick. Could anti-gravity and bath-water fusion be next?


Now, the fact that EOOXML allows that sort of thing in a controlled way (ODF has no control mechanism other than what I have just said) has been damned, indicted, and otherwise used as evidence for evil. You go, guys!


The length of these posts about the wonders of da Vinci, with no link to available code, reminds me of an old joke about an IBM salesman whose ex-wife explained that "all he did every night was sit on the edge of the bed and tell me how good it was going to be when I finally got it."

M. David Peterson
2007-01-24 14:20:37
@orcmid,


I must admit, I digg your style. Keep up the *GREAT* commentary.


@Micah,


> Why don't they just ISO standardize their binary formats? That's "backwards compatibility" for ya. -m <


I must admit that when I read this I find myself suddenly lured into just such a proposal. That said, we're talking about some significant effort here, so it's not the kind of thing that could happen overnight, obviously.


But let's be honest about this: every word processor on this planet has implemented support for Office binary formats since the dawn of man, and MSFT has never so much as hinted at any sort of legal action. In fact, according to Tim Bray [in a comment left on my post from a while back regarding whether or not OpenOffice would implement support for EOOXML],


Get a clue. OpenOffice.org has had import/export filters for every MS Office format back to the dawn of time, and can already open lots of vintage Office files that Office can't, any more. Do you have some inside information that would suggest they'll suddenly change their behavior and refuse to do it this time? If so, please share it. If not, please restrain your enthusiasm.


While attempting to restrain my enthusiasm is not something I have had much luck with in my life (I am who I am; love it or hate it, it ain't gonna change a damn thing about my attitude towards life), what I will do instead is throw some enthusiasm back towards the ODF side of the fence,


Tim's point is well taken, and in fact if truth be known, as far as I know, it's absolutely spot on.


With this in mind, I have to ask the question: Setting aside the fact that it would take time to bring to fruition, what exactly is there to lose by standardizing the previous binary formats? They're legacy formats, and the support for them already exists inside of Oo.o, and pretty much every other processor I can think of. That isn't going to change (if MSFT hasn't threatened legal action before now, it's certainly not going to happen at this stage of the game), so the fact of the matter is that specification or no specification, standardization or no standardization, the support for the legacy Office formats is going to be there, so why not just make it official?


- The competitive advantage of Microsoft stating "we provide support, and they don't" is obviously not an advantage at all given Tim's statement from above, so in all honesty... What does MSFT really have to lose?
- In fact, I would argue that what MSFT has to gain by just such a move would FAR surpass any potential loss they could incur (if that was even a true possibility, which I don't believe that it is)


So again... I do find myself lured into Micah's point.


Unless, of course, I'm missing something obvious?

Rick Jelliffe
2007-01-24 17:58:40
Dovid: How do you recognize FUD? Well, one way is that it takes a particular issue and then, without any evidence as to how regularly it occurs or in which situations, builds the issue into a showstopper. Not F, just UD. "Oooh, unless MS publishes every tinpot binary format even if no-one has used it since 1982 then no-one can implement OOXML."


But Rob does have a good point (or he would have if he made it) that the ability to translate old formats into OOXML (or ODF) is an application issue. But old VBX, COM or DDL objects will always be a problem (for OOXML and ODF both support embedded objects IIRC): in SGML terms they are SDATA entities ("SDATA" is a red flag indicating you need to step in and make custom arrangements.)

So the ability of OOXML to provide full-fidelity versions of old binary data formats is dependent on the ability of OOXML-generating programs to import those file formats, and whether the program's native data structures have mappings matching the binary format's requirements.


Similarly the ability of ODF to provide full-fidelity versions of old binary data formats is dependent on the ability of ODF programs to import those file formats, and whether the program's native data structures have mappings matching the binary format's requirements.


But there is absolutely no chance that ISO would ratify standards for dead Microsoft binary legacy formats. Life is too short. :-)

marc
2007-01-25 06:36:34
>[rob:]Microsoft claims that the primary virtue of OOXML is its
>compatibility with their legacy binary documents. This is the
>justification for many peculiarities in the format, like calling
>1900 a leap year. At all costs, OOXML was required to remain
>backwards compatible with the legacy binary formats.


>However, Microsoft is the only one who has access to the
>specification for these legacy binary formats.


Ethics, does MS know that word?

Preston L. Bannister
2007-01-25 10:29:08
Note also that a standard is much more potent with a test suite. No matter how many words you devote to a subject, there are going to be dusty corners with significant unspecified behavior.


To build a test suite you need an implementation. Who is going to do an implementation of a 6000 page spec? Who can say when an implementation is accurate? (BTW, I hope "keys to those legacy binaries" means source code.)


Given a test suite you can show when a vendor wanders off from the standard - which does much to level the playing field.

critic
2007-01-25 12:27:09
It may be only a cynical ploy, but in dealing with Microsoft I doubt that there's any such thing as too cynical.


Leaving aside the ISO image problem, it might be reasonable, and it would smoke out some serious bluffing: it gives MS what they claim to want, which is a standard, and actually makes it what they claim they want it to be, without giving them what they really want, which is ISO standardization for something they nevertheless control exclusively.


Of course, then the two would compete on their own merits, which is what Microsoft has never wanted.


The smartest thing for Microsoft to do in this situation is ignore the offer.

Rick Jelliffe
2007-01-25 19:16:33
Preston: Yes, indeed, test suites are really important for any mature standard. But, in the case of document conformance, schemas provide the lion's share of conformance testing.


As far as I am aware, the ISO rules do not allow test suites. This is because there is a general editing requirement of ISO standards that the same constraint cannot be specified twice (or, at least, twice with different language.) This prevents internal contradiction. Editors get around this by adding notes marked "informative". One place duplication can slip in is when there is a formal schema, such as a RELAX NG schema, and then a text description elsewhere, but this is to some extent unavoidable.


So what tends to happen is that ISO standards will set up the infrastructure for testing or "abstract test suites". For example, ISO ADA has procedures for what conformance testing would be (organizational issues) but not actual tests. ISO Schematron specifies SVRL, the Schematron Validation Report Language, in order to allow test suites, but not actually any test suites.


ISO does not consider itself to be in the certification business, but the standards business. Other organizations have that niche: NIST in the US for example. One of the aims, or at least benefits, of SGML/XML was that it allows testing to be potentially much simpler, because a large component of the specification is in terms of the grammar that can be checked by schema validation. In other words, test the document's conformance, not the application's conformance.


When you start to move to application conformance, it becomes really tough. Look at XSD: you can test the conformance of a schema largely by validation against the schema for schema, a single file. There are some specialist validators (IBM Schema Quality Checker) that do more. But for testing XSD applications they have about 40,000 tests: enough for wrong cases to slip in.


You could even see a schema as a kind of integrated test suite for documents. One reason people in standards are excited by XSD, RELAX NG and Schematron is the increased capability to parse data fields and test them. I mentioned before the example of bitfields: this is where instead of having multiple boolean attributes, the attributes are collapsed into a binary array that is then serialized as a number. It is of course a kind of compression technique. Schematron is perfectly capable of taking these arrays apart and validating co-constraints between them, if needed.
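[Editorial illustration: the bitfield idea described above, sketched in Python rather than Schematron. The flag names, bit positions and the co-constraint are all invented for the example; in Schematron itself the same decomposition would be written with XPath arithmetic (div, mod).]

```python
from typing import List, Set

# Hypothetical flags packed into one serialized number (one bit each).
FLAGS = {
    "bold": 0x1,
    "italic": 0x2,
    "hidden": 0x4,
    "locked": 0x8,
}


def decode_flags(value: int) -> Set[str]:
    """Take the serialized number apart into the names of the bits that are set."""
    known_bits = sum(FLAGS.values())
    if value < 0 or value & ~known_bits:
        raise ValueError(f"value {value} uses bits outside the known flags")
    return {name for name, mask in FLAGS.items() if value & mask}


def check_co_constraints(flags: Set[str]) -> List[str]:
    """Return a message for each violated co-constraint (invented rule)."""
    errors = []
    if "hidden" in flags and "locked" not in flags:
        errors.append("a hidden item should also be locked")
    return errors


print(sorted(decode_flags(5)))                 # bits 0x1 and 0x4
print(check_co_constraints(decode_flags(5)))   # hidden without locked
print(check_co_constraints(decode_flags(12)))  # hidden and locked together
```

The point of the sketch is only that the packed number carries no information a validator cannot recover: once the bits are separated, constraints between them are ordinary boolean tests.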


Schematron is a perfect language for test suites on XML documents. It allows you to make natural language statements, then figure out XPaths to implement the tests ("assertions"). Then you can further group the tests into separate phases, and run them progressively. The output from tests can contain computed values and set flags. Testing can stop when the first error is found, or continue until all tests in the phase have been performed. Because of Schematron's power, friendliness and simplicity, I expect a trend where XSD schemas (and RELAX NG) are used for the broad sweeps of document types, while business rules and variations are checked with Schematron.
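[Editorial illustration: a rough Python analogue of the workflow described above, with natural-language statements paired with paths and grouped into phases. This is not Schematron and not SVRL; it uses the limited path subset of Python's xml.etree.ElementTree, and the element names, rules and sample document are invented for the example.]

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple


class Assertion(NamedTuple):
    context: str  # path selecting the elements the rule applies to
    test: str     # path that should match something under each context element
    message: str  # natural-language statement reported when the test fails


# Assertions grouped into phases that can be run one after another.
PHASES: Dict[str, List[Assertion]] = {
    "structure": [
        Assertion(".//row", "cell", "A row should contain at least one cell."),
    ],
    "business": [
        Assertion(".//cell[@type='date']", "value",
                  "A date cell should carry a value."),
    ],
}

SAMPLE = """<sheet>
  <row>
    <cell type="date"><value>2007-01-24</value></cell>
    <cell type="date"/>
  </row>
  <row/>
</sheet>"""


def run_phase(root: ET.Element, phase: str, stop_on_first: bool = False) -> List[str]:
    """Run every assertion in one phase and collect the failure messages."""
    failures: List[str] = []
    for assertion in PHASES[phase]:
        for node in root.findall(assertion.context):
            if node.find(assertion.test) is None:
                failures.append(assertion.message)
                if stop_on_first:
                    return failures
    return failures


root = ET.fromstring(SAMPLE)
for phase in PHASES:
    print(phase, run_phase(root, phase))
```

The stop_on_first switch mirrors the choice mentioned above between halting at the first error and running every test in the phase.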

hAl
2007-01-30 09:17:08
Flaw or Asset.


OOXML uses the much criticized 1900 date format in spreadsheets. ODF is supposed to use only the ISO 8601 date format. However, has anyone from OASIS ever looked at the ISO 8601 spec? It is massive compared to the incredibly simple (limited) 1900 date format that Excel uses. How can one expect interoperability using such a massive date spec? Who in his right mind will go and implement this spec fully in his spreadsheet cells? For some items in the ODF spec partial implementation seems fine, but for a date type in spreadsheet cells? One cell in the ODF file containing an ISO date that the spreadsheet can't interpret might render the full spreadsheet useless.


But suppose you manage to implement the full ISO spec. How then do you go about calculating with it in massive spreadsheets? Converting to an internal format is common in spreadsheets to get performance. However, internal date formats are unlikely to be compatible with the massive ISO 8601 spec. So conversions will possibly leave a gaping hole in the integrity of the spreadsheet dates.
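[Editorial illustration: a sketch of the conversion being discussed, between a small subset of ISO 8601 (plain YYYY-MM-DD calendar dates) and a 1900-based serial day number. The function names are hypothetical. It assumes the commonly described behaviour of the 1900 date system, in which serial 1 is 1900-01-01 and serial 60 is the nonexistent 1900-02-29.]

```python
from datetime import date, timedelta

_DAY_ZERO = date(1899, 12, 31)  # the day before serial 1 (1900-01-01)
_PHANTOM = 60                   # serial taken by the nonexistent 1900-02-29


def iso_to_serial(text: str) -> int:
    """Convert a YYYY-MM-DD date to a 1900-system serial day number."""
    days = (date.fromisoformat(text) - _DAY_ZERO).days
    if days < 1:
        raise ValueError("dates before 1900-01-01 have no serial number")
    # From 1900-03-01 onward every serial is shifted by the phantom leap day.
    return days + 1 if days >= _PHANTOM else days


def serial_to_iso(serial: int) -> str:
    """Convert a 1900-system serial day number back to YYYY-MM-DD."""
    if serial < 1:
        raise ValueError("serial numbers start at 1")
    if serial == _PHANTOM:
        raise ValueError("serial 60 stands for 1900-02-29, which is not a real date")
    days = serial - 1 if serial > _PHANTOM else serial
    return (_DAY_ZERO + timedelta(days=days)).isoformat()


print(iso_to_serial("2007-01-24"))
print(serial_to_iso(iso_to_serial("2007-01-24")))
```

For calendar dates the round trip is exact apart from serial 60. The hard part of ISO 8601 for an implementer is everything this sketch leaves out: week dates, ordinal dates, time zones, durations and reduced-precision forms.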


So is the choice for a simple to implement date format in OOXML spreadsheets a flaw or an asset ????

Rick Jelliffe
2007-02-02 01:29:05
hAl: Yes, in practise, the more complex the spec (note that I didn't say "the larger the spec"!) the more chance that implementations will be incomplete.


This was one of the motivating factors for ISO DSDL: we felt that W3C XML Schemas had reached the point where implementations would take years, if ever, to be complete. So better to break the language up into smaller pieces that could each, hopefully, be formally specified.


But it is important to realize that many implementations of standards are incomplete because the standard simply does not require completeness. Indeed, the whole idea that an implementation is somehow lacking because it does not implement optional parts of a standard is utterly wrong-headed. Small utilities that only need to implement part of a standard are good. Vendors of large shrinkwrap or COTS application suites are the ones who stress "full implementation". People running little pipelines with transformations rejoice when they can get their job done with as far-from-full implementation as possible.