A little freaked out by ODF's definition of Open Standard

by Rick Jelliffe

If Microsoft wrote this, we would be up in arms: "people have been able to exchange spreadsheets using completely undocumented formats, such as Excel's, for many years so this notion that documents "can't be exchanged" until every jot and tiddle is written down is simply untrue." What a snowjob! Actually, the quote comes from the ODF website.

But the thing that freaks me out more is the definition of an Open Standard given on that same page: on the one hand they say that implementations of Open Standards may be extended, or offered in subset form but on the other hand that you cannot have licensing requirements to inflict embrace-and-extend tactics. But it is the Hacker's fallacy: embrace and extend is not obviated by open licensing, because most people do not have the technical capability to code up their own alternative to their platform or vendor's immediate offering. We need non-fuzzy Open Standards because not everyone has the power or capability or deadlines to take advantage of Open Source or Open Licensing: standards should reduce the risk from the decision to outsource (i.e. purchase or download) systems and applications. In practice in Open Formula, they seem to be doing things the right way, and are having conformance levels: well-defined groupings of functionality in clear namespaces.

12 Comments

Bruce D'Arcus
2006-09-03 07:37:21
I'm on the ODF TC, but two caveats before I move on: first, I'm not involved in the formula work, and second, I am speaking for myself only.


I think you raise a valid point. It's worth noting, though, that the document you cite is a wiki; a kind of working document. It's not anything formally approved by the TC, so does not reflect any official policy.


Second, there are different ways to deal with this question of standardization. In the case of formulas (which I know nothing about!), it seems to me that syntax is critical, so that ideally any new extension would get folded back into the OpenFormula spec. There's little value in having non-standard extensions I am guessing.


But we all know standards take time, so it makes sense to allow for extension independent of the formal TC process, if for no other reason than to allow them to be implemented alongside getting them included in the spec per se.


There may be other valid reasons as well.


Another point I'd like to make is that there are intermediate shades of standardization. Or put differently, one needs to clarify *what* one is standardizing. Sometimes in XML standards we focus far too much on markup.


At the ODF Metadata Subcommittee, for example, we have discussed using a standard model (in my view RDF, or a subset of it) with maybe some standard vocabularies (DC, vCard, etc.). This is conceptually similar to, and may well include support for, Adobe's XMP.


I'd suggest that provides all the benefits of standardization (indeed, you are standardizing the model), with significantly less liability. E.g. I can write (and publish) a RELAX NG or Schematron schema that validates the generic model without having to worry about the specific XML element and attribute names (see here for more).


Finally, WRT levels of conformance, in the metadata case I'd suggest it'd also be possible to have one's cake and eat it too. We can say all ODF applications must preserve foreign content in metadata as a baseline. We can then define fuller support based on the ability to read and/or write that foreign content.


So perhaps what you are suggesting here as a binary black-or-white choice is in fact shades of gray (though I admit not too many of them!).

Rick Jelliffe
2006-09-03 20:58:31
Well, I certainly agree that sometimes it is better for a standard to merely specify a container format and leave off standardizing the contents until what to standardize for the next layer emerges as bleeding obvious. Or, even better, to standardize the most common subset of the most obvious contender early but allow extensions or replacements (this is what I did with ISO Schematron, which allows other query languages than XSLT 1's XPath 1).


But a standards group needs to be candid: when there is no agreement or public technology there is no interoperability. Even with a standard, you just get a goal rather than a guarantee.

Bruce D'Arcus
2006-09-04 09:19:03
Sure, but like I said, you were citing a working document, not any official ODF TC statement. I expect any final document will be quite careful about these details.


That said, I posted a link to this on the ODF TC list, and my understanding is the document got cleaned up to better reflect the intention of its author(s) (though again, not yet the official position of the TC as a whole).


OTOH, how much of Microsoft's internal discussions about Open XML are public?

Rick Jelliffe
2006-09-04 23:39:15
Well, I am glad the site is cleaned up! It is a nice page now: ranty and stimulating. But it still reeks of having your cake (support standards) and eating it too (allow proprietary extensions) which diminishes the total message.


As for MicroSoft internal discussion being public, do you mean MicroSoft or ECMA discussion? I am not interested in taking MS' side (or ODF's side): if I have to be on a side, it is probably ISO SC34's side (or their angle perhaps)!


But I would make three points:


1) There is a fundamental difference in approach appropriate to a standard like ODF, that brings together disparate applications to let them work together, and like MSOOX, that puts an XML facade on an existing widely deployed technology. There is only room to tinker on the edges of the latter, so discussions will inevitably be more on housekeeping and matters of form rather than pioneering. In this case, Excel already exists, while ODF is reconciling different products and allowing people more choice to substitute office products.


2) ISO SC34 had a long-standing rule not to standardize *any* schemas or DTDs apart from what was needed for ISO and committee use. This was relaxed in the special case of ISO HTML; both ODF and MSOOX are not products of the ISO SC34 committee but are allowed in because ISO deems that both OASIS and ECMA meet the necessary requirements in their standards. The loophole that allowed ODF in also allows MSOOX in; and we are better off with this loophole than without IMHO.


3) As for openness of discussion, there is a new version of MSOOX out which has quite a few changes in it as a result of public discussion. I know because two of the changes were things I requested, through this blog: for it to be an ISO standard, I felt that MSOOX should not have unnecessary MicroSoft-isms, so I suggested that ECMA should also provide a RELAX NG schema and that controls for Active X should be partitioned off in some way (so that there is no anti-Java bias for example.) I'm just downloading the new drafts for a blog, so I'll know more shortly.


I see Brian Jones from MS has a new blog on their ODF converter: it looks like improvements in it are continuing. The ODF site says a new version of the OpenFormula draft will be out next month: it is very exciting.

David A. Wheeler
2006-09-05 13:43:47
Hi, I'm chair of the OpenFormula subcommittee, and I have a few comments that I hope will help. I can only speak for myself, of course.


First, regarding: "people have been able to exchange spreadsheets using completely undocumented formats, such as Excel's, for many years so this notion that documents "can't be exchanged" until every jot and tiddle is written down is simply untrue.", you then say "What a snowjob!" No - I think you're missing the point. This is NOT an argument against open standards for formulas; the whole group's PURPOSE is to create such a standard, and I (and others) have now given 1.5 years of my life to DEVELOPING such a standard. This text is instead opposition to the myth that it's "impossible" to use OpenDocument for spreadsheet files. Which is not true; people do it all the time. This sentence is a counter to those who use a double standard, claiming that you "can't" use OpenDocument, but then go use the only alternatives, the undefined .xls or MS' johnny-come-lately XML - both of which are completely controlled by a single vendor. Let's not have a double standard; if you can exchange spreadsheet files using .xls or Microsoft's XML format, then you can exchange spreadsheet files using OpenDocument. It is the double standard that that text was intended to counter. That doesn't mean we want formulas to STAY undefined, which brings me to the next point.


If your point is "there should be a standard", well of course!! At the time of writing it seemed fairly obvious that a standards body would be pro-standards :-). We are working on that very thing, obviously. In fact, the OpenDocument community was the first standards-based community to discuss and work on standardizing formulas; it was initially discussed in the OASIS TC in 2004, the first draft of OpenFormula was posted February 2005, and OASIS formally started the formula group in Feb. 2006. In all that time, Microsoft never defined its formulas with more detail than the original OpenDocument specification. Indeed, it was 15 months after the OpenDocument community began publicly defining a spec for formulas before Microsoft finally STARTED defining its vendor format for formulas (May 2006). But obviously confusion was possible about that text, so sorry about that; we've modified the "About OpenFormula" page to make that (hopefully) clearer. Now there's a "myth #2", which says "Myth #2: It's okay to leave formula formats undocumented." We believe it's critical to define formula formats, which is why we have been the leaders in doing so.


Regarding "implementations of Open Standards may be extended, or offered in subset form", I think you either misunderstand, or we are strongly disagreeing. The most popular definition of "open standards" is Perens', which SPECIFICALLY requires this. Yes, it'd be easy to get "perfect" interoperability by strictly forbidding the implementation of subsets or supersets; malevolent organizations can enforce this legally by granting patent grants that only permit implementations of "this specification, exactly". But without the ability to implement subsets and supersets, standards cannot respond to changing conditions. If you can't implement a subset, you implicitly require that everyone recreate a new incompatible standard from scratch when only a subset is needed. This would quickly cause a proliferation of completely incompatible standards for only slightly different purposes. It also essentially forbids open source software implementation (because such projects work publicly and necessarily start incomplete), and inhibits most proprietary software implementation (because implementors often have to work in stages too). Subsets whose value is proven can become standards themselves; XML, for example, was created as a subset of SGML. If you can't implement a superset, there is no way for implementors to experiment with new capabilities that should eventually make it into the standard. Standards groups should try to avoid standardizing things that have no field experience - how can you get that experience, except through supersets? A standard that can never have a superset implemented quickly ossifies, because there is no way to gain experience. The solution is to let implementors add capabilities (supersets) beyond the standard; the successful experiments should then be incorporated into future versions of the standard.


I think you're confusing the IMPLEMENTOR'S need to be ABLE to implement supersets and subsets, with the USER's need to not be confused. Certainly it is critical for users to know when they are only getting a subset implementation, and what the non-standard extensions are in a given implementation so that they can avoid them. I completely agree that users need to have that information! But resolving that problem by making it ILLEGAL to implement supersets and subsets is not the solution. Instead, there are many other ways to solve it.


One simple solution is to simply say that implementors can't claim that they implement the entire standard if they only implement a subset. That is NOT the issue this text is talking about here. This text is about countering a new (nasty) trick in the book you may not have heard about - organizations who work to make it ILLEGAL to implement subsets or supersets of standards. The problem is that patent-wielding organizations can make it ILLEGAL to implement a subset or superset, as a condition to creating the standard. If SGML had had this arrangement, then XML could never have been born (XML is essentially a subset). If ASCII had had this arrangement, Unicode/ISO 10646 could not have been born (Unicode is essentially a superset).


We want two things: (1) implementors are free to implement subsets and supersets, yet (2) users must not be misled by subsets or supersets. In OpenFormula, we resolve this by defining "groups" - to comply with the "Small" group you have to implement 109 functions plus some other capabilities (involving minimum limits, types, etc.). Users can say "I need at least the Small group" and know what they're getting.
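
As a rough illustration of how a user might check a product against a named group, here is a minimal Python sketch. The group contents below are hypothetical placeholders (the real "Small" group requires 109 functions plus other capabilities, none of which are reproduced here), and the helper names are invented for this example only.

```python
# Hypothetical group definitions for illustration only; these are NOT the
# actual OpenFormula group contents.
GROUPS = {
    "Small": {"SUM", "AVERAGE", "IF"},
    "Medium": {"SUM", "AVERAGE", "IF", "VLOOKUP"},
}


def missing_functions(implemented, group):
    """Functions the named group requires but the implementation lacks."""
    return GROUPS[group] - set(implemented)


def conforms(implemented, group):
    """True if every function required by the group is implemented."""
    return not missing_functions(implemented, group)


def extensions(implemented, group):
    """Functions beyond the group: allowed as a superset, but worth reporting to users."""
    return set(implemented) - GROUPS[group]
```

Under these assumptions, a product that implements everything in "Small" plus a vendor-specific function still conforms to "Small", and `extensions` reports the extra function so that users who care about portability can avoid it.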


Conformance levels do not necessarily undermine interoperability. It's easy to have "interoperability" if there is only one permitted product, but that comes at the cost of monopoly prices and service. In many environments, a "complete set" isn't really practical at all, so if you require "all or none", the result is often a hodgepodge of incompatible "none"s. Rather than have NO interoperability, it's better to have predefined subsets, so that products that can practically only implement a subset implement the SAME subset. By predefining subsets, you can actually INCREASE the interoperability. And if you need more than some subset, then select your products based on that requirement. It's simply a matter of selecting the products that meet your needs.


Your article says: "but on the other hand that you cannot have licensing requirements to inflict embrace-and-extend tactics". I don't understand this; open standards MAY impose licensing requirements to COUNTER embrace-and-extend tactics, though OpenFormula does not. You might want to recheck that text.


You say that my document adopts "Bruce Peren's criteria but falsely, as far as I can see, ties Peren's "Ability to create extension or subset" with Krechmer's "Open Interface"." Actually, it was Krechmer himself who tied these together, though I do agree that they are tied. You may not think they are tied, but the original author of that paper (Krechmer) thinks that his "Open Interface" is tied back to that, so I suggest you contact Krechmer if you disagree.


We've made various modifications to the Wiki text, in the hopes that they will clarify a few things. Please feel free to email us directly if there are other concerns or text you aren't sure of.
We operate very much in a fishbowl - our specs are public and released at least weekly, our discussions are captured on a public mailing list, and so on. And while this makes us especially easy to criticize (because we show all that information), it also makes our results better. We're writing a specification that we want to be used - and useful - in 9000AD, presumably long after all the current application suppliers and document developers are gone. True world-wide review all along the way, by many different parties in a neutral setting, is the only way to get a truly good standard for the ages.

Rick Jelliffe
2006-09-09 08:35:36
David: Thanks for the long reply which repeats so many of the points on the wiki page.


Ken Krechmer's examples of open interfaces are of fallback or fall forward systems, in which an "etiquette" (i.e. a sequence or protocol or software development pattern) is followed to find the optimal interface that both parties understand. His examples are modem data comms (I spent the early part of my career writing microcontrollers for modems) and XML (under, I suppose, the assumption that people will write software that ignores unknown elements.) The essence is an etiquette.


But Perens mentions nothing about an etiquette, does he?


An etiquette for Open Formula, in Ken Krechmer's terms (on the face of them) couldn't really involve negotiation in a static document language, so it would either have to be a series of alternative forms (e.g. a graphic and a formula) which seems low value for formulas, or some kind of requirement on how a formula application should interpret terms or functions that the implementation doesn't understand, so that the thing still works. I don't see how that is possible.


Where is the "etiquette" in Open Formula?


I agreed that conformance levels are good, but they don't actually seem to meet the level Krechmer's examples provide. And, yes, I think Perens is wrong here: at least, an interpretation of Perens that says that open standards require allowing arbitrary ad hoc proprietary extensions and subsetting is bogus beyond belief and, on the face of it, completely against Krechmer's point about etiquettes. (And in any case, Open Formula does not seem to propose allowing arbitrary ad hoc proprietary extensions and subsetting: it has some conformance levels for well-management.)


So if you want to have a page saying "we don't need standards" in order to promote your standard, I cannot stop you. But it is odd, at the least!


(On the issue of whether Microsoft or ODF was first in anything, I don't really use Microsoft products myself or follow the industry much, so I have no expertise to judge; I do remember that Excel 2002 had a public schema and included R1C1 and A1 format cells, which are fairly well known things (user documentation, etc.) though: it seems iffy to claim that they only started working on their formats in 2006.) Here's a good quote from Brian Jones, 14 months ago in 2005: "The great thing is that both MS Office and OpenOffice use XML for their formats, both formats are fully documented, and both are available to use royalty-free, so anyone can come along and build a filter that translates between the two. The key for full interoperability of course will be that both pieces of software support the same set of features, otherwise there will inevitably be some loss."


So on reflection, perhaps I can state the issue even more strongly: I suspect that conformance levels (such as Open Formula adopts) are not at the level of Ken Krechmer's "etiquettes" and so adopters of Open Formula need to be clear that only by restricting themselves (how?) to the minimal common core will they get the interoperability they may expect to get from an open standard.


(Regular XML-DEV people will recognize this meme!)

orcmid
2006-09-09 11:31:30
It looks like the problem here is not so much the basic parameters of an open standard, but the problems of relying on an open standard in interchange of documents that will be handled by different standard-conforming applications.


In that respect, ignoring the ODF angst, the normative statements of the Open Document Format itself are instructive: there is no "floor" requirement for minimum agreed comment and use of features; there is a peculiar simplistic statement on what happens when "foreign" elements are encountered (ignore the element, pass through the content, and attempt to preserve the element, whatever that means in the face of editing). Finally, there are some places where namespaces not specified in ODF 1.0 are explicitly allowed, and it is the wild west.


There are also difficulties in practice. OpenOffice.org does make use of the namespace provision for introduction of formulas, but I don't know how one makes a normative reference to said namespace and knows what its specifications are in, say, a procurement specification. (I suspect similar things will arise with the ECMA Office Open XML too.)


Finally, in the early practice with ODF, we already have namespace abuse and/or misunderstanding of the rules about who can add things to ODF-specified namespaces. It will take a while for the etiquette around that to get spread around, as you know.


And, of course, any place where the provisions of an open standard are underspecified, there are going to be interchange and substitution problems. I know the ODF specification better than ECMA Office Open XML, but I am still being surprised by new areas of underspecification and interesting interpretations (the under-defined use of MathML being the latest case that came to my attention).


So there will have to be conformance levels for application situations, perhaps by some kind of profiling agreement. This sort of thing was incorporated in ODA (as you might recall), but perhaps the better example is what the WS-I folks did to rationalize the use of a variety of web service specifications for integrated, mutual use.


I wonder about this case because it would appear that the crux of the Massachusetts initiative would seem to be that there be some way that the agreed use of documents in civil administration would be preserved by different ODF-compliant products and someone is going to have to specify and verify what the common profile for interchange is going to be. I still don't see anyone addressing that problem or stepping up to what it will take to accomplish.

orcmid
2006-09-09 11:47:00
Oops, pressed Post too quickly. I meant "ignoring the Open Formula angst". One must not ignore ODF angst because that specification is cast in stone and it will be tough to deal with its sloppiness in future revisions.


And I meant to say 'there is no "floor" requirement for minimum agreed content and use of features'.


The risk of course is that we all speak ODF (or ECMA Office Open XML) but even though we are honoring the same specification, none of us can understand each other enough to present the other product's documents to our users with a strong assurance of fidelity.


Rick Jelliffe
2006-09-09 22:12:41
Orcmid (Dennis): Yes, I agree. I think users/adopters of products and vendors/developers of products are stakeholders with different viewpoints. The kind of technology and checks that a vendor/developer dominated group will come up with is different from the technology and checks that a user/adopter group would come up with.


In the case of ODF and MSOOX, I expect an organization like NIST might have a role in developing a test suite and perhaps certifying products. In the cases of ODF and MSOOX, this might be as simple as opening a file in product X and confirming that the information/formatting is intact. I expect review magazines would do this too, and word of mouth.


However, is that really good enough for what we want when we adopt an open standard for document interchange? I suspect a minimum common profile of ODF/MSOOX will need to emerge, along with a test suite.


The model for this is the OASIS exchange table model. The US military CALS project developed an enormous kitchen sink table model, coping with everything anyone had ever done in a hand drawn table for the military. But all the typesetting systems only supported different subsets, so interoperability was terrible. This forced the vendors to get together (at SGML/Open, which is now OASIS Open) and agree to cooperate: they made a matrix of features each supported, and agreed that if there was any feature that most supported, all would support it. And they took out from the table model any feature that most people didn't support. The tables were dumbed down, but exchange was improved.
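
The feature-matrix rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vendor and feature names are invented, and "most" is read here as a strict majority, which is my guess rather than anything the vendors are recorded as having agreed.

```python
from collections import Counter


def common_profile(support):
    """Keep each feature that a strict majority of vendors supports.

    `support` maps a vendor name to the set of features it implements.
    The strict-majority threshold is an assumption made for this sketch.
    """
    counts = Counter(f for features in support.values() for f in features)
    return {f for f, n in counts.items() if 2 * n > len(support)}


def work_needed(support):
    """For each vendor, the profile features it would still have to add."""
    profile = common_profile(support)
    return {vendor: profile - features for vendor, features in support.items()}


# Hypothetical matrix: three vendors, three table features.
matrix = {
    "vendor_a": {"row_span", "col_span", "rotated_cells"},
    "vendor_b": {"row_span", "col_span"},
    "vendor_c": {"row_span"},
}
```

With this made-up matrix, `row_span` and `col_span` stay in the profile, `rotated_cells` is dropped, and `vendor_c` is the only vendor with work to do (adding `col_span`).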


So I wouldn't panic too much about sloppiness in ODF's spec, because hopefully it is the start of a conversation between vendors, and between vendors and users. I think vendors/developers understand that users choose "openness" because they want interoperability. Vendors/developers should be under no illusion: users will have no tolerance for a standard that does not provide interoperability, no matter how "open" it is.

Olivia
2006-09-11 18:46:31
David A. Wheeler, it's sad that you, the "Chair" of the OpenFormula subcommittee, resort to Microsoft-bashing rather than acting professionally. Microsoft's "Johnny-come-lately XML format"? Microsoft Office has had publicly documented XML formats for a while now. Just because ODF was the first to be ratified by ISO means nothing, particularly when you consider the fact that ISO essentially rubberstamped the ODF spec provided by OASIS, a spec that was rushed just to beat Microsoft to the punch, but at the cost of being incomplete. The ECMA process that MOOX is going through is much more rigorous than the farcical process that ODF went through with the ISO, so no wonder ODF got ratified first. I still don't know why ISO rubberstamped an incomplete spec, but what's worse is that guys like you are pitching ODF to government entities, trying to convince them to mandate exclusive use of what is (unbeknownst to those governments) an incomplete spec.

hAl
2006-10-08 01:27:03
The 1.4 draft of OOXML contains lines about this now too.
Both conformance to the standard and extensibility are now laid down in the standard itself.
Same problems indeed, but at least they seem to have thought about it a bit more at Ecma.

imparare
2007-04-14 23:13:54
Interesting comments.. :D