Harmonization by augmenting ODF with OOXML elements

by Rick Jelliffe

One possibility for the co-existence that hadn't grabbed my attention until today has probably been obvious to everyone else: when converting from OOXML to ODF just embed OOXML-namespaced elements inside the ODF where there is no direct equivalent.

This allows good round-tripping, doesn't require ODF to be extended with legacy Office-isms, allows developers who want to support more than the ODF base to do so, gives better fidelity for Office users, and doesn't require that competitors sit down in the same room. Furthermore, in the case of, say, DrawingML, the original can be preserved alongside the SVG conversion, so the chance of round-off errors and data corruption from incomplete converter implementations is lessened.

ODF already allows foreign namespace elements. I guess what ODF would need to support this well would be a mechanism to say "This kind of foreign element should be stripped out when its context changes, but round-tripped otherwise."
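As a rough sketch of what such a mechanism might look like, the Python below embeds a DrawingML element in an ODF paragraph and tags it with a preservation hint. The hint namespace `urn:example:roundtrip-hint`, its `on-context-change` attribute, and the functions `embed_foreign` and `save` are all hypothetical names invented for illustration; nothing like them is defined by ODF or OOXML.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# DrawingML main namespace, used here as the "foreign" vocabulary.
DML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
# Hypothetical namespace for the preservation hint; not part of any standard.
HINT_NS = "urn:example:roundtrip-hint"


def embed_foreign(paragraph, foreign, strip_on_edit=True):
    """Append a foreign element to an ODF paragraph, with a preservation hint."""
    hint = "strip" if strip_on_edit else "keep"
    foreign.set(f"{{{HINT_NS}}}on-context-change", hint)
    paragraph.append(foreign)
    return paragraph


def save(paragraph, context_changed):
    """Copy the paragraph, dropping foreign children marked 'strip'
    when the surrounding context has been edited."""
    out = ET.Element(paragraph.tag, paragraph.attrib)
    out.text = paragraph.text
    for child in paragraph:
        is_foreign = not child.tag.startswith(f"{{{TEXT_NS}}}")
        hint = child.get(f"{{{HINT_NS}}}on-context-change")
        if is_foreign and context_changed and hint == "strip":
            continue  # the ODF content changed, so the OOXML original is stale
        out.append(child)
    return out
```

An application that has not touched the paragraph would call `save(p, context_changed=False)` and write the OOXML fragment back out untouched; one that has edited it would pass `True` and let the stale fragment fall away.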

The reverse is also true: where ODF supports something that OOXML does not, it can either use the customXML elements or a separate XML part.

That would be a nice use of XML namespaces, actually. Rather than harmonize into a single format, augment each other without defining new elements. I don't know that this would be satisfactory in every case, though. The daVinci plugin, as I understand it, generated ODF where it could but resorted to nonsemantic markup (binary?) where there were elements it didn't understand or ODF didn't support; a better approach would be to use Office Open XML elements for that purpose.

(In a previous blog, I raised the option that a document can be ODF and OOXML at the same time. The idea here augments that considerably. But the two ideas have this in common: that there are other ways of thinking about ODF and OOXML than just as arch-rivals. People think that either ODF or Open XML will go away: I think both will be around for a while, so the issue we have to face is how to manage them. Hat-tip to Patrick Durusau for the namespaces idea.)

25 Comments

bryan
2007-09-12 01:00:49
hmm, actually I did a proof of concept for xforms to extend the ODF one is in. I suppose the same thing could be done with OOXML in ODF.
Iceburg
2007-09-12 01:01:59
> where ODF supports something that OOXML does not
For example?
Jesper Lund Stocholm
2007-09-12 01:58:01
Iceburg:


OASIS ODF 1.0 section 15.4.36 (Text Blinking). OOXML does not support blinking text in WordprocessingML. VML does (2.18.101 ST_TextEffect (Animated Text Effects)), but it seems to be a rather ahem ... "not-so-good" idea to embed VML just to allow blinking text.

Rick Jelliffe
2007-09-12 02:31:41
Iceburg: For a list of some of the features that one has but the other does not (*and* for which there was no workaround found, so this is not a complete list at all) see


http://odf-converter.sourceforge.net/features.html#hUnsupportedDOCX


Jesper: Blink is of course a dangerous element, which is why it was removed from HTML. Certain rates can set off epileptic fits; I was surprised to see it requested in NBs comments and I expect the NB will rapidly withdraw it on accessibility grounds. I cannot think why it is in ODF: IIRC the story used to be that some rates were worse than others, with one or two blinks a second being the worst.

Asbjørn Ulsberg
2007-09-12 04:40:24
This is how Microsoft should have developed OOXML in the first place: as a set of extensions to ODF that slowly could be incorporated into the core ODF standard over time.
len
2007-09-12 06:02:43
It is how XML-based standards should work in general. Namespace embedding is a crucial piece of ASP.Net and has been for some time. With user-developed controls, even more so. I can almost recite the arguments by heart as to why this isn't more widely practiced and cite cases where the designers have come back and said "Oh d'oh" facing the complex beastie they created by not creating simple dry specs first. More time spent partitioning applications (not documents) into encapsulated user controls should provide more experience for the next generation who will do this as a matter of course. It is easy to see that in .Net although it was building VR worlds that made it clearer to me (try routing real time events through a dozen or so of these).


I don't think either standard goes away. I think like Docbook, 28001, 38784 they become less relevant to the broadest expanse of the information ecosystem and settle down into ever smaller niches. For the foreseeable future, variants of HTML, CSS, etc. dominate the larger expanse with lots of little domain-specific namespaces making up the bulk and humans filling in the goo by hand.

William
2007-09-12 06:36:13
Asbjørn: I would posit that MS couldn't have developed their XML-based format (which everyone wanted) based on ODF. Here's why: they have buckets of backwards compatibility to maintain, some of it awful - this is not part of ODF's goal, and by the time ODF was mature enough to add to they would have been even later to market - an unacceptable delay. Furthermore, if they had taken the approach you suggest they would be widely accused of embracing and extending an existing format to its detriment.


MS is in an impossible position - the relationships and the overhead they have created over the past 20 years make it impossible for them to be agile, and their technical decisions saddle them with terrible burdens. This is not an environment where extending a fledgling file format that isn't a standard and doesn't fit their needs (remember that this call had to be made several years ago) can be seen as a good move.

Rick Jelliffe
2007-09-12 06:40:52
Asbjørn: Unfortunately, not all differences can be handled so easily. When ODF 1.2 is out and provides a more complete base, and IS 29500 is standardized to provide a good catalog of requirements, then you will have a much more compelling case, particularly for word processing documents, that ODF with OOXML extensions is all that anyone except archeologists really needs.


However, even archeologists are a legitimate user base for a standard :-)


In three years, the landscape may have changed enough so that this is a really viable option, even for MS to consider. (But I think it is unfair to require precognition or even in that the ODF process.) I have no faith that complete harmonization is possible or even necessarily desirable, but it is not an all-or-nothing game.

Rick Jelliffe
2007-09-12 08:11:33
William: Yes. But, as I have said before, I think the drivers for ODF are different from the drivers for OOXML; for some markets one will be the dominant format (ODF for public documents; OOXML for private ones) but governments and users need to keep putting the pressure on both sides to incorporate the converters as part of the standard distributions (Sun should stop baulking at adding the OOXML converter to Open Office; MS should add the ODF converter at the next service pack for Office 2007.)


I often am pretty scathing about the kneejerking anti-OOXML side's views, but the reasonable pro-ODF people do have many compelling points, and one that I agree with is the need for all office software to allow the user to select the default save format (certainly to select between the XML-in-ZIP formats). That is the kind of thing that is outside ISO's area, certainly for a document standard.


I think that some of the dissatisfaction some people have with DIS29500 is that they think we need to be able to constrain and explain applications, hence the chomping at the bit with DIS29500 because, being just a document format, it doesn't go far enough (for them). So some kind of ISO standard for (abstract) office applications might be appropriate....at a future stage: for example, a language that provides metrics on the typesetting/hyphenation/kerning/breaking of an application, to provide more hints for a receiving program on a different platform.

Tim Bray
2007-09-12 08:59:21
Sounds plausible. And after you've gone through the work of establishing which OOXML elements do have direct ODF equivalents and what they are, you have to wonder why you need two different XML vocabularies for the same thing. So why not just choose one vocabulary for the redundant bits?
Jody
2007-09-12 12:04:57
I don't see this working out usefully for areas with overlapping functionality.
- The formula specifications are incompatible. XL could store a function that OO.o could not parse, eg store it as "xl:func(bobo)" rather than "oooc:f(bob)" but it would get dropped on the round trip, and be useless on import.


- Other overlapping features, such as pivot tables, or autofilters also have non-trivially different semantics. It's not a question of adding some missing attributes, or extending an enum or two.


- Even if ODF made promises to store and re-save all of this out-of-standard content (a performance nightmare for OO.o or vice versa) the result would be significantly slower than OOX for spreadsheets. Without support for shared strings and formulas, files bloat quickly.


I fear the result of such a bastardization would end up being pretty much the equivalent of the 'dual format' files from Office 97, which saved two complete parallel copies of the content.

Kriz
2007-09-12 15:09:40
As Jody has already said - adding some OOXML elements and attributes to ODF documents will not work. The whole document structure is quite different (for example lists, tables and text runs in WordprocessingML).


Furthermore there is a severe problem in the ODF spec:
"Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes. (...) Conforming applications that read and write documents may preserve foreign elements and attribute".


So conforming applications do not need to preserve foreign elements and at least OpenOffice does not.
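One way to see whether a given application preserves foreign content is to compare a file before and after a load-and-save cycle. The sketch below is only that: the helper names `foreign_elements` and `lost_on_roundtrip` are made up for illustration, and treating every namespace that starts with `urn:oasis:names:tc:opendocument:xmlns:` as "defined by the specification" is a simplification, since ODF also uses other namespaces (Dublin Core, XLink and so on) that a real check would have to list.

```python
import xml.etree.ElementTree as ET

# Simplifying assumption: only namespaces with this prefix count as ODF's own.
ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"


def foreign_elements(xml_text, known_prefixes=(ODF_PREFIX,)):
    """List the qualified names of elements outside the known namespaces."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace}local".
        ns = el.tag[1:].split("}")[0] if el.tag.startswith("{") else ""
        if not ns.startswith(tuple(known_prefixes)):
            found.append(el.tag)
    return found


def lost_on_roundtrip(before_xml, after_xml):
    """Foreign element names present before the save but absent after it."""
    after = foreign_elements(after_xml)
    return [tag for tag in foreign_elements(before_xml) if tag not in after]
```

Feeding it the content.xml from before and after a save gives a quick list of what the application threw away; an empty list means the foreign elements survived, though it says nothing about whether their attributes or contents did.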


BTW:
Converting DrawingML to SVG is pointless since ODF has its own drawing markup language inside the "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" namespace and does not support SVG (contrary to popular belief).



Kriz
2007-09-12 15:22:24
It looks like the daVinci plug-in stores RTF inside ODF files since this is all it gets from the (Winword) converter API (Winword for the most part is a RichText/RTF edit control/widget).


The Sun converter on the other hand starts a stripped down version of OpenOffice/StarOffice and loads the RTF data into this OpenOffice instance which then stores the parsed RTF stream into an ODF file.

Rick Jelliffe
2007-09-13 00:35:59
Tim: What Jean Paoli told me was that they had tried to produce a nice higher-level schema (which would indeed be very ODF-like) initially, but it did not work out, so they realized they needed to just provide a very direct modeling. It was at the time of SML when people were saying "we don't need attributes: structured property elements are better" and "we don't need mixed content" and the design follows the fashion of that time.


But to reduce the problem to there being different ways of saying bold avoids the issue that they have different ways of saying different things too. I tend to agree with you, but in a sense different ways of saying bold is not a problem anyway, because it is semantic variation not element names or structure trivia that causes problems.


The transformation from wml:p/wml:Pr/wml:font/@bold to //odf:style[@id=current()/@style]/@bold (or whatever they are) is not difficult (though the ODF method of making anonymous styles is quite baroque and ungainly compared to the elegance of OOXML: one of the few cases!)
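To make the shape of that mapping concrete, here is a minimal sketch in Python rather than XSLT. It uses the element names as I understand them from the two specifications (w:r/w:rPr/w:b on the OOXML side; an automatic style:style with fo:font-weight, referenced by text:span, on the ODF side) rather than the shorthand paths above. The function name `convert_paragraph` and the style name "T1" are arbitrary, and only bold is handled; a w:b with w:val set to false is not checked for.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def convert_paragraph(w_p):
    """Map one w:p to (text:p, [automatic styles]). Only bold is handled."""
    styles = []
    bold_name = None
    p = ET.Element(f"{{{TEXT}}}p")
    for run in w_p.findall(f"{{{W}}}r"):
        span = ET.SubElement(p, f"{{{TEXT}}}span")
        span.text = "".join(t.text or "" for t in run.findall(f"{{{W}}}t"))
        # Simplification: any w:b counts as bold; w:val="false" is ignored.
        if run.find(f"{{{W}}}rPr/{{{W}}}b") is not None:
            if bold_name is None:
                # ODF keeps the formatting in a separate, named automatic style.
                bold_name = "T1"
                st = ET.Element(
                    f"{{{STYLE}}}style",
                    {f"{{{STYLE}}}name": bold_name, f"{{{STYLE}}}family": "text"},
                )
                ET.SubElement(
                    st, f"{{{STYLE}}}text-properties", {f"{{{FO}}}font-weight": "bold"}
                )
                styles.append(st)
            span.set(f"{{{TEXT}}}style-name", bold_name)
    return p, styles
```

The indirection is visible in the return value: the OOXML run carries its formatting inline, while the ODF output needs both the span and a style declaration that lives elsewhere in the file.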


But the transformation from DrawingML to SVG, or from linear WML to hierarchical ODF is not trivial (in the first case) or reliable (in the second case).


Tim, perhaps you can tell me. Will Sun be supporting OPC in ODF, since that sets the precedent for indirect referencing in XML-in-ZIP formats? Why did Sun vote against adding Office-isms to ODF, according to Gary Edwards?


Rick Jelliffe
2007-09-13 01:16:10
Jody: I agree that tacking on missing parts in different namespaces only goes so far, but that does not make it useless!


I think you show a good understanding of one of the central problems in the ODF/OOXML debate, and of application interoperability in general. That is that applications have their own data models independent of the file format. They support different features. They do things in different ways. They have various workarounds to accept and generate data, but round-tripping with exact information preservation has never been a condition or priority for applications.


One of the old data conversion tricks was to open, save, re-open, re-save a document in a single application before starting to process it, because this multiple opening tended to bring documents into line and remove variation based on their history. This worked because of the translation that goes on when reading and writing documents.


This is why for document interoperability we need to think in terms of graceful degradation rather than guaranteed full fidelity. I don't think it is reasonable (or at least practical) to demand that the internal data structures *must* conform and be limited to the features of a particular interchange format. Which is not to say it mightn't be a good way forward, but it is an argument that needs to be started. ODF was based on the idea of unifying the file format and never sold on the basis of unifying application's internal structures, but perhaps that is what is behind the hopes of the anti-OOXML crowd, unarticulated.


On that note, I was pleased to read that IBM has realized it has lost at ISO and so, cannibalizing desktop Lotus, is going to make a much more active contribution to Open Office. As an Open Office user it seems to me that recently for every one step it takes, MS Office takes two. The interview I was reading has some really good material about containers (cue comment on OPC!) and this good comment:


In order to do that, some of our priorities may have to change. Traditionally, we've grown accustomed to expecting pixel-perfect documents. But now we may often be more concerned about the content than with that level of fidelity. We may be willing to make some compromises in some circumstances, and we may want to throw out the original format in others, when we want to automatically reformat fragments from multiple sources into a new document with, for example, consistent fonts specified for the container. When we create that container, we might want to set that formatting as a default, but you'd also have a lot of freedom to work with the end result.


That only took them thirty or forty years! :-)


Kriz: That is an interesting point about ODF. If they are just making a profile with the published semantics and names and just use different namespace to indicate this, I think that is quite smart of them. I will have to look more.

Bruce
2007-09-13 05:37:02
Rick: "Why did Sun vote against adding Office-isms to ODF, according to Gary Edwards?"


This is a *completely* biased representation.


First, if we're talking about the conflict over lists, it was not "Sun" that decided the outcome. Pretty much everyone on the committee (including developers from KOffice and IBM) voted the same way. I (independent) didn't vote, but if I had, I would have voted with the majority as well. To me the alternative proposal was just not a good proposal.


Second, I do not believe it accurate to say that the vote was "against adding Office-isms to ODF." That is what Gary claims, but it's not a claim accepted by, for example, KOffice engineers.


At the time I said on the list [1] that "at a certain point, we need a formal -- and public -- way to resolve interoperability issues between OOXML and ODF. We cannot have people slipping in unstated requirements of this sort every time we entertain some enhancement. I don't know what kind of political or organizational work needs to be done to make this happen, but I suggest somebody step up and do it."


It's a completely dysfunctional situation we're now in. As a first step, there really needs to be some formal liaison between the two TCs. Maybe down the road that ought to include face-to-face meetings as well to iron out problem areas.


[1] http://www.oasis-open.org/archives/office//200705/msg00006.html

Bruce
2007-09-13 06:05:03
kriz: correct me if I'm wrong, but I'm pretty sure that OOXML has roughly the same language on preservation of foreign content as ODF. The general problem is about the unintended consequences of mandating preservation. This is not something easy to require (but is the kind of thing I imagine could be discussed in the hypothetical liaison relationship I suggested).
Rick Jelliffe
2007-09-13 08:31:43
Bruce: I think OOXML has a different approach: rather than allowing foreign elements, it provides a set of tags like CustomXML which allow the use of foreign elements in what we used to call an "architectural form", where the element name is given as an attribute value.
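A small sketch may help show what "element name as an attribute value" means in practice. In WordprocessingML the wrapper is w:customXml, with the foreign namespace and local name carried in its w:uri and w:element attributes. The helper names `wrap_as_custom_xml` and `unwrap` below are invented for illustration, and the sketch ignores where in a document such a wrapper may legally appear.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def wrap_as_custom_xml(namespace_uri, local_name, children=()):
    """Build a w:customXml wrapper whose attributes name the foreign element."""
    el = ET.Element(
        f"{{{W}}}customXml",
        {f"{{{W}}}uri": namespace_uri, f"{{{W}}}element": local_name},
    )
    el.extend(children)  # ordinary OOXML content (paragraphs, runs) goes inside
    return el


def unwrap(custom_xml):
    """Recover the foreign qualified name from the wrapper's attributes."""
    uri = custom_xml.get(f"{{{W}}}uri")
    name = custom_xml.get(f"{{{W}}}element")
    return f"{{{uri}}}{name}"
```

The foreign vocabulary never appears as actual elements, so an OOXML consumer sees only elements it already knows, while a converter can still read the attributes and reconstruct the original name.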


This is an odd case where the horrible linear structure of OOXML works to its advantage, because there is less chance of asynchronous elements. Also, because OOXML is much more based on small fragments with a universal reference mechanism (OPC) there is less need for embedded foreign elements in the main document: any large chunk is likely to be a separate part (file in the ZIP.)


On top of that, there is the extensibility mechanism that part 5 specifies, but I don't know how that will work out in practice.

Rick Jelliffe
2007-09-13 08:46:46
Bruce: I was asking about Sun because Tim is a Sun guy; there are usually several different reasons for and against any position. I probably expressed myself a little cattily (Tim should be used to that by now) but the question was not entirely flippant.


I am aware that others in the ODF TC disagree with Gary Edwards' take on things, which is why I asked the question. If you have on one hand ODF people saying "You should use ODF because it can do everything you need" and on the other hand you have an ODF TC person saying "ODF TC deliberately decided not to add things that Office has" then it seems reasonable to figure out what is going on, ne? And Tim is the horse's mouth: as IBM's comments yesterday indicate, Sun plays a leading role in ODF, to its credit.


There seem lots of good answers. Perhaps Sun doesn't want to entrench Office-specific behaviours further. Perhaps Sun wants ODF to be free of legacy Office crapulosities. Perhaps Sun wants to approach the problem systematically, when ISO Office Open XML is published. Perhaps Sun is waiting to welcome MS into the ODF TC with open arms, and is creating a friendly vacuum to suck MS in... All interesting possibilities.

Bruce
2007-09-13 09:39:04
Rick: first, I don't think Tim has anything in particular to do with ODF development. Yes, he works for Sun, but so do thousands of other people. I really don't think Sun positions WRT to ODF are driven by management in California in any case, or that they are even likely aware of any of the technical details.


Second, as I said, this was not per se a Sun position. The arguments are all on the list archive if you're really curious.


Finally, IIRC, the issue WRT to lists was a claim (unconfirmed, and disputed) that the proposal that won represented an incompatible superset of what is in OOXML. We've actually had requests to deprecate features in ODF because (presumably, because it's rarely explicit) they are not supported in OOXML. I imagine there are similar suggestions in the ISO comments about OOXML.

orcmid
2007-09-13 11:55:05
I think you're onto something, especially about the dependency advice. It might take some ODF-tuning to make it work more consistently. In some places they support alternate renditions, and this could be a flavor of that too - provide an ODF construct and an alternative OOXML-preserving one with OOXML namespace usage.


It would take some work to get the "compatibility mode" issues working, but that is goodness for ODF to address since they are going to have it among multiple ODF-supporting products as well, unless it really becomes that OpenOffice.org is the only implementation (smiley goes here).


I think the coolest thing for convergence would be for ODF to come up with an OPC-based carrier for ODF format down the line. Then some work on the relationships could be used to deal with this and something that is sorely needed, document profiles for (various situations and levels of) interchange and collaborative use.

len
2007-09-13 14:25:38
What Tim said about redundancy rings true. The problem is the size of the conflict is now such that a pull to a center will take incredibly gifted leadership and will.


Sun may be moving into a position to influence that.


On the other hand, one won't be able to call it ODF or OOXML. There is too much damage in my opinion.


This is why smaller specs are better. The same limitations to scope in technical matters tend to work in political matters. That brings us back to the discussion that should actually be interesting to all XMLers: good practices for namespaced design to accommodate semantic collisions.

Rick Jelliffe
2007-09-14 22:36:50
Bruce: I think Tim is in a position to speak for himself and for Sun. He is Sun's Director of Web Technologies (ODF via Google is a Web Technology, I guess), he has important views on OOXML, and he is a good judge of horse flesh (err what's with the horse reference again?)


Auberon Waugh coined a term "Pilgerism" which is "presentation of information in a sensationalist manner to reach a foregone conclusion". The NOOXML website and the site of some of my blue friends are pure Pilgerism. Tim's blog is definitely not Pilgerism. The emotionalism of Gary Edwards' comments on the ODF TC may indeed be Pilgerism, but I do happen to agree with his thoughts on the importance of the default Save format for ODF. However, what is wrong about hearing Sun's version of events?

Rick Jelliffe
2007-09-14 22:47:06
Len: Another dynamic at play is that there is sometimes little point for a developer to participate in standardization if the resulting standard would require large-scale reimplementation of their technology, once their technology is already deployed.


We won't see a standard with Java and C# merged, for example. A few minor syntax changes, but untold pain for the coder and users.

len
2007-09-17 06:21:20
True without a doubt, and denying that there are business politics to the processes, as Sessions tried to do, is to deny the reality of these being products at point of sale. There are certainly different experiences with different standards at different times, but for any standard I remember where there was a substantial market, the business politics were evident. I've seen the personal kind too, but not with that force.


So again, it comes down to leadership, professionalism and practice. Turning one's back on the bull in the ring to bow to the crowd is dangerous for matadors and dangerous for standards professionals. While it may seem sensible to lay out of a standards process for a product one doesn't support, in a market where the personality split is as you describe it, it is risky because the process behind the door becomes the product of the press. And that is a process that is too easily led.