What is in the new draft of OOXML?
by Rick Jelliffe
The story so far
- In the 1990s and earlier, Microsoft was notoriously prominent in its desire to keep its binary formats proprietary: it provided RTF for text-based interoperability but RTF did not allow full round-tripping of data.
- In 2000, Microsoft started providing XML data dumps for spreadsheet data, and each subsequent version of MS Office has used XML more, with Office 2003 providing quite full support, to the extent that the default save formats, on the Windows platform at least, are now all XML-in-ZIP files, the latest generation with the name Office Open XML (which people often write as OOXML).
- In 2004 a European Union agency recommended to MS that it should continue down the XML route and open up its formats by submitting them to some international standards body. (At the same time, a recommendation was issued for OASIS to submit ODF to ISO.)
- In December 2005 Microsoft founded a technical committee at the ECMA standards body, TC45, which worked for a year and released ECMA 376 in December 2006; during this time the specification, which included much text based on documentation for the older binary formats, grew from about 2,000 pages to over 6,000 pages. A public draft was issued in mid 2006. (At the same time, around December 2005, OASIS submitted ODF 1.0 to ISO for consideration using a variant fast-track procedure: it was accepted with scant National Body review in mid 2006.)
- At this time (December 2006) ECMA 376 was submitted to ISO/IEC JTC1, the international standards organization, for "Fast-Track" adoption as a standard: the fast-track process is used for standards which have been drafted at other organizations, and which enter the process as Final Draft International Standards. At this stage, National Bodies had about eight months to review the standard and come to an initial position. Many National Bodies invested significant effort in attempting various reviews; however, this period was also characterized by the raising of many spurious issues. (In early 2007, an update to ODF called ODF 1.1, with improved accessibility features, was released at OASIS but not resubmitted to ISO.)
- In September 2007, the initial ballot of National Bodies resulted in a significant number of "No with comment" votes, which triggered a Ballot Resolution Meeting (BRM). The BRM had been widely expected, due to the expected large number of comments. In the ISO process, a "No with comment" has also been called "Conditional Yes", but many journalists and commentators at this stage preferred oversimplification to reality. Over 3,000 individual comments were received; however, the majority of these were repeated form-letter comments, part of an organized campaign, rather than coming from fresh National Body reviews.
- In mid January 2008, the Editor for DIS 29500 released a promised Disposition of Comments document, containing suggested fixes from ECMA for addressing the National Bodies' issues: these ranged from simple acceptance, through alternative approaches, to rejection of the issue, with ECMA's justification for each. ECMA had bundled the issues into about 1,000 different responses. As I wrote earlier, "The Editor's Disposition of Comments ...is usually the starting point for comment resolution, and, given that most comments are uncontroversial, is often the end-point too."
- In early 2008 Microsoft released the binary format documentation under its OSP covenant, and promised the mappings between the binaries and OOXML: this seems to be in direct response to requests from NBs, though the mappings are not in scope for DIS29500's text.
- In late February 2008, a week-long Ballot Resolution Meeting was held in Geneva, Switzerland. It was attended by 120 individual delegates from about 34 different National Standards Bodies. The outcome of the meeting was a series of editor's instructions to allow a new draft of the standard to be created: usually these instructions are completely specific, though there may be some general ones, for example to use one term rather than another globally. (At time of writing, March 2008, OASIS has been working on ODF 1.2, which is slated to improve several important ODF weak spots, in particular relating to formulas and metadata. It is mooted for re-submission to ISO during 2008.)
- The results of the BRM are available online, and National Bodies now have one month (until the end of March 2008) to decide if the changed draft meets their requirements. For the new draft to pass, it will require 5 National Bodies (of the "P" class) to switch from Abstain or No votes (remembering that No with Comments may mean "Conditional Yes").
- Of the 1027 Editor's responses, the BRM addressed 189 responses by specific resolutions and discussions of the BRM, and the rest using a paper ballot where each National Body in attendance voted: this accepted 825 of the Editor's recommendations and rejected 13. (The issue of a paper ballot had been ... abstain on issues of lesser interest to them.)
- If the new draft is adopted as a standard, it does not remain static but can be "maintained" by the relevant ISO/IEC JTC1 committee, SC34, Document Processing and Description Languages. Procedures exist for National Bodies to submit Defect Reports, which again attract the Editor's attention and National Body voting acceptance, so the kind of process seen at the BRM becomes an ongoing effort, if there is enough interest by National Bodies.
The upshot is that, if DIS29500 mark II and ODF 1.2 both get accepted as standards, by the end of 2008 we should have two standards which together can thoroughly cover the field of representing current and legacy office documents, each representing one of the two dominant commercial traditions, with both under active and significantly open maintenance to fill in the remaining gaps and to repair pending broken parts, with clear cross-mapping to allow interconversion, with an increasing level of modularity so that they can share their component parts, and at least with a feasible agenda of co-evolution and other kinds of convergence.
And if we play our cards well, both traditions will have significant competitive motivation to accommodate the technical requirements of their competitors. Viola, harmonization? (Violà, harmonisation?)
The big picture changes
The "big picture" changes very often concern issues of conformance and modularity.
- The draft is being split into 4 Standards,
- 1. Fundamentals
- A large standard for the core of OOXML
- 2. OPC
- Open Packaging Conventions: the details on using ZIP and referencing
- 3. Markup Compatibility and Extensibility
- 4. Transitional Migration Features
- Contains VML and features not recommended for new documents. Problematic terms like "legacy" and "deprecated" have now been avoided.
- Six document conformance classes have been created: Core and Transitional classes for WordProcessing documents, Spreadsheet documents and Presentation documents.
- Six application conformance classes have been created: Base and Full classes for word processors, spreadsheet and presentation applications.
- The scope sections have been clarified.
- Normative references are to be complete.
- Use of standard formats for syntax: BNF
- Use of standard measures for typesetting lengths
- Use of standard format for dates
- Use of IANA/ISO names for language and country codes
- Development of a prefix mechanism for spreadsheet formulas, presaging a full namespace modularity system like Open Formula's.
- Encouragement for applications to save equations as MathML even if they also save in the OMML maths.
- Many casual references to MS-tradition technology removed and replaced by references encouraging W3C technologies for interchange
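To make the packaging item above a little more concrete, here is a minimal sketch of what "XML-in-ZIP" means in practice: it builds a toy package in memory and reads back its content-type declarations using only the Python standard library. The part name and the content of `word/document.xml` are purely illustrative (this is not a valid word-processing document), and the content-types namespace URI is the one used by ECMA 376 as I understand it; treat both as assumptions rather than as text from the draft.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the content-types part, as used in ECMA 376 (assumption).
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def build_toy_package() -> bytes:
    """Build a tiny, hypothetical OPC-style package in memory."""
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" '
        'ContentType="application/vnd.openxmlformats-officedocument.'
        'wordprocessingml.document.main+xml"/>'
        '</Types>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Every package carries a part list describing the media types.
        zf.writestr("[Content_Types].xml", content_types)
        # Placeholder body only; a real document part is far richer.
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def content_type_overrides(package_bytes: bytes) -> dict:
    """Map each explicitly declared part name to its content type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return {
        override.get("PartName"): override.get("ContentType")
        for override in root.findall(f"{{{CT_NS}}}Override")
    }


if __name__ == "__main__":
    for part, ctype in content_type_overrides(build_toy_package()).items():
        print(part, "->", ctype)
```

The point of the sketch is only that a generic ZIP library and a generic XML parser are enough to get inside a package, which is what makes a separate packaging standard practical.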
The small picture changes
The small-picture changes are frequently aimed at making the draft more "ISO-ish", and therefore at making maintenance and future development at ISO/IEC JTC1 easier.
- All known typos will be fixed
- All known errors in examples will be fixed
- All schema fragments will be marked informative to prevent clashing
- ISO standard conformance language will be used: shalls and shoulds
The middle picture changes
The changes from the BRM usually relate to either correcting bugs or better documentation. Additions to functionality tended to be limited to providing better accessibility and better internationalization, rather than completing or expanding the general feature set. The Editor's Disposition of Comments clearly tried to reduce the amount of gratuitous breakage of documents or applications, and the explicit resolutions of the BRM continued this policy IMHO.
- Accessibility features to support better tabbing (in the fashion of HTML's tabindex) and table labelling. An informative reference to guide developers in accessibility features is being added.
- Multiple changes to support right-to-left writing, half-width character terminology and less US-centric artwork and measures
- The schemas have been re-written to be more compatible with the frailties of various XSD implementations. The XSD schemas will be included in the text as annexes with line numbers. There will be both Strict and Transitional schemas, following the model of HTML. The RELAX NG schemas have been regenerated accordingly and much improved: many people may find them preferable to the XSD schemas.
- Hundreds of clearer explanations of multiple elements and functions.
- Almost all bitfields will be replaced by specific attributes. (The bitfield which accords with ISO Open Font remains.)
- Fixes to the CONVERT() function and a mathematically proper ceiling function, ISO.CEILING() for spreadsheets
- A mechanism to prevent applications from executing files with incorrect types, to prevent viruses
- Strings may not have non-XML graphical characters in them
- Different hashing algorithms
Plus hundreds more.
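As a rough illustration of what a "mathematically proper" ceiling means for the spreadsheet item above, the sketch below contrasts rounding up towards positive infinity with rounding away from zero. The function names are my own, and the away-from-zero variant is only an assumption about the kind of behaviour that prompted the change, not a statement of what any existing CEILING() implementation does in every case.

```python
import math


def iso_ceiling(number: float, significance: float = 1.0) -> float:
    """Round up (towards +infinity) to a multiple of |significance|."""
    step = abs(significance)
    if step == 0:
        return 0.0
    return math.ceil(number / step) * step


def away_from_zero_ceiling(number: float, significance: float = 1.0) -> float:
    """Hypothetical contrast: round the magnitude up, then restore the sign."""
    step = abs(significance)
    if step == 0:
        return 0.0
    return math.copysign(math.ceil(abs(number) / step) * step, number)


if __name__ == "__main__":
    for value in (2.5, -2.5):
        print(value, iso_ceiling(value, 2), away_from_zero_ceiling(value, 2))
```

For positive inputs the two agree; they part company only for negative numbers, which is exactly where a spreadsheet user and a mathematician may expect different answers.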
Many other related issues were also discussed in the hallways at Geneva. For example, the German DIN standards body is preparing a cross-mapping list to match features in OOXML and ODF: there really is very little information on this currently, despite the confident assertions that ODF can/cannot handle everything that OOXML does and vice versa. The Italian standards body is seeking to work on conformance suites for testing: obviously the schemas and BNF grammars allow validation testing of instances for document conformance, so I presume the test suites will be more concerned with application conformance. ISO/IEC JTC1 SC34 has been making various preparations to establish an effective and responsive maintenance regime: ODF could also benefit from this effort.
With over 1,000 changes, I certainly will have missed out some items of interest. Will these be enough to sway the necessary five National Bodies? The changes certainly provide objective extra information favourable to DIS29500 supporters, and the sheer number of changes suggests that ECMA is not going for a first-past-the-post strategy but trying to demonstrate a broader commitment to improvements even from antagonistic National Bodies. But though the anti-OOXML faction doesn't have any new information to provide a counterbalance (discarding the frantic and self-justifying posturings over the BRM) I expect that they will try to explain their longstanding objections more carefully and acutely, since they do raise many good points.
I thought the BRM went very smoothly, for a large high-stakes meeting, and I was happy to make some old and new friendships. In substance, the BRM was a typical ISO meeting of this kind: collegiality, druthers, voting, discussion, corridor meetings, rounding up supporters for measures, trying to track down definitive answers on technical issues, and so on. In accidents, it was very unusual due to size, content and ramifications not to mention the new blood pool.
I think we did pretty well in the Australian delegation, in getting many of our issues addressed completely and most of our issues addressed in part, but (like any standard!) the more you look the more holes you see. There are so many improvements that can and should be made by pro-active maintenance. At various times we had particular help from CA, MY, JP, UK, CZ, FI, US, and several others, so an unofficial thanks to those delegates from this delegate.