That diagram (Let me ring your bell)

by Rick Jelliffe

You've probably seen it: IBM's Rob Weir's 2006 diagram comparing the number of pages of various standards against the time they spent in committee. It makes regular, unchallenged appearances: indeed, Bob Sutor of IBM (a business rival of Microsoft) gave the diagram a prominent place in his blog this week, with what presumably, at this late stage, is the essence of IBM's argument against DIS 29500 and Office Open XML.

At the Standards Australia meeting, the diagram was brought out again, and I protested that it was misleading, but seeing Bob's blog makes me want to explain my criticism more. Here is the scary diagram:

spec-speed2.jpg

Digression


The issue of page count and book size is prone to publicity stunts. If you look at this web page, for example, you can see two different printouts of the Open XML spec. The first manages to fit in boxes under a man's arms (and we don't know how full the boxes are), while the second manages to be taller than a man! What can account for this doubling of size? Perhaps it is the magic of single-sided printing and thick paper :-) (In the 1990s I was discussing a book with a publisher who said "it has to be 1.5" thick, but if you don't have enough material we will use thicker stock"!) Say we have 6,500 pages and we print it on the heaviest common paper, 105 lb bond ledger: that gives us almost 3 metres of printout (10')! But if we print it on the lightest common stock, 16 lb, that gives us a tad over 50 cm (20"). On average paper weights, this should give about 64 cm (25").
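To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch; the per-sheet thicknesses are my own rough assumptions for heavy, light, and ordinary stock, chosen only to reproduce the ballpark figures above, not measured values.

    # Rough stack height for a 6,500 page, single-sided printout.
    # The per-sheet thickness (caliper) values are assumptions, not measurements.
    PAGES = 6500
    caliper_mm = {
        "heavy ledger stock": 0.45,   # very thick paper
        "light 16 lb bond": 0.08,     # very thin paper
        "ordinary copy paper": 0.10,  # middling stock
    }
    for stock, mm in caliper_mm.items():
        height_cm = PAGES * mm / 10.0
        print(f"{stock}: about {height_cm:.0f} cm ({height_cm / 2.54:.0f} inches)")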



But back to the main story. I'll deal with the issues I have with it, in reverse order of seriousness.

Apples and Oranges



If you are using page size to compare documents, you really should make sure the documents are typeset the same way. I moved the Open XML spec down from its extravagant 11pt body font and large heading spacing to follow the ISO-standard 10pt.

Viola, I estimate that about 1,000 pages can be reduced by this. (Added: I estimate this because I tried it. I saved 800 pages on part 4 alone just by moving to 10pt and more typical ISO clause spacing. Technically, this is because there is so much display content and two-line paragraphs that get pushed to the next page, cascading with many paragraphs taking one line fewer.)

spec-speed3.jpg

Difficulty of Review



The diagram uses page size as a unit of preparation and review. However, not all pages are equal. A page that contains normative text requires much more review than a page of informative text. A page that contains auto-generated text requires almost no review at all: you sample enough instances to have confidence in the autogeneration and then skip the rest.

Now this is especially relevant for DIS 29500, because it contains enormous amounts of non-normative/tutorial text and of autogenerated boilerplate. ODF editor Patrick Durusau this week tried an experiment in which he removed this fluff, and he reduced the WordprocessingML specification from about 1880 pages to about 600 pages (and he thought it could go a few hundred pages further!). Most standards avoid tutorial and non-normative material because it increases the tedium of the review process and confuses readers. A good tutorial is usually a bad standard, and vice versa. DIS 29500 is a really extreme example of this.

So let's say that only a quarter of the text is normative and non-autogenerated (based on Patrick's results, and considering the impact of the normative Part 3 and so on), and that the non-normative and autogenerated text takes about a third of the review effort. That means that, effectively, for review purposes the document requires only half the effort its page count suggests.
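Spelled out, the arithmetic behind that claim is just a weighted average; the one-quarter and one-third figures are the rough estimates above, nothing firmer.

    # Effective review effort per page, using the rough estimates in the text:
    # a quarter of the pages are normative and hand-written (full effort);
    # the rest is non-normative or autogenerated (about a third of the effort).
    normative_fraction = 0.25
    other_fraction = 1.0 - normative_fraction
    other_effort = 1.0 / 3.0
    effective_effort = normative_fraction * 1.0 + other_fraction * other_effort
    print(effective_effort)  # 0.5: half the effort the raw page count suggests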

So divide the effective page size in half. (The legend "Number of pages" becomes "Review effort expressed in terms of equivalent number of normative pages")

spec-speed4.jpg

Time spent in Review



Now let's look at the other axis. Weir's numbers here seem to be based on the time spent in committee before the specification came up for a vote. That might have been interesting a year ago, but it is positively misleading a year later. Why is it still being bandied about like this?

In the case of ISO fast-track standards, the whole ISO review process is omitted: the informal discussions with SC34 before submission, the one-month administrative review period, the one-month contradictions-response period, the five-month technical review period just coming to an end, and the ongoing review in which each national body looks at the others' comments over the next five and a half months before the Ballot Resolution Meeting in Geneva, which I expect to happen. That is a full year.

So add an extra 370 days there.
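For those adding along at home, the omitted ISO stages tally up roughly as follows (months rounded to 30 days; the exact figure matters less than the order of magnitude).

    # The ISO fast-track stages omitted from the diagram, in months.
    stages_months = {
        "administrative review": 1,
        "contradictions response": 1,
        "five-month technical ballot": 5,
        "national-body comment review before the BRM": 5.5,
    }
    total_months = sum(stages_months.values())
    print(total_months, "months, i.e. roughly", round(total_months * 30), "days")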

spec-speed5.jpg


Nature of Review



The work that a committee does in compiling or creating a standard for a pre-existing technology is very different from the work that a committee does in creating or augmenting a standard. When the proprietary Torx screws became an ISO standard, one can imagine that the committee had little to do. By contrast, the committee that produced the ISO PDF/X standard had a bit more to do, but still nowhere near what they would have to do if they were developing a standard from scratch.

The work is review and discussion of policy, relieved of the what-ifs and the who-needs-this. As a completely conservative estimate, let's say that development of new material takes half the time, and review takes the other half.

Since we are measuring this in pages, let's be conservative and say that this relieves the committee process of 25% of its workload, and express that in effective pages.

spec-speed6.jpg

Since we are looking at the workload of a committee, what about the case where a committee doesn't have to author much, but is presented with a selection of workable drafts taken from the pre-existing documentation of a product? That is obviously a lot less work than writing from scratch, especially for the editor.

So let's say that this makes a committee 25% more effective, and express it in effective pages as before.

spec-speed7.jpg
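Putting the adjustments so far together, the red point is being recomputed with something like the following back-of-the-envelope chain. Every factor in it is one of the rough, argumentative estimates introduced above, not a measurement, and the two starting values are only approximations of what the original diagram plots.

    # Back-of-the-envelope recomputation of the Open XML point, using the rough
    # factors argued for above. Illustrative only: the inputs are approximations.
    pages = 6500.0   # raw page count plotted in the original diagram
    days = 365.0     # roughly the Ecma committee time plotted in the original

    pages -= 1000    # retypeset at 10pt with ISO-style clause spacing
    pages *= 0.5     # only half the review effort per page (fluff vs. normative)
    pages *= 0.75    # documenting a pre-existing technology, not inventing one
    pages *= 0.75    # starting from pre-existing drafts and documentation
    days += 370      # the ISO fast-track year omitted from the diagram

    print(round(pages), "effective pages,", round(days), "days")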

The other standards



Now, of course, to compare apples with apples, we would have to do the same procedure to the other standards, and they would move in the same kind of direction to a greater or lesser extent. But none of their shifts would be anywhere near as large as Open XML's, because it has the quintuple whammy of typesetting, fluff, the BRM, the lack of need for development, and pre-existing editorial material.

Furthermore, these other standards are not standing still. ODF has moved to ODF 1.1, with 1.2 in the works.

I have two additional reasons why I think the diagram (or, at least, the way it is used) is misleading.

Ex nihilo?



The first reason is related to the last sections above. It is really not fair to compare a markup language for an old technology with a markup language for a new technology merely on the basis of committee time. Microsoft moved into documenting text formats for its standards when it purchased RTF from DEC around 1990. A lot of the documentation in Open XML is adapted directly from the RTF and DOC documentation. Its basic strengths and weaknesses are well known and long documented.

There have been perhaps fifty different versions of the .DOC format, on six different operating systems, over the last twenty or more years. To ignore this history and just use committee time as the metric seems to me to miss something important. A new standard does not come with all this prior work (and baggage).

I am not sure how to diagram this. Perhaps a line indicating the time the technology and its documentation were in development before the start of the committee process? Let's date that from the advent of RTF rather than from the first .DOC format.

spec-speed8.jpg

VML is a particular issue here: it was introduced into IE 5.5 and presented to the W3C committee. To ignore that early development and attempted standardization work seems to miss something important, which again is why I think we have to be careful not to be misled by the diagram.

Separate Technologies



Finally, my other problem with the diagram is that people use it to say "this is so big it cannot be reviewed". However, Open XML is made from five or more completely distinct sublanguages: OPC, WordprocessingML, SpreadsheetML, PresentationML, DrawingML, VML, and then the extensions mechanism of Part 5. One person is not expected to review a whole standard; it is done in co-operation within a committee. India is a good example here: they had separate task forces working on each of the three major application schemas.

So while the size of the draft in total is large, it can be decomposed into smaller sections and reviewed. There have been over 2,200 people involved in the national standards bodies' reviews, I am told: that is a lot. If I were being as free with numbers as some people are, I would say that this represents about three pages per person! But of course, that would be just as flawed a piece of logic as accepting Rob's diagram at face value.

So let's divide up the specification into its parts and see where they fit on the chart. I'll take into account the extra time for review, but just use the current raw page counts for OPC (Part 2) and for the individual languages of Parts 4 and 5. We get a diagram showing the size of each distinct (and therefore separately reviewable) sublanguage, in pages of the current draft.

(If you select "View Image" or the equivalent in your browser, you will be able to see this a bit more clearly: the O'Reilly formatting system may get in the way here.)

spec-speed8.jpg

And finally, let's have a look at what happens when we look at these separate languages but get rid of the fluff, as I suggested in the submission I sent to my national body for its consideration on the Australian vote. For WordprocessingML we will use the number that Patrick Durusau found when he stripped out the fluff: about 800 pages. For the four other largest languages, we will conservatively say that half is fluff. (Actually, in my submission I propose moving some lists of examples, such as border art, to another part, but border art is hardly taxing on the reader.)
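The tally behind that chart can be written down explicitly; only the WordprocessingML figure comes from Patrick Durusau's experiment, and for every other part the rule is simply "halve the raw page count of the current draft". This is a sketch of the method, not a source of data: any raw counts you feed it must come from the draft itself.

    # Sketch of the normative-page estimate used for the last chart.
    # Only the WordprocessingML number comes from Patrick Durusau's experiment;
    # every other part is conservatively assumed to be half fluff.
    def normative_estimate(part, raw_pages):
        if part == "WordprocessingML":
            return 800          # Durusau's stripped-down figure
        return raw_pages // 2   # assume half of the raw draft is fluff

    # e.g. normative_estimate("SpreadsheetML", raw_pages_from_current_draft)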

So this is a diagram of the estimated page count of normative pages in the component language standards of Open XML, against the time spent in Ecma and ISO development and review (and assuming a Ballot Resolution Meeting).

spec-speed11.jpg

Note that this diagram does not include the "effective size" considerations above, so the position of the new items can be compared directly with the other pieces of data on the page, apples to apples. To the extent that the other issues raised above apply to each language, their star would move left (and up). For a good comparison, the other standards mentioned would also have to have their positions adjusted according to the same factors; however, as I mentioned, because the other technologies consist largely of normative material, the adjustment would not be as great. The other technologies might also need to have ISO process time added; I don't know whether Rob's numbers include that or not (the effect would be to add six to twelve months in an upward direction to some of the blue points).

Bottom Line



So that is seven reasons why I think the diagram is misleading. Or, at least, why the diagram itself does not give data that is particularly useful for anything other than mindless sloganeering.

What I don't understand is why people are not on to these kinds of tricks. Big standard: ooh, scary. Have people never heard of Adam Smith and the division of labour? Have people never changed font size and got a different-sized document as a result? Do people think that all text is equally taxing to review? Do people think that adapting a standard from pre-existing text is no easier than writing (and indeed developing) the standard from scratch? I suspect that many people see that on the original graph the OOXML point lies so far to the right, and because pages are easily countable, no alarm bells ring.

So let me ring your bell, if I may: what the original diagram tells us is that the standard has a lot of text, and that one stage of its life in committee took about a year in 2006. Both of those things are such a partial piece of the picture (where is 2007?) that, while they have some sensational value, the diagram can be misleading.

38 Comments

marc
2007-08-29 22:42:21
rick


you are taking this too personally.


don't let this thing put you in the position of defending the indefensible.


sometimes it is a bit pathetic, like these red lines drawn to "catch" the other standards... this is not serious, man


my suggestion: keep your reputation... don't burn it on the net with this kind of post


Rick Jelliffe
2007-08-29 23:03:34
Marc: I think this diagram is misleading for the uses that are made of it now. Even when it came out in 2006 it irked me, for its simple-minded equation of equal difficulty for every page. But back then I was not as aware of the amount of non-normative material in DIS 29500, which makes simple comparisons with standards that have almost entirely normative content unrealistic.


Certainly your mileage on individual issues may be different to mine. You may end up a few hundred or a few thousand pages different. But the diagrams are there to demonstrate the serious points in the text, not any kind of frivolous personal attack or defense.

Yoon Kit
2007-08-29 23:31:56
Rick,


This post surely must be a hoax, after your last post?


Just in case you were serious, some questions:


1) If you have to add the 6 months to MSOOXML for the review period, shouldn't you be adding the 6-12 months of review periods of the other specs?


2) Why would WordprocessingML, MathML, SpreadsheetML, PresentationML, VML and DrawingML have the same amount of review time at 700-ish days? Shouldn't it be 700 divided by 6 = 100-plus days?


3) Would you agree then that Microsoft/Ecma should instead then split up the large spec (even at 3000+ pages, its still LARGE) to its individual components and process them separately? Would VML then fail as it did back in 1998?


If Rob Weir's diagram is misleading, yours is by far, worse.



yk.

Rick Jelliffe
2007-08-30 00:25:51
Yoon Kit: Not a hoax. How is this any different from my last post? I have long agreed with ODF's Patrick Durusau that DIS 29500 had too much text: one of my suggestions in discussions was to get rid of all the text and just have the schemas! I support Open XML becoming a standard, I think it would be a win for the public and industry; and I think that DIS 29500 can be made good enough with some changes; and I think the BRM is the best forum for this.


I have not changed sides: the people at MS have known this was my position for a long time, they have never objected to me saying so in the meetings or seminars I have attended, they have never attempted to suggest a position to me. I have regularly said that just because Open XML becomes an ISO standard is no reason for governments to require it for public documents on the web, and that ODF would be a better choice: again, there has never been the slightest pushback from MS. But I see NOOXML has an article saying that I am attempting to bail out my reputation. Childish.


Misleading diagrams like this one don't help people understand DIS 29500, in fact, it encourages the idea that it is some monolithic and unapproachable technology that will outsmart the attempts of mere mortals to review it.


1) Actually, I mention that the existing points need to be similarly adjusted. "To the extent that the other issues raised above apply to each language, their star would move left" but I will add "and up" too. My trouble with the vertical axis of the diagram is that it only shows one portion of the total process, but the diagram is used as if it showed the total process. My problem with the horizontal axis is that raw page count is an unreliable measure of the work to develop or review a standard. A point on a graph with partial data on one side and unreliable data on the other is misleading.


This kind of simplistic diagram becomes more unsatisfactory as soon as you need to get more realistic. Where would ODF 1.1 go? As an addition to ODF 1.0, or as a new committee process?


2) If there were one person reviewing, then you would be right. But since there can be a division of labour, and because the parts are very independent, they can be treated separately: as indeed India did.


3) In fact, I think I may have been the first person (privately with other SC34 people) to raise the issue of splitting up the standard as one solution. When I was in India, I even asked some experts there whether they thought it was an option. There is some disagreement here: some have told me it is not possible, but at the Standards Australia meeting a recent example was raised (I didn't catch the details, unfortunately). But my favoured option is given in my submission: drastically rearrange the standard into parts. Then get as many of the parts as possible dealt with in the BRM. (If the BRM does not get to some issues, Ecma and MS have said they will still look at them again as part of the maintenance process.) If there are not enough vote conversions during the meeting for the draft to be accepted, then certainly submitting it to the normal track would be an option, but I expect that the BRM will get through the issues fairly fast. Stakeholders will have had months to discuss appropriate responses, so the BRM may be a formality if stakeholders communicate enough.


The ISO Secretariat needs to show flexibility here: if the BRM proposes to split up DIS29500 into separate parts (i.e. separate standards) and perhaps even approve some parts but put other parts back to DIS stage on slow-track, and if this is acceptable to Ecma, then I hope ISO Secretariat or JTC1 would allow it. But this is just speculation.

Yoon Kit
2007-08-30 00:41:28
> you can see two different printouts of the open XML Spec, The first manages to fit
> in boxes under a man’s arms (and we don’t know how full the boxes are) while
> the second manages to be taller than a man! What can account for this doubling
> of size? Perhaps it is the magic of single sided printing and thick paper :-)


The answer is very simple. It's doubled in size because, if you read the blog entry, the Malaysian who is holding the two boxes did not want to waste paper, and instead printed MSOOXML on both sides. BTW, the boxes were full.


The Czechoslovakians in the meantime used single-sided printing for MSOOXML.


Thus the doubling in size.


Also the stack is not taller than a man! No magic required! That's just a trick of perspective, where closer objects appear larger than objects in the distance.


yk.

Asbjørn Ulsberg
2007-08-30 04:08:02
To add the binary Office documents to the development time of OOXML is exactly like adding XML, SGML and GML to the development time of SVG, which should add another 40 or so years to it. You probably want to add this to OOXML's development time too, since it too is based on XML.


But why stop there? We can add years upon years of development that wouldn't be possible if it wasn't for something that was developed previously. Sure, it makes your graphs look cool, but it doesn't make much sense. In fact, it makes you look stupid. Perhaps even more so than you are.


More to the point: All the preparations you did to shrink the size of the specification and make it more readable are something Microsoft, or at least ECMA, should have done before submitting OOXML to ISO. Having to publish explanations and excuses like this to make the specification look prettier, shorter, more elegant and more readable than it is just makes you and the specification look even more pathetic. You're just putting lipstick on a horse, pretending and trying to convince the world that the horse is Gisele Bündchen. If you need to go to such extremes to defend a specification, then perhaps it isn't a very good specification after all.

Christoph
2007-08-30 04:12:05
Rick,
You make some valid points with regard to "removing the fluff", and I even concede that working on the draft provided by Microsoft might be easier, since I do not know about standards writing. However, certain points about your treatment of OOXML I cannot accept:


1) Rob Weir looked at the submitting organisations and not at the ISO process. This year should not be added.


2) Documenting an existing product is easier, you say, but this applies to other standards as well: C++/CLI builds on C++, ODF on OpenOffice experience. So many of these standards qualify for the same 25% bonus. Also, I believe that the second subtraction is not justified, since if you are presented with a draft you have not worked on, review takes more time; you gain familiarity with the standard while authoring it. For these reasons, I think that subtracting 25% twice is not justified.


With your arguments that I think are fair, OOXML comes in at about 2000 pages / 400 days which still makes it an outlier.


I think that in particular your last picture is misleading, since you grant the individual parts of OOXML division of labour, which certainly would apply to ODF and probably other standards and if you add the time spent with ISO, you will find that the other standards left the area where your stars are now.

Rick Jelliffe
2007-08-30 05:03:34
Anders: I was adding the 17 years of RTF time: that is their previous textual format. And anyone who reads the documentation for their binary or RTF can pick up the chunks copied over to DIS 29500.
Rick Jelliffe
2007-08-30 05:19:13
Christophe: What has being an outlier got to do with anything? The original chart does not represent a sample but a self-selected survey, and from different organizations with different practices. Outlier, shmoutlier...


IS 26300 is different from DIS 29500 in several ways. For a start it already factors out several of its languages. SVG is already on the chart for example, but its page count is not added to the Open XML page count. Also, it is more integrated than OOXML: it uses its normal tables in spreadsheets for example, while OOXML has different elements. But, yes, one could indeed subdivide all the languages right down to the element level, and be left with a graph of thousands of one or two page specs :-)


And no matter what fineness or coarseness of grain was used, one would still need to look at the actual editorial and technical details, the history, the working material, the processes, the goal of the standard, the histrionics and politics of the committees, and so on before one could make anything other than the most tentative of generalizations.


The more red lines I draw the more that the biases in the original drawing come out. The bogosity of my red lines is that they should spread out into a vague spray of possible values rather than precisely locating a space; you are free to do the same for the blue points, but base it on the actual editorial content, history and so on, and don't leave out a year of standardization here and 15 years there, as the diagram does (when people use it to represent text development time or total discussion time.)

Christophe
2007-08-30 05:27:47
On your point that reviewing non-normative material takes the same time as normative material, I think that reveals a lack of experience. If you look at Part 4 chapter 2, for example, (wordprocessingML) you will see that about 1/4 of the text is normative and not autogenerated.


Now the autogenerated text is reviewed by sampling: check 10 cases and you are OK. The non-boilerplate you only check to make sure it does not go beyond the normative text, not something you need to think about in its own right. Normative text, however, has new material that you need to think about in terms of your review principles, a very different matter. As long as non-normative text is correct against the normative text, it doesn't matter much: it is there largely at the editor's discretion, and to pander to anyone who wanted some extra "clarity" for whatever reason.


The simplest way to review non-normative text is to red-line it all with only the most basic of glances.

Rick Jelliffe
2007-08-30 05:28:52
Oops that last post was from me to Christophe.
len
2007-08-30 06:20:47
Weir does not mention that outliers also happen when an item not in the same class is added to a graph. Outliers are tossed out only by naive statisticians. Analysts know that outliers also represent 'new development'.


YK: VML is in much heavier use at this time than SVG. Exactly where did it fail? At Adobe? It seems SVG is struggling and VML is still chugging. Not that VML is a great vector graphics language; just that it survives by being used. MS did a pretty good job of making it 'straightforwardly easy to use over the Internet' through the stylesheet attachment. I only wish they had kept on improving it because the dll has some infuriating bugs.


Don't get wrapped around the diagram. In the scale of what is required to put a real interoperable standard for multiple applications together, those numbers are meaningless.


And yes, OOXML should have been split into separate work items. That's obvious.

William
2007-08-30 06:32:44
No matter how misleading the original graph may be (and I agree that there are problems with the methodology), what you did above is blatant hucksterism. No competent scientist will differentially change methodology to move one point on a graph - it is just wrong.


Rob Weir has a strong (very strong?) bias - and so I think his graph needs rebuttal. But to just fiddle with one point assuming that only that point is skewed is both profoundly wrong and *at least* disingenuous, if not fraudulent.


You have been straightforward on this blog, highlighting areas of common misapprehension and making a reasoned case for the standardization of a single-vendor spec - don't lower yourself to actual lying now.

Rick Jelliffe
2007-08-30 07:04:44
William: Lying? But I draw it in a different colour! :-)


More seriously, I specifically say "Now, of course, to compare apples with apples, we would have to do the same procedure to the other standards, and they would move in the same kind of direction to a greater or lesser extent." In what way is that unclear or lying?


In fact I say this twice, in case anyone was dozing or bitchy: however, for a good comparison the other standards mentioned would also have to have their position adjusted in accordance to the same factors:


The only point that the diagrams need to make is that wherever the Open XML dot (or stars) belongs, it is not where the original diagram had it, if people are to make the kind of interpretations they make with the original.


My diagrams are indeed polemical, not scientifical. But the purpose was to show that Rob Weir's original one was also polemical: a cartoon in the form of a chart. Even when accompanied by intricate discussion such as above, they still are unsatisfactory: but how much more so the original which (as people use it) is accepted at face value as something informative rather than as something counter-informative.

Rick Jelliffe
2007-08-30 07:13:13
Perhaps I should say it like this: if the diagram were to make sense, it would have to include, as "technical committee effort", the time spent from the formation of the committee to the time the last technical change to the text was accepted. And it would have to include a component for the pre-committee maturity of the standard: for example, the date of the earliest documentation that made its way into the text. And it would have to use something a lot less simplistic than page count.


For example, it could use my Schema Complexity Metric to estimate how complex the schemas are, and take that as a metric of review difficulty. Or multiple factors. But these don't make the point "OOOOOOOOXML is scary big" so would be useless.

William
2007-08-30 12:34:45
Mr. Jelliffe:


If I might coin a term; "A picture is worth a 1000 words, and to really lie you need statistics, but a manipulated graph is worth 1000 lies."


Mentioning in the text of your post that your graph is at least as specious as the original is no excuse for putting up a misleading graph.

Brian Jones
2007-08-30 14:03:58
Another factor that you might find interesting here is how TC45 split out the work. Since the spec was always built using the .docx format, we were able to first shred the XSD files into about 10,000 rows in a SQL table, and then automatically generate the specification as a .docx file from there. Any changes to the actual structure of the formats would be done in the XSDs and the documentation could be automatically updated to reflect those changes.


The documentation was all done using .docx files, where we could add as much or as little information on each element/attribute/enumeration/simple type simply by editing the spec in Word and then reshredding it back into the SQL table. Any changes to the actual structures of the formats (element names; children; types; etc.) were done against the XSD.


One of the big advantages to having the spec shredded into these 10,000 rows, was that we were able to work on specific pieces of the spec at any given time while other pieces were being edited in parallel. So we would have a few .docx files out for review at any given time, and the specific folks interested in that topic were able to focus their attention on that document. When they were done the committee as a whole would vote on whether it was all set and then that document would be re-shredded back into the master table. We had a lot of people on the technical committee who were all experts in different areas, so this type of review lent itself very well to the make-up of the committee. It’s what allowed us to work on WordprocessingML; PresentationML; SpreadsheetML; and DrawingML all in parallel.


-Brian

Rick Jelliffe
2007-08-31 03:04:56
William: To discuss the positions on a chart, it is reasonable to use the positions on a chart.


While I appreciate your concern for my illiterate audience, whom I was not aware of before, they can have the benefit of the speech-synthesized version of the article that the O'Reilly people have thoughtfully provided.


However, to say "your charts are bogus too" rather avoids the point, don't you think? I am happy if people start looking more critically at charts. But I am not the one putting my charts up before standards organizations around the world.


If you do recognize the original as specious, are you spending the same (or more, since it has 1000 times wider distribution) time protesting the original?

len
2007-08-31 06:01:25
@Brian:


How well did that parallel process work for reconciling couplings between items on parallel tracks?

Segedunum
2007-08-31 14:49:02
indeed IBM (the vendor of a closed source office suite and business rival of Microsoft)’s

There's nothing like impartiality and showing your true colours. What kind of office suite do you think Microsoft are bloody well producing?


The first manages to fit in boxes under a man’s arms (and we don’t know how full the boxes are) while the second manages to be taller than a man! What can account for this doubling of size?

Errrrr. Considering that there are two sides to every piece of paper, double-sided printing, per chance?


I moved the Open XML spec down from its extravagent 11pt body font and large heading spacing to follow the ISO standard 10pt.


Viola, I estimate that about 1,000 pages can be reduced by this.


At this point, I'm really hoping that this is a joke......and that's pretty much it. There is nothing of any credibility whatsoever. Not a trace, and that's saying something.


Rick Jelliffe
2007-08-31 19:51:31
Segedunum: On the 1,000 pages, it is no joke. I reformatted Part 4 and saved 800 pages (10pt body, smaller headings, less space between clauses, all along the lines of ISO formatting specs). I reported this on the NOOOXML website. I have amended the blog to give this information, since a few people are surprised by it.


Err, I specifically mention double-sided printing.


My point was that it is somewhat paradoxical for a company's Open Source staff to be working to flog their closed source products, simply by wearing the Open Standards hat. A person who loves open source and hates closed source might think on OOXML "oh, IBM is on our side against closed source products", mistakenly. Of course, IBM is more than Lotus etc, and has also contributed a lot to Open Source projects: Eclipse, Apache and so on.


Now notice I say "paradoxical" and not "hypocritical", because I don't think they are, and that was never my intent. 'Open source where we can and open standards where we cannot' is a pretty good arrangement that all the players should adopt. Similarly, I said the diagram was misleading, not Rob Wier, because I don't think for a minute that it is his intent to mislead.


However, in the interests of not being provocative, I will amend the item to just say business rival.

Asbjørn Ulsberg
2007-09-03 00:55:32

My point was that it is somewhat paradoxical for a company's Open Source staff to be working to flog their closed source products, simply by wearing the Open Standards hat. A person who loves opens source and hates closed source might think on OOXML "oh, IBM is on our side against closed source products", mistakenly. Of course, IBM is more than Lotus etc, and has also contributed a lot to Open Source projects: Eclipse, Apache and so on.


How much has Microsoft contributed to Open Source projects? How many companies other than Microsoft were involved in the specification of OOXML? How many companies other than IBM were involved in the specification of ODF? Does IBM have an anti-open-source strategy? Which of the two companies is more credible in the open source and open standards area, do you think?


Similarly, I said the diagram was misleading, not Rob Wier, because I don't think for a minute that it is his intent to mislead.


Good. Great. His name is Weir, though, not Wier.
Stephane Rodriguez
2007-09-03 01:45:31

It's interesting that some people here confuse length with substance. It speaks volumes about whether this blog is intended to be taken seriously.


That Microsoft deliberately stuffs a proposed standard's documentation with so many examples (informative) rather than actual specification or mapping tables (from the legacy formats, since this proposed standard actually defines a migration file format) only proves one thing: Microsoft can't seem to be able to write a proper standards paper.


I'll add that I find a bit entertaining the consistent use, by Microsoft's most vocal supporters, of expressions such as "Custom XML" and "Open XML".


What does the X in XML stand for, again? By this definition, "Custom XML" implies that XML cannot be customized, and that we need a private corporation like Microsoft to save us from that disease.


By analogy, "Open XML" implies that the XML recommendation from W3C actually defines a closed language, and that our friends at Microsoft are proposing to fix the problem, and save us, again.


It becomes interesting when you put this in full perspective with Microsoft calling anyone opposed to this paper anti-Open XML. I guess that means that the open source community, a number of private companies, and the software community at large are in favor of a closed XML. While Microsoft, our friends, are here to save us from that, and we should help them with their cause, which is to bring openness to XML. As anyone knows, Microsoft does not perpetuate proprietary formats and semantics at all.


Enough jokes for today, right? (It would not be so sad if those guys were not actually using the expressions above to, in fact, deceive governments and national bodies.) It only takes reading any portion of the paper to have a good laugh.

Rick Jelliffe
2007-09-03 02:24:39
Stephane: "It's interesting that some people here confuse length with substance." Yes, that is a good summary of my points about the diagram.
Brandon
2007-09-04 13:23:22

My point was that it is somewhat paradoxical for a company's Open Source staff to be working to flog their closed source products, simply by wearing the Open Standards hat


What does this mean? Respecting clear, open standards while building proprietary products that take advantage of them is neither paradoxical, hypocritical, counterintuitive, illogical, nor even a bad idea.


You sort of backtrack half-heartedly in your next paragraph, but I really can't figure out what you're even trying to say other than you dislike IBM for some reason. Could you elaborate on this statement so I can figure out if you have some keen insight into open source and open standards that I'm missing?

Rick Jelliffe
2007-09-04 19:47:23
Brandon: I have nothing against IBM per se. My company's server-based product Interceptor was developed to run on WebSphere and their testing labs are fantastic. We have been involved in their partners program for development for years. I was a user of Eclipse when it was still IBM Visual Age.


Where do you get the idea that I am against IBM in general, rather than particular rubbish from particular employees' 'personal' blogs on a particular topic?


I don't see it as odd that a company with a monopoly in one area (mainframes) should seek to get into the markets of a company with a monopoly in another area (office suites). Indeed, good luck to them. They are all profit making concerns.


I am sure that IBM will eventually make sure that its employees are scrupulously disciplined in what they write, to prevent accusations that IBM is back in the FUD game (a word coined for them, by the way.) I don't know if people realize how damaging the FUD from various sources has been to the anti-OOXML case.

Ben Langhinrichs
2007-09-04 22:11:00
Honestly, both you and Rob are grandstanding and skewing the results to meet one expectation or the other. For example, you are certainly correct about the type size, and Rob should take that into account, but you are absolutely incorrect when you say that because a document could be reduced to 800 pages, that makes it as easy to review as a document that is actually 800 pages. The reviewer has to work with the document actually given, not a fantasy document, and all those lists have to be reviewed. Well, the page border stuff can be skimmed over quickly, but separating the wheat from the chaff is work, not something that can be wished away, although somebody reading this standard would likely wish it would go away.


In general, if the standard had really been condensed and the examples reduced and all the other changes you suggest, the complaint about the size would not be there, but those changes were not made, and that is Microsoft's and Ecma's fault, as they could have been done before it was proposed for fast tracking. It is like telling a college student assigned to read War and Peace that because the Cliff Notes for the book is only 100 pages, the book must be easy and quick to read. It isn't any quicker because it could have been shorter.

Rick Jelliffe
2007-09-05 22:41:15
Ben: "Honestly, both you and Rob are grandstanding and skewing the results to meet one expectation or the other. "


Guilty! (On the other hand, I think my diagrams just serve to clarify the text which makes non-grandstanding points rather than being things in their own right, except perhaps for the last one.)


But why do I have to point it out? Why do people become so credulous with anti-OOXML FUD as soon as Microsoft is invoked? We are seeing it again with the current campaign to blacken ISO's name, votes and procedures: never mind any hard facts, let's trust sweeping statements, slogans, rumour and innuendo! OOXML = Microsoft = corruption.

Dave S.
2007-09-06 08:15:47
Jeff -


"OOXML = Microsoft = corruption"


The phrase 'power corrupts' certainly seems to apply to a monopolist company convicted of abusing that monopoly, during which trial an MS witness even committed perjury (though the judge was lenient and did not penalize him.)


It would be truer to say 'An unchecked power corrupts' which is where the FSF and open source movements come in. E.g. if Linus Torvalds had tried a power grab, the contributors could balance that effort. While it might result in a breakdown of linux, it would also mean no power for Linus.


The same cannot be said of Microsoft. If Microsoft included software to invalidate the OS license and prevent the OS from operating, most buyers would object. If users could. But users can't, so Microsoft did. There is little check on Microsoft.


Your graph manipulation would be more convincing were the raw data presented and a review by Edward Tufte made of the results.


I think the question about uncaught errors due to parallel review is still unanswered - such as redefining color scheme notation based on the application sub-set, or having a tabular format in the spreadsheet, with a different method used elsewhere.


Brian Jones, who should know, has not responded to the idea that in order to validate the proposed standard meets its primary mission of representing legacy formats in a new format, one needs to have full access to all the previous formats and map the old to the new. To ensure that the old formats aren't changed they should have been documented and sent to ECMA prior to the submission of OO-XML to ECMA.



Rick Jelliffe
2007-09-07 03:37:18
Dave S: Yes, I would expect Tufte could say what I said far more eloquently and beautifully. But I think he would get my point!


Just sticking two axes on a chart does not mean that you can draw conclusions from the points that get drawn.


You can see the same sloppiness appear in the recent comments that the countries that voted yes to DIS 29500 tend to be the countries that are high on the corruption index. Now if there is no claim that corruption has occurred, why mention these two things together? It is a really shameful slur by the people making it.


If I said that the further away a country was from the psychic influence of the kangaroos of Boggabilla, the more likely it was to vote yes, that would certainly explain the voting pattern, however it would be bogus.

Kurt Cagle
2007-09-08 10:18:04
Rick,


I'm inclined to agree with earlier comments here - ISO has a fairly well defined style guide and standard that Microsoft should have followed for submission of their content. I have to assume that the standards experts at Microsoft understand the difference between normative and non-normative content; certainly if they didn't then Microsoft could have readily hired the services of someone who did.


The OOXML standard is a larger standard because Microsoft insisted on bundling all of the subordinate technology upon which it was based into the same document, rather than breaking these out into subordinate pieces.


If all of the arguments you make concerning page count are true, what emerges is that Microsoft either wrote a poor specification that was far too non-normative (and as such should have been extensively edited prior to submission so that the "tutorial" material went into a separate document), or that Microsoft was hoping that the bulk of the specification would give the proposal extra weight (pun intended) and would reduce the degree to which there could be a meaningful review. Neither point speaks well of Microsoft.

gfim
2007-09-09 16:46:16
Rick, a viola is a musical instrument. I think the word you're looking for is "voila". It actually has an accent over the 'a' too, but that is often omitted in English.


Graham

Rick Jelliffe
2007-09-09 22:46:20
gfim: Thanks for finding the typo.
Rick Jelliffe
2007-09-09 23:04:58
Kurt: Actually, JTC1 fast-tracked standards are not required to conform to ISO drafting rules.


E.g. ODF and OOXML have blue headings, though ODF is better on "shall". However, for maintenance releases, which are nominally by SC34, they are supposed to conform, so it is good to get the basics right up front. My comments to Standards Australia, available on a different blog, put the issue of clarifying conformance as a central necessary improvement. But it can be fixed in a couple of sentences: vital, but straightforward to fix.


Just as important as the official guidelines is the issue of the unwritten SC34 WG1 rules: the benefit of going through the normal working group of the normal committee is that terminology (homologization) is aligned, and the issue of harmony with the existing SC34 technology and terminology is addressed, too. Fast-track standards only get that review through the informal pre-submission process (i.e. the interaction with SC34 members in the second six months of 2006).


You say "Microsoft", but it was as a result of the Ecma process that the standard grew from 2000 to 6000 pages: why aren't you collaring openness as the source of this problem? The thing is that each standards body (each committee, each working group, each national body) has different criteria, and a standard made to suit committee A will never satisfy committee B (unless the editor on committee A happens to have learned his chops on committee B, as happened with ODF and RELAX NG.)


Did you see Jim Melton (editor, SQL) and Michael Kay's comments on XML-DEV this weekend that they would expect thousands of corrections for a spec this size, based on their large standards (ISO SQL is almost 4000 pages) without it being a particular problem. Patrick Durusau and I have been working on a paper on this issue, by the way; I hope it will be out soon.


I'll put out another comment on normative versus non-normative next comment. I wanted to write a blog on it, but I don't have a clear hook yet.

Rick Jelliffe
2007-09-09 23:28:04
Kurt #2: On normative versus non-normative.


The issue of normative versus non-normative is actually quite difficult for document standards. This is because sometimes a standard specifies conformance testing that is weaker than its normative requirements. Sometimes because the schema language is not powerful enough; sometimes because it is more natural to speak in semi-procedural terms (operational) rather than strictly declarative terms.


These are both sloppy drafting, but just as much they are an issue where you need to be careful when interpreting a standard not to go too far: when DIS29500 says "Conformance is purely syntactic" then it behooves the reader to read the rest of the standard in that light. It is not a requirement of a standard that no-one will be confused by it: any more than it is a requirement of The Art of Computer Programming that a non-programmer can turn to any page and understand it. A standard is a particular literary form, even sloppy ones, and we need to understand the form before interpreting it.


There are some people (e.g. I would count Murata-san and me in this tendency) who strongly believe that the ISO requirement that there should be no non-verifiable statements in a standard needs to be ramped up, so that there should be no statements that do not have a formal test in place (e.g. a schema or, failing that, even a formal characterization). Hence my suggestion that all the text of DIS 29500 should be removed from the standard, just leaving the schemas and other grammars. People in this camp would tend to see a sharp distinction between language standards (which have validators) and application standards (which would have a test suite).


Other people want DIS 29500 to contain detailed algorithms: descriptions of the exact operation of bugs, typesetting rules, mappings to previous binary formats: in fact, they want the source code to Office 2007. I am not sure they understand what SC 34's mission in trying to separate out processing from content (at the gain of re-purposing, but at the cost of exact page fidelity) is all about. SC34 is in fact an antagonistic forum to that approach, which is why PDF is done by a different working group at ISO for example.


And then there are people in the middle (e.g. James Clark and Patrick Durusau) who, as I understand them, think that a declarative description of semantics is really important to meet user expectations because people want to know more than "does this document conform to the rules, independent of its semantics" and also need to know "what is this element used for?" So you need to have conformance language even though you defer providing a test apparatus for them.


The other aspect of "normative" is that it corresponds to what IPR covenants usually term "required portions". So when a standard says something is optional or not "required", it is not the same thing as when a license talks about "required". The W3C have dropped "required" in their legal language now in favour of "normative", which is good; it would be good for others to follow. A discretionary part of a standard will be a "required portion".

Asbjørn Ulsberg
2007-09-10 05:10:08

The issue of normative versus non-normative is actually quite difficult for document standards.


You make it sound extremely difficult, when in fact, it isn't. Okay, there may be some edge cases where it's difficult to draw a clear line between normative and non-normative, but as you were able to cut down the specification by more than 5200 pages, it's obvious that it isn't that hard.


You start out by writing a blog post saying that most of the specification is non-normative and can be cut away, but follow up with a comment that says it's very hard to tell what's non-normative and not. Can you please make a decision? Is the OOXML specification full of non-normative stuff that can be moved to an appendix or isn't it?

Jesper Lund Stocholm
2007-09-10 06:47:58
By beating on Rick for his post, you clearly show that you are missing the greater picture - or even the point itself. Instead of jumping up and down yelling "you can't make this assertion" etc., you should be commenting on Rob's blog yelling "Are you a moron - how can you justify making a graph like this?" or "My bad, I'll try next time to actually form a qualified opinion of a (biased) post before I jump on the band wagon".


You should, btw, be posting similar comments on Rob's article on document counts using Google search, and even on Stephane's article about OOXML being defective by design.


:o)

Rick Jelliffe
2007-09-10 23:07:24
Jesper: +1


Asbjørn: Edge cases? An editor or committee has to make a choice about every single sentence in the body of a standard, whether it is normative or non-normative. The choice is often based on trading off detail for flexibility. And making clear the language of optionality. And making clear what is testable for conformance. A large part of review is looking at these issues; see my comments to Standards Australia in the previous blog for example.


Once the choice has been made about what is normative, how to treat the non-normative sections becomes just a matter of style and editorial policy. Given the widespread comment that people would prefer a smaller document (which is a constructive approach to interpreting the comments on size), then removing the non-normative sections is low-hanging fruit.