The era of closed formats is dead

by Rick Jelliffe

The era of closed formats is dead is a friendly interview with South African standards activist Bob Jolliffe. I enjoy being in the same room as Bob, not least because for once someone else gets their name constantly mispronounced: I think I counted three different mispronunciations from the same person in one day! I believe that both our names come from the Chaucerian English for jolly: fat and happy.

What I particularly like about Bob is that, if you read the interview, he is concerned with establishing requirements for interoperability and substitutability, and encouraging ODF, rather than slagging off MS or OOXML. I do tend to categorize people as "enablers" and "disablers" (not that these are permanent or unqualified vocations), and I certainly classify Bob as an enabler, even though we have different opinions on OOXML. But I don't think we have particularly different opinions on ODF. Bob (who has been representing South African standards for the last couple of years at SC34) is now participating on OASIS ODF TC, and I think it is really important for government stakeholders to get intimately involved. I have repeatedly called for more government and stakeholder participation in standards groups, and I think Bob's involvement should be a model for other governments who are wanting to make open standards mission critical.

It is clearly just the next step, that when a government starts adopting open standards, it also needs to develop expertise (Bob's comment that there is an issue of scattered expertise is interesting), in particular in order to be able to make hard-nosed evaluations about the state of the art in implementation and on profiles.

I would have titled this "Bob Jolliffe gets it" (like Neelie Kroes gets it: "Standards are the foundation of interoperability." and The Norwegians get it) because I agree with pretty much everything in Bob's answers (to the extent that I feel I could have written some of it!) but for the paragraph:

One of the big dangers I see is the proliferation of backend office software which is so tightly coupled with single vendor’s office products. The promotion of open standards-based procurement of electronic document management systems is an urgent challenge.


Which goes further than I do, at the moment. I certainly agree that public government documents (in and out) should be in open standard formats, and that for that use OOXML is extraneous given the availability of ODF: however I think it would be better to think in terms of a hierarchy HTML, PDF, ODF with ODF as the last resort for publishing government material at least. (And I don't see any harm in multiple formats being provided including the original native format of a file, for example OOXML or SVG, as long as the broad-reach standard was also available.)

However, for internal and specialist document systems, to the extent that I have a formed opinion it is that I suspect that functionality still has to trump standards support, until it can be proven that the standards meet the functional requirements. This is not to say that open systems will have to have a higher standard of scrutiny or QA than the old closed proprietary systems, but rather that functional-compliance requirements do not go away merely by deciding that you need standards-compliance, unless there is specific objective evidence that the one is fulfilled by the other.

(Oh, and I think Bob is technically wrong that IS29500 is not now an ISO standard. It has been approved by ballot so it is a standard; its publication has been delayed. The result of a successful appeal would be for it to be withdrawn as a standard (and still not published).)

UPDATE: Bob mentions the South African government's Minimum Interoperability Standards (MIOS). It is available here (PDF). This extends the normal definition of open standard to include a requirement for multiple implementations: I think this is a mistake of naming (a standard is not open because of its implementations) but the correct requirement for procurement (a technology is open if it allows substitutions): what they should say is "open and mature" standards where multiple implementation is a property of maturity not openness. I think that is just fuzzy thinking that causes unnecessary squabbles: confusing issues doesn't help thinking them through clearly.

I would severely criticize it for being entirely W3C Schema centred, including support for WSDL, and consequently a tool of one set of vendors. No explicit mention is made of ISO Schematron for example. How on earth does the requirement that XML Schemas should be used for data interoperability square with the fact that ODF has a RELAX NG schema not a W3C XML Schema? The trouble with making unnecessary restrictions is that then you have to turn a blind eye to wherever they are impractical, and turning a blind eye introduces an element of arbitrariness that goes against good government.

However, by the time you get to section 2.7, it turns out that RELAX NG is allowed. And it requires GML which has some Schematron schemas IIRC. Perhaps Schematron can creep in as a kind of XSLT? (Obviously because this is a minimal guideline, it is not exhaustive, so my criticism is unfair to that extent!)

I see that no versions of XSLT or XSD or XML (or most things) are mentioned: it would be interesting to have some idea about why versions don't matter. And I see the list includes MPEG and ZIP. How do they fit in, given the definition of openness? Anyway, these are all the practical issues that Bob will be grappling with.

5 Comments

Bob
2008-06-17 13:34:31
Rick


Thanks for your thoughts on the tectonic interview. You raise a lot of important issues and useful suggestions regarding MIOS which I appreciate. Unfortunately MIOS is not exactly in my control (I don't rule the world - yet), but most of what you point out are real defects which I am aware of and am forever pushing for. Particularly tightening up around version numbers and revisiting some of the grottier sections.


The issue of zip and mpeg is purely one of pragmatism. I would certainly have preferred if there were viable alternatives here. A major problem with mpeg is the essential patents required to implement, but while we do not have valid patents on mpeg in South Africa we can live with it. The problems with zip are significant, and I know there is huge interest (and some progress) in solving them in SC34 and other forums.


My concern around DMS is born of bitter experience. We have grappled with the problem of effecting diversification of desktop software in a government department which has an expensive, useful, proprietary DMS at its information heart - a DMS which requires IE or Microsoft Office to access it! It's a real show stopper. We are currently piloting alfresco (www.alfresco.com) which I can highly recommend as having the right mix of rich functionality, a GPL licence and truly open interoperability. Believe me, it is like a breath of fresh air after what we have been living with.


I'll have to think a bit more about your comment on multiple implementations and open standards. I worry a bit about your reference to the "normal" definition of open standards. It seems to me like an area of terminology very much in flux. I am not sure if a "normal" has really emerged yet. I notice even the WIPO have recently chipped in with their own "faith based" approach to IP and standards. I think one of the confusions arises from whether one is attempting to define "sufficiently" open for a particular purpose, which would vary significantly in different contexts, or "maximally" open. What is the most open a standard can be? Which is not to disqualify the use of other "standards" - just to always express a strong preference for the most open. You can't sensibly do that without an understanding of what the most open might mean.


I think the MIOS attempts to define the most open. In which case multiple implementations might be a reasonable criterion. Perhaps not, but it is certainly a reasonable criterion for procurement purposes.


I'm really not sure what the status of OOXML is now. What I said is that it's not ready to be an ISO standard. The only reason it has reached as far as whatever this current weird state of limbo is, is because of a concerted waiving of normal process to wave it through. A capitulation to driven self interest. That was wrong. And that is what SA has appealed.


Cheers
Bob

Rick Jelliffe
2008-06-17 23:31:53
Bob: Actually, the status of ZIP doesn't worry me so much since people are being very conservative about using it in standards: there was an incompatible split between rival ZIP implementers about five years ago that has supposedly been resolved from ZIP 5.2, however it was in the area of encryption for ZIP (for which there may be some kind of patent issues anyway).


But I think the MHEG/MPEG licensing arrangements are anti-small developer, and have clearly held back distribution, acceptability and adoption of desktop Linux/BSD systems that can be compatible with the large corporates' offerings. To me it should be something that governments, even if they are merely interested in Open Source as a tactic and Open Standards as a strategy, should really be jumping on.


I think a lot of governments actually use open source and open standards as part of their procurement strategy: give us a good deal or we will go over to the opposition. (E.g. PNG forcing Fujitsu etc to not enforce their GIF patents.) That is really good: if you only have one standard technology you may help a market in substitutes implementing that technology, but you also lose in having multiple markets for the different technologies: having a single standard in an area only works if it is a kitchen sink big enough to cover everyone's requirements, or is sufficiently modular to allow internal competition, but once it is sufficiently large you lock out small-developer implementations. So I see the idea that you should have single, large, monolithic standards as playing entirely into the hands of the large corporate vendors. The ways around it: layered standards, and/or small technologies, and/or multiple standards (all having different trade-offs of course.)


If you require multiple implementations to be open, then I wonder whether Linux could withstand it, for example. Is a distro a different implementation? Not really. Is a version a different implementation? Tenuous. Of course, you can go back to POSIX compliance, I guess.


The original definitions of openness came out of academia, and none of them included multiple implementations. (I added the comment to Wikipedia that some people include multiple implementations.) I am not aware of any standards body that requires multiple implementations before something is accepted as a standard. W3C (and OASIS? and SC34 mostly) require that all parts of the standard have an implementation, but that does not mean that there is any one complete implementation, just that the feasibility of every part can be pointed to by referring to an implementation: there can be multiple proofs of concept. And even the bodies which require an implementation frequently only require it for substantial functionality, not cosmetic or syntactic issues or minor functionality.


The trouble with saying that a standard is only open when it has multiple implementations is that it entrenches the status quo: it makes an additional barrier to entry for new technology. China, in particular, has been very strong recently that there needs to be more churn in standards, to allow technical traditions from new players to be established. All the other criteria for "open standard" are supplier-side issues: they relate to how the standard is developed and maintained. The requirement for multiple implementations is a demand-side issue: I understand that governments frequently don't see any difference between demand and supply, if they are trying to make some kind of command-economy imperative, however I think conflating supply issues and demand issues for standards merely moves things that should be in the procurement domain into the political domain IYSWIM. This is of course not saying that users should not be involved in standards creation/maintenance, nor that standards suppliers should not be involved in discussions about adoption.


Perhaps it would be better to say that there are open standards (which has nothing to do with implementation), open implementations (which has nothing to do with standards) and open technologies which is where you have open implementations of open standards (or free implementations of free standards.)

Elliotte Rusty Harold
2008-06-18 10:26:35
Rick,


ODF is vastly superior to PDF. Even OOXML would be superior to PDF. The difference is that ODF/HTML/OOXML is designed as an editable format, whereas PDF is little more than a screenshot. Most of the time, you can't even reliably copy and paste from PDF to extract a simple quote.


While I agree that HTML is definitely the preferred option, PDF just doesn't play in the same league. You might as well publish government docs as JPEGs as PDFs.

Rick Jelliffe
2008-06-19 18:45:48
Elliotte: Yes, there are certainly different ways of ranking formats, and your order may well be right for many use cases. But it is not an either/or question: supporting multiple formats where possible allows maximum reach and minimum grief.


I know you don't like historic PDF, for all sorts of good reasons, but some nicely structured modern PDF profile, like PDF/A-1b, certainly supports cut-and-paste. And there will always be a vast legacy of scanned documents: no particular benefits leap out at me for preferring ODF or OOXML for scanned documents.


For forms, HTML and PDF browsers allow editing of fields without the possibility of accidentally deleting chunks of the form. The ODF or OOXML "browser" is rare (yes, I know they do exist, in fact I use MS' free-beer OOXML browser for my Windows boot on this PC) because people open them in their word processor: this is unsafe for forms documents, which make up an important part of government documents.


I am delighted that Adobe is taking PDF down the standards track. I remember more than a decade ago when you had to apply to them for a PostScript license, and let them know what your intended usage was. (I applied and was turned down for a project: this deeply monopolistic behaviour is one reason I think market dominating technologies should be exposed in RAND-z, QA-ed voluntary standards. The big players have gotten away with secretiveness and selectiveness for too long.) I also remember implementing a PDF generating tool, for a project, only to find that the PDF accepted by Acrobat was a mysterious subset of the PDF specified in the Blue Book.

Mitch 74
2008-06-23 01:59:15
The interesting fact about ODF is that it calls upon existing standards in a modular way - meaning that if one wanted to make use of Theora/Vorbis instead of MPEG2 or 4 to store video and audio data, you could. The advantage is that the format itself doesn't really need to bother about this or that patent being valid here or there (it falls upon the implementer's will to implement stuff for that contingency, let's say, by making use of the OS's multimedia framework for actually enabling the content to be played): ODF is at least partially able to do that.
It's true though, that balancing between openness and capabilities is hard: if one wants to add functionalities such as applying filters, acting upon played frame number, where are controls to be embedded (and how), soon you hit problems that don't happen in a case where the one creating the format is also the one making the multimedia framework (Microsoft).
Interestingly, this is the same problem that hit the HTML5 group: how can one add audio/video to a webpage, that would be consistent across all platforms?


Standards are nice, but pleasing everyone sure is hard.


By the way, I'm not sure using HTML or PDF or ODF by decreasing order of priority is that much of a good idea: each has a use, but they also have their idiosyncrasies:
- HTML: which one? HTML 4.01 Strict, or XHTML 1.0 Strict following annex C? The latter is XML-compatible and forces precise entity delimitation, the former is more widely implemented. Or maybe HTML 5? Don't forget styles: CSS2.0 or 2.1?
- PDF: I agree with you that Adobe opening the format more is a good thing; OpenOffice.org's idea to embed the source ODF document in the generated PDF however is an idea that works around the format's biggest drawback: it becomes editable by preserving each element's semantic and at least some of its stylistic data
- ODF: a standard on something as mutable as a document format is hard to fathom. ODF however is interesting by its modular approach, which has already been tested with varying images, objects, macro languages and videos across its several implementations. But, providing it with a 'snapshot' (the PDF document above) really is interesting - and is something that must be considered in future standards revisions.
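
To make the modularity point about ODF concrete, here is a rough sketch in Python (standard library only). An ODF document is a ZIP package whose META-INF/manifest.xml lists each part together with its media type, so the container itself does not care which codec or image format a part uses. Everything specific below is an assumption made for illustration: the function names build_sample_package and list_parts, the part names, and the choice of "video/ogg" for a Theora clip are hypothetical, and the sample package is only a skeleton, not a complete valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the ODF package manifest.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def build_sample_package() -> bytes:
    """Build a tiny ODF-style package in memory (hypothetical parts, not a full document)."""
    manifest = f'''<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="Pictures/logo.png" manifest:media-type="image/png"/>
  <manifest:file-entry manifest:full-path="Media/clip.ogv" manifest:media-type="video/ogg"/>
</manifest:manifest>'''
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The package type marker goes first and is stored uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", manifest)
        # Placeholder parts: real content would go here.
        zf.writestr("content.xml", "<placeholder/>")
        zf.writestr("Pictures/logo.png", b"")
        zf.writestr("Media/clip.ogv", b"")
    return buf.getvalue()


def list_parts(package: bytes) -> dict:
    """Return a mapping of part path to declared media type, read from the manifest."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    parts = {}
    for entry in root.findall(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        parts[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return parts


if __name__ == "__main__":
    for path, media_type in list_parts(build_sample_package()).items():
        print(path, "->", media_type)
```

Swapping the video part for an MPEG file would only change one manifest entry and one file in the package; whether a given application can then play it is a separate question, which is the balancing problem described above.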