A different legislative approach to handling archival binary formats

by Rick Jelliffe

The issue of handling legacy binary formats is one that impacts much more than old Word documents, especially for governments who have long-term archiving requirements.

I think governments should simply legislate "After 20 years, the documentation for all formats used in government data should be made public for access on government archival websites and is deemed unencumbered by IP considerations for the purposes of information retrieval of government data" as a matter of public policy. Hand them over or get a fine for obstruction or bad record keeping!

Of course, the regulations would need to say more than that to cope with industry churn and the ravages of time. For example, what if the vendor or product has been on-sold and no-one knows where the documentation is now? What if the local sales body is no longer the sales body for that product, or the development organization is defunct? But that need not stop the general case.

Of course, for contemporary and future data, standard open formats are the thing.

12 Comments

Andrew
2007-07-30 03:53:01
What you say above is correct, though it all fails if the individual company tries to obstruct.
Yes, you could create a documentation escrow system, but what is the point?


Microsoft, for instance (your sometime/current sponsor), has a real problem actually documenting anything accurately. Just look at OOXML, the Doc formats and the CIFS protocols. You'd think that with, what was it, 500 people working on the job for the EU they could do a credible job. We are still waiting.


Your logic breaks down when there are bad actors in the ecosystem, Rick. Your rhetoric assumes that Microsoft will always do the right thing. Granted, they have done good and ethical work, but soooo often they restrict the freedoms of individuals in order to extract the Microsoft tax, thus destroying 98% of the true value they create. And seriously, Rick, I am being kind to them.


Rick, we are seeing this problem now with products that are barely out of support; 20 years is just insane.


Please try defending the defensible. Try asserting that "the sky is yellow and the sea is red", and you might provide more value to society.


Andrew

Rick Jelliffe
2007-07-30 04:07:28
Andrew: Thanks for your kind words and practical alternatives.


If a company did not produce documentation where it reasonably could, it could be subject to fines, sanctions, etc.

Gary McGath
2007-07-30 07:15:00
This is the old fallacy that if you want something, you just have to legislate it. If government agencies lack the forethought to migrate files in an obsolete format, they can just _demand_ that the documentation be available, imposing an ex post facto requirement (which violates the US Constitution) to keep documentation. But things like that don't stand in the way of the "general case." Companies will be obligated to reverse-engineer formats neither they nor anyone else has used in a decade to make up for the government agency's blunder.


How about a radical notion: that government agencies should act responsibly?

Rick Jelliffe
2007-07-30 08:03:43
Gary: For a start, my government is not bound by the US Constitution. I said "governments", but of course there are local variations in feasibility and extent.


You are entirely incorrect in saying that my idea rests on "if you want something, you just have to legislate it." Some things you need legislation for, some things you don't.


I think a case can be made for an implied warranty or even duty of care that formats sold to governments for use in archiving must be kept somewhere.


But I am not suggesting that this will fix all problems. But where there is serious data involved, governments need to be equally serious in preventing the data from falling into limbo.

James
2007-07-30 09:33:37
I don't need the documentation. I don't have time to read it, let alone implement it. All I need is a conversion program that will take all those old formats and convert them to approved, well-documented and widely supported international standards.

Rick Jelliffe
2007-07-30 09:57:19
James: But the documentation is necessary because sometimes the programs get lost, or are based on obsolete operating systems or libraries, or the licenses were CPU locked on dead machines, or the utilities that used reverse engineering got something wrong, or archivists run out of money to keep their documents in the format of the day.


And sometimes there are none of these magic conversion programs floating about.


If you don't need it, why block it? Is no-one else allowed to need what you don't need?

James
2007-07-30 14:23:17
Rick,
As for the really old binary formats: I'm afraid it's already too late. You cannot impose a new law upon the past. The companies are gone, their documentation is lost or kept secret, and the programs and the data may be encrypted (say we are talking about health care records), so even reverse-engineering is hopeless. The worst scenario - you cannot comprehend it - is taking place.
As for the new binary formats, I'd really rather have the source code in ANSI C than 500 pages of documentation. Maybe having both at the same time is the best solution? Like Prof. Knuth did with his TeX program and documentation.

Rick Jelliffe
2007-07-30 17:01:27
James: The companies are gone? MS, IBM, Sun, Corel (WP etc), Dell (DEC), Unisys (Sperry, Burroughs, Univac, RCA), Computer Associates, Adobe, Getronics (Wang) can be traced too.


(Also, I am not saying that companies should necessarily be uncompensated for the cost of establishing whether the formats still exist, e.g. if the company has been on-sold due to bankruptcy.)


In any case, this is not imposing a condition on the past. It is saying what information holders should do now. At the very least, archivists should know whether their data is in formats with extant documentation.


But, I do agree that if there was no specific format documentation, the source code should be required.

Evan Owens
2007-07-31 05:29:44
This is an extremely important topic, one that has been much discussed in the digital preservation community. One alternative to the legislative approach is something closer to source code escrow. The Library of Congress NDIIPP program and the Andrew W. Mellon Foundation are interested in this; Mellon has funded the Global Digital Format Registry project (http://hul.harvard.edu/gdfr/). There is also much related work going on in the European Union, notably the PLANETS project and PRONOM.

Rick Jelliffe
2007-07-31 07:13:01
Evan: I suspect this is one issue where progress needs to be made on all fronts, but without regulators avoiding their responsibility to make sure our data is not lost.


This has all been made more pointed by the loss of all those pensions by the Japanese Government. They lost 25 seats in the upper house election because of it. I haven't heard that it is a format issue, but it shows the danger.

Evan Owens
2007-08-01 09:22:21
There is a big gap between US and EU approaches to these kinds of problems: centralized top-down approaches don't work very well in the US. Witness the current health care mess where all the innovation is coming from the states rather than the federal government. The good news is that the CIOs of the state governments are becoming interested in digital preservation and related issues.


Unfortunately, digital preservation is in many ways a "public good" and may well be best addressed by top-down approaches to some extent.


Kriz
2007-08-05 04:40:00
In most cases digital documents aren't authoritative and there is no need to archive non-authoritative documents.


To become authoritative, those documents need to be digitally signed, which leads to even bigger problems than file format documentation (i.e. maintaining the public key infrastructure, et al.).