PRESTO - A WWW Information Architecture for Legislation and Public Information systems

by Rick Jelliffe

PRESTO is not something new: its basic ideas are presupposed in a lot of people's thinking about the web, and many people have given names to various parts, but I don't know that anyone has given a name to this package. In any case, this combination of ideas, which seems to me to be the sweet spot of practicality for large public document sets, seems to have escaped the way that we approach many problems and systems. However, the question I ask is "How else are you going to do it?"

The elevator pitch for PRESTO is this:
“All documents, views and metadata at all significant levels of granularity and composition should be available in the best formats practical from their own permanent hierarchical URIs.”


I would see PRESTO as the kind of methodology that a government could adopt as a whole-of-government approach, in particular for public documents, and of these in particular for legislation and regulations. The question is not "What is the optimal format for our documents?" The question is "How can we link to the important grains of information in a robust, technology-neutral way that only needs today's COTS tools?" The format wars, in this area, are asking exactly the wrong question: they focus us on the details of format A rather than format B, when we need to be able to name and link to information regardless of its format: supra-notational data addressing.

PRESTO is a combination of three ideas:
  • Permanent URLs
  • REST
  • Object-oriented

Legal documents such as legislation have three characteristics: they are highly structured, they are highly voluminous, but they have highly varying value. So many documents do benefit from the classic SGML treatment, with semantic Full Monty markup, but many others are accessed so rarely there is little benefit in having high-level markup for them. And in fact many documents may be scanned images with no text at all, and full markup entails re-keying.

So what PRESTO does (and people familiar with SGML PUBLIC identifiers will get the drift, and even more so people familiar with ISO Topic Maps) is to say that there is real importance in being able to have permanent names even for resources that don't have a really brilliant representation available.

In fact, a legal document may not exist physically at all: it may be a base document and an amendment document. So we want a permanent URL for the idea of that document, and we want our system to deliver the best fit it can when we ask for a representation. And we want to allow multiple formats, because often the best representation may be client-dependent.
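To make "deliver the best fit it can" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `AVAILABLE` catalogue, the paths, the media types and the `best_representation` function are hypothetical, not part of any existing system. It only shows the idea that the permanent name comes first and the representation is whatever the back-end happens to hold.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: permanent path -> formats the back-end happens to hold.
# A well-marked-up consolidation may exist as XML and HTML, while a rarely
# accessed original may exist only as a scanned image.
AVAILABLE: Dict[str, List[str]] = {
    "/laws/ChildProtectionAct1904/1993": ["application/xml", "text/html"],
    "/laws/ChildProtectionAct1904/1904": ["image/tiff"],
}


def best_representation(path: str, preferences: List[str]) -> Optional[str]:
    """Pick a format for a permanent name.

    Returns the first client-preferred format that is held for the path,
    otherwise the first format that is held at all, otherwise None.
    """
    held = AVAILABLE.get(path, [])
    for media_type in preferences:
        if media_type in held:
            return media_type
    return held[0] if held else None


if __name__ == "__main__":
    wanted = ["text/html", "application/xml"]
    print(best_representation("/laws/ChildProtectionAct1904/1993", wanted))
    print(best_representation("/laws/ChildProtectionAct1904/1904", wanted))
```

The second call falls back to the scanned image: the name still resolves, even though no re-keyed text exists.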

Some people might understand it better if we say that PRESTO is about naming and structuring the configuration items for document sets, and that it forms a precondition for vendor-neutral implementations and for supporting plurality. What PRESTO says is that when we drill down into a document, we do not want to drill down using media-dependent or presentation-dependent accidents, but according to the editorial/rhetorical (i.e. "semantic") substance.

So why do I say "How else are you going to do it?"

The reason is that if you want to build a large information system for these kinds of documents, and you want to be truly vendor neutral (which is not the same thing as saying that preferences and delivery capabilities will not still play their part), and you want to encourage incremental, decentralized, ad hoc and planned developments, in particular mash-ups, then you need Permanent URLs (to prevent link rot), you need REST (for scale etc.) and you need object orientation. By object-oriented I mean bundling the methods for an object with the object itself, rather than having separate verb-based web services which implement a functional programming approach. OO here also includes introspection, so that when you have a resource you can query it to find the various operations available.

What would a concrete example be? Let's say we are a government and we have adopted PRESTO, so all our legislation is online with these kinds of permanent URLs, including every numbered thing inside the legislation. Then we want to be able to ask "What other laws reference Part 4 of this Act?" In PRESTO, we say "OK, the object here is Part 4, so we want to extend the URL for Part 4 to add a name which means the list of references." So we would have a URL like http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced, which gives a new URL, hierarchically based on the object it depends on. What we don't do is http://www.eg.gov/functions/getReferences?to=/laws/ChildProtectionAct1904/1993/Part4 (which is procedural/functional), nor http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4?query=Referenced (some people would think this is OK; I don't have a particularly strong view at the moment).
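The hierarchical naming in this example can be sketched in a few lines of Python. The helper `child_uri` is a hypothetical name introduced here for illustration; the only point is that a view of an object is named by adding a path segment to that object's own URL, never by switching to a separate function endpoint.

```python
from urllib.parse import quote


def child_uri(parent: str, name: str) -> str:
    """Extend a permanent URI with one more hierarchical path segment."""
    return parent.rstrip("/") + "/" + quote(name, safe="")


# The Act (as in force in 1993), then a part inside it, then a view of that part.
act = "http://www.eg.gov/laws/ChildProtectionAct1904/1993"
part4 = child_uri(act, "Part4")
referenced = child_uri(part4, "Referenced")

print(referenced)
# http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced
```

Because each name is built from its parent, anyone holding the URL for Part 4 can guess where its metadata and children live without consulting a separate service catalogue.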

Now what happens when we try to access this resource, using an HTTP GET for example? Well, that depends entirely on what information the back-end has to go on. It might be an HTTP 404 error. It might be an HTML file with a list of links. It might be an XML file of XPaths. It is up to the client to cope with the data that is sent, not the server to send it in a standard, universal format. But if we allow introspection, we can then ask the resource for a list of the representations available (and HTTP content negotiation can be used too, potentially).
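What "the client copes" might look like is sketched below in Python, without any network access. The function name `extract_references` and the assumed XML shape (one `<xpath>` element per reference) are inventions for this example; a real back-end could send something quite different, which is rather the point.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def extract_references(status: int, content_type: str, body: str) -> List[str]:
    """Interpret whatever came back from a GET on .../Part4/Referenced."""
    if status == 404:
        return []  # the name is fine, but the back-end has nothing to offer yet
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        collector = _LinkCollector()
        collector.feed(body)
        return collector.links
    if media_type in ("application/xml", "text/xml"):
        # Assumed shape: <refs><xpath>...</xpath>...</refs>
        return [el.text or "" for el in ET.fromstring(body).iter("xpath")]
    raise ValueError("unsupported representation: " + media_type)
```

The client branches on the status and media type it actually receives; a richer client could first ask the resource which representations exist, or state its preferences in an Accept header.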

I guess a rule of thumb for a document system that conformed to this PRESTO approach would be that none of the URLs use # (which indicates that you are groping for information inside a system-dependent level of granularity rather than being system-neutral) or ? (which indicates that you are not treating every object you can think about as a resource in its own right that may itself have metadata and children.)
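That rule of thumb is easy to turn into a check. The following Python sketch uses a hypothetical function name, `presto_violations`, and simply reports which of the two characters a URL uses.

```python
from typing import List


def presto_violations(url: str) -> List[str]:
    """Return the rule-of-thumb problems found in a URL (empty list if none)."""
    problems = []
    if "#" in url:
        problems.append("uses '#': addresses inside a system-dependent grain")
    if "?" in url:
        problems.append("uses '?': treats the target as a query, not a resource")
    return problems


if __name__ == "__main__":
    for url in (
        "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced",
        "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4?query=Referenced",
    ):
        print(url, presto_violations(url) or "OK")
```

Run over a site's link inventory, a check like this would give a rough first measure of how far a document system is from the approach described here.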

4 Comments

Rick Jelliffe, Geneva
2008-02-24 20:46:50
Phil: I am removing your comment. I am not allowed to blog on this because of Standards Australia obligations. Accusing me of hiding when I cannot respond says everything about you that I need to know.


The SNAFU that you mention was caught, corrected and reported within hours and did not result in any vote change, AFAIK. No-one was corrupted.

F. Ciciliati
2008-03-13 15:26:47
Dear Rick,

As you mentioned, there is nothing new in this idea, especially when we consider the legislation domain.

In Italy, a solution like the one you describe has been officially in use since 2001. I think it would be fair to give your readers access to the base document of the project (in English):

http://www.nir.it/stdoc/urn/urn-nir-13b-eng_.doc

I'm sure you will find well-established answers to your future questions there.

By the way, it is also worth mentioning that several countries decided not to reinvent the wheel: based on the Italian experience (and in association with the Italian project) they are proposing the creation of a specific URN namespace for legal resources, the "urn:lex" namespace. Check:

http://www.stf.gov.br/arquivo/sijed/22.pdf

Regards,
Fernando

Rick Jelliffe
2008-04-08 23:11:55
UPDATE: Tim Berners-Lee recently released a page of thoughts on Linked Data principles. The only significant distinction between it and PRESTO is that Tim is starting off "If you have some significant concepts or resources then give them URLs etc." while PRESTO starts with "You want to have URLs etc. for all significant concepts or resources".

In other words, PRESTO is not so much a general statement of principle as a program of action: it is not that you happen to have some high-value or easy concepts or resources and then give them URLs, but that you systematically make sure you have URIs for *everything* significant at every significant level of granularity (and history). And in particular, for documents.

PRESTO is about emphasizing that the decision to have persistent URIs for everything significant is a decision, not something intrinsic to an information collection. This is rather different to the typical idea of ontologies, I gather: with ontologies there is some domain which people are already exploring or representing and they know that is what they want to do. In the case of legislation and other systematic document sets, it is not at all clear in people's minds that having a way of naming everything that can be named is a useful and necessary starting point for reliable mash-ups and evolvable information systems.

hip london
2008-07-22 04:06:31
very interesting article.


thanks for them