PRESTO - A WWW Information Architecture for Legislation and Public Information systems
by Rick Jelliffe
The elevator pitch for PRESTO is this:
“All documents, views and metadata at all significant levels of granularity and composition should be available in the best formats practical from their own permanent hierarchical URIs.”
I would see PRESTO as the kind of methodology that a government could adopt as a whole-of-government approach, in particular for public documents and of these in particular for legislation and regulations. The problem is not "What is the optimal format for our documents?" The question is "How can we link to the important grains of information in a robust, technology-neutral way that only needs today's COTS tools?" The format wars, in this area, are asking exactly the wrong question: they focus us on the details of format A rather than format B, when we need to be able to name and link to information regardless of its format: supra-notational data addressing.
PRESTO is a combination of three ideas:
- Permanent URLs
- REST
- Object-orientation
Legal documents such as legislation have three characteristics: they are highly structured, they are highly voluminous, but they have highly varying value. So many documents do benefit from the classic SGML treatment, with semantic Full Monty markup, but many others are accessed so rarely there is little benefit in having high-level markup for them. And in fact many documents may be scanned images with no text at all, and full markup entails re-keying.
So what PRESTO does (and people familiar with SGML PUBLIC identifiers will get the drift, and even more so people familiar with ISO Topic Maps) is to say that there is real importance in being able to have permanent names even for resources that don't have a really brilliant representation available.
In fact, the legal document may not exist physically at all: it may be a base document and an amendment document. So we want a permanent URL for the idea of that document, and we want our system to deliver the best fit it can when we ask for a representation. And we want to allow multiple formats, because often the best representation may be client-dependent.
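One way to picture "deliver the best fit it can" is the minimal sketch below. It is only an illustration: the catalogue, its paths and the media types listed are assumptions made up for the example, not part of PRESTO itself.

```python
# Hypothetical catalogue: permanent path -> representations the back-end
# actually holds, listed best first. All entries are invented for illustration.
REPRESENTATIONS = {
    "/laws/ChildProtectionAct1904/1993": ["application/xml", "text/html", "image/tiff"],
    "/laws/ChildProtectionAct1904/1904": ["image/tiff"],  # scanned image only
}


def best_fit(permanent_path, client_accepts):
    """Return the best held representation the client can use, or None."""
    for media_type in REPRESENTATIONS.get(permanent_path, []):
        if media_type in client_accepts:
            return media_type
    return None


# A browser-like client that cannot handle raw XML still gets something useful.
print(best_fit("/laws/ChildProtectionAct1904/1993", {"text/html", "image/tiff"}))
```

The permanent name stays the same whether the back-end holds rich markup or only a scan; only the answer to "what can you give me?" changes.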
Some people might understand it better if we say that PRESTO is about naming and structuring the configuration items for document sets, and that it forms a precondition for vendor-neutral implementations and for supporting plurality. What PRESTO does is say that when we drill down into a document, we do not want to drill down using media-dependent or presentation-dependent accidents, but according to the editorial/rhetorical (i.e. "semantic") substance.
So why do I say "How else are you going to do it?"
The reason is that if you want to build a large information system for these kinds of documents, and you want to be truly vendor-neutral (which is not the same thing as saying that preferences and delivery capabilities will not still play their part), and you want to encourage incremental, decentralized development, both ad hoc and planned, and in particular mash-ups, then you need three things. You need Permanent URLs (to prevent link rot). You need REST (for scale etc.). And you need object-orientation, in the sense of bundling the methods for an object with the object itself, rather than having separate verb-based web services which implement a functional programming approach. OO here also includes introspection, so that when you have a resource you can query it to find the various operations available.
What would a concrete example be? Let's say we are a government and we have adopted PRESTO, so all our legislation is online with these kinds of permanent URLs, including every numbered thing inside the legislation. Then we want to be able to ask "What other laws reference Part 4 of this Act?" In PRESTO, we say "OK, the object here is Part 4, so we want to extend the URL for Part 4 to add a name which means the list of references." So we would have a URL like
http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4/Referenced
so that this gives a new URL, hierarchically based on the object it depends on. What we don't do is
http://www.eg.gov/functions/getReferences?to=/laws/ChildProtectionAct1904/1993/Part4
(which is procedural/functional) and not
http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4?query=Referenced
(some people would think this is OK; I don't have a particularly strong view at the moment.)
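The naming pattern can be sketched in a few lines of Python. The helper names `derived_url` and `parent_url` are hypothetical, chosen only for this illustration; the point is that a view of an object is named by extending the object's own path, so the object can always be recovered from the view's URL.

```python
from urllib.parse import urlsplit, urlunsplit


def derived_url(resource_url, view_name):
    """Name a view of a resource by appending one path segment to its URL."""
    parts = urlsplit(resource_url)
    path = parts.path.rstrip("/") + "/" + view_name
    # Query and fragment are deliberately left empty.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def parent_url(resource_url):
    """Recover the object a URL depends on by dropping the last path segment."""
    parts = urlsplit(resource_url)
    path = parts.path.rstrip("/").rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


part4 = "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4"
references = derived_url(part4, "Referenced")
print(references)
print(parent_url(references) == part4)
```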
Now what happens when we try to access this resource, using an HTTP GET for example? Well, that depends entirely on what information the back-end has to go on. It might be an HTTP 404 error. It might be an HTML file with a list of links. It might be an XML file of XPaths. It is up to the client to cope with the data that is sent, not the server to send in a standard, universal format. But if we allow introspection, we can then ask the resource for a list of the resources available (and HTTP content negotiation can be used too, potentially.)
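A client written in this spirit might look something like the sketch below, which uses only the Python standard library. The function names, the `Accept` header value and the wording of the results are assumptions for illustration; www.eg.gov is the fictional host from the example above, so `fetch_and_describe` is defined but not called here.

```python
import urllib.error
import urllib.request


def describe_response(status, content_type):
    """Decide what the client can do, based on status code and media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status == 404:
        return "nothing available at this URL yet"
    if status >= 400:
        return f"request failed with status {status}"
    if media_type == "text/html":
        return "HTML: extract the list of links"
    if media_type in ("application/xml", "text/xml"):
        return "XML: read the entries (for example, XPaths)"
    return "unknown format: hand to the user or skip"


def fetch_and_describe(url):
    """GET the resource, stating preferred formats via content negotiation."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/xml, text/html;q=0.8"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            content_type = response.headers.get("Content-Type", "")
            return describe_response(response.status, content_type)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx; the code is still informative.
        return describe_response(error.code, error.headers.get("Content-Type", ""))


print(describe_response(200, "text/html; charset=utf-8"))
print(describe_response(404, ""))
```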
I guess a rule of thumb for a document system that conformed to this PRESTO approach would be that none of the URLs use # (which indicates that you are groping for information inside a system-dependent level of granularity rather than being system-neutral) or ? (which indicates that you are not treating every object you can think about as a resource in its own right that may itself have metadata and children.)
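That rule of thumb is simple enough to check mechanically. The function below is a hypothetical helper, not part of any PRESTO tooling; it just reports which of the two characters a URL uses.

```python
def rule_of_thumb_problems(url):
    """List the ways a URL departs from the no-'#', no-'?' rule of thumb."""
    problems = []
    if "#" in url:
        problems.append("uses '#': reaches inside a system-dependent grain")
    if "?" in url:
        problems.append("uses '?': treats the target as a query, not a resource")
    return problems


base = "http://www.eg.gov/laws/ChildProtectionAct1904/1993/Part4"
for candidate in (base + "/Referenced", base + "?query=Referenced", base + "#s12"):
    print(candidate, rule_of_thumb_problems(candidate) or "OK")
```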
Rick Jelliffe, Geneva
UPDATE: Tim Berners-Lee recently released a page of thoughts on Linked Data Principles. The only significant distinction between it and PRESTO is that Tim is starting off "If you have some significant concepts or resources then give them URLs etc." while PRESTO starts with "You want to have URLs etc. for all significant concepts or resources."