At the Semantic Technologies conference in San Jose I attended an interesting presentation entitled “persistent identifiers for the real web”. XML often uses URLs to identify schema namespaces, and I suppose it could be credited with influencing RDF’s practice of using URLs to identify resources. In using RDF to describe and annotate things, a problem arises: are you describing the web page, or the thing the web page is talking about? For example, if I assert that:
<http://tcowan.myopenid.com> :likes <http://www.myspace.com/lettucefunk>
Does that mean I like the web page or the band the page is about? As you're traversing the semantic web it's going to be advantageous to distinguish between content assets and the real world entities they may represent. Their proposed solution involves PURLs (http://purl.org for example). Normally a permanent URL redirects you to the best representation of the resource via a 302 response. They propose that when the PURL represents a real world entity, the response be given as a 303 (See Other) instead. The computer agent can then understand that the “thing” is a real world entity, and that the redirect is not to the real thing, but to another web resource about the thing.
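To make the 302 versus 303 distinction concrete, here is a minimal Python sketch of how an agent might interpret the status code it gets back from a PURL. It is not taken from the presentation; the function name and the returned labels are made up for illustration.

```python
from http import HTTPStatus


def interpret_purl_response(status: int) -> str:
    """Classify what a PURL identifies, based only on the HTTP status code.

    Hypothetical helper: the labels "thing", "document" and "unknown" are
    assumptions of this sketch, not part of any PURL specification.
    """
    if status == HTTPStatus.SEE_OTHER:  # 303
        # The PURL names a real world entity; the Location header points
        # to another web resource *about* that entity.
        return "thing"
    if status in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND,
                  HTTPStatus.TEMPORARY_REDIRECT):  # 301, 302, 307
        # An ordinary redirect to a representation of a web resource.
        return "document"
    if status == HTTPStatus.OK:  # 200
        # The resource was served directly, so it is a web document.
        return "document"
    return "unknown"
```

Under this sketch, interpret_purl_response(303) yields "thing" and interpret_purl_response(302) yields "document".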
I'm very much in favor of permanent URLs. Otherwise all our assertions will become disjointed as links break, or we’ll have to keep our own “archives” of dead links and sites. I also appreciate the simplicity of Dave and Eric’s proposal; however, I’m not so sure this is really the best way to solve identifiers for real world things. Consider books, for example: what would be the best way to represent a book, its URL on Amazon or its ISBN as a URN? If we use the Amazon URL we can’t be sure it’s a book; it might be binoculars or a coffee table. The URN, however, makes it clear:
The URN namespace indicates that it's a book, without a doubt. If PURL were to host a “see also” permanent URL scheme for each declared URN namespace we'd be able to visit that URL to find out more...
But on the practical web, we don’t use PURLs or URNs for books, we use the Amazon.com URL. I think in practical terms things are going to be represented on the web by the domain that has the best collection with the best open content. Perhaps the best approach in the end is to take advantage of blank nodes.
<http://tcowan.myopenid.com> :likes _:a
<http://www.myspace.com/lettucefunk> :describes _:a
_:a a :funkBand
In English, http://tcowan.myopenid.com likes the funk band described by http://www.myspace.com/lettucefunk. Now we’ve made it clear, and without the use of PURLs or some new PURL redirection strategy.
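The three triples can be read mechanically as well. The following Python sketch stores them as plain tuples and answers "what does this person like, and which page describes it?". It is only an illustration: the names TRIPLES and things_liked_by are hypothetical, and a real application would more likely use an RDF library.

```python
# The three statements from the post, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://tcowan.myopenid.com", ":likes", "_:a"),
    ("http://www.myspace.com/lettucefunk", ":describes", "_:a"),
    ("_:a", "a", ":funkBand"),
]


def things_liked_by(person, triples):
    """Return, for each node the person likes, its types and describing pages."""
    results = []
    for s, p, o in triples:
        if s == person and p == ":likes":
            # Types asserted about the liked node ("a" is rdf:type shorthand).
            types = sorted(o2 for s2, p2, o2 in triples
                           if s2 == o and p2 == "a")
            # Web pages that claim to describe the liked node.
            pages = sorted(s2 for s2, p2, o2 in triples
                           if p2 == ":describes" and o2 == o)
            results.append({"node": o, "types": types, "described_by": pages})
    return results
```

The person is linked to the blank node, never directly to the MySpace page, so the page and the band stay distinct.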
back in december/january there was a long discussion about a new URI scheme for geolocation http://lists.w3.org/Archives/Public/uri/2007Dec/0065.html on the W3C URI mailing list. in the end, it got nowhere, but there was quite a bit of discussion around the basic problem of identity on the semantic web http://dret.typepad.com/dretblog/2008/04/identity-on-the.html which still is a big issue and basically unsolved.
personally, i find it a bit ironic that most semantic web examples are quite sloppy when it comes to identity, almost always assuming that a person *is* his/her web page. this kind of sloppiness gets you into a lot of trouble, in particular if you have inferencing mechanisms that all revolve around the idea of URI-identified concepts.
i think that if you want to make statements about non-information resources (such as locations), you need to have a URI scheme that is able to represent that, and HTTP often may not be the best choice, because it conveys zero semantics (unless, of course, you assume all semantics to be exclusively covered by semantic web technologies). instead, if you are talking about a location, and the web community decides that locations are a relevant concept that should be made explicit on the web, then there should be a URI scheme for locations, so that a browser for example could automatically feed that to your preferred map service or the navigation application of your mobile device.
and btw, firefox 3 now includes functionality for extending it with URI-scheme-specific behavior (they call it "protocol handlers"), which makes the "hard to deploy" argument against new URI schemes a little bit less important. http://dret.typepad.com/dretblog/2008/06/web-based-sms.html looks at that (but with a different URI scheme in mind).
Does the use of an ontology, such as the Music Ontology, which offers a class, mo:MusicGroup, remove any of the ambiguity?
Yes indeed. We'll just pretend someplace else I declared :funkBand rdfs:subClassOf mo:MusicGroup.
I think it's fair to say that we live in a world of TLA (three letter acronym) overloading and there is already a substantial amount of discussion referencing PURLs as "persistent", not "permanent". Your article doesn't seem to make a distinction between the two; is there one? I noticed the presentation you reference has the same issue.
BTW, I'm also a fan of alternate URI prefixes, and rather than using blank nodes, I use the "tag:" prefix from RFC 4151. The advantage is that, as with a URN, it's clear it's not a web page but a "thing", and all the "things" I'm talking about are in my own (persistent) name space, "tag:email@example.com:". If you would like to make some inference that one of my things is the "same as" one of yours, then hopefully there is an inverse-functional predicate that we agree on, or if my thing is a book, then I could include some information like:
< tag:firstname.lastname@example.org,2008:favbook_3 > owl:sameAs < URN:ISBN:0-395-36341-1 >.
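For readers who have not used RFC 4151 identifiers, here is a small Python sketch that assembles a tag URI in the form "tag:authority,date:specific", the shape used in the example above. The helper name mint_tag is hypothetical, and the validation is deliberately loose: it only checks the date shape and does not enforce the full RFC grammar.

```python
import re

# RFC 4151 dates are YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def mint_tag(authority: str, date: str, specific: str) -> str:
    """Build a tag URI from its three parts (hypothetical helper).

    authority: a domain name or email address controlled by the minter
    date:      a date on which the minter controlled that authority
    specific:  any local name the minter chooses
    """
    if not _DATE_RE.match(date):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD: {date!r}")
    return f"tag:{authority},{date}:{specific}"
```

For instance, mint_tag("example.org", "2008", "favbook_3") produces "tag:example.org,2008:favbook_3".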
The problem with blank nodes is that they are only scoped in the context of a specific document. If you send me two different queries, you can't assume that _:a99 and _:a99 refer to the same thing. I concur that HTTP is a terrible prefix for real world things, and even worse for metaphysical things.
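The scoping point can be illustrated with a short Python sketch that merges two documents while keeping their blank nodes apart. It assumes triples are plain tuples of strings and that anything starting with "_:" is a blank node label; the function name merge_graphs and the "_:g0_..." renaming scheme are invented for this example.

```python
def merge_graphs(*graphs):
    """Merge triple lists, renaming blank nodes so labels never collide.

    Simplification for this sketch: any term starting with "_:" is treated
    as a blank node; literals and URIs are passed through unchanged.
    """
    merged = []
    for index, graph in enumerate(graphs):
        def rename(term, index=index):
            # Prefix the label with the index of the document it came from.
            if term.startswith("_:"):
                return f"_:g{index}_{term[2:]}"
            return term

        merged.extend((rename(s), p, rename(o)) for s, p, o in graph)
    return merged


doc1 = [("_:a99", ":name", "Lettuce")]
doc2 = [("_:a99", ":name", "Some other band")]
merged = merge_graphs(doc1, doc2)
```

After merging, the two _:a99 nodes carry different labels, which reflects that nothing in the documents themselves says they are the same thing.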
>you can't assume that _:a99 and _:a99 refer to the same thing.
You can if _:a1, _:a14, ... are related via an inverse functional property. Whenever you want to talk about the band, you first have to say:
<http://www.myspace.com/lettucefunk> :describesAtMostOneRealWorldThing _:a1
_:a1 verb this
_:a2 verb that
So before the anonymous node is used in triples, we first must make it clear that the anonymous node is the real world thing referenced by the URL. On some other web page it might be stated via RDFa that:
<http://www.myspace.com/lettucefunk> real:describesAtMostOneRealWorldThing _:a12
_:a12 is the same thing as _:a1 because the URL can only be related to at most one unique real world thing via the real:describesAtMostOneRealWorldThing predicate, that we've all agreed upon.
That wasn't made clear in the post, but that's my proposal for getting away from using URLs for real world things. Just as you say, there's an assumed predicate we have agreed upon to associate at most one unique anonymous thing that is represented by the web resource. For each anon node we make the inverse functional assertion once, and then can assume an owl reasoner will realize they are the same things.
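The merging step a reasoner would perform can be sketched in a few lines of Python. This is a toy illustration, not an OWL reasoner: the function name smush and the example triples are assumptions, and blank node labels are assumed to be already distinct across documents. One caveat on terminology: with the page as the subject and the thing as the object, "at most one object per subject" is what OWL calls a functional property; an inverse functional property would give the same effect if the triple were written from the thing to the page.

```python
PREDICATE = "real:describesAtMostOneRealWorldThing"


def smush(triples, predicate=PREDICATE):
    """Merge nodes that one page describes via the agreed predicate."""
    parent = {}  # union-find forest over node labels

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as representative, for determinism.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    first_object = {}  # page -> first node it was said to describe
    for s, p, o in triples:
        if p == predicate:
            if s in first_object:
                union(first_object[s], o)  # same page, so same thing
            else:
                first_object[s] = o

    def canon(term):
        return find(term) if term in parent else term

    return {(canon(s), p, canon(o)) for s, p, o in triples}


page = "http://www.myspace.com/lettucefunk"
example = [
    (page, PREDICATE, "_:a1"),
    ("_:a1", ":verb", ":this"),
    (page, PREDICATE, "_:a12"),
    ("_:a12", ":verb", ":that"),
]
smushed = smush(example)
```

In the result, both ":verb" statements hang off _:a1, because _:a12 was merged into it.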