SPARQL: Web 2.0 Meet the Semantic Web
by Kendall Clark
The Semantic Web. It's an odd duck, and not only from the publishing point of view. Academic computer science is starting to take the Semantic Web (which means, for them, webizing the Knowledge Representation part of AI) seriously. There are conferences, journals, books. Government-funded SW research, especially in the EU (but also in the US and Japan), is also on the rise.
But in the geeky technical world, everything is about Web 2.0, not the Semantic Web. Which is fine, since there is considerable overlap between the Web, Web 2.0, and the Semantic Web. Lots of overlap, actually, and some pretty similar goals; the differences are mostly about use cases, emphasis, and some matters of technical approach.
Anyway, so SPARQL. RDF is pretty foundational to the Semantic Web, and it's got a data model, a formal semantics, and a concrete serialization (in XML). What it didn't have till lately was a standard query language. Imagine relational algebra and RDBMSes without SQL. Pretty hard to imagine. So the SemWeb needed a SQL. The W3C stood up the Data Access Working Group, which has been working for about 20 months and has come up with SPARQL, an RDF query language and protocol.
Most Web 2.0 applications and services involve a REST protocol or interface. In other words, you can interact with the app or service by means of HTTP and manipulating resource representations, many of which are in XML, but others may be in JSON, YAML, RDF, etc.
I think that's the way to build such apps/services, far better than an explicitly RPC-style interface. However, there is a bit of a problem. While using REST offers a standard set of operations (GET, PUT, POST, DELETE), it doesn't offer anything like a standard data manipulation language. In other words, there is no standard way to execute an arbitrary query against a Web 2.0 app or service's dataset and get back a representation of that resource or those resources.
And, more to the point, the service or app provider has to explicitly support just those data manipulation primitives or operations which it thinks are most useful.
That's great, but it's limiting.
Since RDF is such a useful data representation formalism, and it now has an equally useful query language, more and more Web 2.0 sites can push more and more smarts and functionality into the place they belong, namely, the data. REST conceptualizes (and HTTP standardizes) public interfaces; but neither does anything to standardize how one interacts, ad hoc'edly and without central control, with arbitrary slices of someone else's data.
But SPARQL gives you precisely that, even when the data on the other end isn't really RDF, since all it has to do is support SPARQL query and map that into SQL or relational algebra or AtomStore or whatever.
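To make that mapping idea a little more concrete, here is a toy sketch in Python, not anything a real service does. It assumes a made-up `photos` table and a made-up predicate-to-column mapping (`PREDICATE_TO_COLUMN`, `ex:tag`, and `ex:width` are all hypothetical), and it only handles triple patterns that share one subject. A real translator would also have to deal with joins, multi-valued properties, OPTIONAL, FILTER, and so on.

```python
import sqlite3

# Hypothetical mapping from RDF predicates to columns of a `photos` table.
PREDICATE_TO_COLUMN = {"dc:title": "title", "ex:tag": "tag", "ex:width": "width"}


def pattern_to_sql(patterns):
    """Translate triple patterns about one subject into a SELECT over `photos`.

    Each pattern is (subject_var, predicate, object). An object that starts
    with "?" is a variable to return; anything else is a constant to match.
    """
    columns, conditions, params = [], [], []
    for _subject, predicate, obj in patterns:
        column = PREDICATE_TO_COLUMN[predicate]
        if isinstance(obj, str) and obj.startswith("?"):
            columns.append(f"{column} AS {obj[1:]}")
        else:
            conditions.append(f"{column} = ?")
            params.append(obj)
    sql = "SELECT " + ", ".join(columns or ["id"]) + " FROM photos"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def demo_rows():
    """Run one translated pattern against a throwaway in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE photos "
        "(id INTEGER PRIMARY KEY, title TEXT, tag TEXT, width INTEGER)"
    )
    db.executemany(
        "INSERT INTO photos (title, tag, width) VALUES (?, ?, ?)",
        [("Dusk over the bay", "sunset", 1024), ("Office cat", "cat", 640)],
    )
    # Roughly: SELECT ?title WHERE { ?photo dc:title ?title ; ex:tag "sunset" }
    sql, params = pattern_to_sql(
        [("?photo", "dc:title", "?title"), ("?photo", "ex:tag", "sunset")]
    )
    return db.execute(sql, params).fetchall()
```

The point is only that the store behind the query interface can stay relational; the RDF view is something the service computes on the way in and out.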
Okay, so SPARQL gives the SW and Web 2.0 a common data manipulation language in the form of expressive query against the RDF data model. Web 2.0 needs something exactly like that. (Imagine the horror of trying to get all of these totally uncoordinated Web 2.0 services and apps to support the same SQL queries. That's completely impossible. It will never happen. It may be hard to get them all to map SPARQL into how they really store data. It may never happen, in fact. But it could happen, and it will long before everyone uses the same RDBMS schema.)
What else does it need? It needs a way for those queries and their results to be schlepped back and forth between apps/services and other computer agents that want to consume those apps/services' data. In other words, the SW and Web 2.0 need a data access protocol, which is the other thing SPARQL gives the world. Using WSDL 2.0, SPARQL Protocol for RDF describes a very simple web service with one operation, query. Available with both HTTP and SOAP bindings, this operation is the way you send SPARQL queries to other sites and the way you get back the results. The HTTP bindings are REST-friendly (though perhaps not maximally so, or so says REST advocate Mark Baker; perhaps more about that later...), and a simple SPARQL protocol client takes about 10 or 15 lines of Python code.
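For a rough idea of what such a client might look like, here is a minimal sketch using only the Python 3 standard library. It assumes the HTTP GET binding (the query travels in a `query` URL parameter) and assumes the endpoint can return the SPARQL JSON results format; an endpoint that only speaks the XML results format would need a different `Accept` header and parser. The function names are mine, not part of any spec.

```python
import json
import urllib.parse
import urllib.request

RESULTS_JSON = "application/sparql-results+json"


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of an HTTP GET URL."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urllib.parse.urlencode({"query": query})


def parse_bindings(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    rows = json.loads(document)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]


def select(endpoint, query, timeout=30):
    """Send a SELECT query to the endpoint and return the rows it sends back."""
    request = urllib.request.Request(
        build_query_url(endpoint, query), headers={"Accept": RESULTS_JSON}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `select("http://example.org/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` against a real endpoint (the URL here is a placeholder) would give back a list of dicts keyed by variable name.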
So what, really, can SPARQL do for Web 2.0? Imagine having one query language, and one client, which lets you arbitrarily slice the data of Flickr, delicious, Google, and yr three other favorite Web 2.0 sites, all FOAF files, all of the RSS 1.0 feeds (and, eventually, I suspect, all Atom 1.0 feeds), plus MusicBrainz, etc.
Damn, that's not only a lot of data, but it's a lot of the data people actually care about. That's powerful stuff.
How powerful? Well, imagine being able to ask Flickr whether there is a picture that matches some arbitrary set of constraints (say: size, title, date, and tag); if so, then asking delicious whether it has any URLs with the same tag and some other tag yr interested in; finally, turning the results of those two distributed queries (against totally uncoordinated datasets) into an RSS 1.0 feed. And let's say you could do that with two if-statements in Python and three SPARQL queries.
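Here is one way that scenario might be sketched, purely as an illustration. Neither site is known to expose a SPARQL endpoint, so the endpoint URLs, the `ex:` vocabulary, and the shape of the data are all invented; only the `rss:` and `dc:` namespaces are real. The `run` argument stands in for whatever SPARQL protocol client you have, and the tag and title values are dropped into the query text without any escaping, which a real client would need to handle.

```python
FLICKR = "http://flickr.example/sparql"        # hypothetical endpoint
DELICIOUS = "http://delicious.example/sparql"  # hypothetical endpoint

PREFIXES = """
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX rss: <http://purl.org/rss/1.0/>
PREFIX ex:  <http://example.org/vocab#>
"""

# Query 1: is there a photo with this title and tag? (ex: terms are made up)
PHOTO_EXISTS = PREFIXES + """
ASK { ?photo ex:tag "%(tag)s" ; dc:title "%(title)s" . }
"""

# Query 2: is there any bookmark carrying both tags?
LINKS_EXIST = PREFIXES + """
ASK { ?item ex:tag "%(tag)s" , "%(other)s" . }
"""

# Query 3: reshape the matching bookmarks as RSS 1.0 items.
LINKS_AS_RSS = PREFIXES + """
CONSTRUCT { ?item a rss:item ; rss:title ?title ; rss:link ?url . }
WHERE     { ?item ex:tag "%(tag)s" , "%(other)s" ;
                  dc:title ?title ; ex:url ?url . }
"""


def feed_for(run, title, tag, other):
    """Return RSS 1.0 triples for links sharing a photo's tag, or None.

    `run(endpoint, query)` is any SPARQL protocol client: it should return a
    bool for ASK queries and serialized RDF for CONSTRUCT queries.
    """
    values = {"title": title, "tag": tag, "other": other}
    if not run(FLICKR, PHOTO_EXISTS % values):
        return None
    if not run(DELICIOUS, LINKS_EXIST % values):
        return None
    return run(DELICIOUS, LINKS_AS_RSS % values)
```

Two if-statements, three queries, and the only thing the two sites would have to agree on is the query language and protocol.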
Pretty damn cool.
Frankly, I'm starting to catch the scent of one of those big convergence things just possibly starting to happen. It smells like money!
"REST standardizes public interface means; but it does nothing to standardize how one interacts, ad hoc'edly and without central control, with arbitrary slices of someone else's data."
Very nice indeed
Very nice Kendall, I hadn't considered the new generation of AJAX apps and SPARQL mixing. It will be nice once the current batch of triple stores (Kowari, Sesame, etc.) support SPARQL (I know Kowari is on the way - I may be writing it, but don't know about Sesame). There's also a place for other smaller frameworks such as JRDF (shameless plug) and SWAPI, which offer a rich API as well as query capabilities.
you got the right idea
Very compelling thoughts, Kendall. While so many folks look at finding the killer app, there is a LOT that can be done by making services like del.icio.us more useful by adding relationships and a little logic. Now you have me thinking about wrapping SNMP!