SPARQL My Opera!

by Kendall Clark

Related link: http://my.opera.com/community/sparql/



I've been pushing the convergence line lately: there's no fundamental conflict between Web 2.0 and the Semantic Web. They may not be quite two sides of the same coin, but they're definitely the same currency. Or something like that. Anyway.



The folks who run the Opera community portal have just done a really smart thing -- they've deployed a SPARQL web service for their data. It's the community's data, and what better way to let the people at their stuff?



This is better than a bespoke API because it's not another API-to-API integration problem. Rather, it exposes, essentially, a domain-specific (or "little") language for arbitrary use by arbitrary third parties.



And it does so using a standard, lightweight web service interface, REST HTTP, with the option of deploying SOAP pretty easily too.
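
To make the REST side concrete, here is a minimal sketch of how a client might form a request under the SPARQL Protocol's HTTP binding, which carries the query text in a `query` parameter on a plain GET. The endpoint address below is a placeholder, not the real Opera service URL, and the helper name `build_query_url` is made up for illustration. The predicate is Dublin Core `title`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address (an assumption); substitute the real service URL.
ENDPOINT = "http://example.org/sparql"

def build_query_url(endpoint: str, sparql: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": sparql})

# Ask for a handful of resources that have a Dublin Core title.
QUERY = (
    "SELECT ?s ?title "
    "WHERE { ?s <http://purl.org/dc/elements/1.1/title> ?title } "
    "LIMIT 5"
)

url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with any HTTP client (for example `urllib.request.urlopen`) should return a results document, assuming the service accepts GET requests in this form.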



Want to add some Opera community data to your latest Web 2.0 mashup? It's as easy as writing a few SPARQL queries and some code to handle the XML those queries return. Easy peasy. (And I've been working on a serialization of SPARQL query results in JSON, which will make it even easier to do in AJAX apps.)



There's room for improvement: the SPARQL query engine in use isn't the speediest, and there's no support for DESCRIBE queries yet. But still, this is a big deal!



I've hinted at this before, but I think I'll put a stake in the ground and say it clearly:



The developer(s) of every Web 2.0 app/service should seriously consider exposing their data with a SPARQL query service.


What else ya gonna use?


7 Comments

EliasT
2005-11-30 17:28:30
Examples Please...
Kendall,


Do you have any query examples or model snippets to explore this service a bit more?

danja
2005-12-01 01:28:47
Examples Please...
There are some example notes at: http://leobard.twoday.net/stories/1210825/
PhilWilson
2005-12-01 05:03:06
examples
If you're going to write a column like this, you really do need to provide some examples. It's all very well Danny linking to some, but the article needs more substance!
Kendall
2005-12-01 05:54:39
Examples
Hmm, well, someone wants examples. Here's a very quick one:



PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


SELECT ?s
WHERE {?s rdf:type }


Which will return a SPARQL XML Results document with the URIs of every weblog in the my.opera community knowledge base (there may be others not in the KB, of course; open world assumption and all that...)


Now, here's a query that finds everything asserted of one of those weblogs:


SELECT ?p ?o
WHERE { ?p ?o}


Which returns another SPARQL Results document:



<?xml version="1.0"?>
<sparql xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result2" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="p"/>
<variable name="o"/>
</head>
<results>
<result>
<binding name="p"><uri>http://purl.org/dc/elements/1.1/creator</uri></binding>
<binding name="o"><uri>http://my.opera.com/soumitra/xml/foaf#soumitra</uri></binding>
</result>
<result>
<binding name="p"><uri>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</uri></binding>
<binding name="o"><uri>http://purl.org/dc/elements/1.1/Collection</uri></binding>
</result>
<result>
<binding name="p"><uri>http://purl.org/dc/elements/1.1/description</uri></binding>
<binding name="o"><literal>my journal where I write about my coldplay experiences</literal></binding>
</result>
<result>
<binding name="p"><uri>http://purl.org/dc/elements/1.1/title</uri></binding>
<binding name="o"><literal>Check out http://www.soumitrabhattacharya.com .</literal></binding>
</result>
</results>
</sparql>
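
For the "some code to handle the XML" part, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It embeds a trimmed copy of the results document above, and the namespace URI is copied from that document. The function name `parse_results` and the `(kind, value)` tuple layout are choices made for this example, not part of any specification.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the results document above.
NS = {"r": "http://www.w3.org/2001/sw/DataAccess/rf1/result2"}

# Trimmed copy of the sample results, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result2">
<head><variable name="p"/><variable name="o"/></head>
<results>
<result>
<binding name="p"><uri>http://purl.org/dc/elements/1.1/creator</uri></binding>
<binding name="o"><uri>http://my.opera.com/soumitra/xml/foaf#soumitra</uri></binding>
</result>
<result>
<binding name="p"><uri>http://purl.org/dc/elements/1.1/description</uri></binding>
<binding name="o"><literal>my journal where I write about my coldplay experiences</literal></binding>
</result>
</results>
</sparql>"""

def parse_results(xml_text):
    """Turn a results document into a list of {variable: (kind, value)} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("r:results/r:result", NS):
        row = {}
        for binding in result.findall("r:binding", NS):
            value = binding[0]                 # the <uri> or <literal> child
            kind = value.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            row[binding.get("name")] = (kind, value.text)
        rows.append(row)
    return rows

for row in parse_results(SAMPLE):
    print(row["p"][1], "->", row["o"][1])
```

Each row maps a variable name to its kind (`uri` or `literal`) and value, which is usually enough to feed a template or to re-serialize in whatever shape a mashup needs.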


The next step is to get the Opera portal folks to publish a schema or ontology for their KB, so that people can figure out more interesting queries. Or for someone with more free time than I have to reverse engineer a schema document from a dump of the KB.

quxx
2005-12-01 08:49:08
how?
So how does one go about providing such a thing? Are there any open source implementations one could build on? Maybe a drop-in Rails generator, or PHP proxying script?


Also are there performance concerns with providing an openly queryable API? Security/privacy concerns aside, the main reason not to provide an openly queryable API is the concerns of performance.


It's intriguing, but I need more info.

Kendall
2005-12-01 10:26:34
how?
So how does one go about providing such a thing? Are there any open source implementations one could build on? Maybe a drop-in Rails generator, or PHP proxying script?


Yes, there are lots of open source SPARQL implementations, including Sesame 2, ARQ, Redland, RDF::Query, and some others. There will be more and more of these as the spec gets finalized.


There's nothing as simple as a drop-in Rails generator yet, not that I know of.


Also are there performance concerns with providing an openly queryable API? Security/privacy concerns aside, the main reason not to provide an openly queryable API is the concerns of performance.


Indeed there are. There are lots of things you can do, most of which are orthogonal and depend on what kind of service you're fronting. Off the top of my head:


1. SPARQL Protocol defines QueryRequestRefused, which a SPARQL service may return if a query is impractical.


One might determine which queries those are by doing dynamic or static query analysis and optimization, or some kind of process monitor with a simple timer, etc.


2. For a site like Flickr or delicious, if I were in charge, I'd build an RDF adapter layer over my RDBMS data. Then I'd start by segregating all of my data by, say, users. A query would always be against a user's data, not against all of the data extant. If you want to do aggregations between users, you retrieve an RDF representation, do a merge on the requester side, and query that.


If the strategy is to translate SPARQL queries into SQL, you could then do various optimizations and analyses (or process monitoring) on the SQL queries -- the database literature and market knows about this stuff.


3. Smaller sites might consider using an RDF-native database instead of an RDBMS, which still allows KB segregation along various vectors. And RDF query engines are going to have to mature to do query cost analysis anyway if they're going to be serious players.


4. There are a few simple things that anyone can do; for instance, I'm surprised that the Opera service allows the SELECT ?s ?p ?o WHERE {?s ?p ?o} query, since it's equivalent to transferring the entire KB, which is going to be an expensive operation.


5. Finally, if you don't want to segregate KBs, you can segregate requests or users by requiring authentication at the HTTP level (SPARQL Protocol's HTTP binding is a simple GET), where you can log, trace, ban (or charge more!) users who insist on expensive queries.
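
As a rough illustration of points 1, 4 and 5 together, here is a sketch of a gatekeeper that refuses the "give me everything" pattern and blocks users who keep sending it. Everything here is an assumption made for the example: the `QueryRequestRefused` class is a local stand-in for the protocol fault of the same name, the regular expression is a deliberately naive static check (a real service would parse the query), and the `MAX_REFUSALS` threshold and `admit` function are hypothetical.

```python
import re
from collections import defaultdict

class QueryRequestRefused(Exception):
    """Local stand-in for the SPARQL Protocol's QueryRequestRefused fault."""

# Naive static check: a group holding a single triple pattern made of
# three variables, such as {?s ?p ?o}, which matches the whole KB.
ALL_VARIABLES = re.compile(r"\{\s*\?\w+\s+\?\w+\s+\?\w+\s*\.?\s*\}")

MAX_REFUSALS = 3              # hypothetical threshold before a user is blocked
refusals = defaultdict(int)   # authenticated user name -> refused query count

def admit(user: str, query: str) -> str:
    """Return the query if it may run; raise QueryRequestRefused otherwise."""
    if refusals[user] >= MAX_REFUSALS:
        raise QueryRequestRefused(f"{user} is blocked after repeated expensive queries")
    if ALL_VARIABLES.search(query):
        refusals[user] += 1   # log the offence against the authenticated user
        raise QueryRequestRefused("pattern matches every triple in the KB")
    return query

try:
    admit("mallory", "SELECT ?s ?p ?o WHERE {?s ?p ?o}")
except QueryRequestRefused as fault:
    print("refused:", fault)
```

A regex only catches the most blatant case. A timer or memory cap around query execution would still be needed for queries that look harmless but turn out to be expensive.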

kjetilk
2005-12-05 08:45:53
Thanks!
Thanks, Kendall, for the nice review of my SPARQL engine! Also, thanks for your work on the Protocol!


I added some examples to the query page myself, to help people get an idea, but I am happy that you did too. I chose to build upon Redland/Rasqal because it fits rather well into the existing architecture, and allowed me to get this out rather fast, as it is indeed something that will allow people to use these data in novel ways and so bring more interested users to Opera and our community.


It is just a start: it allows experimentation and lets us gain experience in a field where there is little to build on. Clearly, as usage soars, much more work will be needed.


I no longer allow retrieval of the full model: I rewrote the resource-limiting code to emphasise memory use rather than a timeout, so a query that is too big or too complex now results in an error. Unfortunately, it currently results in a proxy error rather than an internal server error; I will have to fix that later.


Also, I'll try to stay active in the Semantic Web community, so it shouldn't be necessary to reverse engineer the model, just ask! :-) I intend to follow the work on service descriptions closely, but time doesn't allow me to elaborate on this right now. Briefly, our user data is modelled mostly using FOAF, our galleries using FOAF Image and my own Gallery schema, while blog posts and forum contributions are mostly Dublin Core.


So, we're just getting started, but I hope it will also get the Semantic Web and the Web 2.0 communities started!