RDF Parsing in XSLT

by Erik Wilde

During the recent discussion of the OAI-ORE drafts (which use RDF), the claim was made that RDF is serialized in RDF/XML and thus could be considered an XML representation of the underlying data model. My response was that the RDF model is different from XML, and that it is therefore pretty hard to process RDF/XML using XML tools, in particular when considering all constructs allowed by RDF/XML, and even more so when considering how to update RDF/XML data using XML tools alone.



I tried for some time to find a general-purpose RDF/XML parser written in XSLT, but so far could not find one. But Google is imperfect and I might not know the best places to look. So here is my question: Is there a general-purpose RDF/XML parser written in XSLT? It has to support all the fun stuff allowed by XML and RDF/XML, such as weird uses of namespace declarations, XML Base, rdf:ID, and RDF/XML syntactic sugar. It must accept anything that is valid RDF/XML. As a result, it should produce some form of normalized RDF/XML, but I really don't care that much about the exact format (ideally, it should be XPath-friendly). The parser must be robust enough to produce the exact same normalized result for inputs that look radically different because of XML and RDF/XML syntax variations.
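To make the "radically different inputs, same result" requirement concrete, here is a minimal sketch in Python (not XSLT, and not a real parser). It assumes two hypothetical documents about http://example.org/doc that encode the same single triple, once with a property element and once with a property attribute and different prefixes. The toy `triples` function only understands these two forms; everything else RDF/XML allows is ignored.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical example data: the same triple, written with a property element...
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>RDF Parsing in XSLT</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# ...and with a property attribute, other prefixes, and a late namespace declaration.
DOC_B = """<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <r:Description r:about="http://example.org/doc"
                 x:title="RDF Parsing in XSLT"
                 xmlns:x="http://purl.org/dc/elements/1.1/"/>
</r:RDF>"""


def uri(clark_name):
    """Turn ElementTree's '{namespace}local' name into a plain URI string."""
    return clark_name[1:].replace("}", "", 1)


def triples(xml_text):
    """Extract triples from the tiny RDF/XML subset used above (assumption:
    only rdf:Description nodes with rdf:about and plain literal properties)."""
    root = ET.fromstring(xml_text)
    result = set()
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        # Property attributes: every attribute outside the RDF namespace.
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                result.add((subject, uri(name), value))
        # Property elements with plain literal content.
        for prop in node:
            result.add((subject, uri(prop.tag), prop.text))
    return result


# The XML text differs, but the extracted triple sets are equal.
print(DOC_A == DOC_B)
print(triples(DOC_A) == triples(DOC_B))
```

Even this toy already needs namespace-aware name handling and two separate code paths for one triple. A general parser would additionally have to cover typed nodes, rdf:ID, rdf:nodeID, rdf:parseType, containers, reification, XML Base, and language tags.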



I am really interested to see whether such a beast exists, and if so, how big it is. My guess is that it's not trivial to write such a parser, but it definitely is possible. After finding out whether such a beast exists, my follow-up question will be whether there is an associated function library that can then work on the parsed RDF model, so that the data can be traversed, queried, updated, and serialized.


14 Comments

Jeni Tennison
2008-06-30 12:04:48
Norm (Walsh) did something called RDFTwig a while ago; see http://rdftwig.sourceforge.net/


I have no idea whether he's worked on it since 2003, and I seem to remember it used extension functions a fair amount, but it might be worth a look as a starting point.

Erik Wilde
2008-06-30 12:36:17
RDFTwig seems to be something different: a set of extension functions to access data in an RDF store (Jena in this case) from within XSLT (Xalan and Saxon). It might be a good starting point for an XML/XPath-friendly syntax for RDF (it seems to support various "views" of RDF graphs), but it does not seem to have any RDF/XML-related features (most importantly, parsing RDF/XML). This is based on a five-minute inspection of the RDFTwig home page, so I might be totally wrong...

Diego Berrueta
2008-06-30 13:52:57
You may be interested in XSLT+SPARQL, see http://berrueta.net/research/xsltsparql . It is a set of XPath functions that allow you to query an RDF model from XSLT style sheets. A SPARQL query such as "SELECT ?s ?p ?o WHERE { ?s ?p ?o }" executed using XSLT+SPARQL will give you a normalized XML document with all the triples of the original one.

Erik Wilde
2008-06-30 14:00:57
XSLT+SPARQL looks like a slight variation of RDFTwig (RDFTwig + SPARQL, I guess): Assuming you have an RDF datastore and RDF data in it, this is how you can access (and query) it from within XSLT. I am interested in finding out how plain XML developers (those not having RDF datastores and not being interested in having one) can deal with arbitrary RDF serialized as RDF/XML. How easy is it for them to parse and access RDF data?

M. David Peterson
2008-06-30 18:09:22
As much as I truly love XSLT, I'm just not sure that an XSLT-based RDF parser really makes much sense. As you allude to, RDF represents a data model serialized to XML (well, that's kind of the wrong way to say it, but nonetheless...) whereas XSLT (specifically 1.0; 2.0 changes things up. Is 2.0 an option for your needs?) was designed around a document-oriented foundation.


I would have to think about it some more, but this really seems like a problem better suited for XQuery. Is XQuery an option?

M. David Peterson
2008-06-30 18:21:00
Just came across > http://www.w3.org/2002/03/11-RDF-XSL/ -- which opens with:


>> "This document describes a XSLT stylesheet that transforms application/xml+rdf to a series of RDF database API calls. Further, it describes a schema annotation system for generating that XSLT, as well as other grammar-defined applications."


... which, to be honest, I'm not even sure I completely understand what it's attempting to do. It was written in 2002, but it does seem to make an attempt at building out an XQuery-like language that's embedded into the XSLT which, as I pointed out in my last comment, would make sense given the data-model focus of RDF and the data-query focus of XQuery.


Peter Keane
2008-06-30 19:51:14
Seen on twitter:


"Tim Bray (timbray): A title that says "Don't read this as you value your sanity": "RDF Parsing in XSLT" (about 6 hours ago from twitterrific)"


;-).


--peter keane

Erik Wilde
2008-06-30 20:11:31
David: XQuery does not help, I think. The complexity lies in handling all the variations of the XML and RDF/XML syntaxes (I love the plural of syntax!), and whether you are doing this in XQuery functions or XSLT functions or XSLT templates does not really make a huge difference.
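One of those variations, sketched in Python purely for illustration: with XML Base in play, the subject URI of a node depends on the nearest enclosing xml:base and on whether rdf:ID or rdf:about is used. The document, the URIs, and the `subjects` helper below are made-up assumptions, and the sketch covers nothing but this one aspect.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical input: three nodes whose subject URIs depend on xml:base.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://example.org/data/file.rdf">
  <rdf:Description rdf:ID="a"/>
  <rdf:Description rdf:about="b"/>
  <rdf:Description xml:base="http://example.net/other/" rdf:about="c"/>
</rdf:RDF>"""


def subjects(element, base):
    """Collect absolute subject URIs, tracking xml:base while descending."""
    # An xml:base attribute (possibly relative itself) overrides the inherited base.
    base = urljoin(base, element.get(XML_BASE, ""))
    found = []
    if element.tag == RDF + "Description":
        if element.get(RDF + "ID") is not None:
            # rdf:ID names a fragment of the in-scope base URI.
            found.append(urljoin(base, "#" + element.get(RDF + "ID")))
        elif element.get(RDF + "about") is not None:
            # rdf:about is a URI reference resolved against the base.
            found.append(urljoin(base, element.get(RDF + "about")))
    for child in element:
        found.extend(subjects(child, base))
    return found


for s in subjects(ET.fromstring(DOC), "http://example.org/"):
    print(s)
```

The same bookkeeping has to be written by hand in XSLT or XQuery as well, which is why I don't think the choice of language changes much.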


Of course, once you have the normalized RDF format that I mentioned as the possible result of such a parser, XQuery would be a good tool for querying that data, but the part I am most interested in for now is the parsing part.


I am not sure about the W3C reference you are mentioning, but it looks to me like several of the other projects I found: They started and figured out that it was possible in principle (which is not really a surprise), but stopped before finishing all the hard parts which make it quite tricky to write a general parser handling all possible cases.

Erik Wilde
2008-06-30 23:33:38
David: Sure, XSLT 2.0 is fine with me. It usually allows writing much better code than 1.0, so I would actually prefer 2.0.

M. David Peterson
2008-07-01 10:27:58
@Erik,


>> David: Sure, XSLT 2.0 is fine with me. It usually allows writing much better code than 1.0, so I would actually prefer 2.0.


I'll dig deeper and see what I can come up with.

Damian Steer
2008-07-01 14:38:42
Several attempts were made to do this in the old days. Seemed to be some kind of rite of passage at one point :-)


The most curious one I know of is Jeremy Carroll's snail http://www.hpl.hp.com/personal/jjc/snail/ .


However Max Froumentin's http://www.w3.org/2001/12/rubyrdf/xsltrdf/README.html is much more practical. I think there were others knocking around, too (Dan Connolly? Eric Prud'hommeaux?), but Max's ought to get you started.

Erik Wilde
2008-07-01 15:06:37
Damian: Thanks for the pointers!


It's not that I need such a parser right now. I prefer to work with well-designed XML vocabularies so that I don't need this. What I am trying to do is to find out whether there even is something like that. I have heard people say that "RDF/XML is easily usable XML data" once too often now. So I want to find out what they are actually doing when they use an XML toolset and are presented with RDF/XML data.


Max Froumentin's implementation certainly looks like it's closer to a complete implementation than most other attempts I have seen so far.

Peter Keane
2008-07-01 18:43:56
Just a clarification on my "retweet" of Tim Bray above: I was (clumsily, since I am not sure it came across properly) interpreting the comment *not* as a criticism of this article, but rather as a suggestion that even someone with his in-depth knowledge of XML regards using XSLT to parse RDF as something of a "...where angels fear to tread" type of endeavor.


Erik Wilde
2008-07-16 02:59:24
It is kind of interesting that this thread closed without any implementation showing up. There are the usual rumors that somebody once did that, but it definitely does not seem to be mainstream. Which really leaves me with the question: As an XML developer, how am I supposed to handle RDF/XML data? Write my own RDF/XML parser?


And all of this of course ignores the much more problematic fact that "applications should deal with the deductive closure of the RDF graph they process, not the (syntactic) graph itself." (http://www.lassila.org/blog/archive/2005/05/xml_considered.html written by Ora Lassila, one of the creators of RDF.)
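As a rough illustration of what "deductive closure" means beyond the syntactic graph, here is a small Python sketch. It assumes triples are plain tuples with made-up prefixed names, and it applies only two RDFS entailment rules (rdfs9 and rdfs11, both about rdfs:subClassOf), which is far from a full closure.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_subclass_closure(graph):
    """Repeat two RDFS rules until no new triples appear (illustrative only)."""
    closure = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                # rdfs11: subClassOf is transitive.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass.
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


# Hypothetical graph: three triples as they might come out of a parser.
graph = {
    ("ex:Article", SUBCLASS, "ex:Document"),
    ("ex:Document", SUBCLASS, "ex:Resource"),
    ("ex:post", RDF_TYPE, "ex:Article"),
}

closure = rdfs_subclass_closure(graph)
# The triples below are in the closure but appear nowhere in the input.
for triple in sorted(closure - graph):
    print(triple)
```

An XPath query over the parsed XML would never see that ex:post is an ex:Resource, because that statement exists only in the closure, not in the document.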


I am still unconvinced that it is accurate to label something as being XML-based if it uses RDF/XML. Apart from the very basic task of parsing RDF/XML into a DOM/XDM tree, there are just too few useful things which can be done on the level of XML tools.