Pattern matching with XML

by Michael Day

[Update: see below]. A few years ago, Eric van der Vlist put together a proof-of-concept XML schema language called Examplotron. The clever part of Examplotron is that the schema for a given XML document is that document itself: a document is its own schema. This allows schemas to be designed by writing down example documents (examplotron, get it?), which can then be generalised automatically to produce a RELAX NG schema for those documents and other documents like them. Clever. Now, what if XPath worked like that?
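As a toy illustration of the "document is its own schema" idea, here is a Python sketch. The function names `shape` and `conforms` and the `person` document are assumptions for illustration, not part of Examplotron. It reduces an example document to its element names and nesting, then accepts any document with exactly that structure, ignoring text content.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Reduce an element to its tag plus the shapes of its child elements (text is ignored)."""
    return (elem.tag, [shape(child) for child in elem])

def conforms(example_xml, candidate_xml):
    """Toy check: does the candidate have exactly the same element structure as the example?"""
    return shape(ET.fromstring(example_xml)) == shape(ET.fromstring(candidate_xml))

# The example document doubles as the "schema" (hypothetical document).
example = "<person><lastName>Smith</lastName><firstName>John</firstName></person>"

print(conforms(example, "<person><lastName>Day</lastName><firstName>Michael</firstName></person>"))  # True
print(conforms(example, "<person><firstName>Michael</firstName></person>"))  # False
```

Unlike Examplotron, this accepts no variation at all. A second firstName, for instance, would fail. Generalising from the example is exactly the part Examplotron adds.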

8 Comments

Andrew Houghton
2007-03-28 06:03:52
Two points on your article. The first is about your h1 p example. I believe you have to be careful with the XPath expression h1/following-sibling::p, because it can return several p nodes that you might not expect. For example, given the siblings h1 p table p, the expression returns both p nodes. You probably want to constrain it to h1/following-sibling::*[1]/self::p, which ensures that you only match a p node that immediately follows an h1 node.
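Python's standard ElementTree does not support the following-sibling axis, so the sketch below emulates both expressions by slicing a parent's child list. It uses the sibling sequence h1 p table p from the comment. The helper `following_siblings` and the `id` attributes are hypothetical, added only to tell the two p elements apart.

```python
import xml.etree.ElementTree as ET

def following_siblings(parent, child):
    """Emulate the following-sibling axis: the children of parent after child, in document order."""
    kids = list(parent)
    return kids[kids.index(child) + 1:]

doc = ET.fromstring('<body><h1/><p id="a"/><table/><p id="b"/></body>')
h1 = doc.find("h1")
after_h1 = following_siblings(doc, h1)

# h1/following-sibling::p : every later sibling that is a p
all_p = [e for e in after_h1 if e.tag == "p"]

# h1/following-sibling::*[1]/self::p : the very next sibling, kept only if it is a p
adjacent_p = [e for e in after_h1[:1] if e.tag == "p"]

print([e.get("id") for e in all_p])       # ['a', 'b']
print([e.get("id") for e in adjacent_p])  # ['a']
```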


The second point, along the same lines, concerns your XSLT solution:


match="lastName[following-sibling::*[1]/self::firstName]"
select="following-sibling::*[1]/self::firstName"


match="firstName[preceding-sibling::*[1]/self::lastName]"
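Here is a rough Python equivalent of what these patterns do together. It assumes the XSLT solution emits the firstName before the lastName, and the function `swap_adjacent` is hypothetical. The point is that a lastName and a firstName are swapped only when they are directly adjacent siblings.

```python
import xml.etree.ElementTree as ET

def swap_adjacent(parent, first="lastName", second="firstName"):
    """Swap each `first` child that is immediately followed by a `second` sibling.

    Mirrors the two patterns working together:
      lastName[following-sibling::*[1]/self::firstName]  -> emit the firstName, then the lastName
      firstName[preceding-sibling::*[1]/self::lastName]  -> already emitted, so skip it
    Pairs that are not directly adjacent are left in place.
    """
    kids = list(parent)
    out, i = [], 0
    while i < len(kids):
        if i + 1 < len(kids) and kids[i].tag == first and kids[i + 1].tag == second:
            out += [kids[i + 1], kids[i]]  # the adjacent pair, in swapped order
            i += 2
        else:
            out.append(kids[i])            # anything else is copied unchanged
            i += 1
    parent[:] = out                        # replace the children in place
    return parent

citation = ET.fromstring(
    "<citation><lastName>Smith</lastName><firstName>John</firstName>"
    "<title>To Catch a Walrus</title></citation>")
print([e.tag for e in swap_adjacent(citation)])  # ['firstName', 'lastName', 'title']
```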

Andrew Houghton
2007-03-28 06:21:16
FYI, in case you are thinking you could replace h1/following-sibling::*[1]/self::p with h1/following-sibling::p[1], consider the sibling sequence h1 table p:


h1/following-sibling::*[1]/self::p returns an empty node set
h1/following-sibling::p[1] returns p
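The difference comes down to the order of the two steps. This Python sketch makes that order explicit for the sibling sequence h1 table p. It emulates the expressions with ElementTree list slicing and is not an XPath engine.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<body><h1/><table/><p/></body>")
kids = list(doc)
after_h1 = kids[kids.index(doc.find("h1")) + 1:]   # following-sibling::* of the h1

# following-sibling::*[1]/self::p : take the first sibling, then keep it only if it is a p
strict = [e for e in after_h1[:1] if e.tag == "p"]

# following-sibling::p[1] : filter to p elements first, then take the first of those
loose = [e for e in after_h1 if e.tag == "p"][:1]

print(len(strict), len(loose))  # 0 1
```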

Bill Donoghoe
2007-03-28 15:45:32
Here is my take on representing XPath expressions in XML.
If the goal is to improve readability, then I believe a better approach would be to provide a graphical representation of the expression, akin to those used for XML schema in various editors. Of course, translating XPath notation into XML would also provide flexibility in this area.
The downside of using an XML version of an XPath expression as documentation is verbosity, because a "complete" XML schema for XPath would be fairly complex.


Footnote: As an example, the axis diagrams in Michael Kay's book are, IMHO, worth more than a thousand words.

CKnell
2007-03-29 10:06:53
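<xsl:template match="element()[lastName][firstName]">
  <!-- Any element that has both a lastName and a firstName child -->
  <xsl:element name="{name(.)}">
    <!-- Re-emit the two children with firstName first -->
    <xsl:copy-of select="firstName" />
    <xsl:copy-of select="lastName" />
  </xsl:element>
</xsl:template>
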
Michael Day
2007-03-29 15:42:55
CKnell, that's a good solution, although it might be better to use xsl:copy to create the element, in case it is in a namespace:



<xsl:template match="element()[lastName][firstName]">
  <!-- xsl:copy creates an element with the same expanded name, namespace included -->
  <xsl:copy>
    <xsl:copy-of select="firstName"/>
    <xsl:copy-of select="lastName"/>
  </xsl:copy>
</xsl:template>
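For the grouped case, here is a Python sketch of what this template does. The function `reorder_grouped` and the namespace URI `http://example.com/ns` are illustrative assumptions. ElementTree stores an element's name as "{namespace-uri}local-name", so reusing the original tag keeps the element in its namespace, much as xsl:copy does. By contrast, xsl:element with name="{name(.)}" resolves any prefix against the namespace declarations in the stylesheet, not in the source document.

```python
import copy
import xml.etree.ElementTree as ET

def reorder_grouped(elem):
    """Rough equivalent of the xsl:copy template for match="element()[lastName][firstName]".

    elem.tag holds the expanded name "{namespace-uri}local-name", so reusing it keeps
    the new element in the same namespace, as xsl:copy does. Like the template, only
    the firstName and lastName children are copied; attributes and other children are dropped.
    """
    if elem.find("lastName") is None or elem.find("firstName") is None:
        return elem  # pattern not matched; this sketch simply returns it unchanged
    result = ET.Element(elem.tag)
    result.extend([copy.deepcopy(c) for c in elem.findall("firstName")])
    result.extend([copy.deepcopy(c) for c in elem.findall("lastName")])
    return result

src = ET.fromstring(
    '<ex:search xmlns:ex="http://example.com/ns">'
    "<lastName>Smith</lastName><firstName>John</firstName></ex:search>")
out = reorder_grouped(src)
print(out.tag)                # {http://example.com/ns}search
print([c.tag for c in out])   # ['firstName', 'lastName']
```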


However, your solution assumes that firstName and lastName are grouped together by a parent element, which makes the problem a lot easier. What about matching two consecutive elements that are not grouped together?

CKnell
2007-03-30 11:53:35
With the name() function, namespace prefixes are copied (unlike with local-name()), so I don't understand the objection.


The solution I offered matched the given XML source document, so I don't understand the next remark either. Since an XML document has exactly one root element, all other elements are descendants of it. In the proffered source document, firstName and lastName are children of ex:search. Are you proposing a source document where firstName and lastName are children of different parents? Will these different parent elements be siblings, or will they be nested somehow?


"Grouped together" is not XML-speak, so you need to clarify your meaning.


I'd need to see some concrete example to think over.

CKnell
2007-03-30 12:50:12
I'd like to amend my last comment. Please change "In the proffered source document, firstName and lastName are children of ex:search." to "In the proffered source document, firstName and lastName are children of ex:search and ex:replace."

Michael Day
2007-03-30 17:30:32
Sorry, I wasn't clear. The search/replace XML is an example of a pattern that could be applied to another XML document, one like this perhaps:



<citation>
<lastName>Smith</lastName>
<firstName>John</firstName>
<title>To Catch a Walrus</title>
<date>1967</date>
...


Or the <h1/><p/> example, which also has two sequential elements with other preceding and following siblings. By "not grouped together" I meant that they are not enclosed in a parent element that contains nothing else, like this: <div><h1/><p/></div>. Grouping them like this would make them much easier to select with XPath, as you could just select the parent element. However, you can't rely on this, and I'm curious how XSLT and XPath can handle this situation, since a pattern-matching mechanism similar to regular expressions seems to make it easier.
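The regular-expression-like approach can be sketched in Python. The function `match_runs` is hypothetical, and this is not an existing XPath or XSLT feature. An example sequence of element names slides over the children of every element, so a lastName/firstName pair or an h1/p pair is found wherever it occurs, without needing a wrapper element. Each pattern entry is a regular expression matched against a whole tag name, so ["h[1-6]", "p"] matches any heading directly followed by a paragraph.

```python
import re
import xml.etree.ElementTree as ET

def match_runs(root, pattern):
    """Return every run of consecutive sibling elements whose tags match `pattern`.

    `pattern` is a list of regular expressions, one per element, each matched
    against the whole tag name with re.fullmatch. Every element in the tree is
    treated as a possible parent, so the run need not be wrapped in its own
    container such as <div>. Assumes no namespaces: namespaced tags appear as
    "{uri}local" and would need patterns written for that form.
    """
    n = len(pattern)
    runs = []
    for parent in root.iter():           # root itself and all its descendants
        kids = list(parent)
        for i in range(len(kids) - n + 1):
            window = kids[i:i + n]
            if all(re.fullmatch(p, e.tag) for p, e in zip(pattern, window)):
                runs.append(window)
    return runs

doc = ET.fromstring(
    "<body>"
    "<citation><lastName>Smith</lastName><firstName>John</firstName>"
    "<title>To Catch a Walrus</title><date>1967</date></citation>"
    "<h2/><p/><h1/><table/><p/>"
    "</body>"
)

for run in match_runs(doc, ["lastName", "firstName"]):
    print([e.text for e in run])                # ['Smith', 'John']
print(len(match_runs(doc, ["h[1-6]", "p"])))    # 1: only the h2 is directly followed by a p
```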