A Next Generation of XPath?

by Uche Ogbuji

I finally got around to setting up an XPath NG mailing list. Moving forward with XPath without the enormous complications of W3C XPath 2.0 has long been a topic of raging debate on XML-DEV, www-tag, and other fora. I have been keenly interested in such an initiative, and I hope this mailing list brings it to fruition. I am certainly encouraged by our roaring start. First of all, from my welcome message:

This list is friendly to all discussion of improvements and updates to
XPath. Discussion of the W3C XPath 2.0 specs, as well as community
alternatives, and other relevant matters is welcome.

David Rosenborg, who has long worked on the FXPath (Functional XPath) project, outlined his initial ideas for and interest in XPath NG:

To summarize my favorite XPath language I'd start with XQuery,
remove the XML Schema stuff, use the XPath 1.0 data
model, and add a dose of functional programming.


I choose XQuery as the starting point to illustrate that I think
that XPath Ng should be a self-sufficient language, i.e., not
dependent on a context like XSLT or XPointer. It should
however have the capabilities to inherit properties like
variables from a host language if present.


XPath 1.0 expressions should work unmodified when processed by an XPath Ng processor.


XPath Ng should not mandate support for any particular schema language.
Instead it should provide a generic facility for tunneling auxiliary information
from, for example, a schema to the XPath Ng language.


A normative XML mapping of the XPath Ng syntax could be useful. Just as there is
an XML and compact syntax for Relax Ng, there could be two syntaxes for XPath Ng too,
just that we'd use the compact form for the primary spec.

I then offered some procedural thoughts:

As to what we produce, I think we should take a leaf out of the EXSLT
playbook. I think we should produce relatively self-contained modules,
each of which exhibits healthy coupling and cohesion. Each module would
address one particular aspect of extension/modification of XPath 1.0.
One could also layer and combine modules. As an example, there might
be an axis extension module which allows one to define extension axes,
say by using qnames or setting up a community registry. On top of this
we might build an annotations module, which provides an extensible
system of annotations or properties for each node and uses the extension
axis module to define an annotations axis. Then we might build a data
typing module which assigns data types to nodes using the constructs in
the annotations module.
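
To make the layering concrete, here is a minimal Python sketch of how those three modules might stack. Everything in it is hypothetical and invented for illustration: the `Node` class, the `register_axis` and `walk_axis` functions, the `ann:annotations` axis name, and the `set_type` and `type_of` helpers. None of it has been proposed on the list; it only shows one way the dependencies could run.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Node:
    """Hypothetical stand-in for an XPath node; identity is object identity."""

    def __init__(self, name: str) -> None:
        self.name = name


# --- Axis extension module: a registry of extension axes keyed by qname ---
AxisFunction = Callable[[Node], List[Tuple[str, object]]]
_axes: Dict[str, AxisFunction] = {}


def register_axis(qname: str, function: AxisFunction) -> None:
    """Register an extension axis under a qualified name (assumed scheme)."""
    _axes[qname] = function


def walk_axis(qname: str, node: Node) -> List[Tuple[str, object]]:
    """Evaluate a registered extension axis starting from the given node."""
    return _axes[qname](node)


# --- Annotations module: depends only on the axis extension module ---
_annotations: Dict[Node, Dict[str, object]] = {}


def annotate(node: Node, key: str, value: object) -> None:
    """Attach an arbitrary property to a node."""
    _annotations.setdefault(node, {})[key] = value


# Expose the annotations through an axis, as the layering suggests.
register_axis(
    "ann:annotations",
    lambda node: list(_annotations.get(node, {}).items()),
)


# --- Data typing module: depends only on the annotations module ---
def set_type(node: Node, type_name: str) -> None:
    """Record a data type for a node as an ordinary annotation."""
    annotate(node, "type", type_name)


def type_of(node: Node) -> Optional[str]:
    """Look up a node's data type by walking the annotations axis."""
    return dict(walk_axis("ann:annotations", node)).get("type")


price = Node("price")
set_type(price, "decimal")
print(type_of(price))
```

The point of the sketch is that the typing module never touches the axis registry directly, so either lower layer could be replaced without rewriting the one above it.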

Jeni Tennison waded into the list with her usual brilliance. First she sketched out a suggested problem statement for XPath NG, along with some core goals:

The problem as I see it: XPath should be something that's usable in
multiple different contexts -- in XSLT, in XQuery, in schema
languages, in XForms, in XPointer, in DOM, etc. etc. etc. XPath 2.0 is
so weighty (particularly because of the XML Schema support) that it can't
be adopted wholesale into these different contexts, and doesn't have
easily identifiable modules, which means that we'll end up with
different parts being incorporated into different languages in
different ways. It's also a product of two communities (XSLT and
XQuery) with very different requirements pulling in different
directions which has led to some ugly compromises.

I think that the problem we should try to solve is to at least show
that a radically alternative design is possible.

The goals for XPath NG should be, I think:

* Simplicity
* Modularity
* Extensibility
* Schema language independence
* Backwards compatibility

Then Jeni even contributed a stab at a core data model for XPath NG:

Every value in XPath NG is a sequence containing zero or more items.

The items in a sequence can be of three kinds: nodes, values and
other sequences. Nodes are items that have identity whereas values
are items that do not have identity.

There are three core types of values in XPath NG: strings, numbers
and booleans. These values can be cast to each other using the XPath
1.0 rules. The only difference is that in XPath NG, strings that
hold numbers in scientific notation can be cast to numbers, 'INF' is
converted to Infinity and '-INF' is converted to negative Infinity.

Sequences are cast to strings and numbers by converting the first
item of the sequence to a string or number; if the sequence is empty
you get an empty string or NaN. Sequences are cast to boolean true
if they contain any items, boolean false if they do not.

Other modules may add more data types, but every data type must
define a mapping onto a string, a number and a boolean value, and
how that data type is created from a sequence.
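
The casting rules in that data model are compact enough to sketch in code. The following Python is only an illustration under stated assumptions: sequences are modeled as Python lists or tuples, nodes are left out entirely, and the function names `to_number`, `to_string` and `to_boolean` are my own. The number-to-string conversion is also simplified relative to XPath 1.0, which never emits exponent notation.

```python
import math
import re

# XPath 1.0 Number syntax plus an optional exponent, per the proposed extension.
_NUMBER = re.compile(r"^-?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?$")


def to_number(value):
    """Cast a boolean, number, string or sequence to a number."""
    if isinstance(value, bool):  # bool must be tested before int/float
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        text = value.strip()
        if text == "INF":
            return math.inf
        if text == "-INF":
            return -math.inf
        return float(text) if _NUMBER.match(text) else math.nan
    if isinstance(value, (list, tuple)):
        # A sequence casts via its first item; an empty sequence gives NaN.
        return to_number(value[0]) if value else math.nan
    raise TypeError("unsupported item: %r" % (value,))


def to_string(value):
    """Cast a boolean, number, string or sequence to a string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        number = float(value)
        if math.isnan(number):
            return "NaN"
        if math.isinf(number):
            return "Infinity" if number > 0 else "-Infinity"
        if number == int(number):
            return str(int(number))  # integers print without a decimal point
        return repr(number)  # simplification: may use exponent notation
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        # A sequence casts via its first item; an empty sequence gives "".
        return to_string(value[0]) if value else ""
    raise TypeError("unsupported item: %r" % (value,))


def to_boolean(value):
    """Cast a boolean, number, string or sequence to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))
    if isinstance(value, (str, list, tuple)):
        # Strings and sequences are true exactly when they are non-empty.
        return len(value) > 0
    raise TypeError("unsupported item: %r" % (value,))
```

Note that under a literal reading of the rules, a sequence such as `[0]` casts to true, because it contains an item, even though that item alone would cast to false.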

A side note: On XML-DEV, Mike Kay seemed to object to our using the name "XPath NG". I and others feel there is no reason not to use this name, but some alternative name suggestions have been floated, including Jeni Tennison's "FIXPath" and my "NextPath".

If you have any ideas on what direction XPath should take, please join us. It's a community effort, and there are already some heavy-hitting players and deep discussions to be found on the list.

So what do you think of the upstart efforts towards XPath NG?