Multi-stage XSLT scripts

by Rick Jelliffe

One of the old secrets of text processing was to use multiple stages: a pipeline in which each stage does something clear and comprehensible. The programming language OmniMark built this notion in, first by having a three-stage processor (text, SGML, ESIS), then by generalizing these into processes; in OmniMark they were all implemented as efficient co-routines or semi-co-routines.

But I only recently figured out how to do this in XSLT: multiple stages in a single script (not to be confused with multiple passes over the same data, which modes handle, nor with functions). Probably it is obvious to everyone else. It had never really clicked with me that you can store a tree of elements, built by processing the input data, in a variable, then use another set of templates to process that tree, perhaps into another variable. It is still not as flexible as OmniMark (no validation, which means no enforced unit test; no processing of unmarked-up text into marked-up text).
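
The idea in the paragraph above can be sketched as a small two-stage stylesheet. This is a minimal illustration rather than the author's original listing: it assumes an XSLT 2.0 processor, and the element names (doc, para, book, p) and mode names (stage1, stage2) are hypothetical. The modes here only keep the two template sets apart; the staging itself comes from the variable, which holds the output of the first stage as a tree that the second stage then processes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Driver: run stage 1 into a variable, then run stage 2 over that tree. -->
  <xsl:template match="/">
    <xsl:variable name="stage1">
      <xsl:apply-templates select="*" mode="stage1"/>
    </xsl:variable>
    <!-- $stage1 is a temporary tree; its children are the stage 1 output. -->
    <xsl:apply-templates select="$stage1/*" mode="stage2"/>
  </xsl:template>

  <!-- Stage 1: map the (hypothetical) input vocabulary to an intermediate one. -->
  <xsl:template match="doc" mode="stage1">
    <book>
      <xsl:apply-templates select="para" mode="stage1"/>
    </book>
  </xsl:template>

  <xsl:template match="para" mode="stage1">
    <p><xsl:value-of select="normalize-space(.)"/></p>
  </xsl:template>

  <!-- Stage 2: sees only the intermediate vocabulary (book, p). -->
  <xsl:template match="book" mode="stage2">
    <html>
      <body>
        <xsl:apply-templates select="p" mode="stage2"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="p" mode="stage2">
    <p class="body"><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because stage 2 matches only the intermediate element names, each stage can be read and changed on its own, which is the point of the pipeline.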


Kurt Cagle
2006-08-16 00:13:57
The one thing you need to watch out for with this approach is that XSLT 1.0 does not natively support intermediate trees. When you create content within a variable, as you are doing above, the content is an XML fragment (a result tree fragment, in the spec's terms), not a bona fide collection of nodes, and as such the above will likely choke in most XSLT 1.0 processors.

Instead, you generally need to rely upon a processor's xx:node-set() extension function (xx standing for whatever prefix is bound to the processor's extension namespace), which converts text and XML fragments into full node-sets. Unfortunately, this makes the above somewhat more unwieldy:

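<xsl:template match="/">
  <!-- Stage 1: the result is a result tree fragment in XSLT 1.0 -->
  <xsl:variable name="idealXML">
    <xsl:apply-templates mode="stage1" />
  </xsl:variable>

  <!-- Stage 2: convert the fragment to a node-set before processing it -->
  <xsl:variable name="outputXML">
    <xsl:apply-templates mode="stage2" select="xx:node-set($idealXML)/*" />
  </xsl:variable>

  <!-- Stage 3: final output -->
  <xsl:apply-templates mode="stage3" select="xx:node-set($outputXML)/*" />
</xsl:template>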

This is generally not a problem in XSLT 2.0, which does away with XML fragments: content constructed inside a variable is held as a temporary tree, which can be navigated and passed to apply-templates like any other nodes.
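
For XSLT 1.0, the xx prefix in the snippet above has to be bound to an extension namespace before the stylesheet will run. As a self-contained sketch, the following uses the EXSLT common namespace and its exsl:node-set() function, which several XSLT 1.0 processors implement; whether a given processor supports it is an assumption to check. The element names (list, item, entries, entry, summary) are hypothetical and only serve to show two stages.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- In XSLT 1.0 this variable holds a result tree fragment. -->
    <xsl:variable name="idealXML">
      <xsl:apply-templates select="*" mode="stage1"/>
    </xsl:variable>
    <!-- exsl:node-set() turns the fragment into a navigable node-set. -->
    <xsl:apply-templates select="exsl:node-set($idealXML)/*" mode="stage2"/>
  </xsl:template>

  <!-- Stage 1 (hypothetical): rename list/item to entries/entry. -->
  <xsl:template match="list" mode="stage1">
    <entries>
      <xsl:apply-templates select="item" mode="stage1"/>
    </entries>
  </xsl:template>

  <xsl:template match="item" mode="stage1">
    <entry><xsl:value-of select="."/></entry>
  </xsl:template>

  <!-- Stage 2: summarise the entries produced by stage 1. -->
  <xsl:template match="entries" mode="stage2">
    <summary count="{count(entry)}"/>
  </xsl:template>

</xsl:stylesheet>
```

Processors that do not implement EXSLT usually offer an equivalent function in their own namespace, in which case only the namespace declaration and the prefix need to change.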

M. David Peterson
2006-08-16 06:53:06
Hey Rick,

As per Kurt's comments, in XSLT 2.0 this becomes a pretty powerful combination.

I'll leave each additional file in a separate comment (more than one URI forces the need for an approval), but here's an example of an XSLT 2.0 transformation that builds out a test suite of sorts: the .bat file for each test, a master .bat file to run each generated .bat file, and all of the test XML files, which increase in node count proportionately. All of this is based on a sequence of nodes contained in an input XML file, which is then 'seeded' by another XML file with the settings for the various variables necessary to generate all of the above.

Here's the overall explanation of how it works:

NOTE: I updated things a bit since then (not much... just a few lines, but the updated lines make things more efficient), so I will provide a different link to download the zipped archive as a follow-up comment.

M. David Peterson
2006-08-16 06:53:33
XSLT 2.0 file:
M. David Peterson
2006-08-16 06:54:23
Config file for Saxon:

M. David Peterson
2006-08-16 06:54:58
Config file for Saxon.NET:

M. David Peterson
2006-08-16 06:55:34
Same, but for Mono,

M. David Peterson
2006-08-16 06:56:03
Master config file,

M. David Peterson
2006-08-16 06:59:25
2006-08-16 07:37:38
This was really interesting! I have seen XSL pipelines, but they always involved sending node sets to a stylesheet via a parameter. This seems very helpful for larger transformations that have to happen without any help from a programming language.
2006-08-21 07:45:39
I outlined how to do multi-stage processing using EXSLT back in 2002:

No need for XSLT 2.0