XSLT and Binary File Formats

by Philip Fennell

With all the recent talk of angle-bracket taxes and what XML is and isn't good for, I thought it would be fun to take XSLT somewhere it is not normally found: the generation of binary file formats.


In XSLT 2.0 the sequence is far more useful than the humble XPath 1.0 node-set. It is not restricted to nodes: the tokenize() function, for example, creates a sequence of strings, and the comma operator concatenates items of any data type into a single sequence.


However, there is nothing here that lifts us out of the ordinary; not until, that is, you create a sequence of xs:unsignedByte values. Such a sequence can be treated as a byte sequence, and if you can create a byte sequence you can create just about any binary file format you like. A good example is a Tagged Image File Format (TIFF) image. If you steer clear of compression, a TIFF image is relatively easy to create; after all, it is only a series of byte sequences.
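To make the "series of byte sequences" idea concrete, here is a minimal sketch in Python rather than XSLT, since emitting raw bytes is exactly what a standard XSLT 2.0 processor cannot do. It builds an uncompressed, 8-bit greyscale, little-endian TIFF as a plain list of integers, much as a stylesheet would build a sequence of xs:integer values, and converts it to bytes only at the very end. The helper names (u16, u32, ifd_entry, grey_tiff) are hypothetical, and the tag selection is my reading of the TIFF 6.0 baseline greyscale requirements, not something taken from this post.

```python
SHORT, LONG, RATIONAL = 3, 4, 5  # TIFF field type codes


def u16(n):
    """Little-endian 16-bit unsigned integer as two byte values."""
    return [n & 0xFF, (n >> 8) & 0xFF]


def u32(n):
    """Little-endian 32-bit unsigned integer as four byte values."""
    return [(n >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def ifd_entry(tag, field_type, count, value):
    """One 12-byte IFD entry: tag, type, count, then the value or an offset to it."""
    # A SHORT value is left-justified within the 4-byte value field.
    field = (u16(value) + [0, 0]) if field_type == SHORT else u32(value)
    return u16(tag) + u16(field_type) + u32(count) + field


def grey_tiff(width, height, pixels):
    """Uncompressed 8-bit greyscale TIFF as a list of integers in 0..255."""
    if len(pixels) != width * height:
        raise ValueError("expected width * height pixel values")
    n_entries = 11
    ifd_offset = 8                                     # IFD follows the 8-byte header
    xres_offset = ifd_offset + 2 + 12 * n_entries + 4  # RATIONALs follow the IFD
    yres_offset = xres_offset + 8
    data_offset = yres_offset + 8                      # pixel data comes last
    header = [0x49, 0x49] + u16(42) + u32(ifd_offset)  # "II" = little-endian, magic 42
    entries = [  # baseline readers expect entries sorted by tag number
        ifd_entry(256, LONG, 1, width),                # ImageWidth
        ifd_entry(257, LONG, 1, height),               # ImageLength
        ifd_entry(258, SHORT, 1, 8),                   # BitsPerSample
        ifd_entry(259, SHORT, 1, 1),                   # Compression: none
        ifd_entry(262, SHORT, 1, 1),                   # PhotometricInterpretation: BlackIsZero
        ifd_entry(273, LONG, 1, data_offset),          # StripOffsets
        ifd_entry(278, LONG, 1, height),               # RowsPerStrip: a single strip
        ifd_entry(279, LONG, 1, len(pixels)),          # StripByteCounts
        ifd_entry(282, RATIONAL, 1, xres_offset),      # XResolution (offset to value)
        ifd_entry(283, RATIONAL, 1, yres_offset),      # YResolution (offset to value)
        ifd_entry(296, SHORT, 1, 2),                   # ResolutionUnit: inch
    ]
    assert len(entries) == n_entries
    ifd = u16(n_entries) + [b for e in entries for b in e] + u32(0)  # 0: no next IFD
    rationals = u32(72) + u32(1) + u32(72) + u32(1)    # 72/1 dpi for X and Y
    return header + ifd + rationals + list(pixels)


image = bytes(grey_tiff(2, 2, [0, 85, 170, 255]))  # a 2x2 grey ramp
```

Writing `image` to a file opened in binary mode ("wb") gives the finished TIFF. Note that `bytes()` raises ValueError if any value falls outside 0..255, which is the guarantee xs:unsignedByte would otherwise provide.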


Mind you, there are two problems to deal with. The first is that a basic XSLT 2.0 processor does not support the xs:unsignedByte data type; only a schema-aware processor is required to. In the absence of one, you have to make do with xs:integer and put up with the extra memory each value takes. The second, and more important, problem is how to get a byte sequence out of the other end of an XSLT processor, given that the standard output methods (xml, html, xhtml and text) all produce characters rather than raw bytes!
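One possible workaround for both problems, sketched here as an assumption rather than a recommended approach, is to let the stylesheet write its xs:integer sequence as whitespace-separated decimal numbers using the text output method, and then turn that text into bytes with a small post-processing step. Because xs:integer carries no 0..255 constraint, the hypothetical `integers_to_bytes` function below performs the range check that xs:unsignedByte would have enforced in a schema-aware processor.

```python
def integers_to_bytes(text):
    """Convert whitespace-separated decimal integers into a bytes object.

    Each value is range-checked because xs:integer, unlike xs:unsignedByte,
    does not restrict values to 0..255.
    """
    values = []
    for token in text.split():
        value = int(token)
        if not 0 <= value <= 255:
            raise ValueError(f"{value} is outside the unsignedByte range 0..255")
        values.append(value)
    return bytes(values)
```

Wrapped in a small script, for example writing `integers_to_bytes(sys.stdin.read())` to `sys.stdout.buffer`, it could sit in a pipeline directly after the XSLT processor. The cost is an extra, non-XSLT step, which is precisely the gap described above.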


2 Comments

Christian Timmerer
2008-06-02 02:40:35
MPEG, a working group of ISO/IEC, has standardized, within its MPEG-21 Digital Item Adaptation standard, a means of generating binary file formats based on so-called Bitstream Syntax Descriptions (BSDs). The main application is the adaptation of (scalable) multimedia content (JPEG2000, MPEG-4 SVC, etc.).


A BSD describes the structure of a bitstream in terms of packets, headers, layers, etc. It's also possible to include a parameter value in the BSD. The data type of the parameter is provided either by the schema to which the BSD belongs or directly in the BSD through xsi:type. The standard also defines additional data types, mainly for multimedia formats, that are not covered by the XML Schema built-in data types.

Philip Fennell
2008-06-02 03:20:14
Thanks Christian, I'll take a look at that.