Labels vs. Types and Other Culture Clashes

by Simon St. Laurent

Related link: http://www.cranesoftwrights.com/schedule.htm#ptux



On day 3 of XSLT training, we've reached the point where many divisions, and the reasons for them, are becoming clearer. The division between processing based on labels (XSLT 1.0's approach) and processing based on types (new in XSLT 2.0) is one of the most profound differences, but other such divides abound.



We've spent a lot of time here exploring how traditional programming assumptions can lead developers down the wrong path in XSLT. As Ken Holman put it cheerfully, "I wanted you to trip over some of the non-obvious problems, so you can get out of a rut." The basic problems seem to boil down to different expectations about what the programmer should be doing and what the program should know.



In the particular example we were working on, programmers could create their own mechanisms for tracking chapter numbers, effectively using variables and treating the processing of the document like the processing of an array. That approach worked, but it was a lot more complicated than the XSLT that relied on the processor's own understanding of where it was in the document. XSLT was already doing the work, tracking labels and hierarchies, so the programming we added was duplication: it served primarily to make our inner programmers comfortable, and it put an extra layer between our code and the actual state of processing.
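The contrast can be sketched outside XSLT. Python's standard library has no XSLT processor, so this is only an analogy using xml.etree.ElementTree, with a hypothetical book document: one function keeps its own counter while walking the chapters like an array, and the other derives each chapter's number from its position among its siblings, roughly what count(preceding-sibling::chapter) + 1 expresses and what xsl:number computes by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical book document, assumed for illustration only.
BOOK = """<book>
  <chapter><title>Labels</title></chapter>
  <chapter><title>Types</title></chapter>
  <chapter><title>Clashes</title></chapter>
</book>"""


def number_with_counter(root):
    """Programmer-style: treat the chapters like an array and keep our own counter."""
    results = []
    counter = 0
    for chapter in root.findall("chapter"):
        counter += 1  # state we maintain ourselves, parallel to the tree
        results.append((counter, chapter.findtext("title")))
    return results


def number_from_position(parent, chapter):
    """Tree-style: ask the tree where this chapter sits.

    Counts the chapter siblings that precede it, plus one; roughly
    count(preceding-sibling::chapter) + 1 in XPath terms.
    """
    count = 1
    for child in parent:
        if child is chapter:
            return count
        if child.tag == "chapter":
            count += 1
    raise ValueError("chapter is not a child of parent")


def number_with_tree(root):
    """Number every chapter without carrying any state between chapters."""
    return [(number_from_position(root, ch), ch.findtext("title"))
            for ch in root.findall("chapter")]


root = ET.fromstring(BOOK)
print(number_with_counter(root))
print(number_with_tree(root))
```

Both calls print the same numbering, but the positional version does not depend on traversal order: asking for the third chapter alone, number_from_position(root, root.findall("chapter")[2]), still yields 3, because the number comes from the document rather than from a counter we have to keep in sync.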



(This also helps explain why XSLT isn't suitable for all data-processing tasks: its strong foundations in tree structures make it less agreeable for working with structures like graphs, which may be represented in XML but don't necessarily fit the hierarchy XML imposes.)



The other divide, appearing regularly in discussions of XSLT/XPath 2.0, is the divide between processing labeled hierarchical structures and processing data of a given type. Labeled structures are explicit in every XML document, but types are generally kept in separate documents or processes - W3C XML Schema documents in the case of XSLT/XPath 2.0. While schema awareness adds some extra capabilities to XSLT processing, it also adds a layer of indirection. Programmers who like this kind of indirection will no doubt be pleased, but those who came to markup for its relatively direct approach to labeled information are often infuriated.
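To make that extra layer concrete, here is a rough Python analogy (not XSLT/XPath 2.0, and not how a schema-aware processor is implemented). The order document and the SCHEMA_TYPES dictionary are hypothetical stand-ins: the dictionary plays the role of a separate schema, holding type information that the document itself does not carry.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document; element names are assumptions for illustration.
ORDER = "<order><item>widget</item><qty>3</qty><price>12.50</price></order>"

# Stand-in for a separate schema: the types live outside the document.
# (A schema-aware XSLT 2.0 processor would get them from W3C XML Schema, not a dict.)
SCHEMA_TYPES = {"qty": int, "price": Decimal}


def read_by_label(root):
    """Label-based: consult only the element names in the document;
    every value arrives as text."""
    return {child.tag: child.text for child in root}


def read_by_type(root, schema):
    """Type-based: pass each value through a type looked up elsewhere,
    an extra layer between the markup and what the program sees."""
    return {child.tag: schema.get(child.tag, str)(child.text) for child in root}


root = ET.fromstring(ORDER)
print(read_by_label(root))
print(read_by_type(root, SCHEMA_TYPES))
```

The typed version lets the program multiply qty by price directly, which is the sort of capability schema awareness adds, but only because it consults information that lives outside the markup.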



It's hard to tell where XSLT will go, given the strange mixture of markup-centric assumptions in its processing model and the layers of type-centric assumptions being added in version 2.0. Rather than a clean division, where markup tools are specific to markup assumptions, XSLT 2.0 challenges developers with a combination of a core language based on labeled hierarchies and features based on typing.



Today has been especially enjoyable for me, as I have the markup perspective and little patience for the typed programming perspective. (I'm a Java programmer, and I still find types only mildly useful - I know, I know, I should switch to Python.) This morning's discussion has illuminated how that divide plays out in XSLT, and why XSLT 1.0 will likely remain my tool of choice for the kinds of XML processing problems for which I find transformations appropriate. As Ken Holman put it, addressing the more traditional programmers in the audience:



"Hopefully by the end of the day, you'll feel more comfortable with these native XML structure processing approaches."


There's one other clash worth mentioning, if only because it's less troubling in this context. When I first encountered XSL 1.0, the obvious clash seemed to be with Cascading Style Sheets (CSS): both addressed formatting and had some surface similarity, but they had very different architectures (CSS annotation, XSL transformation) and syntax. At this point, though some animosity between the two camps remains, it's fairly clear that the foundations of the two are similar. They're both declarative, they both rely on the labeled hierarchies of the documents on which they operate, and they both build on understandings shared by markup communities. Relative to other clashes, that one is small.



How diverse should programming styles be?