Are long names slower to parse in XML?

by Rick Jelliffe

Rob Weir has done some interesting stats on XML parse times for real documents and the effect of lengthening the element and attribute names. The blog article is called The Celerity of Velocity. The result? Even though we expanded some NCNames to 32 times their original length, making a 5x increase in the average NCName length, it made no significant difference in parse time. There is no discernible slowdown in parse time as the element and attribute names increase.

I don't think he is claiming that this would hold forever or for all software, of course! Indeed, it might be the sign of crap software: if you went mad and allocated a 1K buffer for each name, then copied the 1K of text starting at each NCName, you certainly would get constant parsing time regardless of name length.

Rob's figures are of course difficult to accept. I would like them to be wrong. They seem to go against the kinds of stats that the Efficient XML proponents give. But a number is worth a thousand words.
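For anyone who wants numbers of their own, the kind of measurement described above can be sketched roughly as follows. This is not Rob's method: he lengthened names in real documents, whereas this sketch generates a synthetic flat document, and the helper names (`build_document`, `time_parse`, `compare`) and the choice of Python's standard `xml.etree.ElementTree` parser are assumptions made purely for illustration. Note that longer names also make the file bigger, so any difference you see mixes the two effects.

```python
import time
import xml.etree.ElementTree as ET


def build_document(name_length, n_elements=1000):
    """Build a flat XML document whose element and attribute names
    are each name_length characters long (hypothetical test data)."""
    elem = "e" * name_length
    attr = "a" * name_length
    rows = "".join(
        f'<{elem} {attr}="{i}">text</{elem}>' for i in range(n_elements)
    )
    return f"<root>{rows}</root>"


def time_parse(xml_text, repeats=5):
    """Return the best wall-clock time, in seconds, over several parses."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
    return best


def compare(lengths=(4, 20, 128), n_elements=1000):
    """Map each name length to (document size in characters, best parse time)."""
    results = {}
    for length in lengths:
        doc = build_document(length, n_elements)
        results[length] = (len(doc), time_parse(doc))
    return results


if __name__ == "__main__":
    for length, (size, seconds) in compare().items():
        print(f"name length {length:4d}: {size:8d} chars, {seconds * 1000:.3f} ms")
```

The timings will vary by machine and parser, so treat the output as a rough indication only; a synthetic document like this one is far more name-heavy than typical real documents.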

3 Comments

Anthony B. Coates
2006-12-22 04:57:19
One of the members of the FpML Architecture Working Group did a lot of tests on this a couple of years ago. His conclusion was that the total size of the XML file was the only important metric with regard to parse time.
Cheers, Tony.

Jess Sightler
2006-12-22 13:48:26
So basically he has shown us that document complexity is far more important than marginal changes in document size (especially when a good platform can mitigate the change in size by using very little additional RAM).


I am not sure that this is particularly surprising.

Rob
2006-12-27 08:50:25
I'd put my conclusion like this: parse time is robust under a wide range of NCName lengths, so giving up human comprehensibility in the hope of a performance gain is probably not a wise trade-off.


Of course, in extreme cases I will certainly be wrong. A tag name should not be a novel-length description. That would hurt performance as well as comprehensibility.


I have nothing against the Efficient XML guys, but I would suggest that when dealing with XML as documents (in distinction to XML as data) we need to consider human efficiency as well.