The Python community has too many deceptive XML benchmarks

by Uche Ogbuji

The Python/XML community has an unfortunately long tradition of dodgy benchmarks. I dealt with probably the most egregious example in my article on PyRXP. PyRXP is billed as an XML parser, and its developers benchmark it as such against other Python/XML parsers. The problem is that PyRXP turns out not to be an XML parser at all: it fails the most fundamental conformance requirement of the most important aspect of XML, Unicode support. As a result, benchmarking PyRXP against a real XML parser is ludicrously unfair, and in that article I had a lot to say about how poisonous such unfair benchmarks are.



On the less egregious end are benchmarks of libxml2's default Python binding, which is in many ways so gnomic (no pun intended) and treacherous that it's also an unfair comparison against most Pythonic XML tools. It sounds as if Martijn Faassen's lxml is making decent progress towards rectifying this.



But I must say that the benchmarks that were the last straw for me came from an old friend. Fredrik Lundh ("/F") is, IMO, one of the few XML package developers in the Python community who really understand both Python and XML. This has generally been borne out in his ElementTree library, about which I've always had a lot of good things to say. Then cElementTree came along and suddenly raised the stakes in the Python/XML benchmark sweepstakes once again. As part of promoting cElementTree, /F posted a benchmark on its home page. The numbers are very flattering to cElementTree, and it probably deserves some such flattery, but having examined the performance question a bit more, I've come to conclude that his benchmarks are pretty much useless.



The problem is that, aside from a performance bug in my own Amara 0.9.2, which /F brought to my notice and which was fixed in the subsequent release, I was unable to reproduce under real-world conditions anything like the proportions implied in /F's benchmarks. Well, /F pretty much admits that all he's doing in his benchmark is reading in a file with each library. Hmm. This is not the stuff of which useful benchmarks are made. Nobody reads in a 3MB XML document just to throw all the data away, least of all Python developers, who have long been vocal about their desire to do as little with XML as possible. Of course I can't be 100% sure of this complaint because I haven't seen the benchmark code, but then again that's just another complaint.
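

For illustration, such a no-op benchmark boils down to roughly the following (my reconstruction; again, I haven't seen the actual code):


import cElementTree as ElementTree

#Parse the whole 3MB document...
tree = ElementTree.parse("ot.xml")
#...and do nothing whatsoever with the result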



I set out to run at least one real-world benchmark, in order to determine whether there is anything to the no-op benchmarks /F uses. The basics come from this article, where I introduce the Old Testament test. The idea is simply to print all verses containing the word 'begat' in Jon Bosak's Old Testament in XML, a 3.3MB document. A quick note on the characteristics of the file: it contains 23,145 v elements, one per Bible verse, each containing only text and no child elements. The v elements and their content account for about 3.2MB of the file's total 3.3MB. In the rest of this article I present the code and results.



I'm working on a Dell Inspiron 8600 notebook with 2GB RAM. It's a Centrino 1.7GHz, which is about equivalent to a P4-3GHz (modulo the equally wacky world of CPU benchmarks). The OS is Fedora Core 3 Linux, and I've tuned DMA and the like. I'm running Python 2.3.2. The following are my pystone results:



$ python /home/uogbuji/lib/lib/python2.3/test/pystone.py
Pystone(1.1) time for 50000 passes = 2.99
This machine benchmarks at 16722.4 pystones/second


I ran each case 5 times and recorded the high and low run times, according to the UNIX time command. I understand very well that this is not quite statistically thorough, but it's well ahead of all the other such benchmarks I've seen in terms of reproducibility (I present all my code) and usefulness (this is a real-world use case for XML processing).



First up: plain old PySAX. Forget the performance characteristics for a moment: this code was just a pain in the arse to write.



from xml import sax

class OtHandler(sax.ContentHandler):
    def __init__(self):
        #Yes, all this rigmarole *is* required, otherwise
        #you could miss The word "begat" split across
        #multiple SAX events
        self.verse = None
        return

    def startElementNS(self, (ns, local), qname, attrs):
        if local == u'v':
            self.verse = u''
        return

    def endElementNS(self, name, qname):
        if (self.verse is not None
                and self.verse.find(u'begat') != -1):
            print self.verse
        self.verse = None
        return

    def characters(self, text):
        if self.verse is not None:
            #Yeah yeah, probably a tad faster to use the
            #''.join(fragment_list) trick, but not worth
            #the complication with these small verse chunks
            self.verse += text
        return

handler = OtHandler()
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")


I get numbers ranging from 2.32 - 3.97 seconds.



Next up is PySAX using a filter to normalize text events, and thus simplify the SAX code a great deal. The filter, amara.saxtools.normalize_text_filter, is basically the one I posted here, with some improvements. The resulting code is much less painful than the PySAX example above, but it still demonstrates why SAX turns off people used to Python's simplicity.
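

Conceptually, such a filter just buffers character events and re-emits each run of text as a single event at element boundaries. Here is a simplified sketch of the idea (not the actual amara.saxtools implementation):


from xml.sax import saxutils

class text_normalizer_sketch(saxutils.XMLFilterBase):
    #Buffer characters() events and forward them as a single event
    #whenever an element boundary (or the end of the document) is hit
    def __init__(self, parent=None):
        saxutils.XMLFilterBase.__init__(self, parent)
        self._buffer = u''

    def _flush(self):
        if self._buffer:
            saxutils.XMLFilterBase.characters(self, self._buffer)
            self._buffer = u''

    def characters(self, text):
        self._buffer += text

    def startElementNS(self, name, qname, attrs):
        self._flush()
        saxutils.XMLFilterBase.startElementNS(self, name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        saxutils.XMLFilterBase.endElementNS(self, name, qname)

    def endDocument(self):
        self._flush()
        saxutils.XMLFilterBase.endDocument(self)


The benchmark itself then just plugs in the real Amara filter: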



from xml import sax
from amara import saxtools

class OtHandler(sax.ContentHandler):
    def characters(self, text):
        if text.find(u'begat') != -1:
            print text
        return

handler = OtHandler()
parser = sax.make_parser()
normal_parser = saxtools.normalize_text_filter(parser)
normal_parser.setContentHandler(handler)
normal_parser.setFeature(sax.handler.feature_namespaces, 1)
normal_parser.parse("ot.xml")


I get numbers ranging from 2.66 - 4.88 seconds.



Next up is Amara pushdom, which tries to combine some of the performance advantages of SAX with the (relative) ease of DOM.



from amara import domtools

for docfrag in domtools.pushdom(u'v', source='ot.xml'):
    text = docfrag.childNodes[0].firstChild.data
    if text.find(u'begat') != -1:
        print text


I get numbers ranging from 5.83 - 7.11 seconds.



Next up is Amara pushbind, which tries to combine some of the performance advantages of SAX with the most Pythonic (and thus easy) API I can imagine.



from amara import binderytools

for v in binderytools.pushbind(u'v', source='ot.xml'):
    text = unicode(v)
    if text.find(u'begat') != -1:
        print text


I get numbers ranging from 10.46 - 11.40 seconds.



Next up is Amara bindery chunker, which is the basis of pushbind.



from xml import sax
from amara import binderytools

def handle_chunk(docfrag):
    text = unicode(docfrag.v)
    if text.find(u'begat') != -1:
        print text

xpatterns = 'v'
handler = binderytools.saxbind_chunker(xpatterns=xpatterns,
                                       chunk_consumer=handle_chunk
                                       )
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")


I get numbers ranging from 9.44 - 10.27 seconds.



Finally, I look at /F's cElementTree.



import cElementTree as ElementTree

tree = ElementTree.parse("ot.xml")
for v in tree.findall("//v"):
    text = v.text
    if text.find(u'begat') != -1:
        print text


I get numbers ranging from 1.53 - 3.18 seconds.



So what do I conclude from these numbers? As I've said before, the speed of cElementTree amazes me, but its advantage in the real world is nowhere near as dramatic as /F's benchmarks claim. More relevant to my own vanity, Amara 0.9.3's disadvantage in the real world is nowhere near as dramatic as /F's benchmarks claim. IMHO, it's close enough in performance to all the other options, and offers so many advantages in areas besides performance, that it's a very respectable alternative to any Python/XML library out there.



But the point of this exercise goes far beyond all that. We really need to clean up our act in what is a very strange political battleground in the Python/XML space. If we've decided that MIPS wars are what we're going to be all about in development, then let's benchmark properly. Let's gather some real-world use-cases and normalized test conditions. Let's make sure all our benchmarks are transparent (at least release all the code used), and let's put some statistical rigor behind them (not an easy thing to do, and not something I claim to have done in this article). Let's do all this as a community.
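

To make that concrete, here is a minimal sketch of the sort of transparent, in-process harness I have in mind (the file name, repeat count and the single cElementTree case are just placeholders; the same pattern applies to any of the libraries above):


import cElementTree as ElementTree
from timeit import Timer

def begat_cet():
    #The work under test: parse and scan. Collect rather than print,
    #so that console I/O stays out of the measurement.
    tree = ElementTree.parse("ot.xml")
    return [v.text for v in tree.findall("//v")
            if v.text.find(u'begat') != -1]

#timeit keeps interpreter start-up and module imports out of the timing;
#note that it also disables garbage collection while timing, which is
#worth remembering for memory-hungry parses
timer = Timer('begat_cet()', 'from __main__ import begat_cet')
runs = [timer.timeit(1) for dummy in range(5)]
print 'cElementTree: best %.3fs, worst %.3fs' % (min(runs), max(runs))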



While we're at it, I'd like to repeat my call for test case diversity from my PyRXP article: [R]un the tests on a variety of hardware and operating systems, and [don't] focus on a single XML file, but rather examine a variety of XML files. Numerous characteristics of XML files can affect parsing and processing speed, including:





  • The preponderance of elements versus attributes versus text (and
    even comments and processing instructions)


  • Any repetition of element or attribute names, values and text content

  • The distribution of white space


  • The character encoding

  • The use of character and general entities

  • The input source (in-memory, string, file, URL, etc.)



And if we're not willing to do things rightly, let's stop deceiving users with meaningless benchmarks.




What real-world conditions would you like to see represented in respectable Python/XML benchmarks?


34 Comments

effbot
2005-01-24 03:22:31
Wow.
"I ran each case 5 times and recorded the high and low run times, according to the UNIX time command"


You're benchmarking subsecond operations by timing the entire Python process? Priceless.

faassen
2005-01-24 08:34:35
deceptive benchmarks
The benchmark in this article is indeed rather deceptive, as the effbot said.


You definitely want to use time.time() or something like that *inside* the program to avoid measuring the Python startup and shutdown time, which is hardly relevant, unless you build web applications using CGI scripts. :)


That said, it is a good idea to communicate our benchmarking strategy. What I did when I was curious what Fredrik used was just mail him and ask him. I've been using the same strategy as a result. I get numbers slightly different from his, though not drastically. It's likely due to platform/compiler differences (I'm on Linux, he's on Windows). See my weblog for some numbers:


http://faassen.n--tree.net/blog

effbot
2005-01-24 12:09:20
deceptive benchmarks
I posted a rather long description of my test approach, and the issues involved in benchmarking extremely fast Python extensions that consume large amounts of memory, to the xml-sig in the middle of January. Unlike Uche, I know what I'm measuring.


(fwiw, the virtual debunking team currently suspects that Uche has made most or all of the following mistakes: included Python startup and shutdown times in his figures, included module load times in his figures (cET 0.9 can parse OT.XML nine times in the time it takes Python to load Amara's bindtools component), sent output to a terminal instead of a file or /dev/null, used non-idiomatic solutions for the SAX and cET samples (for cET, Uche's code is 40% slower than the most obvious solution), and, quite possibly, used an unreleased version of the underlying cDomlette library, which is reportedly 3-4 times faster than the current release. And yes, the pystone figures don't seem to match his hardware description, either. This article should be archived in the "whole bloody breakfast on my face" category, and replaced with an apology.)

faassen
2005-01-24 14:51:55
deceptive benchmarks
It's indeed a surprisingly slow pystone rating for a dell inspiron 8600, 1700 megahertz. I'm on one now (with only 512 megabytes of RAM) and I get the following:


Pystone(1.1) time for 50000 passes = 1.3
This machine benchmarks at 38461.5 pystones/second


perhaps some CPU scaling was going on on Uche's machine, so it wasn't running at the full 1700 mhz?

effbot
2005-01-24 15:08:22
deceptive benchmarks
38461 pystones/s matches the observation made here (37594 pystones/s). weird.


but of course, running the Amara tests at a higher clockspeed, and the pystone/sax/cet tests at a lower speed, might also explain the 3X slowdown.

uche
2005-01-24 17:06:26
"There's a Riot Goin' On"
Sly Stone said it.


Sure, I can accept that writing a proper test harness within Python is a better way to time it. And how useful is it that we can even have this discussion, since I didn't make a mystery of my benchmarking technique. And any adjustments are easy, since I didn't make a mystery of my code.


The ensuing discussion is somewhat along the lines I suggested in the article (and it has such color and character to go with it). But crucially, the color interferes with any understanding that there is a lot more to test (I gave examples), and many more ways to test it before we have MIPS-wars quality benchmarks.


All the effbot bluster in the world does not change the fact that benchmarking requires transparency, which has been completely missing from the Python/XML gorilla match until today. And it doesn't change the fact that his benchmarks are useless, essentially measuring conditions completely alien to anyone's actual use.


So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.

David_Mertz
2005-01-24 19:47:49
Trying my own tool (Gnosis Utils)

I always so like the breath of fresh air Uche brings to most topics. His benchmark examples are nicely down to earth (I would point out that I always do almost exactly the same thing--including full code--when I benchmark tools in my articles).


Anyway, with no real a priori sense of how it would come out, I decided to try gnosis.xml.objectify in the mix. I like my API best and all :-).


First, the script used:


$ cat time_xo.py
from gnosis.xml.objectify import make_instance, walk_xo, tagname
ot = make_instance('ot/ot.xml')
for node in walk_xo(ot):
    if tagname(node) == 'v' and 'begat' in node.PCDATA:
        print node.PCDATA


I don't use the gnosis.xml.objectify.utils.XPath() function here, though I could. That's because I don't really believe XPath is entirely Pythonic.


The timings are quite consistent between five runs:



$ time python2.3 time_xo.py > verses


real 0m7.200s
user 0m5.790s
sys 0m0.350s


Oh... I run on a quite different architecture than Uche, but the Pystone on my Powerbook is just about the same as Uche's:



$ uname -a
Darwin gnosis-powerbook.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov 7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC Power Macintosh powerpc
$ python /sw/lib/python2.3/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.04
This machine benchmarks at 16447.4 pystones/second
PJE
2005-01-24 19:51:30
Python's startup time
...is definitely not relevant to a benchmark, unless you're trying to compare Python to some other solution.


At the very least, you should break out the statistics into startup time, module import time, and actual run time of whatever function represents the program's functionality. Avoiding console output would be a good idea too.
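

Roughly like this, say (a quick sketch; the library and file are just examples, and I'm not vouching for the XML specifics):


import time
t0 = time.time()

import cElementTree as ElementTree    #whichever library is under test
t1 = time.time()

def work():
    tree = ElementTree.parse("ot.xml")
    return [v.text for v in tree.findall("//v")
            if v.text.find(u'begat') != -1]

t2 = time.time()
result = work()
t3 = time.time()

print "module import time: %.3fs" % (t1 - t0)
print "actual run time:    %.3fs" % (t3 - t2)
#whatever the external `time` total exceeds the sum of these by is
#roughly interpreter startup (and shutdown) overhead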


These things are pretty basic to benchmarking any tool for any language -- and I don't have any axe to grind about these tools; I actively try to avoid XML as much as possible anyway. :)

David_Mertz
2005-01-24 20:03:21
Trying my own tool (Gnosis Utils)
I am thinking, BTW, of wrapping cElementTree.iterparse() into another gnosis.xml.objectify parser. Currently, there's a painfully slow DOM parser, and a reasonably fast EXPAT parser in there. But the design makes it easy to plug in something else. I have vaguely wanted to create an RXPU parser too (and more recently LXML)... but if I think cElementTree is even faster, I might just do that.


I like my (more Pythonic) API better than that in ElementTree, but if I get speed, why not take advantage of /F's underlying work? Of course, there're a zillion things I want to get around to, so it's not quite a promise.

faassen
2005-01-25 03:07:14
"There's a Riot Goin' On"
I wouldn't say the speed of parsing XML into a Pythonic datastructure is completely alien to people's use. It can be done a lot more slowly, as has been shown in the past over and over, and cElementTree can do it very quickly.


That means we can now be far less concerned with parsing overhead. Since the structure is already Python-style, the overhead of ElementTree API calls can then be minimal, as is shown by the fast performance of the find operation in ElementTree. Non-C ElementTree find() sometimes can even beat libxml2 XPath, which is implemented in C.


lxml.etree can do a parse very quickly too, using the underlying libxml2 library. Unfortunately it isn't "done" yet: if you want to use the ElementTree API, there are Python proxies to be produced while the user accesses the XML. This has been made fairly fast by now, but it still lags behind ElementTree. For libxml2's native XPath this proxy overhead is far less, and you can get down to business right away.

If you want to know how I know all this, see my blog for a lot of benchmarking over the last couple of weeks. I didn't have a 'begat' test yet, but I did test a simple //v test, as Uche did in an earlier article.

AaronWatters
2005-01-25 08:50:46
Wow.
Hmmm... maybe benchmarks of this ilk should go
through some sort of peer review before they are
widely publicized? Just a thought.
ialbert
2005-01-25 08:55:58
"There's a Riot Goin' On"
So effbot has useless benchmarks, and argues that I also now have useless benchmarks. Nowhere to go from there but up.



What kind of excuse is that?


You're the one that brought up the whole thing, yet it seems that you have done a worse job at benchmarking than others. Very ironic.


I think your benchmarking method is very ad-hoc and you'd be better served if you fixed the glaring errors and posted an updated version of your findings.


I'm getting incomparably better results with cElementTree (running the same program as you do, but I'm benchmarking it with timeit, around 0.25 seconds/run) on a similar laptop. Could not test your framework since your FTP system is down.

DavidAscher
2005-01-25 13:30:25
He lives!
Aaron lives! Wonderful news, in this sea of vitriol...
oreillyuser
2005-01-25 13:45:13
"There's a Riot Goin' On"
"What kind of excuse is that?"


Great non-point, Istvan Albert.


There are two points I believe Uche made that have not been addressed despite all of Fredrik Lundh's (effbot) blustering here, on his blog, and on his pythonware daily site. One is that Fredrik's benchmark is pretty useless because it just loads an XML file into a data structure but does nothing significant with it. Two is that Fredrik's useless benchmarks give the misleading impression that some other XML tools are horribly slower than they really are, when really most of the XML tools are quite comparable to one another speed-wise, and some of them are even better when you consider other issues like how easy they are to use. And really, since this is Python, ease of use is of primary importance. cElementTree may or may not be the fastest, but I don't believe it is the easiest to use or install.

effbot
2005-01-25 15:02:30
"There's a Riot Goin' On"
So the only supporter Uche can bring up posts anonymously, repeats Uche's nonsense, and uses exactly the same words, style and phrasing as Uche himself. Cute.


(as for your so-called arguments, some hints: for three processes that run in sequence, the total time is A+B+C, not max(A, B, C). if you set A to zero, the total will drop. second, how hard is it to "click on installer" or type "python setup.py install". thousands of people have already done it. I'm sure you can do it too, if you try. feel free to mail me if you need help.)

effbot
2005-01-25 15:08:00
Trying my own tool (Gnosis Utils)
david, it's "different", not "more". classes fit some kind of problems really well, but not all of them. and for some problems, your current code doesn't work at all (hint: namespaces).


and of course you should take advantage of the stuff I'm doing. it may not save you that much, since you still have to create all the objects over at the python side of things, but it's worth trying. drop me a line if you have questions.

effbot
2005-01-25 15:29:35
Wow.
Benchmarking individual components really isn't a black art. Just make sure you eliminate all irrelevant stuff from the measurements, use the best timing device you have access to, run the tests multiple times, and pick the best observed time (unless the object you're studying involves random elements). Make sure your math and your logic is sound; if A and B are large numbers, A+B doesn't equal max(A, B). If you're measuring really fast things, you need to be more careful. If you want prior art, study available tools (such as timeit). If anything I just said is news to you, don't do benchmarks. No need to be stupid. Being stupid only makes you look like a fool.
uche
2005-01-25 18:49:06
So effbot goes from crybaby to libelous cretin...
I really hate to follow this ugly race all the way to the bottom, but a couple of Fredrik's posts here have gone far beyond the pale of civility, and I'm not going to put up with it.


First of all, he insinuates that I ran my tests on Amara with my CPU clock speed set higher than when I ran my tests with cElementTree. That was bad (and stupid) enough. Now he implies that I logged in here and anonymously posted a note supporting my point.


This is not the sort of gross libel that I shall dignify with any response other than that I'll have no dealings with Fredrik anymore, directly or indirectly until he apologizes for his crude and infantile insinuations. And it's probably best if I don't ever run into him in person again...


I thought he was a feckless crybaby when he wanted me to apologize for my original post, but nothing in that post rises to the level of libel, and I think that Fredrik has now revealed more of his own character than he might have wanted the world to know.

David_Mertz
2005-01-26 00:16:11
Trying my own tool (Gnosis Utils)
Fredrik is, of course, just simply wrong about namespaces in gnosis.xml.objectify. But I admit that I haven't documented the "enable namespaces" switch as well as I should have. I inherit the issue from the pyexpat code, but I should make the stuff work a bit more easily.


Then again, from what I can infer (from some private email), Fredrik also seems to believe that gnosis.xml.objectify only uses the original DOM parser, rather than also (and now by default) using the EXPAT style that it's had since before the first version of ElementTree was created.


Unfortunately, /F's demeanor and tact have taken a big turn for the worse.

faassen
2005-01-26 03:29:44
So effbot goes from crybaby to libelous cretin...
Let's all calm down a bit. I think both of you make valid points, and both of you feel your egos are being trodden on by the other at this stage.


Fredrik has gone rather overboard in the flaming and insinuations, I agree, but Fredrik is in my opinion correct that this article harms far more than it helps. It is rather supremely ironic that your article is called "The Python community has too many deceptive XML benchmarks" and that the benchmarks in it then fall apart miserably at minimal prodding.


You claim you want to improve the benchmarking procedure of XML libraries. Let's do that. What *is* up with the following:


* the absurdly low Pystone rating on a 1700 mhz centrino?


* why is Amara such a speed-outlier? Why couldn't Fredrik reproduce this?


* did you send output of the programs to the terminal or a file? If you did send it to the terminal, won't that skew the best performing libraries down unfairly?


* Isn't measuring whole-program running time for a sub-second measurement rather absurd? If you do that, won't it skew the best performing libraries down unfairly?


Let's figure out what's going on with this. We need to do this in order to get sane benchmarks.


Let's also put to rest the notion that parsing speed doesn't matter anyway and is not a useful benchmark. If what you are parsing produces a Pythonic datastructure, which happens in the case of cElementTree, then it *does* matter. How much is up for debate, but it does matter and it's a useful benchmark.


That said, obviously your test to find all verses that say begat is *also* a useful benchmark. It just happens that ElementTree is good at this too, because of the aforementioned parsing into Pythonic datastructures. For instance, lxml.etree, which does not parse into Pythonic datastructures but is written mostly in C, can only beat it if it goes down to C completely and expresses it as an xpath expression.


Concerning benchmarks, even though Fredrik didn't publish his benchmarking procedure widely at the time I started to try to replicate them, and I think that's a mistake, it wasn't that hard to replicate them either. Later on he posted what he was doing to a mailing list. For what it's worth, I *can* replicate Fredrik's benchmarks concerning memory almost perfectly, and fairly well for performance figures (the difference there is likely due to platform differences).


And Fredrik, can you please shut up with the insinuations? Let's not blame on evil what we can blame on stupidity (sorry, Uche :) I guess Fredrik cannot believe some mistakes were made as he thinks you're too smart for it. That harms even more than Uche's article. It doesn't help your believability either, and distracts the debate from what it should be about.

RobertKern
2005-01-26 04:09:57
Where angels fear to tread
I know I'm going to regret getting in the middle of this.


Let me say this first: I have no investment in whose XML tool is the fastest or easiest to use or more compliant or whatever other standard you choose to apply. Like Phillip, I avoid XML where possible.


I also want to say that I am incredibly disappointed in the extreme lack of maturity some of you are displaying. It's as if you're looking to get offended by the other guys. This is not how adult communities behave.


No, I take that back: this is how adult communities behave all too often. But it's not how they should behave, and not how I've come to expect the Python community to behave.


Now, this article has, I think, a few valid points. First, I think that benchmark results should, wherever possible, come with the code and data that generated them, especially when they are part of the announcement of the package being benchmarked.


Second, I think a number of different kinds of benchmarks, performed with a variety of packages, is an important thing to have. I don't think that benchmark results that don't have this kind of breadth are worthless, though.


Third, Uche correctly points out that benchmark numbers are not the only factor to consider in choosing a tool. However, no one, not even Fredrik, is disputing this.


The primary area where this article misses the mark is the claim that Fredrik was being deceptive. Fredrik did not post a deceptive benchmark. He posted an incomplete set of benchmark results: one that did not include the actual code he used to derive his numbers although he documented his procedure elsewhere. I would encourage Fredrik to post the benchmark code the next time he advertises cElementTree with benchmark numbers. Not only does it encourage confidence in the numbers, but it will also serve as a useful tool for others. Apparently, there are any number of people who don't know how to properly benchmark Python code and fewer (certainly not I before this incident) who know why the standard solution, timeit.py, is inadequate for these tools. People can see what the current "best practices" are for ElementTree for the operations timed and contribute benchmarks for the tools that were not included.


The article also makes the incorrect claim that what Fredrik benchmarks is useless. It's not. The parse time and memory used are important components to the whole XML-wielding program and should be measured. What's more, these are factors that are shared by pretty much every program; I may not need to find text or construct a tree or extract certain tags in my program, but I certainly need to read in the data. Now these aren't the only quantities that should be measured, but the measurement is not useless just because it's the only one offered there. And Fredrik certainly wasn't hiding the fact that that was all that he was timing.


The article also comes to the conclusion that the benchmarks offered by Fredrik were "deceptive" based on the evidence of Uche's own benchmarks, which yielded different numbers than Fredrik's. Following the article's logic, the goal was to measure something important that Fredrik's benchmarks didn't measure: find all tags with a certain text string. The benchmarks were done, the numbers were rather different than Fredrik's, and so he was being deceptive in posting his numbers. If the article's benchmarks were adequate measurements, this line of argument would make some sense. However, the article's benchmarking strategy does not accurately measure comparable times.


The fact that the article's benchmarks are open, with full code and documented timing strategy, does not change the fact that they are wrong. Furthermore, it does not change the fact that concluding that someone else is being deceptive because their results (accurately obtained) don't match up with your results (not accurately obtained) is wrong. Posting an article to O'Reilly falsely accusing someone else of deception instead of hashing it out in private or a semi-private forum like the XML-SIG is also wrong.


But you don't have to take my word for it. I redid the benchmarks from the article with a proper timing harness. The results from 5 runs of each package are given, in seconds, in the file timings.csv. The information about my system is given in comments. I couldn't run the saxtools version; I get an exception as documented. I also tried the Gnosis code that David Mertz posted, but Gnosis_Utils-1.1.1 doesn't seem to define one of the functions needed. I didn't implement the lxml version because I didn't feel like building it.


The results are broadly along the lines of what Fredrik posted. Say what you like about his attitude and his "bluster," the man doesn't lie with his benchmarks.

faassen
2005-01-26 04:32:07
So effbot goes from crybaby to libelous cretin...
My last paragraph was rather badly written, so I'll rewrite:


And Fredrik, can you please shut up with the insinuations? Let's not blame on evil what we can blame on stupidity (sorry, Uche :) I guess Fredrik cannot believe some mistakes were made as he thinks Uche's too smart for it. The insinuations harm even more than Uche's article. It doesn't help Fredrik's believability either, and distracts the debate from what it should be about.

ialbert
2005-01-26 06:26:06
So effbot goes from crybaby to libelous cretin...
If you can't answer (potentially incorrect) criticism in a civil manner, you're in the wrong business. If you can't take the heat, get out of the kitchen. You wrote a badly researched opinion and you got a lot of criticism back, some valid, some not. Concentrate on the valid problems, fix or answer them, and the rest will take care of itself.
Rerun the darn tests in the proper way. Don't include the Python startup time and don't write to standard output. Measure the actual XML processing time; even better, ask a few people to do it and average their results. It is pointless to claim impartiality when one of the frameworks is yours.
huh??
2005-01-26 07:34:04
"There's a Riot Goin' On"
"as for your so-called arguments, some hints: ..."


I don't know what kind of argumentation strategy you are trying now (ad hominem followed by red herring?), but that has zero to do with the two points I mentioned. I just want to see a more rigorous and open benchmark used, and to see how the different tools compare when it comes to ease of use. I'm sure your cElementTree is fast, but it doesn't look like the easiest or most Pythonic to use.

uche
2005-01-26 08:24:43
Closing the matter, for my part
First of all, I apologize to the community for my part in dragging this issue to the unfortunate depths to which it has come. I certainly think some fury on my part was justified by the disgraceful and unnecessary allegations of one individual, but I might have kept that fury a private matter with the offender.


My intent was always to provide code and discussion towards a useful set of benchmarks for the Python community, but clearly this has proved an area where no sensible conversation is possible. My code is still available in the article, and if anyone is interested, they can do with it what they will. I personally have too much real work on my hands to continue with a matter whose contentiousness so far outweighs its importance.

faassen
2005-01-26 09:01:12
Closing the matter, for my part
> My intent was always to provide code and
> discussion towards a useful set of benchmarks
> for the Python community, but clearly this has
> proved an area where no sensible conversation
> is possible.


You may not realize this, but this is rather offensive to me (and probably others). In my mind at least, I've been engaged in entirely sensible conversation about your benchmarks. I certainly believe sensible conversation is possible. If you believe my comments and suggestions are insensible, please point it out. You just declared them thus, after all.


Perhaps you just read my comments as part of a Fredrik-driven attack, or something, but I have been benchmarking XML the whole month now and I'm genuinely interested in improving the way we do benchmarks. I'm also curious about what went wrong with your particular attempt, and how to do it better next time.


I've asked a number of questions about this article, the benchmarks proposed in them and the benchmark results you get. There are some discrepancies which I'd like explained so we can avoid them in the future. I also think that the approach you take was not entirely correct (measuring Python startup time too, possibly printing to terminal), so I've been pointing that out too.


Instead of answering my comments and those of others, you've been focusing on Fredrik, whose benchmarks are of course what prompted you to write the whole article in the first place. It's no surprise Fredrik feels attacked, but a more civil response from him would've been more productive.


This leads to the question of whether you yourself are at all interested in calmly improving the Python XML benchmarking story. You've certainly been ignoring the civil attempts on my side to help do so.


It would be unfortunate if I have to go home with the conclusion that this article was only written because you felt threatened by Fredrik's benchmarks, instead of what I'd prefer to believe: that you want to improve the way we benchmark XML libraries in Python.

calcium
2005-01-26 20:40:31
Closing the matter, for my part
Wouldn't the whole thing be closer to resolution if the benchmark tests were re-run by a 3rd party? That way, the numbers would be indisputable...


I must admit the pystone numbers do sound a bit off...


If u guys send me your (non-viral) scripts, I'm happy to run them on my XP1600 and post the impartial results (though I must admit I own a copy of the O'Reilly book by Mr Lundh).


Easy solution, no?
Now to the middle east.....

faassen
2005-01-27 01:40:25
Closing the matter, for my part
Well, there's some debate (at least from my side :) about how such benchmarks ought to be run, but see Robert Kern's benchmarks in another posting.
faassen
2005-01-27 02:46:56
Closing the matter, for my part
(that's not to say I am debating Robert Kern's choices, which I haven't examined in detail, I'm debating Uche's)
huh??
2005-01-27 07:28:04
"There's a Riot Goin' On"
You later wrote: "... I just noted that someone was repeating uche's arguments using Uche's words, with very little additional processing. I expect people to do a little more research before spouting off..."


You must have not read what I wrote at all. Did you see the first sentence in my paragraph? "There are two points I believe Uche made that have not been addressed..."


And then you complain that I was repeating Uche's arguments??? I was repeating Uche's arguments because....I was repeating Uche's arguments! Which you still have not addressed. Are you for real?

Cito
2005-01-27 11:17:49
Calming down...
Sorry for interfering, but somehow I feel it is highly ironic that both parties can parse the whole Bible so quickly, but both parties have extreme difficulty applying Biblical standards in dealing with each other. Maybe both of you should slow down a little bit and invest some time in improving your EQ skills instead of your programming skills.


I really would like to see both parties calm down, admit mistakes (not only those of a technical nature) and reconcile. You are eminently respected in the Python community and are usually great contributors, and your reputation can only grow if you can bring yourselves to do this.


Concerning the factual issue, what do you think of the following benchmark? Anything wrong with it?


def TestElementTree(ElementTree):
    tree = ElementTree.parse(
        'religion.2.00.xml/ot/ot.xml')
    for text in (v.text for v in tree.findall("//v")
                 if v.text.find(u'begat') != -1):
        dummy = text


def TestAmara(binderytools):
    for text in (unicode(v) for v in
                 binderytools.pushbind(u'v',
                     source='religion.2.00.xml/ot/ot.xml')
                 if unicode(v).find(u'begat') != -1):
        dummy = text


from timeit import Timer


print "cElementTree:",
t = Timer('TestElementTree(ElementTree2)',
          'import cElementTree as ElementTree2;'
          'from __main__ import TestElementTree')
print t.timeit(100)/100


print "ElementTree:",
t = Timer('TestElementTree(ElementTree)',
          'from elementtree import ElementTree;'
          'from __main__ import TestElementTree')
print t.timeit(20)/20


print "Amara:",
t = Timer('TestAmara(binderytools)',
          'from amara import binderytools;'
          'from __main__ import TestAmara')
print t.timeit(3)/3

Cito
2005-01-27 11:52:49
Calming down...
Found some good advice in ot.xml:


"Just balances, just weights, a just ephah, and a just hin, shall ye have"


"A soft answer turneth away wrath: but grievous words stir up anger."


"The beginning of strife is as when one letteth out water: therefore leave off contention, before it be meddled with."


"A brother offended is harder to be won than a strong city: and their contentions are like the bars of a castle."


And here is one for me:


"He that passeth by, and meddleth with strife belonging not to him, is like one that taketh a dog by the ears."

Phillip.
2005-01-29 07:48:40
I like cElementTree
Several comments. First of all, I like the ElementTree syntax; it's very simple, and that's probably more the reason I plan on using it than the benchmarks.


Secondly, it's great that it's available in identical form in both Python and C versions. Not everyone can install C modules on their server, but it offers a massive performance boost for those who can.


Lastly, the performance overhead of reading in and parsing the file is a significant, if not the only, benchmark. It is one constant that everyone needs to be able to do, but you cannot tell what the user plans to do from that point onwards.


I had written a similar C module for PHP and was planning to port it to Python before discovering elementtree. I'll now abandon that idea as Fredrik's module is perfect for what I need. Well done Fredrik.


Phillip.