Why only REXML?

by Eric Larson

Bill recently commented on another small flare-up on the REXML front. It is too bad that Ruby doesn't have a better set of libraries for XML. As Bill mentions, Python does a great job with XML. He mentions ElementTree, which is definitely better than something like pure DOM. Another option is lxml, which implements the ElementTree API and includes some pretty slick objectify functionality. Ian recently ran some rather unscientific, but still interesting, benchmarks on Python libraries for parsing HTML, and found lxml to be quite the performer. There are also the 4Suite and Amara toolsets, which together provide a very comprehensive suite of XML tools, including an entire XML/RDF-based document repository and a full-featured XSLT engine.
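To give a feel for why the ElementTree style reads better than pure DOM, here is a minimal sketch using the xml.etree.ElementTree module from Python's standard library. The sample document, the NS prefix map, and the variable names are made up for illustration; the snippet only shows namespace-aware lookups and mixed content and is not a benchmark or a recommendation of one library over another.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: an Atom-like feed with an XHTML element
# embedded in mixed content.
DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Why only REXML?</title>
    <summary>REXML is <b xmlns="http://www.w3.org/1999/xhtml">still</b> a great tool.</summary>
  </entry>
</feed>"""

# Prefixes here are local to the queries; they need not match the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

root = ET.fromstring(DOC)

# Namespace-aware lookup with a simple path expression.
title = root.find("atom:entry/atom:title", NS).text

# Mixed content: text before the child lives in .text, text after it in .tail.
summary = root.find("atom:entry/atom:summary", NS)
bold = summary.find("xhtml:b", NS)
mixed = summary.text + bold.text + bold.tail

print(title)
print(mixed)
```

Since lxml implements the same API, the same calls should carry over by swapping the import for lxml.etree, assuming lxml is installed.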

It makes me wonder why the Ruby community has not stepped up with some better options. The Python community is very similar in that XML has never been a hallmark of it the way it has been for Java or .NET. One explanation could simply be time, since Python has been around a bit longer. No matter the reason, I think it is time for the Ruby community to consider stepping up and producing a healthy alternative to REXML. My first step would be to start with the libxml bindings and go from there. Lxml and Amara have both shown that using a fast C library for the grunt work pays off in the end.

Lastly, I want to make it clear that REXML is still a pretty great tool. It meets the needs of many of its users, which is more than many software projects seem to accomplish. With that in mind, let's not stop there when we can do even better to make Ruby a great language for working with XML.


7 Comments

M. David Peterson
2008-04-02 13:38:09
Eric Larsen's on XML.com! *ROCK ON*! :D


Welcome, Eric! :D

M. David Peterson
2008-04-02 13:39:06
oops!


s/Larsen's/Larson's

Sylvain Hellegouarch
2008-04-02 14:36:31
dude what's that face mate? :D

matt
2008-04-05 20:44:12
have you looked at hpricot? It's not "feature complete"; however, it's worked in the majority of cases I've played with. Performance-wise it kicks the llama's ass compared to rexml.

Eric Larson
2008-04-05 23:06:15
@Matt, actually I have looked at hpricot, and it is definitely a great option when working with HTML. Where it fell short for me was with namespaces and mixed content. I like how it takes a jQuery-esque approach, but without better support for namespaces, it doesn't quite cut it for me. Since CSS does support adding prefixes, maybe in the future hpricot can do the same.

Phlip
2008-04-08 11:30:27
uh, what's ruby-libxml? Chopped liver??


(Hpricot, while completely filling many special niches, is not a true XML package and does not intend to be...)

Eric Larson
2008-04-08 13:01:39
@Phlip,


I have been using ruby-libxml and it is far from chopped liver! My only thought is that it could be slightly more intuitive and Ruby-like. The Ruby libxslt bindings, for example, do an excellent job of providing a Ruby-like interface in how they handle adding extension functions to the processor object.


I mentioned this because most folks turn to REXML when presented with XML in Ruby. Personally, I use ruby-libxml.