RubyForge vs CPAN

by Daniel Berger

It's often been said that Perl's greatest strength is CPAN, Perl's vast collection of free libraries contributed by developers from around the world. Recently I started to wonder about RubyForge and how RubyForge stacks up against CPAN in general.

26 Comments

raggi
2007-09-19 05:27:34
That's interesting. The ratios are, as you say, not as extreme as one might assume. :)
Carlo
2007-09-19 06:50:33
It's interesting to mention the Phalanx project alongside CPAN (http://qa.perl.org/phalanx/). Too many packages doesn't mean "high quality" (or at least a decent level of it...). Surely it's a sign (I read it this way) that there are TOO many packages, and that makes it difficult for the developer to choose when to use package A instead of package B.... On RubyForge the situation is rather more "panic-free".
pmccann
2007-09-19 07:16:23

While I realise that this isn't meant to be anything akin to scientific, I think you're looking through rose coloured glasses somewhat: having recently "half-converted" to ruby from perl it's become abundantly clear that the maturity of libraries on the ruby side is considerably less than their perl counterparts. The standard of libraries like lwp, DBI and friends, HTML parsers, POD tools, LDAP, PDF production, Date and Time parsing, XML (off the top of my head) on CPAN is just astonishingly high. There *are* ruby counterparts, but the difference in the depth and polish of the implementations is fairly marked.


In addition, the infrastructure that accompanies ruby is less reliable than CPAN's (and "cpan" the tool's) mirrored setup: way too many times an attempt to install a gem results in a seemingly random error, and the cure is to "try, try again".


Note: this is not meant to demean the ruby libraries, or those managing ruby gems and the like. Far from it: it's already good and is getting better rapidly. But CPAN and friends are still my golden standard in such matters. Yes, there's cruft there, and yes, there are fun/silly modules, but by and large it's pretty easy to skim the cream off the top.


Anonymous
2007-09-19 07:38:12
CPAN "users" are really just the registered maintainers in the Perl Authors Upload Server. Those are the people permitted to upload distributions. That does not include all of the people who are working on those projects---just those who others might call "release managers". If you aren't going to be the person uploading code, you don't need a PAUSE ID.


RubyForge is acquiring cruft too. There are plenty of projects that have had no activity in years, as well as several "demonstration" projects that aren't really re-usable code. It's not a bad thing, and it has nothing to do with the language. One of the reasons for CPAN's success is that it doesn't try to judge quality; it just encourages people to upload to get going.

Perrin Harkins
2007-09-19 08:04:55
On top of that, a healthy chunk of the Perl libraries on CPAN are either unnecessary in Ruby or contain behavior that’s already baked into Ruby itself.

I think you missed the mark here. Most of this functionality is quite a bit beyond Ruby's built-ins. For example, many of the OO modules provide implementations of alternative OO paradigms, or automate complicated parameter validation. And I don't see how you can claim that the Tie modules are built-in for Ruby. The hooks to do similar functionality may be, but these modules implement database manipulation, efficient line-based file scanning and updating, variable access control, etc. These things are not part of the Ruby language.

Daniel Berger
2007-09-19 08:53:05
Carlo, yes, sometimes too many libraries covering the same area can have a net negative effect. Two cases that come to mind on CPAN are the large numbers of Date/Time libraries and the 100+ Test libraries.


Regarding the former, I remember there was a debate a while back to settle on the one, true date-time library, but I don't know what settled out of that. Regarding the plethora of Test modules, I'm not really sure what the "standard" is these days, but I think there's an effort to settle on the TAP libraries.

Daniel Berger
2007-09-19 09:05:26
Anonymous, that's true, but then there are plenty of contributors to RubyForge projects that aren't officially associated with the project, so I think that's a wash.
Stevan
2007-09-19 10:24:39
@Daniel Berger


It should be noted that most of those "100+ Test modules" are not copy-cat modules but modules to provide specific testing functionality within the Test::More/Test::Builder framework (which supports TAP output). I think this only goes to show the maturity of Perl's testing tools/framework and not any kind of chaos or confusion.

Mark Thomas
2007-09-19 11:04:49
I think Tom did Ruby a great service by setting up Rubyforge but I think there is a lot of room for improvement. For one, I think Rubyforge needs an interface like search.cpan.org which I really like. Particularly the distribution overview pages (e.g. http://search.cpan.org/~adie/Test-Class-0.24/). Also, I think contributors should adopt a few good naming conventions and use namespaces more often, rather than coming up with clever names that start with "r". The project names that make the most sense always seem to be a port of a CPAN distribution of the same name. And finally, I think the project tree should be re-done. I realize these were probably copied from elsewhere (SourceForge perhaps) but some just plain don't make sense these days (A "finger" category? Browse by programming language?). I would say choose categories that better distribute the projects, so you don't end up with subcategories containing 5, 0, 7, 8, 1, and 689 projects, to use an actual example.
chromatic
2007-09-19 12:02:12
@Daniel,


... there are plenty of contributors to RubyForge projects that aren't officially associated with the project, so I think that's a wash.


That logic's silly. CPAN as you define it is a distribution system with an index. RubyForge is a collaborative development site. The relationship between the number of active accounts on each is the same as the ratio of leprechauns to unicorns.


CPAN is the way to distribute Perl libraries. I can think of only a handful not distributed there. RubyForge is one place to host Ruby projects (not necessarily libraries either, but applications). It's probably the most popular, but I'm not sure it's by any means the only one.

Daniel Berger
2007-09-19 12:41:28
Mark,


Yes, there are some interface issues that could stand to be improved. But that's another topic. :)

Daniel Berger
2007-09-19 12:59:24
pmccann,


I don't entirely disagree with your assertion about library quality, although I don't think you're going to find a lot of work going into POD parsers anytime soon. ;) And, in a few cases, I think the reverse is true.


Your point about infrastructure is well taken, however, and CPAN definitely has the edge there. There's been some work done in that area already (there are at least 1 or 2 mirrors now) but I'm not sure how rubygems (the library) handles automatically falling back to other mirrors, if at all, unless you explicitly point at it. I'd have to check, though.

Lyle Johnson
2007-09-19 13:59:01
I'm awfully curious about the 20,300 registered users on RubyForge when only 3635 of them are associated with a project. Did the rest of those people register so that they'd be able to file non-anonymous bug reports?
Daniel Berger
2007-09-19 14:12:50
Lyle,


That's a good question. My *guess* is that it's for the non-anonymous bug reports, mailing list subscriptions (is registration required for that? I can't remember), and automatic notification of package releases. It may have also become something you "just do" in Ruby-land now. :)

trans
2007-09-19 14:35:30
Unfortunately there are a number of inactive projects on Rubyforge. And there's no policy for addressing dead projects. They just sit there taking up namespace. It would be nice if at least projects were subdivided into two groups --active and inactive.


Rubyforge also acts as the official repository of RubyGems --which is the de facto standard of Ruby package distribution. It's a bit unfortunate, but if your project isn't hosted as a .gem on Rubyforge, it may as well not exist. And obviously that leads to a lot of potential name clashes. I wonder how well Rubyforge would cope with another 10,000 projects?

Adam Kennedy
2007-09-19 18:23:09
I won't bother to address the specific numbers you mention, since there's a mix of good and bad.


I would, however, suggest that you examine the definitions of the metrics you use. Because it's the definitions that make all the difference.


For example, CPAN doesn't really have a concept of "projects" at all. It's primarily only a store for libraries.


As others have mentioned, the "registered users" definitions you've used are probably incomparable metrics.


"Active Users" for CPAN would be "Current Release Manager". It's quite possible for people to "lose" projects off their list because someone took the module over.


Further to the projects issue, there's actually a separate issue in that CPAN is NOT particularly strong when it comes to storing Perl applications.


It's relatively rare to find people packaging up entire web applications (like, say Twiki, which isn't on CPAN) as a CPAN distribution.


This is especially true in some areas like desktop applications, for which CPAN doesn't hold anything much more complicated than App::GUI::Notepad (a proof of concept cross-platform Wx Notepad clone).


But certainly for libraries, CPAN is considered complete and authoritative.


Personally, I'd be interested to see some other metrics that might be somewhat more comparable.


For example, classes... We know there's about 30-35,000 classes/modules (assuming approximately 1 class per module) in those 12,000 distributions.


We also know that CPAN is about 20-25 million lines of code.


I think it would be fair to say that one SLOC of Perl is approximately equivalent to one SLOC of Ruby (compared to say C or Java), so measuring in those terms could at least be interesting.

Adam Kennedy
2007-09-19 18:36:39
Actually, now I ponder it a bit more, another great metric would be graph/network density.


The "real" power of CPAN is not just in its raw size, but in the mesh of interdependencies that link the libraries together.


The dependency graph is the logical heart of the CPAN, and is what allows a very small amount of code to link half a dozen other libraries together and add significant value.


The total dependency numbers between the two repositories would give some idea of this density, although the Perl numbers are probably going to be inflated to some degree by explicit dependencies on modules whose functionality comes with the Ruby core libraries, so there are large caveats even there.
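
As a rough sketch of what such a density figure could look like: the snippet below treats a repository as a directed graph (one node per distribution, one edge per declared dependency) and divides the number of edges by the maximum possible number. The distribution names and dependency lists are made up for illustration; real numbers would have to come from each repository's own index, and this is only one of several reasonable definitions of density.

```python
def graph_density(dependencies):
    """Directed-graph density: unique edges / (n * (n - 1)).

    `dependencies` maps a distribution name to the list of
    distributions it declares as dependencies.
    """
    # Collect every node, including ones that appear only as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # Count each dependency edge once and ignore self-dependencies.
    edges = {
        (src, dst)
        for src, deps in dependencies.items()
        for dst in deps
        if src != dst
    }

    n = len(nodes)
    if n < 2:
        return 0.0  # density is undefined for fewer than two nodes; report 0
    return len(edges) / (n * (n - 1))


# Hypothetical toy repository, not real CPAN or RubyForge data.
toy_repo = {
    "App-Example": ["Net-Example", "Data-Example"],
    "Net-Example": ["Data-Example"],
    "Data-Example": [],
    "Standalone-Example": [],
}

print(f"{graph_density(toy_repo):.3f}")
```

With four distributions and three dependency edges, the toy repository comes out at 3 / 12 = 0.25. Comparing the two repositories this way would still be subject to the caveat above about dependencies that one language ships in its core libraries.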


Anonymous
2007-09-19 19:37:05
No one working on POD parsers? Perhaps you haven't been watching Damian Conway's work.
chromatic
2007-09-19 22:04:37
@Anonymous,


I believe Daniel meant POD parsers for Ruby.

donger
2007-09-19 23:58:32
the older you get, the more warts you have :)
Aaron Trevena
2007-09-20 02:30:16
@Daniel,


This is interesting - rubyforge/gems is better than I expected, but it's still no CPAN, both by accident and design. CPAN is much more than the sum of its parts - particularly when you include automated smoke and kwalitee testing for uploaded packages.


Anyway - onto more specific stuff..


The reason we have so many test modules is that we have a good basic framework that allows you to create tests, and a culture of testing even in awkward, hard to reach edges. This means we actually achieve good code reuse ( a rare feat, even in open source / free software ) *and* have a wide range of good tests.


A case in point being my own test module for an interesting edge case : Mocking a database transparently for Class::DBI (one of Perl's older and simpler ORMs, about equivalent to ActiveRecord).


Also, the Dates/Times and Email modules have mostly sorted themselves out - you can use an obscure library if you wish, but 2 or 3 Date/Time packages are king of the hill, and the Perl Email Project has successfully simplified Email related tasks.

Aaron Trevena
2007-09-21 06:11:04
@Daniel,


Is it just me, or if the rate of growth of rubyforge vs cpan is lower, then surely rubyforge isn't actually 'catching up fast' at all?


Catching up to where cpan was when rubyforge was first thought of perhaps, but to catch up you would surely have to be growing faster?


Btw - multiple releases in one day happen because a heavily used or particularly well supported package can have automated tests fail, or users report and fix problems in < 24 hours, or there could be some packaging problem outside of the code, e.g. a dependency version leading to problems or whatever.


I don't see stuff being fixed and released on the same day as a negative, it's a sign of how dynamic the perl community around CPAN can be.

Aaron Trevena
2007-09-21 06:20:26
Sorry, one last thing..


As others have said, and you don't seem to have corrected anywhere :


The code quality of cpan, despite the accumulated belly-button-lint of more than a decade, is likely to be higher rather than lower than on rubyforge. The maturity of the code, the tools, and much of the community means that quality is not only a goal, but something that's part of CPAN.


And you can't complain about there being too many modules, when often any of those available for a given problem is likely to not exist on rubyforge or not provide the same features/quality/documentation - I'd sooner have a choice of 5 perl modules, 3 of which are good, one of which is excellent, and one which is rubbish, than a choice of none, or one that isn't that good on rubyforge.

Steven
2007-09-21 11:30:43
I think there's a formula for language popularity that goes something like this: Build a HUGE "standard library", and everybody will love you for doing their work for them. Sun obviously knew this when they threw piles of developers at building class libraries. Perl owes a lot of its popularity to CPAN, but also to O'Reilly's "Hi, I'm Larry Wall, creator of Perl: The Language of the Web" TV spots!


CPAN is really an excellent resource, but Ruby will never duplicate it and here's why. Once the initial infrastructure is there, the recipe for building a CPAN goes:


  1. Publicly ridicule anyone who asks a question about the "old" version of the language (e.g. Perl 4),

  2. Answer every announcement for an interesting new program with "You should make a module instead!",

  3. Refuse contributions with cute names that don't fit the hierarchical standard.


Now don't misunderstand me, CPAN is a great accomplishment, and I happen to like standardized, hierarchical naming conventions. I'm not trying to fault the recipe (well, maybe (1), that was a really annoying time to live through), I just claim that the recipe is incompatible with the "Matz is nice so we are nice" Ruby community. The few members who have the chops to openly ridicule others' works also have a strong affinity for overly cute names.

whyme
2007-09-26 21:32:24
Dear advocate:
1. Why conduct statistical analysis on the repositories? It doesn't mean anything; I think it is better to dig into CPAN and compare similar modules from Ruby and Perl, down to the feature level. A simple statement such as:


... CPAN still has the edge in database interfaces, Apache libraries and wrappers for 3rd party commercial libraries, among a few other things ...


rings really hollow. We know Perl is better since Perl started first (come to think of it, that is one of the main reasons why I picked Perl in the first place). Identify the modules which people would like to have first and work on them (not necessarily porting them, since I know tons of sucky PM).


2. I may be wrong, but I am pretty sure there is missing functionality in Perl modules; identify it and the hordes will be more than happy to cover it.


3. Don't you gloat: "We don't need no stinking Tie". Some guys just like wearing tie. It makes you look nice and trim, although it means choking O2 supply to the brain.


Let's get back to work. We got lots of holes to cover, bigger fish to fry and bugs to kill.


2007-10-06 20:34:26
Very nice article Daniel.


Keep this fresh mind!