Trust, But Verify

by chromatic

My first patch to Perl 5 was a quick and dirty tiny feature enhancement. It also broke a couple of tests. That small act of public humiliation reinforced what I already knew was a good practice: automated testing is an important part of creating software that works.

I spent a couple of years chasing two goals. First, to create tools of such quality and ease of use that there's no reason not to write good tests for Perl code. Second, to add tests so that we could immediately identify regressions and track them down to specific checkins.

Today, the core Perl 5 test suite (as of the most recent snapshot leading to Perl 5.8.9) has 121873 assertions. It could use more, but those tests cover the language and core libraries. Modern CPAN distributions are incomplete without tests written in the modern style, and test coverage and quality are topics of wider understanding and discussion.

I can't imagine relying on a piece of software that I can't verify with automated tests.

A recent weblog post from Christian Neukirchen, analyzing the JRuby test suite, stunned me. The most complete test suite for Ruby is the JRuby test suite, and it has only 2747 assertions.

I know Ruby's simpler than Perl, but it's not that much simpler.


2007-05-31 08:23:20
We are working on it :)

Hop over to if you want to help.

2007-05-31 08:46:31
What's your code coverage like?

A quick glance at Coverity's bug rate gives another point of reference.

- Paddy.

2007-05-31 17:01:30
More worrisome for me about Ruby is the paucity of formal specs. I found what I considered a bug in one of the date handling routines in the standard library and being community minded I tried to implement a fix. I started by checking for tests and found that there were none, so I thought that I would look for a spec which at the very least defined which date formats the routine attempted to handle so that I could write some tests. Unfortunately I couldn't find these either, and there was no response from the comp.lang.ruby list when I asked for them.

I don't want to rag on Ruby, because it's a good language, but I think that chromatic's point is well made here - it's got a way to go before it can play with the big boys in this regard. I don't remember the number of tests that are in Java's JCK, but I think it's probably even more than chromatic found for Perl, so these are probably the gold standards that Ruby has to aim for.

2007-05-31 22:27:26
@Matthew, that wasn't exactly my point. Perl 5 certainly doesn't have a formal specification (which has caused the LSB a bit of work lately), and neither does Python to my knowledge.

That said, I do think a comprehensive test suite is one sign of maturity. It's by no means the only sign of maturity. I just noticed the relative scarcity of tests in the Ruby source tree the other day and thought it might be interesting to compare numbers of tests.

2007-06-01 01:43:42
The Python Reference Manual is Python's specification (and also gives a note on how formal it is).

- Paddy.
Jack Diederich
2007-06-01 07:19:03
Python tests are hard to count because there are two main styles. grepping for 'assert' will count the JUnit style tests but not the doctest style ones. 'doctest' runs a copy of an interactive session and expects the output to match (same errors raised, same results of operations, etc).
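
To make the two styles concrete, here is a minimal sketch rather than anything taken from the Python source tree. The `halve` function, the `HalveTest` class, and the `count_doctest_examples` helper are hypothetical names invented for illustration. The function carries two doctest examples in its docstring (one normal result, one expected error), and the same behaviour is tested again in JUnit style with `unittest`. A grep for 'assert' would find only the `unittest` methods.

```python
import doctest
import unittest


def halve(n):
    """Return n divided by two, for even integers only.

    >>> halve(10)
    5
    >>> halve(3)
    Traceback (most recent call last):
        ...
    ValueError: 3 is not even
    """
    if n % 2:
        raise ValueError(f"{n} is not even")
    return n // 2


class HalveTest(unittest.TestCase):
    """JUnit-style tests: these are what a grep for 'assert' finds."""

    def test_even(self):
        self.assertEqual(halve(4), 2)

    def test_odd_raises(self):
        with self.assertRaises(ValueError):
            halve(3)


def count_doctest_examples(obj):
    """Count the interactive-session examples in obj's docstring."""
    parser = doctest.DocTestParser()
    return len(parser.get_examples(obj.__doc__ or ""))


def run_doctest_examples(obj, name):
    """Run obj's docstring examples; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(obj.__doc__ or "", {name: obj}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return tuple(runner.run(test))


if __name__ == "__main__":
    print("doctest examples:", count_doctest_examples(halve))
    print("doctest (failed, attempted):", run_doctest_examples(halve, "halve"))
    unittest.main()
```

Counting both styles therefore takes two passes: one over the assertion calls and one over the parsed docstring examples.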
Daniel Berger
2007-06-07 08:49:19
As we were taught to say in the military, "No excuse, sir!".

So, a while back I started my own test suite:

cvs -d login
cvs -d checkout ruby_test

It's up to about 4500 tests for core Ruby, though the number varies slightly depending on your platform and whether or not you're running as root (which affects the Process module tests). I don't bother publishing it because I probably average 2-3 commits a day with new tests, updates, enhancements, etc. Any release would be outdated within a couple days.

My initial motivation was nothing as grand as providing a formal spec, but as a sanity check when I wanted to make optimizations to the various core classes (or attempts at optimization anyway), and verify that I hadn't broken anything in the process. As a side benefit, I've been able to help out the JRuby project with bug reports.

One of the other things I've done (that Rubinius does as well) is provide benchmarks. I created these originally to compare Ruby 1.6.8 vs Ruby 1.8.x back when 1.8.0 was in a pre-release state. I also use them to check attempts at optimizations that I make locally (I usually fail or make things worse, btw). Lastly, I've found that high iteration testing is a good way to smoke out pathological cases and bugs that might not otherwise show up in a simple test suite. Typically, these are caused by char pointer mishandling.

So, anyway, you're right, we're behind. No question. As for what the number of tests should actually be, I estimate about 15,000-20,000 would cover core Ruby, with maybe another 5,000-8,000 for the stdlib.

Daniel Berger
2007-06-08 20:54:36
Update: the Ruby 1.8.6-p36 release contains close to 15,000 assertions when you run 'make test-all'. Interesting.
Daniel Berger
2007-06-08 21:12:18
Oops, nm, that included the library tests. Though, hey, the count for the core methods appears to have jumped to a whopping 1400! ;)
2007-06-08 21:12:28
@djb, when I skimmed the 1.8.6 tarball, I noticed that test-all runs module tests too. That's definitely a plus, but I'm sure you've noticed that debugging is easier when you have very specific tests of very specific features.
Bruce Van Allen
2007-06-08 21:35:45
I can't imagine relying on a piece of software that I can't verify with automated tests. ...
I also don't know how many tests it takes to have enough tests for a language or a language feature.

... formal specs

Daniel Berger:
As for what the number of tests should actually be, I estimate about 15,000-20,000 would cover ...

I wonder:
This discussion, abstracted somewhat, presages what might be a few people's CS Ph.D. work in the next few years, and maybe in several years we'll begin to really have a grasp on a new approach to programming language design. The relationships of the spec, the syntax, the tests, and so on will be more than a sense someone has of what's adequate or optimal. Instead we might see a higher order in which tests are not something separate, enforced by "best practices", but rather an inherent part of language design. Maybe. Students: get to work! Meanwhile, chromatic, thanks for an insightful contribution to building testing culture!

truth machine
2007-06-14 00:23:36
Much more significant is the number of reported bugs.