Adventures with R

by brian d foy

I ran across R recently, and this week over lunch I talked with an economist about statistical packages. Neither of us had tried R, though. It's GNU and it's free, unlike some other popular packages.

R has a Mac OS X package that installs quite nicely, and there are pre-compiled binaries for Linux and Windows too. The R community looks like it's stealing the best part of the TeX community, just like Perl did. TeX has the Comprehensive TeX Archive Network (CTAN) and Perl has the Comprehensive Perl Archive Network (CPAN), which are really just gussied-up FTP servers; the R community has the Comprehensive R Archive Network (CRAN). And, since I link to the Wikipedia entries for CTAN and CPAN, I created my first Wikipedia entry: CRAN.

The R project page has lots of pretty pictures and examples, but for the numbers nerds, here's a little taste:

I wanted to compare the occurrences of the words "wrong" and "right" in the perlfaq repository, mostly because it's Saturday and it's raining outside and I don't have any new Netflix movies to watch. I have the gory Perl details in my use.perl journal.

doc           wrong  right
perlfaq1.pod      0      4
perlfaq2.pod      0      4
perlfaq3.pod      1      8
perlfaq4.pod      4     12
perlfaq5.pod      5      3
perlfaq6.pod      6      6
perlfaq7.pod      4     11
perlfaq8.pod      2      5
perlfaq9.pod      1      3

Curiously, the distribution of "wrongs" across the nine perlfaq files looks roughly like a bell curve, although not quite symmetrical.

|           *
|         * *
|       * * * *
|       * * * *
|       * * * * *
|     * * * * * * *
  1 2 3 4 5 6 7 8 9

I want to get the standard deviation, not because it's useful but more because that's what I'm used to doing when I see a chart like that. I could plug the numbers into one of my fancy calculators, but then I wouldn't get to play with R.

It's really easy. Scary easy compared to the stuff I had to deal with way back when I was in college and writing my own statistical packages so I wouldn't have to use the existing ones. I take the numbers from my chart and put them into R, then calculate the numbers I want.

albook_brian[791]$ R

R : Copyright 2004, The R Foundation for Statistical Computing
Version 2.0.0 (2004-10-04), ISBN 3-900051-07-0

> freq <- c( 1,4,5,6,4,2,1 )
> mean(freq)
[1] 3.285714
> median(freq)
[1] 4
> var(freq)
[1] 3.904762
> sd(freq)
[1] 1.976047
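
For anyone who wants to repeat the session without retyping the numbers, here is a minimal sketch that starts from the table of counts instead. The data frame and the names `faq` and `stats` are assumptions made for illustration, not anything from the original session. Note that the `freq` vector above holds only the seven non-zero "wrong" counts, so the sketch filters out the two zeros to match it.

```r
# Counts copied from the table above; the data frame layout and the
# names `faq` and `stats` are illustrative choices, not the author's.
faq <- data.frame(
  doc   = paste0("perlfaq", 1:9, ".pod"),
  wrong = c(0, 0, 1, 4, 5, 6, 4, 2, 1),
  right = c(4, 4, 8, 12, 3, 6, 11, 5, 3)
)

# Rebuild the vector used in the session: only the files in which
# "wrong" appears at least once.
freq <- faq$wrong[faq$wrong > 0]

# Collect the same four statistics in one named vector.
stats <- c(mean = mean(freq), median = median(freq),
           var = var(freq), sd = sd(freq))
print(stats)

# sd() is the square root of the sample variance, which var()
# computes with an n - 1 denominator.
stopifnot(isTRUE(all.equal(sd(freq), sqrt(var(freq)))))

# Keeping the two zero counts gives a different summary.
print(summary(faq$wrong))
```

From here, `barplot(faq$wrong, names.arg = 1:9)` should redraw the chart with R's own graphics instead of asterisks.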

That's enough to hook me. R has all sorts of other, much more powerful features, and I'm looking forward to exploring those too.


2004-10-30 17:40:53
OS X Nightly Builds
Incidentally, the OS X package for R is still a bit more of a moving target relative to the other platforms due to a recent migration from an interim Carbon-based GUI to a Cocoa-based GUI. Nightly builds are available, and there is a Mac-specific mailing list that's usually pretty helpful.
2004-10-30 17:41:34
Uh, brian?
Your link to Wikipedia's CRAN article actually points to the CPAN one… :-)
2004-10-30 17:47:12
Uh, brian?
Fixed, thanks. :)