LUCENE IN RUBY (name: Ferret) - this is big - pay attention

by Derek Sivers

Related link: http://ferret.davebalmain.com/trac



If you are planning to use Ruby in any project that will need to search anything, pay close attention to Ferret - a Ruby port of Lucene.


Back in January, I started my rewrite of CD Baby in Rails - but one of the biggest unsolved problems was search:


  • over 100,000 albums (and adding 200 new albums a day!)

  • over 1,000,000 songs

  • need to be searchable not just by exact-match, but partial-match and mis-spelling

  • results need to be weighted so that exact-match result comes before partial-match

  • every search must search these six fields: artist, album, style, description, mis-spellings, similar-artists

  • result matches need to be weighted in this order of fields: artist, mis-spellings, similar-artists, album, style, description

  • all this has to happen in under 1 second
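
To make those weighting rules concrete, here is a minimal sketch in plain Ruby. It is not Ferret or Lucene code, and it ignores mis-spelling handling and speed entirely. The weight numbers, the EXACT_BONUS multiplier, the sample records, and the names score and search are all assumptions made up for illustration; only the field order comes from the list above.

```ruby
# Hypothetical field weights, highest priority first.
# The order follows the list above; the numbers themselves are made up.
FIELD_WEIGHTS = {
  artist: 6, misspellings: 5, similar_artists: 4,
  album: 3, style: 2, description: 1
}.freeze

# Assumption: an exact match in a field scores 10x a partial match in that field.
EXACT_BONUS = 10

# Score one record (a Hash of field => String) against a query string.
def score(record, query)
  q = query.downcase
  FIELD_WEIGHTS.sum do |field, weight|
    value = record[field].to_s.downcase
    if value == q
      weight * EXACT_BONUS   # exact match
    elsif value.include?(q)
      weight                 # partial (substring) match
    else
      0
    end
  end
end

# Return matching records, best score first; non-matching records are dropped.
def search(records, query)
  records
    .map { |r| [r, score(r, query)] }
    .select { |_, s| s.positive? }
    .sort_by { |_, s| -s }
    .map(&:first)
end

# Sample records, invented for this example.
albums = [
  { artist: "Dylan Thomas Band", album: "Poems" },
  { artist: "Someone Else", album: "Songs", description: "Sounds like Bob Dylan" },
  { artist: "Bob Dylan", album: "Greatest Hits" }
]

search(albums, "bob dylan").each { |a| puts a[:artist] }
```

In this sketch the exact artist match is listed first, the description-only match second, and the record that contains only "Dylan" is left out. A real engine like Lucene does this kind of scoring against an index instead of scanning every record.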



I used to have great search results, but it took TEN queries to do it (4 exact-match queries followed by 6 LIKE '%string%' queries). This was fine before CD Baby got popular, but once we started growing, my old reliable search was taking 30 SECONDS to return results! Live! On the website! Intolerable!


I switched to MySQL's fulltext search. Fast, yes. But disappointing results. Too many results. Search for "Bob Dylan" and you'll get EVERY artist with any mention of "Bob" OR "Dylan" in their name or album name.
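
The "too many results" problem comes down to matching ANY search word versus ALL of them. Here is a tiny plain-Ruby sketch of the difference. It is not how MySQL implements fulltext search; the sample names and the helpers words, match_any, and match_all are made up for illustration.

```ruby
# Sample names, chosen only for illustration.
ARTISTS = ["Bob Dylan", "Bob Marley", "Dylan Thomas", "Joan Baez"].freeze

# Split a string into lowercase words.
def words(text)
  text.downcase.scan(/\w+/)
end

# OR matching: keep any name sharing at least one word with the query.
def match_any(names, query)
  terms = words(query)
  names.select { |n| (words(n) & terms).any? }
end

# AND matching: keep only names containing every query word.
def match_all(names, query)
  terms = words(query)
  names.select { |n| (terms - words(n)).empty? }
end

p match_any(ARTISTS, "Bob Dylan") # every name with "Bob" OR "Dylan"
p match_all(ARTISTS, "Bob Dylan") # only names with both words
```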


I asked on my blog and got some good advice, including a recommendation for Lucene. My good friend Robert Kaye also told me about Lucene. No - he RAVED about Lucene - about how it could wildcard-search a million strings and return properly-weighted results in a few milliseconds. We talked about his Lucene experience for an hour, and I was convinced that this was the way to go. If you're interested in learning more about Lucene, download the Lucene book: Lucene in Action. It's great.


Only one problem: it's in Java. Fucking Java. I've never tried Java. I was hoping to not have to. I don't hear nice things about it. It's on my coffee list. But I was considering learning it a bit, just to get Lucene going.


RUBY BINDINGS TO LUCENE?

I asked around the Ruby list, and found out that Brian McCallister had been given a small grant to write Ruby bindings to Lucene. This looked very promising at first, but it eventually became apparent that it just wasn't going to happen. At all. Sigh....


LUCENE WEB SERVICE:

Robert Kaye wrote the Lucene Web Service for me. Tomcat. Java. A good start. Open source. Even has some other contributors. But it would still mean I'd need to install Java on my servers and maintain a Tomcat server, and do all this Java stuff I was really really hoping not to have to do, just to search my catalog! But it seemed like the only way, so I was going to dedicate next week to setting it all up and getting to know it.


ANNOUNCED THIS WEEK: LUCENE FOR RUBY! HOLY SHIT!

Then just a few days ago, David Balmain announced a full port of Lucene to Ruby - called Ferret. A full port! No Java needed! Oh man what perfect timing.


  • See example usage in his announcement and tutorial, and Brian McCallister's example, too.

  • Bookmark the Ferret home page, download it, give it a try, tell others.

  • Subscribe to Dave Balmain's blog because development is happening fast, even though it's just him right now.

  • And lastly - for you Lucene experts or C programmers, please consider contributing to Ferret, because Dave could really use some help with it. CD Baby will contribute some money to the project - but I'm not enough of an expert to help with the programming itself.


4 Comments

MikeBoone
2005-10-23 12:13:40
PHP Option
I shied away from Lucene for the same reason you did: I didn't want to install and maintain Java on the server. I use PHP, and I found that Xapian (www.xapian.org) fits the bill. Fast, flexible searches and bindings to PHP. I've been pretty happy with it.
Joe
2005-10-23 19:27:08
pyLucene
There's no reason that a similar critter to pyLucene (http://pylucene.osafoundation.org/) couldn't be made for Ruby. It's compiling the Java into C, and then driving forward to SWIG to make the interfaces. Ruby would fit right in...
dereksivers
2005-10-23 20:33:16
pyLucene
>pyLucene for ruby:
That's the Brian McCallister project I was referring to, above. Yes it would be do-able, and that was the goal, but didn't look like it was going to happen.
aristotle
2005-10-24 02:14:25
Re:
For completeness, the Perl port of Lucene:


http://search.cpan.org/dist/Plucene/