In Search of Perfect Search
by Scot Hacker
Yesterday I got to work setting up a sorely needed search engine for the jschool web site. With around 5,000 documents scattered across the site, it's amazing it has come this far without one.
I've built search engines for various sites that were completely database-backed, but this site is mostly straight HTML. That means I need a search engine that hoovers our static content into a MySQL database every day and then serves search results from it.
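A minimal sketch of that daily "hoover" step follows. It assumes the static pages live under a single document root on disk, and it uses Python's built-in sqlite3 in place of MySQL so the example runs on its own. The function name hoover and the pages table with its columns are hypothetical. To keep a nightly run cheap, the function re-reads only files whose modification time has changed, and it drops rows for pages that have been deleted.

```python
import os
import sqlite3


def hoover(doc_root, conn):
    """Copy changed .html/.htm files under doc_root into a (hypothetical) pages table.

    Returns the number of pages inserted or updated on this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " path TEXT PRIMARY KEY, mtime REAL, html TEXT)"
    )
    updated = 0
    for dirpath, _dirs, files in os.walk(doc_root):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue  # only static HTML is indexed
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            row = conn.execute(
                "SELECT mtime FROM pages WHERE path = ?", (path,)
            ).fetchone()
            if row is not None and row[0] == mtime:
                continue  # unchanged since the last run
            with open(path, encoding="utf-8", errors="replace") as fh:
                html = fh.read()
            conn.execute(
                "INSERT OR REPLACE INTO pages (path, mtime, html) VALUES (?, ?, ?)",
                (path, mtime, html),
            )
            updated += 1
    # Remove rows for pages that no longer exist on disk.
    for (path,) in conn.execute("SELECT path FROM pages").fetchall():
        if not os.path.exists(path):
            conn.execute("DELETE FROM pages WHERE path = ?", (path,))
    conn.commit()
    return updated
```

Scheduled once a day (for example, with cron or Windows Task Scheduler), the first call imports every page. Later calls touch only what changed: a second call with no edits on disk returns 0.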
Writing basic search engines in MySQL/PHP is easy. But making sure you've got a fast, effective spider, a database that stores meta keywords and descriptions in separate fields, and a query system that looks first at meta data and only secondarily at the page text... doing all of that is a bit more complex. Not impossible, just more complex. So I decided to deploy a pre-fab, open source search package.
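Here is a rough sketch of the "meta data first, page text second" idea in Python, again with sqlite3 standing in for MySQL to keep it self-contained. The docs table layout and the names PageParser, create_schema, index_page and search are all hypothetical. Matching is a plain case-insensitive LIKE substring test, which is not true relevance ranking. A page whose meta keywords or description match the term always sorts ahead of a page that matches only in its body text.

```python
import sqlite3
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Pulls meta keywords/description into their own fields; keeps other page text separately."""

    def __init__(self):
        super().__init__()
        self.keywords = ""
        self.description = ""
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "keywords":
                self.keywords = attrs.get("content") or ""
            elif name == "description":
                self.description = attrs.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def body_text(self):
        # Collapse the runs of whitespace left over from the markup.
        return " ".join(" ".join(self._chunks).split())


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        " url TEXT PRIMARY KEY, keywords TEXT, description TEXT, body TEXT)"
    )


def index_page(conn, url, html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    conn.execute(
        "INSERT OR REPLACE INTO docs (url, keywords, description, body) VALUES (?, ?, ?, ?)",
        (url, parser.keywords, parser.description, parser.body_text()),
    )


def search(conn, term):
    """Return matching URLs: meta-data hits first, then body-only hits.

    Simplification: % and _ inside the term are not escaped.
    """
    like = f"%{term}%"
    rows = conn.execute(
        """
        SELECT url FROM (
            SELECT url,
                   (keywords LIKE ? OR description LIKE ?) AS meta_hit,
                   (body LIKE ?) AS body_hit
            FROM docs)
        WHERE meta_hit OR body_hit
        ORDER BY meta_hit DESC, body_hit DESC, url
        """,
        (like, like, like),
    ).fetchall()
    return [url for (url,) in rows]
```

Sorting on two flags instead of summing weights keeps the priority strict. A meta-data match can never be outranked by a page that only mentions the term in its body, which is the behavior described above.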
I had thought my search for perfect search had ended when I hit on Mnogo Search. But on closer inspection, it turned out that Mnogo is only free for use on Unix. When deployed under Windows, you have to pay. And it's not cheap. Interesting business model ;) Unfortunately, I'm stuck with a Windows web server (at least it's Apache), so Mnogo becomes a non-option.
So I spent half of yesterday looking for the perfect, free, cross-platform search engine written in PHP/MySQL. The first few I tried all had seriously broken installation routines and very poor documentation (one was French, the other German; in both cases the authors' English was fairly weak, which made troubleshooting tough).
The Lucene subproject under the Apache Jakarta project provides an open source search engine, and the team behind it has quite a bit of expertise.
I've had excellent results from Swish++ which is GPL'd. http://homepage.mac.com/pauljlucas/software/swish/
A list of open-source search engines
I'm tracking open source search engines at