Weblog:   The Greatest Test of Open Source: Beating Google
Subject:   search engine costs
Date:   2005-02-13 09:55:14
From:   dumbfounder
I think time is by far the biggest cost in developing and deploying a search engine, even if you have great software to work with. Tuning the software to crawl the internet effectively (which means avoiding the endless sea of web spam) and to produce relevant results for different indexes may be too great a task for open source developers. I speak from experience: I have been developing a search engine full time for the past 15 months. I have used cheap bandwidth (7 residential DSL lines costing about $400/month total) to download about half a billion pages, which is plenty of data to test my algorithms. Check it out at (not all half billion pages are online)