Web 2.0 Conference Coverage
by Daniel H. Steinberg
Search: The Current and Next Big Thing
The topic of search kept popping up in different sessions at the Web 2.0 conference: from a demo of a new search browser to a panel full of search experts to geospatial search to demos from the labs at Microsoft and Google. The sessions discussed the current state of search, what users want and expect from search tools, and what changes are on the way for personalization and customization.
Improving the post search experience
Tuesday afternoon, Idealab founder and chairman Bill Gross set the stage with his "High Order Bit." He began with a look back at three breakthroughs in search: creating a full-text index of the entire Web, using price as a relevance metric, and using link strength as a relevance metric. Gross introduced his current project, SNAP, by saying that the journey begins after you get the results of your search. To make that part of the search more user friendly, he proposed three enhancements: letting the user reorder and refine the search results, using feedback about what other users have done after performing the same search, and providing transparency that exposes how the search system works.
He expanded on these three enhancements during his demo. You can filter and refine the returned results by typing in a string to match or a condition such as >50 for the Popularity or Satisfaction columns. The fields available for refining the search depend on the type of item for which you are searching. For some searches, SNAP has mined usage patterns to determine the paths users follow after a search. With a vague search such as "cars," SNAP presents the four biggest areas that people explore and seeds each of them with the two most relevant listings for that path. As for transparency, SNAP intends to expose all of the statistics for its site, including its revenue sources and amounts, to help advertisers make more informed decisions.
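The column refinement described in the demo can be sketched roughly as follows. This is only an illustration, not SNAP's implementation: the function name refine, the sample rows, and the set of supported operators are all assumptions made for the example.

```python
import operator
import re

# Hypothetical result rows; the column names echo the demo, the values are invented.
RESULTS = [
    {"title": "Used cars", "popularity": 72, "satisfaction": 41},
    {"title": "Car rentals", "popularity": 35, "satisfaction": 88},
    {"title": "New car prices", "popularity": 64, "satisfaction": 67},
]

# Comparison operators a user might type in front of a number (an assumption).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
_CONDITION = re.compile(r"^\s*(>=|<=|>|<|=)\s*(\d+(?:\.\d+)?)\s*$")


def refine(rows, column, query):
    """Keep rows whose column meets a numeric condition or contains the typed string."""
    match = _CONDITION.match(query)
    if match:
        compare = _OPS[match.group(1)]
        threshold = float(match.group(2))
        return [row for row in rows if compare(row[column], threshold)]
    # Anything that is not a condition is treated as a case-insensitive substring.
    needle = query.lower()
    return [row for row in rows if needle in str(row[column]).lower()]


popular = refine(RESULTS, "popularity", ">50")
rentals = refine(RESULTS, "title", "rent")
```

Here popular keeps the two rows with a popularity above 50, and rentals keeps the single row whose title contains "rent".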
In his after-dinner speech on Tuesday, HDNet co-founder Mark Cuban explained what he would like to see in search results. Cuban is interested in what is new since the last time he searched. He doesn't want to see the most popular links at the top because he has already visited those. He wants a search engine that can highlight what is incrementally new.
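One simple way to think about "incrementally new" results is to remember which links a user has already been shown for a query and surface only the rest. The sketch below assumes that reading; the function name and the example URLs are hypothetical, and a real engine would also have to handle changed pages, not just unseen ones.

```python
def new_since_last_search(current_results, seen_urls):
    """Return results not shown before, in rank order, and record them as seen."""
    fresh = []
    for url in current_results:
        if url not in seen_urls:
            fresh.append(url)
            seen_urls.add(url)  # also guards against duplicates within one result list
    return fresh


# Hypothetical history for one user and one query.
seen = {"a.example/popular"}
first = new_since_last_search(["a.example/popular", "b.example/fresh"], seen)
second = new_since_last_search(["a.example/popular", "b.example/fresh"], seen)
```

The first call returns only the link the user has not seen; repeating the same search returns nothing new.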
Holding the world in your hand
Search is not restricted to entering text in a field and getting back a collection of links. John Hanke, CEO of Keyhole, demonstrated how quickly you can zoom in on a physical location and get real-time information as the client accepts streamed data from satellites. He showed how roads and public transit could be layered on the satellite images and how traffic pattern data could be viewed as well. In addition, users were able to annotate the map with geospatial data such as a picture and description of a landmark at a particular location.
On Thursday, Richard Rashid of Microsoft Research showed sites that enable what he termed the democratization of science. The picture and topographical data contained in Microsoft's Terraserver site are hit twenty million times each day, with between two and three million hits against a web service. Higher resolution data is available now than when the site launched in 1998. Search for map-based images isn't limited to the Earth: Microsoft also has a site called SkyServer that presents data from the Sloan Digital Sky Survey, including views for amateurs, scientists, and students.
During the Wednesday afternoon session entitled "Search is a Platform. Where is it Going?" the panel agreed that personalization and improvements to the user interface are key to the future of search. For the most part, search consists of having a user enter two to three words and view a list of returned results. Fewer than one percent of the public use any of the advanced features that many search engines offer. Louis Monier, director of eBay's Advanced Technology Group, said that enhancements to search cannot depend on training users to do more. Instead, he suggested a restaurant metaphor: you bring users the dish that they want, but you also bring other dishes that they may be interested in.
The key is "understanding the intention of the user and enabling them to complete a task," added Jeff Weiner, a senior vice president at Yahoo. He said that personalization can be thought of as fitting into two boxes. There is the explicit gathering of information, where users provide information about what they do and do not like. The implicit personalization comes from tracking what the user tends to do. Weiner said that a search result must transition from a means to an end to simply being an end in itself.
Peter Norvig, Google's director of Search Quality, shared some projects from the Google labs. The first project was statistical machine translation, which allows you to enter a search in one language and get back results from content that is written in a different language, and further, to have the results translated back into the original search language. This is difficult, as there are many subproblems that involve understanding the syntax and semantics of both languages, as well as idioms. Google is taking a statistical approach by building a model over words that occur next to each other and breaking sentences into phrases. In the end, Norvig explained, the success is the result of having lots of data and applying lots of machines to it. The results are locally good, although the beginning of a translated sentence may not quite match the end.
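A model "over words that occur next to each other" is commonly called a bigram model. The toy sketch below counts adjacent word pairs and estimates how likely one word is to follow another. It is not Google's system, the three-sentence corpus is invented, and real translation models are trained on vastly more data.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            counts[left][right] += 1
    return counts


def next_word_probability(counts, left, right):
    """Estimate P(right | left) from raw counts; 0.0 if left was never seen."""
    followers = counts.get(left, Counter())
    total = sum(followers.values())
    return followers[right] / total if total else 0.0


# Invented toy corpus, purely for illustration.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigrams(corpus)
p_cat_after_the = next_word_probability(model, "the", "cat")
```

Because each estimate looks only at neighboring words, a model like this can be good locally while losing track of how a sentence began, which matches the behavior Norvig described.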
The second project Norvig presented was named entity extraction. Here, the goal is to find the names of people and companies in text and then find connections between them. One example he offered for finding connections is to look for the phrase "such as." If you see a sentence that includes the phrase "Computer book publishers such as O'Reilly Media," you now have a candidate for a category within which O'Reilly lives. You can use this data to extract sets of related clusters along with the names of clusters and a hierarchy.
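The "such as" cue can be illustrated with a small pattern match. This is a deliberately crude sketch, not the method Norvig showed: the regular expression and function name are assumptions, the category is simply whatever words precede the cue, and only one capitalized name after the cue is captured.

```python
import re

# Words before "such as" become the candidate category; the run of capitalized
# words right after it becomes the candidate member.
_SUCH_AS = re.compile(
    r"(?P<category>[A-Za-z][A-Za-z ]*?)\s+such as\s+"
    r"(?P<member>[A-Z][\w']*(?:\s+[A-Z][\w']*)*)"
)


def extract_candidates(text):
    """Return (category, member) pairs suggested by the phrase 'such as'."""
    return [(m.group("category"), m.group("member")) for m in _SUCH_AS.finditer(text)]


pairs = extract_candidates("Computer book publishers such as O'Reilly Media, among others.")
```

For the sentence from the talk, this yields the category "Computer book publishers" with the member "O'Reilly Media". A production system would need proper noun-phrase detection and would weigh many such observations rather than trusting any single match.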
The third project was to identify clusters of words. So, for example, Pepsi may be in a cluster that also includes Coke and Sprite. It might also be in a cluster that includes Britney Spears. Once you have clusters of words, you can search on a word and then select the desired cluster that is the most likely container of relevant information. This cluster then contains links ordered in some useful way that you may then follow.
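The idea of one word living in several clusters can be sketched with a simple lookup. The cluster names and memberships below are invented for illustration, and how Google actually builds or labels its clusters was not described in the talk.

```python
# Hypothetical clusters; the names and members are assumptions for this example.
CLUSTERS = {
    "soft drinks": {"pepsi", "coke", "sprite"},
    "pop endorsements": {"pepsi", "britney spears"},
}


def clusters_for(word, clusters):
    """List the names of every cluster that contains the word, sorted by name."""
    needle = word.lower()
    return sorted(name for name, members in clusters.items() if needle in members)


choices = clusters_for("Pepsi", CLUSTERS)
```

A search on "Pepsi" would offer both clusters, and the user would pick whichever one is the more likely container of the information they want.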
Daniel H. Steinberg is the editor for the new series of Mac Developer titles for the Pragmatic Programmers. He writes feature articles for Apple's ADC web site and is a regular contributor to Mac Devcenter. He has presented at Apple's Worldwide Developer Conference, MacWorld, MacHack and other Mac developer conferences.