Automatic Personalized Search at the Internet Archive

by Mark Finnern

Related link: http://myrecall.archive.org/




Anna Patterson is the Queen of the geeks, at least for me :-) On the side, while
pregnant, she wrote a search engine for the Internet Archive, which clocks in at
150,000 lines of plain POSIX C code for the indexer and 50,000 lines for the
server side, with some neat features like automatic personalization. Wow.


We taped her presentation at the Future Salon Friday a week ago, but the sound
didn't get recorded, so you have to make do with eyewitness reports like this
one. Anna's search engine data-mines the corpus before indexing, which leads to
a higher level of knowledge per page and enables new features like automatic
content organizing, trend analysis, and personalized search.


You can try the personalized search for yourself. Put "Alexa" in the recall
engine. You will see that some of the results are not worksafe, so don't click
these. Next, search for Brewster Kahle, the founder of the Internet Archive, who
also created and designed the Alexa navigation service, which got bought by
Amazon.com. If you now search for "Alexa" again, the results are tailored
towards your previous interests: all of the first-page hits are about the Alexa
search engine. It's a bit eerie at first, but the results will convince you, and
of course you can always clear your search history.
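
How the recall engine does this internally is not something this report can document. As a rough illustration of the general idea only, here is a minimal Python sketch that boosts results sharing terms with earlier queries. Everything in it is an assumption made for illustration: the `terms` and `rerank` helpers, the `boost` weight, and the three sample results are hypothetical and are not taken from Anna's code or from the Archive's data.

```python
from collections import Counter


def terms(text):
    """Lowercase whitespace tokenizer; a real engine would do far more."""
    return text.lower().split()


def rerank(results, history, boost=1.0):
    """Re-order (title, snippet, base_score) tuples using past queries.

    Hypothetical scheme: each distinct history term found in a result adds
    boost * (number of times that term was searched) to its base score.
    """
    # Term frequencies over every query in the search history.
    profile = Counter(t for query in history for t in terms(query))

    def score(result):
        title, snippet, base = result
        words = set(terms(title + " " + snippet))
        # Counter returns 0 for terms that were never searched.
        return base + boost * sum(profile[w] for w in words)

    return sorted(results, key=score, reverse=True)


# Made-up results for the query "Alexa" with made-up base scores.
results = [
    ("Alexa fan page", "photos and gossip", 3.0),
    ("Alexa Internet", "navigation service founded by brewster kahle", 2.0),
    ("Alexa toolbar", "web navigation statistics", 1.0),
]

no_history = rerank(results, history=[])
with_history = rerank(results, history=["Brewster Kahle"])

print([title for title, _, _ in no_history])
print([title for title, _, _ in with_history])
```

With an empty history the made-up base order is kept. After a "Brewster Kahle" search, the result that mentions him moves to the top, which mirrors the behaviour described above. Clearing the history simply means passing an empty list again.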


Overall, very interesting concepts and results that you can check out today.
I am convinced this is not the last time we hear from Anna Patterson and her
search engine. Here are her quick overview slides.