Your personal information space (Dashboard and Beagle)

by Andy Oram

Related link: http://www.gnome.org/projects/beagle/



Google doesn't want you to delete your mail. That note you tossed off
to your spouse last night, asking which brand of cereal to buy at the
grocery store, may be utterly irrelevant to you today, but to Google
Mail it's highly marketable information.



A similar concern, less commercial but equally avaricious in the
information sense, lies behind one of the projects from Ximian (now
Novell) that has generated the most buzz: founder Nat Friedman's
Dashboard project. Despite a promising prototype, implementing
Dashboard turned out to raise a lot of deep and difficult questions,
but its supporters believe they have a way forward. A future version
of Dashboard will be reconstituted on the Beagle project, led by Jon
Trowbridge.



The GNOME Foundation is treating Dashboard and Beagle as extremely
important. Trowbridge gave an informal keynote-like talk on them
today at the 4th GNOME Developer's Summit.



The issue does not concern GNOME alone. Dashboard and Beagle are
desktop-independent; they could be accessed by KDE as well. And
Microsoft has announced a similar system that automatically indexes
your entire computer system and turns up everything related to some
topic of importance to you.

Reasons for Dashboard, etc.



The problem motivating these systems is the common "Where did I see
that?" question. For instance, I told the GNOME Foundation executive
director Tim Ney today that I had seen survey results suggesting that
KDE is three times as popular as GNOME. (I don't consider the results
necessarily accurate.)



Now I'm trying to figure out where I saw this survey. Was it a Web
site I visit regularly, something on an RSS feed, an email sent by a
colleague, or a hallucination induced by listening to too much modern
jazz this week?



I don't think either Dashboard or Longhorn, Microsoft's forthcoming
system, will help me search that last category any time soon. But
they are supposed to help turn up
results from all the other categories--and (thanks to real-time
indexing) turn them up nearly instantly, even on a hard disk with
multiple gigabytes of information in a variety of formats.

More than a super-grep



The Dashboard/Beagle vision is far more than a super-grep, or
something able to search for keywords in files of different
formats. (Windows has offered that for a long time.) Beagle already
has time tracking, which means that if you read an email and
visit a file a few seconds later, Beagle will remember that they're
related even if there's no particular phrase that's featured
prominently in both. Beagle also maintains a full-text index of every
Web page you visit. Trowbridge would like to go further and keep track of
the context in which you handle information. For instance, if
you save a file from someone's email message, the file will contain a
marker indicating a connection with that email message.
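
To make the time-tracking idea concrete, here is a minimal Python
sketch. It is not Beagle's code; the Event class, the related_pairs
function, the item identifiers, and the 30-second window are all
assumptions chosen for illustration. It simply treats two items as
related when the user touched them within a short interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user action: when it happened and which item it touched."""
    timestamp: float  # seconds since some reference point
    item: str         # hypothetical identifier, e.g. "mail:..." or a file path


def related_pairs(events, window=30.0):
    """Return pairs of distinct items touched within `window` seconds of each other."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    pairs = set()
    for i, first in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.timestamp - first.timestamp > window:
                break  # events are sorted, so nothing further can match
            if later.item != first.item:
                pairs.add(frozenset((first.item, later.item)))
    return pairs


# Illustrative data only: an email read, a file opened 5 seconds later,
# and an unrelated file opened ten minutes afterwards.
events = [
    Event(0.0, "mail:cereal-question"),
    Event(5.0, "/home/me/shopping.txt"),
    Event(600.0, "/home/me/report.odt"),
]
links = related_pairs(events)
```

Here only the email and the shopping list end up linked. A real
system would presumably weigh such links rather than treat them as
all-or-nothing, but the sketch shows why no shared keyword is needed.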



Trowbridge complains that your computer throws away a lot of
information you give it (such as the fact that you saved a file from
an email message). But I wonder about the push to save so much
metainformation. True, we now have the processing power and storage
space to save all kinds of junk. But can we predict what information
will really be useful? I'll return to this question at the end of this
article.

What Dashboard and Beagle entail



It's worth briefly going over the architecture that supports the
personal information space, because that helps to show how extensively
a system must be changed to support it.



Fast search depends on an up-to-date database, whether one is talking
about the spidering done all the time by Internet search engines, or a
repository of terms used by files on your own hard disk. Thanks to a
new Linux kernel subsystem called inotify, changes to files can be
collected as they happen and passed to interested userspace tools,
which can communicate with one another over the new D-BUS interface
being developed for Linux.



Beagle depends on an indexing tool called Lucene to keep track of
what's in various files on the system. It essentially
checks everything except files in dot directories and others that
traditionally contain throw-away data. As I already mentioned, it
records the contents of Web pages you visit. It can also search your
email, your IM logs, and anything else that exists as a file.
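
The kind of index described here can be sketched in a few lines of
Python. This is a toy inverted index standing in for Lucene, not
Lucene's API, and the function names build_index and search are my own
assumptions. It walks a directory tree, skips dot directories as the
text describes, and maps each word to the files that contain it.

```python
import os
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Map each lowercased word to the set of files under `root` containing it."""
    index = defaultdict(set)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune dot directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the crawl
            for word in WORD.findall(text.lower()):
                index[word].add(path)
    return index


def search(index, query):
    """Return the files that contain every word in `query`."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Calling search(build_index(some_directory), "gnome survey") would
return the files that mention both words. A production indexer also
has to handle file formats, ranking, and incremental updates, which
this sketch leaves out.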



The next step is to associate the metainformation collected in
various ways with the files. Microsoft's Longhorn will theoretically
involve an entirely new filesystem called WinFS. (When this will
happen is anybody's guess, but it won't happen soon.) One of Linux's
strengths is its support for multiple filesystems, and Trowbridge
doesn't expect them all to be enhanced just to support
Beagle. However, many filesystems let you attach named values, called
"extended attributes," to files; these are often used to implement
Access Control Lists and other
new features. Beagle can use these to store its metadata.
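
As a rough illustration of how extended attributes could carry such a
marker, the Python sketch below uses os.setxattr and os.getxattr,
which the standard library provides on Linux only. The attribute name
"user.example.origin" and the helper names are hypothetical; the
article does not say which names Beagle uses.

```python
import os

# Hypothetical attribute name; "user." is the namespace Linux opens to ordinary users.
ORIGIN_ATTR = "user.example.origin"


def tag_origin(path, origin):
    """Record where a file came from. Returns False if the attribute cannot be stored."""
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr is only available on Linux
    try:
        os.setxattr(path, ORIGIN_ATTR, origin.encode("utf-8"))
    except OSError:
        return False  # filesystem lacks, or has disabled, user extended attributes
    return True


def read_origin(path):
    """Return the recorded origin, or None if the file carries none."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, ORIGIN_ATTR).decode("utf-8")
    except OSError:
        return None
```

Because not every filesystem or mount accepts user attributes, both
helpers fail softly. That caveat is worth keeping in mind for any
scheme that stores its metadata this way.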



For each file format or type of information (email, for instance)
Beagle will have a back-end API to do searching. The developers are
even looking for ways to associate metainformation with
pictures. Beagle combines all the results and presents them in a
single front-end API. Applications that want to do system-wide
searches, therefore, will need to understand just the Beagle API in
order to access all types of data on the system. The current utility
used to demo Beagle is called best, for Bleeding Edge Search
Tool.
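
The split between per-type back ends and a single front end might
look something like the following Python sketch. The class and
function names, the scoring, and the sample data are all assumptions
made for illustration; this is not the Beagle API.

```python
from typing import Callable, Dict, List, Tuple

# A back end takes a query string and returns (score, description) hits.
Backend = Callable[[str], List[Tuple[float, str]]]


class SearchFrontEnd:
    """Single entry point that fans a query out to every registered back end."""

    def __init__(self):
        self._backends: Dict[str, Backend] = {}

    def register(self, kind, backend):
        self._backends[kind] = backend

    def search(self, query):
        hits = []
        for kind, backend in self._backends.items():
            for score, description in backend(query):
                hits.append((score, kind, description))
        # Best score first; ties broken by kind and description for stable output.
        return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))


def mail_backend(query):
    """Toy back end over a hard-coded list of subjects (illustrative data only)."""
    subjects = ["Which cereal?", "Desktop survey results"]
    return [(1.0, s) for s in subjects if query.lower() in s.lower()]


def file_backend(query):
    """Toy back end over a hard-coded list of file names (illustrative data only)."""
    names = ["survey.txt", "notes.txt"]
    return [(0.5, n) for n in names if query.lower() in n.lower()]


front = SearchFrontEnd()
front.register("mail", mail_backend)
front.register("file", file_backend)
results = front.search("survey")
```

An application calling front.search never needs to know how mail or
files are stored. Supporting a new type of data means registering one
more back end.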



Privacy fears come to mind when one considers a tool that does instant
searches. Remember that (currently) Dashboard and Beagle are meant for
use by an individual on his or her personal data. One approach to the
issue is to say "Privacy is overrated" and assume that one is doing
the user a favor by presenting his or her entire disk contents on
demand. Another approach would be to divide information into
categories, such as to separate work data from personal data. But
that's hard to do: asking the user to distinguish them is adding work,
while trying to do it from context risks oversimplifying the complex
lives led by users.

Indexing the infinite



I want Dashboard. I am intrigued by the idea that, instead of
organizing and boiling down the information I receive and trying to
get rid of what I don't need, I should go in the opposite direction
and compulsively save information, expecting my computer to pluck out
what I need later. A saying attributed to AI researcher Marvin Minsky
claimed that his information store consisted of his friends. For this
task I trust computers more than friends. (Sorry, Tim Ney.)



But I worry about clever schemes to track and save information--and
not for privacy reasons. I just wonder whether we'll know what we'll
want in the future.



Archaeologists have found marvelous ways to deduce ancient people's
lifestyles from the facts they turn up. They make deductions based on
whether an artifact is upside-down or right-side-up, and from chemical
traces found nearby. Still, we often wish people in the past left more
clues.



We also do archaeological searches on our computers' data, which is
just as strangely organized and off-balance as the MIT Stata Center
that hosts today's GNOME summit. Once again, the data we left behind
on our computers proves frustratingly inadequate for today's purposes.
And I would guess that increasing the data we collect will do little
to close the gap.



When one starts creating filesystem attributes and instrumenting
applications, one makes choices that will continue to have impacts
thirty years later. What new application will arise just a year or two
from now that will make the Beagle developers kick themselves because
they forgot to prepare for it?



So I'm not saying full system search is unfeasible. I'm just asking
how long it takes to prepare a system for the search, in comparison to
how long it takes for the system to become obsolete. I'd like to try the
results, in any case.


What would you search for?