Annotating Everything

A Report on Marc A. Smith's Talk at ETech 2004

by Daniel H. Steinberg

Editor's note: Daniel Steinberg reports from O'Reilly's Emerging Technology Conference with an in-depth look at featured speaker Marc A. Smith's session on Catalyzing Collective Action on the Net. Marc demonstrated several tools that show promise as ways to enhance online communities. If you couldn't get to San Diego for ETech 2004 this week, you can find complete news coverage as well as the conference wiki, weblogs, photos, and much more on O'Reilly Network's Conference Coverage page.

Marc A. Smith describes himself as a sociologist from a small software company in Redmond, Wash., who is out on a work-release program. The Microsoft researcher studies what people need in order to interact successfully. His recent research has involved investigating online gathering places and applying social tools to get a sense of place. He's also been looking at connecting people with information about the products they may be thinking about buying.

Reference Material

During his talk at ETech 2004, Smith provided a quick survey of some of the tools he uses to understand online gathering places. Robert Axelrod's book, The Evolution of Cooperation, explains that we're all engaged in a game of risky transactions with each other that we relive over and over again.

In her 1990 book, "Governing the Commons: The Evolution of Institutions for Collective Action", Elinor Ostrom argues that we do not have to live according to a prisoner's dilemma. Smith expressed Ostrom's position that we're only prisoners if we treat each other that way. Further, he says our culture doesn't celebrate collective projects. Smith highlighted Ostrom's assertion that "groups that are able to organize and govern themselves are marked by the following design principles:

  • Group boundaries are clearly defined.
  • Rules governing the use of collective goods are well matched to local needs and conditions.
  • Most individuals affected by these rules and policies can participate in modifying the rules and policies.
  • The rights of community members to devise their own practices is respected by external authorities.
  • A system for monitoring a member's behavior exists; this monitoring is undertaken by the community members themselves.
  • A graduated system of remediation is used.
  • Community members have access to low-cost, conflict resolution mechanisms."

You can find this material online in Managing the Electronic Commons, by Rob Reilly and Barry Kort.

In addition to these works by sociologists, Smith "worships at the church of Tufte." He added, "In fact, I'm a sinner in that church." His resource recommendations included Edward Tufte's The Visual Display of Quantitative Information. Much of Smith's work involves mining metrics for important indicators of life in Usenet newsgroups and finding interesting and useful ways to display the information.

Online Associations

When we think of groups that we interact with in person, we are referring to collections of between four and eight people. This is often not what we find online, and Smith agreed with Brian Butler that these online gatherings aren't necessarily groups: they are too big. In searching for a label, Smith suggested that maybe they are more like voluntary associations. He referred to these as the AXAs: American Associations of Somethings.

When you think of these online and offline associations, there are many ways in which people clump together. In either case, Smith notes, almost nobody does anything in them. He said between 80 percent and 90 percent of the members do nothing more than join the group. In most cases, any collective effort is going to have a minimal contributing set. On the Net, perhaps the size of this group gets smaller, but contribution remains at about 2 percent.

Social software helps to build addresses. These so-called "Virtual Schelling Points" are obvious places on a landscape where people can gather. Turning Murphy's Law on its head results in Yphrum's Law, which celebrates the fact that "Systems that shouldn't work sometimes do, or at least work fairly well."

As an example, Smith cited The Value of Reputation on eBay: A Controlled Experiment, by Paul Resnick, Richard Zeckhauser, John Swanson, and Kate Lockwood. The paper describes "the first randomized, controlled study of an Internet reputation mechanism. A high-reputation, established eBay dealer sold matched pairs of items--batches of vintage postcards--under both his regular identity and under new seller identities (also operated by him). As predicted, the established identity fared better. The difference in buyers' willingness to pay was 8.1 percent of the selling price."

Usenet as a Social Place

Smith and his group study usage patterns of Usenet. He said that despite its nickname, "use to use-net," it's not dead. He acknowledged that it's not well, but reminded the audience that Usenet is 23 years old now and represents a standing structure for conversation. It remains a robust institution with 240 million messages this year from 8.6 million unique authors. You can visit and see a tabular representation of the results of collecting a billion or so headers.

He asked the audience to think of how they chose a restaurant the night before. They didn't look at an alphabetical list on their cell phones. They walked up the street, noticed which places looked busy, and checked out menus while enjoying the smells of food that came out of a particular eatery. How, he asked, could you provide the same rich experience for a newsgroup? Expose information about the participants and the nature of conversation in a newsgroup in the same way you might convey that you have found an Italian restaurant that is really packed right now that you should return to later.

One visualization that Smith's team is using is a tree map. The newsgroup "hierarchies are squished into nested boxes." The size of each box represents some attribute; for instance, it could stand for the number of messages. You could instead size the boxes by how regularly people come back. In tree maps, color can convey another dimension of the data.
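To make the nested-box idea concrete, here is a minimal sketch of one simple tree map layout, usually called slice-and-dice. It is not the layout Smith's team uses, which the talk did not specify. The Node class, the layout function, and the newsgroup names and message counts are all assumptions made up for illustration. Each box is divided among its children in proportion to their message counts, and the split direction alternates at each level of the hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A newsgroup hierarchy node; only leaves carry their own size."""
    name: str
    size: float = 0.0  # hypothetical attribute, e.g. number of messages
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        # A leaf reports its own size; an inner node sums its children.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, width, height) for the node and its descendants."""
    if out is None:
        out = []
    out.append((node.name, x, y, w, h))
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent box already handed out
    for child in node.children:
        frac = child.total() / total
        if depth % 2 == 0:
            # Even levels: place children side by side.
            layout(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd levels: stack children top to bottom.
            layout(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Made-up message counts, purely for illustration.
root = Node("usenet", children=[
    Node("comp", children=[Node("comp.lang", 300), Node("comp.os", 100)]),
    Node("rec", children=[Node("rec.food", 400), Node("rec.music", 200)]),
])

for name, x, y, w, h in layout(root, 0, 0, 100, 100):
    print(f"{name:10s} x={x:6.2f} y={y:6.2f} w={w:6.2f} h={h:6.2f}")
```

Because each child's share of the parent box equals its share of the messages, the area of every leaf box is proportional to its message count. Color, the second dimension Smith mentioned, would be assigned separately from some other attribute.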

Another visualization is used for the inside of a newsgroup. Bubbles are placed vertically to indicate the number of days an author has been active; horizontally to indicate how many messages the person posts to a thread; and color to indicate activity. You can also represent what a person looks like in cyberspace with a histogram where every strip is a week. Above the line are bubbles for every thread the person initiated and below the line are threads the person responded to. The bigger the circle, the more posts the person made in the thread. You can quickly get a feel for what a profile looks like for someone who doesn't initiate threads and who doesn't get into flame wars.
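The numbers behind such a profile can be derived from message headers alone. The following is a rough sketch, not the team's actual pipeline: the record layout (author, thread, date, whether the post started the thread), the sample posts, and the author_profile function are all hypothetical. It counts the days an author was active and, for each week, how many posts the author made in threads he or she initiated versus threads he or she only replied to.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical header records: (author, thread_id, posting date, started_thread)
posts = [
    ("alice", "t1", date(2004, 2, 2), True),
    ("bob", "t1", date(2004, 2, 2), False),
    ("alice", "t1", date(2004, 2, 3), False),
    ("bob", "t2", date(2004, 2, 9), True),
    ("alice", "t2", date(2004, 2, 10), False),
    ("alice", "t2", date(2004, 2, 10), False),
]


def author_profile(posts, author):
    """Return (days_active, strips) for one author.

    strips maps (ISO year, ISO week) to two counters: posts per thread
    the author initiated (above the line) and posts per thread the
    author only replied to (below the line).
    """
    mine = [p for p in posts if p[0] == author]
    days_active = len({day for _, _, day, _ in mine})
    initiated = {thread for _, thread, _, started in mine if started}

    strips = defaultdict(lambda: {"initiated": Counter(), "replied": Counter()})
    for _, thread, day, _ in mine:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        side = "initiated" if thread in initiated else "replied"
        strips[week][side][thread] += 1  # the count drives bubble size
    return days_active, dict(strips)


days, strips = author_profile(posts, "alice")
print("days active:", days)
for week in sorted(strips):
    print(week, dict(strips[week]["initiated"]), dict(strips[week]["replied"]))
```

An author whose strips show nothing above the line and only small counts below it would match the profile Smith described: someone who doesn't initiate threads and doesn't get drawn into long exchanges.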

Smith and those like him are building a behavior history pattern. He noted that "people usually are who they are and stay that way. Styles of participation are often stable across time." Too much can go wrong if you base your view of individual and group traits on explicit reputation systems that require response. Implicit systems can be mined for more reliable data.

New Mouse to Click on the World

The world is now a web page. Everything is worth clicking on. Smith asked, "If everything has a machine readable tag, why not read it? Why not let everybody annotate everything?" A cell phone and a Pocket PC are the new mice. The Microsoft Research Aura project is designed to take advantage of the "kerjillian barcoded objects that give off data." Suppose that you add a barcode to the label for artwork at a museum or gallery. Your device would then read the artist's name from the barcode, and you could run a web search composed of the artist's name and the word "artist."
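The museum example boils down to turning a scanned string into a search query. Here is a minimal sketch of that step, assuming the barcode payload is simply the artist's name as text. The endpoint URL, the "q" parameter name, and the artist_search_url function are placeholders made up for illustration, not part of Aura or of any particular search engine.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint; a real device would use whatever search service it is configured for.
SEARCH_ENDPOINT = "https://search.example/search"


def artist_search_url(barcode_payload: str) -> str:
    """Build a search URL from the artist's name plus the word 'artist'."""
    name = " ".join(barcode_payload.split())  # collapse stray whitespace from the scan
    query = f"{name} artist"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


url = artist_search_url("  Ansel   Adams ")
print(url)
# Decode the query again to show what the search service would receive.
print(parse_qs(urlparse(url).query)["q"][0])
```

Appending the word "artist" narrows results for names that are shared with other people or things, which is presumably why the example adds it.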

You can take Aura shopping with you. Whether your supermarket has WiFi or a cell network is available, you can scan products and do various web searches for more information. Smith showed a shopper scanning Cracklin' Oat Bran. The top result of the Google search was information about a California recall issued because the label did not report that the cereal contains milk, eggs, or almonds.

The Aura Blog annotation repository will be public in a few weeks. You can annotate anything you can scan by building a trusted authority on a particular topic. For example, imagine you are in the process of buying a pair of trousers. The annotation might show you the working conditions under which these trousers were made and ask if you would like to see a pair made under better conditions even if they cost you $8 more. The combination of the barcode reader and an Internet full of rich data leads to an informed buying public like never before.

Daniel H. Steinberg is the editor for the new series of Mac Developer titles for the Pragmatic Programmers. He writes feature articles for Apple's ADC web site and is a regular contributor to Mac Devcenter. He has presented at Apple's Worldwide Developer Conference, MacWorld, MacHack and other Mac developer conferences.
