Social Networks, Privacy, and the Semantic Web

by Jennifer Golbeck

A couple of years ago, Plink.org was launched as a new kind of web-based social network. (You can see how Plink used to look through the Internet Archive.) Instead of requiring users to register, enter information, and add other people as contacts within a centralized network, Plink crawled the web for FOAF files.


FOAF is short for Friend-of-a-Friend, and it is a Semantic Web vocabulary for representing information about people and their relationships. Because FOAF is a Semantic Web project, the files are written in RDF, a machine-readable format (the vocabulary itself is defined using RDF Schema and OWL). This means they can be easily read, processed, and aggregated from distributed sources.
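
To make the aggregation point concrete, here is a minimal sketch in plain Python, not a description of how Plink or any real crawler worked. It treats FOAF data as simple (subject, property, value) triples and merges descriptions from two sources whenever they share a foaf:mbox value, which FOAF treats as identifying a single person. The sample data, the example.org addresses, and the `aggregate_by_mbox` helper are all hypothetical; a real system would parse the RDF with a proper library.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples, as two unrelated sites might publish them.
site_a = [
    ("_:a1", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:a1", FOAF + "name", "Alice Example"),
    ("_:a1", FOAF + "knows", "_:a2"),
    ("_:a2", FOAF + "mbox", "mailto:bob@example.org"),
]
site_b = [
    ("_:b1", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b1", FOAF + "interest", "http://example.org/topics/knitting"),
]


def aggregate_by_mbox(*sources):
    """Merge per-person properties from several sources, keyed on foaf:mbox."""
    profiles = defaultdict(lambda: defaultdict(set))
    for triples in sources:
        # Node labels like "_:a1" are only meaningful within one source,
        # so the node-to-mailbox lookup is rebuilt for each source.
        node_to_mbox = {s: o for s, p, o in triples if p == FOAF + "mbox"}
        for s, p, o in triples:
            mbox = node_to_mbox.get(s)
            if mbox is None or p == FOAF + "mbox":
                continue
            # If the value is another local node, refer to it by its mailbox.
            profiles[mbox][p].add(node_to_mbox.get(o, o))
    return profiles


merged = aggregate_by_mbox(site_a, site_b)
for prop, values in sorted(merged["mailto:alice@example.org"].items()):
    print(prop, sorted(values))
```

Neither source alone says that Alice has a name, a friend, and an interest; the combined profile does, which is exactly what makes this kind of aggregation both useful and surprising to the people described.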


FOAF data comes from many places. People can create their own FOAF files by hand, or by using FOAF-a-Matic. The largest sources of FOAF data, however, are web-based social networks that choose to share their users' information in FOAF form as well as in HTML form. LiveJournal, Tribe.net, eCademy, and Buzznet are just a few of the networks that publish users' data in FOAF. In 2004, Howard Dean's campaign was also collecting social network information by recording which visitors used the site to email their friends and invite them to visit. The campaign published this data as FOAF as well.


Plink encountered all of this information, freely available on the web, and displayed what it found as part of its website. The result was a nice website that showed all of the data about a person from a variety of sources. It also inspired a lot of angry email from people who never "signed up" for Plink and were very surprised to see their information there. As a result, Plink was forced to shut down.


In one sense, the Semantic Web is designed for exactly the type of activity that Plink undertook: to make information stored in distributed files easy to access, read, and aggregate. However, Plink ultimately failed because of users' concerns about their privacy. People who register for web-based social networks expect some of their personal information to be available on the web. For users who are concerned, there are privacy policies that explain what will be shared and with whom. But really, how many of us actually read those? There was less of an expectation that information entered on a political website might be shared. Did Howard Dean's campaign have a privacy policy that said people's information might be shared? Yes, but we can't say whether people read it or understood it.


I would argue that the standard privacy policy is not a practical way to inform users about how their data will be shared. Sure, users should probably read them...but they don't (I don't read them, and I know better). Privacy policies are long, rambling, and boring.


How can we more effectively inform users about how their data will be shared on the web? I admit that I have not worked hard to come up with a good solution. On first impulse, I could envision a system that keeps the privacy policy but also has a simple visual indicator (an icon or something similar) that conveys basic information about how data will be shared (e.g., data will be strictly confidential, some data will be available on the web, data may be shared with commercial third parties, etc.). In any case, the growth of the technologies that allow this information to be published on the web, and of interest in them, ensures that this issue won't go away. Someone will need to dedicate time and thought to it.
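
As a rough illustration of the indicator idea only, the sketch below maps two yes/no facts about a site's data practices onto the three example categories mentioned above. Everything in it is an assumption made for illustration: the `SharingLevel` names, the `indicator_for` function, and the rule that the most permissive practice determines the indicator. A real scheme would need far more nuance.

```python
from enum import Enum


class SharingLevel(Enum):
    """Hypothetical indicator categories, taken from the examples in the text."""
    CONFIDENTIAL = "Data will be strictly confidential"
    PARTLY_PUBLIC = "Some data will be available on the web"
    THIRD_PARTY = "Data may be shared with commercial third parties"


def indicator_for(publishes_on_web: bool, shares_with_third_parties: bool) -> SharingLevel:
    """Pick the single indicator a site would display.

    Assumption: the most permissive practice wins, so a site that does both
    shows the third-party indicator.
    """
    if shares_with_third_parties:
        return SharingLevel.THIRD_PARTY
    if publishes_on_web:
        return SharingLevel.PARTLY_PUBLIC
    return SharingLevel.CONFIDENTIAL


# A site that publishes profiles as FOAF but does not sell data.
print(indicator_for(publishes_on_web=True, shares_with_third_parties=False).value)
```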


As we consider this, let's all share a moment of silence for Plink, may it rest in peace.


What are your ideas about privacy on the Semantic Web? Are there systems you think could be or should be deployed?