Articles Weblogs Books School Short Cuts Podcasts  

The Power of Metadata
Pages: 1, 2, 3, 4

Resources and relationships: A historical overview

So where does this all leave us? How do we infuse our peer-to-peer applications with the metadata lessons learned from the Web?

The core of the World Wide Web Consortium's (W3C) metadata vision is a concept known as the Semantic Web. This is not a separate Web from the one we currently weave and wander, but a layer of metadata providing richer relationships between the ostensibly disparate resources we visit with our mouse clicks. While HTML's hyperlinks are simple linear paths lacking any obvious meaning, such semantics do exist and need only a means of expression.

Enter the Resource Description Framework[3] (RDF) -- a data model and XML serialization syntax for describing resources both on and off the Web. RDF turns those flat hyperlinks into arcs, allowing us to label not only the endpoints, but the arc itself -- in other words, ascribe meaning to the relationship between the two resources at hand. A simple link between Andy Oram's homepage and an article on the O'Reilly Network provides little insight into the relationship between the two. RDF disambiguates the relationship: "Andy wrote this particular article" versus "this is an article about Andy" versus "Andy found this article rather interesting."

RDF's history itself shows how emerging peer-to-peer applications can benefit from a generalized and consistent metadata framework. RDF has roots in an earlier effort, the Platform for Internet Content Selection, or PICS. One of the original goals for PICS was to facilitate a wide range of rating and filtering services, particularly in the areas of child protection and filtering of pornographic content. It defined a simple metadata "label" format that could encode a variety of classification and rating vocabularies (e.g., RSACi, MedPICS[4]). It included the goal of allowing diverse communities to create their own content rating languages and networked metadata services for distributing these descriptive labels. While originally it defined a pretty comprehensive set of tools for rating and filtering systems, PICS as initially defined did not play well with other metadata applications. The protocols, data formats, and accompanying infrastructure were too tightly coupled to one narrow application -- it wasn't general enough to be useful for everyone.

One critical piece PICS lacked was a namespaces mechanism that would allow a single PICS label to draw upon multiple, independently managed vocabularies. The designers of PICS eventually realized that all the work they had put into a well-designed query protocol, a digital signatures system, vocabularies, and so forth risked being reinvented for various other, non-PICS-specific metadata applications.

The threat of such duplication led to the invention of RDF. Unlike PICS, RDF has a highly general information model designed from the ground up to allow diverse applications to create data that can be easily intermingled. However diverse, RDF applications all share a common strategy: They talk about unambiguously named properties of unambiguously named resources. To eliminate ambiguous interpretations of properties such as "type" or "format," RDF rests on unique identifiers.

Foundations of resource description: Unique identifiers

Unique identification is the critical empowering technology for metadata. We benefit from having unique identifiers for both the things we describe (resources), and the ways we describe them (properties). In RDF, we call the things we're describing resources regardless of whether they're people, places, documents, movies, images, databases, etc. All RDF applications adopt a common convention for identifying these things (regardless of what else they disagree about!).

We identify the things we're describing with Uniform Resource Identifiers, or URIs.[5] You're most probably familiar with one subset of URIs, the Uniform Resource Locator or URL. While URLs are concerned with the location and retrieval of resources, URIs more generally are unique identifiers for things that may not necessarily be retrievable.

We also need clarity concerning properties, which are how we describe our resources. To say that something is of a particular type, or has a certain relationship to another resource, or has some specified attribute, we need to uniquely identify our descriptive concepts. RDF uses URIs for these too. Different communities can invent new descriptive properties (such as person, employee, price, and classification) and assign URIs to these properties.

Since the assignment of URIs is decentralized, we can be sure that uniquely named descriptive properties don't get mixed up when we integrate metadata from multiple sources. An auto-maker's concept of "type" is different from that of a cheese-maker's. The use of URIs such as and serves to uniquely identify the particular "type" we're using to describe a resource.

One critical lesson we can take away from the PICS story is that, when it comes to metadata, it is very hard to partition the problem space. The things we want to describe, the things we want to say about them, and the things we want to do with this data are all deeply entangled. RDF is an attempt to provide a generalized framework for all types of metadata. By providing a consistent abstraction layer that goes below surface differences, we gain an elegant core architecture on which to build. There is no limit to the material or applications RDF supports: Through different URIs and namespaces, different groups can extend the common RDF model to describe the needs of the peer-to-peer application at hand. No standards committee or centralized initiative gets to decide how we describe things. Applications can draw upon multiple descriptive vocabularies in a consistent, principled manner. The combination of these two attributes -- consistent framework and decentralized descriptive concepts -- is a powerful architecture for the peer-to-peer applications being built today.

When it comes to metadata, the network becomes a poorer information resource whenever we create artificial boundaries between metadata applications. The Web's own metadata system, RDF, was built in acknowledgment of this. There is little reason to suppose peer-to-peer content is different in this regard since we're talking about pretty much the same kind of content, albeit in a radically new environment.

Pages: 1, 2, 3, 4

Next Pagearrow