Our del.icio.us Folksonomy (Beta)

by Tony Stubblebine

We've just added del.icio.us tags to our articles. These are single-keyword categories generated by O'Reilly readers as they bookmark our articles in del.icio.us. The sum of these tags is a taxonomy (some say folksonomy) of articles that emerged from our readers rather than being handed down by our editors.



There's value in both types. One's authoritative. The other's flexible and dynamic.



Getting Started


Look in the upper-right corner of an article for related tags, then follow each tag to other articles that share it. Here's an example article that's tagged with ruby, rails, programming, tutorial, and web:



Rolling with Ruby on Rails



If you want a better way to keep your bookmarks (and also contribute to the O'Reilly categorization system), head over to the del.icio.us site.



Beta Thoughts


We're calling this Beta because we're still experimenting with it. We'll be releasing early and often, and keeping a diary of the results here.



We were tempted to tag our own articles in del.icio.us. Only 22% of our article content has been tagged, and it's hard to leave so much good content out of the categorization.



However, the numbers look much more encouraging when viewed by year. Here are the percentages of articles with del.icio.us tags: 2005 - 71%, 2004 - 51%, 2003 - 23%, 2002 - 16%, 2001 - 10%, 2000 - 6%. The article coverage is rising with the popularity of del.icio.us.
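
For anyone curious how a per-year breakdown like this can be computed, here is a minimal sketch. It assumes each article is a (year, tags) pair; that record shape, the function name coverage_by_year, and the sample data are all hypothetical and are not our real data or code.

```python
from collections import defaultdict

def coverage_by_year(articles):
    """Return {year: percent of that year's articles with at least one tag}."""
    totals = defaultdict(int)
    tagged = defaultdict(int)
    for year, tags in articles:
        totals[year] += 1
        if tags:  # an empty tag list means nobody has bookmarked the article
            tagged[year] += 1
    return {year: round(100 * tagged[year] / totals[year]) for year in totals}

# Hypothetical sample records, for illustration only.
sample = [
    (2005, ["ruby"]),
    (2005, []),
    (2004, []),
    (2004, ["perl"]),
    (2004, []),
    (2004, []),
]

for year, percent in sorted(coverage_by_year(sample).items(), reverse=True):
    print(f"{year} - {percent}%")
```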



We've gotten two deliveries of data, so we're already able to say which tags are on the rise. The top three gainers (over the course of a week) map exactly to the buzz around the office: ajax (+469 tags), ruby (+378 tags), and rails (+333 tags).
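
Finding the gainers amounts to comparing tag counts between two deliveries. The sketch below assumes each delivery has been reduced to a dictionary of tag to count; that format, the function name top_gainers, and the counts are hypothetical, chosen only to illustrate the comparison.

```python
from collections import Counter

def top_gainers(previous, current, n=3):
    """Return the n tags whose counts grew the most between two snapshots."""
    gains = Counter()
    for tag, count in current.items():
        delta = count - previous.get(tag, 0)  # a tag missing earlier counts as 0
        if delta > 0:
            gains[tag] = delta
    return gains.most_common(n)

# Hypothetical snapshots with made-up counts, not the real del.icio.us data.
last_week = {"ajax": 10, "ruby": 8, "rails": 5, "perl": 20}
this_week = {"ajax": 25, "ruby": 18, "rails": 12, "perl": 21, "rss": 3}

for tag, delta in top_gainers(last_week, this_week):
    print(f"{tag} (+{delta} tags)")
```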



Our editors were concerned about inappropriate tags. We can live with typos or synonyms, but you're not going to see naughty words get through. That's because we're moderating new tags before they get incorporated into the site. So far there have been 1,174 tags, plenty of typos, and no obscenities.
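
The moderation step can be pictured as a simple gate: a tag appears on the site only once it has been approved, and anything unseen waits for review. This is a rough sketch under that assumption; the function name moderate and the approved and rejected sets are hypothetical and do not describe our actual tooling.

```python
def moderate(incoming, approved, rejected):
    """Split incoming tags into those safe to show and those awaiting review."""
    visible, pending = [], []
    for tag in incoming:
        tag = tag.strip().lower()  # normalize before comparing
        if tag in approved:
            visible.append(tag)
        elif tag not in rejected:
            pending.append(tag)  # never seen before: hold for an editor
    return visible, pending

# Hypothetical moderation lists, for illustration only.
approved = {"ruby", "rails"}
rejected = {"spamword"}

visible, pending = moderate(["Ruby", "rails", "rubby", "spamword"], approved, rejected)
print("show now:", visible)
print("needs review:", pending)
```

A typo such as "rubby" lands in the review queue rather than being dropped, which matches the idea that typos are tolerable once an editor has looked at them.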



Joshua Schachter, founder of del.icio.us, gave us data for all our pages (like weblogs), so there's a lot more to work with. I have a feeling that he'd like to offer this data to all sites. Thanks, Joshua!



Stay tuned.



What do you think of folksonomies?


7 Comments

BobDuCharme
2005-06-23 09:20:24
O'Reilly weblog categorization

Speaking of the "O'Reilly categorization system"... Many agree that one of the most valuable kinds of metadata is the assignment of categories from a controlled vocabulary by people familiar with the content. When an O'Reilly weblog author enters a new entry, he or she enters a Subject by picking it from a controlled list of 51 choices, then picks an optional secondary Subject, and then picks one or more Topics from a list of 211 possible topics. The Subject values show up as dc:subject values in the RSS 1.0 and Atom feeds, which is great, but what happens to the topic values? I'm sure they're not going down a black hole, but if they're available to anyone outside of the oreillynet.com domain, I'd love to know where. You may as well make use of this metadata before taking the trouble to add new layers of metadata.

If O'Reilly really wants to show a commitment to the value of metadata, the metadata associated with the developer weblog entries gives them an opportunity to go well beyond the del.icio.us folksonomy bandwagon. I'd suggest: 1. including the topic values in the Atom and RSS feeds 2. making an archive of the RSS entries available, perhaps with the current year's entry as an accumulating text file and each previous year's entries as a zip file. You do have this sitting in some databases, right?

Making this data publicly available would be a huge contribution to semantic web efforts because of the possibilities it would offer for using metadata to navigate valuable information. People would write cool apps around this data using the kinds of technology that O'Reilly covers, and their sample apps would be pointing people to O'Reilly resources.

And of course, including del.icio.us tags in the archived versions would be a bonus.


tonystubblebine
2005-06-23 15:10:06
O'Reilly weblog categorization
Great comments, Bob.


I think you make a good point about adding another layer of metadata when we're not even using all the metadata we do have.


I'd built an almost identical prototype using our editor-supplied keywords and hit count data. The result was pretty similar, but with better coverage.


I like the del.icio.us data because bookmarking is a better indicator of value than hit counts. Although in practice they're turning out to be pretty similar.


As for sharing more data: is RSS really your format of choice? Wouldn't you rather have a web service API? I'd rather have an API... but I don't know what semantic efforts are targeting.


We're definitely working on opening up more data. All of our books just went into an XML database which means that we can pull out targeted chunks like code examples or glossary terms. That's what I'm pushing to share.


In the meantime, I've gotten pretty good use out of the Safari API. Recently I wanted to list related books next to Perl CPAN modules. It's pretty easy with Safari's code search API method.

BobDuCharme
2005-06-23 15:28:30
O'Reilly weblog categorization
Accumulated RSS would be the low-hanging fruit because you're already creating the RSS anyway, and the RDF basis of RSS 1.0 would appeal to a lot of Semantic Web people. API access would also be cool, but then you'd have some more serious development work to do. I was just thinking of maximum return on minimal investment.


thanks,


Bob

tonystubblebine
2005-06-23 21:51:08
First Update
Joshua Schachter gets the first improvement in: an improved login screen. I talk to a lot of people who don't get the point of del.icio.us, so it's important to give new users coming from our site a smooth introduction. I think the new screen does that.


(Even some of his investors had a hard time wrapping their heads around the site.)

Chirael
2005-06-29 01:37:26
Sort by date
It would be nice if the articles that come up under each tag were sorted (or sortable) by date. Maybe the default could have the most recent at the top?
syara
2005-07-25 22:08:31
tagging momentum
wow 71% in 2005 that is impressive
tonystubblebine
2005-08-10 10:56:22
recent del.icio.us tags for O'Reilly articles
Just added a listing of recent del.icio.us tags for O'Reilly articles. These are the most popular tags from the last month.


http://www.oreillynet.com/tags.csp


Interesting to see RSS so high up and Ajax continuing to stay strong.