Time for Contextual Tagging?

by Dan Zambonini

Hot on the heels of our clickdensity heat-map software, we've taken on yet more developers to create our fourth and fifth products. Although the fifth product is 'top secret' (read: still a bit ambiguous), the requirements behind it have raised an interesting question about tagging.

Most -- if not all -- tags are currently used to record the subject-matter of an object. For example, "this is a photo of the Chrysler Building" or "this is a blog entry about Lebanon".

Could we update this flexible system to allow us to describe more than subject matter, without affecting the simplicity?

35 Comments

Taylor
2006-07-25 06:31:19
The answer is OWL or some other tagging mechanism that carries semantics. It's interesting that this topic follows your last blog because user tagging really goes along the same lines as your last post...low quality content.


As far as location, I suspect soon cameras will simply embed the lat/lon and direction in the pic, so you'd not need to tag the location at all.

Taylor
2006-07-25 06:32:14
Mind you, I like the blog; I'm referring to the topic.

Thomas Broyer
2006-07-25 07:52:22
Hey, Atom lets you express the "context" as a "categorization scheme" (which is expressed as a URI).


For example, "children" in the "http://example.net/audience" scheme has a different meaning than "children" in the "http://example.net/subject" scheme (or with no scheme at all, for that matter).


Isn't Atom wonderful? ;)


You could define some "labels" for schemes, to be used after the "#"; and/or some "URI template" to construct a scheme (in case no mapping has been defined). And of course the other way around to display those "scheme'd" categories/tags.

Kit
2006-07-25 08:00:46
I agree with Taylor. Folksonomies are only easier to use if you are inside the web community whose content they describe, so you end up fencing in the content. To make it meaningful (i.e. semantic) to the outside world, there has to be some regulation and sharing of the tags, at which stage you are talking about formal triples and, eventually, RDF.



Dan Zambonini
2006-07-25 08:33:14
@Thomas: Wonderful, thanks - I'll certainly check that out; I hadn't seen that before, and it looks really relevant.


@Taylor, Kit: Yeah, I see where you're coming from. It's just that every time I think of trying to deploy formalized categorization schemes on the web, I get panic attacks.

Chris
2006-07-25 09:54:25
Looks like you are going from tags to attributes.


From tags
empirestatebuilding, chryslerbuilding
to attributes
takenfrom=empirestatebuilding, subject=chryslerbuilding
with slightly different syntax and 'subject=' as a sort of default.
You are on the road to somewhere else now, I believe.
(Now you have to standardize the attribute values too.)
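
A minimal sketch of the tags-to-attributes step described above, assuming a comma-separated tag line and 'subject' as the default attribute. The function names `parse_tag` and `parse_tags` are made up for illustration and don't come from any existing tagging system.

```python
def parse_tag(raw, default_context="subject"):
    """Split 'context=value' into a (context, value) pair.

    A bare tag such as 'chryslerbuilding' gets the default context.
    """
    context, sep, value = raw.strip().partition("=")
    if not sep:
        # No '=' present, so the whole string is the value.
        return default_context, context.lower()
    return context.lower(), value.lower()


def parse_tags(line):
    """Parse a comma-separated tag line into a list of (context, value) pairs."""
    return [parse_tag(part) for part in line.split(",") if part.strip()]


plain = parse_tags("empirestatebuilding, chryslerbuilding")
attributed = parse_tags("takenfrom=empirestatebuilding, subject=chryslerbuilding")
```

With this, old-style tags keep working unchanged (they all land under 'subject'), while the attribute values themselves are still free text that would need standardizing.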


M. David Peterson
2006-07-25 12:53:11
Hey Dan,


Extending from @Thomas,


Please see: http://www.x2x2x.org/projects/wiki/doku.php?id=llup , and scroll to the bottom. You will notice the use of Atom Notifications (Peter Saint-Andre, by the way, is one of the editors for the Atom Notifications spec, and as I think I mentioned in private email, has recently joined the Blip Messaging project as well. His co-editor, Ralph Meijer, has also joined our efforts. EXCITING times ahead! :D) as the example of the carrier for the Blip message. Categorization is a BIG piece of what LLUP is all about.


I'm excited to see you asking these kinds of questions, because finding the answers is obviously something that is going to immediately bring value to the efforts being put forth in this space.

M. David Peterson
2006-07-25 19:06:19
In thinking this through a bit, and reading the comments of @Chris, it occurred to me that there is a relationship here that can be mapped directly to perspective. So, for example, if the subject is "Chrysler Building" and the perspective is "Empire State Building", one could use a combination of 'category' elements with their related attributes and, using standard #id URI notation, make the content both human readable/understandable and machine readable/understandable, while providing "perspective" through the scheme attribute value.


For example (using comments of both @Chris and @Thomas Broyer in this example),


<category term="The+Chrysler+Building" label="The Chrysler Building" scheme="tag:subject#The+Chrysler+Building" />
<category term="The+Chrysler+Building" label="The Chrysler Building" scheme="tag:perspective/geolocation#The+Empire+State+Building" />
<category term="The+Chrysler+Building" label="The Chrysler Building" scheme="tag:perspective/photography#The+Empire+State+Building" />


It needs some refinement, but this would allow quite nicely for defining hierarchical relationships, providing the context as the first element after 'tag:', and the hierarchical relationship using simple XPath 1.0 notation.


Of course, if there are more machine readable details regarding how this information should be processed, what the content is licensed under (and as such, how it can and/or cannot be used), etc... using a standard web protocol (http:, https:, ftp:, etc...) as the preface would suggest to the machine that it can find a machine processable schema to gain greater understanding of how this content can be accessed, used, or categorized even further.


Thoughts?

Thomas Broyer
2006-07-26 00:15:47
@David:


Wow! That's not at all what I said, and I think you're misusing Atom categories.


What I said is:


  • I configure my "tagging system" to associate "subject" (which is a default "context") with "http://context.example.net/subject", "takenFrom" with "http://tagging.example.com/ctxt/takenfrom" and unknown "contexts" with "http://example.org/tags?ctxt={context}"

  • I also configure it so that "wiki-style" (camel-case) words are "decomposed" into "phrases" when displayed, and the context is used as a prefix


Now, if I type in EmpireStateBuilding#takenFrom, I'd have the following Atom category:
<category scheme="http://tagging.example.com/ctxt/takenfrom"
term="empirestatebuilding" label="Taken From: Empire State Building" />

If I type "children#audience", the "system" will use the "template" to make an Atom category like the following:
<category scheme="http://example.org/tags?ctxt=audience"
term="children" label="Audience: Children" />

If I were to edit the tags later, the system could convert them back to "EmpireStateBuilding#takenFrom" and "children#audience".


(Also, the "tag:" URI scheme is defined by RFC4151 and your examples don't follow those rules; I guess you tried to make a new URI scheme ;-) )
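
The mapping described in this comment could be sketched roughly as follows. The scheme URIs and the expected output are taken from the comment; the names `SCHEMES`, `FALLBACK_TEMPLATE`, `split_camel` and `to_category` are hypothetical, and the reverse conversion (category back to "EmpireStateBuilding#takenFrom") is not covered.

```python
import re
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Configured contexts (keys lower-cased); URIs copied from the comment above.
SCHEMES = {
    "subject": "http://context.example.net/subject",
    "takenfrom": "http://tagging.example.com/ctxt/takenfrom",
}
# URI template used for contexts that have no configured scheme.
FALLBACK_TEMPLATE = "http://example.org/tags?ctxt={context}"


def split_camel(word):
    """Decompose a camel-case word into a capitalised phrase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", word)
    return " ".join(w[:1].upper() + w[1:] for w in spaced.split())


def to_category(tag):
    """Turn 'Value#context' into an Atom <category/> element string."""
    value, _, context = tag.partition("#")
    context = context or "subject"  # assumed default when no '#context' is given
    key = context.lower()
    scheme = SCHEMES.get(key) or FALLBACK_TEMPLATE.format(context=quote(key))
    label = "{}: {}".format(split_camel(context), split_camel(value))
    return "<category scheme={} term={} label={} />".format(
        quoteattr(scheme), quoteattr(value.lower()), quoteattr(label)
    )


taken_from = to_category("EmpireStateBuilding#takenFrom")
audience = to_category("children#audience")
```

The user only ever types the short "Value#context" form; the URIs stay a configuration detail of the tagging system.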

Collin Hsu
2006-07-26 02:23:31
I think faceted taxonomy will help in your situation. That is, you will have separate taxonomies for subject, location, perspective and so on.

M. David Peterson
2006-07-26 02:48:36
@Collin,


I agree. Attempting to force a rigid taxonomy upon each predicate is obviously something that doesn't make a whole lot of sense. The same would be true (even more so, obviously) for the subject matter.

Ryan Bates
2006-07-26 11:04:00
I agree with Collin. If you want context specific tags, look into faceted classification. Check out the wikipedia entry for more information: http://en.wikipedia.org/wiki/Faceted_classification
Dan Zambonini
2006-07-26 14:22:23
Can anyone give me examples of how these ideas (e.g. faceted classification) would work, for the end user? This is a problem that I need a real-world answer for, and I need the process of users specifying context to be as simple (and flexible) as how they specify subject topics (i.e. the user shouldn't need to care about any kind of complex classification system, hierarchies, URLs, or anything else - just a simple syntax for specifying context as words). Can faceted classification systems evolve (like a folksonomy), or do they need to be enforced on the user (pre-defined)?


I'm sure these suggestions fit in with this simplicity requirement, but it would be interesting to know how you imagined they'd be used, from an interface/user-experience point of view.

Dan Zambonini
2006-07-26 14:34:33
Forgot one more thing... While I have all these intelligent people around, does anyone have any experience (or know of any site) that allows tagging of tags? For example, you could use 'Lebanon' as a tag, but then you (or someone else) could also tag 'Lebanon' as a 'Place' or 'Country'? Using this system, the hierarchy and relationships between terms could also evolve, rather than there being no explicit relationship, or it being pre-defined.
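
One way to picture "tagging of tags" is to let tags be taggable things in the same store as items, so relationships evolve as people add them. This is only a sketch under that assumption; `tags_of`, `tag` and `all_tags` are invented names, and "blog-entry-42" is a placeholder item.

```python
from collections import defaultdict

# thing (an item or a tag) -> the set of tags applied to it
tags_of = defaultdict(set)


def tag(thing, *labels):
    """Apply one or more tags to an item or to another tag."""
    tags_of[thing].update(labels)


def all_tags(thing):
    """Return every tag reachable from `thing`, following tags of tags."""
    seen = set()
    stack = list(tags_of.get(thing, ()))
    while stack:
        label = stack.pop()
        if label not in seen:  # guards against cycles such as A -> B -> A
            seen.add(label)
            stack.extend(tags_of.get(label, ()))
    return seen


tag("blog-entry-42", "Lebanon")
tag("Lebanon", "Place", "Country")  # someone later tags the tag itself
reachable = all_tags("blog-entry-42")
```

Nothing here is pre-defined: the entry becomes findable under 'Country' only because someone chose to tag 'Lebanon' that way.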
M. David Peterson
2006-07-27 00:01:54
@Thomas,


> Wow! That's not at all what I said,


Did I state "According to Thomas Broyer"?


> and I think you're misusing Atom categories.


Thomas... Who gives a shit?! Sorry, but the current thought train is to expand the current idea set, not be confined by it.


Thinking outside of the box, even when your ideas are wrong, is how we find our way out of the box.


At the moment, the semantic web is in a box. And it's going to stay there until these problems are solved from a real world, real person perspective. Stop thinking how to make this technically correct from the standpoint of computer logic, and start thinking about how to make this easy to use, implement, and understand for a human who doesn't live, eat, and breathe technology. The SIGNIFICANT MAJORITY of computer users use them to communicate by both gathering and sharing information. They don't care about technically correct. They care about whether they can find what they are looking for, and share what they want to share with other people.


That's it. End of story. If we let the logic drive the development of the system then the system is completely useless to ANYONE other than those who logically put all the pieces together.


The world exists in a state of constant chaos. Always has, always will. Attempting to create logic that a computer can understand out of the chaos that NOBODY understands doesn't make any sense.


In other words, if we can't make any sense of something, how are we supposed to write programs that tell a computer how to make sense of something? Artificial Intelligence is an important part of the study of Computer Science.


Human Intelligence, however, is more important.


If you want to expand Human Intelligence, stop thinking like a computer!


A human thinks in analog. NOT DIGITAL! Analog. Waves. Peaks and Troughs. Human.


No matter how hard we try, Artificial Intelligence will always be artificial. 0's and 1's, no matter how many millions and billions of pixels per square inch, will never achieve what is achieved by the analog mind.


The universe derives its "special effects" in ways Hollywood has NEVER been able to understand, much less reproduce. No matter how hard they try, until such time as analog tools replace the digital tools they never will, either.


Of course, to understand the analog way of doing things is to understand that where there is analog, there is chaos, and where there is chaos there is nature, and where there is nature, you can not recreate special effects.


They just happen. And then we adapt.


M. David Peterson
2006-07-27 00:16:47
@Dan,


Firstly, thanks for putting the human back into the focus of the conversation... It's been needed for a LONG time!


That said, this is one area of research I have been working on (read: to the naysayers out there, I'm not stating "This is the answer!" but instead "Here's one idea. Let's try it out and see what happens."). Please see: http://www.oreillynet.com/xml/blog/2006/04/mapping_data_between_domains_a.html
for more info on what I am referring to.


In short, I COMPLETELY agree with where you are headed with all of this. This builds from your point regarding "tagging tags" while placing the focus back on the human aspect, while separating the computer logic out such that a computer can make sense of this as well, but does so based on our own rules and definitions. In other words, this isn't a way of artificially gathering information by connecting the dots, although, if allowed, this could aid in that endeavor. This *IS* a way of "tagging" information that can easily be identified as to what it is, and who it belongs to, without giving up ownership and possession of that information to someone else.

M. David Peterson
2006-07-27 00:22:37
@Thomas,


One other thing...


Just because something doesn't exist in a current RFC, or breaks the rules of a given RFC, doesn't mean it's incorrect. It just means it's a different way of doing something.


An RFC isn't worth the paper it's printed on until such time as its use becomes embedded into the culture which it was designed to try and help. If it doesn't help, *EVEN IF* it's the most logical way to do something, then it won't get used. If it doesn't get used...


It's not a standard, even if technically speaking, it is.

lyncher
2006-07-27 05:49:00
You might want to look at the ideas behind Flamenco:
http://flamenco.berkeley.edu/index.html
Ryan Bates
2006-07-27 09:07:30
Dan, good questions on faceted classification. I don't have much experience with it, but I've thought a bit about how things could work. Facets generally improve the browsing experience, but at the cost of making classification more complex. Because of this, I imagine it works better in a more controlled environment where the classification is handled by administrators of sorts. I haven't seen any examples where classification is handled by the public user, but it is probably possible. Faceted classification is quite a bit more complex than tagging, so it may not be what you are looking for.
M. David Peterson
2006-07-27 15:24:49
@Ryan Bates,


I LOVE 'honest' hackers who are willing to draw the line of 'relevancy' for a technology they have an expertise in. Makes it SO MUCH EASIER to be able to analyze things when you go into it ahead of time with an understanding of what something was designed to do, and what it ultimately is known to be useful for. :D


Thanks!

M. David Peterson
2006-07-27 15:26:24
@lyncher,


"FLAMENCO stands for FLexible information Access using MEtadata in Novel COmbinations"


I LOVE IT! :D

M. David Peterson
2006-07-27 15:33:29
@Thomas,


re: The value of an RFC.


Obviously an RFC has had a lot of time and thought put into it by the various experts involved with its development. With this in mind, I'm not trying to suggest there isn't *ANY* value in an RFC until it proves otherwise, but rather that no matter how well something is specified, and how well this specification has been thought through, if no one ever uses it, and furthermore, if the specification was not derived from usability studies and/or usability experience from a real world perspective, then room for adjustment needs to be made.


As we are all *WELL* aware, we humans do things how we do them. As far as I know (although it wouldn't surprise me if someone made an attempt at one in "hindsight") there's no RFC for a specification on how to be human. And if there was...


It would be useless for what I hope are obvious reasons.

Ivan
2006-07-28 10:31:15
At some point contextual tagging gets really close to a general taxonomy/ontology; I'm not sure what the win is. There is already a way to extract out parentage (which is the traditional form of "context"), it just requires a knowledge of how the statistics of the whole thing work out to extract that information. Yahoo labs has done research into this area, definitely go check it out.
Reilly H.
2006-07-28 17:43:11
Your suggestion essentially replaces the flat name-space of tags with an ad-hoc hierarchy. If the community tends toward a shared ontology, great. But the brilliant thing about tags is the acceptance that there is no persistent shared ontology. Instead, categorization and discovery become the mirror image of each other (cheap, fast, and out of control). Messy-cheap categorization is good because it gets done more often.


Of course, the whole point to spending effort on categorization (tags, ontology, whatever) is to improve discovery. With tags, we accept that discovery is a stochastic process that yields an acceptable distribution of results. If we want to improve discovery, we don't try to create an idealized ontology. We create an empirical ontology by applying statistical methods to the data.

Dan Zambonini
2006-07-29 03:21:41
Thanks everyone for the comments, it's given me lots to think about!


@Ivan and Reilly, I'll have to read more into this area of things. But it's worth pointing out that the idea of 'context' that I originally proposed /can't/ be implied or extracted statistically. It's not about the hierarchy of terms, but about the use of those terms in particular contexts. For example, if I tagged an image by three different names, there could be no way of 'guessing' the context behind those names (e.g. one could be the photographer, one could be in the photo, and it could be at the birthday party of the third person). It's this kind of use I'm more interested in, rather than relationships between terms (i.e. ontology rather than taxonomy).


I agree that (in general) there is no long-term convergence of terms. However, apart from the larger applications that currently use tagging (e.g. flickr, del.icio.us), I can see more and more 'domain specific' applications using tagging, where the (smaller) community will tend to consistently use the same tags. In this area, I can see the 'contextual tags' being of particular use, where they do result in some kind of partial shared ontology.


Thanks again everyone - fascinating stuff.

Dave Newton
2006-07-29 09:23:58
You're really talking about knowledge representation, of which tagging is a minor subset. If you can restrict the domains of knowledge you're representing it will look like RDF/OWL/etc.


Presenting such information to users is difficult: the more you know about information the harder it can be to get at exactly what you want. Ideally I'd want to say "show me pictures taken from the empire state building".


What if what I really want is "pictures of cities taken from skyscrapers" or "pictures of buildings taken from other buildings" or... you get the idea.


Suddenly you're presented with wads of is-a relationships. Querying isn't the real problem; getting proper meta-data is. Maybe existing projects might help (haven't checked in on it in a while, but think OpenCYC etc.). It may already know that the empire state building is a skyscraper, that skyscrapers are usually in a city, that photographs are always taken from a location, etc.


This is a difficult problem on several levels; hiding internal complexity from the user is problematic.
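
To make the is-a point concrete, here is a toy sketch of a query like "pictures of buildings taken from other buildings". The `IS_A` table and `PHOTOS` list are invented sample data standing in for the meta-data that, as noted above, is the hard part to obtain; a real knowledge base would be far richer.

```python
# Hand-written is-a links (child -> parent); sample data only.
IS_A = {
    "empire state building": "skyscraper",
    "chrysler building": "skyscraper",
    "skyscraper": "building",
    "central park": "park",
}

# Each photo records what it shows and where it was taken from.
PHOTOS = [
    {"id": 1, "subject": "chrysler building", "takenfrom": "empire state building"},
    {"id": 2, "subject": "central park", "takenfrom": "empire state building"},
]


def is_a(thing, kind):
    """Walk up the is-a chain from `thing` looking for `kind`."""
    while thing is not None:
        if thing == kind:
            return True
        thing = IS_A.get(thing)  # None once we run out of parents
    return False


def find(subject_kind, takenfrom_kind):
    """Return ids of photos whose subject and vantage point match the kinds."""
    return [
        p["id"]
        for p in PHOTOS
        if is_a(p["subject"], subject_kind) and is_a(p["takenfrom"], takenfrom_kind)
    ]


buildings_from_buildings = find("building", "building")
parks_from_skyscrapers = find("park", "skyscraper")
```

The query itself is a few lines; everything interesting lives in the two tables, which matches the comment's point that querying is not the real problem.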

gaby de wilde
2006-07-30 06:23:12
Tags are the worst kind of related links. Not half as good as a normal search. And tagging takes more time than finding a related article. Call me crazy, I prefer linking to (read: credit) a page made by a human. That way I know who I'm linking to and what kind of read I'm offering. Text search should be able to find the "picture of ... taken from....". Unlike tags, text can describe things quite well. Then again, maybe I just think it does...

M. David Peterson
2006-07-30 21:13:43
@gaby de wilde,


>> Text search should be able to find the "picture of ... taken from....".


If in writing a summary of a picture I were to state something like,


"This is a picture I took of the Chrysler building from the Empire State Building." Then an intelligent key word search of this text could *potentially* extract some quality results.


But what if my summary was,


"I LOVE this picture!"


or,


"",


[yes, as in -- I didn't write a summary --]


?


Of course I may not have tagged the picture with key words either, but Flickr has proven quite nicely that other folks are willing to tag other folks' pictures. They've also proven that other folks are willing to write snippets or summaries, attached as labels to the picture itself (if viewed via the Flickr interface). However, unless I happen to be involved with the picture that was taken (e.g. it's of an event I attended and/or I am in the picture, and/or etc...), or in other ways know something about the subject matter ("Hey, that's Joe Schmoe!"), providing a summary or tagging the picture with a short snippet becomes little more than random text graffiti.


Amazon's Mechanical Turk project has been developed around the notion that the only *REAL* way to extract human intelligence is to get a human to do the extraction. Yes, computers can extract the basics ("there's a 98.77% chance there's a human in this photo based on my analysis of the relationship between pixels.") but beyond the basics -- when *REAL* intelligence gathering is required -- *REAL* people need to be involved.


Getty Images is a good example of how this type of human analysis can produce positive results. Their image search tool > http://creative.gettyimages.com/source/home/homeCreative.aspx < combined with their HUGE stock of images provides a FANTASTIC way to locate this kind of contextual information. While I can't say I know the process they use to categorize each picture, I'm fairly certain that a human is involved. The computer's ability to quickly and easily parse through the contextual information provided by a human is obviously unparalleled; without computers, locating an image inside of Getty Images' collection would be near to impossible from the standpoint of building a business model around this collection.


Fortunately, we do have computers, and therefore the ability to locate the images we might want, which provides GI with a nice business model to build from. One thing you will notice... A quick search using the link provided above will showcase that contextual tagging is what GI is *ALL* about. I've used the GI search tools quite often, and I can assure you that I've never thought to myself "I wish they would provide a summary instead of these silly little contextual tags!"


In the end, however, I do agree with your general point... Unless there is incentive (e.g. building a business around selling stock photography) a human isn't going to go to any great length to tag something or to write a summary of that same something. There needs to be incentive to do so. Regardless of preference between tags, summaries, and URI endpoints, without incentive, the semantic web -- at least from a human's involvement -- isn't going to go anywhere, and that's the bottom line...


The Semantic Web will arrive the same day incentive is provided.


Abraham
2006-07-31 13:43:26
I used a similar approach for a personal music database application I was creating a few years ago. Each track could have any number of "classifiers" (this was before I knew about "tags"), and the classifiers themselves, although they were an open class, would each be associated with a "classifier type". For example, you could define classifier types "genre" "mood" "tempo" etc., and then you would have classifiers "happy" "sad" "melancholy" etc. all associated with the type "mood".


Instead of using the vague terminology of "classifiers" and the seemingly redundant "classifier types", it might be better to call these "tags" and "tag types". The word "context" is a bit too ambiguous or vague and probably won't catch on with users.

Dan Zambonini
2006-07-31 14:10:56
@Abraham: Interesting stuff, thanks. The only problem with your suggestion of 'tag types' (in my example) is that they aren't types (i.e. the 'context' as I call it doesn't represent the 'type' of tag - it is a specifier that is only valid for the single context in which it's being used).


For example, in your example, you could nearly always say that 'melancholy' was of 'tag type' 'mood' (in fact, you'd only need to specify it once, as it's defining a sort-of hierarchy).


In the example I quote, the 'Empire State Building' isn't of 'type' 'takenfrom' - it only has the specifier 'takenfrom' applied to it in this context because of something particular about the instance in which it's applied; other than that, there is no particular relationship between 'Empire State Building' and 'takenfrom'.
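
The difference between the two models can be sketched as a difference in where the extra word is stored. In the 'tag type' model it hangs off the tag and is declared once; in the contextual model it hangs off each individual use of a tag. The names `TAG_TYPES`, `TagUse` and `items_with`, and the photo ids, are illustrative assumptions only.

```python
from dataclasses import dataclass

# "Tag type" model: the type belongs to the tag itself, declared once.
TAG_TYPES = {"happy": "mood", "sad": "mood", "melancholy": "mood"}


# Contextual model: the context belongs to one application of a tag to one item.
@dataclass(frozen=True)
class TagUse:
    item: str
    tag: str
    context: str = "subject"  # assumed default, as with ordinary tags


uses = [
    TagUse("photo-1", "Empire State Building", "takenfrom"),
    TagUse("photo-1", "Chrysler Building"),
    TagUse("photo-2", "Empire State Building"),  # here it is the subject
]


def items_with(tag, context):
    """Return items where `tag` was applied in the given context."""
    return [u.item for u in uses if u.tag == tag and u.context == context]


taken_from_esb = items_with("Empire State Building", "takenfrom")
photos_of_esb = items_with("Empire State Building", "subject")
```

The same tag appears under two different contexts, which a single entry in a tag-to-type table could not express.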

Doug L.
2006-08-02 13:42:11
Here are a couple more links related to faceted classification. Kim Burchett has a faceted "diamond wiki" running here. See this "About Kim" entry. Kim mentions sniki, and though the link doesn't currently work, shapr might be convinced to describe its current state.

Interplein
2006-08-03 09:20:21
Of course tagging should be allowed,
as long as it doesn't bother people if you, for instance, tag EVERYTHING.
imparare
2007-04-15 00:40:04
Interesting comments.. :D