Tracking link validity

by Bob DuCharme

Related link: http://developer.java.sun.com/developer/technicalArticles/Programming/linkupdate…



Link value is a huge topic, with many parameters that are difficult to quantify, but one key component is easy to quantify: link validity. Does the link go where it's supposed to? Does it go anywhere? The answers to these questions say a lot about its value.



Years ago I prototyped a link management system in which every link in a document pointed to a CGI script, passing along the destination URL. The script looked up that URL in a relational database to see whether it was still valid before sending the user's browser on to it. If the link was no longer valid, the script displayed a message about the approximate time frame in which it became invalid by listing the last date it was known to be valid and the first date it was found to be invalid. A batch process assigned these dates. It ran through the links in the database and rechecked each one by sending an HTTP HEAD request (which, unlike GET, asks only for the requested document's HTTP headers instead of the whole document), then saved a timestamped code showing the response in the database. (By the way, as long as there's a row in the table for each link, it's a great place to store other link metadata as well.)
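The batch check described above might look something like the following in Java, using the JDK's `HttpURLConnection`. This is a minimal sketch, not the prototype's actual code. The `LinkChecker` and `LinkRecord` names are hypothetical, an in-memory object stands in for the database row, and treating any 2xx or 3xx status as "valid" is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.time.Instant;

public class LinkChecker {

    /** Hypothetical stand-in for one row of the link table. */
    static final class LinkRecord {
        final String url;
        Instant lastValid;    // time of the most recent successful check
        Instant firstInvalid; // time of the first failed check since the last success
        int lastStatus;       // status from the most recent check (-1 = no response)

        LinkRecord(String url) { this.url = url; }

        /** Record the outcome of one check made at time 'when'. */
        void record(int status, Instant when) {
            lastStatus = status;
            if (isValid(status)) {
                lastValid = when;
                firstInvalid = null;  // link works again, so close the failure window
            } else if (firstInvalid == null) {
                firstInvalid = when;  // only the first failure opens the window
            }
        }

        boolean currentlyValid() { return isValid(lastStatus); }
    }

    /** Assumption: 2xx and 3xx responses count as valid. */
    static boolean isValid(int status) {
        return status >= 200 && status < 400;
    }

    /** Send an HTTP HEAD request; return the status code, or -1 if there is no usable response. */
    static int headStatus(String url) {
        try {
            URLConnection raw = URI.create(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1;  // not an http/https URL
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");           // headers only, no document body
            conn.setInstanceFollowRedirects(false);  // report redirects instead of following them
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            conn.disconnect();
            return code;
        } catch (IOException | IllegalArgumentException e) {
            return -1;  // malformed URL, DNS failure, timeout, refused connection, etc.
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            LinkRecord rec = new LinkRecord(arg);
            rec.record(headStatus(arg), Instant.now());
            System.out.println(rec.url + " " + rec.lastStatus
                    + (rec.currentlyValid() ? " valid" : " INVALID"));
        }
    }
}
```

Keeping only the last valid time and the first invalid time is enough to produce the "became invalid sometime between X and Y" message. Note that some servers answer HEAD with 405 Method Not Allowed, so a real checker might fall back to GET before declaring a link dead.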



Dr. Matthias Laux, a Sun employee in Walldorf, Germany, has a similar idea, although his is less concerned with runtime checking of link traversal than with maintaining bookmark collections. He, too, checks link validity and stores the timestamped results in a relational database. He has written some Java classes that implement his idea, and they're available for download.




Have you heard of web or other hypertext systems going to great lengths to keep users from following bad links? What did they do?