The endnotes for the book I’m now reading are a mixture of conventional citations and URLs. The former, expressed as publisher, book or journal title, author, date, and page number, seem not nearly so useful as the latter. Would you rather visit the library or click a link? But nowadays cited URLs also come with disclaimers like this: Accessed July 27, 2009. It might be inconvenient to verify a conventional citation in its original context, but I know that if I had to, I could. There’s no guarantee that I’ll be able to revisit a cited URL. Even if the page itself has not gone missing, there’s no way to know that the page I view on April 22, 2010 is the same one that the author viewed on July 27, 2009.
This anecdote was the springboard for my conversation with Herbert Van de Sompel about Memento, a proposed (and prototyped) method for adding the dimension of time to the web’s existing mechanism for content negotiation.
That mechanism has, to be sure, not taken the world by storm. The most common scenario involves a browser telling a multilingual server that its user prefers to read, say, French. A paper about Memento published last fall walks through the HTTP protocol that enables this negotiation. Odds are, though, that you’ve never seen this actually happen. It’s much more likely for a multilingual website to present itself as “a multiplication of language-specific mini-sites, instead of thinking of it as one site, with one set of URIs, only with different versions and languages available.” Wikipedia, for example, works that way.
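To make the mechanism concrete: the browser sends an Accept-Language header with weighted preferences, and the server picks the best variant it has. Here's a minimal sketch, in Python, of how a server might parse such a header into an ordered preference list. (This is an illustration of the general idea, not any particular server's implementation.)

```python
# A client's Accept-Language header ranks languages by "q" quality values.
# This sketch parses one into an ordered preference list, so a server
# could serve the best available variant instead of failing outright.

def parse_accept_language(header):
    """Return language tags sorted by descending quality value."""
    prefs = []
    for item in header.split(","):
        parts = item.strip().split(";")
        lang = parts[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        prefs.append((q, lang))
    prefs.sort(key=lambda pair: -pair[0])  # stable sort keeps tied order
    return [lang for q, lang in prefs]

print(parse_accept_language("fr; q=1.0, en; q=0.5, de; q=0.3"))
# A French-preferring browser: ['fr', 'en', 'de']
```

A server holding only English and German variants would walk this list past "fr" and serve "en" — the behavior the W3C recommends, rather than a 406 error.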
The quote comes from a 2006 W3C article, Content Negotiation: Why it is useful, and how to make it work. The article blames the protocol's poor uptake on the awkwardness of Apache's implementation (since corrected):
For a long time, with the most popular negotiation-enabled Web server (the ubiquitous apache), failed negotiation (for instance, a reader of french being proposed only english and german variants of a document), resulted in a nasty “406 not acceptable” HTTP error, which, while technically conforming to HTTP, failed to follow the recommendation that a server should try to serve some resource rather than an error message, whenever possible.
Is there any reason to suppose that time negotiation will succeed where language negotiation has so far mainly failed? That’s a hard question, and one I wish I’d thought to ask Herbert in the interview, but maybe we can continue the dialogue here.
Meanwhile, the fact that content negotiation is tricky to get right doesn’t invalidate the core of the Memento proposal. Time is fundamental, the web could have a reliable memory, and if we can build such a memory into the fabric of the web the benefits will be profound.
Examples are everywhere. Consider mediabugs.org. Founded by Scott Rosenberg, whom I interviewed last week, the site is dedicated to finding and fixing errors in media reports. A few days ago, the first bug was marked Closed:Corrected. The mediabugs.org bug page initially said:
Listing for Josh Kornbluth’s show “Andy Warhol: Good for the Jews?” says the show is at the Jewish Community Center in SF, but actually it’s at The Jewish Theater in the Theater Artaud building.
There’s a comment pointing out the error but it’s still showing with the wrong info on the Express home page.
This is fixed now!
If you visit the original news report, though, there’s no record of the correction. It’s no big deal in this particular case, but media organizations should want to be transparent about when and how they alter published items.
Likewise governments. The Citability project aims to account for the history of changes made to items published on government websites. As with mediabugs.org, the approach will initially require third parties to monitor and chronicle the changes.
The Memento idea is that media organizations, governments, and other kinds of web publishers will account for their own change histories.1 And they'll do so in a standard way, so that people viewing these sites in browsers can straightforwardly say: "Show me this page as it existed on July 27, 2009."
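In protocol terms, Memento's proposal is that a request like that becomes one more negotiated dimension: alongside Accept-Language, the client sends a datetime preference. Here's a sketch of what building such a request's headers could look like — the Accept-Datetime header name follows the Memento proposal, but treat the rest as illustrative, not as the definitive wire format.

```python
# Memento adds a time dimension to content negotiation: alongside
# Accept-Language, a client sends an Accept-Datetime header naming
# the moment it wants. This sketch builds such a header; HTTP dates
# use the RFC 1123 format that email.utils.format_datetime produces.

from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when):
    """Headers asking a Memento-aware server for a page as of `when`."""
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}

# "Show me this page as it existed on July 27, 2009."
wanted = datetime(2009, 7, 27, tzinfo=timezone.utc)
print(memento_headers(wanted))
# {'Accept-Datetime': 'Mon, 27 Jul 2009 00:00:00 GMT'}
```

A Memento-aware server, receiving this header, would answer with (or redirect to) the archived version of the page closest to the requested moment — just as a language-aware server picks the closest available language.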
This is wildly ambitious, but I applaud the ambition. Ever since I made the Heavy Metal umlaut screencast, I have imagined what it would be like to scroll back and forth along the timelines of evolving web pages. At one point Andy Baio sponsored a contest to write a script that would animate the revision history for any Wikipedia page, and I made a screencast of Dan Phiffer's solution.
Clearly we want this. Will it be hard to arrive at a well-known and well-used standard? Sure. Is it worth doing? Absolutely.
1 Third-party watchdogs will often be needed, of course. We’d like to trust self-reported change histories, but we’d also like to verify them. Even so, third parties shouldn’t be the only mechanisms. Self-reported histories should exist.