Open web annotation of audio and video

Hypothesis does open web annotation of text. Let’s unpack those concepts one at a time.

Open, a famously overloaded word, here means:

  • The software that reads, writes, stores, indexes, and searches for annotations is available under an open source license.

  • Open standards govern the representation and exchange of annotations.

  • The annotation system is a good citizen of the open web.

Web annotation, in the most general sense, means:

  • Selecting something within a web resource.

  • Attaching stuff to that selection.

Text, as the Hypothesis annotation client understands it, is HTML, or PDF transformed to HTML. In either case, it’s what you read in a browser, and what you select when you make an annotation.

What’s the equivalent for audio and video? It’s complicated because although browsers enable us to select passages of text, the standard media players built into browsers don’t enable us to select segments of audio and video.

It’s trivial to isolate a quote in a written document. Click to set your cursor to the beginning, then sweep to the end. Now annotation can happen. The browser fires a selection event; the annotation client springs into action; the user attaches stuff to the selection; the annotation server saves that stuff; the annotation client later recalls it and anchors it to the selection.

But selection in audio and video isn’t like selection in text. Nor is it like selection in images, which we easily and naturally crop. Selection of audio and video happens in the temporal domain. If you’ve ever edited audio or video you’ll appreciate what that means. Setting a cursor and sweeping a selection isn’t enough. You can’t know that you got the right intro and outro by looking at the selection. You have to play the selection to make sure it captures what you intended. And since it probably isn’t exactly right, you’ll need to make adjustments that you’ll then want to check, ideally without replaying the whole clip.

All audio and video editors support making and checking selections. But what’s built into modern browsers is a player, not an editor. It provides a slider with one handle. You can drag the handle back and forth to play at different times, but you can’t mark a beginning and an end.

YouTube shares that limitation, by the way. It’s great that you can right-click and Copy video URL at current time. But you can’t mark that as the beginning of a selection, and you can’t mark a corresponding end.

We mostly take this limitation for granted, but as more of our public discourse happens in audio or video, rarely supported by written transcripts, we will increasingly need to be able to cite quotes in the temporal domain.

Open web annotation for audio and video will, therefore, require standard players that support selection in the temporal domain. I expect we’ll get there, but meanwhile we need to provide a way to do temporal selection.

In Annotating Web Audio I described a first cut at a clipping tool that wraps a selection interface around the standard web audio and video players. It was a step in the right direction, but was too complex — and too modal — for convenient use. So I went back to the drawing board and came up with a different approach shown here:

This selection tool has nothing intrinsically to do with annotation. It’s job is to make your job easier when you are constructing a link to an audio or video segment.

For example, in Welcome to the Sapiezoic I reflected on a Long Now talk by David Grinspoon. Suppose I want to support that post with an audio pull quote. The three-and-a-half-minute segment I want to use begins: “The Anthropocene has been proposed as a new epoch…” It ends: “…there has never been a geological force aware of its own existence. And to me that’s a very profound change.”

Try opening and finding the timecodes for the beginning (“The Anthropocene…”) and the end (“…profound change”). The link you want to construct is:,949. That’s a Media Fragments link. It tells a standard media player when to start and stop.

If you click the bare link, your browser will load the audio file into its standard player and play the clip. If you paste the link into a piece of published text, it may even be converted by the publishing system into an inline player. Here’s an annotation on my blog post that does that:

That’s a potent annotation! But to compose it you have to find where the quote starts (12:10) and ends (15:49), then convert those timecodes to seconds. Did you try? It’s doable, but so fiddly that you won’t do it easily or routinely.

The tools at aim to lower that activation threshold. They provide a common interface for mp3 audio, mp4 video, and YouTube video. The interface embeds a standard audio or video player, and adds a two-handled slider that marks the start and end of a clip. You can adjust the start and end and hear (or hear and see) the current intro and outro. At every point you’ve got a pair of synced permalinks. One is the player link, a media fragment URL like,949. The other is the editor link. It records the settings that produce the player link.

As I noted in Annotating Web Audio, these clipping tools are just one way to ease the pain of constructing media fragment URLs. Until standard media players enable selection, we’ll need complementary tools like these to help us do it.

Once we have a way to construct timecoded segments, we can return to the question: “What is open web annotation for audio and video?”

At the moment, I see two complementary flavors. Here’s one:

I’m using a Hypothesis page note to annotate a media fragment URL. There’s no text to which Hypothesis can anchor an annotation, but it can record a note that refers to the whole page. In effect I’m bookmarking the page as I could also do in, for example, Pinboard. The URL encapsulates the selection; annotations that attach to the URL refer to the selection.

This doesn’t yet work as you might expect in Hypothesis. If you visit,949 with Hypothesis you’ll see my annotation, but you’ll also see it if you visit the same URL at #t=1,10, or any other media fragment. That’s helpful in one way, because if there were multiple annotations on the podcast you’d want to discover all of them. But it’s unhelpful in a more important way. If you want to annotate a particular segment for personal use, or to mark it as a pull quote, or because you’re a teacher leading a class discussion that refers to that segment, then you want annotations to refer to the particular segment.

Why doesn’t Hypothesis distinguish between #t=1,10 and #t=730,949? Because what follows the # in a URL is most often not a media fragment, it’s an ordinary fragment that marks a location in text. You’ve probably seen how that works. A page like has intrapage links like and I can send you to those locations in the page by capturing and relaying those fragment-enhanced links. In that case, if we’re annotating the page, we’d most likely want all our annotations to show up, no matter which fragment is focused in the browser. So Hypothesis doesn’t record the fragment when it records the target URL for an annotation.

But there can be special fragments. If I share a Hypothesis link, for example, you’ll land on a link that ends with #annotations:uT06DvGBEeeXDlMfjMFDAg. The Hypothesis client uses that information to scroll the annotated document to the place where the annotation is anchored.

Media fragments could likewise be special. The Hypothesis server, which normally discards media fragments, could record them so that annotations made on a media fragment URL would target the fragment. And I think that’s worth doing. Meanwhile, note that you can annotate the editor links provided by the tools at

This works because the editor links don’t use fragments, only URL parameters, and because Hypothesis regards each uniquely-parameterized URL as a distinct annotation target. Note also that you needn’t use the tools as hosted on my site. They’re just a small set of files that can be hosted anywhere.

Eventually I hope we’ll get open web annotation of audio and video that’s fully native, meaning not only that the standard players support selection, but also that they directly enable creation and viewing of annotations. Until then, though, this flavor of audio/video annotation — let’s call it annotating on media — will require separate tooling both for selecting quotes, and for creating and viewing annotations directly on those quotes.

We’ve already seen the other flavor: annotating with media. To do that with Hypothesis, construct a media fragment URL and cite it in a Hypothesis annotation. What should the annotation point to? That’s up to you. I attached the David Grinspoon pull quote to one of my own blog posts. When I watched a PBS interview with Virginia Eubanks, I captured one memorable segment and attached it to the page on her blog that features the book discussed in the interview.

(If I were Virginia Eubanks I might want to capture the pull quote myself, and display it on my book page for visitors who aren’t seeing it through the Hypothesis lens.)

Open web annotation of audio and video should encompass both of these flavors. You should be able to select a clip within a standard player, and annotate it in situ. And you should be able to use that clip in an annotation.

Until players enable selection, the first flavor — annotating on a segment — will require a separate tool. I’ve provided one implementation, there can be (perhaps already are?) others. However it’s captured, the selection will be represented as a media fragment link. Hypothesis doesn’t yet, but pretty easily could, support annotation of such links in a way that targets media fragments.

The second flavor — annotation with a segment — again requires a way to construct a media fragment link. With that in hand, you can just paste the link into the Hypothesis annotation editor. Links ending with .mp3#t=10,20 and links like will become embedded players that start and end at the indicated times. Links like .mp4#t=10,20 and don’t yet become embedded players but can.

The ideal implementation of open web annotation for audio and video will have to wait for a next generation of standard media players. But you can use Hypothesis today to annotate on as well as with media.

2 thoughts on “Open web annotation of audio and video

  1. I suspect that media fragments experimenters like Aaron Parecki, Marty McGuire, Kevin Marks, and Tantek Çelik will appreciate what you’re doing and will want to play as well as possibly extend it. I’ve already added some of the outline to the IndieWeb wiki page for media fragments (and a link to fragmentions) which has some of their prior work.

    I too look forward to a day where web browsers have some of this standardized and built in as core functionality.

  2. In the service of #DeliberativePolitics (What I in the 1970s called #ParticipatoryDeliberation.) I have managed very little better than this:

    Having worked as broadcast tech I’ve used various methods for //selecting// clips. But displaying them effectively … still have not found an efficient tool for that.


    #ProTension #Exhibitum #Nuts&Bolts #Hammer&Tongs

Leave a Reply