Annotating on, and with, media


At the 2018 I Annotate conference I gave a flash talk on the topic covered in more depth in “Open web annotation of audio and video.” These are my notes for the talk, along with the slides.


Here’s something we do easily and naturally on the web: Make a selection in a page.

The web annotation standard defines a few different ways to describe the selection. Here’s how Hypothesis does it:

We use a variety of selectors to help make the annotation target resilient to change. I’d like you to focus here on the TextPositionSelector, which defines the start and the end of the selection. That’s something we just take for granted in the textual domain. Of course a selection has a start and an end. How could it not?
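To make the start-and-end idea concrete, here is a minimal sketch of how a text selection could be described with two selectors. The selector type names and their fields (`start`, `end`, `exact`, `prefix`, `suffix`) come from the W3C Web Annotation Data Model; the function name `describe_selection`, the 20-character context width, and the sample text are assumptions for illustration, and this is not the code Hypothesis actually runs.

```python
def describe_selection(text: str, start: int, end: int, context: int = 20) -> list[dict]:
    """Build two Web Annotation selectors for the range text[start:end].

    Hypothetical helper for illustration only.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("selection must satisfy 0 <= start < end <= len(text)")
    return [
        # Where the selection sits, as character offsets into the text.
        {"type": "TextPositionSelector", "start": start, "end": end},
        # What was selected, plus nearby context, so the target can be
        # found again if the offsets shift after the page changes.
        {
            "type": "TextQuoteSelector",
            "exact": text[start:end],
            "prefix": text[max(0, start - context):start],
            "suffix": text[end:end + context],
        },
    ]


page = "Of course a selection has a start and an end. How could it not?"
quote = "a start and an end"
start = page.index(quote)
selectors = describe_selection(page, start, start + len(quote))
```

Storing both selectors is what gives the target some resilience: the position is fast to apply, and the quote with its context is a fallback when the position no longer matches.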

An annotation like this gives you a kind of URL that points to a selection in a document. There isn’t yet a standard way to write that URL along with the selectors, so Hypothesis points to that combination (that is, the URL plus the selectors) using an ID, like this:

The W3C have thought about a way to bundle selectors with the URL, so you’d have a standard way to cite a selection, like this:

In any case, the point is we’re moving into a world where selections in web resources have their own individual URLs.

Now let’s look again at this quote from Nancy Pelosi:

That’s not something she wrote. It’s something she said, at the Peter G Peterson Fiscal Summit, that was recorded on video.

Is there a transcript that could be annotated? I looked, and didn’t find anything better than this export of YouTube captions:

But of course we lack transcriptions for quite a lot of web audio and video. And lots of it can’t be transcribed, because it’s music, or silent moving pictures.

Once you’ve got a media selection there are some good ways to represent it with a URL. YouTube does it like this:

And with filetypes like mp3 and mp4, you can use media fragment syntax like this:
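As a rough sketch of both URL styles, the snippet below converts a pair of timestamps into a YouTube embed URL (using the `start` and `end` embed parameters, in seconds) and into a media fragment URL (using the `#t=start,end` form from the W3C Media Fragments URI spec). The function names, the `VIDEO_ID` placeholder, and the `example.com` address are assumptions for illustration.

```python
from urllib.parse import urlencode


def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_embed_url(video_id: str, start: str, end: str) -> str:
    """Embed URL that plays only the selected range of a YouTube video."""
    query = urlencode({"start": to_seconds(start), "end": to_seconds(end)})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


def media_fragment_url(media_url: str, start: str, end: str) -> str:
    """Temporal media fragment URL for a file such as an mp3 or mp4."""
    return f"{media_url}#t={to_seconds(start)},{to_seconds(end)}"


# The 28:20 to 28:36 range is the selection used in this post;
# VIDEO_ID and example.com are placeholders, not real addresses.
print(youtube_embed_url("VIDEO_ID", "28:20", "28:36"))
print(media_fragment_url("https://example.com/talk.mp4", "28:20", "28:36"))
```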

The harder problem in the media domain turns out to be just making the selection in the first place.

Here I am in that video, in the process of figuring out that the selection I’m looking for starts at 28:20 and ends at 28:36.

It’s not a fun or easy process. You can set the start, as I’ve done here, but then you have to scrub around on the timeline looking for the end, and then write that down, and then tack it onto the media fragment URL that you’ve captured.

It’s weird that something so fundamental should be missing, but there just isn’t an easy and natural way to make a selection in web media.

This is not a new problem. Fifteen years ago, these were the dominant media players.

We’ve made some progress since then. The crazy patchwork of plugins and apps has thankfully converged to a standard player that works the same way everywhere.

Which is great. But you still can’t make a selection!

So I wrote a tool that wraps a selection interface around the standard media player. It works with mp3s, mp4s, and YouTube videos. Unlike the standard player, which has a one-handled slider, this thing has a two-handled slider which is kind of obviously what you need to work with the start and end of a selection.

You can drag the handles to set the start and end, and you can nudge them forward and backward by minutes and seconds, and when you’re ready to review the intro and outro for your clip, you can play just a couple of seconds on both ends to check what you’re capturing.
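The logic behind a two-handled slider can be sketched in a few lines. This is a hypothetical model, not the tool’s actual code: the class name `ClipSelection`, its method names, and the two-second default preview length are all assumptions. It shows the two constraints that matter, which are that neither handle can leave the media’s duration and that the start can never pass the end.

```python
from dataclasses import dataclass


@dataclass
class ClipSelection:
    """Hypothetical model of a start/end selection on a media timeline."""
    duration: float  # total length of the media, in seconds
    start: float     # position of the left handle, in seconds
    end: float       # position of the right handle, in seconds

    def nudge_start(self, delta: float) -> None:
        # Move the start handle, clamped to the range [0, end].
        self.start = min(max(0.0, self.start + delta), self.end)

    def nudge_end(self, delta: float) -> None:
        # Move the end handle, clamped to the range [start, duration].
        self.end = max(min(self.duration, self.end + delta), self.start)

    def preview_windows(self, seconds: float = 2.0):
        """Return (intro, outro): the first and last few seconds of the clip."""
        intro = (self.start, min(self.start + seconds, self.end))
        outro = (max(self.end - seconds, self.start), self.end)
        return intro, outro


clip = ClipSelection(duration=3600, start=1700, end=1716)
clip.nudge_start(-1)   # pull the start back by one second
clip.nudge_end(60)     # push the end forward by one minute
intro, outro = clip.preview_windows()
```

Playing only the `intro` and `outro` ranges is enough to check both boundaries without sitting through the whole clip.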

When you’re done, you get a YouTube URL that will play the selection, start to stop, or an mp3 or mp4 media fragment URL that will do the same.

So how does this relate to annotation? In a couple of ways. You can annotate with media, or you can annotate on media.

Here’s what I mean by annotating with media.

I’ve selected some text that wants to be contextualized by the media quote, and I’ve annotated that text with a media fragment link. Hypothesis turns that link into an embedded player (thanks, by the way, to a code contribution from Steel Wagstaff, who’s here, I think). So the media quote will play, start to stop, in this annotation that’s anchored to a text selection at Politifact.

And here’s what I mean by annotating on media.

If I’m actually on a media URL, I can just annotate it. In this case there’s no selection to be made, because the URL encapsulates the selection, so I can just annotate the complete URL.

This is a handy way to produce media fragment URLs that you can use in these ways. I hope someone will come up with a better one than I have. But the tool is begging to be made obsolete when the selection of media fragments becomes as easy and natural as the selection of text has always been.
