Annotating on, and with, media

At the 2018 I Annotate conference I gave a flash talk on the topic covered in more depth in Open web annotation of audio and video. These are my notes for the talk, along with the slides.

Here’s something we do easily and naturally on the web: Make a selection in a page.

The web annotation standard defines a few different ways to describe the selection. Here’s how Hypothesis does it:

We use a variety of selectors to help make the annotation target resilient to change. I’d like you to focus here on the TextPositionSelector, which defines the start and the end of the selection. That’s something we just take for granted in the textual domain: of course a selection has a start and an end. How could it not?
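To make this concrete, here is a minimal sketch of what such an annotation target can look like in the W3C Web Annotation Data Model. The page URL, quoted text, and character offsets are all illustrative, not taken from a real Hypothesis annotation.

```python
# Sketch of an annotation target in the style of the W3C Web Annotation
# Data Model. All values here are hypothetical.
target = {
    "source": "https://example.com/article.html",  # placeholder page URL
    "selector": [
        {
            # Quote-based anchoring: resilient to edits elsewhere in the page
            "type": "TextQuoteSelector",
            "exact": "a selection in a page",
            "prefix": "naturally on the web: Make ",
            "suffix": ".",
        },
        {
            # Position-based anchoring: character offsets for start and end
            "type": "TextPositionSelector",
            "start": 412,
            "end": 434,
        },
    ],
}

# The TextPositionSelector is the part worth noticing here: a selection
# in text always has a start and an end.
print(target["selector"][1]["start"], target["selector"][1]["end"])
```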

An annotation like this gives you a kind of URL that points to a selection in a document. There isn’t yet a standard way to write that URL along with the selectors, so Hypothesis points to that combination — that is, the URL plus the selectors — using an ID, like this:

The W3C has thought about a way to bundle selectors with the URL, so you’d have a standard way to cite a selection, like this:
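The idea, roughly as described in the W3C’s Selectors and States note, is to serialize a selector into the URL fragment. A minimal sketch, with a hypothetical page URL and quoted text:

```python
from urllib.parse import quote

# Sketch of the fragment syntax from the W3C "Selectors and States" note,
# which embeds a selector in the URL fragment. Page URL and quoted text
# are placeholders.
page = "https://example.com/article.html"
exact = "a selection in a page"

cite_url = f"{page}#selector(type=TextQuoteSelector,exact={quote(exact)})"
print(cite_url)
```

The result is a single string that carries both the document’s address and the selection within it, which any selector-aware client could resolve.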

In any case, the point is we’re moving into a world where selections in web resources have their own individual URLs.

Now let’s look again at this quote from Nancy Pelosi:

That’s not something she wrote. It’s something she said, at the Peter G. Peterson Fiscal Summit, that was recorded on video.

Is there a transcript that could be annotated? I looked, and didn’t find anything better than this export of YouTube captions:

But of course we lack transcriptions for quite a lot of web audio and video. And lots of it can’t be transcribed, because it’s music, or silent moving pictures.

Once you’ve got a media selection there are some good ways to represent it with a URL. YouTube does it like this:

And with filetypes like mp3 and mp4, you can use media fragment syntax like this:
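The two URL styles can be sketched like this; the video ID and file URL are placeholders, and the times are in seconds:

```python
# Two ways to point at a media selection as a URL.
# Times in seconds: 28:20 to 28:36 of the talk.
start, end = 1700, 1716

# YouTube's embedded player accepts start/end query parameters
youtube = f"https://www.youtube.com/embed/VIDEO_ID?start={start}&end={end}"

# The W3C Media Fragments syntax works with plain mp3/mp4 files,
# using a #t=start,end fragment
mp3 = f"https://example.com/audio.mp3#t={start},{end}"

print(youtube)
print(mp3)
```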

The harder problem in the media domain turns out to be just making the selection in the first place.

Here I am in that video, in the process of figuring out that the selection I’m looking for starts at 28:20 and ends at 28:36.

It’s not a fun or easy process. You can set the start, as I’ve done here, but then you have to scrub around on the timeline looking for the end, and then write that down, and then tack it onto the media fragment URL that you’ve captured.
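The bookkeeping involved is simple arithmetic, which is exactly why it’s annoying to do by hand. A sketch, using a placeholder file URL:

```python
def to_seconds(timestamp: str) -> int:
    """Convert a mm:ss (or hh:mm:ss) timestamp to whole seconds."""
    total = 0
    for part in timestamp.split(":"):
        total = total * 60 + int(part)
    return total

# The selection described above: starts at 28:20, ends at 28:36
start = to_seconds("28:20")  # 1700
end = to_seconds("28:36")    # 1716

# Tack the times onto a media fragment URL (file URL is hypothetical)
clip = f"https://example.com/talk.mp4#t={start},{end}"
print(clip)
```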

It’s weird that something so fundamental should be missing, but there just isn’t an easy and natural way to make a selection in web media.

This is not a new problem. Fifteen years ago, these were the dominant media players.

We’ve made some progress since then. The crazy patchwork of plugins and apps has thankfully converged to a standard player that works the same way everywhere.

Which is great. But you still can’t make a selection!

So I wrote a tool that wraps a selection interface around the standard media player. It works with mp3s, mp4s, and YouTube videos. Unlike the standard player, which has a one-handled slider, this thing has a two-handled slider which is kind of obviously what you need to work with the start and end of a selection.

You can drag the handles to set the start and end, and you can nudge them forward and backward by minutes and seconds, and when you’re ready to review the intro and outro for your clip, you can play just a couple of seconds on both ends to check what you’re capturing.

When you’re done, you get a YouTube URL that will play the selection, start to stop, or an mp3 or mp4 media fragment URL that will do the same.

So how does this relate to annotation? In a couple of ways. You can annotate with media, or you can annotate on media.

Here’s what I mean by annotating with media.

I’ve selected some text that wants to be contextualized by the media quote, and I’ve annotated that text with a media fragment link. Hypothesis turns that link into an embedded player (thanks, by the way, to a code contribution from Steel Wagstaff, who’s here, I think). So the media quote will play, start to stop, in this annotation that’s anchored to a text selection at Politifact.

And here’s what I mean by annotating on media.

If I’m actually on a media URL, I can just annotate it. In this case there’s no selection to be made; the URL encapsulates the selection, so I can annotate the complete URL.

This is a handy way to produce media fragment URLs that you can use in these ways. I hope someone will come up with a better one than I have. But the tool is begging to be made obsolete when the selection of media fragments becomes as easy and natural as the selection of text has always been.

Talking with Scott Rosenberg about Say Everything, Dreaming in Code, and MediaBugs

My guest for this week’s Innovators show is Scott Rosenberg. He’s the author of two books, most recently Say Everything, subtitled How Blogging Began, What It’s Becoming, and Why It Matters. Before that he was the Chandler project’s embedded journalist, and told its story in Dreaming in Code. His current project is MediaBugs, a soon-to-be-launched service that aims to crowd-source the reporting and correction of errors in media coverage.

We began with a discussion of Say Everything. Its account of how blogging came to be is a great read, and a much-needed history of the era. Since I know that story quite well, though, we focused on the blogosphere’s present state and future prospects. Blogging is still a new medium. But those of us who experienced blogging as a conversation flowing through decentralized networks of blogs have now seen still newer (and more centralized) social media capture a lot of that conversation.

The good news is that more people are able to be involved. The fact that millions of people fired up blogs was, and remains, astonishing. But active blogging has proven to be a hard thing to sustain. Meanwhile hordes of people find it relatively easy to be active on Facebook and Twitter.

The bad news is that, as always, there’s no free lunch. While it’s easier to create and sustain network effects using Facebook and Twitter, you sacrifice control of your own data. Scott thinks we’re moving through a transitional phase, and I hope he’s right. We really need the best of both worlds. First, control of the avatars we project into the cloud, and of the data that surrounds them, insofar as that’s possible. Second, frictionless interaction. The tension between these two conflicting needs will define the future of social media.

Two of Scott’s other projects, Dreaming in Code and MediaBugs, are connected in an interesting way. The media project adopts terminology (“filing bugs”) and process (version control, issue tracking) from the realm of software. If MediaBugs helps make non-technical people aware of that crucial way of thinking and acting, it will be a bonus outcome.