Open web annotation of audio and video

Hypothesis does open web annotation of text. Let’s unpack those concepts one at a time.

Open, a famously overloaded word, here means:

  • The software that reads, writes, stores, indexes, and searches for annotations is available under an open source license.

  • Open standards govern the representation and exchange of annotations.

  • The annotation system is a good citizen of the open web.

Web annotation, in the most general sense, means:

  • Selecting something within a web resource.

  • Attaching stuff to that selection.

Text, as the Hypothesis annotation client understands it, is HTML, or PDF transformed to HTML. In either case, it’s what you read in a browser, and what you select when you make an annotation.

What’s the equivalent for audio and video? It’s complicated because although browsers enable us to select passages of text, the standard media players built into browsers don’t enable us to select segments of audio and video.

It’s trivial to isolate a quote in a written document. Click to set your cursor to the beginning, then sweep to the end. Now annotation can happen. The browser fires a selection event; the annotation client springs into action; the user attaches stuff to the selection; the annotation server saves that stuff; the annotation client later recalls it and anchors it to the selection.

But selection in audio and video isn’t like selection in text. Nor is it like selection in images, which we easily and naturally crop. Selection of audio and video happens in the temporal domain. If you’ve ever edited audio or video you’ll appreciate what that means. Setting a cursor and sweeping a selection isn’t enough. You can’t know that you got the right intro and outro by looking at the selection. You have to play the selection to make sure it captures what you intended. And since it probably isn’t exactly right, you’ll need to make adjustments that you’ll then want to check, ideally without replaying the whole clip.

All audio and video editors support making and checking selections. But what’s built into modern browsers is a player, not an editor. It provides a slider with one handle. You can drag the handle back and forth to play at different times, but you can’t mark a beginning and an end.

YouTube shares that limitation, by the way. It’s great that you can right-click and Copy video URL at current time. But you can’t mark that as the beginning of a selection, and you can’t mark a corresponding end.

We mostly take this limitation for granted, but as more of our public discourse happens in audio or video, rarely supported by written transcripts, we will increasingly need to be able to cite quotes in the temporal domain.

Open web annotation for audio and video will, therefore, require standard players that support selection in the temporal domain. I expect we’ll get there, but meanwhile we need to provide a way to do temporal selection.

In Annotating Web Audio I described a first cut at a clipping tool that wraps a selection interface around the standard web audio and video players. It was a step in the right direction, but was too complex — and too modal — for convenient use. So I went back to the drawing board and came up with a different approach shown here:

This selection tool has nothing intrinsically to do with annotation. Its job is to make your job easier when you are constructing a link to an audio or video segment.

For example, in Welcome to the Sapiezoic I reflected on a Long Now talk by David Grinspoon. Suppose I want to support that post with an audio pull quote. The three-and-a-half-minute segment I want to use begins: “The Anthropocene has been proposed as a new epoch…” It ends: “…there has never been a geological force aware of its own existence. And to me that’s a very profound change.”

Try opening http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3 and finding the timecodes for the beginning (“The Anthropocene…”) and the end (“…profound change”). The link you want to construct is: http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3#t=730,949. That’s a Media Fragments link. It tells a standard media player when to start and stop.

If you click the bare link, your browser will load the audio file into its standard player and play the clip. If you paste the link into a piece of published text, it may even be converted by the publishing system into an inline player. Here’s an annotation on my blog post that does that:

That’s a potent annotation! But to compose it you have to find where the quote starts (12:10) and ends (15:49), then convert those timecodes to seconds. Did you try? It’s doable, but so fiddly that you won’t do it easily or routinely.
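The conversion itself is easy to script. Here is a minimal Python sketch, assuming timecodes written as mm:ss or hh:mm:ss. The helper names to_seconds and media_fragment_url are my own illustrations, not part of any tool mentioned here.

```python
def to_seconds(timecode: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' to a whole number of seconds."""
    seconds = 0
    for part in timecode.split(":"):
        # Each colon-separated field is worth 60x the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds


def media_fragment_url(media_url: str, start: str, end: str) -> str:
    """Append a temporal media fragment (#t=start,end) to a media URL."""
    return f"{media_url}#t={to_seconds(start)},{to_seconds(end)}"


clip = media_fragment_url(
    "http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3",
    "12:10",
    "15:49",
)
print(clip)
```

That handles the arithmetic, but not the harder part: finding the right start and end by ear.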

The tools at http://jonudell.net/av/ aim to lower that activation threshold. They provide a common interface for mp3 audio, mp4 video, and YouTube video. The interface embeds a standard audio or video player, and adds a two-handled slider that marks the start and end of a clip. You can adjust the start and end and hear (or hear and see) the current intro and outro. At every point you’ve got a pair of synced permalinks. One is the player link, a media fragment URL like http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3#t=730,949. The other is the editor link. It records the settings that produce the player link.

As I noted in Annotating Web Audio, these clipping tools are just one way to ease the pain of constructing media fragment URLs. Until standard media players enable selection, we’ll need complementary tools like these to help us do it.

Once we have a way to construct timecoded segments, we can return to the question: “What is open web annotation for audio and video?”

At the moment, I see two complementary flavors. Here’s one:

I’m using a Hypothesis page note to annotate a media fragment URL. There’s no text to which Hypothesis can anchor an annotation, but it can record a note that refers to the whole page. In effect I’m bookmarking the page as I could also do in, for example, Pinboard. The URL encapsulates the selection; annotations that attach to the URL refer to the selection.

This doesn’t yet work as you might expect in Hypothesis. If you visit http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3#t=730,949 with Hypothesis you’ll see my annotation, but you’ll also see it if you visit the same URL at #t=1,10, or any other media fragment. That’s helpful in one way, because if there were multiple annotations on the podcast you’d want to discover all of them. But it’s unhelpful in a more important way. If you want to annotate a particular segment for personal use, or to mark it as a pull quote, or because you’re a teacher leading a class discussion that refers to that segment, then you want annotations to refer to the particular segment.

Why doesn’t Hypothesis distinguish between #t=1,10 and #t=730,949? Because what follows the # in a URL is most often not a media fragment, it’s an ordinary fragment that marks a location in text. You’ve probably seen how that works. A page like example.com/page1 has intrapage links like example.com/page1#section1 and example.com/page1#section2. I can send you to those locations in the page by capturing and relaying those fragment-enhanced links. In that case, if we’re annotating the page, we’d most likely want all our annotations to show up, no matter which fragment is focused in the browser. So Hypothesis doesn’t record the fragment when it records the target URL for an annotation.
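To make the consequence concrete, here is a small Python sketch of fragment-stripping normalization. It is not the Hypothesis code, just an illustration of the behavior described above, and the function name annotation_target is hypothetical.

```python
from urllib.parse import urldefrag


def annotation_target(url: str) -> str:
    """Drop the fragment so every fragment of a page maps to one target."""
    return urldefrag(url).url


podcast = "http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3"

# Two different clips collapse to the same annotation target.
print(annotation_target(podcast + "#t=1,10"))
print(annotation_target(podcast + "#t=730,949"))

# So do two different sections of an ordinary page.
print(annotation_target("http://example.com/page1#section1"))
print(annotation_target("http://example.com/page1#section2"))
```

Under this kind of normalization, a media fragment is treated exactly like an intrapage anchor, which is why my annotation shows up on every clip of the podcast.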

But there can be special fragments. If I share a Hypothesis link, for example https://hyp.is/uT06DvGBEeeXDlMfjMFDAg, you’ll land on a link that ends with #annotations:uT06DvGBEeeXDlMfjMFDAg. The Hypothesis client uses that information to scroll the annotated document to the place where the annotation is anchored.
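Here is a rough Python sketch of how a client might recognize that kind of special fragment. It assumes only the #annotations:ID form shown above. The function name is hypothetical, and example.com/article stands in for whatever page the direct link lands on.

```python
from typing import Optional
from urllib.parse import urlsplit

PREFIX = "annotations:"


def annotation_id(url: str) -> Optional[str]:
    """Return the annotation ID carried in the fragment, if there is one."""
    fragment = urlsplit(url).fragment
    if fragment.startswith(PREFIX):
        # An empty ID after the prefix is treated as no ID at all.
        return fragment[len(PREFIX):] or None
    return None


print(annotation_id("https://example.com/article#annotations:uT06DvGBEeeXDlMfjMFDAg"))
print(annotation_id("https://example.com/article#section1"))
```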

Media fragments could likewise be special. The Hypothesis server, which normally discards media fragments, could record them so that annotations made on a media fragment URL would target the fragment. And I think that’s worth doing. Meanwhile, note that you can annotate the editor links provided by the tools at http://jonudell.net/av/:

This works because the editor links don’t use fragments, only URL parameters, and because Hypothesis regards each uniquely-parameterized URL as a distinct annotation target. Note also that you needn’t use the tools as hosted on my site. They’re just a small set of files that can be hosted anywhere.
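For illustration, here is a Python sketch of a normalization rule that would treat media fragments as special while still dropping ordinary ones. It is an assumption about how this could be done, not a description of the Hypothesis server, and it recognizes only the simple t=start,end form in seconds used in this post. The function name is hypothetical.

```python
import re
from urllib.parse import urldefrag

# Only the simple temporal form: t=start or t=start,end, in seconds.
TEMPORAL = re.compile(r"^t=\d+(\.\d+)?(,\d+(\.\d+)?)?$")


def target_with_media_fragment(url: str) -> str:
    """Keep a temporal media fragment, drop any other kind of fragment."""
    base, fragment = urldefrag(url)
    if TEMPORAL.match(fragment):
        return f"{base}#{fragment}"
    return base


podcast = "http://podcast.longnow.org/salt/salt-020170906-grinspoon-podcast.mp3"
print(target_with_media_fragment(podcast + "#t=730,949"))  # fragment kept
print(target_with_media_fragment("http://example.com/page1#section1"))  # fragment dropped
```

Query parameters are untouched by either rule, which is why the parameterized editor links already work as distinct targets.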

Eventually I hope we’ll get open web annotation of audio and video that’s fully native, meaning not only that the standard players support selection, but also that they directly enable creation and viewing of annotations. Until then, though, this flavor of audio/video annotation — let’s call it annotating on media — will require separate tooling both for selecting quotes, and for creating and viewing annotations directly on those quotes.

We’ve already seen the other flavor: annotating with media. To do that with Hypothesis, construct a media fragment URL and cite it in a Hypothesis annotation. What should the annotation point to? That’s up to you. I attached the David Grinspoon pull quote to one of my own blog posts. When I watched a PBS interview with Virginia Eubanks, I captured one memorable segment and attached it to the page on her blog that features the book discussed in the interview.

(If I were Virginia Eubanks I might want to capture the pull quote myself, and display it on my book page for visitors who aren’t seeing it through the Hypothesis lens.)

Open web annotation of audio and video should encompass both of these flavors. You should be able to select a clip within a standard player, and annotate it in situ. And you should be able to use that clip in an annotation.

Until players enable selection, the first flavor, annotating on a segment, will require a separate tool. I’ve provided one implementation; there can be (perhaps already are?) others. However it’s captured, the selection will be represented as a media fragment link. Hypothesis doesn’t yet support annotation of such links in a way that targets media fragments, but it pretty easily could.

The second flavor — annotation with a segment — again requires a way to construct a media fragment link. With that in hand, you can just paste the link into the Hypothesis annotation editor. Links ending with .mp3#t=10,20 and links like youtube.com/watch?v=Avxm7JYjk8M&start=10&end=20 will become embedded players that start and end at the indicated times. Links like .mp4#t=10,20 and youtube.com/embed/Avxm7JYjk8M?start=10&end=20 don’t yet become embedded players but can.

The ideal implementation of open web annotation for audio and video will have to wait for a next generation of standard media players. But you can use Hypothesis today to annotate on as well as with media.

How to improve Wikipedia citations with Hypothesis direct links

Wikipedia aims to be verifiable. Every statement of fact should be supported by a reliable source that the reader can check. Citations in Wikipedia typically refer to online documents accessible at URLs. But with the advent of standard web annotation we can do better. We can add citations to Wikipedia that refer precisely to statements that support Wikipedia articles.

According to Wikipedia’s policy on citing sources:

Wikipedia’s Verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in article space.

Last night, reading https://en.wikipedia.org/wiki/Tubbs_Fire, I noticed this unsourced quote:

Sonoma County has four “historic wildfire corridors,” including the Hanly Fire area.

I searched for the source of that quotation, found it in a Press Democrat story, annotated the quote, and captured a Hypothesis direct link to the annotation. In this screenshot, I’ve clicked the annotation’s share icon, and then clicked the clipboard icon to copy the direct link to the clipboard. The direct link encapsulates the URL of the story, plus the information needed to locate the quotation within the story.

Given such a direct link, it’s straightforward to use it in a Wikipedia citation. Back in the Wikipedia page I clicked the Edit link, switched to the visual editor, set my cursor at the end of the unsourced quote, and clicked the visual editor’s Cite button to invoke this panel:

There I selected the news template, and filled in the form in the usual way, providing the title of the news story, its date, its author, the name of the publication, and the date on which I accessed the story. There was just one crucial difference. Instead of using the Press Democrat URL, I used the Hypothesis direct link.

And voilà! There’s my citation, number 69, nestled among all the others.

Citation, as we’ve known it, begs to be reinvented in the era of standard web annotation. When I point you to a document in support of a claim, I’m often thinking of a particular statement in that document. But the burden is on you to find that statement in the document to which my citation links. And when you do, you may not be certain you’ve found the statement implied by my link. When I use a direct link, I relieve you of that burden and uncertainty. You land in the cited document at the right place, with the supporting statement highlighted. And if it’s helpful we can discuss the supporting statement in that context.

I can envision all sorts of ways to turbocharge Wikipedia’s workflow with annotation-powered tools. But no extra tooling is required to use Hypothesis and Wikipedia in the way I’ve shown here. If you find an unsourced quote in Wikipedia, just annotate it in its source context, capture the direct link, and use it in the regular citation workflow. For a reader who clicks through Wikipedia citations to check original sources, this method yields a nice improvement over the status quo.

Annotating Web Audio

On a recent walk I listened to Unmasking Misogyny on Radio Open Source. One of the guests, Danielle McGuire, told the story of Rosa Parks’ activism in a way I hadn’t heard before. I wanted to capture that segment of the show, save a link to it, and bookmark the link for future reference.

If you visit the show page and click the download link, you’ll load the show’s MP3 file into your browser’s audio player. Nowadays that’s almost always going to be the basic HTML5 player. Here’s what it looks like in various browsers:

The show is about an hour long. I scrubbed along the timeline until I heard Danielle McGuire’s voice, and then zeroed in on the start and end of the segment I wanted to capture. It starts at 18:14 and ends at 21:11. Now, how to link to that segment?

I first investigated this problem in 2004. Back then, I learned that it’s possible to fetch and play arbitrary parts of MP3 files, and I made a web app that would figure out, given start and stop times like 18:14 and 21:11, which part of the MP3 file to fetch and play. Audio players weren’t (and still aren’t) optimized for capturing segments as pairs of minute:second parameters. But once you acquired those parameters, you could form a link that would invoke the web app and play the segment. Such links could then be curated, something I often did using the del.icio.us bookmarking app.
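I won’t reconstruct that 2004 app here, but the general idea can be sketched. The Python below is one plausible approach, not the original code. It assumes a constant-bitrate MP3 whose bitrate is known in advance, and it ignores frame alignment, so a decoder would likely need to resynchronize at the next frame boundary. The function name and the audio_offset parameter (bytes of leading metadata to skip) are hypothetical.

```python
def byte_range(start_sec: int, end_sec: int, bitrate_kbps: int, audio_offset: int = 0) -> str:
    """Build an HTTP Range header value for start_sec..end_sec of a CBR MP3."""
    # MP3 bitrates are quoted in thousands of bits per second.
    bytes_per_second = bitrate_kbps * 1000 // 8
    first = audio_offset + start_sec * bytes_per_second
    last = audio_offset + end_sec * bytes_per_second - 1
    return f"bytes={first}-{last}"


# 18:14 is 1094 seconds and 21:11 is 1271 seconds; 128 kbps is an assumed bitrate.
print(byte_range(18 * 60 + 14, 21 * 60 + 11, bitrate_kbps=128))
```

A value like this would go in the Range header of an HTTP request, assuming the server hosting the file supports range requests.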

Revisiting those bookmarks now, I’m reminded that Doug Kaye and I were traveling the same path. At ITConversations he had implemented a clipping service that produced URLs like this:

http://www.itconversations.com/clip.php?showid=378&start=4:37&stop=6:04

Mine looked like this:

http://udell.infoworld.com/?url=http://www.itconversations.com/audio/download/ITConversations-378.mp3&beg=4:37&end=6:04

Both of those audio-clipping services are long gone. But the audio files survive, thanks to the exemplary cooperation between ITConversations and the Internet Archive. So now I can resurrect that ITConversations clip — in which Doug Engelbart, at the Accelerating Change conference in 2004, describes the epiphany that inspired his lifelong quest — like so:

http://jonudell.net/av/audio.html?url=http://jonudell.net/audio/ITC.AC2004-DougEngelbart-2004.11.07.mp3&startmin=4&startsec=37&endmin=6&endsec=4

And here’s the segment of Danielle McGuire’s discussion of Rosa Parks that I wanted to remember:

http://jonudell.net/av/audio.html?url=http://ia601506.us.archive.org/5/items/171207OSPODCASTSexualHarassment/171207-OS-PODCAST-SexualHarassment.mp3&startmin=18&startsec=14&endmin=21&endsec=11

This single-page JavaScript app aims to function both as a player of predefined segments, and as a tool that makes it as easy as possible to define segments. It’s still a work in progress, but I’m able to use it effectively even as I continue to refine the interaction details.
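Links of this shape are easy to generate. Here is a minimal Python sketch that builds one from a media URL and a pair of minute:second timecodes. The parameter names (url, startmin, startsec, endmin, endsec) are taken from the two example links above; the helper names are my own, and I’m assuming nothing about what other parameters the app might accept.

```python
from urllib.parse import urlencode

EDITOR = "http://jonudell.net/av/audio.html"


def split_timecode(timecode: str) -> tuple:
    """Split 'mm:ss' into integer minutes and seconds."""
    minutes, seconds = timecode.split(":")
    return int(minutes), int(seconds)


def clip_link(media_url: str, start: str, end: str) -> str:
    """Build a clip link in the same shape as the examples above."""
    startmin, startsec = split_timecode(start)
    endmin, endsec = split_timecode(end)
    params = {
        "url": media_url,
        "startmin": startmin,
        "startsec": startsec,
        "endmin": endmin,
        "endsec": endsec,
    }
    # Leave ':' and '/' unescaped so the embedded media URL stays readable.
    return f"{EDITOR}?{urlencode(params, safe=':/')}"


link = clip_link(
    "http://ia601506.us.archive.org/5/items/171207OSPODCASTSexualHarassment/171207-OS-PODCAST-SexualHarassment.mp3",
    "18:14",
    "21:11",
)
print(link)
```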

For curation of these clips I am, of course, using Hypothesis. Here are some of the clips I’ve collected on the tag AnnotatingAV:

To create these annotations I’m using Hypothesis page notes. An annotation of this type is like a del.icio.us or pinboard.in bookmark. It refers to the whole resource addressed by a URL, rather than to a segment of interest within a resource.

Most often, a Hypothesis user defines a segment of interest by selecting a passage of text in a web document. But if you’re not annotating any particular selection, you can use a page note to comment on, tag, and discuss the whole document.

Since each audio clip defines a segment as a standalone web page with a unique URL, you can use a Hypothesis page note to annotate that standalone page:

It’s a beautiful example of small pieces loosely joined. My clipping tool is just one way to form URLs that point to audio and video segments. I hope others will improve on it. But any clipping tool that produces unique URLs can work with Hypothesis and, of course, with any other annotation or curation tool that targets URLs.

Syndicating annotations

Steel Wagstaff asks:

Immediate issue: we’ve got books on our dev server w/ annotations & want to move them intact to our production instance. The broader use case: I publish an open Pressbook & users make public comments on it. Someone else wants to clone the book including comments. How?

There are currently three URL-independent identifiers that can be used to coalesce annotations across instances of a web document published at different URLs. The first was the PDF fingerprint, the second was the DOI, and a third, introduced recently as part of Hypothesis’ EPUB support, uses Dublin Core metadata like so:

<meta name="dc.identifier" content="xchapter_001">
<meta name="dc.relation.ispartof" content="org.example.hypothesis.demo.epub-samples.moby-dick-basic">

If you dig into our EPUB.js and Readium examples, you’ll find those declarations are common to both instances of chapter 1 of Moby Dick. Here’s an annotation anchored to the opening line, Call me Ishmael. When the Hypothesis client loads, in a page served from either of the example URLs, it queries for two identifiers. One is the URL specific to each instance. The other is a URN formed from the common metadata, and it looks like this:

urn:x-dc:org.example.hypothesis.demo.epub-samples.moby-dick-basic/xchapter_001

When you annotate either copy, you associate its URL with this Uniform Resource Name (URN). You can search for annotations using either of the URLs, or just the URN, like so:

https://hypothes.is/search?q=url:urn:x-dc:org.example.hypothesis.demo.epub-samples.moby-dick-basic/xchapter_001
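To show how the pieces fit together, here is a Python sketch that reads the two Dublin Core declarations from a page and forms the URN and the search link. It mirrors the pattern shown above rather than the Hypothesis client’s actual code, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreMeta(HTMLParser):
    """Collect <meta name="dc.*" content="..."> declarations."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name.startswith("dc."):
            self.fields[name] = attributes.get("content") or ""


def dc_urn(html: str) -> str:
    """Form urn:x-dc:<ispartof>/<identifier> from a page's metadata."""
    parser = DublinCoreMeta()
    parser.feed(html)
    return "urn:x-dc:{}/{}".format(
        parser.fields["dc.relation.ispartof"],
        parser.fields["dc.identifier"],
    )


HEAD = """
<meta name="dc.identifier" content="xchapter_001">
<meta name="dc.relation.ispartof" content="org.example.hypothesis.demo.epub-samples.moby-dick-basic">
"""

urn = dc_urn(HEAD)
search_url = "https://hypothes.is/search?q=url:" + urn
print(urn)
print(search_url)
```

Any two copies of a chapter that carry the same pair of declarations yield the same URN, whatever URLs they are served from.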

Although it sprang to life to support ebooks, I think this mechanism will prove more broadly useful. Unlike PDF fingerprints and DOIs, which typically identify whole works, it can be used to name chapters and sections. At a conference last year we spoke with OER (open educational resource) publishers, including Pressbooks, about ways to coalesce annotations across their platforms. I’m not sure this approach is the final solution, but it’s usable now, and I hope pioneers like Steel Wagstaff will try it out and help us think through the implications.

Really, AT&T?

We woke up this morning in Santa Rosa to smoke and sirens. Last night’s winds fanned a bunch of wildfires in the North Bay, and parts of our town are destroyed. We’re a few miles south of the evacuation zone, things might shift around, but at the moment we’re staying put.

Information was hard to come by. Our Comcast service is down. Our AT&T phones were up and running though, so I turned on my mobile hotspot and read this: “Cannot turn on hotspot, please visit att.com or call 611.”

WTF?

Here’s what happened. Last month we started hitting data overage charges. That hadn’t been an issue before, but we dropped by the AT&T store to review our options. The sales rep pushed hard for an upgrade to an unlimited data plan for an extra $5/mo. Not really necessary, we’re almost always on WiFi, but OK, sure, five bucks, why not?

Turns out the rep neglected to mention that the upgrade removed our tethering capability. This is not something you want to find out while breathing smoke, hearing sirens, and trying to make sense of the latest evacuation map for your burning city. According to the rep we spoke with today, this critical fact about tethering is often omitted from the upsell pitch.

We’ve got it turned back on now. I’m awaiting a callback from a manager’s manager about the additional $30/mo they plan to charge to restore a capability they hadn’t told me they were taking away.

This is a small thing. We are, of course, infinitely luckier than a bunch of folks in our town who will return to the charred foundations of what were their homes. We’re mainly thinking about them today. But while we’re waiting for the ash to settle, I just want to say: Really, AT&T?

Welcome to the Sapiezoic

The latest Long Now podcast, by David Grinspoon, takes a very long view indeed. As we transition from the Holocene to the Anthropocene, he thinks, we’re not just entering a new geological epoch, as shown here:

That alone would be a big deal. But epochs are just geologic eyeblinks a few million years in duration. Grinspoon thinks we might be entering something way bigger. Not just a period or an era. We might happen to be alive now at an eon boundary, as shown here:

There have only been four eons so far. Each was a major transition in earth history: a shift in the relationship between life and the planet. Life first emerged at the beginning of the Archean eon around 4 billion years ago, when things cooled down enough. Around 2.5 billion years ago, cyanobacteria learned to photosynthesize. They bathed the world in oxygen, caused mass extinction, and deeply entangled life with the physical and chemical workings of the planet. At the boundary between the Proterozoic and Phanerozoic, around 500 million years ago, life went multicellular, plants and animals appeared, and the modern eon began.

Will our descendants look back on the Anthropocene as the dawn of a fifth eon? Grinspoon makes a compelling case that they might. The Archean cyanobacteria that poisoned the environment couldn’t know what they were doing. We can. Our infrastructure is taking over the planet as surely as the oxygenated atmosphere did. But it is, at least potentially, under our conscious control. The hallmark of the Sapiezoic eon he envisions: intentional re-terraforming.

Grinspoon cites one shining example: the Montreal Protocol that will, if we stay the course, reverse manmade ozone depletion. “We are as gods,” says Stewart Brand, “so we might as well get good at it.”

I’ve listened to all the Long Now podcasts, some more than once. This one rates very highly. It’s a great talk.

Fact-checking Naomi Klein’s “No Is Not Enough”

On a hike last week I heard an excellent episode of Radio Open Source, featuring Naomi Klein, David Graeber, and Pankaj Mishra. One of the segments of interest that stuck with me is this remark by Naomi Klein:

We need to examine the way in which politics has been taken over by the logic of corporate branding, which is not something Trump started. Trump was just better at it than anybody else because he is himself a fully commercialized brand. So the table was set for Trump, he just showed up and said, “Well, I know this game better than you jokers, I’m the real thing, I’m a reality TV star and I’m a megabrand. Step aside!”

(If I could, I would link you directly to that segment in situ, that’s something I had working a long time ago, but since audio quotation still isn’t a ubiquitous feature of the web, here’s the compelling minute of audio that contains that quote.)

I was previously aware of Naomi Klein but had never heard her speak, had read none of her books, and was only slightly familiar with her critique of corporatized politics. Her conversation with Chris Lydon on that podcast prompted me to read her new book, No Is Not Enough, published just a few weeks ago.

I was also slightly familiar with criticism of Klein’s views. So, in a moment when the president of the United States had just tweeted a video of himself performing a mock attack during his time as a reality TV personality on the pro wrestling circuit, I was curious to know her thoughts but also prepared to take them with a grain of salt.

Here’s the book’s table of contents:

The first section elaborates on the above quote. Human megabrands are, Klein points out, a relatively new thing. She writes:

People keep asking — is he going to divest? Is he going to sell his businesses? Is Ivanka going to? But it’s not at all clear what these questions even mean, because their primary businesses are their names. You can’t disentangle Trump the man from Trump the brand; those two entities merged long ago. Every time he sets foot in one of his properties — a golf club, a hotel, a beach club — White House press corps in tow, he is increasing his overall brand value, which allows his company to sell more memberships, rent more rooms, and increase fees.

I hope we can agree across ideologies that this kind of thing is unhealthy. In the audio clip I cited above, Klein notes that the antidote is not a liberal megabrand, not Zuckerberg or Oprah. Conflation of brand power and political power is just a bad idea, and we need to reckon with that.

The rest of the book builds on arguments made in her earlier ones: Capitalism’s winners exploit natural and man-made crises to consolidate their winnings (The Shock Doctrine); climate change presents an existential challenge to that world order (This Changes Everything). Since I haven’t read those books, and have only just now read a few reviews pro and con, I lack the full context needed to evaluate the arguments in No Is Not Enough. But that’s exactly the right setup for the point about fact-checking that I want to make here.

How reliable is Naomi Klein on facts? I came to No Is Not Enough with no strong opinion one way or another. I raised an eyebrow, though, when I read this passage about Treasury secretary Steven Mnuchin:

Even among Goldman alumni, Steven Mnuchin has distinguished himself by his willingness to profit off misery. After the 2008 Wall Street collapse, and in the middle of the foreclosure crisis, Mnuchin purchased a California bank. The renamed company, OneWest, earned Mnuchin the nickname “Foreclosure King,” reportedly collecting $1.2 billion from the government to help cover the losses for foreclosed homes and evicting tens of thousands of people between 2009 and 2014. One attempted foreclosure involved a ninety-year-old woman who was behind on her payments by 27 cents.

The last sentence sent me to Google, where I quickly learned it had been debunked in a tweetstorm by Ted Frank in January 2017. He works for a libertarian think tank, and I doubt we’d see eye to eye on many issues, but his takedown of the 27-cent claim was accurate. Politico, for example, corrected its version of the story.

This is unfortunate because everything else in the above quote seems to check out. And you don’t have to be a liberal snowflake to worry legitimately about the Goldmanization of the US Cabinet.1

I went on to spot-check a number of other claims in No Is Not Enough and again, so far as I can tell with modest effort, everything checks out. So my conclusion is that Klein, who says she wrote this book quickly, to respond to the current moment, with less attention to endnotes than usual, is generally reliable on facts.

The way in which I reached that conclusion is a pretty good example of the strategies outlined in Web Literacy for Student Fact-Checkers, and a reminder that those methods aren’t just for students. All of us — me, you, Naomi Klein, everyone — need to build those muscles and exercise them regularly.


1. On another episode of Radio Open Source, in a remarkable dialogue between Pat Buchanan and Ralph Nader, the arch-conservative Buchanan aligned with the arch-liberal Nader on that point:

I agree completely with Ralph, I did not know we were going to make the world safe for Goldman Sachs, and I am a little surprised to find three or four or five of these guys, one or two might have been OK.