How shared vocabularies tie the annotated web together

I’m fired up about the work I want to share at Domains 2017 this summer. The tagline for the conference is Indie Tech and Other Curiosities, and I plan to be one of the curiosities!

I’ve long been a cheerleader for the Domain of One’s Own movement. In Reclaiming Innovation, Jim Groom wrote about the need to “understand technologies as ‘potentiality’ (to graft a concept by Anton Chekov from a literary to a technical context).” He continued:

This is the idea that within the use of every technical tool there is more than just the consciousness of that tool, there is also the possibility to spark something beyond those predefined uses. The only real way to galvanize that potentiality is to provide the conditions of possibility — that is, a toolkit for user innovation.

My recent collaboration with Mike Caulfield on the Digital Polarization Initiative has led to the creation of just such a toolkit. It supports DigiPo in the ways described and shown here. A version of the toolkit, demoed here, will support a team of investigative journalists. Now I need to show how the toolkit enables educators, scientists, investigative reporters, students — anyone who researches and writes articles or reports or papers backed by web-based evidence — to innovate in similar ways.

In tech we tend to abuse the term innovation so let me spell out exactly what I mean: Better ways to gather, organize, reason over, and cite online evidence. Web annotation, standardized this week by the W3C, is a key enabler. The web’s infinite space of addressable URLs is now augmented by a larger infinity of segments of interest within the resources pointed to by URLs. In the textual realm, paragraphs, list items, sentences, or individual words can be reliably linked to conversations — but also applications — that live in connected annotation layers.
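To make "segment of interest" concrete, here is a minimal sketch of how one annotation could be expressed in the JSON shape defined by the W3C Web Annotation Data Model. The helper name make_annotation, the example.org URLs, and the sample text are assumptions for illustration, not part of the DigiPo toolkit.

```python
import json


def make_annotation(anno_id, source_url, exact, prefix="", suffix="",
                    comment="", tags=()):
    """Build a dict shaped like a W3C Web Annotation that targets a text segment.

    The segment is addressed by a TextQuoteSelector: the exact text plus
    optional surrounding context that helps re-find it if the page changes.
    """
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix

    bodies = []
    if comment:
        bodies.append({"type": "TextualBody", "value": comment,
                       "purpose": "commenting"})
    for tag in tags:
        bodies.append({"type": "TextualBody", "value": tag,
                       "purpose": "tagging"})

    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": bodies,
        "target": {"source": source_url, "selector": selector},
    }


# Hypothetical example: one sentence inside a URL-addressable page.
annotation = make_annotation(
    anno_id="https://example.org/annotations/1",
    source_url="https://example.org/history/chapter-3",
    exact="a pertinent sentence",
    prefix="Earlier in the chapter, ",
    comment="Evidence for the investigation.",
    tags=["investigation-42"],
)
print(json.dumps(annotation, indent=2))
```

The target pairs a URL (source) with a selector, which is what turns a page-level address into a segment-level one.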

A web of addressable segments of interest is a necessary, but not sufficient, condition of possibility. We also need tools that enable us to gather, organize, recombine, and cite those segments. And some of those tools need to be malleable in the hands of users who can shape them for their own purposes.

When I reread Vannevar Bush’s As We May Think, to prepare for a conversation about it with Gardner Campbell and Jeremy Dean (video, Gardner’s reflections), I focused on this passage:

He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together.

Nowadays that first encyclopedia article lives at one URL. The pertinent item in a history is a segment of interest within another URL-addressable resource. How do we tie them together? A crucial connector is a tag that belongs to neither resource but refers to both.
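As a rough illustration of a tag acting as the connector, the sketch below indexes a few annotation records by tag. The record shape (uri, quote, tags) is a simplified, hypothetical stand-in for what an annotation service might return, and the URLs and tag names are invented.

```python
from collections import defaultdict

# Hypothetical records: one annotation on a whole page (no quote) and
# one on a segment of interest inside another page.
annotations = [
    {"uri": "https://example.org/encyclopedia/article",
     "quote": None,
     "tags": ["investigation-42"]},
    {"uri": "https://example.org/history/chapter-3",
     "quote": "a pertinent sentence",
     "tags": ["investigation-42", "history"]},
    {"uri": "https://example.org/unrelated",
     "quote": "something else",
     "tags": ["misc"]},
]


def index_by_tag(records):
    """Group annotation records under each tag they carry."""
    index = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            index[tag].append(record)
    return dict(index)


by_tag = index_by_tag(annotations)

# The tag belongs to neither resource, yet it retrieves both.
trail = by_tag["investigation-42"]
for record in trail:
    print(record["uri"], "|", record["quote"] or "(whole page)")
```

Neither page has to know about the other; the shared tag alone is what ties the encyclopedia article to the sentence in the history.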

When tools control the sets of tags available for resource interconnection, they enable groups of people to make such connections reliably. That’s what the DigiPo toolkit does when it offers a list of investigation pages, drawn from the namespace of a wiki, as the set of tags that connect annotation-defined evidence to investigations. You see that happening with the DigiPo toolkit shown here, and with a variant of the toolkit shown here. In both cases the tags that bind evidence to wiki pages are controlled by software that acquires a list of wiki pages and presents the names of those pages as selectable tags.
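The idea of a controlled tag list can be sketched in a few lines. This is not the toolkit's actual code; it assumes the wiki's page names have already been fetched into a list, and the function names and page names are hypothetical.

```python
def page_names_to_tags(page_names):
    """Turn wiki page names into a sorted, de-duplicated list of selectable tags."""
    return sorted({name.strip() for name in page_names if name.strip()})


def choose_tag(selection, controlled_tags):
    """Accept a tag only if it is on the controlled list; reject free-form input."""
    if selection not in controlled_tags:
        raise ValueError(f"{selection!r} is not a page in the wiki namespace")
    return selection


# Hypothetical page names, including a duplicate, stray whitespace, and a blank.
wiki_pages = [
    "Did the mayor say that",
    " Is this photo real ",
    "Did the mayor say that",
    "",
]

controlled_tags = page_names_to_tags(wiki_pages)
print(controlled_tags)

# A selection from the list is accepted unchanged.
tag = choose_tag("Is this photo real", controlled_tags)

# A near miss is rejected rather than silently creating a new tag.
try:
    choose_tag("Is this foto real", controlled_tags)
    rejected = False
except ValueError:
    rejected = True
```

Because users pick from the list instead of typing, a tag can never drift away from the page name it is meant to point at.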

One future direction for the toolkit leads to software that acquires lists of pages from other kinds of content management systems: WordPress, Drupal, you name it. Every CMS defines a namespace that is implicitly a list of tags that can be used to bind sets of resources to the pages served by that CMS. If you’re looking to adapt a DigiPo-like tool to your CMS, I’ll be delighted to show you how.

Such adaptation, though, requires somebody to write some code. While it’s unfashionable in some circles to say so, I don’t think everyone should learn to code. There’s a more fundamental web literacy, nicely captured by Audrey Watters here:

It’s about understanding the components of the Web and knowing how to tag and then manipulate them. By thinking and developing sets of named resources, you are a Web thinker. This isn’t about programming but rather the creation of sets of resources and the identification of components that work with those resources and combine them to create solutions.

Web annotation vastly enlarges the universe of resources that can be named. But it’s on us to name them. Tags are a principal way we do that. If our naming of resources is going to be an effective way to organize and combine them, though, we need to do it reliably and consistently. Software can enforce that consistency, but not everyone can write software. So a user innovation toolkit for the annotated web needs to empower users to enforce consistent naming without writing code.

A couple of weeks ago I built a Chrome extension that enables users to define their own lists of shared tags by recording them in a Google Doc. The demonstration video prompted this query from Jim Groom:

I just got through with a workshop here demoing for a European group that may be using it to annotate online legislation for data privacy set to go live in 2018. They are teaching a course on it, and this could be one of the spaces/hubs they build the open part around. I came back to this video just now, but got the sense I could already tag from within annotations/pages, so how does the tag helper change this? Just a different way at it? Is it new functionality from previous tags? I love that you can have a Google Doc list of tags, but the video example is not making sense to me for some reason. And I wanna know :)

Here’s my response. That tag helper, now incorporated into the toolkit I’m evolving for DigiPo and other uses, makes it possible for people who don’t write code to define tag namespaces that govern their gathering, organization, recombination, and citation not only of URL-addressable resources but also of annotation-addressable segments of interest within those resources. People can “tie them together” — as Vannevar Bush imagined — in the ways their interests and workflows require.
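Here is one way the shared-vocabulary idea might look in code. It is a sketch under stated assumptions, not a description of how the Chrome extension works: it assumes the Google Doc's contents are already available as plain text with one tag per line, and the function names and tag names are invented for illustration.

```python
def parse_tag_list(doc_text):
    """Read one tag per line, collapsing whitespace and dropping blanks and
    case-insensitive duplicates (the first spelling wins)."""
    seen = set()
    tags = []
    for line in doc_text.splitlines():
        tag = " ".join(line.split())
        key = tag.casefold()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def canonical_tag(user_input, vocabulary):
    """Map what a user typed to the group's agreed spelling, or None if it
    is not in the shared vocabulary."""
    key = " ".join(user_input.split()).casefold()
    for tag in vocabulary:
        if tag.casefold() == key:
            return tag
    return None


# Hypothetical contents of a group's shared tag document.
doc_text = """
gdpr:consent
gdpr:data-portability
  gdpr:right-to-erasure
GDPR:Consent
"""

vocabulary = parse_tag_list(doc_text)
print(vocabulary)
print(canonical_tag("  GDPR:consent ", vocabulary))
print(canonical_tag("gdpr:concent", vocabulary))
```

The group edits a document, not a program, and the software's only job is to hold everyone to the spellings that document defines.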

Does that answer the question? If not, please keep asking until I do so properly. User-defined tag namespaces, though admittedly still a curiosity, are one of the best ways to make collective use of a web of addressable segments.

5 thoughts on “How shared vocabularies tie the annotated web together”

  1. I find that my personal bookmark manager, which I originally developed back in 2003, has some of these kinds of capabilities. It lets me collect bookmarks under a filing name, and automatically links to other sections of the bookmark collection if they have been filed under a suitably named filing path. For example, if I have filing paths “Python” and “Java”, and then add a third path named “Java And Python”, then when I search for “Python”, the manager will inform me that I have related links at “And Java”, along with a link to see their listing. This gives me a good dose of serendipity.

    It doesn’t have the ability to create a narrative page like yours, though, nor to have a timeline nor to share the results. I think I could probably add them without too much work, if put to it.

    This bookmark manager is written in JavaScript and runs in the browser. The ideas about the data and linking model (though the software has evolved since then) are discussed at:

  2. Jon, that link to the demonstration video referenced by Jim Groom doesn’t seem to be working, at least for me…just a heads up.

    I guess I have a similar question to Jim’s…I can already add tags to an annotation without the extension. So what is the key thing that the extension adds? Is it the pre-populated tag list, or is it the fact that a “segment of interest” that is tagged, say, to a specific page, will now show up on that page automatically (perhaps in a widget, as in Digipo) and not just in a general search for that tag?

  3. “that link to the demonstration video”

    Fixed, thanks.

    “Is it the pre-populated tag list, or is it the fact that a “segment of interest” that is tagged, say, to a specific page, will now show up on that page automatically (perhaps in a widget, as in Digipo) and not just in a general search for that tag?”

    Both! The controlled tag list enforces a mapping between a set of annotations and a context in which those annotations are displayed.

    It’s true that we could agree on a rule, for example — copy the wiki id of the web page into the Hypothesis tag editor — and if everyone followed that rule consistently, the software that gathers annotations into the page would be none the wiser.

    But rules like that are hard to follow at all, never mind consistently. The extension makes that rule easy to follow and ensures it will happen consistently.

    In the two fully worked examples, DigiPo and EIC, the controlled tags are derived automatically from the namespace of the CMS.

    The Google Doc example is still speculative, but the idea is that it’s an easy way for a group to define and share a tag vocabulary that can then be leveraged in a similar way, without needing to write code to produce the list.
