Fact-checking Naomi Klein’s “No Is Not Enough”

On a hike last week I heard an excellent episode of Radio Open Source, featuring Naomi Klein, David Graeber, and Pankaj Mishra. One segment that stuck with me was this remark by Naomi Klein:

We need to examine the way in which politics has been taken over by the logic of corporate branding, which is not something Trump started. Trump was just better at it than anybody else because he is himself a fully commercialized brand. So the table was set for Trump, he just showed up and said, “Well, I know this game better than you jokers, I’m the real thing, I’m a reality TV star and I’m a megabrand. Step aside!”

(If I could, I would link you directly to that segment in situ; that’s something I had working a long time ago. But since audio quotation still isn’t a ubiquitous feature of the web, here’s the compelling minute of audio that contains that quote.)

I was previously aware of Naomi Klein but had never heard her speak, had read none of her books, and was only slightly familiar with her critique of corporatized politics. Her conversation with Chris Lydon on that podcast prompted me to read her new book, No Is Not Enough, published just a few weeks ago.

I was also slightly familiar with criticism of Klein’s views. So, in a moment when the president of the United States had just tweeted a video of himself performing a mock attack during his time as a reality TV personality on the pro wrestling circuit, I was curious to know her thoughts but also prepared to take them with a grain of salt.

The first section elaborates on the above quote. Human megabrands are, Klein points out, a relatively new thing. She writes:

People keep asking — is he going to divest? Is he going to sell his businesses? Is Ivanka going to? But it’s not at all clear what these questions even mean, because their primary businesses are their names. You can’t disentangle Trump the man from Trump the brand; those two entities merged long ago. Every time he sets foot in one of his properties — a golf club, a hotel, a beach club — White House press corps in tow, he is increasing his overall brand value, which allows his company to sell more memberships, rent more rooms, and increase fees.

I hope we can agree across ideologies that this kind of thing is unhealthy. In the audio clip I cited above, Klein notes that the antidote is not a liberal megabrand, not Zuckerberg or Oprah. Conflation of brand power and political power is just a bad idea, and we need to reckon with that.

The rest of the book builds on arguments made in her earlier ones: Capitalism’s winners exploit natural and man-made crises to consolidate their winnings (The Shock Doctrine); climate change presents an existential challenge to that world order (This Changes Everything). Since I haven’t read those books, and have only just now read a few reviews pro and con, I lack the full context needed to evaluate the arguments in No Is Not Enough. But that’s exactly the right setup for the point about fact-checking that I want to make here.

How reliable is Naomi Klein on facts? I came to No Is Not Enough with no strong opinion one way or another. I raised an eyebrow, though, when I read this passage about Treasury secretary Steven Mnuchin:

Even among Goldman alumni, Steven Mnuchin has distinguished himself by his willingness to profit off misery. After the 2008 Wall Street collapse, and in the middle of the foreclosure crisis, Mnuchin purchased a California bank. The renamed company, OneWest, earned Mnuchin the nickname “Foreclosure King,” reportedly collecting $1.2 billion from the government to help cover the losses for foreclosed homes and evicting tens of thousands of people between 2009 and 2014. One attempted foreclosure involved a ninety-year-old woman who was behind on her payments by 27 cents.

The last sentence sent me to Google, where I quickly learned it had been debunked in a tweetstorm by Ted Frank in January 2017. He works for a libertarian think tank, and I doubt we’d see eye to eye on many issues, but his takedown of the 27-cent claim was accurate. Politico, for example, corrected its version of the story.

This is unfortunate because everything else in the above quote seems to check out. And you don’t have to be a liberal snowflake to worry legitimately about the Goldmanization of the US Cabinet.1

I went on to spot-check a number of other claims in No Is Not Enough and again, so far as I can tell with modest effort, everything checks out. So my conclusion is that Klein, who says she wrote this book quickly, to respond to the current moment, with less attention to endnotes than usual, is generally reliable on facts.

The way in which I reached that conclusion is a pretty good example of the strategies outlined in Web Literacy for Student Fact-Checkers, and a reminder that those methods aren’t just for students. All of us — me, you, Naomi Klein, everyone — need to build those muscles and exercise them regularly.


1. On another episode of Radio Open Source, in a remarkable dialogue between Pat Buchanan and Ralph Nader, the arch-conservative Buchanan aligned with the arch-liberal Nader on that point:

I agree completely with Ralph, I did not know we were going to make the world safe for Goldman Sachs, and I am a little surprised to find three or four or five of these guys, one or two might have been OK.

Thoughts on Audrey Watters’ “Thoughts on Annotation”

Back in April, Audrey Watters decided to block annotation on her website. I understand why. When we project our identities online, our personal sites become extensions of our homes. To some online writers, annotation overlays can feel like graffiti. How can we respect their wishes while enabling conversations about their writing, particularly conversations that are intimately connected to the writing? At the New Media Consortium conference recently, I was finally able to meet Audrey in person, and we talked about how to balance these interests. Yesterday Audrey posted her thoughts about that conversation, and clarified a key point:

You can still annotate my work. Just not on my websites.

Exactly! To continue that conversation, I have annotated that post here, and transcluded my initial set of annotations below.


judell 6/27/2017 #

using an HTML meta tag to identify annotation preferences

This is just a back-of-the-napkin sketch of an idea, not a formal proposal.

judell 6/27/2017 #

I’m much less committed to having one canonical “place” for annotations than Hypothesis is

Hypothesis isn’t committed to that either. The whole point of the newly-minted web annotation standard is to enable an ecosystem of interoperable annotation clients and servers, analogous to comparable ecosystems of email and web clients and servers.

judell 6/27/2017 #

Hypothesis annotations of a PDF can be centralized, no matter where the article is hosted or whether it’s a local copy

Centralization and decentralization are slippery terms. I would rather say that Hypothesis can unify a set of annotations across a family of representations of the “same” work. Some members of that family might be HTML pages, others might be PDFs hosted on the web or kept locally.

It’s true that when Hypothesis is used to create and view such annotations, they are “centralized” in the Hypothesis service. But if someone else stands up an instance of Hypothesis, that becomes a separate pool of annotations. Likewise, we at Hypothesis have planned for, and expect to see, a world in which non-Hypothesis-based implementations of standard annotation capability will host still other separate pools of annotations.

So you might issue three different API queries — to Hypothesis, to a Hypothesis-based service, and to a non-Hypothesis-based service — for a PDF fingerprint or a DOI. Each of those services might or might not internally unify annotations across a family of “same” resources. If you were to then merge the results of those three queries, you’d be an annotation aggregator — the moral equivalent of what Radio UserLand, Technorati, and other blog aggregators did in the early blogosphere.
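
To make the aggregator idea concrete, here is a minimal sketch in JavaScript. Only the first endpoint is real; the other two are hypothetical stand-ins for services that don’t exist yet. The urn:x-pdf scheme is how Hypothesis names PDFs by fingerprint.

// Sketch: query three annotation services for the same PDF fingerprint
// and merge the results into one pool.
async function aggregate(fingerprint) {
  const services = [
    'https://hypothes.is/api',          // real
    'https://h.example.edu/api',        // hypothetical Hypothesis-based service
    'https://other.example.org/api'     // hypothetical non-Hypothesis service
  ];
  const uri = encodeURIComponent('urn:x-pdf:' + fingerprint);
  const results = await Promise.all(services.map(s =>
    fetch(s + '/search?uri=' + uri).then(r => r.json())));
  return results.flatMap(r => r.rows);  // the merged pool of annotations
}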

Dumb servers for personal clouds

I’m delighted to hear that my daughter and her best friend will be collaborating on a blog. And of course I’m tickled that she asked my advice on where to run it. I noted that Ghost is the new kid on the block, and is much simpler than what WordPress has become. But they want to do it for free, so WordPress it is.

Then she surprised me with this narrative:

I heard it’s better if you self-host, so that’s what we’ll want to do, right? I think self-hosting is good because you don’t have the website name in your blog URL. Also, more importantly, I think it’s how you ensure that it’s actually yours.

It turns out that she’d conflated self-hosting, i.e. running your own instance of the WordPress software and database, with the simpler method my own blog exemplifies. I use WordPress.com precisely because, although I do run my own servers, the fewer the better. I’m happy to rely on WordPress to host my blog for me. I’m also happy to pay them $13/year to connect jonudell.wordpress.com to blog.jonudell.net.

So that’ll be the solution for my daughter. But I’m left wondering how many others conflate self-hosting with domain redirection, and how that affects their thinking about control of their own digital identities and data. I suspect it’s often unclear that, whether you run a blog on WordPress.com or on another provider’s server, your data is equally under your control. Likewise, use of a personal domain name is equally possible in both cases. What is the difference? With self-hosting, you can use arbitrary WordPress plugins and themes, and/or modify the software. Sometimes, for some people, that matters. Often, for many, it doesn’t.

That said, I agree with Mike Caulfield’s plea to make servers dumb again. In my ideal world, I’d not only outsource the management of the blog software to WordPress, but would also connect the software to my personal cloud, which would be implemented by my chosen storage provider.

I got this idea from Gordon Bell’s MyLifeBits, and riffed on it to imagine cloud-hosted lifebits. Jim Groom ably summed up the argument here.

Will we ever get there? It has to happen sooner or later. Maybe, as Doug Levin suggests today, it’ll be sooner.

Celebrating Infrastructure

When cycling in forested New England countryside I sometimes wondered about the man-made forest built along the roadside — telephone poles, power lines, transformers — and thought someone should write a book about the industrial landscape. It turns out that someone did. Brian Hayes spent many years traveling around America, researching and photographing the infrastructure that sustains our civilization. The book he produced, Infrastructure: A Field Guide to the Industrial Landscape (2005, 2nd ed. 2014), is everything I imagined it would be.

(I found the book by way of a comment that Brian Hayes left here on this blog. “Couldn’t be that Brian Hayes,” I thought. But his signature led me to his blog and thence to Infrastructure‘s home on the web. I’m passing it along here in part to remind myself that my favorite books often aren’t new or well publicized. I find them serendipitously after they’ve been around for a while.)

My father and his twin brother were students of nature in a way I’ve never been. Their knowledge of plants and animals was encyclopedic and ever-expanding. But for most of us, the natural landscape is not an expanse of unnamed and unknown objects. We recognize egrets, crows, hummingbirds, oaks, pines, and maples. The same isn’t true of the industrial landscape. More often than not, driving along some industrial corridor, we’re likely to ask the question Brian Hayes’ daughter asked him: “What’s that thing?” Infrastructure answers those questions for her, and for us.

Chapters on mining, waterworks, farming, energy production and distribution, transportation, shipping, and waste management follow a plan that “traces the flow of materials, information, and energy” throughout the web of industrial networks. We learn how industrial processes work, and how to identify the structures that house and implement them. Not all of us encounter quarries, mills, dams, refineries, or power plants on a daily basis. But water towers, roads, bridges, power lines, and data cables are as much a part of our landscape as what nature put there.

Hayes invites us to know more about the names, appearances, and workings of the industrial landscape. He also challenges us to reconsider how we feel about that landscape.

I stood by the side of a highway near Gallup, New Mexico, looking on a classic vista of the American West: red sandstone buttes, rising from a valley floor. … In front of the cliffs, and towering over them, were several cylindrical spires that I recognized as petroleum fractionating columns; off to one side was a grove of gleaming white spherical tanks. … I suspect that most viewers of this scene would consider the industrial hardware to be an intrusion, a distraction, perhaps even a desecration of the landscape.

Guilty as charged. But I’m provoked by this book to reconsider. “Celebrate infrastructure, don’t hide it,” Stewart Brand tweeted today. “It is civilization’s metabolism and should be its pride.”

Weaving the annotated web

In 1997, at the first Perl Conference, which became OSCON the following year, my friend Andrew Schulman and I both gave talks on how the web was becoming a platform not only for publishing, but also for networked software.

Here’s the slide I remember from Andrew’s talk:

http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799

The only thing on it was a UPS tracking URL. Andrew asked us to stare at it for a while and think about what it really meant. “This is amazing!” he kept saying, over and over. “Every UPS package now has its own home page on the world wide web!”

It wasn’t just that the package had a globally unique identifier. It named a particular instance of a business process. It made the context surrounding the movement of that package through the UPS system available to UPS employees and customers who accessed it in their browsers. And it made that same context available to the Perl programs that some of us were writing to scrape web pages, extract their data, and repurpose it.

As we all soon learned, URLs can point to many kinds of resources: documents, interactive forms, audio or video.

The set of URL-addressable resources has two key properties: it’s infinite, and it’s interconnected. Twenty years later we’re still figuring out all the things you can do on a web of hyperlinked resources that are accessible at well-known global addresses and manipulated by a few simple commands like GET, POST, and DELETE.

When you’re working in an infinitely large universe it can seem ungrateful to complain that it’s too small. But there’s an even larger universe of resources populated by segments of audio and video, regions of images, and most importantly, for many of us, text in web documents: paragraphs, sentences, words, table cells.

So let’s stare in amazement at another interesting URL:

https://hyp.is/LoaMFCSJEee3aAMJuXhO-w/www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm

Here’s what it looks like to a human who follows the link: you land on a web page (in this case Roy Fielding’s dissertation on web architecture), the page scrolls to the place where I’ve highlighted a phrase, and the Hypothesis sidebar displays my annotation, which includes a comment and a tag.

And here’s what that resource looks like to a computer when it fetches a variant of that link:

{
    "body": [
        {
            "type": "TextualBody",
            "value": "components: web resources\n\nconnectors: links\n\ndata: data",
            "format": "text/markdown"
        },
        {
            "type": "TextualBody",
            "purpose": "tagging",
            "value": "IAnnotate2017"
        }
    ],
    "target": [
        {
            "source": "https://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm",
            "selector": [
                {
                    "type": "XPathSelector",
                    "value": "/table[2]/tbody[1]/tr[1]/td[1]",
                    "refinedBy": {
                        "start": 82,
                        "end": 114,
                        "type": "TextPositionSelector"
                    }
                },
                {
                    "type": "TextPositionSelector",
                    "end": 4055,
                    "start": 4023
                },
                {
                    "exact": "components, connectors, and data",
                    "prefix": "tion of architectural elements--",
                    "type": "TextQuoteSelector",
                    "suffix": "--constrained in their relations"
                }
            ]
        }
    ],
    "created": "2017-04-18T22:48:46.756821+00:00",
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "creator": "acct:judell@hypothes.is",
    "type": "Annotation",
    "id": "https://hypothes.is/a/LoaMFCSJEee3aAMJuXhO-w",
    "modified": "2017-04-18T23:03:54.502857+00:00"
}

The URL, which we call a direct link, isn’t itself a standard way to address a selection of text; it’s just a link that points to a web resource. But the resource it points to, which describes the highlighted text and its coordinates within the document, is — since February of this year — a W3C standard. The way I like to think about it is that the highlighted phrase — and every possible highlighted phrase — has its own home page on the web, a place where humans and machines can jointly focus attention.

If we think of the web we’ve known as a kind of fabric woven together with links, the annotated web increases the thread count of that fabric. When we weave with pieces of URL-addressable documents, we can have conversations about those pieces, we can retrieve them, we can tag them, and we can interconnect them.

Working with our panelists and others, it’s been my privilege to build a series of annotation-powered apps that begin to show what’s possible when every piece of the web is addressable in this way.

I’ll show you some examples, then invite my collaborators — Beth Ruedi from AAAS, Mike Caulfield from Washington State University Vancouver, Anita Bandrowski from SciCrunch, and Maryann Martone from UCSD and Hypothesis — to talk about what these apps are doing for them now, and where we hope to take them next.

Science in the Classroom

First up is a AAAS project called Science in the Classroom, a collection of research papers from the Science family of journals that are annotated — by graduate students — so teachers can help younger students understand the methods and outcomes of scientific research.

Here’s one of those annotated papers. A widget called the Learning Lens toggles layers of annotation on and off.

Here I’ve selected the Glossary layer, and I’ve clicked on the word “distal” to reveal the annotation attached to it.

Now let’s look behind the scenes:

Hypothesis was used to annotate the word “distal”. But Learning Lens predated the use of Hypothesis, and the Science in the Classroom team wanted to keep using Learning Lens to display annotations. What they didn’t want was the workflow behind it, which required manual insertion of annotations into HTML pages.

Here’s the solution we came up with. Use Hypothesis to create annotations, then use some JavaScript in Science in the Classroom pages to retrieve Hypothesis annotations and write them into the pages, using the same format that had been applied manually. The preexisting and unmodified Learning Lens JavaScript can then do what it does: pick up the annotations, assign color-coded highlights based on tags, and show the annotations when you click on the highlights.
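
In outline, the JavaScript does something like this. The search API and the selector structure are real (the selector JSON appeared earlier in this talk); wrapQuote is a hypothetical stand-in for the step that writes Learning Lens markup into the page.

// Sketch: fetch this page's annotations, then write each one into the
// DOM in the format Learning Lens already understands.
async function injectAnnotations() {
  const api = 'https://hypothes.is/api/search?limit=200&uri=' +
    encodeURIComponent(location.href);
  const { rows } = await (await fetch(api)).json();
  for (const anno of rows) {
    const quote = (anno.target[0].selector || [])
      .find(s => s.type === 'TextQuoteSelector');
    if (quote) wrapQuote(quote.exact, anno.text, anno.tags);
  }
}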

What made this possible was a JavaScript library that helps with the heavy lifting required to attach an annotation to its intended target in the document.

That library is part of the Hypothesis client, but it’s also available as a standalone module that can be used for other purposes. It’s a nice example of how open source components can enable an ecosystem of interoperable annotation services.
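
Here, for example, is roughly how that standalone module (published on NPM as dom-anchor-text-quote) captures a selection as a quote-with-context selector and re-anchors it later. This is a sketch of its use, not production code.

import * as textQuote from 'dom-anchor-text-quote';

// Describe the current selection as a quote plus surrounding context.
const range = window.getSelection().getRangeAt(0);
const selector = textQuote.fromRange(document.body, range);
// e.g. { exact: 'components, connectors, and data',
//        prefix: 'tion of architectural elements--',
//        suffix: '--constrained in their relations' }

// Later, re-anchor the selector to a live Range in the document,
// even if the surrounding markup has shifted a little.
const reanchored = textQuote.toRange(document.body, selector);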

DigiPo / EIC

Next up is a toolkit for student fact-checkers and investigative journalists. You’ve already heard from Mike Caulfield about the Digital Polarization Project, or DigiPo, and from Stefan Candea about the European Investigative Collaborations network. Let’s look at how we’ve woven annotation into their investigative workflows.

These investigations are both written and displayed in a wiki. This is a DigiPo example:

I did the investigation of this claim myself, to test out the process we were developing. It required me to gather a whole lot of supporting evidence before I could begin to analyze the claim. I used a Hypothesis tag to collect annotations related to the investigation, and you can see them in this Hypothesis view:

I can be very disciplined about using tags this way, but it’s a lot to ask of students, or really almost anyone. So we created a tool that knows about the set of investigations underway in the wiki, and offers the names of those pages as selectable tags.

Here I’ve selected a piece of evidence for that investigation. I’m going to annotate it, not by using Hypothesis directly, but instead by using a function in a separate DigiPo extension. That function uses the core anchoring libraries to create annotations in the same way the Hypothesis client does.

But it leads the user through an interstitial page that asks which investigation the annotation belongs to, and assigns a corresponding tag to the annotation it creates.

Back in the wiki, the page embeds the same Hypothesis view we’ve already seen, as a Related Annotations widget pinned to that particular tag:

I had so much raw material for this article that I needed some help organizing it. So I added a Timeline widget that gathers a subset of the source annotations that are tagged with dates.

To put something onto the timeline, you select a date on a page.

Then you create an annotation with a tag corresponding to the date.

Here’s what the annotation looks like in Hypothesis.

Over in the wiki, our JavaScript finds annotations that have these date tags and arranges them on the Timeline.

Publication dates aren’t always evident on web pages; sometimes you have to do some digging to find them. When you do find one, and annotate a page with it, you’ve done more than populate the Timeline in a DigiPo page. That date annotation is now attached to the source page for anyone to discover, using Hypothesis or any other annotation-aware viewer. And that’s true for all the annotations created by DigiPo investigators. They’re woven into DigiPo pages, but they’re also available for separate reuse and aggregation.

The last and most popular annotation-related feature we added to the toolkit is called Footnotes. Once you’ve gathered your raw material into the Related Annotations bucket, and maybe organized some of it onto the Timeline, you’ll want to weave the most pertinent references into the analysis you’re writing.

To do that, you find the annotation you gathered and use Copy to clipboard to capture the direct link.

Then you wrap that link around some text in the article:

When you refresh the page, here’s what you get. The direct link does what a direct link does: it takes you to the page and scrolls you to the annotation in context. But it can take a while to review a bunch of sources that way.

So the page’s JavaScript also creates a link that points down into the Footnotes section. And there, as Ted Nelson would say, and as Nate Angell for some reason hates hearing me say, the footnote is “transcluded” into the page so all the supporting context is right there.

One final point about this toolkit. Students don’t like the writing tools available in wikis, and for good reason: they’re pretty rough around the edges. So we want to enable them to write in Google Docs. We also want them to footnote their articles using direct links because that’s the best way to do it. So here’s a solution we’re trying. From the wiki you’ll launch into Google Docs where you can do your writing in a much more robust editor that makes it really easy to include images and charts. And if you use direct links in that Google Doc, they’ll still show up as Footnotes.

We’re not yet sure this will pan out, but my colleague Maryann Martone, who uses Hypothesis to gather raw material for her scientific papers, and who writes them in Google Docs, would love to be able to flow annotations through her writing tool and into published footnotes.

SciBot

Maryann is the perfect segue to our next example. Along with Anita Bandrowski, she’s working to increase the thread count in the fabric of scientific literature. When neuroscientists write up the methods used in their experiments, the ingredients often include highly specific antibodies. These have colloquial names, and even vendor catalog numbers, but until recently they lacked unique identifiers. So the Neuroscience Information Framework, NIF for short, has defined a namespace called RRID (research resource identifier), built a registry for RRIDs, and convinced a growing number of authors to mention RRIDs in their papers.

Here’s an article with RRIDs in it. They’re written directly into the text because the text is the scientific record; it’s the only artifact that’s guaranteed to be preserved. So if you’re talking about a goat polyclonal antibody, you look it up in the registry, capture its ID, and write it directly into the text. And if it’s not in the registry, please add it, you’ll make Anita very happy if you do!

The first phase of a project we call SciBot was about validating those RRIDs. They’re just free text, after all, typed in by authors. Were the identifiers spelled correctly? Did they point to actual registry entries? To find out we built a tool that automatically annotates occurrences of RRIDs.

In this example, Anita is about to click on the SciBot tool, which launches from a bookmarklet, and sends the text of the paper to a backend service. It scans the text for RRIDs, looks up each one in the registry, and uses the Hypothesis API to create an annotation — bound to the occurrence in the text — that reports the results of the registry lookup.
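
In outline, that backend could look like this. It’s a sketch, not the production service: the RRID pattern is simplified, and the registry lookup URL is an assumption.

// Sketch: find RRIDs in a paper's text, check each against the registry,
// and post a result annotation anchored to the occurrence.
const RRID = /RRID:\s?([A-Za-z0-9_.:-]+)/g;  // simplified pattern

async function scibot(pageUrl, pageText, token) {
  for (const m of pageText.matchAll(RRID)) {
    // Registry lookup; the resolver URL shape here is assumed.
    const hit = await fetch('https://scicrunch.org/resolver/' + m[1] + '.json');
    await fetch('https://hypothes.is/api/annotations', {
      method: 'POST',
      headers: { Authorization: 'Bearer ' + token },
      body: JSON.stringify({
        uri: pageUrl,
        text: hit.ok ? 'Registry entry found.' : 'No registry entry!',
        tags: ['RRID:' + m[1]],
        // Anchor the annotation to this occurrence in the text.
        target: [{ source: pageUrl,
                   selector: [{ type: 'TextQuoteSelector', exact: m[0] }] }]
      })
    });
  }
}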

Here the Hypothesis realtime API is showing that SciBot has created three annotations on this page.

And here are those three annotations, anchored to their occurrences in the page, with registry entries displayed in the sidebar.

SciBot curators review these annotations and use tags to mark which are valid. When some aren’t, and need attention, the highlight focuses that attention on a specific occurrence.

This hybrid of automatic entity recognition and interactive human curation is really powerful. Here’s an example where an antibody doesn’t have an RRID but should.

Every automatic workflow needs human exception handling and error correction. Here the curator has marked an RRID that wasn’t written into the literature, but now is present in the annotation layer.

These corrections are now available to train a next-gen entity recognizer. Iterating through that kind of feedback loop will be a powerful way to mine the implicit data that’s woven into the scientific literature and make it explicit.

Here’s the Hypothesis dashboard for one of the SciBot curators. The tag cloud gives you a pretty good sense of how this process has been unfolding so far.

Publishers have begun to link RRIDs to the NIF registry. Here’s an example at PubMed.

If you follow the ZIRC_ZL1 link to the registry, you’ll find a list of other papers whose authors used the same experimental ingredient, which happens to be a particular strain of zebrafish.

This is the main purpose of RRIDs. If that zebrafish is part of my experiment, I want to find who else has used it, and what their experiences have been — not just what they reported in their papers, but ideally also what’s been discussed in the annotation layer.

Of course I can visit those papers, and search within them for ZIRC_ZL1, but with annotations we can do better. In DigiPo we saw how footnoted quotes from source documents can transclude into an article. Publishers could do the same here.

Or we could do this. It’s a little tool that offers to look up an RRID selected in text.

It just links to an instance of the Hypothesis dashboard that’s pinned to the tag for that RRID.

Those search results offer direct links that take you to each occurrence in context.

Claim Chart

Finally, and to bring us full circle, I recently reconnected with Andrew Schulman, who works nowadays as a software patent attorney. There’s a tool of his trade called a claim chart. It’s a two-column table. In column one you list claims that a patent is making, which are selections of text from the claims section of the patent. And in column two you assemble bits of evidence, gathered from other sources, that bear on specific claims. Those bits of evidence are selections of text in other documents. It’s tedious to build a claim chart: it involves a lot of copying and pasting, and the evidence you gather is typically trapped in whatever document you create.

Andrew wondered if an annotation-powered app could help build claim charts, and also make the supporting evidence web-addressable for all the reasons we’ve discussed. If I’ve learned anything about annotation, it’s that when somebody asks “Can you do X with annotation?” the answer should always be: “I don’t know, should be possible, let’s find out.”

So, here’s an annotation-powered claim chart.

The daggers at top left in each cell are direct links. The ones in the first column go to patent claims in context.

The ones in the second column go to related statements in other documents.

And here’s how the columns are related. When you annotate a claim, you use a toolkit function called Add Selection as Claim.

Your selection here identifies the target document (that is, the patent), the claim chart you’re building (here, it’s a wiki page called andrew_test), and the claim itself (for example, claim 1).

Once you’ve identified the claims in this way, they’re available as targets of annotations in other documents. From a selection in another document, you use Add Selection as Claim-Related.

Here you see all the claims you’ve marked up, so it’s easy to connect the two statements.

The last time I read Vannevar Bush’s famous essay As We May Think, this was the quote that stuck with me.

When statements in documents become addressable resources on the web, we can weave them together in the way Vannevar Bush imagined.

Do Repeat Yourself, With Variations

Don’t Repeat Yourself (DRY) is a touchstone principle of software development. It’s often understood to inveigh against duplication of code. Copying a half-dozen lines from one program to another is a bad idea, DRY says, because if you change your mind about how that code works, you’ll have to revise it in several places. Better to convert those lines of code into a function that you write once and reuse.
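
In miniature, with a toy example:

// Yesterday: these lines pasted into two programs. Today: one function.
function slugify(title) {
  return title.toLowerCase().trim().replace(/\s+/g, '-');
}

console.log(slugify('No Is Not Enough'));   // no-is-not-enough
console.log(slugify('Do Repeat Yourself')); // do-repeat-yourself

Change your mind about how slugs are formed, and there’s exactly one place to revise.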

More broadly, the DRY principle asserts:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Code and data are two kinds of knowledge that ought to be represented canonically, and repeated — if at all — only by mechanical derivation, never by variation.

I often violate the DRY principle by indulging in CopyAndPasteProgramming. In my defense I point to another principle, CodeHarvesting, which defends duplication as a necessary stepping stone.

Letting a duplication of logic live for now, in order to see how to best universalize it at some later point.

For me, at least, that’s what tends to work best. A common theme doesn’t emerge until I’ve seen — and ideally others have seen and reacted to — several variations on that theme. This kind of duplication — deferred universalization — is beneficial, right?

Here’s another kind. In the JavaScript world the dominant engine of reuse is the Node Package Manager (NPM). When I first started using it a few years ago, I was shocked at the amount of duplication it entails. When you install an NPM package, the modules it depends on are copied into a subdirectory. If those modules depend on others, they are copied into yet deeper subdirectories. For even a simple JavaScript program you can end up with a forest of thousands of files.

A similar thing happens in the Python world. It’s a best practice, nowadays, to use a tool called virtualenv to create, for each Python program you run, an isolated environment with the particular Python interpreter and set of modules needed by that particular program. In practice that means, again, copying lots of files.

Arguably these duplications don’t violate DRY because they are mechanical copies that won’t vary from their originals. But they can! And here too I am prone to indulge in local variation to explore possibilities that might or might not warrant generalization.

While pondering the vices and virtues of duplicative software construction I reread Metamagical Themas, the compendium of Douglas Hofstadter’s columns in Scientific American. (The title is an anagram of Mathematical Games, the column he inherited from Martin Gardner.) In Variations on a Theme as the Crux of Creativity he states the case as plainly as anywhere. At the core of creative thought are “slippery” concepts that we develop in a virtuous cycle of innovation:

Once you have decided to try out a new way of viewing a phenomenon, you can let that view suggest a set of knobs to vary. The act of varying them will lead you down new pathways, generating new images ripe for perception in their own right.

This sets up a closed loop:

– fresh situations get unconsciously framed in terms of familiar concepts;

– those familiar concepts come equipped with standard knobs to twiddle;

– twiddling those knobs carries you into fresh new conceptual territory.

We need to get DRY eventually in order to maintain stable systems. But the countervailing state needn’t be WET (“write everything twice”, “we enjoy typing” or “waste everyone’s time”). Instead I propose DRYWV: Do Repeat Yourself, With Variations.

Every piece of knowledge should have a single, unambiguous, authoritative representation within a system. But how do we arrive at such knowledge? I think we have to DRYWV our way there.

Dwelling in the zone of evidence

I’ve written plenty about the software layer that adapts the Hypothesis annotator to the needs of someone who gathers, organizes, analyzes, and then writes about evidence found online. Students in courses based on Mike Caulfield’s Digital Polarization template will, I hope, find that this software streamlines the grunt work required to find and cite the evidence that supports evaluation of a claim like this one:

Claim: The North Carolina Republican Party sent out a press release boasting about how its efforts drove down African-American turnout in the 2016 US presidential election.

That’s a lightly-edited version of something I read in the New Yorker and can send you to directly.

As we were fleshing out how a DigiPo course would work, I wrote an analysis of that claim. The investigation took me all the way back to the 1965 Voting Rights Act. Then it led to the 2013 Supreme Court decision — in Shelby County v. Holder — to dilute the “strong medicine” Congress had deemed necessary “to address entrenched racial discrimination in voting.” Then to a series of legal contests as North Carolina began adjusting its voting laws. Then to the election-year controversies about voter suppression. And finally to the press release that the North Carolina GOP sent the day before the election, and the reactions to it.

Many claims don’t require this kind of deep dive. As Mike writes today, core strategies — look for fact-checking prior art, go upstream to the source, read laterally — can resolve some claims quickly.

But some claims do require a deep dive. In those cases I want students to immerse themselves in that process of discovery. I want them to suspend judgement about the claim and focus initially on marshalling evidence, evaluating sources, and laying a foundation for analysis. It’s hard work that the DigiPo toolkit can make easier, maybe even fun. That’s crucial because the longer you can comfortably dwell in that zone of evidence-gathering and suspended judgement, the stronger your critical thinking will become.

When I first read Toobin’s claim my internal narrative was: “Boasted about voter suppression? Of course those neanderthals did!” Then I entered the zone and spent many hours there. Voter suppression wasn’t a topic I’d spent much time reading about, so I learned a lot. When I returned to the claim I arrived at an interesting judgement. Yes there was voter suppression, and it was in some ways more draconian than I had thought. But had the North Carolina GOP actually boasted about (Mother Jones: bragged about, Salon: celebrated) the lower African-American turnout? I concluded it had not. It had reported reduced early voting, but had not explicitly claimed that as a successful outcome of voter suppression.

So we rated the claim as Mixed — that is, partly true, partly false. A next step for this investigation would be to break the claim into more granular parts. (Software developers would call that “refactoring” the claim.) So:

In a press release on November 7, 2016, the North Carolina GOP reported lower African-American early voting.

That’s easy to check. True.

Here’s another:

In its 11/7/2016 press release the North Carolina GOP boasted about the success of its voter suppression efforts.

Also easy to check: False.

What about this?

In the wake of Shelby County v. Holder, the North Carolina GOP pushed legislation that discriminates against African-American voters.

You need to gather and organize a lot of source material in order to even begin to evaluate that claim. My fondest hope for DigiPo is that students inclined to judge the claim, one way or the other, will delay that judgement long enough to gather evidence that all can agree is valid. That, I believe, would be a fantastic educational outcome.

How shared vocabularies tie the annotated web together

I’m fired up about the work I want to share at Domains 2017 this summer. The tagline for the conference is Indie Tech and Other Curiosities, and I plan to be one of the curiosities!

I’ve long been a cheerleader for the Domain of One’s Own movement. In Reclaiming Innovation, Jim Groom wrote about the need to “understand technologies as ‘potentiality’ (to graft a concept by Anton Chekhov from a literary to a technical context).” He continued:

This is the idea that within the use of every technical tool there is more than just the consciousness of that tool, there is also the possibility to spark something beyond those predefined uses. The only real way to galvanize that potentiality is to provide the conditions of possibility — that is, a toolkit for user innovation.

My recent collaboration with Mike Caulfield on the Digital Polarization Initiative has led to the creation of just such a toolkit. It supports DigiPo in the ways described and shown here. A version of the toolkit, demoed here, will support a team of investigative journalists. Now I need to show how the toolkit enables educators, scientists, investigative reporters, students — anyone who researches and writes articles or reports or papers backed by web-based evidence — to innovate in similar ways.

In tech we tend to abuse the term innovation so let me spell out exactly what I mean: Better ways to gather, organize, reason over, and cite online evidence. Web annotation, standardized this week by the W3C, is a key enabler. The web’s infinite space of addressable URLs is now augmented by a larger infinity of segments of interest within the resources pointed to by URLs. In the textual realm, paragraphs, list items, sentences, or individual words can be reliably linked to conversations — but also applications — that live in connected annotation layers.

A web of addressable segments of interest is a necessary, but not sufficient, condition of possibility. We also need tools that enable us to gather, organize, recombine, and cite those segments. And some of those tools need to be malleable in the hands of users who can shape them for their own purposes.

When I reread Vannevar Bush’s As We May Think, to prepare for a conversation about it with Gardner Campbell and Jeremy Dean (video, Gardner’s reflections), I focused on this passage:

He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together.

Nowadays that first encyclopedia article lives at one URL. The pertinent item in a history is a segment of interest within another URL-addressable resource. How do we tie them together? A crucial connector is a tag that belongs to neither resource but refers to both.

When tools control the sets of tags available for resource interconnection, they enable groups of people to make such connections reliably. That’s what the DigiPo toolkit does when it offers a list of investigation pages, drawn from the namespace of a wiki, as the set of tags that connect annotation-defined evidence to investigations. You see that happening with the DigiPo toolkit shown here, and with a variant of the toolkit shown here. In both cases the tags that bind evidence to wiki pages are controlled by software that acquires a list of wiki pages and presents the names of those pages as selectable tags.

One future direction for the toolkit leads to software that acquires lists of pages from other kinds of content management systems: WordPress, Drupal, you name it. Every CMS defines a namespace that is implicitly a list of tags that can be used to bind sets of resources to the pages served by that CMS. If you’re looking to adapt a DigiPo-like tool to your CMS, I’ll be delighted to show you how.
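
A minimal sketch of such an adapter, written against the WordPress REST API (the function name is mine, and a real site may need paging and authentication):

// Sketch: treat a WordPress site's page titles as a controlled tag
// namespace, as the DigiPo toolkit does with wiki page names.
async function wordpressTagList(siteUrl) {
  const resp = await fetch(siteUrl + '/wp-json/wp/v2/pages?per_page=100');
  const pages = await resp.json();
  return pages.map(p => p.title.rendered); // offer these as selectable tags
}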

Such adaptation, though, requires somebody to write some code. While it’s unfashionable in some circles to say so, I don’t think everyone should learn to code. There’s a more fundamental web literacy, nicely captured by Audrey Watters here:

It’s about understanding the components of the Web and knowing how to tag and then manipulate them. By thinking and developing sets of named resources, you are a Web thinker. This isn’t about programming but rather the creation of sets of resources and the identification of components that work with those resources and combine them to create solutions.

Web annotation vastly enlarges the universe of resources that can be named. But it’s on us to name them. Tags are a principal way we do that. If our naming of resources is going to be an effective way to organize and combine them, though, we need to do it reliably and consistently. Software can enforce that consistency, but not everyone can write software. So a user innovation toolkit for the annotated web needs to empower users to enforce consistent naming without writing code.

A couple of weeks ago I built a Chrome extension that enables users to define their own lists of shared tags by recording them in a Google Doc. The demonstration video prompted this query from Jim Groom:

I just got through with a workshop here demoing Hypothes.is for a European group that may be using it to annotate online legislation for data privacy set to go live in 2018. They are teaching a course on it, and this could be one of the spaces/hubs they build the open part around. I came back to this video just now, but got the sense I could already tag from within annotations/pages, so how does the tag helper change this? Just a different way at it? Is it new functionality from previous tags? I love that you can have a Google Doc list of tags, but the video example is not making sense to me for some reason. And I wanna know :)

Here’s my response. That tag helper, now incorporated into the toolkit I’m evolving for DigiPo and other uses, makes it possible for people who don’t write code to define tag namespaces that govern their gathering, organization, recombination, and citation not only of URL-addressable resources but also of annotation-addressable segments of interest within those resources. People can “tie them together” — as Vannevar Bush imagined — in the ways their interests and workflows require.

Does that answer the question? If not, please keep asking until I do so properly. User-defined tag namespaces, though admittedly still a curiosity, are one of the best ways to make collective use of a web of addressable segments.

How annotation layers define “segments of interest” for new kinds of applications

Here are some analogies we use when talking about software:

Construction: Programs are houses built on foundations called platforms.

Ecology: Programs are organisms that depend on ecosystem services provided by platforms.

Community: Programs work together in accordance with rules defined by platforms.

Architecture: Programs are planned, designed, and built according to architectural plans.

Economics: Programs are producers and consumers of services.

Computer hardware: Programs are components that attach to a shared bus.

All are valid and may be useful in one way or another. In this essay I focus on the last because it points to an important way of understanding what web annotation can enable. My claim here is that the web’s emerging annotation layer forms a shared bus for a new wave of content-oriented applications.

A computer’s bus connects devices: disk drive, keyboard, network adapter. If we think of the web in this way, we’d say that devices (your computer, mine) and also people (you, me) attach to the bus. And that the protocol for attachment has something to do with URLs.

You can, for example, follow this link to display and interact with the set of Hypothesis annotations related to this web page. You can also paste the link’s URL into a message or a document to share the view with someone else.

That same URL can behave like an API (application programming interface) that accesses the resource named and located by the URL. A page like this one, part of the DigiPo fact-checking project, uses the link that way. It derives the Hypothesis search URL from its own URL, and injects the resulting Hypothesis view into the page.
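
In outline, the page’s script does something like this (the element id and tag convention here are illustrative, not the project’s exact code):

// Sketch: derive a Hypothesis search from the page's own URL, then
// inject the resulting view into the page.
const pageName = location.pathname.split('/').pop();
const searchUrl = 'https://hypothes.is/search?q=tag:' +
  encodeURIComponent(pageName);
const frame = document.createElement('iframe');
frame.src = searchUrl;
document.getElementById('matching-annotations').appendChild(frame);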

Every time we create a new wiki page at digipo.io, we mint a new URL that summons the set of Hypothesis annotations specific to that page. In principle there’s no limit to the number of such pages — and associated sets of annotations — we can add. And that’s just one of an unlimited number of sites. The web of URL-addressable resources is infinitely large.

Even so, URLs address only a small part of a larger infinity of resources: words and phrases in texts, regions within images, segments of audio and video. Web annotation enables us to address that larger infinity. The DigiPo project illustrates some of the ways in which annotation expands the notion of content as a bus shared by people and computers. But first some background on how annotation works.

The proposed standard for web annotation defines an extensible set of selectors:

Many Annotations refer to part of a resource, rather than all of it, as the Target. We call that part of the resource a Segment (of Interest). A Selector is used to describe how to determine the Segment from within the Source resource.

When the segment of interest is a selection in a textual resource, one kind of selector captures the selection and its surrounding text. Another captures the position of the selection (“starts at the 347th character, ends at the 364th”). Still another captures its location in a web page (“contained in the 2nd list item in the first list in the seventh paragraph”). For reasons of both speed and reliability, Hypothesis uses all three selectors when it attaches (“anchors”) an annotation to a selection.

When a segment of interest is a clip within a podcast or a video, a selector would capture the start and stop (“starts at 1 minute, 32 seconds, ends at 3 minutes, 12 seconds”). When it’s a region in a bitmapped image, a selector would capture the coordinates (“starts at x=12,y=53, ends at x=355,y=124”). When it’s a piece of a vector image, a selector would capture the Scalable Vector Graphics (SVG) markup defining that piece of the image.

The W3C’s model of web annotation lays a foundation for other kinds of selectors in other domains: locations in maps, nodes in Jupyter notebooks, bars and trend lines and data points in charts. But let’s stick with textual annotation for now, consider how it expands the universe of addressable resources, and explore what we can do in that universe.

Here’s a picture of what’s happening in and around the above-mentioned DigiPo page:

The author has cited a Hypothesis link that refers to a piece of evidence in another web page. The link encapsulates both the URL of that page and a set of selectors that mark the selected passage within it. When you follow the link Hypothesis takes you to the page, scrolls to the passage, and highlights it. That’s a powerful interactive experience!

Now suppose you want to review all the evidence that supports this investigation. You can do it interactively but that will require a lot of context-disrupting clicks. So another program embedded in the wiki page summarizes the cited quotes for you. It uses a variant of the Hypothesis direct link that delivers the interactive experience. The variant is a Hypothesis API call that delivers the annotation in a machine-friendly format. The summarization script collects all the Hypothesis direct links on the page, gathers the annotations, extracts the URLs and quotes, injects them into the Footnotes section of the page, and rewrites the links to point to corresponding footnotes.
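
Here’s a condensed sketch of that logic, not the toolkit’s actual code:

// Sketch: for each Hypothesis direct link on the page, fetch the
// annotation via the API and pull out the source URL and quoted text.
async function gatherFootnotes() {
  const links = document.querySelectorAll('a[href^="https://hyp.is/"]');
  const notes = [];
  for (const link of links) {
    const id = new URL(link.href).pathname.split('/')[1];
    const anno = await (await fetch(
      'https://hypothes.is/api/annotations/' + id)).json();
    const quote = (anno.target[0].selector || [])
      .find(s => s.type === 'TextQuoteSelector');
    notes.push({ source: anno.target[0].source,
                 quote: quote && quote.exact });
  }
  return notes; // next: render into Footnotes, rewrite links to point there
}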

To enable this magic, an app that people can use to annotate regions in web pages is necessary but not sufficient. You also need an API-accessible service that enables computers to create and retrieve annotations. Even more fundamentally, you need an open web standard that defines how apps and services work not only with atomic resources named and located by URLs, but also segments of interest within them.

What else is possible on a shared content bus where segments of interest are directly addressable both by people and computers? Here’s one idea being pondered by some folks in the world of open educational resources (OER). Suppose you’re creating an open textbook that attaches quizzes to segments within the text. The quizzes live in a database. How do you connect a quiz to a segment in your book?

Because a quiz is a URL-addressable resource, you can transclude one directly into your book near the segment to which it applies. Doing that normally means encoding the segment’s location in the book’s markup so the software that attaches the quiz can put it in the right place. That works, but it entangles two editorial tasks: writing the book, and curating the quizzes. That entanglement makes it harder to provide tools that support the tasks individually. If you can annotate segments of interest, though, you can disentangle the tasks, tool them separately, build the book more efficiently, and ensure others can more cleanly repurpose your work.

Analogies are necessary but imperfect. The notion of a shared bus, formed by an annotation layer and used by applications oriented to segments of content, may or may not resonate. I’m looking for a better analogy; suggestions welcome. But however you want to think about it, the method I’m describing here works powerfully well, I’ll continue to apply it, and I’d love to discuss ways you can too.

Componentware Revisited

I’m not a scholar, nor do I play one on TV, but when I search Google Scholar I find that I’m cited there a few times, most notably for a 1994 BYTE cover story, Componentware. The details there are at best of historical interest but the topic remains evergreen: How do we package software in ways that maximize its reusability while minimizing the level of skill required to achieve reuse?

By 1996 the web had booted up and I reprised the theme in On-Line Componentware. That’s when it dawned on me that the websites that people “surfed” to were also software components that could be woven together to meet a variety of needs. It was my first glimpse of what we later came to know as SOA (service-oriented architecture), then RESTful APIs, and most recently microservices. Ever since then, wearing one hat or another, I’ve been elaborating the theme of that column: “A powerful capability for ad hoc distributed computing arises naturally from the architecture of the Web.” (link)

That architecture has in some ways remained the same, in other ways evolved dramatically, but its generative power continues to surprise and delight me. And I keep finding new ways to package and reuse web components.

Hypothesis has been a fascinating case study. Our web annotation system has two main components. The web service, written in Python, runs on a web server. The client, written in JavaScript, runs in your browser. Both are available for reuse in many different ways.

One way to reuse the web service is to embed views in web pages, as shown in this example from the Digital Polarization (Digipo) project:

The “Matching Annotations” widget embedded in that page is just this search result wrapped in an iframe. This is one of the most common and powerful ways to reuse web components.

The Hypothesis API affords another way to reuse its server component. The Timeline widget, embedded on that same page, works that way. It searches Hypothesis for the URLs of annotations tagged with the id of the current wiki page. Then it searches the annotations on each of those URLs for another user-assigned tag that signifies the publication date, and arranges those results chronologically. (The Timeline widget could have been written in PHP to run in the wiki server, but I’m more familiar with JavaScript so instead it’s written in JS and runs in the browser.)
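
In outline, the Timeline widget’s two passes look something like this, assuming a date:YYYY-MM-DD tag convention (the widget’s real convention may differ):

// Sketch of the Timeline widget's two passes over the Hypothesis API.
async function buildTimeline(pageId) {
  const search = q => fetch('https://hypothes.is/api/search?' + q)
    .then(r => r.json());
  // Pass 1: which source URLs carry annotations tagged with this page's id?
  const tagged = await search('tag=' + encodeURIComponent(pageId));
  const urls = [...new Set(tagged.rows.map(a => a.uri))];
  // Pass 2: on those URLs, find date-tagged annotations, then sort.
  const dated = [];
  for (const url of urls) {
    const onUrl = await search('uri=' + encodeURIComponent(url));
    for (const a of onUrl.rows) {
      const dateTag = (a.tags || []).find(t => /^date:\d{4}-\d{2}-\d{2}$/.test(t));
      if (dateTag) dated.push({ date: dateTag.slice(5), uri: a.uri, text: a.text });
    }
  }
  return dated.sort((x, y) => x.date.localeCompare(y.date));
}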

The Hypothesis client can also be reused in powerful ways. Most notably, you can add the client to a website by including this simple script tag in the site’s main template:

<script src="https://hypothes.is/embed.js" async></script>

Or you can use the Hypothesis proxy, https://via.hypothes.is/, to inject the client into a web page, for example: https://via.hypothes.is/https://en.wikipedia.org/wiki/Proxy_server.

When you use Hypothesis to annotate a PDF file, it relies on a separate component — Mozilla’s PDF.js — to parse the PDF and render it in the browser so the Hypothesis client can operate on it. PDF.js is available natively in Firefox; the Hypothesis Chrome extension injects it when you annotate a PDF in that browser.

Another Hypothesis component, pdf.js-hypothesis, enables a web server to serve a PDF with PDF.js and Hypothesis both active. That makes PDF annotation available in any browser. We use it in our prototype Canvas app, for example, to serve annotation-enabled PDFs in the Canvas learning management system (LMS).

Still another component enables custom rendering of annotations. You can see it in action at Science in the Classroom, a collection of research papers annotated to serve as teaching materials.

Graduate students use Hypothesis to create the annotations. But Science in the Classroom prefers to display them using its own mechanism, Learning Lens. So when the page loads, it fetches annotations using the Hypothesis API and then paints them on the page using a component that’s part of the Hypothesis client but is also available as the standalone NPM module dom-anchor-text-quote.

I am deliberately blurring the definition of web component because I think it properly encompasses many different things: a web page embedded in an iframe; an API-accessible web service; a rich client application like Hypothesis (or a simple widget like the Timeline) embedded in a web page; a standalone module like dom-anchor-text-quote; a repackaging of Hypothesis as a WordPress plugin or a Canvas external tool.

This is a rich assortment of ingredients! But there’s one that’s notably absent. We’ve seen lots of ways to use the Hypothesis client as a component that plugs into other environments and makes annotation available there. But what if you want to plug something into the Hypothesis client? There isn’t yet a mechanism for that. The code is open source and can be modified, as Marija Katic and Martin Eve have done with Annotran, a translation tool based on Hypothesis. That’s a great example of code reuse. But it isn’t, at least to my way of thinking, an example of component reuse. Although I recognize many different species of software components, they all share one piece of common DNA: reuse without internal modification.

In an essay on what I learned while building the Canvas app, I noted two critical aspects of the healthy ecosystem that Canvas and other learning management systems inhabit:

1. Standard protocols. In the LMS world, Learning Tools Interoperability (LTI) defines those protocols.

2. Frictionless component reuse. This flows from item 1. An LTI app expects to be launched from an LMS and to run embedded in an iframe there. Again, this is the most common and powerful way to reuse web components.

The question I asked there, and tried to answer: Could an iframe embed web components within a rich web client like Hypothesis? If so, that might open the way for features not yet in the Hypothesis core, like controlled tagging, that would otherwise require deep surgery on the Hypothesis client, and intimate knowledge of its JavaScript framework (Angular) and the nonstandard component model dictated by that framework.

I had already tried a couple of experiments to add controlled tagging to the Hypothesis client. In this one, the tag suggestions offered in the tag editor are bound to Hypothesis groups. In this one, tag suggestions are bound to an external web service. Both experiments entailed nontrivial alteration of the Hypothesis client.

In a third experiment, I modified the Hypothesis client in a way that could enable a family of components to plug into it. This customized client embedded an iframe in the annotation editor, and launched a user-defined web application into that iframe, passing it one parameter: the id of the annotation open in the editor. Because it was configured with the credentials of a Hypothesis user, it could work as a pluggable component that communicates with the active annotation and also with the full panoply of web resources. You could, perhaps, think of it as an annotation applet. Here’s a demo.
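Inside the iframe, the applet’s startup might look like this. The names and token handling are placeholders; the API endpoint is the real Hypothesis annotation-fetch route.

    // Read the annotation id passed by the customized client, then fetch
    // that annotation via the Hypothesis API. API_TOKEN stands in for the
    // configured user's token.
    const API_TOKEN = '...';

    async function main() {
      const id = new URLSearchParams(location.search).get('id');
      const resp = await fetch('https://hypothes.is/api/annotations/' + id, {
        headers: { Authorization: 'Bearer ' + API_TOKEN }
      });
      const annotation = await resp.json();
      // The applet can now read or update the active annotation, and also
      // call out to any other web resource.
      console.log(annotation.uri, annotation.tags);
    }

    main();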

This approach was intriguing and might serve some useful purposes, but an iframe is an ugly and awkward construct to stick into the middle of a richly-designed web client. And this approach again fails my definition of component reuse because it requires internal modification of the client.

So as I began working to integrate Hypothesis into Digipo I was still looking for a way to control Hypothesis tags without modifying the Hypothesis client. As described in A toolkit for fact checkers, we initially used bookmarklets to do that, then began developing a Chrome extension for the Digipo project.

The Chrome extension immediately solved a couple of vexing problems. It enabled us to cleanly package a growing set of Digipo tools, by making them conveniently right-click-accessible. And it got around the security constraints that increasingly make bookmarklets untenable.

Just as importantly it enabled us to blend together a Digipo-specific set of tools, some but not all of which are Hypothesis-powered. For a Digipo fact checker, Hypothesis isn’t a primary part of the experience. It’s a supporting component that’s brought into the process as and where needed. It’s infrastructure.

The Digipo workflow relies on controlled tagging to accumulate evidence into several buckets associated with each investigation. When you’re on a page that you want to put into a bucket, you can use Digipo’s Tag this Page helper to create a Hypothesis page note with the tag for that investigation. It starts here:

That leads to a page that lists the Digipo investigations.

When you choose one, the extension uses the Hypothesis API to create a page note with the investigation’s tag.
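Under the covers that’s a single API call. Here’s a hedged sketch; the token, URL, and tag values are placeholders:

    // Create a Hypothesis page note (an annotation with no selectors)
    // bearing the investigation's tag.
    const API_TOKEN = '...';  // the user's Hypothesis API token

    async function tagThisPage(uri, tag) {
      await fetch('https://hypothes.is/api/annotations', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer ' + API_TOKEN,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ uri: uri, tags: [tag], text: '' })
      });
    }

    tagThisPage('http://example.com/evidence-page',
                'digipo:analysis:gulf_of_frackwater');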

Thanks to Hypothesis direct linking, the interaction flows seamlessly from the Digipo extension to Hypothesis. You land in the annotation editor where you can do more with Hypothesis: add comments and new tags, discuss the target document with other Hypothesis users.

But this arrangement only creates Hypothesis page notes: annotations that refer to a target document but not to a selection within that document. More powerful uses of Hypothesis flow from selections within target documents. Could a selection-based annotation begin in the Digipo extension, acquire a tag, and then flow through to Hypothesis?

Happily the answer is yes. You can see that here.

The Digipo Chrome extension presents one set of helpers when you right-click on a page with nothing selected. Some of the helpers rely on Hypothesis, others just automate parts of the Digipo workflow — for example, launching advanced Google searches. When you right-click with a selection active, the Digipo Chrome extension presents another set of helpers which, again, may or may not rely on Hypothesis. One of them, Tag this Selection, works like Tag this Page in that it uses the Hypothesis API to create an annotation that includes a controlled tag. But Tag this Selection does a bit more work. It sends not only the URL of the target document, but also a Text Quote Selector that anchors the annotation within the document. In this case, too, the interaction then flows seamlessly into Hypothesis where you can edit the newly-created annotation and perhaps discuss the selected passage.
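The request is posted exactly as in the Tag this Page sketch above; only the body differs. A hedged sketch of that body, with illustrative values:

    // The annotation's target carries a Text Quote Selector that anchors
    // it to the selected passage within the page.
    const body = {
      uri: location.href,
      tags: ['digipo:analysis:gulf_of_frackwater'],  // the investigation's tag
      target: [{
        source: location.href,
        selector: [{
          type: 'TextQuoteSelector',
          exact: window.getSelection().toString()    // the selected text
        }]
      }]
    };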

You can see more of the interplay between the Digipo and Hypothesis extensions in this screencast. I’m pretty excited by how this is turning out. The Digipo extension is Chrome-only for now, as is the Hypothesis extension, but WebExtensions should soon enable broader coverage. There’s still a need to plug packaged behavior directly into the Hypothesis client. But much can be accomplished with an extension that cooperates with Hypothesis using its existing set of affordances. The Digipo extension is one example. I can imagine many others, and I’m expanding my definition of componentware to include them.


1 I love how our copy editor insisted on hyphenating On-Line!

A toolkit for fact checkers

Update: See this post (with screencasts!)

Mike Caulfield’s Digital Polarization Initiative (DigiPo) is a template for a course that will lead students through exercises to analyze and fact-check news stories. The pedagogical approach Mike describes here is evolving; in parallel I’ve been evolving a toolkit to help students research and organize the raw materials of the analyses they’ll be asked to produce. Annotation is a key component of the toolkit. I’ve been working to integrate it into the fact-checking workflow in ways that complement the use of other tools.

We’re not done yet but I’m pleased with the results so far. This post is an interim report on what we’ve learned about building an annotation-powered toolkit for fact checkers.

Here’s an example of a DigiPo claim to be investigated:

EPA Plans to Allow Unlimited Dumping of Fracking Wastewater in the Gulf of Mexico (see Occupy)

I start with no a priori knowledge of EPA rules governing release of fracking wastewater, and only a passing acquaintance with the cited source, occupy.com. So the first order of business is to marshal some evidence. Hypothesis is ideal for this purpose. It creates links that encapsulate both the URL of a page containing found evidence, and the evidence itself — that is, a quote selected in the page.

There’s a dedicated page for each DigiPo investigation. It’s a wiki, so you can manually include Hypothesis links as you create them. But fact-checking is tedious work, and students will benefit from any automation that helps them focus on the analysis.

The first step was to include Hypothesis as a widget that displays annotations matching the wiki id of the page. Here’s a standalone Hypothesis view that gathers all the evidence I’ve tagged with digipo:analysis:gulf_of_frackwater. From there it was an easy next step to tweak the wiki template so it embeds that view directly in the page:

That’s really helpful, but it still requires students to acquire and use the correct tag in order to populate the widget. We can do better than that, and I’ll show how later, but here’s the next thing that happened: the timeline.

While working through a different fact-checking exercise, I found myself arranging a subset of the tagged annotations in chronological order. Again that’s a thing you can do manually; again it’s tedious; again we can automate with a bit of tag discipline and some tooling.

If you do much online research, you’ll know that it’s often hard to find the publication date of a web page. It might or might not be encoded in the URL. It might or might not appear somewhere in the text of the page. If it does there’s no predictable location or format. You can, however, ask Google to report the date on which it first indexed a page, and that turns out to be a pretty good proxy for the publication date.

So I made a bookmarklet to encapsulate that query. If you were to activate it on one of my posts it would lead you to this page:

I wrote the post on Oct 30; Google indexed it on Oct 31. That’s close enough for our purposes.
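The bookmarklet can be reconstructed roughly like this. The as_qdr parameter, which makes Google display dates next to results, is my assumption about the recipe, not a documented contract:

    // Search Google for the current page's URL, with a date-range
    // parameter so that result dates are shown.
    javascript:(function () {
      location.href = 'https://www.google.com/search?q='
        + encodeURIComponent(location.href) + '&as_qdr=y15';
    })();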

I made another bookmarklet to capture that date and add it, as a Hypothesis annotation, to the target page.

With these tools in hand, we can expand the widget to include:

  • Timeline. Annotations on the target page with a googledate tag, in chronological order.

  • Related Annotations. Annotations on the target page with a tag matching the id of the wiki page.

You can see a Related Annotations view above, here’s a Timeline:

So far, so good, but as Mike rightly pointed out, this motley assortment of bookmarklets spelled trouble. We wouldn’t want students to have to install them, and in any case bookmarklets are increasingly unlikely to work. So I transplanted them into a Chrome extension. It presents the growing set of tools in our fact-checking toolkit as right-click options on Chrome’s context menu:

It also affords a nice way to stash your Hypothesis credentials, so the tools can save annotations on your behalf:

(The DigiPo extension is Chrome-only for now, as is the Hypothesis extension, but WebExtensions should soon enable broader coverage.)

With the bookmarklets now wrapped in an extension we returned to the problem of simplifying the use of tags corresponding to wiki investigation pages. Hypothesis tags are freeform. Ideally you’d be able to configure the tag editor to present controlled lists of tags in various contexts, but that isn’t yet a feature of Hypothesis.

We can, though, use the Digipo extension to add a controlled-tagging feature to the fact-checking toolkit. The Tag this Page tool does that:

You activate the tool from a page that has evidence related to a DigiPo investigation. It reads the DigiPo page that lists investigations, captures the wiki ids of those pages, and presents them in a picklist. When you choose the investigation to which the current page applies, the current page is annotated with the investigation’s wiki id and will then show up in the Related Annotations bucket on the investigation page.

While I was doing all this I committed an ironic faux pas on Facebook and shared this article. Crazy, right? I’m literally in the middle of building tools to help people evaluate stuff like this, and yet I share without checking. Why did I not take the few seconds required to vet the source, bipartisanreport.com?

When I made myself do that I realized that what should have taken a few seconds took longer. There’s a particular Google advanced query syntax you need in this situation. You are looking for the character string “bipartisanreport.com” but you want to exclude the majority of self-referential pages. You only want to know what other sites say about this one. The query goes like this:

bipartisanreport.com -site:bipartisanreport.com

Just knowing the recipe isn’t enough. Using it needs to be second nature and, even for me, it clearly wasn’t. So now there’s Google this Site:

Which produces this:

It’s ridiculously simple and powerful. I can see at a glance that bipartisanreport.com shows up on a couple of lists of questionable sites. What does the web think about the sites that host those lists? I can repeat Google this Site to zoom in on them.
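For the curious, wiring a helper like this into Chrome’s context menu takes only a few lines in the extension’s background script. This is a plausible sketch, not the actual Digipo source:

    // Register the menu item.
    chrome.contextMenus.create({
      id: 'google-this-site',
      title: 'Google this Site',
      contexts: ['page']
    });

    // On click, search for mentions of the site, excluding the site itself.
    chrome.contextMenus.onClicked.addListener((info, tab) => {
      const host = new URL(tab.url).hostname;
      const q = encodeURIComponent(host + ' -site:' + host);
      chrome.tabs.create({ url: 'https://www.google.com/search?q=' + q });
    });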

Another tool in the kit, Save Facebook Share Count, supports the sort of analysis that Mike did in a post entitled Despite Zuckerberg’s Protests, Fake News Does Better on Facebook Than Real News. Here’s Data to Prove It.

How, for example, has this questionable claim propagated on Facebook? There’s a breadcrumb trail in the annotation layer. On Dec 26 I used Save Publication Date to assign the tag googledate:2016-08-31, and on the same day I used Save Facebook Share Count to record the number of shares reported by the Facebook API. On Dec 30 I again used Save Facebook Share Count. Now we can see that the article is past its sell-by date on Facebook and never was highly influential.
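Here’s a hedged sketch of what Save Facebook Share Count does. The Graph API endpoint and response shape shown are my recollection of that era’s API and should be treated as assumptions; today the call requires an access token.

    // Ask the Facebook Graph API how often a URL has been shared.
    async function shareCount(target) {
      const resp = await fetch(
        'https://graph.facebook.com/?id=' + encodeURIComponent(target));
      const data = await resp.json();
      // Assumed response shape: { share: { share_count: N, ... }, ... }
      return data.share ? data.share.share_count : 0;
    }
    // The count can then be recorded as a dated Hypothesis annotation on
    // the target page, alongside the googledate tag.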

Finally there’s Summarize Quotes, which arose from an experiment of Mike’s to fact-check a single article exhaustively. Here’s the article he picked, along with the annotation layer he created:

Some of the annotations contain Hypothesis direct links to related annotations. If you open this annotation in the Politico article, for example, you can follow Hypothesis links to related annotations on pages at USA Today and Science.

These transitive annotations are potent but it gets to be a lot of clicking around. So the most experimental of the tools in the kit, Summarize Quotes, produces a page like this:

This approach doesn’t feel quite right yet, but I suspect there’s something there. Using these tools you can gather a lot of evidence pretty quickly and easily. It then needs to be summarized effectively so students can reason about the evidence and produce quality analysis. The toolkit embodies a few ways to do that summarization; I’m sure more will emerge.

Marshalling the evidence

In Bird-dogging the web I responded to questions raised by Mike Caulfield about how annotation can help us fact-check the web. He’s now written a definition of bird-dogging, the political technique we discussed in those posts. It’s a method of recording candidates’ positions on issues, but it has recently been mischaracterized as a way to incite violence. I’ve annotated a batch of articles that conflate bird-dogging with violence:

source: https://hypothes.is/api/search?tags=bird-dogging&user=judell

Each annotation links to Mike’s definition. Collectively they form a data set that can be used to trace the provenance of the bird-dogging = violence meme. A digital humanist could write an interesting paper on how the meme flows through a network of sources, and how it morphs along the way. But how will such evidence ever make a difference?

In Annotating the wild west of information flow I sketched an idea that weaves together annotation, a proposed standard for fact-checking called ClaimReview, and Google’s plan to use that standard to add Fact Check labels to news articles. These ingredients are necessary but not sufficient. The key missing ingredient? President Obama nailed it in his remarks at the White House Frontiers Conference: “We’re going to have to rebuild, within this wild west of information flow, some sort of curating function that people agree to.”

It can sometimes seem, in this polarized era, that we can agree on nothing. But we do agree, at least tacitly, on the science behind the technologies that sustain our civilization: energy, agriculture, medicine, construction, communication, transportation. When evidence proves that cigarettes can cause lung cancer, or that buildings in some places need to be earthquake-resistant, most of us accept it. Can we learn to honor evidence about more controversial issues? If that’s possible, annotation’s role will be to help us marshal that evidence.

Bird-dogging the web

In Annotating the wild west of information flow I responded to President Obama’s appeal for “some sort of curating function that people agree to” with a Hypothes.is thought experiment. What if an annotation tool could make claims about the veracity of statements on the web, and record those claims in a standard machine-readable format such as ClaimReview? The example I gave there: a climate scientist can verify or refute an assertion about climate change in a newspaper article.

Today Mike Caulfield writes about another kind of fact-checking. At http://www.mostdamagingwikileaks.com/ he found this claim:

“Bird-dogging is a term coined by high-level Clinton staffers who openly talk about it in the video. They boast about inciting violence at Trump rallies, paying for every protest…”

Mike knows better.

Wait, what? Bird-dogging is about violence?

I was a bird-dogger for some events in 2008 and got to know a bunch of bird-doggers in my work as a blogger. Clinton didn’t invent the term and it has nothing to do with violence.

So he annotates the statement. But he’s not just refuting a claim, he’s explaining what bird-dogging really means: you follow candidates around and film their responses to questions about your issues.

Now Mike realizes that he can’t find an authoritative definition of that practice. So, being an expert on the subject, he writes one. Which prompts this question:

Why the heck am I going to write a comment that is only visible from this one page? There are hundreds (maybe thousands) of pages on the internet making use of the fact that there is no clear explanation of this on the web.

Mike’s annotation does two things at once. It refutes a claim about bird-dogging on one specific page. That’s the sweet spot for annotation. His note also provides a reusable definition of bird-dogging that ought to be discoverable in other contexts. Here there’s nothing special about a Hypothes.is note versus a wiki page, a blog post, or any other chunk of URL-addressable content. An authoritative definition of bird-dogging could exist in any of these forms. The challenge, as Mike suggests, is to link that definition to many relevant contexts in a discoverable way.

The mechanism I sketched in Annotating the wild west of information flow lays part of the necessary foundation. Mike could write his authoritative definition, post it to his wiki, and then use Hypothes.is to link it, by way of ClaimReview-enhanced annotations, to many misleading statements about bird-dogging around the web. So far, so good. But how will readers discover those annotations?

Suppose Mike belongs to a team of political bloggers who aggregate claims they collectively make about statements on the web. Each claim links to a Hypothes.is annotation that locates the statement in its original context and to an authoritative definition that lives at some other URL.

Suppose also that Google News regards Mike’s team as a credible source of machine-readable claims for which it will surface the Fact Check label. Now we’re getting somewhere. Annotation alone doesn’t solve Mike’s problem, but it’s a key ingredient of the solution I’m describing.

If we ever get that far, of course, we’ll run into an even more difficult problem. In an era of media fragmentation, who will ever subscribe to sources that present Fact Check labels in conflict with their beliefs? But given the current state of affairs, I guess that would be a good problem to have.

Reading and writing for our peers

The story Jan Dawson tells in The De-Democratization of Online Publishing is familiar to me. Like him, I was thrilled to be part of the birth of personal publishing in the mid-1990s. By 2001 my RSS feedreader was delivering a healthy mix of professional and amateur sources. Through the lens of my RSS reader, stories in the New York Times were no more or less important than blog posts from my peers in the tech blogosphere. And because RSS was such a simple format, there was no technical barrier to entry. It was a golden era of media democratization not seen before or since.

As Dawson rightly points out, new formats from Google (Accelerated Mobile Pages) and Facebook (Instant Articles) are “de-democratizing” online publishing by upping the ante. These new formats require skills and tooling not readily available to amateurs. That means, he says, that “we’re effectively turning back the clock to a pre-web world in which the only publishers that mattered were large publishers and it was all but impossible to be read if you didn’t work for one of them.”

Let’s unpack that. When I worked for a commercial publisher in 2003, my charter was to bring its audience to the web and establish blogging as a new way to engage with that audience. But my situation was atypical. Most of the bloggers I read weren’t, like me, working for employers in the business of manufacturing audiences. They were narrating their work and conserving keystrokes. Were they impossible to read? On the contrary, if you shared enough interests in common it was impossible not to read them.

When publishers created audiences and connected advertisers to them, you were unlikely to be read widely. Those odds don’t change when Google and Facebook become the publishers; only the gatekeepers do. But when publishing is personal and social, that doesn’t matter.

One of the bloggers I met long ago, Lucas Gonze, is a programmer and a musician who curates and performs 19th-century parlour music. He reminded me that before the advent of recording and mass distribution, music wasn’t performed by a small class of professionals for large audiences. People gathered around the piano in the parlour to play and sing.

Personal online publishing once felt like that. I don’t know if it will again, but the barrier isn’t technical. The tools invented then still exist and they work just fine. The only question is whether we’ll rekindle our enthusiasm for reading and writing for our peers.

From PDF to PWP: A vision for compound web documents

I’ve been in the web publishing game since it began, and for all this time I’ve struggled to make peace with the refusal of the Portable Document Format (PDF) to wither and die. Why, in a world of born-digital documents mostly created and displayed on computers and rarely printed, would we cling to a format designed to emulate sheets of paper bound into books?

For those of us who labor to extract and repurpose the contents of PDF files, it’s a nightmare. You can get the text out of a PDF file but you can’t easily reconstruct the linear stream that went in. That problem is worse for tabular data. For web publishers, it’s a best practice to separate content assets (text, lists, tables, images) from presentation (typography, layout) so the assets can be recombined for different purposes and reused in a range of formats: print, screens of all sizes. PDF authoring tools could, in theory, enable some of that separation, but in practice they don’t. Even if they did, it probably wouldn’t matter much.

Consider a Word document. Here the tools for achieving separation are readily available. If you want to set the size of a heading you don’t have to do it concretely, by setting it directly. Instead you can do it abstractly, by defining a class of heading, setting properties on the class, and assigning the class to your heading. This makes perfect sense to programmers and zero sense to almost everyone else. Templates help. But when people need to color outside the lines, it’s most natural to do so concretely (by adjusting individual elements) not abstractly (by defining and using classes).

It is arguably a failure of software design that our writing tools don’t notice repetition of concrete patterns and guide us to corresponding abstractions. That’s true for pre-web tools like Word. It’s equally true for web tools — like Google Docs — that ape their ancestors. Let’s play this idea out. What if, under the covers, the tools made a clean separation of layout and typography (defined in a style sheet) from text, images, and data (stored in a repository)? Great! Now you can restyle your document, and print it or display it on any device. And you can share with others who work with you on any of their devices.

What does sharing mean, though? It gets complicated. The statements “I’ll send you the document” or “I’ll share the document with you” can sometimes mean: “Here is a link to the document.” But they can also mean: “Here is a copy of the document.” The former is cognitively unnatural for the same reason that defining abstract styles is. We tend to think concretely. We want to manipulate things in the digital world directly. Although we’re learning to appreciate how the link enables collaboration and guarantees we see the same version, sending or sharing a copy (which affords neither advantage) feels more concrete and therefore more natural than sending or sharing a link.

Psychology notwithstanding, we can’t (yet) be sure that the recipient of a document we send or share will be able to use it online. So, often, sending or sharing can’t just mean transferring a link. It has to mean transferring a copy. The sender attaches the copy to a message, or makes the copy available to the recipient for download.

That’s where the PDF file shines. It bundles a set of assets into a single compound document. You can’t recombine or repurpose those assets easily, if at all. But transfer is a simple transaction. The sender does nothing extra to bundle it for transmission, and the recipient does nothing extra to unbundle it for use.

I’ve been thinking about this as I observe my own use of Google Docs. Nowadays I create lots of them. My web publishing instincts tell me to create sets of reusable assets and then link them together. Instead, though, I find myself making bigger and bigger Google Docs. One huge driver of this behavior has been the ability to take screenshots, crop them, and copy/paste them into a doc. It’s massively more efficient than the corresponding workflow in, say, WordPress, where the process entails saving a file, uploading to the Media Folder, and then sourcing the image from there.

Another driver has been the Google Docs table of contents feature. I have a 100-page Google Doc that’s pushing the limits of the system and really ought to be a set of interlinked files. But the workflow for that is also a pain: capture the link to A, insert it into B, capture the link to B, insert it into A. I’ve come to see the table of contents feature — which builds the TOC as a set of links derived from doc headings — as a link automation tool.

As the Google Drive at work accumulates more stuff, I’m finding it harder to find and assemble bits and pieces scattered everywhere. It’s more productive to work with fewer but larger documents that bundle many bits and pieces together. If I send you a link to a section called out in the TOC, it’s as if I sent you a link to an individual document. But you land in a context that enables you to find related stuff by scanning the TOC. That can be a more reliable method of discovery, for you, than searching the whole Google Drive.

Can’t I just keep an inventory of assets in a folder and point you to the folder? Yes, but I’ve tried it, and it feels far less effective. I think there are two reasons why. First, there’s the overhead of creating and naming the assets. Second, the TOC conveys outline structure that the folder listing doesn’t.

This method is woefully imperfect for all kinds of reasons. A 100-page Google Doc is an unwieldy construct. Anonymous assets can’t be found by search. Links to headings lack human-readable information. And yet it’s effective because, I am coming to realize, there’s an ancient and powerful technology at work here. When I create a Google Doc in this way I am creating something like a book.

This may explain why the seeming immortality of the PDF format is less crazy than I have presumed. Even so, I’m still not ready to ante up for Acrobat Pro. I don’t know exactly what a book that’s born digital and read on devices ought to be. I do know a PDF file isn’t the right answer. Nor is a website delivered as a zip file. We need a thing with properties of both.

I think a W3C Working Draft entitled Portable Web Publications for the Open Web Platform (PWP) points in the right direction. Here’s the manifesto:

Our vision for Portable Web Publications is to define a class of documents on the Web that would be part of the Digital Publishing ecosystem but would also be fully native citizens of the Open Web Platform.

PWP usefully blurs distinctions along two axes: a publication can be packed into a single bundle or unpacked into individually addressable parts, and it can live locally or remotely.

That’s exactly what’s needed to achieve the goal. We want compound documents to be able to travel as packed bundles. We want to address their parts individually. And we want both modes available to us regardless of whether the documents are local or remote.

Because a PWP will be made from an inventory of managed assets, it will require professional tooling that’s beyond the scope of Google Docs or Word Online. Today it’s mainly commercial publishers who create such tools and use them to take apart and reconstruct the documents — typically still Word files — sent to them by authors. But web-native authoring tools are emerging, notably in scientific publishing. It’s not a stretch to imagine such tools empowering authors to create publication-ready books in PWP. It’s more of a stretch to imagine successors to Google Docs and Word Online making that possible for those of us who create book-like business documents. But we can dream.

Customer service and human dignity

It’s been a decade since I interviewed Paul English on the subject of customer service and human dignity (audio). He was CTO and co-founder at Kayak, but in this interview we talked more about GetHuman. It had begun as a list of cheats to help you hack through the automated defenses of corporate customer service and get to a real person. Here’s how I remember The IVR Cheat Sheet back then:

Finance                      Phone           Steps to find a human
America First Credit Union   800-999-3961    0 or say “member services”
American Express             800-528-4800    0 repeatedly
Bank of America              800-900-9000    00, or dial 813-882-1103 for the Executive Office
Bank of America              800-622-8731    *
Bank of America              800-432-1000    Say “operator” or “associate” at any point in the menu
Charles Schwab               800-435-9050    3, 0
Chase                        800-CHASE24     5 pause 1 4
Chrysler Financial           800-700-0738    Select language, then press 00
Citi AAdvantage              888-766-2484    Ignore prompts and wait for a human
Citi Card                    800-967-8500    0,0,0,0,0

In our interview Paul said:

Dignity is defined in part as giving people the right to make decisions. In particular if it’s a company I’m paying $100/month for cable or cell phone or whatever, and they don’t give me the ability to decide when I need to talk to a human, I find it really insulting.

When the CEO makes the terrible decision to treat customer service as a cost center, the bonus for the VP who runs it is based on one thing: shaving pennies off the cost of the call.

I responded:

Which is a tragedy because customer service is a huge opportunity for business differentiation. If we set up a false dichotomy, where it’s either automated or human, we’re missing out on the real opportunity which is to connect the right people to the right context at the right time. That’s what needs to happen, but a tricky thing to orchestrate and there doesn’t seem to be any vision for how to do that.

I’ve used GetHuman for 10 years. Yesterday I went there to gird for battle with Comcast and was delighted to see that the service has morphed into this:

Boston-based startup GetHuman on Wednesday unveiled a new service that lets you pay $5 to $25 to hire a “problem solver” who will call a company’s customer service line on your behalf to resolve issues. Prices vary depending on the company, but GetHuman offers to fight for your airline refund, deal with Facebook account issues, or perhaps even prevent a grueling call with Comcast to disconnect your service.

— CNET, May 4, 2016

I’m really curious about their hands-off problem-solving service and will try it in other circumstances, but my negotiation with Comcast was going to require my direct involvement. So this free call-back service made my day:

How our Comcast call-back works

First we call Comcast, wade through their phone maze, wait on hold for you, and then call you back when an agent can talk. We try 4 times, in case we don’t get through the first time. Of course, once you do talk to a Comcast rep, you still have to do the talking, negotiating, etc.

I went back to work. The call came. Normally I’d be feeling angry and humiliated in this situation. Instead I felt happy and empowered. Companies have used their robots to thwart me all these years. Now I’ve got a robot on my side of the table. It’s on!

A chorus of IT recipes

My all-time favorite scene in The Matrix, if not in all of moviedom, is the one where Trinity needs to know how to fly a helicopter. “Tank, I need a pilot program for a B-212.” Her eyelids flutter while she downloads the skill.

I always used to think there was just one definitive flight instruction implant. But lately, thanks to Ward Cunningham and Mike Caulfield, I’ve started to imagine it a different way.

Here’s a thing that happened today. I needed to test a contribution from Ned Zimmerman that will improve the Hypothesis WordPress plugin. The WordPress setup I’d been using had rotted, it was time for a refresh, and the way you do that nowadays is with a tool called Docker. I’d used it for other things but not yet for WordPress. So of course I searched:

wordpress docker ubuntu

A chorus of recipes came back. I picked the first one and got stuck here with this sad report:

'module' object has no attribute 'connection'

Many have tried to solve this problem. Some have succeeded. But for my particular Linux setup it just wasn’t in the cards. Pretty quickly I pulled the plug on that approach, went back to the chorus, and tried another recipe which worked like a charm.

The point is that there is no definitive recipe for the task. Circumstances differ. There’s a set of available recipes, some better than others for your particular situation. You want to be able to discover them, then rapidly evaluate them.

Learning by consulting a chorus is something programmers and sysadmins take for granted because a generation of open source practice has built a strong chorus. The band’s been together for a long time, and the community knows the tunes.

Can this approach help us master other disciplines? Yes, but only if the work of practitioners is widely available online for review and study. Where that requirement is met, choral explanations ought to be able to flourish.

Augmenting journalism

Silicon Valley’s enthusiasm for a universal basic income follows naturally from a techno-utopian ideology of abundance. As robots displace human workers, they’ll provide more and more of the goods and services that humans need, faster and cheaper and better than we could. We’ll just need to be paid to consume those goods and services.

This narrative reveals a profound failure of imagination. Our greatest tech visionary, Doug Engelbart, wanted to augment human workers, not obsolete them. If an automated economy can free people from drudgework and — more importantly — sustain them, I’m all for it. But I believe that many people want to contribute if they can. Some want to teach. Some want to care for the elderly. Some want to build affordable housing. Some want to explore a field of science. Some want to grow food. Some want to write news stories about local or global issues.

Before we pay people simply to consume, why wouldn’t we subsidize these jobs? People want to do them, too few are available, and they pay too poorly; expanding these workforces would benefit everyone.

The argument I’ll make applies equally to many kinds of jobs, but I’ll focus here on journalism because my friend Joshua Allen invited me to respond to a Facebook post in which he says, in part:

We thought we were creating Borges’ Library of Babel, but we were haplessly ushering in the surveillance state and burning down the journalistic defenses that might have protected us from ascendant Trump.

Joshua writes from the perspective of someone who, like me, celebrated an era of technological progress that hasn’t served society in the ways we imagined it would. But we can’t simply blame the web for the demise of journalism. We mourn the loss of an economic arrangement — news as a profit-making endeavor — that arguably never ought to have existed. At the dawn of the republic it did not.

This is a fundamental of democratic theory: that you have to have an informed citizenry if you’re going to have not even self-government, but any semblance of the rule of law and a constitutional republic, because people in power will almost always gravitate to doing things to benefit themselves that will be to the harm of the Republic, unless they’re held accountable, even if they’re democratically elected. That’s built into our constitutional system. And that’s why the framers of the Constitution were obsessed with a free press; they were obsessed with understanding if you don’t have a credible press system, the Constitution can’t work. And that’s why the Framers in the first several generations of the Republic, members of Congress and the President, put into place extraordinary press subsidies to create a press system that never would have existed had it been left to the market.

— Robert McChesney, in Why We Need to Subsidize Journalism. An Exclusive Interview with Robert W. McChesney and John Nichols

It’s true that a universal basic income would enable passionate journalists like Dave Askins and Mary Morgan to inform their communities in ways otherwise uneconomical. But we can do better than that. The best journalism won’t be produced by human reporters or robot reporters. It will be a collaboration among them.

The hottest topic in Silicon Valley, for good reason, is machine learning. Give the machines enough data, proponents say, and they’ll figure out how to outperform us on tasks that require intelligence — even, perhaps, emotional intelligence. It helps, of course, if the machines can study the people now doing those tasks. So we’ll mentor our displacers, show them the ropes, help them develop and tune their algorithms. The good news is that we will at least play a transitional role before we’re retired to enjoy our universal basic incomes. But what if we don’t want that outcome? And what if it isn’t the best outcome we could get?

Let’s change the narrative. The world needs more and better journalism. Many more want to do that journalism than our current economy can sustain. The best journalism could come from people who are augmented by machine intelligence. Before we pay people to consume it, let’s pay some of them to partner with machines in order to produce quality journalism at scale.

I get to be a blogger

To orient myself to Santa Rosa when we arrived two years ago I attended a couple of city council meetings. At one of them I heard a man introduce himself in a way that got my attention. “I’m Matt Martin,” he said, “and I get to be the executive director of Social Advocates for Youth.” I interpreted that as: “It is my privilege to be the director of SAY.” Last week at a different local event I heard the same thing from another SAY employee. “I’m Ken Quinto and I get to be associate director of development for SAY.” I asked Ken if I was interpreting that figure of speech correctly and he said I was.

Well, I get to be director of partnership and integration for Hypothes.is and also a blogger. Former privileges include: evangelist for Microsoft, pioneering blogger for InfoWorld, freelance web developer and consultant, podcaster for ITConversations, columnist for various tech publications, writer and editor and web developer for BYTE. In all these roles I’ve gotten to explore technological landscapes, tackle interesting problems, connect with people who want to solve them, and write about what I learn.

Once, and for a long time, the writing was my primary work product. When blogging took off in the early 2000s I became fascinated with Dave Winer’s notion that narrating your work — a practice more recently called observable work and working out loud — made sense for everyone, not just writers who got paid to write. I advocated strongly for that practice. But my advice came from a place of privilege. Unlike most people, I was getting paid to write.

I still get to tackle interesting problems and connect with people who want to solve them. But times have changed. For me (and many others) that writing won’t bring the attention or the money that it once did. It’s been hard — really hard — to let go of that. But I’m still the writer I always was. And the practice of work narration that I once advocated from a position of privilege still matters now that I’ve lost that privilege.

The way forward, I think, is to practice what I long preached. I can narrate a piece of work, summarize what I’ve learned, and invite fellow travelers to validate or revise my conclusions. The topics will often be narrow and will appeal to small audiences. Writing about assistive technology, for example, won’t make pageview counters spin. But it doesn’t have to. It only needs to reach the people who care about the topic, connect me to them, and help us advance the work.

Doing that kind of writing isn’t my day job anymore, and maybe never will be again. But I get to do it if I want to. That is a privilege available to nearly everyone.

Towards accessible annotation: a prototype and some questions

The most basic operation in Hypothes.is — select text on a page, click the Annotate button — is not yet accessible to a visually-impaired person who is using a screenreader. I’ve done a bit of research and come up with an approach that looks like it could work, but also raises many questions. In the spirit of keystroke conservation I want to record here what I think I know, and try to find out what I don’t.

Here’s a screencast of an initial prototype that shows, with the NVDA screen reader active on my system, the following sequence of events:

  • Load the Gettysburg address.
  • Use a key to move a selection from paragraph to paragraph.
  • Hear the selected paragraph.
  • Tab to the Annotate button and hit Enter to annotate the selected paragraph.

It’s a start. Now for some questions:

1. Is this a proper use of the aria-live attribute?

The screenreader can do all sorts of fancy navigation, like skip to the next word, sentence, or paragraph. But its notion of a selection exists within a copy of the document and (so far as I can tell) is not connected to the browser’s copy. So the prototype uses a mechanism called ARIA Live Regions.

When you use the hotkey to advance to a paragraph and select it, a JavaScript method sets the aria-live attribute on that paragraph. That alone isn’t enough to make the screenreader announce the paragraph; it just tells the screenreader to watch the element and read it aloud if it changes. To effect a change, the JS method prepends selected: to the paragraph. Then the screenreader speaks it.
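Here’s a minimal sketch of that mechanism; the function name is invented, and the prototype’s actual code may differ:

    function announceSelection(p) {
      // Ask the screenreader to watch this element for changes.
      p.setAttribute('aria-live', 'polite');

      // aria-live announces changes, not state, so prepend a marker to
      // cause a change the screenreader will speak.
      if (!p.textContent.startsWith('selected: ')) {
        p.textContent = 'selected: ' + p.textContent;
      }

      // Select the paragraph in the browser too, so the Annotate button
      // has a selection to work with.
      const range = document.createRange();
      range.selectNodeContents(p);
      const sel = window.getSelection();
      sel.removeAllRanges();
      sel.addRange(range);
    }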

2. Can JavaScript in the browser relate the screenreader’s virtual buffer to the browser’s Document Object Model?

I suspect the answer is no, but I’d love to be proven wrong. If JS in the browser can know what the screenreader knows, the accessibility story would be much better.

3. Is this a proper use of role="link"?

The first iteration of this prototype used a document that mixed paragraphs and lists. Both were selected by the hotkey, but only the list items were read aloud by the screen reader. Then I realized that’s because list items are among the set of things — links, buttons, input boxes, checkboxes, menus — that are primary navigational elements from the screenreader’s perspective. So the version shown in the screencast adds role="link" to the visited-and-selected paragraph. That smells wrong, but what’s right?

4. Is there a polyfill for Selection.modify()?

Navigating by element — paragraph, list item, etc. — is a start. But you want to be able to select the next (or previous) word or sentence or paragraph or table cell. And you want to be able to extend a selection to include the next word or sentence or paragraph or table cell.

A non-standard technology, Selection.modify(), is headed in that direction, and works today in Firefox and Chrome. But it’s not on a standards track. So is there a library that provides that capability in a cross-browser fashion?
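For reference, here’s what the non-standard API looks like where it works. Granularity support varies by browser; in my understanding, “sentence” and “paragraph” are spottier than “character” and “word”.

    const sel = window.getSelection();
    sel.modify('move', 'forward', 'word');        // move the caret one word
    sel.modify('extend', 'forward', 'word');      // grow the selection one word
    sel.modify('extend', 'forward', 'sentence');  // ...or by a sentence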

It’s a hard problem. A selection within a paragraph that appears to grab a string of characters is, under the covers, quite likely to cross what are called node boundaries. Here, from an answer on StackOverflow, is a picture of what’s going on:

When a selection includes a superscript, as shown here, it’s obvious to you what the text of the selection should be: 123456790. But that sequence of characters isn’t readily available to a JavaScript program looking at the page. It has to traverse a sequence of nodes in the browser’s Document Object Model in order to extract a linear stream of text.

It’s doable, and in fact Hypothes.is does just that when you make a selection-based annotation. That gets harder, though, when you want to move or extend that selection by words and paragraphs. So is there a polyfill for Selection.modify()? The closest I’ve found is rangy; are there others?

5. What about key bindings?

The screen reader reserves lots of keystrokes for its own use. If it’s not going to be possible to access its internal representation of the document, how will there be enough keys left over for rich navigation and selection in the browser?

What I Learned While Building an App for the Canvas Learning Management System

Life takes strange turns. I’m connected to the ed-tech world by way of Gardner Campbell, Jim Groom, and Mike Caulfield. They are fierce critics of the academy’s embrace of the Learning Management System (LMS) and are among the leaders of an indie-web movement that arose in opposition to it. So it was odd to find myself working on an app that would enable my company’s product, the Hypothes.is web/PDF annotator, to plug into what’s become the leading LMS, Instructure’s Canvas.

I’m not an educator, and I haven’t been a student since long before the advent of the LMS, so my only knowledge of it was second-hand. Now I can report a first-hand experience, albeit that of a developer building an LMS app, not that of a student or a teacher.

What I learned surprised me in a couple of ways. I’ve found Canvas to be less draconian than I’d been led to expect. More broadly, the LMS ecosystem that’s emerged — based on a standard called Learning Tools Interoperability (LTI), now supported by all the LMS systems — led me to an insight about how the same approach could help unify the emerging ecosystem of annotation systems. Even more broadly, all this has prompted me to reflect on how the modern web platform is both more standardized and more balkanized than ever before.

But first things first. Our Canvas app began with this request from teachers: “How can we enable students to use Hypothes.is to annotate the PDF files we upload to our courses?” There wasn’t any obvious way to integrate our tool into the native Canvas PDF viewer. That left two options. We could perhaps create a plugin, internal to Canvas, based on Hypothes.is and the JavaScript component (Mozilla’s PDF.js) we and others use to convert PDF files into web pages. Or we could create an LTI app that delivers that combo as a service running — like all LTI apps — outside Canvas. We soon found that the first option doesn’t really exist. Canvas is an open source product, but the vast majority of schools use Instructure’s hosted service. Canvas has a plugin mechanism but there seems to be no practical way to use it. I don’t know about other LMSs (yet) but if you want to integrate with Canvas, you’re going to build an app that’s launched from Canvas, runs in a Canvas page, and communicates with Canvas using the standard LTI protocol and (optionally) the Canvas API.
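For flavor, here’s a minimal sketch of what receiving an LTI 1.x launch looks like. I’ve written it as a Node/Express handler purely for illustration (our app’s actual stack is beside the point); the parameter names are standard LTI 1.1, and signature verification is elided.

    const express = require('express');
    const app = express();
    app.use(express.urlencoded({ extended: false }));

    app.post('/lti_launch', (req, res) => {
      // An LTI launch is an OAuth1-signed form POST from the LMS. Verify
      // the oauth_* fields against the shared secret before trusting it.
      const {
        user_id,                 // opaque id of the launching user
        roles,                   // e.g. Instructor or Learner
        context_id,              // the course
        resource_link_id,        // this particular assignment/placement
        lis_outcome_service_url  // where grades are POSTed back, if graded
      } = req.body;

      res.send('Launched ' + resource_link_id + ' in course ' + context_id);
    });

    app.listen(3000);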

Working out how to do that was a challenge. But with lots of help from ed-tech friends and associates as well as from Instructure, we came up with a nice solution. A teacher who wants to base an assignment on group annotation of a PDF file or a web page adds our LTI app to a course. The app displays a list of the PDFs in the Files area of the course. The teacher selects one of those, or provides the URL of a web page to annotate, then completes the assignment in the usual way by adding a description, setting a date, and defining the grading method if participation will be graded. When the student clicks the assignment link, the PDF or web page shows up in a Canvas page with the Hypothes.is annotator active. The student logs into Hypothes.is, switches to a Hypothes.is private group (if the teacher created one for the course), engages with the document and with other students in the annotation layer, and at some point submits the assignment. What the teacher sees then, in a Canvas tool called Speed Grader, on a per-student basis, is an export of document-linked conversation threads involving that student.

The documents that host those conversations can live anywhere on the web. And the conversations are wide open too. Does the teacher engage with students? Do students engage with one another? Does conversation address predefined questions or happen organically? Do tag conventions govern how annotations cluster within or across documents? Nothing in Hypothes.is dictates any such policies, and nothing in Canvas does either.

Maybe the LMS distorts or impedes learning, I don’t know, I’m not an educator. What I can say is that, from my perspective, Canvas just looks like a content management system that brings groups and documents together in a particular context called a course. That context can be enhanced by external tools, like ours, that enable interaction not only among those groups and documents but also globally. A course might formally enroll a small group of students, but as independent Hypothes.is users they can also interact DS106-style with Hypothes.is users and groups anywhere. The teacher can focus on conversations that involve enrolled students, or zoom out to consider a wider scope. To me, at least, this doesn’t feel like a walled garden. And I credit LTI for that.

The app I’ve written is a thin layer of glue between two components: Canvas and Hypothes.is. LTI defines how they interact, and I’d be lying if I said it was easy to figure out how to get our app to launch inside Canvas and respond back to it. But I didn’t need to be an HTTP, HTML, CSS, JavaScript, or Python wizard to get the job done. And that’s fortunate because I’m not one. I just know enough about these technologies to be able to build basic web apps, much like ones I was able to build 20 years ago when the web first became a software platform. The magic for me was always about what simple web apps can do when connected to the networked flow of information among people and computers. My Canvas experience reminded me that we can still tap into that magic.

Why did I need to be reminded? Because while the web’s foundation is stronger than ever, the layers being built on it — so-called frameworks, with names like Angular and Ember (in the browser), Rails and Pyramid (on the server) — are the province of experts. These frameworks help with common tasks — identifying users, managing interaction with them, storing their data — so developers can focus on what their apps do specially. That’s a good and necessary thing when the software is complex, and when it’s written by people who build complex software for a living.

But lots of useful software isn’t that complex, and isn’t written by people who do that for a living. Before the web came along, plenty got built on Lotus 1-2-3, Excel, dBase, and FoxPro, much of it by information workers who weren’t primarily doing that for a living. The early web had that same feel but with an astonishing twist: global connectivity. With only modest programming skill I could, and did, build software that participated in a networked flow of information among people and computers. That was possible for two reasons. First, with HTML and JavaScript (no CSS yet) I could deliver a basic user interface to anyone, anywhere, on any kind of computer. Second, with HTTP I could connect that user interface to components and databases all around the web. Those components and databases were called web sites by the people who viewed them through the lens of the browser. But for me they were also software services. Through the lens of a network-savvy programming language (it was Perl, at the time) the web looked like a library of software modules, and URLs looked like the API (application programming interface) to that library.

If I had to write a Canvas plugin I’d have needed to learn a fair bit about its framework, called Rails, and about Ruby, the language in which that framework is written. And that hard-won knowledge would not have transferred to another LMS built on a different framework and written in a different language. Happily LTI spared me from that fate. I didn’t need to learn that stuff. When our app moves to another LMS it’ll need to know how to pull PDF files out of that other system. And that other system might not yet support all the LTI machinery required for two-way communication. But assuming it does, the app will do exactly what it does now — launch in response to an “API call” (aka URL), deliver a “component” (an annotation-enabled document) — in exactly the same way.

Importantly I wasn’t just spared a deep dive into Rails, the server framework that powers Canvas. I was also spared a deep dive into Angular, the JavaScript framework that powers the Hypothes.is client. That’s because our browser-based app can work as a pluggable component. It’s easy to embed Hypothesis in web pages and not much harder to do the same for PDFs displayed in the browser. All I had to do was the plumbing. I wish that had been easier than it was. But it was doable with modest and general skills. That makes the job accessible to people without elite and specific skills. How many more such people are there? Ten times? A hundred? The force multiplier, whatever it may be, increases the likelihood that useful combinations of software components will find their way into learning environments.

All this brings me back to Hypothes.is, and to the annotation ecosystem that we envision, promote, and expect to participate in. The W3C Web Annotation Working Group is defining standard ways to represent and exchange annotations, so that different kinds of annotation clients and servers can work together as do different kinds of email clients and email servers, or browsers and web servers. Because Hypothes.is implements early variants of those soon-to-be-formalized annotation standards, I’ve been able to do lots of useful integration work. Much of it entails querying our service for annotation data and then filtering, transforming, or cross-linking it. That requires only basic web data wrangling. Some of the work entails injection of that data into web pages. That requires only basic web app development. But until recently I didn’t see a way to democratize the ability to extend the Hypothes.is client.

Here’s an example of the kind of thing I want to be able to do and, more importantly, that I want others to be able to do. Like other social systems we offer tags as a principal way to organize data sets. In Hypothes.is you can use tags to keep track of documents as well as annotations linked to those documents. The tags are freeform. We remember and prompt with the tags you’ve used recently, but there are no rules; you can make up whatever tags you want. That’s great for casual use. If you need a bit more rigor, it’s possible to agree with your collaborators on a restricted set of tags that define key facets of the data you jointly create. But pretty soon you find yourself wishing for more control. You want to define specific lists of terms available in specific contexts for specific purposes.

Hypothes.is uses the Angular framework, as I’ve said. It also relies on a set of components that work only in that framework. One of those, called ngTagsInput, is the tag editor used in Hypothes.is. The good news is that it handles basic tagging quite well, and our developers didn’t need to build that capability, they just plugged it in. The bad news is that in order to do any meaningful work with ngTagsInput, you’d need to learn a lot about it, about how it works within the Angular framework, and about Angular itself. That hard-won knowledge won’t transfer to another JavaScript framework, nor will what you build using that knowledge transfer to another web client built on another framework. A component built in Angular won’t work in Ember just as a component built for Windows won’t work on the Mac.

With any web-based technology there’s always a way to get your foot in the door. In this case, I found a way to hook into ngTagsInput at the point where it asks for a list of terms to fill its picklist. In the Hypothes.is client, that list is kept locally in your browser and contains the tags you’ve used recently. It only required minor surgery to redirect ngTagsInput to a web-based list. That delivered two benefits. The list was controlled, so there was no way to create an invalid tag. And it was shared, so you could synchronize a group on the same list of controlled tags.
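Here’s a minimal sketch, in Python, of the kind of web-based list that surgery pointed ngTagsInput at. The route and payload shape are hypothetical, not the actual prototype; the idea is just that a group curates one list and every member’s tag editor fetches it.

```python
# Hypothetical service for a shared, controlled tag vocabulary. A tag
# editor pointed here, instead of at a local list, can offer only the
# approved terms -- so there's no way to create an invalid tag.
from flask import Flask, jsonify

app = Flask(__name__)

# The controlled vocabulary, curated by the group.
CONTROLLED_TAGS = ["method", "result", "limitation", "needs-review"]

@app.route("/tags")
def tags():
    return jsonify({"tags": CONTROLLED_TAGS})

if __name__ == "__main__":
    app.run(port=8000)
```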

A prototype based on that idea has helped some Hypothes.is users manage annotations with shared tag namespaces. But others require deeper customization. Scientific users, in particular, spend increasing time and effort annotating documents, extracting structured information from them, and classifying both the documents and the annotations. For one of them, it wasn’t enough to connect ngTagsInput to a web-based list of terms. People needed to see context wrapped around those terms in order to know which ones to pick. That context was available on the server, but there was no way to present it in ngTagsInput. Cracking that component open and working out how to extend it to meet this requirement is a job for an expert. You’d need a different expert to do the same thing for ngTagsInput’s counterpart in a different JavaScript framework. That doesn’t bode well if you want to end up with an annotation ecosystem made of standard parts.

So, channeling Douglas Hofstadter, I wondered: “What’s the LTI of annotation?” The answer I came up with, in another prototype, was a way to embed a simple web application in the body of an annotation. Just as my LTI app is launched in the context of a Canvas course, with knowledge of the students and resources in that course as well as API access to both Canvas and to the global network of people and information, so with this little web app. It’s launched in the context of an annotation, with knowledge of the properties of that annotation (document URL, quote, comment, replies, tags) and with API access to both Hypothes.is and to the same global network of people and information. Just as my LTI app requires only basic web development knowledge and ability, so with this annotation app. You don’t need to be an expert to create something useful in this environment. And the thing you do could transfer to another standards-based annotation environment.

There’s nothing new here. We’ve had all these capabilities for 20 years. Trends in modern web software development pile on layers of abstraction, push us toward specialization, and make it harder to see the engine under the hood that runs everything. But if you lift the hood you’ll see that the engine is still there, humming along more smoothly than ever. One popular JavaScript library, jQuery, was once widely used mainly to paper over browsers’ incompatible implementations of HTML, JavaScript, CSS, and an underlying technology called the Document Object Model. jQuery is falling into disuse because modern browsers have converged remarkably well on those web standards. Will Angular and Ember and the rest likewise converge on a common system of components? A common framework, even? I hope so; opinions differ; if it does happen it won’t be soon.

Meanwhile Web client apps, in fierce competition with one another and with native mobile apps, will continue to require elite developers who commit to non-portable frameworks. Fair enough. But that doesn’t mean we have to lock out the much larger population of workaday developers who command basic web development skills and can use them to create useful software that works everywhere. We once called Perl the duct tape of the Internet. With a little knowledge of it, you could do a lot. It’s easy to regard that as an era of lost innocence. But a little knowledge of our current flavors of duct tape can still enable many of us to do a lot, if systems are built to allow and encourage that. The LTI ecosystem does. Will the annotation ecosystem follow suit?

Copyright can’t stop annotation of government documents

I’ll admit that the Medium Legal team’s post AB 2880 — Kill (this) Bill had me at hello:

Fellow Californians, please join us in opposing AB 2880, which would allow and encourage California to extend copyright protection to works made by the state government. We think it’s a bad idea that would wind up limiting Californians’ ability to post and read government information on platforms like Medium.

That sure does sound like a bad idea, and hey, I’m a Californian now too. But when I try to read the actual bill I find it hard to relate its text to Medium Legal’s interpretations, or to some others.

I doubt I’m alone in struggling to connect these interpretations to their evolving source text. Medium Legal says, for example:

AB 2880 requires the state’s Department of General Services to track the copyright status of works created by the state government’s 228,000 employees, and requires every state agency to include intellectual property clauses in every single one of their contracts unless they ask the Department in advance for permission not to do so.

What’s the basis for this interpretation? How do Medium Legal think the text of the bill itself supports it? I find four mentions of the Department of General Services in the bill: (1), (2), (3), (4). To which of these do Medium Legal refer? Do they also rely on the Assembly Third Reading? How? I wish Medium Legal had, while preparing their post, annotated those sources.

The Assembly Third Reading, meanwhile, concludes:

Summary of the bill: In summary, this bill does all of the following:

1) clarifies existing law that state agencies may own, license, and register intellectual property to the extent not inconsistent with the rights of the public to obtain, inspect, copy, publish and otherwise communicate under the California Public Records Act, the California Constitution as provided, and under the First Amendment to the United States Constitution;

2) …

7) …

Analysis Prepared by: Eric Dang / JUD. / (NNN) NNN-NNNN

The same questions apply. How does Eric Dang think the source text supports his interpretation? How do his seven points connect to the bill under analysis? Again, an annotation layer would help us anchor the analysis to its sources.

Medium Legal and Eric Dang used digital tools to make notes supporting their essays. Such notes are, by default, not hyperlinked to specific source passages and not available to us as interpretive lenses. Modern web annotation flips that default. Documents remain canonical; notes anchor precisely to words and sentences; the annotation layer is a shareable overlay. There’s no copying, so no basis for the chilling effect that critics of AB 2880 foresee. While the bill might limit Californians’ ability to post and read government information on platforms like Medium, it won’t matter one way or the other to Californians who do such things on platforms like Hypothesis.

Watching animals

On a visit to the Berlin Zoo last month, I watched primates interact with a cleverly designed game called a poke box. It’s a Plexiglas enclosure with shelves. Rows of holes in the enclosure give access to the shelves. Each shelf has a single hole, offset from the holes above and below. The machine drips pellets of food onto the top shelf. The primates, using sticks they’ve stripped and prepared, reach through the holes and tease the pellets through the holes in the shelves, performing one delicate maneuver after another, finally reaching through a slot in the bottom of the enclosure to claim the prize.

How would I perform in a game like that? My hunch: not much better or worse than a group of bonobos I spent a long time watching.

“This so-called behavioural enrichment,” the Zoo says, “is an important area in animal management.” The poke box is undoubtedly enriching the lives of those primates. But they are still prisoners.

My dad was a volunteer at the Philadelphia Zoo. I saw, growing up, how that zoo tried to create more naturalistic settings for its animals. The most successful I can recall was the hummingbird house. Its tiny inhabitants roamed freely in a large open space that looked and felt like a tropical rain forest.

I don’t know if those birds really lived something like a normal life, but it seems at least plausible. I can’t say the same for most of the animals I’ve seen in zoos. Conditions may have improved over the years, but I’m not sure I’ll ever again pay to watch the prisoners perform “enriched” behaviors in “naturalistic” settings.

If my children were still young, that’d be a hard call to make. A visit to the zoo is one of the great experiences of childhood. If there weren’t zoos, how could we replace that experience?

Maybe we don’t need to jail animals. Maybe we just need to improve our ability to observe them in situ. There’s still plenty of open space that is, or could be, conserved. And we’re getting really good at surveillance! Let’s put our new skill to a different use. Fly cameras over areas of wildlife parks inaccessible to tourist vehicles. Enable online visitors to adopt and follow individual animals and their groups. Make it possible for anyone to have the sort of experience Jane Goodall did. The drone cameras sound creepy, but unlike the vehicles that carry tourists through those parks, the drones will keep getting smaller and more unobtrusive.

Could it happen? I’m not holding my breath. So for now I’ll focus on the wildlife I can see right here in Sonoma County. This spring we discovered the Ninth Street Rookery, less than a mile from our house. Each of the two large eucalyptus trees there is home to about a hundred nests in which black-crowned night herons and various species of egret — great, cattle, snowy — raise a cacophony as they raise their young. We visited a couple of times; it’s wonderful. Although, come to think of it, if it were a channel on cams.allaboutbirds.org I could watch and learn a lot more about the lives of these animals. And so could you.

Thoughts in motion, annotated

In Knowledge Work as Craft Work (2002), Jim McGee wrote:

The journey from apprentice to master craftsman depends on the visibility of all aspects of craft work.

That was the inspiration for a talk I gave at the 2010 Traction User Group meeting, which focused on the theme of observable work. In the GitHub era we take for granted that we can craft software in the open, subjecting each iteration to highly granular analysis and discussion. Beautiful Code (2007) invited accomplished programmers to explain their thinking. I can imagine an annotated tour of GitHub repositories as the foundation of a future edition of that book.

I can also imagine crafting prose — and then explaining the process — in a similarly open and observable way. The enabling tools don’t exist but I’m writing this post in a way that I hope will suggest what they might be. The toolset I envision has two main ingredients: granular versioning and annotation. When I explored Federated Wiki last year, I got a glimpse of the sort of versioning that could usefully support analysis of prose craft. The atomic unit of versioning in FedWiki is the paragraph. In Thoughts in motion I created a plugin that revealed the history of each paragraph in a document. As writers we continually revise. The FedWiki plugin illustrated that process in a compelling way. The sequence of revisions to a paragraph recorded a sequence of decisions.
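Here’s a minimal sketch, assuming nothing about FedWiki’s actual internals, of what paragraph-granular versioning amounts to: each paragraph carries its own ordered revision list, so the decisions behind any one paragraph can be replayed.

```python
# Sketch of paragraph-level version history; names are hypothetical.
from collections import defaultdict

class ParagraphHistory:
    def __init__(self):
        # paragraph id -> ordered list of versions
        self.revisions = defaultdict(list)

    def revise(self, para_id, text):
        self.revisions[para_id].append(text)

    def history(self, para_id):
        return list(self.revisions[para_id])

doc = ParagraphHistory()
doc.revise("p1", "We continually revise.")
doc.revise("p1", "As writers we continually revise.")
print(doc.history("p1"))  # replay the sequence of decisions
```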

For an expert writer such decisions are often tacit. We apply rules that we’ve internalized. How might an expert writer bring those rules to the surface, reflect on them, and explain them to others? Granular version history is necessary but not sufficient. We also need a way to narrate our decisions. I think annotation of version history can help us tell that story. To test that intuition, I am recording a detailed history of this blog post as I write it. The experiment I have in mind: annotate that change history to explain — to myself and others — the choices I’ve made along the way.

Time passes…

OK, I’ve done the experiment here. It certainly explained some things to me about my own process. I doubt it’s generally useful as is, but I think the technique could become so in two ways. As a teacher, I might start with a demo essay, work through a series of revisions, and then annotate them to illustrate aspects of structure, word choice, clarity, and brevity. As a student, I might work through my own essay in the same way, guided by progressive feedback (in the annotation layer) from the teacher. It looks promising to me; what do you think?

Annotation is not (only) web comments

Annotation looks like a new way to comment on web pages. “It’s like Medium,” I sometimes explain, “you highlight the passage you’re talking about, you write a comment about it, the comment anchors to the passage and displays to its right.” I need to stop saying that, though, because it’s wrong in two ways.

First, annotation isn’t new. In 1968 Doug Engelbart showed a hypertext system that could link to regions within documents. In 1993, NCSA Mosaic implemented the first in a long lineage of modern annotation tools. We pretend that tech innovation races along at breakneck speed. But sometimes it sputters until conditions are right.

Second, annotation isn’t only a form of online discussion. Yes, we can converse more effectively when we refer to selected passages. Yes, such conversation is easier to discover and join when we can link directly to a context that includes the passage and its anchored conversation. But I want to draw attention to a very different use of annotation.

A web document is a kind of database. Some of its fields may be directly available: the title, the section headings. Other fields are available only indirectly. The author’s name, for example, might link to the author’s home page, or to a Wikipedia page, where facts about the author are recorded. The web we weave using such links is the map that Google reads and then rewrites for us to create the most powerful information system the world has yet seen. But we want something even more powerful: a web where the implicit connections among documents become explicit. Annotation can help us weave that web of linked data.

The semantic web is, of course, another idea that’s been kicking around forever. In that imagined version of the web, documents encode data structures governed by shared schemas. And those islands of data are linked to form archipelagos that can be traversed not only by people but also by machines. That mostly hasn’t happened because we don’t yet know what those schemas need to be, nor how to create writing tools that enable people to easily express schematized information.

Suppose we agree on a set of standard schemas, and we produce schema-aware writing tools that everyone can use to add new documents to a nascent semantic web. How will we retrofit the web we already have? Annotation can help us make the transition. A project called SciBot has given me a glimpse of how that can happen.

Hypothesis’ director of biosciences Maryann Martone and her colleagues at the Neuroscience Information Framework (NIF) project are building an inventory of antibodies, model organisms, and software tools used by neuroscientists. NIF has defined and promoted a way to identify such resources when mentioned in scientific papers. It entails a registry of Research Resource Identifiers (RRIDs) and a protocol for including RRIDs in scientific papers.

Here’s an example of some RRIDs cited in Dopaminergic lesioning impairs adult hippocampal neurogenesis by distinct modification of α-synuclein:

Free-floating sections were stained with the following primary antibodies: rat monoclonal anti-BrdU (1:500; RRID:AB_10015293; AbD Serotec, Oxford, United Kingdom), rabbit polyclonal anti-Ki67 (1:5,000; RRID:AB_442102; Leica Microsystems, Newcastle, United Kingdom), mouse monoclonal antineuronal nuclei (NeuN; 1:500; RRID:AB_10048713; Millipore, Billerica, MA), rabbit polyclonal antityrosine hydroxylase (TH; RRID:AB_1587573; Millipore), goat polyclonal anti-DCX (1:250; RRID:AB_2088494; Santa Cruz Biotechnology, Santa Cruz, CA), and mouse monoclonal anti-α-syn (1:100; syn1; clone 42; RRID:AB_398107; BD Bioscience, Franklin Lakes, NJ).

The term “goat polyclonal anti-DCX” is not necessarily unique. So the author has added the identifier RRID:AB_2088494, which corresponds to a record in NIF’s registry. RRIDs are embedded directly in papers, rather than attached as metadata, because as Dr. Martone says, “papers are the only scientific artifacts that are guaranteed to be preserved.”

But there’s no guarantee an RRID means what it should. It might be misspelled. Or it might point to a flawed record in the registry. Could annotation enable a process of computer-assisted validation? Thus was born the idea of SciBot. It’s a human/machine partnership that works as follows.

A human validator sends the text of an article to a web service. The service scans the article for RRIDs. For each that it finds, it looks up the corresponding record in the registry, then calls the Hypothesis API to post an annotation that anchors to the text of the RRID and includes the lookup result in the body of the annotation. That’s the machine’s work. Now comes the human partner.

If the RRID is well-formed, and if the lookup found the right record, a human validator tags it a valid RRID — one that can now be associated mechanically with occurrences of the same resource in other contexts. If the RRID is not well-formed, or if the lookup fails to find the right record, a human validator tags the annotation as an exception and can discuss with others how to handle it. If an RRID is just missing, the validator notes that with another kind of exception tag.
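Here’s a sketch, in Python, of the machine half of that partnership. The RRID pattern, the registry resolver URL, and the token are assumptions; the annotation payload follows the Hypothes.is API as I understand it, anchoring each note with a TextQuoteSelector.

```python
# Sketch of SciBot's machine half: find RRIDs, look them up, post
# anchored annotations. Resolver URL and token are placeholders.
import re
import requests

HYPOTHESIS_API = "https://api.hypothes.is/api/annotations"
TOKEN = "YOUR_API_TOKEN"  # hypothetical placeholder

def scan_and_annotate(article_url, article_text):
    for rrid in re.findall(r"RRID:([A-Z]+_\d+)", article_text):
        # Hypothetical registry lookup; the real resolver may differ.
        lookup = requests.get(
            f"https://scicrunch.org/resolver/RRID:{rrid}.json")
        payload = {
            "uri": article_url,
            "text": f"Registry lookup for RRID:{rrid} "
                    f"returned HTTP {lookup.status_code}",
            "tags": ["RRID"],
            "target": [{
                "source": article_url,
                "selector": [{"type": "TextQuoteSelector",
                              "exact": f"RRID:{rrid}"}],
            }],
        }
        requests.post(HYPOTHESIS_API, json=payload,
                      headers={"Authorization": f"Bearer {TOKEN}"})
```

The human validators then work through the posted annotations, tagging each as valid or as an exception, exactly as described above.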

If you’re not a neuroscientist, as I am not, that all sounds rather esoteric. But this idea of humans and machines working together to enhance web documents is, I think, powerful and general. When I read Katherine Zoepf’s article about emerging legal awareness among Saudi women, for example, I was struck by odd juxtapositions along the timeline of events. In 2004, reforms opened the way for women to enter law schools. In 2009, “the Commission for the Promotion of Virtue and the Prevention of Vice created a specially trained unit to conduct witchcraft investigations.” I annotated a set of these date-stamped statements and arranged them on a timeline. The result is a tiny data set extracted from a single article. But as with SciBot, the method could be applied by a team of researchers to a large corpus of documents.

Web documents are databases full of facts and assertions that we are ill-equipped to find and use productively. Those documents have already been published, and they are not going to change. Using annotation we can begin to make better use of the documents that exist today, and more clearly envision tomorrow’s web of linked data.

This hybrid approach is, I think, the viable middle path between two unworkable extremes. People won’t be willing or able to weave the semantic web. Nor will machines, though perfectly willing, be able to do that on their own. The machines will need training wheels and the guidance of human minds and hands. Annotation’s role as a provider of training and guidance for machine learning can powerfully complement its role as the next incarnation of web comments.

Owning and sharing your words

In 2001 I was among a community of early bloggers who came together around Dave Winer’s Radio UserLand, a tool for both publishing and aggregating blogs. To the world at large, blogging was rightly understood to be a new and exciting way for people to publish their writing online. Those of us exploring the new medium found it to be, also, a social network that was naturally immune to spam. We all had our own inviolate spaces for web writing. No moderation was needed because there were no comments. And yet rich discourse emerged. How? The web enabled us to link to one another’s posts, and RSS enabled us to monitor one another’s feeds. That was enough to sustain vibrant and civil conversation.

Things stayed civil because the system aligned incentives correctly. “You own your words,” Dave said, “and you speak in your own space on the web.” If you said something nasty about someone else you weren’t saying it in their space, or in some neutral space, you were saying it directly in your own space, one that represented you to the world.

Earlier systems, such as UseNet and Web forums, lacked this blend of mutual consent and accountability. So do modern ones such as Facebook and Twitter. The quality of discourse on Radio UserLand was, for a while, like nothing I’ve experienced before or since.

The blogosphere grew. Blog comments appeared. Google killed the dominant blog reader. Twitter and Facebook appeared. Now we can be sovereigns of our own online spaces, and we can be connected to others, but it seems that we can’t be both at the same time and in the same way.

What brings all this back is the kerfuffle, last week, that some of us who are building web annotation tools are calling TateGate. (Tate: annotate.) You can read about it on my company’s blog and elsewhere, but it boils down to a set of hard questions. Is it both legal and ethical to:

1. Write your words on my blog in an annotation overlay visible only to you in your browser?

2. Share the overlay with others in a private group space?

3. Share the overlay on the open web?

The answer to 1 is almost certainly yes. The original 1996 CSS spec, for example, recommended that browsers enable users to override publisher-defined style sheets. CSS recognized that the needs of publishers and readers are in dynamic tension. Publishers decide how they want readers to see their pages, but readers can decide differently. In 2005 a Firefox extension called Greasemonkey began empowering users to make functional as well as stylistic changes: adding a Delete button to Gmail, reporting book availability in local libraries on Amazon pages. If a site has ever successfully challenged the right of a user to alter a page locally, in the browser, I haven’t heard of it; neither have knowledgeable friends and acquaintances I’ve asked about this.

When the overlay is shared at wider scopes things get more complicated. The nexus of stakeholders includes publishers, readers, and annotators. Let’s explore that nexus in two different cases.

Climate news

In this case, climate scientists team up to annotate a news site’s story on climate change. The annotation layer is shared on the open web, visible to anyone who acquires the tool needed to view it. The scientists believe they are providing a public service. Readers who value the scientists’ assessment can opt in to view their annotations. Readers who don’t care what the scientists think don’t have to view the annotations. Publishers may be more or less comfortable with the existence of the opt-in annotation layer, depending on their regard for the scientists and their willingness to embrace independent scrutiny.

Before you decide what you think about this, consider an alternative. Climate skeptics team up to offer a competing annotation layer that offers a very different take on the story. Publishers, annotators, and readers still find themselves in the same kind of dynamic tension, but it will feel different to you in a way that depends on your beliefs about climate change.

Either way it’s likely that you’ll find this model generally sound. Many will agree that public information should be subject to analysis, that analysis should take advantage of the best tools, and that commentary anchored to words and phrases in source texts is highly effective.

“TateGate”

In this case, a news site annotates a personal blog. The blogger believes that she owns her words, she moderates all comments to ensure that’s so, and she feels violated when she learns about a hostile overlay available to anyone who can discover it. The annotators are using a tool that doesn’t enable sharing the overlay in a private group, so we don’t know how the option to restrict the overlay’s availability might have mattered. The annotation layer is available in two ways: as a proxied URL available in any unmodified browser, and as a browser extension that users install and activate. The proxy could in principle have been turned off for this blog, but it wasn’t, so we don’t know whether things would have played out differently if a user-installed browser extension were the only way to view the annotations.

These variables may affect how you think about this case. Your beliefs about what constitutes fair use, appropriation, and harassment certainly will. And there are still more variables. A site can, for example, choose to invite Hypothesis annotators by embedding our client. We envision, but have yet to offer, layers in which groups self-moderate annotations that all viewers can read but not write. And we envision that publishers might choose to make only certain of those channels discoverable in the annotation layer they choose to embed. That restriction would, however, not apply to users who bring their own independent annotation client to the page.

We at Hypothesis are soliciting a range of views on this thicket of thorny issues, and we are considering how to evolve tools and policies that will address them. Here I’m not speaking for my employer, though; I’m just reflecting on the tension between wanting to own our words and wanting to share them with the world.

The closest modern equivalent to the Radio UserLand model is one that indie web folks call POSSE, which stands for Publish (on your) Own Site, Syndicate Everywhere. POSSE encourages me to comment on your site by writing a post on my site and notifying yours about it. You can choose to accept my contribution or not. If you do accept it, there’s a sense in which it is not a statement that lives on your site but rather one that lives on mine and is reflected to yours. Both parties negotiate a zone of ambiguity between what’s on my site and what’s on your site.
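The indie web’s Webmention protocol is one concrete version of that notification step. Here’s a hedged sketch in Python that checks only the HTTP Link header (endpoints can also be advertised in a page’s HTML, which this skips); the URLs are hypothetical.

```python
# Sketch of a POSSE-style notification: my post (source) mentions
# your post (target). Simplified Webmention-style discovery.
import requests

def send_webmention(source, target):
    # Discover the target's webmention endpoint from its Link header.
    head = requests.head(target, allow_redirects=True)
    link_header = head.headers.get("Link")
    if not link_header:
        return None
    for link in requests.utils.parse_header_links(link_header):
        if "webmention" in link.get("rel", ""):
            # Notify the endpoint that source links to target.
            return requests.post(
                link["url"], data={"source": source, "target": target})
    return None

# Hypothetical URLs for illustration.
send_webmention("https://my.blog/reply-to-you", "https://your.blog/post")
```

Your site, on receiving that notification, decides whether to fetch, verify, and display my contribution.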

Web annotation seems less ambiguous. When I highlight your words and link mine to them, in an overlay on your page, it seems more as if mine are appearing on your site. Is that an unavoidable perception? I don’t know. Maybe there’s a role for a CSS-like mechanism that enables publishers, annotators, and readers to negotiate where and how annotations are displayed. It’s worth considering.

Here’s what I do know: I’m lucky to be involved in a project that raises these issues and challenges us to consider them carefully.

Liminal thinking at scale

My short 2009 review¹ of Stewart Brand’s Whole Earth Discipline includes this Kevin Kelly quote that continues to resonate for me:

Kevin Kelly calls the book “a short course on how to change your mind intelligently” — in this case, about cities, nuclear power, and genetic and planetary engineering. These are all things that Stewart Brand once regarded with suspicion but now sees as crucial tools for a sustainable world.

In Changeable minds I wrote about a touchstone question that I now sometimes ask people:

What’s something you believed deeply, for a long time, and then changed your mind about?

It’s a hard question for any of us to answer, but as Dave Gray and Wael Ghonim have recently reminded me, it matters more and more that we try. Here’s a useful picture I grabbed from Gray’s screencast on what he calls liminal thinking:

The idea is that I’m standing in the bubble on the left, atop an unconscious pyramid of belief formation. You are standing in the bubble on the right, atop your own unconscious pyramid. And our two pyramids rest on different regions of an underlying reality. How can we engage in productive discourse?

Gray says it requires two tricky maneuvers. First I need to shine a light down into the unconscious fog, climb down my own “ladder of inference,” and reflect on how my own experience of reality informs my own beliefs. Then, he says, I need to take that flashlight, walk over to your pyramid, and climb up your ladder of inference. “Liminal thinking,” he tweeted the other day, “is the art of creating change by understanding, shaping, and reframing beliefs.”

I’ll surely read his book when it comes out. But since I already agree with the principles and practices it espouses, I don’t expect a mind-changing outcome. It’s clear that Dave Gray and I stand on mostly-overlapping belief pyramids. What would motivate somebody not in that bubble to want to cross the chasm to a very different pyramid?

Wael Ghonim’s latest TED talk suggests an intriguing possibility. He’s now given two such talks. The first, in 2011, was a stirring tribute to social media’s role in fomenting the Arab Spring. (Ghonim created the pivotal We are all Khaled Said Facebook page.) In 2016 the Arab Spring seems a distant memory, and Ghonim entitled his latest talk Let’s design social media that drives real change. Here’s the key takeaway for me:

There’s a lot of debate today on how to combat online harassment and fight trolls. This is so important. No one could argue against that. But we need to also think about how to design social media experiences that promote civility and reward thoughtfulness. I know for a fact if I write a post that is more sensational, more one-sided, sometimes angry and aggressive, I get to have more people see that post. I will get more attention.

But what if we put more focus on quality? What is more important: the total number of readers of a post you write, or who are the people who have impact that read what you write? Couldn’t we just give people more incentives to engage in conversations, rather than just broadcasting opinions all the time? Or reward people for reading and responding to views that they disagree with? And also, make it socially acceptable that we change our minds, or probably even reward that?

I suspect that relatively few of us already are liminal thinkers, or are willing and able to learn and apply the principles and practices. Can we imagine, and build, a social media platform that encourages liminal thinking at scale? That’s an idea worth sharing.


¹ When I revisited that post today, I was also intrigued by this:

Don’t miss the annotations — a website that reproduces every paragraph that includes citations, links to their sources, and adds updates.

Alas, the link to those annotations — in iCloud, at http://web.me.com/stewartbrand/DISCIPLINE_footnotes/Contents.html — has rotted. For me it’s another reminder to prioritize work on the archival capabilities we envision for Hypothesis. We want to archive both your annotations and (where possible) the documents they refer to.

When it’s cold in New England, thoughts turn to alternative home heating

I started this WordPress incarnation of my blog in late 2007. On this day in 2009 I published one of the most-read posts here: Central heating with a wood gasification boiler. WordPress stats have shown me that interest has been seasonal. When it’s winter in the northeastern US, people still heating with oil imagine alternatives. As a result, more people find their way to that post than do in summer.

Would this year’s freaky warmth depress that historical wintertime interest in the article? That’s what I expected to find, and this chart appears to confirm it.

[Chart: December page views of the wood gasification boiler post, by year]

Of course interest in the blog has declined in general over that period, because I’ve put less effort into writing and promoting it. But if we chart another perennial favorite, Why Guinness tastes better in Ireland, there’s no downward trend:

[Chart: December page views of the Guinness post, by year]

So I think the temperature correlation is valid. And I predict the orange curve on the first chart will trend upward when there’s another cold winter in the Northeast.

Abomination and progress: A schizophrenic Saudi timeline

When I read Katherine Zoepf’s article about emerging legal awareness among Saudi women, I found myself imagining a timeline of events. Then I realized I could create one pretty easily with Hypothesis:

A timeline of events noted in Sisters in Law: Saudi women are beginning to know their rights

2004

In 2004, Saudi Arabia introduced reforms allowing women’s colleges and universities to offer degree programs in law.

In 2004, she was a student in the human-resources department at King Abdulaziz University, in Jeddah, when the university announced that it would be opening a degree program in law for female students. It was the first such program in the kingdom, and Zahran immediately switched her concentration to law.

2008

The first female law students graduated in 2008, but, for several years after that, they were prohibited from appearing in court.

In 2008, King Abdullah, who died last January, appalled some of his subjects when he announced that the Riyadh University for Women would be renamed Princess Nora bint Abdul Rahman University, in memory of a favorite aunt.

The fact that women couldn’t obtain law licenses wasn’t a source of anxiety for Zahran and her classmates, but by 2008, when she graduated, the justice ministry still hadn’t indicated that it would begin licensing female lawyers.

2009

Sorcery is considered such a grave concern that, in 2009, the Commission for the Promotion of Virtue and the Prevention of Vice created a specially trained unit to conduct witchcraft investigations.

2011

In 2011, when Mohra Ferak entered the law department at Dar Al-Hekma, her immediate family was supportive, but others were horrified. People said, “Are you serious?”

2013

In 2013, law licenses were granted to four women, including Bayan Mahmoud Zahran.

Since 2013, women have been allowed to ride bicycles, but only in designated parks and recreation areas, chaperoned by a close male relative.

In supermarkets, which have employed women since 2013, low partitions suffice, because semi-public spaces are easily monitored by members of the Committee for the Promotion of Virtue and the Prevention of Vice, the kingdom’s religious police.

2014

The lecturer, Bayan Mahmoud Zahran, a thirty-year-old Jeddah attorney who, in January, 2014, became the first Saudi woman to open a law firm.

The advent, in 2014, of car services that can be requested through mobile apps has given women a freedom of movement that had seemed impossible just months earlier.

2015

The first lecture in the series, which Ferak called Hawa’a’s Rights (Hawa’a is the Arabic version of the name Eve), was publicized on Twitter and took place on the evening of April 15th.

The second Hawa’a’s Rights lecture, on April 26th, addressed personal-status law, the category of Saudi law that governs marriage, divorce, guardianship, and inheritance.

In early October, at the end of the Islamic calendar year, the Saudi justice ministry announced that in the past twelve months there had been a forty-eight-per-cent increase in cases of khula, divorces initiated by women.

In November, in an adultery case, a married woman was sentenced to death by stoning; her unmarried male partner received a hundred lashes.

Today, several thousand Saudi women hold law degrees, and sixty-seven are licensed to practice, according to justice-ministry figures released at the end of November.

To make this timeline I started by annotating the article in a particular way. All the dates of interest were in the 2000s, so I did an in-browser search for “20”, which highlighted all the occurrences of 2004, 2008, 2011, and so on. I selected those sentences and made Hypothesis annotations with the tags 20xx, Women, and Saudi Arabia. For dates in 2015, I repeated the exercise using searches for the names of months.

You can see the annotations in context here, and as extracts here. A light massage of that data yielded the timeline.
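For the curious, here’s roughly what that light massage looked like, sketched in Python. The tag and the response shape follow the Hypothes.is search API as I understand it; the fallback year for month-only items is an assumption.

```python
# Sketch: pull the tagged annotations and group quoted passages by year.
import re
import requests

rows = requests.get(
    "https://api.hypothes.is/api/search",
    params={"tag": "Saudi Arabia", "limit": 200},
).json()["rows"]

timeline = {}
for row in rows:
    # The anchored quote lives in the target's TextQuoteSelector.
    for selector in row["target"][0].get("selector", []):
        if selector.get("type") == "TextQuoteSelector":
            quote = selector["exact"]
            match = re.search(r"\b20\d\d\b", quote)
            # Month-only items (April, November) fall back to 2015.
            year = match.group(0) if match else "2015"
            timeline.setdefault(year, []).append(quote)

for year in sorted(timeline):
    print(year)
    for quote in timeline[year]:
        print("  ", quote)
```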

It’s a weirdly schizophrenic mix of despair and hope, abomination and progress. On the one hand, there’s still stoning.

On the other hand, there’s growing awareness of legal rights — if not yet a strong movement to expand them. And there’s dramatic new mobility thanks to Uber.

Will the better angels of our nature prevail? If we project this timeline into the future, I’m betting that we’ll see less witchcraft and stoning, more freedom and mobility.

Organic hydro-engineering

We arrived in California in the fourth year of the drought. It didn’t rain often last winter but when it did it poured, and the Santa Rosa Creek — which had been trickling through our neighborhood — became a torrent. As I watched all that precious rainwater rushing away to the Russian River, and thence to the Pacific Ocean, I wondered: What’s California doing to capture that water? The answer so far seems to be: Not much.

The need for large-scale water storage has been long discussed, but (at least from my newcomer’s perspective) not seriously considered until the recent crisis. The picture worth a thousand words was of Governor Jerry Brown examining bare ground in the Sierra Nevada Mountains last March. That ground should have been covered with deep snow.

Here in Santa Rosa, a friend who runs Atlas Coffee has taken matters into his own hands. He acquired the 500-gallon storage tank shown here:

Since that photo was taken, he’s hooked it up to a pump. Rainwater flows off the roof, it’s pumped into the tank, and then it irrigates the plants. It’s a wonderful idea that I’m sure has occurred to other homeowners and businesspeople, but you don’t see many other implementations around town. And as El Niño approaches, other priorities will understandably loom larger. Hydro-engineering projects, even relatively simple ones like this, are daunting. And the problem really needs to be addressed at much larger scale.

On that larger scale we still tend to imagine big hydro-engineering projects, like raising the height of the Shasta Dam to the tune of $1.2 billion [High Country News]. That sounded plausible to me. Then, last night, I watched a PBS documentary on beavers. I’m a sucker for nature documentaries, and I’ve missed seeing beavers since moving to California (they were everywhere in New Hampshire), so it was fun to watch them in action. Then this segment grabbed me:

Here’s a bit more of the backstory. In the Susie Creek watershed near Elko, Nevada, the land’s been drying out since cattle began grazing it 200 years ago. Beavers, reintroduced recently, reversed that trend. In this post, the Resilient Design Institute’s Alex Wilson refers to the same clip I quoted above:

The excellent 2014 PBS Nature documentary Leave it to Beavers describes how beavers modify the landscape to retain moisture. It turns out that beavers don’t only create ponds by damming creeks; they excavate ponds deeper, allowing them to hold more water. Biologist Glynnis Hood, Ph.D., who has been studying beavers near Edmonton, Alberta, described how important a role beavers play in Alberta by holding water.

“In 2002 we had the worst drought on record,” she reported. “The only places where we had water in natural areas was where we had beaver, and farmers were actually seeking out neighbors who had beavers on their landscape to water their cattle. So with beavers back on the land, even during the worst drought on record, they were mitigating the effects of drought and keeping water on the landscape.”

That idea was echoed in the PBS documentary by hydrologist Suzanne Foudy, Ph.D., of the U.S. Forest Service in describing the impact new beaver arrivals have been having on Susie Creek in north-central Nevada, near Elko. “If the snowpack’s coming off earlier and ranchers want water,” she said, “then we’ve got to figure out a way to keep it on the landscape, because it’s no longer going to be stored as snow in the mountains.” She continued: “What beavers do in all these itty-bitty streams is they create these small savings accounts, these pockets where it’s stored — no longer as snow, but as surface and groundwater.”

I’ve always enjoyed watching beavers and thinking about their effects on my local environments. But I never realized that the species as a whole can provide a critical ecosystem service at large scale. Here’s hoping that California will recruit them to help us replace a dwindling snowpack with water stored on the land. And that, as a bonus, I’ll get to see them more often again.