Investigating ClaimReview

17 Mar 201816 Apr 2018 ~ Jon Udell ~ 2 Comments

In Annotating the Wild West of Information Flow I discussed a prototype of a ClaimReview-aware annotation client. ClaimReview is a standard way to format “a fact-checking review of claims made (or reported) in some creative work.” Now that both Google and Bing are ingesting ClaimReview markup, and fact-checking sites are feeding it to them, the workflow envisioned in that blog post has become more interesting. A fact checker ought to be able to highlight a reviewed claim in its original context, then capture that context and push it into the app that produces the ClaimReview markup embedded in fact-checking articles and slurped by search engines.

That workflow is one interesting use of annotation in the domain of fact-checking, it’s doable, and I plan to explore it. But here I’ll focus instead on using annotation to project claim reviews onto the statements they review, in original context. Why do that? Search engines may display fact-checking labels on search results, but nothing shows up when you land directly on an article that includes a reviewed claim. If the reviewed claim were annotated with the review, an annotation client could surface it.

To help make that idea concrete, I’ve been looking at ClaimReview markup in the wild. Not unexpectedly, it gets used in slightly different ways. I wrote a bookmarklet to help visualize the differences, and used it to find these five patterns:

https://www.washingtonpost.com/news/fact-checker/wp/2018/03/16/does-president-trumps-border-wall-pay-for-itself

{
  "urlOfFactCheck": "https://www.washingtonpost.com/news/fact-checker/wp/2018/03/16/does-president-trumps-border-wall-pay-for-itself/?utm_term=.31281569ab46",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "-1",
    "alternateName": "Fuzzy math",
    "worstRating": "-1",
    "bestRating": "-1",
    "image": "https://dhpikd1t89arn.cloudfront.net/custom-rating/custom-ratings-b89013e8-729f-44aa-816e-93a9307b079c"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "Donald Trump ",
      "jobTitle": "President",
      "image": "https://dhpikd1t89arn.cloudfront.net/custom-rating/custom-ratings-b89013e8-729f-44aa-816e-93a9307b079c",
      "sameAs": [
        "www.whitehouse.gov"
      ]
    },
    "datePublished": "2018-03-13",
    "name": "in a tweet"
  },
  "claimReviewed": "\"The $18 billion wall will pay for itself by curbing the importation of crime, drugs and illegal immigrants who tend to go on the federal dole.”"
}

https://www.snopes.com/fact-check/did-captured-islamic-state-obama/

{
  "urlOfFactCheck": "https://www.snopes.com/fact-check/did-captured-islamic-state-obama/",
  "reviewRating": {
    "alternateName": "False",
    "ratingValue": "",
    "bestRating": ""
  },
  "itemReviewed": {
    "datePublished": "2018-03-13T08:25:46+00:00",
    "name": "Snopes",
    "sameAs": "http://dailyworldupdate.com/2018/03/11/breaking-captured-isis-leader-had-obama-on-speed-dial/"
  },
  "claimReviewed": "A captured Islamic State leader's cell phone contained the phone numbers of world leaders, including former President Barack Obama."
}

https://www.factcheck.org/2018/03/trump-misses-mars-attack/

{
  "urlOfFactCheck": "https://www.factcheck.org/2018/03/trump-misses-mars-attack/",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "-1",
    "alternateName": "Clinton Endorsed Mars Flight",
    "worstRating": "-1",
    "bestRating": "-1",
    "image": "https://dhpikd1t89arn.cloudfront.net/custom-rating/custom-ratings-76945159-07c1-49d6-833e-717c1da2478f"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "Donald Trump",
      "jobTitle": "President of the United States",
      "image": "https://dhpikd1t89arn.cloudfront.net/custom-rating/custom-ratings-76945159-07c1-49d6-833e-717c1da2478f",
      "sameAs": [
        "http://www.whitehouse.gov"
      ]
    },
    "datePublished": "2018-03-13",
    "name": "Speech at military base"
  },
  "claimReviewed": "NASA “wouldn’t have been going to Mars if my opponent won.\""
}

http://www.politifact.com/texas/statements/2018/mar/15/ted-cruz/ted-cruz-says-beto-orourke-wants-open-borders-and-/

{
  "urlOfFactCheck": "http://www.politifact.com/texas/statements/2018/mar/15/ted-cruz/ted-cruz-says-beto-orourke-wants-open-borders-and-/",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "alternateName": "False",
    "worstRating": "1",
    "bestRating": "7",
    "image": "https://dhpikd1t89arn.cloudfront.net/rating_images/politifact/tom-false.jpg"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "Ted Cruz",
      "jobTitle": "U.S. senator for Texas",
      "image": "https://dhpikd1t89arn.cloudfront.net/rating_images/politifatc/tom-false.jpg",
      "sameAs": [
        "https://www.youtube.com/watch?v=MvKElqdUxZg"
      ]
    },
    "datePublished": "2018-03-06",
    "name": "Houston, Texas"
  },
  "claimReviewed": "Says Beto O’Rourke wants \"open borders and wants to take our guns.\""
}

Breitbart repeats blogger’s unsupported claim that NOAA manipulates data to exaggerate warming

{
  "urlOfFactCheck": "https://climatefeedback.org/claimreview/breitbart-repeats-bloggers-unsupported-claim-noaa-manipulates-data-exaggerate-warming/",
  "reviewRating": {
    "@type": "Rating",
    "alternateName": "Unsupported",
    "bestRating": 5,
    "ratingValue": 2,
    "worstRating": 1
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "James Delingpole"
    },
    "headline": "TBD",
    "image": {
      "@type": "ImageObject",
      "height": "200px",
      "url": "https://climatefeedback.org/wp-content/uploads/Sources/Delingpole_Breitbart_News.png",
      "width": "200px"
    },
    "publisher": {
      "@type": "Organization",
      "logo": "Breitbart",
      "name": "Breitbart"
    },
    "datePublished": "20 February 2018",
    "url": "TBD"
  },
  "claimReviewed": "NOAA has adjusted past temperatures to look colder than they were and recent temperatures to look warmer than they were."
}

To normalize the difference I’ll need to look at more examples from these sites. But I’m also looking for more patterns, so if you know of other sites that routinely embed ClaimReview markup, please drop me links!

How to improve Wikipedia citations with Hypothesis direct links

9 Jan 20189 Jan 2018 ~ Jon Udell ~ 3 Comments

Wikipedia aims to be verifiable. Every statement of fact should be supported by a reliable source that the reader can check. Citations in Wikipedia typically refer to online documents accessible at URLs. But with the advent of standard web annotation we can do better. We can add citations to Wikipedia that refer precisely to statements that support Wikipedia articles.

According to Wikipedia’s policy on citing sources:

Wikipedia’s Verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in article space.

Last night, reading https://en.wikipedia.org/wiki/Tubbs_Fire, I noticed this unsourced quote:

Sonoma County has four “historic wildfire corridors,” including the Hanly Fire area.

I searched for the source of that quotation, found it in a Press Democrat story, annotated the quote, and captured a Hypothesis direct link to the annotation. In this screenshot, I’ve clicked the annotation’s share icon, and then clicked the clipboard icon to copy the direct link to the clipboard. The direct link encapsulates the URL of the story, plus the information needed to locate the quotation within the story.

Given such a direct link, it’s straightforward to use it in a Wikipedia citation. Back in the Wikipedia page I clicked the Edit link, switched to the visual editor, set my cursor at the end of the unsourced quote, and clicked the visual editor’s Cite button to invoke this panel:

There I selected the news template, and filled in the form in the usual way, providing the title of the news story, its date, its author, the name of the publication, and the date on which I accessed the story. There was just one crucial difference. Instead of using the Press Democrat URL, I used the Hypothesis direct link.

And voilà! There’s my citation, number 69, nestled among all the others.

Citation, as we’ve known it, begs to be reinvented in the era of standard web annotation. When I point you to a document in support of a claim, I’m often thinking of a particular statement in that document. But the burden is on you to find that statement in the document to which my citation links. And when you do, you may not be certain you’ve found the statement implied by my link. When I use a direct link, I relieve you of that burden and uncertainty. You land in the cited document at the right place, with the supporting statement highlighted. And if it’s helpful we can discuss the supporting statement in that context.

I can envision all sorts of ways to turbocharge Wikipedia’s workflow with annotation-powered tools. But no extra tooling is required to use Hypothesis and Wikipedia in the way I’ve shown here. If you find an unsourced quote in Wikipedia, just annotate it in its source context, capture the direct link, and use it in the regular citation workflow. For a reader who clicks through Wikipedia citations to check original sources, this method yields a nice improvement over the status quo.

Annotating Web Audio

6 Jan 20186 Jan 2018 ~ Jon Udell ~ 7 Comments

On a recent walk I listened to Unmasking Misogyny on Radio Open Source. One of the guests, Danielle McGuire, told the story of Rosa Parks’ activism in a way I hadn’t heard before. I wanted to capture that segment of the show, save a link to it, and bookmark the link for future reference.

If you visit the show page and click the download link, you’ll load the show’s MP3 file into your browser’s audio player. Nowadays that’s almost always going to be the basic HTML5 player. Here’s what it looks like in various browsers:

The show is about an hour long. I scrubbed along the timeline until I heard Danielle McGuire’s voice, and then zeroed in on the start and end of the segment I wanted to capture. It starts at 18:14 and ends at 21:11. Now, how to link to that segment?

I first investigated this problem in 2004. Back then, I learned that it’s possible to fetch and play random parts of MP3 files, and I made a web app that would figure out, given start and stop times like 18:14 and 21:11, which part of the MP3 file to fetch and play. Audio players weren’t (and still aren’t) optimized for capturing segments as pairs of minute:second parameters. But once you acquired those parameters, you could form a link that would invoke the web app and play the segment. Such links could then be curated, something I often did using the del.icio.us bookmarking app.

Revisiting those bookmarks now, I’m reminded that Doug Kaye and I were traveling the same path. At ITConversations he had implemented a clipping service that produced URLs like this:

http://www.itconversations.com/clip.php?showid=378&start=4:37&stop=6:04

Mine looked like this:

http://udell.infoworld.com/?url=http://www.itconversations.com/audio/download/ITConversations-378.mp3&beg=4:37&end=6:04

Both of those audio-clipping services are long gone. But the audio files survive, thanks to the exemplary cooperation between ITConversations and the Internet Archive. So now I can resurrect that ITConversations clip — in which Doug Engelbart, at the Accelerating Change conference in 2004, describes the epiphany that inspired his lifelong quest — like so:

http://jonudell.net/av/audio.html?url=http://jonudell.net/audio/ITC.AC2004-DougEngelbart-2004.11.07.mp3&startmin=4&startsec=37&endmin=6&endsec=4

And here’s the segment of Danielle McGuire’s discussion of Rosa Parks that I wanted to remember:

http://jonudell.net/av/audio.html?url=http://ia601506.us.archive.org/5/items/171207OSPODCASTSexualHarassment/171207-OS-PODCAST-SexualHarassment.mp3&startmin=18&startsec=14&endmin=21&endsec=11

This single-page JavaScript app aims to function both as a player of predefined segments, and as a tool that makes it as easy as possible to define segments. It’s still a work in progress, but I’m able to use it effectively even as I continue to refine the interaction details.

For curation of these clips I am, of course, using Hypothesis. Here are some of the clips I’ve collected on the tag AnnotatingAV:

To create these annotations I’m using Hypothesis page notes. An annotation of this type is like a del.icio.us or pinboard.in bookmark. It refers to the whole resource addressed by a URL, rather than to a segment of interest within a resource.

Most often, a Hypothesis user defines a segment of interest by selecting a passage of text in a web document. But if you’re not annotating any particular selection, you can use a page note to comment on, tag, and discuss the whole document.

Since each audio clip defines a segment as a standalone web page with a unique URL, you can use a Hypothesis page note to annotate that standalone page:

It’s a beautiful example of small pieces loosely joined. My clipping tool is just one way to form URLs that point to audio and video segments. I hope others will improve on it. But any clipping tool that produces unique URLs can work with Hypothesis and, of course, with any other annotation or curation tool that targets URLs.

Thoughts on Audrey Watters’ “Thoughts on Annotation”

27 Jun 201727 Jun 2017 ~ Jon Udell ~ 6 Comments

Back in April, Audrey Watters’ decided to block annotation on her website. I understand why. When we project our identities online, our personal sites become extensions of our homes. To some online writers, annotation overlays can feel like graffiti. How can we respect their wishes while enabling conversations about their writing, particularly conversations that are intimately connected to the writing? At the New Media Consortium conference recently, I was finally able to meet Audrey in person, and we talked about how to balance these interests. Yesterday Audrey posted her thoughts about that conversation, and clarified a key point:

You can still annotate my work. Just not on my websites.

Exactly! To continue that conversation, I have annotated that post here, and transcluded my initial set of annotations below.

judell 6/27/2017 #

using an HTML meta tag to identify annotation preferences

This is just a back-of-the-napkin sketch of an idea, not a formal proposal.

judell 6/27/2017 #

I’m much less committed to having one canonical “place” for annotations than Hypothesis is

Hypothesis isn’t committed to that either. The whole point of the newly-minted web annotation standard is to enable an ecosystem of interoperable annotation clients and servers, analogous to comparable ecosystems of email and web clients and servers.

judell 6/27/2017 #

Hypothesis annotations of a PDF can be centralized, no matter where the article is hosted or whether it’s a local copy

Centralization and decentralization are slippery terms. I would rather say that Hypothesis can unify a set of annotations across a family of representations of the “same” work. Some members of that family might be HTML pages, others might be PDFs hosted on the web or kept locally.

It’s true that when Hypothesis is used to create and view such annotations, they are “centralized” in the Hypothesis service. But if someone else stands up an instance of Hypothesis, that becomes a separate pool of annotations. Likewise, we at Hypothesis have planned for, and expect to see, a world in which non-Hypothesis-based implementations of standard annotation capability will host still other separate pools of annotations.

So you might issue three different API queries — to Hypothesis, to a Hypothesis-based service, and to a non-Hypothesis-based service — for a PDF fingerprint or a DOI. Each of those services might or might not internally unify annotations across a family of “same” resources. If you were to then merge the results of those three queries, you’d be an annotation aggregator — the moral equivalent of what Radio UserLand, Technorati, and other blog aggregators did in the early blogosphere.

Weaving the annotated web

5 May 20171 Sep 2021 ~ Jon Udell ~ 5 Comments

In 1997, at the first Perl Conference, which became OSCON the following year, my friend Andrew Schulman and I both gave talks on how the web was becoming a platform not only for publishing, but also for networked software.

Here’s the slide I remember from Andrew’s talk:

http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799

The only thing on it was a UPS tracking URL. Andrew asked us to stare at it for a while and think about what it really meant. “This is amazing!” he kept saying, over and over. “Every UPS package now has its own home page on the world wide web!”

It wasn’t just that the package had a globally unique identifier. It named a particular instance of a business process. It made the context surrounding the movement of that package through the UPS system available to UPS employees and customers who accessed it in their browsers. And it made that same context available to the Perl programs that some of us were writing to scrape web pages, extract their data, and repurpose it.

As we all soon learned, URLs can point to many kinds of resources: documents, interactive forms, audio or video.

The set of URL-addressable resources has two key properties: it’s infinite, and it’s interconnected. Twenty years later we’re still figuring out all the things you can do on a web of hyperlinked resources that are accessible at well-known global addresses and manipulated by a few simple commands like GET, POST, and DELETE.

When you’re working in an infinitely large universe it can seem ungrateful to complain that it’s too small. But there’s an even larger universe of resources populated by segments of audio and video, regions of images, and most importantly, for many of us, text in web documents: paragraphs, sentences, words, table cells.

So let’s stare in amazement at another interesting URL:

https://hyp.is/LoaMFCSJEee3aAMJuXhO-w/www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm

Here’s what it looks like to a human who follows the link: You land on a web page, in this case Roy Fielding’s dissertation on web architecture, it scrolls to the place where I’ve highlighted a phrase, and the Hypothesis sidebar displays my annotation which includes a comment and a tag.

And here’s what that resource looks like to a computer when it fetches a variant of that link:

{
    "body": [
        {
            "type": "TextualBody",
            "value": "components: web resources\n\nconnectors: links\n\ndata: data",
            "format": "text/markdown"
        },
        {
            "type": "TextualBody",
            "purpose": "tagging",
            "value": "IAnnotate2017"
        }
    ],
    "target": [
        {
            "source": "https://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm",
            "selector": [
                {
                    "type": "XPathSelector",
                    "value": "/table[2]/tbody[1]/tr[1]/td[1]",
                    "refinedBy": {
                        "start": 82,
                        "end": 114,
                        "type": "TextPositionSelector"
                    }
                },
                {
                    "type": "TextPositionSelector",
                    "end": 4055,
                    "start": 4023
                },
                {
                    "exact": "components, connectors, and data",
                    "prefix": "tion of architectural elements--",
                    "type": "TextQuoteSelector",
                    "suffix": "--constrained in their relations"
                }
            ]
        }
    ],
    "created": "2017-04-18T22:48:46.756821+00:00",
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "creator": "acct:judell@hypothes.is",
    "type": "Annotation",
    "id": "https://hypothes.is/a/LoaMFCSJEee3aAMJuXhO-w",
    "modified": "2017-04-18T23:03:54.502857+00:00"
}

The URL, which we call a direct link, isn’t itself a standard way to address a selection of text, it’s just a link that points to a web resource. But the resource it points to, which describes the highlighted text and its coordinates within the document, is — since February of this year — a W3C standard. The way I like to think about it is that the highlighted phrase — and every possible highlighted phrase — has its own home page on the web, a place where humans and machines can jointly focus attention.

If we think of the web we’ve known as a kind of fabric woven together with links, the annotated web increases the thread count of that fabric. When we weave with pieces of URL-addressable documents, we can have conversations about those pieces, we can retrieve them, we can tag them, and we can interconnect them.

Working with our panelists and others, it’s been my privilege to build a series of annotation-powered apps that begin to show what’s possible when every piece of the web is addressable in this way.

I’ll show you some examples, then invite my collaborators — Beth Ruedi from AAAS, Mike Caulfield from Washington State University Vancouver, Anita Bandrowski from SciCrunch, and Maryann Martone from UCSD and Hypothesis — to talk about what these apps are doing for them now, and where we hope to take them next.

Science in the Classroom

First up is a AAAS project called Science in the Classroom, a collection of research papers from the Science family of journals that are annotated — by graduate students — so teachers can help younger students understand the methods and outcomes of scientific research.

Here’s one of those annotated papers. A widget called the Learning Lens toggles layers of annotation and off.

Here I’ve selected the Glossary layer, and I’ve clicked on the word “distal” to reveal the annotation attached to it.

Now lets look behind the scenes:

Hypothesis was used to annotate the word “distal”. But Learning Lens predated the use of Hypothesis, and the Science in the Classroom team wanted to keep using Learning Lens to display annotations. What they didn’t want was the workflow behind it, which required manual insertion of annotations into HTML pages.

Here’s the solution we came up with. Use Hypothesis to create annotations, then use some JavaScript in Science in the Classroom pages to retrieve Hypothesis annotations and write them into the pages, using the same format that had been applied manually. The preexisting and unmodified Learning Lens JavaScript can then do what it does: pick up the annotations, assign color-coded highlights based on tags, and show the annotations when you click on the highlights.

What made this possible was a JavaScript library that helps with the heavy lifting required to attach an annotation to its intended target in the document.

That library is part of the Hypothesis client, but it’s also available as a standalone module that can be used for other purposes. It’s a nice example of how open source components can enable an ecosystem of interoperable annotation services.

DigiPo / EIC

Next up is a toolkit for student fact-checkers and investigative journalists. You’ve already heard from Mike Caulfield about the Digital Polarization Project, or DigiPo, and from Stefan Candea about the European Investigative Collaborations network. Let’s look at how we’ve woven annotation into their investigative workflows.

These investigations are both written and displayed in a wiki. This is a DigiPo example:

I did the investigation of this claim myself, to test out the process we were developing. It required me to gather a whole lot of supporting evidence before I could begin to analyze the claim. I used a Hypothesis tag to collect annotations related to the investigation, and you can see them in this Hypothesis view:

I can be very disciplined about using tags this way, but it’s a lot to ask of students, or really almost anyone. So we created a tool that knows about the set of investigations underway in the wiki, and offers the names of those pages as selectable tags.

Here I’ve selected a piece of evidence for that investigation. I’m going to annotate it, not by using Hypothesis directly, but instead by using a function in a separate DigiPo extension. That function uses the core anchoring libraries to create annotations in the same way the Hypothesis client does.

But it leads the user through an interstitial page that asks which investigation the annotation belongs to, and assigns a corresponding tag to the annotation it creates.

Back in the wiki, the page embeds the same Hypothesis view we’ve already seen, as a Related Annotations widget pinned to that particular tag:

I had so much raw material for this article that I needed some help organizing it. So I added a Timeline widget that gathers a subset of the source annotations that are tagged with dates.

To put something onto the timeline, you select a date on a page.

Then you create an annotation with a tag corresponding to the date.

Here’s what the annotation looks like in Hypothesis.

Over in the wiki, our JavaScript finds annotations that have these date tags and arranges them on the Timeline.

Publication dates aren’t always evident on web pages, sometimes you have to do some digging to find them. When you do find one, and annotate a page with it, you’ve done more than populate the Timeline in a DigiPo page. That date annotation is now attached to the source page for anyone to discover, using Hypothesis or any other annotation-aware viewer. And that’s true for all the annotations created by DigiPo investigators. They’re woven into DigiPo pages, but they’re also available for separate reuse and aggregation.

The last and most popular annotation-related feature we added to the toolkit is called Footnotes. Once you’ve gathered your raw material into the Related Annotations bucket, and maybe organized some of it onto the Timeline, you’ll want to weave the most pertinent references into the analysis you’re writing.

To do that, you find the annotation you gathered and use Copy to clipboard to capture the direct link.

Then you wrap that link around some text in the article:

When you refresh the page, here’s what you get. The direct link does what a direct link does: it takes you to the page, scrolls you to the annotation in context. But it can take a while to review a bunch of sources that way.

So the page’s JavaScript also creates a link that points down into the Footnotes section. And there, as Ted Nelson would say, and as Nate Angell for some reason hates hearing me say, the footnote is “transcluded” into the page so all the supporting context is right there.

One final point about this toolkit. Students don’t like the writing tools available in wikis, and for good reason, they’re pretty rough around the edges. So we want to enable them to write in Google Docs. We also want them to footnote their articles using direct links because that’s the best way to do it. So here’s a solution we’re trying. From the wiki you’ll launch into Google Docs where you can do your writing in a much more robust editor that makes it really easy to include images and charts. And if you use direct links in that Google Doc, they’ll still show up as Footnotes.

We’re not yet sure this will pan out, but my colleague Maryann Martone, who uses Hypothesis to gather raw material for her scientific papers, and who writes them in Google Docs, would love to be able to flow annotations through her writing tool and into published footnotes.

SciBot

Maryann is the perfect segue to our next example. Along with Anita Bandrowski, she’s working to increase the thread count in the fabric of scientific literature. When neuroscientists write up the methods used in their experiments, the ingredients often include highly specific antibodies. These have colloquial names, and even vendor catalog numbers, but they still lacked unique identifiers. So the Neuroscience Information Framework, NIF for short, has defined a namespace called RRID (research resource identifier), built a registry for RRIDs, and convinced a growing number of authors to mention RRIDs in their papers.

Here’s an article with RRIDs in it. They’re written directly into the text because the text is the scientific record, it’s the only artifact that’s guaranteed to be preserved. So if you’re talking about a goat polyclonal antibody, you look it up in the registery, capture its ID, and write it directly into the text. And if it’s not in the registry, please add it, you’ll make Anita very happy if you do!

The first phase of a project we call SciBot was about validating those RRIDs. They’re just freetext, after all, typed in by authors. Were the identifiers spelled correctly? Did they point to actual registry entries? To find out we built a tool that automatically annotates occurrences of RRIDs.

In this example, Anita is about to click on the SciBot tool, which launches from a bookmarklet, and sends the text of the paper to a backend service. It scans the text for RRIDs, looks up each one in the registry, and uses the Hypothesis API to create an annotation — bound to the occurrence in the text — that reports the results of the registry lookup.

Here the Hypothesis realtime API is showing that SciBot has created three annotations on this page.

And here are those three annotations, anchored to their occurrences in the page, with registry entries displayed in the sidebar.

SciBot curators review these annotations and use tags to mark which are valid. When some aren’t, and need attention, the highlight focuses that attention on a specific occurrence.

This hybrid of automatic entity recognition and interactive human curation is really powerful. Here’s an example where an antibody doesn’t have an RRID but should.

Every automatic workflow needs human exception handling and error correction. Here the curator has marked an RRID that wasn’t written into the literature, but now is present in the annotation layer.

These corrections are now available to train a next-gen entity recognizer. Iterating through that kind of feedback loop will be a powerful way to mine the implicit data that’s woven into the scientific literature and make it explicit.

Here’s the Hypothesis dashboard for one of the SciBot curators. The tag cloud gives you a pretty good sense of how this process has been unfolding so far.

Publishers have begun to link RRIDs to the NIF registry. Here’s an example at PubMed.

If you follow the ZIRC_ZL1 link to the registry, you’ll find a list of other papers whose authors used the same experimental ingredient, which happens to be a particular strain of zebrafish.

This is the main purpose of RRIDs. If that zebrafish is part of my experiment, I want to find who else has used it, and what their experiences have been — not just what they reported in their papers, but ideally also what’s been discussed in the annotation layer.

Of course I can visit those papers, and search within them for ZIRC_ZLI, but with annotations we can do better. In DigiPo we saw how footnoted quotes from source documents can transclude into an article. Publishers could do the same here.

Or we could do this. It’s a little tool that offers to look up an RRID selected in text.

It just links to an instance of the Hypothesis dashboard that’s pinned to the tag for that RRID.

Those search results offer direct links that take you to each occurrence in context.

Claim Chart

Finally, and to bring us full circle, I recently reconnected with Andrew Schulman who works nowadays as a software patent attorney. There’s a tool of his trade called a claim chart. It’s a two-column table. In column one you list claims that a patent is making, which are selections of text from the claims section of the patent. And in column two you assemble bits of evidence, gathered from other sources, that bear on specific claims. Those bits of evidence are selections of text in other documents. It’s tedious to build a claim chart, it involves a lot of copying and pasting, and the evidence you gather is typically trapped in whatever document you create.

Andrew wondered if an annotation-powered app could help build claim charts, and also make the supporting evidence web-addressable for all the reasons we’ve discussed. If I’ve learned anything about annotation, it’s that when somebody asks “Can you do X with annotation?” the answer should always be: “I don’t know, should be possible, let’s find out.”

So, here’s an annotation-powered claim chart.

The daggers at top left in each cell are direct links. The ones in the first column go to patent claims in context.

The ones in the second column go to related statements in other documents.

And here’s how the columns are related. When you annotate a claim, you use a toolkit function called Add Selection as Claim.

Your selection here identifies the target document (that is, the patent), the claim chart you’re building (here, it’s a wiki page called andrew_test), and the claim itself (for example, claim 1).

Once you’ve identified the claims in this way, they’re available as targets of annotations in other documents. From a selection in another document, you use Add Selection as Claim-Related.

Here you see all the claims you’ve marked up, so it’s easy to connect the two statements.

The last time I read Vannevar Bush’s famous essay As We May Think, this was the quote that stuck with me.

When statements in documents become addressable resources on the web, we can weave them together in the way Vannevar Bush imagined.

Dwelling in the zone of evidence

4 Mar 201727 Jun 2017 ~ Jon Udell ~ 3 Comments

I’ve written plenty about the software layer that adapts the Hypothesis annotator to the needs of someone who gathers, organizes, analyzes, and then writes about evidence found online. Students in courses based on Mike Caulfield’s Digital Polarization template will, I hope, find that this software streamlines the grunt work required to find and cite the evidence that supports evaluation of a claim like this one:

Claim: The North Carolina Republican Party sent out a press release boasting about how its efforts drove down African-American turnout in the 2016 US presidential election.

That’s a lightly-edited version of something I read in the New Yorker and can send you to directly:

As we were fleshing out how a DigiPo course would work, I wrote an analysis of that claim. The investigation took me all the way back to the 1965 Voting Rights Act. Then it led to the 2013 Supreme Court decision — in Shelby vs Holder — to dilute the “strong medicine” Congress had deemed necessary “to address entrenched racial discrimination in voting.” Then to a series of legal contests as North Carolina began adjusting its voting laws. Then to the election-year controversies about voter suppression. And finally to the press release that the North Carolina GOP sent the day before the election, and the reactions to it.

Many claims don’t require this kind of deep dive. As Mike writes today, core strategies — look for fact-checking prior art, go upstream to the source, read laterally — can resolve some claims quickly.

But some claims do require a deep dive. In those cases I want students to immerse themselves in that process of discovery. I want them to suspend judgement about the claim and focus initially on marshalling evidence, evaluating sources, and laying a foundation for analyis. It’s hard work that the DigiPo toolkit can make easier, maybe even fun. That’s crucial because the longer you can comfortably dwell in that zone of evidence-gathering and suspended judgement, the stronger your critical thinking will become.

When I first read Toobin’s claim my internal narrative was: “Boasted about voter suppression? Of course those neanderthals did!” Then I entered the zone and spent many hours there. Voter suppression wasn’t a topic I’d spent much time reading about, so I learned a lot. When I returned to the claim I arrived at an interesting judgement. Yes there was voter suppression, and it was in some ways more draconian than I had thought. But had the North Carolina GOP actually boasted (Mother Jones: bragged, Salon: celebrated) the lower African-American turnout? I concluded it had not. It had reported reduced early voting, but not explicitly claimed that was a successful outcome of voter suppression.

So we rated the claim as Mixed — that is, partly true, partly false. A next step for this investigation would be to break the claim into more granular parts. (Software developers would call that “refactoring” the claim.) So:

In a press release on November 7, 2016, the North Carolina GOP reported lower African-American early voting.

That’s easy to check. True.

Here’s another:

In its 11/7/2016 press release the North Carolina GOP boasted about the success of its voter suppression efforts.

Also easy to check: False.

What about this?

In the wake of Shelby vs Holder, the North Carolina GOP pushed legislation that discriminates against African-American voters.

You need to gather and organize a lot of source material in order to even begin to evaluate that claim. My fondest hope for DigiPo is that students inclined to judge the claim, one way or the another, will delay that judgement long enough to gather evidence that all can agree is valid. That, I believe, would be a fantastic educational outcome.

How shared vocabularies tie the annotated web together

25 Feb 201727 Jun 2017 ~ Jon Udell ~ 5 Comments

I’m fired up about the work I want to share at Domains 2017 this summer. The tagline for the conference is Indie Tech and Other Curiosities, and I plan to be one of the curiosities!

I’ve long been a cheerleader for the Domain of One’s Own movement. In Reclaiming Innovation, Jim Groom wrote about the need to “understand technologies as ‘potentiality’ (to graft a concept by Anton Chekov from a literary to a technical context).” He continued:

This is the idea that within the use of every technical tool there is more than just the consciousness of that tool, there is also the possibility to spark something beyond those predefined uses. The only real way to galvanize that potentiality is to provide the conditions of possibility — that is, a toolkit for user innovation.

My recent collaboration with Mike Caulfield on the Digital Polarization Initiative has led to the creation of just such a toolkit. It supports DigiPo in the ways described and shown here. A version of the toolkit, demoed here, will support a team of investigative journalists. Now I need to show how the toolkit enables educators, scientists, investigative reporters, students — anyone who researches and writes articles or reports or papers backed by web-based evidence — to innovate in similar ways.

In tech we tend to abuse the term innovation so let me spell out exactly what I mean: Better ways to gather, organize, reason over, and cite online evidence. Web annotation, standardized this week by the W3C, is a key enabler. The web’s infinite space of addressable URLs is now augmented by a larger infinity of segments of interest within the resources pointed to by URLs. In the textual realm, paragraphs, list items, sentences, or individual words can be reliably linked to conversations — but also applications — that live in connected annotation layers.

A web of addressable segments of interest is a necessary, but not sufficient, condition of possibility. We also need tools that enable us to gather, organize, recombine, and cite those segments. And some of those tools need to be malleable in the hands of users who can shape them for their own purposes.

When I reread Vannevar Bush’s As We May Think, to prepare for a conversation about it with Gardner Campbell and Jeremy Dean (video, Gardner’s reflections), I focused on this passage:

He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together.

Nowadays that first encyclopedia article lives at one URL. The pertinent item in a history is a segment of interest within another URL-addressable resource. How do we tie them together? A crucial connector is a tag that belongs to neither resource but refers to both.

When tools control the sets of tags available for resource interconnection, they enable groups of people to make such connections reliably. That’s what the DigiPo toolkit does when it offers a list of investigation pages, drawn from the namespace of a wiki, as the set of tags that connect annotation-defined evidence to investigations. You see that happening with the DigiPo toolkit shown here, and with a variant of the toolkit shown here. In both cases the tags that bind evidence to wiki pages are controlled by software that acquires a list of wiki pages and presents the names of those pages as selectable tags.

One future direction for the toolkit leads to software that acquires lists of pages from other kinds of content management systems: WordPress, Drupal, you name it. Every CMS defines a namespace that is implicitly a list of tags that can be used to bind sets of resources to the pages served by that CMS. If you’re looking to adapt a DigiPo-like tool to your CMS, I’ll be delighted to show you how.

Such adaptation, though, requires somebody to write some code. While it’s unfashionable in some circles to say so, I don’t think everyone should learn to code. There’s a more fundamental web literacy, nicely captured by Audrey Watters here:

It’s about understanding the components of the Web and knowing how to tag and then manipulate them. By thinking and developing sets of named resources, you are a Web thinker. This isn’t about programming but rather the creation of sets of resources and the identification of components that work with those resources and combine them to create solutions.

Web annotation vastly enlarges the universe of resources that can be named. But it’s on us to name them. Tags are a principal way we do that. If our naming of resources is going to be an effective way to organize and combine them, though, we need to do it reliably and consistently. Software can enforce that consistency, but not everyone can write software. So a user innovation toolkit for the annotated web needs to empower users to enforce consistent naming without writing code.

A couple of weeks ago I built a Chrome extension that enables users to define their own lists of shared tags by recording them in an Google Doc. The demonstration video prompted this query from Jim Groom:

I just got through with a workshop here demoing Hypothes.is for a European group that may be using it to annotate online legislation for data privacy set to go live in 2018. They are teaching a course on it, and this could be one of the spaces/hubs they build the open part around. I came back to this video just now, but got the sense I could already tag from within annotations/pages, so how does the tag helper change this? Just a different way at it? Is it new functionality from previous tags? I love that you can have a Google Doc list of tags, but the video example is not making sense to me for some reason. And I wanna know :)

Here’s my response. That tag helper, now incorporated into the toolkit I’m evolving for DigiPo and other uses, makes it possible for people who don’t write code to define tag namespaces that govern their gathering, organization, recombination, and citation not only of URL-addressable resources but also of annotation-addressable segments of interest within those resources. People can “tie them together” — as Vannevar Bush imagined — in the ways their interests and workflows require.

Does that answer the question? If not, please keep asking until I do so properly. User-defined tag namespaces, though admittedly still a curiosity, are one of the best ways to make collective use of a web of addressable segments.