Welcome to the Sapiezoic

The latest Long Now podcast, by David Grinspoon, takes a very long view indeed. As we transition from the Holocene to the Anthropocene, he thinks, we’re not just entering a new geological epoch, as shown here:

That alone would be a big deal. But epochs are just geologic eyeblinks a few million years in duration. Grinspoon thinks we might be entering something way bigger. Not just a period or an era. We might happen to be alive now at an eon boundary, as shown here:

There have only been four eons so far. Each began with a major transition in earth history — a shift in the relationship between life and the planet. Life first emerged at the beginning of the Archean eon around 4 billion years ago, when things cooled down enough. Around 2.5 billion years ago, cyanobacteria learned to photosynthesize. They bathed the world in oxygen, caused mass extinction, and deeply entangled life with the physical and chemical workings of the planet. At the boundary between the Proterozoic and Phanerozoic, around 500 million years ago, life went multicellular, plants and animals appeared, and the modern eon began.

Will our descendants look back on the Anthropocene as the dawn of a fifth eon? Grinspoon makes a compelling case that they might. The Archean cyanobacteria that poisoned the environment couldn’t know what they were doing. We can. Our infrastructure is taking over the planet as surely as the oxygenated atmosphere did. But it is, at least potentially, under our conscious control. The hallmark of the Sapiezoic eon he envisions: intentional re-terraforming.

Grinspoon cites one shining example: the Montreal Protocol that will, if we stay the course, reverse manmade ozone depletion. “We are as gods,” says Stewart Brand, “so we might as well get good at it.”

I’ve listened to all the Long Now podcasts, some more than once. This one rates very highly. It’s a great talk.

Fact-checking Naomi Klein’s “No Is Not Enough”

On a hike last week I heard an excellent episode of Radio Open Source, featuring Naomi Klein, David Graeber, and Pankaj Mishra. One segment that stuck with me is this remark by Naomi Klein:

We need to examine the way in which politics has been taken over by the logic of corporate branding, which is not something Trump started. Trump was just better at it than anybody else because he is himself a fully commercialized brand. So the table was set for Trump, he just showed up and said, “Well, I know this game better than you jokers, I’m the real thing, I’m a reality TV star and I’m a megabrand. Step aside!”

(If I could, I would link you directly to that segment in situ; that’s something I had working a long time ago. But since audio quotation still isn’t a ubiquitous feature of the web, here’s the compelling minute of audio that contains that quote.)

I was previously aware of Naomi Klein but had never heard her speak, had read none of her books, and was only slightly familiar with her critique of corporatized politics. Her conversation with Chris Lydon on that podcast prompted me to read her new book, No Is Not Enough, published just a few weeks ago.

I was also slightly familiar with criticism of Klein’s views. So, in a moment when the president of the United States had just tweeted a video of himself performing a mock attack during his time as a reality TV personality on the pro wrestling circuit, I was curious to know her thoughts but also prepared to take them with a grain of salt.

Here’s the book’s table of contents:

The first section elaborates on the above quote. Human megabrands are, Klein points out, a relatively new thing. She writes:

People keep asking — is he going to divest? Is he going to sell his businesses? Is Ivanka going to? But it’s not at all clear what these questions even mean, because their primary businesses are their names. You can’t disentangle Trump the man from Trump the brand; those two entities merged long ago. Every time he sets foot in one of his properties — a golf club, a hotel, a beach club — White House press corps in tow, he is increasing his overall brand value, which allows his company to sell more memberships, rent more rooms, and increase fees.

I hope we can agree across ideologies that this kind of thing is unhealthy. In the audio clip I cited above, Klein notes that the antidote is not a liberal megabrand, not Zuckerberg or Oprah. Conflation of brand power and political power is just a bad idea, and we need to reckon with that.

The rest of the book builds on arguments made in her earlier ones: Capitalism’s winners exploit natural and man-made crises to consolidate their winnings (The Shock Doctrine); climate change presents an existential challenge to that world order (This Changes Everything). Since I haven’t read those books, and have only just now read a few reviews pro and con, I lack the full context needed to evaluate the arguments in No Is Not Enough. But that’s exactly the right setup for the point about fact-checking that I want to make here.

How reliable is Naomi Klein on facts? I came to No Is Not Enough with no strong opinion one way or another. I raised an eyebrow, though, when I read this passage about Treasury secretary Steven Mnuchin:

Even among Goldman alumni, Steven Mnuchin has distinguished himself by his willingness to profit off misery. After the 2008 Wall Street collapse, and in the middle of the foreclosure crisis, Mnuchin purchased a California bank. The renamed company, OneWest, earned Mnuchin the nickname “Foreclosure King,” reportedly collecting $1.2 billion from the government to help cover the losses for foreclosed homes and evicting tens of thousands of people between 2009 and 2014. One attempted foreclosure involved a ninety-year-old woman who was behind on her payments by 27 cents.

The last sentence sent me to Google, where I quickly learned it had been debunked in a tweetstorm by Ted Frank in January 2017. He works for a libertarian think tank, and I doubt we’d see eye to eye on many issues, but his takedown of the 27-cent claim was accurate. Politico, for example, corrected its version of the story.

This is unfortunate because everything else in the above quote seems to check out. And you don’t have to be a liberal snowflake to worry legitimately about the Goldmanization of the US Cabinet.1

I went on to spot-check a number of other claims in No Is Not Enough and again, so far as I can tell with modest effort, everything checks out. So my conclusion is that Klein, who says she wrote this book quickly, to respond to the current moment, with less attention to endnotes than usual, is generally reliable on facts.

The way in which I reached that conclusion is a pretty good example of the strategies outlined in Web Literacy for Student Fact-Checkers, and a reminder that those methods aren’t just for students. All of us — me, you, Naomi Klein, everyone — need to build those muscles and exercise them regularly.


1. On another episode of Radio Open Source, in a remarkable dialogue between Pat Buchanan and Ralph Nader, the arch-conservative Buchanan aligned with the arch-liberal Nader on that point:

I agree completely with Ralph, I did not know we were going to make the world safe for Goldman Sachs, and I am a little surprised to find three or four or five of these guys, one or two might have been OK.

Thoughts on Audrey Watters’ “Thoughts on Annotation”

Back in April, Audrey Watters decided to block annotation on her website. I understand why. When we project our identities online, our personal sites become extensions of our homes. To some online writers, annotation overlays can feel like graffiti. How can we respect their wishes while enabling conversations about their writing, particularly conversations that are intimately connected to the writing? At the New Media Consortium conference recently, I was finally able to meet Audrey in person, and we talked about how to balance these interests. Yesterday Audrey posted her thoughts about that conversation, and clarified a key point:

You can still annotate my work. Just not on my websites.

Exactly! To continue that conversation, I have annotated that post here, and transcluded my initial set of annotations below.


judell 6/27/2017 #

using an HTML meta tag to identify annotation preferences

This is just a back-of-the-napkin sketch of an idea, not a formal proposal.

judell 6/27/2017 #

I’m much less committed to having one canonical “place” for annotations than Hypothesis is

Hypothesis isn’t committed to that either. The whole point of the newly-minted web annotation standard is to enable an ecosystem of interoperable annotation clients and servers, analogous to comparable ecosystems of email and web clients and servers.

judell 6/27/2017 #

Hypothesis annotations of a PDF can be centralized, no matter where the article is hosted or whether it’s a local copy

Centralization and decentralization are slippery terms. I would rather say that Hypothesis can unify a set of annotations across a family of representations of the “same” work. Some members of that family might be HTML pages, others might be PDFs hosted on the web or kept locally.

It’s true that when Hypothesis is used to create and view such annotations, they are “centralized” in the Hypothesis service. But if someone else stands up an instance of Hypothesis, that becomes a separate pool of annotations. Likewise, we at Hypothesis have planned for, and expect to see, a world in which non-Hypothesis-based implementations of standard annotation capability will host still other separate pools of annotations.

So you might issue three different API queries — to Hypothesis, to a Hypothesis-based service, and to a non-Hypothesis-based service — for a PDF fingerprint or a DOI. Each of those services might or might not internally unify annotations across a family of “same” resources. If you were to then merge the results of those three queries, you’d be an annotation aggregator — the moral equivalent of what Radio UserLand, Technorati, and other blog aggregators did in the early blogosphere.
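
To make that aggregator idea concrete, here’s a minimal JavaScript sketch. It assumes each service exposes a Hypothesis-style search API (the public one at api.hypothes.is does; the other two endpoints are hypothetical placeholders).

const services = [
  'https://api.hypothes.is/api',              // the public Hypothesis service
  'https://annotations.example.edu/api',      // hypothetical: a separate Hypothesis instance
  'https://other-annotator.example.com/api'   // hypothetical: a non-Hypothesis implementation
];

// Query each service for annotations on a document, then merge the pools.
// A real aggregator might also dedupe and sort by date.
async function aggregateAnnotations(documentUri) {
  const pools = await Promise.all(
    services.map(async (base) => {
      const resp = await fetch(`${base}/search?uri=${encodeURIComponent(documentUri)}`);
      if (!resp.ok) return [];
      const data = await resp.json();
      return data.rows || [];
    })
  );
  return pools.flat();
}

aggregateAnnotations('https://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm')
  .then(annotations => console.log(`${annotations.length} annotations found`));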

Dumb servers for personal clouds

I’m delighted to hear that my daughter and her best friend will be collaborating on a blog. And of course I’m tickled that she asked my advice on where to run it. I noted that Ghost is the new kid on the block, and is much simpler than what WordPress has become. But they want to do it for free, so WordPress it is.

Then she surprised me with this narrative:

I heard it’s better if you self-host, so that’s what we’ll want to do, right? I think self-hosting is good because you don’t have the website name in your blog URL. Also, more importantly, I think it’s how you ensure that it’s actually yours.

It turns out that she’d conflated self-hosting, i.e. running your own instance of the WordPress software and database, with the simpler method my own blog exemplifies. I use WordPress.com precisely because, although I do run my own servers, the fewer the better. I’m happy to rely on WordPress to host my blog for me. I’m also happy to pay them $13/year to connect jonudell.wordpress.com to blog.jonudell.net.

So that’ll be the solution for my daughter. But I’m left wondering how many others conflate self-hosting with domain redirection, and how that affects their thinking about control of their own digital identities and data. I suspect it’s often unclear that, whether you run a blog on WordPress.com or self-host it on a server you manage, your data is equally under your control. Likewise, use of a personal domain name is equally possible in both cases. What is the difference? With self-hosting, you can use arbitrary WordPress plugins and themes, and/or modify the software. Sometimes, for some people, that matters. Often, for many, it doesn’t.

That said, I agree with Mike Caulfield’s plea to make servers dumb again. In my ideal world, I’d not only outsource the management of the blog software to WordPress, but would also connect the software to my personal cloud, which would be implemented by my chosen storage provider.

I got this idea from Gordon Bell’s MyLifeBits, and riffed on it to imagine cloud-hosted lifebits. Jim Groom ably summed up the argument here:

Will we ever get there? It has to happen sooner or later. Maybe, as Doug Levin suggests today, it’ll be sooner.

Celebrating Infrastructure

When cycling in forested New England countryside I sometimes wondered about the man-made forest built along the roadside — telephone poles, power lines, transformers — and thought someone should write a book about the industrial landscape. It turns out that someone did. Brian Hayes spent many years traveling around America, researching and photographing the infrastructure that sustains our civilization. The book he produced, Infrastructure: A Field Guide to the Industrial Landscape (2005, 2nd ed. 2014), is everything I imagined it would be.

(I found the book by way of a comment that Brian Hayes left here on this blog. “Couldn’t be that Brian Hayes,” I thought. But his signature led me to his blog and thence to Infrastructure‘s home on the web. I’m passing it along here in part to remind myself that my favorite books often aren’t new or well publicized. I find them serendipitously after they’ve been around for a while.)

My father and his twin brother were students of nature in a way I’ve never been. Their knowledge of plants and animals was encyclopedic and ever-expanding. But for most of us, the natural landscape is not an expanse of unnamed and unknown objects. We recognize egrets, crows, hummingbirds, oaks, pines, and maples. The same isn’t true of the industrial landscape. More often than not, driving along some industrial corridor, we’re likely to ask the question Brian Hayes’ daughter asked him: “What’s that thing?” Infrastructure answers those questions for her, and for us.

Chapters on mining, waterworks, farming, energy production and distribution, transportation, shipping, and waste management follow a plan that “traces the flow of materials, information, and energy” throughout the web of industrial networks. We learn how industrial processes work, and how to identify the structures that house and implement them. Not all of us encounter quarries, mills, dams, refineries, or power plants on a daily basis. But water towers, roads, bridges, power lines, and data cables are as much a part of our landscape as what nature put there.

Hayes invites us to know more about the names, appearances, and workings of the industrial landscape. He also challenges us to reconsider how we feel about that landscape.

I stood by the side of a highway near Gallup, New Mexico, looking on a classic vista of the American West: red sandstone buttes, rising from a valley floor. … In front of the cliffs, and towering over them, were several cylindrical spires that I recognized as petroleum fractionating columns; off to one side was a grove of gleaming white spherical tanks. … I suspect that most viewers of this scene would consider the industrial hardware to be an intrusion, a distraction, perhaps even a desecration of the landscape.

Guilty as charged. But I’m provoked by this book to reconsider. “Celebrate infrastructure, don’t hide it,” Stewart Brand tweeted today. “It is civilization’s metabolism and should be its pride.”

Weaving the annotated web

In 1997, at the first Perl Conference, which became OSCON the following year, my friend Andrew Schulman and I both gave talks on how the web was becoming a platform not only for publishing, but also for networked software.

Here’s the slide I remember from Andrew’s talk:

http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799

The only thing on it was a UPS tracking URL. Andrew asked us to stare at it for a while and think about what it really meant. “This is amazing!” he kept saying, over and over. “Every UPS package now has its own home page on the world wide web!”

It wasn’t just that the package had a globally unique identifier. It named a particular instance of a business process. It made the context surrounding the movement of that package through the UPS system available to UPS employees and customers who accessed it in their browsers. And it made that same context available to the Perl programs that some of us were writing to scrape web pages, extract their data, and repurpose it.

As we all soon learned, URLs can point to many kinds of resources: documents, interactive forms, audio or video.

The set of URL-addressable resources has two key properties: it’s infinite, and it’s interconnected. Twenty years later we’re still figuring out all the things you can do on a web of hyperlinked resources that are accessible at well-known global addresses and manipulated by a few simple commands like GET, POST, and DELETE.

When you’re working in an infinitely large universe it can seem ungrateful to complain that it’s too small. But there’s an even larger universe of resources populated by segments of audio and video, regions of images, and most importantly, for many of us, text in web documents: paragraphs, sentences, words, table cells.

So let’s stare in amazement at another interesting URL:

https://hyp.is/LoaMFCSJEee3aAMJuXhO-w/www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm

Here’s what it looks like to a human who follows the link: you land on a web page, in this case Roy Fielding’s dissertation on web architecture; the page scrolls to the place where I’ve highlighted a phrase; and the Hypothesis sidebar displays my annotation, which includes a comment and a tag.

And here’s what that resource looks like to a computer when it fetches a variant of that link:

{
    "body": [
        {
            "type": "TextualBody",
            "value": "components: web resources\n\nconnectors: links\n\ndata: data",
            "format": "text/markdown"
        },
        {
            "type": "TextualBody",
            "purpose": "tagging",
            "value": "IAnnotate2017"
        }
    ],
    "target": [
        {
            "source": "https://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm",
            "selector": [
                {
                    "type": "XPathSelector",
                    "value": "/table[2]/tbody[1]/tr[1]/td[1]",
                    "refinedBy": {
                        "start": 82,
                        "end": 114,
                        "type": "TextPositionSelector"
                    }
                },
                {
                    "type": "TextPositionSelector",
                    "end": 4055,
                    "start": 4023
                },
                {
                    "exact": "components, connectors, and data",
                    "prefix": "tion of architectural elements--",
                    "type": "TextQuoteSelector",
                    "suffix": "--constrained in their relations"
                }
            ]
        }
    ],
    "created": "2017-04-18T22:48:46.756821+00:00",
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "creator": "acct:judell@hypothes.is",
    "type": "Annotation",
    "id": "https://hypothes.is/a/LoaMFCSJEee3aAMJuXhO-w",
    "modified": "2017-04-18T23:03:54.502857+00:00"
}

The URL, which we call a direct link, isn’t itself a standard way to address a selection of text; it’s just a link that points to a web resource. But the resource it points to, which describes the highlighted text and its coordinates within the document, is — since February of this year — a W3C standard. The way I like to think about it is that the highlighted phrase — and every possible highlighted phrase — has its own home page on the web, a place where humans and machines can jointly focus attention.
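
If you want to fetch that resource yourself, here’s a minimal sketch using the public Hypothesis API. The /api/annotations/{id} endpoint returns Hypothesis’s native JSON; the W3C JSON-LD shown above is a variant representation of the same annotation.

// The annotation ID appears in both the direct link and the "id" field above.
const annotationId = 'LoaMFCSJEee3aAMJuXhO-w';

fetch(`https://api.hypothes.is/api/annotations/${annotationId}`)
  .then(resp => resp.json())
  .then(anno => {
    console.log(anno.uri);    // the annotated document
    console.log(anno.text);   // the comment
    console.log(anno.tags);   // e.g. ["IAnnotate2017"]
  });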

If we think of the web we’ve known as a kind of fabric woven together with links, the annotated web increases the thread count of that fabric. When we weave with pieces of URL-addressable documents, we can have conversations about those pieces, we can retrieve them, we can tag them, and we can interconnect them.

Working with our panelists and others, it’s been my privilege to build a series of annotation-powered apps that begin to show what’s possible when every piece of the web is addressable in this way.

I’ll show you some examples, then invite my collaborators — Beth Ruedi from AAAS, Mike Caulfield from Washington State University Vancouver, Anita Bandrowski from SciCrunch, and Maryann Martone from UCSD and Hypothesis — to talk about what these apps are doing for them now, and where we hope to take them next.

Science in the Classroom

First up is a AAAS project called Science in the Classroom, a collection of research papers from the Science family of journals that are annotated — by graduate students — so teachers can help younger students understand the methods and outcomes of scientific research.

Here’s one of those annotated papers. A widget called the Learning Lens toggles layers of annotation on and off.

Here I’ve selected the Glossary layer, and I’ve clicked on the word “distal” to reveal the annotation attached to it.

Now let’s look behind the scenes:

Hypothesis was used to annotate the word “distal”. But Learning Lens predated the use of Hypothesis, and the Science in the Classroom team wanted to keep using Learning Lens to display annotations. What they didn’t want was the workflow behind it, which required manual insertion of annotations into HTML pages.

Here’s the solution we came up with. Use Hypothesis to create annotations, then use some JavaScript in Science in the Classroom pages to retrieve Hypothesis annotations and write them into the pages, using the same format that had been applied manually. The preexisting and unmodified Learning Lens JavaScript can then do what it does: pick up the annotations, assign color-coded highlights based on tags, and show the annotations when you click on the highlights.
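
Here’s a rough JavaScript sketch of that retrieval step. It isn’t the actual Science in the Classroom code; it just fetches the page’s annotations from the public Hypothesis search API and logs what would be highlighted, leaving out the Learning Lens markup step.

// Fetch this page's Hypothesis annotations and walk their text-quote selectors.
async function injectAnnotations() {
  const uri = encodeURIComponent(window.location.href);
  const resp = await fetch(`https://api.hypothes.is/api/search?uri=${uri}&limit=200`);
  const { rows } = await resp.json();
  for (const row of rows) {
    const selectors = (row.target && row.target[0] && row.target[0].selector) || [];
    const quote = selectors.find(s => s.type === 'TextQuoteSelector');
    if (!quote) continue;
    // The real workflow hands this off to an anchoring library that locates
    // the quoted text in the DOM and writes Learning-Lens-style markup there.
    console.log('would highlight:', quote.exact, '->', row.text, row.tags);
  }
}

injectAnnotations();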

What made this possible was a JavaScript library that helps with the heavy lifting required to attach an annotation to its intended target in the document.

That library is part of the Hypothesis client, but it’s also available as a standalone module that can be used for other purposes. It’s a nice example of how open source components can enable an ecosystem of interoperable annotation services.

DigiPo / EIC

Next up is a toolkit for student fact-checkers and investigative journalists. You’ve already heard from Mike Caulfield about the Digital Polarization Project, or DigiPo, and from Stefan Candea about the European Investigative Collaborations network. Let’s look at how we’ve woven annotation into their investigative workflows.

These investigations are both written and displayed in a wiki. This is a DigiPo example:

I did the investigation of this claim myself, to test out the process we were developing. It required me to gather a whole lot of supporting evidence before I could begin to analyze the claim. I used a Hypothesis tag to collect annotations related to the investigation, and you can see them in this Hypothesis view:

I can be very disciplined about using tags this way, but it’s a lot to ask of students, or really almost anyone. So we created a tool that knows about the set of investigations underway in the wiki, and offers the names of those pages as selectable tags.

Here I’ve selected a piece of evidence for that investigation. I’m going to annotate it, not by using Hypothesis directly, but instead by using a function in a separate DigiPo extension. That function uses the core anchoring libraries to create annotations in the same way the Hypothesis client does.

But it leads the user through an interstitial page that asks which investigation the annotation belongs to, and assigns a corresponding tag to the annotation it creates.
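
In outline, that final step looks something like this sketch, which posts an annotation through the public Hypothesis API with the chosen investigation as its tag. The token, page URL, quote, and investigation name here are placeholders.

// Create an annotation on pageUrl, anchored to exactQuote, tagged with the
// name of the investigation chosen on the interstitial page.
async function annotateForInvestigation(apiToken, pageUrl, exactQuote, comment, investigation) {
  const body = {
    uri: pageUrl,
    text: comment,
    tags: [investigation],
    target: [{
      source: pageUrl,
      selector: [{ type: 'TextQuoteSelector', exact: exactQuote }]
    }]
  };
  const resp = await fetch('https://api.hypothes.is/api/annotations', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  });
  return resp.json();   // the newly created annotation, including its id
}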

Back in the wiki, the page embeds the same Hypothesis view we’ve already seen, as a Related Annotations widget pinned to that particular tag:

I had so much raw material for this article that I needed some help organizing it. So I added a Timeline widget that gathers a subset of the source annotations that are tagged with dates.

To put something onto the timeline, you select a date on a page.

Then you create an annotation with a tag corresponding to the date.

Here’s what the annotation looks like in Hypothesis.

Over in the wiki, our JavaScript finds annotations that have these date tags and arranges them on the Timeline.
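
A rough sketch of that arrangement step: filter an investigation’s annotations down to those with a date-like tag and sort them. The YYYY-MM-DD tag format here is an assumption for illustration; the toolkit’s actual convention may differ.

// Given the annotations gathered for an investigation, pull out the ones
// tagged with dates and order them chronologically for the Timeline widget.
function buildTimeline(annotations) {
  return annotations
    .map(anno => {
      const dateTag = (anno.tags || []).find(t => /^\d{4}-\d{2}-\d{2}$/.test(t));
      return dateTag ? { date: dateTag, annotation: anno } : null;
    })
    .filter(Boolean)
    .sort((a, b) => a.date.localeCompare(b.date));
}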

Publication dates aren’t always evident on web pages; sometimes you have to do some digging to find them. When you do find one, and annotate a page with it, you’ve done more than populate the Timeline in a DigiPo page. That date annotation is now attached to the source page for anyone to discover, using Hypothesis or any other annotation-aware viewer. And that’s true for all the annotations created by DigiPo investigators. They’re woven into DigiPo pages, but they’re also available for separate reuse and aggregation.

The last and most popular annotation-related feature we added to the toolkit is called Footnotes. Once you’ve gathered your raw material into the Related Annotations bucket, and maybe organized some of it onto the Timeline, you’ll want to weave the most pertinent references into the analysis you’re writing.

To do that, you find the annotation you gathered and use Copy to clipboard to capture the direct link.

Then you wrap that link around some text in the article:

When you refresh the page, here’s what you get. The direct link does what a direct link does: it takes you to the page, scrolls you to the annotation in context. But it can take a while to review a bunch of sources that way.

So the page’s JavaScript also creates a link that points down into the Footnotes section. And there, as Ted Nelson would say, and as Nate Angell for some reason hates hearing me say, the footnote is “transcluded” into the page so all the supporting context is right there.

One final point about this toolkit. Students don’t like the writing tools available in wikis, and for good reason: they’re pretty rough around the edges. So we want to enable them to write in Google Docs. We also want them to footnote their articles using direct links, because that’s the best way to do it. So here’s a solution we’re trying. From the wiki you’ll launch into Google Docs, where you can do your writing in a much more robust editor that makes it really easy to include images and charts. And if you use direct links in that Google Doc, they’ll still show up as Footnotes.

We’re not yet sure this will pan out, but my colleague Maryann Martone, who uses Hypothesis to gather raw material for her scientific papers, and who writes them in Google Docs, would love to be able to flow annotations through her writing tool and into published footnotes.

SciBot

Maryann is the perfect segue to our next example. Along with Anita Bandrowski, she’s working to increase the thread count in the fabric of scientific literature. When neuroscientists write up the methods used in their experiments, the ingredients often include highly specific antibodies. These have colloquial names, and even vendor catalog numbers, but they have long lacked unique identifiers. So the Neuroscience Information Framework, NIF for short, has defined a namespace called RRID (research resource identifier), built a registry for RRIDs, and convinced a growing number of authors to mention RRIDs in their papers.

Here’s an article with RRIDs in it. They’re written directly into the text because the text is the scientific record; it’s the only artifact that’s guaranteed to be preserved. So if you’re talking about a goat polyclonal antibody, you look it up in the registry, capture its ID, and write it directly into the text. And if it’s not in the registry, please add it; you’ll make Anita very happy if you do!

The first phase of a project we call SciBot was about validating those RRIDs. They’re just free text, after all, typed in by authors. Were the identifiers spelled correctly? Did they point to actual registry entries? To find out we built a tool that automatically annotates occurrences of RRIDs.

In this example, Anita is about to click on the SciBot tool, which launches from a bookmarklet, and sends the text of the paper to a backend service. It scans the text for RRIDs, looks up each one in the registry, and uses the Hypothesis API to create an annotation — bound to the occurrence in the text — that reports the results of the registry lookup.
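
Here’s a highly simplified sketch of that scanning-and-lookup step, not the production SciBot service. The RRID pattern and the SciCrunch resolver URL are assumptions for illustration.

// Find candidate RRIDs in the paper's text and check each against the registry.
const RRID_PATTERN = /RRID:\s*([A-Za-z]+[_:][A-Za-z0-9_-]+)/g;

async function scanForRrids(paperText) {
  const results = [];
  for (const match of paperText.matchAll(RRID_PATTERN)) {
    const rrid = match[1];
    const resp = await fetch(`https://scicrunch.org/resolver/RRID:${rrid}`, {
      headers: { Accept: 'application/json' }
    });
    // The real backend then calls the Hypothesis API to create an annotation
    // anchored to this occurrence, reporting whether the lookup succeeded.
    results.push({ rrid, offset: match.index, found: resp.ok });
  }
  return results;
}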

Here the Hypothesis realtime API is showing that SciBot has created three annotations on this page.

And here are those three annotations, anchored to their occurrences in the page, with registry entries displayed in the sidebar.

SciBot curators review these annotations and use tags to mark which are valid. When some aren’t, and need attention, the highlight focuses that attention on a specific occurrence.

This hybrid of automatic entity recognition and interactive human curation is really powerful. Here’s an example where an antibody doesn’t have an RRID but should.

Every automatic workflow needs human exception handling and error correction. Here the curator has marked an RRID that wasn’t written into the literature, but now is present in the annotation layer.

These corrections are now available to train a next-gen entity recognizer. Iterating through that kind of feedback loop will be a powerful way to mine the implicit data that’s woven into the scientific literature and make it explicit.

Here’s the Hypothesis dashboard for one of the SciBot curators. The tag cloud gives you a pretty good sense of how this process has been unfolding so far.

Publishers have begun to link RRIDs to the NIF registry. Here’s an example at PubMed.

If you follow the ZIRC_ZL1 link to the registry, you’ll find a list of other papers whose authors used the same experimental ingredient, which happens to be a particular strain of zebrafish.

This is the main purpose of RRIDs. If that zebrafish is part of my experiment, I want to find who else has used it, and what their experiences have been — not just what they reported in their papers, but ideally also what’s been discussed in the annotation layer.

Of course I can visit those papers, and search within them for ZIRC_ZL1, but with annotations we can do better. In DigiPo we saw how footnoted quotes from source documents can transclude into an article. Publishers could do the same here.

Or we could do this. It’s a little tool that offers to look up an RRID selected in text.

It just links to an instance of the Hypothesis dashboard that’s pinned to the tag for that RRID.
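
That tool can be as small as this sketch: build a tag query for the selected RRID and open a search pinned to it. Here the public Hypothesis search page stands in for the curators’ dashboard.

// Open a Hypothesis search scoped to the tag for the selected RRID.
function lookupRrid(selectedText) {
  const rrid = selectedText.trim();            // e.g. "RRID:ZIRC_ZL1"
  const query = encodeURIComponent(`tag:${rrid}`);
  window.open(`https://hypothes.is/search?q=${query}`, '_blank');
}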

Those search results offer direct links that take you to each occurrence in context.

Claim Chart

Finally, and to bring us full circle, I recently reconnected with Andrew Schulman, who works nowadays as a software patent attorney. There’s a tool of his trade called a claim chart. It’s a two-column table. In column one you list claims that a patent is making, which are selections of text from the claims section of the patent. And in column two you assemble bits of evidence, gathered from other sources, that bear on specific claims. Those bits of evidence are selections of text in other documents. It’s tedious to build a claim chart: it involves a lot of copying and pasting, and the evidence you gather is typically trapped in whatever document you create.

Andrew wondered if an annotation-powered app could help build claim charts, and also make the supporting evidence web-addressable for all the reasons we’ve discussed. If I’ve learned anything about annotation, it’s that when somebody asks “Can you do X with annotation?” the answer should always be: “I don’t know, should be possible, let’s find out.”

So, here’s an annotation-powered claim chart.

The daggers at top left in each cell are direct links. The ones in the first column go to patent claims in context.

The ones in the second column go to related statements in other documents.

And here’s how the columns are related. When you annotate a claim, you use a toolkit function called Add Selection as Claim.

Your selection here identifies the target document (that is, the patent), the claim chart you’re building (here, it’s a wiki page called andrew_test), and the claim itself (for example, claim 1).

Once you’ve identified the claims in this way, they’re available as targets of annotations in other documents. From a selection in another document, you use Add Selection as Claim-Related.

Here you see all the claims you’ve marked up, so it’s easy to connect the two statements.

The last time I read Vannevar Bush’s famous essay As We May Think, this was the quote that stuck with me.

When statements in documents become addressable resources on the web, we can weave them together in the way Vannevar Bush imagined.

Do Repeat Yourself, With Variations

Don’t Repeat Yourself (DRY) is a touchstone principle of software development. It’s often understood to inveigh against duplication of code. Copying a half-dozen lines from one program to another is a bad idea, DRY says, because if you change your mind about how that code works, you’ll have to revise it in several places. Better to convert those lines of code into a function that you write once and reuse.
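
A toy JavaScript example of what DRY asks for: once the same half-dozen lines show up in two programs, they become a function that’s written once and reused.

// Before: each program pasted its own copy of this title-to-slug logic.
// After: one authoritative representation, shared by every caller.
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')   // collapse spaces and punctuation into hyphens
    .replace(/^-|-$/g, '');        // trim leading and trailing hyphens
}

console.log(slugify('Do Repeat Yourself, With Variations'));
// -> "do-repeat-yourself-with-variations"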

More broadly, the DRY principle asserts:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Code and data are two kinds of knowledge that ought to be represented canonically, and repeated — if at all — only by mechanical derivation, never by variation.

I often violate the DRY principle by indulging in CopyAndPasteProgramming. In my defense I point to another principle, CodeHarvesting, which defends duplication as a necessary stepping stone.

Letting a duplication of logic live for now, in order to see how to best universalize it at some later point.

For me, at least, that’s what tends to work best. A common theme doesn’t emerge until I’ve seen — and ideally others have seen and reacted to — several variations on that theme. This kind of duplication — deferred universalization — is beneficial, right?

Here’s another kind. In the JavaScript world the dominant engine of reuse is the Node Package Manager (NPM). When I first started using it a few years ago, I was shocked at the amount of duplication it entails. When you install an NPM package, the modules it depends on are copied into a subdirectory. If those modules depend on others, they are copied into yet deeper subdirectories. For even a simple JavaScript program you can end up with a forest of thousands of files.

A similar thing happens in the Python world. It’s a best practice, nowadays, to use a tool called virtualenv to create, for each Python program you run, an isolated environment with the particular Python interpreter and set of modules needed by that particular program. In practice that means, again, copying lots of files.

Arguably these duplications don’t violate DRY because they are mechanical copies that won’t vary from their originals. But they can! And here too I am prone to indulge in local variation to explore possibilities that might or might not warrant generalization.

While pondering the vices and virtues of duplicative software construction I reread Metamagical Themas, the compendium of Douglas Hofstadter’s columns in Scientific American. (The title is an anagram of Mathematical Games, the column he inherited from Martin Gardner.) In Variations on a Theme as the Crux of Creativity he states the case as plainly as anywhere. At the core of creative thought are “slippery” concepts that we develop in a virtuous cycle of innovation:

Once you have decided to try out a new way of viewing a phenomenon, you can let that view suggest a set of knobs to vary. The act of varying them will lead you down new pathways, generating new images ripe for perception in their own right.

This sets up a closed loop:

– fresh situations get unconsciously framed in terms of familiar concepts;

– those familiar concepts come equipped with standard knobs to twiddle;

– twiddling those knobs carries you into fresh new conceptual territory.

We need to get DRY eventually in order to maintain stable systems. But the countervailing state needn’t be WET (“write everything twice”, “we enjoy typing” or “waste everyone’s time”). Instead I propose DRYWV: Do Repeat Yourself, With Variations.

Every piece of knowledge should have a single, unambiguous, authoritative representation within a system. But how do we arrive at such knowledge? I think we have to DRYWV our way there.