Copyright can’t stop annotation of government documents

I’ll admit that the Medium Legal team’s post AB 2880: Kill (this) Bill had me at hello:

Fellow Californians, please join us in opposing AB 2880, which would allow and encourage California to extend copyright protection to works made by the state government. We think it’s a bad idea that would wind up limiting Californians’ ability to post and read government information on platforms like Medium.

That sure does sound like a bad idea, and hey, I’m a Californian now too. But when I try to read the actual bill I find it hard to relate its text to Medium Legal’s interpretations, or to some others:

I doubt I’m alone in struggling to connect these interpretations to their evolving source text. Medium Legal says, for example:

AB 2880 requires the state’s Department of General Services to track the copyright status of works created by the state government’s 228,000 employees, and requires every state agency to include intellectual property clauses in every single one of their contracts unless they ask the Department in advance for permission not to do so.

What’s the basis for this interpretation? How do Medium Legal think the text of the bill itself supports it? I find four mentions of the Department of General Services in the bill: (1), (2), (3), (4). To which of these do Medium Legal refer? Do they also rely on the Assembly Third Reading? How? I wish Medium Legal had, while preparing their post, annotated those sources.

The Assembly Third Reading, meanwhile, concludes:

Summary of the bill: In summary, this bill does all of the following:

1) clarifies existing law that state agencies may own, license, and register intellectual property to the extent not inconsistent with the rights of the public to obtain, inspect, copy, publish and otherwise communicate under the California Public Records Act, the California Constitution as provided, and under the First Amendment to the United States Constitution;

2) …

7) …

Analysis Prepared by: Eric Dang / JUD. / (NNN) NNN-NNNN

The same questions apply. How does Eric Dang think the source text supports his interpretation? How do his seven points connect to the bill under analysis? Again, an annotation layer would help us anchor the analysis to its sources.

Medium Legal and Eric Dang used digital tools to make notes supporting their essays. Such notes are, by default, not hyperlinked to specific source passages and not available to us as interpretive lenses. Modern web annotation flips that default. Documents remain canonical; notes anchor precisely to words and sentences; the annotation layer is a shareable overlay. There’s no copying, so no basis for the chilling effect that critics of AB 2880 foresee. While the bill might limit Californians’ ability to post and read government information on platforms like Medium, it won’t matter one way or the other to Californians who do such things on platforms like Hypothesis.

Watching animals

On a visit to the Berlin Zoo last month, I watched primates interact with a cleverly-designed game called a poke box. It’s a plexiglas enclosure with shelves. Rows of holes in the enclosure give access to the shelves. Each shelf has a single hole, offset from the holes above and below. The machine drips pellets of food onto the top shelf. The primates, using sticks they’ve stripped and prepared, reach through the holes and tease the pellets through the holes in the shelves, performing one delicate maneuver after another, finally reaching through a slot in the bottom of the enclosure to claim the prize.

How would I perform in a game like that? My hunch: not much better or worse than a group of bonobos I spent a long time watching.

“This so-called behavioural enrichment,” the Zoo says, “is an important area in animal management.” The poke box is undoubtedly enriching the lives of those primates. But they are still prisoners.

My dad was a volunteer at the Philadelphia Zoo. I saw, growing up, how that zoo tried to create more naturalistic settings for its animals. The most successful I can recall was the hummingbird house. Its tiny inhabitants roamed freely in a large open space that looked and felt like a tropical rain forest.

I don’t know if those birds really lived something like a normal life, but it seems at least plausible. I can’t say the same for most of the animals I’ve seen in zoos. Conditions may have improved over the years, but I’m not sure I’ll ever again pay to watch the prisoners perform “enriched” behaviors in “naturalistic” settings.

If my children were still young, that’d be a hard call to make. A visit to the zoo is one of the great experiences of childhood. If there weren’t zoos, how could we replace that experience?

Maybe we don’t need to jail animals. Maybe we just need to improve our ability to observe them in situ. There’s still plenty of open space that is, or could be, conserved. And we’re getting really good at surveillance! Let’s put our new skill to a different use. Fly cameras over areas of wildlife parks inaccessible to tourist vehicles. Enable online visitors to adopt and follow individual animals and their groups. Make it possible for anyone to have the sort of experience Jane Goodall did. The drone cameras sound creepy, but unlike the vehicles that carry tourists through those parks, the drones will keep getting smaller and more unobtrusive.

Could it happen? I’m not holding my breath. So for now I’ll focus on the wildlife I can see right here in Sonoma County. This spring we discovered the Ninth Street Rookery, less than a mile from our house. Each of the two large eucalyptus trees there is home to about a hundred nests in which black-crowned night herons and various species of egret — great, cattle, snowy — raise a cacophony as they raise their young. We visited a couple of times; it’s wonderful. Although, come to think of it, if it were a channel on cams.allaboutbirds.org I could watch and learn a lot more about the lives of these animals. And so could you.

Thoughts in motion, annotated

In Knowledge Work as Craft Work (2002), Jim McGee wrote:

The journey from apprentice to master craftsman depends on the visibility of all aspects of craft work.

That was the inspiration for a talk I gave at the 2010 Traction User Group meeting, which focused on the theme of observable work. In the GitHub era we take for granted that we can craft software in the open, subjecting each iteration to highly granular analysis and discussion. Beautiful Code (2007) invited accomplished programmers to explain their thinking. I can imagine an annotated tour of GitHub repositories as the foundation of a future edition of that book.

I can also imagine crafting prose — and then explaining the process — in a similarly open and observable way. The enabling tools don’t exist but I’m writing this post in a way that I hope will suggest what they might be. The toolset I envision has two main ingredients: granular versioning and annotation. When I explored Federated Wiki last year, I got a glimpse of the sort of versioning that could usefully support analysis of prose craft. The atomic unit of versioning in FedWiki is the paragraph. In Thoughts in motion I created a plugin that revealed the history of each paragraph in a document. As writers we continually revise. The FedWiki plugin illustrated that process in a compelling way. The sequence of revisions to a paragraph recorded a sequence of decisions.

For an expert writer such decisions are often tacit. We apply rules that we’ve internalized. How might an expert writer bring those rules to the surface, reflect on them, and explain them to others? Granular version history is necessary but not sufficient. We also need a way to narrate our decisions. I think annotation of version history can help us tell that story. To test that intuition, I am recording a detailed history of this blog post as I write it. The experiment I have in mind: annotate that change history to explain — to myself and others — the choices I’ve made along the way.
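Here’s a minimal Python sketch of how those two ingredients, granular versioning and annotation, might fit together. It is not the FedWiki plugin; the Revision and Paragraph names and the sample sentences are invented for illustration. Each paragraph keeps its own list of revisions, and each revision can carry a note that narrates the decision behind it.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    text: str       # full text of the paragraph at this revision
    note: str = ""  # the writer's annotation explaining the change


@dataclass
class Paragraph:
    revisions: list = field(default_factory=list)

    def revise(self, text, note=""):
        """Record a new version of the paragraph, with an optional note."""
        self.revisions.append(Revision(text, note))

    def narrated_history(self):
        """Return (note, word-level changes) for each step between revisions."""
        steps = []
        for old, new in zip(self.revisions, self.revisions[1:]):
            a, b = old.text.split(), new.text.split()
            matcher = difflib.SequenceMatcher(a=a, b=b)
            changes = [
                (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                if tag != "equal"
            ]
            steps.append((new.note, changes))
        return steps


# Hypothetical draft and revision of one sentence.
p = Paragraph()
p.revise("As writers we are always revising.")
p.revise("As writers we continually revise.", note="Tighter verb phrase.")
history = p.narrated_history()
for note, changes in history:
    print(note, changes)
```

The point of the design is that the note travels with the revision it explains, so a reader can step through the paragraph’s history and see the reason for each change alongside the change itself.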

Time passes…

OK, I’ve done the experiment here. It certainly explained some things to me about my own process. I doubt it’s generally useful as is, but I think the technique could become so in two ways. As a teacher, I might start with a demo essay, work through a series of revisions, and then annotate them to illustrate aspects of structure, word choice, clarity, and brevity. As a student I might work through my own essay in the same way, guided by progressive feedback (in the annotation layer) from the teacher. It looks promising to me. What do you think?

Annotation is not (only) web comments

Annotation looks like a new way to comment on web pages. “It’s like Medium,” I sometimes explain, “you highlight the passage you’re talking about, you write a comment about it, the comment anchors to the passage and displays to its right.” I need to stop saying that, though, because it’s wrong in two ways.

First, annotation isn’t new. In 1968 Doug Engelbart showed a hypertext system that could link to regions within documents. In 1993, NCSA Mosaic implemented the first in a long lineage of modern annotation tools. We pretend that tech innovation races along at breakneck speed. But sometimes it sputters until conditions are right.

Second, annotation isn’t only a form of online discussion. Yes, we can converse more effectively when we refer to selected passages. Yes, such conversation is easier to discover and join when we can link directly to a context that includes the passage and its anchored conversation. But I want to draw attention to a very different use of annotation.

A web document is a kind of database. Some of its fields may be directly available: the title, the section headings. Other fields are available only indirectly. The author’s name, for example, might link to the author’s home page, or to a Wikipedia page, where facts about the author are recorded. The web we weave using such links is the map that Google reads and then rewrites for us to create the most powerful information system the world has yet seen. But we want something even more powerful: a web where the implicit connections among documents become explicit. Annotation can help us weave that web of linked data.

The semantic web is, of course, another idea that’s been kicking around forever. In that imagined version of the web, documents encode data structures governed by shared schemas. And those islands of data are linked to form archipelagos that can be traversed not only by people but also by machines. That mostly hasn’t happened because we don’t yet know what those schemas need to be, nor how to create writing tools that enable people to easily express schematized information.

Suppose we agree on a set of standard schemas, and we produce schema-aware writing tools that everyone can use to add new documents to a nascent semantic web. How will we retrofit the web we already have? Annotation can help us make the transition. A project called SciBot has given me a glimpse of how that can happen.

Hypothesis’ director of biosciences Maryann Martone and her colleagues at the Neuroscience Information Framework (NIF) project are building an inventory of antibodies, model organisms, and software tools used by neuroscientists. NIF has defined and promoted a way to identify such resources when mentioned in scientific papers. It entails a registry of Research Resource Identifiers (RRIDs) and a protocol for including RRIDs in scientific papers.

Here’s an example of some RRIDs cited in Dopaminergic lesioning impairs adult hippocampal neurogenesis by distinct modification of a-synuclein:

Free-floating sections were stained with the following primary antibodies: rat monoclonal anti-BrdU (1:500; RRID:AB_10015293; AbD Serotec, Oxford, United Kingdom), rabbit polyclonal anti-Ki67 (1:5,000; RRID:AB_442102; Leica Microsystems, Newcastle, United Kingdom), mouse monoclonal antineuronal nuclei (NeuN; 1:500; RRID:AB_10048713; Millipore, Billerica, MA), rabbit polyclonal antityrosine hydroxylase (TH; RRID:AB_1587573; Millipore), goat polyclonal anti-DCX (1:250; RRID:AB_2088494; Santa Cruz Biotechnology, Santa Cruz, CA), and mouse monoclonal anti-a-syn (1:100; syn1; clone 42; RRID:AB_398107; BD Bioscience, Franklin Lakes, NJ).

The term “goat polyclonal anti-DCX” is not necessarily unique. So the author has added the identifier RRID:AB_2088494, which corresponds to a record in NIF’s registry. RRIDs are embedded directly in papers, rather than attached as metadata, because as Dr. Martone says, “papers are the only scientific artifacts that are guaranteed to be preserved.”

But there’s no guarantee an RRID means what it should. It might be misspelled. Or it might point to a flawed record in the registry. Could annotation enable a process of computer-assisted validation? Thus was born the idea of SciBot. It’s a human/machine partnership that works as follows.

A human validator sends the text of an article to a web service. The service scans the article for RRIDs. For each that it finds, it looks up the corresponding record in the registry, then calls the Hypothesis API to post an annotation that anchors to the text of the RRID and includes the lookup result in the body of the annotation. That’s the machine’s work. Now comes the human partner.

If the RRID is well-formed, and if the lookup found the right record, a human validator tags it a valid RRID — one that can now be associated mechanically with occurrences of the same resource in other contexts. If the RRID is not well-formed, or if the lookup fails to find the right record, a human validator tags the annotation as an exception and can discuss with others how to handle it. If an RRID is just missing, the validator notes that with another kind of exception tag.
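The two halves of that partnership can be sketched in a few lines of Python. This is not SciBot’s code, only an illustration under stated assumptions: the RRID pattern is simplified, the registry is a stand-in dictionary with a placeholder record, the tag names are invented, and the annotation is shaped like a web annotation with a text quote selector (the exact text plus some surrounding context) rather than actually posted to the Hypothesis API.

```python
import re

# Simplified, assumed pattern: "RRID:" then a prefix, an underscore, and an ID.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9]+)")


def scan(uri, text, registry, context=20):
    """Machine step: draft one annotation per RRID mention found in text."""
    drafts = []
    for m in RRID_PATTERN.finditer(text):
        rrid = m.group(1)
        record = registry.get(rrid)  # None when the lookup fails
        drafts.append({
            "uri": uri,
            "text": record if record else "No registry record found for " + rrid,
            "tags": ["RRID:" + rrid],
            "target": [{
                "source": uri,
                "selector": [{
                    "type": "TextQuoteSelector",
                    "exact": m.group(0),
                    "prefix": text[max(0, m.start() - context):m.start()],
                    "suffix": text[m.end():m.end() + context],
                }],
            }],
        })
    return drafts


def review(draft, is_valid):
    """Human step: tag the draft as validated or as an exception."""
    tag = "validated" if is_valid else "exception"
    return {**draft, "tags": draft["tags"] + [tag]}


sample = ("goat polyclonal anti-DCX (1:250; RRID:AB_2088494; "
          "Santa Cruz Biotechnology, Santa Cruz, CA)")
# Hypothetical stand-in for the registry; the record text is a placeholder.
registry = {"AB_2088494": "placeholder registry record"}
drafts = scan("https://example.org/paper", sample, registry)
checked = [review(d, is_valid=True) for d in drafts]
print(checked[0]["tags"])  # ['RRID:AB_2088494', 'validated']
```

Keeping the machine’s draft and the human’s verdict as separate steps mirrors the division of labor: the scan is cheap and repeatable, while the tag records a judgment that only a person can make.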

If you’re not a neuroscientist, as I am not, that all sounds rather esoteric. But this idea of humans and machines working together to enhance web documents is, I think, powerful and general. When I read Katherine Zoepf’s article about emerging legal awareness among Saudi women, for example, I was struck by odd juxtapositions along the timeline of events. In 2004, reforms opened the way for women to enter law schools. In 2009, “the Commission for the Promotion of Virtue and the Prevention of Vice created a specially trained unit to conduct witchcraft investigations.” I annotated a set of these date-stamped statements and arranged them on a timeline. The result is a tiny data set extracted from a single article. But as with SciBot, the method could be applied by a team of researchers to a large corpus of documents.
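As a rough illustration of how annotated statements become a data set, here is a small Python sketch. It assumes each annotation is just the quoted text, and that the first four-digit year mentioned in a quote is its date; the Statement class and to_timeline function are invented names, not part of any tool I used.

```python
import re
from dataclasses import dataclass

# Assumption: a statement's date is the first year (1800-2099) it mentions.
YEAR = re.compile(r"\b(1[89]\d\d|20\d\d)\b")


@dataclass
class Statement:
    year: int
    quote: str


def to_timeline(quotes):
    """Keep the quotes that mention a year and sort them chronologically."""
    dated = []
    for quote in quotes:
        match = YEAR.search(quote)
        if match:
            dated.append(Statement(int(match.group(1)), quote))
    return sorted(dated, key=lambda s: s.year)


quotes = [
    "In 2009, the Commission for the Promotion of Virtue and the Prevention "
    "of Vice created a specially trained unit to conduct witchcraft investigations.",
    "In 2004, reforms opened the way for women to enter law schools.",
]
timeline = to_timeline(quotes)
print([s.year for s in timeline])  # [2004, 2009]
```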

Web documents are databases full of facts and assertions that we are ill-equipped to find and use productively. Those documents have already been published, and they are not going to change. Using annotation we can begin to make better use of the documents that exist today, and more clearly envision tomorrow’s web of linked data.

This hybrid approach is, I think, the viable middle path between two unworkable extremes. People won’t be willing or able to weave the semantic web. Nor will machines, though perfectly willing, be able to do that on their own. The machines will need training wheels and the guidance of human minds and hands. Annotation’s role as a provider of training and guidance for machine learning can powerfully complement its role as the next incarnation of web comments.

Owning and sharing your words

In 2001 I was among a community of early bloggers who came together around Dave Winer’s Radio UserLand, a tool for both publishing and aggregating blogs. To the world at large, blogging was rightly understood to be a new and exciting way for people to publish their writing online. Those of us exploring the new medium found it to be, also, a social network that was naturally immune to spam. We all had our own inviolate spaces for web writing. No moderation was needed because there were no comments. And yet rich discourse emerged. How? The web enabled us to link to one another’s posts, and RSS enabled us to monitor one another’s feeds. That was enough to sustain vibrant and civil conversation.

Things stayed civil because the system aligned incentives correctly. “You own your words,” Dave said, “and you speak in your own space on the web.” If you said something nasty about someone else you weren’t saying it in their space, or in some neutral space, you were saying it directly in your own space, one that represented you to the world.

Earlier systems, such as UseNet and Web forums, lacked this blend of mutual consent and accountability. So do modern ones such as Facebook and Twitter. The quality of discourse on Radio UserLand was, for a while, like nothing I’ve experienced before or since.

The blogosphere grew. Blog comments appeared. Google killed the dominant blog reader. Twitter and Facebook appeared. Now we can be sovereigns of our own online spaces, and we can be connected to others, but it seems that we can’t be both at the same time and in the same way.

What brings all this back is the kerfuffle, last week, that some of us who are building web annotation tools are calling TateGate. (Tate: annotate.) You can read about it on my company’s blog and elsewhere, but it boils down to a set of hard questions. Is it both legal and ethical to:

1. Write your words on my blog in an annotation overlay visible only to you in your browser?

2. Share the overlay with others in a private group space?

3. Share the overlay on the open web?

The answer to 1 is almost certainly yes. The original 1996 CSS spec, for example, recommended that browsers enable users to override publisher-defined style sheets. CSS recognized that the needs of publishers and readers are in dynamic tension. Publishers decide how they want readers to see their pages, but readers can decide differently. In 2005 a Firefox extension called Greasemonkey began empowering users to make functional as well as stylistic changes: adding a Delete button to Gmail, reporting book availability in local libraries on Amazon pages. If a site has ever successfully challenged the right of a user to alter a page locally, in the browser, I haven’t heard of it; neither have knowledgeable friends and acquaintances I’ve asked about this.

When the overlay is shared at wider scopes things get more complicated. The nexus of stakeholders includes publishers, readers, and annotators. Let’s explore that nexus in two different cases.

Climate news

In this case, climate scientists team up to annotate a news site’s story on climate change. The annotation layer is shared on the open web, visible to anyone who acquires the tool needed to view it. The scientists believe they are providing a public service. Readers who value the scientists’ assessment can opt in to view their annotations. Readers who don’t care what the scientists think don’t have to view the annotations. Publishers may be more or less comfortable with the existence of the opt-in annotation layer, depending on their regard for the scientists and their willingness to embrace independent scrutiny.

Before you decide what you think about this, consider an alternative. Climate skeptics team up to offer a competing annotation layer that offers a very different take on the story. Publishers, annotators, and readers still find themselves in the same kind of dynamic tension, but it will feel different to you in a way that depends on your beliefs about climate change.

Either way it’s likely that you’ll find this model generally sound. Many will agree that public information should be subject to analysis, that analysis should take advantage of the best tools, and that commentary anchored to words and phrases in source texts is highly effective.

“TateGate”

In this case, a news site annotates a personal blog. The blogger believes that she owns her words, she moderates all comments to ensure that’s so, and she feels violated when she learns about a hostile overlay available to anyone who can discover it. The annotators are using a tool that doesn’t enable sharing the overlay in a private group, so we don’t know how the option to restrict the overlay’s availability might have mattered. The annotation layer is available in two ways: as a proxied URL available in any unmodified browser, and as a browser extension that users install and activate. The proxy could in principle have been turned off for this blog, but it wasn’t, so we don’t know whether things would have played out differently if a user-installed browser extension were the only way to view the annotations.

These variables may affect how you think about this case. Your beliefs about what constitutes fair use, appropriation, and harassment certainly will. And there are still more variables. A site can, for example, choose to invite Hypothesis annotators by embedding our client. We envision, but have yet to offer, layers in which groups self-moderate annotations that all viewers can read but not write. And we envision that publishers might choose to make only certain of those channels discoverable in the annotation layer they choose to embed. That restriction would, however, not apply to users who bring their own independent annotation client to the page.

We at Hypothesis are soliciting a range of views on this thicket of thorny issues, and we are considering how to evolve tools and policies that will address them. Here I’m not speaking for my employer, though; I am just reflecting on the tension between wanting to own our words and wanting to share them with the world.

The closest modern equivalent to the Radio UserLand model is one that indie web folks call POSSE, which stands for Publish (on your) Own Site, Syndicate Everywhere. POSSE encourages me to comment on your site by writing a post on my site and notifying yours about it. You can choose to accept my contribution or not. If you do accept it, there’s a sense in which it is not a statement that lives on your site but rather one that lives on mine and is reflected to yours. Both parties negotiate a zone of ambiguity between what’s on my site and what’s on your site.

Web annotation seems less ambiguous. When I highlight your words and link mine to them, in an overlay on your page, it seems more as if mine are appearing on your site. Is that an unavoidable perception? I don’t know. Maybe there’s a role for a CSS-like mechanism that enables publishers, annotators, and readers to negotiate where and how annotations are displayed. It’s worth considering.

Here’s what I do know: I’m lucky to be involved in a project that raises these issues and challenges us to consider them carefully.

Liminal thinking at scale

My short 2009 review [1] of Stewart Brand’s Whole Earth Discipline includes this Kevin Kelly quote that continues to resonate for me:

Kevin Kelly calls the book “a short course on how to change your mind intelligently” (in this case, about cities, nuclear power, and genetic and planetary engineering). These are all things that Stewart Brand once regarded with suspicion but now sees as crucial tools for a sustainable world.

In Changeable minds I wrote about a touchstone question that I now sometimes ask people:

What’s something you believed deeply, for a long time, and then changed your mind about?

It’s a hard question for any of us to answer, but as Dave Gray and Wael Ghonim have recently reminded me, it matters more and more that we try. Here’s a useful picture I grabbed from Gray’s screencast on what he calls liminal thinking:

The idea is that I’m standing in the bubble on the left, atop an unconscious pyramid of belief formation. You are standing in the bubble on the right, atop your own unconscious pyramid. And our two pyramids rest on different regions of an underlying reality. How can we engage in productive discourse?

Gray says it requires two tricky maneuvers. First I need to shine a light down into the unconscious fog, climb down my own “ladder of inference,” and reflect on how my own experience of reality informs my own beliefs. Then, he says, I need to take that flashlight, walk over to your pyramid, and climb up your ladder of inference. “Liminal thinking,” he tweeted the other day, “is the art of creating change by understanding, shaping, and reframing beliefs.”

I’ll surely read his book when it comes out. But since I already agree with the principles and practices it espouses, I don’t expect a mind-changing outcome. It’s clear that Dave Gray and I stand on mostly-overlapping belief pyramids. What would motivate somebody not in that bubble to want to cross the chasm to a very different pyramid?

Wael Ghonim’s latest TED talk suggests an intriguing possibility. He’s now given two such talks. The first, in 2011, was a stirring tribute to social media’s role in fomenting the Arab Spring. (Ghonim created the pivotal We are all Khaled Said Facebook page.) In 2016 the Arab Spring seems a distant memory, and Ghonim entitled his latest talk Let’s design social media that drives real change. Here’s the key takeaway for me:

There’s a lot of debate today on how to combat online harassment and fight trolls. This is so important. No one could argue against that. But we need to also think about how to design social media experiences that promote civility and reward thoughtfulness. I know for a fact if I write a post that is more sensational, more one-sided, sometimes angry and aggressive, I get to have more people see that post. I will get more attention.

But what if we put more focus on quality? What is more important: the total number of readers of a post you write, or who are the people who have impact that read what you write? Couldn’t we just give people more incentives to engage in conversations, rather than just broadcasting opinions all the time? Or reward people for reading and responding to views that they disagree with? And also, make it socially acceptable that we change our minds, or probably even reward that?

I suspect that relatively few of us already are liminal thinkers, or are willing and able to learn and apply the principles and practices. Can we imagine, and build, a social media platform that encourages liminal thinking at scale? That’s an idea worth sharing.


[1] When I revisited that post today, I was also intrigued by this:

Don’t miss the annotations — a website that reproduces every paragraph that includes citations, links to their sources, and adds updates.

Alas, the link to those annotations — in iCloud, at http://web.me.com/stewartbrand/DISCIPLINE_footnotes/Contents.html — has rotted. For me it’s another reminder to prioritize work on the archival capabilities we envision for Hypothesis. We want to archive both your annotations and (where possible) the documents they refer to.

When it’s cold in New England, thoughts turn to alternative home heating

I started this WordPress incarnation of my blog in late 2007. On this day in 2009 I published one of the most-read posts here: Central heating with a wood gasification boiler. WordPress stats have shown me that interest has been seasonal. When it’s winter in the northeastern US, people still heating with oil imagine alternatives. As a result, more people find their way to that post than do in summer.

Would this year’s freaky warmth depress that historical wintertime interest in the article? That’s what I expected to find, and this chart appears to confirm it.

[Chart: eko-post-dec-views]

Of course interest in the blog has declined in general over that period, because I’ve put less effort into writing and promoting it. But if we chart another perennial favorite, Why Guiness tastes better in Ireland, there’s no downward trend:

[Chart: guiness-post-dec-views]

So I think the temperature correlation is valid. And I predict the orange curve on the first chart will trend upward when there’s another cold winter in the Northeast.