Towards accessible annotation: a prototype and some questions

The most basic operation in — select text on a page, click the Annotate button — is not yet accessible to a visually-impaired person who is using a screenreader. I’ve done a bit of research and come up with an approach that looks like it could work, but also raises many questions. In the spirit of keystroke conservation I want to record here what I think I know, and try to find out what I don’t.

Here’s a screencast of an initial prototype that shows, with the NVDA screen reader active on my system, the following sequence of events:

  • Load the Gettysburg address.
  • Use a key to move a selection from paragraph to paragraph.
  • Hear the selected paragraph.
  • Tab to the Annotate button and hit Enter to annotate the selected paragraph.

It’s a start. Now for some questions:

1. Is this a proper use of the aria-live attribute?

The screenreader can do all sorts of fancy navigation, like skip to the next word, sentence, or paragraph. But its notion of a selection exists within a copy of the document and (so far as I can tell) is not connected to the browsers’s copy. So the prototype uses a mechanism called ARIA Live Regions.

When you use the hotkey to advance to a paragraph and select it, a JavaScript method sets the aria-live attribute on that paragraph. That alone isn’t enough to make the screenreader announce the paragraph, it just tells it to watch the element and read it aloud if it changes. To effect a change, the JS method prepends selected: to the paragraph. Then the screenreader speaks it.

2. Can JavaScript in the browser relate the screenreader’s virtual buffer to the browser’s Document Object Model?

I suspect the answer is no, but I’d love to be proven wrong. If JS in the browser can know what the screenreader knows, the accessibility story would be much better.

3. Is this a proper use of role="link"?

The first iteration of this prototype used a document that mixed paragraphs and lists. Both were selected by the hotkey, but only the list items were read aloud by the screen reader. Then I realized that’s because list items are among the set of things — links, buttons, input boxes, checkboxes, menus — that are primary navigational elements from the screenreader’s perspective. So the version shown in the screencase adds role="link" to the visited-and-selected paragraph. That smells wrong, but what’s right?

4. Is there a polyfill for Selection.modify()?

Navigating by element — paragraph, list item, etc. — is a start. But you want to be able to select the next word (or previous) word or sentence or paragraph or table cell. And you want to be able to extend a selection to include the next word or sentence or paragraph or table cell.

A non-standard technology, Selection.modify(), is headed in that direction, and works today in Firefox and Chrome. But it’s not on a standards track. So is there a library that provides that capability in a cross-browser fashion?

It’s a hard problem. A selection within a paragraph that appears to grab a string of characters is, under the covers, quite likely to cross what are called node boundaries. Here, from an answer on StackOverflow, is a picture of what’s going on:

When a selection includes a superscript3 as shown here, it’s obvious to you what the text of the selection should be: 123456790. But that sequence of characters isn’t readily available to a JavaScript program looking at the page. It has to traverse a sequence of nodes in the browser’s Document Object Model in order to extract a linear stream of text.

It’s doable, and in fact does just that when you make a selection-based annotation. That gets harder, though, when you want to move or extend that selection by words and paragraphs. So is there a polyfill for Selection.modify()? The closest I’ve found is rangy, are there others?

5. What about key bindings?

The screen reader reserves lots of keystrokes for its own use. If it’s not going to be possible to access its internal representation of the document, how will there be enough keys left over for rich navigation and selection in the browser?


  1. I don’t understand most of this but was wondering how much of a role audio can play in this? Can the software await particular audio input from the user to annotate? How do they choose a particular section to highlight (even using keystrokes). And a silly question – you can’t have an “override” function on a keystroke that allows the reader to then use all the keys for a different function? (sorry if these are all silly suggestions. Just thinking aloud)

  2. “Can the software await particular audio input from the user to annotate?” Once there’s a key-driven solution, voice input can serve as an alternative way to press the keys. The mapping from voice to commands could be direct, of course, but for commands — unlike for dictation — I would guess that a common key-driven interface makes sense.

    “You can’t have an “override” function on a keystroke” Not a silly question at all, I feel like that must exist and I haven’t found it yet.

    If you haven’t done this experiment yet, try navigating your computer with a screen reader. It’s an awesome and humbling experience. I once interviewed Susan Gerhart about that — — and the episode is queued up for my hike today as a reminder to me.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s