New England still too wet. Escaping to sunny Old England.

We’re just back from a Caribbean vacation — with a couple of interesting souvenirs in tow. Under normal circumstances I’d feel a twinge of regret about turning around a day later and heading out again. But I’m not really in the mood to build an ark, which after 40 days of rain is about to become the new summer sport here in New England. And while the wet isn’t letting up yet here, the weather looks lovely in Old England. So it’s actually a great time to head off to London for a Tuesday visit and talk at Nature Publishing, panels and a talk at the Activate conference on Wednesday, and another talk at the Guardian on Thursday. That one is open to the guests — for the first time, I gather. The writeup also notes:

Many people will then head down to the Rotunda bar for drinks on the canal waterfront after the talk at about 6.

In all these venues I’ll be expanding on the themes I’ve written about here lately: collaborative curation, computational thinking for everyone, community calendars as a motivating case study, and Azure as platform for doing stuff in the cloud.

By the time I get home for July 4, it ought to be dry here. If not, I’ll break out my cubit-calibrated tape measure and get to work on that ark.

It’s the headings, stupid!!!

My recent adventure in naming the times of day was so much fun that I lost track of the original purpose of the exercise, which was to improve accessibility for sight-impaired users.

When I interspersed time-of-day labels into each day’s event listing, I used HTML DIV tags. Wrong, wrong, wrong! Those labels are structural elements, and as my accessibility consultant Susan Gerhart gently reminded me, screen readers depend on HTML headings to find and announce them. The labels should have been second-level headings, i.e., HTML H2 tags.

It gets worse. When Susan prompted me to take another look at what I’d done, I found that the date labels were inexplicably tagged as paragraphs (P) instead of the top-level headers (H1) that they logically are.

Oh. Right. Of course. Duh. Fixed. Sorry.

What was I thinking? How could somebody like me, who has preached about the attention-focusing power of heads, decks, and leads, screw up something so basic as this?

Easily, as it turns out, in the absence of feedback. If you yourself don’t depend on a design feature, there is a natural tendency to forget why it matters to others.

Coincidentally (or not) Susan recently wrote an essay, and published a companion audio recording, that will help me — and I hope others — not to forget again. Entitled Hear Me Stumble Around White House, Recovery, and Data GOV web sites, it’s a blow-by-blow account of her efforts to navigate those sites with a screen reader.

In this recording you can hear Susan and her screen reader trying to make sense of whitehouse.gov. If you’ve never heard a screen reader in action, it’s worth listening for that alone. You’ll get a very clear sense of how these tools depend on the hierarchy of the page.

Simultaneously you’ll hear Susan narrate her intention — to read an article about cybersecurity — and her frustration. For example:

I was thrown off by the slide show at the top of the page. Once I hit the cybersecurity story, the next time I traverse this section the story was about the Supreme Court nominee.

Despite this randomness, the page does at least identify the top stories with H1 tags. And Signed Legislation is an H2. But none of the headlines under Signed Legislation are H3s, they’re Ps.

Over at recovery.gov and data.gov Susan finds no headings at all, and reacts to those omissions less gently than she did to mine:

It’s the headings, stupid!!!

Thanks. I will try not to forget that again.


PS: In a follow-up to her blog essay, Susan links to detailed reports by accessibility pioneer Jim Thatcher on the issues he found with data.gov and recovery.gov.

Endangered languages and linguistic best practices

Daniel Everett’s recent Long Now talk about endangered languages (writeup, mp3) includes this gem reported by Stewart Brand:

Among other things, the wide variety of verb forms are used to account for the directness of evidence for a statement. Everett originally went to the Pirahã in 1977 as a Christian missionary. They challenged him to provide evidence for the existence of Jesus, and lost interest when he couldn’t. Eventually so did he. The Pirahã made him an atheist.

This is so interesting that it’s worth unpacking for those who won’t have time to listen. Among the sixteen suffixes for verbs, there are three that convey the source of evidence:

I heard that Dan went fishing.

I saw Dan go fishing.

I deduce, from the available evidence, that Dan went fishing.

These assertions might not be true. The Pirahã, being human, do sometimes lie. But I love the idea of a culture in which evidence-based thinking is baked into the language.

There are only a few hundred Pirahã, and their language is only one of thousands (more than half of them unwritten) that are endangered. The talk ends with a plea to preserve and document those languages.

It has never been easier to capture and disseminate recorded audio, or to collaboratively curate such material, so I hope these capabilities will be put to good use in the quest to preserve linguistic diversity.

But no matter what, we’re going to continue to lose languages. Maybe, though, if we can identify some of the ways of thinking encoded in those languages, we can carry them forward.

Respect for the source of evidence is a great example. I could have simply told you about what Daniel Everett said, and what Stewart Brand wrote about what Daniel Everett said. But it was possible to form links to the audio and text, so I did.

I wonder how many other best practices are encoded in those thousands of endangered languages. And I wonder if it might be possible to identify and catalog more of them.

When does afternoon begin?

When I invited folks to become calendar curators for the elmcity project, the person who stepped forward in Prescott AZ was Susan Gerhart, whom I interviewed here. One of her great insights about web design is that the right thing for a vision-impaired user is almost always also the right thing for everyone. She calls this the curb cuts principle:

Curb cuts for wheelchairs also guide blind persons into street crossings and prevent accidents for baby strollers, bicyclists, skateboarders, and inattentive walkers.

So I shouldn’t have been surprised when Susan noticed that the HTML rendering of the calendar needed some curb cuts. Within each day, the events show up as a long undifferentiated list. She suggested that subdividing the list by time of day (morning, afternoon, evening) would be helpful to folks using screen readers. But in fact, it’s just plain helpful. So I’m testing a version of that idea now.

Ironically, I was just thinking about this same principle in another context. The new version of Oakland Crimespotting, which I raved about, segments incidents using this vocabulary:

light, dark, commute, nightlife, day, night, swing shift

In that spirit, I’m trying this:

morning, lunch, afternoon, evening, night

This of course leads to the question: When do these times begin and end?

I was fascinated to see that both Google and Bing return the same Yahoo answers page for the query morning afternoon evening.

For now, though, I’m going with this ruleset:

  Morning:  5:00 AM to 11:30 AM
    Lunch: 11:30 AM to  1:30 PM
Afternoon:  1:30 PM to  5:30 PM
  Evening:  5:30 PM to  9:00 PM
    Night:  9:00 PM to  5:00 AM

But I’ll make these rules — and maybe even the time-of-day names — configurable on a per-location basis.
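Here is a minimal sketch, in Python, of how a ruleset like this might be applied to an event's start time. It is not the elmcity code. The names TIME_OF_DAY_RULES and time_of_day are invented for illustration, and the sketch assumes that each slice runs from its start time up to the start of the next one, with night wrapping past midnight.

```python
from datetime import time

# Start times taken from the ruleset above. Each label is assumed to
# apply from its start time up to (not including) the next start time.
TIME_OF_DAY_RULES = [
    (time(5, 0), "morning"),
    (time(11, 30), "lunch"),
    (time(13, 30), "afternoon"),
    (time(17, 30), "evening"),
    (time(21, 0), "night"),
]

def time_of_day(t, rules=TIME_OF_DAY_RULES):
    """Return the time-of-day label for a datetime.time value."""
    # Times earlier than the first boundary fall into the last slice,
    # which wraps around midnight.
    label = rules[-1][1]
    for start, name in rules:
        if t >= start:
            label = name
    return label
```

Because the rules are passed in as an argument, a per-location configuration could supply its own list of start times and names without touching the function.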

Bulk search-and-replace for blog entries

Last night I realized there was one more step needed to restore my 2002-2006 archive. All of my references into that archive from this blog, which started in December 2006, had to be redirected. What’s more, they had to be remapped. Old URLs like http://weblog.infoworld.com/udell/2006/12/04.html#a1571 had to become new URLs like http://jonudell.net/udell/2006-12-04-hunting-the-elusive-search-strategy.html.

Even without the remapping, it’s not obvious how to do a simple search and replace (say, from weblog.infoworld.com/udell to jonudell.net/udell) across a set of blog entries. I tried the export/edit/import route, but — at least in the case of WordPress — that doesn’t seem to be a way to update existing stuff.

So I wound up writing a script that uses the MetaWeblog API to fetch my current blog entries, find references to the old namespace, adjust them to point to the new namespace, and update the entries. It’s here for my own future reference, and for yours if you need it.
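For a rough idea of the shape of such a script, here is a sketch in Python. It is not the script linked above. The REMAP table is a hypothetical lookup with a single entry taken from the example URLs in this post, and update_posts assumes a server object that exposes the standard MetaWeblog calls getRecentPosts and editPost (for example an xmlrpc.client.ServerProxy pointed at the blog's XML-RPC endpoint).

```python
import re

OLD_PREFIX = "http://weblog.infoworld.com/udell/"
NEW_PREFIX = "http://jonudell.net/udell/"

# Hypothetical lookup table: old path -> new descriptive page name.
# A real run would need one entry per archived item.
REMAP = {
    "2006/12/04.html#a1571":
        "2006-12-04-hunting-the-elusive-search-strategy.html",
}

# An old-namespace URL runs until whitespace, a quote, or an angle bracket.
OLD_URL = re.compile(re.escape(OLD_PREFIX) + r"([^\s\"'<>]+)")

def remap_links(html, remap=REMAP):
    """Return html with known old-namespace links rewritten."""
    def substitute(match):
        new_name = remap.get(match.group(1))
        # Leave unknown links alone rather than guess at a new name.
        return NEW_PREFIX + new_name if new_name else match.group(0)
    return OLD_URL.sub(substitute, html)

def update_posts(server, blog_id, user, password, how_many=50):
    """Fetch recent posts over MetaWeblog and save the ones that changed."""
    posts = server.metaWeblog.getRecentPosts(blog_id, user, password, how_many)
    changed = []
    for post in posts:
        fixed = remap_links(post["description"])
        if fixed != post["description"]:
            post["description"] = fixed
            # The final True asks the server to publish the edited post.
            server.metaWeblog.editPost(post["postid"], user, password, post, True)
            changed.append(post["postid"])
    return changed
```

Sending the whole post structure back is a simplification. How a particular blog server treats the other fields on edit is worth checking on a test post before running anything like this in bulk.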

As always in these situations, I end up wondering what a civilian would do. Blog publishing systems don’t seem to have bulk search-and-replace capability. They do, however, have APIs. There could be a tool or service that helps people make these kinds of changes. It’d be hard to avoid the password anti-pattern, so if this were a cloud-based service rather than a locally-installed tool you’d want to change your password after using it. But still, it should be doable.

Do such tools or services exist?

Rebooting my 2002-2006 archive

While spot-checking my mostly-reconstructed 2002-2006 blog, I found this plaint from 2002:

When you are a writer whose entire corpus exists online, woven into a fabric of citation and commentary, it is incredibly painful to see that fabric torn apart.

Déjà vu all over again. In 2002 I had to sacrifice the linkage to my 1999-2002 BYTE.com archive and restore it here. Now I’ve done the same for my 2002-2006 InfoWorld blog. Since its former namespace isn’t being redirected, and since all the old links were broken anyway, I’ve taken this opportunity to create new descriptive names that incorporate dates and titles.

The reboot isn’t 100% clean, but it’s automated and reproducible so I can address categories of problems as they show up.

I’m glad I’m not in publishing anymore. It turns out to be a lousy way to keep your stuff published. When a commercial hosted lifebits service comes online, I’ll be customer #1.

Scribbling in the margins of iCalendar

Last week I mentioned three ways for elmcity curators to categorize events:

  1. If a source iCalendar feed uses the CATEGORIES property, those categories will be included.

  2. If all of the events from a feed can be categorized, you can name that category in the Delicious metadata, using category=CATEGORY. All events from the feed will inherit it in the same way that they all inherit the default clickthrough link specified with url=URL.

  3. If all of the events from an Upcoming or Eventful venue can be categorized, you can also name that category in the Delicious metadata. To do that, bookmark the venue URL and use the patterns venue={UPCOMING|EVENTFUL} and category=CATEGORY.

Now I’ve added a fourth. In any iCalendar app you can now use these patterns in the Description field:

url=http://www.harlowspub.com

category=music,bluegrass

The url=… and category=… patterns can occur anywhere in the description.
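To make the convention concrete, here is a small Python sketch of how the two patterns might be pulled out of a description. The function name and the regular expressions are assumptions for illustration, not the elmcity implementation. In particular, the sketch assumes that a value ends at the first whitespace character and that multiple categories are separated by commas.

```python
import re

# A value is assumed to run up to the next whitespace character.
URL_PATTERN = re.compile(r"\burl=(\S+)")
CATEGORY_PATTERN = re.compile(r"\bcategory=(\S+)")

def parse_description(description):
    """Extract the optional url= and category= values from a description."""
    url_match = URL_PATTERN.search(description)
    category_match = CATEGORY_PATTERN.search(description)
    categories = category_match.group(1).split(",") if category_match else []
    return {
        "url": url_match.group(1) if url_match else None,
        # Drop empty strings left by stray commas.
        "categories": [c for c in categories if c],
    }
```

Called on a description that ends with "url=http://www.harlowspub.com category=music,bluegrass", this returns the URL and a two-item category list. A description with neither pattern yields None and an empty list.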

This is particularly useful for recurring events. As discussed here, recurring events are a great way to build critical mass because your curation effort keeps paying dividends.

For example, one of the events I found when exploring the search page for Keene is the Monday night bluegrass jam at Harlow’s Pub.

Here’s the description I entered into Windows Live Calendar — which also could have been entered into Google Calendar, or any other iCalendar app:

The Birch Benders host a Bluegrass picking party at Harlow’s Pub in Peterborough every Monday night – 8 pm until they kick us out (11 or so). url=http://www.harlowspub.com category=music

Here’s the rendered result:

Mon 08:00 PM Bluegrass night with the Birchbenders (recurring events) (music)

The same data shows up in the downstream XML, ICS, and JSON feeds.

Since the iCalendar spec allows for a CATEGORIES property, this approach shouldn’t be necessary. But not all calendar apps allow you to tag events in this way. Outlook does, but Google Calendar, Live Calendar, and Apple iCal don’t.

Fortunately we can scribble in the margins. I first used that phrase in an InfoWorld story about a feature of the Internet’s Domain Name System called the TXT record. Although it is possible to define more specific record types, it’s hard to get everyone to agree to use them. So developers have historically “scribbled in the margins” of the DNS. And we can do the same with iCalendar.


PS: The title of that InfoWorld story was actually Filling in the Margins, which wasn’t what I wrote and which I never liked. The title I wrote was Scribbling in the Margins, and I used it for the blog entry that introduced the InfoWorld article. I’ll have that entry back online soon, along with the rest of my archive from that era. But meanwhile, when I search for the title using doublesearch, I notice an interesting point of comparison between Google and Bing. It’s been over a month since that blog archive went dark, and Google has now evidently forgotten about it. But Bing remembers. I don’t have any special insight into how Bing works, but I’ll be interested to see how long it keeps remembering.

Replaying history

In his writeup on Google Wave, Dare Obasanjo says:

I’m sure there are thousands of Web developers out there right now asking themselves “would my app be better if users could see each others’ edits in real time?”, “should we add a playback feature to our service as well” [ed note – wikipedia could really use this] and “why don’t we support seamless drag and drop in our application?”. All inspired by their exposure to Google Wave.

Indeed, every application that preserves a change history needs playback. Wikipedia, as Dare notes, is a prime candidate. Back in 2006, I made this LazyWeb request:

Animation is the best way to visualize the flow of change, as I discovered when I made my Wikipedia screencast. For Wikipedia, and indeed for all kinds of living documents supported by revision history and diff tools, I can imagine being able to isolate a paragraph or section and autogenerate the screencast of its evolution. I can even imagine the content of such visualizations being considered not just cutting-room floor debris but, rather, part of the “real” document, like footnotes.

Andy Baio responded by sponsoring a contest for a tool that would do just that. And I made a screencast demonstrating Dan Phiffer’s winning entry.

That script is unavailable at the moment because, ironically, Dan’s server reports:

Oh noes! I got HACK*D. I’m sifting through my files and should restore things back to normal soon.

In any case, it probably wasn’t practical for routine use. Fetching every revision on the fly really hammers Wikipedia. What’s really needed — again, not just for Wikipedia but everywhere — is a general way to query change history, and return a stream of versions and differences.

One way of doing the latter would be to use FeedSync, an open extension to RSS/Atom that supports synchronization in Live Mesh. Another would be to use Google’s Wave protocol. Because FeedSync deals with lists of items, which can be arbitrary chunks of content, whereas Wave deals with lists of document-mutation operations, like delete-element and start-annotation, it seems to me that FeedSync is more general, albeit less immediately useful for collaborative editing.
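As a toy illustration of what "a stream of versions and differences" could look like, here is a Python sketch. It uses neither FeedSync nor the Wave protocol. It simply walks a list of full-text snapshots held in memory and uses the standard library's difflib to compute each step's diff. The function name change_stream and the v0, v1, ... labels are made up for the example.

```python
import difflib

def change_stream(revisions):
    """Yield (version_number, text, diff_lines) for each revision in order.

    `revisions` is any iterable of full-text snapshots, oldest first.
    The first diff is computed against an empty document.
    """
    previous = ""
    for number, text in enumerate(revisions, start=1):
        diff = list(difflib.unified_diff(
            previous.splitlines(),
            text.splitlines(),
            fromfile=f"v{number - 1}",
            tofile=f"v{number}",
            lineterm="",
        ))
        yield number, text, diff
        previous = text
```

A playback tool could consume this stream one step at a time, which avoids fetching and diffing every revision up front. Where the snapshots come from (a wiki's revision history, a settings file under version control) is left open here.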

To explain why generality matters, consider change animation in a very different domain: software configuration. My wife, for example, sometimes changes her settings — in Word or Firefox — in ways that cause problems. If these apps persisted their settings to Live Mesh, as they could and arguably should, I’d be able to debug a mishap locally or remotely. But ideally, the change visualization would be sufficiently user-friendly so that she’d have a shot at figuring it out for herself.


PS: Speaking of history and restoration, I’ve been feeling like an amnesiac ever since my InfoWorld archive went dark. So in spare moments I’ve been reconstructing and republishing it. I’ll have the text of all the old blog entries up soon. And I’ve been restoring the screencasts as well. I’m keeping track of my progress at delicious.com/judell/screencast+restored.

More usefully cool stuff from Stamen

My plumber’s last name is Thieme. I was just looking up his phone number, and got distracted when I realized that the people search in Bing does a fair job of visualizing the geographic distribution of surnames. If you do a people search for Thieme, New Hampshire, and start panning around at county and state resolutions, you can see where Thiemes have clustered and where they haven’t.

As I was doing this, I suddenly realized: Why don’t maps offer named zoom levels? If you want to pan across the country at state or county resolution, it requires an enormous amount of continuous zooming in and out. Of course the sizes of states and counties vary as you move across the country. But that’s the whole point. Computers can do the math and automate those adjustments.

What prompted this thought was the newly-redesigned Oakland Crimespotting, which features a nifty new widget for selecting times of day. Stamen Design’s Eric Rodenbeck, whom I recently interviewed, calls it the time pie. It’s fun to spin your way through the hours, making contiguous or discontiguous selections. But what’s really useful are the named slices: light, dark, commute, nightlife, day, night, swing shift. As Stamen’s blog notes:

The last time slices (day, night and swing) are the ways that the police view this information, and one thing we hope will come from the project is a better understanding of how the police view their data as it’s collected.

Nice!

What you may not notice, as you navigate the new interface, is that every adjustment is reflected in an exquisitely detailed URL. It’s not obvious because the URLs are really long, and the changes happen outside the visible part of the browser’s location window. But watch:

Default: http://oakland.crimespotting.org/map/#dtend=2009-06-04T20:35:28-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-28T20:35:28-07:00

Hide all crime types: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-28T23:59:59-07:00

Show all and extend dates to max range: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Narcotics only: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Nighttime narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Wee hours narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=1-4&zoom=14&dtstart=2009-05-08T00:00:00-07:00
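Since the whole state of the map lives in the part of the URL after the #, it can be read back with ordinary URL tools. Here is a short Python sketch using the standard library's urllib.parse. The function name parse_map_state is made up, and the sketch assumes only what is visible in the URLs above: ampersand-separated key=value pairs, with types given as a comma-separated list that may be empty.

```python
from urllib.parse import urlsplit, parse_qs

def parse_map_state(url):
    """Return the fragment of a Crimespotting-style URL as a dict."""
    fragment = urlsplit(url).fragment
    # keep_blank_values preserves "types=" (the hide-everything case).
    params = parse_qs(fragment, keep_blank_values=True)
    # Each key appears once in these URLs, so keep the first value.
    state = {key: values[0] for key, values in params.items()}
    # Split the crime types into a list; an empty value gives [].
    state["types"] = [t for t in state.get("types", "").split(",") if t]
    return state
```

Comparing the dictionaries for two of the URLs above shows exactly which adjustment was made, for example hours going from 16-23 to 1-4 between the last two.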

As noted on the Stamen blog, this means that:

It’s now possible to navigate and link to recent newsworthy events like the assassination of journalist Chauncey Bailey, the Oscar Grant riots from January 2009, and the Lovelle Mixon incident from this past March.

The Stamen crew is renowned for brilliance, and rightly so. But the principles at work here — thoughtful naming, granular linking — are ones that we all can and should practice, in the many small ways that we can as we explore and co-create the infosphere.

Categorizing events

Curation is always a two-step tango. First you collect, then you categorize. Until now, the elmcity project has been all about collecting. But as the nodes of this network of community hubs start to light up, and as curators gather growing numbers of calendar feeds, it’s time to start enabling them to categorize as well.

This is a classic hard problem. How do you get people to tag hundreds or thousands of items? What makes the problem even harder, in the domain of events, is that once those items fade into the past, any effort invested in tagging them is lost.

My answer is, at least for now: Don’t worry too much about tagging individual events. Instead, gain leverage by finding ways to tag sources of events. Here are two good strategies:

1. Categorizing iCalendar feeds

The obvious place to start is with the iCalendar feeds that curators are collecting. There’s already a mechanism in place to capture metadata about those feeds. Here, for example, is the iCalendar feed for the 2009 Board of Supervisors meetings in Prescott, AZ:

http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c

That’s an iCalendar feed that was made from this web page:

http://www.co.yavapai.az.us/Events.aspx?id=32794

If you check the Delicious metadata for Prescott’s iCalendar feeds, you’ll see this structure:

title: Board of Supervisors
  url: http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c
  tag: trusted
  tag: ics
  tag: feed
  tag: url=http://www.co.yavapai.az.us/Meetings.aspx/folderid=1488&year=2009
  tag: category=government

The url= tag was already there. It provides the all-important link back to a human-readable authoritative source for events coming from this feed. It’s best if individual events provide their own links, but often in iCalendar feeds they don’t, so this is the default link.

What’s new is the category= tag. Now all events coming from this feed will carry that category. For example:

Mon Jun 15 2009


Regular Meeting – Cottonwood N/A
(Board of Supervisors)
(government)

The same info travels downstream, to the aggregated Prescott iCalendar feed:

BEGIN:VEVENT
CATEGORIES:government
DESCRIPTION:Regular Meeting - Cottonwood N/A \n\n****************
nfrom  FuseCal.com\n ******************************\n\n
DTSTART;VALUE=DATE:20090615
LOCATION: (see http://www.co.yavapai.az.us/Events.aspx?id=32794)
SEQUENCE:0
SUMMARY:Regular Meeting - Cottonwood N/A         
UID:633797255542010000-1196352865@elmcity.cloudapp.net
URL:http://www.co.yavapai.az.us/Events.aspx?id=32794
END:VEVENT

And to the aggregated XML feed:

<event>
<title>Regular Meeting - Cottonwood N/A</title>
<url>http://www.co.yavapai.az.us/Events.aspx?id=32794</url>
<source>Board of Supervisors</source>
<dtstart>2009-06-15T00:00:00</dtstart>
<categories>government</categories>
</event>

This strategy only works, of course, for feeds that can be categorized. And that won’t always be true. Events coming from the ReadItNews feed don’t fit into any single category (or short list of categories). So they’ll remain untagged for now. That’s OK. Better to make some progress than to make none. This partial approach yields a nice return on investment. And thanks to the bulk editing feature of Delicious, it’s really quick and easy to select a set of feeds and then tag them with a category= tag.
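The tag convention described in this section is simple enough to sketch in a few lines of Python. This is an illustration, not the elmcity code. The function name parse_feed_tags is invented, and the set of recognized keys (url, category, venue) is just the set of key=value tags that appear in this post.

```python
# Keys that carry metadata when written as key=value in a Delicious tag.
METADATA_KEYS = ("url", "category", "venue")

def parse_feed_tags(tags):
    """Split Delicious tags into plain tags and key=value metadata."""
    plain, metadata = [], {}
    for tag in tags:
        # Split on the first "=" only, so URLs that contain "=" survive.
        key, separator, value = tag.partition("=")
        if separator and key in METADATA_KEYS:
            metadata[key] = value
        else:
            plain.append(tag)
    return plain, metadata
```

Given the Board of Supervisors tags shown earlier, this returns the plain tags trusted, ics, and feed, plus a metadata dictionary holding the default url and the category government.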

2. Categorizing Eventful and Upcoming venues

We can use a variation of this strategy to categorize sources of events coming from Eventful and Upcoming. In this case, the lever is the venue. Not all venues host events that can be categorized. But some do, and in those cases, why not exploit that?

The strategy here is to bookmark and tag the event’s venue URL from Upcoming or Eventful. Here are two examples:

Upcoming

title: Venue: Prescott YMCA - Upcoming
  url: http://upcoming.yahoo.com/venue/435420
  tag: venue=upcoming
  tag: category=recreation

Eventful

title: Venue: Raven Cafe
  url: http://eventful.com/prescott/venues/raven-cafe-/V0-001-000366078-7
  tag: venue=eventful
  tag: category=music

If you check the default HTML view of Prescott’s aggregated events, you’ll see that these categories indeed show up. They’re also in the downstream XML, ICS, and JSON feeds.

But can’t the source iCalendar feeds provide per-event categories?

Yes, some do. In the case of Prescott, the public library’s iCalendar feed uses the CATEGORIES property, so those categories show up too. For example:

Thu 02:00 PM
Sign up for Computer Mentor
(Prescott Library)
(Adult Computer Class,library)

Here we see a list of two categories. The first item, Adult Computer Class, was in the original iCalendar feed. The second item, library, was inherited from the feed metadata specified by the curator.
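The combination of per-event and inherited categories could be sketched like this in Python. The function name merge_categories is hypothetical, and skipping a feed-level category that the event already carries is an assumption of this sketch rather than documented elmcity behavior.

```python
def merge_categories(event_categories, feed_category=None):
    """Combine an event's own categories with the feed-level category."""
    # The event's own categories come first, in their original order.
    merged = list(event_categories)
    # Append the inherited category unless the event already has it.
    if feed_category and feed_category not in merged:
        merged.append(feed_category)
    return merged
```

For the library event above, merging the event's own Adult Computer Class with the inherited library and joining the result with commas reproduces the rendered list. An event with no categories of its own simply ends up with the inherited one.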

There’s a long way to go with this stuff. But this is a nice start!

Talking with Jamie Heywood about PatientsLikeMe

Jamie Heywood joined me for this week’s Innovators show. His quest to cure ALS (Amyotrophic Lateral Sclerosis, aka Lou Gehrig’s Disease) is featured in a book and a movie. In this conversation, we explore Jamie’s current project: PatientsLikeMe. It’s a website where people pool data about their medical conditions, their drug regimes and related therapies, and their outcomes.

Of course people have been sharing medical information online since it became possible to do so. But PatientsLikeMe differs from other online health communities in several ways. The profile of a user is someone who is grappling with a serious, life-changing illness where:

  • You are very debilitated, perhaps even unable to go to work.

  • You can tell if your treatment is helping. (If you have Parkinson’s disease or depression, for example, you can judge what works or doesn’t. If you have breast cancer, you can’t.)

  • You are in a situation where both diagnosis and treatment are ambiguous.

The data that you report brings you into direct contact with other patients who share similar conditions and treatments. In this sense, PatientsLikeMe is a uniquely data-driven social network:

It is the richest open quantified human-to-human network that exists. There are a couple of hundred measured channels on which you can evaluate yourself against everyone else that you might be interested in connecting to. And you can go across any of those channels to anyone else in the world.

The data you report also brings you into direct contact with drug companies:

It connects you with the people who are developing the drugs to treat your disease. This cuts out an immense amount of inefficiency and middlemen, and can potentially make the system much better. It’s a way of rationalizing and accelerating discovery.

For that reason, Jamie sees no need to apologize for PatientsLikeMe’s business model, which is to sell the data it collects to drug companies. This arrangement may even, arguably, be a form of citizen science:

Do I think that we’ll be using crowdsourcing to interpret the RNA signature in blood? No. But in the real world, when you ask what it means to have ALS, each patient in the system is a representative of their own specific phenotype of this illness. Which is a way of putting it into the process of discovery. Because if you’re not in there — if you’re different, and everyone is unique in some way — the specific components of your own health and its impacts on your life will not be addressed in the process of treatment.

What about privacy? Jamie admits, honestly, that there can be no guarantees, and does not think people who expect guarantees should use PatientsLikeMe. It isn’t for everyone. But there are a number of folks who, after evaluating the risk of participating (pseudonymously) in the service, conclude that the benefit outweighs that risk. They are part of a collective experiment that I will be watching with the greatest interest.