We’re just back from a Caribbean vacation — with a couple of interesting souvenirs in tow. Under normal circumstances I’d feel a twinge of regret about turning around a day later and heading out again. But I’m not really in the mood to build an ark, which after 40 days of rain is about to become the new summer sport here in New England. And while the wet isn’t letting up yet here, the weather looks lovely in Old England. So it’s actually a great time to head off to London for a Tuesday visit and talk at Nature Publishing, panels and a talk at the Activate conference on Wednesday, and another talk at the Guardian on Thursday. That one is open to the guests — for the first time, I gather. The writeup also notes:

Many people will then head down to the Rotunda bar for drinks on the canal waterfront after the talk at about 6.

In all these venues I’ll be expanding on the themes I’ve written about here lately: collaborative curation, computational thinking for everyone, community calendars as a motivating case study, and Azure as a platform for doing stuff in the cloud.

By the time I get home for July 4, it ought to be dry here. If not, I’ll break out my cubit-calibrated tape measure and get to work on that ark.

My recent adventure in naming the times of day was so much fun that I lost track of the original purpose of the exercise, which was to improve accessibility for sight-impaired users.

When I interspersed time-of-day labels into each day’s event listing, I used HTML DIV tags. Wrong, wrong, wrong! Those labels are structural elements, and as my accessibility consultant Susan Gerhart gently reminded me, screen readers depend on HTML headings to find and announce them. The labels should have been second-level headings, i.e., HTML H2 tags.

It gets worse. When Susan prompted me to take another look at what I’d done, I found that the date labels were inexplicably tagged as paragraphs (P) instead of the top-level headers (H1) that they logically are.

Oh. Right. Of course. Duh. Fixed. Sorry.
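The fix can be sketched in a few lines of Python. This is only an illustration, not the code behind the calendar service: the function name render_day and the sample events are assumptions. It shows the date as an H1, each time-of-day label as an H2, and the events as a list under each heading.

```python
from html import escape

def render_day(date_label, events_by_period):
    """Render one day's events with real headings instead of DIVs.

    events_by_period is a list of (period_label, [event titles]) pairs.
    """
    parts = [f"<h1>{escape(date_label)}</h1>"]
    for period, titles in events_by_period:
        if not titles:
            continue  # skip empty periods so no heading announces an empty section
        parts.append(f"<h2>{escape(period)}</h2>")
        parts.append("<ul>")
        parts.extend(f"<li>{escape(title)}</li>" for title in titles)
        parts.append("</ul>")
    return "\n".join(parts)

# Hypothetical sample data, for illustration only.
page = render_day("Mon Jun 15 2009", [
    ("Morning", ["Regular Meeting - Cottonwood N/A"]),
    ("Lunch", []),
    ("Evening", ["Chess club"]),
])
print(page)
```

A screen reader's "next heading" command can then jump from day to day, or from morning to evening within a day, without reading every event in between.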

What was I thinking? How could somebody like me, who has preached about the attention-focusing power of heads, decks, and leads, screw up something so basic as this?

Easily, as it turns out, in the absence of feedback. If you yourself don’t depend on a design feature, there is a natural tendency to forget why it matters to others.

Coincidentally (or not) Susan recently wrote an essay, and published a companion audio recording, that will help me — and I hope others — not to forget again. Entitled Hear Me Stumble Around White House, Recovery, and Data GOV web sites, it’s a blow-by-blow account of her efforts to navigate those sites with a screen reader.

In this recording you can hear Susan and her screen reader trying to make sense of whitehouse.gov. If you’ve never heard a screen reader in action, it’s worth listening for that alone. You’ll get a very clear sense of how these tools depend on the hierarchy of the page.

Simultaneously you’ll hear Susan narrate her intention — to read an article about cybersecurity — and her frustration. For example:

I was thrown off by the slide show at the top of the page. Once I hit the cybersecurity story, the next time I traverse this section the story was about the Supreme Court nominee.

Despite this randomness, the page does at least identify the top stories with H1 tags. And Signed Legislation is an H2. But none of the headlines under Signed Legislation are H3s, they’re Ps.
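One way to get this kind of feedback without a screen reader is to print a page's heading outline. Here is a rough sketch using Python's standard html.parser module. The class and function names are made up for illustration, and the sample markup is a hypothetical stand-in that mimics the pattern described above, not a copy of whitehouse.gov.

```python
from html.parser import HTMLParser

class HeadingOutline(HTMLParser):
    """Collect (level, text) for every h1-h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None

def heading_outline(html):
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline

# Hypothetical markup: two real headings, then headlines tagged as paragraphs.
sample = """
<h1>Top story</h1>
<h2>Signed Legislation</h2>
<p>A bill title marked up as a paragraph</p>
<p>Another bill title marked up as a paragraph</p>
"""
outline = heading_outline(sample)
for level, text in outline:
    print("  " * (level - 1) + f"h{level}: {text}")
```

The paragraphs never show up in the outline, which is roughly what a heading-by-heading traversal in a screen reader would experience.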

Over at recovery.gov and data.gov Susan finds no headings at all, and reacts to their omission less gently than she did to mine:

It’s the headings, stupid!!!

Thanks. I will try not to forget that again.


PS: In a follow-up to her blog essay, Susan links to detailed reports by accessibility pioneer Jim Thatcher on the issues he found with data.gov and recovery.gov.

Daniel Everett’s recent Long Now talk about endangered languages (writeup, mp3) includes this gem reported by Stewart Brand:

Among other things, the wide variety of verb forms are used to account for the directness of evidence for a statement. Everett originally went to the Pirahã in 1977 as a Christian missionary. They challenged him to provide evidence for the existence of Jesus, and lost interest when he couldn’t. Eventually so did he. The Pirahã made him an atheist.

This is so interesting that it’s worth unpacking for those who won’t have time to listen. Among the sixteen suffixes for verbs, there are three that convey the source of evidence:

I heard that Dan went fishing.

I saw Dan go fishing.

I deduce, from the available evidence, that Dan went fishing.

These assertions might not be true. The Pirahã, being human, do sometimes lie. But I love the idea of a culture in which evidence-based thinking is baked into the language.

There are only a few hundred Pirahã, and their language is only one of thousands (more than half of them unwritten) that are endangered. The talk ends with a plea to preserve and document those languages.

It has never been easier to capture and disseminate recorded audio, or to collaboratively curate such material, so I hope these capabilities will be put to good use in the quest to preserve linguistic diversity.

But no matter what, we’re going to continue to lose languages. Maybe, though, if we can identify some of the ways of thinking encoded in those languages, we can carry them forward.

Respect for the source of evidence is a great example. I could have simply told you about what Daniel Everett said, and what Stewart Brand wrote about what Daniel Everett said. But it was possible to form links to the audio and text, so I did.

I wonder how many other best practices are encoded in those thousands of endangered languages. And I wonder if it might be possible to identify and catalog more of them.

When I invited folks to become calendar curators for the elmcity project, the person who stepped forward in Prescott AZ was Susan Gerhart, whom I interviewed here. One of her great insights about web design is that the right thing for a vision-impaired user is almost always also the right thing for everyone. She calls this the curb cuts principle:

Curb cuts for wheelchairs also guide blind persons into street crossings and prevent accidents for baby strollers, bicyclists, skateboarders, and inattentive walkers.

So I shouldn’t have been surprised when Susan noticed that the HTML rendering of the calendar needed some curb cuts. Within each day, the events show up as a long undifferentiated list. She suggested that subdividing the list by time of day (morning, afternoon, evening) will be helpful to folks using screen readers. But in fact, it’s just plain helpful. So I’m testing a version of that idea now.

Ironically, I was just thinking about this same principle in another context. The new version of Oakland Crimespotting, which I raved about, segments incidents using this vocabulary:

light, dark, commute, nightlife, day, night, swing shift

In that spirit, I’m trying this:

morning, lunch, afternoon, evening, night

This of course leads to the question: When do these times begin and end?

I was fascinated to see that both Google and Bing return the same Yahoo answers page for the query morning afternoon evening.

For now, though, I’m going with this ruleset:

  Morning:  5:00 AM to 11:30 AM
    Lunch: 11:30 AM to  1:00 PM
Afternoon:  1:30 PM to  5:30 PM
  Evening:  5:30 PM to  9:00 PM
    Night:  9:00 PM to  5:00 AM

But I’ll make these rules — and maybe even the time-of-day names — configurable on a per-location basis.
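A ruleset like this is easy to keep configurable if it is stored as data. The sketch below is one possible shape in Python, not the service's actual code. The names DEFAULT_RULES and time_of_day are invented for illustration. It stores only each period's start time, so the listed half hour between 1:00 PM and 1:30 PM has to land somewhere; the assumption here is that Lunch runs until Afternoon begins.

```python
from bisect import bisect_right
from datetime import time

# Start time of each period, sorted. Each period runs until the next one
# begins, and Night wraps past midnight until Morning starts at 5:00 AM.
# Assumption: the listed rules leave 1:00-1:30 PM unassigned, so this
# sketch lets Lunch continue until Afternoon starts at 1:30 PM.
DEFAULT_RULES = [
    (time(5, 0), "Morning"),
    (time(11, 30), "Lunch"),
    (time(13, 30), "Afternoon"),
    (time(17, 30), "Evening"),
    (time(21, 0), "Night"),
]

def time_of_day(t, rules=DEFAULT_RULES):
    """Return the label of the period containing time t."""
    starts = [start for start, _ in rules]
    index = bisect_right(starts, t) - 1
    # index == -1 means t falls before the first start, i.e. it still
    # belongs to last night; negative indexing then picks the final rule.
    return rules[index][1]

print(time_of_day(time(20, 0)))   # Evening
print(time_of_day(time(2, 15)))   # Night
```

A per-location variant would just pass a different sorted list of (start, name) pairs as the rules argument, which also covers renaming the periods.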

Last night I realized there was one more step needed to restore my 2002-2006 archive. All of my references into that archive from this blog, which started in December 2006, had to be redirected. What’s more, they had to be remapped. Old URLs like http://weblog.infoworld.com/udell/2006/12/04.html#a1571 had to become new URLs like http://jonudell.net/udell/2006-12-04-hunting-the-elusive-search-strategy.html.

Even without the remapping, it’s not obvious how to do a simple search and replace (say, from weblog.infoworld.com/udell to jonudell.net/udell) across a set of blog entries. I tried the export/edit/import route, but — at least in the case of WordPress — that doesn’t seem to be a way to update existing stuff.

So I wound up writing a script that uses the MetaWeblog API to fetch my current blog entries, find references to the old namespace, adjust them to point to the new namespace, and update the entries. It’s here for my own future reference, and for yours if you need it.
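For readers who want the general shape without following the link, here is a minimal Python sketch of the same idea. It is not the script mentioned above. The function names, the endpoint and credential parameters, and especially the lookup table from old item ids to new title slugs are assumptions; the only API calls used are the standard metaWeblog.getRecentPosts and metaWeblog.editPost methods.

```python
import re
import xmlrpc.client

OLD_LINK = re.compile(
    r"http://weblog\.infoworld\.com/udell/(\d{4})/(\d{2})/(\d{2})\.html#a(\d+)"
)

def remap_links(text, slugs):
    """Rewrite old-namespace links found in text.

    slugs maps an old item id (the digits after '#a') to the title slug
    used in the new URL. Links with no known slug are left untouched.
    """
    def replace(match):
        year, month, day, item_id = match.groups()
        slug = slugs.get(item_id)
        if slug is None:
            return match.group(0)
        return f"http://jonudell.net/udell/{year}-{month}-{day}-{slug}.html"
    return OLD_LINK.sub(replace, text)

def update_blog(endpoint, blog_id, user, password, slugs, count=100):
    """Fetch recent posts, remap their links, and save the ones that changed."""
    server = xmlrpc.client.ServerProxy(endpoint)
    posts = server.metaWeblog.getRecentPosts(blog_id, user, password, count)
    for post in posts:
        new_body = remap_links(post["description"], slugs)
        if new_body != post["description"]:
            post["description"] = new_body
            server.metaWeblog.editPost(post["postid"], user, password, post, True)

# Local check of the rewriting step only; no network access involved.
old = 'see <a href="http://weblog.infoworld.com/udell/2006/12/04.html#a1571">this</a>'
new = remap_links(old, {"1571": "hunting-the-elusive-search-strategy"})
print(new)
```

Keeping the rewriting in a pure function means it can be tested against sample text before update_blog is ever pointed at a live blog.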

As always in these situations, I end up wondering what a civilian would do. Blog publishing systems don’t seem to have bulk search-and-replace capability. They do, however, have APIs. There could be a tool or service that helps people make these kinds of changes. It’d be hard to avoid the password anti-pattern, so if this were a cloud-based service rather than a locally-installed tool you’d want to change your password after using it. But still, it should be doable.

Do such tools or services exist?

While spot-checking my mostly-reconstructed 2002-2006 blog, I found this plaint from 2002:

When you are a writer whose entire corpus exists online, woven into a fabric of citation and commentary, it is incredibly painful to see that fabric torn apart.

Déjà vu all over again. In 2002 I had to sacrifice the linkage to my 1999-2002 BYTE.com and restore it here. Now I’ve done the same for my 2002-2006 InfoWorld blog. Since its former namespace isn’t being redirected, and since all the old links were broken anyway, I’ve taken this opportunity to create new descriptive names that incorporate dates and titles.

The reboot isn’t 100% clean, but it’s automated and reproducible so I can address categories of problems as they show up.

I’m glad I’m not in publishing anymore. It turns out to be a lousy way to keep your stuff published. When a commercial hosted lifebits service comes online, I’ll be customer #1.

Last week I mentioned three ways for elmcity curators to categorize events:

  1. If a source iCalendar feed uses the CATEGORIES property, those categories will be included.

  2. If all of the events from a feed can be categorized, you can name that category in the Delicious metadata, using category=CATEGORY. All events from the feed will inherit it in the same way that they all inherit the default clickthrough link specified with url=URL.

  3. If all of the events from an Upcoming or Eventful venue can be categorized, you can also name that category in the Delicious metadata. To do that, bookmark the venue URL and use the patterns venue={UPCOMING|EVENTFUL} and category=CATEGORY.

Now I’ve added a fourth. In any iCalendar app you can now use these patterns in the Description field:

url=http://www.harlowspub.com

category=music,bluegrass

The url=… and category=… patterns can occur anywhere in the description.
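To make the convention concrete, here is a small Python sketch of how such a description could be parsed. It is an illustration under assumptions, not the service's parser: the function name and the choice to require whitespace (or the start of the text) before each pattern are mine.

```python
import re

# Each pattern must start the text or follow whitespace, so that a
# "category=" inside the URL's own query string is not picked up.
URL_PATTERN = re.compile(r"(?<!\S)url=(\S+)")
CATEGORY_PATTERN = re.compile(r"(?<!\S)category=(\S+)")

def parse_description(description):
    """Extract the url= and category= patterns from an event description.

    Returns (url, categories, remaining_text). url is None when absent;
    categories is a possibly empty list, split on commas.
    """
    url_match = URL_PATTERN.search(description)
    cat_match = CATEGORY_PATTERN.search(description)
    url = url_match.group(1) if url_match else None
    categories = []
    if cat_match:
        categories = [c for c in cat_match.group(1).split(",") if c]
    remaining = CATEGORY_PATTERN.sub("", URL_PATTERN.sub("", description))
    return url, categories, " ".join(remaining.split())

url, categories, text = parse_description(
    "Bluegrass picking party every Monday night. "
    "url=http://www.harlowspub.com category=music,bluegrass"
)
print(url)         # http://www.harlowspub.com
print(categories)  # ['music', 'bluegrass']
print(text)        # Bluegrass picking party every Monday night.
```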

This is particularly useful for recurring events. As discussed here, recurring events are a great way to build critical mass because your curation effort keeps paying dividends.

For example, one of the events I found when exploring the search page for Keene is the Monday night bluegrass jam at Harlow’s Pub.

Here’s the description I entered into Windows Live Calendar — which also could have been entered into Google Calendar, or any other iCalendar app:

The Birch Benders host a Bluegrass picking party at Harlow’s Pub in Peterborough every Monday night – 8 pm until they kick us out (11 or so). url=http://www.harlowspub.com category=music

Here’s the rendered result:

Mon 08:00 PM Bluegrass night with the Birchbenders (recurring events) (music)

The same data shows up in the downstream XML, ICS, and JSON feeds.

Since the iCalendar spec allows for a CATEGORIES element, this approach shouldn’t be necessary. But not all calendar apps allow you to tag events in this way. Outlook does, but Google Calendar, Live Calendar, and Apple iCal don’t.

Fortunately we can scribble in the margins. I first used that phrase in an InfoWorld story about a feature of the Internet’s Domain Name System called the TXT record. Although it is possible to define more specific record types, it’s hard to get everyone to agree to use them. So developers have historically “scribbled in the margins” of the DNS. And we can do the same with iCalendar.


PS: The title of that InfoWorld story was actually Filling in the Margins, which wasn’t what I wrote and which I never liked. The title I wrote was Scribbling in the Margins, and I used it for the blog entry that introduced the InfoWorld article. I’ll have that entry back online soon, along with the rest of my archive from that era. But meanwhile, when I search for the title using doublesearch, I notice an interesting point of comparison between Google and Bing. It’s been over a month since that blog archive went dark, and Google has now evidently forgotten about it. But Bing remembers. I don’t have any special insight into how Bing works, but I’ll be interested to see how long it keeps remembering.

In his writeup on Google Wave, Dare Obasanjo says:

I’m sure there are thousands of Web developers out there right now asking themselves “would my app be better if users could see each others’ edits in real time?”,”should we add a playback feature to our service as well” [ed note - wikipedia could really use this] and “why don’t we support seamless drag and drop in our application?”. All inspired by their exposure to Google Wave.

Indeed, every application that preserves a change history needs playback. Wikipedia, as Dare notes, is a prime candidate. Back in 2006, I made this LazyWeb request:

Animation is the best way to visualize the flow of change, as I discovered when I made my Wikipedia screencast. For Wikipedia, and indeed for all kinds of living documents supported by revision history and diff tools, I can imagine being able to isolate a paragraph or section and autogenerate the screencast of its evolution. I can even imagine the content of such visualizations being considered not just cutting-room floor debris but, rather, part of the “real” document, like footnotes.

Andy Baio responded by sponsoring a contest for a tool that would do just that. And I made a screencast demonstrating Dan Phiffer’s winning entry.

That script is unavailable at the moment because, ironically, Dan’s server reports:

Oh noes! I got HACK*D. I’m sifting through my files and should restore things back to normal soon.

In any case, it probably wasn’t practical for routine use. Fetching every revision on the fly really hammers Wikipedia. What’s really needed — again, not just for Wikipedia but everywhere — is a general way to query change history, and return a stream of versions and differences.
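As a toy illustration of what "a stream of versions and differences" could look like, here is a Python sketch built on the standard difflib module. It assumes the revisions are already in hand as full texts, oldest first, which sidesteps the hard part (fetching them politely). The function name and the sample history are invented.

```python
import difflib

def change_stream(revisions):
    """Yield (version_number, text, diff_lines) for each revision in order.

    revisions is an iterable of full document texts, oldest first. Each
    diff is a unified diff against the previous version; the first version
    is diffed against an empty document.
    """
    previous = ""
    for number, text in enumerate(revisions, start=1):
        diff = list(difflib.unified_diff(
            previous.splitlines(),
            text.splitlines(),
            fromfile=f"v{number - 1}",
            tofile=f"v{number}",
            lineterm="",
        ))
        yield number, text, diff
        previous = text

# Hypothetical three-revision history.
history = [
    "Curb cuts help wheelchairs.",
    "Curb cuts help wheelchairs.\nThey also help strollers.",
    "Curb cuts help everyone.\nThey also help strollers.",
]
for number, text, diff in change_stream(history):
    print(f"--- version {number} ---")
    print("\n".join(diff))
```

A playback tool could consume this stream one item at a time, animating each diff, without caring whether the versions came from a wiki, a feed, or a settings file.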

One way of doing the latter would be to use FeedSync, an open extension to RSS/Atom that supports synchronization in Live Mesh. Another would be to use Google’s Wave protocol. Because FeedSync deals with lists of items, which can be arbitrary chunks of content, whereas Wave deals with lists of document-mutation operations, like delete-element and start-annotation, it seems to me that FeedSync is more general, albeit less immediately useful for collaborative editing.

To explain why generality matters, consider change animation in a very different domain: software configuration. My wife, for example, sometimes changes her settings — in Word or Firefox — in ways that cause problems. If these apps persisted their settings to Live Mesh, as they could and arguably should, I’d be able to debug a mishap locally or remotely. But ideally, the change visualization would be sufficiently user-friendly so that she’d have a shot at figuring it out for herself.


PS: Speaking of history and restoration, I’ve been feeling like an amnesiac ever since my InfoWorld archive went dark. So in spare moments I’ve been reconstructing and republishing it. I’ll have the text of all the old blog entries up soon. And I’ve been restoring the screencasts as well. I’m keeping track of my progress at delicious.com/judell/screencast+restored.

My plumber’s last name is Thieme. I was just looking up his phone number, and got distracted when I realized that the people search in Bing does a fair job of visualizing the geographic distribution of surnames. If you do a people search for Thieme, New Hampshire, and start panning around at county and state resolutions, you can see where Thiemes have clustered and where they haven’t.

As I was doing this, I suddenly realized: Why don’t maps offer named zoom levels? If you want to pan across the country at state or county resolution, it requires an enormous amount of continuous zooming in and out. Of course the sizes of states and counties vary as you move across the country. But that’s the whole point. Computers can do the math and automate those adjustments.
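The math in question is not exotic. Here is a Python sketch, assuming the common Web Mercator scheme with 256-pixel tiles, that picks the largest integer zoom at which a region's bounding box fits a viewport. The function names, the viewport size, and the bounding boxes (rough figures, for illustration only) are assumptions, and boxes that cross the antimeridian are not handled.

```python
import math

TILE_SIZE = 256  # pixels per tile edge in the common Web Mercator scheme

def mercator_y(lat_deg):
    """Latitude to a 0..1 fraction of the Web Mercator world height."""
    lat = math.radians(lat_deg)
    return 0.5 - math.log(math.tan(math.pi / 4 + lat / 2)) / (2 * math.pi)

def zoom_to_fit(west, south, east, north, width_px=1024, height_px=768):
    """Largest integer zoom at which the bounding box fits the viewport."""
    x_fraction = (east - west) / 360.0
    y_fraction = mercator_y(south) - mercator_y(north)
    zoom_x = math.log2(width_px / (TILE_SIZE * x_fraction))
    zoom_y = math.log2(height_px / (TILE_SIZE * y_fraction))
    return math.floor(min(zoom_x, zoom_y))

# Approximate (west, south, east, north) boxes, for illustration only.
STATE_BOXES = {
    "New Hampshire": (-72.56, 42.70, -70.61, 45.31),
    "Texas": (-106.65, 25.84, -93.51, 36.50),
}
state_zoom = {name: zoom_to_fit(*box) for name, box in STATE_BOXES.items()}
print(state_zoom)
```

With something like this, "state resolution" becomes a named level that resolves to a different numeric zoom as you pan from a small state to a large one.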

What prompted this thought was the newly-redesigned Oakland Crimespotting, which features a nifty new widget for selecting times of day. Stamen Design’s Eric Rodenbeck, whom I recently interviewed, calls it the time pie. It’s fun to spin your way through the hours, making contiguous or discontiguous selections. But what’s really useful are the named slices: light, dark, commute, nightlife, day, night, swing shift. As Stamen’s blog notes:

The last time slices (day, night and swing) are the ways that the police view this information, and one thing we hope will come from the project is a better understanding of how the police view their data as it’s collected.

Nice!

What you may not notice, as you navigate the new interface, is that every adjustment is reflected in an exquisitely detailed URL. It’s not obvious because the URLs are really long, and the changes happen outside the visible part of the browser’s location window. But watch:

Default: http://oakland.crimespotting.org/map/#dtend=2009-06-04T20:35:28-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-28T20:35:28-07:00

Hide all crime types: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-28T23:59:59-07:00

Show all and extend dates to max range: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Narcotics only: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Nighttime narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Wee hours narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=1-4&zoom=14&dtstart=2009-05-08T00:00:00-07:00
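Because the whole state of the view lives in the URL fragment, it can be read back with ordinary tools. The Python sketch below decodes the URLs listed above using the standard urllib.parse module. The function name and the shape of the returned dictionary are illustrative choices, and the parameter meanings are inferred from the examples rather than from any Crimespotting documentation.

```python
from urllib.parse import urlsplit, parse_qs

def parse_view(url):
    """Decode the map state carried in the URL fragment."""
    # keep_blank_values matters: "types=" (hide everything) must survive.
    params = parse_qs(urlsplit(url).fragment, keep_blank_values=True)
    first_hour, last_hour = params["hours"][0].split("-")
    return {
        "types": [t for t in params["types"][0].split(",") if t],
        "hours": (int(first_hour), int(last_hour)),
        "lat": float(params["lat"][0]),
        "lon": float(params["lon"][0]),
        "zoom": int(params["zoom"][0]),
        "dtstart": params["dtstart"][0],
        "dtend": params["dtend"][0],
    }

nighttime_narcotics = parse_view(
    "http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00"
    "&lat=37.806&types=Na&lon=-122.270&hours=16-23&zoom=14"
    "&dtstart=2009-05-08T00:00:00-07:00"
)
print(nighttime_narcotics["types"], nighttime_narcotics["hours"])
```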

As noted on the Stamen blog, this means that:

It’s now possible to navigate and link to recent newsworthy events like the assassination of journalist Chauncey Bailey, the Oscar Grant riots from January 2009, and the Lovelle Mixon incident from this past March.

The Stamen crew is renowned for brilliance, and rightly so. But the principles at work here — thoughtful naming, granular linking — are ones that we all can and should practice, in the many small ways that we can as we explore and co-create the infosphere.

Curation is always a two-step tango. First you collect, then you categorize. Until now, the elmcity project has been all about collecting. But as the nodes of this network of community hubs start to light up, and as curators gather growing numbers of calendar feeds, it’s time to start enabling them to categorize as well.

This is a classic hard problem. How do you get people to tag hundreds or thousands of items? What makes the problem even harder, in the domain of events, is that once those items fade into the past, any effort invested in tagging them is lost.

My answer is, at least for now: Don’t worry too much about tagging individual events. Instead, gain leverage by finding ways to tag sources of events. Here are two good strategies:

1. Categorizing iCalendar feeds

The obvious place to start is with the iCalendar feeds that curators are collecting. There’s already a mechanism in place to capture metadata about those feeds. Here, for example, is the iCalendar feed for the 2009 Board of Supervisors meetings in Prescott, AZ:

http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c

That’s an iCalendar feed that was made from this web page:

http://www.co.yavapai.az.us/Events.aspx?id=32794

If you check the Delicious metadata for Prescott’s iCalendar feeds, you’ll see this structure:

title: Board of Supervisors
  url: http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c
  tag: trusted
  tag: ics
  tag: feed
  tag: url=http://www.co.yavapai.az.us/Meetings.aspx/folderid=1488&year=2009
  tag: category=government

The url= tag was already there. It provides the all-important link back to a human-readable authoritative source for events coming from this feed. It’s best if individual events provide their own links, but often in iCalendar feeds they don’t, so this is the default link.

What’s new is the category= tag. Now all events coming from this feed will carry that category. For example:

Mon Jun 15 2009


Regular Meeting – Cottonwood N/A
(Board of Supervisors)
(government)
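The bookkeeping behind this can be sketched in Python. This is a guess at the general shape, not the service's implementation: the function name is invented, and the only facts taken from the text are the tag conventions themselves (plain tags like trusted, plus url=, category=, and venue= patterns).

```python
def feed_metadata(tags):
    """Split a bookmark's tags into key=value settings and plain tags."""
    settings, plain = {}, []
    for tag in tags:
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, sep, value = tag.partition("=")
        if sep and key in ("url", "category", "venue"):
            settings[key] = value
        else:
            plain.append(tag)
    return settings, plain

settings, plain = feed_metadata([
    "trusted",
    "ics",
    "feed",
    "url=http://www.co.yavapai.az.us/Meetings.aspx/folderid=1488&year=2009",
    "category=government",
])
print(settings["category"])  # government
print(plain)                 # ['trusted', 'ics', 'feed']
```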

The same info travels downstream, to the aggregated Prescott iCalendar feed:

BEGIN:VEVENT
CATEGORIES:government
DESCRIPTION:Regular Meeting - Cottonwood N/A \n\n****************
nfrom  FuseCal.com\n ******************************\n\n
DTSTART;VALUE=DATE:20090615
LOCATION: (see http://www.co.yavapai.az.us/Events.aspx?id=32794)
SEQUENCE:0
SUMMARY:Regular Meeting - Cottonwood N/A
UID:633797255542010000-1196352865@elmcity.cloudapp.net
URL:http://www.co.yavapai.az.us/Events.aspx?id=32794
END:VEVENT

And to the aggregated XML feed:

<event>
<title>Regular Meeting - Cottonwood N/A</title>
<url>http://www.co.yavapai.az.us/Events.aspx?id=32794</url>
<source>Board of Supervisors</source>
<dtstart>2009-06-15T00:00:00</dtstart>
<categories>government</categories>
</event>

This strategy only works, of course, for feeds that can be categorized. And that won’t always be true. Events coming from the ReadItNews feed don’t fit into any single category (or short list of categories). So they’ll remain untagged for now. That’s OK. Better to make some progress than to make none. This partial approach yields a nice return on investment. And thanks to the bulk editing feature of Delicious, it’s really quick and easy to select a set of feeds and then tag them with a category= tag.

2. Categorizing Eventful and Upcoming venues

We can use a variation of this strategy to categorize sources of events coming from Eventful and Upcoming. In this case, the lever is the venue. Not all venues host events that can be categorized. But some do, and in those cases, why not exploit that?

The strategy here is to bookmark and tag the event’s venue URL from Upcoming or Eventful. Here are two examples:

Upcoming

title: Venue: Prescott YMCA - Upcoming
  url: http://upcoming.yahoo.com/venue/435420
  tag: venue=upcoming
  tag: category=recreation

Eventful

title: Venue: Raven Cafe
  url: http://eventful.com/prescott/venues/raven-cafe-/V0-001-000366078-7
  tag: venue=eventful
  tag: category=music

If you check the default HTML view of Prescott’s aggregated events, you’ll see that these categories indeed show up. They’re also in the downstream XML, ICS, and JSON feeds.

But can’t the source iCalendar feeds provide per-event categories?

Yes, some do. In the case of Prescott, the public library’s iCalendar feed uses the CATEGORIES property, so those categories show up too. For example:

Thu 02:00 PM
Sign up for Computer Mentor
(Prescott Library)
(Adult Computer Class,library)

Here we see a list of two categories. The first item, Adult Computer Class, was in the original iCalendar feed. The second item, library, was inherited from the feed metadata specified by the curator.
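The merge described here is simple enough to show in a few lines of Python. This is a sketch, with an invented function name. The ordering (the event's own categories first, the inherited one last) follows the example above; skipping a feed category the event already carries is my assumption, not something the service is known to do.

```python
def merge_categories(event_categories, feed_category=None):
    """Combine an event's own categories with the one inherited from its feed.

    The event's categories come first, in their original order. The feed's
    category is appended unless the event already carries it (an assumption).
    """
    merged = [c.strip() for c in event_categories if c.strip()]
    if feed_category and feed_category not in merged:
        merged.append(feed_category)
    return merged

print(",".join(merge_categories(["Adult Computer Class"], "library")))
# Adult Computer Class,library
```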

There’s a long way to go with this stuff. But this is a nice start!

Jamie Heywood joined me for this week’s Innovators show. His quest to cure ALS (Amyotrophic Lateral Sclerosis, aka Lou Gehrig’s Disease) is featured in a book and a movie. In this conversation, we explore Jamie’s current project: PatientsLikeMe. It’s a website where people pool data about their medical conditions, their drug regimes and related therapies, and their outcomes.

Of course people have been sharing medical information online since it became possible to do so. But PatientsLikeMe differs from other online health communities in several ways. The profile of a user is someone who is grappling with a serious, life-changing illness where:

  • You are very debilitated, perhaps even unable to go to work.

  • You can tell if your treatment is helping. (If you have Parkinson’s disease or depression, for example, you can judge what works or doesn’t. If you have breast cancer, you can’t.)

  • You are in a situation where both diagnosis and treatment are ambiguous.

The data that you report brings you into direct contact with other patients who share similar conditions and treatments. In this sense, PatientsLikeMe is a uniquely data-driven social network:

It is the richest open quantified human-to-human network that exists. There are a couple of hundred measured channels on which you can evaluate yourself against everyone else that you might be interested in connecting to. And you can go across any of those channels to anyone else in the world.

The data you report also brings you into direct contact with drug companies:

It connects you with the people who are developing the drugs to treat your disease. This cuts out an immense amount of inefficiency and middlemen, and can potentially make the system much better. It’s a way of rationalizing and accelerating discovery.

For that reason, Jamie sees no need to apologize for PatientsLikeMe’s business model, which is to sell the data it collects to drug companies. This arrangement may even, arguably, be a form of citizen science:

Do I think that we’ll be using crowdsourcing to interpret the RNA signature in blood? No. But in the real world, when you ask what it means to have ALS, each patient in the system is a representative of their own specific phenotype of this illness. Which is a way of putting it into the process of discovery. Because if you’re not in there — if you’re different, and everyone is unique in some way — the specific components of your own health and its impacts on your life will not be addressed in the process of treatment.

What about privacy? Jamie admits, honestly, that there can be no guarantees, and does not think people who expect guarantees should use PatientsLikeMe. It isn’t for everyone. But there are a number of folks who, after evaluating the risk of participating (pseudonymously) in the service, conclude that the benefit outweighs that risk. They are part of a collective experiment that I will be watching with the greatest interest.

In the last few days I’ve received useful feedback on the elmcity project from an old friend (whom I’ve never met in person), and a new friend (whom I have). The old friend is Jake Ochs, an accomplished technologist who, like John Faughnan, was a valued online correspondent back in the BYTE era. The new friend is Mykel Nahorniak, whom I met at Transparency Camp 2009. Mykel is cofounder of the social event listing platform Localist, and has been curating the elmcity project’s Baltimore hub.

Both Mykel and Jake are intrigued by the elmcity project, but are skeptical about the approach and likely outcome. Here’s Mykel, quoted with permission from email:

It’s already a challenge to convince a local venue that they need a Web site, let alone a Twitter presence, let alone an iCal feed. I think the return a lot of businesses are seeing from social media has helped motivate these local businesses, though.

Really, it’s about giving them a tangible return on their efforts. What incentive do these businesses have to curate their calendars in a specific format when, realistically, it’s not going to equal the return they’d get on, say, curating a Twitter account. That’s what needs to be determined on our end. Specific examples that would give a business no excuse to say “no.”

And here’s Jake, writing on his blog:

I can’t help but feel that Jon is missing the bigger picture. Well, he’s “getting” the bigger picture -that calendar-ish data will probably be a “big” thing. His recombinant approach to existing tools and ideas, though, probably isn’t it. The ability to create such mashups is a hallmark of the “Web 2.0” era and Jon, once again, displays his masterful ability to create something powerful from simple, existing substrates. Historically, it’s been the entrepreneurs that somehow grasp a simple concept regarding human behavior -or an evolved human behavior- and bring that concept to bear on a traditionally complex problem that win out in the marketplace. I don’t have any idea what that concept will look like, so don’t ask, but I highly doubt that it will contain the recombinant DNA of existing solutions when it debuts.

Mind you, I said when it debuts. After the magical mystery viral calendar tool of the future gains traction, a clamor will be made for an API that will draw the tool into the prevailing social tapestry. (Facebook and Twitter today, who knows what tomorrow?) I wonder, though, will iCal make it into that mix when the day comes or is iCal’s fundamentally one-way nature not be up to the task of the wonder collaboration of tomorrow?

Lately I’ve been pitching my project to folks who don’t dwell in the geek ghetto. And I’ve been telling plain stories that seem to resonate, at least in the old-fashioned way, one-on-one and face-to-face. Here’s one of them:

The Monday night chess club

The chess club in Keene gets together on Monday nights at 6:30. They used to gather at the Best Western hotel. Then they switched to the E.F. Lane hotel. For at least a year after the move, the Keene Sentinel’s community bulletin board continued to list the event at the Best Western. If the chess club had published its own authoritative feed, and communicated the address of that feed to the Sentinel — instead of transmitting a copy of soon-to-be-stale data — there might be a few more chess players showing up on Monday nights.

Why should businesses want to publish information in a syndication-friendly format? Because, like all of us, they want to be the authoritative source for information about themselves. And because they don’t want to have to remember, and refresh, every touchpoint to which they have transmitted data by value rather than by reference.

Is iCal’s “fundamentally one-way nature up to the task of the wonder collaboration of tomorrow?” True, iCalendar is a decade-old standard that has never rocked the Internet, and maybe never will. But one-way? That limitation exists only in the eye of the beholder. The chess club can publish a calendar that the Keene Sentinel can subscribe to.1 The Sentinel, in turn, can aggregate those subscriptions into a combined calendar that members of the chess club — and others in the community — can subscribe to. Those other individuals and organizations can also be publishers and subscribers. The system I am building is not really about iCalendar. It’s about the principles, patterns, and practices that make pub/sub ecosystems such fertile ground for communication and collaboration.
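To show how little machinery the subscribe-and-republish step needs, here is a deliberately naive Python sketch of a hub that combines the events from several iCalendar texts into one calendar. It is not the elmcity aggregator: the function name, the PRODID value, and the sample feeds are hypothetical. It also ignores things a real aggregator must handle, such as time zones, recurrence, and line folding.

```python
import re

VEVENT = re.compile(r"BEGIN:VEVENT\r?\n.*?END:VEVENT", re.DOTALL)

def aggregate(feeds, product_id="-//example.org//hub sketch//EN"):
    """Combine the VEVENT blocks of several iCalendar texts into one calendar."""
    seen = set()
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{product_id}"]
    for text in feeds:
        for block in VEVENT.findall(text):
            event = tuple(block.splitlines())
            if event in seen:
                continue  # drop events that repeat verbatim across feeds
            seen.add(event)
            lines.extend(event)
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"

# Hypothetical feeds that a hub might have fetched from its subscriptions.
chess_feed = (
    "BEGIN:VCALENDAR\nVERSION:2.0\nPRODID:-//example.org//chess//EN\n"
    "BEGIN:VEVENT\nUID:chess-1@example.org\nDTSTART:20090615T183000\n"
    "SUMMARY:Monday night chess club\nEND:VEVENT\nEND:VCALENDAR\n"
)
library_feed = (
    "BEGIN:VCALENDAR\nVERSION:2.0\nPRODID:-//example.org//library//EN\n"
    "BEGIN:VEVENT\nUID:lib-1@example.org\nDTSTART:20090618T140000\n"
    "SUMMARY:Sign up for Computer Mentor\nEND:VEVENT\nEND:VCALENDAR\n"
)
combined = aggregate([chess_feed, library_feed])
print(combined)
```

The point of the sketch is the data flow: the club edits its own feed, the hub re-reads it, and every downstream subscriber picks up the change by reference instead of holding a stale copy.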

Of course Mykel and Jake are right, and I value their skepticism. I haven’t yet figured out how to make the chess club anecdote go viral, or tell it in a way that businesses can’t say no to. But I’m warming to the task, and I’m starting to connect with environmental activists, librarians, civic-minded geeks, and colleagues who can help me advance the story.


1 The infrastructure that I’m building is dedicated to this purpose. If you’re a newspaper, a library, a chamber of commerce, or some other natural attention hub in your community, I want to help you syndicate calendars through your hub.

My guest for this week’s ITConversations show is Eric Rodenbeck, the creative director of Stamen Design. His 2008 ETech talk wowed me, and inspired this meditation on time, space, and data.

Near the end of this interview, as we were discussing the tension between graphic design and engineering sensibilities, Eric said:

When it was just me, working as a designer, I was having fun, but I wasn’t able to be effective. And when Mike [Michal Migurski, Stamen's technical architect] was doing tech work for PR companies, it wasn’t all that great. But when we came together, suddenly we had something.

Even in a design studio that we control, though, it’s hard to address that split between the lush sexy design versus the tech. Versus! Why is it always versus?

Exactly. Eric also notes another false dichotomy: cool versus useful. We violently agreed that coolness and utility are two sides of the same coin.

For that reason, it would be fun to also talk to Eric’s technical partner Mike Migurski. In this interview, we learn that he created the original API for Oakland Crimespotting by scraping this police site, which (still) produces map images like this:

Mike’s task was to identify and locate incidents by writing code that would scan those images for “purple bras, boxing gloves, and hypodermic needles.” Which is funny, but also sad. So many more usefully cool things will be able to happen when publishers of data finally start to learn how to publish data.

In the first installment of this elmcity+azure series my plan was to build an Azure-based calendar aggregator using IronPython. That turned out not to be possible at the time, because IronPython couldn’t run at full strength in Azure’s medium-trust environment. So I switched to C#, and have spent the past few months working in that language.

It’s been a long while since I’ve worked intensively in a compiled and statically-typed language. But I love being contrarian. At a time when low ceremony languages are surging in popularity, I’m revisiting the realm of high ceremony. It’s been an enjoyable challenge, I’ve gotten good results, and it’s given me a chance to reflect in a more balanced way on the “ceremony vs. essence” dialogue.

Meanwhile, Azure has moved forward. It now provides a full-trust environment. That means you can run PHP, which is interesting to a lot of folks, but it also means you can run IronPython, which is interesting to me.

In this entry I’ll show you how I’m starting to integrate IronPython in the two main components of my Azure project: the web role that provides the (currently minimal) user interface, and the worker role that does calendar aggregation.

Using IronPython in an ASP.NET MVC Azure web role

The elmcity service writes a lot of log data to an Azure table. I’ll want curators to be able to query the slices of that log that pertain to the cities whose calendars they are curating. For Providence, RI, which uses the elmcity (and delicious) id mashablecity, the URLs for those queries might look something like this:

/services/mashablecity/log_info (log entries of type “info”)

/services/mashablecity/log_exception (log entries of type “exception”)

Here’s an URL route to carve out a namespace shaped like that:

routes.MapRoute(
 "services",
 "services/{id}/{what}",
 new { controller="LogServices", action="QueryLog", id="", what="" }
 );

Here’s a simplified version of the corresponding LogServicesController.cs:

[HandleError]
public class LogServicesController : Controller
  {
  public ActionResult QueryLog(string id, string what)
    {
    return new ObjectResult(id, what);
    }
  }

public class ObjectResult: ActionResult
  {
  string id;
  string what;

  public ObjectResult( string id, string what)
    {
    this.id = id;
    this.what = what;
    }

  public override void ExecuteResult(ControllerContext context)
    {
    switch (this.what)
      {
      case "log_info":
        // form the script URL and pass the URL parameters through to the script
        var script_url = make_script_url(this.id, this.what);
        var args = new List<string>() { this.id, this.what };
        var result = new ContentResult
          {
          ContentType = "text/plain",
          Content = Utils.run_ironpython(script_url, args),
          ContentEncoding = Encoding.UTF8
          };
        result.ExecuteResult(context);
        break;
      case "log_exception":
        // etc
        break;
      }
    }
  }

This fragment takes in the URL parameters, forms the URL that IronPython will use to fetch the script that it runs, packages the parameters into a list, calls a method to invoke IronPython, and dumps the script’s output into the outgoing HTTP response.

Here’s the code to invoke IronPython:

public static ScriptEngine python = Python.CreateEngine();

public static string run_ironpython(string script_url, List<string> args)
  {
  var ipy_args = new IronPython.Runtime.List();
  foreach (var item in args)
    ipy_args.Add(item);
  var result = "";
  try
    {
    var s = Utils.FetchUrl(script_url).data_as_string;
    var source = python.CreateScriptSourceFromString(s,
      SourceCodeKind.Statements);
    var scope = python.CreateScope();
    var sys = python.GetSysModule();
    sys.SetVariable("argv", ipy_args);  // hand the script the IronPython list built above
    source.Execute(scope);
    result = scope.GetVariable("result").ToString();
    }
  catch (Exception e)
    {
    result = e.Message.ToString() + e.StackTrace.ToString();
    }
  return result;
  }

Whatever the script deposits in a Python variable called result winds up as the content of the HTTP response.
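To make that contract concrete, here's a minimal sketch of what the Python side might look like. It assumes the host has set sys.argv to the two-item list [id, what] before running the script. The sample log entries and the query_log helper are hypothetical stand-ins for a real query against the Azure table, not the project's actual script.

```python
import sys

# Hypothetical sample data standing in for rows from the Azure log table.
SAMPLE_LOG = [
    {"id": "mashablecity", "type": "info", "message": "aggregation started"},
    {"id": "mashablecity", "type": "exception", "message": "feed timed out"},
    {"id": "elmcity", "type": "info", "message": "aggregation finished"},
]

def query_log(entries, city_id, what):
    """Return the messages for one city and one log type, one per line."""
    log_type = what.replace("log_", "", 1)   # "log_info" -> "info"
    lines = [e["message"] for e in entries
             if e["id"] == city_id and e["type"] == log_type]
    return "\n".join(lines)

def main(argv):
    # Assumption: the host passes exactly [id, what] in sys.argv.
    if len(argv) != 2:
        return "usage: expected [id, what]"
    return query_log(SAMPLE_LOG, argv[0], argv[1])

# The host reads this variable after the script finishes.
result = main(sys.argv)
```

The only things the host cares about are the incoming argv list and the outgoing result variable. Everything in between is free to change without redeploying the C# side.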

Using IronPython in an Azure worker role

Until recently I’ve been running some IronPython maintenance scripts from a standalone client machine. Now I’ve pushed them to the cloud. Here’s the scheduler that sets a timer to invoke a handler on a periodic basis:

public static void scheduler (ElapsedEventHandler handler, int minutes)
  {
  var timer = new Timer();
  timer.Elapsed += handler;
  timer.AutoReset = true;
  timer.Interval = 1000 * 60 * minutes;
  timer.Start();
  }

And here’s the handler:

public static void IronPythonHandler(object o, ElapsedEventArgs e)
  {
  try
    {
    var s = Utils.FetchUrl(Configurator.ADMIN_SCRIPT).data_as_string;
    var source = python.CreateScriptSourceFromString(s,
       SourceCodeKind.Statements);
    var scope = python.CreateScope();
    source.Execute(scope);
    ts.write_log_message("info", "IronPythonHandler");
    }
  catch (Exception ex)
    {
    ts.write_log_message("exception", "IronPythonHandler",
      ex.Message.ToString() + ex.StackTrace.ToString());
    }
  }

Best of both worlds

I’m still sorting out how I want to combine these two worlds, and I’m having a blast doing it. Could I have written the whole system in IronPython, had the option been available when I started? Undoubtedly. But high ceremony, coupled with a sophisticated tool like Visual Studio, has its charms. So do low ceremony and emacs. Using both together, and leveraging all their strengths, is really productive. And it’s loads of fun too.

On this week’s Innovators show I talked with Philip Rosedale about the ways in which Second Life, the virtual world, and Linden Lab, the real company, are laboratories for experimenting with social, economic, and organizational principles.

As I was editing the show, I sent some of the notable quotes to Twitter:

On transparency and central control:

As communication technology makes transparency cheaper, the need for central control drops.

On why Second Life works well for group meetings:

We spatialize the audio so you hear where everyone’s voice is coming from.

On distributed development:

We don’t specialize roles by geographic location.

The Linden Lab experience with decentralization, transparency, and fluid team formation echoes what we’ve heard from Andy Singleton. Philip Rosedale adds this thoughtful observation:

There’s a tension between people’s desire to work together in a cohesive, familial kind of unit, and the organization’s need to have people work together in the way that’s optimal for projects, where you want to attack a problem, work together, disband, and then reform to work with different people on the next problem.

Even if you will never fly an avatar around in Second Life, or use the in-world construction kit to build a 3D object, it’s fascinating to hear about the organizational strategies that Philip Rosedale believes make it all possible.

One of the core values of the elmcity project is respect for authoritative sources. Every event that syndicates into the system should have an URL that points back to that authoritative source.

Often, when the source is an iCalendar feed, some or all of the events leave the URL field empty. That’s why I encourage curators to provide a default URL for the whole calendar. For example, the newest city to join the hub is Falls Church, VA. One of the feeds that curator Dave Witzel created (using FuseCal) is for the Dogfish Head Ale House. As you can see in Dave’s metadata catalog, the bookmark he created for its calendar points to the URL of a FuseCal-created iCalendar feed. And one of the tags on that bookmark is:

url=http://www.dogfishalehouse.com/component/option,com_jcalpro/Itemid,70/

The url=http://… tag is how a curator tells the elmcity aggregator: “This is the default URL I want you to use, for any events in this feed that do not individually specify a URL.”

In this example, Dave has pointed the url= tag at the events page at the ale house, which is also the URL that he fed to FuseCal in order to get it to produce the iCalendar feed.
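Here's a rough sketch of how an aggregator could honor that tag. The function names, the other tags on the bookmark, and the sample events are all hypothetical. The only thing taken from the example above is the url= convention itself.

```python
def default_url_from_tags(tags):
    """Return the value of the first url= tag, or None if there isn't one."""
    for tag in tags:
        if tag.startswith("url="):
            return tag[len("url="):]
    return None

def apply_default_url(events, tags):
    """Fill in the feed-wide default URL for events that lack their own."""
    default = default_url_from_tags(tags)
    for event in events:
        if not event.get("url") and default:
            event["url"] = default
    return events

# Hypothetical bookmark tags and events, for illustration only.
tags = [
    "trusted",
    "ics",
    "url=http://www.dogfishalehouse.com/component/option,com_jcalpro/Itemid,70/",
]
events = [
    {"title": "Trivia night", "url": ""},
    {"title": "Live music", "url": "http://example.com/live-music"},
]
apply_default_url(events, tags)
```

An event that already carries its own URL keeps it. The default only fills gaps.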

Let’s look at another example: the Thomas Jefferson Public School. Here, Google Calendar is the source of the iCalendar feed. And Dave has bookmarked the web page in which the school has embedded that calendar.

One place this default URL shows up is in the reference HTML view of the Falls Church calendar. On June 3, we see:

Wed 09:15 AM
7th Gr. Jazz Band Concert at TJ
(Thomas Jefferson Public School Calendar)

The link points back to the source. So far, so good.

Now, these source URLs should also propagate to any downstream services that syndicate from the hub. And in each of the outbound formats — XML, JSON, and ICS — they do. Here’s how that originally worked:

XML:
<event>
<title>7th Gr. Jazz Band Concert at TJ</title>
<url>http://www.fccps.org/calendar/tj.htm</url>
<source>Thomas Jefferson Public School Calendar</source>
<dtstart>2009-06-03T09:15:00</dtstart>
</event>

JSON:
{
"title":"7th Gr. Jazz Band Concert at TJ",
"url":"http://www.fccps.org/calendar/tj.htm",
"source":"Thomas Jefferson Public School Calendar",
"dtstart":"\/Date(1244020500000+0000)\/"
}

iCalendar:
BEGIN:VEVENT
DESCRIPTION:For more information regarding this or any other event please
 contact the MEH Main Office at 703-720-5700.
DTSTART:20090603T091500
URL:http://www.fccps.org/calendar/tj.htm
SUMMARY:7th Gr. Jazz Band Concert at TJ
UID:633778966573833700@elmcity.cloudapp.net
END:VEVENT
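A note for anyone consuming the JSON rendering: the dtstart value above looks like the milliseconds-since-1970 convention that Microsoft's JSON serializers emit. Assuming that's what it is, here's a small sketch of how a downstream client might decode it. The parse_json_date name is mine, not part of the service.

```python
import re
from datetime import datetime, timedelta, timezone

def parse_json_date(value):
    r"""Convert a string like \/Date(1244020500000+0000)\/ to a UTC datetime."""
    match = re.search(r"Date\((-?\d+)([+-]\d{4})?\)", value)
    if match is None:
        raise ValueError("not a Date(...) value: %r" % value)
    millis = int(match.group(1))
    # Assumption: the number counts milliseconds since 1970-01-01T00:00:00 UTC.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=millis)

start = parse_json_date(r"\/Date(1244020500000+0000)\/")
print(start.isoformat())   # 2009-06-03T09:15:00+00:00
```

That agrees with the dtstart in the XML rendering of the same event.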

But this perfectly sensible mapping fails for iCalendar. It turns out that many popular calendar programs don’t surface the URL field in an iCalendar VEVENT. Apple’s iCal does, but Outlook, Google Calendar, and Live Calendar don’t.

What to do? Until somebody helps me find a better solution, here’s mine:

iCalendar:
BEGIN:VEVENT
DESCRIPTION:For more information regarding this or any other event please
 contact the MEH Main Office at 703-720-5700.
DTSTART:20090603T091500
URL:http://www.fccps.org/calendar/tj.htm
LOCATION:http://www.fccps.org/calendar/tj.htm
SUMMARY:7th Gr. Jazz Band Concert at TJ
UID:633778966573833700@elmcity.cloudapp.net
END:VEVENT

In other words I’ve added a LOCATION field, which in this case was (not atypically) empty in the source calendar, and I’ve filled it with the same value that’s in the URL field.

The RFC2445 spec defines LOCATION like so:

Property Name: LOCATION

Purpose: The property defines the intended venue for the activity
defined by a calendar component.

And it gives this example:

LOCATION:Conference Room - F123, Bldg. 002

I’m stretching LOCATION to mean “virtual location of the source of information about the event” instead of “physical location of the venue for the event.”
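In code, the workaround amounts to something like the following sketch. It's a simplification that assumes unfolded lines and plain URL: and LOCATION: properties with no parameters. The add_location_fallback name is hypothetical, and this isn't the service's actual implementation.

```python
def add_location_fallback(vevent_lines):
    """Copy URL into LOCATION when an event has a URL but no usable LOCATION."""
    url = None
    has_location = False
    for line in vevent_lines:
        if line.startswith("URL:"):
            url = line[len("URL:"):]
        elif line.startswith("LOCATION:") and line[len("LOCATION:"):].strip():
            has_location = True
    if url is None or has_location:
        return list(vevent_lines)   # nothing to do, or a real venue is present
    out = []
    for line in vevent_lines:
        if line.startswith("LOCATION:"):
            continue                # drop the empty LOCATION; re-added after URL
        out.append(line)
        if line.startswith("URL:"):
            out.append("LOCATION:" + url)
    return out

event = [
    "BEGIN:VEVENT",
    "DTSTART:20090603T091500",
    "URL:http://www.fccps.org/calendar/tj.htm",
    "SUMMARY:7th Gr. Jazz Band Concert at TJ",
    "UID:633778966573833700@elmcity.cloudapp.net",
    "END:VEVENT",
]
print("\n".join(add_location_fallback(event)))
```

An event that already names a physical venue is left alone, so the stretch only applies where LOCATION would otherwise be blank.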

Given that mapping, here’s how the 7th Gr. Jazz Band Concert shows up in various apps:

Google Calendar:

Outlook:

Live Calendar:

Apple iCal:

Notice how iCal renders both URLs — the one in the LOCATION field, and the one in the URL field — as clickable links. I wish the other programs also made an URL in the LOCATION field clickable. Reporting the source URL is better than not doing so. But the extra friction involved in copying the URL and pasting it into a browser will tend to prevent that from happening.

Given all this, I’d love to hear suggestions for a better approach. One thing to keep in mind is that I’m trying to strike a balance between two conflicting goals. I want to make the downstream syndication formats useful. But not too useful, because I also want to build a connected ecosystem that helps all the upstream sources — including Eventful, Upcoming, and growing populations of iCalendar feeds — thrive. The elmcity service doesn’t aim to be a container of other people’s stuff. It aims to be a router, or actually a whole bunch of routers, each of which helps bootstrap a connected ecosystem in which people become the authoritative sources of their own information, and in which they learn to syndicate that information to one another.

When I shared my strategy for harvesting Keene’s softball schedules, the Little League baseball schedules hadn’t yet been published online. Now I see why. It took the folks at the Keene Cal Ripken Baseball Association (KCRBA) a while to get them written down in Excel, and then produced and uploaded as a set of web pages like this one. We’re two weeks into the season, and those pages are finally up, but not — sadly yet typically — in a useful calendar format that can mesh with other calendars.

Over the weekend, @llama_grande tweeted:

Dilbert creator on calendars @judell may enjoy http://bit.ly/2lKTlb

I set it aside thinking it was a cartoon I’d enjoy later. In fact, it’s a cogent essay by Scott Adams that nicely captures part of my motivation for doing the elmcity project. From the essay:

I think the family calendar is the organizing principle into which all external information should flow. I want the kids’ school schedules for sports and plays and even lunch choices to automatically flow into the home calendar. And when I want to decide what to do on the weekend, I want to click on the date for next Saturday and have all the relevant choices of plays, movies, and events pop up.

I think the biggest software revolution of the future is that the calendar will be the organizing filter for most of the information flowing into your life. You think you are bombarded with too much information every day, but in reality it is just the timing of the information that is wrong. Once the calendar becomes the organizing paradigm and filter, it won’t seem as if there is so much.

Meanwhile, here’s the reality for Kevin Curry:

checking a PDF 4 school lunch is daily routine 4 me

That’s how it is for most of us, most of the time. But it needn’t be.

Consider the Little League example. If the keystrokes that were poured into Excel to create those web pages had been directed into almost any calendar program, the schedules could have been published both as HTML for online viewing and as iCalendar for syndication to other calendars.

Happily, FuseCal can set things straight. It handily created calendars for each of the 27 teams. I collected the feed URLs and wrote a throwaway script to spray them into Delicious. In a few hours, when the elmcity service scans that account again, all the games will be included in the combined calendar. And anybody who wants what Scott Adams wants — to have the kids’ sports events flow into a home calendar — can have it.

This is wrong and backwards, of course. And while the creator of Dilbert would probably enjoy the absurdity of my solution, I’m glad to know he’s also thinking about the right way to move forward.

Like all University of Michigan alumni who were in the school of Literature, Science, and the Arts, I receive the quarterly LSA Magazine. This spring I’m actually in the magazine. For an issue on the theme of surviving in tough economic times, I contributed the back-page editorial which the editors entitled Can the Noosphere Save Us? The themes will be familiar to readers who know me: personal publishing, knowledge sharing, online collaboration. It was a treat to be asked to write about these topics for a diverse audience of UM alumni.

I would have subtitled the piece: “Ask not what the web can do for you. Instead ask what you can do with the web.” It features three people I have interviewed for my Innovators show, all of whom exemplify that dictum. They are Jean-Claude Bradley, Susan Gerhart, and John Leeke.

In order to make things easier for Susan Gerhart, who’ll be using a screen reader, I’m supplementing the PDF version posted at the magazine’s site with a plain HTML version.

I’ve collected the original URLs of articles and columns I wrote for InfoWorld at http://jonudell.net/InfoWorldArticles.html. Almost all are now live again, albeit redirected, after having been scrambled in the recent site reorg.

Still missing in action is my blog from that era. I have the whole thing in a clean XHTML archive that I can easily republish to my own namespace. However, since the articles and the blog essays cross-reference one another, I’m still hoping that the blog will reappear at its old namespace — underneath http://weblog.infoworld.com/udell — so that the cross-referencing will still work in both directions.

If the blog doesn’t resurface at its original namespace soon, I’ll go ahead and migrate it to mine.

When Phil Windley pointed me to Jeannette Wing’s manifesto on computational thinking, she had me at hello. The intellectual tools of computer science, she argues — including the ability to work at multiple levels of abstraction, to automate repetitive processes, and to make and use state machines — are really “a universally applicable attitude and skill set that everyone, not just computer scientists, would be eager to learn and use.”

In 2007 I interviewed Jeannette Wing for my Innovators show. Since then she has moved from Carnegie Mellon to the National Science Foundation, where she is — among other activities — working to define, promote, and bootstrap the teaching of computational thinking.

On this week’s show I spoke with Joan Peckham, a University of Rhode Island professor of computer science who’s on leave to work with the NSF on that mission.

Toward the end of the podcast, she relates this delightful anecdote:

At the first CSTB workshop on computational thinking for everyone, someone from the University of Indiana showed a video on how he was teaching science. We all looked at it and thought: “But it’s also computational thinking!”

In the course of teaching elementary school students about honey bees, he took them out on the playground and asked them to act out what the honey bees did: leaving the hive, finding the pollen, giving directions to the other bees. Then he brought them back into the classroom, went to a whiteboard, and engaged them in activities that I would identify as modeling, debugging, and drawing finite state diagrams. He didn’t call them that, but that’s what they were.

Yes he was teaching them science, but the way he was analyzing the subject, and engaging them in analysis, clearly involved a set of computational constructs.

In my own recent writing and speaking, I’ve suggested that feed syndication and lightweight service composition are aspects of computational thinking that we ought to formulate as basic principles and teach in middle school or even grade school.

We tried, but failed, to come up with a phrase that embellishes computational thinking with connotations of flow, orchestration, and connectedness. Syndication-oriented architecture gets partway there, but will never fly in the mainstream. Maybe connected thinking? But you don’t want to leave out what computational connotes. Perhaps computational and connected thinking? Nah, too wordy. I’d love to hear suggestions for a tagline that concisely captures both aspects.

For more background on computational thinking, here are Joan Peckham’s show notes:

The CSTB (Computer Science and Telecommunications Board) of the National Academy of Sciences is holding Computational Thinking for Everyone: A Workshop Series in 2009. Monitor their website for developments and reports: http://sites.nationalacademies.org/cstb/CurrentProjects/CSTB_043590

Previously awarded CPATH projects (only some of which address computational thinking directly … although the current solicitation requires it):

2007 award portfolio – http://www.nsf.gov/cise/funding/CPATH2007awardsfinal.pdf

2008 award portfolio – http://www.nsf.gov/cise/funding/CPATH2008awardsfinal.pdf

Computer Science Unplugged (http://csunplugged.org/) site has a wealth of classroom ready activities.

Rebooting Computing Summit in January 2009 (http://www.rebootingcomputing.org/). Several working groups emerged from this meeting. Some of the groups were concerned with computing education, and in defining and better communicating computing to others.

The Computer Science Teachers Association (CSTA) has a web repository with K-12 computer science teaching and learning materials: http://csta.acm.org/WebRepository/WebRepository.html

The Carnegie Mellon University Center for Computational Thinking site has materials and resources: http://www.cs.cmu.edu/~CompThink/. [ed: Sponsored, I'm pleased to say, by Microsoft Research.]

A friend lent me the antique tool shown in these photos. I’d been thinking about renting a rototiller to prepare three garden beds, but this thing tore through them way more easily than I could have done with a shovel, rake, hoe, or garden weasel. It’s really good at clawing up major weeds and clumps of sod.

Our question: What is this thing called? I looked through the catalog at antiquefarmtools.info. It’s full of beet shovels, muck rakes, turnip grubbers, and barley forks, but I didn’t find anything that looks like this artfully blacksmithed and wickedly effective tool.

One of the ironies I’ve uncovered while working on the elmcity project is that many folks are publishing iCalendar feeds without even realizing it. I’ve found a number of Drupal websites, for example, that present calendars as web pages without offering the corresponding ICS links. But the biggest source of implicit iCalendar feeds is Google Calendar.

Here’s a typical example of Google Calendar embedded in a web page: Community Gardens of Huntington WV. Curators for the elmcity project have figured out how to extract the ICS URL from this kind of page:

  1. View the source of the page (or frame)

  2. Find the script that embeds the calendar

  3. Find the email address mentioned in the script — in this example: communitygardenshunt@gmail.com

  4. Form an ICS URL based on that address

OK, it’s not that bad. As Bill Rawlinson points out here, there’s a civilian (non-geek) alternative:

  1. Click the Google Calendar button

  2. Add the calendar to your Google Calendar application — assuming you’re signed up to use it

  3. Click Settings -> Calendars

  4. Click the calendar you just added

  5. Right-click its ICAL button and capture the link

But either method is cumbersome. So I’ve added a service that streamlines discovery of the iCalendar feed’s URL. The easiest way to use that service is to go here and install the gcal2ics bookmarklet. When clicked from a page with an embedded Google Calendar, like the Huntington Community Gardens calendar, it yields this:

URL of web page with embedded Google Calendar:

http://www.huntingtoncommunitygardens.com/8.html

ICS (iCalendar) URL for that calendar:

http://www.google.com/calendar/ical/communitygardenshunt%40gmail.com/public/basic.ics
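The last step of the geek method, forming the ICS URL from the calendar's address, can be sketched in a few lines. The URL pattern here is inferred from this one example, so treat it as an assumption rather than a documented Google API. The gcal_ics_url name is hypothetical, and this is not the code behind the bookmarklet.

```python
from urllib.parse import quote

def gcal_ics_url(calendar_address):
    """Build the public ICS URL for a Google Calendar address."""
    # safe="" so that "@" is percent-encoded as %40, as in the URL above.
    return ("http://www.google.com/calendar/ical/"
            + quote(calendar_address, safe="")
            + "/public/basic.ics")

print(gcal_ics_url("communitygardenshunt@gmail.com"))
```

Finding the address in the page source is the part that takes real work. Once you have it, the rest is string assembly.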

Why would a service like Drupal or Google Calendar ever publish an HTML rendering of a calendar without also offering the corresponding feed URL directly? Because, I guess, we have all failed to teach people what feed URLs are, and show them why they matter.

For a while now, I’ve been wanting to connect the wealth of musical performance schedules on MySpace to the elmcity calendar project. For example, here’s the MySpace page for Jatoba, three young guys from Brattleboro whose eclectic, high-energy, acoustic string instrumentals and vocals have inspired me the couple of times I’ve seen them around town.

As is typical, there’s a calendar on their page. So I should be able to plug its iCalendar feed — and more specifically, the Keene-based performances in that feed — into the Keene calendar hub. But MySpace doesn’t export iCalendar feeds. What to do?

Today I hit on a solution. Here’s the recipe:

1. Visit a band’s MySpace page:

2. Click the (view all) link on the calendar to arrive here:

3. Copy the URL of the (view all) page, paste it into FuseCal, arrive here:

4. Filter the list, arrive here:

5. Click the Add to my calendar bar, arrive here:

6. Click Other Calendar, arrive here:

That’s your iCalendar feed URL in the box. In this example, the feed contains just two entries for Jatoba’s two upcoming Keene performances. Here’s the first:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Public Display Inc//FuseCal Software//EN
BEGIN:VEVENT
URL:http://collect.myspace.com/index.cfm?fuseaction=
 bandprofile.listAllShows&friendid=273136461&n=JATOBA
SUMMARY:VENDETTA  \, Keene\, New Hampshire  - free
DTSTART:20090507T213000
END:VEVENT

You can subscribe to that URL directly in your calendar program. If you’re an elmcity project curator, you’ll want to take one extra step to syndicate this calendar into your hub:

7. Bookmark the FuseCal URL.

Now Jatoba’s Keene performances will appear on the elmcity calendar, courtesy of FuseCal and Delicious. And neither MySpace nor Jatoba had to get involved. The web’s funny that way.

I recently added a specialized search to help curators working with the elmcity project find recurring events in their communities. It’s helpful, but would be much more helpful if it produced results only when the two searched-for phrases occur in close proximity.

The phrase pairs look like this:

"every thursday" "keene nh"
"first friday" "keene nh"

I’d like to limit results to pages where these pairs occur within, say, 100 words of one another. My search robot uses both Google and Live because, well, why wouldn’t you want the best of both worlds? But as far as I know, neither supports a proximity syntax like:

"every thursday" within 100 "keene nh"

I only need to run my search robot occasionally, and there are only thousands of pages per calendar hub, and there are only a dozen hubs yet. So for now it’s feasible to use brute force. I can, and likely will, fetch all the pages found by the two engines, analyze them, and reject those that fail my proximity test.
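The proximity test itself might look something like this sketch. It covers only the analysis step and assumes the page text has already been fetched and stripped of markup. The function names and the crude tokenizer are my own choices for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_positions(words, phrase):
    """Return the word indexes at which the phrase starts."""
    target = tokenize(phrase)
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

def within(text, phrase_a, phrase_b, max_words=100):
    """True if the two phrases start within max_words words of each other."""
    words = tokenize(text)
    starts_a = phrase_positions(words, phrase_a)
    starts_b = phrase_positions(words, phrase_b)
    return any(abs(a - b) <= max_words for a in starts_a for b in starts_b)

page = "The chess club meets every Thursday at the library in Keene NH."
print(within(page, "every thursday", "keene nh"))
```

Comparing every pair of positions is quadratic in the number of matches, which is fine for thousands of pages that each mention a phrase a handful of times.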

But since I am virtuously lazy, I just thought I’d ask. Are there undocumented features for either or both of these engines that I’m missing?

Keene is crazy about baseball and softball. In the men’s softball league alone there are 56 teams, they have played 73 games so far, and will play another 431 through August. I know this because the schedule was made in Excel, and published as a web page that Excel’s Data->From Web feature can easily read back.

That Excel spreadsheet isn’t at all useful, however, if you want to combine the schedule with other public calendars, or with your own personal calendar. For that you need an ICS feed. And almost nobody — from the major league websites to local leagues like mine — bothers to provide those.

So I made an ICS feed for Keene men’s softball, and I did it in an unusual way. My first thought was to point FuseCal at the schedule page, which is just an HTML table that looks like this:

DATE TIME FIELD AWAY HOME Lg
Fri. Apr 17 6:00 PM D Computer Solutions of
Keene
J.A. Jubb C1
Fri. Apr 17 6:00 PM O Peerless Insurance C&S 1 D2

But FuseCal wouldn’t read that page. It’s a service that specializes in digging structure out of unstructured text, and I guess it got freaked out when it saw too much structure in this page!

Normally in cases like this I’d write a script to read the HTML table, parse out the dates and times, and write an ICS feed. But that isn’t a skill most people have, and I’m looking for ways to help calendar curators do this kind of thing for themselves.

Then it occurred to me: What would FuseCal read? How about this:

Fri. Apr 17 06:00 PM,
Computer Solutions of Keene vs. J.A. Jubb, Field D
Fri. Apr 17 06:00 PM,
Peerless Insurance vs. C&S 1, Field O

In other words, the same stuff lightly reformatted, and coalesced into a single cell per row. And yes, FuseCal will read that.

So I added a column to the Excel sheet with this formula:

=CONCATENATE(A4, " ", TEXT(B4,"hh:mm AM/PM"), ", ", D4, " vs. ", F4,
 ", ", "Field ", C4)

Then I exported that column back out as this HTML page, used FuseCal to create this ICS feed, and bookmarked it for inclusion in the aggregator.
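For readers who do script, the same reformatting can be sketched in Python. This mirrors what the spreadsheet formula does for one row. The fusecal_line name and its argument order are hypothetical, and the sample values come from the first row of the schedule.

```python
from datetime import datetime

def fusecal_line(date, time, field, away, home):
    """Coalesce one schedule row into a single line of lightly structured text."""
    # Normalize "6:00 PM" to the zero-padded "06:00 PM".
    padded = datetime.strptime(time, "%I:%M %p").strftime("%I:%M %p")
    return "%s %s, %s vs. %s, Field %s" % (date, padded, away, home, field)

print(fusecal_line("Fri. Apr 17", "6:00 PM", "D",
                   "Computer Solutions of Keene", "J.A. Jubb"))
```

The point of the exercise is the same either way: produce one plain sentence per game, with the date and time up front where a text-mining service can find them.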

This has to be the weirdest maneuver I’ve ever thought of. Taking away structure in order to be able to add structure? Crazy! And yet it makes perfect sense. FuseCal is a component that specializes in turning weakly-structured calendar-like data into better-structured calendar data. It also knows how to do other useful things, like monitor the source of that data for changes, and convert the data into ICS format. If it’s easy enough to provide the sort of weak structure that FuseCal expects, why not just do that and leverage its strengths?

So I did, and here are the key outcomes:

  1. The softball events now show up on the aggregated calendar.
  2. They’re also available directly from the ICS feed, so that players and their families can add these events to personal calendars.

Nice!

It would be even nicer if, as a member of, say, the Blazers, I could scoop up just my own team’s events. And in fact FuseCal does support filtering. As the creator of the feed, I can go into the application, type Blazers, and restrict the feed to just those events. But I’d have to create 56 separate filtered calendars to provide feeds for all the teams. Feature request for FuseCal: Support filtering on the feed URL, so I can form URLs like:

http://fusecal.com/calendar/view/741833?h=5f7c2ac6-13cc-11de-a48e-00163e284ee0&filter=Blazers

http://fusecal.com/calendar/view/741833?h=5f7c2ac6-13cc-11de-a48e-00163e284ee0&filter=Greenwald+Realty
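Until something like that exists upstream, a downstream consumer could filter a fetched copy of the feed instead. Here's a rough sketch of that idea. It assumes each SUMMARY fits on one unfolded line, and the sample matchups are made up for illustration. It says nothing about how FuseCal itself works.

```python
def filter_events(ics_text, keyword):
    """Keep only the VEVENT blocks whose SUMMARY mentions the keyword."""
    kept, block, in_event = [], [], False
    for line in ics_text.splitlines():
        if line == "BEGIN:VEVENT":
            in_event, block = True, [line]
        elif line == "END:VEVENT":
            block.append(line)
            in_event = False
            summary = [l for l in block if l.startswith("SUMMARY:")]
            if summary and keyword.lower() in summary[0].lower():
                kept.extend(block)
        elif in_event:
            block.append(line)
        else:
            kept.append(line)   # calendar-level lines pass through unchanged
    return "\n".join(kept)

# Hypothetical feed contents, for illustration only.
SAMPLE_FEED = "\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    "SUMMARY:Blazers vs. Peerless Insurance - Field D",
    "DTSTART:20090417T180000",
    "END:VEVENT",
    "BEGIN:VEVENT",
    "SUMMARY:Greenwald Realty vs. J.A. Jubb - Field O",
    "DTSTART:20090417T180000",
    "END:VEVENT",
    "END:VCALENDAR",
])
print(filter_events(SAMPLE_FEED, "Blazers"))
```

This works for one family's calendar, but it means every consumer repeats the same filtering. A filter parameter on the feed URL would do it once, at the source.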

While we’re wishing, here’s a feature request for Yahoo Pipes: Add a module for ICS feeds! Pipes is a fabulous tool for transforming, filtering, and merging RSS feeds. It would be great to be able to do the same kinds of magic with ICS feeds.

The fulcrum of my talk last week at the Open Education Conference was observable work. I first started thinking about this back in 2002, when I included this Dave Winer excerpt in my review of Radio UserLand:

We’ve been using this tool since November, internally at UserLand. We shipped Radio 8 with it. When we switched over our workgroup productivity soared. All of a sudden people could narrate their work. Watch Jake as he reports his progress on the next project he does. We’ve gotten very formal about how we use it. I can’t imagine an engineering project without this tool.

Since then I’ve spoken a few times about the idea that by narrating our work, we can perhaps restore some of what was lost when factories and then offices made work opaque and not easily observable. Software developers are in the vanguard of this reintegration, because our work products as well as our work processes are fully mediated by digital networks. But it can happen in other lines of work too, and I’m sure it will.

My favorite example, from a very different domain, is the historic home preservationist John Leeke. In our interview he eloquently explains how and why he works observably.

This week’s Innovators show, with Charlie O’Donnell and Hilary Mason of Path101, expands on the same theme from a different perspective. Path101’s tagline is community-powered career discovery, and the approach is more data-driven than narrative.

When we narrate our work, we enable others to ask and answer the critical question:

What is it like to be a __________?

Path101’s aggregation of resumes and personality tests aims for different kinds of questions:

What personality traits do other _______s like me tend to have?

What careers do other _______s like me transition into?

Path101 is still a very young service, but I love the concept and will be interested to see how it evolves.

One of the elmcity project’s curators — Richard Akerman, in Ottawa — likes to use LibraryThing to keep track of events. He provided me with this RSS feed for Ottawa’s LibraryThing events:

http://www.librarything.com/rss/events/location/ottawa,+on

Although this feed does contain event information, it’s weakly structured. The dates and times appear as free text within the RSS <description> element:

<description>Thursday, April 30 (12:00 pm) Jeramy Dodds discusses Crabwise to the Hounds; Matthew Tierney discusses The Hayflick Unit. Join two stellar poets for a team Masterclass on poetry. Jeramy Dodds, recently shortlisted for the Griffin Prize, and Matthew Tierney, author of The Hayflick Unit and Full speed through the morning dark, for an exploration of the intersection of science and poetry.</description>

Could LibraryThing provide an iCalendar feed? Sure. But in order to do so, its events system would want to start gathering information in a more structured way.
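To make "weakly structured" concrete, here's a rough sketch of what a consumer has to do to dig the date and time out of that description. The regular expression is guessed from the single example above, the function name parse_description is made up, and because the feed gives no year, the caller has to supply one. Both are assumptions, not anything LibraryThing documents.

```python
import re
from datetime import datetime

# Pattern inferred from one sample: "Thursday, April 30 (12:00 pm) ..."
PATTERN = re.compile(
    r"^(?P<weekday>\w+), (?P<month>\w+) (?P<day>\d{1,2}) "
    r"\((?P<time>\d{1,2}:\d{2} [ap]m)\)\s*(?P<rest>.*)$",
    re.DOTALL,
)


def parse_description(description, year):
    """Return (start, remaining_text), or None if the text doesn't fit the pattern."""
    m = PATTERN.match(description.strip())
    if m is None:
        return None
    # The year is not in the feed, so it is an assumption passed in by the caller.
    start = datetime.strptime(
        f"{m['month']} {m['day']} {year} {m['time']}", "%B %d %Y %I:%M %p"
    )
    return start, m["rest"]


parsed = parse_description(
    "Thursday, April 30 (12:00 pm) Jeramy Dodds discusses Crabwise to the Hounds",
    2009,
)
print(parsed)
```

Anything that strays from the pattern (a date range, a missing time, a different order) falls through and returns None, which is exactly the fragility that a structured feed would avoid.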

Could FuseCal read the unstructured RSS feed and turn it into a structured RSS feed? In theory, yes. In practice, it doesn’t seem to want to read XML.

But wait. Maybe FuseCal can read an HTML translation of the RSS feed and turn that into an iCalendar feed?

Yep, that works. For calendar curators, and for anyone else who may be interested, here’s the recipe:

  1. Find a service that converts RSS into HTML. For example: http://www.rss2html.com.

  2. Form a URL that uses that service to convert a LibraryThing feed. For example: http://www.rss2html.com/public/rss2html.php?TEMPLATE=template-1-1-1.htm&XMLFILE=http://www.librarything.com/rss/events/location/keene,nh

    For another location, just replace keene,nh with, say, ottawa,on or baltimore,md.

  3. Copy that URL and paste it into FuseCal.

  4. Click Add to My Calendar -> Other Calendar in FuseCal to expose the iCalendar URL.

  5. If you’re curating for the elmcity project, bookmark that iCalendar URL in the Delicious account you’re using to control your instance of the calendar hub.
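For curators handling several locations, step 2 of the recipe can be sketched as a small function. This just glues together the same pieces as the example URL in that step. The function name fusecal_input_url is made up, and the inner feed URL is left unencoded only because the example URL is.

```python
RSS2HTML = "http://www.rss2html.com/public/rss2html.php"
TEMPLATE = "template-1-1-1.htm"
LIBRARYTHING = "http://www.librarything.com/rss/events/location/"


def fusecal_input_url(location: str) -> str:
    """Build the rss2html URL (step 2) to paste into FuseCal (step 3)."""
    feed = LIBRARYTHING + location
    return f"{RSS2HTML}?TEMPLATE={TEMPLATE}&XMLFILE={feed}"


for place in ["keene,nh", "ottawa,on", "baltimore,md"]:
    print(fusecal_input_url(place))
```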

Of course I could just automatically scan LibraryThing for each instance, just as I’m doing for Eventful and Upcoming. If that’s what curators prefer, I will. But in any case, this is a nice example of the kind of lightweight, spontaneous, opportunistic integration that I mentioned in my talk at the Global Research Library summit.

A conversation with some folks here at the Open Education Conference (#ocwc2009global) just connected in a wonderful way with another conversation on Twitter about what Douglas Hofstadter calls Ob-Platte puzzles, like this one:

Q: What is the Atlantic City of France?

A: Monaco. (Not a city in France, but it borders France and is coastal and casino-oriented.)

These come from my favorite of Hofstadter’s books, Fluid Concepts and Creative Analogies.1 The thesis is that recognizing and extrapolating from patterns is a core aspect of — maybe the core of — intelligence.

Here’s the connection. To the extent that technologists fetishize innovation and newness, we risk overwhelming people with churn. “Forget what you thought you knew,” we tend to say. “This new thing changes everything.” Except, of course, it usually doesn’t.

For example, we’ve done a terrible job of explaining to the world that Twitter is, among other things, a recapitulation of the pub/sub pattern that most people first encountered in the blogosphere. The packets are smaller, the activation threshold is lower, but the same principles apply. You can extend what you know from the blog domain into the Twitter domain. And the two are complementary.

We aren’t getting that message across. Yesterday’s NY Times — featuring Maureen Dowd’s encounter with Twitter founders Evan Williams and Biz Stone — makes that painfully clear.

Analogies are crucial. The elmcity project boils down to this Ob-Platte puzzle:

Q: What is the RSS feed of calendars?

A: The iCalendar (ICS) feed.

We need to help people focus much less on fast-changing applications, protocols, and formats, and a lot more on constant underlying patterns and principles that they can learn and then extend by analogy.


1 My review of the book, for BYTE, is now gone too, I see, along with my InfoWorld archive. More proof, if proof were needed, that we need to take control of our lifebits.

It says here:

Portsmouth defeated by ‘green’ Keene

Municipal employees in Portsmouth and Keene, the state’s two predominant “green” cities, slugged it out over the course of three weeks and, in the end, Keene delivered the knockout punch.

Portsmouth accepted Keene’s challenge in late March to see which environmentally conscious city could get the highest percentage of municipal employees signed up for the New Hampshire Carbon Challenge by Earth Day. With a participation rate of 55 percent, Keene employees easily outperformed Portsmouth’s 41 percent.

That’s nice. I guess. I dunno. From my point of view, ‘green’ Keene has a long way to go. My struggle to get the city to issue its first-ever approval for a clean, modern, efficient wood gasifier was epic, and cost me more than a few sleepless nights.

Then last week the other shoe dropped. I found out, by accident, that I qualified for a property tax exemption. A qualifying wood heating system is defined as:

…a wood burning appliance designed to operate as a central heating system to heat the interior of a building.

Yep, that’s what my EKO-40 does. I get to reduce the taxable value of my property by $10,000. It’ll only save me a few hundred bucks a year, but that’s every year, so it’s nothing to sneeze at. I’m grateful.

But. During all that time I was struggling to get the system approved, no official in ‘green’ Keene said: Oh, by the way, we do encourage this kind of thing, and you’ll even qualify for an exemption, and in fact it’ll be the first one we’ve had the opportunity to do, and we’re excited about that!

Well, the secret’s out now. I’m happy to know that the next person to adopt central wood heating will be able to search, find precedent, and move forward.

I spent some time over the weekend perusing the list of possible recurring events that my search robot found, and recording the useful/valid/appropriate ones in a calendar that syndicates into the Keene calendar hub.

It took me a half hour to go through the first 125 items in that list of 3300 search results. I found ten new recurring events for the Keene calendar. Three or four of those came from PDF newsletters that contained English paragraphs like:

Community Singers: Open singing group, no experience necessary, come for the joy of it. Thursdays from 10:45 to 11:45.

Using ordinary calendar software — in this case Live Calendar, but it could as easily have been Google Calendar, Outlook, Apple iCal, Eventful, or Upcoming — I turned these into iCalendar paragraphs like:

BEGIN:VEVENT
RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=TH;WKST=SU
DTSTART:20090416T104500
DTEND:20090416T114500
SUMMARY:Community Singers
DESCRIPTION:http://www.lifeartkeene.org
LOCATION:LifeArt Community Resource Center
END:VEVENT
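For readers who don't speak iCalendarese, the RRULE line says "every week, on Thursday," counting from the DTSTART date. Here's a minimal sketch of how a calendar program might expand that into actual dates. It handles only this simple weekly case, where BYDAY matches the weekday of DTSTART, and the function name weekly_occurrences is made up for illustration.

```python
from datetime import datetime, timedelta


def weekly_occurrences(dtstart: str, count: int, interval: int = 1):
    """Expand a FREQ=WEEKLY rule into its first `count` start times.

    Assumes BYDAY names the same weekday as DTSTART, as in the event above.
    """
    first = datetime.strptime(dtstart, "%Y%m%dT%H%M%S")
    return [first + timedelta(weeks=interval * i) for i in range(count)]


for start in weekly_occurrences("20090416T104500", 3):
    print(start.strftime("%A %Y-%m-%d %H:%M"))
```

A full implementation would also deal with time zones, end dates, exceptions, and the other frequencies, which is a good reason to leave that job to calendar software.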

The first thought that will occur to technically-inclined readers is: “Hmm. How might I fully automate that transformation?” I understand, and share, that impulse. But I’m trying to set it aside for now, and focus on a different kind of solution.

At a geekish dinner recently, the conversation turned to automation. The geek mind and personality, someone suggested, tends toward an all-or-none approach. It cherishes algorithms that drive fully-automated processes to 100% completion. It does not value methods that achieve partial results, or systems that engage with people to help them do that refinement.

I think that’s true. As I went through the list of candidate events, I reflected on what I was doing. A lot of it wasn’t mere translation from English to iCalendarese. For example, here’s search result #37:

37. NSD – 2009_02_Issue.indd

Recreation Center, 312 Washington St., Keene, NH. Western Style Square Dance Apparel ….. “We have a dance every first and third Saturday, no matter what!!!”

Here’s the source of the location information:

And here’s the source of the time information:

These components are unrelated. Or rather they are related, just not in a way that machine intelligence is likely to be able to detect anytime soon. But human intelligence can easily figure out that:

  • There is an organization called Monadnock Squares

  • The dances happen at the Keene Recreation Center

  • These are the kinds of events that happen on regular recurring schedules

So I searched for Monadnock Squares, and wound up adding this event to the calendar:

At this point I realized what the tagline for this project should be. The one I’ve been using is accurate but uninspiring:

community calendar syndication

So I’m going to try this instead:

finding and connecting social capital

When Robert Putnam says that we are bowling alone he adds:

More Americans are bowling than ever before, but they are not bowling in leagues.

Yochai Benkler points out that the networked information economy enhances our ability to:

…do more in loose commonality with others, without being constrained to organize their relationship through a price system or in traditional hierarchical models of social and economic organization

Maybe there’s plenty of social capital around, but it’s just harder to find, and connect with, because it’s no longer tightly coupled to traditional clubs, leagues, and organizations.

A lot of it is represented online, it turns out. It just isn’t published in a way that’s easy to find and connect with. I hope this project will help change that.

To that end, I’m wondering how to help curators process lists of many thousands of candidate events. Mechanical Turk comes to mind. It would be great to enable curators to carve their lists into batches of 100 and farm them out to volunteers. Is there a free Mechanical-Turk-like service for doing that?
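Whatever service ends up handing out the work, the carving itself is the easy part. A minimal sketch, with a made-up function name and a list of numbers standing in for the real search results:

```python
def batches(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


candidates = list(range(3300))  # stand-in for the 3300 search results
work = batches(candidates)
print(len(work), "batches; the first holds", len(work[0]), "items")
```

The hard part is everything around it: handing batches to volunteers, collecting what they find, and merging the results back into the calendar.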
