Curation, meta-curation, and live Net radio

I’ve long been dissatisfied with how we discover and tune into Net radio. This iTunes screenshot illustrates the problem:

Start with a genre, pick a station in that genre, then listen to that station. This just doesn’t work for me. I like to listen to a lot of different things. And I especially value serendipitous recommendations from curators whose knowledge and preferences diverge radically from my own.

Yes there’s Pandora, but what I’ve been wanting all along is a way to enable and then subscribe to curators who guide me to what’s playing now on the live streams coming from radio stations around the world. It’s Wednesday morning, 11AM Eastern Daylight Time, and I know there are all kinds of shows playing right now. But how do I materialize a view for this moment in time — or for tonight at 9PM, or for Sunday morning at 10AM — across that breadth and wealth of live streams?

I started thinking about schedules of radio programs, and about calendars, and about BBC Backstage — because I’ll be interviewing Ian Forrester for an upcoming episode of my podcast — and I landed on this blog post, which shows how to form a URL that retrieves upcoming episodes of a BBC show as an iCalendar feed.

Meanwhile, I’ve just created a new mode for the elmcity calendar aggregator. Now instead of creating a geographical hub, which combines events from Eventful and Upcoming and events from a list of iCalendar feeds — all for one location — you can create a topical hub whose events are governed only by time, not by location.

Can these ingredients combine to solve my Net radio problem? Could a curator for an elmcity topical aggregator cherrypick favorite shows from around the Net, and create a calendar that shows me what’s playing right now?

It seems plausible, so I spun up a new topical hub in the elmcity aggregator and started experimenting.

I began with the BBC’s iCalendar feeds. But evidently they don’t include VTIMEZONE components, which means calendar clients (or aggregators) can’t translate UK times to other times.
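
For reference, here’s the sort of VTIMEZONE component a feed needs to carry so that clients can do that translation. This is a generic definition of Europe/London (with its GMT/BST transitions), not something taken from the BBC feeds:

BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE

Events in the feed can then refer to that definition with a TZID parameter, and any client knows how to shift them into its own zone.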

I ran into a few other issues, which perhaps can be sorted out when I chat with Ian Forrester. But meanwhile, since the universe of Net radio is much vaster than the BBC, and since most of it won’t be accessible in the form of data feeds, I stepped back for a broader view.

Really, anyone can publish an event that gives the time for a live show, plus a link to its player. And when a show happens on a regular recurring schedule, the little bit of effort it takes to publish that event pays recurring dividends.

Consider, for example, Nic Harcourt’s Sounds Eclectic. It’s on at these (Pacific) times: SUN 6:00A-8:00A, SAT 2:00P-4:00P, SAT 10:00P-12:00A. You can plug these into any calendar program as recurring events. And if you publish a feed, it’s not only available to you from any calendar client, it’s also available to any other calendar client — or to any aggregator.

Here’s a calendar with three recurring events for Sounds Eclectic, plus one recurring event for WICN’s Sunday jazz show, plus a single non-recurring event — the BBC’s Folkscene — which will be on the BBC iPlayer on Thursday at 4:05PM my time and 9:05PM UK time. If you load the calendar feed into a client — Outlook, Apple iCal, Google Calendar, Lotus Notes — you’ll see these events translated into your local timezone.

Note that Live Calendar is especially handy for publishing events from many different timezones. That’s because like Outlook, but unlike Google Calendar, it enables you to specify timezones on a per-event basis. So instead of having to enter the Sunday morning recurrence of Sounds Eclectic as 9AM Eastern Daylight, I can enter it as 6AM Pacific Daylight Time. Likewise Folkscene: I can enter 9:05 British Summer Time. Since these are the times that appear on the shows’ websites, it’s natural to use them.
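
In the feed itself, that per-event timezone shows up as a TZID parameter on the event’s start and end times. Here’s a rough sketch of what the Sunday morning recurrence might look like — the timezone identifier is the standard America/Los_Angeles, but the summary and URL are placeholders, not the actual feed’s contents:

BEGIN:VEVENT
SUMMARY:Sounds Eclectic
DTSTART;TZID=America/Los_Angeles:20090705T060000
DTEND;TZID=America/Los_Angeles:20090705T080000
RRULE:FREQ=WEEKLY;BYDAY=SU
URL:http://example.org/player
END:VEVENT

A complete feed also carries a matching VTIMEZONE component for America/Los_Angeles — exactly the piece missing from the BBC feeds mentioned earlier.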

This sort of calendar is great for personal use. But I’m looking for the Webjay of Net radio. And I think maybe elmcity topical hubs can help enable that.

There’s a way of using these topical hubs I hadn’t thought of until Tony Karrer created one. Tony runs TechEmpower, a software, web, and eLearning development firm. He wants to track and publish online eLearning events, so he’s managing them in Google Calendar and syndicating them through an elmcity topical hub to his website.

A topical hub, like a geographic hub, is controlled by a Delicious account whose owner maintains a list of feeds. I’d been thinking of the account owner as the curator, and of the feeds as homogeneous sources of events: school board meetings, soccer games, and so on.

But then Tony partnered with another organization that tracks webinars, invited that group to publish its own feed, added it to the eLearning hub, and wrote a blog post entitled Second Calendar Curator Joins to Help with List of Free Webinars:

The initial list of calendar entries, we added ourselves. But I’m pleased to announce that we’ve just signed up our second calendar curator – Coaching Ourselves. Their events are now appearing in the listings. … It is exactly because we can distribute the load of keeping this list current that makes me think this will work really well in the long run.

This probably shouldn’t have surprised me, but it did. I’d been thinking in terms of curators, feeds, and events. What Tony showed me is that you can also (optionally) think in terms of meta-curators, curators, feeds, and events. In this example, Tony is himself a curator, but he is also a meta-curator — that is, a collector of curators.

I’d love to see this model evolve in the realm of Net radio. If you want to join the experiment, just use any calendar program to keep track of some of your favorite recurring shows. (Again, it’s very helpful to use one that supports per-event timezones.) Then publish the shows as an iCalendar feed, and send me the URL. As the meta-curator of delicious.com/InternetRadio, as well as the curator of jonu.calendar.live.com/calendar/InternetRadio/index.html, I’ll have two options. If I like most or all of the shows you like, I can add your feed to the hub. If I only like some of the shows you like, I can cherrypick them for my feed. Either way, the aggregated results will be available as XML, as JSON, and as an iCalendar feed that can flow into calendar clients or aggregators.

Naturally there can also be other meta-curators. To become one, designate a Delicious account for the purpose, spin up your own topical hub, and tell me about it.

Talking with Cathy Marshall about tags, digital archiving, and lifestreams

My guest for this week’s Innovators show is Cathy Marshall, a Senior Researcher in Microsoft’s Silicon Valley Lab. She’s long been intrigued by personal information management — and nowadays, also by its social dimension.

We kicked off the conversation with a discussion of her recent paper Do Tags Work?. (See also her slides from a talk about the project.) This was a clever study in which she collected a bunch of Flickr photos of people spinning on the bull’s balls in Milan. Notice how that fulltext query effectively retrieves a pile of images, taken by different people, of the same curious custom:

If you are passing through the Galleria Vittorio Emanuele II, you should spin around on the testicles of the bull mosaic found in the centre. Legend has it that this will bring you good luck!

Now try this query, which uses the same terms but looks at tags instead of the free text (title, description) associated with the photos. It finds nothing.

Cathy concludes that while many people think tags are effective hooks for information retrieval, they really aren’t.

Of course, those of us who attend conferences where the first order of business is to announce a tag know that tags can be a very effective way to aggregate all the blog postings, tweets, and photos associated with an event. Folksonomies that aren’t intended to converge don’t. Those that are meant to converge do, quite dramatically, which is why I’ve long been obsessed with intentional tagging as an enabler of loosely-coupled collaboration.

In the second half of the conversation we discussed personal digital archiving, curation, benign neglect, and lifestreams. Cathy tells a lot of stories about the ways in which people do, and also don’t, take care of their digital stuff. She observes, for example, that when people lose the contents of a computer, they react initially with horror, but then often feel a sense of relief. It turns out a lot of what was there wasn’t really needed. The burden of culling through it is lifted, and the guilt associated with not doing that culling goes away.

(I laughed harder than I have in a long time when Cathy described rental storage units as “garbage cans you pay for, and then when you realize you no longer care about the stuff in them, you stop paying for.”)

We ended by agreeing that the hardest thing about introducing a hosted lifebits service ecosystem will be the conceptual model. For psychological reasons, people will want to think in terms of monolithic containers that keep stuff in one place, and monolithic services that do everything related to that stuff. For architectural reasons, though, we’ll want to federate storage, and also decouple classes of service — so that storage, for example, is orthogonal to access control and authorization, which is orthogonal to social interaction.

Polymath = user innovation

In February 2007, Mike Adams, who had recently joined Automattic, the company that makes WordPress, decided on a lark to endow all blogs running on WordPress.com with the ability to use LaTeX, the venerable mathematical typesetting language. So I can write this:

$latex \pi r^2$

And produce this:

\pi r^2

When he introduced the feature, Mike wrote:

Odd as it may sound, I miss all the equations from my days in grad school, so I decided that what WordPress.com needed most was a hot, niche feature that maybe 17 people would use regularly.

A whole lot more than 17 people cared. And some of them, it turns out, are Fields medalists. Back in January, one member of that elite group — Tim Gowers — asked: Is massively collaborative mathematics possible? Since then, as reported by observer/participant Michael Nielsen (1, 2), Tim Gowers, Terence Tao, and a bunch of their peers have been pioneering a massively collaborative approach to solving hard mathematical problems.

Reflecting on the outcome of the first polymath experiment, Michael Nielsen wrote:

The scope of participation in the project is remarkable. More than 1000 mathematical comments have been written on Gowers’ blog, and the blog of Terry Tao, another mathematician who has taken a leading role in the project. The Polymath wiki has approximately 59 content pages, with 11 registered contributors, and more anonymous contributors. It’s already a remarkable resource on the density Hales-Jewett theorem and related topics. The project timeline shows notable mathematical contributions being made by 23 contributors to date. This was accomplished in seven weeks.

Just this week, a polymath blog has emerged to serve as an online home for the further evolution of this approach.

I am completely unqualified to evaluate the nature of mathematical discourse that’s going on in these polymath collaborations, or the claims being made regarding outcomes. But it sure makes my spidey-sense tingle.

I am, however, qualified to evaluate the nature of the collaborative methods being employed. And on that front, I’m amused (and chagrined) to recall something I wrote back in 2000, in a report called Internet groupware for scientific collaboration. The report was commissioned by Greg Wilson, who organized this week’s Science 2.0 event in Toronto. At that event, my report served as a historical frame for the polymath experimentation that’s going on right now, and that Michael Nielsen discussed at the Toronto event in an updated version of this talk.

In my 2000 report I said:

TeX and LaTeX define scientific publishing for a generation of scientists. But these formats don’t integrate directly into the shared spaces of the Web. The rise of XML as a universal markup language, along with vocabularies such as MathML (for mathematical notation) and SVG (for scalable vector graphics), suggests that the Web may yet reach its original collaborative goal.

Why didn’t I see, then, that the crux of the issue wasn’t XML and MathML and SVG, but rather the ability to “integrate directly into the shared spaces of the Web”? And that what ought to be integrated directly was the typesetting language already familiar to mathematicians, namely LaTeX?

The answer is that I needed (and still need) to be reminded that good-enough solutions here now, and familiar to people, often trump great solutions that aren’t here and wouldn’t be familiar if they were.

From that perspective, I’m wondering what will and won’t turn out to be good enough for the polymathematicians. The current setup is admittedly imperfect, and they’re now beginning to explore WordPress plugins that enable, for example, more powerful ways to organize, reply to, and refer to one another’s comments.

I don’t think anybody yet knows what the right tooling will be for polymathematical collaboration. The ones who are best qualified to figure it out are the polymathematical collaborators themselves, but they are not WordPress plugin developers.

What’s needed is what Eric von Hippel calls a user innovation toolkit. The idea is this: Leading users, as they employ a tool, also modify it, and in so doing they express intentions that tool developers can then capture and formalize.

If you look at the systems of notation that the polymathematicians are creating in order to organize and refer to their contributions in these long and complex threads of mathematical discourse, you can see intentions being expressed. So arguably, WordPress is a user innovation toolkit, and we’ll see these innovations codified in future plugins. I’ll be watching with great interest.

Update: As per Jonathan Fine’s comment below, it appears that MathTran.org has offered the same kind of service for quite a while now:

Talking with Mike Dunn about practical uses of semantic technology

My guest for this week’s Innovators show is Mike Dunn, a veteran media technologist who recently attended, and spoke at, the 2009 Semantic Technology Conference. Mike and I were both impressed by Tom Tague’s keynote talk, which avoided theory and focused on practical ways that here-and-now semantic technologies are helping media businesses work smarter and more profitably. In this conversation, Mike describes some of the ways that his company, Hearst Media Interactive, is proving that point.

Search engine optimization is currently one of the best ways to profit from data-enabled content. Meanwhile, one of the expected benefits of semantic technology — better search recall and precision — hasn’t materialized. But although most users may not care about querying archives more comprehensively and more precisely, writers and editors should. And not only because it helps automate the assembly of context around a current story. If you can review an archive in a precise and comprehensive way, you can do a better job of planning future stories that acknowledge — and advance — the ones you’ve already done.

Topical event hubs

The elmcity project began with a focus on aggregating events for communities defined by places: cities, towns. But I realized a while ago that it could also be used to aggregate events for communities defined by topics. So now I’m building out that capability. One early adopter tracks and promotes online events in the e-learning domain. Another tracks and promotes conferences and events related to environmentally-sustainable business practices.

The curation method is very similar to what’s defined in the elmcity project FAQ. To define a topic hub you use a Delicious account, you create a metadata URL as shown in the FAQ, and you use what= instead of where= to define a topic instead of a location. Since there’s no location, there’s no aggregation of Eventful and Upcoming events. The topical hub is driven purely by your registry of iCalendar feeds.

If you (or somebody you know) need to curate events by topic, and would like to try this method, please get in touch. I’d love to have you help me define how this can work, and discover where it can go.

Why we need an XML representation for iCalendar

Translations:

Croatian

On this week’s Innovators show I got together with two of the authors of a new proposal for representing iCalendar in XML. Mike Douglass is lead developer of the Bedework Calendar System, and Steven Lees is Microsoft’s program manager for FeedSync and chair of the XML technical committee in CalConnect, the Calendaring and Scheduling Consortium.

What’s proposed is no more, but no less, than a well-defined two-way mapping between the current non-XML-based iCalendar format and an equivalent XML format. So, for example, here’s an event — the first low tide of 2009 in Myrtle Beach, SC — in iCalendar format:

BEGIN:VEVENT
SUMMARY:Low Tide 0.39 ft
DTSTART:20090101T090000Z
UID:2009.0
DTSTAMP:20080527T000001Z
END:VEVENT

And here’s the equivalent XML:

<vevent>
  <properties>
    <dtstamp>
      <date-time utc='yes'>
        <year>2008</year><month>5</month><day>27</day>
        <hour>0</hour><minute>0</minute><second>1</second>
      </date-time>
    </dtstamp>
    <dtstart>
      <date-time utc='yes'>
        <year>2009</year><month>1</month><day>1</day>
        <hour>9</hour><minute>0</minute><second>0</second>
      </date-time>
    </dtstart>
    <summary>
      <text>Low Tide 0.39 ft</text>
    </summary>
    <uid>
      <text>2009.0</text>
    </uid>
  </properties>
</vevent>

The mapping is quite straightforward, as you can see. At first glance, the XML version just seems verbose. So why bother? Because the iCalendar format can be tricky to read and write, either directly (using eyes and hands) or indirectly (using software). That’s especially true when, as is typical, events include longer chunks of text than you see here.
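
To make the software angle concrete, here’s a small sketch — mine, not the proposal’s — of how an ordinary XML library can emit the event shown above, in Python. A hand-rolled iCalendar writer has to get line folding and escaping right; an XML library handles the equivalent chores for free:

import xml.etree.ElementTree as ET

def vevent_xml(summary, uid, y, mo, d, h, mi, s):
    # Build a <vevent> element following the structure shown above.
    vevent = ET.Element('vevent')
    props = ET.SubElement(vevent, 'properties')
    dtstart = ET.SubElement(ET.SubElement(props, 'dtstart'),
                            'date-time', {'utc': 'yes'})
    for tag, val in [('year', y), ('month', mo), ('day', d),
                     ('hour', h), ('minute', mi), ('second', s)]:
        ET.SubElement(dtstart, tag).text = str(val)
    ET.SubElement(ET.SubElement(props, 'summary'), 'text').text = summary
    ET.SubElement(ET.SubElement(props, 'uid'), 'text').text = uid
    return ET.tostring(vevent, encoding='unicode')

print(vevent_xml('Low Tide 0.39 ft', '2009.0', 2009, 1, 1, 9, 0, 0))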

I make an analogy to the RSS ecosystem. When I published my first RSS feed a decade ago, I wrote it by hand. More specifically, I copied an existing feed as a template, and altered it using cut-and-paste. Soon afterward, I wrote the first of countless scripts that flowed data through similar templates to produce various kinds of RSS feeds.

Lots of other people did the same, and that’s part of the reason why we now have a robust network of RSS and Atom feeds that carries not only blogs, but all kinds of data packets.

Another part of the reason is the Feed Validator which, thanks to heroic efforts by Mark Pilgrim and Sam Ruby, became and remains the essential sanity check for anybody who’s whipping up an ad-hoc RSS or Atom feed.

No such ecosystem exists for iCalendar. I’ve been working hard to show why we need one, but the most compelling rationale comes from a Scott Adams essay that I quoted from in this blog entry. Dilbert’s creator wrote:

I think the biggest software revolution of the future is that the calendar will be the organizing filter for most of the information flowing into your life. You think you are bombarded with too much information every day, but in reality it is just the timing of the information that is wrong. Once the calendar becomes the organizing paradigm and filter, it won’t seem as if there is so much.

If you buy that argument, then we’re going to need more than a handful of applications that can reliably create and exchange calendar data. We’ll want anyone to whip up a calendar feed as easily as anyone can now whip up an RSS/Atom feed.

We’ll also need more than a handful of parsers that can reliably read calendar feeds, so that thousands of ad-hoc applications, services, and scripts will be able to consume all the new streams of time-and-date-oriented information.

I think that a standard XML representation of iCalendar will enable lots of ad-hoc producers and consumers to get into the game, and collectively bootstrap this new ecosystem. And that will enable what Scott Adams envisions.

Here’s a small but evocative example. Yesterday I started up a new instance of the elmcity aggregator for Myrtle Beach, SC. The curator, Dave Slusher, found a tide table for his location, and it offers an iCalendar feed. So the Myrtle Beach calendar for today begins like this:

Thu Jul 23 2009

WeeHours

Thu 03:07 AM Low Tide -0.58 ft (Tide Table for Myrtle Beach, SC)

Morning

Thu 06:21 AM Sunrise 6:21 AM EDT (Tide Table for Myrtle Beach, SC)
Thu 09:09 AM High Tide 5.99 ft (Tide Table for Myrtle Beach, SC)
Thu 10:00 AM Free Coffee Fridays (eventful: )
Thu 10:00 AM Summer Arts Project at The Market Common (eventful: )
Thu 10:00 AM E.B. Lewis: Story Painter (eventful: )

Imagine this kind of thing happening on the scale of the RSS/Atom feed ecosystem. The lack of an agreed-upon XML representation for iCalendar isn’t the only reason why we don’t have an equally vibrant ecosystem of calendar feeds. But it’s an impediment that can be swept away, and I hope this proposal will finally do that.

Late July in Toronto: DemoCamp and Science 2.0

On Tuesday July 28 I’ll be at the Toronto DemoCamp. I’m looking forward to meeting the designers and developers who’ll be there, seeing what you’re working on, and showing you what I’m working on.

The following day I’ll be speaking at a Science 2.0 event organized by my friend Greg Wilson. Here are the forward-thinking scientists I’ll be joining:

  • Titus Brown: Choosing Infrastructure and Testing Tools for Scientific Software Projects
  • Cameron Neylon: A Web Native Research Record: Applying the Best of the Web to the Lab Notebook
  • Michael Nielsen: Doing Science in the Open: How Online Tools are Changing Scientific Discovery
  • David Rich: Using “Desktop” Languages for Big Problems
  • Victoria Stodden: How Computational Science is Changing the Scientific Method

I am not a scientist, nor do I play one on TV, so why me? Because back in 2000, Greg commissioned me to write a report entitled Internet Groupware for Scientific Collaboration. Greg was then working with the Los Alamos National Laboratory on ways to help scientists make better use of the tools of computation as well as the methods of online collaboration. I had recently finished my book Practical Internet Groupware, I was exploring what we would now call the Web 2.0 landscape, and I was thinking and writing a lot about how these open and loosely-coupled modes of communication could enable the sort of collaboration at the core of science (and other kinds of academic endeavors) in powerful new ways.

Nearly a decade later, that vision is becoming a reality. I’m really excited to meet these folks, whose adventures I’ve been following through their blogs, and hear about their experiences at the forefront of what I believe will be a new golden age of science.

In my own talk, I’ll review how my own current project tackles the challenge of social information management, and aims to democratize the computational way of thinking that enables us to wire the web.

Tinker to Evers to Chance, Tripit to Dopplr to Facebook

A few months back I observed:

Tripit, meet Dopplr. Dopplr, Tripit. You two should really get to know one another.

Richard Akerman replied:

You can feed TripIt’s ical output into Dopplr, I hear (I haven’t tried it)

That remark should have rung a loud bell for me, but somehow it didn’t. Then, yesterday, in conversation with James Senior, the bell rang. We were talking about how many services publish and/or subscribe to iCalendar feeds, how few people know that, and how much latent capability is being left on the table. Paraphrasing James:

I’ll give you a perfect example. I use Tripit, it’s a wonderful service. You email it your travel itinerary, and it organizes all your information for you. But I’ve been frustrated not to be able to share that information with my friends on Facebook. I also use Dopplr, and Dopplr talks to Facebook, but Tripit doesn’t. Then I realized that Tripit publishes an iCalendar feed, and that Dopplr can subscribe to iCalendar feeds. So I made that connection, and now my Tripit events are showing up in Facebook.

Brilliant. Look:

How did I miss that? Me, of all people, Mr. Splice-Everything-To-Everything, Mr. Find-Unintended-Uses-Of-Software, Mr. Cosmic-Significance-Of-Pub-Sub, Mr. Champion-Of-The-Underutilized-iCalendar-Standard, Mr. Computational-Thinking?

Because wiring the web is still too abstract, too convoluted, and too non-obvious — even, sometimes, for me.

The phrase wiring the web comes from Ray Ozzie, by the way. At ETech in 2006, he demoed a concept called Live Clipboard. From my InfoWorld writeup:

Subscribing to an RSS feed, for example, has never conformed to any familiar user-interface pattern. Soon copying and pasting RSS feeds will feel natural to everyone, and Ozzie hopes the copy/paste metaphor will also make advanced capabilities more accessible. Consider my LibraryLookup bookmarklet. Dragging it onto the browser’s toolbar isn’t something easily understood or explained. Using the clipboard as the wiring junction will make a lot more sense to most people.

The same metaphor can accommodate what I’ve called lightweight service composition and what Ozzie calls “wiring the Web.” He showed how RSS feeds acting as service end points can be pasted into apps to create dynamically updating views. Virtually anyone can master this Tinkertoy approach to self-serve mashups.

This was, and remains, a crucial insight. From now on, we are all going to be wiring the web in one way or another. And we’re going to need a conceptual frame in which to do that — ideally, a user-interface metaphor that’s already familiar. Maybe it’s as simple as copy/paste. Maybe it’s more like Yahoo! Pipes or Popfly blocks. Whatever it turns out to be, we need to invent and deploy a universal junction box for wiring the web.

Talking with Peter O’Toole about gathering clinical data and sharing medical knowledge

My guest for this week’s Innovators show is Peter O’Toole from mTuitive, a company whose authoring toolkit for clinical data collection I featured in a 2006 screencast. mTuitive is working at the intersection of a number of disciplines that all need to come together to deliver cheaper and better health care.

First, usability. Designing clinical data gathering systems that capture what’s right for the patient, along with what’s mandated by the insurance company, requires a careful balancing of constraints and freedom in software user interfaces.

Second, knowledge engineering. Clinical systems don’t merely record data, they embody medical protocols that reflect an ever-changing consensus about methods and best practices. mTuitive’s authoring system aims to enable leading practitioners to encode that knowledge in ways that can then guide others. But knowledge grows at the edge as well as at the center. So mTuitive also enables practitioners to extend and modify the software, injecting local knowledge and custom. Who owns this knowledge? Who’s liable for the consequences of its use? These are some of the implications we discussed.

Third, semantics. Electronic medical records are still mainly narrative in form, says Peter O’Toole. But we’re moving toward more computable ways of describing observations about, say, the nature and size of tumors.

Fourth, social software. My hunch, and Peter O’Toole’s too, is that progress toward the nirvana of medical records that are both semantically rich and interoperable will be powered by a two-stroke engine. One stroke of the piston will be driven by centrally-defined standards and centrally-imposed legislation. But the other will be driven by networked collaboration, at the edge, among doctors who pool and codify their experiential knowledge using ad-hoc, Web 2.0-like methods.

Hat tip to Joshua Allen’s Better Living Through Software

Here’s another piece of Say Everything that I want to spotlight:

Microsoft wasn’t known as a haven of openness and cooperation. But it was a big place with a lot of smart people. At the turn of the millennium, during the company’s bitter antitrust fight with the U.S. Department of Justice, many of those people found it impossible to recognize themselves in the press’s portrait of the company. The first programmer at Microsoft to start blogging, Joshua Allen, set himself up with an account on Dave Winer’s EditThisPage service in 2000 and started posting under the header “Better Living Through Software: Tales of Life at Microsoft.” It was totally informal and unauthorized — a lone call for a parley raised from behind the company’s siege walls. Allen explained his intent: “I wanted to say that I am a Microsoft person and you can talk with me.”

I used to read Joshua’s blog back then, and I still read it now; it was nice to see its seminal role acknowledged in the book.

Here’s a picture of the blog’s home page, annotated by the ClearForest Gnosis entity extractor:

Quite a cast of characters!

More fun than herding servers

Until recently, the elmcity calendar aggregator was running as a single instance of an Azure worker role. The idea all along, of course, was to exploit the system’s ability to farm out the work of aggregation to many workers. Although the sixteen cities currently being aggregated don’t yet require the service to scale beyond a single instance, I’d been meaning to lay the foundation for that. This week I finally did.

Will there ever be hundreds or thousands of participating cities and towns? Maybe that’ll happen, maybe it won’t, but the gating factor will not be my ability to babysit servers. That’s a remarkable change from just a few years ago. Over the weekend I read Scott Rosenberg’s new history of blogging, Say Everything. Here’s a poignant moment from 2001:

Blogger still lived a touch-and-go existence. Its expenses had dropped from a $50,000-a-month burn rate to a few thousand in rent and technical costs for bandwidth and such; still, even that modest budget wasn’t easy to meet. Eventually [Evan] Williams had to shut down the office entirely and move the servers into his apartment. He remembers this period as an emotional rollercoaster. “I don’t know how I’m going to pay the rent, and I can’t figure that out because the server’s not running, and I have to stay up all night, trying to figure out Linux, and being hacked, and then fix that.”

I’ve been one of those guys who babysits the server under the desk, and I’m glad I won’t ever have to go back there again. What I will have to do, instead, is learn how to take advantage of the cloud resources now becoming available. But I’m finding that to be an enjoyable challenge.

In the case of the calendar aggregator, which needs to map many worker roles to many cities, I’m using a blackboard approach. Here’s a snapshot of it, from an aggregator run using only a single worker instance:

     id: westlafcals
  start: 7/14/2009 12:12:05 PM
   stop: 7/14/2009 12:14:46 PM
running: False

     id: networksierra
  start: 7/14/2009 12:14:48 PM
   stop: 7/14/2009 12:15:05 PM
running: False

     id: localist
  start: 7/14/2009 12:15:06 PM
   stop: 7/14/2009  5:37:03 AM
running: True

     id: aroundfred
  start: 7/14/2009  5:37:05 AM
   stop: 7/14/2009  5:39:20 AM
running: False

The moving finger wrote westlafcals (West Lafayette) and networksierra (Sonora); it’s now writing localist (Baltimore), and will next write aroundfred (Fredericksburg).

Here’s a snapshot from another run using two worker instances:

     id: westlafcals
  start: 7/14/2009 10:12:05 PM
   stop: 7/14/2009  4:37:03 AM
running: True

     id: networksierra
  start: 7/14/2009 10:12:10 PM
   stop: 7/14/2009 10:13:05 PM
running: False

     id: localist
  start: 7/14/2009 10:13:06 PM
   stop: 7/14/2009  4:41:12 AM
running: True

     id: aroundfred
  start: 7/14/2009  4:41:05 AM
   stop: 7/14/2009  4:42:20 AM
running: False

Now there are two moving fingers. One’s writing westlafcals, one has written networksierra, one’s writing localist, and one or the other will soon write aroundfred. The total elapsed time will be very close to half what it was in the single-instance case. I’d love to crank up the instance count and see an aggregation run rip through all the cities in no time flat. But the Azure beta caps the instance count at two.

The blackboard is an Azure table with one record for each city. Records are flexible bags of name/value pairs. If you make a REST call to the table service to query for one of those records, the Atom payload that comes back looks like this:

<m:properties>
   <d:PartitionKey>blackboard</d:PartitionKey>
   <d:RowKey>aroundfred</d:RowKey>
   <d:start>7/14/2009 4:41:05 AM</d:start>
   <d:stop>7/14/2009 4:42:20 AM</d:stop>
   <d:running>False</d:running>
</m:properties>

At the start of a cycle, each worker wakes up, iterates through all the cities, aggregates those not claimed by other workers, and then sleeps until the next cycle. To claim a city, a worker tries to create a record in a parallel Azure table, using locks as the PartitionKey instead of blackboard. If the worker succeeds in doing that, it considers the city locked for its own use, it aggregates the city’s calendars, and then it deletes the lock record. If the worker fails to create that record, it considers the city locked by another worker and moves on.

This cycle is currently one hour. But in order to respect the various services it pulls from, the service defines the interval between aggregation runs to be 8 hours. So when a worker claims a city, it first checks to see if the last aggregation started more than 8 hours ago. If not, the worker skips that city.

Locks can be abandoned. That could happen if a worker hangs or crashes, or when I redeploy a new version of the service. So the worker also checks to see if a lock has been hanging around longer than the aggregation interval. If so, it overrides the lock and aggregates that city.
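
Here’s a minimal, self-contained sketch of that claim-and-aggregate cycle, in Python. The real service talks to Azure tables; in this sketch two in-memory dicts stand in for the blackboard and locks tables, aggregate() is a stub, and the names are mine, not the project’s:

import datetime

AGGREGATION_INTERVAL = datetime.timedelta(hours=8)

blackboard = {}   # city id -> {'start': ..., 'stop': ..., 'running': ...}
locks = {}        # city id -> time the lock record was created

def try_claim(city, now):
    # Claim a city by creating its lock record, unless a lock already
    # exists and is younger than the aggregation interval (not abandoned).
    if city in locks and now - locks[city] < AGGREGATION_INTERVAL:
        return False
    locks[city] = now
    return True

def aggregate(city):
    print('aggregating', city)   # stand-in for fetching and merging feeds

def run_cycle(cities):
    now = datetime.datetime.utcnow()
    for city in cities:
        entry = blackboard.get(city)
        if entry and now - entry['start'] < AGGREGATION_INTERVAL:
            continue                      # aggregated recently; skip
        if not try_claim(city, now):
            continue                      # locked by a live worker; move on
        blackboard[city] = {'start': now, 'running': True}
        try:
            aggregate(city)
        finally:
            blackboard[city].update(stop=datetime.datetime.utcnow(),
                                    running=False)
            locks.pop(city, None)         # release the lock record

run_cycle(['westlafcals', 'networksierra', 'localist', 'aroundfred'])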

I’m sure this scheme isn’t bulletproof, but I reckon it doesn’t need to be. If two workers should happen to wind up aggregating the same city at about the same time, it’s no big deal. The last writer wins, and a little extra work gets done.

Anyway, I’ll be watching the blackboard over the next few days. There’s undoubtedly more tinkering to do. And it’s a lot more fun than herding servers.

The civic dashboard

On Friday my local paper ran a story entitled Keene crime rates steady over years. Because that link will go dark soon, I’m going to assert fair use for the part of the story that cites statistics:

Strings of vehicle break-ins and vandalism and the occasional vicious beating or stabbing may lead some to believe that Keene’s streets are getting meaner, but crime statistics show little change over the last six years.

Even in light of rough economic times, which typically parallel a spike in shoplifting — people begin stealing groceries or other necessities they can no longer afford — the Elm City’s property crime rate remains stable.

The city’s social programs, such as The Community Kitchen, which provides food to area residents in need, play a significant role in curbing crime, Keene police Lt. Jay U. Duguay said.

“We’re behind the nation when it comes to economic issues. People are still losing their homes and jobs, but overall we haven’t felt the effects of it yet,” he said. “Right now it’s wait-and-see.”

During the last six years, Keene police have received an average of 490 reports dealing with larceny or theft. Last year they took 667 reports of larceny or theft, the highest number of those types of crimes since 2002, which saw 604 reports.

From the beginning of this year to the end of April, there were 202 reports of larceny and theft, slightly higher than the 147 during the same period last year, and 33 burglaries, which is on par with previous years.

“There’s going to be periods with a little influx, but for the most part it’s steady,” Duguay said. “I was actually kind of surprised at how consistent the numbers were.”

In 2004 and 2005, property crime rates dipped dramatically. While 2003 saw 557 larcenies and thefts, that number hit 272 the following year and then slightly increased to 286 the next year before rising to 455 in 2006.

“We didn’t change our patrol procedures during those times (2004 and 2005) and we weren’t up to full staff. So I don’t know why those years are lower,” Duguay said. “I think the more consistent number is the high number, but thank goodness for the lows.”

Violent crime reports in Keene have also remained steady over the last several years, with an average of 366 assaults annually.

Between 20 and 30 sex assaults are reported in the city each year, though only a small fraction of those cases result in arrests because the others lack sufficient evidence, Duguay said.

Statistics only tell part of the story, though. For the crime victims, the numbers hold little meaning.

The story concludes with anecdotes from townsfolk who either do, or don’t, believe that tough economic times are making Keene’s streets meaner.

I quote at length from it here because I think it captures a moment in time. The story seems to be data-driven, but not in the way that many of us now realize such stories can be. The reporter got some numbers from the police department, and the story quotes a lieutenant’s interpretation of those numbers, but there’s nothing available for an interested citizen to verify or falsify. And there’s no reference to an alternative source — from the US Bureau of Justice — that could confirm, challenge, or otherwise contextualize the numbers.

I hope that my response, below, also marks a moment in time — one in which people didn’t yet demand, governments didn’t yet provide, journalists didn’t yet exploit, and all these groups didn’t yet collaboratively engage with more and better evidence than informs most civic dialogue today.

From time to time, communities ask: Are we having a crime wave? A couple of summers ago it seemed that way. The Sentinel invited TalkBack comments, and one person wrote: “Keene has gone downhill. Once a peaceful, quaint city that was safe, it is no more.”

We shouldn’t have to just speculate about these trends though. We should be able to look at the data and draw reasonable conclusions. Increasingly, we can.

In 2007 I looked, and the first source I found was the data reported by the Keene police department (and every other police department around the country) to the US Bureau of Justice. I noticed a couple of things. First, the numbers showed no uptick in violent crime. But since they stopped in 2005, they didn’t address concern about events in the then-current 2006-2007 period.

Second, because the numbers went back to 1985, they revealed a remarkable anomaly. There was a huge spike in violent crime — assaults and rapes — from 1990 to 1994. You can see the trend plainly in the charts and data I’ve posted at http://jonudell.net/crime/keene-crime.html. What happened then? How should this historical context influence our perception of current trends? I’d love to see the Sentinel ask, and try to answer, these questions.

Since the Bureau of Justice data wasn’t current enough to address the 2006-2007 concerns about crime, I asked the police department to provide me with more recent data. In the end, after multiple requests and some nudging by an attorney, they complied. The snapshot I received, with numbers through July 2007, showed no evidence of a recent uptick in either violent crime or property crime. That was reassuring.

It was also enlightening to compare the raw data in the police spreadsheet to the numbers reported to the Bureau of Justice. They don’t exactly line up. This isn’t nefarious, it’s just what happens when local systems try to mesh with national systems. There is a lot of local variation in the classification of different types of crimes, and room for interpretation when you bundle them into larger categories.

Fast forward to summer 2009. The economy has tanked, and people are again wondering whether we’re having a crime wave. The Sentinel gathered some data, talked to the police, and concluded — I suspect correctly — that as before, the perception of a crime wave is not the reality.

For the reasons I’ve explained, the police department numbers reported in the Sentinel don’t quite line up with those reported to the Bureau of Justice. Consider larceny-theft, for example:

          2003   2004   2005   2006
Sentinel   557    272    286    455
Justice    534    245    235    622

But I do wonder about this:

“Violent crime reports in Keene have also remained steady over the last several years, with an average of 366 assaults annually.”

I hope that’s an error. According to the Bureau of Justice there were at most about 100 violent crimes per year, back in the dark ages of 1990-1994, and we’ve averaged between 40 and 60 per year from then until 2007.

In any case, here’s the larger point. Cities around the country have begun to realize that the operational data of city government can be made available to everyone — citizens as well as journalists — so that we can all monitor the health of our cities in a collaborative way. Crime statistics are one popular category of data, others include restaurant inspections, infrastructure repairs, and licensing.

Nowadays it costs about $100/month to augment a police department’s information system with software that reports current crime statistics online, and also displays the locations of crimes on a map. In New Hampshire, one such system (crimereports.com) has been installed in Exeter, Hampton, Laconia, and Rochester.

I’d love to see the Keene police department join that club. A civic dashboard is part of what I proposed during the Community Visioning Process. But there’s no need to wait until 2028. Cities around the country are creating their dashboards now, and we can too.

Understanding Wikipedia notability

Some fellow residents of my town have recently noticed, and pointed out to me, that I’m listed in Wikipedia as a notable inhabitant of Keene, NH. They’re more impressed than they should be. All forms of notability are subject to bias, but Internet notability is subject to a different kind of bias than most people realize.

For example, friends and family used to be impressed by the fact that I was the top result in Google for my first name — and then second to Jon Stewart for a long while, until I had to reboot my InfoWorld archive. Why? Just because I’ve projected a large surface area of searchable documents whose titles include the trigram jon.

An example of a far more notable person than me is Glenn Fine, who was in my grade in junior high school and is now Inspector General for the Department of Justice. You won’t find him anywhere near the top of a search for his first name because Inspectors General don’t (yet) project a large surface area of documents onto the web.

To place my newfound Wikipedia notability into a similar context, I wanted to show people how these lists of notable inhabitants are made. I figured the person who made the change is somebody who knows of my work, because I’ve written about it so much online, and who is inclined to edit Wikipedia, which correlates with an interest in my work.

I wanted to illustrate exactly who, when, and how, so I went to Wikipedia with the confident expectation that it would be easy to answer those questions.

Surprisingly, it wasn’t. I guess I haven’t really tried searching revision histories in Wikipedia before, but in this case and a few others I’ve tried lately, it seems quite difficult to pinpoint the author of a change.

For example, on Twitter I asked:

Wikipedia: “The term ‘Web 2.0’ was coined by Darcy DiNucci in 1999.” Added when, by whom? WikiBlame seems an ineffective way to find out.

@bazzargh replied: Robert Gehl. http://bit.ly/46r1a

Thanks. By the way, how’d you do that?

switch to 500 view in history, then rough bisection from oldest. Couple of minutes; used this a lot to find long-lived vandalism.

if older, I progressively back off 2..4..8… pages through this. In this case though, there was a clueful log message!
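
In programmatic terms, that bisection amounts to something like this toy Python sketch: given a list of already-fetched revisions in chronological order, find the first one containing a phrase. The revision data here is obviously made up, and the approach assumes the phrase, once added, stays put — which is why long-lived additions and long-lived vandalism are the easy cases:

def first_revision_with(revisions, phrase):
    # revisions: (label, text) tuples, oldest first
    lo, hi = 0, len(revisions) - 1
    if phrase not in revisions[hi][1]:
        return None                       # never appeared (or was removed)
    while lo < hi:
        mid = (lo + hi) // 2
        if phrase in revisions[mid][1]:
            hi = mid                      # appeared at or before mid
        else:
            lo = mid + 1                  # appeared after mid
    return revisions[lo]

revisions = [('rev 1', 'Web 2.0 refers to ...'),
             ('rev 2', 'The term Web 2.0 was coined by Darcy DiNucci ...'),
             ('rev 3', 'The term Web 2.0 was coined by Darcy DiNucci ...')]
print(first_revision_with(revisions, 'Darcy DiNucci'))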

That’s pretty much what I’ve found myself doing when trying to track down changes, so I was glad to know it wasn’t just me. But this highlights an important point about transparency: It’s all relative.

One of the reasons we think of government as opaque is that while records may be notionally public, it takes time, effort, and skill to visit city hall, dig through them, and find what you’re looking for.

I have always regarded Wikipedia as an extreme counter-example. And that’s true. It is radically transparent. You can ultimately find out exactly how any statement in any article came to be. You may not be able to correlate the author’s pseudonym to a real-world identity, but you can evaluate that author’s corpus and reputation within the context of Wikipedia.

And yet, the ability to do this spelunking requires more time, effort, and skill than most people possess. Although I’m reluctant to deflate my status as a notable inhabitant of Keene, I wish it were easier for people who read that to also find out what it does — and doesn’t — mean.

Strategic choices for calendar publishers

Although I haven’t been able to confirm this officially yet, it looks like FuseCal, the HTML screen-scraping service that I’ve been using (and recommending) as a way to convert calendar-like web pages into iCalendar feeds, has shut down.

The web pages that FuseCal has been successfully processing, for several curators participating in the elmcity project, are listed below. They’re a kind of existence proof, validating the notion that unstructured calendar info — what people intuitively create — can be mechanically transformed into structured info that syndicates reliably.

I hope this service, or some future variant of it, will continue. It’s a really useful way to help people grasp the concept of publishing calendar feeds.

But in the long run, it’s a set of training wheels. Ultimately we need to teach people why and how to produce such feeds more directly. All of the event information shown below could be managed in a more structured way using calendar software that produces data feeds for syndication and web pages for browsing.

More broadly, incidents like this prompt us to consider the nature of the services ecosystem we’re all embedded in — as users and, increasingly, as co-creators. In the software business, developers have long since learned to evaluate the benefits and risks of “taking a dependency” on a component, library, or service. Users didn’t have to think too much about that. A software product that was discontinued would keep working perfectly well, maybe for years. But services can — and sometimes do — stop abruptly.

Since the elmcity project is embedded in a services ecosystem, as both a provider and a consumer, how should a curator evaluate service dependencies and their associated risks and benefits? Here are some guidelines.

Many eggs, many baskets

An instance of the calendar aggregator gathers events from three main sources: Eventful (service #1), Upcoming (service #2), and a curated set of iCalendar feeds. A subset of those feeds may (until recently) have been mediated by FuseCal (service #3). So there were three main service dependencies here, and that’s one form of diversification.

But the iCalendar feeds represent another, and more powerful, form of diversification. One may be served up by a Drupal system, one may be an ICS file posted from Outlook 2007, one may be an instance of Google Calendar. Each depends on its own supporting services, but the ecosystem is very diverse.

Data and service portability

The elmcity project isn’t a database of events, but rather an aggregator of feeds of events. What matters in this case is portability of metadata describing the feeds, as well as data describing events. The system depends on Delicious for the management of the metadata. But all this metadata is replicated to Azure for safekeeping.

Since the elmcity project does run on Azure, there’s clearly a strong dependence on that platform’s compute and storage services. But I could run the code on another host — even another cloud-based host, thanks to Amazon’s EC2 for Windows. Likewise I could store blobs and tables in Amazon’s S3 and SimpleDB.

Strategic choices

In this context, the use of FuseCal was a strategic choice. There isn’t a readily available replacement, and that’s a recipe for the sort of disruption we’ve just experienced. But since the system is diversified, that disruption is contained. Was the benefit provided by this unique service worth the cost of disruption? Some curators may disagree, but I think the answer is yes. It was really helpful to be able to show people that informational web pages are implicitly data feeds, and to show what can happen when those data feeds are made explicit.

Still, it was a crutch. Ultimately we want people to stand on their own two feet, and take direct control of the information they publish to the web. FuseCal had to guess which times went with which events, and sometimes guessed wrong. If you’re publishing the event, you want to state these facts unambiguously. And using a variety of methods, as I’ve shown, you can. Those methods are the real strategic choices. If you can publish your own data feed, simply and inexpensively, you should seize the opportunity to do so.


Calendar pages successfully parsed by FuseCal

prescottaz

fallschurchcals

ottawacals

snoqualmie

mashablecity

elmcity

a2cal

whyhuntington

Influencing the production of public data

In the latest installment of my Innovators podcast, which ran while I was away on vacation, I spoke with Steven Willmott of 3scale, one of several companies in the emerging business of third-party API management. As more organizations get into the game of providing APIs to their online data, there’s a growing need for help in the design and management of those APIs.

By way of demonstration, 3scale is providing an unofficial API to some of the datasets offered by the United Nations. The UN data at http://data.un.org, while browseable and downloadable, is not programmatically accessible. If you visit 3scale’s demo at www.undata-api.org/ you can sign up for an access key, ask for available datasets — mostly, so far, from the World Health Organization (see below) — and then query them.

The query capability is rather limited. For a given measure, like Births by caesarean section (percent), you can select subsets by country or by year, but you can’t query or order by values. And you can’t make correlations across tables in one query.

It’s just a demo, of course. If 3scale wanted to invest more effort, a more robust query system could be built. The fact that such a system can be built by an unofficial intermediary, rather than by the provider of the data, is quite interesting.

As I watch this data publication meme spread, here’s something that interests me even more. These efforts don’t really reflect the Web 2.0 values of engagement and participation to the extent they could. We’re now very focused on opening up flexible means of access to data. But the conversation is still framed in terms of a producer/consumer relationship that isn’t itself much discussed.

At the end of this entry you’ll find a list of WHO datasets. Here’s one: Community and traditional health workers density (per 10,000 population). What kinds of questions do we think we might try to answer by counting this category of worker? What kinds of questions can’t we try to answer using the datasets WHO is collecting? How might we therefore want to try to influence the WHO’s data-gathering efforts, and those of other public health organizations?

“Give us the data” is an easy slogan to chant. And there’s no doubt that much good will come from poking through what we are given. But we also need to have ideas about what we want the data for, and communicate those ideas to the providers who are gathering it on our behalf.


Adolescent fertility rate
Adult literacy rate (percent)
Gross national income per capita (PPP international $)
Net primary school enrolment ratio female (percent)
Net primary school enrolment ratio male (percent)
Population (in thousands) total
Population annual growth rate (percent)
Population in urban areas (percent)
Population living below the poverty line (percent living on less than US$1 per day)
Population median age (years)
Population proportion over 60 (percent)
Population proportion under 15 (percent)
Registration coverage of births (percent)
Registration coverage of deaths (percent)
Total fertility rate (per woman)
Antenatal care coverage – at least four visits (percent)
Antiretroviral therapy coverage among HIV-infected pregnant women for PMTCT (percent)
Antiretroviral therapy coverage among people with advanced HIV infections (percent)
Births attended by skilled health personnel (percent)
Births by caesarean section (percent)
Children aged 6-59 months who received vitamin A supplementation (percent)
Children aged less than 5 years sleeping under insecticide-treated nets (percent)
Children aged less than 5 years who received any antimalarial treatment for fever (percent)
Children aged less than 5 years with ARI symptoms taken to facility (percent)
Children aged less than 5 years with diarrhoea receiving ORT (percent)
Contraceptive prevalence (percent)
Neonates protected at birth against neonatal tetanus (PAB) (percent)
One-year-olds immunized with MCV
One-year-olds immunized with three doses of Hepatitis B (HepB3) (percent)
One-year-olds immunized with three doses of Hib (Hib3) vaccine (percent)
One-year-olds immunized with three doses of diphtheria tetanus toxoid and pertussis (DTP3) (percent)
Tuberculosis detection rate under DOTS (percent)
Tuberculosis treatment success under DOTS (percent)
Women who have had PAP smear (percent)
Women who have had mammography (percent)
Community and traditional health workers density (per 10 000 population)
Dentistry personnel density (per 10 000 population)
Environment and public health workers density (per 10 000 population)
External resources for health as percentage of total expenditure on health
General government expenditure on health as percentage of total expenditure on health
General government expenditure on health as percentage of total government expenditure
Hospital beds (per 10 000 population)
Laboratory health workers density (per 10 000 population)
Number of community and traditional health workers
Number of dentistry personnel
Number of environment and public health workers
Number of laboratory health workers
Number of nursing and midwifery personnel
Number of other health service providers
Number of pharmaceutical personnel
Nursing and midwifery personnel density (per 10 000 population)
Other health service providers density (per 10 000 population)
Out-of-pocket expenditure as percentage of private expenditure on health
Per capita total expenditure on health (PPP int. $)
Per capita total expenditure on health at average exchange rate (US$)
Pharmaceutical personnel density (per 10 000 population)
Physicians density (per 10 000 population)
Private expenditure on health as percentage of total expenditure on health
Private prepaid plans as percentage of private expenditure on health
Ratio of health management and support workers to health service providers
Ratio of nurses and midwives to physicians
Social security expenditure on health as percentage of general government expenditure on health
Total expenditure on health as percentage of gross domestic product

New England still too wet. Escaping to sunny Old England.

We’re just back from a Caribbean vacation — with a couple of interesting souvenirs in tow. Under normal circumstances I’d feel a twinge of regret about turning around a day later and heading out again. But I’m not really in the mood to build an ark, which after 40 days of rain is about to become the new summer sport here in New England. And while the wet isn’t letting up yet here, the weather looks lovely in Old England. So it’s actually a great time to head off to London for a Tuesday visit and talk at Nature Publishing, panels and a talk at the Activate conference on Wednesday, and another talk at the Guardian on Thursday. That one is open to the guests — for the first time, I gather. The writeup also notes:

Many people will then head down to the Rotunda bar for drinks on the canal waterfront after the talk at about 6.

In all these venues I’ll be expanding on the themes I’ve written about here lately: collaborative curation, computational thinking for everyone, community calendars as a motivating case study, and Azure as a platform for doing stuff in the cloud.

By the time I get home for July 4, it ought to be dry here. If not, I’ll break out my cubit-calibrated tape measure and get to work on that ark.

It’s the headings, stupid!!!

My recent adventure in naming the times of day was so much fun that I lost track of the original purpose of the exercise, which was to improve accessibility for sight-impaired users.

When I interspersed time-of-day labels into each day’s event listing, I used HTML DIV tags. Wrong, wrong, wrong! Those labels are structural elements, and as my accessibility consultant Susan Gerhart gently reminded me, screen readers depend on HTML headings to find and announce them. The labels should have been second-level headings — i.e., HTML H2 tags.

It gets worse. When Susan prompted me to take another look at what I’d done, I found that the date labels were inexplicably tagged as paragraphs (P) instead of the top-level headers (H1) that they logically are.

Oh. Right. Of course. Duh. Fixed. Sorry.

What was I thinking? How could somebody like me, who has preached about the attention-focusing power of heads, decks, and leads, screw up something so basic as this?

Easily, as it turns out, in the absence of feedback. If you yourself don’t depend on a design feature, there is a natural tendency to forget why it matters to others.

Coincidentally (or not) Susan recently wrote an essay, and published a companion audio recording, that will help me — and I hope others — not to forget again. Entitled Hear Me Stumble Around White House, Recovery, and Data GOV web sites, it’s a blow-by-blow account of her efforts to navigate those sites with a screen reader.

In this recording you can hear Susan and her screen reader trying to make sense of whitehouse.gov. If you’ve never heard a screen reader in action, it’s worth listening for that alone. You’ll get a very clear sense of how these tools depend on the hierarchy of the page.

Simultaneously you’ll hear Susan narrate her intention — to read an article about cybersecurity — and her frustration. For example:

I was thrown off by the slide show at the top of the page. Once I hit the cybersecurity story, the next time I traverse this section the story was about the Supreme Court nominee.

Despite this randomness, the page does at least identify the top stories with H1 tags. And Signed Legislation is an H2. But none of the headlines under Signed Legislation are H3s, they’re Ps.

Over at recovery.gov and data.gov Susan finds no headings at all, and reacts to their omissions less gently than she did to mine:

It’s the headings, stupid!!!

Thanks. I will try not to forget that again.


PS: In a follow-up to her blog essay, Susan links to detailed reports by accessibility pioneer Jim Thatcher on the issues he found with data.gov and recovery.gov.

Endangered languages and linguistic best practices

Daniel Everett’s recent Long Now talk about endangered languages (writeup, mp3) includes this gem reported by Stewart Brand:

Among other things, the wide variety of verb forms are used to account for the directness of evidence for a statement. Everett originally went to the Pirahã in 1977 as a Christian missionary. They challenged him to provide evidence for the existence of Jesus, and lost interest when he couldn’t. Eventually so did he. The Pirahã made him an atheist.

This is so interesting that it’s worth unpacking for those who won’t have time to listen. Among the sixteen suffixes for verbs, there are three that convey the source of evidence:

I heard that Dan went fishing.

I saw Dan go fishing.

I deduce, from the available evidence, that Dan went fishing.

These assertions might not be true. The Pirahã, being human, do sometimes lie. But I love the idea of a culture in which evidence-based thinking is baked into the language.

There are only a few hundred Pirahã, and their language is only one of thousands — more than half unwritten — that are endangered. The talk ends with a plea to preserve and document those languages.

It has never been easier to capture and disseminate recorded audio, or to collaboratively curate such material, so I hope these capabilities will be put to good use in the quest to preserve linguistic diversity.

But no matter what, we’re going to continue to lose languages. Maybe, though, if we can identify some of the ways of thinking encoded in those languages, we can carry them forward.

Respect for the source of evidence is a great example. I could have simply told you about what Daniel Everett said, and what Stewart Brand wrote about what Daniel Everett said. But it was possible to form links to the audio and text, so I did.

I wonder how many other best practices are encoded in those thousands of endangered languages. And I wonder if it might be possible to identify and catalog more of them.

When does afternoon begin?

When I invited folks to become calendar curators for the elmcity project, the person who stepped forward in Prescott AZ was Susan Gerhart, whom I interviewed here. One of her great insights about web design is that the right thing for a vision-impaired user is almost always also the right thing for everyone. She calls this the curb cuts principle:

Curb cuts for wheelchairs also guide blind persons into street crossings and prevent accidents for baby strollers, bicyclists, skateboarders, and inattentive walkers.

So I shouldn’t have been surprised when Susan noticed that the HTML rendering of the calendar needs some curb cuts. Within each day, the events show up as a long undifferentiated list. She suggested that subdividing the list by time of day — morning, afternoon, evening — would be helpful to folks using screen readers. But in fact, it’s just plain helpful. So I’m testing a version of that idea now.

Ironically, I was just thinking about this same principle in another context. The new version of Oakland Crimespotting, which I raved about, segments incidents using this vocabulary:

light, dark, commute, nightlife, day, night, swing shift

In that spirit, I’m trying this:

morning, lunch, afternoon, evening, night

This of course leads to the question: When do these times begin and end?

I was fascinated to see that both Google and Bing return the same Yahoo answers page for the query morning afternoon evening.

For now, though, I’m going with this ruleset:

  Morning:  5:00 AM to 11:30 AM
    Lunch: 11:30 AM to  1:00 PM
Afternoon:  1:30 PM to  5:30 PM
  Evening:  5:30 PM to  9:00 PM
   Night:   9:00 PM to  5:00 AM

But I’ll make these rules — and maybe even the time-of-day names — configurable on a per-location basis.
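
To see what “configurable” might mean in practice, here is a minimal sketch (mine, not the elmcity code) of a per-location ruleset. The names and boundaries are just the defaults listed above, and an event that falls outside every range is left unclassified:

from datetime import time

DEFAULT_BUCKETS = [            # each entry: (name, start, end), end exclusive
    ("morning",   time(5, 0),   time(11, 30)),
    ("lunch",     time(11, 30), time(13, 0)),
    ("afternoon", time(13, 30), time(17, 30)),
    ("evening",   time(17, 30), time(21, 0)),
    ("night",     time(21, 0),  time(5, 0)),   # wraps past midnight
]

def time_of_day(t, buckets=DEFAULT_BUCKETS):
    """Return the bucket name for a datetime.time value."""
    for name, start, end in buckets:
        if start <= end:
            if start <= t < end:
                return name
        else:                        # range wraps around midnight
            if t >= start or t < end:
                return name
    return "unclassified"

print(time_of_day(time(20, 0)))      # evening
print(time_of_day(time(2, 30)))      # night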

Bulk search-and-replace for blog entries

Last night I realized there was one more step needed to restore my 2002-2006 archive. All of my references into that archive from this blog, which started in December 2006, had to be redirected. What’s more, they had to be remapped. Old URLs like http://weblog.infoworld.com/udell/2006/12/04.html#a1571 had to become new URLs like http://jonudell.net/udell/2006-12-04-hunting-the-elusive-search-strategy.html.

Even without the remapping, it’s not obvious how to do a simple search and replace (say, from weblog.infoworld.com/udell to jonudell.net/udell) across a set of blog entries. I tried the export/edit/import route, but — at least in the case of WordPress — that doesn’t seem to offer a way to update existing entries.

So I wound up writing a script that uses the MetaWeblog API to fetch my current blog entries, find references to the old namespace, adjust them to point to the new namespace, and update the entries. It’s here for my own future reference, and for yours if you need it.
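
For anyone curious about the shape of that approach, here is a rough sketch (not the script linked above) using the MetaWeblog API from Python. The endpoint, credentials, and namespaces are placeholders:

# Rough sketch of the MetaWeblog-based search-and-replace approach described
# above (not the actual linked script). Endpoint, credentials, and namespaces
# are placeholders.
import xmlrpclib   # Python 2; use xmlrpc.client in Python 3

OLD = "http://weblog.infoworld.com/udell"
NEW = "http://jonudell.net/udell"

server = xmlrpclib.ServerProxy("http://blog.example.com/xmlrpc.php")
posts = server.metaWeblog.getRecentPosts("blogid", "user", "password", 1000)

for post in posts:
    content = post["description"]
    if OLD in content:
        post["description"] = content.replace(OLD, NEW)
        # the final True asks the server to keep the entry published
        server.metaWeblog.editPost(post["postid"], "user", "password",
                                   post, True)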

As always in these situations, I end up wondering what a civilian would do. Blog publishing systems don’t seem to have bulk search-and-replace capability. They do, however, have APIs. There could be a tool or service that helps people make these kinds of changes. It’d be hard to avoid the password anti-pattern, so if this were a cloud-based service rather than a locally-installed tool you’d want to change your password after using it. But still, it should be doable.

Do such tools or services exist?

Rebooting my 2002-2006 archive

While spot-checking my mostly-reconstructed 2002-2006 blog, I found this plaint from 2002:

When you are a writer whose entire corpus exists online, woven into a fabric of citation and commentary, it is incredibly painful to see that fabric torn apart.

Déjà vu all over again. In 2002 I had to sacrifice the linkage to my 1999-2002 BYTE.com archive and restore it here. Now I’ve done the same for my 2002-2006 InfoWorld blog. Since its former namespace isn’t being redirected, and since all the old links were broken anyway, I’ve taken this opportunity to create new descriptive names that incorporate dates and titles.

The reboot isn’t 100% clean, but it’s automated and reproducible so I can address categories of problems as they show up.

I’m glad I’m not in publishing anymore. It turns out to be a lousy way to keep your stuff published. When a commercial hosted lifebits service comes online, I’ll be customer #1.

Scribbling in the margins of iCalendar

Last week I mentioned three ways for elmcity curators to categorize events:

  1. If a source iCalendar feed uses the CATEGORIES property, those categories will be included.

  2. If all of the events from a feed can be categorized, you can name that category in the Delicious metadata, using category=CATEGORY. All events from the feed will inherit it in the same way that they all inherit the default clickthrough link specified with url=URL.

  3. If all of the events from an Upcoming or Eventful venue can be categorized, you can also name that category in the Delicious metadata. To do that, bookmark the venue URL and use the patterns venue={UPCOMING|EVENTFUL} and category=CATEGORY.

Now I’ve added a fourth. In any iCalendar app you can now use these patterns in the Description field:

url=http://www.harlowspub.com

category=music,bluegrass

The url=… and category=… patterns can occur anywhere in the description.
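
Picking those patterns out of a description takes very little machinery. Here is a toy illustration (my own, not the elmcity code):

import re

description = ("The Birch Benders host a Bluegrass picking party at Harlow's "
               "Pub in Peterborough every Monday night - 8 pm until they kick "
               "us out (11 or so). url=http://www.harlowspub.com "
               "category=music,bluegrass")

url_match = re.search(r'url=(\S+)', description)
cat_match = re.search(r'category=(\S+)', description)

url = url_match.group(1) if url_match else None
categories = cat_match.group(1).split(',') if cat_match else []

print(url)          # http://www.harlowspub.com
print(categories)   # ['music', 'bluegrass']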

This is particularly useful for recurring events. As discussed here, recurring events are a great way to build critical mass because your curation effort keeps paying dividends.

For example, one of the events I found when exploring the search page for Keene is the Monday night bluegrass jam at Harlow’s Pub.

Here’s the description I entered into Windows Live Calendar — which also could have been entered into Google Calendar, or any other iCalendar app:

The Birch Benders host a Bluegrass picking party at Harlow’s Pub in Peterborough every Monday night – 8 pm until they kick us out (11 or so). url=http://www.harlowspub.com category=music

Here’s the rendered result:

Mon 08:00 PM Bluegrass night with the Birchbenders (recurring events) (music)

The same data shows up in the downstream XML, ICS, and JSON feeds.

Since the iCalendar spec allows for a CATEGORIES property, this approach shouldn’t be necessary. But not all calendar apps allow you to tag events in this way. Outlook does, but Google Calendar, Live Calendar, and Apple iCal don’t.

Fortunately we can scribble in the margins. I first used that phrase in an InfoWorld story about a feature of the Internet’s Domain Name System called the TXT record. Although it is possible to define more specific record types, it’s hard to get everyone to agree to use them. So developers have historically “scribbled in the margins” of the DNS. And we can do the same with iCalendar.


PS: The title of that InfoWorld story was actually Filling in the Margins, which wasn’t what I wrote and which I never liked. The title I wrote was Scribbling in the Margins, and I used it for the blog entry that introduced the InfoWorld article. I’ll have that entry back online soon, along with the rest of my archive from that era. But meanwhile, when I search for the title using doublesearch, I notice an interesting point of comparison between Google and Bing. It’s been over a month since that blog archive went dark, and Google has now evidently forgotten about it. But Bing remembers. I don’t have any special insight into how Bing works, but I’ll be interested to see how long it keeps remembering.

Replaying history

In his writeup on Google Wave, Dare Obasanjo says:

I’m sure there are thousands of Web developers out there right now asking themselves “would my app be better if users could see each others’ edits in real time?”,”should we add a playback feature to our service as well” [ed note – wikipedia could really use this] and “why don’t we support seamless drag and drop in our application?”. All inspired by their exposure to Google Wave.

Indeed, every application that preserves a change history needs playback. Wikipedia, as Dare notes, is a prime candidate. Back in 2006, I made this LazyWeb request:

Animation is the best way to visualize the flow of change, as I discovered when I made my Wikipedia screencast. For Wikipedia, and indeed for all kinds of living documents supported by revision history and diff tools, I can imagine being able to isolate a paragraph or section and autogenerate the screencast of its evolution. I can even imagine the content of such visualizations being considered not just cutting-room floor debris but, rather, part of the “real” document, like footnotes.

Andy Baio responded by sponsoring a contest for a tool that would do just that. And I made a screencast demonstrating Dan Phiffer’s winning entry.

That script is unavailable at the moment because, ironically, Dan’s server reports:

Oh noes! I got HACK*D. I’m sifting through my files and should restore things back to normal soon.

In any case, it probably wasn’t practical for routine use. Fetching every revision on the fly really hammers Wikipedia. What’s really needed — again, not just for Wikipedia but everywhere — is a general way to query change history, and return a stream of versions and differences.
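
To make that concrete for the Wikipedia case, here is a rough sketch of such a query against the MediaWiki API. The article title and revision count are arbitrary, and a real tool would need paging, throttling, and caching to avoid the hammering problem just mentioned:

# Rough sketch: fetch the two most recent revisions of a Wikipedia article
# via the MediaWiki API and print their differences.
import json
import urllib
import urllib2
import difflib

params = urllib.urlencode({
    "action": "query",
    "prop": "revisions",
    "titles": "Endangered language",   # arbitrary example article
    "rvprop": "content",
    "rvlimit": 2,
    "format": "json",
})
url = "http://en.wikipedia.org/w/api.php?" + params
data = json.loads(urllib2.urlopen(url).read())

page = data["query"]["pages"].values()[0]
newer, older = [rev["*"] for rev in page["revisions"]]

for line in difflib.unified_diff(older.splitlines(), newer.splitlines(),
                                 "older", "newer", lineterm=""):
    print(line)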

One way of doing the latter would be to use FeedSync, an open extension to RSS/Atom that supports synchronization in Live Mesh. Another would be to use Google’s Wave protocol. Because FeedSync deals with lists of items, which can be arbitrary chunks of content, whereas Wave deals with lists of document-mutation operations, like delete-element and start-annotation, it seems to me that FeedSync is more general, albeit less immediately useful for collaborative editing.

To explain why generality matters, consider change animation in a very different domain: software configuration. My wife, for example, sometimes changes her settings — in Word or Firefox — in ways that cause problems. If these apps persisted their settings to Live Mesh, as they could and arguably should, I’d be able to debug a mishap locally or remotely. But ideally, the change visualization would be sufficiently user-friendly so that she’d have a shot at figuring it out for herself.


PS: Speaking of history and restoration, I’ve been feeling like an amnesiac ever since my InfoWorld archive went dark. So in spare moments I’ve been reconstructing and republishing it. I’ll have the text of all the old blog entries up soon. And I’ve been restoring the screencasts as well. I’m keeping track of my progress at delicious.com/judell/screencast+restored.

More usefully cool stuff from Stamen

My plumber’s last name is Thieme. I was just looking up his phone number, and got distracted when I realized that the people search in Bing does a fair job of visualizing the geographic distribution of surnames. If you do a people search for Thieme, New Hampshire, and start panning around at county and state resolutions, you can see where Thiemes have clustered and where they haven’t.

As I was doing this, I suddenly realized: Why don’t maps offer named zoom levels? If you want to pan across the country at state or county resolution, it requires an enormous amount of continuous zooming in and out. Of course the sizes of states and counties vary as you move across the country. But that’s the whole point. Computers can do the math and automate those adjustments.

What prompted this thought was the newly-redesigned Oakland Crimespotting, which features a nifty new widget for selecting times of day. Stamen Design’s Eric Rodenbeck, whom I recently interviewed, calls it the time pie. It’s fun to spin your way through the hours, making contiguous or discontiguous selections. But what’s really useful are the named slices: light, dark, commute, nightlife, day, night, swing shift. As Stamen’s blog notes:

The last time slices (day, night and swing) are the ways that the police view this information, and one thing we hope will come from the project is a better understanding of how the police view their data as it’s collected.

Nice!

What you may not notice, as you navigate the new interface, is that every adjustment is reflected in an exquisitely detailed URL. It’s not obvious because the URLs are really long, and the changes happen outside the visible part of the browser’s location window. But watch:

Default: http://oakland.crimespotting.org/map/#dtend=2009-06-04T20:35:28-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-28T20:35:28-07:00

Hide all crime types: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-28T23:59:59-07:00

Show all and extend dates to max range: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=AA,Mu,Ro,SA,DP,Na,Al,Pr,Th,VT,Va,Bu,Ar&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Narcotics only: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=0-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Nighttime narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=16-23&zoom=14&dtstart=2009-05-08T00:00:00-07:00

Wee hours narcotics: http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00&lat=37.806&types=Na&lon=-122.270&hours=1-4&zoom=14&dtstart=2009-05-08T00:00:00-07:00
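
Everything needed to reconstruct a view is carried in that fragment, so it is trivial to unpack. A quick sketch (mine, not Stamen’s):

import urlparse   # Python 2; use urllib.parse in Python 3

url = ("http://oakland.crimespotting.org/map/#dtend=2009-06-04T23:59:59-07:00"
       "&lat=37.806&types=Na&lon=-122.270&hours=1-4&zoom=14"
       "&dtstart=2009-05-08T00:00:00-07:00")

fragment = urlparse.urlparse(url).fragment
state = dict(pair.split("=", 1) for pair in fragment.split("&"))

print(state["types"])   # Na
print(state["hours"])   # 1-4
print(state["zoom"])    # 14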

As noted on the Stamen blog, this means that:

It’s now possible to navigate and link to recent newsworthy events like the assassination of journalist Chauncey Bailey, the Oscar Grant riots from January 2009, and the Lovelle Mixon incident from this past March.

The Stamen crew is renowned for brilliance, and rightly so. But the principles at work here — thoughtful naming, granular linking — are ones that we all can and should practice, in the many small ways that we can as we explore and co-create the infosphere.

Categorizing events

Curation is always a two-step tango. First you collect, then you categorize. Until now, the elmcity project has been all about collecting. But as the nodes of this network of community hubs start to light up, and as curators gather growing numbers of calendar feeds, it’s time to start enabling them to categorize as well.

This is a classic hard problem. How do you get people to tag hundreds or thousands of items? What makes the problem even harder, in the domain of events, is that once those items fade into the past, any effort invested in tagging them is lost.

My answer is, at least for now: Don’t worry too much about tagging individual events. Instead, gain leverage by finding ways to tag sources of events. Here are two good strategies:

1. Categorizing iCalendar feeds

The obvious place to start is with the iCalendar feeds that curators are collecting. There’s already a mechanism in place to capture metadata about those feeds. Here, for example, is the iCalendar feed for the 2009 Board of Supervisors meetings in Prescott, AZ:

http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c

That’s an iCalendar feed that was made from this web page:

http://www.co.yavapai.az.us/Events.aspx/id=32794

If you check the Delicious metadata for Prescott’s iCalendar feeds, you’ll see this structure:

title: Board of Supervisors
  url: http://fusecal.com/calendar/ical/3200531?h=b75b09c8-50c2-11de-9169-00163e12298c
  tag: trusted
  tag: ics
  tag: feed
  tag: url=http://www.co.yavapai.az.us/Meetings.aspx/folderid=1488&year=2009
  tag: category=government

The url= tag was already there. It provides the all-important link back to a human-readable authoritative source for events coming from this feed. It’s best if individual events provide their own links, but often in iCalendar feeds they don’t, so this is the default link.

What’s new is the category= tag. Now all events coming from this feed will carry that category. For example:

Mon Jun 15 2009


Regular Meeting – Cottonwood N/A
(Board of Supervisors)
(government)

The same info travels downstream, to the aggregated Prescott iCalendar feed:

BEGIN:VEVENT
CATEGORIES:government
DESCRIPTION:Regular Meeting - Cottonwood N/A \n\n****************\nfrom  FuseCal.com\n ******************************\n\n
DTSTART;VALUE=DATE:20090615
LOCATION: (see http://www.co.yavapai.az.us/Events.aspx?id=32794)
SEQUENCE:0
SUMMARY:Regular Meeting - Cottonwood N/A         
UID:633797255542010000-1196352865@elmcity.cloudapp.net
URL:http://www.co.yavapai.az.us/Events.aspx?id=32794
END:VEVENT

And to the aggregated XML feed:

<event>
<title>Regular Meeting - Cottonwood N/A</title>
<url>http://www.co.yavapai.az.us/Events.aspx?id=32794</url>
<source>Board of Supervisors</source>
<dtstart>2009-06-15T00:00:00</dtstart>
<categories>government</categories>
</event>

This strategy only works, of course, for feeds that can be categorized. And that won’t always be true. Events coming from the ReadItNews feed don’t fit into any single category (or short list of categories). So they’ll remain untagged for now. That’s OK. Better to make some progress than to make none. This partial approach yields a nice return on investment. And thanks to the bulk editing feature of Delicious, it’s really quick and easy to select a set of feeds and then tag them with a category= tag.
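
To show how little machinery the convention requires, here is a toy sketch (not the elmcity code) that turns a feed’s Delicious tags into its default link and category:

tags = [
    "trusted",
    "ics",
    "feed",
    "url=http://www.co.yavapai.az.us/Meetings.aspx/folderid=1488&year=2009",
    "category=government",
]

metadata = {}
for tag in tags:
    if "=" in tag:
        key, value = tag.split("=", 1)     # split only on the first "="
        metadata[key] = value

default_url = metadata.get("url")
default_category = metadata.get("category")

print(default_url)        # the human-readable source page for the feed
print(default_category)   # government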

2. Categorizing Eventful and Upcoming venues

We can use a variation of this strategy to categorize sources of events coming from Eventful and Upcoming. In this case, the lever is the venue. Not all venues host events that can be categorized. But some do, and in those cases, why not exploit that?

The strategy here is to bookmark and tag the event’s venue URL from Upcoming or Eventful. Here are two examples:

Upcoming

title: Venue: Prescott YMCA - Upcoming
  url: http://upcoming.yahoo.com/venue/435420
  tag: venue=upcoming
  tag: category=recreation

Eventful

title: Venue: Raven Cafe
  url: http://eventful.com/prescott/venues/raven-cafe-/V0-001-000366078-7
  tag: venue=eventful
  tag: category=music

If you check the default HTML view of Prescott’s aggregated events, you’ll see that these categories indeed show up. They’re also in the downstream XML, ICS, and JSON feeds.

But can’t the source iCalendar feeds provide per-event categories?

Yes, some do. In the case of Prescott, the public library’s iCalendar feed uses the CATEGORIES property, so those categories show up too. For example:

Thu 02:00 PM
Sign up for Computer Mentor
(Prescott Library)
(Adult Computer Class,library)

Here we see a list of two categories. The first item, Adult Computer Class, was in the original iCalendar feed. The second item, library, was inherited from the feed metadata specified by the curator.
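
The merge rule is just a union of the two sources. A sketch, not the actual implementation:

def merge_categories(event_categories, feed_category):
    """Combine per-event CATEGORIES with the curator's feed-level category."""
    merged = list(event_categories)
    if feed_category and feed_category not in merged:
        merged.append(feed_category)
    return merged

print(merge_categories(["Adult Computer Class"], "library"))
# ['Adult Computer Class', 'library']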

There’s a long way to go with this stuff. But this is a nice start!

Talking with Jamie Heywood about PatientsLikeMe

Jamie Heywood joined me for this week’s Innovators show. His quest to cure ALS (Amyotrophic Lateral Sclerosis, aka Lou Gehrig’s Disease) is featured in a book and a movie. In this conversation, we explore Jamie’s current project: PatientsLikeMe. It’s a website where people pool data about their medical conditions, their drug regimes and related therapies, and their outcomes.

Of course people have been sharing medical information online since it became possible to do so. But PatientsLikeMe differs from other online health communities in several ways. The profile of a user is someone who is grappling with a serious, life-changing illness where:

  • You are very debilitated, perhaps even unable to go to work.

  • You can tell if your treatment is helping. (If you have Parkinson’s disease or depression, for example, you can judge what works or doesn’t. If you have breast cancer, you can’t.)

  • You are in a situation where both diagnosis and treatment are ambiguous.

The data that you report brings you into direct contact with other patients who share similar conditions and treatments. In this sense, PatientsLikeMe is a uniquely data-driven social network:

It is the richest open quantified human-to-human network that exists. There are a couple of hundred measured channels on which you can evaluate yourself against everyone else that you might be interested in connecting to. And you can go across any of those channels to anyone else in the world.

The data you report also brings you into direct contact with drug companies:

It connects you with the people who are developing the drugs to treat your disease. This cuts out an immense amount of inefficiency and middlemen, and can potentially make the system much better. It’s a way of rationalizing and accelerating discovery.

For that reason, Jamie sees no need to apologize for PatientsLikeMe’s business model, which is to sell the data it collects to drug companies. This arrangement may even, arguably, be a form of citizen science:

Do I think that we’ll be using crowdsourcing to interpret the RNA signature in blood? No. But in the real world, when you ask what it means to have ALS, each patient in the system is a representative of their own specific phenotype of this illness. Which is a way of putting it into the process of discovery. Because if you’re not in there — if you’re different, and everyone is unique in some way — the specific components of your own health and its impacts on your life will not be addressed in the process of treatment.

What about privacy? Jamie admits, honestly, that there can be no guarantees, and does not think people who expect guarantees should use PatientsLikeMe. It isn’t for everyone. But there are a number of folks who, after evaluating the risk of participating (pseudonymously) in the service, conclude that the benefit outweighs that risk. They are part of a collective experiment that I will be watching with the greatest interest.

Useful feedback from old friends and new friends

In the last few days I’ve received useful feedback on the elmcity project from an old friend (whom I’ve never met in person), and a new friend (whom I have). The old friend is Jake Ochs, an accomplished technologist who, like John Faughnan, was a valued online correspondent back in the BYTE era. The new friend is Mykel Nahorniak, whom I met at Transparency Camp 2009. Mykel is cofounder of the social event listing platform Localist, and has been curating the elmcity project’s Baltimore hub.

Both Mykel and Jake are intrigued by the elmcity project, but are skeptical about the approach and likely outcome. Here’s Mykel, quoted with permission from email:

It’s already a challenge to convince a local venue that they need a Web site, let alone a Twitter presence, let alone an iCal feed. I think the return a lot of businesses are seeing from social media has helped motivate these local businesses, though.

Really, it’s about giving them a tangible return on their efforts. What incentive do these businesses have to curate their calendars in a specific format when, realistically, it’s not going to equal the return they’d get on, say, curating a Twitter account. That’s what needs to be determined on our end. Specific examples that would give a business no excuse to say “no.”

And here’s Jake, writing on his blog:

I can’t help but feel that Jon is missing the bigger picture. Well, he’s “getting” the bigger picture -that calendar-ish data will probably be a “big” thing. His recombinant approach to existing tools and ideas, though, probably isn’t it. The ability to create such mashups is a hallmark of the “Web 2.0” era and Jon, once again, displays his masterful ability to create something powerful from simple, existing substrates. Historically, it’s been the entrepreneurs that somehow grasp a simple concept regarding human behavior -or an evolved human behavior- and bring that concept to bear on a traditionally complex problem that win out in the marketplace. I don’t have any idea what that concept will look like, so don’t ask, but I highly doubt that it will contain the recombinant DNA of existing solutions when it debuts.

Mind you, I said when it debuts. After the magical mystery viral calendar tool of the future gains traction, a clamor will be made for an API that will draw the tool into the prevailing social tapestry. (Facebook and Twitter today, who knows what tomorrow?) I wonder, though, will iCal make it into that mix when the day comes or is iCal’s fundamentally one-way nature not be up to the task of the wonder collaboration of tomorrow?

Lately I’ve been pitching my project to folks who don’t dwell in the geek ghetto. And I’ve been telling plain stories that seem to resonate — at least in the old-fashioned way, one-on-one and face-to-face. Here’s one of them:

The Monday night chess club

The chess club in Keene gets together on Monday nights at 6:30. They used to gather at the Best Western hotel. Then they switched to the E.F. Lane hotel. For at least a year after the move, the Keene Sentinel’s community bulletin board continued to list the event at the Best Western. If the chess club had published its own authoritative feed, and communicated the address of that feed to the Sentinel — instead of transmitting a copy of soon-to-be-stale data — there might be a few more chess players showing up on Monday nights.

Why should businesses want to publish information in a syndication-friendly format? Because, like all of us, they want to be the authoritative source for information about themselves. And because they don’t want to have to remember, and refresh, every touchpoint to which they have transmitted data by value rather than by reference.

Is iCal’s “fundamentally one-way nature up to the task of the wonder collaboration of tomorrow?” True, iCalendar is a decade-old standard that has never rocked the Internet, and maybe never will. But one-way? That limitation exists only in the eye of the beholder. The chess club can publish a calendar that the Keene Sentinel can subscribe to.1 The Sentinel, in turn, can aggregate those subscriptions into a combined calendar that members of the chess club — and others in the community — can subscribe to. Those other individuals and organizations can also be publishers and subscribers. The system I am building is not really about iCalendar. It’s about the principles, patterns, and practices that make pub/sub ecosystems such fertile ground for communication and collaboration.

Of course Mykel and Jake are right, and I value their skepticism. I haven’t yet figured out how to make the chess club anecdote go viral, or tell it in a way that businesses can’t say no to. But I’m warming to the task, and I’m starting to connect with environmental activists, librarians, civic-minded geeks, and colleagues who can help me advance the story.


1 The infrastructure that I’m building is dedicated to this purpose. If you’re a newspaper, a library, a chamber of commerce, or some other natural attention hub in your community, I want to help you syndicate calendars through your hub.

A conversation with Eric Rodenbeck about usefully cool design and engineering

My guest for this week’s ITConversations show is Eric Rodenbeck, the creative director of Stamen Design. His 2008 ETech talk wowed me, and inspired this meditation on time, space, and data.

Near the end of this interview, as we were discussing the tension between graphic design and engineering sensibilities, Eric said:

When it was just me, working as a designer, I was having fun, but I wasn’t able to be effective. And when Mike [Michal Migurski, Stamen’s technical architect] was doing tech work for PR companies, it wasn’t all that great. But when we came together, suddenly we had something.

Even in a design studio that we control, though, it’s hard to address that split between the lush sexy design versus the tech. Versus! Why is it always versus?

Exactly. Eric also notes another false dichotomy: cool versus useful. We violently agreed that coolness and utility are two sides of the same coin.

For that reason, it would be fun to also talk to Eric’s technical partner Mike Migurski. In this interview, we learn that he created the original API for Oakland Crimespotting by scraping this police site, which (still) produces map images like this:

Mike’s task was to identify and locate incidents by writing code that would scan those images for “purple bras, boxing gloves, and hypodermic needles.” Which is funny, but also sad. So many more usefully cool things will be able to happen when publishers of data finally start to learn how to publish data.

IronPython and the elmcity project: Together again

In the first installment of this elmcity+azure series my plan was to build an Azure-based calendar aggregator using IronPython. That turned out not to be possible at the time, because IronPython couldn’t run at full strength in Azure’s medium-trust environment. So I switched to C#, and have spent the past few months working in that language.

It’s been a long while since I’ve worked intensively in a compiled and statically-typed language. But I love being contrarian. At a time when low ceremony languages are surging in popularity, I’m revisiting the realm of high ceremony. It’s been an enjoyable challenge, I’ve gotten good results, and it’s given me a chance to reflect in a more balanced way on the “ceremony vs. essence” dialogue.

Meanwhile, Azure has moved forward. It now provides a full-trust environment. That means you can run PHP, which is interesting to a lot of folks, but it also means you can run IronPython, which is interesting to me.

In this entry I’ll show you how I’m starting to integrate IronPython in the two main components of my Azure project: the web role that provides the (currently minimal) user interface, and the worker role that does calendar aggregation.

Using IronPython in an ASP.NET MVC Azure web role

The elmcity service writes a lot of log data to an Azure table. I’ll want curators to be able to query the slices of that log that pertain to the cities whose calendars they are curating. For Providence, RI, which uses the elmcity (and delicious) id mashablecity, the URLs for those queries might look something like this:

/services/mashablecity/log_info (log entries of type “info”)

/services/mashablecity/log_exception (log entries of type “exception”)

Here’s an URL route to carve out a namespace shaped like that:

routes.MapRoute(
  "services",
  "services/{id}/{what}",
  new { controller = "LogServices", action = "QueryLog", id = "", what = "" }
  );

Here’s a simplified version of the corresponding LogServicesController.cs:

[HandleError]
public class LogServicesController : Controller
  {
  public ActionResult QueryLog(string id, string what)
    {
    return new ObjectResult(id, what);
    }
  }

public class ObjectResult : ActionResult
  {
  string id;
  string what;

  public ObjectResult(string id, string what)
    {
    this.id = id;
    this.what = what;
    }

  public override void ExecuteResult(ControllerContext context)
    {
    switch (this.what)
      {
      case "log_info":
        // fetch the IronPython script for this query, run it, and return
        // its output as a plain-text HTTP response
        var script_url = make_script_url(this.id, this.what);
        var args = new List<string>() { this.id, this.what };
        var result = new ContentResult
          {
          ContentType = "text/plain",
          Content = Utils.run_ironpython(script_url, args),
          ContentEncoding = Encoding.UTF8   // System.Text
          };
        result.ExecuteResult(context);
        break;
      case "log_exception":
        // handled the same way
        break;
      }
    }
  }

This fragment takes in the URL parameters, forms the URL that IronPython will use to fetch the script that it runs, packages the parameters into a list, calls a method to invoke IronPython, and dumps the script’s output into the outgoing HTTP response.

Here’s the code to invoke IronPython:

// Requires: using IronPython.Hosting; using Microsoft.Scripting;
// using Microsoft.Scripting.Hosting;

public static ScriptEngine python = Python.CreateEngine();

public static string run_ironpython(string script_url, List<string> args)
  {
  // copy the C# argument list into an IronPython list for sys.argv
  var ipy_args = new IronPython.Runtime.List();
  foreach (var item in args)
    ipy_args.Add(item);
  var result = "";
  try
    {
    // fetch the script text, compile it, and run it in a fresh scope
    var s = Utils.FetchUrl(script_url).data_as_string;
    var source = python.CreateScriptSourceFromString(s,
      SourceCodeKind.Statements);
    var scope = python.CreateScope();
    var sys = python.GetSysModule();
    sys.SetVariable("argv", ipy_args);
    source.Execute(scope);
    // the script communicates its output through a variable named "result"
    result = scope.GetVariable("result").ToString();
    }
  catch (Exception e)
    {
    result = e.Message + e.StackTrace;
    }
  return result;
  }

Whatever the script deposits in a Python variable called result winds up as the content of the HTTP response.
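
For concreteness, here is the shape of a script that this host code could fetch and run. It is a made-up example, not one of the real elmcity scripts; the host puts the id and query type into sys.argv and reads the module-level variable result back out:

# Hypothetical script of the kind run_ironpython fetches and executes
# (not one of the real elmcity scripts).
import sys

flavor = sys.argv[0]        # e.g. "mashablecity"
what = sys.argv[1]          # e.g. "log_info" or "log_exception"

lines = ["log query for %s (%s)" % (flavor, what)]
# ... here the real script would query the Azure log table and
#     append matching entries to lines ...

result = "\n".join(lines)   # the host returns this as the HTTP response body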

Using IronPython in an Azure worker role

Until recently I’ve been running some IronPython maintenance scripts from a standalone client machine. Now I’ve pushed them to the cloud. Here’s the scheduler that sets a timer to invoke a handler on a periodic basis:

// uses System.Timers.Timer and ElapsedEventHandler; Interval is in milliseconds
public static void scheduler(ElapsedEventHandler handler, int minutes)
  {
  var timer = new Timer();
  timer.Elapsed += handler;
  timer.AutoReset = true;
  timer.Interval = 1000 * 60 * minutes;
  timer.Start();
  }

And here’s the handler:

public static void IronPythonHandler(object o, ElapsedEventArgs e)
  {
  try
    {
    var s = Utils.FetchUrl(Configurator.ADMIN_SCRIPT).data_as_string;
    var source = python.CreateScriptSourceFromString(s, 
       SourceCodeKind.Statements);
    var scope = python.CreateScope();
    source.Execute(scope);
    ts.write_log_message("info", "IronPythonHandler");
    }
  catch (Exception ex)
    {
    ts.write_log_message("exception", "IronPythonHandler", 
      ex.Message.ToString() + ex.StackTrace.ToString());
    }
  }

Best of both worlds

I’m still sorting out how I want to combine these two worlds, and I’m having a blast doing it. Could I have written the whole system in IronPython, had the option been available when I started? Undoubtedly. But high ceremony, coupled with a sophisticated tool like Visual Studio, has its charms. So do low ceremony and emacs. Using both together, and leveraging all their strengths, is really productive. And it’s loads of fun too.

Talking with Philip Rosedale about organizational dynamics

On this week’s Innovators show I talked with Philip Rosedale about the ways in which Second Life, the virtual world, and Linden Lab, the real company, are laboratories for experimenting with social, economic, and organizational principles.

As I was editing the show, I sent some of the notable quotes to Twitter:

On transparency and central control:

As communication technology makes transparency cheaper, the need for central control drops.

On why Second Life works well for group meetings:

We spatialize the audio so you hear where everyone’s voice is coming from.

On distributed development:

We don’t specialize roles by geographic location.

The Linden Lab experience with decentralization, transparency, and fluid team formation echoes what we’ve heard from Andy Singleton. Philip Rosedale adds this thoughtful observation:

There’s a tension between people’s desire to work together in a cohesive, familial kind of unit, and the organization’s need to have people work together in the way that’s optimal for projects, where you want to attack a problem, work together, disband, and then reform to work with different people on the next problem.

Even if you will never fly an avatar around in Second Life, or use the in-world construction kit to build a 3D object, it’s fascinating to hear about the organizational strategies that Philip Rosedale believes make it all possible.