Why we need an XML representation for iCalendar



On this week’s Innovators show I got together with two of the authors of a new proposal for representing iCalendar in XML. Mike Douglass is lead developer of the Bedework Calendar System, and Steven Lees is Microsoft’s program manager for FeedSync and chair of the XML technical committee in CalConnect, the Calendaring and Scheduling Consortium.

What’s proposed is no more, but no less, than a well-defined two-way mapping between the current non-XML-based iCalendar format and an equivalent XML format. So, for example, here’s an event — the first low tide of 2009 in Myrtle Beach, SC — in iCalendar format:

SUMMARY:Low Tide 0.39 ft

And here’s the equivalent XML:

      <date-time utc='yes'>
      <date-time utc='yes'>
      <text>Low Tide 0.39 ft</text>

The mapping is quite straightforward, as you can see. At first glance, the XML version just seems verbose. So why bother? Because the iCalendar format can be tricky to read and write, either directly (using eyes and hands) or indirectly (using software). That’s especially true when, as is typical, events include longer chunks of text than you see here.

I make an analogy to the RSS ecosystem. When I published my first RSS feed a decade ago, I wrote it by hand. More specifically, I copied an existing feed as a template, and altered it using cut-and-paste. Soon afterward, I wrote the first of countless scripts that flowed data through similar templates to produce various kinds of RSS feeds.

Lots of other people did the same, and that’s part of the reason why we now have a robust network of RSS and Atom feeds that carries not only blogs, but all kinds of data packets.

Another part of the reason is the Feed Validator which, thanks to heroic efforts by Mark Pilgrim and Sam Ruby, became and remains the essential sanity check for anybody who’s whipping up an ad-hoc RSS or Atom feed.

No such ecosystem exists for iCalendar. I’ve been working hard to show why we need one, but the most compelling rationale comes from a Scott Adams essay that I quoted from in this blog entry. Dilber’s creator wrote:

I think the biggest software revolution of the future is that the calendar will be the organizing filter for most of the information flowing into your life. You think you are bombarded with too much information every day, but in reality it is just the timing of the information that is wrong. Once the calendar becomes the organizing paradigm and filter, it won’t seem as if there is so much.

If you buy that argument, then we’re going to need more than a handful of applications that can reliably create and exchange calendar data. We’ll want anyone to whip up a calendar feed as easily as anyone can now whip up an RSS/Atom feed.

We’ll also need more than a handful of parsers that can reliably read calendar feeds, so that thousands of ad-hoc applications, services, and scripts will be able consume all the new streams of time-and-date-oriented information.

I think that a standard XML representation of iCalendar will enable lots of ad-hoc producers and consumers to get into the game, and collectively bootstrap this new ecosystem. And that will enable what Scott Adams envisions.

Here’s a small but evocative example. Yesterday I started up a new instance of the elmcity aggregator for Myrtle Beach, SC. The curator, Dave Slusher, found a tide table for his location, and it offers an iCalendar feed. So the Myrtle Beach calendar for today begins like this:

Thu Jul 23 2009


Thu 03:07 AM Low Tide -0.58 ft (Tide Table for Myrtle Beach, SC)


Thu 06:21 AM Sunrise 6:21 AM EDT (Tide Table for Myrtle Beach, SC)
Thu 09:09 AM High Tide 5.99 ft (Tide Table for Myrtle Beach, SC)
Thu 10:00 AM Free Coffee Fridays (eventful: )
Thu 10:00 AM Summer Arts Project at The Market Common (eventful: )
Thu 10:00 AM E.B. Lewis: Story Painter (eventful: )

Imagine this kind of thing happening on the scale of the RSS/Atom feed ecosystem. The lack of an agreed-upon XML representation for iCalendar isn’t the only reason why we don’t have an equally vibrant ecosystem of calendar feeds. But it’s an impediment that can be swept away, and I hope this proposal will finally do that.

More fun than herding servers

Until recently, the elmcity calendar aggregator was running as a single instance of an Azure worker role. The idea all along, of course, was to exploit the system’s ability to farm out the work of aggregation to many workers. Although the sixteen cities currently being aggregated don’t yet require the service to scale beyond a single instance, I’d been meaning to lay the foundation for that. This week I finally did.

Will there ever be hundreds or thousands of participating cities and towns? Maybe that’ll happen, maybe it won’t, but the gating factor will not be my ability to babysit servers. That’s a remarkable change from just a few years ago. Over the weekend I read Scott Rosenberg’s new history of blogging, Say Everything. Here’s a poignant moment from 2001:

Blogger still lived a touch-and-go existence. Its expenses had dropped from a $50,000-a-month burn rate to a few thousand in rent and technical costs for bandwidth and such; still, even that modest budget wasn’t easy to meet. Eventually [Evan] Williams had to shut down the office entirely and move the servers into his apartment. He remembers this period as an emotional rollercoaster. “I don’t know how I’m going to pay the rent, and I can’t figure that out because the server’s not running, and I have to stay up all night, trying to figure out Linux, and being hacked, and then fix that.”

I’ve been one of those guys who babysits the server under the desk, and I’m glad I won’t ever have to go back there again. What I will have to do, instead, is learn how to take advantage of the cloud resources now becoming available. But I’m finding that to be an enjoyable challenge.

In the case of the calendar aggregator, which needs to map many worker roles to many cities, I’m using a blackboard approach. Here’s a snapshot of it, from an aggregator run using only a single worker instance:

     id: westlafcals
  start: 7/14/2009 12:12:05 PM
   stop: 7/14/2009 12:14:46 PM
running: False

     id: networksierra
  start: 7/14/2009 12:14:48 PM
   stop: 7/14/2009 12:15:05 PM
running: False

     id: localist
  start: 7/14/2009 12:15:06 PM
   stop: 7/14/2009  5:37:03 AM
running: True

     id: aroundfred
  start: 7/14/2009  5:37:05 AM
   stop: 7/14/2009  5:39:20 AM
running: False

The moving finger wrote westlafcals (West Lafayette) and networksierra (Sonora), it’s now writing localist (Baltimore), and will next write aroundfred (Fredericksburg).

Here’s a snapshot from another run using two worker instances:

     id: westlafcals
  start: 7/14/2009 10:12:05 PM
   stop: 7/14/2009  4:37:03 AM
running: True

     id: networksierra
  start: 7/14/2009 10:12:10 PM
   stop: 7/14/2009 10:13:05 PM
running: False

     id: localist
  start: 7/14/2009 10:13:06 PM
   stop: 7/14/2009  4:41:12 AM
running: True

     id: aroundfred
  start: 7/14/2009  4:41:05 AM
   stop: 7/14/2009  4:42:20 AM
running: False

Now there are two moving fingers. One’s writing westlafcals, one has written networksierra, one’s writing localist, and one or the other will soon write aroundfred. The total elapsed time will be very close to half what it was in the single-instance case. I’d love to crank up the instance count and see an aggregation run rip through all the cities in no time flat. But the Azure beta caps the instance count at two.

The blackboard is an Azure table with one record for each city. Records are flexible bags of name/value pairs. If you make a REST call to the table service to query for one of those records, the Atom payload that comes back looks like this:

   <d:start>7/14/2009 4:41:05 AM</d:start>
   <d:stop>7/14/2009 4:42:20 AM</d:stop>

At the start of a cycle, each worker wakes up, iterates through all the cities, aggregates those not claimed by other workers, and then sleeps until the next cycle. To claim a city, a worker tries to create a record in a parallel Azure table, using the PartitionKey locks instead of blackboard. If the worker succeeds in doing that, it considers the city locked for its own use, it aggregates the city’s calendars, and then it deletes the lock record. If the worker fails to create that record, it considers the city locked by another worker and moves on.

This cycle is currently one hour. But in order to respect the various services it pulls from, the service defines the interval between aggregation runs to be 8 hours. So when a worker claims a city, it first checks to see if the last aggregation started more than 8 hours ago. If not, the worker skips that city.

Locks can be abandoned. That could happen if a worker hangs or crashes, or when I redeploy a new version of the service. So the worker also checks to see if a lock has been hanging around longer than the aggregation interval. If so, it overrides the lock and aggregates that city.

I’m sure this scheme isn’t bulletproof, but I reckon it doesn’t need to be. If two workers should happen to wind up aggregating the same city at about the same time, it’s no big deal. The last writer wins, a little extra work gets done.

Anyway, I’ll be watching the blackboard over the next few days. There’s undoubtedly more tinkering to do. And it’s a lot more fun than herding servers.