Talking with Doug Day about the iCalendar validator

On this week’s Innovators show, Doug Day joins me to discuss the new iCalendar validator he has recently deployed on Azure.

The project draws inspiration from the pathbreaking RSS/Atom feed validator originally created by Mark Pilgrim and Sam Ruby. The RSS/Atom validator’s test-driven and advice-oriented approach is exemplary, and the iCalendar validator follows in its footsteps.

The tests, in this case, are iCalendar snippets that are, or are not, valid according to the spec. These snippets, packaged into XML files, form a library of examples that does not depend on the programming language used to run the tests. So although Doug’s validator, based on his open source parser, is written in C#, another validator written in Java or Python or Ruby could use the same test suite.

The advice offered is minimal so far, but I hope will expand as the test suite grows. Sam Ruby observes:

Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.

In that spirit, I am gathering examples of calendars in the wild and looking for ways to help Doug add value.

In the podcast we discuss a nice example that came up recently in the curators’ room of the elmcity project. A custom-built calendar contained events (VEVENT components, in iCalendar-speak) with no start or end times (DTSTART and DTEND properties). This, it turns out, is not prohibited by the spec. But reporting no error is unhelpful. The author of the calendar — or of the software that produced the calendar — ought to be warned that such a calendar won’t yield a useful or expected result.

Why would anyone produce such a calendar in the first place? This harkens back to the early days of RSS. Many of us found that we could craft simple ad-hoc feeds in order to leverage RSS as a lightweight data exchange. It was liberating to be able to do that. But hand-crafted feeds, or feeds written by hand-crafted software, were valuable only to the extent they would reliably interoperate. Often they would not. The feed validator, by showing what was wrong with these feeds, and explaining why and how to fix them, was a powerful ally for those of us trying to bootstrap a feed ecosystem.

The iCalendar validator has a long way to go yet. But the road ahead is well lit, and I’m grateful to Doug Day for resolving to travel it.

Why we need an XML representation for iCalendar

Translations:

Croatian

On this week’s Innovators show I got together with two of the authors of a new proposal for representing iCalendar in XML. Mike Douglass is lead developer of the Bedework Calendar System, and Steven Lees is Microsoft’s program manager for FeedSync and chair of the XML technical committee in CalConnect, the Calendaring and Scheduling Consortium.

What’s proposed is no more, but no less, than a well-defined two-way mapping between the current non-XML-based iCalendar format and an equivalent XML format. So, for example, here’s an event — the first low tide of 2009 in Myrtle Beach, SC — in iCalendar format:

BEGIN:VEVENT
SUMMARY:Low Tide 0.39 ft
DTSTART:20090101T090000Z
UID:2009.0
DTSTAMP:20080527T000001Z
END:VEVENT

And here’s the equivalent XML:

<vevent>
  <properties>
    <dtstamp>
      <date-time utc='yes'>
        <year>2008</year><month>5</month><day>27</day>
        <hour>0</hour><minute>0</minute><second>1</second>
      </date-time>
    </dtstamp>
    <dtstart>
      <date-time utc='yes'>
        <year>2009</year><month>1</month><day>1</day>
        <hour>9</hour><minute>0</minute><second>0</second>
      </date>
    </dtstart>
    <summary>
      <text>Low Tide 0.39 ft</text>
    </summary>
    <uid>
      <text>2009.0</text>
    </uid>
  </properties>
</vevent>

The mapping is quite straightforward, as you can see. At first glance, the XML version just seems verbose. So why bother? Because the iCalendar format can be tricky to read and write, either directly (using eyes and hands) or indirectly (using software). That’s especially true when, as is typical, events include longer chunks of text than you see here.

I make an analogy to the RSS ecosystem. When I published my first RSS feed a decade ago, I wrote it by hand. More specifically, I copied an existing feed as a template, and altered it using cut-and-paste. Soon afterward, I wrote the first of countless scripts that flowed data through similar templates to produce various kinds of RSS feeds.

Lots of other people did the same, and that’s part of the reason why we now have a robust network of RSS and Atom feeds that carries not only blogs, but all kinds of data packets.

Another part of the reason is the Feed Validator which, thanks to heroic efforts by Mark Pilgrim and Sam Ruby, became and remains the essential sanity check for anybody who’s whipping up an ad-hoc RSS or Atom feed.

No such ecosystem exists for iCalendar. I’ve been working hard to show why we need one, but the most compelling rationale comes from a Scott Adams essay that I quoted from in this blog entry. Dilber’s creator wrote:

I think the biggest software revolution of the future is that the calendar will be the organizing filter for most of the information flowing into your life. You think you are bombarded with too much information every day, but in reality it is just the timing of the information that is wrong. Once the calendar becomes the organizing paradigm and filter, it won’t seem as if there is so much.

If you buy that argument, then we’re going to need more than a handful of applications that can reliably create and exchange calendar data. We’ll want anyone to whip up a calendar feed as easily as anyone can now whip up an RSS/Atom feed.

We’ll also need more than a handful of parsers that can reliably read calendar feeds, so that thousands of ad-hoc applications, services, and scripts will be able consume all the new streams of time-and-date-oriented information.

I think that a standard XML representation of iCalendar will enable lots of ad-hoc producers and consumers to get into the game, and collectively bootstrap this new ecosystem. And that will enable what Scott Adams envisions.

Here’s a small but evocative example. Yesterday I started up a new instance of the elmcity aggregator for Myrtle Beach, SC. The curator, Dave Slusher, found a tide table for his location, and it offers an iCalendar feed. So the Myrtle Beach calendar for today begins like this:

Thu Jul 23 2009

WeeHours

Thu 03:07 AM Low Tide -0.58 ft (Tide Table for Myrtle Beach, SC)

Morning

Thu 06:21 AM Sunrise 6:21 AM EDT (Tide Table for Myrtle Beach, SC)
Thu 09:09 AM High Tide 5.99 ft (Tide Table for Myrtle Beach, SC)
Thu 10:00 AM Free Coffee Fridays (eventful: )
Thu 10:00 AM Summer Arts Project at The Market Common (eventful: )
Thu 10:00 AM E.B. Lewis: Story Painter (eventful: )

Imagine this kind of thing happening on the scale of the RSS/Atom feed ecosystem. The lack of an agreed-upon XML representation for iCalendar isn’t the only reason why we don’t have an equally vibrant ecosystem of calendar feeds. But it’s an impediment that can be swept away, and I hope this proposal will finally do that.

iCalendar validation issue #3: Quoted-printable vs HTML

Next up in my series of iCalendar validation examples: The Frost Free Library feed. It fails in three of the four parsers I tried here, and should have failed in all. It begins like so:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic= 
rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR

It’s hard to know exactly what the feed producer thought it was doing here, but the feed should fail because no valid content line can begin with rosoft.... Adding a blank space at the beginning of all such lines will, I think, make the feed at least nominally valid.

But a robust validator would have more to say on the subject. It would notice that this feed is trying to publish HTML content, and would point out that there’s an ALTREP (alternative representation) for this purpose. Setting aside the fact that this feed doesn’t seem to have any actual HTML content, I believe the right way to encode such content would be something like this:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ALTREP="CID:xyz":Basic description here.
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR

Content-Type:text/html
Content-Id:xyz
 <html><body>
 <p><b>Enhanced description here</b> Body of 
 enhanced description.</p>
 </body></html>

I don’t know to what extent ALTREPs are actually produced and consumed. My guess is rarely, and that producers might want to lean toward plain text with line folding when that’s sufficient. But that’s just my guess, I’d be interested to hear from folks who know.

iCalendar validation issues #1 and #2: blank lines, PRODID and VERSION

Sam Ruby offers the following advice to those of us who would like to improve the interoperability of iCalendar feeds:

Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.

I’ll be documenting issues as I encounter them. Here’s the first: Should feeds use, or not use, blank lines between components? (A component is a chunk of text representing an event, or something else that can show up in an iCalendar file, like a todo item.)

The presence of blank lines is a reason why this feed is one of two I’m tracking that won’t parse in DDay.iCal.

The unmodified feed looks like this:

BEGIN:VEVENT
...stuff...
END:VEVENT

BEGIN:VEVENT
...stuff
END:VEVENT

Part of the “fix” is to make it look like this:

BEGIN:VEVENT
...stuff...
END:VEVENT
BEGIN:VEVENT
...stuff
END:VEVENT

But I’ve put “fix” in air quotes because, well, who’s wrong in this case? The feed producer (in this case, the Keene Chamber of Commerce), or the feed consumer (in this case, DDay.iCal)?

I looked at the spec and didn’t find evidence pointing one way or the other. Neither did this person:

> 1) yes, KOrganizer adds empty lines between VEVENT, VTODO and 
> VJOURNAL. I just checked the specification (RFC 2445), and it 
> doesn't say anything about blank lines... (neither explicitly 
> allowed, nor explicitly not allowed)		

This is a perfect example of why the process that Mark Pilgrim and Sam Ruby went through for RSS/Atom feeds will be so valuable for iCalendar feeds. Quite a few details that affect interoperability turn out to depend on assumptions and interpretations that aren’t explicit.

Maybe I’m misreading the spec, and it really does forbid blank lines between components. If so, great, the validator can enforce that rule. But maybe it neither allows nor forbids. In that case, the validator can say so, and suggest a best practice. In this case, my guess is that the best practice would be not to include blank lines.

But I said that remvoing the blank lines is only part of the “fix” — and here’s why. When I remove them, the feed still won’t parse in DDay.iCal, but for a different reason. Now the problem lies here:

BEGIN:VCALENDAR
X-WR-CALNAME:GKCC
BEGIN:VEVENT
...stuff...

In this case, the reason is clearly stated in the spec. A feed is supposed to include VERSION and PRODID properties like so:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT

If I inject those into the Chamber of Commerce feed, and remove blank lines, it parses in DDay.iCal.

Note that the unmodified feed is reported to be valid by this iCal4J-based validator. A more robust validator, in the style of the Pilgrim/Ruby RSS/Atom validator, would fail the feed, and would cite the relevant part of the spec in its explanation of the failure.

The spec says, by the way, that both VERSION and PRODID are required elements. When I saw that DDay.iCal was rejecting the Chamber of Commerce feed, which contains neither, I figured that was why. And sure enough, it accepts this:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT

But it also accepts this:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:GKCC
BEGIN:VEVENT

And this:

BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT

But not this:

BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
BEGIN:VEVENT

Eventually I twigged to the fact that it’s evidently just looking for two (or more) non-empty lines between the BEGINs. For example, this parses:

BEGIN:VCALENDAR
FOO:BAR
BAZ:FOO
BEGIN:VEVENT

In practice this isn’t a big deal. None of the metadata matters to me, for my purposes, so my aggregator can just elide it before sending a feed to the parser. But the metadata might matter for someone, for some purpose. A proper validator would help ensure that it will be available to those people, for those purposes, by enabling feed producers and feed consumers to more easily produce and consume valid feeds.

For what it’s worth, I’m going to track this category of issue using the tag icalvalid, and I invite other interested parties to do the same. As in the case of the grl2020 tag, I know the tag can appear in a variety of places including del.icio.us, Technorati, WordPress, and nowadays of course Twitter. So I’ll create a metafeed that tracks icalvalid in all of those places.

Update: OK, here’s the icalvalid metafeed, based on this Yahoo Pipe.