iCalendar validation issues #1 and #2: blank lines, PRODID and VERSION

Sam Ruby offers the following advice to those of us who would like to improve the interoperability of iCalendar feeds:

Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.

I’ll be documenting issues as I encounter them. Here’s the first: Should feeds use, or not use, blank lines between components? (A component is a chunk of text representing an event, or something else that can show up in an iCalendar file, like a todo item.)

The presence of blank lines is a reason why this feed is one of two I’m tracking that won’t parse in DDay.iCal.

The unmodified feed looks like this:

BEGIN:VEVENT
...stuff...
END:VEVENT

BEGIN:VEVENT
...stuff
END:VEVENT

Part of the “fix” is to make it look like this:

BEGIN:VEVENT
...stuff...
END:VEVENT
BEGIN:VEVENT
...stuff
END:VEVENT

But I’ve put “fix” in scare quotes because, well, who’s wrong in this case? The feed producer (in this case, the Keene Chamber of Commerce), or the feed consumer (in this case, DDay.iCal)?

I looked at the spec and didn’t find evidence pointing one way or the other. Neither did this person:

> 1) yes, KOrganizer adds empty lines between VEVENT, VTODO and
> VJOURNAL. I just checked the specification (RFC 2445), and it
> doesn't say anything about blank lines... (neither explicitly
> allowed, nor explicitly not allowed)

This is a perfect example of why the process that Mark Pilgrim and Sam Ruby went through for RSS/Atom feeds will be so valuable for iCalendar feeds. Quite a few details that affect interoperability turn out to depend on assumptions and interpretations that aren’t explicit.

Maybe I’m misreading the spec, and it really does forbid blank lines between components. If so, great, the validator can enforce that rule. But maybe it neither allows nor forbids. In that case, the validator can say so, and suggest a best practice. In this case, my guess is that the best practice would be not to include blank lines.
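
Here is a minimal sketch, in Python, of how a validator might report blank lines, and how a consumer might strip them before parsing. The function names are hypothetical and nothing here is taken from DDay.iCal or any other library. It treats whitespace-only lines as blank, which is an assumption on my part.

```python
def find_blank_lines(ics_text):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [
        number
        for number, line in enumerate(ics_text.splitlines(), start=1)
        if not line.strip()
    ]


def strip_blank_lines(ics_text):
    """Drop blank lines and rejoin with CRLF, the line break iCalendar uses."""
    kept = [line for line in ics_text.splitlines() if line.strip()]
    return "\r\n".join(kept) + "\r\n"
```

A validator could turn the numbers from find_blank_lines into warnings, while an aggregator could call strip_blank_lines on a feed before handing it to a strict parser.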

But I said that removing the blank lines is only part of the “fix”, and here’s why. When I remove them, the feed still won’t parse in DDay.iCal, but for a different reason. Now the problem lies here:

BEGIN:VCALENDAR
X-WR-CALNAME:GKCC
BEGIN:VEVENT
...stuff...

In this case, the reason is clearly stated in the spec. A feed is supposed to include VERSION and PRODID properties like so:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT

If I inject those into the Chamber of Commerce feed, and remove blank lines, it parses in DDay.iCal.
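
That repair can be sketched roughly as follows. This is an illustration under stated assumptions, not the code my aggregator runs: the function name is hypothetical, and the default PRODID is a made-up placeholder that a real producer would replace with its own identifier.

```python
import re


def add_required_properties(ics_text, prodid="-//example//placeholder//EN"):
    """Drop blank lines and add VERSION/PRODID after BEGIN:VCALENDAR if absent."""
    lines = [line for line in ics_text.splitlines() if line.strip()]

    # Collect calendar-level property names: those before the first component.
    present = set()
    inside = False
    for line in lines:
        upper = line.strip().upper()
        if upper == "BEGIN:VCALENDAR":
            inside = True
        elif inside and upper.startswith("BEGIN:"):
            break
        elif inside:
            match = re.match(r"[A-Za-z0-9-]+", line)
            if match:
                present.add(match.group(0).upper())

    missing = []
    if "VERSION" not in present:
        missing.append("VERSION:2.0")
    if "PRODID" not in present:
        missing.append("PRODID:" + prodid)  # placeholder value, an assumption

    out = []
    for line in lines:
        out.append(line)
        if missing and line.strip().upper() == "BEGIN:VCALENDAR":
            out.extend(missing)
            missing = []  # insert only once
    return "\r\n".join(out) + "\r\n"
```

Running it a second time on its own output changes nothing, because both properties are then already present.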

Note that the unmodified feed is reported to be valid by this iCal4J-based validator. A more robust validator, in the style of the Pilgrim/Ruby RSS/Atom validator, would fail the feed, and would cite the relevant part of the spec in its explanation of the failure.
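
To make that concrete, here is a rough sketch of the kind of check such a validator might run. The names are hypothetical, the message wording is mine, and the section reference is the one I understand to define the required calendar properties in RFC 2445. A real validator would need to cover far more of the grammar.

```python
import re

REQUIRED = ("PRODID", "VERSION")
SPEC_REF = "RFC 2445, section 4.6 (iCalendar Object)"


def check_required_properties(ics_text):
    """Return a list of problem messages; an empty list means the check passed."""
    counts = {name: 0 for name in REQUIRED}
    inside = False
    for line in ics_text.splitlines():
        upper = line.strip().upper()
        if upper == "BEGIN:VCALENDAR":
            inside = True
            continue
        if inside and upper.startswith("BEGIN:"):
            break  # calendar properties come before the first component
        match = re.match(r"[A-Za-z0-9-]+", line)
        if inside and match and match.group(0).upper() in counts:
            counts[match.group(0).upper()] += 1

    problems = []
    for name in REQUIRED:
        if counts[name] == 0:
            problems.append(f"{name} is missing; {SPEC_REF} lists it as required.")
        elif counts[name] > 1:
            problems.append(
                f"{name} appears {counts[name]} times; {SPEC_REF} allows it once."
            )
    return problems
```

The point of returning messages rather than a bare pass/fail is that each failure can carry its citation back to the producer.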

The spec says, by the way, that both VERSION and PRODID are required properties. When I saw that DDay.iCal was rejecting the Chamber of Commerce feed, which contains neither, I figured that was why. And sure enough, it accepts this:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT

But it also accepts this:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:GKCC
BEGIN:VEVENT

And this:

BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT

But not this:

BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
BEGIN:VEVENT

Eventually I twigged to the fact that it’s evidently just looking for two (or more) non-empty lines between the BEGINs. For example, this parses:

BEGIN:VCALENDAR
FOO:BAR
BAZ:FOO
BEGIN:VEVENT
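
The behavior I observed can be summarized as a toy model. To be clear, this is my guess at the rule written out in Python, with a hypothetical function name. It is not DDay.iCal's code, and it ignores the separate blank-line issue.

```python
def observed_acceptance(ics_text):
    """Toy model: accept when at least two non-empty lines sit between
    BEGIN:VCALENDAR and the next BEGIN line."""
    count = 0
    inside = False
    for line in ics_text.splitlines():
        upper = line.strip().upper()
        if upper == "BEGIN:VCALENDAR":
            inside = True
        elif inside and upper.startswith("BEGIN:"):
            return count >= 2
        elif inside and upper:
            count += 1
    return False  # never reached a component
```

Under this model the names and values of the properties are irrelevant, which matches the FOO:BAR example above.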

In practice this isn’t a big deal. None of the metadata matters to me, for my purposes, so my aggregator can just elide it before sending a feed to the parser. But the metadata might matter for someone, for some purpose. A proper validator would help ensure that it will be available to those people, for those purposes, by enabling feed producers and feed consumers to more easily produce and consume valid feeds.

For what it’s worth, I’m going to track this category of issue using the tag icalvalid, and I invite other interested parties to do the same. As in the case of the grl2020 tag, I know the tag can appear in a variety of places including del.icio.us, Technorati, WordPress, and nowadays of course Twitter. So I’ll create a metafeed that tracks icalvalid in all of those places.

Update: OK, here’s the icalvalid metafeed, based on this Yahoo Pipe.

10 thoughts on “iCalendar validation issues #1 and #2: blank lines, PRODID and VERSION”

  1. I would read the following from RFC2445:

    “The iCalendar object is organized into individual lines of text,
    called content lines. Content lines are delimited by a line break,
    which is a CRLF sequence (US-ASCII decimal 13, followed by US-ASCII
    decimal 10).”

    to support your contention that blank lines are not permitted. “Content lines” are not permitted to be empty:

    “contentline = name *(";" param ) ":" value CRLF”

  2. To me, it seems like the first matter (DDaily.iCal throwing errors because of an extra line feed) is a point where the author of DDaily.iCal should follow Postel’s law: “Be conservative in what you do; be liberal in what you accept from others.” If the spec is somewhat vague on the matter, the parser should have enough robustness to handle a few stray characters (line feeds, in this case).

    From your subsequent experiments, though, it sounds like DDaily.iCal is quite a fragile hack to start with…

  3. Thank you Jon for doing this research – I believe this is a good step toward improving iCalendar interoperability.

    “it sounds like DDaily.iCal is quite a fragile hack to start with…”

    Peter, please be careful of gross assumptions before flaming another’s work: A close examination of RFC2445’s BNF demonstrates that indeed, blank lines are not explicitly allowed by the standard:

    http://www.w3.org/2002/12/cal/rfc2445#sec4.6

    also

    http://www.w3.org/2002/12/cal/rfc2445#sec4.1

    Although it is not explicitly accepted by the standard, I personally felt this issue should be corrected in DDay.iCal, and have subsequently modified it to accept blank lines.

    As to the other issue of requiring PRODID and VERSION – those are also required by the standard, and hence by DDay.iCal and its parser (Google Calendars will not parse calendars without a VERSION property). Since the parser (ANTLR) is not robust enough to require PRODID and VERSION explicitly, it simply required 2 or more calendar properties. This has since been modified as well to accept calendars with no properties.

    Thanks again Jon – this is excellent work, and I appreciate you letting me know the results of your work. Keep it up!!

    -Doug

  4. Thanks Doug.

    > From your subsequent experiments, though,
    > it sounds like DDaily.iCal is quite a
    > fragile hack to start with…

    Far from it. The DDay.iCal library is of very high quality. The issue, in my view, is that — lacking a robust validator — we face a lot of uncertainty about how all the libraries I’ve looked at should handle the kinds of feeds we see out in the wild.

    > the parser (ANTLR) is not robust
    > enough to require PRODID and VERSION
    > explicitly.

    I wondered about that.

  5. > Far from it. The DDay.iCal library is of
    > very high quality.

    Thank you very much.

    > The issue, in my view, is that — lacking a
    > robust validator — we face a lot of
    > uncertainty about how all the libraries
    > I’ve looked at should handle the kinds of
    > feeds we see out in the wild.

    In your view, do you think a DDay.iCal-based validator would be appropriate or useful? I’m wondering what your thoughts are toward finding/creating a robust validator?

    Thanks again,
    -Doug

  6. Hi Doug,
    A calendar validator would certainly be useful! I’m having Outlook compatibility issues, and a validator would be great!
