iCalendar validation issue #3: Quoted-printable vs HTML

Next up in my series of iCalendar validation examples: The Frost Free Library feed. It fails in three of the four parsers I tried here, and should have failed in all. It begins like so:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic= 
rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR

It’s hard to know exactly what the feed producer thought it was doing here, but the feed should fail because no valid content line can begin with rosoft.... Adding a blank space at the beginning of all such lines will, I think, make the feed at least nominally valid.
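To make the folding point concrete, here is a minimal Python sketch of what a parser does before it ever looks at property names: it unfolds lines, then checks that each one starts like a property. The regular expression is a loose stand-in of my own, not the full RFC 2445 grammar, and the helper names are hypothetical.

```python
import re

# Loose stand-in for "NAME[;params]:" at the start of a content line.
# This is a simplification, not the full RFC 2445 grammar.
CONTENT_LINE = re.compile(r"^[A-Za-z0-9-]+(;[^:]*)?:")

def unfold(text):
    """Undo line folding: drop each line break that is followed by a space or tab."""
    text = text.replace("\r\n", "\n")
    return re.sub(r"\n[ \t]", "", text).split("\n")

def suspicious_lines(text):
    """Return unfolded lines that do not start like a property."""
    return [line for line in unfold(text) if line and not CONTENT_LINE.match(line)]

broken = (
    "DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic=\n"
    'rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />\n'
    "URL;VALUE=URI:http://www.frostfree.org/node/505\n"
)
# Same fragment with one leading space added to the wrapped line.
fixed = broken.replace("\nrosoft", "\n rosoft")

print(suspicious_lines(broken))  # the "rosoft..." line is flagged
print(suspicious_lines(fixed))   # nothing is flagged once the line is folded
```

Note that the leading space only makes the line structurally acceptable. After unfolding, the value still contains the stray `=` from the quoted-printable soft break.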

But a robust validator would have more to say on the subject. It would notice that this feed is trying to publish HTML content, and would point out that there’s an ALTREP (alternative representation) for this purpose. Setting aside the fact that this feed doesn’t seem to have any actual HTML content, I believe the right way to encode such content would be something like this:

BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ALTREP="CID:xyz":Basic description here.
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR

Content-Type:text/html
Content-Id:xyz
 <html><body>
 <p><b>Enhanced description here</b> Body of 
 enhanced description.</p>
 </body></html>

I don’t know to what extent ALTREPs are actually produced and consumed. My guess is rarely, and that producers might want to lean toward plain text with line folding when that’s sufficient. But that’s just my guess; I’d be interested to hear from folks who know.
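For the plain-text route, a producer needs two small steps: escape the characters that are special in a text value, then fold long lines. The following Python sketch shows one way to do that, assuming the 75-octet line limit and CRLF-plus-space folding from the iCalendar spec; `escape_text` and `fold` are hypothetical helper names, not part of any library.

```python
def escape_text(value):
    """Escape backslash, semicolon, comma and newlines for a text value."""
    value = value.replace("\\", "\\\\")
    value = value.replace(";", "\\;").replace(",", "\\,")
    return value.replace("\r\n", "\\n").replace("\n", "\\n")

def fold(line, limit=75):
    """Fold one content line so that no physical line exceeds `limit` octets."""
    data = line.encode("utf-8")
    chunks = []
    while len(data) > limit:
        cut = limit
        # Back up so a multi-byte UTF-8 sequence is never split.
        while cut > 0 and (data[cut] & 0xC0) == 0x80:
            cut -= 1
        chunks.append(data[:cut])
        data = b" " + data[cut:]  # continuation lines start with one space
    chunks.append(data)
    return b"\r\n".join(chunks).decode("utf-8")

line = "DESCRIPTION:" + escape_text("Tea, cake; and talk.\n" + "x" * 100)
folded = fold(line)
print(folded)
```

Removing every CRLF-plus-space sequence from `folded` gives back the original line, which is the round trip a consumer performs when it unfolds.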

32 thoughts on “iCalendar validation issue #3: Quoted-printable vs HTML”

  1. Jon,

    > I don’t know to what extent ALTREPs are
    > actually produced and consumed. My guess
    > is rarely, and that producers might want
    > to lean toward plain text with line
    > folding when that’s sufficient.

    In my experience, ALTREPs are not generally handled, and I know of no consumer of iCalendars that can actually handle them. In theory, however, I agree that this Content-Id system works well and is technically supported by RFC 2445 – it properly describes the data as HTML data before presenting it to a CUA. I’m not sure of current CUA support for MIME entities. In theory, a calendar feed using ALTREP in this manner would need to be more than a simple .ics file; it would need to be a MIME feed with appropriate entities found within.

    In practice, I agree with you – it’s best to provide plain-text representations whenever possible.

    I think a more flexible use of ALTREP would be to simply provide a perma-link to an alternate representation of the content, handled by an underlying calendar server of some kind (i.e. ALTREP="http://some.site.com/calendar?altrep=1&calid=1234&uid=5678").

    Additionally, the use of ENCODING=QUOTED-PRINTABLE as a component-based encoding is not explicitly supported by RFC 2445. The standard mentions quoted-printable encoding as a means of transport, but goes no further than that. The only built-in encodings supported are “8BIT” and “BASE64”.

    http://www.w3.org/2002/12/cal/rfc2445#sec4.2.7

    Thanks!
    -Doug

  2. > Quoted printable would imply that the =
    > sign on the preceding line would be
    > treated as a “soft” line break.

    Right. So one interpretation would be: elide the soft breaks before doing any other processing.

    And yet, the example shows subsequent parts of the soft-broken line with leading whitespace, which suggests that it’s necessary to do that in order to distinguish those subsequent parts from normal content lines.
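The "elide the soft breaks first" interpretation can be sketched with Python's standard `quopri` module, which treats a trailing `=` as a soft line break. The input below is the DESCRIPTION payload from the post, with the trailing whitespace after `Mic=` removed; this illustrates quoted-printable decoding only, not how any particular iCalendar parser behaves.

```python
import quopri

# DESCRIPTION payload from the feed, split at the quoted-printable soft break.
raw = (
    b"<p>Normal 0 false false false Mic=\n"
    b'rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />'
)

# The soft break is removed, and =0D, =0A and =3D become CR, LF and "=".
decoded = quopri.decodestring(raw)
print(decoded)
```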

  3. Aren’t these kinds of problems why XML became popular? (Setting aside how software can get XML-content-inside-XML wrong.)

  4. Reed Hedges asks: “Aren’t these kinds of problems why XML became popular?”

    Yes. A technical committee (TC) of CalConnect, the consortium that guides a great deal of the work around calendaring and scheduling standardization, is preparing a draft of an XML representation of iCalendar:

    http://www.calconnect.org/tc-xml.shtml

    The committee intends for this representation to be capable of being “round-tripped” (interchanged in both directions) with the current non-XML data format, although there are likely to be some limitations on that ability. Once finalized, its use would then largely be a question of adoption, both in other related specifications and by developers of calendaring and scheduling products.

    The committee is led by Steven Lees, one of Jon’s colleagues at Microsoft. (Steven is also one of the developers of FeedSync.)

  5. Just a historical note: iCalendar is VCALENDAR 2.0. VCALENDAR 1.0 (and VCARD pre-3.0) used quoted-printable for line wrapping and encoding, which is a huge pain in the butt. For a long time there was a lot of VCALENDAR 1.0 floating around, because that’s all that Palm supported until very recently.

    I imagine the author of that Drupal iCalendar serializer started with a vcalendar 1.0 code fragment and then tried to make it into icalendar by just changing the version…

    vobject will parse the original feed if you turn on allowQP (which uses a vastly slower parser). I think icalendar.py uses a quoted-printable friendly parser, which is why it succeeds there.

    P.S. It looks like the fragment you copied got a little munged in transit, the DESCRIPTION doesn’t have a colon before the payload, and there’s no terminating END:VCALENDAR. If you fix those bits, this is actually valid VCALENDAR 1.0.

  6. > the DESCRIPTION doesn’t have a colon
    > before the payload

    That was my goof, not part of the original feed.

    > vobject will parse the original feed if
    > you turn on allowQP (which uses a vastly
    > slower parser). I think icalendar.py uses
    > a quoted-printable friendly parser, which
    > is why it succeeds there.

    It seems that both parse the corrected fragment above, if there’s leading space in front of the wrapped line, otherwise not.

  7. > Once finalized, its use would then
    > largely be a question of adoption, both
    > in other related specifications and by
    > developers of calendaring and scheduling
    > products.

    Right.

    I look forward to an XML-based representation of iCalendar. Meanwhile, there’s clearly a need for better validation of the existing format.

    And actually, once the XML format is done and in use, there will continue to be a need for a robust validator. RSS was always written in XML, and there were still a zillion issues that needed (and continue to need) the help of the Feed Validator to sort out.

  8. Sorry if I was unclear. The feed is clearly invalid. But in order to produce valuable helpful messages, you do need to try to “get into the head” of the feed producer.

    An error message on the line that starts “rosoft” may be more confusing than helpful, particularly if coupled with the recommendation that a space be added.

    A warning on the line preceding this that you see that an attempt to use the quoted-printable encoding is being made, but that the equals sign at the end of the line isn’t going to be interpreted as a continuation would help to clear this up.

    The fact that you need to produce such a warning is not something you could ever get from reading the iCalendar RFCs.
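A warning of that kind might be sketched as follows in Python. The heuristic is an assumption of this sketch, not something the RFCs prescribe: if a property declares ENCODING=QUOTED-PRINTABLE, a physical line ends with `=`, and the next line is not folded, then the producer probably meant a soft break. The function name is hypothetical.

```python
def qp_soft_break_warnings(lines):
    """Yield (line_number, message) for probable quoted-printable soft breaks."""
    qp = False       # current property declared ENCODING=QUOTED-PRINTABLE
    pending = False  # previous physical line was QP and ended with "="
    for number, line in enumerate(lines, start=1):
        text = line.rstrip()
        folded = line[:1] in (" ", "\t")
        if pending and not folded:
            yield (number - 1,
                   "line ends with '=' (a quoted-printable soft break?) but the "
                   "next line is not folded, so it will be read as a new property")
            qp = True  # in the producer's mind this is still the same value
        elif not folded:
            # A new property starts here; inspect its name and parameters.
            head = text.split(":", 1)[0].upper()
            qp = "ENCODING=QUOTED-PRINTABLE" in head
        pending = qp and text.endswith("=")

feed = [
    "SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea",
    "DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic= ",
    'rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />',
    "URL;VALUE=URI:http://www.frostfree.org/node/505",
]
for number, message in qp_soft_break_warnings(feed):
    print("Warning at line %d: %s" % (number, message))
```

The warning lands on the line that ends with `=`, one line before the spot where a strict parser reports its error.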

  9. > But in order to produce valuable helpful
    > messages, you do need to try to “get into
    > the head” of the feed producer.

    That’s a great way to put it. Of course it applies much more broadly to all aspects of computer/human interaction!

    >A warning on the line preceding this that
    >you see that an attempt to use the quoted-
    >printable encoding is being made, but that
    >the equals sign at the end of the line
    >isn’t going to be interpreted as a
    >continuation would help to clear this up.

    Exactly. I just checked with the current version of iCal4j and it reports:

    Error at line 10: Illegal property [ ROSOFTINTERNETEXPLORER4]

    From the perspective of the parser, that’s exactly right. But, as you say, it fails to clarify the intention of the producer.

  10. This is a great comment thread, especially around what the “error” is and how to explain it to someone who might be able to do something about it.

    It also suggests that good acceptors are rather flexible and forgiving, but in a way that someone can tell what happened. A good acceptor would not preserve the bug of course and would provide a usable account (that users might only look at if puzzled).

    Makes me want to find a way to paste this into my copy of Alan Cooper’s “About Face” 1.0.

  11. I also wanted to point out the possible use of FMTTYPE here:

    DESCRIPTION;FMTTYPE=text/html;ENCODING=BASE64:PHA+Tm9ybWFsIDAgZmFsc2UgZmFsc2UgZmFsc2UgTWljcm9zb2Z0SW50ZXJuZXRFeHBsb3JlcjQ8
    L3A+DQo8YnIgY2xhc3M9ImNsZWFyIiAvPg==

    This seems to fit with RFC 2445 as well.

    -Doug
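The BASE64 payload in that example can be reproduced with a few lines of Python. This sketch only shows the encoding round trip; whether clients accept FMTTYPE and ENCODING=BASE64 on DESCRIPTION is not something it can tell you.

```python
import base64

html = (b"<p>Normal 0 false false false MicrosoftInternetExplorer4</p>\r\n"
        b'<br class="clear" />')

# Payload as quoted in the comment above, joined back into one string.
comment_payload = (
    "PHA+Tm9ybWFsIDAgZmFsc2UgZmFsc2UgZmFsc2UgTWljcm9zb2Z0SW50ZXJuZXRFeHBsb3JlcjQ8"
    "L3A+DQo8YnIgY2xhc3M9ImNsZWFyIiAvPg=="
)

payload = base64.b64encode(html).decode("ascii")
line = "DESCRIPTION;FMTTYPE=text/html;ENCODING=BASE64:" + payload
print(line)
print(payload == comment_payload)
```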

  12. Jon writes:
    > I look forward to an XML-based
    > representation of iCalendar. Meanwhile,
    > there’s clearly a need for better
    > validation of the existing format.

    Agreed. The most likely approach is to improve the validation capabilities of existing iCalendar programming frameworks and libraries, then build validation services – such as web services and web-based front ends to same – on top of these. Regarding the latter, Mark Pilgrim’s and Sam Ruby’s Feed Validator and the W3C’s HTML and CSS validators are obvious sources of inspiration for an iCalendar validator, but it will take some considerable amount of work to get to that place.

    Ben Fortuna (author of iCal4j, a Java-based framework) has indicated his willingness to accept issue reports and patches to improve iCal4j’s validation capabilities, and given Doug Day’s active participation in this thread, one might infer that he’d be receptive to suggestions regarding DDay.iCal’s validation features.

    Jon also points out that even with XML-based formats, there are often a great many subtle issues that can arise. That’s accurate. Nonetheless, with XML data, it’s possible to use well-established tools, such as schema validators (in XML Schema or Relax NG) and rules validation tools (e.g. Schematron). This can help make the job of identifying such issues in an XML representation of iCalendar somewhat easier.

  13. Some subjective background on the community from which iCalendar validation tools might emerge, from someone who is relatively new to that community:

    My subjective impression is that, currently, the major focus of the calendaring and scheduling community is on several new standards intended to provide greater interoperability among calendaring and scheduling software packages. These include CalDAV, which should ultimately permit you to use your choice of desktop or mobile device client software with any vendor’s calendaring server; and CalDAV Scheduling, which should make it possible to schedule meetings across servers from multiple vendors, and even – given the appropriate permissions – servers run by different organizations. This is exciting work, and is demanding a great deal of time and attention within that community.

    Because of that focus, the core iCalendar specification – and iCalendar validation – is, I believe, of less interest right now to this community.

    Another part of the reason is that the first major “refresh” of iCalendar is nearly out the door. This work has also taken attention away from any validation-related efforts, although it may ultimately assist those efforts. The new, cleaned up specification, currently known as RFC 2445 bis (although it will have a new ID when ratified) is being shepherded under the aegis of an IETF working group at:

    http://www.ietf.org/html.charters/calsify-charter.html

    This forthcoming revision of the venerable, core iCalendar spec is intended to help rationalize iCalendar, cleaning up ambiguities in the original iCalendar spec, and includes some modest simplification efforts, as well.

  14. Great points Aron, I agree with them all.

    ICS validation is clearly a trailing-edge thing, and I wouldn’t expect or recommend that it should siphon effort away from the leading-edge initiatives you mention.

    That said, there’s a vast quantity of ICS in circulation, it continues to grow, and it’ll be around for many years to come.

    And yet a very basic use for this stuff — RSS-like pub/sub — has largely been overlooked, and can still make a big impact in ways that matter to, really, just about everyone.

    So I’m thinking a quiet rear-guard action to help people discover and use ICS pub/sub makes sense.

    Folks like Ben Fortuna, Doug Day, and others are already invested in providing ICS libraries, so it’s in their interest to promote and try to support this use case.

    Whatever could be done along the lines of the suite of test cases at feedvalidator.org/testcases would be a boon to ICS in its current form. And I think it could also create a model — and an expectation — for how validation could work for future incarnations of ICS.

  15. Jon, I think your earlier point was right on the mark:

    > And actually, once the XML format is done
    > and in use, there will continue to be a
    > need for a robust validator.

    Parsing of data is actually a very small part of ICS validation, so the question of XML vs. folded content lines is probably not the main issue here.

    The interoperability concerns of iCalendar are not so much the result of syntactic ambiguity, but rather semantic ambiguity, which I think highlights a need for “drawing a line in the sand”, even if it’s just to provide a basis for discussion and to highlight the differences in the semantic interpretation of each CUA.

  16. > The interoperability concerns of iCalendar
    > are not so much the result of syntactic
    > ambiguity, but rather semantic ambiguity

    I agree. This is why feedvalidator.org is such an interesting/important project — not just w/respect to the formats it cares about, but in general.

    It demonstrates a solid approach to creating a frame of reference for the reading-between-lines and interpretation that is always necessary.

  17. Reed Hedges: “Aren’t these kinds of problems why XML became popular?”

    Oh come on, and there haven’t been problems with invalid XML and lousy parsers? Please.

    Jon, you’re right on the money, keep up the good work.

  18. Anybody who has dealt with RSS should know how little XML helps with interop.

    Lack of a Validator is an issue, as Jon says, but I would say an even larger issue is the lack of test suites. Data format standards without test suites are going to get interop only by long drawn-out pain.

    We need an iCalendar equivalent of http://tools.ietf.org/html/rfc4134

  19. If the iCal fragment is supposed to be embedded in a feed format (RSS 2.0 or Atom), isn’t it just as swell to inject the alternate HTML (preferably XHTML) representation into the entry/item level of the feed instead of inside the horrific plain-text based iCal format?

    On another note, I welcome an XML-based calendar format, but I think the working group is going in a direction that looks like it could have been thought up 10 or more years ago. DOCTYPE? No namespaces? Cryptic element names, attribute names and values?

    If an XML-document in and of itself isn’t even close to being self-explanatory, the design of the document has, in my humble opinion, utterly failed. What the working group should be thinking is: “How can the format we’re creating most elegantly, easily and interoperably be embedded in another XML document, like for example, RFC 4287 (Atom)?”.

    The DOCTYPE and having no namespace, for instance, are both huge show-stoppers.

  20. Asbjørn Ulsberg writes:
    > If the iCal fragment is supposed to
    > be embedded in a feed format …
    > isn’t it just as swell to inject the
    > alternate HTML (preferably XHTML)
    > representation into the entry/item
    > level of the feed …?

    Two issues:
    1. There is no canonical XHTML representation of an iCalendar object. The closest thing we have is a microformat, hCalendar, which currently encompasses only a fraction of iCalendar’s data richness.
    2. By embedding iCalendar (RFC 2445) data in a feed entry, a calendaring client (e.g. Outlook, Apple iCal, Google Calendar, Mozilla Sunbird) can import this data. None of these clients would know what to do with “an alternate XHTML representation.”

    Asbjørn also writes:
    > I think the [CalConnect XML TC] working
    > group is going in a direction that looks
    > like it could have been thought up
    > 10 or more years ago. DOCTYPE?
    > No namespaces?

    One possibility is that you’re looking at the wrong draft – an old draft of “xCal,” a former proposal for an XML representation of iCalendar, dating back at least three years. That document is linked from the CalConnect XML TC’s home page, but is *not* their work.

    In their current draft, the CalConnect XML TC is using Relax NG as their schema language, as mentioned in this October 2008 CalConnect Roundtable report:

    http://www.calconnect.org/roundtable13rpt.shtml

    I believe the XML TC’s draft schema has been released within the CalConnect organization, but I don’t know whether it is yet publicly available.

  21. Thanks, Aron, for clearing that up. It is true that I was reading an outdated specification of xCalendar. I’m glad that’s not the direction the TC is heading. Relax NG schema (and hopefully a namespace) sounds very promising indeed. I’m looking forward to reading a publicly available draft of the specification!

    With regard to embedding iCal in a feed, I was mostly thinking about using the feed-level text constructs as an alternative to a Quoted-Printable “DESCRIPTION” or “ALTREP” which look like major PITAs to both produce and consume. If there’s a real need to embed rich and structured HTML representations of calendar event descriptions, it seems like an easier solution to provide this through the feed directly than embedded within iCal within the feed, through archaic escaping and attachment techniques.

  22. I am a Java programmer. Can anybody answer my problem?
    I have to declare:
    DESCRIPTION;ALTREP="CID:xyz":Basic description.

    Here I need the DESCRIPTION to show formatted text in iCalendar, generated from Java code.

    Can anybody explain what "CID:xyz" is here?
    Where do I declare the things below in Java code?

    Content-Type:text/html
    Content-Id:xyz

    Enhanced description here Body of
    enhanced description.

    thanks in advance
    prvnhs

  23. Asbjørn: Thanks for the clarification and the excellent thoughts! I’ve experimented with putting an hCalendar representation of a subset of an iCalendar object into the ‘contents’ element of a feed entry, and setting the MIME type of that element to the XHTML type; is that generally what you’re thinking of, as well? When the draft XML representation of iCalendar comes out, that might be another way to represent this, one which could be transformed for consumption in an X/HTML world.

    To prvnhs: There is a Java library for generating and parsing iCalendar data: iCal4j http://ical4j.sourceforge.net/ If you’re not just writing a single DESCRIPTION property but need to generate valid calendar events, that’s an excellent tool to use for that purpose. As for the CID (Content-ID), my understanding is that it is essentially a way within iCalendar of referring to external, non-iCalendar data (such as vCard ‘business card’ data for a person) that may be carried along as a separate MIME body part, along with the iCalendar data, in a common transport message. There are some usage examples of a CID (for several other properties, although not for the DESCRIPTION property) in the iCalendar spec, RFC 2445, http://www.ietf.org/rfc/rfc2445.txt, as well as some examples on how to package this all up in the iMIP specification, RFC 2447, http://www.ietf.org/rfc/rfc2447.txt
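To illustrate the packaging idea, here is a rough sketch of a MIME message that carries the calendar data and the HTML body side by side, with the HTML part's Content-ID matching the `CID:xyz` in ALTREP. It uses Python's standard `email` package rather than Java. The `multipart/related` container and the `build_message` helper are assumptions of this sketch, and, as noted earlier in the thread, clients may well ignore the HTML part.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ICS = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//strange bird labs//Drupal iCal API//EN",
    "BEGIN:VEVENT",
    "DTSTART:20090106T203000Z",
    "DTEND:20090106T203000Z",
    "SUMMARY:Library Tea",
    'DESCRIPTION;ALTREP="CID:xyz":Basic description here.',
    "UID:http://www.frostfree.org/node/505",
    "END:VEVENT",
    "END:VCALENDAR",
    "",
])
HTML = ("<html><body><p><b>Enhanced description here</b> "
        "Body of enhanced description.</p></body></html>")

def build_message(ics, html, content_id="xyz"):
    """Bundle calendar data and its HTML alternative as separate MIME parts."""
    msg = MIMEMultipart("related")
    msg.attach(MIMEText(ics, "calendar", "utf-8"))
    html_part = MIMEText(html, "html", "utf-8")
    # Content-ID headers carry angle brackets; the CID: reference omits them.
    html_part["Content-ID"] = "<%s>" % content_id
    msg.attach(html_part)
    return msg

message = build_message(ICS, HTML)
print(message.as_string())
```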

  24. i’m trying to export .ics file from php file, but the events are off about 6 hours.
    is this because of timezone?
    if so, does anyone know how to handle timezones?
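A fixed offset of several hours often means local times were written as if they were UTC. I can't tell from the comment whether that is the cause here, and the question is about PHP, but the idea can be sketched in Python: convert an aware local time to UTC before writing it with the trailing `Z`. The UTC-6 offset and the `to_ical_utc` name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_ical_utc(local_dt):
    """Format a timezone-aware datetime as an iCalendar UTC date-time."""
    if local_dt.tzinfo is None:
        raise ValueError("need a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

# Assumption for illustration: the events were entered at a fixed UTC-6 offset.
local_zone = timezone(timedelta(hours=-6))
start = datetime(2009, 1, 6, 14, 30, tzinfo=local_zone)

print("DTSTART:" + to_ical_utc(start))
```

A fixed offset ignores daylight saving time; a real feed would want a proper time zone database or VTIMEZONE components.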

  25. Not sure exactly what is trying to be accomplished here but it looks like you’re just wanting to embed HTML descriptions in your calendar event. “DESCRIPTION” should only be used for plain text.

    If you want HTML use X-ALT-DESC instead of DESCRIPTION.

    example:
    X-ALT-DESC;FMTTYPE=text/html:***FULL HTML BODY HERE***

    1. Sorry, but after applying the above solution, the body is blank when I open the ICS file.

      The HTML description is not displayed.

      1. Thanks, Kai.
        Your example is very helpful. My issue was that when I opened the vCalendar file, it showed the HTML source as text instead of rendering colors, bold, etc.
        After applying

        X-ALT-DESC;FMTTYPE=text/html:

        it no longer shows the HTML as plain text; it renders the HTML as intended.

        Thanks for posting

  26. I don’t know how much visibility this comment will get, but beyond the little issues with copy/paste, this sample document is a valid iCalendar 2.0.

    Yes, even the quoted-printable. See Sec. 3.4 of RFC 2445 and page 18 of RFC 1521. RFC 2445 allows the use of Quoted-Printable for encoding data throughout the iCal file. RFC 1521 describes the Quoted-Printable format itself, including the fact that the trailing ‘=’ indicates a soft newline.

    Also, none of these points are new; your other commenters mentioned this before. Apparently it got sidetracked by talk about CalConnect, which even now still hasn’t produced a mainstream XML version of iCal.

    Long live 2445! :P
