If you think that the semantic web is just some kind of geek rapture, like the singularity, I can understand why. As with Zeno’s Paradox we’re always advancing but never arriving. Unlike the singularity, though, I do expect a semantic web to emerge in my lifetime. The latest initiative is schema.org, sponsored by Google, by Yahoo!, and by my employer, Microsoft. Schema.org recapitulates prior efforts to define how webmasters can mark up web pages to include structured data. I hope that this time, thanks to collaboration among the major search engines, we’ll finally cross the activation threshold.
Meanwhile I’ve been toiling in another part of the semantic web. I’ve been trying to get people to understand why and how to publish calendars as machine-readable structured data in addition to human-readable text. If you’ve followed the elmcity saga you know I’m on a crusade to make more and better use of the Internet’s venerable standard for exchanging structured calendar events: iCalendar.
It’s been a struggle. Almost every website run by a school, club, business, or town has an Events page. Those pages are, almost always, data silos: HTML or PDF files that can be read by people but cannot be processed by machines. Only rarely do such pages offer links to iCalendar feeds served up by Google Calendar or Drupal or Hotmail Calendar or some other service capable of producing such feeds. So when elmcity curators discover one of these rare feeds, it’s cause for rejoicing.
Sadly that joy is sometimes short-lived. A surprising number of iCalendar feeds just plain don’t work. That’s why I invited Doug Day to create the iCalendar Validator, a service that helps producers of iCalendar feeds conform to the specification. It’s always painful when I have to explain to a curator that the shiny new feed they’ve discovered doesn’t conform and won’t deliver events to the hub.
Here are three sources of iCalendar feeds that, I’ve recently discovered, don’t work.
1. The University of Michigan’s UM Events. It’s a major hub that serves many campus websites. As far as I can tell, all of those sites are providing feeds with malformed descriptions. Here’s an example of the problem and the fix. (iCalendar producer ID: UM//UM*Events)
2. CiviCRM is a “free, libre and open source software constituent relationship management solution.” Its feeds also have malformed descriptions. Here’s an example of the problem and the fix. (iCalendar producer ID: CiviCRM//NONSGML CiviEvent iCal)
3. Drupal is a popular open source content management system. I’ve seen Drupal feeds used successfully, but today I found one that fails for two reasons: a malformed recurrence rule, and a missing timezone definition. Here’s an example of the problem and the fix. (iCalendar producer ID: Drupal iCal API)
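To make those three failure modes concrete, here is a rough Python sketch of the kinds of checks involved. It is not the iCalendar Validator's actual logic, and the function names and sample feed are hypothetical. It also assumes that a "malformed description" means a value containing a raw line break that was neither folded nor escaped; the linked examples have the real details. A stray line that happens to start with something like `Time:` would slip past this simple check.

```python
import re

# A content line looks like NAME[;params]:value (RFC 5545, section 3.1).
CONTENT_LINE = re.compile(r"^[A-Za-z0-9-]+(;[^:]*)?:")
# A TZID parameter in the part of the line before the first colon.
TZID_PARAM = re.compile(r';TZID=("[^"]*"|[^;:]*)', re.IGNORECASE)


def unfold(text):
    """Join folded lines: a line starting with space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return [line for line in lines if line.strip()]


def check_feed(text):
    """Return a list of human-readable problems found in an iCalendar feed."""
    problems = []
    defined_tz, used_tz = set(), set()
    in_vtimezone = False
    for n, line in enumerate(unfold(text), 1):
        # Assumption: a raw line break inside DESCRIPTION leaves behind a
        # stray line that is not of the form NAME:value.
        if not CONTENT_LINE.match(line):
            problems.append(f"logical line {n}: not NAME:value (unfolded line break?)")
            continue
        head, _, value = line.partition(":")
        name = head.split(";", 1)[0].upper()
        if name == "BEGIN" and value.upper() == "VTIMEZONE":
            in_vtimezone = True
        elif name == "END" and value.upper() == "VTIMEZONE":
            in_vtimezone = False
        elif name == "TZID" and in_vtimezone:
            defined_tz.add(value.strip())
        elif name == "RRULE":
            parts = value.split(";")
            keys = {p.partition("=")[0].upper() for p in parts}
            # Every rule part must be KEY=VALUE, and FREQ is required.
            if "FREQ" not in keys or any("=" not in p for p in parts):
                problems.append(f"logical line {n}: malformed recurrence rule")
        else:
            match = TZID_PARAM.search(head)
            if match:
                used_tz.add(match.group(1).strip('"'))
    for tz in sorted(used_tz - defined_tz):
        problems.append(f"TZID {tz!r} is used but no VTIMEZONE defines it")
    return problems


# Hypothetical feed exhibiting all three problems.
SAMPLE = """\
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Example//Hypothetical Producer//EN
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20110606T190000
RRULE:INTERVAL=1;BYDAY=MO
DESCRIPTION:Doors open at seven.
Bring a friend
END:VEVENT
END:VCALENDAR
"""

if __name__ == "__main__":
    for problem in check_feed(SAMPLE):
        print(problem)
```

The point of returning a list of problems, rather than raising on the first one, is that a producer usually wants to see everything wrong with a feed in one pass.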
These problems are minor and would be easy to resolve. I’ll try to contact the authors of these iCalendar producers; if you can help put me in touch I’d appreciate that. I’m also going to look through the logs written by the iCalendar Validator, compile a list of producers of invalid feeds, and try to contact them as well.
Where’s the connection to the semantic web? At the end of the day, as RSS/Atom validator co-creator Sam Ruby likes to say, “It’s just data.” But structured data, whether it conforms to the dozen-year-old iCalendar standard or some newfangled microdata standard, is easily screwed up. And the consequences of screwups are often silent. Services that were looking for that structured data find nothing, mutter to themselves, and move along.
As we collectively create the semantic web we’ll need to make sure that the structured data we intend to publish really says what we mean.