I’m working on a project that aggregates a bunch of community calendars, plus a lot of calendar info that’s just written out free-form. Some examples of the latter, in ascending order of resistance to mechanical parsing:
2 Apr – Wed 10:00AM-10:45AM
Thu, 11/15/07 – Fri, 4/11/08
Every Tuesday of the month from 10:00-11:00 a.m
Sat., Apr. 05, 9:00 AM Registration/Preview, 10:00 AM Live Auction
2nd Saturday of every other month, 10:00 am-12:00 pm
Programming languages tend to offer plenty of functions and modules for converting between machine formats, and for rendering machine formats in human-readable form, but when it comes to recognizing human formats, not so much.
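Python's standard datetime module shows this asymmetry. It converts between machine formats and renders human-readable text easily, but `strptime` only recognizes text whose exact layout is supplied in advance, and it has no notion of recurrence at all. The year 2008 in the sample string is an assumption: the original example omits it, and without `%Y` `strptime` silently defaults to 1900.

```python
from datetime import datetime

# Machine format -> machine format: ISO 8601 text to a datetime object.
start = datetime.fromisoformat("2008-04-05T10:00:00")

# Machine format -> human format: strftime renders any layout we choose
# (day/month names shown assume the default C locale).
human = start.strftime("%a., %b. %d, %I:%M %p")

# Human format -> machine format works only if we already know the exact layout.
LAYOUT = "%a., %b. %d %Y, %I:%M %p"
parsed = datetime.strptime("Sat., Apr. 05 2008, 10:00 AM", LAYOUT)

# A recurrence phrase matches no fixed layout, so strptime simply rejects it.
try:
    datetime.strptime("2nd Saturday of every other month", LAYOUT)
    recognized = True
except ValueError:
    recognized = False
```

Each new free-form variant would need its own hand-written layout string, which is exactly the maintenance burden a real recognizer is supposed to remove.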
While looking around for a recognizer, I came across the script that Jamie Zawinski uses to manage the calendar for the DNA Lounge, his San Francisco nightclub. It looks like it can handle many of these formats, but it’s a 6500-line Perl behemoth that does a lot of other things besides.
What else is available, in any language, preferably something more focused and packaged as a library, that can turn an item in human format, like “2nd Saturday of every other month, 10:00 am-12:00 pm,” into a sequence of items in machine format?
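To make "a sequence of items in machine format" concrete: the example phrase corresponds roughly to an iCalendar (RFC 5545) rule such as `FREQ=MONTHLY;INTERVAL=2;BYDAY=2SA`, and expanding such a rule is the easy half of the problem. Below is a minimal hand-rolled sketch of that expansion using only the standard library. The function names are hypothetical, and because the phrase never says which months "every other" starts from, the starting month (January 2008 here) is an assumption a recognizer would have to supply or infer. Recognizing the phrase in the first place is not attempted.

```python
from datetime import date, datetime, time, timedelta

SATURDAY = 5  # date.weekday(): Monday == 0 ... Sunday == 6

def nth_weekday(year: int, month: int, n: int, weekday: int) -> date:
    """Hypothetical helper: the n-th (1-based) given weekday of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    day = first + timedelta(days=offset + 7 * (n - 1))
    if day.month != month:
        raise ValueError(f"no occurrence #{n} of weekday {weekday} in {year}-{month:02d}")
    return day

def expand_series(start_year: int, start_month: int, n: int, weekday: int,
                  every_months: int, count: int,
                  start: time, end: time) -> list[tuple[datetime, datetime]]:
    """Hypothetical expander: (start, end) datetimes for `count` occurrences."""
    events = []
    year, month = start_year, start_month
    for _ in range(count):
        day = nth_weekday(year, month, n, weekday)
        events.append((datetime.combine(day, start), datetime.combine(day, end)))
        # Advance by `every_months`, carrying overflow into the next year.
        month += every_months
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
    return events

# "2nd Saturday of every other month, 10:00 am-12:00 pm", assumed to start in Jan 2008.
series = expand_series(2008, 1, 2, SATURDAY, 2, 4, time(10), time(12))
```

Here `series` holds the 2nd Saturdays of January, March, May and July 2008, each as a 10:00 to 12:00 pair, which is the shape of output a recognizer would ideally hand back.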