Ask and ye may receive, don’t ask and ye surely will not

This fall a small team of University of Toronto and Michigan State undergrads will be working on parts of the elmcity project by way of Undergraduate Capstone Open Source Projects (UCOSP), organized by Greg Wilson. In our first online meeting, the students decided they’d like to tackle the problem that FuseCal was solving: extraction of well-structured calendar information from weakly-structured web pages.

From a computer science perspective, there’s a fairly obvious path. Start with specific examples that can be scraped, then work toward a more general solution. So the first two examples are going to be MySpace and LibraryThing. The recipes [1, 2] I’d concocted for FuseCal-written iCalendar feeds were especially valuable because they could be used by almost any curator for almost any location.
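One way to picture the "specific examples first" path is a registry of site-specific scrapers. Each scraper is keyed by the hostname it handles, and a dispatcher picks the right one for a given URL. The Python sketch below is hypothetical throughout. The host `example-venue.test`, the `<li class="event">` markup, the date-then-title text format, and every function name are invented for illustration. None of it reflects how MySpace, LibraryThing, or the elmcity code actually structure anything. It uses only the standard library's `html.parser`.

```python
from datetime import date
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical registry: hostname -> scraper function(html) -> [(date, title)].
SCRAPERS = {}


def scraper(host):
    """Decorator that registers a function as the scraper for `host`."""
    def register(func):
        SCRAPERS[host] = func
        return func
    return register


class EventListParser(HTMLParser):
    """Collects the text of each <li class="event"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_event = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "event") in attrs:
            self.in_event = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_event = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_event:
            self.items[-1] += data


@scraper("example-venue.test")
def scrape_example_venue(html):
    """Hypothetical layout: '<li class="event">YYYY-MM-DD Title</li>'."""
    parser = EventListParser()
    parser.feed(html)
    parser.close()
    events = []
    for text in parser.items:
        day, _, title = text.strip().partition(" ")
        events.append((date.fromisoformat(day), title))
    return events


def scrape(url, html):
    """Dispatch to the scraper registered for the URL's hostname."""
    host = urlparse(url).hostname
    func = SCRAPERS.get(host)
    if func is None:
        raise LookupError(f"no scraper registered for {host}")
    return func(html)


if __name__ == "__main__":
    page = ('<ul><li class="event">2009-10-02 Open mic</li>'
            '<li class="event">2009-10-09 Book talk</li></ul>')
    for day, title in scrape("http://example-venue.test/calendar", page):
        print(day.isoformat(), title)
```

Each new site means one more registered function. Each one is tied to that site's current markup, so it breaks whenever the page layout changes.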

But as I mentioned to the students, there’s another way to approach these two cases. And I was reminded of it again when Michael Foord pointed to this fascinating post prompted by the open source release of FriendFeed’s homegrown web server, Tornado. The author of the post, Glyph Lefkowitz, is the founder of Twisted, a Python-based network programming framework that includes the sort of asynchronous event-driven capabilities that FriendFeed recreated for Tornado. Glyph writes:

If you’re about to undergo a re-write of a major project because it didn’t meet some requirements that you had, please tell the project that you are rewriting what you are doing. In the best case scenario, someone involved with that project will say, “Oh, you’ve misunderstood the documentation, actually it does do that”. In the worst case, you go ahead with your rewrite anyway, but there is some hope that you might be able to cooperate in the future, as the project gradually evolves to meet your requirements. Somewhere in the middle, you might be able to contribute a few small fixes rather than re-implementing the whole thing and maintaining it yourself.

Whether FriendFeed could have improved the parts of Twisted that it found lacking, while leveraging its synergistic aspects, is a question only specialists close to both projects can answer. But Glyph is making a more general point. If you don’t communicate your intentions, such questions can never even be asked.

Tying this back to the elmcity project, I mentioned to the students that the best scraper for MySpace and LibraryThing calendars is no scraper at all. If these services produced iCalendar feeds directly, there would be no need for one. That would be the ideal solution: a win for existing users of the services, and for the iCalendar ecosystem I’m trying to bootstrap.
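Roughly speaking, producing an iCalendar feed directly means serializing event records a service already stores into RFC 5545 text. The Python sketch below illustrates this under stated assumptions; it is not how MySpace or LibraryThing would implement it. The event tuples, UIDs, PRODID string, and `example.test` URLs are hypothetical. Events are treated as all-day, and the sketch skips line folding, time zones, and recurrence. Each event's `URL` property points back to its source page.

```python
from datetime import date, datetime, timedelta, timezone


def escape_text(value):
    """Escape an iCalendar TEXT value (RFC 5545, section 3.3.11)."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_ics(events, prodid="-//Hypothetical//Example Feed//EN", now=None):
    """Render (uid, day, summary, url) tuples as an iCalendar string.

    Events are all-day (VALUE=DATE); DTEND is exclusive, so it is the next day.
    Lines end with CRLF as RFC 5545 requires; folding of lines longer than
    75 octets is omitted to keep the sketch short.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumes `now` is in UTC
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{prodid}"]
    for uid, day, summary, url in events:
        next_day = day + timedelta(days=1)
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uid}",
            f"DTSTAMP:{stamp}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",
            f"DTEND;VALUE=DATE:{next_day:%Y%m%d}",
            f"SUMMARY:{escape_text(summary)}",
            f"URL:{url}",  # link back to the authoritative source page
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(to_ics([("evt-1@example.test", date(2009, 10, 2),
                   "Open mic; all ages", "http://example.test/band/events/1")]))
```

For a service that already keeps events as structured records, the serialization side is small. Time zones, recurrence, and line folding are where a production feed would more likely rely on an existing iCalendar library.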

I’ve previously asked contacts at MySpace and LibraryThing about this. But now, since we’re intending to scrape those services for calendar info, it can’t hurt to announce that intention and hope one or both services will provide feeds directly and obviate the need. That way the students can focus on different problems — and there are plenty to choose from.

So I’ll be sending the URL of this post to my contacts at those companies, and if any readers of this blog can help move things along, please do. We may end up with scrapers anyway. But maybe not. Maybe iCalendar feeds have already been provided, but aren’t documented. Maybe they were in the priority stack and this reminder will bump them up. It’s worth a shot. If the problem can be solved by communicating intentions rather than writing redundant code, that’s the ultimate hack. And it’s one that I hope more computer science students will learn to aspire to.

8 thoughts on “Ask and ye may receive, don’t ask and ye surely will not”

  1. Malcolm Tredinnick

    Greg Wilson’s work with groups at UToronto deserves every mention it can get. He seems to be leading a group of educators who are doing a real service to their students, giving them guided experience at real software development, including interaction with the outside world. I’m not that surprised to see it was that group who picked it up. Greg is setting an impressive standard and I’m a big fan.

  2. Ian Barnes

    I haven’t been following this thread particularly closely, so perhaps what I’m going to suggest is already on the agenda… Anyway, if you do go ahead with the scrapers, I think it would be worth considering the Zotero model: rather than writing lots of page scrapers yourselves or trying to come up with something incredibly clever and general that will probably turn out to be quite fragile and hard to maintain, instead use the wisdom of the crowd and create a framework (plugins, plugin templates, etc.) that makes it as easy as possible for reasonably tech-savvy users to extend the service by adding new scrapers for their favourite sites.

  3. Randy

    Jon,
    BRAVO – I loathe scrapers, and I use them regularly because so many programs don’t offer structured data. I wrote my first scraper in 1988 and had to update it regularly as the source morphed. What a waste of time!

    Good luck in your attempt to get these features made available through the big websites. Walk in Facebook’s shoes and ask yourself: why do I want someone else to repurpose my data in a way that lets people NOT visit my site? The site loses the eyeballs, and the data disappears behind your app. I like the intent; I just don’t see how it is in their interest to cooperate. My great hope is that I am simply wrong!

    peace.
    randy

  4. Jon Udell (post author)

    Consider MySpace. It’s become the place for musicians to hang their shingles. Musicians care very much about promoting themselves not only on MySpace but elsewhere. Exporting a calendar feed would make MySpace more useful for that purpose.

    Note also that the style I’m promoting is all about links back to authoritative sources. In this case, that source would be MySpace. Every event exported through the feed into other contexts would be an invitation to visit the band’s MySpace page to sample the music and evaluate whether to attend the event.

  5. Pingback: Ask! « UCOSP

