Familiar idioms in Perl, Python, JavaScript, and C#

When I started working on the elmcity project, I planned to use my language of choice in recent years: Python. But early on, IronPython wasn’t fully supported on Azure, so I switched to C#. Later, when IronPython became fully supported, there was really no point in switching my core roles (worker and web) to it, so I’ve proceeded in a hybrid mode. The core roles are written in C#, and a variety of auxiliary pieces are written in IronPython.

Meanwhile, I’ve been creating other auxiliary pieces in JavaScript, as will happen with any web project. The other day, at the request of a calendar curator, I used JavaScript to prototype a tag summarizer. This was so useful that I decided to make it a new feature of the service. The C# version was so strikingly similar to the JavaScript version that I just had to set them side by side for comparison:

JavaScript C#
var tagdict = new Object();

for ( i = 0; i < obj.length; i++ )
  {
  var evt = obj[i];
  if ( evt["categories"] != undefined)
    {
    var tags = evt["categories"].split(',');
    for (j = 0; j < tags.length; j++ )
      {
      var tag = tags[j];
      if ( tagdict[tag] != undefined )
        tagdict[tag]++;
      else
        tagdict[tag] = 1;
      }
    }
  }
var tagdict = new Dictionary();

foreach (var evt in es.events)
  {

  if (evt.categories != null)
    {
    var tags = evt.categories.Split(',');
    foreach (var tag in tags)
      {

      if (tagdict.ContainsKey(tag))
        tagdict[tag]++;
      else
        tagdict[tag] = 1;
      }
    }
  }
var sorted_keys = [];

for ( var tag in tagdict )
  sorted_keys.push(tag);

sorted_keys.sort(function(a,b) 
 { return tagdict[b] - tagdict[a] });
var sorted_keys = new List();

foreach (var tag in tagdict.Keys)
  sorted_keys.Add(tag);

sorted_keys.Sort( (a, b) 
  => tagdict[b].CompareTo(tagdict[a]));

The idioms involved here include:

  • Splitting a string on a delimiter to produce a list

  • Using a dictionary to build a concordance of strings and occurrence counts

  • Sorting an array of keys by their associated occurrence counts

I first used these idioms in Perl. Later they became Python staples. Now here they are again, in both JavaScript and C#.

Ask and ye may receive, don’t ask and ye surely will not

This fall a small team of University of Toronto and Michigan State undergrads will be working on parts of the elmcity project by way of Undergraduate Capstone Open Source Projects (UCOSP), organized by Greg Wilson. In our first online meeting, the students decided they’d like to tackle the problem that FuseCal was solving: extraction of well-structured calendar information from weakly-structured web pages.

From a computer science perspective, there’s a fairly obvious path. Start with specific examples that can be scraped, then work toward a more general solution. So the first two examples are going to be MySpace and LibraryThing. The recipes[1, 2] I’d concocted for FuseCal-written iCalendar feeds were especially valuable because they could be used by almost any curator for almost any location.

But as I mentioned to the students, there’s another way to approach these two cases. And I was reminded of it again when Michael Foord pointed to this fascinating post prompted by the open source release of FriendFeed’s homegrown web server, Tornado. The author of the post, Glyph Lefkowitz, is the founder of Twisted, a Python-based network programming framework that includes the sort of asynchronous event-driven capabilities that FriendFeed recreated for Tornado. Glyph writes:

If you’re about to undergo a re-write of a major project because it didn’t meet some requirements that you had, please tell the project that you are rewriting what you are doing. In the best case scenario, someone involved with that project will say, “Oh, you’ve misunderstood the documentation, actually it does do that”. In the worst case, you go ahead with your rewrite anyway, but there is some hope that you might be able to cooperate in the future, as the project gradually evolves to meet your requirements. Somewhere in the middle, you might be able to contribute a few small fixes rather than re-implementing the whole thing and maintaining it yourself.

Whether FriendFeed could have improved the parts of Twisted that it found lacking, while leveraging its synergistic aspects, is a question only specialists close to both projects can answer. But Glyph is making a more general point. If you don’t communicate your intentions, such questions can never even be asked.

Tying this back to the elmcity project, I mentioned to the students that the best scraper for MySpace and LibraryThing calendars is no scraper at all. If these services produced iCalendar feeds directly, there would be no need. That would be the ideal solution — a win for existing users of the services, and for the iCalendar ecosystem I’m trying to bootstrap.

I’ve previously asked contacts at MySpace and LibraryThing about this. But now, since we’re intending to scrape those services for calendar info, it can’t hurt to announce that intention and hope one or both services will provide feeds directly and obviate the need. That way the students can focus on different problems — and there are plenty to choose from.

So I’ll be sending the URL of this post to my contacts at those companies, and if any readers of this blog can help move things along, please do. We may end up with scrapers anyway. But maybe not. Maybe iCalendar feeds have already been provided, but aren’t documented. Maybe they were in the priority stack and this reminder will bump them up. It’s worth a shot. If the problem can be solved by communicating intentions rather than writing redundant code, that’s the ultimate hack. And its one that I hope more computer science students will learn to aspire to.

FriendFeed for project collaboration

For me, FriendFeed has been a new answer to an old question — namely, how to collaborate in a loosely-coupled way with people who are using, and helping to develop, an online service. The elmcity project’s FriendFeed room has been an incredibly simple and effective way to interleave curated calendar feeds, blog postings describing the evolving service that aggregates those feeds, and discussion among a growing number of curators.

In his analysis of Where FriendFeed Went Wrong Dare Obasanjo describes the value of a handful of services (Facebook, Twitter, etc.) in terms that would make sense to non-geeks like his wife. Here’s the elevator pitch for FriendFeed:

Republish all of the content from the different social networking media websites you use onto this site. Also one place to stay connected to what people are saying on multiple social media sites instead of friending them on multiple sites.

As usual, I’m an outlying data point. I’m using FriendFeed as a lightweight, flexible aggregator of feeds from my blog and from Delicious, and as a discussion forum. These feeds report key events in the life of the project: I added a new feature to the aggregator, the curator for Sasktatoon found and added a new calendar. The discussion revolves around strategies for finding or creating calendar feeds, features that curators would like me to add to the service, and problems they’re having with the service.

I doubt there’s a mainstream business model here. It’s valuable to me because I’ve created a project environment in which key events in the life of the project are already flowing through feeds that are available to be aggregated and discussed. Anyone could arrange things that way, but few people will.

It’s hugely helpful to me, though. And while I don’t know for sure that FriendFeed’s acquisition by FaceBook will end my ability to use FriendFeed in this way, I do need to start thinking about how I’d replace the service.

I don’t need a lot of what FriendFeed offers. Many of the services it can aggregate — Flickr, YouTube, SlideShare — aren’t relevant. And we don’t need realtime notification. So it really boils down to a lightweight feed aggregator married to a discussion forum.

One feature that FriendFeed’s API doesn’t offer, by the way, but that I would find useful, is programmatic control of the aggregator’s registry. When a new curator shows up, I have to manually add the associated Delicious feed to the FriendFeed room. It’d be nice to automate that.

Ideally FriendFeed will coast along in a way that lets me keep using it as I currently am. If not, it wouldn’t be too hard to recreate something that provides just the subset of FriendFeed’s services that I need. But ideally, of course, I’d repurpose an existing service rather than build a new one. If you’re using something that could work, let me know.

elmcity and WordPress MU: Questions and answers

In the spirit of keystroke conservation, I’m relaying some elmcity-related questions and answers from email to here. Hopefully it will attract more questions and more answers.

Dear Mr. Udell,

I am looking for a flexible calendar aggregator that I can use to report upcoming events for our college’s “Learning Commons” WordPress MU website, a site that will hopefully help keep our students abreast of events and opportunities taking place on campus.

1) Our site will be maintained using WordPress MU, so ideally the
display of the calendars, and/or event-lists will be handled by a
WordPress plugin. The one I am favouring is
http://wordpress.org/extend/plugins/wordpress-ics-importer/ . I have
tried this plugin and it almost does what we want.

Specifically, the plugin includes:

– a single widget that can display the “event-list” for one calendar;

– flexible options for displaying and aggregating calendars.

This plugin almost does what I want, but not quite.

a) The plugin is now limited to a single “events-list” widget. But with WordPress 2.8, it is possible to have many instances of a widget, so theoretically, I could display the “Diagnostic Tests” calendar in one instance , and the “Peer-tutoring” calendar in another widget instance.

b) It would be nice to have an option to display only the current week for specific calendars. While in other cases, it makes sense to display the entire month. And although I haven’t thought about it, likely displaying just the current day would be useful.

c) I would like flexibility over which calendars to aggregate, creating as many “topic” hubs as the current maintainer of the website might think useful for the students.

2) It would be nice to remove the calendar aggregation from the WordPress plugin, and handle that separately. Hopefully the calendars will change much less frequently than the website will be viewed. If I understand https://blog.jonudell.net/elmcity-project-faq/ properly, this might be possible using the elmcity-project.

For example, I think we could use “topical hub aggregation” to create a “diagnostic test calendar” that aggregates the holiday calendar and the different departments “diagnostic test” calendars. What I don’t understand is what is the output of “elmcity”. Does it output a single merged calendar (ics) that could be displayed by the above plugin? Is that a possibility?

Similarly, I believe I could create a different meta bookmark to aggregate our holiday calendar and our different peer-tutoring calendars (created by each department). Is this correct?

We have lots of groups, faculty, departments and staff on campus, and each wants to publicize their upcoming events. Letting them input and maintain their own calendars really seems to make sense. (Thanks for the idea. It seems clear this is the way to go, but I don’t seem to have the pieces to construct the output properly, as yet.)

I agree with your analysis that it would be better to have a separation of concerns between aggregation and display. So let’s do that, and start with aggregation.

I would like flexibility over which calendars to aggregate, creating as many “topic” hubs as the current maintainer of the website might think useful for the students.

I think the elmcity system can be helpful here. I’ve recently discovered that there are really two levels — what I’ve started to call curation and meta-curation.

I believe I could create bookmarks to aggregate our holiday calendar and our different peer-tutoring calendars (created by each department). Is this correct?

Right. It sounds like you’d want to curate a topic hub. It could be YourCollege, but if there may need to be other topic hubs you could choose a more specific name, like YourCollegeLearningCommons. That’d be your Delicious account name, and you’d be the “meta-curator” in this scenario.

As meta-curator you’d bookmark, in that Delicious account:

– Your holiday calendar

– Multiple departments’ calendars

Each of those would be managed by the responsible/authoritative person, using any software (Outlook, Google, Apple, Drupal, Notes, Live, etc.) that can publish an ICS feed.

There’s another level of flexibility using tags. In the above scenario, as meta-curator you could tag your holiday feed as holiday, and your LearningCommons feeds as LearningCommons, and then filter them accordingly.

What I don’t understand is what is the output of elmcity. Does it output a single merged calendar (ics) that could be displayed by the above plugin?

Yes. The outputs currently are:

Now, for the display options. So far, we’ve got:

  • Use the WordPress plugin to display merged ICS

  • Display the entire calendar as included (maybe customized) HTML

  • Display today’s events as included or script-sourced HTML

  • I have also just recently added a new method that enables things like this: http://jonudell.net/test/upcoming-widget.html

  • You can view the source to see how it’s done. The “API call” here is:

    http://elmcity.cloudapp.net/services/elmcity/json?jsonp=eventlist&recent=7&view=music

    Yours might be:

    http://elmcity.cloudapp.net/services/YourCollegeLearningCommons/json?jsonp=eventlist&recent=10

    or

    &recent=20&view=holiday

    etc.

    This is brand new, as of yesterday. Actually I just realized I should use “upcoming” instead of “recent” so I’ll go and change that now :-) But you get the idea.

    The flexibility here is ultimately governed by:

    1. The curator’s expressive and disciplined use of tags to create useful views

    2. The kinds of queries I make available through the API. So far I’ve only been asked to do ‘next N events’ so that’s what I did yesterday. But my intention is to support every kind of query that’s feasible, and that people ask for. Things like a week’s worth, or a week’s worth in a category, are obvious next steps.

Two projects for civic-minded student programmers

One of the key findings of the elmcity project, so far, is that there’s a lot of calendar information online, but very little in machine-readable form. Transforming this implicit data about public events into explicit data is an important challenge. I’ve been invited to define the problem, for students who may want to tackle it as a school project. Here are the two major aspects I’ve identified.

A general scraper for calendar-like web pages

There are zillions of calendar-like web pages, like this one for Harlow’s Pub in Peterborough, NH. These ideally ought to be maintained using calendar programs that publish machine-readable iCalendar feeds which are also transformed and styled to create human-readable web pages. But that doesn’t (yet) commonly happen.

These web pages are, however, often amenable to scraping. And for a while, elmcity curators were making very effective use of FuseCal (1, 2, 3) to transform these kinds of pages into iCalendar feeds.

When that service shut down, I retained a list of the pages that elmcity curators were successfully transforming into iCalendar feeds using FuseCal. These are test cases for an HTML-to-iCalendar service. Anyone who’s handy with scraping libraries like Beautiful Soup can solve these individually. The challenge here is to create, by abstraction and generalization, an engine that can handle a significant swath of these cases.

A hybrid system for finding implicit recurring events and making them explicit

Lots of implicit calendar data online doesn’t even pretend to be calendar-like, and cannot be harvested using a scraper. Finding one-off events in this category is out of scope for my project. But finding recurring events seems promising. The singular effort required to publish one of these will pay ongoing dividends.

It’s helpful that the language people use to describe these events — “every Tuesday”, “third Saturday of every month” — is distinctive. To being exploring this domain, I wrote a specialized search robot that looks for these patterns, in conjunction with names of places. Its output is available for all the cities and towns participating in the elmcity project. For example, this page is the output for Keene, NH. It includes more than 2000 links to web pages — or, quite often, PDF files — some fraction of which represent recurring events.

In Finding and connecting social capital I showed a couple of cases where the pages found this way did, in fact, represent recurring events that could be added to an iCalendar feed.

To a computer scientist this looks like a problem that you might solve using a natural language parser. And I think it is partly that, but only partly. Let’s look at another example:

At first glance, this looks hopeful:

First Monday of each month: Dads Group, 105 Castle Street, Keene NH

But the real world is almost always messier than that. For starters, that image comes from the Monadnock Men’s Resource Center’s Fall 2004 newsletter. So before I add this to a calendar, I’ll want to confirm the information. The newsletter is hosted at the MMRC site. Investigation yields these observations:

  • The most recent issue of the newsletter was Winter ’06

  • The last-modified date of the MMRC home page is September 2008

  • As of that date, the Dads Group still seems to have been active, under a slightly different name: Parent Outreach Project, DadTime Program, 355-3082

  • There’s no email address, only a phone number.

So I called the number, left a message, and will soon know the current status.

What kind of software-based system can help us scale this gnarly process? There is an algorithmic solution, surely, but it will need to operate in a hybrid environment. The initial search-driven discovery of candidate events can be done by an automated parser tuned for this domain. But the verification of candidates will need to be done by human volunteers, assisted by software that helps them:

  • Divide long lists of candidates into smaller batches

  • Work in parallel on those batches

  • Evaluate the age and provenance of candidates

  • Verify or disqualify candidates based on discoverable evidence, if possible

  • Otherwise, find appropriate email addresses (preferably) or phone numbers, and manage the back-and-forth communication required to verify or disqualify a candidate

  • Refer event sponsors to a calendar publishing how-to, and invite them to create data feeds that can reliably syndicate

Students endowed with the geek gene are likely to gravitate toward the first problem because it’s cleaner. But I hope I can also attract interest in the second problem. We really need people who can hack that kind of real-world messiness.

Curation, meta-curation, and live Net radio

I’ve long been dissatisfied with how we discover and tune into Net radio. This iTunes screenshot illustrates the problem:

Start with a genre, pick a station in that genre, then listen to that station. This just doesn’t work for me. I like to listen to a lot of different things. And I especially value serendipitous recommendations from curators whose knowledge and preferences diverge radically from my own.

Yes there’s Pandora, but what I’ve been wanting all along is a way to enable and then subscribe to curators who guide me to what’s playing now on the live streams coming from radio stations around the world. It’s Wednesday morning, 11AM Eastern Daylight Time, and I know there are all kinds of shows playing right now. But how do I materialize a view for this moment in time — or for tonight at 9PM, or for Sunday morning at 10AM — across that breadth and wealth of live streams?

I started thinking about schedules of radio programs, and about calendars, and about BBC Backstage — because I’ll be interviewing Ian Forrester for an upcoming episode of my podcast — and I landed on this blog post which shows how to form an URL that retrieves upcoming episodes of a BBC show as an iCalendar feed.

Meanwhile, I’ve just created a new mode for the elmcity calendar aggregator. Now instead of creating a geographical hub, which combines events from Eventful and Upcoming and events from a list of iCalendar feeds — all for one location — you can create a topical hub whose events are governed only by time, not by location.

Can these ingredients combine to solve my Net radio problem? Could a curator for an elmcity topical aggregator cherrypick favorite shows from around the Net, and create a calendar that shows me what’s playing right now?

It seems plausible, so I spun up a new topical hub in the elmcity aggregator and started experimenting.

I began with the BBC’s iCalendar feeds. But evidently they don’t include VTIMEZONE components, which means calendar clients (or aggregators) can’t translate UK times to other times.

I ran into a few other issues, which perhaps can be sorted out when I chat with Ian Forrester. But meanwhile, since the universe of Net radio is much vaster than the BBC, and since most of it won’t be accessible in the form of data feeds, I stepped back for a broader view.

Really, anyone can publish an event that gives the time for a live show, plus a link to its player. And when a show happens on a regular recurring schedule, the little bit of effort it takes to publish that event pays recurring dividends.

Consider, for example, Nic Harcourt’s Sounds Eclectic. It’s on at these (Pacific) times: SUN 6:00A-8:00A, SAT 2:00P-4:00P, SAT 10:00P-12:00A. You can plug these into any calendar program as recurring events. And if you publish a feed, it’s not only available to you from any calendar client, it’s also available to any other calendar client — or to any aggregator.

Here’s a calendar with three recurring events for Sounds Eclectic, plus one recurring event for WICN’s Sunday jazz show, plus a single non-recurring event — the BBC’s Folkscene — which will be on the BBC iPlayer on Thursday at 4:05PM my time and 9:05PM UK time. If you load the calendar feed into a client — Outlook, Apple iCal, Google Calendar, Lotus Notes — you’ll see these events translated into your local timezone.

Note that Live Calendar is especially handy for publishing events from many different timezones. That’s because like Outlook, but unlike Google Calendar, it enables you to specify timezones on a per-event basis. So instead of having to enter the Sunday morning recurrence of Sounds Eclectic as 9AM Eastern Daylight, I can enter it as 6AM Pacific Daylight Time. Likewise Folkscene: I can enter 9:05 British Summer Time. Since these are the times that appear on the shows’ websites, it’s natural to use them.

This sort of calendar is great for personal use. But I’m looking for the Webjay of Net radio. And I think maybe elmcity topical hubs can help enable that.

There’s a way of using these topical hubs I hadn’t thought of until Tony Karrer created one. Tony runs TechEmpower, a software, web, and eLearning development firm. He wants to track and publish online eLearning events, so he’s managing them in Google Calendar and syndicating them through an elmcity topical hub to his website.

A topical hub, like a geographic hub, is controlled by a Delicious account whose owner maintains a list of feeds. I’d been thinking of the account owner as the curator, and of the feeds as homogeneous sources of events: school board meetings, soccer games, and so on.

But then Tony partnered with another organization that tracks webinars, invited that group to publish its own feed, added it to the eLearning hub, and wrote a blog post entitled Second Calendar Curator Joins to Help with List of Free Webinars:

The initial list of calendar entries, we added ourselves. But I’m pleased to announce that we’ve just signed up our second calendar curator – Coaching Ourselves. Their events are now appearing in the listings. … It is exactly because we can distribute the load of keeping this list current that makes me think this will work really well in the long run.

This probably shouldn’t have surprised me, but it did. I’d been thinking in terms of curators, feeds, and events. What Tony showed me is that you can also (optionally) think in terms of meta-curators, curators, feeds, and events. In this example, Tony is himself a curator, but he is also a meta-curator — that is, a collector of curators.

I’d love to see this model evolve in the realm of Net radio. If you want to join the experiment, just use any calendar program to keep track of some of your favorite recurring shows. (Again, it’s very helpful to use one that supports per-event timezones.) Then publish the shows as an iCalendar feed, and send me the URL. As the meta-curator of delicious.com/InternetRadio, as well as the curator of jonu.calendar.live.com/calendar/InternetRadio/index.html, I’ll have two options. If I like most or all of the shows you like, I can add your feed to the hub. If I only like some of the shows you like, I can cherrypick them for my feed. Either way, the aggregated results will be available as XML, as JSON, and as an iCalendar feed that can flow into calendar clients or aggregators.

Naturally there can also be other meta-curators. To become one, designate a Delicious account for the purpose, spin up your own topical hub, and tell me about it.

Topical event hubs

The elmcity project began with a focus on aggregating events for communities defined by places: cities, towns. But I realized a while ago that it could also be used to aggregate events for communities defined by topics. So now I’m building out that capability. One early adopter tracks and promotes online events in the e-learning domain. Another tracks and promotes conferences and events related to environmentally-sustainable business practices.

The curation method is very similar to what’s defined in the elmcity project FAQ. To define a topic hub you use a Delicious account, you create a metadata URL as shown in the FAQ, and you use what= instead of where= to define a topic instead of a location. Since there’s no location, there’s no aggregation of Eventful and Upcoming events. The topical hub is driven purely by your registry of iCalendar feeds.

If you (or somebody you know) needs to curate events by topic, and would like try this method, please get in touch. I’d love to have you help me define how this can work, and discover where it can go.