Search Results for 'elmcity.info'


I had a hunch that if I grew sunflowers in a fenced enclosure inside the chicken run they’d get big, since that’s the most fertile part of my backyard. Tonight I measured the tallest at 10 feet, 8 inches (3.25 meters). It’s stout, too; I feel like I could almost climb it. Impressive!

Yeah, but how impressive? And, even more interesting to me, how can we find data to help answer the question? Perhaps with a sequence of searches like so:

“1-foot sunflower”

“2-foot sunflower”

…etc…

“26-foot sunflower”

“27-foot sunflower”

These are parallel searches of Google and Bing for “[1..27]-foot sunflower”. Here are the resulting counts, with Bing scaled up by a factor of 100 to make the trends comparable:

So, maybe my near-11-footer isn’t so special after all. This method of finding out is interesting, though. It seems incredibly naive. If you try those queries you’ll find all sorts of stuff that isn’t relevant to what I mean by an n-foot sunflower. But if the amount of irrelevance is constant across the range, it factors out, right? And the two independent search engines make this a controlled experiment.

I wonder how well this proxy for sunflower height distribution correlates with the actual distribution. Of course there are a million other questions you could try to answer this way. It’d be easy to make a web app to automate this method. I lazily hope somebody already has, or will, so I don’t have to.
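For anyone tempted to automate it, here’s a minimal sketch of the query-generation half. The function names are made up for illustration, and count_hits is a hypothetical callable you’d supply yourself; no real search-engine API is assumed.

```python
def sunflower_queries(min_feet=1, max_feet=27):
    """Return the quoted search phrase for each height in the range."""
    return [f'"{n}-foot sunflower"' for n in range(min_feet, max_feet + 1)]


def collect_counts(count_hits, queries):
    """Apply a caller-supplied hit counter (query -> int) to each query.

    count_hits is hypothetical: plug in whatever search client you have.
    """
    return {query: count_hits(query) for query in queries}


queries = sunflower_queries()
print(queries[0], queries[-1], len(queries))
```

Running the same list through two different counters gives the two columns to compare.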


PS: My sunflowers are actually a second crop. The first one had a crazy head start, because we had freaky warm weather in February. But then in early April, when they were already 3 feet high, the chickens broke into the enclosure and demolished them. What lofty heights could my sunflowers have reached this summer? We’ll never know.


PPS: Here’s the data:

1,2,0
2,994,10
3,8,4
4,10,4
5,9,4
6,3270,37
7,74,11
8,135,12
9,176,11
10,1690,39
11,75,9
12,472,37
13,82,12
14,220,8
15,54,9
16,9,4
17,2,1
18,55,4
19,6,2
20,119,8
21,0,0
22,2,0
23,0,0
24,8,3
25,891,2
26,3,2
27,0,0
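
A minimal sketch of reading rows like these and applying the 100x scaling mentioned above. It assumes the columns are height in feet, Google count, and Bing count, in that order; the post doesn’t label them, so treat that as a guess. Only a few rows are included to keep it short.

```python
import csv
import io

# A few rows copied from the data above.
# Assumed column order: feet, Google count, Bing count.
SAMPLE = """6,3270,37
10,1690,39
11,75,9
12,472,37
"""

BING_SCALE = 100  # the scale factor mentioned in the post


def parse_rows(text):
    """Return (feet, google_count, scaled_bing_count) tuples."""
    rows = []
    for feet, google, bing in csv.reader(io.StringIO(text)):
        rows.append((int(feet), int(google), int(bing) * BING_SCALE))
    return rows


for feet, google, bing_scaled in parse_rows(SAMPLE):
    print(feet, google, bing_scaled)
```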

When I posted Permalinks and hashtags for city council agenda items last week, I embedded a permalink and a hashtag to illustrate the idea. The post links to the video of Keene’s recent city council meeting, at the point where Patty Little introduces Tom LePage’s request to expand the Armadillo’s sidewalk cafe. The post also refers to this agenda item using the hashtag generated for it by the Granicus system.

I figured this would enable two ways to find pages, like my blog post, that refer to agenda items, like Tom’s request. First, you could search for pages that mention the hashtag. For example, this combined search of Google and Bing for granicus732_7716 finds my blog post because it mentions that tag. These searches also find my tweet containing the tag, and some echoes of the tweet. Finally, of course, you could search Twitter directly for the tag.

A second approach would be to search for pages that link to the video segment. I expected to be able to find my blog post by searching for this permalink which it cites:

http://keene.granicus.com/MediaPlayer.php?view_id=2&clip_id=77&meta_id=7716

I planned to use the link: operator, which finds pages pointing to a URL. And I figured this would work for both Google and Bing. But I was wrong on several counts. Bing doesn’t seem to support the link: operator. And even though Google does, this query doesn’t find my blog post.

Using the permalink as a plain search term doesn’t work either. And after reviewing the advanced search operators for both Google and Bing, I’m left wondering: How do you find pages that cite a permalink?

Last week I said that confusion about the visibility of events in Facebook had thwarted my plan to include Facebook as an event source for elmcity hubs. The day after I wrote that post, though, Stephen Judd noted in a comment that a new data entry method has appeared — one that clears up the confusion.

Until April 30, your choices when publishing an event were:

Open: Anyone can see this Event and its content.

Closed: Anyone can see this Event, but its content is only shown to guests.

Secret: Only people who are invited can see this Event and its content.

Some people opted for Closed when they really ought to have picked Secret. With the advent of API-based search that meant automated tools like the elmcity aggregator could surface events — like surprise birthday parties — not meant to be seen.

But on May 1 the choices had narrowed to just public or private. It’s implemented as a checkbox:

[x] Anyone can view and RSVP (public event)

It defaults to checked, i.e. public. That’s consistent with the general tilt, in Facebook, toward public rather than private defaults. Many people think that’s the wrong default, and I’m inclined to agree. But at least a confusing three-valued choice has been reduced to an easier-to-understand two-valued choice.

Given that, I’ve decided to add Facebook as an elmcity event source. I’m mindful of the power of defaults, so haven’t made this a default behavior. When a curator spins up a new elmcity hub, the event sources included by default — that is, before you add any iCalendar feeds to your registry — were, and still are, Eventful, Upcoming, and EventBrite. If you want to add Facebook events, you can now do so by adding a new name/value pair to your hub’s metadata record in delicious:

facebook=yes

Curators can, by the way, now include or exclude any of the services. These are the defaults:

eventful=yes
upcoming=yes
eventbrite=yes
facebook=no

All of these settings can be tweaked.
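
To make the defaults-plus-overrides idea concrete, here’s a minimal sketch. It is not the elmcity implementation; the function name and the way the name/value pairs arrive (as plain strings) are assumptions for illustration.

```python
# Defaults as listed above.
DEFAULTS = {
    "eventful": "yes",
    "upcoming": "yes",
    "eventbrite": "yes",
    "facebook": "no",
}


def effective_sources(metadata_pairs):
    """Merge a curator's name=value pairs over the defaults.

    Returns the names of the services that end up enabled.
    Pairs naming an unknown service are ignored.
    """
    settings = dict(DEFAULTS)
    for pair in metadata_pairs:
        name, sep, value = pair.partition("=")
        name = name.strip().lower()
        if sep and name in settings:
            settings[name] = value.strip().lower()
    return [name for name, value in settings.items() if value == "yes"]


print(effective_sources(["facebook=yes", "upcoming=no"]))
# ['eventful', 'eventbrite', 'facebook']
```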

The elmcity service finds Facebook events by searching for them using the location you specify in your metadata record. Here are some sample searches:

If you change the location parameter in that URL you can see which Facebook events will be included for your town. So far, I’m not seeing many public events, even for very populous locations. Facebook’s event system was always more appropriate for friends-and-family events that you wouldn’t expect to see on a community calendar. If you wanted to advertise an event open to the general public, services like Eventful or Upcoming or EventBrite were better ways to do it. Or you can create a public iCalendar feed.

It will be interesting to see if Facebook’s new event system, which defaults to public, produces more public events than before. To the extent that it does, it could become a useful source for elmcity curators. But if people who create public events in Facebook want that to happen, they’ll need to learn more about how those events appear in other contexts.

Consider this event, one of a handful that turn up in a search for Keene, NH. Here’s what anyone can see in Facebook:

What film is being screened? Neither the title nor the description tells us. My guess is that if you know Susan Hay, and are affiliated with mothersuniting.org, that information is part of a shared context that Susan just took for granted when she posted this “public” Facebook event. When she marked the event Public it hadn’t occurred to her that the actual scope of Public means she ought to have named the film in the title or description.

Note that there is an events page at mothersuniting.org, albeit five years behind the times. My own view is that mothersuniting.org should be the authoritative source for its own event information. It could use Google Calendar, for example, to publish an HTML view of a calendar into the events page on its website, while at the same time producing an iCalendar feed that could be listed in a community registry. Facebook really ought to be a downstream consumer of that kind of event source, not an upstream producer.

But no matter what I think, there will be people, maybe a lot of people, who end up making Facebook the authoritative source for their event information, instead of their own websites. So I’m enabling curators to capture those streams. We’ll see how it unfolds. As always, it’ll be fascinating to watch people walk the slippery path that divides private from public.


PS: If you’re a developer working with the new events API, here’s an odd quirk I’ve uncovered. Dates and times reported through the Facebook API don’t correlate sensibly with dates and times reported in the Facebook application.

At first I thought this was a timezone issue, and tried various Ptolemaic adjustments to make things work out. It got weirder and weirder, until finally I went empirical and made this table of observations:

    Where: Keene: GMT-5
 FB start: 2010-05-09 19:00
API start: 2010-05-10 02:00
     Diff: +7 hours

http://www.facebook.com/event.php?eid=113825365318937

    Where: Chicago: GMT-6
 FB start: 2010-05-01 11:00
API start: 2010-05-02 06:00
     Diff: +7 hours

http://www.facebook.com/event.php?eid=114150548619789

    Where: Salt Lake City: GMT-7
 FB start: 2010-04-28 06:00
API start: 2010-04-28 13:00
     Diff: +7 hours  

http://www.facebook.com/event.php?eid=115044215191908

    Where: Fresno: GMT-8
 FB start: 2010-05-03 11:00
API start: 2010-05-03 18:00
     Diff: +7 hours

For no reason I can see, the API reports a local time that’s 7 hours ahead of the time you see when you view the event in Facebook. After making that adjustment, things seem to work. Why 7 hours? Beats me.
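
A minimal sketch of that adjustment, assuming the API time arrives as a string in the format used in the table above. The 7-hour constant is purely the empirical offset observed here, not a documented Facebook behavior, and the function name is made up.

```python
from datetime import datetime, timedelta

# Empirical offset from the observations above; not a documented value.
API_OFFSET = timedelta(hours=7)


def api_to_displayed(api_start):
    """Convert an API-reported start time to the time shown in Facebook."""
    return datetime.strptime(api_start, "%Y-%m-%d %H:%M") - API_OFFSET


print(api_to_displayed("2010-05-10 02:00"))  # 2010-05-09 19:00:00
```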

I’ve long wanted to be able to add Facebook to the list of sources that my elmcity service queries for local event information. It was never possible before, but the recent changes to the Facebook API (and terms of service) prompted me to take another look.

At first glance, it seems doable. Here are some sample queries:

http://elmcity.info/fb_events?location=keene,nh

http://elmcity.info/fb_events?location=ann arbor, mi

http://elmcity.info/fb_events?location=portsmouth,nh
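
If you’d rather build those URLs in code, here’s a minimal sketch. The helper name is made up, and it assumes the service accepts a URL-encoded location (spaces as +, commas as %2C), which the sample URLs above don’t confirm.

```python
from urllib.parse import urlencode

BASE = "http://elmcity.info/fb_events"


def fb_events_url(city, state):
    """Build a query URL for a city and state, URL-encoding the location."""
    return BASE + "?" + urlencode({"location": f"{city},{state}"})


print(fb_events_url("ann arbor", "mi"))
# http://elmcity.info/fb_events?location=ann+arbor%2Cmi
```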

You can see what turns up for your town by swapping in your city and state. A lot of the events are public and could reasonably be included in a citywide aggregation. But then there are ones like this:

SURPRISE Lantheaume Baby Shower
1000 Market Street, Portsmouth, NH 03801
2010-06-26T20:00:00+0000

Clearly this baby shower should not appear on a citywide public calendar. Why does search find it? Let’s look at the data about this event that’s visible to the world:

{ "id": "314667046847",
"owner": {
"name": "Jesse Barnes",
"id": "11000551"},
"name": "SURPRISE Lantheaume Baby Shower",
"description": "Baby \"Ox\" is on his or her way! Come and celebrate with the mom-to-be and her closest friends and family! Please remember to bring your decorated onesie so that we can display them for Kris. \n\nLook on this site for additional details that are still being determined. ",
"start_time": "2010-06-26T20:00:00+0000",
"end_time": "2010-06-26T23:00:00+0000",
"location": "1000 Market Street, Portsmouth, NH 03801",
"privacy": "CLOSED",
"updated_time": "2010-04-02T15:01:10+0000"}

When you create a private event, there are three options:

Open: Anyone can see this Event and its content.

Closed: Anyone can see this Event, but its content is only shown to guests.

Secret: Only people who are invited can see this Event and its content.

Clearly Jesse should have marked this event Secret, not Closed. Until very recently, an error like that would be unlikely to result in an embarrassing information leak. But now things have changed, and people are going to start learning harsh lessons about the visibility of their Facebook stuff.

I don’t see any way to teach my service to exclude events that people marked as Closed because they thought it meant Secret. So I guess elmcity’s Facebook feature is going to have to wait until those lessons are learned.
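
For what it’s worth, the privacy field is at least machine-readable. Here’s a minimal sketch of the bluntest possible filter: keep only events marked open. It assumes "OPEN" is the value that corresponds to the Open option (only "CLOSED" appears in the data above), and the second sample event is invented for illustration. It can’t tell a deliberate Closed from a mistaken one, and it throws away every Closed event, so it doesn’t resolve the underlying problem.

```python
import json

# First event's name and privacy come from the record above;
# the second event is hypothetical.
SAMPLE = """[
  {"name": "SURPRISE Lantheaume Baby Shower", "privacy": "CLOSED"},
  {"name": "Hypothetical Farmers Market", "privacy": "OPEN"}
]"""


def open_events(events):
    """Keep only events whose privacy field is exactly "OPEN" (assumed value)."""
    return [event for event in events if event.get("privacy") == "OPEN"]


names = [event["name"] for event in open_events(json.loads(SAMPLE))]
print(names)  # ['Hypothetical Farmers Market']
```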

A while ago I asked the Lazy Web for a service that would produce a tag cloud of the names of the lists on which a Twitter user appears. Mine, for example, would look like this:

The Lazy Web seems not to have taken up the challenge, so I took a crack at it. The solution I came up with is a single-page application, which is just a web page that uses HTML, CSS, and Ajax to do something that’s (hopefully) interesting and useful.

Here’s the page: http://jonudell.net/NamesOfTwitterListsFor.html

It defaults to my Twitter name but you’ll of course want to try yours, and those of others you’re curious about. The first time through, you’ll be prompted to authenticate to api.twitter.com. This looks like the password anti-pattern, but really isn’t. You’re authenticating yourself to the Twitter API in the same way that you normally do to the Twitter website.

Note that since the API call used to build the tag cloud is rate-limited, queries through this page will be charged against your daily allotment of Twitter API usage, just as when you use client applications like TweetDeck or Seesmic.

What will your tag cloud say about you? I don’t think you’ll be surprised. It’s just another of the unique signatures written for us by others. That those signatures do get written, though, and that they can be discovered and read, never ceases to surprise me.
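
The page itself is HTML and JavaScript, but the core of any tag cloud is small enough to sketch in a few lines of Python. This is not the page’s code: the function name, the pixel range, and the sample list names are all invented for illustration.

```python
from collections import Counter


def tag_cloud(list_names, min_px=12, max_px=36):
    """Map each list name to a font size proportional to how often it occurs."""
    counts = Counter(name.lower() for name in list_names)
    if not counts:
        return {}
    top = max(counts.values())
    sizes = {}
    for name, n in counts.items():
        # Scale linearly from min_px (seen once) to max_px (most frequent).
        scale = 0.0 if top == 1 else (n - 1) / (top - 1)
        sizes[name] = round(min_px + scale * (max_px - min_px))
    return sizes


# Hypothetical list names, as if gathered from several list owners.
print(tag_cloud(["tech", "Tech", "blogs", "tech"]))
# {'tech': 36, 'blogs': 12}
```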

The dynamics of single-page applications also never cease to surprise me. In this case, a tiny 4K web page is all that’s delivered from my modestly-equipped personal webserver. It would probably survive a Slashdotting. If not, the page could be hosted on any other server, or on a local drive, and would continue to work the same way.

I’m also using jQuery, in this case served from the Microsoft content delivery network, so that’s unlikely to be a bottleneck. The only real limit is Twitter API usage, and that’s spread across all the Twitter users who authenticate through the page.

When you arrange and deploy a tiny amount of HTML, CSS, and JavaScript in this way, you can create a lot of leverage!

Last year I applied for a grant from a philanthropic group, the Knight Foundation, that wants to save journalism by funding the development of new technological methods. I was conflicted about applying because the project I put forward is already well supported by my employer, Microsoft. But since my proposal was to redistribute all of the grant, as a way of exploring an idea about improving the flow of information in communities, I thought it was fair to give it a shot.

My proposal advanced to the final round and was then rejected. Given my initial ambivalence I was OK with that. But the stated rationale has been bugging me ever since. The letter said:

Because there are thousands of proposals and only a few of them advance, we are able to choose only the most innovative ideas. These are new kinds of technologies or techniques, usually things we have never heard of before.

The meme woven into that paragraph has a name: Shiny New Thing syndrome. It is a plague. Technology journalism feeds it. Thought leaders, including Dave Slusher, Jeremy Zawodny, and Jeff Atwood, have denounced it.

I’m clearly biased, since all my best work involves creative remixing of ideas and technologies that are as common as dirt. But I do wonder about the harm that’s done when we equate innovation with shiny new things.

Old things are full of latent value that we’ve yet to discover and unlock. Why? It takes a long time for real understanding to sink in. In Net infrastructure, consider how long it’s taken us to grok what HTTP, REST, HTML, and JavaScript really are and can do. In education, look at the high-value uses that Sal Khan and Dan Meyer find for low-tech screencasting and blogging tools. In journalism and civic life, read what Alan Rusbridger says about Will Perrin’s compelling — and yet so last-century — use of Typepad to activate communities.

Well, I try to do my part. On my show, which is called Interviews with Innovators, I feature people who are more likely to be evolutionary repurposers than revolutionary creators. Maybe I should rename the show Shiny Old Things.

Over the weekend I was poking around in the recipient-reported data at recovery.gov. I filtered the New Hampshire spreadsheet down to items for my town, Keene, and was a bit surprised to find no descriptions in many cases. Here’s the breakdown:

# of awards 25
# of awards with descriptions 05 20%
# of awards without descriptions 20 80%
$ of awards 10,940,770
$ of awards with descriptions 1,260,719 12%
$ of awards without descriptions 9,680,053 88%
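
A minimal sketch of how a breakdown like this can be computed, assuming each award has been reduced to an (amount, description) pair. The function name is made up, and the three sample rows are just a subset of the Keene awards, so the percentages it yields are for that subset, not the totals above.

```python
def describe_coverage(awards):
    """Summarize what share of awards, and of dollars, carry a description.

    awards: non-empty list of (amount, description) tuples;
    an empty or whitespace-only description counts as missing.
    """
    total_dollars = sum(amount for amount, _ in awards)
    described = [(amount, text) for amount, text in awards if text.strip()]
    described_dollars = sum(amount for amount, _ in described)
    return {
        "pct_awards_described": round(100 * len(described) / len(awards)),
        "pct_dollars_described": round(100 * described_dollars / total_dollars),
    }


# Three of the Keene awards, with descriptions abbreviated.
sample = [
    (1471540, ""),
    (459850, "The COPS Hiring Recovery Program (CHRP) ..."),
    (413394, "ARRA Capital Fund Grant ..."),
]
print(describe_coverage(sample))
```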

In this case, the half-dozen largest awards aren’t described:

award | amount | funding agency | recipient | description
EE00161 | 2,601,788 | | Sothwestern Community Services Inc |
S394A090030 | 1,471,540 | | Keene School District |
AIP #3-33-SBGP-06-2009 | 1,298,500 | | City of Keene |
2W-33000209-0 | 1,129,608 | | City of Keene |
2F-96102301-0 | 666,379 | | City of Keene |
2F-96102301-0 | 655,395 | | City of Keene |
0901NHCOS2 | 600,930 | | Sothwestern Community Services Inc |
2009RKWX0608 | 459,850 | Department of Justice | KEENE, CITY OF | The COPS Hiring Recovery Program (CHRP) provides funding directly to law enforcement agencies to hire and/or rehire career law enforcement officers in an effort to create and preserve jobs, and to increase their community policing capacity and crime prevention efforts.
NH36S01050109 | 413,394 | Department of Housing and Urban Development | KEENE HOUSING AUTHORITY | ARRA Capital Fund Grant. Replacement of roofing, siding, and repair of exterior storage sheds on 29 public housing units at a family complex

That got me wondering: Where does the money go? So I built a little app that explores ARRA awards for any city or town: http://elmcity.cloudapp.net/arra. For most places, it seems, the ratio of awards with descriptions to awards without isn’t quite so bad. In the case of Philadelphia, for example, “only” 27% of the dollars awarded ($280 million!) are not described.

But even when the description field is filled in, how much does that tell us about what’s actually being done with the money? We can’t expect to find that information in a spreadsheet at recovery.gov. The knowledge is held collectively by the many people who are involved in the projects funded by these awards.

If we want to materialize a view of that collective knowledge, the ARRA data provides a useful starting point. Every award is identified by an award number. These are, effectively, webscale identifiers — that is, more-or-less unique tags we could use to collate newspaper articles, blog entries, tweets, or any other online chatter about awards.

To promote this idea, the app reports award numbers as search strings. In Keene, for example, the school district got an award for $1.47 million. The award number is S394A090030. If you search for that you’ll find nothing but a link back to a recovery.gov page entitled Where is the Money Going?

Recovery.gov can’t bootstrap itself out of this circular trap. But if we use the tags that it has helpfully provided, we might be able to find out a lot more about where the money is going.
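
Turning an award number into a search is simple enough to sketch. This isn’t the app’s code: the function name is made up, and wrapping the number in quotes (to ask for an exact match) is my assumption about what works best, not something the app is known to do.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def award_search_urls(award_number):
    """Return a search URL per engine for the exact award number."""
    query = urlencode({"q": f'"{award_number}"'})
    return {name: f"{base}?{query}" for name, base in SEARCH_ENDPOINTS.items()}


for engine, url in award_search_urls("S394A090030").items():
    print(engine, url)
```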

Instead of mourning the lost art of personal customer service, I would rather celebrate examples that show it’s still possible. Yesterday I found two gems.

First, Southwest Airlines. I had booked a round-trip flight and then needed to change to one-way. You can’t do that online. So I clenched my jaw, called customer service, and prepared for the long wait.

Instead, this:

IVR: “Would you like us to call you back in about 20 minutes?”

Me: “Why…yes! Beep, beep, beep, beep, beep, beep, beep, #.”

My jaw relaxed.

Twenty or so minutes later, an agent called back and we made the change. Now the unclenched jaw morphed into a smile.

Second, FindTape.com. I’m making interior storm windows and I need double-stick tape for the project. Which, sure, you can buy online. But the smorgasbord of choices is paralyzing. I wasted a half-hour trying to figure out which product would best suit my unusual application and made no progress whatsoever.

Then, at FindTape.com, I read this:

If you have a specific question related to which tape would work best in your application please fill out and submit the following fields so that we can have an appropriate representative get back in contact with you.

A fellow named Kevin wrote back, we’ve been discussing my options, and now I’m ready to buy.

Both examples remind me of Michael Nielsen’s luminous phrase: the restructuring of expert attention. He coined it to define a new era of scientific collaboration, but it applies more broadly.

We’ve been told that companies can’t afford to focus expert attention on customers. The truth, of course, is that they can’t afford not to.

For a generation and more we’ve driven a wedge between people who have expertise with products and services and people who need that expertise. How’s that working for you? Me neither.

It’s true that expert attention is a scarce resource. But we’re living through a Cambrian explosion of awareness networks and communication modes. Used adroitly, they can optimize the allocation of that scarce resource. Which is a fancy way of saying: Maybe personal customer service isn’t a lost art after all.

My guest for this week’s Innovators show, Ian Forrester, heads up the BBC’s Backstage project. Launched in 2005, Backstage lives at a cultural crossroads where legacy systems and methods intersect with their next-generation counterparts. The tagline for the feeds and APIs provided under the Backstage umbrella is “use our stuff to build your stuff.”

Admittedly that sounded a lot more exciting prior to 2006, when the BBC ended its trial of the Creative Archive service that was expected to “open the floodgates” to a “treasure trove” of cultural riches. Ian Forrester says those expectations were ratcheted back for two reasons. First, much of that treasure trove remains undigitized. Second, rights clearance proved to be an intractable problem.

So the “our stuff” that’s available to build “your stuff” turns out to be mostly metadata: news headlines, program titles and schedules. What’s more, that metadata comes from a plethora of BBC content management systems. What can you make out of these ingredients?

Here’s an evocative example: http://www.bbc.co.uk/nature/species/African_Bush_Elephant. The BBC’s Tom Scott explains:

Over the last few months we’ve been plundering the NHU’s [Natural History Unit's] archive to find the best bits — segmenting the TV programmes, tagging them (with DBpedia terms) and then aggregating them around URIs for the key concepts within the natural history domain; so that you can discover those programme segments via both the originating programme and via concepts within the natural history domain — species, habitats, adaptations and the like.

This is just the sort of remixing that Backstage ought to enable anyone, inside or outside the BBC, to achieve. Since I’m a US resident, and don’t pay the UK’s television license fee, I can’t watch the videos on that page. There’s nothing that the Backstage team can do about that. But they can take a radically open and inclusive approach to the management of the metadata that supports this remixing, and that’s just what they’re doing.

In our conversation, Ian Forrester describes how the taxonomy that governs the Backstage feeds and APIs is shared with that of Wikipedia and its structured derivative, DBpedia. Tom Scott elaborates:

You might have noticed that the slugs for our URIs (the last bit of the URL) are the same as those used by Wikipedia and DBpedia that’s because I believe in the simple joy of webscale identifiers, you will also see that much like the BBC’s music site we are transcluding the introductory text from Wikipedia to provide background information for most things. This also means that we are creating and editing Wikipedia articles where they need improving (of course you are also more than welcome to improve upon the articles).

As someone who both practices and preaches collaborative curation, I’m delighted to see the BBC taking this approach. And I love the phrase webscale identifier. Here’s how Michael Smethurst defines it, in the post pointed to by Tom Scott:

I agree with the four Linked Data rules but I’d like to try to add a fifth: if possible don’t reinvent other people’s web identifiers. By web identifiers I mean those fragments of URLs that uniquely identify a resource within a domain. So in the case of the MusicBrainz entry for The Fall (http://musicbrainz.org/artist/d5da1841-9bc8-4813-9f89-11098090148e.html) that’ll be d5da1841-9bc8-4813-9f89-11098090148e.

The last time we updated the /music site we made this mistake (kind of unavoidable at the time). Even though we linked our data to MusicBrainz we minted new identifiers for artists. So The Fall became http://www.bbc.co.uk/music/artist/jb9x/ where jb9x was the identifier. But jb9x doesn’t exist anywhere outside of /music. We’ll (hopefully) never make that mistake again.

Beautifully said. Enormous synergies have gone unrealized because web publishers have chosen to mint new namespaces rather than add value to existing ones.

What I realized when talking with Ian, though, is that there is one namespace for which the BBC is the appropriate mint, namely its own. Here, for example, are some of the family of URLs for a radio drama called The Archers:

homepage: http://www.bbc.co.uk/programmes/b006qpgr/

upcoming shows: http://www.bbc.co.uk/programmes/b006qpgr/episodes/upcoming.xml

In this example b006qpgr is, at least potentially, a webscale identifier. It’s a unique tag for the show that, if used on blogs, on Twitter, and elsewhere, would make it easy to assemble all kinds of online activity related to the show. But in fact only web developers using Backstage feeds and APIs will ever discover, or use, b006qpgr. In colloquial discourse people use The Archers.

If the BBC wants people to collaborate with its namespace in the same way that it collaborates with Wikipedia’s, this would be more inviting:

http://www.bbc.co.uk/programmes/The_Archers/

http://www.bbc.co.uk/programmes/The_Archers/episodes/upcoming.xml

It should go without saying, but right after the first rule for linked data, “Use URIs as names for things,” I would add “Where possible, choose names that make sense to people.”
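
A minimal sketch of the naming rule I have in mind, borrowing Wikipedia’s convention of replacing spaces with underscores. The function names are made up, and the URLs it produces are the hypothetical ones proposed above, not addresses the BBC is known to serve.

```python
def programme_slug(title):
    """Turn a programme title into a Wikipedia-style slug."""
    return "_".join(title.split())


def proposed_programme_url(title):
    """Build the hypothetical human-readable programme URL."""
    return f"http://www.bbc.co.uk/programmes/{programme_slug(title)}/"


print(proposed_programme_url("The Archers"))
# http://www.bbc.co.uk/programmes/The_Archers/
```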

In the spirit of keystroke conservation, I’m relaying some elmcity-related questions and answers from email to here. Hopefully it will attract more questions and more answers.

Dear Mr. Udell,

I am looking for a flexible calendar aggregator that I can use to report upcoming events for our college’s “Learning Commons” WordPress MU website, a site that will hopefully help keep our students abreast of events and opportunities taking place on campus.

1) Our site will be maintained using WordPress MU, so ideally the
display of the calendars, and/or event-lists will be handled by a
WordPress plugin. The one I am favouring is
http://wordpress.org/extend/plugins/wordpress-ics-importer/ . I have
tried this plugin and it almost does what we want.

Specifically, the plugin includes:

– a single widget that can display the “event-list” for one calendar;

– flexible options for displaying and aggregating calendars.

This plugin almost does what I want, but not quite.

a) The plugin is now limited to a single “events-list” widget. But with WordPress 2.8, it is possible to have many instances of a widget, so theoretically, I could display the “Diagnostic Tests” calendar in one instance , and the “Peer-tutoring” calendar in another widget instance.

b) It would be nice to have an option to display only the current week for specific calendars. While in other cases, it makes sense to display the entire month. And although I haven’t thought about it, likely displaying just the current day would be useful.

c) I would like flexibility over which calendars to aggregate, creating as many “topic” hubs as the current maintainer of the website might think useful for the students.

2) It would be nice to remove the calendar aggregation from the WordPress plugin, and handle that separately. Hopefully the calendars will change much less frequently than the website will be viewed. If I understand http://blog.jonudell.net/elmcity-project-faq/ properly, this might be possible using the elmcity-project.

For example, I think we could use “topical hub aggregation” to create a “diagnostic test calendar” that aggregates the holiday calendar and the different departments “diagnostic test” calendars. What I don’t understand is what is the output of “elmcity”. Does it output a single merged calendar (ics) that could be displayed by the above plugin? Is that a possibility?

Similarly, I believe I could create a different meta bookmark to aggregate our holiday calendar and our different peer-tutoring calendars (created by each department). Is this correct?

We have lots of groups, faculty, departments and staff on campus, and each wants to publicize their upcoming events. Letting them input and maintain their own calendars really seems to make sense. (Thanks for the idea. It seems clear this is the way to go, but I don’t seem to have the pieces to construct the output properly, as yet.)

I agree with your analysis that it would be better to have a separation of concerns between aggregation and display. So let’s do that, and start with aggregation.

I would like flexibility over which calendars to aggregate, creating as many “topic” hubs as the current maintainer of the website might think useful for the students.

I think the elmcity system can be helpful here. I’ve recently discovered that there are really two levels — what I’ve started to call curation and meta-curation.

I believe I could create bookmarks to aggregate our holiday calendar and our different peer-tutoring calendars (created by each department). Is this correct?

Right. It sounds like you’d want to curate a topic hub. It could be YourCollege, but if there may need to be other topic hubs you could choose a more specific name, like YourCollegeLearningCommons. That’d be your Delicious account name, and you’d be the “meta-curator” in this scenario.

As meta-curator you’d bookmark, in that Delicious account:

- Your holiday calendar

- Multiple departments’ calendars

Each of those would be managed by the responsible/authoritative person, using any software (Outlook, Google, Apple, Drupal, Notes, Live, etc.) that can publish an ICS feed.

There’s another level of flexibility using tags. In the above scenario, as meta-curator you could tag your holiday feed as holiday, and your LearningCommons feeds as LearningCommons, and then filter them accordingly.

What I don’t understand is what is the output of elmcity. Does it output a single merged calendar (ics) that could be displayed by the above plugin?

Yes. The outputs currently are:

Now, for the display options. So far, we’ve got:

  • Use the WordPress plugin to display merged ICS

  • Display the entire calendar as included (maybe customized) HTML

  • Display today’s events as included or script-sourced HTML

  • I have also just recently added a new method that enables things like this: http://jonudell.net/test/upcoming-widget.html

    You can view the source to see how it’s done. The “API call” here is:

    http://elmcity.cloudapp.net/services/elmcity/json?jsonp=eventlist&recent=7&view=music

    Yours might be:

    http://elmcity.cloudapp.net/services/YourCollegeLearningCommons/json?jsonp=eventlist&recent=10

    or

    &recent=20&view=holiday

    etc.

    This is brand new, as of yesterday. Actually I just realized I should use “upcoming” instead of “recent” so I’ll go and change that now :-) But you get the idea.

    The flexibility here is ultimately governed by:

    1. The curator’s expressive and disciplined use of tags to create useful views

    2. The kinds of queries I make available through the API. So far I’ve only been asked to do ‘next N events’ so that’s what I did yesterday. But my intention is to support every kind of query that’s feasible, and that people ask for. Things like a week’s worth, or a week’s worth in a category, are obvious next steps.
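
If you want to call the service from something other than a web page, here’s a minimal sketch of building the URL and unwrapping the response. The URL pattern comes from the examples above; the function names are made up, and the response shape (a callback name wrapped around a JSON array) is an assumption based on how JSONP usually works, not something confirmed here.

```python
import json
from urllib.parse import urlencode


def events_url(hub, count, view=None, callback="eventlist"):
    """Build the JSONP query URL for a hub, optionally filtered by a tag view."""
    params = {"jsonp": callback, "recent": count}
    if view:
        params["view"] = view
    return f"http://elmcity.cloudapp.net/services/{hub}/json?" + urlencode(params)


def unwrap_jsonp(body, callback="eventlist"):
    """Strip the callback wrapper and parse the JSON inside (assumed shape)."""
    body = body.strip().rstrip(";")
    prefix = callback + "("
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("not a JSONP response for " + callback)
    return json.loads(body[len(prefix):-1])


print(events_url("elmcity", 7, view="music"))
# http://elmcity.cloudapp.net/services/elmcity/json?jsonp=eventlist&recent=7&view=music
```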

My recent adventure in naming the times of day was so much fun that I lost track of the original purpose of the exercise, which was to improve accessibility for sight-impaired users.

When I interspersed time-of-day labels into each day’s event listing, I used HTML DIV tags. Wrong, wrong, wrong! Those labels are structural elements, and as my accessibility consultant Susan Gerhart gently reminded me, screen readers depend on HTML headings to find and announce them. The labels should have been second-level headings, i.e., HTML H2 tags.

It gets worse. When Susan prompted me to take another look at what I’d done, I found that the date labels were inexplicably tagged as paragraphs (P) instead of the top-level headers (H1) that they logically are.

Oh. Right. Of course. Duh. Fixed. Sorry.

What was I thinking? How could somebody like me, who has preached about the attention-focusing power of heads, decks, and leads, screw up something so basic as this?

Easily, as it turns out, in the absence of feedback. If you yourself don’t depend on a design feature, there is a natural tendency to forget why it matters to others.

Coincidentally (or not) Susan recently wrote an essay, and published a companion audio recording, that will help me — and I hope others — not to forget again. Entitled Hear Me Stumble Around White House, Recovery, and Data GOV web sites, it’s a blow-by-blow account of her efforts to navigate those sites with a screen reader.

In this recording you can hear Susan and her screen reader trying to make sense of whitehouse.gov. If you’ve never heard a screen reader in action, it’s worth listening for that alone. You’ll get a very clear sense of how these tools depend on the hierarchy of the page.

Simultaneously you’ll hear Susan narrate her intention — to read an article about cybersecurity — and her frustration. For example:

I was thrown off by the slide show at the top of the page. Once I hit the cybersecurity story, the next time I traverse this section the story was about the Supreme Court nominee.

Despite this randomness, the page does at least identify the top stories with H1 tags. And Signed Legislation is an H2. But none of the headlines under Signed Legislation are H3s, they’re Ps.

Over at recovery.gov and data.gov Susan finds no headings at all, and reacts to their omission less gently than she did to mine:

It’s the headings, stupid!!!

Thanks. I will try not to forget that again.


PS: In a follow-up to her blog essay, Susan links to detailed reports by accessibility pioneer Jim Thatcher on the issues he found with data.gov and recovery.gov.

When I invited folks to become calendar curators for the elmcity project, the person who stepped forward in Prescott AZ was Susan Gerhart, whom I interviewed here. One of her great insights about web design is that the right thing for a vision-impaired user is almost always also the right thing for everyone. She calls this the curb cuts principle:

Curb cuts for wheelchairs also guide blind persons into street crossings and prevent accidents for baby strollers, bicyclists, skateboarders, and inattentive walkers.

So I shouldn’t have been surprised when Susan noticed that the HTML rendering of the calendar needs some curb cuts. Within each day, the events show up as a long undifferentiated list. She suggested that subdividing the list by time of day (morning, afternoon, evening) will be helpful to folks using screen readers. But in fact, it’s just plain helpful. So I’m testing a version of that idea now.

Ironically I was just thinking about this same principle in another context. The new version of Oakland Crimespotting, which I raved about, segments incidents using this vocabulary:

light, dark, commute, nightlife, day, night, swing shift

In that spirit, I’m trying this:

morning, lunch, afternoon, evening, night

This of course leads to the question: When do these times begin and end?

I was fascinated to see that both Google and Bing return the same Yahoo answers page for the query morning afternoon evening.

For now, though, I’m going with this ruleset:

  Morning:  5:00 AM to 11:30 AM
    Lunch: 11:30 AM to  1:00 PM
Afternoon:  1:30 PM to  5:30 PM
  Evening:  5:30 PM to  9:00 PM
    Night:  9:00 PM to  5:00 AM

But I’ll make these rules — and maybe even the time-of-day names — configurable on a per-location basis.
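A ruleset like this can be sketched as a small lookup table. The following is only an illustration, not the service’s code: the function name and the `starts` parameter are hypothetical, and it assumes each label runs until the next one begins, so the unassigned 1:00 PM to 1:30 PM window lands in Lunch.

```python
from datetime import time

# Start time of each label, taken from the ruleset above. Assumption: each
# label runs until the next one starts, so 1:00-1:30 PM counts as Lunch.
TIME_OF_DAY_STARTS = [
    (time(5, 0), "Morning"),
    (time(11, 30), "Lunch"),
    (time(13, 30), "Afternoon"),
    (time(17, 30), "Evening"),
    (time(21, 0), "Night"),
]

def time_of_day(t, starts=TIME_OF_DAY_STARTS):
    """Return the label whose start time is the latest one not after t."""
    label = starts[-1][1]  # times before the first start wrap around to Night
    for start, name in starts:
        if t >= start:
            label = name
    return label
```

Passing a different `starts` list is one way the rules, and even the names, could vary per location.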

Last week I mentioned three ways for elmcity curators to categorize events:

  1. If a source iCalendar feed uses the CATEGORIES property, they’ll be included.

  2. If all of the events from a feed can be categorized, you can name that category in the Delicious metadata, using category=CATEGORY. All events from the feed will inherit it in the same way that they all inherit the default clickthrough link specified with url=URL.

  3. If all of the events from an Upcoming or Eventful venue can be categorized, you can also name that category in the Delicious metadata. To do that, bookmark the venue URL and use the patterns venue={UPCOMING|EVENTFUL} and category=CATEGORY.

Now I’ve added a fourth. In any iCalendar app you can now use these patterns in the Description field:

url=http://www.harlowspub.com

category=music,bluegrass

The url=… and category=… patterns can occur anywhere in the description.
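For readers who want to see how such patterns might be picked out of a description, here is a minimal sketch. It is not the aggregator’s parser: the helper name and the regular expressions are assumptions, and it treats each value as ending at the next whitespace.

```python
import re

# Hypothetical helper. The url= and category= patterns come from the text;
# the regexes are one plausible reading (a value ends at the next whitespace).
URL_PATTERN = re.compile(r"\burl=(\S+)")
CATEGORY_PATTERN = re.compile(r"\bcategory=(\S+)")

def extract_metadata(description):
    """Return (url, categories) found anywhere in an event description."""
    url_match = URL_PATTERN.search(description)
    cat_match = CATEGORY_PATTERN.search(description)
    url = url_match.group(1) if url_match else None
    categories = cat_match.group(1).split(",") if cat_match else []
    return url, categories
```

A description with neither pattern yields `(None, [])`, so the event would fall back to whatever the feed-level metadata provides.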

This is particularly useful for recurring events. As discussed here, recurring events are a great way to build critical mass because your curation effort keeps paying dividends.

For example, one of the events I found when exploring the search page for Keene is the Monday night bluegrass jam at Harlow’s Pub.

Here’s the description I entered into Windows Live Calendar — which also could have been entered into Google Calendar, or any other iCalendar app:

The Birch Benders host a Bluegrass picking party at Harlow’s Pub in Peterborough every Monday night – 8 pm until they kick us out (11 or so). url=http://www.harlowspub.com category=music

Here’s the rendered result:

Mon 08:00 PM Bluegrass night with the Birchbenders (recurring events) (music)

The same data shows up in the downstream XML, ICS, and JSON feeds.

Since the iCalendar spec allows for a CATEGORIES element, this approach shouldn’t be necessary. But not all calendar apps allow you to tag events in this way. Outlook does, but Google Calendar, Live Calendar, and Apple iCal don’t.

Fortunately we can scribble in the margins. I first used that phrase in an InfoWorld story about a feature of the Internet’s Domain Name System called the TXT record. Although it is possible to define more specific record types, it’s hard to get everyone to agree to use them. So developers have historically “scribbled in the margins” of the DNS. And we can do the same with iCalendar.


PS: The title of that InfoWorld story was actually Filling in the Margins, which wasn’t what I wrote and which I never liked. The title I wrote was Scribbling in the Margins, and I used it for the blog entry that introduced the InfoWorld article. I’ll have that entry back online soon, along with the rest of my archive from that era. But meanwhile, when I search for the title using doublesearch, I notice an interesting point of comparison between Google and Bing. It’s been over a month since that blog archive went dark, and Google has now evidently forgotten about it. But Bing remembers. I don’t have any special insight into how Bing works, but I’ll be interested to see how long it keeps remembering.

2010-08-04

As of build 777, the source code is released and I’ve begun a series of articles about the development of the project.

Among other recent activities, I’ve been reviewing how curators are including the default HTML rendering in their sites. Sourcing the HTML into an iframe is a popular approach, as seen for example on berkeleyside. In that situation you’d probably like to suppress the calendar’s header image, since the enclosing page already has one. This was formerly only possible by redirecting to an alternate template, but a lighter solution is now available: use header_image=no in your metadata.

Another strategy for sourcing events into a web page can be seen at InMenlo, where the combined ICS feed from the elmcity aggregator is routed back through Google Calendar, and the results sourced into the page. I’ve noticed a lot of duplicate events in that presentation, and I theorize that it’s happening because the IDs I’ve been generating were unique but not stable on a per-event basis across aggregator runs. So I’ve switched to a strategy that will produce stable unique IDs and will watch what happens. Even if I’m right, it may take a while for the duplicates to scroll off the event horizon.
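The entry doesn’t say which strategy was adopted. One common way to get IDs that are both unique and stable is to derive them from the event’s own content rather than generating them fresh on each run. The sketch below assumes that approach; the function name, the choice of fields, and the `example.org` domain are all hypothetical.

```python
import hashlib

def stable_uid(source, title, start_utc, domain="example.org"):
    """Derive a UID from event content so it is identical on every run."""
    key = "|".join([source, title, start_utc])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "%s@%s" % (digest, domain)
```

Because the same event hashes to the same UID on every aggregator run, a downstream consumer such as Google Calendar can recognize it as an update rather than a new event.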

2010-06-25

Given the hiatus since 607, build 659 clearly includes a lot. Much is internal, as I prepare to release the code as open source. But at least these things are visible:

1. New http://elmcity.cloudapp.net homepage, with these improvements: a) /services/ID is now, instead of 404, a helpful document that names and explains all the subsidiary links b) replacement of the dropdown list with a link to a). (/via Rob Goodspeed)

2. Tags now link to category views. For example, if an event is in the category music, then the default rendering links music to an url like http://elmcity.cloudapp.net/services/elmcity?view=music (/via Bill Rawlinson)

3. Facebook is now an event source. See: http://blog.jonudell.net/2010/05/07/facebook-is-now-an-elmcity-event-source/.

2010-03-02

Build 607 introduces the new iCalendar validator which Doug Day, who is also the author of the iCalendar component I am using, has graciously provided.

On the stats page now — for example, http://elmcity.cloudapp.net/services/prescottaz/stats — the column formerly labeled valid? is now labeled score. The score is a number from 0 to 100, which nicely represents the idea that the validity of a calendar feed is, practically speaking, not a binary yes/no but rather a point along a continuum.

If you click a value in that column, you’ll invoke the validator for the corresponding feed.

Alternatively, if you want to check the validity of a feed, you can do it directly at http://icalvalid.cloudapp.net.

2010-02-26

Build 605 includes few if any curator-visible changes, but much internal improvement, especially in the areas of caching and service self-monitoring.

2009-11-09

Build 556 adds a major new source of events for all geographic hubs: Eventbrite. It works the same way as the Eventful and Upcoming sources, and happens automatically; curators don’t have to do anything extra or different.

2009-10-22

Build 540 adds two new administrative features for curators: log viewing, and metadata viewing.

1: Log viewing

The URL pattern for the log viewer is:

http://elmcity.cloudapp.net/logs/{id}/{minutes}

So, for example, here’s the last 60 minutes of activity for the Keene hub:

http://elmcity.cloudapp.net/logs/elmcity/60

If minutes is more than 500, it’ll become 500.
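As a small illustration of the pattern and the cap, a client-side helper might look like this. The function name is hypothetical; the clamp just mirrors what the service is described as doing.

```python
MAX_MINUTES = 500  # the cap described above

def log_url(hub_id, minutes):
    """Build the log viewer URL, applying the same cap the service applies."""
    capped = min(minutes, MAX_MINUTES)
    return "http://elmcity.cloudapp.net/logs/%s/%d" % (hub_id, capped)
```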

2: Metadata viewing

The URL pattern for the metadata viewer is:

http://elmcity.cloudapp.net/metadata/{id}

You wouldn’t normally need this. But if you want to make sure that the metadata you’ve specified in Delicious has been accurately and completely transmitted to the elmcity service, you can use this URL to check. It retrieves the mirror of your Delicious metadata that’s stored in an Azure table, plus some computed or looked-up values that aren’t in Delicious:

  • events
  • population
  • feed_count

If you have made a change to your Delicious metadata, you might want to send a Twitter start message to the service in order to start a new cycle. You can watch its progress in your log viewer. Look for entries like these:

 8:53:45 PM info 5872,RD00155D301CF7: Scheduler.StartTaskForId: eLearningEvents
 8:53:45 PM info 5872,RD00155D301CF7: processing hub: eLearningEvents
 8:54:03 PM info 5872,RD00155D301CF7: DoIcal: eLearningEvents

Once you see DoIcal, your metadata should be synched to Azure. You can check your metadata URL to verify.

2009-10-21

Build 538 adds the ability for curators to short-circuit the normal cycle and start a new one. You do this by sending a Twitter direct message to @elmcity_azure. If the message says:

start

then your cycle will restart within 5 minutes.

2009-10-06

Major changes in build 516:

Improved calendar widget in HTML views
The new version, based on the jQuery datepicker, stays in a fixed position as you scroll, and adjusts itself to the closest displayed date.

Hubs using the default template will see this new behavior. Hubs that have altered the default template may want to resynch with it.

RSS views
The following flavors work:

1. http://elmcity.cloudapp.net/services/elmcity/rss

2. http://elmcity.cloudapp.net/services/elmcity/rss?view=government

3. http://elmcity.cloudapp.net/services/elmcity/rss?count=30

4. http://elmcity.cloudapp.net/services/elmcity/rss?view=music&count=10

This flavor should be especially useful for systems, like WordPress.com, that will not display dynamic content but will display RSS feeds. You can see example 4 in action on my blog (scroll down to “Upcoming music in Keene”).

2009-09-16

Build 452 adds new standard views for tag summaries (as HTML and as JSON). So for example:

http://elmcity.cloudapp.net/services/elmcity/tags_html

http://elmcity.cloudapp.net/services/elmcity/tags_json

or

http://elmcity.cloudapp.net/services/elmcity/tags_json?jsonp=tags

2009-08-25

Build 441 includes WHERE and WHAT hubs on the homepage, and streamlines the display of all associated links.

2009-08-06

Build 430 converts internally to UTC times. This mainly benefits topical hubs, since their events cross timezones. The data feeds — XML, JSON, iCalendar — are now all expressed in terms of UTC.

For geographical hubs, the data feeds are also expressed in terms of UTC. However, the HTML view converts to local time as defined by the tz= metadata slot. Since geographical hubs are only using the HTML view, and none (I think) are using the data feeds, there should be no observable change.

2009-07-23

Build 420 adds a new axis of aggregation: topic rather than geographic location.

The mechanism is similar. Basically just what= instead of where= in the delicious metadata.

The first two examples are:

http://delicious.com/eLearningEvents

http://delicious.com/SustainabilityInBusiness

In these cases, there is no aggregation from Eventful and Upcoming, because there is no location. So it’s purely aggregation of a list of iCalendar feeds, which are handled in the usual way.

These topic aggregations aren’t yet reflected on the home page. But you can find them here:

http://elmcity.blob.core.windows.net/sustainabilityinbusiness/SustainabilityInBusiness.html

http://elmcity.blob.core.windows.net/elearningevents/eLearningEvents.html

Similarly for .xml, .json, .ics.

(The directory names are all lowercase. The filenames are whatever you’re using on delicious; most folks use lowercase, but mixed case is possible and both of these are that way.)
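The naming rule can be written down as a short helper. This is only a sketch inferred from the two example URLs above; the function name is hypothetical.

```python
def blob_url(account, extension="html"):
    """Directory is the lowercased account; the filename keeps its case."""
    return "http://elmcity.blob.core.windows.net/%s/%s.%s" % (
        account.lower(), account, extension)
```

For example, `blob_url("eLearningEvents", "ics")` points at the iCalendar flavor of that hub.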

2009-06-16

Build 396 tweaks the default HTML rendering. Within each day, events are now grouped by these time-of-day labels: Morning, Lunch, Afternoon, Evening, Night, WeeHours.

2009-06-10

Build 390 starts to make category-based views available. The supported types so far: html and xml.

Example URLs:

http://elmcity.cloudapp.net/services/elmcity/xml/sports

http://elmcity.cloudapp.net/services/prescottaz/html/government

http://elmcity.cloudapp.net/services/whyhuntington/html/nightclub%20scene

Any or all of the categorization methods discussed previously can contribute to these views.

Not yet supported:

http://elmcity.cloudapp.net/services/whyhuntington/html/music,fine-arts

(And not yet decided: Does that mean music AND fine-arts? Or music OR fine-arts? And either way, how to express the alternative.)
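To make the open question concrete, here are the two candidate readings side by side. Neither is implemented in the service; the function names are hypothetical.

```python
def matches_any(event_categories, requested):
    """OR reading: the event carries at least one requested category."""
    return bool(set(event_categories) & set(requested))

def matches_all(event_categories, requested):
    """AND reading: the event carries every requested category."""
    return set(requested) <= set(event_categories)

# The comma-separated path segment from the example URL above.
requested = "music,fine-arts".split(",")
```

An event tagged only music would appear under the OR reading but not under the AND reading, which is why the choice matters.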

Bill R: You have some slash-delimited categories, e.g.:

Fairs/Festivals

Those won’t work.

2009-06-05

Build 388 adds a fourth way to categorize events. In any iCalendar app, you can now use these patterns in the Description field:

url=http://www.harlowspub.com

category=music,bluegrass

This is particularly useful for recurring events. As discussed here, recurring events are a great way to build critical mass because your curation effort keeps paying dividends.

One of the events I found when exploring the search page for Keene is the Monday night bluegrass jam at Harlow’s Pub.

Here’s the description I entered into Windows Live Calendar — which also could have been entered into Google Calendar, or any other iCalendar app:

“The Birch Benders host a Bluegrass picking party at Harlow’s Pub in Peterborough every Monday night – 8 pm until they kick us out (11 or so). url=http://www.harlowspub.com category=music”

Here’s the rendered result:

Mon 08:00 PM Bluegrass night with the Birchbenders (recurring events (live)) (music)

The url=… and category=… patterns can occur anywhere in the description.

I hope this not only opens some doors w/respect to categorization, but also w/respect to finding and adding recurring events, which are a very powerful way to build up critical mass.

2009-06-04

With build 386, we have 3 ways to categorize events:

  1. If a source iCalendar feed uses the CATEGORIES property, they’ll be included.
  2. If all of the events from a feed can be categorized, you can name that category in the Delicious metadata, using category=CATEGORY. All events from the feed will inherit it  in the same way that they all inherit the default clickthrough link specified with url=URL.
  3. If all of the events from an Upcoming or Eventful venue can be categorized, you can also name that category in the Delicious metadata. To do that, bookmark the venue URL and use the patterns venue={UPCOMING|EVENTFUL} and category=CATEGORY.

See today’s blog entry for a more detailed explanation.

2009-05-14

Build 335 addresses a problem that Dave Witzel noticed when he used Google Calendar as a viewer for the Falls Church combined ICS feed.

The problem was lack of context around the displayed event. The (imperfect) solution, discussed in this blog post, copies the URL for the event (or the default URL, from the url= tag in the delicious metadata), into the LOCATION field.

The blog post has the whole story.

2009-05-04

Build 324 adds a service to help curators (or anyone) extract the ICS URL from a web page with an embedded Google Calendar. The writeup is here. The bookmarklet that uses the service is here. To use it directly, for a page like http://www.huntingtoncommunitygardens.com/8.html, do this:

http://elmcity.cloudapp.net/gcal2ics/www.huntingtoncommunitygardens.com/8.html

In other words: http://elmcity.cloudapp.net/gcal2ics/ plus the URL of the calendar page without its leading http://
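That rule can be sketched as a one-step transformation. The helper name is hypothetical, and it assumes pages are addressed with plain http://, as in the example.

```python
def gcal2ics_url(page_url):
    """Strip a leading http:// and append the rest to the gcal2ics prefix."""
    prefix = "http://"
    if page_url.startswith(prefix):
        page_url = page_url[len(prefix):]
    return "http://elmcity.cloudapp.net/gcal2ics/" + page_url
```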

2009-04-16

Build 313 introduces a major new feature for curators. Read all about it here, in A power tool for curators, and let me know whether you’re able to make use of it.

2009-04-10

The first version of the project FAQ is done. If you know somebody who’s interested in becoming a curator, that page has the information you need.

For a gentler introduction, you can point them to this blog entry which describes the minimal setup required to bootstrap an instance of the calendar aggregator for a location.

2009-04-08

To the extent that most people ever subscribe to ICS feeds at all in their personal calendars, I think the best strategy will be to use the aggregator as a conduit to individual feeds that you might want to subscribe to: the hockey schedule, the nightclub schedule.

However, build 298 does now publish the merged set of events for each location as a single ICS feed. I’ll be very interested to know who uses these, and how.

I suspect that experiences will vary according to event density. Adding an ICS overlay for Keene into my Outlook calendar actually does seem to be useful, at the current event density. It may already be overwhelming for Ann Arbor or Baltimore.

In any case, the densities will (hopefully) increase for all locations. So I think in the long run the best use of the aggregator will be to collect and promote discovery of individual feeds.

2009-04-07

Build 295 lays the foundation for per-location iCalendar feeds that merge all sources of events for each location, but does not yet publish those feeds.

The visible change is on the home page, where events, population, and events/person are now reported. Because these are future events, the numbers change even if no feeds are added. I’d like to include a sparkline in the summary table to visualize those trends.

2009-04-03

Today’s version, build 287, focuses on stats reporting. The per-location changes are:

  1. Total events
  2. Population
  3. Ratio of the two

So, for example, this is mashablecity (Providence RI) today:

All events 910, population 48779, events/person 0.02

The per-feed changes are:

  1. If a feed fails to validate, False (in the Valid? column) is a link to the full validation report
  2. Validation errors are captured and reported
  3. Loading errors are captured and reported
  4. The PRODID (Product ID) of the software that wrote the feed is reported.

2009-03-30

Build 281, deploying now, is the first fully data-driven version. From now on, I should not have to touch any code to add a new city. The workflow is:

  1. A new curator claims a delicious account, as Matt Gillooly just did with delicious.com/mashablecity.
  2. I bookmark that URL in my own delicious account, using the tag calendarcuration.
  3. On the next cycle, the system queries http://delicious.com/judell/calendarcuration, finds one more curatorial account than before, and rolls it in.

In doing this, I sorted out what the minimal requirement is for account metadata. It is simply — in Matt’s case  — the where slot:

where=providence, ri

Everything else is a slot that you can override from the following defaults:

tz=Eastern

title=event+calendar

contact=nobody_yet

img=http://elmcity.info/media/img/keene-night-360.jpg

css=http://jonudell.net/css/elmcity.css

template=http://jonudell.net/tmpl/events.tmpl

lat={computed from where}

lon={computed from where}

radius=5
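The override rule amounts to overlaying the curator’s slots on a table of defaults. Here is a rough sketch under that reading; the function name is hypothetical, and lat and lon are left out because they are computed from where by a geocoding step not shown here.

```python
# Defaults as listed above. lat and lon are omitted: they are computed
# from the where slot rather than being fixed values.
DEFAULTS = {
    "tz": "Eastern",
    "title": "event+calendar",
    "contact": "nobody_yet",
    "img": "http://elmcity.info/media/img/keene-night-360.jpg",
    "css": "http://jonudell.net/css/elmcity.css",
    "template": "http://jonudell.net/tmpl/events.tmpl",
    "radius": "5",
}

def effective_metadata(curator_slots):
    """Overlay a curator's slots on the defaults; where is the one required slot."""
    if "where" not in curator_slots:
        raise ValueError("the where slot is required")
    merged = dict(DEFAULTS)
    merged.update(curator_slots)
    return merged
```

With only `where=providence, ri` supplied, every other slot comes from the defaults.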

2009-03-27

This version makes the “today’s events” widgets on the home page dynamically selectable. With more locations on board, it will be interesting to be able to compare any two, side-by-side, for a given day.

2009-03-20

The version deploying now includes these changes:

  • CSS. Although the rendered HTML is only meant as a reference implementation, to be replaced with stuff like Bill’s doing here, it didn’t need to be quite so butt ugly.
  • Baltimore. I’ve added a fourth collection for localist. It’s currently Eventful+Upcoming only, no iCalendar, pending adjustment of the feed tags. Radius is 5 miles because 15 was overwhelming.
  • Metadata. I’m persisting the Delicious metadata into the Azure tablestore. Although Delicious is convenient, not everyone will want to use it. Ed Vielmetti in Ann Arbor would rather use a semantic wiki, which is great. The backend will be a neutral repository for various metadata formats, just as the aggregator is a neutral hub for various calendar formats.

With 4 locations running, the update cycle is pushing toward an hour. It won’t be too long before I start running out of hours in the day. But that’s what makes this a nice Azure experiment. The work is naturally divisible and distributable. I’m starting to think about how to hand out tasks to a pool of workers, and coordinate them. Really looking forward to that part, it’ll be interesting!

2009-03-17

While implementing Bill’s fix, I found and fixed several problems that were causing both Eventful and iCalendar events to be omitted. So all calendars are now more complete.

I also improved the iCalendar stats page. Now there’s a complete accounting of single events, recurring events, instances of recurring events, and future events.

2009-03-16

Bill Rawlinson noticed that the per-feed links weren’t working. That’s because although I defined the metadata mechanism, and he implemented it at http://delicious.com/whyhuntington, I had neglected to actually wire it up! Thanks for noticing, Bill, I’m testing the fix now.

The idea is as follows. Some iCalendar feeds include per-event URLs but many do not. In those cases, we can specify a catch-all link for the feed. Since the feed will already be bookmarked in Delicious, you can do that by adding a name=value tag like so:

url=http://www.wvpumpkinpark.com/pp-calendar.asp

> I’m testing the fix now.

Hmm. The test triggered the Delicious rate limit. Which is OK, I’ve been needing to implement caching of external services anyway. So now I will.

2009-03-15

The aggregator now includes Upcoming events in and around your location. This feature is controlled by three tags on the metadata URL in your Delicious account. From the Keene example:

radius=15

lat=42.9336

lon=-72.2786

WHAT AND WHY

With the community calendar service now live, I’ve got to do a bit more work to make it fully data-driven. Since I’m already managing the per-community feed lists and metadata on Delicious, I figure I might as well go all the way. So I’m keeping a list of the Delicious accounts that control each community’s calendar aggregator on Delicious too. Today there are three. The idea is that when I add the fourth, I won’t touch any code — or even configuration data — that will require an update to the running service. I’ll just bookmark a fourth Delicious account and tag it with calendarcuration.

But that’s merely an administrative convenience. Much more critical, at this point, is to help curators find machine-readable calendars in their communities and — since most of the calendars that might exist don’t — also show people how they can easily create them.

I got a running start when I bootstrapped the Ann Arbor instance, thanks to Google Calendar. I searched for Ann Arbor there and found a nice list of iCalendar feeds. But that search feature is, at least for now, gone.

Several curators have tried searching the web for .ICS files (e.g. filetype:ics), but that’s not very productive for a couple of reasons. Where iCalendar resources do exist, they often aren’t exposed as files with .ICS extensions. But more importantly, relative to the number of iCalendar resources that could exist, very few actually do.

So I thought back on how I bootstrapped the original Keene instance. A number of the events there are recurring events that were advertised on the web, but not in any structured format. I found them one day by doing web queries like:

"first monday" keene
"every thursday" keene

There’s no fully automatic way to convert this stuff into structured calendar data. But it’s pretty straightforward to fire up a calendar program, enter some recurring events, and publish a feed. The advantage of recurring events, of course, is that they keep showing up, which is very helpful if you’re trying to build critical mass.

So I’m now envisioning a pair of tools to help curators do this more easily. First, I’d like to have each community’s aggregator running a scheduled search that helps the curator be aware of calendar-like information that could be upgraded to actual calendar data. Second, I’d like to provide a tool that partly automates the cumbersome data entry.

I’ve done an initial version of the search tool, and an example of its output is here. I’ll attach the code to the end of this item, for those who care, although I expect that if it winds up being useful to curators, most will appropriately not care, and will only want to scan the links now and then.

It may be interesting, over time, to try to evolve this into a robot that makes sense of the calendar information that people actually write, as opposed to the information that calendar programs constrain them to produce. But meanwhile this hybrid approach seems like a way to make progress.

HOW

I did this tool in two parts. The kernel, so to speak, is in C#, because for now that’s the most practical way to write Azure services and applications. But the application is in IronPython, because the search function doesn’t yet need to be hosted on Azure, and IronPython is a really flexible and convenient way to experiment with the kernel.

The C# piece uses James Newton-King’s Json.NET library because JavaScript interfaces are now the preferred way to search programmatically. It’s been a while since I’ve done this kind of thing. Used to be, the REST APIs were easy to find. But now, since those interfaces are mainly intended for use by JavaScript objects embedded in web pages, I had to do a bit of spelunking.

One of the interesting things about Json.NET is that it includes an implementation of LINQ for JSON. That’s why you see the “from … select” syntax, which extracts an enumerable list of URLs from the JavaScript results returned by the search services.

using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace CalendarAggregator
{
  public class Search
  {
    // Every URL returned so far, in the order received.
    public List<string> search_result_urls = new List<string>();

    // Result URL -> number of times it has been returned.
    // Initialized here; otherwise dictify() throws a NullReferenceException.
    public Dictionary<string, int> dict = new Dictionary<string, int>();

    // Fetches four pages of 50 results and returns the URLs found by this call.
    public List<string> livesearch(string query)
    {
      // A C# string literal can't continue across lines with a backslash,
      // so the long URL is built by concatenation.
      var url_template = "http://api.search.live.net/json.aspx?AppId=XXX&" +
          "Sources=web&Query={0}&Web.Count=50";
      var offset_template = "&Web.Offset={1}";
      var q = Uri.EscapeDataString(query);  // queries contain quotes and spaces
      var found = new List<string>();
      int[] offsets = { 0, 50, 100, 150 };
      foreach (var offset in offsets)
      {
        var search_url = (offset == 0)
          ? string.Format(url_template, q)
          : string.Format(url_template + offset_template, q, offset);

        // Utils.FetchUrl is the project's own HTTP helper (not shown here).
        var page = Utils.FetchUrl(search_url).data_as_string;
        JObject o = (JObject)JsonConvert.DeserializeObject(page);

        var urls =
          (from url in o["SearchResponse"]["Web"]["Results"].Children()
           select url.Value<string>("Url")).ToList();

        dictify(urls);
        found.AddRange(urls);
      }
      return found;
    }

    // Fetches seven pages of 8 results and returns the URLs found by this call.
    public List<string> googlesearch(string query)
    {
      var url_template = "http://ajax.googleapis.com/ajax/services/search" +
          "/web?v=1.0&rsz=large&q={0}&start={1}";
      var q = Uri.EscapeDataString(query);
      var found = new List<string>();
      int[] offsets = { 0, 8, 16, 24, 32, 40, 48 };
      foreach (var offset in offsets)
      {
        var search_url = string.Format(url_template, q, offset);
        var page = Utils.FetchUrl(search_url).data_as_string;
        JObject o = (JObject)JsonConvert.DeserializeObject(page);

        var urls =
          (from url in o["responseData"]["results"].Children()
           select url.Value<string>("url")).ToList();

        dictify(urls);
        found.AddRange(urls);
      }
      return found;
    }

    // Records each URL and increments its tally.
    private void dictify(IEnumerable<string> urls)
    {
      foreach (var url in urls)
      {
        search_result_urls.Add(url);
        if (dict.ContainsKey(url))
          dict[url] += 1;
        else
          dict[url] = 1;
      }
    }
  }
}

Here’s the IronPython piece which uses the search methods from the C# code:

import clr
clr.AddReference("CalendarAggregator")
# Search is the C# class shown above
from CalendarAggregator import Search

locations = [
'ann arbor',
'huntington wv',
'keene',
'virginia beach',
]

qualifiers = [
'first',
'second',
'third',
'fourth',
'every'
]

days = [
'monday',
'tuesday',
'wednesday',
'thursday',
'friday',
'saturday',
'sunday'
]

for location in locations:
  search = Search()
  for qualifier in qualifiers:
    for day in days:
      q = '"%s" "%s %s"' % ( location, qualifier, day )
      search.googlesearch(q)
      search.livesearch(q)

  # report inside the loop, so every location's tallies are printed,
  # not just the last one's
  for key in search.dict.Keys:
    print key, search.dict[key]

This week my ongoing fascination with Delicious as a user-programmable database took a new turn. Earlier, I showed how I’m using Delicious to enable collaborative curation of the set of feeds that drives an aggregation of community calendars.

The service I’m building in this ongoing series has so far collected calendars only for a single community — mine. But the idea is to scale out so that folks in other communities can use it for their own collections of calendars.

As I refactored the code this week to prepare for that scale-out, I thought about how to manage the configuration data for multiple instances of the aggregator. This is a classic problem, there are a million ways to solve it, and I thought I’d seen them all. But then I had a wacky idea. If I’m already using Delicious to enable community stakeholders to curate the sets of feeds they want to aggregate, why not also use Delicious to enable them to manage the configuration metadata for instances of the aggregator?

Here’s a way to do that. Consider this URL:

http://delicious.com/elmcity/metadata

It refers to an URL that doesn’t actually point to anything — click it and you’ll see that for yourself. So it’s really an URN (Uniform Resource Name) rather than an URL (Uniform Resource Locator).

But even though it doesn’t point to anything, it can still be bookmarked. The owner of the elmcity account on Delicious can click Save a Bookmark and put http://delicious.com/elmcity/metadata into the URL field.

Now you can attach stuff to the bookmark, like so:

Here the title of the bookmark is metadata, and the tags are these strings:

tz=Eastern
title=events+in+and+around+keene
img=http://elmcity.info/media/keene-night-360.jpg
css=http://elmcity.info/css/elmcity.css
contact=judell@mv.com
where=keene+nh
template=http://elmcity.info/media/tmpl/events.tmpl

These strings are, implicitly, name=value pairs. The service that reads this configuration data from Delicious can easily make them into explicit names and values. But how does it find them? By looking up the metadata URL, like so:

delicious.com/url/view?url=http://delicious.com/elmcity/metadata

That request redirects to the special Delicious URL that uniquely identifies the bookmark:

delicious.com/url/9ee9d2e51e4f36d4d49207e1675b3cbb

Of course the service doesn’t want to dig the name=value pairs out of that web page. So instead it reads the page’s RSS feed:

feeds.delicious.com/v2/rss/url/9ee9d2e51e4f36d4d49207e1675b3cbb

To prove that it works, check out this prototype version of the elmcity calendar. That page was built by an Azure service that reads configuration data from the bookmarked URN, and interpolates the name=value pairs into the template specified in the metadata.
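The tag-to-configuration step can be sketched in a few lines of Python. This is only an illustration, assuming the tags have already been pulled out of the feed as plain strings; the function name parse_metadata_tags is made up for the example, and tags without an equals sign are simply skipped.

```python
import re

def parse_metadata_tags(tags):
    """Turn 'name=value' tags into a dict; '+' in a value stands for a space."""
    metadata = {}
    for tag in tags:
        match = re.match(r'^([^=]+)=(.+)$', tag)
        if match:  # ordinary tags such as 'trusted' have no '=' and are ignored
            metadata[match.group(1)] = match.group(2).replace('+', ' ')
    return metadata

# A few of the tags from the metadata bookmark, plus one ordinary tag.
tags = ['tz=Eastern', 'title=events+in+and+around+keene', 'where=keene+nh', 'trusted']
config = parse_metadata_tags(tags)
print(config)
```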

Is this crazy? Here are some reasons why I think not.

First, I’m embracing one of a programmer’s greatest virtues: laziness. Why write a bunch of database and user-interface logic just to enable folks to manage a few small collections of name=value pairs? Delicious has already done that work, and done it much better than I could.

Second, the configuration data lives out in the open where stakeholders can see it, touch it, and collaboratively manage it. There are all kinds of ways Delicious can help those folks do that. For example, anyone who cares about this collection of data can subscribe to its feed and receive notifications when anything changes.

Third, it’s easy to extend this model. For example, part of the workflow will entail one or more stakeholders deciding to trust a feed and put it into production. As you may recall, the service trusts a feed when it’s bookmarked with the tag trusted. Part of that approval process will involve making sure that there are URLs associated with events coming from the feed. Some iCalendar feeds provide them, but many don’t.

So in addition to the configuration that’s needed once for each instance of a community aggregator, there’s a bit of configuration that’s needed once per feed. If a feed doesn’t provide URLs for individual events, you can at least provide a homepage URL for the feed. And this piece of metadata can be managed in the same way. Here’s the bookmark for the Gilsum church. It carries the tag url=http://gilsum.org/church.aspx. As you browse around in a set of trusted feeds, it’s pretty easy to see which ones do and don’t carry those tags, and it’s pretty easy to edit them.

It all adds up to a ton of value, and to capture it I only had to write the handful of lines of code shown below.

Now I’ll grant this way of doing things won’t work for everybody, so at some point I may need to create an alternative. And since I don’t want to depend on Delicious being always available, I’ll want to cache the results of these queries. But still, it’s amazing that this is possible.


public Dictionary<string, string>
  get_delicious_feed_metadata(string metadata_url, string account)
  {
  var dict = new Dictionary<string, string>();
  var url = string.Format("http://delicious.com/url/view?url={0}",
    metadata_url);
  var http_response = Utils.FetchUrlNoRedirect(url);
  var location = http_response.headers["Location"];
  var url_id = location.Replace("http://delicious.com/url/", "");
  url = string.Format("http://feeds.delicious.com/v2/rss/url/{0}",
    url_id);
  http_response = Utils.FetchUrl(url);
  var xdoc = Utils.xdoc_from_xml_bytes(http_response.data);
  string domain = string.Format("http://delicious.com/{0}/", account);
  var categories = from category in xdoc.Descendants("category")
                   where category.Attribute("domain").Value == domain
                   select new { category.Value };
  foreach (var category in categories)
    {
    var key_value = Utils.RegexFindGroups(category.Value,
      "^([^=]+)=(.+)");
    if (key_value.Count == 2)
      dict[key_value[0]] = key_value[1].Replace('+', ' ');
    }
  return dict;
  }

As promised yesterday, here’s a detailed account of the gymnastics required to extract usable data from Transparency International’s Corruption Perception Index (CPI) reports.

The reports are published as yearly editions for each of the 11 years since 1998. They’re not consolidated, at least not anywhere I can find, so if you want to analyze trends in the TI data you’ve got to consolidate those reports yourself.

The yearly reports are available as both HTML tables and corresponding Excel spreadsheets. I didn’t know about the latter. The website is organized such that for the recent years I examined first, only the HTML table is obviously available. So the procedure I’ll show here wasn’t strictly necessary. I could have gone straight to the Excel files.

But in the end it’s the same data, and all the subsequent processing is necessary in either case. So I’ll take this opportunity to show how to use Excel to extract data from an HTML table. That’s a really common operation if you’re into this sort of thing, and Excel does it pretty well.

Here’s part of the 2005 CPI table:

TI 2005 Corruption Perceptions Index

Country rank | Country     | 2005 CPI score | Confidence range | Surveys used***
1            | Iceland     | 9.7            | 9.5 – 9.7        | 8
2            | Finland     | 9.6            | 9.5 – 9.7        | 9
             | New Zealand | 9.6            | 9.5 – 9.7        | 9
4            | Denmark     | 9.5            | 9.3 – 9.6        | 10

To import it into Excel 2007, first visit the page and capture its URL.

Then, in Excel, do Data -> From Web -> From Web (Classic Mode), navigate to the table you want, click the arrow at its top left corner, and click Import.

That was the easy part. Before long, I had a spreadsheet with 11 CPI reports. To simplify things, I stripped each one down to just two columns: country name and CPI rank. I wanted to see trends in the ranking over time. To do that, I needed to merge the 11 sheets into a single sheet with a column of normalized names, and 11 columns of normalized ranking data.

The names had to be normalized for a couple of reasons. First, there were six different encodings of Côte d´Ivoire:

C\xC3\xB4te d\xC2\xB4Ivoire
Cote d'Ivoire
C\xF4te-d'Ivoire
Cote d\xB4Ivoire
Cote d?Ivoire
C\xF4te d\xB4Ivoire

There were also typos (Moldovaa for Moldova) and variant spellings (USA vs United States).
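Two of those variants differ only in byte encoding: one is UTF-8 and one is Latin-1. Here is a small Python 3 sketch of collapsing them by decoding, assuming the names arrive as raw bytes; decode_name is a hypothetical helper, and the ASCII variants would still need a replacement table like the one in the script that follows.

```python
def decode_name(raw):
    """Decode bytes as UTF-8 if they are valid UTF-8, otherwise as Latin-1."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')

utf8_variant = b'C\xC3\xB4te d\xC2\xB4Ivoire'
latin1_variant = b'C\xF4te d\xB4Ivoire'

# Both byte strings decode to the same text.
print(decode_name(utf8_variant) == decode_name(latin1_variant))
```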

The rankings had to be normalized because sometimes countries are tied for a rank. In those cases (as above) some of the files were sparse, with empty cells for repeated ranking. In other cases, all cells were populated.
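The rank fix amounts to carrying the last seen rank forward into empty cells. A minimal Python sketch, using the four rows from the 2005 excerpt above (fill_ranks is a name invented for the example):

```python
def fill_ranks(rows):
    """Replace empty rank cells with the most recent non-empty rank."""
    filled = []
    last_rank = None
    for rank, country in rows:
        if rank == '':
            rank = last_rank
        last_rank = rank
        filled.append((rank, country))
    return filled

rows = [('1', 'Iceland'), ('2', 'Finland'), ('', 'New Zealand'), ('4', 'Denmark')]
for rank, country in fill_ranks(rows):
    print(rank, country)
```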

To do this normalization I exported the data from Excel to 11 CSV files, and used the following Python script:

import csv

r98 = csv.reader(open('cpi1998.csv'))
r99 = csv.reader(open('cpi1999.csv'))
r00 = csv.reader(open('cpi2000.csv'))
r01 = csv.reader(open('cpi2001.csv'))
r02 = csv.reader(open('cpi2002.csv'))
r03 = csv.reader(open('cpi2003.csv'))
r04 = csv.reader(open('cpi2004.csv'))
r05 = csv.reader(open('cpi2005.csv'))
r06 = csv.reader(open('cpi2006.csv'))
r07 = csv.reader(open('cpi2007.csv'))
r08 = csv.reader(open('cpi2008.csv'))

def fix(c):
  c = c.replace('(Former Yugoslav Republic of)','')
  c = c.replace('Congo, Republic of','Congo, Republic')
  c = c.replace('Congo, Republic the','Congo, Republic')
  c = c.replace('Dominican Rep.','Dominican Republic')
  c = c.replace('Dominican Rep\n','Dominican Republic\n')
  c = c.replace('FYR ','')
  c = c.replace('Saint Vincent and the','Saint Vincent')
  c = c.replace('Saint Vincent and','Saint Vincent')
  c = c.replace('Macedonia ','Macedonia')
  c = c.replace('Moldovaa','Moldova')
  c = c.replace('Serbia and Montenegro','Serbia')
  c = c.replace('Palestinian Authority','Palestine')
  c = c.replace('the Grenadines','Grenadines')
  c = c.replace('&','and')
  c = c.replace('USA','United States')
  c = c.replace('Viet Nam','Vietnam')
  c = c.replace('Slovak Republic','Slovakia')
  c = c.replace('Kuweit','Kuwait')
  c = c.replace('Taijikistan','Tajikistan')
  c = c.replace('Republik','Republic')
  c = c.replace('Herzgegovina','Herzegovina')
  c = c.replace("Ivory Coast",'C\xC3\xB4te d\xC2\xB4Ivoire')
  c = c.replace("Cote d'Ivoire",'C\xC3\xB4te d\xC2\xB4Ivoire')
  c = c.replace("C\xF4te-d'Ivoire", 'C\xC3\xB4te d\xC2\xB4Ivoire')
  c = c.replace('Cote d\xB4Ivoire', 'C\xC3\xB4te d\xC2\xB4Ivoire')
  c = c.replace('Cote d?Ivoire', 'C\xC3\xB4te d\xC2\xB4Ivoire')
  c = c.replace('C\xF4te d\xB4Ivoire', 'C\xC3\xB4te d\xC2\xB4Ivoire')
  return c

d = {}
rnum = -1
lastrank = None

for reader in [r98,r99,r00,r01,r02,r03,r04,r05,r06,r07,r08]:
  rnum += 1
  for row in reader:
    rank = row[0]
    if rank == '':         # normalize rank
      rank = lastrank
    lastrank = rank
    country = fix(row[1])  # normalize name
    if not d.has_key(country):
      d[country] = [0,0,0,0,0,0,0,0,0,0,0]
    d[country][rnum] = rank

keys = d.keys()
keys.sort()

for key in keys:
  print "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s" % \
    ( key, d[key][0],d[key][1],d[key][2],d[key][3],
           d[key][4],d[key][5],d[key][6],d[key][7],
           d[key][8],d[key][9],d[key][10] )

As you can see, the bulk of this script is really just data, in the form of search/replace pairs. Its output is a tab-delimited text file. It took me a few tries to reduce the list of names to a normalized core. I ran the script, took the output into Excel, eyeballed the list, and added new search/replace pairs.

Eventually I wound up with this data, which I brought back into Excel to explore. Because I wanted to look at what I’m calling volatility (that is, the variability in CPI rankings), I added a column that computes the difference between a country’s highest and lowest rankings over the 11-year period, and then sorted the countries by that difference, from most to least volatile.
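The same volatility measure is easy to sketch outside Excel. This Python version assumes that 0 marks a year with no reported rank and leaves those years out of the calculation; the country names and numbers are invented for illustration and are not CPI data.

```python
def volatility(ranks):
    """Spread between the highest and lowest reported rank (0 = not reported)."""
    reported = [r for r in ranks if r > 0]
    if not reported:
        return 0
    return max(reported) - min(reported)

# Invented numbers for illustration only, not actual CPI rankings.
table = {
    'Country A': [10, 12, 0, 11],
    'Country B': [50, 0, 90, 70],
    'Country C': [3, 3, 3, 3],
}

# Most volatile first.
by_volatility = sorted(table, key=lambda name: volatility(table[name]), reverse=True)
print(by_volatility)
```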

We can debate whether a stack of sparklines is a useful way to visualize trends in this data, but that’s the approach I decided to try. It gave me a chance to experiment with some of the sparkline kits available for Excel, and the one I settled on is BonaVista’s MicroCharts.

Here’s a picture of two chart styles I tried:

These microcharts do succeed in telling stories about each country individually, while also making it possible to notice that Georgia, atypically among the more volatile countries, is moving toward a lower (better) ranking.

In another variation on this theme, I flipped the rankings to their negative counterparts so that the charts would flip too, and would correspond to my natural sense that up means better and lower means worse. I also removed the zeroes so that they wouldn’t show up as data points.

That was good enough for my purposes, but when I converted the spreadsheet back to HTML I wasn’t happy with the results. That’s partly because the microcharts, which are rendered using TrueType fonts, had to be converted to lower-resolution images. And it’s partly because the HTML that Excel generated was too complicated for my WordPress blog to handle gracefully.

So I exported the enhanced data back out to a CSV file, and switched to Python again. There are a million ways to generate sparklines from data, but the one I remembered from a previous encounter was Joe Gregorio’s handy sparkline service.

(By the way, it should be possible to use that web-based service from Excel. And interactively, you can. If you capture a sparkline URL like this one, you can paste it into the File Open dialog presented by Excel’s Insert -> Picture feature. The dialog asks for a filename, but you can give it an URL and it’ll work.

When I realized that, I spent a few minutes trying to automate the procedure so that Excel 2007 could programmatically grab data, run it through an image-generating web service, and embed the resulting pictures. I failed, as have others before me, but it’s a nifty idea. If you know the solution, please share.)

Anyway, here’s the little Python script that reads the data, produces sparkline images, and embeds them in the HTML table I displayed on my blog:

# -*- coding: utf-8 -*-
import urllib2,os

data = open('cpi.csv').read()

# adjacent string literals keep the continuation lines' indentation out of the URL
url_template = ("http://bitworking.org/projects/sparklines/spark.cgi?"
  "type=discrete&d=%s&height=20&limits=0,200&upper=1&"
  "above-color=black&below-color=white&width=4")

rows = ''
row_template = """<tr>
<td class="sparkline">
<img src="http://jonudell.net/img/cpi/%s">
</td>
<td>%s</td></tr>\n"""

lines = data.split('\n')
for line in lines:
  if not line.strip():   # skip blank lines, e.g. after a trailing newline
    continue
  country = line.split(',')[0]
  ranks = line.split(',')[1:]
  quoted_fname = '%s.png' % urllib2.quote(country)
  fname = '%s.png' % country
  # pass the ranks as a comma-separated string, not as a Python list repr
  imgurl = url_template % ','.join(ranks)
  cmd = 'curl "%s" > "./cpi/%s" ' % (imgurl,fname)
  os.system(cmd)
  cmd = 'mogrify -flip "./cpi/%s"' % fname
  os.system(cmd)
  rows += row_template % (quoted_fname,country.replace(' ','&nbsp;'))

html = '<table cellspacing="4">%s</table>' % rows
f = open('cpi.html','w')
f.write(html)
f.close()

By specifying upper=1 and below-color=white in the sparkline-generating URL, the zeroes (representing unreported data) vanish from the charts.
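For comparison, here is a Python 3 sketch that assembles the same query string with urllib.parse instead of a hand-built template. The parameter names and values are the ones in the script above; the function name sparkline_url is invented, and nothing here contacts the service.

```python
from urllib.parse import urlencode

BASE = "http://bitworking.org/projects/sparklines/spark.cgi"

def sparkline_url(ranks):
    """Build the sparkline URL for a list of ranks (0 = unreported)."""
    params = [
        ("type", "discrete"),
        ("d", ",".join(str(r) for r in ranks)),
        ("height", "20"),
        ("limits", "0,200"),
        ("upper", "1"),
        ("above-color", "black"),
        ("below-color", "white"),
        ("width", "4"),
    ]
    # safe="," leaves the commas in d= and limits= unescaped
    return BASE + "?" + urlencode(params, safe=",")

url = sparkline_url([1, 2, 0, 4])
print(url)
```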

The charts don’t include reference lines as shown in the Excel screenshot, but I added them back using this bit of CSS:

td.sparkline {
border-top:1px #cccccc solid;
}

I’m using Python here partly as a shell language. It invokes a pair of command-line utilities: cURL to download images, and mogrify (part of the ImageMagick suite) to flip them.
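In Python 3 the same shell-language role could be played by subprocess with argument lists, which sidesteps quoting problems for names containing spaces or apostrophes. This is a sketch under assumptions: curl and mogrify must be installed for fetch_and_flip to work, the helper names are invented, and the example only prints a command rather than running it.

```python
import subprocess

def curl_command(url, path):
    # -s silences the progress meter, -o writes the response body to a file
    return ["curl", "-s", "-o", path, url]

def flip_command(path):
    # mogrify edits the image file in place
    return ["mogrify", "-flip", path]

def fetch_and_flip(url, path):
    """Run both tools; check=True raises CalledProcessError if either fails."""
    for command in (curl_command(url, path), flip_command(path)):
        subprocess.run(command, check=True)

# Show the command that would run, without executing it.
print(curl_command("http://example.com/spark.png", "./cpi/Cote d'Ivoire.png"))
```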

Although one of these commands calls a cloud-based sparkline service, and the other is a locally-installed image processing program, they’re treated in exactly the same way. When the quantities of data involved are small (these .PNG images are just a few hundred bytes), there’s no discernible difference between the two modes. I like that symmetry.

What I don’t like is all the moving parts. It’s awkward for me to move from Excel to Python to Excel to Python, with excursions to the command line along the way, and no normal person would even consider doing that.

In a simple case like this, such gymnastics should never have been required. If you’re going to publish data to the web, assume that people will want to use it and do the minimal basic hygiene and consolidation.

At some point, though, people will want to do fancier tricks. Today you have to be a “data geek” to perform them, but that shouldn’t be so. We’ve got to find a way to integrate Excel, dynamic scripting, command-line voodoo, and web publishing into a suite of capabilities that’s much more accessible.

In part one of this series I gave an overview of my current project to recreate the elmcity.info calendar aggregator on the Azure platform. In this installment I’ll focus on test-driven development in Azure.

Because I’m doing the core aggregator in C#, I’m using the popular NUnit software to automate the running of my test suite. It’s standard stuff if you’re familiar with the XUnit approach. But if you’re not a programmer, I’ll briefly explain. I think it’s worthwhile because the ideas that inform test-driven programming are an aspect of computational thinking that everyone could generalize from and apply in a variety of useful ways.

A primer on test-driven development

Let’s focus on one small piece of code, a method called AddTrustedEventfulContributor, which implements part of the trusted-feed mechanism I outlined in Databasing trusted feeds with del.icio.us.

As I explained there, when the aggregator’s scan of Eventful events within 15 miles of Keene finds an unknown contributor, as was true recently for Beau Bristow, it creates a del.icio.us record with the tags new, eventful, and contributor. If I decide to trust Beau, I can just change the new tag to trusted by hand. But eventually I’ll want to automate that, so an administrator needn’t remember the tagging convention or worry about making an error.

So AddTrustedEventfulContributor creates (or updates) a del.icio.us bookmark for the URL eventful.com/users/beaubristow/created/events, and ensures that it’s tagged with trusted, eventful, and contributor.

Once the method is written, and seems to work, how can we be sure that it continues to work? The environment is dynamic. The code supporting the method is evolving. And so is the code supporting the del.icio.us and Eventful services it orchestrates. We want to be able to test the method continuously, and verify that it keeps on doing what we expect.

The code to be tested is defined in a file called Delicious.cs, like so:

public static Utils.http_response_struct
    AddTrustedEventfulContributor(string contrib)
  {
  return AddTrustedContributor(contrib, "eventful");
  }

private static Utils.http_response_struct
    AddTrustedContributor(string contrib, string service)
  {
  contrib = contrib.Replace(' ', '+');
  var bookmark_url = build_bookmark_url(contrib, service);
  string tags = "trusted+contributor+" + service;
  string args = string.Format("&url={0}&tags={1}&description={2}",
    bookmark_url,   tags, contrib);
  var url = string.Format("{0}/posts/add?{1}", apibase, args);
  return do_request_with_url(url);
  }

Tests are defined in a parallel file, DeliciousTest.cs, like so:

[TestFixture]
public class DeliciousTest
  {
  private const string contrib = "xyzas 'dfbyas234";

  [Test]
  public void t1_addTrustedEventfulContributor()
    {
    Utils.http_response_struct response =
      Delicious.AddTrustedEventfulContributor(contrib);
    Assert.AreEqual(HttpStatusCode.OK, response.normal_status);
    Assert.That(isSuccessfulDeliciousOperation(response));
    Assert.That(Delicious.isTrustedEventfulContributor(contrib));
    }

The test calls Delicious.AddTrustedEventfulContributor with the fictitious contributor xyzas 'dfbyas234, and makes three assertions about the outcome. First, we should get the expected OK status code from del.icio.us. Second, we should get the expected XML response. And third, the expected tags should actually have been applied to the bookmark for xyzas 'dfbyas234.
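If you don't read C#, the same pattern looks like this in Python's built-in unittest module. It's a toy sketch: trusted_tags is a hypothetical stand-in that only computes the tags a trusted contributor should carry, and it doesn't talk to del.icio.us at all.

```python
import unittest

def trusted_tags(service):
    """Hypothetical helper: tags a trusted contributor's bookmark should carry."""
    return {"trusted", "contributor", service}

class TrustedTagsTest(unittest.TestCase):
    def test_eventful_contributor_is_trusted(self):
        tags = trusted_tags("eventful")
        self.assertIn("trusted", tags)
        self.assertIn("eventful", tags)
        self.assertNotIn("new", tags)

# Load and run the fixture programmatically, then summarize pass/fail.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TrustedTagsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all green" if result.wasSuccessful() else "something went red")
```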

Like other XUnit software, NUnit provides a few different ways to run tests. Everyone’s favorite is the GUI testrunner, which displays a tree of test sets (fixtures) and tests, with green and red indicators for pass and fail. The indicators produce a Pavlovian response: You want to see them stay green, and will work obsessively to keep them that way.

The Azure twist

So far this is all standard stuff, but here’s the Azure twist. For a while I was using the GUI testrunner, and then deploying — first to the local Azure development “fabric” and then to the cloud. But the GUI testrunner’s environment isn’t quite the same as Azure. I was reminded of that fact when I added a serialization method to the aggregator.

The original Python-based service uses a binary serialization technique that Pythonistas call pickling. It’s a convenient way to freeze-dry and rehydrate data structures that don’t need to be stored in a queryable or transactional database. You can do the same thing in other programming environments, including Perl, Java, and .NET.
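For anyone who hasn't seen pickling, here is a minimal round trip using Python's standard pickle module. The event record is made up for the example.

```python
import pickle

# A made-up event record standing in for the service's intermediate data.
events = [{"title": "Contra dance", "start": "2008-12-20T20:00"}]

blob = pickle.dumps(events)    # freeze-dry to bytes
restored = pickle.loads(blob)  # rehydrate

print(restored == events)
```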

So I implemented .NET-style binary serialization for some intermediate data, and pushed these binary files into the Azure blob store. My NUnit test of this method ran green, but when I deployed into the local fabric it failed. Oh, right. The fabric’s security rules, as I mentioned last time, are different, and stricter than the defaults on your local machine.

Here’s the original serializer, which works outside Azure but not inside:

public void serialize(string container, string file,
  List<evt> events)
  {
  var serializer = new BinaryFormatter();
  var ms = new MemoryStream();
  serializer.Serialize(ms, events);
  var chars = Encoding.UTF8.GetChars(ms.ToArray());
  ms.Close();
  write_to_azure_blob(container, file, new string(chars));
  }

The line shown in red is the culprit. That’s where Azure throws a security exception. Thanks to a clue provided by Brendan Enrick I found this alternate, XML-oriented approach which doesn’t trigger a security exception:

public void serialize(string container, string file,
  List<evt> events)
  {
  var serializer = new XmlSerializer(typeof(List<evt>));
  var stringBuilder = new StringBuilder();
  var writer = XmlWriter.Create(stringBuilder);
  serializer.Serialize(writer, events);
  byte[] buffer = Encoding.UTF8.GetBytes(stringBuilder.ToString());
  write_to_azure_blob(container, file, buffer);
  }

And that’s how these intermediate files are now being written.

At this point I realized that, in order to test things properly, NUnit would have to migrate into the Azure fabric. It’s designed to be embedded in a variety of hosts, but I’ve never tried doing that. Here’s what I learned.

Running NUnit in Azure

The first step, as expected, was to make sure that the NUnit code could even load in Azure’s partial-trust environment. As shipped, it doesn’t. The DLLs won’t load in Azure’s local fabric, or in the cloud. If you’re wondering whether a DLL will or won’t load, Keith Brown’s FindAPTC tool will tell you. It checks DLLs to see if the Allow Partially Trusted Callers attribute is turned on. As I collect components for use in Azure, I find that they often don’t flip that switch.

The solution is to visit files like this one and change them from this:

using System;
using System.Reflection;

[assembly: CLSCompliant(true)]

[assembly: AssemblyDelaySign(false)]
[assembly: AssemblyKeyFile("../../../../nunit.snk")]
[assembly: AssemblyKeyName("")]

To this:

using System;
using System.Reflection;
using System.Security;

[assembly: CLSCompliant(true)]

[assembly: AssemblyDelaySign(false)]
[assembly: AssemblyKeyFile("../../../../nunit.snk")]
[assembly: AssemblyKeyName("")]
[assembly: AllowPartiallyTrustedCallers()]

The needed assemblies turned out to be nunit.core.dll, nunit.core.interfaces.dll, nunit.framework.dll, and nunit.testutilities.dll. After I rebuilt them with the APTC attribute turned on, they loaded.

But I wasn’t home free. I found a couple of things that triggered runtime security exceptions. Here’s one, in this file:

public class DirectorySwapper : IDisposable
  {
  private string savedDirectoryName;
  public DirectorySwapper() : this( null ) { }
  public DirectorySwapper( string directoryName )
    {
    savedDirectoryName = Environment.CurrentDirectory;
    if ( directoryName != null && directoryName != string.Empty )
      Environment.CurrentDirectory = directoryName;
    }
  public void Dispose()
    {
    Environment.CurrentDirectory = savedDirectoryName;
    }
  }

The lines that assign to Environment.CurrentDirectory fail because the Azure trust policy, a “variation on the standard ASP.NET medium trust policy,” prevents changes to environment variables.

The other offender appears here:

private static Assembly FrameworkAssembly
  {
  get
    {
    if (frameworkAssembly == null)
    foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
      if (assembly.GetName().Name == "nunit.framework" ||
        assembly.GetName().Name == "NUnitLite")
          {
          frameworkAssembly = assembly;
          break;
          }
    return frameworkAssembly;
    }
  }

Because the Azure trust policy places restrictions on reflection, whereby code inspects (and perhaps modifies) itself, these calls to GetName trigger security exceptions. In this case, I believe NUnit is using reflection to segregate its own DLLs from the DLLs under test, in order to keep its internal bookkeeping straight.

My solution to both of these problems was naive and heavy-handed. I just commented out the handful of cases where NUnit tries to change the current directory, or find out if a DLL is one of its own or not. With those changes in place, here’s my Azure-embedded testrunner:

private static void doTests()
  {
  var suites = new Type[] {
    typeof(BlobStorageTest),
    typeof(DeliciousTest),
    typeof(EventCollectorTest),
    typeof(EventStoreTest),
    typeof(FeedRegistryTest),
    typeof(UtilsTest),
    };

  var fixtures = new List<TestFixture>();

  foreach (var suite in suites)
    fixtures.Add(TestBuilder.MakeFixture(suite));

  string report = string.Format("NUnit Tests at {0}\n\n",
    DateTime.Now.ToString());

  foreach (var fixture in fixtures)
    {
    TestSuiteResult results = (TestSuiteResult)fixture.Run(
      new NullListener());
    foreach (TestResult result in results.Results)
      {
      report += string.Format("{0}\n",result.Name);
      if ( ! result.IsSuccess )
        report += string.Format("{0}\n",result.Message);
      report += "\n";
      }
    }

  var bs = new BlobStorage();
  bs.put_blob("events", "nunit.txt", Encoding.UTF8.GetBytes(report));
  }

The aggregator is currently running on a 12-hour cycle. Every time it wakes up, it runs tests and writes this report before it collects events. (It’s a no-news-is-good-news-style report, so if all is well you’ll just see a list of tests.)
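The same embedded-runner idea can be sketched with Python's unittest: run the fixtures in-process and build a plain-text report that lists every test and adds detail only for failures. This is an analogy rather than a port of the NUnit code; ArithmeticTest is a stand-in fixture, and a real service would write the report to storage instead of printing it.

```python
import datetime
import unittest

class ArithmeticTest(unittest.TestCase):
    """Stand-in fixture; a real suite would exercise the service's own code."""
    def test_addition(self):
        self.assertEqual(2 + 2, 4)

def build_report(test_cases):
    lines = ["Tests at %s" % datetime.datetime.now().isoformat(), ""]
    for case in test_cases:
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
        names = [test.id() for test in suite]  # collect names before running
        result = unittest.TestResult()
        suite.run(result)
        problems = dict((test.id(), text)
                        for test, text in result.failures + result.errors)
        for name in names:
            lines.append(name)
            if name in problems:  # no news is good news
                lines.append(problems[name])
            lines.append("")
    return "\n".join(lines)

report = build_report([ArithmeticTest])
print(report)
```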

Conclusions

It’s nice to know that the aggregator will now test itself continuously, in its production environment. When you park a service in the cloud, you want all the feedback you can get. Constant flows of log data and test reports are essential in order to know that things are working correctly, or to find out why they’re not.

Although these methods are always advisable, I’ll admit I was lazy about them in the current version of the service. It’s running on a Linux box that I can ssh into and poke around on whenever I want. The same would be true if it were running on Amazon EC2. With Azure, as with Google’s App Engine, things are different. The execution environment is more of a black box. You can’t just jump in there and poke around. I miss that.

On the other hand, the black box architecture forces me to rethink some basic assumptions. Should my service expect to be able to modify environment variables? Should it even expect to communicate directly with a file system? We’ve always done things that way, but cloud computing invites us to move to a new level of abstraction. As always, that shift brings challenges along with opportunities.

I’m really of two minds about this. It is frustrating not to be able to use NUnit, unmodified, in Azure. I’m not sure what the effects of my surgery really are, or in what other ways NUnit may yet be incompatible with Azure. A mode of Azure that runs fully trusted code, and even allows EC2-style use of raw virtual machines, would be a wonderful option.

And yet … I haven’t been stymied so far. And part of me wants to embrace constraints in order to gain flexibility at another level of the stack.

From the comments on part one of this series:

“Either give me a machine in the cloud to work on or don’t (anything less is censorship)”

I’d rather have the opportunity to self-censor. And on Amazon EC2 I have that opportunity. That said, when I’ve used EC2 VMs I have been running as root. Why? No good reason, just path of least resistance.

Do you routinely run as root on your personal box, and on hosted boxes? If so, you can do that on EC2, and I suspect you’ll be able to on raw Azure VMs too. But setting the default to something less potent is, well, think about it. Have you ever condemned Microsoft for not being secure by default? How do you square that with condemning Microsoft for being secure by default?

More broadly, the cloud environment is going to challenge a lot of long-held assumptions in what I think will be useful ways. Less so for raw VM hosting a la Amazon, more so for the kinds of “fabrics” of which App Engine and Azure are examples.

That said, although I think it’s useful to challenge assumptions about access to environment variables and file systems, I chafe at the restrictions on reflection. My original plan was to use IronPython for this service, because I believe that the flexibility of dynamic languages will be a key asset in the dynamic environment of the cloud. Currently I’m using IronPython in auxiliary and complementary ways, outside of Azure, as I’ll explain in another installment. Meanwhile I’m finding that C# is becoming more and more dynamic. But reflection is at the core of that dynamism. I’m no expert on this subject, but will be interested to know what folks who are think about the tradeoffs that Azure’s trust policy entails.

If you were tuned into the blogosphere back in 2001, you’ll recall lots of chatter about RSS feed validation. RSS came in multiple flavors. Anyone could whip up a feed purporting to be in one or another of those formats, and many of us did. There were all kinds of questions about how and why feeds did or didn’t conform to the various specifications.

Nowadays we have even more flavors. There’s RSS 2.0. And there’s Atom, which isn’t a member of the RSS family at all; it’s a different species of feed format. And yet you rarely hear about problems with feeds that can’t be read and processed by feedreaders.

I think there are two reasons why RSS/Atom-style feeds work pretty well nowadays. First, there’s the Feed Validator. Mark Pilgrim and Sam Ruby put a huge amount of effort into this excellent tool. Why? Here is their explanation:

Despite its relatively simple nature, RSS is poorly implemented by many tools. This validator is an attempt to codify the specification (literally, to translate it into code) to make it easier to know when you’re producing RSS correctly, and to help you fix it when you’re not.

The second reason is that RSS/Atom-style syndication has been happening in a lot of places for a long time now. A lot of people have used, and helped to refine, the tools and techniques.

Now I’m exploring the parallel world of calendar syndication, using ICS feeds instead of RSS/Atom feeds. And it feels like 2001 all over again. There are ICS feeds out there, but nowhere near as many as RSS/Atom feeds. And my hunch is that even when ICS feeds are published, they’re often unused, so there isn’t enough feedback to flush out problems. Finally, the ICS equivalent of the RSS/Atom Feed Validator — a service called iCalendar Validator, based on a Java library called iCal4j — isn’t anywhere near as comprehensive and informative as the RSS/Atom Validator.

Here’s a chart that lists the iCalendar feeds currently being collected by the elmcity.info calendar aggregator.

feed                    | producer      | valid in iCal4j | loads with DDay.iCal | loads with iCalendar.py | loads with vObject
armadillos              | google        | yes             | yes                  | yes                     | yes
aveo                    | google        | yes             | yes                  | yes                     | yes
chamber of commerce     | homegrown     | yes             | no                   | yes                     | yes
cheshire democrats      | google        | yes             | yes                  | yes                     | yes
frost free library      | drupal        | no              | no                   | yes                     | no
fuzzy logic             | google        | yes             | yes                  | yes                     | yes
gilsum church           | google        | yes             | yes                  | yes                     | yes
hannah grimes           | drupal        | yes             | yes                  | yes                     | no
keene high soccer       | google        | no              | yes                  | yes                     | yes
keene public library    | fusecal       | yes             | yes                  | yes                     | yes
keene state bodyworks   | google        | yes             | yes                  | yes                     | yes
mmama cinema            | google        | yes             | yes                  | yes                     | yes
mmama dance             | google        | yes             | yes                  | no                      | no
mmama music             | google        | yes             | yes                  | yes                     | yes
mmama visual            | google        | yes             | yes                  | yes                     | yes
monadnock folk          | wordpress ec3 | yes             | yes                  | yes                     | yes
monadnock regional high | unknown       | no              | yes                  | yes                     | yes
swamp bats              | google        | yes             | yes                  | yes                     | yes
town of gilsum          | google        | yes             | yes                  | yes                     | yes
unh coop extension      | homegrown     | no              | yes                  | yes                     | yes
upcoming                | yahoo         | no              | yes                  | yes                     | yes
ymca                    | google        | yes             | yes                  | yes                     | yes

As you can see, the results are all over the map. Some purportedly valid feeds won’t load using one iCalendar library, some won’t load using another. Some purportedly invalid feeds do load.
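To give a flavor of what even the crudest check involves, here is a Python sketch that looks only at whether BEGIN and END lines nest properly. It falls far short of a validator and is not how any of the libraries in the chart work; it says nothing about required properties, date formats, or time zones, and the function name is invented.

```python
def check_ics_structure(text):
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    open_components = []
    # Folded continuation lines begin with a space or tab; ignore them and blanks.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and ln[0] not in " \t"]
    if not lines or lines[0].strip().upper() != "BEGIN:VCALENDAR":
        problems.append("does not start with BEGIN:VCALENDAR")
    for line in lines:
        upper = line.strip().upper()
        if upper.startswith("BEGIN:"):
            open_components.append(upper[len("BEGIN:"):])
        elif upper.startswith("END:"):
            name = upper[len("END:"):]
            if open_components and open_components[-1] == name:
                open_components.pop()
            else:
                problems.append("unexpected END:%s" % name)
    for name in open_components:
        problems.append("missing END:%s" % name)
    return problems

good = ("BEGIN:VCALENDAR\r\nVERSION:2.0\r\nBEGIN:VEVENT\r\n"
        "SUMMARY:Test\r\nEND:VEVENT\r\nEND:VCALENDAR\r\n")
bad = "BEGIN:VCALENDAR\nBEGIN:VEVENT\nEND:VCALENDAR\n"
print(check_ics_structure(good))
print(check_ics_structure(bad))
```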

I expect things will get worse before they get better. There are only a handful of different ICS producers represented here, but the two labeled homegrown were created directly or indirectly in response to my project. If we recapitulate the RSS/Atom experience with ICS, and lots more ad-hoc ICS feeds arrive on the scene, charts like this will go even redder.

To make them go green, we’ll need a more robust ICS validator.
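A more robust validator would presumably start with structural checks before moving on to finer points. Here is a minimal sketch in Python, using only the standard library. The function name and the choice of checks are assumptions made for illustration; this is not how iCal4j or any of the libraries in the chart work. It unfolds continuation lines, verifies that BEGIN/END pairs nest properly, and looks for the VERSION and PRODID properties that RFC 2445 requires on a calendar object.

```python
def check_ics_structure(text):
    """Return a list of structural problems found in iCalendar text.

    Only BEGIN/END nesting and two required calendar properties are
    checked; a real validator would need to check far more.
    """
    problems = []
    stack = []                 # names of components currently open
    calendar_props = set()     # properties seen directly inside VCALENDAR

    # Unfold: a line starting with a space or tab continues the previous one.
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw.strip():
            lines.append(raw)

    for line in lines:
        name, _, value = line.partition(":")
        name = name.split(";")[0].upper()   # drop any property parameters
        value = value.strip().upper()
        if name == "BEGIN":
            stack.append(value)
        elif name == "END":
            if stack and stack[-1] == value:
                stack.pop()
            else:
                problems.append("unexpected END:%s" % value)
        elif stack == ["VCALENDAR"]:
            calendar_props.add(name)

    for component in stack:
        problems.append("missing END:%s" % component)
    for required in ("VERSION", "PRODID"):
        if required not in calendar_props:
            problems.append("missing %s" % required)
    return problems
```

An empty list means only that the feed passed these few checks, not that it is valid.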

For me, one of 2008’s most important (but least remarked-upon) ideas was spelled out in this post, which details how Ward Cunningham implemented Brian Marick’s notion of Visible Workings. The idea, briefly, is that businesses can wear (non-confidential aspects of) their business logic on their sleeves, observable to all.

In a year of devastating consequences ensuing from the lack of transparency in business, you’d think Ward and Brian would be celebrated for this work. No such luck. Partly, I’m sure, because their insights flow from the realm of software development and software testing, and don’t generalize in an obvious way.

It struck me this morning that yesterday’s item on using del.icio.us to manage trusted feeds may help to broaden the appeal of the idea.

In that item I mainly talked about the logistical benefits of the approach. You write less code, and you get to leverage existing infrastructure for data management, web UI, collaboration, and syndicated alerts. That’s all good. But there’s also a transparency benefit which I neglected to point out.

At this moment, for example, del.icio.us/elmcity is a snapshot of the feeds and contributors known to, and classified by, the live version of my service at elmcity.info/events. That version uses private lists of trusted feeds, and of new and trusted contributors. I haven’t yet cut over to the newly-rewritten Azure version, but when I do, it will use these public lists instead.

The del.icio.us/elmcity snapshot reports that there are 41 Eventful contributors of which 37 are trusted and 4 are new.

Why are the four new contributors still sitting in the holding tank? One I mentioned yesterday. jheslin created a venue, but no events. I plan to delete that contributor and wait to see if he or she shows up again with actual event contributions.

That leaves TallWilly, blahblah25, and michellelewis. Why are they still sitting in the holding tank? Here’s the crucial point: I’m not sure. I know that I reviewed them when they showed up, and applied a policy. If it were written down, which until now it hasn’t been, it would use language like “legitimate” and “substantive” to define the kinds of contributions that move a new contributor into the trusted bucket. But I can’t actually say how I applied that policy in these cases.

So let’s investigate. First, TallWilly. Clicking through, I find that TallWilly is no longer an Eventful user. Obviously I’ll want to remove him from the new bucket. Implicit rule now stated: Must be an Eventful user.

Second, blahblah25. Clicking through, I find only one event. Seems legit, and so far I haven’t required more evidence than a single legit event, so why didn’t I promote blahblah25? Oh, I see. Jan 4, 1900 12:30 AM isn’t a reasonable start date. Implicit rule now stated: Date must be reasonable.

(Of course there’s more to the story here. blahblah25’s bogus date was either a human error or a software error, or both. Ideally the aggregator, when rejecting a contribution on that basis, would notify the contributor and invite a correction.)
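Once stated, implicit rules like these can be written down as code that anyone can read. The following is a sketch only: the function names, the dictionary keys, and the date thresholds are all hypothetical, since the policy so far says no more than that a contributor must be an Eventful user and that a start date must be "reasonable". Judging whether an event is legitimate still takes a human; the code covers only the mechanical checks.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the stated rule is only "Date must be reasonable".
MAX_PAST = timedelta(days=1)
MAX_FUTURE = timedelta(days=2 * 365)

def is_reasonable_start(start, now):
    """A start date is reasonable if it is neither long past nor far off."""
    return now - MAX_PAST <= start <= now + MAX_FUTURE

def review(contributor, now):
    """Return (decision, reason) for a contributor in the new bucket.

    contributor is a dict with two assumed keys:
      'is_eventful_user' (bool) and 'event_starts' (list of datetime).
    """
    if not contributor["is_eventful_user"]:
        return ("remove", "not an Eventful user")
    starts = contributor["event_starts"]
    if not starts:
        return ("remove", "no events yet")
    if not all(is_reasonable_start(s, now) for s in starts):
        return ("hold", "unreasonable start date")
    return ("trust", "has events with reasonable dates")
```

Returning the reason along with the decision is the point of the exercise: it records why each contributor is sitting where it is.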

Third, michellelewis. Why didn’t I decide to trust her? Turns out it was just a mistake! Clicking through, I find an entire schedule of concerts, including this one at Fritz Belgian Fries on April 3, 2009. That event, and future events posted by michellelewis, absolutely belong on the calendar.

I only discovered this mistake by reviewing the lists of new and trusted contributors. In the existing version of the system, I’m the only one who can do that. But in the new version, everyone can. More eyeballs, fewer bugs.

Even more interesting, to me, is the notion of developing and applying policy-driven business logic in a transparent way. Of course business processes can’t always work that way. But the default, now, is that none do. Sometimes, maybe more often than we imagine, we could flip that default. It would be an interesting experiment to try.

In my last entry, I sketched a strategy for maintaining lists of the Eventful and Flickr accounts that I consider trusted sources for the elmcity.info event and photo streams. I didn’t spell out exactly how I plan to maintain those lists, in the Azure rewrite of the service that I’m now doing, but David Hochman read my mind:

It sure would be interesting to syndicate those lists from a trusted del.icio.us feed, leveraging tags as a public data store, and allowing others to trust your trusted lists.

It sure would. And that’s just what I’m doing.

Part One: The User’s View

Here’s the del.icio.us account:

delicious.com/elmcity

Here are the trusted ICS feeds:

elmcity/trusted+ics+feed

Here are the trusted Eventful contributors:

elmcity/trusted+eventful+contributor

Here are the new Eventful contributors — that is, ones I’ve not yet marked as trusted:

elmcity/new+eventful+contributor

This is wildly convenient in several ways. For starters, I get a feed of new Eventful contributors for free:

feeds.delicious.com/v2/rss/elmcity/eventful+new+contributor

Anyone who subscribes to that feed is alerted to the appearance of a previously-unseen contributor of events within 15 miles of Keene. Here’s one:

eventful.com/users/jheslin

Clicking that link reveals that jheslin has created one venue, but so far no events. That’s not enough evidence on which to base a trust/no-trust decision. So what I’d do, in that case, is just delete the del.icio.us bookmark. If the aggregator sees another event from jheslin, he (or she) will show up again in the feed. In that case, if jheslin has created events that look legitimate, I can decide to trust him (or her). How? Trivially, by editing the bookmark and changing the new tag to trusted.

That’s easy enough, but I don’t want to be forever responsible for monitoring this feed and making trust decisions. And thankfully I needn’t be. When I delegate that job to somebody else, I’ll just need to transfer the credentials to the del.icio.us/elmcity account, and explain what it means for an Eventful account to be bookmarked at del.icio.us/elmcity with a new or trusted tag, and how to decide when to promote an Eventful account from new to trusted.

The same technique can apply to other account-based event sources — for example, upcoming.org. It also applies to feed-based sources. I’ve been encouraging event publishers in Keene to create iCalendar feeds. Those feeds have URLs, and to include them in the aggregation, somebody just needs to bookmark them under the elmcity account with the tags trusted and ics and feed. Like this.

Same for new and trusted Flickr accounts that feed the photos page, for blogs that feed the blog directory, and for any other class of resource that might be contributed.
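The tagging scheme amounts to a tiny data model: each bookmark is a URL plus a set of tags, a query is a set of tags that must all be present, and promotion is a one-tag edit. Here is a sketch of that model in Python. The function names are made up for illustration, and the model assumes (as the differing tag orders in the URLs above suggest) that tag order does not matter.

```python
def with_tags(bookmarks, *tags):
    """Return the URLs whose tag set includes every requested tag."""
    wanted = set(tags)
    return sorted(url for url, have in bookmarks.items() if wanted <= have)

def promote(bookmarks, url):
    """Replace the 'new' tag with 'trusted' on one bookmark."""
    bookmarks[url] = (bookmarks[url] - {"new"}) | {"trusted"}

# Sample data: two Eventful contributors, one new and one trusted.
bookmarks = {
    "eventful.com/users/jheslin/created/events": {"new", "eventful", "contributor"},
    "eventful.com/users/judell/created/events": {"trusted", "eventful", "contributor"},
}
```

With this data, `with_tags(bookmarks, "new", "eventful", "contributor")` lists only the jheslin bookmark until `promote` is called on it.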

Part Two: The Developer’s View

Notice that I haven’t had to write any Web forms, any Ajax code, any database CRUD (create/read/update/delete) logic. Del.icio.us, a database with a Web user interface, takes care of all that. Which is fine by me, because life’s too short to write any more CRUD or Web UI than I have to. I’d rather do more interesting things.

By the same token, life’s too short to write more than a few lines of code to drive the CRUD apparatus. As I mentioned last time, I’m writing the core of the Azure event aggregator in C# rather than Python, because IronPython isn’t yet ready for prime time on Azure. I worried that a C# implementation would be too verbose, but I’ve been pleasantly surprised.

Here’s a C# method that reads a del.icio.us RSS feed and returns a dictionary (aka hashtable, aka associative array) of titles and links:

00 const string rssbase = "http://feeds.delicious.com/v2/rss/elmcity";

01 public static Dictionary<string,string> get_delicious_feed(string args)
02  {
03  var dict = new Dictionary<string,string>();
04  string url = String.Format("{0}/{1}", rssbase, args);
05  var response = Utils.FetchUrl(url);
06  var xdoc = Utils.xdoc_from_xml_bytes(response.data);
07  var items = from item in xdoc.Descendants("item")
08  select new { Title = item.Element("title").Value,
09     Link = item.Element("link").Value, };
10  foreach (var item in items)
11    dict[item.Link] = item.Title;
12  return dict;
13  }

The Python equivalent is more concise, but not by much. I am, admittedly, deferring any discussion of the Utils class which I’m using to make the .NET Framework’s HttpWebRequest/HttpWebResponse classes feel more Pythonic to me.

Also noteworthy here is the use of the generic collection class, Dictionary (lines 3, 11, 12), instead of the more Pythonic (and Java-like) Hashtable. I’ll also defer discussion of tradeoffs between Dictionary and Hashtable until I’ve learned more about them.

Finally, I’ll defer discussion of the LINQ-to-XML idioms (lines 6-10) until I’ve learned more about the tradeoffs between LINQ-to-XML and the XPath style which I’m more familiar with, and which is more widely available.

For now, I’ll just observe that this C# method is readable, debuggable, and Azure-deployable.
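For comparison, here is a rough Python counterpart of the C# method. It is a sketch in present-day Python 3 with the standard library, not the code running at elmcity.info. The helper name links_and_titles is an assumption; it separates parsing from fetching so that the parsing can be exercised without a network connection.

```python
import urllib.request
import xml.etree.ElementTree as ET

RSS_BASE = "http://feeds.delicious.com/v2/rss/elmcity"

def links_and_titles(rss_bytes):
    """Map each RSS item's link to its title, like the C# dictionary."""
    root = ET.fromstring(rss_bytes)
    return {item.findtext("link"): item.findtext("title")
            for item in root.iter("item")}

def get_delicious_feed(args):
    """Fetch and parse the feed for a tag expression such as 'trusted+ics+feed'."""
    url = "%s/%s" % (RSS_BASE, args)
    with urllib.request.urlopen(url) as response:
        return links_and_titles(response.read())
```

The dictionary comprehension plays the role of the LINQ query plus the foreach loop in the C# version.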

Here are some of the ways the above method will be used in the service:

get_delicious_feed("trusted+feed+ics")
get_delicious_feed("trusted+eventful+contributor")
get_delicious_feed("new+flickr+contributor")

For example, here’s the method that the aggregator uses to check whether or not to include an Eventful event contributed by a given Eventful account:

01 public static bool isTrustedEventfulContributor(string accountname)
02  {
03  var dict = get_delicious_feed("trusted+eventful+contributor");
04  var re = new Regex(@"eventful\.com/users/([^/]+)/created/events");
05  return match_url(dict, re, accountname);
06  }

The regular expression at line 4 matches URLs like this:

eventful.com/users/judell/created/events

If you check the corresponding Eventful page you’ll see why the aggregator posts bookmarks with addresses in this format. That way, the human who’s monitoring the feed can easily click through to eyeball the events created by a new user whose legitimacy needs to be checked.

To see how isTrustedEventfulContributor makes its yes/no determination, we need to unpack the match_url method. Here’s the first version I wrote:

private static bool match_url(Dictionary<string,string> dict,
  Regex re, string url)
  {
  bool isTrusted = false;
  Match m;
  foreach (string key in dict.Keys)
    {
    m = re.Match(key);
    if (m.Groups[1].Value == url)
      {
      isTrusted = true;
      break;
      }
    }
    return isTrusted;
  }

This worked, but didn’t have the concise, functional, Pythonic feel that I like. So I went back to the drawing board and came up with another version:

private static bool match_url(Dictionary<string,string> dict,
  Regex re, string url)
  {
  var keys = dict.Keys.ToList();
  var matched = keys.FindAll(x => re.Match(x).Groups[1].Value == url);
  return matched.Count == 1;
  }

This works identically, and it’s much closer to what I’d do in Python: Filter a list using a lambda expression.
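For reference, a Python rendering of the same idea might look like the sketch below. The names are illustrative, and it uses any() over a generator rather than counting filtered matches, so it returns True when one or more bookmarked links name the account. Group 1 is the captured account name; group 0 would be the whole matched URL fragment.

```python
import re

EVENTFUL_URL = re.compile(r"eventful\.com/users/([^/]+)/created/events")

def match_url(links, pattern, account):
    """True if any bookmarked link names the given account."""
    matches = (pattern.search(link) for link in links)
    return any(m is not None and m.group(1) == account for m in matches)

def is_trusted_eventful_contributor(feed, account):
    """feed maps bookmark links to titles, as get_delicious_feed returns."""
    return match_url(feed, EVENTFUL_URL, account)
```

Iterating over a dictionary yields its keys, so the feed dictionary can be passed in directly.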

Part Three: Conclusion

If you’re not a programmer (and in particular, a programmer who would be interested in Azure, or in a comparison between C# and Python), your eyes glazed over when you got to part two. That’s fine. There’s still an important takeaway for you. Del.icio.us (and any del.icio.us-like service) is a database! You can use it, without doing any programming, to maintain lists of arbitrary sets of resources that can be queried and edited, with equal ease, by humans and by programs.

Whatever you can identify with a URL is fair game. You can invent your own simple business logic by defining rules for what tags to use, and when and how to change them. You can monitor RSS feeds, in any feedreader, in order to be alerted when monitored items change. You can share or delegate the work by sharing or delegating access to the del.icio.us account. And last but not least, when you need to get a programmer to make use of this database you and your collaborators have built, that person’s job will be drop-dead simple.

If you check the elmcity.info events page for March 7, 2008 you’ll see that Beau Bristow is performing at Keene State College at 8PM. The Eventful item that has syndicated to the events page doesn’t say anything else. There’s no link to beaubristow.com, though it’s easy enough to find. And there’s no more precise venue than Keene State College, though that’ll be easy enough to find as well, when the time comes.

But the item carries enough information to participate in a (still mostly nascent) network of calendar events. Beau Bristow doesn’t know that his concert shows up at elmcity.info, or that on March 7 it’ll show up at citizenkeene.ning.org and cheshiretv.org. And he shouldn’t need to know. But he ought to be able to take it for granted that events he posts to some kind of syndication source — could be Eventful, could be another public service, could be a personal iCalendar feed — will propagate.

I am particularly fascinated by the lightweight, ad-hoc interaction between Eventful, Beau Bristow, and elmcity.info. This lightness is a powerful enabler. If you’re Beau, and you need to promote 18 events in 18 towns, some of which you may only visit once in your career, you don’t have time — and can’t pay for the help — to build relationships in all those places. But you can assert that you’ll be in those places, on specified dates, doing a specified thing. And under the right circumstances, that’s enough.

The question I’ve been exploring is how to create those circumstances. One aspect of the answer, and the one I want to focus on here, is trusted feeds.

Originally, at elmcity.info, any Flickr photo mentioning “Keene NH” showed up in the photo stream, and any Eventful event located within 15 miles of the center of Keene showed up in the event stream. That arrangement was clearly open to abuse. Even though Flickr and Eventful try to take responsibility for their stuff, my aggregator had to take more responsibility for the subsets of their stuff it manages. So I created two lists of trusted contributors. One is a list of Flickr account names, and the other is a list of Eventful account names.

When the aggregator runs, a couple of times a day, it puts previously-unseen account names into a holding tank and writes those names to RSS feeds which I monitor here and here.
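The triage step can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the real aggregator also writes the newly held names to RSS feeds, which this sketch leaves out.

```python
def triage(accounts_seen, trusted, holding):
    """Split the accounts seen in one aggregator run.

    Returns (publish, newly_held): accounts whose items may be published,
    and previously-unseen accounts just added to the holding tank.
    The holding set is updated in place.
    """
    publish, newly_held = [], []
    for account in accounts_seen:
        if account in trusted:
            publish.append(account)
        elif account not in holding:
            holding.add(account)
            newly_held.append(account)
    return publish, newly_held
```

Accounts already in the holding tank are neither published nor reported again, so the monitoring feed only announces each newcomer once.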

Yesterday I found Dan York in the Flickr holding tank, and Beau Bristow in the Eventful holding tank. I happen to know Dan, but even if I didn’t, it only takes a minute to judge that his Flickr portfolio is legitimate. I don’t know Beau, but again it’s easy to determine that his Eventful presence is legitimate. So I marked both accounts as trusted, and today their contributions appear on the site.

If a trusted account ever abuses that trust, it’s easily revoked.

When I tell folks about this model of event syndication, they sooner or later realize that it’s an invitation to spam and ask about that. My answer is trusted feeds. It would be impossible to moderate every event flowing through your network. But it’s easy to moderate a much smaller number of event sources.

For about a week now, I’ve been running a service in the Azure cloud that aggregates calendar events from Eventful.com and from a diverse set of iCalendar feeds. As I mentioned last month, my aim is to recreate and then extend my experimental elmcity.info community information hub, while exploring and documenting the evolution of Azure and the layered services emerging on top of it.

I haven’t written a whole lot about programming here for a while, because I’ve been trying to explain the whys and wherefores of syndication-oriented communication to a wider audience. But as I build out this service I’m learning a lot about cloud-based software development in general, and about Azure in particular, and I want to narrate this work. I’ll try to do it in a way that will inform developers who currently use Microsoft tools and technologies, as well as those who don’t. But I’ll also try to be accessible to folks who don’t write software, yet would like to learn something about the opportunities that cloud computing is creating as well as the challenges it poses.

The service, as it currently exists, is running as an Azure worker role. That means it does input, processing, and output, but presents no user interface. The inputs are Eventful.com, accessed by way of its API, and a growing set of public iCalendar feeds. The processing involves reading calendar events and normalizing them to a common intermediate format. The output is currently XML to the Azure blob store, one file for Eventful and another for the iCalendar feeds.
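The processing and output steps can be illustrated with a short Python sketch. The field names (title, start, source) and the XML element names are assumptions chosen for the example; the intermediate format actually used by the service is not described here.

```python
import xml.etree.ElementTree as ET

FIELDS = ("title", "start", "source")

def normalize(title, start, source):
    """Represent one event in a common intermediate form (assumed fields)."""
    return {"title": title.strip(), "start": start, "source": source}

def to_xml(events):
    """Serialize a list of normalized events to an XML string."""
    root = ET.Element("events")
    for event in events:
        node = ET.SubElement(root, "event")
        for field in FIELDS:
            ET.SubElement(node, field).text = event[field]
    return ET.tostring(root, encoding="unicode")
```

Whatever the source, Eventful's API or an iCalendar feed, each event passes through normalize before it is written, so the output stage never needs to know where an event came from.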

I’m only allocating one instance of this worker process, and that’s probably enough horsepower for any single community’s events. But I’d like to be able to scale out the aggregator to serve other communities as well, potentially many others. Turning up the dial to do that would be a nice illustration — and test — of the cloud computing fabric.

The existing aggregator at elmcity.info is written in Python, and my original plan was to port it with minimal change to IronPython on Azure. That didn’t work out because, although bare-bones IronPython code runs on Azure as I show here, you quickly run into restrictions imposed by Azure’s security sandbox. The trust policy, defined here, is based on a feature of the .NET platform known as code access security (CAS).

When you upload code to the Azure cloud, or run it in the local development fabric, the hosting environment only partly trusts your code, and also only partly trusts any components used by your code. This is part of a layered, defense-in-depth security strategy, prudent for the same reason that it’s prudent to run your own computer as a partly-trusted user instead of an all-powerful administrator. It is also problematic for the same reason. A lot of Windows applications used to require administrative privilege in order to run properly, and some — though fewer month by month — still do. Similarly, a lot of .NET components that run happily in the fully-trusted environment of your local computer won’t run in Azure’s medium-trust environment, or (what’s nearly equivalent) in Internet Information Server 7 (IIS 7) when its security mode is set to medium trust.

I am no expert on the subject of code access security, but here’s what I think:

  • The medium-trust policy is probably a good thing.
  • It does, however, impede instant gratification when you’re mixing components from various sources.
  • But that impedance will diminish as more component builders adopt the good practice of not making their components unnecessarily require full trust.

I think that IronPython is likely to become such a component, once the dust settles from the recent 2.0 release. (If you care about this issue, you can vote up its priority.) Meanwhile I’ve been working in C#, which has been a fascinating experience. On the one hand, I believe that dynamic languages like Python are an excellent choice for agile development everywhere, and especially in the fluid environment of the cloud. On the other hand, I’m not a language bigot and have always appreciated the virtues of statically-typed languages.

My basic philosophy has always been to use a mix of best-of-breed tools in order to gain maximum leverage. The combination of IronPython and C#, on the .NET platform, is a really powerful one, for the same reason that the Jython/Java combo is. On this project, even though I am not yet deploying any code written in IronPython, I often use IronPython to test C# components that I’ve written or acquired.

Along the way, I’ve been recalling something IronPython’s creator, Jim Hugunin, said at the Professional Developers Conference back in October. Jim’s talk followed one by Anders Hejlsberg, the creator of C#. Anders showed an experimental future version of C# that makes use of the Dynamic Language Runtime which supports IronPython and IronRuby on .NET. The effect was to create an island of dynamic typing within C#’s otherwise statically-typed world. We all appreciated the delicious irony of a static type called ‘dynamic’.

Jim might have sounded a bit wistful when he said: “I’m not sure what a dynamic language is any more.” But I think this blurring of boundaries is a wonderful thing. Many smart people I deeply respect value the static typing of C#. Some of the same smart people, and many different ones, value the dynamic typing in languages like Ruby and Python. If I can leverage the union of what all of those smart people find valuable, I’ll happily do so.

I’ll have more to say about this project, and of course code to share, as things evolve. Meanwhile, though, I want to acknowledge Doug Day at DDay Software. When I switched from Python to C#, the key component I needed was an iCalendar module equivalent to MaxM’s excellent Python iCalendar module, which I’m using at elmcity.info. Doug’s DDay.iCal met the need. It’s a solid, cleanly-built, open source .NET component that enables code written in any of the .NET family of languages to parse, and generate, iCalendar (RFC 2445) files.

And now back to the project, which reminds me of the era at BYTE during which I got to build stuff while writing about what I was building. It’s great fun. And as John Leeke so eloquently says, it engages the mind, the hands, and the heart.

Information technologists often recite David Wheeler’s famous aphorism:

Any problem in computer science can be solved with another layer of indirection.

Often, though, they omit the corollary:

But that usually will create another problem.

Those problems used to plague only IT folk. But now we’re all involved. Effective social information management is quite severely constrained by the fact that regular folks are not (yet) taught the basics of computational thinking.

For example, when I explain my community calendar project to prospective contributors, they invariably assume that I’m asking them to enter their data into my database. It’s quite hard to convey: that the site isn’t a database of events, only a coordinator of event feeds; that I’m only asking them to create feeds and give me pointers to their feeds; that this arrangement empowers them to control their information and materialize it in contexts other than the one I’m creating.

I’m having some success explaining this model, but it’s slow going. People don’t take naturally to the indirection and abstraction.

Here’s another example. I know various folks who are trying to create online resource directories of one kind or another. I’ve identified a pattern, which I call collaborative list curation, that is an ideal way to solve this problem. Consider this directory of blogs for the Monadnock region. It looks like any other such directory, but it’s made differently. Again, there is no explicit database. Entries come from the del.icio.us tag delicious.com/judell/monadnockblog — a personal collection whose items are, currently, the same as those in the global collection delicious.com/tag/monadnockblog.

I’m subscribed to the global collection at feeds.delicious.com/v2/rss/tag/monadnockblog which means I can monitor it for new items, vet them, and transfer those I want to include to my personal collection. If I wanted to delegate that editorial control, I would point my directory-making service at the del.icio.us account of a trusted associate and have it camp on that account’s monadnockblog tag instead of (or in addition to) my own.
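The vetting step boils down to a set difference: whatever appears in the global collection but not yet in the personal one is waiting for review. Here is a sketch in Python; the function name and the optional list of rejected links are assumptions added for illustration.

```python
def pending_review(global_links, personal_links, rejected=()):
    """Links in the global tag collection not yet vetted into the personal one."""
    skip = set(personal_links) | set(rejected)
    return [link for link in global_links if link not in skip]
```

Delegating editorial control then means nothing more than passing a different account's links as personal_links.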

Of course this is all way too indirect for any normal person to grok, which is why nothing has been added to the global collection. Even many IT-savvy folks, I’m finding, don’t take naturally to this model.

That said, I’m finding that once I can get people to walk through one of these experiences, and see the connection — OK, I do this over here, and that happens over there, and it can also happen somewhere else, and I’m in control — the light bulb does go on.

Now we need to take forward-thinking evangelists like me out of the loop, and get people to discover for themselves how to wire the web. If Live Clipboard didn’t exist, we’d have to invent it. Oh wait. It doesn’t, and we do.

In July 1995 I wrote a column in BYTE with the same title as this blog post. It began:

One day this spring, an HTTP request popped out the back of my old Swan 386/25, rattled through our LAN, jumped across an X.25 link to BIX, negotiated its way through three major carriers and a dozen hosts, and made a final hop over a PPP link to its rendezvous with BYTE’s newborn Web server, an Alpha AXP 150 located just 2 feet from the Swan.

Thus began the project on which this column will report monthly. Its mission: To engage BYTE in direct electronic communication with the world, retool our content for digital deployment, and showcase emerging products, technologies, and ideas vital to these tasks. We don’t have all the answers yet — far from it. But we’re starting to learn how a company can provide and use Internet services in a safe, effective, maintainable, and profitable way.

Today I felt that same kind of excitement when I clicked on this URL:

http://elmcity.cloudapp.net

There isn’t much to see. But what happens behind the scenes is quite interesting to me. The URL hits a deployment in the Azure cloud where I’m hosting an IronPython runtime. Then it invokes that runtime on a file that contains this little Python program:

hello = "Hello world"

Finally, it gets back an object representing the result of that program, extracts the value of the hello variable, and pops it into the textbox.
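The same run-then-extract pattern can be shown in plain Python. This is an analogy only: it uses Python's built-in exec, not the .NET hosting interface that the Azure deployment uses, and the function name is made up for the example.

```python
def run_and_extract(source, name):
    """Run Python source in a fresh namespace and return one variable from it."""
    scope = {}
    exec(source, scope)   # only safe for source you wrote yourself
    return scope[name]

greeting = run_and_extract('hello = "Hello world"', "hello")
```

Here greeting ends up holding the string that the little program assigned to hello.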

This is the proof of concept I’ve been looking for. Now I can begin an experiment I’ve been wanting to do for a long time. I have an ongoing personal project, elmcity.info, about which I’ve written from time to time. It’s hosted at http://bluehost.com, it’s written in Python using Django, and it’s invoked by way of FastCGI.

Back in the BYTE era, I loved learning about the web by building out a live project, and explaining what I learned step by step. Now I want to explore, and document, what it’s like to build out another live project in the Azure cloud.

Could I do it in Amazon’s cloud? Sure. In fact I already did, as an experiment. And if it were cheaper to run there than on Bluehost, I’d currently be hosting elmcity.info on EC2 instead.

Could I do it in Google’s cloud? Not sure. I didn’t score an account there and can’t yet try. The interactive pieces of my application should slide nicely into AppEngine’s Django framework. But much of the work is done in long-running processes which I believe AppEngine doesn’t yet support.

In any case, it’s obvious why I’ll be focusing on Azure. I suspect, though, that my focus will be different than most. I’m not a hotshot .NET developer, just an average guy who can get some useful things done in environments that enable me to create small, simple, understandable programs, and do so in agile and dynamic ways. I think that Azure — admittedly nascent in its current form, as Ray Ozzie said at the PDC — can be such an environment. Let’s find out.

Next Thursday is World Usability Day, a distributed event that will happen in lots of places. One of them is Putney, Vermont, not far from my home, where I’ll be speaking at the New England venue, Landmark College.

The program says:

A description of Jon’s talk is forthcoming, but we’ve asked him to help the audience further their thinking about the potential of video on the web in support of teaching and learning, as well as the importance of the structure behind the information with which we all work, exemplified by his work on compiling disparate web resources, as in his work on Keene-related events culled from the internet and viewable at elmcity.info/events.

Great suggestions! Video and structured data are very different domains. That creates a nice opportunity to talk about key underlying principles, and relate them to the practices of teaching and learning. So, here’s the blurb.

Title: Teaching users to be more usable teachers

Description:
Technologists and designers, including those who self-identify as usability professionals, think of themselves as creators of products and services for “the user” or “the consumer”. But as Eric von Hippel argues in Democratizing Innovation, producers and consumers are not, and never have been, distinct groups. At various times and in various contexts, we are all producers and consumers, teachers and learners, co-creators of products, services, experience, and knowledge.

We learn by imitating how good teachers think and act. Conversely, good teachers think and act in ways that inspire and reward imitation. In the era of peer production on peer networks, we can all be better teachers, more usable teachers, by thinking and behaving in ways that others can imitate easily and effectively. From this perspective, online video and structured data aren’t just new ways to distribute entertainment and information. They’re new environments for teaching and learning. Engineers and designers aren’t solely responsible for making these environments usable. We, the inhabitants, must make ourselves usable too.

This is going to be fun!

Paul Pival noticed a problem with the browser widget I made the other day to search Google and Live side-by-side. The service invoked by that widget, at dualsearch.atsites.net, fails when your query contains double-quoted phrases.

It’s an easy fix as I’ll demonstrate here. There are three ingredients:

  • An itty-bitty web application
  • A simple XML file
  • An even simpler HTML/JavaScript file

Let’s examine them.

1. The web application just receives a query, URL-encodes it, and interpolates it into the template for a web page that invokes the two search engines in side-by-side frames.

There are a million ways to do that. Here’s a Python/Django implementation:

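from django.http import HttpResponse

def doublesearch(request):
  import urllib
  q = request.GET['q']
  # URL-encode the query so double-quoted phrases survive interpolation
  q = urllib.quote(q)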
  template = """<html>
<frameset cols="*,*" frameborder="no">
  <frame src="http://www.google.com/search?q=__QUERY__" />
  <frame src="http://search.msn.com/results.aspx?q=__QUERY__" />
</frameset>
</html>"""
  html = template.replace('__QUERY__',q)
  return HttpResponse(html)
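The reason the encoding step fixes the double-quote problem is easy to see in isolation: a raw double quote inside the query would end the src="..." attribute early, while the encoded form %22 does not. Here is a small sketch using Python 3's urllib.parse (the view above uses the Python 2 spelling, urllib.quote). The function name search_url is made up for the example.

```python
from urllib.parse import quote

TEMPLATE = "http://www.google.com/search?q=__QUERY__"

def search_url(query):
    """Interpolate a URL-encoded query into the search URL template."""
    return TEMPLATE.replace("__QUERY__", quote(query))

# A query containing a double-quoted phrase
url = search_url('"elm city" keene')
# url == 'http://www.google.com/search?q=%22elm%20city%22%20keene'
```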

2. The XML file contains an OpenSearch description that invokes that little web application, passing it the query that you type into your browser’s search box. Here’s an example that uses a sample service I’ve located at my elmcity.info site:

<?xml version="1.0" encoding="UTF-8" ?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>DoubleSearch</ShortName>
<Description>DoubleSearch provider</Description>
<Image height="16" width="16" type="image/x-icon">http://jonudell.net/img/doublesearch.ico</Image>
<InputEncoding>UTF-8</InputEncoding>
<Url type="text/html"
  template="http://elmcity.info/doublesearch/?q={searchTerms}" />
</OpenSearchDescription>

3. Finally, here’s the HTML file that encapsulates the snippet of JavaScript that installs the OpenSearch widget into your browser:

<a href="javascript:window.external.AddSearchProvider
 ('http://jonudell.net/doublesearch.xml')">Add</a> the
 DoubleSearch provider.

You can Add the DoubleSearch widget and try it for yourself. Unlike other variants I’ve found, this one doesn’t wrap any cruft around the side-by-side results. It simply presents them.

As I mentioned the other day, I’m finding that combining the top 10 results from both engines makes for a more useful set of 20 results than taking the top 20 from either.

With today’s wider screens, placing the two result frames side-by-side works out pretty well. In this mode, however, you’ll want to avoid clicking through directly on a result. Instead, right-click on the result and open it in a new tab.

In a recent series of items I discussed ways of turning an Internet data feed into a video crawl for use on a local public access cable television channel. In the last installment the solution had evolved into an IronPython script that fetches the data, writes XAML code to animate the crawl, and runs that XAML as a fullscreen WPF (Windows Presentation Foundation) application.

This week we finally got a chance to try out the live feed, and we didn’t like what we saw. For starters, the animation was jerky. The PC that became available for this project is an older box running Windows XP. I installed .NET Framework 3.0 on the box, and it now supports WPF apps, but not with the graphics acceleration needed for smooth scrolling.

Even with the smooth scrolling that we see on my laptop, though, it wasn’t quite right. This application displays a long list of events, and it’s going to grow even longer. We decided that a paginated display would be better, so I went back to the drawing board.

We’re happy with the result. It displays pages like so:

                  Community Calendar

 06:30 PM open/lap swim  (ymca) 

 07:00 PM Caregiving for Individuals with
   Dementia (unh coop extension) 

 07:00 PM Vicky Cristina Barcelona (colonial
   theatre) 

 07:30 PM Faculty Recital-Jazz (eventful: Redfern
  Arts Center) 

 events from http://elmcity.info    page 9 of 12

Pages fade in, display for 8 seconds, then fade out. There are a million ways to do this, but since I was already exploring IronPython, XAML, and WPF I decided to remix those ingredients. For my own future reference, and for anyone else heading down the same path, here are some notes on what I learned. As always, I welcome suggestions and corrections. I’m still a XAML beginner, and will be very interested to learn about alternative approaches.

The approach I take here is clearly influenced by my own past experience doing web development using dynamic languages. There’s no C# code, no compilation, no Visual Studio. The solution is minimal in the way I strongly prefer for simple projects: a single IronPython script that depends only on IronPython and .NET Framework 3.0.

When developing for the web, I typically build an HTML/JavaScript mockup, view it in a browser, and then consider how to generate that HTML and JavaScript. Here, XAML is the HTML, and a XAML viewer is the browser. The conventional XAML viewer that comes with the Windows SDK is called XAMLPad, but it’s a beefier tool than I needed for this purpose, so I wound up using the more minimal XamlHack.

I started with the contents of a single page:

<Canvas ClipToBounds="True" Background="Black"
  Width="800" Height="600">

<TextBlock x:Name="page1" Canvas.Top="0" Canvas.Left="20"
  Foreground="#FFFFFF" FontSize="36" FontFamily="Arial"
  xml:space="preserve">

<![CDATA[
 06:30 PM open/lap swim  (ymca) 

 07:00 PM Caregiving for Individuals with
   Dementia (unh coop extension)
]]>

</TextBlock>
</Canvas>

I found that text formatting isn’t WPF’s strong suit, so I’m using the XAML equivalent of an HTML <pre> tag to display text that’s preformatted in IronPython.

Next, I added the fade-in and fade-out effects:

<Canvas ClipToBounds="True" Background="Black"
  Width="800" Height="600">

<TextBlock x:Name="page1" Canvas.Top="0" ...>

<TextBlock.Triggers>
<EventTrigger RoutedEvent="FrameworkElement.Loaded">
  <BeginStoryboard>
    <Storyboard>
     <!-- 1 sec fade in -->
     <DoubleAnimation
      BeginTime="0:0:0"
      Storyboard.TargetName="page1"
      Storyboard.TargetProperty="Opacity"
       From="0" To="1" Duration="0:0:1" />
     <!-- wait 8 sec, then 1 sec fade out -->
     <DoubleAnimation
      BeginTime="0:0:9"
      Storyboard.TargetName="page1"
      Storyboard.TargetProperty="Opacity"
       From="1" To="0" Duration="0:0:1" />
    </Storyboard>
  </BeginStoryboard>
</EventTrigger>
</TextBlock.Triggers>

</TextBlock>
</Canvas>

I thought it would be possible to chain together a series of these animations, and nest that series inside another animation in order to create the infinite loop that’s required. There may be a way to do that in XAML, but I didn’t find it. So, since I was already planning to generate the XAML — in order to interpolate current event data, plus a variety of attribute values — I went with a generator that produces a series of these pages. That solved chaining, but not looping. To make the sequence loop, I added a second timer/event-handler pair to the IronPython script. The first handler reloads the data once a day. The second handler reloads the XAML at intervals computed according to the number of pages for each day, thus looping the animation.

Next I added XAML elements for the header and footer. The header is static, but the footer has a dynamic page counter so I animated it in the same way as the page.

Next I made templates for all the XAML elements. Here’s the footer template:

template_footer = """<Label x:Name="footer___FOOTER_PAGE_NUM___"
  Canvas.Top="___FOOTER_CANVAS_TOP___"
  Canvas.Left="___FOOTER_CANVAS_LEFT___"
  Foreground="#FFFFFF" xml:space="preserve"
  FontSize="___FOOTER_FONTSIZE___" FontFamily="Arial" Opacity="0">
           page ___FOOTER_PAGE_NUM___ of ___FOOTER_PAGE_COUNT___
<Label.Triggers>
<EventTrigger RoutedEvent="FrameworkElement.Loaded">
  <BeginStoryboard>
    <Storyboard>
     <DoubleAnimation
      BeginTime="___BEGIN_FADE_IN___"
      Storyboard.TargetName="footer___FOOTER_PAGE_NUM___"
      Storyboard.TargetProperty="Opacity"
       From="0" To="1" Duration="___FADE_DURATION___"  />
     <DoubleAnimation
      BeginTime="___BEGIN_FADE_OUT___"
      Storyboard.TargetName="footer___FOOTER_PAGE_NUM___"
      Storyboard.TargetProperty="Opacity"
       From="1" To="0" Duration="___FADE_DURATION___"  />
     </Storyboard>
  </BeginStoryboard>
</EventTrigger>
</Label.Triggers>
</Label>
"""

The script uses variables that correspond to the uppercase triple-underscore-bracketed names. So, for example:

___FOOTER_CANVAS_TOP___ = 520
___FOOTER_CANVAS_LEFT___ = 10
___FOOTER_FONTSIZE___ = 28

To avoid typing all these names twice in order to interpolate variables into the template, I cheated by defining this pair of Python functions:

import re

def isspecial(key):
  # True for names bracketed by triple underscores, e.g. ___FOOTER_FONTSIZE___
  return re.match('^___.+___$',key) is not None

def interpolate(localdict,template):
  # Replace each ___NAME___ in the template with the caller's value for it
  for key in filter(isspecial,localdict.keys()):
    template = template.replace(key,str(localdict[key]))
  return template
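
To see the locals() trick in isolation, here’s a small self-contained sketch. It restates the two helpers using a plain string replace, and the demo_footer function and its one-line template are hypothetical, made up just for this illustration.

```python
import re

def isspecial(key):
    # Names bracketed by triple underscores are treated as template slots
    return re.match(r'^___.+___$', key) is not None

def interpolate(localdict, template):
    # Substitute every special name found in the caller's local variables
    for key in filter(isspecial, list(localdict.keys())):
        template = template.replace(key, str(localdict[key]))
    return template

def demo_footer(page_num, page_count):
    # Hypothetical example: local names double as the template placeholders
    ___FOOTER_PAGE_NUM___ = page_num
    ___FOOTER_PAGE_COUNT___ = page_count
    template = 'page ___FOOTER_PAGE_NUM___ of ___FOOTER_PAGE_COUNT___'
    return interpolate(locals(), template)

print(demo_footer(9, 12))   # page 9 of 12
```

The point of the naming convention is that each placeholder is typed once as a variable and once in the template, with no separate mapping to maintain.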

Given that setup, here’s the core of the XAML generator:

def create_xaml(raw_text,watch_time,fade_duration):

  ___TITLE_TEXT___ = 'Community Calendar'
  ___BODY_TEXT___ = ''
  ___BODIES_AND_FOOTERS___ = ''
  ___BODY_NUM___ = 0
  ___BODY_CANVAS_TOP___ = 0
  ___BODY_CANVAS_LEFT___ = 20
  ___BODY_FONTSIZE___ = 36
  ___TITLE_CANVAS_TOP___ = -30
  ___TITLE_CANVAS_LEFT___ = 200
  ___TITLE_FONTSIZE___ = 34
  ___FOOTER_CANVAS_TOP___ = 520
  ___FOOTER_CANVAS_LEFT___ = 10
  ___FOOTER_FONTSIZE___ = 28
  ___FOOTER_PAGE_COUNT___ = 0
  ___FOOTER_PAGE_NUM___ = 0
  ___BEGIN_FADE_IN___ = ''
  ___BEGIN_FADE_OUT___ = ''
  ___FADE_DURATION___ = ''

  pagecount = 0
  for page in page_iterator(raw_text):
    pagecount += 1
  ___FOOTER_PAGE_COUNT___ = pagecount

  begin_fade_in = 0
  begin_fade_out = begin_fade_in + fade_duration + watch_time

  pagenum = 0

  for page in page_iterator(raw_text):
    pagenum += 1

    ___BODY_TEXT___ = page
    ___BODY_NUM___ = pagenum
    ___FOOTER_PAGE_NUM___ = pagenum
    ___BEGIN_FADE_IN___ = makeMinsSecs(begin_fade_in)
    ___BEGIN_FADE_OUT___ = makeMinsSecs(begin_fade_out)
    ___FADE_DURATION___ = makeMinsSecs(fade_duration)

    body = interpolate(locals(),template_body)

    footer = interpolate(locals(),template_footer)

    ___BODIES_AND_FOOTERS___ += body + footer

    begin_fade_in = begin_fade_out + fade_duration
    begin_fade_out = begin_fade_in + fade_duration + watch_time

  xaml = interpolate(locals(),template_xaml)

  return (pagecount,xaml)
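
The generator leans on two helpers that aren’t shown here, makeMinsSecs and page_iterator. What follows is only a guess at what they might look like, not the versions in the real script: I’m assuming makeMinsSecs turns a count of seconds into the h:m:s form used for BeginTime in the mockup (for example 0:0:9), and that page_iterator splits the text into pages of a fixed number of lines. The lines_per_page parameter is hypothetical.

```python
def makeMinsSecs(total_seconds):
    """Format a second count as an h:m:s string such as '0:0:9' (assumed format)."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return '%d:%d:%d' % (hours, minutes, seconds)

def page_iterator(raw_text, lines_per_page=8):
    """Yield the text one page at a time (assumed: fixed line count per page)."""
    lines = raw_text.splitlines()
    for start in range(0, len(lines), lines_per_page):
        yield '\n'.join(lines[start:start + lines_per_page])

if __name__ == '__main__':
    print(makeMinsSecs(9))                         # 0:0:9
    print(list(page_iterator('a\nb\nc', 2)))       # ['a\nb', 'c']
```

A real page_iterator would probably break pages on event boundaries rather than raw line counts, so treat the splitting rule as a placeholder.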

I guess I could rely less on XAML code generation and exploit IronPython’s ability to dynamically reach into and modify live .NET objects. That would be the WPF analog to JavaScript DOM-tweaking in the web realm. But this works, it’s easy enough to understand, and it’s handy for debugging purposes to have the generated XAML lying around in a file I can easily inspect.

Finally, here’s the core of the application itself:

class CalendarDisplay(Application):

  def load_xaml(self,filename):
    from System.Windows.Markup import XamlReader
    f = FileStream(filename, FileMode.Open)
    try:
      element = XamlReader.Load(f)
    finally:
      f.Close()
    return element

  def loop_handler(self,sender,args):  # reload XAML

    def update_xaml():
      self.window.Content = self.load_xaml(self.xamlfile)

    self.loop_timer.Dispatcher.Invoke(DispatcherPriority.Normal,
      CallTarget0(update_xaml))

  def day_handler(self,sender,args):     # fetch data, generate XAML

    def update_xaml():
      self.pagecount = calendarToXaml(self.path,self.xamlfile,self.url,
        self.cachefile,self.watch_time,self.fade_duration)
      self.window.Content = self.load_xaml(self.xamlfile)

    self.day_timer.Dispatcher.Invoke(DispatcherPriority.Normal,
      CallTarget0(update_xaml))

  def __init__(self):

    Application.__init__(self)

    self.xamlfile = 'display.xaml'
    self.path = '.'
    self.cachefile = 'last.txt'
    self.url = 'http://elmcity.info/events/todayAsText'
    self.watch_time = 8
    self.fade_duration = 1
    self.pagecount = calendarToXaml(self.path,self.xamlfile,self.url,
      self.cachefile,self.watch_time,self.fade_duration)

    self.window = Window()
    self.window.Content = self.load_xaml(self.xamlfile)
    self.window.WindowStyle = WindowStyle.None
    self.window.WindowState = WindowState.Maximized
    self.window.Topmost = True
    self.window.Cursor = Cursors.None
    self.window.Background = Brushes.Black
    self.window.Foreground = Brushes.White
    self.window.Show()

    self.day_timer = DispatcherTimer()
    self.day_timer.Interval = TimeSpan(24, 0, 0)
    self.day_timer.Tick += self.day_handler
    self.day_timer.Start()

    self.loop_timer = DispatcherTimer()
    interval = self.pagecount * (self.watch_time + self.fade_duration*2)
    self.loop_timer.Interval = TimeSpan(0, 0, interval)
    self.loop_timer.Tick += self.loop_handler
    self.loop_timer.Start()

CalendarDisplay().Run()
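
The timing arithmetic is easy to check on its own. This sketch mirrors the begin_fade_in / begin_fade_out bookkeeping in the generator and the interval formula used for the loop timer; the function name page_schedule is hypothetical, and nothing here touches WPF.

```python
def page_schedule(pagecount, watch_time, fade_duration):
    """Return ([(fade_in_start, fade_out_start), ...], loop_seconds)."""
    schedule = []
    begin_fade_in = 0
    for _ in range(pagecount):
        # A page fades in, stays up for watch_time, then starts fading out
        begin_fade_out = begin_fade_in + fade_duration + watch_time
        schedule.append((begin_fade_in, begin_fade_out))
        # The next page starts once this one has finished fading out
        begin_fade_in = begin_fade_out + fade_duration
    loop_seconds = pagecount * (watch_time + fade_duration * 2)
    return schedule, loop_seconds

schedule, loop_seconds = page_schedule(12, 8, 1)
print(schedule[:2], loop_seconds)   # [(0, 9), (10, 19)] 120
```

With 12 pages, an 8-second watch time, and 1-second fades, the last page finishes fading out at exactly 120 seconds, which is when the loop timer reloads the XAML.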

Next month marks the tenth anniversary of RFC 2445 (iCalendar), the specification that describes how Internet applications represent and exchange calendar information. The authors of RFC 2445 were Frank Dawson (now with Nokia) and Derik Stenerson (now with Microsoft). I asked both to join me to reflect on the past, present, and future of this key standard. Only Derik was available, and he’s my guest for this week’s ITConversations show.

If you’ve followed my blog you’ll know that I’ve come to regard the ICS files that iCalendar-aware apps create and consume as feeds that could and should form a syndication ecosystem analogous to the RSS ecosystem. So in addition to filling us in on how iCalendar came to be, Derik considers whether the analogy holds water, and concludes that it probably does.

Although iCalendar has been around for a decade, I argue that the confluence of syndication and personal publishing, in the calendar domain, requires three enablers.

First, you need a workable syndication format, and we have that: RSS for blogs, ICS for calendars.

Second, you need what we used to call one-button personal publishing. Bloggers have had that capability for a long time. Calendar users have it too, but it’s emerged relatively recently, and many aren’t aware of it.

Third, you need feed aggregators. These proliferate in blogspace but, I argue, are conspicuously absent from calendar space. Services like Eventful and Upcoming produce calendar feeds. But because they do not consume them, they don’t encourage individuals and groups to publish feeds, and to think and act in a syndication-oriented way. I’ve prototyped a calendar aggregator at http://elmcity.info/events/, but the category isn’t yet well-established.

If my analysis is correct, one or more well-known services that both consume and produce calendar feeds would unlock the latent potential of iCalendar and help us jumpstart a calendar syndication ecosystem.

Thanks to my calendar syndication project, I’ve gotten intimately familiar with how various calendar programs — including Outlook, Google Calendar, and Apple iCal — handle the entry of recurring events. They all make the task reasonably straightforward, but there’s one vexing problem. There isn’t a way to specify exceptions. My local YMCA, for example, is closed for maintenance during the last week of August. You could enter a “YMCA closed” event for that week, and hope that it gets rendered so that people will understand it to override all the recurring events shown for that week. But that’s not a great workaround.

Really, you’d like to be able to specify exceptions as part of the recurrence rule. To do that in a standard way, that capability would have to be part of the iCalendar standard. And sure enough, it is:

Property Name: EXRULE

Purpose: This property defines a rule or repeating pattern for an
exception to a recurrence set.

Property Name: EXDATE

Purpose: This property defines the list of date/time exceptions for a
recurring calendar component.

But none of the calendar programs I’m familiar with seem to support these features as part of event data entry. Are there others that do? Even if there are, I couldn’t depend on the feature being ubiquitously available to folks who contribute to the calendar network I’m trying to assemble.

The service operates as an iCalendar intermediary, though, so it might be able to inject some exceptions, at least for global EXRULEs like “YMCA closed last week of August”. It’d be harder for event-specific EXRULEs like “Pool closed for maintenance July 22”, which would affect a subset of events, or “Kickboxing class won’t be held Sept 14 or 18”, which would affect a single event.
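
To make the EXDATE idea concrete, here’s a minimal sketch of what applying exception dates to a recurrence amounts to: expand the repeating pattern, then drop any occurrence that appears in the exception list. It uses only Python’s datetime module and handles only a simple weekly rule; the function name, the dates, and the cancelled class are all hypothetical, and a real intermediary would need a proper iCalendar parser.

```python
from datetime import date, timedelta

def weekly_occurrences(start, count, exdates=()):
    """Expand a weekly recurrence, skipping any date listed in exdates."""
    excluded = set(exdates)
    occurrences = []
    for week in range(count):
        day = start + timedelta(weeks=week)
        # EXDATE-style exception: the slot is generated, then removed
        if day not in excluded:
            occurrences.append(day)
    return occurrences

# Hypothetical class: four weekly sessions from 2008-09-01, one cancelled
dates = weekly_occurrences(date(2008, 9, 1), 4, exdates=[date(2008, 9, 15)])
print([d.isoformat() for d in dates])
# ['2008-09-01', '2008-09-08', '2008-09-22']
```

Note that the cancelled date still counts against the four occurrences, so the series yields three sessions rather than extending by a week.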

One of the questions this project has led me to ponder is: Why, after all these years, are calendar programs not used as extensively as it seems they should be? Maybe this is part of the answer. Exceptions to rules are part of the fabric of real life. If the software doesn’t enable people to specify those exceptions, that’s a problem.

Update: Thanks to commenters for pointing out that of course calendar programs enable people to specify exceptions. They just don’t do it the way I expected, i.e. as a continuation of the dialog used to specify the recurrence rule. Instead they do it by enabling you to edit or delete a single event in the series, after the series has been created.

Now I’m curious as to whether my expectation is a geeky aberration that hasn’t affected most people, who have been happily creating exceptions all along, or whether this way of creating exceptions is broadly undiscovered by civilians too.