February 2009
Monthly Archive
February 27, 2009
Posted by Jon Udell under
Uncategorized [10] Comments
Following up on yesterday’s entry, here is an instance of the calendar aggregator for Ann Arbor, Michigan, a town I lived in for a long time and remember fondly: Events in and around Ann Arbor.
It’s controlled by a Delicious account — delicious.com/a2cal — which I’ll happily relinquish to a more appropriate curator.
There are two primary sources of information. First, events posted to Eventful.com at locations within 15 miles of Ann Arbor. Second, Google calendars that turn up in a search for Ann Arbor.
My notion was that this would be a nice way to bootstrap an instance of the aggregator. Not all the Google calendars will be appropriate, and there are of course many other iCalendar feeds that I don’t know about and can’t easily find. But there’s enough here to serve as a proof of concept, and maybe attract the interest of one or more curators. As a curator, you’d do things like:
- Tweak the template and the image. (Josh Band, I cropped your photo just as a placeholder, hope that’s OK.)
- Weed out inappropriate feeds.
- Add new feeds.
- Edit feed titles, and provide url=http://HOMEPAGE tags so that all events link somewhere.
Unfortunately, just as I was gearing up to roll out this approach, the Search Public Calendars feature of Google Calendar went AWOL. (Perhaps, as one commenter suggests, as a security measure.) I had searched out Ann Arbor iCalendar feeds a couple of weeks ago, and saved the list, but that procedure isn’t repeatable now for Ann Arbor or anywhere else.
In any case, I hope this illustrates the idea. One or more curators maintain a list of feeds for a community, and the service aggregates them. If you’d like to play along, create a Delicious account along the lines of delicious.com/elmcity or delicious.com/a2cal and let me know about it.
February 26, 2009
This week my ongoing fascination with Delicious as a user-programmable database took a new turn. Earlier, I showed how I’m using Delicious to enable collaborative curation of the set of feeds that drives an aggregation of community calendars.
The service I’m building in this ongoing series has so far collected calendars only for a single community — mine. But the idea is to scale out so that folks in other communities can use it for their own collections of calendars.
As I refactored the code this week to prepare for that scale-out, I thought about how to manage the configuration data for multiple instances of the aggregator. This is a classic problem, there are a million ways to solve it, and I thought I’d seen them all. But then I had a wacky idea. If I’m already using Delicious to enable community stakeholders to curate the sets of feeds they want to aggregate, why not also use Delicious to enable them to manage the configuration metadata for instances of the aggregator?
Here’s a way to do that. Consider this URL:
http://delicious.com/elmcity/metadata
It refers to an URL that doesn’t actually point to anything — click it and you’ll see that for yourself. So it’s really an URN (Uniform Resource Name) rather than an URL (Uniform Resource Locator).
But even though it doesn’t point to anything, it can still be bookmarked. The owner of the elmcity account on Delicious can click Save a Bookmark and put http://delicious.com/elmcity/metadata into the URL field.
Now you can attach stuff to the bookmark, like so:
Here the title of the bookmark is metadata, and the tags are these strings:
tz=Eastern
title=events+in+and+around+keene
img=http://elmcity.info/media/keene-night-360.jpg
css=http://elmcity.info/css/elmcity.css
contact=judell@mv.com
where=keene+nh
template=http://elmcity.info/media/tmpl/events.tmpl
These strings are, implicitly, name=value pairs. The service that reads this configuration data from Delicious can easily make them into explicit names and values. But how does it find them? By looking up the metadata URL, like so:
delicious.com/url/view?url=http://delicious.com/elmcity/metadata
That request redirects to the special Delicious URL that uniquely identifies the bookmark:
delicious.com/url/9ee9d2e51e4f36d4d49207e1675b3cbb
Of course the service doesn’t want to dig the name=value pairs out of that web page. So instead it reads the page’s RSS feed:
feeds.delicious.com/v2/rss/url/9ee9d2e51e4f36d4d49207e1675b3cbb
To prove that it works, check out this prototype version of the elmcity calendar. That page was built by an Azure service that reads configuration data from the bookmarked URN, and interpolates the name=value pairs into the template specified in the metadata.
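A minimal Python 3 sketch of the parsing and interpolation steps, assuming the feed exposes each tag as an RSS category element whose domain attribute names the account that applied it. The sample XML, the function name metadata_from_rss, and the one-line template are illustrative assumptions, not the service’s actual code or data.

```python
import re
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical sample of the RSS returned for the bookmarked metadata URL.
SAMPLE_RSS = """<rss version="2.0"><channel><item>
<title>metadata</title>
<category domain="http://delicious.com/elmcity/">tz=Eastern</category>
<category domain="http://delicious.com/elmcity/">title=events+in+and+around+keene</category>
<category domain="http://delicious.com/elmcity/">where=keene+nh</category>
<category domain="http://delicious.com/someone_else/">tz=Pacific</category>
<category domain="http://delicious.com/elmcity/">trusted</category>
</item></channel></rss>"""

TAG_PATTERN = re.compile(r"^([^=]+)=(.+)$")

def metadata_from_rss(rss_text, account):
    """Return the account's name=value tags as a dict."""
    domain = "http://delicious.com/%s/" % account
    settings = {}
    for category in ET.fromstring(rss_text).iter("category"):
        if category.get("domain") != domain:
            continue  # tag applied by a different account
        match = TAG_PATTERN.match(category.text or "")
        if match:  # plain tags such as "trusted" carry no value
            name, value = match.groups()
            settings[name] = value.replace("+", " ")
    return settings

settings = metadata_from_rss(SAMPLE_RSS, "elmcity")
print(settings)

# Interpolate the pairs into a toy template (the real template format may differ).
page = Template("<h1>$title</h1><p>Timezone: $tz</p>").substitute(settings)
print(page)
```

Filtering on the domain attribute is what keeps another account’s tags on the same URL from leaking into the configuration.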
Is this crazy? Here are some reasons why I think not.
First, I’m embracing one of a programmer’s greatest virtues: laziness. Why write a bunch of database and user-interface logic just to enable folks to manage a few small collections of name=value pairs? Delicious has already done that work, and done it much better than I could.
Second, the configuration data lives out in the open where stakeholders can see it, touch it, and collaboratively manage it. There are all kinds of ways Delicious can help those folks do that. For example, anyone who cares about this collection of data can subscribe to its feed and receive notifications when anything changes.
Third, it’s easy to extend this model. For example, part of the workflow will entail one or more stakeholders deciding to trust a feed and put it into production. As you may recall, the service trusts a feed when it’s bookmarked with the tag trusted. Part of that approval process will involve making sure that there are URLs associated with events coming from the feed. Some iCalendar feeds provide them, but many don’t.
So in addition to the configuration that’s needed once for each instance of a community aggregator, there’s a bit of configuration that’s needed once per feed. If a feed doesn’t provide URLs for individual events, you can at least provide a homepage URL for the feed. And this piece of metadata can be managed in the same way. Here’s the bookmark for the Gilsum church. It carries the tag url=http://gilsum.org/church.aspx. As you browse around in a set of trusted feeds, it’s pretty easy to see which ones do and don’t carry those tags, and it’s pretty easy to edit them.
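The per-feed fallback described above can be sketched in a few lines of Python 3. The function names and the example event URL are hypothetical; only the url= tag convention and the Gilsum church tag come from the text.

```python
def feed_homepage(tags):
    """Return the value of the first url=... tag, or None if the feed has none."""
    for tag in tags:
        if tag.startswith("url="):
            return tag[len("url="):]
    return None

def event_link(event_url, feed_tags):
    """Prefer the event's own URL; otherwise fall back to the feed's homepage tag."""
    return event_url or feed_homepage(feed_tags)

gilsum_tags = ["trusted", "url=http://gilsum.org/church.aspx"]
print(event_link(None, gilsum_tags))                      # event without its own URL
print(event_link("http://example.org/e/1", gilsum_tags))  # event with its own URL
```

A curator reviewing trusted feeds could use the same feed_homepage check to list the feeds that still need a url= tag.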
It all adds up to a ton of value, and to capture it I only had to write the handful of lines of code shown below.
Now I’ll grant this way of doing things won’t work for everybody, so at some point I may need to create an alternative. And since I don’t want to depend on Delicious being always available, I’ll want to cache the results of these queries. But still, it’s amazing that this is possible.
public Dictionary<string, string>
get_delicious_feed_metadata(string metadata_url, string account)
{
    var dict = new Dictionary<string, string>();

    // Ask Delicious for the bookmark page. The redirect target
    // ends with the id that uniquely identifies the bookmarked URL.
    var url = string.Format("http://delicious.com/url/view?url={0}",
        metadata_url);
    var http_response = Utils.FetchUrlNoRedirect(url);
    var location = http_response.headers["Location"];
    var url_id = location.Replace("http://delicious.com/url/", "");

    // Fetch the RSS feed for that bookmark.
    url = string.Format("http://feeds.delicious.com/v2/rss/url/{0}",
        url_id);
    http_response = Utils.FetchUrl(url);
    var xdoc = Utils.xdoc_from_xml_bytes(http_response.data);

    // Keep only the tags (RSS categories) applied by this account.
    // The (string) cast yields null, instead of throwing, when a
    // category element has no domain attribute.
    string domain = string.Format("http://delicious.com/{0}/", account);
    var categories = from category in xdoc.Descendants("category")
                     where (string)category.Attribute("domain") == domain
                     select new { category.Value };

    // Split each name=value tag; '+' stands in for a space.
    foreach (var category in categories)
    {
        var key_value = Utils.RegexFindGroups(category.Value,
            "^([^=]+)=(.+)");
        if (key_value.Count == 2)
            dict[key_value[0]] = key_value[1].Replace('+', ' ');
    }
    return dict;
}
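The caching mentioned above is not shown in the post. One way it could look, as a Python 3 sketch: the class name, the one-hour default, and the fake fetch function are all assumptions made for illustration. The idea is to reuse a recent answer and, when Delicious cannot be reached, to serve the last known answer rather than fail.

```python
import time

class CachedLookup:
    """Wrap a fetch function with a time-limited cache and a stale fallback."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}  # key -> (time fetched, value)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # fresh enough, skip the network
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]  # source unavailable: serve the stale copy
            raise
        self._cache[key] = (now, value)
        return value

# Stand-in for the real metadata lookup; it records each call it receives.
calls = []
def fake_fetch(account):
    calls.append(account)
    return {"tz": "Eastern"}

lookup = CachedLookup(fake_fetch, max_age_seconds=60)
first = lookup.get("elmcity")
second = lookup.get("elmcity")  # answered from the cache
print(first, second, len(calls))
```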
February 26, 2009
My guest on this week’s Innovators show is Mark Baker. All of us who celebrate the web owe Mark a debt of gratitude for passionately articulating key RESTful principles (uniform interfaces, statelessness, hyperlinked representations) back when they were a lot more controversial than they are now.
Mark worried about the interview because he had a wicked cold at the time, and actually so did I. But thanks to the miracle of audio editing, it came out quite well!
February 25, 2009
Carl Malamud believes that he’d make a great Public Printer of the United States. And he’s right. There is nobody on the planet more qualified to reinvent the Government Printing Office, and there’s never been a time when that mattered more.
Of course nobody’s asked him. But meanwhile, over here, he’s doing the job, and he’ll keep doing it no matter what.
From the New York Times:
“If called, I will certainly serve,” he said. “But if not called, I will probably serve anyway.”
I hope he gets the call.
PS: A lot of folks have done interviews with Carl. Here’s mine.
February 18, 2009
Back in the good old days, circa 2006 or so, I was a happy podcast listener. During my many long periods of outdoor activity — running, hiking, biking, leaf-raking, snow-shoveling — I sometimes listened to music, but more often absorbed a seemingly endless stream of spoken-word lectures, conversations, and entertainment. Some of my sources were conventional: NPR (CarTalk, FreshAir), PRI (This American Life), BBC (In Our Time), WNYC (Radio Lab). Others were unconventional: Pop!Tech, The Long Now Foundation, TED, ITConversations, Social Innovation Conversations, Radio Open Source.
But once I caught up with these catalogs, there wasn’t enough of the right kind of new flow to provide the intellectual companionship that enriches my solo excursions. That’s problem number one.
Problem number two is more mundane, but still vexing. I’m subscribed to all the aforementioned feeds (and more) in iTunes. When I update them, I wind up taking a screenshot like this:
Why? Because although the downloads window conveniently lists all the shows I want to hear over the next day or so, this view evaporates once the files are downloaded. The shows retreat to separate branches of the iTunes tree. And I can never remember which branches I need to visit in order to copy those files to my trusty Creative MUVO MP3 player. In this case, the branches are Pop!Tech, Long Now, This American Life, and Radio Lab. But there are a bunch of others too, hence the need for this accounting hack.
So far, SpokenWord.org is more helpful with the second problem than with the first. I’m using it to consolidate feeds. From the FAQ:
Think of SpokenWord.org as a funnel. You collect streams (RSS feeds) of programs from all over the Web, then combine them into a single collection on SpokenWord.org. Then in iTunes you subscribe to just one feed: the feed from your SpokenWord.org collection.
Managing feeds, in addition to (or instead of) managing items, is an aspect of digital literacy that’s only just emerging. I think it’s critical, so I’m a keen observer/participant in various domains: blogging, microblogging, calendaring, or — in this case — audio curation. The notion of a podcast metafeed comes naturally to me. But I’m curious about who will or won’t adopt the practice. It entails a level of indirection which, as we know, can be a non-starter for a lot of folks.
What about the first problem? I’m hoping that SpokenWord will become a place where curators emerge who lead me to places I wouldn’t have gone. That’s what thrilled me about Webjay, five years ago. The world wasn’t ready for collaborative curation then, and the domain of music was (and is) encumbered. But we’re five years on, and most of the spoken word audio that might usefully be curated is unencumbered. Maybe the time is right for folks like OddioKatya — my favorite webjay on Webjay, back in the day — to build reputations and followings in the domain of spoken word audio.
That hasn’t happened yet, of course, since SpokenWord.org just launched in beta this week. Meanwhile, the site offers a variety of lenses through which to view its growing collection of feeds and programs: tags, categories, ratings, user activity. So far I’m finding the activity to be most helpful. I’m either already familiar with, or not interested in, much of what I see. But the Active Collectors bucket on the home page has alerted me to a couple of feeds I hadn’t known about, notably BBC World’s DocArchive.
Disclosure: I am on the ITConversations Board of Directors. At a meeting last summer, a consensus emerged to focus on collaborative curation rather than original production. My contribution has been to connect Doug Kaye with Lucas Gonze (Webjay) and Hugh McGuire (LibriVox), two useful points of reference, and to try to help Doug clarify how curation can happen in this realm.
For me, SpokenWord.org in its current form is very useful for feed consolidation, and not yet quite as useful for discovery and curation. All these aspects will surely evolve as more people engage with it. I’ll be curious to know what those who listen to spoken word podcasts, and those who would like to curate them, think about the service.
February 17, 2009
In an earlier installment of the elmcity+azure series, I created an event logger for my Azure service based on SQL Data Services (SDS). The general strategy for that exercise was as follows:
- Make a thin wrapper around the REST interface to the query service
- Use the available query syntax to produce raw results
- Capture the results in generic data structures
- Refine the raw results using a dynamic language
Now I’ve repeated that exercise for Azure’s native table storage engine, which is more akin to Amazon’s SimpleDB and Google’s BigTable than to SDS. Over on GitHub I’ve posted the C# interface library, the corresponding tests, and the IronPython wrapper which I’m using in the interactive transcript shown below.
As in the SDS example, I’m using the C#-based library in two complementary ways. My Azure service, which currently has to be written in C#, uses it to log events. But when I want to analyze those logs, I use the same library from IronPython.
I haven’t made a CPython version of this library, but it would be straightforward to do so. More generally, I’m hoping this example will help anyone who wants to understand, or create alternate interfaces to, the Azure table store’s RESTful API.
>>> from tablestorage import *
>>> list_tables()
['test1']
>>> r = create_table('test2')
>>> print r.http_response.status
Created
>>> list_tables()
['test1', 'test2']
>>> nr = nextrow()
>>> nr.next()
'r0'
>>> d = {'name':'jon','age':52,'dt':System.DateTime.Now}
>>> r = insert_entity('test2',pk,nr.next(),d)
>>> print r.http_response.status
Created
>>> for i in range(10):
... d = {'name':'jon','count':i}
... r = insert_entity('test2',pk,nr.next(),d)
... print r.http_response.status
...
Created
...etc...
Created
>>> r = query_entities('test2','count gt 5')
>>> len(r.response)
4
>>> for dict in r.response:
... print dict
Dictionary[str, object]({'PartitionKey' : 'partkey1',
'RowKey' : 'r10', 'Timestamp' :
<System.DateTime object at 0x000000000000002E
[2/17/2009 12:37:54 PM]>, 'count' : 6, 'name' : 'jon'})
Dictionary[str, object]({'PartitionKey' : 'partkey1',
'RowKey' : 'r11', 'Timestamp' :
<System.DateTime object at 0x000000000000002F
[2/17/2009 12:37:55 PM]>, 'count' : 7, 'name' : 'jon'})
...etc...
>>> for dict in r.response:
... print dict['count']
6
7
8
9
>>> for entity in sort_entities(r.response,'count','desc'):
... print entity['count']
9
8
7
6
>>> r = update_entity('test2',pk,'r13',{'name':'doug','age':17})
>>> r = query_entities('test2','age eq 17')
>>> print r.response[0]['name']
doug
>>> r = merge_entity('test2',pk,'r13',{'sex':'M'})
>>> r = query_entities('test2','age eq 17')
>>> print r.response[0]['name']
doug
>>> print r.response[0]['sex']
M
>>> r = query_entities('test2', "sex eq 'M'")
>>> r.response.Count
1
>>> r.response[0]['RowKey']
'r13'
>>> r = delete_entity('test2',pk,'r13')
>>> r = query_entities('test2', "sex eq 'M'")
>>> r.response.Count
0
>>> r = query_entities('test2',"dt ge datetime'2009-02-16'")
>>> r.response.Count
1
>>> r.response[0]
Dictionary[str, object]({'PartitionKey' : 'partkey1',
'RowKey' : 'r2', 'Timestamp' :
<System.DateTime object at 0x000000000000005F
[2/17/2009 12:34:29 PM]>, 'age' : 52, 'name' : 'jon',
'dt' : <System.DateTime object at 0x0000000000000060
[2/17/2009 7:33:53 AM]>})
>>> r = query_entities('test2',"dt ge datetime'2010'")
>>> r.response.Count
0
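The transcript relies on two small helpers, nextrow and sort_entities, whose source is not shown here. As a rough guess at their behavior, here is a Python 3 sketch that works on plain dicts. It is not the library’s actual code, and it uses next(rows) where the IronPython 2 transcript uses nr.next().

```python
import itertools

def nextrow(prefix="r"):
    """Yield row keys 'r0', 'r1', 'r2', ... on demand."""
    for n in itertools.count():
        yield "%s%d" % (prefix, n)

def sort_entities(entities, key, order="asc"):
    """Return entities sorted by one property; entities lacking it are dropped."""
    present = [e for e in entities if key in e]
    return sorted(present, key=lambda e: e[key], reverse=(order == "desc"))

rows = nextrow()
entities = [{"RowKey": next(rows), "name": "jon", "count": i} for i in range(6, 10)]
for entity in sort_entities(entities, "count", "desc"):
    print(entity["count"])
```

This is the "refine the raw results using a dynamic language" step: the table store returns generic dictionaries, and ordinary list operations do the rest.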
February 11, 2009
Yesterday David Stephenson interviewed me for the book he was to be writing with Vivek Kundra, who is currently Washington DC’s CTO and reportedly the next Office of Management and Budget administrator for e-government and information technology.
Back in 2006 I learned from DC’s previous CTO, Suzanne Peck, and from Dan Thomas, about their plan to publish operational data in the service of transparency and accountability. At the time, I hoped this effort would show how ordinary citizens, as well as journalists, could be empowered to ask and answer questions like:
Do people in poor neighborhoods wait longer for service requests to be handled?
Talking with David yesterday, I struggled to come up with examples where the online publication and visualization of public data supports that kind of analysis. The best one I’ve seen lately comes from Eric Rodenbeck’s talk at ETech.
Eric’s company, Stamen Design, created Oakland Crimespotting. And yes, it’s another in a long line of mashups that spray crime data onto a Google Maps (or, in this case, Virtual Earth) display. But here’s the part of Eric’s talk that really got my attention:
There were no prostitution arrests for about a month. Then one day the cops started at one end of San Pablo Avenue, and you can watch them moving up the street and making arrests.
It wouldn’t have occurred to a citizen, or to a reporter, to ask the question:
Have the cops decided to crack down on prostitution?
Here the policy decision to conduct a sweep emerges from the data. There are two crucial enablers. First, the use of a map as a query interface. That’s common. But second, the use of animation to observe flows of data in time as well as in space. That’s still much rarer.
In the software community there’s vigorous debate about whether we need to rely on plugins like Flash and Silverlight to animate data in ways that enhance its analysis. My answer: It depends. Clearly much can already be done, and more will be done, with the basic web platform: browsers operating in an increasingly rich ecosystem of web services. Look at how the Rocky Mountain Institute uses animation to tell a story about US oil imports much more effectively than my static presentation was able to do. And like Stamen’s Oakland Crimespotting animation, the RMI’s oil import animation doesn’t use any plugins.
But we’re facing critical challenges, and we’ll want to deploy all the power tools we can lay our hands on. To that end, my colleagues at MIX Online have just released Project Descry, a set of four Silverlight-based visualizations. In an introductory article I wrote:
The world we must make sense of now is one in which human actions have planetary effects. The good news is that we can, for the first time, begin to measure those effects. We’re instrumenting the atmosphere and the oceans, and torrents of data are arriving from our sensors. The bad news is that we’re not yet very skillful storytellers in the medium of data. That’s true both in the specialized realm of science, and more broadly at the intersection of science, public policy, and the media.
If you’re a developer and are curious about how to create, for example, a treemap widget in Silverlight, you can visit Descry on CodePlex and have a look.
There are all kinds of useful tools yet to be built — in a variety of ways — and made available to citizens of the Net. I’m particularly interested in general-purpose visualizers, like the excellent ones at Many Eyes, that non-programmers can pour data into and make productive use of.
Where, for example, is the general-purpose visualizer for map data over time? In the spirit of Many Eyes, I’d like anyone to be able to upload a simple comma-separated dataset and create an animation like FlowingData’s Growth of Target, 1962 – 2008.
Ideally, the visualizer would also provide a scrollbar for scrubbing along the timeline. In the FlowingData example, you can do a geographic query by zooming and panning. But once you have selected a region you have to play the whole animation. Add timeline scrolling, and you can combine temporal with spatial query.
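To make the combined query concrete, here is a Python 3 sketch that filters a comma-separated dataset by a map bounding box and a date range at once. The column names and the four sample rows are invented for illustration. They are not FlowingData’s data.

```python
import csv
import io
from datetime import date

# Made-up sample rows: a name, a location, and the date the point appears.
SAMPLE_CSV = """name,lat,lon,opened
Store A,44.98,-93.27,1962-05-01
Store B,41.88,-87.63,1975-03-15
Store C,44.95,-93.10,1990-10-01
Store D,34.05,-118.24,2001-07-04
"""

def load_points(text):
    """Parse the CSV text into dicts with numeric coordinates and real dates."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "name": row["name"],
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "opened": date.fromisoformat(row["opened"]),
        })
    return points

def query(points, south, west, north, east, start, end):
    """Points inside the bounding box whose date falls within [start, end]."""
    return [p for p in points
            if south <= p["lat"] <= north
            and west <= p["lon"] <= east
            and start <= p["opened"] <= end]

points = load_points(SAMPLE_CSV)
# Zoom to a box around Minneapolis, then scrub the timeline to 1960 through 1980.
hits = query(points, 44.0, -94.0, 46.0, -92.0, date(1960, 1, 1), date(1980, 12, 31))
print([p["name"] for p in hits])
```

Zooming and panning change the four box parameters, and dragging the scrollbar changes start and end, so both gestures feed the same filter.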
What other kinds of general-purpose visualizers do you imagine having and using?
February 9, 2009
In a 2003 InfoWorld story on the globalization of software development I asked Andy Singleton to share his thoughts on distributed software development. He has continued to refine and reflect on his approach, which he says is inspired by the open source, agile, and web 2.0 movements. On this week’s Innovators podcast, Andy summarizes the often counter-intuitive methods that work well for him and his teams. They include:
Don’t interview. Just pay people to join a project, pull a task from the queue, and find out what they can do.
Don’t divide work geographically. You’re not making best use of your distributed team if you impose that artificial constraint.
Don’t do phone conference calls. “I’ve never had someone tell me: ‘I worked on a project with lots of conference calls, and it worked great, so your thesis is disproved.’”
Don’t estimate. It’s just extra work. If you know your tasks and priorities, go after them in order. Estimation won’t help, and will cost 10% of your time.
Pile on developers early. It enables people to self-sort, and yields a stronger and more flexible team at the two-week mark.
Ironically, Andy says, many proponents of agile software development resist the notion of distributed development:
They think everybody should meet once a day. That’s such a cop-out! Since most development nowadays is distributed, they’re saying that 90% of the people who should be taking advantage of agile methodologies can’t do it. What they really should be doing is figuring out how to make distributed teams at least as productive as colocated teams. And in our case, we believe they’re more productive because we’re bringing in better talent.
One of the key enablers of effective distributed work is a common event stream to which everyone can subscribe. To that end, the Assembla website has embraced web hooks. So, for example, the action of committing code to a repository implicitly fires an event. You can make that explicit by wiring the event to an action, like sending a Twitter direct message.
This is a really important idea. Today, most of the services that you’d like to weave together to enhance distributed teamwork don’t export event hooks. But it’s quite simple to do. Here’s how Assembla enables you to relay events to Basecamp:
It takes two to do this tango. The external system, in this case Basecamp, has to be prepared to catch an incoming REST call. And Assembla has to enable its users to wire internal events to outbound REST calls.
Neither requirement is difficult. And the payoff can go way beyond the basic pub/sub notification scenario shown here, as noted in the Web Hooks blog:
Thanks to CGI we got the read-write web, but we also made the web way more useful than it was intended. Suddenly browsing to a URL would run some code. And code…well, code can do anything.
Yes. That said, simple notification is nothing to sneeze at. That alone, widely implemented, would be a game changer.
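The sending half of that tango, wiring an internal event to outbound REST calls, can be sketched in Python 3 as follows. The hook table, the example.com URL, and the JSON payload are assumptions for illustration. Assembla’s and Basecamp’s real interfaces are not shown here.

```python
import json
import urllib.request

# Hypothetical subscriptions: event name -> URLs that want to hear about it.
HOOKS = {"commit": ["http://example.com/hooks/commit"]}

def build_hook_requests(event_name, payload, hooks=HOOKS):
    """Build one POST request per URL wired to this event. Nothing is sent."""
    body = json.dumps(payload).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for url in hooks.get(event_name, [])
    ]

payload = {"author": "andy", "message": "fix build"}
requests_to_send = build_hook_requests("commit", payload)
for request in requests_to_send:
    print(request.get_method(), request.full_url)
    # To deliver it for real: urllib.request.urlopen(request, timeout=10)
```

The receiving side only needs an HTTP endpoint that accepts the POST, which is why simple notification is cheap for both parties to implement.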
February 6, 2009
Last month, in a series of entries, I laid out the case for an effort — inspired by the RSS/Atom feed validator — to create a similar suite of tests and tools for iCalendar feeds.
I’m delighted to report that two developers of libraries that support iCalendar are collaborating to do just that. Ben Fortuna is the author of iCal4j, which powers the best currently-available online iCalendar validator. And Doug Day is the author of DDay.iCal, a C# iCalendar library. Both iCal4j and DDay.iCal are open source projects.
They’re collaborating, at icalvalid.wikidot.com, on a platform-neutral suite of tests that can serve as a foundation for a more robust iCalendar validation service.
As Sam Ruby points out:
For each of the red entries on that page, somebody needs to identify what should be tested for, and for each test identify a short message, an explanation, and a solution. Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.
We’ve made a start on the wiki. As I proceed with my calendar aggregation project, I’ll continue to document the validation issues I run into. Meanwhile, in the spirit of loosely-coupled collaboration, please feel free to attach the tag icalvalid to any blog posting, forum message, or other online item that discusses iCalendar feeds that fail to validate, explores reasons why, and recommends solutions.
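To illustrate the shape of one such test, here is a Python 3 sketch that pairs a single check with a message, an explanation, and a solution. It is not taken from the icalvalid suite, and the names and sample feed are hypothetical. The check itself rests on the iCalendar specification, which requires VERSION and PRODID on every calendar object.

```python
from collections import namedtuple

# One record per finding: a short message, an explanation, and a solution.
Issue = namedtuple("Issue", "message explanation solution")

# Properties the iCalendar specification requires on every VCALENDAR object.
REQUIRED_CALENDAR_PROPERTIES = ("VERSION", "PRODID")

def check_required_properties(ics_text):
    """Report required calendar-level properties missing from the feed."""
    depth = 0
    seen = set()
    for line in ics_text.splitlines():
        if line.startswith((" ", "\t")):
            continue  # folded continuation of the previous content line
        name = line.split(":", 1)[0].split(";", 1)[0].strip().upper()
        if name == "BEGIN":
            depth += 1
        elif name == "END":
            depth -= 1
        elif depth == 1:  # directly inside VCALENDAR, not inside a VEVENT
            seen.add(name)
    return [
        Issue(
            message="Missing %s" % prop,
            explanation="The calendar object has no %s property." % prop,
            solution="Add a %s line directly under BEGIN:VCALENDAR." % prop,
        )
        for prop in REQUIRED_CALENDAR_PROPERTIES
        if prop not in seen
    ]

# Hypothetical feed that omits PRODID.
SAMPLE_FEED = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    "SUMMARY:Town meeting",
    "DTSTART:20090306T190000",
    "END:VEVENT",
    "END:VCALENDAR",
])

issues = check_required_properties(SAMPLE_FEED)
for issue in issues:
    print(issue.message, "|", issue.solution)
```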
February 6, 2009
On this week’s Innovators podcast I spoke with Phil Long. He runs the newly-formed Center for Educational Innovation and Technology (CEIT) at the University of Queensland, in Australia. Phil is a transplant from MIT, where he was closely involved in the TEAL (technology-enhanced active learning) project. TEAL was the subject of a recent New York Times story: At MIT, Large Lectures are Going the Way of the Blackboard.
Born of John Belcher’s frustration that his large physics lectures were drawing fewer and fewer students each year, the TEAL experience mixes lecture segments with realtime interactive feedback (“clickers”) and guided teamwork.
Although the word technology is embedded in both TEAL and CEIT, it’s worth noting that sociology belongs there too. As Tim Fahlberg pointed out when I interviewed him about mathcasts and clickers, the technology that enables teachers to conduct realtime quizzes, and thereby adapt presentations on the fly, isn’t only about efficient measurement of what you could gauge roughly by a show of hands. The responses gathered by clickers are anonymous, and that makes all the difference. Nobody wants to raise a hand when asked: “Who didn’t understand that?”
Team formation is another area where technical and social engineering can usefully converge. If you test students before a course starts, Phil says, you can use that data to divide them into groups. But what heuristic should apply? He advocates teams of three drawn from the low-, middle-, and high-scoring groups. That arrangement encourages the most knowledgeable students to help teach their peers, and in so doing reinforce their own knowledge.
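The heuristic is simple enough to sketch in Python 3. The student names and scores are invented, and how to place students left over when the class size is not a multiple of three is an open choice that the sketch leaves to the caller.

```python
def form_teams(scores):
    """Split students into teams of three: one low, one middle, one high scorer.

    scores maps each student to a pretest score. Returns (teams, leftovers),
    where leftovers are students who did not fit into a complete team.
    """
    ranked = sorted(scores, key=scores.get)  # lowest score first
    third = len(ranked) // 3
    low = ranked[:third]
    middle = ranked[third:2 * third]
    high = ranked[2 * third:3 * third]
    teams = [list(team) for team in zip(low, middle, high)]
    leftovers = ranked[3 * third:]
    return teams, leftovers

# Made-up pretest scores.
pretest = {"Ana": 35, "Ben": 50, "Cal": 62, "Dee": 71, "Eli": 88, "Fay": 93}
teams, leftovers = form_teams(pretest)
print(teams)
print(leftovers)
```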
Phil points out that TEAL has so far been applied only in the domain of physics, where it has benefited from a wealth of research data about how students learn physics concepts. Part of CEIT’s mission will be to find ways to map the TEAL approach to other scientific domains, and also more broadly.