Proximity search in Google and Live?

30 Apr 2009 ~ Jon Udell ~ 12 Comments

I recently added a specialized search to help curators working with the elmcity project find recurring events in their communities. It’s helpful, but would be much more helpful if it produced results only when the two searched-for phrases occur in close proximity.

The phrase pairs look like this:

"every thursday" "keene nh"
"first friday" "keene nh"

I’d like to limit results to pages where these pairs occur within, say, 100 words of one another. My search robot uses both Google and Live because, well, why wouldn’t you want the best of both worlds? But as far as I know, neither supports a proximity syntax like:

"every thursday" within 100 "keene nh"

I only need to run my search robot occasionally, and there are only thousands of pages per calendar hub, and there are only a dozen hubs yet. So for now it’s feasible to use brute force. I can, and likely will, fetch all the pages found by the two engines, analyze them, and reject those that fail my proximity test.
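
A minimal sketch of that brute-force proximity test, in Python, might look like the following. The tokenizer, the function names, and the choice to measure distance between the starting words of each phrase are all assumptions made for illustration; this is not the robot's actual code, and it expects the page text to have been fetched and stripped of markup already.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_positions(words, phrase):
    """Return the word index of each place where the phrase starts."""
    target = tokenize(phrase)
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

def within_proximity(text, phrase_a, phrase_b, max_words=100):
    """True if some occurrence of phrase_a starts within max_words of phrase_b."""
    words = tokenize(text)
    starts_a = phrase_positions(words, phrase_a)
    starts_b = phrase_positions(words, phrase_b)
    return any(abs(a - b) <= max_words for a in starts_a for b in starts_b)
```

Because punctuation is dropped during tokenizing, "Keene, NH" and "keene nh" are treated as the same phrase, which seems to fit the way these pages are written.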

But since I am virtuously lazy, I just thought I’d ask. Are there undocumented features for either or both of these engines that I’m missing?

Curating softball schedules

29 Apr 2009 ~ Jon Udell ~ 10 Comments

Keene is crazy about baseball and softball. In the men’s softball league alone there are 56 teams, they have played 73 games so far, and will play another 431 through August. I know this because the schedule was made in Excel, and published as a web page that Excel’s Data->From Web feature can easily read back.

That Excel spreadsheet isn’t at all useful, however, if you want to combine the schedule with other public calendars, or with your own personal calendar. For that you need an ICS feed. And almost nobody — from the major league websites to local leagues like mine — bothers to provide those.

So I made an ICS feed for Keene men’s softball, and I did it in an unusual way. My first thought was to point FuseCal at the schedule page, which is just an HTML table that looks like this:

DATE	TIME	FIELD	AWAY	HOME	Lg
Fri. Apr 17	6:00 PM	D	Computer Solutions of Keene	J.A. Jubb	C1
Fri. Apr 17	6:00 PM	O	Peerless Insurance	C&S 1	D2

But FuseCal wouldn’t read that page. It’s a service that specializes in digging structure out of unstructured text, and I guess it got freaked out when it saw too much structure in this page!

Normally in cases like this I’d write a script to read the HTML table, parse out the dates and times, and write an ICS feed. But that isn’t a skill most people have, and I’m looking for ways to help calendar curators do this kind of thing for themselves.

Then it occurred to me: What would FuseCal read? How about this:

Fri. Apr 17 06:00 PM,
Computer Solutions of Keene vs. J.A. Jubb, Field D

Fri. Apr 17 06:00 PM,
Peerless Insurance vs. C&S 1, Field O

In other words, the same stuff lightly reformatted, and coalesced into a single cell per row. And yes, FuseCal will read that.

So I added a column to the Excel sheet with this formula:

=CONCATENATE(A4, " ", TEXT(B4,"hh:mm AM/PM"), ", ", D4, " vs. ", E4,
 ", ", "Field ", C4)

Then I exported that column back out as this HTML page, used FuseCal to create this ICS feed, and bookmarked it for inclusion in the aggregator.
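
For anyone who would rather script the reshaping than do it in Excel, here is a rough Python equivalent of that added column. The function name and argument order are hypothetical; the only things taken from the schedule are the column meanings (date, time, field, away team, home team) and the target layout that FuseCal accepted.

```python
def fusecal_line(date, time, field, away, home):
    """Coalesce one schedule row into the single-cell text shown above."""
    # Zero-pad the hour so "6:00 PM" becomes "06:00 PM".
    hour, rest = time.split(":", 1)
    padded_time = f"{int(hour):02d}:{rest}"
    return f"{date} {padded_time}, {away} vs. {home}, Field {field}"
```

Calling it with the first row of the table yields the first of the reformatted entries, on one line.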

This has to be the weirdest maneuver I’ve ever thought of. Taking away structure in order to be able to add structure? Crazy! And yet it makes perfect sense. FuseCal is a component that specializes in turning weakly-structured calendar-like data into better-structured calendar data. It also knows how to do other useful things, like monitor the source of that data for changes, and convert the data into ICS format. If it’s easy enough to provide the sort of weak structure that FuseCal expects, why not just do that and leverage its strengths?

So I did, and here are the key outcomes:

The softball events now show up on the aggregated calendar.
They’re also available directly from the ICS feed, so that players and their families can add these events to personal calendars.

Nice!

It would be even nicer if, as a member of, say, the Blazers, I could scoop up just my own team’s events. And in fact FuseCal does support filtering. As the creator of the feed, I can go into the application, type Blazers, and restrict the feed to just those events. But I’d have to create 56 separate filtered calendars to provide feeds for all the teams. Feature request for FuseCal: Support filtering on the feed URL, so I can form URLs like:

http://fusecal.com/calendar/view/741833?h=5f7c2ac6-13cc-11de-a48e-00163e284ee0&filter=Blazers

http://fusecal.com/calendar/view/741833?h=5f7c2ac6-13cc-11de-a48e-00163e284ee0&filter=Greenwald+Realty

While we’re wishing, here’s a feature request for Yahoo Pipes: Add a module for ICS feeds! Pipes is a fabulous tool for transforming, filtering, and merging RSS feeds. It would be great to be able to do the same kinds of magic with ICS feeds.
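
Until something like that exists, the filtering itself is simple enough to sketch in Python. This is only an illustration of the idea, not how FuseCal or Pipes works: the function name is made up, it matches on the SUMMARY line only, and it assumes the feed does not fold long lines across physical lines the way the iCalendar spec allows.

```python
def filter_ics(ics_text, keyword):
    """Keep only the VEVENT blocks whose SUMMARY line mentions keyword."""
    kept, block, in_event = [], [], False
    for line in ics_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN:VEVENT":
            in_event, block = True, [line]
        elif stripped == "END:VEVENT" and in_event:
            block.append(line)
            if any(ln.upper().startswith("SUMMARY") and keyword.lower() in ln.lower()
                   for ln in block):
                kept.extend(block)
            in_event = False
        elif in_event:
            block.append(line)
        else:
            kept.append(line)  # calendar-level lines pass through untouched
    return "\n".join(kept)
```

Running it once per team name would produce the 56 filtered feeds without anyone having to create them by hand.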

Data-driven career discovery

28 Apr 2009 ~ Jon Udell ~ 14 Comments

The fulcrum of my talk last week at the Open Education Conference was observable work. I first started thinking about this back in 2002, when I included this Dave Winer excerpt in my review of Radio UserLand:

We’ve been using this tool since November, internally at UserLand. We shipped Radio 8 with it. When we switched over our workgroup productivity soared. All of a sudden people could narrate their work. Watch Jake as he reports his progress on the next project he does. We’ve gotten very formal about how we use it. I can’t imagine an engineering project without this tool.

Since then I’ve spoken a few times about the idea that by narrating our work, we can perhaps restore some of what was lost when factories and then offices made work opaque and not easily observable. Software developers are in the vanguard of this reintegration, because our work products as well as our work processes are fully mediated by digital networks. But it can happen in other lines of work too, and I’m sure it will.

My favorite example, from a very different domain, is the historic home preservationist John Leeke. In our interview he eloquently explains how and why he works observably.

This week’s Innovators show, with Charlie O’Donnell and Hilary Mason of Path101, expands on the same theme from a different perspective. Path101’s tagline is community-powered career discovery, and the approach is more data-driven than narrative.

When we narrate our work, we enable others to ask and answer the critical question:

What is it like to be a __________?

Path101’s aggregation of resumes and personality tests aims for different kinds of questions:

What personality traits do other _______s like me tend to have?

What careers do other _______s like me transition into?

Path101 is still a very young service, but I love the concept and will be interested to see how it evolves.

Mashing up LibraryThing, FuseCal, and RSS2HTML to create iCalendar feeds for LibraryThing events

27 Apr 2009 ~ Jon Udell ~ 6 Comments

One of the elmcity project’s curators — Richard Akerman, in Ottawa — likes to use LibraryThing to keep track of events. He provided me with this RSS feed for Ottawa’s LibraryThing events:

http://www.librarything.com/rss/events/location/ottawa,+on

Although this feed does contain event information, it’s weakly structured. The dates and times appear as free text within the RSS <description> element:

<description>Thursday, April 30 (12:00 pm) Jeramy Dodds discusses Crabwise to the Hounds; Matthew Tierney discusses The Hayflick Unit. Join two stellar poets for a team Masterclass on poetry. Jeramy Dodds, recently shortlisted for the Griffin Prize, and Matthew Tierney, author of The Hayflick Unit and Full speed through the morning dark, for an exploration of the intersection of science and poetry.</description>

Could LibraryThing provide an iCalendar feed? Sure. But in order to do so, its events system would want to start gathering information in a more structured way.

Could FuseCal read the unstructured RSS feed and turn it into a structured RSS feed? In theory, yes. In practice, it doesn’t seem to want to read XML.

But wait. Maybe FuseCal can read an HTML translation of the RSS feed and turn that into an iCalendar feed?

Yep, that works. For calendar curators, and for anyone else who may be interested, here’s the recipe:

Find a service that converts RSS into HTML. For example: http://www.rss2html.com.
Form a URL that uses that service to convert a LibraryThing feed. For example: http://www.rss2html.com/public/rss2html.php?TEMPLATE=template-1-1-1.htm&XMLFILE=http://www.librarything.com/rss/events/location/keene,nh

For another location, just replace keene,nh with, say, ottawa,on or baltimore,md.
Copy that URL and paste it into FuseCal.
Click Add to My Calendar -> Other Calendar in FuseCal to expose the iCalendar URL.
If you’re curating for the elmcity project, bookmark that iCalendar URL in the Delicious account you’re using to control your instance of the calendar hub.
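
The URL-forming step in that recipe can be sketched in a few lines of Python. The constant and function names are made up for illustration; the service address, template name, and LibraryThing path are the ones given in the recipe, pasted together unencoded just as in the example URL.

```python
RSS2HTML = "http://www.rss2html.com/public/rss2html.php"
TEMPLATE = "template-1-1-1.htm"
LIBRARYTHING = "http://www.librarything.com/rss/events/location/"

def librarything_html_url(location):
    """Build the rss2html URL for a LibraryThing location such as 'keene,nh'."""
    return f"{RSS2HTML}?TEMPLATE={TEMPLATE}&XMLFILE={LIBRARYTHING}{location}"
```

The result is what gets pasted into FuseCal. The remaining steps happen in the FuseCal and Delicious user interfaces.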

Of course I could just automatically scan LibraryThing for each instance, just as I’m doing for Eventful and Upcoming. If that’s what curators prefer, I will. But in any case, this is a nice example of the kind of lightweight, spontaneous, opportunistic integration that I mentioned in my talk at the Global Research Library summit.

What is the RSS of calendars?

23 Apr 2009 ~ Jon Udell ~ 14 Comments

A conversation with some folks here at the Open Education Conference (#ocwc2009global) just connected in a wonderful way with another conversation on Twitter about what Douglas Hofstadter calls Ob-Platte puzzles, like this one:

Q: What is the Atlantic City of France?

A: Monaco. (Not a city in France. But borders France, is coastal and casino-oriented).

These come from my favorite of Hofstadter’s books, Fluid Concepts and Creative Analogies.¹ The thesis is that recognizing and extrapolating from patterns is a core aspect of — maybe the core of — intelligence.

Here’s the connection. To the extent that technologists fetishize innovation and newness, we risk overwhelming people with churn. “Forget what you thought you knew,” we tend to say. “This new thing changes everything.” Except, of course, it usually doesn’t.

For example, we’ve done a terrible job of explaining to the world that Twitter is, among other things, a recapitulation of the pub/sub pattern that most people first encountered in the blogosphere. The packets are smaller, the activation threshold is lower, but the same principles apply. You can extend what you know from the blog domain into the Twitter domain. And the two are complementary.

We aren’t getting that message across. Yesterday’s NY Times — featuring Maureen Dowd’s encounter with Twitter founders Evan Williams and Biz Stone — makes that painfully clear.

Analogies are crucial. The elmcity project boils down to this Ob-Platte puzzle:

Q: What is the RSS feed of calendars?

A: The iCalendar (ICS) feed.

We need to help people focus much less on fast-changing applications, protocols, and formats, and a lot more on constant underlying patterns and principles that they can learn and then extend by analogy.

¹My review of the book, for BYTE, is now gone too, I see, along with my InfoWorld archive. More proof, if proof were needed, that we need to take control of our lifebits.

A different take on ‘green’ Keene

23 Apr 2009 ~ Jon Udell ~ 8 Comments

It says here:

Portsmouth defeated by ‘green’ Keene

Municipal employees in Portsmouth and Keene, the state’s two predominant “green” cities, slugged it out over the course of three weeks and, in the end, Keene delivered the knockout punch.

Portsmouth accepted Keene’s challenge in late March to see which environmentally conscious city could get the highest percentage of municipal employees signed up for the New Hampshire Carbon Challenge by Earth Day. With a participation rate of 55 percent, Keene employees easily outperformed Portsmouth’s 41 percent.

That’s nice. I guess. I dunno. From my point of view, ‘green’ Keene has a long way to go. My struggle to get the city to issue its first-ever approval for a clean, modern, efficient wood gasifier was epic, and cost me more than a few sleepless nights.

Then last week the other shoe dropped. I found out, by accident, that I qualified for a property tax exemption. A qualifying wood heating system is defined as:

…a wood burning appliance designed to operate as a central heating system to heat the interior of a building.

Yep, that’s what my EKO-40 does. I get to reduce the taxable value of my property by $10,000. It’ll only save me a few hundred bucks a year, but that’s every year, so it’s nothing to sneeze at. I’m grateful.

But. During all that time I was struggling to get the system approved, no official in ‘green’ Keene said: Oh, by the way, we do encourage this kind of thing, and you’ll even qualify for an exemption, and in fact it’ll be the first one we’ve had the opportunity to do, and we’re excited about that!

Well, the secret’s out now. I’m happy to know that the next person to adopt central wood heating will be able to search, find precedent, and move forward.

Finding and connecting social capital

20 Apr 2009 ~ Jon Udell ~ 5 Comments

I spent some time over the weekend perusing the list of possible recurring events that my search robot found, and recording the useful/valid/appropriate ones in a calendar that syndicates into the Keene calendar hub.

It took me a half hour to go through the first 125 items in that list of 3300 search results. I found ten new recurring events for the Keene calendar. Three or four of those came from PDF newsletters that contained English paragraphs like:

Community Singers: Open singing group, no experience necessary, come for the joy of it. Thursdays from 10:45 to 11:45.

Using ordinary calendar software — in this case Live Calendar, but it could as easily have been Google Calendar, Outlook, Apple iCal, Eventful, or Upcoming — I turned these into iCalendar paragraphs like:

BEGIN:VEVENT
RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=TH;WKST=SU
DTSTART:20090416T104500
DTEND:20090416T114500
SUMMARY:Community Singers
DESCRIPTION:http://www.lifeartkeene.org
LOCATION:LifeArt Community Resource Center
END:VEVENT
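
To see what that RRULE line buys you, here is a small Python sketch that expands it into concrete dates. It handles only the simple case shown here (a weekly rule whose single BYDAY matches the weekday of DTSTART), and the function name is made up; a real calendar library does far more.

```python
from datetime import datetime, timedelta

def weekly_occurrences(dtstart, count, interval=1):
    """Expand a FREQ=WEEKLY rule into its first `count` start times."""
    start = datetime.strptime(dtstart, "%Y%m%dT%H%M%S")
    return [start + timedelta(weeks=interval * i) for i in range(count)]

# The DTSTART from the event above; every result falls on a Thursday.
for occurrence in weekly_occurrences("20090416T104500", 3):
    print(occurrence.strftime("%a %Y-%m-%d %H:%M"))
```

One line of recurrence rule stands in for an open-ended series of events.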

The first thought that will occur to technically-inclined readers is: “Hmm. How might I fully automate that transformation?” I understand, and share, that impulse. But I’m trying to set it aside for now, and focus on a different kind of solution.

At a geekish dinner recently, the conversation turned to automation. The geek mind and personality, someone suggested, tends toward an all-or-none approach. It cherishes algorithms that drive fully-automated processes to 100% completion. It does not value methods that achieve partial results, or systems that engage with people to help them do that refinement.

I think that’s true. As I went through the list of candidate events, I reflected on what I was doing. A lot of it wasn’t mere translation from English to iCalendarese. For example, here’s search result #37:

37. NSD – 2009_02_Issue.indd

Recreation Center, 312 Washington St., Keene, NH. Western Style Square Dance Apparel ….. “We have a dance every first and third Saturday, no matter what!!!”

Here’s the source of the location information:

And here’s the source of the time information:

These components are unrelated. Or rather they are related, just not in a way that machine intelligence is likely to be able to detect anytime soon. But human intelligence can easily figure out that:

There is an organization called Monadnock Squares
The dances happen at the Keene Recreation Center
These are the kinds of events that happen on regular recurring schedules

So I searched for Monadnock Squares, and wound up adding this event to the calendar:

At this point I realized what the tagline for this project should be. The one I’ve been using is accurate but uninspiring:

community calendar syndication

So I’m going to try this instead:

finding and connecting social capital

When Robert Putnam says that we are bowling alone he adds:

More Americans are bowling than ever before, but they are not bowling in leagues.

Yochai Benkler points out that the networked information economy enhances our ability to:

…do more in loose commonality with others, without being constrained to organize their relationship through a price system or in traditional hierarchical models of social and economic organization…

Maybe there’s plenty of social capital around, but it’s just harder to find, and connect with, because it’s no longer tightly coupled to traditional clubs, leagues, and organizations.

A lot of it is represented online, it turns out. It just isn’t published in a way that’s easy to find and connect with. I hope this project will help change that.

To that end, I’m wondering how to help curators process lists of many thousands of candidate events. Mechanical Turk comes to mind. It would be great to enable curators to carve their lists into batches of 100 and farm them out to volunteers. Is there a free Mechanical-Turk-like service for doing that?

A power tool for calendar curators

16 Apr 2009 (updated 20 Apr 2009) ~ Jon Udell ~ 4 Comments

This installment of the elmcity+azure series shows how curators can find and publish events that are discoverable online, but not available in a structured form that can syndicate in and out of one of the elmcity hubs.

Introduction

When a curator signs up for the elmcity calendar aggregator project, the first question is invariably: OK, where do I find calendars? Although the service aims to gather and republish iCalendar feeds, there’s a chicken-and-egg problem: The vast majority of calendar information on the web is implicit, not explicit. This project is all about finding a lot of that implicit data and making it explicit.

One great asset for calendar curators, as I’ve mentioned before, is FuseCal. I’m using it, for example, to turn this poorly structured page at the Keene Public Library into an iCalendar feed that can participate in the syndication network. Curators in Providence, Huntington, and elsewhere are having good success with FuseCal.

But the universe of implicit calendar information is much larger than FuseCal alone can address. There is no fully automatic way to discover that implicit data and make it explicit. So I’ve cooked up a computer-assisted method that I think will be a power tool for motivated curators. It’s based on a specialized search robot that looks for phrases like this:

"every thursday" "keene nh"
"4th sunday" "keene nh"

The output from the search robot, for Keene, is here. It contains over 3000 entries like this:

22. September08newsletter.nws

Keene, NH 03431. Return Address Requested. NON PROFIT ORG. U.S. POSTAGE PAID …. Every Thursday from 5:30-9pm. For more info call Ben Grant at 603-283-6601 …

…

29. Women’s Learn to Play Hockey | Model Web Site

… solution for you. The program runs from November 10 th through February 23 rd, every Monday … great program. Come see us at 149 Emerald Street in sales tax free Keene, NH

Many of these pages mention implicit recurring events that a curator can make explicit, by publishing them in a special iCalendar feed. Since I’ve already shown how to publish iCalendar feeds using Outlook, Google Calendar, and Apple iCal, I’ll add a fourth example to that series and show you how to do it using Microsoft Live Calendar.

Setting up a feed of recurring events

Let’s look at search result #22, September08newsletter.nws. It turns out to be a PDF newsletter from the Keene LifeArt Community Resource Center, whose official events page is under construction.

Here’s what the newsletter looks like:

It mentions a bunch of recurring events, including:

Community Singers: Open singing group, no experience necessary, come for the joy of it. Thursdays from 10:45 to 11:45.

Cheshire County Structured Storytelling: Creating and sharing stories about heroism and humor. Thursdays from 10:45 to 11:45.

Publishing one of these into an iCalendar syndication network, using Live Calendar, is a two-part process. Part one is a once-only setup of the feed. Part two happens once per recurring event.

Setup, Step 1: Add a new calendar

Setup, Step 2: Share the calendar

Click Import into another calendar application

Setup, Step 3: Capture the URL of the iCalendar feed

In this case, the link is webcal://jonu.calendar.live.com/calendar/recurring+events1/calendar.ics.

Setup, Step 4: Bookmark the feed in your curatorial account

Change webcal: to http: and use these three tags: ics, feed, and trusted.
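
If you are collecting a lot of feeds, the scheme swap is easy to script. A minimal Python sketch, with a made-up function name:

```python
def webcal_to_http(url):
    """Swap a leading webcal: scheme for http:, leaving other URLs alone."""
    prefix = "webcal:"
    if url.lower().startswith(prefix):
        return "http:" + url[len(prefix):]
    return url
```

Applied to the link above, it gives the URL to bookmark.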

Adding recurring events to the feed

Now repeat these steps for each recurring event you’d like to publish:

Per-event, Step 1: What, where, when

Per-event, Step 2: Recurrence

Per-event, Step 3: Details

Use the best link you can find for the organization, group, or individual sponsoring the event. Anyone who discovers the event at the hub, or in any feed that comes from the hub, will follow that link to find out more about the sponsor and the event.

Outcomes

On the next aggregation cycle, the event will be captured and reflected back in various ways. Here’s the internal representation:

<event>
  <title>Community Singers</title>
  <url>http://www.lifeartkeene.org</url>
  <source>recurring events: LifeArt Community Resource Center</source>
  <dtstart>2009-04-16T10:45:00</dtstart>
</event>

Here’s how it’ll show up in the default HTML rendering:

Thu 10:45 AM Community Singers (recurring events: LifeArt Community Resource Center)
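
For the curious, the step from that internal representation to the rendered line can be approximated in Python as follows. This is a guess at the shape of the transformation, not the hub's actual code; the function name and the formatting choices are assumptions that happen to reproduce the line above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def render_event(xml_text):
    """Turn one <event> element into a 'day time title (source)' line."""
    event = ET.fromstring(xml_text)
    start = datetime.strptime(event.findtext("dtstart"), "%Y-%m-%dT%H:%M:%S")
    hour12 = start.hour % 12 or 12          # 0 and 12 both display as 12
    meridiem = "AM" if start.hour < 12 else "PM"
    when = f"{DAYS[start.weekday()]} {hour12}:{start.minute:02d} {meridiem}"
    return f"{when} {event.findtext('title')} ({event.findtext('source')})"
```

Note that the url element is not used in this text-only version; in the HTML rendering it would presumably become the link on the title.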

Why recurring events?

Because there’s a high payoff. You could publish an individual event this way, but once it scrolls past the event horizon it’s gone. When you publish a recurring event, though, you’re creating a gift that will keep on giving.

Why should a curator have to do this?

You can certainly try to explain to sponsors that they can do this for themselves. But in my experience, most aren’t open to that discussion. As a curator, though, you can model the behavior you’d like them to emulate. As their events syndicate, and show up in various places, they’ll begin to notice, and will wonder why they’re not the authoritative sources for their own information. At that point, you can say: “You can be! And you should be! It’s easy! Let me show you how!”

Talking with Erin Kenneally about digital forensics in a connected world

15 Apr 2009 ~ Jon Udell ~ 2 Comments

My guest on this week’s Innovators show is Erin Kenneally, a lawyer who helps law enforcement agencies think about digital forensics, and about the authenticity of evidence in a connected world. Methods that were considered best practices not long ago — like shutting down computers, capturing images, and analyzing them — are no longer practical in an ecosystem of always-on services. It’s tempting to say that cyberspace rewrites all the rules of the game, but as Erin points out, that’s not really true. There are always logs, and people responsible for those logs, and procedures for managing those logs — in physical as well as in virtual space. When a case comes before a judge, a well-documented set of best practices regarding physical custody of computer systems is likely to be as relevant as the cryptographic methods that may have been used to protect and validate the bits.

Someday all this will be relevant to the lifebits scenario I envision. In that model I push as much of my personal data as is feasible to the cloud, surround it with a set of access control and auditing services, and route transactions there whenever I can. When you and I do business, my view of our transactions is logged and audited in a system I control, governed by practices I can document.

What happens when I’m compelled to provide evidence or documentation, but don’t want to cough it up? If I’m running my lifebits service in a translucent way, the cloud infrastructure never sees my data unencrypted. But while that’s feasible, it radically limits my ability to allow automated transactions against my data. So in practice I’ll want to let the infrastructure access the data as my proxy. Doing that in a controlled environment, with a robust access control scheme that’s uniform across all my transactions, and with comprehensive auditing, will be vastly preferable to the worsening mess we’re in now.

Community calendar curation: The startup guide

10 Apr 2009 (updated 27 Sep 2011) ~ Jon Udell ~ Leave a comment

Suppose your community is Ypsilanti, Michigan. The steps are as follows.

1. Choose an identifier for your new hub. For example: ypsicals.

2. Choose an identifier for yourself. This can be a Facebook name (or id), a Twitter name, a Gmail address, or a Windows Live email address.

3. Notify the elmcity administrator that you’re ready to start your new hub.

4. When you receive confirmation that the hub has started, visit http://elmcity.cloudapp.net and log in using your Facebook, Twitter, Gmail, or Windows Live identity.

Stepping into the river with Heraclitus

8 Apr 2009 ~ Jon Udell ~ 8 Comments

From the Benjamin Jowett translation of Plato’s Cratylus:

Socrates: Heracleitus is supposed to say that all things are in motion and nothing at rest; he compares them to the stream of a river, and says that you cannot go into the same water twice.

Heraclitus would have loved the web. The ever-changing river of information, flowing around and through us, brings his doctrine of flux vividly to life.

But I don’t think most of us truly embrace the doctrine of flux. We’re more comfortable with stocks than with flows. And that’s one of the key challenges I’m wrestling with at elmcity.cloudapp.net.

The service is not a database, and does not contain stocks of events. It is, instead, a hub that coordinates flows of events. But that model doesn’t make immediate sense to people.

Q: How do I put my events into your database?

A: You don’t. You just arrange for your database to publish a feed. The hub subscribes to your feed and to others, and in turn publishes a merged feed to a network of downstream subscribers. From the hub, or from anywhere in that network, people who find your events then follow links back to the authoritative source: You.
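
For readers who do think computationally, the hub's role can be boiled down to a few lines of Python. The Event type and the merge function are illustrative inventions, not the service's code. The point is only that the hub stores nothing of its own: it merges what the feeds publish, and each event keeps the link back to its source.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: str        # ISO 8601 timestamp, so string order is time order
    title: str
    source_url: str   # link back to the authoritative publisher

def merge_feeds(feeds):
    """Combine events from every subscribed feed into one time-ordered feed."""
    merged = [event for feed in feeds for event in feed]
    return sorted(merged, key=lambda e: e.start)
```

Downstream subscribers receive the merged list, and following source_url takes them back to the publisher.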

It’s a wonderful model, but a lousy elevator pitch. Given that most people are not computational thinkers, it relies too much on counter-intuitive principles: indirection, pub/sub, loose coupling. And it seems to be all about flux, which is psychologically challenging. Amidst all this dynamic flow, can’t we please have some constant anchor?

Actually we can. At the core of the service is a collection of feeds. The collection changes too, but much more slowly than the streams of events flowing through each feed. Feeds are relatively constant, and so they are manageable. For example, there are far too many events to make individual trust decisions about, but I can easily trust a feed or not.

Of course there are some ways in which we do need to manage individual items in feeds. For example, you can pluck an event from an iCalendar feed and stick it onto your personal calendar. Or you can transfer a single MP3 from an audio feed to your player. This feed/item duality makes the conceptual challenge of feeds even harder.

One new place to experience that duality is SpokenWord.org. My collection there is a mixture of feeds and items. Why items? Feeds are prospective. They look into the future for new items, and make them available to my podcatcher. But when I subscribe to a feed, I may want to make archival items available too. At SpokenWord I do that by “collecting” individual items.

In an interview with Phil Windley, SpokenWord’s founder and developer Doug Kaye says that he was surprised by the extent to which the service is being used to manage feeds rather than items. That’s true inbound, when people submit feed URLs rather than item URLs for inclusion in the catalog. And it’s also true outbound, when they use SpokenWord to consolidate many feeds into a single merged feed that’s more convenient to receive and manage in a podcatcher.

In different ways, the elmcity and SpokenWord projects are both encouraging and enabling people to think in terms of flows, and to manage feeds instead of items.

As we all become more adept at working with flows, I think we’ll need a richer vocabulary of patterns to describe them. At SpokenWord, for example, there’s a big difference between this feed and this one.

The first, based on a script I wrote, points to 45 chapters of Mark Twain’s Connecticut Yankee in King Arthur’s Court. It will never produce a new item. It’s just a way of packaging those chapters so a podcatcher can download them.

The second feed is The Moth Podcast, which is both a historical archive and a way to prospectively grab new items.

Neither pattern corresponds to an iCalendar feed, which is purely prospective. In calendar space I care about today’s events and future events, but almost never about the past.

In rivers of information, feeds provide one kind of anchor. Patterns that describe the nature of feeds, and our ways of using them, are another. These are sources of constancy amidst the flux.

A conversation with Seth Grimes about the voice of the customer

6 Apr 2009 ~ Jon Udell ~ 1 Comment

My guest for this week’s Innovators show is Seth Grimes. He’s a business intelligence expert who, nowadays, is really keen on text analytics. In this conversation he explains why: There’s suddenly a lot more text available for analysis.

And it’s not simply because the web continues to grow. The conversational dimension of the web — blogging, and now microblogging — holds particular interest for companies that need to monitor, and make sense of, what is being said about them online.

Seth uses the wonderful phrase “voice of the customer” to describe this new opportunity. As people increasingly narrate their lives and experiences online, that voice becomes easier to hear — or anyway, to read.

Couple that with “voice of the company” applications, exemplified by Public Service of New Hampshire’s use of Twitter during the December ice storm, and maybe we can begin to restore a market dynamic in which these two complementary voices can hear — and respond to — one another.

iCalendar validation: status report

6 Apr 2009 ~ Jon Udell ~ 4 Comments

One of the ongoing themes of my calendar aggregation project is the notion that iCalendar files are (or should be) calendar feeds in the same way that RSS and Atom files are blog and microblog feeds.

As I began to explore this idea, I realized that iCalendar feeds are all over the map in the same way that RSS feeds used to be when there wasn’t a robust, well-known validator. So began a parallel effort to improve the state of iCalendar validation.

I’ve written a series of entries on this topic, based on early observations. Now that curators for the aggregation project are finding more iCalendar feeds in the wild, we’re gathering more data for the validation effort.

Here’s the set of iCalendar-feed-generating software products that has emerged so far:

Google Calendar 70.9054
iCalcreator 2.2.8
FuseCal Software
Coldfusion8 
Drupal iCal API
Intand Corporation//Tandem for Schools
Zvents Ical 
Trumba Calendar Services 0.11.5203
WebCalendar-v1.1.2 
Meetup Inc//RemoteApi
iCalendar-Ruby

For each city or town participating in the elmcity project, there’s a stats page that reports how two different parsers handle the set of iCalendar feeds collected for that city or town.

The first parser is the currently-available online validator based on iCal4J.

The second parser is DDay.iCal, the component I’m using to parse and load calendars.

The outcomes reinforce what I saw in the table of results shown here. Parsers sometimes disagree about which feeds are valid, and why or why not.

My hunch is that this isn’t actually a huge problem. I think that as we:

collect more examples of iCalendar feeds in the wild,
converge on the complete set of software products that produce those feeds,
and run all the feeds through the available set of parsers,

we’ll find that there are maybe a dozen or so issues that account for the bulk of the discrepancies.
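To make the hunch testable, the per-feed verdicts from each parser can be tallied so that the most common causes of disagreement float to the top. This is a rough sketch under stated assumptions: the shape of the `results` dictionary, the feed URLs, and the issue strings are all invented for illustration and are not the actual output of either parser.

```python
from collections import Counter

def tally_disagreements(results):
    """results maps feed URL -> {parser name: (is_valid, issue or None)}.

    Returns the feeds on which the parsers disagree, plus a count of the
    issues reported by whichever parser rejected each of those feeds.
    """
    disagreements = []
    issues = Counter()
    for feed, verdicts in results.items():
        validity = {ok for ok, _ in verdicts.values()}
        if len(validity) > 1:  # at least one parser said yes and one said no
            disagreements.append(feed)
            for ok, issue in verdicts.values():
                if not ok:
                    issues[issue] += 1
    return disagreements, issues

# Made-up sample data; real verdicts would come from the stats pages.
sample = {
    "http://example.org/a.ics": {"iCal4J": (True, None), "DDay.iCal": (True, None)},
    "http://example.org/b.ics": {"iCal4J": (False, "missing PRODID"), "DDay.iCal": (True, None)},
    "http://example.org/c.ics": {"iCal4J": (True, None), "DDay.iCal": (False, "bad DTSTART")},
    "http://example.org/d.ics": {"iCal4J": (False, "missing PRODID"), "DDay.iCal": (True, None)},
}

disagreements, issues = tally_disagreements(sample)
print(disagreements)
print(issues.most_common())
```

If a dozen or so issues really do dominate, the `most_common()` list should fall off quickly after the first few entries.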

But that’s only a hunch. To confirm it we’ll need to gather the data and do the testing. If the elmcity project succeeds in finding and cataloging enough of the iCalendar feeds out there in the wild, we’ll have the set of feeds that need to be analyzed.

In parallel, I’ll try to run the data through more than the two parsers I’m currently using. I’m aware of iCalendar.py and vObject and will roll those in as I can. If either of these is available as a service, let me know; that’ll make things easier. And if there are other parsers that could be included, ping me about them. Again, if they’re available as a service, that’d be ideal.

Facts and friction

3 Apr 2009 ~ Jon Udell ~ 22 Comments

Last weekend we all had a good chuckle when we saw that WolframAlpha knows — or anyway claims to know — the airspeed of an unladen swallow. But the more telling example, for me, was one that Stephen Wolfram showed in a post-demo discussion:

Suppose you want to know the distance to Pluto. We don’t just look it up. We answer the question: “What is the distance to Pluto right now?” And we compute the answer.

I reckon that this notion of computable knowledge is going to take a while to sink in. Here’s another example:

Q: length of grand canyon / height of mt. everest

A: 50.4.

These examples run the risk of seeming geeky and pointless. But twice in the last few days, I’ve found myself reaching for bits of computable knowledge that weren’t readily available, and that’s got me thinking about what things might be like when they are.

Both examples are from my elmcity+azure project. In one case, I needed to work out distances — based on latitude/longitude coordinates — for locations that might be written as Providence RI or Ann Arbor, MI. There’s no shortage of online services that can do this. But they all report results in different ways, and digging the answers out of XML responses — which may or may not require special handling for embedded namespaces — can be very tricky.
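Once the coordinates are in hand, the distance itself is the easy part. Here is a minimal sketch using the haversine formula, which is one common choice and not necessarily what the project uses. The `COORDS` table holds approximate coordinates typed in by hand and stands in for whatever geocoding service would supply them; `normalize_place` is a hypothetical helper for reconciling the two spellings of place names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def normalize_place(name):
    """Map 'Providence RI' and 'Ann Arbor, MI' to one 'city, st' form."""
    parts = name.replace(",", " ").split()
    return " ".join(parts[:-1]).lower() + ", " + parts[-1].lower()

# Approximate, hand-entered coordinates (an assumption, not service output).
COORDS = {
    "providence, ri": (41.82, -71.41),
    "ann arbor, mi": (42.28, -83.74),
}

a = COORDS[normalize_place("Providence RI")]
b = COORDS[normalize_place("Ann Arbor, MI")]
print(round(haversine_km(*a, *b)), "km")
```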

In the other case I wanted population data for cities whose names are written the same way. Here I wound up digging it out of a CSV file published at http://www.census.gov. It’s perfectly doable, but you’ve got to really want to do it. If you have, say, a count of calendar events in Providence, and you want to divide that by population in order to produce an experimental metric for creative class activity, you can’t just write “population of Providence RI” in the denominator and proceed with your experiment. You have to overcome some fairly serious data friction.
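For a sense of what that friction looks like in code, here is a small sketch. The column names (`NAME`, `STATE`, `POPULATION`) and the inline rows are assumptions made for illustration; the real census file is laid out differently, and the population figures are placeholders, not checked census values. The event count is likewise hypothetical.

```python
import csv
import io

# Stand-in for the downloaded file; layout and values are illustrative only.
SAMPLE_CSV = """NAME,STATE,POPULATION
Keene,NH,23000
Ann Arbor,MI,115000
Providence,RI,48779
"""

def load_populations(csv_text):
    """Build a {(city, state): population} lookup from CSV text."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["NAME"].strip().lower(), row["STATE"].strip().lower())
        table[key] = int(row["POPULATION"])
    return table

def population_of(place, table):
    """Accept either 'Providence RI' or 'Ann Arbor, MI' style names."""
    parts = place.replace(",", " ").split()
    city, state = " ".join(parts[:-1]).lower(), parts[-1].lower()
    return table[(city, state)]

populations = load_populations(SAMPLE_CSV)
events = 910  # hypothetical event count
ratio = events / population_of("Providence RI", populations)
print(round(ratio, 2))
```

All of this scaffolding exists only so that one line can say `events / population_of("Providence RI", populations)`, which is the point about friction.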

In a few months we’ll all get to tirekick WolframAlpha. Then we’ll draw our own conclusions about what it can or can’t do, and is or isn’t good for. I’m not expecting a Delphic oracle. But I would like to be able to compute with facts in a more frictionless way.

Competing for the creative class, revisited

3 Apr 2009 ~ Jon Udell ~ 5 Comments

In the current build of elmcity.cloudapp.net, the statistics page for each instance of the calendar aggregator reports a line like this one for Providence, RI:

All events 910, population 48779, events/person 0.02

I’m not exactly sure where this might lead, but I’m thinking that it could evolve into a population-independent metric of what Richard Florida calls “creative class” activity. As I learned at the Cities of Knowledge conference in 2007, city planners are now thinking explicitly about how to compete on the basis of such activity.

If you’re Asheville NC or Portsmouth NH, you can’t compete in absolute terms with San Francisco or New York. But you can compete with them on a relative basis. And you can also compete with similar-sized neighbors like Greenville NC and Dover NH. Here’s an early peek at the data:

city	population	events/person
keene, nh	23,000	0.07
ann arbor, mi	115,000	0.04
providence, ri	48,000	0.02

It’s not surprising that Keene ranks first, because I began the experiment in that town and have been curating its events for some time. But that’s exactly what interests me about this process.

I can’t measure the actual events-per-person ratio for these cities, because there’s no way to know that. Most events aren’t reported in a machine-processable way, and so cannot be counted.

What a curator can do, though, is help make the creative class activity that’s really going on not just visible, but countable. Suppose that city A has less activity than B, but does a good job of curating what it has. City A might thereby create the impression that it has more. And by doing so, it might kick off a virtuous cycle that makes that impression real.

Are there any city planners who are gathering and using this kind of metric?