Negotiating shared responsibility for community information

This week’s Interviews with Innovators show is a conversation with Raymond Yee, author of the recently published Pro Web 2.0 Mashups.

The book is chock full of good examples. Even if you’re an experienced developer of mashups that involve Flickr, del.icio.us, Eventful, and the various mapping services, you’ll learn helpful strategies for using these services individually and in combination.

What we wound up mostly talking about, though, is the vast space of information that’s not currently available to be mashed up. That might be because the information isn’t online at all, or because it isn’t online in a form that’s tractable.

As a kind of social experiment I’ve been tackling this problem in my local community, with particular emphasis on calendar information. In this week’s interview, Raymond talks about tackling the same kind of problem with emphasis on geographic information. Both cases can exemplify a pattern that I’m calling shared responsibility.

Consider, for example, the public library. It hosts a variety of events, some of which are its own (children’s story hour) and some of which aren’t (an AA meeting). Who’s responsible for putting these events onto the library’s public calendar?

Clearly the library should publish its own events. But it needn’t necessarily feel obliged to publish other organizations’ events. In the case of AA meetings, for example, the library is only one of about a dozen venues around town. Shouldn’t AA publish its events to those venues?

We have the tools and services now to enable this kind of small-pieces-loosely-joined approach. In this case, acting as a proxy for AA, I published its regular meetings to Eventful. One of those meetings happens at the public library. So now when you visit the combined calendar, events at the library show up from multiple sources. One is clearly identified with the library itself, others are identified with the various groups using the library.

Of course nothing prevents the library from choosing to authoritatively publish all of the events that it hosts. But it’s useful to show how that can be a choice, not an obligation. If we take a decentralized, small-pieces-loosely-joined approach, information management chores that look insurmountable can turn out not to be.

A conversation with Ray Ozzie about Live Mesh

Ray Ozzie joined me for this week’s Perspectives show. It’s available there as audio plus a text transcript, and you can also watch the video on Channel 9.

Ray opens the conversation by reflecting on his transition to Microsoft three years ago, and on the roles he and Craig Mundie will play as they jointly inherit Bill Gates’ responsibilities.

Next the conversation turns to a meme that Tim O’Reilly once evangelized: the Internet operating system. That phrase never resonated as powerfully as Web 2.0 did, but the ideas behind it are becoming realities. Ray applauds the work that Amazon and Google have done in this area. And he talks about how Microsoft’s legacy as a platform company, dedicated to helping developers succeed, will influence its approach.

In that context, Ray explores one piece of Microsoft’s emerging Internet operating system: the newly-announced Live Mesh. Sharing common DNA with earlier projects, notably Groove and before that Notes, Live Mesh is a data synchronizer born to the Web. The objects that it synchronizes are represented as RSS and Atom feeds, and are manipulated with a RESTful API that works symmetrically on local and cloud-based nodes.

Although the most visible Live Mesh application is a file-and-folder synchronizer, Ray notes that this is just one example of an application pattern that can apply equally to the synchronization of custom objects, like calendar events, across all the devices in a mesh. It also applies across the spectrum of application types, ranging from the browser to conventional rich clients to Web-based rich clients like Flash and Silverlight.

There’s another pattern for Live Mesh applications, one that’s less familiar. In this pattern, a website uses Live Mesh as a pipeline to communicate with Live Mesh users. If you’re running a travel site, or a bank, you can use that pipeline to transmit structured data to your users, such as itineraries or transaction reports. Those XML feeds are easy to create, and you can leverage the Live Mesh infrastructure to deliver them securely and reliably at scale. They synchronize across all devices in each user’s Live Mesh, and they’re accessible to local applications using the same RESTful feed APIs that were used to create them.

“We posted weekly.pdf to the website. Isn’t that good enough?”

It’s almost 10 years since I began producing and consuming data feeds, initially in RSS format. Although I regard the syndication of data feeds, in general, as a transformative technology, the concept still makes no sense to civilians and has little or no effect on their lives.

In order to understand why not, and as a way of figuring out how to motivate a practical understanding of syndication, I’m tackling a problem whose solution doesn’t involve RSS, or Atom, or microformats, or XML. The problem is calendar syndication, and part of the solution is iCalendar, a non-XML format that all widely-used calendar programs support well enough for my purposes.

It’s only part of the solution because the real problem is that most people, most of the time, for most of their calendar-related activities, don’t use calendar programs. They use spreadsheets and wordprocessors, and they produce unstructured web pages and PDF files.

There was a time when, behind their backs, I would mock them for doing so. No longer. As I meet with intelligent and well-educated professionals in my community, and talk with them about how to synchronize calendar information from a variety of sources, I realize that they simply have no intuition about the difference between a PDF file and an ICS file that contain the same calendar information. Both are computer files, right? Both can be posted to the web, right? Both can be searched, right? Problem solved.

There are really two aspects to this missing intuition. First, the concept that some kinds of computer files are more structured than other kinds. Second, the concept that the structured kind can flow easily around the Net without loss of fidelity, and can deliver use value in a variety of contexts, whereas the unstructured kind is inert.
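To make the distinction concrete, here is a minimal Python sketch of what "structured" buys you. The iCalendar fragment and its event details are invented for illustration, and parse_events is a hypothetical helper, not part of any calendar library. It is deliberately naive: it ignores line folding, property parameters, and time zones.

```python
# A hypothetical iCalendar fragment; the event details are invented.
ICS_TEXT = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY:Children's story hour
DTSTART:20080405T100000
DTEND:20080405T110000
LOCATION:Public Library
END:VEVENT
END:VCALENDAR
"""


def parse_events(ics_text):
    """Collect the NAME:VALUE properties of each VEVENT into a dict."""
    events = []
    current = None  # the event being read, or None when outside a VEVENT
    for line in ics_text.splitlines():
        name, _, value = line.partition(":")
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None and value:
            current[name] = value
    return events


events = parse_events(ICS_TEXT)
# Every field is addressable by name, so another program can reuse it.
print(events[0]["SUMMARY"], "starts at", events[0]["DTSTART"])
```

A PDF or web page carrying the same words offers no equivalent of events[0]["DTSTART"]. A program would have to guess which characters are the date, which is the sense in which the unstructured kind is inert.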

These are ways of computational thinking unknown to most people. As a school administrator, librarian, city planner, social worker, or retail store owner, nobody expects you to understand and apply these principles.

And yet almost everybody needs to harmonize personal and organizational calendars. And many individuals and organizations need to flow their calendar data into other contexts to promote and coordinate their activities.

So here’s my approach. I’m scooping up all the calendar information I can find for my community, in whatever form I can find it, and flowing it into a common view. Then I’m syndicating that view elsewhere to show that there’s nothing special about my aggregation.

The idea is to establish a critical mass by brute force, and allow people to see how, over time, sources that are structured and can syndicate will remain in the game, and sources that aren’t will have to sit out on the sidelines.

It’s turning into a nice case study of how organizations and individuals can negotiate shared responsibility for calendar information that’s of common interest. But that’s a story for another day. First things first. I need to give people a reason to care about using a calendar program — any calendar program, could be Outlook or Apple iCal or Google Calendar, so long as it exports iCalendar — in preference to a spreadsheet or word processor. Although the geek tribe can scarcely imagine why, that first step is a doozy.

A conversation with Deepak Singh about science in the web 2.0 era

For this week’s Interviews with Innovators show I spoke with Deepak Singh. This interview extends what has become an ongoing series of discussions with folks who are applying the principles of web 2.0 to the practice of science. This was, of course, the original purpose of web 1.0.

Other Innovators shows on this topic include conversations with Joel Selanikio about epidemiological data collection, Barbara Aronson about giving poor countries free subscriptions to biomedical journals, and Timo Hannay about the impressive stream of online innovations that’s flowing from the Nature Publishing Group.

My new Perspectives series has also explored this theme of Net-enabled science. There, I’ve talked with Catharine van Ingen and Dennis Baldocchi about collaborative analysis of atmospheric CO2 data, and with Pablo Fernicola about using Word to produce scientific articles in the National Library of Medicine’s XML format.

Panoramic Westmoreland

For some reason I’ve never gotten around to doing stitched-together panoramic photos until recently. Today, with spring fever raging, I hopped on my bicycle, did one of my favorite circuits, and made this 360 view of Park Hill in Westmoreland:

It turned out to be an interesting study in perception. If you check the enlarged view, you’ll see a tiny, insignificant-looking church in the center of the spread, dwarfed by mailboxes in the foreground. In my memory of the scene, that church was the dominant feature. But what my eyes actually saw is what the camera saw: a tiny, insignificant-looking church.

Next time I’ll need to stand closer to it. And I’ll need to bear in mind that what we think we see is a heavily interpreted version of what hits the retinas.

Still, it was fun. I love that you can see the handlebars of my bicycle on the left, and the seat on the right.

I’m sure there are lots of ways to do this, and I’ve never really looked into it, but Windows Live Photo Gallery makes the whole thing a snap. The whole process, from camera import to photo stitching to Flickr upload, took under 10 minutes. And most of that was CPU time.

Radio commentary on citizen use of public data

A while ago I recorded a commentary for New Hampshire public radio on the topic of public data. The themes will be familiar to readers of this blog: transparency, citizen use of government data. I wondered when it would air, and then last night, while doing the dishes, I heard myself on the kitchen radio.

The piece is available on the NHPR site here. Will it make sense to folks listening at their kitchen sinks, or driving in their cars? I hope so, because as powerful an idea as this is, it’ll go nowhere until it does make sense to those folks.

Syndication of rules versus syndication of data

To follow up on last week’s item about parsing the kinds of dates and times that people actually write, Google Calendar’s Quick Add feature looks like the clear winner. Here’s a test page with expressions like:

Third Saturday of Every Month, 10 – 11:30 am

Let’s try the Chronic module from Ruby:

irb(main):007:0> Chronic.parse('Third Saturday of Every Month, 10 - 11:30 am')
=> nil

No joy.

As David French pointed out, Google Calendar’s Quick Add gets this right. Or anyway, close enough. There seems to be a small bug that pokes an instance of the event into today’s slot, whether or not today is a 3rd Saturday. But otherwise it works great.

There are tougher challenges on that test page, like:

9:00 am – 1:30 pm, North Conference Room 1
April: April 5 and 12
May: May 3 and 10
June: June 7 and 14

I doubt anything we’ve mentioned so far can touch that, though I’d be happy to be proven wrong.

Meanwhile, the ability to capture recurring events like ‘Third Saturday of Every Month, 10 – 11:30 am’ for my aggregated community calendar has raised a new question. When I use Google Calendar for this purpose, its iCal export doesn’t enumerate the series, it defines a rule:

LOCATION:Cheshire Medical Center
RRULE:FREQ=MONTHLY;INTERVAL=1;BYDAY=3SA;WKST=MO

When I pull that event into elmcity.info/events, the RRULE (recurrence rule) only fires once each time the feed is fetched. And that’s fine. I don’t necessarily want to see these recurring events on the calendar into the far future.

But while I can syndicate these events directly from Google Calendar into elmcity.info, I would rather route them through Eventful.com. The reason is social, not technical. Although I’m herding almost all these events into my aggregator for the time being, I want their rightful owners to claim them at some point and take care of them thereafter. Eventful is better suited for the kind of commons-based peer production I’m hoping to encourage.

But I don’t see how to inject dynamic rules, rather than static events, into Eventful. You could run the rule yourself, then poke the generated events into Eventful, but that’d create maintenance woes when events are rescheduled, modified, or cancelled. I’d rather syndicate the rule than the data.
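For illustration, here is roughly what running the rule yourself would involve. This is a minimal Python sketch under stated assumptions, not a description of how Google Calendar, elmcity.info, or Eventful work. It handles only monthly rules with a single BYDAY value such as 3SA, it ignores WKST, and the names nth_weekday and expand_monthly are made up for this example.

```python
import datetime
import re

# iCalendar day codes mapped to Python's weekday numbers (Monday is 0).
WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday in a month (n from 1 to 4)."""
    first = datetime.date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first match
    return first + datetime.timedelta(days=offset + 7 * (n - 1))


def expand_monthly(rrule, start, count):
    """Enumerate `count` dates on or after `start` for a simple monthly rule."""
    parts = dict(item.split("=", 1) for item in rrule.split(";"))
    if parts.get("FREQ") != "MONTHLY":
        raise ValueError("this sketch only handles FREQ=MONTHLY")
    match = re.fullmatch(r"([1-4])(MO|TU|WE|TH|FR|SA|SU)", parts.get("BYDAY", ""))
    if match is None:
        raise ValueError("this sketch only handles BYDAY values like 3SA")
    n, weekday = int(match.group(1)), WEEKDAYS[match.group(2)]
    interval = int(parts.get("INTERVAL", "1"))

    dates = []
    year, month = start.year, start.month
    while len(dates) < count:
        day = nth_weekday(year, month, weekday, n)
        if day >= start:
            dates.append(day)
        # Step forward by `interval` months, rolling over into the next year.
        month += interval
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
    return dates


rule = "FREQ=MONTHLY;INTERVAL=1;BYDAY=3SA;WKST=MO"
dates = expand_monthly(rule, datetime.date(2008, 4, 1), 3)
print([d.isoformat() for d in dates])
# ['2008-04-19', '2008-05-17', '2008-06-21']
```

The output is a static list. If the organizer moves the meeting to the second Saturday, every generated copy downstream is wrong until somebody regenerates it, whereas a syndicated rule would be corrected in one place.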

A conversation with Phil Libin about EverNote’s new memex

In his 1945 Atlantic Monthly essay As We May Think, Vannevar Bush famously imagined the memex, a mechanism that would augment human memory. This idea of mental augmentation inspired Doug Engelbart, and we’ve been chasing the dream ever since. On this week’s Interviews with Innovators, Phil Libin discusses EverNote, a new software-plus-services offering that aims to become your memex.

Listeners may recall that Phil appeared on the show once before. In fact he was the first guest in this series. Then he was CEO of Corestreet, a company tackling the problem of large-scale credentials validation in really interesting ways. Now, as EverNote’s CEO, he’s tackling a very different problem. But although EverNote is an application for ordinary folks rather than for governments and major institutions, it raises its own set of scale issues. And not just in terms of scaling out numbers of users and quantities of storage. EverNote wants to scale in the dimension of time as well.

Like me, Phil’s a huge fan of the Long Now Foundation. When he says that EverNote wants to guarantee the integrity of the digital objects that you commit to it forever, he’s not kidding.

While it’s refreshing to see a Web 2.0 company taking this long view, Phil admits that addressing the forever challenge in a meaningful way is beyond the means of EverNote. I’d add that it’s beyond any individual organization, and will require a federation of players to hammer out not only technical standards, but also shared business arrangements.

That’s not going to happen anytime soon, but then EverNote isn’t currently making guarantees that sentimental memorabilia will be preserved for your great-grandchildren. Instead it wants to guarantee that you’ll have effective near-term use of operational memorabilia — key documents, and in particular photos from which it finds, extracts, and indexes text.

The idea with this photo feature is that you can take pictures of receipts, wine labels, magazine pages, or event posters, dump the pictures into EverNote, and then find the photos by searching for the text in them. EverNote’s secret sauce here is its ability to find text not only in high-res scans, but also in “crappy cellphone photos taken at an angle.”

As Phil points out, from EverNote’s perspective the world comes at its users in two modes. First, when they’re away from their computers and out in the world, usually with some kind of camera. Second, when they’re at their computers, in which case they can take clippings from the web, or forward email.

I’m in that second mode a lot, so we’ll see whether EverNote becomes another of the memory augmentation methods I already use. These include blogging, email, and social bookmarking. Each method serves a communication function but also provides a repository where I often stash things purely so I can find them later.

Here’s an interesting and counter-intuitive aspect of EverNote. Human memory degrades over time. Digital memories, however, not only retain full fidelity, they can actually improve over time. Faces that you can’t find in your EverNote archive today may become recognizable next month or next year.

That’s true not only for EverNote, of course, but also for any system to which we commit digital objects. Human augmentation is powerful magic. We’re only starting to realize what it can do for us. And, I should add, to us.

Making sense of CO2 data: A scientific collaboration

This week on Perspectives, I explore the partnership between Dennis Baldocchi, a Berkeley climate scientist, and Catharine van Ingen, an MSR researcher. They’ve been working together on Fluxnet, a scientific data server and collaboration service for hundreds of scientists around the world who are measuring CO2 flux in the atmosphere and trying to understand the dynamics of that flux.

Science in the twenty-first century is increasingly a game of data curation and analysis, involving hundreds or thousands of players distributed all around the world. To make progress, teams will need to coordinate online. The coordination systems will emerge from partnerships like the one Dennis Baldocchi and Catharine van Ingen discuss in this interview.

It’s also fascinating to hear, from the horse’s mouth, what we actually know, and don’t know, about atmospheric CO2. And about how and why we know or don’t know. On key issues like global warming, there’s a huge gap between scientific knowledge and public understanding. Projects like this one can help close that gap.

Parsing human-written date and time information

I’m working on a project that aggregates a bunch of community calendars, plus a lot of calendar info that’s just written out free-form. Some examples of the latter, in ascending order of resistance to mechanical parsing:

Tue, 4/1/08

2 Apr – Wed 10:00AM-10:45AM

Weekdays 8:30am-4:30pm

Thu, 11/15/07 – Fri, 4/11/08

Every Tuesday of the month from 10:00-11:00 a.m

Sat., Apr. 05, 9:00 AM Registration/Preview, 10:00 AM Live Auction

2nd Saturday of every other month, 10:00 am-12:00 pm

Programming languages tend to offer lots of functions and modules for converting among machine formats, and for converting machine formats into human formats, but when it comes to recognizing human formats, not so much.

In looking around for a recognizer, I came across the script that Jamie Zawinski uses to manage the calendar for his DNA Lounge. It looks like it can handle many of these formats, but it’s a 6500-line Perl behemoth that does a bunch of different things.

What else is available, for any language, preferably more focused and packaged, that can turn an item in human format, like “2nd Saturday of every other month, 10:00 am-12:00 pm,” into a sequence of items in machine format?
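To show the shape of the target, here is a minimal Python sketch that recognizes just one family of these phrases and turns it into an iCalendar recurrence rule. It is an illustration under stated assumptions, not a general recognizer. The pattern, the lookup tables, and the function name to_rrule are all hypothetical, and the sketch skips the time-of-day part entirely.

```python
import re

# Small lookup tables; a real recognizer would need many more spellings.
ORDINALS = {
    "1st": 1, "first": 1,
    "2nd": 2, "second": 2,
    "3rd": 3, "third": 3,
    "4th": 4, "fourth": 4,
}
DAYS = {
    "monday": "MO", "tuesday": "TU", "wednesday": "WE", "thursday": "TH",
    "friday": "FR", "saturday": "SA", "sunday": "SU",
}

# Matches phrases like "2nd Saturday of every other month".
PATTERN = re.compile(
    r"(?P<ordinal>\w+)\s+(?P<day>\w+day)\s+of\s+every\s+(?P<other>other\s+)?month",
    re.IGNORECASE,
)


def to_rrule(text):
    """Return an RRULE string for a recognized phrase, or None otherwise."""
    match = PATTERN.search(text)
    if match is None:
        return None
    ordinal = ORDINALS.get(match.group("ordinal").lower())
    day = DAYS.get(match.group("day").lower())
    if ordinal is None or day is None:
        return None
    interval = 2 if match.group("other") else 1
    return f"FREQ=MONTHLY;INTERVAL={interval};BYDAY={ordinal}{day}"


print(to_rrule("2nd Saturday of every other month, 10:00 am-12:00 pm"))
# FREQ=MONTHLY;INTERVAL=2;BYDAY=2SA
print(to_rrule("Every Tuesday of the month from 10:00-11:00 a.m"))
# None
```

One hand-written pattern covers one phrasing. Each of the other examples above would need its own pattern, which is why a focused, packaged recognizer would be so welcome.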

Office XML: The long view

For many years I have tried, and mostly failed, to get people to appreciate the value of structured information. Sure, I’ve connected with the chattering classes who Twitter, blog, and read TechMeme, but I’ve only been preaching to the choir. Inside our echo chambers we grok XML, tagging, syndication, and information architecture. Out in the real world, though, most people aren’t hopping on that cluetrain, and that’s almost as true today as it was a decade ago.

Of course I’m not alone in my quest. Tim Berners-Lee has also tried, and mostly failed, to evangelize the power of structured information. The gating factor always was, and still is, data entry. You can go a long, long way with unstructured information, as Google has brilliantly shown. In late 2002 Sergey Brin told me:

Look, putting angle brackets around things is not a technology, by itself. I’d rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand.

That’s a great way to make progress, but we’re not in an either/or situation here. There’s also huge progress still to be made by enabling (not forcing) people to write in ways that computers can understand more deeply and effectively.

Jean Paoli saw an opportunity to do something about that on a large scale. It was also late 2002 when I first started talking to him about the injection of XML capabilities into Office. I evangelized that stuff long before I became a Microsoft evangelist, because I believed then, and still believe today, that it’s a crucial enabler for a world facing challenges that are infinitely compounded by almost universally crummy information management.

In the flurry of commentary surrounding yesterday’s approval of Office Open XML as an ISO standard, I haven’t seen anyone thank Jean and his team for having the vision to transform Office in this important way, and the constancy of purpose to make it real. Well, I’ll say it. Thanks!