Radio commentary on citizen use of public data

A while ago I recorded a commentary for New Hampshire public radio on the topic of public data. The themes will be familiar to readers of this blog: transparency, citizen use of government data. I wondered when it would air, and then last night, while doing the dishes, I heard myself on the kitchen radio.

The piece is available on the NHPR site here. Will it make sense to folks listening at their kitchen sinks, or driving in their cars? I hope so, because as powerful an idea as this is, it’ll go nowhere until it does make sense to those folks.

Syndication of rules versus syndication of data

To follow up on last week’s item about parsing the kinds of dates and times that people actually write, Google Calendar’s Quick Add feature looks like the clear winner. Here’s a test page with expressions like:

Third Saturday of Every Month, 10 – 11:30 am

Let’s try Ruby’s Chronic module:

irb(main):007:0> Chronic.parse('Third Saturday of Every Month, 10 - 11:30 am')
=> nil

No joy.

As David French pointed out, Google Calendar’s Quick Add gets this right. Or anyway, close enough. There seems to be a small bug that pokes an instance of the event into today’s slot, whether or not today is a 3rd Saturday. But otherwise it works great.

There are tougher challenges on that test page, like:

9:00 am – 1:30 pm, North Conference Room 1
April: April 5 and 12
May: May 3 and 10
June: June 7 and 14

I doubt anything we’ve mentioned so far can touch that, though I’d be happy to be proven wrong.

Meanwhile, the ability to capture recurring events like ‘Third Saturday of Every Month, 10 – 11:30 am’ for my aggregated community calendar has raised a new question. When I use Google Calendar for this purpose, its iCal export doesn’t enumerate the series, it defines a rule:

LOCATION:Cheshire Medical Center
RRULE:FREQ=MONTHLY;INTERVAL=1;BYDAY=3SA;WKST=MO

When I pull that event into elmcity.info/events, the RRULE (recurrence rule) only fires once each time the feed is fetched. And that’s fine. I don’t necessarily want to see these recurring events on the calendar into the far future.
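For what it’s worth, a rule like that can be expanded outside of Google Calendar too. Here’s a sketch using Python’s dateutil module, whose rrulestr function parses iCalendar RRULE syntax (the start date is an arbitrary anchor I’ve supplied, not part of the rule itself):

```python
from datetime import datetime
from dateutil.rrule import rrulestr

# Parse the iCalendar recurrence rule: third Saturday of every month.
# The dtstart is an arbitrary anchor for the series.
rule = rrulestr("FREQ=MONTHLY;INTERVAL=1;BYDAY=3SA;WKST=MO",
                dtstart=datetime(2008, 4, 1))

# Enumerate the first three occurrences of the series.
occurrences = rule[:3]
for dt in occurrences:
    print(dt.strftime("%a %b %d %Y"))
# Sat Apr 19 2008 / Sat May 17 2008 / Sat Jun 21 2008
```

The rule object is a lazy generator, so a consumer can take as many or as few future instances as it wants — which is exactly the syndicate-the-rule-not-the-data idea.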

But while I can syndicate these events directly from Google Calendar into elmcity.info, I would rather route them through Eventful.com. The reason is social, not technical. Although I’m herding almost all these events into my aggregator for the time being, I want their rightful owners to claim them at some point and take care of them thereafter. Eventful is better suited for the kind of commons-based peer production I’m hoping to encourage.

But, I don’t see how to inject dynamic rules, rather than static events, into Eventful. You could run the rule yourself, then poke the generated events into Eventful, but that’d create maintenance woes when events are rescheduled, modified, or cancelled. I’d rather syndicate the rule than the data.

A conversation with Phil Libin about EverNote’s new memex

In his 1945 Atlantic Monthly essay As We May Think, Vannevar Bush famously imagined the memex, a mechanism that would augment human memory. This idea of mental augmentation inspired Doug Engelbart, and we’ve been chasing the dream ever since. On this week’s Interviews with Innovators, Phil Libin discusses EverNote, a new software-plus-services offering that aims to become your memex.

Listeners may recall that Phil appeared on the show once before. In fact he was the first guest in this series. Then he was CEO of Corestreet, a company tackling the problem of large-scale credentials validation in really interesting ways. Now, as EverNote’s CEO, he’s tackling a very different problem. But although EverNote is an application for ordinary folks rather than for governments and major institutions, it raises its own set of scale issues. And not just in terms of scaling out numbers of users and quantities of storage. EverNote wants to scale in the dimension of time as well.

Like me, Phil’s a huge fan of the Long Now Foundation. When he says that EverNote wants to guarantee the integrity of the digital objects that you commit to it forever, he’s not kidding.

While it’s refreshing to see a Web 2.0 company taking this long view, Phil admits that addressing the forever challenge in a meaningful way is beyond the means of EverNote. I’d add that it’s beyond any individual organization, and will require a federation of players to hammer out not only technical standards, but also shared business arrangements.

That’s not going to happen anytime soon, but then EverNote isn’t currently making guarantees that sentimental memorabilia will be preserved for your great-grandchildren. Instead it wants to guarantee that you’ll have effective near-term use of operational memorabilia — key documents, and in particular photos from which it finds, extracts, and indexes text.

The idea with this photo feature is that you can take pictures of receipts, wine labels, magazine pages, or event posters, dump the pictures into EverNote, and then find the photos by searching for the text in them. EverNote’s secret sauce here is its ability to find text not only in high-res scans, but also in “crappy cellphone photos taken at an angle.”

As Phil points out, from EverNote’s perspective the world comes at its users in two modes. First, when they’re away from their computers and out in the world, usually with some kind of camera. Second, when they’re at their computers, in which case they can take clippings from the web, or forward email.

I’m in that second mode a lot, so we’ll see whether EverNote becomes another of the memory augmentation methods I already use. These include blogging, email, and social bookmarking. Each method serves a communication function but also provides a repository where I often stash things purely so I can find them later.

Here’s an interesting and counter-intuitive aspect of EverNote. Human memory degrades over time. Digital memories, however, not only retain full fidelity, they can actually improve over time. Faces that you can’t find in your EverNote archive today may become recognizable next month or next year.

That’s true not only for EverNote, of course, but also for any system to which we commit digital objects. Human augmentation is powerful magic. We’re only starting to realize what it can do for us. And, I should add, to us.

Making sense of CO2 data: A scientific collaboration

This week on Perspectives, I explore the partnership between Dennis Baldocchi, a Berkeley climate scientist, and Catharine van Ingen, an MSR researcher. They’ve been working together on Fluxnet, a scientific data server and collaboration service for hundreds of scientists around the world who are measuring CO2 flux in the atmosphere and trying to understand the dynamics of that flux.

Science in the twenty-first century is increasingly a game of data curation and analysis, involving hundreds or thousands of players distributed all around the world. To make progress, teams will need to coordinate online. The coordination systems will emerge from partnerships like the one Dennis Baldocchi and Catharine van Ingen discuss in this interview.

It’s also fascinating to hear, from the horse’s mouth, what we actually know, and don’t know, about atmospheric CO2. And about how and why we know or don’t know. On key issues like global warming, there’s a huge gap between scientific knowledge and public understanding. Projects like this one can help close that gap.

Parsing human-written date and time information

I’m working on a project that aggregates a bunch of community calendars, plus a lot of calendar info that’s just written out free-form. Some examples of the latter, in ascending order of resistance to mechanical parsing:

Tue, 4/1/08

2 Apr – Wed 10:00AM-10:45AM

Weekdays 8:30am-4:30pm

Thu, 11/15/07 – Fri, 4/11/08

Every Tuesday of the month from 10:00-11:00 a.m

Sat., Apr. 05, 9:00 AM Registration/Preview, 10:00 AM Live Auction

2nd Saturday of every other month, 10:00 am-12:00 pm

Programming languages tend to offer lots of functions and modules for converting among machine formats, and for converting machine formats into human formats, but when it comes to recognizing human formats, not so much.

In looking around for a recognizer, I came across the script that Jamie Zawinski uses to manage the calendar for his DNA Lounge. It looks like it can handle many of these formats, but it’s a 6500-line Perl behemoth that does a bunch of different things.

What else is available, for any language, preferably more focused and packaged, that can turn an item in human format, like “2nd Saturday of every other month, 10:00 am-12:00 pm,” into a sequence of items in machine format?
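For comparison, Python’s dateutil parser — like Chronic — handles the easy end of that spectrum but gives up on the recurring forms. A quick sketch:

```python
from dateutil import parser

# Simple single-date forms parse fine.
d = parser.parse("Tue, 4/1/08")
print(d)  # 2008-04-01 00:00:00

# Recurring, free-form expressions are beyond it.
try:
    parser.parse("2nd Saturday of every other month, 10:00 am-12:00 pm")
except ValueError:
    print("no joy")
```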

Office XML: The long view

For many years I have tried, and mostly failed, to get people to appreciate the value of structured information. Sure, I’ve connected with the chattering classes who Twitter, blog, and read TechMeme, but I’ve only been preaching to the choir. Inside our echo chambers we grok XML, tagging, syndication, and information architecture. Out in the real world, though, most people aren’t hopping on that cluetrain, and that’s almost as true today as it was a decade ago.

Of course I’m not alone in my quest. Tim Berners-Lee has also tried, and mostly failed, to evangelize the power of structured information. The gating factor always was, and still is, data entry. You can go a long, long way with unstructured information, as Google has brilliantly shown. In late 2002 Sergey Brin told me:

Look, putting angle brackets around things is not a technology, by itself. I’d rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand.

That’s a great way to make progress, but we’re not in an either/or situation here. There’s also huge progress still to be made by enabling (not forcing) people to write in ways that computers can understand more deeply and effectively.

Jean Paoli saw an opportunity to do something about that on a large scale. It was also late 2002 when I first started talking to him about the injection of XML capabilities into Office. I evangelized that stuff long before I became a Microsoft evangelist, because I believed then, and still believe today, that it’s a crucial enabler for a world facing challenges that are infinitely compounded by almost universally crummy information management.

In the flurry of commentary surrounding yesterday’s approval of Office Open XML as an ISO standard, I haven’t seen anyone thank Jean and his team for having the vision to transform Office in this important way, and the constancy of purpose to make it real. Well, I’ll say it. Thanks!

My close encounter with the Hannaford data breach

My debit card was one of the potentially 4.2 million exposed in the recent Hannaford data breach. Here’s part of the letter from my bank, the Savings Bank of Walpole.

I’ve thanked them privately, and want to thank them publicly as well, for being proactive and doing the right thing here. They’re dealing with fallout from a problem they didn’t create.

Details are still emerging, and we don’t yet have the full story. As the InfoWorld story notes, Hannaford’s servers might have been compromised by a remote exploit through the network, or a local exploit made possible by unauthorized physical access.

In the aftermath, most of the usual defense-in-depth strategies are being rehashed, and that’s good. But one-time account numbers still aren’t on the radar screen, and I keep on wondering: Why not?

A conversation with Tim Spalding about LibraryThing

I had a great time talking about LibraryThing with Tim Spalding for this week’s ITConversations show. He says LibraryThing is a baroque application. I think of it as deep in the same ways that Flickr is: Many features, many modes of use, many constituencies. Although Tim is flagellating himself about the way we swam around in those depths, I enjoyed the conversation immensely. If you’re fascinated by the dynamics of social information management — whether or not you are a book-lover — I think you will too.

We wound up talking for almost two hours. I omitted the second hour not only for reasons of length, but also because it raised a question that neither of us felt we were able to address very well. As mentioned in comments here, though, it does warrant further consideration. A lot of folks, me included, feel that the inability to move identity and relationships across social networks is increasingly an impediment to joining them and participating in them.

But Tim rightly points out that friction has value. Rites of initiation are costly for a reason. When you invest effort you create meaning. So here’s the question. How do we separate those aspects of social information management that should be portable and frictionless from those that should be unique and special?

Cluster computing, with large data, for the classroom

This week’s Perspectives is a two-parter: an interview and companion screencast on the topic of cluster computing in the classroom. The interview is with Kyril Faenov, the General Manager of the Windows HPC (high performance computing) unit, and the screencast is with Rich Ciapala, a program manager for Microsoft HPC++ Labs.

The project demonstrated in the screencast, and discussed in the interview, is called CompFin Lab. It’s a system that enables professors to, in turn, enable their students to run computationally expensive financial models on large quantities of data. From the student’s perspective, you go to a SharePoint server, select a computational model, pick a basket of stocks, and run the model. Behind the scenes the task is partitioned and sprayed across a cluster of computers, then the results are gathered and presented in an Excel spreadsheet.

From the professor’s point of view, some .NET programming is required. But a framework abstracts the mechanics of dealing with the cluster, so the professor can focus on the logic of the model itself.
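That partition-and-gather pattern can be sketched in miniature with Python’s pool API. The model function and basket here are made-up stand-ins, of course, not anything from CompFin Lab:

```python
# Thread-based Pool; same map/gather API as the process-based version.
from multiprocessing.dummy import Pool

def run_model(symbol):
    # Stand-in for a computationally expensive financial model:
    # just compute a fake "score" for a ticker symbol.
    return symbol, sum(ord(c) for c in symbol)

basket = ["MSFT", "GOOG", "AMZN", "YHOO"]
with Pool(4) as pool:
    # Partition the basket across workers, then gather the results.
    results = dict(pool.map(run_model, basket))
print(results)
```

The framework the professor codes against plays the role of the pool here: it hides the partitioning and gathering so the model function is all that’s left to write.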

There are a couple of key points about the evolution of high-performance computing that I want to highlight here. First, there’s what Kyril calls “the gravitational pull of data.” Increasingly, people and organizations are building vast repositories of data that other people and organizations will want to analyze in computationally expensive ways. It’s great to have access to a compute cluster in the cloud that can do the heavy lifting, but when datasets get really big you get bottlenecked trying to send the data to where the code runs. At a certain point you’d rather send the code to where the data lives.

A second and related point is that in our current model for large-scale cloud-based computing, there are only a handful of what I call intergalactic clusters — namely, those operated by Google, Yahoo, Amazon, and Microsoft. These are one-of-a-kind behemoths. You can’t replicate one of them locally and apply it to your terabytes of data. So as Kyril and his team build out their cloud-based HPC services, they’re working to ensure the services can be replicated locally.

Maybe the most optimal thing is for you to stand up a 1000-node cluster with each node having a terabyte of disk. We want to enable that. We want to be able to tell our customers: Here’s how we run these large-scale data-driven HPC applications, and here’s how, within a day or two, you can stand up one of these yourself.

The idea is that if you build one of those for your own terabyte trove of astronomical or climatological data, you can run your own computations against that data, and you can also share that capability with other people and organizations who want to run their code against your data.

Revisiting the InfoWorld metadata explorer

A while ago I wrote an alternative search and navigation interface to InfoWorld.com. The search is broken now because the underlying engine switched from Ultraseek to Google, and nobody has updated the search wrapper. But the navigation piece still works, and while it does, I want to invite some commentary because I’m thinking of doing something similar for another project.

In this model the navigation is metadata-driven, and supports views like:

InfoWorld stories tagged ‘Silverlight’

InfoWorld news stories tagged ‘Silverlight’

InfoWorld news stories by Elizabeth Montalbano tagged ‘Silverlight’

Every piece of metadata in the tabular display is active, and toggles a filter for that item. This works especially well for the tags, and enables you to cruise through the tagspace in a fluid way. For example, try this progression:

1. InfoWorld news stories tagged ‘Silverlight’

2. Click ‘flash’ to toggle it on

3. InfoWorld news stories tagged ‘Silverlight’ and ‘Flash’

4. Click ‘silverlight’ to toggle it off

5. InfoWorld news stories tagged ‘Flash’

The same principle holds for other bits of metadata, like storytype. So for example:

1. InfoWorld news stories tagged ‘Silverlight’

2. Click ‘News’ to toggle it off

3. InfoWorld stories tagged ‘Silverlight’

4. Click ‘Review’ to toggle it on

5. InfoWorld Reviews tagged ‘Silverlight’

6. Click ‘Martin Heller’ to toggle it on

7. InfoWorld Reviews by Martin Heller tagged ‘Silverlight’

8. Click ‘silverlight’ to toggle it off

9. InfoWorld Reviews by Martin Heller

It’s powerful to explore things this way, but if I did something like this again, I’d look for ways to make these filter progressions more intuitive and discoverable.
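Under the hood, this is just a set of active filters that each click toggles. A minimal sketch in Python, with made-up story records standing in for the InfoWorld data:

```python
# Hypothetical story records; the real metadata comes from InfoWorld.com.
stories = [
    {"type": "News", "author": "Elizabeth Montalbano", "tags": {"silverlight"}},
    {"type": "News", "author": "Paul Krill", "tags": {"silverlight", "flash"}},
    {"type": "Review", "author": "Martin Heller", "tags": {"silverlight"}},
]

active = set()  # current filter state, e.g. {("tag", "flash")}

def toggle(facet, value):
    # Clicking a metadata item adds its filter if absent, removes it if present.
    active.symmetric_difference_update({(facet, value)})

def matches(story):
    for facet, value in active:
        if facet == "tag" and value not in story["tags"]:
            return False
        if facet != "tag" and story.get(facet) != value:
            return False
    return True

toggle("tag", "silverlight")
toggle("type", "News")
print([s["author"] for s in stories if matches(s)])
# ['Elizabeth Montalbano', 'Paul Krill']
```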

I just don’t think people expect every item to work as a control as well as an information display. And because they don’t, it may be a bad idea to do things that way. Or maybe it’s a good idea that’s still in search of its perfect expression. I’d be curious to know what you think.

Rediscovering LibraryThing

To prepare for an interview with Tim Spalding, the founder and lead developer of LibraryThing, I re-registered with LibraryThing, spent some quality time with the service, and was wildly impressed.

At one point in the interview, Tim asked me how I, Mr. LibraryLookup, as likely a person as there is to use and appreciate LibraryThing, could have gone so long without hooking up with it.

I think part of the answer is hidden in the first paragraph: I had to re-register for the service, which I had tirekicked a year or two ago. The friction of joining and re-joining online services has become a major barrier.

There’s also conceptual friction. LibraryThing is a deep application that does lots of things, but on the surface, it appears to be a mechanism for cataloging books that you own. In fact it isn’t only that; you can also load it with books that you’ve read, or might read, as a way to seed discovery and recommendation.

Finally, there’s data friction. There are bibliophiles who will obsessively catalog their own collections, but I’m not one of them. I do, however, maintain a list of books on my Amazon wishlist. I syndicate that list to the version of LibraryLookup that alerts me when books on the wishlist become available in my local library.

What I needed was a frictionless way to reuse that list. And on this go-round with LibraryThing I found it. Sort of. You can import your Amazon wishlist into LibraryThing, which is a great way to jumpstart the discovery and recommendation process. It doesn’t yet syndicate from Amazon, so the initial import won’t be refreshed, but Tim says that’s coming.

It turns out not to matter at all that the list of books I’m interested in happens to be an Amazon wishlist. All that matters is that I can keep it in some service, somewhere, that can syndicate data to other services elsewhere.

A conversation with Carl Malamud about access to public information

This week’s ITConversations show is a chat with Carl Malamud, whose exploits I’ve followed ever since he launched podcasting a decade ahead of schedule with a project called Internet Talk Radio. Since then, Carl’s mainly known for his tireless crusade to release troves of public information to the Net: SEC filings, patents, Congressional video, historical photographs, and most recently, U.S. case law.

One of the questions I wanted to explore with Carl is also raised here by John Montgomery:

Popfly, a mashup tool, depends on three things: data that is simple to access programmatically, interesting, and available under terms that enable users to work with it. As with most software endeavors, you can pick two.

The government has a huge amount of interesting data that’s available under really great terms. Weather? Check out http://www.noaa.gov. Financial information? Start with http://www.sec.gov. Crime statistics? Dig around in http://www.usdoj.gov/. But how much of this is programmatically accessible? Very little, as it turns out.

John mentions the Sunlight Foundation’s efforts to provide an intermediary layer of services that make raw data easier to access and manipulate, and I raised that point with Carl. From his perspective, of course, it all starts with the data, which he is rightly focused on providing. Even though the U.S. is far ahead of many other countries in this regard, there are oceans of important information not yet available even in raw form.

Carl has enormous faith in the Net’s ability to interconnect and enhance these raw sources, and I do too. Here’s a small but significant example. If you view source on 28 Fed.R.Serv.3d 415, you’ll see one of my favorite strategies at work: semantic metadata encoded using CSS style tags. That enables an important kind of programmatic access. Now it’s true that today, Internet search engines don’t support queries that ask for documents where Shelby Reed appears as a plaintiff in an appeal to the U.S. Court of Appeals, Fifth Circuit. Someday, though, that kind of query will be supported, and the latent semantics of this rendering of U.S. case law will emerge.
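The pattern is just HTML whose class attributes carry meaning. Here’s a sketch of how a program might mine it, using Python’s standard HTML parser; the class name is hypothetical, not the one actually used in that document:

```python
from html.parser import HTMLParser

class SemanticExtractor(HTMLParser):
    """Collect text from elements whose class attribute names a semantic role."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted  # class names we care about
        self.stack = []       # currently open elements' (possibly None) classes
        self.found = []       # (class, text) pairs

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.stack.append(cls if cls in self.wanted else None)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        for cls in self.stack:
            if cls:
                self.found.append((cls, data.strip()))

# Hypothetical markup in the spirit of the case-law rendering.
html = '<p><span class="party-plaintiff">Shelby Reed</span> v. ...</p>'
p = SemanticExtractor({"party-plaintiff"})
p.feed(html)
print(p.found)  # [('party-plaintiff', 'Shelby Reed')]
```

Nothing about the page’s visual rendering changes; the semantics ride along silently until a program like this comes looking for them.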

These enhanced services don’t necessarily just arise from the grassroots, however. Resource-rich organizations are often in the best position to provide them. One example, we agreed, is the New York Times’ stunningly effective visualization of presidential election debates. Ideally we’d be able to visualize all of the proceedings of Congress in the same way. That’s probably too much to expect of public-interest groups running shoestring operations. But what such groups can do is apply Carl’s favorite technique: Create a few high-profile examples, and then pressure the government into internalizing the process.

Perspectives: Understanding CardSpace with Vittorio Bertocci

The second installment of Perspectives is up, with Vittorio Bertocci, author of Understanding Windows CardSpace. This interview was recorded a few months ago, and has been waiting for the Perspectives site to launch. In January I excerpted the part about omnidirectional identity, a difficult phrase that I continue to struggle with. Maybe a better one is Internet persona: the social mask that you project when you self-publish online, and to which reputation attaches. Whatever we call this phenomenon, its Laws of Identity — not only for people, but also for digital objects — are not yet well defined.

Most of the interview, though, concerns the existing “unidirectional” mechanisms supported by CardSpace. I asked Vittorio to relate those mechanisms to precursors like SSL client certificates and Kerberos, and also to the complementary OpenID system. As discussed in my ITConversations podcast with Dick Hardt, the principles that govern this identity machinery are abstract and, until we experience them firsthand, will be hard for most of us to grasp. But Vittorio does a good job of explaining those principles in terms of concrete examples.

A close call: photos lost, then found

While reviewing a white paper by a colleague on the subject of personal digital archives, I realized that I hadn’t followed through on a plan to consolidate a few different caches of digital photos from various digicam and computer eras. So of course, when I went looking, things weren’t exactly the way I remembered. One particular batch was missing, and there were some anxious moments while I booted up dormant computers and mounted shelved disks. In the end I found the missing set, but although I could have sworn they were in three safe places, there was really only one.

In these moments of panic, the need for a lifebits service becomes crystal clear. But the moments pass, and we move on. Most people, most of the time, don’t yet feel the need for that kind of service.

Inevitably that will change. I wonder how, and when?

When the LazyWeb gets too lazy

I’m running a couple of services that make automatic use of Amazon wishlists, and today I noticed that the current version of the API is going away:

503 – Service Unavailable

ECS3 is currently unavailable due to a planned outage in preparation for the complete shutdown of ECS3 on March 31, 2008.

After March 31, 2008, we will no longer accept Amazon ECS 3.0 requests. Please upgrade to the Amazon Associates Web Service (previously called Amazon E-Commerce Web Service 4.0) by then to ensure that you or your customers are not affected by the upcoming deprecation.

Amazon ECS 3.0 deprecation was announced a year ago in February 2007. You can read the original post at http://developer.amazonwebservices.com/connect/ann.jspa?annID=164.

In preparation of the March 31st deprecation, the Amazon ECS 3.0 web service will experience several outages. The complete outage schedule can be viewed at http://developer.amazonwebservices.com/connect/ann.jspa?annID=276.

Please refer to the migration guide for assistance in mapping Amazon ECS 3.0 calls to their Amazon Associates Web Service 4.0 equivalents. You can find the migration guide at http://developer.amazonwebservices.com/connect/entry.jspa?categoryID=12&externalID=627. Please use the Amazon Associates Web Service forum to ask technical questions and share answers with your fellow developers.

We thank you for being part of Amazon’s Developer community and look forward to your continued support.

Like Rich Burridge, I’ll be needing a replacement for PyAmazon, the Python module Mark Pilgrim wrote long ago to simplify use of the original Amazon API.

In our modern world of aggregation, search, and syndication, it’s easy to wait and see what will happen. I went to bloglines and searched for blog items that — like Rich’s and now mine — point to Amazon’s page about migrating to the new API. And then I subscribed to that search.

In a way, this is too easy. I can imagine a bunch of people camped on that query, watching the clock and waiting for someone else to step up to the plate before March 31. The first time around, when Amazon web services were new and shiny, it was cool to be that person. Now, not so much.

Update: A couple of folks have pointed to PyAWS. As mentioned in Rich Burridge’s blog entry, it doesn’t seem to offer, e.g., a single call to retrieve all items from a wishlist. However, when I reviewed my use of the earlier PyAmazon, in terms of raw interaction with the RESTful API and its XML output, I remembered how simple that interaction was. It’s just as simple in the new Amazon API, just slightly different. Encapsulating what I needed to do required only a few lines of code.
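That raw interaction really is just: fetch a URL, parse the XML, pull out the fields you want. A sketch against a canned response — the element names here are illustrative, not Amazon’s actual schema:

```python
import xml.etree.ElementTree as ET

# In real use you'd fetch the response, e.g.:
#   xml_text = urllib.request.urlopen(request_url).read()
# Here, a canned stand-in for a wishlist response (illustrative schema):
xml_text = """
<WishlistItems>
  <Item><Title>As We May Think</Title><ASIN>B000XXXXX1</ASIN></Item>
  <Item><Title>The Long Now</Title><ASIN>B000XXXXX2</ASIN></Item>
</WishlistItems>
"""

root = ET.fromstring(xml_text)
titles = [item.findtext("Title") for item in root.findall("Item")]
print(titles)  # ['As We May Think', 'The Long Now']
```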

Generalizing that encapsulation is much harder. And when you have to repeat that hard work for many different languages, and for many different APIs, the inevitable result is that these per-language API wrappers tend to lag.

That’s one reason I’m looking forward to services built on Astoria ADO.NET Data Services, or an equivalent normalization layer. I think it can substantially narrow the gap between RESTful APIs and the convenience wrappers we enjoy in various programming languages.

A conversation with Ward Cunningham about visible workings and aboutus.org

This week on ITConversations I have a two-part interview with Ward Cunningham. In part one, we explore his implementation of Brian Marick’s visible workings idea, which combines software testing with business process transparency. This is one of those transformative ideas that will not, at first, seem interesting and important to most people. And maybe it never will. But then again, Ward has a track record. The wiki idea didn’t at first seem interesting and important to most people either, and look what’s happened there. So, you never know. Maybe in 2020 we’ll notice that business software is a lot more reliable and understandable than it used to be, and we’ll look back and say: Ward did it again.

In part two, we discuss Ward’s new wiki-based venture, aboutus.org. It’s a directory that aims to become a sort of extended WHOIS database, where domain name owners — along with anyone who reads the websites attached to those domains — can collaboratively describe the people, companies, and organizations represented by those websites. I like the concept, but I wish it weren’t necessary to sign up in order to update http://aboutus.org/jonudell.net. Instead I’d prefer to describe myself on my own hosted lifebits service, wherever that might be, and then syndicate the information to aboutus.org and elsewhere.

Missing the cluetrain

I wasn’t going to post this humorous anecdote but Mike Caulfield reminded me that it’s too funny not to share. After musing about a subscription service for running shoes, I walked into my local store, bought a new pair, and invited them to notify me in three months. Hilarity ensued.

He: We’re not really set up to do that.

Me: You could email me.

He: Yeah, but then we’d have to keep some kind of customer database on the computer.

Oh, right. Having a database of customers who’ve invited you to contact them on a regular basis … that’d suck, wouldn’t it?

Perspectives, a new interview series, launches today

Today I’m launching a new Microsoft-oriented interview series called Perspectives. The show will touch on a variety of topics including robotics, digital identity, e-science, and social software. I’ll be speaking mostly with passionate Microsoft innovators, and sometimes also with key partners from academia and industry.

The format is an audio podcast and a blog, where the blog provides a partial (but substantial) text transcription in order to make these conversations accessible to folks who don’t listen to podcasts, and also to expose them to the Net’s ecosystem of search, linking, and aggregation. Where appropriate, I’ll also use screencasts to show software in action.

Perspectives runs on the same publishing platform that supports Channel 10 (for enthusiasts), Channel 8 (for students), TechNet Edge (for IT pros), and VisitMIX (for Web designers and developers). (Channel 9, the original site, will migrate to this platform too.) Perspectives intersects with the interests of all these sites, but it doesn’t really belong in any of them, so we’ve created an independent home for it. Thanks to the EvNet team, especially Duncan Mackenzie, David Shadle, and Jeff Sandquist, for making that happen.

The first episode, with Henrik Nielsen and Tandy Trower, explores the Microsoft Robotics initiative. We discuss why robotics is — as futurist Paul Saffo believes — a Next Big Thing. And Henrik and Tandy explain how the concurrency and decentralized-services infrastructure that supports the robotics platform is broadly relevant in an era of loosely-coupled services.

Ann Arbor’s public library is a beacon of progress

On the Ann Arbor public library’s website you can find a wonderful example of how two local institutions — the library and the police department — can work together to curate an online exhibit. In 2002, history buff and police sergeant Michael Logghe self-published the lavishly illustrated True Crimes and the History of the Ann Arbor Police Department. The library worked with Logghe to produce an online version of the book. And when he visited the library to speak about the book and the online exhibit, his talk was recorded and made available for download (as video or audio-only) from the library’s podcast feed. Nicely done!

In my Remixing the library talk, I said that the two-way web paves the way for this kind of productive teamwork. It’s not a natural reflex, as Cassandra Targett points out:

It’s a shift from being passive recipients of the world’s knowledge to active participants in its creation, a shift that in many ways goes against some of the deepest core principles of what has become library science.

For a profession steeped in the idea that our role is to describe packaged knowledge and then help people find it (and play no role in how they use it once we point the way to it), the idea that we can not only modify some types of packages or even create substantially new ones is quite foreign still.

As I noted in my interview with Adrian Holovaty about EveryBlock, the curatorial collaboration among local governments, newspapers and libraries can encompass more than text, images, audio, and video. Those same institutions can work together to curate data about the operation of government (crime, taxes, maintenance), about social and civic life (event calendars), about the environment (weather, air quality), and more.

Although it’s starting to happen more in the scientific realm, I haven’t yet found a good example of that kind of data-oriented collaboration in the civic realm. But the teamwork shown by Ann Arbor’s police department and public library embodies the spirit that will make it happen.

Linking to excerpts from the MIX keynotes

John Lam asked how to excerpt fragments of Steve Ballmer’s keynote, and the principle of keystroke conservation requires me to answer here. The VisitMIX page for the keynote lists three streams. The links point to .asx files, which are wrappers around references to media files or streams. In this case, the references point to streams, which means that you can excerpt fragments by specifying the starttime and duration parameters.

Here’s the medium-bandwidth .asx file into which I’ve inserted starttime and duration parameters to create a fragment that points to a question and answer about HealthVault.

<asx version="3.0">
  <title>mix08: steve ballmer</title>
  <entry>
    <title>mix08: steve ballmer on healthvault</title>
    <starttime value="52:50.0"/>
    <duration value="1:45"/>
    <copyright>copyright 2008. all rights reserved.</copyright>
    <ref href="mms://istreampl.wmod.llnwd.net/a269/o2/microsoft/300_microsoft_mix_080306.wmv" />
  </entry>
</asx>

I’ve posted the file at http://channel9.msdn.com/media/ballmer-keynote-healthvault.asx. It should play in Windows Media Player, and also in VLC on the Mac or Linux, though I can’t check those at the moment.
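For what it’s worth, linking to such an excerpt from a web page needs nothing fancier than an ordinary anchor pointing at the .asx file — a sketch, with link text of my own invention:

```html
<!-- The browser hands the .asx file to the media player,
     which reads the starttime/duration parameters inside it. -->
<a href="http://channel9.msdn.com/media/ballmer-keynote-healthvault.asx">
  Steve Ballmer on HealthVault at MIX08 (1:45 excerpt)
</a>
```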

In general, launching appropriate media players from a web page is a complex process. I’m hoping and expecting that Silverlight, over time, will simplify it, and help make rich media more granularly linkable.

A conversation with Michael Lenczner about community wifi in Montreal

In Montreal this Friday, McGill professor Darin Barney will be giving a version of his talk on citizenship and technology. Here’s an excerpt:

Each of the telegraph, telephone, radio and television was accompanied by its own heroic rhetoric of democratic transformation and reinvigorated civic engagement. None have delivered fully on this promise, but each has been crucial for the maintenance of a system of political and economic power in which most people are systematically distanced from the practice of citizenship most of the time. For the most part, these technologies have been means of anything but citizenship: spectacular entertainment; docile recreation; habituation to the rhythms of capitalist production and consumption; cultural normalization. The internet, as a radically decentralized medium whose capacity for publication and circulation far surpasses that of its broadcast predecessors, has certainly provided the means by which politically-engaged citizens can access and produce politically-charged information that would never have seen the light of day under the regime of the television and newspaper. This information can be an important resource for political judgment. But the Internet also surpasses its predecessors as an integrated medium of enrolment in the depoliticized economy and culture of consumer capitalism. This is why we should be wary of equating more and better access to information and communication technology with enhanced citizenship.

One Montreal resident deeply influenced by Barney’s critique of the Internet as an enabler of citizenship is Michael Lenczner, whom I interviewed for this week’s ITConversations show. Mike is a co-founder of Île Sans Fil, Montreal’s community wireless network. With over 150 access points and nearly 60,000 users, the project is a huge success, all the more so given that municipal wi-fi projects in other cities have failed to materialize. And yet, Mike questions the value of what’s been accomplished. The project’s goal was not merely to light up hotspots in downtown Montreal, but to enhance the “sociality” of the city and elicit more and better civic engagement. He doubts these goals have been achieved, and asks himself hard questions about how technology can be deployed to these ends. When I met Mike recently in Montreal, I said: “It amazes me that you’re asking yourself these questions.” He replied: “It amazes me that others don’t.”

Automation and accessibility in Silverlight and IE8

In this interview at MIX, Mark Rideout explains how Silverlight will use the same UIA (User Interface Automation) mechanisms that make Windows apps (and will make Linux applications) accessible by way of assistive technologies like screenreaders.

If you’re not somebody who needs that kind of assistance, you may not think this matters to you. But as I’ve pointed out in a series of essays, the flip side of accessibility is automation, and that’s something we all need.

For software developers, the automation framework provides the hooks needed to test the interactive behavior of applications.

For users, it provides the hooks needed to record, exchange, and replay software interactions. In The social scripting continuum I showed how IBM’s CoScripter enables people to share their knowledge of how to use web applications. It’s fabulous, but it’s restricted to the domain of simple web apps running in Firefox. IE, Ajax, Flash, Silverlight, and desktop apps are all out of scope.

With an automation/accessibility framework common to browsers, rich runtimes in browsers, and desktop apps, you could in theory enable a common way for people to describe and share their knowledge of how to use software across the full range of application types, for any browser, any rich runtime, and any operating system.

We’re not there yet, and we may never get there, but this Silverlight announcement points toward a future that’s worth imagining.

Update: In related news, John Resig notes that IE8 supports the W3C’s ARIA (Accessible Rich Internet Applications), which makes Ajax applications accessible to screenreaders. Here’s a brief guide for the perplexed, myself included, because this stuff is a layer cake.

Native accessibility toolkits, like MSAA (Microsoft Active Accessibility) and ATK (the GNOME Accessibility Toolkit on Linux), are the foundation. The Mozilla implementation of ARIA rests on this layer, as does the IE8 implementation. User Interface Automation (UIA), meanwhile, is part of the .NET Framework. It can be used to automate unmanaged apps like Word, as well as managed apps on the desktop or (now) in Silverlight. How UIA will be realized on Linux is something I don’t know, but would like to find out.
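To make the ARIA layer concrete, here’s a minimal sketch of my own (not taken from the IE8 or W3C docs) showing how ARIA attributes annotate a scripted widget that a screenreader would otherwise perceive as an empty box:

```html
<!-- Without ARIA, a div-based progress bar is opaque to
     assistive technologies. The role and aria-* attributes
     let the accessibility layer report what the widget is
     and what state it's in. -->
<div role="progressbar"
     aria-label="Upload progress"
     aria-valuemin="0"
     aria-valuemax="100"
     aria-valuenow="42">
</div>
```

A screenreader can then announce something like “Upload progress, 42 percent” instead of saying nothing at all, and the same annotations give automation tools a handle on the widget’s state.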

I can’t formulate a unified field theory that joins all these pieces, on various platforms, but I hope one will emerge.

Permalinking the Hard Rock Memorabilia exhibit

The Hard Rock Memorabilia exhibit is a great example of what becomes possible now that Seadragon Deep Zoom is integrated into Silverlight 2. The exhibit includes:

Madonna’s page in her high school yearbook:

Pat Boone’s shoes:

John Lennon’s handwritten lyrics to Imagine:

And there’s much more. When you choose subsets — by artist, decade, type (e.g. clothing, instruments), genre, location — the images retile, and they’re all navigable using Deep Zoom’s extreme zoom and pan capability.

Note that the links above lead directly into the exhibit and focus on the indicated asset. You acquire these from the Share link in the right pane, which exposes URLs of the form:

http://memorabilia.hardrock.com/Default.aspx?AssetId=8191

It’s great to see this permalink feature included. Deep Zoom is going to open up vast spaces for exploration, and in order to explore those spaces together we’ll need shared coordinate systems.

To that end, I’m hoping that future incarnations of this sort of exhibit will expose richer URL namespaces. If I want to show you Madonna’s yearbook in the context of the 1970s, I have to tell you to click Decade, then 1970, then choose the 2nd item in the 3rd row. It’d be great to be able to get you there directly:

memorabilia.hardrock.com/decade/1970/20352

And of course I’d want to locate Madonna for you, among her other classmates, by zooming to the desired view and then tacking those coordinates onto the URL.
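A hypothetical URL for that — the x, y, and zoom parameters are my invention, not anything the site supports today — might look like:

```
memorabilia.hardrock.com/decade/1970/20352?x=0.44&y=0.31&zoom=8
```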

If these precise locators are made available, conversations about the views they identify can form on the web. To see why it’s crucial to expose a public namespace, consider the David Rumsey map collection. There you can explore and precisely annotate an extraordinary collection of historical maps. And you search for those annotations within the Java-based viewer. But when you annotate a feature within a map, it doesn’t — so far as I can tell — produce a shareable URL. If those URLs were available, the collection would be woven into public discourse to a far greater degree than it is.

A couple of years ago, I asked whether rich Internet apps can be web-friendly. One of the responses came from Kevin Lynch at Adobe, who made this example showing how navigation within a Flash exhibit of images can be reflected on the URL-line.

I don’t think it matters much whether you expose the RIA’s state on the URL-line or by means of a permalink. What matters is that you do it, and do it in as granular a way as makes sense for the application.

PS: For extra credit, it’s nice to provide the underlying data for this sort of exhibit. When you’re exploring the Cubism timeline, for example, you can grab the data and mix it as you please.

WebSlices can help popularize feed syndication

With the release of the first public beta of Internet Explorer 8, two new features come to light: Activities and WebSlices. You can see a demo of both in Joshua Allen’s interview with Jane Kim. I think of Activities as next-generation bookmarklets, and also as kissing cousins to the OpenSearch providers that you can add to the browser’s search box.

WebSlices are something else again. They transform pieces of web pages into little feeds that you can subscribe to. For all its power and utility, feed syndication hasn’t yet really sunk into the consciousness of most people. I’m hoping that WebSlices, which are dead simple to create, will help bridge the gap.

Here’s a complete working example of a page with two slices:

<div class="hslice" id="1">  
<p class="entry-title">Slice 1</p>  
<p class="entry-content">This is slice 1.</p>
</div> 

<div class="hslice" id="2">  
<p class="entry-title">Slice 2</p>  
<p class="entry-content">This is slice 2.</p>
</div>

The syntax is based on the hAtom microformat, which in turn is a subset of the Atom feed format. For my purposes here, ’nuff said about that. I’m much more interested in what users will see, do, and understand. Let’s view that page in IE8:

The orange feed icon in the toolbar changes to a (presumably not final) purplish thingy. And when I hover over the second slice, another of those pops up. Both are lit, indicating there’s fresh content.

From either the toolbar or the inline hover, I can subscribe (to just the second slice) like so:

It shows up as a favorite, bolded to indicate fresh content:

From another page, I can peek at the slice’s content by clicking its button:

But when you click Favorites->Feeds, you’ll see it’s also a conventional feed:

I like this for a couple of reasons. First, because it will give microformats a big boost, and propel the data web forward. Second, because it will introduce many more people to the whole idea of subscribing to feeds. There’s a big conceptual barrier there that we haven’t yet brought most people across. I’m hoping that a new way of subscribing to a new kind of feed will also raise awareness about the old ways of subscribing to conventional feeds.

Ward Cunningham’s implementation of Brian Marick’s “Visible Workings”

In Portland last week I visited with Ward Cunningham, whose pragmatic and humane approach to the art of software informs everything he touches: the wiki; object-oriented, agile, and test-driven programming; the Framework for Integrated Test (FIT). (InfoWorld stuff about Ward here, here, and here.) Ward’s living the startup life these days, at aboutus.org, which describes itself as a “socially editable directory of the internet.” Think WHOIS morphed into a Wikipedia where you are not only permitted, but actively encouraged, to write the biography of your company or community.

But that’s not what we mostly talked about. Instead Ward took me behind the scenes at the portal for the Eclipse Foundation. Only members can participate in the workflows accessible through this portal: electing new committers, scheduling project reviews. But it turns out that anybody can explore the portal use cases.

Here’s a simple one: Change Personal Address. This is the part of the system that runs when a member changes facts about his or her address. You can see a test script that exercises this part of the system. You can even run the test script and inspect the results. Try that, and you’ll see that the output interleaves lines of script with renderings of what the user sees: screenshots, emails.

Finally you can swim the test. Here the steps and results are laid out in a table. Time advances as you move down the rows, and there’s a column for every actor in the workflow.

When you hover over an action step or a notification, the corresponding screenshot or email message pops up. This is a great way to visualize a complex email-mediated workflow that can involve many actors, and unfold over many days. But here’s the kicker: the visualization is also available to users, directly from the interface. Here’s the screen that you see when you’re changing your address:

Next to the Save button there’s an Explore link. If you click it, you’ll discover the same swim visualization that anyone, anytime, can explore here. Note the variations, most interestingly the one for the case where the person is a committer, and where the address change either does or does not coincide with a change of employer. If you did change employer, you’re going to get this email informing you that additional paperwork is required:

This isn’t just an innovative approach to software testing and workflow visualization. It’s also a radical statement about business process transparency. For most of us, most of the time, business systems are black boxes whose internal workings we can only discern in the outcomes of our (often painful) interactions with them. But what if you could find out, before pressing the Save button, what’s going on in that black box? And what if your way of finding out wasn’t by reading bogus documentation, but instead by probing the system itself using its own test framework?

It’s a huge idea. In a blog about this project, Ward writes:

The MyFoundation portal, once again, respects the curiosity and intellect of its users by exposing all aspects of the processes it supports. Who asked for this? No one. No one thought to. That doesn’t mean it isn’t needed.

Brian Marick calls this Visible Workings. He identifies a middle ground, between the traditional GUI presentation and the raw source code that produces it. This middle ground makes the application both explanatory and tinkerable. The portal’s swim diagram is our middle ground. We know it makes our work explanatory and look forward to investigating the tinkerable aspects too.

And elsewhere:

Online forms have too much in common with income tax forms. Nobody likes filling out either one. Each is a sea of fields, each field another question, one question after another. It is like being interrogated. Can we make filling out a form more like a conversation than an interrogation? The portal’s explore links suggest a way toward this goal. These links let you ask a question every now and then. You get to ask, “why do you ask?” Wouldn’t it be great if you could always do that?

Amen, brother.

A conversation with Adrian Holovaty about EveryBlock.com

For this week’s ITConversations show, Adrian Holovaty joins me to chat about EveryBlock, a new website that gathers and publishes “address-specific” information such as crime reports, building-code violations, and restaurant inspections.

Acquiring this information isn’t frictionless and raises questions about how this kind of data can be published usefully, as opposed to merely published. EveryBlock also raises broader questions about news gathering and reporting. The project, which is funded by a Knight Foundation grant, has attracted some criticism for not being journalistic in spirit. But Adrian Holovaty suggests that EveryBlock actually redefines news.

The previous criterion for something being covered in the newspaper was that it had to affect a lot of people in the readership. But if the pothole is fixed on your block, it’s news to you, just like what your friends are doing on Facebook is news to you. Instead of a friend feed, we’re making an address feed.

More broadly, as information that used to yield only to investigative shoe-leather starts to flow freely on the Net, journalists will be able to divert energy from data collection to analysis.

I get a little frustrated when the high-falutin’ journalists look at EveryBlock and say ‘How is this journalism? Why do you think this is replacing newspapers?’ Well, this isn’t intended to replace journalism at all; if anything, it’ll help you find trends going on in the world.

There’s also an open question as to which social institutions can best organize and curate these sources of information. Governments? Newspapers? Libraries? Self-organizing groups of citizens? I’m really curious to see how it plays out.

Where can I subscribe to a running-shoe-replacement service?

A few years back I realized that my knees and ankles were hurting because I’d put too many miles on my running shoes. No permanent injuries resulted, but a friend who outran his shoes wasn’t so lucky, and he’s got back problems for life.

This is a business opportunity. If you’re a runner, spending $100 every six (or even three) months is infinitely preferable to injury. You’d think that shoe sellers would make it easy to do that, but they don’t. I’d happily authorize regular replacements, but nobody’s ever offered me that option.

Partly I guess this is a failure of service-oriented thinking. My local seller thinks service means taking good care of me when I walk in, and he does. But I think service should also mean helping me manage a lifelong shoe-replacement regimen, and that notion seems not to have sunk in.

Of course planned obsolescence also gets in the way. Once I find a shoe I like, I try to stick with it, but the manufacturers won’t let me. The model I know works well usually isn’t available next time around, so I have to try something different. That’d complicate any kind of subscription service.

I can sort of understand the difference between, say, prescription drugs, which are commodities that I can replace on a subscription basis, and running shoes, which are both fashion items and (supposedly) evolving technologies. But for me, and maybe for a lot of people, the running shoe really is a commodity I’d like to replace on a subscription basis.

I wonder what else belongs in this category: Products that sellers don’t want to commodify, but that if managed this way would produce recurring revenue and create the opportunity for lifelong service relationships.

A conversation with Valdis Krebs about social network analysis

For this week’s ITConversations show, introduced by special guest Lynne Windley, I got together with Valdis Krebs, who’s been mapping and analyzing social networks since Mark Zuckerberg was in diapers.

I can’t remember how I first got to know Valdis, but this snippet from a 2004 interview — for an InfoWorld cover story on enterprise social software — gives you a sense of what he does and how he thinks:

IW: Social network analysis can reveal that highly connected people are more valuable than the org chart or salary plan suggests. Is this becoming a factor?

VK: Yes. I did a project with an investment bank, and they took into account who was most valuable in getting a deal done, and factored that into the bonus. I’ve had execs inside and outside IBM saying, “If this data is true, then I’m not paying the people who bubbled up to the top what they’re worth.”

IW: Does it cut the other way, too?

VK: We wouldn’t take a job that we knew would lead to a resource action.

IW: Resource action?

VK: Layoff.

Now that everybody in Silicon Valley has become an armchair social network analyst with an opinion about the nature and uses of the “social graph,” I thought it’d be useful to check in with Valdis for a long-range perspective on current trends. Bottom line: He thinks social networks that you have to explicitly join are artificial and ungraphable. But we agreed that these first-generation online social networks are fostering a culture of self-disclosure, and that they may lead to a second generation of more naturalistic systems: bottom-up, ad-hoc, peer-to-peer.

Code4Lib 2008

I’ve interviewed a couple of people who attended and/or spoke at last year’s Code4Lib conference: Art Rhyno, and Beth Jefferson. Code4Lib brings together IT-oriented librarians, and library-oriented IT folk, to create what seems like a truly unique event. I’m really looking forward to attending Code4Lib 2008 next week in Portland, where, as an adopted member of this strange tribe, I’ll be giving a talk on Thursday morning.

HealthVault protocols will be released under the Open Specification Promise

Back in October I interviewed Sean Nolan, chief architect for HealthVault. Now he’s launched a blog, and in his latest post he writes:

  • Microsoft will make the complete HealthVault XML interface protocol specification public.
  • With this information, developers will be able to reimplement the HealthVault service and run their own versions of the system.
  • Microsoft will irrevocably promise that we will not make patent claims against you for implementing the specification, subject to the terms of the OSP.

Excellent! My take on HealthVault is that it’s doing the right things in the right ways. This announcement confirms that.