Yesterday David Stephenson interviewed me for the book he was to be writing with Vivek Kundra, who is currently Washington DC’s CTO and reportedly the next Office of Management and Budget administrator for e-government and information technology.
Back in 2006 I learned from DC’s previous CTO, Suzanne Peck, and from Dan Thomas, about their plan to publish operational data in the service of transparency and accountability. At the time, I hoped this effort would show how ordinary citizens, as well as journalists, could be empowered to ask and answer questions like:
Do people in poor neighborhoods wait longer for service requests to be handled?
Talking with David yesterday, I struggled to come up with examples where the online publication and visualization of public data supports that kind of analysis. The best one I’ve seen lately comes from Eric Rodenbeck’s talk at ETech.
Eric’s company, Stamen Design, created Oakland Crimespotting. And yes, it’s another in a long line of mashups that spray crime data onto a Google Maps (or, in this case, Virtual Earth) display. But here’s the part of Eric’s talk that really got my attention:
There were no prostitution arrests for about a month. Then one day the cops started at one end of San Pablo Avenue, and you can watch them moving up the street and making arrests.
It wouldn’t have occurred to a citizen, or to a reporter, to ask the question:
Have the cops decided to crack down on prostitution?
Here the policy decision to conduct a sweep emerges from the data. There are two crucial enablers. First, the use of a map as a query interface. That’s common. But second, the use of animation to observe flows of data in time as well as in space. That’s still much rarer.
In the software community there’s vigorous debate about whether we need to rely on plugins like Flash and Silverlight to animate data in ways that enhance its analysis. My answer: It depends. Clearly much can already be done, and more will be done, with the basic web platform: browsers operating in an increasingly rich ecosystem of web services. Look at how the Rocky Mountain Institute uses animation to tell a story about US oil imports much more effectively than my static presentation was able to do. And like Stamen’s Oakland Crimespotting animation, the RMI’s oil import animation doesn’t use any plugins.
But we’re facing critical challenges, and we’ll want to deploy all the power tools we can lay our hands on. To that end, my colleagues at MIX Online have just released Project Descry, a set of four Silverlight-based visualizations. In an introductory article I wrote:
The world we must make sense of now is one in which human actions have planetary effects. The good news is that we can, for the first time, begin to measure those effects. We’re instrumenting the atmosphere and the oceans, and torrents of data are arriving from our sensors. The bad news is that we’re not yet very skillful storytellers in the medium of data. That’s true both in the specialized realm of science, and more broadly at the intersection of science, public policy, and the media.
If you’re a developer and are curious about how to create, for example, a treemap widget in Silverlight, you can visit Descry on CodePlex and have a look.
There are all kinds of useful tools yet to be built — in a variety of ways — and made available to citizens of the Net. I’m particularly interested in general-purpose visualizers, like the excellent ones at Many Eyes, that non-programmers can pour data into and make productive use of.
Where, for example, is the general-purpose visualizer for map data over time? In the spirit of Many Eyes, I’d like anyone to be able to upload a simple comma-separated dataset and create an animation like FlowingData’s Growth of Target, 1962 – 2008.
Ideally, the visualizer would also provide a scrollbar for scrubbing along the timeline. In the FlowingData example, you can do a geographic query by zooming and panning. But once you have selected a region you have to play the whole animation. Add timeline scrolling, and you can combine temporal with spatial query.
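To make the idea of combining temporal with spatial query concrete, here is a minimal sketch in Python. It is not how FlowingData or Many Eyes work. The column names (`name`, `lat`, `lon`, `date`), the `load_events` and `query` functions, and the sample rows are all made up for illustration. The point is only that once each row carries a location and a date, a map viewport plus a timeline range amounts to one filter.

```python
import csv
import io
from datetime import date


def load_events(csv_text):
    """Parse CSV rows of name,lat,lon,date (ISO dates) into dictionaries."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        events.append({
            "name": row["name"],
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "date": date.fromisoformat(row["date"]),
        })
    return events


def query(events, start, end, south, west, north, east):
    """Return events that fall inside both the time window and the bounding box."""
    return [
        e for e in events
        if start <= e["date"] <= end          # temporal query (timeline range)
        and south <= e["lat"] <= north        # spatial query (map viewport)
        and west <= e["lon"] <= east
    ]


# Invented sample data, standing in for an uploaded comma-separated dataset.
SAMPLE = """name,lat,lon,date
Store A,44.95,-93.09,1962-05-01
Store B,39.74,-104.99,1966-10-15
Store C,37.80,-122.27,1969-03-20
Store D,44.98,-93.27,1975-08-09
"""

events = load_events(SAMPLE)

# Viewport over the upper Midwest, timeline range set to the 1960s.
midwest_1960s = query(
    events,
    start=date(1960, 1, 1), end=date(1970, 12, 31),
    south=40.0, west=-100.0, north=50.0, east=-85.0,
)
print([e["name"] for e in midwest_1960s])
```

A scrubber would call `query` again each time the timeline handle moves, and zooming or panning would change only the bounding box arguments.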
What other kinds of general-purpose visualizers do you imagine having and using?
I couldn’t get the FlowingData page to load (well, it loaded, but apparently the plugin wasn’t working), but from your description I envision the type of “moving bubble” chart that GapMinder did.
That product was bought by Google and I’ve used it to make temporal visualizations with pretty good success.
I really think many people overlook Google Docs, but it has the ability to consume as well as publish data from many sources (via HTTP) and thus I think ranks pretty high on the interoperability and accessibility scale.
i’d love to see datasets on consumer product displays that share information about how many people have returned that product in the past few months — or past few years.
i bought some blank dvd -r disks at a microcenter store last month. these disks did not work at all in my macbook laptop. different people i talked with told me that it’s common knowledge that generic dvd disks don’t work — that i should buy brand name recordable dvd disks.
that common knowledge was not represented on the bin from which i purchased the product. if it had been, i would have walked over a few steps and bought the name brand recordable dvd disks.
so what i’m saying is that i’d like to see more “consumer experience” data represented in brick-and-mortar as well as online stores.
David is still writing the book, even though Vivek is not. Your revision could be interpreted as saying that he isn’t writing it any more.
Loved the Target visualization. The timeline scrollbar is an important control, making the presentation more interactive. Some further features would be: (1) a range control for time, so you could, for example, see where stores were opened within a particular two-year period (see Google Finance charts for an example of this).
(2) mutual interactivity between the panes of the display: If I zoom in on a particular spatial area, have the histogram reflect the hits in that area, possibly showing contributions from the selected area and the total.