In the latest episode of the Long Now lecture series, Steven Johnson talks about the process of thinking on multiple scales simultaneously, both in time and space. Stewart Brand has a nice summary of the talk, which is full of interesting stuff, but there’s one piece of it that I want to highlight here because it relates to one of my current themes about citizen use of government data.

If you’re an Edward Tufte fan, like me, you’ll know the story of the 1854 cholera outbreak in London, and of John Snow’s map, which showed deaths clustered around the Broad Street pump and proved that the cause was bad water, not bad air (miasma). That story plays a central role in Steven’s current book, and in his talk he points out that Snow was part of a larger cast of characters. One important but neglected figure was Henry Whitehead, a local vicar who collaborated with Snow. Another was William Farr, a government statistician. Although he initially favored the incorrect miasma theory, Farr had the good sense to publish the data that enabled others to find the right answer. Steven says:

The government could have said “OK, we’re going to compile all these statistics about who’s dying of what, where,” but Farr had this great idea that we’re going to make it available to everyone because someone maybe was going to find something of interest. And without that open access to the government statistics, the case wouldn’t have been made convincingly enough.

Rereading the comments on this entry, I find it humbling to realize that, 150 years later, that kind of access still seems like a novelty.

And in cases where we do have access to the data, it’s frustrating to see how little effective use we are able to make of it, despite all the bandwidth and computational resources available to us. Arranging pushpins on maps is a fine thing to do, but we should expect, and be able to do, more.

Greg Whisenant says “I’d be interested in features that people would like to see in a ‘basic analytics’ framework.” Rather than answer in general terms, I’ll talk about a specific situation. Here in idyllic small-town Keene, New Hampshire, we’re seeing a very disturbing increase in crime. It’s nothing that would raise anyone’s blood pressure in Chicago or San Francisco, but there have been a lot more burglaries lately, and a handful of violent assaults of the sort we almost never see here.

By way of background, the police station recently moved from downtown to a peripheral location. One hypothesis is that our crime wave correlates with that move: less police presence downtown, more crime. One proposed response is to beef up that presence by creating and staffing a downtown substation. How do we test this hypothesis and evaluate that proposal?

I’ve tried (so far unsuccessfully) to get hold of the raw data: crimes, by type and date and location, as far back as our electronic records go. Let’s assume I can get that data. What then? For starters, I’d like to count incidents before and after the move. But I’d also like to count incidents within circles of varying radii centered on the police station, over time, and see an animated visualization of that information. Or rather, since I happen to have the programming chops to make something like that happen myself, what I’d really like to see is a toolkit that would enable an ordinary citizen with data and a hypothesis to do the same.
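
To make the counting part concrete, here’s a minimal sketch of what I have in mind. Everything in it is an assumption on my part: the file name (keene_incidents.csv), the field names (date, type, lat, lon), the station coordinates, and the move date are placeholders standing in for whatever the department’s records would actually contain. All it does is tally incidents before and after the move, within circles of a few radii centered on the station; the animation and the citizen-friendly packaging are exactly the parts a real toolkit would have to add.

```python
import csv
import math
from datetime import date

# Hypothetical values standing in for the real records:
STATION = (42.9337, -72.2781)   # assumed lat/lon of the relocated police station
MOVE_DATE = date(2005, 1, 1)    # assumed date of the move to the peripheral location
RADII_KM = [0.25, 0.5, 1.0, 2.0]

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def count_incidents(rows):
    """Tally incidents before/after the move, within each radius of the station."""
    counts = {r: {"before": 0, "after": 0} for r in RADII_KM}
    for row in rows:
        when = date.fromisoformat(row["date"])        # assumes YYYY-MM-DD dates
        where = (float(row["lat"]), float(row["lon"]))
        period = "before" if when < MOVE_DATE else "after"
        dist = haversine_km(STATION, where)
        for r in RADII_KM:
            if dist <= r:
                counts[r][period] += 1
    return counts

# Assumed export format: one row per incident, with date,type,lat,lon columns.
with open("keene_incidents.csv", newline="") as f:
    counts = count_incidents(csv.DictReader(f))

for r in RADII_KM:
    c = counts[r]
    print(f"within {r} km of the station: {c['before']} before the move, {c['after']} after")
```

Sliding those same counts along a moving time window, week by week, would give you the frames of the animation; that step, and putting the results on a map, are just the kind of thing I’d want the toolkit, not the citizen, to worry about.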