The Broad Street pump and the Keene police station

In the latest episode of the Long Now lecture series, Steven Johnson talks about the process of thinking on multiple scales simultaneously, both in time and space. Stewart Brand has a nice summary of the talk, which is full of interesting stuff, but there’s one piece of it that I want to highlight here because it relates to one of my current themes about citizen use of government data.

If you’re an Edward Tufte fan, like me, you’ll know the story of the 1854 cholera outbreak in London, and of John Snow’s map which showed deaths clustered around the Broad Street pump and which proved that the cause was bad water, not bad air (miasma). That story plays a central role in Steven’s current book, and in his talk he points out that Snow was part of a larger cast of characters. One important but neglected figure was Henry Whitehead, a local vicar who collaborated with Snow. Another was William Farr, a government statistician. Although he initially favored the incorrect miasma theory, Farr had the good sense to publish the data that enabled others to find the right answer. Steven says:

The government could have said “OK, we’re going to compile all these statistics about who’s dying of what, where,” but Farr had this great idea that we’re going to make it available to everyone because someone maybe was going to find something of interest. And without that open access to the government statistics, the case wouldn’t have been made convincingly enough.

Rereading the comments on this entry, it’s humbling to realize that, 150 years later, that kind of access still seems like a novelty.

And in cases where we do have access to the data, it’s frustrating to see how little effective use we are able to make of it, despite all the bandwidth and computational resources available to us. Arranging pushpins on maps is a fine thing to do, but we should expect, and be able to do, more.

Greg Whisenant says “I’d be interested in features that people would like to see in a ‘basic analytics’ framework.” Rather than answer in general terms, I’ll talk about a specific situation. Here in idyllic small-town Keene, New Hampshire, we’re seeing a very disturbing increase in crime. It’s nothing that would raise anyone’s blood pressure in Chicago or San Francisco, but there have been a lot more burglaries lately, and a handful of violent assaults of the sort we almost never see here.

By way of background, the police station recently moved from downtown to a peripheral location. One hypothesis is that our crime wave correlates with that move: less police presence downtown, more crime. One proposed response is to beef up that presence by creating and staffing a downtown substation. How do we test this hypothesis and evaluate that proposal?
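
One simple way to frame the first part of that test is to treat the incident counts on each side of the move as Poisson and ask whether the later count is higher than chance would explain. The sketch below is a minimal illustration of the standard conditional test for two Poisson rates, assuming you have a total count and a period length in days for each side; the function name and the figures in the usage line are hypothetical, not Keene data.

```python
from math import comb


def rate_increase_p_value(before: int, days_before: float,
                          after: int, days_after: float) -> float:
    """One-sided p-value for "the incident rate went up after the move".

    Conditional test for two Poisson rates: if the daily rate did not
    change, then given n = before + after incidents, the "after" count
    is Binomial(n, p) with p = days_after / (days_before + days_after).
    """
    n = before + after
    p = days_after / (days_before + days_after)
    # Upper tail P(X >= after) for X ~ Binomial(n, p)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(after, n + 1))


# Hypothetical counts, NOT real Keene figures: one year on each side of the move.
print(rate_increase_p_value(before=40, days_before=365, after=62, days_after=365))
```

A small p-value says only that the rate changed around the time of the move; it does not show that the move caused the change, since other local changes in the same period could produce the same jump.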

I’ve tried (so far unsuccessfully) to get hold of the raw data: crimes, by type and date and location, as far back as our electronic records go. Let’s assume I can get that data. What then? For starters, I’d like to count incidents before and after the move. But I’d also like to count incidents within circles of varying radii centered on the police station, over time, and see an animated visualization of that information. Or rather, since I’m somebody with the programming chops to make something like that happen, I’d like to see a toolkit that would enable an ordinary citizen with data and a hypothesis to make that happen.
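
Once the data is in hand, the circle counting described above takes only a little code. The following is a minimal sketch, assuming each record carries a type, a date, and a latitude/longitude; the `Incident` record, the function names, and the coordinates are hypothetical placeholders, not a real Keene schema or the real station location. It uses the haversine formula for distance and produces, for each month, how many incidents fell within each radius of a chosen center (the old downtown station or the new site). A per-month table like this is what an animated map could be driven from.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, metres


@dataclass
class Incident:
    kind: str     # e.g. "burglary", "assault"
    day: date
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def ring_counts_by_month(incidents, center, radii_m, kind=None):
    """Count incidents within each radius of `center`, per calendar month.

    Circles are cumulative: an incident 300 m out counts toward both the
    500 m and the 1000 m circle. Pass kind="burglary" to filter by type.
    Returns {(year, month): {radius_m: count}}, sorted by month; months
    with no matching incidents at all are absent.
    """
    table = {}
    for inc in incidents:
        if kind is not None and inc.kind != kind:
            continue
        dist = haversine_m(center[0], center[1], inc.lat, inc.lon)
        row = table.setdefault((inc.day.year, inc.day.month), Counter())
        for r in radii_m:
            if dist <= r:
                row[r] += 1
    # Counter returns 0 for radii with no hits, so every radius appears.
    return {month: {r: row[r] for r in radii_m}
            for month, row in sorted(table.items())}


# Placeholder center and records, for illustration only.
STATION = (42.93, -72.28)
sample = [
    Incident("burglary", date(2007, 1, 10), 42.93, -72.28),
    Incident("burglary", date(2007, 1, 22), 42.94, -72.28),  # about 1.1 km north
    Incident("assault", date(2007, 2, 3), 42.93, -72.28),
]
print(ring_counts_by_month(sample, STATION, radii_m=[500, 2000]))
# → {(2007, 1): {500: 1, 2000: 2}, (2007, 2): {500: 1, 2000: 1}}
```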


  1. jon, i’m glad you continue to push this idea – i think it is the most important issue for geek/activist types to push. the free software movement has been instrumental in opening up tools to people to do interesting things at little cost. We need to move beyond the tools now. Our world is built on governments’ & peoples’ (and corporations’ too) ability to collect, process, and interpret data; and make decisions/solve problems as a result of the interpretation of data.

    by opening the data sets that drive government policy-making – which in the end is really about problem-solving – we will multiply society’s ability to analyze data, and vastly improve society’s ability to find innovative solutions to problems, big and small.

    so keep pushing! … if it hasn’t happened yet, someone should organize a multi-disciplinary conference on this topic, to get a few groups mixing: free software/open project folk, government stats people, free data people, and then people expert in society-wide problems (say health, environment, energy, etc).

    by mixing these groups we might be able to convince governments of the value of this movement; show the problem-solvers how groups of citizens might help; and help data-hounds and the open movement to understand where solutions might be built, if only we had more access to data.

  2. Jon

    Freedom of Information was supposed to push the door ajar and already that is being compromised left, right and centre.

    If we take the analogy of “no taxation without representation” and decide to say in 2007 “no taxation without access to what it is spent on”, maybe we could have another Boston Tea Party.

    Would we run a pledge saying “I will not pay my council tax if a million others do the same”? Could we get there as fast as the 1.8 million (or thereabouts) who went to Downing Street about road pricing?

  3. I was looking through some Australian crime data the other week; part of that process was reading about where the data comes from.
    I would suggest you contact your local courts; any convictions will be recorded there.

    Regarding doing something useful with the data, good luck, though others have had some success.

    Certainly the advent of mapping APIs is part of the enabling process.

    Problems may arise on the borders of jurisdictions: you may have the data from one area but not the neighbouring area.
    Other boundaries may also be factors: rivers, hills, airports, etc.
    Or maybe some other local change, such as cheap alcohol being made available where it might not have been before.

    It’s a very worthwhile pursuit, and it must start with access to the data.

    They’re our societies, so it’s our data; please give it to us when we ask.

  4. A large number of police forces use this as part of their analysis of larger patterns (it’s part of a suite of tools more typically aimed at the analysis of individual cases).

    The buzz-term for this is Crime Pattern Analysis, but crime is typically much more dynamic than a simple GIS approach might show – for example, you have to look at the influence of time on clustering (mugging sprees follow people, people’s movement patterns change according to the time of day, day of the week etc.). Much of the interesting information is not suitable for publishing to the public (it’s not just crime stats but also lots of intelligence gathering in the process of ongoing investigations). And the dangers of publishing subsets of data include issues such as tipping off criminals as to what’s detected and what’s not (and how timely the reporting is), tipping off criminals as to what crimes may be financially viable, and identifying individuals as victims (“this house was robbed 6 months ago, I bet they’ve got a load of new stuff by now”).

    The Washington police have a particularly pro-active and dynamic approach to this sort of analysis. But this involves answering to authorities for prioritisation of tasks – public perceptions of and concerns about crime are one thing, but the proper interpretation of some of the data is best left to the experts who might know how best to avoid simplistic interpretations of complex, incomplete, multi-variate data (Law Enforcement Analysis is a specialist role; this isn’t something routinely undertaken by beat cops and the like).

  5. “the dangers of publishing subsets of data include issues such as…”

    Excellent point. This can lead to discussions of whether, or to what degree, to aggregate data spatially, temporally, or both.

    There’s also the perennially interesting question of ease of access to information that /is/ public, but not aggregated. A patient compiler of the crime column in the newspaper could over time amass the same data as might more conveniently be published in aggregated form. Does not publishing amount to a “security by obscurity” argument? If so, does that argument hold some water? Tricky questions indeed.

    “the proper interpretation of some of the data is best left to the experts who might know how best to avoid simplistic interpretations…”

    Another excellent point. Of course, in the London cholera outbreak the presumptive expert was William Farr who had reached the wrong conclusion. And yet he released data that enabled John Snow to get the right answer. Expertise can and does exist in many places.

  6. Hi Jon,

    ‘Of course, in the London cholera outbreak the presumptive expert was William Farr who had reached the wrong conclusion. And yet he released data that enabled John Snow to get the right answer.’

    Yes, but sickness and illness are, largely, acts of nature (* – see below). Crime is committed by individuals, and the perception of crime is a dangerous thing that can lead to some pretty ugly incidents.

    Publishing any form of crime stats can be very dangerous, hence court restrictions on reporting and the like. Setting the police agenda by ‘public perceptions alone’ tends to lead to poor policing due to the short-term nature of (often media-led) frenzies and scapegoating.

    I think I’ve probably led you a distance from your original point. Let me say that I take your larger point about getting data and analysis into the hands of more people (sort of ‘open source’ for data analysis); it’s just that crime is a particularly dangerous statistic to play with. See also publishing census data, immigration statistics, HIV rates (cf. ‘sickness and illness’ above) and anything else that allows an aspect of moral outrage of one set of values over another. Pogroms and the like are an ugly and predictable aspect of human nature.


  7. “I think I’ve probably led you a distance from your original point”

    Actually not, and I appreciate your thoughtful comments very much. Although the long-term trend is clearly toward more transparency, there are some outcomes we’ll reach more easily than others, and some we shouldn’t even aspire to.

    In one sense there’s nothing new here, it’s all been thought about before. But changing circumstances now warrant rethinking. I’ve written a lot in the past about what happens when the notion of public information changes from “available to those with time and money to visit city hall” to “available to everybody”.

    Similarly, there’s about to be a sea change in terms of what constructive /or/ destructive uses can be made of public information by empowered individuals wielding modern communication and information technologies.

    As a society, we need to be having a conversation about what are the new risks, what are the new opportunities, and how should we proceed.

  8. Interesting discussion – we’ve recently started an open source data and analysis project called GeoCommons that might be useful. We do not have a temporal analysis tool up yet, but have one slated for later this year. There is a good chunk of open source data contributed thus far, but nothing for Keene, New Hampshire (yet). I did find a data set on crimes at universities for 2005 from the Department of Education, and New England College in Henniker, NH had the second most robberies (40) that year. I saved the map here if you would like to check it out. The data is all under Creative Commons, so feel free to take it and play with it.
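
Comments 4 and 5 raise the question of how far to aggregate crime data before publishing it. One common approach in published statistics is to coarsen both location and time and then suppress small counts, sometimes called small-cell suppression. The sketch below is a minimal illustration, assuming records of (date, latitude, longitude); the function name, the 0.01-degree grid, and the threshold of 3 are hypothetical choices, not anything the commenters or any police force prescribe.

```python
from collections import Counter
from datetime import date
from math import floor


def aggregate_for_release(incidents, cell_deg=0.01, min_count=3):
    """Coarsen (day, lat, lon) records to grid-cell x ISO-week counts.

    cell_deg (grid size in degrees) and min_count (suppression threshold)
    are illustrative choices, not a standard. Any cell-week with fewer
    than min_count incidents is published as None rather than a number,
    so a lone point on the map can't single out one household.
    """
    counts = Counter()
    for day, lat, lon in incidents:
        iso_year, iso_week, _ = day.isocalendar()
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[(iso_year, iso_week, cell)] += 1
    return {key: (n if n >= min_count else None)
            for key, n in sorted(counts.items())}


# Placeholder records for illustration only.
records = [
    (date(2007, 3, 5), 42.9351, -72.2805),
    (date(2007, 3, 6), 42.9352, -72.2806),
    (date(2007, 3, 7), 42.9353, -72.2807),
    (date(2007, 3, 12), 42.9351, -72.2805),
]
for key, value in aggregate_for_release(records).items():
    print(key, value)
```

Suppression like this is only a first step: if totals are published alongside, a suppressed cell can sometimes be recovered by subtraction, which is part of why the cautions in comments 4 and 6 deserve weight.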
