Motivation, context, and citizen analysis of government data

Matt McAlister heard “crackling firearms” in his San Francisco neighborhood and wrote a wonderful essay on a theme that was central to my keynote talk last week at the GOVIS conference: how citizens can and will work with governments to diagnose social problems and develop solutions. When the District of Columbia’s DCStat program rolled out last summer, I was delighted by the forward thinking involved. Publishing the city’s operational data directly to the web, for everyone to see and analyze, with the explicit goal of making the delivery of government services transparent and accountable, was and is an astonishingly bold move. And as Matt found when investigating crime in his neighborhood, it’s still part of the unevenly distributed future:

I then found the official San Francisco Police Department Crime Map. Of course, the data is wrapped in their own heavy-handed user interface and unavailable in common shareable web data formats.

Access to data is good, and access to data in useful formats is better, but these are only the first steps. We need to make interpretations of the data, compare and discuss those interpretations, and use them to inform policy advocacy. The mashups that Matt reviews are a glimpse of what’s to come, but these interactive visualizations have a long way to go.

Here’s another glimpse of what’s to come: I took a snapshot of the DC crime data, uploaded it to Dabble DB, built a view of burglary by district and neighborhood, and published it at this public URL. There are two key points here. First, discussion can attach to (and will be discoverable in relation to) that URL. Second, the data behind the view is also available at that URL, in a variety of useful formats, so alternate views can be produced, pointed to, and discussed.
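For readers who want to try a similar view without Dabble DB, here is a minimal Python sketch of the "burglary by district and neighborhood" tabulation. It is an illustration under stated assumptions, not the method used for the published view: the column names (`offense`, `district`, `neighborhood`) and the sample rows are invented for this example and do not reflect the actual DCStat schema or real crime figures.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration only.
# The column names are assumptions, not the real DCStat export format.
SAMPLE_CSV = """offense,district,neighborhood
BURGLARY,First,Neighborhood A
BURGLARY,First,Neighborhood A
THEFT,First,Neighborhood B
BURGLARY,Second,Neighborhood C
ROBBERY,Second,Neighborhood C
BURGLARY,First,Neighborhood B
"""


def count_by_area(csv_file, offense):
    """Count incidents of one offense type per (district, neighborhood)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Compare case-insensitively so "Burglary" and "BURGLARY" both match.
        if row["offense"].strip().upper() == offense.upper():
            counts[(row["district"], row["neighborhood"])] += 1
    return counts


def format_view(counts):
    """Render the counts as tab-separated lines, sorted by district, then neighborhood."""
    lines = []
    for (district, neighborhood), n in sorted(counts.items()):
        lines.append(f"{district}\t{neighborhood}\t{n}")
    return "\n".join(lines)


# In practice you would pass an opened snapshot file instead of the sample text.
burglary_counts = count_by_area(io.StringIO(SAMPLE_CSV), "BURGLARY")
print(format_view(burglary_counts))
```

Because the function takes any open CSV file, the same tabulation can be rerun against a fresh snapshot, or against a different offense type, to produce the kind of alternate view that others can point to and discuss.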

Still, these are only views of data. There’s no analysis and interpretation, no statistical rigor. Since most ordinary citizens lack the expertise to engage at that level, are governments that publish raw data simply asking for trouble? Will bogus interpretations by unqualified observers wind up doing more harm than good?

That’s a legitimate concern, and while the issue hasn’t yet arisen, because public access to this level of data is a very new phenomenon, it certainly will. To address that concern I’ll reiterate part of another item in which I mentioned John Willinsky’s amazing talk on the future of education:

Willinsky talks about how he, as a reading specialist, would never have predicted what has now become routine. Patients with no ability to read specialized medical literature are, nonetheless, doing so, and then arriving in their doctors’ offices asking well-informed questions. Willinsky (only semi-jokingly) says the Canadian Medical Association decided this shouldn’t be called “patient intimidation” but, rather, “shared decision-making.”

How can level 8 readers absorb level 14 material? There are only two factors that govern reading success, Willinsky says: motivation, and context. When you’re sick, or when a loved one is sick, your motivation is a given. As for context:

They don’t have a context? They build a context. The first time they get a medical article, duh, I don’t know what’s going on here, I can’t read the title. But what happened when I did that search? I got 20 other articles on the same topic. And of those 20, one of them, I got a start on. It was from the New York Times, or the Globe and Mail, and when I take that explanation back to the medical research, I’ve got a context. And then when I go into the doctor’s office…and actually, one of the interesting things…is that a study showed that 65% of the doctors who had had this experience of patient intimidation, or rather shared decision-making, said the research was new to them, and they were kind of grateful, because they don’t have time to check every new development.

When your loved one is sick, you’re motivated to engage with primary medical literature, and you’ll build yourself a context in which to do that. Similarly, when your neighborhood is sick, you’ll be motivated to engage with government data, and you’ll build yourself a context for that.

The quest for context could, among other things, lead to a renewed appreciation for a tool that’s widely available but radically underutilized: Excel. Most people don’t earn a living as quants, so Excel, for most people, winds up being a tool for summing columns of numbers and arranging text in tabular format. That may change as more public data surfaces, and as more people realize they want to be able to interpret it. In which case Chris Gemignani and the rest of the Juice Analytics team will emerge as leading resources available to motivated citizens wanting to learn how to make better use of Excel.


  1. A few months ago we finished a project to allow flexible access to data produced by the Federal Reserve Board. Two important goals were to expose everything as a URL, and provide “a variety of useful formats, so alternate views can be produced, pointed to, and discussed”.

    One of the big problems was how to encode the search criteria in a URL. Unlike Google, where you are searching on a few unstructured terms to produce a list of possible matches, retrieving specific technical data can include an unbounded set of structured criteria for what needs to be returned. It will be interesting to see the extent to which this extra effort creates new opportunities.

  2. Great link to Juice Analytics. Now they have a new reader. These guys should write a book about using Excel to visualize data.

    BTW, can anyone here recommend an Excel book about how to make sense of data? I’m a software developer, but I have never used spreadsheets. Sometimes I really think they are missing from my toolbox.

  3. “Dabble DB is looking for some kind of password.”
    Hmm. That was supposed to be a public view, sorry, I’ll recheck it.
    …Should be OK now.

  4. Jon,

    In our quest to obtain a data-sharing agreement for 911 calls-for-service data, which we eventually want to map, report on, and even use to send text alerts to citizens, we have found no examples of police departments openly entering into such agreements. Discovering the DCStat program, thanks to you, was euphoric. The only other solid cases of data-sharing with the community I’ve found follow from the UK Crime and Disorder Act. We are thinking ahead, and are considering using the model of a crime data-sharing agreement to get access to other kinds of information that are essential for citizens to rebuild their neighborhoods. In particular, we’re talking about permits for demolition of historic properties, which are implemented improperly (or illegally). We’d also like more of an opportunity to input citizen complaints about nuisance properties and other quality-of-life concerns. If we don’t get the cooperation we desire from the city, we’re going to move forward with our own system of citizen data entry to force the issue.

    Thanks for bringing the DCStat program to light. It’s a fantastic development!

  5. “are governments that publish raw data simply asking for trouble?” In one example, data reprocessed by an informed public revealed that the figures on traffic accidents did not support the tagline “speed kills”. Probably not a desirable outcome from the viewpoint of the provider.
    Raw data is also sensitive to the quality of the people entering it: an early example of crime mapping here showed a concentration of crime at police stations, where the crime was reported.

  6. “are governments that publish raw data simply asking for trouble?”

    That’s going to be a huge issue once the ball starts rolling. People will make faulty interpretations of data and jump to wrong conclusions. But people will also make correct interpretations and reach right conclusions. The key thing will be that the transparency cuts both ways. It’s not just the governments providing the data that will be subject to transparency and accountability. So will the citizens interpreting the data.

    It’ll be messy for sure, but I think it’ll be the right kind of messiness.

  7. I agree, Jon. If the data was messy to begin with, it couldn’t have been serving any purpose to government officials. If crime is being reported at police stations, it’s useless for any sort of analysis, which points to a more troubling issue of the public not being served in an intelligent manner by the law enforcement agency. Exposing those problems is merely the first step to working smarter. And where law enforcement agencies don’t have the personnel or the time to dedicate, citizens may step in to provide those services. It can work. All that’s required is cultivating an attitude of understanding and cooperation, with a shared purpose of greater transparency and service to the community.

  8. Jon,

    Great post and blog. We are a small but growing community.

    We have recently partnered with the MPDC to provide high impact Google Maps and automated location-based alerts to the community. The new site is here, with an example of data presentation for one Zip Code in Washington:

    The service is free to members of the community and to law enforcement (we charge departments if they want to send SMS messages or surveys).

    I’d be interested in features that people would like to see in a “basic analytics” framework. We’ll be adding them soon.

  9. Democratizing government data will help change how government operates—and give citizens the ability to participate in making government services more effective, accessible, and transparent.

