Thanks to some really great comments on yesterday’s item I’ve taken another pass through the spreadsheet I got from the police department [1]. It looks like Chris Anderson and David French were exactly right to suggest a “police station effect”: namely, that more crime is recorded at or near the police station.
Here’s a version of yesterday’s chart (with cleaner underlying data):
It’s focused on the old location of the police station which, you may recall, moved from Central Square in Jan 2006. If you thought the presence of the station would suppress the number of incidents, you wouldn’t find evidence for that here.
Now here’s the same thing focused on the new location of the police station:
That’s pretty clear!
There were two causes suggested.
1 (Chris): “The station was the place of the crime report and there was often no specific address.”
Yup. Of the 341 incidents within 0.1 mile of the new station, 315 were at the station’s exact address.
2 (David): “This is where you end up when they let you out of the drunk tank.”
It’s possible to explore that spillover effect, but I’ll stop here and call out another excellent comment from Doug Finner:
If you get a big pile-o-data and don’t know everything about how the data was collected, it can be pretty close to impossible to do anything other than make very general observations. Trying to draw conclusions from data that is likely ‘dirty’ is often a fools errand. Probably the best you can do, is find interesting trends and then try and get good clean data collected – the whole scientific method thing.
Indeed. For this round I took a much more critical look at the address data. I discarded the fair number of junk addresses that resolved erroneously to the city center. And because the addresses in the file didn’t specify “St” or “Rd” there were systematic problems — particularly in the case of Marlboro which was resolving to Rd rather than St.
As Doug Finner suggests, it would be wise at this point to hand back the file augmented not only with latitude/longitude coordinates, but also with indications of how clean or dirty the geocoding was, and recommendations on how to improve it.
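Here is a minimal sketch of what that kind of annotation might look like. It assumes each geocoded row is a dict with `address`, `lat` and `lon` keys, and that the geocoder's fallback point for unresolvable addresses is known. The key names, the placeholder coordinates, the suffix list and the tolerance are all illustrative assumptions, not the values used for the real file.

```python
import math

# Placeholder coordinates for the city-center fallback point (hypothetical).
CITY_CENTER = (43.0, -72.0)
# Rows closer than this to the fallback point are treated as junk (assumed value).
TOLERANCE_MILES = 0.01
# Street-type suffixes we expect a complete address to end with (illustrative list).
SUFFIXES = {"st", "rd", "ave", "ln", "dr", "ct"}


def miles_between(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def geocode_quality(row):
    """Return a short label describing how trustworthy a row's geocoding is."""
    if row.get("lat") is None or row.get("lon") is None:
        return "no_result"
    if miles_between((row["lat"], row["lon"]), CITY_CENTER) < TOLERANCE_MILES:
        return "city_center_fallback"
    words = row["address"].split()
    if not words or words[-1].lower().rstrip(".") not in SUFFIXES:
        return "missing_suffix"
    return "ok"


def annotate(rows):
    """Return copies of the rows with a geocode_quality column added."""
    return [dict(r, geocode_quality=geocode_quality(r)) for r in rows]
```

A file annotated this way would let the department see at a glance which addresses need a street type or a correction before the next round of analysis.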
Meanwhile, the toolsmith in me is getting fired up with all kinds of ideas. For example, when I processed the raw file to create this categorized stack graph I wound up creating an ad-hoc system of piped filters in Python. Each one takes a list of rows and returns a transformed list of rows. Here are some of them:
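The filters themselves aren't reproduced here, but the pattern might look something like this minimal sketch. The filter names, the row layout (dicts with `address`, `category` and `date` keys) and the ISO date format are assumptions made for illustration, not the actual code.

```python
def drop_blank_addresses(rows):
    """Discard rows whose address is empty or whitespace."""
    return [r for r in rows if r["address"].strip()]


def normalize_category(rows):
    """Trim and title-case the category so that variants stack together."""
    return [dict(r, category=r["category"].strip().title()) for r in rows]


def add_year(rows):
    """Add a year column, assuming dates look like YYYY-MM-DD."""
    return [dict(r, year=int(r["date"][:4])) for r in rows]


def pipe(rows, *filters):
    """Run the rows through each filter in order, like a shell pipeline."""
    for f in filters:
        rows = f(rows)
    return rows


# Made-up sample rows, only to show the shape of the data.
raw = [
    {"address": "12 Marlboro", "category": " theft ", "date": "2006-03-14"},
    {"address": "   ", "category": "noise", "date": "2006-04-01"},
]
cleaned = pipe(raw, drop_blank_addresses, normalize_category, add_year)
```

Because every filter has the same list-in, list-out signature, filters can be reordered, dropped or shared without touching the others.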
All well and good. But this just begs for some kind of social treatment a la Pipes or Popfly, with a particular focus on the transformation of rectangular datasets.
I’m also thinking about ways to meld Python and Excel together more closely. So far, I’ve only relied on code generation — that is, using Python to write VBA macros to, for example, define named ranges. There’s also the possibility of outside-in automation, where Python drives Excel through its automation interface. But then I got to wondering: Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.
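The code-generation approach could be sketched roughly as follows: Python emits the text of a VBA macro, which is then pasted into (or imported by) the workbook. `Names.Add` with `Name` and `RefersTo` arguments is part of Excel's VBA object model; the function name, the range names and the sheet references below are made up for illustration.

```python
def named_range_macro(ranges, macro_name="DefineNamedRanges"):
    """Return VBA source that defines one named range per dict entry.

    `ranges` maps a range name to an A1-style reference such as
    "Sheet1!$A$2:$A$500".
    """
    lines = [f"Sub {macro_name}()"]
    for name, ref in ranges.items():
        lines.append(
            f'    ActiveWorkbook.Names.Add Name:="{name}", RefersTo:="={ref}"'
        )
    lines.append("End Sub")
    return "\n".join(lines)


# Hypothetical range names and references.
vba_source = named_range_macro({
    "incident_dates": "Sheet1!$A$2:$A$500",
    "incident_types": "Sheet1!$B$2:$B$500",
})
```

Printing `vba_source` gives a small `Sub` with one `Names.Add` call per range. It works, but it also shows why generating macros as text feels like a workaround compared with running Python inside the spreadsheet.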
[1] Yes, I will publish this data once I’ve had a chance to show my work to the police and get their approval.
15 thoughts on “The police station effect”
Jon, regarding your last comment about putting languages inside Excel. That reminded me of a post I saw last week over at Sean McGrath’s place talking about a product called Resolver which seems exactly what you are talking about. It is an IronPython application that exposes the entire spreadsheet as a python script. I spent a few hours looking at it and I really like the concept. Check it out if you haven’t already.
You should look at Resolver ( http://www.resolversystems.com and http://www.resolverhacks.net ) for processing the data with IronPython and using a spreadsheet interface. :-)
That looks to be maybe the reverse of what you’re looking for: driving Excel with IronPython. But it does give you access to the object model, so you could conceivably use it to display the data you’ve parsed with Python.
Don’t forget the Python-UNO bridge for OpenOffice.
Sorry, I should have read closer…you already mentioned IPY->Excel in your article.
You probably saw this too:
That obviously isn’t as seamless as being able to call an IPY script from within Excel, though.
Doesn’t the prevalence of more officers ‘create’ more crime data points by being there to witness and make an arrest or a report? Wouldn’t the presence of a station have a lensing effect?
You can use plain Python with the win32 extensions http://python.net/crew/mhammond/win32/ to drive Office (I use it to grab data from Excel, or to munge Word documents). You could even create a Python COM object http://www.oreilly.com/catalog/pythonwin32/chapter/ch12.html and call it from your VBA macro, although it’s not going to be particularly pretty. But being able to use Python directly inside Excel would be very cool.
“thinking about ways to meld Python and Excel together more closely”
You might like to try OpenOffice which supports Python directly. Unfortunately the current charting model still lets OpenOffice down.
So I checked out Resolver — it’s in private beta, but I’ve signed up. Looks intriguing!
A couple of cautionary notes: (1) Computers make it easy to explore enormous numbers of “what if” questions, but conclusions from these explorations are inherently suspect. Hypothesis generating techniques are a good source of starting points for asking questions with better methods. (2) There is a large disparity between the size of the blue zone and the green zone. With an area model, this disparity would be at least a factor of five (if we truncate the green zone at 1.5 miles). Disparities like this can distort actual effects in the data. While an area model may not be appropriate, it is something that should be considered with geographic data in the absence of other models.
I think Stephen (comment 10) nailed the problem with data sifting like this: one can easily find all kinds of correlations in the world but ascribing cause is much more difficult. It might be an enlightening exercise to get a couple of smart people in a room and brainstorm some of the many possible “causes” for your observations. Just off the cuff, for example, are people more likely to call the cops for minor problems if the police station is “just around the corner”? Did the police station relocate from a mostly non-residential, urban city center to a more residential suburb? And so on…
As Stephen says, your interesting observations are just starting points for asking more focused questions with better methods.
Would it be possible to separate the reports by whether the crime was reported by a citizen, or discovered by a police officer witnessing it? There’s no particular surprise in finding that the latter category occurs more often closer to the cop shop – although a difference as steep as your first graph seems to show might indicate commanders are doing a poor job of getting their people spread out and covering the whole town.
– The old police station location was deliberately located where the crime was worst. The new location is far less optimal.
– The presence of a police station somehow causes a neighborhood to deteriorate. I can certainly see how it would concentrate certain undesirables: drunks turned loose in the morning, chronic complainants, relatives of prisoners in the jail with their own criminal proclivities… Would parole and probation officers be located in the police station? It’s possible that this effect would overwhelm the protective effect of having a police station nearby.
I know this is late, but I was wondering if you had tried Open Office’s Calc for any of this.
Calc has a number of programming language bridges into it using the UNO system, including one for Python called PyUNO.
I don’t have any experience with it yet, but in my own data project that I am just starting (trying to convince my city that they need more efficient bus routing), I think that I will be using it a lot.