The police station effect

2 Aug 2007 ~ Jon Udell

Thanks to some really great comments on yesterday’s item I’ve taken another pass through the spreadsheet I got from the police department¹. It looks like Chris Anderson and David French were exactly right to suggest a “police station effect” — namely, that there’s more crime at or near the police station.

Here’s a version of yesterday’s chart (with cleaner underlying data):

It’s focused on the old location of the police station which, you may recall, moved from Central Square in Jan 2006. If you thought the presence of the station would suppress the number of incidents, you wouldn’t find evidence for that here.

Now here’s the same thing focused on the new location of the police station:

That’s pretty clear!
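Counting incidents by distance from a fixed point can be sketched roughly as follows. This is a minimal sketch, not the code behind these charts: the field names (`lat`, `lon`, `address`), the sample rows, and the station coordinates are all made up for illustration, and the haversine formula is just one reasonable way to measure distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_near(rows, station, radius_miles=0.1):
    """Return (incidents within the radius, incidents at the station's own address)."""
    near = [
        r for r in rows
        if haversine_miles(r["lat"], r["lon"],
                           station["lat"], station["lon"]) <= radius_miles
    ]
    exact = [r for r in near if r["address"] == station["address"]]
    return len(near), len(exact)


# Hypothetical data, not the police department's file.
station = {"address": "100 EXAMPLE ST", "lat": 42.9300, "lon": -72.2800}
rows = [
    {"address": "100 EXAMPLE ST", "lat": 42.9300, "lon": -72.2800},
    {"address": "100 EXAMPLE ST", "lat": 42.9300, "lon": -72.2800},
    {"address": "104 EXAMPLE ST", "lat": 42.9305, "lon": -72.2800},
    {"address": "900 FARAWAY RD", "lat": 42.9600, "lon": -72.2800},
]
near_count, exact_count = count_near(rows, station)
print(near_count, exact_count)
```

Separating the "exact address" count from the "nearby" count is what makes it possible to tell a reporting artifact from a real neighborhood effect.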

There were two causes suggested.

1 (Chris): “The station was the place of the crime report and there was often no specific address.”

Yup. Of the 341 incidents within 0.1 mile of the new station, 315 were at the exact address.

2 (David): “This is where you end up when they let you out of the drunk tank.”

It’s possible to explore that spillover effect, but I’ll stop here and call out another excellent comment from Doug Finner:

If you get a big pile-o-data and don’t know everything about how the data was collected, it can be pretty close to impossible to do anything other than make very general observations. Trying to draw conclusions from data that is likely ‘dirty’ is often a fool’s errand. Probably the best you can do is find interesting trends and then try and get good clean data collected – the whole scientific method thing.

Indeed. For this round I took a much more critical look at the address data. I discarded the fair number of junk addresses that resolved erroneously to the city center. And because the addresses in the file didn’t specify “St” or “Rd” there were systematic problems — particularly in the case of Marlboro which was resolving to Rd rather than St.

As Doug Finner suggests, it would be wise at this point to hand back the file augmented not only with latitude/longitude coordinates, but also with indications of how clean or dirty the geocoding was, and recommendations on how to improve it.
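One way to carry that kind of annotation along with each row is sketched below. It is only an illustration under stated assumptions: the column names (`street`, `lat`, `lon`), the `geocode_quality` labels, the city-center coordinates, and the idea that a fallback geocode returns exactly the city-center point are all hypothetical, not taken from the actual file.

```python
def annotate_geocode_quality(rows, city_center, suffix_overrides):
    """Return copies of rows with a 'geocode_quality' note added to each."""
    annotated = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        street = row["street"].strip().upper()
        if (row["lat"], row["lon"]) == city_center:
            # Assumption: junk addresses fall back to the city-center point.
            row["geocode_quality"] = "suspect: resolved to city center"
        elif street in suffix_overrides:
            # Street name is ambiguous without a suffix; record the guess.
            row["street"] = f"{street} {suffix_overrides[street]}"
            row["geocode_quality"] = "suffix assumed"
        else:
            row["geocode_quality"] = "ok"
        annotated.append(row)
    return annotated


# Hypothetical values for illustration only.
CITY_CENTER = (42.9337, -72.2781)
SUFFIX_OVERRIDES = {"MARLBORO": "ST"}
rows = [
    {"street": "Marlboro", "lat": 42.9301, "lon": -72.2702},
    {"street": "Nowhere", "lat": 42.9337, "lon": -72.2781},
    {"street": "Main", "lat": 42.9325, "lon": -72.2779},
]
annotated = annotate_geocode_quality(rows, CITY_CENTER, SUFFIX_OVERRIDES)
for row in annotated:
    print(row["street"], "->", row["geocode_quality"])
```

Rows flagged "suffix assumed" would still need to be geocoded again with the corrected street name; the flag only records that a guess was made.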

Meanwhile, the toolsmith in me is getting fired up with all kinds of ideas. For example, when I processed the raw file to create this categorized stack graph I wound up creating an ad-hoc system of piped filters in Python. Each one takes a list of rows and returns a transformed list of rows. Here are some of them:

removeIncidentnums
dedupeCasenums
adjustDates
trimDescs
removeSingletonDescs
addCategories
addMonthlyCounts
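The shape of such a pipeline might look like the sketch below. The filter names are borrowed from the list above, but their bodies, the column names (`casenum`, `desc`), and the `run_pipeline` helper are guesses made for illustration, not the original code.

```python
from collections import Counter


def dedupeCasenums(rows):
    """Keep only the first row seen for each case number."""
    seen = set()
    out = []
    for row in rows:
        if row["casenum"] not in seen:
            seen.add(row["casenum"])
            out.append(row)
    return out


def trimDescs(rows):
    """Strip stray whitespace from each description."""
    return [{**row, "desc": row["desc"].strip()} for row in rows]


def removeSingletonDescs(rows):
    """Drop rows whose description occurs only once in the dataset."""
    counts = Counter(row["desc"] for row in rows)
    return [row for row in rows if counts[row["desc"]] > 1]


def run_pipeline(rows, filters):
    """Feed the rows through each filter in order."""
    for f in filters:
        rows = f(rows)
    return rows


# Hypothetical rows for illustration only.
rows = [
    {"casenum": "07-001", "desc": " THEFT "},
    {"casenum": "07-001", "desc": "THEFT"},
    {"casenum": "07-002", "desc": "THEFT"},
    {"casenum": "07-003", "desc": "NOISE"},
]
result = run_pipeline(rows, [dedupeCasenums, trimDescs, removeSingletonDescs])
print([row["casenum"] for row in result])
```

Because every filter has the same rows-in, rows-out signature, the filters can be reordered, dropped, or shared independently.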

All well and good. But this just begs for some kind of social treatment a la Pipes or Popfly, with a particular focus on the transformation of rectangular datasets.

I’m also thinking about ways to meld Python and Excel together more closely. So far, I’ve only relied on code generation — that is, using Python to write VBA macros to, for example, define named ranges. There’s also the possibility of outside-in automation, where Python drives Excel through its automation interface. But then I got to wondering: Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.
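The code-generation approach can be sketched in a few lines. The function name, macro name, range names, and cell references below are hypothetical; the generated VBA relies on the standard `ActiveWorkbook.Names.Add` call with its `Name` and `RefersTo` arguments.

```python
def named_range_macro(macro_name, ranges):
    """Return VBA source for a macro that defines one named range per entry."""
    lines = [f"Sub {macro_name}()"]
    for name, refers_to in ranges.items():
        lines.append(
            f'    ActiveWorkbook.Names.Add Name:="{name}", RefersTo:="={refers_to}"'
        )
    lines.append("End Sub")
    return "\n".join(lines)


# Hypothetical range names and addresses for illustration only.
vba = named_range_macro(
    "DefineRanges",
    {
        "Dates": "Sheet1!$A$2:$A$500",
        "Categories": "Sheet1!$D$2:$D$500",
    },
)
print(vba)
```

The printed text would be pasted into (or imported as) a VBA module and run from Excel, which is what keeps this approach at arm's length from the spreadsheet itself.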

¹ Yes, I will publish this data once I’ve had a chance to show my work to the police and get their approval.

Published by Jon Udell


15 thoughts on “The police station effect”

stand says:

2 Aug 2007 at 2:53 pm

Jon, regarding your last comment about putting languages inside Excel. That reminded me of a post I saw last week over at Sean McGrath’s place talking about a product called Resolver which seems exactly what you are talking about. It is an IronPython application that exposes the entire spreadsheet as a python script. I spent a few hours looking at it and I really like the concept. Check it out if you haven’t already.

Michael Foord says:

2 Aug 2007 at 3:05 pm

You should look at Resolver ( http://www.resolversystems.com and http://www.resolverhacks.net ) for processing the data with IronPython and using a spreadsheet interface. :-)

cthrall says:

2 Aug 2007 at 3:26 pm

http://www.ironpython.info/index.php/Interacting_with_Excel

That looks to be maybe the reverse of what you’re looking for: driving Excel with IronPython. But it does give you access to the object model, so you could conceivably use it to display the data you’ve parsed with Python.

Michael Bernstein says:

2 Aug 2007 at 3:30 pm

I’m also thinking about ways to meld Python and Excel together more closely. So far, I’ve only relied on code generation — that is, using Python to write VBA macros to, for example, define named ranges. There’s also the possibility of outside-in automation, where Python drives Excel through its automation interface. But then I got to wondering: Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.

Don’t forget the Python-UNO bridge for OpenOffice.

cthrall says:

2 Aug 2007 at 3:47 pm

Sorry, I should have read closer…you already mentioned IPY->Excel in your article.

You probably saw this too:

http://lists.ironpython.com/pipermail/users-ironpython.com/2007-June/005152.html

That obviously isn’t as seamless as being able to call an IPY script from within Excel, though.

leMel says:

2 Aug 2007 at 9:49 pm

Doesn’t the prevalence of more officers ‘create’ more crime data points by being there to witness and make an arrest or a report? Wouldn’t the presence of a station have a lensing effect?

James says:

2 Aug 2007 at 11:12 pm

You can use plain Python with the win32 extensions http://python.net/crew/mhammond/win32/ to drive Office (I use it to grab data from Excel, or to munge Word documents). You could even create a Python COM object http://www.oreilly.com/catalog/pythonwin32/chapter/ch12.html and call it from your VBA macro, although it’s not going to be particularly pretty. But being able to use Python directly inside Excel would be very cool.

David French says:

3 Aug 2007 at 12:16 am

“thinking about ways to meld Python and Excel together more closely”
You might like to try OpenOffice which supports Python directly. Unfortunately the current charting model still lets OpenOffice down.

Jon Udell says:

3 Aug 2007 at 8:14 am

So I checked out Resolver — it’s in private beta, but I’ve signed up. Looks intriguing!

Stephen says:

3 Aug 2007 at 10:02 pm

A couple of cautionary notes: (1) Computers make it easy to explore enormous numbers of “what if” questions, but conclusions from these explorations are inherently suspect. Hypothesis generating techniques are a good source of starting points for asking questions with better methods. (2) There is a large disparity between the size of the blue zone and the green zone. With an area model, this disparity would be at least a factor of five (if we truncate the green zone at 1.5 miles). Disparities like this can distort actual effects in the data. While an area model may not be appropriate, it is something that should be considered with geographic data in the absence of other models.

Tom says:

12 Aug 2007 at 1:43 pm

I think Stephen (comment 10) nailed the problem with data sifting like this: one can easily find all kinds of correlations in the world but ascribing cause is much more difficult. It might be an enlightening exercise to get a couple of smart people in a room and brainstorm some of the many possible “causes” for your observations. Just off the cuff, for example, are people more likely to call the cops for minor problems if the police station is “just around the corner”? Did the police station relocate from a mostly non-residential, urban city center to a more residential suburb? And so on…

As Stephen says, your interesting observations are just starting points for asking more focused questions with better methods.

markm says:

16 Aug 2007 at 11:10 am

Would it be possible to separate the reports by whether the crime was reported by a citizen, or discovered by a police officer witnessing it? There’s no particular surprise in finding that the latter category occurs more often closer to the cop shop – although a difference as steep as your first graph seems to show might indicate commanders are doing a poor job of getting their people spread out and covering the whole town.

Other theories:
– The old police station location was deliberately located where the crime was worst. The new location is far less optimal.

– The presence of a police station somehow causes a neighborhood to deteriorate. I can certainly see how it would concentrate certain undesirables: drunks turned loose in the morning, chronic complainants, relatives of prisoners in the jail with their own criminal proclivities… Would parole and probation officers be located in the police station? It’s possible that this effect would overwhelm the protective effect of having a police station nearby.

Pingback: First look at Resolver, an IronPython-based spreadsheet « Jon Udell
Jim says:

23 Apr 2008 at 11:23 am

Jon,

I know this is late, but I was wondering if you had tried Open Office’s Calc for any of this.

Calc has a number of programming language bridges into it using the UNO system, including one for Python called pyuno.

http://udk.openoffice.org/python/python-bridge.html

I don’t have any experience with it yet, but in my own data project that I am just starting (trying to convince my city that they need more efficient bus routing), I think that I will be using it a lot.
