August 2007
Monthly Archive
August 31, 2007
Posted by Jon Udell under Uncategorized
If you doubt that a librarian can change the world, listen to what Barbara Aronson has to say in this week’s installment of my ITConversations podcast. As the librarian for the World Health Organization, she’s been the driving force behind HINARI, a publisher partnership that’s making thousands of otherwise unaffordable medical journals available to researchers in poor countries — at no cost to the poorest 70 (“band 1”) countries, and at nominal cost to the next-poorest 43 (“band 2”) countries.
I first heard about HINARI from Lee Dirks, Microsoft’s director of scholarly communication, whom I met last month at the EDUCAUSE Seminars on Academic Computing. “She’s doing amazing work,” Lee told me. Wow, is she ever. Her mission to democratize access to medical knowledge challenges our assumptions about the nature of open access, the economics of publishing, and the research priorities of the developed world.
In response to criticism that HINARI isn’t “pure” open access, because of the token fees paid by band 2 countries, she says:
I think that most people who live in wealthy countries have no idea about what incomes are like in poor countries, and how many countries are poor. There are 72 countries whose gross national income per capita is less than a thousand dollars. There are 43 who are in the one- to three-thousand-dollar range. These countries bear the highest burden of disease in the world. These are places where most of the wars in the world are taking place, where they have the highest unemployment, and the most precarious life. Now most of the health research in the world is done on the problems of the richest people, because the money for health research comes from the wealthy countries. So you’ve got these very poor countries with the worst health problems, they’re trying to train doctors and nurses to take care of their populations, they’re trying to train researchers to find out the information that doesn’t exist to solve their problems, and they’re doing all this with no access to scientific and technical information.
Is HINARI open access? Not according to anybody’s strict definition of open access, but those definitions are made in the developed world. The open access argument, or discussion, as it’s going along in the places where it is going along, is too narrow, and it needs to be expanded if we’re going to be truly global about this.
What’s fascinating here, from the perspective of publishing economics, is the way in which publishers seem to be using HINARI to prototype a tiered pricing model that Barbara Aronson suggests may be applicable to their customers in the developed world too:
Maybe if the publishers could find a way to manage tiered pricing, which is a big question, maybe then they’d be able to address issues like: What about poor institutions in rich countries? What about underfunded individuals in rich countries?
But I don’t want to leave the impression that this conversation was all about publishing economics. There’s much more at stake. As she points out, researchers don’t just consume the information that HINARI makes available; they refract it through their own experiences and then contribute important new knowledge.
Two weeks ago I heard from the International Center for Diarrheal Diseases Research, in Bangladesh. They’re big users of HINARI. These are the people who pioneered the absolutely brilliant way of treating diarrhea in small children. Diarrheal diseases are one of the two biggest killers of children under five in the world. And this means in the developing world, because in the developed world people don’t die of this. These are the people who developed groundbreaking approaches to dealing with that, and to dealing with acute respiratory infection which is the other big killer. Bangladesh is one of the poorest countries in the world, and yet it has produced world-class research that has helped way beyond its borders.
I tip my hat to Barbara Aronson, to the World Health Organization, and to the hundred-plus publishers who’ve joined the WHO in this inspirational project.
August 29, 2007
Posted by Jon Udell under Uncategorized
When Eric MacKnight pointed me to Ewan McIntosh’s reflections on Stuart Meldrum’s mapping project, he said: “This struck me as an idea that would interest you.”
It does. These folks are reaching for ways to build maps collaboratively — in this case, maps of the locations of schools in Scotland. One option would be to rely on a central authority that publishes the whole set of locations. Another would be to divvy up the work such that districts publish subsets of locations, or schools publish their own individual locations.
Division of labor aside, there’s the question of locus of control. Should there be a central registry to which contributors are granted access? Or should there be small pieces loosely joined?
Both, actually. Although we see this as an either/or choice, the two strategies can be complementary, and not just in the realm of collaborative mapping. All kinds of collective data management scenarios will benefit from combining these approaches.
Ewan McIntosh offers an example of an easy way to implement a central registry: a mashup of Google Spreadsheets and Google Maps. The spreadsheet is used as a lightweight, multi-user, versioned database of locations that populate the map. A central authority can control the registry, granting access to districts or schools.
In a comment on Ewan’s entry, John Johnston points to another mashup that’s conducive to a decentralized strategy based on tagging and syndication. John’s mashup scans his Flickr account for geotagged photos and sprays them onto the map. Whenever he geotags a new photo, a new point appears on the map.
To build a collaborative map using John’s approach, you still need a central registry. But it needn’t manage primary data such as locations. Instead it need only manage metadata: identifiers for schools, identifiers for authorized contributors. If John were a school administrator he’d register as a contributor, geotag a photo of the school, and tag it with the school’s identifier.
Now that’s a contrived example, because geotagged photos are a circuitous way to federate location data. But this model supports other and more natural approaches equally well. Instead of looking for tagged photos on Flickr, for example, the registry might look for resources (e.g. schools’ domain names) tagged on del.icio.us or elsewhere.
With this scheme you have a nice separation of concerns. Schools are the authoritative sources for data about themselves. Registries define the protocols for syndicating that data, and the sources they’ll syndicate from.
If there’s only one registry the benefits aren’t overwhelming. But in fact there are many registries.
Consider a school that belongs to a variety of regional academic and athletic associations. Today, each association’s view of its member schools requires yet another centrally-controlled database. Many facts about individual schools will be duplicated across those databases. The location won’t often change, but other facts will. When that happens there’s no way to publish one authoritative change about a school that flows to many subscribing registries.
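The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any existing service: the `TaggedItem` shape, the `school:` tag prefix, the contributor names, and the coordinates are all hypothetical. Each registry holds only metadata (which school identifiers it cares about, which contributors it trusts) and builds its view from one shared feed that the schools publish once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedItem:
    """A hypothetical syndicated item: a geotagged photo, a bookmark, etc."""
    contributor: str
    tags: frozenset
    lat: float
    lon: float


class Registry:
    """Holds metadata only: school identifiers and authorized contributors."""

    def __init__(self, schools, contributors):
        self.schools = set(schools)
        self.contributors = set(contributors)

    def build_map(self, feed):
        """Aggregate a view from the feed; the primary data stays with the sources."""
        points = {}
        for item in feed:
            if item.contributor not in self.contributors:
                continue  # skip items from contributors this registry doesn't trust
            for school in item.tags & self.schools:
                points[school] = (item.lat, item.lon)  # later items overwrite earlier ones
        return points


# One feed, published once by the schools themselves (illustrative values only).
feed = [
    TaggedItem("admin@school-a", frozenset({"school:a"}), 55.95, -3.19),
    TaggedItem("admin@school-b", frozenset({"school:b"}), 55.86, -4.25),
    TaggedItem("stranger", frozenset({"school:a"}), 0.0, 0.0),
]

# Two registries with different memberships subscribe to the same feed.
athletics = Registry({"school:a", "school:b"}, {"admin@school-a", "admin@school-b"})
regional = Registry({"school:a"}, {"admin@school-a"})

print(athletics.build_map(feed))
print(regional.build_map(feed))
```

Assuming the feed lists items in publication order, a school that publishes a corrected item changes what every subscribing registry shows on its next pass, with no duplicated database to update.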
The alternative I’m sketching is really a variation on the lifebits theme. Like individuals, organizations have an interest in declaring authoritative facts about themselves. Aggregate views shouldn’t require singular control of a database, but rather singular definitions of tagging, syndication, and membership protocols.
There is, of course, a huge problem with this scheme. It presents a formidable conceptual barrier. For example, my experiment in community information hasn’t gone very far yet. To me, it makes perfect sense. No need to plug your photos or your events into my database. Instead, just reuse your photos on Flickr and your events on Eventful. But nobody expects things to work that way. In principle the indirection of tagging and syndication creates all sorts of useful effects. In practice most people aren’t comfortable with that indirection. If Jeanette Wing has her way, such computational thinking will become much more prevalent. I hope she’s right, and I wonder what we can do to encourage it.
August 28, 2007
Posted by Jon Udell under Uncategorized
On a recent vacation during which I helped a friend who’s building a house on Prince Edward Island, I picked up a copy of The Guardian and happened upon the death and funeral announcements. At first glance what’s remarkable is the amount of detail about the family of the deceased, the entire cast of characters involved with the funeral, and even the hymns sung. Scanning all this information, it took me a while to realize that something was missing. There’s almost no information about the life and times of the deceased. What is recognized in these pages is not the person but rather the social network to which the person was connected.
We caught a glimpse of the power of that social network when we were raising the first wall of the house. Word got around, people showed up to help, and we felt the force of community in a place that modernized fairly recently and still retains a strong flavor of pre-industrial culture. In that world, social networking isn’t a lifestyle choice, it’s a matter of survival.
On the way home, waiting in the Charlottetown airport, I saw a copy of Newsweek with a cover story about Facebook. Arguably our new modes of Internet-based social networking really are lifestyle choices, at least so far. As they mature it will be interesting to see how we use them — both to recapture lost ways of life, and to create new ones.
August 28, 2007
Posted by Jon Udell under Uncategorized
At Burning Man this year Dick Hardt will be generating electricity with a 400W wind turbine. A couple of days ago I saw what appears to be that same thing in a Canadian Tire store on Prince Edward Island, at the end of an aisle that included the usual sorts of automotive accessories you’d expect to find there. It reminded me of the first time I saw an Ethernet switch, formerly an esoteric item, on the shelf at Staples. Evidently wind power is going mainstream. Here’s the 400W generator on sale for $800:

And here’s the delightful illustration on the side of the box, with the immortal caption “How Wind Works”:

It starts with a puff…
August 23, 2007
Posted by Jon Udell under Uncategorized
As a former gymnast, I’ve always been frustrated with what passes for television coverage of the sport. The announcers always point out what everyone can plainly see: “Oops, didn’t stick the landing.” But they never tell you anything about the real subtleties of the sport. When I’m watching on TV with friends and family I try to explain things, but it all goes by too quickly. Even when replaying a recorded show in slow motion, it can be really hard to pinpoint what goes on.
Here’s an example from a competition I saw tonight. In these parallel frames of video, Nastia Liukin on the left and Shayla Worley on the right are at exactly the same point in a back giant swing:
One second later they’ve both done a half turn to what appears to be exactly the same point in a front giant swing:
But although Elfi Schlegel and Tim Daggett never mention this, there’s a huge difference between what those two women did in the intervening second, and also in the positions they came to.
I captured the two sequences side by side in this video. You may have to drag the slider back and forth a few times to catch what’s going on. Here’s a guide.
They both release their left hands and begin to turn.
Worley on the right turns her back away from the camera and ends in an ordinary undergrip. You know that one. Extend your arms forward, palms up and thumbs out, lay a broomstick across your palms, and grasp. It’s easy and natural.
Liukin on the left turns her back toward the camera, and ends in an eagle grip. You don’t know that one. Release one hand from the broomstick, rotate your thumb inwards and then outwards again through 180 degrees, and regrasp. Now do the same with the other hand. It’s hard and unnatural. Unless you have extremely flexible forearms and shoulders, you won’t even be able to do it.
This intermediate frame shows the difference most clearly. You can see their ponytails flying in opposite directions:
When I was in high school, my coach used to take Super 8 movies of the top competitors — in that era, in men’s gymnastics, it was the Japanese — and we would analyze their performances frame by frame. It’s so cool to be able to make and share that kind of analysis on the web. If we could get Elfi and Tim to do some of that, televised gymnastics would be so much better.
By the way, Nastia Liukin’s set was one of the most fabulous bar routines that I’ve ever seen performed by a woman or a man. Not just because of that crazy eagle grip, which she uses in several places, but in every way: flow, extension, flight, timing, power, flexibility, daring, and style.
August 22, 2007
Posted by Jon Udell under Uncategorized
Recently I gave a talk in which I explored the idea of a hosted lifebits service. I think it’ll turn out to be a fundamental principle and an enabler of many things, including the social network portability that is the blogosphere’s topic du jour. But before we go there, let’s explore how a series of more basic scenarios might play out in the context of a hosted lifebits service.
1) I write a blog entry.
Today we can, and often do, put serious effort into these acts of personal publishing. But the infrastructure to which we commit our words, sounds, and images doesn’t take our effort seriously. There’s no guarantee that anyone will be able to access an item at the published address in a year, never mind ten or a hundred. And there’s no guarantee that the effects of these acts of personal publishing — the reactions they provoke, the influences that flow from them, the reputations they create for us — can be measured.
In the hosted lifebits scenario such guarantees will exist, because we’ll pay for the service that makes them. At the core of that service is an archive that provides price-tiered levels of assurance that your stuff will be stable over time, that access will be granted in exactly the ways you specify, and that you can monitor that access.
I may over time use a succession of blog publishing systems. No problem, because the publishing service is decoupled from the core lifebits service. When I change publishing services, no content changes hands. There’s no export or import. I just authorize a different publishing service to access my archive. And there’s no rewriting of the URLs either. I declare what my web namespace will be. The lifebits service guarantees its long-term persistence, and collaborates with the publishing service to populate that namespace.
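A minimal Python sketch of that decoupling, assuming a toy in-memory service: `LifebitsArchive`, its methods, and the service names are hypothetical, and real guarantees of persistence and tiered assurance are out of scope. The point it illustrates is that switching publishing services changes only who is authorized to write, not where the content lives or what its URL is.

```python
class LifebitsArchive:
    """Hypothetical core lifebits service: it owns the items and the namespace."""

    def __init__(self, namespace):
        self.namespace = namespace.rstrip("/")
        self.items = {}            # path -> content
        self.authorized = set()    # publishing services allowed to write
        self.access_log = []       # (reader, path) pairs the owner can monitor

    def authorize(self, service):
        self.authorized.add(service)

    def revoke(self, service):
        self.authorized.discard(service)

    def url_for(self, path):
        return f"{self.namespace}/{path.strip('/')}"

    def put(self, service, path, content):
        """A publishing service writes into the owner's namespace."""
        if service not in self.authorized:
            raise PermissionError(f"{service} is not authorized")
        self.items[path] = content
        return self.url_for(path)

    def get(self, reader, path):
        """Every read is logged so the owner can see who accessed what."""
        self.access_log.append((reader, path))
        return self.items[path]


archive = LifebitsArchive("http://example.org/jon")
archive.authorize("old-blog-engine")
url = archive.put("old-blog-engine", "2007/08/22/lifebits", "Hello, lifebits")

# Switching publishing services: no export, no import, no URL rewriting.
archive.revoke("old-blog-engine")
archive.authorize("new-blog-engine")
print(url)
print(archive.get("a-reader", "2007/08/22/lifebits"))
```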
What if my lifebits service goes belly-up? Still no problem. There are multiple lifebits providers, and they belong to a government-regulated federation that assures continuity. One real-life example of this business arrangement is the insurance industry’s notion of a guaranty fund.
2) I comment on somebody else’s blog.
Today, each time I do that, I commit my words to a different foreign system. Logically they’re all my comments, but operationally they’re scattered all over the place, subject to a random assortment of naming and archival policies. True, there are services that can help me lasso all my comments, but architecturally this is just herding cats that have already gotten out of the bag.
In the hosted lifebits scenario, the item I’m commenting on is a permanent part of its author’s archive, at a stable URL. To comment on it, I write an item into my archive that refers to the item I’m commenting on, and my publishing service notifies the author’s publishing service that a comment has been made. We have various approximations of this behavior today, of course, but real consistency and coherence will require the use of lifebits services and associated lifebits-aware publishing services.
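Here is a rough sketch of that comment flow, collapsing the two publishing services into direct method calls for brevity. `Archive`, `comment`, and the example URLs are illustrative assumptions, not an existing protocol; appending to the author's inbox stands in for whatever network notification the publishing services would actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """A hypothetical personal archive with its own stable URL space."""
    base_url: str
    items: dict = field(default_factory=dict)   # url -> item
    inbox: list = field(default_factory=list)   # notifications received

    def publish(self, slug, body, in_reply_to=None):
        url = f"{self.base_url}/{slug}"
        self.items[url] = {"body": body, "in_reply_to": in_reply_to}
        return url


def comment(commenter, author, target_url, slug, body):
    """Write the comment into the commenter's archive, then notify the author."""
    url = commenter.publish(slug, body, in_reply_to=target_url)
    author.inbox.append({"comment": url, "on": target_url})
    return url


author = Archive("http://author.example")
me = Archive("http://me.example")
post = author.publish("post-1", "An entry worth discussing")
reply = comment(me, author, post, "re-post-1", "Great point!")
print(reply)          # the comment lives in my archive, not the author's
print(author.inbox)   # the author receives only a pointer to it
```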
3) I write an email.
Today when I do that, I transmit a message from my email system to yours. If I want to maintain a coherent archive of my email, there are all sorts of challenges. Over time I use a succession of personal and business email systems. And at any given point I use several different ones concurrently, to separate personal from business correspondence. I know a few people who have kept their email archives intact over time, but for most people those archives are scattered across a variety of local and (nowadays) cloud-based repositories.
In the hosted lifebits scenario, an email message can be a kissing cousin to a blog posting or a comment. I write it, commit it to my archive at a stable URL, notify you of its existence at that URL, and optionally transmit a copy of the message. That last step is optional because this model decouples two aspects of email that have always been inseparable: notification and transmission.
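A minimal sketch of email with notification and transmission pulled apart, assuming plain dictionaries and lists stand in for archives and inboxes. `send_message`, `read_message`, and the URLs are hypothetical. Fetching from the sender's archive is exactly where the access control and identity machinery discussed in the text would come in; this sketch does not model them.

```python
def send_message(archive, sender_url, slug, body, recipient_inbox, transmit_copy=False):
    """Hypothetical lifebits-style email: store, notify, optionally transmit."""
    url = f"{sender_url}/{slug}"
    archive[url] = body                               # 1. commit to my archive at a stable URL
    notification = {"from": sender_url, "url": url}   # 2. notify you of its existence
    if transmit_copy:
        notification["copy"] = body                   # 3. optionally transmit a copy
    recipient_inbox.append(notification)
    return url


def read_message(notification, archive):
    """Use the transmitted copy if there is one; otherwise fetch from the sender's archive."""
    if "copy" in notification:
        return notification["copy"]
    return archive[notification["url"]]


my_archive = {}
your_inbox = []
send_message(my_archive, "http://me.example/mail", "lunch", "Lunch on Friday?", your_inbox)
print(your_inbox[0])                            # a notification carrying no message body
print(read_message(your_inbox[0], my_archive))  # the body is fetched by URL
```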
The core lifebits service that I’m postulating here, plus associated lifebits-aware publishing services — which are what email services turn out to be in this model — aren’t enough to achieve that decoupling. We’ll also need an access control regime that leverages an identity metasystem. Once those ingredients are all available, we’ll start to see that the services of notification, storage, access control, and transmission can be recombined to achieve powerful effects.
Consider the following example. When Robert Scoble worked for Microsoft he reportedly once said: “I wish we had trackbacks for email.” Why? He was comparing the efficiency of blog communication to the relative inefficiency of email communication. In the blogosphere it’s fairly easy to trace the influence of a posting, but in email there’s no way to monitor the influence of your contribution to an email thread once your name drops off it. In a pervasively publishing-oriented enterprise, knowledge management and social network analysis would be radically simpler and more effective than they are today.
Pushing the email example even further leads to a key objection. From my perspective, my personal and professional lifebits are all part of the same stream. But can we really imagine that when I join a company I’ll be able to federate my personal lifebits service with its corporate lifebits service? That sounds crazy at first. Companies run their own email infrastructure in part so they can enforce policies about email retention and destruction. And yet, companies are increasingly outsourcing that infrastructure and delegating the enforcement of those policies to third parties. Providers that survive in an ecosystem of lifebits providers will have to convince everyone that they are trustworthy, reliable, and interoperable.
Is there a reason to think that my company’s provider will do better on those measures than my personal provider? If we’re thinking just in terms of today’s email and blogging systems, then yes, there probably is. There hasn’t been an incentive, yet, for personal-grade systems to meet enterprise-grade expectations. But let’s add one more scenario:
4) I visit a doctor.
Today, the record of my visit is kept by the hospital. Yes, the portable health record is coming, finally, and that will be a great step forward. But it’s really just an interim step. Here too, the services of notification, storage, access control, and transmission can be usefully recombined. Imagine that your health records are managed, in the most permanent and authoritative sense, by your own lifebits service which you choose to federate with the corporate lifebits services of the various health care providers you encounter during your life.
This raises the bar on the guarantees of trustworthiness, reliability, and interoperability that personal lifebits services must make. Those guarantees won’t come for free. But if I can amortize the cost across all my data silos — health, family, employment, education, finance, shopping, social life — the benefits will be huge and I’ll gladly pay.
August 20, 2007
Posted by Jon Udell under Uncategorized
My guest for this week’s ITConversations show is Greg Elin, chief data architect with the Sunlight Foundation. Founded in 2006, the Sunlight Foundation aims to make the operation of Congress and the U.S. government more transparent and accountable. There are lots of obvious reasons why that’s a good thing. Greg adds a non-obvious reason that I hadn’t heard and find compelling:
I increasingly feel that the reason for Congressional hearings to be open and recorded and annotated is market efficiency. The Fed does not announce what it’s going to do with interest rates until it announces it to everybody. But is that the case for the rest of Congress and legislation? If I can afford to have a fulltime lobbyist going to the committee meetings, don’t I have an inside track? Can’t I arbitrage my market investments based on that? It’s a question of market efficiency.
That was one of the moments in this conversation where I stopped and said: Wow, great point. Here’s another. We were talking about the difficulty of organizing information from disparate sources based on unique identifiers, whether for individual legislators or for sections and paragraphs of legislation. Greg made this excellent point:
As technologists, we forget how much we’ve gamed the system from the beginning in setting up our tools. That Ethernet card comes with a hardcoded ID, and it’s unique, but it took us a long time to get there, and it required the cooperation of a lot of people to make it work.
Having surveyed a wide range of government data sources, Greg’s conclusion is that the future is already here, but not yet evenly distributed. There are pockets within the government where data management practices are excellent, and large swaths where they are mediocre to horrible. The Sunlight Foundation has an interesting take on how to bootstrap better data practices across the board. By demonstrating them externally, in compelling ways, you can incent the government to internalize them:
Sunlight Foundation made a grant to OMBWatch, they put together fedspending.org, and as that was happening the Coburn-Obama bill was passed, which basically said that the OMB had to put together the same type of website. If the Sunlight Foundation — and other organizations like the Heritage Foundation and Porkbusters — if we had not been doing a collaborative project at the time around earmarks, and at the same time working with OMBWatch to do fedspending.org, I think that there wouldn’t have been the drumbeat pressure for the government to make this information available.
Later the conversation turned to data integrity and data provenance. What I mean by integrity, here, is the sort of question raised by my Hans Rosling wannabe screencast in which I observe that town-reported crime statistics rolled up to a statewide total don’t agree with state crime statistics as seen from a national perspective. Greg has a similar example:
Everything that CRP [Center for Responsive Politics] tracks is on a two-year election cycle. But OMBWatch is tracking contracts, and Taxpayers for Common Sense is tracking earmarks, on a budget year cycle. So things don’t necessarily line up.
There’s never going to be an easy way to make these different gears mesh. But until now, we’ve never had any way to see exactly how they don’t mesh, and to factor that into our thinking. That’s one of the subtler effects of transparency.
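To make the mismatch concrete, here is a small sketch that buckets dates both ways. Two assumptions, labeled as such: election cycles are labeled by the even-numbered year in which they end (so a "2008 cycle" covers 2007 and 2008), and budget years follow the U.S. federal fiscal year, which runs from October 1 through September 30 and is named for the year in which it ends.

```python
from datetime import date


def election_cycle(d: date) -> int:
    """Two-year cycle, labeled by the even-numbered election year in which it ends (assumed)."""
    return d.year if d.year % 2 == 0 else d.year + 1


def federal_fiscal_year(d: date) -> int:
    """U.S. federal fiscal year: October 1 through September 30, named for the year it ends."""
    return d.year + 1 if d.month >= 10 else d.year


for d in (date(2007, 3, 1), date(2007, 11, 15), date(2008, 11, 15)):
    print(d, "cycle", election_cycle(d), "FY", federal_fiscal_year(d))
```

All three dates fall in the same election cycle but in three different fiscal years, which is why records keyed one way can't simply be summed against records keyed the other way.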
Another is the possibility of a more complete view of data provenance — that is, where it comes from, and how it’s transformed along the way. Influenced by Jeff Jonas’ notions of sequence neutrality and data tethering, Greg envisions an open protocol for what he calls continuous data analysis:
If we can get an open protocol for reporting what we find in data, you’re beginning to make explicit the transformations that you apply. What I need to be able to do here at Sunlight, and what all of us working with public data need to be able to do, is instantly reprocess data that we’ve already processed, because any data we get is going to be missing something. If someone decides to change a taxonomy term, you ought to be able to rerun the data at every level with that new taxonomy term.
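A toy illustration of the "rerun it with the new taxonomy term" idea, assuming made-up spending records and category names. It does not implement any protocol Greg has proposed; it only shows the shape: transformations are named and applied in order starting from the raw data, and a log records which steps ran, so a changed taxonomy term means rerunning the pipeline rather than patching derived results by hand.

```python
def make_steps(taxonomy):
    """Build a named, explicit list of transformations from a taxonomy."""
    def tag(rows):
        return [{**r, "category": taxonomy.get(r["program"], "other")} for r in rows]

    def rollup(rows):
        totals = {}
        for r in rows:
            totals[r["category"]] = totals.get(r["category"], 0) + r["amount"]
        return [{"category": c, "amount": a} for c, a in sorted(totals.items())]

    return [("tag", tag), ("rollup", rollup)]


def run_pipeline(raw, steps):
    """Rerun every step from the raw data, recording each step and its output size."""
    data = [dict(r) for r in raw]   # copy, so the raw source is never mutated
    log = []
    for name, fn in steps:
        data = fn(data)
        log.append((name, len(data)))
    return data, log


raw = [
    {"program": "roads", "amount": 5},
    {"program": "bridges", "amount": 3},
    {"program": "clinics", "amount": 4},
]
v1 = {"roads": "transportation", "clinics": "health"}
v2 = {**v1, "bridges": "transportation"}   # someone changes a taxonomy term

print(run_pipeline(raw, make_steps(v1)))
print(run_pipeline(raw, make_steps(v2)))
```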
This was an excellent conversation, thanks Greg!
August 17, 2007
Posted by Jon Udell under Uncategorized
There’s undoubtedly a whole series of items to be written on unexamined idioms in software user interfaces.[1] Here’s one to kick off the series: the linking mechanism in rich text editors. It hasn’t changed in a decade, and it works the same way in new editors — like Yahoo’s Rich Text Editor and the .NET-based Windows Live Writer — as it always did. The idiom goes like this:
1. Select the text to which you want to attach your link.
2. Click the Link button.
3. Type (or paste) the URL.
I’ve watched novices struggle with this for years, and it’s no wonder that they do. What’s missing from this protocol is the capture of the URL. (That’s almost always necessary because few URLs nowadays can be easily typed.) So the idiom really goes like this:
1. Navigate to the target of the link.
2. Capture the URL.
3. Select the text to which you want to attach your link.
4. Click the Link button.
5. Type (or paste) the URL.
We have never, in any rich text editor I’ve ever seen, woven in support for those crucial first two steps.
How might that work? It occurs to me that a picture-in-picture browser would be really helpful. I’ve only seen one example of that genre — Bitty Browser — but it, or an equivalent widget, would seem like a great solution. When you click the Link button you get a picture-in-picture browser that you use to navigate to the link target. Ideally it loads with your current history and tabs, so the target is within easy reach.[2] When you land on the target, there’s a button to copy the URL. Now that you’ve been guided through the first two steps, the remaining three flow naturally.
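Under the hood, the Link button just wraps the selection in an anchor element. The sketch below folds URL capture into that same call. `insert_link` and the `pick_url` callback are hypothetical: `pick_url` stands in for the picture-in-picture browser, returning whatever URL the user lands on. It assumes the document text is already HTML, so only the URL is escaped for use inside the `href` attribute.

```python
import html


def insert_link(text, start, end, pick_url):
    """Wrap text[start:end] in a link whose URL comes from pick_url()."""
    url = pick_url()                                  # navigate to the target and capture its URL
    href = html.escape(url, quote=True)               # make the URL safe inside an attribute
    link = f'<a href="{href}">{text[start:end]}</a>'  # wrap the selected text
    return text[:start] + link + text[end:]


print(insert_link("see the docs here", 8, 12, lambda: "http://example.com/?a=1&b=2"))
```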
[1] Just for fun, I’m going to try keeping a list at del.icio.us/judell/unexamined-software-idioms. To play along you can do the same at del.icio.us/YOU/unexamined-software-idioms, and we can see what accumulates in del.icio.us/tag/unexamined-software-idioms.
[2] You can see a glimmer of this idea in Live Writer. From its linking dialog you can navigate to, and select, a prior post or a glossary entry.
August 15, 2007
Posted by Jon Udell under Uncategorized
I listen to lots of podcasts, often in harsh conditions to which I wouldn’t want to expose a hard-disk-based device. So flash-memory-based gadgets are an attractive choice. Their capacity isn’t an issue for me because once I’ve listened to a podcast I just discard it. There’s no need to manage it as part of a collection that lives on my computer, or is synchronized to a player. If I want to hear it again sometime, I’ll download it and transfer it again.
I also do an increasing amount of voice recording, when preparing for talks. Here too, less is more. I don’t need high quality sound, just convenient recording of speech that’s recognizable on playback. Again, this recording often occurs in harsh outdoor conditions. And it sometimes occurs spontaneously, in which case I want to be able to pop in a AAA battery and go. I’ve come to loathe devices with proprietary batteries that are useless if you forget to charge them in advance.
Finally, like everyone, I use USB sticks to store files and move them from one place to another.
I’ve found one device that meets all three of these requirements brilliantly: the Creative MuVo. I’ve been a huge fan of this gadget since 2004, and have owned three models. First was the 256MB MuVo TX. Later came the 512MB MuVo TX FM, which doubled the storage and added an FM radio I never used. Before giving away the TX I owned both for a while, and on one memorable occasion I found a compelling simultaneous use for the pair.
True, the device has a tendency to flake out now and then, in ways that would confound most people, but I was always able to resurrect it with a firmware refresh.
Until now. The TX FM still works as a USB drive but the player is dead. Since I was going to Staples anyway I picked up what seemed like the obvious replacement, the MuVo V100, without doing any research. Bad idea. It’s dog slow on transfers:
At Creative’s site there are three pages of customer complaints about the MuVo v100 slow file transfer rate. No fix is currently available, though Creative customer service sends customers on a useless firmware download wild goose chases and neglects to mention that the snail-like transfer rate is a well-documented problem. [Amazon customer reviews]
Sheesh. I’m taking it back, ordering another TX FM instead, and wishing that somebody would provide that excellent bundle of features in a sturdier package.
August 14, 2007
Posted by Jon Udell under Uncategorized
This morning I spoke with Kentaro Toyama, the assistant managing director of Microsoft Research India, about the mission of Microsoft’s Bangalore-based research center. Our podcast touches on all six of MSR India’s research areas. These are mostly concerned with the same kinds of advanced computer science problems that the other labs around the world focus on. Although it wasn’t a requirement that each of these efforts be particularly appropriate to India, it turns out that one way or another they are.
India’s wealth of mathematical talent, for example, is a tremendous asset for a research program in cryptography, security, and algorithms. Likewise its linguistic diversity — there are 22 officially recognized languages, and several hundred dialects — makes it a natural home for research on multilingual systems. And a country that’s adding 7 million mobile phone subscribers every month is a great place to investigate mobility, networks, and systems.
There’s also work in areas outside the realm of classic computer science. Kentaro Toyama leads an area called technology for emerging markets which tackles problems like how to create text-free user interfaces for people who cannot read. Obviously you need to rely heavily on graphics and on audio feedback, but there are fascinating subtleties involved. Simple icons don’t work well, because they’re not expressive enough. But fully realistic images don’t work well either, because they’re overly literal. It turns out that a cartoon-like approach is what works best, and within that discipline there are further subtleties — for example, you want to animate the pictorial verbs, but not the nouns.
I was also fascinated to hear about related work in digital geographics, and in particular, about an effort to render map data in the style of hand-drawn historical maps. Why do this? Well for one thing, those old maps are beautiful. But as Kentaro Toyama points out, there’s a non-aesthetic reason too. Maps produced by human cartographers communicate more effectively than machine-generated maps normally can. That’s because cartographers use their intelligence and judgment to select and emphasize certain features at the expense of others. It’d be great to be able to model some of that intelligence and judgment and reproduce it in software.
I’ve been to India twice. When I was 5, my family lived in New Delhi for a year. Then in 1993, for BYTE, I visited to learn about India’s software industry. Maybe finding out more about MSR India will turn out to be a reason to go again.
August 13, 2007
Posted by Jon Udell under Uncategorized
In a couple of talks last year on the theme of network-enabled apprenticeship, I referred to an example of the transmission of tacit knowledge. What happened was that Jim Hugunin accidentally taught me a feature of the Python programming language — the use of the special underscore variable to store the value of the most recently evaluated expression — without ever realizing that I hadn’t known about it, or that his use of the idiom transferred it to me.
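For readers who want to see the underscore idiom outside a live session, Python’s standard `code` module can simulate one. A caveat: in an ordinary script `_` is not set automatically. It is the interactive interpreter’s display hook (`sys.displayhook`) that echoes each expression’s value and saves it as `_`, which is why this sketch pushes lines through `code.InteractiveConsole`.

```python
import code
import contextlib
import io

# Simulate two lines typed at the interactive prompt. In interactive mode,
# sys.displayhook echoes each expression's value and saves it as _.
console = code.InteractiveConsole()
transcript = io.StringIO()
with contextlib.redirect_stdout(transcript):
    console.push("6 * 7")   # echoes 42, and _ now refers to 42
    console.push("_ + 1")   # reuses the previous result: echoes 43
print(transcript.getvalue())
```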
Now Chris Gemignani has taught me something else about Python in the same accidental and unconscious way. Last week I mentioned his geocoder for Excel. He’s also written a Python class that’s useful for batch geocoding, and when I found it today I was struck by this idiom:
print("location: %(latitude)s, %(longitude)s" % address)
If you’re a non-programmer, here’s a bit of background. Most programming languages include some version of printf, a function that uses a format string to control the interpolation of the values of variables into text. So in Python, for example, this statement…
print("location: %s, %s" % (latitude, longitude))
…would interpolate the values of the variables named latitude and longitude into the format string “location: %s, %s” to produce an output like:
location: 42.933659, -72.278542
It’s quite likely, though, that those variables will be members of a data structure like this dictionary:
address = {'latitude': 42.933659, 'longitude': -72.278542}
In this case, your normal instinct will be to write:
print("location: %s, %s" % (address['latitude'], address['longitude']))
That works fine, but the alternative Chris revealed to me is better:
print("location: %(latitude)s, %(longitude)s" % address)
Although I use Python extensively, I had never discovered this! It’s better in two ways. First, it’s more concise. Second, it associates the names of the variables directly with the percent markers in the format string. That’s not a big deal when there are only two variables to keep track of, but often there are more, and matching up the positions of the markers in the format string with the positions of the corresponding values in the tuple is tedious and error-prone.
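To make the comparison concrete, here is a minimal, self-contained Python sketch that builds the same line three ways: positional %s markers, named %(key)s markers, and, for comparison, the later str.format() method (Python 2.6 and up), which also looks values up by name. The coordinates are the ones used above.

```python
# Coordinates from the example above, held in a dictionary.
address = {"latitude": 42.933659, "longitude": -72.278542}

# Positional markers: each %s is filled from the tuple, in order.
positional = "location: %s, %s" % (address["latitude"], address["longitude"])

# Named markers: each %(key)s reads the dictionary entry with that key.
named = "location: %(latitude)s, %(longitude)s" % address

# Because lookup is by name, the markers can appear in any order.
reordered = "%(longitude)s / %(latitude)s" % address

# str.format() offers the same keyed lookup with {key} fields.
formatted = "location: {latitude}, {longitude}".format(**address)

print(named)      # location: 42.933659, -72.278542
print(reordered)  # -72.278542 / 42.933659
```

The reordered line is where named markers really pay off: the format string can change shape without anyone re-counting tuple positions.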
Quite possibly none of this means anything to you, because you’re neither a programmer nor a Pythonista. Even so, I’ll argue that this principle of transmission of tacit knowledge is profound, and can apply to almost any discipline that’s subject to online narration.
There are all sorts of obvious reasons to narrate the work that we do. By doing so we build reputation, we attract like-minded collaborators, we draw constructive criticism, and we teach what we know.
Sometimes there’s also a non-obvious reason. It’s possible to teach what we don’t know that we know.
August 13, 2007
Posted by Jon Udell under Uncategorized
Richard Ziade is experimenting with a video form he calls sketchcasting. A sketchcast is a recording of a whiteboard session plus voiceover. I’ve seen some very effective educational uses of this technique, and it’s interesting to compare Tim Fahlberg’s mathcasts to Richard Ziade’s sketchcasts. When Tim Fahlberg demonstrates the solution to a math problem in one of his mathcasts, the visual repertoire of numbers and symbols is fixed, and the creative contribution is sequencing and narration. When Richard Ziade delivers a presentation as a sketchcast, the visual repertoire is open-ended. We all know people who like to sketch and who communicate effectively that way. Richard Ziade is clearly one of them. Microsoft’s Steve Cellini is another. In meetings he invariably leaps to the whiteboard and draws pictures of the ideas being discussed.
It’s great to see all these forms evolving and — crucially — becoming more accessible. TechSmith’s Jing, for example, aims to make screencasting more spontaneous. SlideShare makes it easy to produce and share slidecasts, which are audio narrations of slide decks.
As words ending in -cast proliferate (pod, screen, math, sketch, slide), it can all seem a bit bewildering. But with a range of choices, people who want to produce rich media can gravitate to the forms that match their skills and inclinations. And for those who watch and listen to these productions, it’s not complicated at all. You click the link, you watch and/or listen.
August 10, 2007
Posted by Jon Udell under Uncategorized
As mentioned here, I’ve been working with a spreadsheet containing addresses that want to be geocoded. I’ve had lots of experience running batches of addresses through geocoding services, but in the case of the police department I’ve been working with, it would be nice to be able to do the geocoding interactively. That way, if 400 Marlboro resolves incorrectly to 400 Marlboro Rd., the clerk will know it’s necessary to specify 400 Marlboro St. if that’s the intended address.
I found two examples of spreadsheets programmed with this behavior, first from AutomateExcel.com and second from Juice Analytics. When I compared these I realized that I wanted to combine aspects of both.
The AutomateExcel version is extremely simple. That’s partly because it uses the XML mapping features of Excel 2003 or 2007 to capture the XML output of a geocoding service, and partly because it only deals with a single address.
The Juice version is more complex. That’s partly because it eschews the XML mapping features in order to support older versions of Excel, and partly because it deals with many addresses. (It also exports KML for use with Google Earth.)
In my case I was willing to assume Excel 2003 or later, and use XML mapping. But I wanted to be able to accumulate results for many rows of addresses. I also wanted to switch from the XML output of geocoder.us, which is used in the AutomateExcel version, to the XML output of Yahoo’s geocoder, which is used in the Juice version.
The version I came up with is here, and the VBA code appears below. I haven’t used VBA or the XML mapping features of Excel in a while, so while the experience is fresh in my mind, I thought I’d record some of my key observations.
Mapping the output of Yahoo’s geocoder
I started by replicating the XML mapping in the AutomateExcel version. Here’s a sample geocoder.us query:
geocode?address=400%20marlboro%20st,keene,nh
To create an XML mapping in Excel you do: Data -> From the web -> [plug in the URL] -> Import. Excel warns: “The specified XML source does not refer to a schema. Excel will create a schema.” OK.
Then it asks: “Where do you want to put the data?” I answered: “XML table in existing worksheet.”
Then I did Developer -> Source to reveal the XML map, unbound the mapped fields, and rebound them to the vertical rather than horizontal layout I wanted to use.
It was all good. So now I tried the same procedure using this sample Yahoo query:
geocode?appid=YahooDemo&street=400%20marlboro%20st&city=Keene&state=NH [1]
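The macro below escapes the address by replacing spaces with plus signs, which is enough for most street addresses but lets characters like # or & through unescaped. As a hedged alternative, here's a Python sketch that builds the same kind of query with urllib.parse.urlencode, which percent-encodes every parameter. The endpoint and parameter names (appid, street, city, state) are copied from the macro; the service may no longer respond, so the sketch only builds the URL and never fetches it.

```python
from urllib.parse import urlencode

# Endpoint copied from the VBA macro in this post. It may no longer respond,
# so this sketch only builds the URL.
BASE_URL = "http://local.yahooapis.com/MapsService/V1/geocode"

def geocode_url(street, city="Keene", state="NH", appid="YahooDemo"):
    """Return a query URL with every parameter properly escaped.

    urlencode() turns spaces into '+' and also escapes characters such as
    '#' or '&' that a plain space-to-plus replacement would let through.
    """
    params = {"appid": appid, "street": street, "city": city, "state": state}
    return BASE_URL + "?" + urlencode(params)

print(geocode_url("400 Marlboro St"))
print(geocode_url("12 Main St #3"))   # the '#' comes out as %23
```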
But this time when I unbound and rebound the fields, I couldn’t access the values in the same way. Eventually I saw why not. The Yahoo results reference a schema, and that triggers a more complex behavior in Excel involving the importation of whole data sets.
So I saved an instance of Yahoo’s XML results in a file, stripped out the schema reference, and then acquired it using Data -> From other sources -> From XML Data import. Then it behaved just like the first example. I expect there’s a simpler solution, and hopefully this item will attract a reference to it.
With the mapping done, it’s a one-liner in VBA (ActiveWorkbook.XmlMaps("YahooMap").Import url:=url) to fetch the XML data and spray it into the mapped cells. That’s dramatically simpler than the regular-expression gymnastics performed by the Juice version. Of course, if you need to support older versions of Excel, you’ve got to perform those gymnastics.
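For comparison outside Excel, here's a rough Python sketch of what the XML map is doing: pulling a handful of named fields out of the geocoder's response and keying them like the macro's y_labels. The sample response is hand-written, and its element names, attributes, and namespace URI are assumptions modeled on the fields this post mentions (latitude, longitude, address, precision, warnings), not a verbatim Yahoo response. In ElementTree, a document that declares a default namespace requires every lookup to be qualified with it, which is why the sketch passes a prefix map.

```python
import xml.etree.ElementTree as ET

# Hypothetical response modeled on the fields this post maps. The element
# names, the "precision"/"warning" attributes, and the namespace URI are
# assumptions for illustration; check them against a real response.
SAMPLE = """<?xml version="1.0"?>
<ResultSet xmlns="urn:yahoo:maps">
  <Result precision="address">
    <Latitude>42.933659</Latitude>
    <Longitude>-72.278542</Longitude>
    <Address>400 MARLBORO ST</Address>
    <City>KEENE</City>
    <State>NH</State>
  </Result>
</ResultSet>"""

NS = {"y": "urn:yahoo:maps"}  # the "y" prefix exists only inside this script

def parse_geocode(xml_text):
    """Pull the mapped fields out of the first <Result>, keyed like y_labels."""
    result = ET.fromstring(xml_text).find("y:Result", NS)
    return {
        "y_lon": float(result.findtext("y:Longitude", namespaces=NS)),
        "y_lat": float(result.findtext("y:Latitude", namespaces=NS)),
        "y_addr": result.findtext("y:Address", namespaces=NS),
        "y_precision": result.get("precision"),
        # Assumption: no warning attribute means a "clean" match.
        "y_clean": result.get("warning") is None,
    }

print(parse_geocode(SAMPLE))
```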
Relativizing references
My first version was full of hardcoded references to rows and columns in the temporary sheet where the XML data gets unpacked, and in the main sheet where raw addresses are decorated with latitude, longitude, parsed address, precision (e.g. exact address vs. street-level vs. city-level), and cleanliness (e.g. whether there were warnings).
I knew that I’d need to use lookup functions to relativize all those references, and it soon became apparent that I’d want to use the Match function, which finds the position of an item in a row or column, to do it. But it returns numeric positions, which are fine for rows but don’t correspond to alphabetic column names like C. The solution, as generations of Excel hackers have learned but I never needed until now, is to go to Options and enable the R1C1 reference style. Now the columns have numbers too, and in VBA you can write references like so:
rows(index).columns(address_col)
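To see why numeric column positions help, here's a small Python sketch: a rough analog of MATCH with exact matching that returns a 1-based position, plus the conversion from a column number to A1-style letters that numeric references let you skip. The header row is hypothetical.

```python
def match(value, cells):
    """Rough analog of Excel's MATCH(value, range, 0): 1-based position."""
    return cells.index(value) + 1   # raises ValueError where MATCH gives #N/A

def column_letters(n):
    """Convert a 1-based column number (as in R1C1) to A1-style letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)   # bijective base 26: A=1 ... Z=26
        letters = chr(ord("A") + rem) + letters
    return letters

header = ["id", "address", "y_lon", "y_lat"]   # hypothetical header row
col = match("y_lat", header)
print(col, column_letters(col))   # 4 D
```

With numeric references the second function is unnecessary: the number MATCH returns can go straight into Columns(), as in the line above.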
Dynamic variable assignment
That cleaned up a lot of the mess, but there was still a lot of per-variable code that I’d written in order to stash the geocoded results into VBA variables and then later retrieve them. I thought of generalizing that by using Eval, like so:
Eval("y_lat = Selection.Value")
But no dice. Excel 2007 told me there was no Eval function. Which is just as well, because that time-honored trick is really sketchy. So I went looking for VBA’s equivalent to the Perl associative array or the Python dictionary, and found it in VBA’s Collection.
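In Python terms the fix is the familiar one: key a dictionary by label instead of conjuring variable names with eval. A minimal sketch, with the labels taken from the macro's y_labels array and made-up values standing in for the scratch sheet:

```python
# Hypothetical scratch-sheet contents: one (label, value) pair per row,
# using the label names from the macro's y_labels array. Values are made up.
scratch_rows = [
    ("y_lon", -72.278542),
    ("y_lat", 42.933659),
    ("y_addr", "400 Marlboro St"),
    ("y_precision", "address"),
    ("y_clean", True),
]

# Rather than minting one variable per label with eval/exec, store each
# value under its label, which is the role the keyed Collection plays in VBA.
y_values = {}
for label, value in scratch_rows:
    y_values[label] = value

print(y_values["y_lat"])   # 42.933659
```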
All in all, it was an educational exercise. The patterns here can serve as a model for any scenario that involves interactively querying a web service based on some cell in Excel, and then incorporating the results into companion cells. Of course since I’m a complete novice when it comes to this stuff, I’m hoping that by posting my code I’ll also find out about other and better approaches.
[1] You’ll want to substitute your Yahoo application id for YahooDemo. And unless the addresses you’re looking up happen to be in my town, you’ll want to adjust the city and state too.
dim address as string
dim escaped_address as string
dim city as string
dim state as string
dim yahoo_id as string
dim url as string
dim y_labels() as variant
dim y_values as collection
dim main_address_col as integer
dim scratch_data_col as integer
dim scratch_label_col as integer
public sub init()
city = "Keene"
state = "NH"
address = ActiveWorkbook.Application.ActiveCell
main_address_col = 2
scratch_label_col = 1
scratch_data_col = 2
escaped_address = replace(address, " ", "+", 1)
yahoo_id = "YahooDemo"
y_labels = array ("y_lon","y_lat","y_addr","y_precision","y_clean")
end sub
public sub GeocoderYahoo()
on error goto ErrorMsg
init
url = "http://local.yahooapis.com/MapsService/V1/geocode?appid=" & yahoo_id
url = url & "&city=" & city & "&state=" & state & "&street=" & escaped_address
'call the geocoder
ActiveWorkbook.XmlMaps("YahooMap").Import url:=url
'find current row
index = application.match(address,columns(main_address_col),0)
set y_values = new collection
'gather results into collection
worksheets("Scratchpad").select
for each label in y_labels
row = application.match(label,columns(scratch_label_col),0)
rows(row).columns(scratch_data_col).select
y_values.Add Item:=selection.value, Key:=label
next label
'unpack collection
worksheets("Main").select
for each label in y_labels
col = application.match(label,rows(1),0)
rows(index).columns(col).select
selection.value = y_values(label)
next label
rows(index).columns(main_address_col).select
goto Fini
ErrorMsg:
msgbox ("Cannot geocode: " & address)
Fini:
end sub
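Here is a hedged Python paraphrase of the macro's "unpack collection" step, with the worksheet modeled as a list of rows whose first row holds the column headers. The row and the columns are both located by matching, as the macro does with Application.Match; the function name and sheet layout are illustrative, not part of the workbook.

```python
def write_results(main_sheet, address, y_values, address_col=2):
    """Write geocoder results into the row whose address cell matches.

    main_sheet is a list of rows (lists); row 0 is the header row.
    address_col is 1-based, like the macro's main_address_col.
    """
    header = main_sheet[0]
    # Find the current row: the first row whose address cell equals `address`.
    addresses = [row[address_col - 1] for row in main_sheet]
    row_index = addresses.index(address)
    for label, value in y_values.items():
        # Like MATCH(label, row 1, 0); raises ValueError if a header is missing,
        # much as the macro would fall through to its error handler.
        col_index = header.index(label)
        main_sheet[row_index][col_index] = value
    return main_sheet

main = [["id", "address", "y_lon", "y_lat"], [1, "400 Marlboro", None, None]]
write_results(main, "400 Marlboro", {"y_lon": -72.278542, "y_lat": 42.933659})
print(main[1])   # [1, '400 Marlboro', -72.278542, 42.933659]
```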
August 7, 2007
Posted by Jon Udell under Uncategorized
Imagine a 300-page history of the United States that spent the first 290 pages on events up to and including the Civil War, then zoomed through everything else in the last 10 pages. According to Doug Gale, whom I met this week at EDUCAUSE, that’s more or less how the history of the Internet has been written. He runs a consultancy called Information Technology Associates, which is located in Big Sky, Montana because it can be. Earlier in his career he was an NSFNET administrator who was instrumental in taking us from a research network with a few hundred nodes in 1980 to the 7-million-node, recognizably modern Internet we had by 1995. Although much has been written about the early ARPANET, there’s surprisingly little documentation of the 15-year period from the development of CSNET in 1980 to the decommissioning of the NSFNET in 1995.
So Doug Gale is now interviewing seventy-odd of the key players in that 15-year transformation, in order to build an archive of source materials that can be used to write the history of that formative era. The oral archive is currently a work in progress, and none of the interviews captured so far have been published, but he intends to do that and is looking for help.
I’d love to read that history once it’s written. Meanwhile, here’s the project’s home page.
August 7, 2007
Posted by Jon Udell under Uncategorized
Hugh McGuire, the LibriVox dude, has started a new project, datalibre.ca, whose tagline is: urging governments to make data about canada and canadians free and accessible to citizens. He wanted to know more about why I’ve been focusing lately on the issue of access to public data, so he emailed me some questions and has now posted the answers here.
August 7, 2007
Posted by Jon Udell under Uncategorized
As senior technical officer for the Defense Intelligence Agency and chief of its requirements and research group, Lewis Shepherd has promoted and observed a remarkable transformation that’s occurring inside the U.S. intelligence community as analysts begin to embrace Web 2.0 practices. When we first met in March I jumped at the chance to quiz him about how Intellipedia, blogs, and related methods are taking root in the various agencies. In this week’s ITConversations podcast we replay that conversation.
The Intellipedia story is fairly well known, but in this podcast you’ll also hear about the viral adoption of blogging among analysts. It’s surprising in one way because as Lewis Shepherd points out, you wouldn’t expect a senior analyst who has built up a reputation as the reigning expert on some topic to be thrilled about having a junior analyst comment on — or edit! — the work. But on the other hand, he notes that this is essentially a scholarly community and the urge to publish, and to be cited, is strong. It’s a fascinating tale of culture change.
In addition to social software, we also discussed a range of initiatives in the realms of virtualization, service-oriented architecture, and the semantic Web.
August 6, 2007
Posted by Jon Udell under Uncategorized
While traveling to Snowmass Village, Colorado today for the EDUCAUSE Seminars on Academic Computing, I listened to a pair of podcasts: Stewart Brand at PopTech and Esther Dyson at ITConversations. As often happens, I thought of questions I’d like to ask, and if I can bring those two onto my own show sometime I’ll do just that. In this particular case, I’d love to have a conversation at the intersection of the topics discussed on those podcasts. Stewart Brand’s topic was world urbanization. It’s a major theme for him lately, and has been the subject of several of the Long Now talks, including his own on cities and time, and Robert Neuwirth’s on the nature and dynamics of squatter cities. We’re becoming an urban planet, Brand says. The crossover moment, when more than half of humanity lives in cities, may already have occurred, or else soon will. He thinks we’ll go far beyond it in this century, becoming a mostly urban world.
Of course Stewart Brand has been wrong before, as he freely admits. Decades ago he worried about the population explosion. But while he’s astonished by the doubling that’s occurred in his lifetime, he’s even more astonished to think that it was probably the last doubling, and that after leveling out at between 7 and 9 billion the world population is expected to sharply decline.
Could he be wrong again? Could humanity’s rush to the cities slow down or even reverse? Since the concentration of economic opportunity in cities is what brings people there, it would take a dispersal of economic opportunity to enable those who would prefer the countryside to remain there.
One powerful force that’s dispersing economic opportunity is of course the Internet. A decade ago there were a few lucky souls who could pull an income through a modem. Today there are lots more, and we’ve yet to see what may happen once high-bandwidth telepresence finally gets going.
But a second force for dispersion has yet to kick in at all. It is the Internetization of transportation — and specifically, of air travel. That’s where Esther Dyson comes in. She’s investing in several of the companies that are aiming to reinvent air travel in the ways described by James Fallows in his seminal book on this topic, Free Flight. In that vision of a possible future, a fleet of air taxis takes small groups of passengers directly from point to point, bypassing the dozen or so congested hubs and reactivating the thousands of small airports — some near big cities, many elsewhere.
There are two key technological enablers. First, a new fleet of small planes that are lighter, faster, smarter, safer, and more fuel-efficient than the current fleet of general aviation craft with their decades-old designs.
The second enabler is the Internet’s ability to make demand visible, and to aggregate that demand. So, for example, I’m traveling today from Keene, NH to Aspen, CO. If there are a handful of fellow travelers wanting to go between those two endpoints (or between, say, 40-mile-radius circles surrounding them, each of which might contain several small airports), we’d use the Internet to rendezvous with one another and with an air taxi.
For me that could be a huge win. There’s an airport not much more than a mile from my house with a runway that can land Air Force One, and in political seasons sometimes does. Years ago we had commercial air service to Boston and New York thanks to a federal essential air service subsidy, but that wasn’t enough to keep the operation going and now it’s gone. So my day looks like this:
1. Drive to Boston’s Logan airport: 2 hours. I can sometimes fly from Manchester, NH, which is only an hour and a quarter, but almost never directly to anywhere. And since today already involves an unavoidable hub — Denver — I’m avoiding a second by going directly there.
2. Logan’s economy lot to the United terminal: 20 minutes. It can be worse, but today the bus was there waiting and left quickly.
3. Clear security and wait: 1.5 hours.
4. Logan to Denver: 4 hours.
5. Layover in Denver: 2 hours.
6. Denver to Aspen: 1 hour.
7. Cab to Snowmass: 30 minutes.
The hypothetical air taxi scenario looks like this:
1. Drive to Keene airport: 6 minutes.
2. Clear security and wait: 30 minutes.
3. Passenger pickup in Amherst, MA: 30 minutes.
4. Passenger pickup in Albany, NY: 30 minutes.
5. Albany to Aspen: 7 hours.
6. Cab to Snowmass: 30 minutes.
Let’s compare the two scenarios:
| Leg (hours) | Conventional | Air taxi | Difference |
| --- | --- | --- | --- |
| Drive to airport | 2 | 0.1 | (1.90) |
| Shuttle bus | 0.33 | 0 | (0.33) |
| Security and waiting | 1.5 | 0.5 | (1.00) |
| Passenger pickup | 0 | 0.5 | 0.50 |
| Passenger pickup | 0 | 0.5 | 0.50 |
| Main flight | 4 | 7 | 3.00 |
| Layover | 2 | 0 | (2.00) |
| Secondary flight | 1 | 0 | (1.00) |
| Cab | 0.5 | 0.5 | 0.00 |
| Total | 11.33 | 9.1 | (2.23) |
According to this back-of-the-envelope calculation, the air taxi scenario isn’t a huge win. It only shaves a couple of hours off the trip, and we haven’t even considered how the prices of the two scenarios will compare.
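The arithmetic is easy to check, and splitting the difference column into hours the air taxi saves and hours it adds makes the distinction in the next paragraph easier to see. A small Python sketch using the figures from the comparison table (the two pickup legs are labeled with their towns from the itinerary):

```python
# Hours per leg from the comparison table: (conventional, air taxi).
legs = {
    "Drive to airport": (2, 0.1),
    "Shuttle bus": (0.33, 0),
    "Security and waiting": (1.5, 0.5),
    "Passenger pickup (Amherst)": (0, 0.5),
    "Passenger pickup (Albany)": (0, 0.5),
    "Main flight": (4, 7),
    "Layover": (2, 0),
    "Secondary flight": (1, 0),
    "Cab": (0.5, 0.5),
}

conventional = round(sum(c for c, t in legs.values()), 2)
air_taxi = round(sum(t for c, t in legs.values()), 2)

# Hours the air taxi saves (hub overhead) vs. hours it adds (pickups, slower plane).
saved = round(sum(c - t for c, t in legs.values() if c > t), 2)
added = round(sum(t - c for c, t in legs.values() if t > c), 2)

print(conventional, air_taxi, saved, added)   # 11.33 9.1 6.23 4.0
```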
But if I put on my Clayton Christensen hat and look at this from the perspective of disruptive technology, it seems that the positive values in the difference column are much more compressible than the negative values. In the conventional scenario, I don’t expect any significant reduction in the time it takes to get to, or through, hub airports. In the air taxi scenario, however, I can imagine significant reductions on two fronts. If this model starts to succeed, there will be more aggregatable demand and thus fewer multi-hop passenger pickups. And there will also be more incentive to make smaller planes fly faster. As with other disruptive technologies, the air taxi system at first underperforms the incumbent system, but has lots of headroom for improvement.
I have no idea if this will come to pass, or if I’ll live long enough to personally benefit from it. But it’s something I always think about on trips like this one. Looking out the window of the plane I can see a big world down there, full of many beautiful places that are mostly empty and — if Stewart Brand is right — are only going to get emptier.
Maybe it’s true that, given a choice and all other things being equal, most people would prefer to live in settlements of millions to tens of millions rather than tens of thousands to hundreds of thousands. But things aren’t equal, and while the quality of life I enjoy living in a settlement of tens of thousands is special in many ways, I pay a price for it. On travel days like today, I’m reminded that dealing with hub-and-spoke air travel is a fairly significant part of that price.
I don’t doubt that the world will continue to urbanize, but I do wonder about the architecture of the emerging network of cities. Will the growth of megacities preclude the growth of towns and small cities, or can we flourish across this range of scales? The Internet is already enabling the latter in ways that I don’t think have yet been factored into the predictions about world urbanization. Add to that the Internetization of air travel and things just might turn out rather differently than predicted.
PS: After I wrote this en route, my Denver to Aspen hop was cancelled due to thunderstorms. So it took another four hours to rent a car and drive 150 miles. On the bright side, six of us shared a minivan, and it was a wonderful group: a brain surgeon, a CTO, a psychologist, an artist, and a telecommunications and real estate entrepreneur. We talked the whole time in a way that rarely happens on planes but that, it struck me, might be more likely to happen in the more intimate cabin of an air taxi.
August 3, 2007
Posted by Jon Udell under Uncategorized
I play a bit of fingerstyle guitar as a hobby, and a while ago I found a nice arrangement of The Tennessee Waltz which I’ve been trying to learn. The other day I went to YouTube to check out some other arrangements. Wow. There’s a smoking version by Bonnie Raitt and Norah Jones, a classic Patti Page version, a jazzy version by Kirk Whalum, a soul rendition by Otis Redding, this sweet one by Dan Hardin, and a bunch more. It’s astonishing to be able to sample all these different arrangements. It’s also, very likely, an anomaly. What happened to Napster, and what’s happening to Internet radio, will very likely happen here too.
We can argue until the cows come home about fair use and the appropriate scope of copyright, but the current regime has serious momentum, and significant change could be a long way off.
Meanwhile, a profound new kind of collaboration, enabled by Internet video, is trying to emerge. Sure, you can use Internet video to share cute animal movies [1], but you can also use it to share knowledge about lawnmower maintenance or, as Lucas Gonze notes here, guitar playing.
In that post, Lucas reacts to an NPR story about the YouTube takedown of video guitar lessons. And he writes:
When a music publisher prevents musicians from learning a song, they are destroying the value of the song. There’s no reason to learn the Smoke on the Water riff except that everybody else knows it, and cultural ubiquity isn’t possible unless learning is absolutely free and unencumbered. Notice that the song in the original quote is by the Rolling Stones, a band that couldn’t matter less if it weren’t part of pop culture canon.
One result of copyright extremism will be the disappearance of cultural icons like the Rolling Stones. They haven’t contributed anything fresh to the culture for close to forty years, and without third parties reusing their old work in ways that make it fresh they hardly exist. In terms of 2007 pop culture, all those covers of “Paint it black” *are* “Paint it black.”
This is why I am resurrecting 150-year old songs and posting them, along with sheet music, on my blog — it’s possible for those songs to be used as source material for new work.
Although most of my friends disagree with the premise that out-of-copyright material can interest modern audiences, Lucas had me at hello with this project. That’s probably because, though I wouldn’t have thought Episcopal hymns would be toe-tappers, I love to hear — and play — John Fahey’s arrangements of tunes like In Christ There Is No East Or West.
Now I learned that one from a book, so the arrangement is copyrighted by the publisher, although the tune itself is available for reuse. But as I watched and listened to all those different versions of The Tennessee Waltz, I couldn’t help but wonder what might happen if that dynamic were applied to out-of-copyright tunes. Can more of the old tunes be reborn? If so, will our new ability to share, teach, and learn turbocharge the creative process surrounding them? If so, will that process in turn lead to the production of new tunes? If so, will some of those new tunes achieve cultural ubiquity? If so, will some of those conceivably remain outside the copyright regime?
That’s a whole lot of ifs and, as I said, most of my friends think none of this will ever happen. As for me, I dunno. Maybe yes, maybe no. Either way, props to Lucas for having the vision and taking the shot.
[1] With that video at 41,000 views and climbing steadily, I sometimes worry in dark moments that, when all is said and done, I’ll be known as “the guy who made that cute video with the kittens and the bunnies.”
August 2, 2007
Posted by Jon Udell under Uncategorized
Thanks to some really great comments on yesterday’s item, I’ve taken another pass through the spreadsheet I got from the police department [1]. It looks like Chris Anderson and David French were exactly right to suggest a “police station effect”: namely, that there’s more crime at or near the police station.
Here’s a version of yesterday’s chart (with cleaner underlying data):
It’s focused on the old location of the police station which, you may recall, moved from Central Square in Jan 2006. If you thought the presence of the station would suppress the number of incidents, you wouldn’t find evidence for that here.
Now here’s the same thing focused on the new location of the police station:
That’s pretty clear!
Two causes were suggested.
1 (Chris): “The station was the place of the crime report and there was often no specific address.”
Yup. Of the 341 incidents within 0.1 mile of the new station, 315 were at the exact address.
2 (David): “This is where you end up when they let you out of the drunk tank.”
It’s possible to explore that spillover effect, but I’ll stop here and call out another excellent comment from Doug Finner:
If you get a big pile-o-data and don’t know everything about how the data was collected, it can be pretty close to impossible to do anything other than make very general observations. Trying to draw conclusions from data that is likely ‘dirty’ is often a fools errand. Probably the best you can do, is find interesting trends and then try and get good clean data collected – the whole scientific method thing.
Indeed. For this round I took a much more critical look at the address data. I discarded the fair number of junk addresses that resolved erroneously to the city center. And because the addresses in the file didn’t specify “St” or “Rd” there were systematic problems — particularly in the case of Marlboro which was resolving to Rd rather than St.
As Doug Finner suggests, it would be wise at this point to hand back the file augmented not only with latitude/longitude coordinates, but also with indications of how clean or dirty the geocoding was, and recommendations on how to improve it.
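One way to hand back that kind of annotation is to attach a list of problems to each geocoded row. The sketch below checks three things this post ran into: results that fall back to the city center, matches coarser than an exact address, and raw addresses with no street type. The center coordinates, the precision value "address", and the suffix list are assumptions for illustration, not values from the police data.

```python
# Hypothetical city-center point that a geocoder might fall back to.
CITY_CENTER = (42.9340, -72.2780)
# Hypothetical (incomplete) list of street types.
SUFFIXES = {"ST", "RD", "AVE", "DR", "LN", "CT", "PL", "WAY"}

def assess(raw_address, lat, lon, precision):
    """Return a list of problems; an empty list means the row looks clean."""
    problems = []
    # Rounded comparison: a fallback result lands on the same point every time.
    if (round(lat, 4), round(lon, 4)) == CITY_CENTER:
        problems.append("resolved to the city center fallback")
    if precision != "address":
        problems.append("matched only at %s level" % precision)
    last_word = raw_address.strip().split()[-1].upper().rstrip(".")
    if last_word not in SUFFIXES:
        problems.append("no street type; add St, Rd, etc.")
    return problems

print(assess("400 Marlboro St", 42.933659, -72.278542, "address"))  # []
print(assess("400 Marlboro", 42.934, -72.278, "city"))
```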
Meanwhile, the toolsmith in me is getting fired up with all kinds of ideas. For example, when I processed the raw file to create this categorized stack graph I wound up creating an ad-hoc system of piped filters in Python. Each one takes a list of rows and returns a transformed list of rows. Here are some of them:
- removeIncidentnums
- dedupeCasenums
- adjustDates
- trimDescs
- removeSingletonDescs
- addCategories
- addMonthlyCounts
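The post names these filters but doesn't show them. Here's a hedged sketch of the pattern in Python, with guessed bodies for three of them and a guessed row layout (dicts with casenum and desc keys). The point is the shape: every stage takes a list of rows and returns a list of rows, so stages compose like pipes.

```python
from collections import Counter
from functools import reduce

# Each filter takes a list of row dicts and returns a new list of row dicts.
# The names come from the post; the bodies and the row layout are guesses.

def dedupeCasenums(rows):
    """Keep only the first row seen for each case number."""
    seen, kept = set(), []
    for row in rows:
        if row["casenum"] not in seen:
            seen.add(row["casenum"])
            kept.append(row)
    return kept

def trimDescs(rows):
    """Strip stray whitespace from incident descriptions."""
    return [dict(row, desc=row["desc"].strip()) for row in rows]

def removeSingletonDescs(rows):
    """Drop rows whose description occurs only once in the dataset."""
    counts = Counter(row["desc"] for row in rows)
    return [row for row in rows if counts[row["desc"]] > 1]

def pipeline(rows, *filters):
    """Run the rows through each filter in turn, like a chain of pipes."""
    return reduce(lambda acc, f: f(acc), filters, rows)

rows = [
    {"casenum": "07-001", "desc": "THEFT "},
    {"casenum": "07-001", "desc": "THEFT "},
    {"casenum": "07-002", "desc": "THEFT"},
    {"casenum": "07-003", "desc": "NOISE"},
]
print(pipeline(rows, dedupeCasenums, trimDescs, removeSingletonDescs))
```

Order matters: trimming before counting means "THEFT " and "THEFT" are treated as the same description, so neither is dropped as a singleton.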
All well and good. But this just begs for some kind of social treatment a la Pipes or Popfly, with a particular focus on the transformation of rectangular datasets.
I’m also thinking about ways to meld Python and Excel together more closely. So far, I’ve only relied on code generation — that is, using Python to write VBA macros to, for example, define named ranges. There’s also the possibility of outside-in automation, where Python drives Excel through its automation interface. But then I got to wondering: Will there be a role for IronPython (or IronRuby) here, someday, such that you could use these languages inside Excel? That’d be very cool.
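As an illustration of the code-generation approach, here's a Python sketch that writes a VBA macro defining workbook names with Excel's Names.Add method, using its RefersToR1C1 argument so that column numbers can be used directly. The range list, sheet name, and macro name are hypothetical.

```python
# Hypothetical named ranges: (name, sheet, first row, last row, column number).
RANGES = [
    ("Dates", "Main", 2, 15001, 3),
    ("Categories", "Main", 2, 15001, 5),
]

def vba_define_names(ranges):
    """Emit the source of a VBA macro that creates one workbook name per range."""
    lines = ["Sub DefineNames()"]
    for name, sheet, first, last, col in ranges:
        refers = "=%s!R%dC%d:R%dC%d" % (sheet, first, col, last, col)
        lines.append('    ActiveWorkbook.Names.Add Name:="%s", RefersToR1C1:="%s"'
                     % (name, refers))
    lines.append("End Sub")
    return "\n".join(lines)

macro = vba_define_names(RANGES)
print(macro)
```

The generated text would then be pasted (or imported) into a module in the workbook and run once.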
[1] Yes, I will publish this data once I’ve had a chance to show my work to the police and get their approval.
August 1, 2007
Posted by Jon Udell under Uncategorized
If you’ve been following the continuing saga of my exploration of local crime statistics [1, 2, 3, 4], here’s an update. The police department has provided a spreadsheet containing a complete dump of reported crimes back to 2002, including the location (address) information I was looking for.
This dataset includes about 15000 rows, which is far too many to show on a map without some fancy filtering. But while pondering what to do about that, I realized I could try to answer two questions that folks have been asking:
1. Is there more crime in the past few years?
2. If so, is the increase localized to the downtown area?
The second question arises because the police department relocated, in 2006, from the center of town to a peripheral address. It’s been suggested that there is, as a result, less of a police presence downtown, and thus more crime.
The answer to the first question appears to be yes. As Martin Wattenberg observes, in his comments on that visualization, there’s a striking seasonal pattern: strong dips in winter, weak dips in summer. He asks:
Is this weather-related (potential criminals thinking “It’s too cold to mug anyone” in January)? Are there population changes in Keene, like tourism or college students, that would cause this?
I think he’s right on both counts. It’s cold here in winter, and it’s a college town.
More broadly though, the 2006 peak is noticeably higher than prior years’ peaks, and though we’re only in the middle of 2007, it’s tracking the 2006 pattern. Clearly crime is up since 2006.
But is there a correlation between downtown crime and the relocation of the police department? According to this chart, there is, if anything, a reverse correlation:
Here’s how I made that chart:
1. Geocode the addresses to latitude/longitude locations.
2. Compute the distance of each location from the town center.
3. Group the locations into zones.
4. Chart the percent of crimes in each zone.
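The post doesn't say how the distances were computed. A minimal sketch of steps 2 through 4, assuming the haversine great-circle formula, a hypothetical town-center coordinate, and hypothetical zone boundaries in miles:

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical town center and zone boundaries (upper limits in miles).
CENTER = (42.9340, -72.2780)
ZONES = [(0.25, "core"), (1.0, "inner"), (float("inf"), "outer")]

def zone_for(lat, lon):
    """Name of the first zone whose outer limit contains the point."""
    d = haversine_miles(CENTER[0], CENTER[1], lat, lon)
    for limit, name in ZONES:
        if d <= limit:
            return name

def zone_percentages(points):
    """Percent of (lat, lon) points falling in each zone."""
    counts = Counter(zone_for(lat, lon) for lat, lon in points)
    total = sum(counts.values())
    return {name: 100.0 * counts[name] / total for _, name in ZONES}

print(zone_percentages([CENTER, CENTER, (43.0, -72.278)]))
```

To get a per-year comparison like the chart's, you would run zone_percentages separately on each year's geocoded points.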
I’ll reflect in a separate entry on the nature of that process, and on ways it could be made more accessible to the less technically inclined. But if this result proves to be valid, it’s a nice example of citizen use of public data. And of course if someone else’s analysis of the data (and of my methods) were to challenge my result and prove something different, that would be even better!