Excel geocoding adventures

As mentioned here, I’ve been working with a spreadsheet containing addresses that want to be geocoded. I’ve had lots of experience running batches of addresses through geocoding services, but in the case of the police department I’ve been working with, it would be nice to be able to do the geocoding interactively. That way, if 400 Marlboro resolves incorrectly to 400 Marlboro Rd., the clerk will know it’s necessary to specify 400 Marlboro St. if that’s the intended address.

I found two examples of spreadsheets programmed with this behavior, first from AutomateExcel.com and second from Juice Analytics. When I compared these I realized that I wanted to combine aspects of both.

The AutomateExcel version is extremely simple. That’s partly because it uses the XML mapping features of Excel 2003 or 2007 to capture the XML output of a geocoding service, and partly because it only deals with a single address.

The Juice version is more complex. That’s partly because it eschews the XML mapping features in order to support older versions of Excel, and partly because it deals with many addresses. (It also exports KML for use with Google Earth.)

In my case I was willing to assume Excel 2003 or later, and use XML mapping. But I wanted to be able to accumulate results for many rows of addresses. I also wanted to switch from the XML output of geocoder.us, which is used in the AutomateExcel version, to the XML output of Yahoo’s geocoder, which is used in the Juice version.

The version I came up with is here, and the VBA code appears below. I haven’t used VBA or the XML mapping features of Excel in a while so, while the experience is fresh in my mind, I thought I’d record some of my key observations.

Mapping the output of Yahoo’s geocoder

I started by replicating the XML mapping in the AutomateExcel version. Here’s a sample geocoder.us query:

geocode?address=400%20marlboro%20st,keene,nh

To create an XML mapping in Excel you do: Data -> From the web -> [plug in the URL] -> Import. Excel warns: “The specified XML source does not refer to a schema. Excel will create a schema.” OK.

Then it asks: “Where do you want to put the data?” I answered: “XML table in existing worksheet.”

Then I did Developer -> Source to reveal the XML map, unbound the mapped fields, and rebound them to the vertical rather than horizontal layout I wanted to use.

It was all good. So now I tried the same procedure using this sample Yahoo query:

geocode?appid=YahooDemo&street=400%20marlboro%20st&city=Keene&state=NH1

But this time when I unbound and rebound the fields, I couldn’t access the values in the same way. Eventually I saw why not. The Yahoo results reference a schema, and that triggers a more complex behavior in Excel involving the importation of whole data sets.

So I saved an instance of Yahoo’s XML results in a file, stripped out the schema reference, and then acquired it using Data -> From other sources -> From XML Data import. Then it behaved just like the first example. I expect there’s a simpler solution, and hopefully this item will attract a reference to it.

With the mapping done, it’s a one-liner in VBA (ActiveWorkbook.XmlMaps(“YahooMap”).Import url:=url) to fetch the XML data and spray it into the mapped cells. That’s dramatically simpler than the regular-expression gymnastics performed by the Juice version. Of course if you need to support older versions of Excel, you’ve got to perform those gymnastics.

Relativizing references

My first version was full of hardcoded references to rows and columns in the temporary sheet where the XML data gets unpacked, and in the main sheet where raw addresses are decorated with latitude, longitude, parsed address, precision (e.g. exact address vs. street-level vs. city-level), and cleanliness (e.g. whether there were warnings).

I knew that I’d need to use lookup functions to relativize all those references, and it soon became apparent that I’d want to use the Match function — which finds the position of an item in a row or column — to do it. But it returns numeric positions, which are fine for rows but don’t correspond to alphanumeric column names like C3. The solution, as generations of Excel hackers have learned but I never had need of until now, is to go to Options and enable the R1C1 reference style. Now the columns have numbers too, and in VBA you can write reference like so:

rows(index).columns(address_col)

Dynamic variable assignment

That cleaned up a lot of the mess, but there was still a lot of per-variable code that I’d written in order to stash the geocoded results into VBA variables and then later retrieve them. I thought of generalizing that by using Eval, like so:

Eval( ‘y_lat = Selection.Value’ )

But no dice. Excel 2007 told me there was no Eval function. Which is just as well, because that time-honored trick is really sketchy. So I went looking for VBA’s equivalent to the Perl associative array or the Python dictionary, and found it in VBA’s Collection.

All in all, it was an educational exercise. The patterns here can serve as a model for any scenario that involves interactively querying a web service based on some cell in Excel, and then incorporating the results into companion cells. Of course since I’m a complete novice when it comes to this stuff, I’m hoping that by posting my code I’ll also find out about other and better approaches.


1 You’ll want to substitute your Yahoo application id for YahooDemo. And unless the addresses you’re looking up happen to be in my town, you’ll want to adjust the city and state too.

dim address as string
dim escaped_address as string
dim city as string
dim state as string
dim yahoo_id as string
dim url as string
dim y_labels() as variant
dim y_values as collection
dim main_address_col as integer
dim scratch_data_col as integer
dim scratch_label_col as integer

public sub init
  city = "Keene"
  state = "NH"
  address = ActiveWorkbook.Application.ActiveCell
  main_address_col = 2
  scratch_label_col = 1
  scratch_data_col = 2
  escaped_address = replace(address, " ", "+", 1)
  yahoo_id = "YahooDemo"
  y_labels = array ("y_lon","y_lat","y_addr","y_precision","y_clean")
end sub

public sub GeocoderYahoo()

  on error goto ErrorMsg

  init

  url = "http://local.yahooapis.com/MapsService/V1/geocode?appid=" & yahoo_id
  url = url & "&city=" & city & "&state=" & state & "&street=" & escaped_address

  'call the geocoder
  ActiveWorkbook.XmlMaps("YahooMap").Import url:=url

  'find current row
  index = application.match(address,columns(main_address_col),0)

  set y_values = new collection

  'gather results into collection
  worksheets("Scratchpad").select
  for each label in y_labels
    row = application.match(label,columns(scratch_label_col),0)
    rows(row).columns(scratch_data_col).select
    y_values.Add Item:=selection.value, Key:=label
  next label

  'unpack collection
  worksheets("Main").select
  for each label in y_labels
    col = application.match(label,rows(1),0)
    rows(index).columns(col).select
    selection.value = y_values(label)
  next label

  rows(index).columns(main_address_col).select

  goto Fini

ErrorMsg:
 msgbox ("Cannot geocode: " & address)

Fini:

end sub

15 thoughts on “Excel geocoding adventures

  1. Chandu Thota

    Jon, perhaps you should give Microsoft MapPoint 2006 a try. It is perfectly suited for a task like this. I’m no longer with the firm, but if you need some internal help, ping folks in the Virtual Earth/MapPoint group. :)

    Reply
  2. Pingback: Transmission of tacit knowledge: teaching what we don’t know that we know « Jon Udell

  3. John Ward

    Jon,
    Thanks for the spreadsheet. Like you I rolled my own from inputs, including yours.

    One particular difference is I included was a check link. A generated hyperlink to Google Maps at the returned geo Location. It allows for some user validation that the address is where they said it was (http://thinedgeofthewedge.blogspot.com/2007/08/excel-spreadsheet-for-getting-geo.html)

    I see Yahoo maps supports returning an image. What would be fantastic would be to import that image directly into the spreadsheet. Thats for another day!

    John

    Reply
  4. Pingback: Jing’s the thing « Jon Udell

  5. Pingback: rajsingh.org blog

  6. Celia

    you seem like a smart guy and i have been searching the net for one year to try and find a simple way to map multiple addresses from excel. I did it a couple of times a while ago but now i can’t remember how I did it. Do I have to get them in longitude and latitude first by geocoding then what.. how do I get a kml file to import into maps.google??? I need serious help. I have 50 houses a day to map. Or any suggestions for a solution.

    Reply
  7. Pingback: du hoc Singapore

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s