Back in 2007 I talked with Pablo Castro about Astoria, which I described as a way of making data readable and writeable by means of a RESTful interface. The technology has continued to move forward, and I’m now a heavy user of one of its implementations: the Azure table store. Yesterday at PDC we announced the proposed standardization of this approach as OData, which InfoQ nicely summarizes here.

I’ll leave detailed analysis of the proposal, and the inevitable comparisons to Google’s GData, to others who are better qualified. Nowadays I’m mainly a developer building a web service, and from that perspective it’s very clear that wide adoption of something like “ODBC for the cloud” is needed. We have no shortage of APIs, all of which yield XML and/or JSON data, but each one imposes its own friction when you try to compose with the others.

For example, the elmcity service merges event information from sets of iCalendar feeds and also from three other sources: Eventful, Upcoming, and (recently added) Eventbrite. For each of the three, I’ve had to write a slightly different version of the same algorithm:

  • Query for future events
  • Retrieve the count of matching events
  • Page through the matching events
  • Map events into a common data model

Each service uses a slightly different syntax to query for future events. And each reports the count of matching events differently: page_count vs. total_results vs. resultcount. OData would normalize the queries. And because the spec says:

The count value included in the result MUST be enclosed in an <m:count> element

it would also normalize the counting of results.
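Concretely, the normalized version of the same query could look like the sketch below, using OData’s URI conventions: $filter selects future events and $inlinecount=allpages asks the service to include the <m:count> element. The "Events" entity set and "StartTime" property are hypothetical names, not part of any real service.

```python
from urllib.parse import quote
from xml.etree import ElementTree

# OData's metadata namespace, where the <m:count> element lives.
M_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

def future_events_uri(service_root, now_iso):
    """Build one future-events query that works against any OData service."""
    filt = quote("StartTime ge datetime'%s'" % now_iso)
    return "%s/Events?$filter=%s&$inlinecount=allpages" % (service_root, filt)

def parse_count(atom_feed_xml):
    """Pull the normalized match count out of an OData Atom feed."""
    feed = ElementTree.fromstring(atom_feed_xml)
    return int(feed.find("{%s}count" % M_NS).text)
```

The point is that both the query syntax and the count element are fixed by the spec, so this code needs no per-service variants.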

Open data on the web has enormous potential value, but if we have to overcome too much data friction in order to combine it and make sense of it, we will often fail to realize that value. ODBC in its era was a terrific lubricant. I’m hoping that OData, widely implemented in software, services, and mashup environments like the just-announced Dallas, will be another.