OData, the Open Data Protocol, is described at odata.org:
The Open Data Protocol (OData) is a web protocol for querying and updating data. OData applies web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores.
The other day, Pablo Castro wrote an excellent post explaining how developers can implement aspects of the modular OData spec, and outlining some benefits that accrue from each. One of the aspects is query, and Pablo gives this example:
http://ogdi.cloudapp.net/v1/dc/BankLocations?$filter=zipcode eq 20007
One benefit of exposing query to developers, Pablo says, is:
Developers using the Data Services client for .NET would be able to use LINQ against your service, at least for the operators that map to the query options you implemented.
I’d like to suggest that there’s a huge benefit for users as well. Consider Pablo’s example, based on some Washington, DC datasets published using the Open Government Data Initiative toolkit. Let’s look at one of those datasets, BankLocations, through the lens of Excel 2010’s PowerPivot.
PowerPivot adds heavy-duty business analytics to Excel in ways I’m not really qualified to discuss, but for my purposes here that’s beside the point. I’m just using it to show what it can be like, from a user’s perspective, to point an OData-aware client, which could be any desktop or web application, at an OData source, which could be provided by any backend service.
In this case, I pointed PowerPivot at the following URL:
I previewed the Atom feed, selected a subset of the columns, and imported them into a pivot table. I used slicers to help visualize the zipcodes associated with each bank. And I wound up with a view which reports that there are three branches of WashingtonFirst Bank in DC, at three addresses, in two zipcodes.
If I were to name this worksheet, I’d call it WashingtonFirst Bank branches in DC. But it has another kind of name, one that’s independent of the user who makes such a view, and of the application used to make it. Here is that other name:
http://ogdi.cloudapp.net/v1/dc/BankLocations?$filter=name eq 'WashingtonFirst Bank'
If you and I want to have a conversation about banks in Washington, DC, and if we agree that this dataset is an authoritative list of them, then we — and anyone else who cares about this stuff — can converse using a language in which phrases like ‘WashingtonFirst Bank branches in DC’ or ‘banks in zipcode 20007’ are well defined.
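To make the mapping from phrase to name concrete, here is a minimal Python sketch that builds the two URLs mentioned above. The helper name filter_url is hypothetical, the service root is simply the one quoted in this post, and the quoting rule (single-quoted string literals, with an embedded quote doubled) reflects my reading of the OData URI conventions rather than anything the OGDI service documents.

```python
from urllib.parse import quote

# Service root taken from the URLs quoted in the post.
SERVICE_ROOT = "http://ogdi.cloudapp.net/v1/dc"


def filter_url(entity_set, field, value):
    """Build an OData URL selecting rows of entity_set where field equals value."""
    if isinstance(value, str):
        # Assumption: OData string literals are single-quoted,
        # and an embedded single quote is escaped by doubling it.
        literal = "'" + value.replace("'", "''") + "'"
    else:
        literal = str(value)
    expression = field + " eq " + literal
    # Percent-encode spaces so the URL survives being pasted into running text.
    encoded = quote(expression, safe="'")
    return SERVICE_ROOT + "/" + entity_set + "?$filter=" + encoded


# "WashingtonFirst Bank branches in DC"
print(filter_url("BankLocations", "name", "WashingtonFirst Bank"))
# "banks in zipcode 20007"
print(filter_url("BankLocations", "zipcode", 20007))
```

The point of the sketch is only that each phrase corresponds to exactly one URL, which anyone can reconstruct from the dataset name, a field, and a value.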
If we incorporate this kind of fully articulated web namespace into public online discourse, then others can engage with it too. Suppose, to take just one small example, I find what I think is an error in the dataset. Maybe I think one of the branch addresses is wrong. Or maybe I want to associate some extra information with the address. Today, the way things usually work, I’d visit the source website and look for some kind of feedback mechanism. If there is one, and if I’m willing to provide my feedback in a form it will accept, and if my feedback is accepted, then my effort to engage with that dataset will be successful. But that’s a lot of ifs.
When public datasets provide fully articulated web namespaces, though, things can happen in a more loosely coupled way. I can post my feedback anywhere — for example, right here on this blog. If I have something to say about the WashingtonFirst branch at 1500 K Street, NW, I can refer to it using a URL: 1500 K Street, NW.
That URL is, in effect, a trackback that points to one record in the dataset.[1] The service that hosts the dataset could scan the web for these inbound links and, if desired, reflect them back to its users. Or any other service could do the same. Discourse about the dataset can grow online in a decentralized way. The publisher need not explicitly support, maintain, or be liable for that discourse. But it can be discovered and aggregated by any interested party.
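As a rough illustration of that scanning step, the following Python sketch pulls out of a block of text every URL that points into the BankLocations dataset. It is a sketch under stated assumptions, not a description of any existing service: the function name inbound_links and the sample text are made up, and it only recognizes URLs whose spaces are percent-encoded.

```python
import re
from typing import List
from urllib.parse import unquote

# Dataset URL taken from the post; everything under it counts as "inbound".
DATASET_PREFIX = "http://ogdi.cloudapp.net/v1/dc/BankLocations"

# A deliberately simple URL matcher: runs of characters with no whitespace,
# angle brackets, or double quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def inbound_links(page_text: str, prefix: str = DATASET_PREFIX) -> List[str]:
    """Return the URLs in page_text that address the dataset or part of it."""
    found = []
    for match in URL_PATTERN.finditer(page_text):
        # Drop trailing sentence punctuation, then decode %20 and friends.
        url = unquote(match.group(0).rstrip(".,;)"))
        if url.startswith(prefix):
            found.append(url)
    return found


# Hypothetical blog text containing one inbound link and one unrelated link.
sample = (
    "I think an entry in "
    "http://ogdi.cloudapp.net/v1/dc/BankLocations?$filter=zipcode%20eq%2020007 "
    "is out of date. See also http://example.org/unrelated."
)
print(inbound_links(sample))
```

A real aggregator would crawl or query a search index rather than scan a single string, but the matching rule would be the same: a link counts if it begins with the dataset's own address.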
The open data movement, in government and elsewhere, aims to help people engage with and participate in processes represented by the data. When you publish data in a fully articulated way, you build a framework for engagement, a trellis for participation. This is a huge opportunity, and it’s what most excites me about OData.
[1] PowerPivot doesn’t currently expose that URL, but it could, and so could any other OData-aware application.
27 thoughts on “OData for collaborative sense-making”
Please repeat your demo using this LINK: (this is a SPARQL Protocol URL that returns a Linked Data Container).
The data should go straight into your spreadsheet via File | Open or Web Query.
You might need to remove the frag id (#) to make the LINKs live in Excel. Once done, just click on a Link to explore its Linked Data mesh.
Would you explain what you mean by fully articulated (as distinct, I suppose, from partially or un-articulated) name spaces?
I mean that every significant item of interest is URL addressable.
A somewhat coarse-grained example: In the case of a city council meeting, that would mean ensuring that every agenda item is addressable, not just the meeting as a whole.
A more fine-grained example: A crime dataset would be URL-addressable by category, by view, or perhaps even by a more precise query that gets down to a rowset or even a field.
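Those levels of granularity can be sketched in Python using the path conventions OData services generally follow: a collection, one entity selected by key, and one property of that entity. The service root below is a hypothetical placeholder, the helper names are invented for illustration, and the entity and property names assume a Northwind-style schema.

```python
# Hypothetical service root; substitute a real OData endpoint.
SERVICE_ROOT = "http://example.org/Northwind.svc"


def entity_set_url(entity_set):
    """Address a whole collection, e.g. all customers."""
    return f"{SERVICE_ROOT}/{entity_set}"


def entity_url(entity_set, key):
    """Address one row by key; string keys are single-quoted."""
    literal = f"'{key}'" if isinstance(key, str) else str(key)
    return f"{SERVICE_ROOT}/{entity_set}({literal})"


def property_url(entity_set, key, prop):
    """Address a single field of one row."""
    return f"{entity_url(entity_set, key)}/{prop}"


print(entity_set_url("Customers"))                        # the whole dataset
print(entity_url("Customers", "ALFKI"))                   # one row
print(property_url("Customers", "ALFKI", "CompanyName"))  # one field
```

Each call yields a distinct URL, which is what it means for the namespace to be articulated at every level of interest.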
Are the Items (entities) within the OData style data sets referenceable via a Generic HTTP URI and de-referenceable via a Location specific HTTP URI (aka. URL)?
BTW – Is there a live OData instance anywhere? Things get much easier with live demos :-)
Answering my own question:
1. Entities do have their own Identifiers
2. Identifiers do resolve to Metadata bearing documents about Referents.
In addition, like RDF based Linked Data, the underlying model is an EAV graph.
The only issue is that Content Negotiation isn’t exploited re. making data representation aspect of OData more dexterous.
As we did with GData, we’ll certainly add support to Virtuoso such that:
1. OData can come in and RDF Linked Data go out
2. RDF Linked Data comes in and OData based Atom representations go out.
All of the above will happen quite easily because:
1. Virtuoso already supports Atom (format and publishing protocol)
2. It has built-in support for XPath/XQuery and XSLT
3. It supports HTTP, including advanced features such as Transparent Content Negotiation and Quality of Service algorithms.
BTW — Virtuoso 6.1 (Open Source Edition) has just been released, see: http://www.openlinksw.com/dataspace/dav/wiki/Main/
This is going to be fun!!
“The only issue is that Content Negotiation isn’t exploited re. making data representation aspect of OData more dexterous.”
What would be an example of how you’d like to see that used?
“As we did with GData, we’ll certainly add support to Virtuoso”
One live instance, BTW, is supporting the OGDI URLs I cited.
If OData decouples the EAV graph representation from Atom, you basically end up with something very close to RDF based Linked Data as espoused by the Linked Open Data Community etc.
Once the above is done (or even without it) there is still the issue of Types. Right now OData seems to only support primitive types (assuming I haven’t overlooked something). For instance, I should be able to define a Class: Person, and then have representations of instances of the Person Class in OData’s Atom+Feed+ext format.
Once we make our (OData to RDF Linked Data) Cartridge you will see (with clarity) the point above — since Content Negotiation is how we RESTfully switch Data Representation (Content Type) served to OData or RDF based Linked Data oriented User Agents, via HTTP 1.1 Transparent Content Negotiation.
I think you’ve stimulated what might finally answer the question:
Can You Have Linked Data Without RDF Data Formats?
Even better, OData vs RDF Linked Data becomes a WAR that never happens!!
This is very interesting reading as I endeavour to discover more about HTTP-based data protocols, thank you for referring me here Kingsley (http://bit.ly/b6cldL).
Regarding Kingsley’s assertion that for OData “the underlying model is an EAV graph”…is there anywhere that you elaborate on that Kingsley? The physical model underlying an OData service is not what I think of as EAV and whilst I know that is not what you meant it *is* clouding my thoughts slightly.
To me, if OData is an EAV then every dataset I’ve ever worked with is EAV. Any links/explanations here would be appreciated.
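One possible reading of the EAV claim, offered as an interpretation rather than as Kingsley's own explanation, is that any addressable record can be restated as (entity, attribute, value) triples once the entity has a URI. The Python sketch below shows that restatement. The function name to_triples, the entity URI, and the sample record are illustrative assumptions, not output from any real service.

```python
def to_triples(entity_uri, record):
    """Restate one record as (entity, attribute, value) triples.

    The entity is identified by its URI, each column name becomes an
    attribute, and each cell becomes a value.
    """
    return [(entity_uri, attribute, value) for attribute, value in record.items()]


# Illustrative record in the style of a Northwind customer row.
customer_uri = "http://example.org/Northwind.svc/Customers('ALFKI')"
customer = {"CustomerID": "ALFKI", "City": "Berlin"}

for triple in to_triples(customer_uri, customer):
    print(triple)
```

On this reading the observation in the comment above holds: almost any tabular dataset can be viewed this way. What the URI adds is that the entity in each triple is something a third party can link to.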
Regarding Kingsley’s request for a live OData instance I have one available (temporarily) at http://northwindazure.cloudapp.net/Northwind.svc/ and there is an associated blog post at http://sqlblog.com/blogs/jamie_thomson/archive/2009/09/10/restful-northwind-on-sql-azure.aspx
“OData vs RDF Linked Data becomes a WAR that never happens!!”
Let us hope so!!! :)
Here is a LINK that I should have added to one of my earlier responses. It basically demonstrates RDF based Linked Data views over the Northwind Demo DB Schema. Key point: courtesy of content negotiation and data representation independence, you can explore the EAV/CR graph from any browser. Now, if you compare this to your links above, the data representation shortcoming should be clear, i.e., OData links don’t work with existing browsers.
Anyway, when we are done with our bridge for OData and RDF Linked Data you will be able to explore the EAV graphs from either realm using existing browsers :-)
1. http://demo.openlinksw.com/tutorial/Northwind/resource/Customer/ALFKI — URI of Customer “ALFKI”
2. http://demo.openlinksw.com/tutorial/Northwind/resource/Order/10643 — URI of an Order
3. http://dbpedia.org/resource/Berlin — URI of a destination City for a shipment
I agree that it should be perfectly possible to put an RDF head on top of an OData server with no loss of structure or semantics.
Note that OData does make use of HTTP content-type negotiation; it’s just that not all implementations make use of that feature. The spec as well as the OData implementation in .NET (WCF Data Services) support Atom and JSON formats, and those are selected using content-type negotiation (the client makes the choice using the Accept header).
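A minimal Python sketch of that client-side choice follows. It only constructs the requests and does not send them, since whether a given endpoint is still live, or honors the header, is not something this sketch can promise. The helper name build_request is hypothetical; the URL is the OGDI one cited in the post.

```python
from urllib.request import Request

# Media types for the two formats mentioned above.
FORMATS = {
    "atom": "application/atom+xml",
    "json": "application/json",
}


def build_request(url, fmt):
    """Build a GET request that asks the server for the given representation."""
    return Request(url, headers={"Accept": FORMATS[fmt]})


url = "http://ogdi.cloudapp.net/v1/dc/BankLocations"
atom_request = build_request(url, "atom")
json_request = build_request(url, "json")

# Same resource, two requested representations.
print(atom_request.full_url, atom_request.get_header("Accept"))
print(json_request.full_url, json_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual fetch. The URL stays the same in both cases, and only the Accept header differs.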
Re. Content Negotiation, when implemented in the spirit of HTTP, the Data Representation choices are negotiated between User Agent and a Server. Ideally, Transparent Content Negotiation and Quality of Service algorithms enable communicating parties to enumerate their Data Representation preferences etc.
Re. JSON (which is as generic as XML) you are talking about a specific JSON+OData (JSON notation for Atom+Feed+Ext used by OData) representation.
Anyway, our bridging of OData and RDF based Linked Data (in both directions) will make lots of issues much clearer.
The great thing here is that we are ultimately going to be dealing REST-fully with a common underlying Data Model i.e. E-A-V Graph, over HTTP. This also means: the transition of Data Access Middleware from Logical Model Orientation to Conceptual Model Orientation is nearing completion, on a very broad scale :-)
BTW – is there a chance that your team could publish OData URIs bound to the Northwind Demo Database? Using a well known schema drives home the benefits of the imminent: Logical to Conceptual model transition, with much more ease than new schemas ever could.
In other words, SQL Azure doesn’t currently export OData URIs but there’s a groundswell movement to vote up the feature.
Hi Jamie, thanks for doing that. Very instructive! And the URIs, at least for now, remain live.
I am trying to connect PowerPivot to a REST URL in RDF format, but PowerPivot doesn’t appear to support it. Do you know how to either convert RDF to Atom, or some other way to make this work?
Re. PowerPivot and URLs that de-ref to RDF based Linked Data Representations: you need a server that is capable of transforming responses into the CXML format understood by Pivot.
What we do is use the Pivot browser (in conventional browser usage pattern form) to establish a bookmark against an RDF based Linked Data Set via Faceted Navigation on the server side. Once the bookmark is established, and Pivot GETs against said URL, you basically have Pivot doing its thing.
Here is a demonstration of how Virtuoso does this: