I had been meaning to explore PubSubHubbub, a protocol that enables near-realtime consumption of data feeds. Then somebody asked me: “Can OData feeds update through PubSubHubbub?” OData, which recently made a splash at the MIX conference, is based on Atom feeds. And PubSubHubbub works with Atom feeds. So I figured it would be trivial for an OData producer to hook into a PubSubHubbub cloud.
I’ve now done the experiment, and the answer is: Yes, it is trivial. In an earlier post I described how I’m exporting health and performance data from my elmcity service as an OData feed. In theory, enabling that feed for PubSubHubbub should only require me to add a single XML element to that feed. If the hub that connects publishers and subscribers is Google’s own reference implementation of the protocol, at http://pubsubhubbub.appspot.com, then that element is:
<link rel="hub" href="http://pubsubbubbub.appspot.com"/>
So I added that to my OData feed. To verify that it worked, I tried using the publish and subscribe tools at pubsubhubbub.appspot.com, at first with no success. That was OK, because it forced me to implement my own publisher and my own subscriber, which helped me understand the protocol. Once I worked out the kinks, I was able to use my own subscriber to tell Google’s hub that I wanted my subscriber to receive near-realtime updates when the feed was updated. And I was able to use my own publisher to tell Google’s hub that the feed had been updated, thus triggering a push to the subscriber.
In this case, the feed is produced by the Azure Table service. It could also have been produced by the SQL Azure service, or by any other data service — based on SQL or not — that knows how to emit Atom feeds. And in this case, the feed URL (or, as the spec calls it, the topic URL) expresses query syntax that passes through to the underlying Azure Table service. Here’s one variant of that URL:
That query asks for the whole table. But even though the service that populates that table only adds a new record every 10 minutes, the total number of records becomes unwieldy after a few days. So the query URL can also restrict the results to just recent records, like so:
The result of that query, however, is different from the result of this one:
Now here’s my question. The service knows, when it updates the table, that any URL referring to that table is now stale. How does it tell the hub that? The spec says that the topic URL “MUST NOT contain an anchor fragment” but “can otherwise be free-form.” If the feed producer is a data service that supports a query language, and the corresponding OData service supports RESTful query, there is a whole family of topic URLs that can be subscribed to. How do publishers and subscribers specify the parent?