Comments on: Influencing the production of public data

By: Making public data APIs is a business now « baby blog

Making public data APIs is a business now « baby blog — Thu, 24 Dec 2009 10:18:55 +0000

[…] Udell blogs about a company that builds interfaces to public-sector […]

By: Button Forums » Enterprise Mashups

Button Forums » Enterprise Mashups — Fri, 11 Sep 2009 10:21:25 +0000

[…] service to anyone that would like to consume it [UPDATE: Read Jon's own writeup of the interview at Influencing the production of public data]. For example, if you want to know the United Kingdom population’s annual growth rate since 1991, […]

By: The Third Bit » Blog Archive

The Third Bit » Blog Archive — Thu, 16 Jul 2009 20:27:39 +0000

[…] discusses this further in his post on influencing the production of public data. So let me throw it open: what do you want your city/county/province/national government to put […]

By: Ken Kennedy

Ken Kennedy — Mon, 13 Jul 2009 00:08:45 +0000

In reply to Jon Udell. I definitely agree with both here; data dumps can often solve relatively simple problems much more quickly than APIs (though you don't necessarily gain the advantage of repeatability on new data). Also, even if you are planning on hitting the API long term, having a local datastore that you can use to mock up the remote API while you're starting out can be very helpful.

By: Jamie Thomson

Jamie Thomson — Thu, 09 Jul 2009 12:07:47 +0000

“By extracting it from HTML tables?”
That was my first assumption when I heard about it, but actually no. It CAN do that, but it’s much more.

If you consider that (a) markup is inherently data and (b) the markup is in a known format (i.e. it hasn’t changed since you last looked at it) then that data can be extracted.

The example I saw was automating the process of going to a SERP, extracting the title/URL/description of each of the “10 blue links” on the first 5 results pages and returning those 50 rows in a 3-column dataset. There are no HTML tables in a SERP, but Kapow can still loop over the results because of the nature of HTML.
Plus it also has workflow (i.e. visit each of the first 5 SERPs in turn and union the results together).

-Jamie
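
[Editor's sketch] The idea described above, treating known markup as data and unioning rows across several pages, can be illustrated with Python's standard `html.parser`. This is not Kapow's API. The `class="result"` attribute, the sample pages, and the names `ResultParser` and `extract` are all assumptions made up for the illustration.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect (title, url) for every <a class="result"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None   # set while we are inside a matching <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "result":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract(pages):
    """Parse each page in turn and union the extracted rows into one list."""
    rows = []
    for html in pages:
        parser = ResultParser()
        parser.feed(html)
        parser.close()
        rows.extend(parser.rows)
    return rows


# Invented pages standing in for successive result pages.
PAGES = [
    '<div><a class="result" href="http://example.com/a">First</a></div>',
    '<div><a class="result" href="http://example.com/b">Second</a>'
    '<a href="/next">Next</a></div>',
]
rows = extract(PAGES)
print(rows)
```

The extraction only holds while the markup keeps the shape the parser expects, which is the "known format" condition in point (b) above.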

By: Jon Udell

Jon Udell — Thu, 09 Jul 2009 11:27:20 +0000

> It harvests data from a DOM

By extracting it from HTML tables?

By: Jamie Thomson

Jamie Thomson — Wed, 08 Jul 2009 21:34:45 +0000

Jon,
I listened to the podcast a couple of weeks ago and it immediately piqued my interest – I’m fascinated by the exposure of web-based data in an easily consumable manner.

To that end I’ve just seen a demo of something called Kapow (don’t worry, I’m not selling anything here. I have no vested interest, I just think it’s a cool technology), which is effectively a cross between a screen scraper and an ETL tool. It harvests data from a DOM and presents it in a RESTful data service – really fascinating stuff. I blogged about it in case you’re interested: http://blogs.conchango.com/jamiethomson/archive/2009/07/08/kapow-etl-for-html.aspx

Keep up the great work – especially the evangelism of calendar subscription.

cheers
Jamie

By: Jon Udell

Jon Udell — Tue, 07 Jul 2009 23:24:48 +0000

> Perhaps there should be one proviso though
> – asking for a feature means a commitment
> to use it.

That’d be an interesting quid pro quo!

By: Jon Udell

Jon Udell — Tue, 07 Jul 2009 23:23:09 +0000

In reply to Edward Vielmetti.

> The API route sounds better, but it
> means that whatever development you
> do to pull things from a site starts
> with protocols and interfaces and
> software and queries and all sorts
> of things that are appealing to
> programmers but difficult for most
> everyone else.

Of course it needn’t be either/or. There can be APIs and downloadable files. Arguably there should be, with a preference for the latter when data quantity is modest, and the former when it is vast.

By: Steven Willmott

Steven Willmott — Tue, 07 Jul 2009 22:02:49 +0000

Thanks for the great conversation and write-up, Jon – it was a pleasure to talk. We definitely see the UNDATA API as an ongoing project and hope it will grow increasingly useful (we’ve just added IMF data + more sorts for the queries) – if we’d gone for an all or nothing on day one it would have taken a lot longer to launch (and maybe would never have made it). We’re certainly keen to hear what people would like from it next (especially if they have a concrete thing they’d like to do with it).

Hopefully growing a useful resource will breed more ideas and then changes in the resource. Interestingly it’s probably a little easier for an unofficial skunkworks project to do this at least to begin with than the UN itself – since expectations on day one would be much higher.

The SQL example is a nice one – it would actually be interesting to add different query languages that people find useful – security and load issues we’d have to look at (plus the API runs on the Google AppEngine – i.e. Big Table, not MySQL).

Perhaps there should be one proviso though – asking for a feature means a commitment to use it :). That way we can fulfill Jon’s idea of co-evolution of the data and the Apps!

We’d be very happy to have feedback / comments / suggestions here in this thread or over at http://www.undata-api.org/

By: Edward Vielmetti

Edward Vielmetti — Tue, 07 Jul 2009 19:11:53 +0000

Jon –

I always wonder whether the right answer is to get an API to a query interface to a database, or whether you’re better off with a snapshot and dump of a collection of raw data.

The API route sounds better, but it means that whatever development you do to pull things from a site starts with protocols and interfaces and software and queries and all sorts of things that are appealing to programmers but difficult for most everyone else.

The alternative, a simple data dump in some easy-to-parse file format, lets you figure out what the query structure looks like based on your own data needs and gives you the opportunity either to restructure things for better efficiency or to apply much more primitive tools to do ad hoc queries.

My experience with this so far has been on a street tree database (nicknamed “EveryTree”) in Ann Arbor – there’s now a CSV file with about 50000 geocoded and identified trees, and I was able to get something useful out of it with tools as simple as “grep” in a small amount of time to get a sense for what was possible.

thanks

annarbor.com
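
[Editor's sketch] The grep-style ad hoc query on a CSV dump mentioned above could look roughly like this in Python. The column names and sample rows are invented for illustration and are not taken from the EveryTree file. `grep_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample standing in for a tree-inventory CSV; the columns
# and rows are made up, not real EveryTree data.
SAMPLE_CSV = """id,species,lat,lon
1,Silver Maple,42.2801,-83.7430
2,White Oak,42.2815,-83.7485
3,Sugar Maple,42.2790,-83.7401
"""


def grep_rows(csv_text, needle):
    """Return rows in which any field contains `needle`, ignoring case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = needle.lower()
    return [
        row for row in reader
        if any(needle in value.lower() for value in row.values())
    ]


maples = grep_rows(SAMPLE_CSV, "maple")
print(len(maples), [row["species"] for row in maples])
# prints: 2 ['Silver Maple', 'Sugar Maple']
```

Unlike plain grep, the rows come back keyed by column name, so a follow-up question such as filtering on `lat` needs no further parsing.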

By: Jon Udell

Jon Udell — Tue, 07 Jul 2009 15:26:38 +0000

> If these data portals would merely allow
> end-users read-only SQL access to their
> underlying databases — they would be
> amazed at what innovative uses might emerge.

Very interesting point. Historically that was unthinkable because of the fear that unthrottled query would impact services. But as databases move to the cloud there is renewed incentive to manage such access. I hope what you envision will come to pass.

By: datalibre.ca · Open Data Access & APIs

datalibre.ca · Open Data Access & APIs — Tue, 07 Jul 2009 13:16:03 +0000

[…] Jon Udell’s latest innovators podcast, Open Data Access with Steven Willmott: There’s growing awareness of the need to publish data online, and to support programmatic access to that data. In this conversation, host Jon Udell talks with Steven Willmott about how his company, 3Scale, helps businesses create and manage application programming interfaces to their data. […]

By: Larry Welkowitz

Larry Welkowitz — Mon, 06 Jul 2009 20:21:15 +0000

We need to know “what the data mean” in its original form. For example, drug companies might say “this drug works 80% of the time”…but what does that mean? Self-report? Blind ratings by clinicians? Biological tests?

Figures don’t lie, but liars figure.

By: Michael E Driscoll

Michael E Driscoll — Mon, 06 Jul 2009 18:45:44 +0000

Indeed, organizations that blindly follow the “give us the data” mantra often give us data that is unusable.

Data portals often fail for the same reason that chart wizards fail — because they are constrained by pre-built casts, which data must be poured into.

But we already have a widely-understood data query language that is interactive and expressive: it’s called SQL.

If these data portals would merely allow end-users read-only SQL access to their underlying databases — they would be amazed at what innovative uses might emerge.

By: Information in Rotation » Blog Archive » Making public data APIs is a business now

Mon, 06 Jul 2009 14:51:03 +0000

[…] Udell blogs about a company that builds interfaces to public-sector […]