<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Influencing the production of public data</title>
	<atom:link href="http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/</link>
	<description>Strategies for Internet citizens</description>
	<lastBuildDate>Sun, 12 Feb 2012 18:22:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Making public data APIs is a business now &#171; baby blog</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-131152</link>
		<dc:creator><![CDATA[Making public data APIs is a business now &#171; baby blog]]></dc:creator>
		<pubDate>Thu, 24 Dec 2009 10:18:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-131152</guid>
		<description><![CDATA[[...] Udell blogs about a company that builds interfaces to public-sector [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Udell blogs about a company that builds interfaces to public-sector [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Button Forums &#187; Enterprise Mashups</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-130140</link>
		<dc:creator><![CDATA[Button Forums &#187; Enterprise Mashups]]></dc:creator>
		<pubDate>Fri, 11 Sep 2009 10:21:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-130140</guid>
		<description><![CDATA[[...] service to anyone that would like to consume it [UPDATE: Read Jon&#039;s own writeup of the interview at Influencing the production of public&#160;data]. For example, if you want to know the United Kingdom population’s annual growth rate since 1991, [...]]]></description>
		<content:encoded><![CDATA[<p>[...] service to anyone that would like to consume it [UPDATE: Read Jon&#39;s own writeup of the interview at Influencing the production of public&nbsp;data]. For example, if you want to know the United Kingdom population’s annual growth rate since 1991, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Third Bit &#187; Blog Archive</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129471</link>
		<dc:creator><![CDATA[The Third Bit &#187; Blog Archive]]></dc:creator>
		<pubDate>Thu, 16 Jul 2009 20:27:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129471</guid>
		<description><![CDATA[[...] discusses this further in his post on influencing the production of public data. So let me throw it open: what do you want your city/county/province/national government to put [...]]]></description>
		<content:encoded><![CDATA[<p>[...] discusses this further in his post on influencing the production of public data. So let me throw it open: what do you want your city/county/province/national government to put [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken Kennedy</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129393</link>
		<dc:creator><![CDATA[Ken Kennedy]]></dc:creator>
		<pubDate>Mon, 13 Jul 2009 00:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129393</guid>
		<description><![CDATA[I definitely agree with both here; data dumps often can solve relatively simple problems much more quickly than APIs (though you don&#039;t necessarily build the advantages of the repeatability on new data). Also, even if you are planning on hitting the API long term, having a local datastore that you can use to mockup the remote API while you&#039;re starting out can be very helpful.]]></description>
		<content:encoded><![CDATA[<p>I definitely agree with both here; data dumps often can solve relatively simple problems much more quickly than APIs (though you don&#8217;t necessarily build the advantages of the repeatability on new data). Also, even if you are planning on hitting the API long term, having a local datastore that you can use to mockup the remote API while you&#8217;re starting out can be very helpful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jamie Thomson</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129261</link>
		<dc:creator><![CDATA[Jamie Thomson]]></dc:creator>
		<pubDate>Thu, 09 Jul 2009 12:07:47 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129261</guid>
		<description><![CDATA[&quot;By extracting it from HTML tables?&quot;
That was my first assumption when I heard about it, but actually no: it CAN do that, but it&#039;s much more.

If you consider that (a) markup is inherently data and (b) the markup is in a known format (i.e. it hasn&#039;t changed since you last looked at it) then that data can be extracted.

The example I saw was automating the process of going to a SERP, extracting the title/URL/description of each of the &quot;10 blue links&quot; on the first 5 results pages and returning those 50 rows in a 3-column dataset. There are no HTML tables in a SERP, but Kapow can still loop over them because of the nature of HTML.
Plus it also has workflow (i.e. visit each of the first 5 SERPs in turn and union the results together).

-Jamie]]></description>
		<content:encoded><![CDATA[<p>&#8220;By extracting it from HTML tables?&#8221;<br />
That was my first assumption when I heard about it, but actually no: it CAN do that, but it&#8217;s much more.</p>
<p>If you consider that (a) markup is inherently data and (b) the markup is in a known format (i.e. it hasn&#8217;t changed since you last looked at it) then that data can be extracted.</p>
<p>The example I saw was automating the process of going to a SERP, extracting the title/URL/description of each of the &#8220;10 blue links&#8221; on the first 5 results pages and returning those 50 rows in a 3-column dataset. There are no HTML tables in a SERP, but Kapow can still loop over them because of the nature of HTML.<br />
Plus it also has workflow (i.e. visit each of the first 5 SERPs in turn and union the results together).</p>
<p>-Jamie</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Udell</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129259</link>
		<dc:creator><![CDATA[Jon Udell]]></dc:creator>
		<pubDate>Thu, 09 Jul 2009 11:27:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129259</guid>
		<description><![CDATA[&gt; It harvests data from a DOM 

By extracting it from HTML tables?]]></description>
		<content:encoded><![CDATA[<p>&gt; It harvests data from a DOM </p>
<p>By extracting it from HTML tables?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jamie Thomson</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129203</link>
		<dc:creator><![CDATA[Jamie Thomson]]></dc:creator>
		<pubDate>Wed, 08 Jul 2009 21:34:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129203</guid>
		<description><![CDATA[Jon,
I listened to the podcast a couple of weeks ago and it immediately piqued my interest - I&#039;m fascinated by the exposure of web-based data in an easily consumable manner.

To that end I&#039;ve just seen a demo of something called Kapow (don&#039;t worry, I&#039;m not selling anything here; I have no vested interest, I just think it&#039;s a cool technology), which is effectively a cross between a screen scraper and an ETL tool. It harvests data from a DOM and presents it in a RESTful data service - really fascinating stuff. I blogged about it in case you&#039;re interested: http://blogs.conchango.com/jamiethomson/archive/2009/07/08/kapow-etl-for-html.aspx

Keep up the great work - especially the evangelism of calendar subscription.

cheers
Jamie]]></description>
		<content:encoded><![CDATA[<p>Jon,<br />
I listened to the podcast a couple of weeks ago and it immediately piqued my interest &#8211; I&#8217;m fascinated by the exposure of web-based data in an easily consumable manner.</p>
<p>To that end I&#8217;ve just seen a demo of something called Kapow (don&#8217;t worry, I&#8217;m not selling anything here; I have no vested interest, I just think it&#8217;s a cool technology), which is effectively a cross between a screen scraper and an ETL tool. It harvests data from a DOM and presents it in a RESTful data service &#8211; really fascinating stuff. I blogged about it in case you&#8217;re interested: <a href="http://blogs.conchango.com/jamiethomson/archive/2009/07/08/kapow-etl-for-html.aspx" rel="nofollow">http://blogs.conchango.com/jamiethomson/archive/2009/07/08/kapow-etl-for-html.aspx</a></p>
<p>Keep up the great work &#8211; especially the evangelism of calendar subscription.</p>
<p>cheers<br />
Jamie</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Udell</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129136</link>
		<dc:creator><![CDATA[Jon Udell]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 23:24:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129136</guid>
		<description><![CDATA[&gt; Perhaps there should be one proviso though
&gt; – asking for a feature means a commitment
&gt; to use it.

That&#039;d be an interesting quid pro quo!]]></description>
		<content:encoded><![CDATA[<p>&gt; Perhaps there should be one proviso though<br />
&gt; – asking for a feature means a commitment<br />
&gt; to use it.</p>
<p>That&#8217;d be an interesting quid pro quo!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Udell</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129135</link>
		<dc:creator><![CDATA[Jon Udell]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 23:23:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129135</guid>
		<description><![CDATA[&gt; The API route sounds better, but it 
&gt; means that whatever development you
&gt; do to pull things from a site starts
&gt; with protocols and interfaces and 
&gt; software and queries and all sorts 
&gt; of things that are appealing to
&gt; programmers but difficult for most
&gt; everyone else.

Of course it needn&#039;t be either/or. There can be APIs and downloadable files. Arguably there should be, with a preference for the latter when data quantity is modest, and the former when it is vast.]]></description>
		<content:encoded><![CDATA[<p>&gt; The API route sounds better, but it<br />
&gt; means that whatever development you<br />
&gt; do to pull things from a site starts<br />
&gt; with protocols and interfaces and<br />
&gt; software and queries and all sorts<br />
&gt; of things that are appealing to<br />
&gt; programmers but difficult for most<br />
&gt; everyone else.</p>
<p>Of course it needn&#8217;t be either/or. There can be APIs and downloadable files. Arguably there should be, with a preference for the latter when data quantity is modest, and the former when it is vast.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Willmott</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129134</link>
		<dc:creator><![CDATA[Steven Willmott]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 22:02:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129134</guid>
		<description><![CDATA[Thanks for the great conversation and write-up, Jon - it was a pleasure to talk. We definitely see the UNDATA API as an ongoing project and hope it will grow increasingly useful (we&#039;ve just added IMF data + more sorts for the queries) - if we&#039;d gone for an all-or-nothing approach on day one it would have taken a lot longer to launch (and maybe would never have made it). We&#039;re certainly keen to hear what people would like from it next (especially if they have a concrete thing they&#039;d like to do with it). 

Hopefully growing a useful resource will breed more ideas and then changes in the resource. Interestingly it&#039;s probably a little easier for an unofficial skunkworks project to do this at least to begin with than the UN itself - since expectations on day one would be much higher. 

The SQL example is a nice one - it would actually be interesting to add different query languages that people find useful - security and load issues we&#039;d have to look at (plus the API runs on Google App Engine - i.e. Bigtable, not MySQL).

Perhaps there should be one proviso though - asking for a feature means a commitment to use it :). That way we can fulfill Jon&#039;s idea of co-evolution of the data and the Apps!

We&#039;d be very happy to have feedback / comments / suggestions here in this thread or over at http://www.undata-api.org/]]></description>
		<content:encoded><![CDATA[<p>Thanks for the great conversation and write-up, Jon &#8211; it was a pleasure to talk. We definitely see the UNDATA API as an ongoing project and hope it will grow increasingly useful (we&#8217;ve just added IMF data + more sorts for the queries) &#8211; if we&#8217;d gone for an all-or-nothing approach on day one it would have taken a lot longer to launch (and maybe would never have made it). We&#8217;re certainly keen to hear what people would like from it next (especially if they have a concrete thing they&#8217;d like to do with it). </p>
<p>Hopefully growing a useful resource will breed more ideas and then changes in the resource. Interestingly it&#8217;s probably a little easier for an unofficial skunkworks project to do this at least to begin with than the UN itself &#8211; since expectations on day one would be much higher. </p>
<p>The SQL example is a nice one &#8211; it would actually be interesting to add different query languages that people find useful &#8211; security and load issues we&#8217;d have to look at (plus the API runs on Google App Engine &#8211; i.e. Bigtable, not MySQL).</p>
<p>Perhaps there should be one proviso though &#8211; asking for a feature means a commitment to use it :). That way we can fulfill Jon&#8217;s idea of co-evolution of the data and the Apps!</p>
<p>We&#8217;d be very happy to have feedback / comments / suggestions here in this thread or over at <a href="http://www.undata-api.org/" rel="nofollow">http://www.undata-api.org/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Vielmetti</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129129</link>
		<dc:creator><![CDATA[Edward Vielmetti]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 19:11:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129129</guid>
		<description><![CDATA[Jon -

I always wonder whether the right answer is to get an API to a query interface to a database, or whether you&#039;re better off with a snapshot and dump of a collection of raw data.

The API route sounds better, but it means that whatever development you do to pull things from a site starts with protocols and interfaces and software and queries and all sorts of things that are appealing to programmers but difficult for most everyone else.

The alternative, a simple data dump in some easy-to-parse file format, lets you figure out what the query structure looks like based on your own data needs and gives you the opportunity either to restructure things for better efficiency or to apply much more primitive tools to do ad hoc queries.

My experience with this so far has been on a street tree database (nicknamed &quot;EveryTree&quot;) in Ann Arbor - there&#039;s now a CSV file with about 50000 geocoded and identified trees, and I was able to get something useful out of it with tools as simple as &quot;grep&quot; in a small amount of time to get a sense for what was possible.

thanks

Ed

annarbor.com]]></description>
		<content:encoded><![CDATA[<p>Jon -</p>
<p>I always wonder whether the right answer is to get an API to a query interface to a database, or whether you&#8217;re better off with a snapshot and dump of a collection of raw data.</p>
<p>The API route sounds better, but it means that whatever development you do to pull things from a site starts with protocols and interfaces and software and queries and all sorts of things that are appealing to programmers but difficult for most everyone else.</p>
<p>The alternative, a simple data dump in some easy-to-parse file format, lets you figure out what the query structure looks like based on your own data needs and gives you the opportunity either to restructure things for better efficiency or to apply much more primitive tools to do ad hoc queries.</p>
<p>My experience with this so far has been on a street tree database (nicknamed &#8220;EveryTree&#8221;) in Ann Arbor &#8211; there&#8217;s now a CSV file with about 50000 geocoded and identified trees, and I was able to get something useful out of it with tools as simple as &#8220;grep&#8221; in a small amount of time to get a sense for what was possible.</p>
<p>thanks</p>
<p>Ed</p>
<p>annarbor.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Udell</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129126</link>
		<dc:creator><![CDATA[Jon Udell]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 15:26:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129126</guid>
		<description><![CDATA[&gt; If these data portals would merely allow 
&gt; end-users read-only SQL access to their 
&gt; underlying databases — they would be 
&gt; amazed at what innovative uses might emerge.

Very interesting point. Historically that was unthinkable because of the fear that unthrottled queries would impact services. But as databases move to the cloud there is renewed incentive to manage such access. I hope what you envision will come to pass.]]></description>
		<content:encoded><![CDATA[<p>&gt; If these data portals would merely allow<br />
&gt; end-users read-only SQL access to their<br />
&gt; underlying databases — they would be<br />
&gt; amazed at what innovative uses might emerge.</p>
<p>Very interesting point. Historically that was unthinkable because of the fear that unthrottled queries would impact services. But as databases move to the cloud there is renewed incentive to manage such access. I hope what you envision will come to pass.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: datalibre.ca &#183; Open Data Access &#38; APIs</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129125</link>
		<dc:creator><![CDATA[datalibre.ca &#183; Open Data Access &#38; APIs]]></dc:creator>
		<pubDate>Tue, 07 Jul 2009 13:16:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129125</guid>
		<description><![CDATA[[...] Jon Udell&#8217;s latest innovators podcast, Open Data Access with Steven Willmott: There&#8217;s growing awareness of the need to publish data online, and to support programmatic access to that data. In this conversation, host Jon Udell talks with Steven Willmott about how his company, 3Scale, helps businesses create and manage application programming interfaces to their data. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Jon Udell&#8217;s latest innovators podcast, Open Data Access with Steven Willmott: There&#8217;s growing awareness of the need to publish data online, and to support programmatic access to that data. In this conversation, host Jon Udell talks with Steven Willmott about how his company, 3Scale, helps businesses create and manage application programming interfaces to their data. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Larry Welkowitz</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129103</link>
		<dc:creator><![CDATA[Larry Welkowitz]]></dc:creator>
		<pubDate>Mon, 06 Jul 2009 20:21:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129103</guid>
		<description><![CDATA[We need to know &quot;what the data mean&quot; in its original form. For example, drug companies might say &quot;this drug works 80% of the time&quot;...but what does that mean? Self-report? Blind ratings by clinicians? Biological tests?

Figures don&#039;t lie, but liars figure.]]></description>
		<content:encoded><![CDATA[<p>We need to know &#8220;what the data mean&#8221; in its original form. For example, drug companies might say &#8220;this drug works 80% of the time&#8221;&#8230;but what does that mean? Self-report? Blind ratings by clinicians? Biological tests?</p>
<p>Figures don&#8217;t lie, but liars figure.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael E Driscoll</title>
		<link>http://blog.jonudell.net/2009/07/06/influencing-the-production-of-public-data/#comment-129101</link>
		<dc:creator><![CDATA[Michael E Driscoll]]></dc:creator>
		<pubDate>Mon, 06 Jul 2009 18:45:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.jonudell.net/?p=1754#comment-129101</guid>
		<description><![CDATA[Indeed, organizations that blindly follow the &quot;give us the data&quot; mantra often give us data that is unusable.

Data portals often fail for the same reason that chart wizards fail -- because they are constrained by pre-built casts, which data must be poured into.

But we already have a widely-understood data query language that is interactive and expressive: it&#039;s called SQL.

If these data portals would merely allow end-users read-only SQL access to their underlying databases -- they would be amazed at what innovative uses might emerge.]]></description>
		<content:encoded><![CDATA[<p>Indeed, organizations that blindly follow the &#8220;give us the data&#8221; mantra often give us data that is unusable.</p>
<p>Data portals often fail for the same reason that chart wizards fail &#8212; because they are constrained by pre-built casts, which data must be poured into.</p>
<p>But we already have a widely-understood data query language that is interactive and expressive: it&#8217;s called SQL.</p>
<p>If these data portals would merely allow end-users read-only SQL access to their underlying databases &#8212; they would be amazed at what innovative uses might emerge.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

