The emerging discipline of social data analysis and visualization faces two challenges. First, obviously, you need data. Then, more interestingly, you need to figure out ways for people to create, share, and collaboratively refine interpretations of the data. There are a handful of well-known and powerful sources of data. The OECD’s data, for example, drives several of the visualizations at IBM’s Many Eyes site. Where else can you find data for these kinds of tools and services to chew on?
Sources I’ve used and discussed include Washington DC’s CAPStat and the Dartmouth Atlas of Health Care. A number of others are listed in this summary from the session at Foo Camp 07 on liberating government data.
For my own purposes, I’ve decided to keep track of these kinds of public data sources at del.icio.us/judell/publicdata. One of the delightful consequences of doing things that way is that I can pop up a level, to del.icio.us/tag/publicdata, in order to find out what other folks have been storing in the publicdata bucket.
There’s not a whole lot there, yet, but here’s one gem I discovered by way of a link to Gapminder: the United Nations Common Database. From the Gapminder blog on June 7:
UN statistics finally liberated and free of charge!
In a bold move that hopefully will set the standard for all major producers of statistics, UN Statistical Division have made their data accessible and FREE OF CHARGE from May 1 this year. United Nations Common Database (UNCDB) is now available for everyone, with no demand of subscription or user fees on their web-site.
We now look forward to the domino-effect and the liberation of other hidden or locked global statistics from other producers and collectors of data.
Amen. To that end, I invite readers of this blog to contribute these kinds of findings — as you encounter them in your travels — to the publicdata bucket in del.icio.us, to which I’m now subscribed. I’ll in turn curate that list at judell/publicdata, with an eye toward sources that I deem to be noteworthy, conveniently accessible, and likely to yield useful analysis.
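The two del.icio.us views mentioned above, one person's bucket and the shared bucket, follow a simple URL pattern. Here is a minimal Python sketch of that pattern. It is only an illustration: the function names are hypothetical, and the http:// scheme is an assumption, since the addresses in this post are written without one.

```python
from urllib.parse import quote

# Assumption: the post writes "del.icio.us/..." with no scheme; http:// is assumed here.
BASE = "http://del.icio.us"


def user_tag_url(user: str, tag: str) -> str:
    """One person's bookmarks for a tag, e.g. del.icio.us/judell/publicdata."""
    return f"{BASE}/{quote(user, safe='')}/{quote(tag, safe='')}"


def shared_tag_url(tag: str) -> str:
    """Everyone's bookmarks for a tag, e.g. del.icio.us/tag/publicdata."""
    return f"{BASE}/tag/{quote(tag, safe='')}"


if __name__ == "__main__":
    print(user_tag_url("judell", "publicdata"))
    print(shared_tag_url("publicdata"))
```

Popping up a level, in these terms, just means swapping the user name for the literal path segment "tag".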
July 5, 2007 at 1:04 pm
[...] Udell has an excellent post about social data analysis on his blog. He has a nice, quick description of some of the issues around data, with a tip of the hat [...]
July 5, 2007 at 1:33 pm
Hi Jon,
I was reading about this over at boingboing, and thought that it might be of interest here (maybe a link to add to your publicdata list as well):
Comprehensive Knowledge Archive Network (CKAN)
http://blog.okfn.org/2007/07/04/the-comprehensive-knowledge-archive-network-ckan-launched-today/
From the post:
—
CKAN is a registry of open knowledge packages and projects — be that a set of Shakespeare’s works, a global population density database, the voting records of MPs, or 30 years of US patents.
CKAN is the place to search for open knowledge resources as well as register your own. Those familiar with freshmeat (a registry of open source software), CPAN (Perl) or PyPI (python package index) can think of CKAN as providing an analogous service for open knowledge.
—
I hope you find it useful if you did not already know about it.
-g.
July 6, 2007 at 6:43 am
[...] Udell’s list of public data Jon promises to keep the list del.icio.us/judell/publicdata updated. I wonder if some of the sources are [...]
July 6, 2007 at 11:39 am
The link to the Foo 07 summary should be http://wiki.oreillynet.com/foocamp07/index.cgi?LiberateGovernmentInfo
July 6, 2007 at 12:22 pm
OKFN, which Gerry points to, also has produced a really helpful guide on licensing data for openness, i.e. the myriad ways of making data open from a legal standpoint. http://okfn.org/wiki/OpenDataLicensing
July 6, 2007 at 2:20 pm
“The link to the Foo 07 summary should be”
Thanks!
“CKAN is a registry of open knowledge packages and projects”
Thanks! I’ll look through those to see which offer the kind of data suitable for analysis/visualization/decision-making, which is the sort of thing I’m interested in here.
“http://okfn.org/wiki/OpenDataLicensing”
Thanks! That’s particularly interesting in light of this week’s ITConversations podcast with Timo Hannay of Nature Publishing Group, who (among other things) is interested in making scientific data publishable and creditable separately from the analysis of the data.
July 6, 2007 at 5:09 pm
Jon,
Just last week a very historic meeting took place in Istanbul, Turkey: the OECD’s World Forum on Statistics, Knowledge, and Policy. All the world’s data leaders gathered in one place to discuss measuring the progress of societies (http://www.oecd.org/site/0,3407,en_21571361_31938349_1_1_1_1_1,00.html). What emerged from that conference is a sweeping vision for the future of making the world’s data useful. Essentially, what the Kyoto Protocol is to environment and climate change, the Istanbul Declaration (http://www.oecd.org/dataoecd/14/46/38883774.pdf) will be to measuring the world’s progress with data. The OECD and its member organizations have a ton of credibility for this type of visionary undertaking: they did invent GDP after all.
Professor Hans Rosling’s Gapminder, Swivel and IBM’s Many Eyes were among the exhibitors at the conference and we had many conversations throughout the week about the future of data. These are indeed exciting times.
You briefly mentioned Professor Rosling. It is impossible to overestimate the impact that the Professor and his vision for liberating the world’s data have had on those of us following his trail. The Secretary General, Chief Statistician, Head of Dissemination and many others at the OECD all deserve a big nod for their vision as well.
A highlight for us at the conference was that 1200+ delegates were asked to vote for the two exhibitors they felt showed the most vision and capability for helping people turn data into knowledge. Swivel was voted a winner, along with a fantastic project called Mapping Worlds (http://www.mappingworlds.org/). We then nervously presented Swivel to all the delegates, with the UN-style earpieces and translators. It was exhilarating.
Jon, it would be great to spend some time with you and talk more about the Istanbul Declaration and the future of data as you see it.
Brian Mulloy
CEO & Co-founder
brian@swivel.com
http://www.swivel.com
July 7, 2007 at 2:20 am
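[...] Data and visualization: I still can’t get enough of data visualization and information overload (I’m in Lisbon right now... I’ll be speaking on this topic at the IADIS* conference tomorrow). Jon Udell stresses the importance of open data: the OECD* also provides important data, and the United Nations recently released its Common Database. Once data is open, we can mix, remix, compare, and extend it, and use it in endless ways. Hans Rosling gave two brilliant talks showing the power of data visualization in 2006 and 2007. Powerful stuff. Note: IADIS: International Association for Development of the Information Society [...]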
July 9, 2007 at 5:05 am
Some follow-up Platform thoughts
Last week I wrote a blog post looking at some of the different ways in which ‘Platforms’ are being brought to web scale and pervasiveness. In writing, I concentrated upon the quite different approaches that I saw Facebook and…
July 9, 2007 at 5:21 am
I’d briefly like to echo the earlier references to the OKFN… and pick up on the importance of effective licensing.
As my colleague Rob Styles said at the recent WWW2007 conference (http://blogs.talis.com/nodalities/2007/05/presentations_from_www2007_ope.php), we are not helping prospective users when we fail to apply explicit and visible licensing terms to content that we make available. We may well publish with the intention that our work should be freely used, but the lack of an explicit permission doesn’t actually mean that; in many jurisdictions it means “All Rights Reserved”.
Creative Commons has done some great work in raising the profile of “Some Rights Reserved” for works covered by the laws of Copyright; a Creative Commons license gives the creator of some creative work (a story, a song, a picture…) the ability to explicitly grant permission for it to be used and reused in a wide range of ways that would previously require the law-abiding to contact the creator and request permission.
Creative Commons licenses, though, appear not to apply very well to data. Leveraging a European notion of the ‘database right’, we developed something called the Talis Community License (http://www.talis.com/tdn/tcl/) to meet our own needs with regard to ensuring the use and reuse of data contributed to some of the services we host.
This has proved successful, and - in the absence of anyone else stepping forward - we are now embarking on some work to strengthen the legal standing of this license, rename it to remove the direct link to ourselves, develop expressions of the license that utilise contract law in jurisdictions where the database right does not apply, and facilitate a wide community of stakeholders to which the ownership and upkeep of the license can be given. If anyone is interested, please do get in touch.
July 9, 2007 at 4:12 pm
[...] Show me the data - Jon Udell A call to link all public databases in del.icio.us [...]
July 11, 2007 at 9:54 am
[...] Jon Udell has mentioned, there’s a ton of data online, but it’s not often we can find it, often [...]
July 12, 2007 at 4:53 pm
I assume you are familiar with http://www.fedstats.gov, but here is a beta site that you might find interesting: http://betaced.census.gov (a federal collaboration of statistical agencies to integrate community data).
July 17, 2007 at 8:29 am
[...] evolution in del.icio.us Filed under: Uncategorized — Jon Udell @ 8:29 am Recently I began keeping track of interesting public data sources using the del.icio.us tag judell/publicdata, and invited others [...]
July 28, 2007 at 2:28 pm
Jon,
I wanted to share this with you as I came across it today. Have you heard of opencongress.org and the Sunlight Foundation? They seem to be doing some really cool and effective things to bring public information to light.
August 10, 2007 at 6:28 am
http://www.iquango.org/mellanrummet/ - in-progress visualisation of UNCD data
http://www.mptables.com/ - UK political data
Meinedata (on which they’re both based) is from mysociety.org’s volunteers. It is a free, public equivalent of Gapminder, and completely inspired by it (most of the functionality, not enough of the prettiness yet). It lets you take an Excel file and have it appear in the Flash system for people to play with.
More details on my blog here: http://www.disruptiveproactivity.com/
January 17, 2008 at 8:30 pm
[...] Over the past year, I’ve been tagging interesting data I find on the web in del.icio.us. I wrote a quick python script to pull the relevant links from my del.icio.us export and list them at the bottom of this post. Most of these datasets are related to machine learning, but there are a lot of government, finance, and search datasets as well. I probably won’t get around to organizing and posting them to the wiki myself, but theinfo community should be able to figure out what to do with them. The concept reminds me a lot of Jon Udell’s post on public data. [...]