The emerging discipline of social data analysis and visualization faces two challenges. First, obviously, you need data. Then, more interestingly, you need to figure out ways for people to create, share, and collaboratively refine interpretations of the data. There are a handful of well-known and powerful sources of data. The OECD’s data, for example, drives several of the visualizations at IBM’s Many Eyes site. Where else can you find data for these kinds of tools and services to chew on?
Sources I’ve used and discussed include Washington DC’s CAPStat and the Dartmouth Atlas of Health Care. A number of others are listed in this summary from the session at Foo Camp 07 on liberating government data.
For my own purposes, I’ve decided to keep track of these kinds of public data sources at del.icio.us/judell/publicdata. One of the delightful consequences of doing things that way is that I can pop up a level, to del.icio.us/tag/publicdata, in order to find out what other folks have been storing in the publicdata bucket.
There’s not a whole lot there, yet, but here’s one gem I discovered by way of a link to Gapminder: the United Nations Common Database. From the Gapminder blog on June 7:
UN statistics finally liberated and free of charge!
In a bold move that hopefully will set the standard for all major producers of statistics, UN Statistical Division have made their data accessible and FREE OF CHARGE from May 1 this year. United Nations Common Database (UNCDB) is now available for everyone, with no demand of subscription or user fees on their web-site.
We now look forward to the domino-effect and the liberation of other hidden or locked global statistics from other producers and collectors of data.
Amen. To that end, I invite readers of this blog to contribute these kinds of findings — as you encounter them in your travels — to the publicdata bucket in del.icio.us, to which I’m now subscribed. I’ll in turn curate that list at judell/publicdata, with an eye toward sources that I deem to be noteworthy, conveniently accessible, and likely to yield useful analysis.
20 thoughts on “Show me the data”
I was reading about this over at boingboing, and thought that it might be of interest here (maybe a link to add to your publicdata list as well):
Comprehensive Knowledge Archive Network (CKAN)
From the post:
CKAN is a registry of open knowledge packages and projects — be that a set of Shakespeare’s works, a global population density database, the voting records of MPs, or 30 years of US patents.
CKAN is the place to search for open knowledge resources as well as register your own. Those familiar with freshmeat (a registry of open source software), CPAN (Perl) or PyPI (python package index) can think of CKAN as providing an analogous service for open knowledge.
I hope you find it useful if you did not already know about it.
The link to the Foo 07 summary should be http://wiki.oreillynet.com/foocamp07/index.cgi?LiberateGovernmentInfo
OKFN, which Gerry points to, also has produced a really helpful guide on licensing data for openness, i.e. the myriad ways of making data open from a legal standpoint. http://okfn.org/wiki/OpenDataLicensing
“The link to the Foo 07 summary should be”
“CKAN is a registry of open knowledge packages and projects”
Thanks! I’ll look through those to see which offer the kind of data suitable for analysis/visualization/decision-making, which is the sort of thing I’m interested in here.
Thanks! That’s particularly interesting in light of this week’s ITConversations podcast with Timo Hannay of Nature Publishing Group, who (among other things) is interested in making scientific data publishable and creditable separately from the analysis of the data.
Just last week a historic meeting took place in Istanbul, Turkey: the OECD’s World Forum on Statistics, Knowledge, and Policy. All the world’s data leaders gathered in one place to discuss measuring the progress of societies (http://www.oecd.org/site/0,3407,en_21571361_31938349_1_1_1_1_1,00.html). What emerged from that conference is a sweeping vision for the future of making the world’s data useful. Essentially, what the Kyoto Protocol is to environment and climate change, the Istanbul Declaration (http://www.oecd.org/dataoecd/14/46/38883774.pdf) will be to measuring the world’s progress with data. The OECD and its member organizations have a ton of credibility for this type of visionary undertaking: they did invent GDP after all.
Professor Hans Rosling’s Gapminder, Swivel and IBM’s Many Eyes were among the exhibitors at the conference and we had many conversations throughout the week about the future of data. These are indeed exciting times.
You briefly mentioned Professor Rosling. It is impossible to overestimate the impact the Professor and his vision for liberating the world’s data have had on those of us following his trail. Not to mention the vision of the Secretary General, Chief Statistician, Head of Dissemination and many others at the OECD, who all deserve a big nod as well.
A highlight for us at the conference was that the 1200+ delegates were asked to vote for the two exhibitors they felt showed the most vision and capability for helping people turn data into knowledge. Along with a fantastic project called Mapping Worlds (http://www.mappingworlds.org/), Swivel was voted a winner. We then nervously presented Swivel to all the delegates, complete with UN-style earpieces and translators. It was exhilarating.
Jon, it would be great to spend some time with you and talk more about the Istanbul Declaration and the future of data as you see it.
CEO & Co-founder
I’d briefly like to echo the earlier references to the OKFN… and pick up on the importance of effective licensing.
As my colleague Rob Styles said at the recent WWW2007 conference (http://blogs.talis.com/nodalities/2007/05/presentations_from_www2007_ope.php), we are not helping prospective users when we fail to apply explicit and visible licensing terms to content that we make available. We may well publish with the intention that our work should be freely used, but the lack of an explicit permission doesn’t actually mean that; in many jurisdictions it means “All Rights Reserved”.
Creative Commons has done some great work in raising the profile of “Some Rights Reserved” for works covered by the laws of Copyright; a Creative Commons license gives the creator of some creative work (a story, a song, a picture…) the ability to explicitly grant permission for it to be used and reused in a wide range of ways that would previously require the law-abiding to contact the creator and request permission.
Creative Commons licenses, though, appear not to apply very well to data. Leveraging a European notion of the ‘database right’, we developed something called the Talis Community License (http://www.talis.com/tdn/tcl/) to meet our own needs with regard to ensuring the use and reuse of data contributed to some of the services we host.
This has proved successful, and – in the absence of anyone else stepping forward – we are now embarking on some work to strengthen the legal standing of this license, rename it to remove the direct link to ourselves, develop expressions of the license that utilise contract law in jurisdictions where the database right does not apply, and facilitate a wide community of stakeholders to which the ownership and upkeep of the license can be given. If anyone is interested, please do get in touch.
I assume you are familiar with http://www.fedstats.gov, but here is a beta site that you might find interesting: http://betaced.census.gov. It is a collaboration among federal statistical agencies to integrate community data.
I wanted to share this with you as I came across it today. Have you heard of opencongress.org and the Sunlight Foundation? They seem to be doing some really cool and effective things to bring public information to light.
http://www.iquango.org/mellanrummet/ – in-progress visualisation of UNCD data
http://www.mptables.com/ – UK political data
Meinedata (on which they’re both based) is from mysociety.org’s volunteers. It is a free, public equivalent of Gapminder and is completely inspired by it (most of the functionality, not enough of the prettiness yet). It lets you take an Excel file and have it appear in the Flash system for people to play with.
More details on my blog here: http://www.disruptiveproactivity.com/
It seems like you’re interested in open sourced government databases. Find the Best is an objective comparison search engine that allows you to find a topic, compare your options and select the best choice for you. FindTheBest allows you to make faster and more informed decisions by offering you objective information that has been compiled by FTB researchers or drawn from universities, non-profits, NGOs and governmental agencies. Companies and individual reviewers may claim, edit and add listings, but all edits are reviewed by FTB before publishing.
Matt, Editor at FindTheBest