Transparency trends

13 Jan 2009 ~ Jon Udell

	Zimbabwe
	Belarus
	Uzbekistan
	Côte d´Ivoire
	Venezuela
	Laos
	Haiti
	Philippines
	Kazakhstan
	Syria
	Ethiopia
	Ecuador
	Kenya
	Russia
	Malawi
	Azerbaijan
	Angola
	Nicaragua
	Pakistan
	Bangladesh
	Nigeria
	Zambia
	Mozambique
	Gambia
	Sudan
	Georgia
	Iraq
	Ukraine
	Belize
	Guatemala
	Indonesia
	Iran
	Egypt
	Honduras
	Papua New Guinea
	Paraguay
	Afghanistan
	Mongolia
	Argentina
	Cameroon
	Bolivia
	Uganda
	Yemen
	Jamaica
	Moldova
	Myanmar
	Swaziland
	Vietnam
	Kyrgyzstan
	Trinidad and Tobago
	Albania
	Congo, Republic
	Sierra Leone
	Benin
	Dominican Republic
	Macedonia
	Morocco
	Panama
	Sri Lanka
	Mali
	Nepal
	Suriname
	Burkina Faso
	Lebanon
	Mauritania
	Congo, Democratic Republic
	Rwanda
	Tonga
	Cambodia
	Somalia
	Brazil
	Saudi Arabia
	Timor-Leste
	Armenia
	Eritrea
	Namibia
	Senegal
	Turkmenistan
	Central African Republic
	Peru
	Chad
	Maldives
	Poland
	Tanzania
	Tunisia
	Kuwait
	Palestine
	Colombia
	Yugoslavia
	Burundi
	Costa Rica
	Bulgaria
	Croatia
	Oman
	Serbia
	Tajikistan
	Turkey
	China
	Italy
	Libya
	Romania
	Thailand
	El Salvador
	India
	Bosnia and Herzegovina
	Cuba
	Niger
	Gabon
	Greece
	Latvia
	Lesotho
	Lithuania
	South Africa
	Togo
	Mauritius
	Mexico
	Dominica
	Equatorial Guinea
	Ghana
	Bahrain
	Uruguay
	Israel
	Malaysia
	Czech Republic
	Macao
	Hungary
	Jordan
	Madagascar
	Algeria
	Botswana
	Seychelles
	Bhutan
	Grenada
	Guinea
	Liberia
	Belgium
	Cyprus
	Kiribati
	Slovakia
	Taiwan
	Comoros
	Guinea-Bissau
	Malta
	Portugal
	Vanuatu
	Qatar
	South Korea
	Canada
	Estonia
	Guyana
	Ireland
	Slovenia
	Japan
	Norway
	Spain
	United Arab Emirates
	Austria
	France
	Switzerland
	Chile
	Germany
	Iceland
	Luxembourg
	United Kingdom
	United States
	Australia
	Samoa
	Sweden
	Finland
	Hong Kong
	Netherlands
	Barbados
	Denmark
	Djibouti
	New Zealand
	Saint Lucia
	Sao Tome and Principe
	Singapore
	Cape Verde
	Grenadines
	Saint Vincent
	Solomon Islands
	Montenegro
	Fiji
	Puerto Rico

Since 1998, Transparency International has published an annual report called the Corruption Perception Index (CPI), which “ranks 180 countries by their perceived levels of corruption, as determined by expert assessments and opinion surveys.” Looking at the 2008 edition, I wondered about trends. Which countries have shown the most CPI volatility since 1998? Is there a trend toward light or darkness? If so, which countries run counter to the trend, and why?

The table of sparklines shown here renders the data in a way that lets us ask, and begin to answer, such questions. It defines CPI volatility as the difference between a country’s highest and lowest CPI ranking over the 11-year period, and sorts countries from most to least volatile. Each sparkline charts a country’s yearly rankings under a reference line, and distance from that line signifies descent into darkness.
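The volatility measure is simple enough to sketch in a few lines. This is a minimal Python illustration, not the spreadsheet used for the table: the `RANKS` data is made up, and `volatility` and `sort_by_volatility` are hypothetical names. Since a lower rank number means less perceived corruption, the spread is just the worst rank minus the best rank.

```python
# Hypothetical sample data: country -> {year: CPI rank}.
# Real values would come from Transparency International's yearly files.
RANKS: dict[str, dict[int, int]] = {
    "Country A": {1998: 10, 2003: 40, 2008: 90},
    "Country B": {1998: 20, 2003: 25, 2008: 22},
    "Country C": {1998: 60, 2003: 30, 2008: 45},
}


def volatility(ranks_by_year: dict[int, int]) -> int:
    """Spread between a country's worst (highest) and best (lowest) rank."""
    values = list(ranks_by_year.values())
    return max(values) - min(values)


def sort_by_volatility(ranks: dict[str, dict[int, int]]) -> list[tuple[str, int]]:
    """Return (country, volatility) pairs, most volatile first."""
    scored = [(country, volatility(by_year)) for country, by_year in ranks.items()]
    # Descending by volatility; ties broken alphabetically so output is stable.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for country, spread in sort_by_volatility(RANKS):
        print(f"{country:<10} {spread}")
```

With the sample data, Country A (spread 80) sorts first, then Country C (30), then Country B (5).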

To answer one of my questions, Bangladesh, Nigeria, Georgia, and Guatemala stand out — among the most volatile countries — as atypically hopeful amidst a general downhill slide. That, anyway, is what Transparency International’s data seems to indicate.
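One way to look for countries that run counter to a general slide is to fit a least-squares slope to each country’s ranks and flag those whose slope points the opposite way from the median slope. This is a hypothetical sketch with made-up data, not how the table was produced; a positive slope here means a rising rank number, i.e. worsening perceived corruption. If the number of ranked countries varies from year to year, raw ranks may need normalizing first, which this sketch omits.

```python
from statistics import median

# Hypothetical sample data: country -> list of (year, CPI rank).
# Higher rank numbers mean worse perceived corruption.
HISTORY = {
    "A": [(1998, 10), (2008, 90)],
    "B": [(1998, 60), (2008, 30)],
    "C": [(1998, 20), (2003, 40), (2008, 50)],
}


def slope(points: list[tuple[int, int]]) -> float:
    """Ordinary least-squares slope of rank against year (needs 2+ distinct years)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


def counter_trend(history: dict[str, list[tuple[int, int]]]) -> list[str]:
    """Countries whose slope points the opposite way from the median slope."""
    slopes = {country: slope(points) for country, points in history.items()}
    typical = median(slopes.values())
    # A positive median means ranks are generally rising (getting worse);
    # a product below zero means a country moves against that direction.
    return sorted(c for c, s in slopes.items() if s * typical < 0)


if __name__ == "__main__":
    print(counter_trend(HISTORY))
```

Here A and C slide (slopes 8.0 and 3.0) while B improves (slope -3.0), so B is the one flagged.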

I’ll leave it to political experts to weigh in on the plausibility of that interpretation. Here I’ll just ask a more basic question. We see tables, maps, and charts — like the ones published by Transparency International — all over the web. But in my experience, when you try to actually use the data, it’s almost always way too hard.

In a later entry I’ll describe, in gory detail, the gymnastics required to massage the TI data and produce this visualization. But just to give you a hint, here are the six different ways of encoding Côte d´Ivoire that I found in the eleven files I had to merge:

C\xC3\xB4te d\xC2\xB4Ivoire
Cote d'Ivoire
C\xF4te-d'Ivoire
Cote d\xB4Ivoire
Cote d?Ivoire
C\xF4te d\xB4Ivoire
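For matching purposes, variants like these can be reduced to a single key. The sketch below is an assumption-laden illustration, not the procedure the post describes: it guesses UTF-8 first and falls back to Latin-1 for each raw byte string, strips accents with Unicode NFKD decomposition, and collapses any punctuation (including the `?` left where a character was lost) into a space. `decode_bytes` and `match_key` are hypothetical names.

```python
import re
import unicodedata


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 first; fall back to Latin-1, which accepts any byte sequence."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def match_key(raw: bytes) -> str:
    """Reduce a country name to a lowercase ASCII key for matching."""
    text = decode_bytes(raw)
    # NFKD splits accented letters into base letter + combining mark, and
    # turns the spacing acute accent (U+00B4) into a space + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse apostrophes, hyphens, '?' placeholders, and spaces to one space.
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


VARIANTS = [
    b"C\xc3\xb4te d\xc2\xb4Ivoire",
    b"Cote d'Ivoire",
    b"C\xf4te-d'Ivoire",
    b"Cote d\xb4Ivoire",
    b"Cote d?Ivoire",
    b"C\xf4te d\xb4Ivoire",
]

if __name__ == "__main__":
    print({match_key(v) for v in VARIANTS})  # all six collapse to one key
```

The key is only for matching; a display name would still come from a canonical list. Because the `?` variant has already lost its original character, no decoder can recover it, which is why the key discards punctuation altogether.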

There were also typos (Moldovaa for Moldova), variant spellings (USA vs. United States), and format inconsistencies (empty vs. non-empty cells when a rank is repeated).
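The empty-cell inconsistency can be handled by carrying the last seen rank forward. A minimal sketch, assuming a CSV with hypothetical `rank` and `country` columns where a blank rank means “same as the row above”; the sample rows are placeholders, not TI data, and `fill_repeated_ranks` is a hypothetical name.

```python
import csv
import io
from collections.abc import Iterable, Iterator

# Hypothetical excerpt: a blank rank cell means "same rank as the row above".
SAMPLE = """rank,country
1,Country A
,Country B
,Country C
4,Country D
"""


def fill_repeated_ranks(rows: Iterable[dict[str, str]]) -> Iterator[tuple[int, str]]:
    """Yield (rank, country), carrying the last seen rank into blank cells."""
    last_rank = None
    for row in rows:
        cell = row["rank"].strip()
        if cell:
            last_rank = int(cell)
        elif last_rank is None:
            # A blank rank with nothing above it cannot be filled in.
            raise ValueError(f"first row has no rank: {row!r}")
        yield last_rank, row["country"].strip()


if __name__ == "__main__":
    for rank, country in fill_repeated_ranks(csv.DictReader(io.StringIO(SAMPLE))):
        print(rank, country)
```

Files that already repeat the rank in every row pass through unchanged, so the same loader works for both conventions.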

Why go to all the trouble to gather and publish this kind of data, and then not consolidate it into a form we can use directly?

Published by Jon Udell

8 thoughts on “Transparency trends”

Michael E. Driscoll says:

14 Jan 2009 at 9:01 am

Hi Jon, nice to see such an insightful and trenchant post that is also honest enough about the hard work that went into the data analysis. As a data geek, I couldn’t agree more with you that data scrubbing is a painful, laborious process. I’m optimistic that this may change, given some recent trends (my 2 cents are at http://www.dataspora.com/blog).

The harsh truth is this: data is messy because the world is messy. Borders shift. Metrics change. Data goes uncollected or missing.

But I’m hopeful posts like this will give the data geeks out there courage to push forward and produce informative graphics like yours — in spite of the hard work.

Jon Udell says:

14 Jan 2009 at 10:08 am

> The harsh truth is this: data is messy
> because the world is messy. Borders shift.
> Metrics change. Data goes uncollected or
> missing.

Agreed. And yet…in so many cases, it just ain’t rocket science. This is a simple spreadsheet:

http://jonudell.net/data/cpi.csv

If this info were just maintained in a master spreadsheet with a row for Ivory Coast, and the new data came in tagged Côte d´Ivoire, reconciling the two would be an obvious and trivial thing.
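The reconciliation described here amounts to a lookup table from known variants to master-list rows. In this sketch, `MASTER`, `ALIASES`, and `reconcile` are hypothetical names, and the entries cover only the variants mentioned in the post; unknown names raise an error rather than quietly becoming new rows.

```python
# Hypothetical master list: one canonical row per country.
MASTER = {"Ivory Coast", "Moldova", "United States"}

# Hypothetical alias table: known variants -> canonical master-list name.
ALIASES = {
    "Côte d´Ivoire": "Ivory Coast",
    "Cote d'Ivoire": "Ivory Coast",
    "Moldovaa": "Moldova",  # typo noted in the post
    "USA": "United States",
}


def reconcile(name: str) -> str:
    """Map an incoming country name to its master-list row, or fail loudly."""
    cleaned = name.strip()
    canonical = ALIASES.get(cleaned, cleaned)
    if canonical not in MASTER:
        # Surface unknown names instead of silently adding a new row.
        raise KeyError(f"no master row for {name!r}; add an alias or a row")
    return canonical


if __name__ == "__main__":
    for incoming in ["Côte d´Ivoire", "USA", "Moldovaa", "Moldova"]:
        print(incoming, "->", reconcile(incoming))
```

Each year’s new variant then costs one alias entry, added once, instead of a fresh round of reverse-engineering.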

We shouldn’t need a tribe of “data geeks” to reverse-engineer simple stuff like this. Their skills should be applied to a different class of problem.

People need to begin to understand and apply some very basic principles of data management. It’s yet another example of how computational thinking needs to become one of the pillars of primary education along with reading, writing, and arithmetic.

chris hollander says:

14 Jan 2009 at 11:19 am

First off, I agree: a disproportionate amount of effort is being directed at pretty trivial data management work that should be addressed waaaay earlier in the “pipeline”.

That being said, I’m not really sure about visualizing this information as sparklines. Or, I guess more specifically, I’m not really sure that I get much value out of seeing 50+ sparklines stacked vertically on top of each other. I actually had a hard time using the presented visualizations to answer the questions you posed (trending, etc.). IMO, sparklines are great micro-visualizations that can be embedded in a body of text to illustrate one specific point… however, to compare large volumes of data, wouldn’t a simple line graph have served effectively?

Jon Udell says:

14 Jan 2009 at 11:34 am

> compare large volumes of data, wouldn’t
> a simple line graph have served effectively?

It’s a good question. Try it and see!

Seriously, I’m not claiming this is the be-all, end-all for this data set.

Nothing ever is, really.

That said, I like the Tuftean “small multiples” idea. This is really 2 columns of a spreadsheet I made. A better version of this idea would be active, not static, and would enable sorting by country name as well as by volatility. That makes it easier to look up the trend for a particular country you’re interested in.
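A text-only approximation of sortable small multiples might look like the sketch below. It is not how the HTML table was built: it draws each country’s ranks with Unicode block characters (a taller bar means a worse rank, rather than distance below a reference line), assumes ranks run from 1 to a hypothetical `worst` value greater than 1, and lets the caller sort by name or by volatility. `sparkline` and `small_multiples` are hypothetical names.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, shortest first


def sparkline(ranks: list[int], worst: int) -> str:
    """Render ranks 1..worst as bars; taller bar = worse (higher) rank."""
    steps = len(BARS) - 1
    out = []
    for r in ranks:
        r = min(max(r, 1), worst)  # clamp out-of-range ranks
        out.append(BARS[round((r - 1) / (worst - 1) * steps)])
    return "".join(out)


def small_multiples(ranks: dict[str, list[int]], worst: int, by: str = "volatility") -> list[str]:
    """One line per country: name plus sparkline, sorted by name or volatility."""
    def spread(country: str) -> int:
        return max(ranks[country]) - min(ranks[country])

    if by == "name":
        order = sorted(ranks)
    elif by == "volatility":
        order = sorted(ranks, key=lambda c: (-spread(c), c))
    else:
        raise ValueError(f"unknown sort key: {by!r}")
    return [f"{c:<8} {sparkline(ranks[c], worst)}" for c in order]


if __name__ == "__main__":
    sample = {"Zeta": [10, 40, 90], "Alpha": [20, 25, 22]}  # hypothetical data
    print("\n".join(small_multiples(sample, worst=180)))
    print("\n".join(small_multiples(sample, worst=180, by="name")))
```

Switching `by` between `"volatility"` and `"name"` gives the two orderings without rebuilding the sparklines.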

For comparison purposes, I think 50 lines on a single graph would be too many.

When I did the volatility sort in the spreadsheet, what really struck me was scrolling down the list and watching the sparklines a) flatten, and b) approach the reference line.

You get that same effect by scrolling down in this HTML page.

Now admittedly, the effect relies on a kind of poor-man’s-animation, a flip-card effect, if you will.

For what it’s worth, I actually think that all static infographics are challenged with respect to visualizing change, and that in many cases we ultimately need moving pictures to best convey moving data.

Pingback: Transparency trends (continued): A data-wrangling tale « Jon Udell
chris hollander says:

15 Jan 2009 at 12:26 pm

I tried it, and yes, the simple line graph was completely meaningless (at least, the limited-resolution graph that Excel produced)… I waaay underestimated the volume of data. A 100% stacked area chart was somewhat more useful, but again, I think the main limiter was resolution and “navigation” capabilities.

I think a growing/shrinking bubble visualization (e.g., Hans Rosling style: http://www.ted.com/index.php/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html) would be great…

Pingback: Transparency data in motion « Jon Udell
Pingback: Transparency trends (continued): A data-wrangling tale - ekcupchay.com
