Transparency trends

(Sparklines not reproduced. Countries are listed from most to least volatile CPI rank, 1998–2008.)
Zimbabwe
Belarus
Uzbekistan
Côte d´Ivoire
Venezuela
Laos
Haiti
Philippines
Kazakhstan
Syria
Ethiopia
Ecuador
Kenya
Russia
Malawi
Azerbaijan
Angola
Nicaragua
Pakistan
Bangladesh
Nigeria
Zambia
Mozambique
Gambia
Sudan
Georgia
Iraq
Ukraine
Belize
Guatemala
Indonesia
Iran
Egypt
Honduras
Papua New Guinea
Paraguay
Afghanistan
Mongolia
Argentina
Cameroon
Bolivia
Uganda
Yemen
Jamaica
Moldova
Myanmar
Swaziland
Vietnam
Kyrgyzstan
Trinidad and Tobago
Albania
Congo, Republic
Sierra Leone
Benin
Dominican Republic
Macedonia
Morocco
Panama
Sri Lanka
Mali
Nepal
Suriname
Burkina Faso
Lebanon
Mauritania
Congo, Democratic Republic
Rwanda
Tonga
Cambodia
Somalia
Brazil
Saudi Arabia
Timor-Leste
Armenia
Eritrea
Namibia
Senegal
Turkmenistan
Central African Republic
Peru
Chad
Maldives
Poland
Tanzania
Tunisia
Kuwait
Palestine
Colombia
Yugoslavia
Burundi
Costa Rica
Bulgaria
Croatia
Oman
Serbia
Tajikistan
Turkey
China
Italy
Libya
Romania
Thailand
El Salvador
India
Bosnia and Herzegovina
Cuba
Niger
Gabon
Greece
Latvia
Lesotho
Lithuania
South Africa
Togo
Mauritius
Mexico
Dominica
Equatorial Guinea
Ghana
Bahrain
Uruguay
Israel
Malaysia
Czech Republic
Macao
Hungary
Jordan
Madagascar
Algeria
Botswana
Seychelles
Bhutan
Grenada
Guinea
Liberia
Belgium
Cyprus
Kiribati
Slovakia
Taiwan
Comoros
Guinea-Bissau
Malta
Portugal
Vanuatu
Qatar
South Korea
Canada
Estonia
Guyana
Ireland
Slovenia
Japan
Norway
Spain
United Arab Emirates
Austria
France
Switzerland
Chile
Germany
Iceland
Luxembourg
United Kingdom
United States
Australia
Samoa
Sweden
Finland
Hong Kong
Netherlands
Barbados
Denmark
Djibouti
New Zealand
Saint Lucia
Sao Tome and Principe
Singapore
Cape Verde
Grenadines
Saint Vincent
Solomon Islands
Montenegro
Fiji
Puerto Rico

Transparency International publishes an annual report, the Corruption Perceptions Index (CPI), which “ranks 180 countries by their perceived levels of corruption, as determined by expert assessments and opinion surveys.” Looking at the 2008 edition, I wondered about trends. Which countries have shown the most CPI volatility since 1998? Is there a trend toward light or darkness? If so, which countries run counter to the trend, and why?

The table of sparklines shown here renders the data in a way that lets us ask, and begin to answer, such questions. It defines CPI volatility as the difference between a country’s highest and lowest CPI rank over the 11-year period from 1998 to 2008, and sorts countries from most to least volatile. Each sparkline plots a country’s yearly rank below a reference line; the farther it falls below that line, the deeper its descent into darkness.
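Here is a minimal sketch of the volatility sort described above. The rank histories are made up for illustration, not TI data, and `volatility` and `sort_by_volatility` are hypothetical names. The post doesn't say how years in which a country wasn't ranked were handled; skipping them (the `None` entries) is an assumption.

```python
from typing import Optional

# Hypothetical rank histories, one entry per year 1998-2008 (11 values).
# None marks a year in which the country was not ranked (an assumption about
# how gaps would be represented). These are NOT Transparency International data.
history: dict[str, list[Optional[int]]] = {
    "Country A": [40, 45, 60, 72, 80, 90, 95, 100, 110, 120, 130],
    "Country B": [10, 11, 10, 12, 11, 10, 10, 11, 12, 11, 10],
    "Country C": [None, None, 90, 85, 70, 60, 55, 50, 48, 45, 40],
}

def volatility(ranks: list[Optional[int]]) -> int:
    """Highest rank number minus lowest, ignoring unranked years."""
    observed = [r for r in ranks if r is not None]
    if not observed:
        return 0
    return max(observed) - min(observed)

def sort_by_volatility(hist: dict[str, list[Optional[int]]]) -> list[tuple[str, int]]:
    """Return (country, volatility) pairs, most volatile first."""
    pairs = [(country, volatility(ranks)) for country, ranks in hist.items()]
    # Descending volatility; ties broken alphabetically for a stable order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

for country, v in sort_by_volatility(history):
    print(f"{country}: {v}")
```

This measures the raw spread in rank positions. Because the number of ranked countries changed over the period, a normalized measure (such as rank divided by the number of countries ranked that year) might be a reasonable alternative.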

To answer one of my questions, Bangladesh, Nigeria, Georgia, and Guatemala stand out — among the most volatile countries — as atypically hopeful amidst a general downhill slide. That, anyway, is what Transparency International’s data seems to indicate.

I’ll leave it to political experts to weigh in on the plausibility of that interpretation. Here I’ll just ask a more basic question. We see tables, maps, and charts — like the ones published by Transparency International — all over the web. But in my experience, when you try to actually use the data, it’s almost always way too hard.

In a later entry I’ll describe, in gory detail, the gymnastics required to massage the TI data and produce this visualization. But just to give you a hint, here are the six different ways of encoding Côte d´Ivoire that I found in the eleven files I had to merge:

C\xC3\xB4te d\xC2\xB4Ivoire
Cote d'Ivoire
C\xF4te-d'Ivoire
Cote d\xB4Ivoire
Cote d?Ivoire
C\xF4te d\xB4Ivoire
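These escapes are two encodings of the same characters: `\xC3\xB4` is the UTF-8 encoding of “ô” and `\xF4` its Latin-1 encoding; likewise `\xC2\xB4` and `\xB4` for the acute accent “´”. A minimal sketch of one way to collapse all six variants to a single matching key, assuming each name is read as raw bytes from a file that is either UTF-8 or Latin-1 (the post doesn't say which encodings the files used). `decode_name` and `match_key` are hypothetical helpers.

```python
import unicodedata

# The six variants listed above, as raw bytes (escapes copied from the list).
VARIANTS = [
    b"C\xC3\xB4te d\xC2\xB4Ivoire",  # UTF-8: C3 B4 is 'ô', C2 B4 is '´'
    b"Cote d'Ivoire",                # plain ASCII
    b"C\xF4te-d'Ivoire",             # Latin-1 'ô', hyphenated
    b"Cote d\xB4Ivoire",             # Latin-1 '´'
    b"Cote d?Ivoire",                # '?' left behind by a lossy conversion
    b"C\xF4te d\xB4Ivoire",          # Latin-1 'ô' and '´'
]

def decode_name(raw: bytes) -> str:
    """Decode as UTF-8 if valid, otherwise as Latin-1 (which accepts any byte)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

def match_key(name: str) -> str:
    """Reduce a name to an accent-free, lowercase key used only for matching."""
    # NFKD splits 'ô' into 'o' plus a combining circumflex;
    # dropping combining marks leaves the base letters.
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat apostrophe-like marks (straight, acute, backtick), hyphens,
    # and stray '?' as word separators.
    for mark in "'´`?-":
        base = base.replace(mark, " ")
    # Lowercase and collapse runs of whitespace.
    return " ".join(base.lower().split())

keys = {match_key(decode_name(raw)) for raw in VARIANTS}
print(keys)  # {'cote d ivoire'}
```

The key is for matching, not display; the published name would come from a master list. Treating `?` as a separator is a blunt heuristic, so names that still fail to match anything in that list are worth reporting for manual review rather than silently dropping.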

There were also typos (Moldovaa for Moldova), variant spellings (USA vs United States), and format inconsistencies (empty vs. non-empty cells when a rank is repeated).
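A minimal sketch of handling the other two problems, assuming each year’s file has already been read into (rank, country) string pairs. A hand-maintained alias table covers typos and variant spellings, and a forward fill gives a row with an empty rank cell the rank of the row above, which assumes an empty cell means “tied with the previous country.” `ALIASES`, `canonical_name`, and `fill_tied_ranks` are hypothetical names, and the rows are made up.

```python
from typing import Optional

# Hypothetical alias table built from the examples above; a real one would
# grow as new variants turn up in each year's file.
ALIASES = {
    "Moldovaa": "Moldova",
    "USA": "United States",
}

def canonical_name(name: str) -> str:
    """Trim whitespace and replace a known variant with its canonical name."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)

def fill_tied_ranks(rows: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Give rows with an empty rank cell the rank of the row above.

    Assumes an empty cell means the country is tied with the previous one.
    """
    filled: list[tuple[int, str]] = []
    current_rank: Optional[int] = None
    for rank_cell, country in rows:
        if rank_cell.strip():
            current_rank = int(rank_cell)
        elif current_rank is None:
            raise ValueError(f"first row ({country!r}) has no rank")
        filled.append((current_rank, canonical_name(country)))
    return filled

# Made-up rows in the style of one year's file (not actual CPI data).
rows = [("1", "Country X"), ("", "USA"), ("3", "Country Y"), ("", " Moldovaa ")]
print(fill_tied_ranks(rows))
# [(1, 'Country X'), (1, 'United States'), (3, 'Country Y'), (3, 'Moldova')]
```

Once every file is normalized to the same (rank, canonical name) shape, merging the years becomes an ordinary join on the country name.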

Why go to all the trouble to gather and publish this kind of data, and then not consolidate it into a form we can use directly?

8 Comments

  1. Hi Jon – Nice to see such an insightful and trenchant post that is also honest enough about the hard work that went into the data analysis. As a data geek, I couldn’t agree more with you that data scrubbing is a painful, laborious process. I’m optimistic that this may change, given some recent trends (my 2 cents are at http://www.dataspora.com/blog).

    The harsh truth is this: data is messy because the world is messy. Borders shift. Metrics change. Data goes uncollected or missing.

    But I’m hopeful posts like this will give the data geeks out there courage to push forward and produce informative graphics like yours — in spite of the hard work.

  2. > The harsh truth is this: data is messy
    > because the world is messy. Borders shift.
    > Metrics change. Data goes uncollected or
    > missing.

    Agreed. And yet…in so many cases, it just ain’t rocket science. This is a simple spreadsheet:

    http://jonudell.net/data/cpi.csv

    If this info were just maintained in a master spreadsheet that had a row for Ivory Coast, and new data came in tagged Côte d´Ivoire, reconciling the two would be obvious and trivial.

    We shouldn’t need a tribe of “data geeks” to reverse-engineer simple stuff like this. Their skills should be applied to a different class of problem.

    People need to begin to understand and apply some very basic principles of data management. It’s yet another example of how computational thinking needs to become one of the pillars of primary education along with reading, writing, and arithmetic.

  3. First off, I agree: a disproportionate amount of effort is being directed at pretty trivial data management work that should be addressed waaaay earlier in the “pipeline”.

    That being said, I’m not really sure about visualizing this information as sparklines. Or, I guess more specifically, I’m not really sure that I get much value out of seeing 50+ sparklines stacked vertically on top of each other. I actually had a hard time using the presented visualizations to answer the questions you posed (trending, etc.). IMO, sparklines are great micro-visualizations that can be embedded into a body of text to illustrate one specific point; however, to compare large volumes of data, wouldn’t a simple line graph have served effectively?

  4. > compare large volumes of data, wouldn’t
    > a simple line graph have served effectively?

    It’s a good question. Try it and see!

    Seriously, I’m not claiming this is the be-all, end-all for this data set.

    Nothing ever is, really.

    That said, I like the Tuftean “small multiples” idea. This is really 2 columns of a spreadsheet I made. A better version of this idea would be active, not static, and would enable sorting by country name as well as by volatility. That makes it easier to look up the trend for a particular country you’re interested in.

    For comparison, I think 50 lines would be too many.

    When I did the volatility sort in the spreadsheet, what really struck me was scrolling down the list and watching the sparklines a) flatten, and b) approach the reference line.

    You get that same effect by scrolling down in this HTML page.

    Now admittedly, the effect relies on a kind of poor man’s animation, a flip-card effect, if you will.

    For what it’s worth, I actually think that all static infographics are challenged with respect to visualizing change, and that in many cases we ultimately need moving pictures to best convey moving data.

  5. I tried it, and yes, the simple line graph was completely meaningless (at least the limited-resolution graph that Excel produced). I waaay underestimated the volume of data. A 100% stacked area chart was somewhat more useful, but again, I think the main limiter was resolution and “navigation” capabilities.

    I think a growing/shrinking bubble visualization (e.g., Hans Rosling style: http://www.ted.com/index.php/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html) would be great…
