Monday, September 27, 2010

Corrupt Indexes, Part II

While doing some research into corruption indexes, I stumbled across one organization that was measuring corruption in some unusual and interesting ways. That organization is Global Integrity (GI).

(If you haven't already seen it, you'll want to look at the previous post on corruption indexes for this to make sense. It might be good just to skim it to refresh your memory, as the comparison between Transparency International's CPI and Global Integrity's indicators is very instructive.)

The problem that all corruption indexes must address is that it is virtually impossible to directly measure corruption. The reason for this should be pretty obvious: most people engaged in corrupt activities go to some lengths to keep those activities secret. So unless you're omniscient, if you're trying to measure corruption you have a knowledge problem. You can never be sure that you've uncovered all cases of corruption. In fact, you can usually be sure that you haven't.

So how do you measure what you can't directly observe?


One solution to the problem is to measure something that is either related to or roughly analogous to the thing that you are trying to measure. This is how car insurance companies measure risk. They use what they do know (a driver's age, history of speeding tickets, etc.) to get a pretty good idea of what they can't directly measure (how safely someone drives). Similarly, this is how Transparency International measures corruption. Their CPI takes perceptions of corruption as a proxy measurement for the actual incidence of corruption.

Now one problem with this, as I've previously discussed, is that perceptions of corruption may not correspond all that closely to actual levels of corruption. But more generally, it seems to be quite difficult to find a good proxy measurement for corruption. Some have tried to measure bribery rates or to look at government financial information, but these proxies suffer from two problems. First, they're highly specific, so they only capture very particular forms of corruption. Second, they are themselves measures of things that are often hidden, just like corruption more generally.

So what do you do if you can't measure X directly, and you can't measure something that is related or analogous to X? One graceful solution is to measure the absence of X. This is exactly what Global Integrity tries to do with regard to corruption. Motivated by the observation that it is difficult and often impossible to directly observe corruption, they do the exact opposite: measure the absence of corruption.

How does this work?
Unlike groups that aggregate other people's data (*ahem* Transparency Int'l), Global Integrity collects their own data set. They do this by surveying a peer-reviewed set of in-country experts. This is definitely more expensive than just aggregating data, but it means that they can control their data set in ways that would otherwise be impossible. Notably, they ask the same set of questions in every country that they include in their index. Additionally, the set of questions is (mostly) consistent year over year. The result is that countries on the Global Integrity Index can be compared with each other and over time with relatively low levels of methodological error. The downside, though, is that due to the cost, GI doesn't include all countries in its index and their sample size of experts is small. (More on the GI methodology here.)

What sorts of questions does Global Integrity ask? Here are a few examples:

  • Can citizens use the internet freely?
  • Are journalists safe when investigating corruption?
  • In law, is there an election monitoring agency or agencies?
  • Can citizens access records related to the financing of political parties?


Notice a few things about these questions.

  1. The questions are not asking "Is there corruption?" Instead, they're asking about things that should help prevent different kinds of corruption. 
  2. The questions tend to be linked to specific policies. This means that if a country scores poorly in the GI index, it is easy to see why and it is easy to understand what policies could be implemented to correct for this. In other words, the GI index provides actionable information.
  3. The questions are frequently empirical: What does the law say about X? Are there any reported cases of a journalist being assaulted? Etc. These are questions that have answers that can be investigated and known with a high degree of certainty. To further back this up, Global Integrity requires that their experts provide cited evidence in their answers.
  4. The questions fall broadly into two categories: what is codified in law and what actually happens in practice. Answering the first set is simply a matter of going over the law. The second set - what the de facto situation is - is trickier. GI usually recruits a few (5ish) "experts" (usually native journalists) to handle the "de facto" set. This is probably the biggest weakness in their methodology, as it means that some of the questions reflect the opinion of only a few experts. However, they do try to offset this by having their experts' opinions peer reviewed, and by requiring that these opinions cite evidence.


And the end result looks like this.

Now, I don't want to claim that this is the perfect way to measure corruption. But I think it's superior to a number of other methods. I've mentioned a number of reasons for this already: it's not measuring perception, it provides actionable information, it's supported by cited evidence, and it can be compared over time and from country to country with low levels of error. There are, of course, things that could be better. It would be great if Global Integrity could get a larger sample size of experts for each country, and I wish they surveyed more countries. Overall, though, it's a pretty cool index...

UPDATE: Matt Willett-Jeffries (you may have heard of him...) pointed out to me in a recent conversation that I was less charitable to Transparency International than was perhaps fair in Part I of this post. In particular, he noted that many of the sources of methodological error that I identify as a problem for the CPI are actually taken into account by the creators of the CPI and represented in their own statistical error calculations. This is a fair point and recently Transparency International has become quite responsible about publishing confidence intervals for all the countries that they index.

However, though it's good that they're documenting their methodological error, it's still worth noting that this doesn't in any way reduce the amount of error their methodology introduces to their index. So, for instance, in the most recent CPI, China is ranked 79th out of 180 countries. But due to the confidence interval on the Chinese CPI score, China could actually be ranked as low as 100th or as high as 65th. Now this may not seem like a big deal. After all, we can still say with confidence that, on the CPI, China is perceived to be more corrupt than the United States, as their confidence intervals do not overlap at all.

But put yourself in the position of an aid organization that is trying to decide which countries are good candidates for receiving aid. In that case, levels of corruption for particular countries should bear on that decision. No aid organization wants to spend money in places where that money is just going to get embezzled, misappropriated, etc. So they go to the CPI, and what does it tell them? Well, it does show that Norway is not perceived to be as corrupt as Iraq. But that's not particularly useful, mostly because it's so obvious even without the index. (And because no one is thinking of giving aid money to Norway...) Now consider a more typical comparison for an aid organization to make: is Ghana more corrupt than Rwanda? Here's where the bad news shows up, as the error in the CPI is high enough that it cannot say whether 69th-ranked Ghana is actually less corrupt than 89th-ranked Rwanda.
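
To make the overlap test concrete, here's a minimal Python sketch of the check an aid organization would want to run. The scores and confidence intervals below are made-up stand-ins rather than actual CPI figures, but the logic is the same:

    # Decide whether a CPI-style ranking difference is meaningful by
    # checking whether the two confidence intervals overlap.
    # All numbers here are hypothetical stand-ins, not real CPI data.

    def intervals_overlap(ci_a, ci_b):
        """Return True if two (low, high) confidence intervals overlap."""
        return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

    # score, (ci_low, ci_high) -- hypothetical values for illustration
    scores = {
        "Ghana":  (4.1, (3.4, 4.7)),
        "Rwanda": (3.3, (2.9, 4.2)),
        "Norway": (8.6, (8.2, 9.1)),
        "Iraq":   (1.5, (1.2, 1.8)),
    }

    def compare(a, b):
        (_, ci_a), (_, ci_b) = scores[a], scores[b]
        if intervals_overlap(ci_a, ci_b):
            print(f"{a} vs {b}: intervals overlap -- ranking difference is meaningless")
        else:
            print(f"{a} vs {b}: no overlap -- one really is rated less corrupt")

    compare("Ghana", "Rwanda")   # overlap: the CPI can't separate them
    compare("Norway", "Iraq")    # no overlap: clear (but obvious) difference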

In other words, the CPI is good at telling us that countries near the top of its ranking are perceived to be less corrupt than countries near the bottom of its ranking. But any comparison between countries that are ranked reasonably close to each other is statistically meaningless. Since this is often exactly the kind of comparison you want to make, that's a real bummer.

And one last thing... Though producing confidence intervals for this sort of corruption data is obviously the responsible thing to do, we all know that the media, whenever it reports on this sort of thing, is going to completely ignore those confidence intervals. In other words, the media isn't going to say that China is ranked 79th but that the ranking carries x confidence interval. No, instead they're just going to report that China is ranked 79th and Peru is ranked 75th, and so on. And what we know the media won't say, because it never does, is that the Chinese ranking on the CPI relative to Peru (or a number of other countries) is statistically meaningless.

So when you produce an index that has a lot of error built into it, you are essentially arming the media with information that they will misuse, no matter how carefully you document that error. When you look at it that way, the responsible thing to do might be to do a better job reducing your methodological error as opposed to just documenting that error really well.

3 comments:

  1. Very interesting. If you are not familiar with their programs you should look at the Millennium Challenge Corporation in the U.S. Much of what you say had to be incorporated into their decision-making process. This program (MCC) was slow to get off the ground because of the difficulties in getting comfortable with the "measurables." The program now seems to be chugging along very well.

  2. I've just been catching up on your blog, and I think you've really identified one of the major problems facing the broader scientific community and its interfacing with public media. We're past the point where scientists and statisticians can claim innocence in how their data is used. Especially in the case of something like the CPI, where the study is done entirely to further practical goals of reducing corruption, the onus is on the people releasing the data to consider how it will be picked up and used by the broader public, who have not done all of the background research and who don't understand all of the nuances of the methodology. Facts are 'newsworthy' and uncertainty is not, so organizations that are largely concerned with public and political use of their statistics need to be very careful about the nature of the facts that they are releasing when the uncertainty is actually the predominant story. I don't see an easy answer on how to improve the use of scientific data in the public domain, and maybe for now the best stopgap measure we can take is to foresee its misappropriation and control the language of what actually gets released. It's a sticky situation. Interesting stuff.

  3. Pretty much completely agree. I don't see any easy way to get the public to interpret data better. Teaching basic statistics in high schools might help a little but this is, at best, only going to result in gradual progress.

    But that's not too much of a problem. What you describe as a "stopgap measure," I would actually consider a fairly elegant solution. Scientists and researchers need to be more careful about the format and language that they use when they release data. A good start would be doing away with executive summaries (or writing better executive summaries). Changing the way we graphically present data would help too. Just putting error bars onto the graphs that are given to the media would be progress. Not publishing if your research has HUGE error ranges would also be a good thing... This actually isn't too much to ask...

    Another side of this problem is that we don't have a good, preferably open source and free, way to produce high quality data graphics. Excel is terribad for this sort of thing, and R - which is what most real statisticians use - is horribly inaccessible for your average user...
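
    (For what it's worth, Python's matplotlib library is free and open source and will happily put error bars on a chart in a few lines. A quick sketch, with made-up scores rather than real index numbers:)

        # Bar chart of index scores WITH error bars, using matplotlib
        # (free, open source). Scores and intervals are made up for
        # illustration; they are not real CPI numbers.
        import matplotlib.pyplot as plt

        countries = ["China", "Peru", "Ghana", "Rwanda"]
        scores    = [3.5, 3.7, 4.1, 3.3]   # hypothetical index scores
        errors    = [0.6, 0.5, 0.7, 0.7]   # hypothetical CI half-widths

        fig, ax = plt.subplots()
        ax.bar(countries, scores, yerr=errors, capsize=5)
        ax.set_ylabel("Index score (0 = most corrupt, 10 = least)")
        ax.set_title("Illustrative scores with confidence intervals")
        plt.show()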
