Methodical Snark: critical reflections on how we measure and assess civic tech

Democracy in the eye of the beholder


I love it when messy methods get topical, and this might be one of the very few silver linings to come out of Trumpland. December saw the publication of an IPSR special issue on measuring democracy, and then shit got real this week, when Andrew Gelman began a stream of posts criticizing the application of EIP methodology to the recent presidential elections in US states, and especially the claim/meme that North Carolina is no longer a democracy.

The EIP (Electoral Integrity Project) is a collaboration between Harvard and Sydney universities, led by Pippa Norris, which uses expert surveys to assess electoral processes in countries around the world against international normative standards. This allows the production of a comparative index, so that one can see how much better elections are in country X than in country Y.

Gelman’s first post called out the EIP for methodological shortcomings, noting in particular its 2014 ranking of North Korean elections as moderate, and questioned the survey’s application to US states on that basis. His critique got pretty fundamental:

Electoral integrity is an important issue, and it’s worth studying. In a sensible way.

What went wrong here? It all seems like an unstable combination of political ideology, academic self-promotion, credulous journalism, and plain old incompetence.

But that wasn’t the end. A follow-up post questioned the ethics of the project’s ignoring that finding while continuing to produce and promote assessments, arguing more broadly that “When you find a bug in your code, you shouldn’t just exclude the case that doesn’t work, you should try to track down the problem.”

Gelman’s third post shared a response from Pippa Norris, which described the project’s methods in some detail but failed to respond directly to the criticisms. This unleashed a wave of brazen criticism in the comments, which Norris tried, unsuccessfully, to stem by further expounding on the EIP’s methods.

My favorites:

darosenthal says:
January 4, 2017 at 11:51 am

That response [from Pippa Norris] was a horrid slog. I’ve rarely encountered a more densely compacted layer of polysyllabic obfuscation in the service of clarification than what I’ve read here. I feel like a man who, knee deep in quicksand, has been handed an anvil.

Raina says:
January 4, 2017 at 2:49 pm

For pete’s sake, Pippa. I’m an academic. This response (and the one above) might fly in a review response, but it’s not going to do anything for the public but make them dismiss you. No one cares how many reams you and your colleagues have published if you can’t BRIEFLY explain and defend the basic principles of your work to non experts.

This is entertaining, but it’s also important.

EIP’s expert survey uses standardized scales (0-100) for a variety of dimensions (electoral laws, district boundaries…), which are aggregated to produce a country’s ranking on the comparative index. Experts are chosen specifically for each country (50% national, 50% international) but not necessarily contracted, and the criteria for selection are unclear. Experts fill out a single survey, and scores are averaged across experts, regardless of who those experts are or how many of them there are. This is pretty standard for international governance indices (Freedom in the World is likely the most notorious), and has produced a fair amount of criticism from the countries being assessed.
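To make the aggregation concrete, here is a minimal sketch of that kind of scoring pipeline. The function and data are invented for illustration, not taken from the EIP’s actual code: each expert scores a few dimensions on a 0-100 scale, and the country’s index value is a simple average across dimensions and then across however many experts happened to respond, with no weighting and no minimum-respondent threshold.

```python
# Hypothetical illustration of index aggregation (names and data invented).
from statistics import mean

def country_score(expert_responses):
    """Average each expert's 0-100 dimension scores, then average across
    experts. No weighting, no minimum number of respondents required."""
    return mean(mean(scores.values()) for scores in expert_responses)

# Twelve experts with broadly similar views of a country's elections...
many_experts = [{"electoral_laws": 70, "district_boundaries": 60}] * 12

# ...versus only two respondents, whose views happen to diverge.
few_experts = [
    {"electoral_laws": 55, "district_boundaries": 65},
    {"electoral_laws": 30, "district_boundaries": 40},
]

print(country_score(many_experts))  # stable: many experts agree
print(country_score(few_experts))   # rests entirely on two opinions
```

The point of the sketch is that the same headline number comes out either way; nothing in the averaging step records how thin the evidence behind it is.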

That criticism has traditionally taken a colonial bent: i.e., who are you to judge my democracy, western white dude? This line of criticism produced some debate and reform among Western assessments, and some home-grown assessments in Africa.

In this thread, the criticism takes a novel turn, having more to do with the validity of expert judgments that may be overly contextualized, raising the specter of inter-coder reliability more generally. If, as in the 2014 index, North Korea’s surveys got only a 6% response rate (of the 40 experts consulted per country), can the results justifiably be presented? More importantly, Gelman argues, when the results of the survey are so counter-intuitive (presenting North Korea as a moderate democracy), does the project have an obligation to revisit and fix its methodology?
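The response-rate arithmetic is worth doing out loud. Using the figures from the post (40 experts consulted, 6% response rate), and hypothetical scores of my own invention, a back-of-the-envelope check shows how fragile an averaged score becomes: with two or three respondents, a single differently-calibrated expert swings the country’s index by tens of points.

```python
# Back-of-the-envelope check using the post's figures (40 experts
# consulted, 6% response rate); the scores themselves are invented.
from statistics import mean

invited = 40
respondents = round(invited * 0.06)  # roughly two experts actually answering
print(respondents)

# Two hypothetical respondents scoring on the 0-100 scale:
scores = [20, 30]
print(mean(scores))

# Swap in one differently-calibrated expert and the average jumps:
scores_alt = [20, 70]
print(mean(scores_alt))  # a 20-point swing from a single reviewer
```

With a sample that small, the published number is less a measurement of North Korea’s elections than a measurement of which two experts happened to reply.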

Yes, according to the comments on these posts, and I have to agree. But I think the problem runs deeper. International expert comparisons get thrown about because they are presented as quantitative data, and because we all love comparisons. But we forget how wildly subjective the scorings are, and how weak our best controls are for harmonizing and validating them.

The risk is that we easily misrepresent and thus misunderstand assessments, because hard numbers and bold claims are fun to share. I wrote about how this is a problem for open data assessments here. Gelman adds the concern that we risk discrediting our disciplines, which is perhaps even more important for a fledgling endeavor like “what works in technology and accountability programming.”

