The Global Impact of Open Data: Key Findings from Detailed Case Studies Around the World
By Stefaan Verhulst, Andrew Young
Publisher: O’Reilly
Released: September 2016
O’Reilly recently released a book documenting GovLab’s case studies on open data impact around the world. Some of the key findings were presented for feedback at the IODC last week, and were forecast in a report released some months ago. The book includes full versions of all 19 country case studies, so it weighs in at a whopping 459 pages, daunting for many.
This post is a quick review and summary of the main points for those who don’t have time to dig in. The book has 20 chapters spread across five parts, four of which consist of case studies and make up the bulk of the text. These case studies support a brief analysis, whose main insights are all captured in the first part (a single chapter). I’ll summarize these below, together with some brief reflections. I’ll then close with three questions and observations about what this book represents in the larger research landscape.
Defining Open Data
The book reviews 10 definitions of open data and asserts its own, which is useful enough, though the authors note that it describes the way things ought to be, more than the way things are:
Open data is publicly available data that can be universally and readily accessed, used, and redistributed free of charge. It is structured for usability and computability.
After this definition, the book moves quickly on to the main focus of its synthetic analysis: asserting typologies of open data impact, as well as of impact’s conditions and challenges. Together, these three typologies support a number of hypotheses (the authors call them premises) and recommendations.
Types of Open Data Impact
The authors suggest the following categories for understanding the 19 case studies of impact in their collection. They note that types tend to overlap, but categorize each case study within one of these four.
- Improving government – this has to do with improving both governance and efficiency, which the authors label “tackling corruption and transparency” and “improving services”, and is illustrated by six case studies
- Empowering citizens – this is evidenced by “informed decision making” and “social mobilization” and is illustrated by four case studies
- Creating economic opportunity – this refers to opportunities for both citizens and organizations, is captured in the sub-headers of “economic growth” and “innovation”, and is illustrated by four case studies
- Solving public problems – in which data-driven engagement and assessment make contributions to intractable social problems, illustrated by five case studies.
These are decent enough distinctions. What I miss most, though, is some reference to the more ambient impact of open government and open data on the dull, everyday aspects of governance. We know from other contexts that knowing one is being watched influences behavior, for both individuals and institutions, and there’s every reason to believe this holds for open data as well (Beth Noveck talks about this in terms of the audience effect). I would have loved to see a discussion of the evidence for this, or the lack thereof.
I also noted with interest that all the case studies categorized as improving government were initiated/hosted by government, while all those categorized as empowering citizens were initiated/hosted by civil society. There are of course overlaps, but this correspondence seems striking, and I would have appreciated a discussion.
Facilitating Conditions for Open Data Impact
- Partnerships – as illustrated by engagement of intermediaries and data collaboratives. This condition is present to some degree in all of the case studies.
- Public infrastructure – which refers to institutional and cultural infrastructures as much as technical infrastructures. Open data born outside of a conducive technical policy environment doesn’t live long.
- Policies and performance metrics – which seems to be a bit of a catch-all category. Clear open data policies would seem to belong in the category above, but the authors also emphasize political support and ownership, as well as metrics for monitoring implementation, which cut across open data components from policy design to citizen engagement.
- Problem definition – “We have repeatedly seen how the most successful open-data projects are those that address a well-defined problem or issue.”
The discussion of partnerships in this section is compelling, but frustratingly short. A deeper analysis would have been welcome, to tease out some of the differences between partnerships with civil society to engage citizens and with media to inform them, or with “data collaboratives” and all the different ways those can function and be constituted. Similarly, a closer analysis of the kinds of infrastructures at play in different examples would have been useful. Brief reference is made in the synthetic section to interoperability in Kenya’s Open Duka and a “culture-building campaign” in Brazil, but the reader is left to guess at how these kinds of infrastructures interact (and they inevitably do).
More than anything else, I took issue with the emphasis on problem definition. I worry that this emphasis is at odds with a normative open-by-default approach, which I expect to be most powerful for the type of everyday ambient impact described above, and most appropriate for the types of data most central to governance (budgets, legislative proceedings, data mappings for key agencies). It’s worth noting that we discussed this at the IODC with Sunlight staff who help cities implement open data, and they saw no tension.
Challenges to Open Data Impact
- Readiness – which refers to both human and technical capacity
- Responsiveness – the degree, I think, to which government responds to feedback from citizens and civil society regarding data. This one confused me.
- Risks – particularly risks having to do with privacy and security (the Eightmaps case study is used to illustrate this).
- Resource allocation – referring not only to initial investment, but to sustainability over time
It’s not entirely clear to what degree the authors differentiate here between challenges to opening data and challenges to open data impact. It’s also easy to wonder what is excluded by the fact that these challenges are synthesized from a set of cases that were themselves selected on the basis of presumed impact. What would the challenges culled from cases of failed open data look like?
Some of the most important challenges that I miss here have to do with the corollaries of the facilitating conditions, especially institutional culture. This can take a variety of forms: one of the research papers presented at the Open Data Research Symposium described economic disincentives to open data as normative arguments and assumptions about the financial consequences of opening data, distinct both from institutional cultures of opacity and from resource allocations.
It’s hard not to think that a larger sample and a more expansive analysis would have surfaced additional challenges. Of course, it might just be really tempting to stick to four categories.
Three Final Thoughts
First of all, a point on rhetoric.
This research presents a useful starting point for thinking about the consequences of opening government data. And it pretty clearly responds to demand: governments, donors, and others all want to know why they should bother. That’s an important question for the open data community to be able to answer, and it’s important to do so on the basis of evidence. But I think it’s a tactical mistake to call this impact.
We need to differentiate between impact and outputs. This is especially important for research that aims to provide evidence for policy and program design. We may someday be able to demonstrate the impact of open data, but for now the best we can do is to document outputs, which this book does well, and which is a valuable contribution in itself. Being careful not to over-claim will make it more valuable still.
Secondly, it’s also worth noting that this is yet another contribution to case study dominance in open data research. There are a lot of these now, and as a research community we are approaching the point at which we should start doing more synthetic and meta-analytic work. Synthesis of 19 primary research cases is good; methodical meta-analysis of many more secondary sources would be better. It’s hard work, and it should be prioritized.
Lastly, I was motivated to write this review by a tweet, whose point still stands.
And though I took some time to review this book, I didn’t have time to read all the case studies thoroughly and carefully. I’m not sure anybody does. That means there’s no one who can really question the analysis presented in that first, all-too-brief synthetic section.
I’m not suggesting that there’s anything wrong with the authors’ analysis, but this is symptomatic of a larger problem. As long as we are all only assessing our own case studies, we don’t have the kind of collaborative quality control that drives development in science, and we don’t have true synthesis.
The tyranny of the case study is very real. It might be the most powerful limit on our ability to draw heuristics and evidence from the wild explosion of practice in open data, technology, and accountability. It’s time to apply some comparative methods.
Very interesting review.
Rather than “outputs”, do you mean “outcomes” when you try to distinguish them from “impacts”?
Thanks Patrick,
I’d distinguish between outputs, outcomes, and impact as respective points on a continuum. And my reference to outputs here was deliberate. I’d argue that we’re only able to document the outputs of open data initiatives so far (portals have gone up, complaints have been made, governments have responded). There’s a valid argument to make that we have seen some outcomes (enhanced capacity, improved access to information, specific policy changes) with regard to open data so far, but I think that it’s a stretch. In any case, I think we’re a long way from being able to document impact (better governance).