Designing systems for the co-production of public knowledge: considerations for national statistical systems

Abstract The functions of government are increasingly complex and information-driven. However, for many developing countries, the quality of information is poor and the consequences of that poor information are substantial. If the goal is to establish or advance effective systems of government – in terms of formulating or implementing public policies by laws or rules – we have to consider how the design process can help attain that goal through improved information, data, and evidence. National statistics are problems of governance, knowledge, and design. While governments are primary users of national statistical systems, national statistical capacity is jointly determined because without contributions from non-state actors, there is little hope of observing accurate data that expresses important social, economic, and natural phenomena in any state – but especially so in failed, transitioning or struggling states. This paper discusses several findings from research studies for those who design and implement systems that collect, disseminate, and interpret government statistics. These findings are derived from the literature on the co-production of public knowledge. The growth of complex, high-dimensional data, accompanied by calls for investment in “big data” technologies and methods, will change how we collect and interpret data in many countries. Yet, our most important data enterprises are built on a human infrastructure with prospects that are both limited and supported by social factors. Organizations themselves must expend resources to navigate a world in which data is growing at exponential rates. But organizations are constrained and enabled by broader aspects of society that go well beyond government’s role in collecting, processing, and disseminating statistical data. As we discuss, one notable example is the relative presence of general purpose information technologies.


Introduction
Theorists have variously described design as the "process of inventing objects" (Baldwin and Clark 2000) or the "conceiving of objects, of processes, of ideas for accomplishing goals" (Simon 1995). In both views, we are invited to think about the nature of the goods and services that institutions choose to provide or not to provide. If the goal is to establish or advance effective systems of government, in terms of formulating or implementing public policies by laws or rules, we have to consider how the design process can help attain that goal through improved information, data, and evidence (Crow and Shangraw 2016; Shangraw et al. 1989).
National statistical systems are the enterprises that collect, validate, and report national attributes, and national statistical capacity refers to how adequately a national statistical system operates. The World Bank's Statistical Capacity Index (SCI) measures national statistical capacity along three dimensions: methodology (adherence to internationally recognized standards for collecting data), source data (the sources of the data collected and how frequently collection occurs), and periodicity and timeliness (how frequently data is disseminated to internal and external consumers). National statistical systems are perhaps the most important component of the modern administrative state in the search for improved information, data, and evidence. The functions of government are increasingly complex and information-driven. However, for many developing countries, the quality of information is poor (World Bank 2002) and the consequences of that poor information are substantial. For example, in 2010, Ghana's national statistical system issued revisions because it had historically underestimated the country's gross domestic product (GDP), and those revisions raised the statistic by 60 percent (Devarajan 2013; Jerven 2013). These revisions had the consequence of reclassifying Ghana from a low-income to a middle-income country (Jerven 2013), and that reclassification affected a whole host of policies including foreign aid, lending, commerce and development, and foreign investment. In response, the international economic development community called for serious reflection on Africa's so-called "statistical tragedy" (Devarajan 2013) and demanded international efforts to evaluate and improve national statistical systems (Willoughby 2008).
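As a rough illustration of how the three SCI dimensions described above might be combined into a single score, the following minimal sketch takes an unweighted mean of three dimension scores on a 0 to 100 scale. The class name, function name, field names, and example values are hypothetical, and the unweighted mean is an assumed aggregation rule rather than a statement of the World Bank's published procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container for the three dimensions named in the text.

    Each score is assumed to lie on a 0-100 scale.
    """
    methodology: float
    source_data: float
    periodicity_timeliness: float


def composite_capacity(scores: DimensionScores) -> float:
    """Return the unweighted mean of the three dimension scores.

    The equal weighting is an assumption made for illustration only.
    """
    values = (
        scores.methodology,
        scores.source_data,
        scores.periodicity_timeliness,
    )
    for value in values:
        if not 0.0 <= value <= 100.0:
            raise ValueError("each dimension score must lie between 0 and 100")
    return sum(values) / len(values)


# Made-up scores for a hypothetical country, not real data.
example = DimensionScores(
    methodology=60.0,
    source_data=70.0,
    periodicity_timeliness=80.0,
)
overall = composite_capacity(example)
```

Under equal weighting, a weak score on one dimension (for example, infrequent censuses) lowers the composite even when methodology is sound, which is one reason the dimensions are worth inspecting separately.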
Broadly speaking, national statistics are problems of both "governance and knowledge" (Jerven 2013, S19). The quality of the systems that collect, process, and disseminate national statistics varies widely even across nations of similar wealth and governance structure (Jerven 2013). While governments are primary users of national statistical systems, national statistical capacity is jointly determined because without contributions from non-state actors, there is little hope of observing accurate data that expresses important social, economic, and natural phenomena in any state, but especially so in failed, transitioning, or struggling states. In those states, when factors extend beyond the scope of the state, non-state actors may help shape or improve statistical capacity (Anderson and Whitford 2017). In other words, national statistics are often "co-produced" by important government and non-state actors.
This paper discusses several findings from research studies for those who design and implement systems that collect, disseminate, and interpret government statistics. These findings are derived from the literature on the co-production of public knowledge. The growth of complex, high-dimensional data, accompanied by calls for investment in "big data" technologies and methods, will change how we collect and interpret data in many countries. Yet, our most important data enterprises are built on a human infrastructure with prospects that are both limited and supported by social factors. Organizations themselves must expend resources to navigate a world in which data is growing at exponential rates (Crow 2016). But organizations are constrained and enabled by broader aspects of society that go well beyond government's role in collecting, processing, and disseminating statistical data. As we discuss below, one notable example is the relative presence of general purpose technologies. While many technologies have limited or specialized purposes, those with broad penetration and uses across a wide variety of fields are referred to as general purpose technologies (GPTs). These may include public service utilities, mobile communication technology, and the Internet. GPTs stand out from other classes of technologies because of their broad adoption and ability to complement other, more specialized technology platforms (Bresnahan and Trajtenberg 1995; Jovanovic and Rousseau 2005).
The response from the international development community to the "statistical tragedies" (Devarajan 2013) faced by developing countries has been to build programs for evaluation (Willoughby 2008), planning, and the strategic design of improvements in national statistical systems. Programs such as the World Bank's STATCAP project or PARIS21 are notable examples that may, in the end, make a difference. Yet, research reminds us that such systems operate in diverse economic, social, and political environments where "one size fits all" approaches rarely succeed. Little attention has been paid to the role of non-governmental factors in shaping the design of national statistical systems. This paper helps fill that void.

Issues of policy design and practice
For centuries, a key governmental function has been the creation and refinement of support systems for the collection, validation, and reporting of information about a nation's population. For instance, Article I, Section II of the United States Constitution requires a decennial census and Title 13 of the US Code enables the Census Bureau to carry out the population count. National statistical systems, the enterprises responsible for the collection of population data, have become increasingly important components of modern administrative states as the demand has grown for high-quality data (Hulme 2000). However, system quality varies significantly, even among countries with similar structures of government and levels of wealth (Anderson and Whitford 2017).
Reliable data generated by national statistical systems is an increasingly important aspect of governance at both the national and international levels (Polidano 2000). Ivan P. Fellegi (1996), the former Chief Statistician of Canada, once described the purpose of the national statistical system as follows: "to provide relevant, comprehensive, accurate, and objective (politically untainted) statistical information. The end purposes to be supported by the information are multiple, but include prominently the monitoring of the evolution of the country's economic and social conditions, the planning and evaluation of government and private sector programs, investment, policy debates, and advocacy, and the creation and maintenance of an informed public" (165).
Modern national statistical systems go beyond measurements of population and wealth. They collect data that play a key role in our ability to understand complex and critical issues in health (Lieberman 2007, 1407), the environment (Gössling et al. 2002, 199), international poverty reduction (Deaton 2001; World Bank 2010), and international peace and security treaties (Sanga et al. 2011, 304). Statistical capacity has been associated with government attributes such as bureaucratic quality (Williams 2006), government transparency (Williams 2011), and tax system quality (Martin, Mehrotra, and Prasad 2009). Political scientists have linked national statistical capacity with government accountability and effectiveness, regulatory quality, political stability, and corruption (Angel-Urdinola, Hilger, and Ivins 2011). Many of these attributes have themselves been associated with improved national productivity (Bekaert, Harvey, and Lundblad 2011). National statistical systems are not the interest of just emerging or advanced democratic states; even authoritarian regimes may support statistical capacity improvements as a way to promote regime survival through enhanced ability to participate in international efforts and stimulate perceptions of legitimacy (Boix and Svolik 2013).
Given the wide-ranging benefits, assorted international organizations have demonstrated a strong commitment to evaluate and improve statistical capacity in developing nations through the creation of standards, operations monitoring, and capacity building efforts (Dasgupta and Weale 1992). Prominent international projects include the Addis Ababa Plan of Action (AAPA), the World Bank's STATCAP program (Sanga et al. 2011), and the EC Partnership in Statistics for Development in the 21st Century (PARIS21), a joint effort by the United Nations (UN), Organization for Economic Co-Operation and Development (OECD), International Monetary Fund (IMF), and World Bank. In recent decades, increased evaluation of statistical systems in developing nations has begun to reap meaningful improvements (Alexander, Cady, and Gonzalez-Garcia 2008; Willoughby 2008).
Since national statistical capacity is integral to a wide range of government attributes, important questions arise around the determinants of statistical capacity across nations and the factors that can lead to decline in a country's statistical capacity. The World Bank has found a set of five factors that could present threats to national statistical capacity: funding reductions, heavy dependence on donor financing, inadequate training of personnel, insufficient feedback from the users of statistical information, and hesitance of governments to embrace the transparency involved in increased statistical capacity (World Bank 2002). These threats to statistical systems can be perpetuated by insufficient budgets and low-quality outputs. When budgets are excessively constrained, statistical systems often enter a "vicious cycle, in which inadequate resources restrain output and undermine the quality of statistics, while the poor quality of statistics leads to lower demand and hence fewer resources" (World Bank 2002, para. 11). Relatedly, Devarajan (2013) suspects that low-quality statistical systems are in part caused by the potential political sensitivity of the statistics produced.

Statistical capacity depends on technological development
While governments are important consumers and producers of national statistics, they are not the only ones. A number of important non-governmental factors contribute to the enhancement of national statistical systems. Accordingly, national statistical capacity is increasingly "co-produced" by both government and other actors. Technology is a significant driver of societal advancement. It both enhances the capacity for control and creates increased complexity and uncertainty in the economic, political, and cultural life of a nation (La Porte 1971). For decades, economists have highlighted technology's role in economic growth (Fagerberg 1994; Grossman and Helpman 1993; Maddison 1987; Schumpeter 1982; Sen 1999; Solow 1956, 1957) while highlighting its role in the international movement of labor and capital (Arrow 1962; Lucas 1998; Romer 1986). Romer (1994) notes the role of public policy in determining whether organizations make investments to expand their technology, and thus, their knowledge capital.
While many technologies have limited or specialized purposes, those with broad penetration and uses across a wide variety of fields can boost a national economy. These technologies, which include public service utilities, mobile communication technology, and the Internet, are referred to as general purpose technologies (GPTs) because of their broad adoption and ability to complement other, more specialized technology platforms (Bresnahan and Trajtenberg 1995; Jovanovic and Rousseau 2005). While some economists argue that specialized technologies are not particularly useful for widespread growth in developing nations (Fu, Pietrobelli, and Soete 2011), access to GPTs often provides a meaningful economy-wide boost. In contrast, without broad-based GPTs in a country, firms may lack the incentive to invest in and develop new technologies (Pietrobelli 1994; Stiglitz 1989). Their outsized impact has led researchers to examine the cross-national variance in GPT adoption (Archibugi and Coco 2005).
Beyond governmental choices, economists view variation in levels of technological development as a key factor of cross-national discrepancies in economic growth (Castellacci 2007; Helpman 1998; Islam 2003) and sophistication of the administrative state (Whitford and Tucker 2009). In particular, widespread adoption of GPTs such as electricity or the Internet, which can be used broadly throughout an economy, has proven capable of enabling a wide variety of organizations, including governments, to overhaul their operations and improve outcomes (Bresnahan and Trajtenberg 1995; Helpman 1998). Governments frequently support widespread technology development for purposes of national power, either related to national defense or economic prosperity (Lambright and Zinke 1989). Indeed, economists have found that technological advancement provides a strong platform for widespread economic growth (DuBose et al. 1995; Paunov and Rollo 2016; Sen 1999; Solow 1956, 1957; WCED 1987).
Collection and validation of data often, but not always, involves the use of GPTs. Even when statistics can be collected by door-to-door surveys and interviews, statistical organizations can institutionalize their information exchange protocols (Barua, Ravindran, and Whinston 2007; Yang and Maxwell 2011), enhance institutional innovation, and boost knowledge capital. A well-designed protocol addresses data collection issues such as the problem of missing or double-counted houses that threatens door-to-door surveys (Deaton 2001, 134). Adoption of GPTs can support innovation and knowledge sharing to better coordinate field workers for observation and verification. Enhanced technology could also play a role in either suppressing or improving observation of the notoriously difficult-to-monitor informal markets that Jerven (2013) highlights as a threat to the capacity of national statistical systems. This would be an important increase in capacity, particularly in developing nations, which tend to have a greater concentration of informal markets than developed countries. Technology also increases an organization's capacity to adapt to change (García-Morales et al. 2008), a vital factor in developing, rapidly evolving nations. Indeed, research demonstrating that technology adoption by governments is associated with increased productivity dates back several decades (Danziger 1979).
National statistical capacity is not only about information collection and validation, but also about information sharing. A quality statistical system is partially characterized by the availability of its data to both internal and external stakeholders such as governmental agencies, regulators, lawmakers, industries, international organizations, and citizen groups (Fellegi 1996, 184). Increased technological capacity is key to improving access to national statistics. Adoption of new technologies increases administrative efficiency and limits the burden of compliance with requests for information through the adoption of automation and the creation of systematic distribution protocols. Systematization has the further benefit of reducing vulnerability to political influence and potential resource constraints. In the case of governmental institutions, Norris and Kraemer (1996) find that the ability of the government to benefit from technology is at least partially dependent on the technological sophistication of the surrounding region.
In a recent paper in the Review of Policy Research, we use data from 100 different institutional contexts to show that technological attainment has a strong positive impact on national statistical capacity (Anderson and Whitford 2017). This holds whether technology attainment is measured broadly or narrowly. The strongest predictors of national statistical capacity are telephone use and electricity use. We interpret this in light of the fact that electricity and telephone technologies have been in use longer than broadband Internet in developing markets; as such, firms and other market actors, nonprofit organizations, and governments have all had substantial experience with integrating them into their organizations. Figure 1 shows the positive linear relationship between technology attainment and national statistical capacity in developing countries.
A more nuanced interpretation is that telephony is a very broad category of communication. The Internet as a separate technological mechanism is very relevant for advanced industrialized democracies, the so-called "first adopters". In contrast, perhaps the most exciting changes in this arena have been developments in other countries, such as South Korea and India, where mobile broadband penetration is fundamentally changing communication methods.
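To make the idea of a positive linear relationship between technology attainment and statistical capacity concrete, the sketch below fits a one-variable least-squares line to a handful of made-up observations. It is an illustration only: the function name, variable names, and data values are hypothetical, and this is not the specification or the data used in Anderson and Whitford (2017).

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept). Requires at least two points and
    some variation in x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to identify a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values for five hypothetical countries (not real data):
# a technology attainment index and a 0-100 statistical capacity score.
technology = [1.0, 2.0, 3.0, 4.0, 5.0]
capacity = [40.0, 50.0, 55.0, 65.0, 70.0]

slope, intercept = fit_line(technology, capacity)
```

A positive fitted slope corresponds to the pattern described above: higher technology attainment is associated with higher statistical capacity. A bivariate fit of this kind cannot by itself separate the effect of technology from that of wealth or governance.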

Discussion
We believe that there are important implications for practice in the context of national statistical systems to be drawn from the portfolio of research that has emerged in the last half-century. We want to be very clear that all such implications are necessarily contingent: the ability to deploy them in a particular place at a specific time will depend on many contextual factors, not the least of which is the capacity of the organizations that are working in the space. At the same time, we have come to believe in our work in this and related areas that there are few "free lunches": most deployments are part of grand bargains, and the outcomes are hedged by tradeoffs among multiple, competing values. Of course, all practitioners are aware of such constraints, though the academic research literature is rarely overt about them when posing the policy implications of research articles.
First, when trying to make knowledge for the purpose of governance, the outcomes and even the attempt itself are shaped by both the institutions that produce that knowledge and also the evolution of broader social systems surrounding the institutions that produce the data themselves. Governments are asked to build repositories of knowledge about social, market, and environmental conditions that are necessarily ever-changing. Hitting a moving target is not easy. Yet, it is even harder when the social and market conditions help determine the dart one can throw at the target. In a country with few general purpose technologies, the best we can do in gathering national statistics may not be as good as we hope but it may be the best we can obtain.
Second, the states with the greatest need for national statistical system improvements are also the least likely to be able to implement design-based change. It is hard to do such things even in long-established advanced industrial democracies with the stability and financial resources of advanced markets, as well as the human capital of Silicon Valley or world-class universities. Doing so in states without either is harder still. However, these states are often the most dependent on such innovations for moving over the various hurdles in the path to broader development for growing populations. In established markets, there are always incentives to test the veracity of government data on growth or inflation, and there are resources for doing so. The countries where we most worry about growth or inflation data are often those least likely to be able to gather, process, and disseminate such data, and the barriers to such activities are also faced by actors outside government itself. If government lacks GPTs for gathering quality data, why would we expect non-state actors to be able to do much better?
Third, all states, but especially developing ones, are interdependent in the production, sharing, and validation of knowledge. Important knowledge activities transcend national boundaries and have implications for global communities. If we know something about Ghana, then we are more likely to know something useful about similar, connected economies. If we find that Ghana's data are problematic, what are we left to believe about other, related economies? Improving one country's systems can help related systems, if only because the discovery of new paths of improvement cuts transaction costs in the design process.
Fourth, and perhaps counter-intuitively, coproduction of knowledge by such systems with other non-state actors has the benefit of reducing the likelihood that knowledge will be politicized. One notable example of politicizing knowledge is the 1920 Census in the United States, where collected data revealed a politically destabilizing trend of urbanization. As a consequence of the new data, congressional representatives from rural districts worked to block reapportionment efforts that would shift the balance of electoral power away from them (US Census Bureau 2017). Also in the US, the current executive administration's proposal of "alternative facts" represents a different type of politicized knowledge. And in India, where the consequences of poverty reduction reforms are significant, methods to measure consumption have become the concern of both supporters and critics of assorted reform efforts (Deaton and Kozel 2005). While it is probably not useful to go much further down the "rabbit hole" of discussing countries that politicize data, shared data production, processing, curation, and dissemination improves many outcomes, not the least of which is that internal conversations about how policy and knowledge emerge over time are more likely to take a productive path.
The purpose of this paper is to discuss findings from a number of related research studies that may be useful for designing and implementing systems that do the heavy lifting of government when they collect, disseminate, and interpret government statistics. The findings that we have reviewed here come from a number of areas of inquiry that all contribute to the broader literature on the co-production of public knowledge. We have not emphasized in this paper how future technologies to handle "big data" will change how we collect and interpret data in many countries. Instead, we have focused on how our most important data enterprises build on human infrastructures that are embedded in social contexts. In a world in which data is growing at exponential rates, those organizations are constrained and enabled by broader aspects of society that go well beyond government's role in collecting, processing, and disseminating statistical data. GPTs are important, if only because having access to telephones or electricity makes it easier to keep records and to extract them. It is also clear that mobile broadband may be changing the whole data collection enterprise.
We know that the implications we have offered in this paper are necessarily contingent and that no practical changes are "free lunches"; there are always tradeoffs. However, we believe that recognition of these constraints, as well as the opportunities that they may signal, will help set the stage for future improvements.

Disclosure statement
No potential conflict of interest was reported by the authors.