Building Capacity for Data-Driven Governance: Creating a New Foundation for Democracy

ABSTRACT Existing data flows at the local level, public and administrative records, geospatial data, social media, and surveys are ubiquitous in our everyday life. The Community Learning Data-Driven Discovery (CLD3) process liberates, integrates, and makes these data available to government leaders and researchers to tell their community's story. These narratives can be used to build an equitable and sustainable social transformation within and across communities to address their most pressing needs. CLD3 is scalable to every city and county across the United States through an existing infrastructure maintained by collaboration between U.S. Public and Land Grant Universities and federal, state, and local governments. The CLD3 process starts with asking local leaders to identify questions they cannot answer and the potential data sources that may provide insights. The data sources are profiled, cleaned, transformed, linked, and translated into a narrative using statistical and geospatial learning along with the communities' collective knowledge. These insights are used to inform policy decisions and to develop, deploy, and evaluate intervention strategies based on scientifically based principles. CLD3 is a continuous, sustainable, and controlled feedback loop.


Introduction
Vast amounts of data, generated through almost every aspect of living, offer an unprecedented opportunity to improve the health, well-being, and quality of life of our communities. Smart cities, urban analytics, civic analytics, and urban informatics are all expressions in today's lexicon that hold the promise that the data revolution will democratize data, but has it? The movement to democratize data, to break down the data silos, and to provide broad-based data access has been around since at least the 1990s (Sawicki and Craig 1996) and is accelerating in many cities and counties, primarily large ones. However, the capacity at the local community level to store, manage, extract knowledge, and gain insights from data is still one of our country's great challenges (National Research Council 2013).
Communities are a complex system of independent networks defined socially around a common interest or geographically over various spatial scales. Policy interventions in one area can ripple through a community with unintended consequences and interactions throughout the network (Pollack 2016). By working across community programs and government agencies, solutions to these complex problems may become more tractable. For example, communities like Arlington, Virginia, are concerned about at-risk youth. The police department wants to be proactive and reach them before they go down the wrong path, the school system wants them to succeed academically, the libraries want to nurture their reading skills, and the parks and recreation department wants to involve them in activities that will lead to a healthy lifestyle. The goal is to identify trends and patterns at a county and sub-county level that will drive successful interventions, not to identify individuals in the data. Traditionally each department approaches common issues and goals in data isolation. We argue that what is missing in tackling these persistent problems is a defined process that can use the vast amounts of data generated across government departments and programs, nonprofits, businesses, and its citizens, to monitor the complex system of networks in a community, and evaluate interventions and interactions. To meet this challenge, our Social and Decision Analytics Laboratory has developed a novel approach for community learning through data-driven discovery that enables evidence-driven policy interventions and evaluations.
CONTACT Sallie Keller sallie@vt.edu Social and Decision Analytics Laboratory (SDAL), Biocomplexity Institute of Virginia Tech, Arlington, VA.
The cycle of community learning begins with local leaders defining critical questions and issues facing their community; identifying the data sources that provide insights; wrangling the data sources (profiling, cleaning, transforming, linking); using statistical and geospatial learning along with the communities' collective knowledge to inform policy decisions; and developing, deploying, and evaluating intervention strategies based on scientific experimental design principles. Community learning is a continuous, sustainable, and adaptive feedback loop.
We call this community learning cycle the Community Learning Data-Driven Discovery (CLD3) process. It borrows from the features of the Learning Health Systems (JASON 2014) and the systems science approach in public health (Sterman 2006). The learning healthcare system extends the opportunity for discovery throughout the lifecycle of an intervention at the level of the individual; it seeks to determine which intervention works best, for whom, and under what circumstances. The CLD3 process expands this paradigm from the individual to the community level.
The community learning cycle is realized from direct and indirect input. Direct input is obtained through interactions with the community leadership and through existing data sources such as local administrative data, national repositories, and common data frameworks, for example, human services, education, and emergency management records; American Community Survey (ACS) and American Housing Survey; county health rankings (University of Wisconsin Population Health Institute 2017), and evidence for action metrics (National Academies of Sciences 2016). Indirect input is obtained from the natural opportunistic data flows that occur in a community, for example, social media, blogs, community websites, and local news.
Community leaders can come from all stakeholder groups; however, our initial research focuses on civil servants in local government. Local governments collect massive amounts of data every day to provide services and to plan for the future. Yet most do not use their own administrative data to inform their decision-making, and in some cases they cannot access these data due to vendor contracts. The CLD3 process can build capacity for data-driven governance in these communities by making use of their local administrative data and combining it with state and federal statistical data, social media, and other sources of data to inform a community's questions.
In the sections that follow, the CLD3 process is described and put in context using our collaboration with Arlington County, Virginia. The roles for community learning, statistical learning, and evidence-based policy development are highlighted.

Data Types, Models, and Statistical Rigor
Data is the new language to communicate within and among departments and offices at all levels of government. But what is meant by data is often not well defined. For the purposes of facilitating collaboration and clarity in developing and deploying the CLD3 process, definitions of "data" are categorized in four bins (Keller et al. 2016):
• Designed Data are generated in the pursuit of scientific discovery. Designed data include statistically designed data collections, such as data generated from surveys, experimental designs, registries, and intentional observational collections.
• Administrative Data, also referred to as "business practice" data, are collected for the administration of an organization, program, or service processes. These data provide an opportunity for gathering information that exists due to normal economic and social activity. Examples of administrative data include Internal Revenue Service data for individuals and businesses, Social Security earnings records, patent and trademark databases, Medicare and Medicaid health utilization data, banking and other financial data, industrial production processes, such as tracking supply chains end-to-end (Pires et al. 2017), taxi trip data, and local data generated from 911 calls and Emergency Management Services (EMS) responses, property assessment and tax data, and data from health and human services, parks and recreation, libraries, and environmental services (e.g., trash and recycling, water and utilities, projects and planning, transportation, and building permits).
• Opportunity Data are data generated on an ongoing basis as society moves through its daily paces. Opportunity data are derived from a variety of sources such as GPS systems and embedded sensors, social media exchanges, mobile and wearable devices, and Internet entries. Captured through a variety of methods including direct flows, Internet searches, web crawling and scraping, these data may exist in a variety of electronic and physical modalities.
• Procedural Data are data derived from policies, procedures, and legal requirements; they are the rules and regulations that govern and shape our lives. These policies and procedures affect our work, our personal lives, and society. Examples of procedural data include compensation policies, the Affordable Care Act, the Department of Defense policy "Don't Ask, Don't Tell," and Supreme Court rulings.
Although these data sources are ubiquitous in our daily lives, most are not collected with the intention of statistical analysis. In fact, these data are often messy observational data in which selection biases may be nontrivial. For use in a community-learning paradigm, these data sources must be repurposed for statistical learning. This requires a disciplined yet flexible and adaptable data science framework to assess quality and fitness-for-use that is dependent on the intended use of the data (Keller et al. 2016). This framework incorporates a decision-theoretic approach that assesses the benefits and costs of using data of varying quality to answer a community's questions.
Frequently, local problems are being tackled with federal data at an aggregated scale, ignoring the heterogeneity in the smaller spatial scales (e.g., Humes et al. 2011; Weinberg 2011). In particular, the integration of all data (local, state, and federal administrative data, designed data, procedural data, and opportunity data), which is necessary for small-scale estimation, is simply absent. For example, in federal, state, and local data portals, data sources are made available but there is no attempt to integrate these sources. Take, for example, the new GSA Technology Transformation Service's U.S. Data Federation (2017): it provides access to data sources from the federal, state, and local levels (the local data sources are very limited), but does not provide the technology to integrate these sources. Instead, each user must tackle the challenge of integrating these data sources themselves.
Through the development of the CLD3 process, we found many organizations using a variety of data sources to support evidence-based policy development. Their uses range from descriptive displays such as dashboards (e.g., New York University 2017) to inferential analyses such as randomized control trials (e.g., Greiner and Glennerster 2017). Randomized control studies can provide statistical evidence as to whether or not a program works in a specific context, but are unable to explain why (Behn 2015; Deaton and Cartwright 2016).
Randomized control studies are valuable but insufficient to explain specific social phenomena. To do this requires an understanding of the context and human interactions (Flyvbjerg 2005).
Statistical reasoning also seems to be absent in many evidence-based applications (National Academies 2017). For example, absent are descriptive statistics that take into account the representativeness of the data or estimate the variability of the statistic; intervention evaluations that acknowledge the interactions (correlations) within the system; models that account for the hierarchical and spatial nature of the data; and experimental designs that account for confounding factors (National Research Council 2013; Banerjee 2016).
To address these issues, our research uses statistical learning and develops geospatial indicators to analyze complex social systems. The CLD3 process starts with research questions, discovering data sources to answer these questions, conducting exploratory analysis, and creating statistical models. Based on the statistical analyses and discussions with stakeholders, the cycle continues with proposing policy interventions and implementation of these interventions, including analyzing the interventions and updating policy. The cycle continues to monitor the outcomes of interventions, thus creating an adaptive feedback loop.
The community learning process is designed to enable such research to quantify and understand the association of geospatial indicators pre- and post-policy intervention over time to accurately interpret the impact of an intervention. This involves employing statistical designs that incorporate or account for the myriad confounders in a complex system. Community learning provides for and encourages changes to policy and the ability to measure the impact of these changes through statistical analysis.

Community Learning Data-Driven Discovery (CLD3)
Rural and metropolitan communities are experimenting, taking risks, and making the hard choices required to address disparities in healthcare and education, and in economic, social, and criminal justice (Harkness 2017). As a process, CLD3 liberates and integrates existing designed, administrative, opportunity, and procedural data sources and makes them available to civic leaders and researchers to address their most pressing issues. The CLD3 process also provides the tools for rural and urban communities to leverage their assets by incorporating feedback from the community's collective knowledge. The CLD3 is a holistic approach that includes the following steps.
1. Engages the community by bringing local government agency and program administrators, civil servants, and community leaders together to break down the boundaries among governmental bureaucracies and programs through workshops, forums, and meetings that identify issues, and the data sources that can be used to describe and monitor them. This step also includes identifying governance for maintaining and ensuring the protection and confidentiality of the data.
2. Integrates data sources to gain usable insights from statistical learning, and combines these insights with the knowledge and insights of local government agencies and program leaders to plan and implement policies and interventions to meet the goals of the community within their contextual constraints. This step ranges from working with messy observational data to sophisticated modeling to assess causality.
3. Measures and reviews the intervention pathways to gain a better understanding of what works, what does not, and why, by understanding the connections across issues, the timing of problems, and how the data sources fit together. Implementation of interventions requires an evaluation of the outputs and effects of the changes through statistical analysis over time.
4. Redirects interventions when warranted based on the information from a continuous systematic review.
Depending on the findings in Step 2, policy interventions are refined as needed. This creates a continuous community learning process that occurs over time. This CLD3 process is illustrated in Figure 1. Data becomes the new language to communicate within and among organizations, departments, and programs at all levels of government. Statistics is the tool used to translate the data into a story and statistical learning is the technical process. Statistical learning is composed of methods and techniques for modeling and analyzing complex datasets (James et al. 2014). It starts with exploratory data analysis methods that are used to isolate patterns and relationships, uncover unexpected behavior, confirm or disprove assumptions, develop explanations, and generate hypotheses.
The democratization of data can contribute to untangling the innumerable causes of social problems through statistical learning. Data discovery and identification of data gaps are critical first steps. Exploratory data analysis helps to identify how rich an inferential model the data can support or if the data can support inferential analyses at all. No amount of data can supplant the necessity for data analysis grounded in statistical theory to develop, test, and accumulate the causal explanations that can guide policy. Theories and logic models from social science fields should also be brought to bear on addressing the issue and guiding policy (Guthrie et al. 2013). Whether the statistical analyses can guide policy depends on whether the story told by the data can be interpreted through the lens of the community and community learning. Does the story provide the relevant information and insights community leaders need to address the issue? Examples of these issues include:
• evaluating an existing program or policy in an effort to make it more resource efficient and cost effective,
• identifying vulnerable populations to assess their needs and provide services, and
• validating and improving the accuracy and precision of estimates calculated from local administrative data that can drive federal funding.
At the heart of this CLD3 process is our data science framework (Figure 2), which encapsulates a rigorous and flexible approach for repurposing data, from discovery to analysis to inference (Keller et al. 2016).
Figure 1. Community Learning Data-Driven Discovery (CLD3) process. The CLD3 process involves four broad steps that are connected through the Data Science Framework (see Figure 2). Each step is described in more detail in each box: engage with civic leaders, integrate data, measure and explore using local administrative data flows combined with other data sources, and implement and continuously review program and policy changes to redirect resources as needed.
Figure 2. The SDAL data framework provides a rigorous and flexible approach to repurpose local administrative and other data flows, starting with the identification of the problem; discovery and acquisition of data sources; profiling of the data to assess quality; preparing, cleaning, linking, and conducting exploratory analysis. The final steps are examining fitness-for-use in the context of the topic under study; the design of experiments and models; and synthesizing findings and analyzing results. (1) Data discovery: the identification of potential data sources that could be related to the specific topic of interest. (2) Data inventory: the method used to screen and inventory the data sources to determine their value in supporting the research question and if they would be worthwhile to acquire. (3) Data acquisition: the process of negotiating and acquiring the data, and managing legal, privacy, security, and confidentiality practices. (4) Data profiling: a determination of both the quality of the data, provenance, and its utility to the project at hand. (5) Data preparation: the process of cleaning and readying the data for analysis; what is referred to as "wrangling the data." (6) Data linkage: the process of building links to ensure compatible meaning, schemas, and ontology for data from multiple sources, resulting from the repurposing of the data. (7) Data exploration: the analysis of the datasets by summarizing main characteristics, often with visual methods. (8) Fitness-for-use assessment: the characterization of the information content in the results as a function of the analysis model, for example, data quality and data coverage (representativeness). (9) Design of experiments, modeling, and analysis: exploring new uses of the repurposed data to support insights, data-driven hypothesis development, and policy, and creating intervention and evaluation strategies.
Statisticians will agree that often the majority of their time is spent preparing the data for analysis. This is especially true when integrating disparate data sources in a community learning context. Even though data quality judgments depend on the data consumer and use of the data, a framework is needed that adheres to scientific principles and provides a systematic approach to use and translate data to a story that is based on statistical and behavioral science principles (Keller et al. 2016). Our data science framework provides a disciplined process of identifying data sources, preparing them for use, and then assessing the value of these sources for the intended use(s). When working with local governments in the context of building capacity for data-driven governance, such rigor is necessary, particularly for communication across departments and programs, transferability, and scalability.
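As an illustration of the data profiling step in the framework, the following is a minimal sketch and not the laboratory's actual tooling. It assumes records arrive as Python dictionaries; the function name `profile` and the field names used with it are hypothetical. It reports, for each field, the share of records with a non-missing value and the number of distinct values, two simple quantities that can feed a fitness-for-use judgment.

```python
def profile(records, fields):
    """Summarize completeness and distinct values for each field.

    `records` is a list of dicts; a value of None or "" (or an absent key)
    is treated as missing. This convention is an assumption of the sketch.
    """
    n = len(records)
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            # Share of records that carry a usable value for this field.
            "completeness": len(present) / n if n else 0.0,
            # Number of different non-missing values observed.
            "distinct": len(set(present)),
        }
    return report
```

A field with low completeness, or an identifier field whose distinct count is smaller than the number of records, would prompt a closer look before the source is linked to others.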

Community Learning to Statistical Learning: A Proof-of-Concept
The CLD3 process steps outlined in Figure 1 are described next in the context of a proof-of-concept with examples from our work with Arlington County, Virginia. Arlington County is located in Northern Virginia across the Potomac River from Washington, D.C. It is the geographically smallest self-governing county in the U.S., occupying slightly less than 26 square miles. The Arlington legislative body is composed of five members elected at large.

Engage with the Community
Engagement in the CLD3 process can start in many ways. One approach is to do this through a forum across government departments and offices, or other stakeholder groups, to help community leaders identify and define those issues that cannot currently be answered and to identify the data sources that can provide insights into these issues. This is similar to the collective impact paradigm (Kania and Kramer 2011) that stresses an inclusive and structured approach for identifying and solving community problems. CLD3 extends the paradigm by adding statistical and social science practices grounded in the scientific method. An example is our engagement with Arlington County, Virginia. It started with a conversation with local government leaders: the Chief Information Officer, the Fire Chief, Police Chief, and the heads of Economic Development and Planning, Environmental Services, Emergency Management, Community Planning, Housing, & Development, Human Resources, Human Services, Parks & Recreation, and Technology Services. In that meeting, the Fire Chief presented us with the challenge of linking together all of the data collected during the course of an emergency to recreate incidents end-to-end through the data. The Chief Information Officer agreed to work with the Fire Chief to ensure access to data and develop an initial Memorandum of Understanding for data sharing. We acted quickly to gain momentum and developed a first integrated analysis. More details are given in the next section.
The lesson learned in that first meeting was that we had taken it for granted that government leaders and policy makers understood the power their data could bring to actually guide decision making. This should not have been a surprise given how often statisticians have made this same comment in nearly every other application domain (National Research Council 2007). It did, however, lead us to develop a more formal engagement activity.
Working with the Chief Information Officer and Deputy County Manager, we developed a strategy to introduce CLD3 to the county's executive management leadership team. We collaboratively organized a Data Discovery Workshop. This was created both to identify community issues and the data sources that could be used to describe the current situation as a baseline and then to implement and monitor new interventions and policy changes.
The workshop provided a forum for government leaders across departments and offices to envision telling the story of their community through the language of their own program data by combining their data with data sources from other departments. The workshop was structured to break down the silos across governmental bureaucracies through a dialog process to identify issues and data sources that span their boundaries. Our experience confirms that a single department may be reluctant to tackle an issue such as childhood obesity due to the complexity of the associated risk factors: diet, lack of exercise, and family psychological and socioeconomic factors. But when approached from a systems perspective, departments across the government can contribute their unique resources and expertise to addressing the risk factors that align with their mission. Our CLD3 approach would first explore factors that contribute to childhood obesity and then test these factors for statistical relevance. Based on the findings and discussions with stakeholders, policy interventions would be proposed, implemented, and analyzed over time.
The  life rankings based on 20 indicators measuring the social, safety, physical, and economic conditions in Charlotte's 173 neighborhood statistical areas. The indicators were used to group the neighborhoods into stable, transitioning, or challenged. The city used this information to direct resources to transitioning and challenged neighborhoods (Charlotte and Mecklenburg County 2017).
Following the overview presentation, participants were placed into small groups to discuss the local issues that keep them awake at night. No group was allowed more than one representative from a department, program, or office. This provided the opportunity to forge new relationships. The issues that were voiced from the groups are listed in Table 1.
What became evident at the workshop is that the leaders in Arlington want to be responsive to their residents and to provide an environment in which all residents can thrive. After the Data Discovery Workshop, the county manager created a steering committee to begin to build data-driven learning collaborations across programs and departments and to select the initial crosscutting issues to address.
The crosscutting issues proposed led to a new discussion about data sharing among departments, between Arlington and Virginia Tech, and between Arlington and data vendors that are under contract to provide systems for the collection of data and production of reports. Many of the vendor contracts do not allow access to the underlying data. This is changing as the Arlington lawyers are now revising the contracts to ensure that they own and have access to their own data.
Many sources of local data are open public records and can be used to start the CLD3 process. Our progress here is described through the examples below, many of which use open data.

Integrate, Measure, Review, Redirect
Following the initial engagements with the civic leaders, some compelling and successful demonstrations of the community learning cycle are necessary to move toward the adoption of this approach. Also, a forum like the Data Discovery Workshop raises expectations among the participants so efforts should be made to meet some of these expectations to maintain interest and momentum. To meet the expectations raised in the workshop, two actions were taken. First, local government demonstrated commitment to the next steps by establishing a data steering group to guide the development of collaborative projects between government programs, departments, and our laboratory researchers. Second, we made sure that the timing of the workshop could leverage a program in our laboratory that would accelerate our ability to work on some of the issues raised by the leadership. These projects became a central focus of our Data Science for the Public Good (DSPG) summer student fellow program.
The DSPG program is an incubator to educate and train the next generation of government data scientists by exposing them to data science projects that integrate data from the municipal, state, and federal levels of government. This includes engaging students and postdoctoral fellows in the data discovery learning cycle, giving them formal, hands-on training in data discovery, access, cleaning, preparation, and exploratory analysis. Civic leaders participate in team meetings and begin to see their data and programs through a new lens. Arlington now has a full community-based research model running, and these collaborations are starting to drive the planning directions of the steering committee. This process started slowly but gained momentum as agency and program leaders heard about this work (SDAL 2016).
To achieve the full community learning cycle, the steps taken need to be demonstrated clearly and repeatedly. The three examples described below demonstrate how this can be accomplished and how community learning through statistical learning is achieved. Although the process may appear linear, it is not; it involves many steps, iterations, and deep collaboration with the stakeholders.

... Example : Arlington County Fire/Emergency Medical Services (EMS)
As noted earlier, the Arlington Fire Chief posed the first question. He asked us if it was possible to link 911 incidents through the data to improve the fire department's situational awareness. The data being generated during an emergency are used to guide the Fire/EMS staff in their response to an emergency and not designed to be used in a retrospective analysis of the incident. What we discovered were silos of data, starting with the call data coming into the center, response times for engine and medic units, and final incident reports for each medic and engine unit. There was no unique identifier that could be used to link an incident across the various data sources. By linking 911 incident data sources with EMS health records and geographic data (following the processes outlined in the data science framework in Figures 2  and 5), the data were repurposed to address several Fire/EMS questions.
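When no shared identifier exists, one common fallback is to link records deterministically on location and time. The sketch below illustrates that idea only; it is not the linkage method used in the Arlington project, which the text does not specify. The field names (`call_id`, `address`, `received`, `unit`, `dispatched`), the address normalization, and the 10-minute window are all assumptions made for illustration.

```python
from datetime import timedelta


def normalize_address(addr):
    """Crude normalization: uppercase, drop periods, collapse whitespace."""
    return " ".join(addr.upper().replace(".", "").split())


def link_records(calls, responses, window=timedelta(minutes=10)):
    """Pair each call with unit responses at the same address that were
    dispatched within `window` after the call was received.

    Returns a list of (call_id, unit) tuples. One call may link to several
    units, since an engine and a medic unit can answer the same incident.
    """
    links = []
    for call in calls:
        key = normalize_address(call["address"])
        for resp in responses:
            same_place = normalize_address(resp["address"]) == key
            delay = resp["dispatched"] - call["received"]
            if same_place and timedelta(0) <= delay <= window:
                links.append((call["call_id"], resp["unit"]))
    return links
```

In practice, address strings vary far more than this normalization handles, so geocoding or probabilistic matching would likely be needed, and the time window would have to be tuned against known incidents.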
Many people call 911 when there is not a true emergency. Serving nonemergency customers is stressful for firefighters and an inefficient use of resources. By linking calls with their geographic location, it was discovered that what seemed like frequent nonemergency calls from the same individual were actually frequent calls from different individuals at the same location, such as senior assisted living centers and rehabilitation centers (see Figure 3, top). Data discovery and statistical learning changed the perception of the problems (through community learning), leading administrators to change their approach from an emergency response to one that focused on addressing the needs of each center through education and engaging the help of the Department of Human Services. The fire chief worked with the head of the human services department to arrange visits to these centers to identify what services were needed and to provide training on the appropriate use of 911 calls. This intervention resulted in an immediate decline in 911 calls; when call volume eventually increased again, new visits to the centers were arranged. The decline in 911 calls meant fewer fire and medic units on the road and a more efficient allocation of resources for both the Fire/EMS and Human Services departments.
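The shift in perspective described above, from frequent callers to frequent locations, amounts to counting the same calls under two different keys. A minimal sketch follows, with hypothetical field names (`caller`, `address`) and a hypothetical threshold; it is meant only to show the comparison, not the project's analysis.

```python
from collections import Counter


def frequent_sources(calls, key, min_calls):
    """Count calls grouped by `key` and keep groups with at least `min_calls`."""
    counts = Counter(call[key] for call in calls)
    return {source: n for source, n in counts.items() if n >= min_calls}
```

Running `frequent_sources(calls, "caller", 3)` and `frequent_sources(calls, "address", 3)` on the same records shows whether repeat volume concentrates in individuals or in places such as assisted living centers.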
The ability to recreate Fire/EMS incidents through the data provided situational awareness and new insights. For example, by visualizing the relationship between weather and incident volume, resources can be located during weather events to improve response times, for example, wooded areas during wind events or low-income areas during heat events. General unit utilization over time of day and day of week can help determine if new resources are needed (see Figure 3, bottom). In addition, calculating the match (or lack thereof) between call classification and final incident classification leads to better use of resources. The statistical analyses of the linked data sources and the lessons learned are highlighted in Figure 4, which presents the usage of engine, medic, and other units by day aggregated over 3 years of data.
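Two of the summaries mentioned above can be sketched in a few lines once incidents are linked. The code below is illustrative only: the field names (`call_type`, `final_type`, `dispatched`) are hypothetical, and exact string equality is assumed as the definition of a classification match.

```python
from collections import Counter


def classification_agreement(incidents):
    """Return (share of incidents whose call classification equals the final
    incident classification, counts of each (call_type, final_type) pair)."""
    pairs = Counter((i["call_type"], i["final_type"]) for i in incidents)
    total = sum(pairs.values())
    matched = sum(n for (call, final), n in pairs.items() if call == final)
    rate = matched / total if total else 0.0
    return rate, pairs


def dispatches_by_weekday_hour(incidents):
    """Count dispatches per (weekday, hour); weekday 0 is Monday,
    following datetime.weekday()."""
    return Counter(
        (i["dispatched"].weekday(), i["dispatched"].hour) for i in incidents
    )
```

The pair counts show which call types are most often reclassified, and the weekday-by-hour counts give the raw material for a utilization chart of the kind described for Figure 3.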

... Example . Arlington County Multi-Department
Question Regarding Reaching Residents with Services Many departments during the Data Discovery Workshop expressed the desire to understand why residents use or do not use their services. As a first step to addressing this issue, we developed a county map to identify neighborhood block groups based on their economic vulnerability. Our initial analysis developed a simple economic vulnerability index at the census block-group level using American Community Survey (ACS) data. The block-groups were ranked based on: the percentage of Households (HHs) spending more than 50% of HH income on housing, percentage of HHs with no vehicle, percentage of HHs receiving Supplemental Nutrition Assistance Program (SNAP), and percentage of HHs in poverty. These ranks were then summed to create an overall index. Albeit it is simple, this has provided some valuable insights. For example, overlaying the locations of the households receiving Arlington County Department of Parks & Recreation program fee discounts, there are a disproportionate number of census block groups in the top quartile of the Vulnerability Index with less than 10 HHs receiving program fee discounts. This has presented the department with some guidance on where they could target outreach. It also begins to lay the foundation for the statistical evaluation of new policies. Figure 5 summarizes these findings and next steps.
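The rank-and-sum construction of the index can be sketched as follows. The block-group values, field names, and helper functions are hypothetical. The sketch also assumes that a higher percentage receives a higher rank (more vulnerable), that ties share their average rank, and that the top quartile is the highest-scoring quarter of block groups; the text does not specify these details.

```python
# Hypothetical block-group records: field names and values are illustrative only.
block_groups = [
    {"id": "BG1", "pct_housing_burden": 30.0, "pct_no_vehicle": 20.0,
     "pct_snap": 15.0, "pct_poverty": 18.0, "hh_discounts": 4},
    {"id": "BG2", "pct_housing_burden": 10.0, "pct_no_vehicle": 5.0,
     "pct_snap": 3.0, "pct_poverty": 6.0, "hh_discounts": 25},
    {"id": "BG3", "pct_housing_burden": 20.0, "pct_no_vehicle": 10.0,
     "pct_snap": 8.0, "pct_poverty": 10.0, "hh_discounts": 12},
    {"id": "BG4", "pct_housing_burden": 5.0, "pct_no_vehicle": 2.0,
     "pct_snap": 1.0, "pct_poverty": 3.0, "hh_discounts": 0},
]

INDICATORS = ["pct_housing_burden", "pct_no_vehicle", "pct_snap", "pct_poverty"]


def rank_ascending(values):
    """Return ranks (1 = lowest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def vulnerability_index(records, indicators=INDICATORS):
    """Sum each block group's ranks across the indicators."""
    totals = [0.0] * len(records)
    for name in indicators:
        ranks = rank_ascending([r[name] for r in records])
        totals = [t + rk for t, rk in zip(totals, ranks)]
    return {r["id"]: t for r, t in zip(records, totals)}


def outreach_candidates(records, index, max_discount_hh=10):
    """Top-quartile block groups with fewer than max_discount_hh discount households."""
    ranked = sorted(records, key=lambda r: index[r["id"]], reverse=True)
    top_quartile = ranked[: max(1, len(ranked) // 4)]
    return [r["id"] for r in top_quartile if r["hh_discounts"] < max_discount_hh]


index = vulnerability_index(block_groups)
candidates = outreach_candidates(block_groups, index)
print(index)
print(candidates)
```

Summing ranks rather than raw percentages keeps any single indicator with a wide range from dominating the index, at the cost of discarding the size of the gaps between block groups.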
Arguably, many different indices can be developed. However, little research has been done to create indices at the sub-county or sub-metropolitan level of geographic detail. This research is needed as local governments begin to govern based on data-driven evidence rather than anecdote.

... Example . Using Local Property Data for Planning in
Arlington County Arlington County demographers are concerned about the availability of affordable housing and in knowing how long a household has lived in Arlington. This information is used in their projections of the number of school age children. Since Arlington's housing capacity is limited by the supply of suitable land for residential development, it is important to The charts presented in Figure 6 demonstrate the usefulness of local property data to answer the demographers' questions. Local real estate tax assessment data include the value of the home and characteristics of the structure and land on which the structure is built. The Arlington County real estate assessment data included 59,289 single-family housing units with assessed values and geocoded addresses placing them within the Arlington County census tracts. These data provide a detailed view of a geographic area that is not possible using similar data from federal surveys, such as the American Community Survey (ACS). The top chart in Figure 6 compares housing value aggregated by census tract using the ACS on the left-hand side and then shows the area highlighted by the yellow circle (four census tracts) on the right-hand side using the local real estate tax assessment data. The data displayed on the right provide a more detailed picture of the housing stock since it reflects the heterogeneity within a census tract by using nonaggregated data at the level of the housing unit.
In addition, local real estate tax assessment data can be used as a proxy for wealth and to calculate measures of diversity (Simpson 1949). Simpson's indices of diversity were estimated from the local data (house value, year built, property type, and number of bedrooms) for each census tract and census block group (following Narwold and Sandy 2010) and are displayed in the bottom of Figure 6. Not displayed is the same index calculated using the 2009-2013 ACS data, which is a sample of 450 households representing the approximately 60,000 single-family housing units contained in the local tax assessment data. The estimates using the local data are based on the population of single-family homes, whereas the ACS estimates are based on a sample that has less geographic detail. The Arlington demographers used the findings based on the local data in their long-run planning analyses.
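As an illustration of the diversity calculation, the sketch below computes one common form of Simpson's index, 1 minus the sum of squared category shares, for each census tract. The housing-unit records and field names are hypothetical. The choice of this form is an assumption, as is the treatment of continuous attributes such as house value or year built, which would first have to be binned into categories; the text does not give the exact specification used.

```python
from collections import Counter, defaultdict

# Hypothetical housing-unit records: field names and values are illustrative only.
housing_units = [
    {"tract": "A", "property_type": "detached", "bedrooms": 2},
    {"tract": "A", "property_type": "detached", "bedrooms": 2},
    {"tract": "A", "property_type": "detached", "bedrooms": 3},
    {"tract": "A", "property_type": "townhouse", "bedrooms": 3},
    {"tract": "B", "property_type": "detached", "bedrooms": 3},
    {"tract": "B", "property_type": "detached", "bedrooms": 3},
    {"tract": "B", "property_type": "detached", "bedrooms": 3},
    {"tract": "B", "property_type": "detached", "bedrooms": 3},
]


def simpson_diversity(categories):
    """Return 1 - sum(p_i^2): 0 when all units share a category, larger when mixed."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def diversity_by_area(units, area_key, attribute):
    """Compute the diversity index of one attribute within each geographic area."""
    grouped = defaultdict(list)
    for unit in units:
        grouped[unit[area_key]].append(unit[attribute])
    return {area: simpson_diversity(values) for area, values in grouped.items()}


bedroom_diversity = diversity_by_area(housing_units, "tract", "bedrooms")
type_diversity = diversity_by_area(housing_units, "tract", "property_type")
print(bedroom_diversity)
print(type_diversity)
```

Passing a block-group identifier as `area_key` instead of the tract would give the finer-grained version of the same index.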

Conclusions-Implementing and Maturing Community Learning Data-Driven Discovery (CLD3)
Data-driven evidence is a necessary tool for community leaders to make the hard choices required to address disparities in healthcare and education, and in economic, social, and criminal justice. We have developed and piloted the CLD3 process that combines the promise of the data revolution with a scientifically rigorous approach to evidence-based policymaking and evaluation. Insights from the CLD3 process provide the necessary information for local governments to make decisions about alignment of services and funding.
To provide the data-driven evidence necessary for local governments at all levels (e.g., cities, counties, towns) to build sustainable equity into their governance, it is time to move away from one-off studies and bring the methods and tools of the statistical sciences forward to ignite a culture of Community Learning. The CLD3 process is natural for statisticians and social scientists to embrace since it mirrors the research process: data integration and exploration driving hypothesis development, formulation of intervention strategies based on statistical and community learning, and continuous evaluation of the intervention based on data integrated across departments and agencies. It is simply a restating of the scientific method that undergirds our research process on a daily basis. However, it is not necessarily so obvious to civic leaders in their policy development. This is why researcher engagement is needed to begin a sustainable movement.
To scale this to a national movement, we propose leveraging the U.S. Public and Land Grant Universities' current interests in community-based research and experiential learning. Using the infrastructures of these institutions, we could transform how local governments build evidence into daily decision-making. This new and bold agenda reflects the mission that President Lincoln set out 150 years ago for the nation's Land Grant University network: to translate data to information to knowledge to action.