When is the rigorous impact evaluation of development projects a luxury, and when a necessity? The authors study one high-profile case: the Millennium Villages Project (MVP), an experimental and intensive package intervention to spark sustained local economic development in rural Africa. They illustrate the benefits of rigorous impact evaluation in this setting by showing that estimates of the project's effects depend heavily on the evaluation method. Comparing trends at the MVP intervention sites in Kenya, Ghana, and Nigeria with trends in the surrounding areas yields much more modest estimates of the project's effects than the before-versus-after comparisons published thus far by the MVP. Neither approach constitutes a rigorous impact evaluation of the MVP, which is impossible to perform due to weaknesses in the evaluation design of the project's initial phase. These weaknesses include the subjective choice of intervention sites, the subjective choice of comparison sites, the lack of baseline data on comparison sites, the small sample size, and the short time horizon. The authors describe one of many ways that the next wave of the intervention could be designed to allow proper evaluation of the MVP's impact at little additional cost.

Notes

1. For an example of the research to which Binswanger-Mkhize refers, see Lele (1975) [Lele, U., 1975. The design of rural development: lessons from Africa. Washington, DC: World Bank. http://go.worldbank.org/1UOEW9NCX0 (Accessed: 23 August 2011)].

2. There are numerous other model village efforts now underway around the world – including hundreds in India, such as the Kuthambakkam model village in Tamil Nadu and the Pattori model village in Bihar.

3. The project's stated objectives have focused on the five-year time horizon: ‘In five years, not only will extreme poverty be wiped out, Sauri will be on a self-sustaining path to economic growth’ and ‘These investments, tailored to meet the needs of each community, are designed to achieve the Millennium Development Goals in 5 years’ (Millennium Promise 2007 [Millennium Promise, 2007. Millennium Villages: a revolution is possible. http://www.un.org/esa/coordination/Alliance/MPBooklet.pdf (Accessed: 31 August 2010)]); and ‘The Millennium Villages project is an integrated development initiative providing immediate evidence that, by empowering communities with basic necessities and adequate resources, people in the poorest regions of rural Africa can lift themselves out of extreme poverty in five year's time ….’ (MVP 2007a [MVP, 2007a. Millennium Villages: overview. http://www.millenniumvillages.org/docs/MVInfokit_rev16.pdf (Accessed: 6 April 2010)]). The calendar of MVP key activities (MVP 2010b [MVP, 2010b. Key activities. http://www.millenniumvillages.org/aboutmv/keyactivities.htm (Accessed: 30 September 2010)]) presents a five-year programme showing ‘Outcomes’ for years three, four, and five as ‘Achievement of Millennium Development Goals for child mortality, education, environment, health, gender equality, maternal mortality and water’. The mid-term evaluation report (MVP 2010c [MVP, 2010c. Harvests of development: the Millennium Villages after three years. New York: The Earth Institute at Columbia University], p. 2) explains that the project is conceived of as ‘a ten-year initiative spanning two five-year phases’, where the first phase ‘focuses on achieving quick wins, especially in staple crop production and disease control, and on establishing basic systems for integrated rural development that help communities escape the poverty trap and achieve the MDGs’. The second phase will ‘focus more intensively on commercializing the gains in agriculture and continuing to improve local service delivery systems in a manner that best supports local scale-up’.

4. It is worth noting that the counterfactual is not the absence of all interventions of any type, because the MVP evaluation cannot be and should not be an evaluation of all publicly-funded activity of any kind. Rather, the proper counterfactual for an impact evaluation of the MVP is whatever interventions the MVP sites would have received in the absence of the MVP.

5. Fisman and Miguel (2008, pp. 202–206) [Fisman, R. and Miguel, E., 2008. Economic gangsters: corruption, violence, and the poverty of nations. Princeton, NJ: Princeton University Press] raise concerns about the lack of rigour in the MVP impact evaluation design, and posit that broader national trends might be responsible for some of the changes observed in the MV intervention site at Sauri, Kenya.

6. The period consisting of the three years of the program varies by country: 2005–2008 for Sauri, and 2006–2009 for Bonsaaso and Pampaida. The DHS data are from 2003 and 2008/09 for Kenya, 2003 and 2008 for Ghana, and 2003 and 2008 for Nigeria.

7. Appendix 2 gives the definitions of all indicators used. The DHS are nationally-representative household surveys containing individual-level data on indicators of population, health, and nutrition, carried out by the Measure DHS Project in cooperation with local governments and non-governmental organisations in countries all over the world since 1984. Although they are often used to study maternal and child health, they are representative of all households – not just those with children. Comparable and standardised survey data are collected roughly every five years in many countries, and made publicly available online (http://www.measuredhs.com). The project is principally funded by the US Agency for International Development. In July 2010 the most recent publicly-available standard DHS microdata for Uganda covered the years 2000/01 and 2006, which do not overlap with the MV initial evaluation period of 2006–2009. The most recent data for Malawi covered the years 2000 and 2004, which also do not overlap with the MVP initial evaluation years. DHS surveys from Rwanda (2005 and 2007/08) overlap with the intervention period for the MV site in that country (2006–2009), but indicators for the Rwanda MV site were not published in MVP (2010c).

8. The indicators shown in the figures as ‘MV region, rural’ are for rural households in the region in which the MV is located. This is rural Ashanti in Ghana, rural Nyanza in Kenya, and the rural Northwest Region in Nigeria. The Nigeria DHS does not provide state-level data in 2003.

9. Standard errors are not reported in the interim MV report, so we are not able to report standard errors for the indicators for the MV sites, the trends at the MV sites, or the differences between the trends at MV sites and the surrounding areas.

10. For example, the Nigeria panel of Figure 3 shows a higher linear slope at the intervention site than in the rural area of the surrounding region, but all the points in both could hypothetically lie on roughly the same exponential growth curve.

11. The gross attendance ratio is difficult to interpret, and a high ratio is not unambiguously positive. Ratios above one (such as those for Kenya and Ghana) indicate the presence of underage and/or overage children – that is, primary school attendees outside the age range six to 11 – and an increase in grade repetition can increase the ratio.
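
The point about ratios above one can be illustrated with a small sketch. It assumes the usual definition of the gross attendance ratio (all primary school attendees of any age, divided by the population of official primary school age) and uses hypothetical counts that are not taken from the MVP or DHS data.

```python
def gross_attendance_ratio(attendees_any_age: int, official_age_population: int) -> float:
    """All primary attendees, whatever their age, over the official-age population."""
    return attendees_any_age / official_age_population


def net_attendance_ratio(attendees_official_age: int, official_age_population: int) -> float:
    """Only attendees within the official age range over the official-age population."""
    return attendees_official_age / official_age_population


# Hypothetical village: 1,000 children aged 6 to 11, of whom 900 attend primary
# school, plus 250 attendees who are younger than 6 or older than 11.
official_age_population = 1_000
attendees_official_age = 900
attendees_outside_age = 250

gar = gross_attendance_ratio(attendees_official_age + attendees_outside_age,
                             official_age_population)
nar = net_attendance_ratio(attendees_official_age, official_age_population)
print(f"gross ratio: {gar:.2f}, net ratio: {nar:.2f}")
```

In this hypothetical case the gross ratio exceeds one even though a tenth of official-age children are not attending, which is why a high ratio is not unambiguously positive.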

12. For example, Ashraf et al. (2010) [Ashraf, N., Fink, G. and Weil, D.N., 2010. Evaluating the effects of large scale health interventions in developing countries: the Zambian malaria initiative. Cambridge, MA: National Bureau of Economic Research, Working Paper 16069. http://www.nber.org/papers/w16069 (Accessed: 23 August 2011)] use changes in DHS data on fever occurrence as a rough proxy for changes in malaria morbidity in Zambia.

13. Kibaara et al. (2008, Table 8) [Kibaara, B., 2008. Trends in Kenyan agricultural productivity: 1997–2007. Working Paper 31/2008. Tegemeo Institute of Agricultural Policy and Development. Nairobi: Egerton University] show a yield of 0.506 tons/acre in Western Lowlands in 2007 (that is, 1.25 tons/hectare). In 2004 it was 0.231 tons/acre.
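
The unit conversion in this note can be checked with a short sketch. The conversion factor (1 hectare = 2.47105 acres) is a standard value supplied here, not a figure from the source, and the function name is illustrative.

```python
# Standard conversion factor, supplied as an assumption: 1 hectare = 2.47105 acres.
ACRES_PER_HECTARE = 2.47105


def tons_per_acre_to_tons_per_hectare(yield_per_acre: float) -> float:
    """Convert a crop yield from tons/acre to tons/hectare."""
    return yield_per_acre * ACRES_PER_HECTARE


# Yields reported in the note for Western Lowlands.
yield_2007 = tons_per_acre_to_tons_per_hectare(0.506)
yield_2004 = tons_per_acre_to_tons_per_hectare(0.231)
print(f"2007: {yield_2007:.2f} t/ha, 2004: {yield_2004:.2f} t/ha")
```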

14. The child mortality figures are for the five-year period preceding each survey.

15. We define ‘substantially less improvement’ as an absolute difference of more than five percentage points between the estimates by the two methods. Because MVP (2010c) does not include standard errors for the indicator estimates, we are unable to estimate confidence intervals for either the before-versus-after or the differences-in-differences estimates. As a result, we cannot determine which estimates are statistically significant. Also note that ‘improvement’ in the indicator means a positive change, except for stunting, for which a decline is improvement.
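
A minimal sketch of how the two estimates and the five-percentage-point rule in this note relate to each other. All indicator values below are hypothetical and chosen only for illustration; the function names are not from the source.

```python
def improvement(before: float, after: float, higher_is_better: bool = True) -> float:
    """Change in an indicator, signed so that a positive value means improvement."""
    change = after - before
    return change if higher_is_better else -change


def compare_methods(site_before, site_after, region_before, region_after,
                    higher_is_better=True, threshold_pp=5.0):
    """Return (before-versus-after, differences-in-differences, flag).

    The flag is True when the differences-in-differences estimate shows more
    than `threshold_pp` percentage points less improvement than the simple
    before-versus-after estimate at the intervention site.
    """
    before_after = improvement(site_before, site_after, higher_is_better)
    regional_trend = improvement(region_before, region_after, higher_is_better)
    diff_in_diff = before_after - regional_trend
    substantially_less = (before_after - diff_in_diff) > threshold_pp
    return before_after, diff_in_diff, substantially_less


# Hypothetical coverage indicator (per cent of households): higher is better.
coverage = compare_methods(10.0, 40.0, 15.0, 35.0)

# Hypothetical stunting rate (per cent of children): a decline is improvement.
stunting = compare_methods(50.0, 40.0, 45.0, 38.0, higher_is_better=False)

print(coverage)
print(stunting)
```

In both hypothetical cases the regional trend accounts for more than five percentage points of the before-versus-after change, so the flag is set.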

16. For example, Sanchez et al. (2007) [Sanchez, P., 2007. The African Millennium Villages. Proceedings of the National Academy of Sciences, 104(43): 16775–16780] note that a health clinic in Sauri receives some patients who are non-residents.

17. Sauri comprises 1.32 per cent of the rural population of Nyanza Province, Kenya. Bonsaaso comprises 1.59 per cent of the rural population of Ashanti Region, Ghana. Pampaida comprises 0.02 per cent of the rural population of the Northwest Region, Nigeria. This calculation assumes: 65,000 residents of Sauri, 35,000 in Bonsaaso, and 6,000 in Pampaida, as reported by the MVP (2010c); each national government's census-based regional population estimates (5,442,711 for Nyanza Province, Kenya in 2009; 4,589,377 for Ashanti Region, Ghana in 2008; and 35,786,944 for Northwest Region, Nigeria in 2006); and the percentage of people defined as living in ‘rural’ areas weighted by sampling weight in the most recent DHS survey data (90.2 per cent for Nyanza Province, Kenya in 2008; 48.0 per cent for Ashanti Region, Ghana in 2008; and 79.5 per cent for Northwest Region, Nigeria in 2008).
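
The population shares in this note can be reproduced from the figures it lists. The sketch below simply divides each site's population by the rural population of its region (regional population times rural share); the variable and function names are illustrative.

```python
# (site population, regional population, rural share of region), as given in the note.
sites = {
    "Sauri": (65_000, 5_442_711, 0.902),      # Nyanza Province, Kenya
    "Bonsaaso": (35_000, 4_589_377, 0.480),   # Ashanti Region, Ghana
    "Pampaida": (6_000, 35_786_944, 0.795),   # Northwest Region, Nigeria
}


def share_of_rural_population(site_pop: int, region_pop: int, rural_share: float) -> float:
    """Site population as a percentage of the region's rural population."""
    return 100 * site_pop / (region_pop * rural_share)


shares = {name: share_of_rural_population(*values) for name, values in sites.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f} per cent")
```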

18. Millennium interventions began in Sauri, Kenya in late 2004. The study protocol document was registered with the ClinicalTrials.gov registry site in May 2010.

19. What we describe here is the section of the protocol related to evaluation of programme effects. Separate sections of the protocol not discussed here describe a process evaluation and measurement of the programme's costs.

20. The protocol lists 10 sites for the evaluation and excludes the sites established in 2005: Sauri, Kenya and Koraro, Ethiopia. Sauri is, however, included in the mid-term evaluation report, along with four of the sites listed in the protocol. According to the calendar in both the protocol and the evaluation report, interim assessments for four other sites should have been completed by the first half of 2010. Specifically, the report includes the following indicators not mentioned in the protocol: malaria prevalence among all individuals, maize yields, percentage of children in primary school receiving school meals, primary gross attendance ratio, rates of HIV testing among men and women 15–49, and mobile phone ownership. The report excludes the following indicators specified in the protocol: wasted and underweight nutrition measures, duration of breastfeeding, age at introduction of complementary feeding, proportion of children under five with diarrhoea in past two weeks, proportion of children under five with diarrhoea in past two weeks who received oral rehydration solution, proportion of children under five treated for pneumonia, prevalence of malaria among children under five, proportion of children under five with fever in the past two weeks who receive appropriate anti-malarial treatment, proportion of pregnant women who received an HIV test, proportion of newborns receiving a postnatal check in the first week of life, survival rate to last grade of primary education, an asset-based wealth index, and the proportion of households reporting not enough food for one of the past 12 months. It is unclear why some data are not given in the report for some village sites; for example, why no data on access to improved sanitation are given for Sauri.

21. The protocol states that the overall project evaluation will adhere to Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) guidelines, which are detailed in Des Jarlais et al. (2004) [Des Jarlais, D.C., Lyles, C. and Crepaz, N., 2004. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. American journal of public health, 94(3): 361]. Among the information that should be reported according to the TREND guidelines are ‘data on study group equivalence at baseline and statistical methods used to control for baseline differences’. But the MVP protocol does not provide for the collection of baseline data on comparison villages.

22. In describing the details of how outcomes at the MV sites will be compared with those from the comparison villages, the protocol says that ‘[f]or variables where no data on baseline status exist, rural sub-national data will be imputed’. It is not clear how this imputation will work, and imputed baseline figures cannot substitute for true baseline data.

23. Comparing 300 treated households at one site that are very similar to each other against 300 non-treated households at a comparison site that are very similar to each other is closer to being a study in which n = 2 than one in which n = 600. A measurement with very low n has a very large statistical confidence interval.
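
One way to make the n = 2 versus n = 600 point concrete is the standard design-effect approximation for clustered samples, in which the effective sample size is the total sample divided by 1 + (m - 1) × ICC, where m is the number of households per cluster and ICC is the intracluster correlation. This is a sketch under that assumption, not a calculation from the paper, and the ICC values are hypothetical.

```python
def effective_sample_size(n_clusters: int, households_per_cluster: int, icc: float) -> float:
    """Effective sample size under the standard design-effect approximation."""
    total = n_clusters * households_per_cluster
    design_effect = 1 + (households_per_cluster - 1) * icc
    return total / design_effect


# Two sites (one treated, one comparison) with 300 households each.
for icc in (0.0, 0.1, 0.5, 1.0):  # hypothetical intracluster correlations
    n_eff = effective_sample_size(2, 300, icc)
    print(f"ICC = {icc:.1f}: effective n = {n_eff:.1f}")
```

When households within a site are nearly identical (ICC close to one), the effective sample size approaches the number of sites rather than the number of households.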

24. Here, ‘reliably detectable’ means that with a probability of 80 per cent, the difference will be detectable at the 5 per cent level of statistical significance.

25. MDG number four calls for a drop of two-thirds in child mortality between 1990 and 2015. This equals an annual decline of $1 - (1 - 2/3)^{1/25} \approx 0.043$, or 4.3 per cent per year. A drop of 40 per cent in five years, the minimum change reliably detectable by the sample size in the current MV evaluation protocol, equals an annual decline of $1 - (1 - 0.4)^{1/5} \approx 0.097$, or 9.7 per cent per year.
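
The two annualised rates in this note follow from compounding a constant yearly decline. A short sketch of the arithmetic (the function name is illustrative):

```python
def annual_rate_of_decline(total_decline: float, years: int) -> float:
    """Constant yearly decline that compounds to `total_decline` over `years`."""
    return 1 - (1 - total_decline) ** (1 / years)


# Two-thirds decline over the 25 years from 1990 to 2015.
mdg_rate = annual_rate_of_decline(2 / 3, 25)

# 40 per cent decline over five years (minimum change detectable under the protocol).
detectable_rate = annual_rate_of_decline(0.4, 5)

print(f"MDG target: {100 * mdg_rate:.1f} per cent per year")
print(f"Minimum detectable: {100 * detectable_rate:.1f} per cent per year")
```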

26. Esther Duflo is a professor of economics at the Massachusetts Institute of Technology. Parker (2010) [Parker, I., 2010. The poverty lab: transforming development economics, one experiment at a time. New Yorker, 17 May: 79–89] paraphrases an email sent by Duflo to MVP creator and Columbia University professor Jeffrey Sachs in 2009 stating that, in her opinion, it was ‘too late’ to use rigorous evaluation methods on the existing programme, although ‘the methods could be used in any later expansion’.

27. Pair-matched randomisation may not be the most efficient impact evaluation design in this setting for a fixed evaluation budget; more statistical power might be obtained from having matched triplets or matched quadruplets, of which only one receives treatment. We suggest pair-matched randomisation in this setting because it is efficient given a fixed number of treatment sites, a condition more relevant to this circumstance than a fixed evaluation budget.

28. This could be done using widely-available rainfall data plus rich information contained in the most recent census. Recent censuses are available for many countries, including Ethiopia (1994, 2007), Ghana (2000), Kenya (1999, 2009), Malawi (1998, 2008), Mali (1998, 2008), Nigeria (1991, 2006), Rwanda (2002), Senegal (2002), Tanzania (2002), and Uganda (2002). Access policies for the census microdata vary by country. For many countries that do not make their microdata publicly available, given the extremely high profile of the MVP, we believe that it is likely that access to the data sufficient to conduct the procedure we describe could be negotiated.

29. All data from the KHDS are publicly available online (http://www.edi-africa.com/research/khds/introduction.htm). We use baseline data from 1991 and follow-up data from 1994 on each household. This dataset consists of a total of 578 households across 51 clusters in Kagera. We then take the difference in the outcome variable for each household from wave one to wave four. The household-level outcomes chosen are primary school completion rate among children in each household and whether the household had an improved source of drinking water, both of which are dummy variables. A household was given a value of one if any child in the household had completed primary school and zero otherwise. A household received a value of zero if their source of drinking water was the lake and one otherwise. Outcomes were chosen based on: similarity to outcomes of interest in the MVP evaluation; a mean value across households that was not close to zero or one; and ease of calculation. We generate matched pairs of village clusters by the method of Bruhn and McKenzie (2009) [Bruhn, M. and McKenzie, D., 2009. In pursuit of balance: randomization in practice in development field experiments. American economic journal: applied economics, 1(4): 200–232]. We match on six cluster-level characteristics: number of households in each cluster, major economic activity, major religion, major ethnicity, distance to the nearest paved road, and electrification of villages. These characteristics are chosen to be similar to the matching criteria sketched in an annex to the MVP evaluation protocol. Finally, the residuals from a regression of the change in outcome 1991–1994 on cluster-pair fixed effects are used to calculate the minimum detectable effect for a given sample size of paired clusters using the formula of Duflo et al. (2008) [Duflo, E., Glennerster, R. and Kremer, M., 2008. Using randomization in development economics research: a toolkit. In: Schultz, T.P. and Strauss, J., eds. Handbook of development economics, Vol. 4, 3895–3962. Amsterdam: Elsevier]. Its inputs are the number of clusters J, the proportion of clusters that are treated P, the significance level α, the desired power κ, the variance across clusters, and the variance across households in each cluster. We assume that one cluster in each pair is treated and the other is control (P = 0.5), there are 100 households per cluster (n = 100), the significance level is 5 per cent, and the probability that the minimum detectable effect can be detected at this significance level is 90 per cent.
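
A minimal sketch of a minimum detectable effect calculation of the kind this note describes. It assumes the standard clustered-design expression associated with Duflo et al. (2008), MDE = (z_{1-α/2} + z_κ) × sqrt(1 / (P(1 - P)J)) × sqrt(τ² + σ²/n), where τ² and σ² are labels chosen here for the variance across clusters and the variance across households within a cluster. It also replaces the t distribution with a normal approximation so that only the Python standard library is needed. The variance values are hypothetical and are not the KHDS estimates.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(J: int, n: int, tau2: float, sigma2: float,
                              P: float = 0.5, alpha: float = 0.05,
                              power: float = 0.90) -> float:
    """Minimum detectable effect for a cluster-level design (normal approximation).

    J      number of clusters
    n      households per cluster
    tau2   variance across clusters (hypothetical label)
    sigma2 variance across households within a cluster (hypothetical label)
    P      proportion of clusters treated
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sqrt(1 / (P * (1 - P) * J)) * sqrt(tau2 + sigma2 / n)


# Hypothetical variance components for a binary outcome measured in changes.
tau2, sigma2 = 0.01, 0.20

for pairs in (10, 20, 40):
    J = 2 * pairs  # one treated and one control cluster per pair
    mde = minimum_detectable_effect(J, n=100, tau2=tau2, sigma2=sigma2)
    print(f"{pairs} pairs: MDE = {mde:.3f}")
```

Under this expression the detectable effect shrinks with the square root of the number of clusters, so quadrupling the number of pairs halves it, while adding households within a cluster only reduces the σ²/n term.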

30. Radelet (2010) [Radelet, S., 2010. Emerging Africa: how 17 countries are leading the way. Washington, DC: Center for Global Development] argues that five fundamental changes have driven the broader turnaround in the ‘emerging African countries’ that he profiles: more democratic and accountable governments; more sensible economic policies; the end of the debt crisis and major changes in relationships with the international community; new technologies that are creating new opportunities for business and political accountability; and a new generation of policy-makers, activists, and business leaders.
