Integrated evaluation of payments for ecosystem services programs in China: a systematic review

ABSTRACT Introduction: During the past two decades, payments for ecosystem services (PES) programs have become a popular conservation paradigm for realigning socioeconomic costs and benefits among different stakeholders. As billions in investment flow into the natural capital pool, there is growing interest in understanding the ecological, economic, and social outcomes of PES programs. China is one of the countries that implement PES programs extensively. Although there is growing interest in evaluating the impact of China's massive PES programs, it is unclear what the existing literature has done, has not done, and should do in the future. Therefore, to guide further research and practice, we conduct a systematic review of studies on China's PES programs. Results: Our review shows that the number of impact evaluation studies of PES programs in China is growing. However, the spatial and ecosystem distributions of existing studies are quite uneven. Most case studies were poorly designed, rarely quantified, and evaluated without sophisticated methods. Among the three dimensions of ecological effectiveness, economic efficiency, and social equity, economic efficiency is the least studied. Discussion and Conclusion: We further discuss the challenges and opportunities and provide insights for future research. To improve the understanding and management of natural capital, we call for mainstreaming impact evaluation of ecosystem service policies in China and beyond, following state-of-the-art procedures.


Introduction
During the past two decades, payments for ecosystem services (PES) programs have become a popular conservation paradigm for realigning socioeconomic costs and benefits among different stakeholders. Globally, hundreds of PES programs have been designed and implemented at local, regional, and national scales (Farley and Costanza 2010; Yang et al. 2013a). As billions in investment flow into the natural capital pool, there is growing interest in understanding the ecological, economic, and social outcomes of PES programs (Ferraro and Pattanayak 2006; Baylis et al. 2016; …). The existing literature contains scattered PES case studies that evaluate these outcomes at various scales. For instance, in an early study in Costa Rica, Sierra and Russman (2006) found that a PES program had limited immediate effects on forest ecosystem services and speculated that there might be a time lag before PES programs take effect. Later, in a rigorous national impact evaluation using long-term geographic and socioeconomic data, Ferraro and Hanauer (2014) found that tourism explained two-thirds of the poverty reduction associated with the establishment of Costa Rican protected areas. Although the protected areas did reduce deforestation and promote recovery, the associated land-cover change did not have a statistically significant effect on poverty. Wunscher, Engel, and Wunder (2008) criticized the lack of spatial differentiation in PES targeting and proposed an inverse auction approach to site selection to improve the economic efficiency of PES programs. According to their empirical analyses in Costa Rica, the best targeting scenario could double economic efficiency. In addition, Gross-Camp et al. (2012) focused on the legitimacy, fairness, and equity of PES programs. Their case study in Nyungwe National Park in Rwanda suggested that the ecological effectiveness of the studied PES program depended on the equitable distribution of payments, the legitimacy and fairness of institutions, participants' belief in and acceptance of the paid ecosystem services, and the complementarity of PES with conventional enforcement methods.
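To make the targeting idea concrete, the sketch below contrasts a uniform flat payment (sites enroll in arrival order if the payment covers their opportunity cost) with a simple inverse (procurement) auction, in which landholders bid their opportunity cost and the agency funds the sites with the highest ecological benefit per unit bid until the budget runs out. This is a minimal illustration under stated assumptions: the `Site` type, the benefit scores, the bids, and the greedy benefit-per-cost ranking are all hypothetical, and this is not the scoring model that Wunscher, Engel, and Wunder (2008) actually used.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    benefit: float  # hypothetical ecosystem-service score if the site is enrolled
    bid: float      # landholder's bid, assumed equal to their opportunity cost


def uniform_enrollment(sites, payment, budget):
    """Flat payment: sites are considered in list order and join only if the
    payment covers their bid. Every enrollee is paid the full flat rate, so
    low-cost sites are overpaid."""
    chosen, spent = [], 0.0
    for s in sites:
        if s.bid <= payment and spent + payment <= budget:
            chosen.append(s)
            spent += payment
    return chosen, spent


def inverse_auction(sites, budget):
    """Rank sites by benefit per unit bid, pay each winner its own bid
    (discriminatory pricing), and keep adding sites while the budget allows."""
    ranked = sorted(sites, key=lambda s: s.benefit / s.bid, reverse=True)
    chosen, spent = [], 0.0
    for s in ranked:
        if spent + s.bid <= budget:
            chosen.append(s)
            spent += s.bid
    return chosen, spent


def total_benefit(chosen):
    return sum(s.benefit for s in chosen)


# Hypothetical candidate sites.
sites = [
    Site("A", benefit=2.0, bid=80.0),
    Site("B", benefit=9.0, bid=60.0),
    Site("C", benefit=1.0, bid=20.0),
    Site("D", benefit=7.0, bid=90.0),
    Site("E", benefit=3.0, bid=30.0),
]

u_chosen, u_spent = uniform_enrollment(sites, payment=100.0, budget=200.0)
a_chosen, a_spent = inverse_auction(sites, budget=200.0)
print([s.name for s in u_chosen], total_benefit(u_chosen), u_spent)  # ['A', 'B'] 11.0 200.0
print([s.name for s in a_chosen], total_benefit(a_chosen), a_spent)  # ['B', 'E', 'D', 'C'] 20.0 200.0
```

With the same budget, the auction buys almost twice the benefit in this toy case, because it avoids paying low-cost sites a flat rate far above their opportunity cost and prioritizes high-benefit sites. Real auctions also have to contend with strategic bidding and transaction costs, which this sketch ignores.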
China is one of the countries that implement PES programs extensively. In response to the severe drought in 1997 and the devastating Yangtze River flood in 1998, China initiated two of the world's largest PES programs: the Grain-to-Green Program (GTGP, also known as the Sloping Land Conversion Program) and the Natural Forest Conservation Program (NFCP). To motivate conservation behavior, the GTGP provides payments to local households, and the NFCP provides payments to county governments or state-owned enterprises. The main goal of the GTGP was to reduce soil erosion by converting sloping cropland to forest and grassland. By 2014, 28.6 million ha of cropland involving more than 120 million farmers had been enrolled in the GTGP (SFA 2014). The main aim of the NFCP was to protect and restore natural forests through logging bans and afforestation. By 2014, the NFCP had afforested 15.5 million ha through aerial seeding, artificial planting, and mountain closure (SFA 2014). Besides the GTGP and NFCP, many other PES programs have been implemented in China across scales. This massive implementation has drawn increasing attention to the programs' ecological, economic, and social performance. For example, Uchida, Xu, and Rozelle (2005) conducted a cost-effectiveness analysis in Ningxia and Guizhou Provinces. Their results suggested that the cost-effectiveness of the GTGP could be moderately improved by replacing the uniform payment rate with a flexible rate based on the actual opportunity costs and ecological benefits of each plot. Zheng et al. (2013) quantified the costs and benefits for both service providers and beneficiaries in the Paddy Land to Dry Land program. Their results showed that the program achieved an unusual win-win outcome: a 5% short-term increase in water provision downstream and a 50% increase in household income upstream. Li et al. (2011) found that the GTGP in Zhouzhi County, Shaanxi Province, significantly increased household income and reduced income inequality among participating households relative to non-participating ones. Viña et al. (2013) suggested that the economic efficiency of the GTGP in Baoxing County, Sichuan Province, could have doubled had a targeting approach been used for cropland selection. Their results showed an overall trend of ecosystem service recovery resulting from natural capital investment through a series of conservation policies, including the NFCP and GTGP.
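Comparisons of income inequality between participants and non-participants, such as the one reported by Li et al. (2011), need an inequality index. The text does not say which index those authors used. One common choice is the Gini coefficient, sketched below with the sorted-rank formula. The household incomes are hypothetical and are assumed to be non-negative.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes via the sorted-rank formula:
    G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    0 means perfect equality; (n - 1) / n is the maximum for n households."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one household with positive income")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical household incomes (thousand yuan) for two groups of equal size.
participants = [3, 4, 5, 6]
non_participants = [1, 2, 5, 10]
print(round(gini(participants), 3), round(gini(non_participants), 3))
```

A lower coefficient for participants would be consistent with the program reducing inequality. However, a simple comparison like this says nothing about causation unless the two groups are otherwise comparable.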
Although there is growing interest in evaluating the impact of China's massive PES programs, it is unclear what the existing literature has done, has not done, and should do in the future. In particular, there has long been concern about whether PES programs are sustainable in the long run. Some scholars have speculated that the massive implementation of PES programs might not achieve its claimed ecological outcomes (Cao 2008; …). In our view, to be sustainable in the long run, a PES program needs to satisfy all three criteria: it must be ecologically effective, economically efficient, and socially equitable. In addition, leading impact evaluation scholars have recently noted that many existing impact evaluation studies do not meet basic standards of study design and evaluation method, which substantially undermines the credibility of their results (Baylis et al. 2016). Therefore, to guide further research and practice, we conduct a systematic review of studies on China's PES programs. We attempt to cover case studies investigating any of three dimensions of outcomes: ecological effectiveness, economic efficiency, and social equity (3E). Our objectives are (1) to display the temporal trend and spatial distribution of the existing literature on the impact evaluation of ecosystem service policies in China; (2) to synthesize the study designs, evaluation methods, and ecological, economic, and social outcomes of existing case studies; and (3) to highlight important challenges and gaps in the integrated monitoring and evaluation of ecosystem service policies in China and beyond.

Materials and methods
We conducted a comprehensive search of the Web of Science and China National Knowledge Infrastructure (CNKI) databases (Figure 1). In the Web of Science database, our search string was TS = ((compensation* OR polic* OR payment*) AND ("ecosystem service*" OR "ecological service*" OR "environmental service*") AND (China)), and the indexes searched were SCI-EXPANDED, SSCI, and A&HCI. CNKI is China's largest digital library and holds most Chinese academic publications. To collect relevant articles comprehensively, we conducted two retrievals in CNKI in Chinese using different search strategies: one was Full text = ("payments for ecosystem services" and "case study"), and the other was Themes = ("payments for ecosystem services" and "case study") or Themes = ("payments for ecosystem services" and "case"). In both databases, the document type was restricted to "Article," and the search cut-off was December 2016. These three searches retrieved 384, 5739, and 1176 articles, respectively.
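The Web of Science topic query is an AND of three OR-groups, and some terms use right-truncation wildcards (`polic*` matches "policy" and "policies"). The sketch below approximates that logic so that exported titles and abstracts can be screened locally. The function names are hypothetical, and the matcher is a simplification: it ignores engine-specific behavior such as stemming or lemmatization and field scoping, so it will not reproduce Web of Science hit counts exactly.

```python
import re


def term_pattern(term):
    """Translate a search term (optionally a quoted phrase) whose words may end
    in a '*' wildcard into a case-insensitive, word-bounded regex."""
    words = term.strip('"').split()
    parts = []
    for w in words:
        if w.endswith("*"):
            parts.append(re.escape(w[:-1]) + r"\w*")  # '*' = any word characters
        else:
            parts.append(re.escape(w))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


# The three OR-groups of the TS query, joined by AND.
GROUPS = [
    ["compensation*", "polic*", "payment*"],
    ['"ecosystem service*"', '"ecological service*"', '"environmental service*"'],
    ["China"],
]


def matches_query(text, groups=GROUPS):
    """True if the text contains at least one term from every OR-group."""
    return all(any(term_pattern(t).search(text) for t in group) for group in groups)


print(matches_query("Payments for ecosystem services in China: the Grain-to-Green Program"))
print(matches_query("Ecosystem services of urban parks in China"))
```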
However, 98% of the retrieved articles, especially in the Chinese literature, were descriptive articles on lessons and experience from PES program implementation or on how to design and improve PES projects, rather than impact evaluations of PES programs, whereas our review aimed to collect case studies that empirically evaluate at least one dimension of the 3E. In addition, because some Chinese journals with English abstracts are also indexed by Web of Science, some articles appeared in two or three searches. In the end, we retained 139 articles for in-depth analysis. Appendix A lists the full citations of these 139 articles.
For the 139 retained articles, we recorded 22 items of information, including study year and province, ecosystem type, study design, unit of analysis, assessment methods, whether the assessment was qualitative or quantitative, ecological, economic, and social outcomes, trade-offs and synergies between different ecosystem services, and spillover effects or externalities. Although 34 of the 139 articles offered only textual descriptions without any data analysis, we included them in the display of temporal and spatial patterns; for comparative analyses of quantitative impact evaluation outcomes, however, we used only the remaining 105 cases.
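The screening arithmetic can be checked directly from the reported counts. Note that the retained share below is computed against the raw total of the three searches, which still contains articles retrieved by more than one search. The roughly 98% excluded share therefore bundles descriptive articles together with duplicates.

```python
# Counts reported in the Materials and methods section.
retrieved = [384, 5739, 1176]  # the three searches, in the order reported
retained = 139                  # after screening and de-duplication
text_only = 34                  # retained articles without any data analysis

total = sum(retrieved)
quantitative = retained - text_only
retained_share = retained / total

print(total)                        # 7299
print(quantitative)                 # 105
print(f"{retained_share:.1%}")      # 1.9%
print(f"{1 - retained_share:.1%}")  # 98.1%
```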

Results
Temporal trend and spatial distribution
Figure 2 shows that, overall, case studies on PES program assessment are increasing over time, although the number fluctuates from year to year. Under our selection criteria, the earliest quantitative case was a forest recovery program in 2004 (Zhi et al. 2004). Forest programs dominated the literature pool (62%), followed by wetland (19%), grassland (17%), and farmland (2%) programs. Among the forest programs, apart from a few cases such as the ecological forest program in Beijing (Mi et al. 2007), the new soil and water conservation project in Fujian (Cao et al. 2009), and the ecological forest program in Zhejiang (Zhou and Sheng 2008), 84.4% of cases concerned the GTGP and NFCP. Only three farmland PES cases were recorded, two of which were qualitative. Deng, Xiao, and Yan (2015) conducted a quantitative study of the performance of PES for green agricultural production. Because this case also involved a lake restoration project, we counted it as a wetland PES case as well. In the later analysis, however, we counted it only as a wetland case, since it was the only quantitative case that mentioned farmland. Figure 3 maps the spatial distribution of both quantitative and qualitative case studies. The provinces with the largest number of studies were Shaanxi, Sichuan, Inner Mongolia, Yunnan, and Gansu. Apart from Inner Mongolia, which is known for its grassland ecosystems, all of these provinces are biodiversity hotspots. In particular, Shaanxi, Sichuan, and Gansu are the only provinces with wild giant panda populations. In contrast, wetland cases were scattered across provinces, such as the watershed PES programs between Zhejiang and Anhui (Ma and Du 2015), as well as …
Figure 1. Procedures for literature selection. ①: Full text = ("payments for ecosystem services" and "case study"); ②: Themes = ("payments for ecosystem services" and "case study") or Themes = ("payments for ecosystem services" and "case"); ③: TS = ((compensation* OR polic* OR payment*) AND ("ecosystem service*" OR "ecological service*" OR "environmental service*") AND (China)), with the indexes SCI-EXPANDED, SSCI, and A&HCI. In both databases, the document type was "Article," and the search cut-off was December 2016. ④: Case studies must empirically evaluate at least one dimension of ecological effectiveness, economic efficiency, and social equity.

Study design and quantification level of existing case studies
According to our statistics, none of the existing case studies implemented an experimental design, and only 27 cases adopted a quasi-experimental design. The remaining 78 cases used a nonexperimental design, in which the policy effect was measured as the arithmetic difference in outcome indicators between post- and pre-implementation, without considering a control group, confounding factors, or other control variables. Figure 4 shows the quantification level of the 105 sorted case studies. Our results show that 27 cases mentioned all three aspects of the 3E, but only 16 of them quantified all three. Thirty-one cases mentioned two aspects, but only 10 of them quantified both. The remaining 38 cases mentioned and quantified one aspect (Figure 4(a)). Sorted by aspect, 61, 28, and 55 cases quantified effectiveness, efficiency, and equity, respectively (Figure 4(b)). However, only 22 of the 61 quantitative cases on effectiveness, 9 of the 28 on efficiency, and 24 of the 55 on equity conducted rigorous statistical analyses.
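Turning these counts into shares makes the design gap explicit. The sketch uses only the numbers reported above.

```python
# Counts reported above for the 105 quantitative case studies.
design = {"experimental": 0, "quasi-experimental": 27, "nonexperimental": 78}
quantified = {"effectiveness": 61, "efficiency": 28, "equity": 55}
rigorous = {"effectiveness": 22, "efficiency": 9, "equity": 24}

n_cases = sum(design.values())  # 105: the three designs partition the cases

# Share of cases using each study design.
design_share = {name: k / n_cases for name, k in design.items()}

# Share of quantified cases in each dimension that used rigorous statistics.
rigor_share = {dim: rigorous[dim] / quantified[dim] for dim in quantified}

for name, share in design_share.items():
    print(f"{name}: {share:.1%}")
for dim, share in rigor_share.items():
    print(f"{dim}: {rigorous[dim]}/{quantified[dim]} = {share:.1%}")
```

The nonexperimental share, 78/105 (about 74.3%), matches the 74% cited in the Discussion. The rigor shares range from about 32% (efficiency) to about 44% (equity).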
Evaluation results of 3E
Figure 5 shows the evaluation results for each aspect of the 3E. For effectiveness, 78.8%, 11.8%, and 9.4% of cases reported positive, mixed, and negative outcomes, respectively. For efficiency, 54.5%, 27.3%, and 18.2% of cases reported high efficiency, low efficiency, and inefficiency, respectively. For equity, outcomes were more evenly distributed: 35.4% positive, 41.5% mixed, and 23.2% negative. Figure 5(b) breaks the results down by ecosystem type and aspect of the 3E. Wetland PES programs reported the highest effectiveness, followed by forest and grassland PES programs. Unlike wetland PES programs, forest and grassland PES programs still have large potential for efficiency improvement. Equity appears to be the biggest challenge, as none of the three main ecosystem types showed a high percentage of positive outcomes.
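The efficiency categories are defined in the Figure 5 caption. High efficiency means that the rate of return on investment exceeds one (or that authors claimed positive economic impacts), and low efficiency means a rate between zero and one (or that authors suggested more efficient measures). Inefficiency means a negative ecological outcome combined with economic inefficiency. The sketch below encodes only the quantitative, rate-of-return branch of that rule. Several details are assumptions rather than part of the caption: treating a rate of exactly one as low efficiency, reading "economically inefficient" as a non-positive return, and returning `None` for cases the rule does not cover.

```python
from typing import Optional


def classify_efficiency(rate_of_return: float, ecological_outcome_negative: bool) -> Optional[str]:
    """Map a reported rate of return on investment to an efficiency category.

    Simplified reading of the Figure 5 criteria (assumption):
      - negative ecological outcome and return <= 0 -> "inefficient"
      - return > 1                                   -> "high efficiency"
      - 0 <= return <= 1                             -> "low efficiency"
    Any other combination returns None (not classifiable by this rule).
    """
    if ecological_outcome_negative:
        return "inefficient" if rate_of_return <= 0 else None
    if rate_of_return > 1:
        return "high efficiency"
    if rate_of_return >= 0:
        return "low efficiency"
    return None


print(classify_efficiency(1.5, ecological_outcome_negative=False))
print(classify_efficiency(0.4, ecological_outcome_negative=False))
print(classify_efficiency(-0.2, ecological_outcome_negative=True))
```

Making a rule like this explicit in a coding protocol helps reviewers classify cases consistently. Qualitative claims by the original authors still require manual coding.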

Discussion
Our review shows that there is growing interest in assessing the outcomes of PES programs in China. However, the spatial and ecosystem distributions of existing studies are quite uneven. Many case studies were poorly designed, rarely quantified, and evaluated without sophisticated methods. In addition, among the three dimensions of the 3E, economic efficiency is the least studied. Here, we discuss the challenges and opportunities and provide insights for future research.
First, there is a spatial mismatch between the distribution of key ecosystem services in China and that of existing PES studies (Figure 3). Existing studies concentrate on some of the biodiversity hotspots, especially giant panda reserves. This is because protected areas in China (also called nature reserves) were initially established to protect endangered species, particularly the iconic giant panda (Loucks et al. 2001). Since the late 1990s, decision makers have instinctively designated these nature reserves as priority areas for implementing ecosystem service policies (e.g., NFCP, GTGP). However, the latest national assessment shows that China's nature reserve network has relatively low coverage of both biodiversity and ecosystem services (Xu et al. 2017). The network encloses 15.1% of China's land surface and protects 17.9% of the habitat for threatened mammals and 16.4% for threatened birds, but only 13.1% for threatened plants, 10.0% for threatened amphibians, and 8.5% for threatened reptiles. Moreover, it encompasses only 10.2-12.5% of key regulating services such as water retention, soil retention, sandstorm prevention, and carbon sequestration (Xu et al. 2017). Therefore, future ecosystem service policies and research need to pay more attention to understudied places with important ecosystem services, such as Heilongjiang and Jilin Provinces in the northeast, Xinjiang and Qinghai Provinces in the northwest, and Fujian and Guangdong Provinces in the south (Figure 3). In addition, while 69% of existing studies concentrated on forest ecosystems, future research should give more attention to grassland and wetland ecosystems. In particular, there are very few case studies on coastal and marine ecosystems.
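One simple way to read these coverage figures is to divide each coverage percentage by the 15.1% of land the reserves occupy. A ratio above 1 means that a feature is over-represented relative to area, and a ratio below 1 means that it is under-represented. This representation ratio is an illustrative metric and is not one reported by Xu et al. (2017). The input percentages are those quoted above.

```python
LAND_SHARE = 15.1  # % of China's land surface inside nature reserves

# Coverage percentages quoted from Xu et al. (2017).
habitat_coverage = {
    "threatened mammals": 17.9,
    "threatened birds": 16.4,
    "threatened plants": 13.1,
    "threatened amphibians": 10.0,
    "threatened reptiles": 8.5,
}
regulating_services_range = (10.2, 12.5)


def representation_ratio(coverage_pct, land_pct=LAND_SHARE):
    """Coverage share divided by land share (> 1 over-, < 1 under-represented)."""
    return coverage_pct / land_pct


for group, pct in habitat_coverage.items():
    print(f"{group}: {representation_ratio(pct):.2f}")
low, high = (representation_ratio(p) for p in regulating_services_range)
print(f"key regulating services: {low:.2f}-{high:.2f}")
```

Under this reading, only mammal and bird habitats are covered more than proportionally. Key regulating services reach only about two-thirds to four-fifths of proportional coverage, which supports the call to look beyond the current reserve-centred study sites.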
Second, it is a promising sign that policy makers and researchers are increasingly interested in impact evaluation of China's ecosystem service policies; however, there is still a long way to go in establishing and adopting state-of-the-art evaluation designs and sophisticated methods and in quantifying causal mechanisms. Most (74%) of the reviewed cases did not even meet the basic standards for an impact evaluation, including consideration of control groups, pre- and postconditions, confounding factors, and rival hypotheses (Baylis et al. 2016). Moreover, none of the analyzed studies quantified spillover effects, even though the benefits of many ecosystem services (e.g., water regulation, air purification) extend beyond the corresponding policy boundaries (Liu, Yang, and Li 2016). Therefore, to rigorously assess the environmental, economic, and social outcomes of PES policies and to guide future decision-making scientifically, it is crucial to take a holistic approach that accounts for all three dimensions of the 3E, promotes experimental or quasi-experimental designs, includes control groups, incorporates confounding factors, considers spillover effects, and systematically rules out rival hypotheses (Nolte et al. 2013; Ferraro and Hanauer 2014).
Figure 4. 3E, 2E, and 1E refer to three, two, and one dimension(s) of ecological effectiveness, economic efficiency, and social equity, respectively. NA (not applicable) denotes that the corresponding evaluation dimension was not mentioned in the article.
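Why a control group matters can be seen from a minimal comparison. A nonexperimental before-after estimate uses the enrolled units alone. A difference-in-differences (DiD) estimate, one common quasi-experimental design, subtracts the change observed in comparable non-enrolled units. The forest-cover figures below are hypothetical. DiD removes only a trend shared by both groups and is valid only under the parallel-trends assumption; it does not, by itself, handle other confounders or spillovers into the control units.

```python
from statistics import mean


def before_after(treated_pre, treated_post):
    """Nonexperimental estimate: post minus pre for enrolled units only."""
    return mean(treated_post) - mean(treated_pre)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change among enrolled units minus change among non-enrolled units,
    which nets out any trend common to both groups."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical forest-cover shares (%) in enrolled and non-enrolled villages.
treated_pre, treated_post = [30, 34, 32], [42, 46, 44]
control_pre, control_post = [31, 33, 35], [37, 39, 41]

print(before_after(treated_pre, treated_post))                            # 12
print(diff_in_diff(treated_pre, treated_post, control_pre, control_post))  # 6
```

Here, half of the naive 12-point gain would have occurred anyway, for example through regional recovery unrelated to the program. The before-after design wrongly credits that gain to the policy.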
Third, the impacts of ecosystem service policies may differ across time, space, and social groups. Evaluations often need to weigh trade-offs between short-term and long-term impacts. Ecological and socioeconomic processes occur at multiple scales, and an analysis at any single scale may fail to capture certain dynamics and impacts. Therefore, it is necessary to compare policy effects at multiple scales or to construct multilevel (hierarchical) models (Agarwal et al. 2005; Yang et al. 2013b). In addition, the costs and benefits of ecosystem service policies often differ across social groups or stakeholders. To sustain the long-term implementation of ecosystem service policies, a mechanism that distributes costs and benefits relatively equitably is essential.
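The core idea of a multilevel model can be illustrated by partial pooling in a random-intercept model. Each county's estimated program effect is pulled toward a precision-weighted overall mean, and small, noisy counties are pulled harder. The sketch below assumes that the within-county variance `sigma2` and the between-county variance `tau2` are known. In practice they are estimated, for example by REML in mixed-model software. The county effects and sample sizes are hypothetical.

```python
def partial_pool(group_means, group_sizes, sigma2, tau2):
    """Shrink group means toward the overall mean (random-intercept model with
    known within-group variance sigma2 and between-group variance tau2).

    Weight on a group's own mean: w_j = tau2 / (tau2 + sigma2 / n_j).
    The overall mean is the precision-weighted average of the group means,
    with precision 1 / (tau2 + sigma2 / n_j).
    """
    precisions = [1 / (tau2 + sigma2 / n) for n in group_sizes]
    grand = sum(p * m for p, m in zip(precisions, group_means)) / sum(precisions)
    pooled = []
    for m, n in zip(group_means, group_sizes):
        w = tau2 / (tau2 + sigma2 / n)
        pooled.append(w * m + (1 - w) * grand)
    return pooled


# Hypothetical county-level effects of a program on household income (%).
effects = [10.0, 2.0, 6.0]
households = [4, 4, 16]
print([round(x, 2) for x in partial_pool(effects, households, sigma2=16.0, tau2=4.0)])
```

The two small counties move halfway toward the overall mean of 6, while the larger county keeps 80% weight on its own estimate. This is the sense in which hierarchical models borrow strength across scales instead of trusting each local estimate in isolation.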
Finally, it is important to evaluate how ecosystem service policies have affected environmental, economic, and social outcomes in the past; however, to guide the revision of existing policies or the design of new ones, it is also important to predict these outcomes in the future. Decision makers are often less interested in what has already happened than in pragmatic solutions toward a more promising future. Therefore, prospective studies (De Leo …) …
Figure 5. Results of impact evaluation by the three different dimensions. For the effectiveness and equity dimensions, positive denotes that the assessment results are positive; mixed denotes that the evaluations include both positive and negative outcomes; negative denotes that the evaluation outcomes are negative. For the efficiency dimension, positive denotes high efficiency, that is, the researchers claimed positive policy impacts on local economies or the rate of return on investment was greater than one; mixed denotes low efficiency, that is, the researchers suggested more efficient implementation measures or the rate of return on investment was between zero and one; negative denotes that the policy led to negative ecological outcomes and was also economically inefficient.
Prospective studies usually require the construction of complex system models, such as spatially explicit agent-based models (Polasky et al. 2008; …), to simulate dynamic human-nature interactions under various policy scenarios in heterogeneous contexts over time and across space. To date, relatively few such studies have been conducted in China, and there is a crucial need to fill this research gap.
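Even a toy model shows the basic ingredients of such prospective work: agents with heterogeneous opportunity costs, a policy-scenario parameter, and outcomes tracked over time. The sketch below is deliberately far simpler than the spatially explicit models cited above. It is not spatial and has no ecological feedback, and every behavioral rule and parameter (`cost_growth`, the cost range, permanent exit) is a hypothetical assumption. Each household keeps its land enrolled only while the payment covers its opportunity cost, which grows each year; once a household exits, it reconverts its land and does not return.

```python
import random


def simulate_enrollment(payment, years, n_households=200, cost_growth=0.05, seed=1):
    """Toy non-spatial agent-based scenario returning the enrolled share per year.

    Assumptions (hypothetical): initial opportunity costs are uniform on
    [50, 150], costs grow by `cost_growth` per year, and exit is permanent.
    """
    rng = random.Random(seed)  # fixed seed: same households in every scenario
    costs = [rng.uniform(50, 150) for _ in range(n_households)]
    active = [True] * n_households
    shares = []
    for _ in range(years):
        # A household stays enrolled only while the payment covers its cost.
        active = [a and payment >= c for a, c in zip(active, costs)]
        shares.append(sum(active) / n_households)
        costs = [c * (1 + cost_growth) for c in costs]
    return shares


for payment in (90, 120):
    print(payment, [round(s, 2) for s in simulate_enrollment(payment, years=5)])
```

Because both scenarios draw the same households, differences between the payment levels reflect the policy parameter alone. This is the kind of counterfactual comparison that prospective models are meant to support.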

Disclosure statement
No potential conflict of interest was reported by the authors.