Towards pro-poor and voluntary PES: assessment of willingness to pay and willingness to accept PES contract in central Vietnam

ABSTRACT This study employed Contingent Valuation and Discrete Choice Experiment methods to investigate the potential of a Payment for Ecosystem Services (PES) mechanism that incentivizes sustainable land use practices upstream of the Vu Gia river, central Vietnam. We assessed the willingness of ecosystem service beneficiaries to pay (WTP) for service delivery through the adoption of sustainable land use practices by upland poor, ethnic minority communities and, conversely, the willingness of these communities to accept (WTA) a contract to adopt such practices. On the users' side, 64% and 56% of pooled respondents said they were willing to pay a higher rate for water and electricity, respectively, if upstream watershed management were improved. On the providers' side, WTA was high when conditionality was relaxed. Our findings point to a fundamental challenge in designing PES that matches the needs of buyers and providers: a scheme that ensures ecosystem service flows but does not impose stringent rules that limit stakeholders' participation.


Introduction
While terrestrial biodiversity and ecosystems continue to be threatened, the neoliberalisation of biodiversity and ecosystem conservation offers opportunities to mobilize resources (financial, human and technical) from various sources for conservation. Many schemes have been tried, with some successes (and failures), including market-based incentive instruments, tradable permits, REDD+, debt-for-nature swaps, and payment for ecosystem services (PES), among others (Anderson et al., 2010; Bigger & Dempsey, 2018; Thomas & Theokritoff, 2021). Among them, PES has grown rapidly as a promising way to provide conditional incentives that motivate land users to adopt environmentally beneficial land use practices (Farley & Costanza, 2010; Kaczan et al., 2013). Definitions of PES vary widely, from the narrow definition of Wunder (2005), 'PES is a voluntary transaction where a well-defined ES (or land-use likely to secure that service) is being bought by a buyer from a provider, if and only if the provider secures ES provision (conditionality)', to the broader concept of Muradian et al. (2010), who defined PES as 'a transfer of resources between social actors, which aims to create incentives to align individual and/or collective land use decisions with the social interest in the management of natural resources'. While Wunder's definition of PES is strictly neoliberal (Büscher et al., 2014), implementation in developing countries deviates significantly from it (Fletcher & Breitling, 2012; Muradian et al., 2013; McElwee et al., 2014; Fletcher & Büscher, 2017). Indeed, most PES implementations are run by states, are non-market, and often take the form of subsidies (Gómez-Baggethun et al., 2015). The neoliberalisation of biodiversity and ecosystem conservation is often critiqued for its 'commodification of nature' (Kopnina, 2017), its narrowing of the human-nature relationship (Kolinjivadi et al., 2019), the lack of marketability of ecosystem services (Landell-Mills & Porras, 2002), and its neglect of ecological, social, or spiritual values as distinct from an income dimension (Kolinjivadi et al., 2014).
Some suggest that government intervention, or even direct administration, is needed for PES to deal with the complexity of natural resource management and to ensure environmental function (Gao et al., 2020; Wang & Wolf, 2019). As a result, PES and pre-existing local systems of conservation and resource management are often combined into a hybridized structure (Fletcher & Breitling, 2012; Kaiser et al., 2021; McElwee et al., 2020).
Vietnam is an interesting case for examining how 'theoretical' neoliberal conservation via PES enters a country with a long history of socialism and a command-and-control approach to forest governance. In 2010, a national policy on payment for forest ecosystem services (PFES) was initiated. Vietnam's PFES adopts the broad definition of PES, wherein beneficiaries of the ES (e.g. hydropower companies) indirectly and obligatorily pay the owners of the forest land that provides the ecosystem services. With the issuance of Decree 99/2011, the PFES policy is regarded as the nation's only working PES mechanism (Do et al., 2018). Under Decree 99, ES users are hydropower enterprises, water supply companies, and eco-tourism businesses, while ES sellers/suppliers are forest owners who manage the forests, management boards of protected and special-use forests, individuals, forest companies, and local organizations with forest land titles. However, most hydropower plants, water supply companies and tourism operators, although called ES users, simply act as fee collectors or intermediaries that pass fees from their customers (consumers of electricity, water and tourism services) on to the next party. Several authors, including Pham et al. (2013) and Phan (2019), found that most end users were not aware of the PFES component of their electricity and water bills, and that information exchange between end users and their suppliers is very limited.
A fixed PFES payment rate has been applied. At the time of this study, hydropower enterprises paid VND 20 (USD 0.001) per kWh of commercially produced power, water supply companies paid VND 40 (USD 0.002) per cubic metre of clean water produced, and eco-tourism businesses paid 1-2% of their revenue.¹ All transactions are made through the Viet Nam Forest Development and Protection Fund (VNFF), the agency mandated to collect and distribute ES payments. In this indirect payment mechanism, negotiation or contact between sellers and buyers is simply unnecessary. Although direct payment (implying voluntary PES) is mentioned in Decrees 99/147/156, it was never elaborated in policy documents (Do et al., 2018). For the Vietnamese government, the PFES program has successfully raised over US$ 100 million annually for forest protection, with proxy indicators of environmental performance such as a decreased area of forest loss (McElwee et al., 2020). The PFES program has been criticized for reversing decentralization in State forest governance (Suhardiman et al., 2013), avoiding voluntary negotiations (Do et al., 2018; Kolinjivadi & Sunderland, 2012), largely excluding rightful beneficiaries (To & Dressler, 2019), and lacking conditionality and monitoring systems to report on outcomes (Pham et al., 2015). Nevertheless, a few studies have explored ways of making PFES more voluntary and performance-based (Do et al., 2018; McElwee et al., 2020; Nielsen et al., 2018; Simelton & Dam, 2014).
While we partly agree with some authors, including Roth and Dressler (2012), Suhardiman et al. (2013), and McElwee et al. (2020), that the non-neoliberal aspects of PFES are influenced by a history of socialist development, we believe that neoliberalisation is a process, not an outcome, since any governmentality must be continuously reproduced or re-enacted (Kolinjivadi et al., 2019). PFES can thus be improved towards a fairer, more market-oriented scheme without becoming completely neoliberal (Kaczan et al., 2013). Our study does not aim to engage in the long-running neoliberal versus non-neoliberal debate, but rather to generate empirical evidence for negotiating a voluntary, performance-based PES, which policy-makers can use to improve the existing PFES program. It contextualizes a hypothetical PES scheme in which current PFES users may wish to further secure ES supply by paying an additional amount to upland communities who may be willing to adopt sustainable land use practices in return for conditional incentives; unlike PFES, the hypothetical scheme involves direct and voluntary transactions. Our study complements the existing PFES literature because (1) we employed contingent valuation (CV) and choice experiment (CE) methods to quantify buyers' and suppliers' preferences for a PES design, not just to value ES; and (2) we expanded the scope of PES to include sustainable land use practices (e.g. agroforestry) outside the natural forests already covered by PFES, thus avoiding overlap while still promoting income generation and ES delivery by farmers.

WTA study site: Phuoc My and Ta Bhing communes, Quang Nam province
Phuoc My and Ta Bhing communes are located in the buffer zone of Song Thanh Natural Reserve (STNR) in Quang Nam province (Figure 1). The two communes lie in a valley of the central Annamite Range. Agricultural land in the two communes was small relative to total land area (<3% in Phuoc My and <16% in Ta Bhing). Phuoc My had 1,590 people belonging to the Gie Trieng (Bhnong), Kinh, Tay, Nung and Co Tu ethnic groups (2014 statistics). Ta Bhing had 2,438 residents, predominantly belonging to the Ca To ethnic group. In terms of socio-economic status, 50% of households in Phuoc My were poor, 24% near-poor and 26% non-poor, while in Ta Bhing the figures were 64%, 18% and 18%, respectively (Catacutan et al., 2017).
Soil and water conservation technologies were rarely practiced in either commune, resulting in moderate to severe soil erosion. Farming has been affected by drought, heavy rain, storms, flooding, landslides and soil erosion. Although restricted by STNR management authorities, agricultural expansion has continued covertly.
The quality of natural forests has declined after years of overexploitation, and timber and non-timber forest products (NTFPs) were depleted. Few farmers obtained benefits from their forests besides firewood, medicinal plants and mushrooms. Farmers wanted to plant fast-growing trees for the pulp and paper industry. Households from several villages in Phuoc My earned income from the PFES program, whereas only one village in Ta Bhing did so. Paddy and upland rice remain the main agricultural crops in both communes.

WTP study site: Da Nang city
With 1.1 million people, Da Nang is the sixth most populous city in Vietnam and had the highest urbanization ratio among Vietnam's provinces and municipalities. Around 87% of the population lives in urban areas, with an average annual urban population growth of 3.5%.² Da Nang city (Figure 1) has a total area of 1,285 km², lying on the coast of the South China Sea and downstream of the Vu Gia-Thu Bon rivers.³ Water demand for millions of residents and tourists is projected to reach 500,000 m³/day by 2025 (Asian Development Bank, 2012). The main water source is the Cau Do River, which relies on water from the Vu Gia River. Da Nang's water supply has declined in recent years due to upstream forest destruction, low rainfall and hydropower operations. The city also suffers from salt-water intrusion and fresh-water shortages. Since Da Nang cannot control the intake of water and the flow of the Vu Gia River has been unstable, residents urgently need to negotiate with upstream land holders in Quang Nam province to rehabilitate the Vu Gia watersheds.

Contingent valuation method to assess WTP
Contingent valuation method (CVM) is a widely used economic tool for measuring the maximum WTP of potential users for an environmental good or service (Wedgwood & Sansom, 2003). The tool relies on two key assumptions: (1) people have well-ordered, but concealed, preferences for all types of environmental goods; and (2) people are capable of transforming these preferences into monetary values (Hoevenagel, 1994). The method can use either open- or closed-ended questions. We used a closed-ended CVM to estimate households' WTP for improved management of watersheds in and around STNR. Four districts in Da Nang city were surveyed: Hai Chau, Cam Le, Lien Chieu and Thanh Khe. Their residents are served by Da Nang Water Supply Company and are therefore beneficiaries of STNR watershed services.
Steps 4 and 5 were carried out twice, first to determine the WTP per unit of water and then the WTP per unit of electricity. At the end of each WTP section, respondents could comment on their decision and suggest alternative solutions to shortages. Twenty households were pre-tested to gauge interviewees' awareness of the watershed and to calibrate the final bid amounts.

Bid Amounts
A closed-ended, dichotomous choice format was used to determine whether respondents were willing to pay a certain amount (the starting bid) for improved management of the Vu Gia river basin. Participants who answered 'Yes' to the first bid were offered a second, higher bid, while those who answered 'No' were offered a lower bid. The final bid amounts were VND 10, 30, 60, 90, 120, 150, 170 and 300 per kWh of electricity, and VND 50, 70, 90, 110, 130, 150, 170 and 500 per m³ of water. The VND 300/kWh and VND 500/m³ bids were used to control for acquiescence response bias.
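Each pair of answers in this double-bounded format brackets the respondent's WTP within an interval. The Python sketch below shows the standard interval interpretation of a yes/no answer pair. The adjacent-rung rule in the hypothetical `follow_up_bid` helper is an assumption, since the text does not state how the second bid was chosen, and the paper does not say how the resulting intervals were analysed.

```python
import math

# Electricity bid ladder (VND/kWh) as listed in the text; 300 is the
# acquiescence-check bid.
ELECTRICITY_BIDS = [10, 30, 60, 90, 120, 150, 170, 300]


def follow_up_bid(ladder, first_bid, first_yes):
    """Hypothetical rule: offer the adjacent rung of the ladder.

    Returns None when there is no rung left in the required direction.
    """
    i = ladder.index(first_bid)
    j = i + 1 if first_yes else i - 1
    if not 0 <= j < len(ladder):
        return None
    return ladder[j]


def wtp_interval(first_bid, first_yes, second_bid, second_yes):
    """Return (lower, upper) bounds on WTP implied by a double-bounded answer pair.

    A 'yes' to a bid is read as WTP >= bid; a 'no' as WTP < bid.
    WTP is assumed to be non-negative.
    """
    if first_yes and second_bid <= first_bid:
        raise ValueError("after a 'yes' the follow-up bid must be higher")
    if not first_yes and second_bid >= first_bid:
        raise ValueError("after a 'no' the follow-up bid must be lower")
    if first_yes and second_yes:      # yes-yes: at least the higher bid
        return (second_bid, math.inf)
    if first_yes:                     # yes-no: between the two bids
        return (first_bid, second_bid)
    if second_yes:                    # no-yes: between the two bids
        return (second_bid, first_bid)
    return (0, second_bid)            # no-no: below the lower bid


# Example: 'yes' at VND 60/kWh, then 'no' at the next rung.
second = follow_up_bid(ELECTRICITY_BIDS, 60, True)
print(second, wtp_interval(60, True, second, False))
```

Interval-censored answers of this kind are usually passed to an interval-data likelihood rather than averaged directly.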

Data Analysis
SPSS was used to perform truncated tests at the 5% significance level for correlations between respondents' WTP and selected variables. Descriptive statistics and cross-tabulations between the percentage WTP and selected variables were also produced.
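The text does not specify the 'truncated tests'. The sketch below illustrates one common way of testing association in a cross-tabulation: a Pearson chi-square test of independence on a 2x2 table, for example shortage experienced (yes/no) against WTP (yes/no). This technique is substituted for illustration and is not necessarily the test the authors ran in SPSS. The counts are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table[i][j] is the count of respondents in row i (e.g. shortage: yes/no)
    and column j (e.g. willing to pay: yes/no).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

# Hypothetical counts for illustration only, not survey data.
counts = [[40, 10],
          [20, 30]]
stat = chi_square_2x2(counts)
print(f"chi2 = {stat:.2f}, significant at 5%: {stat > CRITICAL_5PCT_DF1}")
```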

Discrete choice experiment (DCE) to assess WTA
Discrete choice experiment (DCE) belongs to a class of quantitative techniques based on stated preference (SP), that is, an individual's preferences for 'alternatives' (whether goods, services, or courses of action) expressed in a survey context. DCE is based on the random utility theory (RUT) proposed by Thurstone (1927), which has been extended to multiple comparisons (McFadden, 1986; Midway et al., 2020). This method has been widely applied in PES studies (Chaikaew et al., 2017; Kaczan et al., 2013; Khan et al., 2019). We used DCE to evaluate farmers' WTA in the two selected communes for several reasons: (1) in the absence of a real-world situation, an SP method is required; (2) in WTA applications, DCE is less prone to bias than other stated choice methods (Burton, 2010); and (3) DCE is relatively simple and less likely to impose a cognitive burden on respondents (particularly those with lower literacy).
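Under RUT, each alternative's utility is a systematic part V plus a random error. If the errors are independent standard Gumbel (type I extreme value) draws, the choice probabilities take the multinomial logit form exp(V_j) / Σ_l exp(V_l). The Python sketch below checks this numerically by simulating random utilities and comparing the simulated choice shares with the closed form. The utilities for options A, B, C and the status quo are hypothetical.

```python
import numpy as np


def logit_probabilities(v):
    """Closed-form multinomial logit choice probabilities."""
    expv = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return expv / expv.sum()


def simulated_choice_shares(v, n_draws=200_000, seed=0):
    """Share of draws in which each alternative has the highest random utility
    U = V + e, with e i.i.d. standard Gumbel."""
    rng = np.random.default_rng(seed)
    u = v + rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, len(v)))
    chosen = np.argmax(u, axis=1)
    return np.bincount(chosen, minlength=len(v)) / n_draws


# Hypothetical systematic utilities for options A, B, C and a status quo.
v = np.array([1.0, 0.5, 0.0, -0.5])
print(logit_probabilities(v))
print(simulated_choice_shares(v))
```

The two printed vectors should agree to about two decimal places, which is the sense in which the logit model is a consequence of the random utility assumptions rather than an arbitrary functional form.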

Defining attributes and attribute levels
Prior to this study, we conducted a baseline study in the buffer zone of STNR and identified agroforestry models (such as intercropping and home gardens) that could maintain some degree of ecosystem functionality while allowing for agricultural production and income generation in the buffer zone. Such models can provide carbon sequestration services and biodiversity (above and below ground) comparable to regenerated forest, and contribute to better surface flow regulation (Catacutan et al., 2017). Although inferior to original forest, maintaining agroforests can offset forest cover loss. Yields and subsequent profits are estimated to be higher than those from baseline practices, although there are financial and management investment requirements (Catacutan et al., 2017). Regardless of the exact profit differences, however, long-term maintenance of improved agroforestry is likely to require additional incentives for farmers beyond the profits already associated with this farming method. The hypothetical PES program focuses on this goal: farmers would receive rewards (in cash and in kind) if they establish and maintain agroforestry plots that help to generate ES benefits for downstream water and electricity users.
We initially identified attributes relevant to farmers' decisions and to sustainable land use practices, including land area subscribed to the program, land use types, upfront payment amount, tree density, level of technical support, preferential loans, monitoring and reporting methods, contract duration, annual payment, individual/collective payment, fund management, farmers' in-kind contribution and exit options. A choice set was developed for field testing to narrow down the choice experiments and identify the attributes that most influence farmers' decision-making. Thirty farmers participated in the pre-test, in which farmers preferred individual payment, while contract length had only a marginal effect on their decision. Farmers also revealed concerns about their technical capacity during the test. Accordingly, we adjusted the attributes as shown in the final set in Annex 1.

Questionnaire development and design
Responses in a DCE can take different formats, including 'pick-one', 'best-worst', and others. We applied 'pick-one', as this resembles real-life decision making. A three-alternative design was adopted, wherein each choice set included three options for respondents to choose from. The alternatives in our DCE were unlabeled (Louviere et al., 2000), with generic titles (options A, B and C). A 'none' option (status quo) was included to reflect unconditional demand and thus ensure the conceptual validity of the design, given the voluntary nature of farmer participation in PES. Most upstream farmers were observed to have relatively low levels of education and to be unfamiliar with multiple-choice situations. To collect as much information as possible without imposing the cognitive burden of answering many choice sets, nine choice sets were used. These choice sets were introduced to farmers as a hypothetical PES program.

Survey administration
The survey took place in all villages of Phuoc My and Ta Bhing and involved 235 respondents. To facilitate communication in local languages, up to eight enumerators (e.g. forest rangers, commune officers) were employed in each meeting. Respondents' information is given in Table 1 below.
In each village, households were gathered in the 'Guol', a village hall. The meetings started with questions about village land uses, followed by an introduction to the hypothetical PES program. Each household representative made his/her choices individually and answered exit questions with the help of enumerators.

Data analysis
Choice experiment design and data analysis were performed using SAS's JMP Statistical Discovery software, version 11.0.0 (SAS Institute Inc, 2013). By default, the software uses a bias-corrected maximum likelihood estimator described by Firth (1993). The choice statistical model is specified as follows. Let X[k] represent a subject attribute design row, with intercept, and let Z[j] represent a choice attribute design row, without intercept. Then the probability that the k'th subject chooses the j'th of m choices is given by equation (1):

$$P_{kj} = \frac{\exp\left[\beta'\left(X[k] \otimes Z[j]\right)\right]}{\sum_{l=1}^{m} \exp\left[\beta'\left(X[k] \otimes Z[l]\right)\right]} \qquad (1)$$

where:
• ⊗ is the Kronecker row-wise product;
• the numerator is calculated for the j'th alternative actually chosen;
• the denominator sums over the m choices presented to the subject for that trial.
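Equation (1) can be evaluated directly once β is known. The NumPy sketch below computes the choice probabilities for one subject and one trial using the row-wise Kronecker product. It does not reproduce JMP's Firth-corrected estimation of β, and the attribute coding and coefficient values are hypothetical. With a subject row containing only the intercept, the expression reduces to an ordinary conditional logit over the choice attributes.

```python
import numpy as np


def choice_probabilities(beta, x_k, Z):
    """Probability that subject k chooses each of the m alternatives in one trial.

    beta : coefficients, length len(x_k) * Z.shape[1]
    x_k  : subject attribute row, with a leading 1 for the intercept
    Z    : (m, p) matrix, one choice attribute row per alternative, no intercept
    """
    # Row-wise Kronecker product X[k] (x) Z[l] for every alternative l.
    design = np.array([np.kron(x_k, z_l) for z_l in Z])
    utilities = design @ beta
    expu = np.exp(utilities - utilities.max())  # numerically stable softmax
    return expu / expu.sum()


# Hypothetical example: two coded choice attributes and one subject covariate
# (1 = female), so beta has 2 * 2 = 4 entries.
Z = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
x_k = np.array([1.0, 1.0])                 # intercept, female = 1
beta = np.array([0.8, -0.6, 0.2, -0.4])    # illustrative values only
p = choice_probabilities(beta, x_k, Z)
print(p, p.sum())
```

The last two entries of `beta` multiply the attribute-by-covariate interactions, which is how the Kronecker form lets subject characteristics such as gender shift attribute effects.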

Perceptions of the environment and awareness of watershed services
Questions on two variables of the New Ecological Paradigm, namely stewardship of nature and mastery of nature, were used to gauge respondents' views of the relationship between humans and nature (Dunlap et al., 2000; de Groot et al., 2011). In total, 86.7% of respondents strongly agreed that technological progress would enable humans to solve future environmental problems, which may help explain why respondents were willing to invest in watershed management, and nearly all (95.1%) strongly agreed that humans are responsible for conserving the environment for future generations (Table 2). Nearly all respondents were aware of the connection between upstream watershed management and downstream water supply, and most were aware of STNR itself. Such high awareness is likely explained by respondents' relatively high educational level (53% of respondents had completed high school, 33% were university graduates, 5% had no formal schooling, and 9% chose not to reveal their educational level).
A majority of respondents felt their current utility supply was sufficient (85% and 84% for water and electricity, respectively). Yet approximately 48% of respondents reported having experienced water and/or electricity shortages in the last 6 months. This inconsistency is likely because the city's water supply in the survey year was relatively stable compared with the intense rationing and alarm of previous years (Table 3).

WTP
Ten variables were tested for their association with Da Nang residents' WTP. Experience of water shortage and insufficient supply in the last 6 months, and average monthly water consumption, were linked to residents' higher WTP for water (Table 4), while WTP for electricity was correlated with electricity shortage and average monthly water consumption (Table 5). Respondents were willing to pay more for electricity if they had experienced electricity shortages in the past 6 months. Further cross-tabulations showed no significant association between gender, age, level of education or other variables and residents' WTP for water and electricity. For both water and electricity, the higher the respondent's monthly consumption, the higher their WTP. Consumption level may therefore act as a proxy for income: higher disposable income could be linked to residents' WTP for watershed protection measures.
Sixty-four per cent of pooled respondents said they would be willing to pay a higher rate for water to improve upstream watershed management, whereas 56% were willing to pay higher electricity rates. Buyers were likely less willing to pay more for electricity than for water because current electricity rates were perceived as already too high (Tables 6 and 7). Thirteen respondents said they were not willing to pay higher prices because they could not afford to, especially for electricity, which was already too expensive. Other interviewees did not state a WTP because they doubted the effectiveness of the proposed watershed management solution or did not support the PES financing mechanism. Overall, buyers indicated that they would be more willing to pay if there were better monitoring and reporting.
Among respondents who stated a WTP, the mean WTP was 113.24 VND/m³ for water and 98.07 VND/kWh for electricity (Table 8), which is a positive sign for exploring a voluntary PES mechanism.
Note: (*) Stewardship of Nature: perception of the human-nature relationship wherein human beings have a responsibility to conserve the natural environment and, although we stand above nature, we need to take good care of it; (**) Mastery of Nature: perception of the human-nature relationship wherein human beings have the right to alter nature radically and technological progress will enable us to solve environmental problems in the future (de Groot et al., 2011). Source: authors' analysis.
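For scale, the mean WTP values can be set against the fixed PFES rates quoted in the Introduction (VND 40 per m³ of water and VND 20 per kWh). The sketch below assumes that the stated WTP is an extra charge per unit, paid on every unit consumed. The household consumption figures are hypothetical, and the exchange rate of VND 20,000 per USD is only the rate implied by the conversions given in the text.

```python
# Figures quoted in the text (VND per unit).
PFES_RATE = {"water": 40, "electricity": 20}        # current PFES rate paid by utilities
MEAN_WTP = {"water": 113.24, "electricity": 98.07}  # mean stated WTP among respondents with a WTP
VND_PER_USD = 20_000  # assumption: rate implied by "VND 20 (USD 0.001)"


def monthly_surcharge(service, units):
    """Extra monthly bill (VND) if a household paid the mean WTP on every unit."""
    return MEAN_WTP[service] * units


for service in ("water", "electricity"):
    ratio = MEAN_WTP[service] / PFES_RATE[service]
    print(f"{service}: mean WTP is {ratio:.1f}x the current PFES rate")

# Hypothetical household: 15 m3 of water and 150 kWh of electricity per month.
total = monthly_surcharge("water", 15) + monthly_surcharge("electricity", 150)
print(f"surcharge = {total:,.0f} VND (~USD {total / VND_PER_USD:.2f}) per month")
```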

Alternative solutions
The final part of the survey was an open-ended question about solutions to water shortage (Table 9). Sixteen respondents recommended improving supplier management (e.g. controlling corruption, increasing transparency), 12 proposed technological solutions (e.g. solar power, water-efficient appliances), 11 suggested improving natural resource management (e.g. preventing illegal logging and improving forest management), and 3 proposed better end-user management (e.g. awareness raising).

General WTA
We employed the Discrete Choice Experiment (DCE) to determine the attributes that explain farmers' utility (WTA) for the hypothetical PES contract. The most significant attribute was monitoring level, followed by minimum land area, upfront payment and technical support (Figure 2a). As expected, monitoring level and minimum land area both affected WTA negatively: an increase in the level of monitoring or in the land area requirement would decrease farmers' WTA, whereas upfront payment and technical support had positive effects, meaning that increases in these attributes would raise farmers' WTA (Figure 2a). All of the effects were significant.⁴ However, the impact of technical support was only marginal compared with the other attributes, meaning that farmers' WTA would increase only very slightly with more technical support (Figure 2b). The multinomial logit model of preferences for the hypothetical PES program is shown in Annex 2. The effects of changing attribute levels were linear except for monitoring level (Figure 2c). Increasing the upfront payment and technical support, or decreasing the minimum land area subscribed to the PES program, raised farmers' WTA proportionally. However, WTA dropped sharply when the monitoring level changed from moderate to strict, while a change from low to moderate monitoring did not significantly affect farmers' WTA. Farmers' WTA was 1.36 when all attributes were at their minimum level, and changed to 0.97 and −0.24 when the monitoring level changed to moderate and strict, respectively. The marginal effects of each attribute on farmers' decisions are shown in Figure 2c.
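The reported values (1.36, 0.97 and −0.24) may be utilities on the logit scale with the status quo utility normalised to zero. That normalisation is an assumption, because it is not stated here. Under it, a binary logit translates each value into the probability of choosing the PES option over the status quo, as the sketch below does.

```python
import math

# Values reported in the text for the profile with all attributes at their
# minimum level, varying only the monitoring level. Treating them as logit
# utilities relative to a status quo utility of 0 is an assumption.
UTILITY = {"low": 1.36, "moderate": 0.97, "strict": -0.24}


def prob_accept_vs_status_quo(v, v_status_quo=0.0):
    """Binary logit probability of choosing the PES option over the status quo."""
    return 1.0 / (1.0 + math.exp(-(v - v_status_quo)))


for level, v in UTILITY.items():
    p = prob_accept_vs_status_quo(v)
    print(f"{level:>8}: utility {v:+.2f}, P(accept) = {p:.2f}")
```

Read this way, the drop from moderate to strict monitoring moves the option from a clear majority choice to below even odds, which matches the non-linearity described above.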

Site differences
Given the very similar natural and socio-economic conditions of the two communes, we did not expect the attributes to have heterogeneous impacts on respondents' choices. Nevertheless, farmers in Phuoc My were more strongly motivated by technical support than those in Ta Bhing (Figure 3). This is likely because most respondents in Phuoc My had already been involved in the PFES program and may have realized the technical difficulties of implementing activities (Catacutan et al., 2017).

Gender and literacy differences
The most significant difference between gender groups was found in upfront payment. Men preferred a higher upfront payment, which corroborates earlier findings of a REDD+/PES study in Bac Kan province, where men were more cash-oriented than women (Eastman et al., 2013). Women's WTA was much more strongly influenced by monitoring level and minimum land area (Figure 4a), decreasing more sharply as land area or monitoring level increased. This result concurs with behavioral economics studies concluding that women are more risk averse than men (Croson & Gneezy, 2009). In terms of literacy level, upfront payment and technical support had stronger effects on people who had attended high school or above than on those with less education (Figure 4b). We assume that respondents with more education (and thus presumably higher literacy and skills) had better cognitive ability (Barnes et al., 2004; Kravchenko, 2021), and could therefore grasp the full picture of the hypothetical PES contract rather than focusing on one or two 'perceived' important factors.
Note to Table 9: Total: 41 (20.2). (*) Only 41 respondents provided answers on solutions to water and electricity shortage. Source: authors' analysis.

Status quo
The status quo (SQ), or no-choice option, is a scenario in which no action is taken. Twenty-four people (10% of total respondents) chose the SQ option at least once. In total, the SQ option was chosen 60 times, or 2.8% of all choices made. Only one person chose the SQ option in all nine choice sets. This was not a protest response; the person was observed to be unsure about his understanding of the hypothetical scenarios. His responses are unlikely to reflect his true WTA and more likely result from choice avoidance due to confusion or complexity (Barreiro-Hurle et al., 2018).
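The status quo shares follow from the survey design, assuming every respondent answered all nine choice sets. The sketch below recomputes them from the counts given in the text.

```python
respondents = 235     # survey respondents
choice_sets = 9       # choice sets answered by each respondent (assumed complete)
sq_choices = 60       # total status quo choices
sq_respondents = 24   # respondents choosing the status quo at least once

total_choices = respondents * choice_sets
print(f"total choices: {total_choices}")
print(f"SQ share of choices: {sq_choices / total_choices:.1%}")
print(f"respondents with at least one SQ choice: {sq_respondents / respondents:.1%}")
```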

Implications of WTP and WTA to PES participation and voluntariness
In terms of WTP, surveyed residents showed a high degree of environmental awareness. A majority were willing to pay for improved watershed management: 64% were willing to pay more per m³ of water, and 56% were willing to pay more per kWh of electricity. Understandably, those willing to pay more had experienced recent shortages. Buyers may have been quicker to rate their electricity supply as sufficient because, unlike water, electricity was not rationed.
Although respondents stated a WTP through a surcharge on both water and electricity bills, we recommend that the PFES policy be amended to allow inter-province negotiation on increasing buyers' payments through their water bills only. This is because water supply can be traced directly from the catchment to the water supplier, and then to consumers. For electricity, the water can only be traced from the catchment to the hydropower plant (HPP) because electricity is distributed to different regions; hence, Da Nang City residents could not be assured of a stable power supply from their regional HPPs, and the ES would leak to non-paying beneficiaries in other regions.
The high WTA of respondents in Phuoc My and Ta Bhing corroborates other studies showing that cash payment is not always the most important factor for rural communities in adopting conservation and new farming practices (Adhikari & Boag, 2013; Costedoat et al., 2016; Lliso et al., 2022; Zanella et al., 2014). In many cases reported in the literature, cash payment is less preferred, especially when villages are poor and have limited market access (Hoynes & Schanzenbach, 2009; Nordén, 2014). In our study, an upfront investment of USD 200 per hectare was sufficient to trigger participation even when other factors were unfavorable. This amount was less than half of the establishment cost of an agroforestry plot, so respondents may either not have considered or not have known about the establishment or opportunity costs involved. Another possibility is that the expected benefit from the investment was high, outweighing the value of the input itself. However, a high WTA could also reflect rural communities' tendency to say 'yes' to an external agent, although we tried to minimize this by being open about our intentions. We also found that preferences in the WTA were fairly homogeneous. Differences between gender, literacy and commune groups existed, but were not large enough to warrant categorizing respondents into discrete classes and treating them differently. In other words, a voluntary PES program could be applied uniformly throughout the population of Phuoc My and Ta Bhing.
The main difference between our hypothetical PES scheme and the national PFES is that participation in the former is voluntary, while participation in the latter is compulsory. In PFES, ES users are 'charged' by the policy, while suppliers receive payments through labor contracts with legal forest holders (mostly State-owned forest entities). Some of the risks to PFES relate to motivation crowding out. For example, the daily wages that local people receive for forest patrolling (often representing instrumental values of ecosystem services) could undermine their willingness to protect forests for relational and intrinsic values and discourage sustainable land management activities that provide economic and ecological benefits. Consequently, local people are considered passive participants in the PFES program, with no role in decision-making (Hoang et al., 2021). Upstream-downstream payment programs that ignore relational and intrinsic values are likely to attract fewer participants and could even threaten those values by reducing access to traditional lands (Arias-Arévalo et al., 2017; Bremer et al., 2018; Lliso et al., 2022). This motivation crowding out is often the result of non-participatory, top-down approaches, misalignment with personal development, elite capture, individual payments for conservation tasks and commodification, among other factors (Ezzine-de-Blas et al., 2019), and should be carefully considered in PES design and implementation. A voluntary approach makes a PES scheme more socially acceptable than the government's regulatory approach (Kolinjivadi & Sunderland, 2012) and helps to avoid potential conflicts associated with mandatory regulations (Lindhjem & Mitani, 2012). Our case study presents a clear opportunity to engage stakeholders more actively in securing downstream water supply and developing agroforestry upstream through a PES scheme, without creating an artificial ES demand through regulatory administration.
It offers lessons and a better understanding for designing tailored PES schemes that incentivize upland poor communities to adopt sustainable land use practices at a payment level that downstream buyers can readily afford, while specifying and addressing local concerns (Wang & Wolf, 2019).

Conditionality in context
While the neoliberal nature of PES is broadly contested, conditionality is considered the core feature that distinguishes PES from command-and-control and other non-coercive conservation approaches (Kaczan et al., 2013; Sommerville et al., 2009; Wunder, 2015). In his revised PES definition, Wunder (2015) emphasized conditionality as the single most important PES feature, while voluntariness can be a preserved criterion. Conditionality ensures that payments actually result in positive outcomes for landowners' behavior and the ES (Sommerville et al., 2009). On this premise, Van Noordwijk and Leimona (2010) identified four PES types with decreasing levels of conditionality (I-IV): at level I, payment can be linked to actual improvement of ES, while at level IV, farmers are evaluated against their commitments to implement management plans that favour ES. Voluntariness of ES suppliers is arguably inversely related to conditionality: the higher the level of conditionality, the heavier the burden on local land managers, and vice versa. Preferences of PES stakeholders, especially suppliers, are therefore assumed to be heavily influenced by conditionality (Kaczan et al., 2013).
In our study, the level of conditionality was found to be a limiting factor in the participation of poor communities, consistent with the studies of Van Noordwijk and Leimona (2010), To et al. (2012), Kaczan et al. (2017), and L. Loft et al. (2019). This is a fundamental challenge in developing a voluntary PES scheme: WTA was high at a low monitoring level (low conditionality), while WTP varied from low to moderate if high conditionality was required. In general, farmers are more likely to participate in a program that accounts for their actions rather than for environmental outcomes, because the latter are more difficult to comply with (Kaczan et al., 2013). The strong effect of monitoring level on farmers' WTA could also be explained by farmers' skepticism towards an unfamiliar land use (e.g. agroforestry). Because respondents were unsure about the outcomes, they wanted to try agroforestry on the smallest possible area first, to reduce the risk of failure or to observe the benefits. We posit that if the communities had sufficient experience with sustainable land use practices such as agroforestry, their preferences would be less driven by fear of failure, as reported in the literature (Costedoat et al., 2016). The impacts of low and medium monitoring levels (representing conditionality in this case) did not differ much. Therefore, a future PES scheme could employ a moderate monitoring level based more on trust amongst community members than on physical inspection of compliance. Instead of a strict monitoring scheme, high-level technical support could be provided to help farmers, especially poor farmers with low educational attainment, to fulfil their contractual obligations.

Moving towards a voluntary, pro-poor PES scheme that addresses both conditionality and voluntariness
The PFES policy appears to embody the notion of Compensation for Opportunity-Skipped (COS; Van Noordwijk & Leimona, 2010). However, the low payment rate has failed to address the opportunity costs of unsustainable forest uses, especially the conversion of forest land to agriculture (Pham et al., 2015), which has undermined the policy's legitimacy and effectiveness. Even with increased payments, the current framing of 'payment' would likely not halt timber and NTFP exploitation, because forest dwellers are paid for 'not doing harm' (e.g. not cutting forest) rather than for 'actively doing good things' (preserving and enhancing the actual ES). The regulatory nature of PFES thus inhibits participation (especially of those without legal tenure rights) and information exchange between stakeholders. Participation in PFES is perceived by default as an obligation rather than an option; information flow is lacking, awareness of rights and responsibilities is questionable, and exclusion from decision-making is common (Le et al., 2016; Pham et al., 2016). PFES has also been criticized for poorly managed conditionality, partly due to ineffective monitoring, reporting and verification (McElwee et al., 2020; Trieu et al., 2020).
Results from our WTP and WTA studies provide a basis for shifting from the COS type of PES to Co-investment in landscape Stewardship (CIS), a pro-poor PES that aims to enhance the capacity and responsibilities of local communities, in conjunction with external financial rewards, to achieve desired economic and environmental goals in a stepwise manner. Under CIS, conditionality is enhanced through trust-building, with attention given to non-monetary rewards that enhance the providers' capacity, such as technical training, conditional land tenure, and involvement in decision-making (Lliso et al., 2022; Van Noordwijk & Leimona, 2010). Framing PES as stewardship embraces both relational and (partly) instrumental values, is fairer (because it recognizes the multiple ways in which various social actors perceive environmental values), is more likely to cause motivational crowding-in, and thus helps to ensure the sustainability of the payment (Arias-Arévalo et al., 2017; Lliso et al., 2022).
The CIS-PES offers opportunities to include multiple perspectives in managing agroforestry-mosaic landscapes, which have been neglected by policymakers and PES buyers who consider ES benefits to come primarily from forests. Given the lower conditionality involved, attention should be paid to building public trust in ES providers and in their management of upstream natural resources. The government should also be attentive to balancing the needs of providers, who are often poor communities, with users' desire to secure ES supply.
Our study provides evidence on how ES users and providers make choices about ES management. It also shows the potential of a pro-poor CIS-PES as a more socially acceptable way to manage the exchange of natural and financial capital between the two sides. Direct and voluntary payment is mentioned in Vietnam's PFES policy, but progress in this direction has been slow. Our key argument is that although financial sustainability has been easily secured via the regulatory-oriented PFES, where taxes and fees are the main funding sources, public-private partnerships can also result in tailor-made, self-sustaining PES schemes through direct transactions between local stakeholders for the ES of utmost concern. Such opportunities for pro-poor CIS-PES can only emerge if the Government shifts its role from regulator (as in the current PFES) to enabler, creating the conditions for voluntary PES schemes and facilitating their implementation. This approach could help advance the transformation of PFES in Vietnam.