Heterogeneous preferences of citizens towards agricultural ecosystem services: the demand relevance in a choice experiment

ABSTRACT Agri-environmental policies affect many ecosystem services (ES) from agricultural environments. Valuing ES with choice experiments requires operationalizing these services for the survey and a careful attribute selection process, as the attributes need to be relevant to respondents. In this paper, we expand the concept of demand relevance of attributes to include not only the importance of ES to the respondent, but also the perceived success in supplying these services and the need to improve them. We examined this in the context of citizens' preferences for ES from agri-environments. The results revealed five citizen groups with differing preferences. Respondents for whom the ES in the survey had high demand relevance were divided into those emphasizing existence values and those emphasizing ES more directly related to the human use of ecosystems. In turn, low demand relevance of the ES was associated with a non-significant cost attribute, which prevented the estimation of willingness to pay (WTP) for these respondents, and with anomalies in choices. Our findings emphasize the need to account in data analysis for respondents' heterogeneity not only in preferences but also in demand relevance, and they support considering factors other than the importance of attributes when examining demand relevance.


Introduction
Ecosystem services (ES), providing a link between ecosystems and human well-being, are increasingly included in environmental policy assessments (Bouwma et al. 2018). The ecosystem service framework and classifications have developed rapidly since the Millennium Ecosystem Assessment (MA 2005) through parallel processes, such as The Economics of Ecosystems and Biodiversity (TEEB 2010) and the Common International Classification of Ecosystem Services (CICES) (Haines-Young and Potschin 2012). ES are typically classified into provisioning, regulating, and cultural services. Some classifications also have a category for supporting or habitat services. The fundamental aim of the ecosystem service concept is to ensure that all of the contributions that ecosystems make to people are taken into account in decision-making. One option to assess these contributions is to identify and value changes in ES resulting from policies or programs. ES for which markets and prices do not exist can be valued using economic valuation methods that use surveys to elicit citizens' willingness to pay (WTP) for changes in the environment. A challenge in a valuation survey is that there are numerous ES, and not all potentially relevant ones can be included.
The popularity of stated preference methods has increased over the past two decades (Menegaki et al. 2016), and the choice experiment (CE) method is predominantly used for valuing environmental goods without a market price (Guijarro and Tsinaslanidis 2020). In a CE, the value of an environmental change, for example, a biodiversity conservation program, is determined by the changes in its attributes. In a valuation survey, the CE question consists of several choice sets with two or more alternatives described by attributes and the levels that these take. Attribute levels vary between alternatives, and the monetary cost to the respondent is often included as one of the attributes to enable the estimation of welfare measures. The respondents are asked to consider the utility of the different alternatives and to choose their preferred alternative. Typically, choice sets include a status quo option, so respondents are not forced to choose an option with increased cost. Respondents' choices reveal tradeoffs between attributes, and WTP for different alternatives or attribute levels can be estimated (e.g. Hanley et al. 2001). The CE method has been used to value different kinds of ES, ranging, for example, from climate change mitigation (Remoundou et al. 2015) to cultural ecosystem services (Rewitzer et al. 2017). Biodiversity has also been valued in numerous CE studies, as shown by meta-analyses (Jacobsen and Hanley 2009; Lindhjem and Tuan 2012; Subroy et al. 2019).
A challenge in a CE study is choice complexity, as individuals responding to a valuation survey have a limited capacity to process information in a single choice task (Bateman et al. 2002). To keep choice tasks manageable for respondents, the number of attributes that can be evaluated in a CE is rather low. This implies that not all of the services produced by the ecosystems in question can be evaluated in one CE survey. Therefore, selecting the key ES requires careful consideration. To a certain degree, selecting ES for a CE is relatively straightforward. Typically, there is no need to use stated preference valuation techniques for ES that have markets and prices, e.g. some provisioning services, such as food and timber. Intermediate services that contribute to the final services can also be left out of the potential attributes, because their value is embedded in the final services, and including both intermediate and final services could lead to double counting (Fisher et al. 2009).
Literature on the selection of attributes emphasizes that attributes need to be relevant to both decision makers and respondents (Johnston et al. 2012, 2017; Jeanloz et al. 2016) in order to obtain statistically significant estimates (Bateman et al. 2002). Even though the stated preference literature suggests some guidelines for attribute selection using various types of qualitative processes, such as focus groups and stakeholder meetings (Bateman et al. 2002; Blamey et al. 2002; Coast et al. 2012; Abiiro et al. 2014), only a few studies have developed or tested these processes empirically (Coast and Horrocks 2007; Armatas et al. 2014; Jeanloz et al. 2016). Often, the attribute selection processes are poorly reported (Coast et al. 2012; Jeanloz et al. 2016), and the topic is typically covered with a researcher statement that the key ES have been selected for valuation (e.g. Takatsuka et al. 2009).
Considering the relevance of the attributes, many studies differentiate between demand relevance and policy relevance (Blamey et al. 1997; Abiiro et al. 2014; Armatas et al. 2014; Suziana 2017). The relevance of attributes is hence divided into two aspects: 1) the attributes are important to the respondents (demand relevance) and 2) the attributes can be affected by policy (policy relevance). The first aspect is the one most often addressed in previous literature. For example, Armatas et al. (2014) and Kløjgaard et al. (2012) used the concept of importance and emphasized the process of selecting goods and attributes that are important to respondents. According to Suziana (2017), 'demand-relevant attribute is the attribute that has meaning for those interviewed.' In turn, attributes have policy relevance when they are relevant to policy makers and can be affected by policy decisions.
Although the concept of attribute relevance covers both their importance and the possibility of affecting them with policies, it still does not directly include whether there is a need to improve the attributes from the respondents' point of view. In the case of agricultural ecosystem services, this is particularly relevant, as agri-environmental programs have been slow to improve the state of the agricultural environment and the provision of ecosystem services (Burton and Paragahawewa 2011; Batáry et al. 2015). This modest progress has also been observed and criticized by citizens (Pe'er et al. 2020). In this paper, we expand the concept of demand relevance to include the perceived need for changes in the current supply of ES. We use the concept of demand relevance to capture the importance of the attribute itself to the respondent, as in previous literature, but also the respondent's perception of the success of the current ES supply. The success, or rather the lack of it, is related to the possibility of improvement as a result of the suggested policy. This paper examines how the relevance of ES affects their valuation in surveys.
Although guidelines and methods of attribute selection support researchers, stated preference valuation methods rely on the analyst's experience and ability to identify, select, define, and articulate the ES (Armatas et al. 2014). Hence, it is possible that 'not all attributes are relevant to all respondents' (Ryan et al. 2009): some survey respondents may consider that key ES have been excluded or, in turn, that ES without demand relevance have been included. Missing relevant attributes in a CE can affect the respondents' choices and consequently bias the results (Lancsar and Louviere 2006). Respondents can make assumptions regarding the missing attributes and, for example, infer that a high price is associated with high quality, even if quality is not included as an attribute in the CE. The relevance of the attributes presented in a CE can also determine the attribute processing strategies adopted by a respondent (Hensher 2006; Hensher et al. 2012). Respondents who consider the valuation task totally or partly irrelevant due to the low relevance of the attributes may exhibit protest behavior, or discontinuous, zero, or low preferences (e.g. Hensher et al. 2005; Campbell et al. 2011; Alemu et al. 2013). It has been noted that CE samples often include respondents who ignore the attributes to some extent (Scarpa et al. 2009; Alemu et al. 2013; Lagarde 2013; Van Zanten et al. 2016). Thus, identifying the respondents who perceive low relevance for the attributes included in a CE can also help to take their possibly inconsistent preferences into account in benefit estimation (Carlsson et al. 2010; Scarpa et al. 2010), as failing to detect these respondents could bias the WTP estimates. We aim to examine how respondents' perceptions of the demand relevance of attributes in a CE affect their survey responses and WTP.
Our case study originates from Finnish agri-environmental policy. In Finland, agri-environmental policy focuses on three main targets: water conservation, biodiversity, and climate change mitigation. The current policy is designed to compensate farmers for the costs of agri-environmental measures, and it does not require or verify the production of public goods or ES. It has been suggested that agri-environmental policy in Europe should be shifted in a more results- or benefit-based direction in order to be effective and cost-efficient (for reviews, see Schwarz et al. 2008; Russi et al. 2016). Results-based agri-environmental schemes have been tested, for example, in Germany, France, Ireland, and Switzerland (Burton and Schwarz 2013). In Finland, results-based schemes have not yet been implemented. However, Birge et al. (2017) explored the opinions of Finnish farmers, as well as public officials and advisors, on a payment-by-results agri-environmental scheme. To develop this type of policy, it would be essential to know the value to the beneficiaries, i.e. citizens, of the various ES produced under a benefit-based policy.
Many previous choice experiments valuing the attributes of agricultural environments have focused on the characteristics of agricultural or rural landscapes (e.g. Hynes and Campbell 2011; Grammatikopoulou et al. 2012; Häfner et al. 2018; Varela et al. 2018), multifunctional agriculture (Dominguez-Torreiro and Soliño 2011; Sangkapitux et al. 2017), or agri-environmental policies (Novikova et al. 2017; Grammatikopoulou et al. 2020). The ES typology, including the entire list of ES, has been used as the basis for valuation in some CE studies concerning agricultural ecosystems (Takatsuka et al. 2009; Bernués et al. 2015; Rodríguez-Ortega et al. 2016).
Heterogeneous citizen groups with different preferences towards agricultural ecosystems or related policies have also been observed in some studies. Applying latent class analysis, Grammatikopoulou et al. (2012) and Häfner et al. (2018) identified classes of respondents with varying preferences for agricultural landscape attributes. According to Novikova et al. (2017), Lithuanian citizens had very different tastes concerning agro-ecosystem services; among the ES tested, landscape services showed the highest level of heterogeneity across the three classes. Sources of heterogeneity have been found in socio-demographic variables (e.g. Arnberger and Eder 2011; Howley et al. 2012) as well as in attitudinal factors. Given the wide range of ES from agricultural ecosystems and the previously observed heterogeneity in preferences for them, agri-environments provide an interesting platform for examining the effect of demand relevance in valuation.
In this study, our aim was to identify citizen groups with heterogeneous preferences towards agricultural ES and to produce marginal WTP estimates for them. As we wanted to extend the concept of demand relevance to also include the respondent's perception of the success of the current ES supply, we compared the performance of models containing only importance as a covariate with models using both importance and perceived success. We recognize that, despite the careful attribute selection, it is possible or even likely that the ES selected as attributes in the CE were not demand relevant for all respondents. Our aim was therefore to examine how differences in demand relevance relate to heterogeneous respondent classes and how low demand relevance of the attributes affects respondents' choices.

Data collection
An Internet survey was used to collect the data in spring 2016. The sample was drawn from an Internet panel of a private survey company, Taloustutkimus, comprising over 30 000 respondents who have been recruited to the panel using random sampling to represent the Finnish population (Taloustutkimus 2017). After a pilot survey (N = 202), a random sample of 8391 respondents was selected from the panel. Of the sample, 2066 completed the survey, corresponding to a response rate of 25%. The data represented the population fairly well (Table 1), although the respondents were slightly older and more educated, and the proportion of females was smaller than in the population.

Identifying ecosystem services for valuation
We applied the Common International Classification of Ecosystem Services (CICES) as a basis for identifying key agricultural ES for the valuation survey. The CICES classification was selected because it is a continuously developing, Europe-wide classification system that is also intended to support the valuation of ES (CICES 2016). To select the ES from the CICES classification, we first selected 13 ES based on a literature review and the expert judgement of agricultural and environmental economists, as well as ecologists. The selected services included provisioning ES (food, agro-diversity, and bioenergy) and regulating ES (pollination, nursery and reproduction habitats for animals, pest control, soil productivity, water quality, and climate change mitigation), as well as cultural ES (cultural heritage, the existence of species and ecosystems, the recreational environment, and landscape).
To select the attributes for the choice experiment from these 13 ES, we started by analyzing the importance of the ES for citizens based on previous survey data (N = 800) (Pouta and Hauru 2015). This was followed by an evaluation of the importance of agri-environmental ecosystem services by stakeholders from the administration and NGOs (N = 6) and a discussion of the evaluation. Based on the importance of different ES for citizens and stakeholders and an analysis of market and non-market services, as well as final and intermediate services, we selected the ES for the pilot study. The selected services were landscape, the existence of species and ecosystems, water quality due to agriculture, and climate change due to agriculture. A group of environmental economists, ecologists, and agri-environmental policy experts (N = 12) developed these ecosystem services into measurable attributes and their levels. In defining the attribute levels, we looked for concrete indicators that could be affected by farming practices and thus targeted with agri-environmental policy. Before the pilot (N = 202), the questionnaire and especially the choice experiment were further evaluated by a group of environmental economists specialized in environmental valuation (N = 10). After examining the data from the pilot study, in which the attributes and their levels worked well, we decided to keep the same attributes and attribute levels in the valuation task of the final survey. The final attributes for the CE were traditional rural biotopes and endangered species, a typical agricultural landscape (divided into grazing animals and the number of plant species in cultivation), climate effects, and water quality effects.

Measuring demand relevance
As only four of the 13 ES produced in agricultural areas were included in the CE, the sample was likely to include both respondents for whom the four selected ES were relevant and respondents for whom they were not. Here, we examined the demand relevance of ES for each individual based on the importance of the ES and the perceived current state of the ES. The personal importance of the 13 ES produced by Finnish agroecosystems was assessed on a scale ranging from 1 (very low importance) to 5 (very high importance). This was followed by a description of the principles of the current agri-environmental programs and by a question about the respondents' perceptions of how well Finnish agriculture has succeeded in producing the 13 ES. The scale ranged from 1 (extremely well) to 5 (extremely poorly) to reflect the strength of demand relevance, as higher values implied a need to improve ES supply. The questions assessing the importance of different ES and the perceived success in their production are presented in Appendix 1. If the current policy fails to produce an ES that the respondent considers important, this ES can be seen as particularly demand relevant to the respondent in the new agricultural policy introduced in the survey. Hence, our definition of demand relevance is not merely the importance of ES, but also the perceived need for action. The importance and the evaluation of the current supply were defined separately for the ES included in and excluded from the CE.
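As a minimal sketch of how four demand-relevance covariates could be assembled from the two rating batteries described above, the Python below averages the importance ratings and the perceived-failure ratings separately over the ES included in and excluded from the CE. The paper does not state how the item ratings were aggregated, so the use of a simple mean, the ES labels, and the function name `demand_relevance_covariates` are hypothetical.

```python
from statistics import mean

# Hypothetical labels for the four ES that entered the CE as attributes.
CE_SERVICES = {"landscape", "existence", "water_quality", "climate"}


def demand_relevance_covariates(importance, supply_failure, included=CE_SERVICES):
    """Aggregate item ratings into four class-membership covariates.

    importance:     dict ES -> 1 (very low) ... 5 (very high importance)
    supply_failure: dict ES -> 1 (produced extremely well) ... 5 (extremely poorly)
    Higher values on both scales indicate stronger demand relevance.
    Aggregation by a simple mean is an assumption made for illustration.
    """
    inc = [es for es in importance if es in included]
    exc = [es for es in importance if es not in included]
    return {
        "importance_included": mean(importance[es] for es in inc),
        "importance_excluded": mean(importance[es] for es in exc),
        "failure_included": mean(supply_failure[es] for es in inc),
        "failure_excluded": mean(supply_failure[es] for es in exc),
    }


# Hypothetical respondent rating six of the 13 ES (shortened for brevity)
imp = {"landscape": 5, "existence": 4, "water_quality": 5, "climate": 4,
       "food": 3, "pollination": 1}
fail = {"landscape": 4, "existence": 4, "water_quality": 5, "climate": 3,
        "food": 2, "pollination": 2}
cov = demand_relevance_covariates(imp, fail)
```

For this hypothetical respondent, the included ES are rated both more important and less successfully produced than the excluded ones, so both covariate pairs point towards high demand relevance of the CE attributes.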

Choice experiment
A new result-oriented agri-environmental policy was introduced to the respondents by informing them that in the hypothetical new program, farmers would be paid for producing environmental benefits (see Vainio et al. 2019). The effects of the program were described with four attributes: traditional rural biotopes and endangered species, the typical agricultural landscape (divided into grazing animals and the number of plant species in cultivation), climate effects, and water quality effects. These attributes were described to the respondents and information was given regarding the current state of the attribute, as well as the different attribute levels. Table 2 presents the attributes together with their descriptions and levels.
Next, the survey explained that the new agri-environmental program would be financed with taxes and that, depending on the extent of the program, the cost to taxpayers would vary, but all taxpayers would participate in financing the program. Respondents were informed that the current program also causes expenses to citizens, amounting to approximately 40 euros per individual per year. This cost was based on expert judgement. Consequentiality in the CE, i.e. the belief that survey responses could have consequences, was promoted by stating that the information from the choice tasks would help decision-makers to revise the agri-environmental program.
After introducing the attributes and the new program, the respondents were presented with six choice tasks. Each choice task comprised three alternatives: the status quo alternative, described as maintaining the current program, and two alternatives with higher levels of ES compared to the current state. Each alternative was described with four ES attributes, their levels, and the cost attribute. The status quo alternative was identical across choice tasks. An example of a choice task is presented in Table 3.
To allocate the attribute levels to the choice tasks in the CE, an efficient experimental design was constructed. Efficient designs are used to generate parameter estimates with standard errors that are as low as possible, and thus to obtain the maximum information from each choice situation (see e.g. Rose and Bliemer 2009). The generation of efficient designs requires the specification of priors for the parameter estimates. We employed zero priors in the design of the pilot survey. In the final study, we employed a Bayesian D-efficient design using Ngene (v. 1.0.2), taking 500 Halton draws for the prior parameter distributions and using the parameter estimates obtained from the pilot study. Bayesian designs take into account the uncertainty related to the parameter priors. We used a Bayesian prior for the number of plant species in cultivation and fixed priors for all other attributes.
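To make the efficiency criterion concrete, the following is a minimal Python sketch of the D-error of a design under a multinomial logit model, D = det(I(β))^(−1/K), where I is the Fisher information and K the number of parameters, together with a Bayesian variant that averages the D-error over Halton draws for a single coefficient with a normal prior while the other priors are fixed, mirroring the prior set-up described above. It is not Ngene's algorithm: the function names, the normal prior, and the toy two-task design are hypothetical, and a real design search would additionally iterate over candidate designs to minimize this criterion.

```python
import math
from statistics import NormalDist


def halton(i, base=2):
    """i-th element (i >= 1) of the one-dimensional Halton (van der Corput) sequence."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def mnl_d_error(design, beta):
    """D-error = det(I)^(-1/K), with I the MNL Fisher information at beta.

    design: list of choice tasks; each task is a list of coded attribute
    vectors, one per alternative. Assumes I is non-singular.
    """
    k_par = len(beta)
    info = [[0.0] * k_par for _ in range(k_par)]
    for task in design:
        v = [sum(b * x for b, x in zip(beta, alt)) for alt in task]
        e = [math.exp(u - max(v)) for u in v]
        p = [x / sum(e) for x in e]
        # probability-weighted mean attribute vector of the task
        xbar = [sum(pj * alt[k] for pj, alt in zip(p, task)) for k in range(k_par)]
        for pj, alt in zip(p, task):
            dev = [alt[k] - xbar[k] for k in range(k_par)]
            for a in range(k_par):
                for b in range(k_par):
                    info[a][b] += pj * dev[a] * dev[b]
    return det(info) ** (-1.0 / k_par)


def bayesian_d_error(design, fixed_beta, random_index, prior_mean, prior_sd, n_draws=500):
    """Mean D-error when beta[random_index] ~ Normal(prior_mean, prior_sd),
    evaluated at Halton draws; all other priors are fixed."""
    prior = NormalDist(prior_mean, prior_sd)
    total = 0.0
    for r in range(1, n_draws + 1):
        beta = list(fixed_beta)
        beta[random_index] = prior.inv_cdf(halton(r))
        total += mnl_d_error(design, beta)
    return total / n_draws


# Hypothetical toy design: two tasks, two alternatives, two attributes
design = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
d_zero = mnl_d_error(design, [0.0, 0.0])  # zero priors, as in the pilot
d_bayes = bayesian_d_error(design, [0.0, -0.5], 0, 0.3, 0.1)
```

With zero priors, every alternative has the same choice probability, which is why the pilot design could be generated before any parameter estimates were available; the Bayesian version instead rewards designs that stay efficient across the plausible range of the uncertain coefficient.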
Overall, we generated a design with 36 choice tasks, blocked into six subsets, so that each respondent faced six choice situations. Four versions of the design were created using four different cost ranges: €5–300, €5–500, €40–500, and €40–300. Two of the cost ranges began from 5 euros, which is below the current contribution to the agri-environmental schemes. However, this was reasonable,

Table 2 (excerpt). Attributes, their descriptions, current state, and levels
Typical agricultural landscape
Current state: Grazing animals rarely appear in the landscape and few plant species are cultivated.
Levels, grazing animals: • seldom seen • seen often during the summer season • seen often during the summer and the unfrozen season
Levels, plants in cultivation: • 3, 4, or 5 species
Climate effects
Description: Agricultural greenhouse gas emissions contribute to climate change. Greenhouse gas emissions can be reduced by various cultivation practices and by capturing greenhouse gases.
Current state: The agricultural sector produces 11% of Finland's greenhouse gas emissions.
Levels: decrease in current emissions
Water quality effects (share of surface waters in a good or excellent state)
Description: About half of the nutrient runoff to waters comes from fields. This is affected by the amount of fertilizers used, cultivation practices, and annual weather conditions.
Current state: About 60% of the surface waters are in good or excellent condition.

Statistical models
Random utility theory (McFadden 1974) provides a framework for modeling choices between alternatives in a CE. Individual $n$ maximizes utility by selecting the alternative $i$ with the highest utility $U_{ni}$ from the $J_n$ alternatives in choice set $C_n$. The random utility model assumes that utility comprises a deterministic component ($v$) and a random component ($\varepsilon$):

$$U_{ni} = v(x_i, z_n) + \varepsilon_{ni} \qquad (1)$$

where $x_i$ is the vector of attributes describing alternative $i$, $z_n$ is the vector of characteristics describing individual $n$, and $\varepsilon_{ni}$ is the error term. From the respondent's perspective, the choice is assumed to be deterministic, but because the researcher cannot observe everything, the error term reflects the researcher's uncertainty about the choice (Holmes and Adamowicz 2003). As the error term is not observed, only probabilistic statements can be made about choice behavior. The probability that individual $n$ chooses alternative $j$ from all of the alternatives in choice set $C_n$ can be expressed as:

$$P_{nj} = \Pr\left(U_{nj} > U_{ni}\ \forall\, i \in C_n,\ i \neq j\right) = \Pr\left(v(x_j, z_n) + \varepsilon_{nj} > v(x_i, z_n) + \varepsilon_{ni}\ \forall\, i \neq j\right) \qquad (2)$$

Rearranging equation (2) shows that choices are made based on the differences in utilities derived from different alternatives:

$$P_{nj} = \Pr\left(\varepsilon_{ni} - \varepsilon_{nj} < v(x_j, z_n) - v(x_i, z_n)\ \forall\, i \neq j\right) \qquad (3)$$

Utilities derived from choice models are ordinal; hence, only utility differences matter, and the absolute level of utility is meaningless (Hensher et al. 2015).

Choice experiments have traditionally been modeled with the conditional logit (CL) model (McFadden 1974). However, the CL model assumes the same preference structure for all respondents, which is often unrealistic, and identifying heterogeneous respondent segments has been of interest in many studies. As we were interested in heterogeneity, possibly caused by the attribute selection, we used the latent class (LC) model. This approach allows for heterogeneity in preferences (Boxall and Adamowicz 2002) by simultaneously assigning individuals to latent segments and estimating a choice model within these classes. Preferences are assumed to be homogeneous within each latent class but to vary between classes.
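Under the conditional logit's standard assumption of independent, identically distributed type I extreme value errors, the choice probability takes the closed form P_nj = exp(v_nj) / Σ_i exp(v_ni). The minimal Python sketch below computes it for one choice set and also illustrates the point made above that adding the same constant to every alternative's utility leaves the probabilities unchanged. The function name and utility values are hypothetical.

```python
import math


def logit_probabilities(utilities):
    """Conditional logit probabilities P_j = exp(v_j) / sum_i exp(v_i) for one choice set."""
    m = max(utilities)  # subtracting the maximum avoids overflow and cancels out
    e = [math.exp(v - m) for v in utilities]
    s = sum(e)
    return [x / s for x in e]


# Hypothetical deterministic utilities for the status quo, Program X and Program Y
v = [0.0, 1.2, 0.7]
p = logit_probabilities(v)
# Adding the same constant to every utility keeps all differences unchanged
p_shifted = logit_probabilities([u + 10.0 for u in v])
```

Subtracting the maximum utility before exponentiating is the same invariance put to practical use: it prevents numerical overflow without changing the result.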
In the latent class model, the joint probability that respondent $n$ belongs to segment $q$ and selects alternative $i$ from a total of $J$ alternatives is:

$$P_{niq} = \pi_{nq}\,\frac{\exp(\mu\,\beta_q' x_i)}{\sum_{j=1}^{J}\exp(\mu\,\beta_q' x_j)}, \qquad \pi_{nq} = \frac{\exp(\lambda_q' z_n)}{\sum_{s=1}^{Q}\exp(\lambda_s' z_n)}$$

where $x_i$ is a vector of attributes of alternative $i$, $\beta_q$ is a vector of parameters representing the preferences associated with each attribute in class $q$, and $\mu$ is the scale parameter, which is set to one for all classes in the standard LC model. The class membership probability $\pi_{nq}$ is a multinomial logit function of the individual characteristics $z_n$ with class-specific parameters $\lambda_q$ (normalized to zero for one class), and $Q$ is the number of classes. The LC model is estimated first with one class, then with two classes, three classes, and so on (a latent class model with only one class reduces to a conditional logit model). At each step, the explanatory power of the model is assessed to select the optimal number of classes. Several information criteria can be used for this purpose, including the Bayesian (BIC), Akaike (AIC), and consistent Akaike (CAIC) information criteria. The LC model also enables the estimation of WTP for ES with various attribute combinations for different respondent segments.
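A minimal Python sketch of the latent class probability for one respondent and one choice set, assuming the usual specification in which class membership follows a multinomial logit in the respondent's covariates (one class normalized to zero) and choices within a class follow a conditional logit. The function names, covariates, and parameter values are hypothetical. In an estimated panel model, the within-class choice probabilities would be multiplied over all of a respondent's choice tasks before weighting by the membership probabilities.

```python
import math


def softmax(values):
    m = max(values)
    e = [math.exp(v - m) for v in values]
    s = sum(e)
    return [x / s for x in e]


def lc_joint_probabilities(x_alts, betas, z, lambdas, mu=1.0):
    """Joint probabilities P(class q, choice i) for one respondent and one choice set.

    x_alts:  attribute vectors, one per alternative
    betas:   class-specific preference vectors beta_q
    z:       respondent covariates entering the class-membership logit
    lambdas: class-membership parameter vectors (reference class set to zeros)
    mu:      scale parameter, fixed to 1 for all classes in the standard LC model
    """
    membership = softmax([sum(l * zk for l, zk in zip(lam, z)) for lam in lambdas])
    joint = []
    for pi_q, beta_q in zip(membership, betas):
        utilities = [mu * sum(b * x for b, x in zip(beta_q, alt)) for alt in x_alts]
        joint.append([pi_q * p for p in softmax(utilities)])
    return membership, joint


# Hypothetical two-class example; attributes are (ES level, cost in euros)
x_alts = [[0, 40], [1, 100], [2, 250]]
betas = [[1.0, -0.002], [0.4, -0.02]]
z = [1.0, 4.0]                      # constant and a demand-relevance score
lambdas = [[0.0, 0.0], [2.0, -0.6]]
membership, joint = lc_joint_probabilities(x_alts, betas, z, lambdas)
# Unconditional choice probabilities: sum the joint probabilities over classes
choice_probs = [sum(row[i] for row in joint) for i in range(len(x_alts))]
```

Because the covariates enter only the membership logit, a respondent's demand-relevance scores shift which class-specific preferences are weighted most heavily, not the preferences themselves.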
In modeling the CE data, a conditional logit model was used as the baseline model. The LC model was used to examine the respondents' choices of agri-environmental policies while allowing for heterogeneity in preferences. To specify preference heterogeneity in the LC model, we estimated models with 1 to 8 classes and compared their BIC and CAIC values. Estimation was carried out using Latent GOLD 5.1, and the models were estimated using effects coding. The preferred number of preference classes was five. As we were interested in how the demand relevance of the selected attributes in the CE is related to respondent classes, the two components of demand relevance (importance and success in supply) were used as active covariates in the class membership function of the LC model. A likelihood ratio test was used to compare the performance of models including only the traditional component of demand relevance, i.e. importance, with models expanded to also include the perceived success of supply. Income was also tested as a covariate, but it was not statistically significant.
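The model-selection and model-comparison steps above can be sketched in a few lines of Python. The information-criteria functions use the standard definitions (AIC = −2LL + 2k, BIC = −2LL + k ln N, CAIC = −2LL + k(ln N + 1)); the sample size N used in the penalty and the fits passed to the hypothetical helper `preferred_classes` are assumptions for illustration. The likelihood ratio test uses the closed-form chi-square tail probability that holds for an even number of degrees of freedom, which covers the df = 8 comparison; applied to the log-likelihoods reported in the Results, it gives LR ≈ 329.34, matching the reported 329.35 up to rounding of the log-likelihoods.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC and consistent AIC (CAIC); lower values indicate a better trade-off.

    n_obs is the sample size in the penalty term (assumed here to be the
    number of respondents, which is common for panel choice data).
    """
    dev = -2.0 * log_likelihood
    return {
        "AIC": dev + 2 * n_params,
        "BIC": dev + n_params * math.log(n_obs),
        "CAIC": dev + n_params * (math.log(n_obs) + 1),
    }


def preferred_classes(fits, n_obs, criterion="BIC"):
    """fits maps the number of classes to (log_likelihood, n_params)."""
    return min(fits, key=lambda q: information_criteria(*fits[q], n_obs)[criterion])


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df:
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    h, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= h / k
        total += term
    return math.exp(-h) * total


def likelihood_ratio_test(ll_restricted, ll_full, df):
    """LR = 2 (LL_full - LL_restricted), compared with chi-square(df)."""
    lr = 2.0 * (ll_full - ll_restricted)
    return lr, chi2_sf_even_df(lr, df)


# Hypothetical fits for 1-3 classes (illustrative numbers, not from this study)
fits = {1: (-1000.0, 5), 2: (-900.0, 11), 3: (-895.0, 17)}
best = preferred_classes(fits, n_obs=500)

# Log-likelihoods reported in the Results for the two covariate specifications
lr, p = likelihood_ratio_test(-9507.66, -9342.99, df=8)
```

In the hypothetical fits, the third class improves the log-likelihood by only five units at the cost of six extra parameters, so both BIC and CAIC favor two classes; CAIC penalizes each parameter by one unit more than BIC and therefore tends to select fewer classes.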
As knowledge of the value of ES is essential in designing new agri-environmental policies, we also estimated the WTP for different attribute levels. When effects coding is used, the monetary values of all attribute levels can be calculated with the following formula:

$$WTP_{x\alpha} = -\frac{\beta_{x\alpha}}{\beta_{\mathrm{cost}}}$$

where $x\alpha$ is level $\alpha$ of attribute $x$, $\beta_{x\alpha}$ is its coefficient, and $\beta_{\mathrm{cost}}$ is the coefficient of the cost attribute; under effects coding, the coefficient of the base level equals minus the sum of the estimated coefficients of the other levels of the attribute. The reported measures are the annual WTP per individual for a ten-year period between 2017 and 2026. As the models were estimated using effects coding, the WTP for moving from one attribute level to another was obtained as the difference between the WTP values of the two levels. Confidence intervals were calculated with the delta method. Non-significant measures are not reported and can be interpreted as zero.
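A minimal Python sketch of the WTP calculation described above, assuming the standard ratio WTP = −β/β_cost and effects coding in which the base level's part-worth is minus the sum of the estimated level coefficients. The delta-method standard error uses the gradient of the WTP ratio with respect to the attribute and cost coefficients. The function names, coefficients, and covariance matrix are hypothetical, not estimates from this study.

```python
import math


def effects_code(level, n_levels):
    """Effects coding: base level 0 is -1 in every column; level k > 0 is 1 in column k-1."""
    if level == 0:
        return [-1] * (n_levels - 1)
    return [1 if col == level - 1 else 0 for col in range(n_levels - 1)]


def wtp_change(coefs, beta_cost, cov, level_from, level_to, z=1.96):
    """WTP for moving an attribute from level_from to level_to, with a delta-method CI.

    coefs: effects-coded coefficients of the attribute (n_levels - 1 values);
           the part-worth of any level equals effects_code(level) . coefs
    cov:   covariance matrix of (coefs..., beta_cost) as nested lists
    """
    n_levels = len(coefs) + 1
    c = [t - f for t, f in zip(effects_code(level_to, n_levels),
                               effects_code(level_from, n_levels))]
    diff = sum(ci * bi for ci, bi in zip(c, coefs))  # part-worth difference
    wtp = -diff / beta_cost
    # Gradient of -diff / beta_cost with respect to (coefs..., beta_cost)
    grad = [-ci / beta_cost for ci in c] + [diff / beta_cost ** 2]
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(len(grad)) for j in range(len(grad)))
    se = math.sqrt(var)
    return wtp, (wtp - z * se, wtp + z * se)


# Hypothetical three-level attribute: coefficients of levels 1 and 2
# (the base level's part-worth is -(0.3 + 0.5) = -0.8) and a cost coefficient
coefs, beta_cost = [0.3, 0.5], -0.01
cov = [[0.01, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 1e-6]]
wtp, ci = wtp_change(coefs, beta_cost, cov, level_from=0, level_to=2)
```

Expressing every level's part-worth as a weight vector over the estimated coefficients lets the same gradient formula handle moves to, from, or between non-base levels; with a full covariance matrix, the off-diagonal terms enter the variance automatically.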

Results
The results demonstrate that the respondents generally supported the policy programs, as the status quo option was selected on average in 19% of the choice sets. The conditional logit was estimated as the reference model. The results are reported in the first column of Table 4. An increase in the cost of the policy program negatively affected the probability of choosing it, as expected. All ES attributes were significant, except the number of plant species in cultivation, and improvements in the state of ES attributes increased the probability of choice. As we were interested in the heterogeneity in respondents' preferences and how the importance of ES, as well as the perceived success in their production, differs between respondents, an LC model was estimated (Table 4).
McFadden's pseudo-R² and the information criteria (BIC and CAIC) confirmed a better fit for the five-class model than for the CL model. Regarding the covariates, i.e. the two components of demand relevance, we tested two model options: the first included only importance (LL −9507.66) and the second included both importance and the perceived success of supply (LL −9342.99) as covariates. The second model performed better than the first, as confirmed by a likelihood ratio test for nested models (LR 329.35, df 8, p < 0.001). The LC model for the choice of policy program revealed significant heterogeneity between the classes for all attributes (Figure 1 in Appendix 2). In the LC model, Class 1 was the largest, comprising one-third of the respondents. These respondents frequently chose the policy programs over the status quo, and all of the program attributes except the number of plant species in cultivation were significant. Their preferences were strongest for biodiversity and climate effects. The cost of the program was significant, but the coefficient was very small, indicating low sensitivity to cost. Perceiving both the ES included in the CE and those excluded from it as important increased the probability of belonging to Class 1. As these respondents perceived all ES as important, they could be described as 'environmentalists'. This label was reinforced by their support for improvements in environmental quality and their low sensitivity to cost, which indicates a high WTP for environmental improvements.
Class 2 was also large, comprising approximately 31% of the respondents. All attributes were significant and of the expected sign, except for the number of plant species in cultivation. Similarly to Class 1, these respondents favored the new policy programs over the status quo option. However, in contrast to Class 1, landscape and water quality effects, i.e. the ES more directly benefiting humans, affected their choices the most. Considering the ES included in the CE important but perceiving their current production as unsuccessful increased the probability of belonging to this class. Additionally, the perception that the production of ES excluded from the CE has failed decreased the probability of class membership. Therefore, this group was named 'scenario focused', implying that for these respondents in particular, the CE included attributes that were not only important but whose state they felt should be improved.
Perceiving the ES excluded from the CE as important decreased the probability of membership in Class 3 (18% of the respondents). The alternative-specific constants (ASCs) revealed that this class still chose the policy programs over the current program, but noticeably fewer attributes were significant, as grazing animals in the landscape and climate effects did not affect their choices. These respondents had the highest marginal utility of money and can therefore be called 'cost sensitives'.
Class 4 (9%) consisted of respondents who more often chose the status quo, i.e. the current program, than the other classes. Furthermore, respondents were more likely to belong to Class 4 if they perceived that none of the ES were important. Hence, this class was named 'SQ supporters'. This group had few significant attributes, and the highest level of water quality had an unexpected negative sign. The cost attribute was significant for this group.
The smallest class was Class 5, with just 8% of the respondents. The ASCs for this class reveal a peculiar response pattern, as these respondents seem to have favored the middle option, i.e. Program X. Class 5 differed from all the other classes, as the probability of class membership increased if respondents perceived the ES excluded from the CE as important and decreased if they perceived the included ES as important. Furthermore, the probability of belonging to Class 5 increased if respondents thought that the production of ES excluded from the CE had failed, but decreased if they perceived a failure to produce the ES included in the CE. Accordingly, Class 5 was named 'outsiders' to emphasize that these respondents considered some ES demand relevant, but these ES were not included in the CE. Of all the attributes, only climate effects had significant coefficients; even the cost attribute was not significant.

As the CE also included cost levels that were lower than that of the SQ alternative, we checked the choices for violations of axiomatic preferences. A violation of axiomatic preference behavior occurred in 6% of the choice situations in which one of the policy alternatives cost less than or the same as the status quo and its other attributes were at least as good as in the SQ option. All latent classes differed significantly from each other in the share of respondents who had violated this assumption in at least one of the choice tasks (Class 1 0.9%, Class 2 4.9%, Class 3 13.9%, Class 4 97.9%, and Class 5 78.8% of the respondents). The shares for Classes 4 and 5 were considerably higher than for the other classes. However, as these classes contained the SQ supporters and the respondents favoring the middle option, this result is not surprising.

Table 5 presents WTP estimates calculated from both the CL and the LC model. The results show differences in the welfare estimates between the classes.
Class 1 had the highest WTP for traditional rural biotopes and endangered species and for decreases in agricultural greenhouse gas emissions. It is noteworthy that even the lowest WTP for Class 1 (for water quality effects) was higher than the highest WTP of any other class. This highlights the low cost sensitivity of this class. Class 2 had the highest WTP for grazing animals in the landscape and for water quality improvements. The WTP estimates for Class 3 show that these respondents had a very low WTP for all the significant attributes. For Class 4, WTP could only be estimated for the number of plant species in cultivation and for water quality effects. The WTP for the lower level of improvement in water quality was positive, but the WTP for the higher level of improvement was, rather peculiarly, negative. For Class 5, it was not possible to calculate WTP estimates at all, since the cost coefficient was not significant for this group.
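For a utility function that is linear in cost, marginal WTP is conventionally computed as the coefficient ratio WTP_k = -β_k / β_cost. The minimal sketch below uses invented coefficients and a hypothetical helper `marginal_wtp`. It shows why no WTP is reported when the cost coefficient is not significant, as for Class 5, and how a negative attribute coefficient turns into a negative WTP, as for the higher water quality level in Class 4.

```python
def marginal_wtp(coefs, cost_p_value, cost_key="cost", alpha=0.05):
    """Marginal WTP_k = -beta_k / beta_cost for every non-cost coefficient.

    Returns None when the cost coefficient is not significant at level alpha
    (or is not negative), mirroring the decision not to report WTP in that case.
    """
    beta_cost = coefs[cost_key]
    if cost_p_value > alpha or beta_cost >= 0:
        return None
    return {k: -b / beta_cost for k, b in coefs.items() if k != cost_key}

# Invented class-specific coefficients (not the estimates in Table 5).
class_a = {"cost": -0.01, "water_quality_low": 0.30, "water_quality_high": -0.20}
print(marginal_wtp(class_a, cost_p_value=0.001))
# A class whose cost coefficient is not significant gets no WTP estimates.
print(marginal_wtp({"cost": -0.001, "climate": 0.4}, cost_p_value=0.62))
```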

Discussion and conclusions
This study examined the heterogeneity in citizens' preferences towards agricultural ES and how the demand relevance of ES affected the valuation survey responses. Our analysis revealed heterogeneity in the respondents' preferences, as five respondent groups with differing preferences were identified: environmentalists, scenario focused, cost sensitives, status quo supporters, and outsiders. We also estimated citizens' policy-relevant WTP for four agricultural ES: traditional rural biotopes, landscape, climate effects, and water quality effects. The relative importance of the environmental attributes clearly varied between the classes. One-third of the respondents,  Bernués et al. (2015), in our results, heterogeneity was particularly observed in the biodiversity-related attribute instead of landscape attributes (Novikova et al. 2017). As our attribute selection focused on non-market ecosystem services, we did not find the same distinction between respondents as Rodríguez-Ortega et al. (2016), who classified respondents into productivists and conservationists based on their preferences for agricultural ES. However, similarly to their results, we found an emphasis on non-use values in some classes and on use values in others. Revealing the heterogeneity in citizens' preferences is important for policy formulation, as it shows which groups benefit most from emphasizing particular ecosystem services in policy design. The use-value-oriented Classes 2, 3, and 4 (50% of respondents), focusing on water quality, have particularly benefited from the current policy, since reducing the eutrophication of surface waters caused by the nutrient load from agriculture has been the key objective of the agri-environment scheme in Finland. Due to these measures, the nutrient load has declined slightly in the majority of regions (Grönroos 2014; Hyvönen et al. 2020).
The 'environmentalists' (Class 1, with one-third of respondents) focused on ES related to wildlife and nature and supported ES improvements even at higher cost levels. The strong preferences of these respondents for the biodiversity and climate attributes are understandable in light of the current agri-environmental policy. Biodiversity is continuously declining in agricultural areas. However, biodiversity conservation has had relatively low weight in the agri-environmental scheme, and the conservation measures have not been implemented over a sufficient area. Efforts have been made to formulate measures to reduce greenhouse gas emissions, but their implementation has been ineffective, as the measures have not been targeted at the areas where they would be most efficient (Grönroos 2014; Hyvönen et al. 2020).
In addition to preference heterogeneity among respondents, we examined how the demand relevance of the selected ecosystem services, specified as attributes in the CE, was related to the heterogeneous classes. We constructed four covariates based on the importance respondents assigned to the ES provided by the agricultural environment and on their perceptions of the success of agricultural policies in providing these ecosystem services. Classes 1 and 2, comprising 65% of respondents, perceived the included attributes as highly relevant, both in terms of importance and in terms of unsuccessful current supply. The choices of these respondents were reasonable and allowed WTP to be calculated. Class 3 (18% of the respondents), for whom the included attributes had low relevance on both measures, i.e. importance and current supply, had noticeably fewer significant attributes, and its choices were mostly driven by the cost attribute. This led to low WTP estimates. Class 4, perceiving that none of the ES were important and that the attributes included in the CE were already successfully supplied, showed strong support for the SQ. These choice patterns suggest that choices driven only by cost and a tendency to select the SQ indicate a lower perceived relevance of the attributes in the choice experiment. If the attributes in a choice experiment have low demand relevance for the respondent, this could lead to low motivation in making choices and thus to more random choices or to the use of heuristics. The results demonstrated this particularly in Class 5, where the ASCs indicated that these respondents were likely to choose the first of the two programs. This may imply that the absence of attributes that respondents consider demand relevant leads them to use simplifying strategies, such as selecting the option in the middle. In addition, respondents did not pay attention to the cost attribute if they considered the selected ES irrelevant.
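In a latent class model with covariates, the covariates enter the class membership function rather than the utility function. Assuming a multinomial logit membership model, P(c | z) = exp(γ_c0 + γ_c'z) / Σ_j exp(γ_j0 + γ_j'z) with one class normalized to zero, a minimal sketch is given below. We assume the four covariates are binary indicators for the importance of included ES, the importance of excluded ES, a perceived failure to supply included ES and a perceived failure to supply excluded ES; these codings and all γ values are hypothetical and only reproduce the signs reported for the 'outsiders' class.

```python
import math

def membership_probabilities(gammas, z):
    """Multinomial logit class membership:
    P(c | z) = exp(g_c0 + g_c'z) / sum_j exp(g_j0 + g_j'z).
    gammas: list of (intercept, coefficients) per class; the reference class is all zeros.
    """
    scores = [g0 + sum(g * zk for g, zk in zip(gs, z)) for g0, gs in gammas]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Covariate order (assumed): included ES important, excluded ES important,
# included ES supply failed, excluded ES supply failed (each coded 0/1).
gammas = [
    (0.0, [0.0, 0.0, 0.0, 0.0]),     # reference class
    (-1.0, [-1.0, 1.5, -0.8, 1.2]),  # hypothetical 'outsiders'-type class
]
fits_included = membership_probabilities(gammas, [1, 0, 1, 0])
fits_excluded = membership_probabilities(gammas, [0, 1, 0, 1])
print(f"P(outsiders) if included ES matter: {fits_included[1]:.2f}")
print(f"P(outsiders) if excluded ES matter: {fits_excluded[1]:.2f}")
```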
A non-significant cost coefficient can lead to unrealistically high WTP estimates if it goes undetected. As stated by Lagarde (2013), 'inferring anything about the WTP for all respondents is misleading, and researchers should try to reflect better the heterogeneity of valuations.' For example, Scarpa et al. (2009) reported very high WTP estimates when the respondents' non-attendance to the cost attribute was not taken into account in modeling. However, compared to the relatively high share of respondents ignoring the cost attribute in previous literature, ranging from 36 to 90% (Scarpa et al. 2009; Lagarde 2013; Van Zanten et al. 2016), Class 5 in our study, with a non-significant cost attribute, consisted of only 8% of the respondents. Nevertheless, this result highlights the importance of modeling preference heterogeneity, as assuming homogeneous preferences when some respondents ignore some of the attributes can distort the results for the whole sample.
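Why an imprecisely estimated cost coefficient inflates WTP can be illustrated with the Krinsky and Robb simulation procedure, a common way to obtain confidence intervals for WTP ratios. In the sketch below, which uses NumPy, the coefficients, standard errors and zero covariance between them are invented assumptions; the attribute coefficient is identical in both cases, and only the precision of the cost coefficient differs.

```python
import numpy as np

def krinsky_robb_wtp(beta_attr, beta_cost, cov, n_draws=10_000, seed=0):
    """Simulate WTP = -beta_attr / beta_cost from the coefficients' sampling
    distribution; returns (median, 2.5th percentile, 97.5th percentile)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal([beta_attr, beta_cost], cov, size=n_draws)
    wtp = -draws[:, 0] / draws[:, 1]
    low, high = np.percentile(wtp, [2.5, 97.5])
    return float(np.median(wtp)), float(low), float(high)

# Invented estimates: same attribute coefficient, different cost precision.
precise = krinsky_robb_wtp(0.6, -0.02, np.diag([0.1**2, 0.002**2]))     # cost z = 10
imprecise = krinsky_robb_wtp(0.6, -0.002, np.diag([0.1**2, 0.003**2]))  # cost z of about 0.7
print("significant cost:     median %.1f, 95%% interval [%.1f, %.1f]" % precise)
print("non-significant cost: median %.1f, 95%% interval [%.1f, %.1f]" % imprecise)
```

With the non-significant cost coefficient, many draws lie close to zero or on the wrong side of it, so the simulated interval becomes extremely wide and includes negative values; a point estimate computed from such a coefficient would not be credible.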
Our results support the views of Jeanloz et al. (2016) about the importance of a careful attribute selection process, especially when complex goods or ecosystem services are in question. Before conducting surveys, it would be beneficial to use focus groups not only to examine which attributes are important to the respondents, but also to explore how respondents perceive the current state of the attributes and whether there is a need for improvement (Armatas et al. 2014). We would also encourage authors to describe the attribute selection process in greater detail (see also Coast and Horrocks 2007; Coast et al. 2012; Abiiro et al. 2014), as reporting how the attributes for the CE were selected would improve the credibility of many papers.
In this study, our attribute selection process can be regarded as rather successful, as the attributes included in the CE were relevant to 79% of the respondents. Even so, the attribute describing a typical agricultural landscape in terms of the number of plant species in cultivation was problematic (see Note 2), as it was clearly less significant than the other attributes. The agricultural landscape is known to be an important ecosystem service (Pouta and Hauru 2015), but converting it into simple indicators for a valuation survey is challenging (Grammatikopoulou et al. 2012). One possible solution would be to use pictures to help visualize the changes in the landscape (e.g. Bernués et al. 2014; Häfner et al. 2018). However, using pictures for only one attribute is not advisable, as that attribute may become too dominant.
In our study, using both the importance of agricultural ecosystem services and perceptions of the success in providing these services to examine the demand relevance functioned rather well. Our results suggest that the demand relevance of attributes selected for the CE clearly affected the respondents' choices. This implies that if a comprehensive CE or the careful selection of attributes in a CE is not possible and demand relevant attributes are consequently excluded, anomalies are likely to occur. However, this can partly be addressed by allowing preference heterogeneity in the modeling.
Our study revealed heterogeneity in the demand relevance of the attributes included in the CE. This highlights that even when attributes are carefully selected, not all of them are likely to be relevant to all respondents. Moreover, from the point of view of policy processes, our results on the interaction between relevance and preference heterogeneity demonstrate that in wider policy planning, such as natural resource or agri-environmental policies, non-relevant attributes make it difficult to apply preference information in policy evaluation. Including a few ecosystem service-based attributes in a valuation study is a limited way of integrating the versatility of public preferences and associated values into decision-making. Hence, in future research, it would be interesting and necessary to develop approaches to address this issue. One solution would be a two-step, iterative, personalized CE approach in which only the attributes that a respondent perceives as demand relevant are included in his or her CE. Another approach would be to divide citizens into groups in the first phase of a survey and implement separate surveys for these groups, focusing on different ecosystem services.
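The attribute selection step of a two-step personalized CE could, for instance, work as in the minimal sketch below. In the first step the respondent rates each candidate ES, and the second-step CE uses only the ES that are both important to the respondent and perceived as insufficiently supplied, reflecting the extended notion of demand relevance used in this study. The rating scale, threshold, cap on the number of attributes and attribute names are all hypothetical; cost is always retained so that WTP can still be estimated.

```python
def personalized_attributes(ratings, min_importance=4, max_attributes=4):
    """Select the ES to include in a respondent's second-step CE.

    ratings maps ES name -> (importance on a 1-5 scale, supply judged unsuccessful).
    An ES is treated as demand relevant only if it is important AND its current
    supply is perceived to have failed. Cost is always appended.
    """
    relevant = [es for es, (importance, failed) in ratings.items()
                if importance >= min_importance and failed]
    relevant.sort(key=lambda es: (-ratings[es][0], es))  # most important first, ties by name
    return relevant[:max_attributes] + ["cost"]

# Hypothetical first-step answers from one respondent.
answers = {
    "water_quality": (5, True),
    "landscape": (2, True),
    "biodiversity": (4, True),
    "climate": (5, False),
    "pollinators": (4, True),
}
print(personalized_attributes(answers))
# ['water_quality', 'biodiversity', 'pollinators', 'cost']
```

Here 'climate' is dropped despite its high importance because its supply is not perceived as having failed, which is where this selection rule differs from one based on importance alone.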
A third alternative would be to increase the number of ES attributes in a survey by using a partial profile CE (PPCE). In a PPCE, the respondent is presented with only a subset of a wider selection of attributes (Chrzan 2010). This method can be used to address a considerably larger number of attributes than a regular CE. However, it has mainly been used in marketing, health care, and transportation studies, and there have been few applications in environmental valuation. Hence, its suitability for ES valuation requires further research. A fourth option, which in some cases may serve policy planning better than one uniform survey, is a deliberative valuation process or participatory planning. In this approach, different themes related to a wider topic would be processed by separate groups of interested participants, and both relevance and heterogeneity would be taken into account.
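The partial profile idea can be sketched as a task generator, assuming that each task shows a random subset of k ES attributes, shared by all alternatives in that task, and that cost is always shown so that WTP remains estimable. This only illustrates the structure of partial profiles with random level draws; it is not an efficient or balanced design of the kind discussed by Chrzan (2010), and the attribute names and levels are hypothetical.

```python
import random

def partial_profile_tasks(attributes, levels, n_tasks, k, n_alternatives=2, seed=1):
    """Generate choice tasks in which each task shows only k of the ES attributes.

    Each alternative is a dict {attribute: level}; 'cost' is always included.
    Levels are drawn at random, so the design is not optimized for efficiency.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        shown = rng.sample(attributes, k)  # same subset for all alternatives in a task
        task = []
        for _ in range(n_alternatives):
            profile = {a: rng.choice(levels[a]) for a in shown}
            profile["cost"] = rng.choice(levels["cost"])
            task.append(profile)
        tasks.append(task)
    return tasks

# Hypothetical attribute pool, larger than a regular CE could comfortably hold.
pool = ["biotopes", "grazing_animals", "plant_species", "climate",
        "water_quality", "pollinators", "soil_quality", "flood_regulation"]
levels = {a: [0, 1, 2] for a in pool}
levels["cost"] = [5, 20, 50, 100]
for task in partial_profile_tasks(pool, levels, n_tasks=2, k=3):
    print(task)
```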

Notes
1. There were no differences in the shares of the different bid vectors between the latent classes, except for a slightly higher share of bid vector €5-500 in Class 5. A closer analysis of the effect of the different bid vectors is provided by the authors in a separate forthcoming paper.
2. A similar survey was conducted among farmers in the same project, and the number of plant species in cultivation was significant in the farmer survey (Tienhaara et al. 2020). Even though the attribute was not significant in the pilot study for citizens, it had to be retained in the choice experiment to keep the two surveys identical.

Disclosure statement
No potential conflict of interest was reported by the author(s).