The Participatory Index of Women’s Empowerment: development and an application in Tunisia

ABSTRACT In this paper we develop the Participatory Index of Women’s Empowerment, an innovative measurement tool that reflects its subjects’ own perceptions of empowerment. Participatory measurement is a response to the paradoxical potential for measurement of empowerment to disempower. A simple stated choice experiment allows participants to implicitly reveal the trade-offs that they make between different indicators of empowerment. This permits participatory determination of the relative weights for each indicator in a composite index, through estimation of a random utility model. We demonstrate the implementation of PIWE through a pilot application in the context of a quasi-experimental impact evaluation of an Oxfam project in Tunisia. Despite a relatively small sample size, we can reject the hypothesis that participants’ perceptions of empowerment are consistent with equal weights. We find that the project had a significant positive impact on participants’ empowerment and find suggestive evidence of impact on their perceptions of empowerment.


Introduction
There is a broad consensus in the development sector around the importance of the fifth Sustainable Development Goal, 'to achieve gender equality and empower all women and girls', as adopted by the United Nations General Assembly (2015). Many development agencies explicitly target women's equality and empowerment (UN Women, 2014). Consequently, development practitioners and researchers face the challenge of how to define and measure the intangible concept collectively eliciting from women whose empowerment is being measured both the indicators of empowerment that for them are most important and the relative weights that should be assigned to those indicators in an index of empowerment.
In the next section, we examine the case for participatory measurement, identify a gap in the availability of participatory methods for assigning weights to indicators and develop PIWE to fill it. In section three we demonstrate, through a pilot application in the evaluation of an Oxfam project in Tunisia, that PIWE may be operationalised in a quantitative study by embedding a simple stated choice exercise into the same survey questionnaire that is used to measure respondents' indicators of empowerment. We present the results of our analysis to generate weights for the index and its application to evaluation of the project in section four. We discuss our results in section five, highlighting practical and conceptual considerations that emerged from our pilot implementation of PIWE. Section six concludes.

Development of PIWE
In this section, we establish the case for participatory approaches as a response to paradoxes inherent in the measurement of empowerment. We identify a gap in the availability of participatory methods for assigning weights to the indicators that constitute indices of empowerment, and develop PIWE as a new approach to address that gap.

The case for participatory measurement of empowerment
We explore the concept of empowerment and the paradoxes inherent in its measurement, establishing the important role of participatory approaches to partially resolve these paradoxes.

Definitions of empowerment
Empowerment was defined by Kabeer (1999), building on the concepts of agency and capability introduced by Sen (1985a, 1985b), as 'the expansion in people's ability to make strategic life choices in a context where this ability was previously denied to them' (p. 437). She recognises that individual agency is necessary but not sufficient to achieve this, emphasising that 'structural inequalities cannot be addressed by individuals alone' (p. 457).
The typology of power developed by Rowlands (1995) provides a complementary framework. She distinguishes between power over ('controlling power'), power to ('generative power, . . . [creating] new possibilities and actions without domination'), power with (emerging from collective action) and power from within (based in 'self-acceptance and self-respect', extending 'to respect for and acceptance of others as equals'), thereby identifying aspects of empowerment of all forms (Rowlands, 1997, p. 13). VeneKlasen and Miller (2002) apply Rowlands' framework to the experiences of grassroots activists to explore empowerment as a 'process and the result of the process' to 'address the negative forms and results of power over' (p. 45, p. 53).

The paradox of externally mediated or measured empowerment
Giving an example from a project in Kenya, Rowlands (1995) discusses the potential for outsiders, as mediators of empowerment in a development context, to engage in ways that are ineffective or even actively disempower 'local people' (p. 105). This reflects the inherent problems of mediated empowerment discussed by Rocha (1997), which we interpret as perpetuating disempowering power over.
Reflecting on empirical studies of the impact of microcredit on women's empowerment in Bangladesh, Aslanbeigui et al. (2010) observe that '[t]he empowered woman is conceived as a construct, an artifact of specialists' (p. 191). Discussing the activities of these specialists or 'experts', they conclude that, paradoxically, '[e]mpowerment, therefore, is mainly a consequence of what is done to women as opposed to what they do on their own behalf' (p. 191, our emphasis).
These concerns apply just as much to the measurement of empowerment as to its conceptualisation and promotion. Writing from a 'Southern' development practitioner's perspective, Taylor (2000) highlights the potential for the external measurement of empowerment to disempower, arguing that 'measurement of empowerment must not become something that the more powerful do to the less powerful' (p. 12). A similar concern about the power over dynamic maintained by expert measurement of empowerment is expressed from a public health perspective by Raj (2020). Bridges (2001), in his treatment of the ethics of outsider research, examines the claim that any research conducted by outsiders into the experience of disempowered communities is intrinsically disempowering. Bridges recognises the potential for reinforcement of disempowerment through outsider research but argues that to avoid it entirely is 'not, any more than the paternalism of the powerful, the route to a more just society' (p. 382). He explores the character of participatory research approaches that have potential to reconcile this tension.

Participatory measurement as reconciler and another paradox
We might conclude that the adoption of appropriate participatory approaches could resolve the inherently disempowering aspect of external measurement of empowerment. A further paradox emerges, however. As Kabeer (1999) discusses, illustrating with examples from India, social norms play a fundamental role in disempowerment and it is natural for the disempowered to internalise values that justify their subordinate status. Khader (2011) explores in depth the role of these internalised values, describing empowerment as the process of overcoming 'inappropriately adaptive preferences' (p. 176). While acknowledging some problematic implications, Kabeer concludes that women's empowerment must be evaluated from an external normative standpoint, undermining the case for participatory measurement.
Despite the phenomenon of internalisation of disempowering values, we argue that there nevertheless remains an important role for participatory approaches to the measurement of empowerment. First, following Bridges' arguments, there is intrinsic value in implementing a participatory approach; the act of measurement will then at least not worsen, and may even to a small extent mitigate, the degree of disempowerment. Second, the extent of internalisation will be context-specific and thus a matter for empirical assessment. Even Khader (2011) argues for the important role of subjective data in identifying and elucidating divergences between subjective and external perspectives. By giving voice to participants, participatory approaches can reveal whether or not, and to what extent, they have internalised disempowering values. We conclude by noting that some participatory approaches may be more robust to internalisation, so careful choice of approach is important.

Existing measurement approaches
We explore the typical structure of existing indices that are used to measure empowerment and the role that participatory approaches have played in their development. This reveals a gap in the availability of participatory methods for assigning weights to the indicators that constitute an index. Kabeer (1999) identified three dimensions of empowerment: resources (material, human and social), agency and achievements. She discusses indicators to measure empowerment in each of these dimensions, arguing that their meanings, and thus their validity as measures of empowerment, are context-specific and determined by interrelationships between the three dimensions. An extensive literature on indicators of empowerment has since emerged. Examples include the indicators of agency proposed by Ibrahim and Alkire (2007) and the comprehensive documentation by Glennerster et al. (2018) of indicators of empowerment in several domains.

Indicators and indices
Many empirical studies have adopted as their outcome of interest a composite index of empowerment that aggregates indicators across several dimensions, not necessarily those identified by Kabeer (1999). A typical example is the Women's Empowerment Index (henceforth OxWEI), described in Lombardini et al. (2017) and Bishop and Bowman (2014), which was developed for Oxfam's Effectiveness Reviews (Hutchings, 2014). Acknowledging the context-specificity of empowerment, a number of individual-level binary indicators (indexed $j = 1, 2, \ldots, J$) are identified, through preliminary qualitative research, to reflect the characteristics of an empowered woman in the evaluation context. Each indicator is categorised by level (personal, relational or environmental) and assigned a weight $w_j$. The empowerment score attained by individual $i$ is then
$$\mathrm{OxWEI}_i = \sum_{j=1}^{J} w_j x_{ij}, \qquad (1)$$
where $x_{ij} = 1$ if she achieves indicator $j$ and $x_{ij} = 0$ if she does not. We observe that this index embodies an implicit definition of individual empowerment. Women with different profiles of achievements in its constituent indicators will be assessed as more, or less, empowered, while relative indicator weights represent the implicit trade-offs between different indicators.
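As a minimal sketch of the sum-of-binary-indicators structure described above, the following Python snippet computes an index score from a vector of binary achievements and a vector of weights. The function name, the four-indicator profile and the equal weights are illustrative assumptions, not taken from the OxWEI documentation.

```python
def composite_index(achievements, weights):
    """Weighted sum of binary indicator achievements (each x is 0 or 1)."""
    if len(achievements) != len(weights):
        raise ValueError("one weight is needed per indicator")
    if any(x not in (0, 1) for x in achievements):
        raise ValueError("achievements must be binary")
    return sum(w * x for w, x in zip(weights, achievements))


# Hypothetical example: four indicators with equal weights summing to 1.
weights = [0.25, 0.25, 0.25, 0.25]
profile = [1, 0, 1, 1]  # achieves indicators 1, 3 and 4
score = composite_index(profile, weights)
```

Changing the weights while holding the profile fixed changes the score, which is the sense in which the weights encode trade-offs between indicators.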
Other indices with a similar sum-of-binary-indicators structure include the uncensored 5DE component of the Women's Empowerment in Agriculture Index or WEAI (Alkire et al., 2013), the project-level (pro-)WEAI (Malapit et al., 2019) and the Women's Empowerment in Livestock Index or WELI (Galiè et al., 2019).We discuss the establishment of such composite indices as the typical measurement approach for women's empowerment, surveying these and other examples in more detail, in Appendix A and in the working paper version (Quinn & Lombardini, 2023).

Participatory measurement
Participatory approaches have been extensively applied to the selection of indicators of empowerment. Bishop and Bowman (2014) recognised the importance of involving people affected by a project in its evaluation. Consequently, wherever possible project participants contribute to the identification of context-specific characteristics of an empowered woman in the OxWEI construction process (Lombardini et al., 2017). Other contexts in which participatory approaches have been implemented include (i) the identification of indicators for the evaluation process of a grassroots social movement in Bangladesh documented by Jupp et al. (2010) and (ii) the choice of indicators and dimensions for the WELI (Galiè et al., 2019).
In each of these cases, participation was achieved through qualitative activities including focus group discussions and other participatory rural appraisal methods. As a qualitative exercise, selection of indicators is well-suited to such approaches. It is important to work with a sufficiently representative sample of participants to achieve saturation (Bowen, 2008), being conscious that self-selection of participants who already have the knowledge, self-confidence and time to participate may impede this, but statistical representativeness is not necessary.
Choice of indicators, however, is just one aspect of the implicit definition of empowerment embodied in an index; the assignment of weights to indicators is equally important. The few studies that have adopted a participatory approach to assignment of weights have asked small groups of participants to rank indicators, for example Vigneri and Lombardini (2017). There is no reason to expect ranking exercises to generate weights that reflect trade-offs between indicators consistent with participants' perceptions of empowerment. Furthermore, we argue that, as a quantitative exercise, the participants in a participatory weighting exercise should be representative of those whose empowerment is being evaluated. We propose an alternative approach that addresses these issues.

The Participatory Index of Women's Empowerment
Our starting point is the observation that any individual-level composite index of empowerment represents an ordering of alternative profiles of achievements in the various indicators of empowerment that comprise the index. By 'ordering' we mean a specification, for each profile, of which other profiles are considered more, less, or equivalently empowered. Given an index of the form (Equation 1), the choice of weights determines the ordering represented, and thus the definition of empowerment embodied by the index.
Turning this around, if we were able to observe the ordering of alternative profiles of empowerment indicators perceived by those whose empowerment is being evaluated, we could attempt to determine weights $w^p_j$ for each indicator $j$ such that the index represents the perceived ordering. This is our Participatory Index of Women's Empowerment (PIWE). In order to implement PIWE, we will need both a method to observe or elicit the perceived empowerment ordering and a method to determine weights $w^p_j$ such that the index represents the observed empowerment ordering.
Following Watson et al. (2019), who apply alternative preference-based methods to explore the robustness of the expert-based weights used in the UK's Index of Multiple Deprivation (IMD), we suggest that a discrete choice experiment (DCE) be implemented to elicit participants' perceived empowerment ordering. A DCE is a stated choice exercise in which participants are invited to express preferences amongst alternatives specified in terms of the discrete levels attained in each of several attributes (Hensher et al., 2015; Ryan et al., 2008). To implement PIWE, the alternatives will be hypothetical women, described in terms of their achievements (levels) in each of several candidate indicators (attributes) of empowerment. We will demonstrate in our pilot application that it is possible to achieve participation at scale in a quantitative study by embedding a DCE into the same survey questionnaire that is used to measure participants' indicators of empowerment.
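Having implemented a DCE, the PIWE weights may be recovered by application of standard discrete choice methods. Consider a participant $i$ who expresses which of two hypothetical women, 1 and 2, she considers more empowered. Their index scores are $\mathrm{PIWE}_1 = \sum_{j=1}^{J} w^p_j x_{1j}$ and $\mathrm{PIWE}_2 = \sum_{j=1}^{J} w^p_j x_{2j}$, where $x_{1j}$ and $x_{2j}$ are the hypothetical women's achievements in the various indicators; the weights $w^p_j$ are as yet unknown. Let participant $i$'s evaluation of 1 and 2's empowerment be $E_{1i}$ and $E_{2i}$ respectively, such that
$$E_{1i} = \mathrm{PIWE}_1 + \varepsilon_{1i}, \qquad E_{2i} = \mathrm{PIWE}_2 + \varepsilon_{2i}, \qquad (2)$$
so the probability that $i$ will identify hypothetical woman 2 as more empowered is
$$\Pr(E_{2i} > E_{1i}) = \Pr\left(\varepsilon_{1i} - \varepsilon_{2i} < \mathrm{PIWE}_2 - \mathrm{PIWE}_1\right) = F\left(\sum_{j=1}^{J} w^p_j (x_{2j} - x_{1j})\right), \qquad (3)$$
where $F$ is the cumulative distribution function of $\varepsilon_{1i} - \varepsilon_{2i}$. This is a straightforward linear random utility model (Greene, 2012), whose parameters are the participatory weights $w^p_j$.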
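The choice probability in this random utility model can be illustrated with a short Python sketch. It assumes, for illustration only, that the error difference follows a standard normal distribution; the coefficient values, profiles and function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_choose_second(weights, x1, x2, cdf=normal_cdf):
    """Probability that woman 2 is identified as more empowered than woman 1.

    Evaluates F(sum_j w_j * (x2_j - x1_j)), where F is the CDF of the
    error difference; the standard normal CDF is an assumed default.
    """
    index_gap = sum(w * (b - a) for w, a, b in zip(weights, x1, x2))
    return cdf(index_gap)


# Hypothetical coefficients and profiles for three indicators.
w = [0.8, 0.3, 0.5]
woman_1 = [1, 0, 1]
woman_2 = [1, 1, 0]
p = prob_choose_second(w, woman_1, woman_2)
```

Here the two women differ only in the second and third indicators, so the probability depends on the difference between those two coefficients; identical profiles give a probability of one half.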
A key difference between PIWE and the ranking exercises applied in previous studies is that in PIWE the participants rank hypothetical women described by their profile of achievements in several empowerment indicators, rather than ranking the indicators (or dimensions) directly. By ranking profiles, participants implicitly reveal the trade-offs that they make between the different indicators. It is essential to elicit these trade-offs, as they are what the indicator weights in an index represent. Direct ranking of indicators cannot reveal the trade-offs among them, and so cannot yield weights for the index. Another important difference is that, as a quantitative exercise that can be integrated into a large-sample quantitative study, the results of PIWE can be fully representative of participants' perceptions. It thus has an important complementary role to play alongside qualitative participatory methods, which are better suited for initial selection of indicators and exploration of participants' rationales for the choices that they express.

Application of PIWE in Tunisia: methods
In this section we introduce the context of our application then describe the methods that we applied to select and measure the indicators of empowerment, elicit the trade-offs that participants make between different indicators and recover indicator weights for PIWE.

Study context and sample
We piloted the implementation of PIWE in the context of an Effectiveness Review, conducted by Oxfam GB in Tunisia in November 2016, to assess the impact of the project AMAL: Supporting Women's Transformative Leadership on women's empowerment (Lombardini, 2018). This project started in 2012, following the democratic transition of 2011, with the objectives of increasing women's awareness of their political and socio-economic rights and supporting women to play a more active role in the political and socio-economic life of their community and country. It was implemented by three organisations in Tunisia: the League of Tunisian Women Voters (LET), the Tunisian Association of Democratic Women (ATFD), and the Association of Tunisian Women for Research and Development (AFTURD).
Project activities were conducted in rural and urban areas in five regions of Tunisia: Tunis, Kelibia, Sousse, Kef and Kasserine. A total of 215 women were randomly sampled from the implementing organisations' lists of project participants. The median age in this sample was 41. The proportion of project participants who had completed tertiary education (48%) was substantially greater than the proportion of women who had completed tertiary education in a 2017 representative survey of Tunisian households (Ghali et al., 2018). Similarly, the proportion with formal sector employment (39%) was substantially higher than in the representative survey.
As an ex-post impact evaluation, the Effectiveness Review followed a quasi-experimental propensity score matching (PSM) approach. A comparison group of 290 non-project-participant women was sampled from matched communities. Sampling mimicked the targeting process employed by the project, in an attempt to minimise both observable and unobservable differences between the project participant and comparison groups. Propensity for project participation was estimated as a function of recalled baseline and fixed characteristics. The sample was restricted to the region of common support of the propensity score distribution, comprising 213 project participants and 285 non-participants. Non-participants were assigned matching weights (henceforth 'PSM weights') determined by application of a Gaussian kernel. Impact evaluation methods and findings are reported in detail by Lombardini (2018); the present study focusses on the implementation of PIWE, in the context of the Effectiveness Review.
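To illustrate how Gaussian kernel matching weights of this kind can be formed, the sketch below implements a generic version of kernel matching in Python. It is not the Effectiveness Review's implementation: the bandwidth, the normalisation (each project participant distributes a total weight of 1 across comparison units) and the propensity scores are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_matching_weights(treated_scores, comparison_scores, bandwidth):
    """Return one matching weight per comparison unit.

    For each treated unit, comparison units share a total weight of 1 in
    proportion to the Gaussian kernel of the propensity-score distance.
    """
    totals = [0.0] * len(comparison_scores)
    for p_t in treated_scores:
        k = [gaussian_kernel((p_c - p_t) / bandwidth)
             for p_c in comparison_scores]
        k_sum = sum(k)
        for idx, k_val in enumerate(k):
            totals[idx] += k_val / k_sum
    return totals


# Hypothetical propensity scores; the bandwidth of 0.06 is an assumption.
treated = [0.60, 0.70]
comparison = [0.20, 0.55, 0.65, 0.90]
psm_weights = kernel_matching_weights(treated, comparison, bandwidth=0.06)
```

Comparison units whose propensity scores lie close to those of project participants receive larger weights, and the weights sum to the number of project participants under this normalisation.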

Choice and measurement of indicators of empowerment
As an Oxfam study, the Effectiveness Review adopted the OxWEI approach to measuring women's empowerment; we integrated PIWE with the OxWEI index-building process described by Lombardini et al. (2017).
The authors facilitated a preliminary workshop with Oxfam Tunisia programme staff, staff from the implementing organisations and a Tunisian expert in empowerment and gender. Workshop participants identified characteristics of an empowered woman in the evaluation context. Logistical constraints precluded the representation of study participants in this workshop. This was a limitation of our pilot implementation, as the characteristics of empowerment identified by workshop participants may have been different from those that are important for study participants; in principle, study participants should have been represented at this stage. In this particular context, our validity checks show that the impact of their non-representation was limited (section 4.1). We discuss more extensively in section 5.1 the implications of this limitation for our results and what an ideal approach would have involved.
Through brainstorming, discussion and consensus-building, the workshop participants established a list of 14 indicators of empowerment together with, for each, an icon to visually represent it and brief descriptors of a woman who achieves it and a woman who does not, all summarised in Table 1. They also categorised each indicator as representing empowerment at the personal, relational or environmental level. Building on the workshop results and in collaboration with the Tunisian expert, we developed a questionnaire to capture each of the 14 indicators. Questions drew on established survey tools including the Rosenberg Self-Esteem Scale, the Demographic and Health Surveys (DHS) toolkit questionnaire, and the WEAI questionnaires (Alkire et al., 2013), as well as the accumulated experience described by Lombardini et al. (2017). The measurement of each indicator is detailed in Lombardini (2018), Appendix 1; the full translated questionnaire is available as an online supplement to this paper.

The discrete choice experiment
In section 2.3 we proposed the implementation of a discrete choice experiment (DCE) to elicit perceived empowerment orderings and thus determine indicator weights for PIWE. In the evaluation of AMAL in Tunisia, we embedded this DCE within the same questionnaire used to capture study participants' achievement of the indicators of empowerment.
To minimise cognitive demand, the choice set for each DCE question comprised just two alternative hypothetical women, with different profiles of indicators of empowerment. The study participant was asked to identify which of these hypothetical women she considered to be more empowered. Careful phrasing of this question was important, to ensure that we captured perceptions of empowerment rather than any other concept. The phrase used to translate 'empowered' into Tunisian Arabic may be literally translated into English as 'capable of action'. It reflects the phrase 'active citizen', which has become a stock phrase in Tunisian Arabic, entering the vocabulary of democratisation and reflecting a transition in perceptions of the individual from passive subject to active citizen (M.-S. Omri, personal communication, 30 May 2017). While it is possible that the phrase may not have been understood in that way by less-educated study participants, we note that typical education levels of study participants are relatively high.
The choice of which profiles to present, in which choice sets, to which study participants, comprised the design of the DCE. Our design was constrained by the fact that variation (or even specification) of all 14 indicators of empowerment for each hypothetical woman risked cognitive overload for study participants; Arentze et al. (2003) demonstrated in their transport-choice study in South Africa that specification of more than three attributes had negative consequences for data quality. Therefore, we presented partial profiles (Chrzan, 2010) in which at most three indicators of empowerment were specified for both hypothetical women within each choice set. This precluded the estimation of interaction effects, but, as the goal of the exercise was to estimate weights for an additive index, we considered this an acceptable limitation. An example DCE question is illustrated in Figure 1.
Another constraint was logistical: with limited module duration, we could ask only seven DCE questions with each study participant.We therefore created a 'blocked' design, allocating different sets of questions randomly to different study participants.Even so, given the multiplicity of potential hypothetical profiles (2912 with three indicators), it was impossible to explore them all.
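The figure of 2912 potential profiles can be reproduced with a short calculation, sketched here in Python. It assumes that a profile is defined by the choice of three of the 14 indicators together with a binary level (achieved or not) for each of them; the variable names are illustrative.

```python
from math import comb

N_INDICATORS = 14  # indicators identified in the workshop
N_SPECIFIED = 3    # indicators specified in a partial profile
N_LEVELS = 2       # each indicator is either achieved or not

# Choose which indicators are specified, then a level for each of them.
n_profiles = comb(N_INDICATORS, N_SPECIFIED) * N_LEVELS ** N_SPECIFIED
print(n_profiles)  # 2912
```

With seven questions per participant, each presenting two profiles, a single respondent can see only a small fraction of these, which is why a blocked design spreads different question sets across respondents.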
We structured the module in three parts.

Empirical methods
As described in section 2.4, the PIWE weights may be recovered through estimation of a random utility model (Equation 3). If the error term $\varepsilon_{1i} - \varepsilon_{2i}$ is normally distributed we may re-scale the coefficients so that $\varepsilon_{1i} - \varepsilon_{2i} \sim N(0, 1)$ and this becomes the probit model
$$\Pr(E_{2i} > E_{1i}) = \Phi\left(\sum_{j=1}^{J} \beta_j (x_{2j} - x_{1j})\right), \qquad (4)$$
where $\Phi$ is the standard normal cumulative distribution function and $\beta_j$ is the re-scaled coefficient on indicator $j$. The model may be estimated by maximum likelihood. We weight observations by the PSM weights, so the results are representative for the sample of project participants (subject to the common support restriction). The PIWE weights are then
$$w^p_j = \frac{\beta_j}{\sum_{k=1}^{J} \beta_k}. \qquad (5)$$
The partial profile design of the DCE complicates estimation of the model. We were reluctant to instruct study participants to assume that the hypothetical women were identical in characteristics other than those specified, as is standard practice with partial profiles (Chrzan, 2010). If, as is likely, indicators of empowerment are, in the participants' experience, highly correlated, they would perceive the directed assumption as implausible. Therefore, we did not make any reference to unspecified characteristics but allowed participants to make their own inferences on the basis of the information provided. Ryan et al. (2009) find qualitative evidence that DCE respondents do indeed infer additional information beyond what is presented in the task. To represent this inference-making process we utilised the empirical joint distribution of study participants' measured indicators, assigning unspecified characteristics their empirical sample average, conditional on the values of the specified characteristics. This imputation process is detailed in Appendix B.4 and in the working paper version (Quinn & Lombardini, 2023). The imputation of explanatory variables invalidates usual approaches to estimation of standard errors, so we implement a bootstrap procedure to obtain standard errors. As we re-sample from the sample of individuals rather than DCE question observations, the bootstrap accounts for within-individual clustering as well as the imputation of unspecified characteristics.
We may be interested in heterogeneity of perceptions of empowerment across different subgroups of the study participants; in particular, in this impact evaluation context, between project participants and non-participants. In order to assess such heterogeneity, we introduce interaction terms with a project participation indicator $P_i$. The probit model then becomes
$$\Pr(E_{2i} > E_{1i}) = \Phi\left(\sum_{j=1}^{J} (\gamma_j + \delta_j P_i)(x_{2j} - x_{1j})\right), \qquad (6)$$
where the coefficients $\gamma_j$ reflect the perceptions of the project non-participants, while the coefficients $\delta_j$ reflect any divergence in perceptions of empowerment between project participants and non-participants.
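The individual-level resampling described above can be sketched as a cluster bootstrap. The Python snippet below is a simplified illustration, not the procedure used in the study: the estimator is a stand-in (the share of responses choosing the second woman) rather than the probit model with imputation, and the data, function names and number of replications are hypothetical.

```python
import random
import statistics


def cluster_bootstrap_se(rows_by_person, estimator, n_reps=200, seed=0):
    """Bootstrap standard error, resampling individuals (clusters).

    rows_by_person: dict mapping a participant id to her list of DCE rows.
    estimator: function taking a list of rows and returning a float.
    """
    rng = random.Random(seed)
    ids = list(rows_by_person)
    estimates = []
    for _ in range(n_reps):
        # Draw participants with replacement; each draw carries all of
        # that participant's responses, preserving within-person clustering.
        drawn = rng.choices(ids, k=len(ids))
        sample = [row for pid in drawn for row in rows_by_person[pid]]
        # Any imputation step would be repeated here, inside the loop,
        # so that its sampling variability is reflected in the estimates.
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)


def share_choosing_second(rows):
    """Stand-in estimator: share of responses choosing woman 2."""
    return sum(rows) / len(rows)


# Hypothetical data: 1 if the respondent chose woman 2, else 0.
data = {
    "a": [1, 1, 0, 1],
    "b": [0, 0, 1, 0],
    "c": [1, 1, 1, 1],
    "d": [0, 1, 0, 0],
}
se = cluster_bootstrap_se(data, share_choosing_second)
```

Resampling whole participants rather than individual responses keeps each respondent's answers together, so correlation between her answers is carried into every bootstrap replication.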
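The role of the interaction terms can be illustrated with a small Python sketch of the choice probability for a non-participant and a participant facing the same choice set. The coefficient values and the function names are hypothetical and chosen only to show how a non-zero interaction coefficient shifts the predicted choice.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_choose_second_interacted(gamma, delta, x1, x2, participant):
    """Probit choice probability with project-participation interactions.

    gamma: coefficients for non-participants (one per indicator).
    delta: shift in each coefficient for project participants.
    participant: 1 for a project participant, 0 otherwise.
    """
    z = sum((g + d * participant) * (b - a)
            for g, d, a, b in zip(gamma, delta, x1, x2))
    return normal_cdf(z)


# Hypothetical values: participants place more weight on indicator 1.
gamma = [0.4, 0.6]
delta = [0.5, 0.0]
woman_1 = [0, 1]  # achieves only indicator 2
woman_2 = [1, 0]  # achieves only indicator 1
p_non = prob_choose_second_interacted(gamma, delta, woman_1, woman_2, 0)
p_par = prob_choose_second_interacted(gamma, delta, woman_1, woman_2, 1)
```

In this example a non-participant is more likely to choose woman 1, while a participant is more likely to choose woman 2; with all interaction coefficients equal to zero the two probabilities coincide.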

Application of PIWE in Tunisia: results
In this section we briefly discuss the validity and consistency checks that reassure us that our methods elicited meaningful information about participants' perceptions of empowerment.We then discuss the results of our estimation of PIWE weights and the impact of the programme on both empowerment and participants' perceptions of empowerment.

Validity and consistency checks
We designed the DCE to incorporate several checks of the validity of the exercise. Study participants were not directly represented in the workshop at which the indicators of empowerment were identified, discussed in section 3.2, so an important first check is whether or not study participants agree that each, individually, is indeed a characteristic of empowerment. In part A of the DCE module (single-indicator question), study participants overwhelmingly responded that the hypothetical woman who achieved each indicator was more empowered than the woman who did not, with no more than 7.4% disagreement for any indicator and just 3% disagreement across all 14. This reassures us that the study participants agreed with the workshop participants that each is indeed an indicator of empowerment; we can conclude that in this particular application, the impact of the lack of representation of study participants in the workshop was limited.
Similar results were obtained with the first question in part B, in which the hypothetical women differed in their attainment of only one of the three specified indicators, although the proportion disagreeing was higher at 9.5% across all six relational indicators. This is consistent with up to 19% of the study participants failing to comprehend the exercise, or choosing at random, since a participant choosing at random would be expected to disagree half of the time. Similarly, 5.2% expressed choices in part B that indicate an intransitive (inconsistent) empowerment ordering; again, this is consistent with around 80% of the study participants understanding the exercise and expressing consistent choices. The checks are fully documented and further discussed in Appendix B.5 and in the working paper version (Quinn & Lombardini, 2023).
Overall, the validity and consistency checks reassure us that the DCE elicited useful information about participants' perceptions of empowerment. We do not exclude any participants from the sample as a result of these checks, maintaining our fully-participatory approach and allowing the choices expressed even by those who did not have strong opinions or did not fully understand the exercise to contribute to the results. Inconsistencies both within and across study participants are absorbed by the error term in our random utility model (Equation 3).

Estimation of PIWE weights
We pool study participants' responses to questions in parts B and C of the DCE module to estimate a probit model (Equation 4) and thus recover the PIWE weights w p j (Equation 5).Table 2 reports coefficient estimates for four versions of the model, together with tests of the hypothesis of equal coefficients. 4We also report the proportion of study participants' choices that are correctly predicted by each model (and thus any resulting index).For each model this is substantially greater than the 59.7% of study participants' choices correctly predicted by an equally-weighted index.Model (1) maintains the assumption that non-specified characteristics are equal for the hypothetical women within each choice set (no imputation of non-specified characteristics).Models ( 2) -( 4) report coefficient estimates with imputation of unspecified characteristics as described in section 3.4.In models (3) and ( 4), indicators that yield negative coefficients are sequentially excluded, to ensure that all PIWE weights are non-negative.
We chose model ( 4), in which unspecified characteristics are imputed and coefficients constrained to be non-negative, to provide the weights for PIWE, which are obtained by re-scaling the estimated coefficients to sum to 100% (Equation 5).The PIWE weights are illustrated and contrasted with OxWEI (equal) weights in Figure 2.This demonstrates that the total weight (54% solid red) ascribed to the personal indicators is substantially greater than it would have been had they been allocated equal weights (36%), while the total weights allocated to the relational and, especially, environmental indicators (36% striped green and 10% dotted blue respectively) are substantially less than under  equal weights (43% and 21% respectively).Only three of the coefficients of model ( 4) are individually significantly different from equal at the 5% significance level; interestingly these are all coefficients on relational indicators, reflecting the greater power for the relational indicators that arises from the greater effective sample size, as the part B DCE questions specified only relational indicators.Participation in the public sphere (R2) and independent income (R5) receive significantly smaller weights than they would have received in an equally weighted index.Conversely, taking action to stop violence (R4) receives a significantly greater weight.The weights assigned to decision making (P2) and awareness of collective action (P4) are relatively high, but not significantly different from equal weights.
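The re-scaling of coefficients into weights, and the aggregation of weights by level, can be sketched in Python as follows. The example reproduces only the equal-weights benchmark quoted above; it does not use the estimated coefficients. The split of the 14 indicators into five personal, six relational and three environmental indicators is inferred from the equal-weight shares (36%, 43% and 21%), and the 'E' label prefix for environmental indicators is an assumption.

```python
def rescale_to_weights(coefficients):
    """Rescale non-negative coefficients so that the weights sum to 1."""
    if any(c < 0 for c in coefficients.values()):
        raise ValueError("coefficients must be non-negative")
    total = sum(coefficients.values())
    return {name: c / total for name, c in coefficients.items()}


def weight_by_level(weights):
    """Sum weights by level, using the first letter of each label."""
    totals = {}
    for name, w in weights.items():
        totals[name[0]] = totals.get(name[0], 0.0) + w
    return totals


# Equal coefficients reproduce the equally weighted benchmark.
labels = ([f"P{k}" for k in range(1, 6)]     # 5 personal (inferred)
          + [f"R{k}" for k in range(1, 7)]   # 6 relational
          + [f"E{k}" for k in range(1, 4)])  # 3 environmental (inferred)
equal = rescale_to_weights({name: 1.0 for name in labels})
level_totals = {lvl: round(100 * w)
                for lvl, w in weight_by_level(equal).items()}
print(level_totals)  # {'P': 36, 'R': 43, 'E': 21}
```

Passing estimated non-negative coefficients to `rescale_to_weights` in place of the equal values would give the corresponding participatory weights and their totals by level.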
Having estimated the PIWE weights, we may compute the PIWE and OxWEI (equally-weighted) scores for the study participants on the basis of their empirical achievement of all indicators. We find a strong but not perfect correlation of 0.93 between the two indices. The relationship between the indices is illustrated in Figure 3, which demonstrates that individuals' PIWE and OxWEI scores differ by up to 29 percentage points.
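As an illustration of how two such scores are computed and compared, the sketch below scores four hypothetical respondents on three binary indicators under a participatory-style and an equal weighting. All weights and achievement data are invented for illustration, and the Pearson correlation is an assumed choice of correlation measure.

```python
import math


def index_score(achievements, weights):
    """Percentage score: the sum of the weights of the indicators achieved."""
    return sum(weights[name] * achievements[name] for name in weights)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weights (percent) and binary achievements (1 = achieved).
participatory_weights = {"P2": 50.0, "R4": 25.0, "E2": 25.0}
equal_index_weights = {name: 100.0 / 3.0 for name in participatory_weights}
respondents = [
    {"P2": 1, "R4": 0, "E2": 0},
    {"P2": 0, "R4": 1, "E2": 1},
    {"P2": 1, "R4": 1, "E2": 1},
    {"P2": 0, "R4": 0, "E2": 0},
]

participatory_scores = [index_score(r, participatory_weights) for r in respondents]
equal_scores = [index_score(r, equal_index_weights) for r in respondents]
correlation = pearson_correlation(participatory_scores, equal_scores)
largest_gap = max(abs(p - e) for p, e in zip(participatory_scores, equal_scores))
```

Respondents who achieve every indicator, or none, receive the same score under any weighting, so differences between the indices arise only for intermediate achievement profiles.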

Impact on empowerment
This pilot implementation of PIWE was conducted in the context of the quasi-experimental impact evaluation of the Oxfam project AMAL in Tunisia, reported in detail by Lombardini (2018). To illustrate its application, we report the key estimates of impact in Table 3. According to both PIWE and the equally-weighted OxWEI, the project had a small but significant positive impact on the empowerment of project participants. As the PIWE reflects the study participants' collective perception of empowerment, we may conclude that the project had a positive impact on empowerment as perceived by the study participants.

Impact on perceptions of empowerment
It is possible that project participation had an impact on perceptions of empowerment. To assess the extent to which this is a relevant concern in our study, we re-estimate the probit model with project participation interaction terms (Equation 6). We impute the unspecified indicators and retain all indicators in the model, so this amounts to a decomposition of model (2) in Table 2. Results are reported in Table 4.
The only interaction term coefficient significantly different from zero is that on the ability to make decisions for herself (P2), which is substantially higher for the project participants than non-participants. This suggests that the project may have had an impact on the perception of this indicator as an important characteristic of empowerment among project participants.
The standard errors are relatively large for the personal and environmental indicators, so the lack of significant differences for these indicators (other than P2) cannot be taken as indicative of no impact. However, the standard errors are smaller for the relational indicators, and it is interesting to observe that we do not find a significant impact of the project on the weight that the respondents attach to any of these indicators.
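The structure of an interaction specification of this kind can be sketched as follows. The sketch assumes that each indicator difference between the two hypothetical women enters once on its own and once multiplied by a project participation dummy, so that the coefficient for participants is the sum of the base and interaction coefficients. The function names and numerical values are hypothetical.

```python
def interaction_row(woman_1, woman_2, is_participant):
    """Regressors for one choice: indicator differences, then their interactions.

    is_participant: 1 for a project participant, 0 otherwise.
    """
    diffs = [a - b for a, b in zip(woman_1, woman_2)]
    interactions = [d * is_participant for d in diffs]
    return diffs + interactions


def implied_coefficients(base, interaction):
    """Coefficients implied for each group by base and interaction estimates."""
    return {
        "non_participants": list(base),
        "participants": [b + d for b, d in zip(base, interaction)],
    }


# Hypothetical choice set over three indicators.
row_participant = interaction_row((1, 0, 1), (0, 1, 1), 1)
row_non_participant = interaction_row((1, 0, 1), (0, 1, 1), 0)

# Hypothetical estimates for two indicators: only the first differs by group.
groups = implied_coefficients([0.5, 0.25], [0.25, 0.0])
```

A test of whether an interaction coefficient is zero is then a test of whether the two groups attach the same weight to that indicator.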

Discussion
In this section we reflect on the implementation of PIWE, an unresolved challenge with the empirical methods that we adopted and the results of our study in the context of our application in Tunisia.

Implementation of PIWE
We reflect here on several practical and conceptual considerations that emerged in our pilot implementation.

Indicator choice
The DCE allowed study participants to collectively determine the weights assigned to a candidate list of indicators. Rejection of candidate indicators would have been possible (an indicator not considered relevant by the participants is assigned low weight), but there is no mechanism for participants to propose extra indicators at the quantitative survey stage.
Notes: Hypotheses of equal rather than zero coefficients reported for $\gamma_j$; hypotheses of zero coefficients reported for interaction term coefficients $\delta_j$; * p < 0.1, ** p < 0.05, *** p < 0.01.
In our pilot implementation, the validity checks demonstrated that the study participants considered all 14 candidate indicators to be valid indicators of empowerment. This strong alignment between study participants and workshop participants reassures us that in this particular context the impact of the exclusion of the former from the preliminary workshop was limited; this may be a consequence of the relatively high education levels among study participants, whose backgrounds may not have been very distinct from the 'expert' workshop participants. However, it is possible that indicators considered important by the study participants had been overlooked. With an extensive candidate list, it is likely that included indicators served as proxies for important excluded indicators, minimising the effect on impact evaluation results but complicating interpretation of the estimated weights. In future applications, especially in contexts where study participants have more distinct backgrounds, we recommend that study participants should be represented in any preliminary qualitative exercise to identify the candidate list of indicators.

Experimental design
The need to develop and implement a bespoke DCE design in the context of a tight evaluation timeframe was challenging. Our decision to focus attention on a subset of indicators resulted in an inefficient experimental design that limited our power to estimate precise weights for some of the PIWE indicators. With limited sample size and survey time availability, there is scope to benefit in future applications from the implementation of more efficient experimental designs. As the literature on efficient designs for partial profile DCEs that are robust to model mis-specification is underdeveloped, this is a priority for future research.

Heterogeneity
It is likely that study participants' perceptions of empowerment, in particular the trade-offs that they make between different indicators, will be heterogeneous. In our model (Equation 3) this heterogeneity is absorbed in the error term $\varepsilon_{1i} - \varepsilon_{2i}$, while our balanced experimental design and pooled estimation approach ensure that the estimated participatory weights represent an impartial average over participants' perceptions.
With sufficiently large sample size, it is possible to empirically assess the extent of heterogeneity by including interaction terms with subgroup dummies (Equation 6). We implemented this for project participants and non-participants (Table 4).

Functional form
We have imposed the assumption that PIWE is additive in the indicators, while in our pilot implementation we imposed the further assumption that all indicators are binary. These assumptions align with other indices in the literature, discussed in section 2.2, but need not align with study participants' perceptions of empowerment. Our experimental design and estimation approach (described in section 3) ensure that PIWE is the best fit to participants' perceptions given these constraints; in future applications with sufficiently large sample sizes it may be possible to relax these restrictive assumptions.

Comparability
Like all context-specific and participatory approaches to measuring empowerment, PIWE has no guarantee of comparability across different contexts or implementations. As we discuss in section 2, there are strong arguments in favour of participatory approaches. Meanwhile, given the context-specificity of indicators of empowerment (also discussed in section 2), apparently comparable indices (such as WEAI) are not in practice straightforwardly comparable across different contexts. If appropriate indicators of empowerment and the trade-offs between them, and thus their index weights, differ across contexts, then an index that aggregates the same indicators with the same weights may not capture the same concept of empowerment across these contexts. Where it is necessary to make comparisons across different contexts, we suggest that researchers may wish to explore the extent to which context-specific indices align, whether developed using PIWE or another approach, and whether findings are robust to the choice of index.

Subjectivity and internalisation of disempowering norms
It is important to minimise the distortions that can emerge in subjective measurement (Bertrand & Mullainathan, 2001; Jahedi & Méndez, 2014). This is one reason why we do not directly ask study participants to assess their own empowerment. DCE questions are relatively straightforward, minimising cognitive demands that can distort subjective data; Arentze et al. (2003) find that data quality in a transport-choice DCE in South Africa did not vary with respondent literacy. Distortion through social desirability may be minimised by assuring participants of anonymity; in a literate context, they could record their responses without involvement of an enumerator. In some cases, participants may not have a strong attitude about which of the hypothetical women is more empowered and so may respond essentially at random. This is absorbed by the error term in the random utility model and, appropriately, would tend to reduce the coefficients and thus weights on the relevant indicators.
Internalisation of disempowering norms poses a particular issue for subjective or participatory measurement of empowerment. Its extent will vary across contexts, while its impact may be partly mitigated by DCE question phrasing. Participants are asked which of the hypothetical women is 'more empowered' rather than 'better', while the DCE elicits trade-offs between indicators rather than value judgements about levels of empowerment. Nevertheless, it is important to be conscious of this issue. Where it is a particular concern, the PIWE exercise remains relevant as an appropriate method to elucidate participants' perceptions of empowerment, even if the resulting index is not considered appropriate for assessment of empowerment. The exercise could also be repeated with community activists or even empowerment experts familiar with the context, to explore the extent of divergence between participants' and activists' or experts' perceptions of empowerment.

Instability of perceptions
It is quite possible that the process of empowerment impacts perceptions of empowerment. This is an empirical question that PIWE can help to answer, through comparison of the index for different groups or at different times. It is a particular issue when PIWE is implemented for impact evaluation, as in our application. With a relatively small sample size and thus low power, we chose not to develop separate PIWEs for the project participants and non-participants. Had we been able to do so, an important element of the impact evaluation would have been to assess robustness of the results to the choice of index. We recommend that, where possible, divergence in project participants' and non-participants' perceptions, and robustness of evaluation to any divergence, be assessed.

Empirical methods
In the absence of an established method to model the inferences that study participants draw about unspecified indicators, we implemented an ad hoc imputation method. Two aspects of our empirical results suggest that this method was not optimal, indicating that further methodological research is needed to develop a better approach.
Firstly, the explanatory power of models (2) to (4), in which we impute the unspecified indicators, is actually slightly lower, at 67-69% of study participants' choices correctly predicted, than model (1) in which we do not impute, at 74%.
Secondly, despite study participants demonstrating through the validity checks that they consider all indicators individually to be indicators of empowerment, when we estimated model (2) to obtain the PIWE weights, we found that the coefficient on one indicator (R3, political participation) was significantly negative. Some coefficient estimates were unstable on elimination of this indicator; the coefficient on a second indicator (E1, equality of opportunity) became negative and we eliminated it also. This coefficient instability may result from the correlations between indicators introduced by the imputation, so that included indicators proxy for excluded indicators. The diminution in the explanatory power is marginal, so we remain confident that the aggregate index (PIWE) reflects study participants' collective perceptions of empowerment.

Results
Notwithstanding the instability of some individual coefficient values noted above, our empirical results reveal some interesting aspects of study participants' perceptions of empowerment as expressed through the DCE. The high aggregate weight attributed by study participants to the personal indicators of empowerment contrasts with the importance ascribed by project implementers to the relational indicators; project activities had particularly targeted relational characteristics of empowerment. It is possible that the personal indicators more closely align with study participants' own aspirations for empowerment, or those characteristics that they perceive may be amenable to change. Focussing on the relational indicators, for which we have stronger statistical power, it is notable that having an independent income (R5) and control over resources within the household (R6), both emphasised in the international policy literature as important signifiers of empowerment, are attributed relatively low weights. The high weight attributed to being able to take action to stop violence (R4) is unsurprising and may reflect study participants' recognition of the profound disempowerment experienced in a situation where one experiences violence without any recourse.
In our impact evaluation results, we observe with interest that the average PIWE scores are greater for both project participants and non-participants than the equally-weighted OxWEI scores. This, alongside Figure 3, demonstrates that the weighting of indicators in PIWE aligns more strongly with those indicators that are more commonly achieved by the study participants, perhaps suggesting a salience effect in their perceptions of empowerment.

Concluding remarks
We have demonstrated, through a pilot implementation in the context of an impact evaluation in Tunisia, that it is possible to achieve participation at scale when measuring women's empowerment. The resulting Participatory Index of Women's Empowerment reflects the collective perceptions of empowerment of the participating women. While there remains scope for methodological improvement, in particular through development of efficient experimental designs and better imputation approaches, we hope that this measurement tool will prove a useful addition to the portfolio of methods available to researchers and practitioners who seek to fully involve the communities that they work with.

Geolocation information
The study took place in rural and urban areas in five regions of Tunisia: Tunis, Kelibia, Sousse, Kef and Kasserine.

Notes
1 While, with a finite number of indicators that may take a finite number of values, we can be sure that there exists some function of the indicators that represents the ordering, there is no guarantee that that function will be additive. In fact, existence of an additive representation requires separability of the ordering across indicators, which need not be the case: the extent to which one empowerment indicator is traded off with a second might depend on the value of a third. We are relaxed about this, as estimation will yield the best additive approximation to the 'true' representation, which seems appropriate if we seek an additive index.
2 DCEs have been widely applied in marketing, transport and health economics, as surveyed in Ryan et al. (2008) and Hensher et al. (2015). More recently, political scientists and sociologists have implemented DCEs in surveys to explore voter preferences and social attitudes (Hainmueller et al., 2014; Liebe et al., 2020). Decancq and Watson (2019) implement a DCE to explore weights for the Human Development Index.
3 In the representative survey 1% of rural women and 5% of urban women in the household head/spouse sample and 32% of the youth sample had completed tertiary education, while the proportion engaged in wage work were 8%, 14% and 19% respectively.
4 More coefficients are significantly different from zero; at p < 0.05 all 14 in model (1), 7 of 14 in model (2), 5 of 14 in model (3) although only 2 of 14 in model (4). We do not report these tests, as the hypothesis of equality is more relevant to the study.
Part A (one question) specified only one indicator, achieved by one of the hypothetical women and not by the other, allowing us to aid study participants' comprehension of the activity and to provide a fully participatory check of whether they do indeed consider each indicator a valid characteristic of empowerment. The questions in parts B and C specified three indicators. Part B (four questions) focussed on the six relational indicators, identified by participants in the preliminary workshop as particularly targeted by project activities. Part C (two questions) combined relational indicators with a personal or environmental indicator. The DCE design is documented in detail in Appendix B.2 and in the working paper version (Quinn & Lombardini, 2023), while its field implementation in SurveyCTO is documented in Appendix B.3.

Figure 3. Comparison of PIWE and OxWEI scores. Notes: Marker size represents aggregate PSM weight for each OxWEI/PIWE combination; markers are semitransparent to reveal overlaps; solid line is the line of equality.

Table 1. Indicators of empowerment, icons and descriptors.
R2 | She does not participate in civil society and associations | She actively participates in civil society and associations
Participation and ability to influence or make decisions in the political sphere | R3 | She does not actively participate in political parties | She actively participates in political parties
Taking action to stop violence | R4 | She experiences violence and does not report it | In cases of experience of violence, she is able to report it
Independent income | R5 | She does not have an independent source of income | She has an independent source of income
Control over resources in her household | R6 | She has no control over assets and resources in her household | She has control over assets and resources in her household
Environmental
Equality of opportunity | E1 | She lives in a community that does not allow women to have equal political opportunities with men | She lives in a community that ensures that women have equal political opportunities with men
Social norms | E2 | She lives in a society that does not allow her to be free | She lives in a society that allows her to be free
Legislative protection for women's rights | E3 | She lives in a society where women's rights are not enshrined in law | She lives in a society where women's rights are enshrined in law
Notes: Indicator definitions and descriptors developed, and icons identified, in preliminary workshop in Tunis, November 2016.
The DCE question texts, translations and back translations are documented in Appendix B.1. The phrase '

Table 2. Estimation of probit coefficients and PIWE weights.

Table 3. Impact of AMAL project on women's empowerment.

Table 4. Estimation of probit model with project participation interactions.