Subjective well-being in cultural advocacy: a politics of research between the market and the academy

ABSTRACT This paper responds to a trend of contracting out subjective well-being econometrics to demonstrate social return on investment (SROI) for evidence-based policy-making. We discuss an evolving ecology of ‘external’ research taking place ‘between’ the academy and commercial consultancy. We then contextualise this as waves of research methodologies and consultancy for the cultural sector. The new model of ‘external between’ consultancy research for policy is not only placed between the University and the market, but also facilitates discourse between policy sectors, government, the media and the academy. Specifically, it enables seductive but selective arguments for advocacy that claim authority through academic affiliation, yet are not evaluated for robustness. To critically engage with an emergent form of what Stone calls ‘causal stories’, we replicate a publicly funded, externally commissioned SROI model that argues for the value of cultural activities to well-being. We find that the author’s operationalisations of participation and well-being are crucial, yet the representation of the relationship between them is problematic and the resulting estimates questionable. This case study ‘re-performs’ econometric modelling of national-level survey data for the cultural sector to reveal practices that create norms of expertise for policy-making that are not rigorous. We conclude that fluid claims to authority allow experimental econometric models and measures to perform across the cultural economy as if ratified. This new model of advocacy research requires closer academic consideration given the changing research funding structures and recent attention to expertise and the contracting out of public services.


Introduction
International interest in incorporating well-being measures into policy-making has gained pace in recent years. One area of intellectual and practical interest is the operationalisation of subjective well-being as a statistic, often referred to as SWB, to appraise governance decisions. We argue that a trend is becoming apparent, with particular policy sectors recruiting consultancy research that uses SWB models to argue the value of sector activities in both social and financial terms. We address the implications of this trend in two parts. Firstly, we argue that the use of SWB in evidence-based policy-making tends to reproduce values and hierarchies, rather than offering new evidence; secondly, that in tracing these econometric models, a new form of research consultancy emerges.
The public sector has a varied relationship to well-being evidence that seems to reflect the power of government administrations. Policy sectors are often described in terms of the prominence of their specific ministries, which, in turn, are thought to be reflected in the funding decisions made by HM Treasury. Those deemed less influential are thought to receive less financial support, as with the cultural sector (see, for example, Stevenson et al. 2010 and Gray and Wingfield 2011).
Policy sectors typically thought to be less persuasive are enthusiastically recruiting well-being econometrics to evidence their value through social return on investment (SROI). The cultural sector is such an example, having recently experienced cuts, both in general and to central bodies such as arts councils and government departments. The question of funding for this sector grew more urgent following the 2010 election, when staff research capacity decreased, thereby increasing demand for lobbying while concurrently diminishing the workforce responsible for advocacy and evidence. Meanwhile, externally commissioned research increasingly consists of quantitative modelling of SROI using large-scale survey data. We argue that this marks a departure from the earlier analysis of quantitative data, which was more descriptive.
The cultural sector's responses to cost-benefit analysis econometrics tend to be extreme. On one end of the scale are those enthusiastic for new opportunities to communicate value in a way that reflects the requirements of HM Treasury (see O'Brien 2010 for an overview of econometrics and cultural value). Counterarguments typically emerge from a fundamental belief that it is impossible to describe the subjective experience of arts participation using these techniques. This paper aims to move beyond this unhelpful binary through empirical research, which does three things. Firstly, we describe the models of research that generate measures of subjective well-being. Secondly, we demonstrate what these numbers 'do' when they are used: how these numbers reflect existing hierarchies and their influence on the cultural sector. Finally, we replicate a model estimated in a seminal study in cultural econometric consultancy. The study in question exemplifies what we call the 'new model' of externally commissioned research, which uses the 'new science' of happiness (Layard 2007) to advocate for the value of the cultural sector. The example purports to show positive relationships between arts attendance and happiness, but our replication shows that any positive relationships are actually between arts participation and happiness: what the report claims to measure is not what it actually measures. Extending the replication by introducing additional estimators, we show that the ways in which attendance and participation are operationalised have major implications for estimated effect size. We thereby unpack a specific example of an approximation of the relationship between cultural participation and happiness in a national context, which has implications beyond borders and the cultural sector.
Despite the increasing use of these new econometric models in research commissioned from nonsector consultants, certain forms of culture are still portrayed as more valuable than others. This is in part due to general trends and decisions regarding the relationships researched. For example, the recent Arts and Humanities Research Council (AHRC) Cultural Value project final report noted of one recent review of the relationship between cultural participation and forms of well-being that 'music and singing represented nearly half of the studies reviewed, underlining their general prominence' (Crossick and Kaszynska 2016, p. 108). This suggests a recursive relationship: the more an artform is researched, the stronger the arguments for funding for further research into its contribution to well-being, thus supporting public subsidy of that artform, its further research, and its increased visibility in cultural policy-related discourse.
A growing body of research challenges hierarchies of cultural value in the cultural economy and problematises the assumed relationship between funding for activities and associated well-being (Miles and Sullivan 2012, Oman 2015, Taylor 2016). We argue that new models of econometric consultancy are commissioned to research the value of specific cultural activities, which are of interest to institutions that have historically managed culture and are, thus, likely to reproduce the status quo. Furthermore, this not only affects funding for the cultural sector from central government but also the activities funded and the behaviour of policy actors: what is understood to be valuable affects what is then researched, as well as the activities invested in for their proposed social benefits.
This paper highlights several more general issues arising from externally commissioned cultural research, not all of which can be addressed fully. Is quality control for research commissioned for advocacy a growing issue across other sectors of the economy, given what we have found in the cultural sector? Is the rigour of commissioned research important if the outcome appears authoritative and headline findings generate advocacy and improve claims to funding? Is this model of research less focused on providing answers to pertinent policy questions, and more about creating authoritative-sounding justifications for the continued public funding of culture?
Work on the 'social life of methods' (Law et al. 2011, Law and Ruppert 2013) has recently been applied to cultural policy and drawn attention to the problems and paradoxes arising from mistaken belief in objectivity and neutrality in evidence-based policy-making (O'Brien 2014, Campbell et al. 2017). We argue that the performative aspects of these methods (Law 2009, Butler 2010) act upon research and policy with more critical consequences when based on inaccurate variables and unfinished models, and that the best way to draw attention to the social effects of new methods of researching well-being for evidence-based policy-making is to 're-perform' them (Butler 2010, Oman 2015) and follow them at work (Mitchell 2005, p. 318).
The study proceeds as follows. First, we describe the role of externally commissioned research in the public sector in general and the changeable relationship between consultancy and the academy. We outline the new model of consultancy as taking place 'between' the academy and the market and how this facilitates communications of value between policy sectors, the media and HM Treasury. Second, we address how this applies particularly to the English cultural sector, and how the use of quantitative research has changed in the context of decreasing budgets. We then describe, replicate, and extend the example discussed, demonstrating its shortcomings, and conclude by discussing the implications of such research. Considering these findings, we suggest that this gap in knowledge of quantitative models, and the demand for their production, be reappraised.

A new model of research: the external consultant, 'between' the market and the academy
In recent years, governments and the public sector have looked to an increasing number and variety of organisations and individuals beyond the civil service to carry out policy research. This 'external' advice might be provided through 'a competitive and fragmented mixed economy' of think tanks, consultancies, and other research institutions, including the academy and lobbyists (Heinz and Pautz 2016). Independent expertise became ever more important in the late 1990s, presenting a possibility of policy-making beyond ideology and rooted in evidence. While problematisation of this representation is on the rise (Cairney 2016), evidence-based policy remains the standard.
There is an expansive literature on the rise and influence of the think tank (Stone 1996, 2007, Mulgan 2006, Medvetz 2008) in the mobilisation of expertise to win policy arguments (Maxwell 2014) or secure funding (Perez 2014), with some of this specifically focused on the cultural sector and creative industries (e.g. Schlesinger 2009, 2013). Think tanks have usefully been described as 'straddling institutions' in research for policy (Mulgan 2006, p. 148) and 'the bridge between knowledge and power in modern democracies' (United Nations Development Programme, cited in Stone 2007, p. 259). However, the identification of think tanks as 'bridges between state, society and science' proved to be an oversimplification of the diversifying ecology of external research (2007, p. 261). We argue that a new 'mixed economy' of external policy research and evidence is evolving, better described as between the academy and the market, whose actors specialise in the generation of research outputs that facilitate advocacy conversation about value between policy sectors, HM Treasury, the media, and the general public.
A new model of spin-off consultancy firms has evolved where organisations are neither inside nor outside Higher Education Institutions (HEIs). Actors in these research organisations sit between familiar and well-analysed policy spheres and modes of knowledge creation, such as those described by McGann and Johnson (2005). Consultants' associations with HEIs allow research commissioners to present outcomes as intellectually authoritative, neutral, and robust, with flexibility to write attractive and less cautious headline findings than peer review demands.
The contracting out of research is presented as efficient and value for money. It originates from a model that evolved in the mid-2000s that was by and large limited to the recruitment of management school academics into local industry to consult on issues around business expertise. Even this narrow remit raised concerns regarding the balance of academics' scholarship and applied work (Docherty and Smith 2007). More recently, this scope has widened into econometric modelling to demonstrate the value of specific activities, particularly when publicly funded, to policy goals, such as improved well-being.
Seeking external consultancy for public policy involves a differentiated social interaction between external advisor and client organisation (van Helden et al. 2012, p. 207). The policy commissioner or public sector manager will select a consultant to find a solution, whereas an academic will identify the problem to be solved, or will validate a solution. These relations are also distinguished by a commissioner's confidence in the languages, methods, and results of practical versus theoretical consultancy (van Helden et al. 2012).
The new models of 'external between' research consultancy also present innovation in policy communications. Stone (1996) demonstrated that the influence of think tanks changed the ways in which policy is 'debated and decided', rather than directing how policy is made. She argued that such organisations provide the 'conceptual language, ruling paradigms and empirical examples that become the accepted assumptions for those in charge of making policy' (1996, p. 110). The sophisticated and instrumental use of language has been framed as the most significant work of the think tank: 'as producers of reproducible mediated discourse [they] are themselves part of a wider communications industry' across the 'politico-intellectual field' (Schlesinger 2009, p. 3, 4). In other words, the value of commissioned research may lie in its capacity to communicate a message, rather than its rigour.
What purpose does this serve in our case study? Research is being commissioned to 'speak to Treasury' (O'Brien 2010) in the language of economics (Mitchell 1998, p. 85, 2002) to reproduce the discourse of cultural value, and made-to-order by consultancies addressing the cultural sector (Prince 2014). Concurrently, new 'external between' consultants are communicating the value of their research outcomes (in our case, an econometric model) to the cultural sector with promises of methodological rigour. Meanwhile, diminished internal research resources reduce the likelihood of claims to evidence being scrutinised. Therefore, the cultural sector offers an interesting site in which to further investigate why innovations in economic models and consultancy seem to have resulted in research outputs that are not robust, yet, through the language of economics, appear scientific while presenting what Deborah Stone calls attractive 'causal stories' (Stone 2002, p. 207).

Waves of consultancy and cultural sector research for advocacy
The Department for Culture, Media and Sport (DCMS, formerly Department for National Heritage) was renamed in 1997 by the recently elected Tony Blair. New Labour foregrounded culture as key to the flourishing of societies and individuals, as well as for the economic role it was increasingly perceived to play (see e.g. Hesmondhalgh et al. 2015). It also inherited almost two decades of new public management approaches, a mix of public and private provision and a commitment to using social science technologies to evaluate what worked and what did not in public administration (O'Brien 2014, pp. 114-115). Consequently, consultancies 'have acquired a high profile in the transport of policy ideas, management principles and social reforms from one context to another' (Stone 2007, p. 263).
Initially, the ways DCMS was asked to assess its performance against social and economic goals were not methodologically challenging. For example, it evaluated how well the arts contributed to social cohesion by comparing visitors to a range of events with the general population. If the profile of people at these events grew closer to that of the general population (that is, less highly educated and less white), then arguments were made for a contribution to social cohesion (DCMS 2003).
While not technically demanding, such assessments were hampered by data quality. It was impossible to identify how the fraction of the population going to a museum had changed in the last 12 months without a figure for the previous 12. The data that were collected on the cultural sector were partial, largely driven by specific targets generated by DCMS and related bodies (see Selwood 2002 for a review). Thus, the data collected reflected the ways culture was managed nationally and the interests behind that management, as well as the expertise available and the current trends in advocacy.
Cultural value arguments were increasingly included in the rhetoric of other actors and organisations, such as local authorities. These arguments retained the two key focuses: social inclusion and economic multipliers. If a local authority could show their local theatres led to economic growth, or to social inclusion, they could make a case for greater funding. Similarly, bids for new local arts venues ordinarily entailed commitments to an evaluation of economic and social impact.
Consequently, there was a rise in consultants who specialised in evaluating the social and economic impact of cultural spending. Prince (2015) describes how, from the late 1990s onwards, such consultants 'collected and recorded mainly quantitative data on things like the number of creative or cultural businesses in a particular area, the number of people they employed, the amount of revenue they generated and other typically economic "indicators" of cultural and creative activity' (Prince 2015, p. 584). The consultancies that Prince describes were gathering data that by and large did not exist elsewhere, and their evaluations contained summaries to suit a specific narrative. In the absence of reliable data, these consultancies provided a crucial resource.
Subsequently, the scope of consultancies' remits for the cultural sector broadened. Prince describes how the organisation in which he was embedded in 2010 was estimating social and economic impact with bespoke data collection, for example assessing the social impact of events through surveying attendees about changed perceptions (Prince 2015), with new demands on staff members to understand statistical power and significance (Prince 2014, p. 755). Prince describes a change in the model of consultancy to the cultural sector in several ways, indicating a move from responsibilities to collect and describe data to a more involved analysis of the data gathered.
On this basis, and although imprecise, we can group the activities of cultural consultants into three waves. In the first wave, from 1997 until the early-to-mid 2000s, cultural consultants' work primarily comprised data collection, with little national strategy. Next, work focused more on modelling of local data, for example, around the economic impacts of events. Later, analysis of large national data-sets gained prominence in the external research portfolio.
This later model of consultancy presents a picture of increased neutrality. Even when commissioned by a specific part of the cultural sector to demonstrate their value, the new consultants can model data from outside the sector, such as the British Household Panel Survey (BHPS) to evaluate libraries or Understanding Society to assess the impact of heritage visits on subjective well-being (Fujiwara et al. 2014a, 2015). This is possible given the increasing prominence of survey questions on life satisfaction and quality of life, resulting from what has been called the 'new science' of happiness (Layard 2007). We suggest that an image of enhanced neutrality enables a rhetoric of greater certainty in the cultural value claims being made, but the presentation of these new econometric models is impenetrable to many in the sector.
The change in consultancy we describe above coincides with the 2010 election and subsequent public sector cuts. In the case of culture, DCMS staff cuts decreased salary costs by 32%, the largest fraction of any government department (National Audit Office 2015, p. 19); Arts Council England (ACE) reported a 21% cut in staff numbers and a reduced number of national offices (ACE 2013). Also around this time, a more detailed discussion of social benefits of the arts became prominent; instead of being defined in terms of who attends what events, the discussion tended towards positive outcomes at both the individual and societal levels, coinciding with growing attention to the 'new science' of happiness and its administration (Layard 2007).
Estimating specific positive outcomes is more technically demanding for both the producer and the consumer of the research. A new model estimating positive outcomes evolved through more elaborate quantitative techniques using national-level complex survey data. The representation of independent expert interpretation as superior to personal interest facilitated the alleged relationship between numbers and impartiality that drives evidence generation for policy-making. This perception, together with the authority of mathematical instruments for the analysis of surveys, evolved in the seventeenth century and, as Poovey outlines, has long enabled experts to move between the market and the political sphere, owing to the multidimensional nature of their perceived value and skills (see Poovey 1998, pp. 122-123, 138-139).
Similarly, today's experts use population data and econometrics to speak directly to HM Treasury. Therefore, the estimates they generate for the cultural sector correspond to the evidence demanded by Whitehall. While the interaction between independent expertise, the market, and the State may not be new, a new model of consultancy for the cultural sector has evolved; although similar evaluations have been commissioned from existing consultancies (such as Prince's example), an influential new organisation has emerged, which specialises in SROI work, as discussed in the next section.
The primary intended audience for SROI estimates may be HM Treasury. However, these estimates are also warmly received by the media. Headlines such as 'Dancing makes people as happy as a £1,600 pay rise' in the Telegraph emerge from the commissioning of language to tell a simple and selective story of cultural value (Swinford 2014). This article's reference to 'official figures' arrived at by '[r]esearchers from the London School of Economics … forms part of a drive by David Cameron to measure the impact of policies on people's happiness'. 1 The Telegraph article thereby expresses a version of cultural value in a way that is meaningful to the public, mass media, and Whitehall. Another example of modelling using large data-sets led to the headline 'Research supports theory that arts can heal society' (Hill 2017), reinforcing our proposal that the sector and national media respond well to these 'causal stories'.
The transition toward different models of consultancy sits alongside an ongoing 'crisis' in arts and humanities research and its legitimacy (Holden 2006, Belfiore and Upchurch 2013, Oancea, Florez-Petour and Atkinson 2015). The sector is, therefore, in a double-bind: increasingly expected to look to new models and complicated quantitative techniques to prove its value and impact, while under-resourced to knowledgeably commission and evaluate these forms of research. Moreover, the rise of econometrics in advocacy research has further mainstreamed the use of numerical data by media and practitioners who may not fully comprehend the numbers that they cite. This paper intends to incentivise a reappraisal of this gap in knowledge of quantitative models, and of the demand for their production without due consideration of the realities of peer review, in the cultural and other public sectors.

Simetrica: a case study
Having established the new models of 'external between' research and the growing role of new modelling technologies in cultural advocacy, it is important to focus on the tasks that this research sets itself and how successful it is on its own terms. Does research conducted by external consultants have similar standards to that of the academic sector, and if researchers describe themselves as upholding high standards, does their analysis reflect this?
To address these questions, we focus on a specific instance of commissioned research by the consultancy Simetrica. Its website claims its title is a shortened form of 'Social Impact Metrics', thereby implying its remit; the front page of its website explains: 'Social impact analysis and policy evaluation are not one size fits all processes. We design bespoke frameworks by drawing from various philosophical principles. And we employ the most robust technical methodologies with the data to help organisations make optimal investment decisions with maximum confidence.'
While a discussion of commissioned research might have used any number of consultancies, Simetrica's work has fastidiously remained tied to subjective well-being and social impact measurement using national survey data-sets, exemplifying the new model of consultancy described in the previous two sections. The organisation has been expanding into the housing sector and is undertaking the evaluation of David Cameron's National Citizen Service (NCS). However, for the first few years, its research was predominantly commissioned by the cultural sector, including ACE, DCMS, the AHRC, the Australia Council for the Arts, and English Heritage. One of Simetrica's features is its ambition to fulfil O'Brien's description of the way that research can be usefully leveraged in the cultural sector: 'fully articulating all forms of the benefits of culture, using the language of public policy and cultural value, that funding decisions can be made that are acceptable to both central government and the cultural sector' (O'Brien 2010, p. 5). Less charitably, one could describe Simetrica's ambition as producing 'the killer evidence that will release dizzying amounts of money into the sector' (Scullion and Garcia 2005, p. 120). In what follows, we describe Simetrica's stated goals in more detail, first in their general aspirations and second in a specific report.

Simetrica's approach to philosophy and causality
Given Simetrica's focus on quantification of value, whether in direct monetary terms or those of some other measure of utility, their perspective on causality is important. The cultural sector is at best ambivalent about the numerical measurement of the work it does (e.g. O'Brien 2012, Walmsley 2012). Simetrica therefore requires plausible causal mechanisms to render its models persuasive.
Simetrica describes its staff as 'pioneers of some of the most important philosophical and methodological developments in social impact analysis techniques in use today'. 2 The use of 'philosophical' is crucial; expanding on this notion, the 'our work' 3 section claims not only that there are two fundamental approaches to causal inference, the Rubin model (based on counterfactual outcomes) and the Campbell model (based on statistical threats), but also that Simetrica's approach is unique in combining methods from both, allowing it to 'essentially address any research question around impact measurement. We guarantee that our impact evaluations will be of a quality equivalent to academic journal publications and government research'. Setting aside whether there are two fundamental approaches to causal inference, whether these two are the Rubin and Campbell models, and whether Simetrica is unique in its ability to combine them (claims to which the philosophical literature is likely to be hostile; see e.g. Friedman 1980, Cox 1992, Pearl 2009, Morgan and Winship 2015), it is advisable to explore exactly how causality is described and equivalency justified in these reports if we are to understand their role and validity in advocacy.

Museums and Happiness
We therefore focus on a specific report from Simetrica's director, 'Museums and Happiness: the value of participating in museums and the arts' (Fujiwara 2013, hereafter, M&H). This report was commissioned by the Happy Museum Project and funded by ACE. We focus on this report for three main reasons. Firstly, it uses data from the Taking Part Survey (TPS: DCMS 2016), which includes variables covering activities offered by the publicly funded cultural sector and different leisure activities, broadly defined. Secondly, the analysed version of TPS is pooled cross-sectional survey data, thereby limiting the extent of the causal claims that can realistically be made. Finally, M&H appears to be the first report of this genre: using subjective well-being in an econometric model to estimate the relationship between cultural participation and happiness. Successful awards of subsequent contracts to undertake similar work can therefore be seen in the light of this report.
M&H's key stated goal is to 'look at the impacts of the arts on people's subjective well-being and health and attach values to these impacts'. This 'Well-being Valuation approach' (p. 7) was developed by Fujiwara and forms part of HM Treasury guidance (Fujiwara and Campbell 2011). 'The Green Book' is thought to impact on government policy-making (Dolan and Fujiwara 2012), and has been prominent in discussions around the representation of cultural value to policy-makers (O'Brien 2010).
To look at the impact of museums, the author aims to estimate 'monetary values by looking at how a good or service impacts on a person's well-being and finding the monetary equivalent of this impact'. In other words, the coefficients behind particular types of participation are compared with the coefficients behind income, to see how large a change in income would have to be to correspond to a change in participation. The income coefficient is multiplied by 8, as the relationship between income and happiness appears unrealistically weak in these data; the multiplier of 8 is chosen to be comparable with findings from the author's other work predicting happiness using instrumental variables (Fujiwara 2013).
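The valuation logic can be sketched numerically. The following is a minimal illustration with invented coefficients (not the M&H estimates), assuming a specification in which happiness is regressed on a participation dummy and log income; the value is the income reduction that would exactly offset the well-being gain from participation, and the x8 adjustment is applied to the income coefficient before that value is computed.

```python
import math

# Illustrative (invented) OLS coefficients -- NOT the M&H estimates.
b_participation = 0.07   # coefficient on the participation dummy
b_log_income = 0.05      # raw coefficient on ln(income)
income_multiplier = 8    # adjustment applied because the raw income effect looks too weak

def wellbeing_value(b_part, b_lny, income, multiplier=1):
    """Compensating-surplus-style value of participation.

    Solves b_part = b_lny_adj * [ln(income) - ln(income - v)] for v,
    i.e. the income cut that would offset the well-being gain.
    """
    b_lny_adj = b_lny * multiplier
    return income * (1 - math.exp(-b_part / b_lny_adj))

median_income = 23_000  # illustrative annual income (GBP); also an assumption

raw = wellbeing_value(b_participation, b_log_income, median_income)
adjusted = wellbeing_value(b_participation, b_log_income, median_income,
                           multiplier=income_multiplier)

# The weaker the income effect, the larger the implied value, so inflating
# the income coefficient by 8 shrinks the headline monetary estimate.
print(f"value without adjustment: £{raw:,.0f}")
print(f"value with x8 adjustment: £{adjusted:,.0f}")
```

Note how sensitive the final figure is to the income coefficient and the choice of multiplier: this is one sense in which operationalisation choices drive the headline values.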
This approach to generating estimates is described in the foreword by the Happy Museum's director. The introduction is illustrative of what the enthusiastic parts of the sector think it is paying for, describing the brief as having asked the author: … from the London School of Economics to measure and value people's happiness as a result of visiting or participating in museum activity.
This implies that measuring and valuing happiness as a function of visiting museums is a soluble task and that the researcher's academic affiliation provides authority to do so. Regarding the value of these estimates in advocacy, the foreword continues: 'we think that quantitative evidence that robustly uncovers cause and effect is more likely to influence policy makers', and describes the analysis as a rare application of 'robust quantitative methods on large national datasets'. (Fujiwara 2013, p. 5) The narrative described above is thereby reinforced, wherein the value in external quantitative research lies in its power to persuade policy-makers to continue funding the cultural sector, concluding that 'the report makes a strong case for investing in museums'.
The key findings of M&H are summarised as follows:
- People value visiting museums at about £3200 per year.
- The value of participating in the arts is about £1500 per year per person.
- The value of being audience to the arts is about £2000 per year per person.
- The value of participating in sports is about £1500 per year per person.
Following these key findings, caution is advised on the basis that arts participation and museum attendance are not randomised, that latent factors are likely to affect both participation and outcome variables, and that the described causal relationship could run backwards. However, the author reports that they 'have taken steps to employ the most robust statistical methods possible given the data and this level of statistical rigour passes thresholds used by many OECD governments in impact assessments' (Fujiwara 2013, p. 8). Therefore, they argue that despite acknowledged threats to causal inference, this evidence is strong enough to claim causality to a government agency (in this case, likely HM Treasury).
Indeed, in the subsequent 'Caveats' section (p. 17), the author states that they have '… used as many of the potentially confounding explanatory variables as possible', and that '[t]his level of rigour (multivariate analysis) is anyway normally acceptable in public policy-making and policy evaluation in OECD governments …'. This implies that any modelling using more than two variables is of sophistication equivalent to that of other Treasury-facing reports.
M&H includes four regression tables, based on equations estimating the relationships between:

- museum participation and happiness;
- museum participation and health;
- arts audience, arts participation, and happiness; and
- arts audience, arts participation, and health,
with a final figure estimating attendance at museums in the first place. In fact, each of the tables allegedly summarising the results of a single model incorporating museums actually summarises four models: there are four museum variables, each entered separately. In terms of these key participation variables, the tables report five significant positive relationships (at the 95% level):⁴ between visiting museums and happiness, visiting museums and health, being an arts audience member and happiness, being an arts audience member and health, and participating in sport and health. The tables also report weaker relationships (significant at the 90% level) between happiness and participation in each of sport and the arts. All other relationships with participation variables (most of those tested) are not statistically significant, except for a significant negative relationship between volunteering in museums and happiness.
The variables used are as follows. SWB_i consists of responses to the question 'Taking all things together how happy would you say you are?' on a scale from 1 to 10, where 10 is described as 'extremely happy' and 1 as 'extremely unhappy'. This is described in the Happy Museum report as follows: 'Happiness taps in to people's emotions, technically their affective state, and hence tries to gauge people's moods at that moment' (Fujiwara 2013, p. 12). Meanwhile, Health_i consists of responses to the question 'How is your health in general?' on a scale from 1 to 5, where 1 is 'very good' and 5 'very bad'. This is justified as '… questions on general health will cover mental health and so we may be able to pick up some aspects of well-being or happiness that are not captured in the stand-alone happiness question' (Fujiwara 2013, p. 13). It does not necessarily follow that responses to this general health question will offer anything meaningful regarding subjective well-being; instead, it serves to provide two outcome variables for triangulation. In any case, the key significant relationships with participation variables are the same for the two outcome variables.
The variables capturing Arts_i are different in each set of models. The museum variables are as described above: 1/0 for each of whether participants visit museums in their free time and whether they volunteer in museums, a measure of the number of hours spent in museums per year, and the number of museum visits per year. For the non-museum models, there are three variables: sport, 1/0 for whether participants had done sport or other physical activity in the last four weeks; 1/0 for whether participants had (in the last year) participated in each of ballet, dance, singing, playing music, painting and drawing, photography, or crafts; and 1/0 for whether participants had (in the last year) attended exhibitions, opera, concerts and live music, ballet, and dance.
Meanwhile, the variables in each of x_i, y_i, and z_i are as follows: y_i is logged personal annual earnings (in £5000 bands), and each of x_i and z_i is a vector of controls: binary variables for each of marital status, religiosity, educational qualifications (having General Certificates of Secondary Education (GCSEs) and above vs. not), sex, employment status, frequency of meeting friends (at least once a month vs. less than that), being in London, satisfaction with the local area ('satisfied' and above vs. less than that), smoking, ethnicity (white vs. other), and volunteering; and scales for numbers of children in the household and how often participants drink (from 'never' to 'every day'). The self-rated health measure is also incorporated into the x_i vector. The error term is not made explicit in these equations.
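Collecting this description, the estimating equations can be written out as follows. This is our reconstruction rather than the report's own notation: we read x_i as entering the happiness models and z_i the health models, and the error terms are our additions, since the report leaves them implicit.

```latex
\mathrm{SWB}_i    = \alpha_1 + \beta_1\,\mathrm{Arts}_i + \gamma_1 \ln(y_i) + \delta' x_i + \varepsilon_i
\qquad
\mathrm{Health}_i = \alpha_2 + \beta_2\,\mathrm{Arts}_i + \gamma_2 \ln(y_i) + \theta' z_i + u_i
```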

Reproducing Museums and Happiness
What follows below summarises our attempts to reproduce the results in the report. We focus on Table 4 of the original, the model estimating happiness based on participation in sports and the arts and on arts attendance. This is because it summarises one model, rather than the four in each of the museums tables, and because the measure of subjective well-being (happiness) seems more realistic than health. As the author did, we have pooled the cross-sectional samples from each of the waves of the survey between 2005/06 and 2010/11, although observations from the 2009/10 wave are not included because some variables in the models (such as the questions about smoking and drinking) were not collected at that wave. The first column contains estimates for our reproduced model without survey weights, the second estimates with survey weights, and the third the estimates from the original report. Models are fit with Stata 13.1's regress command, with robust standard errors (a Breusch-Pagan test on the unadjusted model indicated heteroscedasticity, p < .001) (Table 1).
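The same pooled-OLS workflow can be sketched in Python with statsmodels, as the analogue of Stata's `regress …, vce(robust)` preceded by a Breusch-Pagan check. The variable names (happiness, arts, log_income) are illustrative placeholders, not the survey's own labels, and the data below are simulated purely for demonstration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated stand-in for the pooled cross-sectional survey data.
rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "arts": rng.binomial(1, 0.4, n),            # 1/0 participation
    "log_income": rng.normal(10, 0.5, n),       # logged earnings
})
df["happiness"] = (7 + 0.15 * df["arts"] + 0.1 * df["log_income"]
                   + rng.normal(0, 1.5, n))

# Fit the unadjusted model, test for heteroscedasticity, then refit with
# heteroscedasticity-robust (HC1) standard errors, as described in the text.
ols = smf.ols("happiness ~ arts + log_income", data=df).fit()
bp_lm, bp_p, _, _ = het_breuschpagan(ols.resid, ols.model.exog)
robust = ols.get_robustcov_results(cov_type="HC1")
```

Survey weights (the second column of Table 1) would additionally require a weighted estimator such as statsmodels' WLS; this sketch covers only the unweighted case.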
While small differences in regression coefficients are to be expected, some of the discrepancies are striking. First, our models include more participants (14,000 more), which may reflect the use of slightly different survey items; this discrepancy roughly approximates the number of participants in a survey wave. In terms of variables, the coefficients behind income in our estimated models are small fractions of those in the report and are not statistically significant.⁵ One interpretation of this is that participation in the arts is worth 10 times as much as in the report headlines (£15,000 per year), but that is because the relationship between income and happiness is relatively weak. Indeed, if all other variables are removed from the model, leaving a bivariate regression, the coefficient behind log income in predicting happiness is 0.125, with an R² of less than 0.01. In other words, if log income is the only variable predicting happiness, with no controls, the relationship is only twice as strong as the M&H reported relationship net of significantly associated controls.
There are other relevant discrepancies, which may have been caused by differences in reported and actual recoding; the coefficient behind high education in the report is half of that in our estimates; the same is true for having children. The sign changes (but the magnitude is similar) for the non-white variable; given the mean happiness for non-white respondents is 7.5, compared with 7.7 for white respondents, this indicates a coding error.
Crucially, however, the coefficients behind arts activity are substantially different; our results show no statistically significant relationships between audience membership and happiness, while (in contrast to the report) we do find statistically significant relationships between participation and happiness. Even if the explanation is that we have recoded variables differently from the author, it follows that a difference in coding, based on the way the recoding is reported, reverses the key finding. This drastically undermines M&H and its reception. The report's key headlines focus on the positive relationship between happiness and attending particular activities. These do not stand up to replication and raise questions about the robustness of the analysis conducted. The stated caveats in the report do not allow for the findings to be as vulnerable as this. (Note to Table 1: unusually, these tables are annotated with '*** = 0.01; ** = 0.05; * = 0.10', unlike the standard approach where *** <0.001, ** <0.01, and * <0.05; we have followed the report's convention here.)

Extending Museums and Happiness
There are also questions about the operationalisation of 'participation' and 'audience'. M&H includes some variables and excludes others in its construction of these terms, thereby implying a prior decision concerning which variables relate to happiness and which do not. If the operationalisations are too narrow, and 'participation' and 'attendance' do not include all activities that should be classified within these scales, then the apparent positive effects from participation could reflect something broader than just the publicly subsidised cultural sector. Alternatively, if the operationalisations are too broad, then the positive association between participation and happiness might be driven by one activity, or type of activity, and other activities are then undeservedly classified as being associated with happiness. For example, if dancing is associated with happiness but playing a musical instrument is not, and the two activities are collapsed into a single variable, then dancing will be under-credited for its association with happiness, while playing an instrument will be over-credited.
Because of this ambiguity, we estimate two further models, which differ from the original in two ways. The first generates different binary variables of arts participation and attendance to understand whether the apparent associations between the arts and happiness are driven by the author's specific operationalisations. The second adds variables for other activities outside of the publicly funded arts sector, to understand whether these associations could be accounted for by participation per se, rather than participation in the publicly funded sector (see Miles and Sullivan 2012).
The binary variables for arts participation and attendance are as follows: 'look at art'; 'watch a performance inside'; 'outdoor art/performance'; 'perform'; 'make'; and 'solo', with constituent elements in Table 2. This classification might seem as arbitrary as the original but demonstrates the effects of changing the classification method, to test the robustness of M&H's findings. The other activities are going to restaurants, to pubs and bars, doing Do it Yourself (DIY), doing gardening, and playing video games.
The regression results are contained in Table 3. Model 1 is identical to the 'weighted' model in Table 1 and is included for comparison; Model 2 changes the operationalisation of participation and attendance to that contained in Table 2; and Model 3 incorporates measures of additional activities. Table 3 shows that the single arts participation variable masks differences between activities. The coefficient behind 'perform' in Models 2 and 3 is more than twice that behind 'participate in arts' in Model 1, while the coefficient behind 'Make something' is around the same. By comparison, the coefficient for 'Solo' is statistically indistinguishable from zero. Furthermore, while (as in Model 1) the coefficients for the arts audience variables are zero in Model 2, by Model 3 the coefficient for attending galleries and similar venues is negative; one does not expect press releases about how going to museums is equivalent to a pay cut. Meanwhile, among the other activities introduced in Model 3, going to restaurants is associated with a greater increase in happiness than any of the arts or sport variables, and the relationship between gardening and happiness is of a similar magnitude to participating in both arts and sports. However, not all free-time activities are positively associated with happiness: doing DIY is negatively associated with happiness, while playing video games and going to the pub are not significantly associated with happiness in either direction.

Discussion
What has been learned from re-performing the analysis in M&H? The monetary estimates of the relationship between participation and SWB are not based on the models estimated. Instead, the report compares the relationship between participation and SWB with the relationship between income and SWB, by comparing coefficients. The estimates of this latter relationship are then discarded in favour of estimates from other reports using different modelling techniques, on the basis that the underlying epistemology for estimating the relationship between income and happiness using instrumental variable modelling is superior. Estimates of the relationship between participation and subjective well-being using this epistemologically inferior model are nevertheless treated as valid and compared with the superior estimates for income. This balance of techniques generates attractive-sounding estimates of the SROI of museums, with attendance at museums being valued at £3200 a year; the absurdity of estimates using the inferior method exclusively, at around £32,000 per year, necessitates this hybrid modelling.
The estimates in M&H are not uncontroversial. A generous interpretation is that our classification and the report authors' coding of variables differ, which one would expect given the discrepancy in reported Ns. Even if this is the case, however, the radically different estimates of the relationship between cultural participation and happiness should at least inspire caution about confident claims regarding both the direction and the magnitude of this relationship.
Furthermore, the results in Table 3 show that the operationalisation of participation matters. M&H operationalises participation in one way and, on that basis, derives monetary estimates of the value of participating in any of the arts activities it incorporates. A reader might infer that painting is associated with happiness and wonder why the painters they know are no happier than the non-painters; Table 3 shows that this could be explained by inappropriate classification. Yet another classification method might show that painters are happier than non-painters; Table 3 merely shows that a single classification method is inappropriate for drawing this kind of conclusion. This should raise questions about the epistemological basis of the analysis. The author is represented as an innovator in causal model development; yet this report presents regressions with multiple binary independent variables with arbitrary cut points, poorly specified dependent variables, and some (but not many) estimates imported from better-specified models. While it is claimed that two approaches to causal inference have been successfully reconciled, M&H does not meet the most modest reading of either.
Given the discrepancy between M&H's objective and the report's achievements, why is the relationship that it describes accepted and publicised within the sector? The Museums Journal described the report as having 'found museums improve people's happiness and perception of good health, even after other factors that might be influencing them are accounted for' (Harris 2013), and goes further than the original author by claiming that visiting museums 'boosts' happiness, as opposed to museum-goers being happier.
Thus, the claim that museums 'boost' happiness circulates without critical engagement with the evidence, at a time when the sector's capacity to interrogate the results is decreasing. If relevant research departments have been decimated, and research is increasingly 'external', then independent replication of the results the sector has commissioned is unlikely. Trust in the quality of evidence is established by the award of the tender and by track record with previous clients, such as HM Treasury. Furthermore, these 'external between' researchers wear the credentials of association with what is referred to as a 'world-leading' research university.
Indeed, academics have little reason to interrogate such results either; the citations received by this report largely use it as an exemplar of secondary analysis for estimating social return on investment (Leck, Upton and Evans 2014); where the response is more critical, this is due to general issues with deriving monetary SWB estimates (Wheatley and Bickerton 2017). The lack of critical interrogation of reports such as these from the academic community is not limited to issues around modelling, however. If our replication of M&H had found identical estimates to those in the report itself, we would remain sceptical of claims about how happy arts participation makes people, and would expect any other critical interrogation to have drawn similar conclusions. Recent work is increasingly critical of such uses of quantitative data. The gentler end of this scale involves damning it with faint praise by stating that '[s]tatistical data well channelled can provide useful ancillary information' (Phiddian et al. 2017, p. 179); the harsher end involves describing 'ideas pertaining to the measurement of culture's value' as 'stupid' (Meyrick and Barnett 2017, p. 109). Focusing on the unsuitability of measuring the subjective experience of culture is, we would argue, a distraction from what is really at play.
In an article developed from an advisory paper to the ONS' Measuring National Well-being Programme, Dolan and Metcalfe explain that any measure of well-being must be 'empirically rigorous', meaning 'that the account of wellbeing can be measured in a quantitative way that suggests that it is reliable and valid as an account of wellbeing'. Although the insistence that any empirically robust account must always be quantitative is contestable, the authors continue by observing that any metric should 'be sensitive to important changes in wellbeing and insensitive to spurious ones. In practice, distinguishing between the two is quite a challenge and often relies on judgement based on a priori expectations' (Dolan and Metcalfe 2012, p. 411). The presentation of these new models obscures this fact. Furthermore, the methods used to estimate relationships between measurements of culture and other social scientific concepts are still methodologically immature, as acknowledged by both the government's 'happiness Tsar' and the head of DCMS' Culture and Sport Evidence scheme (Rustin 2012; comment on McKenzie 2015). Therefore, this paper does not intend to portray all analysis of quantitative data around the arts and cultural sector negatively, nor to dispute its contribution to knowledge. Instead, we argue that more work should be done to encourage transparency about what exactly is being measured, how, and to what end, especially given indications that external consultants are commissioned to 'solve a problem' (van Helden et al. 2012, p. 207), and that the problem here is the requirement for successful causal stories for cultural advocacy. Furthermore, this work should be presented in a way that makes more than just the headline findings accessible to policy sectors dependent on data models for advocacy, and the sector itself should be looking to improve its quantitative literacy in order to appraise the research it commissions.
It is currently difficult for the sector to appraise the suitability of describing the results in terms of indifference between income and cultural participation, as implied by M&H. Existing data infrastructure does not provide opportunities for estimates that avoid major problems with endogeneity. Furthermore, sufficiently precise estimates of relationships with individual activities are not yet possible, necessitating grouping of large ranges into single variables. As we have shown, the operationalisation of 'culture' makes crucial differences to the estimates generated. The best data-sets for current purposes are TPS and Understanding Society, but TPS's longitudinal sample is small and Understanding Society's measurement of cultural participation is generic. These weaknesses, combined with weak epistemology in SROI, led a prior head of research at DCMS to describe these approaches as a 'dud technique' (comment on McKenzie 2015).
We argue that critical engagement with the constructions of models, the available data, and the variables chosen is necessary, empirically and theoretically. Only through such engagements can cultural research begin to consider the answer to questions of appropriateness of SROI and well-being as econometrics to describe its qualities in certain forums, whether these are to HM Treasury, the public via headlines, or the sector that is commissioning these models.
This case of research commissioned for evidence-based policy, where museums were able to successfully advocate for their ongoing funding, thereby reinforces existing inequalities. Here, detailed questions on participation in the cultural sector existed in a large-scale survey commissioned by DCMS; respondents were also asked questions about income and happiness. Because of the decision to measure cultural participation, and the decision to measure specific forms of cultural participation, it is possible for these models to exist. The museums sector also had the resource (from ACE) and the know-how to commission an organisation to conduct this analysis, arguing for continuing support. Other sectors, perhaps associated with 'informal and everyday cultural practices', increasingly referred to as 'everyday participation' (Miles and Sullivan 2012, p. 311; Miles and Gibson 2016), may not have access to such analysis. Crucially, these activities are less likely to feature in large-scale surveys, making such evaluations (and headlines) less possible. Everyday participation also does not feature in other well-funded policy-facing well-being research initiatives, such as the What Works for Well-being centre and the ONS' Measuring National Well-being programme (see Oman 2015 for a discussion on the latter). Essentially, activities that are funded are measured for evidence-based policy; that these activities are measured makes it easier for them to make an ongoing case for investment. More generally, this example represents a larger issue of policy-directed evidence or research for advocacy. A reduction in internal expertise reduces institutional memory, resulting in 'no culture of a repository of knowledge' (Hallsworth, Parker and Rutter 2011, p. 8).
In turn, this aggravates problems related to 'policy churn', with 'new evidence' commissioned to support what is thought to be a new political argument (Norris and Adam 2017) without the necessary expertise to reference prior robust research. This problem has been identified in the cultural sector that has long lacked an 'evaluating evaluation' culture (Davies and Heath 2013).
Consequently, relevant stakeholders do not have the inclination, the time, or the opportunity to assess the research they have commissioned or used, be it as an intellectual resource or a campaigning tool; there is no resource to reproduce evaluations and few researchers can identify their organisation's commissioned research over a long period. These absences enable research for advocacy, which is branded as legitimate, to become some of the major resources (and major uses of resources) of the cultural sector. Given that this type of causal story has proved a successful campaigning tool for the sector and frequently appears in further research, there is no indication that the situation will change.

Conclusion
We have 're-performed' (Oman 2015, 2017) an example of subjective well-being (SWB) econometrics, drawing attention to the ways in which it is presented, in order to highlight the discourses and activities that demand cultural advocacy as economics to speak to HM Treasury. In 'question[ing] … what economics does … [we] follow [a specific example of] it at work' (Mitchell 2005, p. 318), thus revealing the 'work' that subjective well-being does as it reproduces across the creative economy. Our case study of Museums and Happiness argues that despite recent intellectual and practical interest in operationalising versions of subjective well-being (SWB) to appraise governance decisions, such engagements have not been critiqued through practical re-application, and that this is problematic for 'guarantees' of 'equivalence of quality' and rigour to research published in academic journals.⁶ In highlighting claims to authority as a procedure, we have drawn attention to how, once these needs are met, an econometric model can perform (Butler 2010) as if ratified across the cultural economy of research and funding. In so doing, we draw attention to how the politics of method (Law 2009) reinforce realities in which research undertaken by external consultants is required to 'solve a problem' of advocacy in evidence-based policy-making. Furthermore, we have shown that the way in which culture is managed increases the likelihood that the most prominent (through prior funding) activities will be researched, thereby influencing the economy of public investment in cultural activities. To critically engage with this new model of expertise, we replicated an externally commissioned SROI model that argues for the value of cultural activities to well-being. We find that the author's operationalisation of participation is crucial, and their estimates questionable.
As the government's 'happiness Tsar'⁷ explains, well-being econometrics are decades from being robust enough for policy-making (Rustin 2012). Therefore, these quantitative technologies are being tweaked to make them policy-ready, to the detriment of public sectors suffering economic and political vulnerability. Yet, public institutions are apparently unaware of this when investing in models with assertive positive results.
Furthermore, a new ecology of 'external' research, positioned between the academy and commercial consultancy, is being represented as a robust solution to a resource and credibility problem, and this necessitates critical engagement. We contend that this new genre of research for policy exists between the market and the academy. It relies on claims to authority in both and leverages these in facilitating discourses of value across policy sectors, Whitehall and the popular press. We argue that these new research organisations are indicative of a potential new cultural economy of expertise emerging from the recent increasing demands on universities to diversify their incomes. We conclude that the intricacies of this new model of research consultancy demand closer academic consideration, given changing research funding structures and recent attention to expertise and the value of contracting out public services.
Mark Taylor is lecturer in Quantitative Methods at the University of Sheffield, UK. He is a sociologist researching the relationship between culture and inequality, asking what gets to be classified as culture and by whom, the social basis of different forms of cultural consumption, and issues of inequality in the cultural labour force. He is currently working on an AHRC-funded project on historical social mobility into cultural jobs. Methodologically, he is interested in the analysis of survey data, and data visualisation.