Selling ‘impact’: peer reviewer projections of what is needed and what counts in REF impact case studies. A retrospective analysis

Abstract The intensification of an audit culture in higher education is made no more apparent than with the growing prevalence of performance-based research funding systems like the UK’s Research Excellence Framework (REF) and the introduction of new measures of assessment like ‘impact’ or more specifically, the economic and societal impacts of research. Detractors of this regulatory intervention, however, question the legitimacy and credibility of such a system for, and focus within, the evaluation of research performance. Within this study, we specifically sought to understand the process of evaluating the impact of research by gaining unique access as observers of a simulated impact evaluation exercise populated by approximately n = 90 senior academic peer reviewers and user assessors, undertaken within one UK research-intensive university prior to and in preparation of its submission to REF2014. Over an intensive two-day period, we observed how peer reviewers and user assessors grouped into four overarching disciplinary panels went about deliberating and scoring impact, presented in the form of narrative-based case studies. Among other findings, our observations revealed that in their efforts to evaluate impact, peer reviewers were indirectly promoting a kind of impact mercantilism, where case studies that best sold impact were those rewarded with the highest evaluative scores.


Introduction
In the milieu of higher education's marketisation (cf. Palfreyman and Tapper 2014); massification (cf. British Council 2014); globalisation (cf. King, Marginson, and Naidoo 2011) and neoliberalization (Peck and Tickell 2002; see also Giroux 2014), academics are challenged to become more gregarious, even promiscuous, entrepreneurial and implicated in their external relationships (cf. Slaughter and Rhoades 2004). Academia's ivory tower is in the process of being dismantled whilst the insulation and immunity from public scrutiny it afforded is dissipating. In lieu, academics, in the United Kingdom (UK) especially, face formal demands for visibility, transparency and public accountability (cf. Watermeyer 2015a, 2015b). These are new professional obligations, the prominence of which may be attributed at a macro-level to events of global economic downturn and a UK Government policy of austerity in the distribution of public funds, specifically funds allocated for higher education. At a meso-level, they are new conditions of service predicated upon a fiscal (re)rationalisation of Higher Education (HE), particularly in the context of its relationship with the state and the market (cf. Kolsaker 2013; Kuber and Sayers 2010) and the intensification of a system of performance management or 'new managerialism' (Deem, Hillyard, and Reed 2007). Academia's role in what is now variously designated a networked (Castells 2000; Van Dijk 2006), stakeholder (Neave 2002), information (Webster 2006) and/or knowledge (Stehr 2002) society is thus increasingly delineated and/or justified by its contribution to innovation (cf. Christensen and Eyring 2011), knowledge production, distribution and appropriation (cf. Smith and Webster 1997; Walton 2011; Zemsky, Massy, and Wegner 2005).
In the UK, a focus on the commercialisation of universities (Bok 2003), academic entrepreneurship (Shane 2004) and academic capitalism (Slaughter and Leslie 1997) has intensified with a succession of Government-sponsored reports: the Lambert review of business-university collaborations (2003); the Warry Report (2006); Leitch (2006) and Sainsbury Report (2007) reviews. These have been accompanied by the expansion and embedding of new models of knowledge production - post-academic (Ziman 2000); mode-2 (Gibbons et al. 1994; Nowotny, Scott, and Gibbons 2001) - and a greater accent on knowledge translation, transfer and commercial exploitation (Etzkowitz 2002, 2003; Lester 2005; Leydesdorff and Etzkowitz 1998; Rip 2011). What appears an attempt to 'shorten the innovation chain' (Corbyn 2009) and produce more easily evaluated, immediate and immediately recognisable results and benefits is bemoaned by critics who correlate what they perceive as the instrumentalisation and commodification of research (cf. Burawoy 2011) with the stranglehold of managerial governmentality (cf. Olssen and Peters 2005). The economisation and continuous auditing of academia's public role is, in the same context, blamed for the erosion and abandonment of Mertonian principles of universalism, communalism, disinterestedness and scepticism and an elevated responsiveness from academics to 'external governmental and/or industrial pressures for near-term or near-market applicability' (Moriarty 2011, 64). Academic researchers accordingly find themselves having to simultaneously operate as 'impactful' knowledge workers whilst collecting and submitting evidence of the economic and societal impact(s) of their endeavours to performance review.
A UK 'impact-agenda' for higher education has faced hostility from significant numbers of academics (cf. Collini 2012; Ladyman 2009; Russell Group 2010; University and College Union 2009) and contributed to the emergence of oppositional organisations such as the Council for the Defence of British Universities (2012) established for 'the purpose of defending academic values and the institutional arrangements best suited to fostering them'. Even so, impact as a criterion for funding and assessment decisions is now firmly established within the governance of UK HE and offers a model for imitation by other national academies interested in or already pursuing performance-based funding systems.
In the contemporary landscape of UK HE, impact is organised and adjudicated in funding terms through Impact Summaries (IS) and Pathways to Impact Statements (PIS), which are mandatory requirements of all UK Research Council funding applications. IS and PIS provide the means by which researchers specify or perhaps more realistically speculate upon actions they will undertake within the research process that will produce positive impact(s). In the context of research assessment, the Research Excellence Framework (REF) provides the formal basis through which the economic and societal impact(s) of research are retrospectively evaluated and rewarded. The REF - successor to the Research Assessment Exercise (RAE) 1 - is the UK's current national performance-based funding system for academic research, the results of which inform the UK Government's distribution of approximately £1.6 billion of research monies across UK universities. The first and only REF to date occurred in 2014 2 and unlike the RAE demanded that academics explicitly evidence the impact of their research through an impact template and impact case study, which were evaluated by expert peer reviewers and user assessors belonging to disciplinary (sub)panels. Impact in the REF constituted 20% 3 of the total available award and consequently resulted in a huge investment made by institutions in preparing their impact REF submissions 4 (cf. Watermeyer 2014) 5. However, whilst impact has formed an ineluctable aspect of research governance in UK HE and looks set to feature in even more significant terms in future instalments of the REF, recent evidence finds serious concerns about how impact is conceptualised and how possible it is to 'measure' (Leathwood and Read 2012).
The conceptualisation and measurement of impact feature as the core concerns of this article, which reports on a small but intensive ethnographic study of impact peer review undertaken in one institution. More specifically, the article considers the process undertaken by approximately n = 90 senior academic and user group representatives engaged in a 'simulated' evaluation of the socio-economic impact claims made by researchers at Cardiff University, a research-intensive 'Russell Group' 6 institution located in the capital city of Wales, UK. The evaluation exercise, which occurred over a two-day period, attempted to prospectively mimic the processes and conditions by which impact would be evaluated in REF2014 and thereby sought to generate hypothetical projections of what REF panel members would look for in high-scoring impact case studies. The assessment days were, therefore, in part both summative and formative, producing scores for case studies and suggestions of how these might be improved.
Our findings though drawn from simulated deliberations should be accorded no less value in helping to elucidate the process undertaken by peer reviewers in determining what counts as impact. This may be particularly so where what qualifies as impact is inherently subjective; highly variable and disparate if not divisive; contingent on the vagaries of internal and external conditions and intrinsic and extrinsic motivations; and unlikely to be ever exactly uniform or entirely replicable. Having said that, we observed relative consistency in inter-reviewer reliability, where reviewers accessed and made use of similar 'resources' in synthesising individual viewpoints and locating consensus in the scoring of case studies across the panels. Furthermore, with the benefit of time elapsed since the completion of this exercise and the REF -and the release of data related to institutional impact scores where Cardiff as an institution scored highly -we may reasonably infer complementarity between the two. We might also then speculate that REF panellists may have likely drawn on a common set of academic standards to review impact case studies. Ultimately, given the degree of confidentiality and restrictedness associated with the official REF panels, this simulated version provided a unique opportunity to explore peer reviewing processes of impact and gain a sense of what constitutes excellent impact in the mind of reviewers.
Many of the academics involved in this exercise were subsequently also involved as REF panel members and therefore participated in the actual peer review of impact at a national level. Many also had prior experience of serving as panellists in the RAE2008. They accordingly were individuals with a good sense of what would and what probably wouldn't be needed, count and/or be appreciated by REF panellists and, moreover, benefitted from significant experience of research peer review. Some also held directorial positions for research within their academic departments and were consequently immersed in strategy related to their own discipline's REF submission. Our observees were also, however, a coterie for whom the process of REF impact evaluation was entirely new and without precedent. Unsurprisingly perhaps, we noted many occasions where reviewers, almost apologetically, spoke of the inexactness and/or precariousness of their evaluative conclusions and their own uncertainty, if not trepidation, in scoring case studies, predominantly 'guesstimates' they hoped would mimic the impact scores of REF panellists. As a 'mock-up', the assessment days provided not only an opportunity for further preparing the institution's REF impact portfolio and refining and improving its collection of case studies, but also insight into how the peer review of impact in the REF might occur. Furthermore, for those that would go on to serve as REF panellists, the assessment days provided a dress rehearsal to the main event.
As an article that explores 'what counts' or the 'art of academic persuasion', we hope to evince the sociological complexity of impact peer review, whilst illuminating what is accepted or now expected, as and from research/ers. As a case study into organisational responses to, and sense-making of institutional change, we also hope to exemplify the complicity of universities and academic elites with UK HE policy. We would, however, caution that the article is not so much intended as a substantive critique of impact in the REF, certainly as an iteration of performance management deleteriously affecting academic practice - though as might be expected, this theme is not without allusion - as a study of peer review in academic contexts, which remains its primary focus.

Rationale and approach
Following a successful application to the University's REF Committee to observe the exercise and having secured ethical approval, we undertook observations across two days of intensive impact deliberations, comprising the full range of the University's research activity, represented across four mock panels 7 across which we rotated on an individual basis. These observations were instrumental to our developing empirically based understandings of how impact case studies, across all disciplines, are scrutinised and valued by assessors. Concurrently, we were able to observe not only what reviewers count as an excellent case study but what seems to count in the context of institutional performance.
In the larger context of academic peer review, we have discerned a dearth of empirical inquiry related to impact, which is perhaps not particularly surprising given its being a burgeoning aspect of performance evaluation. Whilst the peer review of research has a strong if modest literature distinguished mainly by the work of Michèle Lamont and her monograph on 'How Professors Think' (2010), empirical research, certainly of an ethnographic variety, is scant and where more abundant is indigenous to work into process and impact studies in health and medical research contexts (cf. Hanney et al. 2013; Jones, Donovan, and Hanney 2012).
Ethnographic observation of the peer review process was intended to plug a knowledge gap by providing an immersive and direct experience of how academics and their user group counterparts go about making collective decisions concerning the legitimacy and merit of impact claims. Our role as silent observers at the same time militated against us in any way influencing or contaminating the peer review process. Furthermore, the seniority and considerable experience of the reviewers afforded us greater invisibility, where they were less than concerned by our being present. Notwithstanding, reviewers were made explicitly aware of the nature of our participation, a condition for which was predicated on our providing an executive summary of findings, which was subsequently presented to the university's REF committee and which articulated - in a more perfunctory, prosaic or less expansive way - some of the findings presented herein. 8 A further condition was that any submission of findings for publication occur in good time after the completion of REF2014 and on the basis that our delayed publication would mitigate any lost competitiveness suffered by the university in the terms of its impact submission by, albeit indirectly, openly declaring locally sourced knowledge related to the production of (potentially high-scoring) impact case studies. This hiatus has also been useful in neutralising any sensitivities potentially arising from the more critical dimensions of our discussion.
Whilst we were not permitted to make audio recordings for reasons of confidentiality, we took extensive field notes, which were ordered into full written accounts at the end of each assessment day, ensuring accuracy and detail. We subsequently undertook a thematic analysis of data, which was organised into two overarching categories of what reviewers interpreted as what was needed and what counts in REF impact case studies.
For the purpose of clarity, we have distinguished 'reviewers' as those we observed in the assessment days from 'panellists' as the official REF impact evaluators.

Findings
The first point to note about our observations and what they might tell us about how the academic assessment of impact takes place is that these data were generated at a very specific point in time, in very specific circumstances. For example, a key feature of the panel discourse we witnessed was how members tried to 'second guess' or predict how REF panel members might respond to specific case studies and subsequently tailor their own response. This kind of second guessing would not, we suggest, have been a feature of actual REF panel decisions, apart perhaps from sub-panel members speculating on how colleagues in cognate disciplines would score impact. In addition, post-REF-2014, academics and university administrators have a far clearer idea of what counts as impact, how to evidence impact claims and many of the other aspects of this process, which were highly uncertain for the mock-REF panels we observed. Thus, at the time of this exercise, approximately 18 months prior to UK universities' submission to the REF2014, the kinds of conceptual and applied frameworks for impact as a measure of assessment used by those we observed appeared resoundingly inchoate. Understandings of what constitutes impact, and moreover excellent impact, and excellent evidence of impact were embryonic. Both senior managers and academic faculty appeared equipped, at best, with a vague sense of how to organise and present impact case studies and how these would be assessed by REF panellists. For example, one assessor, who had sat on a previous RAE panel and had been appointed as a member of a REF disciplinary sub-panel, commented that 'none of us really know how to handle this'.
Our main findings relate to what reviewers reflected upon and envisaged, through their experience of the simulated evaluations, as the core factors associated with and influencing high-scoring REF impact case studies. These are dealt with in the following discussion and illustrated in Figure 1, as questions related to 'what is needed' and questions related to 'what counts (and/or does not count)' in the production of REF impact case studies.

What is needed? Style and structure
Arguably, the most prevalent theme to emerge from reviewers' discussion of impact case studies concerned how authors had gone about presenting the impact of their research. Reviewers wrestled with a number of interrelated issues, including but not restricted to impact attribution, timelines and linkage and the manner of their handling or mishandling by case study authors. Reviewers paid particular attention to, and were arguably most discriminating when assessing claims of impact ownership and the exclusivity reported by an institution's individual researcher or research collective when declaring impact: 'Professor "Perkins" has clearly done this work but there are other researchers involved. How much of the impact can be attributed to him?' We might further extrapolate and say that the certainty of impact attribution hinged on authors' proclivity in decollectivising, individualising and/or colonising impact claims or in other words, securing the exclusivity of bragging rights. Therefore, only with the magnification of the author's impact contribution, ostensibly with the nullifying of the (now evanescent) contribution of collaborators, could the claims being promulgated be treated confidently as authoritative declarations.
Reviewers also repeatedly complained of authors' inconsistency in sequencing and synchronising research, research-related and impact-inspired activity, particularly where research/impact chronologies were suspected to fall outside of the REF's impact timeframe, prescribed by the Higher Education Funding Council for England (HEFCE), the architect and administrator of the REF.
Reviewers uniformly made the case that precise and accurate timeframes were indispensable to the perceived robustness of case studies. They also argued for better linkage between research and impact, guiding the reviewer through and making clear a history of change. Perceived slovenliness in attribution and time reporting was correlated to the demotion of an impact case study even where narrative descriptions were considered to be strong. In this specific context, reviewers offered little leniency and scant acknowledgement of the difficulty associated with reporting impact where the impact records of researchers were relatively sparse and incomplete and in many, if not most cases, understandably only instigated at such point where impact was legislated as a part of the REF.
Reviewers, furthermore, proposed that the best impact case studies would reflect a hybrid of narrative lyricism, dynamism and informational efficiency. Case studies, it was felt, would need to be aesthetically pleasing yet sufficiently functional and fit for the purpose of ease-of-evaluation or rather panellists' fluency in making decisions of impact claims. Reviewers claimed that authors should avoid the prosaic yet resist the florid. They would need to achieve succinct and precise statements scaffolded by strong yet not necessarily exhaustive evidence to metaphorically lift impact from the pages of case studies and, therefore, more immediately and forcefully impress upon panellists the strength and quality of the claims being presented. Achieving a balance between a compelling but unfussy style or as one reviewer put it, '… a balance between getting the narrative right and using long quotes and references', it was felt, would require authors to directly engage their readers with lively and lucid rhetoric that ensured impacts were both obvious and explicit:
Reviewer 1a 'There's a strong impact story to tell but it's not being told at the moment. It lacks a coherent narrative… We need more for the lay reader. It's not particularly well written'.
Reviewer 1b 'I don't disagree, I gave it a 1*. There is a case study here but it downplays the methodology. It undersells the research. I know this research and it's pedestrianly written. It's not cooked enough'.
Reviewer 1c 'It's based on excellent research, but there are difficulties with the language used. It's not user-friendly for non-experts. It needs to say, in a simpler way, what this research has been doing. There's a lot of technical concepts that need fleshing out for the reader'.
This exchange points to an issue characterizing many of the case studies (especially those in 'main' panels A and B), being a tendency for authors to write in the convention of their discipline as opposed to the convention of the REF case study, that is using technical and disciplinary jargon as opposed to language suitable for a lay audience.
Whilst authors would need to achieve a 'wow-factor', reviewers warned that they should abstain from over-claiming, crowding and/or convoluting impact claims, certainly where this occurred at the expense of a clear and 'impactful' narrative. It was argued that authors would need to be selective in what they included as the most authoritative forms of evidence and not bombard panellists with an 'everything but the kitchen sink' approach:
Reviewer 2a 'This is potentially interesting but references to research has more than a maximum of six references, the details of impact includes everything but the kitchen sink - academic talks and lectures to postgraduates. It's loose and unwieldy. It needs to connect the research better to impact'.
Reviewer 2b 'I get a similar sense. It doesn't bode well if it doesn't fit into 4 pages. There's so much there but I couldn't tell what was going on'.
Reviewer 2c 'I was alarmed by the number of websites. The assessors will have varying amounts of assiduousness and we need to balance between text and evidence'.
Reviewers recommended, therefore, that case study authors would necessarily require poise in providing panellists sufficient information to make them feel confident in their evaluations yet not so much detail as to obfuscate or unnecessarily prolong their decision-making. They would need to be assertive in their claims yet modest in their articulation of achievements; claim but not over-claim.
Reviewers also asserted a need for precision - certainly in ensuring accurate time lines - yet not parsimony in the detail provided by authors:
Reviewer 3a 'There were a number of areas where I would like to know more. What were the dates of research? Who else was involved? What is the 'Council for X & Y'? Is this a funded body or just an individual with headed notepaper? How important is this organisation?… I didn't feel I had been told enough to judge how significant the impact had been'.
Reviewer 3b 'I'm concerned by the proliferation of dates, which will distract readers. Context and history of impact is valuable but needs to be clear'.
Notwithstanding, narrative economy, where accomplished without lost signposting, featured regularly in reviewers' comments of what would make a high-scoring case study:
Reviewer 4a 'I like the cross-referencing to the evidence. It cuts down on words which is key to clarifying impact statements'.
Reviewer 4b 'This was very good. It was easy to read. I'd say a 3* but could easily get a 4*'.
Precision in the ordering of information and overall structure of case studies were also considered, unsurprisingly, as core to successfully communicating impact and circumventing cause for panellists' doubt. The following exchange between reviewers demonstrates quite clearly a perception of how even research that is patently impactful might fail to be recognised as such where its case study is poorly organised:
Reviewer 5a 'It's over in page numbers and rather unclear . . . it's all over the place'.
Reviewer 5b 'I'm in full agreement… There's no one clear narrative of impact. It includes academic impact. I gave it a 2* because the work is there but in terms of presentation it's a 'U''.
Reviewer 5c 'I know the work. The research is ground-breaking'.
Panel Chair 'But the quality of the social science is beside the point'.
Whilst we discerned little variance in the manner of judging the quality of impact case studies across the four main panels, we noted overall that impact case study authors in panels C and D were more frequently commended and, therefore, we might add distinguished in producing clear, convincing and compelling narrative case studies. Indeed, it transpired that impact narratives in panels C and D required less in the way of 'surgery'. They were concurrently regarded by reviewers as better signposted, sequenced, explicated and more 'impactful' than those in panels A and B. A proposal from the members of panels A and B was for authors to receive external facilitation and scaffolding in the production of their impact narratives such as through the intervention of communication specialists, and/or in the first instance input from colleagues unconnected with the research/impact, able to offer steer or directly translate impact claims in ways more easily comprehensible to academic and lay reviewers.
There was some differing opinion among reviewers as to the most effective way of presenting case studies. Some put forward a preference for a holistic 'one-piece' narrative. Others indicated a preference for a segmentation or compartmentalisation of impact narratives and the use of sub-headings/categories to order claims in a more systematic fashion:
Reviewer 6a 'Is it worth us proposing headings should be used in narratives? It's a useful structure'.
Reviewer 6b 'I note it as a strength in this case but the panel may not like things carved up in a mechanistic way'.
Where some reviewers felt the latter was helpful in explicating claims, certainly in making plain impact attribution and ownership, others thought it might either patronise panellists or unnecessarily interrupt the readability and flow of the impact narrative. Some felt that the use of sub-headings might unnecessarily fragment and silo impact claims or cause them to appear underdeveloped or disconnected. From our own perspective, there appears to be no 'hard or fast', right or wrong way of electing a narrative style; the efficacy of a 'one-piece' or segmented narrative structure relates to the type of impacts being claimed, i.e. multiple, incremental and diffuse or singular, contained/discrete, local.
One key presentational suggestion, made by reviewers, was for authors' frequent use of HEFCE lexicon. It was felt that in signposting impact, authors might adhere to prefacing impact narratives by initially articulating: 'the significance of the research was… the reach of the research was…' Other suggestions for effective messaging focused on the value of a good title; the avoidance of self-adjudicating claims and self-congratulatory claims of impact; and the badging of research according to its funder.

What is needed? Evidence
Reviewers voiced concern that REF panellists' main, if not only, recourse to evidence substantiating impact claims was the underlying research presented in case studies. Reviewers questioned the feasibility of panellists accurately gauging the quality of the underlying research and their capacity to enforce a 2* threshold 9, where evidence indicating the quality of the research and its suitability as a barometer of impact excellence coupled with time for its review would be scarce. Whilst the HEFCE guidelines for impact evaluation allowed REF panellists to review the outputs of underlying research, it was not clear to reviewers how much time or inclination panellists would have for this, especially given their already considerable reading commitments in evaluating the main corpus of research outputs (cf. Corbyn 2008).
Consequently, in this 'mock REF', reviewers, furnished only with bibliographic information related to outputs, resorted to a conventional academic criterion for quality: whether or not the article had been published in a peer-reviewed journal and the relative standing of that journal. We habitually noted panel members saying things like: 'publications are of appropriate quality I think; not amazing journals but probably OK'; 'I wanted more in substantive journals, though I am happy to be corrected'; or 'within our field we think the journals are quality'. In the social sciences, a hierarchy of publishers for monographs was noted: 'I would give at least 2* because of the OUP book'. Some preference was also given for traditional journals above open access publications: 'it's in an open access journal: why not Nature?'. Of course publication in a peer-reviewed journal does not necessarily confer research of 2* quality. Consequently, it was felt that some sub-panels would be forced to recruit additional sources of information, such as citation data. Business and Management Studies for instance has an accepted 'league table' of journals that might be used to inform and confirm panellists' estimations of the quality of the underlying research (Havergal 2015). Reviewers also expressed scepticism about peer-reviewed funding for research as an indicator of research quality, noting that peer review of a competitively funded research proposal does not automatically guarantee high-quality research outputs.
Reviewers' overall conclusion in this regard was that the most uncomplicated and cogent way of signposting the quality of the underlying research to REF panellists would be through citing as many peer-reviewed journal articles as possible, and necessarily articles published in journals deemed high quality by the relevant academic discipline. However, the presentation of supporting evidence was seen to be not without issue, especially where its over-abundance might distract, crowd, cloud and/or impair panellists' deliberations or cause case study authors to appear indulgent and/or bombastic. Whilst reviewers observed the need for case studies to be built upon solid evidence, it was acknowledged that REF panellists' fullest determination of the quality of such would be difficult, without their committing to more extensive and onerous detective work. Consequently, reviewers appeared more concerned that authors should defend against the possibility of REF panellists querying the strength and/or authenticity of the underlying evidence by providing a narrative that would not possibly fail to convince. Given also the recentness of impact as a component of research evaluation, it was assumed that panellists' scrutiny might not be so discriminating as that given to research outputs. Indeed, reviewers habitually and explicitly communicated a sense of impact peer review as an unfolding experience and formative learning process for all involved. They also reflected on a danger that panellists, as neophytes of impact peer review, might veer towards evaluations of impact prejudiced by their experience of, and sense of confidence in, evaluating (underlying) research. As such, reviewers feared that panellists might unwittingly duplicate evaluation of research and neglect evaluation of its impact: 'None of us really know how to handle this. We need to concentrate on impact and not research and avoid double-counting research. There will be a tendency to stray into the quality of research'.
A sense of panellists adhering to a priori knowledge or being guided by pre-established values or frames of reference was further conveyed, perhaps nowhere more succinctly than by one reviewer who intimated that impact peer review was less a process established and maintained by scientific rigour than personal intuition and gut instinct, certainly in the assignment of high impact scores: 'Anything that is truly excellent, 4* is obvious . . . the 4* will jump out at you'.

What counts? Reach and significance
We found that reviewers' interpretations of the reach and significance of impact tended to be geographically defined - and therefore, to our mind, highly proscriptive - with impact that was globally mobile and diffusive attributed with higher value than impact that was more localised and specific to national territories. This was an issue especially problematic for the social scientist researchers whose research interfaced with more local or nationally determined issues and problems. Moreover, reviewers articulated a concern that the devolved context within which many, if not most, of their institution's researchers worked could have a debilitating effect on the value assessment of their impact claims, where many of these demonstrated reach and significance, yet focused on Wales:
Reviewer 7a 'My concern about 'reach' is that work that focuses on Wales will be penalised'.
Reviewer 7b 'You don't want all case studies to be Wales based. Institutionally this could be a risk factor'.
Reviewer 7c 'There are questions whether an all Wales focus from a submission will be a problem'.
We witnessed many inconclusive discussions between reviewers speculating about how REF assessors would interpret reach and significance and variations of the two, and the extent to which the university's impact competitiveness in the REF might be jeopardised were its impact portfolio considered to be parochial:
Reviewer 8a 'It only refers to Wales, which rather limits its reach'.
Reviewer 8b [clearly not convinced by these geographical interpretations of reach] 'There may be more they can say about reach but are you saying that because it refers to Wales it doesn't have reach?'
Reviewer 8c 'There's an element of risk. It depends upon the assessor'.
Unsurprisingly, therefore, a pervasive sense of anxiety characterized reviewers' deliberations of whether impact that was local to Wales was any less valuable, in the terms of its reach and significance, than impact that traversed national borders. The final conclusion was that an institutional impact submission that was overly and/or overtly Wales focused would be detrimental not only to the potential award of 4* case studies, but moreover the claims of the university as a locus of international research. Consequently, we were able to surmise that in second-guessing REF panellists, reviewers were predominantly risk-averse and subscribed to a conservative and narrow, if arguably conclusive conceptualisation of reach and significance that had more to do with the populousness of impact across places than impacts among persons, and were consequently more inclined to back case studies demonstrating wider geographical 'range' .
Reviewers were also seen to advocate case studies that featured impact diversity, or research that had generated multiple rather than single impacts. It became apparent throughout the course of our observations that 4* impact assessments featured as a large category. Not all 4* case studies were seen as equal, with those examples of considerable impact in only one area (e.g. clinical guidelines or changes in policy) being seen as problematic. For example, one panel member suggested that 'there does need to be a diversity of impacts for a 4*; we need to encourage them to extend the reach of case studies'. Other members of the panel disagreed, suggesting that 4* could be achieved in a single impact. Ultimately, however, reviewers surmised that the gold standard REF impact case study would provide not only reach as defined by geographical spread but also reach orchestrated through a diversity of impacts: 'This is a very strong case. It displays good underpinning research. The actual funding of research is articulated. The reach and significance are very broad covering the UK, Europe and professional standards'.

What counts? Public and stakeholder engagement
When impact was first announced by HEFCE as an aspect of evaluation in the REF, certain parts of UK HE perceived an opportunity for the greater mobilisation, embedding and legitimisation of public engagement as a core component of academic labour (Watermeyer 2012a, 2012b, 2015a, 2015b). Many of the same contingent speculated that public engagement would feature pervasively in case studies as either an iteration of, or route to impact. However, our observations from the assessment days revealed ambivalence and reluctance from reviewers, respectively, in their interpretation of public engagement (PE) as a form of impact and in confidently assigning high scores to case studies built on PE. Some, for instance, like Reviewer '9a', voiced concern that PE activity, where reported as impact, was too risk-laden and would not be viewed sympathetically by panellists. The complexity of causal attribution, or rather of linking research to PE, married to a lack of consensus regarding the precise nature of PE as an iteration of or conduit for impact, was felt to hinder the potential use of PE in the case study context. Other reviewers such as Reviewer '9b' were more strident in their dismissal of PE in case studies:
Reviewer 9a 'This has produced great television… but how do you measure… has it changed public perception? My worry is, is this physics? I gave it a 'U'… It doesn't feel right. This is high risk… I think it's a U… Will the top physicists do this or will they say it's too high risk?… If there were other case studies from physics this would stay on the shelf'.
Reviewer 9b 'Dissemination and engagement aren't impact. It's important that channels are made clear but I suspect that academics have steered away from emphasising media and engagement'.
Overall, reviewers were more inclined to endorse a view of PE as impact where impact was the consequence of engagement as a process or product leading to impact, rather than PE being impact itself. Reviewer '9c' made this distinction in attributing the impact of research not to the generation of a cultural artefact, or in this case a television programme based on research, but impact emanating from the application of an artefact and its experience. Of course, this is a kind of impact notoriously difficult to accurately measure and arguably even more difficult to confidently claim, certainly in the context of the REF (see note 10):
Reviewer 9c 'Is the impact the cultural artefact of the impact or the artefact. Just because you've been mentioned by a producer means nothing. The impact is what the impact of the programme is… the audience impact'.
Differences of opinion as to the precise nature of PE as impact, certainly where PE centred on the generation of a public artefact, were returned to in consideration of another case study where impact was claimed on the basis of a museum exhibit:
Reviewer 10a 'Is this output or impact?'
Reviewer 10b 'My understanding is that you have to show how visitors to the museum were influenced by the museum exhibit'.
Reviewer 10c 'My understanding is that if I'm a researcher and I tell people something then that is not impact, but if I produce a cultural artefact and people see it in a museum then that is impact'.
Reviewer 10d 'I was in a group that discussed this before and the same points came up then as now. Does this mean if I write an article and put it on a stand in the lobby of the central library or on the side of a bus, then I have created a cultural artefact and had an impact?'
Reviewer 10e 'But a cultural artefact is not the same as a research output'.
Ultimately, reviewers across the panels remained undecided upon the precise value of PE and whether it could, and of course would be counted either as a research output or a research impact.
Such hesitation to conceptualise PE as an iteration of impact played out most explicitly in reviewers' discussions of the cogency of impact claims where case studies cited a researcher's membership of an advisory body as evidence of impact. For example:
Reviewer 11a 'I was worried this might get a 'U'. The impact rests on the role the person is playing on different bodies. Having a role on a body is not impact. They need to show that the role is driven by research and how research is feeding into the body's claims'.
Reviewer 11b 'In terms of the details of impact, it's not clear when impact is claimed or research. There's membership of a public body but no evidence to show this is why this happened. This hinges on whether evidence can be generated to show link between research and impact, rather than, 'I did some research and got invited on some bodies and therefore had impact''.
Reviewer 11c 'What I read was engagement and influence but little attempt to link those outputs to the research'.
Reviewer 11d 'He's not the only person on these advisory bodies, so how much of these decisions made are his and how much are collaborative?'
Reviewer 11e 'Giving general advice to a committee is not a proxy to putting research into impact'.
Ultimately, reviewers largely agreed that the university's REF impact submission should feature fewer rather than many case studies based on PE, and that where alternatives to PE case studies were available, these should take precedence. Yet, this kind of scepticism about PE is in tension with what we might call the 'paradox of folk impact'. User assessors were included in the REF panels (and hence these mock assessments) because, it was felt, they would be better placed to assess the 'real world' value (and hence the impact) of academic research, certainly better than ivory-towered academics. Yet, in the assessment process, we observed user assessors employing public engagement and media coverage as measures of impact, something that the HEFCE guidance explicitly ruled against. For example, one user assessor argued: 'Public engagement is not really strong in terms of the media, but if a journalist has used your work... This could be an excellent example of engagement with Welsh civic society.' Subsequently, the user assessor was explicitly contradicted by the mock panel's Chair for employing an illegitimate measure of public engagement:
User assessor 'The impact is significant but the research is surprisingly limited given the cultural shift to measurement of local government performance and making this publicly available. There is little evidence of public engagement - no reference to the wider media, what about the third sector and think tanks?'
Chair 'But dissemination and engagement aren't impact. It is important that channels are made clear but I suspect that academics have steered away from emphasising media and engagement'.
This paradox, where non-academics, despite the fact that they are meant to provide a non-academic perspective on impact, are discouraged from using impact measures that seem right to them ('folk impact') in favour of academically defined measures is indicative of the highly artificial nature of impact.

Discussion
Much of the focus of what we observed in the course of the impact assessment days had to do with how case study authors window-dressed their impact claims and crafted compelling and 'reader/evaluator-friendly' impact narratives with which to inveigle the favour of REF panellists. This was an exercise that prioritised, or rather championed, the impact author as impact merchant and/or marketer. That this was the case is perhaps unsurprising, given the high levels of uncertainty experienced by reviewers (and the wider academic community) both in terms of what impact in the REF should/would look like and the incipiency of strategies for assessing the evidence supporting claims of impact. The challenges faced by our cohort of reviewers were very different to those explored by Michèle Lamont, whose grant reviewers all had clear ideas of what quality looked like in their own disciplines, even if they were less clear about applications from other areas (Lamont 2010).
In total, we witnessed an institutional strategy for upgrading impact case studies based on a presumption that rhetorical artifice and a combination of exegetical eloquence, economy and precision would invoke the largesse of REF panellists. An institutional focus on evidence capture was arguably less sharp, with reviewers seemingly resigned to what they perceived to be a single-track route to substantiating impact claims - via the underpinning research. Of course, the scarcity of other forms of reliable and/or compelling evidence and arguably the uncertainty of the appropriateness of co-opting these for REF impact case studies can be accounted for by researchers' lack of familiarity and experience of impact as an iteration of performance review and, therefore, a pardonable failure in maintaining an historical record of their impact achievements. A lack of follow-on time in the further refinement of these case studies, where internal scrutiny and its communication back to authors closely neighboured the point of the institution's REF submission, may also account for reviewers biasing authors' honing of impact vernacular over their establishing the exactness and indisputability of impact fact through extended evidence harvest.
Whilst not a specific concern of this particular study - though one which is being followed up in a subsequent exploration of REF panellists' accounts of impact review - we observed that the role of user group assessors was largely cursory (curtailed or curated by the mock panel's academic Chairs) and was characterized by deference to the greater scientific capital and ritualistic authority of their academic counterparts. Conversely, user assessors, albeit perhaps more tacitly, were important as both a positive disruptive influence and deliberative ballast to the (academic-led) discussions, providing interpretative width to valuations of impact yet also illuminating the parameters by which research is appropriated by user constituencies. However, their contribution across these panels, where in most instances the ratio of membership was something in the region of 1:12 user to academic, was largely ceremonial.
The weight of academic numbers and combined flexing of intellectual capital seemed to usher user assessors towards consensus, in most instances, rather effortlessly. There appeared, therefore, little difference in users' interpretations and value judgements of impact compared to their academic counterparts - though we did witness greater advocacy of public engagement as a form of impact than by the latter.
We thus conclude, initially by recognising the significance and/or the 'need' of researchers as impact case study authors, in discovering and culturing a dynamic style of narrative writing, which successfully communicates and sells impact to academic peer reviewers and user assessors. Secondly, our observations have helped us visualize the difficulty of impact reviewers in confidently and efficaciously arriving at judgements, where the evidence that might inform and guide these is highly curtailed, mainly to the underlying research. The implication, therefore, is that reviewers might be more susceptible or inclined to forming value determinations of impact case studies that are more arbitrary, speculative and/or less assured than they might, were evidence to be more plentiful and defensible. Correspondingly, there was a suggestion from reviewers that high-scoring impact case studies would need to demonstrate a variety of impacts in a variety of settings. In terms, therefore, of 'what counts' , case studies, where impact was anchored to one locale and one constituency would be less influential in persuading REF panellists, than those where 'reach' and 'significance' were more diasporic.
In another diagnosis of what counts, the kinds of 'soft' impact associated as, or rather with, public engagement were seen to be more peripheral and vague and consequently less preferred by reviewers than impacts where causality and the research/impact nexus were more readily justified. Identifying hesitance, if not resistance, in reviewers' deliberations of public engagement as impact is especially significant when considering the future of public engagement in higher education - under a system of reward and recognition - and its relationship to the impactful academic. Furthermore, reviewers' persistent inability to classify PE's impact contribution suggests that the REF privileges a very specific type of engagement, which has less in common with the general public and more to do with the benefits accrued by predefined stakeholders. Selectivity and strategic targeting of stakeholders would appear, as such, essential to securing reach and significance, where both qualifiers are at least partially contingent on the capacity and capital of research users to appropriate and exploit research, especially in ways that further widen its distribution and uptake. Impact reportage in the REF would, therefore, seem to depend upon academics being able to produce impact in a highly instrumentalised, if not self-fulfilling way.
Finally, the findings of this study leave us with a sense of ill ease. Whilst we recognise the need for academic accountability and the usefulness of academics to provide an account of what they do - as reflexive research practitioners and as 'indentured' to the public as their research financiers - a preoccupation with, or over-prioritising of 'performativity' may be ultimately debilitating to academics' public and scientific contribution; their professional identity and occupational welfare. Whilst impact evaluation appears, at least on the basis of these observations, to raise more questions than it provides answers and seems more speculative than specified, it provides further evidence of the way in which academic labour is being repositioned to complement market values. Impact case studies, and the suggestions of these reviewers for their improvement, signal the continuing neoliberalisation of higher education, where academics' marketability and capacity to sell themselves as impactful - and where impact equals legacy, a succumbing to a cult of individual celebrity - is analogous to the commodification of seemingly every facet of academic life and its celebration. Concurrently, as reviewers' ambivalence pertaining to public engagement as a form of impact attests, that which is less easily commodified and less assured and/or prolific as impact currency, risks exclusion. Similarly, where the goal of academic excellence is so far and wide, where global reach and significance with elite stakeholders is preferred to the contribution academics can make to public communities on their own doorstep, institutions are at risk of dislocation and becoming irrelevant: foreign bodies, 'in' but not 'of' their communities.
Our criticism is not, however, of those who attempt to manage academic accountability, by submitting themselves to peer review or undertaking the process of peer review itself. Our criticism instead is of an existing paradigm of academic governance that appears overly calibrated towards new managerialism and the influence of new public management. The UK higher education community is consequently left with an organisational schema that is less focused on ensuring the public accountability and citizenship of its academics, than engendering a culture of performativity that incentivises those most able (and willing) to sell themselves within a market economy of higher education.

Notes
4. An impact submission to REF2014 was made on the basis of one impact case study per 10 full-time equivalent (FTE) eligible academics being submitted to a disciplinary unit-of-assessment.
5. This paper provides evidence of the significant investment made by one institution in preparing for impact and ensuring the competitiveness of its REF impact submission, by committing approximately 100 of its most senior academic and administrative staff to two days of mock evaluation which also involved recruiting very high-profile user assessors.
6. The Russell Group is a self-selecting sub-group of 24 leading research-focused UK universities. See www.russellgroup.ac.uk for further information.
7. The four panels were intended to replicate the REF's four Main Panels: Main Panel A: Medicine, health and life sciences; Main Panel B: Physical sciences, engineering and mathematics; Main Panel C: Social sciences; and Main Panel D: Arts and Humanities.
8. We should note that the Committee to which we reported was perhaps surprisingly - or unsurprisingly in the context of impact being a new phenomenon - open to a frank assessment of the assessment days. Of course, the convention and etiquette of such a technical report is quite unlike that of an academic paper which has greater scope and license to problematise.
9. REF2014 employed a star scale (1-4) with which to classify the quality of research outputs. Impact case studies might only be considered where the research was considered to be of 2* quality or as according to HEFCE's definition: 'Quality that is recognized internationally in terms of originality, significance and rigour' (cf.

Notes on contributors
Richard Watermeyer is a senior lecturer of Education and director of Research within the Department of Education at the University of Bath. He is a sociologist of education (knowledge, science and expertise) with general interests in education policy, practice and pedagogy. He is specifically engaged with critical sociologies of higher education and a focus on new conceptualisations of academic praxis and the current and future role of the (public) university, particularly in the contexts of the marketisation, globalisation and neoliberalisation of higher education.
Adam Hedgecoe is a sociologist of science and technology studies within the School of Social Sciences at Cardiff University.