Backing what works? Social Impact Bonds and evidence-informed policy and practice

ABSTRACT Social Impact Bonds (SIBs) offer an opportunity to explore the use of evidence to inform public policy and commissioning decisions in both discursive and practical terms in what are frequently highly politicized contexts. We identify three potential mechanisms by which SIBs may promote evidence use and explore these through empirical findings drawn from a three-year evaluation of SIBs applied to health and social care in the English NHS.

IMPACT This paper highlights three mechanisms by which SIBs may encourage evidence-informed policy-making. First, the ability of SIB financing to promote specific interventions for which a positive evidence base already exists. Second, the opportunities that SIB-financed programmes offer for the promotion of evidence use through improved local data collection practices. Third, the opportunities that SIB-financed interventions offer for formal evaluation. The authors tested these mechanisms; the implications of the results for policy-makers, public managers and other interested parties are presented in the paper.


Introduction
Globally, governments are exploring innovative ways of procuring public services to improve effectiveness and efficiency. A high-profile example of this trend over the past decade is the development of Social Impact Bonds (SIBs). SIBs are pay-for-performance schemes in which private for-profit or social investors (who seek a blend of financial return and social good) provide some up-front finance towards the delivery of a public service and subsequently may receive an outcomes-based rate of return. A key attraction of the SIB model for governments is that they should only pay for 'what works' (Mulgan, Reeder, Aylott, & Bo'sher, 2011). A concern for 'what works' builds on advocacy of evidence-informed policy and practice (EIPP) directed at policy-makers, practitioners and researchers over recent years (Boaz, Davies, Fraser, & Nutley, 2019).
In this paper, we explore the relationship between SIBs and EIPP. This is important because this relationship is somewhat ambiguous (Maier, Barbetta, & Godina, 2018). While some SIB proponents emphasise the promise that SIBs hold for furthering evidence-informed interventions or practices (Mulgan et al., 2011), other authors have highlighted potential epistemological (Warner, 2013), ethical (Roy, McHugh, & Sinclair, 2017), and practical (Edmiston & Nicholls, 2017) concerns in the relationship between SIBs and evidence. These conflicting viewpoints reflect the competing narratives discerned in the literature more broadly. Proponents emphasise the promise of SIBs as a 'win-win-win' policy tool (i.e. one that delivers better social outcomes for service users, cost-savings to government and a return to investors). In contrast, critics urge caution about the potentially damaging implications of the SIB concept (Fraser, Tan, Lagarde, & Mays, 2018a).
We present findings from a three-year evaluation of the first SIBs focused on health and social care in the English NHS (Fraser et al., 2018b), identifying three potential mechanisms by which SIBs may promote EIPP and exploring these through the evaluation's findings. In theoretical terms, we situate SIBs within wider debates linked to both the discursive and practical use of evidence in policy-making. We argue that SIBs are a useful lens for understanding evidence use in policy because evidence is strikingly central to the claims made by SIB proponents and their critics.
The paper is structured as follows: first we discuss the relationship between evidence use and SIBs; then we describe the methods used in this study. Next, the findings are presented. Finally, the findings are discussed and the key implications for practitioners are highlighted.
found in these: first, 'cost-saving risk transfer to private investors' and second, 'flexible but evidence-based services' (Maier et al., 2018, p. 1333). The first is paradoxical because SIBs have high transaction costs (beyond those of traditional commissioning) and SIB-financed initiatives that are rational choices for governments are unlikely to be attractive to investors (and vice versa) (Giacomantonio, 2017). 'Evidence-based flexibility' is also paradoxical as it suggests both conformity to an evidence-based model and malleability in delivery which may run counter to model fidelity. Of the 51 practitioner reports reviewed by Maier et al. (2018), 34 contained the paradox of 'evidence-based flexibility'. A strategy developed by practitioner report authors and identified by Maier et al. to sidestep this apparent contradiction was to employ a very loose understanding of the terms 'evidence' and 'evidence-based'. Public sector commissioners face a further contradiction related to financing interventions that have strong evidence of success through a SIB, namely, why should they pay more for a predictable level of success they could achieve through conventional commissioning?
Beyond SIBs, the language of 'evidence' and 'evidence-based' change in policy-making has had a recognized discursive power aligned with positivistic, managerialist, and 'post-ideological' technocratic assumptions (Newman, 2001) since at least the mid-1990s. Use of an evidential discourse may highlight an intentionality on the part of SIB proponents that is worthy of deeper consideration. In policy terms, aligning SIBs with 'evidence' and 'evidence-based' interventions may be seen as an attempt to depoliticize SIBs, and pre-empt some of the ideological and ethical criticism that has emerged about SIBs on the grounds that they 'financialize' human relations and social services (Warner, 2013; Roy et al., 2017). A key question for the commissioning of SIB-financed interventions is finding a balance between risk, return and a focus on the social outcomes beyond financial reward. Strategic attempts to intertwine SIBs within an 'evidence-based' discursive framing are a useful tactic to validate the policy and focus on the usefulness of the respective interventions while distancing SIBs from the more controversial financial mechanisms they comprise (Warner, 2013). In practical terms, the pervasive use of the discourse of 'evidence' allied to SIBs may be seen as a strategy to down-play the risk of programme failure in the eyes of interested stakeholders, especially investors.
The discursive practices identified by Maier et al. (2018) related to the presentation of SIBs as 'flexible yet evidence-based' draw attention to deeper, definitional ambiguities in the SIB concept. Indeed, the idea of what a SIB is, or should be, is imbued with 'chameleonic' characteristics (Smith, 2013). SIBs demonstrate a high degree of 'strategic ambiguity' (Smith, 2013; Eisenberg, 1984). This is to say that SIBs are amenable to being framed in different ways for different audiences. More broadly, a feature of a SIB is that it may be framed as a 'social' innovation to those with a primarily social ethos. At the same time, a SIB may be framed as a 'financial' innovation for those who wish to emphasise the potential to deliver a financial return to investors who desire to engage in 'good works' (Fraser et al., 2018b). At different times, the 'evidence-based' potential of SIBs or their potential for 'flexibility' may be emphasised (Maier et al., 2018) by different actors for different purposes. From a public management perspective, a SIB might be expected to either shift the risk of failure to an investor in return for a higher level of public funding while delivering success in line with intervention model expectations, or deliver higher performance against shared and agreed social outcomes (more social return) in return for some additional cost. The respective rate of return, depending on the likely expectations, might be different in each scenario (for example higher in the case of the former, lower in the latter). Nonetheless, striking this balance is a key concern for commissioners.
It is important to cut through some of the ambiguities and paradoxes that characterize the relationship between evidence and SIBs, in particular by being clearer about how SIBs may promote or inhibit the use of evidence through empirical research on SIB projects. We identify three mechanisms by which SIBs may be expected to demonstrate evidence-informed policy-making:
• The ability of SIB financing to promote specific interventions for which a positive evidence base already exists (Maier et al., 2018).
• The opportunities that SIB-financed programmes may offer for the promotion of evidence use through improved local data collection practices (Stoesz, 2014).
• The opportunities that SIB-financed interventions offer for formal evaluation (Fox & Morris, 2019).

Methods
This paper presents findings from a three-year evaluation of the SIB Trailblazers in Health and Social Care (Fraser et al., 2018b) funded by the Department of Health (now the Department of Health and Social Care) in England. Nine projects, collectively known as the SIB 'Trailblazers', received seed funding in 2013 to explore whether to commission a service locally through a SIB and, if so, how to set it up. These projects proposed SIB-financed interventions targeted at a diverse set of population groups (in both geographical and target population size terms).
Likewise, the strength of the evidence behind the respective interventions was heterogeneous, as described in Table 1. The evaluation described and assessed the development of these projects over time with a view to considering whether and, if so, how, SIB-financed services might deliver better outcomes than alternative financing mechanisms. We drew on comparative qualitative case study methods (Yin, 2013) to do so. Qualitative case studies are an appropriate method for exploring issues related to policy implementation (Fraser & Mays, 2020), exploring 'how' and 'why' questions about phenomena through detailed contextualized accounts of cases (Yin, 2013). We undertook qualitative analysis of documents (both local and national) and conducted interviews with relevant actors across the Trailblazers including interviews before and after the decisions were made not to initiate a SIB for those sites that eventually chose not to initiate SIB-financed services. For those Trailblazers which did initiate SIB-financed programmes, we compared each of these qualitatively with sites elsewhere in the country that had the same or similar interventions (for example social prescribing, or specialist foster care services) serving similar populations provided by the same or similar organizations but not financed through a SIB mechanism. This comparison, though not perfect, sought to illuminate how the presence of SIB financing might have affected the management and delivery of services.
We conducted 177 interviews with 199 informants across all sites between June 2014 and May 2017 until 'data saturation' (Glaser, 1978). We purposively sampled informants to include commissioners (N = 38 with 32 informants), providers (N = 123 with 109 informants), intermediaries (N = 23 with 13 informants), investors (N = 9 with 10 informants) and others (N = 5), for example central government, data analysts or consultants. Most interviews lasted an hour and were face-to-face, though a number of interviews were also over the telephone (N = 27). Many interviews were conducted by two members of the research team together, and a small number of interviews was conducted with more than one informant.
Interview transcripts were coded with the support of NVivo 10 software. Two members of the research team analyzed data collaboratively to ensure inter-coder reliability and interrogated the data repeatedly in order to understand key issues in relation to the Trailblazers. We engaged closely with themes emerging from the data alongside wider theoretical insights (from the SIB-specific and EIPP literature). In this way, the approach combined both inductive and deductive elements (Langley, 1999) as part of an iterative analytical process. The themes derived from the research questions and objectives of the evaluation related to the decision to initiate a SIB-financed project or not; early implementation challenges where SIBs were commissioned; impacts of performance management and contract management decisions and service delivery upon different actors; the nature of the evidence underpinning the SIB-financed intervention and the ability to undertake an attributable evaluation of the intervention; and broader views of staff about potential strengths and weaknesses of SIB-financing mechanisms as they developed and delivered SIB-financed projects. The interviews in the non-SIB comparison projects explored similar questions with the goal of attempting to tease out the main differences between delivering services with and without a SIB. The research generated a large volume of data. In this paper, we draw on a subset of the data taken from interviews across all sites (where SIBs were initiated, where they were not and the non-SIB comparison sites) that focus specifically on the three aspects of evidence use introduced above.

Findings
The strength of the evidence behind an intervention financed by a SIB mechanism
The three proposed Trailblazer interventions with the strongest evidence base were initiated: these were the Manchester TFCO-A programme, the Newcastle Ways to Wellness social prescribing programme and the London Rough Sleeping SIB. There are a number of trials and a recent systematic review exploring social prescribing (Bickerdike, Booth, Wilson, Farley, & Wright, 2017) and academic research into interventions that aim to improve targeted adolescent behaviour, including TFCO-A (Evans, Brown, Rees, & Smith, 2017). Key elements of the rough sleeping intervention have been evaluated through quasi-experimental evaluations (Pleace & Bretherton, 2013) and the 'Housing First' principles it draws on have also been subjected to systematic review (Fitzpatrick-Lewis et al., 2011).
It is notable that the results of the reviews of these three interventions are somewhat mixed, and this may be significant, particularly with respect to social prescribing. Clinical champions of the Newcastle social prescribing SIB Trailblazer suggested that a key aim of the programme (alongside social improvement and cost savings) was to add to the evidence base behind social prescribing at scale: … this is actually a research [project] … you have to be able to prove it works. And you really do need a cohort of control patients to say … is it making that much of a difference? (Clinical champion). As part of this Trailblazer, the quasi-experimental evaluation, including a control group from another part of the city, performed a dual role. For the clinical champions, it sought to generate data on the effectiveness of the social prescribing intervention itself. For the commissioners, it sought to demonstrate changes that could be causally attributed to the intervention itself, thereby justifying performance-based payments. There were both experimental and managerial reasons to maintain a level of rigidity in these metrics. However, this view was not shared by all parties, one of whom, in the light of implementation problems, sought to 'change and flex' aspects of the intervention metrics, generating tensions with other parties.
In the case of the Manchester TFCO-A project, all parties (commissioners, providers and investors) valued the fact that the intervention was evidence-based and had been previously successful in other places. The commissioners stated that, without SIB financing and the transfer of implementation risk away from the local authority that it represented, the local authority would not have been prepared to pay for the service: … it's not something that we as a local authority would have invested in because it's so difficult and so complex and so challenging in terms of making it work. But … the risk [is] shared [through the] social impact bond. And, also, the basis originally of this TFCO … was that it was really [effective for] offenders (Commissioner).
SIB financing enabled the Manchester team to ring-fence the budget for staffing the dedicated TFCO-A social workers required to achieve model fidelity. It was considered too challenging for the local authority to be able to fund such staffing levels itself in the context of government financial austerity in the UK. Indeed, elsewhere in the UK over this period, many TFCO-A teams financed through conventional local government means were under immense financial strain, impeding the delivery of TFCO-A, and leading to the closure of many services including our non-SIB comparison site. As with the social prescribing example above, the SIB-financing mechanism was central to the initiation of TFCO-A by mitigating the implementation risk of an intervention which had numerous social work champions and an emerging (if contested) evidence base.
In addition, two SIB Trailblazer services lacking strong evidence of effectiveness were also commissioned. In the case of Shared Lives, the local authorities in each site had run in-house Shared Lives services or had worked in close collaboration with local third sector organizations (for example voluntary, community, not-for-profit) to deliver the service to relatively static numbers of service users for many years. The SIB offered them a way to scale up the service and potentially realize cost-savings locally. So, while there is little rigorous research evidence on the effectiveness of Shared Lives services, there is well-developed local experience of the promise of the programme in terms of user satisfaction and reductions in costs. The Worcester Reconnections Trailblazer was a targeted intervention to reduce loneliness, and thereby lead to improvements in health outcomes. By reducing social isolation, service recipients were expected to remain more active (thereby reducing the likelihood of non-communicable diseases linked to sedentary lifestyles). This intervention was intended as a proof-of-concept project to generate evidence about a personalized approach to combating social isolation, thereby reducing its harmful health effects.
The four Trailblazer interventions that were not initiated lacked research evidence of effectiveness. This was noted as a specific factor that contributed to the decision not to commission the services in two cases (Leeds and Cornwall). The research evidence behind the proposed interventions in Sandwell and East Lancashire was also weak. In summary, the SIB-financing mechanism enables interventions with and without the backing of research evidence to be initiated. For those interventions without an evidence base, local experience of the service engendering confidence in the minds of commissioners (Shared Lives), or a commitment to experimentation (Worcester Reconnections) appeared to be significant in garnering support.

The opportunities that SIB financing can offer for evidence generation in a local intervention
In the five active Trailblazers, informants emphasised that local administrative and descriptive data were routinely analyzed and used to guide local decision-making. Indeed, this enhanced use of data was cited by informants as a central advantage of SIB-financed work compared with their prior experience. This is a consistent finding across UK SIB research (DWP, 2014; Disley, Giacomantonio, Kruithof, & Sim, 2015). Furthermore, interviews with the non-SIB-financed comparator sites revealed that staff at these sites drew less upon administrative and descriptive data than the SIB-financed sites: … there's a lot of data processing that needs doing there and we haven't got the capacity to do it, and probably not the knowledge to do that (Provider: non-SIB-financed comparison site). These findings align with arguments of SIB proponents who posit that the SIB mechanism encourages more reflective practice and improved capacity for active oversight of programmes through enhanced data collection techniques, management systems and sophisticated governance arrangements (Mulgan et al., 2011; HM Government, 2011).
Nevertheless, the picture is more nuanced than that presented by SIB proponents. An issue identified in the SIB-financed Trailblazer sites, and not found in the conventionally-financed comparator sites, related to how locally-produced administrative data were used to inform decisions about payments among the respective parties (as opposed to local learning and reflective practice). This increased the importance of these data and sometimes led to conflict between different parties (Fraser et al., 2018b). There is an assumption among SIB proponents that the goals of all parties can be aligned, and that more active and extensive use of data will be intrinsically beneficial (Mulgan et al., 2011;HM Government, 2011). However, much of the increased local data collected in these sites did not relate directly to the service user outcomes in the commissioner contract but, rather, to service provider processes, with important implications for performance management.
We found a range of managerial approaches to missed targets in the Trailblazers. In a number of cases, these had serious financial implications for service provider organizations: We had a target of getting, I think it was seven [new clients], I think by the end of March. So it was all gung-ho to try and get that through. We missed it by one I think. Now because of that I think we lost £600,000 worth of investment. So obviously that has had a big impact on everybody really … That [money] was going to come from the investors. But they wouldn't give it to us (Provider).
Evidently, the goals of investors may not necessarily be aligned with those of service providers, service users and commissioners. Withholding finance from service providers in response to missed process targets was a valid contractual response that also protected the investors from further potential losses, but did not necessarily advance the intervention locally, leading instead to turbulence among the partner organizations and significant financial strain for the provider. It should be noted that in another Trailblazer, missed process targets instead triggered change management processes that enabled the reorganization of service delivery in ways that were welcomed by most subcontractors. It also led to a contract renegotiation between the commissioners and providers and revised targets and financial flows among service providers that reflected what was possible for the remainder of the contract period rather than enforcing penalties for sustained underperformance.
This link between increased process measurement and organizational performance through the use of administrative data pressurized some staff working on the SIB-financed interventions. In interviews, some informants recounted that the financial goals of the SIB linked to local administrative data conflicted with their professional goals and responsibilities to service users (and intervention fidelity where applicable): I would have liked to see [a service user] sit on the programme for maybe a few months more. But from a financial perspective and from the investors' perspective we had to [terminate the process]. In many ways, that was okay, but having not had that SIB there, that side of things there, I would have been advocating or pushing further for a few months on the programme. So that's probably a really key example of where the clash is (Provider).
For some informants working on SIB-financed programmes, increased data collection attributed to the SIB mechanism was interpreted as a disciplinary device to focus service provider staff on achieving outcomes-related rewards for the provider organization as opposed to a collectively devised method to refine service delivery through innovative approaches that reflected a commitment to achieving long-term benefits for clients. We found some examples of 'gaming' in the Trailblazers, as have other SIB research teams (Edmiston & Nicholls, 2017; DWP, 2014). However, we also found contrasting signs of provider staff who were committed to avoiding 'gaming', despite incentives to the contrary: … because we are a … charity, we've been able to just ignore the potential issues with payment by results, which are that you cherry-pick and you don't work with the most in need. We have, anyway, just because we see that as our role. Reputationally, it'd be rubbish for us to just say, well, we're going to work with these easy people, and morally, why would you work for an organization like this if you're going to do that? (Provider).
In some Trailblazers, we found that data collected by service providers and local commissioners became highly politicized, with debates among different parties about the appropriate methodologies to analyse and interpret the counterfactual evidence. Once more, this was linked to the financial stakes related to these data as they were used as evidence to trigger payments. While there is the possibility for adverse behaviours with other forms of financing, it is notable that there were no such issues in the non-SIB-financed comparator sites.
Finally, we found significant issues in relation to data access. In one instance, it was impossible for all parties to audit and validate the ways in which data were collected and used due to NHS data governance restrictions. In SIB-financed interventions (in the NHS at least), there may sometimes be overly ambitious assumptions as to what is achievable in terms of increased data collection and local evidence generation due to access issues. Furthermore, while it is possible that local data are used as a collaborative learning device, these data may also be mobilized as a disciplinary device.

The opportunities SIBs offer for formal programme evaluation
There is a growing set of empirical studies commissioned by the UK government that evaluate SIB programmes (Disley et al., 2015; Mason, Lloyd, & Nash, 2017; Fraser et al., 2018b). However, it is still the case that there is little rigorous counterfactual comparison of SIBs versus alternative methods of finance to deliver the same service to the same type of users, and thus a lack of evidence of the costs and benefits of SIBs compared with alternative approaches to procurement (Fraser et al., 2018a). All but two (Anders & Dorsett, 2017; Spurling, 2017) of these UK evaluations have exclusively reported qualitative findings and lack data about quantitative outcomes and costs.
We distinguish between overarching evaluations that seek to generate comparative data across a number of SIBs, such as the Trailblazer evaluation (Fraser et al., 2018b), and focused local impact and process evaluations of individual SIBs, such as the Peterborough SIB evaluation work (Disley et al., 2015; Anders & Dorsett, 2017). Our focus in this section is on local evaluations of Trailblazers. It is sometimes unclear how evaluations will be paid for, and this can inhibit the development of local evaluations: All of the, the commissioners' money that we've got is going into reward payments to make that as big a pot as possible to get the most outcomes. So no money was kept aside for management [of the] evaluation (Commissioner).
Of the five Trailblazer projects that were commissioned, Newcastle Ways to Wellness and the London Rough Sleeping SIB commissioned local impact evaluations (assessing programme effectiveness against a counterfactual, and financed by local commissioner and/or central government funds). As with the Peterborough evaluation (Disley et al., 2015), there have been some contested issues related to data collection and interpretation in Newcastle's Ways to Wellness Trailblazer. In some Trailblazers we found an implicit assumption that any and all improvements in client outcomes identified should be attributed to the SIB intervention regardless of whether any local attempt had been made to prove this through any impact evaluation using a counterfactual (though Worcester Reconnections did commission a local evaluation). This runs counter to the early discourse of SIB proponents who pointed to the rigour of the evaluation of the Peterborough project (Mulgan et al., 2011; HM Government, 2011). The original SIB model that the UK government and others promoted included an independent evaluation as routine to ensure that the public purse would only pay for outcomes attributable to the interventions financed by the SIB mechanism (Fraser et al., 2018b). Empirical experience in the UK would suggest that this is the exception rather than the rule and that attribution is often assumed, as opposed to independently proven. Additionally, cost-effectiveness data are lacking from all UK SIBs.

Discussion
We found further evidence of the 'strategic ambiguity' (Eisenberg, 1984) within the SIB concept in the Trailblazer evaluation. The SIB concept can be applied to the development of interventions both with and without a strong positive evidence base (Maier et al., 2018). Evidence from the Trailblazers suggests that the three proposed interventions with some supportive evidence were initiated, and most of those without research evidence were not initiated. The Trailblazers demonstrate that SIBs can indeed promote evidence-informed programme implementation (i.e. programmes which already have evidence of likely effectiveness). Our findings suggest that SIB financing may bring added value for an intervention like social prescribing, as it is seen as a way to increase the evidence base. In the case of TFCO-A, informants felt that a SIB was a good way to transfer risk and set-up costs from commissioners in a context of austerity. This study also highlights that SIBs can lead to the initiation of programmes for which research evidence does not yet exist in order to enable experimentation as a way to generate greater understanding of novel interventions.
Importantly, there are epistemological questions concerning how we judge what a 'positive evidence base' is, and wider debates about what counts as 'good' evidence for policy and practice. At different times, policy-makers, practitioners and service users may need to draw on different forms of knowledge and ways of knowing, depending upon the questions they seek to answer (Boaz et al., 2019). An interest in knowing in advance that a programme has a 'positive evidence base' may orient SIB proponents towards academic research and interventions that have already been developed and evaluated using established research designs such as randomized controlled trials and systematic reviews. There are advantages in policy-makers carefully considering the evidence underpinning different interventions. Where their primary question is 'what works' (i.e. the question is one of relative effectiveness), there are well-established 'hierarchies of evidence' based on study design (Boaz et al., 2019). Such approaches categorize evidence strength and quality based on criteria that privilege quantitative study design and value internal validity. More problematically, however, hierarchies based on study design exclude important forms of evidence and underrate the value of good observational studies. Moreover, they fail to develop programme theory (i.e. how and why interventions may work) and tend to disregard the importance of local context (Boaz et al., 2019). The prioritization of quantitative evidence over qualitative evidence in SIB-financed interventions, while understandable given the need to measure relative effectiveness in order to pay the investors, may limit the potential for programme learning, stifle innovation, increase pressure on provider staff and create incentives for 'creaming' (Warner, 2013; Roy et al., 2017).
Academic research is just one type of evidence that may be helpful for policy and practice decisions (Oliver, Lorenc, & Innvaer, 2014; Boaz et al., 2019). An important form of evidence is locally-produced administrative or descriptive data. A claim made by SIB proponents is that the SIB-financing mechanism, and the increased rigour it brings, may deliver enhanced data monitoring techniques and skills to third sector providers that have historically been seen as lacking in this regard (Callanan & Law, 2012). The importance of extensive, ongoing performance monitoring and concurrent independent evaluation is emphasised by SIB proponents as a way of ensuring that outcome payments are earned in a valid and attributable way (Cox, 2011). SIBs potentially offer the opportunity to draw on more (and better quality) administrative, descriptive and management data. Paying for outcomes might be expected to encourage increased and improved local data collection (Cox, 2011). This in turn might increase transparency of practice for commissioners and third sector providers and increase accountability of programmes overall (Stoesz, 2014).
Nonetheless, implementing evidence-informed interventions is highly complex, relies on the development of valued relationships over time and assumes shared conceptions as to what evidence is (Oliver et al., 2014). While the Trailblazers promoted increased data collection compared to non-SIB sites, this sometimes led to increased financial pressure on provider organizations and increased managerial pressure on provider staff, with potentially detrimental implications. We identified a danger that, because performance data become so closely related to payment, they may become a focus for disputes between different parties and thus prove counterproductive: they may reduce providers' focus on service user outcomes, introduce perverse incentives and damage inter-organizational relationships (Warner, 2013; Roy et al., 2017). Our findings support those of other studies into SIBs in the UK, which highlight that (as with other forms of payment by results) SIB-financed programmes can lead to 'gaming' (DWP, 2014; Edmiston & Nicholls, 2017), which may weaken the validity of locally-produced administrative data in SIB programmes. It is important to note that we found evidence of provider resistance to such pressure.
SIB proponents highlight the potential SIBs hold for wider learning about what works in terms of preventive policy-making (Mulgan et al., 2011; HM Government, 2011). Because SIB-financed interventions promise to pay a return to investors, it is important that public stakeholders are assured that any outcomes associated with SIB interventions are attributable to the interventions themselves, despite the increased costs that more robust evaluation may imply. It may be the case that the commercial sensitivities of investors militate against the commissioning of independent evaluation (Warner, 2013). The opportunity for in-depth evaluation makes the SIB concept of particular interest to the academic community and evaluation specialists (Fox & Morris, 2019). There are opportunities to combine (quantitative) impact evaluations with (qualitative) process evaluations and cost-effectiveness studies of SIBs, thereby furthering knowledge of 'what works, why, when and for whom' (Pawson, Tilley, & Tilley, 1997) and delivering research which transcends traditional 'hierarchies of evidence' of effectiveness and moves closer to a comprehensive approach to evaluation (Boaz et al., 2019) that includes qualitative as well as quantitative data. Worryingly, only two of the Trailblazers included impact evaluations. In the remaining three cases, payments were linked to performance targets assessed at intervals in simple before-and-after terms as opposed to counterfactual impact evaluation. This finding aligns with what has been found elsewhere in UK SIBs and 'payment by results' programmes (Fox & Morris, 2019). The lack of impact and cost-effectiveness evaluation is problematic as it runs counter to the original SIB concept that, in SIBs, government would only pay for what demonstrably 'works'.
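The distinction between before-and-after assessment and counterfactual impact evaluation can be illustrated with a minimal sketch. The figures below are hypothetical and are not drawn from the Trailblazers; the function names are our own, and difference-in-differences is used only as one simple example of a counterfactual design, not as the method applied in any of the evaluations discussed.

```python
# Illustrative sketch only: all numbers are hypothetical, not Trailblazer data.

def before_after_change(before: float, after: float) -> float:
    """Simple before-and-after change in an outcome measure."""
    return after - before


def difference_in_differences(
    treated_before: float,
    treated_after: float,
    comparison_before: float,
    comparison_after: float,
) -> float:
    """Change in the treated group minus change in a comparison group.

    The comparison group's change stands in for what might have happened
    to the treated group without the intervention (the counterfactual).
    """
    treated_change = treated_after - treated_before
    comparison_change = comparison_after - comparison_before
    return treated_change - comparison_change


# Hypothetical outcome rates (percentage of service users achieving a target).
sib_before, sib_after = 40.0, 50.0   # SIB-financed site
cmp_before, cmp_after = 40.0, 46.0   # comparison site without the SIB

naive_estimate = before_after_change(sib_before, sib_after)
counterfactual_estimate = difference_in_differences(
    sib_before, sib_after, cmp_before, cmp_after
)

print(f"Before-and-after change: {naive_estimate:.1f} percentage points")
print(f"Difference-in-differences estimate: {counterfactual_estimate:.1f} percentage points")
```

In this hypothetical case, a payment linked to the before-and-after change would reward the full improvement at the SIB site, although part of that improvement also occurred at the comparison site and so may not be attributable to the intervention.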

Conclusion
We conclude this paper with learning points for practitioners and managers (public commissioners in particular) from this research and highlight questions that may be worthy of consideration from a public management perspective before deciding to enter into a SIB-financed arrangement. Giacomantonio's (2017) analysis suggests that the more attractive a SIB is for investors, the less attractive it is likely to be for public commissioners and vice versa. This paradox is rendered even starker in relation to interventions that are robustly evidenced already and poses a major question for commissioners: why should they pay the extra transaction costs associated with a SIB (Giacomantonio, 2017; Fraser et al., 2018b) for programmes that they already know work? Empirically, the Trailblazers offer some reasons why they might. In the context of austerity, a SIB offers access to new financial streams and increased (non-financial) support for management and delivery of services up-front. Additionally, in the case of the three Trailblazer interventions with the most previous research into their effectiveness (Manchester, Newcastle and London), the jury is still out as to their overall effectiveness and transferability. Therefore, realistically, there remains no guarantee of local effectiveness, and SIB-financing may be expected to spread some of the implementation risk among a wider set of actors than the public commissioners. For already more strongly evidenced interventions, a question to consider may be whether it can be assumed that the intervention will produce the required outcome without the need for local proof, or how much local evidence is needed to establish the effectiveness of the intervention locally.
A further question for commissioners is how a judicious distribution of risk among the respective SIB-linked parties can be found. We have written elsewhere in more detail about the different forms of risk that ought to be considered in SIBs (Fraser et al., 2018b). Commissioners need to carefully consider how they can best influence the alignment of interests between service users, providers, investors, intermediaries and themselves through procurement by specifying the interconnectedness between improved social outcomes, different forms of risk and outcome pricing. We hope these findings may aid such considerations.