Valuing health outcomes: developing better defaults based on health opportunity costs

ABSTRACT
Background: Current health economic analysis guidelines emphasize the importance of using nationally appropriate cost and valuation inputs. However, some countries lack national data, and some analyses focus on interventions with costs and benefits at regional or global scales.
Methods: Recognizing the need for better estimates of appropriate values for application at these levels than those used in the past, we characterize population-weighted dollar per disability-adjusted life year (DALY) averted by World Bank Income Level based on available national estimates of the marginal productivity of the healthcare system.
Results: The defaults suggested here reflect health opportunity costs across countries more consistent with existing evidence than those previously used or recommended. As countries change income levels and healthcare spending, and as additional or updated marginal productivity of healthcare expenditure estimates become available, we expect the defaults to change.
Conclusion: The best option for informing decisions around resource allocation in health care such that they improve health outcomes overall remains the use of time-appropriate country-specific estimates of the marginal productivity of the healthcare system. Instead of single, time-invariant defaults, health economists should seek to develop valuation inputs that better account for health opportunity costs and do so over time.


Introduction
The results of health economic analyses (HEAs) increasingly help to inform decisions and recommendations around resource allocation in healthcare [1][2][3][4][5], although considerable variability exists in the practice of HEAs, both in the methods applied and the assumptions used, all of which can affect results [6][7][8]. Decisions made by individual healthcare systems include whether to use public funds to support a healthcare intervention, how to negotiate the price of a new healthcare intervention seeking to enter the national market, and how to prioritize public funding of healthcare interventions as part of a package of care (e.g., a Health Benefits Package) given the fixed or limited resources available. The results of HEAs can also inform decisions and recommendations made across healthcare systems. For example, the Disease Control Priorities Network's essential universal health coverage model benefits package offers a generic starting point for low- and middle-income countries to begin to develop their own country-specific health benefits packages [9]. The World Bank also categorizes countries into four World Bank Income Levels (WBILs), based on gross national income (GNI) per capita (see Note 1), that it uses for its investment decisions [10]. For the 2020 fiscal year, the World Bank classification includes 31 low-income (LI) countries based on GNI per capita of 1,025 USD or less, 47 lower-middle-income (LMI) countries with GNI per capita of 1,026 USD to 3,995 USD, 60 upper-middle-income (UMI) countries with GNI per capita of 3,996 USD to 12,375 USD, and 80 high-income (HI) countries with GNI per capita of 12,376 USD or above [10].
In their decisions and recommendations, policymakers consider value for money, often judged by comparing the additional costs and benefits of a healthcare intervention against a threshold value. For example, in cost-effectiveness analysis, policymakers typically compare the ratio of additional costs to additional health benefits, in the form of the incremental cost-effectiveness ratio (ICER), to a threshold value. In this case, if the ICER falls below the threshold, policymakers deem the intervention good value for money. Alternatively, the threshold may be used to calculate net benefit, which offers advantages over comparisons of ICERs to a threshold [11][12][13][14][15]. For this analysis, we focus on the threshold applied for valuation.
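As a minimal sketch of the two decision rules described above, the following Python functions compute an ICER and the corresponding net benefit measures for a given threshold. The function names and the numbers in the usage lines are illustrative assumptions, not values from this analysis; the formulas are the standard definitions of the ICER, net health benefit, and net monetary benefit.

```python
def icer(delta_cost, delta_dalys_averted):
    """Incremental cost-effectiveness ratio in USD per DALY averted."""
    if delta_dalys_averted <= 0:
        raise ValueError("ICER is only meaningful for positive health gains")
    return delta_cost / delta_dalys_averted


def net_health_benefit(delta_cost, delta_dalys_averted, threshold):
    """DALYs averted minus the DALYs forgone elsewhere (cost / threshold)."""
    return delta_dalys_averted - delta_cost / threshold


def net_monetary_benefit(delta_cost, delta_dalys_averted, threshold):
    """Health gains valued at the threshold, minus the additional cost."""
    return threshold * delta_dalys_averted - delta_cost


# Hypothetical intervention: 100,000 USD additional cost, 400 DALYs averted,
# judged against a hypothetical threshold of 500 USD per DALY averted.
example_icer = icer(100_000, 400)                      # 250 USD per DALY averted
example_nhb = net_health_benefit(100_000, 400, 500)    # 400 - 200 = 200 DALYs
example_nmb = net_monetary_benefit(100_000, 400, 500)  # 200,000 - 100,000 USD
cost_effective = example_icer < 500
```

With a positive health gain, an ICER below the threshold and a positive net benefit lead to the same decision; the net benefit form has the practical advantage of remaining well defined when comparing several interventions or when health gains are small.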
In HEAs, the threshold value used serves as an explicit or implicit policy choice and may reflect any of a number of different concepts. For example, the threshold may reflect a supply-side concept (e.g., the marginal productivity of healthcare expenditure, representing the opportunity cost in terms of health of committing expenditure to a specific intervention), a demand-side concept (e.g., societal willingness to pay for gains in health), or a norm without any empirical foundation (e.g., multipliers of GDP per capita or a benchmark of 150 USD per DALY averted) [16][17][18][19][20][21]. Whether or not the policy threshold explicitly or implicitly applied in practice reflects health opportunity costs, making choices about interventions inevitably involves valuation [22]. In addition, how health systems incur health opportunity costs depends on financing and delivery and on the fixed or flexible nature of the budget. Using valuation inputs in health economic analyses that do not reflect health opportunity costs can lead to decisions that reduce health outcomes overall rather than improving them [23].
Empirical estimates of the marginal productivity of the healthcare system, now available for most countries, provide a means to quantify national health opportunity costs. These estimates come either from studies conducted using within-country data (i.e., in the United Kingdom, Australia, Spain, the Netherlands, Sweden, and South Africa) or from cross-country data [24][25][26][27][28][29][30][31][32]. Analysts developed most of these estimates in the last few years, and many countries with highly diverse characteristics still lack estimates (e.g., Afghanistan, Dominica, Somalia, and Vanuatu). Decisions and recommendations made within a national healthcare system and those made across healthcare systems frequently rely on defaults.
The historical development and application of default thresholds for the evaluation of ICERs and the calculation of net benefits for global health interventions continue to change. In 1993, the World Bank considered interventions with ICERs below 50 USD per DALY averted in LI countries or below 150 USD per DALY averted in LMI and UMI countries as 'highly cost-effective,' and interventions with ICERs from 150 USD to 200 USD per DALY averted as 'cost-effective' [33]. Contemporaneous and subsequent cost-effectiveness analyses widely adopted 150 USD per DALY averted as a default without adjustment over time [18]. For example, the Disease Control Priorities Network used these default thresholds to determine cost-effectiveness and define national health benefits packages as recently as 2004, without any consideration for potential growth in these values [34].
Another important precedent for commonly applied threshold values used to judge ICERs for interventions in LI and LMI countries relied on estimates of 1x and 3x GNI per capita as the dollar value per disability-adjusted life year (DALY) [35]. Building on this, the World Health Organization (WHO) issued guidelines that recommended comparing the incremental dollar per DALY averted by an intervention to 1x and 3x gross domestic product (GDP) per capita as a basis for characterizing an intervention, deeming interventions with ICERs below 1x GDP per capita as 'highly cost-effective' and those below 3x GDP per capita as 'cost-effective' [36]. Notably, the 2002 report did not discuss the differences between GNI and GDP per capita [37] or note the change in metric; it simply asserted that the 2001 report [35] suggested that 'interventions costing less than three times GDP per capita for each DALY averted represent good value for money' (see page 108) [36]. In 2008, the WHO issued guidelines intended to standardize HEAs for immunization, which discussed using GNI per capita for threshold analyses [37] (see page 63), although elsewhere the report suggested analysts might use either GDP or GNI per capita (see page 25). These recommendations led to the use of valuation thresholds for cost-effectiveness analyses, and of dollar per DALY valuations for incremental net benefit analyses, that varied by country and with time as a proportion of GDP per capita or GNI per capita [38,39].
The WHO more recently refocused on country-based thresholds instead of GDP per capita or GNI per capita-based thresholds following discussion of the pros and cons of using GDP per capita-based defaults [40]. However, the WHO does not provide any guidance on valuation thresholds for countries that lack national estimates or on the adjustment of thresholds for the time value of money [38]. The most recent edition of Disease Control Priorities (third edition, DCP3) applies a threshold of 200 USD per DALY averted for LI countries and 500 USD per DALY averted for LMI countries, based loosely on the average of estimates from cross-country data falling around 0.5x GDP per capita [9,41], with other analyses adopting the 0.5x GDP per capita threshold (e.g., Francke et al. [42]).
Defaults remain necessary, for example, to inform decisions for which no country-specific estimate of the marginal productivity of health care exists, when estimates of the expected additional costs and benefits exist only at a categorical or regional level (e.g., as done to inform priority packages of care) [9], or for global health interventions (e.g., Global Polio Eradication) [43]. Recognizing this necessity, we characterize the impacts of applying existing defaults and explore the development of defaults that better reflect health opportunity costs.

Methods
We begin by assessing the potential health impact of using current defaults instead of country-specific valuation inputs. To assess the health impact of using existing defaults (i.e., 150 USD or 200 USD for LI countries, 500 USD for LMI countries, and 0.5x, 1x, and 3x GDP per capita for all WBILs), we calculate the health opportunity costs of 1 USD per capita expenditure for each country for which an empirical estimate of health opportunity cost exists and compare these against the implied health opportunity costs from using each default. This illustrates the extent to which the default under- or overestimates health opportunity costs. We use the empirical estimates of marginal productivity for 23 LI, 34 LMI, 39 UMI, and 26 HI countries from prior work [29,30] and convert the estimates to 2018 US$ using a US GDP deflator [44]. Because DCP3 used 200 USD and 500 USD in 2012 US$ as the basis for judging the cost-effectiveness of interventions, we first convert these to 2018 US$ using the same method, which gives 221 USD and 553 USD. We note the continued use of 150 USD without adjustment for the time value of money, as often applied to current-year estimates of cost-effectiveness (despite the selection of this value over a decade ago) [18]. Given prior practice [18,34], we include 150 USD per DALY as a threshold for LI countries.
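The two calculations in this step (restating a threshold in a common price year and expressing a cost per DALY averted as the health opportunity cost of a 1 USD per capita expenditure) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the index values 100.0 and 110.6 are placeholders chosen so that their ratio approximately reproduces the conversions reported above (200 USD to 221 USD and 500 USD to 553 USD); they are not the published deflator series [44].

```python
def to_price_year(value, deflator_from, deflator_to):
    """Restate a monetary value in another price year using a price index."""
    return value * deflator_to / deflator_from


def dalys_forgone(population, cost_per_daly, spend_per_capita=1.0):
    """Health opportunity cost (in DALYs) of spending spend_per_capita USD
    per person, given a cost per DALY averted."""
    return population * spend_per_capita / cost_per_daly


# Placeholder index values (assumption): only their ratio matters here.
DEFLATOR_2012 = 100.0
DEFLATOR_2018 = 110.6

li_default_2018 = round(to_price_year(200, DEFLATOR_2012, DEFLATOR_2018))
lmi_default_2018 = round(to_price_year(500, DEFLATOR_2012, DEFLATOR_2018))

# Hypothetical country: 1 million people, empirical estimate of 400 USD per DALY.
actual = dalys_forgone(1_000_000, 400.0)
implied = dalys_forgone(1_000_000, li_default_2018)
# A default below the country estimate implies more DALYs forgone than the
# empirical estimate does, so it overestimates health opportunity costs.
default_overestimates = implied > actual
```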
We then suggest and assess alternative defaults that better reflect health opportunity costs using the category of WBIL [10]. We seek to develop alternative defaults that explicitly consider the relative population sizes of countries in each group and avoid issues with averaging ratios. To obtain the central estimate for each group, we calculate the number of DALYs averted for each country from a hypothetical change in expenditure (e.g., 1 USD per capita), sum the DALYs averted and the hypothetical changes in expenditure across countries, and then divide the total hypothetical change in expenditure by the total DALYs averted (see Note 2). We report these alternative default estimates by WBIL group in 2018 US$ and then express each value as a percentage of the group's population-weighted average GDP per capita to inform alternative default estimates.
Table 1 reports the results of applying existing defaults to judge cost-effectiveness compared to using health opportunity cost-based estimates. Applying a threshold to judge cost-effectiveness that is lower than an estimate that reflects the marginal productivity of the healthcare system results in overestimating health opportunity costs, and this explains the inappropriate nature of applying an artificially low and fixed threshold. Making decisions on the basis of too low a threshold risks not adopting healthcare interventions that would generate net health benefits. All other things equal, the lower the ICER of any healthcare intervention that is rejected on the basis of a default threshold when it would have been accepted if the threshold reflected health opportunity costs, the greater the potential loss in terms of incremental net health benefits forgone.
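The group-level central estimate described in the Methods (sum the hypothetical expenditure changes, sum the DALYs averted, then divide) can be sketched as follows. The two countries and all of their values are hypothetical and serve only to show how the ratio of sums differs from a population-weighted average of the country ratios; the function names are likewise illustrative.

```python
# Each tuple: (name, population, cost per DALY averted in USD). Hypothetical.
countries = [
    ("A", 10_000_000, 100.0),
    ("B", 40_000_000, 400.0),
]


def group_cost_per_daly(countries, spend_per_capita=1.0):
    """Total hypothetical expenditure divided by total DALYs averted."""
    total_spend = sum(pop * spend_per_capita for _, pop, _ in countries)
    total_dalys = sum(pop * spend_per_capita / k for _, pop, k in countries)
    return total_spend / total_dalys


def weighted_average_cost_per_daly(countries):
    """Population-weighted average of country ratios, shown for contrast."""
    total_pop = sum(pop for _, pop, _ in countries)
    return sum(pop * k for _, pop, k in countries) / total_pop


ratio_of_sums = group_cost_per_daly(countries)                # 50M USD / 200,000 DALYs
average_of_ratios = weighted_average_cost_per_daly(countries)  # (1e9 + 16e9) / 50M
```

In this hypothetical group the ratio of sums is 250 USD per DALY averted, whereas the population-weighted average of the ratios is 340 USD. The ratio of sums does not depend on the size of the hypothetical per capita expenditure, because that amount cancels between numerator and denominator.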

Results
As shown in Table 1, the application of WBIL-specific fixed thresholds shows mixed results when compared to country-specific health opportunity cost estimates. Specifically, applying a threshold of 150 USD results in underestimating health opportunity costs for 52% of LI countries and overestimating health opportunity costs for the remaining 48%. Applying a higher default of 200 USD (or 221 USD in 2018 US$) across LI countries results in underestimating health opportunity costs in more countries (70%). For LMI countries, applying a 500 USD (or 553 USD in 2018 US$) default underestimates health opportunity costs in 41% of LMI countries.
Consistent with the findings of Woods et al. [23], using multiples of GDP or GNI per capita as the defaults to judge cost-effectiveness generally leads to larger deviations (i.e., under- or overestimates of health opportunity costs) than the fixed and low defaults. For example, applying a 3x GDP per capita rule of thumb to assess the cost-effectiveness of interventions results in an underestimate of health opportunity costs for all countries, and 1x GDP per capita underestimates health opportunity costs for all LI countries and most LMI (94%) and UMI (87%) countries, but fewer HI countries (35%). Making decisions on the basis of a threshold that underestimates health opportunity costs risks adopting healthcare interventions that displace more health than they generate. Applying the more recently suggested value of 0.5x GDP per capita (or 0.5x GNI per capita) appears to reflect health opportunity costs more accurately for relatively lower-income countries than 1x or 3x GDP per capita. However, the countries for which using 0.5x GDP or 0.5x GNI per capita results in an over- or underestimate are not distributed equally across income groups. An underestimate of health opportunity costs results for 31% of UMI countries compared to 62% of LMI countries and 100% of LI countries. Applying a default of 0.5x GDP per capita to HI countries would result in an overestimate of health opportunity costs in all HI countries. These results suggest that a single rule of 0.5x, 1x, or 3x GDP or GNI per capita will not perform as well as using a different, appropriately weighted factor for each income level.
Table 2 presents potential alternative defaults based on extrapolation of existing marginal productivity of healthcare expenditure estimates for population-weighted cost per DALY averted by WBIL. These estimates should better reflect health opportunity costs for each WBIL group. We also report these as a percentage of GDP and GNI per capita in 2018.
Our results show that the population-weighted cost per DALY averted for UMI countries that reflects health opportunity costs would be 5,155 USD (2018 US$), which represents 55% (57%) of the population-weighted GDP (GNI) per capita for UMI countries. As shown in the far-right column in Table 1, this value underestimates health opportunity costs for just over half of these countries and overestimates them for nearly half. The population-weighted cost per DALY averted for LMI countries of 329 USD (2018 US$) represents 15% (15%) of population-weighted GDP (GNI) per capita for LMI countries, and the estimate of 133 USD (2018 US$) for LI countries represents 18% (20%) of population-weighted GDP (GNI) per capita for these countries. The estimate for LMI countries also underestimates health opportunity costs for fewer countries (24%) than the estimate for UMI (54%) or LI countries (35%). The seemingly down-weighted cost per DALY averted estimate for LMI countries reflects the large proportion (46%) of people living in LMI countries who reside in India, for which the estimated marginal productivity of the healthcare system of 347 USD (i.e., 17% of GDP per capita) falls below average compared to other countries in the LMI WBIL. The estimate for HI countries of 54,234 USD underestimates health opportunity costs for half of HI countries and overestimates them for the other half. These results imply using factors to convert to health opportunity costs in dollars per DALY by WBIL for 2018 based on GDP per capita of 0.18x for LI, 0.15x for LMI, 0.55x for UMI, and 1.14x for HI, or 0.2x, 0.15x, 0.57x, and 1.15x for GNI per capita for LI, LMI, UMI, and HI countries, respectively. Figure 1 provides a means to visualize the results of current defaults and the suggested population-weighted average defaults by WBIL.
The x-axis shows the estimated country-specific opportunity cost of a 1 USD per capita change in health expenditure, and the y-axis shows the implied opportunity cost from applying each default. The 45-degree line represents agreement between these estimates, and the distance below (above) the line reveals the extent to which the default underestimates (overestimates) health opportunity costs. The figure makes it visually clear that applying a low and fixed threshold overestimates health opportunity costs in most countries. Similarly, applying 3x GDP per capita will underestimate health opportunity costs in all countries and WBILs.
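The comparison underlying Figure 1 and the percentages in Table 1 can be sketched as a simple classification: a point below the 45-degree line (implied opportunity cost smaller than the country-specific one) counts as an underestimate. The function names and the four country estimates below are hypothetical and are not taken from the data used in this analysis.

```python
def classify(default_threshold, country_cost_per_daly):
    """Compare DALYs forgone per 1 USD implied by a default (y-axis)
    with the country-specific estimate (x-axis)."""
    implied = 1.0 / default_threshold
    actual = 1.0 / country_cost_per_daly
    if implied < actual:
        return "underestimates"   # point falls below the 45-degree line
    if implied > actual:
        return "overestimates"    # point falls above the 45-degree line
    return "agrees"


def share_underestimated(default_threshold, country_estimates):
    """Fraction of countries for which the default underestimates
    health opportunity costs."""
    flags = [classify(default_threshold, k) == "underestimates"
             for k in country_estimates]
    return sum(flags) / len(flags)


# Hypothetical cost per DALY averted estimates (USD) for four countries.
estimates = [100.0, 140.0, 160.0, 300.0]
share = share_underestimated(150.0, estimates)  # 2 of 4 countries
```

Equivalently, a default underestimates health opportunity costs whenever it exceeds the country's own cost per DALY averted, which is why a multiple such as 3x GDP per capita falls below the line for every country.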

Discussion
The continued use of established default values for thresholds may substantially and negatively affect national health outcomes by overvaluing interventions that generate net health losses while undervaluing worthy interventions. We sought to offer better cross-sectional defaults based on WBIL by extrapolating from the available evidence. However, applying a single WBIL-based default to all countries in the same WBIL implies that some countries will adopt healthcare interventions that would generate net health losses, while others would fail to adopt healthcare interventions that would generate net health benefits for their populations. This realization should motivate the collection of national data for individual countries.
If the application of a default aims to identify healthcare interventions likely to represent cost-effective strategies across all countries within a WBIL, then setting the threshold at the estimate of marginal productivity for the country with the lowest estimate within the group would likely better achieve this aim. Using an estimate of 500 USD (or 553 USD in 2018 US$) for LMI and UMI countries, for example, will largely do this for 2018. We emphasize, however, that deeming an intervention cost-ineffective for all countries in a WBIL on the basis of this threshold would likely mean rejecting an intervention that would have generated net benefits in some of the countries in the group. Furthermore, as national economies grow (or shrink), their budgets for health care will also likely change, alongside changes in the burden of disease, fertility rates, demographics (e.g., sex and age structure), and other characteristics that will likely lead the marginal productivity of the healthcare system to change. Applying a cost-effectiveness threshold in 2018 US$ to judge estimates of the expected additional costs and benefits of a healthcare intervention, presented as a net present value in 2020 US$ with costs and benefits occurring over future years, fails to account for any growth in the marginal productivity of health care.
In line with previous analyses by Woods et al. [23], our results showed that applying a 1x GDP per capita rule of thumb to assess cost-effectiveness results in an underestimate of health opportunity costs for most LI and LMI countries, suggesting that using 1x GDP per capita as a threshold to inform decisions would reduce health outcomes overall for most countries. Using 0.5x GDP per capita, as suggested more recently [9], more accurately reflects health opportunity costs in LI and LMI countries. While this lower default splits LMI countries roughly evenly between overestimation and underestimation of health opportunity costs, underestimation occurs more frequently for LI countries. Thus, recommending the adoption of a healthcare intervention across LI, LMI, and UMI countries on the basis of a cost per DALY averted of 0.5x GDP per capita would likely lead to a reduction in overall health outcomes in more LI countries than UMI countries, with potential implications for equity.
In HI countries, however, 0.5x GDP (or GNI) per capita results in an overestimate of health opportunity costs in all countries, while 1x GDP per capita results in an overestimate for most. On the other hand, 3x GDP (or GNI) per capita results in a substantial underestimate of health opportunity costs for all HI countries, as reflected in our population-weighted cost per DALY averted estimates of 114% (115%) of GDP (GNI) per capita. This contrasts with evidence from HI countries from analyses of within-country data, because within-country estimates of the elasticity of mortality with respect to expenditure tend to be higher in magnitude and result in lower estimated marginal productivity of healthcare expenditure [30]. The number of estimates based on within-country data continues to increase and now also includes at least one UMI country (i.e., South Africa [31]). In addition to obtaining national data, the collection and analysis of within-country data related to health opportunity costs should remain a research priority.
We find a better reflection of health opportunity costs across countries when using defaults that account for the relative population sizes of countries in each WBIL (i.e., population-weighted averages) and that avoid issues with averaging ratios across WBILs. The population-weighted cost per DALY averted estimates by WBIL perform better than any previously used or suggested defaults when compared with estimates of health opportunity costs. As countries change income levels and healthcare spending, and as additional marginal productivity of healthcare expenditure estimates become available, we expect that the defaults will, and should, change.
The application of a WBIL default will most likely perform worse for countries that move between WBILs over time, including those with GNI per capita values that fall close to the thresholds used by the World Bank to classify countries. We can expect that countries may move between income groups as their economies grow or shrink relative to other economies. Accordingly, as this occurs, the multiplier of GDP (or GNI) per capita assumed for these countries should change (e.g., increase from 0.15x to 0.55x for GDP per capita when shifting from LMI to UMI). When large countries move between WBILs, this leads to significant changes in population-weighted estimates of the average cost per DALY averted for the WBIL (e.g., India moved from LI to LMI in 2009, and China moved from LMI to UMI in 2012). Looking prospectively, with India accounting for nearly half of the population of the LMI group, its relatively low existing valuation estimate affects the WBIL overall (as discussed above). With India projected to achieve significant growth in real GDP per capita over the next 20 years, which could see it move into the UMI group [45], the WBIL defaults estimated using the methods we suggest may see significant changes. In addition, as countries develop and their health expenditure budgets expand, we should expect the suggested WBIL threshold values to change over time. We suggest that, in the absence of a better approach or better data, analysts might apply the values in Table 2 to future years by applying the percentage of population-weighted average GDP or GNI per capita for the income group to the income group's updated population-weighted average GDP or GNI per capita. We suggest that annually updating the estimates to reflect updated data on WBIL, population, economic growth, and new national health opportunity cost estimates offers an even better option.
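The interim updating rule suggested above (hold the 2018 fraction of group GDP per capita fixed and apply it to the group's updated population-weighted average GDP per capita) can be sketched as follows. The factors are the GDP-based values reported in the Results; the function names and the two-country group used in the usage lines are hypothetical.

```python
# GDP per capita factors by WBIL for 2018, as reported in the Results.
GDP_FACTORS_2018 = {"LI": 0.18, "LMI": 0.15, "UMI": 0.55, "HI": 1.14}


def population_weighted_mean(values, populations):
    """Population-weighted average of a per capita quantity."""
    total_population = sum(populations)
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


def updated_default(wbil, gdp_per_capita, populations):
    """Default cost per DALY averted for a later year: the fixed 2018
    fraction times the group's updated average GDP per capita."""
    group_gdp = population_weighted_mean(gdp_per_capita, populations)
    return GDP_FACTORS_2018[wbil] * group_gdp


# Hypothetical updated data for a two-country UMI group.
gdp = [8_000.0, 12_000.0]          # USD per capita
pop = [30_000_000, 10_000_000]     # people
group_gdp = population_weighted_mean(gdp, pop)   # 9,000 USD per capita
umi_default = updated_default("UMI", gdp, pop)   # 0.55 x 9,000 USD
```

This sketch keeps group membership and the factors fixed. The annual updating described above would additionally recompute membership and the factors themselves as countries move between WBILs and as new marginal productivity estimates become available.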
Annual updating would account for countries moving between WBILs and allow for the inclusion of additional country-level data about the marginal productivity of healthcare expenditure, which serve as the basis for the estimates. In addition, a process of regular updating would provide a motivation and means for evaluating the stability of the estimates. While these methods remain imperfect, they account for any growth in the marginal productivity of national healthcare systems. Given the non-negligible challenges involved in estimating these values, whether for a single country using within-country data or across countries using cross-country data, the ability of analysts to project health opportunity costs based on existing estimates of marginal productivity offers great value. If or when such projections become available, they (or, better yet, updated estimates for the current year) should also inform updated population-weighted averages.
Although the defaults suggested here reflect health opportunity costs across countries better than previously used or suggested defaults, by virtue of being defaults, they will still overestimate health opportunity costs in some countries and underestimate them in others (although to a lesser extent than previously used or suggested defaults). For example, Figure 1 illustrates over- and underestimates assuming a 1 USD spend per capita across countries. In reality, the cost of an intervention in each country may not be proportionate to the national population. For example, assuming the same per-patient cost in each country, an intervention targeted at children will have a higher total cost in countries with a relatively larger fraction of children in their total populations. If countries with relatively more children are, in general, poorer and face higher health opportunity costs (i.e., reflected by lower cost per DALY averted estimates) relative to other countries, then applying the defaults suggested here would likely underestimate health opportunity costs. This implies the potential for greater net health losses, with negative implications for overall population health, in poorer countries with lower life expectancy that already face higher health opportunity costs. Concerns about equity should further motivate the collection of country-specific estimates and improved monitoring of the health and economic implications of interventions as countries invest in them over time.
Our assessment of the performance of defaults based on the comparison of the estimates from two studies [29,30] depends, in part, on econometric analysis that uses cross-country data [46]. Econometric analyses of cross-country data assume a single model can relate differences in health outcomes to differences in expenditure on health care (i.e., the approach presumes the ability to estimate an international health production function). As discussed, econometric advantages come from performing analyses of within-country data, with the additional benefit that within-country data support the estimation of country-specific health production functions. We recognize our use of historical data as a limitation. Future studies should prioritize updating estimates using more recent data and forming projections of future values. The mechanisms underpinning changes in marginal productivity remain complex, variable, and uncertain [47].
Reflecting on current times, we can anticipate that large macroeconomic impacts (such as the consequences of the COVID-19 pandemic in early 2020 and associated policy responses) will affect the marginal productivity of healthcare expenditure, which merits further analysis once data become available.

Conclusion
The best option for informing decisions around resource allocation in health care such that they improve health outcomes overall remains the use of country-specific estimates of the marginal productivity of the healthcare system where these exist. In the absence of existing information for individual countries, any default applied may result in over- or underestimation of health opportunity costs, although some defaults perform worse than others. In particular, 3x GDP or GNI per capita underestimates health opportunity costs for all LI, LMI, and UMI country healthcare systems and nearly all HI country healthcare systems, while 1x GDP or GNI per capita underestimates health opportunity costs for the vast majority of healthcare systems in UMI and LMI countries and all LI country healthcare systems. When analysts use aggregate default values, for example, when evaluating an intervention at the country group level and/or to support policies and decisions at an aggregate level, this paper suggests defaults that better reflect health opportunity costs across countries within each WBIL than those previously used or recommended. We suggest that health economists can use these defaults to inform decisions in 2018 or 2019, but that these defaults can and should be updated in future years.

Notes
1. Formerly GNP, see terminology change in 1993 [48].
2. This gives a different answer than simply calculating a population-weighted average cost per DALY averted for the WBIL group. Our approach to obtaining a central estimate for each WBIL group accurately reflects the variability that occurs in the denominator of ratios (i.e., the age-old problem of needing to calculate the ratio of the averages rather than to calculate the average ratio for HEAs).