Assessing work disability for social security benefits: international models for the direct assessment of work capacity.

Abstract
Purpose: It has been argued that social security disability assessments should directly assess claimants’ work capacity, rather than relying on proxies such as functioning. However, there is little academic discussion of how such assessments could be conducted. Method: The article presents an account of different models of direct disability assessments based on case studies of the Netherlands, Germany, Denmark, Sweden, Norway, the United States of America, Canada, Australia, and New Zealand, utilising over 150 documents and 40 expert interviews. Results: Three models of direct work disability assessments can be observed: (i) structured assessment, which measures the functional demands of jobs across the national economy and compares these to claimants’ functional capacities; (ii) demonstrated assessment, which looks at claimants’ actual experiences in the labour market and infers a lack of work capacity from the failure of a concerted rehabilitation attempt; and (iii) expert assessment, based on the judgement of skilled professionals. Conclusions: Direct disability assessment within social security is not just theoretically desirable, but can be implemented in practice. We have shown that there are three distinct ways that this can be done, each with different strengths and weaknesses. Further research is needed to clarify the costs, validity/legitimacy, and consequences of these different models.
Implications for rehabilitation
It has recently been argued that social security disability assessments should directly assess work capacity rather than simply assessing functioning, but we have no understanding about how this can be done in practice. Based on case studies of nine countries, we show that direct disability assessment can be implemented, and argue that there are three different ways of doing it. These are “demonstrated assessment” (using claimants’ experiences in the labour market), “structured assessment” (matching functional requirements to workplace demands), and “expert assessment” (the judgement of skilled professionals). While it is possible to implement a direct assessment of work capacity for social security benefits, further research is necessary to understand how best to maximise validity, legitimacy, and cost-effectiveness.


Introduction
Disability assessment is at the heart of social security: it divides the 40 million people claiming disability benefits across the OECD from the 25 million claiming unemployment benefits [1]. The history of such assessments can be seen as a gradual (and still incomplete) transition from impairment-based assessments to functioning-based assessments [2,3]. The earliest assessments were those based on medical conditions or impairments, most commonly through "baremas" that quantify the assumed loss of work capacity associated with, e.g., losing a body part. While elements of the impairment-based approach are still common, these have been supplemented or replaced by a focus on functional capacities. This is because impairments are a poor proxy for an individual's capacity to work: functioning is not reducible to impairment, as modern models of disability such as the International Classification of Functioning, Disability and Health (ICF) make clear [4].
However, in a recent World Bank report, Bickenbach et al. [2] argue for a third form of assessment that directly assesses an individual's capacity for work, based on the core ICF insight that disability arises from the interaction of functional limitations with the particular requirements of the individual's work environment. (While Bickenbach et al. term this "the disability approach", for clarity we here term this a "direct disability approach".) While functioning-based assessments include a "local notion" of the demands of the modern workplace [5, p18], Bickenbach et al. argue that the "fundamental weakness" of functioning-based approaches "is that it is difficult to come up with the domains or areas of functional capacity that are highly and consistently correlated with a standardized 'capacity to work', given the enormous variety of work requirements and kinds of employment situations" [2]. A direct disability approach considers both an individual's functional capacities and how this functioning compares to the likely demands of the workplace [as for fitness-for-work assessments outside the benefits system; 6]. It would also align social security assessments with legal definitions of disability in anti-discrimination legislation as well as the UN Convention on the Rights of Persons with Disabilities [7].
Despite this, there has been relatively little academic consideration of how to directly assess work disability. This is not to overlook several valuable comparative studies of disability assessment [3,8-11], but these are more than a decade old, and provide only scattered evidence on direct disability assessment. Nor is this to underplay the value of recent developments in using the ICF for social security disability assessment, most notably with the creation of a "core set" of ICF categories expressly for this purpose [12], coordinated by the European Union of Medicine and Assurance in Social Security (EUMASS). The ICF provides a common language that allows the exchange of knowledge [13], and by focusing on work-related body functions and activity limitations within the ICF, the EUMASS core set is a useful aid to decision making [14]. However, as others have also noted [12,13,15], the consensus-based approach in developing the EUMASS core set resulted in an exclusive focus on functioning (body functions and activity limitations). No measures of environmental factors, such as working conditions, are included, partly because the ICF's work-related codes are too general to capture the detail needed for work capacity assessments [16]; hence "the richness of modern working life factors is not represented in the ICF" [17].
The United Kingdom vividly illustrates these challenges. While various researchers have spoken positively about the United Kingdom's systematic functional assessment in principle [2,17,18], this Work Capability Assessment (WCA) has been classed as one of the leading recent "blunders of our governments" in practice [19]. This is because the assessment is both inaccurate (despite its name, it does not directly assess claimants' capacity to work) and has considerable implications for claimants, with the assessment governing financial payments, and the threat of sanctions [20]. As a result, the WCA has not only failed to control expenditure, but has also been found to raise the risk of suicide [21], and its failures have become headline news. Yet, when disability charities argued that the WCA should be replaced with a direct disability approach, the Government-appointed reviewer of the WCA responded that United Kingdom experts were "unable to offer clear, evidence based advice" on what any such test would look like [22].
It is often difficult for policymakers in any given country to imagine how assessments could operate differently, particularly when there is little academic guidance as to how work capacity could be directly assessed. Nevertheless, sometimes practice is ahead of theory, and a number of countries do currently implement a form of direct work capacity assessment. In this article, we aim to provide an account of how work capacity can be directly assessed in social security disability assessments, based on comparative case studies of nine high-income countries. The article firstly explains this method in further detail, before describing a three-fold typology of different approaches to the direct assessment of work capacity.

Materials and methods
This article presents an account of different models of direct disability assessment, based upon comparative case study research. While comparative case studies are a common method, they are most frequently used for causal inference, that is, to trace causal processes within a small number of cases, using the logic of comparison to provide clues as to which mechanisms are causally decisive [23,24]. Here, in contrast, we use comparative case studies descriptively. Our aim is to develop an understanding of how direct work capacity assessment can be done, rather than to causally assess the impact of any given model on a series of outcomes. The remainder of this section sets out our selection of cases, the information gained within each case, and the method of analysis.
Case selection: while there has been much discussion of case selection for comparative case studies with causal aims [23], our aim here was instead to understand different ways of implementing the direct disability approach. Therefore, we chose cases that our initial information suggested would be most relevant. This includes the country that two international experts and Bickenbach et al. all suggested as best practice (the Netherlands), three other countries that Bickenbach et al. suggest are currently closest to the direct disability approach (the United States of America, the Canadian federal level, and Sweden), three other European countries that have sometimes been suggested as having elements of good practice (Denmark, Norway, and Germany), and two countries that have undertaken recent reforms that were attracting attention in the United Kingdom (Australia and New Zealand).
Data collection: our goal within each country was to understand how disability is assessed within the social security system, both on paper and in practice. One possible method would be to collect structured information from key informants in each country. This method is ideal if the phenomenon under study is already largely understood, with the research question broken down into a series of discrete, specific questions that make sense in each context [e.g., 18 on disability evaluation reporting]. However, this study was too exploratory for such an approach, and sought deeper, wider-ranging information than would be feasible in a key informant study (as can be seen below). Therefore, we adopted a more flexible approach to data collection, which built up a picture of the disability assessment in each country iteratively, adding further information based partly on gaps or contradictions in the existing data, and partly on the areas that looked most important for the emerging typology.
In practice, we began by sketching outlines of each system using previous comparative case studies of disability assessment and recent comparisons of the disability benefit system in general (particularly by the OECD). We then conducted multiple further online searches: for official government documents on the assessment; for any relevant academic research; and for non-academic material that expressed the views of other stakeholders. Where crucial gaps could not be filled from these documents, this was supplemented with interviews (predominantly by phone) with experts, who were identified from their published research, contacted by virtue of their organisational role (e.g., in a Government department/social insurance agency), or found by snowballing from other interviewees.
Ultimately, we focused particularly on the four case studies that are most central to the emerging typology (the Netherlands, Denmark, Australia, and the United States of America), in each of which we reviewed 20-40 documents and interviewed 5-10 individuals (see Table 1). Elements of the five other case studies are also used in the typology (not least Germany and Norway, where the 4-9 documents were supplemented by 2 interviews, but also 22 documents in Sweden, and 5-7 documents in each of Canada and New Zealand). In total, we reviewed over 150 documents and interviewed 40 individuals.
Analysis: the analysis had two elements. Firstly, within each case we built up a profile of the disability assessment, with a particular emphasis on any direct assessment of work capacity. (Unlike a conventional qualitative interpretivist study, in which the precise wording within documents and interview transcripts is thematically analysed, we instead focused on factual information about the case, iteratively updating the case profile with any new information gained from each source, and noting discrepancies where relevant.) Secondly, we looked across cases to create a typology of currently existing models or "logics" of direct disability assessment. The remainder of the paper explains the threefold typology that resulted, dividing between (i) structured assessment, (ii) demonstrated assessment, and (iii) expert assessment.
It is worth noting that our typology partly builds on previous reviews. de Boer et al. [3] divided impairment- and functioning-based assessments (above) from rehabilitation-based assessments, which overlap with our model of "demonstrated assessment" (below), and they themselves based this partly on Deborah Stone's classic historical analysis. Gjersøe [25] contrasts the discretionary approach of Norwegian assessors with the standardised approach of British assessors. Both Mabbett et al. [10] and Wright & de Boer [5] further discuss the possibility of linking functional capacity profiles to labour market requirements (discussed below under "structured assessment"), although they go into little detail. Our typology is new, but builds from existing accounts where relevant.

Model #1: Structured assessment of work capacity
The Dutch case is perhaps the most "notable example" of the direct disability approach [2], and was suggested by expert informants as international best practice for the direct assessment of work capacity [see also 5,10]. It exemplifies what we term the "structured assessment of work capacity", by creating a formalised, data-driven link between functioning profiles and work requirements. Claimants' functional capacities are assessed, then compared to the functional requirements of 7000 actually existing jobs in the Netherlands [26] in a database called CBBS ["Claim Beoordelings- en Borgingssysteem", usually translated as "Claim Assessment and Assurance System"; 27,28]. This provides an empirically based assessment of jobs that the individual can do, and the percentage earnings reduction that their disability causes compared to their previous occupation, which then underpins their eligibility for disability benefits.
CBBS records are assessed through recent on-site observations by a team of about 35 full-time specialists in the social insurance agency [28,29 and expert interview]. Given the prohibitive cost of covering all jobs nationally, CBBS covers about 20% of all of the possible occupational codes in the Netherlands, weighted towards "lower level jobs" that are potentially available to all claimants [28,29]. The job assessments cover the 28 different functional domains against which claimants are assessed, allowing variation between regular demands and peak demands, as well as covering the required work pattern, education, experience and skills of the job [29,30]. While the assessment is oriented around the CBBS database, the result is not itself fully automated, with a labour expert providing the final definitive judgement based on their own professional expertise [29], although the degree of discretion is relatively constrained [31].
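The matching logic described above can be illustrated with a short sketch. The following Python is a minimal illustration only, not the CBBS implementation: the domain names, the integer demand scale, the wages, and in particular the rule for turning matched jobs into a residual earning figure (here, the median wage of the three best-paid suitable jobs) are all assumptions made for the example. It also ignores the distinction between regular and peak demands, and the labour expert's final judgement.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Job:
    """One observed job: its wage and its functional demands (hypothetical scale)."""
    title: str
    hourly_wage: float
    demands: dict[str, int]  # functional domain -> minimum level required


def can_perform(capacity: dict[str, int], job: Job) -> bool:
    """A job is suitable only if the claimant meets every demand of that job."""
    return all(capacity.get(domain, 0) >= level
               for domain, level in job.demands.items())


def matching_jobs(capacity: dict[str, int], jobs: list[Job]) -> list[Job]:
    """Return the jobs whose demands fall within the claimant's capacities."""
    return [job for job in jobs if can_perform(capacity, job)]


def earnings_loss_pct(capacity: dict[str, int], jobs: list[Job],
                      previous_wage: float, min_jobs: int = 3) -> float:
    """Percentage earnings reduction relative to the previous occupation.

    Assumption for illustration: residual earning capacity is the median wage
    of the `min_jobs` best-paid suitable jobs, and fewer than `min_jobs`
    suitable jobs is treated as a complete loss of earning capacity.
    """
    wages = sorted((job.hourly_wage for job in matching_jobs(capacity, jobs)),
                   reverse=True)
    if len(wages) < min_jobs:
        return 100.0
    residual = median(wages[:min_jobs])
    return max(0.0, 100.0 * (previous_wage - residual) / previous_wage)


# Hypothetical job database and claimant profile (levels 0-3 per domain).
JOBS = [
    Job("parking lot attendant", 12.0, {"standing": 2, "concentration": 1}),
    Job("receptionist", 13.0, {"sitting": 2, "concentration": 2, "speaking": 2}),
    Job("warehouse picker", 14.0, {"lifting": 3, "standing": 3}),
    Job("data entry clerk", 15.0, {"sitting": 3, "concentration": 3}),
]
claimant = {"standing": 2, "sitting": 3, "concentration": 3,
            "speaking": 2, "lifting": 1}

print([job.title for job in matching_jobs(claimant, JOBS)])
print(earnings_loss_pct(claimant, JOBS, previous_wage=20.0))
```

The point of the sketch is that eligibility follows from an explicit comparison between a capacity profile and recorded job demands, which is what allows the resulting list of suitable jobs to be inspected and challenged.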
While this system is considered best practice in linking functioning to work capacity, it has not been immune to criticism. A previous version ("FIS") was criticised for not producing a good assessment of a person's earning capacity, particularly around mental disorders [10,32]. When CBBS was introduced there were also concerns from a newly formed group called the Foundation for the Protection of the Incapacitated about the "black box" nature of the assessment [33]. A court in 2004 ruled that the assessment was valid in principle, but insufficiently transparent, verifiable, and testable in practice [34]. Once this was addressed, the remaining challenge has been getting claimants to come to terms with their inability to meet the requirements of the labour market [35]. The assessment is not necessarily a good basis for rehabilitation (see below), yet in a context in which there are strong pressures from the courts for benefit decisions to be transparently justifiable, it seems to produce benefit eligibility judgements that are widely accepted as valid (expert interviews).
The structured assessment in the Netherlands seems to be uniquely successful, but similar principles are also being applied in the United States of America. Here, the Social Security Administration (SSA) establishes if a medical impairment exists, and then compares this to a listing of impairments [36]. If claimants do not have an impairment that meets the listing, then a residual functional capacity assessment is conducted, which is then compared to the demands of work. Claimants are either found eligible for disability benefits, or (mirroring the Dutch approach) are typically told of three occupations that the SSA believes are commensurate with their abilities [37]. However, there are two ways in which this is less satisfactory than the Dutch model. Firstly, rather than a matching of the exact functional profile, claimants are found fit for a crude classification based primarily on "exertional" (physical) limitations: very heavy work, heavy work, medium work, light work, or sedentary work [36]. Benefit eligibility is then based upon claimants' work ability combined with their age, education, and work experience (see below).
The second limitation is that information on work demands primarily comes from the American Department of Labor's Dictionary of Occupational Titles (DOT), which has not been substantially revised since 1977 [36-38]. While the DOT was created for different purposes and is recognised to be outdated, the more modern version of DOT ("O*NET") is considered inadequate for social security assessment: it contains insufficient detail on work activities; the classification of functioning does not fit the SSA's criteria; O*NET is often based on written job descriptions rather than actual observations; and job titles are aggregated at too high a level, which conceals substantial heterogeneity between jobs [38,39]. While the National Academy of Sciences [38] suggested that O*NET could be adapted to serve SSA's purposes, the SSA-convened Occupational Information Development Advisory Panel (OIDAP) concluded that O*NET was intrinsically a flawed basis for SSA benefit determinations [39]. An effort to explore a new process of collecting occupational data is currently underway [37], though how SSA will ultimately use this is unclear.
The American and Dutch assessments connect to a wider tradition of job matching within vocational rehabilitation; indeed, the British Government in 1919 attempted to match war veterans to a  [40]. Yet in general there is a "dearth of job matching" research in the return-to-work literature [41]. Moreover, Pransky et al. [42] criticise the use of job matching models within vocational rehabilitation. This is not simply because of their scepticism that any functional assessment is a good guide to an individual's capacity, but because these structured assessments are a poor basis for rehabilitation: they ignore psychosocial factors, do not start from the priorities of the individual in question, and do not consider what would help the individual to work. Moreover, they consider the way that the workplace presently is, rather than how it might be changed. Such issues explain why the Dutch assessment is often described as capturing "theoretical" work capacity rather than a basis for rehabilitation [34 and expert interview]. Still, for the purposes of social security benefit assessment, the Dutch system offers a potential model for the direct assessment of disability.
A further potential difficulty with the structured assessment of work capacity is that it requires a "substantial effort" [42] to catalogue jobs available nationally, particularly given heterogeneity within each class of occupations. As previously mentioned, the Dutch system requires 35 full-time individuals to keep CBBS up-to-date across the 20% most common occupations. One alternative is to focus on the functional requirements of a much smaller number of jobs, which are used as reference categories against which to assess social security benefit eligibility [see also 28]. This is the approach of a new Dutch assessment called SMBA ("Sociaal-Medische Beoordeling van Arbeidsvermogen" ["Socio-Medical Assessment of Work Capacity"]) for the separate youth disability benefit "Wajong". SMBA focuses on functional profiles of 15 relatively light minimum wage jobs (e.g., "parking lot attendant", "receptionist"), which are each meant to be representative of the requirements of wider groups of jobs nationally.
The SMBA system is different in a number of respects from the assessment for the permanent disability benefit above. SMBA addresses some of the problems of structured assessments by supplementing these with personalised expert judgements as to possible adjustments to these jobs that would enable the person to work, which labour market experts must explain within a structured report. A further new development in SMBA is to break apart jobs into their component tasks using the principles of job carving. Individuals who could not earn the minimum wage but who could do 40% of a standard job will be put in the "Banenafspraak" group, and if employed, will have their practical work capacity assessed within a specific job, which will then determine the subsidy received by the employer [43]. Combined with the fact that individuals only need to be able to do one threshold job rather than three CBBS jobs, this explains why the Wajong assessment was described by one of the experts interviewed as "small and mean CBBS". It is too early yet to judge if SMBA has been a success or failure in practice, but it nevertheless represents a further, novel model for (semi-)structured work capacity assessments.
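The reference-job and job-carving logic described above can be sketched as follows. This Python is purely illustrative and is not the SMBA procedure: the task lists and outcome labels are hypothetical, and reading "40% of a standard job" as the share of a job's component tasks that a person can perform is an assumption made for the example.

```python
def task_share(doable_tasks: set[str], job_tasks: list[str]) -> float:
    """Share of a reference job's component tasks that the person can perform."""
    if not job_tasks:
        return 0.0
    return sum(task in doable_tasks for task in job_tasks) / len(job_tasks)


def classify(doable_tasks: set[str],
             reference_jobs: dict[str, list[str]],
             threshold: float = 0.40) -> str:
    """Classify a claimant against a small set of carved-up reference jobs.

    Assumptions: being able to do every task of at least one reference job
    counts as meeting a threshold job; otherwise the best task share is
    compared with the 40% threshold. The outcome labels are hypothetical.
    """
    best = max((task_share(doable_tasks, tasks)
                for tasks in reference_jobs.values()), default=0.0)
    if best == 1.0:
        return "can do a threshold job"
    if best >= threshold:
        return "Banenafspraak group"
    return "below threshold"


# Hypothetical breakdown of two reference jobs into component tasks.
REFERENCE_JOBS = {
    "parking lot attendant": ["issue tickets", "handle payments",
                              "monitor barrier", "patrol site",
                              "report incidents"],
    "receptionist": ["greet visitors", "answer phone",
                     "sort mail", "book meetings"],
}

print(classify({"issue tickets", "monitor barrier"}, REFERENCE_JOBS))
```

Using a handful of reference jobs keeps the catalogue small, at the cost of relying on each job being representative of a wider group, which is why the text describes expert judgement being layered on top.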

Model #2: Demonstrated assessment of work capacity
The second model of directly assessing work capacity is based on the actual experiences of the individual in the labour market, which we term the "demonstrated assessment of work capacity". This is linked to the "rehabilitation-before-benefit principle", which is generally presented as an activating labour market policy [44], although we here consider it as a way of inferring lack of work capacity from the failure of a concerted rehabilitation effort. Of the countries included in the review, Germany, the Netherlands, and the Scandinavian countries have all been said to have a rehabilitation-before-benefit principle [17,44], but the form that this takes varies considerably between countries.
Perhaps the clearest statement of this principle can be seen in an Australian high-level strategy document. This argued that the original assessment for the disability pension was flawed because it was tasked with assessing claimants' work capacity over the next two years, and for many claimants "there is little or no practical evidence on which to base this judgment" [45]. It was therefore recommended that most claimants should only be eligible for the disability pension "when their 'Continuing Inability to Work' has been demonstrated" in practice. Since the ensuing reforms, claimants need to actively participate in a (usually government-funded) "program of support" for 18 months before being eligible for the disability pension [46, 1.1.A0.30], at which point they are referred to the expert assessment outlined in the following section. Further evidence was also expected to come from looking at individuals' prior work history: whether they had "fallen out of employment rather than had to cease work because of their disability" [45], which is also explicitly considered in Canada (in the disability benefits within the Canadian federal pension plan).
Similar reasoning can be seen in Denmark, where an evaluation strongly criticised the old "Resource Profile" for conceiving of work capacity as something that exists (and can be measured) in the abstract, independently of specific contexts [47]. Claimants are therefore now only awarded a disability pension if an assessing multidisciplinary team is confident, and can demonstrate, that the individual has no capacity for work [48,49 and expert interviews]. While this includes people who have such severe functional limitations that they "obviously" ("helt åbenbart") cannot be moved towards work [50], in practice the majority of claimants (and nearly all claimants under 40) are required to go through a scheme called Resource Activation ("Ressourceforløb") for one to five years. Another crucial (and more longstanding) aspect of the Danish system is that individuals are often sent on a work trial/work test ("arbejdsprøvet/arbejdsprøvning") for several months in order to clarify their work capacity (as described in several expert interviews). These take place in either a private company or an activation service, and are not meant to replace existing jobs, but instead to test which tasks individuals are capable of within a work setting.
A key advantage of the demonstrated direct assessment of incapacity is its strong link to rehabilitation. It also has the potential to be more accurate than structured assessments, in the sense that many people's functional capacities and ability to cope in different workplaces are inherently uncertain, and it has something in common with the iterative learning process about an individual's work capacity in the increasingly widely used models of supported employment, such as Individual Placement & Support [51]. However, it faces four challenges. Firstly, as experts in both Australia and Denmark noted, claimants often find the logic of the system contradictory: they are told that in order to prove they cannot work, they have to try to get back to work (or even do a work trial). This is perhaps less of a contradiction than it might appear, but it may nevertheless reduce both claimant motivation and the perceived legitimacy of the system. Secondly, the overlap with rehabilitation is partial, because of the different nature of benefit eligibility assessment and rehabilitation assessment. This is partly because the claimants' relationship with the assessor may be one of distrust when being evaluated for benefits (the assessors' goal being to appropriately restrict access) but more trusting when their rehabilitation needs are being evaluated. It is also because there are pressures for benefit eligibility to be standardised, but for rehabilitation assessment to be personalised [25]. Yet even if these tensions can be overcome (which they seem to have been in Denmark), there is only a partial overlap in the information about work capacity that is required. Modern ability-based rehabilitation needs to be based on a holistic assessment of an individual, including inter alia their motivation [52], but motivation is not usually considered a legitimate influence on benefit eligibility (see also below).
Conversely, benefit eligibility assessments examine people's capacity to do jobs that they have no desire to do, which is unhelpful for the purposes of rehabilitation (see "structured assessments" above). Therefore, it is possible to combine these assessments in an inefficient way that increases the resources required for assessment, which was a key reason why Australian dual-purpose assessments were later abandoned [45], and explains why the OECD has repeatedly praised new dual-purpose assessments that it later decides are flawed [e.g., the Danish assessment in 44, and the Australian assessment in 53]. Even if the latest Danish reforms ultimately overcome this tension, the possibility of inefficiencies remains for other countries considering demonstrated assessments.
Thirdly, demonstrated assessment only provides an accurate picture of work capacity if the rehabilitation maximises work capacity. In practice, however, older models of rehabilitation are not necessarily focused on employment in the open labour market (focusing either on health improvements or sheltered workshops), and even where rehabilitation is focused on supported employment, there are examples from almost every country where this does not maximise work capacity. In Denmark, there are anecdotal reports of work trials that are poorly matched to the individual in question [expert interviews and 54,55]. Despite a series of reforms in Australia, a recent Government consultation found that "providers and people with disability expressed widespread, almost universal, concern about [the assessments], including consistent feedback that they often refer people with disability to inappropriate services" [56]. In Germany, the much-cited principle of rehabilitation-before-benefit is undermined by the fragmentation of the social security and rehabilitation systems, and a lack of expertise of frontline staff in identifying rehabilitation needs [57]. And in Sweden, only a small proportion of those on sick leave for 3-6 months were offered a contact meeting in 2010-2011 (contradicting the reforms), although there has since been an attempt to increase this [58]. In such circumstances, nominal periods of rehabilitation may not accurately demonstrate an individual's true work capacity.
Finally, because rehabilitation benefits are generally lower than disability pensions, claimants will tend to receive less money while they are demonstrating their incapacity. While reforms are often framed as providing increased rehabilitation in return [59], this is only convincing for those that benefit from it. Not only is this account challenged where there are gaps in rehabilitation (as above), but in the view of some critics, the reforms are a way of delaying paying higher levels of benefits to people who have no realistic chance of work. For example, in Denmark, there has been considerable media and political attention on those placed in work trials or Resource Activation who have very low levels of assessed work capacity [e.g., 30 min of work capacity at low speed, twice per week; see 54]. Not only are there claims by some doctors that these are damaging to people's health [55, which spurred a national TV documentary], but as a consultant at one trade union put it:
"It is very rare that a medical certificate is 100 percent watertight. There is always a little hope that the health will improve, or another treatment option that can be tried. So the process is nonsense. With the new law, municipalities say no to early retirement if you could handle even the smallest of Flex-Jobs." [authors' translation of 60]
Similarly, the introduction of demonstrated assessment of work capacity in Australia is now being challenged on the grounds that two years of rehabilitation is "a poor guide to the likelihood of highly persistent disability," and that five years might be more appropriate [Australian Productivity Commission in 61]. This seems to run the risk of a long-running deferral of disability pensions, reducing the income of the disabled people concerned.
These four challenges are significant: the challenges of a coherent message to claimants, of balancing the needs of benefit assessment vs. rehabilitation, of adequate rehabilitation, and of ensuring that this is not simply a reduction in payments for those with no realistic chances of work. Nevertheless, the demonstrated assessment of work capacity has a strong inherent logic, a potentially valuable link to rehabilitation, and seems to be becoming increasingly common across high-income countries.

Model #3: Expert assessment of work capacity
The final form of directly assessing work capacity is the most common: to ask a medical, occupational health, or labour market professional to use their expertise to judge whether an individual is capable of work. Again, the precise form of this varies cross-nationally. In New Zealand, people's own treating doctor completes a questionnaire that includes questions such as whether their health conditions "limit the person's capacity to work regularly in open employment for 15 hours or more per week?" [62]. Little further guidance is given, and while the Government can request a further independent examination, apparently this only occurred rarely in the early days of the reform [63]. In Australia, claimants are assessed by government allied health professionals in a Job Capacity Assessment. After checking that the health condition is "treated, stabilised and permanent", and that someone scores sufficient points under the "impairment tables", the assessor examines whether someone has an inability to work for 15 or more hours per week which is likely to last for two years [64].
In the disability benefit of the Canadian Pension Plan, after establishing that someone has a medical condition that results in prolonged disability, a government nurse assesses whether the decision "prevents him or her from regularly pursuing any substantially gainful occupation" [65]. While these are superficially straightforward principles, there are longstanding concerns about the consistency and validityand stringencyof such discretionary assessments. One step has been to replace a claimant's own doctor with a governmentappointed expert (seen to some extent in all of the countries above), on the assumption that the assessor will be less swayed by their existing relationship with the claimant [66]. Another has been to ensure that qualified experts lead the process, although these are most commonly medically focused (doctors or allied health professionals) rather than labour market experts. In response to the problems of reliability (below), one recent editorial [67] has argued that "the solution is hardly to find the ultimate expert but rather to allow groups of 'experts' with different types of expertise to give arguments for and against disability pension", a view that can be seen in practice in the Danish and Swedish multidisciplinary team assessments.
Nevertheless, there are three challenges around expert discretionary assessment of work capacity. Firstly, a recent systematic review has concluded that expert assessments of work ability "show high variability and often low reliability" [68]. Barth et al. suggest that low reliability can be partly combated through standardisation, which can be seen in several countries. (This should not be confused with vaguer and more generic guidance about assessing work capacity [such as in Canada or Australia; see 46,65]). Genuine standardisation can be seen, for example, in the standardised inputs that are prepared for rehabilitation assessment meetings in Denmark, via a standard rehabilitation plan that is completed by the claimant in partnership with their caseworker. The expert-based elements of assessment in the Netherlands are perhaps the most structured, in which insurance physicians follow both interview protocols [69], and disease-specific guidelines for assessing work-related functioning [70].
Yet, it is unclear if standardisation produces highly reliable outcomes. The evidence on standardisation in Barth et al.'s review is not compelling (the link they find between standardisation and reliability is confounded by whether the study is conducted in a "manufactured" research or more naturalistic insurance setting). Meanwhile, direct evaluations of standardisation have found mixed results [e.g., 71]. And even in the Netherlands, where specific guidelines exist for assessing work hour capacity, expert insurance physicians failed to reach high levels of agreement when assessing how many hours a social security applicant was capable of working, whether they received a written assessment from a nurse [72] or interviewed the applicant themselves [73], in contrast to their relatively reliable assessments of functioning. Arguably this reflects the inherent challenge that (as we have already seen) for many claimants "there is little or no practical evidence on which to base this judgment" of work capacity [45].
Secondly, even if the reliability of expert assessment can be improved, there are questions about its validity because the assumed requirements of the workplace are generally opaque. Sometimes, it is clarified that assessors should consider "the customary conditions of the general labour market" rather than unusually accommodating workplaces [57]. But otherwise insurance physicians tend not to mention job requirements explicitly when making individual decisions about work capacity [17,74], partly because they seem to be assuming a "standardized environment" [75] across their entire caseload. We do not have a clear idea of what assessors consider to be the general demands of the workplace, nor whether their understanding is correct.
Finally, as a consequence, there can be a considerable gap between the formal definition of work capacity being assessed and the actual criteria used by assessors. For example, while the German criterion is formally based on the number of hours/day that an individual could work, in practice assessors divide between more- and less-disabled individuals based on a rule of thumb [76]. The 2006 Australian reforms illustrate this gap: while nominally the assessments were made more stringent (changing the eligibility threshold from 30 to 15 h/wk of work capacity), the long-term claim rate was almost unchanged [45]. Even today, experts in Australia variously described the benchmark hours criterion as "arbitrary" and "almost a fictitious construct", while responses to a recent Government consultation noted that "the 'benchmark hours' assessment is confusing and often does not accurately reflect a participant's work capacity" [56].

The dividing line between disability and unemployment
A final question that often arises for all these models is how to maintain the distinction between disability and unemployment, given that claimants' capacity to work is likely to be affected by non-medical issues (such as personal or labour market factors). The solution in nearly all countries is twofold: to require that claimants have a medically diagnosed health condition [17]; and to make clear that social security assessments only consider if a person is capable of doing work that they are qualified to do, not whether they could actually get a job in their area. There are numerous examples of this. In the Dutch system "the law explicitly stipulates that whether the person in question can actually obtain the labour in question should not be considered" [26], and the Canadian, American, and Australian systems are similarly explicit [36,46,65]. The main exception to this is the German system of providing full pensions where someone is assessed as only capable of part-time work, if part-time work is not considered to be available in that region [77].
This does not mean that non-medical factors were ignored in work capacity assessment, but rather that they were only considered if they influenced the jobs that people were capable of doing. The most direct link is in the Netherlands, where the matching of people's capacities to jobs in CBBS requires these jobs to also match the claimants' education and skills. In the United States of America, the threshold for benefit eligibility is also lowered for individuals who are older, illiterate, cannot speak English, or lack relevant education, although the way these are taken into account is only indirectly and opaquely linked to work capacity [78]. And in the Canadian expert-based assessment, age, education, and work experience are all taken into account in determining whether someone had limited work capacity [65]. Direct disability assessment does not necessarily mean that non-medical factors are taken into account (they were ignored in other cases), but where they are considered, steps are taken to ensure that a sharp administrative boundary between unemployment and disability remains.

Discussion
Social security systems routinely distinguish between work-disabled and unemployed people, and this distinction depends upon disability assessments. Over time, diagnosis- or impairment-based assessments have increasingly been replaced by functioning-based assessments, because a person's functioning cannot be reduced to their diagnosis. Yet by a similar logic, work capacity cannot be reduced to functioning. Therefore, functioning-based assessments suffer one of two problems, depending on whether the assessment is calibrated to be lenient or stringent. If the assessment is calibrated to be relatively lenient, then some individuals with functional limitations but high work capacity will nevertheless be assessed as entitled to disability benefits [the concern of 44]. Alternatively, if the assessment is calibrated to be stringent, then some individuals with low work capacity will be denied disability benefits, potentially with damaging consequences, as seems to be the case in the United Kingdom.
Because of the limitations of functioning-based assessments, a recent report for the World Bank by Bickenbach et al. has argued that we should directly assess claimants' work capacity [2]. However, policymakers have no guidance on how to actually implement such an assessment (as Bickenbach et al. themselves admit), and for this reason proposals for direct disability assessment in the United Kingdom have foundered. The contribution of this article is an account of how work capacity can be directly assessed, which enables policymakers to see how they might implement such an assessment and, indeed, demonstrates that such an assessment can be implemented. The account is based on comparative case studies of direct disability assessments in nine high-income countries (focusing particularly on the Netherlands, Denmark, United States of America, and Australia), utilising over 150 documents and 40 expert interviews.
We found that there are three different models of direct disability assessment. Firstly, there is structured assessment (e.g., in the Netherlands), which measures the functional demands of jobs across the national economy and compares these to claimants' functional capacities. Secondly, there is demonstrated assessment (e.g., in Denmark), which looks at claimants' actual experiences in the labour market, and infers a lack of work capacity from the failure of a concerned rehabilitation attempt. Thirdly, there is expert assessment (e.g., in Australia), where experts make a professional judgement that a claimant has impaired work capacity. While the logic behind each type of assessment is different, each of these assesses claimants' work capacity directly (rather than just their functioning).
We also described the apparent strengths and weaknesses of each model. Expert assessment is perhaps the easiest to implement, but it is difficult to tell exactly what professionals are assuming workplace requirements to be, and whether these assumptions are valid. Partly as a result, even best-practice expert assessments have been found to produce inconsistent results. Demonstrated assessments have strong links to rehabilitation, but even nominally "rehabilitation-before-benefit" systems struggle to provide optimal environments (and therefore maximise assessed work capacity) for all claimants. Moreover, longer assessments may be needed to perform both benefit eligibility and rehabilitation functions simultaneously. Finally, structured assessments produce empirically based, transparent decisions about work capacity that are perceived as valid when they are implemented well. However, they require some investment and do not necessarily connect well to rehabilitation, as they focus on job requirements before any accommodations are made.
We should stress here that our main aim was to create an account of how work disability can be directly assessed, rather than to come to a definitive judgement about a single "best" model (or how the different models can best be combined, as in the recent SMBA developments in the Netherlands). Such a judgement partly depends upon the weight that specific policymakers place on different criteria, alongside the practical constraints operating in that particular context. Nevertheless, future research could clarify these trade-offs by providing further evidence on the requirements for successful implementation of each type of direct disability assessment, the costs involved, their predictive validity (and coverage errors in both directions), their face validity (and perceived legitimacy), and whether the type of assessment influences employment outcomes.

Note
1. The Dutch General Administrative Law (AVB) sets out principles of good governance, which require any state decision to be justified and that the reasons must be given with sufficient transparency that an interested party can judge the basis of the decision. However, in a case on CBBS at the Central Appeals Tribunal on 9/11/2004, it was found that a "higher emphasis needs to be placed on reporting and justifying the medical insurance and work study principles underpinning the decision of a particular case" [authors' translation of 34, section 3.4.2]. By 2006, the Appeals Court had ruled that the resulting changes were largely acceptable, subject to some further minor amendments (see http://rechtennieuws.nl/12342/crvb-blijft-kritisch-over-cbbs-systeemin-wao-geschillen/; accessed 31/7/2017).

Disclosure statement
One of the authors has recently been on secondment in the United Kingdom Department for Work and Pensions, but has no financial interest or other benefit that has arisen from the direct application of this research. The authors report no other potential conflicts of interest.

ORCID
Ben Baumberg Geiger http://orcid.org/0000-0003-0341-3532