A small number of surgeons outside the control-limit: an observational study based on 9,482 cases and 208 surgeons performing primary total hip arthroplasties in western Sweden

Background and purpose: Surgeon-level feedback programs have been introduced in some orthopedic quality registers around the world. The aim of such a program is to help surgeons understand their practice and analyze their own results. None of the orthopedic quality registers in Sweden has a surgeon feedback program, and there is concern that a feedback system might single out surgeons as poor performers, partly because of differences in patient case mix. As a step toward a possible future feedback program in Sweden, we assessed the variation between surgeons in western Sweden in the occurrence of adverse events (AEs) within 90 days and reoperations within 2 years after primary total hip arthroplasty (THA), and explored the number of surgeons outside the control-limit.
Patients and methods: Patient data, surgical data, and information on the surgeons, relating to surgeries performed in 2011–2016, were retrieved from 9 publicly funded hospitals in western Sweden. Data from hospital medical records, the Swedish Hip Arthroplasty Register (SHAR), and a regional patient register in western Sweden were linked in a database. Funnel plots with control-limits based on upper 95% and 99.8% confidence intervals (CIs) were used to illustrate the variation between surgeons in each outcome and to explore the number of surgeons outside the control-limit. Both observed and standardized proportions were explored. A surgeon was defined as outside the control-limit if the surgeon's proportion was above the upper 95% CI.
Results: The study comprised 9,482 primary THAs for osteoarthritis performed by 208 surgeons; 91% of the THAs were performed by orthopedic specialists and 9% by trainees. The mean overall annual volume per surgeon was 27. The observed overall mean rate of AEs within 90 days was 6.2% (annual range 5.8–6.7%) and of reoperations within 2 years 1.8% (annual range 1.7–2.2%). The proportion of surgeons outside the 95% CI was low for both AEs (0–5%) and reoperations within 2 years (0–1%) in 2011–2016. After standardization for differences in case mix, the corresponding proportions were even lower for AEs (0–3%) and similar for reoperations (0–1%). In a sub-analysis restricted to surgeons performing more than 10 primary THAs annually, almost half or more of all surgeons were excluded from each annual analysis. With this restriction, no surgeon remained outside the control-limit after standardization, for either AEs or reoperations, in any of the years investigated. Over the complete 6-year period, less than 1% of the surgeons (1 high-volume surgeon for AEs and 2 high-volume surgeons for reoperations) were outside the 95% CI after risk adjustment, and no surgeon was outside the 99.8% CI.
Interpretation: In a Swedish setting, the variation in surgeon performance, as measured by AEs within 90 days and reoperations within 2 years after primary THA, was small, and 3% or less of the surgeons were outside the 95% CI in each year investigated after adjustment for case mix. When volume and patient risk factors are taken into account, the risk that an individual surgeon would be regarded as a poor performer in surgeon-specific feedback from the SHAR is very low.

In 1975, the first orthopedic quality register, the Swedish Knee Arthroplasty Register (Robertsson et al. 2000, Malchau et al. 2018), was started. Four years later, it was followed by the Swedish Hip Arthroplasty Register (SHAR) (Kärrholm 2010). These 2 quality registers have served as important models for a number of successful registers in other countries (Malchau et al. 2018). Today, almost all orthopedic registers publish an annual report with results aggregated at hospital level. Some of the registers have also developed programs that provide surgeon-level feedback and benchmarking against other surgeons (National Joint Register 2015, Australian Orthopaedic Association National Joint Replacement Registry 2017). The main aim of these surgeon-level feedback programs, hosted by quality registers, is to help surgeons understand their practice.
The National Joint Register for England, Wales, Northern Ireland, the Isle of Man and the States of Guernsey (NJR) and the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) both use funnel plots to visualize individual surgeons' results and benchmark them against peers (Spiegelhalter 2005). The funnel plot has been suggested as an appropriate statistical technique for reporting surgeon outcomes (Walker et al. 2013). The AOANJRR adjusts surgeon-level data for patients' age and sex. It does not adjust for other factors, such as BMI, comorbidities, and smoking, that have been suggested to influence the risk of AE and reoperation after arthroplasty (Mantilla et al. 2002, Thörnqvist et al. 2014, Duchman et al. 2015, Singh et al. 2015, Lübbeke et al. 2016). As yet, none of the Swedish orthopedic registers has started a feedback program that provides individual surgeon data, and little is known about surgeon performance in a Swedish setting. Here, we describe the variation among surgeons in western Sweden in AEs within 90 days and reoperations within 2 years after primary total hip arthroplasties (THAs), and explore the number of surgeons outside the control-limit.

Patients and methods
All primary THAs in patients with a diagnosis of osteoarthritis (OA) of the hip, performed between 2011 and 2016 in hospitals managed by the county council of western Sweden, were included in the study (Figure 1). Hospital medical records, the SHAR, and the regional patient register Vega (hereafter called the regional patient register) were used as data sources. A complete list of the sources for the variables used, including confounders, is given in the Supplementary data (Appendix). Hospital medical records were linked to the SHAR using the 10-digit personal identity number (PIN), the name of the hospital, and the date of surgery. Where the SHAR and the hospital medical records disagreed, the information in the SHAR took precedence. The linked dataset was then forwarded to the regional patient register, where all AEs were added. The data were then anonymized by replacing the PIN with a unique identifier.
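The linkage and anonymization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure. Several details are hypothetical:

- the field names (`pin`, `hospital`, `surgery_date`, `bmi`, `asa`);
- the choice to drop records without a SHAR match;
- the rule that an empty SHAR field does not overwrite a hospital value.

```python
# Hypothetical record layout: the field names below are illustrative only.


def link_records(hospital_records: list[dict], shar_records: list[dict]) -> list[dict]:
    """Link hospital and SHAR records on (PIN, hospital, date of surgery).

    Where both sources hold a value for the same field, the SHAR value wins,
    mirroring the rule that SHAR information takes precedence.
    """
    key_fields = ("pin", "hospital", "surgery_date")
    shar_by_key = {tuple(r[f] for f in key_fields): r for r in shar_records}
    linked = []
    for rec in hospital_records:
        shar = shar_by_key.get(tuple(rec[f] for f in key_fields))
        if shar is None:
            continue  # assumption: records without a SHAR match are dropped
        merged = dict(rec)
        # assumption: an empty (None) SHAR field does not overwrite a hospital value
        merged.update({k: v for k, v in shar.items() if v is not None})
        linked.append(merged)
    return linked


def pseudonymize(records: list[dict]) -> list[dict]:
    """Replace each PIN with a sequential ID; the same PIN always gets the same ID."""
    ids: dict[str, int] = {}
    anonymized = []
    for rec in records:
        patient_id = ids.setdefault(rec["pin"], len(ids) + 1)
        anon = {k: v for k, v in rec.items() if k != "pin"}
        anon["patient_id"] = patient_id
        anonymized.append(anon)
    return anonymized


hospital = [
    {"pin": "P1", "hospital": "H1", "surgery_date": "2014-03-01", "bmi": 31, "asa": None},
    {"pin": "P2", "hospital": "H1", "surgery_date": "2014-04-11", "bmi": 27, "asa": 1},
]
shar = [{"pin": "P1", "hospital": "H1", "surgery_date": "2014-03-01", "bmi": 29, "asa": 2}]
print(pseudonymize(link_records(hospital, shar)))
```

Keeping one PIN-to-ID mapping means that a patient with staged bilateral THAs keeps the same identifier across both operations.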

Swedish Hip Arthroplasty Register
The aim of the SHAR is to register all primary THAs and reoperations performed in Sweden (Kärrholm 2010). Although participation is voluntary for both hospitals and patients, the completeness and coverage of the SHAR have been high over the past few decades. The variables recorded in the SHAR include patient factors, such as age, sex, diagnosis for implantation, BMI, and ASA classification. They also include technical details of the surgery, such as fixation technique and type of implant.

Vega: a regional patient register
The regional patient register was initiated in 2000. It is an aggregated database containing records of all healthcare contacts (both publicly and privately funded) for all residents of western Sweden. The population of western Sweden was approximately 1.6 million in 2011 and 1.7 million in 2016, approximately 17% of all residents of Sweden. The regional patient register provides information to the National Patient Register (NPR) (Ludvigsson et al. 2011). The PIN is used as the unique identifier for all entries in the regional patient register. For each contact, the register records details of the caregiver (for example, the level of hospital or elective care), diagnoses, interventions, and length of hospital stay.

AEs and reoperations
The definition of AEs we used is the same as that used by the SHAR and presented in its 2018 Annual Report. It has also been used in previous studies (Berg et al. 2018, Jolbäck et al. 2019). […] tendon ruptures in the lower extremities), and medical complications (thromboembolic events, myocardial infarction, pneumonia, gastroduodenal ulcers, acute kidney injury, and urinary retention). Reoperations are defined as any further surgery on the previously operated hip after the index surgery. All diagnoses used to compute AEs were retrieved from the regional patient register, while reoperations were retrieved from the SHAR.
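A minimal sketch of how the two binary outcomes could be derived from dated records is shown below. It assumes AE diagnoses and reoperations are available as dates, with the operated side recorded for reoperations. The function names are hypothetical. Two further details are assumptions, not definitions taken from the SHAR:

- events on the day of surgery are counted;
- "2 years" means the same calendar date two years later.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; 29 February rolls forward to 1 March."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return date(d.year + years, 3, 1)


def ae_within_90_days(index_date: date, ae_dates: list[date]) -> bool:
    """True if any AE diagnosis falls within 90 days of the index surgery (inclusive)."""
    end = index_date + timedelta(days=90)
    return any(index_date <= d <= end for d in ae_dates)


def reoperated_within_2_years(index_date: date, index_side: str,
                              reoperations: list[tuple[date, str]]) -> bool:
    """True if the same hip (matching side) was reoperated within 2 years."""
    end = add_years(index_date, 2)
    return any(side == index_side and index_date <= d <= end
               for d, side in reoperations)


index = date(2014, 3, 1)
print(ae_within_90_days(index, [date(2014, 5, 29)]))
print(reoperated_within_2_years(index, "left", [(date(2016, 2, 28), "right")]))
```

Matching on side keeps a reoperation of the contralateral hip from being counted against the index surgery.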

Statistics
Continuous data are presented as means (SD), while categorical data are presented as proportions. Funnel plots were used to visualize variations between surgeons in the proportion of AEs and reoperations respectively and to explore the number of surgeons outside the control-limits. The control-limits are based on upper 95% and 99.8% confidence intervals (CI).
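The funnel-plot rule can be sketched as below. The sketch makes three assumptions:

- each control-limit is the upper bound of a two-sided Wilson score interval (Wilson's method is described in the next paragraph);
- the interval is computed around a target proportion `p0`, taken here to be the overall proportion;
- a surgeon is outside a limit when the surgeon's proportion lies strictly above it.

The surgeon data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper(p0: float, n: int, coverage: float) -> float:
    """Upper bound of a two-sided Wilson score interval for proportion p0 at volume n."""
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)  # about 1.96 for 95%, 3.09 for 99.8%
    denom = 1 + z * z / n
    centre = p0 + z * z / (2 * n)
    half_width = z * sqrt(p0 * (1 - p0) / n + z * z / (4 * n * n))
    return (centre + half_width) / denom


def classify(events: int, n: int, p0: float) -> str:
    """Compare one surgeon's proportion with the upper 99.8% and 95% control-limits."""
    p = events / n
    if p > wilson_upper(p0, n, 0.998):
        return "above 99.8%"
    if p > wilson_upper(p0, n, 0.95):
        return "above 95%"
    return "within"


# Hypothetical surgeons: (number of events, number of primary THAs)
surgeons = {"A": (3, 27), "B": (7, 27), "C": (50, 400)}
p0 = 0.062  # illustrative overall AE proportion
for name, (events, n) in surgeons.items():
    print(name, classify(events, n, p0))
```

Because the limits are recomputed for each surgeon's volume `n`, they narrow as volume grows. This produces the funnel shape of the plot.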
Wilson's score method, which is suitable for small n, was used to construct the CIs (Brown et al. 2001). The control-limits depend on sample size: a small sample size widens the control-limits and a larger sample size narrows them (i.e., surgeons undertaking few surgeries will have a wider control-limit). Both observed and standardized proportions were explored. The standardized proportion for each surgeon was calculated as the ratio of observed to expected events, multiplied by the overall proportion of events (Spiegelhalter 2005). Logistic regression with adjustment for patient risk factors was used to estimate each patient's probability of an event. The expected number of events for a surgeon was obtained by summing these predicted probabilities over the surgeon's patients. Age, sex, ASA classification, BMI, and diagnosis for implantation were assumed to be related to the outcome (Mantilla et al. 2002, Thörnqvist et al. 2014, Duchman et al. 2015, Singh et al. 2015, Lübbeke et al. 2016). For AEs within 90 days, all 5 predictors were included in the logistic regression model. For reoperations within 2 years, the best-performing model included sex, ASA classification, and BMI. Only cases with complete data were included in the analysis. Simultaneous bilateral primary THAs were counted as 1 surgery; 43 patients underwent simultaneous bilateral primary THAs. Staged bilateral primary THAs were performed in 732 patients. We also performed a sub-analysis restricted to surgeons performing more than 10 THAs annually (Walker et al. 2013).
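The indirect standardization described above can be illustrated with a short sketch. The predicted probabilities are assumed to come from the case-mix logistic regression. The values below are invented to show how case mix changes the standardized proportion, and the function name is hypothetical.

```python
from typing import Optional


def standardized_proportion(events: list[int],
                            predicted: list[Optional[float]],
                            overall_rate: float) -> float:
    """Indirectly standardized proportion for one surgeon: (O / E) * overall rate.

    events[i] is 1 if patient i had the outcome, else 0; predicted[i] is that
    patient's event probability from the case-mix model, or None when a
    predictor is missing (such cases are dropped: complete-case analysis).
    """
    complete = [(y, p) for y, p in zip(events, predicted) if p is not None]
    observed = sum(y for y, _ in complete)
    expected = sum(p for _, p in complete)  # expected events = sum of predicted probabilities
    return observed / expected * overall_rate


# Two hypothetical surgeons, each with 4 AEs among 20 patients (observed 20%)
low_risk = [0.05] * 10 + [0.15] * 10   # predicted probabilities sum to 2.0
high_risk = [0.20] * 20                # predicted probabilities sum to 4.0
events = [1] * 4 + [0] * 16
print(round(standardized_proportion(events, low_risk, 0.062), 3))   # 0.124
print(round(standardized_proportion(events, high_risk, 0.062), 3))  # 0.062
```

Both hypothetical surgeons have the same observed proportion. The surgeon operating on higher-risk patients, however, ends up with a standardized proportion equal to the overall rate.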
SPSS version 25 (IBM Corp, Armonk, NY, USA) and R version 3.2.3 (R Foundation for Statistical Computing, Vienna, Austria) (R Core Team 2019) were used for the statistical analysis.

Ethics, funding, and potential conflicts of interest
The study was approved by the Central Ethical Review Board in Stockholm (DNR Ö 9-2016). A research grant for the project was received from Skaraborgs Hospital research foundation. There is no conflict of interest.

Results
The analysis included 208 surgeons from 9 public hospitals in western Sweden who performed the 9,482 primary THAs for OA (Table 1). Based on the SHAR's categorization of hospitals, these comprised 1 university-regional hospital, 3 county hospitals, and 5 rural hospitals. Of the 9,482 primary THAs included in the analysis, 8,636 (91%) were performed by orthopedic specialists and 846 (9%) by trainees. The mean annual volume of primary THAs for all surgeons and all years was 27 (SD 17). The mean annual volume of primary THAs varied (Table 2). The annual number of surgeons performing primary THAs decreased in the latter part of the study period (Table 3). The overall mean rate of AEs within 90 days for all surgeons was 6.2% (SD 7.3), ranging across years from 5.8% (2013) to 6.7% (2011). The overall mean rate of reoperations within 2 years was 1.8% (SD 3.9), ranging from 1.7% (2011) to 2.2% (2016).
During the years 2011–2016, few surgeons were outside the upper 95% CI. For AEs within 90 days, the year with the most surgeons outside the 95% CI was 2013. In that year, 6 surgeons were outside the control-limit, and only 3 remained outside it after standardization for case mix. The proportion of surgeons outside the 95% CI varied between 0% and 5% over the years investigated.
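To give a sense of scale, consider a hypothetical surgeon with the mean annual volume of 27 THAs, assessed against the overall AE rate of 6.2%. Assuming the limit is the upper bound of a two-sided Wilson interval (z = 1.96) around the overall proportion, the 95% control-limit for this surgeon works out as follows. This is an illustration, not a figure reported in the study.

$$
U_{95} = \frac{\bar p + \dfrac{z^2}{2n} + z\sqrt{\dfrac{\bar p(1-\bar p)}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
= \frac{0.062 + 0.071 + 1.96\sqrt{0.00215 + 0.00132}}{1 + 0.142}
\approx \frac{0.249}{1.142} \approx 0.218
$$

Such a surgeon would need at least 6 AEs among 27 THAs (22.2%) to be above the 95% control-limit. With 5 AEs (18.5%), the surgeon would still fall inside it.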
The proportion of surgeons outside the 95% CI for reoperations within 2 years was also small. It varied annually between 0% and 1% (min–max) for both the observed and the standardized proportions.
The sub-analysis included only surgeons performing more than 10 primary THAs annually. For AEs within 90 days, the number of surgeons outside the control-limit fell by more than half for the observed proportions, and none remained after standardization for case mix (Table 3). For reoperations within 2 years, no surgeon was outside the control-limit in the sub-analysis apart from 1 surgeon in 2016, and this surgeon was no longer outside the limit after standardization (Table 4).
Over the complete 6-year period, less than 1% of the surgeons (1 high-volume surgeon for AEs and 2 high-volume surgeons for reoperations) were outside the 95% CI after risk adjustment, and no surgeon was outside the 99.8% CI (Figure 2).
Table 3. Annual number of surgeons outside the control-limit (above the upper 95% CI) in funnel plots for AEs within 90 days and reoperations within 2 years, for all surgeons regardless of annual surgeon volume of primary THAs
For footnotes, see Table 3
Figure 2. Funnel plots for AEs within 90 days (top panel) and reoperations within 2 years (bottom panel), with the observed and standardized proportions overlaid. The green line is the mean value for the outcome of interest, the yellow line is the 95% CI, and the red line is the 99.8% CI. Each dot represents one surgeon; red dots are observed proportions and blue dots are standardized proportions.

Discussion
Less than 3% of the surgeons were outside the upper 95% CI in this study for both AEs within 90 days and reoperations within 2 years. This held not only after adjustment for differences in patient characteristics but also before any standardization. The overall mean rates of both AEs and reoperations in the study are similar to the national average for elective primary THAs in Sweden. All the confounders we were able to adjust for are known from earlier studies to influence the risk of AEs and reoperation (Mantilla et al. 2002, Thörnqvist et al. 2014, Duchman et al. 2015, Singh et al. 2015, Lübbeke et al. 2016). However, there could be unknown confounders, not available in this study, that affect the outcome after primary THA. For patients operated on in 2015 and 2016, the number of surgeons outside the control-limit for reoperations within 2 years should be interpreted with some caution. The follow-up for these cases is shorter (0.5–1.5 years) than for surgeries performed in 2011–2014.
The small number of surgeons outside the control-limit for both AEs and reoperations might reflect the highly standardized nature of primary THA surgery in Sweden. It might also reflect the long tradition of Swedish quality registers providing feedback at hospital level. Primary THA surgeons in Sweden follow the recommendations of the SHAR. Two further features may also contribute to the excellent outcomes: the relatively large proportion of cemented THAs (Mäkelä et al. 2014), and the fairly small number of different prostheses that account for the majority of reported operations. We included only primary THAs for primary and secondary OA. OA is the most common reason for primary THA, accounting for four-fifths of implantations during the study period. Some of the surgeons in the study may therefore have had a higher total annual volume of primary THAs than is captured here, because they also operated for other reasons (e.g., fracture, inflammatory arthritis, femoral head necrosis, childhood disease). The experience from these other primary THAs might improve these surgeons' outcomes for all THAs. A similar benefit might apply to surgeons who perform both revision and primary THAs.
The control-limits in our funnel plots are based on CIs, as in the AOANJRR's feedback system. Since the start of its feedback system, the AOANJRR has applied a lower limit of 50 performed surgeries. We examined the surgeons' results for each year, and a fairly large number of surgeons performed only a few operations per year. The number of surgeons outside the control-limit in our study must be interpreted with caution. We included all surgeons regardless of annual surgical volume, which may increase the uncertainty.
We chose to present the variation between individual surgeons in funnel plots with control-limits based on CIs constructed using Wilson's method. When annual surgical volume is low, the number of surgeons outside the control-limits is sensitive to the method used to construct the limits, and Wilson's method was chosen for this reason. Nevertheless, the number of surgeons outside the control-limits should be interpreted with caution when comparing surgeons with a low annual volume.
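To illustrate why the method matters at low volume, the sketch below compares two upper 95% limits: the one from the Wilson score method and the simpler normal-approximation (Wald) limit. It assumes an illustrative overall proportion of 6.2%. The Wald comparison is added here for illustration only; the study used Wilson's method, and the function names are hypothetical.

```python
from math import sqrt
from typing import Optional

Z95 = 1.959964  # two-sided 95% standard normal quantile


def wald_upper(p0: float, n: int, z: float = Z95) -> float:
    """Normal-approximation (Wald) upper limit around p0."""
    return p0 + z * sqrt(p0 * (1 - p0) / n)


def wilson_upper(p0: float, n: int, z: float = Z95) -> float:
    """Wilson score upper limit around p0."""
    denom = 1 + z * z / n
    centre = p0 + z * z / (2 * n)
    half = z * sqrt(p0 * (1 - p0) / n + z * z / (4 * n * n))
    return (centre + half) / denom


def min_events_flagged(n: int, limit: float) -> Optional[int]:
    """Smallest number of events out of n whose proportion lies above `limit`."""
    for k in range(n + 1):
        if k / n > limit:
            return k
    return None  # the limit cannot be exceeded at this volume


p0 = 0.062  # illustrative overall AE proportion
for n in (5, 10, 27, 100):
    wa, wi = wald_upper(p0, n), wilson_upper(p0, n)
    print(n, round(wa, 3), round(wi, 3),
          min_events_flagged(n, wa), min_events_flagged(n, wi))
```

At 5 operations, the Wald limit would place a surgeon with 2 AEs outside the control-limit, whereas the Wilson limit requires 3. As volume grows, the two limits converge.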
When we excluded surgeons performing 10 or fewer primary THAs annually, almost half or more of the surgeons were excluded from each annual analysis. Despite this reduction, we carried out the sub-analysis. It reduced or eliminated the number of surgeons outside the control-limit for both AEs and reoperations. Perhaps there is a "low-volume issue" that needs to be considered, in addition to differences in case mix, in order to make reliable comparisons between surgeons.
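The volume restriction used in the sub-analysis can be sketched as a filter applied to surgeon-year rows before the funnel-plot comparison. The row layout, field names, and data are hypothetical.

```python
from collections import defaultdict


def restrict_by_volume(rows: list[dict], threshold: int = 10):
    """Keep surgeon-year rows with more than `threshold` primary THAs.

    Also returns, for each year, the share of surgeons excluded by the restriction.
    """
    kept = [r for r in rows if r["n"] > threshold]
    total, excluded = defaultdict(int), defaultdict(int)
    for r in rows:
        total[r["year"]] += 1
        if r["n"] <= threshold:
            excluded[r["year"]] += 1
    share_excluded = {y: excluded[y] / total[y] for y in total}
    return kept, share_excluded


rows = [  # hypothetical surgeon-year volumes and AE counts
    {"surgeon": "A", "year": 2013, "n": 4, "events": 2},
    {"surgeon": "B", "year": 2013, "n": 9, "events": 1},
    {"surgeon": "C", "year": 2013, "n": 35, "events": 2},
    {"surgeon": "D", "year": 2013, "n": 60, "events": 4},
]
kept, share = restrict_by_volume(rows)
print([r["surgeon"] for r in kept], share)  # ['C', 'D'] {2013: 0.5}
```

Low-volume surgeons have the widest control-limits but also the most unstable proportions. In this invented example, surgeon A's 2 AEs out of 4 are removed from the comparison altogether rather than judged against a very wide limit.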
Only 9% of the primary THAs were performed by trainees. This small proportion might reflect the trainee education system in Sweden, where almost all hospitals educate their own trainees. We can only speculate as to whether trainees are more likely than trained surgeons to be outside the control-limit. In Sweden, a trainee who has fulfilled the requirements of the orthopedic trainee program can apply for specialist certification in orthopedics at the Swedish National Board of Health and Welfare at any time of the year. A surgeon could therefore have been both a trainee and an orthopedic specialist during the same year. However, trainees and newly certified specialists are more likely to be low-volume surgeons. They therefore have an increased risk of being regarded as poor performers compared with more experienced surgeons (Ravi et al. 2014, Koltsov et al. 2018).
The small number of surgeons outside the control-limit for both AEs and reoperations in our study might seem to argue against developing an individual surgeon feedback program for primary THA in Sweden. However, such a program might have other benefits beyond identifying surgeons outside the control-limits. Examples include general information on individual surgeons' practice, or a substitute for the clinical follow-up visits to the operating surgeon that were formerly held. Further research is needed to explore other aspects, benefits, or doubts about a feedback program from the surgeons' point of view.
One strength in this study is that we have been able to adjust for BMI and ASA classification. These 2 confounders are recorded in the SHAR's standard collection of variables and it is therefore easy to add them to a possible future program for individual surgeon feedback.
One limitation of our study is that only primary THAs performed within western Sweden were included. Some of the surgeons in the study might have had temporary or partial employment and may have performed primary THAs outside the region investigated. However, because of employment terms in Sweden, it is very uncommon for surgeons to operate for multiple employers. We therefore expect that the few surgeons operating outside western Sweden would not influence our conclusions.
Our study also shares the limitations of all observational studies using administrative data. Changes in practice and local trends may have occurred during the study period, as may differences in registration between the included hospitals. The regional patient register we used has not been validated on its own, but it provides data to the NPR. The Swedish National Inpatient Register (IPR), which is part of the NPR, has been validated and contains 99% of all hospital discharges (Ludvigsson et al. 2011). We used definitions of AEs and reoperations that require hospital admission. We therefore believe that our data are robust and our conclusions valid.
In summary, the variation in surgeon performance, as measured by AEs within 90 days and reoperations within 2 years after primary THA, was small. After adjustment for case mix, 3% or less of the surgeons were outside the 95% control-limit in the years investigated. When volume and patient risk factors are taken into account, the risk that an individual surgeon would be regarded as a poor performer in surgeon-specific feedback from the SHAR is very low.

Supplementary data
The Appendix is available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2020.1772584
PJ had the original idea for the study and prepared the first version of the manuscript. PJ and EN processed the data. All the authors took part in the planning of the study, the analysis and interpretation of the data, and the writing of the manuscript.
Acta thanks Gary Hooper and Stein Atle Lie for help with peer review of this study.