A matched comparison of the patient-reported outcome measures of 38,716 total and unicompartmental knee replacements: an analysis of linked data from the National Joint Registry of England, Northern Ireland and Isle of Man and England’s National PROM collection programme

Background and purpose — The surgical treatment options for severe knee osteoarthritis are unicompartmental (UKR) and total knee replacement (TKR). For patients, functional outcomes are more important than revision rate. We compared the patient-reported outcome measures (PROMs) of both implant types using a large PROMs dataset. Patients and methods — We analysed a propensity-matched comparison of 38,716 knee replacements (19,358 UKRs and 19,358 TKRs) enrolled in the National Joint Registry and the English National PROM collection programme. Subgroup analyses were performed in different age groups. Results — 6-month postoperative Oxford Knee Score (OKS) for UKR and TKR were 38 (SD 9.4) and 36 (SD 9.4) respectively. A higher proportion of UKRs had an excellent OKS (≥ 41) compared with TKR (47% vs 36%) and a lower proportion of poor OKS (< 27) scores (13% vs. 16%). The 6-month OKS was higher in all age groups for UKR compared with TKR, with the difference increasing in older age groups. The mean 6-month EQ-5D score was 0.78 (SD 0.25) and 0.75 (SD 0.25) respectively. The improvement in EQ-5D resulting from surgery was higher for UKR than TKR both overall and in all age groups. All comparisons were statistically significant (p < 0.05). Interpretation — UKR had a greater proportion of excellent OKS scores and lower proportion of poor scores than TKR. Additionally, the quality of life was higher for UKR compared with TKR. These factors should be balanced against the higher revision rate for UKR when choosing which procedure to perform.

The main treatments for severe knee arthritis that has failed to respond to nonoperative management are total knee replacement (TKR) and unicompartmental knee replacement (UKR). UKR offers advantages over TKR including reduced mortality and medical complications (Liddle et al. 2014), and a faster recovery, but the registries report several times higher revision rates (National Joint Registry 2018, Australian Orthopaedic Association 2019, New Zealand Joint Registry 2019). Approximately 50% of knees needing replacement are appropriate for UKR (Willis-Owen et al. 2009), yet current usage is only 10% given the higher revision rates (National Joint Registry 2018). Although there is some evidence of better functional outcomes for UKR compared with TKR, all previous studies are limited by sample size, particularly for the UKR arm (Baker et al. 2012, Liddle et al. 2015, Beard et al. 2019, Wilson et al. 2019). In assessing risk, patients need more information than revision rate alone, which is the traditional metric for measuring joint replacement outcome (Goodfellow et al. 2010). In recent years there has been a drive towards more patient-directed outcomes. Goodman et al. (2020) found that what mattered most to patients following a knee replacement was relief of pain, restoration of function, and improved quality of life.
We compared the functional outcomes and quality of life of matched TKRs and UKRs, both overall and in different age groups, using data from 3 national datasets.

Data sources
We performed a retrospective observational study using National Joint Registry for England, Wales, Northern Ireland and Isle of Man (NJR) records linked to the Hospital Episodes Statistics Admitted Patient Care (HES-APC) database and England's National Patient Reported Outcome Measures (PROMs) database. The NJR was established in 2003 and is now the world's largest arthroplasty register (National Joint Registry 2018). HES-APC is a database of all admission episodes for patients admitted to an NHS hospital in England (NHS Digital 2020a). From approximately 2009 onwards, NHS-funded knee replacements included in the PROMs programme have both preoperative and 6-month postoperative PROMs recorded (NHS Digital 2020b). These include the Oxford Knee Score (OKS) (Murray et al. 2007) and the quality of life index EuroQol 5 Domain index (EQ-5D) (Group 1990, Devlin et al. 2010). The choice of time intervals by the PROMs programme was a compromise between appropriate proximity to surgery (to provide timely feedback and to avoid influence of nonoperative factors) and sufficient follow-up for comparison whilst accounting for the postoperative recovery period. Research indicates most improvement in PROMs after joint replacement occurs in the first 6 months, with only minor improvement between 6 months and 1 year (Browne et al. 2013). Long-term studies of TKR and UKR have shown that PROMs remain relatively constant after this, at least up to the 10th postoperative year (Pandit et al. 2011, Breeman et al. 2013, Williams et al. 2013).

Data linkage
Between January 1, 2004 and December 31, 2018, 687,910 TKRs and 55,248 UKRs from the NJR dataset (National Joint Registry 2018) were successfully linked to the HES-APC dataset (NHS Digital 2020a) with a full set of baseline demographic and surgical factors needed for matching. Bilateral knee replacements were excluded to allow data linkage. This dataset was merged with the HES PROMs dataset, which started collecting data from approximately 2009 onwards (NHS Digital 2020b). All preoperative PROMs needed to be completed within 3 months prior to surgery or at the latest 1 month postoperatively to be regarded as robust for inclusion. Cases were excluded if either no preoperative anxiety score was available or there was not both a preoperative and postoperative OKS. The demographics of the excluded patients are summarised in Table 1 (see Supplementary data). There were 254,355 TKRs and 20,347 UKRs meeting the above criteria, making them eligible for inclusion (Figure 1). Datasets were linked using pseudo-anonymised identification numbers.

Propensity matching
There were substantial differences in baseline characteristics between TKR and UKR groups (Table 2). Logistic regression on patient demographics and surgical factors was used to generate a propensity score representing the probability that a patient received a UKR. All patient and surgical factors in Table 2 were used for matching, apart from BMI, which had a large proportion of missing data. This is a well-recognised approach (Matharu et al. 2017, Matharu et al. 2018, Mohammad et al. 2020a). Surgical factors included surgeon caseload, defined as the average number of primary knee replacements performed per year as described previously (Liddle et al. 2016, Mohammad et al. 2020a). The algorithm matched 1:1 on the logit of the propensity score with a 0.02-SD calliper width. Greedy matching without replacement was used given its superior performance for estimating treatment effects (Austin 2009a). Standardized mean differences (SMDs) were examined both before and after matching to assess for any imbalance between groups, with SMDs of > 10% suggestive of covariate imbalance (Austin 2009b). After matching, 38,716 knee replacements (19,358 TKRs and 19,358 UKRs) were included for analysis.
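The matching step described above can be illustrated with a minimal sketch. It is not the authors' R code: the function names are hypothetical, the propensity scores are assumed to have been estimated already, and the calliper is assumed to be 0.02 times the SD of the logit pooled across both groups. The pooled-SD form of the SMD for a continuous covariate is likewise an assumption.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(treated_ps, control_ps, caliper_sd=0.02):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated_ps / control_ps are propensity scores (probabilities).
    Returns (treated_index, control_index) pairs within the calliper.
    """
    t_logit = [logit(p) for p in treated_ps]
    c_logit = [logit(p) for p in control_ps]
    # Assumption: calliper = caliper_sd x SD of the logit over both groups.
    caliper = caliper_sd * statistics.stdev(t_logit + c_logit)

    available = set(range(len(c_logit)))
    pairs = []
    for i, lt in enumerate(t_logit):
        if not available:
            break
        # Closest unused control; ties broken by the lower index.
        j = min(available, key=lambda k: (abs(c_logit[k] - lt), k))
        if abs(c_logit[j] - lt) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """SMD (in percent) for a continuous covariate, using the pooled SD."""
    m1, m0 = statistics.mean(x_treated), statistics.mean(x_control)
    v1, v0 = statistics.variance(x_treated), statistics.variance(x_control)
    pooled_sd = math.sqrt((v1 + v0) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return 100.0 * (m1 - m0) / pooled_sd
```

Under these assumptions, a treated case with no control inside the calliper is left unmatched, and an absolute SMD above 10 for any covariate would flag residual imbalance. The linear search is quadratic in the number of cases and is intended only to make the logic explicit.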
The OKS has 12 items relating to knee pain and function, presented as an overall score between 0 and 48 (Murray et al. 2007), with excellent (≥ 41) and poor (< 27) results defined by Kalairajah et al. (2005). Various estimates of the minimal clinically important difference (MCID) for the OKS have been made; this is considered to be between 3 and 5 points (Beard et al. 2015). The EQ-5D comprises 5 questions concerning mobility, self-care, activities of daily living, pain, and anxiety/depression. These answers can be presented as a weighted overall index from 1 (perfect health) to -0.594 (worst possible state) (Group 1990, Devlin et al. 2010).

Statistics
Given that PROMs scores were not normally distributed, appropriate nonparametric tests were used. To compare pre- and postoperative scores within TKR and UKR groups we used the Wilcoxon signed rank test. To compare TKR and UKR scores the Mann-Whitney test was performed. Locally weighted scatterplot smoothing (LOWESS) curves (Cleveland 1979) were plotted to explore the relationship between preoperative and postoperative PROMs. For clarity the scatter points are suppressed in the plots presented. The percentage of the possible change (PoPC) was calculated as described previously (Kiran et al. 2014). This expresses the actual change attained as a percentage of the possible change. For example, for a preoperative OKS of 20 and a postoperative OKS of 40, the actual change is 20 points and the possible change is 48 - 20 = 28 points, so the PoPC is 20/28 × 100 = 71.4%. All statistical analyses were performed using Stata (Version 15.1; StataCorp, College Station, TX, USA) except propensity score matching, which was performed using R (Version 3.4.0; R Foundation for Statistical Computing, Vienna, Austria). P-values of < 0.05 were considered significant.
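The PoPC calculation can be written as a short function. This is an illustrative sketch only: the function name is hypothetical, and the handling of a preoperative score already at the maximum (where the possible change is zero) is an assumption, as the text does not state how such cases were treated.

```python
OKS_MAX = 48  # maximum Oxford Knee Score


def percentage_of_possible_change(pre, post, max_score=OKS_MAX):
    """Actual change as a percentage of the change that was possible."""
    possible = max_score - pre
    if possible <= 0:
        # Assumption: PoPC is undefined when no improvement is possible.
        raise ValueError("preoperative score is already at the maximum")
    return 100.0 * (post - pre) / possible


# Worked example from the text: preoperative 20, postoperative 40.
print(round(percentage_of_possible_change(20, 40), 1))  # 71.4
```

Because the denominator shrinks as the preoperative score rises, the same absolute gain yields a larger PoPC in patients who start with a higher score.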

Results
The unmatched cohort consisted of 254,355 TKRs and 20,347 UKRs with several statistically significant baseline differences between groups (Table 2). The matched study group consisted of 38,716 knee replacements (19,358 TKRs, 19,358 UKRs), which were well balanced (Table 2). The distribution of the OKS scores is illustrated in Figure 2.
LOWESS curves showed that as the preoperative OKS increased so did the postoperative score, with similar gradients for both implants. A ceiling effect was visible for the higher preoperative scores (Figure 3). For any given preoperative score, the postoperative score was higher for UKR than TKR. Figure 4 shows, through LOWESS curves, how the PoPC was influenced by preoperative score. For all preoperative scores the PoPC was greater for UKR than TKR. The higher the preoperative score, the larger the difference.
The mean preoperative EQ-5D index for the TKR and UKR groups was 0.47 (SD 0.30) and 0.47 (SD 0.30) respectively, with no statistically significant difference between groups (p = 0.3). Both groups showed a statistically significant improvement in their 6-month scores (p < 0.001), to 0.75 (SD 0.25) and 0.78 (SD 0.25) respectively. The TKR group gained 0.28 (SD 0.32) points postoperatively whereas the UKR group gained 0.31 (SD 0.31), with the difference being statistically significant (p < 0.001) (Figure 6).
Preoperative OKS was similar between TKR and UKR across all age groups. Younger age groups had poorer OKS than older groups, reflecting the higher threshold to operate in younger patients. Preoperatively the proportions of poor, fair, good, and excellent OKS were similar between TKR and UKR for all age groups (Table 5, see Supplementary data). At 6 months postoperatively there was a greater proportion of excellent and a lower proportion of poor scores for UKR compared with TKR across all age groups (Table 5, see Supplementary data). The proportion of excellent scores was 6%, 12%, 12%, and 10% higher in UKR compared with TKR for the < 55 years, 55-64 years, 65-74 years, and ≥ 75 years groups respectively. The proportion of poor OKS scores was 3-4% lower in UKR compared with TKR across all age groups.
Preoperatively the EQ-5D score was similar between TKR and UKR across age groups (Table 6, see Supplementary data). For both TKR and UKR, younger age groups had poorer EQ-5D scores than older groups. For both TKR and UKR, all age groups showed statistically significant improvements postoperatively compared with preoperatively (p < 0.001), although UKR gained more points postoperatively compared with TKR across all age groups (p < 0.001) except the < 55 group (p = 0.5) (Table 6, see Supplementary data). The 6-month EQ-5D was higher in all age groups for UKR compared with TKR (p < 0.001) (Table 6, see Supplementary data).

Discussion
This is the largest study comparing the PROMs of TKR and UKR and helps provide answers to outcome metrics patients find most important (Goodman et al. 2020). After matching, a substantially higher proportion of UKRs had an excellent OKS compared with TKR (47% vs. 36%) and a lower proportion of poor scores (13% vs. 16%). This is important, given that currently 1 in 5 patients who undergo a total knee replacement are dissatisfied with their knee replacement (Beswick et al. 2012, Price et al. 2018). This number would likely be lower if more patients suitable for UKR had UKR surgery, which by some estimates could be up to 50% (Willis-Owen et al. 2009).
The 6-month postoperative OKS of UKR was higher than TKR by 2 points. This difference is similar to that found in the TOPKAT randomised controlled trial (Beard et al. 2019), suggesting that it is a real difference. Although the magnitude of the difference is below the suggested MCID for the OKS, given the skewed nature of the outcome scores (Figure 2), together with the ceiling effect of the OKS (Dawson et al. 2014), this does not mean that the difference is unimportant. Indeed, its importance is highlighted by the finding that the likelihood of having an excellent score is 30% higher following a UKR than a TKR (relative risk 1.3) and the likelihood of having a poor score is about 20% lower following UKR (relative risk 0.81). Additionally, for any given preoperative OKS, a greater PoPC postoperatively was observed for UKR than for TKR (Figure 4). This was particularly marked in patients with higher preoperative OKS.
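The relative risks quoted above follow from the matched proportions of excellent and poor scores. A minimal sketch, using the rounded percentages reported in the text as inputs (the function name is hypothetical, and values computed from rounded inputs may differ slightly from those derived from the underlying counts):

```python
def relative_risk(proportion_ukr, proportion_tkr):
    """Ratio of the proportion with an outcome after UKR to that after TKR."""
    if proportion_tkr <= 0:
        raise ValueError("comparator proportion must be positive")
    return proportion_ukr / proportion_tkr


# Excellent OKS (>= 41): 47% of UKRs vs 36% of TKRs.
rr_excellent = relative_risk(0.47, 0.36)
# Poor OKS (< 27): 13% of UKRs vs 16% of TKRs.
rr_poor = relative_risk(0.13, 0.16)

print(f"excellent: {rr_excellent:.2f}, poor: {rr_poor:.2f}")
```

A ratio above 1 indicates the outcome is more frequent after UKR, and a ratio below 1 indicates it is less frequent.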
In all age groups except the < 55 years group, the average 6-month postoperative OKS was about 2 points greater with UKR than TKR and UKR were about 30% more likely to have an excellent OKS and 20% less likely to have a poor OKS. In the < 55 age group the difference in 6-month postoperative OKS between TKR and UKR groups was only 1.3 points. Also, in this age group UKR were 20% more likely to have an excellent outcome and 12% less likely to have a poor outcome. It is not clear why this is. It may be because the preoperative scores were much lower in this group, and, as can be seen in the graph comparing pre- and postoperative OKS (Figure 3), with lower preoperative scores the difference between UKR and TKR is smaller. It may also be that in this age group more UKR patients had early stages of osteoarthritis (without bone-on-bone arthritis) and these patients tend not to do well (Kennedy et al. 2020a). Despite this, in all age groups UKR have better function than TKR, which justifies using UKR in all age groups. This is particularly important in the elderly as only 4% of knee replacements in patients older than 75 years are UKR (National Joint Registry 2018), when approximately one-third of knee replacements are appropriate for UKR (Kennedy et al. 2020b). Therefore approximately 10 times as many UKRs could be done in this age group.
This study has also shown UKR offers better 6-month quality of life with EQ-5D scores both overall and on age subgroup analyses. Overall and in all age groups the EQ-5D index was between 0.02 and 0.03 points higher for UKR compared with TKR. This is close to the lower limit of the predicted MCID for the EQ-5D index, which is considered to range between 0.03 and 0.54 points (Coretti et al. 2014). When determining quality of life improvement the index is summated annually, so, over the time period the devices are implanted, the differences are likely to be appreciable and well above the MCID.
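The argument about annual summation can be made explicit with a simple expression. This is an illustrative sketch rather than an analysis performed in this study: the symbols are introduced here, no discounting is applied, and the approximation assumes that the 6-month difference in EQ-5D index between UKR and TKR persists unchanged in each year the implant remains in place.

```latex
% \Delta_t : difference in EQ-5D index (UKR minus TKR) in year t (assumed notation)
% T        : number of years the implant remains in place (assumed notation)
\Delta_{\mathrm{cum}} = \sum_{t=1}^{T} \Delta_t \approx T \, \Delta
\qquad \text{if } \Delta_t = \Delta \text{ for all } t
```

With the observed difference of 0.02 to 0.03 index points, the cumulative difference under this assumption grows in proportion to the number of years considered.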
Our results agree with those reported by Liddle et al. (2015), who found that UKR had higher 6-month OKS and EQ-5Ds than matched TKRs. In contrast, they differ from those of Baker et al. (2012), who found no difference in the PROMs gained by UKR and TKR in an analysis adjusted for case-mix and preoperative score. However, there were only 505 UKRs in the Baker study, suggesting that this study was underpowered compared with the Liddle study (3,519 UKRs), which had about 7 times as many UKR, and our study, which had about 40 times as many UKR (n = 19,358). This is the largest study comparing the PROMs of matched UKR and TKR and the first to perform analyses in different age groups. The main study strength is that we used an unselected registry sample, which reduces the chances of selection bias. By linking datasets, various confounding factors were matched, allowing fair comparison. The main limitation is that this is a retrospective study with 6-month postoperative scores, and there is some evidence (Browne et al. 2013) of slight further increases (below the MCID) in OKS between 6 and 12 months postoperatively. However, most improvement in PROMs after joint replacement occurs in the first 6 months (Pandit et al. 2011, Breeman et al. 2013, Browne et al. 2013, Williams et al. 2013). Additionally, matching can reduce the generalisability of findings, but given we were able to match virtually all the UKR to TKR this is unlikely to be an issue. Finally, we were only able to match using variables collected in the databases. There could be unaccounted variables that could lead to some residual confounding.

Conclusion
Surgeons have traditionally made the decision on which implant to use based on relative revision rates; however, patients are much more concerned about functionality (Goodman et al. 2020). This study shows that UKR offers superior functional outcomes and quality of life across all age groups.
Although the absolute mean difference in OKS is below the MCID, the likelihood of an excellent OKS is about 30% higher for UKR and the likelihood of a poor OKS is about 20% lower for UKR. We recommend that the findings of this study are discussed with patients alongside the increased risk of UKR revision to help patients make more informed decisions about their care.

Supplementary data
Tables 1 and 4-6 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1956744
HRM, AJ, and DWM designed the study. HRM and DWM analysed the data with statistical support from AJ. HRM, AJ, and DWM helped with data interpretation. HRM wrote the initial manuscript draft, which was then revised appropriately by all authors.
The authors would like to thank the patients and staff of all the hospitals in England, Wales, Northern Ireland and Isle of Man who have contributed data to the National Joint Registry. They are grateful to the Healthcare Quality Improvement Partnership (HQIP), the NJR Research Sub-committee, and staff at the NJR Centre for facilitating this work. The authors have conformed to the NJR's standard protocol for data access and publication. The views expressed represent those of the authors and do not necessarily reflect those of the National Joint Registry Steering Committee or the Healthcare Quality Improvement Partnership (HQIP), who do not vouch for how the information is presented.
Acta thanks Margareta Hedström for help with peer review of this study.
Austin P C. Some methods of propensity-score matching had superior performance to others: results of an empirical investigation and Monte Carlo simulations. Biom J 2009a; 51(1): 171-84.