Analysis of polyethylene wear in plain radiographs

Background and purpose Two-dimensional computerized radiographic techniques are frequently used to measure in vivo polyethylene (PE) wear after total hip arthroplasty (THA), and several variables in the clinical set-up may influence the amount of wear that is measured. We compared the repeatability and concurrent validity of linear PE wear on plain radiographs using the same software but a different number of radiographs. Methods We used either 1, 2, or 6 anteroposterior (AP) hip radiographs of 11 patients from a clinical THA series with 12 years of follow-up, and measured the PE wear with the software PolyWare 3D Pro. Repeatability within and concurrent validity between the different numbers of radiograph strategies were assessed using limits of agreement (LOAs) and bias. Results Observed median wear (range) in mm was 3.4 (1.6–4.6), 2.3 (0.7–4.9), and 4.0 (2.6–6.2) for the 1-, 2-, and 6-radiograph strategies. For repeatability, no bias (p > 0.41) was observed. LOAs around the bias were ± 0.6, ± 0.4, and ± 1.2 mm for the 1-, 2-, and 6-radiograph strategies. For concurrent validity, a bias (± LOA) between all pairwise comparisons was observed (p < 0.02) with 0.8 mm (± 2.5) between the 1- and 2-radiograph strategies, 1.0 mm (± 2.2) between the 1- and 6-radiograph strategies, and 1.8 mm (± 1.2) between the 2- and 6-radiograph strategies. Interpretation The number of radiographs used for wear measurement with a shadow-casting analysis method on plain AP radiographs influences the amount of linear wear measured. Results of PE wear obtained with PolyWare in studies using a different number of radiographs are not comparable.

Polyethylene (PE) wear of more than 0.1–0.2 mm/year is associated with later osteolysis and failure of total hip arthroplasty (THA) (Sochart 1999, Dowd et al. 2000). In vitro simulator wear studies may not reflect the total sum of PE wear seen in vivo. Continuous investigation of PE wear in the clinical setting, with matching reports of the clinical outcome, is therefore important. Several different methods are currently used to estimate clinical wear after THA, but few comparisons have been made (Collier et al. 2003, Hui et al. 2003, von Schewelov et al. 2004, Bragdon et al. 2006a, Geerdink et al. 2008). Wear of hip arthroplasty is most accurately measured with radiostereometric analysis (RSA) (Bragdon et al. 2002, von Schewelov et al. 2004, Borlin et al. 2005). RSA, however, is limited to prospective studies with stereo radiographs recorded at all follow-ups, requires an expensive set-up, and is not easily established. Consequently, plain radiographs are still used in most descriptions of clinical wear.
Previous studies have shown great variation in wear measurements for specific components, which may in part be caused by intraobserver variance, component and patient factors (Orishimo et al. 2003), pelvic orientation (Collier et al. 2003, Foss et al. 2008), and radiographic quality (Sychterz et al. 2001). Furthermore, it is unlikely that different methods used to measure PE wear will agree exactly by giving identical results for all individuals (Bland and Altman 1986), as has also been demonstrated in comparative studies (Collier et al. 2003, Hui et al. 2003, Bragdon et al. 2006a). PE wear results obtained by manual methods are known to have large interobserver variance, and they may be difficult to compare directly with results obtained by computerized methods, which have more predictable accuracy and far better precision (Hui et al. 2003, McCalden et al. 2005).

There is currently no consensus concerning how wear analysis is best performed and presented. Bedding-in of the PE component has led some researchers to favor exclusion of the initial period (months to years) of follow-up (Sychterz et al. 1999a, Hui et al. 2003), but the period and magnitude of creep probably vary between components. Inclusion of the initial period certainly increases the mean measured wear and the calculated wear rates. This is particularly problematic when comparing wear rates between studies with short-term and long-term follow-up. Some research groups recommend analysis of serial radiographs (Sychterz et al. 1997), while others analyze only the latest follow-up radiographs and assume zero wear at baseline (Norton et al. 2002). Few groups evaluate the precision of their own investigations with the chosen method; most rather refer to a specialized laboratory's determination of the precision.
Observations and questions raised in our research group on the assessment of clinical wear after hip arthroplasty with a computerized shadow-casting technique (Devane et al. 1995b) inspired us to investigate in greater detail whether analysis of one, two, or multiple radiographs in the same clinical series of patients would result in different estimates of wear, and if so, how large the differences would be.

Study design
We measured two-dimensional femoral head penetration into the PE liner in 11 patients selected from a clinical series of 27 patients (28 hips) previously evaluated for early migration of the femoral stem (Soballe et al. 1993) and later for cup revision, PE wear, and osteolysis (Stilling et al. 2009). The acetabular component was a hemispherical, rim-flare, screw-fixed Universal Hexloc metal backing (Biomet Inc., Warsaw, IN) with a 10-degree-face acetabular liner of GUR 415 bar-extruded conventional ultra-high molecular weight PE, sterilized by gamma radiation in air. The femoral component was a cementless, proximally coated Bi-Metric stem (Biomet). Cobalt-chromium 28-mm femoral heads were used. The acetabular shells ranged in size from 48 to 62 mm, and the PE thickness ranged from 3.39 to 6.47 mm. One surgeon had performed all the operations using a posterolateral approach. All the radiographs had been taken in the same hospital between 1990 and 2003. No radiographic protocol other than the hospital's standard one had been used. The central beam had been aimed at the hip joint (the femoral head).
The 11 patients (6 men, 5 women) were selected from the original group of 28 patients by the criterion of all having 12 years of radiographic follow-up with 6 good-quality AP radiographs and no apparent migration of the cup, as changes in cup angulation have been shown to influence wear measurements with the software used (Collier et al. 2003). The 17 patients not included either did not have a full sequence of 6 radiographic follow-ups from baseline to 12 years (for example, because of missing postoperative AP radiographs) or had less than 12 years of follow-up because of death or revision. Although the postoperative printed radiographs had been stored for almost 15 years, they were in satisfactory condition, and no patients were excluded because of insufficient quality of AP images. Cross-table lateral radiographs were also available, but we chose not to include them because of their poor quality and other problems described in the literature (Sychterz et al. 1999b, 2001). The 6 radiographs were taken at the following time points: postoperatively (within days) and 3 months, 6 months, 1 year, 5 years, and 12 years after surgery. Six hydroxyapatite-coated and 5 non-hydroxyapatite-coated components were used. The amount of wear ranged from low to high among the patients.

Radiographs and software
The AP radiographs were all digitized to tagged image files at a resolution of 300 dots per inch with a transmission-light scanner (Mustek P3600 A3 pro, Irvine, CA). The location of the central ray was estimated by pencilling diagonals between the corners of the rectangular exposure on the radiograph. Analysis was performed with a computerized method featuring a digital edge-detection algorithm that fits circles and ellipses to the peripheral shadows of the femoral head and acetabular component (PolyWare Pro 3D Digital version 5.10; Draftware Developers, Conway, SC) (Figure 1). This technique, developed by Devane et al. (1995a,b), relies on computer-assisted technology to create a 3-dimensional solid model of the acetabular component and femoral head based on back-projection of the radiographs, the femoral head size, and knowledge of the design of the acetabular component (a CAD library of various prosthetic brands in the software). Femoral head penetration is then calculated as the difference between vector lengths on subsequent measurements. The stated precision of linear wear with the software version used is approximately 0.089 mm (Devane and Horne 1999).
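The penetration step can be illustrated with a simplified 2-dimensional sketch. It assumes that edge fitting has already returned the cup (shell) center, the head center, and the head diameter in scanner pixels for each film, and that scale is recovered from the known 28-mm head size, which also absorbs radiographic magnification. For reference, one scanner pixel at 300 dots per inch spans 25.4/300 ≈ 0.085 mm on the film. The function names and coordinates are hypothetical, and PolyWare's 3-dimensional back-projection is not reproduced.

```python
import math

HEAD_DIAMETER_MM = 28.0  # femoral head size used in this series


def mm_per_pixel(head_diameter_px, head_diameter_mm=HEAD_DIAMETER_MM):
    """Scale (mm per pixel) from the fitted head diameter on one film.

    Using the known head size also absorbs that film's radiographic magnification.
    """
    return head_diameter_mm / head_diameter_px


def offset_mm(cup_center_px, head_center_px, scale):
    """Vector from the fitted cup (shell) center to the fitted head center, in mm."""
    return ((head_center_px[0] - cup_center_px[0]) * scale,
            (head_center_px[1] - cup_center_px[1]) * scale)


def penetration_mm(baseline_offset, follow_up_offset):
    """Penetration as the difference between the two offset-vector lengths."""
    return math.hypot(*follow_up_offset) - math.hypot(*baseline_offset)


# Hypothetical fitted values (pixels) from a baseline and a follow-up film.
base = offset_mm((1000.0, 800.0), (1003.75, 805.0), mm_per_pixel(350.0))
late = offset_mm((1000.0, 800.0), (1018.75, 825.0), mm_per_pixel(350.0))
print(f"penetration: {penetration_mm(base, late):.2f} mm")  # penetration: 2.00 mm
```

With collinear offsets, as here, the difference in lengths equals the displacement of the head center; when the direction of the offset changes between films, the length difference is smaller than the displacement, so the exact definition used matters.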
The quality of the digitized AP radiographs was generally good, and the automatic circle fitting in the PolyWare software only rarely had to be overruled with the manual digitizer tablet. Whenever the edge-detection routines failed to accurately locate the peripheral shadows of the acetabular component or femoral head, the observer manually applied 5 evenly spaced dots on the peripheral shadow of the component. This was the case in 3 of the 198 analyses.
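The manual fallback, fitting a circle to 5 digitized points, can be sketched with an algebraic (Kasa) least-squares circle fit. This is a generic technique chosen for illustration; the fitting routine inside PolyWare is not described in the source, and the edge points below are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Algebraic (Kasa) least-squares circle fit to (x, y) points.

    Solves the normal equations of x^2 + y^2 + D*x + E*y + F = 0 by
    Cramer's rule and returns (center_x, center_y, radius).
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least 3 points are needed")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [x * x + y * y for x, y in points]
    sxy = sum(x * y for x, y in points)
    a = [
        [sum(x * x for x in xs), sxy, sum(xs)],
        [sxy, sum(y * y for y in ys), sum(ys)],
        [sum(xs), sum(ys), float(n)],
    ]
    b = [-sum(x * z for x, z in zip(xs, zs)),
         -sum(y * z for y, z in zip(ys, zs)),
         -sum(zs)]
    d = det3(a)
    # The moment matrix is positive semidefinite, so its determinant is bounded by
    # the product of its diagonal; a tiny ratio signals (near-)collinear points.
    if abs(d) <= 1e-12 * abs(a[0][0] * a[1][1] * a[2][2]):
        raise ValueError("degenerate (e.g. collinear) points")
    solution = []
    for col in range(3):  # Cramer's rule: replace column `col` by b
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]
        solution.append(det3(m) / d)
    big_d, big_e, big_f = solution
    cx, cy = -big_d / 2, -big_e / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - big_f)


# Five evenly spaced, hypothetical edge points (mm) on an arc of a 28-mm head.
angles = [math.radians(a) for a in (20, 55, 90, 125, 160)]
edge = [(10.0 + 14.0 * math.cos(t), -5.0 + 14.0 * math.sin(t)) for t in angles]
print(fit_circle(edge))
```

The algebraic fit is exact for points lying on a circle. With noisy points on a short arc it tends to underestimate the radius, which is why a geometric (orthogonal-distance) refinement is often preferred.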

Strategies of wear analysis
For wear analysis, we used 3 strategies commonly reported in the literature and compared the results. Firstly, we analyzed all 6 follow-ups and added the sequential wear between follow-ups to obtain the cumulative linear wear. Secondly, we analyzed the postoperative radiograph versus the final 12-year radiograph. Thirdly, we analyzed only the 12-year radiograph, assuming zero wear at the time of operation. Hereafter, the 3 strategies are referred to as PW 6 (6 radiographs), PW 2 (2 radiographs), and PW 1 (1 radiograph).
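To make the difference between the 3 strategies concrete, the sketch below assumes that each radiograph yields a 2-dimensional offset of the head center from the cup center (mm), and that each analyzed interval contributes the non-negative length of the head-center displacement. That modeling choice is an assumption: if signed length differences were summed instead, PW 6 would telescope to PW 2. The helpers `pw1`, `pw2`, and `pw6` are hypothetical names, not PolyWare functions.

```python
import math


def _length(v):
    return math.hypot(v[0], v[1])


def _step(a, b):
    """Displacement from offset a to offset b."""
    return (b[0] - a[0], b[1] - a[1])


def pw1(offsets):
    """Final radiograph only; the head is assumed concentric (zero wear) at operation."""
    return _length(offsets[-1])


def pw2(offsets):
    """Postoperative versus final radiograph."""
    return _length(_step(offsets[0], offsets[-1]))


def pw6(offsets):
    """Sum of the (non-negative) increments between consecutive follow-ups."""
    return sum(_length(_step(a, b)) for a, b in zip(offsets, offsets[1:]))


# Hypothetical head-center offsets (mm) at postoperative, 3 mo, 6 mo, 1, 5 and 12 years,
# all along one direction, with a 0.2-mm offset already present postoperatively.
direction = (0.6, 0.8)
positions = (0.2, 0.5, 0.7, 1.0, 2.0, 3.5)
offsets = [(direction[0] * s, direction[1] * s) for s in positions]
print(pw1(offsets), pw2(offsets), pw6(offsets))  # approximately 3.5, 3.3, 3.3

# Move one intermediate film off the line: PW 2 is unchanged, PW 6 grows.
noisy = list(offsets)
noisy[3] = (noisy[3][0] + 0.1, noisy[3][1] - 0.1)
print(pw2(noisy), pw6(noisy))
```

In this noise-free example PW 1 differs from PW 2 only because the postoperative offset is not zero, mirroring the role of the starting point; the perturbed film shows that an off-line intermediate measurement can only add to a sum of interval lengths.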
With the PW 6 strategy, 132 analyses (11 patients × 6 radiographs × double analysis) were performed. The mean wear estimates for PW 2 were based on 44 analyses (11 patients × 2 radiographs × double analysis), and 22 analyses (11 patients × 1 radiograph × double analysis) were performed for the wear estimate of PW 1.

Statistics
Repeatability (random variation or precision) of the PolyWare software was assessed as the standard deviation of the difference (SD dif-intra) between two PE wear measurements on the same radiographs for a particular radiograph strategy (PW 1, PW 2, PW 6). Following Altman (1995), we further calculated limits of agreement (LOAs), in this case LOA intra = ±1.96 × SD dif-intra. The systematic variation (bias) between the double measurements was estimated as the mean difference between the 2 measurements. The differences between the 2 measurements followed a Gaussian distribution (Shapiro-Wilk test; Altman 1995) and were tested with a paired t-test. The measures of repeatability (SD dif-intra or, equivalently, the width of LOA intra) of the 3 strategies were compared pairwise with Pitman's test.
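The repeatability statistics can be sketched with the Python standard library. The readings are hypothetical. P-values would additionally require the t distribution (for example `scipy.stats.t.sf`), and the Shapiro-Wilk check is available as `scipy.stats.shapiro`; neither is used here. The Pitman comparison is written in its Pitman-Morgan form: for two paired series, the correlation between their sums and differences is zero exactly when their variances are equal.

```python
import math
import statistics


def repeatability(first, second):
    """Bias, SD of differences, 95% LOA half-width and paired t statistic
    for duplicate wear readings (mm) of the same radiographs."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)        # SD dif-intra
    loa = 1.96 * sd                     # LOA intra: bias +/- loa
    t = bias / (sd / math.sqrt(n))      # paired t, n - 1 degrees of freedom
    return bias, sd, loa, t


def pitman_morgan(d1, d2):
    """Test of equal variances for two paired series.

    Returns (r, t): r is the correlation of sums with differences, t has
    n - 2 degrees of freedom.
    """
    s = [a + b for a, b in zip(d1, d2)]
    d = [a - b for a, b in zip(d1, d2)]
    ms, md = statistics.mean(s), statistics.mean(d)
    sxy = sum((x - ms) * (y - md) for x, y in zip(s, d))
    r = sxy / math.sqrt(sum((x - ms) ** 2 for x in s) * sum((y - md) ** 2 for y in d))
    n = len(s)
    if abs(r) >= 1:
        return r, math.copysign(math.inf, r)
    return r, r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical duplicate readings (mm) of the same 5 radiographs.
first = [3.1, 2.4, 4.0, 1.8, 3.6]
second = [3.0, 2.5, 3.8, 2.0, 3.6]
print(repeatability(first, second))

# Hypothetical per-patient duplicate differences under two strategies.
d_a = [0.1, -0.3, 0.25, 0.05, -0.1]
d_b = [0.3, -0.5, 0.1, 0.4, -0.2]
print(pitman_morgan(d_a, d_b))
```

Here `d_a` and `d_b` stand for the per-patient duplicate differences of two strategies applied to the same patients; a negative r indicates that the second strategy has the larger variance, that is, the poorer repeatability.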
Criterion validity describes the correlation between a measurement and an external criterion of the phenomenon under study, while the sub-aspect concurrent validity describes the time-chronological correlation (International Epidemiological Association 1995). Thus, concurrent validity was used for comparison of the 3 strategies of time-chronological wear measurement. For each strategy, we used the average value of PE wear from the double measurements, then estimated the difference between 2 strategies, and finally estimated the standard deviation of the difference (SD dif-inter) between these strategies, with LOA inter = ±1.96 × SD dif-inter (Altman 1995). The bias between 2 strategies was investigated as the difference in mean measured PE wear. The differences followed a normal distribution (Shapiro-Wilk test; Altman 1995) and were tested with a paired t-test. The correlation between methods was described by the correlation coefficient (r).
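A matching sketch for concurrent validity averages the duplicate readings per strategy, then computes the bias, the LOA inter half-width, and r. All values are hypothetical. The last lines show why a known systematic difference can be removed while random variation cannot: subtracting a constant moves the bias to zero but leaves the SD of the differences, and hence the LOA, unchanged.

```python
import math
import statistics


def mean_of_duplicates(first, second):
    """Average of the two readings per patient for one strategy."""
    return [(a + b) / 2 for a, b in zip(first, second)]


def agreement(x, y):
    """Bias (mean of x - y) and 95% LOA half-width (1.96 * SD dif-inter)."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs), 1.96 * statistics.stdev(diffs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two strategies."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical duplicate readings (mm) for 5 patients under two strategies.
pw2 = mean_of_duplicates([2.1, 3.0, 1.4, 4.1, 2.5], [1.9, 3.0, 1.6, 3.9, 2.5])
pw6 = mean_of_duplicates([3.8, 4.8, 3.2, 6.0, 4.3], [4.0, 4.6, 3.2, 5.8, 4.3])
bias, loa = agreement(pw6, pw2)
print(f"bias {bias:.2f} mm, LOA +/-{loa:.2f} mm, r = {pearson_r(pw6, pw2):.2f}")

# A known bias can be subtracted; the LOA half-width (random variation) is unchanged.
corrected = [v - bias for v in pw6]
print(agreement(corrected, pw2))
```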

Discussion
The purpose of this study was to determine whether the number of radiographs used for analysis with a digital shadow-casting wear analysis technique influences the wear results. No distinction between creep, articular wear, and backside wear could be made with the method of wear analysis we used, and since all cups were unrevised, the true amount of wear remains uncertain.
The magnitude of wear obtained with all 3 strategies of wear analysis was high and well above the 0.1–0.2 mm/year threshold of linear PE wear described to cause later complications of osteolysis and revision (Sochart 1999, Dowd et al. 2000). In a different study, we evaluated PE wear to the final follow-up (death, revision, or 12 years) in all patients of this formerly randomized patient group, addressed the resulting complications of excessive osteolysis and revision, and discussed reasons for the magnitude of PE wear (Stilling et al. 2009). The study group consisted of both Ti-coated and HA-coated cups, and we found a statistically insignificant but clinically relevant difference in total PE wear after an average of 11 years: 3.8 mm in the Ti group and 4.8 mm in the HA group (Stilling et al. 2009). The present study investigated PE wear in a random group of patients and in a range that was relevant for the software used (Hui et al. 2003), and we do not believe that the difference in magnitude of wear between Ti and HA components would affect the conclusions from these measurements.
We observed large differences in measured median PE wear in the same patients between the 3 strategies, and the PE wear estimated with the 6-radiograph strategy was almost twice that observed with the 2-radiograph strategy. This bias was consistent for the individual measurements (Figure 2), except for 2 values close to wear-through of the liner. Repeatability was best for PW 1 and PW 2, with LOAs around the bias of ±0.6 and ±0.4 mm, respectively, which was better than for PW 6. One explanation for the rather high random variation in repeatability observed for PW 6 could be the inherent problem that each of the 5 PE wear estimations in this strategy contributes a positive value, so that variances are summed from examination to examination. It therefore seems that a multiple-radiograph strategy is best for monitoring the development of wear over time, and less favorable when it comes to a precise estimate of wear at a given time point. Regarding concurrent validity, we observed a large systematic variation of 1.8 mm with a clinically acceptable random variation (±1.2 mm) between PW 2 and PW 6. However, the random variation between PW 1 and the other radiograph strategies exceeded ±2 mm. A systematic variation can be corrected for if it is known, whereas random variation cannot; thus, clinical measurements obtained with PW 2 appeared similar to those obtained with PW 6, with a correlation coefficient of r = 0.89. We were rather surprised to find a low concurrent validity between PW 1 and PW 2, as, in theory, the random variation for both should have been small. The final follow-up radiograph was the same in both strategies; thus, the difference must have arisen from the handling of the starting point by the software. For PW 1, the software decides the position of zero PE wear (baseline) from CAD-derived knowledge of the cup component and size, along with information about the femoral head size, whereas with PW 2 the actual baseline position of the cup and head, as estimated from the baseline radiograph, is used for the calculation of PE wear. More research is needed to determine what contributes to the differences between PW 1 and PW 2, and whether this is only a problem for the Universal component. In addition, it is not known which strategy (PW 1 or PW 2) better reflects the true wear.

Figure 2. Bland-Altman plots (left) and scatter plots (right) with lines of equality for concurrent validity between the three strategies. PW 6: PolyWare using 6 follow-up radiographs; PW 1: PolyWare using only the final follow-up radiograph; PW 2: PolyWare using the postoperative and the final follow-up radiographs. In the Bland-Altman plots (left-hand panels): x-axis, average of the measurements of 2 strategies; y-axis, difference between the measurements of 2 strategies; red lines, 95% limits of agreement; dashed line, bias from 0; long solid green line, y = 0; dots, individual double measures. In the scatter plots (right-hand panels): maroon lines, lines of equality.

The accuracy and precision of clinical PE wear estimates depend on several variables, including patient factors (Schmalzried and Huk 2004), radiographic quality (Sychterz et al. 2001), assumptions of linear wear patterns (Yamaguchi et al. 1999), hip angulation (Collier et al. 2003, Foss et al. 2008), the wear analysis method used (Bragdon et al. 2006a), intraobserver variance (Engh, Jr. et al. 2002), and manufacturing tolerances of acetabular components (Hui et al. 2003). Plain AP radiographs used for wear analysis are not calibrated (position coordinates), and in retrospective studies radiographs are often not obtained according to a standardized protocol. Clinical positioning of patients carries a risk of slight changes in hip angulation between radiographic follow-ups, which has been shown experimentally to influence wear results (Collier et al. 2003, Foss et al. 2008). The greater the change in angulation between follow-ups, the larger the magnitude of wear measured (Collier et al. 2003). A plausible theoretical explanation is that the radiographic shadows of the components vary with angular displacement, so that the basis for automatic edge detection differs between follow-ups. Recently, a mathematical correction algorithm has been suggested to make 2-dimensional wear measurements in plain radiographs less sensitive to radiographic projection differences and to approximate the 3-dimensional "true" linear wear values obtained by RSA. Radiographic projection differences are difficult to control and may partly explain the differences we observed in the magnitude of measured PE wear between strategies using few and multiple radiographs. This observation further stresses the need for a strict protocol for patient positioning for standard hip radiographs. The radiographs in our study were all obtained according to the standards of one radiology department, but not according to a specified study protocol. Thus, the leg was not placed in a soft foam positioner or rotation-stabilized by a fixture, and this most likely affected the projection between radiographs obtained over a long follow-up period.
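The sensitivity of the component shadows to angulation can be illustrated with elementary projection geometry. Under parallel projection, a circular rim of diameter D whose plane normal makes an angle θ with the central beam casts an ellipse with major axis D and minor axis D·|cos θ|. The sketch ignores beam divergence away from the central ray, the 10-degree liner face, and the rim flare of the shell; the 54-mm diameter and the tilt angles are hypothetical.

```python
import math


def projected_rim_axes(diameter_mm, tilt_deg):
    """Major and minor axes (mm) of the shadow cast by a circular rim.

    tilt_deg is the angle between the normal of the rim plane and the central
    beam; parallel projection is assumed, so the minor axis is D * |cos(tilt)|.
    """
    minor = diameter_mm * abs(math.cos(math.radians(tilt_deg)))
    return diameter_mm, minor


# Hypothetical 54-mm shell: a 3-degree change in rotation between two films.
for tilt in (70.0, 73.0):
    major, minor = projected_rim_axes(54.0, tilt)
    print(f"tilt {tilt:.0f} deg: major {major:.1f} mm, minor {minor:.2f} mm")
```

In this example a 3-degree rotation changes the minor axis of the rim shadow by more than 2 mm, a change in edge geometry that the fitting must absorb from one film to the next.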
Despite such problems in estimating PE wear in clinical studies with plain radiographs, PolyWare has been validated for both research and clinical use, and its wear measurements have been reported to correlate well with measurements of true wear (Hui et al. 2003). Furthermore, digital wear analysis methods using plain radiographs are far more precise than the early manual methods (Livermore et al. 1990), although radiostereometric analysis is the most precise tool (Bragdon et al. 2002, von Schewelov et al. 2004). Repeatability (precision) of linear 3-dimensional femoral head penetration with PolyWare assessed with phantom images is reported to be between 0.10 mm (Collier et al. 2003) and 0.15 mm. We determined the intraobserver precision with double analysis (using the same images) and found that analysis of 6 radiographs resulted in a higher mean difference (0.08 mm) than analysis of 2 radiographs (0.05 mm) or 1 radiograph (0.02 mm). This is not surprising, because analysis of more radiographs would be expected to introduce more variance due to radiographic projection and quality.
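The expectation that more radiographs add more variance, and the suggestion that PW 6 sums positive contributions, can be explored with a small Monte Carlo sketch. It assumes independent Gaussian error (a hypothetical SD of 0.1 mm per coordinate) on each film's head-center offset and treats each interval as contributing the non-negative length of the head-center displacement. By the triangle inequality, a sum of interval lengths can never be smaller than the end-to-end length, so noise biases a PW 6-like estimate upward relative to a PW 2-like one. This does not reproduce PolyWare itself.

```python
import math
import random
import statistics


def simulate(true_offsets, sigma_mm, trials=2000, seed=1):
    """PW 6-like and PW 2-like estimates when each film's head-center offset
    carries independent Gaussian error (sigma_mm per coordinate)."""
    rng = random.Random(seed)
    pw6_like, pw2_like = [], []
    for _ in range(trials):
        noisy = [(x + rng.gauss(0.0, sigma_mm), y + rng.gauss(0.0, sigma_mm))
                 for x, y in true_offsets]
        # Sum of the lengths of the 5 interval displacements.
        pw6_like.append(sum(math.hypot(b[0] - a[0], b[1] - a[1])
                            for a, b in zip(noisy, noisy[1:])))
        # Length of the single postoperative-to-final displacement.
        pw2_like.append(math.hypot(noisy[-1][0] - noisy[0][0],
                                   noisy[-1][1] - noisy[0][1]))
    return pw6_like, pw2_like


# Hypothetical truth: 3.3 mm of penetration along one direction over 6 films.
truth = [(0.6 * s, 0.8 * s) for s in (0.0, 0.3, 0.5, 0.8, 1.8, 3.3)]
pw6_like, pw2_like = simulate(truth, sigma_mm=0.1)
for name, values in (("PW 6-like", pw6_like), ("PW 2-like", pw2_like)):
    print(f"{name}: mean {statistics.mean(values):.2f} mm, SD {statistics.stdev(values):.2f} mm")
```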
Wear analysis with PolyWare is based on a single wear vector and is likely to underestimate the true wear in vivo (Hui et al. 2003), which has been shown to occur multidirectionally (Yamaguchi et al. 1999). However, analysis of AP radiographs has been shown to provide a sufficient estimate of the major wear vector in THA (Sychterz et al. 1999b), and although a 2-dimensional technique on plain radiographs slightly underestimates wear (Hui et al. 2003), its repeatability is better than that of 3-dimensional techniques, which often rely on lateral radiographs of suboptimal quality (Sychterz et al. 2001). Wear may also be underestimated because the PE wear tract is not a tight cylinder around the femoral head (Devane et al. 1995a), and there is no guarantee that the head is located at the deepest point of the wear tract at the time of radiography (Devane et al. 1995a). Several studies have addressed the potential of weight-bearing versus supine radiographs for PE wear analysis and have concluded that the measured differences in PE wear between weight-bearing and non-weight-bearing radiographs are of no clinical relevance (Martell et al. 2000, Bragdon et al. 2006b, von Schewelov et al. 2006). Much attention has been given to the definition, calculation, and exclusion of an initial, delimited period of clinical follow-up based on theories of creep or bedding-in of the PE liner (Sychterz et al. 1999a, Glyn-Jones et al. 2008), but no consensus has been reached. Creep may depend on various factors, including acetabular component design, patient activity (frictional heating), and the type or quality of PE. "True in vivo wear" can be described in retrieval studies with coordinate-measuring machines (CMM); while this offers an accurate estimate of the articular wear, including creep, backside wear cannot be quantified (Hui et al. 2003).
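The underestimation inherent in a single AP projection follows from geometry: if a 3-dimensional wear vector of length L points φ degrees out of the film plane, an AP radiograph records only L·cos φ under parallel projection. The sketch below tabulates this for a hypothetical 2-mm vector; magnification and beam divergence are ignored, and the helper names are illustrative only.

```python
import math


def projected_length(wear_vector):
    """Length (mm) of a 3-D wear vector as seen on an AP film.

    Parallel projection along z (the AP beam) is assumed: only x and y survive.
    """
    x, y, _z = wear_vector
    return math.hypot(x, y)


def wear_vector(length_mm, out_of_plane_deg):
    """Hypothetical wear vector of a given length, tilted out of the film plane."""
    phi = math.radians(out_of_plane_deg)
    return (0.0, length_mm * math.cos(phi), length_mm * math.sin(phi))


for phi in (0.0, 10.0, 20.0, 30.0):
    seen = projected_length(wear_vector(2.0, phi))
    print(f"{phi:4.0f} deg out of plane: {seen:.3f} mm seen of 2.000 mm "
          f"({100 * (1 - seen / 2.0):.1f}% underestimate)")
```

The missing out-of-plane component is what a lateral radiograph would have to supply in a 3-dimensional analysis.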
It is thus problematic to correlate the defined "true in vivo wear" obtained by CMM with radiographic measurements of wear that include both articular and backside wear, and it becomes even more complicated when the first postoperative period is excluded because of theories of creep (Hui et al. 2003). In addition, the exclusion of a variable period of bedding-in (6 weeks to 2 years) in some but not all studies inevitably results in different magnitudes of reported wear and wear rates, even though efforts are made to calculate intercepts and the steady-state wear. Inter-study comparisons of PE wear are therefore difficult, and there is a need for a standardization guide for the presentation of PE wear results and their precision.
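How the handling of the bedding-in period changes a reported wear rate can be shown with an ordinary least-squares fit of penetration against time. The series below is hypothetical (0.4 mm of early penetration in the first year, then 0.2 mm/year) and uses this study's follow-up times. The slope after the cut-off is the steady-state rate, and the intercept estimates the early penetration beyond what the steady-state line alone would predict.

```python
def ols(times, values):
    """Ordinary least-squares slope and intercept of values on times."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mv - slope * mt


def steady_state_rate(times, penetration, exclude_before_years=0.0):
    """Slope (mm/year) and intercept (mm) fitted to follow-ups at or after the cut-off."""
    kept = [(t, p) for t, p in zip(times, penetration) if t >= exclude_before_years]
    if len(kept) < 2:
        raise ValueError("need at least two follow-ups after the cut-off")
    ts, ps = zip(*kept)
    return ols(ts, ps)


# Hypothetical series: 0.4 mm of bedding-in during year 1, then 0.2 mm/year.
years = [0.0, 0.25, 0.5, 1.0, 5.0, 12.0]
pen = [0.0, 0.2, 0.3, 0.4, 1.2, 2.6]

print("total / time:", pen[-1] / years[-1])
print("all points:", steady_state_rate(years, pen))
print("from 1 year:", steady_state_rate(years, pen, exclude_before_years=1.0))
```

The three summaries (total wear divided by time, a fit through all points, and a fit after excluding the first year) give three different rates from the same data.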
The radiographs used in our study were all printed films digitized for computerized wear analysis. Physical degradation and varying resolution may have influenced our wear analyses, because the first radiographs were obtained in 1990. The PW 2 and PW 6 strategies involved both old and new radiographs, whereas the PW 1 strategy was based on only one recent radiograph of potentially superior quality. Because we paid attention to radiographic quality, only the AP radiographs were selected, and they were all judged to be of good quality in terms of visible implant borders. In support of this, the automated digitizer system of the software (PolyWare) failed and had to be manually overruled only 3 times in 198 wear analyses. Direct digital radiographs may improve the precision of PolyWare in the future, but currently the software is recommended only for series with substantial wear, such as UHMWPE liners in long-term follow-up or populations of failed implants (Hui et al. 2003). Because of the random variation observed, large sample sizes are recommended.
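The recommendation of large samples can be made quantitative with the standard normal-approximation sample-size formula for comparing two group means, n = 2 × ((z(1 − α/2) + z(power)) × σ / Δ)² per group. Independent measurement error adds its variance to the true between-patient variance, which enlarges σ. All inputs are hypothetical except the measurement SD, which is derived from the ±1.2 mm LOA intra reported for PW 6 under the assumption of equal, independent error in each duplicate reading (single-reading SD = SD dif / √2).

```python
import math
from statistics import NormalDist


def n_per_group(sd_mm, difference_mm, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two mean wear values."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (k * sd_mm / difference_mm) ** 2)


def total_sd(between_patient_sd, measurement_sd):
    """Observed SD when independent measurement error adds to true between-patient variation."""
    return math.sqrt(between_patient_sd ** 2 + measurement_sd ** 2)


# Single-reading SD implied by a LOA intra of +/-1.2 mm (assumed equal, independent errors).
meas_sd = (1.2 / 1.96) / math.sqrt(2)

# Hypothetical: true between-patient SD 0.8 mm, difference to detect 0.5 mm.
print(n_per_group(0.8, 0.5), n_per_group(total_sd(0.8, meas_sd), 0.5))
```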
RSA is the most accurate tool for wear analysis and could be regarded as the gold standard for clinical wear analysis (Bragdon et al. 2002, von Schewelov et al. 2004). Unfortunately, we did not have the stereo radiographs needed to compare the wear results of the 3 plain-radiograph strategies with RSA. Studies of patient series with adequate long-term plain and stereo radiographic follow-up should address this matter. On the basis of our findings, the same number of radiographs per patient should be analyzed in clinical studies assessing PE wear on plain AP radiographs.
In conclusion, our results show that there are indeed limitations to comparing mean PE wear results based on analysis of different numbers of plain AP radiographs. Inter-study results of PE wear with PolyWare software using 2 or multiple serial radiographs correlate well and seem comparable. However, care should be taken when mixed strategies are used, and we do not advise comparing PE wear between groups assessed with an unequal number of available radiographs per patient.

KS, OR, and MS designed the study. MS, OR, and KS gathered the data, and MS, NTA, and KL analyzed the data. MS and OR wrote the initial draft, and KS, NTA, KL, OR, and MS revised it. MS, NTA, OR, and KL ensured the accuracy of the data and the analysis.