Research on prediction of contact stress of acetabular lining based on principal component analysis and support vector regression

Abstract
In the "worst-case" selection for hip prosthesis wear, the contact stress of the acetabular liner must be calculated. However, acetabular prostheses come in many combinations, and calculating each one individually is a large, repetitive and tedious workload. To solve this problem, a machine learning prediction method combining principal component analysis and support vector regression (PCA-SVR) was established. First, the finite element method is used to calculate the contact stress of the acetabular liner in typical combinations, forming a basic data set with the key dimensions of the acetabular prosthesis as input and the contact stress as output. Then, PCA reduces the dimension of the input to obtain a new data set. Finally, based on this new data set, SVR is used to establish the mapping model, and the optimal values of the model parameters C and γ are obtained by combining K-fold cross-validation with grid search. The maximum absolute error of the prediction on the test data set is only 0.1986, the root mean square error (RMSE) is only 0.09309, and the R² value is 0.9426, which verifies the effectiveness of the prediction model. The prediction performance is also compared with Ridge regression and Lasso models, which further verifies the superiority of the proposed method.


Introduction
Artificial hip replacement has achieved great success in treating bone diseases such as hip fractures, congenital acetabular dysplasia, and femoral head necrosis. Contact stress is an important topic in biomechanics and has been a focus of research since the birth of artificial joints. Maxian et al. [1] established a three-dimensional nonlinear contact finite element method for total hip arthroplasty and investigated the maximum contact stress distribution on the polyethylene liner under 16 discrete gaits. Wang et al. [2] applied the finite element method (FEM) to the contact mechanics of the oval bearing surface of a metal-on-metal (MOM) hip joint implant under standard and slightly lateralized conditions. Hua et al. [3] quantified the occurrence of edge loading/contact on the surface of the hip joint and evaluated the effect of cup angle and edge loading on the contact mechanics of modular metal-on-polyethylene (MoP) total hip replacement (THR). Cilingir [4] studied the contact mechanics and stress distribution of ceramic-on-ceramic (COC) hip resurfacing using finite elements. Many other scholars have also contributed to the study of contact stress. Contact stress is also an important factor affecting the level of acetabular liner wear. Robert Košak determined the stress distribution of THA through the HIPSTRESS method and pointed out that volumetric wear is related to peak contact stress [5]. Lizhang et al. [6] studied the effect of a metal femoral head on the wear of natural acetabular cartilage after hip replacement and found that the linear wear of cartilage increased with contact stress, sliding distance and speed. Jangid et al. [7] calculated the contact stress and sliding distance for different material combinations of artificial joints using finite element analysis, and used a modified Archard wear law to calculate the amount of wear.
The artificial hip joint prosthesis includes four components: an acetabular cup, an acetabular liner, a femoral head and a femoral stem. In the design and development of an acetabular prosthesis, the contact stress of the acetabular lining is an important indicator for selecting the "worst case" of acetabular lining wear. The acetabular cup, the acetabular lining and the femoral head come in different sizes and are assembled according to certain size relationships to form different acetabular prostheses; the contact stress therefore differs from one prosthesis to another. An artificial hip joint prosthesis designed by Company A contains 48 types of assembly relationships. Calculating the lining contact stress for each of them one by one requires a large amount of repetitive computation. However, no existing method, at home or abroad, can easily calculate the contact stress of the acetabular liner.
The purpose of this study is to overcome this difficulty and find a way to predict the contact stress of the acetabular liner from the external characteristics of each acetabular prosthesis. Machine learning is a research field that has re-emerged in recent years, and within it the support vector regression machine performs very well on small-sample, high-dimensional prediction problems [8][9][10][11]. This suggests a way to predict the contact stress of the acetabular liner. The most important ingredient in machine learning is historical data. Previous research has shown the finite element method to be effective for calculating contact stress. Therefore, we used the finite element method to calculate the contact stress of the 48 assemblies one by one, which itself took considerable time. Most of the data are used to train the prediction model, while the remaining small part is used to test its effectiveness.

Prediction process
The process used in this article to predict the contact stress of the acetabular lining is shown in Figure 1.

Finite element calculation
The finite element method [12] is an efficient and economical numerical simulation method that is widely used in various industries. Here, the finite element calculation prepares the data for the prediction model. The acetabular stress was calculated using the method in [13], and the constraints and loads for the simulation were selected according to [14]. The geometric model of the acetabular prosthesis, built in Unigraphics (UG) software, is shown in Figure 2. To improve calculation efficiency, the model was simplified: some insignificant details were deleted, and the femoral stem was cut off below the neck so that only a small part of the neck was kept. The simplified model is shown in Figure 3(a). A static analysis module was set up in Ansys. The material of the acetabular cup and the femoral stem was Ti6Al4V, the acetabular lining material was ultra-high molecular weight polyethylene, and the femoral head material was CoCrMo. The material parameters are shown in Table 1. The contact between the outer cup of the acetabulum and the lining is "bonded". The contact between the femoral head and the lining is frictional, with a friction coefficient of 0.038. The constraint between the femoral stem and the femoral head is "bonded". The mesh is hexahedral, and the mesh model is shown in Figure 3(b). A fixed restraint is applied to the neck; the acetabular outer cup is allowed to move vertically while its other degrees of freedom are constrained. A concentrated force of 1500 N is applied to the outer surface of the acetabular cup through the center of the sphere, with an abduction/adduction angle of −15° and a flexion/extension angle of −13° [14]. The constrained model is shown in Figure 3(c). The resulting lining contact stress for one group of acetabular prostheses is shown in Figure 3(d). The diameters of the femoral stem, the acetabular cup and the femoral head differ for people with different physiques.
To adapt to the eccentricity of different hip joints, the femoral head is available in "plus" and "minus" versions. Although the diameter of the femoral head may be the same, the remaining dimensions differ slightly. Because of this diversity in the geometric dimensions of each group, the contact state of each group of acetabular prostheses can differ, which influences the distribution of the contact stress. Using the same method, the contact stress of the other 47 groups of acetabular prostheses was calculated.

Support vector regression theory
Support vector regression [15] is a branch of the support vector machine [16] and has outstanding advantages in dealing with the "curse of dimensionality" and "overfitting". From the training vectors, a nonlinear relationship between the prediction vector and the support vectors is established, which is then used to forecast the prediction vector. Given a training set $\{(x_i, y_i)\}_{i=1}^{m}$, where $m$ is the number of samples, SVR maps the input space to a high-dimensional feature space through a kernel function and performs regression in that space:
$$f(x) = w^{T}\phi(x) + b \tag{1}$$
where $w$ is the weight coefficient, $\phi(x)$ is the mapping of $x$ into the feature space, and $b$ is the bias term.
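According to the principle of risk minimization, the objective function for the weight coefficient $w$ and the bias term $b$ can be established as follows [17]:
$$\min_{w,b}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\left|f(x_i) - y_i\right|_{\varepsilon} \tag{2}$$
where $\left|f(x_i) - y_i\right|_{\varepsilon}$ is the $\varepsilon$-insensitive loss function and $C$ is the penalty parameter. In order to minimize the squared Euclidean norm $\|w\|^{2}$ and control the fitting error simultaneously, the slack variables $\{\xi_i\}_{i=1}^{m}$ and $\{\xi_i^{*}\}_{i=1}^{m}$ are introduced. The optimization problem in (2) can then be transformed into a constrained minimization problem:
$$\begin{aligned}
\min_{w,b,\xi,\xi^{*}}\quad & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.}\quad & f(x_i) - y_i \le \varepsilon + \xi_i, \\
& y_i - f(x_i) \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\ \xi_i^{*} \ge 0,\quad i = 1, \dots, m
\end{aligned} \tag{3}$$
To derive the dual of the original problem (Eq. (3)), the Lagrange multipliers $\alpha_i$, $\alpha_i^{*}$, $\eta_i$, $\eta_i^{*}$ are introduced to build the Lagrange function of the original problem. Taking the partial derivatives of the Lagrange function with respect to $w$, $b$, $\xi_i$ and $\xi_i^{*}$, setting them to zero, and substituting the results back into the Lagrange function gives the dual problem:
$$\begin{aligned}
\max_{\alpha,\alpha^{*}}\quad & \sum_{i=1}^{m} y_i\left(\alpha_i^{*} - \alpha_i\right) - \varepsilon\sum_{i=1}^{m}\left(\alpha_i^{*} + \alpha_i\right) - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\alpha_i^{*} - \alpha_i\right)\left(\alpha_j^{*} - \alpha_j\right)K(x_i, x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{m}\left(\alpha_i^{*} - \alpha_i\right) = 0, \\
& 0 \le \alpha_i,\ \alpha_i^{*} \le C,\quad i = 1, \dots, m
\end{aligned} \tag{4}$$
The original problem is thus transformed into a quadratic programming problem. Solving it yields the support vector regression model:
$$f(x) = \sum_{i=1}^{m}\left(\alpha_i^{*} - \alpha_i\right)K(x_i, x) + b \tag{5}$$
where $K(x_i, x) = \phi(x_i)\cdot\phi(x)$ is the kernel function, $x_i$ is a training sample, and $x$ is the test sample.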
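The stationarity step in this derivation can be written out explicitly. The following is a sketch under the standard $\varepsilon$-insensitive formulation, assuming the constraints are written as $f(x_i) - y_i \le \varepsilon + \xi_i$ and $y_i - f(x_i) \le \varepsilon + \xi_i^{*}$ with slack variables $\xi_i, \xi_i^{*} \ge 0$; this sign convention is an assumption, and the opposite convention simply swaps the roles of $\alpha_i$ and $\alpha_i^{*}$.
$$\begin{aligned}
L ={}& \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{m}\eta_i\xi_i - \sum_{i=1}^{m}\eta_i^{*}\xi_i^{*} \\
& + \sum_{i=1}^{m}\alpha_i\left(f(x_i) - y_i - \varepsilon - \xi_i\right) + \sum_{i=1}^{m}\alpha_i^{*}\left(y_i - f(x_i) - \varepsilon - \xi_i^{*}\right)
\end{aligned}$$
Setting the partial derivatives to zero, with $f(x) = w^{T}\phi(x) + b$, gives:
$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 \ &\Rightarrow\ w = \sum_{i=1}^{m}\left(\alpha_i^{*} - \alpha_i\right)\phi(x_i), \\
\frac{\partial L}{\partial b} = 0 \ &\Rightarrow\ \sum_{i=1}^{m}\left(\alpha_i^{*} - \alpha_i\right) = 0, \\
\frac{\partial L}{\partial \xi_i} = 0 \ &\Rightarrow\ C = \alpha_i + \eta_i, \\
\frac{\partial L}{\partial \xi_i^{*}} = 0 \ &\Rightarrow\ C = \alpha_i^{*} + \eta_i^{*}.
\end{aligned}$$
Because $\eta_i, \eta_i^{*} \ge 0$, the last two conditions are what bound each multiplier to the interval $[0, C]$ in the dual problem, and the first condition is what turns $w^{T}\phi(x)$ into a weighted sum of kernel evaluations.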
In theory, any function that satisfies the Mercer condition [18] can be used as the kernel function. Four kinds of kernel functions are in common use: the linear kernel, the p-order polynomial kernel, the multi-layer perceptron kernel, and the Gaussian radial basis function (RBF) kernel. The RBF kernel is the most popular because of its good generalization and fast calculation speed. It can be expressed as:
$$K(x_i, x) = \exp\left(-\gamma\|x_i - x\|^{2}\right) \tag{6}$$
where $\gamma$ is the kernel parameter.
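To make the role of the kernel concrete, here is a minimal NumPy sketch of an RBF kernel and of a kernel-expansion prediction of the form "weighted sum of kernel values plus bias". The function names, the support vectors, the coefficients and the bias are all hypothetical values chosen for illustration; they are not taken from the trained model in this paper, and the kernel is assumed to have the form exp(-gamma * squared distance).

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Return the matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def svr_predict(x_new, support_x, dual_coef, bias, gamma):
    """Evaluate f(x) = sum_i dual_coef[i] * K(x_i, x) + bias for each row of x_new."""
    return rbf_kernel(x_new, support_x, gamma) @ dual_coef + bias


# Hypothetical support vectors and coefficients (illustration only).
support_x = np.array([[0.0, 0.0], [1.0, 0.0]])
dual_coef = np.array([0.5, -0.25])
pred = svr_predict(np.array([[0.0, 0.0]]), support_x, dual_coef, bias=1.0, gamma=0.01)
```

With a small gamma such as 0.01 the kernel decays slowly with distance, so distant training samples still influence the prediction; a large gamma makes the model more local.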

The establishment of acetabular liner contact stress prediction model
The 48 groups of acetabular prostheses are numbered from 1 to 48 according to the size of the acetabular cup and the femoral head.
The contact stress on the acetabular liner depends on the geometrical structure of the acetabular prosthesis. Different structures and sizes lead to different contact stresses in the acetabular lining. Therefore, the key dimensions of the acetabular prosthesis (e.g., acetabular ball radius, lining ball radius, femoral head radius, acetabular thickness, etc., 12 dimensions in total) are used as the feature data x₀ of the sample data. A schematic diagram of these features is shown in Figure 4. The sample features are shown in Figure 5.

Feature dimensionality reduction
Because the number of features is large (12 features) and the number of samples is small (48 samples), the feature vector needs to be reduced in dimension so that the hyperparameters can be learned adequately. Principal component analysis (PCA) is a commonly used linear dimensionality reduction method. Its basic idea is to map high-dimensional data to a low-dimensional space through linear projection. It does not simply discard part of the high-dimensional data; instead, the data are integrated and reconstructed in the low-dimensional space. After the low-dimensional features are obtained, the most influential components are retained, so that very little information is lost.
The reconstructed feature is denoted x. In our experiments, the prediction effect is best when the feature dimension is 3. The contribution rates of the three components are [0.9375, 0.03646, 0.01680], giving a total contribution rate of 99.07%, so very little of the information in the original features is lost. The features after dimensionality reduction no longer have their original physical meaning; the three feature components are x₁, x₂ and x₃, and are shown in Figure 6.
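The dimensionality reduction step can be sketched with a plain NumPy implementation of PCA via the singular value decomposition. This is only an illustration: the data below are synthetic stand-ins with the same shape as the sample set (48 samples, 12 features), the function name is hypothetical, and whether the original features were standardized before PCA is not stated in this paper, so the sketch only centers them.

```python
import numpy as np


def pca_fit_transform(features, n_components):
    """Project centered features onto the leading principal directions.

    Returns the low-dimensional scores and the contribution rate
    (explained variance ratio) of each retained component.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (features.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = centered @ vt[:n_components].T
    return scores, ratio[:n_components]


# Synthetic stand-in data: 12 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(48, 3))
mixing = rng.normal(size=(3, 12))
features = latent @ mixing + 0.01 * rng.normal(size=(48, 12))

scores, ratio = pca_fit_transform(features, n_components=3)
total_contribution = ratio.sum()  # fraction of variance kept by 3 components
```

The sum of the retained ratios plays the role of the total contribution rate quoted above: when a few components carry almost all of the variance, the remaining ones can be dropped with little loss of information.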
The finite element results serve as the label values of the sample data, denoted y, as shown in Figure 7.
We then trained the support vector regression machine, randomly selecting 80% of the 48 samples for training and using the remaining 20% as test samples. During training, the parameters C and γ of the support vector regression machine need to be optimized to obtain a high-precision prediction model. In this paper, K-fold cross-validation [19] was combined with the grid search method [20] to obtain the optimal values of C and γ.
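The combination of K-fold cross-validation and grid search can be sketched as follows. Several things here are assumptions made for illustration: the data are synthetic, the parameter grids and K = 5 are hypothetical (this paper does not state them), and, to keep the example dependent on NumPy only, an RBF kernel ridge regressor with ridge term 1/C is substituted for SVR. The search loop itself is the same whichever regressor is plugged in.

```python
import itertools

import numpy as np


def rbf_kernel(a, b, gamma):
    """Return K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def fit_predict(x_train, y_train, x_val, c, gamma):
    """Stand-in regressor: RBF kernel ridge with ridge term 1/c (not true SVR)."""
    k = rbf_kernel(x_train, x_train, gamma)
    alpha = np.linalg.solve(k + np.eye(len(x_train)) / c, y_train)
    return rbf_kernel(x_val, x_train, gamma) @ alpha


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def grid_search_cv(x, y, c_grid, gamma_grid, k=5, seed=0):
    """Return (mean validation RMSE, C, gamma) of the best grid point."""
    folds = k_fold_indices(len(x), k, np.random.default_rng(seed))
    best = (np.inf, None, None)
    for c, gamma in itertools.product(c_grid, gamma_grid):
        fold_rmse = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = fit_predict(x[train_idx], y[train_idx], x[val_idx], c, gamma)
            fold_rmse.append(np.sqrt(np.mean((pred - y[val_idx]) ** 2)))
        mean_rmse = float(np.mean(fold_rmse))
        if mean_rmse < best[0]:
            best = (mean_rmse, c, gamma)
    return best


# Synthetic training set: 38 samples (about 80% of 48) with 3 features.
rng = np.random.default_rng(1)
x = rng.normal(size=(38, 3))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1]

c_grid = [0.1, 1.0, 10.0, 100.0]
gamma_grid = [0.001, 0.01, 0.1, 1.0]
best_rmse, best_c, best_gamma = grid_search_cv(x, y, c_grid, gamma_grid)
```

Only the training portion is passed to the search, so the held-out test samples play no part in choosing the parameters.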

Acetabular contact stress prediction
Through parameter optimization, the optimal values C = 23.3 and γ = 0.01 are obtained. The test data are then input to the prediction model, and the prediction results are shown in Figure 8.
Note: The red line represents the value calculated by the finite element method, the blue line represents the predicted value, and the green line represents the absolute error between the two.
It can be seen from Figure 8 that the predictions closely follow the finite element values, with a maximum absolute error of only 0.1986.

Model performance evaluation
To evaluate the prediction model, we use two indices: the coefficient of determination (R²) and the root mean square error (RMSE) [21]. They are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \tag{7}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \tag{8}$$
where $y_i$ is the value calculated by the finite element method, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the sample mean, and $n$ is the number of samples. The smaller the RMSE and the larger the R² (1 is the best), the better the prediction. According to formulas (7) and (8), the RMSE of the prediction model is 0.09309 and the R² is 0.9426.
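The two evaluation indices are straightforward to compute from their usual definitions: RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses only the Python standard library; the function names and the four sample values are hypothetical and are not the test data of this paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference values and predictions (illustration only).
y_fe = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

error = rmse(y_fe, y_hat)
score = r_squared(y_fe, y_hat)
```

For these values the residual sum of squares is 0.10 and the total sum of squares is 5.0, so R² is 0.98 and the RMSE is the square root of 0.025.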
To verify the superiority of the model, its prediction results are compared with those of the Ridge regression and Lasso models. The comparison of prediction results is shown in Figure 9, and the comparison of evaluation indices is shown in Table 2. Figure 9 shows that the method proposed in this paper predicts the contact stress of the acetabular lining more accurately, with the smallest absolute error. Table 2 shows that the prediction capabilities of Lasso and Ridge regression are equivalent, while the proposed method has the smallest RMSE and an R² clearly larger than that of the other two models, which shows that the proposed method has higher prediction accuracy.
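For readers unfamiliar with the baselines, Ridge regression is a linear model with an L2 penalty on the weights and has a closed-form solution, sketched below in NumPy. This is an illustration on synthetic data, not the baseline configuration used in this paper: the penalty strength `alpha` and the function name are hypothetical. Lasso replaces the L2 penalty with an L1 penalty and has no closed form, so it is not shown.

```python
import numpy as np


def ridge_fit(x, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc on centered data,
    then recovers the intercept from the means.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Synthetic, exactly linear data: 48 samples with 3 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(48, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w + 3.0

w, b = ridge_fit(x, y, alpha=1e-8)       # almost no shrinkage
w_shrunk, _ = ridge_fit(x, y, alpha=100.0)  # strong shrinkage
```

Both baselines are linear in the input features, which is one plausible reason a kernel method can fit a nonlinear stress-geometry relationship more closely.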

Discussion
Support vector regression has been widely used for prediction in various industries, and many scholars have made contributions. To name a few, Karmy and Maldonado [22] applied the SVR method to sales forecasting in the travel retail industry and pointed out that SVR is superior to the ARIMA and Holt-Winters methods in forecasting performance. Lu et al. [23] used SVR to establish a damage degree prediction method for CFRP structures, providing a feasible way to predict the degree of damage in such structures. Xue et al. [24] proposed an AUKF-GA-SVR method, which effectively addressed the inaccuracy in predicting the remaining useful life (RUL) of lithium-ion batteries. These studies demonstrate the strong prediction performance of the SVR method. In this study, the PCA-SVR model established for acetabular prosthesis contact stress achieved the best accuracy among the models compared. This demonstrates the feasibility of predicting contact stress from the external structural characteristics of the acetabular prosthesis, which will also reduce the workload of R&D personnel and greatly speed up development.
However, the method in this paper also has a shortcoming: the training and test data are both derived from finite element calculations, because actual experimental data could not be obtained. Therefore, the model can only be used for screening "worst-case" acetabular prosthesis models and cannot be used to evaluate whether products are qualified. Actual experimental data will be included in the next research plan; with measured data, the cost of subsequent research and development of related products can be greatly reduced.

Conclusions
This paper presents a method for predicting the contact stress of the acetabular lining based on support vector regression. The prediction results on the test data show the effectiveness of the prediction model. The finite element results show that the contact stress of the acetabular lining decreases as the external dimension of the acetabular outer cup increases, which gives a general direction for selecting the "worst case" of wear. The RMSE of the prediction model is 0.09309 and the R² value is 0.9426. Compared with Lasso and Ridge regression, the proposed method shows significantly better prediction performance. The method can also be used for product improvement and design within the same product series, and provides guidance on the "worst case" selection of wear.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study was supported by the National Natural Science Foundation of China (NSFC, grant number 11362020).

Data availability
The data that support the findings reported in this study are available from the corresponding author upon reasonable request.