Prediction of remaining service life of pavement using an optimized support vector machine (case study of Semnan–Firuzkuh road)

ABSTRACT
Accurate prediction of the remaining service life (RSL) of pavement is essential for the design and construction of roads, mobility planning, transportation modeling and road management systems. However, the expensive measurement equipment and the interference with traffic flow during the tests are reported as the main challenges in assessing the RSL of pavement. To overcome these challenges, this paper presents a novel prediction model for the RSL of road pavement using support vector regression (SVR) optimized by a particle filter. In the proposed model, the temperature of the asphalt surface and the pavement thickness (including the asphalt, base and sub-base layers) are considered as inputs. For validation of the model, the results of heavy falling weight deflectometer (HWD) and ground-penetrating radar (GPR) tests on a 42-km section of the Semnan–Firuzkuh road, comprising 147 data points, were used. The results are compared with support vector machine (SVM), artificial neural network (ANN) and multi-layered perceptron (MLP) models and show the superiority of the proposed model, with a correlation coefficient of 95%.


Introduction
Estimating the requirements for maintenance, repair, rehabilitation and reconstruction of pavement is an essential part of designing and maintaining the pavement structure. Pavement design methods rely on a proper prediction of the future condition of the pavement structure in order to keep it in a permissible condition. The term 'remaining service life' (RSL) refers to the time it takes for the pavement to reach an unacceptable condition and require rehabilitation or reconstruction (Elkins, Thompson, Groerger, Visintine, & Rada, 2013).
Prediction of the RSL is a basic concept of pavement maintenance planning. Knowledge of the future condition of the pavement is key to decision-making in maintenance planning. Pavement optimization methods, in turn, need to predict changes in pavement condition over a defined period of time; these methods determine the essential actions during the maintenance cycle (Elkins et al., 2013). In the present study, a novel method is applied to predict the RSL. The basic information for building the RSL prediction model is derived from ground-penetrating radar (GPR) and heavy falling weight deflectometer (HWD) tests. In road improvement plans, the HWD is a suitable tool for evaluating the structural capacity of in-service pavement. Because it simulates traffic loads efficiently, many research institutes use this non-destructive test to assess pavement condition (Park & Kim, 2003). The HWD applies a load equivalent to an 80-kN axle load to the pavement surface over a 10-35-second period and measures the resulting deflection of the pavement surface by means of geophones (Technical and Soil Mechanics Laboratory [TSML], 2012). The deflection data are transferred to the Evaluation of Layer Moduli and Overlay Design software, which uses back-calculation to compute parameters such as the moduli of the pavement layers, the RSL and the thickness of the required overlay, measured through pavement deterioration models (Karballaeezadeh, Ghasemzadeh Tehrani, & Mohammadzadeh, 2017). The temperature of the asphalt surface is recorded automatically by the HWD device. Table 1 includes the characteristics of the HWD test.
Table 1. Characteristics of the HWD test
Frequency of falls of weights: 4 times
Number of geophones: 9
Geophone sequence (cm): 0-20-30-45-60-90-120-150-180
Sampling distance (m): 200
Sampling line: Semnan-Firuzkuh
CONTACT Shahaboddin Shamshirband shahaboddin.shamshirband@tdtu.edu.vn
The GPR device is another non-destructive device for assessing pavement layers. It measures the thickness of the layers as a continuous profile along the road by sending electromagnetic waves in the radio spectrum and receiving the reflected signals (TSML, 2012). Other uses of the GPR device include locating underground utilities and checking moisture and deep damage in the pavement layers (TSML, 2012). Table 2 includes the outcomes of the HWD and GPR tests for the Semnan-Firuzkuh road.
In Iran, one of the most common methods to determine the RSL of the pavement is to carry out the HWD test. In spite of numerous benefits, this test has two major disadvantages. The first disadvantage is the high price of equipment and the impossibility of equipping all road and transportation departments. The second disadvantage is interference in traffic flow during the test.
The method proposed by the authors has the necessary accuracy and overcomes the challenges listed for the HWD test. Therefore, it can be used as an alternative to the HWD test for RSL estimation.

An overview of the RSL models of pavement
RSL has been defined as the predicted time during which a pavement will perform acceptably, both functionally and structurally, with only routine maintenance (Gedafa, 2008). RSL is useful for rehabilitation programs, fund allocation and predicting long-term requirements. RSL assessment is essential for making optimum use of the structural capacity of existing pavements. Determination of the RSL helps in choosing maintenance strategies and using budgets optimally (Vepa, George, & Shekharan, 1996). Precise RSL models facilitate better budget allocation for pavement maintenance programs (Romanoschi & Metcalf, 2000). Determining the RSL of a pavement requires its actual characteristics, a definition of the unacceptable condition and a mechanism to predict deterioration. The information required to determine RSL is depicted in Figure 1 (Gedafa, 2008).
There are several methods to estimate the RSL of pavement. These methods are divided into two general groups (Hall, Correa, Carpenter, & Elliot, 2001; Yu, Chou, & Yau, 2008): mechanical and empirical (semi-empirical) methods.
Mechanical methods use either destructive or non-destructive tests to determine the strength characteristics of existing pavements through empirical equations or physical laws. The RSL is then calculated from the predicted traffic and the determined strength.
In destructive tests, the pavement must be sampled, and this sampling damages the pavement. Non-destructive tests are based on deflections measured at the pavement surface (Yu, 2005).
In the empirical method, the RSL is estimated from observed historical data together with other conditions and project characteristics. The effects of the major parameters may be predicted either directly or indirectly (Yu, 2005). Table 3 compares the empirical and mechanical approaches and shows their advantages and disadvantages.
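As a minimal sketch of the last step of the mechanical approach (turning the determined strength into years of service), the function below assumes the strength has already been expressed as an allowable number of standard axle repetitions, and that annual traffic grows at a constant compound rate. The function name, its parameters and the growth assumption are hypothetical illustrations, not part of the cited methods.

```python
import math

def rsl_years(n_allowable, n_applied, annual_esal, growth_rate=0.0):
    """Years until cumulative traffic uses up the remaining allowable repetitions.

    Hypothetical helper. Assumes annual_esal standard axles in the first year,
    growing at a constant compound rate growth_rate per year.
    """
    remaining = max(n_allowable - n_applied, 0.0)
    if growth_rate == 0.0:
        return remaining / annual_esal
    # Cumulative traffic after t years: annual_esal * ((1 + g)**t - 1) / g;
    # solving that expression = remaining for t gives the closed form below.
    return math.log(1.0 + growth_rate * remaining / annual_esal) / math.log(1.0 + growth_rate)
```

For example, `rsl_years(2e6, 1e6, 1e5, 0.05)` gives about 8.3 years, against 10 years without traffic growth.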
The methods discussed below were developed by pavement engineering associations.
A graphical procedure for calculating the RSL was developed using the effective thickness of the pavement obtained from non-destructive deflection testing (George, 1989).
Mamlouk et al. calculated the RSL in Arizona using a fatigue model, by evaluating the rate of crack progression (Mamlouk, Zaniewski, Houston, & Houston, 1990).
Werkmeister and Alabaster (2007) developed RSL models based on falling weight deflectometer (FWD) results. Santha et al. developed a mechanistic prediction model to compute RSL (Santha, Yang, & Lytton, 1990). Furthermore, Ferregut et al. applied artificial neural networks (ANN) to develop algorithms that combine the functional condition of the pavement (i.e. percentage of cracking or rut depth) with simple remaining-life algorithms to estimate the RSL (Ferregut, Abdallah, Melchor, & Nazarian, 1999). Zaghloul and Elfino used expected traffic and back-calculated layer moduli to predict the RSL (Zaghloul & Elfino, 2000). Gedafa suggested sigmoidal models for estimating RSL based on the central deflection from a rolling wheel deflectometer (RWD) or FWD (Gedafa, 2008). Approaches to predicting pavement condition can generally be classified as deterministic, probabilistic and other approaches (Balla, 2010). Deterministic regression is probably the most widely used method for estimating pavement condition. It is normally expressed as a regression equation with a condition index as the dependent variable and the age and type of pavement as independent variables (Balla, 2010).
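A minimal sketch of the deterministic regression approach follows. It assumes a linear model with pavement age and a 0/1 pavement-type indicator as the independent variables. The linear form, the type coding, the terminal condition value and all numbers are hypothetical. Once the model is fitted, solving for the age at which the condition index reaches a terminal value and subtracting the current age gives an RSL estimate.

```python
import numpy as np

def fit_condition_model(age, is_type_b, condition):
    """Least-squares fit of condition = b0 + b1 * age + b2 * is_type_b."""
    X = np.column_stack([np.ones(len(age)), age, is_type_b])
    coef, *_ = np.linalg.lstsq(X, np.asarray(condition, dtype=float), rcond=None)
    return coef

def rsl_from_condition_model(coef, current_age, is_type_b, terminal_condition):
    """Years until the fitted condition index falls to the terminal value."""
    b0, b1, b2 = coef
    age_at_terminal = (terminal_condition - b0 - b2 * is_type_b) / b1
    return age_at_terminal - current_age

# Hypothetical condition-index observations for two pavement types
age = np.array([0, 2, 4, 6, 0, 2, 4, 6], dtype=float)
is_type_b = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
condition = 100.0 - 5.0 * age - 8.0 * is_type_b
coef = fit_condition_model(age, is_type_b, condition)
rsl = rsl_from_condition_model(coef, current_age=5.0, is_type_b=0.0, terminal_condition=40.0)
```

With these synthetic data the fit recovers the generating coefficients. A type-A section aged 5 years then reaches the terminal value of 40 at age 12, leaving an RSL of 7 years.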
According to Lytton (1987), probabilistic methods estimate pavement condition with a certain probability, and they often result in a probability distribution. The best-known model for predicting RSL is survival time analysis, which is considered a probabilistic model. Winfrey and Farrell (1941) used this model to calculate the RSL of pavements in the early 1940s. Survival curves covering the period 1903 to 1937 were developed for 46 states with the help of the life table procedure. The distribution of survival times was divided into a number of equal intervals, e.g. 1 year or half a year. For each interval, three mileages were counted: the mileage of pavement sections in service at the beginning of the interval, the mileage of sections retired from service by the end of the interval and the mileage of sections that were lost. The probability of survival for an interval is computed by dividing the remaining mileage by the total mileage that entered the interval. The survival curve is drawn by plotting these probabilities against the time intervals in chronological order (Winfrey & Farrell, 1941). The RSL can be predicted by extrapolating the survival curve to zero percent survival. The life table approach is commonly used for the analysis of RSL (Winfrey, 1967).
Table 3. Approaches to measuring RSL (Yu, 2005)
Empirical approach, advantages:
• If historical data are available, this approach is cheaper than the alternative approach.
• The effects of the influencing parameters can be predicted.
• It is fairly simple to carry out and to integrate with pavement management systems.
Empirical approach, disadvantages:
• Sufficient historical data are needed.
• The accuracy of estimation depends strongly on data quality and model format; comprehensive experience and field knowledge are needed to specify the format.

Huang's comprehensive models
The most prominent deterministic models for determining the RSL of flexible pavement are the equations offered by Huang. He offered two equations to calculate the RSL of pavement, based on the fatigue and rutting criteria (Huang, 2004):
$$N_f = f_1\,(\varepsilon_t)^{-f_2}\,(E_1)^{-f_3} \qquad (1)$$
where $N_f$ is the maximum number of load repetitions for which fatigue cracking does not occur in the pavement, $\varepsilon_t$ is the tensile strain at the bottom of the asphalt layer, $E_1$ is the elastic modulus of the asphalt layer and $f_1$, $f_2$ and $f_3$ are constant coefficients obtained from fatigue tests in the laboratory or in the field;
$$N_d = f_4\,(\varepsilon_c)^{-f_5} \qquad (2)$$
where $N_d$ is the maximum number of load repetitions that limits rutting, $\varepsilon_c$ is the compressive strain at the top of the subgrade and $f_4$ and $f_5$ are coefficients obtained from loading experiments. The coefficients of Equations (1) and (2) have been determined by various institutions (Table 4).
Table 4. Fatigue cracking and rutting model parameters (Huang, 1993).
Das and Pandey reported a mechanistic design model, developed by correlating performance data from bituminous pavements on various roads in India with the critical stress-strain factors leading to pavement failure. The model was developed for axle loading as given below (Das & Pandey, 1999): where $N_f$ is the cumulative number of standard axle repetitions needed to produce 25% surface cracking due to fatigue on existing … (Hossain & Wu, 2002): where $N_f$ is the RSL of the pavement, $\varepsilon_r$ is the horizontal tensile strain under the asphalt layer, $E_{AC}$ is the asphalt layer modulus and $a$, $b$ and $c$ are constant regression coefficients. Equation (4) has the same basis as Equation (1) (the inputs of the two models are the same); the difference between Equations (4) and (1) is their mathematical form: Equation (1) uses a power function, whereas Equation (4) uses a natural logarithm function.
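Huang's Equations (1) and (2) translate directly into code. The coefficients below are the Asphalt Institute values as they are commonly quoted, with strain in in./in. and $E_1$ in psi. They are an assumption and should be checked against Table 4 before use. The pavement responses are hypothetical, and in practice they would come from a layered-elastic analysis.

```python
def huang_fatigue_repetitions(eps_t, e1, f1, f2, f3):
    """Equation (1): allowable load repetitions before fatigue cracking."""
    return f1 * eps_t ** (-f2) * e1 ** (-f3)

def huang_rutting_repetitions(eps_c, f4, f5):
    """Equation (2): allowable load repetitions limited by rutting."""
    return f4 * eps_c ** (-f5)

# Asphalt Institute coefficients as commonly quoted (assumption; compare with Table 4)
AI_FATIGUE = dict(f1=0.0796, f2=3.291, f3=0.854)
AI_RUTTING = dict(f4=1.365e-9, f5=4.477)

# Hypothetical pavement response: tensile strain, asphalt modulus (psi), subgrade strain
eps_t, e1, eps_c = 2.0e-4, 400_000.0, 3.0e-4
n_f = huang_fatigue_repetitions(eps_t, e1, **AI_FATIGUE)
n_d = huang_rutting_repetitions(eps_c, **AI_RUTTING)
governing = min(n_f, n_d)  # the more critical criterion controls the remaining life
```

With these inputs, fatigue governs at roughly 2 million repetitions, compared with about 8 million for rutting. Dividing the remaining repetitions by the expected annual traffic would then give an RSL in years.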
Park and Kim presented a model, given in Equation (5), by assessing FWD test data (Park & Kim, 2003):
$$N_f = K\left(\frac{1}{\varepsilon_t}\right)^{C} \qquad (5)$$
where $N_f$ is the number of repetitions of the standard axle needed to cause fatigue failure, $\varepsilon_t$ is the tensile strain at the bottom of the asphalt layer and $K$ and $C$ are regression coefficients. This model is similar to Huang's comprehensive model except that $E_1$ has been removed from it.
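Because $K$ and $C$ in Equation (5) are regression coefficients, one common way to estimate them is ordinary least squares after taking logarithms, since $\ln N_f = \ln K + C \ln(1/\varepsilon_t)$. That fitting choice is an assumption, not necessarily the procedure used by Park and Kim. The strain and repetition values below are synthetic.

```python
import numpy as np

def fit_park_kim(eps_t, n_f):
    """Fit N_f = K * (1 / eps_t)**C by least squares in log-log space."""
    x = np.log(1.0 / np.asarray(eps_t, dtype=float))
    y = np.log(np.asarray(n_f, dtype=float))
    c, log_k = np.polyfit(x, y, 1)  # slope is C, intercept is ln K
    return float(np.exp(log_k)), float(c)

# Synthetic data generated with K = 2.0 and C = 3.0
strains = np.array([1e-4, 2e-4, 4e-4, 8e-4])
n_obs = 2.0 * (1.0 / strains) ** 3.0
k, c = fit_park_kim(strains, n_obs)
```

On noise-free synthetic data the fit returns K = 2.0 and C = 3.0. With field data, the log transform gives each observation equal relative weight.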

Other models
Some models presented by researchers for determining RSL differ from Huang's comprehensive models (Equations (1) and (2)). They use other indices as inputs, and their mathematical form also differs from that of Huang's models.
In 1986, Smith used the S-shaped curve technique and the PCI (pavement condition index) to model the RSL of the pavement in his PhD thesis (Smith, 1986): where the 'age' is the RSL of pavement and α, β and ρ are fixed coefficients that relate to the curve and pavement conditions.
Turki and Adnan presented a model based on the international roughness index (IRI) and the 'current age' of the pavement, given in Equation (7) (Turki & Adnan, 2003): where $IRI_{terminal}$ is the terminal IRI of the pavement (mm/m or m/km), 'current age' is the age of the pavement section since original construction or the last overlay (years), $a$ is the initial IRI (at age zero) and $b$ is the curvature of the performance line. Mofreh Saleh presented a model for determining the RSL based on the pavement surface curvature (δ) and AUPP (area under pavement profile) parameters, as shown by Equations (8) and (9) (Saleh, 2016): where $N_f$ is the number of axle load repetitions to fatigue failure, α and β are material constants and δ is the pavement surface curvature coefficient obtained from the FWD deflections ($D_0 - D_{200}$). Equations (8) and (9) have the same basis as Equation (5), except that in Equations (8) and (9) $\varepsilon_t$ is replaced by the deflection-based parameters (δ and AUPP) from Saleh's research.
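As a small illustration of where δ comes from, the sketch below reads it off a deflection bowl recorded with the geophone layout listed for the HWD test in Table 1. It assumes that $D_{200}$ denotes the deflection 200 mm (20 cm) from the load centre and that all deflections share one unit. The helper name and the deflection values are hypothetical, and Saleh's fatigue relation itself is not reproduced here.

```python
# Geophone offsets (cm) of the HWD test listed in Table 1
OFFSETS_CM = (0, 20, 30, 45, 60, 90, 120, 150, 180)

def surface_curvature(deflections):
    """Return delta = D0 - D200: deflection under the load minus the deflection
    at the geophone 200 mm (20 cm) from the load centre (hypothetical helper)."""
    if len(deflections) != len(OFFSETS_CM):
        raise ValueError("expected one deflection per geophone")
    bowl = dict(zip(OFFSETS_CM, deflections))
    return bowl[0] - bowl[20]

# Hypothetical deflection bowl in micrometres
bowl_um = [520, 410, 360, 300, 245, 170, 120, 90, 70]
delta = surface_curvature(bowl_um)
```

For this bowl, δ = 520 − 410 = 110 µm.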

Support vector regression (SVR) and particle filter
A supervised learning method such as the SVM may be used for classification and regression problems. The SVM is based on the structural risk minimization principle (SRMP) and shows good generalization ability, which overcomes the deficiencies of the traditional ANN algorithm; the ANN relies on empirical risk minimization in modelling a given variable (Faizollahzadeh Ardabili et al., 2018). In its basic form, the SVM is a linear classifier that tries to select the most reliable separating line (hyperplane) for the dataset. For real-valued (non-binary) outputs, support vector regression (SVR), a generalization of the binary classifier, can be used. In this study, we have tried to solve the difficulty of parameter setting in SVR.
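The basic function of SVR is to minimize Equation (10) (Smola & Schölkopf, 2004):
$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \quad \text{s.t.} \quad y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i,\quad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*,\quad \xi_i,\ \xi_i^* \ge 0 \qquad (10)$$
where the ε and C parameters are explained later, together with the other SVR parameters; $w$ is the weight vector, $b$ is the bias, $(x_i, y_i)$ are the training samples and $\xi_i$, $\xi_i^*$ are slack variables for deviations larger than ε. The particle filter is a sampling-based (sequential Monte Carlo) state estimator for systems whose state and measurement equations are affected by noise, as shown in Equation (11). The distribution of the state is represented by a set of particles and their weights (Carpenter, Clifford, & Fearnhead, 1999):
$$x_k = f_k\left(x_{k-1}, u_{k-1}, w_{k-1}\right), \qquad y_k = h_k\left(x_k, u_k, v_k\right) \qquad (11)$$
where $x_k$ represents the system state, $y_k$ is the output, $f_k$ is the process function, $h_k$ is the measurement function, $u_k$ is the input and $w_k$ and $v_k$ are the noises that affect the equations.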
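The predict, weight and resample cycle of Equation (11) can be shown with a minimal bootstrap particle filter. The scalar random-walk model ($f_k$: add Gaussian process noise; $h_k$: add Gaussian measurement noise) and all noise levels are illustrative assumptions. In the SVR-PF method described in this paper, the particle state is instead a vector of SVR parameters.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles=500, process_std=0.1, meas_std=0.5, seed=0):
    """Estimate a scalar random-walk state x_k from noisy measurements y_k.

    Assumed model: x_k = x_{k-1} + w_k,  y_k = x_k + v_k,
    with w_k ~ N(0, process_std^2) and v_k ~ N(0, meas_std^2).
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # initial particle cloud
    estimates = []
    for y in ys:
        # Predict: propagate every particle through the process function f_k
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Update: weight each particle by the likelihood of the measurement (h_k)
        weights = np.exp(-0.5 * ((y - particles) / meas_std) ** 2)
        weights /= weights.sum()
        estimates.append(float(np.sum(weights * particles)))  # weighted-mean estimate
        # Resample (multinomial) so that high-weight particles are duplicated
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)
```

Resampling at every step is the simplest choice. Many implementations resample only when the effective sample size drops below a threshold.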

The proposed method
The method proposed in this paper produces a model to estimate the remaining service life of pavement. The output of the model is therefore the 'remaining service life of pavement' (years). The inputs of the model are the 'pavement thickness' (mm), including the asphalt, base and sub-base layers, and the 'temperature of the asphalt surface' (°C). In view of the strengths and weaknesses of SVR mentioned earlier, a particle filter is used to optimize the SVR parameters, selecting the best values based on the test error instead of setting them manually.
The performance of SVR depends on its parameters; the most important ones are listed below with brief explanations. These parameters largely determine the efficiency of the method, and in the proposed approach they are estimated by the particle filter.
• C parameter (trade-off between the training error and the complexity of the model [Insom et al., 2015]);
• epsilon parameter (width of the ε-insensitive zone of the loss function, which sets the accuracy of approximation);
• kernel function and kernel scale parameter (mapping the nonlinear dataset into a space where a linear fit is possible).
Figure 2 illustrates the data cycle of the proposed method.
The proposed method selects the SVR parameters based on the weights of the particles in the particle filter. An iterative loop is formed in which the correct (true) values serve as the observations for each particle. The data are first normalized between 0 and 1, and 80% of the data are randomly selected to train the model while the rest are used for testing. After the particles are initialized (at zero), the outputs are predicted in each iteration using the same data and compared with the previous results to update the particle weights. The target parameters are then updated by drawing a set of samples from the probability distribution defined by the estimated weights. The SVM model is trained with these parameters, and the most appropriate parameter set is selected by finding the minimum error (compared with the previous result). For each particle, this predict-and-update sequence continues until the best result is obtained. Finally, the SVM regression model is built with the final parameters and evaluated on the training and test data.
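The loop described above can be sketched with scikit-learn's `SVR`. This is one possible reading, not the authors' implementation. In this sketch, each particle holds log10 values of C, epsilon and kernel scale, and weights are derived from the test-set MSE. Resampling plus Gaussian jitter serves as the prediction step. The initial parameter ranges, jitter size, particle count, iteration count and the RBF kernel are all hypothetical choices. Following the text, the parameters are selected on the test error. A separate validation split would give a less optimistic error estimate.

```python
import numpy as np
from sklearn.svm import SVR

def minmax(a):
    """Scale each column of a 2-D array to the range [0, 1]."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / np.where(hi > lo, hi - lo, 1.0)

def svr_pf_search(X, y, n_particles=30, n_iter=10, seed=0):
    """Particle-filter-style search over SVR parameters (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = minmax(np.asarray(X, dtype=float))
    y = minmax(np.asarray(y, dtype=float).reshape(-1, 1)).ravel()
    idx = rng.permutation(len(y))
    n_train = int(0.8 * len(y))  # 80% training, 20% test
    tr, te = idx[:n_train], idx[n_train:]

    def test_mse(p):
        c, eps, scale = 10.0 ** p  # particles live in log10 space
        model = SVR(kernel="rbf", C=c, epsilon=eps, gamma=1.0 / scale**2)
        model.fit(X[tr], y[tr])
        return float(np.mean((model.predict(X[te]) - y[te]) ** 2))

    # Hypothetical initial ranges for log10(C), log10(epsilon), log10(kernel scale)
    particles = rng.uniform([-2.0, -3.0, -2.0], [2.0, -0.5, 1.0], size=(n_particles, 3))
    best_p, best_err = particles[0].copy(), np.inf
    for _ in range(n_iter):
        errors = np.array([test_mse(p) for p in particles])
        if errors.min() < best_err:
            best_err, best_p = errors.min(), particles[errors.argmin()].copy()
        # Update step: particles with lower test error receive higher weights
        weights = np.exp(-(errors - errors.min()) / (errors.std() + 1e-12))
        weights /= weights.sum()
        # Resample according to the weights, then jitter (prediction step)
        particles = particles[rng.choice(n_particles, size=n_particles, p=weights)]
        particles = particles + rng.normal(0.0, 0.1, size=particles.shape)
    c, eps, scale = 10.0 ** best_p
    return {"C": c, "epsilon": eps, "kernel_scale": scale, "test_mse": best_err}
```

Searching in log space keeps the jitter proportional to each parameter's magnitude, which suits C and the kernel scale because they span several orders of magnitude.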
The numerical values obtained by the proposed method, reported as the best parameters found by the algorithm, are kernel scale = 0.1543, epsilon = 0.1067 and box constraint (C) = 0.5706.
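The reported values can be plugged into scikit-learn's `SVR` under two assumptions. First, the kernel is Gaussian (RBF), which the text does not state. Second, "kernel scale" and "box constraint" follow the MATLAB convention, where the predictors are divided by the kernel scale s before the kernel is applied. Under that convention, the equivalent scikit-learn `gamma` is 1/s², about 42 here. The helper name is hypothetical.

```python
from sklearn.svm import SVR

# Best values reported for the proposed method
KERNEL_SCALE = 0.1543
EPSILON = 0.1067
BOX_CONSTRAINT = 0.5706

def kernel_scale_to_gamma(scale):
    """Convert a MATLAB-style kernel scale s to scikit-learn's RBF gamma.

    Dividing the predictors by s before applying exp(-||x - z||^2) is the
    same as exp(-gamma * ||x - z||^2) with gamma = 1 / s^2.
    """
    return 1.0 / scale**2

final_model = SVR(
    kernel="rbf",  # assumption: the kernel type is not stated in the text
    C=BOX_CONSTRAINT,
    epsilon=EPSILON,
    gamma=kernel_scale_to_gamma(KERNEL_SCALE),
)
# final_model.fit(X_train, y_train) would then be called on [0, 1]-normalized data.
```

Because epsilon is expressed in the units of the normalized output, it only carries over if the outputs are scaled to 0-1 as described.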

Pavement RSL modelling results
This research focuses on optimizing the performance of SVR using a particle filter, a method referred to as SVR-PF. After normalization of the data, 80% of the data are used for training and 20% for testing. Figure 3 shows the results for the total, training and test data, indicating the degree of agreement between the estimated and actual values. Comparing the predicted outputs with the actual values of the test data gives a correlation coefficient of 95%, indicating that the optimized SVM performs well in estimation.
Figure 4 shows the R index, which measures the agreement between the outputs of the method and the actual values; on the test data, R equals 95%. This index, represented by R, is known as the 'correlation coefficient' and measures the strength of a linear relationship.
This criterion distinguishes four cases:
(a) R = 1 (complete positive correlation);
(b) 0 < R < 1 (incomplete positive correlation);
(c) R = −1 (complete negative correlation);
(d) −1 < R < 0 (incomplete negative correlation).
The sign indicates the direction of the relationship. No threshold for a suitable value of R can be specified, but a higher value of R indicates a better correlation. This index can be defined in accordance with Equation (12) (Mohammadzadeh, Bolouri, & Alavi, 2014):
$$R = \frac{\sum_{i=1}^{n}(h_i - \bar{h})(t_i - \bar{t})}{\sqrt{\sum_{i=1}^{n}(h_i - \bar{h})^2\,\sum_{i=1}^{n}(t_i - \bar{t})^2}} \qquad (12)$$
Mean squared error (MSE) and root mean square error (RMSE) are other indexes that quantify the difference between the real and predicted values (Equation (13)) (Mohammadzadeh et al., 2014):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(h_i - t_i)^2 \qquad (13)$$
where $h_i$ and $t_i$ are, respectively, the experimental and calculated output values for the ith output, $\bar{h}$ and $\bar{t}$ are the averages of the experimental and calculated outputs and $n$ is the number of samples (Mohammadzadeh et al., 2014). RMSE is the square root of the MSE and can be calculated according to Equation (14):
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \qquad (14)$$
The evaluation metric called Nash-Sutcliffe model efficiency (NSE) is obtained by dividing the MSE by the variance of the observations and subtracting that ratio from 1.0 (Gupta, Kling, Yilmaz, & Martinez, 2009). NSE can be calculated by Equation (15):
$$\mathrm{NSE} = 1 - \frac{\mathrm{MSE}}{\sigma^2} \qquad (15)$$
where σ is the standard deviation of the observed values (Gupta et al., 2009).
The predicted and real values are compared in Figure 5 to show the difference between them; Figure 5 plots the values on the vertical axis against the sample number on the horizontal axis. Note that the values are normalized to the range 0-1 and can be converted back to real values for practical applications. The same data were also modelled with other data-mining methods, such as MLP neural networks and SVM; the resulting correlation coefficients and mean squared errors, which indicate the accuracy of each estimate, are given in Table 5.
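The four evaluation indexes of Equations (12) to (15) can be computed together, as in the minimal NumPy sketch below. It uses h for the experimental (observed) values and t for the calculated (predicted) values, as in the text. It assumes the population variance (dividing by n) for σ² in the NSE, consistent with the 1/n in the MSE. The function name is hypothetical.

```python
import numpy as np

def evaluation_metrics(h, t):
    """R, MSE, RMSE and NSE for observed values h and predicted values t."""
    h, t = np.asarray(h, dtype=float), np.asarray(t, dtype=float)
    dh, dt = h - h.mean(), t - t.mean()
    r = np.sum(dh * dt) / np.sqrt(np.sum(dh**2) * np.sum(dt**2))  # Equation (12)
    mse = np.mean((h - t) ** 2)                                    # Equation (13)
    rmse = np.sqrt(mse)                                            # Equation (14)
    nse = 1.0 - mse / np.var(h)                                    # Equation (15), population variance
    return {"R": float(r), "MSE": float(mse), "RMSE": float(rmse), "NSE": float(nse)}
```

For example, observed values 1, 2, 3, 4 and predictions 1.1, 1.9, 3.2, 3.8 give MSE = 0.025 and NSE = 0.98. Note that R alone can be misleading: predictions of 2, 4, 6 for observations 1, 2, 3 give R = 1 despite being twice too large. This is why the error-based indexes are reported alongside R.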
Note that these values refer to the evaluation (test) results.

Conclusion
Given the weaknesses of the HWD test mentioned above, the authors of this study sought an alternative method to this test. The proposed method optimizes one of the most widely used artificial intelligence methods, the SVM, by means of a particle filter to overcome its weaknesses. Using the thickness of the pavement layers (asphalt, base and sub-base) and the temperature of the asphalt surface, it predicts the RSL of the pavement in years. Comparing the RSL predicted by the proposed method with the actual RSL values from the non-destructive HWD test yielded a correlation coefficient of over 95%, confirming the validity of the method. With weather information available for each area, and with information on the thickness of the pavement layers, which can be obtained in a variety of ways (for example with a GPR device), it is possible to estimate the RSL of existing, in-service pavements. Given the high accuracy of the proposed method, the authors suggest that road administrations and organizations use it instead of the HWD test to reduce costs significantly, eliminate traffic disruption and determine the RSL quickly.