An intelligent approach for predicting the strength of geosynthetic-reinforced subgrade soil

ABSTRACT
In recent times, the use of geosynthetic-reinforced soil (GRS) technology has become popular for constructing safe and sustainable pavement structures. The strength of the subgrade soil is routinely assessed in terms of its California bearing ratio (CBR). However, in the past, no effort was made to develop a method for evaluating the CBR of reinforced subgrade soil. The main aim of this paper is to explore and appraise the competency of several intelligent models, namely artificial neural network (ANN), least median of squares regression, Gaussian processes regression, elastic net regularisation regression, lazy K-star, M-5 model trees, alternating model trees and random forest, in estimating the CBR of reinforced soil. For this, all the models were calibrated and validated using reliable and pertinent historical data. The predictive accuracy of all these tools was assessed using well-established traditional statistical indices, an external model evaluation technique, a multi-criteria assessment approach and an independent experimental dataset. Owing to the overall excellent performance of the ANN, the model was converted into a trackable functional relationship to estimate the CBR of reinforced soil. Finally, a sensitivity analysis was performed to find the strength of the relationship between each input parameter and the CBR value.


Introduction
Pavement design is highly influenced by the load-carrying capacity of the subgrade soil. Geosynthetic reinforcement provides a sustainable and cost-effective way of improving the subgrade soil. The use of geosynthetics reduces the deformation of the pavement caused by vehicular loads, enhancing its strength and durability (Shukla 2002). Since the late 1900s, many studies have investigated the beneficial effects of geosynthetic reinforcement in road construction projects (e.g. Miura et al. 1990, Perkins 1999, Cuelho et al. 2005, Abu-Farsakh et al. 2016, Chen et al. 2018). The California bearing ratio (CBR) is considered one of the key parameters used to ascertain the capacity of the subgrade to withstand the applied traffic loads (ASTM 2016). To date, many researchers have employed CBR testing to investigate the effects of geosynthetic reinforcement on subgrade material (Duncan-Williams and Attoh-Okine 2008, Naeini and Ziaie-Moayed 2009, Nair and Latha 2011, Choudhary et al. 2012, Rajesh et al. 2016, Mittal and Shukla 2018, Negi and Singh 2019). The advent of soft computing and data-driven modelling has made many traditional approaches antiquated. In recent times, the use of artificial intelligence (AI)/machine learning (ML) techniques has become very common in solving various complex engineering problems, including those related to pavement engineering (Nazemi and Heidaripanah 2016, Daneshvar and Behnood 2020, Ghosh Mondal and Kuna 2020, Han et al. 2020, Olowosulu et al. 2020, Ghorbani et al. 2021). Because the CBR test is laborious, costly and time-consuming, and because the relationships between the soil properties are complex and non-linear, many researchers have utilised ML to predict the CBR of soil.
Taskiran (2010) applied artificial neural network (ANN) and gene expression programming (GEP) techniques to learn the non-linear relationships between CBR and various index properties of the fine-grained soils sourced from Southeast Anatolia, Turkey. The maximum dry unit weight ($\gamma_d$), plasticity index (PI), liquid limit (LL), optimum moisture content (OMC) and the content of different soil fractions were identified as the most effective parameters that influence the CBR values of the soils. Yildirim and Gunaydin (2011) used simple multiple regression and ANN to propose correlations for the preliminary estimation of the CBR values of different soils, employing the results of sieve analysis, Atterberg limits, maximum dry density and OMC. Alawi and Rajab (2013) utilised multiple linear regression (MLR) analysis to predict the CBR of the unreinforced subbase soil layer. Recently, similar applications of advanced ML techniques for the prediction of the CBR of different unreinforced soils were also reported by other researchers (Erzin and Turkoz 2016, González Farias et al. 2018, de Souza et al. 2020, Nagaraju et al. 2020, Tenpe and Patel 2020). Similarly, several studies were carried out to evaluate the permanent deformation and resilient modulus of recycled demolition wastes in pavements using ML algorithms (Arulrajah et al. 2013, Ullah et al. 2020, Ghorbani et al. 2020a). Ghorbani et al. (2020b, 2021) successfully developed ANN models for predicting the permanent strain of blends of two recycled waste materials under different stress and temperature levels, and for the shakedown analysis of a polyethylene terephthalate (PET) blend with demolition waste material as the subbase layer of pavement, respectively. However, the review of the current literature revealed that the efforts to predict the CBR of geosynthetic-reinforced subgrade soil are extremely limited. In fact, Singh et al. (2020) is the only study that has attempted to predict the CBR of geogrid-reinforced soil using a fuzzy logic-based modelling technique. To the best of the authors' knowledge, there is currently no research in the literature that provides the development, implementation and comprehensive comparison of ML-based solutions to the problem of the CBR of geosynthetic-reinforced subgrade soil.
In this paper, an attempt is made to predict the CBR of geosynthetic-reinforced soil (GRS) using the data-driven-based ML models. The main objectives of this research are fourfold: (1) assessment of numerous ML models in predicting the CBR of GRS; (2) comprehensive comparison of all the predictive tools utilised for the same problem; (3) suggestion of ANN-based trackable mathematical formula for estimating the CBR of soil reinforced with geosynthetic layers; and (4) independent validation by conducting new CBR tests on reinforced soil.

Material and methods
In this work, eight ML models, namely ANN, least median of squares regression (LMSR), Gaussian processes regression (GPR), elastic net regularisation regression (ENRR), lazy K-star (LKS), M-5 model trees, alternating model trees (AMT) and random forest (RF), were constructed to predict the CBR of GRS. All the models were simulated using the Waikato Environment for Knowledge Analysis (WEKA). It may be noted that many researchers and scientists have previously utilised the WEKA framework for function approximation and feature classification (Gao et al. 2019, Olowosulu et al. 2020). Moreover, new CBR tests were conducted to provide an independent validation of the data-driven models; these are discussed later in the paper.

Experimental database development and model attributes
In order to calibrate and validate all the data-driven models, a data set of 97 soaked CBR tests was retrieved from the literature. This includes 4 cases reported by Duncan-Williams and Attoh-Okine (2008), 30 cases reported by Vinod and Minu (2010), 16 cases reported by Choudhary et al. (2011), 2 cases reported by Kuity and Roy (2013), 4 cases each reported by Carlos et al. (2016) and Rajesh et al. (2016), 9 cases by Shukla (2018, 2019) and 28 cases by Negi and Singh (2019). The reliability of a model depends upon the comprehensiveness of the input data set. In this study, this was ensured by incorporating a wide variety of soils, ranging from sandy soil (SP) to fine-grained soils (ML, MH, CL and CH), as per the Unified Soil Classification System (USCS). These soils cover a range of engineering properties that affect the stiffness of a soil, such as soil index properties and particle size distribution (Youd 1973, Zheng and Hryciw 2016). However, the geotechnical engineering models that are the most effective in predicting the non-linear soil behaviour are based on the important soil parameters that can be obtained via routine tests (Oztoprak and Bolton 2013). The details of the utilised database are provided in Table 1. For predicting the output, that is, the CBR of GRS, the input parameters include LL ($X_1$), plastic limit ($X_2$), PI ($X_3$), maximum dry unit weight ($X_4$), optimum moisture content ($X_5$), percentage fines (passing sieve No. 200) ($X_6$), percentage sand ($X_7$), tensile strength of geosynthetic reinforcement ($X_8$), number of reinforcement layers ($X_9$), position of the first reinforcement layer ($X_{10}$) and position of the subsequent reinforcement layers ($X_{11}$). All these parameters were selected because the current literature shows that they affect the CBR of reinforced soil.
The developed models were also tested for the independent data set obtained by conducting new CBR tests (AS 1289.6.1.1 2014). It is noteworthy that these data are not a part of the actual database utilised to construct the ML models. Non-plastic sandy soil extracted from the pit site located in the northern region of Perth, Australia, was used during the experimental tests. The properties of the soil and geosynthetic (geotextile) are summarised in Table 2. To evaluate the effect of geosynthetic reinforcement on the strength of the subgrade soil, a total of six CBR tests were performed by varying the depth ratio of the first reinforcement layer (u/H), number of reinforcement layers (N) and depth ratio of the subsequent layer (h/H), where H represents the total height of mould. The proposed scheme of the CBR tests is summarised in Table 3.
The reinforcement layer was cut into a circular disk with a diameter slightly smaller than the diameter of the mould. In order to fill the mould, the required dry weight was calculated from the maximum dry unit weight of the soil and the volume of the mould. The soil was mixed thoroughly after adding the water content corresponding to the OMC. Thereafter, the mould was filled with the soil, placing the geotextile layer at the predetermined depth reported in Table 3. The test setup, with the schematic diagram of the specimen in the CBR test and the placement of the geosynthetic at a predetermined depth in the CBR mould, is illustrated in Figure 1. The CBR tests were conducted after soaking the sample in water for 96 h. A surcharge load was applied to the specimen to account for the effect of the thickness of the overlying layer. Load was applied at 1.25 mm/min (moveable base) and the corresponding penetration was measured through an electronic displacement transducer. The load readings were taken at penetrations ranging from 0.5 to 12.5 mm. The CBR values were estimated by taking the load corresponding to 2.5 or 5.0 mm penetration, whichever gave the higher value, as suggested by the relevant standard. Figure 2 depicts the load-penetration curves obtained for the CBR tests of reinforced soil.

Methodological background of machine learning models
The first step before training any data-driven model is database partitioning, in which the data are randomly divided into training and testing subsets. In this study, 60% of the collected data were randomly chosen for training the ANN, LMSR, GPR, ENRR, LKS, M-5 model trees, AMT and RF models. Thereafter, the predictive accuracy of each model was appraised against the remaining 40% of the data set (test data). Moreover, all the data were normalised to the range [−1, 1] before being fed to the models, so that each variable would receive the same attention during the training process. A brief methodological background of all the data-driven modelling techniques utilised to estimate the CBR of reinforced soil is given in this section. For a more comprehensive understanding, the research scheme employed in this study is also illustrated in Figure 3.
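The partitioning and scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `split_indices` and `normalise` and the random seed are hypothetical, the min-max mapping is the usual linear form for a [−1, 1] target range, and the study itself performed these steps in WEKA.

```python
import random

def split_indices(n_samples, train_fraction=0.6, seed=0):
    """Randomly partition sample indices into training and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is arbitrary (assumption)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

def normalise(value, v_min, v_max):
    """Map a value linearly from [v_min, v_max] onto [-1, 1]."""
    return 2.0 * (value - v_min) / (v_max - v_min) - 1.0

# 97 records, matching the size of the database compiled from the literature.
train_idx, test_idx = split_indices(97)
print(len(train_idx), len(test_idx))

# A parameter at the middle of its range maps to 0, its extremes to -1 and +1.
print(normalise(5.0, 0.0, 10.0))
```

With 97 records, a 60% share rounds to 58 training and 39 testing samples in this sketch; the exact counts used in the study are not stated in the text.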

Artificial neural network
ANN is a well-established and widely recognised ML model used for mapping the non-linear response of any system (Moayedi and Hayati 2018). At its core, the ANN model architecture consists of three parts, namely an input layer, one or more hidden layers and an output layer. Each layer consists of a set of processing elements called nodes (neurons), which interact with each other through weighted connections. The computation process can be described as follows: (a) data are presented to the model through the input layer nodes; (b) in the hidden layer, the data are multiplied by the weight matrix and added to the threshold (bias) vector, after which the activation function is applied; (c) the output of the hidden layer is mapped to the final outcome after passing through the output layer. Mathematically, for n inputs, the output y is computed as follows (Bishop 2006):
$$ y = f\left( \sum_{i=1}^{n} w_i x_i + \theta \right) \tag{1} $$
where w is the weight connection and θ is the bias/threshold. In this study, the sigmoid activation function is used in the hidden layer and is given as follows (Han and Moraga 1995):
$$ f(x) = \frac{1}{1 + e^{-x}} \tag{2} $$
For training the neural network, the optimisation procedure suggested by Soleimanbeigi and Hataf (2006) was adopted to select the optimum number of hidden layer nodes. For this, the number of hidden layer nodes was increased until no further improvement was obtained over the testing data set. Figure 4 illustrates the optimisation process for selecting the number of hidden nodes. It can be observed that the lowest mean absolute error (MAE = 1.233) and root mean square error (RMSE = 1.70) are obtained at nine hidden nodes. However, the model with six hidden nodes is selected as the optimum model because it has fewer weight connections, while its performance is close to that of the model with nine hidden nodes, with MAE and RMSE values of 1.31 and 1.74, respectively. The architecture of the optimum ANN model (11-6-1) is given in Figure 5.

Gaussian process regression
In the past, the GPR model has been efficiently used in predicting the response of a system (Gao et al. 2019, Zhang et al. 2019, Suthar 2020). GPR is a kernel-based ML model that is founded on Bayesian theory and the statistical learning approach. Such a model can be completely defined by its mean and covariance function (kernel) as follows (Rasmussen 2006):
$$ m(x) = E[g(x)] \tag{3} $$
$$ k(x, x') = E\left[ \left( g(x) - m(x) \right) \left( g(x') - m(x') \right) \right] \tag{4} $$
where $m(x)$ is the mean function, and $k(x, x')$ is the covariance function of a real process $g(x)$. The Gaussian process is defined by a set of random variables, any finite number of which have a joint Gaussian distribution. Mathematically
$$ g(x) \sim GP\left( m(x), k(x, x') \right) \tag{5} $$
For the detailed derivation, readers are directed to the excellent studies available in the literature (Seeger 2004, Rasmussen 2006). In this study, various kernel functions, such as the squared exponential, radial basis and Pearson's VII universal kernel functions, are used for estimating the CBR of reinforced soil using GPR. The optimum results are obtained by employing the Pearson's VII universal kernel (PUK) function. The mathematical form of PUK is given as follows (Üstün et al. 2006):
$$ k(x_i, x_j) = \frac{1}{\left[ 1 + \left( \dfrac{2 \sqrt{\lVert x_i - x_j \rVert^2} \, \sqrt{2^{1/\omega} - 1}}{\sigma} \right)^2 \right]^{\omega}} \tag{6} $$
where $\sigma$ and $\omega$ are the Pearson's width and peak tailing factor, respectively.
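To make the role of the two PUK parameters concrete, the kernel can be evaluated with a short Python function. This is a minimal sketch that assumes the commonly quoted form of the Pearson VII universal kernel, $k = 1 / [1 + (2\lVert x_i - x_j \rVert \sqrt{2^{1/\omega} - 1} / \sigma)^2]^{\omega}$; the function name `puk_kernel` and the default parameter values are illustrative, not the settings tuned in this study.

```python
import math

def puk_kernel(x_i, x_j, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two equal-length vectors.

    sigma controls the width and omega the tailing of the peak.
    Default values are placeholders (assumption).
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    scaled = 2.0 * distance * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

# Identical points give the maximum similarity of 1.
print(puk_kernel([0.2, -0.5], [0.2, -0.5]))

# Similarity decays with distance; a larger sigma widens the peak.
print(puk_kernel([0.0], [1.0], sigma=1.0))
print(puk_kernel([0.0], [1.0], sigma=4.0))
```

In a GPR model the kernel is evaluated for every pair of normalised input vectors to build the covariance matrix, so the choice of $\sigma$ and $\omega$ governs how quickly the influence of a training sample fades with distance.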

Least median of square regression
The LMSR is a semi-parametric quantile regression technique. In contrast to the classical regression model, the sum of squared errors is replaced by the median of the squared errors (Rousseeuw 1984). The LMSR overcomes the major drawback of ordinary regression, that is, its sensitivity to outliers. For a standard univariate linear regression problem, the residuals take the following form (Massart et al. 1986):
$$ r_j = y_j - (a x_j + b) \tag{7} $$
where r is the residual, y is the response (output), x is the input, and a and b are regression coefficients. The principle of least squares is to minimise $\sum_{j=1}^{n} r_j^2$, whereas the LMS estimator aims at minimising the median of the squared errors, that is, to minimise $\mathrm{med}_{j=1,\ldots,n} \, r_j^2$. In this study, the following relationship is obtained for estimating the CBR of reinforced soil by using the LMS regression technique:
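The robustness of the median criterion can be illustrated with a small univariate example. The sketch below is not the algorithm used by WEKA in this study: it approximates the LMS line by trying every line through two data points and keeping the one with the smallest median squared residual, and the function name `lms_line` and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median

def lms_line(xs, ys):
    """Approximate least-median-of-squares line y = a*x + b.

    Candidate lines are those passing through each pair of points
    (an elemental-subset search, used here as a simple approximation).
    """
    best = None
    points = list(zip(xs, ys))
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, no finite slope
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        criterion = median((y - (a * x + b)) ** 2 for x, y in points)
        if best is None or criterion < best[0]:
            best = (criterion, a, b)
    return best[1], best[2]

# Five points on y = 2x + 1 plus one gross outlier at x = 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 100]
a, b = lms_line(xs, ys)
print(a, b)
```

Because the median ignores the largest residuals, the outlier does not pull the fitted line away from the five consistent points, whereas an ordinary least squares fit would be strongly distorted by it.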

Elastic net regularisation regression
The ENRR is a robust regression model which aims at combining the penalties of the least absolute shrinkage and selection operator (LASSO), $\lambda_1$, and the ridge regression technique, $\lambda_2$ (Ogutu et al. 2012). LASSO tends to choose only one attribute at random and ignore the others, especially when the attributes are highly correlated, whereas the elastic net (EN) overcomes this problem. For a data sample with n observations and p predictors, let $\{(x_i, y_i), i = 1, 2, \ldots, n\}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. Moreover, if $y = (y_1, \ldots, y_n)^T$ and $X \in \mathbb{R}^{n \times p}$ represent the output vector and model matrix, respectively, then the EN can be written as follows (Zou and Hastie 2005): where $\beta$ is the weight vector and j is the EN penalty parameter, which is a combination of $\lambda_1$ and $\lambda_2$. For the detailed derivation, readers may refer to the research conducted by Zou and Hastie (2005). It may be noted that, if $\lambda = 1$, then the EN takes the ridge regression form, and if $\lambda = 0$, then it takes the LASSO form. For $\lambda \in [0, 1)$, the EN encompasses the characteristics of both ridge regression and LASSO. In other words, the penalty parameter $\lambda_1$ enables the automatic selection of variables and $\lambda_2$ leads to stabilisation by making the problem strictly convex (Gao et al. 2019). For the estimation of CBR, the following relationship is obtained through ENRR:

Lazy K-star
Lazy learning is an ML method in which the majority of the computation is deferred to the consultation time, that is, until a query (call) is made to the system (Webb et al. 2011). LKS is a lazy learning algorithm which performs the generalisation of the training data using instance-based learning. This means that, unlike ANN or other eager ML methods, no generalised model is inferred from the training data in advance; instead, the complete data are stored in memory and, upon a call, the response is produced by a nearest neighbour approach. K-star sums all the possible transitions between two instances and amalgamates them into a single measure (Cleary and Trigg 1995). This is achieved by summing the probabilities over all possible transformations between the instances. This entropy-based learning technique has several advantages in comparison to other rule-based schemes, such as better handling of missing values and common attributes. Mathematically, the function K-star is defined as follows (Cleary and Trigg 1995, Gao et al. 2019):
$$ K^{*}(q \mid p) = -\log_2 P^{*}(q \mid p) \tag{11} $$
where $P^{*}$ represents the probability function, that is, the probability of all the paths from instance p to instance q.

M-5 model trees
Based on the original research conducted by Quinlan (1992) on the development of decision trees for regression problems, Wang and Witten (1997) proposed the M-5 model. The M-5 model uses the classical top-down method for growing and pruning decision trees. The data are presented in the form of mean values and regression functions at the leaf nodes; thereafter, branching/separation is performed at each node (MLR) until the values of the response variables reaching a node show no or negligible change. In the next step, the larger subtrees are replaced by a single linear model. For this, the error values inside the inner node of the tree are compared with those of the tree leaf underlying that node (Khorrami et al. 2020). Finally, a smoothing function is applied to trim the gaps between the neighbouring (adjacent) leaf nodes. In this way, the final model is obtained by combining all the available linear models along the path from the root node of the tree to each leaf node, effectively producing a linear combination of all the available linear models (Quinlan 1992, Khorrami et al. 2020).

Alternating model trees
AMT is a recently developed algorithm based on the principle of ensemble learning. Originally proposed by Frank et al. (2015), AMT uses the additive regression technique to grow the trees. Moreover, similar to decision trees, AMT utilises stage-wise forward additive regression (a statistical boosting variant) and cross-validation to minimise the squared errors and to limit the growth of the trees, respectively. The main difference between the M-5 model trees and the AMT model is that the former uses multivariate linear regression on the leaves, while the latter employs simple linear regression to obtain the predictive model. For this, AMT utilises two types of nodes, namely splitter nodes and predictor nodes. The splitter node splits the numeric attributes at the median value, and the predictor node predicts the response of the system by using the linear regression technique (Gao et al. 2019). For detailed insight and derivation, the reader may refer to Frank et al. (2015). The AMT model proposed in the present study for estimating the CBR of reinforced soil is illustrated in Figure 6.

Random forest
RF is an ensemble learning method originally suggested by Ho (1995) for classification and regression problems. At its core, the RF works by creating a large number of decision trees and then averaging the output of each tree. The bootstrap aggregating technique helps the RF to obtain a much more stable solution and reduces the chances of overfitting (Breiman 2001). The user control parameters are the number of trees, the number of nodes and the number of variables. Generally, a higher number of trees leads to higher accuracy but requires more computation time. As each tree works entirely independently and uses out-of-bag estimates to observe the errors and correlation strength, an abundance of trees will not lead to overfitting of the model. The detailed algorithm is described in Breiman (2001).

Model assessment criteria
Five statistical metrics were chosen to appraise and compare the predictive strength of all the developed data-driven models. The metrics are as follows: (i) coefficient of determination ($R^2$); (ii) root mean square error (RMSE); (iii) scatter index (SI); (iv) index of agreement ($I_a$) and (v) mean absolute error (MAE). All these statistical tools have been extensively used in previous studies to assess the accuracy of ML-based models (Yaseen et al. 2018, Khorrami et al. 2020, Raja and Shukla 2020, 2021). The mathematical form of these statistical standards is given below:
$$ R^2 = \left[ \frac{\sum_{i=1}^{n} (CBR_{o_i} - \overline{CBR}_o)(CBR_{p_i} - \overline{CBR}_p)}{\sqrt{\sum_{i=1}^{n} (CBR_{o_i} - \overline{CBR}_o)^2 \sum_{i=1}^{n} (CBR_{p_i} - \overline{CBR}_p)^2}} \right]^2 \tag{12} $$
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (CBR_{o_i} - CBR_{p_i})^2} \tag{13} $$
$$ SI = \frac{RMSE}{\overline{CBR}_o} \tag{14} $$
$$ I_a = 1 - \frac{\sum_{i=1}^{n} (CBR_{o_i} - CBR_{p_i})^2}{\sum_{i=1}^{n} \left( |CBR_{p_i} - \overline{CBR}_o| + |CBR_{o_i} - \overline{CBR}_o| \right)^2} \tag{15} $$
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |CBR_{o_i} - CBR_{p_i}| \tag{16} $$
where n is the number of observations, $CBR_{o_i}$ is the ith observed (measured) value, $CBR_{p_i}$ is the ith predicted value, $\overline{CBR}_o$ is the mean observed value, and $\overline{CBR}_p$ is the mean predicted value. The range of $R^2$ is 0-1 and, for an ideal model, the value should be close to 1. The $I_a$ ($0 \le I_a \le 1$) shows the ratio of the mean square error and the potential error in the system. The value of 1 represents perfect agreement between the observed and predicted values and 0 represents no agreement (Willmott 1981). RMSE and MAE are widely adopted for assessing the predictive accuracy of ML models. The MAE represents the average error (equal weightage) over the data set without considering the sign. This means that, in MAE, the difference between the observed and predicted values is averaged in a linear manner, while in RMSE the errors are squared before taking the average (Equation (13)), therefore giving higher weightage to the larger errors (Yaseen et al. 2018).
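For readers who wish to reproduce the indices, the five measures can be computed as in the following Python sketch. It assumes the commonly used definitions (squared Pearson correlation for $R^2$, RMSE divided by the mean observed value for SI, and Willmott's form for $I_a$); the function name `evaluate` and the toy values are illustrative and are not taken from the study's database.

```python
import math

def evaluate(observed, predicted):
    """Return R2, RMSE, SI, Ia and MAE for paired observed/predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    sse = sum((o - p) ** 2 for o, p in pairs)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Willmott's "potential error" term in the index of agreement.
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)

    rmse = math.sqrt(sse / n)
    return {
        "R2": cov ** 2 / (var_o * var_p),
        "RMSE": rmse,
        "SI": rmse / mean_o,
        "Ia": 1.0 - sse / potential,
        "MAE": sum(abs(o - p) for o, p in pairs) / n,
    }

# Toy example: every prediction is exactly one unit too high.
scores = evaluate([2.0, 4.0, 6.0], [3.0, 5.0, 7.0])
for name, value in scores.items():
    print(name, round(value, 3))
```

The toy example shows why several indices are reported together: a constant offset leaves $R^2$ at 1 even though RMSE, MAE, SI and $I_a$ all register the error.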

Results and discussion
As mentioned above, the main aim of this study is a comprehensive ML-based analysis for modelling the CBR of GRS. The main parameters of all the models are summarised in Table 4. For evaluating the performance of the developed predictive models, namely ANN, LMSR, GPR, ENRR, K-star, M-5, AMT and RF, the calculated values of all the statistical parameters, namely $R^2$, RMSE, SI, $I_a$ and MAE, are presented in Table 5 for the testing (validation) data. The colour intensity coding technique (CICT) has been applied to indicate the strength of each parameter according to its obtained value. It may be noted that this technique has been successfully applied in many previous studies (Nazari et al. 2020, Nguyen et al. 2020). In this way, the parameters which indicate more accuracy (higher $R^2$ and $I_a$; and lower RMSE, SI and MAE) are given a more intense colour, and vice versa. For this study, shades of green are used for depicting the predictive accuracy of the developed ML models. The dark green cells show more accuracy, and the pale green cells depict a lower precision level. Accordingly, each model was scored, and the final ranking for a particular model was obtained by summing all the partial scores given on the basis of the values of the statistical indices. As Table 5 [...] GPR, respectively, indicate that the forecasting ability of these models is associated with relatively high bias in comparison to their counterpart models.
Scholars have argued that the reliability of ML models should also be assessed using external validation and/or a multi-criteria approach (Gandomi et al. 2013, Naser and Alavi 2020). This gives a realistic evaluation of the model's predictive performance by eradicating or minimising the bias associated with the traditional goodness-of-fit indices. Therefore, the external model validation criteria by Golbraikh et al. (2003), the stabilisation criterion for quantitative structure-activity relationship (QSAR) models by Roy and Roy (2008), and the objective function (OBJ) by Gandomi et al. (2013) are also applied to further affirm the accuracy and reliability of the developed data-driven models.
According to Golbraikh et al. (2003), a model must meet the following criteria to be considered reliable. The slope of at least one of the regression lines through the origin (k or k′) between the observed values (say $x_i$) and predicted values ($y_i$), or vice versa, must be close to unity. In terms of the CBR prediction model, it can be written as follows: The value of k or k′ must be between 0.85 and 1.15. Additionally, the performance index parameters, that is, m and n, should be < 0.1 and can be calculated as follows: whereas $R_o^2$ and $R_o'^2$ are the regression coefficients and can be estimated as Ideally, the values of $R_o^2$ or $R_o'^2$ should be close to the actual $R^2$, whereas $R^2$ should be > 0.6. The model is considered acceptable if it meets all these criteria. Roy and Roy (2008) established a stabilisation criterion for ensuring the predictability of the developed model. Accordingly, the value of $R_m$ is estimated as the stabilisation criterion to measure the reliability of the developed model. Mathematically The value of $R_m$ should be > 0.5. Based on RMSE, $R^2$ and MAE, the OBJ function considers the performance of the model on the training and testing data sets simultaneously (Gandomi et al. 2013) where subscripts tr and ts represent training and testing data, respectively. The results of the above-mentioned external validation and multi-criteria approach are summarised in Table 6. From the table, it can be concluded that the ANN, LKS, AMT and RF models have shown excellent prediction ability in calculating the CBR of reinforced soil. This is substantiated by the fact that all these models have met the underlying conditions of the external validation criteria. The ENRR and M-5 models have met all the conditions of Golbraikh et al. (2003) but failed to meet the model stabilisation condition, with $R_m$ values of 0.27 and 0.37, respectively.
The OBJ function values of 3.40, 15.43, 5.38, 5.44, 2.95, 4.79, 3.52 and 3.56 for ANN, LMSR, GPR, ENRR, LKS, M-5 model trees, AMT and RF, respectively, also indicate that LKS and ANN can be established as the best models for predicting the CBR of reinforced soil. The AMT and RF models also performed well, meeting all the criteria, and can therefore be ranked as the third- and fourth-best models in the hierarchy. This means that the predictions made by these models are trustworthy and are not a mere coincidence. Additionally, the results of the LMSR, GPR, ENRR and M-5 models indicate relatively poor performance in comparison to their counterpart models.
Finally, the Taylor diagram, originally proposed by Taylor (2001), is presented in Figure 7. The Taylor diagram presents a visual summary of the predictive power of the data-driven models on a single plot, that is, how closely the actual and simulated responses are related to each other in terms of their correlation and variability (Taylor 2001, Raja and [...]). In Figure 7, the solid radial lines (black) represent the standard deviation (SD); the thickened dashed lines (grey) represent the correlation coefficient (CC); and the dotted radial lines (red) show the centred root mean square deviation (CRMSD) between the simulated (test data) and reference fields. The reference model is indicated by the solid black dot, with a measured SD of 7.09, CC of unity and zero CRMSD. It can be observed that, for the LKS and ANN models, the CC, CRMSD and SD are about (0.977, 1.53 and 6.64) and (0.9716, 1.68 and 6.76), respectively. This highlights the excellent predictive capability of these models, followed by the AMT model with CC, CRMSD and SD values of 0.963, 1.92 and 7.02, respectively. For the same parameters, the values for M-5 and RF were (0.921, 2.83 and 5.89) and (0.965, 1.99 and 6.02), respectively. On the contrary, the GPR model has shown too little variability, with an SD of about 4.03, and the LMSR model has shown a large variation in comparison to the observed CBR values, with an SD of 14.59. The ENRR model has shown a fair overall performance with an SD of 5.76; however, its correlation is weak (CC = 0.897) and its centred root mean square deviation is high (CRMSD = 3.21). Therefore, consistent with the results of the statistical indices and external validation criteria, it can be established with sufficient confidence that, among all the applied ML models, the ANN and LKS models have achieved the highest accuracy in forecasting the CBR of reinforced soil.
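The three quantities shown on a Taylor diagram are linked by the law-of-cosines identity $CRMSD^2 = SD_r^2 + SD_m^2 - 2\,SD_r\,SD_m\,CC$ (Taylor 2001), where the subscripts r and m denote the reference and model fields. The short Python check below applies this identity to the values quoted above; the function name `crmsd_from_taylor` is hypothetical, and small differences from the quoted CRMSD values are expected because the inputs are rounded.

```python
import math

def crmsd_from_taylor(sd_ref, sd_model, cc):
    """Centred RMS deviation implied by two standard deviations and their correlation."""
    return math.sqrt(sd_ref ** 2 + sd_model ** 2 - 2.0 * sd_ref * sd_model * cc)

SD_REFERENCE = 7.09  # standard deviation of the observed CBR values (test data)

# (model SD, correlation coefficient, CRMSD quoted in the text)
quoted = {
    "LKS": (6.64, 0.977, 1.53),
    "ANN": (6.76, 0.9716, 1.68),
    "AMT": (7.02, 0.963, 1.92),
    "M-5": (5.89, 0.921, 2.83),
}

for model, (sd, cc, crmsd_quoted) in quoted.items():
    crmsd = crmsd_from_taylor(SD_REFERENCE, sd, cc)
    print(f"{model}: computed {crmsd:.2f}, quoted {crmsd_quoted:.2f}")
```

For these four models the computed and quoted values agree to within about 0.01, which confirms that the reported SD, CC and CRMSD figures are mutually consistent.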

Model presentation
For this work, ANN is selected as an appropriate estimator of the CBR values of the GRS, owing to its predictive performance and simplicity. The ANN-based equation is given as follows (Aamir et al. 2020):
$$ y = g_n\left( \theta_k + \sum_{j} w_{jk} \, f_n\left( \theta_j + \sum_{i} b_{ji} x_i \right) \right) \tag{25} $$
where y is the value of the output (CBR), $g_n$ is the hidden-output layer transfer function (pure linear), $f_n$ is the input-hidden layer transfer function (sigmoid), $w_{jk}$ is the weight connection between the jth node in the hidden layer and the single node in the output layer (k = 1), $b_{ji}$ is the weight connection between the ith node of the input layer and the jth node of the hidden layer, $\theta_j$ is the bias of the jth node of the hidden layer and $\theta_k$ is the bias at the output layer node. The architecture of the developed single-hidden-layer neural network is given in Figure 5, that is, 11 input nodes, 6 hidden nodes and 1 output node. The weight and bias values of the network are reported in Table 7. The values of the input parameters should be normalised to [−1, 1] before being fed to the ANN, using the following relationship:
$$ X' = \frac{2 (X - X_{min})}{X_{max} - X_{min}} - 1 \tag{26} $$
where $X_{min}$ and $X_{max}$ are the minimum and maximum values of the parameter, respectively, as given in Table 1.
In order to estimate the CBR of reinforced soil with 11 input parameters, the following relationship is established for the optimum ANN model where $CBR'_p$ is the normalised predicted CBR value in [−1, 1], and $X'_1, X'_2, \ldots, X'_{11}$ represent the normalised values of the input parameters. The predicted CBR value should be de-normalised as follows:
$$ CBR_p = 0.5 \, (CBR'_p + 1)(CBR_{max} - CBR_{min}) + CBR_{min} \tag{29} $$
For easy comprehension, a design example is given in the Appendix. It may be noted that the present relation is calibrated and validated only for predicting the soaked CBR of GRS within the range of the training data. Therefore, it shall not be used for estimating unsoaked CBR values, the CBR of unreinforced soil, or the CBR of soil reinforced with other types of reinforcement such as fibres, tire chips or metals.
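The prediction workflow described above (normalise the inputs, pass them through the 11-6-1 network, de-normalise the output) can be sketched in Python as follows. The weights, biases and CBR range used here are placeholders chosen only to make the example run; they are NOT the fitted values of Table 7 or the ranges of Table 1, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid used as the input-hidden transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def normalise(x, x_min, x_max):
    """Scale a raw value from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalise(x_norm, x_min, x_max):
    """Inverse of normalise(): map [-1, 1] back to [x_min, x_max]."""
    return 0.5 * (x_norm + 1.0) * (x_max - x_min) + x_min

def ann_predict(x_norm, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a single-hidden-layer network with a linear output node."""
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(row, x_norm)))
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))

# Placeholder 11-6-1 parameters (assumption, not the values of Table 7).
n_inputs, n_hidden = 11, 6
hidden_weights = [[0.0] * n_inputs for _ in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [0.5] * n_hidden
output_bias = -1.5

x_norm = [0.0] * n_inputs  # every input at the middle of its range
cbr_norm = ann_predict(x_norm, hidden_weights, hidden_biases, output_weights, output_bias)
cbr = denormalise(cbr_norm, x_min=2.0, x_max=40.0)  # hypothetical CBR range
print(cbr_norm, cbr)
```

To use the sketch with the actual model, the placeholder lists would be replaced by the weights and biases of Table 7, and each raw input and the CBR range by the minimum and maximum values of Table 1.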

Independent validation
The LKS and ANN models have shown excellent predictive performance for the data set utilised to establish the ML models. However, in order to establish the superiority, consistency and reliability of any ML model, its predictive strength should be checked against entirely new data. Therefore, an experimental study was conducted to establish a new data set. For this, six soaked CBR tests (see Table 3) were conducted and their experimental values were compared with the simulated values obtained from the ANN and LKS models. A side-by-side comparison of the experimental and predicted CBR values for ANN and LKS is presented in Figure 8. It can be observed that the ANN predicted CBR values much closer to the actual values than LKS did, with average absolute errors of 10.8% and 36.4%, respectively. Therefore, it can be accepted that the ANN model has outperformed its competitors in predicting the CBR of GRS. Moreover, another main advantage of the ANN in comparison to other efficient models (say LKS or AMT) is that it can be translated into a trackable functional relationship and can therefore be executed easily without the need for any expensive computer-based program. Furthermore, the ANN model can be updated to obtain better results by presenting more training examples when new data become available.

Sensitivity analysis
A sensitivity analysis has also been carried out to determine the relative importance of each parameter affecting the CBR of reinforced soil. It helps in finding the strength of the existing correlation between the input and output dimensions. For this study, the cosine amplitude method (CAM) is used to establish the strength of the relationship between each input parameter and the CBR of reinforced soil.
For CAM, let n data samples lie in a common space X; the data array X can then be written as follows (Hasanzadehshooiili et al. 2012):

$$X = \{x_1, x_2, x_3, \ldots, x_n\} \tag{30}$$

Each element x_i of the data array X in Equation (30) is a vector of length m, defined as:

$$x_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\} \tag{31}$$

In this way, the correlation strength r_ij between the data points x_i and x_j can be estimated by Equation (32) (Ghorbani et al. 2020c):

$$r_{ij} = \frac{\sum_{k=1}^{m} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^{2} \, \sum_{k=1}^{m} x_{jk}^{2}}} \tag{32}$$

The result of the CAM sensitivity analysis carried out for the ANN model is illustrated in Figure 9. The relative strengths of all the input parameters (X1, X2, X3, X4, X5, X6, X7, X8, X9, X10 and X11) with the CBR of reinforced soil are 0.761, 0.756, 0.759, 0.721, 0.766, 0.731, 0.40, 0.560, 0.676, 0.761 and 0.30, respectively. This implies that all the input parameters except the subsequent reinforcement depth (X11) play a significant role in determining the CBR of reinforced soil, with r_ij ranging approximately from 0.4 to 0.8. Moreover, X1 (LL), X5 (OMC) and the depth of the first reinforcement layer (X10) have achieved the highest correlation in predicting the CBR of reinforced soil.
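The cosine amplitude measure described in this section can be sketched in a few lines. The sketch assumes that each input parameter and the output are stored as equal-length lists of values over the same samples, which is one common way of applying CAM. The function names and the small data set are hypothetical and are not the data used in this study.

```python
import math


def cam_strength(x_i, x_j):
    """Cosine amplitude strength r_ij between two equal-length vectors."""
    if len(x_i) != len(x_j) or not x_i:
        raise ValueError("vectors must be non-empty and of equal length")
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    if denominator == 0.0:
        raise ValueError("r_ij is undefined for an all-zero vector")
    return numerator / denominator


def rank_inputs(inputs, output):
    """Return (name, r_ij) pairs sorted from strongest to weakest."""
    strengths = {name: cam_strength(values, output) for name, values in inputs.items()}
    return sorted(strengths.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for two inputs and the CBR output.
inputs = {"X1": [1.0, 2.0, 3.0], "X2": [3.0, 2.0, 1.0]}
cbr = [2.0, 4.0, 6.0]
for name, r in rank_inputs(inputs, cbr):
    print(name, round(r, 3))
```

For non-negative data, r_ij lies between 0 and 1, and values closer to 1 indicate a stronger relationship between the input and the output.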

Conclusions and future outlook
This work presents a comprehensive and detailed comparison of eight data-driven ML-based models, namely ANN, LMSR, GPR, ENRR, LKS, M-5 model trees, AMT and RF, for predicting the CBR of subgrade soil reinforced with geosynthetic layers. For this, the pertinent data set was retrieved from previously published scientific studies. Each sample consisted of 11 input variables, namely LL, PL, PI, maximum dry unit weight of soil, moisture content, percentage fines, percentage sand, tensile strength of geosynthetic reinforcement, number of reinforcement layers, position of the first reinforcement layer and position of the subsequent reinforcement layers, and one output variable, that is, CBR. The acquired data set was randomly divided into training (60% of the total data) and testing (40% of the total data) subsets to calibrate and validate the performance of all the data-driven tools. The performance of all the models was assessed using five statistical indices, namely coefficient of determination (R2), root mean square error (RMSE), scatter index (SI), index of agreement (Ia) and mean absolute error (MAE). Based on the results of these indices, a colour-intensity-coded ranking model was developed. Moreover, the predictive strength of the tools mentioned above was also appraised using the external validation criteria and the multi-criteria approach. The most appropriate models were also tested against entirely independent validation data obtained by conducting new CBR tests. Finally, the sensitivity analysis was performed to assess the effect of the input parameters on the CBR value. Based on the acquired results, the following conclusions can be drawn.
• Based on the values of these statistical indices, a total ranking score was obtained for all the modelling techniques. The results have shown excellent prediction ability of LKS and ANN with total scores of 40 and 35, respectively. The ranking scores of the other models, LMSR, GPR, ENRR, M-5 trees, AMT and RF, were 5, 15, 14, 18, 26 and 27, respectively.
• Among all the models, the LMSR model gave the poorest approximation of the CBR, and its inadequacy was reflected in the ranking score (total score = 5) obtained based on the above-mentioned assessment criteria.
• Based on the results of the external validation technique and the multi-criteria assessment approach, the ANN, LKS, AMT and RF models have achieved good prediction ability and model stability in forecasting the CBR of reinforced soil. However, LKS and ANN have shown superior performance in comparison to their counterpart models, with OBJ function values of 2.95 and 3.40, respectively. Also, of these two models, the latter (ANN) has predicted the new experimental data (independent data) with more accuracy. Additionally, for this work, the developed ANN model was also converted into a trackable mathematical relationship for easy hand or spreadsheet calculations.
• The strength (r_ij) of each input variable with respect to the output (CBR) was evaluated by sensitivity analysis. The results revealed that all the parameters have played an important role in determining the CBR of reinforced soil. However, OMC, LL and the position of the first reinforcement layer, with r_ij values of 0.77, 0.762 and 0.761, respectively, are the most influential parameters.
In this study, to maximise the modelling efficiency and ease of use, default settings were used in WEKA for most of the individual models. Future studies should therefore focus on how the parameters of the models can be optimised automatically. This limitation can be addressed in future by combining an optimisation scheme with the model network. Most recently, evolutionary algorithms based on metaheuristics (e.g. shuffled frog algorithm, grey wolf optimiser, ant lion optimiser, elephant herd optimisation, etc.) have shown a good ability to improve the predictions of neural networks by optimising their weights and biases. Future work focusing on such optimisation techniques could therefore prove useful. Moreover, ensemble learning techniques, in which the learning power of multiple ML models is combined to predict the response of the system, might also be applied in the future.

Disclosure statement
No potential conflict of interest was reported by the author(s).