A comparative evaluation of machine learning algorithms and an improved optimal model for landslide susceptibility: a case study

Abstract In this study, four representative machine learning methods (support vector machine (SVM), maximum entropy (MaxEnt), random forest (RF), and artificial neural network (ANN)) were employed to construct a landslide susceptibility map (LSM) of Xulong Gully (XLG), southwest China. The models were then compared in order to select the best-performing one, which was further improved to optimize the machine learning method. A total of 16 layers were extracted from the collected data and employed as conditioning factors for the correlation analysis and subsequent modelling. The LSMs were divided into four levels: very high susceptibility (VH), high susceptibility (H), moderate susceptibility (M), and low susceptibility (L). The results were verified with receiver operating characteristic (ROC) curves, the root mean squared error (RMSE), and the frequency ratio (FR): the higher the area under the ROC curve (AUC) and the lower the RMSE, the more accurate and stable the performance. Following the factor performance analysis, the optimal SVM model was linearly improved into the trace ratio criterion (TRC)-SVM, which performs better and overcomes the factor defects. The comprehensive comparisons and proposed LSM can support future research, as well as local authorities in the development of landslide mitigation strategies.


Introduction
Geological hazards, a common occurrence in Southwest China, can have a devastating impact on environmental systems and sustainable development (Bai et al. 2012). Approximately 1046 geological hazards occurred in Southwest China in 2019, accounting for 16.90% of the nationwide total. Of these, 68.27% were landslides, causing 89 deaths, 9 missing persons, and CNY 1.7 billion (approximately USD 0.24 billion at the April 2020 conversion rate) in financial damage. Unfortunately, landslides are natural disasters that cannot be avoided and are triggered by numerous factors (e.g., abundant rainfall, complex geo-environmental settings, and high-frequency earthquakes) in mountainous communities (Parker et al. 2011; Gorum et al. 2013; Nsengiyumva and Valentino 2020).
Extensive research has been performed on the problems caused by landslides, both directly and indirectly. Direct problems are usually investigated via point-to-point monitoring, while indirect methods focus on natural factors to analyze the landslide formation mechanism (Xu et al. 2014; Fang et al. 2020; Yuan et al. 2021). For example, streams play an important role in the formation of unstable slopes in mountainous areas (Vojteková and Vojtek 2020). Physical models are commonly employed in landslide analysis to account for streams (e.g., examining the role of vegetation via slope stability models or man-made factors) (Pham et al. 2016; Wiesmair et al. 2017). Landslides occur at a high frequency along transportation lines, damaging communities and property and resulting in the loss of life (Hong et al. 2018; Jones et al. 2020). Therefore, effective guidelines for landslide risk assessments and risk zoning are necessary (Maloney et al. 2020). Geological hazard model outputs should be presented with a complete hazard evaluation, including the location and scale of the geological hazard (Saha et al. 2005). A susceptibility study of geological hazards can provide information about the stability of areas and the probability of predictions in order to plan for disaster governance and further engineering activities (Khamkar and Mhaske 2019).
Much effort was made to produce landslide susceptibility maps in the early 1970s based on qualitative research (Babb and Bliss 1974). In particular, quantitative assessments of landslide susceptibility expanded following the development of Autosnap and the advancement of data acquisition and analytical approaches (Roth 1983). Booth conducted a regional landslide susceptibility evaluation based on aspect, slope gradient, and bedrock elements contributing to land management (Booth et al. 1984). In 1989, Jibson and Keefer created a geological hazard susceptibility map using over 200 large geological hazards at the edge of the Mississippi Plain via discriminant and multiple linear regression analyses (Jibson and Keefer 1989). Previous work has proposed adopting a discrete combination of geological and geomorphic parameters to calculate landslide susceptibility. In addition, the matrix evaluation method has been introduced to evaluate slope stability in large-scale mountainous areas. More recently, GIS techniques have become a common feature of landslide susceptibility mapping (De Ploey et al. 1995; Pannatier et al. 2009), which can be grouped into two components: linear regression analysis and nonlinear regression analysis. Logistic regression models are the most widely used form of linear regression analysis. Neural networks (Cao et al. 2019) are a kind of nonlinear regression approach, and the artificial neural network (ANN) (Tian et al. 2019), one of the major neural network algorithms, was developed on this basis. Model algorithm accuracy and diversification have improved geological hazard susceptibility mapping, particularly with the rapid growth of computer technology and the popularization of GIS techniques (Tehrani et al. 2021).
Geological hazard susceptibility mapping typically includes physical modelling, statistical analysis, and software calculations (Pourghasemi and Rahmati 2018). The physical models require a large number of simulation experiments, as well as analysis of the engineering geological conditions and slopes in the field, which can prove difficult for large-scale areas (Prasad et al. 2016). Statistical analysis generally calls for the parameterization of the structure model and integration with machine learning techniques to determine the relationships between factors (Vasu and Lee 2016). Machine learning methods are efficient and can help to predict disaster risk and decrease disaster costs. Thus, regions vulnerable to natural hazards should adopt detailed assessments that are implemented accordingly (Goetz et al. 2015; Nhu et al. 2020; Jiang et al. 2021). In this regard, susceptibility is often used for natural hazards to analyze and predict the relevant human activities, environment, and geology at such a scale, as it considers not only the multivariate limitations but also the nonlinearity (Tonini et al. 2020). Despite its importance in landslide susceptibility mapping, there are currently no agreed standards for the selection of machine learning methods in geological hazard susceptibility mapping (Pourghasemi and Rahmati 2018; Li et al. 2020). Numerous machine learning models have been applied to geological hazard susceptibility mapping in recent years, making valuable progress. For example, random forest (RF) can outperform general statistical and heuristic analysis models, and also improves on the classification results of decision trees. Thus, we employ RF in this paper. Moreover, ANNs and support vector machines (SVMs) are typically adopted in research on susceptibility mapping, and we also compare them here (Peethambaran et al. 2020).
Maximum entropy (MaxEnt) is a widely used statistical-probabilistic machine learning algorithm that offers more universal generality (Kariminejad et al. 2019; Pandey et al. 2020) than other machine learning algorithms such as multiple adaptive regression splines (MARS) (Arabameri et al. 2019) and the adaptive neuro-fuzzy inference system (ANFIS) (Polykretis et al. 2019). RF, ANN, and SVM also exhibit similar performances (Chen et al. 2017; Arabameri et al. 2020). However, the resulting accuracy is strongly influenced by the study extent and the training procedure. Further research is required to determine which model is more suitable for mountainous geological hazard susceptibility assessments.
Despite their research significance, landslides are often overlooked. In the present study, four linear and nonlinear machine learning algorithms were selected to develop landslide susceptibility maps: MaxEnt, SVM, ANN, and RF. These high-accuracy and reliable algorithms have recently been utilized in landslide susceptibility studies across the Xulong gully (XLG) of Southwest China as well as other areas. We present a detailed classification of the impact factors of each algorithm for comparison. We then introduce the trace ratio criterion (TRC) into the optimal algorithm as an optimization. The proposed algorithm, denoted TRC-SVM after the improved SVM algorithm, is evaluated using receiver operating characteristic (ROC) curves, the root mean squared error (RMSE), and the frequency ratio (FR), as before. This work combines remote sensing datasets and extensive field surveys of landslides and collapses, which provide material for mudslides and amplify their size. Our results indicate the possible damming induced by a hydroelectric station close to XLG following the occurrence of a mudslide during the research period. The comparison of the four machine learning algorithms for landslide susceptibility provides a considerable contribution to the five villages in XLG. Furthermore, the optimal algorithm for landslide susceptibility mapping can identify and delineate landslide-prone zones, which is beneficial to government decision-making and disaster avoidance.

Study area
Xulong gully (XLG) lies in the Jinshajiang Basin, covers 55.6 km², and is surrounded by five villages between the Qinghai-Tibet Plateau and Sichuan-Yunnan Plateau, southwest China (28°43′57″ N, 99°7′56″ E to 26°49′22″ N, 99°13′29″ E) (Figure 1). The area contains one main gully and 13 branch gullies with hundreds of rockfalls and landslides. The elevation ranges between 2100 and 4800 m, with the highest elevations located in the northwest. Mountains at higher and lower elevations have steep ridges and sheer inclines with horns forming V-type river valleys. The area generally exhibits a subtropical arid valley climate that varies greatly in time and space, with an average annual precipitation of approximately 363.3 mm, wet and rainy weather in winter, and sunshine and wind in summer. In addition, the high mountain canyon topography, heterogeneous geo-environment, and variable climate result in the frequent occurrence of geological disasters. XLG is located in the northern Hengduan Mountains, part of the SE Qinghai-Tibet Plateau of China, with a SW trend. The study area is crossed by crumbled rocks and numerous faults, including the riff-Riyu, Xumai-Niwu, and Zeng-Datong faults. These three faults are the key contributors to the intermediate part of the Jinsha River faulting junction zone, which stretches from north to south, with the riff-Riyu fault being the most fragmented, distorted, and deepest of the three. This fault is geologically located in a different geologic unit, whereby gneiss and greenschist of the Jinsha River Ophiolite suite (DTJ) form the broadest spectrum, followed by dolomite of the Silurian Yongren Group (S2c), garnet-mica schist and marble of the third Xiongsong Group (Pt2X3), conglomerate and sandstone of the Triassic (T3j), and lastly terraced fields and alluvial fans distributed in the Quaternary.
Under the influence of variable temperatures and active crustal movement, this forms a complicated system of geological faults with a diversity of delicate rock textures, leading to landslides and rockfalls.

Data collection
The current study presents the application of different data sources used in landslide susceptibility mapping. Datasets were obtained from the interpretation of remote sensing images downloaded from Google Earth, with a 4 m resolution, referenced using a Tianditu on-line map and a Ziyuan-3 (ZY-3) satellite image with a higher resolution of 2 m. Field survey validations were then performed during the study, including the collection of reliable landslide positions, and road, water, and land-use data. Furthermore, information on historical landslides was collected via interviews, the regional annals, and the local government. Precipitation data was provided by the China Meteorological Administration, a digital elevation model (DEM) of 2 m resolution was obtained from a ZY-3 satellite image, and a 12.5 m resolution image was taken from Earthdata, a data-sharing service of NASA's EOSDIS (Earth Observing System Data and Information System). We then digitized the lithology and structure from traditional maps at a scale of 1:50000. All datasets were stored in the same digital format with a UTM Zone 47 projection and WGS-84 datum. Figure 2 presents the methodological approach.

Inventory map and extracted factors
A landslide inventory map is one of the most important elements of a landslide susceptibility assessment (Salles and Duclaux 2015; Sun et al. 2020). Landslides typically occur in areas where they have already taken place, with a similar slope, lithology, elevation, fault environment, and so on (Cama et al. 2020). We created a landslide inventory map to demonstrate such relationships based on previous work. The online map and ZY-3 imagery were employed to identify landslide features (e.g., round armchair shapes, long strips, scarps, divergent texture, etc.; Figure 3 (a-c)) (Chang et al. 2020; Godone et al. 2020), and a field investigation was performed to ensure reliability (Figure 3 (d-f)). Historical hazards from the Bureau of Natural Resources were combined with the multidimensional hazards to create a landslide inventory map with the 286 disasters shown in Figure 1. We then converted the 286 landslide polygons into points so that each point falls in a grid cell and each grid cell contains at least a single feature cell. A total of 7,938 points remained after the removal of invalid points and was denoted as the landslide sample. This selection not only allows the points to represent the complete landslide information, but also improves the stability of the model by increasing the number of training samples. We employ 70% of the sample to train the model and 30% for validation. The same amount of pseudo-absence data was created and likewise randomly divided into 70% training and 30% validation groups when required. The study area comprised 583,615 pixels, of which 5,246 were determined as landslide pixels and 5,246 were selected as non-landslide pixels.
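The 70/30 split with balanced pseudo-absence data can be sketched as follows. This is an illustrative Python sketch (not the authors' workflow), with synthetic stand-ins for the 5,246 landslide and 5,246 pseudo-absence factor vectors and scikit-learn's `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 5246
landslide = rng.normal(1, 1, (n, 16))      # stand-ins for 16-factor vectors
non_landslide = rng.normal(0, 1, (n, 16))  # pseudo-absence pixels

X = np.vstack([landslide, non_landslide])
y = np.array([1] * n + [0] * n)

# 70% training / 30% validation, stratified so both classes keep the 1:1 ratio
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(len(X_tr), len(X_va))  # → 7344 3148
```

Stratification keeps the landslide/non-landslide balance identical in the training and validation subsets, which matters when the classifier's threshold is later tuned on validation data.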
In order to evaluate the factors affecting landslides, we selected four groups of topographical, geological, hydrological, and landuse variables according to the empirical consensus on the interaction between landslide locations. The variables include elevation, slope aspect, slope angle, plan curvature, profile curvature, lithology, distance from faults, distance from streams, distance from roads, landuse, normalized difference vegetation index (NDVI), topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), soil, and precipitation (Figure 4). The seven geological layers (slope aspect, plan curvature, profile curvature, slope angle, elevation, SPI, and STI) were obtained from the DEM using ArcGIS 10.2 (ESRI) (Figure 4(a-e), (m-n)). Similarly, ArcGIS was employed to extract distance maps from the remote sensing interpretation results, the Tianditu online map, and the DEM (Figure 4(f), (h), (i)). NDVI (Figure 4(g)) was determined from band calculations of Landsat 8 imagery in ENVI 5.3 (Exelis Visual Information Solutions) as $\mathrm{NDVI} = (B5 - B4)/(B5 + B4)$, where B4 and B5 are the red and near-infrared bands, respectively. We also calculated TWI (Figure 4(j)) via ArcGIS, which combines precipitation data (Figure 4(o)) from the China Meteorological Administration to precisely determine the impact of rainfall. The soil dataset (Figure 4(p)) was obtained from the National Earth System Science Data Center. Lastly, a seven-category landuse map was extracted by remote sensing image interpretation, containing residential, water, farmland, barren land, forest, grass, and snow-capped mountains (Figure 4(l)). All of these conditioning factors were built on a 12.5 m grid.
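The NDVI band calculation can be reproduced outside ENVI; a minimal NumPy sketch, assuming `red` and `nir` hold the B4 and B5 reflectance arrays:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index from red (B4) and
    near-infrared (B5) Landsat 8 reflectance arrays."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    # Guard against division by zero where both bands are zero
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom == 0, 0.0, (nir - red) / denom)

# Dense vegetation reflects strongly in NIR, so NDVI approaches 1
print(ndvi([0.05], [0.45]))  # → [0.8]
```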

Methodology
The machine learning (ML) method is commonly applied to landslide susceptibility analysis thanks to its learning advantage over other data classification algorithms. It simulates the human mind to decide data affiliations based on known data, modulating the relationships of data elements to inhibit and activate other data. Here, we select four methods: two linear (MaxEnt and SVM) and two nonlinear (RF and ANN). The decision boundary for classification with a linear method must be a straight line, although such methods can also fit curved samples. We compared the methods and selected the most suitable algorithm for the study area for further improvement, thus establishing an ML-integrated feature extraction method to realize landslide susceptibility mapping. In this section, we introduce the application of the Pearson correlation coefficient to make a preliminary assessment of the factors. The principles of the four machine learning methods are then introduced, followed by a description of the improved optimal model. Finally, the model validation is introduced.

Preliminary assessment of factors
The Pearson correlation coefficient was employed to compare the factors by measuring the linear relationship between two factors:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}}, \quad (1)$$

where N is the number of variable values. If r = 0, there is no linear correlation between x and y; the closer the correlation coefficient is to 1 or −1, the stronger the correlation, and the closer it is to 0, the weaker the correlation. The Pearson correlation coefficient relates different classes of factors that cannot be compared directly. The lower the correlation between factors, the higher the model reliability.
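As a concrete illustration of this screening step, a minimal NumPy implementation of the coefficient follows; the elevation and precipitation values are hypothetical, not the study's data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two factor layers,
    flattened to 1-D value vectors."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xm, ym = x - x.mean(), y - y.mean()
    return (xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum())

# Hypothetical samples: precipitation rising with elevation, as in the text
elevation = np.array([2100, 2600, 3200, 3900, 4500])
precip = np.array([300, 320, 355, 390, 420])
print(round(pearson_r(elevation, precip), 3))  # strongly correlated, near 1
```

A pair of factors scoring this high would be flagged as redundant, which is exactly the precipitation-elevation case the TRC step later addresses.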

Maximum entropy
Entropy is a measurement of information. Maximum entropy (MaxEnt) is based on the fundamental principle of preferring the known information to subjective data when extrapolating the unknown. However, more information will decrease the information entropy. Therefore, the MaxEnt tenet is that the most objective model is the one with the highest information entropy consistent with the known data. This model can forge links between various impacts and the interplay of these impacts and landslides. MaxEnt provides a description of the emphasis of each factor and the corresponding contribution rates. Due to its automatic selection of parameters and accurate forecasting, MaxEnt is now common in landslide predictions (Azareh et al. 2019).
Essentially, MaxEnt provides an estimate of probability. The information entropy of a system is the negative of the sum, over each possibility, of the product of its probability and the logarithm of that probability, as in Eq. (2):

$$H(P) = -\sum_{x} P(x) \log P(x), \quad (2)$$

where H(P) is the information entropy and P(x) is the probability. The probability ranges between 0 and 1 and its logarithm is below 0, thus when the negative is taken the entropy will be positive. We aim to determine the P with the highest H (i.e., the highest entropy) and subsequently calculate the derivative of H(P) to obtain the extremum. For example, consider a system with n events with the probability of each event equal to $p_i$; the entropy to maximize is

$$H(P) = -\sum_{i=1}^{n} p_i \log p_i, \quad (3)$$

with the constraint that the probabilities sum to 1:

$$\sum_{i=1}^{n} p_i = 1. \quad (4)$$

In order to satisfy Eqs. (3) and (4), we construct a Lagrange function:

$$L(p, \lambda) = -\sum_{i=1}^{n} p_i \log p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right), \quad (5)$$

and calculate its derivatives under the constraint to obtain the final odds.
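The constrained maximization described above can be checked numerically. The sketch below uses SciPy (not the MaxEnt software itself) and shows that, with only the normalization constraint, the entropy-maximizing distribution over four events is uniform:

```python
import numpy as np
from scipy.optimize import minimize

n = 4  # number of events

def neg_entropy(p):
    # H(P) = -sum p*log p; we minimize the negative to maximize entropy
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))

# Constraint: probabilities sum to 1 (Eq.-(4)-style normalization)
cons = {"type": "eq", "fun": lambda p: p.sum() - 1.0}
bounds = [(0.0, 1.0)] * n

# Start away from the answer to show convergence to the uniform distribution
res = minimize(neg_entropy, x0=[0.7, 0.1, 0.1, 0.1],
               bounds=bounds, constraints=cons, method="SLSQP")
print(np.round(res.x, 3))  # → [0.25 0.25 0.25 0.25]
```

Additional constraints from observed data (feature expectations) would pull the solution away from uniform, which is how MaxEnt fits landslide occurrence probabilities.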

Support vector machine
A support vector machine (SVM) is employed for sorting processes with high efficiency, identifying a hyperplane (Eq. (6)) within a high-dimensional data space to perform classification tasks (Figure 5(a)):

$$w \cdot x + b = 0, \quad (6)$$

where w and b are the constraint parameters (the weight vector and bias). We define the hyperplane L1, with two side planes L2 and L3, and call the distance between L2 and L3 the margin (Figure 5(b)). Although many planes lie between L2 and L3, only one maximizes the margin; this is denoted the optimal splitting face, and L1 is that plane. The points on L2 and L3 at the nearest distance from L1 are samples that help us classify, so we call them support vectors (Pandey et al. 2020). The distance between a support vector and L1 is calculated as

$$d = \frac{1}{\|w\|}, \quad (7)$$

thus, to calculate d, we must determine the constraint parameters w and b. In order to do this, we introduce Lagrange multipliers $\alpha_i \geq 0$ to obtain Eq. (8):

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]. \quad (8)$$

Setting the partial derivatives of L with respect to w and b to zero yields their values, and the resultant classification model can be described as

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right). \quad (9)$$

An SVM can be linear or nonlinear, depending on the presence of a kernel function. We chose a kernel function to maximize the accuracy and resolve nonlinear problems; options include the polynomial, radial basis function (RBF), and sigmoid kernels. The training data could not be immediately sorted as linear in the actual assignment, thus the kernel function played an important role (Zhu and Blumberg 2002). The data was separated by multiple segmentation, whereby the largest possible nearest distance from the samples was used to segment a surface. The selection of the final hyperplane was then based on a linear algorithm.
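The soft-margin classification with a sigmoid kernel can be sketched with scikit-learn; this is an illustration on synthetic clusters, not the study's landslide data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-in for landslide/non-landslide samples: two separated clusters
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# kernel="sigmoid" maps the non-separable samples; C weights the slack
# variables: a smaller C widens the transitional zone between L2 and L3
clf = SVC(kernel="sigmoid", C=1.0, gamma="scale").fit(X, y)
print(clf.score(X, y))
print(len(clf.support_vectors_))  # points on the margin planes L2 and L3
```

The fitted `support_vectors_` are exactly the nearest-distance samples described above; all other training points do not influence the decision function.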

Artificial neural networks
An artificial neural network (ANN) is a computer system that imitates the processes of the brain's neurons. The ANN is designed by analogy with a biological neuron, which includes the cell body, dendrites, axon, and synapses. Dendrites are inputs that receive signals from other cells, axons are outputs that send signals to other cells, and synapses are the interfaces of the inputs and outputs through which a signal passes from one nerve cell to another. The information from the inputs alters the neuron potential and accumulates continuously; when this exceeds a threshold, the neuron is activated and a pulse is generated and transmitted to the next neuron. Figure 6 presents an example of an ANN neuron. Neural network learning is known as training, allowing the neural network to respond to the external environment in a new pathway. Each neural network has an activation function y = f(x), which is fit through given x and y. ANNs are robust to training data: training samples may contain errors that do not affect the output. ANNs can also withstand long training times, which depend on the number of weights, the training samples, and the parameter settings (Dou et al. 2015), and the objective function used for the output can be a discrete value, a real value, or a vector of several real or discrete attributes. These advantages result in the strong applicability of ANNs in landslide susceptibility analysis.
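A minimal feed-forward network of the kind described can be sketched with scikit-learn's `MLPClassifier`; the hidden-layer size and data are hypothetical, and the 70/30 split mirrors the study's setup:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic two-class data standing in for landslide factor vectors
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# One hidden layer of 8 "neurons"; the logistic activation plays the role
# of the threshold firing described in the text
net = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    max_iter=2000, random_state=42).fit(X_tr, y_tr)
print(net.score(X_te, y_te))
```

Training iteratively adjusts the connection weights (the synapses) until the activation function reproduces the given input-output pairs.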

Random forest
Random forest (RF) is an ensemble of numerous decision trees. Ensemble methods include boosting and bagging, and RF is an improvement of bagging. During the bagging process, the training data is divided into n new training sets and an independent model is constructed on each. Finally, we integrate the results of these n models (Figure 7).
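The bagging-and-aggregation procedure can be sketched with scikit-learn on synthetic data; here `n_estimators` and `max_features` play the roles of the tree count n and the per-node feature count m:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# 16 conditional factors per pixel, as in the study; values are synthetic
X = rng.normal(size=(300, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two factors actually matter

# Each tree sees a bootstrap sample; each split considers max_features
# randomly chosen factors; oob_score reuses the left-out samples
rf = RandomForestClassifier(n_estimators=200, max_features=4,
                            oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)                              # out-of-bag accuracy
print(np.argsort(rf.feature_importances_)[-2:])   # the two informative factors
```

The `feature_importances_` vector is the same mean-decrease measure the study reports in Figure 9 for ranking conditional factors.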
We select N and M to represent the number of training cases and features, respectively. The number of input features m determines the decision result of a node in the decision tree. For each node, m features are randomly selected, and the decision of each node is determined based on these features. The optimal splitting mode is calculated based on the characteristics of the m features. Each tree is constantly pruned to complete the classification.

Improved optimal algorithm: TRC-SVM
The optimized SVM algorithm is denoted the trace ratio criterion-support vector machine (TRC-SVM). The TRC-SVM applies feature extraction to reduce the data dimensionality via the trace ratio, following the pretreatment of factors through the branch and bound method. The trace ratio algorithm is a typical feature selection algorithm of the filter model that is widely employed in the feature selection of machine learning frameworks (Li et al. 2017). The traditional feature selection method calculates the score of each single feature to evaluate the advantages and disadvantages of all the features, and the features with the highest scores are selected to form feature subsets. However, the feature subsets obtained via this method are not optimal due to interactions among multiple features. TRC calculates the ratio of traces on two layers pair by pair, and the optimal feature subset is extended to several new layers. The new and original layers are then input into the SVM model following the determination of the optimal feature subset. Therefore, the combined features perform better than the energies of the individual features (Wang et al. 2017). The TRC implements a new iterative method to directly calculate the score at the subset level in order to improve the optimization factors for SVM modelling.
Consider an original high-dimensional dataset x, $x \in R^D$, where D is the number of features per sample. Eq. (10) is employed to reduce the dimensions of the original data set x to obtain the new data set y:

$$y = W^{\mathrm{T}} x, \quad (10)$$

where W is the selection matrix.
The evaluation criteria of the feature subset are shown in Eq. (11):

$$\min_{W} \sum_{ij} \|y_i - y_j\|^2 (A_w)_{ij}, \qquad \max_{W} \sum_{ij} \|y_i - y_j\|^2 (A_b)_{ij}, \quad (11)$$

where matrices $A_w$ and $A_b$ describe the within-class and between-class relationships of the data, respectively. If $x_i$ and $x_j$ belong to the same class or are closely related to each other, the value of $(A_w)_{ij}$ will be relatively large; if not, it will be relatively small. Thus we aim to minimize $\sum_{ij} \|y_i - y_j\|^2 (A_w)_{ij}$. Moreover, since $A_b$ describes the between-class relationships of the data, when selecting feature subsets $\sum_{ij} \|y_i - y_j\|^2 (A_b)_{ij}$ should be maximized. Eq. (12) describes the feature subset with the highest score:

$$W^{*} = \arg\max_{W} \frac{\operatorname{tr}(W^{\mathrm{T}} X L_b X^{\mathrm{T}} W)}{\operatorname{tr}(W^{\mathrm{T}} X L_w X^{\mathrm{T}} W)}, \quad (12)$$

where $L_w$ and $L_b$ are the Laplacian matrices of $A_w$ and $A_b$. The feature subset obtained following this series of operations overcomes the factor defects and is used in the machine learning algorithm as the conditional factors.
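The within/between trace-ratio idea can be illustrated with a small NumPy sketch. This scores a candidate feature subset by the ratio of between-class to within-class scatter; it is a simplified stand-in for the criterion above (using class-mean scatter rather than the full affinity-graph Laplacians), not the authors' implementation:

```python
import numpy as np

def trace_ratio_score(X, y, subset):
    """Score a feature subset as tr(S_b) / tr(S_w) on the selected
    columns: large when classes are far apart and internally tight."""
    Xs = X[:, subset]
    overall = Xs.mean(axis=0)
    sw = sb = 0.0
    for c in np.unique(y):
        Xc = Xs[y == c]
        mu = Xc.mean(axis=0)
        sw += ((Xc - mu) ** 2).sum()                  # within-class scatter
        sb += len(Xc) * ((mu - overall) ** 2).sum()   # between-class scatter
    return sb / sw

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3 * y  # feature 0 separates the classes; features 1, 2 are noise
print(trace_ratio_score(X, y, [0]) > trace_ratio_score(X, y, [1, 2]))  # → True
```

Scoring subsets as a whole, rather than summing single-feature scores, is what lets the criterion discount redundant pairs such as precipitation and elevation.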

Assessing model performances
Measuring model performance is an essential process. Here, we employed the following metrics: the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), the root mean squared error (RMSE), and the frequency ratio (FR).
The ROC curve can effectively describe the classification performance of a classifier for samples with an uneven distribution (Vakhshoori and Zare 2018). The ROC space defines the false positive rate (1 − specificity) as the X-axis and the true positive rate (sensitivity) as the Y-axis. Different data pairs are obtained by adjusting the classification threshold of the classifier; these data pairs are the ROC points. The area below the ROC curve is denoted the AUC, which is also commonly used to evaluate the classification accuracy of the model. The closer the AUC is to 1, the better the performance (Marjanović 2013).
The RMSE is the square root of the mean squared deviation between the predicted and true values, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$ (Moayedi et al. 2019). It measures the deviation between the predictions and the observations; thus, the larger the value, the less stable the performance.
The FR is a map validation technique, calculated as the ratio of the percentage of landslide pixels per class to the percentage of total area pixels per class. A significant increase in the ratio from low to high susceptibility denotes a strong classification performance.
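The three metrics can be computed as follows; a scikit-learn/NumPy sketch on toy predictions (the FR example reuses the SVM very-high-class percentages reported in Table 1):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error

y_true = np.array([0, 0, 0, 1, 1, 1])                # toy labels
y_prob = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])    # model susceptibility scores

auc = roc_auc_score(y_true, y_prob)                  # closer to 1 = better ranking
rmse = np.sqrt(mean_squared_error(y_true, y_prob))   # lower = more stable

# FR per class: % of landslide pixels in the class / % of all pixels in it
def frequency_ratio(landslide_pct, area_pct):
    return landslide_pct / area_pct

print(auc, round(rmse, 3), round(frequency_ratio(62.83, 8.21), 2))
```

Here the scores rank every landslide above every non-landslide, so the AUC is 1.0, while the RMSE still penalizes scores that sit far from 0 or 1.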

Factor assessments
We analyzed the correlation of each factor pair prior to modelling (Figure 8; DTF: distance to faults, DTR: distance to roads, DTS: distance to streams, Pla: plan curvature, Pro: profile curvature). The majority of correlations were below 0.2, while some exhibited values between 0.2 and 0.4, which means the factors were weakly associated and could be used for modelling. A high correlation was observed between precipitation and elevation, as precipitation changes with elevation in the study area. The trace ratio criterion (TRC) can overcome this problem as the optimal algorithm.

MaxEnt
We constructed a MaxEnt model with the MaxEnt software (version 3.4.0) designed by Steven J. Phillips in Java. We compared the accuracy by adjusting the test and training samples, the divisible mode, and other parameters. The maximum number of iterations was set to 500 to avoid over-fitting and weak fitting. Figure 9 depicts the comparative contributions of the environmental variables to the MaxEnt model. In the first valuation, the regularization gain increment was added to the contribution of the corresponding variable in each iteration of the training algorithm, or deducted if the change was negative. In the second valuation, the training variables and background data were randomly permuted for each environmental variable (Pandey et al. 2020). Slope, aspect, and landuse were observed to be the most important factors influencing landslide susceptibility.

SVM
The SVM model was built in Matlab (2020a, MathWorks) and required the input of both landslide and non-landslide pixels. Several landslide control factors were discrete objects and nonlinearly separable, thus we employed a sigmoid kernel to change the linearly non-separable samples into separable samples. Not all samples need to satisfy the margin constraints exactly; this is handled by a loss function. We assigned a relaxation (slack) factor ξ ≥ 0 so that the function interval need only be greater than or equal to 1 − ξ. Denote by C the parameter controlling the weight of ξ: the smaller the value of C, the wider the transitional zone, and vice versa. More specifically, ξ can be regarded as a measure of the loss function, which arises because the linear SVM allows the existence of vectors in the transition band and compromises on some misclassified vectors (Pradhan 2013; Pandey et al. 2020). Figure 9 presents the SVM-determined factor contributions. Lithology, landuse, and aspect were identified as the most important factors influencing landslide susceptibility.

ANN
The ANN model was constructed in Matlab (2020a, MathWorks) with landslide and non-landslide pixels using memory-based learning. The input training and test data were employed to establish a memory model, and the results were stored in a large memory source. Given a new test vector, the learning process places it into a stored class for classification (Choi et al. 2012; Dou et al. 2015). A 50% training / 50% test split was tried first but did not perform well, so 70% training and 30% test data were adopted instead. Figure 9 displays the contribution of each factor, with lithology, aspect, and elevation observed as the dominant factors impacting landslides.

Application of RF
RF was developed in RStudio (version 3.4.1, RStudio) using the randomForest extension. We defined ntree as 700 and mtry as 6 to obtain the optimal results. After 450 iterations, the OOB (out-of-bag) error curve achieved a good fit with a low, stable error during modelling. Figure 9 depicts the weight of each conditional factor in the model (Chen et al. 2018a): the Mean Decrease Accuracy is based on the OOB error, and the Mean Decrease Gini is the purity of the subset after each partition. Slope, aspect, lithology, and elevation were the most important factors, making a reasonable control of deviation significant (Youssef et al. 2016). Figure 10 presents the landslide susceptibility maps determined via MaxEnt, SVM, ANN, and RF. The pixel values calculated using the models were classified using the natural break classification scheme: low susceptibility (0.00-0.40), moderate susceptibility (0.40-0.70), high susceptibility (0.70-0.90), and very high susceptibility (0.90-1.00) (Pourghasemi et al. 2012; Chen et al. 2018b). The classification should follow this distribution: the majority of landslides occur in very high susceptibility areas, several occur in high susceptibility areas, and few occur in low susceptibility areas (Ayalew and Yamagishi 2005; Wang et al. 2019). Table 1 reports the classification results, landslide pixels, and FR values.
The low susceptibility class was dominant, accounting for 40.52%, 52.22%, 50.63%, and 54.07% of the total area for MaxEnt, SVM, ANN, and RF, yet predicting only 8.66%, 4.06%, 14.68%, and 2.76% of landslide pixels, respectively. The moderate class accounted for 30.96%, 27.48%, 28.11%, and 25.65% of the area and predicted 17.29%, 9.77%, 12.70%, and 11.33% of the landslide pixels. The high susceptibility class accounted for 18.66%, 12.09%, 14.35%, and 14.24% of the area and predicted 21.84%, 23.33%, 20.81%, and 20.46% of the landslide pixels. The very high susceptibility class exhibited the smallest area (9.86%, 8.21%, 6.90%, and 6.04%) yet predicted the highest percentage of landslide pixels (52.19%, 62.83%, 51.81%, and 65.46%). The spatial distributions of the four maps followed similar trends.
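The natural-break class boundaries given above can be expressed as a simple lookup; the thresholds follow the text, while the function name and the handling of values falling exactly on a break are illustrative assumptions:

```python
def classify_susceptibility(p):
    """Assign a susceptibility class to a model output pixel value
    using the natural-break thresholds reported in the text."""
    if p < 0.40:
        return "L"   # low susceptibility (0.00-0.40)
    elif p < 0.70:
        return "M"   # moderate susceptibility (0.40-0.70)
    elif p < 0.90:
        return "H"   # high susceptibility (0.70-0.90)
    return "VH"      # very high susceptibility (0.90-1.00)

classes = [classify_susceptibility(p) for p in (0.15, 0.55, 0.80, 0.95)]
# -> ["L", "M", "H", "VH"]
```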

Model performance evaluation and validation
We employed map-based and mathematical verifications to evaluate and validate the study. The mathematical verification tests the model using the ROC curve and RMSE (Pourghasemi and Rahmati 2018). Table 2 reports the comprehensive comparison.
The rationality of each model classification was analyzed using the FR values in Table 1. The SVM was identified as the optimal model, with the lowest combined area of the VH and H classes (20.30%) and the highest percentage of landslide pixels (86.16%) of the four models (Table 1). This was followed by RF, with a combined VH and H area of 20.56% and a landslide pixel percentage of 83.81%. MaxEnt exhibited a combined VH and H area of 28.52% and a landslide pixel percentage of 74.03%, while the corresponding values for ANN were 21.25% and 72.62%. All models exhibited reasonable distributions, with the SVM optimal.
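The FR values underpinning this comparison can be reproduced from the Table 1 percentages; below is a minimal sketch using the SVM figures quoted in the text (the standard FR definition, landslide-pixel percentage divided by area percentage, is assumed):

```python
def frequency_ratio(landslide_pct, area_pct):
    """FR = percentage of landslide pixels in a class divided by
    the percentage of the total area covered by that class."""
    return landslide_pct / area_pct

# SVM figures from Table 1: class -> (percent of area, percent of landslide pixels)
svm = {"VH": (8.21, 62.83), "H": (12.09, 23.33),
       "M": (27.48, 9.77), "L": (52.22, 4.06)}
fr = {k: round(frequency_ratio(pix, area), 2) for k, (area, pix) in svm.items()}
# fr["VH"] -> 7.65, and FR declines monotonically from VH to L
```

An FR well above 1 in the VH class, declining toward 0 in the L class, is exactly the distribution a reasonable classification should exhibit.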
Machine learning (ML) methods generate a real value or probability prediction for a test sample and subsequently compare the predicted value with a classification threshold. A positive class is assigned if the prediction value is greater than the threshold; otherwise the sample is assigned to the negative class. We varied the threshold from 0 to the maximum prediction value of the learners. At the lowest threshold, every sample is predicted as positive; as the threshold increases, the number of samples predicted as positive decreases until no sample is positive. The corresponding ROC curves were obtained by calculating the false positive rate and the true positive rate at each threshold and plotting them as the horizontal and vertical coordinates, respectively. The ROC curve readily reveals the effect of any threshold on the generalization performance of the model (Youssef et al. 2016; Weidner et al. 2019) and can also be used to select the optimal threshold: the nearer the curve is to the upper left corner, the more precise the model. Our results locate the best threshold close to the upper left corner, with the lowest total number of false positives and false negatives and thus the fewest classification errors. We compared model performance based on the ROC curves, where the least distance from the upper left corner indicates the highest accuracy. The AUC was introduced to rank the models, since visual comparison is difficult when two ROC curves intersect (Mazzanti et al. 2020). In many practical applications, the larger the AUC, the better the model performance. Figure 11 presents the results of both the training and validation datasets for the four models. The AUC values of the training curves were determined as 0.835 (83.5%), 0.909 (90.9%), 0.857 (85.7%), and 0.885 (88.5%) for MaxEnt, SVM, ANN, and RF, respectively.
The corresponding AUC values of the validation curves were 0.807 (80.7%), 0.886 (88.6%), 0.844 (84.4%), and 0.862 (86.2%), respectively. All models exhibited values greater than 0.75, and the differences between training and validation data were less than 0.05, implying the applicability of the LSM approaches in the study area. SVM achieved the optimal training curve performance, followed by RF, ANN, and MaxEnt; the same trend was observed for the validation curves. Table 2 reports the RMSE of the four models, with SVM exhibiting the lowest value, followed by MaxEnt, RF, and ANN.
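The ROC/AUC and RMSE checks described above can be sketched with scikit-learn on mock predictions (the labels and scores below are synthetic, for illustration only):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, mean_squared_error

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)  # mock landslide (1) / non-landslide (0) labels
# mock model outputs: positives score higher on average, with noise
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)

auc = roc_auc_score(y_true, y_score)               # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # ROC curve coordinates
rmse = mean_squared_error(y_true, y_score) ** 0.5  # root mean squared error
```

Sweeping `thresholds` from high to low traces the curve from the origin toward (1, 1), which is the thresholding process the text describes.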

Application of TRC-SVM
Despite the reasonable classification results of the four models in the study area, their performance is poor compared to other areas, where AUC values close to 0.93 (93%) have been reported (Pham et al. 2016; Hu et al. 2020). We therefore improved the optimal algorithm by fusing the TRC and SVM to obtain the TRC-SVM. In addition, as the correlation between the precipitation and elevation factors is very high, the TRC-SVM can improve the classification via the robust recombination and optimization of factors. Figure 12 depicts the resulting LSM. The TRC-SVM exhibits a well-distributed LSM across the study area. The VH and H classes total 19.78% of the area with 85.41% of the landslide pixels, and the RMSE remains unchanged. The AUC values are also improved (Figure 13), reaching 0.943 for the training dataset and 0.930 for the validation dataset (Table 3). The results reveal the higher accuracy of the optimized algorithm.
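A minimal numpy sketch of the trace-ratio step assumed to underlie the TRC-SVM fusion is given below. It follows the standard iterative trace-ratio formulation (maximize tr(WᵀS_bW)/tr(WᵀS_wW) over orthonormal W); the function name, synthetic data, and the exact way the study couples this projection to the SVM are illustrative assumptions:

```python
import numpy as np

def trace_ratio_projection(X, y, n_components=2, n_iter=50, tol=1e-8):
    """Iterative trace-ratio solver: find an orthonormal projection W
    maximizing tr(W.T @ Sb @ W) / tr(W.T @ Sw @ W)."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    lam = 0.0
    for _ in range(n_iter):
        # top eigenvectors of (Sb - lam*Sw) give the current optimal W
        _, vecs = np.linalg.eigh(Sb - lam * Sw)
        W = vecs[:, -n_components:]
        new_lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return W, lam

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))              # 6 mock conditioning factors
y = (X[:, 0] - X[:, 1] > 0).astype(int)    # mock landslide / non-landslide labels
W, lam = trace_ratio_projection(X, y)
X_reduced = X @ W  # recombined factors, which an SVM could then be trained on
```

Because correlated factors (such as precipitation and elevation here) are recombined into discriminative directions rather than dropped, neither factor's information is discarded outright.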

Discussion
We initially employed 16 conditioning factors in the landslide susceptibility mapping (Dai and Lee 2003; Luo et al. 2019), and their contributions were analyzed prior to modelling (Vojteková and Vojtek 2020). The choice of conditioning factors is important, as it decides the precision of the susceptibility results.
Four models were compared: two linear (MaxEnt, SVM) and two nonlinear (ANN, RF) methods, all commonly used in landslide susceptibility assessments. RF was observed to outperform the ANN method in terms of LSM for future landslide predictions in eastern Turkey (Sevgen et al. 2019). A study based in China revealed the ability of ANN to outperform SVM. However, the ANN is not always optimal: in Sari County, Iran, SVM exhibited better LSM classification results than the ANN (Kalantar et al. 2018). Moreover, a comparison of ANN, SVM, and MaxEnt demonstrated the latter to have a slight disadvantage (Chen et al. 2017). Numerous studies reveal the algorithm results to be a function of the sample classification, factor combinations, and even the research area, yet the underlying trend is the same. In the current study, we determined the most suitable machine learning algorithm for LSM among the four models. Following multiple sample classifications and comprehensive factor selections, all models exhibited strong performances. However, the SVM achieved the highest AUC (0.909/0.886, training/validation dataset) and lowest RMSE (0.012), as well as reasonable FR values, indicating it to be the most suitable model for the study area. Based solely on the evaluation metrics (Table 2), RF and ANN are on a par with MaxEnt: the models rank SVM, MaxEnt, RF, and ANN in terms of RMSE; SVM, RF, ANN, and MaxEnt when AUC is used as the indicator; and RF, SVM, ANN, and MaxEnt when judged by FR values. In-depth investigation is not possible through these comparisons alone, thus we further employed field investigations and the analysis of landslide conditioning factors. The number of pixels in each LSM grade was compared with the corresponding number of landslide pixels (Table 1), which can also be regarded as a comparison of the FR under different LSM grades.
This ratio should decline from VH to L, verifying the high reliability of the models. Moreover, the gradient of this ratio from VH to L can be used to further judge the models. Since the H and M classes do not imply the absence of landslides, it is reasonable for the H value to be larger than that of M without the two being very close to each other. By this measure, the SVM is the best model, as it exhibits the best gradient among the four.
We then performed a detailed analysis of the factor performances in order to further explore the differences and similarities among the four models. As not all factors were of great significance, the five most influential factors were selected for analysis. Table 4 reports the relationship between landslides and the controlling factors, and the corresponding results of these factors and the LSMs determined from the four algorithms. Each factor was divided into several grades (classes) in order to compute three statistics: the proportion of each class within a factor; the number of landslide pixels in a class; and the domain of each susceptibility level (VH, H, M, and L) for each model and class. The dominant conditioning factors were lithology, slope, aspect, elevation, and landuse. Figure 13 compares the detailed classification in VH of the five conditioning factors for the four models, where the pie represents the class proportion in VH and the height is the classification proportion in the conditioning factor (Figure 14). Previous research based on machine learning models determined aspect, slope, and elevation as dominant factors impacting landslides (Nsengiyumva et al. 2019; Dou et al. 2020), particularly in XLG. Landslides to the southeast exhibited maximum VH levels across all models and are marked in red in Table 4. The 30-40° slope class accounted for the highest VH level, agreeing with observations in the literature (Dahal 2014; Youssef et al. 2016). Our results also identified elevation as a significant factor in landslides, with a maximal duty ratio of 3,800-4,200 m for all four models at the VH level (Table 4) (Youssef et al. 2016; Cislaghi et al. 2019). The B m T lithology exhibited the highest proportion of very high susceptibility areas (nearly 50%) in all four models.
However, its lower proportion of total area (28.73%) (red in Table 4) reveals B m T to have the greatest influence on landslides in XLG. Moreover, the forest class was observed to be the dominant landuse (red in Table 4). Comparisons of the maximum percentage at the VH level for each class across all conditional factors indicate high consistency and model reliability.
Based on the strong correlation between the precipitation and elevation conditioning factors, we proposed several improvement measures. The four commonly used models achieved excellent susceptibility classification results; however, the improved synthesis algorithms exhibited a better performance (Zhao and Zhao 2021). This study improved the optimal algorithm of the four models through two key steps: (1) the high correlation between precipitation and elevation would allow precipitation to be ignored in the factor refinement classification. However, precipitation has been identified as an important LSM factor (Bui et al. 2015) and thus could not be ignored in modelling. The TRC algorithm was therefore introduced into the optimal model (SVM) as the TRC-SVM. Despite no improvement in the RMSE (0.012), the TRC-SVM increased the classification accuracy and maintained stability. (2) The proposed model enhanced the AUC values from 0.886 to 0.930 for the validation dataset and from 0.909 to 0.943 for the training dataset. Comparisons of the FR values reveal the slight superiority of the TRC-SVM over the SVM in VH, although this was not the case for the gradient. The improved algorithm thus exhibited a better classification effect for the XLG.

Conclusions
Landslide susceptibility mapping is of great significance in large-scale landslide prediction and is used with increasing frequency in areas where it is difficult to conduct a full field investigation (e.g., southwest China). In the current study, we conducted landslide susceptibility mapping based on four machine learning methods, namely two linear (MaxEnt and SVM) and two nonlinear (ANN and RF) algorithms that are representative approaches for landslide susceptibility mapping. Following a comprehensive review of the literature, 16 conditional factors were summarized and calculated via the four models. We aimed to comprehensively select the most suitable susceptibility evaluation model for XLG and similar areas. The AUC was slightly lower than in some other studies, although the models were consistent. Thus, the optimal SVM algorithm was improved with the proposed TRC-SVM, enhancing the classification results. This algorithm can be applied to similar mountainous areas for susceptibility zoning. These results also provide an effective basis for government decision-making and urban planning for disaster prevention and mitigation, as well as resident relocation, due to landslides in southwest China.
All models employed the same 16 factors in order to ensure a fair comparison among the four models; no classification analysis was performed to determine which factors are suitable for a certain model. The factor combinations may affect the modelling process. This is a disadvantage of the SVM and may be overcome by integrating the TRC algorithm with ANN, RF, or MaxEnt; the investigation of such problems is reserved for future work. Our work demonstrates the ability of the TRC to optimize machine learning classification to some extent, contributing to the factor optimization of the SVM algorithm.