GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models

ABSTRACT The main purpose of this paper is to explore potential applications of machine learning techniques, namely the kernel logistic regression (KLR), Naïve-Bayes tree (NBTree), and alternating decision tree (ADTree) models, for landslide susceptibility analysis in Taibai County (China). Initially, a landslide inventory map containing 212 historical landslide locations was prepared. Seventy percent (148) of the landslides were randomly selected for training the models and the remaining landslides were used for validation. Additionally, 12 landslide conditioning factors were considered and the thematic layers were prepared in GIS. Subsequently, the three models were applied to build landslide susceptibility maps. The performances of the models were compared using receiver operating characteristic (ROC) curves, the kappa index, and statistical evaluation measures. The results show that the KLR model has the highest area under the ROC curve (AUC) values, 0.910 and 0.936 for the training and validation datasets, respectively. The KLR model also has the highest goodness-of-fit (84.5%) for the training dataset, whereas the NBTree model has the highest goodness-of-fit (91.4%) for the validation dataset. Overall, the KLR model shows the best balance between training and validation performance. The results of this study demonstrate the benefit of selecting the optimal machine learning technique in landslide susceptibility mapping.


Introduction
A landslide is defined as the downslope movement of soil and rock under the influence of gravity (Malamud et al. 2004). It is one of the most frequent and catastrophic geologic hazards, causing enormous casualties and severe economic losses (Myronidis et al. 2016; Pradhan et al. 2016). However, the complexity of soil and rock conditions, topography, hydrology, and human activities makes accurate landslide forecasting extremely difficult and presents a major challenge to the global change research community (Wu & Sidle 1995; Pielke et al. 2003; Gokceoglu & Sezer 2012).
A broad range of methods and techniques have been suggested by earlier researchers to improve the prediction accuracy of landslides. These methods and techniques are mainly physically based or statistically based (Van Westen & Terlien 1996; Duc 2013; Thanh & De Smedt 2014). The physically based methods evaluate the safety factor of slopes from detailed geomorphic and geologic properties at site-specific locations and generally provide accurate results. However, they are expensive and impractical for regional landslide susceptibility assessments.
The kernel logistic regression (KLR) is a kernel version of logistic regression that constructs a linear logistic regression model in a high-dimensional space by using a kernel function (generally the radial basis function). It is well suited to classifying data that are difficult to separate in the original input space. The Naïve-Bayes tree (NBTree) is a hybrid of the decision tree and Naïve-Bayes algorithms: it takes advantage of the simple structure of a decision tree and creates a Naïve-Bayes classifier at each leaf node to facilitate classification. The alternating decision tree (ADTree) develops the structure of a decision tree and combines it with a boosting algorithm. The main advantages of the NBTree and the ADTree are that their structures are easier to construct and their classification rules are easier to interpret.
The main purpose of this study is to compare and explore potential applications of three models, namely KLR, NBTree, and ADTree, in landslide susceptibility mapping at Taibai County (China). These models have been used in only a few studies on landslide susceptibility mapping, where they showed satisfactory results (Hong et al. 2015; Pham et al. 2015b; Pham et al. 2016; Tien Bui et al. 2016b). Therefore, further investigation of these new methods for landslide susceptibility mapping is needed.

Study area
The study area is located in the hinterland of the Qinling Mountains, between longitudes 107°03′E and 107°46′E and latitudes 33°38′N and 34°10′N. It covers a surface area of about 2,780 km² (Figure 1). The altitudes of the study area range from 740 m to 3,767 m. A continental monsoon climate prevails in the study area, together with an alpine climate. The mean annual temperature is 7.8 °C, the mean annual sunshine duration is about 1,840 hours, and the mean annual precipitation is around 700 mm, most of which falls between July and September.

Landslide inventory
It is well known that information about past and present landslides greatly helps to predict future landslides. Hence, it is crucial to gather reliable data when compiling the landslide inventory map. In the present study, the landslide inventory map was compiled from early reports and aerial photograph interpretation, coupled with comprehensive field surveys using GPS (Global Positioning System) to locate the landslide positions. A total of 212 landslides were mapped in the study area (Figure 1). The landslide inventory consists of 197 slides, 10 falls, and 5 debris flows (Hungr et al. 2014). The smallest landslide is about 20 m², the largest is about 1.1 × 10⁵ m², and the average is about 7,500 m². Several techniques have been adopted to represent landslide polygons as points, such as centroids (Xu et al. 2012a; Dou et al. 2015; Oliveira et al. 2015; Chen et al. 2016e; Youssef et al. 2016; Chen et al. 2017), seed cells (Süzen & Doyuran 2004a, 2004b), and diagnostic areas (Lombardo et al. 2014). Because more than 85% of the landslides in the study area are smaller than 10,000 m², the centroid technique was used to transform landslide polygons into points in the present study.
The landslide inventory also shows that most landslides occurred along the main valleys and road networks. This suggests that the road networks have a strong impact on landslide occurrence (Brenning et al. 2014; Tien Bui et al. 2014).

Landslide conditioning factors
In this study, 12 factors were chosen according to the literature review and the general geo-environment of the study area. These conditioning factors are slope aspect, slope angle, altitude, profile curvature, plan curvature, NDVI, land use, lithological unit, distance to rivers, distance to roads, distance to faults, and mean annual precipitation. A digital elevation model (DEM) with a resolution of 30 m × 30 m was extracted from the ASTER GDEM data. The data-sets were collected from the National Aeronautics and Space Administration (NASA) website (http://reverb.echo.nasa.gov/reverb/). The DEM was used to extract slope aspect, slope angle, altitude, profile curvature, and plan curvature of the study area using ArcGIS 10.0 software (Figure 2(a-e)). Landsat 8 OLI images with a resolution of 30 m × 30 m were used to extract NDVI and land use using ENVI 5.1 software (Figure 2(f,g)). The data-sets were collected from the U.S. Geological Survey (USGS) website (http://landsat.usgs.gov/landsat8.php). The lithological unit map was compiled from the geological map at 1:500,000 scale, and the units were grouped into six classes according to their lithological characters and geological ages (Figure 2(h)). Distance to rivers, distance to roads, and distance to faults were produced from the topographic map (1:50,000) and the geological map (1:500,000) (Figure 2(i-k)). The mean annual precipitation map was compiled from the precipitation data of the study area (Figure 2(l)) (http://www.sxmb.gov.cn/index.php). The detailed information on the classes of each landslide conditioning factor is shown in Table 1. Finally, all thematic maps were converted to the same resolution of 30 m × 30 m, giving a grid of 2,240 columns and 2,020 rows for the study area.
In this study, the correlation between landslides and conditioning factors was calculated using the frequency ratio model (Akgun et al. 2008; Chen et al. 2016b), and the calculated frequency ratios of each landslide conditioning factor were normalized to the range 0-1. The normalized frequency ratios were then used as inputs to build the models and produce the landslide susceptibility maps.
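The frequency ratio step can be illustrated with a minimal Python sketch. It assumes the usual definition of the frequency ratio (share of landslides in a class divided by the share of the study area in that class) and assumes min-max scaling for the 0-1 normalization, since the text does not state which normalization was used. The class counts are hypothetical and are not taken from Table 1.

```python
def frequency_ratio(landslide_counts, pixel_counts):
    """Return the frequency ratio of each class of one conditioning factor."""
    total_landslides = sum(landslide_counts)
    total_pixels = sum(pixel_counts)
    ratios = []
    for n_slides, n_pixels in zip(landslide_counts, pixel_counts):
        share_of_landslides = n_slides / total_landslides
        share_of_area = n_pixels / total_pixels
        ratios.append(share_of_landslides / share_of_area)
    return ratios


def min_max_normalize(values):
    """Rescale values linearly to the range 0-1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical counts for a three-class factor (not taken from Table 1).
landslides_per_class = [10, 30, 60]
pixels_per_class = [5000, 3000, 2000]

fr = frequency_ratio(landslides_per_class, pixels_per_class)
fr_norm = min_max_normalize(fr)
print(fr)
print(fr_norm)
```

A ratio above 1 marks a class in which landslides are over-represented relative to its area; a ratio below 1 marks an under-represented class.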

KLR model
KLR is a kernel version of logistic regression. It aims to classify data that are difficult to separate in the original input space by constructing a linear logistic regression model in a high-dimensional feature space. The KLR model can be written as follows:

$$\operatorname{logit}\{y(x)\} = w \cdot \varphi(x) + b$$

where x is the vector of input variables (landslide conditioning factors), w is the weight vector, φ(·) applies a nonlinear transformation to the input variables, and b is a constant. The logit function can be written as follows:

$$\operatorname{logit}\{p\} = \log\frac{p}{1 - p}$$

Hence, inverting the logit function gives:

$$y(x) = \frac{1}{1 + \exp\{-(w \cdot \varphi(x) + b)\}}$$

A simplification is available in the calculation of φ(x): because the inner product φ(x)·φ(x′) is the only form in which φ appears in the calculation procedure, the inner product between the images of two vectors is defined as the kernel function. That is:

$$K(x, x') = \varphi(x) \cdot \varphi(x')$$

Mercer's condition (Mercer 1909) must be satisfied for this interpretation of the kernel function to be valid. Several kernel functions have been suggested, such as the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel (Lin & Lin 2003). In this study, the RBF kernel is applied, as it is the most commonly used in practical applications (Cawley & Talbot 2005). The RBF kernel is as follows:

$$K(x, x') = \exp\{-\delta \lVert x - x' \rVert^{2}\}$$

where δ is the tuning parameter that controls the sensitivity of the kernel. An optimal w can be found by minimizing a cost function according to the representer theorem (Kimeldorf & Wahba 1971; Schölkopf et al. 2001).
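To make the role of the kernel concrete, the following is a minimal Python sketch of a kernel logistic regression with an RBF kernel. It is not the implementation used in this study. It assumes the dual form implied by the representer theorem, f(x) = Σ αᵢ K(xᵢ, x) + b, an L2-type penalty (λ/2) αᵀKα, and plain gradient descent; the names `fit_klr`, `predict_proba`, `lam`, and `lr`, and all parameter values, are hypothetical. The toy data are an XOR pattern, which is not linearly separable in the input space.

```python
import math


def rbf_kernel(x, z, delta):
    """RBF kernel K(x, z) = exp(-delta * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-delta * sq_dist)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_klr(X, y, delta=2.0, lam=0.01, lr=0.5, n_iter=300):
    """Fit dual coefficients alpha and bias b by plain gradient descent.

    Minimizes the logistic loss plus (lam / 2) * alpha^T K alpha
    (an assumed cost function, chosen for illustration).
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], delta) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    for _ in range(n_iter):
        k_alpha = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        residual = [sigmoid(k_alpha[i] + b) - y[i] for i in range(n)]
        grad_alpha = [
            sum(K[i][j] * residual[j] for j in range(n)) + lam * k_alpha[i]
            for i in range(n)
        ]
        grad_b = sum(residual)
        alpha = [alpha[i] - lr * grad_alpha[i] for i in range(n)]
        b -= lr * grad_b
    return alpha, b


def predict_proba(x, X, alpha, b, delta=2.0):
    """Predicted probability of the positive class for one input vector."""
    f = sum(a * rbf_kernel(xi, x, delta) for a, xi in zip(alpha, X)) + b
    return sigmoid(f)


# XOR-like toy data: not linearly separable in the input space.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]

alpha, b = fit_klr(X, y)
probs = [predict_proba(x, X, alpha, b) for x in X]
print([round(p, 3) for p in probs])
```

A linear logistic regression cannot separate this pattern, whereas the kernelized model assigns probabilities above 0.5 to the two positive points and below 0.5 to the two negative points.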

Naïve-Bayes tree model
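However, InfoGain is biased toward attributes having many distinct values, for which the corresponding number of splits is unreasonable. To solve this problem, InfoGain is normalized by SplitInfo in the C4.5 decision tree. The SplitInfo is defined as follows:

$$\mathrm{SplitInfo}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

where S is the set of training samples and S_1, …, S_n are the subsets of S produced by splitting on attribute A. SplitInfo is a kind of entropy on the split point of an attribute. The information gain ratio in the C4.5 decision tree is defined as follows:

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{InfoGain}(S, A)}{\mathrm{SplitInfo}(S, A)}$$

Naïve-Bayes is a popular learning algorithm in pattern recognition due to its simplicity and linear run-time. It assumes that the predictive attributes a_1, a_2, …, a_m are conditionally independent given the class attribute c_j, a member of the class set C. The joint distribution is then given compactly by:

$$P(a_1, a_2, \ldots, a_m \mid c_j) = \prod_{i=1}^{m} P(a_i \mid c_j)$$

Naïve-Bayes chooses the class that maximizes the posterior probability, so the classification rule of NB is:

$$c_{NB} = \arg\max_{c_j \in C,\; j = 1, \ldots, k} P(c_j) \prod_{i=1}^{m} P(a_i \mid c_j)$$

where k is the total number of classes.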
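The effect of normalizing InfoGain by SplitInfo can be seen in a short Python sketch. It assumes the textbook C4.5 definitions (entropy in bits, SplitInfo as the entropy of the partition sizes); the function names and the two toy attributes are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(attribute_values, labels):
    """Information gain of an attribute divided by its split information."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    info_gain = entropy(labels) - sum(
        len(subset) / total * entropy(subset) for subset in subsets.values()
    )
    split_info = -sum(
        len(subset) / total * math.log2(len(subset) / total)
        for subset in subsets.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]  # 1 = landslide, 0 = non-landslide (toy data)
two_valued = ["near", "near", "far", "far"]  # separates the classes in 2 splits
id_like = ["a", "b", "c", "d"]  # one distinct value per sample

gr_two_valued = gain_ratio(two_valued, labels)
gr_id_like = gain_ratio(id_like, labels)
print(gr_two_valued, gr_id_like)
```

Both toy attributes have the same information gain (1 bit), but the attribute with one distinct value per sample has twice the split information, so its gain ratio is halved.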

Alternating decision tree model
The alternating decision tree is a classifier combining a boosting algorithm and a decision tree (Freund & Mason 1999). Its representation is a direct generalization of the decision tree in which each decision node is replaced by two nodes: a splitter node and a prediction node. A base rule mapping from instances to real numbers consists of a precondition c_1, a base condition c_2, and two real numbers a and b. The prediction is a if c_1 ∧ c_2, or b if c_1 ∧ ¬c_2, where ¬ denotes negation (NOT). The values of a and b are calculated as:

$$a = \frac{1}{2} \ln \frac{W_{+}(c_1 \wedge c_2)}{W_{-}(c_1 \wedge c_2)}, \qquad b = \frac{1}{2} \ln \frac{W_{+}(c_1 \wedge \neg c_2)}{W_{-}(c_1 \wedge \neg c_2)}$$

where W(p) represents the total weight of the training instances that satisfy the predicate p, and W_+ and W_- restrict this total to the positive and negative instances, respectively. The best c_1 and c_2 are selected by minimizing Z_t(c_1, c_2), which is defined as:

$$Z_t(c_1, c_2) = 2\left(\sqrt{W_{+}(c_1 \wedge c_2)\, W_{-}(c_1 \wedge c_2)} + \sqrt{W_{+}(c_1 \wedge \neg c_2)\, W_{-}(c_1 \wedge \neg c_2)}\right) + W(\neg c_1)$$

If R_t denotes the current set of base rules, the new rule set is defined as R_{t+1} = R_t + r_t, where r_t(x) represents the two prediction values (a and b) at each layer of the tree (with T prediction layers) and x represents a set of instances. The classification of an instance is the sign of the sum of all the prediction values in the final rule set:

$$\mathrm{class}(x) = \operatorname{sign}\left(\sum_{t=1}^{T} r_t(x)\right)$$

The algorithm starts by finding the best constant prediction for the entire data-set (Freund & Mason 1999). A cross validation is usually applied to make the selection (Dietterich 2000).
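As a rough illustration of how one base rule is scored, the Python sketch below computes the prediction values a and b and the criterion Z_t from instance weights. It assumes the unsmoothed forms given by Freund & Mason (1999); practical implementations often add a smoothing constant to avoid division by zero. The weight values and function names are hypothetical.

```python
import math


def rule_values(w_pos_true, w_neg_true, w_pos_false, w_neg_false):
    """Prediction values of one base rule.

    a applies when the precondition and the base condition both hold;
    b applies when the precondition holds and the base condition fails.
    """
    a = 0.5 * math.log(w_pos_true / w_neg_true)
    b = 0.5 * math.log(w_pos_false / w_neg_false)
    return a, b


def z_value(w_pos_true, w_neg_true, w_pos_false, w_neg_false, w_outside):
    """Selection criterion Z_t; w_outside is the weight failing the precondition."""
    return (
        2.0
        * (
            math.sqrt(w_pos_true * w_neg_true)
            + math.sqrt(w_pos_false * w_neg_false)
        )
        + w_outside
    )


def classify(prediction_values):
    """Sign of the summed prediction values reached by an instance."""
    return 1 if sum(prediction_values) > 0 else -1


# Hypothetical weights: the base condition holds mostly for landslide pixels.
a, b = rule_values(w_pos_true=8.0, w_neg_true=2.0, w_pos_false=2.0, w_neg_false=8.0)
z = z_value(8.0, 2.0, 2.0, 8.0, w_outside=0.0)
label = classify([0.1, a])
print(a, b, z, label)
```

A candidate condition that separates the weighted classes more cleanly makes the products under the square roots smaller, and therefore yields a smaller Z_t.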

Preparation of training and validation data-set
In landslide susceptibility analysis, the dependent variable is a binary variable representing landslide or non-landslide. The models introduced above should be constructed using a training data-set that contains landslide pixels and non-landslide pixels, and validated with a validation data-set that also contains both. It is generally recommended to use equal numbers of landslide and non-landslide pixels (Yesilnacar & Topal 2005; Nefeslioglu et al. 2008; Tien Bui et al. 2016b). With respect to replicate sampling, it is reported that single-sample studies run the risk of accidentally yielding a poor model, and the calculation of multiple models based on independent random samples should be advocated (Heckmann et al. 2014). Brenning (2005) took 50 samples to compare error rates across different sample sizes and statistical methods. Lombardo et al. (2016) also produced a set of 50 randomly generated replicates to assess landslide triggering thickness susceptibility. Cama et al. (2016) evaluated the robustness of predictive results over a total of 80 repetitions. However, Heckmann et al. (2014) pointed out that these may not be enough and extracted 1,000 replicates. In this paper, 70% of the landslide locations (centroids, each represented as one pixel) were selected randomly to build the landslide susceptibility models and the remaining 30% were used for validation. Then, 212 non-landslide pixels were randomly selected outside the landslide polygons and were also split into two parts (70/30) (Figure 1). Therefore, the final training data-set contained 70% (148) of the landslide pixels (points) and 148 non-landslide pixels. The validation data-set contained the remaining 30% (64) of the landslide pixels (points) and 64 non-landslide pixels.
This generating and splitting process was repeated more than 10 times, and the goodness-of-fit and prediction ability of each replicate were assessed through the area under the receiver operating characteristic (ROC) curve (AUC) and the statistical evaluation measures to find the best combination. In addition, when running the models, the 10-fold cross-validation method was used to reduce the variability of the model results.
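The replicate sampling described above can be sketched in Python as follows. The integer identifiers stand in for pixel locations, and the seeds and the function name `split_70_30` are hypothetical; only the counts (212 landslide and 212 non-landslide pixels, split 70/30) come from the text.

```python
import random


def split_70_30(items, seed):
    """Shuffle a copy of items and split it into 70% training, 30% validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.7)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 212 landslide centroids and
# the 212 randomly selected non-landslide pixels.
landslide_ids = list(range(212))
non_landslide_ids = list(range(1000, 1212))

replicates = []
for seed in range(10):  # one independent random replicate per seed
    ls_train, ls_valid = split_70_30(landslide_ids, seed)
    nl_train, nl_valid = split_70_30(non_landslide_ids, seed + 100)
    replicates.append((ls_train + nl_train, ls_valid + nl_valid))

train, valid = replicates[0]
print(len(train), len(valid))
```

Each replicate would then be used to fit and evaluate the models, and the replicates compared on AUC and the statistical evaluation measures.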

Multicollinearity analysis of landslide conditioning factors
In landslide susceptibility mapping, it is important to check for multicollinearity among the landslide conditioning factors. The tolerance (TOL) and the variance inflation factor (VIF) are two widely used indexes for multicollinearity checking. A TOL value less than 0.1 or a VIF value larger than 10 indicates serious multicollinearity (Tien Bui et al. 2011). In this study, the TOL and VIF values were calculated with the training dataset using IBM SPSS Statistics 21 software, and the results are shown in Table 2. The results show that there is no multicollinearity among the 12 landslide conditioning factors.
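For readers without SPSS, the two indexes can be reproduced with a short Python sketch. It relies on the standard identity that the VIF of factor j equals the j-th diagonal element of the inverse of the correlation matrix of the factors, with TOL = 1/VIF. The two factor columns below are hypothetical values, not data from this study, and the function names are illustrative.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tol(columns):
    """VIF is the diagonal of the inverse correlation matrix; TOL = 1 / VIF."""
    inverse = invert(correlation_matrix(columns))
    vif = [inverse[j][j] for j in range(len(columns))]
    tol = [1.0 / v for v in vif]
    return vif, tol


# Hypothetical normalized values of two conditioning factors at five pixels.
factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
factor_b = [2.0, 1.0, 4.0, 3.0, 5.0]

vif, tol = vif_and_tol([factor_a, factor_b])
serious = [v > 10 or t < 0.1 for v, t in zip(vif, tol)]
print(vif, tol, serious)
```

The same function applies unchanged to a list of 12 columns, one per conditioning factor.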

Model evaluation and assessment
Statistical evaluation measures
By using different cut-off values, the predicted probabilities were classified into one of the response levels (i.e. landslide or non-landslide). For each cut-off value, some misclassification always occurs. TP (true positive) and TN (true negative) represent the numbers of landslides and non-landslides that were correctly classified, whereas FP (false positive) represents the number of non-landslides that were misclassified as landslides and FN (false negative) represents the number of landslides that were misclassified as non-landslides. Sensitivity, specificity, and accuracy were calculated as follows:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Accuracy is the proportion of landslide and non-landslide pixels that were correctly classified; sensitivity is the proportion of landslide pixels that were correctly classified as landslide occurrences; specificity is the proportion of non-landslide pixels that were correctly classified as non-landslide.
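The three measures follow directly from the four counts, as the Python sketch below shows. The counts are illustrative and are not taken from the paper's tables: they were chosen for a balanced set of 148 landslide and 148 non-landslide pixels so that the rounded measures agree with the values reported in the text for the KLR model on the training data-set (0.865, 0.824, and 0.845).

```python
def evaluation_measures(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # landslides correctly classified
    specificity = tn / (tn + fp)  # non-landslides correctly classified
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Illustrative counts (an assumption, not reported in the paper).
tp, fn = 128, 20  # 148 landslide pixels
tn, fp = 122, 26  # 148 non-landslide pixels

sensitivity, specificity, accuracy = evaluation_measures(tp, tn, fp, fn)
print(round(sensitivity, 3), round(specificity, 3), round(accuracy, 3))
```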

The ROC curve and kappa index
If a susceptibility model yields a high percentage of correct classifications and low percentages of false positive and false negative classifications, it can be considered a valid model (Gobin et al. 2001). The model validity can be represented graphically on an ROC curve, which is created by plotting sensitivity versus 1 − specificity as the cut-off value varies. Because the ROC curve alone does not summarize the performance of a model in a single number, the area under the ROC curve (AUC) is usually applied for quantitative comparison. An AUC value of 1 indicates a perfect model that correctly classifies all landslide and non-landslide pixels, while an AUC value of 0.5 indicates a non-informative model (Walter 2002).
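Both indexes named in this section can be computed in a few lines of Python. The sketch assumes two standard definitions that the text does not spell out: the AUC as the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel (ties counted as one half), and the kappa index as Cohen's kappa. The scores are hypothetical, and the confusion counts are illustrative values chosen to be consistent with the training results reported for the KLR model.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def cohen_kappa(tp, tn, fp, fn):
    """Agreement between prediction and observation, corrected for chance."""
    total = tp + tn + fp + fn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical susceptibility scores; 1 = landslide, 0 = non-landslide.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = auc_from_scores(scores, labels)

# Illustrative counts for 148 landslide and 148 non-landslide pixels.
kappa = cohen_kappa(tp=128, tn=122, fp=26, fn=20)
print(auc, round(kappa, 3))
```

With balanced classes the chance agreement is 0.5, so kappa reduces to twice the accuracy minus one.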

Importance of conditioning factors by different models
The importance of the conditioning factors for the different models is shown in Tables 3-5. It can be seen that all the conditioning factors contributed to the models. Distance to roads, with an average merit (AM) value of 0.309, is the most important factor for the three models, followed by altitude and distance to rivers, whose importance is also similar across the three models. The other conditioning factors have different contributions depending on the model used. In addition, all three models yield small contributions for profile curvature, plan curvature, and slope aspect. Because all 12 conditioning factors have positive contributions to the three models, all of them were used in the analysis to build the landslide susceptibility maps.

Model results and analysis
In the present study, the KLR model was applied to extract the spatial relationship between landslides and conditioning factors. A forward stepwise process was carried out to incorporate the conditioning factors that make an important contribution to landslide occurrence. The calculated probability values were classified into five relative susceptibility classes using the geometrical interval method: very low (0.001-0.017) (11.20%), low (0.017-0.058) (28.06%), moderate (0.058-0.157) (23.53%), high (0.157-0.400) (15.49%), and very high (0.400-0.998) (21.72%) (Figures 3 and 6).
The NBTree model was applied to classify the pixels using the training data-set. Each pixel contains the information of the conditioning factors and the class (landslide or non-landslide). At each decision node, a Naïve-Bayes classifier is constructed based on the sample set that filters into the node. The NBTree continues splitting a data subset that includes instances of different classes and stops at a 'pure' data subset in which all instances are of the same class. At each leaf node (prediction node), a value is given representing the contribution to landslide (or non-landslide). All the contributions encountered in the NBTree were summed to give the final prediction, which was used to generate the landslide susceptibility map. The prediction values were divided into five relative susceptibility classes using the geometrical interval method: very low (0.009-0.013) (15.68%), low (0.013-0.028) (29.92%), moderate (0.028-0.180) (21.53%), high (0.180-0.464) (14.11%), and very high (0.464-0.912) (18.84%) (Figures 4 and 6).
The ADTree model was used to estimate the susceptibility index for each pixel using the training data-set. Pixels travel down the paths defined by the ADTree model. When a decision node is reached, the path continues with the child node corresponding to the outcome of the decision node. When a prediction node is reached, the path continues with all of the children of the node. The susceptibility index was calculated by summing the values of all prediction nodes encountered in the ADTree. In this study, the geometrical interval method was applied to divide the landslide susceptibility index into five relative susceptibility classes: very low (0.056-0.196) (22.09%), low (0.196-0.356) (31.23%), moderate (0.356-0.540) (21.95%), high (0.540-0.749) (12.32%), and very high (0.749-0.990) (12.41%) (Figures 5 and 6).
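Once the class breaks are known, assigning a pixel to a susceptibility class is a simple lookup, sketched below in Python. The break values are those reported above for the KLR model; the sketch does not reproduce the geometrical interval method that derived them, and assigning a value that falls exactly on a break to the upper class is an assumption. The names `KLR_BREAKS` and `susceptibility_class` are hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first four KLR classes, as reported in the text.
KLR_BREAKS = [0.017, 0.058, 0.157, 0.400]
CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_class(probability, breaks=KLR_BREAKS):
    """Map a predicted probability to one of five susceptibility classes.

    A value exactly on a break goes to the upper class (assumed convention).
    """
    return CLASS_NAMES[bisect_right(breaks, probability)]


for p in (0.010, 0.100, 0.500):
    print(p, susceptibility_class(p))
```

The same function can be reused for the NBTree and ADTree maps by passing their reported break values as the `breaks` argument.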

Model performance and validation
The KLR, NBTree, and ADTree models were constructed using the training data-set, and the 10-fold cross-validation method was used during the training and validation process to reduce the variability of the results. The results are shown in Table 6. The ROC curves and AUC values for KLR, NBTree, and ADTree are shown in Figure 7.
The KLR has the highest accuracy with a value of 0.845, followed by the ADTree (0.841) and the NBTree (0.814). For sensitivity, the ADTree shows the highest value of 0.892, indicating that the probability of correctly classifying landslide pixels into the landslide class is 89.2%, followed by the KLR (0.865) and the NBTree (0.858). The KLR has the highest specificity (0.824), meaning that 82.4% of the non-landslide pixels are correctly classified into the non-landslide class. It is followed by the ADTree (0.791) and the NBTree (0.770).
The AUC values are 0.910, 0.880, and 0.903 for the KLR, NBTree, and ADTree models, respectively. These values indicate a high degree of fit to the training data-set, with the highest value for the KLR model. The kappa indexes are 0.689, 0.628, and 0.682 for the KLR, NBTree, and ADTree models, respectively, which indicates substantial agreement between prediction and observation.
The prediction capability of the trained susceptibility models was assessed using the validation dataset. The AUC value, the kappa index, and the three statistical indexes introduced above were used in the validation process. The results are shown in Table 7. The ROC curves and AUC values for the three models are shown in Figure 8.
The results show that the NBTree has the highest classification accuracy (0.914) for the validation data-set, followed by the KLR model (0.898) and the ADTree (0.820). For sensitivity, the highest value is for the NBTree (0.984), indicating that 98.4% of the landslide pixels are correctly classified into the landslide class. The KLR has the second highest sensitivity value of […] followed by the ADTree model (0.917). The kappa indexes are 0.797, 0.828, and 0.641 for the KLR, NBTree, and ADTree models, respectively. The results also indicate a substantial agreement between predicted and observed landslides.
Figure 6. Percentages of different landslide susceptibility classes for three models.

Discussion and conclusion
Spatial prediction of landslides is considered useful for land-use planning and is the first important step in landslide hazard and risk assessment (Fell et al. 2008). Therefore, it is necessary to select a susceptibility model with high prediction capability, which depends on the method used. Various methods for landslide susceptibility modelling have been suggested by previous researchers, but the prediction accuracy of these methods is still debated (Akgun 2012). Some advanced machine learning methods such as KLR, NBTree, and ADTree have been applied in the biomedical field (Liu et al. 2005; Lee et al. 2006), the acoustic field (Katz et al. 2006; Karsmakers et al. 2007), the Internet field (Cai et al. 2000), etc., and showed satisfactory results. The investigation of new methods is therefore necessary and will help reach reasonable conclusions. Because these models have seldom been applied to landslide susceptibility mapping, this study carried out an investigation and comparison of the KLR, NBTree, and ADTree models for landslide susceptibility mapping.
The prediction capability of a susceptibility model depends on the method used. The goodness-of-fit and the validation results of the three susceptibility models are good, and there is no significant difference between the AUC values of the three models (Figures 7 and 8). The statistical indexes, however, differ considerably between the models. In terms of classification accuracy, the KLR and the ADTree models are very close for the training data-set (0.845 and 0.841) and are about 3 percentage points higher than the NBTree model (0.814). For the validation data-set, the NBTree model has the highest classification accuracy of 91.4%. Therefore, the NBTree model showed notable prediction power for the validation data-set but a comparatively weak goodness-of-fit for the training data-set. In contrast, the ADTree has a high degree of goodness-of-fit for the training data-set but made a comparatively poor prediction for the validation data-set. In general, the KLR model is more balanced between the training and validation datasets in terms of the statistical indexes (Tables 6 and 7), while the NBTree and the ADTree models vary considerably between the two datasets.
Not all the selected landslide conditioning factors have equal predictive capabilities, and some of them may create noise that reduces the prediction quality (Tien Bui et al. 2016b). In the present study, no multicollinearity was found among the 12 landslide conditioning factors (Table 2), and all the conditioning factors contribute to the models. However, the results also showed that different conditioning factors had different contributions to the models (Tables 3-5). In general, distance to roads, altitude, distance to rivers, and land use have the highest importance for the three models, whereas profile curvature, plan curvature, and slope aspect reveal low predictive capabilities. In order to illustrate the importance of the landslide conditioning factors to the three models visually, the normalized predictive capabilities of the conditioning factors were calculated for each model to indicate their relative importance (Figure 9). The contribution of distance to roads occupies the highest percentages, 17.667%, 19.846%, and 18.945% for the KLR, NBTree, and ADTree models, respectively. This is related to the fact that 80.4% of the landslide locations occurred along the main valleys and road networks (within 500 m) (Figure 1). The high contribution of distance to roads also explains why the very high susceptibility classes follow a spatial pattern similar to that of the roads. The slope aspect occupies the lowest percentages, 2.916%, 1.092%, and 2.391% for the three models, respectively. The NDVI contributes more than the lithological unit for the NBTree and ADTree models, while the opposite holds for the KLR model. Therefore, it can be concluded that landslide conditioning factors tend to have different contributions depending on the type of model used (Tien Bui et al. 2016b).
Considering the overall performance and the model construction, the KLR model is considered the better model in this paper, because it shows a stable classification ability on both the training and validation datasets. The landslide susceptibility maps produced in this study could help land-use planners choose suitable construction sites.