A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China

ABSTRACT The main objective of this study was to produce landslide susceptibility maps for Langao County, China, using a novel hybrid artificial intelligence method, labeled the RF-NBT model, that is based on rotation forest ensembles (RFEs) and naïve Bayes tree (NBT) classifiers. The spatial database consisted of eighteen conditioning factors that were selected using the information gain ratio (IGR) method. The model was evaluated using quantitative statistical criteria, including the sensitivity, specificity, accuracy, root mean squared error (RMSE), and area under the receiver operating characteristic curve (AUC). Furthermore, the new model was compared with the NBT, functional tree (FT), logistic model tree (LMT) and reduced-error pruning tree (REPTree) soft computing benchmark models. The findings indicated that the RF-NBT model showed an increased prediction accuracy relative to the NBT model using both the training and validation datasets, and the RF-NBT model exhibited a greater capability for landslide susceptibility mapping. The new RF-NBT model also yielded the best results compared with the FT, LMT and REPTree models. Finally, an analysis of the landslide density (LD) using the RF-NBT model demonstrated that the very high susceptibility (VHS) class had the highest LD (3.552) among the susceptibility classes. These results can be used for the planning and management of areas vulnerable to landslides in order to prevent damage caused by such natural disasters.


CONTACT Himan Shahabi h.shahabi@uok.ac.ir

Introduction
Landslides are natural disasters that cause intensive, widespread damage to buildings and infrastructure, and they are the cause of countless casualties and economic distress in many nations worldwide (Akgun and Türk 2010). Direct and indirect landslide-related economic losses in China amount to more than CNY 20 billion every year, making the lives of local inhabitants difficult (Quan-min et al. 2005). Furthermore, most of the geological disasters, such as landslides, collapses, fault slips, land subsidence, ground fissures and mine disasters, that have occurred in the central regions of China (e.g., Shaanxi Province) have been due to the extensive extraction of groundwater and the massive exploitation of mineral resources, including oil and gas resources (Dong et al. 2014). Accordingly, landslides generate immense economic losses and casualties every year throughout China. In this respect, it is essential to understand and predict future landslides to mitigate their consequences through landslide susceptibility mapping (LSM). LSM provides risk managers with easily accessible, continuous, and accurate information about the occurrence of landslides (Shadman Roodposhti et al. 2016). However, to our knowledge, a standard procedure for the production of LSMs does not exist (Ercanoglu and Gokceoglu 2004). Hence, many techniques and methods have been applied to generate LSMs in many regions across the globe in recent decades. Some examples include the use of logistic regression (Miller and Degg 2012; Conoscenti et al. 2015; Tsangaratos et al. 2017), bivariate statistics (Yalcin 2008; Youssef et al. 2015; Wang et al. 2015a; Shirzadi et al. 2017b), multivariate regression (Pradhan 2010; Felicísimo et al. 2013; Conoscenti et al. 2015; Wang et al. 2015a), multivariate adaptive regression splines (Felicísimo et al. 2013; Conoscenti et al. 2015; Wang et al. 2015a), generalized additive models (GAMs) (Goetz et al. 2011; Persichillo et al. 2016; Chen et al. 2017d), discriminant analysis (Guzzetti et al. 2006; Dong et al. 2009; He et al. 2012), weighted linear combinations (Ayalew et al. 2004), frequency ratios (Regmi et al. 2014; Shahabi et al. 2014; Romer and Ferentinou 2016), weights of evidence (Pourghasemi et al. 2013; Wang et al. 2016; Tsangaratos et al. 2017), analytic hierarchy processes (AHP) (Pourghasemi and Rossi 2017; Zhang et al. 2016), and evidential belief functions (EBFs) (Althuwaynee et al. 2014; Jebur et al. 2015; Pourghasemi and Kerle 2016). These models were additionally employed by some researchers for the production of geomorphological landform susceptibility maps to examine karst collapse features (Papadopoulou-Vrynioti et al. 2013) and conduct multi-hazard assessments in urban areas (Bathrellos et al. 2012; Chousianitis et al. 2016; Bathrellos et al. 2017).
Recently, machine learning techniques have become more popular among researchers for studies on the assessment of hazard susceptibilities. Machine learning is a branch of artificial intelligence that uses computer algorithms to analyze and predict information through learning from training data (Jordan and Mitchell 2015). In addition to the abovementioned methods, various machine learning algorithms have been applied for analyses of landslide susceptibilities, such as neuro-fuzzy systems (Oh and Pradhan 2011; Chen et al. 2017a; Chen et al. 2017e), artificial neural networks (Dou et al. 2015; Tien Bui et al. 2016b; Chen et al. 2017f), kernel logistic regression (Tien Bui et al. 2016b; Chen et al. 2017g), decision trees (DTs) (Tien Bui et al. 2014; Hong et al. 2015; Lombardo et al. 2015), support vector machines (Hong et al. 2016b; Tien Bui et al. 2016b; Chen et al. 2017c), naïve Bayes classifiers (Tsangaratos and Ilia 2016; Pham et al. 2016b; Shirzadi et al. 2017a), and boosted regression trees (Hong et al. 2015).
Moreover, new hybrid methods are being used in LSM studies due to their novelty and ability to provide comprehensive evaluations of landslide-related independent variables for each class of independent layers (Dehnavi et al. 2015; Tien Bui et al. 2015; Nasiri Aghdam et al. 2016; Tien Bui et al. 2016a; Pham et al. 2017b; Chen et al. 2017b). The use of these new hybrid models, which boast higher recognition precision and prediction power relative to data mining models, is therefore important for the prediction of landslide-prone areas in landslide studies.
This study differs from the aforementioned studies in that it proposes and evaluates a novel hybrid artificial intelligence approach based on the rotation forest ensemble (RFE) method and the naïve Bayes tree (NBT) classifier, referred to as the RF-NBT method, as a framework that can improve the accuracy of landslide prediction models. The relevant data processing and modeling processes were carried out using the ArcGIS 10.2 and Weka 3.7.12 software packages.

Study area
Langao County is located between longitudes 108°38′ and 109°11′ E and latitudes 31°56′ and 32°32′ N and occupies an area of 1,956 km². Topographically, 95% of the study area is mountainous. According to the digital elevation model (DEM) data used in this study, the altitudes of the study area range from 309 m to 2,635 m above sea level and increase from the north to the south. Slope angles in the area reach more than 80°. The study area is dominated by a subtropical and continental monsoon climate characterized by warm temperatures, high humidity and abundant rainfall. The average annual rainfall is 1,050 mm. The annual average temperature is 15 °C. Human engineering and economic activities in Langao County, Shaanxi Province, China, primarily include slope cutting, agricultural exploitation, highway construction, hydropower station construction and mine exploitation, all of which may trigger the occurrence of a landslide. The river system is well developed within the study area. The land-use types of the study area are primarily farmland, bare land, residential areas, water, forestland and grassland.

Landslide inventory
A landslide inventory map is composed of historical landslide data and other relevant information that includes topographical and geological data and meteorological conditions (Chen et al. 2016; Kumar and Anbalagan 2016). Such maps provide basic information and are prerequisites for evaluating landslide susceptibilities. Furthermore, landslide inventory maps are essential for studying the spatial relationships between the distributions of landslides and their conditioning factors (Pradhan and Kim 2016). In this case study, the landslide data were compiled using existing historical records and the interpretation of satellite imagery coupled with extensive field surveys to produce a detailed and reliable landslide inventory map, following which a total of 288 landslides were identified in the study area. According to analyses in a GIS environment, the smallest landslide area is approximately 75 m², the largest is more than 28,000 m², and the average landslide surface area is 2,700 m². Therefore, to facilitate this study, the shape and scale of every landslide in the dataset were ignored. Rather, each landslide was simplified as a centroid point (Figure 1) and presented as a single pixel (25 m × 25 m) in the following analysis. Figure 2 shows two pictures of landslide locations in the study area.

Landslide conditioning factors
The occurrence of a landslide in an area is controlled by various conditioning factors (Yuan et al. 2013;Yuan et al. 2015;Yuan et al. 2016;Hong et al. 2016a). However, there is no general rule for selecting the appropriate landslide conditioning factors (Ayalew and Yamagishi 2005). In this study, the landslide conditioning factors were selected based on an analysis of the previous literature, the general features of the geo-environment and the availability of data for the study region. Ultimately, eighteen conditioning factors were selected: the slope, aspect, elevation, plan curvature, profile curvature, SPI (Stream Power Index), TWI (Topographic Wetness Index), LS (Slope Length), rainfall, lithology, land use, NDVI (Normalized Difference Vegetation Index), distance to rivers, distance to faults, distance to roads, river density, fault density, and road density.
A DEM with a resolution of 25 m × 25 m was constructed using an ASTER Global DEM provided by the International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). The DEM data were used to extract various topographic parameters, including the slope, aspect, elevation, plan curvature, and profile curvature (Figure 3a, b, c, d, and e, respectively). Several hydrologic factors, including the SPI, TWI and LS (Figure 3f, g, and h, respectively), were also derived from the DEM data. A rainfall map was compiled from the precipitation data of the study area (Figure 3i). The geological factors, including the lithology, distance to faults, and fault density (Figure 3j, n, and p, respectively), were compiled and produced using a geological map at a scale of 1:1,000,000 in ArcGIS 10.0. Among these data, the lithologic units were grouped into 8 classes according to their lithological characteristics and geological ages (Table 1). Landsat 8 OLI images, which were also provided by the International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn), were used to extract the land-use and NDVI maps (Figure 3k and l, respectively) using the ENVI 5.1 software package. The other buffering factors and density factors, including the distance to rivers, distance to roads, river density and road density (Figure 3m, o, q, and r, respectively), were derived from the topographic data using ArcGIS 10.0. The detailed classification of the eighteen landslide conditioning factors is shown in Table 2. All eighteen thematic maps were eventually converted into a raster format with the same resolution of 25 m × 25 m for further analysis.

Naïve Bayes tree
The NBT is a hybrid algorithm of the naïve Bayes technique and a DT (Kohavi 1996). Naïve Bayes originates from research on pattern recognition and is widely used for classification problems in data mining and machine learning fields due to its simplicity and linear run time (Hall 2007; Farid et al. 2014). This algorithm is a simple probabilistic method that can predict class membership probabilities (Farid et al. 2014). Naïve Bayes (NB) classifiers choose the class that maximizes the posterior probability. The classification rule of NB is as follows:
$$y_{NB} = \underset{y_i,\; i = 1, \ldots, k}{\arg\max}\; P(y_i) \prod_{j=1}^{n} P(x_j \mid y_i),$$
where $x = (x_1, \ldots, x_n)$ is the vector of attribute values of an instance, $y_i$ is a class label, and $k$ is the total number of classes. The biggest drawback of NB is its strong assumption that the attributes are independent, which is also what makes it so simple. Therefore, to weaken the attribute independence assumption of naïve Bayes, an NBT was proposed (Wang et al. 2015b). This model uses a DT for its general structure and deploys a naïve Bayes classifier on each leaf node of the constructed DT, and the NBT demonstrates a remarkable classification performance in terms of the classification accuracy (Liang and Yan 2006; Wang et al. 2015b). During the construction of an NBT, a measure of the classification accuracy is used as the splitting criterion rather than a measure of the information gained. To classify a given landslide instance, an NBT sorts the instance down the tree from the root node to a leaf node and then uses the training instances that fall into this leaf node to construct a naïve Bayes classifier (Wang et al. 2015b). An NBT often outperforms DT or naïve Bayes models individually in terms of the classification accuracy and AUC (Kohavi 1996; Liang and Yan 2006).
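The NB decision rule described above, which an NBT applies at each leaf, can be illustrated with a minimal Python sketch. The function name `nb_predict`, the attribute values and all probabilities are hypothetical and invented purely for illustration; the sketch also assumes that every conditional probability is non-zero.

```python
import math

def nb_predict(priors, likelihoods, x):
    """Return the class y maximizing P(y) * prod_j P(x_j | y).

    priors:      {class: P(class)}
    likelihoods: {class: [ {value: P(x_j = value | class)} for each attribute j ]}
    x:           list of attribute values, one per attribute
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        # Sum logarithms instead of multiplying probabilities to avoid underflow.
        score = math.log(prior)
        for j, value in enumerate(x):
            score += math.log(likelihoods[y][j][value])
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Toy probabilities (hypothetical): class 1 = landslide, class 0 = non-landslide.
priors = {1: 0.5, 0: 0.5}
likelihoods = {
    1: [{"steep": 0.8, "gentle": 0.2}, {"near_river": 0.7, "far_river": 0.3}],
    0: [{"steep": 0.3, "gentle": 0.7}, {"near_river": 0.4, "far_river": 0.6}],
}
print(nb_predict(priors, likelihoods, ["steep", "near_river"]))
```

In an NBT, the priors and likelihoods would be estimated only from the training instances that reach the leaf in question, rather than from the whole training dataset.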

Rotation forest ensemble
Various classifier ensembles, e.g., RF, boosting, and bagging, are employed in machine learning to improve the prediction accuracies of base classifiers (Liu and Huang 2008). Among these, RFEs are relatively new, as they were first introduced in 2006 (Rodriguez et al. 2006). Compared to the use of a single base classifier, classifier ensembles are generally more accurate (Ozcift and Gulten 2011).
RFEs use principal component analysis (PCA) to extract features and create the training datasets for learning the base classifiers (Koyuncu and Ceylan 2013; Pham et al. 2016d). The structure of the RF algorithm is given as follows (Ozcift and Gulten 2011). Let the training dataset $X$ be an $N \times n$ matrix of $N$ samples and $n$ features, and let $F$ denote the feature set. Additionally, suppose the class labels are $\{\omega_1, \ldots, \omega_c\}$. Assume that the feature set of the dataset is randomly partitioned into $K$ subsets and that the $L$ DTs of the RF algorithm are denoted by $\{D_1, \ldots, D_L\}$. The following three steps are necessary to construct the training dataset for a classifier $D_i$.
(I) $F$ is randomly divided into $K$ feature subsets, each of which contains $M = n/K$ features.
(II) Let $F_{ij}$ be the $j$th subset of features used to train the classifier $D_i$, and let $X_{ij}$ be the dataset $X$ restricted to the features in $F_{ij}$. A nonempty random subset of $X_{ij}$ is then drawn via bootstrap resampling to form a new training set $X'_{ij}$, whose sample size is generally smaller than that of $X_{ij}$. A linear transformation is then applied to $X'_{ij}$ to produce the coefficients of the matrix $C_{ij}$, namely $a_{ij}^{(1)}, \ldots, a_{ij}^{(M_j)}$, each of size $M \times 1$.
(III) The sparse rotation matrix $R_i$ is assembled from the obtained coefficient matrices $C_{ij}$ as follows:
$$R_i = \begin{bmatrix} C_{i1} & 0 & \cdots & 0 \\ 0 & C_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_{iK} \end{bmatrix}$$
The matrix $R_i$ is then rearranged to match the original order of the features, which gives $R_i^a$. For a given test sample $x$, the confidence for each class is calculated using the average combination method as follows:
$$\mu_j(x) = \frac{1}{L} \sum_{i=1}^{L} d_{ij}(x R_i^a), \quad j = 1, \ldots, c,$$
where $d_{ij}(x R_i^a)$ is the probability generated by the classifier $D_i$ for the hypothesis that $x$ belongs to the class $\omega_j$. Finally, $x$ is assigned to the class with the largest confidence (Pham et al. 2016d).
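The bookkeeping in steps (I) and (III) and the average combination rule can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the number of features is assumed to be divisible by K, and the per-subset coefficient matrices are passed in as given (identity blocks in the usage lines), so the bootstrap and PCA of step (II) are deliberately omitted.

```python
import random

def partition_features(n_features, k, rng):
    """Step (I): randomly split feature indices 0..n-1 into k subsets of size M = n / k."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    m = n_features // k  # assumes n_features is divisible by k
    return [idx[j * m:(j + 1) * m] for j in range(k)]

def build_rotation_matrix(subsets, coeff_blocks, n_features):
    """Step (III): place each M x M coefficient block into a sparse n x n matrix.

    Rows are indexed by the ORIGINAL feature order, so a sample x can be
    multiplied with the result directly (the rearranged matrix).
    """
    R = [[0.0] * n_features for _ in range(n_features)]
    col = 0
    for feats, C in zip(subsets, coeff_blocks):
        for r, f in enumerate(feats):
            for c in range(len(feats)):
                R[f][col + c] = C[r][c]
        col += len(feats)
    return R

def rotate(x, R):
    """Row-vector product x R."""
    return [sum(x[f] * R[f][c] for f in range(len(x))) for c in range(len(R[0]))]

def average_confidence(per_classifier_probs):
    """Average combination: mu_j = (1/L) * sum_i d_ij; returns (winning class index, mu)."""
    L = len(per_classifier_probs)
    n_classes = len(per_classifier_probs[0])
    mu = [sum(p[j] for p in per_classifier_probs) / L for j in range(n_classes)]
    return max(range(n_classes), key=lambda j: mu[j]), mu

rng = random.Random(0)
subsets = partition_features(6, 3, rng)
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for PCA coefficients
R = build_rotation_matrix(subsets, [identity] * 3, 6)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
label, mu = average_confidence([[0.2, 0.8], [0.4, 0.6]])
```

With identity blocks the rotation only permutes the features; in an actual rotation forest each block would hold the principal-component coefficients obtained for that feature subset, and each base classifier would be trained on its own rotated copy of the data.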

Data preparation
There is no general rule for determining the minimum size of a training dataset (Nefeslioglu et al. 2008). In the present study, 195 (approximately 2/3 of the total) landslide events were selected randomly for the training data while the other 93 (approximately 1/3 of the total) landslide events were used as validation data. The same number of non-landslide points was randomly selected from the landslide-free areas; subsequently, those points were randomly divided in the same ratio as the landslide points to construct the training and validation data. The presence of a landslide was assigned a value of '1', while the absence of a landslide was assigned a value of '0'. In this study, the correlations between the landslides and conditioning factors were calculated using a frequency ratio model (Park et al. 2013; Chen et al. 2016). The calculated frequency ratio for each landslide conditioning factor was extracted for both landslide and non-landslide points to construct the training and validation datasets used to run the models.
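The random split and the frequency ratio can be sketched as follows. The sketch assumes the commonly used definition of the frequency ratio (the share of landslides falling in a factor class divided by the share of the study area covered by that class), which the text does not spell out; the function names, the seed and the example counts are hypothetical.

```python
import random

def frequency_ratio(landslides_in_class, total_landslides, pixels_in_class, total_pixels):
    """FR = (share of landslides in the class) / (share of area in the class)."""
    return (landslides_in_class / total_landslides) / (pixels_in_class / total_pixels)

def split_points(points, n_train, seed=0):
    """Randomly split a list of points into training and validation subsets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

landslide_ids = list(range(288))  # the 288 inventoried landslides
train, valid = split_points(landslide_ids, n_train=195)
print(len(train), len(valid))

# Hypothetical counts for one class of one conditioning factor.
fr = frequency_ratio(50, 100, 2500, 10000)
```

A frequency ratio above 1 indicates that landslides are over-represented in the class relative to its area, and a value below 1 indicates the opposite.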

Selection of training factors using the information gain ratio
During landslide susceptibility modeling, some conditioning factors might have little ability to predict the results. In some cases, conditioning factors may even generate noise that reduces the predictive capacity of the model. Therefore, it is necessary to recognize and remove conditioning factors with low or null predictive abilities to obtain a model with a greater prediction accuracy (Tien Chen et al. 2017h). In this study, the predictive abilities of the conditioning factors were determined using the information gain ratio (IGR) proposed by Quinlan (Quinlan 1993). Assume that a training dataset $D$ consists of $n$ input samples and that $n(C_i, D)$ is the number of samples in the training dataset $D$ that belong to the class $C_i$ (i.e., landslide or non-landslide). The information needed to classify $D$ is calculated as
$$\mathrm{Info}(D) = -\sum_{i=1}^{2} \frac{n(C_i, D)}{|D|} \log_2 \frac{n(C_i, D)}{|D|},$$
where $|D| = n$. The amount of information needed to split $D$ into $(D_1, D_2, \ldots, D_m)$ with respect to the landslide conditioning factor $F$ is estimated as follows:
$$\mathrm{Info}(D, F) = \sum_{j=1}^{m} \frac{|D_j|}{|D|}\, \mathrm{Info}(D_j).$$
The IGR for the landslide conditioning factor $F$ is calculated as
$$\mathrm{IGR}(D, F) = \frac{\mathrm{Info}(D) - \mathrm{Info}(D, F)}{\mathrm{SplitInfo}(D, F)},$$
where SplitInfo is the potential information generated by dividing the training dataset $D$ into $m$ subsets and is computed as
$$\mathrm{SplitInfo}(D, F) = -\sum_{j=1}^{m} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}.$$
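The gain ratio computation can be sketched in a few lines of Python, assuming Quinlan's standard definition (information gain divided by split information) and a conditioning factor that has already been discretized into classes. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): information needed to classify the samples in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(factor_values, labels):
    """IGR(D, F) = (Info(D) - Info(D, F)) / SplitInfo(D, F)."""
    n = len(labels)
    groups = {}
    for v, y in zip(factor_values, labels):
        groups.setdefault(v, []).append(y)  # subsets D_1..D_m induced by F
    info_d = entropy(labels)
    info_d_f = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0  # a factor with a single class carries no information
    return (info_d - info_d_f) / split_info

labels = [1, 1, 0, 0]                     # 1 = landslide, 0 = non-landslide
informative = ["a", "a", "b", "b"]        # separates the two classes perfectly
uninformative = ["a", "b", "a", "b"]      # unrelated to the class labels
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A factor whose gain ratio is zero, like the second toy factor, contributes nothing to separating landslide from non-landslide samples and is a candidate for removal.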

Training the RF-NBT model
To construct a landslide model, eighteen landslide conditioning factor maps (slope, aspect, elevation, plan curvature, profile curvature, SPI, TWI, LS, rainfall, lithology, land use, NDVI, distance to rivers, distance to faults, distance to roads, river density, fault density and road density) were converted to a grid cell format with a spatial resolution of 25 m. Subsequently, the calculated frequency ratio for each landslide conditioning factor was extracted for every landslide and non-landslide point to construct the training and validation datasets. The training dataset was used to build landslide susceptibility models, while the validation dataset was employed to evaluate the prediction capabilities of the models (Chen et al. 2017h). Using the training dataset, two landslide models (NBT and RF-NBT) were constructed and a heuristic test was conducted to find the best parameters for the two models. After their construction, the landslide models were applied throughout the whole study area to produce LSMs.

Evaluation and comparison methods
In the present study, the models were evaluated using quantitative statistical criteria, including the sensitivity, specificity, accuracy, root mean-squared error (RMSE), and the AUC. The confusion matrix used for both the training and validation datasets consisted of a 2 × 2 contingency table in which four types of possible results, TP, FP, TN and FN, are categorized. According to the definition, TP and TN are the number of pixels that are correctly classified as landslides and non-landslides, respectively. Meanwhile, FP is the number of pixels incorrectly classified as landslides and FN is the number of pixels incorrectly classified as non-landslides. Based on these four categories, the sensitivity, specificity and accuracy are calculated using the following formulas:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
The RMSE was also used in this study to validate the models and was obtained as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{\mathrm{predicted},i} - X_{\mathrm{actual},i} \right)^2},$$
where $X_{\mathrm{predicted}}$ denotes the values predicted by the landslide susceptibility model, $X_{\mathrm{actual}}$ represents the actual (target) values in the training or validation dataset, and $n$ is the total number of samples in the training or validation dataset. In addition to the abovementioned statistical criteria, another standard and useful way to evaluate the performances of the landslide models is to employ the receiver operating characteristic (ROC) curve (Pham et al. 2016b). Graphically, the ROC curve is constructed by plotting the sensitivity on the y-axis against 1 − specificity on the x-axis. The AUC quantitatively indicates the general performance of a model, and higher AUC values indicate better model performances (Pradhan 2013; Pham et al. 2016b). The AUC values range from 0.5 to 1, where a value of 1 indicates a perfect landslide model (Chen et al. 2017d).
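These criteria can be computed with a short Python sketch. It assumes the standard confusion-matrix definitions, a classification threshold of 0.5 chosen only for the example, and the rank-based (Mann-Whitney) formulation of the AUC; the function names and the toy scores are hypothetical.

```python
import math

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = landslide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def auc(y_true, scores):
    """AUC as the probability that a landslide sample outranks a non-landslide one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): predicted landslide probabilities for six samples.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for datasets of a few hundred points such as the one used here.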

Selection of landslide conditioning factors
In landslide modeling based on machine learning techniques, researchers have used several methods to select the most effective factors using a training dataset (Tien Pham et al. 2016b). In this study, the IGR method with a 10-fold cross-validation process was utilized. The factors were prioritized based on their IGR values; the more effective factors for landslide modeling have higher IGR average merit (AM) values (Tien ). According to Figure 4, the elevation (AM = 0.072) and river density (AM = 0.048) demonstrated the highest predictive capabilities, followed by the distance to rivers (0.046), distance to roads (0.036), road density (0.032), NDVI (0.028), distance to faults (0.026), land use (0.013), profile curvature (0.008), lithology (0.006) and SPI (0.005). The remaining seven conditioning factors (i.e., the slope, aspect, plan curvature, TWI, LS, rainfall and fault density) were removed from the landslide modeling procedure because their AMs were equal to zero (AM = 0).

Model validation and comparison
The results regarding the prediction capability of the new machine learning techniques of the RF-NBT model depend on two parameters: the number of seeds and the number of iterations in the training and validation datasets. In this study, to obtain the optimal values for the parameters in the landslide models using the training and validation datasets, approximately 20 tests were run, and the values of the AUC were recorded.
The results regarding the selection of the best values for the parameters are shown in Figure 5. Graphically, the x-axis and the y-axis show the number of iterations and the area under the curve (AUC), respectively. According to trends in the training and validation datasets (Figure 5a), the optimum value for the number of iterations was found to be 17, which resulted in the best performance of the RF-NBT model, because at this point the AUC values for the training (AUC = 0.952) and validation (AUC = 0.883) datasets reached their maxima. In addition to the number of iterations, the number of seeds must also be considered to obtain the best prediction accuracies from landslide models. Graphically, the x-axis and y-axis of Figure 5b represent the number of seeds and the AUC, respectively. It is readily observed that the highest values of the AUC for the training (0.955) and validation (0.893) datasets occurred when the number of seeds was equal to 3.
A comparative assessment between the proposed hybrid model and other soft computing benchmark learning models, including the NBT, functional tree (FT), logistic model tree (LMT) and reduced-error pruning tree (REPTree) models, was taken into account to assess the performance of the RF-NBT model. Similar to defining the number of iterations and seeds for the RF-NBT model, the parameters for the optimal performances of the other methods must be established. The optimum values of the parameters for all of these models are shown in Table 3.
Additionally, using the selected optimal parameters, the proposed hybrid model and the other comparative models were constructed based on the training and validation datasets. The results of landslide modeling using the new model were compared with those of the other methods using the selected criteria, including the sensitivity, specificity, accuracy, RMSE, and AUC, for the training (Table 4) and validation (Table 5) datasets. The results illustrate that all of the applied models showed high predictive capabilities for the spatial prediction of landslides in the study area. Moreover, the results of modeling using the training dataset indicate that the RF-NBT model exhibited the highest sensitivity value (91.7%), indicating that 91.7% of the landslide pixels were correctly classified into the landslide class, followed by the NBT (86%), FT (83.4%), LMT (80%), and REPTree (79.7%) models.
In addition to a comparative analysis using the training dataset, the predictive capability of the new hybrid model was also compared with those of the other models using the validation dataset. The results are shown in Table 5. The highest specificity was obtained for the RF-NBT model (87.5%). In contrast, the sensitivity was the highest for the FT model (83%), followed by the NBT model (80.7%), RF-NBT model (77.8%), REPTree model (73.9%), and LMT model (73.5%). Additionally, the results show that the accuracy was the highest for the new hybrid model (81.9%), followed by the NBT model (78.7%), REPTree model (78.2%), FT model (77.1%), and LMT model (76.1%). As in the training process, the RMSE value during validation was the lowest for the RF-NBT model (0.374). Notably, the RFE led to an increase in the prediction accuracy of the NBT base classifier. Table 5 clearly shows that the AUC for the NBT was 0.826, while the value of the AUC was enhanced to 0.884 in the RF-NBT model.
In summary, although all of the models showed good prediction capabilities, the proposed RF-NBT hybrid model was successfully validated during the evaluation process using both the training and validation datasets.

Generating landslide susceptibility maps and evaluations
After the training and validation of the models, LSMs using the NBT model (base classifier) and the new hybrid model (RF-NBT) were constructed according to the following steps. First, the probabilities of landslide occurrences (PLOs) for each pixel in the study area were computed using the probability distribution functions of the NBT and RF-NBT models. Subsequently, the PLOs were classified according to continuous data classification methods in GIS. Several techniques are available for classifying the PLOs, including natural breaks, standard deviations, quartiles, equal intervals, and geometrical intervals (Nithya et al. 2012). In this study, geometrical intervals (GIs) were used to classify the PLOs. Geometrical intervals, which were first developed by an ESRI geostatistical analyst team, are applicable for continuous data and result in a reduction of the variance (Pham et al. 2016c). Therefore, the LSMs were classified using the GI method. Figures 6 and 7 show the results of the reclassification of the PLOs for the NBT and the RF-NBT models, respectively. Each LSM using the GI method was ultimately subdivided into zones according to five different categories: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS), and very high susceptibility (VHS).
In general, during susceptibility modeling, there are two main processes: the construction and evaluation of the model followed by the construction and evaluation of the susceptibility maps. Indeed, the model construction and evaluation were described during the abovementioned modeling process, in which the new RF-NBT model was established, operated and compared with the NBT, FT, LMT and REPTree soft computing benchmark models.
During the construction and evaluation of the susceptibility maps, the LSMs from the NBT and RF-NBT models were compared and evaluated. This process was performed using the training (success rate curve) and validation (prediction rate curve) datasets, the results of which were then compared and evaluated using the values of the AUC for both the training and validation datasets. Figure 8 shows the results of the comparison and evaluation of the LSMs. According to the success rate curve (Figure 8, left), the RF-NBT model (AUC = 0.914) outperformed the NBT model (AUC = 0.897). Additionally, the results from the prediction rate curve (Figure 8, right) show that the RF-NBT model (AUC = 0.886) also outperformed the NBT model (AUC = 0.837). These findings show that the RF-NBT model has a higher ability for LSM in the study area with an increased prediction accuracy relative to the NBT model using both the training and validation datasets.
The landslide density (LD) is another way to assess the applicability and prediction accuracy of the proposed hybrid model. The LD was calculated for each susceptibility class, and the results are shown in Table 6. It can be concluded that the VHS class showed the highest LD (3.552), followed by the HS (3.156), MS (2.587), LS (1.235), and VLS (0.286) classes. Moreover, the prediction accuracy of the new hybrid model was validated in comparison with the NBT base classifier model using statistical tests. The results demonstrating the significant differences between the hybrid landslide susceptibility model and the NBT model using the Friedman test are shown in Table 7. It can be concluded that the null hypothesis (no significant difference between the two models at a significance level of α = 5%) should be rejected because the p-value is less than 0.05. The Wilcoxon signed-rank test was used to assess the significance of the pairwise differences between the RF-NBT and the NBT models. The results of this test are shown in Table 8. The p-values, which are less than the threshold (< 0.05), and the z-values, which lie beyond the critical values (i.e., z = −1.96 and z = +1.96), clearly show that the performance of the new hybrid intelligent model (RF-NBT) proposed in this study is significantly different from that of the NBT base classifier model.
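As an illustration of how an LD value per susceptibility class could be obtained, the following Python sketch assumes that LD is the ratio of the percentage of landslide pixels in a class to the percentage of study-area pixels in that class. This is a commonly used definition and an assumption here, not necessarily the exact formula behind Table 6; the function name and all counts are hypothetical.

```python
def landslide_density(landslides_per_class, pixels_per_class):
    """LD per class = (share of landslide pixels in class) / (share of all pixels in class)."""
    total_landslides = sum(landslides_per_class.values())
    total_pixels = sum(pixels_per_class.values())
    return {
        name: (landslides_per_class[name] / total_landslides)
        / (pixels_per_class[name] / total_pixels)
        for name in pixels_per_class
    }

# Hypothetical counts, invented for illustration (not the data behind Table 6).
landslides = {"VLS": 5, "LS": 10, "MS": 15, "HS": 30, "VHS": 40}
pixels = {"VLS": 4000, "LS": 2500, "MS": 1500, "HS": 1200, "VHS": 800}
ld = landslide_density(landslides, pixels)
print(max(ld, key=ld.get))
```

Under this definition, a well-behaved susceptibility map shows LD values that increase monotonically from the VLS class to the VHS class.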

Discussions
In recent years, according to the literature, ensemble machine learning techniques have become more common (Shirzadi et al. 2017a). On one hand, some uncertainties exist in landslide susceptibility modeling processes with regard to the selection of the conditioning factors related to landslides, the qualities of those factors and the models or techniques that have been utilized. On the other hand, such techniques and methods need to be assessed to increase the prediction accuracies of landslide models such as ensemble models. Therefore, the primary aim of this study was to develop and introduce a novel ensemble machine learning model comprised of RF and NBT classifiers, namely, the RF-NBT model, to map shallow landslides in Langao County, China.
An NBT is a combination of the DT and naïve Bayes (NB) algorithms, which are effective classifiers for classification problems (Shirzadi et al. 2017a). However, the performance of an NBT depends on the independence assumption. Accordingly, ensemble (meta) classifiers such as the RF technique could improve the performance of single (weak) classifiers and enhance the prediction accuracies of landslide susceptibility models.
According to a review of the literature, eighteen conditioning factors that affect the occurrences of landslides were selected: the slope angle, aspect, elevation, plan curvature, profile curvature, SPI, TWI, LS, rainfall, lithology, land use, NDVI, distance to rivers, distance to faults, distance to roads, river density, fault density and road density. Selecting the most important conditioning factors plays a significant role in achieving the most accurate results (Tien Bui 2012). For this purpose, the features were selected using the AM criteria of the IGR technique with a 10-fold cross-validation approach.
The results show that, among the eighteen conditioning factors, only eleven factors (the elevation, river density, distance to river, distance to road, road density, NDVI, distance to fault, land use, profile curvature, lithology and SPI) could be selected. The other factors exhibited AM values equal to zero and were consequently removed from the modeling to increase the prediction accuracy of the landslide modeling process.
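The idea behind ranking factors by the information gain ratio can be sketched as follows. This is a minimal illustration for a single, already discretized factor, and it omits the averaging over 10 cross-validation folds that produces the AM values. It is not the authors' implementation, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def information_gain_ratio(factor_classes, labels):
    """Information gain of a discretized factor divided by its split information.

    factor_classes: class of the conditioning factor for each sample.
    labels: 1 for landslide, 0 for non-landslide, for each sample.
    """
    total = len(labels)
    groups = {}
    for cls, lab in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(lab)

    # Expected label entropy after splitting on the factor.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional

    # Split information penalizes factors with many small classes.
    split_info = entropy(factor_classes)
    return gain / split_info if split_info > 0 else 0.0


# Toy example: one factor separates the labels perfectly, the other not at all.
labels = [1, 1, 0, 0]
informative = information_gain_ratio(["high", "high", "low", "low"], labels)
uninformative = information_gain_ratio(["a", "b", "a", "b"], labels)
```

A factor whose ratio is zero carries no information about landslide occurrence in the sample, which is the rationale for removing such factors from the modeling.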
The determination of the input parameters in the RFE model (i.e., the numbers of iterations and seeds) is one of the abovementioned model uncertainties that can influence the results of the modeling scheme. In this study, the modeling process was performed using 20 iterations with variations in the numbers of iterations and seeds. The results demonstrate that values of 17 and 3 are optimal for the numbers of iterations and seeds, respectively, in the RFE model.
In this study, the proposed ensemble model was compared with several state-of-the-art benchmark machine learning models using the training (goodness-of-fit) and validation (prediction accuracy) datasets. The goodness-of-fit results for the landslide models show that the RF-NBT model (AUC = 0.945) outperformed the NBT (AUC = 0.866), FT (AUC = 0.890), LMT (AUC = 0.891), and REPTree (AUC = 0.897) models. Additionally, the results reveal that the RF-NBT model (AUC = 0.884) exhibited a higher prediction accuracy than the NBT (AUC = 0.826), FT (AUC = 0.845), LMT (AUC = 0.839), and REPTree (AUC = 0.801) models. This is because the RF-NBT ensemble uses the RF ensemble algorithm, which may enhance the performance of a weak (single) classifier (Onan 2016;Pham et al. 2017a). Furthermore, in the training process, the RF-NBT model led to the optimization of the weights of the NBT for the classification, thereby providing results with higher predictive capabilities in comparison with the other landslide models.
LSMs prepared using the RF-NBT and NBT models were validated based on the training (success rate curve) and validation (prediction rate curve) datasets. The results indicate that the RF-NBT model outperformed the single (weak) NBT classifier on both the training and validation datasets. The RF-NBT model (AUC = 0.886) showed an increased prediction accuracy relative to the NBT model (AUC = 0.837). Additionally, the LSMs were validated using the LD. The results show that the values of the LD increased from the VLS class to the VHS class. Moreover, the significant differences between the RF-NBT model and the other landslide models were verified using the Friedman and Wilcoxon signed-rank statistical tests. The results confirm that the performance of the RF-NBT model was significantly different from that of the NBT model.
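The AUC values compared here can be read as the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. A minimal sketch of that rank-based computation is given below. The function name and the toy scores are hypothetical, and the pairwise loop is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for landslide, 0 for non-landslide.
    scores: predicted susceptibility for each sample.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with four samples: three of the four
# positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the model values above are compared.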
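The LD check can be sketched under an explicit assumption, because the LD equation is not reproduced in this text. The sketch assumes a definition that is common in the literature: the percentage of landslide pixels falling in a susceptibility class divided by the percentage of the study area occupied by that class. The function name and the pixel counts are hypothetical and do not correspond to Table 6.

```python
def landslide_density(class_pixels, landslide_pixels):
    """Landslide density per susceptibility class.

    ASSUMED definition (not confirmed by the source):
    LD = (share of landslide pixels in the class) / (share of area in the class).
    class_pixels: total pixel count per class (must be positive).
    landslide_pixels: landslide pixel count per class.
    """
    total_pixels = sum(class_pixels.values())
    total_landslides = sum(landslide_pixels.values())
    density = {}
    for name, count in class_pixels.items():
        landslide_share = landslide_pixels.get(name, 0) / total_landslides
        area_share = count / total_pixels
        density[name] = landslide_share / area_share
    return density


# Hypothetical pixel counts for the five susceptibility classes.
area = {"VLS": 400, "LS": 300, "MS": 150, "HS": 100, "VHS": 50}
slides = {"VLS": 2, "LS": 8, "MS": 15, "HS": 30, "VHS": 45}
ld = landslide_density(area, slides)

# A well-behaved map shows LD increasing from VLS to VHS.
ordered = [ld[c] for c in ("VLS", "LS", "MS", "HS", "VHS")]
is_monotonic = all(a < b for a, b in zip(ordered, ordered[1:]))
```

Under this assumed definition, a value above 1 means that a class contains proportionally more landslides than its share of the area.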

Conclusions
In this study, a novel hybrid artificial intelligence approach based on the RFE and NBT classifiers was proposed to assess the landslide susceptibility of Langao County, China. In the study area, 288 landslides were identified, and eighteen conditioning factors, including the slope, aspect, elevation, plan curvature, profile curvature, SPI, TWI, LS, rainfall, lithology, land use, NDVI, distance to rivers, distance to faults, distance to roads, river density, fault density and road density, were employed. The IGR method was used to rank the predictive capabilities of the conditioning factors. As a result, the slope, aspect, plan curvature, SPI, TWI, STI and fault density factors were removed from the landslide modeling process because of their null predictive capabilities. Two parameters of the RFE model, namely, the number of seeds and the number of iterations, were tuned to obtain the optimal values for the landslide modeling. The results showed that the optimal values for the number of iterations and the number of seeds equaled 17 and 3, respectively. A comparative assessment between the new hybrid model and other soft computing benchmark learning models, namely, the NBT, FT, LMT and REPTree models, was conducted to assess the performance of the RF-NBT model. The results showed that the RF-NBT model exhibited the best performance with regard to the values of the sensitivity, specificity, AUC and RMSE for both the training and validation datasets.
The produced LSMs were reclassified into five classes that were used to express the landslide susceptibility of the area, namely, the VLS, LS, MS, HS, and VHS classes. In addition, the LD was used to assess the applicability and prediction accuracy of the new hybrid model proposed in this study. The LD results showed that the VHS class exhibited the highest LD value (3.552). In summary, the results from this study may be useful for decision making in areas prone to landslides.

Disclosure statement
No potential conflict of interest was reported by the authors.