Spatial prediction of shallow landslide: application of novel rotational forest-based reduced error pruning tree

Abstract Landslides are a form of soil erosion that threatens the sustainability of some areas of the world. There is, therefore, a need to investigate landslide rates and behaviour. In this research, we introduced a novel hybrid artificial intelligence approach, called RF-REPTree, that uses rotation forest (RF) as a meta classifier and the reduced error pruning tree (REPTree) as a base classifier for landslide susceptibility mapping (LSM) in the Kalaleh watershed, Golestan Province, Iran. Benchmark models, including the open-source Java decision tree (J48), naive Bayes tree (NBTree), and REPTree, were used for comparison with the proposed model. A total of 249 landslide locations were identified and mapped. The inventory was split into training (70%) and testing (30%) data for modelling and reliability analysis. Based on a literature review and multi-collinearity tests, 16 landslide conditioning factors (LCFs) were selected. Of the LCFs, the topographical position index (TPI) had the highest correlation with landslide occurrence. The LSM produced by RF-REPTree revealed that nearly 29% of the study area has high to very high landslide susceptibility (LS). Statistical analysis of the model results included the receiver operating characteristic (ROC) curve, the efficiency test, the true skill statistic (TSS), and the kappa index. ROC analysis demonstrated that the AUC values of the RF-REPTree, REPTree, J48, and NBTree models were 0.832, 0.700, 0.695, and 0.759 for the success rate curves and 0.794, 0.740, 0.788, and 0.728 for the prediction rate curves, respectively. Therefore, all models were judged to be acceptably accurate for LSM. Among the LS models, the RF-REPTree model achieved the highest accuracy, followed by J48, REPTree, and NBTree. The results of LSM can be used to target the mitigation of landslide hazards and provide a foundation for sustainable environmental planning.


Introduction
Soil erosion is a global problem challenging sustainability (Bayat et al. 2019; Jiang et al. 2019; Lu et al. 2019b; Salesa et al. 2019; Guadie et al. 2020). Erosion caused by landslides affects watersheds both at the landslide site and where the sediment is deposited (Chalise et al. 2019). Landslides, a form of soil erosion, hinder sustainable development and contribute to land deterioration (Keesstra et al. 2018; Visser et al. 2019). Landslides can have high economic costs (Rangsiwanichpong et al. 2019) and are hazards to people (Gao and Sang 2017; Nachappa et al. 2020). They degrade land quality through erosion and sedimentation (Piacentini et al. 2018). They are found in most regions of the world (Gariano and Guzzetti 2016). Landslides influence stream flows, which can cause flooding and disturb the normal rhythms of streams (Keesstra et al. 2018a). Failing slopes can be activated by earthquakes (Roback et al. 2018), but precipitation tends to be the main factor triggering landslides (Sidle and Bogaard 2016). Their probability of occurrence is also tied to vegetation cover (Guo et al. 2020). Landslide research has been a focus of both basic (Whiteley et al. 2019) and applied sciences (Piciullo et al. 2018).
Iran's mountains face significant landslide problems due to their distinctive geographical setting, climatic and geomorphological instabilities, rapid population growth, increasing demands for resources, and inadequate environmental management (Aghda et al. 2018). Gallicash watershed, in the Golestan Mountains of northern Iran, experiences landslide disasters every year (Arabameri et al. 2019c, 2020d). Landslides damage transportation infrastructure, communication lines, urban settlements, industries, and natural resources and stunt economic growth in this region (National Geosciences Database 2017). Every year in Iran, landslides destroy property worth 500 billion rials (approximately US$12 million in 2020) (National Committee on Natural Disaster Reduction of the Iranian Ministry of the Interior). Landslides are therefore a significant concern, as they affect development throughout Iran.
Reports of landslides in this region have been recorded in the National Geosciences Database (2017). In terms of physiography, more than half of Iran's land cover is either desert or semi-desert, one-third is covered by mountainous terrain, and the rest is alluvial plain (i.e. the Caspian and Khuzestan plains) (Haftlang and Lang 2003). Meteorologically, northern Iran receives an average of 2113 mm more precipitation than do the semi-arid and arid regions to the south. This dramatically increases the landslide problem in this region (Arabameri et al. 2019c). The management of landslides is essential for the safe and sustained economic and infrastructure development of this region. Landslide susceptibility mapping (LSM) is a fundamental tool that can help design the framework for natural resource management and land use planning. The LSM tool has been deployed in many regions of the world for sustainable development. Delineation of landslide-susceptible areas with a predictive model is critical for reducing losses of properties and lives (Gudiyangada Nachappa et al. 2019; Tavakkoli Piralilou et al. 2019). Different researchers have used LSM for sustainable planning and conservation strategies for specific purposes.
Landslide occurrence is influenced by topographical, climatological, lithological, and morphometric factors. Human activities like road construction, expansion of settlement, and land-use changes are aggravating the natural factors that generate landslides (Ahlmer et al. 2018). Researchers have used several topographical factors (i.e. altitude, slope, aspect, convergence index (CI), plan and profile curvature, stream power index (SPI), and topographic position index (TPI)), hydrological factors (i.e. drainage density (Dd), distance to stream, topographic wetness index (TWI)), and environmental factors (i.e. land use/land cover (LU/LC), normalized difference vegetation index (NDVI), distance to road, soil, and lithology) to create LSMs. Researchers reported that multi-collinearity tests were usually used to select the factors to include in modelling. Information gain ratio and relief-F tests have also been used to assess factors for LS modelling (Hong et al. 2018; Roy et al. 2019). Nsengiyumva et al. (2018) stated that the most suitable LCFs produce accurate and meaningful predictive LS models that can provide information to decision makers and planners, who can then safely and sustainably use, develop, and manage natural resources, soils, roads, and urban infrastructure.
With advances in remote sensing and geographic information system techniques (Zuo et al. 2015, 2017; Jiang et al. 2018; Zhou et al. 2020; Zhao et al. 2020; Yu et al. 2020), qualitative and quantitative methods can be used to evaluate LS (Ayalew and Yamagishi 2005). Qualitative methods require two data types (Xu et al. 2019; Wang et al. 2020; Zhang et al. 2020; Yu et al. 2021; Hu et al. 2021): a landslide inventory and heuristic datasets (Aleotti and Chowdhury 1999). The landslide inventory map (LIM) is the primary survey of landslides. Landslide locations can be identified through a field investigation, a perception survey of the local population, aerial photographs, high-resolution satellite images, or Google Earth images. Landslide inventory datasets are typically divided into two randomly selected sets at a 70:30 ratio for training and validation of LS modelling (Hong et al. 2018; Arabameri et al. 2019c, 2020d; Saha 2019a, 2019b). The landslide inventory is used as the dependent variable in models. Researchers have used different approaches to model LS. Wu et al. (2016), Arabameri et al. (2019b), and Saha et al. (2019) used the analytical hierarchical process (AHP), an expert knowledge-based, multi-criteria decision-making approach, to analyze LS. Arabameri et al. (2019a, 2019b, 2019d), Roy and Saha (2019a), Regmi et al. (2014), Ciurleo et al. (2016), Abedini and Tulabi (2018), Chen et al. (2018), Hemasinghe et al. (2018), Ahmed and Dewan (2017), and Zhu et al. (2014) applied several multivariate statistical techniques: frequency ratio (FR), fuzzy logic (FL), weight-of-evidence (WofE), evidential belief function (EBF), statistical index (SI), landslide nominal risk factor (LNRF), certainty factor (CF), information value (IV), logistic regression (LR), and Dempster-Shafer (DS) models. More recently, machine-learning (ML) ensemble models have been used to map natural hazards.
ML models such as random forest (RAF), boosted regression tree (BRT), artificial neural network (ANN), multivariate adaptive regression spline (MARS), J48 decision tree (JDT), least squares support vector machines (LSSVM), linear discriminant analysis (ADA), decision tree (DT), adaptive neuro-fuzzy inference system (ANFIS), k-nearest neighbour (KNN), logistic model tree (LMT), alternate decision tree (ADTree), Bayesian logistic regression (BLR), support vector machine (SVM), convolutional neural network (CNN), and recurrent neural network (RNN) were used in LSM studies by Youssef et al. (2016), Zhou et al. (2018), Chen et al. (2018), Ghorbanzadeh et al. (2019), Ngo et al. (2021), and Arabameri et al. (2019a, 2019b, 2019c, 2019d, 2020a, 2020b, 2020c, 2020d). ML models are often preferred over statistical methods because they tend to be more accurate, can be less prone to overfitting, and can analyze both continuous and categorical data simultaneously. Without validation, ML models are meaningless (Xu and Chen 2013; Zhao et al. 2014; Li et al. 2018; Zhao et al. 2019; Tu et al. 2021; Shan et al. 2021). Several statistical techniques can be used to validate LSMs: the receiver operating characteristic (ROC) curve, seed cell area index (SCAI), quality sum index (Qs), root mean square error (RMSE), and mean absolute error (MAE) (Chen et al. 2019; Roy et al. 2019; Roy and Saha 2019b). ROC can measure any model's accuracy (Arabameri et al. 2019c, 2020d; Chen et al. 2019). Because they provide a higher level of precision than conventional and individual machine learning models, hybrid models and ensemble methods are of interest for mapping landslide susceptibility (Moayedi et al. 2019; Mehrabi et al. 2020). Combining two or more methods can correct flaws in a single approach, improve results, and increase the model's predictive capability (Moosavi and Niazi 2016).
Building on the preceding observation, and in light of the findings of machine learning models for landslide modelling published in the literature, the reduced error pruning tree (REPTree), rotational forest (RF), and their ensemble were used to model landslide susceptibility in our study area. Until now, the REPTree and RF ensemble have not been used to determine landslide susceptibility. The method aims to increase the efficiency of any given learning algorithm by fitting a collection of low-error models and then combining them into an ensemble that can yield superior results (Shin et al. 2012).
In this study, tree-based ML techniques, namely the open-source Java DT (J48), naive Bayes tree (NBTree), and reduced error pruning tree (REPTree), together with a novel ensemble, the rotational forest-based REPTree (RF-REPTree), were used to model LS. Pham et al. (2019) used REPTree and its ensemble techniques, such as bagging-based REPTree, multiboost-based REPTree, rotation forest-based REPTree, and random subspace-based REPTree, for LS assessment and prediction. Hong et al. (2019) used J48 and its ensembles with AdaBoost, bagging, and rotational forest models. All studies provided insightful results. Following Hong et al. (2019), this study introduced a novel hybrid artificial intelligence approach of rotation forest (RF) as a meta classifier based on reduced error pruning tree (REPTree). This study's goals were to predict LS and determine the ML model that is best able to achieve this goal in the Kalaleh catchment, Iran.

Study area
The Kalaleh River watershed, a part of the Gorganroud watershed, occupies an area of approximately 5368 km² located between 37°07′ and 37°43′ N and 54°58′ and 55°56′ E. The watershed is in northern Iran and drains into the Caspian Sea (Figure 1). The highest elevation in the basin is 2870 m above sea level (a.s.l.), and the lowest is 13 m a.s.l. Though Iran is generally known to be arid and semi-arid, the Kalaleh basin climate is semi-arid in the east and humid in the west. Temperatures range annually from 11 to 18.1 °C and the average annual precipitation varies from 195 to 946 mm (IRIMO 2012). About 36% of precipitation falls from January through March. The basin's topography is a complex array of mountains, hills, plateaus and upper terraces, piedmont plains, alluvial plains, and lowlands. Sedimentary rocks, including calcareous rocks, sandstone, shale, dolomite, and marl, are found throughout the region, and the surface is also covered with conglomerates, loess sediments, and alluvium (GSI 1997).
The soils of the watershed are Entisols (25.6%), Alfisols (25.1%), Inceptisols (19.7%), and Mollisols (29.3%). The land use/land cover classes are forests (25.88%), irrigated lands (10.44%), orchards (0.01%), dry-farming (47.1%), water bodies (0.69%), mixed agriculture and orchards (14.7%), surface rock (0.08%), and urban (0.98%). The Kalaleh watershed contains the population centre of Golestan Province and houses nearly 1.2 million residents. The basin maintains an agriculture-based economy (46% of the population works in agriculture), with manufacturing and mining accounting for a smaller part of the economy (20% of the population). The region also contains many wildlife habitats. Rivers flow from west to east in the watershed. They include the Qazanabad, Taghi Abad, Mohammad Abad, Kaboodvall, Ramayan, Gharehay, Narmab, Goli Tape, Gallicash, Tangrah, and Ghary Navi rivers. They originate from the highlands in the south and northeast, pass through the plains of Gonbad and Gorgan, and converge in the Mazandaran Sea. The most prominent formations in the region are the Caspian Sea and faults in the northern part of Alborz. These faults are northeast/southwest to northeast/southeast (Shahpasandzadeh 2004). In recent years, population growth on erodible soils has led to accelerated soil erosion and depletion (Lar Consulting Engineering 2007). Consequently, the Kalaleh River basin suffers high soil erosion, flash flooding, landslides, and high sediment yields (Saadat et al. 2008).

Materials and methods
Various types of data were acquired from several sources. The data were primary and secondary (Yang and Sowmya 2015). The primary data included a field survey of landslides with handheld GPS and a perception survey of local communities to determine the frequency of landslides and the landslide locations. Secondary data included historical/archival reports, newspaper reports, topographical maps, lithological maps, soil maps, a DEM, and Landsat data. The historical data were collected from the Civil Defense and Engineering Department of Iran. An ALOS PALSAR DEM of high resolution (12.5 m × 12.5 m) was downloaded from the Alaska Satellite Facility. The Landsat 8 OLI/TIRS imagery was downloaded from the United States Geological Survey. The lithological map (scale 1:50,000) was acquired from the Geological Department of Iran. The soil map (scale 1:50,000) was obtained from the Land Use Department of Iran. The topographical map (scale 1:500,000) was collected from the Topographical Department of Iran. Historical rainfall records from rain gauges in the catchment were acquired from the Meteorological Department of Iran's Islamic Republic (IRIMD). The ALOS PALSAR DEM was selected as the base map, and the other factors, at lower resolutions, were compiled within it to prepare LSMs of the study area (Figure 2). The essential components of the methodology are:
1. Data were collected from numerous sources, the LIM was prepared, and LCFs were compiled.
2. Multi-collinearity analysis using the tolerance (TOL) and variance inflation factor (VIF) was conducted to select the factors suitable for LS analysis.
3. LSMs were prepared using J48, NBTree, REPTree, and RF-REPTree models.
4. The LS models were validated with AUROC, TSS, efficiency, and the kappa index.

Preparation of LIM
The landslides and LCFs are the essential elements of LS mapping (Ercanoglu and Gokceoglu 2004). The primary and secondary data, historical landslide records (ILWP 2007; FRWO 2013), and aerial photos (scales 1:20,000 and 1:40,000) were used to generate a LIM. An extensive field investigation with handheld GPS was conducted to verify the LIM. A total of 249 landslides were found and verified. Following the previous literature, 70% of the landslide inventory data were randomly assigned to a group for training purposes, and the remaining 30% were used for testing (Figure 1). The same number of non-landslide points was randomly selected for running the models. The landslides in the study area are predominantly translational slides, rotational slides, and debris flows (Figure 3).
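The inventory split described above can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name, the integer point identifiers, the random seed, and the pool of 5000 candidate non-landslide cells are all assumptions made for the example.

```python
import random

def split_inventory(landslide_ids, candidate_non_landslide_ids,
                    train_fraction=0.7, seed=42):
    """Split landslide points 70:30 and pair each subset with an equal
    number of randomly drawn non-landslide points (labels: 1 and 0)."""
    rng = random.Random(seed)
    landslides = list(landslide_ids)
    rng.shuffle(landslides)
    n_train = round(train_fraction * len(landslides))
    # Draw as many non-landslide points as there are landslides (balanced classes).
    non_landslides = rng.sample(list(candidate_non_landslide_ids), len(landslides))
    train = ([(i, 1) for i in landslides[:n_train]]
             + [(i, 0) for i in non_landslides[:n_train]])
    test = ([(i, 1) for i in landslides[n_train:]]
            + [(i, 0) for i in non_landslides[n_train:]])
    return train, test

# 249 mapped landslides; the pool of candidate non-landslide cells is hypothetical.
train, test = split_inventory(range(249), range(1000, 6000))
print(len(train), len(test))
```

With 249 landslides, rounding 70% gives 174 training and 75 testing landslides, each matched by the same number of non-landslide points.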

Multi-collinearity analysis of influential factors
A multi-collinearity check is vital for choosing LSM parameters, as a linear association between the parameters will reduce a model's predictive accuracy. The TOL and VIF are essential methods in multi-collinearity analysis. The threshold limits of TOL and VIF are ≥0.  (PC), CI, topographic position index (TPI), topographic ruggedness index (TRI), topographic wetness index (TWI), distances to stream, road, and faults, lithology, soil types, LU/LC, and NDVI.
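As an illustration of the multi-collinearity check, the sketch below computes TOL and VIF for each factor by regressing it on the remaining factors, using TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j. The function names and the two toy factors are hypothetical. The rule of thumb in the closing note (VIF above 10 or TOL below 0.1) is a commonly cited convention assumed here, not a threshold taken from this text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors (with intercept)."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def tolerance_and_vif(factors):
    """Return {name: (TOL, VIF)} for a dict mapping factor name -> list of values.
    Perfectly collinear factors give TOL = 0 and are not handled here."""
    result = {}
    for name, values in factors.items():
        others = [v for k, v in factors.items() if k != name]
        tol = 1.0 - r_squared(values, others)  # TOL_j = 1 - R_j^2
        result[name] = (tol, 1.0 / tol)        # VIF_j = 1 / TOL_j
    return result

# Two strongly correlated toy factors (hypothetical values).
factors = {"slope": [1, 2, 3, 4, 5], "twi": [1, 2, 3, 4, 6]}
print(tolerance_and_vif(factors))
```

Under the assumed rule of thumb, a factor with VIF above 10 (equivalently TOL below 0.1) would be flagged as collinear and considered for removal.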

Generation of effective factors
Eight topographical (elevation, slope, PC, TRI, TPI, TWI, slope length, and CI), two hydrological (rainfall, distance to stream), two lithological (geology, distance to fault), and four environmental (LU/LC, NDVI, soil type, distance to road) factors were used in the LS analysis (Arabameri et al. 2019a, 2019b, 2019c, 2019d, 2020a, 2020b, 2020c, 2020d; Rahamati et al. 2019). To assess LS with soft computing techniques as direct methods, suitable LCFs must be included. The incorporation of several effective variables can increase the models' prediction capabilities and performances. Lithology, slope, and aspect are vital LCFs, and they are extensively used in LSM. Chen et al. (2017, 2019) and Hong et al. (2018) produced LSMs using topographical, hydrological, lithological, and environmental factors. The ALOS PALSAR DEM was used as the elevation map of the study area (Figure 4(a)). The high-resolution PALSAR DEM provided more accurate results than the ASTER and SRTM DEMs (Arabameri et al. 2019c). Elevation affects rainfall and vegetation. Higher elevations and hills are more susceptible to landslides than lower elevations (Roy et al. 2019; Roy and Saha 2019b). Landslides are controlled by the slope (Nefeslioglu et al. 2008). The slope map (Figure 4(b)) was produced using the ALOS PALSAR DEM and the spatial analyst tool in ArcGIS. The slope of the study area ranges from 0 to 73 degrees (Figure 4(b)). The PC, CI, TWI, TRI, SPI, and slope-length maps (Figure 4(c-g)) were derived from the ALOS PALSAR DEM using SAGA GIS. Maps showing distances to stream, roads, and faults (Figure 4(i-k)) were prepared in GIS using the Euclidean distance buffering tool.
Climate change accelerates hydrological cycles, alters the magnitude and timing of streamflow, and threatens the water resources and environmental sustainability of basins (Lu et al. 2019a; Tian et al. 2020). The precipitation map (Figure 4(l)) was created from the station data using the kriging interpolation method. The lithological map (Figure 4(m)) was digitized and described briefly (Table 1). Human activities can affect the natural environment. The LU/LC map (Figure 4(n)) was derived from Landsat 8 OLI/TIRS imagery using the maximum likelihood classification method. The land-use types found in the region were forest (A), agriculture (B), orchard (C), dry-farming (D), water (E), agri-orchard (F), rock (G), and urban (H). The soil map (Figure 4(o)) was digitized, and four soil orders were found: Alfisols, Inceptisols, Mollisols, and Entisols. The NDVI map (Figure 4(p)) was derived from Landsat 8 OLI/TIRS imagery.
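For reference, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red); for Landsat 8 OLI the near-infrared and red bands are bands 5 and 4. The sketch below applies that formula to two small nested lists standing in for reflectance rasters. The function name and the 2 x 2 values are hypothetical, and the preprocessing actually applied to the imagery in this study is not specified in the text.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).
    Pixels whose denominator is zero are returned as None (no data)."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result

# Hypothetical 2 x 2 surface-reflectance tiles (NIR = band 5, red = band 4).
nir_band = [[0.50, 0.30], [0.20, 0.0]]
red_band = [[0.10, 0.10], [0.20, 0.0]]
ndvi_map = ndvi(nir_band, red_band)
print(ndvi_map)
```

Values near +1 indicate dense green vegetation, values near 0 indicate bare surfaces, and negative values are typical of water.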

Methods
Three ML models and one ensemble framework model (J48, REPTree, NBTree, and RF-REPTree) were used to model LS.

J48
J48 is a decision-tree algorithm capable of detecting changes of vector attributes for any number of instances (Kaur and Chhabra 2014). Using tree classification, the algorithm produces rules for target-variable prediction, and the data distribution can also be conceptualized clearly (Kaur and Chhabra 2014). J48 can handle missing values, derive rules, and prune trees, which can yield more accurate data-mining results. The J48 model uses the C4.5 algorithm to create a well-organized DT by statistical classification (Witten 2011). Attributes are selected using information gain and entropy computed from class-labelled data (Bashir and Chachoo 2017).
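The entropy-based attribute selection mentioned above can be illustrated with a short sketch. It computes Shannon entropy, information gain, and the gain ratio (the normalised criterion that C4.5 uses to rank candidate splits) for categorical attributes. The attribute names and values are hypothetical, and handling of continuous attributes and missing values is omitted.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical samples: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 0, 0]
slope_class = ["steep", "steep", "gentle", "gentle"]  # separates the classes perfectly
soil_class = ["a", "b", "a", "b"]                     # carries no class information
print(information_gain(slope_class, labels), information_gain(soil_class, labels))
```

In this toy case the slope attribute would be chosen for the first split because it removes all class uncertainty, whereas the soil attribute removes none.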

REPTree
The REPTree, generated by combining reduced-error pruning (REP) with a DT, is a fast decision-tree learner that grows a tree through splits and then prunes it (Quinlan 1987). In this approach, the DT is fitted to the training dataset, and REP then reduces the complexity of the tree structure (Mohamed et al. 2012). The pruning method addresses backward overfitting problems (Quinlan 1987; Yu et al. 2020). The REPTree algorithm finds the optimal form of the most accurate sub-tree using a post-pruning approach (Esposito et al. 1999). The model relies on information gain from entropy, or on variance reduction, to choose splits, and on reduced-error pruning to trim the tree (Srinivasan and Mekala 2014).
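A minimal sketch of reduced-error pruning may help make the post-pruning step concrete. It assumes a tree already grown on the training data and stored as nested dictionaries, with each internal node recording the majority class of the training samples that reached it. The dictionary layout, feature names, and pruning samples are all hypothetical and do not reproduce the REPTree implementation used in the study.

```python
def predict(node, x):
    """Follow the splits from the given node down to a leaf and return its label."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def errors(node, samples):
    """Number of (x, y) samples misclassified by the subtree rooted at node."""
    return sum(predict(node, x) != y for x, y in samples)

def reduced_error_prune(node, pruning_set):
    """Bottom-up pruning: replace a subtree by a majority-class leaf whenever
    that does not increase the error on the held-out samples reaching it.
    The tree is modified in place and the (possibly new) root is returned."""
    if "label" in node:
        return node
    left = [(x, y) for x, y in pruning_set if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in pruning_set if x[node["feature"]] > node["threshold"]]
    node["left"] = reduced_error_prune(node["left"], left)
    node["right"] = reduced_error_prune(node["right"], right)
    leaf = {"label": node["majority"]}
    return leaf if errors(leaf, pruning_set) <= errors(node, pruning_set) else node

# Hypothetical tree whose NDVI split was fitted to noise in the training data.
tree = {
    "feature": "slope", "threshold": 20, "majority": 0,
    "left": {"label": 0},
    "right": {"feature": "ndvi", "threshold": 0.3, "majority": 1,
              "left": {"label": 1}, "right": {"label": 0}},
}
pruning_set = [
    ({"slope": 10, "ndvi": 0.6}, 0), ({"slope": 15, "ndvi": 0.2}, 0),
    ({"slope": 30, "ndvi": 0.2}, 1), ({"slope": 35, "ndvi": 0.5}, 1),
    ({"slope": 40, "ndvi": 0.4}, 1),
]
pruned = reduced_error_prune(tree, pruning_set)
print(pruned)
```

Here the NDVI split is collapsed into a single landslide leaf because it misclassifies held-out samples, while the slope split is kept because removing it would increase the held-out error.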

NBTree
The naïve Bayes tree (NBTree) classifier is an ML technique that combines naïve Bayes with a DT (Kohavi 1996). Naïve Bayes emerged from pattern detection and is commonly used in data mining and ML for classification problems due to its simplicity and linear run time (Farid et al. 2014). The NBTree algorithm, a basic probabilistic method, can estimate class membership probability (Farid et al. 2014). Naïve Bayes classifiers evaluate each class and pick the one that maximizes the posterior class probability, where c is a class and k is the number of classes. The biggest drawback of naïve Bayes is its strong assumption that attributes are independent of one another; this is also what makes it so simple. NBTree was proposed to relax naïve Bayes' assumption of attribute independence. This model uses a DT for its basic structure and sets a naïve Bayes classifier on each leaf node of the developed DT; the NBTree shows impressive classification precision. During the development of an NBTree, a classification-accuracy metric is used to evaluate splits, rather than information gain. An NBTree filters the information from the root node to a given leaf node down the tree and then uses the training cases that fall into that leaf node to build a naïve Bayes classifier that identifies a landslide occurrence. An NBTree also exceeds DT or naïve Bayes models individually regarding classification accuracy and AUC (Kohavi 1996).
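For reference, the standard naïve Bayes decision rule, which an NBTree leaf is assumed to apply to the training cases that reach it, can be written as below. The attribute symbols $x_1, \ldots, x_n$ are notation introduced here for illustration; $c_1, \ldots, c_k$ are the $k$ classes (here, landslide and non-landslide).

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{c_1, \ldots, c_k\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form follows from the attribute-independence assumption: the joint likelihood of the attributes given the class is replaced by the product of the per-attribute likelihoods.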

Rotational forest
Rotational forest (RF) is a hybrid ensemble method that consists of individual decision-tree classifiers (Rodriguez et al. 2006; Rodriguez 2007). In RF, each tree is trained on the dataset in a rotated feature space. RF uses principal component analysis (Wold et al. 1987) to extract the characteristics of the learning sets and to produce training sets for the base classifiers. For this analysis, $S = (s_1, s_2, \ldots, s_n)$ are the LCFs and $Y = (y_1, y_2)$ are the classes of the dependent variable, i.e. landslide and non-landslide. Training data are represented by $D$, and $T$ represents the LCF data set. $T$ is grouped into $k$ sub-classes. $E_{ij}$ denotes the LCFs in $T_{ij}$ from $E$. $E'_{ij}$ is chosen randomly from $E_{ij}$ by the bootstrap method. To obtain the coefficients $r_{i,1}$, each of size $T \times 1$, the computation has to be carried out over $E'_{ij}$. The RF ensemble is then produced with respect to the rotation matrix via the primary categorization and conversion method (Xia et al. 2014). $R_i^a$ is the rotation matrix acquired through the reorganization of the $R_i$ matrix, which can be described in Equation (8). In fact, the obtained coefficients, which are built for each individual class by the average combination technique, form a sparse rotation matrix as in Equation (3), called $R_i$.
where $a_{i,k}(a R_i^a)$ denotes the probability produced by classifier $C_i$ that $e$ belongs to class $k$. Finally, $e$ is assigned to the class with the greatest confidence.
Breiman (2001) introduced random forest (RAF), an important ML method. It is used for regression, classification, clustering, and interaction detection. A single DT may have high variance and bias in classification. The RAF can reduce the bias and minimize the error using an ensemble of trees (Taalab et al. 2018). To form a forest, RAF creates thousands of binary trees. Each tree is grown on a bootstrap sample using the classification and regression trees (CART) technique, with a random subset of variables chosen at each node. The out-of-bag (OOB) error rate is calculated using the observations left out of the bootstrap sample. Finally, once all trees have been produced, class memberships are decided by majority vote among the trees (Micheletti 2014). Two measures of variable importance are produced: the mean decrease in accuracy and the mean decrease in the Gini coefficient. Such measures are commonly used to rank and select variables. To reduce the OOB error and improve model performance (Taalab et al. 2018), the user should optimize two a priori parameters to run the RAF model: the number of trees in the forest (ntree) and the number of variables tested at each node (mtry).
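Returning to the rotational forest construction described at the start of this subsection, the rotation step can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: feature subsets are fixed at two features each so that PCA has a closed form, the bootstrap sample is 75% of the rows, class subsampling is omitted, and all names and data values are hypothetical.

```python
import math
import random

def rotation_matrix(X, rng, sample_fraction=0.75):
    """Build one rotation matrix from PCA on random pairs of features.
    X is a list of rows; an even number of features is assumed."""
    n_features = len(X[0])
    order = list(range(n_features))
    rng.shuffle(order)                                   # random feature subsets
    R = [[0.0] * n_features for _ in range(n_features)]
    n_sample = max(2, int(sample_fraction * len(X)))
    for f1, f2 in zip(order[0::2], order[1::2]):
        rows = [rng.choice(X) for _ in range(n_sample)]  # bootstrap sample
        a = [r[f1] for r in rows]
        b = [r[f2] for r in rows]
        ma, mb = sum(a) / n_sample, sum(b) / n_sample
        caa = sum((u - ma) ** 2 for u in a)
        cbb = sum((v - mb) ** 2 for v in b)
        cab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        # Closed-form PCA of a 2 x 2 scatter matrix: angle of the major axis.
        theta = 0.5 * math.atan2(2.0 * cab, caa - cbb)
        c, s = math.cos(theta), math.sin(theta)
        # Rows are indexed by original feature, so the matrix is already
        # arranged in the original feature order.
        R[f1][f1], R[f2][f1] = c, s      # first principal component
        R[f1][f2], R[f2][f2] = -s, c     # second principal component
    return R

def rotate(X, R):
    """Project every row of X onto the rotated axes (matrix product X R)."""
    n = len(R)
    return [[sum(row[i] * R[i][j] for i in range(n)) for j in range(n)] for row in X]

# Hypothetical values of four conditioning factors at six sample points.
X = [[12.0, 0.31, 450.0, 3.2], [25.0, 0.12, 910.0, 1.1], [8.0, 0.44, 300.0, 4.0],
     [31.0, 0.09, 1200.0, 0.7], [18.0, 0.27, 640.0, 2.5], [22.0, 0.20, 780.0, 1.9]]
R = rotation_matrix(X, random.Random(0))
X_rot = rotate(X, R)
print(len(X_rot), len(X_rot[0]))
```

In a full ensemble, a separate rotation matrix would be drawn for each base tree, a REPTree would be trained on the corresponding rotated data, and the class probabilities of all trees would be averaged.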

Model evaluation and assessment techniques
ROC curve
ROC curves have been used for many purposes. Saha (2017), Hembram and Saha (2020), Hembram et al. (2019a, 2019b), and Saha (2019a, 2019b) used ROC curves for mapping gully erosion, landslides, land subsidence, and groundwater potential. Using training and validation datasets, the LSMs were evaluated with different methods, including the true skill statistic (TSS), efficiency (Equation (6)), and AUROC (Arabameri et al. 2019c; Rahamati et al. 2019). These techniques are commonly used to compare and assess data mining models (Allouche et al. 2006; Xia et al. 2017; Chen et al. 2016; Wang et al. 2017; Hu et al. 2015). True-positive (TP) and true-negative (TN) are the numbers of pixels classified correctly, while false-positive (FP) and false-negative (FN) are the numbers of pixels classified incorrectly. The probability value of 0.5 is the threshold (Ahmadlou et al. 2018). If a model has a probability of >0.5, then it is considered helpful for landslide assessments. If the value is <0.5, then it is deemed useless for assessments. This threshold has been used in land-subsidence modelling (Frattini et al. 2010). TSS (Equation (7)) was determined from sensitivity (TPR, Equation (4)) and specificity (Equation (5)). Data-specific prevalence or data set sizes do not affect TSS (Allouche et al. 2006). The AUROC measures a predictor's efficiency as the area under the curve of sensitivity plotted against the false-positive rate (1 − specificity) at cut-off thresholds ranging from 0 to 1. The AUROC values were categorized into five groups: poor accuracy (AUROC < 0.6), average accuracy (AUROC = 0.6-0.7), good accuracy (AUROC = 0.7-0.8), very good accuracy (AUROC = 0.8-0.9), and excellent accuracy (AUROC = 0.9-1) (Fressard et al. 2014).
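Sensitivity, specificity, efficiency, and TSS were calculated as:
$$\mathrm{Sensitivity} = \mathrm{TPR} = \frac{TP}{TP + FN} \tag{4}$$
$$\mathrm{Specificity} = \mathrm{TNR} = \frac{TN}{TN + FP} = 1 - \mathrm{FPR} \tag{5}$$
$$\mathrm{Efficiency} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
$$\mathrm{TSS} = \mathrm{TPR} + \mathrm{TNR} - 1 = \mathrm{TPR} - \mathrm{FPR} \tag{7}$$
where TPR means the true-positive rate, TNR the true-negative rate, and FPR the false-positive rate.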
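The threshold-based measures and the AUROC can be illustrated with a short sketch. It uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), efficiency = overall accuracy, TSS = sensitivity + specificity - 1) and computes the AUROC as the probability that a randomly chosen landslide point receives a higher score than a randomly chosen non-landslide point. The function names, the confusion-matrix counts, and the scores are illustrative assumptions.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, efficiency, and TSS from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true-positive rate
    specificity = tn / (tn + fp)                  # true-negative rate
    efficiency = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    tss = sensitivity + specificity - 1.0         # true skill statistic
    return {"sensitivity": sensitivity, "specificity": specificity,
            "efficiency": efficiency, "tss": tss}

def auroc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative counts for a balanced validation set of 150 points.
m = confusion_metrics(tp=57, tn=51, fp=24, fn=18)
print({k: round(v, 2) for k, v in m.items()})

# Illustrative susceptibility scores: 1 = landslide, 0 = non-landslide.
auc = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(auc)
```

The pairwise formulation is equivalent to integrating the ROC curve over all thresholds, which makes it convenient for small validation sets.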

Cohen's kappa index
Cohen's kappa index was used to validate the LSMs developed by the J48, REPTree, NBTree, and RF-REPTree approaches. Cohen's kappa coefficient is a statistical measure of inter-rater agreement for qualitative items. The kappa index is considered a more robust measure than simple percent agreement because it accounts for agreement occurring by chance. The kappa index relies on the observed agreement ($P_{obs}$) in the maps and on the agreement expected by chance ($P_{exp}$) (Cohen 1960; Guzzetti 2006):
$$\kappa = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}$$
where $P_{obs}$ and $P_{exp}$ can be calculated with:
$$P_{obs} = \frac{TP + TN}{N}$$
$$P_{exp} = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N^2}$$
where $N$ refers to the number of pixels in a map. The value of $\kappa$ ranges from −1 to +1; values near 1 indicate near-perfect agreement. The kappa index can be used to assess the precision, efficacy, and reliability of landslide models (Hoehler 2000). The kappa index's accuracy value was graded into six (mostly) proportionately stratified categories: excellent (0.81-1.0), very good (0.61-0.80), good (0.41-0.60), moderate (0.21-0.40), poor (>0-0.2), and very weak (0) (Landis and Koch 1977).
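As a worked illustration, the sketch below computes Cohen's kappa for a two-class confusion matrix using the standard definition: observed agreement is the share of correctly classified points, and chance agreement is obtained from the row and column totals. The function name and the counts are illustrative assumptions.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + tn + fp + fn
    p_obs = (tp + tn) / n
    # Chance agreement from the marginal totals of observed and predicted classes.
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)

# Illustrative counts for a balanced validation set of 150 points.
kappa = cohen_kappa(tp=57, tn=51, fp=24, fn=18)
print(round(kappa, 2))
```

For these counts the observed agreement is 0.72 and the chance agreement is 0.50, so kappa is 0.44, which falls in the "good" category of the grading scheme quoted above.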

Multi-collinearity assessment
Based on the TOL and VIF statistics, the 16 landslide conditioning factors had no multi-collinearity problems (Table 2). Therefore, all factors were used as potential predictive factors to discern LS.

Effective factor assessment by RAF
Analysis of the contributions of LEFs using suitable techniques is essential in any kind of spatial modelling of natural processes for management (Rahamati et al. 2019; Roy and Saha 2019b; Roy et al. 2019; Arabameri et al. 2020d). LEF weights were calculated by the RAF model (Table 3). The results revealed that TPI was the most important factor, followed by rainfall and distance to stream, whereas soil type and LU/LC were the least important (Figure 5).

Modelling landslide-susceptible areas
The LSMs were produced using training datasets and tree-based models (Figure 6). The performance of the LS models varied. The LS index value was determined for each pixel. Susceptibility was reclassified into five classes (very low, low, moderate, high, and very high) using the natural break classification method in GIS (Irigaray et al. 2007). Approximately 23.62% of the study area in the J48 LSM (Figure 6(a)) was classified as having very high LS. The high LS area was 14.84%, the moderate was 32.55%, the low was 17.73%, and the very low was 11.27% (Figure 7 and Table 4). The NBTree LSM (Figure 6(b)) classification proportions were 18.46% very high, 35.56% high, 25.27% moderate, 10.09% low, and 10.63% very low. The very high LS class covers 18.30% of the REPTree LSM (Figure 6(a)). High LS covered 8.56%, moderate covered 10.81%, low covered 29.31%, and very low covered 33.02%. The RF-REPTree ensemble LSM classified 12.81% as very high LS, 17.36% high, 20.74% moderate, 24.72% low, and 24.34% very low.

Model evaluation and comparison
The LSMs were evaluated with the ROC curve, TSS, efficiency, and kappa index methods on the training and validation datasets (Figure 8). According to ROC, the highest reliability was achieved by the RF-REPTree (AUC values of 0.832 and 0.794 for the training and validation datasets, respectively). The AUCs for the training and validation datasets were 0.700 and 0.740 for REPTree, 0.695 and 0.788 for J48, and 0.759 and 0.728 for NBTree. All models had good to very good accuracy, but the ensemble model was the most accurate LS model.

Discussion
To reduce the detrimental impacts of landslides, delineating the areas of highest LS with accurate tools is essential. Research has been conducted comparing different LS modelling methods, but the advances in ML techniques for mapping extreme natural processes have increased their usefulness for hazard assessments. In our research, three tree-based models and one novel ensemble (J48, REPTree, NBTree, and RF-REPTree) were used to identify the most landslide-prone areas. The models indicated that the Kalaleh catchment is a very landslide-prone watershed. The proportions of the basin that were classified as having very high LS were: 23.62% (J48), 18.46% (NBTree), 18.30% (REPTree), and 12.81% (RF-REPTree). The centre of the catchment was always classified as very high LS. Steep slopes, weak lithology, fragile soils, high precipitation, and poorly engineered and poorly constructed roads and infrastructure may contribute to the high LS in this area. The RAF model indicated that TPI, rainfall, distance to stream, elevation, SPI, distances to roads and faults, and TRI contributed substantially to LS. All models were validated with ROC, efficiency, TSS, and the kappa index, which demonstrated that all models are suitable for mapping LS, but the novel ensemble (RF-REPTree) was the best performer. The REPTree algorithm prevents backward overfitting and seeks the most reliable subtree with the least variation based on the post-pruning approach (Quinlan 1987; Esposito et al. 1999). Moreover, the RF classifier prevents noise and dramatically reduces classification errors (Breiman 2001). Bootstrap sampling may reduce the sensitivity of a single classifier to noise in a data set, resulting in a corresponding reduction in classification variance. When two or more methods are combined, they may overcome individual approaches' limitations, enhance efficiency, and increase the model's prediction accuracy (Moosavi and Niazi 2016).
In this study, new ensemble architectures were used, and the accuracy was higher than in earlier studies (Dahal and Hasegawa 2008; Poudyal et al. 2010; Pandey et al. 2020). We also noticed that the ensemble of the two models (RF and REPTree) achieved better outputs than other machine learning approaches. In this study, RF-REPTree was found to have better predictive capability than the standalone models, similar to the results in Hong et al. (2018) and Chen (2019). The performance of the RF-REPTree ensemble model is better than that of previous landslide ensemble models (Wu et al. 2020). Ensemble ML methods have been used to model other processes such as gully erosion, groundwater, and land subsidence (Arabameri et al. 2019c; Rahamati et al. 2019; Roy and Saha 2019b; Roy et al. 2019). Those studies also showed that ensemble models performed better than standalone models. To tackle landslides, it is essential to delineate the area's susceptibility to landslides and determine the main factors. A significant benefit of the ML algorithms used here is that they simplify finding the relevant data by investigating multiple databases. These algorithms can tackle specific purposes and can be used in conjunction with automated analysis of large datasets to aid decision-makers. The results could help reduce the risk of landslides in the Kalaleh River watershed and in surrounding areas with similar terrain and geology.

TN  51  59  59  58  127  123  101  129
FP  24  16  16  17  47  51  73  45
FN  18  24  25  22  58  53  43  31
TP  57  51  50  53  116  121  131

Conclusion
The ML models J48, NBTree, REPTree, and RF-REPTree successfully created LSMs for the Kalaleh watershed, Iran. ROC curves and statistical techniques were used to evaluate and compare the LSMs carefully. The results indicate that the accuracies of all the models were either excellent or very good. The ensemble RF-REPTree model is the best predictive model, and it was followed in rank order by the J48, REPTree, and NBTree models. The ML ensemble is an efficient and accurate tool, overcoming errors and providing output results quite quickly. The analysis of variable significance showed that the TPI is the most significant LCF. Rainfall and distance to stream were the second and third most important factors. Soil types and LU/LC were the least important factors. These results could help land resource managers cope with the currently high levels of uncertainty surrounding landslides and help them understand the relationships between factors and landslides more profoundly. This study highlights the very high LS of the central portion of this catchment. Therefore, immediate, targeted management and planning for landslides is needed to prevent severe consequences in the Kalaleh River watershed. This modelling method could be used to guide future landslide vulnerability research, particularly for vulnerability tied to land-use change. ML ensemble simulation can improve model accuracy and decrease model uncertainty, reducing classification problems like overfitting. These models can be applied to other regions with geo-environmental characteristics similar to those of the Kalaleh River watershed.

Funding
This research was partly funded by the Austrian Science Fund (FWF) through the Doctoral College GIScience (DK W 1237-N23) at the University of Salzburg.

Data availability statement
The data that support the findings of this study are available on request from the authors.

Disclosure statement
No potential conflict of interest was reported by the author(s).