Spatial mapping of landslide susceptibility in Jerash governorate of Jordan using genetic algorithm-based wrapper feature selection and bagging-based ensemble model

Abstract This study employs five genetic algorithm (GA)-based machine learning (ML) models, namely the Decision Tree (DT), k-Nearest Neighbors (kNN), NaïveBayes (NB), Support Vector Machine (SVM), and Extreme Learning Machine (ELM), to build a novel ensemble algorithm that is founded on the Bagging method for landslide susceptibility mapping (LSM) in Jerash Governorate, north of Jordan. The GA-based wrapper feature selection (FS) was done based on the five individual models and in the initial stages of modeling, an inquiry for the best feature for each of the five models was made. Finally, five hybrid models, namely DT-GA, kNN-GA, NB-GA, SVM-GA, and ELM-GA were constructed and combined to create Bagging-based ensemble model. The FS outcomes uncovered that rainfall depth, distance to roads, the Stream Power Index, the Normalized Difference Vegetation Index, slope, geology, and aspect are the most influential determinants of landslides. After the significant variables were identified, they were selected as input predictors and entered into the models. GA-based Bagging ensemble model with the area under the receiver operating characteristic curve (AUROC) of 0.85 achieved the highest accuracy in the validation run, followed by SVM-GA (AUROC = 0.80), NB-GA (AUROC = 0.76), DT-GA (AUROC = 0.72), kNN-GA (AUROC = 0.70), and ELM-GA (AUROC = 0.48).


Introduction
Natural disasters quite often result in injuries, loss of life, damage to infrastructure and sites of cultural heritage, and displacement of people, which bring about substantial economic losses. Landslide is a natural hazard leading to extensive loss of life and property across the world every year (Schuster and Fleming 1986;Petley 2012). As human population increases and cities are extended to steep regions, the risk of mass movement of earth materials increases (Smyth and Royle 2000;Sassa et al. 2004). In densely-populated regions, this mass movement can be highly disastrous (Knapen et al. 2006). Landslides, particularly those induced by rainfall, are frequently the most devastating natural disasters in the rugged and the mountainous areas. They usually lead to life losses, property damage, and economic losses (Youssef and Pourghasemi 2021). In the past, landslides were accompanied by high mortality rates such as the cases of landslide on 2014 in Afghanistan, which killed nearly 2,700 persons and landslide on 2014 in Shanxi, China, which killed 64 persons (Zhang et al. 2015;Lin and Wang 2018). Therefore, efforts to early identify landslide-susceptible lands and take them into account in land use planning can sharply reduce the landslide risk and associated damage and losses (Fell et al. 2008).
In Jordan, flash floods and landslides are more frequent than other natural hazards. The governorate of Jerash in Jordan is the particular governorate that experiences most of the natural disasters, especially landslides. Almost every year, landslides take place in this governorate, mostly triggered by meteorological factors (e.g., heavy rainfall) and human activities, e.g., construction, road excavation, and agricultural activities (Malkawi et al. 2000). The city of Jerash is prominent in historical and scenic natural sites. As such, it enjoys high potential for tourism. Bearing this in mind, analysis and mapping of landslide susceptibility in this governorate is badly needed in order to reduce the damage associated with landslides.
Landslide susceptibility mapping (LSM) is an action by which landslide-susceptible lands can be identified and, based on that, construction and irrelevant uses of land can be prohibited (Saha et al. 2005). Maps of land susceptibility to slides are quite vital sources of fundamental data that are frequently needed for determining the areas under risk of landslides and for reducing the probable losses associated with the landslides when they occur (Yi et al. 2020;Thi Ngo et al. 2021). In previous studies, there have been numerous attempts to identify landslide-susceptible zones using a variety of approaches (Zeybek and Şanlıo glu 2015;Zeybek et al. 2015). However, the flexibility and accuracy of the approaches differ, depending on the geological and climatic characteristics of the studied regions.
A review of literature on this research theme uncloses that, in general, the task of LSM is achieved using either of two main approaches: knowledge-based and datadriven approaches (Arca et al. 2018;Mallick et al. 2018;Meena et al. 2019;Zhu et al. 2019;Mabdeh et al. 2022). The former approach encapsulates a wide range of techniques, the most famous of which is the Analytical Hierarchy Process (Pawluszek and Borkowski 2017;Abedini and Tulabi 2018;Mandal and Mandal 2018). In general, the outputs of knowledge-based approaches are dependent on experts' opinion, where contradictory and dogmatic comments may affect the results. Many of these approaches for LSM fail to deal with predictors that have negative values (e.g., the normalized difference vegetation index (NDVI)). As well, some techniques like entropy (Zhang et al. 2018) have limited capabilities to handle such data. Nevertheless, these groups of approaches performed well in many applications (Turan et al. 2020;Ali et al. 2021). The data-driven approach uses historical data for modeling (Zhu et al. 2018). In other words, the data-driven models use data on previous landslides locations and consider their influential factors in the modeling process (He et al. 2019;Nhu et al. 2020). This approach encircles a wide range of models, including the artificial neural networks (e.g., Sevgen et al. 2019;Manohar and Anbazhagan 2021;Youssef and Pourghasemi 2021); decision tree (e.g., Bui et al. 2012;Pradhan 2013;Nhu et al. 2020;Biçici and Mustafa 2021); generalized linear model (e.g., Goetz et al. 2015;Pourghasemi and Rahmati 2018); logistic model tree (e.g., Chen et al. 2017;Pham et al. 2018); Naïve Bayes classifier (Hungr et al. 2014;Zhang et al. 2018;He et al. 2019;Sevgen et al. 2019;Nhu et al. 2020;; and the support vector machine (e.g., . In previous studies, researchers made attempts to enhance the accuracy of machine learning models in LSM. For achieving this purpose, numerous hybrid models have been proposed Paryani et al. 2020;Chen and Chen 2021). Large number of these models employed heuristic algorithms in order to tune hyper-parameters of machine learning models, and, accordingly, they have the disadvantage of high computational complexity and running time. In recent years, the ensemble models, as a viable alternative, have attracted the attention of researchers in various fields Kadavi et al. 2018;Roy et al. 2019;Arabameri et al. 2020). Unlike the hybrid models, the ensemble models are not computationally expensive. Furthermore, they combine the outputs of simple, weak algorithms to generate models with higher accuracies than those of the simple models. Bearing this in mind, this study aimed at developing an ensemble model for LSM in an effort to solve the problem of recurrent landslides in the governorate of Jerash, north of Jordan. In addition, the study took into consideration the problem of accurate feature selection in the pre-modeling phase and proposed a suitable solution for this problem. It should be underscored that despite the large effect of feature selection on performance of the machine learning algorithms, most of the earlier studies failed to properly address this step.
As mentioned earlier, the models commonly used in ensemble models have been constructed without considering the feature selection, which is one of the basic preprocessing steps in modeling using machine learning methods (Kadavi et al. 2018;Pham et al. 2020;Pham et al. 2022). In fact, the use of redundant and irrelevant factors in the models increases the running time of the algorithms, the volume of data used for modeling, and in some cases, reduces the accuracy of the modeling (Shafizadeh-Moghadam et al. 2017). Therefore, in the present study, first the considered factors are entered into the feature selection phase based on a wrapper feature selection approach. Then the optimal factors are selected and used in individual models. Finally, these hybrid models constructed by wrapper-based genetic algorithm (GA) and machine learning models are used to construct the ensemble model based on bagging. Therefore, this study contributes to the literature by proposing a novel ensemble model based on Bagging and wrapper-based feature selection using GA for LSM. In consequence, the aim of this study was to produce landslide susceptibility maps for the governorate of Jerash using a bagging ensemble model constructed by the Decision Tree (DT), k-Nearest Neighbors (kNN), NaïveBayes (NB), Support Vector Machine (SVM), and Extreme Learning Machine (ELM) models and GAbased wrapper feature selection.

Study area
The area selected for this study is Jerash Governorate, which is characterized by recurrent landslides, especially in winter. It lies northwest of Jordan (32 16 0 12.12" N, 35 53 0 17.41" E) and has a land area of 420 km 2 (Figure 1). The total population of Jerash is 262,100 capita, which corresponds to a population density of 639.6 people/km 2 . This governorate is characterized by hot weather in the summer and cold to moderately-cold weather in winter.
Jerash has varying altitudes that lie within the range of 2-1,229 m above sea level. Accordingly, this governorate is considered a mountainous area. However, there are parts of Jerash that lie under the level of sea surface. This suggests presence of steep slopes in this governorate. These steep slopes are actually the main culprit for land sliding in this governorate. The field visits and inspection works performed by the researchers in Jerash disclosed that frequent mass movements (that is, landslides and rock falls) take place in this governorate. The traces on topography of this area that had been left by natural disasters are readily seen by the naked eye. Indeed, these and other natural phenomena keep altering topography of land. In other respects, the recurrent landslides result in substantial economic losses, due, for example, to damage to roads, electricity infrastructure, and telephone lines and because of sheeting of slopes and construction of retaining walls to protect people from direct exposure to landslides ( Figure 2). Furthermore, rock falls that result from geological structure of Jerash are somewhat frequent. They produce considerable material damage and create steady fear in the nearby community. Recently, investment in Jerash and the infrastructure and urban developments which have been made by the government and local people brought about growth in tourism infrastructure, which made a study of this kind necessary for helping the urban planners and decision makers in knowing the locations which are particularly vulnerable to landsliding so as to manage human activities in, and around, them.

Methodology
This study aimed at developing a bagging-based ensemble model by using wellknown machine learning algorithms and wrapper-based feature selection. To this end, first, the landslide inventory maps and conditioning criteria (i.e., determinants of landsliding) were prepared. Thereafter, the frequency ratio (FR) was employed to Figure 2. Effects of the Jerash landslide on topography (a) cracks that have formed on land and telephone lines that have been damaged (red circle). (b) A retaining wall that has been damaged, (c) a drainage ditch that has been filled with earth and become unusable d landslide area cracks.
determine the weight of each class of variables affecting landslide occurrence. Before using the models to solve the LSM problem, the proposed hybrid feature selection algorithms based on five machine learning models and GA were employed to determine the optimal determinants of land sliding. The identified determinants were then selected and employed as inputs to the single models and each of the models is constructed. Then the output of these models is combined to construct the bagging-based ensemble model. Finally, the models were all evaluated based on the area under the receiver operating characteristic curve (AUROC) criterion. Then, landslide susceptibility maps were produced. Figure 3 shows the overall process applied in this study.

Inventory landslide map
Generation of inventory landslide maps is the earliest and most vital step in the LSM. This study tracked and documented the landslides that took place in the study area through site visits and collection and analysis of local information, Google Earth images, and site maps, which were obtained from The Watershed Office of Jerash, Jordan. Based on this, sixty-nine landslide locations were spotted on the map of Jerash. Thereafter, data were split in a random manner into training sub-set (70.0% of the data) and validation sub-set (30.0%). For non-landslide locations, sites in the governorate of Jerash with no reported land sliding history have been chosen by the researchers, considering a number of criteria. One of these criteria is that the site has no history of landslides few decades, or centuries, earlier. Another criterion is that the site has no likelihood for landslides. Moreover, in the case when the site has soft crust (like that the bare lands and the sites that are employed for cultivation of seasonal crops) the sites to include in this study were selected in places with low slopes like flat lands or tops of the hills, provided they have cohesive soils. In sites with no steep slopes (namely, sites with low to moderate slopes), non-slipping locations were selected if they were covered with permanent green cover (e.g., woodlands and forests) or if they were having solid cover of rocks. Meanwhile, locations with urban activity established for decades that are entirely free from indications of landslides were too selected. On the other hand, locations with established landslide activity were selected, irrespective of their soil types and geological structures.

Landslide conditioning criteria
Land sliding in Jerash is influenced by various factors. The most important of which are: (i) extending the transportation networks without studying geological structures of the areas in which the new roads will be created and the areas wherein the roads are to be widened; (ii) extending tourist infrastructure and facilities such as restaurants and vehicle parks in scenic areas (e.g., edges of steep slopes and summits of mountains) without considering soil and geological structure types for the new constructions themselves and for the new roads leading to them; (iii) performing construction activities in slums prevailing on village outskirts and countryside, where monitoring by the local municipalities is normally less regular and efficient than monitoring in urban centers. Usually, the people implementing these works do not take precautions against geologic threats to construction; and (iv) the agricultural works, especially in private farms, which do not take into account factors leading, or contributing, to sliding of land such as ploughing and repeated, heavy irrigation. Based on a review of the literature, the available data, and opinions of experts, sixteen landslide conditioning criteria (that is, determinants or influential variables) were selected, being highly relevant for LSM. These are aspect, distance to drainage, distance to faults, distance to roads, elevation, geology, land cover and use (LULC), Normalized Difference Vegetation Index (NDVI), plan curvature, profile curvature, rainfall, Sediment Transport Index (STI), slope, soil texture, Stream Power Index (SPI), and Topographic Wetness Index (TWI). Maps for these 16 predictors were, then, prepared in the Geographic Information System (GIS) environment. The selected factors were not all used in the modeling. Rather, they were filtered for modeling after careful feature selection. Table 1 shows the source and resolution/scale of the spatial layers.

Elevation
Elevation is the height above or below a reference point. Usually, the values of elevation are extracted from Digital Elevation Model (DEM). As Figure 4a indicates, the elevation values range in the study area from 2 m to 1229 m. Determinants of susceptibility to landslides: Geology, distance to roads, rainfall depth, and the Topographic Wetness Index. c. Determinants of susceptibility to landslides: Distance to faults, plan curvature, Normalized Difference Vegetation Index, and profile curvature. d. Determinants of susceptibility to landslides: Distance to drainage, Stream Power Index, soil texture, and Sediment Transport Index.

Aspect
Aspect is the steepest downhill direction (Kadavi et al. 2018). It affects shear strength of the sliding material and influences rainfall distribution, being associated with the physiographic properties of the area ).

Slope
Slope helps in controlling the moisture content and the level of pore pressure of the soil. An increase in the angle of inclination leads to instability of the slope and, consequently, to an increase in the likelihood for landslides to occur (Bahrami et al. 2020). The slope angle also controls the surface water runoff, slope stress distribution, slope sediment accumulation, and soil erosion (Fang et al. 2021). As shown in Figure  4a, the slope angles range in the study area (the governorate of Jerash) from 0 to 30 .

Land use
According to Souissi et al. (2020), land use influences hydrological processes such as runoff and infiltration, which are processes that are much affected by presence and distribution of bare lands, buildings, flow paths of streams, vegetation cover, roads, etc. Land cover and land use (LULC) may have negative and positive roles in the susceptibility of land to the slides. For example, built-up structures result in stabilization of slopes. However, non-expert building activities at slopes compromise slope stability and trigger landslides (Bahrami et al. 2020). As regards the study area, it is seen in Figure 4a that nine land covers and land use types prevail in Jerash.

Geology
Geology of the area influences the shear strength of the rock and indicates the probability of increase in neutral pressure in the sub-soil . Figure 4b uncovers that there are five types of geology in the study area. During the Mesozoic and the Early Cenozoic era, sedimentation was actually controlled by the fluctuational movement of Tethys Ocean, northwest of Jordan, and the eustatic movement of the Arabian-Nubian Shield and, thus, its Palaeozoic rocks in the south of the country.
The study governorate is characterized by exposures of sedimentary rocks that mainly include chalk, chert limestone, marl, and sandstone of the cretaceous and Jurassic ages ( Table 2). Because of the pre-Jurassic tectonic movements, the Azab Group, which constitutes the oldest layers that are exposed in the area under study, overlies the Triassic sequence. This group is covered by the Early Cretaceous Kurnub Sandstone (KS) Group that is overlain by thick sequence of marls and limestone of Ajlun (Cenomanian-Turonian) Group. This group of rock formations is divided into five major formations: Wadi As Sir Limestone (A7), Shuyab Limestone (A5/6), Hummar Limestone (A4), Fuheis Limestone (A3), and Na'ur Limestone (A1/2). Vast segments of the northern parts of Ajloun are covered with three formations of the Belqa Group, namely, Wadi Umm Ghudran Chalk (B1), Amman Silicified Limestone (B2a), and Al Hisa Phosphorite (B2b), which is a Coniacian-Campanian formation. Limited outcrops of Muwaqqar Chalk Marl formation (B3), which is Maastrichtian-Palaeocene formation, are found in the northwestern sections of Ajloun. In this governorate, Quaternary sediments are mainly plateau gravels, soil, wadi sediments, calcrete, and alluvium. On the other hand, two principal fault patterns, of the Late Tertiary age, presumably, characterize the geologic structure of this governorate: east-west and northeast-southwest faults. Accordingly, most of the folds in this governorate are either east-northeast folds or northeast to west-northwest folds.

Distance to roads
The values of distance to roads ranged in the current study area from 0 m to greater than 600 m (Figure 4b).

Rainfall
Rainfall influences the saturation level of the soil and the amount of runoff through it and raises the water pressure in the soil pores (Bahrami et al. 2020). With respect to Jerash, Figure 4b points out that the rainfall ranged in this governorate from 250 mm to 550 mm.

Topographic wetness index (TWI)
Since the TWI is hydrological variable, then it gives quantitative assessment of moisture condition of soil, which is an essential contributor to landslide incidence (Fang et al. 2021) that is widely used in representation of the influence of topography on them. Mathematically, the TWI is given by Eq. (1): In this equation, the variable A S is specific catchment area whereas b is slope gradient in degrees radian . In this study, values of TWI were classified into three intervals: 2-9, 9-13, and 13-19 ( Figure 4b). 3.2.9. Distance to faults The distance to faults is the maximum distance from faults. In the case of the present study area, Jerash, the distances to faults fell in the range of 0-11,000 m ( Figure 4c).

Plan curvature
Plan curvature is perpendicular to the slope direction (Fang et al. 2021). As can be noticed in Figure 4c, the plan curvature values in the governorate of Jerash have been grouped into three classes: < À0.05, À0.05 to 0.05, >0.05.

NDVI
The NDVI is an index commonly employed to outline vegetative cover (Kadavi et al. 2018). The NDVI is an indicator of vegetation intensity (and, therefore, biomass) and distribution. Yang et al. (2012) clarified that changes to vegetation in the vegetated areas can result in failure of slopes. They also indicated that drops in values of the NDVI, reflecting disturbance in vegetation, within short periods of time can be employed for quick identification of areas with high potential for landslides. In the current study, an NDVI map for the study area was created from Landsat 8 images in the ENVI 5.1 software environment. The resulting NDVI values were categorized into six groups: (i) À0.46 to À0.015, (ii) À0.015 to À0.08, (iii) À0.08 to À0.03, (iv) À0.03 to 0.01, (v) 0.01 to 0.05, and (vi) 0.05 to 0.28 (Figure 3c). These values were obtained using Eq. (2): In this equation, the acronyms R and IR stand, respectively, for intensities of energies that are reflected in the red and infrared regions of the spectrum of electromagnetic radiation (Pradhan et al. 2010). Figure 4c unveils that the NDVI values varied in the study area from À0.35 to 0.20. According to Yang et al. (2012) Analyzing the disturbances on vegetation detected from the abnormal sudden drops of the normalized difference vegetation index (NDVI) within a short period can be used for the purpose of rapid landslide identification.

Profile curvature
The profile curvature normally lies parallel to direction of slope. It represents the amount of deformation of the slope surface and contributes to generation of landslides (Fang et al. 2021). For the study area, the profile curvature values were divided into three categories: < À1, À1 to 0.2, and >0.2 as shown in Figure 4c.

Distance to drainage
Values of distance to drainage have the range of 0 to >600 m in the area of study (Figure 3d). These values have been categorized into five intervals as shown in Figure 4d.
3.2.14. Stream power index (SPI) The SPI expresses stream erosive power (Fang et al. 2021). It can be calculated according to Eq. (3): where A S is the specific catchment area and b is the slope gradient . As can be seen in Figure 4d, Jerash has SPI values ranging from 1.54 to higher than 11.08. Figure 4d reveals that four major textures characterize soils of the study area: Silty clay loam, loam, silty loam, and clay loam.

Sediment transport index (STI)
The STI describes slope failure mechanism. It is measure of amount of sediment that is, or can, be transported (Fang et al. 2021) and is computed using Eq. (4): where b is the slope of each pixel and A S is the upstream area .
The values of STI in Jerash had the range of 0 to > 80 (Figure 4d).

Correlations among landslides and their determinant factors
The frequency ratio (FR) was used to investigate probable relationships between landslide determinant factors (variables) and landslide incidence. The FR estimates the likelihood of the presence of a phenomenon in a class of a variable (Lee and Sambath 2006). Stated otherwise, the FR quantifies associations of every group of variables and the locations of landslides. Mathematically, it assumes the formula (Lee and Sambath 2006): In this formula (Eq. 5), the acronym A describes number of pixels of landslides in the particular class, B denotes overall number of occurring landslides, C indicates area of class, and D expresses area of the studied district or province.

Feature selection using genetic algorithm (GA)
Since a better input than otherwise will lead to a less complex and more accurate model, selecting the appropriate features to be used as predictors has always been one of the main concerns of modeling algorithms (Bouhamed et al. 2015). Within this context, use of the correlations amongst the predictor variables and the target variable is a common approach to feature selection. However, this method can only show the effect of a single predictor on the target variable, but a combination of multiple variables with high correlation coefficients may not always be as much effective as expected (Guyon and Elisseeff 2003;Liu et al. 2005). In this study, we used wrapperbased feature selection that combine the GA with the DT, kNN, NB, ELM, and SVM methods in an effort to identify, and, then, select the best features for each model from the pool of available input features. By doing so, the effects of multiple variables are considered simultaneously and, thereupon, the problem of feature selection is eliminated. However, in general, a complete search of the feature space that examines all likely feature combinations is time consuming and it can be infeasible once the number of input features exceeds a certain limit.
In the feature selection method proposed in this paper, which is known as the wrapper approach, first, a random initial population of solutions is generated (Rodrigues et al. 2014), where each chromosome represents a feature set, and each gene designates a single feature. In a single chromosome, if a gene has the value of 0, then the corresponding variable is not part of the final feature set. On the other hand, if a gene has the value of 1, then the corresponding feature is present in the final feature set. At the beginning of implementation of the algorithm, these genes are randomly initialized. Once the initial population is generated, the feature set concomitant with each chromosome is fed to the DT, NB, ELM, SVM, and kNN algorithms, and the modeling is carried out using the training dataset. Once the models are created, the model-predicted values are compared with the true values and the root mean-squared error (RMSE) is calculated. That is, the RMSE serves as fitness function in GA and its values are utilized as foundation, or criterion, for selection of the chromosomes by the roulette wheel method. The chromosomes with low RMSE values have a higher chance of being selected by the roulette wheel, and vice versa. When selection operator is run, the method of two-point crossover is employed to join all chosen chromosomes and generate the sequent generation of the solutions. This process will be repeated and, after a certain number of iterations, the best available feature set will be chosen for modeling.

The k-nearest neighbors (kNN)
The kNN algorithm represents non-parametric approach to supervised classification and stands as the simplest machine learning algorithm for this purpose (Kramer 2013). The foundation of this algorithm is the classification of any new data according to the most similar earlier data (Kramer 2013). For this purpose, the k parameter, that is, number of nearest neighbors to take into account, is specified. After that, the k samples of data which are most identical to the sample of interest are chosen on the basis of distances of data points from one the other (Kramer 2013). The Euclidean distance (Eq. 6), Manhattan distance, and Minkowski distance are among the distance metrics that are typically used for this purpose (Kramer 2013).
Once the nearest neighbors of the desired data sample are determined, the number of samples of each class in the neighborhood is determined and the class with the majority vote is considered as the class of the new data sample. The result of the algorithm is highly dependent on the value of k because the boundaries of the classes vary as the value of k varies. Hence, specification of the best k value is highly critical. Furthermore, the kNN algorithm is sensitive to the scale of the variables. Thus, data normalization can be helpful. However, the most prominent problem of the kNN algorithm is its high computation complexity and execution time, which is a result of the need for re-processing the entire training dataset whenever a new data sample is classified.

Naï ve Bayes (NB) algorithm
The NB algorithm is a machine learning algorithm that employs statistical methods to classify data (Leung 2007). This algorithm is based on the Bayes theorem (Kelly and Kolstad 1999): Given a test sample of data X ¼ fx 1 , x 2 , :::, x n g having n features. At first, we assume that the features are independent variables, i.e., each feature participates independently in determination of the probability of the data belonging to each class. This assumption simplifies the calculations. The NB algorithm determines the probability for the data sample to belong to each class in the set C ¼ ðC 1 , C 2 , :::, C K Þ to predict its class. The algorithm computes the probability of occurrence or non-occurrence of any of the cases using the Bayes theorem (Kelly and Kolstad 1999): However, the independence assumption leads to: The data sample is, accordingly, classified into the class with the highest probability obtained using the training dataset. This algorithm is suitable for handling multi-class problems and it does not require a lot of training data. Additionally, it has a high processing speed and it produces classifications of good accuracy (Kelly and Kolstad 1999).

Decision tree (DT)
The DT represents one of the machine learning methods which may be employed in classification and regression problems (Kotsiantis 2013). The DT comprises three parts: root node, internal nodes, and leaf nodes (Kotsiantis 2013). The data features are explored in the root node and each internal node. Then, the most discriminating data feature is selected for partitioning the data (Kotsiantis 2013). The Entropy (Eq. 11) or Gini (Eq. 12) criteria are often employed for partitioning data (Breiman et al. 1984). These two metrics measure the impurity of the node. The feature with the lowest Entropy or Gini value provides the highest amount of information for classification and it is the best choice for the purpose of data partitioning (Breiman et al. 1984).
These two metrics are employed to disclose the optimal feature for partitioning the data. However, Entropy has somehow better accuracy performance than Gini. On the other hand, the Gini metric involves less computation and it is faster than Entropy (Breiman et al. 1984). Once the best feature is determined, the data is partitioned based on it. The DT grows as this process continues. In the end, each leaf node is labeled with a class. The chance of over-fitting is relatively high in the decision trees. Early stopping on the basis of such measures as maximal tree depth or lowest number of samples in the nodes of the leaf is an approach that is quite often employed for avoiding this sort of problem (Breiman et al. 1984). Interpretability represents one of the major favorable characteristics of this kind of model.

Extreme learning machine (ELM)
Relatively, the ELM is new machine learning model, which has been applied with success in differing fields and in providing solutions to a variety of problems (Huang et al. 2006). Initially, the ELM was proposed as a learning framework for the generalized single-hidden layer feed-forward neural network (SLFN) but was later extended to other generalized SFLN models (Huang et al. 2004). The output function of a generalized SFLN can be expressed as illustrated in Eq. 13 (Huang et al. 2006): where h i ðxÞ maps the d-dimensional input space (x) to an l-dimensional space. The acronyms a i and b i are actually randomly-initialized variables of the ith node while b i stands for weights of connections between the output and hidden layers. In the meantime, the parameter L indicates the total number of functions that need to be estimated. For a binary classification problem, the decision function can be defined as shown in Eq. (14) (Huang et al. 2006): Unlike the conventional learning algorithms, the ELM not only aims at minimizing the error on the training set but also tries to minimize the norm of the output weights (Huang et al. 2006). Based on Bartlett's Theory, minimizing the norm of weights in the SFLNs, in addition to minimizing the training error, results in better generalization (Huang et al. 2004). Thus, the objective function of ELM is formulated according to Eq. (15) to reduce the training error and the norm of the output weights (Huang et al. 2004).
where G a i , b i , x i ð Þexpresses the relationship between the ith variable and its weights.
As can be observed, minimizing kbk is the same as maximizing 2kbk À1 , which is equivalent to maximizing the margin of the two classes in a binary classification problem. To improve the learning model and extend it and to avoid projection, which is not always obvious for the designer, the kernel-based model is used. In this case, the ELM kernel is defined based on Eqs. (17) and (18) (Huang et al. 2004): where T denotes the transpose of a vector and Kðx i , x j Þ represents the relationship of the x i and x j variables in terms of a function K. Meanwhile, the parameter I is the identity matrix with appropriate dimensions and C 2 R þ is a tuning parameter.

Support vector machine (SVM)
Based on the statistical learning theory, the SVM is one of the widely-used machine learning methods. In this method, the features or factors are used along with an optimization algorithm to identify the samples that constitute class boundaries. They are also employed to create an optimal, linear decision-making boundary to separate classes. These samples are called support vectors (Vapnik et al. 1995). The optimal margin method is adopted to determine the decision-making boundary between two completely separate classes (Vapnik et al. 1995). Using kernel functions, this model transfers data to a space with higher dimensions, where data can be separated through a linear discriminating function (Patle and Chouhan 2013). The model can be defined using Eq. (19) by considering datasets f x 1 , y 1 ð Þ , x 2 , y 2 ð Þ , :::, ðx n , y n Þg in which x i refers to the samples whereas y i is the label or class of each sample (Patle and Chouhan 2013).
where a i , K, and C refer to the Lagrangian coefficients, a kernel function, and a constant, respectively. The value of a i can be obtained by solving Eq. (19). The support vectors (s) are the internal values (0, C) and the constant that is called the bias (b) is determined by using these vectors. After these parameters are calculated, the decision-making function is obtained from Eq. (20) (Patle and Chouhan 2013): Equation (20) shows the kernel function K to be having different types of functions such as the linear, sigmoid, polynomial, and radial basis functions (RBF). This study used the kernel of the RBF due to its good performance.

Suggested bagging ensemble model
Bagging stands for Bootstrap Aggregation. It is quite robust ensemble machine learning technique that may be employed in classification and regression problems (Breiman 1996). It is usually used in combination with non-robust and overfit-prone models such as the DT (Breiman 1996). Using this method reduces the variance, increases the robustness, and enhances the overall prediction accuracy of the model.
In this algorithm, once the base learning model is determined, bootstrapping and aggregation are performed. In the bootstrapping phase, multiple sub-sets of the training dataset are created using random sampling with-replacement sampling. Withreplacement sampling ensures that the probability for a data sample to be selected for a new sub-set is equal for all data samples and that creation of each new sub-set is independent of selection of the others (Breiman 1996). The base model is not robust. Therefore, using the same training algorithm with different datasets results in different classifiers. Since sampling of each sub-set is independent of sampling of the others, then the models can be trained independently and in parallel. In the aggregation phase, the results of different classifiers are combined in order to provide the final prediction. In regression problems, the final prediction is determined by averaging the results of the trained models (Breiman 1996) while in the classification problems the majority voting approach is used to combine the results.

Results
The FR was employed in the present study to determine the weight of each class of variables affecting landslide occurrence. Table 3 displays the most and the least important classes of each factor. In elevation factor, the class 318-533 m had the highest weight in the FR model. In aspect factor, all classes almost contribute to the landslide occurrence and only class flat has no effect on the landslide occurrence. For the slope factor, all classes contribute to the landslide occurrence. Landslides were more common in rainfall classes 350-400 mm and 400-450 mm. Furthermore, the highest weights for other factors were achieved by SPI between 200 and 300, TWI < 9, STI between 0 and 565, distance to drainage between 0 and 150 m, distance to fault between 0 and 2000 m, NDVI between 0.05 and 0.28, Sandstone class from geology factor, and Soil rocks from land use factor. After determining the weights for each class, these weights were assigned to the classes.
To eliminate the co-linearity between the variables and remove the unnecessary factors, a combination of GA and five machine learning models based on wrapper were used for feature selection of the conditioning factors. When running the feature selection algorithms in the current study, the maximum iterations and the number of chromosomes in each generation were set to 200 and 20, respectively. Figures 5-9 show convergence of the fitness function in the feature selection algorithms. Figure 5 points out that mean RMSE of GA-DT model has started with 0.14, and after running the algorithm for 200 generations reaches to 0.0093. It should be noted that the optimal value for this fitness function is zero. Meanwhile, Figure 6 indicates that the mean and best values of fitness for the GA-ELM were 0.1516 and 0.1096, respectively. For the other three remaining algorithms (Figures 7-9), the mean and the best fitness values were respectively 0.3361 and 0.3344 for the GA-kNN, 0.4449 and 0.4415 for the GA-NB, and 0.4039 and 0.4012 for the GA-SVM. The results of feature selection using GA-kNN, GA-NB, GA-SVM, GA-ELM, and GA-DT algorithms are provided in Table 4. As this table reveals, the elevation was not among the selected features for any of the models. This is in line with the study of (Micheletti et al. 2014) in Plateau, Switzerland. The aspect and geology factors were among the influential variables for all models, and play a crucial role in LSM.  In addition, in four of the five basic models for feature selection, the slope, NDVI, SPI, distance to roads, and rainfall depth were among the relevant variables.
After choosing the optimal features for each of the five models, namely DT, SVM, kNN, ELM, and NB by wrapper-based feature selection based on GA, the modeling was done using these single models separately. Then the output of these models was combined by the Bagging approach and an ensemble model was created. In fact, this ensemble model is a combination of five machine learning models (formed by the factors selected by feature selection based on the wrapper approach) and bagging approach.
In the LSM task, the objective is identification of the areas with the high likelihood for incidence of landslide. Therefore, irrespective of the approach followed in order to generate landslide susceptibility map, what matters the most is, in effect, reliability  of the approach. We used the AUROC metric to evaluate the various aforementioned models. Figures 10 and 11 present AUROC values for the six developed models in train and test phases. These figures show that Bagging-based ensemble model with the AUROC of 0.99 and 0.85 has the highest accuracy in training and validation phases, respectively. The results also indicate that the ELM model was over-fitted ( Figure 10). For this model, the AUROC was 0.99 in training and 0.43 in validation.
When the models were constructed and evaluated, all cells of the study area along with their predictor factors were entered into the models to calculate likelihood of incidence of landslides. Then, these outputs were employed in creation of landslide susceptibility maps. Figure 12 shows the landslide susceptibility maps generated by the five algorithms. These maps were classified into five classes namely, very low, low, moderate, high and very high susceptibility based on natural break method. The outputs of the mapping reveal that high-risk locations are mostly found in the east, south, and north of Jerash ( Figure 12); the western areas of this governorate prove to be less susceptible to landslides.

Discussion
Landslide susceptibility mapping is of great help for identification of instable areas. This study developed five models for LSM in Jerash; five individual models and novel Ensemble model. Those models were developed by using16 predictor variables and five feature selection algorithms that are employed for the first time to specify the Figure 10. The values of the AUROC for the six models in the training phase. Figure 11. The values of the AUROC for the six models in the testing phase.
most effective features for prediction of land susceptibility to sliding. Seven factors, namely, aspect, slope, geology, NDVI, SPI, distance to road, and rainfall depth were identified by most of the feature selection algorithms employed in this study as the landslide determinant factors. The outcomes of this mapping can be employed as foundation for planning a variety of activities that encircle slope stabilization and construction works. The results of this study provide evidence on that using ensemble models can result in reliable modeling and produce robust results. This is partly due to employment of multiple single models for developing integrated ensemble model. In the ensemble model, the class label of a sample is decided based on the outputs of several single models. A number of previous studies conducted in various areas around the world have reported similar results and shown that, in comparison with single models, the ensemble models often achieve better results. Naghibi et al. (2016) used an ensemble model based on an evidential belief function and a set of tree-based models for spatial modeling of groundwater potential. Their results indicate that the ensemble model achieves better performance than single models, which is a result with which our findings agree (Arabameri et al. 2020) used a combination of expert knowledge-based (AHP), bivariate (statistical index, SI) and multivariate (linear discriminant analysis) models as an ensemble model for landslide susceptibility mapping (LSM) in Mazandran Province, Iran. They found that the AHP-SI ensemble model achieved the highest accuracy (Youssef et al. 2015) used an ensemble method of frequency ratio and logistic regression for LSM in Fayfa area which is located in the southwest of Saudi Arabia (SA) in Jazan region. They found that the proposed ensemble model outperformed other single models. (Di Napoli et al. 2020) used the ensemble of artificial neural network, generalized boosting model and maximum entropy for LSM in the Monterosso al Mare area, Cinque Terre National Park (Northern Italy). Their results show that proposed ensemble model has good performance for LSM.

Conclusion
Landslide susceptibility mapping has many applications in the infrastructural activities of establishing and developing structural, agricultural, and natural resource operations. This study developed a new ensemble model that is based on the bagging approach and which was constructed from five individual single models for mapping landslide susceptibility in Jerash, Jordan. The modeling was performed in two phases. In the first phase, we used five wrapper-based feature selection algorithms to select the most important predictor variables for each model. These algorithms combine the GA with five basic machine learning models, that is, kNN, SVM, NB, DT, and ELM. The results of this phase have shown that aspect, geology, slope, NDVI, SPI, distance to roads, and rainfall depth are the seven most important determinants of landslides in the studied governorate. These seven variables were employed in LSM using the five aforementioned single models. Then, these models were used to create an ensemble model. The modeling results prove that the ensemble model is robust, both in the training and the validation phases. Thus, researchers and land use and urban development planners can use this model in their studies.