Modeling landslide susceptibility based on convolutional neural network coupling with metaheuristic optimization algorithms

ABSTRACT Landslides are one of the most common geological hazards worldwide, especially in Sichuan Province (Southwest China). The current study's main purpose is to explore the potential of convolutional neural networks (CNN) coupled with metaheuristic optimization algorithms, namely beluga whale optimization (BWO) and the coati optimization algorithm (COA), for landslide susceptibility mapping in Sichuan Province (China). For this aim, fourteen landslide conditioning factors were compiled in a spatial database. The effectiveness of the conditioning factors in the development of the landslide predictive model was quantified using the linear support vector machine model. The area under the receiver operating characteristic (ROC) curve (AUC), the root mean square error, and six statistical indices were used to test and compare the three resultant models. For the training dataset, the AUC values of the CNN-COA, CNN-BWO and CNN models were 0.946, 0.937 and 0.855, respectively. In terms of the validation dataset, the CNN-COA model exhibited a higher AUC value of 0.919, while the AUC values of the CNN-BWO and CNN models were 0.906 and 0.805, respectively. The results indicate that the CNN-COA model, followed by the CNN-BWO model and the CNN model, offers the best overall performance for landslide susceptibility analysis.


Introduction
Landslides are considered to be one of the most harmful environmental hazards in mountainous areas, causing heavy losses to human life and infrastructure projects every year (Schuster 1996; GAR 2009; 2019; Ghorbanzadeh, Gholamnia, and Ghamisi 2022; Trinh et al. 2022; Yang et al. 2023). Official statistics show that from 2010 to 2019, there were approximately 90,000 landslide events in China, resulting in about 8,000 fatalities and tens of billions of dollars in economic losses (Lv et al. 2022). Sichuan Province experiences 25% of all geohazards in China annually, and landslides account for 69% of these events. Landslide occurrences are largely affected by various factors, including geoenvironmental factors (e.g. topography, tectonics, land use, etc.) and triggering factors (rainfall, earthquakes, glacier outbursts, etc.) (Carrara et al. 1991; Petley 2012; Okalp and Akgün 2016; Okalp and Akgün 2022; Merghadi et al. 2020). Therefore, landslide susceptibility mapping (LSM), defined as the possibility of a landslide occurrence under a specific set of conditioning factors, is strongly recommended as a crucial step for disaster prevention and mitigation (Brabb 1984).
Local governments can make more informed land use and development decisions for landslide-prone regions with the help of LSM. However, new techniques with strong prediction capabilities still need to be adopted to increase the accuracy of LSM.
Commonly used landslide susceptibility models can be grouped into four main types: heuristic, deterministic, statistical, and machine learning models (Merghadi et al. 2020). Heuristic models, such as the analytic hierarchy process (Abedini and Tulabi 2018; Azarafza, Ghazifard, and Asghari-Kaljahi 2018) and weighted linear combination (Hung et al. 2016), primarily rely on the subjective judgments of relevant experts. Such models are time-efficient and can be applied at any scale of study (Guzzetti et al. 1999). Their major shortcoming is their subjectivity, which stems from the expert knowledge-based experiential assessment of landslide preparatory factors. Deterministic models use simplified, physically based landslide modeling techniques to analyze slope instability (Ji et al. 2022). By explicitly or implicitly utilizing numerical models to predict a scenario from the understood effect of assigned elements on slope stability, deterministic models address the difficulty of studying and predicting hazards (Mebrahtu et al. 2022; Wei et al. 2023). These models can produce precise results and support an in-depth study of landslide mechanisms. Deterministic models, however, require large amounts of detailed data to provide credible results for large-scale studies (i.e. from the watershed level up to the county/provincial level), which comes at a high cost both financially and computationally (Youssef and Pourghasemi 2021). Therefore, deterministic models are currently not applicable for large-area susceptibility zonation exercises. Statistical models were developed to investigate relationships between contributing factors and the occurrence of landslides, given the difficulties of applying deterministic models (Reichenbach et al. 2018). Different types of statistical models, such as the frequency ratio (FR) (Berhane, Kebede, and Alfarrah 2021), logistic regression (Eker et al. 2015; Zhao and Chen 2023), Dempster-Shafer theory (Shirani, Pasandi, and Arabameri 2018), weight of evidence (Ozturk and Uzel-Gunini 2022), and the entropy index (Hodasová and Bednarik 2021), have been used to map landslide susceptibility. When predicting the spatial distribution of landslides, statistical models offer a more flexible and cost-effective approach (Huang et al. 2021). It should be noted, however, that statistical models perform poorly in assessing the nonlinear relationship between the occurrence of landslides and conditioning factors. To overcome the limitations of statistical models, machine learning models have been introduced for LSM. Various machine learning models have been employed in LSM, such as support vector machines (Dou et al. 2020), artificial neural networks (Eker et al. 2012; Lucchese, de Oliveira, and Pedrollo 2021), decision trees (Guo et al. 2021), random forests (Taalab, Cheng, and Zhang 2018; Sun et al. 2021), naïve Bayes (Youssef and Pourghasemi 2021), classification and regression trees (Pourghasemi and Rahmati 2018), and maximum entropy (Kornejady, Ownegh, and Bahremand 2017). Machine learning models are highly recommended due to their ability to address nonlinear real-world problems (Hu et al. 2020). However, classical machine learning models may find erroneous correlations during the modeling process and are unable to extract more representative features from the input data to increase the accuracy of the prediction process. Each of the LSM models previously mentioned has its own advantages and drawbacks, which are listed in Table 1.
In recent years, many researchers have developed novel, reliable methods to overcome the shortcomings of classic machine learning models by using deep learning models for landslide susceptibility mapping, such as recurrent neural networks (Ngo et al. 2021), convolutional neural networks (CNN) (Nikoobakht et al. 2022; Huang et al. 2023), and generative adversarial networks (Al-Najjar and Pradhan 2021). These studies show that deep learning models can outperform machine learning algorithms in prediction accuracy. However, most deep learning models require a substantial amount of data for training, and when utilized alone, there is a risk of missing the best-fit function during the training process (Azarafza et al. 2021). To improve the performance of deep learning models, researchers have incorporated metaheuristic procedures. Effective solutions to optimization problems can be found by applying metaheuristic algorithms that employ random operators, trial-and-error procedures, and random scanning of the problem-solving space (Dehghani et al. 2023). Metaheuristic optimization algorithms are widely used to solve challenging real-world optimization problems, both nonlinear and multiplexed (Houssein et al. 2023). Several metaheuristic algorithms have been proposed in the literature, such as elephant clan optimization (Jafari, Salajegheh, and Salajegheh 2021), the Geometric Octal Zones Distance Estimation algorithm (Kuyu and Vatansever 2022), beluga whale optimization (BWO) (Zhong, Li, and Meng 2022), the artificial jellyfish search optimizer (Chou and Truong 2021), the mountain gazelle optimizer (Abdollahzadeh et al. 2022), the reptile search algorithm (Abualigah et al. 2022), and the coati optimization algorithm (COA) (Dehghani et al. 2023). These algorithms are distinguished by their superiority in resolving various optimization problems due to their dynamic search behavior and global search capacity. The accuracy of the susceptibility maps produced by deep learning algorithms can be improved by metaheuristic optimization techniques (Jaafari et al. 2022). To achieve a reasonable improvement in LSM, it is therefore necessary to explore such combined methods in various landslide-prone areas.
CNN is a powerful deep learning model that is crucial in solving pattern recognition problems (Youssef et al. 2022). Nevertheless, the parameter settings significantly affect the learning process, which in turn impacts the prediction model's accuracy. To overcome the difficulty of determining the parameter settings, two metaheuristic optimization algorithms, COA and BWO, were introduced to increase the prediction accuracy of the CNN model by improving its hyperparameters. The COA algorithm has several advantages: (1) its design has no control parameters, hence no parameter tuning is required; (2) it is highly efficient in dealing with various optimization problems and complex high-dimensional problems across the sciences; and (3) it exhibits an excellent balance between exploration and exploitation, enabling high-speed convergence toward suitable values of the decision variables in optimization applications. It is challenging for any metaheuristic algorithm to reach the global optimum while balancing exploration and exploitation. In tackling optimization problems, the BWO algorithm provides a good balance between the exploration and exploitation phases. Furthermore, because the search trajectories cluster close to the global best solution, this algorithm achieves fast convergence. In this study, the two optimization algorithms, BWO and COA, were used to enhance the original CNN model's predictive capability in the Sichuan Province of China, which is a landslide-prone area. The two new models (i.e. CNN-BWO and CNN-COA) can increase the prediction accuracy of the CNN model and generate highly reliable susceptibility maps. Additionally, the CNN-BWO and CNN-COA models are applied to LSM for the first time, which is the primary distinction between the work presented here and other studies. The results of this study have implications for landslide prevention in the study area and for other pertinent studies.

Study area
Sichuan Province is located in the southwestern part of China, covering an area of approximately 486,100 km² and extending between 97°21′ and 108°31′ E longitude and 26°03′ and 34°19′ N latitude (Figure 1). Due to the presence of extensive river networks, the study area has abundant water resources. The study area's topography and geomorphology are highly diversified, consisting of basins, mountains, hills, and plateaus. The Yunnan-Guizhou Plateau; the Bayankala, Min, and Daba Mountains; and the Tibetan Plateau form the southern, northern, and western borders of the study area, respectively. The general elevation of the study region ranges from 688 to 4,745 m above sea level. The slope gradient in this region was divided into seven categories, and 88.45% of the gradients are less than 38°. The study area includes three main climate zones: the subtropical climate zone, the alpine climate zone, and the mid-subtropical humid climate zone. The climate of the study area is characteristic of a temperate continental monsoon zone, being warm in summer and cool in winter. According to a statistical analysis of long-term historical rainfall data from the Sichuan Provincial Meteorological Service, the average annual rainfall is 1,032.5 mm. The rainy season lasts from June to September, and rainfall during this period accounts for 70.25% of the total annual rainfall. A total of 186 major earthquakes (1128-2020) with a surface wave magnitude (Ms) greater than 5.0 have been recorded in this region, indicating that seismic activity is relatively strong.

Landslide inventory
The construction of a landslide inventory is the basis of LSM. In addition to providing sample labels for the training of the proposed models, a landslide inventory can improve our comprehension of the spatial relationship between historical landslide occurrence and the selected conditioning factors. A set of base maps relevant to landslide occurrence was used in this study, including the ALOS PALSAR digital elevation model (DEM), geology maps (1:200,000 scale), rainfall maps, and road maps. The Department of Natural Resources of Sichuan Province provided information on the historical distribution of landslides in Sichuan Province. Because of the large extent of the study area and the huge amount of data involved, this study used the High-Performance Computing Platform of Sichuan Agricultural University for data analysis. In the study area, a total of 34,893 landslide points were mapped, as shown in Figure 2. A total of 24,425 landslides (70%) were randomly chosen for model training, while the remaining 10,468 landslides (30%) were used for model validation to assess landslide susceptibility. The construction of nonlandslide points is important to suppress the overestimation of landslide susceptibility. However, there are currently no guidelines and no generally accepted technique for obtaining nonlandslide points (Hu et al. 2020; Chang et al. 2023). In this study, nonlandslide points were chosen to meet the following requirements. First, every nonlandslide point must be at least 500 m away from any landslide. Second, any two nonlandslide points must be separated by a distance of greater than 100 m. These requirements help prevent sampling in landslide-affected areas. A total of 34,893 nonlandslide points were generated from the nonlandslide area and divided into the same two subsets (70/30).
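The two sampling constraints above can be realized as a simple rejection sampler. The sketch below assumes projected (planar) coordinates in metres and uses brute-force distance checks; the function name `sample_nonlandslide_points` and all parameter names are illustrative, not part of the original workflow.

```python
import numpy as np

def sample_nonlandslide_points(landslides, extent, n_points,
                               min_dist_slide=500.0, min_dist_pair=100.0,
                               rng=None, max_tries=100000):
    """Rejection-sample nonlandslide points (planar coordinates, metres).

    landslides : (k, 2) array of landslide x/y coordinates
    extent     : (xmin, xmax, ymin, ymax) of the study area
    """
    rng = np.random.default_rng(rng)
    xmin, xmax, ymin, ymax = extent
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        p = np.array([rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)])
        # Rule 1: at least 500 m from every mapped landslide
        if np.min(np.linalg.norm(landslides - p, axis=1)) < min_dist_slide:
            continue
        # Rule 2: more than 100 m from every already accepted nonlandslide point
        if accepted and np.min(np.linalg.norm(np.array(accepted) - p, axis=1)) <= min_dist_pair:
            continue
        accepted.append(p)
    return np.array(accepted)
```

For the full study area, a spatial index (e.g. a k-d tree) would replace the brute-force distance checks, but the acceptance rules would be identical.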

Landslide conditioning factors
The environmental variables of landslide occurrences are essential for predicting landslide susceptibility (Lin et al. 2022). There are hundreds of conditioning factors that influence landslides (Reichenbach et al. 2018). Appropriate factors need to be chosen to create a trustworthy map of landslide susceptibility. However, there are no clear guidelines or uniform criteria for choosing conditioning factors. In this study, the landslide conditioning factors were determined based on the regional geo-environmental characteristics, an examination of the prior literature, and the availability of data sources for the study area. Fourteen landslide conditioning factors were selected for the spatial prediction of landslides: slope angle, road density, slope aspect, relief amplitude, cutting depth, elevation, surface roughness, stream power index (SPI), plan curvature, river density, peak ground acceleration (PGA), annual maximum 24-hour rainfall, profile curvature, and lithology (Table 2; Figure 3). The details of data collection are documented in Table 3.
Elevation is an important factor that is frequently used in LSM (Zhang et al. 2023). In this study, the elevation values of the study area were grouped into five classes ranging from < 824 m to > 4,341 m (Figure 3(a)).
Slope angle is an essential conditioning factor in landslide susceptibility evaluations, as it is directly tied to the forces that cause hill slopes to shift (Bozzolan et al. 2023).The slope angle values for this study were reclassified into seven classes (Figure 3(b)).
Slope aspect describes the degree of solar radiation received on slope faces, which can affect soil moisture and slope stability (Zhang et al. 2023).The slope aspect map, as shown in Figure 3(c), was divided into nine categories, including eight directions and flat.
Relief amplitude, the altitude difference between a given area's highest and lowest points, is another important factor for landslide susceptibility modeling (Huang et al. 2020). For this study, the relief amplitude map was divided into five classes ranging from < 118 m to > 446 m (Figure 3(d)).
Cutting depth is defined as the difference between the average and lowest elevations in a certain area and is a frequently utilized factor in LSM (Chen et al. 2022).
River density, defined as the total length of the river network in a given area, is another important topographical factor in LSM (Alqadhi et al. 2021). For this study, the river density of the study area was divided into six classes: < 0.38 km/km², 0.38-0.52 km/km², 0.52-0.64 km/km², 0.64-0.74 km/km², 0.74-0.83 km/km², and > 0.83 km/km² (Figure 3(f)).
Plan curvature has a direct impact on the convergence and divergence of water flow across a surface, which in turn affects the process of landsliding (Al-Najjar and Pradhan 2021). The plan curvature map was divided into three classes: flat, concave, and convex (Figure 3(g)).
Surface roughness is considered a significant conditioning factor affecting landslide susceptibility (Abdulwahid and Pradhan 2017). This factor was grouped into five categories as follows: < 1.00, 1.00-1.10, 1.10-1.23, 1.23-1.52, and > 1.52 (Figure 3(h)).
Profile curvature influences the acceleration and deceleration of stream movement and is an essential indicator of erosion and sediment movement (Al-Najjar and Pradhan 2021). This factor was classified into three classes: < -0.001, -0.001-0.001, and > 0.001 (Figure 3(i)).
Rainfall is recognized as one of the most important variables in the development of landslides, since it affects slope shear strength (Huang et al. 2022). In this study, the annual maximum 24-hour rainfall was utilized to map landslide susceptibility, and the rainfall map was classified into fifteen classes with 10 mm intervals (Figure 3(k)).
Lithology is one of the most commonly used variables in landslide susceptibility studies, and some geological formations are more conducive to landslides (Chen et al. 2022).There are nine major lithological groups in the study area (Figure 3(l)).
PGA is regarded as an essential dynamic component for assessing the influence of earthquakes on landslides (Yi et al. 2019).For the study area, the PGA ranges from 0.05 to 0.4 g (Figure 3(m)).

General methodology
The methodology for this study included five major steps: (1) database preparation, (2) mapping a set of conditioning factors, (3) designing and training predictive models, (4) generating surface maps of landslide susceptibility, and (5) evaluating the models and maps quantitatively (Figure 4). The frequency ratio (FR) model, which can represent the spatial correlations between landslide distribution and each conditioning factor, was used to identify geographic information system (GIS) factors related to landslide incidence (Berhane, Kebede, and Alfarrah 2021). The FR represents the probability ratio of a landslide occurrence to a nonoccurrence for a particular attribute (Guo et al. 2021). The ratio can be calculated as follows:

FR = (G_i / G) / (H_i / H)

where G_i is the number of landslide pixels in subclass i of a conditioning factor, G is the total number of landslide pixels, H_i is the number of pixels in subclass i, and H is the total number of pixels in the study area.
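For illustration, the FR of each class of a single conditioning factor can be computed from two co-registered rasters flattened to 1-D arrays. The function name `frequency_ratio` is an assumption; the formula follows the definition above.

```python
import numpy as np

def frequency_ratio(factor_classes, landslide_mask):
    """Frequency ratio for each class of one conditioning factor.

    factor_classes : integer class label per pixel (1-D array)
    landslide_mask : boolean array, True where a landslide pixel lies
    """
    G = landslide_mask.sum()          # total landslide pixels
    H = factor_classes.size           # total pixels in the study area
    fr = {}
    for c in np.unique(factor_classes):
        in_class = factor_classes == c
        G_i = (landslide_mask & in_class).sum()   # landslide pixels in class i
        H_i = in_class.sum()                      # pixels in class i
        fr[int(c)] = (G_i / G) / (H_i / H)
    return fr
```

An FR above 1 indicates that landslides are over-represented in that class relative to its areal share.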

Evaluation of conditioning factors
In LSM, the quality of the map depends on both the selected models and the predictive capacity of the data. Since not all landslide conditioning factors have the same level of predictive power, some factors may introduce noise that lowers the predictive ability of the final models (Tien Bui et al. 2016). Conditioning factors with little or no predictive power should be eliminated to achieve more accurate results. In this study, the linear support vector machine (LSVM) model was employed to examine the prediction capability of the fourteen conditioning factors. The classification accuracy of this model could be improved by removing unnecessary input parameters (Roy et al. 2020). Quantification of the prediction capabilities of the fourteen conditioning factors was carried out using the following equation (Pham et al. 2017):

f(m) = w^T m + n

where w is the weight vector (w^T denotes its transpose), m = (m_1, m_2, ..., m_14) is the input vector containing the fourteen conditioning factors, and n is the offset of the hyperplane from the origin. Average Merit (AM) is a quantitative measure for determining the relative significance of conditioning factors. The higher the AM for each conditioning factor, the greater its influence on the occurrence of landslides.
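One way to approximate this factor screening is to fit a linear SVM and rank factors by the magnitude of the learned weights, used here as a stand-in for the Average Merit scores (the paper's exact AM computation is not spelled out in this section). The function name `rank_conditioning_factors` and the use of scikit-learn's `LinearSVC` are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def rank_conditioning_factors(X, y, names):
    """Rank factors by the magnitude of the LSVM weight vector w.

    X : (samples, factors) matrix; y : 1 = landslide, 0 = nonlandslide.
    A larger |w_j| is read here as a proxy for a larger Average Merit.
    """
    Xs = StandardScaler().fit_transform(X)       # put all factors on one scale
    svm = LinearSVC(C=1.0, dual=False).fit(Xs, y)
    w = np.abs(svm.coef_.ravel())
    order = np.argsort(w)[::-1]
    return [(names[i], float(w[i])) for i in order]
```

Factors whose weights are near zero would be candidates for removal before training the final models.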

Convolutional neural network (CNN)
CNN is a well-known deep learning method that can automatically extract useful features through hierarchical neural networks (Li et al. 2022). A basic CNN model is made up of several neural layers, which commonly include convolution layers, pooling layers, and fully connected layers (Figure 5). The convolutional layer can simultaneously find connections between feature classes and performs well by leveraging information extracted from the input data. The pooling layer allows the neural network to focus on the most important features by lowering the number of features. Maximum pooling and average pooling are the two most common pooling techniques (Momeny et al. 2022). Maximum pooling uses the maximum value of the features, while average pooling uses the features' average value within a specific pooling zone. The fully connected layer is often used to construct the final combination of nonlinear features and produce the network's prediction. The convolutional layer in the CNN approach employs a set of convolution kernels to learn an efficient representation from the input variables. Assume that v = {v_1, v_2, ..., v_N} refers to the input feature vector, where N is the number of conditioning factors, and that the convolutional layer contains k kernels, with the jth kernel having weight w_j and bias b_j. The output C_j of the convolutional operation can then be calculated using the following equation (Xie et al. 2023):

C_j = f(w_j * v + b_j)

where * stands for the convolutional operation and f is the nonlinear activation function.
To reduce the size of the feature vector and avoid overfitting, a pooling layer (max-pooling) follows the convolution layer. The extracted feature vectors are then recombined by fully connected layers. Finally, using the softmax activation function, the resulting vector x is transformed into a prediction probability for each category:

P(y = m | x) = exp(w_m^T x + b_m) / Σ_{n=1}^{M} exp(w_n^T x + b_n)

where m is the predicted class, M represents the number of categories, w_m and w_n represent the weight vectors, and b_m and b_n indicate the bias vectors.
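The conv -> max-pool -> fully connected -> softmax pipeline described above can be sketched as a toy 1-D forward pass over the fourteen-factor input vector. This is a minimal NumPy illustration with arbitrarily chosen shapes and random weights, not the paper's trained architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv1d(v, kernels, biases):
    """C_j = f(w_j * v + b_j): 'valid' 1-D convolution over the factor vector."""
    k = kernels.shape[1]
    out = []
    for w, b in zip(kernels, biases):
        c = np.array([np.dot(v[t:t + k], w) + b for t in range(len(v) - k + 1)])
        out.append(relu(c))
    return np.stack(out)

def max_pool(C, size=2):
    """Keep the maximum value inside each pooling zone."""
    n = C.shape[1] // size
    return C[:, :n * size].reshape(C.shape[0], n, size).max(axis=2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cnn_forward(v, kernels, biases, W_fc, b_fc):
    """Toy forward pass: conv -> max-pool -> fully connected -> softmax."""
    feat = max_pool(conv1d(v, kernels, biases)).ravel()
    return softmax(W_fc @ feat + b_fc)
```

With a 14-element input, two kernels of length 3, and a two-class output (landslide / nonlandslide), the forward pass returns a probability vector that sums to one.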

Beluga whale optimization (BWO)
The BWO algorithm is a novel swarm-based metaheuristic algorithm inspired by the natural behavior of beluga whales (Zhong, Li, and Meng 2022). BWO, similar to other metaheuristic algorithms, includes an exploration phase and an exploitation phase (Figure 6). The principal steps of the BWO are formulated as follows (Figure 7):
Initialization stage: Due to the population-based nature of BWO, beluga whales are used as search agents. Each beluga whale is a potential solution that is updated during optimization. The matrix of search agent positions is represented as

X = [x_{i,j}]_{n×d}

where n represents the population size of the search agents and d stands for the dimensionality of the design variables.
The corresponding fitness values for each beluga whale are stored in the vector F_X = [f(x_1), f(x_2), ..., f(x_n)]^T. The balance factor B_f, which switches the search between exploration and exploitation, can be denoted as

B_f = B_0 (1 - T / (2 T_max))

where T is the current iteration, T_max is the maximum number of iterations, and B_0 is a random number in (0, 1) generated in each iteration. The fluctuation range of B_f decreases from (0, 1) to (0, 0.5) as T rises, so the probability of the exploitation phase increases as the iterations proceed.
Exploration stage: The formulation of the BWO's exploration stage is derived from the beluga whales' swimming behavior. The positions of the beluga whales are established by their pair swim and are updated as follows:

X^{T+1}_{i,j} = X^T_{i,p_j} + (X^T_{r,p_1} - X^T_{i,p_j}) (1 + r_1) sin(2π r_2), for even j
X^{T+1}_{i,j} = X^T_{i,p_j} + (X^T_{r,p_1} - X^T_{i,p_j}) (1 + r_1) cos(2π r_2), for odd j

where T is the current iteration, X^{T+1}_{i,j} denotes the ith beluga whale's new position in the jth dimension, and p_j denotes a dimension chosen at random. X^T_{r,p_1} and X^T_{i,p_j} are the current positions of the rth and ith beluga whales, respectively, where r is a randomly chosen beluga whale, r_1 and r_2 are random numbers in (0, 1), and cos(2π r_2) and sin(2π r_2) reflect the upward-facing fins of the mirrored beluga whales.
Exploitation stage: This stage takes its cues from the beluga whales' hunting behavior. Beluga whales can migrate and forage together depending on their proximity to other beluga whales. Thus, beluga whales hunt by sharing information about promising positions with one another and choosing the best candidate among them. To improve convergence, a Levy flight technique is added to the BWO exploitation step, which is represented as

X^{T+1}_i = r_3 X^T_best - r_4 X^T_i + C_1 · L_F · (X^T_r - X^T_i)

where X^T_i and X^T_r represent the current positions of the ith beluga whale and a random beluga whale, respectively; X^{T+1}_i and X^T_best represent the ith beluga whale's new position and the best position in the entire population; r_3 and r_4 are random numbers in (0, 1); and C_1 = 2 r_4 (1 - T/T_max) represents the random jump strength.
The Levy flight function L_F is computed as follows:

L_F = 0.05 × u σ / |v|^{1/β},  σ = [Γ(1 + β) sin(π β / 2) / (Γ((1 + β)/2) β 2^{(β-1)/2})]^{1/β}

where β is a constant set to 1.5 and u and v are normally distributed random values.
Whale fall: Killer whales, polar bears, and humans pose threats to beluga whales during their migration and foraging. Most beluga whales are clever and can avoid danger by sharing information with one another. Nevertheless, a small number of beluga whales do not survive and sink to the deep seabed. This phenomenon, known as a 'whale fall,' provides food for a variety of creatures. The BWO replicates the whale fall process by selecting individuals from the population according to a whale fall probability; these beluga whales are assumed to have either migrated or fallen into the deep sea. To keep the population size constant, the positions of the beluga whales and the step size of the whale fall are used to determine the updated position:

X^{T+1}_i = r_5 X^T_i - r_6 X^T_r + r_7 X_step

where r_5, r_6, and r_7 are random numbers in (0, 1), and X_step is the step size of the whale fall, expressed as

X_step = (u_b - l_b) exp(-C_2 T / T_max)

where C_2 = 2 W_f × n is the step factor, which correlates with the probability of whale fall and the population size, and u_b and l_b are the upper and lower variable boundaries, respectively. Clearly, the maximum number of iterations, the current iteration, and the boundaries of the design variables all influence the step size. The probability of a whale fall (W_f) is calculated as a linear function in the BWO model:

W_f = 0.1 - 0.05 T / T_max

The probability of a whale fall drops from 0.1 in the first iteration to 0.05 in the last iteration, indicating that the risk to beluga whales decreases as they get closer to their food source during the optimization process.
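Assembling the stages above, one compact reading of BWO for a d-dimensional minimization problem is sketched below. The function names (`bwo`, `levy_flight`), the greedy acceptance of improving moves, the per-agent balance factor, and the simplified whale-fall handling are implementation choices for illustration, not prescriptions from the original paper.

```python
import numpy as np
from math import gamma, pi, sin

def levy_flight(d, beta=1.5, rng=None):
    """L_F with beta = 1.5 (Mantegna's algorithm)."""
    rng = np.random.default_rng(rng)
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, d)
    v = rng.normal(0, 1, d)
    return 0.05 * u / np.abs(v) ** (1 / beta)

def bwo(f, lb, ub, n=30, d=2, t_max=200, seed=0):
    """Minimise f over [lb, ub]^d with a compact BWO loop."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, d))
    fit = np.apply_along_axis(f, 1, X)
    for T in range(t_max):
        best = X[fit.argmin()].copy()
        Wf = 0.1 - 0.05 * T / t_max                        # whale-fall probability
        Bf = rng.uniform(0, 1, n) * (1 - T / (2 * t_max))  # balance factor per agent
        for i in range(n):
            r = rng.integers(n)
            r1, r2, r3, r4 = rng.uniform(0, 1, 4)
            if Bf[i] > 0.5:   # exploration: mirrored pair swim
                new = X[i] + (X[r] - X[i]) * (1 + r1) * np.sin(2 * pi * r2)
            else:             # exploitation: move toward the best with a Levy step
                C1 = 2 * r4 * (1 - T / t_max)
                new = (r3 * best - r4 * X[i]
                       + C1 * levy_flight(d, rng=rng) * (X[r] - X[i]))
            if rng.uniform() < Wf:                         # whale fall
                C2 = 2 * Wf * n
                step = (ub - lb) * np.exp(-C2 * T / t_max)
                new = r1 * X[i] - r2 * X[r] + rng.uniform(0, 1, d) * step
            new = np.clip(new, lb, ub)
            fn = f(new)
            if fn < fit[i]:                                # keep only improving moves
                X[i], fit[i] = new, fn
    return X[fit.argmin()], fit.min()
```

In the hybrid CNN-BWO model, f would score a CNN hyperparameter vector (e.g. by validation loss) rather than a test function.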

Coati optimization algorithm (COA)
The COA algorithm is a recent bioinspired optimization algorithm modeled on the natural behavior of coatis (Dehghani et al. 2023). The fundamental idea of COA is the replication of two natural behaviors of coatis: (i) pursuing and attacking iguanas and (ii) fleeing from predators. In the COA algorithm, coatis are viewed as population members. Each coati's position in the search space determines the values of the decision variables; therefore, a coati's position in the COA represents a candidate solution to the problem. The coatis' initial positions in the search space are created at random using the following equation:

x_{i,j} = lb_j + r · (ub_j - lb_j), for i = 1, 2, ..., N and j = 1, 2, ..., m

where X_i is the ith coati's position in the search space, x_{i,j} is the value of the jth decision variable, N is the number of coatis, m is the number of decision variables, r is a random number in (0, 1), and lb_j and ub_j are the lower and upper boundaries of the jth decision variable, respectively.
The population matrix X is the mathematical representation of the coati population in the COA, with row i holding X_i. Placing each candidate solution into the objective function yields a vector of objective values:

F = [F_1, F_2, ..., F_N]^T, with F_i = F(X_i)

where F_i is the objective function value obtained for the ith coati and F is the vector of obtained objective function values.
The COA population is updated in two stages, formulated as follows (Figure 8):
Hunting and attacking strategy on the iguana (exploration stage): The first stage of updating the coatis' positions in the search space is based on modeling their strategy of attacking iguanas. In this strategy, several coatis climb a tree to approach an iguana and startle it, while the other coatis wait under the tree; when the iguana falls to the ground, the coatis attack and hunt it. The pattern diagram for this strategy is shown in Figure 9. It is assumed that the iguana occupies the position of the best member of the population. The position of the coatis climbing the tree is approximated by the following equation:

x^{P1}_{i,j} = x_{i,j} + r · (Iguana_j - I · x_{i,j}), for i = 1, 2, ..., ⌊N/2⌋
When the iguana falls to the ground, it is placed at a random position in the search space, and the coatis on the ground move according to this random position. This is modeled using Eqs. (20) and (21):

Iguana^G_j = lb_j + r · (ub_j - lb_j)   (20)

x^{P1}_{i,j} = x_{i,j} + r · (Iguana^G_j - I · x_{i,j}) if F_{Iguana^G} < F_i, otherwise x_{i,j} + r · (x_{i,j} - Iguana^G_j), for i = ⌊N/2⌋ + 1, ..., N   (21)
If the new position computed for a coati improves the value of the objective function, the update is accepted; otherwise, the coati stays in its former position. This update condition is simulated using Eq. (22) for i = 1, 2, ..., N:

X_i = X^{P1}_i if F^{P1}_i < F_i, otherwise X_i   (22)
Here, X^{P1}_i is the newly determined position of the ith coati, x^{P1}_{i,j} is its jth dimension, F^{P1}_i is the value of its objective function, r is a random real number in (0, 1), Iguana stands for the position of the iguana in the search space (which corresponds to the position of the best member), Iguana_j is its jth dimension, I is an integer randomly chosen from the set {1, 2}, Iguana^G is the position of the iguana randomly generated on the ground, Iguana^G_j is its jth dimension, F_{Iguana^G} is its objective function value, and ⌊·⌋ is the floor function.
The process of escaping from predators (exploitation stage): The second stage of updating the coatis' positions in the search space is mathematically modeled on the coatis' typical behavior when encountering and evading predators. When a predator attacks a coati, the coati leaves its location (Figure 10). Its reaction places it in a secure position close to its current one, demonstrating the COA's capability for local-search exploitation. To imitate this behavior, a random position near the position of each coati is generated using Eqs. (23) and (24).
lb^local_j = lb_j / t,  ub^local_j = ub_j / t,  where t = 1, 2, ..., T   (23)

x^{P2}_{i,j} = x_{i,j} + (1 - 2r) · (lb^local_j + r · (ub^local_j - lb^local_j))   (24)

If the newly calculated position enhances the value of the objective function, it is accepted according to Eq. (25):

X_i = X^{P2}_i if F^{P2}_i < F_i, otherwise X_i   (25)
Here, X^{P2}_i is the new position determined for the ith coati, x^{P2}_{i,j} is its jth dimension, F^{P2}_i is the value of its objective function, r is a random number in (0, 1), t is the iteration counter, lb^local_j and ub^local_j are the local lower and upper bounds of the jth decision variable, respectively, and lb_j and ub_j represent the jth decision variable's lower and upper bounds.
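The two phases described above can be sketched as a compact minimization loop. The function name `coa`, the greedy acceptance rule, and the even split of tree-climbing versus ground coatis at ⌊N/2⌋ are illustrative choices; scalar bounds are assumed for brevity.

```python
import numpy as np

def coa(f, lb, ub, n=30, d=2, t_max=200, seed=0):
    """Minimise f on [lb, ub]^d with a compact COA loop (cf. Eqs. (20)-(25))."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, d))
    fit = np.apply_along_axis(f, 1, X)
    for t in range(1, t_max + 1):
        iguana = X[fit.argmin()].copy()            # best member plays the iguana
        # Phase 1: hunting and attacking the iguana (exploration)
        for i in range(n):
            r = rng.uniform(0, 1, d)
            I = rng.integers(1, 3)                 # I randomly chosen from {1, 2}
            if i < n // 2:                         # coatis climbing the tree
                new = X[i] + r * (iguana - I * X[i])
            else:                                  # coatis waiting on the ground
                ig_g = rng.uniform(lb, ub, d)      # iguana dropped at random
                if f(ig_g) < fit[i]:
                    new = X[i] + r * (ig_g - I * X[i])
                else:
                    new = X[i] + r * (X[i] - ig_g)
            new = np.clip(new, lb, ub)
            fn = f(new)
            if fn < fit[i]:                        # greedy update, Eq. (22)
                X[i], fit[i] = new, fn
        # Phase 2: escaping from predators (exploitation, local search)
        lb_loc, ub_loc = lb / t, ub / t            # bounds shrink with the iteration timer
        for i in range(n):
            r = rng.uniform(0, 1, d)
            new = X[i] + (1 - 2 * r) * (lb_loc + r * (ub_loc - lb_loc))
            new = np.clip(new, lb, ub)
            fn = f(new)
            if fn < fit[i]:                        # greedy update, Eq. (25)
                X[i], fit[i] = new, fn
    return X[fit.argmin()], fit.min()
```

As with BWO, the hybrid CNN-COA model would use a CNN validation score as the objective f instead of an analytic test function.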

Model evaluation measures
The accuracy of LSM results should be ensured through model performance evaluation. The confusion matrix was utilized in this study to assess the performance of all models. Based on the confusion matrix, several performance metrics were obtained, including specificity (SPF), sensitivity (SST), accuracy (ACC), Cohen's kappa coefficient (κ), positive predictive value (PPV), and negative predictive value (NPV). These performance metrics can be calculated using the following equations:

SST = TP / (TP + FN)
SPF = TN / (TN + FP)
ACC = (TP + TN) / (TP + TN + FP + FN)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
κ = (p_o - p_e) / (1 - p_e)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively; p_o is the observed agreement (equal to ACC); and p_e is the agreement expected by chance.

Certain factor classes, such as slope angle (9-24°) and altitude (< 2,378 m), may play an important role in the emergence of very high and highly susceptible areas. Heavy rain that falls continuously has the potential to saturate the rock and soil mass, create positive pore pressure, and act as a catalyst for other conditions to come together more quickly and cause a landslide. The lithological unit has a significant impact on the occurrence of landslides. In Sichuan Province, the Jurassic and Cretaceous strata, composed of sandstone intercalated with a high percentage of mudstone, are known as the most slide-prone strata and have produced numerous landslides. Engineering construction connected with road networks has significantly altered local landforms and the hydrogeological characteristics of slopes, weakening slope stability. The equilibrium of natural slopes is disturbed by inappropriate excavation of the slope toe, which promotes the formation of landslides. Changes in slope angle have a direct impact on the shear stress, which leads to failures. Compared to hard rocks, weak rocks are unable to withstand higher slope angles. Sichuan Province is dominated by gentle and medium slope angles. A total of 77.25% of all landslides are located in the slope gradient range of < 24°, which makes up 59.73% of the gradients in the region. Landslides appear to be associated with medium or moderate slopes, which have the characteristics of slide-type movements. Higher slopes are related to rock falls and avalanches, while lower
slopes are related to flow-type movements.The low frequency of landslides on steeper slopes is mostly due to a low number of such cells.Additionally, the research area's steep natural slopes are not prone to landslides because they are mostly made of bedrock with high weathering resistance.Low-altitude regions are more likely to experience landslides than high-altitude regions.These regions have a high concentration of human activity, which increases the risk of landslides.The best strategy to prevent landslides is to pay closer attention to the co-occurrences of the landslide-related factors outlined above.The low and very low susceptibility zones are primarily found in the research area's west and north, with high altitudes and steep slope angles, and these characteristics are inappropriate for the occurrence of landslides.
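As an illustration only (not the authors' implementation), the six performance metrics above can be computed directly from the four confusion-matrix counts; the sketch below assumes binary labels with 1 marking landslide pixels:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Compute PPV, NPV, SST, SPF, ACC, and Cohen's kappa from binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    n = tp + tn + fp + fn
    po = (tp + tn) / n  # observed agreement (identical to ACC)
    # Chance agreement for Cohen's kappa.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "SST": tp / (tp + fn),  # sensitivity
        "SPF": tn / (tn + fp),  # specificity
        "ACC": po,
        "kappa": (po - pe) / (1 - pe),
    }
```

In practice, `y_pred` would be the model's susceptibility scores thresholded at 0.5.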

Model validation and comparison
Tables 4 and 5 show the training and validation performances of the three landslide models in terms of PPV, NPV, SST, SPF, ACC, and κ. The statistical index metrics show that the CNN-COA model performed best on the training dataset, followed by the CNN-BWO model and the CNN model. As shown by the kappa indices of 0.867, 0.832, and 0.730 for the CNN-COA, CNN-BWO, and CNN models, respectively, there is substantial agreement between the observed and predicted landslides. The results for the validation dataset follow the same pattern as those for the training dataset. Figure 15 shows the ROC plots for the CNN-COA, CNN-BWO, and CNN models, and Tables 6 and 7 show the detailed results. The CNN-COA model has the highest predictive performance for the training dataset, as evidenced by its higher AUC value (0.946) compared with the other two models. The validation dataset revealed similar performance trends: the CNN-COA model had the greatest AUC value (0.919), followed by the CNN-BWO model (0.906) and the CNN model (0.805). In general, all landslide models perform well in the assessment of landslide susceptibility (AUC > 0.8). Comparing the standard error (SE) and 95% confidence interval (CI), the CNN-COA model has the smallest SE (0.0136) and the narrowest 95% CI (0.912-0.970), followed by the CNN-BWO model (SE = 0.0152; 95% CI = 0.901-0.964) and the CNN model (SE = 0.0230; 95% CI = 0.806-0.895). The CNN-COA model thus has the best performance among the three models, and the training datasets of all three models show a respectable goodness of fit. The results were verified using the validation dataset, on which the CNN-COA model still had the smallest SE (0.0173) and the narrowest 95% CI (0.879-0.949). The CNN-COA model therefore clearly outperforms the other two models on the validation dataset as well.
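The AUC, SE, and CI comparison above can be reproduced with standard tools. The sketch below uses the Hanley-McNeil approximation for the standard error of the AUC, which is one common choice and not necessarily the exact procedure behind the reported values:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, scores, z=1.96):
    """AUC with Hanley-McNeil standard error and an approximate 95% CI."""
    y_true = np.asarray(y_true)
    auc = roc_auc_score(y_true, scores)
    n1 = int(np.sum(y_true == 1))  # number of landslide samples
    n0 = int(np.sum(y_true == 0))  # number of nonlandslide samples
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    se = np.sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
                  + (n0 - 1) * (q2 - auc ** 2)) / (n1 * n0))
    return auc, se, (auc - z * se, auc + z * se)
```

Here `scores` would be the susceptibility values predicted by each model for the training or validation cells.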

Discussion
By choosing appropriate landslide conditioning factors, landslide susceptibility models can produce more accurate findings with less noise, improving their prediction power. Therefore, in landslide modeling or landslide susceptibility studies, the choice of proper conditioning factors is seen as a crucial step. Due to the complexity of landslides, there is no established global protocol or clear set of guidelines for choosing trustworthy landslide conditioning factors. According to Wang et al. (2020), the random forest model can be used to choose the landslide conditioning variables; their findings demonstrated that the catchment area had no predictive value, so it was eliminated from the landslide susceptibility modeling. To improve the performance of the landslide model, Chen et al. (2022) used the information gain ratio method to exclude irrelevant or insignificant factors. Other approaches, such as Relief-F (Fang et al. 2021), correlation attribute evaluation (Wu et al. 2020), and gain ratio (Wang, Fang, and Hong 2019), have also been used to identify landslide conditioning factors. Higher AUCs, reduced RMSE, and other metrics show that these strategies have indeed enhanced the performance of landslide models. The LSVM model was used in the current study to rank the significance of the conditioning factors. In the study area, rainfall was found to be the most significant factor influencing the occurrence of landslides. This result is in accordance with other authors, such as Pham, Prakash, and Bui (2018), Jiao et al. (2019), Dou et al. (2020), Wang et al. (2020), and Mandal, Saha, and Mandal (2021). In reality, hourly rainfall intensity is the primary driver of geological hazards such as flash floods and landslides. The annual maximum 24-hour rainfall is highly important in LSM, and the contribution of this factor to landslide occurrence varies by region. Rainfall infiltration alters the stresses inside the internal structure of slopes, reduces their shear strength, and causes slope instability. The results of the LSVM approach revealed that all of the conditioning factors were capable of positive prediction. Therefore, the susceptibility models were created using all fourteen conditioning factors.
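The factor-ranking step can be illustrated with a linear SVM whose standardized coefficient magnitudes serve as importance scores. Everything below (factor names, synthetic data) is hypothetical and only demonstrates the mechanism, not the study's actual data or merit values:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical factor names; the study used fourteen conditioning factors.
factor_names = ["rainfall", "slope_angle", "altitude", "distance_to_road"]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(factor_names)))
# Synthetic labels in which column 0 ("rainfall") carries most of the signal.
y = (2.0 * X[:, 0] + 0.5 * X[:, 1]
     + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Standardize so the coefficient magnitudes are comparable across factors.
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
model.fit(X, y)
weights = np.abs(model.named_steps["linearsvc"].coef_[0])
ranking = [factor_names[i] for i in np.argsort(weights)[::-1]]
# With this synthetic signal, "rainfall" ranks first.
```

The same pattern scales to the full set of fourteen factors once real raster values are sampled at the landslide and nonlandslide points.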
The CNN model is currently regarded as a superior landslide modeling tool across the globe due to its promising results and greater prediction rate in the spatial prediction of landslides (Aslam, Zafar, and Khalil 2023). For example, Sameen, Pradhan, and Lee (2020) used several models to assess landslide susceptibility and found that a CNN-based model achieved the highest validation accuracy (83.11%) and AUC (0.880). Mandal, Saha, and Mandal (2021) successfully investigated the landslide susceptibility of the Rorachu River basin and noted that the CNN model outperformed other machine learning algorithms such as random forest and artificial neural networks. A regional multihazard (flash floods, debris flows, and landslides) susceptibility prediction framework was provided by Ullah et al. (2022) for the Shangla District, Pakistan, based on three different models (CNN, logistic regression, and K-nearest neighbor); their results show that the CNN model outperforms traditional machine learning models at estimating the susceptibility of flash floods, debris flows, and landslides. Nevertheless, the CNN model's output should be regarded with caution because it frequently relies on parameters that have been predetermined and/or determined through trial and error. The CNN model's performance is largely determined by its design, which includes the training approach, input window size, and layer depth. Metaheuristic optimization algorithms have been demonstrated to effectively increase landslide prediction accuracy (Jaafari et al. 2022). Models with optimum hyperparameters offer high-quality classification outputs and have the greatest potential for use in LSM. In general, a deep learning model's predictive ability is affected by its structure, the size and quality of the dataset, and parameter tuning. Fine-tuning the hyperparameters of the CNN model with metaheuristic techniques, together with the size of the input data, influences the model's prediction performance (Table 8). The CNN-COA model achieved the lowest RMSE values of the three models, at 0.083 for the training dataset and 0.084 for the validation dataset. These results demonstrate the COA algorithm's superior capacity to optimize the input dataset; as a result, in this instance, the COA algorithm is more suitable for creating the hybrid model with the CNN. It should be emphasized that the CNN-BWO model can also yield satisfying results, only slightly inferior to those of the CNN-COA model. The difference in performance between the two hybrid models on the training and validation datasets is not substantial, and both models can give appropriate results for estimating landslide susceptibility in this study region. The difference in AUC values between the training and validation datasets was 0.050 for the CNN model, compared with 0.031 and 0.027 for the CNN-BWO and CNN-COA models, respectively. This shows that the risk of overfitting is more likely in a single model, whereas the CNN-BWO and CNN-COA models can effectively avoid this issue by lowering the model's variance. In practice, a model's stability is crucial for landslide susceptibility modeling, and stability can be weighed against limitations: for example, compared with the CNN model, the CNN-BWO and CNN-COA models need a relatively longer training time. As a result, model selection is relatively subjective, depending on the perspective of the user, planner, or even reader under specific circumstances. Various factors can contribute to uncertainty in LSM, such as the inconsistent spatial resolution of the input data, the ratio of landslide to nonlandslide samples, the proportion of training and validation datasets, the selection of modeling methods, the consideration of different landslide types, the determination of conditioning factors, and other factors. All of these factors can affect the quality of LSM, making it challenging to precisely capture the spatial location of landslides. The uncertainty caused by modeling methods is minimized in this study by utilizing an ensemble of different optimization algorithms. Some models may display similar predictive performance despite using different environmental predictors or producing noticeably different spatial predictions, which is one of the reasons for employing and integrating multiple models; it is otherwise difficult to choose among acceptable, equivalent candidate models. By combining these models, the strengths of each individual model can be incorporated while the weaknesses are minimized, resulting in more reliable and accurate predictions. Additionally, as suggested by Kim et al. (2018), Yin et al. (2021), Lv et al. (2022), and Aslam, Zafar, and Khalil (2023), integrating multiple models can help reduce the level of uncertainty associated with each model by providing a variety of predictions that can be used to determine the degree of uncertainty related to the final prediction.
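The coupling scheme described above can be sketched as a generic population-based metaheuristic wrapped around a fitness function (e.g. the validation RMSE of a CNN trained with the candidate hyperparameters). The search ranges and the move rule below are illustrative placeholders, not the actual COA or BWO update equations:

```python
import random

# Hypothetical hyperparameter ranges, not taken from the paper (Table 8).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.0, 0.5),
    "filters": (8, 64),
}

def random_candidate(rng):
    """Draw one hyperparameter configuration uniformly from the search space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def optimize(fitness, iterations=20, pop_size=10, seed=0):
    """Minimize `fitness` over SEARCH_SPACE with a simple best-attraction move."""
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(iterations):
        for i, cand in enumerate(pop):
            # Pull each candidate a random fraction of the way toward the best.
            new = {k: cand[k] + rng.random() * (best[k] - cand[k]) for k in cand}
            # Clamp the move back into the search ranges.
            new = {k: min(max(v, SEARCH_SPACE[k][0]), SEARCH_SPACE[k][1])
                   for k, v in new.items()}
            if fitness(new) < fitness(cand):  # greedy acceptance
                pop[i] = new
        best = min(pop + [best], key=fitness)
    return best
```

In the study's setting, `fitness` would train the CNN with the candidate hyperparameters and return its validation error; the returned dictionary is the tuned configuration handed back to the final model.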
Overall, the enhanced efficiency of the metaheuristic optimization technique is consistent with earlier studies that combined optimization algorithms with machine learning models (Balogun et al. 2021; Wang et al. 2021; Lv et al. 2022; Huang et al. 2023). In this work, combining an optimization algorithm with the CNN-based deep learning algorithm provides promising results, particularly with the COA optimization algorithm, which yields the most predictive model. As a result, in this study, the CNN-COA model is regarded as the optimum model for LSM. Nevertheless, among the countless candidate models, it is difficult to find the optimal model for LSM in a particular geo-environment, and other hybrid models should be further investigated.
Although this study contributes to the research on landslide susceptibility mapping, some limitations remain. First, in the different geomorphic units, not all slide and flow typologies could be predicted with the same precision. On the one hand, this is due to their high reliance on factors not considered in the current assessment or on the interplay of local conditions; on the other hand, even the factors that were examined are likely to differ in how they regulate different types of movement. This aspect can be better accounted for in the future by increasing the typological specificity of the susceptibility assessments. Second, nonlandslide points were randomly selected in a 1:1 ratio from landslide-free areas. This selection method may have a negative impact on the evaluation results: existing studies have demonstrated that different sampling ratios of nonlandslide points can greatly influence landslide susceptibility evaluation results (Hong et al. 2019; Chang et al. 2023; Liu, Tang, and Huang 2023). Therefore, to ensure the accuracy of LSM results, the optimal sampling ratio of nonlandslide locations should be given more consideration. Third, the landslide susceptibility analysis in the study area is stationary in time. This assumption is not strictly valid because landslide-inducing factors (such as rainfall and earthquakes) may vary over time; furthermore, the landslide inventory must be updated as new landslides occur. Therefore, multitemporal data on landslide conditioning variables and inventories should be gathered, and transfer learning may be a useful method for evaluating those temporal changes.

Conclusions
This study developed two innovative hybrid models for the spatially explicit prediction of landslide susceptibility by combining the CNN model with the BWO and COA optimization techniques. This is the first study to create such models and test their utility using actual, spatially explicit data from a landslide-prone region of Sichuan Province. Fourteen landslide conditioning factors were used to prepare the training datasets for the susceptibility analysis. The relative contribution of the conditioning factors was assessed using the LSVM method; all fourteen factors contribute to some extent to LSM and were thus used for spatial landslide modeling in Sichuan Province. The ROC curve, RMSE, and other statistical metrics were used to assess the accuracy and performance of the three models. The results revealed satisfactory agreement between the predicted susceptibility grades and the known landslide locations for the two hybrid models. The CNN-COA model had superior performance in LSM, with the highest AUC values of 0.946 and 0.919 for the training and validation datasets, respectively, followed by the CNN-BWO model and the CNN model (training AUC values of 0.937 and 0.855, and validation AUC values of 0.906 and 0.805, respectively). The hybrid models used in this work greatly increase the CNN model's accuracy, showing that the BWO and COA optimization techniques are promising approaches for LSM. The CNN-COA model generated the best landslide susceptibility map in terms of overall performance and was therefore chosen as the best model for LSM in the study region. The mapping unit used in this study is the grid cell, which has little physical significance and cannot accurately represent the geological or geomorphological characteristics of natural slopes. Instead, the slope unit based on terrain segmentation can more accurately reflect the variables that influence the occurrence of landslides; therefore, in future research, the slope unit should be used to analyze landslide susceptibility. In addition, a deeper and wider CNN architecture could be used to create a more accurate and robust hybrid model. Transfer learning could be investigated to optimize the parameters based on historical data from the local area, characterize local conditions, and further improve the model's generalization and robustness across different geographical areas. Overall, the generated landslide susceptibility maps can support better land use management and can help decision-makers mitigate further losses in the study area by informing prevention strategies in vulnerable areas.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1 .
Figure 1. Location of the study area.

Figure 2 .
Figure 2. Landslide inventory map of the study area: (a) training data; (b) validating data.

Figure 4 .
Figure 4. Methodological flow chart of the present study.

Figure 9 .
Figure 9. Pattern diagram for the Coati Optimization Algorithm's first stage (modified after Dehghani et al. 2023): (a) Attack of half of the coatis' population towards the iguana on the tree; (b) Hunting the fallen iguana by the other half of the coatis' population.

Figure 10 .
Figure 10. Pattern diagram of a coati evading a predator in the second phase of the Coati Optimization Algorithm (modified after Dehghani et al. 2023).

Figure 11 .
Figure 11. Average merit of the landslide conditioning factors.

Figure 12 .
Figure 12. Landslide susceptibility map produced by the CNN model.

Figure 13 .
Figure 13. Landslide susceptibility map produced by the CNN-BWO model.

Figure 14 .
Figure 14. Landslide susceptibility map produced by the CNN-COA model.

Figure 15 .
Figure 15. Comparison of the three models using the ROC curve technique with (a) the training dataset and (b) the validation dataset.
Owing to the COA algorithm's properties, the CNN-COA model outperformed the other algorithms in terms of prediction capacity. Its ability to avoid local minima and accelerate convergence, and its superior robustness and stability (only two control parameters need to be tuned in the algorithm), result in acceptable performance in predicting landslide-susceptible areas. Several statistical indices showed that the CNN-COA model outperformed the other models on the training dataset. A total of 95.3% (PPV) and 90.5% (NPV) of the training dataset's pixels were correctly identified by the CNN-COA model as landslide and nonlandslide, respectively. For the classification of landslide locations, the CNN-COA model (SST = 90.9%) outperforms the CNN-BWO (SST = 90.7%) and CNN (SST = 84.3%) models. In terms of nonlandslide classification, the CNN-COA model again has the highest specificity (SPF = 95.1%), followed by the CNN-BWO model (SPF = 94.4%) and the CNN model (SPF = 87.6%). Additionally, the CNN-COA model outperformed the other two models in terms of ACC and κ. All statistical indices for the validation dataset revealed that the CNN-COA model performed best (PPV = 93.6%, NPV = 88.4%, SST = 88.9%, SPF = 93.3%, ACC = 90.9%, and κ = 0.838). On both the training and validation datasets, the two hybrid models (CNN-COA and CNN-BWO) demonstrated superior predictive performance. On the training dataset, the COA and BWO algorithms increased the AUC value of the CNN model by 9.1% and 8.2%, respectively; on the validation dataset, COA and BWO enhanced the CNN model's prediction by 11.4% and 10.1%, respectively. The RMSE value of each algorithm was used for a second model performance analysis, with lower RMSE values indicating higher predictive accuracy. The RMSE values of the CNN, CNN-BWO, and CNN-COA models are 0.219, 0.094, and 0.083 for the training dataset and 0.221, 0.124, and 0.084 for the validation dataset, respectively (Figures 16-18).

Figure 16 .
Figure 16. CNN model: (a) target and output values of training data samples, (b) magnitude of training error, (c) target and output values of validation data samples, and (d) magnitude of validation error.

Figure 17 .
Figure 17. CNN-BWO model: (a) target and output values of training data samples, (b) magnitude of training error, (c) target and output values of validation data samples, and (d) magnitude of validation error.

Figure 18 .
Figure 18. CNN-COA model: (a) target and output values of training data samples, (b) magnitude of training error, (c) target and output values of validation data samples, and (d) magnitude of validation error.
This work was supported by the China Postdoctoral Science Foundation [grant number 2020M680583]; the National Natural Science Foundation of China [grant numbers 52208359, 52109125]; and the National Postdoctoral Program for Innovative Talents [grant number BX20200191].

Table 1 .
Review of the methods used in the previous studies for LSM.

Table 2 .
Description of geological units of the study area.

Table 4 .
Model performance on the training dataset.

Table 5 .
Model performance on the testing dataset.

Table 6 .
AUC analysis for the three landslide models with the training dataset.

Table 7 .
AUC analysis for the three landslide models with the testing dataset.

Table 8 .
Hyperparameter settings of the used models.