Evaluation of neural network models for landslide susceptibility assessment

ABSTRACT Identifying and assessing the disaster risk of landslide-prone regions is critical for disaster prevention and mitigation. Owing to their particular advantages, neural network algorithms have been widely used for landslide susceptibility mapping (LSM) in recent decades. In the present study, three advanced neural network models popular in relevant studies, i.e. the artificial neural network (ANN), one-dimensional convolutional neural network (1D CNN) and recurrent neural network (RNN), were evaluated and compared for LSM over Qingchuan County, Sichuan province, China. Extensive experimental results demonstrated satisfactory performance of all three neural network models in accurately predicting susceptible regions. Specifically, the ANN and 1D CNN models yielded quite consistent LSM results, which differed slightly, spatially, from those of the RNN model. Accuracy evaluations revealed that the RNN model outperformed the other two models both qualitatively and quantitatively, although its complexity was relatively high. Experiments on the influence of training hyper-parameters suggested that relatively small batch sizes combined with the Tanh activation function and SGD optimizer are essential to improving the performance of neural network models for LSM, which may serve as a guide for those applying these advanced algorithms to improve their efficiency.


Introduction
Landslides are a ubiquitous natural hazard that seriously threatens the safety of people's lives and property as well as human wellbeing, and they will occur more frequently in the future under a changing climate in mountainous regions of China and worldwide (Froude and Petley 2018; Gariano and Guzzetti 2016; Pham et al. 2020). A landslide is a complex and composite geographical phenomenon that describes a wide variety of surface processes (Hungr, Leroueil, and Picarelli 2014; Varnes 1978), including block slides, debris flows and rock falls. Its occurrence and expansion are affected by various factors, including conditioning factors (topography, geology, etc.) and triggering factors (rainfall, earthquakes, etc.) (Guzzetti et al. 2005). Consequently, landslide susceptibility mapping (LSM), defined as estimating the spatial probability of landslides occurring in a region under the local geo-environmental conditions (Reichenbach et al. 2018), is strongly recommended as a critical step for disaster mitigation and risk management, as well as land planning (Fell et al. 2008). However, due to the complexity of landslide formation (Cruden and Varnes 1996), generating a reliable and accurate landslide susceptibility map remains challenging in practice.
In the past decades, various approaches have been developed to assess landslide susceptibility, among which knowledge-based and data-driven approaches are the most popular (Reichenbach et al. 2018; Thi Ngo et al. 2021). Knowledge-based approaches rely mainly on subjective decision rules devised by relevant experts (Dai and Lee 2002); frequently used methods include the analytical hierarchy process (AHP) (Mansouri Daneshvar 2014; Yi et al. 2019) and fuzzy logic (Zhu et al. 2014). However, the subjectivity and uncertainty of their decision rules mean that the guidance of experienced experts is always required to produce satisfactory results. With the advancement of Earth observation, data-driven approaches, including statistical methods and machine learning models such as the support vector machine (SVM) (Xu et al. 2012), artificial neural network (ANN) (Abbaszadeh Shahri et al. 2019; Gorsevski et al. 2016) and random forest (RF) (Catani et al. 2013; Fan et al. 2021; He, Wang, and Liu 2021), have developed rapidly in LSM because, compared with knowledge-based approaches, they focus on mining the data itself. Given high-quality spatial data, data-driven approaches are capable of objectively learning and analyzing the complex relationships between landslides and environmental factors. Nevertheless, owing to the different data-mining capabilities of the models adopted, results generated by different data-driven methods still vary greatly. For instance, Nsengiyumva et al. (2019) compared weights of evidence, logistic regression, frequency ratio and statistical index models for LSM and demonstrated the effectiveness of weights of evidence. Ali et al. (2021) generated LSMs using several machine learning methods, showing the effectiveness of the RF classifier. Moreover, integrated approaches for LSM have also been proposed, such as the integration of AHP and ANN (Jena et al. 2019), the integration of certainty factor and ANN (Dou et al. 2015) and other ensemble methods (Hong et al. 2018). Meanwhile, more robust and novel LSM algorithms continue to be explored.
In recent decades, deep learning, with its powerful feature extraction capability, has developed tremendously in image-related tasks (LeCun, Bengio, and Hinton 2015; Wu et al. 2021). Applying deep learning algorithms to analyze the complex relationship between landslides and environmental factors has become a research hotspot in recent years (Hua et al. 2021; Lv et al. 2022; Wang, Fang, and Hong 2019). Unlike traditional machine learning algorithms, the deeper the structure of a deep learning model, the more powerful its data-mining capability is in theory. Currently, several advanced deep learning models, e.g. the deep neural network (DNN) (Dao et al. 2020; Dou et al. 2020; Wang et al. 2020b), convolutional neural network (CNN) (Sameen, Pradhan, and Lee 2020; Wei et al. 2022) and recurrent neural network (RNN) (Thi Ngo et al. 2021; Wang et al. 2020a), have been applied to LSM with satisfactory performance. As mentioned earlier, different approaches have different predictive capabilities, and deep learning models are no exception. For instance, experiments by Thi Ngo et al. (2021) suggested that the RNN model performed better than the CNN model, while in the experiments of Habumugisha et al. (2022), the DNN model outperformed both CNN and RNN. Obviously, choosing an appropriate modeling approach for a particular application remains a challenge (Goetz et al. 2015). It is worth noting that the existing literature has seldom examined detailed implementation strategies for these deep learning models, which hinders their extensive application to some extent, because the implementation strategy strongly affects not only the training process of deep learning models but also the accuracy of LSM.
Additionally, these advanced methods have been applied in only a small number of regions, and their performance needs to be comprehensively compared and investigated in more extensive regions. Therefore, this study was conceived with the aim of investigating and evaluating the performance of different neural network models for LSM. For this purpose, Qingchuan County, Sichuan province, China, was selected as the experimental site. Three neural network models, i.e. ANN, one-dimensional CNN (1D CNN) and RNN, were adopted to investigate their performance in LSM, with the evaluations focusing on the detailed implementation strategies for the three models.

Study area and data
Qingchuan County, with an area of about 3221 km², is situated in northern Sichuan province (Figure 1). Rugged, mountainous terrain characterizes the topography of this area, with the elevation ranging from 501 m above mean sea level in the southeast to 3808 m in the northwest. The streams are densely developed; the Pailung, Qingjiang and Qiaozhuang rivers are the main rivers flowing through the county. The strata in this county are dominated by sedimentary (e.g. limestones and sandstones), magmatic (e.g. granite) and metamorphic rocks (e.g. slate, phyllite and schists) (Li et al. 2012). Three major active faults, i.e. the Pingwu-Qingchuan, Yingxiu-Beichuan and Jiangyou-Guanxian faults, cross the study area. The complex geological setting together with the rugged topography makes this county a typical landslide-prone area where geological disasters seriously threaten the safety of people's lives and property. As the first step for LSM, a landslide inventory map was compiled from high-accuracy field observations provided by the local land management department. By the end of 2019, 304 major slope failures, mainly comprising small/medium-sized landslides, rock falls and unstable slopes, had been recorded and compiled to constitute the core of the landslide inventory (as shown in Figure 1). Since detailed landslide scarps or boundaries were not available in this study, the centroid of each landslide was used to represent it; the feasibility of this practice has been demonstrated in LSM (Dou et al. 2020; Huang et al. 2022; Hussin et al. 2016). To avoid training-sample imbalance, we randomly sampled the same number of non-landslide points using ArcGIS tools. Considering the size of the landslides, an empirical value (100 m) was set to generate buffer zones around the landslide points, and non-landslide points were sampled outside these buffer zones.
These landslide and non-landslide points were then randomly divided at a moderate ratio (7:3) into training and validation sets.
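The sampling and splitting procedure described above can be sketched in Python as follows. The function names, coordinate extent and random seeds are illustrative assumptions; only the 100 m exclusion buffer, the balanced class sizes and the 7:3 split come from the text:

```python
import math
import random

def sample_non_landslides(landslides, n, extent, buffer_m=100.0, seed=0):
    """Randomly sample n non-landslide points lying outside a buffer_m
    buffer around every landslide centroid (projected coordinates, metres)."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = extent
    samples = []
    while len(samples) < n:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # keep the candidate only if it clears every landslide buffer
        if all(math.hypot(x - lx, y - ly) > buffer_m for lx, ly in landslides):
            samples.append((x, y))
    return samples

def train_val_split(points, ratio=0.7, seed=0):
    """Randomly split points into training and validation subsets (7:3 by default)."""
    pts = points[:]
    random.Random(seed).shuffle(pts)
    k = int(len(pts) * ratio)
    return pts[:k], pts[k:]
```

In practice the landslide centroids would come from the inventory shapefile; here they are plain coordinate tuples to keep the sketch self-contained.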
Subsequently, fifteen conditioning factors were derived from different data sources, as presented in Table 1 and Figure 2, including topographical factors (slope, aspect, elevation, relief amplitude, plan curvature and profile curvature), geological factors (lithology and distance to faults), environmental factors (distance to roads, land-use/cover (LULC), normalized difference vegetation index (NDVI)), hydrological factors (distance to rivers, stream power index (SPI) and topographic wetness index (TWI)) and meteorological factor (annual precipitation).
A 30 × 30 m digital elevation model (DEM) was adopted for extracting topographical factors. Among these, slope, aspect and elevation are the most frequently applied topographical factors in landslide-related studies (Van Den Eeckhaut et al. 2009; Yi et al. 2019), which indicates that they contribute importantly to the occurrence of landslides. In the study area, the terrain slope varies from 0° to 73°. Over 50% of landslides occurred in regions where the slope angle exceeds 40° and the elevation is less than 1000 m. In addition, plan curvature and profile curvature depict the structure and morphology of the regional terrain, and relief amplitude measures topographic change. Thus, these three conditioning factors were also considered in this study.
Among geological factors, lithology has a controlling effect on the occurrence of landslides. In the study area, the lithology comprises 18 groups, of which group c and group h are the most widely distributed. A detailed lithology map is presented in Figure 2(i) and Table 2. As the major causative fault of the 2008 Wenchuan earthquake, the Yingxiu-Beichuan fault runs through the study area. In addition, the Pingwu-Qingchuan and Jiangyou-Guanxian active faults may also promote the occurrence of landslides. Thus, distance to faults is another essential factor affecting the stability of a slope.
For environmental conditions, road excavation usually forms high, steep slopes and can damage slope stability, promoting landslides. LULC directly reflects the impact of human activity (e.g. deforestation, agricultural activities) (Meneses, Pereira, and Reis 2019) and affects the stability of slopes. The LULC map was extracted from GLC_FCS30-2020 global land-cover products, as presented in Figure 2(m) and Table 2. In addition, vegetation coverage indirectly affects slope stability by influencing soil erosion. As a common indicator of vegetation coverage, the NDVI map was calculated from Landsat-8 imagery, with its values normalized to [0, 1]. For hydrological factors, SPI and TWI were derived from the DEM data. Rivers weaken slope stability by eroding or saturating slope materials. SPI measures the erosive capacity of the stream (equation (1)), while TWI reflects the spatial pattern of soil moisture (equation (2)) (Sameen, Pradhan, and Lee 2020). Slope instability is closely related to hydrological processes, e.g. runoff and slope saturation by water (Highland and Bobrowsky 2008). Therefore, distance to rivers, SPI and TWI were considered in this study.
SPI = A_s × tan β (1)

TWI = ln(A_s / tan β) (2)

where A_s is the upslope contributing area per unit contour length and β is the slope gradient in degrees. For the meteorological factor, precipitation is an important trigger that can not only reactivate old landslides but also trigger new ones. Therefore, meteorological station observations recorded from 1989 to 2019 were collected to construct the mean annual precipitation map of the study area.
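The two hydrological indices in equations (1) and (2) can be computed directly from the upslope area and slope gradient. This is a minimal sketch; the function names and the small epsilon guard for flat cells are our additions, not part of the original formulation:

```python
import math

def spi(upslope_area, slope_deg):
    """Stream power index: SPI = A_s * tan(beta)."""
    return upslope_area * math.tan(math.radians(slope_deg))

def twi(upslope_area, slope_deg, eps=1e-6):
    """Topographic wetness index: TWI = ln(A_s / tan(beta)).
    eps avoids division by zero on flat cells (an implementation choice)."""
    return math.log(upslope_area / max(math.tan(math.radians(slope_deg)), eps))
```

Applied per DEM cell, these yield the SPI and TWI rasters; note that TWI grows on gentle, high-accumulation slopes, consistent with its role as a soil-moisture proxy.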
Finally, to facilitate processing, all landslide conditioning factors were resampled or interpolated to a 30 m resolution in the same spatial projection.

Methodology
As depicted in Figure 3, this study was conducted in three main steps. First, various data were collected and processed, including the conditioning factors and the landslide inventory map. Then, three neural networks, i.e. ANN, 1D CNN and RNN, were constructed to produce landslide susceptibility maps. Finally, the results were compared, evaluated and discussed.

Selection of conditioning factors
Before modeling susceptibility, a crucial step is to check for multicollinearity among the landslide conditioning factors, because correlation between conditioning factors may lead to incorrect modeling results. In this study, the variance inflation factor (VIF) and tolerance (TOL) were adopted to check multicollinearity. Let R_j^2 be the coefficient of determination obtained by regressing the independent variable x_j on the remaining independent variables. The VIF and TOL can then be calculated as follows:

VIF = 1 / (1 - R_j^2)

TOL = 1 - R_j^2

According to previous studies (Dou et al. 2020), a VIF value lower than 10 and a TOL value greater than 0.1 indicate that the variables are independent of each other. In addition, the information gain ratio (IGR) method was applied to analyze the contribution of each landslide conditioning factor to the modeling. The IGR is a probability-based metric primarily applied to evaluate the reduction of uncertainty and is therefore widely used for feature selection (Bui et al. 2020). Generally, IGR values greater than zero are helpful for landslide modeling, and larger IGR values indicate greater usefulness.

(Figure 3. Workflow of the present study.)
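A minimal VIF/TOL check can be implemented by regressing each factor on the remaining ones with ordinary least squares. This sketch (using NumPy) is illustrative and not the authors' implementation:

```python
import numpy as np

def vif_tol(X):
    """Per-factor VIF and TOL for a factor matrix X (n_samples, n_factors):
    regress each column on the others, then VIF_j = 1/(1 - R_j^2),
    TOL_j = 1 - R_j^2."""
    n, p = X.shape
    vifs, tols = [], []
    for j in range(p):
        y = X[:, j]
        # design matrix: intercept plus all other factors
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        tols.append(1.0 - r2)
        vifs.append(1.0 / max(1.0 - r2, 1e-12))
    return vifs, tols
```

Factors with VIF above 10 (equivalently TOL below 0.1) would be flagged as collinear under the thresholds cited above.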

Neural network models
ANN
ANN is a commonly used shallow multilayer perceptron method that can model the complex relationships between conditioning factors and responses. The ANN model has several advantages over traditional statistical methods, e.g. high accuracy, self-learning and strong robustness (Yilmaz 2010). Structurally, a typical three-layer ANN includes an input layer, a hidden layer and an output layer, as illustrated in Figure 4. Each layer has one or more neurons, and neurons in adjacent layers are connected by weights. These weights are usually adjusted continuously by the back-propagation learning algorithm.
In this study, the input layer corresponds to the conditioning factors, and the hidden layer acts as a non-linear function modeling the complex relationship between landslides and conditioning factors. Theoretically, a simple neural network with one hidden layer is sufficient to approximate most complex functions, provided there are enough neurons in the hidden layer (Hornik 1991). Thus, one hidden layer was used in this study. In addition, to train the model better, a batch normalization layer was added after the hidden layer. Due to the limited number of neurons, no dropout layer was applied in this model. Finally, the output layer generates a binary result (i.e. landslide or non-landslide).
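A forward pass of such a three-layer network (input, hidden layer, batch normalization, sigmoid output) can be sketched as follows; the tanh hidden activation and the specific weight shapes are assumptions for illustration, not the authors' exact configuration:

```python
import numpy as np

def ann_forward(X, W1, b1, W2, b2, gamma, beta, eps=1e-5):
    """Forward pass of a three-layer ANN with batch normalization after
    the hidden layer. Shapes: X (batch, n_factors), W1 (n_factors, n_hidden),
    W2 (n_hidden, 1); gamma/beta are the batch-norm scale and shift."""
    h = np.tanh(X @ W1 + b1)                          # hidden layer
    mu, var = h.mean(axis=0), h.var(axis=0)           # batch statistics
    h = gamma * (h - mu) / np.sqrt(var + eps) + beta  # batch normalization
    z = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-z))                   # landslide probability
```

Thresholding the returned probability at 0.5 yields the binary landslide/non-landslide output described in the text.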

1D CNN
The one-dimensional (1D) CNN is a variant of the convolutional neural network mainly used to analyze 1D data, such as text and time-series signals. Typically, a 1D CNN contains an input layer, convolutional layers, pooling layers, a fully connected layer and an output layer, as presented in Figure 5. The convolutional layer acts as a feature extractor, extracting data features related to the target. The pooling layer operates along one dimension and is very effective at reducing the amount of data. For a complex task, a 1D CNN usually stacks multiple convolutional and pooling layers to learn complex data features. Finally, the fully connected layer acts as a classifier for the extracted features.
In the present study, to keep the structure simple, one convolutional layer, one pooling layer and one fully connected layer were adopted to build the 1D CNN model. In addition, batch normalization layers were added after the convolutional layer and the fully connected layer. To alleviate over-fitting, a dropout layer with a dropout rate of 0.5 was added after the fully connected layer.
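The described stack (one convolutional layer, one pooling layer, one fully connected layer) can be sketched as a plain forward pass. Kernel size, channel counts and the ReLU-style activation here are illustrative assumptions:

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution of signal x (length L,) with kernels (n_ch, k)."""
    L, (n_ch, k) = len(x), kernels.shape
    out = np.empty((n_ch, L - k + 1))
    for c in range(n_ch):
        for i in range(L - k + 1):
            out[c, i] = np.dot(x[i:i + k], kernels[c]) + bias[c]
    return np.maximum(out, 0.0)          # ReLU-style activation (assumed)

def maxpool1d(x, size=2):
    """Non-overlapping max pooling along the last axis."""
    n_ch, L = x.shape
    L2 = L // size
    return x[:, :L2 * size].reshape(n_ch, L2, size).max(axis=2)

def cnn1d_forward(x, kernels, bias, W_fc, b_fc):
    """conv -> pool -> flatten -> fully connected -> sigmoid probability."""
    f = maxpool1d(conv1d(x, kernels, bias)).ravel()
    return 1.0 / (1.0 + np.exp(-(f @ W_fc + b_fc)))
```

In the study's setting, `x` would be the 15 conditioning-factor values of one pixel treated as a 1-D sequence.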

RNN
RNN has been widely used for serialized data processing, such as text and audio. In serialized data, the current element is usually related to the previous ones. This implicit relationship is not considered by other neural networks, whose units within a layer are independent and unconnected, but it is by RNN. As shown in Figure 6, each unit in the hidden layer of an RNN is connected to other units across time steps. In other words, the current output (y_t+1) considers not only the present input (x_t+1) but also the history of previous elements of the sequence (such as x_t and x_t-1). Thus, RNN can learn valuable information from sequence data of various lengths (Fang et al. 2020).
In this study, the landslide conditioning factors were treated as one-dimensional serial data that can be analyzed by the RNN model, with the output referring to landslide or non-landslide. To alleviate over-fitting, a dropout layer with a dropout rate of 0.5 was added after the hidden layer.
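Treating the conditioning factors as a length-15 sequence, a simple RNN forward pass can be sketched as follows; the tanh cell and the weight shapes are standard simple-RNN assumptions, not the authors' exact configuration:

```python
import numpy as np

def rnn_forward(seq, Wx, Wh, bh, Wo, bo):
    """Simple RNN over a scalar factor sequence:
    h_t = tanh(Wx * x_t + Wh @ h_{t-1} + bh); the final hidden state is
    mapped to a landslide probability via a sigmoid output."""
    h = np.zeros(Wh.shape[0])
    for x_t in seq:                       # each conditioning factor in turn
        h = np.tanh(Wx * x_t + Wh @ h + bh)
    z = h @ Wo + bo
    return 1.0 / (1.0 + np.exp(-z))
```

Because the hidden state carries forward, the output depends on the order of the factors in the sequence, which is exactly the recurrence the text describes.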

Accuracy evaluation methods
To quantitatively evaluate the performance of the different neural networks, the receiver operating characteristic (ROC) curve was adopted to measure their predictive capabilities (Catani et al. 2013; Dou et al. 2020). Generally, the area under the ROC curve (AUC) varies between 0.5 and 1; an AUC value close to 1 indicates good performance, and vice versa. In addition, several statistical measures, namely precision, recall, F1 and overall accuracy (OA), were also used to measure the performance of these models (Chen et al. 2018). They can be calculated as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × Precision × Recall / (Precision + Recall)

OA = (TP + TN) / (TP + TN + FP + FN)

where TP, FP, TN and FN represent true positives, false positives, true negatives and false negatives, respectively. For precision and recall, a threshold of 0.5 was used to determine landslide (above 0.5) or non-landslide (below 0.5). For all these statistical measures, a higher value indicates better model performance.
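These measures follow directly from the confusion counts at the 0.5 threshold; a minimal sketch:

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN at the given probability threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p > threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and overall accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, oa
```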

Multicollinearity and importance analysis
As shown in Figure 7(a), all VIF values were lower than the threshold (10.0) and all TOL values were greater than 0.1, implying no significant multicollinearity among the landslide conditioning factors. As shown in Figure 7(b), topographical factors, especially elevation, contributed the most to the susceptibility model. Although some factors (such as SPI, aspect, profile curvature and distance to rivers) contributed little to the landslide modeling, their IGR values were still higher than zero. Therefore, to let the neural network models better learn the intrinsic features, all selected landslide conditioning factors were used to predict landslide susceptibility.

Comparison of the performance of neural network models
For neural network models, choosing appropriate model parameters, including the model structure parameters and the training hyper-parameters, is very important. To improve the robustness of the neural network models, an effective sampling method, namely K-fold cross-validation with stratified sampling, was used in this study. The training data set was internally sampled using tenfold cross-validation, and the stratified sampling strategy was adopted to reduce the randomness of the training-test split. As shown in Figure 8, the number of neurons in the hidden layer of the ANN was set to 11. For the 1D CNN, the number of channels of the convolutional layer was set to 70 and the number of units of the fully connected layer to 140. For the RNN, the number of hidden units was set to 80.
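Stratified K-fold cross-validation, as used here, can be sketched as follows. The round-robin dealing scheme is one simple way of preserving class ratios across folds; it is illustrative, not the authors' exact procedure:

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs: each class is shuffled and dealt
    round-robin across the k folds so that class ratios are preserved."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

Each candidate structure (e.g. 11 hidden neurons for the ANN) would be trained on the nine training folds and scored on the held-out fold, with the average score guiding the selection in Figure 8.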
In addition, training hyper-parameter settings are critical to model training; however, there is no universal standard for selecting them. In this study, the training hyper-parameters were determined via the tenfold cross-validation method, and the details are listed in Table 3. The initial learning rate decreases dynamically as the number of training epochs increases.
The landslide susceptibility value ranges between 0.0 and 1.0, with higher values (close to 1.0) indicating a region more prone to landslides, and vice versa. To facilitate the analysis, the three susceptibility maps derived from ANN, 1D CNN and RNN were reclassified into five landslide susceptibility levels, from very low to very high, using the natural breaks method in ArcGIS, as depicted in Figure 9(a)-(c), respectively.
The spatial distributions of landslide susceptibility levels obtained by the three neural network models were basically consistent, with some slight differences. Visually, the very low susceptibility areas generated by the RNN model were significantly larger than those of the ANN and 1D CNN models and were mostly concentrated in the northwest of the study area. The moderate and high susceptibility areas produced by the RNN model were smaller than those of the other two models. The very high susceptibility areas were mainly distributed in the south and southeast of the study area. For further comparison, two regions (subarea 1 and subarea 2) were enlarged to display the differences. In subarea 1, the very high and high susceptibility areas were mainly concentrated along the banks of the Pailung River. It is worth noting that the Pailung River itself was classified as moderate and low susceptibility by the ANN and 1D CNN models, while the RNN model classified it as very low susceptibility. The susceptibility map of this subarea generated by the RNN model was therefore more reasonable; in this case, the results of the ANN and 1D CNN models were relatively more conservative. In subarea 2, the very high and high susceptibility areas generated by the 1D CNN model were larger than those of the other two models and were mainly distributed along the rivers, where most of the existing landslides are located. In addition, most landslides in this area were correctly predicted by all three models, confirming their good predictive ability.
The mean and standard deviation (Std) of the three susceptibility maps are as follows: RNN (mean = 0.3366, Std = 0.4082), 1D CNN (mean = 0.3573, Std = 0.3006) and ANN (mean = 0.4461, Std = 0.2682). The mean of the susceptibility map generated by the RNN model is significantly lower than those of the other two models, while its standard deviation is the highest. This implies that the predictions of the RNN model are generally low and more polarized than those of the ANN and 1D CNN models. In contrast, the predictions of the ANN model are larger and more conservative, while those of the 1D CNN model are more balanced. As depicted in Table 4, in the maps produced by ANN, 1D CNN and RNN, the very high susceptibility level covered 19.43%, 15.52% and 23.62% of the study area, respectively, while the very low susceptibility level covered 28.30%, 36.13% and 54.09%, respectively. Evidently, the results generated by the RNN model fall mainly in the very high and very low susceptibility zones, which is consistent with the statistics above. Furthermore, the percentage of landslides and the landslide density gradually increase with the landslide susceptibility level, demonstrating the feasibility and reliability of the three neural network models.

(Figure 8. Selection of optimal model structure parameters for ANN, 1D CNN and RNN using tenfold cross-validation with the stratified sampling method. Figure 9. Landslide susceptibility maps generated by three neural network models.)

Figure 10 presents the accuracy evaluation results of the three neural network models. All AUC values are higher than 80%, indicating that these neural network models are effective for LSM. Furthermore, the RNN model performs best with an AUC of 84.76%, slightly higher than the 1D CNN model (84.59%), while the ANN model achieved the lowest AUC (82.90%). For the statistical measures (Figure 10(b)), most accuracy measures of the RNN model were higher than those of the other two models. The performance of the 1D CNN model was slightly lower than that of the RNN model but better than that of the ANN model. It is worth noting that the ANN has the highest recall, implying that it is more effective at reducing false negatives, while the RNN has the highest precision, implying that it predicts landslides more accurately. In contrast, the results of the 1D CNN model are more balanced. Additionally, the two-tailed Wilcoxon rank-sum test was performed to assess the significance of the differences between the models. As shown in Table 5, all P values were less than 0.05, indicating significant differences between these susceptibility models.
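The two-tailed Wilcoxon rank-sum test can be sketched with the usual normal approximation. This illustrative implementation uses average ranks for ties but no continuity or tie-variance correction, so it is a simplification of what statistical packages compute:

```python
import math

def rank_sum_test(a, b):
    """Two-tailed Wilcoxon rank-sum test via the normal approximation:
    returns (z, p) for samples a and b."""
    pooled = sorted(list(a) + list(b))
    # assign each distinct value the average of the ranks it occupies
    rank, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        for kk in range(i, j):
            rank[pooled[kk]] = (i + 1 + j) / 2.0
        i = j
    n1, n2 = len(a), len(b)
    w = sum(rank[v] for v in a)            # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0)) # two-tailed normal p-value
    return z, p
```

Applied to pairs of susceptibility maps (pixel values as the two samples), p < 0.05 indicates a significant difference between the models, as reported in Table 5.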

Impact of training hyper-parameters and model complexity
For neural network models, the setting of training hyper-parameters strongly affects not only the training process but also the prediction accuracy. However, different training hyper-parameters have different effects on neural network models, and not all models are affected to the same degree (Wang et al. 2020a). Therefore, investigating the impact of training hyper-parameters is very important both for applying neural network models well and for selecting an appropriate LSM model. The training hyper-parameters of neural network models usually include the activation function, optimizer, learning rate, batch size and number of epochs. The activation function introduces non-linear transformations into neural networks. To minimize the training loss, the optimizer adjusts the model weights during training, and the learning rate controls the update speed of the weights. The batch size is the number of training samples used in each iteration, and an epoch is one pass over all training data.
In this study, three main hyper-parameters, i.e. the activation function, optimizer and batch size, were analyzed using a trial-and-error approach. For the other two hyper-parameters, adaptive methods were adopted to update them dynamically: an adaptive learning rate was used to train the three models, and the early-stopping method was applied to determine the number of epochs.
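The adaptive learning rate and early stopping described here can be sketched as a generic training loop; the decay factor, patience and tolerance values are illustrative assumptions, not the authors' settings:

```python
def train_with_early_stopping(step, val_loss, max_epochs=500,
                              patience=20, lr0=0.01, decay=0.95):
    """Generic training-loop sketch: the learning rate decays each epoch,
    and training stops once the validation loss has not improved for
    `patience` consecutive epochs. `step(lr)` runs one training epoch;
    `val_loss()` returns the current validation loss."""
    best, wait, lr = float("inf"), 0, lr0
    for epoch in range(max_epochs):
        step(lr)
        loss = val_loss()
        if loss < best - 1e-6:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                break                     # early stopping
        lr *= decay                       # adaptive (decaying) learning rate
    return epoch + 1, best
```

The number of epochs actually run is thus determined by the data rather than fixed in advance, which is the point of the early-stopping strategy.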
As shown in Figure 11, the 1D CNN and RNN models achieved their highest AUC values with the combination Tanh + SGD, while the ANN model obtained its highest AUC with ReLU + SGD. Specifically, for the ANN and 1D CNN models, the SGD and Adam optimizers performed better than the other two (Adagrad and Adadelta), and there was no significant difference between activation functions, implying that the ANN and 1D CNN models may not be sensitive to the activation function. For the RNN model, the AUC values obtained with the Tanh and ReLU activation functions were significantly higher than with Sigmoid, and the SGD and Adam optimizers again performed better than the other two, implying that SGD and Adam are robust optimizers. In general, the combination Tanh + SGD was more suitable for this study than the other hyper-parameter combinations.
In terms of batch size (Figure 11(b)), the highest AUC values were achieved for the ANN, 1D CNN and RNN models with batch sizes of 120, 40 and 80, respectively. For the ANN and RNN models, as the batch size increases, the AUC first rises and then falls, resembling a quadratic curve. For the 1D CNN, the AUC decreases significantly as the batch size increases. Evidently, relatively small batch sizes are recommended to improve the performance of neural network models for LSM, especially for the 1D CNN model.
In practice, the selection of a neural network model should consider not only the performance but also the complexity of the model. To investigate the complexity of the three neural network models used in this study, their numbers of parameters and their training and inference times were compared in the same Windows environment (Intel Core i5-9300H CPU with 8 GB RAM and NVIDIA GTX 1650 GPU with 4 GB RAM), as depicted in Table 6.
The ANN model has the fewest parameters, followed by the RNN and 1D CNN. Compared with the other two models, the ANN model also requires the least training and inference time, implying its relatively low complexity. Although the complexity of the 1D CNN and RNN models is relatively high, they performed better than the ANN model.

Previous studies and future recommendations
In the last few decades, many approaches have been developed for LSM, among which neural network-based methods have performed excellently (Thi Ngo et al. 2021). However, how to select an appropriate neural network model has always been controversial. In addition, although experiments have proved the effectiveness of neural network models, few studies have described their implementation strategies, which makes it difficult for others to adopt these advanced algorithms. In terms of model structure, there is no widely accepted approach to designing the structure of neural networks; they are complex algorithms that require extensive experience in model design, training and fine-tuning. Moreover, differences in topographic conditions may make it difficult to transfer models trained in one region to other regions. Thus, in practice, the K-fold cross-validation method should be applied to determine the structural parameters of neural networks. Additionally, given the large number of model parameters, dropout layers, adaptive learning rates and early-stopping strategies should be considered during model training.
In terms of training hyper-parameters, we found that relatively small batch sizes were effective in improving the performance of neural network models for LSM, consistent with previous studies. For instance, Wang et al. (2020a) reported that 32 and 64 were the optimal batch sizes, compared with 128 and 256. Sameen, Pradhan, and Lee (2020) reported an optimal batch size of 16 for a CNN with batch sizes varying between 4 and 128, and Hua et al. (2021) an optimal batch size of 600 for a DNN with batch sizes varying between 20 and 3000. It is worth noting that the optimal batch size may be related to the number of training samples, because the number of training samples in most LSM studies is relatively small compared with other image-related tasks; this conjecture needs further confirmation. In terms of activation function and optimizer, the ReLU activation function and Adam optimizer have been commonly used in many studies (Lv et al. 2022; Wang et al. 2020a). However, based on extensive experiments, we found that the Tanh activation function performs more robustly than ReLU, and the SGD optimizer also performs slightly better than Adam. This may be related to the properties of the training samples and requires further investigation. In addition, neural network models require a large number of reliable training samples as input, which implies that the sampling strategy and the ratio of landslide/non-landslide samples should be considered carefully. In this study, we adopted the buffer-based non-landslide sampling method that is commonly used in LSM. However, unreasonable buffer distance settings may lead to biased non-landslide samples (Lucchese, de Oliveira, and Pedrollo 2021). The impact of the sampling strategy on neural network models needs to be analyzed further in the future.

(Figure 11. Predictive performance of different neural network models with different training hyper-parameter settings.)
It should also be noted that the study area and training samples in this study were relatively small, and that the division ratio between training and validation samples (7:3) is moderate and acceptable. For continental- or global-scale susceptibility mapping, the division ratio, as well as the model structure and optimal hyper-parameter values for neural network models, may need to be adjusted according to the amount of data.
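The 7:3 division used here can be reproduced with a single stratified split, as sketched below on synthetic data; stratification preserves the landslide/non-landslide ratio in both subsets, which matters when the classes are imbalanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical sample set: 500 observations, 12 conditioning factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))
y = rng.integers(0, 2, size=500)

# 7:3 train/validation split, stratified on the class label.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
```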

Conclusions
In this study, three neural network models, i.e. ANN, 1D CNN and RNN, were investigated and evaluated in terms of their performance for LSM in Qingchuan County, Sichuan province, China. Through comprehensive experiments, the following remarks can be concluded:

• All three neural network models can accurately identify landslide-susceptible regions. Spatially, the zonation of landslide susceptibility levels generated by the ANN and 1D CNN models was basically consistent, but differed slightly from that of the RNN model. Qualitative and quantitative evaluation results, however, suggested that the RNN model performed best among the three.

• The Tanh activation function and SGD optimizer were found to be more suitable for training neural network models in this study, and relatively small batch size values are recommended to improve the performance of neural network models for LSM, especially for the 1D CNN model.

• The ANN model requires the fewest model parameters and the least training and inference time, implying relatively low complexity, followed by the 1D CNN and RNN models. However, the RNN model achieved the highest accuracy values among the three. This indicates that no perfect model exists for LSM, and that the selection of a neural network model should be based on the specific application scenario.
In general, this study provides insights and scientific suggestions for LSM using neural network models, which are expected to help those who apply these advanced methods improve their efficiency. Furthermore, easy-to-use software based on neural networks should be developed in the future to enhance landslide understanding and monitoring capabilities. Additionally, it is recommended to explore the impact of more parameters on neural network models and to evaluate these models over more extensive regions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data or code used in this study are available from the corresponding author on reasonable request.