An ensemble deep-learning framework for landslide susceptibility assessment using multiple blocks: a case study of Wenchuan area, China

Abstract Improving landslide susceptibility mapping (LSM) is a long-standing problem, as LSM provides the basis for hazard mitigation. Recently, hybrid ensemble deep learning (DL) techniques have shown potential for this purpose. In this paper, we propose a novel ensemble DL model, namely GL-ResNet, which employs conventional ResNet blocks for landslide feature extraction, long short-term memory (LSTM) structures for information storage, and a proposed GoogLeNet block (GBlk) to broaden the model's perception ability. To validate the model performance, a landslide inventory containing 1147 historical landslide polygons and data on 12 landslide factors in the Wenchuan area in southwestern China was compiled and separated into training and validation datasets using a 7:3 random sampling strategy. Based on AUC and Accuracy, GL-ResNet (0.96 and 0.909) outperformed logistic regression (0.92 and 0.851), support vector machines (0.94 and 0.884), deep belief networks (0.95 and 0.884), gated recurrent unit (0.94 and 0.884) and ResNet (0.95 and 0.894). We also explored the robustness of GL-ResNet for LSM. The results suggested that although GL-ResNet is sensitive to initial training conditions, it is robust to the model training method and sample ratios. In detail, GL-ResNet outperformed the conventional models in fitting power and prediction performance by 0.03-0.04 and 0.02, respectively, in most cases, with even greater differences when the training dataset was limited.


Introduction
Landslides are catastrophic and devastating geological disasters worldwide, resulting in significant loss of life and property (Huang and Zhao 2018). To mitigate landslide hazards, landslide susceptibility mapping (LSM), an effective tool for understanding landslides and predicting potential landslide hazard areas (Feizizadeh et al. 2014), is commonly suggested. LSM provides beneficial information for local governments to develop and implement effective landslide risk mitigation strategies (Gudiyangada Nachappa et al. 2020).
Recent studies on LSM have attracted a great deal of attention (Chowdhuri et al. 2021a). Various models have been proposed for LSM, which can be broadly classified into two categories, i.e. heuristic-based models and data-driven models (Fang et al. 2021). Heuristic-based models, such as the analytic hierarchy process (AHP) (Feizizadeh et al. 2014), rely on the empirical experience of experts to identify landslide triggering factors, with the ultimate goal of obtaining landslide susceptibility zones (Aditian et al. 2018). Data-driven models use mathematical-statistical methods to extract the probability of landslide occurrence according to regional geographic information and disaster distribution (Reichenbach et al. 2018); they include frequency ratio (FR) (Wang et al. 2020b), weights-of-evidence (WoE) (Saha and Saha 2020), and the information value method (Pal and Chowdhuri 2019).
With the advance of artificial intelligence (AI) technology, machine learning (ML) models have been widely applied to landslide susceptibility studies (Sun et al. 2021b), including logistic regression (LR) (Zhao et al. 2019), support vector machines (SVM) (Dou et al. 2020), random forests (RF) (Sun et al. 2020), and artificial neural networks (ANN) (Shahri et al. 2019). In addition, many scholars, such as Chowdhuri et al. (2021b), have started to combine two or more ML models into an ensemble model to produce higher-quality LSMs. Although ML models outperform heuristic-based models and data-driven models, they cannot fully discover representative features of the data, which limits the accuracy of the models (Wang et al. 2019).
To address this problem, deep learning (DL) models have received more attention, as they excel at extracting deep and intrinsic data features (Zhu et al. 2020). In the field of landslide detection, U-Net and residual U-Net were used to detect and identify landslides based on Sentinel-2 data and the ALOS digital elevation model (Ghorbanzadeh et al. 2021; Ghorbanzadeh et al. 2022). In the field of landslide mapping, Wang et al. (2019) used a convolutional neural network (CNN) for the first time to obtain LSM, and the results showed that the overall accuracy (OA) and Matthews correlation coefficient (MCC) of the CNN were 3.94%-7.45% and 0.079-0.151 higher than those of the SVM, respectively. Later, the long short-term memory (LSTM) (Xiao et al. 2018; Wang et al. 2021a), the gated recurrent unit (GRU) (Wang et al. 2020a; Zhao et al. 2022), the deep belief network (DBN) (Li et al. 2022), and the ResNet (Lv et al. 2022) were used to analyze landslide susceptibility, all with higher accuracy than traditional ML models. However, these successful studies still have a drawback: they used only an individual model to predict landslide susceptibility, which means that only one type of convolutional kernel can be used and the best-fit function may be missed during the training phase (Wang et al. 2011; Korzh et al. 2017), thus reducing the accuracy of the model. Although Fang et al. (2021) proposed to solve this problem with four heterogeneous ensemble-learning techniques, this leads to a complex training process for the model.
Therefore, in this paper, instead of using ensemble-learning techniques, we propose a novel ensemble DL model to overcome the limitations of individual models and to obtain LSM with higher accuracy than conventional models. This model is called GL-ResNet and includes both the conventional ResNet block and the LSTM structure, as they work well when used as individual models and excel at feature extraction and information storage, respectively. In addition, GL-ResNet incorporates a proposed GoogLeNet block (GBlk), which contains multiple types of convolutional kernels to broaden the model's perception ability. To validate the performance of GL-ResNet, Wenchuan County in southwestern China was selected as a case study, and the results of GL-ResNet (including the receiver operating characteristic (ROC) curve, confusion matrix, and quality of LSM) were compared with those of five state-of-the-art models. In the discussion section, we constructed multiple experimental environments (including cross-validation, different percentages of training and testing datasets, various proportions of landslides and non-landslides, and multiple random initial training conditions) to verify the robustness of GL-ResNet.

Methodology
The assessment method used in this paper consists of three stages, from factor optimization to model performance validation. Details of the assessment procedure are illustrated in Figure 1.

Factor screening
The occurrence of landslides is associated with both natural and anthropogenic-activity factors, and many studies have proposed numerous factors. However, not all influence factors are critical to LSM (Dou et al. 2015). In this study, we first used the RF model to extract the importance of each factor, and then used the variance inflation factor (VIF) and tolerance (TOL) to assess the multicollinearity of the factors. When constructing models for LSM, only factors with both high importance and low multicollinearity were considered as independent variables.

Random forest
RF was first proposed by Breiman in 2001 (Sun et al. 2021a) as a bagging and ensemble data mining model (Pal et al. 2022). The core of RF is to train decision trees on multiple different datasets and to determine the final output based on the judgment results of these decision trees. In addition, RF can be used to obtain the relative importance of factors based on the out-of-bag (OOB) error, which was the main role of RF in this paper. For the detailed expression of RF and the calculation method of relative importance, refer to Ruidas et al. (2022b).

Multicollinearity assessment
After analysis with the RF model, we obtain a number of landslide influence factors of high importance. If these landslide influence factors are highly correlated with each other rather than with the model, the accuracy of the model predictions can be reduced (Ruidas et al. 2022a). To avoid this problem, multicollinearity assessment is commonly performed with VIF and TOL. The equations for VIF and TOL can be found in Ruidas et al. (2022c). In general, TOL < 0.1 and VIF > 10 indicate obvious multicollinearity among the factors (Saha et al. 2021).
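The multicollinearity check can be illustrated with a short Python sketch. It assumes the standard definitions, TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j, where R_j^2 is the coefficient of determination obtained by regressing factor j on all other factors; the paper itself defers the equations to Ruidas et al. (2022c), so this is not necessarily the exact formulation used. The function names and the toy columns are hypothetical.

```python
def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def tol_and_vif(columns):
    """Return {factor name: (TOL, VIF)} for a dict of equally long columns."""
    centered = {}
    for name, col in columns.items():
        mean = sum(col) / len(col)
        centered[name] = [v - mean for v in col]
    result = {}
    for target, y in centered.items():
        others = [c for name, c in centered.items() if name != target]
        # Normal equations (X^T X) beta = X^T y; centering removes the intercept.
        xtx = [[sum(p * q for p, q in zip(u, v)) for v in others] for u in others]
        xty = [sum(p * q for p, q in zip(u, y)) for u in others]
        beta = _solve(xtx, xty)
        fitted = [sum(bk * u[i] for bk, u in zip(beta, others)) for i in range(len(y))]
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        ss_tot = sum(yi ** 2 for yi in y)
        tol = ss_res / ss_tot  # TOL = 1 - R^2
        result[target] = (tol, 1.0 / tol)
    return result


# Toy columns (hypothetical): "c" is almost a copy of "a", so both are flagged.
toy = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, -1, 1, -1, 1, -1, 1, -1],
    "c": [1.1, 2.0, 2.9, 4.0, 5.1, 6.0, 6.9, 8.0],
}
for name, (tol, vif) in tol_and_vif(toy).items():
    flag = "collinear" if vif > 10 or tol < 0.1 else "ok"
    print(f"{name}: TOL={tol:.4f} VIF={vif:.2f} {flag}")
```

Because TOL is the reciprocal of VIF under these definitions, the two thresholds (TOL < 0.1 and VIF > 10) flag the same factors; reporting both is a convention rather than two independent tests.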

Assessment model training
GL-ResNet is intended to combine the advantages of individual models, which helps to improve its ability to assess landslide susceptibility. A detailed description of GL-ResNet can be found in Section 3. Notably, the landslide dataset is divided into training and testing datasets, with only the training dataset being used for model training and both the training and testing datasets being used for model performance analysis. To highlight the performance of GL-ResNet, the conventional models mentioned in Section 1, including LR, SVM, DBN, GRU and ResNet, were also trained. A brief description of the conventional models follows.

Logistic regression
LR is a multivariate statistical technique that can be used to deal with both binary and multivariate classification problems. LR is essentially based on a sigmoid function that maps a linear combination of input features to probability values between 0 and 1, with the aim of obtaining a category label for the object. LR is calculated as shown below (Ruidas et al. 2021).
$$P = \frac{1}{1 + e^{-z}}, \qquad z = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
where $P$ is the probability value, $a$ is the intercept, $b_i$ ($i = 1, 2, 3, \ldots, n$) is the regression coefficient, $x_i$ ($i = 1, 2, 3, \ldots, n$) is the input feature and $n$ is the number of input features.
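The sigmoid mapping described above can be sketched in a few lines of Python. The function name and the example values are hypothetical; only the roles of the intercept a, the coefficients b_i and the features x_i follow the text.

```python
import math


def landslide_probability(intercept, coefficients, features):
    """Map a linear combination of input features to a probability in (0, 1)."""
    if len(coefficients) != len(features):
        raise ValueError("one regression coefficient is needed per input feature")
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only: z = -1.0 + 0.8 * 2.0 + 0.5 * 1.0 = 1.1
p = landslide_probability(-1.0, [0.8, 0.5], [2.0, 1.0])
label = "landslide" if p >= 0.5 else "non-landslide"
print(round(p, 3), label)
```

The 0.5 cut-off used for the label is a common default for binary classification and is an assumption here, not a threshold stated in the paper.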

Support vector machine
SVM is a statistically based ML model closely related to Vapnik-Chervonenkis (VC) theory (Ruidas et al. 2022a). The goal of SVM is to use kernel functions to map the input samples non-linearly into a high-dimensional space and find an optimal hyperplane that separates the positive and negative samples. Currently, SVM has been widely used in LSM due to its ability to effectively handle small sample datasets.

Deep belief networks
DBN is a DL model based on unsupervised learning, the core part of which is a multilayer restricted Boltzmann machine (RBM). During DBN training, the output of each RBM layer is used as input to the next RBM layer to dig deeper into the features of the input data. For the structure of the DBN, which consists of the visible layer and the hidden layer, refer to Li et al. (2022).

Gated recurrent unit
GRU is a variant of the recurrent neural network (RNN) that has a strong memory, is good at handling sequential problems, and can avoid vanishing and exploding gradients. Specifically, the GRU introduces gating mechanisms, including update gates and reset gates, which are used to control the impact and retention of the hidden state from the previous moment, respectively. Zhao et al. (2022) introduced the cell structure and specific functions of the GRU.

ResNet
ResNet is a powerful DL model, proposed by He et al. (2016), which aims to solve the degradation problem caused by an increasing number of layers in the network. Unlike a normal CNN, ResNet introduces a skip connection to enable direct information exchange between layers, which facilitates the transfer of input data to deeper layers. In this paper, ResNet is not only used as a comparison model; its structure is also used as part of GL-ResNet.

Performance validation
We used multiple criteria to validate the performance of GL-ResNet. First, we introduced the ROC curve (Jaydhar et al. 2022) and used the area under the curve (AUC) for the training and testing datasets to obtain preliminary information on the fitting power and prediction performance of the model. Then, both the confusion matrix (true negative (TN), false negative (FN), false positive (FP), true positive (TP)) (Panahi et al. 2022) and its derived indicators (Precision, Recall, F1_score, and Accuracy) were used for more in-depth analysis. Finally, we reclassified the landslide susceptibility obtained by all models into five levels, i.e. very low (0-0.1), low (0.1-0.3), medium (0.3-0.5), high (0.5-0.75), and very high (0.75-1), according to Dai and Lee (2001), to demonstrate the generalization ability of the model. The higher the percentage of area in the very low region, the percentage of non-landslides in the very low region, and the percentage of landslides in the very high region, the better the generalization ability of the model.
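The three validation criteria can be made concrete with a minimal Python sketch. It assumes the usual definitions of the indicators (Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 as their harmonic mean, Accuracy = (TP + TN) / total) and computes AUC as the fraction of landslide/non-landslide pairs ranked correctly, which is equivalent to the area under the ROC curve. How values exactly on a class break are assigned is not stated in the text; here they are assigned to the higher class as an assumption. All function names are hypothetical.

```python
from bisect import bisect_right


def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(tp, fp, fn, tn):
    """Derive Precision, Recall, F1 score and Accuracy from the confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


LEVELS = ["very low", "low", "medium", "high", "very high"]
BREAKS = [0.1, 0.3, 0.5, 0.75]  # class breaks quoted in the text


def susceptibility_level(probability):
    """Reclassify a susceptibility value in [0, 1] into one of the five levels."""
    # Assumption: a value exactly on a break falls into the higher class.
    return LEVELS[bisect_right(BREAKS, probability)]


print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
print(confusion_metrics(tp=8, fp=2, fn=2, tn=8))
print([susceptibility_level(p) for p in (0.05, 0.2, 0.4, 0.6, 0.9)])
```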

The structure of the proposed GL-ResNet model
To overcome the limitations of individual models and improve the accuracy of LSM, we proposed GL-ResNet, whose detailed structure is shown in Figure 2. GL-ResNet utilizes conventional ResNet blocks to extract landslide features. Meanwhile, GL-ResNet adds multiple LSTM structures to store information and a GBlk structure with multiple convolutional features to enhance the perception ability of the model.
The core of GL-ResNet is the GL-ResNet block. The structure of this block is similar to that of the Residual Network (ResNet) (He et al. 2016): both have a main path and a shortcut, which is well suited to cope with degradation (Li and Lima 2021). The main path of the GL-ResNet block consists of the main path of the ResNet block, an LSTM structure and a GBlk structure. The main path of the ResNet block consists of two convolutional layers, two batch normalization (BN) layers, two activation layers (ReLU), and four Dropout layers, which are mainly used to extract landslide features. The LSTM structure contains an LSTM layer and a Dropout layer. Through training, the LSTM structure can discard extraneous data, allowing the model to store the most crucial landslide occurrence information (Wang et al. 2020a).
To improve the perception ability of the model, we proposed a GBlk structure with multiple convolutional features and inserted it into the GL-ResNet block. The GBlk structure has four path types, i.e. the 1 × 1, 3 × 3, 5 × 5, and 7 × 7 types. The 1 × 1 path contains a convolutional layer with a kernel size of 1 × 1, a BN layer, an activation layer (ReLU), and a Dropout layer. Compared to the 1 × 1 path, the 3 × 3, 5 × 5, and 7 × 7 paths add a convolutional layer with a kernel size of 3 × 3, 5 × 5, and 7 × 7, respectively. All three of these paths also add a BN layer, an activation layer (ReLU), and a Dropout layer. The input data of the GBlk structure is processed by the four paths simultaneously, followed by spatial stitching using the "Concat" method. The GBlk structure with multiple convolutional features can enhance the perception ability of the model and promote multiscale feature extraction of the input data (Ran et al. 2021).
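For the four GBlk paths to be concatenated, their outputs must have the same length along the factor axis. The Python sketch below shows one common way to guarantee this, under assumptions that are not stated in the text: one-dimensional convolutions over the factor axis, stride 1, and "same" padding of (k - 1) / 2 for an odd kernel size k. The actual stride and padding are given only in Figure 2, and the per-path channel count used here is hypothetical.

```python
def conv_output_length(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the length for an odd kernel at stride 1."""
    return (kernel - 1) // 2


def gblk_output_shape(batch, channels_per_path, length, kernels=(1, 3, 5, 7)):
    """Shape after concatenating the four paths along the channel axis."""
    lengths = {conv_output_length(length, k, padding=same_padding(k)) for k in kernels}
    if len(lengths) != 1:
        raise ValueError("paths disagree on output length; Concat would fail")
    return (batch, channels_per_path * len(kernels), lengths.pop())


m = 9  # number of landslide factors kept after screening
for k in (1, 3, 5, 7):
    padded = conv_output_length(m, k, padding=same_padding(k))
    unpadded = conv_output_length(m, k)
    print(f"kernel {k}: length {padded} with padding, {unpadded} without")
print(gblk_output_shape(batch=32, channels_per_path=16, length=m))
```

Without padding, the larger kernels shorten the factor axis (a kernel of 7 leaves only 3 of 9 positions), so the four outputs could not be stitched together; this is why some length-preserving choice is needed in a multi-kernel block.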
The shortcut of the GL-ResNet block consists of a convolutional layer with the convolutional kernel size of 1 × 1, a BN layer and two Dropout layers. The shortcut is designed to ensure the integrity of input data and to reduce network training costs. Notably, the input data of the GL-ResNet block will enter both the main path and the shortcut. The output data of the main path of the ResNet block, the output data of the LSTM structure, and the output data of the GBlk structure will be combined using the "Add" method, while the output data of the main path and the output data of the shortcut will be stitched using the "Concat" method. The key parameters of each layer, such as the convolution kernel size, the number of output channels, the stride, and the padding of the convolution layer, can be found in Figure 2.
The input data of the GL-ResNet is a three-dimensional matrix (b × 1 × m), where b and m are the set training batch size and the number of landslide factors, respectively. First, the input data is pre-processed through a convolutional layer, a BN layer, and an activation layer (ReLU). The data form changes from b × 1 × m to b × 64 × m. The data then goes through two GL-ResNet blocks for feature extraction, information storage, and multiscale analysis. The data form changes from b × 64 × m to b × 256 × m and then to b × 1024 × m. Finally, the data goes through a pooling layer, a data transformation structure, a fully connected layer, and an activation layer (Sigmoid) and is converted into a b × 1 two-dimensional matrix. The goal of this step is to establish the link between the deep features of landslide influence factors and landslides.
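The change of data form through the network can be traced with a small Python sketch. It only reproduces the shapes quoted in the text; the fourfold channel expansion per GL-ResNet block is inferred from the 64, 256, 1024 sequence, and the function and parameter names are hypothetical.

```python
def trace_shapes(batch_size, n_factors, stem_channels=64, expansion=4, n_blocks=2):
    """List (stage, shape) pairs for the data form described in the text."""
    shapes = [("input", (batch_size, 1, n_factors))]
    channels = stem_channels
    shapes.append(("conv + BN + ReLU", (batch_size, channels, n_factors)))
    for i in range(n_blocks):
        # Assumption: each block multiplies the channel count by 4 (64 -> 256 -> 1024).
        channels *= expansion
        shapes.append((f"GL-ResNet block {i + 1}", (batch_size, channels, n_factors)))
    shapes.append(("pooling + reshape + FC + Sigmoid", (batch_size, 1)))
    return shapes


# b = 32 is the training batch size and m = 9 the number of retained factors.
for stage, shape in trace_shapes(batch_size=32, n_factors=9):
    print(f"{stage:34s} {shape}")
```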

Study area
The Wenchuan area in southwestern China was selected as a case study. It is located in the northwestern part of Sichuan Province and the southeastern part of Aba Tibetan and Qiang Autonomous Prefecture (30°45′37″-31°43′10″N, 102°51′46″-103°44′37″E), covering a region of approximately 4084 km² (Figure 3). The terrain in the area is high in the northwest and low in the southeast, with the elevation ranging from 763 m to 5827 m. Geological activity in the area is intense, with the southwestern part of the Longmenshan Fault Zone crossing the area. The lithologies in Wenchuan County are primarily Cenozoic, Quaternary, Precambrian, Paleozoic, and Mesozoic Jurassic, of which granite, basalt, shale, slate, and dolomite are the most common rock types. The average annual temperature of Wenchuan County is 13.5 °C-14.1 °C, with annual rainfall ranging from 528.7 mm to 1332.2 mm.
Based on the interpretation of the collected remote sensing images, a total of 1147 historical landslides were reported in the Wenchuan area. As shown in Figure 3, most of the landslides were observed in the eastern and northeastern mountainous areas. These historical landslides were mainly distributed along rivers and adjacent to fault zones. In the subsequent study, all acquired landslides were considered as part of the main dataset in order to adequately train and compare the performance of the models.

Landslide influence factors
Previous studies (Van Dao et al. 2020) have demonstrated that the selection of proper influence factors is a crucial step in LSM. However, it remains a challenge to form uniform evaluation criteria and technical specifications due to the spatial heterogeneity of the regions. In most cases, the selection of influence factors follows previous studies, prior knowledge of the local environmental features (Rossi et al. 2019), and data accessibility. In this paper, twelve influence factors in four categories, i.e. topography, geology, hydro-climate and anthropogenic activity, were chosen for assessing landslide susceptibility. It should be mentioned that altitude, slope, aspect, relief amplitude, plan curvature, profile curvature, distance to faults, the normalized difference vegetation index (NDVI), rainfall, distance to rivers, and distance to the road in the dataset were in quantitative and continuous forms, while lithology was categorical according to physical properties, lithological structure, and mechanical features. The classification of the twelve factors is depicted in Figure 4 to better visualize the attributes of the influence factors in Wenchuan County. A brief overview of these twelve factors can be found in the Appendix.

Data preparation and model parameter setting
In this paper, the datasets used for model training and testing include both landslides (positive areas) and non-landslides (negative areas). As 1147 historical landslides were included, we randomly generated an equal number (1147) of non-landslide samples in the negative areas. Then, the specific data of influence factors for all landslides and non-landslides were extracted in the ArcGIS environment. Finally, for model training and testing, the dataset was randomly divided into two parts: approximately 70% of the samples (including 796 landslides and 809 non-landslides) were used for training, while the remaining 30% or so (including 351 landslides and 338 non-landslides) were used for testing. For the model parameter settings, the optimizer, training epoch, training batch size, original learning rate, and loss function for GL-ResNet were Adam, 100, 32, 10⁻³ and Binary CrossEntropyLoss, respectively.
For the purpose of LSM, the unit type plays an important role. In general, the unit types include terrain units, grid units, unique condition units, and slope units (SUs) (Amato et al. 2019), and the choice exerts a significant influence on the validity and rationality of LSM. In this paper, we used our previous object-oriented multi-resolution segmentation method (Li et al. 2021; Li et al. 2017) and generated 215,598 homogeneous SUs covering the whole study area. The information on landslide influence factors of the SUs was then acquired using the ArcGIS tool and was mainly used to extract the landslide susceptibility.
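The random division of the samples can be sketched in Python as follows. The sample tuples, the function name and the seed are hypothetical; the paper does not describe its sampling code, so this only illustrates a plain (unstratified) random split, which is consistent with the slightly unequal class counts reported for the two subsets.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and cut them into a training and a testing part."""
    rng = random.Random(seed)  # seed fixed only to make this sketch repeatable
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 1147 landslides (label 1) and 1147 non-landslides (label 0), identified by index.
samples = [(1, i) for i in range(1147)] + [(0, i) for i in range(1147)]
train, test = split_samples(samples)

print(len(train), len(test))  # 1605 689, matching 796 + 809 and 351 + 338
print(sum(label for label, _ in train), "landslides in the training part")
```

The exact number of landslides in each part depends on the shuffle, so it will generally differ from the 796 and 351 reported in the text.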

Analysis of influence factors
We used the RF model to extract the importance of influence factors (Figure 5a). The results showed that the relief amplitude has the most significant influence on the occurrence of landslides. It is followed by the altitude and the distance to the road, which also influence the occurrence of landslides. It can be seen that the occurrence of the historical landslides in the Wenchuan area was primarily explained by the topography and anthropogenic-activity conditions. It is worth mentioning that slope, plan curvature, and profile curvature are of very low importance (smaller than 0.05) for landslide occurrence in the study area. Therefore, they were removed in this case.
Along with the importance check, the multicollinearity test using VIF and TOL was performed. The VIF and TOL of each high-importance factor were extracted (Figure 5b). The results demonstrated that the remaining nine influence factors were free from collinearity (VIF < 10 and TOL > 0.1). Therefore, these nine influence factors were considered as independent variables in the construction of the landslide susceptibility assessment models.
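The two-step screening rule can be written down as a short Python sketch. The thresholds (importance of at least 0.05, VIF < 10, TOL > 0.1) come from the text; the importance and VIF numbers below are placeholders chosen for illustration and are not the values shown in Figure 5, and the function name is hypothetical.

```python
def screen_factors(importance, vif, min_importance=0.05, max_vif=10.0, min_tol=0.1):
    """Keep factors that are important enough and free from multicollinearity."""
    kept = []
    for name, score in importance.items():
        if score < min_importance:
            continue  # dropped in the first step; no VIF is needed for it
        tol = 1.0 / vif[name]  # assumes the standard relation TOL = 1 / VIF
        if vif[name] < max_vif and tol > min_tol:
            kept.append(name)
    return kept


# Placeholder numbers for four of the twelve factors (not taken from Figure 5).
importance = {"relief amplitude": 0.30, "altitude": 0.20, "slope": 0.03, "plan curvature": 0.02}
vif = {"relief amplitude": 2.0, "altitude": 1.5}

print(screen_factors(importance, vif))
```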
However, as demonstrated by Zhang et al. (2022), it is not sufficient to validate model performance by AUC alone. To further demonstrate the fitting power and prediction performance of GL-ResNet, the confusion matrix (Table 1) was used in this paper as a supplement to AUC to obtain more details of the model performance. The results in Table 1 showed that GL-ResNet has the best fitting power: although the Recall of GL-ResNet (0.955) is slightly smaller than the 0.957 reported for a competing model, the Precision (0.886), F1_Score (0.919), and Accuracy (0.917) of GL-ResNet on the training dataset are all the largest and significantly greater than those of the other models. For the testing dataset, a similar pattern was observed, which indicated that GL-ResNet has the best prediction performance.
In summary, for landslide susceptibility assessment, the proposed GL-ResNet showed better fitting power and prediction performance than the conventional models, i.e. ResNet, GRU, DBN, SVM and LR.

Landslide susceptibility mapping
As analyzed above, GL-ResNet has the strongest fitting power and prediction performance. Therefore, we used the proposed GL-ResNet to assess the landslide susceptibility in the study area (Figure 7). As shown in Figure 7f, the LSM result of GL-ResNet showed that most of Wenchuan County is located in very low landslide susceptibility regions, whose area is significantly larger than that of the other four levels. This indicated that the landslide risk in Wenchuan County is relatively low. The high and very high landslide susceptibility regions are mainly located in the eastern and northeastern regions, close to the Mowen Fault Zone and the Beichuan-Yingxiu Fault Zone. These regions need to focus on landslide hazard prevention and risk transfer.
Since none of the models has seen the influence factor data of the SUs, the quality of the LSM generated from the SUs provides a good validation of the generalization ability of the models. Therefore, to highlight the generalization ability of GL-ResNet, the LSMs generated by all models were explored in more detail to provide an additional argument. LSMs were also obtained using the conventional models described above for comparison (Figure 7). Meanwhile, the area proportion, landslide distribution and non-landslide distribution for the various landslide susceptibility levels were assessed and are shown in Figure 8a-c. All LSMs showed that historical landslides mainly overlap with high and very high regions. Meanwhile, a pattern was found in all LSMs in which the proportion of landslides gradually increased and the proportion of non-landslides gradually decreased as the susceptibility level increased. These observations suggested that all LSMs have some validity, without an overfitting problem.
In more detail, the GL-ResNet-based LSM obtained the largest area of very low region (59.92%), followed by SVM (57.74%), DBN (55.76%), LSTM (54.60%), ResNet (53.61%) and LR (9.88%). This indicated that the GL-ResNet-based LSM has the highest quality, because, in reality, most areas of Wenchuan County are considered safe. In addition, as shown in Figure 8b and c, ResNet has the highest percentage of landslides in the very high region (84.92%), followed by GL-ResNet (83.17%), DBN (74.19%), SVM (71.58%), LSTM (65.48%) and LR (33.39%), while for the proportion of non-landslides in the very low region, the order is GL-ResNet (68.88%), SVM (65.30%), DBN (61.99%), LSTM (60.51%), ResNet (60.16%) and LR (8.63%). Although the percentage of landslides in the very high region of GL-ResNet (83.17%) is slightly smaller than that of ResNet (84.92%), the quality of the GL-ResNet-based LSM is still the highest when evaluated from several aspects (area proportion, landslide distribution, non-landslide distribution). This is not only because the GL-ResNet-based LSM performs best in terms of both area proportion and non-landslide distribution, while the ResNet-based LSM performs best only in terms of landslide distribution, but also because the accurate prediction of the very low region is more important than the accurate prediction of the very high region. The quality of the GL-ResNet-based LSM was the highest, which indicated that GL-ResNet has the strongest generalization ability.

Role of cross-validation
According to Sections 2.2 and 4.3, when constructing GL-ResNet and the five conventional models, we divided the dataset into two parts, a training dataset and a testing dataset, for model training and model performance validation, respectively. This approach made model performance validation inadequate, as the model may only perform well on this particular testing dataset. Therefore, we introduced cross-validation (Ghorbanzadeh et al. 2018) to address this issue by repeating the experiment for each subset of the data. Cross-validation requires the selection of an appropriate number of folds, which is still specified empirically by researchers in the field of landslide mapping. Considering the dataset capacity and the sample requirements for constructing the model, and referring to Ghorbanzadeh et al. (2018), three folds were used in this paper. Specifically, we split the dataset (1147 landslides and 1147 non-landslides) into 3 folds, each consisting of approximately 765 samples. In each round, two of the three folds were used for model training, and the remaining fold was used for model performance validation. The distribution of landslides and non-landslides in the three folds is shown in Figure 9. We therefore conducted three experiments for each model, and their Accuracy and AUC are listed in Table 2.
The results showed that the performance of GL-ResNet varied across the folds, but this difference was very small in terms of both Accuracy and AUC, with an average difference of approximately 0.01. This suggested that the model is not sensitive to the choice of fold. In addition, GL-ResNet achieved the best results in almost all three folds, with an average accuracy of 0.912 and 0.902 on the training and testing datasets, respectively, which is higher than ResNet (0.899 and 0.883), GRU (0.874 and 0.874), DBN (0.891 and 0.884), SVM (0.876 and 0.877), and LR (0.842 and 0.842). As a result, GL-ResNet proved to have the best fitting power, prediction performance and robustness.
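The three-fold procedure can be sketched in Python as follows. This is a generic k-fold index generator, not the code used in the study; the function name and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (training indices, validation indices) for each of the k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed fixed only for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 1147 landslides + 1147 non-landslides = 2294 samples in total.
for round_no, (training, validation) in enumerate(k_fold_indices(2294), start=1):
    print(f"round {round_no}: {len(training)} training, {len(validation)} validation")
```

With 2294 samples the folds hold 765, 765 and 764 samples, which matches the "approximately 765" quoted above; every sample is used for validation exactly once across the three rounds.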

Trade-offs in sample ratios
For ML and DL models, the sample capacity and classification ratios of the training and testing datasets are crucial, as they can have a significant impact on model accuracy. In addition, due to the extensive coverage of non-landslide locations, Wang et al. (2021a) suggested that the 1:1 sample ratio between landslides and non-landslides is not necessarily the most applicable. To explore the effect of sample ratios on the ability of GL-ResNet, we constructed multiple experimental environments considering different percentages of training and testing datasets and various proportions of landslides and non-landslides. The conventional models mentioned above were also trained in these environments and served as a comparison for GL-ResNet performance. This in-depth analysis helped to explore the robustness of GL-ResNet. The detailed experimental environments and model performance can be seen in Figure 10. Figure 10 showed that the accuracies of all models improved when more samples were used for model training. Figure 10a revealed that for the training dataset, the proposed GL-ResNet attained the highest accuracy. Although ResNet follows GL-ResNet most of the time, the accuracy difference between GL-ResNet and ResNet was still obvious (0.03-0.04). The fitting power of the remaining models (LR, SVM, DBN, GRU) was much lower than that of GL-ResNet and ResNet. For the testing dataset (Figure 10b), the accuracy of GL-ResNet remained the highest in all cases, typically leading the conventional ResNet by 0.02. These data showed that the proposed GL-ResNet exhibited the best fitting power and prediction performance in all cases, confirming the robustness of GL-ResNet.
We also found that GL-ResNet performed well even with few training samples, which is an advantage compared to other models. For example, when the proportion of landslides to non-landslides is 1:1 and the ratio of training dataset to testing dataset is 1:9 (only 229 training samples), the accuracies of GL-ResNet on the training and testing datasets were 0.948 and 0.891, respectively, which were significantly higher than those of ResNet (0.838 and 0.845), DBN (0.882 and 0.866), GRU (0.847 and 0.870), SVM (0.825 and 0.828), and LR (0.773 and 0.815). Moreover, even with such a small number of training samples, the constructed GL-ResNet outperformed the models constructed by scholars with sufficient data. Therefore, for most regions, the verified robustness and low sample requirement make the proposed GL-ResNet a beneficial tool for obtaining landslide susceptibility.

Effect of random initial training conditions
DL models have numerous random initial training conditions, such as random initial weights for each layer, randomness of regularization (e.g.Dropout, etc.), the randomness of optimization, etc.These random initial cases have witnessed an impact on the accuracy of DL models and may lead to non-reproducible prediction results of DL models.Based on this, the above conclusion, that the GL-ResNet has the strongest fitting power, prediction performance, generalization ability, and robustness, may exist by chance.Therefore, to fully validate the ability of the GL-ResNet to assess landslide susceptibility, it is necessary to obtain the performance of the GL-ResNet under randomness weakening.
Current studies (Phong and Phuong 2019) generally used random seeds, i.e. prefixed random initial conditions, to weaken the randomness of DL models.However, this approach is not very applicable when the DL models are used to access landslide susceptibility.This is because different regions have different geographic features and landslide distributions, and random seeds suitable for all situations have still not been discovered.Therefore, instead of using random seeds, we weakened the effect of randomness by repeating the experiment dozens of times at each ratio.According to the sample ratios shown in Figure 10, each of the above DL models (DBN, GRU, ResNet, and GL-ResNet) was trained 30 times repeatedly at each ratio.A total of 1560 groups of tests were conducted, and the results can be seen in Figure 11.Furthermore, although the ML models are not subject to randomness, their results were still presented in Figure 11 for comparison.
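The repeat-and-summarize procedure can be sketched in Python as follows. The `train_and_score` function is a hypothetical stand-in that returns a made-up accuracy instead of training a network; each repeat receives its own seed only so that the sketch is repeatable, whereas the study deliberately did not fix seeds.

```python
import random
import statistics


def train_and_score(seed):
    """Hypothetical stand-in for one full training run; returns a fake accuracy."""
    rng = random.Random(seed)
    return 0.85 + 0.1 * rng.random()  # placeholder value in [0.85, 0.95)


def repeated_runs(n_repeats=30, score_fn=train_and_score):
    """Repeat the experiment and summarize the spread of the resulting accuracies."""
    scores = [score_fn(seed) for seed in range(n_repeats)]
    return {
        "runs": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary = repeated_runs()
print(summary)

# The stated total is consistent with 4 DL models x 30 repeats x 13 sample ratios;
# the figure of 13 ratios is inferred from this arithmetic, not stated in the text.
print(4 * 30 * 13)
```

Reporting the mean together with the spread (standard deviation, minimum and maximum) is what allows a model that is sensitive to initial conditions to be compared fairly with a more stable one.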
As shown in Figure 11, the random initial conditions had an obvious influence on GL-ResNet, ResNet, and DBN, while GRU was less sensitive to them. Although the results showed that the proposed GL-ResNet was the most sensitive to the random initial conditions, it always attained the best accuracy among the six models. For instance, with a sample ratio of 1:9, the accuracy of the proposed GL-ResNet dropped to approximately 0.885, which was still higher than that of the other models. Overall, across the different sample ratios, the proposed GL-ResNet attained average accuracies between 0.895 and 0.955, which were clearly higher than those of the other models.

Conclusions
LSM can help governments understand the probability of landslides in a region for hazard transfer and risk avoidance, which is important for land planning and environmental protection. Recently, DL models have been increasingly used to produce more accurate LSM because of their ability to extract the intrinsic features of the input data. However, most previous studies used an individual DL model to predict landslide susceptibility, which means that only one type of convolution kernel was used and the best-fit function may be missed, thus limiting the accuracy of landslide susceptibility assessment. An ensemble DL model with multiple blocks can help solve this problem.
The main contribution of this study is to propose a novel ensemble DL model for landslide susceptibility assessment. The proposed GL-ResNet model uses the conventional ResNet block for landslide feature extraction and the LSTM structure for information storage. In addition, a proposed GBlk structure with multiple convolutional features is added to expand the perception ability of the model. To validate the effectiveness of the proposed model in landslide susceptibility assessment, the Wenchuan area in southwestern China was selected as a case study, for which a landslide inventory containing the influencing factors and 1147 historical landslides was prepared. Based on AUC and accuracy, the proposed GL-ResNet model obtained the best performance (0.96 and 0.909) compared to LR (0.92 and 0.851), SVM (0.94 and 0.884), DBN (0.94 and 0.884), GRU (0.95 and 0.884), and ResNet (0.95 and 0.894). To explore the robustness of the proposed model, the effects of the model training method, the sample ratios, and the random initial conditions were analyzed. The results suggested that although the proposed GL-ResNet model was more sensitive to the random initial conditions, it had a higher average accuracy than the other models by 2%-4%, particularly when the training dataset was limited.
In summary, the landslide susceptibility map generated by the proposed GL-ResNet model has proven more accurate and can therefore support improved land-use management in landslide-prone areas, as well as landslide hazard mitigation. Moreover, its robustness and small sample requirement make GL-ResNet a good choice for obtaining LSM in most regions of the world. In ongoing work, the performance of GL-ResNet in other cases should be further evaluated, which will be essential to better support its effectiveness in LSM.

Figure 3. Location and landslide distribution of the study area.

Figure 6. ROC and AUC of each model.

Figure 8. Validation of LSMs (a) area proportion of each susceptibility class; (b) landslides distribution of each susceptibility class; (c) non-landslides distribution of each susceptibility class.

Figure 9. Landslide and non-landslide distribution for different folds.

For example, Yuan et al. (2022) used CF-RF, RF, CF-SVM, CF-LR, SVM and LR with 1514 training samples to analyze the collapse and landslide susceptibility of Wenchuan County and achieved a maximum accuracy of only 0.890. Wang et al. (2021b) used 2046 training samples from Wenchuan County to construct CF, LR-CF and RF-CF, and the highest accuracy achieved was only 0.762. Because landslide data are difficult to obtain, the number of samples available for model training is small when analyzing landslide susceptibility in most areas.

Figure 10. Model performance variation with different sample ratios.

Figure 11. Model accuracy variation based on different random initial cases.

Table 1. Precision, Recall, F1_Score, and Accuracy of models based on the confusion matrix.

Table 2. Statistical criteria of Accuracy and AUC for each fold.
