GIS-based soil planar slide susceptibility mapping using logistic regression and neural networks: a typical red mudstone area in southwest China

Abstract Global warming increases the frequency and intensity of extreme rainfall, putting many areas at risk of landslides. Landslide susceptibility assessment is essential to understand the threats and to predict, prevent, and mitigate landslides. In this study, a soil landslide inventory was constructed based on satellite images, topographic maps, and extensive field studies. Subsequently, eight GIS layers (geomorphology, elevation, slope angle, slope aspect, slope structure, slope curvature, antecedent rainfall, and cumulative rainfall on 16 September) were produced as control factors of soil planar slides for the susceptibility mapping. Landslide susceptibility mapping was performed using two methods: a logistic regression model and a backpropagation (BP) neural network. In both models, landslide susceptibility in the study area is divided into four levels: high, moderate, low, and no susceptibility. In both models, most of the observed soil planar slides were located in areas with high or moderate susceptibility. For the logistic regression model, a total of 605 soil planar slides are located in the high-susceptibility area, which covers 800.56 km² and accounts for 40.31% of the total area. Finally, the two models were validated. The AUC value of the logistic regression model was 0.878, and the BP neural network achieved a correlation coefficient of 0.880 between predicted and actual values, which shows that both models are reliable and reasonable for predicting the spatial susceptibility of soil planar slides. According to field checks, the BP neural network model has more accurate spatial prediction performance than the logistic regression model.


Introduction
Southwest China is one of the most earthquake-prone areas. Several powerful earthquakes, including the Yiliang Earthquake on 7 September 2002 (Wang et al. 2013), the Wenchuan Earthquake on 12 May 2008 (Xu et al. 2016a), and the Lushan Earthquake on 20 April 2013, were recorded in Southwest China in just the past two decades. Strong earthquakes like these have further loosened the already loose soil in the area, making it prone to failure, especially after prolonged exposure to heavy rainfall and human engineering activities (Zhang et al. 2012). On 16 September 2011, a total of 1296 soil landslides occurred in Nanjiang County, Sichuan Province, China (Fig. 1). These landslides posed serious risks to the population and properties in the nearby downstream area (Fig. 2). According to a geological survey conducted by the Nanjiang County Land Resources Bureau, they were triggered by heavy rainfall. Although many investigations on the triggering factors, geological conditions, or failure mechanisms have been conducted for the red-layered landslides that occurred in Nanjiang (Li et al. 2014; Zhang et al. 2014, 2015, 2016a), it is still necessary to characterize how soil landslides are distributed in the red mudstone area of Nanjiang County. It is also important to analyze the susceptibility of this area to landslides, which can provide strong theoretical and technical support for the prevention, management, monitoring, and early warning of landslide hazards. Landslide susceptibility is the likelihood of a landslide occurring in a particular area based on the local terrain conditions (Brabb 1984). With developments in GIS data-processing techniques, research on the formation mechanisms of landslides, hazard evaluation, and susceptibility assessment has been conducted over the last two decades (e.g., Van Westen et al. 1997; Guzzetti et al. 1999; Wooten 2006; Yilmaz 2007; Kawabata and Bandibas 2009; Jadda et al.
2011; Poiraud 2014; Pourghasemi et al. 2014; Xu et al. 2016b; Pham et al. 2017; Singh et al. 2020; Wu et al. 2020). In particular, a large body of research on landslide susceptibility mapping has been conducted (e.g., Ohlmacher and Davis 2003; Ayalew et al. 2004; Yesilnacar and Topal 2005; Van Den Eeckhaut et al. 2006; Lee and Sambath 2006; Dahal et al. 2008; Kamp et al. 2008; Oh et al. 2009; Yilmaz 2009; Bai et al. 2010; Akgun 2012; Felicísimo et al. 2013; Aniya 1985; Su et al. 2015; Fei et al. 2017; Gigović et al. 2019; Bălteanu et al. 2020). In susceptibility mapping, many models have been used to calculate susceptibility values, and these can be qualitative or quantitative. Studies using qualitative methods describe landslide hazard levels in a study area with descriptive terms, whereas studies using quantitative methods evaluate the numerical probability of landslide occurrence in a specific hazard zone. However, geotechnical models appear to be useful only to a limited extent, considering the difficulty of collecting data for the necessary physical variables over large regions. Hence, statistical approaches are currently pursued to assess landslide hazards (Yesilnacar and Topal 2005). For example, Ohlmacher and Davis (2003) adopted a multiple logistic regression method to create a landslide hazard map for landslides on hilly terrain along the Kansas and Missouri rivers in northeastern Kansas. The results indicated that slope is the most important variable for estimating landslide hazards in the study area. A statistical multivariate method, rare events logistic regression, was evaluated by Van Den Eeckhaut et al. (2006) to create a landslide susceptibility map for a 200 km² study area of the Flemish Ardennes (Belgium).
In a study by Yalcin (2008), the analytical hierarchy process (AHP), the statistical index (Wi), and the weighting factor (Wf) methods were used to produce and later compare three susceptibility maps. The results showed that the AHP method gave a more realistic picture of the actual distribution of landslide susceptibility than the Wi and Wf methods. Bai et al. (2010) used a logistic regression based on GIS data to produce a detailed susceptibility map of one of the most landslide-prone areas in China, the Zhongxing-Shizhu segment in the Three Gorges Reservoir region. In recent years, more and more machine learning methods have been used for landslide susceptibility mapping, such as artificial neural networks (Møller 1993; Zeng-Wang 2001; Yesilnacar and Topal 2005; Nawi et al. 2006; Pham et al. 2017; Le et al. 2019a, 2019b; Shariati et al. 2019; Lv et al. 2020), support vector machines (Guo et al. 2005; Yao and Dai 2006; Basak et al. 2007; Dou et al. 2015), decision trees (Chu et al. 2009; Saito et al. 2009; Nefeslioglu et al. 2010; Tien Bui et al. 2012), random forest (Dou et al. 2019), and neuro-fuzzy models (Pradhan 2013). Comparisons between these models have also been made to identify the most accurate model for landslide susceptibility mapping. For example, Chen et al. (2018) assessed and compared four advanced machine learning techniques, namely the Bayes' net (BN), radial basis function (RBF) classifier, logistic model tree (LMT), and random forest (RF) models, for landslide susceptibility modelling in Chongren County, China. Of the tested models, the RF model was verified to have the highest sensitivity.
Past research has mainly focused on three tasks: (1) choosing the parameters used for landslide susceptibility, where the evaluation parameters with the highest correlation with landslides are selected to achieve a more accurate assessment; (2) modeling landslide susceptibility, for which a large number of susceptibility evaluation models considering different control factors and mathematical methods have been established; and (3) verifying evaluation results. Although much research has been conducted on the triggering factors, geological conditions, and failure mechanisms of red mudstone landslides in Nanjiang County, studies on the distribution characteristics of red mudstone landslides and their susceptibility are still limited.
This paper aims to (1) construct a soil landslide inventory map that incorporates the spatial distribution characteristics of soil planar slides in the red mudstone area of Nanjiang; (2) establish an evaluation parameter system of soil landslide susceptibility mapping, based on the correlation between each influencing factor of red mudstone landslides; and (3) determine the appropriate evaluation model and conduct a susceptibility mapping of soil landslides in the study area.

Description of the study area
The study area is located in Nanjiang, Sichuan Province, China, between longitudes 106°27′ and 107°28′ E and latitudes 31°53′ and 32°45′ N (Figs. 1a and b). In general, according to the topographical map (Fig. 1c), the altitude of the south and middle sections of the study area ranges from 500 to 1000 m and increases to above 1000 m in the north of the area. This area is characterized by widely distributed cuesta (Fig. 3a) with gently inclined rock layers and quasi-cuesta, and most of the hillslope area has been developed as terraced cultivated land and paddy fields. The dip angle of the rock layers ranges from 10° to 30°. As shown in Fig. 3a, tectonic structures are developed in the northern mountainous area of Nanjiang County but are not active in the southern area. The main fault structure in the county is the Lanchaiba Fault. The main fold structures are the Zhongzi Mountain Synclinore, the Liang Anticlinorium, the Guomatan Syncline, the Shatan Anticline, the Xinhua Syncline, and the Longfengchang Anticline. On the whole, the tectonic movement in Nanjiang County is mainly manifested as folding, and the fault structure is underdeveloped, with only some small faults present. Lithology is rather complicated in the study area, where the main lithological formations are loose soil, shale, mudstone, dolomite, and diorite (Fig. 3b). Mesozoic Jurassic and Cretaceous strata are the two basic underlying rock units and contain the most widely distributed red mudstone layers in this area. Most of the exposed red mudstone is deeply fractured and highly weathered. During the rainy seasons, these materials are easily weathered and softened. Due to the low strength of the red layer materials, they collapse easily under changing climate conditions (Zhang et al., 2016). Hence, rainfall-induced landslides occur frequently in the study area.
During the extreme rain event on 16 September 2011, over 1700 landslides were triggered in Nanjiang County. Over 90% of soil landslides in Nanjiang were concentrated in the red mudstone area (Fig. 3b). The uneven distribution of soil landslides can be attributed to the poor mechanical properties of red mudstone, which is characterized by low strength, strong water sensitivity, and well-developed structural planes. Such soft rock can weather rapidly under the effect of dry-wet cycles. Weak layers between the upper soil and lower bedrock can develop due to water infiltration. Thus, the completely or highly weathered red mudstone can easily become the source of a potential landslide. To perform more accurate modeling, the red mudstone area in the south of Nanjiang was chosen as the target study area of this research.
In terms of hydrological conditions, Nanjiang has a subtropical humid monsoon climate. The average annual precipitation from 2000 to 2011 was approximately 1074 mm, slightly less than the annual rainfall of 1149.7 mm in 2011. Rainfall is mainly concentrated between May and September, accounting for nearly 79.2% of the annual rainfall. Figure 4 shows the rainfall contours of the red mudstone area of Nanjiang in 2011. The maximum and minimum rainfall took place in the northeast and southeast, respectively, with upper and lower bounds of around 2,000 mm and 1,100 mm. Heavy rainfall occurred in the study area from 6 to 15 September 2011, with an accumulated rainfall of 268.1 mm. From 8:00 am on 16 September to 8:00 am on 18 September 2011, daily rainfalls of 250.4 mm and 179.1 mm were recorded on the two consecutive days. The total accumulated rainfall within the 12 days was 698.6 mm (Fig. 5). Compared with the historical monthly average rainfall of 182.6 mm in September, the extreme rain event likely provided the prerequisites for rainwater infiltration and softening of the sliding surfaces of the potential landslides.

Landslide database
This study began with the construction of a soil landslide inventory map based on three basic datasets: Landsat TM5 satellite images with a spatial resolution of 30 m, a 1:60000-scale topographic map, and extensive field studies. According to the landslide classifications of Varnes (1978) and Hungr et al. (2014), these soil landslides comprise 1296 soil planar slides and 455 soil rotational slides. As shown in Fig. 6 and Table 1, based on the field investigation, planar slides are often of small magnitude, with a stepped sliding surface 1-5 m below the ground. They usually occur in gentle terrain, mainly within slope angles of 10-30°. In contrast, rotational slides are of large magnitude, with a circular sliding surface at a depth of 10-15 m. Planar slides mainly occurred along the gently dipping bedding surface between the shallow overlying soil and the bedrock, whereas rotational slides occur along a circular sliding surface due to gravity or rainfall. Most soil slides in the study area are planar slides. Rotational slides occur most frequently in fairly homogeneous materials; their failure mechanism is usually simple, so they are easier to prevent from causing damage to human lives and property. Therefore, only soil planar slides are studied in this research: 1146 soil planar slides in the red mudstone area were selected from the database as the target of this study, and their detailed distribution characteristics were analyzed.

Landslide control factors
The occurrence of a landslide is controlled by multiple factors. Identifying and mapping a suitable set of parameters related to slope failure requires prior knowledge of the main causes of landslides (Guzzetti et al. 1999). Based on the study of the mechanics of soil landslides in the study area and the data collected, eight factors are selected as control factors for landslide susceptibility. Statistical graphs of the landslide control factors in the study area are shown in Figs. 7 and 8 to illustrate the characteristics of the landslides. Eight GIS layers, namely geomorphology, elevation, slope angle, slope aspect, slope structure, slope curvature, antecedent rainfall, and cumulative rainfall on 16 September, were then produced as the control factors of soil planar slides for susceptibility mapping (Fig. 9). The geomorphology factor, shown in Fig. 9a, is derived from the interpretation of the satellite images and then checked in the field. The soil planar slides in the red mudstone area of Nanjiang are mainly distributed in the Quasi-Cuesta area (515 landslides, with a density of 0.56 landslides/km²), the Mesa area (471 landslides, density 0.53 landslides/km²), and the Cuesta area (160 landslides, density 0.77 landslides/km²). In these three geomorphic units, roughly one landslide occurred every 2 km². The special terrain in the study area, i.e., the monoclinic structure, is found to be one of the control factors for the occurrence of soil planar slides.
The topographic attributes, such as elevation, slope angle, slope aspect, slope structure, and curvature, are derived from the digital elevation model (DEM) with a resolution of 30 m, which is generated from a triangulated irregular network (TIN) model. As shown in Fig. 8b, the soil planar slides in the red layer area of Nanjiang are distributed within the elevation range of 230-1500 m, and mainly between 500 m and 1000 m. According to the statistics, the elevation of the soil planar slides follows a normal distribution. The landslide densities at elevations of 230-500 m, 500-1000 m, and greater than 1000 m are 0.63, 0.68, and 0.20 landslides/km², respectively (Fig. 9b). Two reasons explain the concentration at 500-1000 m. First, the slope structure at this elevation is mainly monoclinic, making it easy for a landslide to occur. Second, intense human activities, including cropland irrigation and engineering excavations, are concentrated mainly within the same elevation range, which further reduces the stability of the slopes. As shown in Fig. 8c, the landslide densities for slope angles smaller than 10°, between 10° and 30°, and larger than 30° are 0.542, 0.664, and 0.346 landslides/km², respectively. Soil planar slides are typically distributed in the monoclinic region within the slope range of 10-30°. Rainwater can gather on and penetrate the surface of gentle slopes easily. Moreover, the dip direction of the bedding is consistent with the inclination direction of the slope, which leads to bedding failures along the soil-bedrock interface. Local human activities and land use within the affected area can accelerate the occurrence of landslides.
Furthermore, the soil planar slides are fairly evenly distributed across slope aspects in the red mudstone area of Nanjiang, with the aspects of 90°-180° and 270°-360° accounting for the most landslides (Fig. 9d). The highest density of soil planar slides is found on dip slopes (0.5926 landslides/km²), followed by oblique slopes (0.5811 landslides/km²) and anti-dip slopes (0.5515 landslides/km²) (Fig. 9e). Dip slopes are prone to landslides because of the underlying dipping strata: large sheets of rock tend to slide down dip slopes, whereas on anti-dip slopes the effect is the opposite.
Rainfall is a major triggering factor of soil planar slides in Nanjiang. As shown in Fig. 8a, according to statistical analyses, the 14-day antecedent rainfall before the extreme rainfall event on 16 September is mainly between 303 and 723 mm and follows a normal distribution. The distribution curve peaks around 600 mm, where the frequency of landslides is highest. The landslide densities for 14-day antecedent rainfall of 303-400 mm, 400-500 mm, and 500-723 mm are 0.35, 0.40, and 0.61 landslides/km², respectively (Fig. 9b). The recorded 48-hour cumulative rainfall is 204-735 mm, also with a normal distribution (Fig. 9c). The landslide densities for 48-hour cumulative rainfall of 204-300 mm, 300-400 mm, 400-500 mm, and 500-735 mm are 0.49, 0.55, 0.69, and 0.99 landslides/km², respectively (Fig. 9d). As with the antecedent rainfall, the density of soil planar slides increases with increasing cumulative rainfall.
Figure 8. Distribution of the red-layered soil planar slides with (a) (b) 14-day antecedent rainfall; (c) (d) 48-hour cumulative rainfall of the 16 September rain event. Source: Author.

Logistic regression model
The logistic regression model forms a regression relationship between a dependent variable and one or more independent variables, in which the dependent variable takes the value 0 or 1 (Menard 1995; Atkinson and Massari 1998). The nonlinear logistic regression model has the advantage that the variables may be either discrete or continuous and need not follow a particular statistical distribution. Therefore, the logistic regression model is an efficient method to establish a relationship between the occurrence of landslides and multiple control factors (Ayalew and Yamagishi 2005; Bai et al. 2010; Wang et al. 2015).
In the logistic regression model, the independent variables are the control factors, and the dependent variable is the occurrence of landslides, where 0 means that no landslide occurs and 1 means that a landslide occurs. The formulas of the logistic regression model are as follows:
$$P = \frac{1}{1 + e^{-z}}$$
$$z = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
in which $P$ is the probability of occurrence of a landslide, with a value range of (0, 1); $n$ is the number of independent variables $x_1, x_2, \ldots, x_n$; $a_1, a_2, \ldots, a_n$ are the logistic regression coefficients, each of which equals the change in the logarithm of the probability of occurrence divided by the probability of non-occurrence when the corresponding factor changes by one unit; and $a_0$ is the intercept of the model. According to data from the DEM and the interpretation of satellite images, eight GIS layers have been produced as the control factors for soil planar slides, including geomorphology, elevation, slope angle, slope aspect, slope structure, slope curvature, antecedent rainfall, and 48-hour cumulative rainfall (Fig. 9). However, some of these variables are nominal variables that must be transformed into numeric variables before they can be used in a logistic regression model, so an appropriate approach to quantify them is needed. An approach using landslide density is applied in this study, in which control factors are quantified by the ratio of the number of landslide points to the area occupied by each secondary control factor. This ratio reflects not only the distribution of the disaster per unit area, but also its frequency of occurrence. The equation is as follows:
$$X_{ij} = \frac{N_{ij}}{S_{ij}}$$
where $i = 1, 2, \ldots, n$ is the index of the primary control factor; $j = 1, 2, \ldots, m$ is the index of the secondary control factor; $N_{ij}$ is the number of disaster points; $S_{ij}$ is the area occupied by each secondary factor; and $X_{ij}$ is the weight of the control factor, which can be normalized to $I_{ij}$ and is summarized in Table 2.
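The landslide-density weighting described above can be sketched in a few lines of Python. This is only an illustration: the landslide counts for the three geomorphic units are those reported in the text, but the class areas are back-calculated from the reported densities (area = count / density), and the sum-to-one normalization is an assumption, since the normalization behind Table 2 is not spelled out here. The function names are hypothetical.

```python
# Landslide-density weights X_ij = N_ij / S_ij for one control factor.
# Counts come from the text; areas are back-calculated from the reported
# densities (area = count / density) and are therefore only approximate.

def density_weights(counts, areas_km2):
    """Return the landslide density (landslides/km^2) of each class."""
    return {k: counts[k] / areas_km2[k] for k in counts}

def normalize(weights):
    """Scale weights so they sum to 1 (an assumed normalization for I_ij)."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

counts = {"quasi_cuesta": 515, "mesa": 471, "cuesta": 160}
areas_km2 = {
    "quasi_cuesta": 515 / 0.56,
    "mesa": 471 / 0.53,
    "cuesta": 160 / 0.77,
}

X = density_weights(counts, areas_km2)
I = normalize(X)
for name in X:
    print(f"{name}: X = {X[name]:.2f} landslides/km^2, I = {I[name]:.3f}")
```

Working with densities rather than raw counts keeps a small but landslide-rich class such as the Cuesta from being outweighed by larger units that simply contain more points.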
After a comprehensive evaluation layer was produced by superimposing all the control factor layers in ArcGIS, the extracted data of each unit in this layer were input into SPSS to calculate the correlation of landslides with each factor, using the "LR" algorithm. In this process, when selecting areas where no landslide occurred, a buffer zone with a radius of 100 m was created around each landslide point to make the selected area more representative and convincing, following Suzen and Doyuran (2004). The resulting buffer layer contains as many circles as there are landslide points. Subsequently, the landslide point buffer layer and the comprehensive evaluation layer were analyzed together to determine the units in the comprehensive evaluation layer that do not intersect any landslide point buffer circle. The value of these non-intersecting units is 0, indicating that no landslide occurred in the unit. A stepwise calculation was then carried out in SPSS to eliminate variables that contribute little or nothing to the occurrence of landslides, with the significance level set to 0.05. The results indicate that the slope aspect, geological structure, and pattern of the slope's surface contributed little to the occurrence of soil planar slides, whereas the slope angle, the 14-day antecedent rainfall, elevation, and the 48-hour cumulative rainfall made a greater contribution.
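The buffer-based selection of non-landslide units can be sketched as follows. This is a simplified stand-in for the ArcGIS overlay, not the workflow actually used: it tests only whether a unit's centre point lies outside every 100 m buffer instead of intersecting unit polygons with buffer circles, and all coordinates are hypothetical values in a projected system with metre units.

```python
import numpy as np

def non_landslide_mask(cell_xy, slide_xy, radius_m=100.0):
    """True for cells whose centre lies outside every landslide buffer."""
    # Pairwise offsets between cell centres (n, 2) and landslide points (m, 2).
    diff = cell_xy[:, None, :] - slide_xy[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    # A cell qualifies only if even its nearest landslide is beyond the radius.
    return dist.min(axis=1) > radius_m

# Hypothetical projected coordinates in metres.
cells = np.array([[0.0, 0.0], [50.0, 0.0], [150.0, 0.0], [300.0, 450.0]])
slides = np.array([[0.0, 0.0], [300.0, 300.0]])

mask = non_landslide_mask(cells, slides)
print(mask)  # cells marked True would receive the value 0 (no landslide)
```

The pairwise distance matrix is convenient for a small sample; for a full raster, a spatial index or a distance transform would be the more practical choice.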
Based on the results of the logistic regression in SPSS, the following logistic regression model is applied in this study to map the study area's susceptibility to landslides:
$$\ln\frac{P}{1 - P} = a_0 + 5.93 X_{1j} + 4.29 X_{2j} + 3.48 X_{3j} + 1.66 X_{4j}$$
where $a_0$ is the intercept and $X_{1j}$, $X_{2j}$, $X_{3j}$, and $X_{4j}$ represent the weights of the slope angle, the 14-day antecedent rainfall, the elevation, and the 48-hour cumulative rainfall, respectively. The slope angle, the 14-day antecedent rainfall, elevation, and the 48-hour cumulative rainfall have logistic regression coefficients of 5.93, 4.29, 3.48, and 1.66, respectively. According to the values of the coefficients, the slope angle has the greatest impact on landslides, followed by the 14-day antecedent rainfall, elevation, and finally the 48-hour cumulative rainfall.
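To make the role of the four coefficients concrete, the sketch below evaluates a logistic model for two mapping units. The coefficients are those reported above; the intercept value and the factor weights of the two units are hypothetical placeholders chosen only for illustration, so the resulting probabilities carry no physical meaning.

```python
import math

# Coefficients reported in the text, in the order: slope angle, 14-day
# antecedent rainfall, elevation, 48-hour cumulative rainfall.
COEFFS = (5.93, 4.29, 3.48, 1.66)

def landslide_probability(weights, intercept):
    """Logistic model P = 1 / (1 + exp(-z)), z = intercept + sum(a_i * X_i)."""
    z = intercept + sum(a * x for a, x in zip(COEFFS, weights))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and hypothetical factor weights for two units.
A0 = -8.0
unit_a = (0.66, 0.61, 0.68, 0.99)
unit_b = (0.35, 0.35, 0.20, 0.49)

p_a = landslide_probability(unit_a, A0)
p_b = landslide_probability(unit_b, A0)
print(f"unit A: {p_a:.3f}, unit B: {p_b:.3f}")
```

Because all four coefficients are positive, a unit with larger weights on every factor always receives the higher probability, whatever the intercept; the intercept only shifts the overall level.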
Using the factor weights $X_{ij}$ and the established regression model, the weight value for the occurrence of a soil planar slide within each unit can be obtained. Weight values of the resulting susceptibility map are treated as relative and not absolute values (Blahut et al., 2010). Based on the relative probability values obtained by overlaying the layers of each causative factor, the units are ordered from highest to lowest and then reclassified into classes. A landslide susceptibility map of soil planar slides in the red mudstone area can therefore be developed (Fig. 10).
Four susceptibility levels were classified: high, moderate, low, and none. The area with high susceptibility is 800.56 km², accounting for 40.31% of the total area and containing 605 soil planar slides distributed between the town of Ganchang in the north of Nanjiang and the towns of Shahe, Changchi, and Xialiang in the south of Nanjiang. The area with moderate susceptibility is 808.44 km², accounting for 40.71% of the total area and containing 451 soil planar slides distributed mainly in the towns of Zhengzhi and Nanjiang. The area with low susceptibility is 333.67 km², accounting for 16.8% of the total area and containing 83 soil planar slides; it is mainly found in Dahe town. The area with no susceptibility is 43.10 km², accounting for 2.17% of the total area, and is also mainly distributed in Dahe town in the red mudstone area of eastern Nanjiang.
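The class statistics above can be cross-checked with a short calculation. The sketch below uses only the areas and landslide counts reported in the text; it recomputes the area shares and adds the landslide density of each class, which is not stated explicitly in the text. The variable names are illustrative.

```python
# Reported area (km^2) and number of soil planar slides per susceptibility class.
classes = {
    "high": (800.56, 605),
    "moderate": (808.44, 451),
    "low": (333.67, 83),
}
no_susceptibility_area = 43.10  # the slide count for this class is not reported

total_area = sum(area for area, _ in classes.values()) + no_susceptibility_area

share = {k: 100.0 * area / total_area for k, (area, _) in classes.items()}
density = {k: n / area for k, (area, n) in classes.items()}

for k in classes:
    print(f"{k}: {share[k]:.2f}% of area, {density[k]:.2f} slides/km^2")
```

The recomputed shares agree with the reported percentages, and the landslide density drops from roughly 0.76 to 0.56 to 0.25 slides/km² from the high to the low class, which is the ordering a useful susceptibility map should show.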

Backpropagation neural network
Landslide control factors constitute a dynamic system with strong nonlinearity and dimensionality. For nonlinear dynamic problems like this, an artificial neural network is suitable, whereas traditional regression techniques often fail to produce accurate approximation as the dimensionality and/or nonlinearity of the problem increases. A three-layer backpropagation (BP) descent algorithm was used for assessing the susceptibility of soil planar slides in this paper, which is a well-recognized procedure for training a neural network (Multi-Layer Perceptrons-MLPs topology) (Yesilnacar and Topal 2005).
The principle of the BP neural network is to search a performance surface (error as a function of the neural network weights) for the point(s) of minimum error using gradient descent (Yesilnacar and Topal 2005). The training of the BP neural network model is mainly composed of two processes, forward and backward propagation. First, the information is input at the input layer and passed to the hidden layer. After being processed layer by layer within the hidden layers, the processed information is finally transmitted to the output layer. This process is called forward propagation. If the results are not as expected, an error signal is issued and the weights of the neurons in each layer are modified until the error signal reaches its minimum; this is the backpropagation process. When the weights have been modified to their optimal values, the network training is complete. The BP neural network model used in this paper is shown in Fig. 11.
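The forward and backward passes described above can be illustrated with a minimal three-layer network in NumPy. This is a sketch under stated assumptions rather than the MATLAB model used in the study: it uses sigmoid activations, a mean-squared-error loss, and plain gradient descent, and it is trained on synthetic stand-in data. The hidden-layer size, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, y, hidden=6, lr=0.5, epochs=2000, seed=0):
    """Train a three-layer network with plain gradient-descent backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward propagation: input -> hidden -> output.
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward propagation of the error signal through the sigmoids
        # (the constant factor 2 of the MSE gradient is absorbed into lr).
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        # Weight update along the negative gradient.
        W2 -= lr * H.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)

    def predict(A):
        return sigmoid(sigmoid(A @ W1 + b1) @ W2 + b2)

    return predict, losses

# Synthetic stand-in data: 200 samples, 6 normalized factors, binary label.
rng = np.random.default_rng(1)
X = rng.random((200, 6))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)

predict, losses = train_bp(X, y)
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Plain gradient descent is chosen here for readability; second-order schemes that use the Jacobian of the error typically converge in far fewer iterations on small networks like this one.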
Initially, $X = (x_1, x_2, \ldots, x_l)^T$ is the input vector, $X_j$ is the output value of the hidden layer, and $Y_k$ is the output value of the output layer. The input value of a neuron in the hidden layer is equal to the output value of the input layer. Let $l$ be the number of neurons in the input layer, $m$ the number of neurons in the hidden layer, and $n$ the number in the output layer. Then $X_j$ and $Y_k$ can be calculated by the following formulas:
$$X_j = f\left(\sum_{i=1}^{l} \omega_{ij} x_i - \theta_j\right), \quad j = 1, 2, \ldots, m$$
$$Y_k = f\left(\sum_{j=1}^{m} \omega_{jk} X_j - \theta_k\right), \quad k = 1, 2, \ldots, n$$
where $f$ is the activation function, $\omega_{ij}$ is the connection weight between node $i$ (input layer) and node $j$ (hidden layer), $\omega_{jk}$ is the connection weight between node $j$ (hidden layer) and node $k$ (output layer), $\theta_j$ is the threshold of node $j$, and $\theta_k$ is the threshold of node $k$. The weights are modified along the descent direction of Newton's method and are adjusted as follows:
$$\Delta\omega = -\left(J^T J + \mu I\right)^{-1} J^T E$$
in which $J$ is the Jacobian matrix obtained by differentiating the error with respect to the weights, $I$ is the identity matrix, and $E$ is the error vector. When the value of $\mu$ is large enough, the error correction formula can be further expressed as:
$$\Delta\omega \approx -\frac{1}{\mu} J^T E$$
Using a recursive algorithm, the error between the network output $Y_k$ and the actual value $d_k$ is back-propagated from the output layer and the weights are adjusted accordingly. The error E of the output will eventually be less than or equal to the expected error e. At this point, the training process of the network is complete.
It is necessary to evaluate the predictive capability of the landslide control factors in the BP neural network model to obtain more accurate landslide susceptibility modeling, because some factors may have a negative effect on the generated models (Tien Bui et al. 2016). Thus, in this study, three different groups of parameters are tested in MATLAB to obtain the predictive capability of each group. For each group of parameters, 500 typical soil planar slide points and 500 non-slide points in the red layer area of Nanjiang County are extracted using ArcGIS.
The 500 slides were randomly divided into two parts: 350 slides (70%) used for training the BP neural network model and the remaining 150 slides (30%) used for validating the built models. The non-slide points were also randomly split into training (70%, 350 non-slides) and validation (30%, 150 non-slides) sets. Then, a non-linear model is established between the parameters and the slope stability. Finally, the group of evaluation parameters with the most accurate spatial prediction result is selected as the control factors for the occurrence of soil planar slides and used to predict the landslide susceptibility.
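The 70/30 random split described above might look as follows in Python. Only the sample counts come from the text; the helper name and the seeds are illustrative, and splitting the two classes separately (as the text describes) keeps training and validation sets balanced.

```python
import numpy as np

def split_indices(n, train_fraction=0.7, seed=0):
    """Randomly split range(n) into training and validation index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]

# 500 slide points and 500 non-slide points, each split separately.
slide_train, slide_val = split_indices(500, seed=0)
non_train, non_val = split_indices(500, seed=1)

print(len(slide_train), len(slide_val), len(non_train), len(non_val))
```

Fixing the seed makes the split reproducible, which matters when several parameter groups are to be compared on the same training and validation points.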
The first group of parameters consists of all eight influencing factors: geomorphology, elevation, slope angle, slope aspect, slope structure, slope curvature, antecedent rainfall, and the 48-hour cumulative rainfall. Because the input and output data differ greatly in order of magnitude, feeding the raw values into the BP neural network would lead to large errors in the prediction result. To avoid the error caused by this difference in magnitude, the values of each parameter are normalized before being input. The softmax normalization method is used in this study to convert the input data into numbers between 0 and 1, which eliminates the difference between the orders of magnitude. Examples of some of the input data are shown in Table 3. The normalized data samples are all distributed between 0 and 1, and none of the values reaches either end of the interval, which achieves the expected processing effect. For the slope stability data in Table 3, 1 indicates that the slope is unstable and a landslide occurs, while 0 indicates that the slope is in a stable state.
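One common form of softmax normalization passes the z-score of each value through a logistic function, which maps every input into the open interval (0, 1). The sketch below shows this form as an assumption; the exact variant and any scaling constant used in the study are not given in the text, and the raw values are hypothetical.

```python
import numpy as np

def softmax_scale(x):
    """Squash a 1-D array into (0, 1) via a logistic function of its z-scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw values for two factors with very different magnitudes.
elevation_m = np.array([230.0, 500.0, 750.0, 1000.0, 1500.0])
slope_deg = np.array([5.0, 10.0, 20.0, 30.0, 45.0])

for name, raw in (("elevation", elevation_m), ("slope angle", slope_deg)):
    print(name, np.round(softmax_scale(raw), 3))
```

Unlike min-max scaling, this transformation never returns exactly 0 or 1 for finite inputs, which is consistent with the observation that none of the normalized samples reaches either end of the interval.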
The MATLAB software is used to complete the calculation of the BP neural network method. The fitted curve of the predicted value and the actual value is shown in Fig. 12(a). The correlation coefficient R between the predicted value and the actual value is 0.862, indicating the prediction result is highly accurate and that the BP network model has good reliability and can be applied to the spatial prediction of soil planar slide disasters in Nanjiang. Table 4 shows the calculation results of the weights of the eight influencing factors. Here we can see the parameters of the weights in descending order: slope angle, antecedent rainfall, elevation, the 48-hour cumulative rainfall, slope structure, slope curvature, geomorphology, and slope aspect. The results show that the soil planar slides in the red layer area in the Nanjiang County are greatly affected by slope angle, rainfall, elevation, and geomorphology, and that the slope aspect does not contribute much to the occurrence of the landslide.
The geomorphology and slope aspect, which have the smallest weight values, are removed in the second group of parameters, and the slope structure and slope curvature are further excluded in the third group of parameters. The fitting curves between the predicted and actual values of the stability coefficient for the second and third groups of parameters are shown in Figs. 12b and c, respectively. The calculated weights of the three groups of parameters using the BP neural network model are arranged in the same order. The correlation coefficients (R) between the predicted and actual values of the BP network model for the three groups of evaluation parameters are 0.862, 0.880, and 0.839, respectively. The prediction results are all good, but the correlation coefficient of the second group is the highest, which indicates that the evaluation parameters selected in the second group are the most reliable and reasonable. Six parameters, namely elevation, slope angle, slope structure, slope curvature, antecedent rainfall, and the 48-hour cumulative rainfall, are found to be the key factors that affect the stability of slopes. Using these evaluation parameters, the layers are superimposed in ArcGIS to assess the landslide susceptibility. According to the resulting probability of landslide occurrence, the units in ArcGIS are again ordered from highest to lowest and then reclassified into classes, and a landslide susceptibility map of soil planar slides in the red mudstone area is developed (Fig. 13). The susceptibility to soil planar slides in the red layer area of Nanjiang County is again divided into four levels (high, moderate, low, and no susceptibility).

Discussion
The susceptibility map has the potential to predict the spatial characteristics of future landslides. Without some form of validation, however, the prediction model and images are useless and have little scientific significance (Chung and Fabbri 2003). Can et al. (2005) stated two rules for a spatially effective landslide susceptibility map: i) most landslides should occur in areas with high susceptibility values, and ii) every susceptibility class should be represented in the map, because the entire area can be divided into subclasses with different susceptibility categories. As the results show, most of the soil planar slides observed in the detailed field investigation are located in areas of high or moderate susceptibility in both the logistic regression and BP neural network models. In both susceptibility maps, all four susceptibility regions (high, moderate, low, and no susceptibility) are present in the study area. In addition, for the logistic regression model, the ROC curve is a useful way to represent the quality of deterministic and probabilistic detection and forecast systems (Swets 1988). Receiver operating characteristic curve analysis (ROC analysis) provides tools to differentiate two classes, established through a diagnostic test, in an optimal manner. The analysis is based on the final distribution of a classification method that differentiates between correct and failed predictions according to a confusion matrix composed of four indexes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). For landslide susceptibility, TP and FN are landslide points that are classified correctly and incorrectly, respectively.

Figure 13. Susceptibility map of the soil planar slides distributed within the red-layer zone developed by the BP neural network. Source: Author.
In contrast, TN and FP are stable points that are classified correctly and incorrectly, respectively. Two statistics, the true positive rate (TPR) and the false positive rate (FPR), are calculated for a set of threshold or cut-off values (Vakhshoori and Zare 2018; Cantarino et al. 2019). The quality of a forecast system is characterized by the area under the ROC curve (AUC), which varies from 0.5 to 1.0 and shows the system's ability to correctly predict the occurrence or non-occurrence of the event. The ROC curve of the logistic regression model was drawn to validate its performance, and the AUC value is 0.878, which shows that the model predicts the spatial susceptibility of landslides with high accuracy (Fig. 14). For the BP neural network, the predictions for all three groups of parameters are good, but the correlation coefficient of the second group is the highest. This indicates that the six parameters of that group (elevation, slope angle, slope structure, slope curvature, antecedent rainfall, and the 48-hour cumulative rainfall) are the key factors that control the stability of the slopes. The correlation coefficient (R) of this model was 0.880, which shows that it is reliable and reasonable for predicting the spatial susceptibility of landslides.
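The ROC construction outlined above can be sketched in a few lines. The example sweeps the cut-off over every observed score, computes TPR = TP / (TP + FN) and FPR = FP / (FP + TN) at each step, and integrates the curve with the trapezoidal rule. The function names `roc_points` and `auc` and the sample data are hypothetical; the study's own ROC analysis may have been produced with different software and settings.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the cut-off over all scores.

    labels: 1 for a landslide point, 0 for a stable point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one landslide and one stable point")
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step, from the highest score to the lowest.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores and labels, for illustration only.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
area = auc(roc_points(scores, labels))
print(f"AUC = {area:.3f}")
```

For these made-up data the AUC is 0.75: of the 16 possible landslide and stable point pairs, 12 have the landslide point scored higher, which is the pairwise interpretation of the area under the curve.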
Moreover, the performance of the two models was compared to select the better model for soil planar slide susceptibility assessment of the study area. Field checks showed that the accuracy of the logistic regression model is 0.821, which is lower than that of the BP neural network model (0.880). The comparison indicates that the BP neural network performs better in predicting the spatial susceptibility of soil planar slides in Nanjiang County.

Conclusions
Landslide susceptibility assessment is essential to understand the threats and to predict, prevent, and mitigate landslides. On 16 September 2011, heavy rain triggered more than 1000 soil landslides in the red mudstone area of Nanjiang County, southwest China. Based on a statistical analysis of the distribution of 1146 soil planar slides and the mapping of their susceptibility using ArcGIS in the study area, the following conclusions are drawn:
(1) Eight influencing factors were considered for soil planar slide susceptibility mapping: geomorphology, elevation, slope angle, slope structure, slope aspect, slope curvature, the 14-day antecedent rainfall, and the 48-hour cumulative rainfall on 16 September. Of these, the slope angle, rainfall, and elevation have the greatest impact on the occurrence of soil planar slides, followed by slope structure, geomorphology, slope curvature, and slope aspect in both the logistic regression and the BP neural network models. This shows that the coupling of these external and internal factors contributes to the occurrence of rainfall-induced soil planar slides in Nanjiang.
(2) Among these factors, the slope angle has the largest weight and contributes the most to the occurrence of soil planar slides. The slope angles are mainly concentrated in the 10-30° range. Under natural conditions, the soil slopes in Nanjiang have good stability. However, in extreme cases, such as the heavy rainfall event of 16 September that followed abundant antecedent rainfall, infiltrated water can decrease the soil shear strength, increase the pore water pressure, and soften the interface between the soil and the underlying bedrock, so that landslides may eventually occur. Therefore, higher weights are assigned to the control factors of the 14-day antecedent rainfall and the 48-hour cumulative rainfall.
(3) The landslide susceptibility in the study area is divided into four levels (high, moderate, low, and no susceptibility) in both the logistic regression model and the BP neural network model, as shown in the susceptibility maps. In both models, most of the observed soil planar slides are located in areas with high or moderate susceptibility. In the logistic regression model, a total of 605 soil planar slides are located in the high-susceptibility area, which covers 800.56 km², or 40.31% of the total area.
(4) The two models were also validated. The AUC value of the logistic regression model was 0.878 and the correlation coefficient of the BP neural network was 0.880, which shows that both models are reliable and reasonable for predicting the spatial susceptibility of soil planar slides. According to field checks, the BP neural network model has more accurate spatial prediction performance than the logistic regression model, with an accuracy of 0.880 compared with 0.821.

Disclosure statement
This research is financially supported by the Youth Program of National Natural Science Foundation of China (Grant No. 41907243), Key Program of National Natural Science