Spatial prediction of rotational landslide using geographically weighted regression, logistic regression, and support vector machine models in Xing Guo area (China)

ABSTRACT This study evaluated the geographically weighted regression (GWR) model for landslide susceptibility mapping in Xing Guo County, China. Sixteen conditioning factors were analyzed, namely slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), soil, lithology, normalized difference vegetation index (NDVI), land use, rainfall, distance to road, distance to river, distance to fault, plan curvature, and profile curvature. The chi-square feature selection method was adopted to assess the significance of each factor with respect to landslide occurrence. The GWR model was compared with two well-known models, namely, logistic regression (LR) and support vector machine (SVM). Results of the chi-square feature selection indicated that lithology and slope are the most influential factors, whereas SPI was found to be statistically insignificant. Four landslide susceptibility maps were generated by the GWR, SGD-LR, SGD-SVM, and SVM models. The GWR model exhibited the highest performance in terms of success rate and prediction accuracy, with values of 0.789 and 0.819, respectively. The SVM model exhibited slightly lower AUC values than the GWR model. The validation results of the four models indicate that GWR performed better than the other widely used models.


Introduction
Landslides are a common phenomenon in mountainous areas (Lee et al. 2015; Xu et al. 2015). They are ultimately driven by the topographic relief produced by fluvial and glacial erosion, and they are controlled by hillslope material (Larsen et al. 2010). Every year, landslides cause substantial loss of life and heavy economic losses worldwide (Beyabanaki et al. 2016; Carlini et al. 2016). Many scientists are engaged in research on landslide disasters; the key questions are when and where landslides will occur and what can be done about them (Chou et al. 2017; Chung et al. 2017). Landslide susceptibility modelling is the most commonly used method to identify and predict landslides (Ciurleo et al. 2016; Conte et al. 2017). With the development of computers, many sophisticated models based on geographic information systems (GIS) and remote sensing have been used to predict landslides in recent years (Dickson and Perry 2016; Fan et al. 2016). Climate change plays an important role in the environment and human life and has some effect on the development and occurrence of landslides; in particular, rainfall is the key factor (Feng et al. 2017; Franz et al. 2017). Earthquakes are another important factor that induces landslides.
In the past few decades, a large body of research has addressed landslide susceptibility modelling; however, the debate over whether physical or statistical theory can effectively explain the mechanism of landslides is still ongoing (Osadchiev et al. 2016; Promper and Glade 2016). Various models have been used in previous studies, such as frequency ratio, analytical hierarchy process, logistic regression (LR), artificial neural network (ANN), support vector machines (SVM), and fuzzy logic (Yilmaz 2010; Romano et al. 2016; Shi et al. 2016; Wang et al. 2016; Tien Bui et al. 2016; Chen et al. 2017c; Hong et al. 2017a, 2017c). Although these models performed quite well in landslide susceptibility mapping in different areas around the world, researchers have not yet reached a consensus on the best model to use (Webster et al. 2016; Wen and Jiang 2016). In more recent years, new methods, such as statistical algorithms and machine learning based approaches, have continuously introduced more comprehensive landslide modelling methods (Wood et al. 2016; Wu et al. 2016; Chen et al. 2017a, 2017b; Hong et al. 2017b, 2017d). Producing consistent spatial predictions of landslides is a challenge because of the complex factors that control landslides, such as soil condition, bedrock, topography, hydrology, and human activities (Yamao et al. 2016; Zieher et al. 2016).
According to the Ministry of Land and Resources of the People's Republic of China (http://www.mlr.gov.cn/), a total of 8,224 landslides occurred in China in 2015. These landslides caused 229 deaths, left 58 people missing and 138 injured, and resulted in direct economic losses of US$2.49 billion. Jiangxi Province is prone to geological disasters and is one of the high-risk provinces. Xing Guo County is a hilly area surrounded by mountains. Landslides commonly occur during the rainy season.
This study was conducted because of the urgent need for landslide susceptibility assessment to support the local government and land-use planning. The objectives of this work are to (1) optimize the landslide predictors for susceptibility mapping using the chi-square method, (2) evaluate the geographically weighted regression (GWR) model for landslide susceptibility, and (3) compare GWR with the well-known SVM and LR models. The analysis was performed using SPSS, Matlab R2015b, and ArcMap 10.3 software.

Study area
The Xing Guo area is located in the south of Jiangxi Province and lies between latitude 26°4′N and 26°42′N, and longitude 115°1′E and 116°51′E. It covers an area of approximately 3,215 km² (Figure 1). The altitude of the area ranges from 109 to 1,196 m above mean sea level. The slope angle of the study area varies between 0° and 67.5°. More than 43 geologic groups and units are recognized in this area (Table 1). The main lithology of the study area consists of purple grey feldspar, quartz sandstone, silty slate, light grey chert, and phyllite (Figure 2).
The land-use map of the area was classified into six categories, namely, bare, forest, grass, water, farmland, and residential. Forest occupies the largest area (54.5%), whereas residential land occupies 22.3%. Grass and farmland each cover roughly the same area (about 9.0%), and water covers 4.4%. Only 0.5% of the area is bare land.
The study area is located in a subtropical monsoon climate region. According to the Jiangxi Province Meteorological Bureau (http://www.weather.org.cn), the annual rainfall recorded at the Xing Guo weather station during 1960-2012 ranged from 895.3 mm (1963) to 2284.5 mm (1997). The total number of precipitation days is 156, and the rainy season is mainly from March to August, which accounts for nearly 73.1% of the annual rainfall. In May and June, the average rainfall varies between 240 and 250 mm per month. The average annual temperature is 18.8 °C. The average annual evaporation is 1,635.8 mm, and the average relative humidity is 78%.
According to the Xing Guo County government, a total of 3,653 people in the study area are affected by landslides (Hong et al. 2015). The estimated damage to property is approximately US$4 million. However, few preventive measures have been carried out in the study area to predict the location of landslides and prevent the damage caused by them. Therefore, analysing landslides in this area is crucial. The main factor that causes landslides in the Xing Guo area is the high amount of rainfall.

Landslide inventory map
The landslide inventory map consisted of 79 landslide locations, which were obtained from the Department of Land and Resources (http://www.jxgtt.gov.cn/) and the Meteorological Bureau of Jiangxi Province (http://www.weather.org.cn/). The landslide inventory data were prepared by the aforementioned agencies through various means, such as multiple field surveys and interpretation of 10-m resolution Google Earth images with zoom-in and zoom-out tools. Figure 3 shows some examples of landslides from Google Earth images. The landslide movement in the study area is mainly categorized into two types, translational slides (49) and rotational slides (79) (Figure 4). Because different types of landslides have different occurrence mechanisms, they should be studied separately for better assessment. Therefore, this study focused on modelling rotational landslides, for which more data were available (79 compared to 49), allowing a better assessment and comparison.
To calculate the volume of landslides, the thickness and headscarp area were derived from the digital elevation model (DEM). Then, the volume of a landslide ($V_s$) was calculated with the wedge geometry model of Equation (1) (McAdoo et al. 2000), where $A_s$ is the area of the landslide headscarp (m²), $h$ is the height of the headscarp (m), and $\alpha$ is the scar slope angle in degrees. The volume of the smallest landslide is 30 m³, the largest is 60,000 m³, and the average is 874.2 m³. Large-volume landslides (>1000 m³) affected 831 people but accounted for only 10.5% of the total number of landslides. Around 46.5% of the landslides are medium-volume landslides (200-1000 m³), which affected 1,066 people. Small-volume landslides (<200 m³) affected 851 people and accounted for 42.8% of the total. The spatial distribution of the landslide locations along with their types is shown in Figure 4. The dates of landslide occurrence are mostly unknown. Thus, the landslide inventory data (79 landslides) were randomly partitioned into two subsets (70/30). The first subset includes 55 landslide locations, which were used as the training dataset, whereas the remaining 24 landslide locations were used as the validation dataset.
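The random 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_inventory`, the integer landslide IDs, and the fixed seed are assumptions made here for reproducibility.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly partition landslide IDs into training and validation subsets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for repeatability
    rng.shuffle(ids)
    n_train = int(train_fraction * len(ids))  # 0.7 * 79 = 55.3, truncated to 55
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..78 stand in for the 79 rotational landslides.
train_ids, valid_ids = split_inventory(range(79))
print(len(train_ids), len(valid_ids))
```

With 79 locations, truncating 0.7 × 79 reproduces the 55/24 split reported in the text.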

Landslide conditioning factors
Overall, 16 landslide conditioning factors were analysed in this study, namely slope, aspect, altitude, TWI, SPI, STI, soil, lithology, Normalized Difference Vegetation Index (NDVI), land use, rainfall, distance to road, distance to river, distance to fault, plan curvature, and profile curvature. The chi-square method was employed to assess the significance of each factor with respect to landslide occurrence.
A DEM for the study area was acquired from the ASTER GDEM (http://gdem.ersdac.jspacesystems.or.jp/) at a resolution of 30 m. Using this DEM, slope, altitude, aspect, plan curvature, profile curvature, streams, TWI, SPI, and STI were extracted in ArcGIS 10.2.
The slope angles ranged from 0° to 67.5° (Figure 5a). Slope angle is an important indicator in landslide formation because of its relationship with the gravitational force (Chen et al. 2017e). In general, the likelihood of landslides increases with increasing slope angle, although other favourable conditions are also necessary. The aspect map (Figure 5b) was classified into nine classes: flat (−1), north (337.5°-360°, 0°-22.5°), northeast (22.5°-67.5°), east (67.5°-112.5°), southeast (112.5°-157.5°), south (157.5°-202.5°), southwest (202.5°-247.5°), west (247.5°-292.5°), and northwest (292.5°-337.5°). The direction of a slope face can affect the physical and biotic features of the slope and can significantly influence the local climate (microclimate). In some regions, patterns of soil differences related to differences in aspect exist. Thus, slope aspect indirectly affects landslides (Pourghasemi et al. 2012; Jebur et al. 2014). The altitude ranged between 109 and 1196 m (Figure 5c). TWI is the major factor used to quantify the topographic control on hydrological processes, and it is a function of both the slope and flow direction. The formula of TWI is given as
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \quad (2)$$
where $A_s$ is the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees. The TWI values in this study ranged from 3.0 to 43.8 (Figure 5d). Stream power index (SPI) describes the rate at which the energy of flowing water is expended and thus its capacity to move solid particles under the combined action of gravity and flow. The formula of SPI is given as
$$\mathrm{SPI} = A_s \tan\beta \quad (3)$$
where $A_s$ is the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees. On the other hand, STI explains the process of slope failure and deposition (Jebur et al. 2014). The formula of STI is as follows:
$$\mathrm{STI} = \left(\frac{A_s}{22.13}\right)^{0.6} \left(\frac{\sin\beta}{0.0896}\right)^{1.3} \quad (4)$$
where $A_s$ is the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees. Equation (4) shows that sediment transport is controlled by the catchment area and slope angle.
In general, the larger the STI, the more water accumulates at the bottom of the catchment, which then causes erosion.
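To make the three DEM-derived indices concrete, the sketch below evaluates them for a single raster cell. It assumes the commonly used forms TWI = ln(A_s / tan β), SPI = A_s tan β, and STI = (A_s / 22.13)^0.6 (sin β / 0.0896)^1.3; the function name `terrain_indices` and the `min_tan` floor for flat cells are assumptions introduced here, not part of the original workflow, which was carried out in ArcGIS.

```python
import math


def terrain_indices(a_s, slope_deg, min_tan=1e-3):
    """Return (TWI, SPI, STI) for one cell.

    a_s       -- specific catchment area (m^2/m), must be positive
    slope_deg -- slope angle in degrees
    min_tan   -- assumed floor on tan(slope) so that flat cells do not divide by zero
    """
    beta = math.radians(slope_deg)
    tan_b = math.tan(beta)
    twi = math.log(a_s / max(tan_b, min_tan))
    spi = a_s * tan_b
    sti = (a_s / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    return twi, spi, sti


# Hypothetical cell: 100 m^2/m of upslope area on a 45-degree slope.
twi, spi, sti = terrain_indices(100.0, 45.0)
print(round(twi, 3), round(spi, 3), round(sti, 3))
```

On a flat cell SPI and STI evaluate to zero, which is consistent with the lower bounds reported for both maps.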
The SPI and STI values ranged from 0 to 58,733,748 and from 0 to 92,239.7, respectively (Figure 5e,f). The soil map was classified into eight groups, namely, Ach, ACu, Alh, Atc, CMd, CMo, RGd, and WR (Figure 5g). The soil map was prepared in 1995 by the Institute of Soil Science, Chinese Academy of Sciences (http://www.issas.ac.cn/). Figure 1 shows the geological map of the study area at a scale of 1:200,000. The lithology map was obtained from the China Geology Survey (http://www.cgs.gov.cn/) (see Table 1). The lithology map (Figure 5h) was classified into ten groups (A, B, C, D, E, F, G, H, I, and J). The NDVI values varied between −0.48 and 0.56 (Figure 5i). The NDVI map was derived from Landsat 7 ETM+ satellite images acquired on 10 December 1999. These images were obtained from the US Geological Survey (http://landsat.usgs.gov/).
Human activity has played an important role in changing land use in recent years. Land use is a human activity that significantly affects natural resources, soil, and plants. The land-use map was produced from Landsat 7 ETM+ satellite images. Overall, six classes were recognized, namely, water, residential area, forest land, bare land, farmland, and grassland. The land-use map was produced using the maximum likelihood supervised classification method with an accuracy of 92.5% (Figure 5j). The mean annual precipitation data collected from 29 rainfall stations were used to create the rainfall map (Figure 5k). A simple inverse distance weighted (IDW) interpolation method was used to produce the rainfall map. The precipitation data were extracted from a database of the Jiangxi Province Meteorological Bureau (http://www.weather.org.cn). Road and river networks that undercut slopes steeper than 15° were extracted from the topographic map at a scale of 1:50,000 and grouped into five categories. Subsequently, the distance to road map (Figure 5l) and the distance to river map (Figure 5m) were prepared. The fault lines were extracted from the geological map at a scale of 1:200,000 and were employed to construct the distance to fault map. The values ranged from 0 to 25,875 m (Figure 5n). Furthermore, the plan and profile curvatures were extracted from the DEM and classified into three classes: flat, convex, and concave (Figure 5o,p). The plan curvature, which is obtained by intersecting a horizontal plane with the surface, controls the divergence and convergence of water during downslope flow. On the other hand, the profile curvature is constructed by considering a profile parallel to the direction of the maximum slope. The profile curvature causes the acceleration or deceleration of water flow on the surface.
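The IDW interpolation used for the rainfall map can be illustrated with a short sketch. The power of 2, the function name `idw`, and the station coordinates and rainfall values below are assumptions chosen for illustration; the study itself used the ArcGIS implementation, whose settings are not reported.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations -- list of (x, y, value) tuples, e.g. mean annual rainfall in mm
    power    -- distance exponent (2 is a common default, assumed here)
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the query point coincides with a station
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


# Hypothetical stations (coordinates in km, rainfall in mm).
stations = [(0.0, 0.0, 1500.0), (10.0, 0.0, 1700.0), (0.0, 10.0, 1600.0)]
print(round(idw(stations, 2.0, 3.0), 1))
```

Stations closer to the query cell dominate the estimate, so the interpolated surface honours the measured value at each station.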
To assemble the conditioning factor maps and the landslide inventory, the vector datasets were converted into raster format at the resolution of the DEM (30 m). Finally, the factor maps and the landslide inventory data were combined to construct the matrix used to develop the regression models in statistical software. The rows of the matrix contained the attributes of the predictor maps, whereas the columns represented the landslide and non-landslide samples. Aspect, soil, lithology, and land-use data were used as categorical variables, whereas the remaining variables were used as continuous variables. To avoid making the models sensitive to the reclassification procedure, the continuous variables were not further reclassified into subclasses.
Figure 6 shows the overall GWR modelling workflow implemented in ArcGIS software. First, the necessary input data were gathered and managed in a proper data storage location. Then, 16 landslide conditioning factors were derived from various sources and by different methods. The details are explained in Section 3.2. In the GWR modelling stage, three main steps were applied. First, a chi-square factor optimization method was adopted to optimize and select the landslide conditioning factors. Second, by using an empirical analysis, the parameters of the GWR model were selected based on the training dataset, i.e. 70% of the whole landslide inventory data. Third, the spatial weight matrix was constructed using the optimized landslide conditioning factors and the landslide locations. Finally, the developed GWR model was applied to predict the probability of landslide occurrence at the remaining pixel locations in the dataset. Pixels with values close to 1 were interpreted as having a high probability of landslide occurrence, whereas pixels with values close to 0 were interpreted as having a low probability. Next, the landslide susceptibility was mapped by using the weighted sum function of ArcGIS. In this step, the estimated weight of each factor was used to overlay the landslide conditioning factors, producing the landslide susceptibility index. Then, the landslide susceptibility index was converted into a probability by linearly re-scaling it into the range 0-1. After that, the probability raster map was reclassified into five classes of landslide susceptibility by the quantile classification approach. Finally, the landslide susceptibility map was validated by calculating the success and prediction rates using the training and testing datasets, respectively.
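The last two mapping steps, linear re-scaling of the susceptibility index to 0-1 and quantile reclassification into five classes, can be sketched as below. This is a simplified rank-based approximation of quantile classification, not the ArcGIS routine; the function names and the sample index values are hypothetical.

```python
def rescale(values):
    """Linearly map values onto the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all cells identical
    return [(v - lo) / (hi - lo) for v in values]


def quantile_classes(values, n_classes=5):
    """Assign classes 1..n_classes so that each class holds about the same number of cells."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values) + 1
    return classes


# Hypothetical susceptibility index values for ten pixels.
lsi = [2.0, 8.0, 5.0, 11.0, 14.0, 3.5, 9.5, 6.5, 12.5, 4.0]
probability = rescale(lsi)
print(quantile_classes(probability))
```

Class 1 then corresponds to the lowest fifth of the probability values and class 5 to the highest fifth.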

Factor optimization: chi-square method
To evaluate the performance of different types of models, the quality of input data should be as high as possible to reach an accurate and reliable conclusion (Julong 1989; Breiman 2001; Wang 2005). The quality assessment for input data is essential because of the difficulty in preparing the landslide inventory map. Selecting significant parameters (landslide predictors) is another important step prior to landslide susceptibility modelling (Moh'd A Mesleh 2007). In this study, a chi-square based factor optimization method was adopted to select significant landslide predictors for modelling purposes (Lineback Gritzner et al. 2001).
Pearson chi-square is the main test used to determine the significance of the relationship between different categorical variables (Satorra and Bentler 2001; Ye and Chen 2001). Its concept is based on computing the frequencies expected in a two-way table if no relationship exists between the variables (Press 1966). The value of the chi-square and its significance level depend on the overall number of observations and the number of cells in the table (Regmi et al. 2010; Bryant and Satorra 2012). To calculate the importance of landslide predictors using the chi-square method, the hypotheses were defined first. The null hypothesis states that knowing the level of a landslide predictor does not help predict landslide occurrence (Sarkar and Kanungo 2004).
H0: Variable X (e.g. slope) and variable Y (e.g. landslide occurrence) are independent.
H1: Variable X (e.g. slope) and variable Y (e.g. landslide occurrence) are not independent.
The test statistic is
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \quad (5)$$
where $O_i$ is the observed frequency count at level i of variable X and $E_i$ is the expected frequency count at level i of variable X.
After calculating the chi-square statistic and the P-value for each predictor variable, the P-value was evaluated against the significance level (0.05) to assess the relationship between the landslide predictor and landslide occurrence. A high chi-square value implies that the predictor is a strong indicator of landslide occurrence.
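A minimal sketch of the Pearson chi-square computation for one predictor is given below. The contingency counts are invented for illustration (three hypothetical slope classes against landslide and non-landslide samples), and instead of a P-value the statistic is compared with the tabulated critical value at the 0.05 level for two degrees of freedom (5.991); the study itself used SPSS.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table -- list of rows; table[i][j] is the observed count for
             predictor level i and outcome j (landslide / non-landslide)
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = gentle, moderate, steep slope classes;
# columns = landslide, non-landslide samples.
observed = [[5, 25], [20, 20], [30, 10]]
stat, dof = chi_square(observed)
CRITICAL_0_05_DOF2 = 5.991  # tabulated chi-square critical value, alpha = 0.05, dof = 2
print(round(stat, 3), dof, stat > CRITICAL_0_05_DOF2)
```

A statistic above the critical value leads to rejection of H0, i.e. the predictor and landslide occurrence are treated as dependent.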

GWR
GWR is a spatial regression technique that is widely used in geography and other disciplines (Brunsdon et al. 1996; Fotheringham et al. 1998; Brunsdon et al. 1999). Regression parameters tend to differ between geographic locations (Leung et al. 2000; Brunsdon et al. 2001). A global spatial regression model estimates a single set of regression parameters for the entire study area; these average values cannot reflect how the parameters actually vary across the feature space (Blanco-Moreno et al. 2008; Griffith 2008; Pirdavani et al. 2014b). Therefore, a model that can deal with this problem is needed.
In ordinary linear regression, the dependent variable is modelled as a linear function of a set of independent or influential variables as follows:
$$y_i = a_0 + \sum_{k=1}^{m} a_k x_{ik} + \varepsilon_i \quad (6)$$
where $y_i$ is the ith observation of the dependent variable, $x_{ik}$ is the ith observation of the kth independent variable, m is the number of independent variables, the $\varepsilon_i$ are independent, normally distributed error terms with zero mean, and each $a_k$ must be determined from a sample of n observations (Brunsdon et al. 1996; Pirdavani et al. 2014a). GWR is a relatively simple technique that extends the traditional regression framework of Equation (6) by allowing local variations in the rates of change. Thus, the coefficients in the model are specific to a location i instead of being global estimates. The regression equation becomes
$$y_i = a_{i0} + \sum_{k=1}^{m} a_{ik} x_{ik} + \varepsilon_i \quad (7)$$
where $a_{ik}$ is the value of the kth parameter at location i. Equation (6) is a special case of Equation (7) in which all the parameter functions are constant across space. The point i at which estimates of the parameters are obtained is completely generalizable and need not be a point at which data are collected (Brunsdon et al. 1996; Lukawska-Matuszewska and Urbanski 2014). GWR produces localized versions of all standard regression diagnostics, including goodness-of-fit measures such as R², as well as localized parameter estimates. The localized parameter estimates are useful for understanding the application of the model being calibrated and for exploring the possibility of adding predictors to the model (Fotheringham et al. 1998).
In this sense, the difference between GWR and the spatial error approach is that spatial drift from 'average' global relationships is measured directly in the former, whereas it is measured as a second-order effect through the spatial distribution of residuals in the latter (Kimsey et al. 2008; Spurna 2008). GWR is also used to improve our understanding of the processes being modelled and thus to separate local spatial anomalies in terms of each predictor (Brunsdon et al. 1999; Wei and Qi 2012).
The core of GWR is the spatial weight matrix; the result of GWR is influenced by the choice of spatial weighting function (Spurna 2008; Ogneva-Himmelberger et al. 2009). Regardless of the specific weighting function employed, the essential idea of GWR is that for each point i, a "bump of influence" exists around i corresponding to the weighting function, in such a way that sampled observations near i have more influence on the estimation of i's parameters than sampled observations that are farther away (Brunsdon et al. 1996).
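For reference, the way the spatial weights enter the calibration can be written as a locally weighted least-squares estimator, following the standard formulation of Brunsdon et al. (1996). The matrix symbols $X$, $y$, and $W(i)$ are introduced here for illustration and are not notation used elsewhere in this paper:

```latex
\hat{a}(i) = \left( X^{\mathsf{T}} W(i)\, X \right)^{-1} X^{\mathsf{T}} W(i)\, y,
\qquad
W(i) = \operatorname{diag}\left( W_{i1}, W_{i2}, \ldots, W_{in} \right)
```

Here $X$ is the n-row design matrix of predictors (with a leading column of ones), $y$ is the vector of observations, and $W_{ij}$ is the weight given to data point j when calibrating the model at regression point i. Setting every $W_{ij}$ to 1 recovers the ordinary least-squares estimate of the global model.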
The main weighting functions used in GWR are as follows:

Distance threshold function
The distance threshold function is the simplest and most widely used spatial weighting function and is expressed as
$$W_{ij} = \begin{cases} 1, & d_{ij} \le D \\ 0, & d_{ij} > D \end{cases} \quad (8)$$
where D is the distance threshold and $d_{ij}$ is the distance from regression point i to data point j.

Gaussian function
The essential idea of the Gaussian function is to select a continuous, monotonically decreasing function to express the relationship between $W_{ij}$ and $d_{ij}$ (Equation (9)), where b is defined as the bandwidth. If i and j coincide, the weight of the data at that point is one, and the weight of other data declines according to a Gaussian curve as the distance between i and j increases (Fotheringham et al. 1998; Kumar et al. 2012).

Bisquare function
A compromise between Equations (8) and (9) has the desirable property of excluding all data points farther than some distance from i, as well as the analytically desirable property of continuity. An example is the bisquare function, given as
$$W_{ij} = \begin{cases} \left[ 1 - \left( d_{ij} / b \right)^2 \right]^2, & d_{ij} \le b \\ 0, & d_{ij} > b \end{cases} \quad (10)$$
Many previous works discussed the application of GWR (Harris et al. 2010; Koutsias et al. 2010; Paez et al. 2011). To overcome the limitation of selecting the bandwidth by minimizing the ordinary least-squares sum of squared residuals, a cross-validation (CV) approach was suggested for local regression (Cleveland 1979) and is given as
$$CV = \sum_{i=1}^{n} \left[ y_i - \hat{y}_{\ne i}(b) \right]^2 \quad (11)$$
where $\hat{y}_{\ne i}(b)$ is the fitted value of $y_i$ with the observations for point i omitted from the calibration process. This approach can counteract the 'wrap-around' effect because when b becomes very small, the model is calibrated on samples near i and not at i itself.
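The weighting functions and the CV criterion can be tied together in a small sketch of a one-predictor GWR. Several things here are assumptions made for illustration only: the Gaussian kernel is taken in the form exp(-(d/b)^2), which is one of several forms found in the GWR literature; the local model has a single predictor so that the weighted least-squares solution can be written in closed form; and the coordinates and data are synthetic. The study itself used the ArcGIS implementation.

```python
import math


def threshold_kernel(d, D):
    """Distance threshold weight: 1 inside the threshold, 0 outside."""
    return 1.0 if d <= D else 0.0


def gaussian_kernel(d, b):
    """Gaussian weight; the exact form exp(-(d/b)^2) is an assumption."""
    return math.exp(-((d / b) ** 2))


def bisquare_kernel(d, b):
    """Bisquare weight: continuous, and exactly zero beyond the bandwidth."""
    return (1.0 - (d / b) ** 2) ** 2 if d <= b else 0.0


def local_fit(i, coords, x, y, b, kernel=gaussian_kernel, leave_out=False):
    """Weighted least-squares fit of y = a0 + a1 * x calibrated at point i."""
    sw = swx = swy = swxx = swxy = 0.0
    for j in range(len(x)):
        if leave_out and j == i:
            continue  # omit point i itself, as required by the CV score
        d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
        w = kernel(d, b)
        sw += w
        swx += w * x[j]
        swy += w * y[j]
        swxx += w * x[j] * x[j]
        swxy += w * x[j] * y[j]
    det = sw * swxx - swx * swx  # zero if too few distinct weighted points remain
    a1 = (sw * swxy - swx * swy) / det
    a0 = (swy - a1 * swx) / sw
    return a0, a1


def cv_score(coords, x, y, b, kernel=gaussian_kernel):
    """Sum of squared leave-one-out prediction errors for bandwidth b."""
    total = 0.0
    for i in range(len(x)):
        a0, a1 = local_fit(i, coords, x, y, b, kernel, leave_out=True)
        total += (y[i] - (a0 + a1 * x[i])) ** 2
    return total


# Synthetic example: the slope of the x-y relationship drifts from west to east.
coords = [(float(u), 0.0) for u in range(8)]
x = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.8, 2.2]
y = [(1.0 + 0.5 * c[0]) * v for c, v in zip(coords, x)]
candidates = [1.0, 2.0, 4.0]
best_b = min(candidates, key=lambda b: cv_score(coords, x, y, b))
print(best_b, [round(local_fit(i, coords, x, y, best_b)[1], 2) for i in (0, 7)])
```

The Gaussian kernel is used in the example because it keeps every weight positive; with the threshold or bisquare kernel, a bandwidth that leaves fewer than two distinct neighbours would make the local system singular.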

Stochastic gradient descent: log loss (logistic regression)
Stochastic gradient descent (SGD) is a stochastic approximation of gradient descent optimization for minimizing an objective function that has the form of a sum of differentiable functions (Cleveland 1979; Langford et al. 2009; Bottou 2010; Bach 2014). One can generalize the loss function using Equation (12). The loss function consists of two parts, namely, a loss term, which depends on the model output $f_w(x_i)$, and a regularization term. The log loss, which is equivalent to the cross-entropy loss function, is used to train an LR model:
$$J(w) = \lambda \lVert w \rVert^2 - \sum_i \left[ y_i \log g_w(x_i) + (1 - y_i) \log\left( 1 - g_w(x_i) \right) \right], \quad y_i \in \{0, 1\} \quad (16)$$
Thus, Equation (17) can be written as
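As a rough illustration of training an LR model with SGD on the regularized log loss, the sketch below updates the weights one sample at a time. The learning rate, regularization strength, number of epochs, seed, and the toy one-feature dataset are all assumptions chosen for the example; they are not the settings used in this study.

```python
import math
import random


def sigmoid(z):
    """Logistic function g(z), clamped to avoid overflow in exp."""
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(X, y, lam=0.001, lr=0.1, epochs=200, seed=0):
    """Fit weights w and bias b by SGD on the L2-regularized log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, X[i])) + b)
            err = p - y[i]  # derivative of the log loss with respect to the linear score
            w = [wk - lr * (err * xk + 2.0 * lam * wk) for wk, xk in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(w, b, features):
    """Predicted probability of the landslide class (label 1)."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, features)) + b)


# Toy data: one normalized factor, label 1 = landslide, 0 = non-landslide.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = sgd_logistic(X, y)
print(round(predict_proba(w, b, [0.1]), 3), round(predict_proba(w, b, [0.9]), 3))
```

The per-sample gradient `err * x` comes from differentiating the log loss, and the `2 * lam * w` term from the squared-norm regularizer.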

Stochastic gradient descent: hinge loss (SVM)
The loss term for the soft-margin SVM follows Bottou (2010). We use $L_{hinge}$ to represent the hinge loss.

Support vector machine
SVM is essentially a nonlinear data processing method that differs from neural networks. The former is based on the structural risk minimization principle, whereas the latter is based on the empirical risk minimization principle (Hong et al. 2015). The structural risk minimization principle rests on firm mathematical foundations and has produced profound changes in the understanding of machine learning. SVM has the following characteristics (Ren et al. 2015; Shahabi et al. 2015):
(1) Simple structure.
(3) Sparse representation: The direction of the optimal separating hyperplane is a linear combination of training samples. The coefficient attached to each sample reflects its importance. All the information about classification is contained in the support vectors, whose coefficients are not zero. If non-support vectors are removed or shifted slightly, re-training leads to the same solution as before. In other words, the solution depends only on the support vectors.
(4) Modularization: SVM is composed of two modules, namely, a general-purpose learning machine and a domain-specific kernel function. Thus, the learning algorithm and the kernel function can be designed in a modular way. This property is crucial for theoretical analysis and engineering implementation.
The most common formula of SVM classification is expressed in terms of the following quantities: n denotes the number of training data points, $x_i$ and $y_i$ are the training and testing patterns, respectively, b represents the bias term, and $K(x_i, y_i)$ is the kernel function. In general, four types of kernels are used with the SVM classifier: radial basis function (RBF), polynomial (PL), sigmoid (SIG), and linear (LN). In this research, the RBF kernel was selected because it is the most common kernel function used in landslide susceptibility mapping and because it is less sensitive to outliers (Yao et al. 2008). The mathematical representation of the RBF kernel is
$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right), \quad \gamma > 0 \quad (22)$$
where K is the kernel function, $\gamma$ is the gamma term in the kernel function for all kernel types except linear, d is the polynomial degree term in the kernel function for the polynomial kernel, and r is the bias term in the kernel function for the polynomial and sigmoid kernels. The parameters $\gamma$, d, and r are user-defined; the correct definition of these parameters can increase the accuracy of the SVM solution (Su et al. 2015; Tehrany et al. 2015; Chen et al. 2016; Hong et al. 2016). Many previous works applied SVM methods in landslide susceptibility modelling (Chen et al. 2016; Hong et al. 2016).
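The two building blocks named in the SVM sections, the hinge loss and the RBF kernel, can each be written in a few lines. The sketch assumes the standard definitions, max(0, 1 - y f(x)) with labels coded as -1 and +1, and exp(-gamma * squared Euclidean distance); the function names and sample vectors are illustrative only.

```python
import math


def hinge_loss(score, label):
    """Hinge loss for one sample; label must be -1 or +1."""
    return max(0.0, 1.0 - label * score)


def rbf_kernel(u, v, gamma):
    """RBF kernel value for two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# A confidently correct score incurs no loss; a score inside the margin is penalized.
print(hinge_loss(2.0, 1), hinge_loss(0.5, 1), hinge_loss(0.5, -1))

# Identical vectors give a kernel value of 1, and the value decays with distance.
print(rbf_kernel([0.2, 0.4], [0.2, 0.4], gamma=0.5),
      round(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5), 4))
```

A larger gamma makes the kernel decay faster with distance, which produces a more local and more flexible decision surface.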

Statistical evaluation measures
In this study, statistical index-based evaluations and the receiver-operating characteristic (ROC) curve were used to validate the produced landslide susceptibility maps. Statistical indexes such as sensitivity and specificity were used (Su et al. 2015; Chen et al. 2016). These metrics were calculated based on the confusion matrices resulting from the GWR, SGD-LR, SGD-SVM, and SVM models and the landslide inventory map:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{FP + TN}$$
where TP is the number of landslide points correctly classified to the landslide class and TN is the number of non-landslide points correctly classified to the non-landslide class. FN is the number of landslide points classified to the non-landslide class, and FP is the number of non-landslide points classified to the landslide class (Frattini et al. 2010; Hong et al. 2016; Chen et al. 2017i).
On the other hand, the ROC curve, which is a standard method to validate the general performance of landslide susceptibility models, was constructed by plotting sensitivity against 100 − specificity. The area under the ROC curve (AUC) was calculated to validate quantitatively the general performance of the landslide susceptibility models. The higher the AUC value, the better the performance of the landslide model. The performance of a landslide model is perfect when the AUC is equal to 1.0 (Tien Bui et al. 2017b; Chen et al. 2017d). In addition, success rate curves and prediction rate curves were constructed using the landslide training and validation datasets, respectively. The success rate curve shows how well a landslide model fits the training dataset, whereas the prediction rate curve depicts how well the model predicts landslides in unsampled areas.
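The evaluation measures above can be sketched in a few lines. Sensitivity and specificity follow the TP, TN, FP, FN definitions given in the text; the AUC is computed through its rank interpretation (the fraction of landslide/non-landslide pairs in which the landslide sample receives the higher score, ties counting one half), which equals the area under the ROC curve. The labels, scores, and the 0.5 cut-off are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation samples: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
print(sensitivity_specificity(y_true, y_pred), round(auc(y_true, scores), 3))
```

Applying `auc` to the training samples gives a success rate, and applying it to the held-out validation samples gives a prediction rate.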
Validation of landslide susceptibility maps is an important task that should be conducted to confirm the usability of the final maps, regardless of the model used. In the current study, the landslide susceptibility maps produced by the four models were validated by comparing each susceptibility map with the training and the testing data. To conduct this process, the 79 landslides were randomly separated into two datasets; 55 (70%) landslides were selected as the training data and the remaining 24 (30%) landslides were used as testing data.
Figure 7 shows the results of the chi-square test on the observed distribution and the expected distribution of landslide occurrence based on posterior probabilities calculated using the 16 variables. The highest chi-square value (119.13) was observed for the slope factor, indicating the high contribution of this factor to rotational landslide occurrence in the study area. In addition, the chi-square values of the altitude, STI, distance to river, and aspect factors are greater than 60. However, the factors distance to road, plan curvature, NDVI, and land use had relatively low chi-square values, below 10.

Results of chi-square factor optimization
To select the best subset of landslide conditioning factors, a threshold of the chi-square value must be established. No standard method has been developed to select this threshold because it depends on the characteristics of the study area and the datasets used. Therefore, the factor subset selection depends on the analyst. In our case, three experiments with the best 5, 10, and 15 factors were conducted to select the best subset for GWR modelling. The success and prediction rates of the four models (SVM, SGD-SVM, SGD-LR, GWR) using the three factor subsets are shown in Table 2.
In general, the results demonstrated that a larger number of landslide conditioning factors yields a higher prediction accuracy, except with the SGD-LR model. The best accuracies of SVM, SGD-SVM, and GWR were obtained using 15 factors. On the other hand, the highest accuracy of SGD-LR was obtained with 10 factors. Increasing the number of factors from 10 to 15 improved the accuracy of the SVM, SGD-SVM, and GWR models. The prediction accuracy of GWR increased from 0.83 to 0.85 when the number of factors increased from 10 to 15. Therefore, considering the focus of this paper (i.e. GWR modelling), the 15 factors were used to produce the landslide susceptibility maps for the study area.

Results of spatial correlation among landslide locations in the study area
The estimation of parameter coefficients by GWR required the mapping of spatial correlation among landslide locations in the study area. The spatial correlations among landslide locations were calculated and represented by a spatial weight matrix. This matrix is a representation of the spatial structure of landslide data. Because the spatial weight matrix imposes a structure on the landslide data, it is crucial to select a conceptualization that best reflects how features actually interact with each other. In this study, the inverse distance was selected because it is more appropriate than the other methods available in ArcMap 10.2. The spatial weight matrix is employed to generate the spatial correlation map of landslides in the study area. Figure 8 shows the results of the spatial correlation among landslide locations in the study area. In a physical sense, the spatial weights indicate the variations of landslide locations in terms of spatial distribution. In other words, the geographical location is an important indicator for a landslide, and each landslide location was compared with the neighbouring landslides using the Euclidean distance. This process was important to ensure that landslides have influences according to their spatial distributions and relations to neighbouring slides. The correlations were estimated as continuous values ranging from 0 to 1. However, these correlations are shown as categorical classes, which makes the map significantly easier to interpret. The spatial correlations were categorized into three classes, namely, weak relationship (0-0.6), moderate relationship (0.61-0.7), and strong relationship (0.71-1), by the quantile method. Few landslides in the study area were highly correlated. This result can be observed in the west and middle parts of the study area. However, most landslides exhibited weak and moderate spatial relationships, which can be seen in the east, south, and some parts of the north of the study area. This map was used to generate landslide susceptibility by the GWR method, which is presented in the next section.
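The inverse-distance conceptualization described above can be sketched as follows. This is only an illustration: the coordinates are hypothetical, row standardization is an assumption (the text does not state how ArcMap was configured), and the text does not specify how the mapped correlation values are derived from the weights. The class breaks in `classify` are those reported in the text.

```python
import math


def inverse_distance_weights(points):
    """Row-standardized inverse-distance weights between point locations.

    Assumes no two points coincide (a zero distance would divide by zero).
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Euclidean distance between landslide i and landslide j
                weights[i][j] = 1.0 / math.dist(points[i], points[j])
        row_sum = sum(weights[i])
        weights[i] = [w / row_sum for w in weights[i]]  # each row sums to 1
    return weights


def classify(value):
    """Map a correlation value in [0, 1] to the three classes used in the text."""
    if value <= 0.6:
        return "weak"
    if value <= 0.7:
        return "moderate"
    return "strong"


# Hypothetical landslide coordinates (arbitrary map units)
landslides = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
w = inverse_distance_weights(landslides)
```

In this toy example the first landslide gives most of its weight to its nearest neighbour, which is the behaviour the inverse-distance scheme is meant to capture.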

Landslide susceptibility modelling
Four landslide susceptibility maps were produced for the study area by employing GWR, SGD-LR, SGD-SVM, and SVM models (Figure 9). The first examination of the susceptibility maps shows that all the models agree that the northwest part of the study area is highly susceptible to rotational landslides. The maps show that the majority of the study area has low/moderate susceptibility to landslides. GWR and SGD-LR models produced maps where the high and very high susceptibility classes have larger areas compared with SVM models.
Using the 15 landslide conditioning factors, the SVM, SGD-SVM, SGD-LR and GWR models were constructed using the training data. The ROC curves and AUC values of the four models are presented in Figure 10. The GWR model exhibits the highest performance in terms of success rate and prediction rate with values of 0.87 and 0.85, respectively. The SGD-LR model exhibited slightly lower success and prediction rates (0.83) than the GWR model. The SGD-SVM model showed better success and prediction rates than the normal SVM model. The success rates of SGD-SVM and SVM models were 0.81 and 0.79, respectively. In addition, the prediction rates of these models were slightly lower than their success rates. Overall, the result of the validation of the four models indicates that GWR is a better model than other widely used models. Table 3 shows the estimated coefficient values for the landslide conditioning factors by the models developed in the current study. The coefficients were standardized using the expression presented in Equation (24), where $W_{s,i}(k)$ is the standardized weight for the kth approach, $W_i(k)$ is the weight calculated by the kth approach, and $i$ is the identity number of each parameter (e.g. the $i$ value of slope = 8). The result illustrates that the four landslide susceptibility models are consistent in terms of altitude, slope, NDVI, and aspect, which are the most important factors that contribute to landslides in the study area.
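The success and prediction rates reported here are areas under ROC curves, computed on the training and testing data, respectively. As a minimal sketch, the AUC can be obtained as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. The function name, scores, and labels below are hypothetical and do not reproduce the study's values.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    `labels` uses 1 for landslide cells and 0 for non-landslide cells.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for three landslide and three non-landslide cells
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
rate = auc(scores, labels)
```

Applying the same function to training cells gives a success rate and to held-out testing cells gives a prediction rate; a value of 0.5 corresponds to a random ranking and 1.0 to perfect separation.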

Discussion
Producing accurate landslide susceptibility maps is difficult because many factors are involved, such as soil condition, bedrock, topography, hydrology, and human activities. Comparative studies of landslide susceptibility models therefore continue to appear in the scientific literature. In this study, four models were assessed, namely, GWR, SGD-LR, SGD-SVM, and SVM. A total of 79 landslides and 16 conditioning factors, namely slope, aspect, altitude, TWI, SPI, STI, soil, lithology, NDVI, land use, rainfall, distance to road, distance to river, distance to fault, plan curvature, and profile curvature, were analysed. Only the significant factors were used in the susceptibility mapping.

Impact of landslide conditioning factors
In this study, the chi-square method was used to select significant landslide conditioning factors to be used in susceptibility models. This process can reduce over-fitting to the training data and speed up the classification process. In addition, non-significant factors may generate severe multicollinearity, which can disrupt regression estimates. Therefore, employing this technique was an important step of the landslide mapping conducted in the current study. The chi-square method evaluated landslide factors individually with respect to the presence or absence of landslides. The results revealed that the SPI factor is not significant for landslide susceptibility modelling in the study area, whereas slope and altitude were determined to be the most influential factors. In the generated regression models, the slope was given a high coefficient, which indicates its importance for the spatial prediction of landslides. However, plan curvature was given a relatively low coefficient by the four models. As a result, the importance of landslide factors cannot be described by interpreting the estimated coefficients of only one model. To explain the causes of landslides, several models should be analysed. This analysis is crucial to determine the important factors that cause landslides.

Prediction accuracy of the models
The comparative study showed that the prediction rates of the four models are quite close to each other. The highest prediction rate (0.85) was achieved by the GWR model. The results also revealed that the SGD learning approach could improve the prediction accuracy of landslides in SVM models: the success and prediction rates of the traditional SVM were slightly lower than those of the SGD-SVM approach. In addition, according to the success and prediction rate curves, the performance of SGD-LR is higher than that of SGD-SVM on the training dataset. The hyperparameters of the SVM model could be fine-tuned such that it could perform well on the training dataset. The prediction rate of SGD-SVM is lower than that of SGD-LR. This indicates that the global LR model has more generalization capability and is less sensitive to over-fitting.
One of the main advantages of GWR is that it can integrate geographical location and other landslide conditioning factors for estimating the spatial distribution of landslides, and that it reflects the nonstationary spatial relationship between these factors and landslide occurrence probability. The spatial variations exhibited by landslide conditioning factors within the study area are a challenge for most statistical and data mining methods. In these methods, a factor has a single effect, either positive or negative, on landslide occurrence across the whole study area. On the other hand, the GWR method generates local regression models that vary according to the geographic location in the study area. Compared with other traditional models, GWR achieved better accuracy and explained the spatial distribution characteristics of landslides in the study area. However, the results of the current study showed that the GWR model achieved its best accuracy with 15 factors, whereas LR achieved its highest prediction accuracy with only 10 factors. Using a large number of factors in GWR modelling can lead to a severe multicollinearity problem where the GWR model cannot be efficiently built. Therefore, further improvements in factor optimization for GWR should be explored and deeply investigated. Another important point is the temporal correlation of landslides. With comprehensive landslide inventory data where the temporal information is available, the GWR model can be further enhanced to include spatial-temporal modelling for landslide susceptibility mapping. Furthermore, because GWR is a local regression method, the spatial resolution of the input data can have a significant effect on its accuracy and performance. Thus, the effects of the spatial resolution of input data on GWR modelling should be studied in future works.
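The contrast between a global model and the local models generated by GWR can be illustrated with a minimal sketch. It fits a weighted least-squares line at a focal location, with observation weights that decay with distance. The Gaussian kernel, the bandwidth values, the single predictor, and the data are all assumptions made for illustration; the study's GWR configuration is not reproduced here.

```python
import math


def local_fit(focal, coords, x, y, bandwidth):
    """Weighted least-squares fit of y ~ intercept + slope * x around `focal`.

    Observations are weighted by a Gaussian kernel of their distance to `focal`.
    """
    w = [math.exp(-0.5 * (math.dist(focal, c) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: a western cluster where y rises with x and an eastern one where it falls
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [2, 4, 6, 8, -1, -2, -3, -4]

_, west_slope = local_fit((0.5, 0.5), coords, x, y, bandwidth=1.0)
_, east_slope = local_fit((10.5, 0.5), coords, x, y, bandwidth=1.0)
# A very large bandwidth weights all observations almost equally, i.e. a global fit
_, global_slope = local_fit((5.0, 0.5), coords, x, y, bandwidth=1e6)
```

In this toy dataset the local coefficient is positive in the west and negative in the east, while the global fit returns a single compromise coefficient that describes neither area.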

Contribution of the study and results of previous studies
Comparative study of modelling methods is a classical research area in landslide susceptibility assessment. The main goal of these works is to understand the prediction capability of the models and the effect of their hyperparameters in different environments on different datasets. Even though the science in this field has progressed considerably, several models including GWR have not been fully understood in the context of landslide modelling. The contribution of this study is to understand the prediction accuracy of the GWR model and its sensitivity to the number of landslide conditioning factors in the case study of Xing Guo area (China). As landslide occurrences and conditioning factors have spatial variations, global models such as neural network or LR ignore autocorrelation characteristics of data between the landslide locations in susceptibility modelling. In the literature, several studies have compared spatial regression and global models. Erener and Düzgün (2010) found that the spatial regression model, which estimates the coefficients at local scale, has better generalization performance (AUC = 0.83) than the LR model (AUC = 0.74). Feuillet et al. (2014) suggested that GWR-based modelling provides significant inputs for landslide susceptibility mapping, by highlighting local drivers, indecipherable in global models. More recently, Yu et al. (2016) developed a landslide susceptibility model utilizing the GWR approach and they found that their model outperforms the SVM model in terms of prediction capability by up to 19%. In addition, they indicated that the slope and distance from drainage are greatly significant for landslide occurrence in their study area. In our study, according to the coefficients estimated by the GWR, the slope and distance from the river are influential factors. Other studies such as by Park and Kim (2015) and Sabokbar et al. (2014) have shown that the GWR model can achieve higher prediction accuracy than the global LR model for landslide susceptibility mapping.

Conclusion
In this paper, a comparative experiment among GWR, SVM, and LR for landslide susceptibility mapping is presented using multisource data of the Xing Guo area in China. The GWR model was developed using the significant factors selected by the chi-square method. Several subsets of landslide factors were analysed and the sensitivity of the GWR model to the number of the selected factors is reported. The results of the comparative study showed that the GWR outperforms the SVM and LR models in terms of prediction capability. Based on the results obtained from the current study, GWR can be used for the spatial prediction of landslides and it is comparable to the well-known methods (i.e. SVM and LR). The landslide susceptible zones represent an important base for assessing landslide hazard and risk over the study area. Consequently, the generated maps could be useful to local authorities and decision makers for selecting suitable locations for future land-use planning and implementation of development.
However, several points need to be considered in future works. First, the GWR model was found to perform better when using 15 landslide factors than when using fewer factors. In this context, other optimization methods (e.g. Random Forest, Ant Colony) should be investigated to attempt reducing the number of factors while preserving the prediction accuracy of the model. This can improve the general performance of the GWR model by reducing the multicollinearity problem and the sensitivity of the model to overfitting, especially in data-scarce environments. Second, the integration of the spatial regression models (e.g. GWR) with other statistical and data mining methods should be addressed to improve the prediction capability of the landslide susceptibility models. Finally, with comprehensive landslide inventory data, both spatial and temporal autocorrelations can be investigated in the spatial regression model or in the integrated models.

Disclosure statement
No potential conflict of interest was reported by the authors.