Predictive modeling of landslide hazards in Wen County, northwestern China based on information value, weights-of-evidence, and certainty factor

Abstract Landslide susceptibility mapping is essential in delineating landslide-prone areas in mountainous regions. The primary purpose of this study is to evaluate landslide susceptibility mapping using three methods, information value (IV), weights-of-evidence (WofE), and certainty factor (CF), in southeastern Gansu province, China. First, landslides in the study area were located mainly through aerial-photo interpretation and field surveys. A total of 529 landslides were randomly divided using a ratio of 70/30 for training and validation of these methods. Eight landslide conditioning factors, including slope angle, altitude, NDVI, distance to faults, distance to roads, distance to rivers, rainfall, and lithology, were considered. Landslide susceptibility maps were produced for each of the three methods. Susceptibility maps were verified and compared using the area under the curve (AUC) method. The success rate curve demonstrated that the AUC for the IV, WofE, and CF methods was 88.02, 87.56, and 89.04%, respectively, and the prediction rate curve showed that the AUC was 86.24, 86.34, and 87.44%, respectively. Furthermore, results showed that the CF method had the highest accuracy in comparison with the other methods. Generally, the three methods showed reasonable accuracy in landslide susceptibility mapping. Results of this study can serve as guidelines to managers and policy makers regarding the prevention and mitigation of landslide hazards.


Introduction
Landslides, occurring suddenly and causing substantial damage to people and property, are dangerous natural hazards. Landslides occur frequently in hill and mountain terrains. The frequency of landslide occurrences increases with growing human population (Chen et al. 2016). Globally, landslides cause thousands of casualties and fatalities per year and property damage of billions of dollars (Lee and Pradhan 2007).
Scientific assessment of landslide prone areas is highly significant for mitigating and controlling landslide disasters.
Current landslide susceptibility mapping methods can be grouped into qualitative and quantitative methods. Qualitative methods are heuristic or expert-driven approaches, based on the practical field experience of specialists (Sujatha et al. 2014). Quantitative methods are indirect mapping techniques, based on mathematical expressions of the correlation between causal variables and landslides (Wang et al. 2014). Over the years, both types of methods have been used in conjunction with geographic information systems (GIS) to assess landslide susceptibility (Dahal et al. 2008; Regmi et al. 2010; Mohammady et al. 2012; Kayastha et al. 2013; Bourenane et al. 2015; Sujatha 2017; Sujatha and Sridhar 2017). There are many statistical models used in landslide hazard analysis, including the frequency ratio method (Mohammady et al. 2012; Kayastha 2015; Bourenane et al. 2016; Chen et al. 2017a), statistical index method (Bui et al. 2011; Pourghasemi et al. 2013a; Razavizadeh et al. 2017), weights of evidence method (Dahal et al. 2008; Razavizadeh et al. 2017), certainty factor method (Kanungo et al. 2011; Sujatha et al. 2012), and logistic regression method (Ayalew and Yamagishi 2005; Lee and Pradhan 2007; Yilmaz 2009; Chen et al. 2017c). Other varieties of classification techniques, such as fuzzy systems (Oh and Pradhan 2011; Sezer et al. 2011), decision trees (Saito et al. 2009; Pradhan 2013), neural networks (Yesilnacar and Topal 2005; Ermini et al. 2005; Pham et al. 2017), and support vector machines (Yao et al. 2008; Pradhan 2013), have been used to assess landslide risk. The primary difference between this study and other research is the application and comparison of the information value (IV), weights-of-evidence (WofE), and certainty factor (CF) methods for landslide susceptibility mapping.
This paper attempts to evaluate and compare results from landslide susceptibility maps using three statistical methods, that is, the IV, WofE, and CF methods, in Wen County (China). The performance of these landslide susceptibility maps is verified and compared using the area under the curve (AUC) method.

Study area
In China, geological hazards happen frequently because of topographical, geological, and environmental factors, causing hundreds of casualties and fatalities and hundreds of millions of dollars' worth of damage each year (Figure 1). Landslides are one of the most serious natural hazards. According to statistics, landslides account for more than 70% of geological disasters in China every year. The average annual economic loss is nearly sixty million dollars (Yan et al. 2017). Northwestern China is among the regions of the country most severely affected by landslide disasters.
The study area is located in southeastern Gansu province, northwestern China, between 32°35′43″ and 33°20′36″ N and 104°16′16″ and 105°27′29″ E, and has an area of approximately 4,994 km² (Figure 2). Elevation in the study area ranges from 559 to 4,177 m, and steep slopes are common. The study area has a typical warm temperate humid monsoon climate. The mean annual temperature is 15 °C, and the maximum and minimum mean annual temperatures are 24.8 and 3.6 °C, respectively. The maximum mean annual rainfall varies from 400 to 800 mm and is concentrated primarily between May and August (Qiao et al. 2017; Wang et al. 2018a). The Bailong and Baishui Rivers, running from west to east into the Jialing River, are among the larger river systems within Wen County. In the study area, landslides occur frequently due to special geomorphological, geological, and hydrogeological conditions. A total of 529 landslides were identified and mapped (Figure 2). These landslides are primarily distributed along roads or rivers and are concentrated in groups where shallow soil can slip and debris flows are common. Small-scale landslides (volume less than 0.1 million cubic meters) and medium-scale landslides (volume greater than 0.1 million but smaller than 10 million cubic meters) account for approximately 80% of all landslides. Slope failures in the study area are severe and frequent and have resulted in considerable loss of life and property for local residents. To date, little work has been done on landslide susceptibility and risk analysis in Wen County. Therefore, it is necessary to assess the landslide susceptibility of the study area.

Data
The landslide inventory map records the locations and characteristics of landslides (Chen et al. 2017b) and is the premise of landslide susceptibility analysis. In the study area, landslides were identified and mapped through interpretation of 1:50,000 scale aerial photography and field surveys. In addition, some historical landslide records were obtained from published literature (Xie 2013; Bi 2014). The landslide inventory was randomly divided into two datasets: 70% (370 landslides) was used for training the models and the remaining 30% (159 landslides) was used for validating them.
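The random 70/30 split described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `split_inventory`, the integer landslide IDs, and the fixed seed are all assumptions introduced here for reproducibility.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation subsets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..528 stand in for the 529 mapped landslides.
train, valid = split_inventory(range(529))
print(len(train), len(valid))
```

With 529 landslides and a 0.7 fraction, rounding gives 370 training and 159 validation landslides, which matches the counts reported in the text.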
Determination of landslide conditioning factors related to terrain, geology, and anthropogenic activities is essential for landslide susceptibility mapping. In the present study, eight thematic layers, including slope angle, altitude, normalized difference vegetation index (NDVI), distance to faults, distance to roads, distance to rivers, rainfall, and lithology, were prepared and analyzed.
Slope angle is a primary parameter in slope stability analysis because slope angle has a direct influence on shear strength. In this study, slope angle was calculated using ArcGIS 10.0 based on digital elevation model data and was divided into six slope categories (Figure 3a). Altitude is also frequently used in preparing landslide susceptibility maps because it is controlled by several geologic and geomorphological processes (Pradhan and Kim 2014; Wang and Li 2017; McQuillan et al. 2018). The elevation values in the study area were divided into seven categories using an interval of 400 m (Figure 3b). NDVI is often considered as a conditioning factor in landslide susceptibility mapping. In this study, the NDVI map was obtained from Landsat satellite imagery and was classified into five classes (Figure 3c).
In this study, the proximity to faults, roads, and rivers was used for landslide susceptibility analysis. Landslide distribution correlates strongly with the tectonic fractures that commonly decrease rock strength. Landslides occur primarily along faults, and the number of landslides decreases sharply with increasing distance from faults (Xu et al. 2012). In the study area, five distance-to-fault buffer zones were prepared (Figure 3d). Generally, landslides occur along road cuts due to excavation, additional load, hydrologic change, and drainage during the road construction process (Gayen and Saha 2017). Similarly, distance to rivers is an important landslide conditioning factor since runoff commonly decreases slope stability and leads to landslides by eroding slopes (Solaimani et al. 2013). Maps showing 200 m buffer zones around roads and rivers are shown in Figure 3e and f, respectively.
Rainfall, affecting slope stability through run-off and pore water pressure, is also widely considered as a primary triggering factor of landslides (Bourenane et al. 2016). In this study, a precipitation map was prepared using rainfall data from monitoring stations and was divided into five categories (Figure 3g). Lithology is also one of the most common determining factors in landslide susceptibility maps. Rocks may vary in strength and permeability because of structural variations (Bourenane et al. 2016; Wang and Chen 2017; Wang et al. 2018b). Lithologies of the geological units in the study area are given in Table 1 and Figure 3h.

Information value (IV) method
The information value (IV) method, originally proposed by Yin and Yan (1988) and slightly modified by van Westen (1993), is a statistical analysis method developed from information theory (Xu et al. 2013). This model has been widely used in landslide susceptibility analysis by various researchers (van Westen 1993; Bui et al. 2011; Bhandary et al. 2013; Xu et al. 2013; Bourenane et al. 2015). In this model, the IV for each parameter class is defined as the natural logarithm of the landslide density in a class divided by the landslide density of the entire map (Equation 1) (van Westen 1993; Bui et al. 2011):

$$I_i = \ln \frac{\mathrm{Densclass}}{\mathrm{Densmap}} = \ln \frac{N_{pix}(S_i) / N_{pix}(N_i)}{\sum_{i=1}^{n} N_{pix}(S_i) / \sum_{i=1}^{n} N_{pix}(N_i)} \tag{1}$$

where $I_i$ is the weight given to the class of a parameter; Densclass is the landslide density within a parameter class; Densmap is the total landslide density within the entire map; $N_{pix}(S_i)$ is the number of landslides in a certain parameter class; $N_{pix}(N_i)$ is the number of pixels in a certain parameter class; $\sum_{i=1}^{n} N_{pix}(S_i)$ is the total number of landslides in the entire map; $\sum_{i=1}^{n} N_{pix}(N_i)$ is the total number of pixels in the entire map; and $n$ is the number of classes in a parameter map.
The resulting weights of all thematic layers used in this study were calculated using ArcGIS 10.0 and Microsoft Excel (Table 2). The higher the weight, the higher the possibility that a landslide occurs within a given area.
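As an illustration of the definition above (natural logarithm of the class landslide density divided by the map landslide density), the following sketch computes the IV for every class of one parameter map. The function name and the pixel counts are hypothetical and are not taken from Table 2; returning `None` for a class without landslides is a choice made here, because the logarithm of zero is undefined.

```python
import math


def information_value(landslide_pixels, class_pixels):
    """Return the information value of each class of one parameter map.

    landslide_pixels[i]: landslide count (or landslide pixels) in class i
    class_pixels[i]:     total pixels in class i
    """
    dens_map = sum(landslide_pixels) / sum(class_pixels)
    values = []
    for s_i, n_i in zip(landslide_pixels, class_pixels):
        if s_i == 0:
            values.append(None)  # ln(0) is undefined; handling is an assumption
        else:
            values.append(math.log((s_i / n_i) / dens_map))
    return values


# Hypothetical counts for a four-class parameter map.
iv = information_value([40, 30, 20, 10], [1000, 2000, 3000, 4000])
print([round(v, 3) for v in iv])
```

In this made-up example the first class has four times the map density and receives a positive weight (ln 4), while the last class has a quarter of the map density and receives the corresponding negative weight.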

Weights-of-evidence (WofE) method
The weights-of-evidence (WofE) method is a statistical method using the Bayesian probability model, which was originally developed for mineral potential assessment (Bonham-Carter et al. 1988). The WofE method has been widely applied in many landslide susceptibility assessments (Van Westen 2000; Dahal et al. 2008; Hussin et al. 2016). A detailed description of the mathematical formulation of the method is given in several research papers (Bonham-Carter et al. 1988; Dahal et al. 2008; Sujatha et al. 2014; Rahmati et al. 2016). The WofE method is based on the determination of positive ($W^{+}$) and negative ($W^{-}$) weights. The method calculates the weight for each landslide conditioning factor (B) based on the presence or absence of landslides (L) within a specific area as follows (Bonham-Carter 1994):

$$W^{+} = \ln \frac{P\{B \mid L\}}{P\{B \mid \bar{L}\}} \tag{2}$$

$$W^{-} = \ln \frac{P\{\bar{B} \mid L\}}{P\{\bar{B} \mid \bar{L}\}} \tag{3}$$

where $P$ is the probability, ln is the natural logarithm, and an overbar denotes the absence of the factor class or of landslides. The $W^{+}$ and $W^{-}$ weights indicate positive and negative correlations between landslide occurrence and a variable, respectively (Bourenane et al. 2016). The weight contrast ($C$), considered to be a measure of the overall importance of a parameter map class (Hussin et al. 2016), is given by the difference between the two weights ($C = W^{+} - W^{-}$). The standard deviation of the contrast is calculated as follows:

$$S(C) = \sqrt{S^{2}(W^{+}) + S^{2}(W^{-})} \tag{4}$$

where $S^{2}(W^{+})$ and $S^{2}(W^{-})$ are the variances of the $W^{+}$ and $W^{-}$ weights, respectively.
The studentized contrast $W_f$, giving a measure of confidence, is defined as follows (Kayastha et al. 2012):

$$W_f = \frac{C}{S(C)} \tag{5}$$
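The following sketch shows one way the WofE quantities for a single factor class could be computed from pixel counts. It assumes the standard formulation in which the positive and negative weights are log ratios of conditional probabilities, the studentized contrast is the contrast divided by its standard deviation, and the variances are approximated by sums of reciprocal pixel counts. The variance approximation is not stated in the text and is an assumption here, as are the function name and the example counts.

```python
import math


def weights_of_evidence(n_bl, n_b, n_l, n_total):
    """WofE statistics for one factor class.

    n_bl: landslide pixels inside the class
    n_b: all pixels inside the class
    n_l: landslide pixels in the whole map
    n_total: all pixels in the whole map
    """
    n_b_nl = n_b - n_bl                    # class present, no landslide
    n_nb_l = n_l - n_bl                    # class absent, landslide
    n_nb_nl = n_total - n_b - n_l + n_bl   # class absent, no landslide
    n_nl = n_total - n_l                   # all non-landslide pixels

    w_plus = math.log((n_bl / n_l) / (n_b_nl / n_nl))
    w_minus = math.log((n_nb_l / n_l) / (n_nb_nl / n_nl))
    contrast = w_plus - w_minus

    # Assumed variance approximation: sums of reciprocal counts.
    var_plus = 1 / n_bl + 1 / n_b_nl
    var_minus = 1 / n_nb_l + 1 / n_nb_nl
    s_c = math.sqrt(var_plus + var_minus)
    return w_plus, w_minus, contrast, s_c, contrast / s_c


# Hypothetical counts: 40 of 100 landslide pixels fall in a 1,000-pixel class.
w_plus, w_minus, contrast, s_c, w_f = weights_of_evidence(40, 1000, 100, 10000)
print(round(w_plus, 3), round(w_minus, 3), round(w_f, 3))
```

The sketch does not guard against empty cells (a class with no landslides, or one containing all of them), which would need separate handling before taking logarithms or reciprocals.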

Certainty factor (CF) method
The certainty factor (CF) method, defined as a function of probability, has been widely considered and experimentally investigated for landslide susceptibility (Sujatha et al. 2012; Pourghasemi et al. 2013b; Liu et al. 2014). The CF method was originally proposed by Shortliffe and Buchanan (1975) and later modified by Heckerman (1986):

$$CF = \begin{cases} \dfrac{pp_a - pp_s}{pp_a (1 - pp_s)}, & pp_a \ge pp_s \\[2ex] \dfrac{pp_a - pp_s}{pp_s (1 - pp_a)}, & pp_a < pp_s \end{cases} \tag{6}$$

where $pp_a$ is the conditional probability of a landslide event occurring in class $a$, and $pp_s$ is the prior probability of the total number of landslides occurring in the study area (Binaghi et al. 1998). The certainty factor ranges from −1 to 1. A positive value indicates an increasing certainty of landslide occurrence, while a negative value corresponds to a decreasing certainty of landslide occurrence. A value close to 0 indicates that the prior probability is very similar to the conditional probability; hence, it is difficult to provide any information about the certainty of landslide occurrence (Kanungo et al. 2011; Sujatha et al. 2012; Devkota et al. 2013).
The CF values for each class of the selected factors were first calculated by Equation (6) (Table 2). CF values for the conditioning factors were then combined pairwise according to the CF combination rule (Chung and Fabbri 1993; Binaghi et al. 1998; Luzi and Pergalani 1999; Tsangaratos and Ilia 2016):

$$Z = \begin{cases} x + y - xy, & x, y \ge 0 \\[1ex] \dfrac{x + y}{1 - \min(|x|, |y|)}, & x, y \text{ of opposite sign} \\[2ex] x + y + xy, & x, y < 0 \end{cases} \tag{7}$$

where $x$ and $y$ are two different layers of information, and $Z$ is the obtained CF value. Using the parallel-combination function (Equation 7), the pairwise combination is repeated until all CF layers are brought together to obtain the landslide susceptibility index (LSI).
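A minimal sketch of the CF calculation and the pairwise combination rule follows. It assumes the commonly cited piecewise CF definition (difference between conditional and prior probability, normalized differently depending on which is larger) and the three-case combination rule; the function names and the numeric inputs are illustrative assumptions, not values from the study.

```python
from functools import reduce


def certainty_factor(pp_a, pp_s):
    """CF of a class from its conditional (pp_a) and prior (pp_s) probability."""
    if pp_a == pp_s:
        return 0.0  # no information gained; also avoids 0/0 when both are 0
    if pp_a > pp_s:
        return (pp_a - pp_s) / (pp_a * (1 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1 - pp_a))


def combine(x, y):
    """Combine the CF values of two layers at one location."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    # Opposite signs; undefined when one value is 1 and the other is -1.
    return (x + y) / (1 - min(abs(x), abs(y)))


def combine_layers(cf_values):
    """Fold the pairwise rule over the CF values of all layers."""
    return reduce(combine, cf_values)


print(round(certainty_factor(0.04, 0.01), 4))   # class denser than the map
print(round(certainty_factor(0.0, 0.01), 4))    # class with no landslides
print(round(combine_layers([0.5, 0.5, -0.25]), 4))
```

A class with no landslides receives a CF of −1 under this definition, which is consistent with a lower bound of −1 for the combined index.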

Landslide susceptibility analysis
The weight values for different classes of conditioning factors are given in Table 2, indicating the importance of each factor. Weight values are assigned to each factor layer to generate weighted thematic maps. The LSI value was calculated by summation of each factor's ratings using Equation (8):

$$LSI = \sum W_C \tag{8}$$

where $W_C$ is the weight of each class of factors. In this study, LSI values were calculated using ArcGIS 10.0. The LSI values based on the IV, WofE, and CF methods vary from −8.5 to 8.0, −64.6 to 90.0, and −1 to 0.996, respectively. Higher LSI values indicate higher susceptibility to landslides, while lower LSI values indicate lower susceptibility. LSI values were classified using the natural breaks method and grouped into four susceptibility classes: low, moderate, high, and very high (Figures 4, 5, and 6). The percentage of observed landslides in each susceptibility class was counted (Figure 7).
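The summation of class weights into an LSI value can be sketched for a single pixel as below. The factor names, class labels, and weights are hypothetical placeholders rather than entries from Table 2, and the study itself performed this step on raster layers in ArcGIS. The natural breaks classification is not reproduced here.

```python
def landslide_susceptibility_index(pixel_classes, class_weights):
    """Sum, over all factors, the weight of the class the pixel falls in."""
    return sum(class_weights[factor][cls] for factor, cls in pixel_classes.items())


# Hypothetical weights for two factors with two classes each.
weights = {
    "slope": {"20-30": 0.5, ">50": -1.2},
    "road_distance": {"<200": 1.5, ">1000": -0.8},
}

near_road = {"slope": "20-30", "road_distance": "<200"}
remote_steep = {"slope": ">50", "road_distance": ">1000"}

lsi_near = landslide_susceptibility_index(near_road, weights)
lsi_remote = landslide_susceptibility_index(remote_steep, weights)
print(lsi_near, lsi_remote)
```

The pixel close to a road on a moderate slope ends up with the larger index, so it would be ranked as more susceptible than the remote, very steep pixel in this toy setup.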

Validation of susceptibility maps
The area under the curve (AUC) method using success and prediction rate curves (Razavizadeh et al. 2017) was used to validate the evaluation results. The rate curves show the cumulative percentage of observed landslide occurrences (y-axis) versus the cumulative percentage of decreasing LSI value (x-axis) (Kamp et al. 2010; Kayastha et al. 2012). The AUC value indicates the prediction accuracy of the susceptibility map, with larger values indicating higher accuracy (Intarawichian and Dasananda 2011). In this study, success rate and prediction rate curves were created using the training data and validation data, respectively (Figure 8). The CF method had the highest success rate of 89.04%, followed by IV (88.02%) and WofE (87.56%) (Figure 8). Prediction rate curves show that the prediction ability is highest for CF (87.44%), followed by WofE (86.34%) and IV (86.24%). The methods employed in this study are reasonably accurate in predicting landslide susceptibility in the study area. The map produced by the CF model exhibited the best result for the purpose of landslide susceptibility mapping.
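One way to compute the area under a success or prediction rate curve is sketched below. It assumes the common convention in which pixels are ranked by decreasing LSI, the cumulative share of ranked pixels is accumulated along the horizontal axis, the cumulative share of landslide pixels along the vertical axis, and the area is obtained with the trapezoidal rule. The function name and the toy data are assumptions, and ties in LSI are broken arbitrarily by the sort.

```python
def rate_curve_auc(lsi_values, is_landslide):
    """Area under the cumulative landslide share vs. ranked-area share curve."""
    order = sorted(range(len(lsi_values)), key=lambda i: lsi_values[i], reverse=True)
    n_pixels = len(order)
    n_landslides = sum(is_landslide)

    auc = 0.0
    prev_x = prev_y = 0.0
    hits = 0
    for rank, i in enumerate(order, start=1):
        hits += is_landslide[i]
        x = rank / n_pixels        # cumulative share of area, highest LSI first
        y = hits / n_landslides    # cumulative share of landslides captured
        auc += (x - prev_x) * (y + prev_y) / 2  # trapezoid between points
        prev_x, prev_y = x, y
    return auc


# Toy data: four pixels, two of which contain landslides.
good_map = rate_curve_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
bad_map = rate_curve_auc([0.1, 0.3, 0.8, 0.9], [1, 1, 0, 0])
print(good_map, bad_map)
```

Passing the training landslides gives a success rate value and passing the validation landslides gives a prediction rate value; a map that ranks landslide pixels first scores higher than one that ranks them last.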

Results and discussion
The correlation between landslide locations and landslide conditioning factors, that is, slope angle, altitude, NDVI, distance to faults, distance to roads, distance to rivers, rainfall, and lithology, was determined using the IV, WofE, and CF methods (Table 2). Given that higher values of $I$, $W_f$, and CF indicate a stronger association between a conditioning factor class and landslide occurrence, the distance to roads has the highest impact on landslide susceptibility, followed by altitude, distance to rivers, and NDVI. Lithology is comparatively less significant for determining LSI. In the study area, the topographic and geological conditions are complex, and road construction activity greatly alters the topography, original slope conditions, geological conditions, and natural stability. Therefore, the probability of landslide occurrence is highest near roads. The difference in lithology across the study area is not significant, indicating that lithology has a smaller influence on landslides. Landslides are most common on slopes from 20° to 30°, and the lowest values of $I$, $W_f$, and CF are for slopes >50° (Table 2). Landslide occurrence thus increases with slope angle up to a certain point and then decreases. A similar pattern has also been observed in published studies (Jaafari et al. 2014; Wang et al. 2015). Landslide occurrence is highest at altitudes <900 m, where the maximum values ($I$ = 1.978, $W_f$ = 20.295, and CF = 0.862) are found, indicating that the probability of landslide occurrence is greater at altitudes <900 m than at altitudes >900 m. NDVI values below 0.041 are relatively favorable for landslide occurrence (highly susceptible), indicating that relatively low vegetation coverage can easily lead to landslides.
Regarding distance to rivers and roads, distances <200 m have the highest $I$, $W_f$, and CF values, and distances >1,000 m show a very low LSI, indicating that the smaller the distance to a river or road, the more susceptible that area is to landsliding. Landslide occurrence probability is higher in areas where the distance to faults is <1,000 m. For rainfall, most landslides occurred in the class <500 mm/year. The relation between lithology and landslide probability shows that group 4 (see Table 1 for details) has the highest values, indicating that the probability of landslide occurrence is highest in this lithological unit.
High and very high susceptibility areas are primarily distributed around rivers, whereas low and moderate susceptibility areas are primarily distributed in high-altitude alpine areas. High and very high susceptibility zones cover more than 20% of the study area (Figure 7), and the percentages of very high susceptibility areas from the IV, WofE, and CF methods are 10.98, 10.13, and 15.39%, respectively. The percentages of the total landslides in the very high susceptibility class in the maps produced by the IV, WofE, and CF methods are 64.08, 60.87, and 75.80%, respectively (Figure 7). In addition, the validation results show that the accuracies of the three methods are not very different from each other and lie in the same range. This suggests that the selected factors play a more important role than the choice of method.

Conclusions
Landslide susceptibility mapping is a fundamental tool for disaster management in mountainous terrains. In this paper, the influence of landslide conditioning factors, including slope angle, altitude, NDVI, distance to faults, distance to roads, distance to rivers, rainfall, and lithology, on landslides and their potential distribution in Wen County of northwestern China was analyzed using the IV, WofE, and CF methods. During this process, landslide locations were identified using aerial-photo interpretation supported by available literature and field surveys. Of the 529 identified landslides, 70% (370 landslides) were used as training data and the remaining 30% (159 landslides) were used to validate the methods. Landslide susceptibility maps produced by the IV, WofE, and CF methods were classified into four classes (low, moderate, high, and very high) using the natural breaks method. The resulting landslide susceptibility maps were validated by comparison with known landslide locations. According to the AUC function, the training accuracy was 0.8802 (88.02%), 0.8756 (87.56%), and 0.8904 (89.04%) for the IV, WofE, and CF methods, respectively. In addition, the CF model showed higher prediction performance (87.44%) than the IV (86.24%) and WofE (86.34%) methods. Thus, the performance of the susceptibility map produced by the CF method was slightly higher than that of the maps produced by the IV and WofE methods. Each of the three methods is useful in analyzing the relationship between landslides and the factors influencing them. In this study area, areas near roads are the most susceptible to landslides. It is recommended that a landslide control project be carried out employing anti-slide piles, slope cutting, and vegetation planting along roads in the high susceptibility zones. In addition, the evaluation results can be used as guidelines for policy makers regarding prevention and mitigation of landslide hazards.