Large-scale assessment of landslide hazard, vulnerability and risk in China

Abstract Landslides (generally including debris flows) seriously threaten life and property, and studies of landslide risk are of great significance for regional disaster prevention and mitigation. In this study, a 1 km × 1 km grid was selected as the assessment unit to exhibit the spatial distribution of landslide risk in China. With the support of Geographic Information System (GIS) techniques, a combination of the certainty factor (CF) model and the logistic regression (LR) model was applied to assess landslide hazard, Liu's method was used to evaluate landslide vulnerability, and the product of hazard and vulnerability was used to represent landslide risk. The spatial patterns of landslide hazard, vulnerability and risk were mapped at the national scale. The results indicate that high hazard zones are commonly distributed south of the Yangtze River, and that very high hazard zones are concentrated in southwestern China. High vulnerability zones are mainly distributed in the eastern coastal cities, which have developed economies and dense populations. The distribution of landslide risk is approximately divided by the Heihe-Tengchong population density line: the area west of the line is mainly low risk, whereas the area east of the line contains moderate and high risk areas. The very low, low, moderate, high and very high risk classes account for 40.51%, 18.34%, 33.86%, 7.28% and 0.01% of the total area, respectively. Macro-scale regional assessment is needed in China and may benefit the top-level design of landslide risk reduction and territorial functional zoning.


Introduction
Risk assessment is an effective approach to mitigating natural hazards and reducing landslide disasters. Over the past years, studies on geological hazard risk have made significant progress. Risk assessments at site-specific, local and regional scales have been carried out and updated in different areas around the world (Wilson and Crouch 1987; China Earthquake Administration, Editorial Department of Journal of Natural Disasters 1992; Alexander 1993; Blaikie et al. 1994; Shook 1997; Ren 1999). Meanwhile, landslide risk management has also been further strengthened (Dai et al. 2002; Liu and Lei 2003). Shi et al. (2006) used a semi-quantitative method to regionalize the risk zones for the natural hazards affecting Chinese cities. Gentile et al. (2008) combined a hazard map with field surveys and an empirical model to evaluate debris-flow risk in southern Italy. Kawagoe et al. (2009) assessed the landslide risk caused by snowmelt in Japan, analysed the influence of hydrological, geological and geomorphological factors on landslide susceptibility, and constructed a probability model for landslide risk by multiple logistic regression analysis. Based on a review of the debris-flow risk literature, Wang et al. (2012) argued that the existing models of debris-flow risk assessment might not be applicable to the debris flows triggered by the Wenchuan earthquake of 12 May 2008 in China. Kritikos and Davies (2015) applied a fuzzy logic method to evaluate shallow landslides in the Southern Alps and obtained a regional sensitivity map for land-use planning and landslide disaster mitigation. Using a semi-quantitative method, Althuwaynee and Pradhan (2017) assessed landslide risk in densely populated areas lacking historical data, so that the proposed method could approximate landslide risk in a data-scarce environment.
Building on the first-generation landslide risk map of China (Liu et al. 2011), this study aims to update the risk map and renew its macro-scale spatial distribution through a combined assessment method, so as to provide the latest findings for the country's strategy on landslide prevention and risk reduction.

Materials and methods
Topography, geological structure, meteorology, hydrology, vegetation, soil and human activities are the common environmental drivers that control geological hazard activity, while the factors derived from these drivers influence different types of geological hazards in different ways (Zhang et al. 2004). In the present study, nine factors of general significance for the formation, distribution and evolution of landslides were extracted from the environmental drivers: slope, lithofacies, geological age, fault distance, average annual precipitation, monthly rainfall variation coefficient, average annual days of rainstorm ≥50 mm/24 h, ground motion peak acceleration, and land cover. These were first considered as candidate factors for assessing landslide hazard. Meanwhile, economic density (Gross Domestic Product per unit area, GDP/km²), population density and land-use type were selected as the indexes to quantify landslide vulnerability. Over the more than 9.6 million km² of China's territory, the 1 km × 1 km assessment unit created more than 9.6 million assessment grids, a resolution high enough for macro-regional assessment. Landslide risk and its spatial distribution were mapped and analysed with Geographic Information System (GIS) techniques. This work may establish a scientific basis for the top-level design of landslide risk management in China.

Data sources and processing
The landslide data came from the China Geological Disaster Bulletin, the China Geological Environment Bulletin, and the Geostress and Geological Hazards Database of China. The numbers and locations of super-huge and huge landslides came from historical archives. The categories of geological hazards, based on the Regulations on Geological Hazard Prevention of China, are listed in Table 1. After the text data were catalogued, the spatial locations of the landslides were plotted on a map of China and a vector layer was generated with ArcGIS 10.2.
Rainfall data from 194 meteorological stations in China were interpolated by the Kriging method to obtain maps of average annual precipitation, monthly rainfall variation coefficient, and average annual days of rainstorm ≥50 mm/24 h, which were then converted into 1 km × 1 km grid data.
Land cover data for China were extracted from the land cover product of the European Space Agency (ESA). After the projection was converted, the 22 land cover types used by the ESA were reclassified into 6 Chinese land cover categories: (1) unused land; (2) water; (3) grassland; (4) forestland; (5) cultivated land and (6) industrial and residential land, and the Chinese land cover map was produced on a 1 km × 1 km grid. Economic density (GDP/km²) and population density were used as surrogate socioeconomic indexes representing vulnerability. The data came from the State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research of the Chinese Academy of Sciences (www.resdc.com). The spatial analysis functions of ArcGIS 10.2 were applied to distribute the economic and population densities evenly over the 1 km × 1 km grids, yielding the maps of economic density and population density.

Assessment methods
Adopting the risk definition published by the United Nations Department of Humanitarian Affairs (UNDHA) (1991), landslide risk may be calculated by Equation (1):
$$R = H \times V \tag{1}$$
where R is risk; H is hazard; V is vulnerability; all of the values range from 0 to 1.

Hazard assessment model
Statistical models are widely used in the hazard assessment for landslides (Shahabi and Hashim 2015). In this study, the statistical model was a combination of the certainty factor model (CF model) and the logistic regression model (LR model).
The CF model presumes that the conditions under which landslides form in the future are similar to those of the past (Shortliffe and Buchanan 1975; Heckerman 1986). The CF value is defined for each class of an influential factor from the proportion of the number (or area) of landslides in that class, and represents the contribution of that class to the hazard. CF values may be calculated by Equation (2):
$$
CF =
\begin{cases}
\dfrac{PP_a - PP_s}{PP_a\,(1 - PP_s)}, & PP_a \ge PP_s \\[2ex]
\dfrac{PP_a - PP_s}{PP_s\,(1 - PP_a)}, & PP_a < PP_s
\end{cases}
\tag{2}
$$
where $PP_a$ is the probability of landslide occurrence in class a, expressed as the proportion of the landslide number (or area) in class a to the total area of class a; and $PP_s$ is the proportion of the total number (or area) of landslides to the total area of the studied region. The CF values fall within [-1, 1]. A positive value means an increased certainty of landslide occurrence, where the environmental drivers are prone to landslides. A negative value means a decreased certainty of landslide occurrence, where the environmental drivers are not prone to landslides.
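The CF calculation for a single factor class can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard piecewise certainty-factor form (Heckerman 1986); the function name `certainty_factor` and the densities in the usage lines are hypothetical and are not taken from the study's data.

```python
def certainty_factor(pp_a, pp_s):
    """Certainty factor of one factor class (standard piecewise form, assumed).

    pp_a: landslide proportion within class a (landslides in a / area of a).
    pp_s: landslide proportion over the whole study region.
    """
    if not (0.0 <= pp_a < 1.0 and 0.0 < pp_s < 1.0):
        raise ValueError("expected 0 <= pp_a < 1 and 0 < pp_s < 1")
    if pp_a >= pp_s:
        # Class is at least as landslide-prone as the region: CF in [0, 1].
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    # Class is less landslide-prone than the region: CF in [-1, 0).
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))


# Hypothetical proportions, for illustration only.
cf_prone = certainty_factor(0.02, 0.01)    # class twice as dense as the region
cf_stable = certainty_factor(0.005, 0.01)  # class half as dense as the region
cf_empty = certainty_factor(0.0, 0.01)     # class with no recorded landslides
```

A class with no recorded landslides reaches the lower bound of -1, and a class whose landslide proportion equals the regional proportion yields 0.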
The LR model describes the relationship between a binary dependent variable and a set of independent variables. For the dependent variable, "0" indicates that a landslide did not occur and "1" indicates that a landslide did occur. The independent variables are the influential factors ($x_1, \ldots, x_9$). The logistic regression function may be expressed as Equation (3):
$$\frac{P}{1 - P} = e^{\,a + b_1 x_1 + b_2 x_2 + \cdots + b_9 x_9} \tag{3}$$
where P is the probability of landslide occurrence (0-1); a is the intercept; and $b_i$ is the regression coefficient of factor $x_i$. Taking the natural logarithm of both sides of Equation (3) gives ln(P/(1-P)). With ln(P/(1-P)) as the dependent variable Y, the influential factors $x_i$ (i = 1, 2, …, 9) become the independent variables of the linear regression expressed in Equation (4):
$$Y = \ln\frac{P}{1 - P} = a + b_1 x_1 + b_2 x_2 + \cdots + b_9 x_9 \tag{4}$$
The results of the significance test for each influential factor are shown in Table 2. The significance of the independent variables is determined by the p value (or Wald value). In general, a p value of less than 0.05 indicates statistical significance. As seen from Table 2, the p values of average annual precipitation ($x_5$) and land cover ($x_9$) are greater than 0.05, indicating that these two factors are not significant, so they were eliminated. The remaining 7 influential factors were substituted into Equations (3) and (4), and the combined CF and LR model was obtained as Equation (5). The combined model was used for assessing landslide hazard in China.
where P is landslide hazard; $x_1$ is the CF value of the class of slope; $x_2$ is the CF value of the class of lithofacies; $x_3$ is the CF value of the geological age of the lithofacies; $x_4$ is the CF value of the class of fault distance; $x_6$ is the CF value of monthly rainfall variation coefficient; $x_7$ is the CF value of average annual days of rainstorm ≥50 mm/24 h; $x_8$ is the CF value of the class of ground motion peak acceleration.
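How CF values feed the logistic model can be sketched as follows. The intercept and coefficients below are placeholders chosen purely for illustration; they are not the fitted values of Equation (5), and the function name `landslide_hazard` and the factor keys are likewise hypothetical.

```python
import math


def landslide_hazard(cf_values, intercept, coefficients):
    """Logistic hazard P in (0, 1) computed from the CF values of the retained factors."""
    y = intercept + sum(coefficients[name] * cf_values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-y))


# Placeholder parameters (NOT the study's fitted coefficients).
example_intercept = -1.0
example_coefficients = {"slope": 1.2, "fault_distance": 0.8}

# CF values of the classes that one grid cell falls into (hypothetical).
cell_cf = {"slope": 0.5, "fault_distance": -0.25}
p_cell = landslide_hazard(cell_cf, example_intercept, example_coefficients)
```

Because each CF value lies in [-1, 1], all factors enter the regression on a common scale, which is one practical reason for feeding CF values rather than raw factor classes into the LR model.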

Vulnerability assessment model
Vulnerability is the potential maximum total loss due to a potentially damaging phenomenon for a specified area and during a reference period (UNDHA 1991; Liu and Lei 2003). Referring to the simplified vulnerability assessment model of Liu et al. (2011, 2012), the landslide vulnerability in China may be expressed as Equation (6).
where V is vulnerability; G is the normalized GDP density; L is the assigned value of land-use type; D is the normalized population density. GDP density and population density were normalized by the range transformation. Land-use type was normalized using the assigned values of 0, 0.2, 0.4, 0.6, 0.8 and 1 for unused land, water, grassland, forestland, farmland, and industrial and residential land, respectively.
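The normalization of the three vulnerability inputs can be sketched as below. The range transformation is assumed here to be the usual min-max scaling, (x - min) / (max - min); the names `range_normalize` and `LAND_USE_VALUE` and the sample densities are hypothetical. The sketch stops at the normalized inputs and does not combine them into V.

```python
def range_normalize(values):
    """Min-max (range) transformation of a sequence to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all cells identical, so no spread to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Assigned land-use values as listed in the text.
LAND_USE_VALUE = {
    "unused land": 0.0,
    "water": 0.2,
    "grassland": 0.4,
    "forestland": 0.6,
    "farmland": 0.8,
    "industrial and residential land": 1.0,
}

# Hypothetical GDP densities for three grid cells.
gdp_density = [10.0, 20.0, 30.0]
g_normalized = range_normalize(gdp_density)
l_value = LAND_USE_VALUE["farmland"]
```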
Using Equation (6), landslide vulnerability was obtained for each 1 km × 1 km grid. The vulnerability was also divided into five classes: very low vulnerability (0-0.2), low vulnerability (0.2-0.4), moderate vulnerability (0.4-0.6), high vulnerability (0.6-0.8) and very high vulnerability (0.8-1). The landslide vulnerability map of China is shown in Figure 2.
Using Equation (1), landslide risk was calculated for each 1 km × 1 km grid. Based on the classes of hazard and vulnerability, the risk was correspondingly divided into five classes: very low risk (including non-risk) (0-0.04), low risk (0.04-0.16), moderate risk (0.16-0.36), high risk (0.36-0.64) and very high risk (0.64-1). The risk assessment results are shown in Table 3, and the landslide risk map of China is shown in Figure 3.
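The risk calculation and classification for one grid cell can be sketched as follows. The risk breaks equal the squares of the 0.2-step class limits (0.2² = 0.04, 0.4² = 0.16, 0.6² = 0.36, 0.8² = 0.64). The text does not state which class a value exactly on a break belongs to; this sketch assumes it goes to the higher class. The names `landslide_risk` and `risk_class` are hypothetical.

```python
from bisect import bisect_right

RISK_BREAKS = [0.04, 0.16, 0.36, 0.64]
RISK_CLASSES = ["very low", "low", "moderate", "high", "very high"]


def landslide_risk(hazard, vulnerability):
    """Risk as the product of hazard and vulnerability, both in [0, 1]."""
    if not (0.0 <= hazard <= 1.0 and 0.0 <= vulnerability <= 1.0):
        raise ValueError("hazard and vulnerability must lie in [0, 1]")
    return hazard * vulnerability


def risk_class(risk):
    """Map a risk value to one of the five classes (break values go to the higher class)."""
    return RISK_CLASSES[bisect_right(RISK_BREAKS, risk)]


# A cell with very high hazard but very low vulnerability ends up low risk.
western_cell = risk_class(landslide_risk(0.9, 0.1))
# A cell with high hazard and high vulnerability ends up high risk.
eastern_cell = risk_class(landslide_risk(0.7, 0.8))
```

The first usage line illustrates why a product-based risk can stay low in sparsely populated areas even where the hazard is high.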

Hazard model analysis
To evaluate the reliability of the hazard results, the rationality and accuracy of the hazard assessment model, i.e. the CF & LR combined model, should be tested. The rationality test was carried out by examining the distribution of actual landslide spots across the hazard classes. The test spots were randomly selected in ArcGIS: a total of 338 of the 1260 existing landslide spots (26.83% of the total) were used for the rationality test.
The rationality of the model was inspected on the basis of the following conditions:
1. The larger the percentage of landslide spots in the higher hazard zones relative to the lower hazard zones, the better the rationality of the model.
2. The larger the percentage of the total study area occupied by the lower hazard zones, the better the rationality of the model.
3. The ratio ($R_{ei}$) of the percentage of landslide spots in each hazard class ($G_{ei}$) to the area percentage of that hazard class in the total study area ($S_{ai}$) should increase with the hazard class (where i = I, II, III, IV, V represents the hazard classes from very low to very high), so that a reasonable hazard assessment should satisfy $R_{eI} < R_{eII} < R_{eIII} < R_{eIV} < R_{eV}$.
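Condition (3) can be checked mechanically once the two percentage columns are known. The sketch below uses made-up percentages (not the values of Table 4), and the function names are hypothetical.

```python
def rationality_ratios(spot_pct, area_pct):
    """R_ei = G_ei / S_ai for each hazard class, ordered very low -> very high."""
    if len(spot_pct) != len(area_pct):
        raise ValueError("both lists must cover the same hazard classes")
    return [g / s for g, s in zip(spot_pct, area_pct)]


def strictly_increasing(seq):
    """True when every element is smaller than the one that follows it."""
    return all(a < b for a, b in zip(seq, seq[1:]))


# Hypothetical percentages for classes I..V (each list sums to 100).
spot_pct = [5.0, 10.0, 20.0, 35.0, 30.0]   # G_ei: share of landslide spots
area_pct = [40.0, 25.0, 20.0, 10.0, 5.0]   # S_ai: share of study area

ratios = rationality_ratios(spot_pct, area_pct)
condition_3 = strictly_increasing(ratios)
```

In this made-up example the share of spots in the very high class is smaller than in the high class, yet the ratio still rises monotonically because the very high class occupies a much smaller area.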
The rationality of the CF & LR combined model for landslide hazard assessment is shown in Table 4. The hazard assessment results generally satisfy test condition (1): the percentage of landslide spots increases in turn from the very low to the very high hazard class, except that the percentage in the very high hazard class is slightly smaller than that in the high hazard class. Furthermore, the results fully satisfy test conditions (2) and (3): $S_{ai}$ increases in turn from the very high hazard class to the very low hazard class, and $R_{eI} < R_{eII} < R_{eIII} < R_{eIV} < R_{eV}$. The CF & LR combined model is therefore reasonable for the hazard assessment of landslides in China.
The accuracy of the assessment results of the CF & LR combined model was evaluated by the Receiver Operating Characteristic (ROC) curve. The ROC curve is a graphical method for evaluating the effectiveness of a classification; it is drawn with Sensitivity as the ordinate and 1-Specificity as the abscissa over a series of binary classifications. The area under the ROC curve is the AUC value, an index of model accuracy. For a model that performs no worse than random guessing, the AUC value lies within [0.5, 1], and a greater value indicates better discrimination (accuracy). The ordinate (Sensitivity) represents the proportion of hazard assessment units containing actual landslides that the model evaluates as having landslide spots. The abscissa (1-Specificity) represents the proportion of hazard assessment units without landslides that the model nevertheless evaluates as having landslide spots. Based on the landslide database, the 1260 recorded super-huge and huge landslides were used as checkpoints for the accuracy of the CF & LR combined model. Figure 4 shows that the AUC value of the CF & LR combined model is 0.92, which indicates that the model has a high accuracy for macro-scale regional assessment of landslides and is applicable to landslide hazard assessment in China. As a result, the hazard calculated by the CF & LR combined model may be used to assess landslide risk in China.
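The AUC value can be illustrated without drawing the curve, using its equivalent rank interpretation: the probability that a randomly chosen landslide unit receives a higher hazard score than a randomly chosen non-landslide unit. The sketch below is a brute-force illustration on made-up scores, not the procedure or software used in the study; the pairwise loop is only practical for small samples.

```python
def roc_auc(scores, labels):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one non-landslide unit")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical hazard scores for six units; label 1 = landslide recorded.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to scores that rank landslide and non-landslide units no better than chance, and 1 corresponds to perfect separation.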
The very high hazard areas of landslides are mainly distributed in southwestern China. The high hazard areas are mostly located south of the Yangtze River. The moderate and low hazard areas make up the majority in China. The very low hazard areas are distributed primarily in the northwestern and western provinces.

Hazard distribution and description
Figure 5 presents the hazard classes and their composition for each province. The very high hazard area is concentrated in Sichuan, Guizhou and Yunnan provinces. In particular, Yunnan has a vast area of very high hazard, accounting for 69.26% of the provincial area. Hunan and Guangxi also have large areas of high hazard, accounting for 25.90% and 27.64% of their provincial areas, respectively. Zhejiang, Fujian and Chongqing are predominantly moderate hazard areas, which account for 31.93%, 39.20% and 41.24% of their areas, respectively. Shanxi, Jiangxi, Guangdong and Hainan principally consist of low hazard areas, which account for more than 30% of their provincial areas. The other 19 provinces have mainly very low hazard areas; among them, almost the whole of Shanghai is very low hazard (including non-hazard).

Vulnerability distribution and description
The vulnerability classes and their composition are relatively simple, and most provinces fall in the very low, low and moderate classes (Figure 5). Very high vulnerability zones are sparsely distributed in eastern China, in areas that are highly developed and densely populated. High vulnerability areas are partially distributed in the provincial capitals and their neighbouring cities, most prominently in Beijing-Tianjin-Tangshan, the Yangtze River Delta and the Pearl River Delta. Moderate vulnerability zones are mainly distributed in northern China and in the Sichuan Basin. Low and very low vulnerability zones are located in northwestern and southwestern China, which are relatively undeveloped and have low population density. Specifically, Beijing, Tianjin, Hebei, Shanxi, Shanghai, Jiangsu, Anhui, Shandong, Henan, Hubei and Shaanxi each have moderate vulnerability areas covering more than half of their total area.
In particular, Shandong has the largest area of moderate vulnerability, accounting for 88.77% of its total. The other 16 provinces are predominantly low vulnerability areas; among them, Fujian has the largest area of low vulnerability, accounting for 83.38% of its total. Inner Mongolia, Gansu, Ningxia and Xinjiang are dominated by very low vulnerability zones, which account for 51.00%, 49.58%, 51.76% and 79.93% of their total areas, respectively.

Risk distribution and description
The risk distribution of landslides is roughly demarcated by the Heihe-Tengchong population density line. The area west of the line is mainly low risk; however, there are high risk zones in southern Gansu, western Sichuan and eastern Tibet, because these areas lie in the transition zone between the first and second geomorphologic terraces of China, where strong tectonic activity has formed great topographic relief. The area east of the line is mainly moderate and high risk, especially southeastern China, which is densely populated and has a high economic density; this area therefore has high vulnerability, and landslides there carry a high risk (Figure 3). As seen from Table 3, the area of very high risk is approximately 668 km², accounting for only 0.01% of China; high risk areas are mainly distributed in the Sichuan Basin and its surrounding regions, accounting for 7.28% of China; moderate risk areas are mostly located south of the Yangtze River, accounting for 33.86% of China; and very low and low risk areas are mainly distributed in northwestern, northeastern and northern China, accounting for 40.51% and 18.34% of the total area, respectively. The risk classes and their composition for each province are shown in Table 5. Very high risk areas account for 20%-40% of Chongqing, Sichuan, Guizhou and Yunnan; among them, Guizhou has a vast area of very high risk, accounting for 30.07% of its total. High risk areas account for more than 60% of the total in Fujian, Guangxi and Yunnan; among them, Fujian has the largest area of high risk, accounting for 75.27% of its total. The provinces in which high risk areas account for 40%-60% of the total include Zhejiang, Jiangxi, Hunan, Guangdong, Chongqing, Sichuan and Guizhou. Moderate risk areas exceed 20% of the provincial area only in Shanxi, Jiangxi, Guangdong, Hainan, Tibet and Gansu.
Low risk areas are prominent in Shanghai and Jiangsu, accounting for 64.82% and 47.80% of their provincial areas, respectively. Another 6 provinces, namely Liaoning, Heilongjiang, Anhui, Shandong, Henan and Hainan, have 20%-40% low risk area. Very low risk areas cover a vast region of China, including the other 16 provinces, of which Xinjiang has the largest area, accounting for 88.27% of its total. Large-magnitude landslides occur frequently in western China, so the hazard there is high; however, the relatively undeveloped economy and low population density give western China a relatively low vulnerability. Therefore, when hazard and vulnerability are combined, landslide risk in western China is not yet very high at present. The spatial pattern of landslide risk is distinctly divided by the Heihe-Tengchong population density line: the area west of the line is mainly low risk, whereas the area east of the line is mainly moderate and high risk.
For the country as a whole, because the proportion of very high and high risk areas is extremely small and localized, China is not yet a high risk country for landslides at present, except for some areas. However, with the rapid socioeconomic development predicted for the next decade, regional economic differences will be minimized and vulnerability will increase greatly, so landslide risk will rise persistently. As the overlap between high hazard and high vulnerability areas in western China enlarges, the high risk area will move across the Heihe-Tengchong population density line from the east toward the west of China, and the overall landslide risk will increase. Hence, landslide risk reduction and mitigation are urgently needed.

Conclusions
Landslides are the main geological hazards and occur across the vast mountainous regions of China. Zoning of landslide hazard, vulnerability and risk is necessary, and the risk map needs to be updated at intervals of approximately 10 years. The assessment method proposed in this study can evaluate landslides on a macro-scale, which will be beneficial not only for hazard mitigation in China but also as a practical reference for geological hazard management and land-use planning in other countries.
Note: Dots represent the area percentages of the risk classes in the total area of each province; dot sizes from smallest to largest represent 0-20%, 20-40%, 40-60%, 60-80% and 80-100%, respectively.
At present, the five classes of landslide risk display a non-normal distribution, showing that many provinces have high or low risk rather than moderate risk. Only a few provinces have very high risk, and most provinces have very low risk. Specifically, Fujian, Guangxi and Yunnan have the highest risk, followed by the 7 provinces of Zhejiang, Jiangxi, Hunan, Guangdong, Chongqing, Sichuan and Guizhou. These provinces should pay close attention to strict land-use planning, especially in the top-level design of territorial control.
The spatial pattern of landslide risk in China approximately takes the Heihe-Tengchong population density line as its boundary. The area west of the line is mainly low risk; the area east of the line contains mainly moderate and high risk areas. Some high risk areas are also sporadically distributed west of the line. The areas of very low and low risk account for 40.51% and 18.34% of China, respectively, whereas the areas of very high and high risk account for only 0.01% and 7.28%, respectively. Therefore, as a whole, China is currently not yet a high risk country for landslides, except for some localized terrains. However, the overall risk will tend to increase in the next decade, and the high risk areas will cross the demarcation line and expand into western China in the following years. Precautions and countermeasures should be taken immediately.