Mapping debris flow susceptibility based on watershed unit and grid cell unit: a comparison study

Abstract Debris flow susceptibility analysis is a prerequisite of risk assessment. The main objective of this study was to explore the accuracy and practicability of mapping units for evaluation of debris flow susceptibility. These units include grid cell units (GCUs), and watershed units (WUs) with the flow thresholds 10 000 (WU 10 000) and 5000 (WU 5000). The frequency ratio (FR) model was selected as the statistical method. Yongji County (YJC) of Jilin Province, China was selected as the research site, and a total of 123 debris flow disasters were surveyed. Eight influencing factors were considered and a total of three models were constructed. The predictive capabilities of the models were verified using an ROC curve and AUC. The results showed the three models to be accurate and the evaluation results of the GCU were found to be more accurate than others. However, when considering the effects of geology and geomorphology on the occurrence of debris flows, the WU was more feasible than the GCU. Therefore, the results indicate that the evaluation of debris flow susceptibility should be carried out based on the WU of the appropriate flow threshold in combination with the actual prevention and control of debris flow disasters.


Introduction
Debris flows occur in mountainous areas and they can be significantly disastrous. They have the potential to destroy infrastructure, buildings, and human life (Faria Lima Lopes et al. 2016). In recent years, debris flows have become one of the most recognized natural disasters in the world (Giannecchini et al. 2007).
Evaluation of debris flow susceptibility and determination of the high susceptibility zones play an important role in managing and reducing debris flow risks (Shi et al. 2016; Jiang et al. 2017; Pastorello et al. 2017). Recently, remote sensing and geographic information system (GIS) technologies have been applied to the risk assessment of debris flows (Elkadiri et al. 2014; Bregoli et al. 2015), which has made debris flow susceptibility mapping (DFSM) more convenient (Chevalier et al. 2013; Xu et al. 2013; Kritikos and Davies 2015). Various models, including the probabilistic approach (Chang et al. 2014), rock engineering system and fuzzy C-means algorithm (Li et al. 2017), empirical models (Kappes et al. 2011; Horton et al. 2013), artificial neural network (Chang and Chao 2006), logistic regression (LR) (Ayalew and Yamagishi 2005; Greco et al. 2007), analytic hierarchy process (Yalcin 2008), clast distribution patterns (Faria Lima Lopes et al. 2016), qualitative heuristic method, Flow-R (Blais-Stevens and Behnia 2016), and advanced Bayesian spatial models (Lombardo et al. 2018), have been used for DFSM. Furthermore, the frequency ratio (FR) method has been proven to be effective, and it has been successfully applied to flash flood hazard susceptibility mapping and landslide susceptibility mapping (Cao et al. 2016; Wang et al. 2016; Zhang et al. 2016). In view of its effectiveness, the FR method was selected in the present study as the statistical method to better explore the effect of different mapping units on the susceptibility mapping of debris flow.
Selection of the appropriate mapping units is crucial to the accuracy and practicality of disaster susceptibility mapping (Cama et al. 2016; Zezere et al. 2017). The mapping units include grid cell units (GCUs), slope units, geohydrological units, topographic units, administrative units, and unique condition units (Van Den Eeckhaut et al. 2009; Erener and Düzgün 2012; Rotigliano et al. 2012; Zezere et al. 2017). Grid cells are generated fully automatically, are simple, and have been widely used recently. However, it is noteworthy that grid cells are a regular representation of space, whereas landslides are complex and irregular phenomena (Alvioli et al. 2016).
CONTACT Chen Cao ccao@jlu.edu.cn
Many scholars have studied the effect of grid cell size on landslide and debris flow susceptibility mapping. Palamakumbure et al. (2015) investigated the effect of the grid cell size (2, 5, 10, 15, 20, 25, 30, and 40 m) on the accuracy of landslide susceptibility mapping by utilizing induced decision trees. Cama et al. (2016) explored the relationships between grid cell size (2, 4, 16, and 32 m) and accuracy for debris flow susceptibility models by using forward stepwise binary LR. Moreover, some studies also focused on the relationship among various mapping units and disaster susceptibility assessments. Erener and Düzgün (2012) investigated the effect of mapping unit (slope unit and grid cell) on a landslide susceptibility assessment by using LR and spatial regression. Van Den Eeckhaut et al. (2009) established and analysed two landslide susceptibility maps, namely, a grid cell based map and a topographic unit based map. Zezere et al. (2017) compared landslide susceptibility maps obtained by utilizing LR using slope terrain units, geohydrological terrain units, census terrain units, and grid cell (5 m) terrain units. The watershed unit (WU), as the basic unit for the development and activity of debris flows, has been used as an effective mapping unit to evaluate the susceptibility of debris flows and has proven to have strong applicability (Bregoli et al. 2015; Shi et al. 2016; Li et al. 2017). However, research on the comparative analysis of WU and other mapping units has rarely been reported.
The effects of different mapping units on disaster susceptibility mapping have been proven to be greater than those of different statistical methods (Zezere et al. 2017). The objective of this study was to explore the effect of GCUs and WUs on the susceptibility mapping of debris flow. Yongji County (YJC), which covers an area of 2620 km² in Jilin Province of northeast China, was selected as the study area. Eight influencing factors, namely elevation, slope, precipitation, landforms, lithology, land use, distance to fault, and population density, were selected to conduct an evaluation of debris flow susceptibility. Three DFSMs are discussed, and an area under the curve (AUC) analysis was used to evaluate their accuracy. The DFSM based on GCU was compared with the DFSMs based on WUs. Differences between the DFSMs generated by WUs with different flow thresholds were observed, and these differences were also analysed in this study.

Study area
YJC lies in the transition zone between Changbai Mountain and Songnen Plain, and the terrain elevation gradually decreases from south to north. There are seven peaks that are higher than 1000 m above sea level, and most of the peaks from 500 to 1000 m are in the southern and middle areas. There are four landforms in the entire area: middle mountains, low mountains, platform, and river valley.
According to genetic age, material composition, structure, and physical and mechanical properties, the lithology is divided into the following five categories: relatively hard clastic rock (conglomerate, sandstone, and siltstone), soft clastic rock (shale and tuffaceous shale sandstone), hard bedded rock (limestone), hard massive rock (granite), and soil mass (clay, sub clay, fine sand, coarse sand, and gravel soil). The study area lies in the Tianshan-Xingan geosyncline fold area, Jilin and Heilongjiang fold system. Faults dominantly trending approximately NE-SW and NW-SE are present in this region.
The study area has a continental, dry, and cold monsoon climate in the North Temperate Zone with four distinct seasons, and the temperature changes significantly. The annual average temperature is 5.3 °C, with an annual average precipitation of 722.75 mm. The maximum annual rainfall is 1150 mm (2010), the minimum annual rainfall is 457 mm (1982), and rainfall is concentrated between June and August.

Inventory map
Based on a field survey of the study area, a total of 123 debris flow hazards were identified and are shown in the inventory map (Figure 1). The debris flow disasters in YJC are mainly distributed across the southeast mountain area, and to a lesser extent in the northwest plain area.
Before July 13, 2017, based on a 1:50,000 survey and zoning of geological disasters in Jilin Province, a total of 91 debris flow disasters were investigated. These debris flows damaged 247 houses and 430 acres of farmland, with a direct loss of 15 million yuan. On July 13 and 19, 2017, two heavy rainstorms occurred in YJC with an average rainfall of 175.4 mm and a maximum of 309 mm. The field survey statistics indicated that the rainstorms caused 32 debris flows. These debris flows destroyed roads and houses and caused traffic paralysis and homelessness (Figure 2).
In this study, 70% (n = 86) of the debris flows were selected randomly as training data, which were used to create the DFSM models. The remaining 30% (n = 37) were used as testing data to validate the DFSM models (Figure 1).
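The random 70/30 partition described above can be illustrated with a minimal Python sketch. The function name `split_inventory`, the integer IDs, and the fixed seed are assumptions introduced here for illustration; the study does not state how the random selection was implemented.

```python
import random

def split_inventory(ids, train_fraction=0.7, seed=42):
    """Randomly split debris flow IDs into training and testing subsets."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 123 surveyed debris flows, labelled 1..123 (hypothetical IDs)
train_ids, test_ids = split_inventory(range(1, 124))
print(len(train_ids), len(test_ids))  # 86 37
```

With 123 records, rounding 0.7 × 123 = 86.1 gives the 86 training and 37 testing events reported in the text.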

Influencing factors
The occurrence of debris flow is related to a variety of factors. For example, geological, topographical, morphological, and vegetation conditions and human engineering activities can influence debris flows (Chang et al. 2014; Shi et al. 2016; Li et al. 2017). The debris flow factors considered in this study include elevation, slope, precipitation, landforms, lithology, land use, distance to fault, and population density. These factors have provided good results in previous DFSM models (Xu et al. 2013; Elkadiri et al. 2014; Kritikos and Davies 2015; Camilo et al. 2017). The 8 × 8 m digital elevation model (DEM) of the study area was collected from Google Earth. By using this DEM, topography-related thematic data layers such as elevation and slope were prepared. Precipitation data were collected from weather stations in YJC. The geology parameters were obtained from a geological map of Jilin Province at a scale of 1:200,000. Other parameters were mainly collected from available resources.
The elevation ranges between 182 and 1386.64 m in the study area (Figure 3a). The elevation reflects the relative height difference, which aids in determining the gravitational potential energy of the debris flow. The larger the height difference, the greater the gravitational potential energy. This provides a dynamic condition for the occurrence of a debris flow. For the WU, the difference between the highest and the lowest points in each watershed was calculated and used as the influencing factor.
Based on the DEM, the slope was obtained using ArcGIS. The stability of a slope decreases with an increase in slope angle, and a steep slope helps to provide loose materials. The slope also influences the initiation and movement of debris flows (Lin et al. 2002; Chang and Chao 2006) (Figure 3b). For the WU, the average slope in each watershed was calculated.
Precipitation, which is one of the most important factors triggering the initiation of debris flow, was selected as an influencing factor. Debris flows often occur during the rainy season (Oh and Pradhan 2011). In the study area, heavy rainfall easily induces debris flow. The precipitation in the study area varies between 650 and 730 mm (Figure 3c). For the WU, the rainfall level for each watershed was determined based on the principle of majority, and this principle was also applied to the factors of distance to fault, lithology, land use, and landforms.
Earthquakes often occur around the fault, which leads to breakage and weathering of the rocks and formation of a significant weathering zone, thereby providing more materials to the debris flow (Hong et al. 2015). An interval of 800 m is used to generate multiple buffer zones between faults, and these are divided into five classes based on the natural breaks method (Figure 3d).
A strong connection exists between lithology and the formation of debris flows. Lithology controls the stability of slopes and determines the quantity of materials available to a debris flow. The lithology is divided into the following five categories: relatively hard clastic rock, soft clastic rock, hard bedded rock, hard massive rock, and soil mass (Figure 3e).
The type of land use reflects the changes and effects of human activities on nature. In general, agricultural production and residential use destroy the original vegetation, reduce the capacity for soil and water conservation, and provide good conditions for the occurrence of debris flows. The land use types are integrated into five categories (Figure 3f).
The debris flow occurs mostly in the mountainous areas, and the conditions of the landform type affect the initiation, movement, and scale of debris flows. There are four major landforms in the study area: platform, river valley, low mountains, and middle mountains (Figure 3g).
The influence of social and economic factors on vulnerability of debris flow should not be ignored. Human activity factors are important triggers for debris flow. In this study, population density was selected as an influencing factor to represent the intensity of human activities (Figure 3h). For the WU, the average population density in each watershed was calculated.
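The three aggregation rules used for the WU (height difference for elevation, average for slope and population density, and majority for categorical factors) can be sketched as follows. This is a minimal illustration on toy arrays, assuming each raster cell carries a watershed ID; the helper names `zonal_stat`, `relief`, and `majority` are hypothetical and do not correspond to the ArcGIS tools used in the study.

```python
import numpy as np

def zonal_stat(zones, values, func):
    """Apply `func` to the cell values of each zone; returns {zone_id: result}."""
    return {int(z): func(values[zones == z]) for z in np.unique(zones)}

def relief(v):
    """Difference between the highest and lowest values in a zone."""
    return float(v.max() - v.min())

def mean(v):
    """Average value in a zone."""
    return float(v.mean())

def majority(v):
    """Most frequent class in a zone (ties resolve to the smallest class code)."""
    classes, counts = np.unique(v, return_counts=True)
    return int(classes[np.argmax(counts)])

# Toy 2 x 3 rasters: watershed IDs, elevation (m), and lithology class codes
zones = np.array([[1, 1, 2],
                  [1, 2, 2]])
elevation = np.array([[200.0, 350.0, 500.0],
                      [260.0, 640.0, 900.0]])
lithology = np.array([[4, 4, 5],
                      [5, 5, 5]])

wu_relief = zonal_stat(zones, elevation, relief)        # {1: 150.0, 2: 400.0}
wu_mean = zonal_stat(zones, elevation, mean)            # {1: 270.0, 2: 680.0}
wu_lithology = zonal_stat(zones, lithology, majority)   # {1: 4, 2: 5}
```

The majority rule assigns a single class to the whole watershed, which is one way the averaging effect noted later in the discussion can arise.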

Watershed unit extraction based on ArcGIS
The WU division in this study was completed using Model Builder in ArcGIS. The construction algorithm of the WU is illustrated in Figure 4. The first step is depression filling of the original DEM (30 m × 30 m), followed by extraction of the flow direction. Next, the flow accumulation is calculated from the flow direction using the slope runoff simulation method. A threshold is then set, and the cells with a flow accumulation larger than the threshold are connected to form a raster river network, which is classified and converted into a vector. Finally, the WUs are delineated through stream link and watershed extraction. In this study, two thresholds, i.e., 10 000 and 5000, were selected. The river networks based on these thresholds are similar to the real river network. The numbers of WUs are 202 and 394, respectively (Figure 5). The general characteristics of the considered mapping units are listed in Table 1.
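The flow direction, flow accumulation, and threshold steps described above can be sketched in plain NumPy. This is a simplified illustration, not the ArcGIS implementation: it assumes a depression-filled DEM without flat areas, uses the common D8 rule (each cell drains to its steepest downslope neighbour), and omits the stream link and watershed delineation steps. The function names are hypothetical.

```python
import numpy as np

# Row and column offsets of the eight neighbours used by the D8 rule
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_flow_accumulation(dem):
    """Count the upstream cells draining through each cell (filled DEM assumed)."""
    rows, cols = dem.shape
    acc = np.zeros(dem.shape, dtype=int)
    # Visit cells from highest to lowest so that each cell's total is final
    # before it is passed on to its downslope neighbour.
    order = np.argsort(dem, axis=None)[::-1]
    for flat_index in order:
        r, c = divmod(int(flat_index), cols)
        best_slope, target = 0.0, None
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                distance = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2) cell widths
                slope = (dem[r, c] - dem[nr, nc]) / distance
                if slope > best_slope:
                    best_slope, target = slope, (nr, nc)
        if target is not None:  # cells with no lower neighbour act as outlets
            acc[target] += acc[r, c] + 1
    return acc

def stream_mask(acc, threshold):
    """Cells whose flow accumulation is larger than the threshold."""
    return acc > threshold

# Toy 3 x 3 valley draining towards the bottom-centre cell
dem = np.array([[9.0, 8.0, 9.0],
                [7.0, 6.0, 7.0],
                [5.0, 4.0, 5.0]])
acc = d8_flow_accumulation(dem)
streams = stream_mask(acc, threshold=2)
print(acc.tolist())  # [[0, 0, 0], [0, 3, 0], [0, 8, 0]]
```

Lowering the threshold marks more cells as streams, which produces more and smaller watersheds. This is consistent with the 202 units at threshold 10 000 and 394 units at threshold 5000 reported above.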

The frequency ratio
The FR method is an accurate and effective method, which is based on the observed relationships between the distribution of debris flows and related factors. In this study, the FR method was used to complete the DFSM (Park et al. 2013; Rozos et al. 2013). The FR is defined as the ratio of the probability of occurrence of a debris flow to the probability of a nonoccurrence for the given attributes (GF 1994; Regmi et al. 2014). The larger the FR, the stronger the effect of the given factor on the debris flow (Lee and Talib 2005). First, the FR for each factor type or range was calculated, and the corresponding equation is represented as follows:

$$\mathrm{FR} = \frac{A_1 / A_2}{B_1 / B_2} \qquad (1)$$

where $A_1$ is the number of cells with a debris flow disaster for each factor class; $A_2$ denotes the total number of cells with a debris flow disaster occurrence in the study area; $B_1$ represents the number of cells for each factor class; and $B_2$ denotes the total number of cells in the study area. Then, the debris flow susceptibility index (DFSI) was calculated by using the following equation:

$$\mathrm{DFSI} = \sum \mathrm{FR} \qquad (2)$$

where DFSI is the debris flow susceptibility index and FR is the frequency ratio of a factor type or range. Clearly, the greater the DFSI, the higher the risk of occurrence of debris flow (Lee and Pradhan 2007). Finally, the DFSM is generated based on the DFSI (Jadda et al. 2011). The spatial relationship between debris flow disasters and influencing factors derived from the FR is listed in Tables 2 and 3.

[...] precipitation is more than 700 mm. For the distance to fault criterion, class 0-800 m exhibits the highest FR value of 1.92, followed by class 1600-2400 m with 1.6, and finally, class 800-1600 m with 1.48. The number of debris flows in the 0-3200 m range accounts for 75.58% of the total flows. In the case of landforms, the river valley proved to be the most prone to debris flow disasters, with the highest FR value of 1.83, followed by the middle mountains with an FR value of 1.67. Low mountains and platforms exhibited the lowest FR values of 0.28 and 0, respectively. The results of the debris flow assessment based on lithology indicated that most debris flows appeared in hard massive rock and soil mass. The FR values of hard massive rock and soil mass are 1.38 and 1.32, respectively. Hard bedded rock is unlikely to form debris flow disasters. The evaluation results of land use based on GCUs reveal that the residential land class has the highest FR value of 3.17, followed by farmland with 1.55. When the vegetation is destroyed in farmland, a bare surface is created, and the soil is scoured by rainwater and easily lost, thus resulting in a debris flow. The FR value of forest is 0.35, and forests can reduce soil erosion; thus, the policy of [...]

According to Formula (2), the DFSM based on GCU is shown in Figure 6a. Based on the natural breaks method (Feizizadeh and Blaschke 2013; Cao et al. 2016), the DFSM was divided into five classes, namely, very low, low, moderate, high, and very high, which are summarized in Table 4.
The above-mentioned discussion indicates that the very high and high susceptibility areas of the DFSM are located at high altitude, on steep slopes, and close to the faults. The lithology in these areas is mainly composed of hard massive rock and soil mass. The tectonic movement provided a large amount of loose materials as a source for the debris flows. The low and very low susceptibility areas account for 36.9% of the study area. These regions are lower in altitude and belong to river valley landforms; the terrain is flat, the land is mainly farmland, the lithology is mainly soil mass, the regions are far from the faults, and there are no debris flows.
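As an illustration of the FR and DFSI calculations defined in the frequency ratio section, the following minimal sketch computes the FR of each factor class as (A1/A2)/(B1/B2) and sums the FR values over the factors for every cell. The toy rasters, class codes, and function names are assumptions made for this example and are not the study data.

```python
import numpy as np

def frequency_ratio(factor_classes, debris_mask):
    """FR per class: (A1 / A2) / (B1 / B2), following the definitions in the text."""
    a2 = debris_mask.sum()       # cells with a debris flow in the study area
    b2 = factor_classes.size     # all cells in the study area
    fr = {}
    for cls in np.unique(factor_classes):
        in_class = factor_classes == cls
        a1 = (in_class & debris_mask).sum()  # debris flow cells in this class
        b1 = in_class.sum()                  # all cells in this class
        fr[int(cls)] = float((a1 / a2) / (b1 / b2))
    return fr

def susceptibility_index(factor_layers, fr_tables):
    """DFSI per cell: sum over factors of the FR of the class the cell falls in."""
    total = np.zeros(factor_layers[0].shape)
    for layer, table in zip(factor_layers, fr_tables):
        for cls, value in table.items():
            total[layer == cls] += value
    return total

# Toy 2 x 3 study area with two classified factors and two debris flow cells
slope_class = np.array([[1, 1, 2],
                        [2, 2, 2]])
landform_class = np.array([[1, 2, 2],
                           [1, 1, 2]])
debris = np.array([[False, False, True],
                   [False, True, False]])

fr_slope = frequency_ratio(slope_class, debris)        # class 1 -> 0.0, class 2 -> 1.5
fr_landform = frequency_ratio(landform_class, debris)  # both classes -> 1.0
index = susceptibility_index([slope_class, landform_class],
                             [fr_slope, fr_landform])
```

An FR above 1 means that a class contains a larger share of debris flows than its share of the area, so the cells in slope class 2 receive the highest index in this toy case.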

Evaluation results based on the watershed units
In the calculated FR values, the two types of WUs present the same trend. There are five classes of elevation; the highest FR value is obtained for the fifth class, and both values are greater than 3.00. The FR value decreases with decreasing elevation, and the lowest FR value, 0.00, corresponds to the first class. The FR results for the slope are similar to those for the elevation, i.e., the steeper the slope, the larger the FR value. The FR values for the fifth class are 5.07 and 3.37 for WU 10 000 and WU 5000, respectively. The FR value for the first class is 0, which is in accordance with the fact that a debris flow is unlikely to occur on a gentle slope. Regarding precipitation, the WUs and the GCUs show the same trend, i.e., debris flows occur mainly in class 700-730 mm. The evaluation results for distance to fault reveal that the FR values of the WUs are different from those of the GCUs, which is attributed to the larger size of the WUs. With respect to land use, the forest has the largest FR value, followed by the farmland. The landform results of the two types of WUs are consistent: the middle mountains have the highest FR values, followed by the low mountains and the river valley, and the platform class has the lowest FR value of 0. For lithology, the two types of WUs show the same ranking, and the highest FR value is for the hard massive rock class, with an average value of 2.04. This is followed by relatively hard clastic rock and soft clastic rock. The lowest FR value, 0, is obtained for the hard bedded rock class. For population density, the two types of WUs show the same trend, i.e., debris flows are mainly concentrated in sparsely populated areas.
According to Formula (2), the DFSMs based on WUs are shown in Figures 6b and 6c. Based on the natural breaks method, the DFSMs based on the two types of WUs were divided into five classes: very low, low, moderate, high, and very high. The area of each class is listed in Table 4.
For the DFSM, the zone of very high susceptibility has an area of 248.02 and 593.36 km² for WU 10 000 and WU 5000, accounting for 9.46% and 22.64% of the total study area, respectively. Furthermore, the zone of high susceptibility has an area of 543.34 and 472.47 km² for WU 10 000 and WU 5000, accounting for 20.73% and 18.03% of the total study area, respectively. Clearly, the very high and high susceptibility zones are distributed in the southeast part of the study area, and the landform in this part is dominated by middle mountains. The elevation, slope, and precipitation values of the very high and high susceptibility zones are large. This region lies close to the faults and mainly consists of forest and farmland. The lithology of the very high and high susceptibility zones is mainly composed of hard massive rock and relatively hard clastic rock. Moreover, the tectonic movement leads to the weathering of rocks followed by the accumulation of the resulting loose materials, which turn into debris flow disasters during rainstorms. The moderate susceptibility zone is distributed in the middle of the study area, and the landform in this part is dominated by low mountains. The area of the moderate susceptibility zone accounts for 23.34% and 17.92% of the research area for the flow thresholds of 10 000 and 5000, respectively. This area mainly consists of forest, and the lithology is mostly composed of relatively hard clastic rock. The very low and low susceptibility zones are in the northwest of YJC, and the values of elevation, slope, and precipitation are small. The landforms of this area mainly include river valley and platform. This area is far from the faults and the lithology is mainly composed of soil mass; thus, no conditions for forming loose deposits prevail in this region (Guo-liang et al. 2017).

Validation of the DFSM results
Validation of the DFSM results is one of the most important tasks (Dieu et al. 2011). In this study, the DFSM results were validated using the receiver operating characteristic (ROC) technique. In the ROC curve, the vertical axis represents the true positive rate and the horizontal axis represents the false positive rate. The area under the ROC curve (AUC) was used to evaluate the validity of the three models. Based on the training and testing data, the success and prediction rates of the three models were calculated by using the AUC. The AUC value varies from 0.5 to 1, and the closer the value is to 1, the higher the accuracy of the model. Figure 7a displays the success rates, which are 0.927 for the GCU and 0.896 and 0.885 for WU 10 000 and WU 5000, respectively. The corresponding prediction rates are 0.888, 0.879, and 0.862, respectively (Figure 7b). This result shows that the three models have high and comparable abilities to predict the occurrence of debris flows.
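The AUC used above can be computed without plotting the ROC curve, because the area under the curve equals the probability that a randomly chosen debris flow unit receives a higher susceptibility score than a randomly chosen non-debris-flow unit (ties counted as one half). The sketch below uses this equivalence. The scores and labels are invented for illustration, and how non-debris-flow units were sampled in the study is not stated, so that part is an assumption.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of units with a debris flow
    neg = scores[~labels]   # scores of units without a debris flow
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Hypothetical susceptibility scores and observed occurrence (1 = debris flow)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)  # 8 of 9 pairs are ranked correctly
```

Applying the same function to the training events gives a success rate and to the testing events a prediction rate, in the sense used in the text.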

Discussion

Application of frequency ratio
FR is a simple, understandable, and effective probabilistic method, which is widely used in the susceptibility evaluation of geological hazards (Regmi et al. 2014; Cao et al. 2016). In this research, the FR method was adopted to better reflect the characteristics of DFSM based on WUs. The validation results obtained using the ROC curve analysis and AUC show that both the GCU-based and the WU-based susceptibility models are highly suitable for DFSM. The DFSM models used in this research also show that the FR method is highly applicable to evaluating the susceptibility to debris flow occurrence. Figures 6a-c indicate that the very high and high susceptibility regions are mainly located in the southeast area of YJC. The losses that debris flows may cause in this region are impossible to estimate, and the local residents should be vigilant during the rainy season.

Comparative analysis of watershed units and grid cell units
The GCU is the most commonly used disaster susceptibility mapping unit. Compared to the WU, the GCU has a larger AUC for both validation procedures (success rate and prediction rate), which shows that the GCU is more accurate, and this has been proven in many studies (Zezere et al. 2017). However, the GCU destroys the integrity of the debris flow; although each GCU has a corresponding geological factor, the cells are almost completely independent of one another in terms of spatial terrain information (Jia et al. 2012). In contrast, the WU, which is the basic unit of development and activity of a debris flow, can preserve the integrity of debris flows and easily reveal the spatial information and geological background of a debris flow. When a single point is used to represent a debris flow, only the geological and geomorphic information at the location of the debris flow disaster is considered based on the GCU. However, a debris flow is composed of a forming area, a circulation area, and an accumulation area, and thus, it is better to consider the effects of geological and geomorphic factors on the formation of debris flows in the entire watershed based on the WU. For example, weathered granite, which increases the density and destructive force of debris flows, is the main component of the debris flows (Figure 2). Debris flows pile up over the soil mass, which results in a high FR value for soil mass based on the GCU; however, the FR value of hard massive rock is high based on the WUs.
In summary, rapidity, simplicity, and accuracy are the advantages of the GCU; however, it is nearly independent of the geologic, geomorphic, or other spatial terrain information. The WU, which is much larger than the GCU, fully considers the geological and geomorphic information of a debris flow and helps to reduce the consequences of existing position errors in a debris flow inventory (Zezere et al. 2017). Furthermore, the watershed is the object of exploration, research, and prevention of debris flows. From the DFSM (Figures 6b and c), it is easy to determine which basins have a relatively large probability of forming debris flows and which are relatively safe, so that targeted defensive measures can be put forward. However, the WU leads to overestimation of debris flow susceptibility, and an averaging effect exists during the extraction of the influencing factors (Guzzetti et al. 2006).

Watershed units of different flow threshold
In this study, two types of WUs were selected to map debris flow susceptibility. The extent of the very high susceptibility zone derived using WU 10 000 is smaller than that derived using WU 5000, while the high, moderate, and very low susceptibility zones determined using WU 10 000 are larger than those determined using WU 5000. WU 10 000 generates a higher AUC for the training group (0.896) than WU 5000 (0.885). The AUC decreases when the debris flow test group is considered (Figure 7). However, WU 10 000 still generates a higher AUC for the testing group (0.879) than WU 5000 (0.862). The model based on WU 10 000 generates a higher AUC for both the success and prediction rates, but the unit size is very large, and debris flows mostly occur only in very large watersheds (Table 1), which is explained as an overestimation of debris flow susceptibility (Zezere et al. 2017). When the WU is used as the mapping unit of DFSM, a WU that is too large covers most of an area, which overestimates the debris flow susceptibility and is of little use for the practical prevention and control of debris flows. Thus, a suitable WU should be selected according to the actual situation.

Conclusion
DFSM is helpful for disaster management and lays the foundation for the formulation of disaster reduction measures. Based on this idea, in this study we provided the results of DFSM based on three types of mapping units, namely the GCU and two watershed units (WU 10 000 and WU 5000), in YJC, Jilin Province, China. The FR method was used as the statistical method. Elevation, slope, precipitation, landforms, lithology, land use, distance to fault, and population density were selected as the main influencing factors. After many field investigations, a total of 123 debris flow disasters were identified in the study area, of which 70% (86 debris flows) were selected randomly for training and creating the DFSM models. The remaining 30% (37 debris flows) were used for testing and validating the DFSM models. By using the ROC curve and AUC, the effectiveness of the results was verified. In future work, we will continue to update the debris flow database to increase the data quality.
By using the Model Builder in ArcGIS, a tool for generating WUs was developed, which significantly improves the work efficiency. The WUs of flow thresholds 10 000 and 5000 were selected to map debris flow susceptibility. The results show that the very high and high susceptibility zones are mainly located in the southeast area of YJC.
The model based on the GCU involves a simple calculation process and shows a stable modelling performance, but it lacks physical meaning. In contrast, the WU can reflect the geological and geomorphic environmental conditions of a debris flow more completely. From the DFSM based on the WU, the risk level of each watershed can be determined directly, and targeted measures should be taken to avoid the occurrence of potential debris flow disasters. Based on the actual prevention and control of debris flow disasters, a WU with a suitable size should be selected to complete the DFSM.