Assessment of post-wildfire debris flow occurrence using a classifier tree

Abstract
Besides the dangers of an actively burning wildfire, a plethora of other hazardous consequences can occur afterwards. Debris flows are among the most hazardous of these and are known to cause fatalities and extensive damage to infrastructure. Although debris flows are not exclusive to fire-affected areas, a wildfire can increase a location's susceptibility by stripping protective cover such as vegetation and by introducing destabilizing factors, such as ash filling soil pores, that increase runoff potential. Because of these dangers, researchers are developing statistical models to identify susceptible locations. Existing models predominantly employ the logistic regression algorithm; however, previous studies have shown that the relationship between the predictors and the response is likely better predicted using nonlinear modeling. We therefore propose the nonlinear C5.0 decision tree algorithm, a simple yet robust algorithm that uses inductive inference for categorical data modeling. It employs a tree-like decision-making system that makes conditional statements to split data into homogeneous classes. Our results showed that the C5.0 approach produced stable validation metrics that were higher than those of logistic regression. A sensitivity of 81% and a specificity of 78% indicate improved predictive capability and support the hypothesis that the data relationships are likely nonlinear.


Introduction
An average of 350 million hectares of land are affected annually by wildfires worldwide (van der Werf et al., 2006). These occurrences are predicted to increase further as temperatures continue to rise (Westerling et al., 2006; Bond-Lamberty et al., 2007; Flannigan et al., 2009). The hazards associated with a wildfire continue even after it is contained. Its aftermath can produce a spectrum of secondary effects, of which debris flows are among the most disastrous (Brock et al., 2007; Cannon et al., 2010). Debris flows are fast-moving, high-density slurries of water, […] morphology, burn severity, rainfall characteristics, and soil properties to build logistic regression models that predict the likelihood of post-fire debris flow occurrence in the western United States (Cannon et al., 2010; Staley et al., 2013, 2017). This work began in 2005, and the models have been refined several times since (Cannon et al., 2010; Staley et al., 2017). The logistic regression approach is advantageous mainly because it models simple linear relationships, which are computationally fast and easy to interpret. Until 2017, the best available logistic regression model developed by USGS researchers for the Intermountain West of the United States had a reported sensitivity of 44% (Cannon et al., 2010); that is, only about 4 of every 10 potentially hazardous debris flow events were correctly predicted. In this classifier, each input predictor influenced the response variable independently; as a result, predicted probabilities exceeded the classification cutoff even when the rainfall input was zero (Cannon et al., 2010). This was problematic because debris flows cannot occur in the absence of a driving high-intensity, short-duration rainfall (Cannon et al., 2010).
To address this, USGS researchers in 2016 added samples to the initial 2010 dataset to investigate whether the larger database could improve the initial model (Staley et al., 2016). The dataset grew from 388 samples in 2005 to 1550 samples in 2016 (Staley et al., 2016). The new study also introduced a link function in which the critical basin-characteristic inputs were multiplied by the rainfall inputs, so that the predicted probability was close to zero in the absence of a rainfall event. The best of these updated models had a sensitivity of 83%, up from the previous 44%, with a corresponding specificity of 58% (Staley et al., 2017).
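To make the effect of the link function concrete, consider a generic logistic model in which each basin predictor $X_i$ is multiplied by a rainfall predictor $R$. This form and its symbols are our own illustrative assumption, not the published model; the actual coefficients and predictor set are given in Staley et al. (2017).

$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{k} \beta_i X_i R$$

$$p = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{i=1}^{k} \beta_i X_i R\right)\right]} \quad\Longrightarrow\quad p\big|_{R=0} = \frac{1}{1 + e^{-\beta_0}}$$

With $R = 0$ every basin term vanishes, so the predicted probability depends only on the intercept and stays near zero when $\beta_0$ is strongly negative (for example, $\beta_0 = -5$ gives $p \approx 0.007$). In an additive form such as $\operatorname{logit}(p) = \beta_0 + \sum_i \beta_i X_i + \beta_R R$, by contrast, the basin terms still contribute when $R = 0$, which is how probabilities above the cutoff could arise with no rainfall.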
Other researchers have also explored nonlinear probability modeling to investigate whether complex relationships between basin predictors and debris flow occurrence, which linear models may not discern, can be captured with a nonlinear approach. Kern et al. (2017) explored the use of machine learning algorithms to model debris flow response, examining both linear and nonlinear relationships between the predictors and the response variable. They compared the accuracies of different linear and nonlinear models on the same dataset used by Cannon et al. (2010). Their results showed that the nonlinear models outperformed the linear ones by as much as approximately 64%, supporting their hypothesis that the relationship between basin predictors and debris flow occurrence might be nonlinear. The top model identified by Kern et al. (2017) was built using the Naïve Bayes algorithm. It achieved a sensitivity of 72%, an improvement on the 44% obtained by Cannon et al. (2010), with a corresponding specificity of 90%, showing an improved ability to predict debris flow locations with the nonlinear model.
We therefore propose applying nonlinear modeling to the 2016 dataset as well, to further improve debris flow prediction. Preliminary analyses with the Naïve Bayes algorithm resulted in a sensitivity of 75% and a specificity of 81%, showing improved results. However, in this study we propose the nonlinear C5.0 tree algorithm (Quinlan, 1993) instead of the Naïve Bayes algorithm, because the Naïve Bayes model functions as a black box: it offers no insight into how the individual predictors relate to the response. The C5.0 tree algorithm was chosen in particular because it is one of the simplest nonlinear algorithms with transparent outputs. It affords a nonlinear approach by identifying unique cutoffs in the distributions of the predictors as they relate to the response. The algorithm works by splitting the data into smaller, more homogeneous groups: stepwise decisions are made on predictors at different levels to iteratively determine the breakpoints that separate the classes of the response variable. C5.0 builds trees from a set of training data using concepts from information theory. To decide which predictor to split on at each node, it uses information entropy (H), a statistical measure of the average rate at which information is produced by a stochastic source of data (Shannon, 2001). Essentially, H quantifies the uncertainty of the decision at each node. Shannon defined H of a discrete random variable $X$, with possible values $x_1, x_2, \ldots, x_n$ and probability mass function $P(x)$, as:

$$H(X) = -\sum_{i=1}^{n} P(x_i)\log_b P(x_i)$$

where $b$ is the base of the logarithm, usually taken as 2.
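As a minimal illustration of the entropy definition above (not code from the study), the following Python function computes H for a discrete distribution with $b = 2$ by default. The function name `shannon_entropy` is hypothetical, and terms with zero probability are skipped under the usual convention $0 \log 0 = 0$.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum p * log_b(p) of a discrete distribution.

    Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Two equally likely outcomes carry 1 bit of uncertainty.
print(shannon_entropy([0.5, 0.5]))  # 1.0
# A certain outcome carries no uncertainty.
print(shannon_entropy([1.0, 0.0]))  # 0.0
```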
For a binary response such as the one in this study ('no' debris flow and 'yes' debris flow), entropy varies with the class probability as shown in Figure 1. Entropy reaches its maximum at the halfway point, when the probability is 0.5: with a 50-50 chance that the outcome could go either way ('no' or 'yes'), uncertainty is at its maximum. Entropy is lowest when the probability approaches 0 or 1. The goal is to choose the predictor that yields the lowest entropy. The process is then repeated for the node below; this time, however, the information gain, a measure of the change in entropy, is also determined. This quantifies how much information is gained relative to the prior (parent) node (Mitchell, 1997; Shannon, 2001).

Figure 1. Entropy variation with changes in the probability of response classes. Uncertainty is lowest when the probability approaches 0 or 1, and reaches its maximum when the probability is 0.5.
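For the binary case described above, writing $p$ for the probability of the 'yes' class (our notation), the entropy definition reduces to the expression below. As a worked example, a node holding the roughly 3:1 'no' to 'yes' mix reported for this dataset ($p = 0.25$) is less uncertain than a 50-50 node:

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p)$$

$$H(0.5) = -0.5\log_2 0.5 - 0.5\log_2 0.5 = 1 \text{ bit}$$

$$H(0.25) = -0.25\log_2 0.25 - 0.75\log_2 0.75 \approx 0.500 + 0.311 = 0.811 \text{ bits}$$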
$$\text{Gain} = \text{Entropy}_{\text{prior}} - \text{Entropy}_{\text{current}}$$

All candidate predictors are considered, and the one with the largest information gain is chosen for the split; this is a greedy strategy. The process is applied recursively from the root node down until a subset contains samples of only a single class or the tree has reached a predetermined maximum depth. The C5.0 tree algorithm also uses boosting during training: a sequence of trees (trials) is built, each focusing on the cases misclassified by the preceding trees, and their predictions are combined by voting to improve the performance of the final model.

The specific objective of this study was to investigate the applicability of the nonlinear C5.0 tree algorithm for predicting the likelihood of post-wildfire debris flow occurrence in the western U.S. and to determine whether it offers any advantage over the linear logistic regression approach on this new dataset. Besides the response variable, there were 16 predictors, which included information on basin morphology, burn severity, soil properties, rainfall characteristics, and other ancillary data. Brief summaries of the predictors are provided in Table 1.
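The greedy, gain-based split search described above can be illustrated for a single numeric predictor. This is a minimal sketch under our own assumptions, not the C5.0 implementation (which adds refinements such as pruning and boosting): candidate cutoffs are taken as midpoints between adjacent sorted values, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the cutoff on one numeric predictor that maximizes
    information gain = entropy(parent) - weighted entropy(children)."""
    parent = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a cutoff
        cut = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = parent - child
        if gain > best_gain:  # greedy: keep the single best split found so far
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical predictor values
labels = ["no"] * 5 + ["yes"] * 3
cut, gain = best_threshold(values, labels)
print(cut, round(gain, 3))  # 5.5 0.954
```

Here the cutoff at 5.5 leaves two pure child nodes, so the gain equals the full parent entropy; in a tree, the same search would be repeated on every predictor at every node.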

Model development
In the Staley et al. (2016) dataset, all predictors were given as independent variables. As in Staley et al. (2017), we introduced a link function by multiplying the morphological and burn-property predictors by the rainfall predictors, since debris flows cannot occur in the absence of a driving storm. This yielded a total of 35 compound predictors. Preprocessing steps taken to ensure optimal model performance included omitting observations with missing values and assessing predictor degeneracy. Because some predictors were correlated, we also performed a pairwise collinearity test. The results showed that 15 predictors had correlations of 0.99 or higher with other predictors. These were regarded as redundant and were deleted, leaving 20 predictors for further analyses. We then ran a predictor selection routine that tested the performance of candidate models with successively fewer predictors (Dillon et al., 2011; Birch et al., 2015). In this way, we assessed the influence of each individual predictor on the model as a whole. We examined the variable importance from the C5.0 tree modeling process. With each model run, variable importance was calculated by randomly permuting the values of each variable, one at a time, and assessing the resulting degradation in the optimization criterion (accuracy, in this case). We determined stable rankings of predictor importance by running 10 reproducible C5.0 tree models with all 20 remaining predictors. From these, we computed a single importance value for each predictor as the mean variable importance across all 10 candidate models and ranked the predictors (1 = highest importance; 20 = lowest importance). A threshold of 50 was applied, narrowing the predictor set to the five most informative. Finally, to ensure that the predictors were truly independent, we performed a pairwise collinearity test once more with a cutoff of 0.60. This resulted in three final predictors for modeling: Soil × peakI30, PropHM23 × peakI30, and KF × peakI30.

Table 1. Brief summaries of the predictors.

| Predictor | Description |
|---|---|
| […] | Peak 15-minute rainfall accumulation of storm, in millimeters |
| Acc030 | Peak 30-minute rainfall accumulation of storm, in millimeters |
| Acc060 | Peak 60-minute rainfall accumulation of storm, in millimeters |
| Area | Contributing area of observation location, in square kilometers |
| dNBR | Average differenced normalized burn ratio of watershed |
| GaugeDist | Distance from rain gauge to documented response location, in meters |
| Iave | Average storm intensity, in millimeters per hour |
| KF | Average KF-factor of a basin, also known as the erodibility factor: the susceptibility of soil particles to detachment and transport by rainfall |
| Peak_I15 | Peak 15-minute rainfall intensity of storm, in millimeters per hour |
| Peak_I30 | Peak 30-minute rainfall intensity of storm, in millimeters per hour |
| Peak_I60 | Peak 60-minute rainfall intensity of storm, in millimeters per hour |
| PropHM23 | Proportion of watershed burned at high or moderate severity and with slope >23 |
| Soil | Thickness of soil to the closest 'restrictive layer' that significantly impedes the movement of water and air through the soil |
| StormAccum | Total rainfall accumulation of storm, in millimeters |
| StormDate | Date of storm that produced the debris-flow response |
| StormDur | Total duration of storm, in hours |
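The compound-predictor and collinearity-filtering steps described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the greedy keep-first filtering rule is our assumption (the text does not say which member of a correlated pair was dropped), and the helper names and toy values are hypothetical.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return sxy / math.sqrt(sxx * syy)

def compound_predictors(basin, rain):
    """Link function: multiply every basin predictor by every rainfall predictor."""
    return {
        f"{b}*{r}": [bv * rv for bv, rv in zip(basin[b], rain[r])]
        for b, r in product(basin, rain)
    }

def drop_collinear(preds, cutoff):
    """Greedy filter: keep a predictor only if |r| < cutoff with every kept one."""
    kept = []
    for name, values in preds.items():
        if all(abs(pearson(values, preds[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy values; Peak_I15 is exactly twice Peak_I30 here.
basin = {"PropHM23": [0.1, 0.4, 0.2, 0.6], "KF": [0.2, 0.3, 0.25, 0.3]}
rain = {"Peak_I30": [0, 10, 20, 30], "Peak_I15": [0, 20, 40, 60]}
preds = compound_predictors(basin, rain)
print(drop_collinear(preds, cutoff=0.99))  # ['PropHM23*Peak_I30', 'KF*Peak_I30']
```

Because the two toy rainfall predictors are proportional, each `*Peak_I15` product duplicates its `*Peak_I30` counterpart and is removed at the 0.99 cutoff; rerunning the filter at 0.60 would also drop `KF*Peak_I30`, whose correlation with `PropHM23*Peak_I30` is about 0.93.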
Using the final three predictors, we split the data into 70% training and 30% validation sets using stratified random sampling, so that each set had a representative distribution of the response variable, because the data were skewed toward the 'no debris flow' response at a ratio of about 3:1. Repeated 10-fold cross-validation was used to determine the number of boosting trials needed for optimal model performance, with candidate values from 1 to 30. The resampling results were aggregated into a performance profile, which showed the optimal number of trials to be 11. With the number of trials fixed at 11, the final model was trained on the entire training set.
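The stratified 70/30 split and the stratified fold assignment used for repeated cross-validation can be sketched as follows. The text does not say which software was used, so this is a standard-library illustration under our own assumptions; the function names are hypothetical. Scoring each candidate number of trials (1 to 30) by averaging the chosen metric over all folds and repeats is omitted because it requires the C5.0 learner itself.

```python
import random
from collections import defaultdict

def _indices_by_class(labels):
    """Group sample indices by their class label."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    return groups

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices so each class keeps the same train/validation ratio."""
    rng = random.Random(seed)
    train, valid = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        k = round(train_frac * len(idx))
        train += idx[:k]
        valid += idx[k:]
    return sorted(train), sorted(valid)

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold number 0..k-1, dealing every class round-robin."""
    rng = random.Random(seed)
    fold = [0] * len(labels)
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold[i] = j % k
    return fold

labels = ["no"] * 30 + ["yes"] * 10  # hypothetical data with a 3:1 imbalance
train, valid = stratified_split(labels)
print(len(train), sum(labels[i] == "yes" for i in train))  # 28 7

# Repeated 10-fold CV on the training set: one fresh fold assignment per repeat.
train_labels = [labels[i] for i in train]
repeats = [stratified_folds(train_labels, k=10, seed=s) for s in range(3)]
```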

Model evaluation
The developed model was then tested on the 30% validation set retained after preprocessing. To define the performance profile, we first generated a confusion matrix (Table 2) that compares the model's predictions with the observed data.
From this we determined the sensitivity (Equation (1)), which measures the fraction of debris-flow-producing locations that were correctly predicted by the model. This metric was given the highest priority because it directly measures the objective of the study. It ranges from 0 to 1, with 1 representing a perfect model that correctly classifies all debris flow locations. The specificity (Equation (2)) was also determined; it measures the fraction of locations without debris flows that were correctly predicted by the model. This metric also ranges from 0 to 1, with 1 representing a perfect model that correctly classifies all no-debris-flow locations. We were also interested in this metric because it assesses the model's robustness against false positives. In the equations, TP, FN, TN, and FP are the numbers of true positives, false negatives, true negatives, and false positives in the confusion matrix.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{1}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{2}$$
The third metric was the overall accuracy (Equation (3)), which measures the overall performance of the model in correctly distinguishing between debris flow and no-debris-flow locations. A score of 1 indicates a perfect model, and a score of 0 indicates that every location was misclassified. The threat score (Equation (4)), also known as the critical success index (Schaefer, 1990), was also considered. It is another measure of a model's overall performance and is especially useful when the response classes are heavily skewed toward one class, as in our data. This metric, however, does not consider true negative (TN) events. Cohen's kappa (Equation (5)) also measures overall model performance, by quantifying the agreement between predictions and observations while correcting for agreement that occurs by chance (Cohen, 1960). The kappa statistic tends to be conservative, but it is a more robust measure than overall accuracy because it accounts for the possibility of chance predictions. Kappa ranges from −1 to 1, and values of 0.30 to 0.50 typically indicate reasonable agreement (Kuhn and Johnson, 2013).

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3}$$

$$\text{Threat score} = \frac{TP}{TP + FN + FP} \tag{4}$$
$$\kappa = \frac{p_0 - p_e}{1 - p_e} \tag{5}$$

where $p_0$ is the total (observed) accuracy,

$$p_0 = \frac{TP + TN}{TP + FP + TN + FN},$$

and $p_e$ is the random (chance) accuracy,

$$p_e = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{(TP + FP + TN + FN)^2}.$$

We evaluated the model on the validation data using all five metrics together because each metric has its own biases; used together, they give a more complete picture of the model's performance. For example, because the data are skewed, the threat score lets us prioritize the low-frequency debris flow locations, since it ignores TNs in its computation. However, relying on this metric alone would not have allowed us to fully assess the number of false positives relative to the number of no-debris-flow locations. That information is needed to assess whether the model risks desensitizing the public through frequent false alarms, which shows the need to consider all of these metrics. To further test robustness and stability, the entire modeling process (resampling, training, and validation) was repeated for ten different training/validation splits of the data, and the best of the resulting models was selected.
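The five metrics in Equations (1) to (5) can be computed directly from the four confusion-matrix counts. The sketch below is our own illustration rather than the study's code, and the counts in the usage example are hypothetical, not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Equations (1)-(5) from the four confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_0 = (tp + tn) / n  # observed agreement, i.e. overall accuracy (Eq. 3)
    # Chance agreement: P(observed yes) * P(predicted yes) + P(observed no) * P(predicted no)
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),        # Eq. (1)
        "specificity": tn / (tn + fp),        # Eq. (2)
        "accuracy": p_0,                      # Eq. (3)
        "threat_score": tp / (tp + fp + fn),  # Eq. (4), ignores TN
        "kappa": (p_0 - p_e) / (1 - p_e),     # Eq. (5)
    }

# Hypothetical counts, not results from this study.
m = classification_metrics(tp=30, fp=20, tn=80, fn=10)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.75, 'specificity': 0.8, 'accuracy': 0.786, 'threat_score': 0.5, 'kappa': 0.512}
```

In this example accuracy (0.786) is noticeably higher than kappa (0.512) because the dominant 'no' class inflates the agreement expected by chance, which is the conservatism of kappa noted above.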

Results and discussion
In this study, we used the C5.0 decision tree algorithm to develop predictive models with the aim of identifying locations in the western U.S. that are likely to produce post-wildfire debris flows. Ten candidate models were built to investigate the robustness and stability of the algorithm. The validation metrics of the candidate models and their averages are presented in Table 3. All 10 candidates gave stable, high metrics, with an average sensitivity of 77%, an improvement on the 66% from the logistic regression method (Table 4), which gives confidence in the use of the nonlinear C5.0 algorithm for this study. The algorithm was robust in distinguishing between the two classes of the response, even though the data distribution was skewed toward the 'no debris flow' class. The 2017 study of this same dataset using the logistic regression approach (Staley et al., 2017) reported validation outputs as counts of TPs, TNs, FPs, and FNs; we therefore computed the corresponding sensitivities, specificities, kappas, and accuracies (Table 4) to allow a direct comparison of the two approaches.
Comparing Table 3 with Table 4 shows that the logistic regression algorithm produces generally lower metrics than the C5.0 approach. Its highest sensitivity, 84%, corresponded to a specificity of 52%, which reflects the general pattern of its results: high sensitivities corresponded to low specificities, and vice versa. In other words, the model had difficulty isolating actual debris flow locations without also including some no-debris-flow locations (false positives). This is likely because the linear logistic regression model cannot capture the complex relationships needed to discern subtle differences in the likelihood of debris flow occurrence.
The C5.0 tree, on the other hand, affords a nonlinear approach by identifying unique cutoffs in the distributions of the predictors as they relate to the response. Model 3 in Table 3 was chosen as the overall best model because it had a high sensitivity with an equally high specificity, as well as a simple tree structure. A confusion matrix of the resulting classification of the test data for this model is presented in Table 5. The resulting sensitivity is 81% and the specificity is 78%, which shows an increased capacity to correctly identify approximately 8 out of 10 hazardous debris flow locations, with a relatively low false-positive rate of approximately 22% (1 − specificity). The rainfall predictors investigated in this study included the peak 15-, 30-, and 60-minute rainfall accumulations; the peak 15-, 30-, and 60-minute rainfall intensities; total storm accumulation; and average storm intensity. The decision tree plot (Figure 3), however, reveals that peakI30 was the only rainfall predictor in the final model, indicating that the model found it the most informative. This suggests that post-wildfire debris flows can be triggered within a short time (30 min or less) once an intense storm peak begins, which agrees with the literature (Cannon et al., 2010).
As discussed earlier, the rainfall predictors are essential because debris flows cannot occur in the absence of a triggering storm event. However, since the peakI30 predictor was multiplied by each of the other predictors in the final model, it is treated as a constant in the following discussion of the tree decisions. The resulting tree is simple, comprising a root, five branch nodes, and seven leaf nodes; there are therefore seven decision paths. Each decision had an accuracy of approximately 60% or better. The most important predictors were PropHM23, the proportion of moderate-to-high burn severity locations with slopes of 23% or higher, and Soil, which measures the thickness of overburden above a 'restrictive layer' such as bedrock, cemented layers, or frozen layers. PropHM23 was the root of the tree and thus formed the basis of every decision, whereas Soil was the 'tie-breaker' for approximately 86% of the decisions.
An overview of the seven decision paths suggested that the model was not overfitting the training data. The decision path from the root to leaf node 1 shows that fire-affected locations on steep slopes (>23%) that sustained mostly lower-severity burns, i.e., a low proportion of moderate-to-high severity burn (PropHM23 × peak_I30 < 2.9), are unlikely to produce debris flows. This decision was especially reassuring because it agrees with the main premise of this study that wildfires increase a location's susceptibility to debris flows. Decisions 2 through 7 suggest that the thicker the soil overburden above a 'restrictive layer' (Soil × peak_I30), the higher a location's chance of generating post-wildfire debris flows. We speculate that this is because such a basin has a greater supply of loose sediment primed for movement by a triggering rainfall. Decisions 4 and 6 further support this point by suggesting that even with a higher erodibility factor (KF), a basin might not be susceptible to post-fire debris flows if it has little loose material to move. In summary, the tree suggests that (1) post-wildfire debris flows are triggered by high-intensity storms over short periods; (2) any location with sufficiently steep terrain (slope >23) that experiences moderate-to-high severity burns (PropHM23) has an increased capacity to produce debris flows; and (3) the thicker the soil overburden above a 'restrictive layer', the higher the chance of generating these events.
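Because the tree splits on compound predictors, a single decision couples burn severity and storm intensity. The sketch below encodes only the first decision path reported in the text (PropHM23 × peak_I30 < 2.9 leads to leaf 1), assuming, as the description suggests, that leaf 1 is reached directly from the root split; the lower-node cutoffs are not given, so the sketch stops there. The function name and example inputs are hypothetical.

```python
def root_split(prop_hm23, peak_i30, cutoff=2.9):
    """Apply the root-node rule reported in the text (PropHM23 x peak_I30 < 2.9).

    Only this cutoff is given in the text, so deeper splits are not modeled.
    """
    compound = prop_hm23 * peak_i30  # link-function (compound) predictor
    if compound < cutoff:
        return "leaf 1: no debris flow"
    return "continue to lower nodes (Soil x peak_I30, KF x peak_I30)"

# Hypothetical basin with a PropHM23 value of 0.2 under two different storms.
print(root_split(0.2, 10))  # 0.2 * 10 = 2.0 < 2.9, so the path ends at leaf 1
print(root_split(0.2, 20))  # 0.2 * 20 = 4.0 >= 2.9, so the path continues
```

The same burned basin can fall on either side of the cutoff depending on the storm's peak 30-minute intensity, which is the intended effect of multiplying the basin predictors by peakI30.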
These are not new discoveries; past studies have reported or alluded to them. In fact, the recent logistic regression modeling identified similar predictors (Staley et al., 2017). This agreement confirms the importance of these predictors and leads us to recommend that future studies focus on the cutoff points identified by the C5.0 tree algorithm. Such work would further our understanding of the triggering and driving forces and help to better prepare for and mitigate future hazardous events.

Conclusions and future work
The main focus of this study was to investigate the use of the nonlinear C5.0 tree algorithm for predicting the likelihood of post-wildfire debris flow occurrence in the western U.S. and to compare it with the logistic regression approach. The results showed that the C5.0 approach produced stable validation metrics that were higher than those of logistic regression, with an average sensitivity of 77% compared with 66% for the logistic regression approach. Analysis of the tree from the adopted C5.0 model supported the intuitive hypothesis that a location with steep terrain (slope >23) that experiences moderate-to-high burn severity fires and has a thick soil overburden is highly susceptible to such events. Together, these results support the conclusion that the relationship between the predictors and the response is likely nonlinear, rather than the linear relationship assumed by logistic regression.