Evaluation of various boosting ensemble algorithms for predicting flood hazard susceptibility areas

Abstract The purpose of the present study was to predict the areas affected by flood hazard in the Talar watershed, Mazandaran province, Iran, using the Adaptive Boosting (AdaBoost), Boosted Generalized Linear Models (BGLM) and Extreme Gradient Boosting (XGB) ensemble models, together with Deep Boosting (DB), a novel ensemble framework of deep decision trees. For this purpose, 14 flood conditioning variables were used as independent variables in flood hazard modeling. In addition, 130 flood points in the region were identified by field visits and available flood information and used as the dependent variable in modeling. The results showed that all of the models predicted flood hazard well. The area under the curve (AUC) values of the BGLM, XGB, AdaBoost and DB models were 0.88, 0.87, 0.89 and 0.91, respectively, indicating that the DB model was the most efficient for flood hazard modeling in the study area. Analysis of relative importance showed that the variables have different effects in each model, with altitude and distance from the river more important than the others. Although the machine learning models identified these two variables as the most important, other variables may still influence flood hazard.


Introduction
Floods are the most devastating natural disasters, posing enormous threats to human lives and property (Lee et al. 2012; Mosavi et al. 2018) and to environmental and socioeconomic systems (Merkuryeva et al. 2015; Tehrany et al. 2015). Floods are a common disaster in the northern provinces of Iran owing to high-intensity rainfall, unsuitable watershed management and poor flood control structures (Tehrany et al. 2014). It is important to note that floods are part of nature; one of the main reasons for the increase in flood damage in recent years is changes in land use and residential construction along rivers and in flood-prone areas. Flood hazard modeling is therefore essential to identify the flood-susceptible areas of watersheds. Several factors affect flood incidence, including slope, altitude, land use, drainage network, distance from stream and soil type (Rahmati and Pourghasemi 2017; Zhao et al. 2018). Climatic and basin factors play the most important roles in causing floods (Janizadeh et al. 2019; Pham et al. 2020), and flood control projects benefit from an understanding of the climatic and basin factors that affect flood events (Merkuryeva et al. 2015). Hence, quantifying and evaluating the effects of these factors is useful and should be a high priority in flood mapping. Today, with the advancement of geo-information technologies such as Geographic Information Systems (GIS), machine learning (ML), remote sensing and statistical approaches, producing highly accurate flood-prone area maps is entirely achievable (Tehrany et al. 2014; Chapi et al. 2017; Zhao et al. 2018). However, such maps must be detailed enough to clarify flood processes, show how each factor influences flood incidence, and support the selection of an appropriate model.
Flood hazard modeling has been carried out for various areas at watershed, regional and national scales using different machine learning algorithms, namely the Generalized Linear Model (GLM; Chapi et al. 2017), Support Vector Machine (SVM; Tehrany et al. 2015), Random Forest (RF; Rashidi et al. 2016; Chapi et al. 2017; Chen et al. 2020), adaptive neuro-fuzzy inference systems (ANFIS; Ahmadlou et al. 2019; Wang et al. 2019) and ensemble models (Choubin et al. 2019; Chowdhuri et al. 2020). In this study, our focus was on applying different boosting algorithms and comparing their performance in preparing a flood susceptibility map. These algorithms have been used in many fields in recent years because of their high capability relative to other methods. Boosting yields an easy-to-read and interpretable algorithm, so its predictions are straightforward to interpret, and its predictive capability compares well with related ensemble methods built on decision trees, such as bagging and random forests. Boosting is also a resilient method that curbs over-fitting. Therefore, the present study focuses on boosting-type machine learning algorithms: AdaBoost (Freund and Schapire 1997), Boosted Generalized Linear Models (BGLM; Youssef et al. 2016), Extreme Gradient Boosting (XGB; Chen and Guestrin 2016) and Deep Boost (DB; Cortes et al. 2014). Standard ensemble algorithms such as AdaBoost combine functions selected from a base classifier hypothesis set H. In successful cases H is small, as with boosting stumps, i.e. decision trees of depth one. For more difficult problems, simple boosting stumps are not adequate to reach high accuracy, and a more complex hypothesis set is required, for instance decision trees whose depth is bounded by a comparatively large number.
For instance, Al-Abadi (2018) stated that the AdaBoost classifier was the most suitable, with respect to the statistical measures, for mapping flood susceptibility in an arid region, followed by the random forest and rotation forest models. Coltin et al. (2016) likewise indicated that AdaBoost is highly effective across an extensive variety of conditions in flood mapping. On the other hand, some studies, namely Lee et al. (2017), Albers et al. (2016) and Wang et al. (2015), showed that machine learning models such as random forests can be useful in flood susceptibility mapping and analysis. As indicated above, each model may achieve a different accuracy depending on its assumptions about the data distribution, its internal factors and its origins. Accordingly, there are no universal guidelines for assessing these models, and each has its strengths and drawbacks. However, combining individual models can yield more generalizable, less sensitive and more accurate results (Araújo and New 2007; Buston and Elith 2011).
Nowadays, the use of ensemble models has attracted researchers in various fields because combining different modeling algorithms reduces the generalization error of the predictions (Hosseini et al. 2020; Mosavi et al. 2020). These studies show that various ensemble models have been applied by researchers and found appropriate for studying flood hazard. However, the Deep Boost ensemble model has not yet been applied to flood hazard modeling. Therefore, the main objectives of this study are to (1) evaluate the efficiency of the Adaptive Boosting (AdaBoost), Boosted Generalized Linear Models (BGLM), Extreme Gradient Boosting (XGB) and novel deep decision tree (Deep Boosting) ensemble models for flood hazard modeling in the Talar watershed, Mazandaran province, Iran, (2) identify the most significant independent variables affecting flood hazard in the study area, and (3) predict a flood hazard map for the Talar watershed.

Case study
The Talar watershed is one of the mountainous watersheds of Iran, located in the provinces of Mazandaran, Semnan and Tehran, with most of its area in Mazandaran province. The study area covers 1770 km² between latitudes 35°45′ and 36°20′N and longitudes 52°35′ and 53°25′E (Figure 1). Elevation varies considerably, from a minimum of 214 m to a maximum of 3541 m. Precipitation is inversely related to altitude, so most precipitation falls in the low-lying areas of the Talar watershed; the average annual rainfall in this area is 609 mm. The main land uses in the Talar watershed are agriculture, rangeland, forest, orchard and residential areas. Forest covers a major part of the basin, but because of human intervention and misuse, forest cover has decreased in recent years (Figure 2).

Methodology
In this study, 130 flood points within the flood-affected locations were selected through field surveys and the available data of the Regional Water Department of Mazandaran Province. A further 130 non-flood points were generated by random selection in ArcGIS 10.5. The data were divided randomly into training (70%) and validation (30%) sets using the caret package in R (Choubin et al. 2019; Janizadeh et al. 2019).
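The 70/30 partition can be sketched as follows. This Python example merely stands in for the caret-based R workflow the study used; the feature matrix is synthetic, and stratification keeps the 1:1 flood/non-flood ratio in both subsets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical design matrix: 260 points (130 flood = 1, 130 non-flood = 0)
# with 14 conditioning-factor columns.
rng = np.random.default_rng(42)
X = rng.random((260, 14))
y = np.array([1] * 130 + [0] * 130)

# 70/30 random split, stratified so both classes keep their 1:1 ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (182, 14) (78, 14)
```

Fixing the random seed makes the split reproducible, which matters when several models are compared on the same training and validation sets.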
The steps of the modeling framework for AdaBoost, BGLM, XGB and the novel deep decision tree (Deep Boost) approach are summarized in the flowchart presented in Figure 3.

Dataset preparation for spatial modeling
A flood susceptibility map is essential for development planning, and a variety of flood conditioning parameters are required to construct the flood susceptibility model (Tehrany et al. 2014). The GIS database used for this purpose carries a degree of uncertainty that can influence the modeling procedure (Mojaddadi et al. 2017).
In the geospatial environment, all flood-conditioning factors are treated as unbiased. Here, 14 flood conditioning factors, namely slope, aspect, altitude, plan curvature, profile curvature, land use/land cover (LULC), drainage density, lithology, rainfall, topographic wetness index (TWI), stream power index (SPI), soil depth, distance from river and distance from road, were used as input data to estimate flood susceptibility (Table 1). These flood-related causal parameters are discussed below.
The digital elevation model (DEM) was prepared with a pixel size of 12.5 m from the ALOS PALSAR sensor, and the slope, aspect, plan curvature and profile curvature maps were derived from this DEM. Environmental degradation is one of the main reasons for the increase in floods and their devastating effects: improper construction and the creation of impervious surfaces such as buildings, streets and roads disrupt ecosystem function and increase the risk of flooding. The distance from river and distance from road layers were prepared from the river and road layers in GIS software as raster layers using the Euclidean distance tool, and the drainage density map was created with the line density tool. SAGA GIS software was used to derive the TWI and SPI layers:

TWI = ln(As / tan β) (1)

SPI = As × tan β (2)

where As is the specific catchment area and β is the slope in radians.
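As a hedged illustration of the TWI and SPI definitions above, the sketch below evaluates both indices for a few hypothetical grid cells (the As and β values are invented; in practice both grids are derived from the DEM):

```python
import numpy as np

# Hypothetical per-pixel grids: specific catchment area As (m)
# and slope beta (radians), normally computed from the 12.5 m DEM.
As = np.array([[50.0, 400.0], [1200.0, 8000.0]])
beta = np.array([[0.30, 0.15], [0.08, 0.02]])

eps = 1e-6  # guard against division by zero on perfectly flat cells
twi = np.log(As / (np.tan(beta) + eps))  # Equation (1)
spi = As * np.tan(beta)                  # Equation (2)
```

Cells with a large contributing area and a gentle slope receive the highest TWI, which is why valley bottoms score as the wettest locations.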
The soil depth map of the region was obtained from the Natural Resources Administration of Mazandaran Province, Iran. The lithological map was derived from the 1:100,000 geological map published by the national mapping organization. The land use map was prepared from Landsat OLI satellite images using the Maximum Likelihood Classifier in the ENVI software. The precipitation map was prepared by interpolation of observed data at seven climatological stations in Mazandaran province.

Multi-collinearity analysis
Multi-collinearity refers to interdependence among two or more predictor variables. Assessing a data set for multi-collinearity should be a standard step in any probability estimation (Haitovsky 1969). The main issue with multi-collinearity is inflated variance in the least-squares coefficient estimates of the correlated variables (Slinker and Glantz 1985). The other negative impacts follow from these large variances: the estimates may disagree in sign or magnitude with the known characteristics of the variables, and the partial F-statistics become unreliable for variable selection. Tolerance (TOL) and the Variance Inflation Factor (VIF) are estimated as

TOL_j = 1 − R_j², VIF_j = 1 / TOL_j

where R_j² is the coefficient of determination obtained by regressing the j-th predictor on all remaining predictors. The VIF indicates how much larger the variance of a coefficient is than it would be for orthogonal data (for which VIF = 1.0). If the VIFs are not substantially larger than 1.0, multi-collinearity is not a major issue (Slinker and Glantz 1985).
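As an illustrative sketch (in Python, whereas the study itself used the usdm package in R), the function below computes VIF and TOL for each column of a design matrix by regressing it on the remaining columns. The data are synthetic, with the third column deliberately built as a near-copy of the first so that both show a large VIF:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif_tol(X):
    """VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 - R_j^2, where R_j^2 comes
    from regressing column j on all remaining columns."""
    vifs, tols = [], []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        tols.append(1.0 - r2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs), np.array(tols)

# Hypothetical factors: x2 nearly duplicates x0, so both get large VIFs.
rng = np.random.default_rng(0)
x0, x1 = rng.random(200), rng.random(200)
x2 = x0 + 0.01 * rng.random(200)
X = np.column_stack([x0, x1, x2])
vif, tol = vif_tol(X)
```

In this example the independent column keeps a VIF near 1, while the two near-duplicates far exceed the usual threshold of 10 and one of them would be dropped before modeling.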

AdaBoost
Boosting is a reliable technique for solving classification problems and is mainly associated with two-class classification. It was initially introduced by Freund and Schapire (1995) in the form of the AdaBoost algorithm. How AdaBoost is best generalized from two classes to multiple classes depends on an understanding of its behavior in binary classification, which is still debated. It has been observed that AdaBoost is similar to a forward stagewise additive modeling algorithm that minimizes an exponential loss.
This technique increases predictive accuracy by integrating multiple weak classifier models (Zahid et al. 2020). The key idea in AdaBoost is to maintain a distribution of weights (w) over the training set and update it at each iteration (t) so that difficult examples receive more attention. AdaBoost first draws a training sample D = (x_i, y_i), where each instance x_i is a vector of attribute values from the domain X and each label y_i ∈ Y is associated with x_i; it then trains predictors sequentially, at each round selecting a training distribution that depends on the errors of the previous round. Incorrectly classified examples are assigned larger weights (w), so they have a better chance of being classified correctly in the next iteration. Each trained classifier is also assigned a weight according to its accuracy: more accurate classifiers receive higher weights. This procedure continues until the training data are fit without error or the specified maximum number of estimators is reached. At each round the algorithm iterates over all feasible base functions, evaluates the loss of each, and chooses the best one as the next weak classifier. The role of the weak learner at round t is thus to find a weak hypothesis h_t : X → {−1, +1} appropriate for the distribution D_t, choosing h_t so as to minimize the error ε_t.
By modifying the distribution D and emphasizing the misclassified examples, the final AdaBoost classifier is obtained as

H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

where h_t is a weak learner, α_t is its coefficient and H(x) is the final hypothesis.
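A minimal sketch of this procedure uses scikit-learn's AdaBoostClassifier, whose default base classifier h_t is a depth-1 decision stump; the two-factor data and settings below are illustrative only, not the study's configuration:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical 2-factor data with a diagonal decision boundary: no single
# axis-aligned stump separates it, but a weighted vote of T = 100 stumps can.
rng = np.random.default_rng(1)
X = rng.random((400, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Sample weights w are shifted toward misclassified points at every round,
# and each stump h_t receives a vote alpha_t based on its weighted error.
model = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X, y)
print(model.score(X, y))
```

The weighted vote of many stumps approximates the diagonal boundary closely even though each individual stump can only cut parallel to an axis.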

Boosted generalized linear models (BGLM)
The Generalized Linear Model is an extension of the linear model in which the regression structure is modified to accommodate non-normal response distributions. For a binary response the model can be written as (Youssef et al. 2016)

p = e^(C_0 + C_1 X_1 + ... + C_n X_n) / (1 + e^(C_0 + C_1 X_1 + ... + C_n X_n))

where Y = p is the probability of the outcome of interest, which varies with the combined effect of the explanatory variables. The LOGIT link provides a linear relationship for modeling a response bounded between 0 and 1. Thus a GLM comprises three components: (1) a conditional probability distribution for the response variable Y, (2) a linear predictor, and (3) a link function relating the linear predictor to the mean of the response distribution (Nikita 2014; Youssef et al. 2016). BGLM fits this model by component-wise boosting, meaning that one component of the predictor, i.e. one covariate term, is fitted at a time (Tutz and Groll 2010). The BGLM function fits a (generalized) linear framework of the covariates x = (x_1, ..., x_p)^T, with response expectation μ = E(y|x), a link function and coefficients β (Hofner et al. 2014).
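The component-wise fitting idea can be sketched in a few lines. The example below uses squared-error loss on synthetic Gaussian data for simplicity (the study's BGLM uses a logistic model via R's mboost-style machinery, so this is only an analogy): at each iteration, every single covariate is fitted to the current residuals, and only the best-fitting covariate's coefficient is updated, shrunk by the learning rate:

```python
import numpy as np

def componentwise_boost(X, y, nu=0.1, iters=200):
    """Component-wise L2 boosting: at each step, fit each covariate alone to
    the residuals and update only the coefficient of the best-fitting one,
    shrunk by the learning rate nu (a sketch of the BGLM fitting idea)."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(iters):
        best_j, best_b, best_sse = 0, 0.0, np.inf
        for j in range(p):
            b = X[:, j] @ resid / (X[:, j] @ X[:, j])
            sse = np.sum((resid - b * X[:, j]) ** 2)
            if sse < best_sse:
                best_j, best_b, best_sse = j, b, sse
        beta[best_j] += nu * best_b
        resid -= nu * best_b * X[:, best_j]
    return intercept, beta

# Synthetic data: only covariates 0 and 3 truly matter.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.standard_normal(300)
intercept, beta = componentwise_boost(X, y)
```

Because only one coefficient moves per iteration, irrelevant covariates tend to stay near zero, giving the method a built-in variable selection effect.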

Extreme gradient boosting (XGB)
This method was initially proposed by Chen and Guestrin (2016) as an innovative algorithm, an advanced form of gradient boosting applied to K-class classification and regression trees. Through its additive procedure, it builds a strong prediction from a combination of weak prediction sets. XGBoost tries to reduce computational complexity while making better use of computing resources. This is achieved by modifying the optimization problem to combine the loss with regularization terms while maintaining optimal computing speed. In the additive learning mechanism of XGBoost, the first learner is fitted to the entire input data, and subsequent learners are then fitted to the residuals to address the shortcomings of the previous weak learners. This process is repeated until a stopping criterion is met. The final estimate of the model is the sum of the individual learners' predictions.
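The additive mechanism can be illustrated with scikit-learn's GradientBoostingClassifier, a close relative of XGBoost without its extra regularization terms (using it here, and the synthetic data, are assumptions for portability). Its staged_predict method exposes the prediction after each added tree:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data with an interaction between two factors.
rng = np.random.default_rng(3)
X = rng.random((500, 4))
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)

gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=3).fit(X, y)

# staged_predict exposes the additive mechanism: each stage adds one tree's
# contribution, so accuracy typically improves as trees accumulate.
acc = [np.mean(p == y) for p in gb.staged_predict(X)]
print(acc[0], acc[-1])
```

Watching the staged accuracy climb makes the "weak learners fitted to residuals" description concrete: early stages capture the bulk of the signal and later trees refine the boundary.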

Deep boost (DB)
An ensemble of simple boosting stumps sometimes cannot reach sufficient predictive accuracy (Cortes et al. 2014). It is then tempting to use a more complicated hypothesis set, for example decision trees whose depth is bounded by a comparatively large number. However, the generalization guarantees for AdaBoost depend not only on the number of training instances but also on the complexity of the hypothesis set H, measured by its VC-dimension or Rademacher complexity (Cortes et al. 2014). Deep Boosting addresses this by allowing deep trees while penalizing hypotheses drawn from the more complex sub-families, so expanding the hypothesis set can improve the model without sacrificing generalization.
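The limitation that motivates deeper base learners can be seen on a simple synthetic example: a label defined by a conjunction of two conditions cannot be captured by any single depth-1 split, but a depth-2 tree separates it exactly. The data and thresholds below are invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Label = 1 only when BOTH factors exceed 0.5 (a conjunction): no single
# axis-aligned split (a stump) can represent it, but a depth-2 tree can.
rng = np.random.default_rng(4)
X = rng.random((400, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
deeper = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(stump.score(X, y), deeper.score(X, y))
```

The stump tops out near 75% accuracy here, while the depth-2 tree is essentially exact; Deep Boosting's contribution is to admit such deeper trees into the ensemble while charging them for their extra capacity.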

Evaluation of efficiency
The performance of the flood susceptibility models was validated with several statistical indices: positive predictive value (PPV), negative predictive value (NPV), sensitivity (SST), specificity (SPF) and accuracy (ACC). All modeling steps in this study were performed in R software (R Core Team 2020). Variance inflation factors (VIFs) were computed using the usdm package version 1.1 (Naimi 2015); all models and the ensemble forecasts were run with the caret package in R.
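These indices are all derived from the confusion matrix. The sketch below, in Python rather than the R/caret workflow the study used, illustrates the definitions on a small hypothetical validation set:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical labels and model scores for a validation split.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.95, 0.35])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
ppv = tp / (tp + fp)                   # positive predictive value
npv = tn / (tn + fn)                   # negative predictive value
sst = tp / (tp + fn)                   # sensitivity
spf = tn / (tn + fp)                   # specificity
acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
auc = roc_auc_score(y_true, y_prob)    # area under the ROC curve
print(ppv, npv, sst, spf, acc, auc)
```

Note that the AUC is computed from the continuous scores, not the thresholded predictions, which is why it can rank models even when their 0.5-threshold accuracies tie.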

Multi-collinearity analysis
Multi-collinearity analysis removes redundant variables from predictive models by examining the relationships among two or more independent variables. TOL values ≤ 0.1 or VIF values ≥ 10 indicate high multi-collinearity (Amiri et al. 2019). From the multi-collinearity analysis of the 14 flood conditioning factors, VIF ranged from 1.08 to 3.91 and TOL from 0.26 to 0.93, which favors efficient and accurate predictive models. The VIF and TOL values of the 14 factors are detailed in Table 2. All variables fall within the permissible limits of VIF and TOL, so multi-collinearity is not a problem here.

Flood susceptibility modeling
The flood susceptibility assessment was carried out using several boosting ensemble algorithms, including the novel deep decision tree framework (Deep Boost), together with GIS to delineate vulnerable areas. There are, however, some spatial differences among the susceptible areas predicted by the models. All raster outputs were reclassified into qualitative classes using the natural breaks method in the GIS environment (Figure 5 and Table 3).
In AdaBoost, the areal coverage of the very high, high, moderate, low and very low classes is 152.04 km² (8.59%), 217.9 km² (12.31%), 222.66 km² (12.58%), 251.88 km² (14.23%) and 925.8 km² (52.30%), respectively. A large part of the study area has very low to moderate sensitivity to floods. The very low to low susceptibility areas (66.53%) are distributed in the southern parts of the study area, whereas the remainder (33.48%), mainly the major streams and their adjacent areas, is associated with moderate to very high susceptibility.
In Boosted Generalized Linear Models (BGLM), the areas of the very high, high, moderate, low and very low classes are 263.2 km² (14.87%), 366.6 km² (20.7%), 468.96 km² (26.49%), 422.57 km² (23.87%) and 248.95 km² (14.06%), respectively. Most of the watershed (62.06%) falls in the moderate to very high susceptibility classes, concentrated along the streams and their zones of influence. The remaining area (37.93%) is associated with very low to low susceptibility zones, located in the southern section of the study area.

In Extreme Gradient Boosting (XGB), the areal coverage of the very high, high, moderate, low and very low classes is 257.26 km² (14.53%), 97.73 km² (5.52%), 113.75 km² (6.43%), 176.75 km² (9.99%) and 1124.75 km² (63.54%), respectively. As the surface morphology and river valley density would suggest, 73.53% of the region is associated with very low to low flood susceptibility, located in the southern, eastern and western portions of the catchment. The remaining 26.48% is associated with moderate to very high flood susceptibility, located only along the major stream lines and their surroundings.
In Deep Boost (DB), the areal coverage of the very high, high, moderate, low and very low classes is 120.67 km² (6.82%), 156.19 km² (8.82%), 154.19 km² (8.71%), 312.22 km² (17.64%) and 1027.01 km² (58.01%), respectively. In this model, 75.65% of the region is associated with very low to low flood susceptibility, located in the southern, eastern and western portions of the catchment. The moderate to very high flood susceptibility areas, located only along the major stream lines and their surroundings, account for 24.35% of the area.

Evaluation of models accuracy
The predictive models were evaluated using the following statistical parameters: sensitivity, specificity, PPV, NPV and AUC. The AUC values of the ROC curves of the AdaBoost, BGLM, XGB and Deep Boost models are 0.89, 0.88, 0.87 and 0.91, respectively (Figure 6). From this analysis it was observed that all the predictive methods showed high accuracy and successfully predicted flood susceptibility (Table 4).

Variables importance analysis
The greatest importance of the variables in the AdaBoost model was found for altitude (100), distance from river (75.02), drainage density (68.11) and distance from road (51.17), while moderate to minor importance is associated with soil depth (26.22), slope (24.27), plan curvature (21.87), TWI (20.13), lithology (13.12), profile curvature (11.41), land use (2.37) and aspect (1.11). For SPI, no importance was found in the flood susceptibility modeling.

In the BGLM model, the greatest importance was found for drainage density (100), profile curvature (46.66) and TWI (36.45), while moderate to minor importance is associated with slope (27.21), lithology (9.88), soil depth (7.55), altitude (6.64) and distance from road (1.5). For aspect, plan curvature, distance from river, rainfall, land use and SPI, no importance was observed in the flood susceptibility modeling.

In the XGB model, the greatest importance was found only for altitude (100), while moderate to minor importance is associated with the other variables: distance from river (23.78), distance from road (19.97), plan curvature (10.78), slope (7.24), aspect (5.73), drainage density (5.53), lithology (5.07), TWI (4.95), rainfall (4.94), profile curvature (3.83), soil depth (3.04) and SPI (0.25). For land use, no importance was observed in the flood susceptibility modeling.
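Importance scores like those above are typically read from the fitted ensemble and rescaled so the top variable equals 100. A hedged Python sketch (the study used R's caret; the variable names and data here are hypothetical, with the label deliberately driven mostly by "altitude"):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

names = ["altitude", "dist_river", "slope", "aspect"]  # hypothetical subset
rng = np.random.default_rng(5)
X = rng.random((500, 4))
# Label driven mostly by "altitude", so it should dominate the importances.
y = (X[:, 0] + 0.2 * X[:, 1] > 0.6).astype(int)

gb = GradientBoostingClassifier(n_estimators=100, random_state=5).fit(X, y)

# Rescale to the 0-100 convention used in the tables above.
imp = 100.0 * gb.feature_importances_ / gb.feature_importances_.max()
ranked = sorted(zip(names, imp), key=lambda t: -t[1])
```

Tree-based importances of this kind measure how much each variable reduces impurity across all splits, which is why the rankings can differ between models fitted to the same data.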

Discussion
In the present study, we evaluated the efficiency of various boosting algorithms, including AdaBoost, BGLM, XGB and Deep Boosting, for flood susceptibility mapping in the Talar watershed, Mazandaran province, Iran. Based on the evaluation criteria, the Deep Boost model is the most efficient, followed by the AdaBoost, BGLM and XGB models. The variable importance analysis of the most accurate model (i.e. the Deep Boost model) showed that the important variables are altitude, distance from the river, drainage density and distance from the road.
The focus of this research was to introduce an efficient model for flood hazard evaluation based on different boosting ensemble algorithms in the Talar watershed, Mazandaran province, Iran. To increase the accuracy of the predictions, different ensemble approaches were compared. The Deep Boost model showed the maximum predictive accuracy, followed by AdaBoost, BGLM and XGB, respectively. The efficiency of the Deep Boost model has also been demonstrated in studies across various other fields.
One reason for the higher accuracy of the Deep Boost model is that the conditioning factors contribute more informatively to the flood hazard assessment than in the other models. In addition, the Deep Boost algorithm has features that readily increase efficiency and accuracy: it is an ensemble model in which deep decision trees, or members of other very rich or complex families, can be combined to achieve a higher level of performance (Cortes et al. 2014). The efficiency of the base classifier is one of the determining elements of model accuracy, and estimation on independent samples is one of the major issues for accuracy and predictive capability. The main advantage of a hybrid algorithm is that its predictive ability is generally higher than that of a single model; previous studies suggest that hybrid ensemble approaches are more efficient than any single model, so our main objective was to propose the most suitable model for predicting flood susceptibility. In this study, the importance of the variables affecting flood hazard was also evaluated. The importance values show that altitude, distance from the river and distance from the road are very effective variables in modeling flood hazard in this study area. The flood susceptibility maps of the Talar watershed show that low-elevation areas of the watershed, which often have a low slope and are close to the watershed outlet, are highly sensitive to floods. The importance of altitude has been confirmed in other flood hazard modeling studies (Sahana and Patel 2019; Das and Pal 2020; Yariyan et al. 2020; Kalantar et al. 2021). In accordance with the present results, Kalantar et al. (2021) demonstrated that altitude and distance from the river are the most important variables in flood hazard modeling in Brisbane, Australia.
Physiographic characteristics such as altitude directly affect the hydrological regime and indirectly affect the climate of a region. In other words, physiographic and topographic conditions have a decisive effect on hydrological characteristics and the water regime, such as flood volume, runoff coefficient and the intensity of flood discharge (Dodangeh et al. 2020). Another factor affecting flood hazard in the study area is distance from the river: the flood susceptibility results indicate that areas close to the river are highly sensitive to flood risk. Various researchers have likewise shown that distance from the river is one of the important factors affecting flood risk and that most areas near rivers are highly susceptible (Arabameri et al. 2020; Mosavi et al. 2020). In this study, distance from the road was also an important factor in modeling flood risk. Because road construction creates surfaces with lower water permeability than the surrounding land, it is an important factor in generating floods (Pesaresi et al. 2016).
Floods are among the major problems of every region of the world and are associated with extreme losses of property, economy and livelihood (Few 2003). Various methods have therefore been developed for estimating flood susceptibility, all with the same objective of predicting it as accurately as possible. Most researchers have tried to incorporate new algorithms, develop new models or build hybrids of existing models (Nourani et al. 2014). Floods cannot be completely avoided, but their effects can be minimized through appropriate measures. For this purpose, the accurate identification of vulnerable areas is necessary and is the first consideration for planning. Accordingly, boosting models such as Deep Boost offer high accuracy in spatial prediction and modeling and are suggested as a suitable method for future studies.

Conclusion
In this study, four boosting ensemble algorithms were considered for flood hazard modeling. According to the evaluation criteria, the Deep Boost ensemble approach was the most efficient of the approaches examined. All models had high accuracy, and 24.35% of the catchment area falls in the moderate to very high flood hazard zones. Based on the results of the Deep Boost model, the most important variables are altitude, distance from the river and drainage density. The main finding of the present study is the identification of flood-susceptible areas. In recent years, owing to human activities and environmental changes, the rate of natural disasters such as floods in Iran has increased, and one of the main ways to reduce flood risk is to assess flood-prone areas in different watersheds. The outputs of this investigation can therefore support future development planning for flood mitigation in the Talar watershed. In terms of methodology, researchers have assessed spatial flood hazard modeling with different models in a number of studies and recommended an optimum one. The results of this study recommend the Deep Boost algorithm, not only for future flood hazard studies but also for other fields of spatial modeling, because of its high accuracy.
Ogunrinde wrote the manuscript and discussion and analyzed the data; Subodh Chandra Pal, Quoc Bao Pham and Khaled Mohamed Khedher provided technical insights and edited, restructured and professionally optimized the manuscript. All authors discussed the results and edited the manuscript. All authors have read and agreed to the published version of the manuscript.