Flood susceptibility mapping using meta-heuristic algorithms

Abstract Flood is a common global natural hazard, and detailed flood susceptibility maps for specific watersheds are important for flood management. We compute the flood susceptibility map for the Kaiser watershed in Iran using machine learning models, namely support vector machine (SVM), particle swarm optimization (PSO), and genetic algorithm (GA), along with the ensembles SVM-GA and PSO-GA. The application of such machine learning models in flood susceptibility assessment and mapping is analyzed, and suggestions for future research are presented. The flood susceptibility model was constructed from fifteen causative factors: slope, slope aspect, elevation, plan curvature, land use and land cover, normalized difference vegetation index (NDVI), convergence index (CI), topographic wetness index (TWI), topographic position index (TPI), drainage density (DD), distance to stream, terrain ruggedness index (TRI), terrain surface texture (TST), geology, and stream power index (SPI), together with flood inventory data that were split 70% for training the model and 30% for validating it. The model output was evaluated through sensitivity, specificity, accuracy, precision, Cohen's kappa, F-score, and the receiver operating characteristic (ROC) curve. Evaluation of the flood susceptibility mapping through the ROC method along with flood density shows robust results for the support vector machine (0.839), particle swarm optimization (0.851), genetic algorithm (0.874), SVM-GA (0.886), and PSO-GA (0.902). Comparisons were made with methods commonly used in susceptibility assessment. A high-quality, informative database is essential for the classification of flood types in flood susceptibility mapping and greatly improves model performance. The ensemble PSO-GA performs better than the standalone machine learning models, yielding a high degree of accuracy (AUC = 0.902).
Our approach, therefore, provides a novel method for flood susceptibility studies in other watersheds.


Introduction
Flood is one of the most common and disastrous natural hazards in different regions of the globe. According to United Nations disaster and risk reduction data, floods from 1996 to 2015 caused severe damage to property and human life (Hong et al. 2018; Chen et al. 2019). Preparing flood susceptibility maps is important for reducing flood damage and for issuing alerts through suitable measures in susceptible areas. In this context, geo-environmental parameters are essential for formulating susceptibility maps based on the most significant causative factors of flooding. The susceptibility map helps regional planners and decision-makers take suitable measures. Even though floods cannot be controlled, such maps help in managing pre-flood conditions and post-flood operations.
The assessment of floods is complex because of the diverse climatic and anthropogenic factors involved. With the ever-increasing global population, devastating floods can affect millions of lives and cause substantial property loss. Due to global climate change, the increasing frequency and intensity of rainfall also lead to floods. In the case of Iran, the climate is highly variable, with cold winters and heavy snowfall at subfreezing temperatures in the northwestern part. The weather is relatively mild during spring, whereas summer is dry and hot. In the southern part of the country, winters are mild and summers very hot, with an average daily temperature in July exceeding 38 °C. Though Iran has a continental climate, most precipitation occurs from October to April, and the annual average precipitation is 880 mm. When the intensity of rainfall exceeds the infiltration capacity, flood events occur. Anthropogenic activities such as land use change, unscientific construction, dam construction, cutting of trees for road construction, and rapid settlement on riverbanks have all increased flood impacts. Dam construction can alter the normal hydrological process, while impervious areas from road construction and urbanization reduce the infiltration rate and promote flooding with warmed runoff that indirectly disrupts ecosystem processes in the river (Hawley and Bledsoe 2011). Therefore, both climate and human activities contribute to flood occurrences.
The preparation of flood susceptibility maps using various machine learning models has gained attention in recent years, rather than the earlier approach of using only statistical models. Several previous studies have adopted machine learning models to prepare flood susceptibility maps, such as decision trees (Khosravi et al. 2018), support vector machines (SVM) (Huang et al. 2010; Lin et al. 2013), random forest (Chapi et al. 2017), and artificial neural networks (Campolo et al. 2003; Gebrehiwot et al. 2019). Several regional approaches have also been applied in catchments to estimate flood probabilities, namely the index flood (Dalrymple 1960) and Region of Influence (Burn 1990; Zrinji and Burn 1994) methods. In recent years, many data-driven methods based on bivariate statistical models, multi-criteria decision making (MCDM), artificial intelligence (AI), computational intelligence (CI), soft computing, data mining (DM), knowledge discovery in databases (KDD), and intelligent data analysis (IDA) have shown promise in predicting flood susceptibility. Recently, hybrid computational methods have been shown to prepare flood susceptibility maps more accurately. Sub-sampling and bootstrapping algorithms with machine learning are used for preparing such maps. Costache et al. (2020) point out the application of an integrated frequency ratio and logistic regression model for FSM. Bagging and random subspace ensemble strategies combined with reduced-error pruning tree models have also been applied (Bui et al. 2019; Wang et al. 2019). Bui et al. (2019) and Wang et al. (2019) used meta-heuristic optimization algorithms to find the optimal parameters of intelligent models. Though these methods follow different strategies, they are all helpful in predicting the flood probability of unknown areas.
A flood susceptibility map is a tool for risk reduction measures (Tehrany et al. 2014; Youssef et al. 2016; Dano et al. 2019; Mind'je et al. 2019). Given the nonlinear relationship between multiple variables and hazard level, flood susceptibility assessment is considered a quantitative evaluation of the degree of rainstorms and their adverse impact, from the root cause of the flood to the final output as an environmental hazard. Methods for flood susceptibility mapping fall into three categories: hydrological and hydrodynamic models, multi-criteria decision analysis, and machine learning models. In machine learning methods, feature engineering is an essential step that uses domain knowledge to turn raw data into features that yield more accurate results. This critical technique converts raw data into data specific to the predictive models. Thus, the information from the study area is first converted from raster data to a specific spatial resolution. Susceptibility prediction is treated as a binary classification process that distinguishes the grid cells responsible for flood disaster. Each grid cell comprises a feature value for each conditioning factor.
Most previous studies provided a simple relative contribution rate of two factors based on time series, where the flood susceptibility map with causative characteristics represents the final output. The simple relative contribution rate helps in understanding the effects of climate change and human activities on flooding. Nevertheless, modern approaches establish the causative factors as spatial training data and validate them through ensembles for accuracy.
A large amount of spatial data can be acquired through remote sensing (Zhao et al. 2020; Chao et al. 2021; Wang et al. 2021; Zheng et al. 2021; Zhao et al. 2021) and geographical information systems (GIS) (Li and Zhang 2008). MCDM (Yang et al. 2022), statistical, deep learning (Zhong et al. 2021), and machine learning models and their hybrid combinations have been used for better accuracy. Based on a careful evaluation of previous studies, the most popular and robust methods were selected for this research: SVM, particle swarm optimization (PSO), and genetic algorithm (GA) for preparing the flood susceptibility map to identify vulnerable zones.
This study aims to present an appropriate methodology for flood susceptibility assessment in the Kaiser watershed, Iran; for this purpose, we used machine learning models on a remote sensing and GIS platform. This approach helps identify flood-prone or vulnerable zones by considering geo-environmental factors in the surrounding environment. Fifteen causative factors were considered in the model for training and testing. The output of the three machine learning models was validated through sensitivity, specificity, accuracy, Cohen's kappa, the receiver operating characteristic (ROC) curve, precision, and F-score. The results of the present study can help regional planners, engineers, and future researchers focus on vulnerable zones for effective management. Governmental agencies can also use the information to take suitable preventive measures. This paper is organized as follows: Section 2 describes the study area and dataset. Section 3 presents the models with a brief discussion. The results of applying the various techniques are presented in Section 4 to address flood-prone areas in this region. Finally, a summary and conclusions end the paper.
Based on previous literature, hybrid models perform better than single models. SVM has advanced considerably in dealing with nonlinear regression prediction. However, its generalization performance is sensitive to the selection of its parameters (Bui et al. 2016). Difficulty in capturing the critical modeling variables is noted as a major drawback of SVM. Hence, it is important to apply optimization algorithms to search for the optimal parameters of SVM. The genetic algorithm (GA) is an optimization algorithm that simulates the mechanism of genetic variation and the theory of biological evolution. GA has the advantages of parallelism and global optimization and has achieved excellent optimization results in many studies (Zhang et al. 2014). In the current study, GA was utilized to select and optimize the parameters of SVM. The application of the SVM-GA hybrid model to flash flood susceptibility mapping is the novelty of this manuscript.

Study area
The Kaiser watershed, covering 3648.32 km², is located in Mazandaran province, northern Iran. The area lies between 35°56′22″ and 36°49′20″ N latitude and 53°00′34″ and 53°43′41″ E longitude (Figure 1). The elevation ranges from 1 m a.s.l. to 3725 m a.s.l., with an average elevation of 932.62 m a.s.l. Half of the basin is mountainous, forming part of the Alborz mountains, whereas the northern part of the watershed has a smooth, gentle slope. According to IRIMO (2012), the annual rainfall of this area is around 600 mm. High-intensity rainfall over short periods and land-use changes in rangelands, e.g. deforestation, conversion to agricultural land, and conversion of gardens to residential areas, cause severe floods here. Maximum rainfall occurs in January, February, March, and October; October is the wettest month, with an average monthly precipitation of 160 mm. The climatic features of the area are semi-humid, semi-arid, and Mediterranean. Most of the study area is used for agriculture (around 45.35%), followed by rangeland (38.07%), forest (15.95%), and residential areas (0.64%). Geologically, the dominant part of the area consists of marl, calcareous sandstone, sandy limestone, and minor conglomerate. Red conglomerate and sandstone are also found. In this study area, alfisols (58.30%), mollisols (25.76%), and rock outcrops/entisols (8.73%) are the most common soil types.

Database
Data preparation is an essential first step in preparing the flood susceptibility map. The slope, slope aspect, elevation, and plan curvature factors were extracted from the digital elevation model (DEM) using ArcGIS software. Using Euclidean distance tools, the drainage density and distance to stream were extracted. The land use/land cover data were derived from Landsat 8 satellite images. The NDVI factor was calculated using ENVI software. The topographic wetness index (TWI), topographic position index (TPI), stream power index (SPI), terrain ruggedness index (TRI), terrain surface texture (TST), and convergence index (CI) were calculated from the DEM using SAGA software. Data at various scales were first vectorized on the ArcGIS platform. The flood conditioning factors were extracted and converted from vector to raster data. All data were converted into raster data with a spatial resolution consistent with the DEM (Table 1). Based on previous data, expert knowledge, and characteristics of the flood distribution, the factors were reclassified. In this study, 70% of the flood and non-flood points were used as the training dataset, and 30% were used for validation.
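The 70/30 split of flood and non-flood points described above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the feature matrix and labels here are synthetic stand-ins, not the study's actual database:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature table: one row per grid cell, one column per
# conditioning factor (slope, aspect, elevation, ...); y marks flood (1)
# vs non-flood (0) points.
rng = np.random.default_rng(42)
X = rng.random((200, 15))
y = rng.integers(0, 2, 200)

# 70% of the points for training, 30% for validation, stratified so both
# subsets keep the flood/non-flood balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

print(len(X_train), len(X_val))  # 140 60
```

Stratifying on the class label keeps the flood/non-flood proportion equal in both subsets, which matters when flood points are scarce.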

Methodology
We adopted the following workflow for this study.
I. Firstly, the flood locations and causative factors were identified through high-resolution satellite images, Google Earth images, GPS, and an extensive field survey. The flood and non-flood points were split randomly, 70% and 30%.
II. Multi-collinearity analysis among the factors was done through the variance inflation factor (VIF) and tolerance (TOL).
III. To obtain the susceptibility map for identifying flooding zones, three models were used, namely SVM, PSO, and GA.
IV. Two ensemble models were used for better performance and accuracy: SVM-GA and PSO-GA (Figure 2).
V. Validation of results has a vital role in scientific works (Moayedi and Mosallanezhad 2017; Moayedi et al. 2021; Mi et al. 2022). The models' results were validated through sensitivity, specificity, accuracy, F-score, Cohen's kappa, precision, and the receiver operating characteristic curve.

Flood inventory map
For modeling and susceptibility mapping of floods in an area, an inventory map is the essential primary data for training and testing the applied models. The flood inventory map shows the spatial distribution of floods and their geometry. Through the susceptibility map, one can predict vulnerable flood zones by studying historical flood events. In recent years, remote sensing and GIS have provided important platforms for creating flood inventory maps. The geographical locations of floods were collected through Google Earth images and local and regional organizations. Extensive field observation was done to complement the global positioning system (GPS) and Google Earth images. Based on these data, all-inclusive maps were created in ArcGIS 10.3. The final inventory map was split into training and validation data in seven replicates with different training/validation balances. In this study, 70% of the flood points were used as the training dataset and 30% as the validation dataset. Moreover, the results from the three models were validated through sensitivity, specificity, accuracy, precision, F-score, Cohen's kappa, and the receiver operating characteristic curve.

Flood conditioning factors
A variety of environmental factors trigger flooding, and appropriate causative parameters ensure the reliability and accuracy of flood susceptibility mapping (Chapi et al. 2017). We selected the flood conditioning factors in this research based on previous studies and expert knowledge. Previous work has noted that a flat area is more prone to flooding, as water flows down from higher terrain; river flood disasters are more likely to occur in lower-elevation, lower-slope areas of Iran. The degree of the slope surface is indicated by curvature and is an important flood-causing factor; its minimum value is 0 and its maximum 72.45 degrees (Figure 3). Slope aspect is another key factor, as windward slopes are prone to precipitation; aspect is related to the intensity of solar radiation, which affects surface vegetation and soil moisture. The elevation, ranging from 1 m to 3725 m, controls the water flow rate and is responsible for flooding. Furthermore, the terrain parameter convergence index (CI) represents the agreement of aspect directions and controls the flooding process; its value varies from −100 to 100 and is extracted from the DEM and satellite images with 12.5 m resolution. Several authors (e.g. Tehrany et al. 2015; Mahmoud and Gan 2018) confirmed that the topographic wetness index (TWI) represents the soil saturation associated with water accumulation in the basin. TWI is calculated as follows:

TWI = ln(α / tan β)

where α represents the upslope area per unit contour length and β the slope angle. In Iran, most floods occur due to heavy rainfall, so the annual average precipitation was chosen as a conditioning factor. The stability of the terrain is affected by the stream power index (SPI), which directly reflects the erosive force of the current; studies of the relationship between geomorphic factors and floods claim that catastrophic channel variation is driven by high stream power.
SPI = A_s × tan β

where A_s and tan β indicate the upslope contributing area and slope gradient, respectively. Water permeability, storage capacity, and drainage density are controlled by soil types (Zuo et al. 2020; Jiang et al. 2021). Lithology determines the shape of the channel and affects the development of floodplains (Heitmuller et al. 2015). Distance to the river as a factor controls flood discharge and its expansion. Surface vegetation coverage and density are shown by the normalized difference vegetation index (NDVI); its relationship to flooding was established in previous studies (Gao et al. 2012). Caprario and Finotti (2019) noted that sparse vegetation cover has a high flooding potential because of its poor water storage capacity. The NDVI is calculated as follows:

NDVI = (R_NIR − R_R) / (R_NIR + R_R)

where R_NIR is the spectral reflectance of the near-infrared band and R_R that of the red band (Pal et al. 2018; Malik et al. 2019).
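The three index formulas above can be expressed directly as functions. This is a sketch over per-cell values; in practice these would be applied to whole raster arrays derived from the DEM and Landsat bands:

```python
import numpy as np

def twi(upslope_area, slope_rad):
    """Topographic wetness index: TWI = ln(a / tan(beta))."""
    return np.log(upslope_area / np.tan(slope_rad))

def spi(upslope_area, slope_rad):
    """Stream power index: SPI = A_s * tan(beta)."""
    return upslope_area * np.tan(slope_rad)

def ndvi(nir, red):
    """NDVI = (R_NIR - R_R) / (R_NIR + R_R)."""
    return (nir - red) / (nir + red)

# Toy values for a single grid cell
print(round(twi(100.0, np.deg2rad(5.0)), 3))
print(round(ndvi(0.6, 0.2), 3))  # 0.5
```

Because the functions use NumPy operations, they work unchanged on full 2-D arrays of upslope area, slope, and band reflectance.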
Drainage density on the earth's surface is directly related to the hydrological process, and its range here varies from 0 to 1.82. High drainage density indicates a zone more vulnerable to flooding. Water infiltration and evapotranspiration are affected by the various types of land use and land cover of an area. Nine land use classes were identified, namely agricultural area, fallow land, built-up area, water bodies, dense forest, scattered forest, mixed forest, rangeland, and plantation. Moreover, the geology of a region shapes river channels and the development of floodplains; the geology of our study area can be classified into nine zones. The topographic position index can be used to determine the ruggedness of the terrain and is calculated by comparing each cell's slope to the mean slope. The terrain ruggedness index expresses the elevation difference between adjacent cells of a DEM; the value of TRI here varies from 0 to 85.28. Another factor, terrain surface texture, refers to both profile shape and surface roughness, which control flood intensity and flow speed; its value of 0 to 68.63 plays a significant role in the frequent flooding of this region.

Multi-collinearity analysis
Multi-collinearity analysis is a standard criterion for excluding collinearity and selecting influential factors for the probability estimation model. If the variance inflation factor (VIF) is more than 5 and the TOL value is below 0.1, it can be concluded that the factors create a multi-collinearity problem. Fifteen flood conditioning parameters were selected to prepare the flood susceptibility map: slope, slope aspect, elevation, plan curvature, SPI, TWI, CI, TST, TRI, TPI, NDVI, LULC, distance to stream, drainage density, and geology. Two indices were used to assess the multi-collinearity of the factors: the variance inflation factor (VIF) and tolerance (TOL). The calculation is as follows:

TOL = 1 − R_j², VIF = 1 / TOL

where R_j² indicates the regression value of factor j on the remaining parameters in a given dataset (Roy, Chandra Pal, Arabameri et al. 2020; Chowdhuri et al. 2021).
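The TOL/VIF computation can be sketched as follows, assuming scikit-learn and a synthetic stand-in for the conditioning-factor table. Each factor is regressed on the remaining ones to obtain R_j²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif_tol(X):
    """Per-factor tolerance and variance inflation factor:
    TOL_j = 1 - R_j^2, VIF_j = 1 / TOL_j, where R_j^2 comes from
    regressing factor j on all remaining factors."""
    tols, vifs = [], []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        tols.append(1.0 - r2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(tols), np.array(vifs)

# Synthetic stand-in for the conditioning factors (here only four columns)
rng = np.random.default_rng(0)
factors = rng.random((100, 4))
tol, vif = vif_tol(factors)
print("TOL:", tol.round(3), "VIF:", vif.round(3))
```

With independent columns, VIF stays near 1 and TOL near 1; a VIF above 5 or TOL below 0.1 would flag the factor for exclusion, as in the text.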

Importance of flood susceptibility causal parameters by random forest
In this research, random forest (RF), a robust machine learning method, was applied to determine the relative importance of the flood causal parameters. RF is a multivariate non-parametric statistical method first developed by Breiman (2001). The RF algorithm generates decision trees (DT) with the help of training datasets. The target variable indicated presence (1) or absence (0) of flood, and the individual flood causal parameters were input as independent variables. The RF algorithm and its generalization error are expressed below. In this analysis, the out-of-bag error is 5.14%, meaning 94.86% of predictions are correct. The average values of mean decrease Gini (MDG) and mean decrease accuracy (MDA) were used to calculate the importance of the flood causative factors for constructing the flood susceptibility model.
The generalization error of RF is

PE* = P_{x,y}(mg(x, y) < 0)

where x and y represent the input flood causal parameters and class label, P_{x,y} specifies the probability over the x, y space, mg indicates the margin function, and I(·) represents the indicator function used within it (Breiman 2001).
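A minimal sketch of this importance analysis using scikit-learn's RandomForestClassifier follows; the factors and labels are synthetic, with one factor (standing in for drainage density) deliberately driving the label. Note that sklearn's `feature_importances_` corresponds to the mean-decrease-in-impurity (Gini) measure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 500
# Hypothetical factors: the first column actually determines the label,
# so it should rank highest in importance.
drainage = rng.random(n)
noise1, noise2 = rng.random(n), rng.random(n)
X = np.column_stack([drainage, noise1, noise2])
y = (drainage + 0.1 * rng.standard_normal(n) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", round(rf.oob_score_, 3))
print("Gini importances:", rf.feature_importances_.round(3))
```

The out-of-bag score plays the same role as the 94.86% OOB prediction accuracy reported in the text; the MDA measure could be approximated with `sklearn.inspection.permutation_importance`.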
2.3.5. Flood susceptibility modeling

2.3.5.1. Support vector machine. Among machine learning methods, the support vector machine (SVM) is a data mining method comprising a set of linear indicator functions that can be used for solution estimation problems (Vapnik et al. 1995). The SVM is also known as the maximum margin method (Mohammadi et al. 2019) and yields high performance and good results even with limited data points. Based on statistical learning theory, SVM maps the dataset into a high-dimensional feature space through nonlinear transformations to create the best hyperplane (Ghorbanzadeh et al. 2019). The best hyperplane is obtained when the margins between the defined classes of the problem are maximal. SVMs are unidirectional in character and consist of two layers that implement various kernel functions: linear, polynomial, sigmoid, and radial basis function (RBF). Among them, the RBF kernel is the most prevalent in SVM because of its nonlinear classification and regression characteristics. SVM thus offers four kernel categories for classification efficiency and susceptibility mapping. SVM performs and generalizes well on out-of-sample data. It differs slightly from SVR, a regression algorithm for continuous values; SVM is for classification and can handle high-dimensional data, which makes it widely useful in machine learning applications.
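An RBF-kernel SVM classifier of the kind described above can be sketched with scikit-learn; the dataset here is a synthetic stand-in for the flood/non-flood training table:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical stand-in for the 15-factor flood/non-flood training data.
X, y = make_classification(n_samples=300, n_features=15, n_informative=6,
                           random_state=0)

# RBF kernel: C trades training error against model complexity,
# gamma shapes the kernel; probability=True allows susceptibility scores.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X, y)
print("training accuracy:", round(clf.score(X, y), 3))
```

In a susceptibility map, `clf.predict_proba` applied to every grid cell's factor vector would yield the per-cell flood probability.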
2.3.5.2. Particle swarm optimization. Particle swarm optimization (PSO) is an evolutionary computing algorithm derived from the study of bird predation behavior. The birds' foraging process, heuristic characteristics, and the random search of evolutionary algorithms inspired the construction of this model (Abido 2002). In this algorithm, a bird is considered an independent particle, and a group of birds is taken as a particle swarm. The notions of 'group' and 'individual' are common to other evolutionary algorithms. Each particle can be regarded as a searching individual in an N-dimensional search space. The optimal historical position of each particle and the optimal historical position of the population help adjust the flight speed of the particle dynamically. Particle swarm optimization involves two parts, speed and position: the speed represents the rate of movement and the position the direction of movement. The updates are calculated as follows:

v′ = w·v + c1·r1·(Pbest − x) + c2·r2·(Gbest − x)
x′ = x + v′

where Pbest represents the historical best position of the single particle and Gbest the historical best position of the particle swarm; c1 and c2 are learning factors; r1 and r2 are random values that vary from 0 to 1; w indicates the inertial weight; the current position and velocity are represented by x and v, respectively; and x′ and v′ represent the updated position and velocity of the particle.
Each particle searches for an optimal solution, called the individual extremum, and in the particle swarm the best individual extremum is taken as the current global optimal solution. The speed and position are continuously updated by iterating, which helps reach the optimal solution and finally meet the termination conditions. To adapt to the environment, every particle helps the others in this model, achieving an optimal search in complex solution spaces (Armaghani et al. 2014).
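The velocity and position updates described above can be sketched as a minimal PSO loop minimizing a toy objective (the sphere function); the parameter values are illustrative defaults, not the study's settings:

```python
import numpy as np

def pso(f, dim=2, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: v' = w*v + c1*r1*(Pbest - x) + c2*r2*(Gbest - x)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))   # positions
    v = np.zeros_like(x)                         # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val              # update individual extrema
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy() # update global extremum
    return gbest, pbest_val.min()

best, val = pso(lambda p: float((p ** 2).sum()))  # sphere function, min at 0
print(best.round(4), round(val, 6))
```

The swarm converges toward the origin because each particle is pulled toward both its own best position and the swarm's best position, exactly as in the update equations.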

Genetic algorithm.
From an evolutionary perspective, the genetic algorithm is a computational model (Mühlenbein 1997). This type of algorithm encodes a potential or candidate solution to a specific problem on a chromosome-like data structure. Whitley (1994) noted that the genetic algorithm preserves critical information by applying recombination operators to those structures.
The GA searches a candidate solution space by simulating the natural evolution process to obtain the best solution. A set of initial solutions is produced and distributed in the solution space. Each population member is considered a chromosome comprising several genes, and each gene shapes the behavior and traits of the chromosome. A fitness function is defined for chromosomes based on the objective functions, and the evaluation of each population member depends on its fitness value. For the next generation, chromosomes with better fitness values are chosen as parent chromosomes. Three main operators, applied in order, form the new generation of the population: selection, crossover, and mutation. New solutions characterize the new population of chromosomes. The initial solutions change in accordance with the convergence of the optimal solutions until the required fitness is acquired (Marchetti and Wanke 2020). The genetic algorithm optimizes solutions by repeatedly applying the defined operators. This algorithm has a global exploration ability without evaluating many points in the search space, making it possible to find solutions to complex combinatorial optimization problems. The mutation operator prevents the algorithm from being trapped in a local optimum by diversifying candidate solutions.
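The selection, crossover, and mutation operators can be sketched in a minimal real-coded GA; the operator choices here (tournament selection, uniform crossover, Gaussian mutation) are one common configuration, not necessarily the one used in the study:

```python
import numpy as np

def ga(fitness, dim=2, pop_size=20, gens=100, mut_rate=0.1, seed=0):
    """Minimal real-coded GA: selection, crossover, mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, (pop_size, dim))
    for _ in range(gens):
        fit = np.array([fitness(c) for c in pop])
        # Tournament selection: the fitter of two random chromosomes is a parent
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # Uniform crossover between consecutive parents
        mask = rng.random((pop_size, dim)) < 0.5
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # Mutation keeps the search from stalling in a local optimum
        mutate = rng.random((pop_size, dim)) < mut_rate
        children[mutate] += rng.normal(0, 0.5, mutate.sum())
        pop = children
    fit = np.array([fitness(c) for c in pop])
    return pop[fit.argmax()], fit.max()

best, f = ga(lambda c: -float((c ** 2).sum()))  # maximum at the origin
print(best.round(3), round(f, 4))
```

Selection concentrates the population near good solutions, crossover recombines them, and mutation injects the variation that prevents premature convergence.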
2.3.5.3.1. GA-SVM model. In order to build an effective SVM model, the parameters (C and γ) of the model need to be chosen properly in advance (Lin 2001). The parameter C determines the trade-off between minimizing the training error and the complexity of the SVM model. With a bigger C value, the predictive accuracy on the training sample is higher; however, this may cause over-training. The parameter γ of the RBF kernel function defines a nonlinear mapping from the input space to a high-dimensional feature space, and its value affects the shape of the RBF function. Hence, the parameters C and γ have a powerful influence on the efficiency and generalization performance of the SVM model. At present, the choice of these parameters lacks the guidance of mature theory and depends mainly on experience. A grid-search technique was presented by Lin (2001); however, the grid algorithm is time-consuming and does not perform very well (Gu et al. 2011).
According to related research in different fields, GA has proved to be a better choice for determining the parameters (Lessmann et al. 2005). It can reduce the blindness of human-made choices and improve the predictive performance of the SVM model. Therefore, we chose GA to search for the optimal parameters of the SVM model for flash flood prediction in this study. The algorithm was realized by a parameter optimization procedure designed by Li of Beijing Normal University based on the libsvm-mat toolbox developed by Lin of National Taiwan University (Chang and Lin 2001).
We built a single-factor SVM model for flash flood prediction and determined the parameters (C and γ) of the model by GA. The GA had a generation number of 100 and a population size of 20. The search range of the C and γ parameters was [0, 100]. We obtained a best C of 6.8231 and a best γ of 0.12945; the model with the best parameters has the smallest mean square error (MSE) of 0.00034.
We also built a multi-factor SVM model for flash flood prediction and determined the parameters (C and γ) by GA. The GA settings and the search ranges of C and γ were identical to those of the single-factor GA-SVM. We obtained a best C of 63.6, a best γ of 0.0045, and an MSE of 0.00021 for the multi-factor SVM tuned by GA.
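The GA-SVM idea of evolving (C, γ) against a cross-validated score can be sketched as follows. This is a toy illustration with scikit-learn on synthetic data (the study used the libsvm-mat toolbox); the tiny population, generation count, and log-scale search ranges are assumptions chosen only to keep the example fast:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
rng = np.random.default_rng(0)

def fitness(log_c, log_g):
    """Cross-validated accuracy of an RBF-SVM with C=10**log_c, gamma=10**log_g."""
    clf = SVC(kernel="rbf", C=10.0 ** log_c, gamma=10.0 ** log_g)
    return cross_val_score(clf, X, y, cv=3).mean()

# Tiny GA over (log10 C, log10 gamma) in [-2, 2] x [-4, 0]
pop = np.column_stack([rng.uniform(-2, 2, 8), rng.uniform(-4, 0, 8)])
for _ in range(5):
    fit = np.array([fitness(c, g) for c, g in pop])
    parents = pop[np.argsort(fit)[-4:]]                      # keep the best half
    children = parents[rng.integers(0, 4, (4, 2)), [0, 1]]   # uniform crossover
    children = children + rng.normal(0, 0.2, children.shape) # mutation
    pop = np.vstack([parents, children])
fit = np.array([fitness(c, g) for c, g in pop])
best = pop[fit.argmax()]
print("best C=%.3f gamma=%.5f acc=%.3f" % (10 ** best[0], 10 ** best[1], fit.max()))
```

Unlike an exhaustive grid search, the population concentrates evaluations around promising (C, γ) regions, which is the efficiency argument made for GA over the grid algorithm.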
2.3.5.3.2. PSO-SVM model. Considering the sensitivity of SVM to its model parameters, the PSO method was used to establish the PSO-SVM prediction model in this paper. PSO was applied to search the SVM parameters: the penalty factor C, the RBF kernel parameter γ, and the loss function parameter p. The evolutionary generation count of the PSO algorithm was 100 and the population size was 20. The search result was Cbest = 23.3241, γbest = 0.0265, pbest = 1743.
2.3.6. Validation and accuracy assessment. The validation and accuracy of results have a crucial role in scientific works (Zhou et al. 2018; Xu et al. 2021; Liu et al. 2021; Cao et al. 2021). The validation and accuracy of the predictive results and the evaluation of the machine learning and optimization models applied in the study were conducted with AUROC, accuracy, sensitivity, specificity, F-score, kappa, MAE (%), and NSC (%). The ROC curve is the graphical presentation of model validation, and the AUC of the ROC is its quantitative measure. The following equations are used for calculating the other statistical indices of model evaluation (Hoque et al. 2020; Lei et al. 2020; Pham et al. 2020):

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
F-score = 2 × Precision × Sensitivity / (Precision + Sensitivity)
where TP is true positive, TN true negative, FP false positive, FN false negative, and P and N signify the presence and absence of flood, respectively (Arabameri et al. 2021).
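The confusion-matrix indices can be computed as a small function; the example counts below are illustrative only:

```python
def indices(tp, tn, fp, fn):
    """Statistical evaluation indices from the confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement
    total = tp + tn + fp + fn
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return sensitivity, specificity, accuracy, precision, f_score, kappa

# Illustrative confusion matrix: 40 TP, 45 TN, 5 FP, 10 FN
print([round(v, 3) for v in indices(tp=40, tn=45, fp=5, fn=10)])
# [0.8, 0.9, 0.85, 0.889, 0.842, 0.7]
```

For the AUROC measure in the text, the same predictions would instead be scored against continuous susceptibility probabilities (e.g. with `sklearn.metrics.roc_auc_score`).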

Multi-collinearity analysis
Multi-collinearity occurs when individual predictor variables in a model are associated. Since factors must be independent, this association is a concern: depending on which control variables are in the framework, the estimated coefficients fluctuate wildly, and the coefficients become markedly susceptible to small variations in the model. This decreases the accuracy of the estimated coefficients, reducing the regression model's predictive strength. The multi-collinearity of the parameters was assessed using VIF and TOL. The ranges of TOL and VIF in this flood susceptibility assessment are 0.205 to 0.897 and 1.114 to 4.772, respectively (Table 2). On the basis of the TOL and VIF limits, there is no multi-collinearity problem in this flood susceptibility assessment.

Model evaluation performance result
In statistical modeling, accuracy assessment plays a significant role. By selecting the correct indicators, the efficiency of a predictive outcome is measured and compared. Thus, it is important to use the correct parameters for a specific predictive approach to obtain accurate performance. Evaluating appropriate predictive models is necessary since different types of information are likely to be considered by a particular predictive model. In this analysis, AUC and different statistical indices, such as sensitivity (SST), specificity (SPF), accuracy (ACC), precision (PRE), F-score (F), and kappa index (KI), were considered to find the most optimal model. Testing these indices indicates that all the models worked well. The values of the different statistical indices as validation methods for the training and validation datasets are shown in Figure 6. The PSO-GA model (Figure 7) has 91.6 percent success in running the flood susceptibility model and is 90.2 percent accurate in predicting flood occurrences. From the analysis, it is clear that the novel ensemble PSO-GA model is the optimal flood susceptibility model compared with the other ensemble and standalone models. The other ensemble model, SVM-GA, had the second-highest goodness-of-fit and predictive performance, with AUC values of 90.1 (training) and 88.6 (validation), respectively. As a result, combining evolutionary optimization algorithms with a machine learning model outperforms the standalone SVM and PSO models. Even though SVM and PSO are highly structured and capable of solving multidimensional problems, their standalone performance was unsatisfactory. However, when combined with metaheuristic algorithms, both goodness-of-fit and predictive performance improve to acceptable levels by resolving the parameter tuning and optimization issues of SVM and PSO.

Relative importance of the variables
The relative importance of the selected variables was estimated with the RF algorithm. Drainage density, LULC, TWI, TRI, distance to stream, plan curvature, and NDVI have the highest importance in the flood susceptibility assessment, with importance values of 0.890, 0.760, 0.612, 0.564, 0.502, 0.453, and 0.421, respectively. The remaining parameters, i.e. convergence index, TST, elevation, slope, geology, TPI, SPI, and aspect, are of moderate to low importance, with values of 0.394, 0.358, 0.312, 0.284, 0.222, 0.204, 0.151, and 0.124, respectively (Figure 8).
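An RF-based importance ranking of this kind can be sketched as follows with scikit-learn; this is a minimal illustration, assuming the factor values have been sampled into a matrix whose columns follow the (hypothetical) name list below:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names matching the fifteen causatives used in the study.
FACTORS = ["drainage_density", "LULC", "TWI", "TRI", "dist_to_stream",
           "plan_curvature", "NDVI", "convergence_index", "TST", "elevation",
           "slope", "geology", "TPI", "SPI", "aspect"]

def factor_importance(X, y, seed=42):
    """Rank conditioning factors by RF mean-decrease-in-impurity importance."""
    rf = RandomForestClassifier(n_estimators=500, random_state=seed)
    rf.fit(X, y)                                       # y: 1 = flood, 0 = non-flood
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return [(FACTORS[i], rf.feature_importances_[i]) for i in order]
```

Mean-decrease-in-impurity importances sum to 1 across all factors, so they are best read as a relative ranking rather than absolute weights.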

Discussion
Flooding is one of the most dangerous natural hazards and frequently occurs in regions with steep slopes, such as those in Iran (Pourghasemi et al. 2019). Flood susceptibility mapping is an important tool for management after a flood occurs and for effective planning beforehand. The complexity of the underlying conditions can make reliable prediction of flood occurrences and causative factors very difficult (Ouma and Tateishi 2014).
The flood-generating factors are complex, and the mechanism differs from region to region (Merz et al. 2010). Generally, the factors are considered as internal and external. Those common to flooding in our study area are: slope, slope aspect, elevation, plan curvature, land use and land cover, normalized difference vegetation index, convergence index, topographic wetness index, topographic positioning index, drainage density, distance to stream, terrain ruggedness index, terrain surface texture, geology, and stream power index. Among these, the most important causatives according to the model performances are drainage density and land use and land cover. Changes in land-use patterns and hydrological processes also play a major role in flash floods. External factors such as human activities likewise exert a major control on flooding: the destruction of grassland or forest, road construction, and rapid settlement on riverbanks are major contributors to flood risk. The more reliable the input data, the more accurate the result. However, there is no standard rule for selecting non-flood locations, and these were selected by a random process. Many previous studies treated non-flood points simply as locations where no flood occurred. Future studies require improved input data to select reliable non-flood locations.
In this study, three machine learning models and two ensembles were used to prepare the flood susceptibility map. The model performances and success rates were compared through validation: particle swarm optimization (success rate 0.85, prediction rate 0.88), support vector machine (success rate 0.839, prediction rate 0.86), and genetic algorithm (success rate 0.87, prediction rate 0.89). The novel ensembles (SVM-GA and PSO-GA) for mapping flood susceptibility were also applied to the Kaiser watershed. The ensemble models PSO-GA (prediction rate 0.91 and success rate 0.90) and SVM-GA (prediction rate 0.90 and success rate 0.88) yielded better results than the single models, demonstrating that the ensembles outperformed the other models with higher accuracy. The percentage of area covered by each susceptibility class varies from very high to very low. The spatial distributions produced by the three standalone models and the ensembles are as follows: SVM (very low 30.92%, very high 10.17% of the area), PSO (very low 28.34%, very high 10.57%), GA (very low 29.64%, very high 12.28%), SVM-GA (very low 24.36%, very high 8.96%), and PSO-GA (very low 28.65%, very high 8.37%). These data suggest that the ensemble models yield more practical and reliable results. The machine learning approach can reduce the cost and time of producing an effective mitigation plan for flood-prone areas. Some factors, such as drainage density, LULC, TWI, TRI, and distance from the stream, are the most influential in flood susceptibility assessment. Other researchers have established similar findings in different parts of the world (Shafapour Tehrany et al. 2019; Tehrany et al. 2019), and the direct impact of LULC on flood occurrences has already been established by several of them.
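The SVM-GA coupling discussed here, in which a genetic algorithm tunes the SVM's penalty and kernel-width parameters against cross-validated AUC, can be sketched as below. This is a simplified illustration and not the authors' exact implementation; the population size, search bounds, and mutation scale are all assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ga_tune_svm(X, y, pop_size=10, generations=5, seed=0):
    """Evolve (log10 C, log10 gamma) pairs; fitness is 3-fold cross-validated AUC."""
    rng = np.random.default_rng(seed)
    # Initial population drawn uniformly in log-space: C in [1e-2, 1e3], gamma in [1e-4, 1e1]
    pop = rng.uniform([-2.0, -4.0], [3.0, 1.0], size=(pop_size, 2))

    def fitness(ind):
        clf = SVC(C=10 ** ind[0], gamma=10 ** ind[1])
        return cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]        # selection: keep best half
        kids = []
        while len(kids) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            child = np.where(rng.random(2) < 0.5, a, b)           # uniform crossover
            kids.append(child + rng.normal(0.0, 0.3, 2))          # Gaussian mutation
        pop = np.vstack([parents, kids])
    scores = np.array([fitness(ind) for ind in pop])
    best = pop[scores.argmax()]
    return SVC(C=10 ** best[0], gamma=10 ** best[1]).fit(X, y)
```

A PSO-GA ensemble follows the same pattern, with the GA refining the swarm's best-found parameter vectors instead of the SVM hyperparameters.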
Flood susceptibility prediction research is currently being undertaken in America, Europe, Africa, and Asia (Alfieri et al. 2017). Although the effects of climate change coupled with the influence of projected LULC on flooding remain debatable, the effects of climatic instability and seasonality necessitate further investigation. Storm rainfall events and their associated flash floods can sustain large-scale erosion, with negative effects on the environment and its associated ecosystems (Pal and Chakrabortty 2019a; Pal and Chakrabortty 2019b). The changing pattern of LULC and the conversion of forest cover into settlement and agricultural areas can drastically increase the magnitude and intensity of floods. Likewise, an increase in impervious areas can change the nature of runoff, which ultimately influences the magnitude of a flood.
Our study may be beneficial for land-use planners and governmental policymakers to adopt suitable measures for flood management. The decision on the construction or settlement zones as well as dams and other structures in the susceptible or vulnerable zones can also be evaluated based on our results. These time-saving and modern approaches in machine learning can help reduce the loss of life and property and financial liabilities.

Conclusion
Flood susceptibility mapping is a non-engineering approach to preventing floods and reducing socio-economic losses. This study combined synthetic and analytical methods by considering numerous factors that lead to flooding; the most prominent model can be used to resolve uncertainty between quantitative and qualitative factors. The study focused on the Kaiser watershed in the Mazandaran province of northern Iran, where the flood susceptibility map was constructed from fifteen parameters. The main goal was to evaluate three models (SVM, PSO, and GA) and two ensemble models (PSO-GA and SVM-GA). The results show that the ensemble models achieve better accuracy, with AUC values of 0.902 for PSO-GA and 0.886 for SVM-GA. Except for the SVM-GA and PSO-GA ensembles, all the probability maps appeared visually similar, which may reflect the inability of the standalone models to detect probable flood areas precisely.
The ensemble model is good at predicting flood locations, as evidenced by the seven validation methods. The best combination for a sound flood susceptibility model is to consider more precise flood causative factors, locations of historical flood events, and new ensemble machine learning models. In the near future, flood susceptibility analysis, landslide susceptibility analysis, soil erosion and gully erosion assessment, and potentiality analysis of agriculture, minerals, and groundwater could all be performed using the above-mentioned ensemble methods. Because of the lack of spatial flood location data in this study, the flood susceptibility maps do not exceed 90% accuracy in predicting future flood occurrences. The model performances revealed the specific indicators responsible for flooding in this region: drainage density (0.89), land use and land cover (0.76), and the topographic wetness index. The resulting map was classified according to the different flood conditions of the region. Moreover, to evaluate these functional approaches against the traditional approach, the flood susceptibility maps produced by the various machine learning models and ensembles were validated. A reliable flood susceptibility map provides a valuable tool for land-use managers and governmental agencies to adopt effective management measures for reducing risk. Our results could be helpful for land-use planning in areas likely to be affected by floods. In conclusion, the study demonstrates the superiority of integrated ensemble machine learning models: flood susceptibility mapping with these ensembles yields better accuracy and will help prevent and mitigate future floods.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research was partly funded by the Austrian Science Fund (FWF) through the Doctoral College GIScience (DK W 1237-N23) at the University of Salzburg.

Data availability
The data that support the findings of this study are available on request from the authors.