Using fuzzy and machine learning iterative optimized models to generate the flood susceptibility maps: case study of Prahova River basin, Romania

Abstract In this work, the vulnerability to flooding in the Prahova River basin was calculated and analyzed using two hybrid models: Iterative Classifier Optimizer – Multiclass Alternating Decision Tree – Certainty Factor (ICO-LADT-CF) and Fuzzy-Analytical Hierarchy Process – Certainty Factor (FAHP-CF). The input data were the values of 10 flood predictors and 158 points where historical floods occurred. In the first step, the Certainty Factor values were calculated; these were then used in the Fuzzy-Analytical Hierarchy Process and Multiclass Alternating Decision Tree models, the latter being optimized with the Iterative Classifier Optimizer. In both ensemble models, the slope angle was the most important flood conditioning factor. Moreover, according to the Certainty Factor modelling, 8 classes/categories achieved the maximum value of 1. Next, the flood susceptibility across the study area was derived. On average, about 20% of the study area has high and medium susceptibility to flash floods. Evaluation of the models with the Receiver Operating Characteristic (ROC) curve gave the following results: for the Success Rate, Area Under Curve (AUC) = 0.985 for the Flood Potential Index (FPI) ICO-LADT-CF and AUC = 0.967 for FPI FAHP-CF; for the Prediction Rate, AUC = 0.952 for FPI ICO-LADT-CF and AUC = 0.913 for FPI FAHP-CF. The accuracies of the models were 0.943 (ICO-LADT-CF) and 0.931 (FAHP-CF) on the training dataset, and 0.935 (ICO-LADT-CF) and 0.926 (FAHP-CF) on the validation dataset. The main conclusion is that the two ensemble models outperform the machine learning models previously applied to the same study area.


Introduction
As a natural hazard, floods can pose a major problem to communities that lack geospatial planning (Uddin and Matin 2021). It is widely acknowledged that geospatial technology can reduce flood disaster risks and support decision-making in flood management (Zhou et al. 2021a). Over the past few decades, there have been many attempts to minimize flood damage by examining floods and creating flood hazard maps (Rojas et al. 2022; Zhou et al. 2021b). Many scientists acknowledge that hazard zoning has an important role to play in spatial planning processes (Ma et al. 2023). There are two kinds of flooding: fluvial flooding, which occurs when a rise in the water level of a stream or river causes it to overflow into its surrounding area and subsequently into the coastal area, and pluvial flooding, which occurs when excessive runoff from rainfall causes a rapid rise of water levels (Ho et al. 2022; Zhou et al. 2021c). A significant increase in water levels may be caused by excessive rain, snowmelt, or a combination of both (Tian et al. 2019). Snowmelt floods account for numerous flood events and cause losses of property and life. Increasing flood preparedness requires a better understanding of the hydrology of cold regions and of how snowmelt causes flooding (Venegas-Cordero et al. 2023; Tian et al. 2020). The upper part of the Prahova River catchment in Romania is one such cold area. Between 1950 and 2005, around 40% of the flood-related deaths in Europe were caused by flash floods (Yao et al. 2022). Because the ability to save lives and property once a devastating hazard occurs is limited, increasing emphasis is placed on flood prevention and mitigation (Li et al. 2018; Yin et al. 2023a). In order to take the most appropriate measures to mitigate the negative effects of flood phenomena, an accurate risk analysis should be done.
A flood risk analysis includes flood hazard mapping, which provides an efficient and accurate means to determine the spatial extent of flood characteristics such as velocity, depth, and frequency at the time of an event (Parizi et al. 2022; Zhao et al. 2020; Gao et al. 2021). Flood hazard maps play an important role in flood management strategies because they show the flood hazard at a specific location. Over the past few years, a lot of research has been conducted on the analysis, prediction, and quantification of floods and their effects (Fekete 2019; Li et al. 2022). One approach is the application of multicriteria decision-making analysis such as the Analytical Hierarchy Process, TOPSIS, or VIKOR (Dahri and Abida 2017; Xie et al. 2021). Another type of technique is bivariate statistical analysis, such as Frequency Ratio, Weights of Evidence, Index of Entropy, or Certainty Factor (Cao et al. 2020). Because these approaches usually rely on expert knowledge and are frequently subject to bias and uncertainty, their results often lead to inaccurate conclusions. Simulating hydrological processes with hydrodynamic models makes it possible to identify the relationships between floods and their predictors. However, accurate models of this kind require long-term topographic, hydrological, and meteorological data, which can be difficult to acquire. In addition, hydrological models tend to be site-specific, so their use in other river basins is very limited and they are generally not suitable for large-scale regional studies (Yao et al. 2022).
Modeling of flash floods has become increasingly sophisticated thanks to advances in high-performance computing and machine learning (ML) approaches. ML approaches are data-driven: the model uses the training data and the influencing factors to establish a quantitative relationship between flood occurrence and these factors, from which flood susceptibility can be predicted. The most representative machine learning models include logistic regression (Ali et al. 2020), support vector machine (Costache 2019), artificial neural networks (Chapi et al. 2017), naïve Bayes (Chen et al. 2020), and random forest (Lee et al. 2017). However, ensemble machine learning models are considered able to exceed the accuracy of stand-alone machine learning techniques (Li and Hong 2023; Yin et al. 2023a). Therefore, many ensemble or hybrid models have recently been applied to improve model accuracy. The most well-known ensembles are boosting, AdaBoost, or stacking based models (Luu et al. 2021). Nevertheless, there is no general consensus regarding the most accurate machine learning based ensemble for estimating flood susceptibility.
Therefore, the primary objective of this paper is to explore the impact of coupling a bivariate statistics model and fuzzy multicriteria decision-making analysis with a novel decision tree model. The resulting machine learning model (LADT-CF) will be optimized with the Iterative Classifier Optimizer (ICO) technique. Additionally, the following specific objectives of this work can be mentioned: i) inventory of flood points; ii) collection and description of flood predictors; iii) computing and applying the ensemble models for the Flood Potential Index; iv) spatial representation and validation of the results. The accuracy of the models will be evaluated using ROC curves and several statistical metrics. The present analysis focuses on the middle and upper zones of the Prahova River basin. The area was subject to severe floods in 1970, 1975, 2005, and 2010. Therefore, from a practical point of view, it is imperative to delineate the flood-prone areas as accurately as possible for adequate flood risk management (Avand et al. 2022). In areas shown to have a high and very high flood potential, detailed studies can be conducted using hydraulic modeling (Kuriqi and Ardiçlioğlu 2018) in order to reduce the potential damage (Kuriqi and Hysa 2021).
The main novelty of this paper is the first-time use of these two hybrid models to determine flood susceptibility. Moreover, the present results are more accurate than those obtained in previous studies on the same topic.

Study area: Prahova River basin, Romania
The study area extends over the hill and mountain area in the central-southern part of Romania and occupies about 2600 square kilometers (Figure 1). Elevations in the study area range from 128 m in the low hill area to over 2500 m in the high mountain area. Flood occurrence is favoured by the low slope values found along the main river valleys and in the depressions of the study area.
The flash-flood season usually peaks between May and July, when Cumulonimbus clouds produce torrential rains of very high intensity. During this period, the study area receives approximately 15% of the annual amount of rainfall. Of the two hydrological soil groups B and C, group B dominates in the study area (40% of the total). The vulnerability of the study area to flood phenomena is also increased by the presence of human settlements, including 12 cities with a total of 152506 inhabitants (Costache 2019).

Flood inventory
Accurate prediction of flood susceptibility requires an in-depth understanding of the areas that were flooded in the past. An inventory of locations previously affected by floods was therefore compiled to better understand how geographical factors influence flooding. Data were gathered from the archives of the General Inspectorate for Emergency Situations. In total, 158 flood events were detected and mapped in the upper and middle sectors of the Prahova River catchment. In order to enhance the objectivity of the results, 158 non-flooded points were also included in the study. These points were randomly selected from the interfluvial zones or from areas with very steep slopes, where floods are nearly impossible. Flood-affected points have been assigned the value of 0, while non-affected points have been assigned the value of 1. Of all points, 70% were used as the training dataset (110 flood points, 110 non-flood points), while the remaining 30% were kept as the validation dataset (48 flood points, 48 non-flood points).
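The 70/30 partition described above can be sketched as follows. This is a minimal illustration only: the point identifiers, the random seeds and the helper name `split_points` are assumptions made for the example, and the study does not state how its random split was performed.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of points and split it into training and validation parts."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = points[:]               # copy, so the input list is left untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 158 flood and 158 non-flood locations.
flood_points = [("flood", i) for i in range(158)]
non_flood_points = [("non_flood", i) for i in range(158)]

# Splitting each class separately keeps the two classes balanced in both subsets.
flood_train, flood_val = split_points(flood_points, seed=1)
non_flood_train, non_flood_val = split_points(non_flood_points, seed=2)
```

Splitting each class on its own reproduces the 110/48 counts reported for both flood and non-flood points.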

Flood predictors
In order to accurately determine flood susceptibility values, it is crucial to consider the geographic factors that best explain the flooding process. This study used 10 geographic factors to identify flood-prone areas: elevation, plan curvature, slope angle, distance from rivers, lithology, topographic wetness index, hydrological soil group, land use, rainfall, and convergence index.
Elevation is one of the most important factors in natural hazard analyses. Flooding is virtually impossible at high elevations, whereas low, flat areas are highly susceptible. According to Chen et al. (2020), water flowing from hillsides reaches the lower regions of rivers, causing flooding. Elevation also affects runoff amount and velocity. The importance of this factor is reflected in its use in many previous flood susceptibility studies (Arabameri et al. 2020; Luu et al. 2021; Li et al. 2022; Yao et al. 2022; Zhuo et al. 2022). In our case, the study area is divided into eight elevation classes, with the class between 200 m and 400 m having the largest spatial extent (25%), followed by the class between 400 m and 600 m (Figure 2a).
Plan curvature is also one of the influencing factors in flood studies (Ali et al. 2020). According to the literature (Akay 2021), this factor is often divided into three categories: flat, convex, and concave (Figure 2b). The influence of this morphometric predictor on flood genesis is acknowledged by many authors (Chapi et al. 2017; Chowdhuri et al. 2020; Zhu et al. 2022).
The slope is calculated as the angle between the horizontal datum and the ground surface. It governs the effect of gravity on the generation, velocity, and quality of runoff (Bui et al. 2020; Cao et al. 2020; Rui et al. 2023), which gives this conditioning factor great importance in hydrological analysis. Floods are triggered by the increased volume of water that consequently reaches rivers (Figure 2c). Around 40% of the research zone has slopes between 7° and 15°.
Distance from river is a critical geographical factor because most flood phenomena are triggered by high river discharges (Saleh et al. 2022). Following the literature (Abdulrazzak et al. 2019), the distance from river factor was computed with the Euclidean distance tool, based on the river network in shapefile format (Figure 2d). The number of floods decreases markedly with increasing distance from the hydrographic network, with over 40% of flood events occurring within 50 meters of rivers.
Another important factor, derived from a vector database, is lithology. This factor has a very high influence on the water infiltration process (Yang et al. 2018; Xu et al. 2022). Flood phenomena are more likely to occur on rocks with high impermeability and low hydraulic conductivity than in areas characterized by rocks with higher permeability (Li et al. 2021). In this study, the lithological groups were extracted from the Geological Map of Romania, 1:200,000 (Figure 2e).
The topographic wetness index (TWI) (Figure 2f) is a morphometric factor calculated from the relationship between slope and catchment area (Azareh et al. 2021). High TWI values correspond to areas where the probability of flooding is considerably higher. The range of values is between 8.5 and 12, the middle class having the highest percentage of flood pixels (30%).
The hydrological soil groups (Figure 3a) are strongly related to flood conditions. The groups are established mainly according to soil texture (Stewart et al. 2012). Infiltration of water is enhanced by coarser textures and stoniness, which are associated with larger pores (Xu et al. 2021; Golkarian et al. 2023; Sun et al. 2023). In the study area the main hydrological soil group is C (68%).
According to Zhao et al. (2023), land use is another factor that plays a significant role in the occurrence of floods. Flood occurrence and vegetation density have generally been found to be negatively correlated (Marino et al. 2023; Yin et al. 2023b). Forest covers approximately 50% of the study zone, while most flood pixels are located in built-up areas. Runoff is much faster on non-vegetated land than on vegetated or planted land (Figure 3b). Thus, areas covered by impervious surfaces, such as concrete and concrete structures, tend to flood more frequently than regions with plantations and trees (Al-Taani et al. 2023).
According to a large number of existing studies, flooding is caused directly by rainfall (Figure 3c) (Wu et al. 2023). For this reason, rainfall is considered one of the most important factors in flood susceptibility mapping (Tariq et al. 2023). The convergence index is a morphometric measure whose values have a large influence on the flood occurrence process (Costache and Bui 2019). Values close to 100 are characteristic of zones with lower flood susceptibility, while values close to −100 represent areas with high flood susceptibility (Figure 3d).

Relief-F method
The Relief-F algorithm employs an iterative process that adjusts feature weights according to how well each feature distinguishes between neighboring patterns. Let x be a randomly selected sample from binary class data (Tuncer et al. 2020). Two neighbours of x are identified: the nearest neighbour from the same class (called the nearest hit, NH) and the nearest neighbour from the other class (called the nearest miss, NM). The Relief-F coefficients can be estimated using the following relation (Tuncer et al. 2020):
$$w_i = w_i - (x_i - NH_i)^2 + (x_i - NM_i)^2 \quad (1)$$
where $w_i$ is the weight of the i-th feature; NM is the nearest miss; NH is the nearest hit; x is a randomly selected sample from binary class data.
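The nearest-hit and nearest-miss update can be illustrated with a short sketch. It implements the basic binary Relief update (one nearest hit and one nearest miss per sample, squared differences), not the full Relief-F variant with k neighbours. The toy samples, the function name `relief_weights` and the averaging over samples are assumptions made for the example.

```python
def relief_weights(samples, labels):
    """Basic Relief: reward features that differ on the nearest miss,
    penalise features that differ on the nearest hit."""
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for i, x in enumerate(samples):
        hits = [s for j, s in enumerate(samples) if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(samples) if labels[j] != labels[i]]
        nearest_hit = min(hits, key=lambda s: sq_dist(x, s))
        nearest_miss = min(misses, key=lambda s: sq_dist(x, s))
        for f in range(n_features):
            weights[f] += -(x[f] - nearest_hit[f]) ** 2 + (x[f] - nearest_miss[f]) ** 2
    # Average over samples so the scale does not depend on the sample count.
    return [w / len(samples) for w in weights]

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
samples = [(0.1, 0.5), (0.2, 0.1), (0.9, 0.4), (0.8, 0.2)]
labels = [1, 1, 0, 0]
weights = relief_weights(samples, labels)
```

On this toy data the discriminating feature receives a clearly larger weight than the noisy one, which is the behaviour used to rank flood predictors.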

Certainty Factor (CF)
In natural hazard susceptibility mapping, the certainty factor has become one of the commonly used approaches for determining the probability of a given event by combining heterogeneous data (Wang et al. 2019). The Certainty Factor is calculated for each class/category of the flood predictors with the following equation (Wang et al. 2019):
$$CF = \begin{cases} \dfrac{PP_a - PP_s}{PP_a (1 - PP_s)}, & PP_a \geq PP_s \\ \dfrac{PP_a - PP_s}{PP_s (1 - PP_a)}, & PP_a < PP_s \end{cases} \quad (2)$$
where $PP_a$ represents the conditional probability of flood pixels occurring in class a, and $PP_s$ represents the prior probability of flood pixels occurring in the entire study area.
Further, the conditional probability ($PP_a$) values will be determined (Costache 2019):
$$PP_a = P\{S \mid B\} = \frac{Npix\{S \cap B\}}{Npix\{B\}} \quad (3)$$
where $P\{S \mid B\}$ stands for the conditional probability that a flood occurs in unit B; $Npix\{S \cap B\}$ represents the number of flood pixels located in class B; $Npix\{B\}$ is the number of pixels in class B.
To calculate the prior probability ($PP_s$), the number of pixels representing flood phenomena within the study area is divided by the total number of pixels inside the study area (Costache 2019):
$$PP_s = P\{S\} = \frac{Npix(flood)}{Npix(total)} \quad (4)$$
where $P\{S\}$ represents the value of the prior probability; Npix(flood) represents the number of pixels with flood phenomena in the study region; Npix(total) represents the total number of pixels in the study region.
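The certainty factor computation can be sketched in a few lines, using the standard piecewise form of the CF. The pixel counts in the example are hypothetical, and the function assumes that the prior probability lies strictly between 0 and 1.

```python
def certainty_factor(flood_in_class, pixels_in_class, flood_total, pixels_total):
    """Certainty factor of one predictor class from pixel counts.

    Assumes 0 < flood_total < pixels_total, so the prior is strictly in (0, 1).
    """
    pp_a = flood_in_class / pixels_in_class   # conditional probability in the class
    pp_s = flood_total / pixels_total         # prior probability in the study area
    if pp_a >= pp_s:
        return (pp_a - pp_s) / (pp_a * (1 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1 - pp_a))

# Hypothetical counts: a class holding many flood pixels and a class holding none.
cf_flood_prone = certainty_factor(30, 1000, 158, 100000)
cf_no_floods = certainty_factor(0, 1000, 158, 100000)
```

A class without any flood pixel has a conditional probability of 0 and therefore a CF of exactly −1, while classes whose flood density exceeds the basin-wide density receive positive values approaching 1.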

Fuzzy-AHP
Fuzzy AHP modelling is used in the present research to estimate flood susceptibility. The extent analysis method with triangular fuzzy numbers (TFN) is used to derive the fuzzy weights of the flood predictors. As a first step, the AHP method is applied based on a literature survey and expert opinion specific to the study area.
A judgment matrix based on pairwise comparisons is then developed, and decision-makers/experts are invited to fill it in with their views. Eigenvalues and eigenvectors are calculated to compare the prominence of the elements in the matrix and to examine the consistency of the judgments. If the consistency test fails, the original values of the pairwise comparison matrix must be revised. In the second stage, fuzzy logic is applied: the fuzzy membership function (MF) is determined from the spatial relationships of the conditioning factors (Ayyildiz and Taskin Gumus 2021). In fuzzy set theory, an object is given a membership value between 0 and 1, indicating the degree to which it belongs to the fuzzy set.
Following Chang's fuzzy AHP theory (Tripathi et al. 2022), the pairwise comparison scores from the first step are transformed into linguistic variables, which are then used to assess alternative scenarios under a fuzzy environment. To this end, all flood conditioning factors were converted to a range from 0 (low susceptibility) to 1 (high susceptibility). The synthetic extent values were derived using the following formula (Costache et al. 2022):
$$S'_i = \sum_{j=1}^{m} M_{g_i}^{j} \otimes \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} M_{g_i}^{j} \right]^{-1} \quad (5)$$
where $M_{g_i}^{j}$ is the TFN in row i and column j of the fuzzy pairwise comparison matrix. The third step is the computation of the degree of possibility of $S'_i \geq S'_j$ (Figure 4), achieved through the following formula (Costache et al. 2022):
$$V(S'_i \geq S'_j) = \begin{cases} 1, & m_i \geq m_j \\ 0, & l_j \geq u_i \\ \dfrac{l_j - u_i}{(m_i - u_i) - (m_j - l_j)}, & \text{otherwise} \end{cases} \quad (6)$$
where $S'_i = (l_i, m_i, u_i)$ and $S'_j = (l_j, m_j, u_j)$. Considering that (Costache et al. 2022):
$$d'(A_i) = \min_{k \neq i} V(S'_i \geq S'_k), \quad k = 1, 2, \ldots, n \quad (7)$$
the weight vector is calculated as:
$$w' = \left(d'(A_1), d'(A_2), \ldots, d'(A_n)\right)^T \quad (8)$$
where $A_i$ (i = 1, 2, …, n) are n elements. After the normalization process, the weight vector is obtained using the following relation:
$$w = \left(d(A_1), d(A_2), \ldots, d(A_n)\right)^T \quad (9)$$
where w is a non-fuzzy number. AHP weights were applied to each class of factors in ArcGIS 10.5. As a second step, Microsoft Excel 2013 was used to construct the Fuzzy Analytical Hierarchy Process evaluation matrix associated with the classes/categories of each factor. The matrix was then used to compute the weight of each predictor. To calculate the flood susceptibility index with the FAHP model, the predictor rasters were multiplied by the derived weights in ArcGIS 10.5.
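The extent analysis steps can be illustrated on a small example. This sketch follows the standard form of Chang's method (synthetic extents, degree of possibility, minimum over the other factors, normalization). The 3 × 3 comparison matrix and all function names are hypothetical, and the study itself performed these steps in Microsoft Excel and ArcGIS rather than in code.

```python
def synthetic_extents(matrix):
    """matrix[i][j] is a TFN (l, m, u) comparing factor i with factor j."""
    row_sums = [tuple(sum(t[k] for t in row) for k in range(3)) for row in matrix]
    total = tuple(sum(r[k] for r in row_sums) for k in range(3))
    # Multiplying by the inverse TFN (1/u, 1/m, 1/l) of the grand total.
    return [(r[0] / total[2], r[1] / total[1], r[2] / total[0]) for r in row_sums]

def possibility(s_i, s_j):
    """Degree of possibility V(s_i >= s_j) for two TFNs."""
    (l_i, m_i, u_i), (l_j, m_j, _) = s_i, s_j
    if m_i >= m_j:
        return 1.0
    if l_j >= u_i:
        return 0.0
    return (l_j - u_i) / ((m_i - u_i) - (m_j - l_j))

def fahp_weights(matrix):
    s = synthetic_extents(matrix)
    n = len(s)
    d = [min(possibility(s[i], s[k]) for k in range(n) if k != i) for i in range(n)]
    total = sum(d)
    return [v / total for v in d]   # normalized, non-fuzzy weights

# Hypothetical fuzzy pairwise comparison matrix for three factors.
one = (1, 1, 1)
matrix = [
    [one, (1, 2, 3), (2, 3, 4)],
    [(1 / 3, 1 / 2, 1), one, (1, 2, 3)],
    [(1 / 4, 1 / 3, 1 / 2), (1 / 3, 1 / 2, 1), one],
]
weights = fahp_weights(matrix)
```

In this example the first factor dominates every comparison, so its unnormalized degree of possibility is 1 and it receives the largest normalized weight.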

Multiclass alternating decision tree (LADTree)
The LADTree algorithm extends the alternating decision tree (ADT) model so that it can be used effectively for multiclass classification (Zhao et al. 2022). To separate the original problem into several subproblems, the LogitBoost procedure is used in conjunction with the ADT model (Jiroušek and Shenoy 2016). ADT combines decision trees and boosting into a classification method that predicts well and allows the results to be explained (Holmes et al. 2002). The method is based on multinomial likelihood computations and can be applied directly to multiclass problems through LogitBoost. Two variants relate LogitBoost to the induction of the LADTree model. The conventional LT1PC variant generates a separate tree for each category in parallel, whereas the second, more principled variant, LT, constructs a single tree that predicts the probabilities of all classes at the same time. The prediction probability for each category is computed with the following formula, which involves a symmetric logarithmic transformation (Holmes et al. 2002):
$$p_k(t) = \frac{e^{Z_k(t)}}{\sum_{j} e^{Z_j(t)}}, \qquad \sum_{k} Z_k(t) = 0 \quad (10)$$
where $Z_k(t)$ represents the regression function for class k evaluated on input t after N boosting iterations are completed; $Z_k(t)$ is the sum of the responses of all the ensemble classifiers on instance t, and the sums in the formula run over all classes.
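The conversion of per-class additive scores into class probabilities, as done in LogitBoost-style models, can be sketched as follows. The scores are hypothetical and the function name `class_probabilities` is an assumption. The sketch illustrates only the final probability step, not tree induction.

```python
import math

def class_probabilities(scores):
    """Turn per-class additive scores Z_k into probabilities that sum to 1."""
    z_max = max(scores)                          # subtract the maximum for numerical stability
    exp_scores = [math.exp(z - z_max) for z in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]

# Hypothetical symmetric scores for a flood / non-flood problem (they sum to zero).
probs = class_probabilities([1.2, -1.2])
```

Because only differences between scores matter, shifting all scores by a constant leaves the probabilities unchanged, which is why a sum-to-zero constraint can be imposed on them.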

Iterative Classifier Optimizer (ICO)
The iterative classifier optimizer (ICO) uses cross-validation to optimize the number of iterations of a given classifier. It can handle missing, numerical, binary and empty classes as well as numeric, nominal and binary attributes (Khosravi et al. 2021). With the ICO algorithm, a model is first developed; the observed and predicted values are then compared, the model performance is examined, and the resulting information is fed back into the model to tune its outputs (Saad 2018). The hybridization is primarily intended to improve the prediction accuracy of the LADTree-CF algorithm. In this paper, the LADTree-CF model has been integrated with the ICO algorithm to improve its outcomes.
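The idea of choosing the number of iterations from a cross-validated error can be sketched generically. This is not the ICO implementation used in the study: the function `best_iteration_count`, its `look_ahead` stopping rule and the toy error curve are all assumptions made for illustration.

```python
def best_iteration_count(cv_error, max_iterations, look_ahead=5):
    """Return the iteration count with the lowest cross-validated error.

    cv_error(n) is assumed to return the cross-validation error of the
    classifier after n boosting iterations. The search stops once no
    improvement has been seen for `look_ahead` consecutive counts.
    """
    best_n, best_err = 1, cv_error(1)
    for n in range(2, max_iterations + 1):
        err = cv_error(n)
        if err < best_err:
            best_n, best_err = n, err
        elif n - best_n >= look_ahead:
            break
    return best_n, best_err

# Hypothetical error curve with its minimum at 12 iterations.
def toy_cv_error(n):
    return 0.05 + 0.001 * abs(n - 12)

best_n, best_err = best_iteration_count(toy_cv_error, max_iterations=100)
```

In practice `cv_error` would train and evaluate the boosted classifier on each fold, which is the expensive part that an early stopping rule is meant to limit.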

Results validation
The flood points were used to validate the accuracy of the derived flood susceptibility maps. The performance of the two models was evaluated with the AUC-ROC method (Fang et al. 2021; Yin et al. 2023b). To produce the ROC curve, the sensitivity and 1 − specificity were calculated. Along with AUC-ROC, the following performance metrics were also used to estimate the reliability of the results: Sensitivity, Specificity, Accuracy and K-index (Huang et al. 2020; Mi et al. 2023).
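The validation metrics can be written compactly from the confusion matrix. The sketch assumes that the K-index denotes Cohen's kappa, which the text does not state explicitly, and the confusion-matrix counts and scores in the example are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Cohen's kappa from a confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

def auc(flood_scores, non_flood_scores):
    """AUC as the share of (flood, non-flood) pairs ranked correctly; ties count half."""
    wins = sum(
        1.0 if f > n else 0.5 if f == n else 0.0
        for f in flood_scores
        for n in non_flood_scores
    )
    return wins / (len(flood_scores) * len(non_flood_scores))

# Hypothetical validation outcome for 48 flood and 48 non-flood points.
metrics = classification_metrics(tp=45, tn=44, fp=4, fn=3)
toy_auc = auc([0.9, 0.8], [0.1, 0.85])
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve drawn from sensitivity against 1 − specificity.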
Figure 5 schematically presents the implemented methodological workflow.

Feature selection
The first step of the implemented workflow consisted of the feature selection procedure, which was applied with the ReliefF method. According to the results provided by this method, the highest score was assigned to Slope angle (0.134), followed by Distance from rivers (0.127), Elevation (0.114), Land use (0.107), Rainfall (0.1), Hydrological Soil Group (0.094), TWI (0.087), Plan curvature (0.084), Convergence index (0.08) and Lithology (0.074) (Figure 6).
Taking into account the score achieved by each flood conditioning factor, it can be stated that all the variables have an influence on the flood phenomena. Therefore, all flood predictors are included in the modelling procedure.

Certainty factor
The certainty factor coefficients were calculated for each class of the flood conditioning factors using Microsoft Excel. The highest value was equal to 1 and was assigned to the following 8 classes/categories of flood predictors: slope angle lower than 3°; slope angle between 3° and 7°; hydrological soil group A; TWI values between 15.1 and 25; water bodies land use category; elevation class between 128 m and 200 m; distance from river class lower than 50 m; and distance from river class between 50 m and 100 m (Table 1). Thus, from the point of view of the CF coefficients, the enumerated classes/categories are the most favourable to the occurrence of floods. At the same time, the lowest CF value was equal to −1 and was assigned to 18 classes/categories of flood predictors, in which no flood pixels are present.
Further, in the modelling procedure, the CF coefficients were used as input data in the FAHP and LADTree algorithms.

FAHP modelling results
The first stage in the computation of FPI with the Fuzzy-AHP method is the creation, in Microsoft Excel, of a fuzzy pair-wise comparison matrix containing all 10 flood conditioning factors (Table 2). The values assigned in the comparisons were based on the importance of each factor for flood genesis. The results of the comparison matrix allow the synthesis values to be computed (Costache et al. 2022). These results were used to compute the fuzzy numbers assigned to each flood predictor. Using the fuzzy numbers, the degree of possibility was calculated (Table 3). Further, the importance of each flood predictor was determined using the following equations:
$$w'(a_i) = \{0.49,\ 0.43,\ 0.49,\ 0.09,\ 0.05,\ 1,\ 0.03,\ 0.16,\ 0.05,\ 0.04\}^T \quad (12)$$
$$w(a_i) = \{0.17,\ 0.15,\ 0.171,\ 0.031,\ 0.017,\ 0.35,\ 0.01,\ 0.056,\ 0.017,\ 0.014\}^T \quad (13)$$
Table 3. The ordinate of the highest intersection point, the degree of possibility for TFNs, and the weights of the flood predictors.
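The step from the unnormalized vector in Equation (12) to the normalized weights in Equation (13) can be checked numerically. The sketch assumes that normalization means dividing each component by the sum of all components, as in the standard extent analysis method.

```python
# Unnormalized degrees of possibility, as reported in Equation (12).
w_prime = [0.49, 0.43, 0.49, 0.09, 0.05, 1, 0.03, 0.16, 0.05, 0.04]
# Normalized weights, as reported in Equation (13).
w_reported = [0.17, 0.15, 0.171, 0.031, 0.017, 0.35, 0.01, 0.056, 0.017, 0.014]

total = sum(w_prime)
w_computed = [round(v / total, 3) for v in w_prime]

# Largest absolute difference between recomputed and reported weights.
max_gap = max(abs(c - r) for c, r in zip(w_computed, w_reported))
```

The recomputed weights match the reported ones to within about 0.003, a difference that is plausibly explained by rounding of the published two-decimal inputs.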
Additionally, the defuzzification procedure converts the TFNs into crisp weights, which are applied to each flood predictor and then multiplied with the CF coefficients in order to derive the FPI FAHP-CF (Figure 7a). The spatial variation of the FPI FAHP-CF values was represented in the GIS environment using map algebra. The derived values of the index fell between −0.96 and 0.94 and were grouped into five classes by the Natural Breaks method. The first class, with very low flood potential, appears mainly in the northern part of the study area, where slope values are very high and forest cover is very well represented. In these areas, the FPI FAHP-CF values fall between −0.96 and −0.66 and occupy 34.67% of the total study area (Figure 7a). The class of low values follows, between −0.65 and −0.23; it accounts for 32.41% of the total study area and is present in its central part. Medium values of flood potential appear in the transition zones between the steep slopes of the hills and the valleys of the main rivers. They fall between −0.22 and 0.21 and occupy 16.01% of the Prahova river basin. The high and very high values of flood potential extend mainly in the southern part of the region, where slopes tend to 0° and forest cover is missing, as well as along the main valleys of the study area. High and very high flood potential corresponds to values above 0.22 and covers 16.9% of the total studied area.
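How the crisp weights and the CF coefficients combine into a per-pixel index can be sketched as a weighted sum. The summation over predictors is an assumption, since the text only states that weights are multiplied with CF coefficients using map algebra. The weights are the normalized values reported for the FAHP model.

```python
def flood_potential_index(cf_values, weights):
    """Weighted sum of the CF coefficients of one pixel (one CF value per predictor)."""
    return sum(w * cf for w, cf in zip(weights, cf_values))

# Normalized FAHP weights as reported in the text.
weights = [0.17, 0.15, 0.171, 0.031, 0.017, 0.35, 0.01, 0.056, 0.017, 0.014]

# Theoretical bounds: every predictor at CF = 1 or at CF = -1.
fpi_max = flood_potential_index([1.0] * 10, weights)
fpi_min = flood_potential_index([-1.0] * 10, weights)
```

Under this assumption the index is bounded by roughly ±0.986, which is consistent with the reported range from −0.96 to 0.94.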

LADTree modelling results
When applying the LADTree-CF algorithm, the ICO helped determine the best parameters in order to obtain the highest accuracy. The optimization was carried out by applying a cross-validation method. In the end, accuracy values of 0.967 (training sample) and 0.954 (validating sample) resulted after 28 iterations of the ICO-LADTree-CF hybrid model. Next, the importance of the flood predictors, through which the FPI ICO-LADTree-CF was calculated, was determined. The highest importance resulted for Slope (0.767), followed by Plan curvature (0.576), TWI (0.451), Land use (0.446), Distance from river (0.421), Elevation (0.377), Hydrological Soil Group (0.376), Convergence index (0.291), Rainfall (0.282) and Lithology (0.232) (Figure 8).
The FPI ICO-LADT-CF values were obtained by multiplying the importance of each flood predictor with the CF values. The values fell between −1.2 and 1.23 and were divided into 5 classes using the same Natural Breaks method as before. About 25.27% of the total study area is characterized by a very low flood potential, associated with the range between −1.2 and −0.93 (Figure 7b). As in the case of the other index, the very low values appear mainly in the northern half of the Prahova river basin, where slopes are very steep and the degree of afforestation is also very high. The low values of FPI ICO-LADT-CF, between −0.92 and −0.57, account for 29.34% of the total study area, while medium flood potential covers around 20.39%. The high and very high flood potential is present on 25% of the Prahova river basin (Figure 9).

Results validation
The first method applied to evaluate the accuracy of the obtained results is the ROC Curve. Using the training data set and the FPI values, the Success Rate was built (Fig. xa). As can be seen, the AUC value associated with FPI ICO-LADT-CF is 0.985 and exceeds that attributed to FPI FAHP-CF, which was 0.967. In the case of the Prediction Rate, for which the validation sample was used together with the flood potential values, the AUC related to FPI ICO-LADT-CF was also higher than the AUC value for FPI FAHP-CF: 0.952 for the first model and 0.913 for the second (Figure 10b).
The second group of methods used to evaluate the results is represented by several statistical indicators. In order to calculate these indicators, it was first necessary to establish the values of the following variables: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). For the training sample, the Sensitivity values were 0.95 for FPI ICO-LADT-CF and 0.934 for FPI FAHP-CF. In the case of Specificity, the value attributed to FPI ICO-LADT-CF (0.936) was higher than the one associated with FPI FAHP-CF (0.927). Accuracy varied between 0.943 for FPI ICO-LADT-CF and 0.931 for FPI FAHP-CF. As expected, the K-index value was also higher in the case of FPI ICO-LADT-CF (0.886) compared to FPI FAHP-CF (0.862) (Table 4). For the validating sample, the Sensitivity value was higher for FPI FAHP-CF (0.942) than for FPI ICO-LADT-CF (0.912). At the same time, the Specificity value for FPI ICO-LADT-CF (0.961) exceeds that attributed to FPI FAHP-CF (0.911). These values once again led to a higher Accuracy for FPI ICO-LADT-CF (0.935) compared to the one associated with FPI FAHP-CF (0.926). The K-index value obtained for FPI ICO-LADT-CF (0.87) is also higher than that obtained for FPI FAHP-CF (0.852).
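For readers unfamiliar with the AUC, the sketch below computes it as the probability that a randomly chosen flood point has a higher FPI value than a randomly chosen non-flood point, which is equivalent to the area under the ROC curve. The function name and the sample labels and scores are hypothetical; the same function would give a Success Rate when fed the training points and a Prediction Rate when fed the validation points.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of flood (1) and non-flood (0) points.

    Ties between a flood and a non-flood score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood point")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical sample: 3 flood and 3 non-flood points with their FPI values.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.7, 0.1, 0.5, -0.3]
auc = roc_auc(labels, scores)  # 8 of the 9 flood/non-flood pairs are ordered correctly
```

The pairwise form is quadratic in the number of points, which is acceptable for samples of a few hundred locations such as the one used here.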
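The indicators above follow directly from the four confusion-matrix counts. The sketch below assumes that the K-index denotes Cohen's kappa; the function name and the counts in the example are hypothetical and are not the values behind Table 4.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, Specificity, Accuracy and Cohen's kappa from TP, TN, FP, FN."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    p_flood = ((tp + fp) / total) * ((tp + fn) / total)
    p_non_flood = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_flood + p_non_flood
    kappa = (accuracy - expected) / (1.0 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for 100 flood and 100 non-flood points.
metrics = confusion_metrics(tp=95, tn=93, fp=7, fn=5)
```

When the sample contains equal numbers of flood and non-flood points, the chance agreement is 0.5 and kappa reduces to 2 × Accuracy − 1. The reported pairs are consistent with this relation, for example an Accuracy of 0.943 with a K-index of 0.886.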

Discussion
In the real world, many geographical factors play a very important role in the genesis of floods (Yin et al. 2023c). Two categories of factors, exogenous and endogenous, contribute to or influence flood susceptibility. It is important to recognize that several internal elements act on flood occurrence at varying levels, including geomorphology, soil, geology, and socio-hydrology (Luo et al. 2022a). Physiographic characteristics as well as land use characteristics play a critical role in preventing frequent floods. In the present research the following 10 flood predictors were taken into account: elevation, plan curvature, slope angle, distance from river, lithology, topographic wetness index, hydrological soil group, land use, rainfall and convergence index. In order to verify the contribution of each flood predictor to flood susceptibility, the ReliefF method was applied. If a predictor achieves a score equal to 0, it is removed from the analysis because it does not have any influence on flood susceptibility (Ahmadlou et al. 2019). According to the results, the most influential factors proved to be the slope angle, distance from river and elevation. This is in agreement with many previous studies in the literature (Ahmadlou et al. 2021; Arabameri et al. 2020; Chowdhuri et al. 2020).
In order to achieve high-quality results, the data associated with the flood predictors should be used as input to advanced and accurate models. Over the past few years, several studies have been conducted across many regions of the world using a wide variety of decision-making techniques, approaches, principles, and concepts (Handini et al. 2021; Chen et al. 2023). Flood susceptibility modeling based on machine learning algorithms and GIS is relatively new (Popa et al. 2019). In applications involving predictive analytics and the production of more appealing maps, machine learning can speed up the processing and interpretation of data. The ML approach used in this study has previously been shown to be an excellent tool for this application (Bui et al. 2020). Several machine learning methodologies also allow academics and data scientists to manage large datasets with limited capacity (Yang et al. 2022; Luo et al. 2022b). Timely access to information about the areas likely to be affected could support evacuation and thereby help protect both lives and property. With this information and the maps created from it, flooding can potentially be regulated and prevented. In addition, these findings could be used as a basis for developing criteria for evaluating the quantitative data used to develop management plans for rivers, land use and flood protection (Luu et al. 2021).
There is also a need to constantly research and develop new flood models in order to understand the different aspects of floods. In the present research, the flood susceptibility maps were created using two hybrid models, FAHP-CF and ICO-LADT-CF. These hybrid models were generated from several categories of stand-alone models. FAHP belongs both to the category of fuzzy models and to that of multicriteria decision-making analysis models. CF is a bivariate statistical method, a category that also includes frequency ratio, weights of evidence and index of entropy. Furthermore, the use of the LADT machine learning decision tree method in the present study is noteworthy, and this model was optimized using the ICO algorithm. Thus, the 2 hybrid models were generated in order to combine the capabilities of methods from several categories. The results proved to be very accurate, both models having an AUC of at least 0.913 and an accuracy of at least 0.926. These values exceed the performances obtained on the same study area by Costache (2019), who applied several models to estimate flood susceptibility, including Adaptive Neuro-Fuzzy Inference System (ANFIS), Fuzzy-Support Vector Machine (FSVM) and Statistical Index. These models achieved a maximum AUC of 0.911 (FSVM). Therefore, it can be stated that the present study improves on the quality and precision of the previous results. One of the advantages of the present study is the short time in which accurate flood susceptibility results can be obtained for a considerably large area such as the Prahova river basin (approx. 2600 km²). The alternative to such a study is hydraulic modeling, which in the present case would require a very long processing time and much larger amounts of data and resources. Thus, the results of hydraulic modeling for an area equal to that of the present study would be obtained only after a very long period of time.
Like any research work, the present study and its methodology are subject to limitations and possible uncertainties. One limitation is the impossibility of deriving the depth and velocity of river flow during flood events. These parameters can be assessed using hydraulic modelling. Data uncertainties can be introduced by the measurements underlying: i) the land use/land cover dataset; ii) the rainfall dataset; iii) the soil dataset; iv) the lithological and topographic datasets. These uncertainties are likely to induce some inaccuracies in the susceptibility results. Another limitation is the lack of historical flood data, which are usually used to calibrate and validate flood susceptibility results.
Nevertheless, the value of such studies for policymakers cannot be overstated. Policymakers have an important role in regulating and decreasing flood vulnerability by developing and implementing flood-resistance policies and regulations. They can implement land use planning policies that direct development away from flood-prone areas, such as floodplains, coastal zones, and other susceptible places (Islam et al. 2021). This may involve zoning regulations, construction codes, and development limits in high-risk locations. Policymakers can also establish floodplain management policies that focus on actions such as floodplain mapping, flood zoning, and land acquisition or transfer of properties in flood-prone locations. This can aid in the prevention or reduction of development in flood-prone areas (Saleh et al. 2022).

Conclusions
The present work has presented an experimental approach through which flood susceptibility in the Prahova River basin, Romania, was mapped by integrating several models from diverse categories. All the models were combined into 2 hybrid models (ICO-LADT-CF and FAHP-CF) that have as input data 10 flood conditioning factors and a number of 158 flood and non-flood locations. Through the 2 models, the importance of the 10 predictors was determined in the first phase. These values were then used in ArcGIS to calculate flood susceptibility across the study area. The results showed that between 16.9% (FPI FAHP-CF) and 25% (FPI ICO-LADT-CF) of the Prahova River basin has a high or very high susceptibility to flooding. Two categories of methods (ROC Curve and Statistical Metrics) were used to evaluate the accuracy of the models and validate the results. According to the ROC Curve, ICO-LADT-CF obtained an AUC of at least 0.952, while FAHP-CF obtained an AUC of at least 0.913. Statistical metrics showed an Accuracy of at least 0.935 for ICO-LADT-CF and at least 0.926 for FAHP-CF.
The main novelty of this research paper is the first use of these 2 hybrid models to determine flood susceptibility. It is also worth noting that the present results proved to be more accurate than the results obtained in previous works on the same study area.
Given the precise results obtained through the 2 hybrid models, this study will be of major interest for future research aimed at determining flood susceptibility, both in Romania and in any other area of the globe. The results will also be of major interest both for the National Institute of Hydrology and Water Management of Romania, which is responsible for issuing flood warnings, and for the General Inspectorate for Emergency Situations, which is responsible for interventions in the event of natural disasters.
At the global level, there is more and more evidence of the significant influence of climate change on the severity and manifestation of floods. For this reason, in future studies the authors will incorporate climate change characteristics in order to evaluate flood susceptibility in the Prahova River basin, Romania.

Figure 4. (a) The degree of possibility of $S'_i \geq S'_j$; (b) Triangular fuzzy number corresponding to linguistic variables according to the level of preference.

Figure 5. The scheme of the implemented workflow in the present manuscript.

Figure 9. Percentage of each FPI class.

Table 1. CF values for each class/category of flood conditioning factors.

Table 4. Flood potential maps accuracy assessment using statistical metrics.