Flood susceptibility modeling of the Karnali river basin of Nepal using different machine learning approaches

Abstract The Karnali River Basin (KRB) contains the longest river in Nepal and is located south of the Himalayas. Despite its high susceptibility to floods, the basin lacks detailed studies. Proper floodplain management is essential to reduce the impacts of rising flood frequency, magnitude, and severity aggravated by climate change. This research applies three machine learning techniques, Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Networks (ANN), to flood event data from the KRB for flood susceptibility mapping (FSM). Ten flood conditioning factors were selected based on the multicollinearity test: aspect, curvature, distance to river (DTR), normalized difference vegetation index (NDVI), elevation, slope, rainfall, soil, stream power index (SPI), and topographical wetness index (TWI). The influence of each factor was evaluated using the Cohen Kappa Score, with NDVI having the greatest influence, followed by elevation, DTR, curvature, and TWI. Based on the Area Under the Curve of Receiver Operating Characteristics (AUROC), SVM outperformed RF and ANN for FSM. Very high flood susceptibility covers 0.8 to 2.5% of the basin area, mostly in the south, where slopes and elevations are low. The results of this study suggest the use of SVM for FSM to support proper floodplain management.


Introduction
A flood is the rapid flow of a large volume of water, over a brief period, beyond the holding capacity of the river channel (Pangali Sharma et al. 2019; Andaryani et al. 2021). Floods are among the most frequent natural disasters worldwide, causing devastating effects on human lives, physical infrastructure, societal welfare, and the economic status of the affected regions (UNDRR 2019; Diaconu et al. 2021). Climate change has led to an increase in flood magnitudes and occurrences, which has further exacerbated the damage caused by floods (Hirabayashi et al. 2013; Winsemius et al. 2016; Tabari 2020; Alifu et al. 2022). Owing to climate change and population growth in flood-prone areas, the exposure of people and the economy to floods is estimated to increase by a factor of three (Merz et al. 2021).
Nepal, home to over 6000 rivers and rivulets with a total length of around 45,000 km, relies heavily on them for agriculture and livelihood (Gaire et al. 2015; Gupta et al. 2021). However, the high drainage density of 0.3 km/km², the restricted drainage capacity, and frequent flash floods (Rentschler et al. 2022) make these rivers prone to overflowing and causing damage (Dixit 2011; MoHA 2019; UNDRR 2019; Dingle et al. 2020; Rai et al. 2020; Thapa et al. 2020; Shrestha et al. 2020). Furthermore, because heavy rainfall events are concentrated in the few months of the monsoon, these rivers can cause significant flood disasters that affect a large proportion of livelihoods (Dingle et al. 2020). Floods cause several deaths and major economic damage in the country every year (MoHA 2018; UNDRR 2019; Shrestha et al. 2020), posing an imminent threat to human lives, physical infrastructure, societal welfare, and the economic status of developing countries like Nepal (Rai et al. 2020; Thapa et al. 2020). In fact, the loss due to floods in Nepal is expected to be 82.93% of the annual loss to the country (UNDRR 2019).
To mitigate the severity of flood hazards, proper watershed management is crucial (Andaryani et al. 2021). Preparing a flood susceptibility map (FSM) is an important component in making informed decisions regarding flood hazards (Youssef et al. 2022). An FSM is a binary classification that predicts whether a pixel will experience a flood (1) or not (0) based on past flood data and conditioning factors (Dodangeh et al. 2020; Nachappa et al. 2020). A range of physical, statistical, and decision-making methods are used for FSM, including hydraulic/hydrologic modeling, bivariate analysis, and the analytical hierarchy process (Khosravi, Pourghasemi, et al. 2016; Chakraborty and Mukhopadhyay 2019; Costache et al. 2020; Araujo and Dias 2021). However, each of these methods has its limitations. The limited availability and reliability of hydrogeomorphological observation data hinder the application of physically based models (Mosavi et al. 2018; Khosravi et al. 2020; Mehravar et al. 2023) and numerical modeling (Antwi-Agyakwa et al. 2023) in some regions (Khosravi et al. 2020; Liu et al. 2021; Seydi et al. 2023). Moreover, physical models are affected by computational complexity and depend on the proper selection of parameters (Fu et al. 2020). In recent years, the use of multi-criteria decision-making methods in FSM has gained popularity. However, the reliance on expert judgment in these methods can create biases, and even slight changes in parameter weights can have a significant impact on the results (de Brito et al. 2019; Ali et al. 2020; Mehravar et al. 2023). Statistical methods such as frequency ratio (Tehrany et al. 2014; Shafapour Tehrany et al. 2017) and logistic regression, on the other hand, are widely used in flood modeling. However, these methods are based on linear assumptions and may not capture the non-linear behavior of floods (Pangali Sharma et al. 2019; Khosravi et al. 2020; Andaryani et al. 2021). Hydrological/hydraulic-based models account for non-linearity; however, geomorphological and environmental factors can affect their accuracy (Seydi et al. 2023). For basins larger than 1000 km², accurate two- or three-dimensional analysis using a hydrodynamic model such as HEC-RAS is not feasible (Khosravi et al. 2020).
To overcome the limitations of traditional models, Machine Learning (ML) algorithms have been introduced. They learn from the data provided, without predefined assumptions or an understanding of the physical process (Nachappa et al. 2020; Mishra et al. 2022), and help in rapid spatial data analysis (Dodangeh et al. 2020; Mishra et al. 2022). ML methods commonly used in FSM include artificial neural networks (ANN) (Tehrany et al. 2014; Shafapour Tehrany et al. 2017; Andaryani et al. 2021), support vector machines (SVM) (Yousefi et al. 2018; Li et al. 2019; Costache et al. 2020; Nachappa et al. 2020), gradient boosting (Ghosh et al. 2022; Seydi et al. 2023), and random forest (RF) (Schmidt et al. 2020; Nachappa et al. 2020). The key steps in ML for FSM include analyzing the problem, preparing data, identifying a data-driven model, finding the best model, and evaluating it. Data-driven model identification is crucial, and the best model approximation is achieved by minimizing the discrepancy between actual and predicted data during training (Fu et al. 2020; Shahabi et al. 2020). Besides these ML-based modeling approaches, deep learning models have shown great potential in various applications, but they are complex and demand a large training dataset to achieve high accuracy. These models have many hyperparameters, and tuning them can be difficult and time-consuming (Seydi et al. 2023). These approaches have led to the development of state-of-the-art ML models at different scales in FSM (Rahmati et al. 2020).
In the context of flood risk management in Nepal, the application of machine learning (ML) models is limited. There have been studies using Gaussian process regression and SVM (Baig et al. 2022; Shreevastav et al. 2022) and MaxEnt (Shreevastav et al. 2022) in the Koshi Basin and a part of the Bagmati Basin, respectively. Most previous studies, however, have focused on hydrodynamic analysis for flood modeling in the Karnali River Basin (KRB) and other river basins (MacClune et al. 2014; Aryal et al. 2020; Dingle et al. 2020; Rai et al. 2020). The potential of ML-based models such as support vector machines (SVM), random forests (RF), and artificial neural networks (ANN) for flood risk management remains less explored in the large river basins of Nepal.
To address the limitations of existing models used in FSM and the lack of data in the larger river basins of Nepal, we focused on developing a simple yet effective and robust FSM for these basins. We propose an ML-based model for flood risk management in the KRB, a large river basin located on the southern slope of the Himalayas. The KRB has a history of devastating floods, including the once-in-1000-year event of mid-August 2014, which claimed the lives of 222 people and significantly affected 120,000 more (MacClune et al. 2014; Aryal et al. 2020). The Terai portion of the KRB is particularly vulnerable: its broad, flat plains receive sediment from unstable, steep, and rugged slopes, which raises the bed of the Karnali River by 10 to 30 cm per year (Team (NCVST) 2009; MacClune et al. 2014). As a result, several communities now lie below the level of the riverbed, making them vulnerable to frequent floods (Dhakal 2013; MacClune et al. 2014). While early warning systems have helped to reduce the impact of floods to some extent (MacClune et al. 2014), our proposed ML-based model has the potential to further improve decision-making in floodplain management. We aim to select the best ML approach among SVM, RF, and ANN, considering the influence of flood conditioning factors on the modeling approach. The proposed approach will generate a flood susceptibility map for the KRB, highlighting the areas most susceptible to flooding and the factors that contribute most to it. Our study contributes to the National Disaster Risk Reduction Strategic Plan of Action (2018-2030) (MoHA 2019; UNDRR 2019), which is based on the Sendai Framework for Disaster Risk Reduction (United Nations 2015) and aims to reduce the impact of natural disasters in Nepal.

Study area
Nepal is a landlocked, mountainous country in the central Himalayan region of southern Asia, situated between 80°4′E and 88°12′E longitude and 26°12′N and 30°27′N latitude (Talchabhadel et al. 2018; Pangali Sharma et al. 2019; Thapa et al. 2020). It extends about 885 km east-west and 140-250 km north-south. Nepal has diverse topography, ranging from 60 m in the southern plains to 8848.6 m at Mount Everest in the north. The study area is the Karnali River Basin, drained by the Karnali, which is considered the longest river in Nepal (Figure 1); the basin area is more than 46,000 km². Largely snow-fed, with 1361 glaciers covering an area of 1740 km² (Ives et al. 2010), the KRB is composed of six major watersheds: Kawadi, West Seti, Humla Karnali, Mugu Karnali, Tila, and Bheri (Rai et al. 2020). The Karnali stretches from the Tibetan region of China to the southern part of India. Originating from the southern part of the Mansarovar and Rookas lakes in the Tibetan region, it enters Nepal at Khojarnath and leaves through Chisapaani (Khatiwada and Pandey 2019). The Karnali River is gravel-bottomed, and the flow of water in the channel is controlled by sediment deposits (MacClune et al. 2014). The land-use pattern of the KRB is dominated by snow and bare land in the upper portion, while forest and agriculture predominate in the lower portion (Khatakho et al. 2021). The temporal distribution of precipitation in the KRB is influenced by the monsoon: the highest precipitation, 290.40 mm, is observed in July, and the lowest, 12.51 mm, in November (Khatakho et al. 2021).

Data collection and data preparation
We collected the required geospatial data from different sources. Specifically, we obtained long-term rainfall data from WorldClim (Fick and Hijmans 2017); a lithological map from the Department of Mines and Geology (DMG 1994); the ASTER Global Digital Elevation Model (ASTER GDEM) at 30 m spatial resolution, produced by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) instrument on board the Terra satellite (NASA 2022), from the official website of the United States Geological Survey (USGS); flood area locations from the national disaster risk reduction portal (NDRR portal); and the dates and times of flood occurrences from news reports.

Preparation of flood inventory map and training data-set
The Disaster Risk Reduction portal of the Nepal Government (http://drrportal.gov.np/) was explored to estimate the locations of flood points. After identifying the critical areas, we used the UN-SPIDER Google Earth Engine (GEE) code (https://code.earthengine.google.com/f5c2f984c053c8ea574bfcd4040d084e) to delineate the flooded area. The GEE code detects changes in water extent from Sentinel-1 imagery by comparing the image from the flood occurrence date with an image from a non-flooded day. We manually created random flood points by utilizing field knowledge of the flood-prone areas and comparing them with the flood raster obtained from GEE (Figure 3). Non-flood points were created by selecting areas where flooding was not observed or is unlikely based on location, slope, or elevation. To verify the accuracy of these points, we conducted a visual inspection using water extent, change, and availability maps derived from the Global Surface Water Data set (Pekel et al. 2017; Tang et al. 2020) and the Normalized Difference Water Index (NDWI) obtained through Google Earth Engine using Landsat imagery (USGS 2022).
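The before/after comparison can be illustrated with a minimal NumPy sketch. It is not the UN-SPIDER script itself: the ratio rule, the threshold of 1.25, the function name `flood_mask`, and the toy backscatter values (in dB) are assumptions made for illustration only.

```python
import numpy as np

def flood_mask(before_db, after_db, threshold=1.25):
    """Flag pixels whose after/before backscatter ratio exceeds `threshold`.

    Open water lowers SAR backscatter, so with negative dB values a flooded
    pixel has a ratio above 1. The threshold is an assumed, scene-dependent
    value, not one reported in the study.
    """
    before_db = np.asarray(before_db, dtype=float)
    after_db = np.asarray(after_db, dtype=float)
    ratio = np.full(before_db.shape, np.nan)
    # Leave NaN where the reference image is zero to avoid division by zero.
    np.divide(after_db, before_db, out=ratio, where=before_db != 0)
    return ratio > threshold

# Toy 2 x 2 scene: backscatter drops strongly in two of the four pixels.
before = np.array([[-15.0, -14.0], [-16.0, -15.0]])
after = np.array([[-22.0, -14.5], [-16.0, -21.0]])
mask = flood_mask(before, after)
print(mask)
```

In practice the resulting mask would still be screened against permanent water and steep terrain before flood points are drawn from it.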
The equation for NDWI is

$$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} \quad (1)$$

where Green and NIR are the green and near-infrared bands. Following the suggestion of different researchers (Buitinck et al. 2013; Towfiqul Islam et al. 2021), equal numbers of flood and non-flood pixels, 424 points in total, were prepared from the flood inventory to avoid bias (Figure 3). To improve the performance and accuracy of the models, 15-fold cross-validation was applied to the training dataset for all approaches. In addition, the flood data were divided using train-test split sizes from 80-20% to 50-50% in steps of 5%, and the significance of the differences in average scores was tested with a one-way ANOVA, which yields an F-statistic and a p-value.
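The split-size comparison can be sketched as follows, assuming scikit-learn and SciPy are available. The synthetic data set, the SVM classifier, and the ten repetitions per split are illustrative assumptions; the study does not state how the score samples for the ANOVA were formed.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 424-point inventory with ten conditioning factors.
X, y = make_classification(n_samples=424, n_features=10, n_informative=6,
                           random_state=0)

# Test fractions corresponding to splits from 80-20% to 50-50% in 5% steps.
test_sizes = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]

scores_by_split = []
for test_size in test_sizes:
    scores = []
    for seed in range(10):  # assumed number of repetitions per split size
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        scores.append(SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te))
    scores_by_split.append(scores)

# One-way ANOVA: do the mean accuracies differ between split sizes?
f_stat, p_value = f_oneway(*scores_by_split)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) would indicate that the split size has no significant effect on the average score.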

Flood conditioning factors
After analyzing their potential effects, we selected ten flood conditioning factors based on the physiography of the KRB. These factors were elevation, slope, aspect, curvature, lithology, rainfall, Stream Power Index (SPI), Topographic Wetness Index (TWI), Distance to River (DTR), and Normalized Difference Vegetation Index (NDVI). Elevation plays a crucial role in determining the distribution of topography and vegetation within the catchment (Nachappa et al. 2020). Low-lying areas are more susceptible to damage (Li et al. 2019). Slope affects water flow and discharge, with steeper gradients resulting in faster water movement and lower infiltration rates (Shafizadeh-Moghadam et al. 2018). Aspect determines the hydrologic characteristics of the basin, with northward slopes receiving higher rainfall due to reduced exposure to solar radiation (Andaryani et al. 2021). However, in the Nepal Himalayas, the northern side forms a rain-shadow area with minimal rainfall (Panthi et al. 2015).
Curvature influences flood susceptibility; positive values indicate convex surfaces and negative values indicate concave surfaces (Shafizadeh-Moghadam et al. 2018). Areas with higher curvature have a lower probability of flooding (Liu et al. 2021; Mehravar et al. 2023). Rainfall acts as a trigger for flooding, and increased rainfall raises the potential for flood occurrence. SPI represents the erosive power and surface runoff intensity at a specific location (Khosravi et al. 2019). TWI characterizes accumulated flow (Nachappa et al. 2020), while DTR measures proximity to the river (Janizadeh et al. 2019). NDVI indicates vegetation availability, with dense vegetation reducing inundation and weak or no vegetation increasing flood vulnerability (Askar et al. 2022). NDVI was calculated from the near-infrared and red bands, and SPI and TWI from the DEM, using Equations (2), (3), and (4):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \quad (2)$$

$$\mathrm{SPI} = A_s \tan\beta \quad (3)$$

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \quad (4)$$

where NIR and RED are the near-infrared and red bands, $A_s$ is the specific catchment area, and $\beta$ is the slope gradient.
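The three indices can be computed per pixel as in the NumPy sketch below. It assumes the standard definitions NDVI = (NIR − RED)/(NIR + RED), SPI = A_s·tan β, and TWI = ln(A_s/tan β); the function names, the slope in degrees, and the small minimum slope used to keep TWI finite on flat cells are illustrative choices, not details from the study.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Leave NaN where both bands are zero.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

def spi(catchment_area, slope_deg):
    """Stream power index: specific catchment area times tan(slope)."""
    beta = np.radians(np.asarray(slope_deg, dtype=float))
    return np.asarray(catchment_area, dtype=float) * np.tan(beta)

def twi(catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index: ln(specific catchment area / tan(slope)).

    Slopes are clamped to `min_slope_deg` (an assumed value) so that flat
    cells do not cause a division by zero.
    """
    slope = np.maximum(np.asarray(slope_deg, dtype=float), min_slope_deg)
    return np.log(np.asarray(catchment_area, dtype=float) / np.tan(np.radians(slope)))

print(ndvi([0.5], [0.1]), spi([100.0], [45.0]), twi([100.0], [45.0]))
```

For a 45 degree slope tan β equals 1, so SPI reduces to the catchment area and TWI to its natural logarithm, which gives a quick sanity check.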
We used QGIS to prepare the flood conditioning factors. Once the factors were prepared, we reclassified the rasters into classes for identifying the flood-prone zone. Reclassification methods used in previous studies (Chapi et al. 2017; Bui et al. 2018; Shahabi et al. 2020) include manual division, equal division, quantile division, and natural breaks division. In this study, the natural breaks method was used for the reclassification of elevation, slope, aspect, rainfall, and NDVI. The quantile division method was employed for reclassifying TWI, SPI, and DTR. For soil and curvature, we performed manual division (see Table 1, Figure 4).
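Quantile division, as used here for TWI, SPI, and DTR, can be sketched in NumPy as follows. The study performed this step in QGIS; the function name and the choice of five classes are assumptions for illustration.

```python
import numpy as np

def quantile_reclassify(values, n_classes=5):
    """Assign each value to one of `n_classes` classes of roughly equal count.

    Class 1 holds the lowest values and class `n_classes` the highest.
    """
    values = np.asarray(values, dtype=float)
    # Interior break points, e.g. the 20th, 40th, 60th and 80th percentiles.
    probs = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    breaks = np.quantile(values, probs)
    # right=True places a value equal to a break in the lower class.
    return np.digitize(values, breaks, right=True) + 1

dtr = np.arange(1, 101)  # toy distance-to-river values
classes = quantile_reclassify(dtr)
print(np.bincount(classes)[1:])
```

Unlike equal-interval division, quantile breaks adapt to skewed distributions, which is typical of flow-derived indices such as SPI.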

Multicollinearity test
Multicollinearity arises when two or more parameters are highly correlated (Dodangeh et al. 2020; Towfiqul Islam et al. 2021). There are different methods of multicollinearity analysis; however, the Pearson correlation coefficient and the variance inflation factor (VIF) are preferred in hazard mapping (Tehrany, Jones, et al. 2019). Tolerance is the reciprocal of VIF. A tolerance value below 0.2 and a VIF value above 5 or 10 (Arabameri et al. 2019; Rahman et al. 2019; Baig et al. 2022; Ghosh et al. 2022; Mehravar et al. 2023) indicate multicollinearity among the parameters. Similarly, a Pearson correlation coefficient below 0.7 or 0.8 suggests that collinearity is unlikely (Tehrany, Jones, et al. 2019; Shrestha 2020; Shreevastav et al. 2022).
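A minimal NumPy sketch of the VIF and tolerance calculation is given below. It regresses each factor on all the others and uses VIF = 1/(1 − R²); the function name and the synthetic three-factor data are illustrative assumptions.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF, tolerance) for each column of the factor matrix X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_factors = X.shape
    vif = np.empty(n_factors)
    for j in range(n_factors):
        target = X[:, j]
        # Design matrix: intercept plus all remaining factors.
        design = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r_squared = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r_squared)
    return vif, 1.0 / vif

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)               # independent of x1
x3 = x1 + 0.05 * rng.normal(size=500)   # nearly a copy of x1
vif, tolerance = vif_and_tolerance(np.column_stack([x1, x2, x3]))
pearson = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(vif.round(2), tolerance.round(3))
```

In this toy case the nearly duplicated pair x1 and x3 receives a very large VIF, while the independent factor x2 stays close to 1.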

Evaluation of the importance of the flood conditioning factor based on information gain ratio and leave-one-out cross validation (LOOCV)
A further research objective was to determine the most influential flood conditioning factors and to explore which factors are significant when preparing the models. Assessing the suitability and importance of the factors is crucial before applying the modeling approaches (Towfiqul Islam et al. 2021). We applied the information gain ratio (IGR) (Mehravar et al. 2023) to assess the importance and suitability of the factors because it is easy to apply and effective (Dodangeh et al. 2020). To determine the importance of the factors for each model, we applied the LOOCV approach. First, all the parameters were used, and the Kappa score was recorded for each model. Then each parameter was removed in turn, and the resulting Kappa scores were recorded for all the models.
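The factor-removal procedure can be sketched with scikit-learn as follows. The synthetic data, the random forest classifier, and the 70-30% split are illustrative assumptions, and the factor names are only labels attached to synthetic columns; they carry no physical meaning here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

FACTORS = ["elevation", "slope", "aspect", "curvature", "lithology",
           "rainfall", "SPI", "TWI", "DTR", "NDVI"]

# Synthetic stand-in for the flood inventory (424 points, ten factors).
X, y = make_classification(n_samples=424, n_features=len(FACTORS),
                           n_informative=5, n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def kappa_for(columns):
    """Train on the given factor columns and return the test-set kappa."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr[:, columns], y_tr)
    return cohen_kappa_score(y_te, model.predict(X_te[:, columns]))

all_columns = list(range(len(FACTORS)))
baseline = kappa_for(all_columns)

# Drop one factor at a time; a large drop in kappa marks an influential factor.
kappa_drop = {}
for j, name in enumerate(FACTORS):
    kept = [c for c in all_columns if c != j]
    kappa_drop[name] = baseline - kappa_for(kept)

for name, drop in sorted(kappa_drop.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {drop:+.3f}")
```

The same loop would be repeated for each classifier, since a factor can matter to one model and not to another.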

Flood susceptibility model using ANN, RF, and SVM
We trained the SVM, RF, and ANN models using the flood point data and a stacked raster file of the conditioning factors, with scikit-learn (Pedregosa Fabian et al. 2011) and Keras (Chollet 2015). ANN is a common ML method that models the relationship between the input conditioning factors and the output through a multilayer perceptron with hidden and output neurons (Dtissibe et al. 2020). RF builds numerous models through bagging (Richman and Wüthrich 2020) and random feature selection, then combines their results and predicts the output by voting (Towfiqul Islam et al. 2021). SVM is a supervised binary classifier that estimates functions using a linear or kernel function (Nachappa et al. 2020), forming a hyperplane and categorizing data points as +1 or −1 (Tehrany et al. 2014). The success of the model depends on the kernel function used, which can be sigmoid, polynomial, radial basis, or linear (Choubin et al. 2019).
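The workflow of training on point samples and predicting over a stacked raster can be sketched as below. This is not the study's configuration: the data are synthetic, all hyperparameters are assumed, and scikit-learn's `MLPClassifier` stands in for the Keras network so that the example needs a single library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training points: rows are flood/non-flood points, columns are factors.
X, y = make_classification(n_samples=424, n_features=10, n_informative=6,
                           random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(),
                         SVC(kernel="rbf", probability=True, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16, 8),
                                       max_iter=2000, random_state=0)),
}

# Hypothetical stacked raster: 10 factor layers of 20 x 30 pixels.
rng = np.random.default_rng(0)
stack = rng.normal(size=(10, 20, 30))
pixels = stack.reshape(10, -1).T  # one row per pixel, one column per factor

susceptibility = {}
for name, model in models.items():
    model.fit(X, y)
    prob_flood = model.predict_proba(pixels)[:, 1]  # probability of class 1 (flood)
    susceptibility[name] = prob_flood.reshape(20, 30)
```

Scaling is applied inside the SVM and ANN pipelines because both are sensitive to feature ranges, whereas tree ensembles such as RF are not.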

Validation and comparison of the models
In ML algorithms for flood susceptibility, pixels that are correctly classified as positive, i.e. flood pixels (P), or negative, i.e. non-flood pixels (N), are True Positives (TP) or True Negatives (TN), respectively. Likewise, non-flood pixels incorrectly identified as flood and flood pixels incorrectly identified as non-flood are False Positives (FP) and False Negatives (FN), respectively (Chakraborty and Mukhopadhyay 2019; Choubin et al. 2019). The probabilities of correctly identifying positive and negative instances are represented by the true positive rate and the true negative rate, respectively (Pourghasemi et al. 2020). In the Receiver Operating Characteristic (ROC) curve, the X-axis displays the false positive rate (1 − specificity) (Equation 5), while the Y-axis shows the true positive rate (sensitivity) (Equation 6).

$$\text{False Positive Rate} = 1 - \text{Specificity} = \frac{FP}{FP + TN} \quad (5)$$

$$\text{Sensitivity (True Positive Rate)} = \frac{TP}{TP + FN} \quad (6)$$

$$\text{Specificity (True Negative Rate)} = \frac{TN}{TN + FP} \quad (7)$$

The Area Under the Curve of ROC (AUROC) is determined as

$$\mathrm{AUROC} = \frac{\sum TP + \sum TN}{P + N} \quad (8)$$

Generally, an AUROC value from 0.5 to 0.6 indicates an incompetent model, 0.6 to 0.7 a poorly performing model, and 0.7 to 0.8 a satisfactorily performing model, while a model with an AUROC value greater than 0.8 is considered best performing (Chapi et al. 2017; Towfiqul Islam et al. 2021; Ranjgar et al. 2021). To validate the models and assess their effectiveness, success-rate and prediction-rate curves were constructed for the training and test datasets, respectively (Khosravi, Pourghasemi, et al. 2016; Shafapour Tehrany et al. 2017). We used the AUROC of the success and prediction rates to test the performance of the ML algorithms and to validate the models (Mojaddadi et al. 2017; Andaryani et al. 2021). Similarly, the kappa coefficient, accuracy, F1-score (Seydi et al. 2023), MSE, and RMSE (Nguyen 2022) were evaluated for accuracy assessment and model comparison.
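These validation measures can be computed with scikit-learn as in the sketch below. The eight labels and scores are made up for illustration, and the 0.5 cut-off for turning probabilities into classes is an assumption; `roc_auc_score` integrates the ROC curve over all thresholds.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, roc_auc_score)

# Toy example: 1 = flood, 0 = non-flood, with predicted flood probabilities.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # assumed 0.5 decision threshold

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
false_positive_rate = 1.0 - specificity

auroc = roc_auc_score(y_true, y_score)
accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
rmse = np.sqrt(np.mean((y_true - y_score) ** 2))

print(sensitivity, specificity, auroc, accuracy, f1, kappa, rmse)
```

Computing the same quantities on the training points gives the success rate and on the held-out points the prediction rate.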

Flood inventory and training dataset
The one-way ANOVA test yielded an F-value of 2.481 and a p-value of 0.112. This implies that the average scores (Table 2) do not differ significantly across split sizes, so the train-test split size has no significant effect on the average scores. Hence, following common practice, a 70-30% split with random selection was applied, as in various studies (Khosravi, Pourghasemi, et al. 2016; Tehrany, Kumar, et al. 2019; Khosravi et al. 2020; Andaryani et al. 2021; Baig et al. 2022).

Multicollinearity test
The results of the multicollinearity test show that the VIF values are less than 10 and the tolerance values are higher than 0.1. This indicates that none of the ten flood conditioning factors has a collinearity problem for use in FSM. In addition, the correlation matrix shows the strength of the linear relationship between the conditioning factors and flood susceptibility. Among the factors, elevation (0.675), slope (0.530), NDVI (0.529), and curvature (0.453) have relatively high correlation values, but all are less than 0.8. Multicollinearity therefore does not exist between any of the factors, and all of them can be used for FSM (Table 3).

Comparison of the parameters
As shown in Table 3, elevation (0.390), rainfall (0.205), soil (0.194), and TWI (0.191) have the highest IGR values, indicating that they are more important for flooding than the other factors. NDVI is the most influential factor based on the Cohen Kappa Score in the LOOCV method: the Kappa Score dropped sharply for all models when the NDVI parameter was removed. Likewise, the removal of elevation, distance to the river (DTR), and rainfall also decreased the score in the SVM model (Figure 5a). Curvature and TWI were found to be equally significant in the ANN model (Figure 5b). However, no other parameter showed a significant decrease in the case of the random forest model (Figure 5c). Aspect appears to be a less important factor for the SVM and ANN models (Figure 5a, b). Based on the IGR and LOOCV, we observed that elevation, NDVI, DTR, rainfall, curvature, and TWI are the major factors that influence flooding in the KRB (Table 3 and Figure 5). This result is similar to previous studies (Khosravi, Nohani, et al. 2016; Chapi et al. 2017; Bui et al. 2018; Tehrany, Jones, et al. 2019) but contrasts strongly with others (Bui et al. 2015; Rahman et al. 2019).
Naturally, floods occur in areas of relatively low elevation and slope, as reported in previous studies (Tehrany, Pradhan, and Jebur 2015; Khosravi, Nohani, et al. 2016; Khosravi, Pourghasemi, et al. 2016; Bui et al. 2018; Talukdar et al. 2020). Loamy sand and clay are the major soil types in the flood-susceptible area of the KRB. Similarly, major floods occurred in the southern plain of the basin, where extreme rainfall is more frequent than in the northern mountainous region. The area has low vegetation, slope, curvature, and TWI (Figures 3 and 4). The Karnali River enters a low-elevation, flat area with a slope ranging from 0 to 3 degrees and branches into two major channels, the Karnali and Bheri channels, forming a river island (Rakhal et al. 2021). Flooding concentrates here because the upper region of the KRB confines the water flow within its banks, and abrupt changes in elevation or breaches in the bank cause water to spread over vast flat areas (Dhakal 2013; Rakhal et al. 2021). This is supported by the results from IGR and LOOCV in this FSM study. The study by Andaryani et al. (2021) shows a similar ranking of the influencing factors. Similar results were observed by other studies as well (Talukdar et al. 2020; Towfiqul Islam et al. 2021). The importance of the factors contrasts with that reported in some studies (Bui et al. 2015; Khosravi, Nohani, et al. 2016; Khosravi et al. 2020). Another study shows that slope, SPI, geology, and altitude are important factors for flood susceptibility (Tehrany, Kumar, et al. 2019). Most previous research (Tehrany, Kumar, et al. 2019; Andaryani et al. 2021; Youssef et al. 2022) shows that elevation is the major factor in FSM. Beyond this, the importance of other factors may vary with the nature of the river basin, such as its geomorphology, hydrology, and topography (Tehrany, Kumar, et al. 2019; Andaryani et al. 2021).

Flood susceptibility mapping
During the preparation of the flood susceptibility map of the KRB, it was observed that the major part of the basin, lying in the upper portion, is covered by vegetation and underlain by solid bedrock, which confines the river and reduces the flood-susceptible area, as represented in Table 4 and Figures 6 and 7. The flood susceptibility maps show that ANN predicted the largest very low susceptibility area at 82.22% (38,762 km²), followed by SVM at 55.12% (25,985 km²) and RF at 54.76% (25,818 km²). For moderate flood susceptibility, the SVM model gave the largest area at 7.68% (3621 km²), followed by RF and ANN at 5.94% (2799 km²) and 2.39% (1125 km²), respectively. For very high flood susceptibility, the ANN model gave the largest area at 2.467% (1163 km²), followed by RF at 0.995% (469 km²) and SVM at 0.828% (390 km²). The majority of the susceptible area was found in the lower portion of the basin (the area denoted in Table 4 by the phrase "very high" and the enlarged portion in Figure 6). This region lacks vegetation, and more areas are exposed to direct runoff. Also, in the lower region, elevation decreases rapidly. The sudden drop in elevation and the reduced slope promote sediment deposition, which is reinforced by extreme floods (Yousefi et al. 2021). As seen in the enlarged portion of Figure 6a-c, the braided river pattern marks the more susceptible land in the lower Terai portion. This is the area where sedimentation has reduced the slope to 3 degrees and where, with the braided river pattern, major flooding occurs (Rakhal et al. 2021). The riverbeds of numerous Terai rivers are rising at a rate of 10-30 cm annually due to sedimentation (MacClune et al. 2014). Likewise, the possible impacts of floods in this region are relatively high due to the development of cities with high population density, such as Tikapur, Rajapur, and Madhuwan (Aryal et al. 2020).
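Area statistics of this kind can be derived from a susceptibility raster as in the NumPy sketch below. The equal-interval class breaks, the 30 m pixel size, and the function name are assumptions for illustration; the classification scheme behind Table 4 is not restated here.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def class_areas(prob, pixel_size_m=30.0, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Return {class name: (area in km², share in %)} for a probability raster."""
    prob = np.asarray(prob, dtype=float)
    classes = np.digitize(prob, breaks)  # 0 = very low ... 4 = very high
    counts = np.bincount(classes.ravel(), minlength=len(CLASS_NAMES))
    pixel_km2 = (pixel_size_m / 1000.0) ** 2
    shares = 100.0 * counts / counts.sum()
    return {name: (count * pixel_km2, share)
            for name, count, share in zip(CLASS_NAMES, counts, shares)}

# Toy 2 x 4 susceptibility raster.
prob = np.array([[0.1, 0.1, 0.3, 0.5],
                 [0.7, 0.9, 0.1, 0.1]])
areas = class_areas(prob)
for name, (area_km2, share) in areas.items():
    print(f"{name:10s} {area_km2:.4f} km²  {share:.1f}%")
```

As a consistency check on the reported figures, 38,762 km² at 82.22% implies a mapped basin area of roughly 47,100 km², in line with the stated basin area of more than 46,000 km².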

Validation and comparison between different machine learning algorithms
The ML algorithms were compared based on the AUROC, as shown in Figure 8 and Table 5. AUROC is one of the most reliable methods for the evaluation of predicted FSMs. Its simplicity and ease of interpretation have made it popular in the validation of spatial models (Nachappa et al. 2020; Andaryani et al. 2021; Youssef et al. 2022). As represented in Table 5 and Figure 8, the SVM model performed better than RF and ANN. It achieved the highest AUROC for both the success rate and the prediction rate, at 92.8 and 98.7%, respectively, followed by RF (91.9 and 98.5%) and ANN (89.8 and 98.1%).
Because the success rate reflects the fit to the training dataset, the prediction rate determines the applicability of the model (Tehrany, Pradhan, and Jebur 2015; Khosravi, Pourghasemi, et al. 2016; Andaryani et al. 2021). The AUROC results for success and prediction are in line with each other, so the results are consistent. Besides this, the accuracy, F1-score, and kappa score of SVM were the highest among the models tested, with values of 0.953, 0.955, and 0.905, respectively, followed by RF (0.906, 0.930, and 0.809) and ANN (0.919, 0.923, and 0.795). Similarly, the MSE and RMSE values are lowest for SVM. These findings imply that the SVM model can deliver a more precise mapping of flood vulnerability, confirming its applicability in flood management and planning. Because the SVM model offers a versatile and reliable method for modeling complicated interactions between climatic and topographical factors and flood occurrence, its application for FSM can be very beneficial, as in the studies by Tehrany, Pradhan, Mansor, et al. (2015) and Liu et al. (2021). The SVM model can simultaneously assess many features and data types, discover non-linear correlations between the predictors and the response variable, and detect subtle patterns and trends that other models might overlook (Tehrany, Pradhan, Mansor, et al. 2015; Tehrany, Kumar, et al. 2019). In the studies by Tehrany et al. (2014) and Tehrany, Pradhan, and Jebur (2015), SVM was compared with other statistical methods and performed the best. These results contrast with those of some other researchers (Nachappa et al. 2020; Towfiqul Islam et al. 2021; Andaryani et al. 2021; Youssef et al. 2022; Seydi et al. 2023). The differences among the modeling approaches in most of the validation and comparison indicators are marginal, with all values at the higher end. Therefore, while the SVM model demonstrated the best performance, the applicability of the RF and ANN models cannot be ignored. The study aimed to propose a simple yet effective ML-based FSM for the KRB. We found that the careful selection of flood and non-flood points during inventory preparation brought higher accuracy with SVM, RF, and ANN. Familiarity with the flood-prone areas in the KRB helped in creating a better flood inventory, which in turn increased the accuracy of the methods applied. Besides this, we believe that statistical approaches to model validation might not be error-free (Meyer et al. 2019), so the prediction results of the models were also verified by visual inspection and found appropriate. Likewise, the results were compared with the flood hazard map from the Colorado flood observatory (https://floodobservatory.colorado.edu/), which includes the maximum flood extent from 1993. From the modeling approaches and visual inspection, it can be concluded that the flood susceptibility maps prepared from all three approaches are appropriate; however, the result from SVM is more reliable.

Conclusions
Traditional approaches to FSM suffer from major uncertainty due to the lack of data and often rely on expert knowledge and field surveys, which can be time-consuming, expensive, and prone to errors. In contrast, ML models can be trained on historical data, which allows for the efficient analysis of large datasets, leading to faster and more accurate predictions. Besides this, previous modeling approaches used fixed river channels. However, in the KRB, periodic variations in the river's path and in sediment movement change the shape and structure of the river channel and the adjacent floodplain and can thereby modify the likelihood of flooding (Sinclair et al. 2017). Historical flood data were used to prepare the flood inventory in this study, so the uncertainty due to channel shift is addressed. The research found highly susceptible areas in the lower elevations of the KRB, where the morphological transition from the bedrock gorge to the Indo-Gangetic plains creates a favorable environment for floods. The tests used in this study indicate that vegetation largely regulates flooding: the scores decreased significantly for each model when the NDVI parameter was removed, demonstrating the significance of vegetation in flood forecasts. Additionally, topographic factors like elevation and slope were found to play a critical role in runoff generation. The SVM model performed better than the other models, as shown by the best AUROC, F1-score, and accuracy values, although the other models also performed well. The very high flood susceptibility area ranges from 390 km² (from SVM) to 1163 km² (from ANN). We thus reiterate that vegetation plays a vital role in flood control by increasing the flow path and infiltration rates and decreasing flow velocities, and in many cases acts as a buffer zone between settlements and flooding rivers. Vegetation should therefore be a part of floodplain management to protect people and property from floods. Hence, the results of this study should be useful to state and local administrations and policymakers in developing effective flood management plans for the KRB. Since floods continue to pose a significant threat to people and infrastructure worldwide, the application of ML models can play a crucial role in reducing the risks associated with flooding. With the development of advanced ML models and the increased computational power of information systems, accurate mapping tools are now available, allowing scientists to develop more accurate FSMs. These models are critical to developing viable and effective floodplain management strategies (Chen et al. 2019; Nachappa et al. 2020; Towfiqul Islam et al. 2021) and to keeping people safe in flood-prone areas.
conditioning factors. The main steps in the proposed FSM method are: (1) data collection and preparation; (2) preparation of a flood inventory map for the selection of the training data set; (3) preparation of the layers of flood conditioning factors; (4) multicollinearity and Pearson tests to analyze the suitability and relative importance of the conditioning factors; (5) training and testing of the ANN, SVM, and RF models using the flood inventory data; (6) use of the Information Gain Ratio (IGR) and the Cohen Kappa Score in a leave-one-at-a-time cross-validation (LOOCV) approach to determine the most influential flood conditioning factors; (7) preparation of separate susceptibility maps for ANN, SVM, and RF; and (8) validation and comparison of the ML algorithms using the Area Under the Curve of Receiver Operating Characteristics (AUROC), accuracy, F1-score, and kappa scores. A generalized flowchart of the study is shown in Figure 2.
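Step (6) can be illustrated with a minimal sketch, given here only as an assumption about how such an evaluation might be organized and not as the code used in the study. Each conditioning factor is dropped in turn, the classifier is refitted, and the resulting decrease in the Cohen Kappa Score is recorded. The function names (`cohen_kappa`, `nearest_centroid`, `leave_one_factor_out`) are hypothetical, a simple nearest-centroid classifier stands in for SVM, RF, and ANN, and the sample values are synthetic numbers chosen only to exercise the code.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; reported as 0 by assumption
    return (p_o - p_e) / (1.0 - p_e)


def nearest_centroid(train_X, train_y, test_X, factors):
    """Stand-in classifier (NOT SVM/RF/ANN): assign the class with the closest mean."""
    centroids = {}
    for cls in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == cls]
        centroids[cls] = [sum(r[f] for r in rows) / len(rows) for f in factors]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(factors, centroids[c])),
        )

    return [predict(x) for x in test_X]


def leave_one_factor_out(train_X, train_y, test_X, test_y, factors, fit_predict):
    """Return the baseline kappa and the kappa drop caused by removing each factor."""
    baseline = cohen_kappa(test_y, fit_predict(train_X, train_y, test_X, factors))
    drops = {}
    for name in factors:
        kept = [f for f in factors if f != name]
        reduced = cohen_kappa(test_y, fit_predict(train_X, train_y, test_X, kept))
        drops[name] = baseline - reduced
    return baseline, drops


# Synthetic, normalized toy samples (hypothetical values, not study data).
# Label 1 = flood point, label 0 = non-flood point.
factors = ["ndvi", "elevation", "aspect"]
train_X = [
    {"ndvi": 0.2, "elevation": 0.1, "aspect": 0.5},
    {"ndvi": 0.3, "elevation": 0.2, "aspect": 0.9},
    {"ndvi": 0.1, "elevation": 0.3, "aspect": 0.1},
    {"ndvi": 0.8, "elevation": 0.9, "aspect": 0.4},
    {"ndvi": 0.7, "elevation": 0.7, "aspect": 0.8},
    {"ndvi": 0.9, "elevation": 0.8, "aspect": 0.2},
]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [
    {"ndvi": 0.3, "elevation": 0.55, "aspect": 0.5},
    {"ndvi": 0.4, "elevation": 0.15, "aspect": 0.3},
    {"ndvi": 0.6, "elevation": 0.85, "aspect": 0.5},
    {"ndvi": 0.75, "elevation": 0.55, "aspect": 0.7},
]
test_y = [1, 1, 0, 0]

baseline, drops = leave_one_factor_out(
    train_X, train_y, test_X, test_y, factors, nearest_centroid
)
print("baseline kappa:", baseline)
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"kappa drop without {name}: {drop:.2f}")
```

A larger drop marks a more influential factor under this scheme. In practice the same loop could wrap any classifier that exposes a fit-and-predict step, and the scores would normally be averaged over repeated train-test splits rather than taken from a single split as in this sketch.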

Figure 3. A snippet of flood and non-flood point selection based on Sentinel-1 SAR images (a) before and (b) after the flood, using GEE.

Figure 5. Parameter evaluation based on the Cohen Kappa Score using the leave-one-out cross-validation (LOOCV) method for (a) Support Vector Machine, (b) Artificial Neural Network, and (c) Random Forest.

Figure 7. Flood susceptible areas in percentage obtained from Support Vector Machine, Random Forest, and Artificial Neural Network.

Figure 8. Validation of FSM prediction rate and success rate curves for (a) ANN, (b) RF, and (c) SVM.

Table 1. Data used to prepare flood conditioning factors.

Table 2. Mean score for different train-test split sizes.

Table 3. Assessment of flood conditioning factors based on multicollinearity, Pearson test, and Information Gain Ratio (IGR).

Table 4. Flood susceptible area (km²) predicted by support vector machine, random forest, and artificial neural network.

Table 5. Performance of models.