Combination four different ensemble algorithms with the generalized linear model (GLM) for predicting forest fire susceptibility

Abstract In this study, the generalized linear model (GLM) and four ensemble methods (partial least squares (PLS), boosting, bagging, and Bayesian) were applied to predict forest fire hazard in the Chalus Rood watershed in the Mazandaran Province, Iran. Data from 108 historical forest fire events collected through field surveys were applied as the basis of the analysis. About 70% of the data were used for training the models, while the remaining 30% was used for testing. A total of 14 environmental, climatic, and vegetation variables were used as input features to the models to predict forest fire probability. After conducting a multicollinearity test on the independent variables, the GLM and the ensemble models were applied for modeling. The efficiency of the models was evaluated using receiver operating characteristic (ROC) curve parameters. Results from the validation process, based on the area under the ROC curve (AUC), showed that the GLM, PLS-GLM, boosted-GLM, Bagging-GLM, and Bayesian-GLM models had efficiencies of 0.79, 0.75, 0.81, 0.84, and 0.85, respectively. The results indicated that all ensemble methods, except the PLS algorithm, improved the performance of the GLM model in modeling forest fire hazards in the Chalus Rood watershed, with the Bayesian algorithm being the most efficient method among them.


Introduction
Forest fires are a crucial environmental problem under ongoing climate change, threatening human lives and the environment (Huebner et al. 2012). Forest fires affect the pattern of forest formation (Huesca et al. 2009). Many factors, such as temperature, rainfall, drought, and human activity, can influence the frequency and intensity of forest fires (Eastaugh & Hasenauer 2014). Forest fires are also damaging and destructive natural disasters, causing significant fatalities and economic losses every year (Kwak et al. 2012). Forest fire susceptibility modeling plays a critical role in forest management and conservation efforts (Chicas and Østergaard Nielsen 2022). Accurate prediction of forest fires is vital for the sustainable management of forests and the planning of emergency procedures by local authorities (Tehrany et al. 2019).
Statistical and physics-based models have been employed to model forest fire susceptibility (Pourtaghi et al. 2015; Agranat and Perminov 2020; Lattimer et al. 2020; Chicas and Østergaard Nielsen 2022; Das et al. 2023). Statistical models utilize past fire data to predict fire occurrences (Das et al. 2023). Statistical methods are restricted by assumptions about the data and may struggle to robustly capture complex relationships among variables (Jain et al. 2020). In particular, they may fail to capture the intrinsic nonlinearity and nonstationarity between the forest fire signature and the predictor variables, and thus perform poorly (Chicas and Østergaard Nielsen 2022). These models also cannot account for cross-correlation between variables. It is common practice to assume that the regression residuals are normally distributed, but this may not be a valid assumption (Tang et al. 2020). Furthermore, statistical models require large amounts of data to estimate the behaviour of unknown systems (Bui et al. 2017).
Physics-based models incorporate physical properties such as weather and topography to estimate fire behaviour and spread (Collin et al. 2011). They are good at providing insights into catchment environmental processes, but they have been criticized for being uncertain and difficult to implement (Zheng et al. 2017). With complex model structures and extensive calculation requirements, physics-based forest fire models often have high computational costs and require high levels of environmental expertise from modellers and users, thus limiting their application in forest fire management (Chew et al. 2022). Such models also often require significant effort and many environmental variables for calibration in order to simulate the physical processes of the watershed (Reinhardt et al. 2001). In addition, they require detailed information on various factors that may not be easily accessible, particularly in remote or inaccessible regions (Jain et al. 2020).
Recent studies in environmental hazard modeling have shown that ensembles of multiple machine learning (ML) algorithms tend to outperform individual algorithms (Ahmadi et al. 2020; Band et al. 2020a, 2020b; Kalantar et al. 2020; Shao et al. 2023). The GLM remains a valuable tool for fire susceptibility mapping. It has also been widely used to identify relationships between independent and dependent variables for fire management and planning (Carvalho et al. 2018; Ríos-Pena et al. 2017). In the context of GLMs for spatial modeling, the accuracy of results can be improved by combining ensemble methods with GLMs. This is particularly important because the estimation of forest fire risk is complex owing to the multiple interactions of the factors that affect the ignition phenomena. The four most commonly used ensemble methods in spatial modeling are Bayesian, Bagging, Boosting, and partial least squares (PLS). Each of these approaches has its own strengths and drawbacks and can be used depending on the particular goals and requirements of the study. The Bayesian approach allows prior knowledge to be incorporated and the uncertainties of the GLM to be estimated (Tran et al. 2020). Bagging can reduce the variance and increase the stability of results from the GLM (Sutton, 2005). Boosting can reduce the bias and variance of predictions from the GLM (Bou-Hamad et al. 2017). Finally, PLS can improve the interpretability and computational efficiency of the GLM (Ding and Gentleman 2005).
Although these ML models are efficient in predicting forest fire susceptibility, their accuracy is still debated. Previous studies have shown that there is no reliable framework for selecting multiple algorithms to model forest fire susceptibility (Abedi et al. 2021; Achu et al. 2021). The complex interaction between various factors that contribute to forest fire ignition and spread makes it challenging to accurately model and predict forest fires over regional scales. Furthermore, the uncertainty in inter-model predictions yields an unclear decision-making process (Bui et al. 2016; Gigović et al. 2019). This paper proposes the use of four hybrid ML methods, namely PLS-GLM, Boosted-GLM, Bagging-GLM, and Bayesian-GLM, for modeling forest fire susceptibility, with the aim of improving the accuracy of forest fire prediction. To the best of our knowledge, there has been no prior attempt to use these four ensemble methodologies in combination with the GLM for modeling forest fire susceptibility.
The main aim of this study is to assess the efficiency of four ensemble algorithms (i.e., PLS, Bagging, Boosting, and Bayesian) in augmenting the ability of the GLM to predict forest fire hazards. The secondary aim of this work is to produce accurate forest fire maps of the Chalus Rood watershed in Mazandaran Province (Iran), as they are crucial for informed and effective forest management practices. The application of advanced ML methods in forest fire simulation provides valuable insights into the behaviour and spread of fires under different conditions in the Chalus Rood watershed. This is particularly important because forest fires in this watershed have far-reaching and long-lasting impacts on the local communities. Furthermore, by providing an augmented understanding of the causes and consequences of forest fires in this region, these models can support sustainable forest management practices and help to prevent future forest fires.

Study area
Forest fires are a common occurrence in the Chalus Rood watershed from May to December, mainly due to declining humidity levels and higher wind speeds. However, most fires are initiated by human activities such as intentional burning for agriculture and campfires. In Mazandaran Province, some forest areas have been devastated by fires due to high temperatures and reduced precipitation, which dramatically increase the risk of fires and change their behaviour. Therefore, forest fire prediction and mapping are needed to identify critical areas. The Chalus Rood watershed is a part of the Caspian Sea basin, which has 18 sub-watersheds (50°58′ to 51°40′ E and 36°08′ to 36°36′ N) with an area of approximately 1634 km² located in Mazandaran Province, Iran (Figure 1). The maximum and minimum altitudes of the Chalus Rood watershed are 4256 and −26 m above sea level, respectively. The area is mountainous, with a steep slope towards the north. The region mainly experiences a cold and humid climate, with semi-arid cold conditions in some lowlands. The annual rainfall in the study area varies from 288.3 mm to 1538.2 mm. Precipitation is seasonal and mainly occurs during winter and autumn. Land use in the study area includes forests (dominant), pastures, agriculture, and gardens. The forests in the study area (Caspian) are a remnant of the third geological period, with many species of that period still existing in them (Kalantar et al. 2020). In recent years, fires in the Chalus Rood basin have caused the destruction of forestlands; most of these fires occurred in the summer and were human-induced.

Forest fire inventory map
The Department of Natural Resources of Mazandaran Province collected forest fire records in the Chalus Rood watershed from 2006 to 2018 through field surveys (https://frw.ir/). We used a total of 108 forest fire data points in the study domain during 2006-2018. These points represent the centers of burned areas. On the basis of the available forest fire data in the study domain, the period 2006-2018 was used for mapping forest fire susceptibility. To generate non-fire points, a circular buffer zone with a diameter of 400 m was created around each forest fire location using the Buffer module in ArcGIS 10.5. Outside these buffer zones, 108 non-fire points were randomly selected. A similar approach was employed by Yaakob et al. (2011), Tomar et al. (2021), and Mabdeh et al. (2022) to generate non-fire points.
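The buffer-and-sample procedure can be illustrated with a minimal Python sketch. This is not the ArcGIS workflow used in the study: it assumes projected coordinates in metres, a rectangular sampling extent instead of the watershed boundary, and made-up fire locations. The function name `sample_non_fire_points` and its parameters are hypothetical.

```python
import math
import random

def sample_non_fire_points(fire_points, extent, n_points, buffer_radius=200.0, seed=0):
    """Draw random points inside `extent` that lie outside every fire buffer.

    fire_points   : list of (x, y) tuples in projected metres (assumption)
    extent        : (xmin, ymin, xmax, ymax) bounding box (assumed rectangular)
    buffer_radius : 200 m, i.e. half of the 400 m buffer diameter in the text
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = extent
    points = []
    while len(points) < n_points:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Rejection sampling: keep a candidate only if it is outside all buffers.
        if all(math.hypot(x - fx, y - fy) > buffer_radius for fx, fy in fire_points):
            points.append((x, y))
    return points

# Made-up fire locations, for illustration only.
fires = [(1000.0, 1000.0), (2500.0, 1800.0), (4000.0, 3200.0)]
non_fires = sample_non_fire_points(fires, (0.0, 0.0, 5000.0, 5000.0), n_points=3)
```

In a real application the candidates would also be clipped to the watershed polygon, and the number of non-fire points would match the number of fire points (108 in this study).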

Dataset
The selection of factors influencing forest fire mapping was based on a review of previous studies (Eastaugh & Hasenauer 2014; Hong et al. 2018; Jaafari et al. 2019) and data availability. In total, 14 variables were chosen in this research: elevation, aspect, curvature, slope, distance from river, distance from residential area, distance from road, land use, normalized difference moisture index (NDMI), land surface temperature (LST), soil-adjusted vegetation index (SAVI), normalized difference vegetation index (NDVI), rainfall, and temperature. The maps depicting these factors and the related thematic layers were integrated using ArcGIS 10.5, SAGA GIS, and ENVI 5.1 (Table 1 and Figure 2).
Topographical variables play a significant role in the initiation and growth of forest fires (Tariq et al. 2021). Elevation directly affects temperature, moisture, and wind patterns, and thus it is a crucial factor in fire propagation (Jaiswal et al. 2002). Although fires tend to be milder at higher altitudes because of increased rainfall (Adab et al. 2013), slope can significantly impact the fire spread rate. Fires move more rapidly uphill than downhill (Pourtaghi et al. 2015), and slope has a higher influence on fire spread than elevation. The direction of the slope (aspect) also affects forest fires as it controls the amount of solar energy received in the area. In the Northern Hemisphere, south-facing slopes receive more solar radiation and thus have a higher temperature, less humidity, and drier vegetation, which enhance the likelihood of ignition (Prasad et al. 2006; Setiawan et al. 2004). Curvature, or the degree of bending of a terrain surface, can have a significant impact on forest fires. A positively curved (convex) surface can trap and concentrate heat, wind, and fire, leading to increased fire intensity and spread. Conversely, a negatively curved (concave) surface can interrupt the flow of wind and fire, helping to slow down and contain the fire (Pourghasemi, 2016).
The Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) with a spatial resolution of 30 m was downloaded from the USGS EarthExplorer website (https://earthexplorer.usgs.gov/) for our case study. The elevation, slope, curvature, and aspect maps were produced from the DEM using ArcGIS 10.5. Land use can significantly influence the frequency and severity of forest fires. Human activities such as agriculture and urbanization practices can increase the likelihood of fire ignition and growth (Kant Sharma et al. 2012; Nuthammachot and Stratoulias, 2021). The land use map was generated from a Landsat 8 OLI satellite image captured on 21 July 2019 using the maximum likelihood method in ENVI 5.1.
Vegetation indices play an important role in understanding and predicting forest fires. These indices provide information about the health, density, and moisture content of vegetation. The Normalized Difference Vegetation Index (NDVI) is widely used to monitor the status of plants, and is given by (Cheret and Denux, 2011; Digavinti and Manikiam, 2021):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \qquad (1)$$

where NIR and Red represent the near-infrared and red bands, respectively. NDVI varies from −1 to 1. High NDVI values indicate healthy, dense, and moist vegetation, while low values indicate unhealthy, sparse, and dry vegetation, which is more susceptible to ignition and spread of fire (Chuvieco et al. 2004; Abdo et al. 2022). The Normalized Difference Moisture Index (NDMI) is another vegetation index employed to evaluate the moisture of vegetation (Taloor et al. 2021; Abdo et al. 2022). It is calculated by:

$$\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}} \qquad (2)$$

where SWIR is the short-wave infrared band. Positive values of NDMI indicate high moisture content in vegetation, while negative values indicate low moisture content and dry vegetation (Taloor et al. 2021; Abdo et al. 2022). The Soil-Adjusted Vegetation Index (SAVI) is a vegetation index that is adjusted for the effect of soil background on the reflectance measurement. It can be calculated by:

$$\mathrm{SAVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red} + L}\,(1 + L) \qquad (3)$$

where L is a correction factor (between 0 and 1) that adjusts for the soil background influence. A value of L = 0.5 is commonly used (Vargas-Cuentas and Roman-Gonzalez, 2021). The NDVI, NDMI, and SAVI were derived from the same Landsat 8 OLI satellite imagery using ENVI 5.1 (Table 1).
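As a small worked example, the three indices can be computed per pixel from band reflectances using their standard definitions: NDVI = (NIR − Red)/(NIR + Red), NDMI = (NIR − SWIR)/(NIR + SWIR), and SAVI = (NIR − Red)(1 + L)/(NIR + Red + L). The Python sketch below is illustrative only (the study used ENVI 5.1); the reflectance values are hypothetical, and the denominators are assumed to be non-zero.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def ndmi(nir, swir):
    """Normalized Difference Moisture Index."""
    return (nir - swir) / (nir + swir)

def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index with soil correction factor L."""
    return (nir - red) / (nir + red + L) * (1.0 + L)

# Hypothetical surface reflectances for a single vegetated pixel.
nir, red, swir = 0.45, 0.08, 0.20
print(round(ndvi(nir, red), 3))   # 0.698
print(round(ndmi(nir, swir), 3))  # 0.385
print(round(savi(nir, red), 3))   # 0.539
```

For Landsat 8 OLI, the NIR, red, and first SWIR bands correspond to bands 5, 4, and 6, respectively.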
Land surface temperature (LST) has a significant effect on the occurrence and spread of forest fires (Maffei et al. 2018). Higher LSTs tend to dry out fuels and reduce the moisture content of vegetation, making them more vulnerable to ignition. LST can also impact atmospheric stability and wind patterns, further affecting fire behaviour. Thus, understanding the relationship between LST and forest fires is imperative for improving fire management and mitigating the risks of fires (Ahmed et al. 2020). The LST was computed from bands 10 and 11 of the Landsat satellite images using the ENVI software (Ahmed et al. 2020). This algorithm estimates LST from radiance measurements in the thermal infrared (TIR) band of Landsat 8 as follows:

$$\mathrm{LST} = \frac{K_2}{\ln\left(\frac{K_1}{T_b} + 1\right)} \qquad (4)$$

where $T_b$ is the radiance in the TIR band, and $K_1$ and $K_2$ are temperature-emissivity separation (TES) constants (Ahmed et al. 2020; Junaidi et al. 2021).
The distance from a river has a crucial effect on the occurrence and intensity of forest fires. Forests located near rivers tend to have a higher humidity, which reduces the risk of fire ignition and spread. On the other hand, forests far from rivers are more likely to experience drought conditions, making them more susceptible to fire (Nuthammachot and Stratoulias, 2021). The distance from roads and residential areas highly influences the probability and extent of forest fires. The closeness to humans and infrastructure can enhance the risk of ignition due to human activities. At the same time, forests near populated areas may receive inadequate fire management due to limited access, leading to a higher risk of fire spread (Mohajane et al. 2021; Tariq et al. 2021). ArcGIS 10.5 was used to obtain the distance from road, distance from river, and distance from residential area (Bera et al. 2022b).
Annual rainfall is a vital factor in determining the prevalence and ferocity of forest fires. Regions with low rainfall are more susceptible to fires due to the dry and flammable state of vegetation (Akıncı and Akıncı, 2023). Temperature has an important role in the initiation of forest fires. Warmer temperatures lead to drier conditions and therefore make vegetation more susceptible to ignition, ultimately increasing the frequency and intensity of forest fires. There are eight weather stations in the Chalus Rood watershed. The station observations were spatially interpolated and gridded to generate annual rainfall and temperature maps over the study domain.

Methodology
The proposed approach for modeling forest fires comprises several distinct steps, as depicted in Figure 3. The methodological flowchart outlines the process from start to finish, including the crucial steps of data preparation, data exploration, data splitting, application of machine learning techniques, evaluation of model performance, and, finally, generation of forest fire susceptibility maps.

Multicollinearity
Multicollinearity is a problem that arises when predictor variables in machine learning (ML) models are highly correlated with each other. This can result in unreliable coefficients, reduced interpretability of the model, and decreased prediction accuracy (Alin, 2010). To mitigate these issues, it is essential to assess the strength of multicollinearity among the input variables of ML models. One of the most widely used metrics for assessing multicollinearity is the variance inflation factor (VIF). The VIF quantifies the increase in the variance of an estimated regression coefficient due to collinearity among the predictor variables (Paul, 2006). The VIF is calculated as:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \qquad (5)$$

where $R_i^2$ is the coefficient of determination of the regression of the $i$th predictor variable on all the other predictor variables. Because $R_i^2$ lies between 0 and 1, the VIF cannot be smaller than 1. A VIF of 1 implies no multicollinearity, while VIF values between 1 and 5 indicate mild multicollinearity. Moderate multicollinearity occurs when the VIF is between 5 and 10, and severe multicollinearity when the VIF is larger than 10. By conducting a VIF analysis, researchers can gain insights into the extent of multicollinearity among input variables and make informed decisions about which predictor variables to include or exclude from the model (Paul, 2006; Alin, 2010).
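The VIF computation can be sketched in plain Python. The sketch relies on a standard identity: the value 1/(1 − R_i²) for predictor i equals the i-th diagonal element of the inverse of the predictors' correlation matrix, so no separate regressions are needed. This is not the code used in the study; the function names and the sample values are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]

# Hypothetical predictor samples (one list per variable).
elevation = [120.0, 450.0, 900.0, 1500.0, 2100.0, 2800.0]
temperature = [17.5, 16.0, 13.2, 10.1, 7.4, 3.0]   # strongly tied to elevation
slope = [5.0, 22.0, 9.0, 30.0, 12.0, 18.0]
vif_values = vif([elevation, temperature, slope])
```

In this made-up sample, elevation and temperature are almost collinear, so their VIFs come out far above the threshold of 10, whereas slope stays close to 1.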

Modeling and ensemble approaches
ML-based models use artificial intelligence (AI) techniques that successfully equip machines to learn from experience without being explicitly programmed (Ardabili et al. 2019). Ensemble learning has been developed in ML to advance the robustness and accuracy of a model. Ensemble learning algorithms, such as 'boosting' and 'bagging', simplify solutions to key computational issues (Zhang and Ma, 2012). In this study, the GLM and four ensemble approaches (PLS, Boosting, Bagging, and Bayesian) were employed to model the forest fire risk. For this purpose, the R software with the caret, raster, brms, mboost, and plsRglm packages was employed (Kuhn 2012, Hothorn et al. 2013, Bertrand et al. 2014, Hijmans et al. 2015, Buerkner and Buerkner 2016).

Generalized linear model (GLM)
GLMs are flexible generalizations of ordinary linear regression that are frequently used to model binary or count data. They allow the response variable to have an error distribution other than the normal distribution (McCulloch and Searle 2014). GLMs relate the response variable to a linear model through a link function and allow the variance of each observation to be a function of its predicted value (Kamata, 2001). Each outcome (Y) is assumed to be generated from a specific distribution. The mean ($\mu$) of the distribution depends on the independent variables X according to Equation (6) (Wolfinger and O'Connell, 1993; Stroup, 2012):

$$E(Y) = \mu = g^{-1}(X\beta) \qquad (6)$$

where $X\beta$ is the linear predictor, a linear combination of the unknown parameters $\beta$, $E(Y)$ is the expected value of Y, and g is the link function. Accordingly, the variance (V) is usually a function of the mean, as shown in Equation (7) (Wolfinger and O'Connell, 1993; Stroup, 2012):

$$\mathrm{Var}(Y) = V(\mu) = V\left(g^{-1}(X\beta)\right) \qquad (7)$$

It is convenient if V follows from an exponential family distribution, but the variance may simply be a function of the predicted value. The unknown parameters $\beta$ can be estimated by maximum likelihood, maximum quasi-likelihood, or Bayesian techniques (Stroup, 2012).
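For a binary fire/non-fire response, the GLM is a binomial model with a logit link, so that the mean is the inverse link applied to the linear predictor. The Python sketch below fits such a model by maximum likelihood using Newton-Raphson iterations (equivalent to iteratively reweighted least squares). It is a minimal illustration, not the R code used in the study; the function names and the eight data points are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic_glm(X, y, n_iter=25):
    """Binomial GLM with logit link fitted by Newton-Raphson.

    X: list of predictor rows (without intercept); y: list of 0/1 labels.
    Returns [intercept, beta_1, ..., beta_p].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Mean response: inverse logit of the linear predictor.
        mu = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        # Score vector and observed information of the binomial log-likelihood.
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows)) for j in range(p)]
        hess = [[sum(mi * (1.0 - mi) * r[j] * r[k] for mi, r in zip(mu, rows))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def predict_proba(beta, row):
    """Predicted fire probability for one predictor row."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical data: rescaled elevation versus fire (1) / non-fire (0).
X_demo = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
y_demo = [1, 1, 0, 1, 0, 1, 0, 0]
beta_demo = fit_logistic_glm(X_demo, y_demo)
```

The classes in the sample overlap, which keeps the maximum likelihood estimate finite; with perfectly separable data the Newton iterations would not converge.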

Partial least-squares PLSGLM
PLS-GLM employs a set of orthogonal latent factors (LFs) to model the response variables. The LFs are extracted from a large number of independent variables using PLS (Marx, 1996; Ding and Gentleman 2005). They are generated by projecting the dataset onto a lower-dimensional subspace while maximizing the covariance between the LFs and the response. This is done in such a way that the LFs describe most of the covariance between Y ($N \times M$) and X ($N \times P$) (Park et al. 2002). These LFs (usually called X-scores) are few in number ($A < P$). They are often denoted by T, and because the LFs must be orthogonal, $T^{T} T = I_A$. Several algorithms have been employed for computing LFs, such as kernel PLS, statistical modifications to PLS, and nonlinear iterative PLS (Ding and Gentleman 2005).
Different algorithms, such as kernel PLS, orthogonal partial least squares (OPLS), and sparse partial least squares (SPLS), result in distinct PLS decompositions. In all cases, T is an $N \times A$ matrix of X-scores, found as $T = X_0 R$. PLS employs the matrix T as the independent variable to model Y, as shown in Equation (8) (Marx, 1996), where A is the number of PLS components and $B_{PLS}$ is the estimated coefficient matrix of the PLS model. This model can be extended to the GLM, as in Equation (9) (Marx, 1996), where $B_{PLS\text{-}GLM} = R^{*} q^{*}$ is the PLS-GLM coefficient vector, and $R^{*}$ and $q^{*}$ ($A \times 1$) are calculated from the PLS-GLM.

Boosted GLM
Variable selection can be considered one of the main tasks in the GLM. The first step can be performed without any random effects. In the boosted GLM, variables can be added only to the last selected model, with the disadvantage that the result may not be optimal. The boosted GLM is a modern method that selects which parameters to include.
Boosting-based techniques are often used in the ML community to increase classification accuracy. In the boosted GLM, random effects are fitted simultaneously with the model parameters (mstop and prune) (Bou-Hamad et al. 2017). The method starts with a minimal number of variables and gradually increases them until the stopping criterion is met. This component-wise approach starts with all parameters set to zero and updates them iteratively, leading to an optimized model.
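The component-wise idea can be made concrete with a simplified Python sketch of gradient boosting for a logistic model. All coefficients start at zero; in each iteration, the negative gradient of the binomial loss is regressed on every single predictor by least squares, and only the best-fitting component is updated by a small step. This is an illustration under simplifying assumptions, not the mboost implementation: the step length `nu`, the treatment of the intercept as one more component, and the toy data are all choices made here, and `mstop` is used only in the sense of a fixed number of iterations.

```python
import math

def componentwise_boost(X, y, mstop=100, nu=0.1):
    """Component-wise gradient boosting for a logistic model.

    X: list of predictor rows (assumed centred/scaled); y: list of 0/1 labels.
    Returns [intercept, beta_1, ..., beta_p]; components that are never
    selected keep a coefficient of zero.
    """
    rows = [[1.0] + list(r) for r in X]          # column 0 acts as the intercept
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    beta = [0.0] * p
    for _ in range(mstop):
        # Negative gradient of the binomial log-loss with respect to eta.
        u = [yi - 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
             for yi, r in zip(y, rows)]
        best_j, best_slope, best_rss = 0, 0.0, float("inf")
        for j, col in enumerate(cols):
            # Least-squares fit of the gradient on a single component.
            slope = sum(v * ui for v, ui in zip(col, u)) / sum(v * v for v in col)
            rss = sum((ui - slope * v) ** 2 for v, ui in zip(col, u))
            if rss < best_rss:
                best_j, best_slope, best_rss = j, slope, rss
        beta[best_j] += nu * best_slope          # update only the selected component
    return beta

# Toy data: column 0 is a rescaled elevation, column 1 an uninformative predictor.
X_toy = [[-1.5, 0.5], [-1.0, -0.5], [-0.5, -0.5], [0.2, 0.5],
         [-0.2, 0.5], [0.5, -0.5], [1.0, -0.5], [1.5, 0.5]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
beta_toy = componentwise_boost(X_toy, y_toy)
```

On this toy data the algorithm repeatedly selects the elevation-like predictor and leaves the uninformative one at essentially zero, which is how component-wise boosting performs variable selection when it is stopped early.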

Bagging GLM
Bagging enhances model accuracy by aggregating several predictors (Breiman, 1996; Sutton, 2005). This technique works by training a base-learning algorithm repeatedly on different bootstrap training sets (Breiman, 1996). Bagging improves the stability and predictive robustness of regression and classification models. In this study, the bagging approach was selected to weigh the presence data (fire points) and determine their importance over the absence data (non-fire points). First, the dataset was separated into presence and absence data. Then, equal numbers of presence and absence data were selected by bootstrapping. Finally, the GLM with a binomial error distribution was fitted to the combined data. The combination of bagging and the GLM provides a valuable tool for modeling complex systems and predicting forest fire risk, taking into account the influence of various factors and providing a comprehensive understanding of the underlying patterns in the data.
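The balanced bootstrap described above can be sketched in Python as follows. Each bootstrap replicate draws the same number of presence and absence points with replacement, a logistic GLM is fitted to the replicate, and the predicted probabilities of all replicates are averaged. This is one possible reading of the procedure, not the authors' R code: the gradient-ascent fitting routine, the number of replicates, and the data values are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, n_iter=500):
    """Binomial GLM (logit link) fitted by plain gradient ascent on the log-likelihood."""
    p = len(rows[0]) + 1                      # +1 for the intercept
    n = len(rows)
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for r, y in zip(rows, labels):
            xr = [1.0] + list(r)
            err = y - sigmoid(sum(b * v for b, v in zip(beta, xr)))
            for j in range(p):
                grad[j] += err * xr[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, row):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))

def bagged_glm(rows, labels, n_models=25, seed=0):
    """Fit one GLM per balanced bootstrap replicate and return all coefficient vectors."""
    rng = random.Random(seed)
    presence = [r for r, y in zip(rows, labels) if y == 1]
    absence = [r for r, y in zip(rows, labels) if y == 0]
    k = min(len(presence), len(absence))
    models = []
    for _ in range(n_models):
        boot_p = [rng.choice(presence) for _ in range(k)]   # sampled with replacement
        boot_a = [rng.choice(absence) for _ in range(k)]
        models.append(fit_logistic(boot_p + boot_a, [1] * k + [0] * k))
    return models

def bagged_predict(models, row):
    """Average the predicted fire probability over all bootstrap models."""
    return sum(predict(b, row) for b in models) / len(models)

# Hypothetical data: rescaled elevation versus fire (1) / non-fire (0).
rows_demo = [[0.1], [0.2], [0.3], [0.5], [0.4], [0.6], [0.8], [0.9]]
labels_demo = [1, 1, 1, 1, 0, 0, 0, 0]
models_demo = bagged_glm(rows_demo, labels_demo)
p_low = bagged_predict(models_demo, [0.1])
p_high = bagged_predict(models_demo, [0.9])
```

Averaging over replicates is what reduces the variance of the single GLM; the spread of the individual model predictions also gives a rough indication of how stable the estimate is at a given location.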

Bayesian GLM
Bayesian inference is a probabilistic approach built on the concept that each quantity has a probability distribution that can be updated as new data become available. Bayesian networks (also called belief networks) are probabilistic models that represent knowledge in an uncertain domain. Each node in the network represents a random variable, and the branches indicate possible dependencies between them (Tran et al. 2020). These conditional dependencies are usually assessed using probabilistic and statistical approaches. Bayesian networks merge statistics, computer science, principles of graph theory, and probability theory. They efficiently represent and calculate probability distributions over a set of random variables (Band et al. 2020). Bayesian methods have been used in recent years for model building and for the determination of optimal parameters (Ahmadi et al. 2020). In this study, the Bayesian method was combined with cumulative linear regression to estimate the risk of forest fires.

Validation and accuracy assessment
The receiver operating characteristic (ROC) curve is often used to evaluate binary classification models. The ROC curve plots the relationship between the true positive rate (sensitivity) and the false positive rate (1 − specificity) at various classification thresholds. The area under the ROC curve (AUC) denotes the performance of a classifier. It ranges from 0 to 1, with values closer to unity indicating better model performance. The AUC is a useful metric as it summarizes the performance of the classifier across all thresholds and provides an overall measure of its discriminative ability (Peres and Cancelliere 2014).
Sensitivity, specificity, PPV, and NPV are important metrics for evaluating binary classification models. Sensitivity (also known as the true positive rate) measures the proportion of positive cases that are correctly identified as positive (Equation 10). Specificity (also known as the true negative rate) measures the proportion of negative cases that are correctly identified as negative (Equation 11). The positive predictive value (PPV) and negative predictive value (NPV) provide a measure of the precision of the classifier's positive and negative predictions, respectively. PPV is the proportion of positive predictions that are actually positive (Equation 12), while NPV is the proportion of negative predictions that are actually negative (Equation 13):

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (10)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (11)$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (12)$$

$$\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (13)$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. These metrics are essential for understanding the strengths and limitations of a classification model and provide valuable insights into its performance (Peres and Cancelliere 2014; Yonelinas, 1994).
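The validation metrics can be computed with a few lines of Python. The sketch below derives sensitivity, specificity, PPV, and NPV from the confusion counts at a chosen threshold, and computes the AUC through its rank interpretation: the probability that a randomly chosen fire point receives a higher predicted probability than a randomly chosen non-fire point, with ties counted as one half. The threshold of 0.5 and the sample labels and probabilities are assumptions for illustration, not values from the study.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, PPV, and NPV at a given probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels (1 = fire) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(classification_metrics(y_true, y_prob))  # all four metrics equal 0.75
print(auc(y_true, y_prob))                     # 0.875
```

Unlike the four threshold-dependent metrics, the AUC does not change when a different cut-off is chosen, which is why it is used here to rank the five models.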

Results and discussion
Table 2 presents the VIF and tolerance values for the 14 independent variables utilized for modeling forest fires.The results show that all 14 variables have a VIF below 5. NDVI has the highest VIF value of 4.23 and Aspect has the lowest VIF value of 1.10.
For both the training and testing phases, the sensitivity, specificity, NPV, PPV, and area under the curve (AUC) metrics of the five models (namely GLM, PLS-GLM, Boosted-GLM, Bagging-GLM, and Bayesian-GLM) are shown in Table 3. The AUC values of these models during the testing phase are presented in Figure 4. In the testing phase, the AUC values of the GLM, PLS-GLM, Boosted-GLM, Bagging-GLM, and Bayesian-GLM were 0.79, 0.75, 0.81, 0.84, and 0.85, respectively.
The evaluation metrics indicated that the Boosting, Bagging, and Bayesian ensemble methods improved the efficiency of the GLM in predicting forest fires. Similarly, other studies (e.g., Xie and Peng 2019; Naghibi et al. 2020; Sanjaya and Puspitasari 2020) showed that ensemble models tend to be more accurate than individual models in forest fire hazard modeling. This finding is consistent with previous studies that showed the superiority of ensemble models in predicting floods (Hosseini et al. 2020), landslides and debris flows (Pal et al. 2022), erosion (Band et al. 2020), and multi-hazards (Saha et al. 2021).
The Bayesian-GLM had the highest efficiency compared to the other three ensemble models. The Bayesian-GLM has several advantages in modeling natural hazards. First, it incorporates prior knowledge and uncertainties into the model parameters, resulting in predictions that are both accurate and reliable. This is especially crucial in the context of natural hazards, where data are often limited and uncertain. By aggregating the outputs of ensemble models, the Bayesian-GLM can produce predictions that are more accurate and reliable. This has a significant impact on the management and mitigation of natural hazards. Additionally, the Bayesian-GLM allows for the assessment of model uncertainty, which can be particularly valuable for decision-making in high-stakes and high-uncertainty situations (Zhao et al. 2006; de los Campos and Pérez-Rodríguez 2014; Hosseini et al. 2020).
Using the Jenks natural breaks method in ArcGIS 10.5 (Jenks, 1967; McMaster, 1997), the forest fire susceptibility map of the Chalus Rood watershed was classified into very high, high, moderate, low, and very low classes. Figure 5 displays the classified forest fire susceptibility maps based on the GLM and the four hybrid models (PLS-GLM, Boosted-GLM, Bagging-GLM, and Bayesian-GLM). The spatial resolution of the generated fire susceptibility maps is 30 m. Table 4 lists the area (km²) and percentage of each forest fire susceptibility class. The importance of the input variables was obtained based on the Gini index (Sandri and Zuccolotto 2008), a commonly used index for determining the importance of inputs in machine learning models. This index measures the reduction in impurity (incorrect predictions) in a binary split of the data based on a particular variable. The Gini importance score is calculated as the weighted average of the decrease in impurity for each split, where the weight is proportional to the number of samples in the node. This score can be used to rank input variables based on their contribution to the outputs (Sandri and Zuccolotto, 2008; Pourghasemi et al. 2020). Table 5 displays the importance values for each variable. The highest importance value is for 'Altitude' with a score of 100 and the lowest is for 'Aspect' with a score of 3.11. Variables with ... 97, 69.10, 77.26, 42.39, and 60.65, respectively. The partial dependence plots of forest fires for these four variables are depicted in Figure 6.
This study investigated the impact of various factors on fire risk in the Chalus Rood basin. We found that elevation, rainfall, distance from the road, distance from the river, and temperature had the most significant influence on fire risk. The results showed that lowlands, which are often near residential areas, are more susceptible to forest fire hazards than high-altitude regions. This finding is in agreement with previous studies (e.g., Flannigan and Haar, 1986; Milanović et al. 2020) that identified elevation as an important variable in determining fire hazards.
This study also indicated that the risk of forest fires increases with decreasing rainfall and increasing temperature. These atmospheric factors have a major impact on the region's humidity and exert a dominant control on the ignition and spread of fires. Moist conditions from high amounts of rainfall reduce the risk of fires, while low rainfall and drought conditions increase the risk by accumulating dry fuels, which can cause fires to spread rapidly (Abram et al. 2021; Dhar et al. 2023). Temperature also plays a crucial role in forest fire susceptibility. Higher temperatures can increase the evaporation rate of moisture in the fuels, leading to an increased risk of fire. Additionally, high temperatures can increase atmospheric instability, which leads to the formation of thunderstorms, and lightning strikes can ignite fires in the forests (de Santana et al. 2021; Akıncı and Akıncı, 2023). Eskandari et al. (2020) explored the relationship between climatic factors and the incidence of fires in ...
Moreover, our study found that the distance from roads is another crucial factor affecting the risk of forest fires in the Chalus Rood basin. This variable reflects human impacts on fire occurrence and plays an important role in spreading fires in the surrounding region. The road map of the study area highlights that areas close to roads are at a high risk of forest fire. The roads and residential areas in the Chalus Rood watershed contribute to the occurrence of fires, as tourists and/or residents can cause fires. These results match those of Jaiswal et al. (2002), Erten et al. (2004), Bui et al. (2017), and Pandey and Ghosh (2018), who showed that proximity to roads is a major driver of human-caused forest fires.

Conclusion
In this study, fire risk was investigated by combining four ensemble algorithms with the GLM in the Chalus Rood basin in Iran, and fire hazard models were developed using these algorithms. All ensemble algorithms, except the PLS algorithm, increased the efficiency of the GLM, and the Bayesian algorithm was the most efficient at improving its performance. The overall classification accuracy of this study was affected by the features of the study area, including the topographic factors that influence fire occurrence, the accuracy and type of the independent variable layers used to model fire risk, the accuracy of the recorded points and extents of past fires, and the type of prediction algorithms used. Because most fires in the Chalus Rood basin are human-induced, including additional human-related variables could further increase the accuracy and precision of the hazard maps. The high accuracy of the fire prediction models in this study makes them useful for creating fire probability maps in other northern regions of Iran. These maps can help manage the risk of forest fires, prevent them, and quickly extinguish any that occur. Based on these maps, facilities for dealing with fires in high-risk areas can be established, such as watchtowers and water tanks in inaccessible areas, wireless sensor networks for early detection, and rescue bases before the fire season. To further prevent and reduce fire damage, it is important to raise awareness, educate the local population, and establish fire pits and barriers. One of the limitations of this study is the unavailability of a few influential climate variables such as wind speed, wind direction, and humidity. Thus, we could not use them in mapping forest fire susceptibility. It is recommended that future studies use these climate variables in forest fire susceptibility mapping. Additionally, it is suggested that other forest fire mapping approaches (e.g., deep learning techniques and hybrid machine learning models) be used and their results compared with the findings of this study.

Figure 1. Location of the study area: a) Iran, b) Mazandaran Province, and c) Chalus Rood watershed.

Figure 4. ROC curve analysis for forest fire susceptibility models using the validation dataset. The vertical axis is the true positive fraction (TPF) and the horizontal axis is the false positive fraction (FPF).

Figure 6. Partial dependence plots showing the influence of four explanatory variables on forest fire susceptibility: a) altitude and rainfall, b) distance from road and distance from river.

Table 1. Data type, source, and resolution of the independent variables.

Table 2. Result of the multi-collinearity analysis.

According to Table 4, the GLM output assigns about 6.4% of the area (104.2 km²) to the very low class, 16% (261.14 km²) to the low class, 28.22% (460.35 km²) to the moderate class, 31% (505.56 km²) to the high class, and 18.4% (300.26 km²) to the very high class. By comparison, the Bayesian-GLM assigns about 6.49% (105.82 km²) to the very low class, 16.8% (273.87 km²) to the low class, 28.98% (472.78 km²) to the moderate class, 29.48% (480.89 km²) to the high class, and 18.27% (298.15 km²) to the very high class.

Table 3. Predictive capability of forest fire models using training and testing datasets.

Table 4. Forest fire susceptibility class areas.

Table 5. Results of determination of importance value.