Gene expression programming and ensemble methods for bushfire susceptibility mapping: a case study of Victoria, Australia

Abstract Bushfire susceptibility mapping helps the government authorities predict and provide the required disaster management plans to reduce the adverse impacts from bushfires. In this paper, we investigated Gene Expression Programming (GEP) and ensemble methods to create bushfire susceptibility maps for Victoria, Australia, as a case study. Bushfire susceptibility maps indicate that the eastern part of Victoria where forests are predominant has the highest probability of bushfire. Western part of Victoria which is covered by cropland, shrubland and grassland has the lowest bushfire probability. Two ensemble methods, namely an ensemble of GEP and Frequency Ratio (GEPFR) and an ensemble of Logistic Regression and Frequency Ratio (LRFR), were proposed and compared with stand-alone GEP and stand-alone Frequency Ratio (FR) methods. The proposed methods were evaluated by Area Under Curve (AUC). AUCs of GEPFR, LRFR, GEP and FR are 0.860, 0.852, 0.850, and 0.840, respectively. It can be concluded that GEPFR outperforms the other three methods, and the ensemble methods outperform the stand-alone methods. GEPFR, LRFR and GEP produced the bushfire probability with an accuracy in the range of 90.79%−92.27%, and therefore they are equally useful for policy makers and managers to have better natural hazard management plans. Graphical Abstract


Introduction
Bushfires harm human life and have destructive consequences in the environment, especially in the populated areas (Tonini et al. 2020). The number and severity of bushfires have been increased in many parts of the world during the last decade due to the global warming, industrialization and human activities (Ghorbanzadeh et al. 2018;Zhang et al. 2019;Gholamnia et al. 2020). In one bushfire season alone, from late 2019 to early 2020, more than 12.6 million ha across Australia were burned, and as a result, 434 million tons of carbon dioxide were emitted (Werner and Lyons 2020). The bushfire season in 2019-2020 resulted in the loss of 33 people and over one billion animals across Australia (Werner and Lyons 2020). Southeastern Australia was the most affected area. Twenty four casualties with around 5 million ha of the burned areas belong to Southeastern Australia (Green et al. 2020), while over one million ha was burned in Victoria (Bunch 2020).
A bushfire susceptibility model could be a valuable tool to control and manage the future bushfires by predicting the vulnerable areas. Important factors in bushfire occurrence can be divided into four main categories including topography, climate, fuel loads and human activities (Hong et al. 2019;Jaafari and Pourghasemi 2019). Topographic factors such as elevation, slope, and aspect play an important role in bushfires (Tolhurst et al. 2008;O'Connor et al. 2016;Clarke et al. 2019). Many scholars have used a variety of climatic conditioning factors such as temperature (Tehrany et al. 2019;Zhang et al. 2019;Pham et al. 2020), precipitation (Jaafari et al. 2017;Zhang et al. 2019), wind speed (Jaafari et al. 2017;Tehrany et al. 2019), and humidity (Tehrany et al. 2019). Fuel loads are another conditioning factor that can be grouped into two subcategories, namely Normalized Difference Vegetation Index (NDVI) (Tehrany et al. 2019;Zhang et al. 2019), and land use/land cover (De Vasconcelos et al. 2001;Jaafari et al. 2017;Pham et al. 2020;Shang et al. 2020). The previous research on bushfires also used different human activity factors such as distance to roads (De Vasconcelos et al. 2001;Jaafari et al. 2017;Valdez et al. 2017;Jaafari et al. 2018;Pham et al. 2020), distance to populated areas (Jaafari et al. 2017;You et al. 2017;Hong et al. 2019) and distance to rivers (Valdez et al. 2017;Jaafari et al. 2017;Jaafari et al. 2018;Jaafari et al. 2019c;Zhang et al. 2019). The abovementioned four factors are closely related to the bushfire occurrence. Topography is one of the main drivers of bushfires because slope influences the spread rate of bushfires (Leonard and Blanchi 2012). Fire can move faster upslope and slower downslope (Leonard and Blanchi 2012). Topography can also affect the distribution of vegetation, climate elements (e.g. wind speeds), temperature, precipitation, and solar radiation (Tien Bui et al. 2016;Leuenberger et al. 2018;Eskandari et al. 2020a). Meteorological parameters such as temperature were reported as a significant factor in bushfire prediction (Ghorbanzadeh et al. 2019b). Temperature has a direct relationship with fire occurrence (Ghorbanzadeh et al. 2019b). The chance of bushfire is higher in places and times with a higher temperature (Nicholls and Lucas 2007). NDVI plays an important role in bushfire modelling (Tien Bui et al. 2016;Zhang et al. 2016). Different types of vegetation were considered as a proxy of fuel, and NDVI was used to explain the vegetation status (Tien Bui et al. 2016). Low NDVI indicates water stress, hence increases the risk of fire (Ghorbanzadeh et al. 2019b). Fire can quickly spread in highly connected forest patches, whereas patches of irregular shapes reduce the spread rate of fire (Gralewicz et al. 2012). Bushfire occurrence is more probable in the areas near human activities (Zhang et al. 2015). As a result, distance to roads and human facilities were reported to be negatively correlated with bushfire occurrence (Zhang et al. 2015).
Different statistical approaches have been used for bushfire susceptibility mapping including Frequency Ratio (FR) (Dorji and Ongsomwang 2017;Hong et al. 2017), Evidential Belief Function (EBF) (Pourghasemi 2016), and Weight Of Evidence (WOE) (Jaafari et al. 2017;Hong et al. 2019). Significant advances have been made in bushfire monitoring by remote-sensing technologies. The improvement in the quality of weather data and a sufficient amount of remotely sensed data has created an opportunity for scientists to use a data-driven approach to bushfire modelling (Jain et al. 2020). As a result, Machine Learning (ML) techniques have been used increasingly in bushfire management (Jain et al. 2020). ML includes a class of algorithms that produce susceptibility maps based on input data without the need for prior knowledge. ML learns from experience, and once the model is fitted, it will generate bushfire susceptibility maps for the whole study area (Leuenberger et al. 2018;Tonini et al. 2020).
Gene expression programming (GEP) is a branch of artificial intelligence methods that can automatically find the explicit function between dependent and independent variables without any assumption on the function form (Emamgolizadeh et al. 2015;Hoang and Tien Bui 2018). GEP can find the nonlinear relationship between a response variable and conditioning factors (Emamgolizadeh et al. 2015). GEP is proven to be an effective method for predicting natural hazards such as landslides and other engineering fields (Zakaria et al. 2010;Kayadelen 2011;Emamgolizadeh et al. 2015). The GEP algorithm is proven to be a promising tool for the prediction of shallow landslides in Son La Province, Vietnam (Hoang and Tien Bui 2018). GEP was also used in other engineering fields such as high-performance concrete (HPC) to formulate compressive strength of HPC mixes (Mousavi et al. 2012). However, to the best of our knowledge, GEP has not been explored in bushfire studies. Therefore, the overarching aim of this paper is to investigate the potential of GEP for bushfire susceptibility mapping. In this paper, we proposed four methods, namely an ensemble of GEP and FR (GEPFR), a single GEP, an ensemble of LR and FR (LRFR), and a single FR to create bushfire susceptibility maps for Victoria, Australia, as a case study. For this purpose, we collected necessary data for conditioning factors in Victoria. Then, we had generated the required conditioning factor maps that were later used for generating the bushfire susceptibility maps, as explained in the materials and method section. Then, the generated models were validated and compared in the result and discussion section. The methods used in this research are generic and scalable, hence they can be applied to other bushfire-prone countries.

Materials and methods
This study aims to investigate the bushfire susceptibility in Victoria located in Southeastern Australia, between latitudes from 34˚S to 39˚S and longitudes from 141˚E to 150˚E with an area of 227,444 km 2 ( Figure 1). New South Wales occupies the northern side, and South Australia is located on the western side of Victoria. Victoria is surrounded by Tasman Sea and the Indian (southern) Ocean on the south. Plant cover in Victoria includes forests, savannas and grasslands.

Fire data
Collecting data is a fundamental step prior to the generation of bushfire susceptibility mapping (Eskandari et al. 2020a). In this study, the history of bushfires in Victoria was collected from the MODIS fire product (MODIS 500-m MCD 64 Monthly) from the NASA website for the period of fire seasons between 2010 and 2020. The burned areas were classified as 1 and unburned areas were classified as 0 in inventory map of our study area ( Figure 1C). Fire seasons in Australia occur from November to February. The MODIS data will help researchers generate inventory maps that are important sources of information for elaborating hazard, risk and susceptibility maps (Tonini et al. 2020). This data can be found at http://modis-fire.umd.edu.

Effective factors on bushfires
Bushfires are a complex phenomenon, and many factors are involved in bushfire occurrence. There is no definite answer to which factor plays the most important role for the prediction of bushfires (Hong et al. 2019). As a result, we consider the following four factors in our study based on the literature (Jaafari et al. 2017;You et al. 2017;Sachdeva et al. 2018;Hong et al. 2019;Zhang et al. 2019;Jaafari et al. 2019c;Tonini et al. 2020;Eskandari et al. 2020a), and the availability of the data: topography, climate, fuel loads, and human-made factors. Topographical factors include slope, aspect, and digital elevation models. Climatic factors are mainly annual mean precipitation and annual mean temperature. Fuel loads and human-made factors include NDVI, land cover, and distance to roads. (Figure 2A) was provided by the United States Geological Survey (USGS) (USGS 2021). The majority of the northeastern part of Victoria has high elevation while the western part has lower elevation ( Figure 2A). Aspect and slope were extracted from the DEM ( Figure 2B,C, respectively). The northeastern part of Victoria has the highest steep slopes while the western part shows undulating slopes ( Figure 2C).

Climatic factors.
Temperature and rainfall data were collected from the Bureau of Meteorology of Australia for our study area between 2010 to 2020 ( Figure  3A,B) (BOM 2021). The northwestern part of Victoria has the highest temperature and lowest precipitation while the northeastern part has the lowest temperature and highest precipitation ( Figure 3A,B).

Fuel loads.
The NDVI and land cover data known as MODIS 1-km MYD13A3 NDVI ( Figure 4A,B) were collected from USGS (USGS 2021). The majority of the eastern part of our study area has the highest NDVI index and is mainly covered by forests ( Figure 4A,B). The western part is mainly covered by grassland, shrubland and savannas ( Figure 4B) and these areas have the lowest NDVI index as well ( Figure 4A).

Human-made factors.
Human activities are highly connected to the bushfires in the populated areas. Human activities, either voluntary (arsonism) or involuntary (accidental or negligent causes), can initiate fire (Leuenberger et al. 2018). In our research, distance to roads is considered as one of the human-made factors based on previous literature ( Figure 4C). Open Street Map was used to calculate distance to roads (OSM 2021) which shows the variations in the study area. Another humanmade factor to be considered is distance to populated areas, however, this factor is closely correlated with land cover which is already taken into account. Therefore, we excluded distance to populated areas. Distance to rivers is also excluded as this is correlated with land cover too.

Genetic programming
A Genetic Programming (GP) method is an extension of a genetic algorithm (GA) that was introduced by Koza (1992). The main difference between GP and GA is the representation of the solution (Gandomi et al. 2011;Mousavi et al. 2012). GP solutions are computer programs with hierarchical structures (Koza 1992;Gandomi et al. 2011;Mousavi et al. 2012), while GA solutions are strings of numbers (Mousavi et al. 2012). Hierarchical structures and the dynamic variability of computer programs are important features of GP (Koza 1992). GP starts with a random population of computer programs. Individuals are computer programs that have been measured based on their fitness and their performance in the special problem environment (Koza 1992). GP determines the fittest individuals that explain the problem (Koza 1992).
GP provides the outcome by running the following steps: (1) Create an initial random population of computer programs which are a composite of functions and terminals.
(2) Execute following sub-steps repetitively and stop after a termination criterion has been met, Perform each computer program in the population and allocate a fitness value to each program based on their ability to solve the problem.
Two main operations given below have to be used for the creation of new population. It is worthwhile to note that the fitness of computer programs will be considered for the following operations.
Computer programs will be copied to the next generation. Two current genetic programs will be genetically recombined randomly to generate new computer programs.
(3) The best computer program resulted from any generation will be presented as the solution or an approximate solution to the problem (Koza 1992). 2.3.1. Gene expression programming GEP was first introduced by Ferreira (2001) as an extension of GP. The nature of individuals is the main difference between GP and GEP. Individuals are nonlinear entities with different sizes and shapes representing as parse trees in GP, while individuals in GEP are linear strings with fixed lengths, but they will be expressed as nonlinear entities of different sizes and shapes (Ferreira 2001).
GEP starts with a random population of chromosomes. GEP works based on natural selection and the survival of the fittest. The best formula between inputs will be nominated by evolving the initial population through genetic operators (e.g. crossover and reproduction) (Hoang and Tien Bui 2018). We used GeneXproTools software to model the bushfire with GEP (Ferreira 2013).
GEP uses the following steps to nominate the best solution for the problem: 1. The first population of chromosomes will be generated randomly. Several genes will be linked by the linking function which will create chromosomes. Each chromosome has two parts that are called head and tail. Head can consist of functions and terminals while tail can be chosen only from terminals (Ferreira 2001;Hoang and Tien Bui 2018). 2. The fitness function will assign a value to each chromosome based on its performance (Hoang and Tien Bui 2018). 3. In the selection process, if the stop condition is not satisfied, the selection process will be applied based on the concept of roulette wheel with elitism. Hence each individual that is more fit has more chances to survive, and as a result, they are more likely to reproduce and the elitism replicates the best individuals to the next generation (Hoang and Tien Bui 2018). 4. The reproduction processes consist of several genetic operations including crossover, mutation, transposition, and inversion. These operations create diversity in the population. Crossover generates two new individuals through exchanging genetic elements between two random chromosomes. Two types of crossovers will be used in GEP (one-point and two-point crossovers). Mutation procedure randomly modifies the gene element at a certain point of a chromosome. Transposition is another genetic operation that can create diversity in the population by transporting gene fragments within the gene. Inversion operator can also randomly choose and then inverse the sequence in the chromosomes (Hoang and Tien Bui 2018).

Frequency ratio
Frequency ratio (FR) is a statistical method to generate the relationships between conditioning factors and independent variables (Hong et al. 2017;Razavi-Termeh et al. 2020). Bushfire prediction works based on the similarity of the conditions and characteristics of previous hazards (Razavi-Termeh et al. 2020). FR is a ratio of a bushfire area to the total study area which can be generated from Equation (1) where BF(x) is the number of bushfires happened at a class x, TBF is the total number of bushfires, N(x) is the number of pixels in x, and TN is the total number of pixels for the whole study area. Each contributing factor to a bushfire can have multiple classes. For example, elevation can be reclassified as low, medium and high. For each factor f, FR(f) is the sum of the frequency ratios of the factor's subclasses. And the bushfire susceptibility is the weighted sum of the factors' frequency ratios. The weights should be determined by FR. Our weights are presented in Section 3. All factors and each factor's subclasses used in our study are provided in Table 1. This FR method was reported to be an effective method for determining the importance and significance of different classes of factors (Van Westen et al. 2003;Hong et al. 2017).

Logistic regression
A Logistic Regression (LR) model can be used to determine the spatial relationship between bushfire occurrence and conditioning factors (Lee and Pradhan 2007). Correlations between the conditioning factors and bushfire occurrence have been calculated based on Equations (2) and (3).
where p is the probability of bushfire occurrence and z is a linear combination as follows: z ¼ a 0 þ a 1 F 1 þ a 2 F 2 þ ::: þ a n F n : where a 0 is a model intercept, a 1 , a 2 , … , a n are the coefficients of the model, and F i 's are the independent conditioning factors (Lee and Pradhan 2007). The variables in LR can be either continuous or discrete due to the addition of a suitable link function to the ordinary linear regression (Lee and Pradhan 2007;Gholamnia et al. 2020). For this study, the dependent variable is binary. The R statistical software was used to carry out computational process (R Core team 2020).

An ensemble of GEP and FR (GEPFR)
The framework of an ensemble of GEP and FR (GEPFR) proposed in our research is illustrated in Figure 5. In the first step, eight factors and an inventory map were collected and classified. The importance of each class is assigned by FR. The data sets were randomly divided into training data (70%) and testing data (30%). The initial model is trained with 70% of the data (training data) based on the concept of GEP. The final model is evaluated by 30% of the data (testing data). Finally, a bushfire susceptibility map is generated by GEPFR. We used a population size of 30 where each chromosome has a head length of eight. Chromosomes in our study contain three genes that are linked together by applying addition. The terminal set consists of bushfire conditioning factors and constants. We chose functions (Table 2) based on the previous research (Ferreira 2013). We selected eight conditioning factors including elevation, aspect, slope, annual mean temperature, annual mean precipitation, NDVI, land cover, and distance to roads. The existing data were converted to raster layers. All conditioning factors except aspect and land cover were classified in ArcGIS using Natural break method. FR was used to evaluate the importance of different classes for each factor.
The importance of conditioning factors was prioritized by GeneXproTools. For this purpose, the input variables were randomized, and their importance was defined based on the decreasing value of R-square between the generated model and real values (Ferreira 2013).

An ensemble of LR and FR (LRFR)
An ensemble of LR and FR (LRFR) is also used for bushfire susceptibility mapping. Factors were collected in the first step and then classified. The importance of each class is determined by FR, and then LR is trained by 70% of the data and is evaluated by 30% of the data (randomly chosen) ( Figure 6). Finally, the bushfire susceptibility was generated.

Results and validation
The model was generated from GeneXproTools, and the resulting mathematical formula for bushfire susceptibility mapping by GEPFR is given in Equation (4): where d 0 is aspect, d 1 is elevation, d 2 is land cover, d 3 is annual mean temperature, d 4 is NDVI, d 5 is annual mean precipitation, d 6 is distance to roads, and d 7 is slope. These are referred to as the weights mentioned in Subsection 2.4. a, b, c, and d are constant values that were determined by GEP. Each sub-tree from Equation (4) is a gene that is linked together by the addition operator. The final result is the fittest chromosome. This chromosome contains three genus which consist of terminals and functions. A bushfire susceptibility map by GEPFR shows that the eastern part of Victoria is a hot spot for the bushfire occurrence, whereas the west seems to have the lowest possibility for the bushfire occurrence ( Figure 7A). The AUCs of the model from the GEPFR in the training and testing sets are 0.863 and 0.860, respectively. Accuracy for training and testing sets are 91.12% and 90.79% respectively.
Arctan(X) X to the power of two X2 X 2 We also generated bushfires susceptibility maps by using a GEP method ( Figure  7B). AUCs of the model generated by the GEP in the training and testing sets are 0.852 and 0.850, respectively. The training and testing sets have accuracies of 91.49% and 91.64%, respectively.
The LRFR generated a bushfire susceptibility map ( Figure 7C) with the AUCs of 0.853 and 0.852 for training and testing sets, respectively. Accuracy is 92.4% for training and 92.2% for testing. The generated model from LRFR is given in Equation (5). z ¼ À4:934 þ 0:166 Â As À 0:339 Â El þ 0:335 Â L À 0:232 Â T þ 0:395 Â N þ 0:528 where, As is aspect, El is elevation, L is land cover, T is mean annual temperature, N is NDVI, P is mean annual precipitation, D is distance to roads and S is slope. These variables are referred to as the weights mentioned in Subsection 2.4.
Both training and testing sets have balanced AUCs and accuracies, which illustrates that there has been no overfitting in the presented models from all the used methods. It proves the generalization capabilities of the GEPFR, GEP, and LRFR methods for generation of bushfire models in Victoria.
We also generated bushfire susceptibility mapping by single FR ( Figure 7D) which has AUC of 0.840 and accuracy of 81.2%. Our result shows that the GEPFR has the highest AUC compared to three other methods (GEP, LRFR, and FR). However, LRFR has the highest accuracy followed by GEP, GEPFR, and FR. Therefore, it is concluded that the two ensemble methods outperform the single methods. In summary, Table 3 shows the AUCs and accuracies of the four methods.
Bushfire susceptibility maps generated by the GEPFR, GEP, LRFR, and FR were reclassified by the natural break classification method. The reclassified maps were categorized into five subclasses including very low, low, moderate, high and very high ( Figure 7A-D).
The natural break method uses patterns in data to generate break points that aims to minimize the variance within each class and maximize the variance between classes (Tehrany et al. 2019;Moayedi et al. 2020). Therefore, the susceptibility maps in our research were divided into five categories (Figure 7). The areas covered in each probability category are presented in Table 4 from different methods.
The susceptibility maps in our research show that the majority of the eastern part of Victoria is prone to bushfire. The western part has the lowest bushfire susceptibility while it has the lowest annual mean precipitation and the highest annual mean temperature. East of Melbourne's metropolitan area shows a moderate to high  probability of bushfire. Also, the highest probability can be found in Croajingolong National Park located in eastern Victoria and Great Otway National Park located in South Victoria. These two areas have the highest NDVI. NDVI is low in the western part while it gradually increases toward the eastern part of Victoria. The majority of the area with a high bushfire probability in the eastern part is covered by forests and the western part with the lowest bushfire probability is covered by grasslands. We divided the data into two sets including training and testing (randomly chosen, 70% and 30%, respectively) for GEPFR, GEP and LRFR. The best mathematical model as a solution for the problem was determined using the mentioned methods. The models were evaluated by Area Under Curve (AUC) of Receiver Operating Characteristic (ROC) and accuracy. Accuracy is a ratio of correctly classified cases to the number of all data (Hoang and Tien Bui 2018). ROC determines the efficiency of generated models. The AUC of ROC has been widely used in the assessment of the efficiency of predicted models (Bui 2019;Hong et al. 2019;Zhang et al. 2019;Pham et al. 2020;Razavi-Termeh et al. 2020;Eskandari et al. 2020a). Ling et al. (2003) proposed AUC of ROC for machine learning algorithms as an alternative single-number measure. The AUC is statistically consistent and more discriminating than accuracy (Ling et al. 2003). Hence, we use AUC to understand which method is presenting the best performance. The AUCs figure for different methods is also provided Figure 8. In the ROC curve, the false positive rate given in Equation (6) is on X-axis, and the true positive rate in Equation (7) is on Y-axis (Hong et al. 2019): where False Positive (FP) is the number of pixels that are wrongly classified as fire class, True Negative (TN) is the number of pixels that are correctly classified as nonfire class, True positive (TP) is the number of pixels that are correctly determined as fire class, and False Negative (FN) is the number of pixels that are wrongly classified as non-fire class (Hong et al. 2019). AUC values can be interpretated as < 0.6: poor, 0.6-0.7: moderate, 0.7-0.8: good, 0.8-0.9: very good and > 0.9: excellent classification (Hong et al. 2019). Accuracy in Equation (8) is also one of the commonly used quality metrics for the comparison between different models.

Discussions
In this paper we introduced an innovative GEPFR for the prediction of bushfires in the Southeastern Australia. In our research, most of the bushfires were occurred in the eastern part of the Victoria because of high vegetation and fuel load in that area. GeneXproTools defines the importance of conditioning factors which indicate that NDVI plays an important role in bushfire occurrence in our study area. Similarly, previous research also reported that NDVI was an important factor in bushfires (Tien Bui et al. 2016;Zhang et al. 2016). The higher NDVI indicates the presence of more fuel load which is an important factor in the spread of fires (Zhang et al. 2016). Similarly, Tien Bui et al. (2017) stated that NDVI was the most important factor for bushfire occurrence and the highest probability of bushfire was found in the areas with high NDVI (Tien Bui et al. 2017). NDVI cannot distinguish different types of vegetation but can reflect variation in vegetation load, and as a result, it can be an important factor for bushfire prediction (Chen et al. 2001;Malak and Pausas 2006). The eastern part of Victoria has the highest annual mean precipitation compared to the west and the annual mean precipitation is the second most important factor for our bushfire susceptibility mapping. Precipitation will increase vegetation and as a result, there will be a higher flammability rate in the area (Collins et al. 2014). Positive correlation between precipitation and bushfire occurrence in Southeastern Arizona also indicates that precipitation presumably increases the fuel load, however, climatic conditions cannot be extrapolated globally (Crimmins and Comrie 2004;Nicholls and Lucas 2007). In contrast, precipitation was the main negative driver for bushfire occurrence in Tasmania where lower precipitation led to the increase in the extent of burned areas (Nicholls and Lucas 2007).
In our study, temperature is higher in the western and northern part of Victoria, but these areas have low probabilities for bushfire. In contrast, some research reported that temperature is an important factor for bushfires in different areas (Hong et al. 2018;Gigovi c et al. 2019;Eskandari et al. 2020a). This contradictory result could be due to the land cover in the western and northern part of our study area. The western part is mainly covered by cropland, shrubland, and grassland which show lower association with bushfire occurrence. Similarly, previous research also shows that shrubland has the lowest probability for bushfire (Zhang et al. 2015(Zhang et al. , 2016. The western part has also the lowest annual precipitation rate and the highest temperature which results in a low amount of fuel load in the area. Our models show that cropland, shrubland and grassland are less prone to bushfire and as a result there is lower bushfire possibility in the western part of Victoria which is covered by these types of plants. Grasslands have lower fuel load and mainly compaction in grasslands by weathering or grazing decreases the fire occurrence (Cheney 1976). Our research shows that most areas with a moderate to high probability of bushfire are covered by forests. Similarly, Zhang et al. (2016) reported that forest land cover had strong positive relationship with bushfires while shrubland had the lowest probability of bushfire.
In our study area, the bushfire occurrence was more frequent in the eastern area with higher elevation but our method shows that elevation is not among the most important factors. The elevation map correlates positively with forest vegetation which could explain the low importance of elevation factor for bushfire occurrence. Similarly, previous research showed that elevation corresponded with vegetation and as a result there was more bushfire in the elevated area because of higher vegetation (fuel load) (Zhang et al. 2016). Our results show that slope is not affecting bushfire occurrence. Similarly, slope was reported to have a weak relationship with the bushfire occurrence (Valdez et al. 2017;Eskandari et al. 2020a) Comparison of bushfire susceptibility maps generated by GEPFR, GEP, LRFR, and FR show that the AUC of GEPFR is higher than the other three methods. LRFR also has a higher AUC than GEP. FR has the lowest AUC. The GEPFR, LRFR and GEP show that the majority of our study area has very low susceptibility for bushfires, while FR indicates that the majority of the area has very low to low susceptibility. The GEPFR is able to differentiate between very high and high prone areas, while GEP has nominated the majority of the eastern part as high category and FR has allocated 13% of the area as very high. GEPFR indicates that NDVI and precipitations are the most important factors. GEP determines that land cover, precipitation, and distance to roads are important. GEP also illustrates that forest had highest bushfire occurrence among different types of land cover. However, LRFR shows that distance to roads, precipitation, and NDVI play important roles in bushfire occurrence.

Concluding remarks
This research provides an insight to the use of GEPFR, GEP, LRFR, and FR to predict bushfires in Victoria, Australia. The inventory map of bushfires was generated using the historical bushfire data between 2010 and 2020. We used eight conditioning factors to prepare bushfire susceptibility maps for the case study area. The eastern part of Victoria has the highest probability of bushfire. This area is covered mostly with forest and has the highest NDVI. In summary, the bushfire susceptibility maps generated by the four proposed methods agree with the historical data in the area with low, moderate and high bushfire probabilities.

Data and codes availability statement
The data is available in: https://github.com/maryam929/MH_Data.git

Disclosure statement
There are no relevant financial or non-financial competing interests.