GIS-based spatial prediction of flood prone areas using standalone frequency ratio, logistic regression, weight of evidence and their ensemble techniques

ABSTRACT The aim of this research was to evaluate the predictive performance of frequency ratio (FR), logistic regression (LR) and weight of evidence (WoE) in flood susceptibility mapping in China. In addition, two ensemble techniques, WoE with LR and FR with LR, were applied and included in the evaluation. The flood inventory map, consisting of 196 flood locations, was compiled from a number of sources. The flood inventory data were randomly divided into training and testing data-sets, allocating 70% for training and the remaining 30% for validation. The 15 flood conditioning factors included in the spatial database were altitude, slope, aspect, geology, distance from river, distance from road, distance from fault, soil type, land use/cover, rainfall, Normalized Difference Vegetation Index, Stream Power Index, Topographic Wetness Index, Sediment Transport Index and curvature. For validation, success and prediction rate curves were developed and evaluated using the area under the curve (AUC) method. The results indicated that the highest prediction rate, 90.36%, was achieved using the ensemble technique of WoE and LR. Among the standalone methods, WoE produced the highest prediction rate. It can be concluded that WoE offers a more advanced method of mapping flood-prone areas than the FR and LR methods.


Introduction
Floods are among the most devastating natural disasters and may cause immeasurable damage (Rozalis et al. 2010). Floods have been classified into three major types: river flooding, flash flooding and coastal flooding. According to Kron (2002), river flooding is caused by heavy precipitation and/or snowmelt, which make rivers overflow their banks and cover land that is not usually under water. River flooding can, however, be predicted using specific methods (Jonkman 2005). While reaction time plays a major role in the management of all natural disasters, it is particularly crucial in the case of floods (Walder and O'Connor 1997). A flood model should therefore incorporate an efficient early warning mechanism, so that preventive action is possible. There is a vast body of research on the measurement and classification of floods and their effects. In assessing overall impact and costs, damages may be classified as direct or indirect, or alternatively as tangible or intangible (Smith and Ward 1998; Merz et al. 2004). Opolot (2013) calculated that between 2000 and 2008 an annual average of 99 million people globally suffered some effect of flooding. Urbanization and population growth along rivers, together with a reduction in forested areas, have contributed to increasing damage (Bronstert 2003; Christensen and Christensen 2003). Identifying the areas most susceptible to flooding therefore shows where further development should be avoided or controlled, and where emergency strategies should be planned. China ranks first globally in terms of the population affected by flooding and its economic impact (Guha-Sapir et al. 2015). In the last decade of the twentieth century, flood losses in terms of property damage and loss of life were unrecoverable (Brody et al. 2009; Haraguchi and Lall 2015). Such damage and loss would have been far less significant if an early warning system had existed.
CONTACT Haoyuan Hong honghaoyuan@qq.com
The impact of uncontrolled land use/cover (LULC) development, particularly the expansion of construction along river banks, has been shown to affect the natural cycle and is discernible in the spatial and temporal patterns of disaster occurrence. Research into basin structure, climatic factors and the most susceptible regions can help minimize loss of life and property damage. The assessment of risk and the production of risk maps for future land use and infrastructure development are therefore essential.
Flood management generally comprises four distinct stages: prediction, preparation based on the predictions, preventive measures and damage assessment (Konadu and Fosu 2009). Advances in remote sensing (RS) and geographic information systems (GIS) have reshaped hydrology and flood management, facilitating the execution of each stage. A variety of analyses can be performed before, during and after a flood. Recent automated or rule-based methods have replaced traditional flood models and have proven more robust (Hostache et al. 2013). As a result, potential flood risks and vulnerable areas can be predicted. Susceptibility analyses identify the most threatened areas, so that early preparations can be planned and executed to reduce the impact (Kia et al. 2012; Elmahdy and Mostafa 2013).
In recent years, a number of new methods have been developed and applied to flood susceptibility mapping. Some of these, such as WetSpa (Liu and De Smedt 2004), HYDROTEL (Jutras et al. 2009) and SWAT (Jayakrishnan et al. 2005), are hydrological models integrated with RS and GIS to collect data and analyse spatial parameters. However, the field needs further development (Li et al. 2012). The research of Townsend and Walsh (1998) was a landmark in demonstrating the potential of RS modelling for flood prediction in a GIS environment. Youssef et al. (2011), Pradhan (2010a), Pradhan et al. (2009), García-Pintado et al. (2013), Stephens et al. (2012), Prakash et al. (2012), Degiorgis et al. (2012), Hostache et al. (2010) and others have since used this combined approach. Some of these studies have been successful, while others have been flawed (Matgen et al. 2007). Artificial neural networks (ANN) have been a popular method for flood susceptibility mapping in many parts of the world (Islam et al. 2001; Dixon 2005; Kia et al. 2012; Rahmati et al. 2016), but their procedure is complex, difficult to interpret and demands considerable computing power (Maier and Dandy 2000). Kia et al. (2012) employed ANN to simulate flood-prone areas in the Johor River Basin of Malaysia. They showed that this technique can cope with uncertain, incomplete or contradictory input data (Pradhan 2010a; Oh and Pradhan 2011). However, predictions will always be poor when the validation data contain values outside the range of the training data. With a large number of variables, the process becomes extremely laborious (Ghalkhani et al. 2012). The fuzzy logic model has been effective in hydrological applications and is more transparent than ANN (Tilmant et al. 2002). Certain qualitative approaches, such as the analytic hierarchy process (AHP), demand expert knowledge and introduce a high level of bias (Lawal et al. 2012).
AHP is suited to regional studies, but a globally applicable and transferable method should be applied to floods (Ayalew and Yamagishi 2005a). Fernández and Lutz (2010) produced flood maps for the Argentinian cities of Yerba Buena and Tucuman by performing a multi-criteria decision analysis with AHP, and noted that this method requires expert input (Chang et al. 2008).
Statistical methods are widely used in natural hazard mapping (Lee et al. 2012a, 2012b; Hong et al. 2016; Chen et al. 2017c, 2017e, 2017f). Compared with methods such as machine learning techniques (Chen et al. 2017a, 2017b, 2017d), statistical methods are easily understood and considerably quicker to process, which makes them appropriate for catastrophe mapping (Tehrany et al. 2013). However, some statistical methods are more thorough than others in catchment and flood mapping. Statistical methods can be classified into two general groups: bivariate statistical analysis (BSA) and multivariate statistical analysis (MSA) techniques (Ayalew and Yamagishi 2005b; Tehrany et al. 2014). Frequency ratio (FR) and weight of evidence (WoE) are two examples of BSA methods; they evaluate the impact of every class of each conditioning factor on flood occurrence. Conversely, logistic regression (LR), an MSA method, assesses the influence of each individual factor on flood occurrence, as well as the correlations among conditioning factors (Althuwaynee et al. 2014). The main objective of this study was to delineate flood-prone areas in China using the three standalone statistical methods FR, LR and WoE, and to compare the efficiency and precision of these methods in flood susceptibility mapping with those of two ensemble methods: WoE with LR, and FR with LR. An ensemble method integrates two individual methods in order to enhance the efficiency of each technique (Althuwaynee et al. 2014). The impact of the classes of each conditioning factor on flooding was analysed and the correlations among conditioning factors were extracted, using easily understandable and executable procedures that allow the analysis to be performed within a short period of time.

Study area and data used
The Xing guo area is located in the south of Jiangxi Province, between latitudes 26°4′N and 26°42′N and longitudes 115°1′E and 116°51′E. It covers an area of about 3215 km² (Figure 1). More than 43 geologic groups and units were observed in the study area (Table 1). The main lithologies are purple grey feldspar, quartz sandstone, silty slate, light grey chert and phyllite (Figure 2(h)). The climate of the study area is classified as subtropical monsoon. According to the Jiangxi Province Meteorological Bureau (http://www.weather.org.cn), annual rainfall at the Xing guo weather station over the period 1960-2012 ranged from 895.3 mm (1963) to 2284.5 mm (1997). On average, there are 156 precipitation days annually, and the rainy season from March to August accounts for 73.1% of the annual rainfall. In May and June, the average monthly rainfall is between 240 and 250 mm. The average annual temperature is 18.8 °C, the average annual evaporation is 1635.8 mm and the average relative humidity is 78%. According to the statistics of the Xing guo County government (http://www.xingguo.gov.cn/), a total of about 40,000 people in the study area were affected by flooding, and the damage to property has been estimated at about US$5 million annually. However, very few research articles have been published, and few measures have been taken, to predict flood sites and prevent damage. Flood susceptibility assessment in this area is therefore timely. In the Xing guo area, the main trigger of floods is unusually heavy rainfall.

Flood inventory map
Accurate records of past flood events strongly affect the accuracy of flood susceptibility mapping (Merz et al. 2007). One hundred and ninety-six flood location points were included in the inventory. Point data were used in the analysis because using the inventory in polygon format is problematic for the algorithm and exaggerates the results; most similar natural hazard modelling studies have also used inventory data in point format (Lee et al. 2012b; Rahmati et al. 2016). The inventory was divided into training and testing sets in a 70%/30% proportion (Ohlmacher and Davis 2003). The training locations (137 points) were randomly selected to produce the dependent variable, coded 1 for the presence and 0 for the absence of flooding. An equal number of points (137) were selected as non-flooding locations and assigned the value 0, based on the assumption that including non-flooded areas would enhance the accuracy of the results. The remaining flood locations (59 points) were reserved for testing. Figure 1 displays the flood locations in the study area.
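The sampling design above can be sketched in a few lines. This is a minimal illustration, assuming flood locations and candidate cells are hypothetical (row, col) grid tuples; the fixed seed, the helper names and the rule of drawing non-flood cells from all cells outside the inventory are assumptions for illustration, not the authors' procedure.

```python
import random


def split_inventory(flood_points, train_frac=0.7, seed=42):
    """Randomly split flood points into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(flood_points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)  # 196 * 0.7 -> 137
    return shuffled[:n_train], shuffled[n_train:]


def sample_non_flood(candidate_cells, flood_points, n, seed=42):
    """Draw n cells that are not in the flood inventory as absence samples."""
    rng = random.Random(seed)
    flooded = set(flood_points)
    pool = [cell for cell in candidate_cells if cell not in flooded]
    return rng.sample(pool, n)


def build_training_table(train_flood, non_flood):
    """Pair each cell with its label: 1 = flood, 0 = non-flood."""
    return [(cell, 1) for cell in train_flood] + [(cell, 0) for cell in non_flood]


if __name__ == "__main__":
    # Hypothetical inventory of 196 flood cells on a 300 x 300 grid.
    flood_points = [(i, i) for i in range(196)]
    grid = [(r, c) for r in range(300) for c in range(300)]
    train, test = split_inventory(flood_points)
    non_flood = sample_non_flood(grid, flood_points, n=len(train))
    table = build_training_table(train, non_flood)
    print(len(train), len(test), len(table))  # 137 59 274
```

Excluding every inventory cell (training and testing) from the absence pool keeps the 59 testing locations from being mislabelled as non-flood samples.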

Flood conditioning factors
The conditioning factors chosen for a particular study area vary according to its characteristics. A variable that strongly influences flooding in one area may have no impact in another region (Kia et al. 2012). In this study, the variables were derived from a field survey and from literature sources (Kia et al. 2012). A total of 15 conditioning factors with a 25 × 25 m pixel size were used in the flood susceptibility mapping: (1) altitude, (2) slope, (3) aspect, (4) geology, (5) distance from river, (6) distance from road, (7) distance from fault, (8) soil type, (9) LULC, (10) rainfall, (11) Normalized Difference Vegetation Index (NDVI), (12) Stream Power Index (SPI), (13) Topographic Wetness Index (TWI), (14) Sediment Transport Index (STI) and (15) curvature. Figure 2 illustrates the thematic maps of the conditioning factors. Topographic data have a direct impact on the modelling output, and many studies are limited by the lack of adequate topographic data (Bates et al. 2003). Topography and its derivatives play a primary role in identifying flood susceptibility (Pradhan 2009), because flooding occurs particularly at low elevations and in flat areas. The altitude, aspect and slope factors were derived from a digital elevation model (DEM) and are shown in Figure 2(a-c). The DEM for the study area was generated from the ASTER GDEM (http://gdem.ersdac.jspacesystems.or.jp/) at a resolution of 30 m. The SPI and TWI conditioning factors were obtained using Equations (1) and (2), respectively, and are shown in Figure 2(d,e):
$$\mathrm{SPI} = A_s \tan\beta \qquad (1)$$
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (2)$$
where $A_s$ is the specific catchment area and $\beta$ is the slope gradient in radians (Regmi et al. 2010). STI and NDVI were obtained using Equations (3) and (4):
$$\mathrm{STI} = \left(\frac{A_s}{22.13}\right)^{0.6}\left(\frac{\sin\beta}{0.0896}\right)^{1.3} \qquad (3)$$
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (4)$$
where $\beta$ is the slope at each pixel, $A_s$ is the upstream area, NIR is the near-infrared band and RED is the red band. The lithology and LULC layers of the study area are shown in Figure 2(h,i), respectively. The lithology map of the study area was obtained from the China Geology Survey (http://www.cgs.gov.cn/) at a scale of 1:200,000 and was divided into ten groups (A, B, C, D, E, F, G, H, I and J). The NDVI and LULC layers were derived from Landsat 7 ETM+ satellite images. The distance from fault, soil and rainfall layers are shown in Figure 2(j-l), respectively. The rainfall map was created from the mean annual precipitation at 29 rainfall stations using the inverse distance weighted (IDW) method. The precipitation data were extracted from a database of the Jiangxi Province Meteorological Bureau (http://www.weather.org.cn). The soil map was divided into eight groups: Ach (Haplic Acrisols), ACu (Humic Acrisols), Alh (Haplic Alisols), Atc (Cumulic Anthrosols), CMd (Dystric Cambisols), CMo (Ferralic Cambisols), RGd (Dystric Regosols) and WR (Water). It was compiled in 1995 by the Institute of Soil Science, Chinese Academy of Sciences (ISSCAS), from data of the Office for the Second National Soil Survey of China (http://www.issas.ac.cn/). The distance from river and distance from road layers were generated using the Euclidean distance tool and are shown in Figure 2(m,n), respectively. Road and river networks were extracted from a topographic map at a scale of 1:50,000. The curvature layer is illustrated in Figure 2(o).
Table. The estimated FR and WoE for the conditioning factors (1) altitude, (2) slope, (3) aspect, (4) geology, (5) distance from river, (6) distance from road, (7) distance from fault, (8) soil type, (9) LULC, (10) rainfall and (11) …
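The per-pixel indices and the rainfall interpolation described above can be sketched in plain Python. This is a hedged illustration, assuming the commonly used forms SPI = A_s tan β, TWI = ln(A_s / tan β), STI = (A_s/22.13)^0.6 (sin β/0.0896)^1.3 and NDVI = (NIR − RED)/(NIR + RED), with β in radians, and an IDW power of 2; the minimum-slope clamp, the power and the station tuples are assumptions, not values reported in this study.

```python
import math

MIN_SLOPE = 1e-4  # assumed clamp (radians) so flat cells do not divide by zero


def spi(a_s, beta):
    """Stream Power Index, assumed form A_s * tan(beta); beta in radians."""
    return a_s * math.tan(max(beta, MIN_SLOPE))


def twi(a_s, beta):
    """Topographic Wetness Index, assumed form ln(A_s / tan(beta))."""
    return math.log(a_s / math.tan(max(beta, MIN_SLOPE)))


def sti(a_s, beta):
    """Sediment Transport Index, assumed form (A_s/22.13)^0.6 * (sin(beta)/0.0896)^1.3."""
    return (a_s / 22.13) ** 0.6 * (math.sin(max(beta, MIN_SLOPE)) / 0.0896) ** 1.3


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (xs, ys, value) stations."""
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # the target point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


if __name__ == "__main__":
    beta = math.radians(5.0)  # hypothetical 5-degree slope
    print(spi(1000.0, beta), twi(1000.0, beta), sti(1000.0, beta))
    print(ndvi(0.6, 0.2))  # approximately 0.5
    # Hypothetical rain gauges: (x, y, mean annual precipitation in mm)
    gauges = [(0.0, 0.0, 1400.0), (10.0, 0.0, 1600.0)]
    print(idw(gauges, 5.0, 0.0))  # approximately 1500 (equidistant point)
```

In a GIS these functions would be applied cell by cell to the slope and flow-accumulation rasters; a larger IDW power makes the rainfall surface follow the nearest gauge more closely.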

Methodology
The use of flood susceptibility maps for LULC planning has increased significantly during the last few decades (Cerra and Prange 2012). Such mapping helps to recognize and categorize areas threatened by present or future flooding. In this paper, the individual statistical analysis methods FR, WoE and LR were used, and their efficiency in flood susceptibility mapping was compared and evaluated. In addition, to obtain more comprehensive outcomes, two ensemble methods, FR with LR and WoE with LR, were applied and their results were compared with the accuracies of the individual methods. The methodology flowchart for the current research is shown in Figure 3.

3.1. Bivariate statistical analysis (BSA)
3.1.1. Frequency ratio
FR is an effective BSA method for evaluating the impact of the classes of each conditioning factor on flood occurrence (Lee et al. 2012b). FR expresses the ratio of the probability of occurrence to that of non-occurrence for a given attribute (Lee and Sambath 2006; Lee and Pradhan 2007). It is regarded as a simple, easily understandable method (Yilmaz 2007). The greater the FR, the stronger the relationship between flood occurrence and the specific variable (Pradhan 2010b; Sujatha et al. 2013).
All the scaled flood conditioning factors were classified in order to perform the FR analysis in GIS. While many classification techniques exist, the quantile method was chosen for this purpose because of its popularity (Tehrany et al. 2013). Altitude, slope, SPI, TWI, NDVI, rainfall, distance from river and distance from road were categorized into 10 equal-area (quantile) classes. FR was then calculated and the resulting weights were assigned to each class of each conditioning factor. A greater ratio indicates a stronger relationship between a conditioning factor class and flooding, and vice versa: an FR value higher than 1 indicates a strong relationship, and a value lower than 1 a weak one (Lee and Talib 2005; Sujatha et al. 2013).
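The FR of a class is commonly computed as the share of flood pixels falling in that class divided by the share of the study area the class occupies. A minimal sketch, assuming the factor map and the flood layer are flattened into equal-length per-pixel lists; the function name and toy data are hypothetical.

```python
from collections import Counter


def frequency_ratio(class_map, flood_mask):
    """FR per class = (% of flood pixels in the class) / (% of study area in the class).

    class_map  -- class label of each pixel for one conditioning factor
    flood_mask -- 1 where the pixel is a (training) flood location, else 0
    """
    area = Counter(class_map)
    floods = Counter(c for c, f in zip(class_map, flood_mask) if f)
    total_area = len(class_map)
    total_floods = sum(floods.values())
    return {
        c: (floods.get(c, 0) / total_floods) / (n / total_area)
        for c, n in area.items()
    }


if __name__ == "__main__":
    # Hypothetical slope classes: 50 flat and 50 steep pixels, 10 flood pixels.
    slope = ["flat"] * 50 + ["steep"] * 50
    floods = [1] * 8 + [0] * 42 + [1] * 2 + [0] * 48
    fr = frequency_ratio(slope, floods)
    print({c: round(v, 2) for c, v in fr.items()})  # {'flat': 1.6, 'steep': 0.4}
```

An FR above 1 (here the flat class) marks a class in which floods are over-represented relative to its area, matching the interpretation given above.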
To calculate the Flood Probability Index (FPI), the FR values of all factors were summed for each pixel, as expressed in Equation (5). The FPI represents the relative hazard of flood occurrence.
$$\mathrm{FPI} = \sum Fr \qquad (5)$$
where Fr is the rating of each factor's type or range: the greater the value, the higher the susceptibility to flood occurrence, and the lower the value, the lower the susceptibility.
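Equation (5) amounts to looking up, for each pixel, the FR of the class it falls in for every factor and adding the values. A minimal sketch, assuming the per-class FR values are stored as nested dictionaries; the factor names, class labels and FR values below are hypothetical.

```python
def flood_probability_index(pixel_classes, fr_tables):
    """FPI of one pixel: sum over factors of the FR of the pixel's class.

    pixel_classes -- {factor: class label at this pixel}
    fr_tables     -- {factor: {class label: FR value}}
    """
    return sum(fr_tables[factor][label] for factor, label in pixel_classes.items())


def fpi_map(pixels, fr_tables):
    """Apply the FPI to every pixel of a (flattened) study area."""
    return [flood_probability_index(p, fr_tables) for p in pixels]


if __name__ == "__main__":
    # Hypothetical FR tables for two factors.
    fr_tables = {
        "slope": {"flat": 1.5, "steep": 0.5},
        "landuse": {"residential": 3.0, "forest": 0.25},
    }
    pixels = [
        {"slope": "flat", "landuse": "residential"},
        {"slope": "steep", "landuse": "forest"},
    ]
    print(fpi_map(pixels, fr_tables))  # [4.5, 0.75]
```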

Weights of evidence (WoE)
Weights for the conditioning factor classes were derived using the WoE technique. The weight of each class of an individual conditioning factor was measured in terms of flood occurrence, by comparing the flood density within that class with the flood density of the entire study area. Positive and negative weights ($W_i^{+}$ and $W_i^{-}$) were assigned to each class (e.g. each soil unit within a soil map) using Equations (6) and (7):
$$W_i^{+} = \ln\frac{P\{B_i \mid S\}}{P\{B_i \mid \bar{S}\}} \qquad (6)$$
$$W_i^{-} = \ln\frac{P\{\bar{B}_i \mid S\}}{P\{\bar{B}_i \mid \bar{S}\}} \qquad (7)$$
where $B_i$ denotes the presence and $\bar{B}_i$ the absence of the factor class, $S$ the presence of a flood and $\bar{S}$ its absence. Maps of the individual factors, with categories indicating flood presence or absence, were employed. $W_i^{+}$ indicates the significance of the presence of a conditioning factor class for flood occurrence: the class is positively associated with flooding where $W_i^{+}$ is positive and negatively associated where $W_i^{+}$ is negative. $W_i^{-}$ indicates the significance of the class's absence for flood occurrence; a positive $W_i^{-}$ implies that the absence of the class favours flooding. Factors with higher weights are effective for susceptibility mapping, while a weight of zero indicates no correlation with flood occurrence. For every factor there are four possible combinations of factor presence/absence and flood presence/absence; their frequencies are expressed as numbers of pixels and can be calculated in GIS. Thorough expositions of WoE modelling can be found in Lee and Choi (2004) and Sujatha et al. (2014).
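Equations (6) and (7) can be evaluated directly from the four pixel-count combinations mentioned above. A minimal sketch, assuming flattened per-pixel lists as input and the natural logarithm; it raises an error when any count is zero, because a weight is then undefined, and how such classes should be handled is left as an open assumption here. Names and toy data are hypothetical.

```python
import math


def woe_weights(class_map, flood_mask, target_class):
    """Return (W+, W-) for one class of a conditioning factor.

    n1: class present, flood      n2: class present, no flood
    n3: class absent, flood       n4: class absent, no flood
    """
    n1 = n2 = n3 = n4 = 0
    for label, flooded in zip(class_map, flood_mask):
        present = label == target_class
        if present and flooded:
            n1 += 1
        elif present:
            n2 += 1
        elif flooded:
            n3 += 1
        else:
            n4 += 1
    if 0 in (n1, n2, n3, n4):
        raise ValueError("a zero count makes W+ or W- undefined")
    w_plus = math.log((n1 / (n1 + n3)) / (n2 / (n2 + n4)))   # ln P(B|S) / P(B|not S)
    w_minus = math.log((n3 / (n1 + n3)) / (n4 / (n2 + n4)))  # ln P(not B|S) / P(not B|not S)
    return w_plus, w_minus


if __name__ == "__main__":
    # Hypothetical slope map: 50 flat and 50 steep pixels, 10 flood pixels.
    slope = ["flat"] * 50 + ["steep"] * 50
    floods = [1] * 8 + [0] * 42 + [1] * 2 + [0] * 48
    w_plus, w_minus = woe_weights(slope, floods, "flat")
    print(round(w_plus, 3), round(w_minus, 3))  # 0.539 -0.981
```

The positive $W^{+}$ and negative $W^{-}$ for the flat class say that being flat favours flooding and not being flat disfavours it.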

Logistic regression (LR)
LR is a popular type of MSA that considers several parameters that may affect the probability of flood occurrence. Its advantage is that the data need not be normally distributed and the conditioning factors can be continuous, discrete or a combination of both (Lee and Sambath 2006). LR was first introduced by McFadden (1974) and can be used to estimate the probability of a disaster in an area from a set of conditioning factors. The technique can analyse the relationship between a binary dependent variable and conditioning factors with scalar and nominal values (Shirzadi et al. 2012). In the current research, flooding is the binary dependent variable, with values of 1 and 0 representing the presence and absence of flooding, respectively. LR produces a weight (coefficient) for each conditioning factor, which can be used in GIS to produce the flood occurrence probability map.
The multivariate LR analysis was performed using SPSS V.19 software. The higher the value of a logistic coefficient, the greater its expected impact on flood occurrence (Ayalew and Yamagishi 2005a). The probability map is generated from the derived logistic coefficients using Equation (8):
$$p = \frac{1}{1 + e^{-z}} \qquad (8)$$
where $p$ is the probability of flooding, which varies between 0 and 1 on an S-shaped curve, and $z$ is a linear combination of the conditioning factors. LR thus involves fitting an equation of the form of Equation (9) to the data:
$$z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \qquad (9)$$
where $b_0$ is the intercept of the model, $b_i$ ($i = 1, 2, \ldots, n$) are the coefficients of the LR model and $x_i$ ($i = 1, 2, \ldots, n$) are the conditioning factors (Lee and Sambath 2006). A positive LR coefficient indicates that the presence of the conditioning factor increases the probability of flooding, whereas a negative coefficient implies that flood occurrence is negatively related to that factor (Chauhan et al. 2010; Ramani et al. 2011).
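The fitting step behind Equations (8) and (9) can be illustrated without SPSS. A minimal sketch, assuming batch gradient ascent on the log-likelihood as a stand-in for SPSS's iterative maximum-likelihood estimation; the learning rate, epoch count and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function p = 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_term(b, x):
    """z = b0 + b1*x1 + ... + bn*xn."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Estimate [b0, b1, ..., bn] by batch gradient ascent on the log-likelihood."""
    b = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(b)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_term(b, xi))  # observed minus predicted
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        b = [bj + lr * g / m for bj, g in zip(b, grad)]
    return b


if __name__ == "__main__":
    # Hypothetical single factor (e.g. a normalized wetness index) at 6 sample points.
    X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    y = [0, 0, 1, 0, 1, 1]
    b = fit_logistic(X, y)
    print(b)                               # b[1] > 0: the factor raises flood probability
    print(sigmoid(linear_term(b, [0.9])))  # probability at a high factor value
```

The sign of the fitted coefficient carries the interpretation given above: a positive $b_1$ means higher factor values raise the predicted flood probability.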

Ensemble modelling
Two ensemble methods were used in the current research: the first combined the FR and LR methods, and the second combined the WoE and LR methods. As noted above, FR and WoE, as BSA techniques, evaluate the impact of every class of each conditioning factor and assign an appropriate weight to each class based on its influence on flooding. First, the FR and WoE methods were applied and the class weights were derived for each factor separately. Each conditioning factor was then reclassified using the derived weights. Next, each of the two data-sets of reclassified factors was entered into an LR analysis, so that MSA was performed on the weighted factors. Finally, two flood susceptibility maps were produced using the two ensemble techniques.
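The two-stage ensemble can be sketched as follows: derive per-class FR weights over the whole study area, replace each sample's class labels with those weights, then fit LR on the weighted factors. This is a minimal sketch of the FR and LR variant only (the WoE and LR variant would substitute WoE-derived class weights); it assumes flattened per-pixel class lists and uses simple gradient ascent in place of SPSS, and all names and toy data are hypothetical.

```python
import math
from collections import Counter


def fr_table(classes, flood_mask):
    """Per-class frequency ratio over the whole study area."""
    area = Counter(classes)
    floods = Counter(c for c, f in zip(classes, flood_mask) if f)
    n_area, n_flood = len(classes), sum(floods.values())
    return {c: (floods.get(c, 0) / n_flood) / (n / n_area) for c, n in area.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient ascent on the LR log-likelihood; returns [b0, b1, ..., bn]."""
    b = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(b)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        b = [bj + lr * g / len(X) for bj, g in zip(b, grad)]
    return b


def ensemble_fr_lr(rasters, flood_mask, sample_idx, sample_labels):
    """Reclassify every factor by its FR weights, then fit LR on the samples."""
    factors = sorted(rasters)
    tables = {f: fr_table(rasters[f], flood_mask) for f in factors}
    X = [[tables[f][rasters[f][i]] for f in factors] for i in sample_idx]
    coef = fit_logistic(X, sample_labels)

    def probability(pixel):
        """Flood probability of one pixel index from its FR-weighted factors."""
        x = [tables[f][rasters[f][pixel]] for f in factors]
        return sigmoid(coef[0] + sum(bj * xj for bj, xj in zip(coef[1:], x)))

    return factors, coef, probability


if __name__ == "__main__":
    n = 100
    rasters = {
        "slope": ["flat"] * 50 + ["steep"] * 50,
        "landuse": ["urban" if i % 2 == 0 else "forest" for i in range(n)],
    }
    flood_idx = [0, 2, 4, 6, 8, 10, 12, 14, 60, 62]
    flood_mask = [1 if i in flood_idx else 0 for i in range(n)]
    non_flood_idx = [1, 3, 5, 7, 51, 53, 55, 57, 59, 61]
    labels = [1] * len(flood_idx) + [0] * len(non_flood_idx)
    factors, coef, probability = ensemble_fr_lr(
        rasters, flood_mask, flood_idx + non_flood_idx, labels)
    print(factors, [round(c, 2) for c in coef])
    print(round(probability(0), 3), round(probability(1), 3))
```

Feeding FR weights rather than raw class codes into LR lets the regression work on an ordered, flood-informed scale for categorical factors such as land use.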

Model validation
The area under the curve (AUC) is a popular, comprehensive quantitative method of accuracy assessment that can be used to evaluate the success and prediction rates (Pourghasemi et al. 2012).
Validation was performed by comparing known flood locations with the flood probability map using the AUC (Tien Bui et al. 2012a). An AUC of 1 indicates a perfect classification, and an AUC of 0.5 a classification no better than chance. A number of studies have used the AUC to evaluate the efficiency of susceptibility mapping (Althuwaynee et al. 2012; Tien Bui et al. 2012b). The technique involves dividing the probability map into equal-area categories and ranking them from the highest to the lowest probability. The percentage of flood locations falling in each probability category is then used to construct the success and prediction rate curves. These curves are formed by plotting the cumulative percentage of the study area, ranked from the most to the least susceptible, on the x-axis, and the cumulative percentage of flood events on the y-axis. A steeper curve indicates that a greater number of flood events fall into categories of higher susceptibility.
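The curve construction described above can be sketched per pixel: rank cells from the highest to the lowest susceptibility, accumulate the fraction of area on the x-axis and the fraction of flood locations captured on the y-axis, and integrate with the trapezoidal rule. A minimal sketch, assuming flattened score and flood lists; ranking single pixels rather than equal-area categories, and breaking ties by pixel order, are simplifications. Passing training floods gives the success rate and testing floods the prediction rate.

```python
def rate_curve(scores, flood_mask):
    """Cumulative (area fraction, flood fraction) points, pixels ranked high to low."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_cells, n_floods = len(scores), sum(flood_mask)
    xs, ys, hits = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        hits += flood_mask[i]
        xs.append(rank / n_cells)
        ys.append(hits / n_floods)
    return xs, ys


def area_under_curve(xs, ys):
    """Trapezoidal integration of the rate curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


if __name__ == "__main__":
    # Hypothetical map of 100 cells; the single flood cell has the top score.
    scores = [i / 100 for i in range(100)]
    floods = [0] * 99 + [1]
    print(round(area_under_curve(*rate_curve(scores, floods)), 3))  # 0.995
```

Because the x-axis counts all cells, including flooded ones, the attainable maximum is slightly below 1 and approaches 1 as flood cells become a small share of the map.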
The training data-set of 137 points (70% of the flood inventory locations) was used to obtain the success rate. Because the model is generated from the training flood locations, the success rate does not represent the model's actual predictive efficiency, and the training data cannot be used to calculate predictive capability, i.e. the ability to predict floods in an area (Tien Bui et al. 2012a). The remaining 59 points (30% of the flood locations) were therefore set aside to measure the prediction rate, which reflects the generalization ability of the model.

Flood susceptibility maps
The FPI predicts flood probability for each individual pixel in terms of a particular set of conditioning factors. Five probability maps were produced, based on (1) standalone FR, (2) standalone LR, (3) standalone WoE, (4) ensemble WoE and LR, and (5) ensemble FR and LR. The susceptibility maps were prepared using the popular approach of dividing the probability map into a specified number of classes (Ayalew and Yamagishi 2005a). Using the quantile method, values were separated into five categories: very low, low, moderate, high and very high. Figure 4 shows the flood susceptibility maps produced using all five models.
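Quantile classification places roughly the same number of cells in each susceptibility class. A minimal sketch, assuming a flattened list of FPI or probability values; the break rule (the value at each k/5 position of the sorted list) is one common convention, and with tied values the classes can be uneven.

```python
from bisect import bisect_right
from collections import Counter

LABELS = ["very low", "low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=5):
    """Lower bounds of classes 2..n at the k/n_classes positions of the sorted values."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_classes] for k in range(1, n_classes)]


def classify(values, labels=LABELS):
    """Assign each value to a susceptibility class using quantile breaks."""
    breaks = quantile_breaks(values, len(labels))
    return [labels[bisect_right(breaks, v)] for v in values]


if __name__ == "__main__":
    fpi = [i / 10 for i in range(100)]  # hypothetical FPI values
    classes = classify(fpi)
    print(Counter(classes))  # 20 cells in each of the five classes
```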
Based on the LR, WoE and FR methods, the highest flood susceptibility in the study area is located between 115°15′E and 115°30′E and between 26°00′N and 26°30′N. However, some differences can be seen among the results (Figure 4(a-c)). In general, the LR and WoE models agree on the very high susceptibility areas located in the central part of the study area.

Accuracy assessment
The flood susceptibility analysis was validated with known flood locations, using the AUC method with the training and testing flood locations. Because the models were generated using the training flood locations, validation based on these data does not represent the models' real efficiency; the prediction rate, computed from the testing locations, evaluates the predictive accuracy of each model given its conditioning factors. The AUC therefore provides a quantitative assessment of prediction accuracy. The prediction methods were compared by ranking the calculated values of all study area cells in descending order and then dividing the cells into 100 classes at 1% cumulative intervals. A comparison of the standalone and ensemble method accuracies is shown in Figure 5. WoE, in both the individual (88.07%) and ensemble (90.36%) approaches, produced a higher prediction capability than FR (60.05%) and LR (70.78%). The AUC results show that FR produced the least accurate susceptibility map, which might be due to the simple, linear nature of its calculation. Conversely, WoE has a more advanced framework of measurements. The results demonstrate the value of ensemble modelling, as the accuracies of both LR and FR increased through their integration: the success and prediction rates for ensemble FR and LR were 75.49% and 81.47%, respectively.
Comparing the very high flood susceptibility class across the models, our results indicate very high susceptibility to flooding in the central regions of the study area.

FR and WoE outcomes
The FR and WoE results give the weights of the classes of each conditioning factor. Tables 1 and 2 list the FR and WoE values, respectively, for the 15 factors and their relationships to the flood events. The relationship between flood occurrence and the classes of each conditioning factor was thus established through the FR analysis. In Table 1, the altitude results indicate that the lowest classes (109-168, 168-208 and 208-250 m) were the most influential, while the highest classes (420-486, 486-594 and 594-1196 m) were the least influential on flooding. This reflects the natural characteristics of flooding, which occurs mostly in extensive low-lying areas and not on mountain peaks (Table 1). For this conditioning factor, the trend of the WoE outputs was similar to the FR results (Table 2).
Slope is another indicator of the strong relationship between flooding and flat, low-lying areas. FR values greater than 1 showed that floods occurred mostly in areas with slopes of 0-10.59°. It can thus be inferred that the gentler the slope, the greater the occurrence of flooding in the lower parts of the catchment (Table 1). However, the indirect effect of steep regions cannot be ignored, as they increase the speed of water flow and reduce the time available for water to infiltrate the ground. The trend of the WoE outputs was similar to the FR results for this conditioning factor (Table 2).
Aspect was selected for this analysis because of its impact on the amount of precipitation and sunshine received (Abubakar et al. 2012). Except for the flat class, all classes of this factor exhibited a relationship with flooding, with the strongest found for the north-west aspect (Table 1). The trend of the WoE aspect outputs was similar to the FR results (Table 2).
A geology map was used in the analysis because geology has a significant influence on the behaviour of water on the ground (Tehrany et al. 2015). In the study area, the main lithologies were purple grey feldspar, quartz sandstone, silty slate, light grey chert and phyllite. Of the 10 lithology types, the ratio was highest for F (288), followed by E (91), while lithology types C and D had minimal impact on flood occurrence (Table 1). The trend of the WoE lithology outputs was similar to the FR results (Table 2).
Distance from river is one of the most important factors in flood susceptibility mapping. Heavy rainfall raises river levels during floods, causing water to overflow into the areas closest to the river bank. Distances of up to 125 m from the river had the most significant impact on flooding (ratio of 382). The distance range of 125-4384 m also showed a ratio greater than 1, reflecting the wide extent of alluvial flooding over the study area (Table 1). For distance from river, the trend of the WoE outputs was similar to the FR results (Table 2).
Distance from road is also a major factor in flood susceptibility mapping. Impervious roads and the surrounding urban surfaces have a considerable influence on flood levels, because they decrease the infiltration capacity of the terrain and are a source of runoff (Shuster et al. 2005). Based on the results, distances of up to 75 m from roads had the most significant impact on flooding (ratio of 205), while the range of 75-3045 m showed a ratio greater than 1, indicating a relationship between this factor and flooding. This can be explained by the fact that urban areas are mostly covered by impervious surfaces such as asphalt, which make them highly susceptible to flooding (Table 1). For distance from road, the trend of the WoE outputs was similar to the FR results (Table 2).
The relationship between flood occurrence and distance from fault was also examined through FR analysis. The correlation between this factor and flooding was studied by Brothers et al. (2011). The use of this factor in flood susceptibility mapping rests on the hypothesis that faults might affect the direction of runoff and, consequently, influence flooding. This hypothesis was tested in this research; however, no significant correlation between flooding and distance from fault was found in this study. The only notable result was that the greatest distance class from faults (13,292-25,875 m) had minimal impact on flood occurrence (Table 1).
Soil type has a direct impact on water storage, permeability and drainage (Mojaddadi et al. 2017), so this factor was also included in the analysis. For the soil conditioning factor, the highest ratio (60.34) was found for soil type RGd. The lowest ratios, 0 and 0.64 for 'Haplic Acrisols' and 'Cumulic Anthrosols' respectively, indicate a minimal relationship between these soil types and flood occurrence (Table 1).
LULC is a very important factor in recognizing regions prone to flooding. Vegetated areas offer some protection, making land less prone to flooding, so a negative relationship exists between flood events and vegetation density. Conversely, urban areas increase storm water runoff because of their extensive impervious surfaces (Al-Zahrani et al. 2016). The FR outputs indicated a strong relationship between LULC and flooding, with the highest ratio (365) for residential areas and the lowest (0) for barren areas. In barren lands, where there is no vegetation cover to control and slow the flow of water over the ground, the potential for flooding and soil loss (erosion) increases (Shabani et al. 2014). The result of this study is contrary to that expectation, which can be explained by the fact that the soil type in the barren areas was 'Haplic Acrisols', which has a minimal impact on flood occurrence (Table 1).
Rainfall is a triggering factor in flood generation; in its absence, no flooding would be generated. The highest ratio for the rainfall conditioning factor occurred in areas receiving more than 1067 mm of precipitation (the ratio exceeded 100). Areas within the 675-860 mm precipitation range had a ratio of 111 (Table 1).
NDVI, an index representing vegetation density over the study area, was also used in this research. As already mentioned, vegetation can slow runoff over the ground (Lei et al. 2014). The highest ratio for the NDVI conditioning factor (337) was found in the -0.48 to 0.04 class, followed by 296 for the 0.04-0.08 class. All NDVI classes above 0.27 had a zero ratio, indicating minimal impact on flood occurrence (Table 1). This supports the view that higher vegetation density contributes to flood reduction.
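NDVI is conventionally computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), giving values from -1 to 1, with denser vegetation toward the high end. A minimal Python sketch of that standard formula, and of assigning a pixel to a class, follows; the reflectance values are hypothetical, the class breaks are a simplified subset of the boundaries quoted above, and treating zero-reflectance pixels as 0 is an assumption.

```python
from bisect import bisect_right
from typing import List


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat pixels with no reflectance as non-vegetated
    return (nir - red) / total


def ndvi_class(value: float, breaks: List[float]) -> int:
    """Index of the class a value falls in, given ascending upper class breaks."""
    return bisect_right(breaks, value)


# Hypothetical reflectances; breaks are a simplified subset of the classes in Table 1.
breaks = [0.04, 0.08, 0.27]
for nir, red in [(0.10, 0.12), (0.30, 0.26), (0.45, 0.08)]:
    value = ndvi(nir, red)
    print(f"NDVI={value:.3f} -> class {ndvi_class(value, breaks)}")
```

The last pixel, with strong NIR reflectance, lands in the class above 0.27, the range that had zero flood ratios in this study.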
The FR analysis also showed the impact of SPI on flood occurrence. The ratio was highest (1562) for SPI values of 690,985 to 1,151,642, followed by a ratio of 99 for SPI values between 0 and 230,328 (Table 1). Usually, areas with lower stream power are more susceptible to flooding (Mojaddadi et al. 2017), because most regions with higher SPI values lie on mountain slopes and in steep areas, where flooding does not occur.
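SPI is commonly defined as the product of the specific catchment area A_s (upslope contributing area per unit contour width) and the tangent of the local slope angle β, which is why steep cells on mountain slopes receive high values. A minimal sketch of that common form follows; the function name and inputs are hypothetical, and in practice A_s would be derived from a DEM flow-accumulation grid rather than passed in by hand.

```python
import math


def stream_power_index(specific_catchment_area: float, slope_deg: float) -> float:
    """SPI = A_s * tan(beta), with A_s in m^2 per m of contour width and beta in degrees."""
    if specific_catchment_area < 0:
        raise ValueError("specific catchment area must be non-negative")
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Same contributing area, flat versus steep cell (hypothetical values).
print(stream_power_index(5000.0, 0.0))   # flat cell: SPI is 0
print(stream_power_index(5000.0, 30.0))  # steep cell: much larger SPI
```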
As explained earlier, TWI represents the effect of topography on the location and size of saturated source areas that generate runoff. The FR outputs for the TWI conditioning factor showed that the ratio generally increases with TWI, which can be explained by the direct relationship between this parameter (an indicator of soil moisture status) and flood occurrence (Table 1).
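TWI is usually written as ln(A_s / tan β), so it grows where large upslope areas drain onto gentle slopes, which are the cells most likely to saturate. The sketch below implements that common definition; the function name is hypothetical, and the minimum-slope floor is an assumption added because tan β = 0 on perfectly flat cells would otherwise make the index infinite.

```python
import math


def topographic_wetness_index(
    specific_catchment_area: float, slope_deg: float, min_tan_slope: float = 1e-4
) -> float:
    """TWI = ln(A_s / tan(beta)); tan(beta) is floored to avoid division by zero on flat cells."""
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    tan_slope = max(math.tan(math.radians(slope_deg)), min_tan_slope)
    return math.log(specific_catchment_area / tan_slope)


# Hypothetical cells with equal contributing area: the gentler slope is wetter.
print(topographic_wetness_index(500.0, 2.0))
print(topographic_wetness_index(500.0, 25.0))
```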
The STI factor was selected for the analysis because of its correlation with flood generation. Areas with lower STI values are mostly located on flat terrain and thus have a higher potential for flooding. The highest ratio for the STI conditioning factor (145) was found in the class of 0, followed by 112 for the 1058-1446 class. All STI classes above 5064 had a zero ratio, indicating minimal impact on flood occurrence (Table 1).
In the case of curvature, the FR results for the study area confirm that flooding mostly occurred in areas of flat curvature. This is because flat terrain is the most favourable setting for flood occurrence: hillsides drive water toward lower, flatter regions, where it accumulates and causes flooding.
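A widely used form of the STI discussed above is (A_s / 22.13)^0.6 * (sin β / 0.0896)^1.3, where 22.13 m and 0.0896 correspond to the slope length and steepness of the standard Universal Soil Loss Equation plot. Because the slope term vanishes when β = 0, flat cells fall in the STI class of 0, consistent with that class having the highest ratio. The sketch assumes this form and hypothetical inputs; the exact expression used to build the STI layer in this study is not restated in this section.

```python
import math


def sediment_transport_index(specific_catchment_area: float, slope_deg: float) -> float:
    """STI = (A_s / 22.13) ** 0.6 * (sin(beta) / 0.0896) ** 1.3, beta in degrees (0-90)."""
    if specific_catchment_area < 0 or not 0.0 <= slope_deg <= 90.0:
        raise ValueError("need A_s >= 0 and a slope between 0 and 90 degrees")
    area_term = (specific_catchment_area / 22.13) ** 0.6
    slope_term = (math.sin(math.radians(slope_deg)) / 0.0896) ** 1.3
    return area_term * slope_term


print(sediment_transport_index(800.0, 0.0))   # flat cell: STI is 0
print(sediment_transport_index(800.0, 15.0))  # sloping cell: positive STI
```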

Logistic regression (LR) outcomes
SPSS v.19 software was used to establish the relationships between flood occurrence and the flood conditioning factors. The LR results are given in Table 3. Beta represents the logistic coefficients, or LR-derived weights, for each conditioning factor, and the flood probability values were calculated from these estimated coefficients. The Wald (Wald chi-square) column measures the significance of each factor in flood occurrence. Sig., another useful measure provided by LR, gives the significance probability: a Sig. value below 0.05 indicates that the factor has a statistically significant effect on flood generation at the 5% level (Chauhan et al. 2010). Regarding the LR coefficients, only slope, TWI and STI exhibited a negative relationship with flooding, while all other conditioning factors showed a positive correlation. In terms of absolute value, LULC has the lowest LR weight (0.002) and slope the highest (-0.055) of all the factors. For the probability map, the LR values were transferred to ArcGIS 9.3 software, while Equation (9) … In the next step, the probability index was calculated from Equation (8), giving a range from 0 to 0.99.
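How LR-derived weights become a flood probability, and how a Wald value relates to Sig., can be sketched as follows. This assumes the usual logistic form, z = b0 + Σ bi·xi and P = 1 / (1 + e^(-z)), which keeps P between 0 and 1; whether it matches Equations (8) and (9) exactly is an assumption, since those equations are not reproduced here. The Wald statistic is taken as (B / S.E.)², referred to a chi-square distribution with one degree of freedom. Function names, the intercept, the standard error and the factor values are hypothetical; only the slope coefficient of -0.055 comes from the text.

```python
import math
from typing import Sequence, Tuple


def flood_probability(intercept: float, coefficients: Sequence[float], factors: Sequence[float]) -> float:
    """Logistic probability P = 1 / (1 + exp(-z)), with z = b0 + sum(b_i * x_i)."""
    if len(coefficients) != len(factors):
        raise ValueError("one coefficient is needed per conditioning factor")
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    # Numerically stable evaluation of the logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def wald_test(beta: float, std_error: float) -> Tuple[float, float]:
    """Wald chi-square (B / S.E.)^2 and its p-value with one degree of freedom."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    wald = (beta / std_error) ** 2
    # For 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Hypothetical cell: slope (deg) with the reported coefficient -0.055, plus one other factor.
p = flood_probability(0.5, [-0.055, 0.002], [12.0, 3.0])
wald, sig = wald_test(-0.055, 0.02)  # hypothetical standard error
print(f"P(flood)={p:.3f}, Wald={wald:.2f}, Sig={sig:.4f}, significant={sig < 0.05}")
```

The negative slope coefficient lowers z as slope increases, which is the negative relationship with flooding described above.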

Conclusion
Flooding is often catastrophic for mankind. In recent history, attempts have been made to curtail its devastation of communities, properties and economies through early warning and control systems, pioneered in developed countries where massive spatial databases are available for analysing flood data. By analysing spatial distribution data, areas susceptible to floods can be identified, future occurrences predicted and potential damage curtailed. This study contributes a comparative analysis of the standalone FR, LR and WoE statistical modelling techniques, together with the ensemble FR and LR and ensemble WoE and LR techniques, as tools for flood susceptibility mapping. The aim of the research, to compare the efficiency and accuracy of these techniques by mapping areas susceptible to flooding in the Xingguo region of China, was achieved. Fifteen flood conditioning factors were selected, and a flood inventory map was used to create the dichotomous flood dependent layer. After the five statistical models were applied, the relationships between flood occurrence and the variables were evaluated. The standalone LR outcomes showed a negative relationship with flooding for slope, TWI and STI, while all other variables were positively correlated with flood occurrence. The weights derived from the standalone FR and WoE were considerably similar in identifying the most influential classes of several conditioning factors. Accuracy in estimating or preparing the conditioning factors is imperative for reliable flood susceptibility mapping. After the flood rating classes are established, it is crucial to understand the potential degree to which each conditioning factor contributes to flooding.
Such information is essential for planning conservation measures in susceptible areas, so that not only on-site impacts but also the downstream effects of transported sediment can be minimized. The prediction rates of the standalone WoE and the ensemble WoE and LR surpassed those of the individual LR and FR methods. The highest prediction rate (90.36%) was obtained with the ensemble WoE and LR, and the lowest (60.05%) with the individual FR.
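Prediction rates such as 90.36% are areas under a rate curve. A common construction ranks all cells from most to least susceptible and plots the cumulative fraction of the study area (x) against the cumulative fraction of validation flood locations captured (y); using the training floods instead gives the success rate. The sketch below assumes that construction and integrates the curve with the trapezoidal rule. It treats each cell individually, so tied scores keep their input order, and the names and data are hypothetical.

```python
from typing import Sequence


def rate_curve_auc(scores: Sequence[float], is_flood: Sequence[bool]) -> float:
    """Area under a success/prediction rate curve (cumulative area vs. cumulative floods)."""
    if len(scores) != len(is_flood):
        raise ValueError("one flood flag is needed per cell")
    n_cells = len(scores)
    n_flood = sum(1 for flag in is_flood if flag)
    if n_cells == 0 or n_flood == 0:
        raise ValueError("need at least one cell and one flood location")

    # Rank cells from most to least susceptible (stable sort keeps ties in input order).
    order = sorted(range(n_cells), key=lambda i: scores[i], reverse=True)
    auc, prev_x, prev_y, hits = 0.0, 0.0, 0.0, 0
    for rank, i in enumerate(order, start=1):
        if is_flood[i]:
            hits += 1
        x = rank / n_cells   # cumulative fraction of the study area
        y = hits / n_flood   # cumulative fraction of flood locations captured
        auc += (x - prev_x) * (y + prev_y) / 2.0  # trapezoid for this step
        prev_x, prev_y = x, y
    return auc


# Hypothetical susceptibility scores for five cells and validation flood flags.
print(rate_curve_auc([0.9, 0.7, 0.4, 0.2, 0.1], [True, True, False, False, False]))
```

Multiplying the returned area by 100 gives a percentage comparable to the prediction rates reported above.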
In this study, distance from faults was used to test whether this factor is correlated with flood occurrence; however, the derived weights indicated no significant relationship between this conditioning factor and flood occurrence. In general, less vegetated, higher-rainfall areas at lower elevations are assumed to be more susceptible to floods. Our study supports complementing traditional hydrological approaches with current remote sensing (RS) and GIS technologies, which enhance the collection, storage, analysis, manipulation and modelling of flood data while being more efficient in cost, time and manpower.
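For comparison with FR, the weights of evidence for a factor class B are commonly defined as W+ = ln[P(B | flood) / P(B | no flood)] and W- = ln[P(not B | flood) / P(not B | no flood)], with contrast C = W+ - W-; a positive C marks a class favouring flooding, much as FR > 1 does. The following is a minimal sketch of that common formulation from pixel counts, with hypothetical names and counts. Classes containing no flood pixels (the zero ratios in Table 1) make a logarithm undefined; how such classes were handled in this study is not stated here, so the sketch simply rejects them.

```python
import math
from typing import Tuple


def woe_weights(
    flood_in_class: int, flood_total: int, class_pixels: int, total_pixels: int
) -> Tuple[float, float, float]:
    """Return (W+, W-, contrast C) for one factor class from pixel counts."""
    n1 = flood_in_class                          # flood pixels inside the class
    n2 = flood_total - flood_in_class            # flood pixels outside the class
    n3 = class_pixels - flood_in_class           # non-flood pixels inside the class
    n4 = (total_pixels - class_pixels) - n2      # non-flood pixels outside the class
    if min(n1, n2, n3, n4) <= 0:
        raise ValueError("all four pixel counts must be positive for the log-ratios")
    w_plus = math.log((n1 / (n1 + n2)) / (n3 / (n3 + n4)))
    w_minus = math.log((n2 / (n1 + n2)) / (n4 / (n3 + n4)))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical class: 30 of 40 flood pixels fall in a class covering 2000 of 10000 pixels.
print(woe_weights(30, 40, 2000, 10000))
```

Using the same made-up counts as a frequency-ratio calculation would give FR of about 3.75 and a clearly positive contrast here, illustrating why FR and WoE tend to single out the same influential classes.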