Using GIS and Random Forests to identify fire drivers in a forest city, Yichun, China

Abstract
Forest city (FC) usually refers to an urban area with high forest coverage. It is a green model of urban development that has been strongly advocated by the governments of many nations. Forest fire is a prominent threat to FC development, but the causes of fires in FCs are usually different from, and more complex than, those in purely forested areas, because more socio-economic factors and human activity are involved in the ignition and spread of fire. The large and increasing number of lives exposed to wildfire hazard highlights the need to understand the characteristics of these fires so that forest fire prediction and prevention can be efficient. In this study, Ripley's K(d) function and Random Forests (RF) were applied to analyze the drivers, spatial distribution and risk patterns of fires in Yichun, a typical FC in China. The results revealed a clustered distribution of forest fire ignitions in Yichun and identified the driving factors and their dynamic influence on fire occurrence. Fire risk zones were identified based on RF modelling. By considering the factors identified in this study, improved preventive measures can be implemented in fire-prone areas to reduce the risk of fire in Yichun.


Introduction
Forest fires harm forest ecosystems and threaten human life, property and safety (Grogan et al. 2000; Chuvieco et al. 2008; Chas-Amil et al. 2010; Chang et al. 2013). Understanding the drivers that determine the spatial distribution and occurrence of forest fires is essential for successful forest fire prevention, including the optimal allocation of fire-fighting resources and the planning of projects such as the construction of fire lines and watchtowers (Hering et al. 2009). The drivers and spatial distribution of forest fires have been studied in many regions of the world (Martinez et al. 2009; Grala and Cooke 2010; Padilla and Vega-Garcia 2011; Gralewicz et al. 2012; Guo et al. 2016, 2017a, 2017b), but the results vary widely because of differences in meteorology, topography, vegetation, infrastructure and other socio-economic factors, making generalizations challenging (Maingi and Henry 2007; Syphard et al. 2008; Vilar et al. 2010).
A forest city (FC) is a green model of urban development that has been strongly advocated in many nations in recent years. An FC usually refers to an urban area with high forest coverage, where forests play a leading role in urban ecosystem function. Compared with natural forests or remote forested regions, forest fires are more likely to occur in an FC because of the high frequency of human activity and the density of infrastructure (Martinez et al. 2013). Although these fires are generally small owing to early detection, intense suppression efforts and better firefighter accessibility (Massada et al. 2009), every ignition has the potential to grow into a large fire and cause serious harm to human health and property. With the rapid expansion of FC construction nationwide, forest fire prevention and suppression will be an essential task for city security in China, which makes quantifying wildfire risk and understanding fire drivers in FCs particularly important. Identifying forest fire drivers in FCs is usually more challenging than in purely forested areas because of the complex relationships among the climatic, biophysical and social variables that may influence fire occurrence. A fire prediction tool that can fully account for the complex relationships between fire occurrence and its driving factors is therefore needed.
Random Forests (RF) is an ensemble learning method based on classification and regression trees (CART). RF can automatically select important variables and calculate the relative importance of each independent variable, regardless of how many variables are used initially (Cutler et al. 2007). Additionally, RF has been shown to have high prediction accuracy and high tolerance to outliers and 'noise' (Breiman 2001; Archibald et al. 2009). Because of these strengths, fire occurrence studies have begun using this approach in recent years (Oliveira et al. 2012; Wu et al. 2014). Yichun is a typical FC in northeastern China, famous for eco-tourism. Like many of China's FCs, Yichun faces a serious threat from forest fires as it develops. To further understand the fire risks and influencing factors in Yichun, and to provide a reference for other FCs, this study identifies the spatial distribution of historical fire ignitions in Yichun and uses RF to examine an extensive set of factors (in five broad categories: topography, meteorology, vegetation type, infrastructure and socio-economic factors) to assess their influence on the distribution and incidence of forest fire in this region. Specifically, we analyze the spatial distribution pattern of forest fire, identify the drivers of forest fire occurrence and their relative importance, and map fire risk zones in Yichun based on these drivers. The findings can be used to improve forest fire management in FCs in China and serve as a reference for forest fire management in other regions.

Study area
Yichun is a key forestry region in China and is prone to large forest fires because of its climate and vegetation type. It is located in northeastern China (127°37'-130°46'E, 46°28'-49°26'N) and covers an area of 39,017 km² (Figure 1). It is a hilly region with an average altitude of 600 m. The population of Yichun is about 1.2 million, scattered across the landscape (Figure 2(a)), and rivers, railways and roads are widely distributed in the area (Figure 2(b)). Average annual air temperature is 1 °C and average annual precipitation is 750-820 mm. Forest cover is 83.8%, consisting of mixed conifer and broadleaved forests dominated by Korean pine (Pinus koraiensis), along with Larix gmelinii, Pinus sylvestris var. mongolica and Populus davidiana. The fire seasons are April-June and September-October of each year.

Overarching study design
Ripley's K(d) function was used to identify the spatial distribution of forest fires in the study area. RF algorithms were then used to identify key drivers of forest fire occurrence from a broad set of potential factors selected through a literature review. Potential factors (independent variables) that are likely to cause forest fire were classified into three categories, meteorological, non-meteorological (local) and combined factors, and a model was built for each category. RF, implemented with the varSelRF package in R, was used to select drivers and rank their importance, and partialPlot was then used to examine the marginal effect of each variable on fire occurrence (Diaz-Uriarte and Alvarez de Andres 2005; Thompson and Spies 2009). The predictive ability of each model was evaluated using a cut-off point calculated with receiver operating characteristic (ROC) curve analysis. To eliminate bias, we randomly selected 20% of the complete dataset as independent validation data, which was used only to assess the performance of the final predictive model. The remaining 80% of the complete dataset was used for model building and was randomly divided into a training dataset (60%) and an inner validation dataset (40%) (Rodrigues and Riva 2014). This data division was repeated 10 times, resulting in 10 random sub-samples, each with its own training and inner validation dataset, and an intermediate model was fitted to each sub-sample. Variables identified as important in at least six of the 10 intermediate models were considered 'key drivers'; these were used to fit the final model on the model-building dataset (80% of the complete dataset), which was then validated on the independent validation dataset (20% of the complete dataset).
Figure 2. Distribution of (a) towns and villages, (b) railways, roads and rivers, and (c) proportion of vegetation types in the study area.
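The data-division and voting procedure above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: `select_vars` is a hypothetical callback standing in for one varSelRF run, the seed and the rounding of split sizes are arbitrary assumptions, and the toy selector simply returns fixed variable sets so that the six-of-ten vote is easy to follow.

```python
import random

def split_and_vote(n_points, select_vars, n_repeats=10, min_votes=6, seed=42):
    """Hold out 20% for independent validation, then repeat a 60/40 split of the
    remaining 80% n_repeats times and keep variables selected >= min_votes times.

    select_vars(train_idx, inner_idx) is a hypothetical callback standing in for
    one RF variable-selection run; it must return a set of variable names.
    """
    rng = random.Random(seed)
    idx = list(range(n_points))
    rng.shuffle(idx)
    n_indep = round(0.2 * n_points)
    independent = idx[:n_indep]        # used only to validate the final model
    building = idx[n_indep:]           # model-building data (80%)

    votes = {}
    for _ in range(n_repeats):
        sub = building[:]
        rng.shuffle(sub)
        n_train = round(0.6 * len(sub))
        train, inner = sub[:n_train], sub[n_train:]   # 60% training, 40% inner validation
        for var in select_vars(train, inner):
            votes[var] = votes.get(var, 0) + 1

    key_drivers = sorted(v for v, c in votes.items() if c >= min_votes)
    return independent, building, key_drivers


# Toy selector: returns fixed sets so the vote count is easy to follow.
calls = {"n": 0}

def toy_select(train, inner):
    calls["n"] += 1
    chosen = {"Dis_railway", "RH_avg"}      # selected in all 10 runs
    if calls["n"] <= 5:
        chosen.add("Elev")                  # selected in 5 of 10 runs -> dropped
    if calls["n"] <= 7:
        chosen.add("Den_Pop")               # selected in 7 of 10 runs -> kept
    return chosen

indep, build, drivers = split_and_vote(479 + 1000, toy_select)
print(len(indep), len(build), drivers)   # 296 1183 ['Den_Pop', 'Dis_railway', 'RH_avg']
```

Only the index bookkeeping is shown; in practice each callback would fit an RF on the training indices and check it against the inner validation indices.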
The study area was divided into three zones (low, medium and high fire risk) according to the map of fire occurrence probability, and the cutoff value was

Dependent variables
The Forest Fire Prevention Office of Yichun, China, provided forest fire data from 1980 to 2010. The database contains information on fire location, size and cause. According to this database, 479 wildfires occurred in the study area from 1980 to 2010; most of the reported fires were attributed to railways (29.20%), slash burning (28.59%) and unknown causes (29.05%) (Figure 4).
To build a discriminant model of forest fires, reference (non-fire) points were required. ArcGIS software was used to randomly generate 1000 non-fire (i.e. control) points, approximately twice the number of fire ignitions (a fire-to-control ratio of about 1:2) (Catry et al. 2009). To ensure that control points did not fall at or near fire ignition locations, a 1000 m buffer zone was placed around each fire point, and control points falling within a buffer were excluded (Kalabokidis et al. 2007; Guo et al. 2017a, 2017b). Control points were randomized in both space and time: spatial coordinates were generated randomly, dates were drawn randomly from the fire seasons of 1980-2010, and the two were then paired at random.
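A minimal sketch of the control-point generation described above, assuming projected coordinates in metres, a rectangular bounding box in place of Yichun's actual boundary polygon, and the April-June and September-October fire seasons from the study-area description as the pool of dates. It uses rejection sampling against the 1000 m buffers; the paper did this in ArcGIS, and the function names here are hypothetical.

```python
import math
import random
from datetime import date, timedelta

def season_days(first_year=1980, last_year=2010, months=(4, 5, 6, 9, 10)):
    """All calendar days in the assumed fire-season months of the study period."""
    days, d = [], date(first_year, 1, 1)
    while d <= date(last_year, 12, 31):
        if d.month in months:
            days.append(d)
        d += timedelta(days=1)
    return days

def generate_controls(fires, n_controls, bbox, buffer_m=1000.0, seed=1):
    """Draw random control points, rejecting any within buffer_m of a fire point.

    fires: (x, y) ignition coordinates in a projected, metre-based CRS (assumed).
    bbox:  (xmin, ymin, xmax, ymax) rectangle standing in for the study-area boundary.
    Location and date are drawn independently, then paired.
    Note: loops forever if the buffers cover the whole rectangle.
    """
    rng = random.Random(seed)
    days = season_days()
    xmin, ymin, xmax, ymax = bbox
    controls = []
    while len(controls) < n_controls:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if all(math.hypot(x - fx, y - fy) >= buffer_m for fx, fy in fires):
            controls.append((x, y, rng.choice(days)))
    return controls

fires = [(5000.0, 5000.0), (12000.0, 8000.0)]          # hypothetical ignitions
controls = generate_controls(fires, 50, (0.0, 0.0, 20000.0, 20000.0))
print(controls[0])
```

Drawing the date independently of the location mirrors the space-time randomization in the text; with real data the rectangle would be replaced by a point-in-polygon test against the study-area boundary.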

Independent variables
Independent variables fell into five broad categories (topography, vegetation type, infrastructure, meteorology and socio-economic factors), with a total of 27 explanatory variables.
2.3.2.1. Topography. Topographic factors consisted of three explanatory variables: elevation, slope and aspect. Data for these variables came from digital topographic maps with a spatial resolution of 25 m, created in 2000 and provided by the National Administration of Surveying, Mapping and Geoinformation of China. ArcGIS 10.0 software was used to extract the elevation, slope and aspect of each fire and non-fire point from the maps. Elevation and slope values were used directly in the modelling process. Aspect was categorized as flat, North (315°-45°), East (45°-135°), South (135°-225°) and West (225°-315°). The proportion of the study area in each aspect class was calculated, and the value corresponding to the class of each fire and control point was used in the model.
2.3.2.2. Vegetation type. Within the geographical boundary of Yichun, vegetation polygons were grouped into four categories: (1) needleleaf deciduous and needleleaf evergreen trees (12.08%); (2) broadleaf deciduous trees and shrubs (63.52%); (3) grass and agricultural crops (21.83%); and (4) urban construction land, permanent wetland and barren or sparsely vegetated land (2.57%) (Figure 2(c)). The vegetation type of each fire and non-fire point was extracted from the vegetation map using ArcGIS 10.0 software, and the areal proportion of that vegetation type was used in the modelling process.
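The aspect classes and the proportion-based encoding described above can be sketched as below. This assumes that flat cells carry a negative aspect value (ArcGIS's Aspect tool assigns -1 to flat areas) and that each boundary angle belongs to the class starting there, which the text does not specify; the same proportion encoding would apply to the four vegetation categories. The cell values are invented for illustration.

```python
from collections import Counter

def aspect_class(deg):
    """Aspect in degrees -> flat/N/E/S/W. Negative values (ArcGIS uses -1) mean flat.
    Boundary angles are assigned to the class that starts there (an assumption)."""
    if deg < 0:
        return "flat"
    deg %= 360
    if deg >= 315 or deg < 45:
        return "N"
    if deg < 135:
        return "E"
    if deg < 225:
        return "S"
    return "W"

def proportion_encoding(classes):
    """Share of study-area cells in each class; used as that class's numeric value."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

# Hypothetical aspect values for eight raster cells covering the study area
cells = [aspect_class(a) for a in (-1, 10, 50, 100, 200, 300, 350, 20)]
share = proportion_encoding(cells)
value_for_point = share[aspect_class(30)]   # a point facing 30 deg gets the "N" share
print(value_for_point)   # 0.375
```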
2.3.2.3. Infrastructure. Four infrastructure variables were used: the distance from each fire or non-fire point to the nearest railway, the nearest road (including urban main roads and rural roads), the nearest river and the nearest residential area. Basic geographic data were retrieved from a 1:250,000 Digital Line Graphic (DLG) map, created in 2000 and obtained from the National Administration of Surveying, Mapping and Geoinformation of China (http://www.sbsm.gov.cn/). Distances to the nearest railway, road, river and residential area were calculated using ArcGIS.
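For readers who want to reproduce the distance variables outside ArcGIS, a minimal sketch of the nearest-feature distance follows, assuming features are stored as polylines in a projected coordinate system (so planar Euclidean distance is meaningful). Residential areas stored as polygons could reuse the same function on their boundaries. The railway geometry is hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (projected coordinates assumed)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                     # degenerate segment (a == b)
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_feature_distance(p, polylines):
    """Minimum distance from p to any segment of any polyline (each needs >= 2 vertices)."""
    return min(point_segment_distance(p, a, b)
               for line in polylines
               for a, b in zip(line[:-1], line[1:]))

railway = [[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]]   # hypothetical L-shaped track (km)
print(nearest_feature_distance((5.0, 3.0), railway))  # 3.0
```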

Meteorology.
Daily meteorological data for the fire season (March-November) of 1980-2010 were obtained from four national weather stations located in the four main districts of Yichun. Daily climate data were provided by the China Meteorological Data and Sharing Network (www.cma.gov.cn/2011qxfw/2011qsjgx/) and contained the following 12 climate factors: daily average, maximum and minimum ground surface temperature (°C), daily average and maximum wind speed (m s⁻¹), 24 h precipitation (mm), daily average pressure (hPa), sunshine hours (h), daily average, maximum and minimum air temperature (°C) and average relative humidity (%). The daily climate factors for each fire and control point were retrieved in ArcGIS 10.0 and were taken from the meteorological station closest to that point.
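The nearest-station assignment can be sketched as a simple lookup, assuming projected station coordinates and a dictionary of daily records keyed by station and date. The station ids, coordinates and values below are hypothetical; the source does not name the four stations.

```python
import math
from datetime import date

def nearest_station(point, stations):
    """Id of the station closest to point (planar distance; projected coordinates assumed)."""
    return min(stations, key=lambda sid: math.dist(point, stations[sid]))

def daily_climate_for(point, day, stations, records):
    """Daily record of the nearest station; records maps (station_id, date) -> factors."""
    return records[(nearest_station(point, stations), day)]

# Hypothetical station ids, coordinates (km) and values, for illustration only
stations = {"S1": (0.0, 0.0), "S2": (100.0, 0.0), "S3": (0.0, 100.0), "S4": (100.0, 100.0)}
records = {
    ("S2", date(2003, 5, 1)): {"RH_avg": 38.0, "Temp_max": 21.5},
    ("S3", date(2003, 5, 1)): {"RH_avg": 55.0, "Temp_max": 17.0},
}
rec = daily_climate_for((90.0, 20.0), date(2003, 5, 1), stations, records)
print(rec)   # {'RH_avg': 38.0, 'Temp_max': 21.5}
```

For control points, the date used in the lookup would be the randomly assigned fire-season date.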
Monthly weather data for the fire season, including monthly average relative humidity (%), temperature (°C) and precipitation (mm), were also obtained from the HADCM2 climate model, downloaded from the Data Sharing Infrastructure of Earth System Science (http://geodata.nju.edu.cn/Portal/index.jsp). The HADCM2 climate model was developed by the Hadley Centre for Climate Prediction and Research, and the original gridded dataset was obtained from the Intergovernmental Panel on Climate Change (IPCC). The data were converted into text format, and data points for China were extracted at a spatial resolution of 0.085°. The monthly climate factors for each fire and non-fire point were retrieved in ArcGIS 10.0.
2.3.2.5. Socio-economics. Socio-economic factors included three variables: population density, per capita GDP and net income of rural residents. Since most rural areas in Yichun are forested, the net income of rural residents depends heavily on forest-related economic activities, which in turn influence local fire occurrence. As no vector dataset of socio-economic factors was available, we collected the recorded socio-economic data for Yichun from the Heilongjiang Statistical Yearbook (2006). Because these data contain no spatial information, the same value of each socio-economic factor was assigned to all fire and non-fire points in a given year.

Spatial distribution analysis
Ripley's K(d) function is a second-order point-pattern statistic that summarizes the number of events within distance d of a typical event (rather than only the nearest neighbour), and it has been widely used to identify the spatial distribution pattern of fire occurrence (Podur et al. 2003; Vadrevu et al. 2008; Guo et al. 2015, 2017b). The function has been defined in detail in previous studies (Peterson and Squiers 1995; Stoyan and Penttinen 2000; Guo et al. 2015). In this study, the guard area correction, a basic edge correction method, was applied to Ripley's K-function, and SpPack software was used to compute the K-function (Perry 2004). The 95% confidence envelopes were based on 499 replicates.
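To make the guard-area correction and the simulation envelope concrete, here is a minimal NumPy sketch; it is not SpPack's implementation. It assumes a rectangular study window and estimates K(d) as the window area divided by the number of points, times the mean number of other points within distance d of those focal points that lie at least the guard width inside the boundary. The envelope is the pointwise 2.5th-97.5th percentile of K(d) over simulated CSR patterns (one common construction). Under CSR, K(d) is close to πd², so observed values above the upper envelope indicate clustering. The usage example runs 99 simulations instead of 499 to stay quick, and the point pattern is synthetic.

```python
import numpy as np

def ripley_k_guard(points, radii, bbox, guard):
    """Ripley's K(d) with a guard-area (buffer-zone) edge correction.

    Only points at least `guard` from the rectangular window edge act as focal
    points; all points count as neighbours. Intended for radii <= guard.
    """
    pts = np.asarray(points, dtype=float)
    xmin, ymin, xmax, ymax = bbox
    area = (xmax - xmin) * (ymax - ymin)
    n = len(pts)
    inner = ((pts[:, 0] >= xmin + guard) & (pts[:, 0] <= xmax - guard) &
             (pts[:, 1] >= ymin + guard) & (pts[:, 1] <= ymax - guard))
    focal = pts[inner]
    if len(focal) == 0:
        raise ValueError("no points inside the guard area")
    # distances from every focal point to every point
    d = np.linalg.norm(focal[:, None, :] - pts[None, :, :], axis=-1)
    # neighbours within r, minus 1 to drop the focal point itself
    counts = np.array([(d <= r).sum(axis=1) - 1 for r in radii])
    return area / n * counts.mean(axis=1)   # K(r) = mean count / intensity

def csr_envelope(n, radii, bbox, guard, n_sim=499, seed=0):
    """Pointwise 2.5th-97.5th percentile envelope of K(d) under CSR."""
    rng = np.random.default_rng(seed)
    xmin, ymin, xmax, ymax = bbox
    sims = np.array([
        ripley_k_guard(np.column_stack([rng.uniform(xmin, xmax, n),
                                        rng.uniform(ymin, ymax, n)]),
                       radii, bbox, guard)
        for _ in range(n_sim)])
    return np.percentile(sims, 2.5, axis=0), np.percentile(sims, 97.5, axis=0)

# Hypothetical clustered pattern: 20 clusters of 20 points in a unit square
rng = np.random.default_rng(1)
bbox = (0.0, 0.0, 1.0, 1.0)
radii = np.array([0.02, 0.05, 0.1])
parents = rng.uniform(0.1, 0.9, size=(20, 2))
clustered = np.repeat(parents, 20, axis=0) + rng.normal(0.0, 0.01, size=(400, 2))
k_obs = ripley_k_guard(clustered, radii, bbox, guard=0.1)
lo, hi = csr_envelope(len(clustered), radii, bbox, guard=0.1, n_sim=99)
print(k_obs > hi)   # True at a radius means K lies above the CSR envelope there
```

The guard correction trades sample size for simplicity: points in the buffer strip are never focal points, so only radii up to the guard width are reliable.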

Random Forests model
Random Forests (RF), developed by Breiman and Cutler in 2001, is an ensemble learning method based on classification and regression trees (CART). RF has been shown to have high prediction accuracy and high tolerance to outliers and 'noise' (Breiman 2001). It can also evaluate the relationship between covariates and the dependent variable and calculate the relative importance of covariates (Cutler et al. 2007; Kubosova et al. 2010). RF has been applied in many fields, including medicine, genetics, ecology and remote sensing, and in recent years it has been used in forest fire forecasting, where it has shown good predictive ability (Oliveira et al. 2012; Rodrigues and Riva 2014; Kane et al. 2015).
RF draws ntree bootstrap samples from the original dataset and grows an unpruned classification or regression tree on each. The final prediction aggregates the results of all ntree trees (majority vote for classification, average for regression). Each bootstrap sample contains about 63% of the distinct observations in the original data; the remaining observations, roughly one-third of the data, are called out-of-bag (OOB) samples and are used for validation. At each node of each tree, RF randomly samples mtry of the predictors and chooses the best split from among those variables. Fitting and prediction with RF therefore require two parameters: the number of trees (ntree) and the number of variables randomly sampled at each node (mtry). For mtry, Liaw and Wiener (2002) suggest mtry = √m, where m is the number of predictors. As the number of trees increases, the overall error rate converges to a stable value; in other words, as long as the number of trees is large enough, the overall error rate of the forest stabilizes, ensuring the convergence of the RF (Cutler et al. 2007; Genuer et al. 2010; Kubosova et al. 2010). In this study, we set mtry = √m and ntree = 2000.
RF can also be used to rank the importance of variables. RF estimates the importance of a variable by measuring how much the prediction error increases when the OOB values of that variable are permuted. For the jth variable $X^j$, the OOB error $errOOB_t$ of each tree $t$ is first calculated; the values of $X^j$ are then randomly permuted in the OOB data while all other variables are left unchanged, and the OOB error $\widetilde{errOOB}_t^{\,j}$ is recalculated on this permuted dataset. The importance score of $X^j$ is:
$$VI(X^j) = \frac{1}{ntree} \sum_{t} \left( \widetilde{errOOB}_t^{\,j} - errOOB_t \right)$$
where the sum runs over all trees and ntree is the number of trees in the random forest (Gromping 2009; Genuer et al. 2010; Oliveira et al. 2012). For regression, the OOB error is the mean square error (MSE), while for classification it is the misclassification rate.
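The permutation importance defined above can be illustrated with a deliberately simplified forest. In this sketch each tree is a depth-1 stump rather than an unpruned CART tree, purely to keep the code short, and the data are synthetic; only the bagging, the OOB error and the permute-one-variable-and-recompute step follow the description above. It is not the randomForest or varSelRF implementation.

```python
import numpy as np

def fit_stump(X, y, features):
    """Best single split over the candidate features (a depth-1 tree).
    A simplified stand-in for the unpruned CART trees grown by RF."""
    best = None
    for j in features:
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            for lab_left in (0, 1):
                err = np.mean(np.where(left, lab_left, 1 - lab_left) != y)
                if best is None or err < best[0]:
                    best = (err, j, thr, lab_left)
    return best[1:]

def predict_stump(stump, X):
    j, thr, lab_left = stump
    return np.where(X[:, j] <= thr, lab_left, 1 - lab_left)

def oob_permutation_importance(X, y, ntree=100, mtry=None, seed=0):
    """VI(X^j) = mean over trees of (permuted OOB error - OOB error), for each j."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    mtry = mtry or max(1, int(np.sqrt(m)))      # default mtry = sqrt(m)
    vi, used = np.zeros(m), 0
    for _ in range(ntree):
        boot = rng.integers(0, n, size=n)            # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), boot)       # about 37% of rows are out-of-bag
        if oob.size == 0:
            continue
        # a stump has one node, so sampling mtry variables per tree = per node here
        feats = rng.choice(m, size=mtry, replace=False)
        stump = fit_stump(X[boot], y[boot], feats)
        err = np.mean(predict_stump(stump, X[oob]) != y[oob])       # errOOB_t
        for j in range(m):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])                    # permute X^j only
            vi[j] += np.mean(predict_stump(stump, Xp) != y[oob]) - err
        used += 1
    return vi / used    # used == ntree unless some bootstrap had no OOB rows

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)          # only variable 0 carries signal
vi = oob_permutation_importance(X, y, ntree=100, mtry=2)
print(np.round(vi, 3))
```

Variable 0 should dominate because permuting it destroys the only informative split, while permuting noise variables leaves the OOB error roughly unchanged.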
Because RF provides an OOB error estimate, it can also be used to select variables: variables are eliminated iteratively, and a small subset with (near-)minimal OOB error is retained. In this study, the R package varSelRF was used to conduct the variable selection (Genuer et al. 2010). The partialPlot function in the randomForest package was used to construct partial dependence plots showing the relationship between the dependent and independent variables (Hastie et al. 2001; Cutler et al. 2007).
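A partial dependence curve can be computed by fixing one variable at each grid value for every observation and averaging the model output. The sketch below assumes a hypothetical fitted model (`toy_prob`) in place of the RF. As we understand the randomForest documentation, `partialPlot` draws classification curves on a logit scale (for two classes this reduces to half the logit of the class probability), so a `half_logit` variant is included for comparison with such plots.

```python
import numpy as np

def partial_dependence(predict_fn, X, j, grid):
    """Average model output when variable j is set to each grid value for every row,
    leaving the other variables at their observed values."""
    X = np.asarray(X, dtype=float)
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict_fn(Xv).mean())
    return np.array(out)

def half_logit(p, eps=1e-6):
    """0.5 * log(p / (1 - p)): the two-class form of the logit scale described for
    randomForest::partialPlot classification plots."""
    p = np.clip(p, eps, 1.0 - eps)
    return 0.5 * np.log(p / (1.0 - p))

def toy_prob(X):
    # Hypothetical fitted model: ignition probability falls with distance to
    # railway (column 0, km) and with relative humidity (column 1, %)
    z = 1.5 - 0.1 * X[:, 0] - 0.03 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 60, 500), rng.uniform(20, 90, 500)])
grid = np.linspace(0, 60, 7)
pd_prob = partial_dependence(toy_prob, X, 0, grid)
pd_logit = partial_dependence(lambda A: half_logit(toy_prob(A)), X, 0, grid)
print(np.round(pd_prob, 3))
```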

Evaluation of model performance
The prediction accuracy of the RF model for fire occurrence was evaluated on the model-selected variables using ROC curve analysis (Fielding and Bell 2002; Rodrigues and Riva 2014). Youden's criterion, calculated from the sensitivity and specificity along the ROC curve (Youden's J = sensitivity + specificity − 1) (Taube 1986), was used to determine the cut-off value, i.e. the threshold that maximizes J, which was then used to compute the prediction accuracy of RF. A point was classified as a fire occurrence if its predicted probability exceeded the cut-off, and as a non-fire otherwise (Garcia et al. 1995; Catry et al. 2009; Chang et al. 2013).
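The cut-off selection can be sketched as follows, assuming the candidate thresholds are the observed predicted probabilities and that ties in J are resolved in favour of the lower threshold (the text specifies neither). The labels and probabilities are invented for illustration.

```python
import numpy as np

def youden_cutoff(y_true, p_pred):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1.
    Candidate thresholds are the observed predicted probabilities."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(p_pred, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(p):
        pred = p > t                       # "fire" if probability exceeds the cut-off
        sens = pred[y].mean()              # share of fire points predicted as fire
        spec = (~pred[~y]).mean()          # share of non-fire points predicted as non-fire
        j = sens + spec - 1.0
        if j > best_j:                     # strict ">" keeps the lower threshold on ties
            best_t, best_j = t, j
    return best_t, best_j

def accuracy_at(y_true, p_pred, cutoff):
    """Overall prediction accuracy when classifying with p > cutoff."""
    y = np.asarray(y_true).astype(bool)
    return float(((np.asarray(p_pred, dtype=float) > cutoff) == y).mean())

y = [0, 0, 0, 1, 0, 1, 1, 1]                            # hypothetical labels
p = [0.05, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.90]    # hypothetical RF probabilities
t, j = youden_cutoff(y, p)
print(t, j, accuracy_at(y, p, t))
```

Here thresholds 0.30 and 0.45 both reach J = 0.75, which is why an explicit tie-breaking rule is worth stating when reproducing such cut-offs.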
In addition, the area under the ROC curve (AUC) can be considered an indicator of the goodness of model fitting, ranging from 0.5 to 1 (Zhou et al. 2002;Franklin 2010). The greater the AUC value, the better the performance of the model at fitting the data. When the AUC value equals 0.5, the model fit is considered to be equivalent to a random guess. The model is poorly fit when AUC is 0.5-0.69, moderately fit when AUC is 0.7-0.79 and well fit when AUC is 0.
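AUC can be computed without drawing the curve, through its equivalence with the Mann-Whitney statistic: it is the probability that a randomly chosen fire point receives a higher predicted probability than a randomly chosen non-fire point, with ties counted as one half. A minimal sketch:

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC as the probability that a randomly chosen fire point scores higher than a
    randomly chosen non-fire point, counting ties as one half."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

A model whose scores carry no information (for example, identical scores for every point) yields exactly 0.5, matching the random-guess reference above.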

Mapping fire risk zones
A map of the likelihood of fire occurrence was created by interpolating the predicted probabilities with ordinary kriging in ArcGIS 10.0 (Isaaks and Srivastava 1989), on the original probability scale, and combining the result with the previously determined cut-off point for forest fire occurrence. Yichun was divided into three fire risk zones: low (0 to cut-off), medium (cut-off to 0.5) and high (>0.5) fire risk (Chang et al. 2013). The likelihood of forest fire occurrence was then analyzed and high fire risk zones within the study area were identified.
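The two mapping steps can be sketched as below: ordinary kriging of point probabilities, then classification into the three zones. This is an illustration, not ArcGIS's implementation; it assumes an exponential semivariogram with hypothetical sill and range (ArcGIS fits the model from the data), no nugget, and that values equal to a class boundary fall in the lower class. The sample locations, probabilities and the 0.32 cut-off are made up.

```python
import numpy as np

def exp_semivariogram(h, sill=0.05, range_=8.0):
    """Exponential semivariogram with no nugget; sill and range are hypothetical."""
    return sill * (1.0 - np.exp(-h / range_))

def ordinary_kriging(xy, z, targets, **vario):
    """Ordinary kriging estimates at targets from values z observed at xy."""
    xy, z = np.asarray(xy, dtype=float), np.asarray(z, dtype=float)
    n = len(xy)
    # System: [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1]  (weights constrained to sum to 1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), **vario)
    A[n, n] = 0.0
    est = []
    for t in np.asarray(targets, dtype=float):
        b = np.ones(n + 1)
        b[:n] = exp_semivariogram(np.linalg.norm(xy - t, axis=1), **vario)
        w = np.linalg.solve(A, b)[:n]
        est.append(w @ z)
    return np.array(est)

def risk_zone(p, cutoff):
    """low: p <= cutoff; medium: cutoff < p <= 0.5; high: p > 0.5 (boundary handling assumed)."""
    p = np.asarray(p, dtype=float)
    return np.where(p <= cutoff, "low", np.where(p <= 0.5, "medium", "high"))

pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]     # hypothetical sample locations (km)
probs = [0.10, 0.35, 0.60, 0.80, 0.45]                # hypothetical RF probabilities
est = ordinary_kriging(pts, probs, [(2.5, 2.5), (7.5, 7.5), (5, 5)])
print(np.round(est, 3), risk_zone(est, cutoff=0.32))  # 0.32: made-up cut-off
```

Without a nugget, ordinary kriging reproduces the observed value exactly at a sample location, which is a convenient sanity check on the system.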

Spatial pattern of fire occurrence
Ripley's K function was used to analyze the spatial pattern of forest fire occurrence in Yichun, China, from 1980 to 2010 (Figure 5). Figure 5 shows a clear departure from complete spatial randomness (CSR) towards clustering, indicating a clustered distribution of forest fires in the study area.

Meteorological factors
The process of variable selection based on meteorological factors is shown in Appendix Table 2. Based on the iterative results of the ten sub-training sets, eight variables were selected for the training sample: monthly average temperature, monthly average relative humidity, monthly average precipitation, daily maximum ground surface temperature, daily average ground surface temperature, sunshine hours, daily maximum temperature and daily average relative humidity (Appendix Figure 9). In the training sample, relative humidity and temperature emerged as the main driving factors of forest fire during the study period. All variables in Appendix Figure 9 influenced the occurrence of forest fires, but to varying degrees.
Figure 5 (caption fragment): ... with the theoretical Ripley's K-function, representing complete spatial randomness (CSR). The pink line represents the empirical K-function under CSR with green and red lines, which are 95% confidence envelopes. The X-axis represents the distance to the nearest ignition from a given ignition point and the unit is degrees.

Local, non-meteorological factors
The process of variable selection based on local factors is shown in Appendix Table 3. Across the 10 sub-training sets, six variables were selected to fit the training sample: distance from fire points to the nearest railway and to the nearest residential area, net income of rural residents, elevation, per capita GDP and population density (in descending order of importance) (Appendix Figure 10).

Combined factors
The process of variable selection based on combined factors is shown in Appendix Table 4. Appendix Figure 11 shows that, based on the results of the 10 sub-training datasets, eight variables were used to fit the training sample. Distance from fire point to the nearest railway and daily average relative humidity had the greatest influence on forest fire occurrence, followed by distance from fire point to the nearest residential area (non-meteorological) and daily maximum ground surface temperature (meteorological). Compared with these, population density, per capita GDP, net income of rural residents and daily maximum temperature had relatively less influence on fire occurrence, but they were still important driving factors of forest fires.

Model fitting and prediction accuracy
Table 1 shows the predictive ability of the RF models based on meteorological, local and combined factors. Cut-off values were obtained from the sensitivity and specificity values of the ROC curve analysis for the 10 sub-training samples and the training sample of each model, and RF prediction accuracies were computed using these cut-off values. The meteorological model was the least accurate (71.2% to 76.5% for the sub-training samples and 74.2% for the training sample), and the combined-factor model was the most accurate (80.78% to 84.8% for the ten sub-training samples and 82.9% for the training sample) (Table 1).
For the inner and independent validation datasets, the prediction accuracy of the meteorological model ranged from 69.5% to 75.0% across the ten inner validation sub-samples and was 74.8% for the independent validation dataset. For the local-factor model, the range of inner validation accuracies was roughly equal to that of the sub-training samples, and the same was true for the complete sample. Overall, the combined-factor model had the highest prediction accuracy of the three categories (meteorological, local and combined factors), and the meteorological model the lowest (Table 1).

Relationship between driving factors and fire occurrence
Partial dependence plots (Figure 7) show that the distance between fire points and railways is not linearly related to the probability of ignition: the probability drops sharply within 18 km, declines more weakly from 18 to 38 km, and then rises gradually beyond 38 km. The probability of ignition is greatest when the daily average relative humidity is 20%, declines with increasing humidity, and stabilizes above 70%. The distance between fire points and the nearest residential area influences forest fire mainly within 18 km: the probability of ignition declines with increasing distance and stabilizes beyond 18 km. The probability of ignition increases sharply once daily maximum ground surface temperature exceeds 20 °C and maximum air temperature exceeds 3 °C. The probability of fire occurrence drops sharply when the net income of rural residents is <2000 Yuan per month and stabilizes when net income is >1200 Yuan per month. Per capita GDP shows the same pattern as net income, with a steep drop in probability between 0 and 2500 Yuan before stabilizing at higher values. The influence of population density on forest fire is relatively strong below 30 people per km² and declines slightly above 30 people per km².

Distribution of likelihood of fire occurrence
A map of the likelihood of fire occurrence (Figure 8(a)) was produced, representing the probability of a fire occurring at a given location. Fire risk classes were also generated (Figure 8(b)) by dividing the probability map into three classes using the cut-off value from model fitting and a classification point of 0.5 (Chang et al. 2013). These maps show that high fire risk is concentrated in the central, southwestern and northeastern portions of the study area, with further fire-prone zones distributed across the southern and northernmost regions.

Discussion
Distance from fire points to the nearest railway and to the nearest residential area were identified as the main local factors affecting forest fire occurrence. Railways in Yichun predominantly run through forested areas, and about 29.2% of forest fires were attributed to railways (Figure 4). Results show that the influence of railways on fire ignition decreases as the distance between forests and railway tracks increases (0-20 km), indicating that forests closer to railways have a higher chance of fire occurrence (Figure 7). Many studies attribute fire ignition near railways to accidents or negligence (Cardille et al. 2001; Martinez et al. 2009; Arndt et al. 2013). The main causes of fires originating near railways include sparks released by trains' steam engines, fire accidents that occur on board trains, passengers smoking on trains and a lack of control over burning activities near the tracks. The increase in fire occurrence at greater distances (>40 km) from the tracks likely reflects the influence of factors other than railways (Figure 7). The relatively strong influence of residential areas on forest fire ignition (within a 10 km radius of fire points) is likely due to the high intensity of human activities, including clearing weeds around residences, agricultural activities, deliberate fires and other activities (Figure 7). As the distance from residential areas increases, their influence on fire occurrence decreases and then stabilizes (>20 km). Government statistics show that only about 1.02% of forest fire ignitions are caused by fires escaping from residences, which suggests only a minor influence of residential areas on fire occurrence and contradicts our findings (Figure 4). However, the percentage of fire ignitions attributed to unknown causes is much higher (29.05%). Observations and information from local forest officials suggest that these fires of unknown origin are most likely human-caused. The findings of this study, together with local knowledge about forest fire occurrence in this region, indicate that fire prevention efforts should focus more on local residential areas.
This study also found that socio-economic factors, including per capita GDP and population density, affect forest fire occurrence, consistent with previous studies by Syphard et al. (2007) and Chang et al. (2013). In Yichun, most people who earn <1000 Yuan per month depend heavily on forest resources for their income, and many low-income residents practice slash-and-burn agriculture and collect wood in winter. Many studies have considered variables related to population density and fire risk, as a growing population directly increases pressure on forests, potentially increasing the risk of fire ignition (Guyette and Dey 2000; Cardille et al. 2001; Guyette and Spetich 2003). In Yichun, a large number of people reside close to forested areas, and about 28.59% of fires are caused by agricultural activities such as slash-and-burn, underscoring the critical role of human populations in forest fire occurrence.
Roads link the city to surrounding rural and forested areas, which suggests that local roads may affect forest fire occurrence (Oliveira et al. 2012); however, our study did not identify any meaningful relationship between fire points and distance to the nearest road. Syphard et al. (2007) found a similar lack of relationship, in contrast to the findings of Miranda et al. (2012). As suggested by Stephens (2005), the reason may be that the road network is too dense for distance to roads to capture potentially important influences on forest fire; road density around forested areas in Yichun is high (≈0.21 km/km²).
Maximum ground surface temperature (GST_max), maximum air temperature (Temp_max) and average relative humidity (RH_avg) are three important climate factors influencing fire occurrence in Yichun. GST_max and Temp_max have a positive effect on fire occurrence, while RH_avg has a negative effect. High temperatures can increase water loss from plants and decrease the moisture content of potential fuels (e.g. downed woody material), increasing the probability of fire occurrence (Chuvieco et al. 2004). Conversely, high relative humidity increases fuel moisture content, thereby reducing the likelihood of fire (Guo et al. 2017a, 2017b). Previous studies have concluded that temperature and relative humidity do not have a consistent linear relationship with fire occurrence (Castro et al. 2003; Wu et al. 2015), and our results support this finding, showing threshold points at which the trend of the relationship changes (15 °C and 20 °C for GST_max and Temp_max, and 70% for RH_avg).
Distribution patterns of fire occurrence likelihood are critical to forest fire management (Saglam et al. 2008). Understanding this distribution pattern can assist in the optimal allocation of resources (Wei et al. 2008) and in determining the number and location of fire observation towers (Catry et al. 2009), ultimately contributing to more efficient use of financial and human resources. It is also an important basis for delineating forest fire risk zones (Chang et al. 2013). In recent years, forest fire management agencies have paid increasing attention to the prevention and monitoring of forest fires (Chang et al. 2013). The results of this study show that forest fires mainly occur in the central, northeastern and southwestern regions of Yichun, while the southern and northernmost regions also have forest fire incidences, but to a lesser extent. Fire watchtowers and checkpoints should be built, and additional fire-fighting resources should be made available, in these high-risk zones to reduce forest fire incidence and minimize losses.
With government investment in tourism, local GDP and per capita income in Yichun are expected to continue to increase over the coming decades, which may reduce forest fire frequency (Figure 7). However, the population is also expected to grow from its current density of 37 people/km² as a result of the relaxed birth policy introduced a few years ago, which will lead to increased development and promote fire occurrence. According to simulations with various general circulation models (GCMs), the earth's climate will be several degrees warmer by the end of the twenty-first century due to increasing atmospheric concentrations of greenhouse gases (Flannigan et al. 1998). It is therefore reasonable to expect rising temperatures in Yichun over the next few decades, which may also increase fire occurrence. Although changes in certain factors can be anticipated, predicting future fire frequency in Yichun is generally difficult, because fire occurrence results from complex interactions among fuel conditions, topography, human activity and weather (including temperature, relative humidity and precipitation). Consequently, a change in any single factor does not necessarily have a significant impact on fire frequency (Flannigan et al. 1998; Syphard et al. 2007).
More in-depth analyses of the relationships among these factors and their joint effect on forest fire occurrence are required in the future, especially given expected changes such as climate change. This study offers a suitable approach for exploring the characteristics of forest fire occurrence in a typical forest city and provides insight into appropriate forest fire management. These findings may serve as a reference for other regions that already have forest cities or are undergoing forest city construction.

Conclusions
In this study, we used Ripley's K(d) function and RF models to analyze the spatial distribution of forest fire ignitions and to identify the drivers of fire occurrence and their effects in Yichun, China. The main findings are as follows: (1) forest fire ignitions exhibit a clustered distribution across Yichun; (2) overall, local factors had a stronger influence on local fire occurrence than meteorological factors, with distance from fire points to the nearest railway and residential area, elevation, net income of rural residents, per capita GDP and population density identified as the key drivers of forest fire occurrence; (3) RF performed well, predicting fire occurrence in Yichun with high accuracy (82.9%) when all factors were combined; and (4) based on historical data, the central, northern (north, northeast) and southern (south, southwest) regions of the study area were identified as high fire risk zones. Forest management agencies and policy makers should pay particular attention to these regions and allocate additional fire-fighting resources accordingly.
Figure 10. Variable importance measures of local factors from Random Forests sub-samples based on mean decrease in accuracy (X-axis), which quantifies the importance of a variable as the change in prediction accuracy when the values of that variable are randomly permuted relative to the original observations. Dis_railway: distance to railway; Dis_residential: distance to residential area; Elev: elevation; CGDP: per capita GDP; Den_Pop: population density; Income_rural: net income of rural residents.
Figure 11. Variable importance measures of combined factors from Random Forests sub-samples based on mean decrease in accuracy (X-axis), which quantifies the importance of a variable as the change in prediction accuracy when the values of that variable are randomly permuted relative to the original observations. GST_max: daily maximum ground surface temperature; RH_avg: daily average relative humidity; Temp_max: daily maximum temperature; Dis_railway: distance to railway; Dis_residential: distance to residential area; Income_rural: net income of rural residents; CGDP: per capita GDP; Den_Pop: population density.
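The mean decrease in accuracy described in the captions of Figures 10 and 11 can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: it uses scikit-learn's `RandomForestClassifier`, synthetic data in which only the first two hypothetical predictors (rescaled to 0 to 1) affect the fire label, and a held-out test split, whereas Breiman's original RF formulation permutes each tree's out-of-bag samples. The helper `mean_decrease_accuracy` is hypothetical; it shuffles one predictor at a time and records the resulting drop in accuracy, averaged over several permutations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical, rescaled (0-1) predictors; only the first two drive the synthetic label.
names = ["Dis_railway", "Dis_residential", "noise"]
n = 3000
X = rng.uniform(0, 1, size=(n, 3))
logit = 3.0 - 6.0 * X[:, 0] - 3.0 * X[:, 1]  # fire more likely close to railways/residences
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=1)
model.fit(X_train, y_train)
base_acc = (model.predict(X_test) == y_test).mean()


def mean_decrease_accuracy(model, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when each column of X is randomly permuted."""
    perm_rng = np.random.default_rng(seed)
    baseline = (model.predict(X) == y).mean()
    drops = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = perm_rng.permutation(X_perm[:, col])  # break the link to y
            drops[col] += baseline - (model.predict(X_perm) == y).mean()
    return drops / n_repeats


mda = mean_decrease_accuracy(model, X_test, y_test)
print(f"test accuracy: {base_acc:.3f}")
for name, score in sorted(zip(names, mda), key=lambda item: -item[1]):
    print(f"{name:16s} {score:.4f}")
```

scikit-learn also provides `sklearn.inspection.permutation_importance`, which implements the same permutation idea with a configurable scoring metric; the hand-written loop is shown only to make each step explicit.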