Predictive applications of Australian flood loss models after a temporal and spatial transfer

ABSTRACT In recent decades, considerable growth in flood losses has drawn increased attention to flood risk evaluation. This study used data-sets collected from Queensland flood events and investigated the predictive capacity of three new Australian flood loss models in assessing the extent of physical damage after a temporal and spatial transfer. The models' predictive power is tested for precision, variation, and reliability. The performance of a new Australian flood loss function was contrasted with two tree-based damage models, one pruned and one un-pruned. The tree-based models are grown based on the interaction of the flood loss ratio with 13 examined predictors drawn from flood specifications, building characteristics, and mitigation actions. Besides an overall comparison, the prediction capacity is also checked for several sub-classes of water depth and several building-type groups. It is shown that considering more details of the flood damage process can improve the predictive capacity of damage prediction models. However, added complexity from parameters with low predictive power may lead to more uncertain results. On the other hand, it is also demonstrated that a probability-analysis approach can make damage models more reliable when they are used across different flood events.


Introduction
Flood is a common natural disaster in Australia, and a frequently occurring natural phenomenon in the world (Baeck et al. 2014; Hasanzadeh Nafari et al. 2015; Bhatt et al. 2016). In recent decades, flood impacts have increased (Kreibich et al. 2007; Cheng and Thompson 2016; McMillan et al. 2016; Mojaddadi et al. 2017), reaching 29% of the total cost of Australian natural disasters (Bureau of Transport Economics 2001). Hence, flood risk evaluation, including hazard assessment and estimation of the associated consequences (Ciullo et al. 2016; Vojtek and Vojteková 2016), has attracted growing attention (Raaijmakers et al. 2008; Merz et al. 2010; Cammerer et al. 2013; Kundzewicz et al. 2013). While much effort has gone into hazard investigation, i.e. models of the probability and intensity of flood, flood loss estimation models are still subject to a high level of uncertainty (Merz et al. 2004; Kreibich and Thieken 2008; Meyer et al. 2013). Loss estimation is needed in cost-benefit analyses of disaster risk reduction measures (Mechler 2016), vulnerability and resilience studies, flood risk analyses, and in the insurance and reinsurance sectors (de Moel and Aerts 2011).
Flood impact can be classified as direct or indirect damage (Thieken et al. 2005; Molinari et al. 2014). Direct losses happen within the flood boundary, due to the physical impacts of water on flooded objects (e.g. humans, properties, building contents, or any other objects), while indirect flood damages can occur outside the flooded area or after the inundation period (Chen et al. 2016; Novelo-Casanova and Rodríguez-Vangort 2016). Both direct and indirect losses can be categorized as tangible or intangible consequences (Gissing and Blong 2004; Thieken et al. 2005). Tangible losses can be estimated in fiscal terms, but intangible losses are non-marketable (Chinh et al. 2015). The focus of this study is on the physical, tangible impacts of flood, and the spatial scale is on the order of individual residential buildings.
Although there is currently no widely accepted method for estimating flood damage in urban areas, most approaches rely on stage-damage functions for simplicity (Luino et al. 2009; Merz et al. 2010; Meyer et al. 2013). Stage-damage functions, which date back to White (1945), are usually based on the level of water and the vulnerability of the buildings at risk (Thywissen 2006). The functions establish a relation between the level of water (i.e. flood magnitude) and the expected damages for specific building vulnerability classes (Thieken et al. 2006; Dewals et al. 2008; Jongman et al. 2012). Nonetheless, there are some exceptions which account for further impact parameters such as flow velocity, water contamination, duration of inundation, individual precautionary behaviour, or early warning time (Cammerer et al. 2013; Merz et al. 2013). Stage-damage functions can be derived from real damage data (i.e. empirical curves), or they can be developed from 'what-if' questions (i.e. synthetic curves) (Smith 1994; Amadio et al. 2016). Each approach has advantages and disadvantages. Flood loss functions can also be grouped as absolute or relative. The absolute type expresses the extent of losses in fiscal terms, while relative functions express the magnitude of damage as a ratio of the asset price, i.e. the replacement or depreciated cost of the property, and are independent of market variations.
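The idea of a relative stage-damage function can be sketched in a few lines. The saturating functional form and parameter values below are illustrative assumptions, not the curves derived later in this paper; they show only how a depth-to-loss-ratio mapping is converted into fiscal terms via the asset value.

```python
import numpy as np

def relative_stage_damage(depth_m, d_max=0.6, k=0.5):
    """Illustrative relative depth-damage curve: returns a loss ratio
    in [0, d_max] that grows and saturates with water depth (metres)."""
    depth = np.maximum(np.asarray(depth_m, dtype=float), 0.0)
    return d_max * (1.0 - np.exp(-k * depth))

# A relative loss estimate becomes a currency figure only when
# multiplied by the asset value (replacement cost), so the curve
# itself stays independent of market variations.
ratio = relative_stage_damage(1.2)   # loss ratio at 1.2 m of water
loss_aud = float(ratio) * 350_000    # hypothetical replacement cost
```

An absolute function would instead return dollar losses directly, tying the curve to a particular market and point in time.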
On the other hand, flood damage might be controlled by a variety of influencing parameters beyond those considered in stage-damage functions (Schröter et al. 2014). Merz et al. (2013) have classified these parameters into flood intensity factors, including depth of water, flow velocity, return period, duration, and contamination of water; and building flood-resistance indicators, including material and characteristics of the property, individual precautionary and emergency actions, early warning time and preparedness, former flood experience of residents, and residents' socio-economic situations. Accordingly, data mining techniques, as effective alternatives to traditional stage-damage functions, have recently been used for exploring the interaction and the importance of different damage-influencing parameters in Germany, the Mekong Delta, and Australia (Merz et al. 2013; Chinh et al. 2015; Hasanzadeh Nafari et al. 2016c; Kreibich et al. 2016). These studies show that the impacts of different affecting factors can be studied effectively with the tree-based data mining technique, which is mostly utilized in water resource studies and hydrology science, but rarely in flood-loss modelling (Merz et al. 2013).
Flood loss models (either stage-damage functions or tree-based models) are sharply restricted to the features of the area of origin (i.e. flood features and building characteristics) (Hasanzadeh Nafari et al. 2016a). Thus, transferring a damage model to a new study area and/or a new flood event does not produce an accurate relationship between the extent of damage and the impacts of flood, unless the model has been calibrated with an empirical data-set collected from the new case study (Oliveri and Santoro 2000; Luino et al. 2009; Cammerer et al. 2013). This loss of accuracy naturally reduces the predictive capacity (Schröter et al. 2014). On the other hand, the largest effect on loss estimation is induced by the shape of the applied damage model, while precision in collecting hydraulic input and flood characteristics is of minor importance (Apel et al. 2009; de Moel and Aerts 2011). Therefore, validation is an important step of model development (Cammerer et al. 2013). However, due to a lack of historical data, little research has been done on the validation of models, especially when they are subjected to a temporal and/or spatial transfer (Merz et al. 2010; Seifert et al. 2010; Meyer et al. 2013), and Australia is no exception.
This study, therefore, attempts to explore the predictive performance of three newly derived Australian flood loss models, one stage-damage function and two tree-based models, after a temporal and spatial transfer. First, all three models are developed and calibrated based on the empirical data-set collected from one flood event that occurred in Queensland at the beginning of 2013. Afterwards, their predictive capacity is compared and contrasted with the official damage data of the 2012 flood event. The models' predictive power is tested regarding the precision, variation, and reliability of the results. The prediction capacity is also checked for some sub-classes of water depth and some groups of building type.

Study area and flood event in 2013
The study area of the 2013 flood event is the city of Bundaberg, on the Burnett River in south-east Queensland. Due to its geographical characteristics (e.g. location and ground elevation), shown in Figure 1, this city has seen several flood disasters in recent years. One of the most significant happened in January 2013, following Tropical Cyclone Oswald and its associated rainfall (Alamdar et al. 2016). The flood had significant negative consequences for Bundaberg's economy: more than 2000 buildings were impacted, and damage to public infrastructure was estimated at around AUD 103 million (Hasanzadeh Nafari et al. 2016a). The maximum flood water level was recorded as 9.53 metres, and the return period was estimated as 100 years (North Burnett Regional Council 2014). An empirical data-set including information on the hazard intensity, the vulnerability of buildings, and the associated damages, used for the models' development, was collected after this flood event.

Study area and flood event in 2012
The second area of study is the city of Roma, and the data-set used for the cross-regional and temporal model validation was collected from a flood event that occurred there in February 2012. The city of Roma lies in the Maranoa region of Queensland, on the Condamine River. The flood event, which was caused by intense rainfall, damaged more than 444 residential properties. It was a rare disaster in Roma's 149-year history, and its return period was estimated as 100 years (North Burnett Regional Council 2014). The inundation boundary is shown in Figure 2.

Empirical damage data collection
The Queensland Reconstruction Authority, the government body responsible for responding to Queensland natural disasters, collected and provided the data-set used in this research, comprising 250 samples from the 2012 flood and 607 samples from the 2013 flood. The official data-set, compiled from post-disaster on-site surveys, includes information on flood impacts (e.g. depth, velocity, contamination), specifications of the affected buildings (type of building and number of storeys, construction material, area of the building, protection of mechanical and electrical utilities, and emergency measures undertaken), and the extent of losses. The empirical data-set expresses the magnitude of losses by recording the post-disaster status of all structural components (i.e. which components are undamaged and which are partially or entirely damaged). Accordingly, the damage ratio was calculated by dividing the replacement cost of the affected components by the total replacement cost of the property (Hasanzadeh Nafari et al. 2016a). Further data about the affected buildings (e.g. age and total replacement value) and the status of their residents was gathered from the National Exposure Information System of Australia database (Dunford et al. 2014).
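The damage-ratio calculation described above can be sketched as follows; the component names, replacement costs, and surveyed damage fractions are entirely hypothetical.

```python
# Hypothetical post-survey component inventory for one building:
# replacement cost and surveyed damage state per structural component.
components = {
    "floor":      {"cost": 25_000, "damaged_fraction": 1.0},  # entirely damaged
    "walls":      {"cost": 60_000, "damaged_fraction": 0.5},  # partially damaged
    "roof":       {"cost": 30_000, "damaged_fraction": 0.0},  # undamaged
    "electrical": {"cost": 15_000, "damaged_fraction": 1.0},
}

# Damage ratio = replacement cost of affected components / total
# replacement cost of the property.
total_cost = sum(c["cost"] for c in components.values())
damaged_cost = sum(c["cost"] * c["damaged_fraction"]
                   for c in components.values())
damage_ratio = damaged_cost / total_cost
```

Because the ratio is formed from replacement costs on both sides, it remains a relative measure, consistent with the relative loss functions discussed earlier.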

Damage models
In this study, the performance of three newly established Australian models, different in approach and complexity, was compared and contrasted with real system data. All models were calibrated and developed based on the same data-set gathered from the 2013 flood event in Queensland. After a spatial and temporal transfer, their predictive capacity was assessed in comparison to the 2012 damage data.

FLFArs
The Flood Loss Function for Australian residential structures (FLFArs) was newly developed by Hasanzadeh Nafari et al. (2016b). FLFArs is an empirical-synthetic model: it was initially developed using a simplified synthetic approach, the sub-assembly method described in the HAZUS manual (FEMA 2012), and the synthetic curves were then calibrated using the data of the 2013 flood event in Queensland. More precisely, the model takes the empirical damage and depth data, stratified by building classification, and uses the chi-square goodness-of-fit test to fit a parameterized function for computing depth-damage estimates.
That paper illustrated a bootstrapping approach that assists in exploring the inherent uncertainty of the empirical data and the associated confidence limits around the parameters of the stage-damage function. For every building type (i.e. one- and two-storey buildings with masonry and timber walls), three stage-damage functions (i.e. most likely, maximum, and minimum damage functions) are depicted. In more detail, for each building type, the related empirical loss values were resampled 1000 times using the bootstrapping approach and the chi-square test, generating 1000 sets of functional parameters. The average of these 1000 parameter sets, which produces the smallest error, was taken as the most likely curve. The function that maximizes the depth-damage relationship was taken as the maximum curve, and the one that minimizes it was taken as the minimum curve (see Figures 3-6). As mentioned, the range of estimates represents the epistemic uncertainty of the empirical data-set. The advantages of this approach over most Australian synthetic models include the ability to utilize empirical data; consideration of the epistemic uncertainty about the depth-damage relationship, yielding robust damage curves; and the capacity to easily change the functional parameters for different characteristics of Australian buildings. The stage-damage functions utilized in this study are the most likely curves shown in Figures 3-6.
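The bootstrapping procedure can be sketched as below. The saturating functional form, the synthetic records, and the use of nonlinear least squares in place of the paper's chi-square goodness-of-fit criterion are all illustrative assumptions; the sketch shows only how resampling produces a distribution of functional parameters from which most likely and bounding curves can be drawn.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def damage_curve(depth, a, b):
    """Parameterized depth-damage function (illustrative saturating form)."""
    return a * (1.0 - np.exp(-b * depth))

# Hypothetical empirical records for one building class: depth (m), loss ratio.
depths = rng.uniform(0.1, 3.0, size=200)
losses = np.clip(damage_curve(depths, 0.6, 0.8)
                 + rng.normal(0.0, 0.05, size=200), 0.0, 1.0)

# Bootstrap: refit the curve on 1000 resamples to obtain a distribution
# of functional parameters; their average plays the role of the
# "most likely" curve, and outer percentiles bound the estimate.
params = []
for _ in range(1000):
    idx = rng.integers(0, len(depths), size=len(depths))
    p, _cov = curve_fit(damage_curve, depths[idx], losses[idx], p0=(0.5, 1.0))
    params.append(p)
params = np.array(params)
a_ml, b_ml = params.mean(axis=0)                 # most likely parameters
a_lo, b_lo = np.percentile(params, 5, axis=0)    # lower bounding curve
a_hi, b_hi = np.percentile(params, 95, axis=0)   # upper bounding curve
```

The spread between the bounding parameter sets mirrors the epistemic uncertainty band the paper reports around its most likely curves.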

Regression trees
Regression trees were grown following the approach of Hasanzadeh Nafari et al. (2016c). Compared to the outcomes of that study, the model has been redeveloped, and its shape has been adapted based on the examined predictors listed in Table 1.
Data mining was carried out using tree-based analysis and the Weka machine learning software algorithms (Kalmegh 2015). Branches were generated in a way that maximizes the predictive capacity of the model, and the damage ratio in every terminal node was predicted as the average of all loss ratios assigned to the node (Kreibich et al. 2016). In tree-based analysis, the overfitting issue needs careful attention (Merz et al. 2013), as it can degrade the prediction capability of a model that is fully developed on one data-set (Breiman et al. 1984; Pal and Mather 2003). Thus, trees should not be complicated by branches which do not enhance the predictive capability of the model (Bramer 2007). On the other hand, trees should not be too simple, and they should take advantage of branches that could enhance the predictive capability of the model (Merz et al. 2013). Accordingly, in this study, to choose the most accurate model with better predictive capacity under the spatial and temporal transfer, two trees (one pruned, with 12 terminal nodes, and one un-pruned, with 21 terminal nodes) were grown and utilized (see Figures 7 and 8). These sizes were selected for giving the minimum error (MAE: Mean Absolute Error) in a 10-fold cross-validation test on the original data-set (i.e. the 2013 flood event data). For the error calculation, the damage records were randomly partitioned into 10 subsets, and 10 iterations of model calibration and testing were carried out. In each iteration, the model was calibrated using nine subsets of the data, while the remaining subset was held out for model testing. In the end, the errors were averaged over all 10 iterations (Refaeilzadeh et al. 2009).
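The tree-size selection by 10-fold cross-validated MAE can be sketched as follows. The data here are synthetic stand-ins (the paper uses Weka; scikit-learn is substituted for illustration), and the predictor count of 13 simply matches the number of examined predictors.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical stand-in for the 2013 calibration data: 13 predictors
# and a loss ratio in [0, 1].
X = rng.uniform(size=(607, 13))
y = np.clip(0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.05, 607), 0, 1)

# Grow trees of increasing size; score each candidate size by 10-fold
# cross-validated MAE and keep the size that minimizes it, mimicking
# the pruning criterion described above.
best_leaves, best_mae = None, np.inf
for max_leaves in range(2, 30):
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves, random_state=0)
    mae = -cross_val_score(tree, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    if mae < best_mae:
        best_leaves, best_mae = max_leaves, mae
```

On the paper's real data, this search returned 12 terminal nodes for the pruned tree, while the un-pruned tree was left at its fully grown 21 nodes.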
As shown in Figure 7, flood depth, precautionary actions, floor space, and quality of property, with five, three, two, and one decision nodes respectively, are the influencing variables of the pruned regression tree. Figure 8 shows the importance of building value, water depth, floor area, quality of property, and precautionary actions in the un-pruned model, with six, five, three, three, and three decision nodes, respectively. As is evident, the advantage of the tree-based models lies in their ability to consider more damage-influencing parameters, although they do not reflect the inherent uncertainty of the data-set.

Validation of the models
As stated earlier, model validation is a major step in model development, but due to a lack of historical data it has been widely neglected. Model validation should reflect the intended purpose of the model and may address either the replicative or the predictive application of a damage model. Replicative validation refers to the performance of the model on the data-set used in model development, while predictive validation assesses the model's capability of predicting an independent data-set (Power 1993). The focus of this work is on the predictive validation approach after a spatial and temporal transfer, and the authors have attempted to explore the suitability of the models as compared to real system data.
To test the predictive capability of the models, 200 sets of 250 affected buildings were randomly drawn from the original data-set; each model was applied to every building record, and the results were calculated and averaged over all samples. Following the approach of Schröter et al. (2014), the models' predictive capacity was tested for the precision of the outcomes, the variation of the residuals, and the reliability of the results. Accordingly, precision was tested with the Mean Bias Error (MBE) and the MAE. The MBE, as the overall bias error, is negative if the predicted damage values are smaller than the actual loss records, and positive if the prediction overestimates. The MAE represents how close the predictions are to the real damage data (Chai and Draxler 2014). The variation of the residuals was checked with the Coefficient of Variation (CV), which represents the extent of variability in relation to the mean of the population; a smaller CV shows a lower spread of prediction errors (Hasanzadeh Nafari et al. 2015). The models' reliability was examined using the Hit Rate (HR), which gives the percentage of damage records included in the 90% range (i.e. the interval between the 5% and 95% quantiles) of predicted values (Schröter et al. 2014). This quantile interval represents the nominal coverage rate of 90% of model outcomes. Accordingly, the model shows perfect reliability if the HR equals 0.9, i.e. the nominal 90% coverage rate of the model outcomes equals the coverage rate of the actual damage records (Thordarson et al. 2012). The models' evaluation criteria are calculated based on Table 2.
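The four evaluation criteria can be sketched as below. The exact formulas follow Table 2 of the paper; here the CV normalization (residual spread relative to the mean observation) is one common convention and should be read as an assumption, and the example inputs are hypothetical.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred, pred_samples=None):
    """MBE, MAE, CV of the residuals, and (optionally) hit rate of the
    observations against a 90% predictive interval (5%-95% quantiles)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    mbe = resid.mean()            # negative => underestimation
    mae = np.abs(resid).mean()    # closeness to the real damage data
    # Assumed convention: residual spread relative to mean observation.
    cv = resid.std(ddof=1) / y_true.mean()
    hr = None
    if pred_samples is not None:  # shape: (n_samples, n_records)
        lo = np.percentile(pred_samples, 5, axis=0)
        hi = np.percentile(pred_samples, 95, axis=0)
        hr = float(np.mean((y_true >= lo) & (y_true <= hi)))
    return mbe, mae, cv, hr

# Hypothetical observed and predicted loss ratios for three buildings.
mbe, mae, cv, hr = evaluation_metrics([0.2, 0.4, 0.6], [0.1, 0.5, 0.6])
```

A model with HR near 0.9 would then be judged reliable in the sense used above: its nominal 90% interval actually covers about 90% of the damage records.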

Results and discussion
The accuracy of the results and the validation of the models' performance were tested three times. First, the overall performance of the aforementioned models, calibrated with the 2013 data, was tested for predicting the extent of losses of the 2012 flood event. Afterwards, the water depth 'd' was divided into six groups (d < 20 cm, 20 < d < 40 cm, 41 < d < 60 cm, 61 < d < 80 cm, 81 < d < 100 cm, d > 101 cm), and the models' outcomes were contrasted with the corresponding damage data. Finally, the buildings were grouped into four classes (one-storey timber, two-storey timber, one-storey masonry, and two-storey masonry buildings) and the evaluation was repeated. Table 3 presents the results of the overall comparison, calculated by averaging the outcomes over all samples.
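The depth binning behind this sub-class evaluation can be sketched with pandas; the six records below are hypothetical, and the bin edges follow the depth groups stated above.

```python
import numpy as np
import pandas as pd

# Hypothetical record table: water depth (cm) and observed loss ratio.
records = pd.DataFrame({
    "depth_cm": [12, 35, 55, 72, 90, 140],
    "loss_ratio": [0.05, 0.10, 0.18, 0.25, 0.30, 0.55],
})

# Bin the depths into the six sub-classes used for the evaluation.
bins = [-np.inf, 20, 40, 60, 80, 100, np.inf]
labels = ["<20", "20-40", "41-60", "61-80", "81-100", ">101"]
records["depth_class"] = pd.cut(records["depth_cm"], bins=bins, labels=labels)

# Per-class mean loss ratio, against which model predictions for the
# same class would be compared.
per_class_mean = records.groupby("depth_class", observed=True)["loss_ratio"].mean()
```

The same grouping pattern applies to the four building-type classes, swapping the depth bins for categorical building attributes.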
Overall, all three models, newly derived for Australian geographical conditions, perform well, with only a slight underestimation, low variation, and acceptable reliability in the results. However, as to precision, Table 3 shows that the pruned tree is better than FLFArs for predicting the 2012 flood event, having lower mean absolute error and mean bias error values. This accuracy is due to considering more influencing parameters and having more complexity, which increases the capability to predict flood damage, especially when the model is transferred in time and space. Also, the un-pruned tree, which was grown fully on the original data, is less precise on an independent data-set. This result confirms the hypothesis of the lower prediction ability of an un-pruned tree on independent system data (i.e. a low generalization capability), signifying a low level of transferability in time and space (higher rigidity to the original system data). The un-pruned regression tree (RTup) might also be subject to an additional source of uncertainty since additional variables are taken into account. It is worth noting that all damage models, on average, show a slight negative bias from the actual damage values, which indicates an approximate 1%-3% underestimation in predictions. The variation of the errors was checked based on the distributions of the residuals (CV), and FLFArs shows more variation in the results.
As stated earlier, the HR indicator, i.e. the percentage of damage records included in the 90% interval of predicted values, was utilized for testing and comparing the reliability of the models' predictions. According to Table 3, the performance of FLFArs is the most reliable, since its HR indicator is 0.79, closest to 0.9, the nominal coverage rate of 90% of model outcomes. This accords with the variation test outcomes. Consequently, the reliability of the models seems to depend more on the model approach. FLFArs, as opposed to the deterministic models (i.e. the pruned regression tree (RTp) and the un-pruned regression tree (RTup)), is a probabilistic dependence model that depicts the most likely relationship among the water depth, the building characteristics, and the percentage of damage (Lehman and Hasanzadeh Nafari 2016). The reliance of this approach on the probability distributions of damage ratios has increased its performance reliability. As mentioned before, the predictive capability has also been studied for some sub-classes of water depth and building characteristics. Figures 9 and 10 show the precision (the Euclidean distance of the MAE and MBE errors) of the results for the six sub-classes of water depth and four groups of building types. Figure 9 shows that the uncertainty of the pruned tree is less than that of the other two models, except when the water depth is between 41 and 60 centimetres or more than 101 centimetres; in these two cases, the FLFArs and RTup approaches show a slightly better performance. Figure 10 also depicts less uncertainty and more accuracy in the results for the transferred pruned tree in contrast to the other two flood damage models. However, FLFArs performs better for the two-storey timber buildings.
The differences above in the magnitude of errors for the critical sub-classes of RTp damage predictions are not substantial, and the results justify the overall better performance of the pruned tree after transfer to a new study area. This accords with the earlier findings. However, the outcomes indicate the use of FLFArs if the critical water depth coincides with the critical building type (i.e. the water depth is between 41 and 60 centimetres or more than 101 centimetres, and the structure is a two-storey building with timber walls).

Conclusions
Flood is a frequently occurring natural disaster with significant adverse consequences for Australian society. Hence, flood risk management and flood risk reduction are attracting growing attention. In this context, damage assessment and loss prediction are important components of flood risk mitigation. Although there is no widely accepted method of flood loss assessment, traditional stage-damage functions, owing to their simplicity, are accepted as the international standard for the estimation of direct losses. These functions estimate the extent of losses by establishing a relationship among water depth, type of building at risk, and magnitude of damage. On the other hand, flood damage is a complicated process that may be controlled by additional influencing parameters which are neglected in the traditional stage-damage functions. In this regard, tree-based analysis and data mining have recently been used to create new flood loss estimation models with more complexity and more damage-influencing parameters.
Although a variety of approaches are used in today's studies, flood damage assessment models are still subject to a high level of uncertainty. Damage models are strongly dependent on the flood features and building specifications of the area of origin, and the shape of the applied damage model has the largest effect on the accuracy of the results. Model validation therefore needs careful attention if the damage model is used in a new study area and/or applied to a new flood event. However, due to a lack of historical data, model validation has been widely neglected in Australia. This study, therefore, has attempted to explore the validation and the predictive performance of three newly established Australian flood loss models, different in approach and complexity (i.e. one stage-damage function and two tree-based models), after a temporal and spatial transfer. All three models were developed and calibrated with the data from the 2013 Queensland flood event. Their predictive capacity was compared and contrasted with the official loss records of the 2012 flood event. The models' predictive power was tested for precision, variation, and reliability.
The flood stage-damage function utilized in this study (FLFArs) is an empirical-synthetic model that relies on the probability distributions of damage ratios. This newly derived model is a probabilistic dependence model that depicts the most robust relationship among water depth, building characteristics, and the percentage of damage. The stage-damage function has been developed by considering the epistemic uncertainty about the depth-damage relationship. In addition to the stage-damage function, two tree-based models have also been grown in this study. The trees are similar in approach and different in complexity, one being a pruned tree with less complexity, and the other an un-pruned, fully grown tree with more complexity. These models are grown on the basis of the minimum value of errors and the importance and influence of 13 candidate predictors (including depth of water, velocity, contamination of flood water, private precautionary actions, emergency actions undertaken, former flood experience, area of building per person, average value of property, quality and resistance of property, and residents' socio-economic situation).
The results show that the pruned tree is better for predicting the 2012 flood event, having lower uncertainty in its results. This accuracy is due to the complexity of the model (i.e. considering more damage-influencing parameters), which increases the capability to predict flood damage, especially when the model is transferred in time and space. The results also confirm the low level of transferability of the fully grown, un-pruned tree, which is due to its low generalization capability. In addition, it has been shown that the performance of FLFArs is more reliable than that of the other two models. Accordingly, the reliability of the models seems to depend more on the model's approach than on its complexity. As stated above, FLFArs, as opposed to the tree-based deterministic models, is a probabilistic dependence model, and its reliance on the probability distributions of damage ratios has increased the reliability of its performance. Besides this overall comparison, this study has also explored the accuracy of the results and compared the performance of the models for some sub-classes of water depth and building type. Generally, the results accord with the overall comparison outcomes. However, the detailed analysis indicates the use of FLFArs for floods with a depth ranging from 41 to 60 centimetres or more than 101 centimetres and for two-storey buildings with timber walls.
All in all, considering more details of the damage process can improve the predictive capacity of Australian flood damage prediction models and enhance their transferability. In this regard, statistical tests need careful attention, since complexity added by parameters with low predictive power might have adverse effects on the outcomes and may increase the uncertainty of the results. Furthermore, reliance on probability analysis can enhance the reliability of damage models when they are applied to different Australian flood events.