Assessment of tropical cyclone disaster loss in Guangdong Province based on combined model

ABSTRACT Tropical cyclone (TC) disaster loss assessment is an important and difficult problem in TC prevention and disaster mitigation, and few studies in this area have focused on combined models. This study introduces a new model-combination method to predict TC disaster loss, taking Guangdong province as an example. We collected and analysed data on 67 TCs that affected Guangdong province from 1993 to 2009, of which 60 were randomly selected as training data and the remaining 7 as testing data. We built three models – GA–Elman neural networks, support vector regression (SVR) and generalized regression neural networks (GRNN) – whose root mean square error (RMSE) values were 5.05, 7.85 and 3.82, respectively. The three models were then combined into a comprehensive evaluation model by a model-combination method, and the RMSE of its test results was 3.30. The results show that the combined model is superior to any individual model and is a more accurate and stable method.


Introduction
Influenced by global warming, atmospheric circulation becomes abnormal and cyclone disasters occur frequently. TCs strike coastal areas every year, causing increasingly serious losses, and have become one of the worst natural disasters in China. It is noteworthy that Guangdong province has suffered the worst from TCs among all provinces. Annual losses caused by TCs accounted for an average of 0.64% of gross domestic product (GDP) in Guangdong province, much higher than the national average of 0.19% (Zhong and Chen 2012).
Currently, cyclone activity is intensified by global warming, and nature's capacity to resist disasters has become fragile under the interference of human activities. Under these two factors, the disasters caused by TCs are becoming more and more serious (Tang and Sui 2014). Therefore, TC disaster assessment is of great importance and in great demand. However, owing to the numerous impact factors, their complex relations and our unfamiliarity with the disaster mechanism, TC disaster assessment has always been a bottleneck in disaster mitigation.
Much related research has focused on methods of disaster assessment. For example, Li and Fang (2012) built a TC loss index formula based on translation speed, duration, size and wind speed. Considering comprehensively the high wind and heavy rainfall caused by tropical cyclones, Zhang et al. established an influence index to evaluate TC disasters (Zhang, Wei and Chen 2010).
Fuzzy mathematics is commonly used as an assessment method. Focusing on China's Zhejiang province, Yu (2015) evaluated TC disasters based on intuitionistic fuzzy theory. Chen et al. (2011) proposed a fuzzy intelligent decision support system for tropical cyclone disaster management and created a database within the system to assess TC risks. They also estimated direct economic losses of tropical cyclone disasters from two aspects: case-based deduction technology and fuzzy theory (Xiao-Chun et al. 2003). Some scholars believe it is difficult to apply linear simulation methods to the problem of TC disaster loss assessment, so various nonlinear methods have been adopted. Luo et al. (2012) built an assessment model of TC disaster loss in Zhejiang province, China, using geographic information system (GIS) technology and support vector machine (SVM), and also tried a PCA-BP method (Zheng 2011). Liu et al. (2013) then used an Elman neural network.
Hence, it is hard to choose assessment factors because of the numerous impact factors of TC disasters and the obscure disaster-causing mechanism. The choices of factors in the existing literature are somewhat flawed. For example, when measuring rainfall, many studies choose only the maximum rainfall as the rainfall assessment factor, and the same is done for gales. It is obviously unreasonable to represent an area by a single point. Therefore, this paper represents wind and rainfall values in areal form. In addition, we add the landing sites of TCs to the assessment factors. For socio-economic factors such as population density, we adopt historical dynamic statistics for the years in which the TCs occurred.
As for prediction assessment models, a single model can only contain or reflect partial information about the features of the whole system. This means it is difficult to capture the law of TC disasters accurately through a single model with its inherent shortcomings. However, by integrating several types of prediction assessment models into a combined assessment model, which exploits the advantages of every individual model, effectively collects more useful information and describes reality more objectively and reasonably, we can obtain a more precise assessment.
Generalized regression neural networks (GRNN), GA-Elman neural networks and support vector regression (SVR) are all theoretically mature nonlinear models and have been studied and applied in different fields. In this paper, these three TC disaster prediction assessment models are constructed respectively, and then a combined model is built on the basis of them, taking Guangdong province as an example.

Generalized regression neural network (GRNN)
Artificial neural network (ANN) technology is a nonlinear dynamical system that combines artificial intelligence techniques such as mathematical statistics, neural computation and symbolic logic. The output of the system depends only on the connection weights between input and output, with nonlinear mapping capability, regardless of the understanding of the object mechanism or the establishment of complex mathematical models; these are its advantages, and the values of the connection weights are obtained by learning from training samples. It is particularly efficient at solving obscure problems in which a law is inherent but the mechanism is unclear. There are several models of neural networks, among which GRNN, proposed by Specht in 1991, is one kind of radial basis function (RBF) neural network.
The notation x and y denotes, respectively, the independent variable and the dependent variable. Assume the joint probability density of x and y is f(x, y), and that the observed value of x is known to be X; then the regression of y on X is the conditional mean

$$\hat{Y}(X) = E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\,\mathrm{d}y}{\int_{-\infty}^{\infty} f(X, y)\,\mathrm{d}y} \qquad (1)$$

where $\hat{Y}(X)$ is the assessment output of Y when the input is X.
Specht pointed out that the continuous probability density function can be estimated from the observed values:

$$\hat{f}(X, y) = \frac{1}{n(2\pi)^{(p+1)/2}\sigma^{p+1}} \sum_{i=1}^{n} \exp\!\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \exp\!\left[-\frac{(y - Y_i)^{2}}{2\sigma^{2}}\right] \qquad (2)$$

where $X_i$ and $Y_i$ are the $i$th observed sample values of the random variables x and y, respectively; $\sigma$ is the smoothing parameter; p is the dimension of the random variable x; and n is the number of samples.
GRNN has a four-layer structure: input layer, pattern layer, summation layer and output layer, as shown in Figure 1.
The number of neurons in the input layer is equal to the dimension of the input vector in the study sample. Acting as simple distribution units, these neurons pass each element of the input vector directly to the pattern layer.
The number of pattern layer units is equal to the number of training samples n, each unit corresponding to a different sample. Substituting estimate (2) into (1) and carrying out the integration yields

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\!\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right]}{\sum_{i=1}^{n} \exp\!\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right]} \qquad (3)$$

so the transfer function of the ith pattern unit is

$$p_i = \exp\!\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right], \quad i = 1, 2, \ldots, n$$

The summation layer consists of two types of units. One computes the denominator of formula (3) by summing all pattern layer outputs; its connection weight to each pattern layer neuron is 1, and its transfer function is

$$S_D = \sum_{i=1}^{n} p_i$$

The other type of unit computes the numerator of formula (3): it takes the jth element $y_{ij}$ of each output sample $Y_i$ as the connection weight and performs a weighted sum over the corresponding pattern layer outputs:

$$S_{Nj} = \sum_{i=1}^{n} y_{ij}\, p_i, \quad j = 1, 2, \ldots, k$$

The number of output layer units is equal to the dimension k of the output vector in the training sample. Each output unit divides the corresponding summation layer outputs by one another, namely

$$\hat{y}_j = \frac{S_{Nj}}{S_D}, \quad j = 1, 2, \ldots, k$$

These relations show that the GRNN topology depends entirely on the training samples, and few parameters need to be adjusted artificially. The only factor affecting the output of the network is the smoothing parameter $\sigma$. As network training is in essence an optimization of the smoothing parameter, this feature means GRNN can, to the greatest extent, prevent the assessment from being influenced by subjective human assumptions.
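The pattern, summation and output layers described above amount, computationally, to a Gaussian-kernel-weighted average of the training targets. The following is a minimal sketch in Python (not the authors' code; the function name and toy data are illustrative):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """GRNN prediction: pattern layer = one Gaussian unit per training
    sample; summation layer = weighted and plain sums; output = ratio."""
    preds = []
    for x in np.atleast_2d(X_query):
        d2 = np.sum((X_train - x) ** 2, axis=1)       # (X - X_i)^T (X - X_i)
        p = np.exp(-d2 / (2.0 * sigma ** 2))          # pattern-layer outputs
        preds.append(np.dot(y_train, p) / np.sum(p))  # S_N / S_D
    return np.array(preds)

# toy data: y = x; a query halfway between two samples lands midway by symmetry
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
pred = grnn_predict(X, y, np.array([[1.5]]), sigma=0.3)
print(pred)
```

Note that the only tunable quantity is `sigma`, mirroring the fact that GRNN training reduces to optimizing the smoothing parameter.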

Elman neural network
The Elman neural network was proposed by J.L. Elman in 1990 (Wang et al. 2014). It is a typical dynamic neural network and can be considered a forward neural network with local memory units and local feedback connections. Elman networks are generally divided into four layers: input layer, hidden layer, context layer and output layer. As shown in Figure 2, the connections among the input, hidden and output layers resemble those of a feedforward network, and the connection weights can be corrected by learning. The context layer receives feedback signals from the hidden layer, and each hidden layer unit connects to a corresponding context layer unit. The function of the context layer is to feed the hidden layer output of the previous moment back into the hidden layer at the current moment through connection memory, which is equivalent to state feedback. This inner feedback makes the network sensitive to historical data and empowers it to handle dynamic information, so as to achieve the purpose of dynamic modelling. Tropical cyclone disaster assessment is essentially a dynamic system. This paper reflects the complex nonlinear relation between the losses that tropical cyclones bring and each factor through this mathematical model, the Elman neural network, trained on samples.
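The context-layer feedback can be sketched as a forward pass in which the hidden state of the previous step re-enters the hidden layer. This is an illustrative sketch only (weight shapes and the random sequence are assumptions, not the paper's configuration):

```python
import numpy as np

def elman_forward(x_seq, W_in, W_ctx, W_out, b_h, b_o):
    """Forward pass of an Elman network over a sequence: the context
    layer stores the previous hidden state and feeds it back in."""
    h = np.zeros(W_ctx.shape[0])         # context starts at zero
    outputs = []
    for x in x_seq:
        # hidden layer sees the current input plus the previous hidden state
        h = np.tanh(W_in @ x + W_ctx @ h + b_h)
        outputs.append(W_out @ h + b_o)  # linear output layer
    return np.array(outputs)

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 2)); W_ctx = rng.normal(size=(4, 4))
W_out = rng.normal(size=(1, 4)); b_h = np.zeros(4); b_o = np.zeros(1)
seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
out = elman_forward(seq, W_in, W_ctx, W_out, b_h, b_o)
print(out.shape)
```

Because `h` persists across steps, the output at each step depends on the whole history of the sequence, which is what makes the model dynamic.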

Genetic algorithm (GA)
Genetic algorithm (Sivanandam and Deepa 2008) is a stochastic global search and optimization algorithm that simulates biological natural selection and genetic mechanisms. It automatically acquires and accumulates knowledge about the search space during the search process and adaptively controls the search in order to reach the optimum solution. Using encoding technology, GA generates chromosome-like strings of numbers, each string being an individual. Every individual is assigned a fitness according to certain rules. To form new individuals, GA exchanges organized and random information between individuals. Individuals with high fitness are more likely to be inherited by the next generation, and thus an optimal or near-optimal solution emerges, similar to biological evolution.
The three basic operations of GA are selection, crossover and mutation. The function of selection is to propagate the best: individuals with higher fitness are more likely to be chosen, reflecting the principle of survival of the fittest. Crossover forms new individuals by combining and exchanging partial digit strings of two individuals, similar to genetic recombination. Mutation changes digit string values at certain positions with a certain probability to generate a new individual, similar to genetic mutation.
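The three operations can be sketched on a one-dimensional, real-coded problem. This is a generic illustration of selection, crossover and mutation, not the encoding or operators used in the paper; the objective function and all hyperparameters are assumptions:

```python
import random

def mutate(x, rate, lo, hi):
    """Mutation: with some probability, jitter the value and clip to bounds."""
    if random.random() < rate:
        x += random.gauss(0.0, 0.3)
    return min(max(x, lo), hi)

def genetic_search(fitness, lo, hi, pop_size=40, gens=80,
                   cx_rate=0.8, mut_rate=0.15):
    """Minimal real-coded GA with tournament selection,
    arithmetic crossover and Gaussian mutation."""
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(gens):
        # selection: binary tournament keeps the fitter of two individuals
        parents = [max(random.sample(pop, 2), key=fitness)
                   for _ in range(pop_size)]
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < cx_rate:
                w = random.random()                 # arithmetic crossover
                a, b = w * a + (1 - w) * b, w * b + (1 - w) * a
            nxt.append(mutate(a, mut_rate, lo, hi))
            nxt.append(mutate(b, mut_rate, lo, hi))
        pop = nxt
    return max(pop, key=fitness)

random.seed(7)
# toy fitness with its maximum at x = 3
best = genetic_search(lambda x: -(x - 3.0) ** 2, 0.0, 10.0)
print(round(best, 2))
```

In the GA-Elman model described below, the same loop would run over encoded network parameters, with the network error serving as (negative) fitness.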

GA-Elman neural network
The Elman neural network model is improved on the basis of GA; that is, GA is first used to optimize the Elman network parameters (Ding et al. 2013). Several groups of initial network parameters are generated randomly, and the network error is used as the fitness to evaluate these initial parameters with GA. After several generations of evolution, the best group of network parameters for the Elman network is found. This group of parameters is then applied to the Elman network and further optimized with the training data. The basic structure of the GA-Elman model therefore consists of an Elman network and a GA optimization part. The GA-Elman algorithm is shown in Figure 3.

SVR
2.3.1. Basic principle of SVM
SVM was first proposed by Cortes and Vapnik (1995). Based on Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle in statistical learning theory, SVM is finally transformed into a quadratic optimization problem; thus, any local optimal solution must also be the global optimal solution. SVM was originally proposed to tackle classification problems, but it can also be applied to regression analysis.
SVM applied to regression problems is called the SVR model (Pai et al. 2010). The basic principle is to transform the input sample x into a high-dimensional feature space through a nonlinear mapping $\Phi(x)$ and then fit $\{\Phi(x_i), y_i\}$ in that feature space. The fitting function is

$$f(x, w) = w \cdot \Phi(x) + b$$

If $f(x, w)$ satisfies $|y_i - f(x_i, w)| \le \varepsilon$ $(\varepsilon > 0)$, an $\varepsilon$-linear regression is found. To obtain the flattest function under the accepted fitting error, slack variables $\xi_i$ and $\xi_i^*$ are added, and the $\varepsilon$-linear regression problem is transformed into the optimization problem

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

$$\text{s.t.}\quad y_i - w \cdot \Phi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\, \xi_i^* \ge 0$$
where C is the penalty factor controlling the influence of the fitting error. The dual form of the optimization problem is obtained via Lagrange multipliers, giving the regression function

$$f(x) = \sum_{i=1}^{n} (a_i - a_i^*)\, K(x_i, x) + b$$

where $a_i, a_i^*$ $(i = 1, 2, \ldots, n)$ are Lagrange multipliers and $K(x_i, x)$ is a kernel function. In the process of mapping the original sample data into the high-dimensional feature space, it is the kernel function that performs the nonlinear conversion and mapping, so the kernel function is critical for handling SVM nonlinear problems. The choice of kernel function and of its parameters directly affects the SVM algorithm. Four kernel functions are in common use: the linear kernel, the polynomial kernel, the RBF kernel and the sigmoid kernel. In this paper, the most widely used and most adaptable kernel, the RBF kernel, is selected.
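An RBF-kernel SVR of the kind described above can be fitted in a few lines with scikit-learn. This is a minimal sketch on synthetic data (the toy data and parameter values are assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.svm import SVR

# toy stand-in for the TC loss data: 60 noiseless samples of y = 2x
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 1))
y = 2.0 * X.ravel()

# epsilon-SVR with the RBF kernel, as selected in the paper
model = SVR(kernel="rbf", C=10.0, gamma=1.0, epsilon=0.01)
model.fit(X, y)
pred = model.predict([[0.5]])
print(pred)
```

Only the support vectors (samples outside the $\varepsilon$-tube) contribute to the fitted function, which is what keeps the model sparse.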

SVM parameter optimization
In the RBF kernel-based SVM model, it is crucial to select the penalty factor C and the kernel parameter g. There is no unified, recognized best way to optimize these two parameters in academia. In this paper, particle swarm optimization (PSO) is chosen to optimize the parameters (Kennedy and Eberhart 1995). PSO simulates the behaviour of a bird flock, treating every candidate solution to the optimization problem as a flying particle, which adjusts itself dynamically according to its own experience and the experience of the optimal particle. After iterations, the optimal solution is found. PSO offers great advantages such as high precision, fast convergence, simple implementation and few parameters.
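The particle update rule can be sketched generically. The objective below is a toy stand-in for the cross-validated SVR error over (C, g); all hyperparameters (inertia `w`, acceleration coefficients `c1`, `c2`) are illustrative assumptions:

```python
import numpy as np

def pso_minimize(f, lo, hi, dim=2, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: each particle is pulled toward its own best
    position (pbest) and the swarm's global best (g)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                               # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()               # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val                      # update personal bests
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g

# toy objective with its minimum at (2, 1), standing in for CV error over (C, g)
best = pso_minimize(lambda p: (p[0] - 2.0) ** 2 + (p[1] - 1.0) ** 2,
                    lo=0.0, hi=10.0)
print(best.round(2))
```

In practice `f` would train an SVR with the candidate (C, g) and return its cross-validated error, as described next.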
However, limited by the training samples and by the random split into training and testing sets, PSO alone is not convincing: its testing results are unstable and highly contingent. To deal with this, cross-validation (CV) is used. The basic principle of CV is to divide the data samples into K groups and to make each subset in turn a testing set, while the remaining K-1 subsets serve as training sets. This produces K SVR models, from which the precision rate or fitness is calculated. In this way, over-learning and under-learning are effectively avoided, and the testing results are more convincing.
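The K-fold scheme combined with a parameter search over (C, g) can be sketched with scikit-learn's grid search utilities. A grid search stands in here for PSO purely to illustrate the CV mechanics; the data and parameter grids are assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# hypothetical data standing in for the TC samples
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 1))
y = np.sin(2.0 * np.pi * X.ravel())

# K-fold CV: each of the K=5 subsets serves once as the test set while
# the remaining K-1 subsets train the model; the search picks (C, gamma)
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.1, 1.0, 10.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```

Averaging the fold scores is what makes the fitness of each candidate (C, g) stable against any single random train/test split.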

Selection of assessment factors
The main disasters caused by TCs are violent rainstorms and high winds. Therefore, the minimum pressure, maximum wind speed, rainfall, duration and scope of tropical cyclones that affected Guangdong province are chosen as the key factors of the assessment index, where the scope is represented by the numbers of sites recording different levels of rainfall and wind speed. Among all factors, rainfall and wind are most closely connected with TC disasters and deserve special emphasis. In the available documents, areas are usually characterized only by maximum rainfall and maximum wind speed because of limitations in the form and number of statistics. Considering that the statistics of a single point say little about the area as a whole, we gather and analyse 26 rainfall and wind speed values for each area, so that an area is represented by the numbers of sites at different levels of rainfall and wind speed. Guangdong has the longest coastline of any province in China, and TCs with different landing sites cause different disasters, so the landing site of a TC is also selected as an assessment factor. Some scholars (Xiao-Chun et al. 2003) have conducted related work with statistical methods and divided Guangdong into three classes: class 1, western Guangdong (Yangjiang-Zhanjiang coastal section); class 2, Pearl River estuary (Huidong-Taishan coastal section); and class 3, eastern Guangdong (Raoping-Haifeng coastal section). In this paper, we consider a larger range of tropical cyclones, so we add two types of TCs defined by their landing sites: class 4, landing along the Fujian coast; and class 5, landing in Guangxi or Hainan province, as shown in Figure 4.
In addition, economic development and population vary across the regions of Guangdong province, so disasters can cause different losses in different regions even when the tropical cyclones are of the same class. Therefore, population, arable land area and GDP are also taken into account in this model, in which the affected area is defined as the places where a tropical cyclone produces process rainfall of more than 50 mm or a maximum wind speed over Level 7. These statistics adopt dynamic values obtained through detailed counting of yearly data rather than static values.
Furthermore, 17 factors are considered in this research: maximum process rainfall A1, maximum daily precipitation A2, number of sites with process rainfall over 300 mm A3, number of sites with process rainfall over 200 mm A4, number of sites with process rainfall over 100 mm A5, number of sites with maximum daily precipitation over 100 mm A6, number of sites with daily precipitation over 50 mm A7, number of sites with process rainfall over 50 mm A8, maximum wind speed A9, number of sites with maximum wind speed over Level 7 A10, number of sites with maximum wind speed over Level 10 A11, minimum pressure A12, impact duration A13 and landing site A14; A15-A17 are, respectively, per capita GDP, per capita arable land area and population density in the affected areas when the tropical cyclones occurred.
As for disaster losses, the most evident loss data are economic losses and casualties. Casualty numbers are highly contingent and uncertain, and with the increasing capability in TC prevention and disaster mitigation they have decreased markedly, so assessing casualty numbers is of little significance. For this reason, the assessment results are based only on direct economic loss, which is set as the output Y of the model.

Data sources
We selected sample data from 67 tropical cyclones affecting Guangdong from 1993 to 2009, because the data collected before 1992 are incomplete. Tropical cyclone path, central barometric pressure, wind and rainfall data are excerpted from the Tropical Cyclone Yearbook and the China Meteorological Data Sharing System; data on direct economic losses are from China Meteorological Disasters Collection: Guangdong Section and the Guangdong Anti-disaster Yearbook; data on GDP, population and arable land are from the Guangdong Statistical Yearbook; and GDP and loss amounts are converted based on the 2009 price index of Guangdong province.

Principal component analysis dimensional reduction
Dimensional reduction is indispensable for simplifying complex data, because the assessment factors are numerous and correlated with each other in some way. After dimensional reduction, synthetic factors are obtained that are mutually independent. One commonly used statistical method of dimension reduction is principal component analysis, which searches for as few orthogonal vectors as possible to represent the features of a multivariate data set.
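The reduction step can be sketched with scikit-learn's PCA. The synthetic data below only mirror the shapes of this study (60 samples, 17 factors); the latent structure and variance threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# 60 samples of 17 correlated factors driven by 5 latent variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 5))
X = latent @ rng.normal(size=(5, 17)) + 0.01 * rng.normal(size=(60, 17))

# keep the fewest orthogonal components explaining 95% of the variance
pca = PCA(n_components=0.95)
Z = pca.fit_transform(X)
print(Z.shape, round(pca.explained_variance_ratio_.sum(), 3))
```

The resulting components `Z` are mutually uncorrelated, so they can replace the original correlated factors as model inputs.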

Combined model construction
Among the 67 tropical cyclone records, 60 are randomly selected as training data and the remaining 7 as testing data to construct three types of assessment models based on GA-Elman, SVR and GRNN. The testing results are shown in Table 1.
To improve accuracy, combination assessment predicts the same object with several different assessment methods and forms a new prediction model via weighted combination. Combination prediction is thus a means of predicting the same object by taking advantage of two or more different single prediction methods. According to the reliability of the different models, the model weights are set as $w_i$, $i = 1, 2, \ldots, n$. The individual prediction results are then weighted and averaged, and the new final prediction result becomes

$$\hat{y} = \sum_{i=1}^{n} w_i y_i$$

where $y_i$ is the prediction result of each single model. As shown in Figure 5, the combined model in this paper is made up of SVR, GRNN and the GA-Elman neural network. The commonly used weighting methods include the equal average method, the variance-covariance method, the reciprocal variance method and the coefficient of alienation method. Among them, the optimal weighting method is more reasonable: it aims to find an optimal weighted combination that minimizes the prediction error, subject to $\sum_i w_i = 1$ and $w_i \ge 0$. With n groups of observed statistics and m prediction methods, let $e_{ij}$ be the error of group j under the ith prediction method; the quantity to be minimized is then

$$Q = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} w_i e_{ij} \right)^2$$
To find the optimal weight group $w_i$, the value of Q should be minimized. In this paper, we use an exhaustive search that takes 0.01 as the step to examine all possible weight groups. We found that when the weights are 0.64, 0.29 and 0.07, Q reaches its minimum, so this is the weight group we want.
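The exhaustive 0.01-step search over the weight simplex can be sketched as follows. The error matrix below is hypothetical, not the paper's test errors, so the resulting weights differ from (0.64, 0.29, 0.07):

```python
import numpy as np

def optimal_weights(E, step=0.01):
    """Exhaustive search for weights (w1 + w2 + w3 = 1, w_i >= 0)
    minimizing Q = sum_j (sum_i w_i * e_ij)^2, where E[i, j] is the
    error of model i on test case j."""
    best_w, best_q = None, np.inf
    for w1 in np.arange(0.0, 1.0 + 1e-9, step):
        for w2 in np.arange(0.0, 1.0 - w1 + 1e-9, step):
            w = np.array([w1, w2, 1.0 - w1 - w2])
            q = np.sum((w @ E) ** 2)      # combined squared error Q
            if q < best_q:
                best_w, best_q = w, q
    return best_w, best_q

# hypothetical errors of three models on 7 test cases
E = np.array([[ 1.0, -2.0,  0.5,  1.5, -1.0,  0.2, -0.5],
              [-1.2,  2.5, -0.4, -1.0,  1.3, -0.3,  0.6],
              [ 0.8, -1.0,  0.3,  0.9, -0.7,  0.1, -0.4]])
w, q = optimal_weights(E)
print(w.round(2), round(q, 3))
```

Because the pure single-model weight vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) are on the grid, the combined Q can never be worse than the best single model.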
The assessment results of the combined model are shown in Table 1. To compare and analyse the models, the root mean square error (RMSE) is used as the standard for judging the effectiveness of the assessment results: the smaller the RMSE, the better the results. As Table 1 shows, the RMSE of the combined model is the smallest. Its assessment results are therefore better than those of the three single models, improving accuracy and reducing the risk of assessment faults.

Conclusion
To better assess tropical cyclone disasters, a tropical cyclone disaster assessment model has been established in this paper based on three types of models: GRNN, GA-Elman neural network and SVR. The results show that a combined model can effectively collect useful information from many models; as a consequence, its assessment accuracy, stability and ability to adapt to changes in dynamic systems are much higher than those of a single model. The combined assessment model built in this paper offers a new way of assessing disasters caused by tropical cyclones. In future work, we will gather more TC disaster statistics, build a model base combining more models, and train randomly combined models to examine their effectiveness, aiming to obtain the best combined model.
Tropical cyclones can be serious disasters, for they endanger people's lives and damage property in coastal areas; thus, it is important for us to study tropical cyclones thoroughly and minimize disaster losses as much as possible. At present, this study remains challenging because of the unclear tropical cyclone mechanism and the limited historical data available for analysis. It is necessary to explore new approaches in both theory building and model building. In addition, the introduction of new analytical tools (such as GIS, remote sensing, etc.) may play an important role in this research.