Earthquake prediction with meteorological data by particle filter-based support vector regression

Earthquake prediction has long been of interest to scientists as a way to provide timely warnings that save lives and reduce damage. During the last few decades, careful studies have allowed scientists to record and classify the parameters that affect earthquakes. One of the most important of these parameters, the precursor, reflects the variation in the concentration of radon gas released by faults in the earth’s crust. Measuring and comparing this precursor requires the installation of appropriate hardware in


Introduction
Earthquakes are among the most destructive natural disasters. Because of the convergence between the Arabian and Eurasian plates, Iran has been strongly affected by earthquakes. Six earthquakes with a magnitude larger than seven on the Richter scale have been identified since 1997 (Zhou, Thomas, Parsons, & Walker, 2018). Predicting this natural phenomenon would help to mitigate disasters in an area that is seismically highly active. Scientists and researchers have made many efforts to determine the likelihood of an earthquake in terms of magnitude, time and place, although the time of occurrence is still widely regarded as unpredictable.
Scientists have used parameters known as "earthquake precursors" (Lu et al., 2018) to predict earthquakes. During the past few decades, the hope for accurate earthquake prediction has grown considerably with the advancement of computer systems.
Up to now, about 30 precursors have been identified. The probability of an earthquake cannot be predicted by focusing on only one precursor, so, given the importance of the subject, an acceptable technique may be found by improving the precursor systems or by combining the signs. Earthquake processes are complex natural phenomena, and it is difficult to capture a diagnostic precursor, if any, before the occurrence of an earthquake (Asencio-Cortés, Martínez-Álvarez, Morales-Esteban, Reyes, & Troncoso, 2017). In this study, support vector regression (SVR), one of the best-performing predictive methods in several recent papers, is used as the predictor. The SVR training parameters and the data set influence the overall performance of the model (Insom et al., 2015). The aim of this research is to improve the performance of the SVR using a particle filter (PF). In the proposed method, the SVR parameters, which are usually set by trial and error, are determined by the particle filter.
CONTACT Shahaboddin Shamshirband shahaboddin.shamshirband@tdt.edu.vn
The rest of the article is organized as follows. Section 2 reviews the state of the art of earthquake prediction using machine learning algorithms. Section 3 defines precursors and their types. Sections 4 and 5 describe the support vector regression and the particle filter. Section 6 explains the proposed method. Section 7 describes the database used and the accuracy assessment. Finally, the conclusion is given in Section 8.

An overview of earthquake prediction methods
Scholars have used a variety of approaches to earthquake prediction for decades. The prediction methods can be categorized into two groups: (i) traditional methods and (ii) knowledge-based methods (Mehdidoust & Shahbahrami, 2016). The traditional methods rely on historical data obtained from external sensors or on seismic cycles. Recent improvements in technology have enabled researchers to study the causes and symptoms of earthquakes by monitoring the surface of the earth and collecting the required data from orbiting satellites (Ikram & Qamar, 2014; Torabi, Hashemi, Saybani, Shamshirband, & Mosavi, 2018). To demonstrate earthquake precursors through changes in the ionosphere (i.e. the total electron content), a detailed inspection was conducted at the equator. The results indicated that satellite facilities may help to detect precursors in the ionosphere from a few hours up to eleven days before the main shock (Mehdidoust & Shahbahrami, 2016). Another common prediction method is to study the continuous motion of the earth's surface through remote sensing devices such as the Global Positioning System (GPS) (Ikram & Qamar, 2015). GPS pre-signal data were found to help predict the location of an earthquake up to 90 days before its occurrence (Murai, 2010). Research on the surface temperature of the earth using thermal bands in satellite images has noted that, in some cases, temperature anomalies also appear before an earthquake (Tronin, 2010). Several recent works have attempted to extract empirical relationships between the modified Mercalli intensity (MMI) and the peak ground acceleration (PGA) of earthquakes for the Iranian territory. The MMI-PGA relationships generated in those studies are particularly beneficial for damage prediction and for determining engineering parameters when a major earthquake occurs in or near Iran.
Therefore, modification of the seismic building code of Iran becomes vital to reduce the hazards arising from future earthquakes (Nemati, 2016; Vargas, 2017). A recent report addressed animal behavior in this regard. Anomalous animal behavior has been widely observed on the day of an earthquake and even several days before it. In that research, animals were used as intelligent geo-sensors to estimate when and where an earthquake may occur (Cao & Huang, 2018; Mosavi, Bathla, & Varkonyi-Koczy, 2017).
The knowledge-based methods use previous information to predict earthquakes. In other words, these methods employ the features and data available from previous earthquakes over a time interval to predict the probability of an earthquake (Borghi, Aoudia, Riva, & Barzaghi, 2009). An efficient knowledge-based system based on the frequent pattern growth algorithm was used to predict future earthquakes from the data of previous earthquakes; the system was able to predict the probability of an earthquake occurring in a defined range with high accuracy (Tonooka, Palluconi, Hook, & Matsunaga, 2005). A phase-neural classification was used for short-term prediction from stored seismic data including depth, magnitude, location and time; this method was able to predict an earthquake five minutes before its occurrence with an accuracy of 82.86% (Mehdidoust & Shahbahrami, 2016). Another efficient knowledge-based system was obtained by extracting association rules from earthquake data from 1972 to 2013 (Ikram & Qamar, 2015). Another method for earthquake prediction is support vector regression, which is intrinsically a supervised learning algorithm (Dehbozorgi & Farokhi, 2008). Table 1 compares several prediction systems.
A study on the health of lithium-ion batteries applied SVR-PF to state of health (SOH) monitoring and remaining useful life (RUL) prediction and achieved better monitoring and prediction capability than the standard particle filter (Dong, Jin, Lou, & Wang, 2014). The support vector machine is one of the best artificial intelligence tools, and many researchers have sought to improve its performance by combining it with other methods. A recent study tried to improve breast cancer detection by using particle swarm optimization (PSO) to recognize tumor patterns (feature reduction) and a support vector machine to classify the tumors. This method achieved high accuracy (Ahmadi & Afshar, 2016).

Earthquake precursor
Any parameter that changes before an earthquake is called a "precursor"; an earthquake can therefore be predicted by examining and measuring such parameters and finding the relationships between them and the earthquake. More than 30 precursors have been identified, but most of them are still at the research stage, even though a strong relationship between these precursors and earthquakes has been proven (Mehdidoust & Shahbahrami, 2016). It is important to note that the uncertainty is due to the complexity of geological phenomena. However, prediction may become more practical when it is limited to certain regions with known geological patterns. The types of earthquake precursors are described below (Mehdidoust & Shahbahrami, 2016).

Types of earthquake precursors
Precursors are diverse and complex, and studying them requires the cooperation of specialists in various scientific fields. In Iran, a specialized research group works under the supervision of the Institute of Geophysics of the University of Tehran. Precursors can be divided into several categories according to the scientific field required for their review and analysis, such as changes in the amount of radon gas in underground water, changes in groundwater temperature, groundwater changes, foreshocks before the main earthquake, the magnitude of the foreshocks, the number of foreshocks, the lack of an earthquake in areas with foreshocks due to the presence of a fault, effects on clouds due to the reaction of atmospheric gases with leaded ions released with radon gas, changes in air temperature and pressure, wind speed, relative humidity changes, bird fluttering, and earthworm outflow.

Support vector regression
The support vector machine (SVM) is a supervised learning method used for classification and regression. It has been reported to be more effective than ANNs such as perceptron neural networks. The purpose of an SVM is to find a hyperplane that completely separates a data set. The construction of an SVM has been described in many publications (Insom et al., 2015). The SVM classifier is based on linear classification of the data, selecting the separating line with the widest margin. The support vector regression used in this study is a generalization of the binary support vector classifier. In classification, the inputs lie in an n-dimensional space and the outputs are +1 or −1 (binary). In regression, the outputs are real-valued, so infinitely many output values are possible; the model can be used to estimate the values of a function (other uses include modeling and predicting time series).
In general, there are three main ideas behind support vector machines: defining an optimal hyperplane, extending this definition to non-linearly separable problems, and mapping the data to a high-dimensional space where it is easier to classify with a linear decision surface (Sayad, 2017). As shown in (1), the data lie in a continuous space in regression.
$$\{x_i, t_i\}, \quad x_i \in \mathbb{R}^m, \quad t_i \in \mathbb{R} \qquad (1)$$
In this equation, $x_i$ represents the input and $t_i$ the target. The aim is to estimate $y_i$ from the real data, as shown in equation (2).
Equation (2) is related to linear regression. Using the kernel trick, which is also used in this research, a nonlinear regression problem is turned into a linear one. One disadvantage of this method is the need to adjust its parameters, a problem that this study attempts to address (Insom et al., 2015). The benefits of support vector regression are as follows:
• Training is relatively simple in this method.
• Unlike neural networks, it does not get stuck in local optima.
• For small data sets, the results are better than those of other methods.
• The tradeoff between complexity and error is clearly controlled.
Support vector regression is a generalization of support vector classification, and its objective function is defined as the minimization of equation (3).
The parameters δ and C are explained in section 4.1, and w is defined according to equation (4).
The values $x_i$ and $y_i$ are the inputs, and the coefficients $\alpha_i$ are obtained from the dual problem of equation (4).
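For reference, the linear model, the objective and the expansion of w discussed above are commonly written in the standard ε-insensitive SVR form shown below. This is a sketch of the textbook formulation rather than a reproduction of the authors' equations: the bias $b$, the slack variables $\xi_i, \xi_i^{*}$ and the second multiplier $\alpha_i^{*}$ are assumed notation and may differ from the symbols used in the original.

```latex
\begin{aligned}
y_i &= w^{\top} x_i + b \\[4pt]
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \\
\text{subject to} \quad
  & t_i - y_i \le \varepsilon + \xi_i, \qquad
    y_i - t_i \le \varepsilon + \xi_i^{*}, \qquad
    \xi_i,\ \xi_i^{*} \ge 0 \\[4pt]
w &= \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) x_i
\end{aligned}
```

In this form, C weighs the total slack against the flatness term $\tfrac{1}{2}\lVert w \rVert^{2}$, and only the samples with $\alpha_i \neq \alpha_i^{*}$ (the support vectors) contribute to w.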

Support vector regression parameters
The parameters C, epsilon, and kernel scale are estimated by the particle filter to improve the performance of the support vector regression. These values are usually determined by trial and error. The kernel function, which is used to map non-linear data so that they can be treated linearly, was selected manually. A brief explanation of these parameters is given below:

Parameter C
This parameter is a compromise between training error and model complexity (Insom et al., 2015). A larger C reduces the final training error, but increasing C too much raises the risk of losing the ability to generalize, because the support vector regression then tries to fit every point as closely as possible.
In addition, a larger C increases the training time. If C is too small, the model tolerates larger training errors and may underfit. The value of C should therefore be chosen so that the training error is small while the model still generalizes well. A large value of C is shown in Figure 1(A) and a small one in Figure 1(B).

Epsilon
This parameter belongs to the loss function and is related to the accuracy of the approximation. The loss becomes zero if the predicted value lies inside the ε-tube (between −ε and +ε, as shown in Figure 2). In general, the accuracy of the predicted outputs and the number of support vectors depend directly on this parameter; a support vector regression with an appropriate ε value can provide a smooth output inside the ε-tube (Insom et al., 2015). As shown in Figure 2, if ε is chosen larger than the range of the training data, a good result cannot be expected, and if it is set to zero, overfitting is expected: the model has more degrees of freedom than the real system, so it performs better on the training data than on the test data. The ε-insensitive loss function, proposed by Vapnik (2013), is the most frequently used function to quantify the empirical risk and measure the quality of estimation (Akay, Abut, Cetin, Yarim, & Sow, 2017). The corresponding condition is given in equation (3).
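The behavior of the ε-tube can be illustrated with a few lines of Python. This is a minimal sketch of the standard ε-insensitive loss; the function name and the sample magnitudes are illustrative assumptions, not values from the study.

```python
def epsilon_insensitive_loss(target, prediction, epsilon):
    """Standard epsilon-insensitive loss: zero inside the tube, linear outside."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return max(0.0, abs(target - prediction) - epsilon)


# Hypothetical magnitudes: the first residual (0.25) lies inside a tube of
# half-width 0.5, the second residual (1.0) lies outside it.
inside = epsilon_insensitive_loss(4.0, 4.25, epsilon=0.5)
outside = epsilon_insensitive_loss(4.0, 5.0, epsilon=0.5)
```

With ε = 0 the loss reduces to the absolute error, so every training point influences the fit, which is consistent with the overfitting risk described above.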

Kernel function
Kernel functions are used to map nonlinear data into a space where they can be treated linearly. Previous studies showed that the earthquake is a nonlinear phenomenon. There are several kernel functions, such as the polynomial and hyperbolic tangent kernels. In this study, the Gaussian radial basis function was selected by trial and error. This function is expressed as equation (4). Kernel methods are a class of algorithms for pattern analysis in SVM, and the selection of the kernel function is a very important step (Feizizadeh, Roodposhti, Blaschke, & Aryal, 2017).
where $S \in \mathbb{R}^N$ represents an N-dimensional data vector and each sample belongs to a class with label $Y_i$. The parameter that specifies the kernel function is the standard deviation σ (Insom et al., 2015).
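A small sketch of a Gaussian radial basis kernel is given below. It assumes the common parameterization K(x, z) = exp(−‖x − z‖² / (2σ²)); other conventions for the scale exist, and the exact form used by the authors may differ. The function name and the sample vectors are illustrative.

```python
import math


def gaussian_rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel, assuming K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(x) != len(z):
        raise ValueError("x and z must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two hypothetical feature vectors compared under a narrow and a wide kernel.
narrow = gaussian_rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=0.5)
wide = gaussian_rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=5.0)
```

A larger σ makes distant samples look more similar (the kernel value moves toward 1), which gives a smoother regression function.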

Kernel scale
This parameter, known as σ (sigma), determines the output of the support vector machine through the kernel function explained above. The kernel parameter defines the non-linear mapping from the input feature space to a high-dimensional feature space (Tharwat, Hassanien, & Elnaghi, 2017).

Particle filter
The particle filter is a method for estimating the state of dynamic systems. The difference between this method and a state observer (state estimator) is that the particle filter treats the system as one with random behavior in the presence of noise rather than as a typical linear system. However, this randomness is not severe. The noise affects both the state equation and the measurement equation, as shown in equation (5). The measurement noise can be defined in the same way as for other measured quantities such as dimensions and weights. Particle filters are sequential importance sampling methods (Morzfeld, Hodyss, & Snyder, 2017).
In this equation, $x_k$ is the current state and $y_k$ is the output of the particle filter. $f_k$ and $h_k$ are the process and measurement functions, respectively, $u_k$ is the input, and $w_k$ and $v_k$ are the noise terms affecting the two equations.
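The predict, update and resample cycle of a particle filter can be sketched as follows. This is a generic bootstrap particle filter for a scalar state, under the assumptions that both noise terms are additive and Gaussian and that resampling is multinomial; the toy system, the function names and all numeric values are hypothetical and are not taken from the study.

```python
import math
import random


def gaussian_pdf(value, mean, std):
    """Density of a normal distribution, used here as the measurement likelihood."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, control, measurement, f, h,
                         process_std, measurement_std, rng):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: push each particle through the process model f and add noise w_k.
    predicted = [f(x, control) + rng.gauss(0.0, process_std) for x in particles]
    # Update: weight each particle by how well h(x) explains the measurement y_k.
    weights = [gaussian_pdf(measurement, h(x), measurement_std) for x in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles with probability proportional to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Toy system (assumed): a constant hidden state of 2.0 observed with noise.
rng = random.Random(0)
true_state = 2.0
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
for _ in range(30):
    y = true_state + rng.gauss(0.0, 0.3)
    particles = particle_filter_step(
        particles, control=0.0, measurement=y,
        f=lambda x, u: x + u, h=lambda x: x,
        process_std=0.05, measurement_std=0.3, rng=rng)
estimate = sum(particles) / len(particles)
```

After a few cycles the particle cloud concentrates around the hidden state, and the mean of the particles serves as the state estimate.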

The proposed method
The performance of a support vector regression depends on its parameters and the data set. The match between the support vector regression parameters and the data affects the model, so parameter estimation is an important and necessary step for achieving a strong correlation. Usually, these parameters are set by the user or by trial and error. The most widely used method is grid search; however, the range of the parameters is still chosen by the user through trial and error, which fails to achieve an ideal result. In the particle filter based support vector regression, by contrast, the main parameters are selected by a particle filter according to the particle weights, which are obtained by calculating the probability density function (PDF). The particle filter is often used to estimate the state of a dynamic system through a prediction process and an update process, which are applied iteratively. Using these two processes, the state vector formed by the estimated parameters is updated at each iteration based on the weight of each particle, which results from evaluating the PDF on the model output and the correct value of the system. Each iteration of the particle filter updates the parameters toward a more reasonable value. Finally, the support vector regression is built with the parameters predicted by the particle filter. The parameter calculation by the particle filter is defined in equations (6) and (7).
where $x_k = [\sigma_k \; C_k \; \varepsilon_k]$ is the state vector at moment k and $w_k$ is the nonlinear noise with zero mean and variance Q, which describes the fluctuation of the support vector regression parameters. $y_k$ is the measurement vector, i.e. the degree of accuracy of the support vector regression training function $h_k$, and $v_k$ is the nonlinear prediction error with zero mean and variance R. Figure 3 shows how the method works. The process of predicting and updating the particle weights continues at every iteration, guided by the minimum error. At each iteration, the support vector machine is trained and its predicted values are compared with the actual values, which yields the error estimate. Depending on the data and their number, choosing too many iterations relative to the number of data leads to an overfitted model that does not produce correct results on the test data. Using the obtained weights, the support vector model is built with the estimated parameters, retrained, and tested as the final model. The parameters set in the proposed method are given in Table 2. The three parameters marked as unknown in Table 2 are the values selected by the particle filter; they are set to zero at the beginning. It should be noted that the noise value is given in equation 5.
[Figure caption: Tradeoff between training error and the complexity of the model]
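The idea of letting a particle filter search for the state vector [σ, C, ε] can be sketched as below. This is an illustrative simplification and not the authors' implementation: `toy_validation_error` is a hypothetical stand-in for "train an SVR with these parameters and measure its error", the state transition is assumed to be a Gaussian random walk, the weights are assumed to follow a Gaussian likelihood centered on zero error, and the particles start from a spread of values rather than from zero. All ranges and step sizes are assumptions.

```python
import math
import random


def toy_validation_error(sigma, c, epsilon):
    """Hypothetical stand-in for training an SVR and returning its error.

    A smooth bowl with its minimum at sigma=2, C=10, epsilon=0.1.
    """
    return ((sigma - 2.0) ** 2 + ((c - 10.0) / 5.0) ** 2
            + ((epsilon - 0.1) * 10.0) ** 2)


def tune_with_particle_filter(error_fn, n_particles=200, n_iterations=40,
                              walk_std=(0.2, 1.0, 0.02), error_std=0.5, seed=0):
    """Search for (sigma, C, epsilon) with a bootstrap particle filter."""
    rng = random.Random(seed)
    lower = (0.01, 0.01, 0.0)  # keep sigma and C positive, epsilon non-negative
    particles = [(rng.uniform(0.1, 5.0), rng.uniform(0.1, 30.0),
                  rng.uniform(0.0, 1.0)) for _ in range(n_particles)]
    best = min(particles, key=lambda p: error_fn(*p))
    best_error = error_fn(*best)
    for _ in range(n_iterations):
        # Predict: random-walk state transition x_k = x_{k-1} + w_k, clipped.
        moved = [tuple(max(lo, v + rng.gauss(0.0, s))
                       for v, s, lo in zip(p, walk_std, lower))
                 for p in particles]
        # Measure: evaluate the model error for every particle.
        errors = [error_fn(*p) for p in moved]
        # Update: Gaussian likelihood of the error, so low error means high weight.
        weights = [math.exp(-0.5 * (e / error_std) ** 2) for e in errors]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles
        # Remember the best parameter vector seen so far.
        i = min(range(n_particles), key=errors.__getitem__)
        if errors[i] < best_error:
            best, best_error = moved[i], errors[i]
        # Resample in proportion to the weights.
        particles = rng.choices(moved, weights=weights, k=n_particles)
    return best


best_sigma, best_c, best_epsilon = tune_with_particle_filter(toy_validation_error)
```

In practice `error_fn` would wrap the actual SVR training and validation, which makes each iteration far more expensive than in this toy setting; that cost is one reason the number of iterations has to be limited.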

Study area and dataset
The data used in this study were extracted from two Iranian databases: climatic data from the Islamic Republic of Iran Meteorology Center (www.irimo.ir) and seismic data from the Iranian Seismological Center (www.irsc.ut.ac.ir), covering 2006-2014. The area under consideration is around Tabriz, in the range of 37-39° latitude and 45-48° longitude. Tabriz is located in the northwest of Iran and, as shown in Figure 4, lies on an active fault (shown with a red line). The seismic data included the number and magnitude of the earthquakes, and the meteorological data included the mean temperature, maximum temperature, minimum temperature, average wind speed and precipitation. Figure 4 illustrates the 1385 earthquakes recorded in this area.
More information on the collected data is shown in Table 3.
Features 6 and 7 were used as outputs and the rest as inputs.

Accuracy assessment
The following figures display the results of the regression model in terms of the correlation between the actual and predicted outputs. The model for the average magnitude of earthquakes is presented first. In this study, an accuracy of 96% was achieved using the two databases. The drop in accuracy is related to a severe anomaly in November 2013, when the study area experienced an unprecedented 477 earthquakes. This anomaly was more evident in predicting the number of earthquakes. The model was trained with 80% of the data, randomly selected from the entire data set, and tested with the remaining 20%. The proposed method was applied to the test data, and the results for the mean magnitude are shown in Figures 5 and 6.
As shown, the method predicted the correct values with a high accuracy of 96% on the test data (Figure 6(C)). The method was used under the same conditions and assumptions to predict the number of expected earthquakes in a month; the results are shown in Figures 7 and 8. These figures show the actual values and the targets estimated by the optimized support vector regression; except for one case, the prediction was performed appropriately. The same data (with the same train/test split) were analyzed with other data mining methods, including the multilayer perceptron and standard support vector regression. Tables 4 and 5 report the regression index, which indicates the accuracy of the estimates, for the test data.
As shown in Tables 4 and 5, the proposed method increased the accuracy of the prediction. In this comparison, the difference between the support vector machine with parameters determined by trial and error and the proposed method was quite evident.
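The random 80/20 separation into training and test data used in this evaluation can be sketched as follows. The function name, the fixed seed and the record count of 108 (nine years of monthly records, assuming 2006-2014 inclusive) are assumptions made for illustration.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical monthly record indices: 9 years x 12 months (assumed count).
months = list(range(108))
train, test = split_train_test(months)
```

Reusing the same seed for every compared method keeps the split identical across models, which matches the requirement that all methods are evaluated on the same train and test data.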
The R index charts in Figures 6 and 8 represent the agreement (correlation) between the outputs of the method and the actual values. R is a standard measure for describing linear relations and is not used for nonlinear data. This criterion distinguishes four cases:
• Complete and positive correlation if R = 1
• Incomplete and positive correlation if 0 < R < 1
• Complete and negative correlation if R = −1
• Incomplete and negative correlation if −1 < R < 0
In general, the negative or positive sign only indicates the direction of the correlation. No single appropriate value of R can be defined, but the closer to one, the better. This index is defined in equation (8).
In this case, x and y are the values being compared, and the parameters in equation (8) can be calculated according to equations 9-13, including:
$$\overline{x \cdot y} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i$$
$$S_y^2 = \overline{y^2} - \bar{y}^2 \qquad (12)$$
Here n indicates the number of samples.
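The correlation index can be computed from the quantities named above. The sketch below assumes the usual Pearson form R = (mean(x·y) − mean(x)·mean(y)) / (S_x · S_y), with S² defined as the mean of the squares minus the square of the mean; the function names and the sample values are illustrative.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def correlation_r(x, y):
    """Pearson correlation built from means, the mean of x*y and the variances."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x, mean_y = mean(x), mean(y)
    mean_xy = mean([a * b for a, b in zip(x, y)])
    var_x = mean([a * a for a in x]) - mean_x ** 2   # S_x^2
    var_y = mean([b * b for b in y]) - mean_y ** 2   # S_y^2
    if var_x <= 0.0 or var_y <= 0.0:
        raise ValueError("R is undefined when one of the inputs is constant")
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


# Hypothetical actual and predicted values with a perfect linear relation.
r_positive = correlation_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_negative = correlation_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

The two calls illustrate the complete positive and complete negative cases from the list above.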
For the other indicators, the mean squared error and its root are denoted MSE and RMSE. The MSE represents the difference between the real and predicted values in regression, as given in equation (14).
where $\theta_1$ and $\theta_2$ are the real and predicted vectors, and n is the number of data points. The root mean squared error is another criterion used to examine the error of the method and can be calculated according to equation (15).
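Both error measures are straightforward to compute. The sketch below uses the standard definitions (the mean of the squared differences, and its square root); the function names and sample values are illustrative.

```python
import math


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the units of the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical actual and predicted monthly values.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
rmse = root_mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Because the RMSE is in the same units as the predicted quantity, it is usually easier to interpret than the MSE when comparing models.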

Conclusion
By studying and reviewing the history of earthquake prediction, one can identify signs of impending earthquakes that have long been noticed by ordinary people and researchers alike. The signs or parameters that vary before and after an earthquake are known to researchers as earthquake precursors. About 30 precursors have been identified and evaluated. The data evaluated in this study were extracted from the meteorological and seismological organizations of Iran and include meteorological precursors and seismological history. The study area is located in Tabriz, northwest of Iran. Support vector regression is one of the best methods for prediction, but its parameters play an important role in the accuracy of the model; they are a delicate aspect of the method and are often determined by trial and error. In this study, the particle filter method was used to optimize the support vector regression performance. The resulting model increased the accuracy in predicting the magnitude and number of expected earthquakes in a month, with an accuracy of 96% for the mean magnitude and more than 78% for the number of earthquakes. The lower accuracy for the number of earthquakes is due to a seismic anomaly in November 2013, when the studied area witnessed more than 477 earthquakes, which may be due to human error in data recording. This study demonstrated the relationship between meteorological data and the occurrence of earthquakes and predicted the number and magnitude of earthquakes with high accuracy using the proposed method based on artificial intelligence. The database of this study was limited to monthly data; with access to daily data, the proposed method would probably be able to produce results at a daily resolution. Different results may also be found using the other precursor types described in section 3.

Disclosure statement
The authors declare no conflict of interest.