Dam deformation analysis based on BPNN merging models

Abstract
Hydropower has made a significant contribution to the economic development of Vietnam, so monitoring the safety of hydropower dams is important for the country and its people. In this paper, the horizontal displacement of a dam is analyzed and forecast using three methods: the multi-regression model, the seasonal integrated auto-regressive moving average (SARIMA) model, and back-propagation neural network (BPNN) merging models. Monitoring data from the Hoa Binh Dam in Vietnam, including horizontal displacement, time, reservoir water level, and air temperature, are used for the experiments. The results indicate that all three methods can approximately describe the trend of dam deformation, although their forecast accuracies differ. Their short-term forecasts can therefore provide valuable references for dam safety.


Introduction
Dam deformation is influenced by many complex impact factors, which fall into three groups: (1) Trend: under gravity and water pressure, the dam tends to creep upstream or downstream. (2) Periodicity: the continuing effects of temperature variation and changes in water level cause significant cyclical fluctuations in dam displacement that recur annually. (3) Randomness: factors such as sunshine, wind direction, and seepage also cause dam displacement and exhibit random fluctuations. Accordingly, the physical model of dam deformation must account for: (1) the main factors that cause the deformation, such as loading, time, and reservoir pressure; (2) factors that cannot be altered but can be accounted for, such as environmental conditions; and (3) unexpected factors, such as earthquakes and storms, whose influence cannot be predicted in advance.
Once the physical relationship between dam deformation and its impact factors is established, the model for analysis and prediction can be determined. To date, many methods, including multi-regression models, statistical models, and neural network models, have been used successfully to analyze and forecast dam deformation.
Multi-regression models have a long and rich history in the analysis of dam observations. They are built from the impact factors that affect dam deformation. Liu et al. (2010) established a multiple linear regression model that forecast dam deformation effectively and with high accuracy. Zheng and Jin (2010) applied a nonlinear regression model with first-order auto-regressive error, denoted AR(1) error, to GPS monitoring data of a dam to improve the precision of analysis and forecasting. Li, Wang, and Liu (2013) demonstrated that the error correction model had better forecasting capacity than the multiple linear regression model. In general, however, the primary analysis models (which usually adopt linear or nonlinear regression, sinusoidal, or exponential functions) are too simple to reflect the actual progressive movement of a dam during construction and operation.
The seasonal integrated auto-regressive moving average (SARIMA) model is the seasonal case of the integrated auto-regressive moving average (ARIMA) model. Although ARIMA models have been widely applied to dam deformation analysis and forecasting, SARIMA models have rarely been used for deformation monitoring, especially of critical structures such as dams. Chen et al. (2014) proposed applying the multiplicative SARIMA model to the analysis and forecasting of dam displacement. Their analysis of the radial displacement monitoring data of one specific dam showed that the multiplicative SARIMA model can predict the displacement tendency with reasonable accuracy and can produce more precise predictions. Although there is abundant research on the analysis and forecasting of dam deformation, the literature on the SARIMA model is comparatively scant, and its capabilities need further study because adequate analytical results are lacking.
In recent years, using the back-propagation neural network (BPNN) to analyze dam observation data has become a new research subject. Jang et al. (2010) applied an improved BPNN model to deformation monitoring of an earth-rock dam, and their analysis indicated that the improvements are of considerable practical value. Hu et al. (2012) discussed the BPNN and a merging model that combines the BPNN algorithm with statistical models; their results showed that the merging model performs better than the single models.
In this paper, the concepts and methods of the multi-regression model, the SARIMA model, and the BPNN are introduced in Section 2. In Section 3, the multi-regression model and the SARIMA model are applied to the data-set of a hydropower dam in Vietnam, and their accuracy is discussed in detail. The BPNN merging models are then built from the impact factors and the results of these two models. The results show that the merging models have better capacity than the single models in long-term forecasting.

Methods of deformation analysis and forecast
According to the causes of deformation and technical specifications for dam safety monitoring, such as the technical specification for concrete dam safety monitoring, dam deformation is usually attributed to three components: (1) the water level or hydraulic pressure component caused by the upstream reservoir; (2) the temperature component caused by thermal expansion; and (3) the aging component caused by the operation of the dam.
The following models are used, separately or in combination, to determine the physical relationship between these impact factors and dam deformation.

Multi-regression model
At present, the statistical model divides dam deformation into three parts according to its causes: the reservoir pressure component, the temperature component, and the aging component. This model has been widely adopted in dam deformation analysis. Let $\delta(t)$ denote the observed deformation of an earth-rockfill dam at time $t$; it can usually be decomposed into three components as follows (Lu 2003; Zhang and Hu 2013):

$$\delta(t) = \delta_H(t) + \delta_T(t) + \delta_A(t) \qquad (1)$$

The aging component $\delta_A(t)$ originates from the time-dependent behavior of the dam. The following function is appropriate for earth-rockfill dams:

$$\delta_A(t) = c_1\sqrt{t} + c_2\sqrt[3]{t} + c_3\ln t \qquad (2)$$

The hydraulic component $\delta_H(t)$ is a function of the water level in the reservoir and can be modeled by a simple polynomial:

$$\delta_H(t) = \sum_{i=1}^{3} a_i H^i \qquad (3)$$

where $H$ is the water depth in front of the dam, or the reservoir water level.
The air temperature component $\delta_T(t)$ can be modeled in various ways. Because the effect of air temperature on dam deformation is delayed, we choose as impact factor $T_i$ the air temperature $i$ days before the observation (Ma, Wang, and Chen 2009). The response delay of dam deformation to changes in air temperature is then taken into account as:

$$\delta_T(t) = \sum_{i \in \{1,\,15,\,30,\,60\}} b_i T_i \qquad (4)$$
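As an illustration of how such a regression can be fitted, the sketch below builds a design matrix from the candidate factors named in Section 3 ($\sqrt{t}$, $\sqrt[3]{t}$, $\ln t$, $H$, $H^2$, $H^3$, $T_1$, $T_{15}$, $T_{30}$, $T_{60}$) plus an intercept and solves it by ordinary least squares with NumPy. The function names, the intercept column, and the lookup of lagged temperatures in a daily series (`daily_temp` indexed by `obs_day`) are assumptions for illustration; the authors fitted their model in SPSS.

```python
import numpy as np

TEMP_LAGS = (1, 15, 30, 60)  # temperature lags in days, as listed for the Hoa Binh model


def mtr_design_matrix(t, H, daily_temp, obs_day, lags=TEMP_LAGS):
    """Columns: [1, sqrt t, cbrt t, ln t, H, H^2, H^3, T_i for each lag].

    t          : time since the start of monitoring (must be > 0 for ln t)
    H          : reservoir water level at each observation
    daily_temp : hypothetical daily air-temperature series
    obs_day    : index into daily_temp of each observation day
    """
    t = np.asarray(t, dtype=float)
    H = np.asarray(H, dtype=float)
    obs_day = np.asarray(obs_day, dtype=int)
    temps = np.asarray(daily_temp, dtype=float)
    lagged = [temps[obs_day - i] for i in lags]  # T_i = temperature i days earlier
    cols = [np.ones_like(t), np.sqrt(t), np.cbrt(t), np.log(t), H, H**2, H**3, *lagged]
    return np.column_stack(cols)


def fit_mtr(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef
```

With reservoir levels on the order of 100 m, $H^3$ is of order $10^6$, so centering or scaling $H$ before building the matrix improves numerical conditioning.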

Seasonal integrated autoregressive moving average (SARIMA) model
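Let $Y_t$ denote the observed time series and $e_t$ an unobserved white noise series, that is, a sequence of independent, identically distributed, zero-mean random variables. Note that a process is (weakly) stationary if its mean is constant and its auto-covariance structure depends only on the time lag, not on absolute time. A series that is partly auto-regressive and partly moving average has the general form:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \qquad (5)$$

where $\phi_i$ and $\theta_i$ are the auto-regressive and moving average coefficients, with $\phi_p \neq 0$ and $\theta_q \neq 0$, and $e_t$ is the white noise term.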
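The ARMA recursion can be simulated directly, which is a convenient way to check identification tools on series with known orders. The sketch below follows the Cryer and Chan (2008) sign convention (moving-average terms subtracted); the function name, the Gaussian noise, and the burn-in length are illustrative assumptions.

```python
import numpy as np


def simulate_arma(phi, theta, n, sigma=1.0, burn_in=200, seed=None):
    """Simulate Y_t = sum_i phi_i Y_{t-i} + e_t - sum_j theta_j e_{t-j}.

    phi, theta : AR and MA coefficient sequences (orders p and q)
    burn_in    : initial samples discarded so the zero start-up values do not matter
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    e = rng.normal(0.0, sigma, total)  # white noise e_t
    y = np.zeros(total)
    for t in range(total):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y[t] = ar + e[t] - ma
    return y[burn_in:]
```

For an AR(1) process with $\phi_1 = 0.5$, the sample lag-1 autocorrelation of a long simulated series should be close to 0.5.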
$Y_t$ is then called a mixed auto-regressive moving average (ARMA) process of orders $p$ and $q$, abbreviated ARMA($p$, $q$) (Cryer and Chan 2008). If the $d$-th difference $W_t$ of the series $\{Y_t\}$,

$$W_t = \nabla^d Y_t, \qquad \nabla Y_t = Y_t - Y_{t-1} \qquad (6)$$

follows an ARMA($p$, $q$) model, then $\{Y_t\}$ is called an integrated ARMA model, abbreviated ARIMA($p$, $d$, $q$).
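Using the backshift operator $B$ (with $B Y_t = Y_{t-1}$), these models can be rewritten as follows (Cryer and Chan 2008):

$$\phi(B)(1-B)^d Y_t = \theta(B)\, e_t, \quad \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \quad \theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$$

The simple non-seasonal models above are introduced first; seasonal models are considered next. For non-stationary seasonal processes, the seasonal difference of period $s$ of the series $\{Y_t\}$ is denoted $\nabla_s Y_t$ and defined as:

$$\nabla_s Y_t = Y_t - Y_{t-s} \qquad (10)$$

The seasonal ARIMA model is a special case of ARIMA models that combines the two differences in Equations (6) and (10):

$$W_t = \nabla^d \nabla_s^D Y_t \qquad (11)$$

The seasonal ARIMA model is abbreviated SARIMA($p$,$d$,$q$)($P$,$D$,$Q$)$_s$, with non-seasonal orders $p$, $d$, $q$, seasonal orders $P$, $D$, $Q$, and seasonal period $s$, and can also be expressed by the following equation:

$$\phi(B)\,\Phi(B^s)\,\nabla^d \nabla_s^D Y_t = \theta(B)\,\Theta(B^s)\, e_t \qquad (12)$$

where $\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$ and $\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}$ are the seasonal auto-regressive and moving average polynomials.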
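The combined operator $\nabla^d \nabla_s^D$ (applied later with $d = D = 1$ and $s = 12$) can be sketched with NumPy slicing. The function name is hypothetical. Because both operators are linear and time-invariant, they commute, so the order of application does not change the result.

```python
import numpy as np


def difference(y, d=1, D=1, s=12):
    """Apply d regular differences (nabla) and D seasonal differences (nabla_s)."""
    w = np.asarray(y, dtype=float)
    for _ in range(d):
        w = w[1:] - w[:-1]      # nabla Y_t = Y_t - Y_{t-1}
    for _ in range(D):
        w = w[s:] - w[:-s]      # nabla_s Y_t = Y_t - Y_{t-s}
    return w
```

Each regular difference shortens the series by one value and each seasonal difference by $s$ values. A series made of a linear trend plus a 12-month cycle is reduced to zeros by $\nabla\nabla_{12}$.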

Back-propagation neural network
The back-propagation neural network (BPNN) is probably the most popular artificial neural network. Its structure is illustrated in Figure 1.
BPNN is a supervised learning method that can be described as a multi-layer network whose weights are optimized iteratively. Its implementation can be divided into two phases: the forward (propagation) phase and the backward (error-minimizing) phase. The back-propagation algorithm can be decomposed into four steps (Graupe 2013).

Feed-forward computation (or forward pass)
The first step has two parts: the first computes the values of the hidden-layer nodes, and the second computes the values of the output-layer nodes from those hidden-layer values. Both parts use Equations (13) and (14), where $\mathrm{net}_j$ denotes the weighted input of node $j$:

$$\mathrm{net}_j = \sum_i W_{ij} X_i \qquad (13)$$

$$O_j = f(\mathrm{net}_j) \qquad (14)$$
where $W_{ij}$ is the weight of the connection from node $i$ in the preceding layer to node $j$ in the current layer; $X_i$ is the output of node $i$ in the preceding layer; $O_j$ is the output of node $j$ in the current layer; and $f$ is the activation function, commonly the sigmoid function defined as:

$$f(x) = \frac{1}{1 + e^{-cx}} \qquad (15)$$

where the constant $c$ can be selected arbitrarily.
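A NumPy sketch of Equations (13) to (15) for a network with one hidden layer follows. Bias terms are omitted because the text does not mention them, and the row-vector weight layout and function names are assumptions for illustration.

```python
import numpy as np


def sigmoid(x, c=1.0):
    """Sigmoid activation f(x) = 1 / (1 + exp(-c x)), Equation (15)."""
    return 1.0 / (1.0 + np.exp(-c * np.asarray(x, dtype=float)))


def forward(x, W_ih, W_ho, c=1.0):
    """One forward pass through a single hidden layer (no bias terms).

    x    : input vector X_i, shape (n_in,)
    W_ih : input-to-hidden weights, shape (n_in, n_hidden)
    W_ho : hidden-to-output weights, shape (n_hidden, n_out)
    """
    hidden = sigmoid(np.asarray(x, dtype=float) @ W_ih, c)  # net_j, then O_j for hidden nodes
    output = sigmoid(hidden @ W_ho, c)                      # net_j, then O_j for output nodes
    return hidden, output
```

With all weights set to zero, every net input is zero and every node outputs $f(0) = 0.5$, which is a quick sanity check of the wiring.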
In this way, the value of the output layer is calculated. If this value differs from the expected value, we continue with the second step.

Back propagation to the output layer
This step calculates the error of the output value using Equation (16). This error is then propagated backward and used to adjust the weights.
where $\varepsilon_j$ is the error function of output neuron $j$; $O_j$ is the computed value of output neuron $j$; and $d_j$ is the desired (target) value of output neuron $j$.
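Equation (16) is not reproduced in this text. Assuming it takes the usual delta-rule form for the sigmoid of Equation (15), whose derivative is $c\,f(1-f)$, the output-layer error and the squared-error objective it descends can be sketched as follows; the function names are hypothetical.

```python
import numpy as np


def output_errors(O, d, c=1.0):
    """Delta-rule error of sigmoid output neurons: c * O_j (1 - O_j) (d_j - O_j)."""
    O = np.asarray(O, dtype=float)
    d = np.asarray(d, dtype=float)
    return c * O * (1.0 - O) * (d - O)


def squared_error(O, d):
    """Network error E = 1/2 * sum_j (d_j - O_j)^2, minimized by back-propagation."""
    diff = np.asarray(d, dtype=float) - np.asarray(O, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))
```

The factor $O_j(1 - O_j)$ vanishes when an output neuron saturates at 0 or 1, which is one reason plain back-propagation can stall and why improvements such as momentum or LM are used.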
The error is propagated from the output layer to the hidden layer by Equation (17). The rate of change is then found and the weights are updated as in Equation (18), which introduces the learning rate and the momentum.
where $\Delta W_{ij}$ is the correction of the weight $W_{ij}$ and $\eta$ is the learning rate.
where $W^{k+1}_{ij}$ is the new weight used in training epoch $k + 1$; $W^k_{ij}$ and $\Delta W^k_{ij}$ are the weight and its correction in the current epoch $k$, respectively; $\alpha$ is the momentum coefficient, approximately 0.9, used to reduce the tendency to instability; and $\delta W$ is the change in the weight correction, computed by Equation (19).

The monitoring data of the Hoa Binh Dam (horizontal displacement, reservoir water level, and air temperature) were acquired from September 1999 to December 2010; in total, 137 periods of displacement of monitoring point P12, located on the dam surface, are used for the experiment.

Deformation analysis using multi-regression model
Using Equation (1), the three components are determined as follows: (1) the aging component, as in Equation (2); (2) the hydraulic component, with factors $H$, $H^2$, and $H^3$; and (3) the temperature component, with factors $T_1$, $T_{15}$, $T_{30}$, and $T_{60}$. In summary, the multi-regression (MTR) model of dam deformation is a linear combination of these factors (Equation (21)). Least-squares estimation was used to fit the MTR model. Because $\sqrt[3]{t}$, $H^2$, and $T_{15}$ are strongly collinear with other variables, they were automatically excluded. The remaining parameters were then estimated, giving the final MTR model of Equation (22). The fitting of the MTR model in Equation (22) was conducted in the SPSS software. The fitting process and its precision are shown in Figure 5 and Table 4, respectively, and the forecast results of the MTR model are shown in Table 5.
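The text does not say how collinearity was measured. A common diagnostic, assumed here, is the variance inflation factor $\mathrm{VIF}_k = 1/(1 - R_k^2)$, where $R_k^2$ comes from regressing factor $k$ on the other factors plus an intercept; SPSS reports the equivalent tolerance, $1 - R_k^2$. A NumPy sketch with a hypothetical function name:

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X (without an intercept column): 1 / (1 - R_k^2)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    vif = np.empty(m)
    for k in range(m):
        y = X[:, k]
        # Regress column k on an intercept and all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vif[k] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vif
```

Factors such as $\sqrt{t}$, $\sqrt[3]{t}$, and $\ln t$, which are all smooth increasing functions of time, would be expected to show very large VIFs when entered together.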

Deformation analysis using SARIMA model
The displacement series of point P12 is shown in Figure 2; it is clearly non-stationary. Because the impact factors of dam deformation all have annual periods, a regular first difference followed by a seasonal difference with s = 12 months was applied using the R programming language, as shown in Figure 3.
The most popular unit-root test for stationarity is the augmented Dickey-Fuller (ADF) test; percentage points of its limiting distribution are tabulated in Fuller (1996). The ADF unit-root test was therefore applied to the series after both differences, with the results given in Table 1.
The ADF test statistic is smaller than all of the critical values, so the unit-root hypothesis is rejected and the differenced series is stationary; hence d = 1 and D = 1.
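To show what the ADF statistic in Table 1 measures, the sketch below computes it by ordinary least squares on the regression $\Delta y_t = a + \gamma y_{t-1} + \sum_i b_i \Delta y_{t-i} + e_t$ and returns the t-statistic of $\gamma$. The function name and default lag count are assumptions; the critical value of about $-2.86$ is the approximate asymptotic 5% value for the constant-only case. Library implementations, such as `adfuller` in `statsmodels.tsa.stattools`, also select the lag order and report critical values.

```python
import numpy as np

ADF_CRIT_5PCT = -2.86  # approx. asymptotic 5% critical value, constant and no trend


def adf_statistic(y, lags=1):
    """ADF t-statistic for gamma in
    dy_t = a + gamma * y_{t-1} + sum_{i=1..lags} b_i * dy_{t-i} + e_t.
    A value below the critical value rejects the unit root (series stationary).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]
    cols = [np.ones(len(target)), y[lags:-1]]   # intercept and lagged level y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])            # lagged differences dy_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_gamma
```

Applied to the doubly differenced displacement series, a statistic below the critical value corresponds to the conclusion drawn from Table 1.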

Back propagation to the hidden layer
This step is somewhat more complicated to implement than the previous one. The errors of the hidden-layer nodes are calculated from the output-layer errors and the current (not yet updated) hidden-to-output weights.
where $\varepsilon_i$ and $O_i$ are the error and output of hidden neuron $i$, respectively; $\varepsilon_j$ is the error of output neuron $j$ connected to hidden neuron $i$; and $W_{ij}$ is the weight of the connection from hidden neuron $i$ to output neuron $j$. Once these errors are known, the rate of change is calculated for every weight, and the weights between the input and hidden layers are updated using Equations (17-19).
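Steps 2 and 3 can be paired with a weight update in the common momentum form $\Delta W_{ij}(k) = \eta\,\varepsilon_j O_i + \alpha\,\Delta W_{ij}(k-1)$. This standard formulation is assumed for illustration and may differ in detail from Equations (17) to (19); the helper names are hypothetical. The hidden-layer errors use the hidden-to-output weights before they are updated, consistent with the weight-update rule that follows.

```python
import numpy as np


def hidden_errors(O_hidden, eps_out, W_ho, c=1.0):
    """eps_i = c * O_i (1 - O_i) * sum_j eps_j W_ij, using the current (pre-update) W_ho.

    W_ho has shape (n_hidden, n_out), so W_ho @ eps_out sums over output neurons j.
    """
    O_hidden = np.asarray(O_hidden, dtype=float)
    back = np.asarray(W_ho, dtype=float) @ np.asarray(eps_out, dtype=float)
    return c * O_hidden * (1.0 - O_hidden) * back


def momentum_update(W, eps_next, O_prev, prev_delta, eta=0.1, alpha=0.9):
    """Delta W_ij = eta * eps_j * O_i + alpha * (previous Delta W_ij).

    O_prev are the outputs of the preceding layer (index i) and eps_next the
    errors of the following layer (index j). Returns (new W, Delta W).
    """
    delta = eta * np.outer(O_prev, eps_next) + alpha * prev_delta
    return W + delta, delta
```

For the input-to-hidden weights, `eps_next` is the vector of hidden errors and `O_prev` the input vector; for the hidden-to-output weights, they are the output errors and the hidden outputs.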

Weight updates
Here, it is important not to update any weights until all errors have been calculated, because the results would not be valid if new weights were used in the error calculation.
The new error is then determined using the updated weights, and the four steps are iterated until the error reaches zero or falls below a desired threshold, which is usually very small.
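Putting the four steps together gives a compact training loop. The sketch below is a minimal batch version with momentum, assuming one hidden layer, a bias supplied as a column of ones in the inputs, small random initial weights, and a stopping rule based on the mean squared error; all of these are illustrative choices, and the paper itself trained its networks with the LM algorithm in Matlab.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_bpnn(X, D, n_hidden=4, eta=0.5, alpha=0.5, tol=1e-4, max_epochs=5000, seed=0):
    """Batch back-propagation with momentum; stops when the MSE falls below tol.

    X : inputs, shape (n_samples, n_in); append a column of ones for a bias
    D : targets in (0, 1), shape (n_samples, n_out)
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    W2 = rng.normal(0.0, 0.5, (n_hidden, D.shape[1]))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    n = len(X)
    history = []
    for _ in range(max_epochs):
        H = sigmoid(X @ W1)                         # step 1: forward pass
        O = sigmoid(H @ W2)
        mse = float(np.mean((D - O) ** 2))
        history.append(mse)
        if mse < tol:                               # stop below the error threshold
            break
        eps_out = O * (1 - O) * (D - O)             # step 2: output-layer errors
        eps_hid = H * (1 - H) * (eps_out @ W2.T)    # step 3: hidden errors, old W2
        dW2 = eta * (H.T @ eps_out) / n + alpha * dW2   # step 4: update all weights
        dW1 = eta * (X.T @ eps_hid) / n + alpha * dW1
        W2 = W2 + dW2
        W1 = W1 + dW1
    return W1, W2, history
```

All errors are computed before either weight matrix changes, which is exactly the ordering required by the weight-update rule.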
Among the published improvements to the standard BPNN, the momentum term and the Levenberg-Marquardt (LM) algorithm are two common and effective ways to improve its performance. Of these, the LM algorithm is used in this paper as the most effective. Once all the weights in the BPNN are determined, predictions can be made from the input values and the network weights.
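The LM update interpolates between Gauss-Newton and gradient descent by solving $(J^\top J + \mu I)\,\delta = -J^\top r$ at each step, where $r$ is the residual vector and $J$ its Jacobian. The sketch below applies it to a small two-parameter curve fit rather than to network weights; in BPNN training, the parameter vector would hold all weights and $r$ the output errors over the training samples. The damping schedule (divide $\mu$ by 10 after a successful step, multiply by 10 after a failed one) is a common choice assumed here, not the paper's stated setting. In Matlab, which the authors used, the `trainlm` training function provides an LM implementation for networks.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, p0, mu=1e-2, max_iter=100, tol=1e-12):
    """Minimize ||residual(p)||^2 with Levenberg-Marquardt steps.

    Each step solves (J^T J + mu I) step = -J^T r. A small mu gives a
    Gauss-Newton step; a large mu gives a short gradient-descent step.
    """
    p = np.asarray(p0, dtype=float)
    r = residual(p)
    cost = float(r @ r)
    for _ in range(max_iter):
        J = jacobian(p)
        step = np.linalg.solve(J.T @ J + mu * np.eye(p.size), -J.T @ r)
        r_new = residual(p + step)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:          # accept the step, trust the quadratic model more
            p, r, cost = p + step, r_new, cost_new
            mu *= 0.1
            if np.linalg.norm(step) < tol:
                break
        else:                        # reject the step, fall back toward gradient descent
            mu *= 10.0
    return p
```

For example, fitting $y = a\,e^{bx}$ to noiseless data generated with $a = 2$ and $b = -1.5$, starting from $a = 1$, $b = 0$, recovers the true parameters.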

Description of Hoa Binh Dam and monitoring data
Hoa Binh Dam, located on the Black River (Vietnamese: Sông Đà) in Hoa Binh Province at 20°48′30″N latitude and 105°19′26″E longitude, was the first hydroelectric dam in Vietnam and also the first in Southeast Asia. Construction of this earth-rockfill dam began in November 1979 and was completed in December 1994. It measures 128 m (420 ft) in height and 970 m (3182 ft) in length. The facility is owned by the Vietnam Electricity Corporation and produces up to 8160 GWh of power annually.
Because of the importance of the Hoa Binh Dam, a well-instrumented monitoring system was established at the start of the construction phase to monitor its deformation. The monitoring data were collected monthly and include the horizontal displacement determined by the alignment survey method, the reservoir water level, and the air temperature.

Among the candidate models whose residuals are white noise, the best model is the one with the smallest AIC value. According to Table 2, the SARIMA(1,1,1)(1,1,1)$_{12}$ model is chosen; its general form is:

$$(1 - \phi_1 B)(1 - \Phi_1 B^{12})\,\nabla\nabla_{12} Y_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\, e_t \qquad (23)$$

where the coefficients $\phi_1$, $\Phi_1$, $\theta_1$, and $\Theta_1$ are estimated from the data. The fitting values and precision of the SARIMA model are shown in Figure 5 and Table 3, respectively. The forecast values and forecast errors of 12-period-ahead short-term forecasts starting from January 2010 are presented in Table 4.
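Before AIC values are compared, each candidate's residuals must pass a white-noise check. The paper does not state which test it used; a common choice, assumed here, is the Ljung-Box portmanteau statistic, sketched below in NumPy with a hypothetical function name. A candidate would be retained only if its Q is below the chi-square critical value, and the retained candidate with the smallest AIC would be chosen.

```python
import numpy as np


def ljung_box_q(resid, h=12):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k).

    Under white noise, Q is approximately chi-square distributed with
    h minus (number of fitted ARMA coefficients) degrees of freedom.
    """
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = x @ x
    q = 0.0
    for k in range(1, h + 1):
        r_k = (x[k:] @ x[:-k]) / denom   # sample autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q
```

Strongly autocorrelated residuals produce a large Q, signalling that the candidate model has left structure unexplained.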
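The mean absolute error (MAE) and root mean square error (RMSE) are calculated by Equation (24):

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{x}_t - x_t\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{x}_t - x_t\right)^2} \qquad (24)$$

where $N$ is the number of values, $\hat{x}_t$ is the value calculated with the SARIMA model, and $x_t$ is the measured value.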
Both models have high RMSE compared with the accuracy standard of 3.54 mm (Ministry of Science and Technology of Vietnam 2012). According to Table 4, the largest forecast error occurred in the fifth forecast month. The limit error is twice the accuracy standard: $m_{\text{limit}} = 2 \times m_{\text{rockfill dam}} = 2 \times 3.54 = 7.08$ mm. The limit error is larger than the largest forecast error; however, the precision of both models over a one-year forecast horizon is quite low. In addition, the MTR model requires a large amount of data, yet its precision in this case is lower than that of the SARIMA model because of uncertain factors. Since the BPNN is a good nonlinear algorithm, it can be merged with the SARIMA or the MTR model to improve the precision of modeling and forecasting.
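Equation (24) and the limit-error check translate directly into code. A minimal NumPy sketch; the function and constant names are illustrative, and the 3.54 mm standard is the value quoted above.

```python
import numpy as np

M_ROCKFILL = 3.54          # mm, accuracy standard for rockfill dams quoted in the text
M_LIMIT = 2 * M_ROCKFILL   # mm, limit error = twice the accuracy standard (7.08 mm)


def mae(measured, predicted):
    """Mean absolute error of Equation (24)."""
    e = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(e)))


def rmse(measured, predicted):
    """Root mean square error of Equation (24)."""
    e = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def exceeds_limit(measured, predicted, limit=M_LIMIT):
    """Boolean mask of values whose absolute error exceeds the limit error."""
    e = np.abs(np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float))
    return e > limit
```

Because RMSE squares the errors, a single large miss (such as the fifth-month forecast) raises it more than it raises MAE.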
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are effective tools for identifying the orders of a stationary time series (Cryer and Chan 2008). From the ACF and PACF of the stationary series, plotted in Figure 4, the orders of the SARIMA model are determined as q = 1, Q = 1, p = 1, and P = 1.
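The sample ACF and PACF behind Figure 4 can be computed as sketched below, with the PACF obtained from the sample ACF by the Durbin-Levinson recursion. The function names are hypothetical. For the seasonal orders P and Q, one inspects the values at lags 12, 24, and so on.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (r_0 = 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, nlags + 1)])


def sample_pacf(x, nlags):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    rho = sample_acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    for k in range(1, nlags + 1):
        if k == 1:
            phi[1, 1] = rho[1]
        else:
            prev = phi[k - 1, 1:k]                      # phi_{k-1,1..k-1}
            num = rho[k] - prev @ rho[k - 1:0:-1]       # rho_{k-j}, j = 1..k-1
            den = 1.0 - prev @ rho[1:k]                 # rho_j,     j = 1..k-1
            phi[k, k] = num / den
            phi[k, 1:k] = prev - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k] = phi[k, k]
    return pacf
```

For an AR(1) process the ACF decays geometrically while the PACF cuts off after lag 1, which is the pattern used to read off p and q.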

SARIMA-BPNN merging model
The SARIMA model assumes that the future values of a time series are linearly related to current and past values, so its errors may contain nonlinear patterns. In this study, a BPNN is used to overcome this limitation of the SARIMA model and to improve fitting and forecasting accuracy. The network structure for merging the BPNN with the SARIMA model is as follows: (1) Input and output layers: the input layer has 12 neurons, which take 12 consecutive fitting errors of the SARIMA model, and the output layer takes the subsequent error. That is, the fitting errors of periods 1 to 12 are the input values and that of period 13 is the output value; the fitting errors of periods 2 to 13 are the input values and that of period 14 is the output value; and so forth. (2) Hidden layer: empirical formulas (Lu, Chen, and Zhou 2010) are used to quickly determine the number of hidden-layer nodes, which is 25.
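The sliding-window construction in item (1) can be sketched as follows. The helper names are hypothetical, and the additive recombination in `merged_forecast` (SARIMA forecast plus BPNN-predicted error, with the error defined as measured minus fitted) is an assumption about how the two parts are combined; the text does not spell out this step.

```python
import numpy as np


def make_windows(errors, window=12):
    """Pair each run of `window` consecutive errors with the next error.

    Row 0 holds the errors of periods 1..12 and targets period 13, row 1 holds
    periods 2..13 and targets period 14, and so on.
    """
    e = np.asarray(errors, dtype=float)
    X = np.stack([e[i:i + window] for i in range(len(e) - window)])
    y = e[window:]
    return X, y


def merged_forecast(sarima_forecast, predicted_error):
    """Hybrid forecast, assuming the BPNN models error = measured - SARIMA value."""
    return np.asarray(sarima_forecast, dtype=float) + np.asarray(predicted_error, dtype=float)
```

With 125 training periods, this construction yields 113 input-output pairs for training the network.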
Using the LM algorithm, the SARIMA-BPNN model reaches validation after 19 training iterations, as shown in Figure 6(b). The results of the MTR-BPNN and SARIMA-BPNN merging models are presented in Table 5.

The data-set of 137 observation periods is divided into two groups: the first contains 125 samples for training the network, and the second contains 12 samples for testing the predictions. The analysis and forecast results of the two previous models are also used. Both BPNN merging models are programmed in Matlab.

MTR-BPNN merging model
The MTR-BPNN merging model uses a BP neural network to compensate for the error of the theoretical model (Hu and Zhang 2010). Its network structure is: (1) an input layer of 8 neurons, namely the 7 impact factors retained in Equation (22), $\sqrt{t}$, $\ln t$, $H$, $H^3$, $T_1$, $T_{30}$, and $T_{60}$, plus the MTR fitting value $\hat{x}_t$; (2) an output layer with a single neuron, the difference $\Delta x$ between the model fitting value and the measured value (the fitting error); and (3) a hidden layer whose size, 18 neurons, is determined quickly with empirical formulas.
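The input and target arrays described above can be assembled as sketched below. The function name is hypothetical, and the sign of $\Delta x$ (measured minus fitted) is an assumed convention; with it, the corrected value is the MTR fitting value plus the network's predicted $\Delta x$.

```python
import numpy as np


def mtr_bpnn_dataset(t, H, T1, T30, T60, mtr_fit, measured):
    """Inputs: sqrt t, ln t, H, H^3, T1, T30, T60 and the MTR fitted value.

    Target: fitting error delta_x, assumed here to be measured - fitted.
    """
    t = np.asarray(t, dtype=float)
    H = np.asarray(H, dtype=float)
    fit = np.asarray(mtr_fit, dtype=float)
    X = np.column_stack([np.sqrt(t), np.log(t), H, H**3, T1, T30, T60, fit])
    delta_x = np.asarray(measured, dtype=float) - fit
    return X, delta_x
```

Inputs of such different magnitudes (for example $H^3$ next to $\ln t$) are usually normalized before training a sigmoid network.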
Using the LM algorithm, the MTR-BPNN model reaches validation after 194 training iterations, as shown in Figure 6(a).

ORCID
Kien-Trinh Thi Bui http://orcid.org/0000-0003-2265-7559

To assess the forecasting accuracy of the merging models (Table 5), MAE and RMSE are computed with Equation (24), replacing the fitting values with the forecast values. The results of the accuracy assessment are given in Table 6. Both proposed merging methods clearly improve accuracy in long-term forecasting. In particular, the errors are evenly distributed, and almost all forecast errors are smaller than the accuracy standard.

Conclusions
The theoretical foundation of the MTR model is an explicit functional relationship between dam deformation and the reservoir level, the air temperature, and the age of the dam. Because both the reservoir level and the air temperature are seasonal, the analysis and forecast results of the SARIMA model are slightly better than those of the MTR model. The results show that, in the normal operation phase of a dam, the SARIMA and MTR models are suitable for short-term forecasts of up to four months, these models being derived from a single set of deformation data. The BPNN was merged with the SARIMA model and with the MTR model, and both merging models have small errors over the forecast period. In particular, the forecasting precision from the fourth month to one year ahead is greatly improved. The experiments show that the BPNN can handle the random effects of the impact factors and provide more accurate forecasts. Therefore, when analyzing multi-source deformation data, combining the BPNN with the SARIMA model or the MTR model is useful because it clearly improves prediction accuracy.
The acceptable error values show that, to forecast dam deformation less than four months ahead, the most suitable model is the SARIMA model, because it offers equivalent accuracy while requiring only the deformation values. For forecasts of up to one year, the SARIMA-BPNN merging model should be used, because it performed better than the others in these experiments. This experiment also shows that the SARIMA model and the SARIMA-BPNN merging