Establishment and application of a fractional difference-autoregressive model for daily runoff time series forecasting based on wavelet analysis

ABSTRACT Daily runoff is a complicated hydrological time series whose extreme changes affect not only the safety of ships underway but also the dispatching of water transportation. Daily runoff forecasting is a hot topic in hydrologic analysis and water transport planning. The fractional difference-autoregressive model based on wavelet analysis (WFIAR) is used to forecast the daily runoff. The daily runoff time series of Hankou Hydrologic Station from 2002 to 2013 is decomposed by the orthogonal db4 wavelet function into a series of stationary high-frequency sub-sequences and a non-stationary low-frequency sub-sequence. According to the Hurst exponent of the low-frequency sub-sequence, the appropriate difference order is determined and the fractional difference is applied. Autoregressive models are established separately for the differenced low-frequency part and the high-frequency parts. These models are combined to forecast the daily runoff time series in 2014 and 2015. The first-order difference autoregressive (DIAR) model and the first-order difference autoregressive model based on wavelet analysis (WDIAR) are also established to compare the forecasting accuracy of the different models. The results show that the prediction accuracy of WFIAR is higher than that of the other two models, which reveals the advantage of the WFIAR model in daily runoff forecasting.


Introduction
A hydrologic time series is a sample of observations with complex variation characteristics. Hydrologic time series analysis aims to describe the complex hydrological process and is the basis of hydrologic simulation and prediction, water resources and transport planning, and many other water-related activities. The task is difficult because many factors influence the process (Sang, Dong, and Wu, 2009). Establishing an autoregressive model is a traditional time series prediction method that is convenient and intuitive (Lin, Xia, and Jie, 2013; Wang, Liang, and Ding, 2016). However, hydrological time series are usually non-stationary and do not meet the requirements for establishing an autoregressive model. Existing studies mostly adopt first-order differencing to remove part of the volatility of the time series and make it stationary. Although this method is computationally simple, it is prone to over-differencing, which discards some important detail features of the time series. Fractional differencing can avoid this problem to some extent. In recent years, several studies have applied fractional differencing to time series prediction. Hu, Xu, and Wang (2006) proposed a Fuzzy-AR method based on fractional differencing for network traffic simulation and prediction, in which fractional differencing was used to eliminate the long-term correlation of the time series. Zuo, Wang, and Xu (2009) established a coefficient autoregressive predictive model based on fractional differencing (DFAR), which automatically selects a fractional-order or integer-order difference to process the original data and estimates the optimal modelling parameters. Their example shows that the DFAR model improves the prediction accuracy of fault feature quantities with non-stationary and non-linear development. Pereira and Souza (2014) studied the effects of long-range dependence in the natural energy runoff series of the South subsystem in order to estimate a long-memory model capable of generating synthetic hydrologic series. These studies demonstrate the validity of fractional differencing in processing time series.
Wavelet analysis theory developed from Fourier analysis. Because it can reveal the local characteristics of a time series in the time and frequency domains simultaneously (Kumar and Foufoula-Georgiou, 1997; Labat, 2005), wavelet analysis is better suited to studying non-stationary hydrologic time series. Kumar and Foufoula-Georgiou (1993) introduced wavelet analysis to hydrology when they studied the scale features of rainfall distribution using the orthogonal wavelet transform. The wide application of wavelet analysis has greatly promoted the development of stochastic hydrology. As two different aspects of wavelet analysis, the continuous wavelet transform and the discrete wavelet transform play different roles in hydrology (Sang, 2012). Generally, the continuous wavelet transform is used to reveal and describe the multi-time-scale change characteristics of sequences (Yao, 2001). By wavelet decomposition and reconstruction, hydrologic time series can be simulated stochastically (Hen, Wang, and Li, 2002; Jayawardena, Xu, and Li, 2009; Nowak, Rajagopalan, and Zagona, 2011; Wang, Yuan, and Ding, 2000). Because of this advantage, the discrete wavelet transform is mainly used to establish prediction models. A number of prediction models combine the discrete wavelet transform with neural networks. Zhao, Ding, and Deng (1998) established a chaotic wavelet neural network model based on wavelet analysis, chaos theory and artificial neural networks, and used it to predict the daily flow process in the flood season at Pingshan hydrological station. The accuracy of these prediction models is acceptable if the independent variables are known. In practice, however, river runoff changes over time and its influencing factors are complex and unquantifiable. Hence, these prediction models are not suitable for river runoff forecasting. To address the lack of independent variables, scholars have made several attempts.
Wang and Liu (2010) combined the wavelet transform with support vector machines to establish a tendency analysis model for hydrologic processes. The hydrologic time series was first decomposed into sub-signals by the wavelet transform, and each sub-signal was then simulated and predicted using a support vector machine. Wang, Ding, and Heng (2003) proposed a combined prediction model of nearest-neighbour sampling regression based on the wavelet-transformed sequence, taking the maximum peak flow at Yichang station on the Yangtze River as an example. In Darian's study (2016), a data-driven streamflow forecasting model was developed in which appropriate model inputs are selected using a binary genetic algorithm (GA). These attempts have achieved good results in monthly or annual runoff prediction, but whether they can be used for daily runoff forecasting remains to be examined.
In this study, a wavelet-autoregressive prediction model (WFIAR) for daily runoff based on fractional differencing is established. The WFIAR model combines wavelet analysis, fractional differencing and the autoregressive model. To highlight the advantages of this model, an autoregressive prediction model (DIAR) based on integer-order differencing and a wavelet-autoregressive prediction model (WDIAR) based on integer-order differencing are also established.
The innovations of this study are as follows.
(1) The discrete wavelet transform is used in establishing an autoregressive prediction model. (2) To handle the strong non-stationarity of the daily runoff sequence, fractional differencing is applied after the original sequence has been decomposed.

Wavelet analysis
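Wavelet analysis is a method of time-frequency multiresolution analysis. In the book Hydrological Wavelet Analysis (Wang, Ding, & Li, 2005), the continuous wavelet transform of a signal $f(t)$ is defined as:

$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = |a|^{-1/2} \int_{-\infty}^{+\infty} f(t)\, \bar{\psi}\!\left(\frac{t-b}{a}\right) dt$$

where $\psi(t)$ is called the mother wavelet; $\langle \cdot , \cdot \rangle$ is the inner product; $a$ is the scaling factor; $b$ is the time translation factor; $\bar{\psi}$ is the complex conjugate of the mother wavelet function; and $\psi_{a,b}(t)$ is a family of functions dilated and translated from $\psi(t)$:

$$\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right), \quad a, b \in \mathbb{R},\ a \neq 0$$

Here $\psi_{a,b}(t)$ is called the analysing wavelet or the continuous wavelet. Hydrological time series are mostly discrete; therefore, the continuous wavelet transform must be discretized. Let $a = a_0^{j}$ and $b = k b_0 a_0^{j}$, where $a_0 > 1$, $b_0 \in \mathbb{R}$, and $k$ and $j$ are integers. The wavelet transform coefficient can then be expressed by the discrete wavelet transform (DWT) as follows:

$$W_f(j,k) = a_0^{-j/2} \int_{-\infty}^{+\infty} f(t)\, \bar{\psi}\!\left(a_0^{-j} t - k b_0\right) dt$$

where $j$ denotes the decomposition level and $k$ denotes the time translation factor. The sampling step in the time domain, $b_0 a_0^{j}$, adapts to the frequency component at each level. Usually $a_0 = 2$ and $b_0 = 1$, so the discrete wavelet transform becomes:

$$W_f(j,k) = 2^{-j/2} \int_{-\infty}^{+\infty} f(t)\, \bar{\psi}\!\left(2^{-j} t - k\right) dt$$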
The high-frequency component, which corresponds to a smaller j, has a smaller sampling interval, whereas the low-frequency component, which corresponds to a larger j, has a larger sampling interval. This is why the discrete wavelet transform achieves time-frequency localization. After the time series has been decomposed into wavelet coefficients, time series analysis becomes the study of the wavelet transform coefficients. The Mallat algorithm was put forward by Mallat on the basis of multiresolution analysis and has been shown to be suitable for fast decomposition and reconstruction of signals with orthogonal wavelets. The Mallat fast algorithm includes a decomposition algorithm and a reconstruction algorithm.
(1) Mallat decomposition algorithm

$$c_{j+1} = H c_j, \quad d_{j+1} = G c_j, \quad j = 0, 1, \ldots, J-1$$

where $H$ is the decomposition low-pass filter, $G$ is the decomposition high-pass filter, and $c_0$ is the original time series. Using the above equation, the original time series can be decomposed into an approximation signal ($c_J$) and detail signals ($d_1, d_2, \ldots, d_J$).
(2) Mallat reconstruction algorithm
The reconstruction algorithm is the inverse of the decomposition algorithm:

$$c_{j} = H^{*} c_{j+1} + G^{*} d_{j+1}, \quad j = J-1, J-2, \ldots, 0 \tag{5}$$

where $H^{*}$ is the reconstruction low-pass filter and $G^{*}$ is the reconstruction high-pass filter. Reconstructing $c_J$ and each $d_j$ separately with Equation (5) yields the low-frequency component ($C_J$) and the high-frequency components ($D_1, D_2, \ldots, D_J$). The original time series $X$ can therefore be recovered as:

$$X = D_1 + D_2 + \cdots + D_J + C_J$$
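The decomposition and reconstruction steps can be illustrated with a minimal Python sketch. For brevity it uses the two-tap Haar filters rather than the db4 wavelet used in this study, and it assumes the series length is divisible by 2^J; all function names are illustrative only.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumption: Haar, not db4)

def decompose_step(c):
    """One Mallat decomposition step: returns (approximation, detail)."""
    half = len(c) // 2
    approx = [(c[2 * i] + c[2 * i + 1]) * S for i in range(half)]
    detail = [(c[2 * i] - c[2 * i + 1]) * S for i in range(half)]
    return approx, detail

def reconstruct_step(approx, detail):
    """One Mallat reconstruction step: inverse of decompose_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out

def wavedec(x, levels):
    """Return c_J and the list [d_1, ..., d_J]."""
    c, details = list(x), []
    for _ in range(levels):
        c, d = decompose_step(c)
        details.append(d)
    return c, details

def component(c_J, details, keep):
    """Reconstruct a single sub-series at full length.

    keep == 'c' gives C_J; keep == j (1-based) gives D_j.
    All other coefficients are set to zero before reconstruction.
    """
    c = list(c_J) if keep == "c" else [0.0] * len(c_J)
    for j in range(len(details), 0, -1):
        d = details[j - 1] if keep == j else [0.0] * len(details[j - 1])
        c = reconstruct_step(c, d)
    return c

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
c_J, details = wavedec(x, 3)
parts = [component(c_J, details, "c")] + [component(c_J, details, j) for j in (1, 2, 3)]
total = [sum(values) for values in zip(*parts)]  # C_J + D_1 + D_2 + D_3
```

Summing the reconstructed sub-series recovers the original series, which is the additive property the combined forecast relies on.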

Fractional difference-autoregressive (FIAR) model
The key steps for establishing a FIAR (d, p) model are as follows.
The first step: analyse the long-memory characteristics of the hydrologic time series and calculate the order 'd' of the fractional difference.
The second step: apply the fractional difference of order 'd' to the time series to obtain a new stationary time series.
The third step: determine the order 'p' of the AR model.

Autoregressive model
Because the autoregressive prediction method is simple and intuitive, it is valued and widely used in hydrologic analysis. The autoregressive forecasting model can be defined as:

$$X_t = \mu + \phi_1 (X_{t-1} - \mu) + \phi_2 (X_{t-2} - \mu) + \cdots + \phi_p (X_{t-p} - \mu) + \varepsilon_t$$

where $X_t$ is the original hydrological time series; $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive coefficients; $\mu$ is the mean of $X_t$; $p$ is the order of the autoregressive model; and $\varepsilon_t$ is the residual term. The parameters $\phi_i$ can be derived from the Yule-Walker equations:

$$\begin{cases} r_1 = \phi_1 + \phi_2 r_1 + \cdots + \phi_p r_{p-1} \\ r_2 = \phi_1 r_1 + \phi_2 + \cdots + \phi_p r_{p-2} \\ \quad\vdots \\ r_p = \phi_1 r_{p-1} + \phi_2 r_{p-2} + \cdots + \phi_p \end{cases}$$

In the above equations, $r_i$ is the autocorrelation coefficient of the hydrological time series at lag $i$.
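As an illustration, the Yule-Walker equations can be solved for any order p with the Levinson-Durbin recursion. The sketch below is one standard way to do this and is not necessarily the procedure used in this study; the function names and the sample autocorrelations are hypothetical.

```python
def autocorr(x, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag (r_0 = 1)."""
    n = len(x)
    mu = sum(x) / n
    c0 = sum((v - mu) ** 2 for v in x)
    return [
        sum((x[t] - mu) * (x[t - k] - mu) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def yule_walker(r, p):
    """Solve the Yule-Walker equations by Levinson-Durbin recursion.

    r[k] is the lag-k autocorrelation with r[0] = 1.
    Returns [phi_1, ..., phi_p].
    """
    phi = []
    for k in range(1, p + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        kappa = num / den  # the new highest-order coefficient
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
    return phi

r_example = [1.0, 0.5, 0.4, 0.3]   # hypothetical autocorrelations
phi_2 = yule_walker(r_example, 2)  # approximately [0.4, 0.2]
phi_3 = yule_walker(r_example, 3)
```

In practice the coefficients r would come from `autocorr` applied to each stationary sub-series.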
Before establishing an autoregressive model, the stationarity of the hydrologic time series should be tested. The autocorrelation and partial autocorrelation plots of the time series can be used to judge whether it is stationary. If the autocorrelation function drops rapidly to zero, the time series is stationary; otherwise it is non-stationary. Although the graphical method is simple and intuitive, it is subjective. The Augmented Dickey-Fuller (ADF) test, proposed by Dickey and Fuller in 1979 (Xia, 2005), is a type of unit root test. It judges the stationarity of a time series by checking whether it contains a unit root. If there is no unit root, the time series can be inferred to be stationary. The ADF test can be carried out directly in MATLAB.

Fractional difference and fractional integral
If the time series is non-stationary, an autoregressive model cannot be established on it directly. First-order differencing is frequently used, but it may lead to over-differencing. Mills (2002) described the relationship between the difference order 'd' and the variance of the time series: the variance decreases as the difference order increases until the series becomes stationary, and then it increases again.
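Fractional differencing adopts a smaller order to overcome over-differencing. The fractional difference is defined as follows:

$$W_t = (1 - L)^d X_t$$

where $X_t$ is the original time series, $d$ is the difference order, $L$ is the lag operator, and $W_t$ is the differenced signal. The binomial expansion of $(1 - L)^d$ is:

$$(1 - L)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-L)^k = 1 - dL + \frac{d(d-1)}{2!} L^2 - \frac{d(d-1)(d-2)}{3!} L^3 + \cdots$$

When $d > -1$, the above equation can be expressed by a hypergeometric function:

$$(1 - L)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(k+1)\,\Gamma(-d)} L^k = F(-d, 1; 1; L)$$

After $d$ has been determined, $\Gamma(k-d) / [\Gamma(k+1)\,\Gamma(-d)]$ is just a function of $k$, written as $g(k)$.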
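When $t = N$, $W_N = g(0) X_N + g(1) X_{N-1} + \cdots + g(N-1) X_1$. The matrix form of the fractional difference is:

$$\begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_N \end{bmatrix} = \begin{bmatrix} g(0) & 0 & \cdots & 0 \\ g(1) & g(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g(N-1) & g(N-2) & \cdots & g(0) \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}, \quad \text{i.e. } W = G X \tag{17}$$

The time series $X_t$ can be fractionally differenced using Equation (17).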
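The weights g(k) need not be evaluated through gamma functions: the binomial expansion of (1 − L)^d gives the recursion g(0) = 1, g(k) = g(k−1)(k−1−d)/k. The following Python sketch uses this recursion and treats the series as zero before the first observation, which matches the truncated sum for W_N above; the function names are illustrative assumptions.

```python
def frac_weights(d, n):
    """First n weights g(0..n-1) of the expansion of (1 - L)^d."""
    g = [1.0]
    for k in range(1, n):
        g.append(g[-1] * (k - 1 - d) / k)
    return g

def frac_diff(x, d):
    """Fractionally difference x: W_t = sum_{k=0}^{t-1} g(k) X_{t-k}.

    Equivalent to multiplying x by the lower-triangular matrix G.
    """
    g = frac_weights(d, len(x))
    return [
        sum(g[k] * x[t - k] for k in range(t + 1))
        for t in range(len(x))
    ]

g_038 = frac_weights(0.38, 3)                    # g(1) = -0.38
w_first = frac_diff([1.0, 3.0, 6.0, 10.0], 1.0)  # d = 1 reduces to the first difference
```

With d = 1 the weights collapse to (1, −1, 0, ...), so the ordinary first difference is recovered as a special case.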
The key to the fractional difference method is determining the order d, which can be obtained from Equation (18):

$$d = h - 0.5 \tag{18}$$

where $h$ is the Hurst exponent of the time series.
An autoregressive model is established on the differenced sequence to obtain the predicted data. The predicted sequence is then fractionally integrated, which is the inverse operation of fractional differencing, to obtain the forecast runoff sequence.
The matrix form of fractional integration is:

$$X = G^{-1} W$$

where $G^{-1}$ is the inverse of the matrix $G$, $W$ is the (predicted) differenced sequence and $X$ is the corresponding runoff sequence.
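Because G is lower triangular with g(0) = 1 on its diagonal, applying its inverse does not require an explicit matrix inversion; forward substitution is sufficient. The sketch below is a minimal illustration under that observation, with hypothetical function names and input values.

```python
def frac_weights(d, n):
    """First n weights g(0..n-1) of the expansion of (1 - L)^d."""
    g = [1.0]
    for k in range(1, n):
        g.append(g[-1] * (k - 1 - d) / k)
    return g

def frac_integrate(w, d):
    """Invert the fractional difference by forward substitution.

    Solves G x = w row by row, using g(0) = 1:
    x_t = w_t - sum_{k=1}^{t-1} g(k) x_{t-k}.
    """
    g = frac_weights(d, len(w))
    x = []
    for t in range(len(w)):
        x.append(w[t] - sum(g[k] * x[t - k] for k in range(1, t + 1)))
    return x

x_int = frac_integrate([1.0, 2.0, 3.0, 4.0], 1.0)  # d = 1: cumulative sum
w_in = [2.0, -1.0, 0.5, 3.0, 1.5]                  # hypothetical differenced values
x_038 = frac_integrate(w_in, 0.38)
```

For d = 1 the result is the cumulative sum of the input, the usual inverse of first differencing.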

The working framework
The main objective of this paper is to establish a combined autoregressive prediction model based on discrete wavelet decomposition. The study was divided into five steps.
(2) Decompose the hydrologic time series. Based on the multi-resolution analysis ability of wavelets, the original daily runoff time series is decomposed into the high-frequency sub-series d_j (j = 1, 2, ..., 6) and the low-frequency sub-series c_6.
(3) Process the low-frequency sub-series c_6 by the fractional difference method to obtain a stationary sub-series.
(4) Establish an autoregressive model on each stationary sub-series. Forecast future values with the autoregressive models and combine them to obtain the predicted runoff series.
(5) Calculate the statistical errors of the predictions and compare them with those of other prediction models.

Case study
In this paper, hydrological data from 2002

Wavelet decomposition of daily runoff time series
Generally, mother wavelets can be divided into two types, orthogonal and non-orthogonal (Sang, Singh, Sun, Chen, and Liu, 2016). The Daubechies wavelet (dbN) is orthogonal and is commonly used for the discrete wavelet transform. The db4 wavelet is selected to decompose the daily runoff time series of Hankou hydrologic station from 2002 to 2013 into the low-frequency sub-series c_6 and the high-frequency sub-series d_j (j = 1, 2, ..., 6). The result is shown in Figure 1. Wavelet analysis can identify the different frequency components of hydrological sequences. The low-frequency sub-series c_6 is the approximate component of the daily runoff process and indicates some of its important characteristics. The principal period of the sequence is annual, and the variation trends are similar across cycles. Within one period, the approximate component increases steadily until it reaches its peak and then begins to decrease. The distinction between the flood season and the non-flood season in the approximate component is clearly revealed in Figure 1. The high-frequency sub-series d_1 to d_6 are the detail components; they indicate the detailed change characteristics and abrupt changes of the daily runoff process, with a degree of randomness.
The ADF test is used to test the stationarity of the sub-series. The detail components are stationary time series, whereas the approximate component is non-stationary.

Difference on the non-stationary sub-series
As the approximate component c_6 is non-stationary, it is not suitable for establishing an AR model directly. It is common practice to difference a non-stationary time series. The Hurst exponent of the sequence c_6 is calculated as 0.88, so the order of the fractional difference is 0.38 according to Equation (18). The sequence c_6 is then fractionally differenced with order 0.38, and the resulting series is named W_1. For comparison, the c_6 sequence and the original time series are each differenced with first order, giving two new time series named W_2 and W_3 respectively.
All three new time series are tested with the ADF test, and the results show that they are all stationary. Therefore, autoregressive models can be established on them.

Model establishment and prediction
Because the number of sub-sequences is large, using autoregressive models of the same order is convenient in application. For all sub-sequences in this study, the accuracy of the AR (2) model is acceptable. Hence, the AR (2) model is adopted in this paper, that is, p = 2. The AR (2) model is:

$$X_t = \mu + \phi_1 (X_{t-1} - \mu) + \phi_2 (X_{t-2} - \mu) + \varepsilon_t$$

where $\varepsilon_t$ is the residual term. The regression coefficients can be derived as follows:

$$\phi_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2}, \qquad \phi_2 = \frac{r_2 - r_1^2}{1 - r_1^2}$$

In these formulas, $r_1$ and $r_2$ are the autocorrelation coefficients of the hydrological sequence at lags 1 and 2 respectively.
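The AR (2) coefficients and a recursive forecast can be sketched in Python as follows. The closed-form coefficients are the p = 2 solution of the Yule-Walker equations. The multi-step scheme, in which each forecast is fed back as an input with the residual set to zero, is an assumption made for illustration, since the forecasting procedure is not spelled out here; the function names and numbers are hypothetical.

```python
def ar2_coefficients(r1, r2):
    """Closed-form Yule-Walker solution for p = 2."""
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return phi1, phi2

def ar2_forecast(history, mu, phi1, phi2, steps):
    """Recursive forecast: each prediction becomes an input for the next step.

    The residual term is set to its expected value of zero.
    """
    prev2, prev1 = history[-2], history[-1]
    out = []
    for _ in range(steps):
        nxt = mu + phi1 * (prev1 - mu) + phi2 * (prev2 - mu)
        out.append(nxt)
        prev2, prev1 = prev1, nxt
    return out

phi1, phi2 = ar2_coefficients(0.5, 0.4)                    # approximately (0.4, 0.2)
forecast = ar2_forecast([10.0, 12.0], 10.0, phi1, phi2, 2)  # approximately [10.8, 10.72]
```

In the combined model, one such forecast would be produced per sub-series before the sub-series forecasts are summed.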
The forecasting model WFIAR (0.38,2) combines wavelet decomposition, fractional-order differencing and autoregressive forecasting of the daily runoff process, where 0.38 is the difference order of the approximate component c_6 and 2 is the order of the autoregressive model. For comparison, a first-order difference autoregressive forecasting model based on wavelet decomposition, WDIAR (1,2), and a first-order difference autoregressive forecasting model without wavelet decomposition, DIAR (1,2), are also established, where '1' is the first-order difference and '2' is the order of the autoregressive model. The parameters of the models are shown in Table 1.
The daily runoff series of Hankou hydrological station in 2014 and 2015 are forecast using the above models. The predicted results of the different models are shown in Figures 2 and 3.
For the daily runoff in 2014, shown in Figure 2, the values predicted by the WFIAR (0.38,2) model are closest to the measured values, followed by the WDIAR (1,2) model; the values predicted by the DIAR (1,2) model deviate most from the measured values. Similar conclusions can be drawn from Figure 3: for the 2015 daily runoff forecasts, the WFIAR (0.38,2) model has the smallest errors, followed by the WDIAR (1,2) model, and the DIAR (1,2) model has the largest errors.
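The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) are used to assess the prediction accuracy:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{f}(t) - f(t) \right)^2}, \qquad \mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{\hat{f}(t) - f(t)}{f(t)} \right| \times 100\%$$

where $\hat{f}(t)$ is the forecast series, $f(t)$ is the measured time series, and $N$ is the number of forecast values. The results are shown in Table 2.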
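A minimal Python sketch of the two error measures, using their standard definitions; the function names and the sample values are illustrative and are not data from this study.

```python
import math

def rmse(forecast, measured):
    """Root mean square error between forecast and measured series."""
    n = len(measured)
    return math.sqrt(sum((f - m) ** 2 for f, m in zip(forecast, measured)) / n)

def mape(forecast, measured):
    """Mean absolute percentage error, in percent (measured values must be non-zero)."""
    n = len(measured)
    return 100.0 * sum(abs((f - m) / m) for f, m in zip(forecast, measured)) / n

example_forecast = [110.0, 90.0]   # hypothetical values
example_measured = [100.0, 100.0]
err_rmse = rmse(example_forecast, example_measured)  # 10.0
err_mape = mape(example_forecast, example_measured)  # 10.0 (percent)
```

RMSE is expressed in the units of runoff and is dominated by large errors, while MAPE is scale-free, so the two measures complement each other when comparing models.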

Conclusion
In this paper, a daily runoff prediction model named 'WFIAR' was established, which couples the wavelet decomposition method, fractional-order difference theory and the autoregressive modelling method. The daily runoff processes at Hankou Hydrological Station in 2014 and 2015 were predicted with this model. For comparison, the WDIAR forecasting model, which combines wavelet decomposition, first-order differencing and autoregressive modelling, and the DIAR forecasting model, which combines first-order differencing and autoregressive modelling, were also established. The statistical errors indicate that the prediction accuracies of the three models differ. The traditional difference method for dealing with non-stationary sequences is the easiest and most intuitive, but it is likely to cause over-differencing: some important details of the sequence are lost, which may lead to large prediction errors. The prediction accuracy of the WDIAR forecasting model is better than that of the DIAR model but is still far from satisfactory. Although it exploits the advantage of wavelet decomposition in time series analysis, the over-differencing problem is not solved. The WFIAR model exploits the advantages of multi-resolution wavelet analysis, decomposes the hydrological sequence into high-frequency and low-frequency sub-sequences, and fully extracts the variation characteristics of the daily runoff series. At the same time, fractional-order differencing not only transforms the non-stationary time series into a stationary one, meeting the requirements for establishing the autoregressive model, but also preserves the detailed variation characteristics of the daily runoff series to a great extent. The WFIAR model is superior to the DIAR and WDIAR forecasting models in prediction accuracy.
It is worth noting that the established prediction model requires several parameters to be determined. These parameters directly affect the precision of the model prediction, and the choice of parameters is also a source of uncertainty in hydrological analysis, which could be analysed in greater detail in future studies. This paper only discusses the feasibility of the WFIAR forecasting model and its superiority in forecasting accuracy. The structure of the model can be further improved in many respects.