Understanding the dynamical mechanism of year-to-year incremental prediction by nonlinear time series prediction theory

Abstract Previous studies have shown that year-to-year incremental prediction (YIP) can obtain considerable skill in seasonal forecasts. This study analyzes the mathematical definition of YIP and derives its formula in the nonlinear time series prediction (NP) method. It is shown that the two methods are equivalent when the prediction time series is embedded in one-dimensional phase space. Compared to previous NP models, the new one introduces multiple external forcings in the form of year-to-year increments. The year-to-year increments have physical meaning, which is better than the NP model with empirically chosen parameters. The summer rainfall over the middle to lower reaches of the Yangtze River is analyzed to examine the prediction skill of the NP models. Results show that the NP model with year-to-year increments can reach a similar skill as the YIP model. When the embedded number of dimensions is increased to two, more accurate prediction can be obtained. Besides similar results, the NP method has more dynamical meaning, as it is based on the classical reconstruction theory. Moreover, by choosing different embedded dimensions, the NP model can reconstruct the dynamical curve into phase space with more than one dimension, which is an advantage of the NP model. The present study suggests that YIP has a robust dynamical foundation, besides its physical mechanism, and the modified NP model has the potential to increase the operational skill in short-term climate prediction.


Introduction
Short-term climate prediction is a hot but challenging scientific topic in current climate research. According to previous studies, the methods frequently used in short-term climate prediction include: the statistical approach, the hybrid statistical and dynamical approach, the numerical model approach, and the nonlinear time series prediction approach (hereafter NP). The latter is based on the phasespace reconstruction theorem (Casdagli 1989), which implicitly requires a stationary system. However, weather and climate systems are influenced by perturbations of driving forces; in other words, the atmosphere is essentially nonstationary in dynamic terms. Previous studies have indicated that the stationarity of atmospheric processes is changeable. For instance, Tsonis (1996) found that fluctuations around the global mean precipitation amount have increased significantly, which means that global precipitation was a nonstationary process over the past century. The cause of nonstationarity is the change in driving forces with time (Manuca and Savit 1996). This led Wang et al. (2011) to develop a new prediction model with such driving forces, which can improve the accuracy of prediction effectively when applied to the time series of a single climate variable.
However, most time series from the real world, especially those of processes typically related to climate, are too short. When we apply the NP method to such short may achieve a better understanding of YIP with respect to its dynamical mechanism. In addition, we can analyze its advantages and disadvantages, and then explore its potential for application to other seasonal events in shortterm climate prediction.

Data and methods
Monthly precipitation data from 160 stations  in China provided by the China Meteorological Administration are used in this study. Following Chen and Zhao (2000), the mean precipitation of 17 stations during June-August over the middle to lower reaches of the Yangtze River (Nanjing, Hefei, Shanghai, Hangzhou, Anqing, Tunxi, Jiujiang, Hankou, Zhongxiang, Yueyang, Yichang, Changde, Ningbo, Quxian, Guixi, Nanchang, and Changsha) is selected to represent the summer rainfall over this region.
The present study employs monthly atmospheric variables from the NCEP-NCAR reanalysis data set, with a resolution of 2.5° × 2.5° (Kalnay et al. 1996).

The YIP method
The YR can be predicted on the basis of precursory factors using the YIP model (Fan, Wang, and Choi 2008).
where ΔY stands for the year-to-year increments of YR, Δf is the year-to-year increments of the normalized indices (also known as DY -the difference of a variable between the current year and the preceding year), c i is the fitting parameter, and the subscript in Δf represents the number of the predictor.
Each Δf is defined as where N is the number of years of historical data. The details of six factors can be found in Fan, Wang, and Choi (2008) (also introduced in Section 3.1 in the present paper).

The NP method with external forcing
The method of using a single-point nonlinear model with driving forces refers to the work of Wang et al. (2011) and Wang, Yang, and Zhou (2013). Assuming a non-stationary process with two series x i i=1,2,⋯,n and i i=1,2,⋯,n , the former variable is the system state and the latter is the driving force. By selecting a proper parameter τ, we can embed (1) ΔY = c 1 Δf 1 + c 2 Δf 2 + c 3 Δf 3 + c 4 Δf 4 + c 5 Δf 5 + c 6 Δf 6 , time series, the model sometimes encounters a data 'bottleneck' . Researchers have thus proposed the 'spatiotemporal series' method, which attempts to utilize the information at different spatial positions to remedy the insufficiency in the length of the time series. For example, with a spatiotemporal artificial neural network system, Yang, Zhou, and Bian (2000) carried out a regional prediction experiment regarding the distribution of atmospheric ozone over China, and the accuracy of the prediction was beyond 43%. In addition, based on the spatiotemporal series method, Chen et al. (2003) improved the extended-range (monthly) dynamical prediction of the pentad zonal mean height and revealed that spatiotemporal series can effectively improve the ergodicity of single-variable time series. Wang, Yang, and Lü (2004) introduced the idea of spatiotemporal series to enhance the original NP method and, through the example of 500 hPa height over the Northern Hemisphere, found that it is valuable to apply this technique to regional climate prediction. In short-term climate prediction, the predictands are traditional climate variable anomalies (Wang et al. 2012). As most regions in East Asia vary in connection to the tropospheric biennial oscillation (TBO), it may be easy to capture the interannual change if the prediction method takes the TBO into consideration. However, decadal change in the climate mean may lead to uncertainty in the prediction. To deal with this issue, Fan, Wang, and Choi (2008) proposed a new prediction scheme in their study of summer rainfall over the middle to lower reaches of the Yangtze River, named the year-to-year incremental prediction (YIP) method. Hereafter, we use YR to stand for the summer rainfall in this region. In this approach, the first stage is to predict the year-to-year precipitation increments, and the precipitation and precipitation rate anomalies are computed. The average root-mean-square error (RMSE) of the fitting period is 20%, and 18% in the hindcast stage of YR. The prediction model captures the interannual variation of YR, and reproduces the ascending trend in 1984-1998 and the descending trend in 1998-2006, and thus raises the prediction skill of YR strikingly. This method has also been applied to the prediction of summer rainfall in North China (Fan, Lin, and Gao 2009), winter surface air temperature (Fan 2009) and summer temperature (Fan and Wang 2010) over Northeast China, typhoon frequency over the western North Pacific (Fan and Wang 2009), wintertime heavy snow activity in Northeast China (Fan and Tian 2013), and the North Atlantic Oscillation (Tian and Fan 2015), among other climatic features.
There have been many variants of YIP, but the fundamental method is that based on Fan, Wang, and Choi (2008). The benefit of YIP is that it involves physical mechanics, while the advantage of NP is that it has robust mathematical dynamical foundations. By combining YIP and NP, we them into a state space (also known as phase space) with m 1 + m 2 dimensions, and then obtain the state trajectory where m 1 and m 2 are the dimensions of x i and i , respectively; and N = n − max m 1 , m 2 − 1 is the number of points in phase space. After reconstructing this trajectory, we can build a prediction model by means of the global approximation method (Casdagli 1989), where x i ; i is an undetermined function, x i is the system state, i is the driving force, and ɛ i is the fitting residual. The condition to determine x i ; i is to make the sum of the squared residuals reach its minimum, Here, φ is supposed to be a second-order polynomial at most.
The single-point prediction method can be expanded to the spatial time series prediction method, which not only includes the driving forces but also the information from the surrounding four neighboring points. The spatiotemporal series prediction model can be written as The embedding dimension of x (i,j) t is m 1 , and the forcings t and t (i+1,j) t are embedded in the m 2 dimension (in this study, we choose m 1 = m 2 = m). (3)

Relationship between YIP and NP
Once the sample number N in YIP is large enough, we have where C is the expected value of f i − f i−1 , and ΔY = c 1 Δf 1 + c 2 Δf 2 + c 3 Δf 3 + c 4 Δf 4 + c 5 Δf 5 + c 6 Δf 6 is equivalent to

and we have
This formula is analogue to the NP model (Equation (4)). This implies that they have some corresponding relationship. Equation (4) only considers one forcing originally, but we can expand it to multiple forcings and one variable system by way of Equation (6): Here, ɛ t is the fitting residual. The model only uses the value of one previous step during prediction, so it can be regarded as an NP model with m = 1 and τ = 1 to predict one step ahead. Therefore, we can conclude that YIP and NP are equivalent when linear fitting is applied and N is large enough. From the theoretical analysis, YIP could be considered as a special case of NP, so the YIP method can be applied to nonlinear modeling. YIP emphasizes the effect of quasi-biennial signals; thus, this method has explicit physical meaning compared with the NP method.

Hindcast and result
To reveal the equivalence of YIP and NP, and compare their performance, we apply the two methods to the following rainfall prediction experiments. Figure 1 shows the wavelet analysis of YR. It illustrates that there is a strong 2-5-yr period of YR variations in the period 1965-2006, so we can amplify the anomaly signal of the rainfall via YIP. Fan, Wang, and Choi (2008) analyzed the correlation coefficients between the year-to-year increments of YR and the geopotential height field at 500 hPa in spring the area-averaged geopotential height at 500 hPa over the region (55°-60°N, 120°-150°E), and the correlation coefficient between the year-to-year increments of YR and the EAI is −0.44 during 1965-1996, reaching the 95% confidence level. Figure 2 depicts the correlation coefficients between the year-to-year increments of YR and meridional wind shear between the 850 hPa and 200 hPa levels in March-May (MAM) (i.e. v850 minus v200). As is shown in Figure 2, there is negative correlation in the region of (20°S-10°N, 120°-140°E), which is the place for the interaction of the monsoon and ENSO. Next, we define an index for meridional wind shear around Indo-Australia (WSI) using the area-averaged meridional wind shear over the region (20°S-10°N, 120°-140°E). The correlation coefficient between the year-to-year increments of YR and the WSI is −0.33 during 1965-1996, reaching the 90% confidence level. Figure 3(a) reveals the correlation coefficients between the year-to-year increments of YR and sea level pressure in December-February (DJF). According to Figure 3(a), there is a remarkable negative-correlation area in the South Pacific. Therefore, we define a South Pacific sea level pressure index (SPI) using the area-averaged sea level pressure over the region (40°-30°S, 130°-110°W). The correlation coefficient between the year-to-year increments of YR and the SPI is −0.49 during 1965-1996, reaching the 99% confidence level. Figure 3(b) shows the correlation coefficients between the year-to-year increments of the SPI in DJF and the sea level pressure in JJA. The result indicates that the SPI in winter is linked to the western Pacific subtropical high in summer.

Precipitation prediction of YIP
When analyzing the correlation coefficients between the year-to-year increments of YR and vorticity at 850 hPa, obvious positive coefficients are found in the region (30°-35°N, 115°-120°E) from spring to summer. Thus, we and found that the Urals high and East Asian trough have a great influence on YR. So they defined two indices: the spring Eurasian circulation index (EUI) and the spring East Asian circulation index (EAI). The EUI is the area-averaged geopotential height at 500 hPa over the region (60°-70°N, 30°-60°E), and the correlation coefficient between the year-to-year increments of YR and EUI is 0.41 during 1965-1996, reaching the 95% confidence level. The EAI is negative YR increment, the model does not behave well, but it still produces a negative increment.

Precipitation prediction of NP
When using the NP method to build the prediction model, we utilize the same five factors and the former 32-yr precipitation data to reconstruct the phase trajectory, and then conduct one-step prediction in the latter 10 years to examine the performance. In the model, we choose the embedded dimensions as m = 1 and m = 2, respectively, and the time delay parameter as τ = 1.
The NP results are consistent with the observed values ( Figure 5). In the middle to lower reaches of the Yangtze River, NP can capture the upward trend at the end of the twentieth century, and the downward trend after 2000. For the years with a large amount of precipitation, the predicted values are similar to observed, such as in 1997 and 1983. For the years with a fairly small amount of precipitation, the prediction model also behaves well, such as in 1972 and 1981. Overall, the YIP and NP methods predict similar results.
Though the correlation coefficient of the NP model (m = 1) in the building stage is smaller than that in YIP (Table 1), the correlation coefficient in the predicting stage is similar to YIP. We also apply the NP model with a higher number of embedded dimensions (m = 2) and find an increased correlation coefficient (Table 1).
To further evaluate the accuracy of precipitation prediction, we define two quantities: the percentage of relative error of prediction, And the average relative RMSE, consider the vorticity index (VOI), i.e. the 850-hPa vorticity averaged in the above region, as a factor in the prediction of YR year-to-year increments. The correlation coefficient between the year-to-year increments of YR and the VOI is 0.49 during 1965-1996, reaching the 99% confidence level. Fan, Wang, and Choi (2008) considered the AAO (Antarctic Oscillation) index as well, but we find that the correlation coefficient of the year-to-year increments of YR and the AAO is relatively small. There is no significant difference in the prediction when we exclude the AAO index. Hence, we do not take the AAO index into account in this paper.
We build the YIP prediction model based on the year-toyear increments of five factors from 1965 to 1996 in order to predict the YR, and then conduct the hindcast in the following 10 years. The statistical forecast model of YR yearto-year increments through the multi-linear regression is Figure 4 illustrates the simulated and observed year-toyear increments of YR from 1965 to 2006. It suggests that the simulated increments from the above five factors resemble the observations quite well. The correlation coefficient between the model result and observation is 0.74 in 1965-1996, and 0.56 in 1997-2006. There is an obvious biennial variation in the year-to-year increments of YR, which features alternately positive and negative anomalies in neighboring years. Most of the years with large positive and negative anomalies during the previous period from 1965 to 1996, such as in 1977, 1978, 1980, and 1981, are   models all still have large prediction errors in 1999 and 2000. Generally, the NP method may obtain similar results to the YIP method. When we compare the results of two cases of the NP method, the precipitation mean-square error is a little lower when the number of embedded dimensions is set to two rather than one. The mean-square errors (1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006) of the three models (YIP, NP (m = 1) and NP (m = 2)) are 26.62%, 22.71%, and 22.30%, respectively.

Conclusion
The YIP method is based on the knowledge that the physical processes of the predictands and the climate variables have the characteristics of quasi-biennial variation. Therefore, we can use the preceding year's observational information as much as possible to improve the prediction results. Previous research has revealed that the YIP approach may improve the forecast skill (Wang et al. 2012). We analyze the mathematical definition of YIP and obtain its corresponding formula in the NP method. It proves that they are equivalent when the prediction time series is embedded in one-dimensional phase space. This theoretical result suggests that YIP also has robust mathematical and dynamical foundations, besides its physical mechanism.
We demonstrate the NP model with multiple external driving forces. The model is different from previous NP models in its application of year-to-year increments (Δf) as forcing factors. Hence, the quasi-biennial signals with explicit physical meaning are included, which is better than the NP model with empirically chosen parameters. In a certain sense, the two models are equivalent. Nevertheless, the NP method has more dynamical meaning, as it is based on the classical reconstruction theory. By choosing different embedded dimensions, the NP model Here, y is the simulated YR, y 0 is the observation, and y 0 is the multi-year average precipitation from 1965 to 1996.   can reconstruct the dynamical curve into phase space with a higher number of dimensions than one. In addition, the fitting residual of NP can be set to second-order polynomial precision, which is more convenient than the original linear regression of YIP, indicating the superiority of the NP model over the YIP model. We also notice that YIP and NP have some differences when the sample number N is not big enough, and these differences can decrease with an increase in N. The numerical results suggest that these differences are acceptable in practical prediction experiments. We select the YR to test the prediction skill of the NP models. Five predictors are introduced into the YIP and NP models. Results show that the NP model with yearto-year increments of former signals can obtain similar skill as the YIP model. When we increase the number of embedded dimensions to two, more accurate prediction can be obtained. These results indicate that the NP model has the potential to increase the operational skill in shortterm climate prediction.