Introducing driving-force information increases the predictability of the North Atlantic Oscillation

ABSTRACT The North Atlantic Oscillation (NAO) is the most prominent mode of atmospheric variability in the Northern Hemisphere. Because of the close relationship between the NAO and regional climate in Eurasia, the North Atlantic, and North America, improving the prediction skill for the NAO has attracted much attention. Previous studies that focused on the predictability of the NAO were often based upon simulations by climate models. In this study, the authors took advantage of Slow Feature Analysis to extract information on the driving forces from the daily NAO index and introduced it into phase-space reconstruction. By computing the largest Lyapunov exponent, the authors found that the predictability of the daily NAO index shows a significant increase when its driving-force signal is considered. Furthermore, the authors conducted a short-term prediction for the NAO by using a global prediction model for chaotic time series that incorporated the driving-force information. Results showed that the prediction skill for the NAO can be largely increased. In addition, results from wavelet analysis suggested that the driving-force signal of the NAO is associated with three basic drivers: the annual cycle (1.02 yr), the quasi-biennial oscillation (QBO) (2.44 yr), and the solar cycle (11.6 yr), which indicates the critical roles of the QBO and solar activities in the predictability of the NAO.


Introduction
The North Atlantic Oscillation (NAO) acts as one of the most dominant modes of global climate variability, affecting the weather patterns over the North Atlantic Ocean, North America, Europe, and even the entire Northern Hemisphere (Wallace and Gutzler 1981;Hurrell 1995;Li, Sun, and Jin 2013;Jajcay et al. 2016;Delworth et al. 2016). On the one hand, the NAO is believed to be generated by internal atmospheric processes (i.e. eddy-mean-flow interaction of the atmosphere (Barnes and Hartmann 2010) and atmospheric responses to external forces, i.e. oceanic forcing (Rodwell, Rowell, and Folland 1999;Keenlyside et al. 2008)); whilst on the other hand, the NAO is considered as a forcing agent for the interannual and multidecadal variability of sea surface temperature and oceanic circulation (Nicolay et al. 2009;Li, Sun, and Jin 2013).
The NAO affects the climate variability on multiple time scales, from daily to decadal (Jones, Jonsson, and Wheeler 1997;Zuo et al. 2016), and has close relationships with many extratropical weather systems (e.g. storm track, extratropical cyclones); so, improving the prediction skill for the NAO has become an important topic in operational weather forecasting (Domeisen, Badin, and Koszalka 2018). Previous studies have found that considering the effects of atmospheric teleconnections and external forcings can extend the predictability of the NAO on a range of time scales. Potential candidates for the external forcings of the NAO are El Niño-Southern Oscillation (ENSO) (Mokhov and Smirnov 2006), stratospheric activity (Scaife et al. 2016), change in Arctic sea ice (Sun, Deser, and Tomas 2015), and solar activities (Thiéblemont et al. 2015).
Most previous studies that employed climate models to determine the driving forces of the NAO were based on dynamic causal relationships between the NAO and other factors (Mokhov and Smirnov 2006). This can be regarded as a 'direct problem' from a mathematical viewpoint. In recent years, based on the 'inverse problem' perspective, algorithms that extract driving forces directly from time series have been developed (Eckmann, Kamphorst, and Ruelle 1987;Verdes et al. 2001;Wiskott 2003). Slow Feature Analysis (SFA) is a representative algorithm that aims to extract the signal of driving forces from nonstationary time series (Konen and Koch 2011;Wiskott 2003). SFA was first applied in the neurobiology field, and then successfully introduced into climate science (Yang et al. 2016;Wang, Yang, and Zhou 2017;Zhang et al. 2017). The SFA-derived signal can be regarded as the combined effect of different driving factors. Pan, Wang, and Yang (2017) demonstrated the applicability of SFA and wavelet analysis in extracting driving-force signals and determining potential driving factors from a nonstationary time series.
Based on an idealized prediction model, Wang et al. (2011) found that external forces play the same role as state variables. They developed a nonlinear time series prediction model that introduces the information on external forces into phase-space reconstruction. Specifically, they incorporated the SFA-extracted driving-force signal rather than the real driving factors into the prediction model. Their results showed that this model improves the prediction skill significantly (Chen, Wang, and Jin 2015;Wang and Chen 2015). However, it remains unknown whether this approach can be used to increase the predictability and prediction skill for realistic climate modes, like the NAO.
In this study, the predictability and prediction skill of the NAO were reassessed by introducing the SFA-extracted driving-force signal into phase-space reconstruction. Furthermore, the physical features of these driving forces were revealed, and attributed to three basic drivers of climate through wavelet analysis. Section 2 describes the data and methods used in this study. Results are presented in section 3, followed by conclusions and discussion in section 4.

NAO index
Traditionally, the NAO index is defined as the difference in sea level pressure (SLP) between the Icelandic Low and a semi-permanent high-pressure system centered at the Azores islands (i.e. station-based indices (Jones, Jonsson, and Wheeler 1997)). Some studies that focused on the large-scale variability of the NAO employed an empirical orthogonal function (EOF) analysis of SLP over the North Atlantic and treated the first principal component (PC1) time series as the NAO index (i.e. EOF-based indices (Thompson and Wallace 2000)). Domeisen, Badin, and Koszalka (2018) analyzed the Lyapunov spectra of various NAO indices and found that the predictability time scales of the EOF-based NAO indices are around 12-16 days, and that the station-based indices exhibit a longer predictability of 18-20 days. Thus, we used an NAO index based on Rotated Principal Component Analysis (Barnston and Livezey 1987). This index can be obtained from the NOAA website: ftp://ftp.cpc.ncep.noaa.gov/cwlinks/. Figure 1(a) shows the time series of the daily NAO index (from 1 January 1950 to 31 December 2017), which includes 24 820 days. We divided it into three periods: Period 1 (1-10 000 days); Period 2 (10 001-20 000 days); Period 3 (20 001-24 820 days). This procedure allowed us to compare the results of different periods.

SFA
The main idea of SFA is to embed a time series into state space by using an embedding dimension m and a time-delay parameter τ. By applying Principal Component Analysis to estimate the velocity field of the state space, we obtained the smallest eigenvalue and its corresponding eigenvector. Then, we projected the time series onto this eigenvector to obtain its SFA-derived driving-force signal. More details about the SFA algorithm can be found in previous studies (Wiskott 2003;Konen and Koch 2011;Yang et al. 2016). It should be noted that the driving-force signal derived from SFA is sensitive to the chosen parameters. To increase the robustness of the procedure, additional experiments were needed. In this study, the embedding dimension m was set to 31 days (a time window of one month) and the time lag to 1 day. Figure 1(b) shows the driving-force signal of the NAO extracted by SFA. It is apparent that the driving-force signal of the NAO is distinct from the native NAO index.
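The steps above (embedding, PCA of the velocity field, projection onto the eigenvector with the smallest eigenvalue) can be illustrated with a minimal sketch. It assumes a purely linear SFA; published SFA implementations often add a nonlinear (e.g. quadratic) expansion step, which is omitted here. The function names `delay_embed` and `slow_feature` are illustrative and do not come from the cited studies.

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Return the (N, m) matrix of delay vectors, with N = len(x) - (m - 1) * tau."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points < 2:
        raise ValueError("time series too short for this m and tau")
    return np.column_stack([x[k * tau : k * tau + n_points] for k in range(m)])

def slow_feature(x, m, tau=1):
    """Linear SFA sketch: slowest unit-variance projection of the embedded series."""
    z = delay_embed(x, m, tau)
    z = z - z.mean(axis=0)
    # Whiten the embedded data with PCA so every direction has unit variance.
    evals, evecs = np.linalg.eigh(z.T @ z / len(z))
    keep = evals > 1e-10 * evals.max()  # drop numerically degenerate directions
    white = z @ (evecs[:, keep] / np.sqrt(evals[keep]))
    # PCA of the finite-difference "velocity" of the whitened signal.
    dz = np.diff(white, axis=0)
    d_evals, d_evecs = np.linalg.eigh(dz.T @ dz / len(dz))
    # eigh sorts eigenvalues in ascending order: column 0 is the slowest direction.
    return white @ d_evecs[:, 0]
```

With the settings quoted in the text, a call would look like `slow_feature(nao_index, m=31, tau=1)`, where `nao_index` is a hypothetical array holding the daily index. The sign and amplitude of the returned signal are arbitrary, as with any eigenvector-based projection.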

Estimation of the predictability of the NAO
The largest Lyapunov exponent can be used to represent the sensitivity of a chaotic system to initial values. However, it has been suggested that the largest Lyapunov exponent alone (even if it is positive) cannot guarantee the system to be chaotic (Kantz, Kurths, and Mayer-Kress 1998). Therefore, the Surrogate Data Method, which randomizes the phase of time series, can be used to determine whether a system is deterministic or stochastic. Specifically, if the feature of the surrogate data (e.g. the largest Lyapunov exponent) has no significant distinction from that of the native time series, the native time series is stochastic. Based on this method, we produced a series of surrogate time series during the three above-mentioned time periods. We estimated the uncertainty range of the largest Lyapunov exponents of these surrogate time series by setting different embedding dimension parameters.
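A phase-randomized surrogate, as described above, can be sketched as follows. This is one common way to implement the idea (randomizing Fourier phases while keeping the amplitude spectrum), not necessarily the exact procedure used in this study; the function name `phase_randomized_surrogate` is illustrative.

```python
import numpy as np

def phase_randomized_surrogate(x, rng=None):
    """Return a surrogate with the same amplitude spectrum as x but random phases."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0          # leave the zero-frequency (mean) component untouched
    if n % 2 == 0:
        phases[-1] = 0.0     # the Nyquist component must stay real
    # Adding a uniform random phase to each coefficient randomizes its phase
    # while preserving its magnitude.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)
```

Computing a statistic such as the largest Lyapunov exponent on many such surrogates gives a reference range against which the value from the native series can be compared.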
Here, the largest Lyapunov exponent was calculated following the algorithm of Wolf et al. (1985), which was based on the embedding theorem (Takens 1981). A time series $\{x_1, x_2, \ldots, x_n\}$ (n is the length of the time series) can be embedded in an $m_1$-dimensional phase space, which is defined by the delay coordinates with time lag τ:
$$X(i) = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m_1-1)\tau}\right) \qquad (1)$$
where $i = 1, 2, \ldots, N$, and $N = n - (m_1 - 1)\tau$ is the number of phase points.
When incorporating the driving-force signal $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ (n is the length of the driving-force signal) into the phase space, the embedding dimension of the signal was set to $m_2$ and the time lag to τ, and turned Equation (1) into:
$$X(i) = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m_1-1)\tau}, \alpha_i, \alpha_{i+\tau}, \ldots, \alpha_{i+(m_2-1)\tau}\right) \qquad (2)$$
Then Equation (2) can be simply expressed as
$$X(i) = \left(x(i), \alpha(i)\right) \qquad (3)$$
where $x(i)$ and $\alpha(i)$ denote the delay vectors of the index and of the driving-force signal, $i = 1, 2, \ldots, N$, and $N = n - (\max(m_1, m_2) - 1)\tau$ is the number of phase points on the trajectory. To quantify the growth rate of initial perturbations in the reconstructed phase space of the dynamical system, the largest Lyapunov exponent was computed. Furthermore, the inverse of the largest Lyapunov exponent can be interpreted as an intrinsic time scale of the predictability of the variation.
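The phase-space construction that appends the delay coordinates of the driving-force signal to those of the index can be sketched as below. The function name `embed_with_forcing` is illustrative, and the forward-delay indexing is an assumption chosen so that the number of phase points equals n − (max(m1, m2) − 1)τ, as stated in the text.

```python
import numpy as np

def embed_with_forcing(x, alpha, m1, m2, tau=1):
    """Stack m1 delay coordinates of x and m2 delay coordinates of alpha.

    Returns an (N, m1 + m2) array with N = n - (max(m1, m2) - 1) * tau.
    Setting m2 = 0 gives the ordinary delay embedding of x alone.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    n_points = len(x) - (max(m1, m2) - 1) * tau
    if n_points < 1:
        raise ValueError("time series too short for these parameters")
    cols = [x[k * tau : k * tau + n_points] for k in range(m1)]
    cols += [alpha[k * tau : k * tau + n_points] for k in range(m2)]
    return np.column_stack(cols)
```

Each row of the returned array is one phase point; a Lyapunov-exponent estimator would then be applied to these rows in place of the rows of the ordinary embedding.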
Regarding the parameters used in this study, we chose the decorrelation time scale when the time-lagged autocorrelation function of the daily NAO index crossed the value of $e^{-1}$ (here, 4 days). Also, in the original time series, we chose the pair of points whose temporal separation was at least one mean orbital period (Wolf et al. 1985), which was 365 days in our test. Furthermore, we carried out our experiments based on different embedding dimensions.
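The decorrelation time scale mentioned above can be estimated with a short routine such as the following. It is a sketch under the assumption that the scale is taken as the first lag at which the sample autocorrelation drops below the threshold; the function name `decorrelation_time` is illustrative.

```python
import numpy as np

def decorrelation_time(x, threshold=np.exp(-1.0)):
    """Return the first lag at which the autocorrelation falls below threshold.

    Returns None if the autocorrelation never drops below the threshold.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    for lag in range(1, len(x)):
        acf = np.dot(x[:-lag], x[lag:]) / denom
        if acf < threshold:
            return lag
    return None
```

For a daily series the returned lag is in days, which is the unit used for the value quoted in the text.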

NAO prediction model
Based on the idea of Wang et al. (2011), we introduced the SFA-derived signal of the NAO index into the prediction model that was established by the theory of state-space reconstruction (Takens 1981). According to Equation (3), we can use the global approximation method to establish a predictive model as:
$$x(j+1) = f\left(x(j), \alpha(j)\right) + \varepsilon(j) \qquad (4)$$
where $f$ is a desired function, $\varepsilon(j)$ are fitting errors, and j is the length of time series used to build the prediction model. Here, $f$ is assumed to be a first-order polynomial. More details can be found in Wang et al. (2011); Wang and Chen (2015). We used 20 sub-datasets with the same length (20 examples) to test each model. The initial 23 591 data points were used to build the predictive model, and the last 1000 were used to test the prediction skill, which was measured by the correlation coefficient between observed and predicted values. The ensemble prediction results of these 20 different cases were analyzed.
In each model, the time lag τ was set to one, and $m_1$ was varied from 3 to 8. The ensemble-mean results are presented. The embedding dimension of the driving force, $m_2$, was varied from 0 to 3. If $m_2$ is equal to zero, the SFA signal is not taken into account in the predictive model. For the sake of convenience, we refer to the prediction model that only considered the native NAO index as the 'stationary model' ($m_2$ = 0) and the one that incorporates the NAO with its SFA-derived signal as the 'forcing model' ($m_2$ = 1, 2, 3, respectively).
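A first-order polynomial version of the global model, with and without the driving-force coordinates, can be sketched as follows. This is an illustration under several assumptions: the coefficients are fitted by ordinary least squares, only one-step-ahead prediction is shown, and the delay vectors are indexed backwards from the latest time so that x and α are aligned at the time of the forecast. The names `build_features` and `fit_and_score` are hypothetical.

```python
import numpy as np

def build_features(x, alpha, m1, m2, tau=1):
    """Rows hold x(t), x(t - tau), ... and alpha(t), alpha(t - tau), ... plus a constant."""
    start = (max(m1, m2) - 1) * tau
    t = np.arange(start, len(x) - 1)        # leave room for the target x(t + 1)
    cols = [x[t - k * tau] for k in range(m1)]
    cols += [alpha[t - k * tau] for k in range(m2)]
    cols.append(np.ones(len(t)))            # intercept of the first-order polynomial
    return np.column_stack(cols), x[t + 1]

def fit_and_score(x, alpha, m1, m2, n_test, tau=1):
    """Fit on all but the last n_test points; return the test correlation coefficient."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    features, target = build_features(x, alpha, m1, m2, tau)
    train_f, test_f = features[:-n_test], features[-n_test:]
    train_y, test_y = target[:-n_test], target[-n_test:]
    coef, *_ = np.linalg.lstsq(train_f, train_y, rcond=None)
    predicted = test_f @ coef
    return np.corrcoef(test_y, predicted)[0, 1]
```

Calling `fit_and_score(..., m2=0, ...)` corresponds to the 'stationary model' and `m2` of 1, 2, or 3 to the 'forcing model'; comparing the two correlation coefficients mirrors the comparison described in the text, although the study also evaluates longer prediction steps.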

Dynamic feature of the SFA-derived driving-force signal of NAO index
We created 100 surrogate time series for each period and obtained a range of the largest Lyapunov exponents by setting different embedding-dimension parameters from 3 to 10. As shown in Figure 2, the predictability of the SFA-derived driving-force signal was significantly different from its corresponding surrogate time series for most embedding dimensions. These results confirm that the SFA-derived driving-force signal of the NAO index (hereafter referred to as SFA-NAO) has a dynamic feature. However, for Period 1, the obvious distinctions only occurred when the embedding dimension was set to 5 or 6. This is probably due to the inaccuracy of the dataset during the early period (from 1 January 1950 to 25 April 1977).

Introducing information on the SFA-derived driving force into the NAO prediction
According to Equation (3), we introduced the SFA-NAO information into the phase space of the NAO and computed the largest Lyapunov exponent. The embedding dimension was set from 3 to 10 (as shown by the green line in Figure 3(a)). For the sake of comparison, we also show the largest Lyapunov exponent of the native NAO index (phase space without the SFA signal) for the whole time period and three individual time periods, respectively.
As shown in Figure 3(a), with the increase in embedding dimension, the largest Lyapunov exponent decreased. The largest Lyapunov exponent of the phase space that incorporated the SFA-NAO signal was smaller than that based on the native NAO index alone, irrespective of time period. Considering that a smaller value of the largest Lyapunov exponent corresponds to higher predictability, we suggest that introducing information on the SFA-derived driving force can increase the predictability of the NAO.
The above finding is consistent with the results of previous studies (Yang and Zhou 2005;Wang et al. 2011), which held that the climate system possesses a hierarchical structure and that the essential cause of its nonstationary characteristics is the time-varying nature of its driving forces. Compared with 'over-embedding', incorporating the driving force into the reconstructed system is a more effective way to predict nonstationary time series. Therefore, we introduced the SFA-derived signal of the NAO into the prediction model. As described in section 2.4, we chose 20 different time periods (i.e. 20 different experiments) and divided each of them into two sub-periods. We used the data of the first sub-period to build the prediction model and applied the latter (1000 values) to estimate the correlation coefficient and quantify the model's skill. Figure 3(b) shows the ensemble mean of the correlation coefficients for the 20 experiments. Meanwhile, we also examined the performance of the prediction model with a higher embedding dimension but without the introduction of the SFA-derived signal. As shown in Figure 3(b), for the stationary model without the SFA-derived signal, increasing the embedding dimension partly improved the prediction accuracy. Furthermore, compared with the stationary model, the prediction accuracy of the forcing model that incorporated the SFA-derived signal increased significantly, even when the embedding dimension $m_2$ was set to 1 (purple line). Increasing the embedding dimension (for instance, setting $m_2$ to 2 (green line) or 3 (blue line)) led to higher prediction accuracy. It should be noted that when the prediction step was set to 20, the correlation coefficient exceeded 0.5.

Wavelet analysis of the SFA-derived driving-force signal of NAO index
Previous studies based on idealized models suggest that SFA can be used to extract either the slow driving force itself or its subcomponent from a nonstationary time series (Konen and Koch 2011). Furthermore, by virtue of wavelet analysis (Torrence and Compo 1998), we found that the filtered signals of the peak points in the time-averaged power spectrum of the SFA-extracted signal correspond to the true independent driving forces very well (Pan, Wang, and Yang 2017).
In this study, we used this approach to further extract the physical information involved in the SFA-derived driving-force signal of the NAO index. As shown in Figure 4, the peak scales that passed the significance test at the 95% confidence level were 0.51 yr, 1.02 yr, 2.44 yr, 5.88 yr, 11.6 yr, 23.2 yr, and 32.8 yr. Following harmonic analysis, it was found that the peak scales of 1.02 yr, 2.44 yr, and 11.6 yr were physically independent. These three peak scales correspond to the annual cycle, the QBO cycle, and the solar cycle, respectively. Other peak scales can be regarded as harmonic oscillations of these three base scales (e.g. 0.51 yr = 1.02 yr/2; 5.88 yr ≈ 11.6 yr/2; 23.2 yr = 11.6 yr × 2; 32.8 yr ≈ 32 × 1.02 yr). These results suggest that these three climatic factors are the dominant driving forces of the NAO.
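The harmonic relationships quoted above can be checked with simple arithmetic. The sketch below is only an illustrative check, not the harmonic analysis used in the study: for each remaining peak it finds the closest integer multiple or integer fraction of each base scale and reports the relative mismatch. The helper name `nearest_harmonic` is hypothetical.

```python
def nearest_harmonic(period, base):
    """Return (label, harmonic, relative mismatch) for the closest base*k or base/k."""
    ratio = period / base
    if ratio >= 1.0:
        k = max(1, round(ratio))
        harmonic, label = base * k, f"{base} yr x {k}"
    else:
        k = max(1, round(1.0 / ratio))
        harmonic, label = base / k, f"{base} yr / {k}"
    return label, harmonic, abs(harmonic - period) / period

base_scales = [1.02, 2.44, 11.6]       # annual cycle, QBO, solar cycle (yr)
other_peaks = [0.51, 5.88, 23.2, 32.8]  # remaining significant peaks (yr)

for peak in other_peaks:
    candidates = [nearest_harmonic(peak, base) for base in base_scales]
    label, harmonic, mismatch = min(candidates, key=lambda c: c[2])
    print(f"{peak} yr ~ {label} = {harmonic:.2f} yr (mismatch {100 * mismatch:.1f}%)")
```

Because large integer multiples of a short base scale are closely spaced, a small mismatch for a long period (such as 32.8 yr against the 1.02-yr scale) is weaker evidence than an exact match at a low multiple, so this check is indicative only.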

Discussion and conclusions
In this study, SFA was used to extract the driving-force signal from the daily NAO index and quantify its predictability. By computing the largest Lyapunov exponent and building a time-series prediction model, the SFA-derived signal was introduced into the phase-space reconstruction. Results showed that both the predictability and the prediction skill for the NAO improved significantly. Wavelet analysis showed that the SFA-derived driving-force signal of the NAO index includes three independent characteristic time scales, which are associated with the annual cycle, QBO, and solar activities, respectively.
Solar radiation is the main energy source of the climate system. The annual cycle is associated with the movement of the point of direct sunlight. The QBO is a major driver of extratropical climate variability (Marshall and Scaife 2009), the most prominent feature of which is the equatorial stratospheric zonal wind. Signals with a similar QBO periodicity have also been widely observed in the subtropical troposphere (Thompson, Baldwin, and Wallace 2002). It has been indicated that the QBO exerts influence on tropospheric circulation and surface weather directly through inducing downward and poleward propagating changes in the zonal mean meridional circulation and zonal wind structure (Hartmann 2011a, 2011b) and imposing impacts on the stratospheric polar vortex (Kidston et al. 2015). Both models and observations indicate relationships between the NAO and QBO (Marshall and Scaife 2009;Asbaghi, Joghataei, and Mohebalhojeh 2017). Moreover, a previous study suggested that the spatial structure of the NAO can be modulated by the phase of the solar cycle (Kodera 2003).
Figure caption (partial): The dashed red line in the right-hand panel is the 95% confidence level, and the red dots represent the peak power.
It should be noted that we only analyzed cases when the embedding dimension was set to 31 days (i.e. a time window of one month). In the case of different embedding-dimension settings, it might be possible to detect other characteristic time scales, such as the ENSO cycle. The physical information on the SFA-derived driving-force signal of the NAO index needs further study. Nevertheless, our results suggest that introducing the SFA-derived signal has the potential to increase the predictability of the NAO and improve the prediction accuracy of statistical models.