Data assimilation using a climatologically augmented local ensemble transform Kalman filter

Ensemble data assimilation methods are potentially attractive because they provide a computationally affordable (and computationally parallel) means of obtaining flow-dependent background-error statistics. However, a limitation of these methods is that the rank of their flow-dependent background-error covariance estimate, and hence the space of possible analysis increments, is limited by the number of forecast ensemble members. To overcome this deficiency ensemble methods typically use empirical localisation, which allows more degrees of freedom for the analysis increment by suppressing spatially distant background correlations. The method presented here improves the performance of an Ensemble Kalman filter by increasing the size of the ensemble at analysis time in order to boost the rank of its background-error covariance estimate. The additional ensemble members added to the forecast ensemble at analysis time are created by adding a collection of ‘climatological’ perturbations to the forecast ensemble mean. These perturbations are constant in time and provide state space directions, possibly missed by the dynamically forecasted background ensemble, in which the analysis increment can correct the forecast mean based on observations. As the climatological perturbations are calculated once, there is negligible computational cost in obtaining the additional ensemble members at each analysis cycle. Included here are a formulation of the method, results of numerical experiments conducted with a spatiotemporally chaotic model in one spatial dimension and discussion of possible future extensions and applications. The numerical tests indicate that the method presented here has significant potential for improving analyses and forecasts.


Introduction
Data assimilation aims to optimally combine a best-guess forecast of a system state with observations of the true system state. This best-guess is known as the background state and is typically made with a computer model. Modern data assimilation includes variational and ensemble methods, which update the background state vector using either static (for variational methods) or flow-dependent (for ensemble methods) background-error covariance models. Due to the size of atmospheric models, static covariances between atmospheric variables are typically approximated as spatially homogeneous and isotropic (Wang et al., 2007). Static covariance matrices have high rank and can effectively encode important relationships and balance constraints. In contrast, flow-dependent covariance matrices, such as those approximated using ensemble methods, can distinguish physical spatially varying and time-dependent relationships that may not be accounted for by static covariance matrices. However, such ensemble-derived covariance estimates are typically restricted to much lower rank than the covariance estimates used in variational methods. The empirical technique of localisation Mitchell, 1998, 2001;Hamill et al., 2001;Ott et al., 2004) is the most often used method for compensating for the rank deficiency of ensemble-derived covariance estimates (see Section 2 for further discussion).
The aim of this paper is to propose a method for improving the performance of ensemble methods by effectively increasing the rank of their estimates of background-error covariance relationships. The way in which this is accomplished is through the inclusion of information from a static, *Corresponding author. email: mkretsch@umd.edu Tellus A 2015. # 2015 M. Kretschmer et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license. climatologically derived background-error covariance estimate, such as that which is often used in variational methods. One of the simplest and most common ways to combine flow-dependent and static covariance estimates is through a linear combination of their respective covariance matrices, which is essential to the workings of hybrid data assimilation methods (Hamill and Snyder, 2000;Lorenc, 2003). In contrast, here we consider remaining entirely within the ensemble data assimilation framework and achieve higher rank by using climatological information to construct additional ensemble members. This approach achieves a higherrank estimate of the background-error covariance matrix without simultaneously increasing the number of forecasted ensemble members.
Referring to the ensemble members corresponding to computed forecasts as the 'dynamic' ensemble, at analysis time we augment the ensemble by adding supplementary static 'climatological' ensemble members. These climatological ensemble members are formed by adding constantin-time perturbations to the background dynamic ensemble mean at each analysis time. In our method, these climatological perturbations are chosen to be approximately parallel to the directions of the leading eigenvectors of a static, climatological background-error covariance matrix, thus potentially enabling the analysis to allow for additional error directions that may not be well-represented by the dynamic ensemble. After these new ensemble members have been created and the ensemble has been enlarged, assimilation is performed on the collection of both dynamic and climatological ensemble members. The analysis ensemble members which correspond to the updated background (dynamic) ensemble members are then forecasted to the next analysis time, and the cycle is repeated. The accuracy of ensemble data assimilation methods can strongly depend on the number of ensemble members used, and we believe that our method will obtain benefits from increased ensemble size, without correspondingly increasing the number of forecasts carried out.
A related but fundamentally different technique to enhance ensemble perturbations is additive covariance inflation (Hamill and Whitaker, 2005;Wang et al., 2009). This approach adds perturbations randomly sampled from a climatological error distribution to each ensemble member during each analysis cycle. A key difference between this approach and the climatologically augmented Local Ensemble Transform Kalman Filter (caLETKF) is that the caLETKF increases the size of the ensemble, and hence the rank of the background covariance.
The rest of this paper is organised as follows. Section 2 provides background on ensemble data assimilation methods, while Section 3 formulates our proposed method. Section 4 describes numerical experiments testing the effectiveness of our approach, and Section 5 contains a review and discussion of the results. Section 6 contains our conclusions and additional discussion.

Background: the ensemble Kalman filter
Ensemble data assimilation methods are newer than variational methods and show great potential for many geophysical applications. Specifically, we focus here on the ensemble Kalman filter method (Evensen, 1994;Burgers et al., 1998). These filters estimate the time-evolving background covariance matrix, P b , as the sample covariance of an ensemble of k d model forecasts. Each of these forecasts can be done independently of the others, allowing for a naturally parallel computational method for forecasting background-error covariances. However, due to computational limitations, the ensemble size k d is typically much smaller than the size N of the model state vector, k d B BN. Thus, despite gaining flow-dependence, the estimate of P b is severely rank deficient. Empirical techniques such as localisation are one way to remedy this rank deficiency. Localisation works by suppressing correlations between model variables beyond some spatial distance determined by an empirically tuned localisation radius. Though localisation can substantially increase the effective rank of background covariances, practical limitations on the ensemble size may still result in information being missed by the ensemble. Decreasing the localisation radius in such a situation is not necessarily a solution to this problem, as too much localisation can lead to deleterious effects, such as increased imbalances in the analyses (Greybush et al., 2011). On the contrary, larger localisation radii may improve state estimates at the cost of requiring a larger ensemble.
There are several variants of the ensemble Kalman filter (e.g. Houtekamer and Mitchell, 1998;Anderson, 2001;Bishop et al., 2001;Whitaker and Hamill, 2002;Ott et al., 2004;Wang et al., 2004). A class of ensemble filters, known as square root filters, work by finding transformations that, when applied to the background ensemble members, produce a new collection of ensemble members whose mean and sample covariance obey the Kalman filter equations. Though we apply our method to one of these Kalman filter variants known as the Local Ensemble Transform Kalman Filter (LETKF) (Hunt et al., 2007), it can also be applied to other ensemble Kalman filter formulations.
The LETKF finds an ensemble of analysis state vectors whose mean and sample covariance match those given by the Kalman filter equations, in the case of a linear observation operator H. It represents the k d member background ensemble by its mean x b , a N-dimensional vector, and a N )k d dimensional matrix of perturbations X b , obtained by subtracting the mean from each ensemble member. The LETKF finds the analysis ensemble mean x a and ensemble perturbations X a by transforming x b and X b via (1) In the LETKF, the transformations w and W are found locally at each model grid point and depend on the observations (and their error covariance matrix R) contained within a local analysis region centred at each model grid point. The transformations in eqs.
(1) and (2) are applied at each grid point to the ensemble of state vectors at that grid point. The global analysis ensemble is then formed by combining the results of each local analysis.

A climatologically augmented ensemble Kalman filter
As mentioned above, we present here a method which incorporates climatologically derived error covariance information into a pure ensemble data assimilation framework. In ensemble Kalman filters, a collection of k d model forecasts is used to estimate the most likely system state and its uncertainty. Each of these forecasts is integrated using the full non-linear dynamics from one analysis time to the next, and is represented by an N-dimensional state vector, which we denote asx i d . The ensemble mean, given by is interpreted as the best estimate of the system state. Its uncertainty is estimated by the dynamic ensemble sample covariance matrix, and the superscript T denotes the transpose operation. We incorporate climatological covariance information into this framework by increasing the size of the ensemble at analysis time, by adding to the dynamic ensemble a collection of k c 'climatological' ensemble members,x j c , for k d '15j5k d 'k c . In practice, the size k c of this climatological ensemble is determined by a number of factors including convenience, empirical tests of forecast effectiveness and the user's computational resources. Each of these N-dimensional climatological ensemble members are created by adding a climatologically derived perturbation x c j to the dynamic Thus, the ensemble on which the analysis is performed may be represented as a mean x d and a N ) where the columns of X d are the dynamic ensemble perturbations, and the columns of X c are the climatological ensemble perturbations.
Our particular choice of climatological perturbations X c , described below, is made under two conditions. First, in order that the mean of the combined ensemble be the same as the mean of the dynamical ensemble, we require that the mean of the columns of X c be 0 (note that the mean of the columns of X d is 0 by definition). Second, we want the population covariance P b c ¼ 1 k c X c X c T of the climatological ensemble to approximate the true climatological background-error covariance matrix B. The sample covariance P b ¼ 1 k d þk c À1 XX T of the combined ensemble may then be written as Thus, as with many hybrid methods (Hamill and Snyder, 2000;Lorenc, 2003), the implicit background covariance P b is a linear combination of a dynamical (flow-dependent) covariance P b d and a climatological covariance P b c , with coefficients whose sum is 1.
In most applications, the true climatological backgrounderror covariance matrix, B, is not known a priori, and is estimated using various physical arguments and statistical techniques (we denote this estimate by B est ). The climatological perturbations x j c , which are the columns of X c , are derived from the orthogonal eigenvectors of the background-error covariance estimate B est . Specifically, they are derived from k c columns of the matrix A 0VD 1/2 , where V is a N )N dimensional matrix whose columns are the orthonormal eigenvectors of B est , and D is a N )N dimensional diagonal matrix of the eigenvalues of B est , so that B est 0AA T . The collection of k c scaled orthonormal eigenvectors, where k c B BN, is made of the eigenvectors which correspond to the k c largest eigenvalues of B est , and we interpret these climatological vectors as representing the k c directions of greatest climatological error variability. Once these k c columns of A have been chosen, we multiply each of them by ffiffiffiffi ffi k c p , to make their covariance matrix (normalised by k c ) approximate B est . We then subtract from these k c vectors their mean, to make their sum equal zero.
We store the resulting climatological perturbation vectors in the columns of the N )k c dimensional matrix X c .
In our experiments, we use as an analysis algorithm the LETKF (Ott et al., 2004;Hunt et al., 2007). We perform the analysis on the collection of k d 'k c dynamic and climatological ensemble members, and upon completion of the analysis procedure, the ensemble mean is updated according to eq. (1). The analysis also produces a collection of k d 'k c analysis ensemble perturbations. The 'dynamic' analysis ensemble perturbations are created from the first k d analysis ensemble perturbations, which for the LETKF are the ones that are closest to the dynamic background ensemble perturbations [Ott et al. (2004), in particular, see Appendix A]. The mean of these k d perturbations is subtracted from each perturbation to yield the dynamic analysis ensemble perturbations. The dynamic analysis ensemble members are then calculated by adding the analysis ensemble mean to each of these dynamic analysis perturbations. This k d member analysis ensemble is then forecasted forward in time to the next analysis time. Especially for other Ensemble Kalman filters, it may be fruitful to consider other methods of selecting k d analysis perturbations from a (k d 'k c )-member analysis ensemble. Our choice of the first k d perturbations is appropriate for filters like the LETKF and perturbed observations EnKF (Burgers et al., 1998;Houtekamer and Mitchell, 1998), for which there is a natural correspondence between pairs of background and analysis ensemble members.
We think of the new climatological members as representing potential error directions that might not be captured by the dynamically forecasted ensemble members. In this sense, the data assimilation algorithm can search in a higher dimensional space for corrections to the dynamic ensemble. Though the addition of ensemble members increases the cost of the analysis computation, we show in Section 5 that a major benefit of the caLETKF is that greater analysis accuracy can be achieved with fewer forecasts. We note that the static, climatological perturbations only need to be calculated once, so that the increased computational cost of our method comes purely from carrying out the analysis in a higher dimensional space.

Set-up of our numerical experiments
We apply our climatological ensemble augmentation method in a series of experiments that use a one-dimensional chaotic model which we call Lorenz Model II (Lorenz, 2005). Lorenz Model II represents the flow of an 'atmosphericlike' quantity Z around a circle of constant 'latitude'. Specifically, Z is defined on a lattice of N grid points with periodic boundary conditions. The value of Z at a given grid point n is denoted by Z n , and its temporal behaviour is given by The terms on the right-hand side of eq. (4) are meant to roughly model quantities analogous to spatially averaged non-linear advection, linear dissipation, and constant forcing, respectively. For our experiments, we set the parameter F to have a value of 15, and take N 0240. The first term on the right-hand side of eq. (4) is given by where K is an integer-valued parameter, J 0K/2 if K is even and J 0(K'1)/2 if K is odd. If K is even, the primed summation notation in eq. (5) denotes an ordinary sum with the first and last terms in the sum each multiplied by 1/2. For our experiments, K 08. The result of the spatial averaging present in eq. (5) is that the Model II states are characterised by spatially smooth waves. We construct an estimate, B est , of the background-error covariance matrix through the National Meteorological Center (NMC) Method of Parrish and Derber (1992). This method approximates the background-error covariance matrix using a large set (for our experiments Â50 000) of differences between 6-and 24-hour forecasts that verify at the same time. These forecasts were started from initial conditions generated using a 40-member LETKF. To account for magnitude differences between the estimate of the background-error covariance matrix found using the NMC method and the true climatological background-error covariance matrix, the estimate of the background-error covariance matrix estimated using the NMC method is typically multiplied by a constant scalar factor that is tuned for the particular application (e.g. 3D-Var). To accomplish this tuning within the caLETKF, the climatological perturbations X c , generated as described in Section 3, are multiplied by a scalar factor a. For the cases explored here, we found that the analysis error was insensitive to the precise value of a, and that the value a 01 produced a near-minimum error across a range of values of k c and k d . Thus, we used a 01 for all of the results reported here.
A key step in our proposed method is the generation of the leading eigenvectors of the static, climatological background-error covariance matrix. For the experiments described here, small system size allowed all of the eigenvectors of the climatological background-error covariance matrix to be easily calculated. However, for large, operational meteorological applications, explicit diagonalisation of the complete B matrix for systems of order NÂ10 8 (10 is problematic, since B is too large to be stored in a computer. On the contrary, the approximate effect of multiplying vectors by such matrices is available and is, for example, widely used for preconditioning in variational data assimilation. Furthermore, we only require the leading eigenvectors, not all of them, and methods such as the Lanczos algorithm are capable of calculating the leading m B BN eigenvectors numerically while requiring only m evaluations of the action of B on a vector (Golub and Van Loan, 1996). Exploiting special structure of the background-error covariance matrix can make these numerical procedures even more efficient and tractable. We again emphasise that these operations must only be completed once and can be done offline, as they are the same at each analysis cycle.
We employ a 'perfect model' set-up in our numerical experiments, so that the truth against which our state estimate is compared is generated from a free model integration that uses the same parameter values and dynamics as are used to forecast the ensemble. Observations are generated by adding Gaussian white noise, with mean 0 and standard deviation 1, to the truth state vector at the observation times and locations. We chose a spatially homogeneous, static observation network. Specifically, we assimilate 12 observations at each analysis time, with one observation every 20 model grid points. We perform assimilations every 0.05 model time units, analogous to approximately every 6 hours for the atmosphere (Lorenz, 2005). The initial ensembles used in our numerical experiments are found by randomly sampling widely timeseparated states from a long run of the forecast model states.
As a benchmark for comparison, we use results produced when analyses are performed using the standard LETKF algorithm (i.e. without a climatological supplement to the ensemble). Benchmark runs have the same ensemble size, assimilate the same set of observations, and compare against the same truth run as experiments with the climatologically augmented algorithm. In both sets of experiments, the localisation radius used by the analysis algorithm is constant at 20 model grid points, with no tapering of the observation influence (Hunt et al., 2007). In addition, we utilize 3% multiplicative covariance inflation (Anderson and Anderson, 1999) in all experiments involving the standard LETKF and 2.75% multiplicative covariance inflation for experiments using the caLETKF. These inflation factors were tuned to minimise analysis root-mean-square error (RMSE) over the interval of 0Á7% inflation.
To compare the performance of the caLETKF and the LETKF, we measure the RMSE between the truth and forecast ensemble mean for several forecast lead-times f. We denote the difference at location n between the ensemble mean of the f-hour forecast ensemble verifying at time r and the truth at the same time as E(r,n,f). The RMSE of the f-hour forecast ensemble mean verifying at analysis time r is then expressed as The temporally averaged RMSE of the forecast ensemble mean is then found by averaging eq. (6) over all c times during which statistics are calculated, to yield the average RMS error for a given forecast lead-time f, =c; where r 01 is chosen to correspond to an analysis time occurring after a sufficiently long spin-up time. We compare the LETKF and caLETKF through the analysis accuracy [f 00 in eqs. (6) and (7)] and for ensemble forecasts of various lead-times [0 Bf572 in eq. (7)]. Though the present work focuses on the usage of the RMSE as a metric of analysis and forecast performance, we recognise that other diagnostics (e.g. the spread/skill relationship) will be useful and necessary for further explorations of the properties of the caLETKF.

Experimental results
The effectiveness of our method was first examined through time series of the analysis RMSE. The RMS error of the analysis ensemble mean was recorded at each analysis time, the results of which are shown in Fig. 1, for both the caLETKF and standard LETKF. Here the dynamic ensemble size of both the standard LETKF and the caLETKF are both equal to 10, while the caLETKF supplements the dynamic ensemble with 10 additional static, climatological ensemble members at each analysis time. We note that the LETKF case with k d 010 dynamic ensemble members does not converge. Once our method (red curve) has converged, it produces analysis errors that are significantly smaller than those produced using the standard LETKF with k d 010 dynamic ensemble members (blue curve). The standard LETKF, albeit with a substantially larger dynamic ensemble size (k d 020, black curve), can achieve analysis accuracies similar to those of our method shown in Fig. 1.
Comparing the results for the standard LETKF with k d 010 dynamic ensemble members with results for the caLETKF for k d 010 dynamic and k c 010 climatological ensemble members, one can see that the inclusion of the additional climatological ensemble members helps to stabilise the filter (i.e. fluctuations in the errors are substantially reduced). The behaviour shown in Fig. 1 suggests that the climatological members account for realistic errors in directions not captured by the dynamic ensemble members, and are providing value to the overall assimilation without the expense of being forecasted themselves.
To better quantify how much of an advantage is gained by climatological augmentation of the ensemble at analysis time, we also compare how RMS analysis error changes with increasing dynamic ensemble size, for both the augmented and standard analysis methods. At each dynamic ensemble size, the RMS analysis error of the ensemble mean is averaged over 50 000 analysis cycles, after an initial 1000 spin-up cycles. For each dynamic ensemble size listed on the x-axis of Fig. 2, the caLETKF was computed with k c 010 climatological ensemble members. As Fig. 2 shows, the advantage of the caLETKF is shown in its convergence, at smaller dynamic ensemble size, to the error level reached by the traditional LETKF at larger dynamic ensemble size. Specifically, our method needs approximately 13 dynamic ensemble members to achieve the same error that the LETKF achieves with approximately 28 dynamic ensemble members. Figure 2 further suggests that, if limited by computational resources for calculating ensemble member forecasts, it can be beneficial to consider using climatological ensemble members during the analysis, rather than striving to increase total ensemble size.
In the experiments detailed above, comparisons between the caLETKF and LETKF analyses are made when the caLETKF uses k c 010 climatological ensemble members. Figure 3 explores how many climatological ensemble members the caLETKF needs by changing the number of climatological ensemble members while keeping the size of the dynamic ensemble constant at k d 015. The plot of Fig. 3 shows time-averaged RMS error of the analysis ensemble mean versus climatological ensemble size, and each data point shown here is averaged over 50 000 analysis cycles. Figure 3 shows that as few as about eight climatological ensemble members can be used without a significant Fig. 2. A comparison of the RMS error of analysis ensemble means, eq. (7), between both the standard LETKF (dashed curve) and the caLETKF (solid curve). After an initial spin-up of 1000 analysis cycles, RMS error is averaged over 50 000 analysis cycles and is plotted as a function of dynamic ensemble size. For the experiments shown here, the climatologically augmented method uses k c 010 climatological ensemble members. Below k d 04 dynamic ensemble members, we find that the caLETKF is susceptible to filter divergence, while the standard LETKF is susceptible to filter divergence below k d 010 dynamic ensemble members. Fig. 3. Here we compare RMS error of the analysis ensemble mean for the caLETKF, eq. (7), as a function of climatological ensemble size k c . Fifteen dynamic ensemble members (k d 015) are used, and each trial is averaged over 50 000 analysis cycles, discarding the first 1000 cycles as spin-up.  6)] is plotted at each analysis cycle over a 1500 analysis cycle period. Both experiments assimilate the same observations on the same observation network. For comparison, results from an experiment where the standard LETKF has k d 020 dynamic ensemble members (black curve) is also included. loss of analysis accuracy, implying that the caLETKF curve of Fig. 2 would look very similar for any k d ]8. This suggests that much of the information missed by the dynamic ensemble is found in a relatively small number of state space directions.
The experiments just described measure analysis accuracy of the caLETKF while keeping forecasting costs (i.e. k d ) constant. We next present analysis accuracy results from an experiment which keeps analysis cost constant, by varying the size of the climatological ensemble while keeping constant the sum of the number of dynamic and climatological ensemble members, k d 'k c . Specifically, the total ensemble size on which the analysis is performed is kept constant at k d 'k c 030 ensemble members. The value k d 'k c 030 is chosen because, as shown in Fig. 2, if a dynamic ensemble and no contributing static members (k c 00) are used, the LETKF analysis accuracy (dashed curve in Fig. 2) quickly degrades when the number of dynamic ensemble members falls below 30, k d B30. Figure 4 shows the time-averaged RMSE of the analysis ensemble mean, averaged over 50 000 analysis cycles, as a function of climatological ensemble size. We see that the inclusion of climatological ensemble members allows the caLETKF to achieve the same accuracy with 12 dynamic ensemble members (and k c 018 climatological members) as that achieved with the standard LETKF with 30 dynamic ensemble members (and k c 00 climatological ensemble members). For reference, the horizontal line represents the analysis accuracy of the standard LETKF when k d 030 dynamic ensemble members are used, which is the smallest dynamic ensemble size at which the LETKF converges. Here, the caLETKF is advantageous over the LETKF, as it achieves comparable analysis errors with many fewer (as low as k d 012) dynamic ensemble members.
To measure ensemble forecast accuracy, we average the accuracy of the forecast ensemble mean over a series of 50 000 ensemble forecasts. Forecast accuracy information is stored for a maximum lead-time of three model days. The forecast accuracy, as measured by RMSE [eq. (7)] is plotted in Fig. 5 as a function of forecast lead-time. For this experiment, the caLETKF uses k d 020 dynamic ensemble members and k c 010 climatological ensemble members. Forecast results from the caLETKF are compared against results from the LETKF with k d 020 and k d 030 dynamic ensemble members. Forecast results from the LETKF with k d 030 dynamic ensemble members outperform those of the LETKF with k d 020 dynamic ensemble members, and are on average comparable to the caLETKF with k d 020 and k c 010. Figure 5 shows that gains made from more accurate caLETKF analyses persist, and lead to more accurate forecasts. In fact, the caLETKF corresponds to gains in forecast lead-time of 24 to 17 hours: the 1-d forecasts initialised with the 20 dynamic member caLETKF analysis Analysis accuracy of the caLETKF at constant analysis cost. Here, the sum of the dynamic and climatological ensemble sizes is kept constant at k d 'k c 030, and the climatological ensemble size is plotted versus analysis RMS error, eq. (7), averaged over 50 000 analysis cycles, after discarding 1000 initial cycles. For small dynamic ensemble sizes, k d B8 and k c !22, the caLETKF was susceptible to filter divergence. The resulting errors were averaged over a sample of 50 000 forecasts at each lead-time, using eq. (7). Forecast results initialised from caLETKF analysis ensembles with k d 020 dynamic and k c 010 climatological ensemble members are shown as a solid black curve. For comparison, results of forecasts initialised from LETKF analysis ensembles with k d 020 and k d 030 dynamic ensemble members are shown as dashed and dotÁdashed curves, respectively. The small difference (0.06 in forecast RMSE) between the LETKF k d 030 result and the caLETKF result is within the level of statistical fluctuations seen in our experimental system (for example, see the variation of the solid and dashed curves in Fig. 2 for k d ]30). ensemble are as accurate as the 20 dynamic member LETKF analysis ensemble. Though these gains diminish slightly with time, at constant dynamic ensemble size (k d 020), 48-hour caLETKF-initialised forecasts are as accurate as 30-hour LETKF-initialised forecasts, and 72-hour caLETKF-initialised forecasts are as accurate as 55-hour LETKF-initialised forecasts.

Summary and conclusions
The techniques of data assimilation can broadly be categorised as either variational or ensemble methods. While variational methods have been around for longer than ensemble methods, and are in widespread use in both operational and research settings, ensemble methods have advantages of their own, such as flow-dependent covariance estimates and natively computationally parallel forecasting. Recently, there has been a push to combine these methods to take advantage of their mutual advantages. The result has been the 'hybrid' methods, which combine the flow-dependent covariance estimates of ensemble methods with the climatological covariance estimates used by variational methods. Inspired by the success of these methods, we present here an ensemble method that aims to take advantage of climatological covariance information while staying in a purely ensemble framework.
The method we present here computes the analysis in a higher dimensional space than standard ensemble Kalman filters, by considering an augmented ensemble during the analysis. At the start of each analysis, the background dynamic ensemble mean is computed, along with the background dynamic ensemble perturbations. Additional ensemble members are created by adding to the background dynamic ensemble mean perturbations derived from a climatological background-error covariance matrix. We generate these climatological perturbations from the eigenvectors that correspond to the largest eigenvalues of the climatologically estimated background-error covariance matrix. These chosen eigenvectors correspond to the directions in model space that climatologically account for the most forecast error variability. Preliminary results (not shown) indicate that generating climatological perturbations from the eigenvectors of a static covariance estimate B est is advantageous over the simpler approach of using perturbations that are samples taken at each analysis cycle from the archive of forecast errors used to generate B est and scaled by a factor independent of k c . For the experimental results reported here, we estimate the background-error covariance matrix through the NMC method of Parrish and Derber (1992). We anticipate that other methods of generating the climatological background-error covariance matrix might be called for in other settings.
We conducted a series of numerical experiments using a one-dimensional chaotic model of Lorenz (Lorenz, 2005). In these experiments, we compared the caLETKF against the standard LETKF analysis algorithm. Initial experiments compared time-series analysis errors generated by both methods at a constant dynamic ensemble size (Fig. 1). These experiments showed that the caLETKF with k c 010 added climatological ensemble members can have a clear advantage over the standard LETKF.
Our next series of experiments investigated how well the new method performs as a function of dynamic ensemble size when the climatological ensemble size is fixed at k c 010 (Fig. 2). This experiment showed that the caLETKF converged and achieved a consistent level of analysis accuracy at approximately 13 dynamic ensemble members, while the LETKF needed approximately 28 dynamic ensemble members to perform as accurately. In this experiment, we observed that both the caLETKF and LETKF produce similar results at large enough dynamic ensemble sizes. In more complex, highly flow-dependent systems, we imagine that a large number of climatological ensemble members could reduce the relative weight given to the dynamical portion of the covariance enough to potentially degrade performance. In these scenarios, one might introduce an additional tunable scaling factor that, in conjunction with a, would independently control the weights of dynamic and climatological perturbations in the estimation of background covariances.
A third experiment explored the effect on analysis accuracy of varying the climatological ensemble size (Fig. 3). This experiment found that fewer than the 10 climatological ensemble members used in the previous experiments could be used during the analysis with negligible loss of analysis accuracy. In addition to comparing our method to the standard LETKF at constant forecasting cost, as just described, the accuracy of the caLETKF was also tested against the standard LETKF at constant analysis cost. Here, the total size of the ensemble is kept constant (i.e. the sum of dynamic and climatological ensemble sizes), while the proportion of the ensemble members that are climatological ensemble members is varied (Fig. 4). Results from these experiments indicate that analysis accuracy can be maintained by replacing a significant number of dynamic ensemble members with climatological ensemble members. In applications where the same result applies, this could substantially reduce forecast costs.
Our last series of experiments investigated the accuracy of 1-, 2-and 3-d ensemble forecasts (Fig. 5). This experiment found that the caLETKF analysis gains discussed earlier were retained during the forecasts, as ensemble forecasts initialised from caLETKF analyses were more accurate than forecasts initialised from LETKF analyses. We find the results of these numerical experiments with the caLETKF to be encouraging, and suggest that it be tested on larger and more realistic atmospheric models. Numerical experiments with larger, more complex models will necessitate other, more advanced diagnostics (e.g. the spreadÁ error relationship) to measure and quantify the benefits of the caLETKF. The caLETKF could potentially be very useful in these settings, as fewer dynamic ensemble members can be used without loss of analysis accuracy.