Improving the condition number of estimated covariance matrices

High dimensional error covariance matrices and their inverses are used to weight the contribution of observation and background information in data assimilation procedures. As observation error covariance matrices are often obtained by sampling methods, estimates are often degenerate or ill-conditioned, making it impossible to invert an observation error covariance matrix without the use of techniques to reduce its condition number. In this paper we present new theory for two existing methods that can be used to 'recondition' any covariance matrix: ridge regression, and the minimum eigenvalue method. We compare these methods with multiplicative variance inflation. We investigate the impact of reconditioning on variances and correlations of a general covariance matrix in both a theoretical and practical setting. Improved theoretical understanding provides guidance to users regarding method selection, and choice of target condition number. The new theory shows that, for the same target condition number, both methods increase variances compared to the original matrix, with larger increases for ridge regression than the minimum eigenvalue method. We prove that the ridge regression method strictly decreases the absolute value of off-diagonal correlations. Theoretical comparison of the impact of reconditioning and multiplicative variance inflation on the data assimilation objective function shows that variance inflation alters information across all scales uniformly, whereas reconditioning has a larger effect on scales corresponding to smaller eigenvalues. The minimum eigenvalue method results in smaller overall changes to the correlation matrix than ridge regression, but can increase off-diagonal correlations. Data assimilation experiments reveal that reconditioning corrects spurious noise in the analysis but underestimates the true signal compared to multiplicative variance inflation.


Introduction
The estimation of covariance matrices for large dimensional problems is of growing interest [Pourahmadi, 2013], particularly for the field of numerical weather prediction (NWP) [Bormann et al., 2016, Weston et al., 2014 where error covariance estimates are used as weighting matrices in data assimilation problems, e.g. Daley [1991], Ghil [1989], Ghil and Malanotte-Rizzoli [1991]. At operational NWP centres there are typically O(10 7 ) measurements every 6 hours [Bannister, 2017], meaning that observation error covariance matrices are extremely high-dimensional. In nonlinear least squares problems arising in variational data assimilation, the inverse of correlation matrices are used, meaning that well-conditioned matrices are vital for practical applications [Bannister, 2017]. This is true in both the unpreconditioned and preconditioned variational data assimilation problem using the control variable transform, as the inverse of the observation error covariance matrix appears in both formulations. The convergence of the data assimilation problem can be poor if either the background or observation variance is small; however, the condition number and eigenvalues of background and observation error covariance matrices have also been shown to be important for convergence in both the unpreconditioned and preconditioned case in Haben et al. [2011b,a], Haben [2011], Tabeart et al. [2018]. Furthermore, the conditioning and solution of the data assimilation system can be affected by complex interactions between the background and observation error covariance matrices and the observation operator [Tabeart et al., 2018, Johnson et al., 2005. The condition number of a matrix, A, provides a measure of the sensitivity of the solution x of the system Ax = b to perturbations in b. The need for well-conditioned background and observation error covariance matrices motivates the use of 'reconditioning' methods, which are used to reduce the condition number of a given matrix.
In NWP applications, observation error covariance matrices are often constructed from a limited number of samples Cordoba et al. [2016], Waller et al. [2016a,c]. This can lead to problems with sampling error, which manifest in sample covariance matrices, or other covariance matrix estimates, that are very ill-conditioned or can fail to satisfy required properties of covariance matrices (such as symmetry and positive semi-definiteness) [Higham et al., 2016, Ledoit andWolf, 2004]. In some situations it may be possible to determine which properties of the covariance matrix are well estimated. One such instance is presented in Skøien and Blöschl [2006], which considers how well we can expect the mean, variance and correlation lengthscale of a sample correlation to represent the true correlation matrix depending on different properties of the measured domain (e.g. sample spacing, area measured by each observation). However, this applies only to direct estimation of correlations and will not apply to diagnostic methods, e.g. Desroziers et al. [2005], where transformed samples are used and covariance estimates may be poor. We note that in this paper, we assume that the estimated covariance matrices used in our experiments represent the desired correlation information matrix well and that differences are due to noise rather than neglected sources of uncertainty. This may not be the case for practical situations, where reconditioning may need to be performed in conjunction with other techniques to compensate for the underestimation of some sources of error.
Depending on the application, a variety of methods have been used to combat the problem of rank deficiency of sample covariance matrices. In the case of spatially correlated errors it may be possible to fit a smooth correlation function or operator to the sample covariance matrix as was done in Simonin et al. [2019], Guillet et al. [2019] respectively. Another approach is to retain only the first k leading eigenvectors of the estimated correlation matrix and to add a diagonal matrix to ensure the resulting covariance matrix has full rank [Michel, 2018, Stewart et al., 2013. However, this has been shown to introduce noise at large scales for spatial correlations and may be expensive in terms of memory and computational efficiency [Michel, 2018]. Although localisation can be used to remove spurious correlations, and can also be used to increase the rank of a degenerate correlation matrix [Hamill et al., 2001], it struggles to reduce the condition number of a matrix without destroying off-diagonal correlation information [Smith et al., 2018]. A further way to increase the rank of a matrix is by considering a subset of columns of the original matrix that are linearly independent. This corresponds to using a subset of observations, which is contrary to a key motivation for using correlated observation error statistics: the ability to include a larger number of observations in the assimilation system [Janjić et al., 2018]. Finally, the use of transformed observations imay result in independent observation errors [Migliorini, 2012, Prates et al., 2016; however, problems with conditioning will manifest in other components of the data assimilation algorithm, typically the observation operator. Therefore, although other techniques to tackle the problem of ill-conditioning exist, they each have limitations. This suggests that for many applications the use of reconditioning methods, which we will show are inexpensive to implement and are not limited to spatial correlations, may be beneficial.
We note that small eigenvalues of the observation error covariance matrix are not the only reason for slow convergence: if observation standard deviations are small, the observation error covariance matrix may be well-conditioned, but convergence of the minimisation problem is likely to be poor [Haben, 2011, Tabeart et al., 2018. In this case reconditioning may not improve convergence and performance of the data assimilation routine.
Two methods in particular, referred to in this work as the minimum eigenvalue method and ridge regression, are commonly used at NWP centres. Both methods are used by Weston [2011], where they are tested numerically. Additionally in Campbell et al. [2017] a comparison between these methods is made experimentally and it is shown that reconditioning improves convergence of a dual four-dimensional variational assimilation system. However, up to now there has been minimal theoretical investigation into the effects of these methods on the covariance matrices. In this paper we develop theory that shows how variances and correlations are altered by the application of reconditioning methods to a covariance matrix.
Typically reconditioning is applied to improve convergence of a data assimilation system by reducing the condition number of a matrix. However, the convergence of a data assimilation system can also be improved using multiplicative variance inflation, a commonly used method at NWP centres such as ECMWF [Liu and Rabier, 2003, McNally et al., 2006, Bormann et al., 2015 to account for neglected error correlations or to address deficiencies in the estimated error statistics by increasing the uncertainty in observations. It is not a method of reconditioning when a constant inflation factor is used, as it cannot change the condition number of a covariance matrix. In practice multiplicative variance inflation is often combined with other techniques, such as neglecting off-diagonal error correlations, which do alter the conditioning of the observation error covariance matrix.
Although it is not a reconditioning technique, in Bormann et al. [2015] multiplicative variance inflation was found to yield faster convergence of a data assimilation procedure than either the ridge regression or minimum eigenvalue methods of reconditioning. This finding is likely to be system-dependent; the original diagnosed error covariance matrix in the ECMWF system has a smaller condition number than the corresponding matrix for the same instrument in the Met Office system [Weston et al., 2014]. Additionally, in the ECMWF system the use of reconditioning methods only results in small improvements to convergence, and there is little difference in convergence speed for the two methods. This contrasts with the findings of Weston [2011], Weston et al. [2014], Campbell et al. [2017] where differences in convergence speed when using each method of reconditioning were found to be large. Therefore, it is likely that the importance of reducing the condition number of the observation error covariance matrix compared to inflating variances will be sensitive to the data assimilation system of interest. Aspects of the data assimilation system that may be important in determining the level of this sensitivity include: the choice of preconditioning and minimisation scheme [Bormann et al., 2015], quality of the covariance estimate, interaction between background and estimated observation error covariance matrices within the data assimilation system [Fowler et al., 2018, Tabeart et al., 2018, the use of thinning and different observation networks. We also note that Stewart et al. [2009], Stewart [2010], Stewart et al. [2013] consider changes to the information content and analysis accuracy corresponding to different approximations to a correlated observation error covariance matrix (including an inflated diagonal matrix). Stewart et al. [2013], Healy and White [2005] also provide evidence in idealized cases to show that inclusion of even approximate correlation structure gives significant benefit over diagonal matrix approximations, including when variance inflation is used.
In this work we investigate the minimum eigenvalue and ridge regression methods of reconditioning as well as multiplicative variance inflation, and analyse their impact on the covariance matrix. We compare both methods theoretically for the first time, by considering the impact of reconditioning on the correlations and variances of the covariance matrix. We also study how each method alters the objective function when applied to the observation error covariance matrix. Other methods of reconditioning, including thresholding [Bickel and Levina, 2008] and localisation [Horn and Johnson, 1991, Ménétrier et al., 2015, Smith et al., 2018 have been discussed from a theoretical perspective in the literature but will not be included in this work. In Section 2 we describe the methods more formally than in previous literature before developing new related theory in detail in Section 3. We show that the ridge regression method increases the variances and decreases the correlations for a general covariance matrix and the minimum eigenvalue method increases variances. We prove that the increases to the variance are bigger for the ridge regression method than the minimum eigenvalue method for any covariance matrix. We show that both methods of reconditioning reduce the weight on observation information in the objective function in a scale dependent way, with the largest reductions in weight corresponding to the smallest eigenvalues of the original observation error covariance matrix. In contrast, multiplicative variance inflation using a constant inflation factor reduces the weight on observation information by a constant amount for all scales. In Section 4 the methods are illustrated via numerical experiments for two types of covariance structures. One of these is a simple general correlation function, and one is an interchannel covariance arising from a satellite based instrument with observations used in NWP. We provide physical interpretation of how each method alters the covariance matrix, and use this to provide guidance on which method of reconditioning is most appropriate for a given application. We present an illustration of how all three methods alter the analysis of a data assimilation problem, and relate this to the theoretical conclusions concerning the objective function. We finally present our conclusions in Section 5. The methods are very general and, although their initial application was to observation error covariances arising from numerical weather prediction, the results presented here apply to any sampled covariance matrix, such as those arising in finance [Higham, 2002, Qi andSun, 2010] and neuroscience [Nakamura andPotthast, 2015, Schiff, 2011].

Covariance matrix modification methods
We begin by defining the condition number. The condition number provides a measure of how sensitive the solution x of a linear equation Ax = b is to perturbations in the data b. A 'well-conditioned problem' will result in small perturbations to the solution with small changes to b, whereas for an 'ill-conditioned problem', small perturbations to b can result in large changes to the solution. , Noting that all covariance matrices are positive semi-definite by definition, we distinguish between the two cases of strictly positive definite covariance matrices and covariance matrices with zero minimum eigenvalue. Symmetric positive definite matrices admit a definition for the condition number in terms of their maximum and minimum eigenvalues. For the remainder of the work, we define the eigenvalues of a symmetric positive semi-definite matrix S ∈ R d×d via: (1) Theorem 1. If S ∈ R d×d is a symmetric positive definite matrix with eigenvalues defined as in 1 we can write the condition number in the L 2 norm as κ(S) = λ1(S) λ d (S) .
For a singular covariance matrix, S, the convention is to take κ(S) = ∞ [Trefethen and Bau, 1997, Sec. 12]. We also note that real symmetric matrices admit orthogonal eigenvectors which can be normalised to produce a set of orthonormal eigenvectors.
Let R ∈ R d×d be a positive semi-definite covariance matrix with condition number κ(R) = κ. We wish to recondition R to obtain a covariance matrix with condition number κ max where the value of κ max is chosen by the user. We denote the eigendecomposition of R by where Λ ∈ R d×d is the diagonal matrix of eigenvalues of R and V R ∈ R d×d is a corresponding matrix of orthonormal eigenvectors.
In addition to considering how the covariance matrix itself changes with reconditioning, it is also of interest to consider how the related correlations and standard deviations are altered. We decompose R as R = ΣCΣ, where C is a correlation matrix, and Σ is a non-singular diagonal matrix of standard deviations. We calculate C and Σ via: We now introduce the ridge regression method and the minimum eigenvalue method; the two methods of reconditioning that will be discussed in this work. We then define multiplicative variance inflation. This last method is not a method of reconditioning, but will be used for comparison purposes with the ridge regression and minimum eigenvalues methods.

Ridge regression method:
The ridge regression method (RR) adds a scalar multiple of the identity to R to obtain the reconditioned matrix R RR . The scalar δ is set using the following method.
We note that this choice of δ yields κ(R RR ) = κ max .
In the literature [Hoerl andKennard, 1970, Ledoit andWolf, 2004], 'ridge regression' is a method used to regularise least squares problems. In this context, ridge regression can be shown to be equivalent to Tikhonov regularisation [Hansen, 1998]. However, in this paper we apply ridge regression as a reconditioning method directly to a covariance matrix. For observation error covariance matrices, the reconditioned matrix is then inverted prior to its use as a weighting matrix in the data assimilation objective function. As we are only applying the reconditioning to a single component matrix in the variational formulation, the implementation of the ridge regression method used in this paper is not equivalent to Tikhonov regularisation applied to the variational data assimilation problem [Budd et al., 2011, Moodey et al., 2013. This is shown in Section 3.5 where we consider how applying ridge regression to the observation error covariance matrix affects the variational data assimilation objective function. The ridge regression method is used at the Met Office [Weston et al., 2014].

Minimum eigenvalue method:
The minimum eigenvalue method (ME) fixes a threshold, T , below which all eigenvalues of the reconditioned matrix, R ME , are set equal to the threshold value. The value of the threshold is set using the following method.
• Set λ 1 (R ME ) = λ 1 (R) • Set the remaining eigenvalues of R ME via • Construct the reconditioned matrix via R ME = V R Λ ME V T R , where Λ ME (i, i) = λ i (R ME ).
This yields κ(R ME ) = κ max . The updated matrix of eigenvalues can be written as Λ ME = Λ + Γ, the sum of the original matrix of eigenvalues and Γ, a low-rank diagonal matrix update with entries Γ(k, k) = max{T − λ k , 0}. Using (3) the reconditioned R ME can then be written as: Under the condition that κ max > d − l + 1, where l is the index such that λ l ≤ T < λ l−1 , the minimum eigenvalue method is equivalent to minimising the difference R − R ME ∈ R d×d with respect to the Ky Fan 1-d norm (The proof is provided in Appendix A). The Ky Fan p − k norm (also referred to as the trace norm) is defined in Fan [1959], Horn and Johnson [1991], and is used in Tanaka and Nakata [2014] to find the closest positive definite matrix with condition number smaller than a given constant. A variant of the minimum eigenvalue method is applied to observation error covariance matrices at the European Centre for Medium-Range Weather Forecasts (ECMWF) [Bormann et al., 2016].

Multiplicative variance inflation
Multiplicative variance inflation (MVI) is a method that increases the variances corresponding to a covariance matrix. Its primary use is to account for neglected error correlation information, particularly in the case where diagonal covariance matrices are being used even though non-zero correlations exist in practice. However, this method can also be applied to non-diagonal covariance matrices.
Definition 1. Let α > 0 be a given variance inflation factor and R = ΣCΣ be the estimated covariance matrix. Then multiplicative variance inflation is defined by This is equivalent to multiplying the estimated covariance matrix by a constant. The updated covariance matrix is given by The estimated covariance matrix is therefore multiplied by the square of the inflation constant. We note that the correlation matrix, C, is unchanged by application of multiplicative variance inflation.
Multiplicative variance inflation is used at NWP centres including ECMWF [Bormann et al., 2016] to counteract deficiencies in estimated error statistics, such as underestimated or neglected sources of error. Inflation factors are tuned to achieve improved analysis or forecast performance, and are hence strongly dependent on the specific data assimilation system of interest. Aspects of the system that might influence the choice of inflation factor include observation type, known limitations of the covariance estimate, and observation sampling or thinning.
Although variance inflation is not a method of reconditioning, as it is not able to alter the condition number of a covariance matrix, we include it in this paper for comparison purposes. This means that variance inflation can only be used in the case that the estimated matrix can be inverted directly, i.e. is full rank. Multiplicative variance inflation could also refer to the case where the constant inflation factor is replaced with a diagonal matrix of inflation factors. In this case the condition number of the altered covariance matrix would change. An example of a study where multiple inflation factors are used is given by Heilliette and Garand [2015], where the meteorological variable to which an observation is sensitive determines the choice of inflation factor. However, this is not commonly used in practice, and will not be considered in this paper.

Theoretical considerations
In this section we develop new theory for each method. We are particularly interested in the changes made to C and Σ for each case. Increased understanding of the effect of each method may allow users to adapt or extend these methods, or determine which is the better choice for practical applications.
We now introduce an assumption that will be used in the theory that follows.
Main Assumption: Let R ∈ R d×d be a symmetric positive semi-definite matrix with We remark that any symmetric, positive semi-definite matrix with λ 1 = λ d is a scalar multiple of the identity, and cannot be reconditioned since it is already at its minimum possible value of unity. Hence in what follows, we will consider only matrices R that satisfy the Main Assumption.

Ridge Regression Method
We begin by discussing the theory of RR. In particular we prove that applying this method for any positive scalar, δ, results in a decreased condition number for any choice of R.

Theorem 2. Under the conditions of the Main Assumption, adding any positive increment to the diagonal elements of R decreases its condition number.
Proof. We recall that R RR = R + δI. The condition number of R RR is given by It is straightforward to show that for any δ > 0, κ(R RR ) < κ(R), completing the proof.
We now consider how application of RR affects the correlation matrix C and the diagonal matrix of standard deviations Σ.
Theorem 3. Under the conditions of the Main Assumption, the ridge regression method updates the standard deviation matrix Σ RR , and correlation matrix C RR of R ME via Proof. Using (4), Σ(i, i) = (R(i, i)) 1/2 . Substituting this into the expression for R RR yields: Considering the components of C RR and the decomposition of Σ RR given by (4): as required.
Theorem 3 shows how we can apply RR to our system by updating C and Σ rather than R. We observe, from (10), that applying RR leads to a constant increase to variances for all variables. However, the inflation to standard deviations is additive, rather than the multiplicative inflation that occurs for multiplicative variance inflation. We now show that RR also reduces all non-diagonal entries of the correlation matrix.

Corollary 1. Under the conditions of the Main Assumption, for
Proof. Writing the update equation for C, given by (10), in terms of the variance and correlations of R yields: We consider C RR (i, j) for i = j. As Σ RR and Σ are diagonal matrices, we obtain From the update equation (10), (14) yields that for all values of i, j with i = j, |C RR (i, j)| < |C(i, j)| as required.
For i = j, it follows from (13) that C RR (i, i) = 1 for all values of i.

Minimum Eigenvalue Method
We now discuss the theory of ME as introduced in Section 2.2. Using the alternative decomposition of R ME given by (6) enables us to update directly the standard deviations for this method.
Theorem 4. Under the conditions of the Main Assumption, the minimum eigenvalue method updates the standard deviations, Σ ME , of R via This can be bounded by Proof.
Noting that Γ(k, k) ≥ 0 for all values of k, we bound the second term in this expression by This inequality follows from the orthonormality of V R , and by the fact that T > λ d (R) by definition.
Due to the way the spectrum of R is altered by ME it is not evident how correlation entries are altered in general for this method of reconditioning.

Multiplicative variance inflation
We now discuss theory of MVI that was introduced in Section 2.3. We prove that MVI is not a method of reconditioning, as it does not change the condition number of a covariance matrix.
Theorem 5. Multiplicative variance inflation with a constant inflation parameter cannot change the condition number or rank of a matrix.
If R is rank-deficient, then λ min (R MV I ) = α 2 λ d = 0 and hence R MV I is also rank deficient. If R is full rank then we can compute the condition number of R MI as the ratio of its eigenvalues, which yields Hence the condition number and rank of R are unchanged by multiplicative inflation.

Comparing ridge regression and minimum eigenvalue methods
Both RR and ME change R by altering its eigenvalues. In order to compare the two methods, we can consider their effect on the standard deviations. We recall from Sections 3.1 and 3.2 that RR increases standard deviations by a constant and the changes to standard deviations by ME can be bounded above and below by a constant.
Corollary 2. Under the conditions of the Main Assumption, for a fixed value of κ max < κ, Proof. From Theorems 3 and 4 the updated standard deviation values are given by From the definitions of δ and T we obtain that We conclude that the increment to the standard deviations for RR is always larger than the increment for ME.

Comparison of methods of reconditioning and multiplicative variance inflation on the variational data assimilation objective function
We demonstrate how RR, ME and MVI alter the objective function of the variational data assimilation problem when applied to the observation error covariance matrix. We consider the 3D-Var objective function here for simplicity of notation, although the analysis extends naturally to the 4D-Var case. We begin by defining the 3D-Var objective function of the variational data assimilation problem.
Definition 2. The objective function of the variational data assimilation problem is given by where x b ∈ R n is the background or prior, y ∈ R d is the vector of observations, h : R n → R d is the observation operator mapping from control vector space to observation space, B ∈ R n×n is the background error covariance matrix, and R ∈ R d×d is the observation error covariance matrix. Let J o denote the observation term in the objective function and J b denote the background term in the objective function.
In order to compare the effect of using each method, they are applied to the observation error covariance matrix in the variational objective function (24). We note that analogous results hold if all methods are applied to the background error covariance matrix in the objective function.
We begin by presenting the three updated objective functions, and then discuss the similarities and differences for each method together at the end of Section 3.5. We first consider how applying RR to the observation error covariance matrix alters the variational objective function (24).
Theorem 6. By applying RR to the observation error covariance matrix we alter the objective function (24) as follows: where Λ δ is a diagonal matrix with entries given by (Λ δ ) ii = δ λi(λi+δ) . Proof. We denote the eigendecomposition of R as in (3). Applying RR to the observation error covariance matrix, R, we obtain We then calculate the inverse of R RR and express this in terms of R −1 and an update term: Substituting (29) into (24), and defining Λ δ as in the theorem statement we can write the objective function using the reconditioned observation error covariance matrix as (25).
The effect of RR on the objective function differs from the typical application of Tikhonov regularisation to the variational objective function [Budd et al., 2011, Moodey et al., 2013. In particular, we subtract a term from the original objective function rather than adding one, and the term depends on the eigenvectors of R as well as the innovations (differences between observations and the background field in observation space). Writing the updated objective function as in (25) shows that the size of the original objective function (24) is decreased when RR is used. Specifically, as we discuss later, the contribution of small-scale information to the observation term, J o , is reduced by the application of RR.
We now consider how applying ME to the observation error covariance matrix alters the objective function (24).
Theorem 7. By applying ME to the observation error covariance matrix we alter the objective function (24) as follows: where Proof. We begin by applying ME and decomposing R ME as in (6): Therefore calculating the inverse of the reconditioned matrix yields As this is full rank we can calculate the inverse of the diagonal matrix Λ + Γ DefiningΓ as in the theorem statement, and we can write R −1 ME as Substituting this into the definition of the objective function (24) we obtain the result given in the theorem statement.
AsΓ is non-zero only for eigenvalues smaller than the threshold, T , the final term of the updated objective function (30) reduces the weight on eigenvectors corresponding to those small eigenvalues. As all the entries ofΓ are non-negative, the size of the observation term in the original objective function (24) is decreased when ME is used.
Finally we consider the impact on the objective function of using MVI. We note that this can only be applied in the case that the estimated error covariance matrix is invertible as, by the result of Theorem 5, variance inflation cannot change the rank of a matrix.
Theorem 8. In the case that R is invertible, the application of MVI to the observation error covariance matrix alters the objective function (24) as follows Proof. By Definition 1, R MV I = α 2 R for inflation parameter α. The inverse of R MV I is given by Substituting this into (24) yields the updated objective function given by (37).
For both reconditioning methods, the largest relative changes to the spectrum of R occur for its smallest eigenvalues. In the case of positive spatial correlations, small eigenvalues are typically sensitive to smaller scales. For spatial correlations, weights on scales of the observations associated with smaller eigenvalues are reduced in the variational objective function, increasing the relative sensitivity of analysis to information content from the observations at large scales.
We also see that for RR and ME smaller choices of κ max yield larger reductions to the weight applied to small scale observation information. For RR, a smaller target condition number results in a larger value of δ, and hence larger diagonal entries of Λ δ . For ME, a smaller target condition number yields a larger threshold, T , and hence larger diagonal entries ofΓ. This means that the more reconditioning that is applied, the less weight the observations will have in the analysis. This reduction in observation weighting is different for the two methods; RR reduces the weight on all observations, although the relative effect is larger for scales corresponding to the smallest eigenvalues, whereas ME only reduces weight for scales corresponding to eigenvalues smaller than the threshold T . In ME, the weights on scales for eigenvalues larger than T are unchanged.
Applying MVI with a constant inflation factor also reduces the contribution of observation information to the analysis. In contrast to both methods of reconditioning, the reduction in weight is constant for all scales and does not depend on the eigenvectors of R. This means that there is no change to the sensitivity to different scales using this method. The analysis will simply pull closer to the background data with the same relative weighting between different observations as occurred for analyses using the original estimated observation error covariance matrix.
We have considered the impact of RR, ME and MVI on the unpreconditioned 3D-Var objective function. For the preconditioned case, Johnson et al. [2005] showed how, when changing the relative weights of the background and observation terms by inflating the ratio of observation and background variances, it is the complex interactions between the error covariance matrices and the observation operator that affects which scales are present in the analysis. This suggests that in the preconditioned setting MVI will also alter the sensitivity of the analysis to different scales.

Numerical experiments
In this section we consider how reconditioning via RR and ME and application of MVI affects covariance matrices arising from two different choices of estimated covariance matrices. Both types of covariance matrix are motivated by numerical weather prediction, although similar structures occur for other applications.

Numerical framework
The first covariance matrix is constructed using a second-order auto-regressive (SOAR) correlation function [Yaglom, 1987] with lengthscale 0.2 on a unit circle. This correlation function is used in NWP systems [Fowler et al., 2018, Stewart et al., 2013, Tabeart et al., 2018, Waller et al., 2016b, Thiebaux, 1976 where its long tails approximate the estimated horizontal spatial correlation structure well. In order to construct a SOAR error correlation matrix, S, on the finite domain, we follow the method described in Haben [2011], Tabeart et al. [2018]. We consider a one-parameter periodic system on the real line, defined on an equally spaced grid with N = 200 grid points. We restrict observations to be made only at regularly spaced grid points. This yields a circulant matrix where the matrix is fully defined by its first row. To ensure the corresponding covariance matrix is also circulant, we fix the standard deviation value for all variables to be σ = √ 5.
One benefit of using this numerical framework is that it allows us to calculate a simple expression for the update to the standard deviations for ME. We recall that RR updates the variances by a constant, δ. We now show that in the case where R is circulant, ME also updates the variances of R by a constant.
Circulant matrices admit eigenvectors which can be computed directly via a discrete Fourier transform [Gray, 2006] (via R = VΛV † , where † denotes conjugate transpose). This allows the explicit calculation of the ME standard deviation update given by (15) as This follows from (15) because the circulant structure of the SOAR matrix yields V(i, k) 2 = 1/d.
We therefore expect both reconditioning methods to increase the SOAR standard deviations by a constant amount. As the original standard deviations were constant, this means that reconditioning will result in constant standard deviations for all variables. These shall be denoted σ RR for RR and σ ME for ME. Constant changes to standard deviations also means that an equivalent MVI factor that corresponds to the change can be calculated. This will be denoted by α.
Our second covariance matrix comprises interchannel error correlations for a satellite-based instrument. For this we make use of the Infrared Atmospheric Sounding Instrument (IASI) which is used at many NWP centres within data assimilation systems. A covariance matrix for IASI was diagnosed in 2011 at the Met Office, following the procedure described in Weston [2011], Weston et al. [2014] (shown in Online Resource 1). The diagnosed matrix was extremely ill-conditioned and required the application of the ridge regression method in order that the correlated covariance matrix could be used in the operational system. We note that we follow the reconditioning procedure of Weston et al. [2014], where the reconditioning method is only applied to the subset of 137 channels that that are used in the Met Office 4D-Var system. These channels are listed in Stewart et al. [2008, Appendix A]. As the original standard deviation values are not constant across different channels, reconditioning will not change them by a constant amount, as is the case for Experiment 1. We note that the 137 × 137 matrix considered in this paper corresponds to the covariance matrix for one 'observation' at a single time and spatial location. The observation error covariance matrix for all observations from this instrument within a single assimilation cycle is a block diagonal matrix, with one block for every observation, each consisting of a submatrix of the 137 × 137 matrix . In the experiments presented in Section 4.2 we apply the minimum eigenvalue and the ridge regression methods to both the SOAR and IASI covariance matrices. The condition number before reconditioning of the SOAR correlation matrix is 81121.71 and for the IASI matrix we obtain a condition number of 2005.98. We consider values of κ max in the range 100 − 1000 for both tests. We note that the equivalence of the minimum eigenvalue method with the minimiser of the Ky Fan 1 − d norm is satisfied for the SOAR experiment for κ max ≥ 168 and the IASI experiment for κ max ≥ 98.

Changes to the covariance matrix
Example 1: Horizontal correlations using a SOAR correlation matrix Due to the specific circulant structure of the SOAR matrix and constant value of standard deviations for all variables, (10) and (39) indicate that we expect increases to standard deviations for both methods of reconditioning to be constant. This was found to be the case numerically. In Table 1 the computed change in standard deviation for different values of κ max is given as an absolute value and as α, the multiplicative inflation constant that yields the same change to the standard deviation as each reconditioning method. We note that in agreement with the result of Corollary 2 the variance increase is larger for the RR than the ME for all choices of κ max . Reducing the value of κ max increases the change to standard deviations for both methods of reconditioning. The increase to standard deviations will result in the observations being down-weighted in the analysis. As this occurs uniformly across all variables for both methods, we expect the analysis to pull closer to the background. Nevertheless, we expect this to be a rather small effect. For this example, even for a small choice of κ max the values of the equivalent multiplicative inflation constant, α, is small, with the largest value of α = 1.124 occurring for RR for κ max = 100.
As the SOAR matrix is circulant, we can consider the impact of reconditioning on its correlations by focusing on one matrix row. In Figure 1 the correlations and percentage change for the 100th row of the SOAR matrix are shown for both methods for κ max = 100. These values are calculated directly from the reconditioned matrix. We note that by definition of a correlation matrix, C(i, i) = 1 ∀ i for all choices of reconditioning. This is the reason for the spike in correlation visible in the centre of Figure 1a and on the right of Figure 1b. As multiplicative variance inflation does not change the correlation matrix, the black line corresponding to the correlations of the original SOAR matrix also represents the correlations in the case of multiplicative inflation. We also remark that although ME is not equivalent to the minimiser of the Ky Fan 1 − d norm for κ max = 100, the qualitative behaviour in terms of correlations and standard deviations is the same for all values in the range 100 − 1000. It is important to note that ME is still a well-defined method of reconditioning even if it is not equivalent to the minimiser of the Ky Fan 1 − d norm. Figure 1a shows that for both methods, application of reconditioning reduces the value of off-diagonal correlations for all variables, with the largest absolute reduction occurring for variables closest to the observed variable. Although there is a large change to the off-diagonal correlations, we notice that the correlation lengthscale, which determines the rate of decay of the correlation function, is only reduced by a small amount. This shows that both methods of reconditioning dampen correlation values but do not significantly alter the overall pattern of correlation (blue dot-dashed). As the SOAR matrix is symmetric, we only plot the first 100 entries for (b).
information. Figure 1b shows the percentage change to the original correlation values after reconditioning is applied. For RR, although the difference between the original correlation value and the reconditioned correlation depends on the index i, the relative change is constant across all off-diagonal correlations. As MVI does not alter the correlation matrix, it would correspond to a horizontal line through 0 for Figure 1b.
When we directly plot the correlation values for the original and reconditioned matrices in Figure  1a, the change to correlations for ME appears very similar to changes for RR. However, when we consider the percentage change to correlation in Figure 1b we see oscillation in the percentage differences of the ME correlations, showing that the relative effect on some spatially distant variables can be larger than for some spatially close variables. The spatial impact on individual variables differs significantly for this method. We also note that ME increases some correlation values. These are not visible in Figure 1 due to entries in the original correlation matrix that are close to zero. Although the differences between C and C ME far from the diagonal are small, small correlation values in the tails of the original SOAR matrix mean that when considering the percentage difference we obtain large values, as seen in Figure 1b. This suggests that RR is a more appropriate method to use in this context, as the reconditioned matrix represents the initial correlation function better than ME, where spurious oscillations are introduced. These oscillations occur as ME changes the weighting of eigenvectors of the covariance matrix. As the eigenvectors of circulant matrices can be expressed in terms of discrete Fourier modes, ME has the effect of amplifying the eigenvalues corresponding to the highest frequency eigenvectors. This results in the introduction of spurious oscillations in correlation space.
Both methods reduce the correlation lengthscale of the error covariance matrix. In Tabeart et al. [2018], it was shown that reducing the lengthscale of the observation error covariance matrix decreases the condition number of the Hessian of the 3D-Var objective function and results in improved convergence of the minimisation problem. Hence the application of reconditioning methods to the observation error covariance matrix is likely to improve convergence of the overall data assimilation problem. Fowler et al. [2018] studied the effect on the analysis of complex interactions between the background error correlation lengthscale, the observation error correlation lengthscale and the observation operator in idealised cases. Their findings for a fixed background error covariance, and direct observations, indicate that the effect of reducing the observation error correlation lengthscale (as in the reconditioned cases) is to increase the analysis sensitivity to the observations at larger scales. In other words, more weight is placed on the large-scale observation information content and less weight on the small scale observation information content. This corresponds with the findings of Section 3.5, where we proved that both methods of reconditioning reduce the weight on small scale observation information in the variational objective function. However, the lengthscale imposed by a more complex observation operator could modify these

findings.
Example 2: Interchannel correlations using an IASI covariance matrix We now consider the impact of reconditioning on the IASI covariance matrix. We note that there is significant structure in the diagnosed correlations (see Stewart et al. [2014, Fig. 8] and online resource 1), with blocks of highly correlated channels in the lower right hand region of the matrix. We now consider how RR, ME and MVI change the variances and correlations of the IASI matrix. Figure 2 shows the standard deviations Σ, Σ RR , Σ ME and Σ MV I . These are calculated from the reconditioned matrices, but the values coincide with the theoretical results of Theorems 3 and 4. Standard deviation values for the original diagnosed case have been shown to be close to estimated noise characteristics of the instrument for each of the different channels [Stewart et al., 2014]. We note that the largest increase to standard deviations occurs for channel 106 only and corresponds to a multiplicative inflation factor for this channel of 2.02 for RR and 1.81 for ME. Channel 106 is sensitive to water vapour and is the channel in the original diagnosed covariance matrix with the smallest standard deviation. The choice of κ max = 100 is of a similar size to the value of the parameters used at NWP centres [Weston, 2011, Weston et al., 2014, Bormann et al., 2016. This means that in practice, the contribution of observation information from channels where instrument noise is low is being substantially reduced.
Channels are ordered by increasing wavenumber, and are grouped by type. We expect different wavenumbers to have different physical properties, and therefore different covariance structures. In particular larger standard deviations are expected for higher wavenumbers due to additional sources of error [Weston et al., 2014], which is observed on the right hand side of Figure 2. For RR, larger increases to standard deviations are seen for channels with smaller standard deviations for the original diagnosed matrix than those with large standard deviations. This also occurs to some extent for ME, although we observe that the update term in (15) is not constant in this case. This means that the reduction in weight in the analysis will not be uniform across different channels for ME. The result of Corollary 2 is satisfied; the increase to the variances is larger for RR than ME. This is particularly evident for channels where the variance from the original diagnosed covariance matrix is small. As MVI increases standard deviations by a constant factor, the largest changes for this method would occur for channels with large standard deviations in the original diagnosed matrix. This is in contrast to RR, where the largest changes occur for the channels in the original diagnosed matrix with the smallest standard deviation. Figure 3 shows the difference between the diagnosed correlation matrix, C, and the reconditioned correlation matrices C RR and C ME . As some correlations in the original IASI matrix are negative, we plot the entries of (C − C RR ) • sign(C) and (C − C ME ) • sign(C) in Figures 3a and b respectively. Here • denotes the Hadamard product, which multiplies matrices of the same dimension elementwise. This allows us to determine whether the magnitude of the correlation value is reduced by the reconditioning method; a positive value indicates that the reconditioning method reduces the magnitude of the correlation, whereas a negative value indicates an increase in the correlation magnitude. For RR, all differences are positive, which agrees with the result of Theorem 3. As MVI does not change the correlation matrix, an equivalent figure for this method is not given. We also note that there is a recognisable pattern in Figure 3a, with the largest reductions occurring for the channels in the original diagnosed correlation matrix which were highly correlated. This indicates that this method of reconditioning does not affect all channels equally.
For ME, we notice that there are a number of entries where the absolute correlations are increased after reconditioning. There appears to be some pattern to these entries, with a large number occurring in the upper left hand block of the matrix for channels with the smallest wavenumber [Weston et al., 2014]. However, away from the diagonal for channels 0-40, where changes by RR are very small, the many entries where absolute correlations are increased by ME are much more scattered. This more noisy change to the correlations could be due to the fact that 96 eigenvalues are set to be equal to a threshold value by the minimum eigenvalue method in order to attain κ max = 100. One method to reduce noise was suggested in Smith et al. [2018], which showed that applying localization methods (typically used to reduce spurious long-distance correlations that arise when using ensemble covariance matrices via the Schur product) after the reconditioning step can act to remove noise while retaining covariance structure.
For positive entries, the structure of C ME appears similar to that of RR. There are some exceptions however, such as the block of channels 121-126 where changes in correlation due to ME are small, but correlations are changed by quite a large amount for RR. The largest elementwise difference between RR and the original diagnosed correlation matrix is 0.138, whereas the largest elementwise difference between ME and the original diagnosed correlation matrix is 0.0036. The differences between correlations for ME and RR are shown in Figure 3c.
For both methods, although the absolute value of all correlations is reduced, correlations for channels 1-70 are eliminated. This has the effect of emphasising the correlations for channels that are sensitive to water vapour. Weston et al. [2014], Bormann et al. [2016] argue that much of the benefit of introducing correlated observation error for this instrument can be related to the inclusion of correlated error information for water vapour sensitive channels. Therefore, although the changes to the original diagnosed correlation matrix are large it is likely that a lot of the benefit of using correlated observation error matrices is retained.
We also note that it is more difficult to choose the best reconditioning method in this setting, due to the complex structure of the original diagnosed correlation matrix. In particular, improved understanding of how each method alters correlations and standard deviations is not enough to determine which method will perform best in an assimilation system. One motivation of reconditioning is to improve convergence of variational data assimilation algorithms. Therefore, one aspect of the system that can be used to select the most appropriate method of reconditioning is the speed of convergence. As ME results in repeated eigenvalues, we would expect faster convergence of conjugate gradient methods applied to the problem Rx = b for x, b ∈ R d for ME than RR. However, Campbell et al. [2017], Weston [2011], Weston et al. [2014], Bormann et al. [2015] find that RR results in faster convergence than ME for operational variational implementations. This is likely due to interaction between the reconditioned observation error covariance matrix and the observation operator, as the eigenvalues of H T R −1 H are shown to be important for the conditioning of the variational data assimilation problem in Tabeart et al. [2018].
Another aspect of interest is the influence of reconditioning on the analysis and forecast performance. We note that this is likely to be highly system and metric dependent. For example, Campbell et al. [2017] studies the impact of reconditioning on predictions of meteorological variables (temperature, geopotential height, precipitable water) over lead times from 0 to 5 days. In the U.S. Naval Research Laboratory system, ME performed slightly better at short lead times, whereas RR had improvements at longer lead times [Campbell et al., 2017]. Differences in forecast performance were mixed, whereas convergence was much faster for RR. This meant that the preferred choice was RR. However, in the ECMWF system, Bormann et al. [2015] studied the standard deviation of first-guess departures against independent observations. Using this metric of analysis skill, ME was found to out-perform RR. The effect of RR on the analysis of the Met Office 1D-Var system is studied in Tabeart et al. [2019], where changes to retrieved variables sensitive to water vapour (humidity, variables sensitive to cloud) are found to be larger than for other meteorological variables such as temperature.

Changes to the analysis of a data assimilation problem
In Section 3.5 we considered how the variational objective function is altered by RR, ME and MVI. We found that the two methods of reconditioning reduced the weight on scales corresponding to small eigenvalues by a larger amount than MVI, which changes the weight on all scales uniformly. In this section we consider how the analysis of an idealised data assimilation problem is altered by each of the three methods. We also consider how changing κ max alters the analysis of the problem.
In order to compare the three methods, we study how the solution x of a conjugate gradient method applied to the linear system Sx = b changes for RR, ME and MVI, where S = B −1 + H T R −1 H is the linearised Hessian associated with the 3D-Var objective function (24). To do this we define a 'true' solution, x true , construct the corresponding data b and assess how well we are able to recover x true when applying RR, ME and MVI to S. Haben [2011] showed that this is equivalent to solving the 3D-Var objective function in the case of a linear observation operator. We define a background error covariance matrix, B ∈ R 200×200 , which is a SOAR correlation matrix on the unit circle with correlation lengthscale 0.2 and a constant variance of 1. Our observation operator is given by the identity, meaning that every state variable is observed directly.
We construct a 'true' observation error covariance R true , given by a 200 dimensional SOAR matrix on the unit circle with standard deviation 1 and lengthscale 0.7. We then sample the Gaussian distribution with zero mean and covariance given by R true 250 times and use these samples to calculate an estimated sample covariance matrix via the Matlab function cov. This estimated sample covariance matrix R est has condition number κ(R est ) = 3.95 × 10 8 . The largest estimated standard deviation is 1.07 and the smallest is 0.90, compared to the true constant standard deviation of 1. RR, ME and MVI are then applied to R est with κ max = 100. When applying MVI, we use two choices of α which correspond to changes to the standard deviations (R RR (1, 1)) 1/2 , α RR = 1.41, and (R ME (1, 1)) 1/2 , α ME = 1.39. The modified error covariance matrices will be denoted R inf lRR = α 2 RR R est and R inf lME = α 2 ME R est . We define a true state vector, x(k) = 4 sin(kπ/100) − 5.1 sin(7kπ/100) + 1.5 sin(12kπ/100) − 3 sin(15kπ/100) + 0.75 sin(45kπ/100), which has five scales. We then construct b ≡ Sx using R true , and apply the Matlab 2018b pcg.m routine to the problem (B −1 + R −1 )x = b for each choice of R. We recall that x est denote the solution that is found using R est and x mod refer to a solution found using a modified version of R est , namely R RR , R ME , R inf lRR or R inf lME . The maximum number of iterations allowed for the conjugate gradient routine is 200, and convergence is reached when the relative residual is less than 1 × 10 −6 .
From Section 3.5 we expect RR, ME and MVI to behave differently at small and large scales. We therefore analyse how using each method alters the solution x at different scales using the discrete Fourier transform (DFT). This allows us to assess how well each scale of x true is recovered for each choice of R. As x true is the sum of sine functions, only the imaginary part of the DFT will be non-zero. We therefore define a true = imag(DF T (x true )); similarly a est = imag(DF T (x est )) and a mod = imag(DF T (x mod )).
By construction, as x, given by (40), is the sum of sine functions of period 2πn/200 for n = 1, 7, 12, 15, 45, a true returns a signal with 5 peaks, one for each value of n at frequency k = n.
The amplitude for all other values of k is zero. For frequencies larger than 20, all choices of estimated and modified R recover a true well. Figure 4 shows the correction that is applied by the modified choices of R compared to R est for the first 20 frequencies. A positive (negative) value shows that a mod moves closer to (further from) a true than a est . The distance from 0 shows the size of this change. For the first true peak (k = 1) RR is able to move closer to a true than a est . However, both reconditioning methods move further from the truth at the location of true signals k = 7, 12, 15. For frequencies where a est has a spurious non-zero signal RR and ME are able to move closer to a true than a est . At the location of true signals k = 7, 12, 15, MVI makes smaller changes compared to a est than either method of reconditioning. As all modifications to R est move a mod further from a true than a est for k = 7, 12, 15, MVI is therefore better able to recover the value of a true than RR or ME at these true peaks. However, MVI introduces a larger error for the first peak at k = 1 than RR or ME, and changes for frequencies k > 5 are much smaller than for reconditioning. This agrees with the findings of Section 3.5, that the weight on all scales is changed equally by MVI, whereas both methods of reconditioning result in larger changes to smaller scales and are hence able to make larger changes to amplitudes for higher frequencies. We recall from Section 3.5 that ME changes only the smaller scales, whereas RR also makes small changes to the larger scales. This behaviour is seen in Figure 4: for frequencies k = 0 to 5 ME results in very small changes, with much larger changes for frequencies 5 ≤ k ≤ 15. RR makes larger changes for larger values of k, but also moves closer to a true for 1 ≤ k ≤ 3.  We now consider how changing κ max alters the quality of x RR . As the behaviour for κ max = 100 shown in Figure 4 was similar for both RR and ME, we only consider changes to RR. Figure 5 shows the difference between a true and a RR for different choices of κ max . Firstly we consider the true signal that occurs at frequencies k = 1, 7, 12, 15. For k = 1 the smallest error occurs for κ max = 50 and the largest error occurs for κ max = 10000. For k = 7, 12, 15 the error increases as κ max decreases. For all other frequencies, reducing κ max reduces the error in the spurious non-zero amplitudes. For very large values of κ max we obtain small errors for the true signal, but larger spurious errors elsewhere.
Very small values of κ max can control these spurious errors, but fail to recover the correct amplitude for the true signal. Therefore a larger reconditioning constant will result in larger changes to the analysis. This means that there is a balance to be made in ensuring the true signal is captured, but spurious signal is depressed. For this framework a choice of κ max = 100 provides a good compromise between recovering the true peaks well and suppressing spurious correlations.
Finally, Table 2 shows how convergence of the conjugate gradient method is altered by the use of reconditioning and MVI. Using a larger inflation constant for MVI does lead to slightly faster convergence compared to R est . However, reducing κ max leads to a much larger reduction in the number of iterations required for convergence for both RR and ME. This agrees with results in operational data assimilation systems, where the choice of κ max and reconditioning method makes a difference to convergence Weston [2011], Tabeart et al. [2019].

Conclusions
Applications of covariance matrices often arise in high dimensional problems [Pourahmadi, 2013], such as numerical weather prediction (NWP) [Bormann et al., 2016, Weston et al., 2014. In this paper we have examined two methods that are currently used at NWP centres to recondition covariance matrices by altering the spectrum of the original covariance matrix: the ridge regression method, where all eigenvalues are increased by a fixed value, and the minimum eigenvalue method, where eigenvalues smaller than a threshold are increased to equal the threshold value. We have also considered multiplicative variance inflation, which does not change the condition number or rank of a covariance matrix, but is used at NWP centres [Bormann et al., 2016].
For both reconditioning methods we developed new theory describing how variances are altered. In particular, we showed that both methods will increase variances, and that this increase is larger for the ridge regression method. We also showed that applying the ridge regression method reduces all correlations between different variables. Comparing the impact of reconditioning methods and multiplicative variance inflation on the variational data assimilation objective function we find that all methods reduce the weight on observation information in the analysis. However, reconditioning methods have a larger effect on smaller eigenvalues, whereas multiplicative variance inflation does not change the sensitivity of the analysis to different scales. We then tested both methods of reconditioning and multiplicative variance inflation numerically on two examples: Example 1, a spatial covariance matrix, and Example 2, a covariance matrix arising from numerical weather prediction. In Section 4.2 we illustrated the theory developed earlier in the work, and also demonstrated that for two contrasting numerical frameworks, the change to the correlations and variances is significantly smaller for the majority of entries for the minimum eigenvalue method.
Both reconditioning methods depend on the choice of κ max , an optimal choice of which will depend on the specific problem in terms of computational resource and required precision. The smaller the choice of κ max , the more variances and correlations are altered, so it is desirable to select the largest condition number that the system of interest can deal with. Some aspects of a system that could provide insight into reasonable choices of κ max are: • For conjugate gradient methods, the condition number provides an upper bound on the rate of convergence for the problem Ax = b [Golub and Van Loan, 1996], and can provide an indication of the number of iterations required to reach a particular precision [Axelsson, 1996]. Hence κ max could be chosen such that a required level of precision is guaranteed for a given number of iterations.
• For more general methods, the condition number can provide an indication of the number of digits of accuracy that are lost during computations [Gill et al., 1986, Cheney, 2005.
Knowledge of the error introduced by other system components, such as approximations in linearised observation operators and linearised models, relative resolution of the observation network and state variables, precision and calibration of observing instruments, may give insight into a value of κ max that will maintain the level of precision of the overall problem.
• The condition number measures how errors in the data are amplified when inverting the matrix of interest [Golub and Van Loan, 1996]. Again, the magnitude of errors resulting from other aspects of the system may give an indication of a value of κ max that will not dominate the overall precision.
For our experiments we considered choices of κ max in the range 100 − 1000. For Experiment 2 these values are similar to those considered for the same instrument at different NWP centres e.g. 25, 100, 1000 [Weston, 2011], 67 [Weston et al., 2014], 54 and 493 [Bormann et al., 2015], 169 [Campbell et al., 2017]. We note that the dimension of this interchannel error covariance matrix in operational practice is small and only forms a small block of the full observation error covariance matrix. Additionally, the matrix considered in this paper corresponds to one observation type; there are many other observation types with different error characteristics.
In this work we have assumed that our estimated covariance matrices represent the desired correlation matrix well, in which case the above conditions on κ max can be used. This is not true in general, and it may be that methods such as inflation and localisation are also required in order to constrain the sources of uncertainty that are underestimated or mis-specified. In this case, the guidance we have presented in this paper concerning how to select the most appropriate choice of reconditioning method and target condition number will need to be adapted. Additionally, localisation alters the condition number of a covariance matrix as a side effect; the user does not have the ability to choose the target condition number κ max or control changes to the distribution of eigenvalues [Smith et al., 2018]. This indicates that reconditioning may still be needed in order to retain valuable correlation information whilst ensuring that the computation of the inverse covariance matrix is feasible.
The choice of which method is most appropriate for a given situation depends on the system being used and the depth of user knowledge of the characteristics of the error statistics. The ridge regression method preserves eigenstructure by increasing the weight of all eigenvalues by the same amount, compared to the minimum eigenvalue method which only increases the weight of small eigenvalues and introduces a large number of repeated eigenvalues. We have found that ridge regression results in constant changes to variances and strict decreases to absolute correlation values, whereas the minimum eigenvalue method makes smaller, non-monotonic changes to correlations and non-constant changes to variances. In the spatial setting, the minimum eigenvalue method introduced spurious correlations, whereas ridge regression resulted in a constant percentage reduction for all variables. In the inter-channel case, changes to standard deviations and most correlations were smaller for the minimum eigenvalue method than for ridge regression.
Another important property for reconditioning methods is the speed of convergence of minimisation of variational data assimilation problems. It is well-known that other aspects of matrix structure, such as repeated or clustered eigenvalues, are important for the speed of convergence of conjugate gradient minimisation problems. As the condition number is only sensitive to the extreme eigenvalues, conditioning alone cannot fully characterise the expected convergence behaviour. In the data assimilation setting, complex interactions occur between the constituent matrices [Tabeart et al., 2018], which can make it hard to determine the best reconditioning method a priori. One example of this is seen for operational implementations in Campbell et al. [2017], Weston [2011] where the ridge regression method results in fewer iterations for a minimisation procedure than the minimum eigenvalue method, even though the minimum eigenvalue method yields observation error covariance matrices with a large number of repeated eigenvalues. Furthermore, Tabeart et al. [2018] found cases in an idealised numerical framework where increasing the condition number of the Hessian of the data assimilation problem was linked to faster convergence of the minimisation procedure. Again, this was due to interacting eigenstructures between observation and background terms, which could not be measured by the condition number alone. Additionally, Haben [2011], Tabeart et al. [2018] find that the ratio of background to observation error variance is important for the convergence of a conjugate gradient problem. In the case where observation errors are small, poor performance of conjugate gradient methods is therefore likely. This shows that changes to the analysis of data assimilation problems due to the application of reconditioning methods are likely to be highly system dependent, for example due to: quality of estimated covariance matrices, interaction between background and observation error covariance matrices, specific implementations of the assimilation algorithm, and choice of preconditioner and minimisation routine. However, the improved understanding of alterations to correlations and standard deviations for each method of reconditioning provided here may allow users to anticipate changes to the analysis for a particular system of interest using the results from previous idealised and operational studies (e.g. Tabeart  where γ i (X) denotes the i-th largest singular value of X, p ≥ 1 and k ∈ {1, . . . , min{m, n}}.
As covariance matrices are positive semi-definite by definition, the singular values of a covariance matrix X ∈ R d×d are equal to its eigenvalues.
Proof. We apply the result given in Theorem 4 of [Tanaka and Nakata, 2014] for the trace norm (defined as p = 1 and k = d) to find the optimal value of µ * . Theorem 2 of the same work yields the minimising solutionX for the value of µ * .
We remark that the statement of Theorem 4 of [Tanaka and Nakata, 2014] uses the stronger assumption thatκ ≥ d. However, a careful reading of the proof of this theorem indicates that a weaker assumption is sufficient: we assume thatκ > d − l + 1 where l is the index such that λ l ≤ µ * < λ l−1 .
We note that this optimal value of µ * is the same as the threshold T = λ1 κ defined for the minimum eigenvalue method in (5) and hence the minimum eigenvalue method is equivalent to the Ky Fan 1-d minimizer of (A.2)in the case that κ ≥ d − l + 1.
The minimum eigenvalue method is still a valid method of reconditioning when the additional assumption on the eigenvalues of X is not satisfied. In particular, in the experiments considered in Section 4 we see qualitatively similar behaviour for the choices of T that satisfy the assumption, and those that do not. It is possible that the lower bound on the condition number imposed by the additional constraint on κ max could provide guidance on the selection of the target condition number.