Optimal shrinkage-based portfolio selection in high dimensions

In this paper we estimate the mean-variance portfolio in the high-dimensional case using recent results from the theory of random matrices. We construct a linear shrinkage estimator which is distribution-free and is optimal in the sense of maximizing, with probability $1$, the asymptotic out-of-sample expected utility, i.e., the mean-variance objective function, for different values of the risk aversion coefficient; in particular, this covers the maximization of the out-of-sample expected return and the minimization of the out-of-sample variance. One of the main features of our estimator is the inclusion of the estimation risk related to the sample mean vector into the high-dimensional portfolio optimization. The asymptotic properties of the new estimator are investigated when the number of assets $p$ and the sample size $n$ tend simultaneously to infinity such that $p/n \rightarrow c\in (0,+\infty)$. The results are obtained under weak assumptions imposed on the distribution of the asset returns: only the existence of moments of order $4+\varepsilon$ is required. Thereafter we perform numerical and empirical studies in which the small- and large-sample behavior of the derived estimator is investigated. The suggested estimator shows significant improvements over existing approaches, including the nonlinear shrinkage estimator and the three-fund portfolio rule, especially when the portfolio dimension is larger than the sample size. Moreover, it is robust to deviations from normality.


Introduction
In the seminal paper of Markowitz (1952) the author suggests determining the optimal composition of a portfolio of financial assets by minimizing the portfolio variance under the condition that the expected portfolio return attains some prespecified fixed value. By varying this value we obtain the whole efficient frontier in the mean-standard deviation space. Despite its simplicity, this approach justifies the advantages of diversification and is a standard technique and benchmark in asset management. Equivalently (see Tobin (1958), Bodnar et al. (2013)), we can obtain the same portfolios by maximizing the expected quadratic utility (EU), with the optimization problem given by
$$w^{\top}\mu_n - \frac{\gamma}{2}\, w^{\top}\Sigma_n w \to \max \quad \text{subject to} \quad w^{\top}\mathbf{1}_p = 1, \qquad (1.1)$$
where $w = (\omega_1, \ldots, \omega_p)^{\top}$ is the vector of portfolio weights, $\mathbf{1}_p$ is the $p$-dimensional vector of ones, and $\mu_n$ and $\Sigma_n$ are the $p$-dimensional mean vector and the $p \times p$ covariance matrix of the asset returns, respectively. The quantity $\gamma > 0$ determines the investor's attitude towards risk. It must be noted that the maximization of the mean-variance objective function (1.1) is equivalent to the maximization of the exponential (CARA) utility function under the assumption of normality of the asset returns. In this case $\gamma$ equals the investor's absolute risk aversion coefficient (see, e.g., Pratt (1964)). The solution of the optimization problem (1.1) is well known and is given by
$$w_{EU} = w_{GMV} + \gamma^{-1} Q_n \mu_n, \qquad (1.2)$$
where
$$Q_n = \Sigma_n^{-1} - \frac{\Sigma_n^{-1}\mathbf{1}_p\mathbf{1}_p^{\top}\Sigma_n^{-1}}{\mathbf{1}_p^{\top}\Sigma_n^{-1}\mathbf{1}_p} \qquad (1.3)$$
and
$$w_{GMV} = \frac{\Sigma_n^{-1}\mathbf{1}_p}{\mathbf{1}_p^{\top}\Sigma_n^{-1}\mathbf{1}_p} \qquad (1.4)$$
is the vector of the weights of the global minimum variance (GMV) portfolio. By varying the risk-aversion coefficient $\gamma \in (0, \infty)$ we obtain the set of optimal portfolios. Merton (1972) proved that this set is a parabola in the mean-variance ($R$-$V$) space (cf. Bodnar and Schmid (2009)) given by
$$(R - R_{GMV})^2 = s\,(V - V_{GMV}), \qquad (1.5)$$
where
$$R_{GMV} = \frac{\mathbf{1}_p^{\top}\Sigma_n^{-1}\mu_n}{\mathbf{1}_p^{\top}\Sigma_n^{-1}\mathbf{1}_p} \quad \text{and} \quad V_{GMV} = \frac{1}{\mathbf{1}_p^{\top}\Sigma_n^{-1}\mathbf{1}_p} \qquad (1.6)$$
are the expected return and the variance of the GMV portfolio, and
$$s = \mu_n^{\top} Q_n \mu_n \qquad (1.7)$$
is the slope parameter of the frontier. The quantity $s$ is always non-negative since $Q_n$ is a positive semidefinite matrix.
Moreover, when $s$ is equal to zero, the efficient frontier degenerates into a straight line with the GMV portfolio being the only optimal portfolio. In practice, however, the above-mentioned approach to constructing an optimal portfolio frequently shows poor out-of-sample performance in terms of various performance measures. Even naive portfolio strategies, e.g., the equally weighted portfolio (see DeMiguel et al. (2009)), often outperform the mean-variance strategy. One of the reasons is the estimation risk: the unknown parameters $\mu_n$ and $\Sigma_n$ have to be estimated using historical data on asset returns. This results in the "plug-in" estimator of the EU portfolio (1.2), a traditional and simple way to evaluate the portfolio in practice. This estimator is constructed by replacing the mean vector $\mu_n$ and the covariance matrix $\Sigma_n$ in (1.2) with their sample counterparts. Okhrin and Schmid (2006) derive the expectation and the variance of the sample portfolio weights under the assumption that the asset returns follow a multivariate normal distribution, whereas Bodnar and Schmid (2011) obtain the exact finite-sample distribution. Recently, Bodnar et al. (2016) extended these results to the case $n < p$.
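For concreteness, the closed-form solution and its plug-in counterpart can be sketched in a few lines of code. This is a minimal sketch: the function names are ours, and the sample covariance is computed with the unbiased normalization, which may differ from the normalization used later in the paper.

```python
import numpy as np

def eu_weights(mu, sigma, gamma):
    """Mean-variance (EU) portfolio weights: w_GMV + gamma^{-1} Q mu."""
    ones = np.ones(len(mu))
    sig_inv = np.linalg.inv(sigma)
    denom = ones @ sig_inv @ ones
    w_gmv = sig_inv @ ones / denom
    Q = sig_inv - np.outer(sig_inv @ ones, ones @ sig_inv) / denom
    return w_gmv + Q @ mu / gamma

def plugin_eu_weights(Y, gamma):
    """'Plug-in' estimator: sample mean/covariance in place of mu_n, Sigma_n.
    Y is p x n (assets in rows, observations in columns)."""
    mu_hat = Y.mean(axis=1)
    S = np.cov(Y, bias=False)   # p x p sample covariance (rows are assets)
    return eu_weights(mu_hat, S, gamma)
```

Since $Q_n \mathbf{1}_p = 0$, both the true and the plug-in weights automatically satisfy the full-investment constraint $w^{\top}\mathbf{1}_p = 1$.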
The estimation of the parameters has a negative impact on the performance of the asset allocation strategy. This has been noted in a series of papers, including Merton (1980), Best and Grauer (1991), and Chopra and Ziemba (1993), among others. Several approaches have been proposed to reduce the consequences of the estimation risk. One strand of research opts for the Bayesian framework and, using appropriate priors, takes the estimation risk into account already while building the portfolio. The second strand relies on shrinkage techniques and is related to the method exploited in this paper. A straightforward way to improve the properties of the estimators of $\mu_n$ and $\Sigma_n$ is to use the shrinkage approach (see Jorion (1986), Ledoit and Wolf (2004)). Alternatively, one may apply shrinkage estimation to the portfolio weights directly. Golosnoy and Okhrin (2007) consider a multivariate shrinkage estimator obtained by shrinking the portfolios with and without the riskless asset towards an arbitrary static portfolio. A similar technique is used by Frahm and Memmel (2010), who construct a feasible shrinkage estimator for the GMV portfolio which dominates the traditional one. Finally, Bodnar et al. (2018) suggest a shrinkage estimator for the GMV portfolio which is feasible even when the sample covariance matrix is singular.
An important issue nowadays, however, is asset allocation for large portfolios. The sample estimators work well only when the number of assets $p$ is fixed and substantially smaller than the sample size $n$. This setting is known as standard asymptotics in statistics (see Le Cam and Lo Yang (2000)); under it, the traditional sample estimator is a consistent estimator of the EU portfolio. But what happens when the dimension $p$ and the sample size $n$ are of comparable size, say $p = 900$ and $n = 1000$? Technically, we are then in the situation where both the number of assets $p$ and the sample size $n$ tend to infinity. When $p/n$ tends to some concentration ratio $c > 0$, this asymptotic regime is known as high-dimensional or "Kolmogorov" asymptotics (see, e.g., Bai and Silverstein (2010)). If $c$ is close to one, the sample covariance matrix is close to singular, and for $c > 1$ it becomes singular. Thus it is very unstable and tends to under- or overestimate the true parameters for $c$ smaller than but close to $1$ (see Bai and Shi (2011)). As a result, the sample estimator of the EU portfolio behaves badly in this case both from the theoretical and the practical point of view (see, e.g., El Karoui (2010); Rubio et al. (2012)). For $c > 1$ the inverse of the sample covariance matrix does not exist and the portfolio cannot be constructed in the traditional way.
Taking the above into account, the aim of this paper is to construct a feasible and simple shrinkage estimator of the EU portfolio which is optimal in an asymptotic sense and, additionally, distribution-free. The estimator is developed using a fast-growing branch of probability theory, namely random matrix theory. The main result of this theory was proved by Marčenko and Pastur (1967) and further extended under very general conditions by Silverstein (1995); it is now called the Marčenko-Pastur equation. Its importance arises in many areas of science because it shows how the true covariance matrix and its sample estimator are connected asymptotically. Knowing this, we can build suitable estimators for high-dimensional quantities which depend on $\Sigma_n$; in our case this refers to the shrinkage intensities. Note, however, that the optimal shrinkage intensity depends in turn on unknown characteristics of the asset returns. To overcome this problem we derive consistent estimators for specific functions (quadratic and bilinear forms) of the inverse sample covariance matrix and the sample mean vector. Furthermore, we succeed in providing consistent estimators for the optimal shrinkage intensities as well. An additional advantage of our approach is the simultaneous treatment of the estimation risk of both the covariance matrix and the mean vector. In particular, we contribute to the existing literature (see Ledoit and Wolf (2017a)) by weakening the assumption imposed on the mean vector of the asset returns.
It is worth mentioning that there are clear links between the subject of the paper and classical methods in statistical signal processing. The data generating process considered in the paper encompasses a broad range of system configurations described by the general vector channel model. Moreover, as for the aforementioned mean-variance portfolio optimization problem, the usual linear filtering schemes solving typical signal waveform estimation and detection problems in array signal processing and wireless communications are based on the estimation of the unknown population covariance matrix. A famous example is the equivalence of the GMV portfolio to the so-called Capon or minimum variance distortionless response (MVDR) beamformer (see Verdú (1998); Van Trees (2002)).
The rest of the paper is organized as follows. In the next section, we construct a shrinkage estimator for the optimal portfolio weights obtained by shrinking the EU portfolio weights towards an arbitrary target portfolio. The oracle shrinkage intensity and the corresponding feasible bona fide estimators for $c < 1$ and $c > 1$ are established as well. The derived results are evaluated in Section 3 in extensive simulation and empirical studies. All proofs are deferred to the Appendix presented in the supplementary material.
2 Optimal shrinkage estimator of the mean-variance portfolio

Let $Y_n = (y_1, y_2, \ldots, y_n)$ be the $p \times n$ data matrix which consists of $n$ vectors of the returns on $p \equiv p(n)$ assets. Let $E(y_i) = \mu_n$ and $Cov(y_i) = \Sigma_n$ for $i = 1, \ldots, n$. We assume that $p/n \to c \in (0, +\infty)$ as $n \to \infty$. This type of limiting behavior is known as "large-dimensional asymptotics" or "Kolmogorov asymptotics". In this case the traditional sample estimators perform poorly or even very poorly and tend to over- or underestimate the unknown parameters of the asset returns, e.g., the mean vector and the covariance matrix. Throughout the paper it is assumed that there exists a $p \times n$ random matrix $X_n$ consisting of independent and identically distributed (i.i.d.) real random variables with zero mean and unit variance such that
$$Y_n = \mu_n \mathbf{1}_n^{\top} + \Sigma_n^{1/2} X_n. \qquad (2.1)$$
It must be noted that the observation matrix $Y_n$ has dependent rows but independent columns. Broadly speaking, this means that we allow arbitrary cross-sectional correlations of the asset returns but assume their independence over time. Although this assumption looks quite restrictive for financial applications, there exist stronger results from random matrix theory which show that the model can be extended to (weakly) dependent variables by imposing more complicated conditions on the elements of $Y_n$ (see Bai and Zhou (2008)) or by controlling the number of dependent entries as the dimension increases (see Hui and Pan (2010), Friesen et al. (2013), Wei et al. (2016)). Although our findings can still be used when a weak serial dependence structure is present between the observation vectors, as in the case of uncorrelated GARCH (generalized autoregressive conditional heteroscedastic) processes or similar ones (see, e.g., the simulation study in Bodnar et al. (2021a)), we expect substantial changes in the analytical expressions stated in the theorems for strongly correlated observation vectors, as in the case of VAR (vector autoregressive) processes. In such situations the estimator will depend on the autocorrelation matrices of the underlying stochastic model and the theoretical results of the paper must be adjusted correspondingly. This interesting and important topic is not treated here and is left for future research. Nevertheless, if the entries of the matrix $Y_n$ are weakly dependent or so-called $m$-dependent, this only makes the proofs more technical but leaves the results unchanged. For that reason we assume asset returns that are independent in time, solely to simplify the proofs of the main theorems and make them as transparent as possible. The three assumptions used throughout the paper are the following:

(A1) The covariance matrix of the asset returns $\Sigma_n$ is a nonrandom $p$-dimensional positive definite matrix.
(A2) The elements of the matrix $X_n$ have uniformly bounded moments of order $4 + \varepsilon$ for some $\varepsilon > 0$.
(A3) The efficient frontier is asymptotically a non-degenerate object, i.e., its slope parameter satisfies $s = \mu_n^{\top} Q_n \mu_n > 0$ uniformly in $p$.
All of these regularity assumptions are general enough to cover many real-world situations. Assumption (A1) together with (2.1) is usual for financial and statistical problems and imposes no strong restrictions. Assumption (A2) is a technical one. Although we demand the existence of moments of order slightly higher than four, this is solely because almost sure convergence is employed in the formulation of the theoretical results; for convergence in probability the existence of exactly the fourth moment is sufficient. Indeed, it can easily be shown that this extra $\varepsilon$ follows from the Borel-Cantelli lemma (see the proof of Lemma 4 in Rubio and Mestre (2011)). Assumption (A3) has an important financial interpretation. It ensures that the efficient frontier is a parabola in the mean-variance space as defined in (1.5) and does not degenerate into a line parallel to the variance axis (cf. Bodnar and Bodnar (2010)). In the latter case, the only optimal portfolio is the GMV portfolio (1.4), a special case of the EU portfolio (1.2) with $\gamma = \infty$, and its shrinkage estimators have already been developed in Frahm and Memmel (2010) and Bodnar et al. (2018). Assumption (A3) can be tested in practice using Theorem 1 of Bodnar et al. (2021c).
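The data generating model (2.1) is straightforward to simulate. The following sketch draws one observation matrix $Y_n$; the particular choices of $\mu_n$ and $\Sigma_n$ below are arbitrary illustrative values of ours, not the configurations used in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 100, 200
mu = np.linspace(-0.3, 0.3, p)            # illustrative mean vector
A = rng.standard_normal((p, p))
sigma = A @ A.T / p + 0.1 * np.eye(p)     # positive definite, as required by (A1)

# Sigma^{1/2} via the spectral decomposition
vals, vecs = np.linalg.eigh(sigma)
sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

X = rng.standard_normal((p, n))           # i.i.d. entries, zero mean, unit variance
Y = mu[:, None] + sigma_half @ X          # model (2.1): Y_n = mu_n 1_n' + Sigma^{1/2} X_n
```

Each column of `Y` is one observation $y_i = \mu_n + \Sigma_n^{1/2} x_i$, so the columns are independent while the rows carry the cross-sectional correlation induced by $\Sigma_n$.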
The sample covariance matrix is given by
$$S_n = \frac{1}{n}\, Y_n \left( I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{\top} \right) Y_n^{\top},$$
where the symbol $I_n$ stands for the $n$-dimensional identity matrix. The sample mean vector becomes
$$\bar{y}_n = \frac{1}{n}\, Y_n \mathbf{1}_n = \mu_n + \Sigma_n^{1/2}\bar{x}_n \quad \text{with} \quad \bar{x}_n = \frac{1}{n}\, X_n \mathbf{1}_n.$$
In this section we consider the optimal shrinkage estimator for the EU portfolio weights presented in the introduction, obtained by choosing the shrinkage parameter $\alpha$ for a fixed target portfolio $b$.
The resulting estimator for $c < 1$ is given by
$$\hat{w}_{GSE} = \alpha_n \hat{w}_S + (1 - \alpha_n)\, b,$$
where the vector $\hat{w}_S$ is the sample estimator of the EU portfolio given in (1.2), namely
$$\hat{w}_S = \frac{S_n^{-1}\mathbf{1}_p}{\mathbf{1}_p^{\top} S_n^{-1}\mathbf{1}_p} + \gamma^{-1} \hat{Q}_n \bar{y}_n \quad \text{with} \quad \hat{Q}_n = S_n^{-1} - \frac{S_n^{-1}\mathbf{1}_p\mathbf{1}_p^{\top} S_n^{-1}}{\mathbf{1}_p^{\top} S_n^{-1}\mathbf{1}_p}. \qquad (2.6)$$
The target portfolio $b \in \mathbb{R}^p$ is a given nonrandom (or random, but independent of $Y_n$) vector with $b^{\top}\mathbf{1}_p = 1$. No assumption is imposed on the shrinkage intensity $\alpha_n$, which is the object of our interest.
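In code, the general shrinkage estimator is simply a linear combination of the two weight vectors; since both the sample EU weights and the target portfolio sum to one, the combination does so for any value of the intensity. A minimal sketch (the function name is ours):

```python
import numpy as np

def shrinkage_weights(w_s, b, alpha):
    """General shrinkage estimator: alpha * (sample EU weights) + (1 - alpha) * target.
    No restriction on alpha is assumed, mirroring the text."""
    return alpha * np.asarray(w_s) + (1.0 - alpha) * np.asarray(b)
```

The full-investment constraint is preserved automatically: if `w_s` and `b` each sum to one, so does the output, for every `alpha`.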

The aim is now to find the optimal shrinkage intensity for a given nonrandom target portfolio $b$. For that reason we introduce a unified mean-variance objective function in order to calibrate the shrinkage intensity $\alpha_n$. Consider the following optimization problem:
$$\hat{w}_{GSE}^{\top}\mu_n - \frac{\beta}{2}\, \hat{w}_{GSE}^{\top}\Sigma_n \hat{w}_{GSE} \to \max_{\alpha_n}. \qquad (2.7)$$
Obviously, the mean-variance objectives (1.1) and (2.7) coincide if $\beta = \gamma$. Other special values of $\beta$, which lead to widely used out-of-sample performance measures, are summarized in the following proposition.

Proposition 2.1 (Calibration criteria). The optimization problem (2.7) is equivalent to the maximization of the corresponding out-of-sample performance measures for the respective choices of $\beta$.

The proof of Proposition 2.1 follows from the fact that all optimal mean-variance portfolios can be obtained by maximizing the expected quadratic utility function with a specific risk aversion coefficient. As a result, the global minimum variance portfolio is a particular solution of the optimization problem (1.1). The formulation of the calibration criterion (2.7) provides an elegant way to find the optimal shrinkage intensity $\alpha_n = \alpha_n(\beta)$ in a unified manner for several popular out-of-sample loss functions and to compare them simply by changing the parameter $\beta$. In Section 2.3 we provide consistent estimators of these quantities under the high-dimensional asymptotic regime $p/n \to c > 0$ as $(p, n) \to \infty$.
It is worth mentioning that the coefficient $\beta$ has an interesting interpretation from a statistical point of view. While the coefficient $\gamma$ controls the investor's attitude towards financial risk ("in-sample risk"), the parameter $\beta$ controls the estimation risk ("out-of-sample risk"). This implies that even a mean-variance investor with arbitrary $\gamma > 0$ could choose $\beta \to \infty$ if she/he is interested, for example, in the minimization of the out-of-sample variance of the estimated portfolio.
The unified calibration criterion (2.7) can be rewritten as a quadratic function of $\alpha_n$,
$$U(\alpha_n) = \hat{w}_{GSE}^{\top}\mu_n - \frac{\beta}{2}\, \hat{w}_{GSE}^{\top}\Sigma_n \hat{w}_{GSE} \quad \text{with} \quad \hat{w}_{GSE} = \alpha_n \hat{w}_S + (1 - \alpha_n)\, b. \qquad (2.8)$$
Next, taking the derivative of $U$ with respect to $\alpha_n$ and setting it equal to zero, we get
$$(\hat{w}_S - b)^{\top}\mu_n - \beta\, \big(\alpha_n \hat{w}_S + (1 - \alpha_n)\, b\big)^{\top}\Sigma_n (\hat{w}_S - b) = 0.$$
From the last equation it is easy to find the optimal shrinkage intensity $\alpha_n^*$ given by
$$\alpha_n^* = \frac{(\hat{w}_S - b)^{\top}\mu_n - \beta\, b^{\top}\Sigma_n (\hat{w}_S - b)}{\beta\, (\hat{w}_S - b)^{\top}\Sigma_n (\hat{w}_S - b)}. \qquad (2.9)$$
To ensure that $\alpha_n^*$ is the unique maximizer of (2.7), the second derivative of $U$ must be negative, which is always fulfilled. Indeed, it follows from the positive definiteness of the matrix $\Sigma_n$ that
$$\frac{\partial^2 U}{\partial \alpha_n^2} = -\beta\, (\hat{w}_S - b)^{\top}\Sigma_n (\hat{w}_S - b) < 0. \qquad (2.10)$$
In the next theorem we derive the asymptotic properties of the optimal shrinkage intensity $\alpha_n^*$ under large-dimensional asymptotics.
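The first-order condition above can be verified numerically: the sketch below (with arbitrary test inputs of ours) computes the closed-form maximizer of the quadratic objective and checks it against a grid search over $\alpha_n$.

```python
import numpy as np

def utility(w, mu, sigma, beta):
    """Mean-variance objective w'mu - (beta/2) w'Sigma w."""
    return w @ mu - 0.5 * beta * (w @ sigma @ w)

def optimal_alpha(w_s, b, mu, sigma, beta):
    """Closed-form maximizer of U(alpha) for w(alpha) = alpha*w_s + (1-alpha)*b."""
    d = w_s - b
    return (d @ mu - beta * (b @ sigma @ d)) / (beta * (d @ sigma @ d))
```

Note that the denominator is positive whenever $\Sigma_n$ is positive definite and $\hat{w}_S \neq b$, which mirrors the second-order condition (2.10).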
Theorem 2.1. Assume (A1)-(A3). Then $\alpha_n^*$ tends almost surely to a deterministic limit $\alpha^*$, which depends only on the parameters of the efficient frontier $R_{GMV}$, $V_{GMV}$ and $s$ given in (1.6) and (1.7), respectively, and on the quantities $R_b = b^{\top}\mu_n$ and $V_b = b^{\top}\Sigma_n b$, the expected return and the variance of the target portfolio $b$.
Next, we assess the performance of the classical estimator of the portfolio weights $\hat{w}_S$ and of the optimal shrinkage weights $\hat{w}_{GSE}$. As a measure of performance we consider the relative loss in the utility of the portfolio return compared to the portfolio based on the true parameters of the asset returns. The results are summarized in the following corollary.
Corollary 2.1. (a) Let $U_{EU}$ and $U_S$ be the mean-variance objectives in (1.1) for the true EU portfolio and its traditional estimator, respectively. Then, under the assumptions of Theorem 2.1, the relative loss of the traditional estimator of the EU portfolio is given by for $p/n \to c \in (0, 1)$ as $n \to \infty$.
(b) Let $U_{GSE}$ be the expected quadratic utility for the optimal shrinkage estimator of the EU portfolio. Under the assumptions of Theorem 2.1, the relative loss of the optimal shrinkage estimator is given by

2.2 Oracle estimator. Case c > 1.
Here, similarly to Bodnar et al. (2018), we use a generalized inverse of the sample covariance matrix $S_n$. In particular, we use
$$S_n^* = \Sigma_n^{-1/2} \left( \frac{1}{n}\, X_n X_n^{\top} - \bar{x}_n \bar{x}_n^{\top} \right)^{+} \Sigma_n^{-1/2}, \qquad (2.14)$$
where $+$ denotes the Moore-Penrose inverse. It can be shown that $S_n^*$ is a generalized inverse of $S_n$ satisfying $S_n^* S_n S_n^* = S_n^*$ and $S_n S_n^* S_n = S_n$. However, $S_n^*$ is not exactly equal to the Moore-Penrose inverse because it does not satisfy the symmetry conditions $(S_n^* S_n)^{\top} = S_n^* S_n$ and $(S_n S_n^*)^{\top} = S_n S_n^*$.

In the case $c < 1$ the generalized inverse $S_n^*$ coincides with the usual inverse $S_n^{-1}$. Moreover, if $\Sigma_n$ is a multiple of the identity matrix, then $S_n^*$ is equal to the Moore-Penrose inverse $S_n^+$. In this section, $S_n^*$ is used only to determine an oracle estimator for the weights of the EU portfolio; the bona fide estimator is constructed in the next section.
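The generalized-inverse identities stated above are easy to check numerically. The sketch below builds $S_n^*$ as in (2.14) for a synthetic $\Sigma_n$ of ours (an oracle setting, since $\Sigma_n^{\pm 1/2}$ is required) and verifies which of the Penrose conditions hold.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 20, 10                       # c = p/n = 2 > 1, so S_n is singular
vals = np.linspace(0.5, 3.0, p)     # eigenvalues of a synthetic Sigma_n
U = np.linalg.qr(rng.standard_normal((p, p)))[0]
sigma_half = U @ np.diag(np.sqrt(vals)) @ U.T
sigma_half_inv = U @ np.diag(1.0 / np.sqrt(vals)) @ U.T

X = rng.standard_normal((p, n))
xbar = X.mean(axis=1)
A = X @ X.T / n - np.outer(xbar, xbar)         # (1/n) X X' - xbar xbar'
S = sigma_half @ A @ sigma_half                # sample covariance matrix S_n
S_star = sigma_half_inv @ np.linalg.pinv(A) @ sigma_half_inv   # (2.14)

# reflexive generalized-inverse identities hold:
print(np.allclose(S_star @ S @ S_star, S_star))   # True
print(np.allclose(S @ S_star @ S, S))             # True
# ... but the Penrose symmetry conditions fail in general:
print(np.allclose((S_star @ S).T, S_star @ S))    # False in general
```

The first two identities follow algebraically from $A^{+}AA^{+} = A^{+}$ and $AA^{+}A = A$ after conjugating with $\Sigma_n^{\pm 1/2}$, which is why $S_n^*$ is called a reflexive generalized inverse.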
Thus, the oracle estimator for $c > 1$ is given by
$$\hat{w}_{GSE}^{+} = \alpha_n^{+} \hat{w}_{S^*} + (1 - \alpha_n^{+})\, b,$$
where the vector $\hat{w}_{S^*}$ is the sample estimator of the EU portfolio given in (1.2) with $S_n^{-1}$ replaced by $S_n^*$, namely
$$\hat{w}_{S^*} = \frac{S_n^{*}\mathbf{1}_p}{\mathbf{1}_p^{\top} S_n^{*}\mathbf{1}_p} + \gamma^{-1} \hat{Q}_n^{*} \bar{y}_n \quad \text{with} \quad \hat{Q}_n^{*} = S_n^{*} - \frac{S_n^{*}\mathbf{1}_p\mathbf{1}_p^{\top} S_n^{*}}{\mathbf{1}_p^{\top} S_n^{*}\mathbf{1}_p}. \qquad (2.17)$$
Again, the shrinkage intensity $\alpha_n^{+}$ is the object of our interest. To save space we skip the optimization procedure for $\alpha_n^{+}$, as it is only slightly different from the case $c < 1$. In the next theorem we find the asymptotic equivalent of $\alpha_n^{+}$ in the case $p/n \to c \in (1, +\infty)$ as $n \to \infty$.
Next, as in the case $c < 1$, we provide the expressions for the relative losses.
Corollary 2.2. (a) Let $U_{EU}$ and $U_S$ be the mean-variance objectives in (1.1) for the true EU portfolio and its traditional estimator, respectively. Then, under the assumptions of Theorem 2.2, the relative loss of the traditional estimator of the EU portfolio is given by for $p/n \to c \in (1, +\infty)$ as $n \to \infty$.
(b) Let $U_{GSE}$ be the expected quadratic utility for the optimal shrinkage estimator of the EU portfolio. Under the assumptions of Theorem 2.2, the relative loss of the optimal shrinkage estimator is given by for $p/n \to c \in (1, +\infty)$ as $n \to \infty$, where $L_b = (U_{EU} - U_b)/U_{EU}$ is the relative loss in the expected utility $U_b$ of the target portfolio $b$.

Estimation of unknown parameters. Bona fide estimator
The limiting shrinkage intensities α * and α + are not feasible in practice, because they depend on R GM V , V GM V , s, R b , and V b which are unknown quantities. In this subsection we derive consistent estimators for α * and α + . These results are summarized in two propositions dealing with the cases c ∈ (0, 1) and c ∈ (1, ∞), respectively. The statements follow directly from the proofs of Theorems 2.1 and 2.2 that are provided in the supplement of the paper.
Proposition 2.2. The consistent estimator for the limiting optimal shrinkage intensity $\alpha^*$ under large-dimensional asymptotics $p/n \to c < 1$ as $n \to \infty$ is given by $\hat{\alpha}^*$, which is expressed through ratio-consistent estimators of $R_{GMV}$, $V_{GMV}$, $s$, $R_b$, and $V_b$, respectively, where $\hat{R}_{GMV}$, $\hat{V}_{GMV}$ and $\hat{s}$ are the traditional plug-in estimators.
Using Proposition 2.2 we can immediately construct a bona fide estimator for the expected utility portfolio weights in the case $c < 1$. It holds that
$$\hat{w}_{GSE} = \hat{\alpha}^* \hat{w}_S + (1 - \hat{\alpha}^*)\, b, \qquad (2.28)$$
with $\hat{\alpha}^*$ given in Proposition 2.2. The expression (2.28) is the optimal shrinkage estimator for a given target portfolio $b$ in the sense that the shrinkage intensity $\hat{\alpha}^*$ tends almost surely to its optimal value $\alpha^*$ for $p/n \to c \in (0, 1)$ as $n \to \infty$.
The situation is more complex in the case $c > 1$. Here we can present only oracle estimators for the unknown quantities $R_{GMV}$, $V_{GMV}$ and $s$.
Proposition 2.3. The consistent estimator for the limiting optimal shrinkage intensity $\alpha^{+}$ under large-dimensional asymptotics $p/n \to c > 1$ as $n \to \infty$ is given by $\hat{\alpha}^{o}$, where $\hat{R}_{GMV}$, $\hat{V}_{GMV}$ and $\hat{s}$ are the traditional plug-in estimators based on the generalized inverse $S_n^*$ from (2.14), and $\hat{R}_b$ and $\hat{V}_b$ are given in (2.26) and (2.27), respectively.
Note that $\hat{\alpha}^{o}$ from Proposition 2.3 is not a bona fide estimator of the unknown shrinkage intensity $\alpha^{+}$, since the matrix $S_n^*$ depends on unknown quantities. Thus, we propose a reasonable approximation based on the Moore-Penrose inverse $S_n^{+}$. As a result, the bona fide estimators of the quantities $R_{GMV}$, $V_{GMV}$ and $s$ in the case $c > 1$ are approximated by the expressions in (2.30), obtained by replacing $S_n^*$ with $S_n^{+}$. The application of (2.30) leads to the bona fide optimal shrinkage estimator of the EU portfolio in the case $c > 1$, where $\hat{R}_b$ and $\hat{V}_b$ are given in (2.26) and (2.27), respectively,
$$\hat{Q}_n^{+} = S_n^{+} - \frac{S_n^{+}\mathbf{1}_p\mathbf{1}_p^{\top} S_n^{+}}{\mathbf{1}_p^{\top} S_n^{+}\mathbf{1}_p},$$
and $S_n^{+}$ is the Moore-Penrose pseudo-inverse of the sample covariance matrix $S_n$.
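A bona fide sketch of the $c > 1$ weights can thus be written directly with the Moore-Penrose inverse. This is our illustrative implementation of the construction above; the function name is ours, and the shrinkage intensity is left as a separate input since the full estimator from (2.30)-(2.32) is not reproduced here.

```python
import numpy as np

def eu_weights_pinv(Y, gamma):
    """Sample EU weights for c = p/n > 1: S_n^{-1} replaced by pinv(S_n),
    using Q_n^+ = S_n^+ - S_n^+ 1 1' S_n^+ / (1' S_n^+ 1)."""
    p = Y.shape[0]
    ones = np.ones(p)
    S_plus = np.linalg.pinv(np.cov(Y))     # Moore-Penrose inverse of S_n
    ybar = Y.mean(axis=1)
    denom = ones @ S_plus @ ones
    Q_plus = S_plus - np.outer(S_plus @ ones, ones @ S_plus) / denom
    return S_plus @ ones / denom + Q_plus @ ybar / gamma
```

Because $\hat{Q}_n^{+}\mathbf{1}_p = 0$, these weights again sum to one, so the shrinkage step $\alpha\, \hat{w} + (1-\alpha)\, b$ preserves the full-investment constraint.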
Remark 1. It is easy to verify that if Σ n = σ 2 I p for any σ > 0 the considered approximations in (2.30) become the exact ones. Next, we investigate the quality of this approximation in general case without imposing restrictions on Σ n . This issue was studied for other quantities involving S * n and S + n in detail by Bodnar and Parolya (2020), who compare the limiting spectral distributions of S * n and S + n by deriving the limits for their corresponding Stieltjes transforms. It is concluded that the two inverses behave completely different in general. However, when the concentration ratio c approaches 1, then the limiting spectral distributions of both inverses S * n and S + n coincide independently of the structure of Σ n . This in turn means that one should expect a good approximation quality when c is not far away from 1.
In Figure 1 we provide a comparison between the optimal shrinkage intensities computed using the two types of generalized inverse, namely the Moore-Penrose inverse $S_n^{+}$ and the reflexive inverse $S_n^*$ from (2.14). The optimal shrinkage intensity is calculated by (2.32) in the case of $S_n^{+}$ and by the analogue of (2.32) based on $S_n^*$ in the other case. The design of the simulation study is exactly the same as the one described in Section 3.1.
We observe that the two optimal shrinkage intensities are nearly identical up to the breaking point $c = 2$; for $c > 2$ the Moore-Penrose approximation is no longer reliable. We observe a similar behaviour for other structures of the covariance matrix $\Sigma_n$ and the mean vector $\mu_n$, which indicates that the results are robust and justifies the theoretical findings of Bodnar and Parolya (2020) for the optimal shrinkage intensity given in (2.32). Figure 2 provides further numerical results related to the comparison of the two optimal shrinkage intensities. Here, we set $c = 1.5$ and $c = 2$ and study the robustness of the results in Figure 1 to changes in the dimension $p$ from 50 to 300. We see that the shrinkage intensities are very similar uniformly over $p$, independently of the chosen value of $c$.
Summarizing the above findings, we can recommend the application of the Moore-Penrose approximation for $c \leq 2$, but there is no guarantee of good performance for $c > 2$. We also observe that the Moore-Penrose inverse gets closer to $S_n^*$ when the covariance matrix $\Sigma_n$ is sparse. In the empirical study of Section 3.2 we consider values of the concentration ratio $c$ bounded by 2. The empirical results are in line with the discussion provided in this remark, and thus $c \approx 2$ indeed seems to be a breaking point for the approximation. A more thorough theoretical treatment of this interesting phenomenon is of independent research interest and is not within the scope of this paper.
Remark 2. So far we have handled the two cases $c < 1$ and $c > 1$ (for $c \leq 2$), but not $c = 1$. The case $c = 1$ is not easy to manage because the sample covariance matrix is theoretically invertible for $c$ equal to or close to one but computationally very unstable. The reason is the smallest eigenvalue of $S_n$, which is numerically very close to zero. Indeed, it is well known that the smallest eigenvalue of $S_n$ is of order $(1 - \sqrt{p/n})^2$, which converges to zero if $p/n \to 1$, and all the estimators explode (see, e.g., Bai and Yin (1993)).
In order to overcome the difficulty in a small neighborhood of $c = 1$ one has a few options to proceed.

Tikhonov (ridge). One possibility, which has also been used in the simulation and empirical studies of Section 3, is the Tikhonov (ridge) approximation of the Moore-Penrose inverse. Indeed, one can show via the eigenvalue decomposition that
$$S_n^{+} = \lim_{\delta \to 0^{+}} \left( S_n^2 + \delta I_p \right)^{-1} S_n \qquad (2.33)$$
if $c > 1$. For $c < 1$ the limit in (2.33) trivially exists and equals $S_n^{-1}$. The advantage of the representation (2.33) is twofold: first, one has an elegant formula which incorporates both cases $c < 1$ and $c > 1$; secondly, one stabilizes the behaviour of the inverse matrix near singularity, i.e., near $c = 1$. The only remaining question is how to choose $\delta = o(1)$ in practice, but taking $\delta = 1/p$ seems to work well in many applications. Thus, we employ this adjustment in the empirical study in order to have a balanced and stable estimator when $c$ is close to 1 from either side. Although this procedure smooths out the estimator of the precision matrix, it does not resolve the issue when $c$ is large, i.e., $c > 2$.
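The convergence in (2.33) can be illustrated numerically. The sketch below, with synthetic data of ours for which $S_n$ is singular, shows the Tikhonov approximation approaching the Moore-Penrose inverse as $\delta$ decreases:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 20                      # c = 1.5, so S_n is singular
Y = rng.standard_normal((p, n))
S = np.cov(Y)                      # rank n - 1 < p

S_pinv = np.linalg.pinv(S)
errs = []
for delta in (1e-2, 1e-4, 1e-6):
    S_tik = np.linalg.solve(S @ S + delta * np.eye(p), S)   # (S_n^2 + delta I)^{-1} S_n
    errs.append(np.linalg.norm(S_tik - S_pinv) / np.linalg.norm(S_pinv))
# errs shrinks monotonically as delta -> 0, illustrating the limit in (2.33)
```

On each eigendirection with eigenvalue $\lambda$ the approximation acts as $\lambda/(\lambda^2 + \delta)$, which damps near-zero eigenvalues instead of letting them blow up; this is exactly the stabilization exploited near $c = 1$.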
Moore-Penrose. Yet another option would be to derive the explicit limit of (2.18) when the Moore-Penrose inverse $S_n^{+}$ is used directly in (2.15)-(2.17). This procedure is highly nontrivial because $S_n^{+}$ depends in a nonlinear way on the matrix $\Sigma_n$, which leads to nonlinear integral equations in the high-dimensional setting. The problem becomes even more involved when we consider quadratic and bilinear forms involving the Moore-Penrose inverse. Moreover, the case of the centered sample covariance matrix (with the sample mean subtracted) makes the expressions tedious and confusing (see, e.g., Pan (2014)). Thus, the fact that the optimal shrinkage intensity depends on the mean vector $\mu_n$ and the covariance matrix $\Sigma_n$ only via the three parameters of the efficient frontier would be lost, and no closed-form formulas can be derived unless $\Sigma_n = \sigma^2 I$. Nevertheless, we are working on this problem in a separate project and are studying the properties of the pseudo-inverse $S_n^{+}$ in detail, especially the limiting behaviour of its eigenvectors.
Double-shrinkage. A more intuitive option is to apply a double-shrinkage approach, which combines the shrinkage of the estimated portfolio weights with the regularization of the sample covariance matrix. Namely, first we shrink (regularize) the sample covariance matrix, and then we shrink the estimated portfolio weights built upon the regularized sample covariance matrix. This could be done by taking either the matrix $S_n + \delta I$ for some $\delta > 0$ or the one from (2.33) instead of $S_n$. The limiting shrinkage intensity $\alpha$ will then depend on $\delta$, and one needs to optimize the objective function (2.7) over $\delta$ as well.
Unfortunately, no closed-form expression for the optimal shrinkage intensity is available in this case and the theoretical results presented in Section 2 must be changed accordingly. Actually, this procedure would generalize all of the above mentioned ideas (including the ones presented in this paper) and would provide a data-driven regularization parameter δ. This interesting and important topic is left for future research.

Simulation and empirical studies
In this section we illustrate the performance and the advantages of the derived results using simulated and real data. In particular, we address the estimation precision of the shrinkage intensity and compare the bona fide estimator with existing approaches.

Simulation study
For simulation purposes we select the structure of the spectrum of the covariance matrix and of the mean vector so as to be consistent with the characteristics of empirical data. In particular, for each dimension $p$ we select the expected returns equally spaced on the interval $[-0.3, 0.3]$, capturing a typical range of daily returns measured in percent. The covariance matrix has a strong impact on the properties of the shrinkage intensity, and for this reason we consider several structures of its spectrum. Replicating the properties of empirical data, we generate covariance matrices with eigenvalues satisfying $\lambda_i = 0.1\, e^{\delta_c (i-1)/p}$ for $i = 1, \ldots, p$ (see, e.g., Bodnar et al. (2021b) for an implementation). Thus the smallest eigenvalue is $0.1$, and by selecting appropriate values of $\delta_c$ we control the largest eigenvalue and hence the condition index of the covariance matrix. Large condition indices imply ill-conditioned covariance matrices, whose eigenvalues are very sensitive to changes in the matrix elements. We choose $\delta_c$ to attain condition indices of 150, 1000 and 8000. The target portfolio weights are set equal to the weights of the equally weighted portfolio, i.e., $b_i = 1/p$ for $i = 1, \ldots, p$. The calibration criteria used to determine the optimal shrinkage intensities are selected as defined in Proposition 2.1. First, we assess the general behavior of the oracle shrinkage intensities as functions of $c$. The oracle shrinkage intensities are computed using the expressions in (2.11) and (2.19) for the cases $c < 1$ and $c > 1$, respectively; the parameters are computed using the true mean vector and the true covariance matrix. The results are illustrated for different condition indices and different calibration criteria in Figure 3. We observe that in all cases the shrinkage intensity falls to zero as $c \to 1^{-}$ and increases with $c$ for $c > 1$.
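The covariance spectra used in this design can be generated as follows. In this sketch the rate $\delta_c$ is solved from the targeted condition index $\lambda_p/\lambda_1$, and the eigenvectors are drawn as a random orthogonal matrix, which is an assumption on our part since the eigenvectors are not specified above.

```python
import numpy as np

def sim_covariance(p, cond_index, lam_min=0.1, seed=0):
    """Covariance matrix with eigenvalues lam_i = lam_min * exp(delta_c*(i-1)/p),
    where delta_c is chosen so that lam_p / lam_1 equals cond_index."""
    delta_c = np.log(cond_index) * p / (p - 1)    # solves exp(delta_c*(p-1)/p) = cond_index
    lam = lam_min * np.exp(delta_c * np.arange(p) / p)
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((p, p)))[0]   # random orthogonal eigenvectors
    return U @ np.diag(lam) @ U.T

sigma = sim_covariance(p=100, cond_index=150)
```

By construction the smallest eigenvalue equals `lam_min` and the ratio of the extreme eigenvalues equals the requested condition index, so the three settings above correspond to `cond_index` values of 150, 1000 and 8000.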
Thus if c is small, the shrinkage estimator puts a higher weight on the traditional estimator of the portfolio weights, due to the lower estimation risk. As c tends to 1, the system becomes unstable because of nearly zero eigenvalues. In this case the portfolio weights collapse to the target portfolio weights. As c increases further, the shrinkage intensity increases too, implying that the pseudo-inverse covariance matrix can again be evaluated in a reliable way. The fraction of the sample EU portfolio increases with c in this case. It is worth mentioning that at some high level of c the information content in the data becomes less relevant and the shrinkage intensity starts to decrease. Note, however, that even for p much larger than n there is still valuable information in the sample covariance matrix, leading to relatively high values of $\alpha^{+}$.
Regarding the calibration criteria, we observe that if the calibration criterion coincides with the expected quadratic utility (i.e. $\beta = \gamma$), then the limit shrinkage intensities are naturally higher compared to those minimizing the out-of-sample variance. The variance of the portfolio return for the equally weighted portfolio tends to be lower than that of the sample EU portfolio, so the shrinkage intensity weights the equally weighted portfolio more heavily. It is important to stress that the latter calibration criterion is more sensitive to the condition index. In a similar fashion we analyze the relative losses of portfolios based on the traditional estimator and the oracle shrinkage estimator. As a benchmark we take the equally weighted portfolio, which is also the target portfolio of the shrinkage estimator. The relative losses as functions of c for fixed p = 100 are plotted in Figure 4. For c < 1 the losses of the traditional estimator show explosive behavior and are comparable to the shrinkage-based estimators only for very small values of c. Thus the traditional estimator is reliable only if the sample size is considerably larger than the portfolio dimension. The performance of the shrinkage-based estimator is relatively stable over the whole range of c, and it clearly dominates both the traditional estimator and the equally weighted benchmark in almost all of the considered cases. The losses increase for c < 1 and attain the loss of the equally weighted portfolio around c = 1. This is consistent with the results in Figure 3. For c > 1 the losses decrease and remain stable for c > 3.
The behavior of the losses as functions of the dimension p is illustrated in Figure 5. For space reasons we provide here only the results for $\beta = \gamma$, i.e., for the first calibration criterion. The fraction c is set to 0.2 (top left), 0.5 (top right), 0.8 and 2, while the condition index equals 1000. From a financial perspective it is important to note that the traditional estimator outperforms the equally weighted portfolio only for small values of c (in this particular setup for c = 0.2), that is, when the classical estimators are stable and robust. This is consistent with the evidence from Figure 4. For c = 0.8 the losses of the traditional estimator increase dramatically and are always considerably larger than the losses of the two other considered trading strategies. As before, the shrinkage-based estimator clearly beats both the traditional EU portfolio and the benchmark portfolio for all values of c. Furthermore, the performance of the shrinkage-based portfolio is stable for a wide range of dimensions, particularly for large values of p.

Empirical study
The data used in this study cover daily returns on 395 S&P 500 constituents available for the whole period from 01.01.2000 to 23.03.2018. The investor allocates his/her wealth to the constituents with daily reallocation. We address several issues in this empirical study. First, we wish to verify the robustness of the established theoretical results on empirical data. Thus our aim is to go beyond the common practice of considering a single portfolio and instead to generate a large set of different portfolios from the universe of the S&P 500 constituents. Second, we assess the economic performance of the dynamic portfolio strategies stemming from the generalized shrinkage estimator of the portfolio weights. To this end, we consider several popular performance measures and test the significance of the differences between the alternative strategies. Third, the choice of the target portfolio can clearly have a substantial impact on the empirical results. For this reason we consider several popular choices of the target. Finally, we wish to assess the dynamics of the estimated shrinkage intensities and relate their behavior to the market conditions. Next, we provide details on the setup of the empirical study.
To address the applicability of the suggested estimator in high dimensions we set p = 300, which is larger than a typical portfolio size in the literature. For each parameter constellation we randomly draw 1000 sets of assets from the available constituents of the S&P 500 index. This guarantees a robust assessment of the empirical results. For every set of assets we build portfolios on each of the last 1000 trading days and compute the corresponding realized returns. Afterwards, we compute the certainty equivalent (CE), Sharpe ratio (SR), Value-at-Risk (VaR) and Expected Shortfall (ES) as performance measures for each path of returns and every random portfolio. To avoid potentially skewed inferences due to outliers or asymmetries, we report the 10%-trimmed means and the medians of the CE and SR over the 1000 random portfolios. The VaR and ES are computed as lower empirical quantiles at the 5% and 1% levels and are averaged over the portfolios as well. For simplicity we neglect transaction costs in the discussion below.
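As a concrete reference for how these measures are computed from a realized return path, the following sketch uses standard definitions (sample moments for CE and SR with a zero risk-free rate, lower empirical quantiles for VaR, tail averages for ES); the function name and exact conventions are ours, not the paper's:

```python
import numpy as np

def performance_measures(r, gamma=1.0, alpha=0.05):
    """CE, SR, VaR and ES of a realized return path r (1-D array)."""
    mean, var = r.mean(), r.var()
    ce = mean - gamma / 2.0 * var          # certainty equivalent
    sr = mean / r.std()                    # Sharpe ratio (zero risk-free rate)
    var_alpha = np.quantile(r, alpha)      # Value-at-Risk: lower empirical quantile
    es = r[r <= var_alpha].mean()          # Expected Shortfall: mean return beyond VaR
    return ce, sr, var_alpha, es
```

In the study these quantities would be computed for each of the 1000 random portfolios and then trimmed-averaged, as described above.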

Target portfolios
The target portfolio weights are the key component of the shrinkage estimator. We consider three different targets: the equally weighted portfolio and two modified global minimum-variance portfolios. The equally weighted portfolio arises if we assume that all asset returns have equal expectations, equal variances and equal correlations. The covariance matrix for the first global minimum-variance portfolio allows for different variances but assumes equal correlations, and thus allows for more heterogeneity compared to the equally weighted portfolio. The single correlation is computed as the average correlation over all asset pairs. The corresponding target is computed as $\hat{b}_{ec} = \hat{\Sigma}_{ec}^{-1}\mathbf{1}/(\mathbf{1}^{\top}\hat{\Sigma}_{ec}^{-1}\mathbf{1})$, where $\hat{\Sigma}_{ec} = \mathrm{diag}\{\hat{\Sigma}\}^{1/2} R_{ec}\, \mathrm{diag}\{\hat{\Sigma}\}^{1/2}$ with $\mathrm{diag}\{\hat{\Sigma}\}$ being the diagonal of the sample covariance matrix of the returns. For the second global minimum-variance portfolio we compute the covariance matrix of returns using the three-factor Fama-French model, i.e.
$$\hat{\Sigma}_{ff} = \hat{B}\hat{\Sigma}_f\hat{B}^{\top} + \mathrm{diag}\{\hat{\sigma}^2_{\varepsilon,i}\}_{i=1,\ldots,p},$$ where $\hat{B}$ is the matrix of estimated parameters, $\hat{\Sigma}_f$ is the covariance matrix of the factors and $\mathrm{diag}\{\hat{\sigma}^2_{\varepsilon,i}\}_{i=1,\ldots,p}$ is the diagonal matrix of the residual variances. The resulting target vector of weights is defined as $\hat{b}_{ff} = \hat{\Sigma}_{ff}^{-1}\mathbf{1}/(\mathbf{1}^{\top}\hat{\Sigma}_{ff}^{-1}\mathbf{1})$. The latter two portfolios reduce the variation of the portfolio weights by exploiting the variances but not the means of the underlying assets. Note that $\hat{b}_{ec}$ and $\hat{b}_{ff}$ are stochastic by construction, since they are computed from sample characteristics of the asset returns. Thus the theoretical results in these cases are valid only conditionally on the target vector.
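The equal-correlation target can be sketched as follows (function and variable names are ours; the paper's exact estimator conventions, e.g. the divisor in the sample covariance, may differ):

```python
import numpy as np

def equal_correlation_target(returns):
    """Target b_ec: GMV weights under the equal-correlation covariance
    Sigma_ec = diag{S}^{1/2} R_ec diag{S}^{1/2}, with R_ec built from the
    average pairwise sample correlation.  returns: n x p array."""
    S = np.cov(returns, rowvar=False)
    d = np.sqrt(np.diag(S))
    corr = S / np.outer(d, d)
    p = S.shape[0]
    rho = (corr.sum() - p) / (p * (p - 1))   # average off-diagonal correlation
    R_ec = np.full((p, p), rho)
    np.fill_diagonal(R_ec, 1.0)
    Sigma_ec = np.outer(d, d) * R_ec
    w = np.linalg.solve(Sigma_ec, np.ones(p))
    return w / w.sum()                       # normalize so the weights sum to one
```

The Fama-French target follows the same GMV formula with the factor-model covariance in place of Sigma_ec.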

Benchmark models
To guarantee a fair assessment of the suggested estimator we consider two popular approaches as benchmarks. The first approach is based on the nonlinear shrinkage estimator of the covariance matrix suggested by Ledoit and Wolf (2012, 2017a). The estimator relies on the spectral decomposition of the sample covariance matrix $\hat{\Sigma} = U D U^{\top}$, but replaces the original eigenvalues by the eigenvalues $D^*$ that minimize the Frobenius norm, $D^* = \operatorname{argmin}_{D} \|\Sigma - U D U^{\top}\|$. The solution can be approximated using a generalized version of the Marčenko-Pastur equation. First, we consider the numerical implementation of this method proposed in Ledoit and Wolf (2017b) (called LWQuEST). The direct numerical computation of the eigenvalues appears to be very demanding. Recently, Ledoit and Wolf (2020) suggested an analytic expression for the nonlinear shrinkage estimator that uses a nonparametric estimator of the spectral density (called LWAnalytic). In order to determine the optimal portfolio weights we use these two approaches as plug-in estimators of the covariance matrix. Note that, in contrast to our approach, these methods shrink a parameter of the distribution of the asset returns and not the portfolio weights, which are the key object of interest in asset allocation problems. The second benchmark is an extension of the three-fund portfolio of Kan and Zhou (2007). The optimal portfolio is a linear combination of the target portfolio, the sample global minimum-variance portfolio and the portfolio that maximizes the Sharpe ratio.
The weights of this linear combination are determined by maximizing the expected out-of-sample utility. Note that Kan and Zhou (2007) derive the optimal portfolios assuming a fixed portfolio dimension with p < n. The extension of these results to the case p/n → c goes beyond the scope of this paper. For this reason we apply this estimator only in the case p < n. For both benchmarks we estimate the expected returns by the historical mean returns, as is frequently done in practice.
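Both covariance-based benchmarks enter the portfolio through the same plug-in rule, namely the EU solution of (1.1) evaluated at the estimated moments. A sketch with names of our own choosing (any covariance estimator — sample, ridge, or nonlinear shrinkage — can be passed as Sigma):

```python
import numpy as np

def eu_plugin_weights(mu, Sigma, gamma):
    """Plug-in EU portfolio w = w_GMV + gamma^{-1} Q mu with
    Q = S^{-1} - S^{-1} 1 1' S^{-1} / (1' S^{-1} 1)."""
    ones = np.ones(len(mu))
    Sinv_1 = np.linalg.solve(Sigma, ones)
    w_gmv = Sinv_1 / (ones @ Sinv_1)           # global minimum-variance weights
    Sinv_mu = np.linalg.solve(Sigma, mu)
    Q_mu = Sinv_mu - Sinv_1 * (ones @ Sinv_mu) / (ones @ Sinv_1)
    return w_gmv + Q_mu / gamma                # 1'Q mu = 0, so weights sum to one
```

As gamma grows the weights collapse to the GMV portfolio, consistent with the efficient-frontier parametrization in the introduction.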

Empirical results
The results of the empirical study are summarized in Tables 1 and 2, containing the results for the mean-variance (β = γ) and minimum variance (β = ∞) calibration criteria, respectively. The top block of each table provides results for c = 0.2, while the following blocks correspond to c = 0.5, 0.8, and 2. At the beginning of each block we summarize the performance measures for the traditional estimator and the estimator based on the nonlinear shrinkage of Ledoit and Wolf (2012). These estimators do not depend on the target portfolio weights. Further, we provide the results for the strategies involving the target, i.e. the suggested bona-fide shrinkage technique, the target portfolio itself and the extension of the estimator of Kan and Zhou (2007). This is done for the equally weighted, equal-correlation and Fama-French portfolios as targets. Furthermore, we include results for the traditional portfolio and for the bona-fide shrinkage portfolio where the sample covariance matrix is replaced by its regularized version based on the Tikhonov (ridge) approximation (2.33) with δ = 1/p. The corresponding strategies are called trad ridge and bona fide ridge, respectively.
First, we consider the results for the mean-variance calibration in Table 1. If the dimension is low relative to the sample size, namely c = 0.2, then the traditional estimator shows a good and robust performance. According to virtually all performance measures it is better than any estimator based on the equal-correlation and Fama-French targets. The Ledoit-Wolf estimator is better only in terms of the Sharpe ratio. The dominant strategy for this value of c is the Kan-Zhou estimator with the equally weighted target, which is closely followed by the suggested bona-fide shrinkage portfolio. The ranking of the targets and the estimators changes slightly if we increase c to 0.5. As expected, the traditional estimator becomes worse and is dominated by every target-based portfolio. The leading estimator for this value of c becomes the analytical Ledoit-Wolf estimator, followed by the Kan-Zhou estimator with the equally weighted portfolio as the target. Finally, we note that the application of the regularized sample covariance matrix based on the Tikhonov (ridge) approximation (2.33) leads to minor improvements in the performance of both the traditional and the bona-fide shrinkage estimators when c = 0.2 and c = 0.5.
With c = 0.8 we attain a ratio of dimension to sample size where the high-dimensional asymptotics becomes relevant and simpler estimators suffer heavily from estimation risk. The traditional estimator shows extremely poor performance, similar to that of the numerical Ledoit-Wolf estimator. Since the Kan-Zhou estimator does not take the increasing dimension into account, it becomes worse than the target portfolios and the bona-fide shrinkage portfolios. At the same time, the analytical Ledoit-Wolf estimator becomes dominant, with the bona-fide ridge estimator slightly behind. Large improvements are observed in the performance of the traditional estimator when the ridge regularization is employed in its construction. In contrast, the application of the ridge regularization to the bona-fide shrinkage portfolios leads to minor improvements.
Finally, if we increase c to 2, the sample covariance matrix is singular and we use the generalized inverse for the traditional estimator. Now the bona-fide estimator becomes clearly dominant, while the use of the equally weighted target and of the equal-correlation target leads to similar results. As mentioned above, the Kan-Zhou estimator is not feasible for c > 1. Also, the application of the regularized sample covariance matrix based on the Tikhonov (ridge) approximation improves the performance of the traditional estimator, while it leads to slightly worse performance of the bona-fide shrinkage estimator. To this end, we note a surprisingly poor performance of both Ledoit-Wolf estimators, which appear to be worse than most of the target portfolios. Noteworthy, the LWQuEST estimator probably suffers from some numerical instabilities, while LWAnalytic still behaves very stably. If we switch to the minimum variance calibration criterion in Table 2, the ranking of the estimators remains unchanged. Summarizing, the suggested bona-fide shrinkage estimator is comparable to the analytical Ledoit-Wolf nonlinear shrinkage estimator for c < 1 and becomes superior for c > 1, where it is dominant with respect to all performance measures. For smaller values of c the Kan-Zhou estimator outperforms the bona-fide estimator and both Ledoit-Wolf estimators, and tends to show the best performance when it is used with the equally weighted target. For intermediate values of c < 1 the analytical Ledoit-Wolf estimator is dominant, while its poor performance for c > 1 is potentially due to the fact that it does not take into account the estimation risk related to the sample mean vector, which is accounted for in the bona-fide shrinkage estimator and in the Kan-Zhou estimator. The numerical Ledoit-Wolf estimator shows large numerical instabilities in comparison to the analytic one (see also Remark 3 below for further discussion of this finding).
The choice of the target is an important issue, and relying on the results we can make general recommendations regarding this choice. For c < 1 the equally weighted portfolio is the best performing standalone strategy among the three alternatives. This target also leads to the shrinkage portfolio with the best overall performance. The equal-correlation target is in all cases the second best choice. If c = 2, then the order changes and the equal-correlation target becomes slightly better both as a standalone strategy and as the target for the shrinkage-based approach. Since the analysis is based on 1000 random portfolios, we can recommend using the equally weighted target for c < 1 and the equal-correlation target for c > 1. Furthermore, we can conclude that taking the best standalone target strategy should lead to the best performing shrinkage-based approach.
The time series of the estimated shrinkage coefficients are depicted in Figure 6. For space reasons we provide the coefficients only for the equally weighted target and the mean-variance calibration. For other parameter constellations the results are similar. The portfolio is constructed using the first assets in alphabetical order. We observe that for small values of c, and thus a low estimation risk, the shrinkage intensities are close to one. The behavior is very stable but mimics the periods of high and low volatility of financial markets. Thus high volatility on financial markets causes higher shrinkage coefficients and a larger fraction of the sample EU portfolio. This can be justified by the stronger effects of diversification during turmoil periods. With larger c the confidence in the classical portfolio diminishes, leading to a stronger preference for the equally weighted portfolio. This results in lower and more volatile shrinkage intensities. However, we observe the reverse behavior of the estimated shrinkage intensity when c = 2. Here, the impact of the traditional estimator on the portfolio structure increases and becomes comparable to the case c = 0.5. Such results are in line with the findings of the simulation study presented in Figure 3, where the shrinkage intensity is close to zero around c = 1. Finally, employing the regularized sample covariance matrix based on the Tikhonov (ridge) approximation leads to similar values of the shrinkage intensities regardless of c. These are shown in the lower plot of Figure 6. This finding is in line with the values presented in Tables 1 and 2.

Figure 6: The bona-fide shrinkage intensities for the first 100 assets (alphabetical order) using the equally weighted target portfolio and the mean-variance calibration for c = 0.2, 0.5, 0.8 and 2. Top: bona fide; bottom: bona fide ridge (see formula (2.33)).
The numerical Ledoit-Wolf estimator shows poor performance, which is probably because of some numerical issues. That is why we recommend using its new analytic version. Nevertheless, the analytical nonlinear shrinkage Ledoit-Wolf estimator still shows a poor performance in our empirical study in the case c > 1, but we believe there is a specific reason for that. Indeed, in Ledoit and Wolf (2017a, Assumption 5) the authors assume that the sample mean vector is distributed independently of the sample covariance matrix and that its distribution is rotation invariant. The former assumption is a characteristic property of the multivariate normal distribution following Lukacs (1979). Moreover, the assumption that the distribution of the sample mean vector is rotation invariant imposes further restrictions on the data-generating model: it requires the population mean vector to be a zero vector and the population covariance matrix to be proportional to the identity matrix. As such, it is not clear whether the Ledoit-Wolf estimator is optimal in the case of a non-zero population mean vector, i.e., in the mean-variance framework. This is also acknowledged by the authors themselves in Ledoit and Wolf (2017a, Remark 5). It seems that this "sample mean" effect becomes more prominent in the case of a singular sample covariance matrix (c > 1) and/or a small risk aversion coefficient γ, i.e. when the optimal portfolios lie further away from the global minimum variance portfolio on the efficient frontier. Thus, the Ledoit-Wolf estimator should be adjusted to this type of optimal portfolios before it can be used efficiently in practice when the portfolio dimension is larger than the sample size, whereas the suggested bona-fide estimator for the optimal portfolio weights incorporates the high-dimensional effects of the sample covariance matrix and of the sample mean vector simultaneously.

Summary
In this paper we consider portfolio selection in the high-dimensional framework. In particular, we assume that the number of assets p and the sample size n tend to infinity such that their ratio p/n tends to a constant c, where c can also be larger than one, implying that we may have more assets than observations. Because of the large estimation risk we suggest a shrinkage-based estimator of the portfolio weights, which shrinks the mean-variance portfolio towards several target portfolios, such as the equally weighted portfolio, the minimum-variance portfolio, etc. For the established shrinkage intensity we derive the limiting value, which depends only on c and on the characteristics of the efficient frontier. On the other hand, the derived limiting expression of the shrinkage intensity is only an oracle value and is not feasible in practice, since it depends on unknown quantities. In order to overcome this problem, we construct a bona-fide shrinkage estimator of the optimal portfolio weights by deriving consistent estimators of the parameters of the efficient frontier under the high-dimensional setting. As a result, a fully data-driven approach is established for constructing a practically feasible estimator of the weights of the optimal mean-variance portfolios. From the technical point of view, we rely on random matrix theory and work with the asymptotic behavior of linear and quadratic forms in the sample mean vector and in the (pseudo-)inverse sample covariance matrix. In extensive simulation and empirical studies we evaluate the performance of the established results with artificial and real data. Only when the sample size is much larger than the portfolio dimension does the traditional portfolio or the benchmark portfolio dominate the portfolio suggested in this paper.

Appendix: Proofs
Here the proofs of the theorems are given. Recall that the sample mean vector and the sample covariance matrix are given by
$$\bar{y}_n = \frac{1}{n} Y_n \mathbf{1}_n = \mu_n + \Sigma_n^{1/2}\bar{x}_n \quad \text{with} \quad \bar{x}_n = \frac{1}{n} X_n \mathbf{1}_n \qquad (5.1)$$
and by (5.2), respectively. Later on, we also make use of $\tilde{V}_n$ defined by
$$\tilde{V}_n = \frac{1}{n} X_n X_n^{\top} \qquad (5.3)$$
and the formula for the rank-one update of the ordinary inverse (cf. Horn and Johnson (1985)), as well as the formula for the rank-one update of the Moore-Penrose inverse (see Meyer (1973)). First, we present an important lemma, which is a special case of Theorem 1 in Rubio and Mestre (2011).
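The rank-one update of the ordinary inverse referred to above is the Sherman-Morrison identity, which can be checked numerically (our own illustration):

```python
import numpy as np

# Sherman-Morrison: (A + u v')^{-1} = A^{-1} - A^{-1} u v' A^{-1} / (1 + v' A^{-1} u),
# valid whenever A is invertible and 1 + v' A^{-1} u != 0.
rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p)) + p * np.eye(p)    # well-conditioned test matrix
u, v = rng.standard_normal(p), rng.standard_normal(p)
A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = A_inv - np.outer(A_inv @ u, v @ A_inv) / (1.0 + v @ A_inv @ u)
assert np.allclose(lhs, rhs)
```

The Moore-Penrose analogue of Meyer (1973) is case-based and is not reproduced here.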
Lemma 5.1. Assume (A2). Let the nonrandom $p \times p$ matrix $\Theta_p$ and the nonrandom $n \times n$ matrix $\Theta_n$ possess uniformly bounded trace norms (sums of singular values). Then the corresponding convergence holds for $p/n \to c \in (0, +\infty)$ as $n \to \infty$.

Proof of Lemma 5.1: The application of Theorem 1 in Rubio and Mestre (2011) leads to (5.6), where $x(z)$ is the unique solution in $\mathbb{C}^+$ of equation (5.10). The two solutions of (5.10) are given by (5.11). In order to decide which of the two solutions is feasible, we note that $x(z)$ is a Stieltjes transform and therefore has a positive imaginary part. Without loss of generality, we can take $z = 1 + c + i2\sqrt{c}$ and obtain (5.12), which is positive only if the sign "+" is chosen. Hence the solution is given by the branch with the "+" sign. The second assertion of the lemma follows directly from Bai and Silverstein (2010).
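The branch selection in the proof can be illustrated numerically in the simplest special case $\Sigma_n = I_p$, where the limiting spectral distribution of $\tilde{V}_n$ is the Marčenko-Pastur law. The closed-form transform below is the standard MP Stieltjes transform, used here only as our own illustration of the two-root structure and the sign choice:

```python
import numpy as np

def mp_stieltjes(z, c):
    """Stieltjes transform of the Marcenko-Pastur law with ratio c.
    The defining quadratic c*z*m^2 + (z + c - 1)*m + 1 = 0 has two roots;
    as in the proof of Lemma 5.1, the feasible branch is the one with
    positive imaginary part for z in the upper half-plane."""
    disc = np.sqrt((1.0 - c - z) ** 2 - 4.0 * c * z + 0j)
    for s in (disc, -disc):
        m = (1.0 - c - z + s) / (2.0 * c * z)
        if m.imag > 0:
            return m

# empirical check with standardized data and c = p/n = 0.4
rng = np.random.default_rng(0)
p, n = 400, 1000
X = rng.standard_normal((p, n))
ev = np.linalg.eigvalsh(X @ X.T / n)        # eigenvalues of V_tilde = (1/n) X X'
z = 2.0 + 1.0j
m_emp = np.mean(1.0 / (ev - z))             # empirical Stieltjes transform
m_th = mp_stieltjes(z, p / n)
```

For moderate p the empirical transform already lies close to the deterministic limit, which is the content of the lemma in this special case.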
We note here that Lemma 5.1 is a special case of Theorem 1 in Rubio and Mestre (2011), where one in fact has uniform convergence in the statement of the theorem. Although this is not stated explicitly in Theorem 1 of Rubio and Mestre (2011), the observation follows from its proof on page 600, where, after showing pointwise convergence, Rubio and Mestre additionally prove uniform convergence by applying Montel's theorem. In short, they first show that the random sequence of analytic functions of interest forms a normal family; thus, by Montel's theorem, there exists a subsequence of it which converges uniformly on each compact subset of $\mathbb{C} \setminus \mathbb{R}^+$ to an analytic function, and this limit vanishes almost surely on $\mathbb{C} \setminus \mathbb{R}^+$. Hence the entire sequence converges uniformly to zero on every compact subset of $\mathbb{C} \setminus \mathbb{R}^+$. Furthermore, it is mentioned on page 348 of Rubio et al. (2012) that the convergence in Theorem 1 of Rubio and Mestre (2011) is in fact uniform.
Moreover, the following result (see, e.g., Theorem 1 on page 176 in Ahlfors (1953)), known as the Weierstrass theorem on uniform convergence, will be used in the sequel together with Lemma 5.1 in the proofs of the technical lemmas.
Theorem 5.1 (Weierstrass). Suppose that $f_n(z)$ is analytic in the region $\Omega_n$, and that the sequence $\{f_n(z)\}$ converges to a limit function $f(z)$ in a region $\Omega$, uniformly on every compact subset of $\Omega$. Then $f(z)$ is analytic in $\Omega$. Moreover, $f_n'(z)$ converges uniformly to $f'(z)$ on every compact subset of $\Omega$.
Because the convergence in Lemma 5.1 is uniform over $z$ on every compact subset of $\mathbb{C}\setminus\mathbb{R}^+$, the Weierstrass theorem allows us to interchange any derivative with respect to $z$ with the limit $n \to \infty$. We will consider compact subsets which are small neighbourhoods of zero with $\Im(z) = 0$ (without loss of generality), because each time we will let $z \to 0$ in order to obtain the specific limiting expressions of interest. For example, one may take $\Omega$ as the unit disk $|z| < 1$ and $\Omega_n$ as a disk $|z| < \varepsilon_n$ for some $\varepsilon_n \to 0$ as $n \to \infty$. The analyticity of the function $\mathrm{tr}\,\Theta_p(\tilde{V}_n - zI_p)^{-1}$ follows immediately from the properties of the Stieltjes transform.
Proof of Lemma 5.2: Since the trace norm of θξ is uniformly bounded, i.e.
Proof of Lemma 5.4: It holds that for p/n −→ c ∈ (0, 1) as n → ∞ following (5.21). Similarly, we get for p/n −→ c ∈ (0, 1) as n → ∞. The rest of the proof follows from the equality and Lemma 5.3.
Substituting the above results into the expressions of A * n and B * n , we get that a.s.
For the proof of Theorem 2.2 we need several results about the properties of the Moore-Penrose inverse, which are summarized in the following three lemmas. Similarly to the proof of Lemma 5.2, we will use Lemma 5.1 and Theorem 5.1 in the sequel every time a derivative must be interchanged with the limit $n \to \infty$. The results below hold for $p/n \to c \in (1, +\infty)$ as $n \to \infty$.
Proof of Lemma 5.5: It holds that
$$\tilde{V}_n^{+} = \left(\frac{1}{n}X_nX_n^{\top}\right)^{+} = \frac{1}{\sqrt{n}}X_n\left(\frac{1}{n}X_n^{\top}X_n\right)^{-2}\frac{1}{\sqrt{n}}X_n^{\top}$$
and, similarly,
$$(\tilde{V}_n^{+})^{i} = \frac{1}{\sqrt{n}}X_n\left(\frac{1}{n}X_n^{\top}X_n\right)^{-(i+1)}\frac{1}{\sqrt{n}}X_n^{\top} \quad \text{for } i = 2, 3, 4.$$
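The representation of the Moore-Penrose inverse used in this proof can be verified numerically on a small p > n example (our own check):

```python
import numpy as np

# For p > n and V = (1/n) X X' with X (p x n) of full column rank,
# V^+ = (1/sqrt(n)) X ((1/n) X'X)^{-2} (1/sqrt(n)) X'.
rng = np.random.default_rng(0)
p, n = 8, 5
X = rng.standard_normal((p, n))
V = X @ X.T / n
G_inv = np.linalg.inv(X.T @ X / n)            # inverse of the n x n Gram matrix
V_plus = (X / np.sqrt(n)) @ G_inv @ G_inv @ (X.T / np.sqrt(n))
assert np.allclose(V_plus, np.linalg.pinv(V))
```

The identity follows from the singular value decomposition of X: on the column space of X the map V is inverted exactly, while the orthogonal complement is annihilated.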