A Unified Framework for Fast Large-Scale Portfolio Optimization

We introduce a unified framework for rapid, large-scale portfolio optimization that incorporates both shrinkage and regularization techniques. This framework addresses multiple objectives, including minimum variance, mean-variance, and the maximum Sharpe ratio, and also adapts to various portfolio weight constraints. For each optimization scenario, we detail the translation into the corresponding quadratic programming (QP) problem and then integrate these solutions into a new open-source Python library. Using 50 years of return data from US mid to large-sized companies, and 33 distinct firm-specific characteristics, we utilize our framework to assess the out-of-sample monthly rebalanced portfolio performance of widely-adopted covariance matrix estimators and factor models, examining both daily and monthly returns. These estimators include the sample covariance matrix, linear and nonlinear shrinkage estimators, and factor portfolios based on Asset Pricing (AP) Trees, Principal Component Analysis (PCA), Risk Premium PCA (RP-PCA), and Instrumented PCA (IPCA). Our findings emphasize that AP-Trees and PCA-based factor models consistently outperform all other approaches in out-of-sample portfolio performance. Finally, we develop new ℓ1 and ℓ2 regularizations of factor portfolio norms, which not only elevate the portfolio performance of AP-Trees and PCA-based factor models but also have the potential to reduce the excessive turnover and transaction costs often associated with these models.


Introduction
Institutional investors often manage portfolios comprising hundreds of assets, and the performance of such portfolios is evaluated through frequent backtesting exercises. These backtests rely on different models and numerous optimizations, performed repetitively using a rolling-window scheme and a long history of return data. In this paper, we introduce a unified framework for portfolio optimization. This framework employs Quadratic Programming (QP) methods to calculate portfolios with ℓ1 and ℓ2² regularization, long-short constraints, and various portfolio objective functions, such as minimum-variance, mean-variance, and maximum Sharpe ratio. Owing to the efficiency of the QP optimization algorithms, our proposed models are suitable for the realistic settings of large-dimensional portfolios. These can be applied repeatedly in a rolling window scheme, facilitating backtesting evaluations and refining investment strategies.
Our portfolio optimization framework requires the estimation of a mean vector and a covariance matrix. The two main approaches for the latter are shrinkage covariance matrix estimation and financial factor modeling. The former uses information contained in the asset returns only. It has been studied extensively, starting from the linear shrinkage covariance matrix estimator by Ledoit and Wolf (2004), through nonlinear shrinkage estimators such as Ledoit and Wolf (2012) and Ledoit and Wolf (2020b), up to the most recent nonlinear quadratic shrinkage estimator proposed by Ledoit and Wolf (2022) (see Section 3 for more details). The latter approach uses common risk factors with financial or economic interpretations, which are well known to capture large amounts of variation in the returns. Among the most famous models are the CAPM of Treynor (1961), Sharpe (1964), Lintner (1965), and Mossin (1966), and the three-factor, four-factor, and five-factor models by Fama and French (1993), Carhart (1997), and Fama and French (2015), respectively. Extensions of these models under the non-Gaussianity assumption for the asset returns and factors are given in Hediger et al. (2021). There is also the relative momentum factor, which extends the three-factor model. It was first introduced and analyzed by Jegadeesh and Titman (1993); see also Chitsiripanich et al. (2022) and the references therein for a momentum-based portfolio strategy without crashes.
While the aforementioned classical common risk factors remain among the most important, a large literature now exists on determining the inclusion of particular factors from the dozens, if not hundreds, available: see, e.g., Bai and Ng (2002), Stock and Watson (2002), Tsai and Tsay (2010), Bai and Ng (2013), Bai and Liao (2016), and the references therein. The amount of available alternative data, coupled with advancements in computational power and statistical techniques, such as the estimation of sparse models as in Tibshirani (1996) and Hastie et al. (2015), has led to the proliferation of different factor models, giving rise to what Feng et al. (2020) describe as a "zoo of factors." In this paper, we consider a large universe of liquid US stocks and 33 asset-specific characteristics, as listed in Table 3 in the Appendix. To extract relevant information from this large number of factors, while capturing the dynamics in the dependency between factors and returns in a large portfolio of assets, we use different models: Asset Pricing (AP) Trees, introduced in Bryzgalova et al. (2020), and three different Principal Component Analysis (PCA) based factor models that invest in leading factor portfolios, including the PCA on the factor portfolios, the Risk Premium PCA (RP-PCA) introduced in Lettau and Pelger (2020), and the Instrumented PCA (IPCA) from Kelly et al. (2019). All of these papers show that their asset-specific factor based models outperform the common risk factor models mentioned earlier in terms of higher in-sample and predicted R² values, leading to higher out-of-sample portfolio performance. Recently, Goyal and Saretto (2022) used IPCA to explain the returns of option contracts and achieved a significantly better out-of-sample R². Motivated by the successes of these recent factor-based models and their flexibility in capturing information from a large number of stock-specific characteristics, we forgo the aforementioned common risk factor models and focus on the AP-Trees and PCA-based models in our unified portfolio optimization framework. We compare these emerging models and the aforementioned shrinkage approaches in portfolio optimization with liquid stocks under realistic portfolio constraints.
In Lettau and Pelger (2020) and Kelly et al. (2019), the portfolio performance of PCA-based models is evaluated using the tangent portfolio. This is a closed-form portfolio that permits unbounded long and short positions in individual assets, as well as highly leveraged long-short portfolio strategies. In this paper, we contrast the portfolio performance of the PCA-based models with commonly used benchmarks, such as the shrinkage covariance matrix estimators. We employ a rolling window exercise on an extensive history of a large set of liquid US equity returns, excluding small and micro-caps. We also apply realistic constraints on individual positions and long-short strategies to prevent highly concentrated positions and excessively leveraged portfolios. Our portfolio performance largely agrees with the original results in Lettau and Pelger (2020) and Kelly et al. (2019). But this more grounded setup further illustrates the versatility of the proposed unified portfolio optimization framework, making it relevant to the practical portfolio challenges faced by large institutional investors.
Our paper presents four primary contributions. First, we introduce a unified framework for large-scale, rapid portfolio optimization that incorporates realistic constraints and innovative regularizations to enhance investment performance. This framework is particularly relevant for institutional investors managing portfolios with hundreds or even thousands of assets, facilitating cost-efficient investment decisions. As a practical tool, we have made our Python implementation of this framework available as open-source code online.1 Second, we offer fresh insights into the performance of the recently discussed AP-Trees and PCA-based models. Third, our framework supports a multitude of portfolio problem combinations, varying in portfolio objective functions, regularizations, and constraints. This includes the ℓ1 and ℓ2² regularized portfolio problems, as introduced by DeMiguel et al. (2009) for the minimum-variance portfolio. We expand upon this by introducing the ℓ1+ℓ2² regularized maximum-Sharpe ratio portfolios and the comprehensive ℓ1+ℓ2² regularized mean-variance portfolio frontier. Lastly, within the scope of AP-Trees and PCA-based models, we demonstrate how to apply our novel regularizations to both managed portfolios and individual stocks. We further illustrate how these new regularizations result in superior performance, leading to more stable and streamlined portfolio positions. Importantly, we show how to solve all of these optimization problems using QP methods.
The rest of the paper is structured as follows. Section 2 presents our comprehensive framework for portfolio optimization. Section 3 elaborates on the various covariance matrix estimators discussed in this study. Section 4 introduces a novel regularization for factor-based portfolio optimization problems, with an emphasis on the maximum Sharpe ratio portfolio. Empirical comparisons of the estimators and models across distinct portfolio optimization problems are detailed in Section 5. Concluding observations are given in Section 6. The Appendix provides details on the asset-specific factors.

Portfolio Optimization Framework
We consider a universe of N assets, with prices observed over a given period of time with T observations. Let P_{t,i} be the price of asset i = 1, . . ., N at time index t = 1, . . ., T, where the time index t corresponds to a fixed unit of time such as days, weeks, or months. The corresponding simple returns2 (also known as linear or net returns) are given by R_{t,i} = P_{t,i}/P_{t-1,i} − 1, and the log-returns (also known as continuously compounded returns) are r_{t,i} = log(P_{t,i}/P_{t-1,i}). We denote the vector of log-returns of the N assets at time t by r_t ∈ R^N. It is a multivariate stochastic process with conditional mean and covariance matrix denoted by μ_t = E[r_t | F_{t-1}] and Σ_t = Cov[r_t | F_{t-1}], where F_{t-1} denotes the previous historical data. In this work, except for the IPCA model, we will drop the subscript t on the mean and covariance matrix, since all models assume iid returns. For more general multivariate time-series models of returns with dynamics in the conditional mean and covariance matrix, together with their applications in portfolio optimization, we refer to Paolella and Polak (2015), Paolella et al. (2019), and Paolella et al. (2021). The investment portfolio is usually summarized by an N-vector of weights w = [w_1, . . ., w_N]′ indicating the fraction of the total wealth of the investor held in each asset. If the investor is assumed to hold her total wealth in the portfolio, then w′1_N = 1, where 1_N denotes an N-vector of ones. The corresponding portfolio return r_t(w) = w′r_t is a random variable with mean and variance given by μ_w = E[r_t(w)] = w′μ and σ²_w = Var[r_t(w)] = w′Σw, respectively.
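The portfolio mean and variance identities above can be checked numerically. A minimal sketch with a hypothetical three-asset mean vector and covariance matrix (all numbers are illustrative, not from the paper's data):

```python
import numpy as np

# Toy example with hypothetical numbers: N = 3 assets.
mu = np.array([0.08, 0.05, 0.03])        # mean vector mu
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.08, 0.01],
                  [0.01, 0.01, 0.05]])   # covariance matrix Sigma
w = np.array([0.5, 0.3, 0.2])            # fully invested: w'1_N = 1

mu_w = w @ mu                            # portfolio mean     mu_w = w'mu
sigma2_w = w @ Sigma @ w                 # portfolio variance sigma^2_w = w'Sigma w
```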
The general theory of portfolio optimization, as introduced in the seminal work by Markowitz (1952), summarizes the trade-off between risk and investment return using the portfolio's mean and variance. In particular, for a given choice of target mean return α0, in Markowitz portfolio optimization one chooses the optimal portfolio as

w*(α0) = argmin_{w ∈ W} w′Σw, (1)

where W := {w ∈ R^N : w′μ ≥ α0 and w′1_N = 1} is a set of constraints on the portfolio weights which corresponds to a fully invested portfolio with expected return above the α0 threshold. Under these constraints, (1) has a closed-form solution given by

w*(α0) = ((Cα0 − B) Σ^{-1}μ + (A − Bα0) Σ^{-1}1_N) / (AC − B²),

where A = μ′Σ^{-1}μ, B = μ′Σ^{-1}1_N, and C = 1′_N Σ^{-1}1_N.

2 In the empirical analysis we work with dividend and split adjusted simple returns.
The minimum-variance portfolio (Min-Var in Figure 1) is a solution to (1) with W := {w ∈ R^N : w′1_N = 1}. The solution to this problem also has a closed-form expression, given by w* = Σ^{-1}1_N / (1′_N Σ^{-1}1_N) = Σ^{-1}1_N / C, where C is defined above. However, when short-selling is not allowed, i.e., w ≥ 0_N, or when it is constrained, e.g., as in Section 2.5, the optimization problem (1) does not have a closed-form solution and needs to be solved numerically. Nevertheless, (1) is a QP problem with convex constraints (hence also a convex problem). It has closed-form expressions for the gradient and Hessian of the objective function, and a unique global optimal portfolio satisfying the constraints in W. In particular, by changing α0, one can derive a whole portfolio frontier of optimal investments w*(α0) summarizing the risk-return trade-off.
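As a sketch of the two cases above, the following compares the closed-form fully invested minimum-variance portfolio with a numerically solved long-only version. The paper's framework uses the OSQP solver; here SciPy's SLSQP stands in for a self-contained illustration, and the covariance matrix is synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 8
G = rng.standard_normal((N, N))
Sigma = G @ G.T + N * np.eye(N)           # synthetic, well-conditioned covariance

# Closed-form fully invested minimum-variance weights:
# w* = Sigma^{-1} 1_N / (1'_N Sigma^{-1} 1_N).
ones = np.ones(N)
w_cf = np.linalg.solve(Sigma, ones)
w_cf /= ones @ w_cf

# The long-only version has no closed form and is solved as a QP.
res = minimize(lambda w: w @ Sigma @ w, np.full(N, 1.0 / N),
               jac=lambda w: 2.0 * Sigma @ w,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
               bounds=[(0.0, None)] * N, method="SLSQP")
w_lo = res.x
```

As expected, the long-only optimum has a variance at least as large as the unconstrained closed-form solution, since it minimizes over a smaller feasible set.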
Following Li (2015), we can reinterpret the mean-variance portfolio optimization problem as a linear regression with N independent variables and N observations. This relationship can be expressed as

y = Xw + e,

where X = γ^{1/2} Σ^{1/2}, y = γ^{-1/2} Σ^{-1/2} μ, e represents a vector of random errors, and γ > 0 is the risk aversion coefficient (Lagrange multiplier) associated with the α0 threshold in W described above. The least squares estimator of w, given by ŵ_OLS = (X′X)^{-1}(X′y), corresponds to the closed-form optimal portfolio weight when the constraint w′1_N = 1 is omitted. In other words, ŵ = (1/γ) Σ^{-1}μ. In practice, Σ and μ are unknown, and they are replaced by their (random) estimators. Thus, the principles of linear regression can be naturally extended to portfolio optimization. In a similar vein, the theories of ℓ1 and ℓ2² regularized regression can be directly related to the regularized portfolio optimization problem. When the portfolio constraint w′1_N = 1 is incorporated, it mirrors the analogous constraint in the least squares problem.
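The regression equivalence can be verified numerically. Assuming the design X = γ^{1/2}Σ^{1/2} and response y = γ^{-1/2}Σ^{-1/2}μ (one choice consistent with X′X = γΣ and X′y = μ; the exact construction in Li, 2015 may differ), the OLS coefficients coincide with (1/γ)Σ^{-1}μ:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
G = rng.standard_normal((N, N))
Sigma = G @ G.T + np.eye(N)                  # synthetic covariance matrix
mu = 0.05 * rng.standard_normal(N)           # synthetic mean vector
gamma = 3.0                                  # risk aversion coefficient

# A design consistent with X'X = gamma * Sigma and X'y = mu:
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # symmetric square root
X = np.sqrt(gamma) * Sigma_half
y = np.linalg.solve(X.T, mu)                 # defined so that X'y = mu

w_ols = np.linalg.solve(X.T @ X, X.T @ y)    # least squares estimator
w_mv = np.linalg.solve(Sigma, mu) / gamma    # (1/gamma) * Sigma^{-1} mu
```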
Figure 1 presents two long-only mean-variance portfolio efficient frontiers, both with and without the ℓ2² regularization discussed in Section 2.2. For varying levels of portfolio variance, the expected return of the top-performing portfolio is plotted. Alongside these frontiers, we illustrate various optimal portfolios discussed in this paper. Additionally, a cloud of points represents the means and variances of 25,000 randomly drawn iid Dirichlet distributed portfolios. Specifically, each portfolio weight vector w_k is independently and identically distributed as Dir(1_N) for k = 1, . . ., 25000. In this example, the portfolios are composed of eight stocks from the US market with tickers AMZN, MSFT, GOOGL, F, TM, AAPL, KO, and PEP. The mean and covariance matrix are estimated using daily returns spanning the period from 2015-01-01 to 2022-01-01. Such a low-dimensional portfolio problem is common in the aforementioned PCA-based models, which invest in K factor portfolios that are mapped into the individual assets.
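The random-portfolio cloud can be reproduced as follows; the mean vector and covariance matrix here are synthetic placeholders rather than the estimates from the eight-stock example:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 8                                        # as in the eight-stock example
mu = 0.02 + 0.10 * rng.random(N)             # placeholder mean vector
G = rng.standard_normal((N, N))
Sigma = (G @ G.T + np.eye(N)) / 100.0        # placeholder covariance matrix

# 25,000 iid Dir(1_N) weight vectors: uniform sampling on the simplex w'1_N = 1.
W = rng.dirichlet(np.ones(N), size=25000)
means = W @ mu                               # mu_w for every random portfolio
variances = np.einsum("ki,ij,kj->k", W, Sigma, W)   # sigma^2_w for every portfolio
```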
Figure 1: The long-only mean-variance portfolio frontiers (with and without ℓ2² regularization) and all of the optimal portfolios considered in our portfolio framework with the long-only constraints and ℓ1, ℓ2², and ℓ1+ℓ2² regularization for eight stocks (AMZN, MSFT, GOOGL, F, TM, AAPL, KO, and PEP), with the mean and covariance matrix estimated using daily returns over the period 2015/01/01-2022/01/01. Among them are two optimal portfolios, the minimum-variance portfolio and the maximum Sharpe ratio portfolio, and a collection of random portfolios.

In practice, it is often the case that the investment portfolio consists of a much larger number of assets than in the example above. Figure 2 illustrates portfolio frontiers together with 25,000 iid Dirichlet distributed3 portfolios w_k iid ∼ Dir(1_N), for k = 1, . . ., 25000, and for different numbers of assets N = 10, 20, 50, 500 selected from the largest market-capitalization stocks in the US market, with the mean and covariance matrix estimated using ten years of daily returns (a much larger number of observations than all of our monthly data used in Section 5). As can be seen from the different panels in Figure 2, the dimensionality of the portfolio has two major impacts. First, the larger the asset universe, the further away from the frontier and the more concentrated around 1/N the random portfolios become. This shows that in a large-dimensional setup without proper portfolio optimization, one cannot expect to achieve any optimal risk-reward profile, and that even the 1/N portfolio, which is so often advocated as a naive-diversification and well-performing portfolio (see DeMiguel et al., 2009, and the references therein), is in fact as good as any random guess. Figure 2 also depicts the equal volatility contribution portfolio, which is a special case of the risk parity portfolio (see, e.g., Roncalli, 2013, and Paolella et al., 2022). It is slightly better than the random portfolios or the 1/N portfolio. However, based on the distance between the equal volatility contribution portfolio and the mean-variance portfolio frontier, and even with the uncertainty about the actual frontier, there is still a lot of opportunity for improved portfolio allocation. Second, the closed-form long-short frontier from equation (2), depicted with dotted black lines in all the panels of Figure 2, becomes almost vertical compared with the long-only frontier when the number of assets increases. Therefore, small changes in the optimal portfolio volatility translate to theoretically disproportionately large gains in the expected returns of the optimal portfolio. This implies that estimates of the optimal portfolio weights are sensitive to new data points, and the weights can change a lot over consecutive rolling windows. This is an artifact of high dimensionality and nearly singular covariance matrix estimates. Proper covariance matrix estimation in high dimensions and long-short constraints help in avoiding these over-leveraged and unrealistic, but theoretically optimal, portfolios.

3 Here we use Dirichlet distributed random vectors to guarantee uniform sampling on the N-dimensional simplex (w′1_N = 1). The results for weights sampled from the uniform distribution normalized on the simplex, i.e., w = x/(x′1_N), where x = [x_1, . . ., x_N]′ and x_i iid ∼ U([0, 1]), and for weights sampled from the absolute value of the standard normal distribution normalized on the simplex, i.e., w = |x| / ∥x∥_1, where x_i iid ∼ N(0, 1), are similar.
Figure 2: Portfolio frontiers alongside 25,000 iid Dirichlet distributed4 portfolios w_k iid ∼ Dir(1_N), for k = 1, . . ., 25000. The number of assets varies as N = 10, 20, 50, 500, chosen from the largest market-capitalization stocks in the US market. The mean and covariance matrix are estimated from ten years of daily returns, a period significantly longer than all of our monthly data used in Section 5.
4 Results for weights sampled from the uniform distribution normalized on the simplex (i.e., w = x/(x′1_N), where x = [x_1, . . ., x_N]′ and x_i iid ∼ U([0, 1])), and for weights sampled from the absolute value of the standard normal distribution normalized on the simplex (i.e., w = |x| / ∥x∥_1, where x_i iid ∼ N(0, 1)), align closely.

In practice, the true mean vector and covariance matrix are unknown, and one needs to rely on their estimates. Financial markets, especially at low frequencies, are highly efficient, or, as suggested by Pedersen (2015), "efficiently inefficient". We do not attempt to construct individual stock prediction signals; for that, we refer to recent results in Chitsiripanich et al.
(2022). Instead, we focus on various mean and covariance matrix shrinkage estimators as well as different factor portfolios. The former address the bias-variance trade-off, aiming to construct biased estimators that minimize the mean-square error and perform better out-of-sample. The latter offer conditional predictions of expected returns based on asset characteristics. As we will demonstrate, the factor portfolios significantly enhance the signal-to-noise ratio, leading to more accurate mean predictions and higher out-of-sample performance. However, before we turn to models of stock returns, we introduce the rest of our general portfolio optimization framework.

Portfolio Constraints
The set of feasible portfolio weights W := {w ∈ R^N : w′1_N = 1} usually includes additional constraints. Among the most commonly used are:

• Long only: w ≥ 0_N.

• Asset-specific holding constraints: L ≤ w ≤ U, where U = (U_1, . . ., U_N)′ and L = (L_1, . . ., L_N)′ are upper and lower bounds for the N portfolio positions.

• Turnover constraints:
- for individual asset limits: |Δw_i| ≤ U_i, for i = 1, . . ., N, where Δw_i denotes the change in the portfolio weight from the current position to the optimal value, and U_i are the turnover limits for the individual positions;
- for the total portfolio limit: Σ_{i=1}^N |Δw_i| ≤ U*, where U* is the turnover limit for the entire portfolio.

• Benchmark exposure constraints: Σ_{i=1}^N |w_i − w_{B,i}| ≤ U_B, where w_B are the weights of the benchmark portfolio and U_B is the total error bound.

• Tracking error constraints: for a given benchmark portfolio B with weights w_B, r_B = w_B′ r is the return of the benchmark portfolio, e.g., the S&P 500 Index, NASDAQ 100, or Russell 1000/2000. One can compute the variance of the tracking error, Var(TE) = (w − w_B)′Σ(w − w_B), and include the corresponding constraint (w − w_B)′Σ(w − w_B) ≤ σ²_TE in the set of feasible portfolio weights, where σ²_TE > 0 is the tracking-error variance bound of the portfolio.

• Risk factor constraints: estimate the risk factor exposures for all the assets in the portfolio, e.g., via the regression r_t = α + B f_t + ε_t (see (19) for details), where the rows of B contain the factor exposures β_k of the individual assets. Given these estimates, one can (i) constrain the exposure to a given factor k by |w′β_k| ≤ U_k, or (ii) neutralize the exposure to all the risk factors by w′β_k = 0 for k = 1, . . ., K.

All the constraints listed above (including those that involve the absolute value function; see the remarks in Section 2.3) can be written as linear or quadratic constraints, i.e.,

• linear constraints: we can specify N-column matrices A_w and A_B and vectors u_w, u_B to introduce linear inequality constraints for the relative positions between the assets or the benchmark, A_w w ≤ u_w and A_B (w − w_B) ≤ u_B;

• quadratic constraints: we can specify N × N matrices Q_w, Q_B and scalars q_w, q_B to build constraints w′Q_w w ≤ q_w and (w − w_B)′Q_B (w − w_B) ≤ q_B.

Once the constraints are converted into these standard forms, they can easily be combined and incorporated into our portfolio optimization framework. We consider next a different type of constraint that is often incorporated into portfolio optimization using the method of Lagrange multipliers. These constraints are not imposed by the portfolio manager because of her trading goals or position requirements. They are added because they are a form of regularization of the problem in high dimensions, and they help to improve the out-of-sample portfolio performance in large dimensions.
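To illustrate the standard-form conversion, the sketch below stacks the budget, holding-bound, and an element-wise benchmark-band constraint into a single system l ≤ A w ≤ u, as QP solvers expect. The function name and the element-wise form of the benchmark band are illustrative choices, not the paper's API:

```python
import numpy as np

def linear_constraints(N, U, L, w_B, band):
    """Stack common constraints into the standard form l <= A w <= u used by
    QP solvers.  A sketch: the function name and the element-wise benchmark
    band are illustrative, not the paper's API."""
    A = np.vstack([np.ones((1, N)),   # budget constraint  w'1_N = 1
                   np.eye(N),         # holding bounds     L <= w <= U
                   np.eye(N)])        # benchmark band     |w - w_B| <= band
    l = np.concatenate([[1.0], L, w_B - band])
    u = np.concatenate([[1.0], U, w_B + band])
    return A, l, u

N = 4
A, l, u = linear_constraints(N,
                             U=np.full(N, 0.5), L=np.zeros(N),
                             w_B=np.full(N, 0.25), band=np.full(N, 0.1))
w_eq = np.full(N, 0.25)               # the equal-weight portfolio satisfies all rows
```

Combining further constraints then amounts to stacking additional rows onto A, l, and u.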

Portfolio Optimization with ℓ2² Penalized Portfolio Norms
Consider now an ℓ2²-constrained (also called ridge-penalized) portfolio optimization problem for the minimum-variance portfolio (1). Using the method of Lagrange multipliers, we can write the corresponding optimization problem as

w* = argmin_{w ∈ W} w′Σw + λ∥w∥₂², (5)

where λ ≥ 0 is the penalty strength parameter and ∥w∥₂² = w′w = Σ_{i=1}^N w_i². Using the spectral decomposition of Σ = PΛP′, where PP′ = I_N and Λ = diag(δ_1, . . ., δ_N), and since ∥w∥₂² = w′w = (P′w)′(P′w), we can rewrite the ℓ2² penalized objective function as

w′Σw + λ∥w∥₂² = w′ Σ̃ w,

where Σ̃ = P[Λ + λI_N]P′ has all the eigenvalues shifted up by λ ≥ 0. This is, again, a QP optimization problem that falls into our unified framework.
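The eigenvalue-shift identity can be checked directly: adding the ℓ2² penalty is the same as optimizing with Σ + λI_N, whose eigenvalues are those of Σ shifted up by λ. A small numpy check with a synthetic Σ:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
G = rng.standard_normal((N, N))
Sigma = G @ G.T / N                  # synthetic covariance matrix
lam = 0.5                            # penalty strength lambda >= 0

# w'Sigma w + lam * ||w||_2^2 = w' (Sigma + lam * I_N) w for every w,
# so the penalized problem is a plain QP with the shifted matrix.
Sigma_tilde = Sigma + lam * np.eye(N)

w = rng.standard_normal(N)
lhs = w @ Sigma @ w + lam * (w @ w)
rhs = w @ Sigma_tilde @ w
```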

Portfolio Optimization with ℓ1 Penalized Portfolio Norms
Similarly to the ℓ2² constraint, we can write the Lagrangian of the ℓ1-constrained minimum-variance portfolio optimization problem as

w* = argmin_{w ∈ W} w′Σw + λ∥w∥₁, (7)

where λ ≥ 0 is the penalty strength parameter and ∥w∥₁ = Σ_{i=1}^N |w_i|. The main difference compared to (5) is that the objective function in (7) is non-differentiable because of the kinks in the absolute value function, and the spectral decomposition will not help in converting (7) into a standard QP problem. Instead, we define

(w*, w⁺*, w⁻*) = argmin_{(w, w⁺, w⁻) ∈ W̃} w′Σw + λ 1′_N (w⁺ + w⁻),

where W̃ = {(w, w⁺, w⁻) ∈ R^{3N} : w = w⁺ − w⁻, w⁺ ≥ 0, w⁻ ≥ 0, and w ∈ W}. This way, we rewrote the original non-differentiable problem in N variables as a QP problem in 3N variables with N additional equality constraints.5 The following remarks can be made about this new optimization problem:

(i) Note that we do not have to include the constraint w⁺ • w⁻ = 0 in the definition of the set of feasible weights W̃, since any portfolio with w⁺ • w⁻ ≠ 0 is strictly dominated, in terms of the value of the objective function, by an analogous portfolio with w⁺ • w⁻ = 0. Hence, the optimizer will never stop at w⁺ • w⁻ ≠ 0.
(ii) If the portfolio is long-only, the ℓ1 norm for the feasible portfolios reduces to the sum of the portfolio weights, and the optimization problem (7) becomes differentiable. In this case, we observe empirically that the optimal portfolio weights never change when λ grows; see the left panel in Figure 3 (see also Figure 1, where some optimal portfolios are ℓ1+ℓ2² regularized and coincide with the ℓ2² regularized portfolios). This is because, under w′1_N = 1 and w ≥ 0_N, the penalty λ∥w∥₁ = λ is constant and does not affect the optimization. Even when short positions are allowed, the optimization problem will have only partially sparse solutions. In both cases, as opposed to a usual LASSO problem, the solution will not converge to 0 when λ goes to infinity, because the constraint w′1_N = 1 in the feasible set means one can never get all the optimal weights equal to zero. As shown in the right panel of Figure 3, in the long-short portfolio, only all the initially (when λ = 0) negative weights will converge to zero. Some of the initially positive weights will go to zero too. At the same time, the remaining positive weights will converge to a long-only minimum-variance portfolio. Importantly, some intermediate levels of λ and the corresponding non-zero optimal weights can perform well out-of-sample.
(iii) Note that any of the constraints listed in Section 2.1 that involves an absolute value function can be rewritten using w⁺ and w⁻. Hence, the corresponding optimization problem can be solved using QP methods.
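A minimal sketch of the 3N-variable reformulation, again using SciPy's SLSQP in place of the paper's OSQP-based implementation, with a synthetic covariance matrix and an illustrative penalty λ:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N = 5
G = rng.standard_normal((2 * N, N))
Sigma = G.T @ G / (2 * N)                    # synthetic covariance matrix
lam = 0.05                                   # illustrative penalty strength

def objective(z):                            # z = (w, w_plus, w_minus) in R^{3N}
    w, wp, wm = z[:N], z[N:2 * N], z[2 * N:]
    return w @ Sigma @ w + lam * (wp.sum() + wm.sum())

cons = [{"type": "eq", "fun": lambda z: z[:N] - (z[N:2 * N] - z[2 * N:])},  # w = w+ - w-
        {"type": "eq", "fun": lambda z: z[:N].sum() - 1.0}]                 # w'1_N = 1
bounds = [(None, None)] * N + [(0.0, None)] * (2 * N)                       # w+, w- >= 0

z0 = np.concatenate([np.full(N, 1.0 / N), np.full(N, 1.0 / N), np.zeros(N)])
res = minimize(objective, z0, constraints=cons, bounds=bounds, method="SLSQP")
w, wp, wm = res.x[:N], res.x[N:2 * N], res.x[2 * N:]
```

At the optimum the solver drives w⁺ • w⁻ to zero, as argued in remark (i), so the smooth 3N-variable objective coincides with w′Σw + λ∥w∥₁.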

Long-Short Constrained Portfolio
The long-short constrained minimum-variance portfolio optimization from (1) is defined as

w* = argmin_{w ∈ W_LS(ϑ)} w′Σw,

where W_LS(ϑ) = {w ∈ R^N : Σ_{i: w_i>0} w_i ≤ 1 + ϑ and Σ_{i: w_i<0} w_i ≥ −ϑ}. This is a different type of portfolio weights constraint, which aggregates the weights based on their sign. The long-only portfolio constraint is the special case W_LS(ϑ) with ϑ = 0. We can again take w = w⁺ − w⁻ with w⁺ ≥ 0 and w⁻ ≥ 0. Hence, we can replace W_LS(ϑ) with a new constraint set given by

W̃_LS(ϑ) = {(w, w⁺, w⁻) ∈ R^{3N} : w = w⁺ − w⁻, w⁺ ≥ 0, w⁻ ≥ 0, 1′_N w⁺ ≤ 1 + ϑ, 1′_N w⁻ ≤ ϑ, and w′1_N = 1},

and solve the corresponding QP problem.
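The long-short budget can be added to the same split-variable QP. The sketch below imposes a 130/30-style budget ϑ = 0.3; SciPy's SLSQP again stands in for OSQP, and the covariance matrix is synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
N = 6
G = rng.standard_normal((3 * N, N))
Sigma = G.T @ G / (3 * N)                    # synthetic covariance matrix
theta = 0.3                                  # e.g. a 130/30-style budget

def variance(z):                             # z = (w, w_plus, w_minus) in R^{3N}
    return z[:N] @ Sigma @ z[:N]

cons = [{"type": "eq",   "fun": lambda z: z[:N] - (z[N:2 * N] - z[2 * N:])},
        {"type": "eq",   "fun": lambda z: z[:N].sum() - 1.0},
        {"type": "ineq", "fun": lambda z: 1.0 + theta - z[N:2 * N].sum()},  # 1'w+ <= 1+theta
        {"type": "ineq", "fun": lambda z: theta - z[2 * N:].sum()}]         # 1'w- <= theta
bounds = [(None, None)] * N + [(0.0, None)] * (2 * N)

z0 = np.concatenate([np.full(N, 1.0 / N), np.full(N, 1.0 / N), np.zeros(N)])
res = minimize(variance, z0, constraints=cons, bounds=bounds, method="SLSQP")
w = res.x[:N]
```

Because any negative position in w is covered by w⁻, the cap 1′_N w⁻ ≤ ϑ bounds the total short exposure by ϑ.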

Mean-Variance Optimization with Risk-Free Asset
In the mean-variance portfolio problem in (1), the goal is to optimize the trade-off between portfolio return and risk. In other words, the mean-variance method looks for the portfolio with the lowest variance while the expected portfolio return w′μ is constrained from below by α0.
Because of the convexity of the problem, the optimal value corresponds to the minimum volatility portfolio under the target return level.
In addition to the risky assets (i = 1, . . ., N), we can assume there is a risk-free asset for which R_f = r_f, i.e., E[R_f] = r_f and Var(R_f) = 0. Suppose the investor can invest in the N risky assets as well as in the risk-free asset. The portfolio with investment in the risk-free asset then consists of two parts: the weights w held in the risky assets and the remaining fraction 1 − w′1_N held in the risk-free asset. For a given choice of target mean return α0, the investor chooses the portfolio

w* = argmin_w w′Σw subject to w′μ + (1 − w′1_N) r_f ≥ α0.

Then we can derive the Lagrangian L(w, λ) = w′Σw − λ (w′μ + (1 − w′1_N) r_f − α0). Solving the first-order conditions of the Lagrangian, we get

w* = ((α0 − r_f)/H) Σ^{-1}(μ − r_f 1_N), with H = (μ − r_f 1_N)′ Σ^{-1} (μ − r_f 1_N),

so the expected return and the variance of the optimal portfolio are given by α0 and (α0 − r_f)²/H, respectively. Note that, because of the risk-free asset, the resulting portfolio frontier will be a line (the so-called one-fund theorem) connecting two points in the mean-variance plane: the point (0, r_f), where all the money is invested in the risk-free asset only, and the mean and variance of the so-called market portfolio, which is the tangency point of the portfolio frontier without the risk-free asset. So, in order to find solutions for different α0, it suffices to solve for the portfolio without the risk-free asset and take linear combinations of that portfolio with the risk-free investment. Hence, again, this can be considered as part of our general portfolio framework.
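The closed-form solution with a risk-free asset can be verified numerically; with synthetic inputs, the optimal portfolio attains exactly the target mean α0 and the stated variance:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5
G = rng.standard_normal((N, N))
Sigma = G @ G.T + np.eye(N)                  # synthetic covariance matrix
mu = 0.05 + 0.10 * rng.random(N)             # synthetic mean vector
r_f, alpha0 = 0.01, 0.08                     # risk-free rate and target mean

ones = np.ones(N)
excess = mu - r_f * ones
H = excess @ np.linalg.solve(Sigma, excess)  # (mu - r_f 1)' Sigma^{-1} (mu - r_f 1)
w = (alpha0 - r_f) / H * np.linalg.solve(Sigma, excess)

port_mean = w @ mu + (1.0 - w.sum()) * r_f   # the remainder sits in the risk-free asset
port_var = w @ Sigma @ w
```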

Maximum Sharpe Ratio Portfolio
Markowitz's mean-variance framework in (1) provides portfolios along the optimal frontier, and the choice of the specific portfolio depends on the risk aversion of the investor. Typically, one measures investment performance using the Sharpe ratio, and there is only one portfolio on the optimal frontier that achieves the maximum Sharpe ratio:

w* = argmax_{w ∈ W} (w′μ − r_f) / (w′Σw)^{1/2}, (14)

where W = {w ∈ R^N : w′1_N = 1, w ≥ 0}, and r_f is the return of a risk-free asset. This problem, although nonconvex, belongs to the family of so-called Fractional Programming (FP) optimization problems that involve ratios. It is a concave-convex single-ratio problem and can be solved by different approaches. This particular FP problem is simple to solve using a reparametrization trick. One can note that the objective function in (14) is homogeneous of degree zero and reformulate this problem as a QP problem. If there exists at least one portfolio vector w such that w′μ − r_f > 0, then, for w′μ − r_f ≠ 0 and w ∈ W, we can change the maximization problem into an equivalent minimization

w* = argmin_{w ∈ W} w′Σw / (w′μ − r_f)², (15)

where W = {w ∈ R^N : w′1_N = 1, w ≥ 0}. Now, by the homogeneity of degree zero of the objective function, we can choose the scaling factor for our convenience. We define w̃ = γw with scaling factor γ = 1/(w′(μ − r_f 1_N)) > 0.
With this scaling, the objective becomes w̃′Σw̃, the sum constraint becomes 1′_N w̃ = γ, and the above problem is equivalent to

(w̃*, γ*) = argmin w̃′Σw̃ subject to w̃′(μ − r_f 1_N) = 1, 1′_N w̃ = γ, w̃ ≥ 0, γ > 0. (16)

The optimal portfolio weights w* are recovered after the optimization through the transformation w* = w̃*/γ*. Importantly, note that all the aforementioned constraints and regularizations can also be incorporated into this optimization problem (16), and it will remain equivalent to the original maximum Sharpe ratio portfolio with the same regularizations and constraints properly rescaled as in (25). In Section 4, we provide a more detailed and precise presentation. The advantage of (16) is that, even with these constraints and regularizations, it is easy to solve numerically using QP methods.
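A sketch of the reparametrization trick: solve the QP in w̃ with the normalization constraint (μ − r_f 1_N)′w̃ = 1, then rescale by γ = 1′_N w̃. SciPy's SLSQP stands in for the paper's OSQP solver, and the inputs are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N = 6
G = rng.standard_normal((N, N))
Sigma = G @ G.T + np.eye(N)                  # synthetic covariance matrix
mu = 0.02 + 0.10 * rng.random(N)             # synthetic mean vector
r_f = 0.01
excess = mu - r_f                            # mu - r_f 1_N  (all entries > 0 here)

# QP in the rescaled variable w_tilde = gamma * w, with (mu - r_f 1_N)' w_tilde = 1.
res = minimize(lambda v: v @ Sigma @ v, np.full(N, 1.0 / excess.sum()),
               jac=lambda v: 2.0 * Sigma @ v,
               constraints=[{"type": "eq", "fun": lambda v: excess @ v - 1.0}],
               bounds=[(0.0, None)] * N, method="SLSQP")
w_tilde = res.x
gamma = w_tilde.sum()                        # 1'_N w_tilde = gamma > 0
w = w_tilde / gamma                          # recover the fully invested portfolio

def sharpe(v):
    return (v @ mu - r_f) / np.sqrt(v @ Sigma @ v)
```

The recovered w is fully invested and, up to solver tolerance, has the highest Sharpe ratio among long-only portfolios.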
In our portfolio optimization framework, once the portfolio problems are turned into standard QP problems, we use the OSQP solver from Stellato et al. (2020) to solve them. The solver uses the ADMM algorithm for the optimization (see Boyd et al., 2011, and the references therein for a detailed introduction to the algorithm). It is an open-source solver available at https://osqp.org/docs/solver/index.html. As summarized in Table 1, portfolios with 50 assets or fewer can be optimized with very high precision, especially compared to any numerical gradient-based method. All the evaluations in Table 1 are done on a single core of the AMD Ryzen Threadripper 2990WX Processor. This concludes our summary of the portfolio optimization problems that we can solve using the QP framework. The corresponding code, with the implementation in Python, is available online at https://github.com/PawPol/PyPortOpt. We describe next all the covariance matrix estimators considered in this paper.
Table 1: Summary of the total running time (in seconds) for 100 rolling windows of three different portfolio optimization problems from our general framework described in Section 2, for different dimensions of the problem (N = 10, 20, 50, 500) and two different levels of tolerance and precision in the optimizer: (i) the default precision used in the OSQP package (https://osqp.org); (ii) high precision with 10^4 maximum iterations and the absolute and relative tolerances set to 10^-8. The latter is needed to generate convex portfolio frontiers in simulations for large N, and we use it in all our empirical studies. Computations are done using a single core of an AMD Ryzen Threadripper 2990WX processor.

[Table 1 timing entries omitted in this extraction: for each portfolio objective function, the columns report N = 10, 20, 50, 500 under Default Precision and under High Precision.]

In Markowitz's portfolio theory, the mean vector µ and the covariance matrix Σ are assumed to be known. However, in practice, these parameters must be estimated from data.
A prevalent method uses the historical sample mean and sample covariance matrix under the assumption of iid observations. This approach frequently results in suboptimal out-of-sample performance. As highlighted in the introduction, alternative estimators offer improved out-of-sample outcomes. In the subsequent empirical section, we use our portfolio optimization framework to compare the portfolio performance of various mean and covariance matrix shrinkage methodologies against that of different factor-based models. The former, the shrinkage methods, derive their estimates from daily data, while the latter, the factor-based models, use monthly returns and stock-specific characteristics.
In the case of daily data and mean and covariance matrix shrinkage, for the mean estimation we use the sample mean and three shrinkage estimators from Wang et al. (2014) and Bodnar et al. (2019). For the covariance matrix, we first use the classical linear shrinkage estimator of Ledoit and Wolf (2004), defined as

Σ̂ = δ̂ F + (1 − δ̂) S,

where S = (1/T) Σ_{t=1}^{T} (r_t − r̄)(r_t − r̄)′ is the sample covariance matrix and F is the estimated structured target; in particular, F = (trace(S)/N) I_N. Here δ̂ denotes the estimator of the optimal shrinkage constant δ. In practice, the authors propose δ̂ = max{0, min{κ̂/T, 1}}, where κ̂ = (π̂ − ρ̂)/γ̂, with π̂, ρ̂, and γ̂ estimated from the data as in Ledoit and Wolf (2004). In situations where the number of assets (variables) is commensurate with the sample size, the sample covariance matrix is usually ill-conditioned and may not be invertible. Taking a linear combination of the sample covariance matrix and the identity matrix shrinks the eigenvalues of the sample covariance matrix away from zero and towards their average trace(S)/N, with δ ∈ [0, 1] denoting the shrinkage intensity. As a result, we obtain a well-conditioned covariance matrix estimator with a lower mean-squared error than the sample covariance matrix; in large dimensions, when N grows asymptotically with T, it is a consistent estimator of the covariance matrix.
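The linear shrinkage toward a scaled identity can be sketched in a few lines of numpy. The shrinkage intensity below is the usual plug-in ratio b²/d² clipped to [0, 1]; it follows the structure of the Ledoit and Wolf (2004) estimator but is a simplified illustration, not the paper's exact implementation.

```python
import numpy as np

def lw_linear_shrinkage(X):
    """Linear shrinkage of the sample covariance toward a scaled identity:
    Sigma_hat = delta*F + (1 - delta)*S with F = (trace(S)/N) * I.
    X is a T x N matrix of returns; returns (Sigma_hat, delta)."""
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / T
    m = np.trace(S) / N                      # target scale, F = m * I
    d2 = np.sum((S - m * np.eye(N)) ** 2) / N
    # Estimated (squared) error of S, capped by the dispersion d2.
    b2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (T ** 2 * N)
    b2 = min(b2_bar, d2)
    delta = 0.0 if d2 == 0 else b2 / d2      # shrinkage intensity in [0, 1]
    return delta * m * np.eye(N) + (1 - delta) * S, delta

rng = np.random.default_rng(0)
Sigma_hat, delta = lw_linear_shrinkage(rng.standard_normal((30, 10)))
```

By construction the shrunk eigenvalues are δm + (1 − δ)λ_i, so they are pulled toward their average and the estimator stays positive definite whenever S is.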
Second, we consider a more recent nonlinear shrinkage covariance matrix estimator, the quadratic inverse shrinkage estimator of Ledoit and Wolf (2020a). The estimator can be written as

Σ̂_t := U_t ∆̂_t U′_t,   where ∆̂_t := diag(δ̂_t(λ_{1,t}), . . ., δ̂_t(λ_{N,t})),

and δ̂_t is a real univariate function of λ_{i,t} for i = 1, . . ., N. Here λ = (λ_1, . . ., λ_N) denotes the sample eigenvalues and U_t = [u_{1,t}, . . ., u_{N,t}] the corresponding eigenvectors. By introducing a nonlinear transformation (based on the Hilbert transform) of the sample eigenvalues, this method mitigates the curse of dimensionality.

The shrinkage techniques described above are typically employed for large-dimensional portfolio problems. A different strategy to address the challenges of dimensionality in portfolio optimization involves factor models. Classical factor modeling, as presented by Fama and French (1993), Carhart (1997), and Fama and French (2015), assumes that returns follow the linear model

r_{i,t} = α_i + β′_i f_t + ϵ_{i,t},   (19)

where f_t ∈ R^{K×1} represents a vector of observed factors, ϵ_{i,t} is zero-mean noise that captures the idiosyncratic component uncorrelated with the observed factors, and β_i ∈ R^{K×1} denotes a vector of unknown factor loadings. In many of these models, α_i is set to 0 for all assets i. Given that this is essentially a linear regression problem and the factors are presumed to be uncorrelated with ϵ_{i,t}, the return covariance matrix decomposes into a part explained by the factors and an idiosyncratic part. Additionally, if the ϵ_{i,t} components are assumed to be uncorrelated across assets, the covariance matrix of the idiosyncratic component can be estimated directly from the regression residuals. Consequently, this model remains applicable even when N significantly exceeds T. However, the factor model in (19) has its limitations. First, it assumes that the factors are both known and common across all assets. This means they can explain risk only to a certain extent and may not always correlate strongly with the actual risk in specific market conditions. Second, the factor loadings β_i are assumed constant over time.
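The factor-implied covariance decomposition Σ = BΣ_f B′ + D described above can be sketched directly from time-series regressions; the function name is illustrative and the example assumes idiosyncratic components uncorrelated across assets, exactly as in the text.

```python
import numpy as np

def factor_covariance(R, F):
    """Covariance implied by the linear factor model r_t = alpha + B f_t + eps_t:
    Sigma = B Sigma_f B' + D, where B holds the time-series regression
    loadings and D is the diagonal matrix of residual variances
    (idiosyncratic components assumed cross-sectionally uncorrelated).
    R: T x N returns; F: T x K observed factors."""
    T, N = R.shape
    X = np.column_stack([np.ones(T), F])           # intercept column = alpha
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)   # (1+K) x N coefficients
    B = coef[1:].T                                 # N x K loadings
    resid = R - X @ coef
    Sigma_f = np.atleast_2d(np.cov(F, rowvar=False))
    D = np.diag(resid.var(axis=0))
    return B @ Sigma_f @ B.T + D
```

Because only N·K loadings, K·(K+1)/2 factor covariances, and N residual variances are estimated, the resulting Σ̂ remains well-defined even when N far exceeds T, which is the point made in the paragraph above.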
An alternative method that addresses the first limitation is to employ Principal Component Analysis (PCA) to derive latent factors directly from the covariance matrix of asset returns, without requiring additional information. However, the covariance matrix of individual stock returns does not possess a lower-dimensional latent subspace that precisely captures the variation in these returns. As a consequence, executing PCA on the covariance matrix of individual stock returns tends to introduce significant noise, which can lead to unstable portfolios and underperformance out of sample. Thus, rather than applying PCA directly to the matrix of stock returns, it is more effective to work with the matrix of returns of portfolios that are single- or double-sorted on a cross-section of firm characteristics, as discussed in Bryzgalova et al. (2020) and the references therein.
PCA, when applied to managed portfolios, can extract factors that encapsulate the comovement among returns and identify the systematic time-series factors that predominantly drive cross-sectional risk. Typically, the top K eigenvectors are selected as assets in the portfolio, and one then optimizes the capital allocation among them. Lettau and Pelger (2020) introduce Risk Premium PCA (RP-PCA), which identifies pivotal factors in explaining asset returns. Traditional PCA focuses solely on the comovement in the data and does not incorporate the data means; consequently, it may fail to capture vital differences in the mean risk premia of assets. In contrast, RP-PCA takes into account both the first and second moments of the data, thereby enhancing estimation efficiency. Our empirical results confirm that RP-PCA outperforms PCA in portfolio performance. Bryzgalova et al. (2020) introduced the so-called Asset Pricing (AP) Trees, which generalize sorted portfolios using tree-based methods. AP-Trees offer concise and interpretable portfolios that span the stochastic discount factor (SDF) of stock returns, and they address challenges related to complexity, high dimensionality, and duplication. Our empirical analysis employs excess returns from AP-Trees of depth three, as well as a broad cross-section of single-sorted decile portfolios. These portfolios are derived from the ten deciles of 33 anomaly characteristics, resulting in a total of 330 managed portfolios for the single sorting. We do not work with double-sorted portfolios because, in our universe of mid- and large-cap stocks considered in the empirical analysis, many of the double-sorted portfolios were empty. The AP-Trees approach results in 36 different sortings: out of the 10 stock-specific characteristics we always use Size, and the remaining two are chosen from the other nine, giving (9 choose 2) = 36 different trees of depth three. Each of these trees comprises 360 managed portfolios.
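The difference between PCA and RP-PCA can be sketched in a single function: RP-PCA applies PCA to the second-moment matrix with the mean term overweighted by a factor γ, so γ = 0 recovers ordinary (uncentered) PCA. This is a minimal sketch of the idea in Lettau and Pelger (2020), with illustrative names and toy data, not the paper's implementation.

```python
import numpy as np

def rp_pca_factors(R, K, gamma=0.0):
    """Latent factors from managed-portfolio returns R (T x M).
    gamma = 0: ordinary PCA on the uncentered second-moment matrix;
    gamma > 0: the RP-PCA idea, overweighting the means so that factors
    also capture cross-sectional differences in risk premia."""
    T, M = R.shape
    mbar = R.mean(axis=0)
    Omega = R.T @ R / T + gamma * np.outer(mbar, mbar)
    eigval, eigvec = np.linalg.eigh(Omega)     # ascending eigenvalues
    V = eigvec[:, ::-1][:, :K]                 # top-K eigenvectors as weights
    return R @ V, V                            # factor returns (T x K), loadings

# Toy data with one dominant factor.
rng = np.random.default_rng(2)
T, M = 500, 20
f = rng.standard_normal(T)
beta = rng.uniform(0.5, 1.5, M)
R = np.outer(f, beta) + 0.1 * rng.standard_normal((T, M))
fac, V = rp_pca_factors(R, K=1, gamma=10.0)
```

The columns of V then play the role of the mapping matrix V in the optimization problems of Section 4.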
The AP-Trees and all the PCA-based models still assume static loadings, and they lack accuracy and flexibility because, after constructing the managed portfolios, they use only the information in their returns to estimate optimal portfolio positions. This is the motivation behind the IPCA model of Kelly et al. (2019), where asset returns are assumed to admit the factor structure

r_{i,t+1} = α_{i,t} + β′_{i,t} f_{t+1} + ϵ_{i,t+1}.

The major distinctions from the classical factor models discussed previously are: (i) The IPCA model, analogous to BARRA's factor model, posits that the alphas α_{i,t} and the factor loadings β_{i,t} ∈ R^{K×1} are time-dependent. However, unlike BARRA's model, it assumes they are implicitly observed through

α_{i,t} = z_{i,t} Γ_α + v_{α,i,t},   β′_{i,t} = z_{i,t} Γ_β + v′_{β,i,t},

where z_{i,t} ∈ R^{1×L} denotes observed asset-specific characteristics, and Γ_α ∈ R^{L×1} and Γ_β ∈ R^{L×K} are matrices of parameters estimated from the data.
(ii) Due to the dimension reduction introduced by the matrix Γ_β ∈ R^{L×K}, the number of observed characteristics L can be much larger than the number of latent factors K.
(iii) The factors f t ∈ R K×1 are time-dependent and are estimated from the data.
(iv) This model is predictive, with observable factors lagged by one period relative to the returns they explain.
(v) ϵ_{i,t+1}, v_{α,i,t}, and v_{β,i,t} are mean-zero random noises originating from the estimation of factors and loadings. The ϵ_{i,t+1} captures firm-level risk, whereas v_{α,i,t} and v_{β,i,t} represent the residuals between the true factor model parameters and the observable firm characteristics.
The rationale behind the IPCA model lies in a challenge of high-dimensional factor models: an excess of characteristics can lead to significant noise and collinearity among factors. This makes the results difficult to interpret and can diminish the model's out-of-sample performance. Hence, Γ_β is introduced to aggregate the large-dimensional characteristics into a linear combination of risk exposures. Any errors orthogonal to the dynamic loadings are absorbed into v_{β,i,t}.
In the empirical analysis, we assume that Γ_α = 0 and focus on the estimation of Γ_β. Hence, for the restricted model (Γ_α = 0), we can write the model in vector form as

r_{t+1} = Z_t Γ_β f_{t+1} + ϵ_{t+1},

where r_{t+1} is an N × 1 vector of asset returns, Z_t is an N × L matrix of observable characteristics, Γ_β is an L × K mapping matrix, and f_{t+1} is a K × 1 vector of latent factors. The objective function of the IPCA model is then

min_{Γ_β, {f_{t+1}}} Σ_t ∥r_{t+1} − Z_t Γ_β f_{t+1}∥²,

whose first-order conditions involve the Kronecker product ⊗ of matrices. Formula (23a) shows that the latent factors are the coefficients from regressing returns on the latent loading matrix β_t = Z_t Γ_β ∈ R^{N×K}, t = 1, . . ., T, while Γ_β collects the regression coefficients of r_{t+1} on the combination of latent factors and firm characteristics. This first-order-condition system has no closed-form solution, but it can be solved numerically by the alternating least squares method.
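The alternating least squares iteration for the restricted model can be sketched compactly: given Γ_β, each f_{t+1} is a cross-sectional regression; given the factors, vectorizing r_{t+1} = Z_t Γ_β f_{t+1} via vec(ZΓf) = (f′ ⊗ Z) vec(Γ) turns the Γ_β step into one big least-squares problem. This is an illustrative sketch of the estimation idea in Kelly et al. (2019), not the authors' code; function names and dimensions are made up for the example.

```python
import numpy as np

def ipca_als(R, Z, K, n_iter=50):
    """ALS for the restricted IPCA model r_{t+1} = Z_t Gamma f_{t+1} + eps.
    R: T x N returns; Z: T x N x L lagged characteristics."""
    T, N = R.shape
    L = Z.shape[2]
    rng = np.random.default_rng(0)
    Gamma = rng.standard_normal((L, K))
    F = np.zeros((T, K))
    for _ in range(n_iter):
        # Step 1: given Gamma, f_{t+1} is a cross-sectional regression of
        # r_{t+1} on the period-t loadings beta_t = Z_t Gamma.
        for t in range(T):
            F[t], *_ = np.linalg.lstsq(Z[t] @ Gamma, R[t], rcond=None)
        # Step 2: given the factors, stack vec(r_{t+1}) = (f' kron Z_t) vec(Gamma)
        # over t and solve one least-squares problem for vec(Gamma).
        X = np.vstack([np.kron(F[t][None, :], Z[t]) for t in range(T)])
        g, *_ = np.linalg.lstsq(X, R.ravel(), rcond=None)
        Gamma = g.reshape(L, K, order="F")   # invert column-major vec
    return Gamma, F
```

As with any bilinear model, Γ_β and the factors are identified only up to an invertible K × K rotation, which is irrelevant for the portfolio applications that follow.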
4 Regularizing Factor-Based Portfolios: An Application to the Maximum Sharpe Ratio Objective

In portfolio optimization, among all the objective functions in our framework, we focus on two fully invested optimal portfolios: the minimum variance (min Var) portfolio, as detailed in Section 2.1, and the maximum Sharpe ratio (max SR) portfolio, discussed in Section 2.7. We consider both with and without the ℓ1 + ℓ2² regularization covered in Section 2.4. The minimum variance portfolio is commonly employed to evaluate models that emphasize covariance matrix estimation without mean prediction. In our study, we use it for daily data, specifically for all covariance matrix shrinkage models and for the ℓ1 + ℓ2² regularized portfolio problems. The maximum Sharpe ratio portfolio, on the other hand, maximizes the risk-adjusted return of the portfolio strategy, meaning it offers the highest return per unit of risk, measured in terms of portfolio volatility. Positioned centrally on the efficient frontier, it is one of the most computationally intensive problems in our framework, as it necessitates reparametrization into a higher-dimensional space. We therefore consider it a good representative for our mean (and covariance matrix) shrinkage models using daily returns, as well as for the factor-based models employing monthly returns, given the persistence of the mean signal in the constructed factor portfolios. The corresponding optimization problem is expressed in (24), where ϑ ≥ 0 represents the short-selling threshold parameter (set to ϑ = 0.2 in our study), and the matrix V ∈ R^{N×K} encapsulates the linear mapping between managed portfolios and individual assets in our investment universe. We set the other parameters as follows: r_f = 0, L_j = −0.08, and U_j = 0.08 for all j = 1, . . ., N. If the optimal portfolio weights w pertain to individual stocks, then V is the identity matrix, and µ and Σ signify the mean and covariance matrix (after shrinkage) estimators of the individual stock returns. For AP-Trees, we employ the high-dimensional sample mean and covariance matrix of the factor portfolios. With PCA-based models, we consider K = 2, . . ., 6 dimensional µ̂_f and Σ̂_f derived from the PCA, RP-PCA, and IPCA estimated means, along with the estimated covariance matrix of the corresponding K factor portfolios. For PCA and RP-PCA, V comprises the first K eigenvectors of the PCA and RP-PCA covariance matrices, respectively. In the case of the IPCA model, V = (Γ̂′_β Z′_t Z_t Γ̂_β)^{-1} Γ̂′_β Z′_t describes the transformation between the IPCA factors of the last observation and the individual stocks.
In order to solve this efficiently, we reformulate (24) into an equivalent QP problem from Section 2.7, with the constraints rewritten as in Section 2.5. In Section 5, we introduce factor portfolios based on Principal Component Analysis (PCA), Risk Premium PCA (RP-PCA), and Instrumented PCA (IPCA). All these PCA-based models correspond to low-dimensional portfolio problems with w ∈ R^K. If we continued applying the ℓ1 and ℓ2 penalties to each factor, we would not obtain a sparse solution for either the managed portfolios or the individual stock weights. Therefore, for the PCA-based models, we define an ℓ1 + ℓ2² regularized maximum Sharpe ratio portfolio as in (25), where W_LS(ϑ, V) is the same as in (24). Depending on the choice of V, the regularization terms in (25) are with respect to the managed portfolios (in PCA and RP-PCA) or the individual stocks (in IPCA). Next, we reparametrize the optimization problem in (25), subject to the additional constraint γ = 1/w′(µ_f − r_f 1_K). Now, by defining w̃ = γw, we obtain the corresponding quadratic programming problem. Similarly to (6) and (8), we can employ the eigenvalue decomposition Σ_f = PΛ_f P′, where PP′ = I_K and Λ_f = diag(δ_1, . . ., δ_K), and split ∥Vw̃∥_1 using the positive and negative parts of Vw̃. Here V is the N × K mapping matrix whose columns are the eigenvectors corresponding to the K largest eigenvalues of the PCA, RP-PCA, or IPCA covariance matrix, and v⁺, v⁻ are N × 1 vectors denoting the positive and negative parts of Vw̃. Importantly, the final objective function in the optimization is quadratic and the constraints are linear. Hence, the corresponding problem falls into the general class of QP problems that we solve using our framework. In the following empirical analysis, we again use ϑ = 0.2, r_f = 0, L_j = −0.08, and U_j = 0.08 for all j = 1, . . ., N, as in all the previous methods.
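The positive/negative-part splitting that keeps the ℓ1 term inside the QP family can be sketched on a tiny problem. Writing Vw = v⁺ − v⁻ with v⁺, v⁻ ≥ 0 makes ∥Vw∥₁ = 1′(v⁺ + v⁻) linear in the decision variables. The sketch below uses V = I (so the penalty acts on the weights themselves), made-up λ values, and scipy's SLSQP as a stand-in for OSQP; it illustrates the splitting, not the paper's full max Sharpe ratio formulation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_l1_l2_minvar(Sigma, lam1, lam2):
    """Fully invested min-variance with l1 + l2^2 penalties, using the
    splitting w = v_plus - v_minus, v_plus, v_minus >= 0, so that
    ||w||_1 = 1'(v_plus + v_minus) becomes a linear objective term."""
    N = Sigma.shape[0]
    def obj(x):
        w, vp, vm = x[:N], x[N:2 * N], x[2 * N:]
        return w @ Sigma @ w + lam1 * (vp + vm).sum() + lam2 * (w @ w)
    cons = [
        {"type": "eq", "fun": lambda x: x[:N].sum() - 1.0},            # 1'w = 1
        {"type": "eq", "fun": lambda x: x[:N] - x[N:2 * N] + x[2 * N:]},  # w = v+ - v-
    ]
    bounds = [(None, None)] * N + [(0.0, None)] * (2 * N)
    x0 = np.concatenate([np.full(N, 1.0 / N), np.full(N, 1.0 / N), np.zeros(N)])
    res = minimize(obj, x0, constraints=cons, bounds=bounds, method="SLSQP")
    return res.x[:N]

Sigma = 0.01 * np.ones((4, 4)) + 0.03 * np.eye(4)
w = solve_l1_l2_minvar(Sigma, lam1=1.0, lam2=0.01)
```

Since the budget constraint forces ∥w∥₁ ≥ 1 with equality exactly when w ≥ 0, a large λ₁ discourages short positions, which mirrors the role of the short-selling threshold ϑ in the text.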

Empirical Results
We gather both daily and monthly data for all stocks traded on the NYSE, Amex, and Nasdaq from January 1965 to December 2022. The daily and monthly stock returns, adjusted for splits and dividends, are sourced from the Center for Research in Security Prices (CRSP). Additionally, we obtain quarterly accounting information for public firms from the Compustat dataset, which includes metrics such as BE (book equity), AT (total assets), and CTO (capital turnover). Following the methodologies of Fama and French (1993) and Freyberger et al. (2020), we merge the returns data with the firm-specific information, introducing a 6-month lag for all firms to ensure our results are genuinely out-of-sample.
After obtaining the merged datasets, we construct 33 characteristics, with a full list provided in the Appendix, using data from firms in the Compustat dataset as described by Freyberger et al. (2020) and references therein. For imputation, we adopt the backward cross-sectional model proposed by Bryzgalova et al. (2022). In our research, we utilize the stock universe defined by Asness et al. (2013), to which we refer as the AMP universe. To assemble this universe, we implement a rolling window approach and select stocks in each window based on specific criteria. First, in our market capitalization-based stock selection, we exclude the smallest market capitalization stocks, focusing on large- and mid-cap stocks, which together account for 90% of the overall market capitalization. Next, we filter out stocks priced below a designated threshold, ensuring the exclusion of penny stocks. Finally, to maintain the consistency of the dataset, we remove stocks with significant missing data in the last selection phase.
Depending on the specific model under consideration, we use either daily or monthly simple returns from the constructed AMP universe. For daily returns, the AMP universe typically consists of 500 to 1,000 stocks at any given time within a one-year rolling window. For monthly returns, we employ a rolling window of 20 years, resulting in an AMP universe of approximately 900 tickers for each window. Crucially, our methodology for constructing the rolling-window-specific asset universe ensures that the portfolio and its performance are not affected by survivorship bias.
In portfolio optimization, the out-of-sample performance of a specific model is often of primary interest, and a rolling window backtest analysis is typically employed to evaluate it. Figure 4 illustrates our rolling window scheme for the monthly data used in the factor-based models (AP-Trees and all PCA-based models). We partition the 38 years of data into a 20-year training sample (1985-2004) and allocate the subsequent 18 years (2005-2022) to the out-of-sample rolling window analysis. This involves monthly reestimation of all model parameters and optimization of the portfolio weights. For models investing in individual stocks without leveraging information from stock-specific factors, we adopt a rolling window of daily returns with a one-year look-back period. The rebalancing occurs monthly, commencing on the same start date as in the case of the 20-year window of monthly returns. Thus, all out-of-sample results presented in the following sections span the identical time frame and maintain a consistent rebalancing frequency. In terms of portfolio weight constraints, for all the scenarios discussed, we restrict asset concentration to no more than ±8% in a single asset and cap short positions at 20% of the total capital. We selected these thresholds to mirror a realistic industry environment, as described in Lunde et al. (2016). For our benchmark methods, we utilize daily data and incorporate three distinct mean shrinkage estimators as proposed by Wang et al. (2014) and Bodnar et al. (2019). Additionally, we employ four covariance matrix estimators: the sample covariance matrix, POET (Fan et al., 2013), and both the Ledoit & Wolf linear and nonlinear shrinkage methods from Ledoit and Wolf (2004) and Ledoit and Wolf (2020a), respectively, as discussed in Section 3.
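The rolling window scheme described above can be sketched as a simple loop: each step estimates weights on the trailing look-back window only, then records the next period's realized portfolio return out of sample. Function names and the trivial equal-weight strategy below are illustrative, not part of PyPortOpt.

```python
import numpy as np

def rolling_backtest(returns, lookback, weight_fn):
    """Monthly-rebalanced rolling-window backtest skeleton.
    returns: T x N per-period simple returns; weight_fn maps the trailing
    lookback-period return panel to an N-vector of portfolio weights."""
    T = returns.shape[0]
    oos = []
    for t in range(lookback, T):
        w = weight_fn(returns[t - lookback:t])   # estimate on past data only
        oos.append(returns[t] @ w)               # realize one period ahead
    return np.array(oos)

# Example: an equal-weight strategy on simulated monthly returns
# (20 years of data, 10 assets, 10-year look-back).
rng = np.random.default_rng(1)
R = rng.normal(0.01, 0.05, size=(240, 10))
oos = rolling_backtest(R, lookback=120,
                       weight_fn=lambda X: np.full(X.shape[1], 0.1))
```

Because the weight function sees only data strictly before period t, the resulting return series is free of look-ahead bias, which is the property the 6-month characteristic lag and the window construction in the text are designed to preserve.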
We evaluate these benchmarks against the ℓ1 + ℓ2² regularized minimum variance portfolio. The out-of-sample annualized Sharpe ratio results of these methods are illustrated in Figure 5, which showcases heatmaps of the out-of-sample Sharpe ratios for portfolios rebalanced monthly using one year of daily data for parameter estimation. Figure 5a provides the Sharpe ratios derived from the various mean and covariance matrix estimator combinations, while Figure 5b highlights the minimum variance portfolio that implements a long-short constraint complemented by ℓ1 and ℓ2² shrinkage.
Vertically, the heatmap sorts portfolios by five mean estimators: the Sample Mean, Mean Shrinkage I (from Wang et al. (2014)), Mean Shrinkage II and III (both from Bodnar et al. (2019)), and a minimum variance portfolio that does not use a mean estimate. Horizontally, the heatmap is organized by covariance matrix estimator, namely: the Sample Covariance Matrix (SCM), POET (Fan et al. (2013)), Linear Shrinkage (L&W-LS) from Ledoit and Wolf (2004), and Nonlinear Shrinkage (L&W-NLS) from Ledoit and Wolf (2020a).
Intriguingly, the minimum variance portfolios showcased in Panel (a) and the base of Panel (b) of Figure 5 outperform the maximum Sharpe ratio portfolios that apply shrinkage to the mean, the covariance matrix, or both. Moreover, the regularized minimum-variance portfolios in Panel (a) of Figure 5 in most cases match or surpass the performance of the covariance matrix shrinkage methods applied to a minimum-variance portfolio. These findings indicate that the ℓ1 and ℓ2² regularized portfolio methods perform similarly to the best-performing covariance shrinkage estimators.
The findings presented in Figure 5b show that the minimum-variance portfolio consistently outperforms the maximum Sharpe ratio portfolio, regardless of the shrinkage applied. This observation aligns with our earlier comments regarding the inherent noisiness of individual stock means. Optimization strategies based on individual stocks frequently yield suboptimal out-of-sample results. The subsequent analyses will highlight that managed portfolios can mitigate the idiosyncratic noise present in individual stock returns, thereby delivering optimal portfolios with superior out-of-sample performance.
Figure 6 displays the out-of-sample annualized Sharpe ratios for AP-Trees portfolios, rebalanced monthly. These portfolios are derived from the ℓ1 + ℓ2² regularized maximum Sharpe ratio portfolio strategy outlined in (28). Each heatmap represents a unique managed portfolio, distinguished by market capitalization paired with two other characteristics from Table 3. The heatmaps also reflect variations in the regularization strength parameters λ1 and λ2. In all cases, a 20-year rolling window of monthly data is used. The short-selling constraint is set at ϑ = 0.2, and the maximum concentration in an individual managed portfolio is capped at 8%. We observe a notable improvement in Sharpe ratios compared to the top-performing portfolios invested in individual assets. This suggests that grouping stocks with analogous characteristics into managed portfolios effectively diminishes noise and enhances mean prediction.
Next, we examine the three PCA-based models outlined in Section 3. Figure 7 presents heatmaps depicting the out-of-sample Sharpe ratios for monthly rebalanced portfolios that invest in K = 2, . . ., 6 factors from the PCA, RP-PCA, and IPCA models, respectively. The figure comprises 15 heatmaps, all on a consistent scale. Each heatmap shows performance across different levels of the ℓ1 and ℓ2² regularization parameters, taken from an exponential grid spanning λ1 = 10^-6, . . ., 5 and λ2 = 10^-6, . . ., 5. Empirically, within these parameter ranges, the regularization has the most pronounced impact on the portfolio weights across all models. For every model and every factor count K, the proposed regularization consistently enhances performance. The peak performance is observed with K = 6 factors. Specifically, the Sharpe ratios rise for (i) PCA from 1.52 to 2.00; (ii) RP-PCA from 2.12 to 3.40; and (iii) IPCA from 3.75 to 4.93. Furthermore, as illustrated in Figure 7, there is a marked improvement as the number of components from PCA and IPCA increases. Exploring a broad range of regularization parameters enables us to pinpoint their most effective values: for λ1, the optimal value is approximately 1.7 × 10^-4, while for λ2, it lies between 1.0 × 10^-6 and 2.9 × 10^-2. Across all values of K and various regularization strengths, the RP-PCA model consistently surpasses the corresponding PCA models. The most outstanding performer among all considered models is the IPCA model with 6 factors and combined ℓ1 and ℓ2² shrinkage.

Figure 8 contrasts the performance of the PCA factor model for the maximum Sharpe ratio portfolio (K = 6) without and with regularization of the portfolio norms, as delineated in (24) and (28), respectively. Panels 8a and 8c present the results for the PCA maximum Sharpe ratio portfolio without regularization, while panels 8b and 8d showcase the results with the ℓ1 + ℓ2² regularization, using the optimal λ1 and λ2 parameters. Both panels 8a and 8b suffer from a large drawdown during the financial crisis. Nevertheless, in other periods, the regularized PCA factor model demonstrates enhanced performance. This distinction becomes even more evident in panels 8c and 8d, where the regularized portfolio model outperforms in the majority of the months considered.

Figure 9 parallels Figure 8 but focuses on the RP-PCA model (K = 6). We examine both the inclusion and exclusion of the ℓ1 + ℓ2² regularization, again selecting the optimal λ1 and λ2 parameters. Panels 9a and 9c depict the underwater and monthly return plots for the RP-PCA factor model without the ℓ1 + ℓ2² regularization, while panels 9b and 9d showcase these plots with the regularization applied. The regularized RP-PCA displays a trend akin to the benchmark model (the RP-PCA factor model without ℓ1 + ℓ2² regularization). Notably, the regularization considerably mitigates the drawdowns of the RP-PCA model; for instance, the maximum monthly drawdown shrinks from −20% to −8.2%. This enhancement is further confirmed by panels 9c and 9d, which consistently indicate elevated returns for the regularized RP-PCA model.
Finally, Figure 10 offers a similar comparison for the IPCA model. Analogous to the previous observations, the IPCA model augmented with the ℓ1 + ℓ2² regularization for the maximum Sharpe ratio portfolio exhibits consistently fewer and smaller drawdowns than the unregularized IPCA model. Moreover, the monthly returns of the regularized maximum Sharpe ratio portfolio are consistently higher throughout the entire out-of-sample analysis period.
In summary, both the AP-Trees and the three PCA-based models gain significantly from the proposed regularization of the norms of the linearly transformed portfolio weights Vw. Among all the models considered, the IPCA model stands out as the top performer. The out-of-sample performance of both the original and regularized IPCA models is truly exceptional. Even when accounting for market frictions, such as transaction costs, implementation lags, liquidity concerns, and potential complications arising from the construction of certain asset-specific characteristics, a significant portion of this remarkable performance is expected to remain intact. While various methods exist to further refine the investment process and mitigate the effects of these market frictions, exploring them remains a topic for future research. It is worth noting that our IPCA results without regularization agree with the findings of the original IPCA paper (see the Sharpe ratios of the tangency portfolios in Table 5 of Kelly et al., 2019). Our regularization of the linear combinations of the portfolio norms further enhances the model's efficacy. Moreover, the portfolio results presented in this study deviate from the original Kelly et al. (2019) portfolio due to the incorporation of a 20% long-short constraint, a cap of 8% on individual positions, and the restriction of trading to the AMP universe of mid- and large-capitalization stocks. That the portfolio, despite these constraints, achieves such impressive monthly returns, high annualized Sharpe ratios, and minimal drawdowns over nearly 18 years of out-of-sample rolling windows is intriguing and noteworthy.
Table 2 presents key performance metrics across various benchmarks: the S&P 500 index; two minimum variance portfolios utilizing Ledoit & Wolf's linear and nonlinear shrinkage covariance matrices; and the best-performing AP-Trees and factor portfolios based on the PCA, RP-PCA, and IPCA models with K = 6 and regularization. The latter is also presented without regularization, as in the original model of Kelly et al. (2019). The first three benchmarks are based on individual daily stock returns. For the AP-Trees, we employ 360 managed portfolios conditionally sorted on size, beta, and lagged market capitalization using a depth-three tree (see Figure 6 for the highest Sharpe ratio). The PCA and RP-PCA models utilize the 330 single-sorted monthly managed portfolios. In contrast, the IPCA model operates strictly on individual stock returns, incorporating stock-specific firm data. The first IPCA portfolio is the constrained tangency portfolio without regularization, with its covariance matrix determined via the IPCA factor model; the second IPCA portfolio is the same but with the optimal regularization employed. In sum, the regularized IPCA portfolio attains an annualized Sharpe ratio of 4.91 and surpasses all other methods across nearly every metric considered. The regularized RP-PCA performs best in terms of the lowest maximum drawdown, the highest information ratio, and the smallest loss in the worst month; it also has lower volatility than the IPCA models.
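Two of the metrics reported in Table 2 can be computed directly from a monthly out-of-sample return series; the short numpy sketch below shows one standard way to do so (annualization at 12 periods per year and the function names are assumptions of the example, not the paper's exact definitions).

```python
import numpy as np

def annualized_sharpe(r, periods=12, rf=0.0):
    """Annualized Sharpe ratio from simple per-period excess returns."""
    ex = np.asarray(r) - rf
    return np.sqrt(periods) * ex.mean() / ex.std(ddof=1)

def max_drawdown(r):
    """Maximum peak-to-trough loss of the cumulative wealth curve,
    returned as a negative fraction (e.g. -0.5 for a 50% drawdown)."""
    wealth = np.cumprod(1.0 + np.asarray(r))
    peak = np.maximum.accumulate(wealth)
    return (wealth / peak - 1.0).min()
```

The underwater plots in Figures 8-10 are exactly the per-period series wealth/peak − 1 whose minimum this function reports.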
Other key performance diagnostics, such as the rolling beta, rolling Sharpe ratio, and rolling volatility of our top-performing IPCA factor model employing the maximum Sharpe ratio portfolio with ℓ1 + ℓ2² regularization, are illustrated in Figure 11. Figure 11a compares the annual returns of the IPCA maximum Sharpe ratio portfolio without (Benchmark) and with our ℓ1 + ℓ2² regularization (Strategy). The regularization systematically improves the performance; hence, it should also be simple to calibrate the λ1 and λ2 parameters based on past performance. The distribution of the monthly returns is centered around 9% per month, the rolling 6-month Sharpe ratio is very high, the rolling beta (against the non-regularized benchmark) oscillates around 1, and the rolling volatility is around 20%, with large bursts only during the Great Financial Crisis and the recent Covid period; these variance levels are deemed acceptable by quantitative portfolio managers without necessitating additional (de-)leveraging.

Concluding Remarks
This study presents a unified framework for portfolio optimization using quadratic programming. The framework integrates various conventional portfolio objectives, constraints, and regularizations frequently adopted in practice. As a result, it is exceptionally suited for rapid backtesting of extensive portfolio scenarios, ensuring both accuracy and computational speed.

Table 2: Key performance metrics from a rolling window exercise with monthly rebalancing from 2005-01-31 until 2022-12-31. First Column: the S&P 500 index as a long-only market benchmark. Second Column: the minimum variance optimal portfolio with the sample covariance matrix computed from a one-year look-back window of daily returns and the long-short constraint as in (8). Third Column: the minimum variance optimal portfolio with the Ledoit and Wolf (2020a) nonlinear shrinkage covariance matrix computed from a one-year look-back window of daily returns and long-short constraints. Fourth Column: a 20-year look-back window of AP-Trees managed portfolio monthly returns using the maximum Sharpe ratio optimal portfolio strategy as in (16) with long-short constraints and ℓ1 and ℓ2² shrinkage. The remaining columns use a 20-year look-back window of monthly returns and different maximum Sharpe ratio optimal portfolio strategies with long-short constraints as in (27), with the covariance matrix of factor portfolios for K = 6 estimated via: Fifth Column: the PCA model with the ℓ1 and ℓ2² regularized maximum Sharpe ratio portfolio; Sixth Column: the RP-PCA model with the ℓ1 and ℓ2² regularized maximum Sharpe ratio portfolio; Seventh Column: the IPCA model with the maximum Sharpe ratio portfolio without ℓ1 and ℓ2² regularization; Eighth Column: the IPCA model with the ℓ1 and ℓ2² regularized maximum Sharpe ratio portfolio.
Employing this framework, we introduce a novel maximum-Sharpe ratio portfolio problem that incorporates new types of regularizations on the norms of portfolio weights or their linear transformations. We demonstrate that, for the recent tree-based and PCA-based factor models, our proposed regularization and optimization framework yields systematically enhanced returns, diminished drawdowns, reduced volatilities, and elevated Sharpe ratios for the optimal portfolios. Among the models assessed, the IPCA factor model detailed in Kelly et al. (2019) emerges as the superior performer, especially when combined with the proposed regularization.
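In the unconstrained case, the maximum-Sharpe direction is proportional to Σ⁻¹μ, rescaled to full investment, and a ridge-style ℓ2² penalty then amounts to the substitution Σ → Σ + λ2 I. The sketch below illustrates only this unconstrained special case under that assumption (the function name and `lam2` parameter are ours; the paper's ℓ1-regularized and long-short-constrained versions go through the QP solver):

```python
import numpy as np

def max_sharpe_weights(mu, cov, lam2=0.0):
    """Unconstrained maximum-Sharpe direction inv(Sigma) @ mu,
    rescaled to sum to one. lam2 adds a ridge-style l2^2 shrinkage
    term Sigma + lam2*I (illustrative sketch only)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float) + lam2 * np.eye(len(mu))
    y = np.linalg.solve(cov, mu)
    s = y.sum()
    if s <= 0:
        raise ValueError("direction does not rescale to a fully invested portfolio")
    return y / s
```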
In future studies, it would be interesting to delve deeper into the ramifications of transaction costs on our optimal portfolios. Factor-based models, because their conditional mean predictions depend on stock-specific factors, lead to inherently higher turnover in portfolio optimization. Nevertheless, we believe that, given the monthly rebalancing considered in this paper, the majority of the qualitative results will remain; moreover, additional smoothing ℓ1 constraints on the changes in the individual asset weights (similar to our ℓ1 regularizations) should help to reduce turnover without a large impact on performance. Additionally, integrating alternative portfolio problems within our expansive framework could help mitigate these transaction costs. As optimal portfolio weights are deduced from the inverse covariance matrix, it is vital to consider applying shrinkage methods to this matrix, which could further bolster the robustness and efficiency of the portfolio optimization. Such strategies have been explored in Kourtis et al. (2012), Wang et al. (2015), and Bodnar et al. (2016). Further, a comprehensive examination of the asset-specific factors in the IPCA model that significantly influence return predictions and boost portfolio performance is essential. Ideally, emphasizing factor sparsity would enhance the model's signal-to-noise ratio; this can also be achieved by incorporating sparse PCA extensions into the IPCA model.
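The intuition behind an ℓ1 penalty on weight changes is that its proximal operator is soft-thresholding, which zeroes out small trades entirely. A toy illustration of that effect (our own hypothetical helpers, not the exact constrained formulation sketched above; the renormalization step simply restores full investment):

```python
import numpy as np

def smooth_rebalance(w_prev, w_target, tau):
    """Shrink the trade w_target - w_prev with the soft-threshold
    operator (the proximal map of tau * ||dw||_1), then renormalize
    to full investment. Toy stand-in for an l1 turnover penalty."""
    w_prev = np.asarray(w_prev, dtype=float)
    d = np.asarray(w_target, dtype=float) - w_prev
    d = np.sign(d) * np.maximum(np.abs(d) - tau, 0.0)  # soft-threshold
    w = w_prev + d
    return w / w.sum()

def turnover(w_prev, w_new):
    """One-way turnover: total absolute change in weights."""
    return np.abs(np.asarray(w_new) - np.asarray(w_prev)).sum()
```

Trades smaller than `tau` are suppressed altogether, so turnover decreases monotonically in the penalty strength while large, presumably informative trades pass through only mildly shrunk.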

Appendix: Description of Asset Specific Factors
In Table 3, we list the details of the firm-specific characteristics used in the factor models.

Figure 2 :
Figure 2: Four plots of two different portfolio frontiers (long-only and the closed-form long-short from (2)), together with different optimal long-only portfolios (maximum-Sharpe ratio (14) and minimum-variance), the equally weighted portfolio (1/N), the equal volatility contribution portfolio (Equal-Var), and 25,000 iid Dirichlet-distributed portfolios w ∼ Dir(1_N), for different numbers of assets N = 10, 20, 50, 500 selected from the largest market-capitalization stocks in the US market. Mean and covariance matrix are estimated from 10 years of daily returns (2520 observations).
(a) Long-only portfolio allocation (b) Long-short portfolio allocation

Figure 3 :
Figure 3: Portfolio weights of N = 50 assets as a function of the regularization strength parameter λ of the ℓ1 penalty in the ℓ1 regularized minimum-variance portfolio (long-only vs. long-short with ϑ = 0.2); the x-axis is in log scale.

Figure 4 :
Figure 4: Summary of the rolling window analysis. We use data going back to 1985 and slide a 20-year window of monthly returns to estimate the parameters, with monthly rebalancing and performance updates.
(a) min-Var + ℓ1 + ℓ2² (b) max-SR and min-Var with mean and covariance matrix shrinkage

Figure 5 :
Figure 5: Heatmaps of annualized Sharpe ratios for different minimum-variance portfolios on individual assets from the AMP universe, using one year of daily returns data and monthly rebalancing. Left: annualized Sharpe ratios of the ℓ1 + ℓ2² regularized minimum-variance portfolio strategy for different regularization strength parameters. Right: row-wise, different mean estimators and a portfolio without dependency on the mean, namely the maximum-Sharpe ratio portfolio with four types of mean estimators: sample mean, Mean Shrinkage I from Wang et al. (2014), Mean Shrinkage II from Bodnar et al. (2019), Mean Shrinkage III from Bodnar et al. (2019), and the minimum-variance portfolio that does not depend on the mean estimation. Column-wise, different covariance matrix estimators: sample covariance matrix (SCM), POET from Fan et al. (2013), the linear shrinkage covariance matrix estimator from Ledoit and Wolf (2004) (L&W-LS), and the nonlinear shrinkage covariance matrix estimator from Ledoit and Wolf (2020a) (L&W-NLS).

Figure 6 :
Figure 6: Thirty-six heatmaps of out-of-sample annualized Sharpe ratios from monthly rebalanced AP-Trees portfolios obtained from the ℓ1 + ℓ2² regularized maximum-Sharpe ratio portfolio strategy computed using (27). Different heatmaps correspond to different managed portfolios of market capitalization combined with another two characteristics from Table 3, and to different regularization strength parameters λ1 and λ2. In all cases, we use a rolling window of 20 years of monthly data with short-selling constraint ϑ = 0.2 and no additional constraints.

Figure 7: Figure 8:
Figure 7: Fifteen heatmaps of out-of-sample performance gains, in annualized Sharpe ratio, from the monthly rebalanced ℓ1 + ℓ2² regularized maximum-Sharpe ratio portfolio strategy. The strategy varies based on the number of components from the PCA-, RP-PCA-, and IPCA-estimated covariance matrices, and on the regularization strength parameters λ1 and λ2. In all cases, we use the maximum-Sharpe ratio portfolio estimated on the last 20 years of data with short-selling constraint ϑ = 0.2 and no additional constraints. Columns: different numbers of components from the PCA, RP-PCA, and IPCA models. First row: annualized Sharpe ratios for the covariance matrix derived from PCA factors, regularized with ℓ1 + ℓ2². Second row: annualized Sharpe ratios for the covariance matrix derived from RP-PCA factors, regularized with ℓ1 + ℓ2². Third row: annualized Sharpe ratios for the covariance matrix derived from IPCA factors, regularized with ℓ1 + ℓ2².

Figure 9 :
Figure 9: Analogous to Figure 8 but for the RP-PCA model.

Figure 10 :
Figure 10: Analogous to Figure 8 but for the IPCA model.

Figure 11 :
Figure 11: Five plots of different out-of-sample performance measures for our long-short maximum-Sharpe ratio portfolio strategy with ℓ1 + ℓ2² regularization and the IPCA covariance matrix estimator from (27), versus the maximum-Sharpe ratio portfolio with the IPCA factor covariance matrix without shrinkage as a benchmark.

Table 3 :
Acronyms and factor names. * denotes the characteristics used in AP-Trees managed portfolios.