Leveraged funds: robust replication and performance evaluation

Leveraged and inverse exchange-traded funds seek daily returns equal to fixed multiples of indexes' returns, but the ensuing rebalancing costs create a tension between a high correlation with the index and a low average deviation from the leveraged index' performance. With proportional trading costs, we find that the optimal replication policy is robust to the index' dynamics and obtain a sufficient statistic for index replication performance, the implied spread, which is insensitive to risk-premia and enables comparisons of funds tracking different factors of an index. Overall, the impact of trading costs on replication performance is comparable to or higher than the effect of management fees.


Introduction
Leveraged and inverse exchange-traded funds (LIETFs) aim to deliver daily returns that scale an index' return by a constant factor. They were introduced in the United States in 2006, and the 12 funds available at the end of that year held barely $2 billion. Since then, new funds have become available for equity indexes specific to countries and industry sectors, bonds, commodities, currencies, and real estate. At the end of 2022, over 240 leveraged or inverse funds held over $65 billion in assets, almost all of them managed by two firms, Profunds and Direxion. Typical factors for these funds range from −3 for inverse funds to +3 for leveraged ones, which may lead to large losses in turbulent markets, thereby raising regulators' concerns and doubts. In 2015, an SEC proposal sought to cap leverage at 150%, but was not implemented. In 2017, the SEC initially approved the listing of funds with factors of ±4, only to later reconsider its decision. Overall, LIETFs appear to have grown faster than the research on them.
The mechanism that keeps the shares of such funds aligned with their net asset value is the activity of authorized participants (typically market makers and broker dealers), who exchange blocks of funds' shares (creation units) for the underlying basket of securities at their net asset value (minus a transaction fee of approximately 0.10%). As a result, authorized participants quickly exploit arbitrage opportunities created by minimal departures of the market price of the fund's shares from its net asset value, thereby ensuring that the former closely tracks the latter.
Keeping a constant leverage ratio requires rebalancing the portfolio daily, so as to keep the amount of index exposure near the target multiple of the fund's value: both leveraged and inverse funds buy the index when it increases and sell it when it decreases (Cheng and Madhavan 2009). Rebalancing requires a significant amount of trading: for example, a typical 1% daily return (one standard deviation, for an annual volatility of 16%) implies a daily turnover of 2% for a fund with factor +2 or −1, 6% with factors +3 or −2, and a remarkable 12% with factor −3.† Such trading volume generates substantial trading costs, especially for funds with high leverage ratios and for indexes with high volatility and low liquidity. While fund managers seek to mitigate trading frictions using swaps and derivatives, the question is to what extent the trading costs generated by frequent rebalancing are responsible for the observed underperformance of LIETFs.
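The turnover figures quoted above are consistent with the rule, attributed in this paper to Cheng and Madhavan (2009), that a fund with factor Λ trades Λ(Λ − 1) times the index return each day. A minimal Python sketch of that arithmetic follows; the function name `daily_turnover` is hypothetical and introduced only for illustration.

```python
def daily_turnover(factor: float, index_return: float) -> float:
    """Rebalancing turnover as a fraction of fund assets.

    Applies the rule factor * (factor - 1) * index return; the absolute
    value reports the traded amount regardless of its direction.
    """
    return abs(factor * (factor - 1) * index_return)


if __name__ == "__main__":
    # A 1% daily index move, as in the example in the text.
    for factor in (2, -1, 3, -2, -3):
        print(f"factor {factor:+d}: turnover {daily_turnover(factor, 0.01):.0%}")
```

Because Λ(Λ − 1) takes the same value at Λ and 1 − Λ, the sketch returns identical turnover for the pairs (+2, −1) and (+3, −2).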
The tradeoff between correlation and underperformance underlies the two main metrics used to evaluate passive investment funds, which focus on the difference between the index' multiple and the fund's daily returns. The tracking error (TrE), the annualized standard deviation of such difference, reflects the precision with which the fund mimics its target at a daily frequency. The tracking difference (TrD), the annualized average of such difference, measures the systematic gap between the benchmark index and the fund. Tracking error and tracking difference are the main indicators used by academics (Charteris and McCullough 2020), practitioners (Frino and Gallagher 2001, Johnson et al. 2013), and regulators (ESMA 2014) to evaluate the performance of ETFs.‡ This tension raises two central problems for managers and investors. Fund managers need to trade as efficiently as possible to achieve the desired tracking error while minimizing the deviation from the leveraged index. Investors compare competing funds that differ in leverage ratios, tracking error, and tracking difference, and need performance measures that control for such differences. The contribution of this paper is to address these problems in a model with arbitrary volatility dynamics, continuous trading, and trading costs proportional to volume, leading to two main results for managers and investors.
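As an illustration of the two metrics just defined, the following Python sketch computes them from daily return series. It rests on assumptions that are not in the text: the daily difference is taken as fund return minus factor times index return (so that underperformance yields a negative TrD), annualization uses 252 trading days, and the function name `tracking_stats` and the sample data are hypothetical.

```python
import math
import statistics


def tracking_stats(fund_returns, index_returns, factor, periods_per_year=252):
    """Return (TrD, TrE) from daily simple returns.

    Assumption of this sketch: the daily difference is the fund return
    minus factor times the index return.
    """
    if len(fund_returns) != len(index_returns):
        raise ValueError("return series must have the same length")
    diffs = [f - factor * i for f, i in zip(fund_returns, index_returns)]
    trd = statistics.mean(diffs) * periods_per_year  # annualized average
    tre = statistics.stdev(diffs) * math.sqrt(periods_per_year)  # annualized std
    return trd, tre


if __name__ == "__main__":
    # Hypothetical data: a 2x fund that lags its target by 1 bp every day.
    index = [0.010, -0.005, 0.002, -0.012, 0.007]
    fund = [2 * r - 0.0001 for r in index]
    trd, tre = tracking_stats(fund, index, factor=2)
    print(f"TrD = {trd:.4f}, TrE = {tre:.6f}")
```

In this constructed example the fund lags by the same amount every day, so the tracking difference is negative while the tracking error is essentially zero, which shows that the two statistics capture different aspects of replication.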
A manager's optimal replication policy is independent of volatility and depends only on the trading cost ε, the target factor Λ, and the desired tradeoff between tracking error and tracking difference, summarized by a positive parameter γ, which captures investors' aversion to tracking error. The manager should keep the fund's index exposure, as a multiple of the fund's assets, within the approximate thresholds $\Lambda \pm \left(\frac{3}{4\gamma}\Lambda^2(\Lambda-1)^2\right)^{1/3}\varepsilon^{1/3}$ by minimally increasing and decreasing the fund's exposure as it reaches the lower and upper levels, respectively. In this formula, higher values of the parameter γ lead to a narrower no-trade region, which results in lower tracking error, higher trading costs, and consequently a more negative tracking difference. The formula is similar to the optimal trading boundaries arising in problems of optimal portfolio choice with transaction costs, with one crucial difference: here the approximate policy is insensitive to the index price dynamics, both volatility and expected return. Even as these boundaries remain constant, in more volatile times rebalancing is more intense, as the index position is more volatile and the boundaries are reached more frequently. The robust replication of leveraged funds has some analogies with the robust hedging of variance swaps, developed by Dupire (1993), Neuberger (1994), and Carr and Madan (2001): if the underlying index follows a continuous process, the optimal replication policy does not depend on the particular volatility dynamics, and in the absence of frictions the replication is perfect. In the present context, such robustness remains valid in the presence of trading frictions, though the optimal strategy's performance does depend on realized volatility.
† A fund with factor Λ generates a daily turnover of Λ(Λ − 1) times the index return (Cheng and Madhavan 2009).
‡ The website https://www.trackingdifferences.com/ publishes the tracking differences of ETFs listed in several exchanges.
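The approximate thresholds above are straightforward to evaluate numerically. The Python sketch below does so; the function name `no_trade_band` and the parameter values in the usage loop (γ = 5 and a spread of 10 basis points) are illustrative assumptions, not calibrations taken from the paper.

```python
def no_trade_band(factor: float, gamma: float, spread: float):
    """First-order buy and sell thresholds for the fund's index exposure.

    Evaluates factor +/- (3/(4 gamma) factor^2 (factor-1)^2)^(1/3) spread^(1/3).
    Volatility does not appear among the arguments.
    """
    if gamma <= 0 or spread <= 0:
        raise ValueError("gamma and spread must be positive")
    half_width = (
        3.0 / (4.0 * gamma) * factor**2 * (factor - 1) ** 2 * spread
    ) ** (1.0 / 3.0)
    return factor - half_width, factor + half_width


if __name__ == "__main__":
    # Illustrative inputs only: gamma = 5, spread = 10 basis points.
    for factor in (3, 2, -1, -2, -3):
        lower, upper = no_trade_band(factor, gamma=5.0, spread=0.001)
        print(f"factor {factor:+d}: buy below {lower:.3f}, sell above {upper:.3f}")
```

The band widens with the magnitude of Λ(Λ − 1) and narrows as γ increases, in line with the comparative statics described in the text.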
The optimal replication policy under frictions explains the underexposure puzzle observed by Tang and Xu (2013), who report that average exposures of both leveraged and inverse funds in US markets are significantly smaller than their target factor. We confirm this effect in recent data on US funds and show that it is consistent with the model's predictions: though the optimal rebalancing boundaries are symmetric (at the first order) around the target factor, the fund's volatility is lower when its exposure is closer to zero, which increases the time spent at such smaller exposures and generates a bias towards zero for the average exposure. The effect is stronger for larger factors and more volatile indexes.
The main implication for investors is that a summary of a fund's overall tracking performance is the implied spread, denoted by $\tilde\varepsilon$ and defined in formula (1). This quantity estimates the hypothetical bid-ask spread that would make the investor indifferent between using the fund and replicating the index by trading with such a spread. Formula (1) offers a tool for comparing the performance of funds with different factors and tracking errors: the implied spread combines the tracking difference (TrD) and tracking error (TrE) in a single number, measuring the fund's performance after controlling for the factor and the index' average volatility σ, which otherwise magnify the effect of trading frictions. The formula implies that, holding the factor constant, an optimally managed fund that seeks to halve its (negative) tracking difference should be prepared to approximately double its tracking error. In addition, the formula does not depend on the index' risk premium, which is notoriously hard to estimate with precision. Most implied spreads in US funds range between 2 and 20 basis points, with smaller spreads for funds with larger factors, implying that more volatile funds offer a better tradeoff between tracking difference and tracking error. An explanation for this phenomenon lies in the management fees charged by leveraged funds, which are nearly the same for funds with different factors, making funds with larger factors comparatively cheaper in their unit cost of exposure.
These findings shed new light on the growing literature on leveraged funds. Tang and Xu (2013) report that leveraged funds deviate significantly from their benchmarks even after management fees, and separate tracking error into a compounding component, due to the convexity of leveraged returns, and a rebalancing component, due to trading frictions.† Wagalath (2014) derives an asymptotic expression for the slippage that results from rebalancing at fixed intervals. Anderson et al. (2012) note the sensitivity to trading frictions of risk-parity strategies, a popular class of leveraged strategies, and Anderson et al. (2014) propose a leveraged performance attribution formula that emphasizes, in addition to transaction costs, the comovement between portfolio exposure and index return. The recent work of Dai et al. (2022) studies the problem of continuous intraday replication of leveraged ETFs with quadratic transaction costs.
The paper is organized as follows: section 2 introduces the model and the optimization problem. Section 3 contains the main results on optimal leverage replication and performance evaluation (theorem 3.1). Section 4 describes how the replication of LIETFs takes place in practice, providing a quantitative analysis of the trading costs associated with rebalancing. With such context, section 5 investigates the paper's implications empirically, focusing on families of leveraged and inverse ETFs traded in US markets. Section 6 discusses the robustness of the paper's results to risk-premia proportional to return variance, finite horizons, and discrete trading. Section 7 concludes. All proofs are in the appendix.

Model
The market has a safe asset earning an interest rate $r_t$ and a risky asset (the index) with ask (buying) price $S_t$, where
$$\frac{dS_t}{S_t} = (r_t + \mu_t)\,dt + \sigma_t\,dB_t,$$
and with the index' bid (selling) price $(1-\varepsilon)S_t$, which implies a constant relative bid-ask spread of ε > 0, or, equivalently, constant proportional transaction costs. Here $(B_t)_{t\ge 0}$ is a standard Brownian motion on a filtered probability space $(\Omega, (\mathcal{F}_t)_{t\ge 0}, \mathcal{F}, P)$, while $(r_t)_{t\ge 0}$ is an adapted process such that $\int_0^T |r_t|\,dt < \infty$ a.s. Volatility is integrable and stationary: $\int_0^T \sigma_t^2\,dt < \infty$ for all $T > 0$, and $\sigma_t$ is weakly ergodic with stationary variance $\sigma^2 > 0$, i.e. $\lim_{T\to\infty} \frac{1}{T}\int_0^T \sigma_t^2\,dt = \sigma^2$ $P$-a.s. Thus the dynamics of the safe rate and volatility are essentially arbitrary, though volatility should neither explode nor vanish: as in Melnyk and Seifried (2018), this ergodicity condition ensures that average volatility is finite, hence the problem is well posed.
† Cf. Cheng and Madhavan (2009).
The baseline model in this section focuses on the case of a zero risk premium, i.e. μ t = 0, while section 6 extends the analysis to include a nonzero risk premium proportional to return variance and finds that it has only a second-order effect on the main results. In addition, the assumption of a zero risk premium is substantively appropriate for the optimal tracking problem at hand, as it removes a manager's incentive to generate positive tracking difference by systematically deviating from the target's exposure to earn a risk premium. Such hypothetical deviation would be counterfactual, as it would imply a positive bias for both leveraged and inverse funds. On the contrary, the 'underexposure puzzle' of Tang and Xu (2013) documents negative bias for leveraged funds and positive bias for inverse funds.
A manager's trading strategy is described by the number of shares $\phi_t$ of the index held at time t. The corresponding fund's value $F_t = X_t + Y_t$ at time t is the sum of the index position $Y_t = \phi_t S_t$ and the safe position $X_t$, which follows the dynamics‡
$$dX_t = r_t X_t\,dt - S_t\,d\phi^{\uparrow}_t + (1-\varepsilon)\,S_t\,d\phi^{\downarrow}_t,$$
where $\phi^{\uparrow}$ and $\phi^{\downarrow}$ are the minimal increasing functions that satisfy $\phi_t = \phi^{\uparrow}_t - \phi^{\downarrow}_t$ and represent the cumulative number of shares purchased and sold, respectively. Furthermore, a strategy is required to be solvent, in that its corresponding wealth $F_t$ is strictly positive at all times. (Admissible strategies are formally described in Definition A.1 below.) Thus the fund value $F_t$ satisfies the dynamics
$$\frac{dF_t}{F_t} = r_t\,dt + \pi_t\,(\mu_t\,dt + \sigma_t\,dB_t) - \varepsilon\,\frac{S_t}{F_t}\,d\phi^{\downarrow}_t,$$
where $\pi_t = Y_t/(X_t + Y_t) = \phi_t S_t/F_t$ represents the ratio of the index' position and the fund value at time t. Absent any motive to systematically deviate from the target, the manager's objective is to trade off the fund's tracking error against its tracking difference, defined as follows. The (cumulative) difference $D_t$ between the fund's return above the safe rate and the multiple of the index' return above the safe rate is defined as
$$D_t = \int_0^t \left(\frac{dF_s}{F_s} - r_s\,ds\right) - \Lambda \int_0^t \left(\frac{dS_s}{S_s} - r_s\,ds\right). \qquad (4)$$
Accordingly, the definitions of annualized tracking difference and tracking error between the fund's and the index' multiple are $\mathrm{TrD}_T = D_T/T$ and $\mathrm{TrE}_T = \sqrt{\langle D\rangle_T/T}$, where $\langle D\rangle_T$ denotes the quadratic variation of the process D.† The manager's objective is to find a trading policy that minimizes tracking error without deteriorating performance with excessive trading costs, i.e. for a fixed level of tracking difference.
‡ The convention of evaluating the risky position at the ask price is inconsequential. Using the bid price instead leads to the same results up to a change in notation.
Such cost-adjusted tracking error is summarized by the quantity (7).‡ The parameter γ > 0 is interpreted as aversion to tracking error, and the empirical results in section 5 suggest that typical values of γ range between 5 and 10, as LIETFs aim at closely tracking the daily returns of their indexes, at the price of a moderate but significant average underperformance. The final term in (7) represents trading costs, which hinder continuous portfolio rebalancing and make constant-proportion strategies infeasible. (In the absence of trading frictions, the optimal choice would be to keep $\pi_t$ constantly equal to Λ, which achieves perfect replication, with realized tracking difference and tracking error both zero.) The appeal of the above objective is twofold: first, it makes quantitatively explicit the tradeoff between tracking error and trading costs, which is implicitly acknowledged in LIETF documents. For example, the prospectus of the fund ProShares Ultra S&P 500 states that 'The Fund seeks daily investment results, before fees and expenses, that correspond to two times (2x) the daily performance of the Index' before warning that 'A number of other factors may also adversely affect the Fund's correlation with the Index, including fees, expenses, transaction costs, financing costs [. . . ]'.
Second, the long-term average of (7) admits the interpretation of equivalent expense ratio, i.e. the hypothetical expense ratio that a frequent user of these funds would be willing to pay on the fund's assets to avoid both trading costs and tracking error in the fund's payoff; see (8).§ As leveraged and inverse funds are typically open-ended, without a specific maturity target, this paper focuses on the long-term objective (8).
† Both limits hold in probability. The first convergence follows by the continuity of D, and the second one from theorem IV.1.3 in Revuz and Yor (1999) by localization.
‡ The equality follows from lemma A.2, see also Remark A.3.
Note that such ergodic average may be interpreted either as the long-term average that appears in the definition, or as the unconditional expectation of the daily cost-adjusted tracking error, given a stationary initial condition for the fund's composition at the beginning of the trading period.
Indeed, suppose that the investor could choose between the fund $F_t$, which tracks imperfectly the leveraged index, and another contract, which can be exercised anytime and delivers the payoff $\tilde F_t$, defined by
$$\frac{d\tilde F_t}{\tilde F_t} = r_t\,dt + \Lambda\left(\frac{dS_t}{S_t} - r_t\,dt\right) - \varphi\,dt,$$
which does not entail any tracking error or trading costs, but rather a management fee φ. For such a contract, the difference process is simply $\tilde D_t = -\varphi t$, as trading costs and tracking error are zero, which means that the right-hand side of (8) is precisely φ, justifying its interpretation as the equivalent expense ratio that would make an investor indifferent between the fund $F_t$ with its tracking error and trading costs, and a hypothetical substitute that has neither, but rather pays such management fee. The above comparison is more than a thought experiment: $F_t$ reflects the characteristics of leveraged and inverse funds typical of US markets, while $\tilde F_t$ mirrors the structure of factor certificates prevalent in Europe, which are derivatives contracts issued by a financial institution that pay the holder the value of a leveraged index minus a management fee, rather than the value of a replicating portfolio.

Main results
The first main result characterizes the optimal policy and its performance in terms of the solution to a one-dimensional free-boundary problem that is independent of the volatility process.
Theorem 3.1 Let γ > 0 and $\Lambda \neq 0, 1$. For ε > 0 small enough:
(i) The free boundary problem [...] and sells at $\pi_+ := \zeta_+/(1+\zeta_+)$ as little as necessary to keep the risky weight $\pi_t$ within the interval [...] where [...] is the set of admissible strategies (see appendix 1).
(iv) The optimal Tracking Difference and Tracking Error are [...] where $\bar\beta$ is defined in (14) below.
(v) The average volatility $\bar s$, exposure $\bar\beta$ and squared correlation $R^2$ are: [...]
(vi) The following asymptotic expansions hold: [...] In particular, the following relation holds: [...]
Proof See appendix 4.
§ This quantity is well defined by virtue of assumption 2.1, which guarantees that tracking error does not diverge. The lim inf guarantees a good definition a priori. A posteriori, optimal strategies exist in which the limit inferior is a limit, hence the similar problem defined in terms of lim inf has the same solution. Note also that, if volatility σ is a function of some stationary Markov process, the first term in (8) [...], where the random variable Y has the stationary distribution under the expectation E. In particular, the formulation in (7) in terms of a time-average is equivalent to a direct ergodic formulation of the problem. We keep the current formulation because it does not involve the Markov property and because an ergodic formulation of the trading cost in (8) would require the use of local times as well as further restrictions on trading strategies.
The main message of this theorem is that the solution to the optimal tracking problem is completely determined by the two buy and sell boundaries $\pi_\pm$, in terms of which all performance statistics follow in closed form. As volatility dynamics do not enter the free boundary problem, $\pi_\pm$ are independent of the volatility process, and depend only on the factor Λ, the tracking-error aversion γ, and the spread ε. In consequence, the average exposure $\bar\beta$ and the squared correlation $R^2$ are also independent of volatility dynamics. By contrast, the tracking difference TrD, tracking error TrE, and the fund's volatility $\bar s$ also depend on the average volatility σ.
Part (vi) turns these qualitative insights into quantitative implications by deriving explicit series expansions of $\pi_\pm$ and of the performance statistics. Each of these expansions depends on the unobservable parameter γ: eliminating this parameter from (17) and (18) yields the relation (21) among tracking error, tracking difference, and spread, which must hold regardless of the value of γ. In summary, the theorem characterizes the optimal trading policy, its performance, and their statistical attributes. The next sections describe their main normative implications for replication and performance evaluation, while the following sections examine such implications empirically.
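The elimination of γ can be illustrated numerically. Since the expansions (17) and (18) are not reproduced here, the Python sketch below substitutes a first-order heuristic that is consistent with the thresholds stated in the introduction: if the exposure is treated as roughly uniformly distributed over a band of half-width δ around Λ, then TrE ≈ σδ/√3 and −TrD ≈ εσ²Λ²(Λ − 1)²/(4δ). These two expressions, and therefore the constant in their product, are assumptions of this sketch rather than the theorem's statement; all function names and inputs are hypothetical.

```python
import math


def half_width(factor, gamma, spread):
    """First-order half-width of the no-trade region (introduction's thresholds)."""
    return (3.0 / (4.0 * gamma) * factor**2 * (factor - 1) ** 2 * spread) ** (1.0 / 3.0)


def heuristic_stats(factor, gamma, spread, sigma):
    """Heuristic first-order (TrD, TrE); assumptions of this sketch, not (17)-(18)."""
    delta = half_width(factor, gamma, spread)
    tre = sigma * delta / math.sqrt(3.0)  # uniform exposure over the band
    trd = -spread * sigma**2 * factor**2 * (factor - 1) ** 2 / (4.0 * delta)
    return trd, tre


if __name__ == "__main__":
    factor, spread, sigma = 2, 0.001, 0.16  # hypothetical inputs
    for gamma in (1.0, 5.0, 10.0, 100.0):
        trd, tre = heuristic_stats(factor, gamma, spread, sigma)
        # Each statistic changes with gamma, while their product does not.
        print(f"gamma={gamma:6.1f}  TrD={trd:.6f}  TrE={tre:.6f}  TrD*TrE={trd * tre:.3e}")
```

Under this heuristic, δ cancels in the product TrD × TrE, which is why a relation among tracking difference, tracking error, and spread can hold for every value of γ.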

Trading boundaries
The optimal trading boundaries $\pi_\pm$ identified by theorem 3.1 define a range of leverage ratios around the target factor Λ, on which it is optimal for the manager to refrain from buying and selling. For small trading costs, the first-order term of the asymptotic expansion in (16) coincides with the optimal policy in Gerhold et al. (2014) for an investor with constant relative risk aversion γ, constant investment opportunities, and a target portfolio equal to Λ: superficially, tracking optimally is like investing optimally after replacing the investor's desired portfolio with the target factor.
The crucial difference between optimal tracking and portfolio choice is that the optimal tracking boundaries are insensitive to the volatility process. This means that in more volatile times the manager does rebalance more often, but only because the leverage ratio reaches the boundaries more often, not because the rebalancing policy changes.
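A small simulation can make this point concrete. The Python sketch below keeps the same no-trade band in a calm and in a turbulent regime and counts the days on which the exposure is pushed back to a boundary. It relies on simplifying assumptions that are not in the paper: daily steps, zero safe rate and risk premium, constant volatility within each run, first-order boundaries, and trading costs ignored when updating the exposure. All names and parameter values are hypothetical.

```python
import math
import random


def no_trade_band(factor, gamma, spread):
    """First-order thresholds; volatility is not an input."""
    half = (3.0 / (4.0 * gamma) * factor**2 * (factor - 1) ** 2 * spread) ** (1.0 / 3.0)
    return factor - half, factor + half


def count_rebalancing_days(sigma, factor=2.0, gamma=5.0, spread=0.001, days=2520, seed=0):
    """Count days on which the exposure is reset to a boundary of the band."""
    lower, upper = no_trade_band(factor, gamma, spread)
    rng = random.Random(seed)
    weight = factor
    trades = 0
    for _ in range(days):
        ret = sigma / math.sqrt(252) * rng.gauss(0.0, 1.0)  # daily index return
        # Without trading, the exposure drifts to pi (1 + R) / (1 + pi R).
        weight = weight * (1.0 + ret) / (1.0 + weight * ret)
        if weight > upper:
            weight, trades = upper, trades + 1  # sell down to the upper boundary
        elif weight < lower:
            weight, trades = lower, trades + 1  # buy up to the lower boundary
    return trades


if __name__ == "__main__":
    for sigma in (0.10, 0.30):
        print(f"sigma={sigma:.0%}: {count_rebalancing_days(sigma)} rebalancing days")
```

The band is computed once per run from Λ, γ, and ε alone; only the frequency with which it is reached changes with volatility.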
An intuitive explanation of such robustness is that, because volatility is stationary, the objective function is invariant to a time change that stretches periods of high volatility and compresses periods of low volatility, so as to normalize it to a constant: while in the original setting tracking error is high when volatility is high, in the time-changed problem volatility remains fixed, but time ticks faster, thereby accumulating more tracking error on average. As the optimal policy of the time-changed problem is independent of volatility, so is the optimal policy of the original one. Figure 1 displays the optimal trading boundaries against tracking error: low tracking error (left, corresponding to high γ) implies a narrow no-trade region, and thus frequent rebalancing. This regime is the most relevant for leveraged ETFs, as the users of these funds are primarily short-term traders, for whom the precise replication of daily returns is more important than average performance (Cheng and Madhavan 2009).
As the aversion to tracking error decreases (right), the priority shifts to reducing long-term average rebalancing costs, at the expense of higher tracking error. For leveraged funds, widening the no-trade region serves this purpose, up to a point. As the emphasis on low trading costs dominates, it becomes virtually impossible for a leveraged fund to track its target: the next best alternative is to closely replicate the index, which can be done essentially for free.
For inverse funds the situation is different: the closest factor that is replicable at no cost is zero, therefore the no-trade region continues to widen until its sell boundary becomes exactly zero, while remaining symmetric around its target multiple to preserve pre-existing short positions. At that point, negative exposure is seldom decreased (at the buy boundary, for the sake of solvency) and never increased (as the sell boundary of zero is never reached).
Overall, the small tracking errors that are typically sought by managers of leveraged and inverse funds lead to similar replication strategies, based on nearly symmetric no-trade regions. Excessive emphasis on small costs would break this symmetry: because the only costless multiples are zero and one, leveraged funds would veer toward one, while inverse funds would veer toward preservation of negative exposure while retaining solvency.
Figure 1. Buy (dashed) and sell (solid) boundaries (vertical, as risky weights π) versus average tracking error TrE (horizontal) for leveraged (top) and inverse (bottom) funds, with multipliers 4 (top), 3, 2, −1, −2, −3 (bottom). As aversion to tracking error γ decreases from left ($\approx 10^{6}$) to right ($\approx 10^{-4}$), for inverse funds (bottom) the trading boundaries widen around the target, whereas for leveraged funds (top) they first widen and then collapse to one. ε = 1% and zero risk premium.
Figure 2. Negative tracking difference (−TrD, vertical) against tracking error (horizontal) for leveraged (solid) and inverse (dashed) funds, in logarithmic scale, from −3, +4 (top), to −2, +3 (middle), and −1, +2 (bottom). As aversion to tracking error γ decreases from left ($\approx 10^{6}$) to right ($\approx 10^{-4}$), a (k + 1)-leveraged fund is akin to a −k inverse one, as the respective curves (same color) approach low and high TrE aversion. ε = 1%, σ = 16%, and zero risk premium.

Tracking difference versus tracking error
For small tracking error, relation (21) implies that, for a given factor, volatility, and spread, the product of tracking difference and tracking error is approximately constant, which corresponds to the negative, approximately linear dependence in the left of the graphs in figure 2. As this constant is unchanged if Λ is replaced by −Λ + 1, for small tracking error a −(Λ − 1) inverse fund faces a tradeoff similar to the one of a leveraged fund with factor Λ. Put differently, a +3 leveraged fund is as difficult to manage as a −2 inverse fund. This is clear from the comparison of solid and dashed lines with the same color in the figure.
As tracking error increases, the inverse fund's tradeoff improves, in that its optimal policy yields a less negative tracking difference than the symmetric leveraged fund because the boundaries of inverse funds keep widening even as those of leveraged funds start shrinking. As the tracking error increases further, their performances become again trivial, as both leveraged and inverse exposures are allowed to depart from their targets to avoid trading costs.

Implied spread
Equation (21) links the tracking difference, tracking error, leverage factor, index volatility, and trading spread ε. As the tracking error represents the standard deviation of the estimate of the tracking difference TrD, a low tracking error means that in this equation all quantities are estimated accurately, with the exception of the spread ε. In principle, one could estimate ε as the average bid-ask spread observed in the index, but in practice such an estimate would overstate the rebalancing costs of leveraged and inverse funds, as they achieve their index exposure primarily through total return swaps and futures, which offer much lower trading costs than the index components. In practice (cf. section 4), direct holdings of the index' components are typically less than the fund's assets for leveraged funds and are actually zero for inverse funds (i.e. there are no short positions in index' components but only in swaps and futures).
This observation suggests using (21) to define the implied spread $\tilde\varepsilon$, in the same way as the familiar Black-Scholes formula is used to define implied volatility rather than to price options. The implied spread offers a more attractive measure of the fund's performance than the tracking difference or the tracking error, as it controls for the effects of both volatility and the leverage factor.
This definition measures the trading cost, above which the investor is better off replicating the fund's payoff by trading, and below which the fund offers a cheaper alternative. The theoretical (population) spread is always positive, as the corresponding tracking difference is necessarily negative in view of trading costs. However, sample estimates can and do lead to occasionally negative implied spreads, which materialize if the fund deviates from a close replication, and happens to outperform its target by mere chance. In fact, such imprecise estimates of the implied spread result from the imprecise estimates of the tracking difference when the tracking error is large.
In summary, the implied spread offers a synthetic figure that represents the cost for investors of leveraged and inverse funds, adjusting for the effects of the factor, volatility, and the tradeoff between tracking error and tracking difference. However, such a measure is accurate only when the tracking difference is accurate, which is when the tracking error is sufficiently low.
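To show how such a figure might be computed in practice, the Python sketch below inverts a first-order relation between tracking difference, tracking error, volatility, and factor. The relation used, −TrD × TrE ≈ εσ³Λ²(Λ − 1)²/(4√3), is an assumption of this sketch: it follows from the thresholds quoted in the introduction under the heuristic that the exposure is uniformly distributed over the no-trade band, and its constant should be checked against equation (21) before any use. The function name and the sample inputs are hypothetical.

```python
import math


def implied_spread(trd, tre, sigma, factor):
    """Implied spread under the first-order relation assumed in this sketch.

    trd, tre: annualized tracking difference and tracking error;
    sigma: the index' average annual volatility; factor: target multiple.
    """
    if factor in (0, 1):
        raise ValueError("factors 0 and 1 require no rebalancing")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    scale = sigma**3 * factor**2 * (factor - 1) ** 2
    return -4.0 * math.sqrt(3.0) * trd * tre / scale


if __name__ == "__main__":
    # Hypothetical fund: TrD = -0.2%, TrE = 0.2%, sigma = 16%, factor +2.
    spread = implied_spread(trd=-0.002, tre=0.002, sigma=0.16, factor=2)
    print(f"implied spread: {1e4 * spread:.1f} bps")
```

A positive sample tracking difference yields a negative implied spread in this sketch, matching the remark above that sample estimates can occasionally be negative.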

R-Squared
The squared correlation of a fund's return with its benchmark is an intuitive, scale-free statistic, which reflects the fraction of the fund's variance that is explained by the benchmark's own variance. For funds with relatively low factors, such as the ones traded in US markets, the squared correlation is similarly close to the frictionless level of 100%, leaving few insights to be gleaned from this quantity.
Equations (15) and (20) provide respectively the exact and asymptotic formulas of squared correlation. Importantly, the $R^2$ does not depend on the volatility process, but only on the trading cost ε, the leverage factor Λ, and the aversion parameter γ that controls the tradeoff between tracking difference and tracking error. Figure 3 plots the squared correlation against the leverage factor, highlighting several effects. First, squared correlation declines for large positive and negative factors, as the optimal no-trade region widens and the fund's exposure is increasingly variable.
Second, squared correlation is asymmetric in the factor and systematically lower for inverse funds. For positive factors, correlation is poor near zero, becomes perfect at the benchmark value of one, and deteriorates as leverage increases further. As negative factors do not include the benchmark, their correlation is generally lower, and achieves its maximum near minus one.
The low correlation near the zero factor, though empirically irrelevant, deserves a comment, in view of its differences with the factor of 1. The explanation of such low correlation is that a fund with near-zero exposure combines a very low volatility with a relatively wide no-trade region, resulting in a fund's exposure that is rather variable, and poorly explained by the benchmark's return.
There are two extreme cases, Λ ∈ {0, 1}, which are explicitly excluded in theorem 3.1. In these cases, an optimal trading policy is not to engage in trading at all; thus both transaction costs and tracking error vanish. Therefore, the tracking error cannot be freely chosen, as the value function vanishes and the optimal policy is independent of the value of γ.
Indeed, equations (13), (14) and (15) imply the relation between $R^2$ and tracking error [...] which yields the asymptotic formula [...] This formula validates the usual interpretation of $1 - R^2$ as the fraction of variance in the fund unexplained by the variation in the benchmark and shows that for Λ near zero even a small but positive tracking error can lead to a low $R^2$, as the factor appears in the denominator.

Replication of LIETFs
The model in this paper relies on the assumption that trading costs incurred by fund managers are a small constant proportion of amounts traded. Thus it is opportune to investigate the relevance of such an assumption for the replication of LIETFs. Engle et al. (2012) estimate average transaction costs of 8.8 and 13.8 basis points (bps) for NYSE and NASDAQ respectively, based on orders executed by Morgan Stanley in 2004. Their estimates of trading costs rise to 27 bps for relatively larger orders, above 1% of the stock's daily trading volume. Similarly, Frazzini et al. (2012) estimate a median transaction cost of 4.9 bps, with a value-weighted average of 9.5 bps, which reflects the higher cost of larger trades. In short, empirical studies suggest that trading costs are near 0.1%, except for orders that exceed 1% of daily trading volume. Thus, to evaluate whether the assumption of constant trading costs is appropriate, it is necessary to understand (i) the details of the rebalancing strategies of leveraged and inverse ETFs and (ii) the relative size of their trading volume compared to that in the underlying index.
To investigate rebalancing strategies, we obtained from ProShares the daily holdings at the security level of the main leveraged and inverse ETFs on the S&P 500 index from January 26 to April 6, 2018. This period included a number of days with large market movements, which offer the opportunity to observe rebalancing behavior in volatile times. Because the institutional objective of these funds is to offer a multiple of the daily return on the underlying index, the daily frequency is the natural one to study rebalancing. It is also the highest frequency for which public data are available.
Note also that LIETFs trade both to rebalance in response to the index' return, and to adjust exposure in response to the creation and redemption of shares by authorized participants. Thus the index return alone is not a reliable indicator of the trading volume. Instead, the direct inspection of daily portfolio composition reflects the overall trading volume resulting from both managers' rebalancing and investors' flows.
The left panel in table 1 shows the source of exposure to the index for each of the S&P 500 funds: most of the exposure is achieved through index swap contracts with various counterparties (such as Goldman Sachs, Bank of America, JP Morgan, Credit Suisse, and other financial institutions). Such contracts are typically multiples of 10 or 100 million USD and are rebalanced in such multiples. A small fraction of the exposure is achieved through E-mini futures contracts on the S&P 500. For leveraged funds, a substantial amount of exposure, approximately 80% of the fund's assets, is achieved through direct ownership of each of the 505 index components. By contrast, inverse funds do not take short positions in the index' components, thereby avoiding short sale costs associated with borrowing shares of individual stocks, while achieving exposure entirely through swaps and futures.
Comparing the changes in each security's daily holdings over time reveals the magnitude of the trading amounts generated by ProShares funds on the S&P 500 index. The right panel in table 1 displays the average daily turnover for each factor and asset class as a percentage of the fund's assets. As observed in the introduction, the overall turnover is significantly higher for larger negative factors, with the factor −3 generating approximately twice as much turnover as −2 or 3. In addition, the table reveals that in practice most of the turnover is concentrated in index swaps, with a significant minority of trading in futures and, for positive factors, in stocks. Such a pecking order reflects the lower trading costs of index swaps relative to futures and of futures relative to stocks.
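The ranking of turnover across factors is consistent with the mechanics of daily rebalancing, which the following minimal sketch illustrates. It is an assumption-laden simplification rather than the procedure used for table 1: it considers a single day, a fund that starts exactly at its target factor, no investor flows, and no trading costs. The function name `rebalancing_turnover` is hypothetical.

```python
def rebalancing_turnover(factor, index_return):
    """Trade needed to restore the target factor after one index move,
    as a fraction of end-of-day fund assets (start-of-day assets = 1).

    Assumes the fund starts the day exactly at its target, with no
    flows and no trading costs (illustrative simplifications).
    """
    assets_after = 1.0 + factor * index_return        # fund value after the move
    exposure_before = factor * (1.0 + index_return)   # exposure drifts with the index
    exposure_target = factor * assets_after           # exposure required by the target
    return abs(exposure_target - exposure_before) / assets_after


# Turnover generated by a 1% index move for typical factors.
for factor in (-3, -2, -1, 2, 3):
    print(factor, round(rebalancing_turnover(factor, 0.01), 4))
```

Under these assumptions the required trade is proportional to the product of the factor and the factor minus one, which is 12 for a factor of −3 and 6 for factors of −2 and 3, in line with the approximate doubling of turnover noted above.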
To understand the size of trading volume in LIETFs compared to trading volume in the index, we examined daily assets under management for all the ProShares funds on the S&P 500 index since June 23, 2009, the first day on which all fund multiples are available. This information allows us to compute the total volume generated by all funds, both from rebalancing and from the subscription or redemption of shares. Comparing such daily volume to the total volume generated by trading in the stocks included in the S&P 500 index yields, for each day, the ratio between ETF volume and stock market volume. The summary statistics show that LIETF volume is on average 0.20% of trading volume in the S&P 500 index components, and on 99 out of 100 days it is below 0.86%. These figures imply that, even in the implausible worst-case scenario in which all trading generated by LIETFs took place in the underlying assets, such volume would be on average five times below the 1% threshold above which price impact starts to become relevant. In practice, as shown by table 1, only a minority of trading takes place in the underlying assets, with the bulk concentrated in index swaps and futures, which entail much lower trading costs, thereby indicating that the assumption of constant trading costs is germane to the present application.

Empirical results
This section examines the model's implications for leveraged and inverse exchange-traded funds listed in the USA. Table 3 displays the performance statistics for some of the largest families of leveraged and inverse ETFs. To facilitate comparisons among different factors of the same benchmark, observations for each family are trimmed to the longest period that includes all factors (the youngest funds are usually the ones with triple exposure). Each row displays the relative performance of a leveraged or inverse fund compared to a hypothetical portfolio that trades without costs in the nonlevered fund on the same benchmark (e.g. SPY before management fees for the S&P 500), using as safe rate the 1-month treasury rate. The beta and tracking differences are calculated from the linear regression of the fund's daily returns on the returns of the corresponding hypothetical portfolio.
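As an illustration of how such statistics can be obtained, the sketch below regresses a fund's daily excess returns on the benchmark's daily excess returns. It follows one common convention, which is an assumption here and not necessarily the exact specification behind table 3: beta is the slope, the tracking difference is the annualized intercept, and the tracking error is the annualized standard deviation of the residuals. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def replication_statistics(fund_returns, benchmark_returns, safe_returns):
    """Beta, annualized tracking difference and tracking error from daily returns.

    Convention assumed for illustration: OLS of fund excess returns on
    benchmark excess returns, with 252 trading days per year.
    """
    fund_excess = np.asarray(fund_returns) - np.asarray(safe_returns)
    bench_excess = np.asarray(benchmark_returns) - np.asarray(safe_returns)
    slope, intercept = np.polyfit(bench_excess, fund_excess, 1)
    residuals = fund_excess - (intercept + slope * bench_excess)
    tracking_difference = 252 * intercept
    tracking_error = np.sqrt(252) * residuals.std(ddof=2)
    return slope, tracking_difference, tracking_error


# Synthetic example (made-up numbers, not data from the paper): a fund with
# factor -3, a daily drag of 0.5 bps and a small idiosyncratic deviation.
rng = np.random.default_rng(0)
n_days = 2500
safe = np.full(n_days, 0.02 / 252)
bench = safe + rng.normal(0.0003, 0.01, n_days)
fund = safe - 3 * (bench - safe) - 0.00005 + rng.normal(0.0, 0.0005, n_days)

beta, trd, tre = replication_statistics(fund, bench, safe)
print(f"beta={beta:.3f}  TrD={trd:.4f}  TrE={tre:.4f}")
```

Comparing the estimated beta with the target factor, together with its standard error, is what underlies the t-statistics discussed next.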
Average exposures (betas) are closely aligned with their targets, but consistently biased towards zero, and such differences are statistically significant for higher factors, as documented by their t-statistics. This is the underexposure puzzle, observed by Tang and Xu (2013), and explained in our model by equation (22) as the result of optimal rebalancing policies. The higher tracking error of funds with larger factors follows from the wider no-trade regions that are optimal for larger factors. Likewise, more negative tracking differences for larger factors may reflect higher rebalancing costs rather than management issues.
The implied spreads in table 3 tend to be lower for larger factors (−3, −2, +3) than they are for smaller ones (−1, +2). In other words, the tracking difference of a fund with a larger factor is less negative than it would be if the spread were the same as for smaller factors. Such differences are large, ranging from around 5 basis points for larger factors to about 20 for small factors.
An explanation of this difference is in the relatively flat management fees charged by funds with different factors: for example, the funds that replicate multiples of the S&P 500 index have expense ratios of 0.90% (−3), 0.90% (−2), 0.89% (−1), 0.89% (+2), and 0.91% (+3). These expenses detract directly from the tracking difference of the fund, and make it much more onerous for an investor (in terms of implied rebalancing cost) to pay 0.90% for a fund that generates twice the index return than to pay 0.91% for one that generates three times the index return. For inverse funds this is even more obvious, as the once and twice inverse funds have virtually the same expenses.
The relevance of this explanation is further confirmed by the gross implied spreads, that is, the implied spreads obtained from the tracking differences before management fees (computed in the data by adding to the market return the daily management fee of the fund). In accordance with the above explanation, such gross implied spreads are significantly closer to each other across different factors. Some of the gross spreads for the most liquid funds take small negative values due to the effect of tracking error, as the realized return in the sample period is minimally positive before fees. † Consistent with the model, gross implied spreads are higher for less liquid asset classes, such as smaller stocks (S&P Midcap 400 and Russell 2000).
In summary, while the gross implied spread is useful to understand the impact of management fees and managers' tracking efficiency, the relevant performance measure for investors is the net implied spread, as it captures the only return that is available to them, i.e. net of fees.

Underexposure
In their empirical work on leveraged and inverse ETFs listed on US exchanges, Tang and Xu (2013) report that 'LETFs show an underexposure to the index that they seek to track'. That is, empirical average exposures are significantly closer to zero than their target multiples. Table 3 confirms the underexposure phenomenon for several families of leveraged and inverse ETFs traded in US markets: the comparison of realized average exposure with target factors yields significant t-statistics for the riskiest funds, with deviations consistently biased toward zero.
This empirical regularity is explained by equations (14) and (19) as a consequence of managers' optimal rebalancing policies. Indeed, the transaction cost correction to the average exposure is negative for factors above 1 and positive for factors below 0. This effect may seem puzzling at first, as the buy and sell boundaries are approximately symmetric around the target factor.
The crucial point is that when the exposure \(\pi_t\) is larger (farther from zero), it is also more variable. As the trading policy forces exposure to remain in a fixed range, the implication is that, within such range, the exposure on average spends more time on less variable values, where it is closer to zero. Thus the underexposure effect arises from the combination of symmetric trading boundaries with the asymmetric volatility of the fund. Substituting a fund's average realized exposure β into equation (22), it is possible to solve for the ratio ε/γ, which, combined with a value of the trading cost ε, yields an estimate for γ. Larger deviations of the average exposure β from the target factor imply a lower γ (the manager favors lower trading costs over lower tracking error) while smaller deviations imply a larger value of γ.
For example, the SPXU has a realized β = −2.99 and a factor of −3, which imply γ ≈ 10 assuming that ε = 0.1%, a figure consistent with the discussion in the previous section. Table 4 displays the average exposure implied for funds with typical factors for different values of ε and γ. Small factors (−1 and 2) lead to average exposures that are almost indistinguishable from their targets, especially when statistical error is accounted for. On the other hand, larger factors (−3, −2, and 3) lead to discernible differences across TrE aversion γ and trading costs ε. In particular, the configuration γ = 10, ε = 0.1% closely approximates the panel of S&P 500 funds and γ = 5, ε = 0.1% the panel of Dow Jones funds. In general, values of γ between 5 and 10 with ε = 0.1% closely reproduce the empirical estimates on funds tracking large-capitalization stocks, while higher trading costs ε are warranted for small-capitalization stocks and treasuries.
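The mechanism behind underexposure can be illustrated with a small simulation. The sketch below is not the paper's calibration: it assumes i.i.d. normal daily index returns with zero mean, a zero safe rate, and a hand-picked symmetric no-trade band (half-width 0.3) that is deliberately wide to make the effect visible. The function name `average_exposure` is hypothetical.

```python
import numpy as np


def average_exposure(factor, half_width, daily_vol=0.01, n_days=300_000, seed=0):
    """Time-average of the risky weight kept inside a symmetric no-trade band.

    Illustrative assumptions: zero-mean i.i.d. normal daily returns, zero
    safe rate, band [factor - half_width, factor + half_width]. Trading
    costs are omitted because the post-trade weight sits on the boundary
    regardless of the cost paid.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, daily_vol, n_days).tolist()
    lo, hi = factor - half_width, factor + half_width
    weight, total = float(factor), 0.0
    for r in returns:
        total += weight                                   # exposure held during the day
        weight = weight * (1.0 + r) / (1.0 + weight * r)  # weight drifts with the index
        weight = min(max(weight, lo), hi)                 # minimal trade back to the band
    return total / n_days


# Average exposure for a leveraged and an inverse target with the same band width.
results = {factor: average_exposure(factor, 0.3) for factor in (3, -3)}
for factor, avg in results.items():
    print(factor, round(avg, 3))
```

Although the band is symmetric around the target, the time-averaged weight falls on the side of the band closer to zero for both the leveraged and the inverse target, because the weight moves faster, and hence lingers less, where its absolute value is larger.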
Granularity. It is worthwhile to consider whether the observed underexposure may have different explanations. One ostensible source of underexposure could be granularity. Indeed, the minimum trade size for the E-mini S&P 500 Futures contract (which is held by the ETFs tracking this index, as explained in the previous section) equals $50 times the index value, which means that when the index is 2800, the contract size is $140,000. Thus for a fund with a $1 billion exposure (the SSO has more than $2 billion in assets, hence an exposure of over $4 billion), the granularity of futures contracts could explain a discrepancy in average exposure of the order of \(1.4 \times 10^5 / 10^9 \approx 10^{-4}\).
In fact, the inspection of daily portfolio holdings described above reveals that, because daily rebalancing of leveraged (but not inverse) funds is partly achieved by trading incremental quantities of individual equities, the minimum trade unit for a fund is the price of one share, typically between $10 and $1000. Thus, for a fund with $1 billion exposure, the granularity of equity prices could justify a discrepancy in average exposure of the order of \(1000/10^9 = 10^{-6}\). Note also that granularity, whether from futures contract sizes or equity prices, would generate either underexposure or overexposure with comparable probabilities. Empirically, the underexposure observed even for the largest ETFs, with assets under management above $1 billion, is of the order of \(0.01 = 10^{-2}\), while overexposure never appears in any of the 40 funds in our original sample. Taken together, these observations indicate that granularity does not reproduce quantitatively the observed underexposure of major leveraged and inverse exchange-traded funds.

Table 4. Average exposure (β) for typical factors across trading costs (ε) and TrE aversion (γ), obtained from equation (22).
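The orders of magnitude in the granularity argument can be checked with a few lines of arithmetic. The snippet below only restates the figures quoted in the text (index level 2800, $1 billion exposure, share prices up to $1000, underexposure of about 0.01); the variable names are illustrative.

```python
index_level = 2800
futures_contract = 50 * index_level      # E-mini contract size: $50 per index point
fund_exposure = 1e9                      # $1 billion exposure
max_share_price = 1000                   # upper end of typical share prices

futures_granularity = futures_contract / fund_exposure   # about 1.4e-4
equity_granularity = max_share_price / fund_exposure     # 1e-6
observed_underexposure = 0.01                            # order of magnitude in the data

print(f"futures granularity: {futures_granularity:.1e}")
print(f"equity granularity:  {equity_granularity:.1e}")
print(f"observed / futures:  {observed_underexposure / futures_granularity:.0f}x")
print(f"observed / equity:   {observed_underexposure / equity_granularity:.0f}x")
```

Even the coarser futures granularity is roughly two orders of magnitude too small to account for the observed underexposure.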
Noisy Returns. Another potential source of underexposure is measurement error in the returns of LIETFs and their underlying index. For example, such an error can arise if such returns are calculated from net asset values (NAVs) rather than market prices, which differ slightly from NAVs, due to time lags in their calculation and to small deviations, within the cost of subscription and redemption, which cannot be arbitraged away.
Thus, denote the observed return of a fund as

\[ \tilde R^F_t := R^F_t + \varepsilon^F_t, \]

where \(R^F_t\) is the true return and \(\varepsilon^F_t\) is a measurement error and, likewise, the observed return of the underlying index as \(\tilde R^I_t := R^I_t + \varepsilon^I_t\). Then, assuming that returns are stationary and that errors are uncorrelated with true returns and with each other, the large sample average exposure converges in probability to

\[ \tilde\beta = \frac{\mathrm{Cov}(\tilde R^F_t, \tilde R^I_t)}{\mathrm{Var}(\tilde R^I_t)} = \frac{\mathrm{Cov}(R^F_t, R^I_t)}{\mathrm{Var}(R^I_t) + \mathrm{Var}(\varepsilon^I_t)} = \beta \left(1 - \frac{\mathrm{Var}(\varepsilon^I_t)}{\mathrm{Var}(\tilde R^I_t)}\right), \]

where \(\beta = \mathrm{Cov}(R^F_t, R^I_t)/\mathrm{Var}(R^I_t)\) is the true average exposure. In other words, measurement error in the fund's return does not bias the estimate of average exposure, but measurement error in the index' return generates a spurious proportional underexposure of \(\mathrm{Var}(\varepsilon^I_t)/\mathrm{Var}(\tilde R^I_t)\). Note that such underexposure is independent of the true or observed fund returns, and in particular it would be independent of the leverage factor, while the underexposure observed in table 3 is much bigger for larger factors. This qualitative observation already casts doubt on such a mechanism as a source of the observed underexposure. (In addition, the results in table 3 do not rely on NAVs but on market prices only.)

A detailed analysis of returns of NAV and market prices sheds additional light on this issue. Indeed, for the SPY, which replicates the S&P 500 index, the standard deviation of the difference between daily NAV returns and market returns is 0.16%, while the standard deviations of both the market and the NAV daily returns are 1.15%, which implies that

\[ \frac{\mathrm{Var}(\varepsilon^I_t)}{\mathrm{Var}(\tilde R^I_t)} \approx \left(\frac{0.16\%}{1.15\%}\right)^2 \approx 1.9\%. \]

(The observation period is 2003-12-02 to 2018-12-31.) Put differently, a measurement error equivalent to the discrepancy between NAV and market price would generate a reduction in average exposure of nearly 2%, irrespective of the factor considered. By contrast, the underexposures in table 4 are much smaller for small factors, as NAVs are never used in the analysis. The above observation helps understand why the underexposures reported here are smaller than the ones observed by Tang and Xu (2013).
Indeed, they calculate average exposures of LIETFs with respect to the underlying indexes rather than the market return of the nonlevered fund on the same index (this paper's approach). The benchmark index (which is not itself traded) may exhibit deviations from the market price of the same order as the NAV, and in fact Tang and Xu (2013, table 1 B) report for the SPY an exposure of 98.48%, i.e. an underexposure of 1.52%, which is broadly consistent with the effect of noisy returns.
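The size of the noisy-returns effect can be verified numerically. The sketch below first computes the proportional underexposure from the two standard deviations quoted for the SPY, then checks it by simulation. The simulation assumes normally distributed returns and errors and a fund that replicates its factor exactly, which are illustrative assumptions; the function names are hypothetical.

```python
import numpy as np


def proportional_underexposure(std_error, std_observed):
    """Share of exposure lost to noise in the index return:
    Var(error) / Var(observed index return)."""
    return (std_error / std_observed) ** 2


def simulated_exposure_ratio(factor, std_error=0.16, std_observed=1.15,
                             n=200_000, seed=0):
    """Estimated exposure divided by the true factor when the index return
    is observed with noise (normal returns and errors assumed)."""
    rng = np.random.default_rng(seed)
    std_true = np.sqrt(std_observed**2 - std_error**2)
    true_index = rng.normal(0.0, std_true, n)
    observed_index = true_index + rng.normal(0.0, std_error, n)
    fund = factor * true_index                      # perfectly replicating fund
    beta = np.cov(fund, observed_index)[0, 1] / np.var(observed_index, ddof=1)
    return beta / factor


shrink = proportional_underexposure(0.16, 1.15)
print(f"analytical underexposure: {shrink:.4f}")
for factor in (-3, -1, 2, 3):
    print(factor, round(1.0 - simulated_exposure_ratio(factor), 4))
```

The simulated shortfall is essentially the same for every factor, which is the feature that distinguishes this mechanism from the factor-dependent underexposure in table 3.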

Robustness
This section discusses, theoretically and numerically, the robustness of previous results to risk premia with stochastic volatility, finite horizons, and discrete (as opposed to continuous) trading.

Risk premia: trading strategies and performance
Assume that the underlying S has non-zero risk premia \(\mu_t = \kappa \sigma_t^2\), with \(\kappa \ge 0\). This specification prescribes that the Sharpe ratio increases as return variance increases, as in the asset pricing models of Campbell and Cochrane (1999), Bansal and Yaron (2004), and others, and consistent with the empirical countercyclicality of risk premia. In view of the excess returns \(\mu_t\), the cumulative returns process \(D_t\) defined by (4) includes a component, proportional to \(\mu_t\,dt\) and to the difference between the exposure \(\pi_t\) and the target factor, that can be ascribed to the deviation of the fund's exposure from its target. As the exposure \(\pi_t\) is observable to investors, † they can accordingly control for overexposure while evaluating performance, by modifying the definition of the deviation process as in (23). Replacing (4) with (23), the previous identities (5)-(6) for Tracking Difference (TrD) and Tracking Error (TrE) remain valid, and thus the objective of a leveraged ETF manager is to find a trading policy \(\pi_t\) that minimizes the average equivalent expense ratio over the set of admissible strategies ϕ (see appendix 1). Note that in this problem, although the objective function controls for extra exposure, the risk premium κ has not disappeared completely, as it affects the dynamics of the portfolio weight \(\pi_t\). The question is whether this effect is significant enough to alter the previous results.
Thus the risk premium κ affects the optimal trading boundaries only in the \(\varepsilon^{2/3}\) term, which is negligible for reasonable parameter values, as illustrated by figure 4. Similarly, the tracking difference and the average exposure of the fund are also affected at the second order. In particular, the performance formula (21) remains valid, as it involves only first-order terms.
On the issue of performance, figure 5 reproduces the relevant part of figure 2 with realistic risk premia. Again, the main message is that the tradeoff between tracking error and tracking difference remains virtually unchanged at typical levels of tracking error and target factors.

Finite horizons, trading frequency, and stochastic volatility
The use of a stationary objective raises the question of how robust the results are to fixed horizons, and how well stationary optimal strategies perform when trading takes place at discrete frequency, rather than continuously. Figure 6 compares the asymptotic tradeoff between tracking difference and tracking error with the one obtained from Monte Carlo simulations of geometric Brownian motion (Black-Scholes model) and of the Heston stochastic volatility model, in the presence of a risk premium, with daily rebalancing frequency, and for a finite horizon of 5 years. For the Heston model, set \(v_t := \sigma_t^2\), where \(v_t\) follows the Heston variance dynamics with an instantaneous correlation \(d\langle B, W\rangle_t = \rho\,dt\), and the parametrization \(v_0 = \sigma^2\), \(\zeta_0 = \zeta^*\) (the risky/safe ratio at the target exposure), unconditional mean \(k = \sigma^2\), mean-reversion speed θ = 5 and diffusion coefficient σ. A minimal trade is executed whenever the proportion of wealth π leaves the no-trade region \([\pi_-, \pi_+]\), so as to return π to its closest boundary. The exact trading boundaries \(\pi_\pm\) are obtained by solving numerically (24) subject to the free boundary conditions (10)-(11). Again, the message of figure 6 is that, for realistic parameter values, risk premia, trading frequencies, and finite horizons have only a minor effect on the basic tradeoff identified by the asymptotic formulas in the paper. While these features make the optimization problem more complex, they remain largely inconsequential for performance evaluation and replication purposes.
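A stripped-down version of such a Monte Carlo experiment is sketched below for the geometric Brownian motion case only. It is not the code behind figure 6: it assumes a zero safe rate and zero risk premium, uses hand-picked band widths instead of the boundaries solved from the free boundary problem, and measures tracking difference and tracking error as the annualized mean and standard deviation of the daily difference between the fund's return and the factor times the index return. The function name `simulate_band_policy` is hypothetical.

```python
import numpy as np


def simulate_band_policy(factor, half_width, cost=0.001, sigma=0.16,
                         years=5, n_paths=1000, seed=0):
    """Daily trading with a no-trade band and proportional costs.

    Illustrative assumptions: zero safe rate, zero risk premium, i.i.d.
    normal daily index returns, band [factor - half_width, factor + half_width].
    Returns (tracking difference, tracking error), both annualized.
    """
    rng = np.random.default_rng(seed)
    n_days = 252 * years
    lo, hi = factor - half_width, factor + half_width
    weight = np.full(n_paths, float(factor))      # risky weight at the start of each day
    deviation = np.empty((n_days, n_paths))
    for t in range(n_days):
        r = rng.normal(0.0, sigma / np.sqrt(252), n_paths)
        growth = 1.0 + weight * r                 # fund growth before trading
        drifted = weight * (1.0 + r) / growth     # weight after the index move
        # Minimal trade back to the nearest boundary, as a fraction of the
        # pre-trade fund value, with the cost paid out of the fund.
        sell = np.maximum(drifted - hi, 0.0) / (1.0 - hi * cost)
        buy = np.maximum(lo - drifted, 0.0) / (1.0 + lo * cost)
        growth_after = growth * (1.0 - cost * (sell + buy))
        deviation[t] = growth_after - 1.0 - factor * r
        weight = np.clip(drifted, lo, hi)
    tracking_difference = 252 * deviation.mean()
    tracking_error = np.sqrt(252) * deviation.std(axis=0).mean()
    return tracking_difference, tracking_error


narrow = simulate_band_policy(3, 0.02)
wide = simulate_band_policy(3, 0.3)
print("narrow band: TrD=%.4f TrE=%.4f" % narrow)
print("wide band:   TrD=%.4f TrE=%.4f" % wide)
```

Under these assumptions the narrow band trades almost every day, which keeps tracking error low at the price of a more negative tracking difference, while the wide band does the opposite.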

Conclusion
Leveraged and inverse funds seek to replicate a multiple of the daily return on an index by frequently rebalancing their portfolio to keep a constant leverage ratio. In theory, in a frictionless market continuous rebalancing yields a perfect replication of a leveraged benchmark, i.e. zero alpha and tracking error. In practice, trading costs create a trade-off between the frequent rebalancing that generates low tracking error and the low trading costs that prevent alpha from becoming too negative. This trade-off becomes more relevant for products that seek to replicate larger multiples or less liquid benchmarks.
Portfolio performance measures based on the regression of a fund's return against its benchmark's return are ubiquitous, hence closely monitored by managers who may be evaluated on their basis. On average, the negative tracking difference of a leveraged fund results from management fees and trading costs, while the tracking error reflects the inevitable deviations from the target exposure that are necessary to keep trading costs under control. Other things equal, a fund that seeks to replicate a larger multiple of an index' return has a higher tracking error and a more negative tracking difference.
It is misleading to compare funds with different factors, and to conclude that one is better managed than the other merely because its tracking error is lower, or because its alpha is less negative: even two funds with the same factor may both be optimally managed, as one may seek lower tracking error at the expense of a more negative tracking difference. High tracking error is not necessarily evidence of poor manager performance if the index is less liquid. On the contrary, a savvy management strategy must accept higher tracking error to achieve less negative tracking differences, and low tracking error inevitably depresses average performance.

Open Scholarship
This article has earned the Center for Open Science badges for Open Data and Open Materials through Open Practices Disclosure. The data and materials are openly accessible at https://github.com/guasoni/leveraged-funds/.
Mathieu Rosenbaum, Ronnie Sircar, Matt Spiegel, René Stulz, Peter Tankov and seminar participants at University of Vienna, London School of Economics, Johns Hopkins University, Séminaire Bachelier Paris, University of Warwick, the 9th Financial Risks International Forum (Paris), and the Byrne Workshop at the University of Michigan. A special thanks to the anonymous referee, who helped improve the interpretation of figure 1.

Data Availability Statement
The code and data supporting this paper's findings are at https://github.com/guasoni/leveraged-funds.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding

Appendix 1

(3)) is admissible if it is solvent and satisfies sufficient integrability:
(i) its liquidation value is strictly positive at all times: There exists ε > ε such that
(ii) Let ϕ t denote the total variation of ϕ on [0, t], then
The family of admissible trading strategies is denoted by .
Let X t be the safe position of an admissible, self-financing trading strategy ϕ, and Y t = ϕ t S t the risky position. The following lemma describes the dynamics of the fund value F t = X t + Y t , the risky weight π t = Y t /F t , and the risky/safe ratio ζ t = Y t /X t .
Lemma A.2 Let ϕ be an admissible trading strategy, then Moreover, the functional can be written as Proof The proof follows from similar arguments as in Guasoni and Mayerhofer (2019, lemma A.2).
Remark A.3 Note that zero risk premia (μ = 0) imply EER(ϕ) = − lim inf T→∞ T (ϕ) (as in this case \(D_T\) equals \(\int_0^T \left(\frac{dF_t}{F_t} - r_t\,dt\right)\) in expectation, compare (4)). Accordingly, the appendix (except part A) focuses on the equivalent problem of maximizing the functional

Appendix 2. Replication of Leveraged Benchmarks
This section contains a series of propositions, which culminate in a proof of theorem 3.1 in section 4. In particular, the optimal trading strategies are derived for replicating leveraged benchmarks under transaction costs. Setting and the free boundary problem (9)-(11) can be recast as For the next statement, the cube root of a negative number a is understood as the unique real root of the equation \(\zeta^3 = a\).
The following is the verification of optimality for the trading strategy of lemma A.6 with the trading boundaries in proposition A.4: