Superreplication of the Best Pairs Trade in Hindsight

For a market with m assets and T discrete trading sessions, Cover and Ordentlich (1998) found that the “Cost of Achieving the Best Rebalancing Rule in Hindsight” is $p(T, m) = \sum_{n_1 + \cdots + n_m = T} \binom{T}{n_1, \dots, n_m} (n_1/T)^{n_1} \cdots (n_m/T)^{n_m}$. Their super-replicating strategy is impossible to compute in practice. This paper gives a workable generalization: the cost (read: super-replicating price) of achieving the best s-stock rebalancing rule in hindsight is $\binom{m}{s} p(T, s)$. In particular, the cost of achieving the best pairs rebalancing rule in hindsight is $\binom{m}{2} \sum_{n=0}^{T} \binom{T}{n} (n/T)^{n} ((T - n)/T)^{T-n} = O(\sqrt{T})$. To put this in perspective, for the Dow Jones (30) stocks, the Cover and Ordentlich (1998) strategy needs a 10,000-year horizon in order to guarantee to get within 1% of the compound annual growth rate of the best (30-stock) rebalancing rule in hindsight. By contrast, it takes 1,000 years (in the worst case) to enforce a growth rate that is within 1% of the best pairs rebalancing rule in hindsight. For any preselected pair (i, j) of stocks it takes 320 years. Thus, the more modest goal of growth at the same asymptotic rate as the best pairs rebalancing rule in hindsight leads to a practical trading strategy that still beats the market asymptotically, albeit with a lower asymptotic growth rate than the full-support universal portfolio.


Literature review
The theory of asymptotic portfolio growth was initiated by Kelly (1956), who considered repeated bets on horse races with odds that diverge from the true win probabilities. Kelly set forth the natural goal of optimizing the asymptotic growth rate of one's capital. This implies that one should act each period so as to maximize the expected log of his capital. By the Law of Large Numbers, the realized per-period continuously-compounded growth rate converges to the expected growth rate.
The Kelly rule was used by Beat the Dealer author Edward O. Thorp (1966) to properly size his bets at the Nevada blackjack tables. For example, imagine a situation where you have a 50.5% chance of winning the next hand at even money. What percentage of your net worth should you bet? The classical mean-variance (Markowitz 1952) theory has no answer, except to say that it depends on your particular appetite for risk. For instance, the extreme choices of betting 0 percent or betting 100 percent are both undominated in the mean-variance plane. The Kelly criterion gives a much more satisfactory answer: bet 50.5% − 49.5% = 1% of your wealth. This achieves the (optimum) capital growth rate of 0.005% per hand played in this (favorable) situation. By the rule of 72, you would expect to double your wealth after approximately 72/0.005 = 14,400 hands.
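The arithmetic in the blackjack example can be checked with a short sketch. It assumes an even-money bet (win or lose the amount staked), which is what the 1% figure implies; the function name `kelly_even_money` is illustrative and not taken from the literature cited here.

```python
import math

def kelly_even_money(p_win):
    """Kelly fraction and per-hand log growth rate for an even-money bet.

    Assumption: the bet pays 1:1, so the Kelly fraction is p - q.
    """
    q_lose = 1.0 - p_win
    fraction = p_win - q_lose
    # Expected log growth per hand when betting `fraction` of wealth.
    growth = p_win * math.log(1 + fraction) + q_lose * math.log(1 - fraction)
    return fraction, growth

fraction, growth = kelly_even_money(0.505)

# Rule-of-72 estimate of the doubling time (growth expressed in percent).
hands_rule_72 = 72 / (100 * growth)
# Exact doubling time under continuous compounding, for comparison.
hands_exact = math.log(2) / growth

print(f"bet {fraction:.2%} of wealth, growth {growth:.4%} per hand")
print(f"doubling time: ~{hands_rule_72:.0f} hands (rule of 72), "
      f"~{hands_exact:.0f} hands (log 2 / growth)")
```

The rule of 72 slightly overstates the doubling time relative to log 2 divided by the growth rate, which is the usual trade-off for its mental-arithmetic convenience.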
Thus, it became clear to many people that the log-optimal portfolio theory should replace mean-variance as the dominant decision criterion. Breiman (1961) proved that the Kelly rule outperforms any "essentially different strategy" by an exponential factor, and it has the shortest mean waiting time to reach a distant wealth goal. Thorp's (2017) biography discusses his use of log-optimal portfolios in his money management career on Wall Street. Cover's (1987) survey and his information theory textbook (2012) are excellent primers on the theory of asymptotic growth. Cover and Gluss (1986) were the first to exhibit an on-line trading algorithm that could achieve the Kelly growth rate even when starting in total ignorance of the return process. They assume that the return vector has finite support, and use Blackwell's (1956) approachability theorem to exhibit a trading strategy that grows wealth at the same asymptotic rate as the best rebalancing rule (fixed-fraction betting scheme) in hindsight. Thus began a whole host of so-called "universal trading strategies" that, under mild conditions, "beat the market asymptotically" for completely arbitrary (including nonstationary and serially correlated) return processes.
In the lead article of the inaugural issue of Mathematical Finance, Cover (1991) gave the first simple and intuitive universal portfolio, and it removed the restriction to finitely-supported returns. Jamshidian (1992) transplants Cover's (1991) idea into a continuous-time market with several correlated stocks in geometric Brownian motions with unknown, time-varying parameters. Cover and Ordentlich (1996) give the "universal portfolio with side information," along with more perspicuous proofs of the main (1991) regret bounds. For example, Thorp's infamous "count" in Blackjack is a canonical source of side information. Cover and Ordentlich (1998) superreplicate the final wealth of the best rebalancing rule in hindsight at time 0, although they do not use the terminology of financial derivatives so explicitly. Their paper was not inspired so much by derivative pricing as it was by Shtarkov's (1987) "universal source code" in information theory. Properly interpreted, the universal source code gives a "universal gambling scheme" for repeated horse races with unknown (and perhaps nonstationary) win probabilities.

Contribution
Cover and Ordentlich's (1998) high-water mark suffers from two defects. First, for markets with many assets, the practitioner must wait a tremendously long time for his bankroll to "pull away" from the market averages. Second, the strategy is impossible to compute in practice, for it requires large-scale computation of multilinear forms.
The cleverest methods of computation either exhaust the computer's memory or else they require eons of CPU time. The (1998) strategy is only viable for two or three stocks, at best. This paper takes up the more modest goal of growing wealth at the same asymptotic rate as the best pairs rebalancing rule in hindsight. By definition, the best pairs rebalancing rule still beats the best performing stock in the market, and therefore the market averages. This approach leads to a highly practical superreplicating strategy that has the added benefit of pulling away from the market portfolio hundreds or thousands of years earlier than the (1998) universal portfolio.

Motivating example
To motivate the paper, we use a continuous-time example ("Shannon's Demon") to illustrate the fact that the possibility of "beating the market asymptotically" is no contradiction to the random walk model of stock prices. For simplicity, consider two stocks $i \in \{1, 2\}$ that follow independent geometric Brownian motions. Suppose that the price processes $S_i(t)$ evolve according to
$$dS_{it}/S_{it} = (\sigma^2/2)\,dt + \sigma\,dW_{it},$$
where $W_{1t}, W_{2t}$ are independent unit Brownian motions. A traditional value of $\sigma$ is $\log 2 \approx 0.7$. We have
$$S_{it} = S_{i0} \exp(\sigma W_{it}).$$
Note that $\lim_{t\to\infty} \log(S_{it}/S_{i0})/t = 0$. This means that the stocks themselves have zero asymptotic growth; they trade "sideways." Now, consider a gambler who continuously maintains half his wealth in each stock.
This is not a buy-and-hold strategy: the rebalancing rule dictates that he sell some shares of whichever stock performed better over [t, t + dt]. He puts the proceeds into the underperforming stock. If the trader starts with a dollar, his wealth $V_t$ evolves according to
$$dV_t/V_t = \tfrac{1}{2}\,dS_{1t}/S_{1t} + \tfrac{1}{2}\,dS_{2t}/S_{2t}.$$
Applying Ito's Lemma for functions of several diffusion processes (Wilmott 2001), we get $V_t = \exp((\sigma^2/4)t + (\sigma/2)(W_{1t} + W_{2t}))$.
We thus have $\lim_{t\to\infty} \log(V_t)/t = \sigma^2/4 = 12\%$. From two dead-money substrates, the gambler has manufactured continuous growth at a rate of 12% per unit time, leaving the market portfolio in the dust. Notice that this growth is merely the result of "volatility harvesting" (Poundstone 2010) or "volatility pumping" (Luenberger 1997).
Obviously, the gambler has not attempted to guess which stock will outperform over the interval [t, t + dt]; rather, he just rebalances his portfolio after the fact.
A sample path for Shannon's Demon has been simulated in Figure 1. For a pair of correlated stocks, the dynamics will be substantially the same, albeit with a lower growth rate. For a badly chosen pair (i, j) of stocks, the gambler may very well beat stocks i and j but still underperform the market as a whole. The best pair $(i^*, j^*)$ will only be apparent in hindsight.
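A rough way to see the volatility-harvesting effect numerically is to rebalance at a fine but discrete time step and compare the result with the closed-form wealth. The sketch below is not the simulation behind Figure 1. It assumes the zero-growth price model $S_{it} = S_{i0}\exp(\sigma W_{it})$, which is consistent with the limits stated above, and the function name, step count, and seed are illustrative choices.

```python
import math
import random

def simulate_demon(sigma=math.log(2), horizon=10.0, steps=10_000, seed=0):
    """Rebalance 50/50 between two sideways stocks at each small time step.

    Returns (log wealth of the discretely rebalanced portfolio,
             closed-form log wealth on the same Brownian paths,
             log price relative of stock 1, log price relative of stock 2).
    """
    rng = random.Random(seed)
    dt = horizon / steps
    w1 = w2 = 0.0          # running Brownian motions W_1t, W_2t
    log_wealth = 0.0
    for _ in range(steps):
        dw1 = rng.gauss(0.0, math.sqrt(dt))
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        # Gross returns over the step under the assumed model S = S0 * exp(sigma * W).
        r1 = math.exp(sigma * dw1)
        r2 = math.exp(sigma * dw2)
        # Half of wealth sits in each stock at the start of every step.
        log_wealth += math.log(0.5 * r1 + 0.5 * r2)
        w1 += dw1
        w2 += dw2
    closed_form = (sigma ** 2 / 4) * horizon + (sigma / 2) * (w1 + w2)
    return log_wealth, closed_form, sigma * w1, sigma * w2

log_v, log_v_closed, log_s1, log_s2 = simulate_demon()
print(f"log wealth (discrete rebalancing): {log_v:.4f}")
print(f"log wealth (closed form):          {log_v_closed:.4f}")
print(f"log price relatives: {log_s1:.4f}, {log_s2:.4f}")
```

On any single path the noise term $(\sigma/2)(W_{1t} + W_{2t})$ can swamp the drift over a short horizon, so the growth rate $\sigma^2/4$ only becomes visible as the horizon grows.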

Definitions and notation
We consider a financial market with m assets, called $j \in \{1, \dots, m\}$. For convenience, we will refer to these assets merely as "stocks," although they can be any sort of financial products whatsoever (cash, bonds, lottery tickets, arrow securities, insurance contracts, real estate, etc.). The stocks are traded in T discrete sessions, called $t \in \{1, \dots, T\}$. We let $x_{tj} \ge 0$ be the gross return of a one-dollar investment (or "bet") on stock j in session t. $x_t = (x_{t1}, \dots, x_{tm})$ is the gross-return vector in session t. We will require $x_t \neq 0$. This means that at least one of the assets must have a strictly positive gross return. To make concrete sense of what follows, the reader may want to keep the following examples in mind.
the portfolio in session t, then the trader must sell some shares of stock j to restore the balance. Likewise, he must buy additional shares of stock j if it underperformed the portfolio as a whole. He can refrain from adjusting his holdings in stock j only if $x_{tj} = \langle b, x_t \rangle$, i.e. only when stock j's performance is identical to that of the portfolio as a whole. After T plays, the wealth of the constant-rebalanced portfolio b is
$$W_b(x_1, \dots, x_T) = \prod_{t=1}^{T} \langle b, x_t \rangle,$$
where we have assumed that the gambler starts with one dollar. The final wealth is just the product of the growth factors $\langle b, x_t \rangle$ from each trading session. Note that the degenerate rebalancing rule $b = e_j$ amounts to buying and holding stock j, as it keeps 100% of wealth in stock j. For the Kelly horse race, the rebalancing rule b amounts to a fixed-fraction betting scheme that bets the fixed fraction $b_j$ of wealth on horse j in every race. In the sequel, we will let $x^t = (x_1, \dots, x_t)$ be the return history after t trading sessions, with transition law $x^{t+1} = (x^t, x_{t+1})$. For the Kelly horse race, this amounts to the win history $j^t = (j_1, \dots, j_t) \in \{1, \dots, m\}^t$.
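The product formula for the wealth of a constant-rebalanced portfolio is easy to state in code. The following is a minimal sketch; the function name `crp_wealth` and the toy return sequence are illustrative and do not come from the paper.

```python
def crp_wealth(b, returns):
    """Final wealth of one dollar under the constant-rebalanced portfolio b.

    b       -- portfolio weights, nonnegative and summing to 1
    returns -- sequence of gross-return vectors x_1, ..., x_T
    """
    wealth = 1.0
    for x in returns:
        # Growth factor of the session: the inner product <b, x_t>.
        wealth *= sum(b_j * x_j for b_j, x_j in zip(b, x))
    return wealth

# Toy example: two stocks that alternately double and halve.
returns = [(2.0, 0.5), (0.5, 2.0), (2.0, 0.5), (0.5, 2.0)]

buy_and_hold = crp_wealth((1.0, 0.0), returns)   # degenerate rule b = e_1
half_half = crp_wealth((0.5, 0.5), returns)      # rebalance to 50/50 each session

print(buy_and_hold)  # 1.0
print(half_half)     # 2.44140625
```

In this toy sequence each stock ends where it started, while the 50/50 rule grows by a factor of 1.25 in every session, which is the discrete-time analogue of the motivating example.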
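Definition 1. Given the return history $x^t$, the best rebalancing rule in hindsight is the rebalancing rule $b^*(x^t)$ that would have yielded the most final wealth:
$$b^*(x^t) = \arg\max_{b} \prod_{s=1}^{t} \langle b, x_s \rangle,$$
where the maximum is taken over all rebalancing rules b. The resulting final wealth $D(x^t) = \max_{b} \prod_{s=1}^{t} \langle b, x_s \rangle$ can be regarded as the payoff of a path-dependent financial derivative ("Cover's Derivative") of the m underlying stocks.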
In the exotic option literature, this would be called a "rainbow," "basket," or "correlation" option. In the continuous-time context of several correlated stocks in geometric Brownian motion, Cover's derivative has been priced and replicated by the author (Garivaltis 2018), under the assumption of continuous rebalancing and levered hindsight optimization. By contrast, the present paper deals with discrete-time, unlevered rebalancing, and super-replication under total model uncertainty. In this (extreme) generality, there is no way to guarantee the solvency of leveraged rebalancing rules.
Thus, neither the hindsight-optimization nor the super-replicating strategy will be permitted to use leverage. Note that in the idealized Black-Scholes (1973) world, exact replication cannot be guaranteed without the ability to use leverage.
 is defined by
Definition 4. A superhedge (or superreplicating strategy) for a derivative payoff D(·) is a pair (p, θ), where θ(·) is a trading strategy and p is an initial deposit of money, such that for all $x_1, \dots, x_T$,
$$p \cdot W_\theta(x_1, \dots, x_T) \ge D(x_1, \dots, x_T).$$
In the words of an undated memo by Eric Benhamou of Goldman Sachs, "A superhedge is defined as a portfolio that will generate greater or equal cash-flows in any outcome. A super-hedge guarantees to make no loss as the super-hedge more than offsets the derivative security." The concept is due to Bensaid, Lesne, and Scheinkman (1992). Note that in our context, we demand that the final wealth $p \cdot W_\theta$ dominate the derivative payoff literally everywhere, and not merely with probability 1.
where the sum is taken over all solutions of the equation $n_1 + \cdots + n_m = T$ in nonnegative integers.
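For small T and m, the cost $p(T, m) = \sum_{n_1 + \cdots + n_m = T} \binom{T}{n_1, \dots, n_m} (n_1/T)^{n_1} \cdots (n_m/T)^{n_m}$ quoted in the abstract can be evaluated by brute-force enumeration of the nonnegative integer solutions. The sketch below is only meant to make the sum concrete. The helper names are illustrative, and the enumeration grows quickly with m, so it is not a practical method for many stocks.

```python
import math

def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def cost_p(T, m):
    """Evaluate p(T, m) by summing over all solutions of n_1 + ... + n_m = T."""
    total = 0.0
    for n in compositions(T, m):
        # Multinomial coefficient T! / (n_1! ... n_m!), computed exactly.
        coeff = math.factorial(T)
        for n_i in n:
            coeff //= math.factorial(n_i)
        term = 1.0
        for n_i in n:
            term *= (n_i / T) ** n_i   # Python evaluates 0.0 ** 0 as 1.0
        total += coeff * term
    return total

print(cost_p(2, 2))                    # 2.5
print(cost_p(2, 3))                    # 4.5
print(math.comb(3, 2) * cost_p(2, 2))  # pairs cost for m = 3, T = 2: 7.5
```

The last line evaluates the pairs cost $\binom{m}{2} p(T, 2)$ stated in the abstract for a three-stock, two-session toy market.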

Definition 6. A pairs rebalancing rule is a rebalancing rule $b \in \mathbb{R}^m_+$ whose support has at most two stocks, i.e. $\#\{j : b_j > 0\} \le 2$.
More generally, an s-stock rebalancing rule is defined by the condition $\#\{j : b_j > 0\} \le s$.
Definition 7. The best pairs rebalancing rule in hindsight is the pairs rebalancing rule that would have yielded the greatest final wealth, given $x_1, \dots, x_T$. The final wealth of the best pairs rebalancing rule in hindsight is
$$D^{(2)}(x_1, \dots, x_T) = \max_{1 \le i < j \le m}\ \max_{0 \le b \le 1}\ \prod_{t=1}^{T} \left(b\,x_{ti} + (1 - b)\,x_{tj}\right).$$
In general, $D^{(s)}(x_1, \dots, x_T)$ will denote the final wealth of the best s-stock rebalancing rule in hindsight. Cover and Ordentlich (1998) corresponds to the special case s = m.
For the Kelly horse race, a pairs rebalancing rule is a fixed-fraction betting scheme that, each race, bets all its money on the same two horses (i, j) in the same fixed proportions (b, 1 − b). Thus, if three or more distinct horses wind up winning over the T races, every pairs rebalancing rule will eventually go bankrupt, just as soon as a horse other than i or j wins a race.
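To make the definition of the best pairs rebalancing rule in hindsight concrete, the sketch below approximates it by a grid search over the fraction b for every pair (i, j). The grid search is a stand-in chosen for simplicity, not a method from the paper. The function name and the grid size are illustrative assumptions.

```python
from itertools import combinations

def best_pairs_wealth(returns, grid=1000):
    """Approximate the best pairs rebalancing rule in hindsight.

    Returns (wealth, (i, j), b), where b is the fraction held in stock i and
    1 - b the fraction held in stock j. The pair is None if every pairs rule
    ends with zero wealth.
    """
    m = len(returns[0])
    best = (0.0, None, None)
    for i, j in combinations(range(m), 2):
        for k in range(grid + 1):
            b = k / grid
            wealth = 1.0
            for x in returns:
                wealth *= b * x[i] + (1 - b) * x[j]
            if wealth > best[0]:
                best = (wealth, (i, j), b)
    return best

# Kelly horse race with three horses in which only horses 0 and 1 ever win.
two_winners = [(1, 0, 0), (1, 0, 0), (0, 1, 0)]
print(best_pairs_wealth(two_winners))

# Three distinct winners: every pairs rule goes bankrupt.
three_winners = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(best_pairs_wealth(three_winners))  # (0.0, None, None)
```

The second call illustrates the bankruptcy remark above: once a third horse wins, the product of growth factors is zero for every pair and every b.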
Proof. In the definition of superhedging, we substitute $x_1 = e_{j_1}, x_2 = e_{j_2}, \dots, x_T = e_{j_T}$ and sum the inequality over all possible indices $j_1, \dots, j_T$. We use the fact that $D(e_{j_1}, \dots, e_{j_T}) = \prod_{i=1}^{m} (n_i/T)^{n_i}$, where $n_i = \#\{t : j_t = i\}$ is the number of races won by horse i.
Here we have used the tacit convention that "$0^0 = 1$" for the situation where there are horses i such that $n_i = 0$.
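On a Kelly sequence the hindsight-optimal fixed fractions are $b^*_i = n_i/T$, so the derivative payoff reduces to $\prod_i (n_i/T)^{n_i}$. A minimal sketch of that computation follows. The function name is illustrative, and horses that never win are simply skipped, which is the same as applying the $0^0 = 1$ convention.

```python
from collections import Counter

def kelly_hindsight_wealth(winners):
    """Payoff of the best fixed-fraction betting scheme on a Kelly sequence.

    winners -- sequence of winning horse indices j_1, ..., j_T
    """
    T = len(winners)
    counts = Counter(winners)   # n_i for every horse that won at least once
    wealth = 1.0
    for n_i in counts.values():
        # Horses with n_i = 0 never appear here; they contribute 0^0 = 1.
        wealth *= (n_i / T) ** n_i
    return wealth

print(kelly_hindsight_wealth([0, 0, 1]))   # (2/3)^2 * (1/3) = 4/27
print(kelly_hindsight_wealth([3, 3, 3]))   # a single winner gives 1.0
```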
Proof. We have to solve a standard Cobb-Douglas optimization problem over the simplex:
$$\max_{b}\ \prod_{i=1}^{m} b_i^{n_i} \quad \text{subject to} \quad \sum_{i=1}^{m} b_i = 1,\ b_i \ge 0.$$
If $n_i = 0$, then $b^*_i = 0$, i.e. in hindsight no money should have been bet on horse i. For all other horses we have $b^*_i > 0$, and Lagrange's multipliers give the solution $b^*_i = n_i/T$.
Proof. We sum $D^{(2)}$ over all horse race sequences that have at most two winning horses i, j:
(1 − b)x_{tj}). One of the terms $(i^*, j^*)$ of this sum will correspond to the final wealth of the best rebalancing rule in hindsight. In practice, this will easily dominate the final wealth of the best pairs rebalancing rule in hindsight. However, in the worst-case scenario of the Kelly market, the trader's wealth will be exactly equal to that of the best pairs rebalancing rule in hindsight.
Theorem 2. After T periods, the excess per-period continuously-compounded growth rate of the best pairs rebalancing rule in hindsight over and above that of the superhedging trader is at most $\frac{\log \binom{m}{2} + \log p(T, 2)}{T}$, which tends to 0 as $T \to \infty$. Thus, the trader compounds his money at the same asymptotic rate as the best pairs rebalancing rule in hindsight.
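Proof. The trader takes his initial dollar and purchases $1/(\binom{m}{2} p(T, 2))$ superhedges, which guarantees a final wealth $W_\theta$ of at least $D^{(2)}/(\binom{m}{2} p(T, 2))$. Taking logs, we have the uniform bound
$$\frac{1}{T} \log D^{(2)} - \frac{1}{T} \log W_\theta \le \frac{\log \binom{m}{2} + \log p(T, 2)}{T}.$$
The fact that $\lim_{T\to\infty} \log p(T, 2)/T = 0$ follows from $p(T, 2) = O(\sqrt{T})$.
It remains to write explicit formulas for the superreplicating strategy. To this end, let $W_{ij}(x^t)$ be the wealth, after $x^t$, that has accrued to a one-dollar deposit into the Cover and Ordentlich (1998) strategy applied to the specific pair (i, j) of stocks, with i < j. Alternatively, one can use the sequential-minimax universal portfolio (Garivaltis 2018) applied to stocks (i, j). On account of the fact that we have made an initial deposit of p(T, 2) dollars into each (i, j) strategy, our aggregate wealth after $x^t$ will be $p(T, 2) \sum_{i<j} W_{ij}(x^t)$.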
Let $b_{ij}(x^t)$ be the fraction of wealth held by this strategy in stock i after $x^t$, where $1 - b_{ij}(x^t)$ is the fraction of wealth held in stock j. How much wealth in total is put into stock k after $x^t$? We have
$$p(T, 2) \left[ \sum_{i<k} \left(1 - b_{ik}(x^t)\right) W_{ik}(x^t) + \sum_{j>k} b_{kj}(x^t)\, W_{kj}(x^t) \right].$$
Thus, the total fraction of wealth to bet on stock k in session t + 1 (after return history $x^t$) is
$$\theta_{t+1,k}(x^t) = \frac{\sum_{i<k} \left(1 - b_{ik}(x^t)\right) W_{ik}(x^t) + \sum_{j>k} b_{kj}(x^t)\, W_{kj}(x^t)}{\sum_{i<j} W_{ij}(x^t)}.$$
This expression merely accounts for the total wealth held by the m − 1 pairs strategies (i, k) and (k, j) that have stock k in the portfolio, as a fraction of the aggregate wealth held by all $\binom{m}{2}$ strategies. This sort of fictitious accounting is one of the main tropes of universal portfolio theory. The practitioner is required to keep track of the wealths and portfolio vectors of $\binom{m}{2}$ separate pairs strategies.
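The bookkeeping described above can be sketched as follows. The code takes the current wealths and portfolio fractions of the individual pairs strategies as given inputs and does not implement the underlying two-stock universal portfolios themselves. The function name and the dictionary layout are illustrative assumptions, and the common factor p(T, 2) is omitted because it cancels in the ratio.

```python
from itertools import combinations

def aggregate_portfolio(m, wealth, frac):
    """Combine the pairs strategies into one portfolio vector over m stocks.

    wealth[(i, j)] -- current wealth W_ij of the strategy for pair i < j
    frac[(i, j)]   -- fraction b_ij that this strategy holds in stock i
                      (the remaining 1 - b_ij is held in stock j)
    """
    pairs = list(combinations(range(m), 2))
    total = sum(wealth[pair] for pair in pairs)
    theta = [0.0] * m
    for i, j in pairs:
        # Each pairs strategy contributes to exactly two coordinates.
        theta[i] += frac[(i, j)] * wealth[(i, j)] / total
        theta[j] += (1.0 - frac[(i, j)]) * wealth[(i, j)] / total
    return theta

# Hypothetical snapshot for m = 3 stocks.
wealth = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.0}
frac = {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 0.25}
theta = aggregate_portfolio(3, wealth, frac)
print(theta)   # [0.5, 0.3125, 0.1875]
```

Because every pairs strategy is fully invested, the aggregated vector sums to 1 automatically, so no separate normalization step is needed.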

Generalized max-min game
Cover and Ordentlich (1998) considered a two-person zero-sum trading game between the trader (Player 1) and nature (Player 2). The trader picks an entire trading algorithm θ(·) while nature simultaneously picks the returns $(x_1, \dots, x_T)$ of all stocks in all periods. They used the payoff kernel
$$(\theta(\cdot), x_1, \dots, x_T) \mapsto \frac{W_\theta(x_1, \dots, x_T)}{D^{(m)}(x_1, \dots, x_T)},$$
which is the ratio of the trader's final wealth to that of the best (full support) rebalancing rule in hindsight. In this section we solve the generalized game with payoff kernel
$$(\theta(\cdot), x_1, \dots, x_T) \mapsto \frac{W_\theta(x_1, \dots, x_T)}{D^{(s)}(x_1, \dots, x_T)}.$$
Theorem 3. In pure strategies, the lower value of the game is $1/(\binom{m}{s} p(T, s))$ and the upper value is 1. Thus, there is no pure strategy Nash equilibrium. The trader's maximin strategy is to play a minimum-cost superhedge for $D^{(s)}$. Nature's minimax strategy is to pick (any) particular stock $j^*$ and have it be the best performing stock in all periods, i.e. $x_{tj^*} \ge x_{tj}$ for all t, j.
Proof. Let θ(·) be a minimum-cost superhedge for $D^{(s)}$. From the definition of superhedging, we have the uniform bound
$$\frac{W_\theta(x_1, \dots, x_T)}{D^{(s)}(x_1, \dots, x_T)} \ge \frac{1}{\binom{m}{s} p(T, s)}.$$
Thus, the trading strategy θ(·) guarantees that the payoff is at least $1/(\binom{m}{s} p(T, s))$. This is the best possible guarantee. For, suppose that a trading strategy ψ(·) guarantees that $W_\psi / D^{(s)} \ge g$ for all $x_1, \dots, x_T$. Then, since $(1/g) W_\psi \ge D^{(s)}$, the strategy ψ(·) is a superhedge for $D^{(s)}$, with an initial deposit of 1/g dollars. Since the cheapest possible superhedge costs $\binom{m}{s} p(T, s)$, we must have $1/g \ge \binom{m}{s} p(T, s)$, so that $g \le 1/(\binom{m}{s} p(T, s))$. This shows that $1/(\binom{m}{s} p(T, s))$ is the highest possible payoff the trader can guarantee.
To show that the upper value is 1, assume that nature chooses a specific return path $x_1, \dots, x_T$ with the property that a certain stock $j^*$ is the best performer in all periods. Then the best s-stock rebalancing rule in hindsight is a degenerate rebalancing rule that keeps 100% of its wealth in stock $j^*$ at all times (and 0% in the other s − 1 stocks). This also happens to be the best trading strategy of any kind that could be played against the specific path $x_1, \dots, x_T$. Thus, this specific path guarantees that $W_\theta / D^{(s)} \le 1$ for all θ(·).
Before we solve the generalized max-min game in mixed strategies, we need a simple lemma.
Lemma 1. For any trading strategy θ(·), we have
$$\sum_{(j_1, \dots, j_T) \in \{1, \dots, m\}^T} W_\theta(e_{j_1}, \dots, e_{j_T}) = 1. \qquad (26)$$
The proof is immediate, by substituting unit basis vectors into the definition of a final wealth function, and summing over all $m^T$ possible Kelly sequences. Since the coordinates of a given period's portfolio vector sum to 1, we get
$$\sum_{(j_1, \dots, j_T)} \prod_{t=1}^{T} \theta_{t j_t} = 1.$$
Here, $\theta_{t j_t}$ is the $j_t$-th coordinate of the portfolio vector used in period t.
The value of the game is $1/(\binom{m}{s} p(T, s))$. In the mixed-strategy Nash equilibrium, Player 1 does not randomize; he continues to play a minimum-cost superhedge θ(·).
Proof. Obviously these are legitimate probabilities, since they are nonnegative and sum to 1. For a specific Kelly sequence $(e_{j_t})_{t=1}^{T}$ that has at most s distinct winners, the payoff is $W_\theta(e_{j_1}, \dots, e_{j_T}) / D^{(s)}(e_{j_1}, \dots, e_{j_T})$. Multiplying these payoffs by their probabilities and summing over all such sequences, we obtain an expected payoff of
$$\frac{\sum W_\theta(e_{j_1}, \dots, e_{j_T})}{\binom{m}{s} p(T, s)},$$
where the sum in the numerator runs over the Kelly sequences with at most s distinct winners. By the lemma, the numerator is at most 1. Thus, nature's mixed strategy has guaranteed that the expected payoff is at most $1/(\binom{m}{s} p(T, s))$, regardless of θ(·). This proves the theorem.
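Lemma 1 can be checked numerically on a small market by enumerating all $m^T$ Kelly sequences. The sketch below assumes that a trading strategy maps the return history to a portfolio vector summing to 1 and that final wealth is the product of the per-session growth factors. The example strategy `tilt_to_last_winner` is a hypothetical history-dependent rule invented for this check.

```python
from itertools import product

def final_wealth(strategy, returns):
    """Final wealth of one dollar when `strategy` maps history -> portfolio."""
    wealth = 1.0
    history = []
    for x in returns:
        theta = strategy(history)
        wealth *= sum(t_j * x_j for t_j, x_j in zip(theta, x))
        history.append(x)
    return wealth

def kelly_sum(strategy, m, T):
    """Sum of final wealth over all m**T Kelly (unit-vector) return sequences."""
    basis = [tuple(1.0 if k == j else 0.0 for k in range(m)) for j in range(m)]
    return sum(
        final_wealth(strategy, [basis[j] for j in seq])
        for seq in product(range(m), repeat=T)
    )

def tilt_to_last_winner(history, m=3):
    """Hypothetical strategy: half uniform, half on last session's returns."""
    if not history:
        return [1.0 / m] * m
    last = history[-1]
    total = sum(last)
    return [0.5 * (x_j / total) + 0.5 / m for x_j in last]

print(kelly_sum(tilt_to_last_winner, m=3, T=4))   # approximately 1.0
```

The identity holds for any strategy whose portfolio vectors sum to 1, because summing the last factor over its m possible winners gives 1 and the argument then repeats backwards through time.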

Conclusion
This paper generalized Cover and Ordentlich's important (1998) result, that the cost of superreplicating the best full support rebalancing rule in hindsight is $p(T, m) = \sum_{n_1 + \cdots + n_m = T} \binom{T}{n_1, \dots, n_m} (n_1/T)^{n_1} \cdots (n_m/T)^{n_m}$. We obtained the result that the cost of superreplicating the best s-stock rebalancing rule in hindsight is $\binom{m}{s} p(T, s)$. For any significant number of stocks (say, the Dow Jones 30), the full support universal portfolio is impossible to compute in practice. However, it is very easy to superreplicate the best pairs trade in hindsight: one need only calculate and account for $\binom{m}{2}$ 2-stock universal portfolios. The minimum-cost superhedge amounts to buying $\binom{m}{2}$ superhedges for p(T, 2) dollars each, one for each pair (i, j) of stocks. In temperate markets this strategy will easily have enough final wealth with which to dominate the best pairs rebalancing rule in hindsight. However, in very wild markets, involving frequent bubbles and crashes, the final wealth of this strategy will not be much more than that of the best pairs rebalancing rule in hindsight. In the limiting case of the Kelly horse race market, the strategy will have a final wealth that is exactly equal to the derivative payoff.
In practice, the trader will have to pick a tolerance $\epsilon$, and calculate the shortest horizon T on which he can guarantee to achieve a compound growth rate that is within $\epsilon$ of that of the best pairs rebalancing rule in hindsight. T is the smallest solution of the inequality $\frac{\log \binom{m}{2} + \log p(T, 2)}{T} < \epsilon$.
With this horizon in hand, the trader takes his initial dollar and purchases $1/(\binom{m}{2} p(T, 2))$ superhedges, yielding a final wealth of at least $D^{(2)}/(\binom{m}{2} p(T, 2))$, where $D^{(2)}$ is the wealth of the best pairs trade in hindsight. Under mild conditions on the realized path of stock prices, the trader will beat the market asymptotically. However, his asymptotic growth rate will be somewhat lower than that achieved by the full-support universal portfolio.
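The horizon calculation can be sketched by a direct search over T. The code below evaluates log p(T, 2) in log space to avoid overflow in the binomial coefficients and then scans for the first T at which the bound drops below the tolerance. The function names and the search cap `t_max` are illustrative assumptions, natural logarithms are assumed, and `math.comb` requires Python 3.8 or later. Setting m = 2 corresponds to a single preselected pair, since the binomial factor is then 1.

```python
import math

def log_p2(T):
    """log p(T, 2), with each term of the sum evaluated in log space."""
    log_T_factorial = math.lgamma(T + 1)
    terms = []
    for n in range(T + 1):
        # log of C(T, n) * (n/T)^n * ((T-n)/T)^(T-n), with 0^0 = 1.
        log_term = log_T_factorial - math.lgamma(n + 1) - math.lgamma(T - n + 1)
        if n > 0:
            log_term += n * math.log(n / T)
        if n < T:
            log_term += (T - n) * math.log((T - n) / T)
        terms.append(log_term)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def shortest_horizon(m, eps, t_max=100_000):
    """Smallest T with (log C(m, 2) + log p(T, 2)) / T < eps, or None."""
    log_pairs = math.log(math.comb(m, 2))
    for T in range(1, t_max + 1):
        if (log_pairs + log_p2(T)) / T < eps:
            return T
    return None

horizon_single_pair = shortest_horizon(2, 0.01)   # one preselected pair
horizon_dow = shortest_horizon(30, 0.01)          # best pair among 30 stocks
print(horizon_single_pair, horizon_dow)
```

With annual sessions and a 1% tolerance, the two results can be compared against the figures of roughly 320 and 1,000 years quoted in the abstract. The linear scan is adequate at this scale, although a bisection search would be the natural refinement for much smaller tolerances.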