Principled pasting: attaching tails to risk-neutral probability density functions recovered from option prices

The popular ‘curve-fitting’ method of using option prices to construct an underlying asset's risk neutral probability density function (RND) first recovers the interior of the density and then attaches left and right tails. Typically, the tails are constructed so that values of the RND and risk neutral cumulative distribution function (RNCDF) from the interior and the tails match at the attachment points. We propose and demonstrate the feasibility of also requiring that the left and right tails accurately price the options with strikes at the attachment points. Our methodology produces a RND that provides superior pricing performance than earlier curve-fitting methods for both those options used in the construction of the RND and those that were not. We also demonstrate that Put-Call Parity complicates the classification of in and out of sample options.


Introduction
For more than thirty years, researchers have employed a wide variety of techniques that use a cross-section of option prices observed at a single point in time to recover an underlying asset's implied RND.One of the more popular of these techniques follows the theoretical insight of Breeden and Litzenberger (1978) that a continuum of (strike price, option price) pairs can be used to recover the entire RND using the second derivative of the option price with respect to the strike price.However, in practice, only a discrete spacing of such pairs over a limited range is available.Thus, researchers developed ways to interpolate between the limited, discrete spacing.A popular methodology traces out the interior of the RND by fitting a curve to the observed option prices and then taking the second derivative of the call option prices consistent with that curve.In this curve-fitting methodology, left and right tails are then attached to the interior of the RND to complete the entire RND.
Coupled with some assumption(s) concerning attitudes towards risk, a RND can shed light on market views regarding the likelihood of movements in the price of the asset underlying the option contract.However, even absent assumptions about risk, RNDs are incredibly useful constructs for policymakers interested in market participants' views on the value of resources in future states of the world.See Feldman et al. (2015) for elaboration on this point.
Construction of the RND using the curve-fitting technique is a numerically intensive task that requires choices about: the options selected to be used for the construction of the interior of the RND, the space in which to fit a curve to these selected option prices, the functional form of the fitted curve, and the method for attaching the tails.Not surprisingly, variations on the procedure have proliferated, due to both the many decisions required in the construction of the RND as well as the great interest among market practitioners and policymakers in the insights offered by RNDs.
To supplement the usual practice of attaching tails that match interior values for the RND and RNCDF, we propose adding an intuitive and common-sense constraint.Namely, the left (right) tail should accurately price the option, in the set of options used to construct the interior of the RND, with the lowest (highest) strike price.The usual practice makes sure that the left and right tails match the height of the RND and the probability mass given by the RNCDF at the attachment points.Our methodology ensures that the probability mass within the tails is properly located by incorporating the conditional expectation of the asset price finishing below (above) the left (right) attachment point that is embedded in the attachment point option price.
When comparing our methodology to earlier curve-fitting techniques, we assess the quality of the constructed RNDs based on the ability of the relevant integrals of the RND to accurately price options.We pay careful attention to option pricing performance both for those options used in the construction of the interior of the RND (in sample) and those that were not (out of sample).Determining which options are in and out of sample requires some care, as Put-Call Parity blurs the distinction between the two classifications -a consideration that has not featured in earlier work.
Our paper is organized as follows.Section 2 reviews the literature on curve-fitting RND recovery.In section 3 we briefly present the curve-fitting process of recovering the interior of the RND as well as our new approach for attaching the left and right tails.Section 4 discusses the particulars of our application to options on the FTSE 100 Index and outlines two earlier and well-known alternative methods for attaching the tails.Section 5 compares the RNDs from each of the methodologies, focusing on the ability of the different methods to accurately price options.Section 6 offers conclusions.

Review of the literature
Longer reviews of the entire RND literature can be found in Jackwerth (2004), Taylor (2005), and Figlewski (2018) with well-crafted, succinct reviews available in Bliss and Panigirtzoglou (2002), Figlewski (2010), Markose andAlentorn (2011), andLu (2019).In these reviews, the techniques used to recover the RND are often divided into parametric and non-parametric categories and then subdivided into categories such as mixture methods, kernel methods, and curve-fitting methods.† Here we only review the literature on curve-fitting methods given that our proposed enhancement directly applies to this technique.Shimko (1993) is the first application of a RND recovery using the curve-fitting methodology.He uses options on the S&P 500 index, constructs the curve using a quadratic function in (strike price, implied volatility) space, and adds left and right lognormal tails that match the values of the RND and RNCDF at the attachment points.Working with options on foreign exchange, Malz (1997) advocated fitting the curve in (call delta, implied volatility) space using a quadratic equation where implied volatility is a function of: the at-the-money implied volatility, the distance of delta from 0.5, and the square of the distance of delta from 0.5 with the distances weighted, respectively, by the prices † See Markose and Alentorn (2011) for a graphical representation of one such classification system.quoted for the risk reversal and strangle option combinations.Campa et al. (1997), also working with options on foreign exchange, create the curve using a natural cubic spline in ( strike price forward rate , implied volatility) space.Rather than attaching tails to complete the RND, they add pseudo-data points by assuming that the implied volatility is unchanged to the left (right) of the option price with the lowest (highest) strike price.Bliss and Panigirtzoglou (2002) and Bliss and Panigirtzoglou (2004) combine the methods of Malz (1997) and Campa et al. (1997) in their recovery of RNDs for FTSE 100 Index options and short sterling futures by fitting the curve in (call delta, implied volatility) space using a natural cubic spline and pseudo data points to extend the tails.When constructing the spline, they also use vega weights to reduce the importance of options that are deep out-of-the-money.‡ In examinations of options on the S&P 500, Figlewski (2010) and Birru and Figlewski (2012) construct the curve using a fourth degree spline in (strike price, implied volatility) space with a single knot at the at-the-money strike.Like Shimko (1993), both of these papers attach tails to the interior of the RND by explicitly matching the values from the RND and RNCDF using the three-parameter Generalized Extreme Value distribution rather than the two-parameter lognormal distribution.This allows them to match the RNCDF at a single point and the RND at two points for each tail.
Given its popularity, the curve-fitting method is often included in studies that compare two or more techniques for recovering RNDs.Bliss and Panigirtzoglou (2002) compare the curve-fitting method to a parametric recovery involving a mixture of two lognormals.Bondarenko (2003) introduces the positive convolution approximation method and compares it to the curve-fitting method and six other RND recovery techniques.Bu and Hadri (2007) compare the spline curvefitting to confluent hypergeometric function methods.Eight different techniques, both parametric and non-parametric, are compared in Lu (2019).Reinke (2020) compares the performance of the methods put forward by Jackwerth (2004) and Figlewski (2010).Bahaludin and Abdullah (2017) focus exclusively on curve-fitting methods, comparing the second and fourth order polynomial approaches to spline approaches.Firm conclusions across all of these studies are difficult to reach given that in some studies a 'true' distribution or stochastic process is specified that differs from study to study while in other studies calculated option prices from the recovered RND are compared to the actual option prices used in the recovery.However, the use of the curve-fitting methodology in all of these studies demonstrates its popularity.§

RND recovery
Recovery of a single RND requires data on option prices for an underlying asset on a single contract/trading day.These ‡ More details can be found below in section 4.3.2.§ Although they do not make any explicit comparisons with curvefitting results, Cui and Xu (2022) also call attention to the popularity of curve-fitting techniques when introducing a new RND estimation technique based on the spanning result of Carr and Madan (1999).
data are usually obtained from an exchange, a market-maker, or a third-party distributor and may be end-of-day settlement prices, time-stamped transactions, or bid/ask listings.Standard practice then selects some subset of the option prices based on data filters meant to screen out erroneous prices or prices that are thought to not accurately reflect market conditions (e.g.options without traded volume or open interest) to create a final dataset.

Recovering the interior of the RND
Recovery of the interior of the RND typically starts with N out-of-the-money (OTM) (strike price, option price) pairs selected by the researcher.A researcher may choose to use all available OTM options to recover the interior of the RND or only a subset of the OTM options.As also discussed in section 4.3, we only use a subset of the available OTM options, based on the option call deltas, to construct the interior of the RND in order to allow for a meaningful outof-sample comparison across the different methodologies we examine.The top panel of figure 1 provides an illustration of the difference between the in-sample and out-of-sample options for one trading day in our dataset.For typical applications using exchange-traded options such as options on crude oil futures or the FTSE 100, N would usually be in the range of 20 to 70.These N selected OTM options are then transformed via non-parametric techniques into a much larger set of T (strike price, estimated option price) pairs at strikes that are very close together.Commonly, T is a fairly large number on the order of 5000.The interiors of the RNCDF and RND are then calculated by applying numerical derivatives to these T pairs.Examples of this methodology are found in Aït-Sahalia and Duarte (2003), Birru and Figlewski (2012), and Bliss and Panigirtzoglou (2004).
To fix notation, an (N, 2) matrix of observed strike prices and OTM option prices that satisfy monotonicity and convexity constraints is used to ultimately create a (T, 4) matrix of tightly-spaced strike prices, theoretical call prices, and the RNCDF and RND evaluated at each tightly-spaced strike price.The process is visualized in equation ( 1).

=⇒
where: K i is an observed strike price, P i is an observed OTM put price, C i is an observed OTM call price with i = 1, 2, . . .N, X j is a constructed, tightly-spaced strike price, Ĉj is an estimated price for a call option with strike X j , RNCDF j is the RNCDF evaluated at X j , and PDF j is the RND evaluated at X j with j = 1, 2, . . .T.
Both matrices in equation ( 1) are ordered such that K i+1 > K i and X j+1 > X j .It will be the case that X 1 = K 1 and X T = K N , but otherwise there is no direct correspondence between X j and K i .The values for RNCDF 1 , RND 1 , RNCDF T , RND T are not available because calculating centered, numerical derivatives at these points requires option prices that are outside the observed range [K 1 , K N ].
Using the methodology of Bliss and Panigirtzoglou (2004) as an example, the process of moving from observed option prices to the interior RND (that is, moving left to right in equation ( 1)) involves five steps.
Step 1 -determine the implied volatilities (σ i ) associated with the (K i , P i or C i ) pairs using a non-linear solver and, for the case of options on futures, the Black (1976) option pricing equation.
where F is the futures price for the underlying asset at the current point in time, r is the risk-free rate of interest, t is the time to expiration in years for the option contract, and (•) denotes the standard normal cumulative distribution function.
Step 2 -The implied volatility associated with the closestto-the-money strike price, denoted σ ATM , is used to convert the K i strike prices into call deltas in what is termed the 'point conversion' method by Bu and Hadri (2007) using the equation † Step 3 -A curve is fit to the ( , σ i ) pairs, using a cubic smoothing spline with vega weights (essentially moneyness † The point conversion method is used to ensure that the order of the options in strike price -option price space will be preserved when converted into call delta -sigma space. weights) with vega defined as where φ(•) is the standard normal probability density function.This curve is then evaluated at T tightly spaced call deltas to recover T predicted implied volatilities denoted σj .The tightly-spaced call deltas are generated by first creating T tightly-spaced strike prices X 1 through X T in the interval [K 1 , K N ] and then using equation (3) to convert the strike prices into call deltas.
Step 4 -The tightly spaced ( , σj ) pairs are converted into tightly-spaced (X j , Ĉj ) pairs by solving equation (2) using X j and σj in place of the K i and σ i .Thus the (X j , Ĉj ) pairs will be consistent with the spline fit in Step 3.
Step 5 -As is detailed in Figlewski (2010), the RNCDF and RND are recovered from the centered numerical first and second derivatives calculated from the (X j , Ĉj ) using These five steps suffer from some numerical complications related to the precision of the evaluation of (•) in Step 1.As a result some options must be discarded, as discussed in section 4.1.In addition, the tightly-spaced X j cannot be so tightly spaced that the evaluation of (•) in equation (3) yields j+1 meaning that some judgment be exercised in how tightly to space the X j .

Attaching the tails
With the interior of the RND determined as in section 3.1, the left tail (the portion of the RND below X 2 ) and the right tail (the portion of the RND above X T−1 ) must be added to complete the RND.A variety of methods are used in the literature, for example adding pseudo data points beyond the range of observed strike prices (Bliss and Panigirtzoglou 2004), pasting on generalized extreme value tails (Figlewski 2010, Birru andFiglewski 2012) or simply rescaling the interior of the RND so that it integrates to 1.0 over the range [X 2 , X T−1 ] (Aït-Sahalia and Duarte 2003).
There are a number of conditions that the attached tails can be forced to satisfy.Perhaps most obviously, the value of the functions used to create the attached tails at X 2 and X T−1 should match the interior portion of the RND at these two points.Denoting the functions used for the left and right tails as f L (•) and f R (•), these two conditions are Secondly, the probability mass in the two tail functions should match the probability mass below X 2 and above X T−1 implied by the RND.That is where F T denotes the underlying futures price at the contract's expiration.Implicit in equation ( 7) are conditions on the support for the two tail functions as well as the condition that the probability mass of the entire RND sums to 1.0.
Our contribution is to also constrain the functions used for the left and right tails to accurately price the observed options with strikes that are extremely close to the attachment points.
Accurately pricing these options with strikes K 1 = X 1 and K N = X T provides the best possible means for locating the probability mass contained in the tails of the RND.More formally, and using the fact that an option's price is the discounted, risk-neutral probability-weighted payoff of the option given that it finishes in the money, we have (assuming that the OTM option with strike K 1 = X 1 is a put, while the OTM option with strike K N = X T is a call) Another possible condition is that the expected value of the entire RND should match the appropriately discounted price of the underlying asset.In the case of options on futures, this condition is written as We do not employ this last condition for two reasons.First, the condition is difficult to enforce when separately creating the left and right tails, and second, the condition can be used as a measure of goodness of fit when evaluating the entire RND after attaching the tails.We use a 2-parameter Weibull probability density function for the left and right tail functions.Our choice of the Weibull is motivated by several considerations.First, the [0, ∞) support for the Weibull is appropriate for considering assets such as the FTSE 100 with prices that cannot be negative.Second, the Weibull is a very flexible functional form that is relatively tractable.Finally, as emphasized by Savickas (2002), the Weibull easily accomodates the negative skewness typically seen in asset price distributions.† The general form of the 2-parameter Weibull distribution is In what follows we will also use the gamma function (z) and its variants, the upper incomplete gamma function (z, M ) and the lower incomplete gamma function γ (z, M ), defined as

Left tail.
For the left tail we use a 'scaled' Weibull PDF: where ) with β L defined in order to guarantee the condition in equation ( 6) that the left tail and the interior RND take on the same value at X 2 .We then use the left tail conditions in equations ( 7) and ( 10) to create a system of two non-linear equations to solve for the left tail parameters k L and λ L .The initial forms of the equations are The second condition in equation ( 15) can be written as dF T † See also Savickas (2005) for additional option pricing applications of the Weibull distribution.

− β
We evaluate the remaining integral in equation ( 16) with the following substitutions A second round of substitutions allows equation ( 17) to be written as Substituting the result in equation ( 18) into equation ( 16) and then into equation ( 15) provides the two non-linear equations used to solve for the parameters of the left tail.
These two non-linear equations enforce the conditions found in equations ( 7) and ( 10) that the left tail matches the CDF of the interior RND at X 2 and correctly prices the option with strike X 1 for the particular case of the Weibull distribution.

Right tail.
The same scaled Weibull approach is used to create the right tail.
with β R defined in order to guarantee the condition in equation ( 6) that the right tail and the interior RND take on the same value at X T−1 .We then use the right tail conditions in equations ( 7) and ( 10) to create a system of two non-linear equations to solve for the right tail parameters k R and λ R .The initial forms of the equations are The second condition in equation ( 21) can be written as We evaluate the remaining integral in equation ( 22) with the following substitution A second round of substitutions Substituting the result in equation ( 24) into equation ( 22) and then into equation ( 21) provides the two non-linear equations used to solve for the parameters of the right tail.
These two non-linear equations enforce the conditions found in equations ( 7) and ( 10) that the right tail matches the CDF of the interior RND at X T−1 and correctly prices the option with strike X T for the particular case of the Weibull distribution.

Application to FTSE 100 options
To demonstrate the efficacy of our tail pasting procedure, we use daily settlement prices for options on the FTSE 100 Index that are traded on the London International Financial Futures and Options Exchange (LIFFE).These are European-style options that, as explained in Markose and Alentorn (2011), can be treated as options on the FTSE 100 Index futures given that the option contracts and the futures contracts expire on the same day.To provide some assurance that our findings are robust across time, historical settlement prices for the FTSE 100 options were accessed from OptionMetrics' IvyDB product using library subscriptions to Wharton Research Data Services (WRDS) for two time periods: February 6, 2013 through February 2, 2014 using the seven quarterly contracts June 2013 through December 2014 and September 30, 2019 through September 30, 2021 using the twelve quarterly contracts December 2019 through September 2022.† The daily settlement prices on the nineteen underlying FTSE 100 futures contracts were obtained from Bloomberg.We use daily yields on sterling (GBP) LIBOR at the overnight, 1 week, 2 week, and 1-12 month tenors to discount future cash flows, employing a linear interpolation between LIBOR maturities to find the interest rate that matches the maturity of the option contract on each day, using the Intercontinental Exchange as the source for the 2013-2014 LIBOR rates and Bloomberg as the source for the 2019-2021 LIBOR rates.‡ Across the nineteen contracts (a total of 2912 contract/trading days), we have a total of 495 528 option prices with a recorded settle price and a matching interest rate.

Data cleaning
Given that these data are end-of-day settlement prices and not actual prices at which trades took place, we must carefully † The WRDS subscription does not allow access to the most recent year's data.‡ In their applications to FTSE 100 options, Bliss and Panigirtzoglou (2004) and Markose and Alentorn (2011) also use GBP LIBOR rates.First, the minimum tick on the LIFFE is 0.5 FTSE 100 index points.On any given day, quite a few OTM calls and puts will be assigned a settle price of 0.5.Among the 0.5 priced calls, only the 0.5 priced call with the lowest strike conveys any information, as the calls with higher strikes would have been priced lower were it not for the minimum tick requirement.Among the 0.5 priced puts, only the 0.5 priced put with the highest strike conveys any information, as the puts with lower strikes would have been priced lower were it not for the minimum tick requirement.Thus, we discard all of the calls with a price of 0.5 aside from the 0.5 call with the lowest strike and we discard all of the puts with a price of 0.5 aside from the 0.5 put with the highest strike.As can be seen in table 1, this eliminates 9710 options.
Second, options on futures must satisfy the following noarbitrage conditions.§ A total of 2137 of the options violated these conditions-in every case the lower bound.Third, as detailed in textbooks such as McDonald (2012), option prices must satisfy the following monotonicity, slope, and convexity conditions.
A total of 69 324 violations of these conditions were found, the vast majority for low-priced options violating the convexity condition even though we retained options where the two § See McDonald (2012).
sides of the convexity inequalities were equal to each other.All three of these conditions were applied repeatedly across the dataset, as a given violation does not determine which option(s) are generating the violation.This is particularly true for the convexity condition as it involves three options.As a result, and as discussed below, even after these deletions we still use the projection method of Dykstra (1983) to enforce these conditions.Fourth, we drop all pairs of put and call options that share the same strike and violate put-call parity after allowing for the expenses associated with profiting from the violation.In particular, any put-call pairs that violated put-call parity by more than 5 index points (equivalent to £50) were dropped.In total, 13 856 options were eliminated for put-call parity violations.As discussed in section 5.1 these eliminations will be important when considering whether or not an option should be considered as out of sample.
Fifth, we do not estimate a RND for any contract/day with fewer than 5 OTM options, dropping 1110 options.
Sixth, as a contract approaches maturity, the shrinking horizon means that the RND becomes increasingly degenerate and of less interest for both policymakers and market participants.We drop the 8514 options with 7 or fewer calendar days until expiration.
Finally, as mentioned in section 3.1, the numerical complications involved in evaluating (•) means that accurate implied volatilities cannot be calculated for some options.Any numerical software that evaluates (•) will have a limit H above which (H) returns 1.0 and a limit L below which (L) returns 0.0.† Accurate implied volatilities can only be recovered for options where (28) This means that accurate implied volatilities cannot be recovered for options with strike prices that are very deep into the left and right tails of the RND.A total of 9181 options were outside the inequalities in equation ( 28).In every case it is the lower limit that is binding, leading us to drop 9181 options with strikes that are quite low relative to the futures price.For researchers working with small sample sizes, Jäckel (2015) provides a methodology for recovering numerically intractable implied volatilities using a rational approximations approach.
Any attempt to recover the internal portions of the RNCDF and RND via numerical derivatives can yield erratic results if the call options do not satisfy the monotonicity, slope, and convexity conditions in equation ( 27).We follow the lead of Aït-Sahalia and Duarte (2003), including GAUSS code written by Professor Duarte, ‡ and use the projection algorithm of Dykstra (1983) on our 381 696 cleaned option prices to guarantee that the monotonicity, slope, and convexity conditions are strictly satisfied on each of the 2850 contract/trading days.As a result of the earlier data filtering, the adjustments made by the Dykstra algorithm were quite small, with the largest adjustment to any single option amounting to only † In MatLab L −38.381 and H 7.65.‡ available at https://www.jefferson-duarte.com.
4.2 FTSE 100 index points.Across all 2850 contract/days, the average root mean square difference between the day's observed option prices and the Dykstra adjusted option prices was 0.0177 FTSE 100 index points.

Data description
As can be seen in table 2, the options that remain after the filtering described in section 4.1 are roughly evenly divided between puts and calls and spread fairly evenly across moneyness and time to expiration.The sample categories at the bottom of table 2 denote whether an option was used in the recovery of the RND or not.These sample categories are explained in greater detail in section 5.1.The Moneyness categories are those put forward by Figlewski (2002) based on ln( K F ) σ √ t .The large disparity between puts and calls across moneyness is not surprising.If market participants are aiming to protect against the possibility of large drops in the FTSE 100 Index then we would expect a preponderance of deep OTM options to be puts.
Table 3 provides a slightly different view of the data by providing counts of the number of options across the 2850 trading days in our sample.On average, each contract/trading day has almost 155 options available that are spread fairly evenly across moneyness.About 62 OTM options are used to construct the RND for a typical trading day and the typical trading day has an average of almost 31 truly out of sample options.
Finally, table 4 presents information on the options and trading days for each of the nineteen contracts in our sample.
Table 2. Descriptive statistics for the 381 696 options in the dataset across puts and calls by: moneyness, days to expiration, and sample.OTM denotes out of the money, ATM denotes at the money, ITM denotes in the money.In sample options are those used to construct the interior of the risk neutral density.Out of sample options are those options that were not used to construct the interior of the risk neutral density and that do not share a strike with an option used to construct the interior.Quasi out of sample options are the remaining optionsthose options that share a strike with an in sample option.As our sample includes the main phases of the Covid-19 pandemic, there is a fairly sizable movement in the FTSE 100 index during each contract's life, with a given contract/trading day having between 12 and 268 options from which to select the OTM options for RND construction.As can be seen in the columns labeled In Sample, our RNDs are constructed from as few as 6 to as many as 124 OTM options.Maturities for the contract/trading days range from 8 days to slightly more than one year.Our entire sample comes from a period of very low interest rates with the highest average LIBOR rate for a contract amounting to a bit less than 9 basis points.

RND construction
As outlined in section 3, a researcher constructing a RND must make a number of choices regarding data and estimation techniques.We discuss the choices made in our new technique (hereafter TailHAP for tails matching the Height, Area, and option Price at the attachment points) and, for comparison, two other procedures in highly cited existing studies from Bliss andPanigirtzoglou (2004, 2002) (hereafter BP) and Birru and Figlewski (2012) and Figlewski (2010) (hereafter FB).Given our interest in the performance of the different procedures for attaching the tails, we make sure that the interiors of the RNDs cover a small enough range so that the left and right tails will have enough mass to significantly influence the pricing of any options as well as to guarantee a useful number of out-of-sample options.To construct the interior of the RND for all of the techniques we only use options with call deltas within the range [0.05, 0.99].For example, on trading day 20200619 for the Dec. 2020 contract, the options used for the interior have strikes that range from 4200 to 8300 while the futures settle price for that day was 6227.0.All of the RND constructions are done using MatLab.

TailHAP construction.
As described in section 3.1, to begin the construction of the interior of the RND we fit a cubic smoothing spline in ( , σ i ) space with vega weights using MatLab routine csaps with the smoothing parameter set to 0.9.The left and right tails are constructed by using the routine fsolve to find the values for k L and λ L (and thereby β L ) that solve the expressions in equation ( 19) and the values for k R and λ R (and thereby β R ) that solve the expressions in equation ( 25).Solving these highly non-linear equations involving (z, M ) and γ (z, M ) requires good starting values k L0 , λ L0 , k R0 , and λ R0 that are set as described in the Appendix.Figure 1 provides an example of one trading day's results for the TailHAP procedure, using the results for 20200619 for the Dec. 2020 contract mentioned above.The upper panel shows the entire RND along with the option prices available for that trading day.The + signs denote the options used to construct the interior of the RND.The bottom panel shows the spline in ( , σ i ) space used to create the interior of the PDF.The circles denote the same + options from the top panel.
We also consider a variant of the TailHAP procedure that avoids the (z, M ) and γ (z, M ) functions, denoted TailHA-PAlt, where the left and right tails use lognormal densities rather than Weibull densities.The procedure is identical in that the parameters for 'scaled' lognormal tails are chosen to match the values of RND and RNCDF at the attachment points as well as the prices of the OTM options with the lowest and highest strike prices.Using some of the results for the lognormal distribution from the Appendix, for the left tail we use fsolve to find the values for μ L and σ L such that For the right tail we use fsolve to find the values for μ R and σ R such that Starting values for equations equations ( 29) and ( 30) require much less thought, given that (•) is more tractable than γ (z, M ) and (z, M ).We use 2 and σ L0 = σ R0 = σ ATM .

BP construction.
To allow for an apples to apples comparison of the RNDs from the different procedures, when constructing the BP RND we deviate slightly from the exact procedure followed by Bliss andPanigirtzoglou (2004, 2002).In their work, Bliss and Panigirtzoglou add two pseudo data points in (K, σ ) space: one below the lowest strike at (K 1 − 150, σ 1 ) and one above the highest strike at (K N + 150, σ N ).That is, they extend the volatility smile in either direction by 150 index points by forcing the smile to become horizontal at either end.They then construct the RND as in section 2. They state that 'Extrapolating the implied volatility function manner has the effect of smoothly pasting log-normal tails onto the density function beyond the range of strikes.' † We accomplish this same pasting by directly formulating lognormal tails that allow us to evaluate the entire RND over the same range as for the TailHAP and FB procedures.For the left lognormal tail we set σ L = σ 1 and then solve for the other lognormal parameter, μ L , so that the value of the RND from the left tail and the interior matches at the left attachment point.Given the functional form of the lognormal PDF in the Appendix we have For the right lognormal tail we set σ R = σ N and solve for μ R as (32) Figure 2 provides an example of the BP procedure for the same trading day as in figure 1.The only difference between the two figures are the left and right tails, as both the TailHAP and BP procedures follow the same method for producing the interior of the PDF.As can be seen in the heading for the top panel, the BP RND integrates to a value well below one and has a mean that is quite different from the futures price.
As figure 2 makes clear, the BP pasting of lognormal tails makes no attempt to match the left and right values of the RNCDF, that is, the BP procedure makes no attempt to ensure that the RND integrates to one.Therefore, we consider an alternative procedure, BPAlt, that adds lognormal tails that match both the RND and RNCDF at the attachment points by using fsolve to find the values for μ L , σ L , μ R , and σ R such that 1

FB construction.
The construction of the interior of the RND in Figlewski (2010) and Birru and Figlewski (2012) uses a fourth order spline in (X , σ ) space with a single knot at the option price that is nearest to the money rather than the cubic spline in ( , σ i ) space used in the TailHAP and BP procedures.We implement the fourth order spline using Mat-Lab's spap2 routine.Not surprisingly, the different splining techniques produce different values for the interior portions of the RNCDF and RND, especially at the left end.On average, across all 1984 contract/trading days, the value for RNCDF 2 † Bliss and Panigirtzoglou ( 2004) page 417.from the FB procedure is 27 percent below the value for RNCDF 2 from the TailHAP and BP procedures.The value for RND 2 from the FB procedure averages 24 percent below the value for RND 2 from the TailHAP and BP procedures.
The FB construction also differs in the functional form and attachment method of the left and right tails.The tails are both Generalized Extreme Value (GEV) distributions with parameters chosen to impose the constraints that the tails match the RND at two interior points and the RNCDF at one interior point.The GEV distribution and density are given by Where possible, the FB methodology matches the right tail to the interior of the RND at the X i nearest the 92nd and 95th percentiles.Denoting these particular strikes by X 0.92 and X 0.95 and the heights of the RND at these points by RND 0.92 and RND 0.95 , the right-tail parameters μ R , σ R , and ξ R are solved for using the three conditions For the left tail, the FB methodology matches the interior of the RND at the X i nearest the 2nd and 5th percentile.Denoting these particular strikes by X 0.02 and X 0.05 and and the heights of the RND at these points by RND 0.02 and RND 0.05 , the lefttail parameters μ L , σ L , and ξ L are solved for.The left tail procedure is slightly more complicated than for the right tail.
As explained by Figlewski 'Since the GEV is the distribution of the maximum in a sample, its left tail relates to probabilities of small values of the maximum, rather than to extreme values of the sample minimum, i.e. the left tail.To adapt the GEV to fitting the left tail, we must reverse it left to right.' † Thus, the three conditions for the left tail are As mentioned in section 4.3, we only use options with call deltas within the range [0.05, 0.99], so the desired FB attachment points are rarely available in the interiors of the RNDs and RNCDFs.We then follow Figlewski (2010) and Birru and Figlewski (2012) when confronted with the same situation and implement the FB procedure by first finding the strike price with a RNCDF value as close as possible to RNCDF T−1 − 0.03, denoted X α0R , and the strike price with a RNCDF value as close as possible to RNCDF 2 + 0.03, denoted X α0L .In these cases, the FB tails are constructed so that and Figure 3 shows that on the same example day the FB procedure produces a much smoother interior for the RND -a result that also holds in general.The bottom panel shows the FB spline in (X , σ ) space.The header for the top panel demonstrates that the FB procedure produces a RND with an integral essentially equal to one.However, the expected value for the RND is not as close to the futures price as for the TailHAP procedure.We return to this point in section 5.

RND evaluation
Applying the five procedures in section 4.3 to our FTSE 100 data generates five sets of 2850 constructed RNDs and five sets of 381 696 estimated option prices.Evaluation of these results can be conducted both at the level of the RND and at the level of option prices, with the latter including assessment across maturity, moneyness, and sample.Prior to this evaluation, attention must be paid to the definitions of options that are in and out of sample.The spline fit in (strike price, implied volatility) space used in the construction of the interior of the risk neutral density shown in red in the top panel.

Option sample definitions
When dealing with option pricing, the difference between in and out of sample is not clear cut if Put-Call Parity holds.
Applying the bottom half of equation ( 10) to the case of pricing a call option with strike X written on a futures contract the price of which at expiration will be drawn from the RND f (F T ) we have A valid RND will integrate to 1.0 and, in the case of futures, will have a mean equal to F t (the current futures price).
Substituting results from equation ( 40) into equation ( 39) and comparing to equation ( 10) yields the familiar Put-Call Parity equation for options on futures.
Thus, if we calculate the difference between an observed option price C and a predicted option price Ĉ from a constructed RND and Put-Call Parity holds for both the actual and predicted prices we find that a put and a call that share the same strike price will also share the identical pricing error.
Obviously, the OTM options used to create the RND (the left hand side of equation ( 1)) are definitely in sample options.However, any in the money (ITM) options that share the same strike with an OTM option will also share the exact same pricing error and therefore are not truly out of sample.For this reason we divide our options prices into three, nonoverlapping categories: In Sample, Out of Sample, and Quasi Out of Sample.In Sample options are those on the left hand side of equation ( 1).Out of Sample options are any options that are not In Sample and do not share a strike with any of the In Sample options.Quasi Out of Sample options are the those options that are not In Sample but share a strike with one of the In Sample options.

RND characteristics
Ideally the constructed RNDs would integrate to 1.0 and have a mean equal to the prevailing futures price.Calculating these integrals for the constructed RNDs is somewhat complicated by their blended nature-left and right tails attached to an interior.Given that the FTSE 100 Index is based on share prices, the theoretical support of the RND is [0, ∞) but we obtain our results by numerically integrating the integrals in equations ( 9) and ( 11) over the support [0, 1, 2, . . .12 000] using MatLab's trapz function.This interval appears to be wide enough as the tails of the RND typically fall off quite quickly.
As can be seen in the top third of table 5, most of the procedures produce RNDs with integrals that are quite close to 1.0 as most of the procedures attempt to match the values from the interior of the RND at the attachment points.Unsurprisingly, the BP procedure is the lone exception as it makes no attempt to match the values of interior RND, typically resulting in an integral that is less than 1.0.On rare occasions, the FB procedure produces a RND that integrates to well below 1.0.This is likely due to using a GEV density for the tails.As seen in equation ( 34), the support for the GEV density is (−∞, ∞) whereas our numerical integrals only use the range [0, 12 000].If the estimated GEV tail parameters leave some probability mass below zero, then the resulting integral calculated over the [0, 12 000] range will be less than 1.0.
All of the procedures produce RNDs with expected values that, on average, are less than the prevailing FTSE 100 Index futures price.In terms of RMSE, the TailHAP procedure comes closest to the futures price, followed by TailHA-PAlt.Again unsurprisingly, the BP procedure struggles in this regard because it does not attempt to incorporate any information from the interior RNCDF in its construction, thereby generating an expected value that is typically quite far below the futures price.
The bottom two-thirds of table 5 provide measures of how well the tails of the constructed RNDs match the attachment point values of the RND and the observed option prices at X 1 and X N .For the TailHAP and TailHAPAlt procedures, this information simply shows that these conditions are generally satisfied, subject to the limitations of the fsolve routine, as they were used as constraints in the construction of the RND.The same is true for the left and right CDF errors for the FB procedures.For the BP procedure, ignoring the values for the interior RND leads to large discrepancies for the mass in the left tail vis-a-vis the interior RNCDF but almost no discrepancies for the mass in the right tail.Looking across all the entries in table 5, the TailHAP procedure generates RNDs that come closest to the ideal characteristics.

Option pricing errors
Our emphasis on the ability of the RND to accurately price options is shared by Reinke (2020) who compares the RND techniques of Figlewski (2010) and Jackwerth (2004) using options on the S&P 500 Index.As in Reinke (2020), we use root mean square error (RMSE) as our measure of accuracy when calculating pricing errors.Unlike Reinke, we do not emphasize errors calculated in terms of implied volatilities.
Table 5. Risk neutral density performance across the five alternate estimation procedures for the 2850 trading days in the sample.Performance is measured in terms of: the density integrating to 1.0, the difference between the expected value of the density and the futures price for the particular contract on the trading day, the square percent error in matching the probability mass in the left and right tails, and the difference between the predicted option price and the actual option price at the attachment point.RMSE and RMSPE denote root mean square error and root mean square percent error.Using implied volatilities is certainly reasonable given that option traders routinely quote prices in terms of implied volatilities.However, when deciding whether or not a pricing error is large or small, expressing the pricing error in terms of the fundamental units in which contracts are settled (pounds sterling in the case of the FTSE 100) provides a more intuitive measure of the RND's accuracy.An option mispricing of 0.1 implied volatility percentage points is not as easily interpretable as a mispricing of £15 that can be compared to the transactions cost of taking positions to profit from the mispricing.The LIFFE options on the FTSE 100 are quoted in FTSE 100 index points, with each index point equal to £10.All of our pricing errors are reported in pounds sterling by multiplying the index point difference between the predicted and actual option price by £10.We obtain the pricing errors by numerically integrating the integrals similar to those in equation ( 10) over the support [0, 1, 2, . . .12 000] using Mat-Lab's trapz function.

Procedure
Table 6 presents statistics on overall pricing errors for each of the procedures.Given that our option prices cover a wide range from 0.5 to 4791 FTSE 100 Index points, we present pound pricing errors in both level and percentage terms.The mean values indicate whether the average error is an under or over pricing of the option.For three out of the five procedures, the options are generally overpriced, a point that we return to shortly in the discussion of table 7.
The RMSE provides the best sense of the magnitude of errors.For example, the TailHAP procedure produces an average mispricing of roughly £68 or 6.8 FTSE 100 Index points, or almost 14 ticks.Such an error is large if market participants can take positions to profit from the mispricing of these options for less than roughly £70.We also tabulate percent pricing errors, given that an error of 5 FTSE 100 Index points is quite small for an option with an observed price of 1000 index points but quite large for an observed option with a price of 6 index points.Using this metric across all five procedures, TailHAP provides the best pricing performance, but not always by a wide margin.The TailHAPt and BPAlt procedures produce percent errors that are only slightly larger.
Given its lack of attention to interior RNCDF values, the BP  2) with the implied volatility for the predicted option price from the RND.This panel allows for only a very rough comparison with Reinke's S&P 500 Index options results.Our calculations are across all 381 696 options in the sample, whereas Reinke calculates RMSE for each trading day.All five of our procedures understate implied volatility on average.In terms of RMSE volatility errors, FB has the best performance, followed closely by TailHAP and TailHAPAlt.Reinke reports average in and out of sample RMSEs across 1537 trading days for the equivalent of our FB procedure that range from 0.0176 to 0.0486 for calls and 0.1059 to 0.1673 for puts -not terribly different from the values reported in the bottom panel of table 6.
Errors can also be examined across option type, maturity, moneyness, and sample.Table 7 presents pricing errors separately for puts and calls, omitting percent pricing errors.Aside from the extremely large errors for the BP procedure, pricing errors for puts are positive (the puts are overpriced) and negative for the calls (the calls are underpriced).Because puts pay off in the event that the underlying asset price falls below the strike price, put prices are largely determined by the left side of the RND.Call prices, where the option pays off in the event that the underlying price exceeds the strike price, are largely determined by the right side of the RND.Thus, table 7 suggests that all but the BP procedure are producing RNDs with left sides that contain too much probability mass ('fat' left sides) and right sides that contain too little probability mass ('thin' right sides).More detailed tables of put and call errors across moneyness and sample indicate that puts are generally overpriced across every type of option except for deep out of the money puts that are also out of sample.† The underpricing of calls is largely the result of large underpricing errors for deep in the money calls whether they are out of sample or quasi out of sample.In short, puts are overpriced except for those with very low strikes that are out of sample.Thus, the fat left side of the RND is not exclusively the result of a fat far-left tail.Call underpricing is most pronounced for those calls with very low strike prices -also the result of a fat left † These detailed tables are available from the authors upon request.
side of the RND.The smaller put option RMSEs for the Tail-HAP and TailHAPAlt procedures seen in the second row of table 7 are evidence in favor of our methodology that matches the option price with strike K 1 = X 1 when fitting the left tail.
Errors by maturity are found in table 8. Here, the TailHAP and TailHAPAlt procedures generally have only a slightly better performance than the FB procedure at short maturities, but the advantage widens at longer maturities with the notable exception of relatively large errors for TailHAPAlt for options with greater than 300 days to maturity.As can be seen in table 9, the TailHAP procedure produces the smallest RMSEs across all the moneyness categories, most notably for Finally, and perhaps most interestingly, table 10 presents pricing errors by sample for each of the procedures.For in sample options, the TailHAP procedure returns the smallest RMSE.The out of sample results present the most stringent pricing criterion -how well do the constructed RNDs price options that played no part in the construction of the RND?Here again, the TailHAP procedure has the smallest RMSE, and by a wide margin.The same is true for the quasi out of sample options.

Conclusion
We demonstrate the viability of adding additional constraints when attaching the tails to the interior of a constructed RND.In particular, we document the feasibility of ensuring that the left and right tails correctly price the observed option prices at the attachment points, using either a Weibull (Tail-HAP) or lognormal (TailHAPAlt) functional form for the tails.We recognize that these pricing constraints, especially for the Weibull functional form, increase the numerical complexity of the RND construction.For the Weibull (TailHAP) choice, care must be taken in selecting the starting values as detailed in the Appendix.Finding the best starting values requires some patience, and even with good starting values the TailHAP procedure requires roughly twice as much computing time than does the TailHAPAlt procedure.Is the additional complexity worth the cost?We think so, based on the generally superior performance of the TailHAP procedure documented in section 5.The TailHAP procedure does best both in terms of the characteristics of the RND and the pricing of individual options.For those wishing to minimize these numerical issues, the TailHAPAlt or FB procedures may be preferred.
The advantage of the TailHAP procedure is not merely the result of using the flexible functional form of the Weibull distribution for the tails.The Weibull itself is a special case of the GEV distribution used in the FB procedure.Rather, the advantage of the TailHAP procedure stems from incorporating the information in the attachment point option prices on the location, and not just the amount, of the probability mass in the left and right tails.Even using the less flexible function form of the lognormal distribution in the TailHAPAlt procedure still results in a generally superior performance compared to the methods that do not attempt to match the prices of the options at the attachment points.
We reach several other conclusions.First, for almost all of the procedures the estimated RNDs generally overprice puts and underprice calls.Second, attaching tails without regard to values from the interior RNCDF (BP procedure) produces quite large errors that can be dramatically reduced with relatively little effort (BPAlt).Third, in a few instances the (−∞, ∞) support of the GEV creates problems for the FB procedure to return a RND that integrates to one, has a mean that matches the futures price, and accurately prices put options.On the other hand, the FB procedure produces a very smooth interior RND.Practitioners should be aware of these trade-offs.Finally, we document that Put-Call Parity complicates the determination of which options are in and out of sample in the construction of a RND.

Appendix. TailHAP starting values
As made clear in sections 3.2.1 and 3.2.2, using the Weibull distribution to price options involves the lower and upper incomplete gamma functions γ (z, M ) and (z, M ).† These functions can create problems for non-linear equation solvers such as MatLab's fsolve, as the solver can easily select parameter values such that either z < 0 and/or M ∈ C at which point the incomplete gamma functions cannot be evaluated and the solver fails.For this reason, high-quality starting values are critical in implementing the TailHAP procedure.
To determine these starting values, we first start with a simpler problem involving the lognormal density function.As is made clear in equation ( 10), put and call option prices with strikes at the observed option prices with the lowest and highest strikes are partly determined by the conditional expectations for the price of the underlying asset: E[F T | F T < X 1 ] and E[F T | F T > X T ].If the RND for the price of the underlying asset was a lognormal density, as assumed in the Black model, then it is not difficult to calculate these conditional expectations.By comparing equations ( 2) and (10), using the following properties of the lognormal distribution, and taking advantage of the fact that the mean of the lognormal density equals the futures price F, we can write the conditional expectations under the lognormal assumption as where σ 1 and σ N are the implied volatilities obtained in Step 1 of section 3.1 and μ 1 = ln(F) − σ 2 1 t 2 and μ N = ln(F) − σ 2 N t 2 .To obtain good starting values for the left Weibull tail, we use MatLab's fsolve routine to find the parameters k L0 and λ L0 such that: (1) E[F T | F T < X 1 ] evaluated under a Weibull distribution matches the value for E[F T | F T < X 1 ] calculated under the lognormal distribution as in equation (A2), and (2) the Weibull RNCDF evaluated at X 2 matches RNCDF 2 .For the right tail starting values we use MatLab's fsolve routine to find the parameters k R0 and λ R0 such that: (1) E[F T | F T > X T ] evaluated under a Weibull distribution matches the value for E[F T | F T > X T ] calculated under the lognormal distribution as in equation (A2), and (2) the Weibull RNCDF evaluated at X T−1 matches RNCDF T−1 .The conditional expectations under the Weibull distribution, following the same logic as for the lognormals, are written as k L0 † Using a GEV distribution to price the options also involves γ (z, M ) and (z, M ) as can be seen by replacing the Weibull distribution in equations ( 14) and ( 20) with the GEV distribution and proceeding with the pricing solutions.See also Markose and Alentorn (2011).

Figure 1 .
Figure 1.Top Panel: Example risk neutral density constructed using the TailHAP estimation procedure.Bottom Panel: The spline fit in (call delta, implied volatility) space used in the construction of the interior of the risk neutral density shown in red in the top panel.

Figure 2 .
Figure 2. Top Panel: Example risk neutral density constructed using the Bliss and Panigirtzoglou (2002) estimation procedure.Bottom Panel: The spline fit in (call delta, implied volatility) space used in the construction of the interior of the risk neutral density shown in red in the top panel.

Figure 3 .
Figure 3. Top Panel: Example risk neutral density constructed using the Birru and Figlewski (2012) estimation procedure.Bottom Panel: The spline fit in (strike price, implied volatility) space used in the construction of the interior of the risk neutral density shown in red in the top panel.

Table 1 .
Sequential data filters applied to arrive at the final dataset of 381 696 options used in the estimation and evaluation of risk neutral densities.

Table 3 .
Descriptive statistics for the number of options available on each trading date in the dataset of 381 696 options across 2850 trading days by: puts and calls, moneyness, and sample.OTM denotes out of the money, ATM denotes at the money, ITM denotes in the money.In sample options are those used to construct the interior of the risk neutral density.Out of sample options are those options that were not used to construct the interior of the risk neutral density and that do not share a strike with an option used to construct the interior.Quasi out of sample options are the remaining options -those options that share a strike with an in sample option.

Table 4 .
Descriptive statistics across all 19 contracts for the 381 696 options in the dataset by number of options, number of in sample options, range of the futures price, range of days to expiration, and average LIBOR interest rate.

Table 6 .
Descriptive statistics for option pricing errors (Predicted -Actual) for all 381 696 options in the dataset across the five risk neutral density estimation procedures in terms of: British pounds, British pound pricing errors in percentage terms, and implied volatility.RMSE and RMSPE denote root mean square error and root mean square percent error.

Table 7 .
Descriptive statistics for put and call option pricing errors in British pounds (Predicted -Actual) for all 381 696 options in the dataset across the five risk neutral density estimation procedures.RMSE and RMSPE denote root mean square error and root mean square percent error.

Table 8 .
Descriptive statistics for option pricing errors in British pounds (Predicted -Actual) by days to expiration for all 381 696 options in the dataset across the five risk neutral density estimation procedures.RMSE denotes root mean square error.

Table 9 .
Descriptive statistics for option pricing errors in British pounds (Predicted -Actual) by moneyness for all 381 696 options in the dataset across the five risk neutral density estimation procedures.OTM denotes out of the money, ATM denotes at the money, ITM denotes in the money, and RMSE denotes root mean square error.

Table 10 .
Descriptive statistics for option pricing errors in British pounds (Predicted -Actual) by sample for all 381 696 options in the dataset across the five risk neutral density estimation procedures.In sample options are those used to construct the interior of the risk neutral density.Out of sample options are those options that were not used to construct the interior of the risk neutral density and that do not share a strike with an option used to construct the interior.Quasi out of sample options are the remaining options -those options that share a strike with an in sample option.RMSE denotes root mean square error.