Zero-coupon interest rates: Evaluating three alternative datasets

The zero-coupon yield curve is a common input for most financial purposes. We consider three popular yield curve datasets and explore the extent to which the choice of dataset for a particular application may affect the results. Many term structure papers evaluate alternative models for estimating zero-coupon yield curves based on their ability to replicate bond prices. In this paper, however, we take a step forward by analyzing the consequences of using these alternative datasets for estimates of other moments and variables, such as interest rate volatilities or the resulting forward rates and their correlations. After finding significant differences, we also explore the existence of volatility spillover effects among the three datasets. Finally, we illustrate the relevance of the choice of a particular dataset by examining the differences that may arise when testing the expectations hypothesis. In the conclusions, we provide guidance to end users on selecting a particular dataset.
ARTICLE HISTORY Received 21 November 2018 Accepted 10 June 2019


Introduction
The zero-coupon yield curve is a key input for many financial and economic purposes. The risk-free interest rates for different maturities determine the current value of future nominal payments. They provide the benchmark for pricing all fixed income securities and their derivatives. Since the valuation of these securities must fit an empirically observable term structure of interest rates, this input is essential for the valuation process. The term structure is also used in the calibration of fixed income valuation models, Greek calculations, risk measurement, and the design of hedging strategies. The shape of the zero-coupon yield curve is relevant to macroeconomic theory, particularly in the area of monetary economics (… Tristani, & Vestin, 2008). As is well known, macroeconomic factors and yield curve factors interact strongly. The short end of the yield curve tends to covary with the monetary policy instruments under the control of the central bank (see, e.g., Taylor, 1993). The average level of the yield curve is usually associated with the rate of inflation and the slope with the business cycle (see, e.g., Diebold, Rudebusch, & Aruoba, 2006). Macroeconomic factors can be extracted simultaneously from a dynamic factor model for the zero-coupon yield curve using the dynamic Nelson and Siegel (1987) yield curve framework (see, e.g., Diebold & Li, 2006).
A preliminary empirical analysis uses intraday trading data on US Treasury securities to compute their yields to maturity and compares them with the yield curves extracted from the three zero-coupon interest rate databases. We select two dates on which fitting a yield curve is particularly complicated: one with a structure of yields to maturity showing two humps (July 5, 2006) and another with severe liquidity problems (September 9, 1999). We observe that each curve misprices a specific maturity tranche (the very short end for FRB and maturities between 2 and 7 years for DoT), convexity problems at the long end for FRB, and overfitting problems with sawtooth patterns for DoT and F082.
Since the three databases are estimated from different assets, we cannot compare their ability to replicate observed market prices. Instead, we measure the differences among the three yield curves by analyzing the following variables: (1) the volatility term structure, i.e., the volatility of zero-coupon rates at different maturities; and (2) the correlations among forward interest rates. On the one hand, the term structure of volatilities is a necessary input for calibrating many interest rate models, particularly the so-called "volatility consistent models". This category includes models such as the Black, Derman, and Toy (1990) model, one of the most popular tools among practitioners, and some extended versions of the Hull and White (1994) model. In these models, the current term structure of interest rates is given exogenously. Many financial problems, such as product valuation, risk measurement, and hedging strategies, depend on spot interest rate volatilities and correlations. For instance, Value-at-Risk depends, above all, on the volatility of spot interest rates and their correlations.
On the other hand, forward rate correlations play a central role in dynamic term structure models (see, e.g., Cox, Ingersoll, & Ross, 1985; Ho & Lee, 1986; Black et al., 1990; Heath, Jarrow, & Morton, 1992; and the LIBOR market model). They are a key input for pricing financial products such as interest rate caps, floors, and swaptions. They are also particularly relevant to macroeconomic issues, such as the ability of forward rates to capture market expectations about future short-term interest rates.
The first step of our empirical analysis focuses on the temporal dynamics of the volatilities estimated from standard deviations over 60-day rolling windows and from a conditional model. We use daily yield curve estimates for the three datasets, i.e., DoT, FRB and F082, in the period 1994-2017. The descriptive statistics of the volatility term structure (VTS) estimates point to important differences for terms of one year or less, with F082 being the most volatile. Using various tests for significant differences in mean, median, and variance between non-overlapping observations from the VTS time series, we find that the behavior of the F082 series differs from that of the other two series. The statistically significant differences between DoT and FRB are concentrated at the short end of the curve, although they also appear at intermediate maturities. The results are similar regardless of the method used to estimate volatility. Finally, an analysis of cross-dataset volatility spillovers shows that the more flexible fitting of F082 can anticipate movements in the other two series.
In a second step, we analyze correlations between forward rates. We again apply a preliminary descriptive analysis and different tests of equality of means, medians, and variances between the pairwise correlation series. From a theoretical point of view, the long-term forward correlations of the DoT series are the best behaved. In addition, the correlations from the FRB and F082 series are much more unstable.
Our final analysis replicates a simple classical test of the expectations hypothesis, specifically the analysis presented by Campbell (1995). We observe that the result of this test would have differed depending on the sample of interest rates used. The DoT dataset yields the results that differ most from those obtained with the other two datasets, especially at the shortest maturity.
Our analysis shows that there is no optimal yield curve dataset. Nevertheless, we provide some guidance on the criteria that should be used to choose among alternative datasets. End users of the yield curve belong to different worlds, namely academia and industry, and so they address different issues and have different goals. The DoT dataset provides on-the-run yield curves with rigid shapes that could be especially appropriate for pricing liquid short-term securities, for calibrating interest rate models, and for other purposes in macroeconomic research, monetary policy, or the analysis of investors' risk preferences. The FRB dataset is the only one that provides a continuous yield curve without interpolation. These flexible off-the-run yield curves generate low volatility levels for most maturities, which could be suitable for pricing medium- and long-term bonds and for market risk management purposes. Finally, the F082 yield curves provide an accurate view of the current full market situation.

Alternative yield curve datasets

2.1. Description
In this paper, we use daily data on three of the most popular risk-free interest rate datasets from January 1994 through December 2017. These datasets contain daily estimates of the term structure of zero-coupon interest rates for the US Treasury market. Two of them are publicly available from the Federal Reserve Board's website. As a private alternative, we also consider a dataset available only through a financial data vendor: the Bloomberg zero-coupon yield curve from government securities (code F082).
The US Department of the Treasury fits daily yield curves from observations of either the yields on the most recently auctioned (on-the-run) Treasuries or the yields on interpolated constant-maturity series (see Jordan & Mansi, 2003). These yield curves are published simultaneously on the Federal Reserve Board's website as the "Treasury Constant Maturities" dataset, which is included in the "H.15 Selected Interest Rates" data release, and on the US Department of the Treasury's website ("Daily Treasury Yield Curve Rates"). Accordingly, we refer to this dataset as Department of Treasury (DoT). More difficult to find but even more frequently used, another daily zero-coupon yield curve is offered on the Federal Reserve Board's website in the "Finance and Economics Discussion Series" section. It is linked to the working paper by Gürkaynak, Sack, and Wright (2007). For convenience, we refer to this latter set of interest rates as "the FRB dataset"; however, the website states: "Note: This is not an official Federal Reserve statistical release." Table 1 summarizes some of the main characteristics of these three datasets. Their fitting techniques are completely different, and each method provides yield curves with a different degree of flexibility (see, e.g., Bliss, 1996; Bolder & Gusba, 2002; Bolder et al., 2004; Yallup, 2012). Some techniques are better at describing the hump so frequently observed in yield curves or the behavior of long-term interest rates. Others are more rigid in fitting the current yield curve, but they produce more stable time series of spot rates.
First, the FRB dataset is fitted by applying a weighted version of the well-known, widely used, parsimonious parametric procedure proposed by Svensson (1994), an extension of the Nelson and Siegel (1987) method. It extends the flexibility and the range of shapes generally seen in yield curves: monotonic forms, humps at various areas of the curve, and S-shapes. As usual, they make the variance of the error term proportional to the modified duration of each security, which penalizes valuation errors on short-term securities more heavily. In this manner, a better fit at the short end of the yield curve is enforced. Second, the DoT dataset uses a quasi-cubic Hermite spline function that passes exactly through the yields of a given set of securities with constant maturities (see Jordan & Mansi, 2000). The theoretical security for each maturity is calculated from composites of quotations obtained by the Federal Reserve Bank of New York. Therefore, the DoT does not estimate zero-coupon yield term structures but instead uses simple yield curves relating yields to maturity to terms to maturity. Third, the F082 dataset uses a piecewise linear function, although Bloomberg reports no further details about the specific function. The second source of discrepancies is the cross-sectional market data used as input for the estimations. The FRB uses end-of-day prices but does not identify the specific prices employed: there is no information indicating whether they are quoted prices (mid bid-ask, bid, or ask prices) or trading prices (last trading price, average daily price, or other prices). The DoT is based on the closing market bid-side yields on actively traded Treasury securities in the over-the-counter market, with quotes obtained at or near the 3:30 PM close each trading day. The F082 dataset uses "Bloomberg generic prices", supplemental proprietary contributor prices, or both.
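To make the Svensson (1994) functional form concrete, the sketch below evaluates the continuously compounded zero-coupon yield implied by the six parameters (β0, β1, β2, β3, τ1, τ2), together with a duration-weighted sum of squared pricing errors reflecting the weighting just described (error variance proportional to modified duration, hence weights of one over duration). This is a minimal Python sketch under our own assumptions: the function names are hypothetical, the parameters are taken in the same units as the yield, and the nonlinear optimizer needed to actually estimate the parameters is omitted. It is not the FRB's estimation code.

```python
import math


def svensson_zero_yield(tau, beta0, beta1, beta2, beta3, tau1, tau2):
    """Zero-coupon yield at maturity tau (years) under the Svensson (1994) form.

    y(tau) = beta0 + beta1 * L1 + beta2 * (L1 - exp(-tau/tau1))
                   + beta3 * (L2 - exp(-tau/tau2)),
    with L_k = (1 - exp(-tau/tau_k)) / (tau/tau_k).
    """
    if tau <= 0:
        # Limit as tau -> 0: L_k -> 1 and both hump terms -> 0.
        return beta0 + beta1
    x1, x2 = tau / tau1, tau / tau2
    l1 = -math.expm1(-x1) / x1          # slope loading on beta1
    l2 = -math.expm1(-x2) / x2
    hump1 = l1 - math.exp(-x1)          # first hump loading (beta2)
    hump2 = l2 - math.exp(-x2)          # second hump loading (beta3)
    return beta0 + beta1 * l1 + beta2 * hump1 + beta3 * hump2


def duration_weighted_sse(pricing_errors, durations):
    """Sum of squared pricing errors, each divided by the security's modified duration.

    If the error variance is proportional to duration, dividing by duration is the
    GLS weighting, so short-duration securities receive larger weights.
    """
    return sum(e * e / d for e, d in zip(pricing_errors, durations))
```

For example, with hypothetical parameters β0 = 4 and β1 = -1 (in percent), the curve starts at β0 + β1 = 3% for very short maturities and tends to β0 = 4% for very long ones, whatever the hump parameters.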
The third, and probably the main, source of divergence among the datasets is the basket of assets included in each daily yield curve estimation. The FRB excludes from the estimation all Treasury bills, the on-the-run and "first-off-the-run" issues of bonds and notes, bonds with less than three months to maturity, bonds with embedded options, twenty-year bonds since 1996, and "other issues that [they] judgmentally exclude on an ad hoc basis." Most daily estimates incorporate more than one hundred observations. Conversely, the DoT considers only on-the-run securities, including four maturities of the most recently auctioned bills (4-, 13-, 26- and 52-week), six maturities of just-issued bonds and notes (2-, 3-, 5-, 7-, 10- and 30-year), and a composite rate in the 20-year maturity range. Note that because they consider on-the-run bills and bonds, the resulting yield curve can be interpreted as a par yield curve, since these just-issued assets trade near par. However, coupon bias and forward rate bias may appear and, with them, differences between zero-coupon interest rates and par yields, especially for long maturities. It should be noted that the DoT daily yield curves are obtained from only eleven daily observations. Finally, the third dataset (F082) considers all outstanding Treasury bonds. In this case, the number of daily observations exceeds two hundred.
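The gap between par yields and zero-coupon rates mentioned above can be seen with a small calculation. Assuming a continuously compounded zero curve r(t) and semiannual coupons (both our assumptions for illustration; the function name is hypothetical), the par coupon c for maturity T solves (c/2) Σ P(t_k) + P(T) = 1, where P(t) = exp(-r(t) t) is the discount factor at each coupon date t_k.

```python
import math


def par_yield(zero_rate, maturity, freq=2):
    """Coupon rate (decimal, paid freq times a year) that prices a bond at par.

    zero_rate(t) must return the continuously compounded zero rate for time t (years).
    """
    n = int(round(maturity * freq))
    times = [k / freq for k in range(1, n + 1)]
    discount = [math.exp(-zero_rate(t) * t) for t in times]
    # Par condition: (c / freq) * sum(P(t_k)) + P(T) = 1, solved for c.
    return freq * (1.0 - discount[-1]) / sum(discount)
```

With a flat curve, the par yield is just the zero rate re-expressed with semiannual compounding. With an upward-sloping zero curve, the 30-year par yield, once converted to the same compounding convention, lies below the 30-year zero rate because the coupons are discounted at lower short- and medium-term rates; this is the coupon effect that separates par curves such as the DoT's from zero-coupon curves.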
The three datasets provide interest rates for several discrete maturities, ranging from 1 month (DoT), 3 months (F082), or 1 year (FRB) up to 30 years (all the datasets). A relevant feature of the FRB dataset, useful for purposes such as pricing and hedging, is that it includes the six Svensson model parameters resulting from each daily estimation, which makes it possible to compute the zero-coupon interest rate for any required maturity immediately. Obtaining non-reported maturities from the other two datasets requires an interpolation method. This interpolation adds noise to the estimated interest rates, which end up being doubly interpolated: once by the original fitting method and once by the user.
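The interpolation step for the DoT and F082 datasets can be sketched as follows. Since the exact cubic scheme is not specified, this plain-Python natural cubic spline (second derivative set to zero at both ends) is only one possible choice, and the function name is hypothetical. It reproduces the reported rates exactly at the reported maturities and refuses to extrapolate outside them.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing (e.g., maturities in years); ys are the rates.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; m[0] = m[n] = 0 (natural spline)
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination ...
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        # ... then back substitution.
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("maturity outside the range of reported maturities")
        i = min(bisect_right(xs, x) - 1, n - 1)  # segment with xs[i] <= x <= xs[i+1]
        a, b, hi = xs[i], xs[i + 1], h[i]
        return (m[i] * (b - x) ** 3 / (6 * hi) + m[i + 1] * (x - a) ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * (b - x)
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * (x - a))

    return spline
```

Any rate obtained this way inherits both the original fitting method and the spline's own shape, which is the double interpolation noted above.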
Recent episodes of extraordinarily low interest rates have resulted in negative yields for some Treasury securities on the secondary market. The DoT considers these negative yields unrelated to the time value of money; consequently, negative input yields are reset to zero percent before being used in the yield curve derivation. By contrast, we observe some negative interest rates for the shortest maturities in both the FRB and the F082 datasets. Table 2 summarizes the descriptive statistics for the time series of continuously compounded spot rates we obtain for the three datasets. In the case of the DoT dataset, we recalculate the reported yield as a continuously compounded interest rate. According to the information on the DoT website, the market yields are investment yields or bond equivalent yields, computed with simple interest and the actual/actual day count convention. In addition, the 1-month interest rate is not reported in the F082 dataset. This is also the case for the DoT dataset from the beginning of our sample period (January 1996) up to July 2001. In both cases, we estimate the corresponding interest rate by cubic interpolation. The table shows a prevalent positive slope of the yield curve. The average spot rate increases with time to maturity, from 2.22% for the 1-month rate to 4.63% for the 30-year maturity. The volatility of the time series increases from approximately 2.13% for 1-month rates to 2.22% for 6-month rates, and then declines to 1.24% for the 30-year spot rate. Comparing the datasets, we highlight the difference in average levels. As we discuss in the next section, DoT and FRB consider completely different Treasury securities according to their liquidity. The DoT dataset considers only the most liquid assets, the on-the-run ones, whereas the FRB dataset excludes all bills and the on-the-run and first-off-the-run bonds. The F082 dataset includes all Treasury securities.
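As a sketch of the recalculation described above, a yield y quoted with simple interest over a horizon of τ years corresponds to the continuously compounded rate r = ln(1 + yτ)/τ. The code assumes decimal yields and approximates the actual/actual convention by dividing the day count by 365; both choices, and the function name, are our assumptions. A coupon security quoted as a semiannual bond-equivalent yield would instead convert as r = 2 ln(1 + y/2).

```python
import math


def simple_to_continuous(y, days_to_maturity, days_in_year=365):
    """Convert a simple-interest annual yield (decimal) into a continuously compounded rate.

    days_in_year=365 is an assumed stand-in for the actual/actual convention.
    """
    tau = days_to_maturity / days_in_year
    # Simple interest: growth factor 1 + y * tau; continuous: exp(r * tau).
    return math.log1p(y * tau) / tau
```

For positive yields the continuously compounded rate is always slightly below the quoted simple yield; for example, a 5% simple yield over one year corresponds to ln(1.05), about 4.88%.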
As expected, the yearly average spot rates for the DoT are lower than the corresponding rates for the FRB across the whole maturity spectrum.

Implications of the basket of assets
From the previous section, we can highlight that the inputs used to fit the yield curves provided by the three risk-free interest rate datasets are plainly different, both in their characteristics and in the number of securities included, which ranges from eleven to more than two hundred. In this section, we discuss some of the potential implications of these disparities. All three datasets use exclusively US Treasury securities as input, but these assets have some important features with significant effects on yields. The main differentiating characteristic is liquidity. Other relevant features to consider are potential market segmentation, the number and maturity spectrum of the securities considered, and optionality.
Liquidity is a key factor in the pricing of fixed income securities. Since the seminal work by Amihud and Mendelson (1991), a large literature has examined how a security's liquidity is priced in Treasury markets. The observed price differences imply that market participants price liquidity: investors are willing to pay a higher price for liquid assets. In other words, more liquid securities trade with a liquidity premium that implies a higher price and therefore a lower yield to maturity. Sarig and Warga (1989) show that the "on-the-run" or "just-issued" security is by far the most liquid bond and trades with a liquidity premium on prices. Krishnamurthy (2002), Goldreich, Hanke, and Nath (2005), and Díaz and Escribano (2017) examine the on-the-run/off-the-run spread and the regular, well-defined cycle of trading activity throughout the lifetime of US Treasury bonds.
In this sense, the criteria employed by the DoT and the FRB datasets to select the Treasury assets used in their estimations are completely opposite. The DoT includes only on-the-run assets, whereas the FRB excludes on-the-run and first-off-the-run assets. The on-the-run bond generally has a higher price than earlier (off-the-run) issues that mature on similar dates. The literature proposes three reasons to explain this behavior: liquidity, transaction costs, and repo specialness. High demand for the on-the-run security from institutional investors increases its price in the cash market. These investors choose to hold these liquid securities because they can sell them more quickly and without large losses. Although off-the-run bonds are less expensive than on-the-run bonds, investors consider them difficult to find and scarce in the market. On-the-run Treasuries are also attractive securities for creating short positions: they can easily be sold short and later easily repurchased in the repo market. Simultaneously, long investors are willing to pay a higher price for securities that they can lend at a premium to short-sellers in the repo market. These assets are frequently traded "on special", i.e., they can be used as collateral to borrow money at a rate below the prevailing general repo rate, either because they are more liquid or because of their scarcity due to limited supply or a short squeeze. Positive specialness is generally considered a signal of greater "market desirability" or of a relatively scarce supply of the specific instrument used as collateral in the repo contract. Specialness in the repo market may cause on-the-run securities to trade at a premium. Investors are willing to pay something extra for a Treasury bill because of its characteristics as a security.
The supply of and demand for Treasury securities, market-level illiquidity, macroeconomic shocks, and monetary policy decisions may influence government bond liquidity. Among others, Krishnamurthy (2002) and Banerjee and Graveline (2013) highlight that supply and demand effects can have important consequences in the Treasury market. Reductions in the supply of Treasuries lower the yield to maturity on Treasuries relative to corporate securities, which are less liquid and riskier than Treasuries. Fontaine and García (2012) stress the importance of funding liquidity, or funding conditions in the repo market, as an aggregate risk premium in the Treasury market. They suggest that changes in monetary aggregates and in bank reserves are key determinants of the liquidity premium. Fleming (2003) and Longstaff (2004) show the existence of a significant flight-to-quality and flight-to-liquidity component in Treasury bond prices. There is a premium related to market sentiment and to the amount of funds that flow into equity and money market mutual funds. In times of adverse economic and financial conditions, a greater demand for liquidity increases the prices of liquid Treasury securities by more than usual. An increased perception of market risk may widen the spread between on-the-run and off-the-run government bonds.
Concern about segmented markets is the reason given by Gürkaynak et al. (2007) for excluding all Treasury bills from the estimation of the FRB dataset. Duffie (1996) shows that Treasury bill rates can exhibit idiosyncratic variations that seem distinct from the other, longer-dated segments of the Treasury market. These short-term assets often trade with a convenience yield because they are used by large investors to adjust their interest rate risk exposure. Both the DoT and the F082 include Treasury bills in their basket of assets.
The maturity spectrum is a relevant aspect of the fitting process of a yield curve. The relative number of assets included in the different intervals of the maturity spectrum determines the quality of the fit and the resulting shape of the curve. First, this is especially true of the decisions about whether to include assets with maturities shorter than one year or longer than ten years. In this sense, four of the eleven maturities considered by the DoT fall in the first range, and they are Treasury bills, whereas only two of the maturities are longer than ten years. In contrast, the FRB excludes all bills regardless of maturity and bonds with fewer than three months to maturity, but includes almost all straight bonds and notes. Second, the resulting shapes of the yield curve can be much more complex when the number of assets is high enough. The DoT estimates use only eleven observations, which yields simple shapes. Fitting one hundred observations in the case of the FRB, or even twice this number in the case of F082, produces much more complex yield curves. Third, long-term bond prices are extremely sensitive to small changes in interest rates. Thus, a large proportion of these bonds can force a fine fit at the long end of the yield curve at the cost of accuracy at the shortest maturities.
Finally, the F082 dataset includes non-straight bonds in its basket of assets. During the sample period, a group of old 30-year callable Treasury bonds was outstanding. It is well known that optionality affects prices depending on the moneyness of the embedded option. During some periods, these bonds traded at extremely high yields to maturity.

An illustration
In this section, we attempt to illustrate the implications of using either different models and techniques to fit the yield curve or different baskets of assets as input to the estimation process. To highlight the differences, we choose two dates on which the fitting process is particularly intricate. First, we examine the shape of the yields to maturity of the Treasury securities traded during the day. We obtain intraday US Treasury security quotes and trades for all issues from the GovPX database. GovPX consolidates and posts real-time quote and trade data from six of the seven major interdealer brokers. The reported price is a discount rate on an actual/360 basis. We recalculate the yield to maturity as a continuously compounded interest rate using the actual/actual convention.

We begin with July 5, 2006. On that day, the shape of the observed yields to maturity of the securities traded in the US Treasury market was complex enough to expose the problems related to the higher or lower flexibility of the alternative models and methodologies used in each database. Figure 1 depicts the original yields to maturity of the traded bonds, notes, and bills in the US Treasury market together with the three alternative yield curve estimates. The yield to maturity of each security traded in the market is obtained from the market prices reported by GovPX. Blue dots represent the observed yields to maturity of these securities, and the lines represent the estimates of the term structure of zero-coupon interest rates from the three alternative datasets. On July 5, 2006 (Panel A), yields to maturity showed a double hump, a phenomenon often observed in the US market that many fitting techniques find difficult to capture. Mispriced zero-coupon interest rates can be observed in several maturity tranches.
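The GovPX conversion mentioned above can be sketched for a Treasury bill. A bank discount rate d on an actual/360 basis gives a price per unit of face value P = 1 - d·n/360 for n days to maturity, and the continuously compounded yield is r = -ln(P)/τ with τ = n/365. Using 365 days as a stand-in for actual/actual and restricting the sketch to bills are our assumptions, and the function name is hypothetical; coupon securities would require a full yield-to-maturity solve that is not shown.

```python
import math


def bill_discount_to_continuous(d, days_to_maturity, days_in_year=365):
    """Continuously compounded yield of a Treasury bill quoted as a bank discount rate.

    d: discount rate (decimal) on an actual/360 basis.
    days_in_year=365 is an assumed approximation of the actual/actual convention.
    """
    price = 1.0 - d * days_to_maturity / 360.0   # price per unit of face value
    tau = days_to_maturity / days_in_year
    return -math.log(price) / tau
```

Because the discount rate is applied to face value rather than to price, and over a 360-day year, the resulting yield exceeds the quoted discount rate: a 90-day bill quoted at a 5% discount has a continuously compounded yield of roughly 5.1%.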
At the very short end of the curve, the FRB curve does not fit the yields to maturity (for instance, it provides a 1-month spot rate of approximately 5.47%, while the observed yield to maturity is 4.86%). This result is surprising because the FRB uses a generalized least squares version of the Svensson model, which should be particularly well suited to fitting the short end of the yield curve. The absence of Treasury bills from the FRB sample is the most plausible reason for this outcome. In addition, the DoT curve clearly shows mispriced zero-coupon interest rates for maturities between 2 and 7 years.
Finally, a convexity problem appears at the long end of the term structure. Nonparametric models usually supply long-term zero-coupon interest rates that imply implausible forward rates. It is well known that forward rates are very sensitive to the slope of the yield curve, particularly at the very long end. Therefore, zero-coupon yield curves should be asymptotically flat to provide sensible forward rates. In any event, even parametric models such as those of Svensson or Nelson and Siegel cannot avoid this problem. Gürkaynak et al. (2007) note that convexity makes it difficult to fit the entire term structure of securities, especially securities with maturities of twenty years or more. They maintain that convexity tends to pull down the yields of very long-term securities, giving the yield curve a concave shape at the long end. This reported problem in the FRB dataset generates three inconsistencies. First, long-term bonds cannot be priced from these zero-coupon yield curves; that is, their prices cannot be replicated from the zero-coupon interest rate estimates provided by the FRB. Second, for sufficiently long maturities, instantaneous forward rates become negative. Third, the β0 parameter of the Svensson model should capture the long-run level of interest rates. If we regard estimated values of this parameter below 0.01% as economically meaningless, this constraint is binding on 19% of the dates in the FRB dataset from 1996 to 2017.

As a second example, we choose September 9, 1999 (see Panel B of Figure 1) as a clarifying illustration of the problems related to the impact of liquidity on the yield curve estimate. We can see a wide gap in yields to maturity between on-the-run and off-the-run bonds. In this case, the data choice is crucial.
Depending on this decision, we are actually estimating different interest rates: the spot rates corresponding to the average market liquidity level (F082), the spot rates of the most liquid securities (DoT), and the spot rates of seasoned bonds (FRB). Not only should the levels of these interest rates differ, but their volatilities and the correlations among forward rates are also likely to differ.
Another important difference among the three datasets can be clearly observed in Panel B of Figure 1. DoT and F082 are far from being smooth curves. Instead, both methods show an overfitting problem for this date. The sawtooth pattern that these two models provide generates serious inconsistencies in the forward rate term structure. Figure 2 plots the implied instantaneous forward rates derived from each dataset for both dates. As we can see, forward rates are extremely sinuous. In addition, the downward shape of the yield curves in the FRB case produces extremely low and even negative forward rates at the long end of the yield curve.
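Two forward-rate calculations underlie the discussion above; they are sketched here in Python under our own assumptions, with hypothetical function names. For curves known only at discrete maturities, the instantaneous forward rate at n years is proxied, as in the caption of Figure 2, by the 1-week forward rate starting n years ahead, computed from continuously compounded zero rates as f = [r(n+Δ)(n+Δ) - r(n)n]/Δ with Δ = 1/52. For the FRB curve, the Svensson parameters give the instantaneous forward rate in closed form, f(τ) = β0 + β1 e^(-τ/τ1) + β2 (τ/τ1) e^(-τ/τ1) + β3 (τ/τ2) e^(-τ/τ2), which tends to β0 at long maturities. A small helper also counts how often an estimated β0 sits at a floor of 0.01 (β0 assumed to be in percent), one way of measuring how often the long-run level constraint discussed above is binding.

```python
import math

WEEK = 1.0 / 52.0  # forward horizon used as a proxy for the instantaneous rate


def forward_proxy(zero_rate, n, horizon=WEEK):
    """Continuously compounded forward rate over [n, n + horizon] implied by zero_rate(t)."""
    t1, t2 = n, n + horizon
    return (zero_rate(t2) * t2 - zero_rate(t1) * t1) / horizon


def svensson_forward(tau, beta0, beta1, beta2, beta3, tau1, tau2):
    """Closed-form instantaneous forward rate of the Svensson curve (units of the betas)."""
    x1, x2 = tau / tau1, tau / tau2
    return (beta0 + beta1 * math.exp(-x1) + beta2 * x1 * math.exp(-x1)
            + beta3 * x2 * math.exp(-x2))


def share_at_floor(beta0_series, floor=0.01, tol=1e-12):
    """Fraction of dates whose estimated beta0 is at (or below) the floor."""
    hits = sum(1 for b in beta0_series if b <= floor + tol)
    return hits / len(beta0_series)
```

Because the instantaneous forward satisfies f(t) = r(t) + t·r'(t), small wiggles in an interpolated zero curve are multiplied by the maturity t, which is why sawtooth zero curves produce erratic forward rates at long maturities. With hypothetical Svensson parameters β0 = 0.01, β1 = β2 = 0, β3 = -5, τ1 = 1.5 and τ2 = 20, `svensson_forward(40, ...)` is about -1.34, i.e., a negative long-end forward rate.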

Analysis of the time series of alternative yield curve datasets
As we have seen, the three popular zero-coupon yield curve estimates differ considerably in various aspects: the model and methodology employed to fit the yield curves, the market variables used as inputs (prices or yields), and the baskets of assets considered in each sample. Consequently, a direct comparison of the goodness of fit of these spot rate datasets is not possible. Previous literature usually compares the ability of different yield curve fitting techniques to replicate bond market prices or to fit arbitrage-free dynamic term structure models; those authors apply different methods to a single set of market prices. In our case, the original datasets of market prices used by DoT, FRB and F082 are unavailable and clearly different. We observe different levels and shapes of the yield curves, but there is no way to determine which yield curve is best. Instead, we study the potential consequences of using each of these alternative interest rate datasets for the estimates of other variables that rely on the time series of interest rates: the volatility term structure (VTS) and some correlations between pairs of forward rates. Both variables play a key role in valuation models of interest rate derivatives.
Figure 2. The impact on the instantaneous forward interest rates. The DoT and F082 datasets provide interest rates for certain fixed maturities. We use a simple cubic interpolation technique to build continuous yield curves for these two datasets. For each maturity n, the corresponding instantaneous forward interest rate is proxied by the 1-week rate beginning n years ahead. Panel A: July 5, 2006. Panel B: September 9, 1999.

From the three zero-coupon interest rate datasets, we extract spot rates for the 11 maturities reported by the DoT. These maturities range from one month up to 30 years. These spot rates are the input for two alternative methods to estimate the VTS. First, we calculate simple standard deviations over 60-day rolling windows of the log-differences of the spot rates for each maturity. We refer to the resulting annualized volatilities as "historical volatilities." Second, we examine the fit of a set of the most common specifications from the well-known family of conditional volatility models. We consider a standard GARCH(1,1) model because it is traditionally used to estimate the volatility of daily interest rates (see, e.g., Longstaff & Schwartz, 1992). We also consider the EGARCH model family proposed by Nelson (1991). Among others, Hamilton (1996) and Andersen and Benzoni (2007) have documented that an EGARCH representation of conditional yield volatility provides a convenient, successful, and parsimonious model of the conditional heteroskedasticity in short-term interest rate time series. According to the Schwarz and Akaike information criteria (SIC and AIC, respectively), we choose the EGARCH(1,1) model to estimate interest rate volatility. Table 3 summarizes descriptive statistics of the VTS estimates from the three interest rate datasets.
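The two volatility estimators can be sketched as follows, with hypothetical function names. The historical estimator takes the sample standard deviation of daily log-differences of a spot-rate series over a 60-day rolling window and annualizes it. The square-root-of-252 scaling is our assumption, since the annualization factor is not stated, and log-differences require strictly positive rates (a series containing zero or negative rates would need a different transformation). With N daily rates the function returns N - 60 estimates, consistent with the 5,497 trading days and 5,437 volatility estimates reported for the sample. The EGARCH(1,1) part only filters conditional variances through Nelson's (1991) recursion for given parameters; estimating those parameters by maximum likelihood is not shown.

```python
import math
import statistics


def historical_volatility(rates, window=60, periods_per_year=252):
    """Annualized rolling standard deviation of daily log-differences of spot rates.

    Returns len(rates) - window estimates; every rate must be strictly positive.
    periods_per_year=252 is an assumed annualization factor.
    """
    log_diffs = [math.log(b / a) for a, b in zip(rates[:-1], rates[1:])]
    scale = math.sqrt(periods_per_year)
    return [statistics.stdev(log_diffs[i - window:i]) * scale
            for i in range(window, len(log_diffs) + 1)]


def egarch_variances(residuals, omega, alpha, gamma, beta, initial_variance):
    """Conditional variances from an EGARCH(1,1) recursion with given parameters.

    ln s2[t] = omega + beta * ln s2[t-1] + alpha * (|z[t-1]| - E|z|) + gamma * z[t-1],
    where z = residual / s and E|z| = sqrt(2/pi) for a standard normal z.
    Element t of the result is the variance for residuals[t] given data up to t-1.
    """
    expected_abs_z = math.sqrt(2.0 / math.pi)
    log_var = math.log(initial_variance)
    variances = [initial_variance]
    for eps in residuals[:-1]:
        z = eps / math.exp(0.5 * log_var)  # standardized residual
        log_var = omega + beta * log_var + alpha * (abs(z) - expected_abs_z) + gamma * z
        variances.append(math.exp(log_var))
    return variances
```

Modelling the log of the variance keeps the conditional variance positive without parameter constraints, and the gamma term lets positive and negative rate shocks affect volatility asymmetrically.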
Both the historical method (panel A) and the conditional method (panel B) produce quite similar values for all the descriptive statistics, except at the short end of the term structure. For maturities up to 6 months, the conditional method produces time series with higher values for all the statistics, particularly for excess kurtosis. These highly leptokurtic volatilities indicate the presence of extreme values. However, we are not interested in comparing volatility estimation methods but only in examining the potential implications of using different interest rate datasets to compute the VTS. The most remarkable outcome is the observed differences among the three datasets in the estimation of short rate volatilities. There are clear differences in all the reported statistics for maturities of one year or less. Except for the 1-month maturity, the F082 dataset shows higher volatility estimates in this tranche for both the historical and the conditional methods.
Given that we use an extremely large daily dataset covering a 22-year period, the smooth values we obtain for all the statistics at maturities above one year are to be expected. To test whether there are statistically significant differences among the three datasets, we compute parametric and non-parametric tests for equality of distributions. Since the VTS series estimated as historical volatilities and as conditional volatilities behave similarly, we focus on the analysis of historical volatilities. 18 Because volatility is computed as an annualized standard deviation over a 60-day rolling window, consecutive estimates share 59 of their 60 observations, so the time series is serially autocorrelated by construction. Therefore, in the equality tests we use non-overlapping observations, i.e., we consider only one out of every 60 volatility observations (5,497 trading days, 5,437 volatility estimates and 96 observations per sample and maturity).
For each pair of time series, we run a standard t-test of equality of means, a common non-parametric sign test (binomial test), a signed-rank test of equality of medians, and the Levene (1960) test of equality of variances. Table 4 shows the results for the pairs DoT versus FRB, F082 versus FRB, and DoT versus F082, where the first dataset is labeled "X" and the second "Y". We compute the X/Y ratio at time t by dividing the volatility level from dataset X at time t by the corresponding volatility from dataset Y at time t. The table reports the average X/Y ratio and the percentage of times the ratio exceeds unity. We compute the t-test, the sign test and the signed-rank test under the null hypothesis that the X/Y ratio is equal to unity, i.e., that there is no significant difference between the volatility levels of the X and Y datasets. We also compute the Levene test of equal variances for each pair of volatility series. Table 4 shows clear evidence of statistically significant discrepancies for most maturities. The different behavior of the volatility time series obtained from the F082 dataset is statistically significant for nearly the entire maturity spectrum. Differences in the volatility level of the zero-coupon interest rates computed from the FRB and DoT datasets are significant only at the short end of the yield curve (1- and 3-month), in the central range of maturities (3- and 5-year), and at the long end (20- and 30-year). The null hypothesis of equal means and equal medians between DoT and FRB cannot be rejected for key maturities of the VTS, i.e., maturities from 6 months to 2 years and from 7 to 10 years. For the variance of the volatility, we also observe statistically significant differences among the three interest rate datasets at the shortest maturities. These dissimilar behaviors of the VTS computed from the three datasets have important implications for practitioners, academics, supervisors, and policy makers. As mentioned above, the VTS is a key variable in a multitude of studies in fields such as financial economics, macroeconomics and monetary policy. For instance, interest rate models are calibrated by fitting the model parameters to VTS market data, and these models are then used to price and assess fixed income securities and their derivatives. In addition, Value-at-Risk and Expected Shortfall calculations for fixed income portfolios are extremely sensitive to the VTS and to the correlations among different maturities.
Notes to Table 3: These tables report daily averages of the volatility term structure (VTS) computed by two alternative methods: annualized standard deviations using 60-day rolling windows from the log-difference of the value of the spot rates ("historical volatilities") and an EGARCH(1,1) model. As input, we use zero-coupon interest rates from the three daily yield curve datasets (DoT, FRB, F082) during the period from 1996 to 2017 (5,497 trading days and 5,437 volatility estimates).
Notes to Table 4: Volatility term structure (VTS) computed by annualized simple standard deviations using 60-day rolling windows from the log-difference of the value of the spot rates ("historical volatilities"). As the input, we use zero-coupon interest rates from three yield curve datasets (Department of the Treasury, DoT; Federal Reserve Board, FRB; Bloomberg, F082) during the period from January 1996 to December 2017. In these tests we use non-overlapping observations from the VTS time series, i.e., we consider only one out of every 60 volatility observations (5,497 trading days, 5,437 volatility estimates and 96 observations per sample). The X/Y ratio is defined as the mean volatility for the X dataset divided by the corresponding mean for the Y dataset. The hypothesis that the mean of the X/Y ratio is equal to unity is tested using a standard t-test. A standard sign test is used to test that the proportion of X/Y ratios higher than unity is equal to the proportion of ratios lower than unity, i.e., 0.5. The Levene statistic is used to test the equality of variances between X and Y. ** and * indicate significance at the 1% and 5% levels, respectively, in a two-tailed test.
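The pairwise comparisons behind Table 4 can be sketched with the Python standard library alone. These are hypothetical helpers, not the authors' code: the t and signed-rank p-values use a large-sample normal approximation (reasonable for around 96 observations per sample, though a statistics package such as SciPy would normally supply exact distributions), the sign test is exact binomial, and `levene_w` returns only the mean-centred Levene W statistic, which under the null follows an F(1, N - 2) distribution. The `shift` argument of `xy_ratios` gives a date-by-date analogue of the correlation variant of the ratio, in which one is added to each coefficient before dividing, and `non_overlapping` keeps one observation out of every 60.

```python
import math
from statistics import NormalDist, fmean, stdev


def xy_ratios(x, y, shift=0.0):
    """Date-by-date ratios (x_t + shift) / (y_t + shift); shift=1 mimics the correlation variant."""
    return [(a + shift) / (b + shift) for a, b in zip(x, y)]


def non_overlapping(series, step=60):
    """Keep one observation out of every `step` to avoid overlapping rolling windows."""
    return series[::step]


def _two_sided_normal_p(stat):
    return 2 * (1 - NormalDist().cdf(abs(stat)))


def t_test_ratio(ratios):
    """t statistic for H0: mean ratio = 1 (p-value from a large-sample normal approximation)."""
    t = (fmean(ratios) - 1.0) / (stdev(ratios) / math.sqrt(len(ratios)))
    return t, _two_sided_normal_p(t)


def sign_test_ratio(ratios):
    """Exact two-sided binomial sign test for H0: P(ratio > 1) = 0.5; ties with 1 are dropped."""
    above = sum(r > 1 for r in ratios)
    below = sum(r < 1 for r in ratios)
    n = above + below
    tail = sum(math.comb(n, i) for i in range(min(above, below) + 1)) / 2 ** n
    return above / n, min(1.0, 2 * tail)


def signed_rank_test_ratio(ratios):
    """Wilcoxon signed-rank test of H0: median ratio = 1 (normal approximation, no tie correction)."""
    d = [r - 1.0 for r in ratios if r != 1.0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # tied values share the average rank
        i = j + 1
    w_plus = sum(rank for rank, diff in zip(ranks, d) if diff > 0)
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return z, _two_sided_normal_p(z)


def levene_w(x, y):
    """Levene (1960) W statistic (mean-centred); compare with an F(1, len(x) + len(y) - 2) quantile."""
    devs = [[abs(v - fmean(g)) for v in g] for g in (x, y)]
    n_total = len(x) + len(y)
    grand = fmean([d for group in devs for d in group])
    between = sum(len(g) * (fmean(g) - grand) ** 2 for g in devs)
    within = sum((d - fmean(g)) ** 2 for g in devs for d in g)
    return (n_total - 2) * between / within
```

In use, the two aligned daily volatility series for one maturity would first be subsampled with `non_overlapping`, then converted with `xy_ratios` before the three ratio tests, while `levene_w` is applied to the subsampled X and Y series themselves.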
We also carry out a preliminary analysis of the existence of volatility spillovers among these three datasets. There are at least two reasons why volatility changes in one dataset may precede changes in other datasets: differences in the liquidity of the assets used to estimate zero-coupon bonds (on-the-run bills and bonds for DoT, second or further off-the-run bonds for FRB, and all bonds for F082), and the different degrees of flexibility of the functional form employed to estimate the yield curve. Moreover, these differences in flexibility may matter more in some tranches of the yield curve than in others (e.g., the weighted Svensson model used by the FRB is forced to fit short-term zero-coupon bonds more closely at the cost of a poorer fit at the long end).
To explore these potential indirect effects, we estimate a three-dimensional VAR model whose variables are the daily changes in the volatility of interest rates for a given maturity obtained from the three interest rate datasets (FRB, DoT and F082). 19 Specifically, we consider the volatilities estimated with the EGARCH model described in Section 3 for four of the main interest rate maturities: 3 months and 2, 5 and 10 years.
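The Granger-causality logic behind these VAR results can be sketched with NumPy. The paper runs the VAR Granger-causality Wald test in EViews 7; the sketch below is an independent, hypothetical implementation that estimates one VAR(p) equation with an intercept by OLS and compares it with a restricted equation that omits the lags of the candidate leading series. It returns the F form of the test; the chi-square Wald statistic reported by VAR packages is asymptotically p times this F. The lag order p is an assumption, since it is not reported here.

```python
import numpy as np


def lagged_design(data, p):
    """Regressor matrix [1, y_{t-1}, ..., y_{t-p}] for a VAR(p); `data` has shape (T, k)."""
    T = data.shape[0]
    blocks = [np.ones((T - p, 1))]
    for lag in range(1, p + 1):
        blocks.append(data[p - lag:T - lag])  # rows t = p..T-1 hold y_{t-lag}
    return np.hstack(blocks)


def _ssr(X, y):
    """Sum of squared OLS residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(data, cause, effect, p=1):
    """F statistic for H0: the p lags of column `cause` do not enter the equation of `effect`.

    The Wald chi-square form reported by many VAR packages is asymptotically p * F.
    """
    k = data.shape[1]
    y = data[p:, effect]
    X_full = lagged_design(data, p)
    # Column of variable `cause` at lag l: 1 (intercept) + (l - 1) * k + cause.
    drop = [1 + (lag - 1) * k + cause for lag in range(1, p + 1)]
    X_restricted = np.delete(X_full, drop, axis=1)
    ssr_u, ssr_r = _ssr(X_full, y), _ssr(X_restricted, y)
    dof = len(y) - X_full.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / dof)
```

In the paper's setting, the three columns of `data` would hold the daily changes in the EGARCH volatility of one maturity (for example, 3 months) from FRB, DoT and F082, and `granger_f(data, cause=2, effect=0)` would ask whether F082 volatility changes help predict FRB volatility changes.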
The results suggest that cross-dataset volatility spillovers appear mainly in the medium and long end of the term structure. 20 For short maturities, we find only a positive Granger-causal relationship from the FRB estimates to the F082 dataset. By contrast, for medium and longer maturities the relationships are more entangled: changes in the volatility of more flexible models precede changes in the volatility of more rigid ones. The most robust result is a positive Granger-causal relationship from the F082 dataset to both FRB and DoT, and from the DoT estimates to FRB. Other relationships appear for some specific maturities but not for others. Thus, we conclude that the particular functional form used to estimate the yield curve may be the origin of the lead-lag relationships found in interest rate volatility estimates, a result that may depend on the tranche of the yield curve and on the flexibility of the model used to estimate it.
We also examine the effects of using alternative interest rate datasets on the correlations between pairs of forward rates. A popular approach to modeling term structure dynamics models the forward rate process directly, using the initial historical forward rate volatilities as an input. The properties of the correlation matrix of instantaneous forward rates for different maturities are crucial in these models. Forward rates also play a key role in many other financial applications, such as pricing financial products in which correlations among forward rates are crucial (e.g., interest rate caps and swaptions), and analyses of market expectations about future short-term interest rates. The empirical behavior of interest rates implies that long-term forward rates should be less volatile than short-term forward rates. A desirable property of any yield curve is that it produces a smooth forward rate curve that converges to a fixed limit as maturity increases. Forward rates are highly sensitive to the shape of the yield curve, particularly at the very long end. Intuition suggests that the correlation between short-term forward rates should be lower than the correlation between long-term forward rates, and economic rationality suggests an almost perfect positive correlation between long-term forward rates: under the assumption that forward rates reflect expectations about future short-term interest rates, agents should perceive similar interest rates for different long horizons. 21 We compute time series of pairwise correlation coefficients over 60-day rolling windows for three pairs of forward rates with a six-month tenor, starting at horizons in the short end, the medium term, and the long end of the yield curve. In particular, we examine the correlation between the six-month spot rate, $R_{0.5}$, and the one-year-ahead forward rate with a half-year tenor, $F_{1,1.5}$, i.e., $\rho(R_{0.5}, F_{1,1.5})$; the correlation between the 2- and the 5-year forward rates, $\rho(F_{2,2.5}, F_{5,5.5})$; and the correlation between the 10- and the 29.5-year forward rates, $\rho(F_{10,10.5}, F_{29.5,30})$.
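The forward rates and rolling correlations above can be sketched as follows, assuming continuously compounded zero-coupon rates and a curve available as a function of maturity (for the DoT and F082 datasets this would come from interpolating the fixed maturities, which the sketch does not implement). The helper names are hypothetical. The forward rate between years s and e uses the standard no-arbitrage identity $F_{s,e} = (e R_e - s R_s)/(e - s)$, so $F_{0,0.5} = R_{0.5}$, and the correlation is a plain Pearson coefficient over each full 60-observation window.

```python
import math


def forward_rate(zero_rate, start, end):
    """Continuously compounded forward rate F_{start,end} implied by a zero curve R(n).

    F_{s,e} = (e * R(e) - s * R(s)) / (e - s); with s = 0 it reduces to the spot rate R(e).
    `zero_rate` is any callable maturity -> rate, e.g. an interpolated curve.
    """
    near = start * zero_rate(start) if start > 0 else 0.0
    return (end * zero_rate(end) - near) / (end - start)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(xs, ys, window=60):
    """Pairwise correlation over each full rolling window of `window` observations."""
    return [pearson(xs[end - window:end], ys[end - window:end])
            for end in range(window, len(xs) + 1)]
```

For example, daily series of $F_{10,10.5}$ and $F_{29.5,30}$ built with `forward_rate(curve, 10, 10.5)` and `forward_rate(curve, 29.5, 30)` for each date's curve can be passed to `rolling_correlation` to obtain a long-end correlation series.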
Panel A of Table 5 summarizes statistics of the pairwise correlations of daily contemporaneous forward rates at the three horizons. The mean correlations at the shortest and medium-term maturities are quite similar for the three interest rate datasets used to compute the forward rates. The main difference is observed at the long end of the yield curve. At this horizon, the "desirable" result would be values close to unity, because the forward rate curve is expected to be asymptotically flat. The highest correlation is obtained from the DoT dataset ($\rho = 0.7$), almost two and three times the mean values obtained from the FRB and F082, respectively. In fact, one third of the long-term correlations computed from the F082 are negative (more than 13 percent in the case of the FRB). 22 In addition, the standard deviation of these correlations is much higher for the FRB and F082. The DoT dataset is the only one that supports the intuition of correlations between forward rates increasing with maturity. Thus, the FRB and F082 results are far from the desirable property of correlation coefficients close to unity at long maturities.
Panel B of Table 5 reports formal tests for differences in correlations between forward rates obtained from alternative interest rate datasets. In this case, the pairwise correlations are computed using three-month non-overlapping periods (96 periods). 23 The null hypotheses of equal means, medians and variances of the forward correlations are rejected in most cases. They cannot be rejected for the comparison between DoT and F082 at both short- and medium-term maturities, nor between FRB and F082 at the long end. Apart from these exceptions, the results of these analyses corroborate our previous findings: there are statistically significant differences in the behavior of forward rates when different interest rate datasets are used. Using a different interest rate dataset as input can therefore affect the pricing of interest rate derivatives. For instance, Brigo and Fabio (2001) and Rebonato (2002) find that an increase in the average correlation between the percentage changes in discrete forward rates increases the valuation of swaptions.
Notes to Table 5: Pairwise correlation coefficients from daily forward rates using 60-day rolling windows. The forward rates are six-month rates starting at times 0, 1, 2, 5, 10 and 29.5 ($F_{0,0.5} = R_{0.5}$, $F_{1,1.5}$, $F_{2,2.5}$, $F_{5,5.5}$, $F_{10,10.5}$, and $F_{29.5,30}$, respectively). For instance, $F_{5,5.5}$ is the 5-year-ahead forward rate with a half-year tenor. As the input, we use zero-coupon interest rates from the three yield curve datasets (Department of the Treasury, DoT; Federal Reserve Board, FRB; Bloomberg, F082) during the period from January 1996 to December 2017 (5,497 trading days). Panel A reports summary statistics for the full sample of 5,437 daily observations of each forward correlation. Panel B shows results of equality tests in which we use non-overlapping observations from the daily correlation time series, i.e., we consider only one out of every 60 correlation observations (96 observations per sample). As the correlation coefficient can take values from +1 to -1, the X/Y ratio is defined as the mean correlation coefficient for the X dataset plus one divided by the corresponding mean for the Y dataset plus one. The hypothesis that the mean of the X/Y ratio is equal to unity is tested using a standard t-test. A standard sign test is used to test that the proportion of X/Y ratios higher than unity is equal to the proportion of ratios lower than unity, i.e., 0.5. The Levene statistic is used to test the equality of variances between X and Y. ** and * indicate significance at the 1% and 5% levels, respectively, in a two-tailed test.

The relevance of the use of alternative yield curve estimations
The expectations hypothesis has a central role in term structure theory. An extensive financial economics literature has tested whether the expectations hypothesis holds.
Most studies reject the expectations hypothesis. However, results vary from one study to the next depending on the methodology, the maturity spectrum, the frequency of the data, the sample period, and the source of the interest rates. This last factor is the one related to our study. In this sense, Longstaff (2000) argues that the negative findings of previous studies result from the use of Treasury bill rates, whose apparent term premium is influenced by other security-specific features such as their liquidity. Longstaff finds support for the expectations hypothesis in repo data, treating repo rates as measures of the short-term riskless term structure. Other authors, such as Downing and Oliner (2007) and Brown, Cyree, Griffiths, and Winters (2008), obtain the opposite result from commercial paper interest rates.
Testing the expectations hypothesis is a useful exercise to illustrate the relevance of choosing among alternative yield curve datasets. As Longstaff (2000) points out, Treasury bill yields often exhibit specialness and idiosyncratic variations, and it is not uncommon for researchers to exclude bills from their analyses of the term structure of interest rates. The DoT is the only dataset examined that includes Treasury bills, and it includes only on-the-run bills, which are the main focus of specialness. Both the FRB and the F082 consider exclusively Treasury notes and bonds. In addition, the shortest maturity is 1 month in the case of the DoT and 3 months in the case of the F082.
In this section, we replicate a simple test of the expectations hypothesis. This study does not claim to provide a comprehensive and detailed analysis of the expectations hypothesis; rather, it aims to highlight the implications of using different samples of rates. We therefore merely reproduce Campbell's (1995) classical test for our alternative yield curve datasets. As a slight robustness improvement, we use instrumental variables to address the problem of measurement error in long-term interest rates (see, e.g., Campbell & Shiller, 1991). We do not attempt to find evidence supporting or rejecting the theory, only to illustrate the possible implications of using one yield curve dataset or another. We run all the regressions using the same methodology, maturity spectrum, data frequency and sample period, but with the three alternative external interest rate datasets.
The inputs of the tests are prepared following the detailed description in Campbell (1995). We calculate continuously compounded yields using one month as the basic time unit. 24 The yield spread of a bond is defined as the difference between its yield and the short rate. As the short rate, Campbell (1995) uses the yield of a 1-month Treasury bill. Because the DoT reports yields of on-the-run securities for several fixed maturities, we proxy the actual 1-month yield by the 1-month yield reported in the DoT dataset. The excess return is calculated as the difference between the bond return and the short yield; equivalently, it is the yield spread less (m-1) times the change in the bond yield, where m is the maturity in months.
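The equivalence between the two definitions of the excess return can be checked numerically. A minimal sketch, assuming continuously compounded yields expressed per month (the basic time unit above), so that the log price of an m-month zero-coupon bond is $-m\,y_{m,t}$. Buying it at t and selling it at t+1 as an (m-1)-month bond gives the return $m\,y_{m,t} - (m-1)\,y_{m-1,t+1}$; subtracting the short yield $y_{1,t}$ gives $(y_{m,t} - y_{1,t}) - (m-1)(y_{m-1,t+1} - y_{m,t})$, the spread-based form in the text. The function names are hypothetical; `annualized_percent` applies the 1,200 scaling described in note 25.

```python
def yield_spread(y_m, y_1):
    """Yield spread of an m-month bond over the 1-month short rate (same date)."""
    return y_m - y_1


def holding_return(m, y_m_t, y_m1_next):
    """One-month log return: buy an m-month zero at t, sell it at t+1 as an (m-1)-month zero."""
    return m * y_m_t - (m - 1) * y_m1_next


def excess_return(m, y_m_t, y_m1_next, y_1_t):
    """Excess return written as in the text: spread minus (m-1) times the change in the bond yield."""
    return yield_spread(y_m_t, y_1_t) - (m - 1) * (y_m1_next - y_m_t)


def annualized_percent(monthly_value):
    """Monthly variable expressed in annualized percentage points (times 1,200)."""
    return 1200 * monthly_value
```

Because both forms are algebraically identical, `holding_return(m, ...) - y_1_t` and `excess_return(m, ...)` agree up to floating-point rounding for any inputs.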
The pure expectations hypothesis implies that excess returns of long bonds over short bonds are zero on average. The first row of Table 1 (page 135) in Campbell (1995) checks whether the excess return of long bonds over short bonds has a zero mean. 25 Table 6 reports the results of replicating Campbell's original Table 1 using the three yield curve datasets (Panels A, B and C) and reproduces Campbell's results (Panel D). Instead of the hump-shaped excess returns observed by Campbell, we find that the excess return increases with maturity. Most relevant for our analysis are the large differences among datasets in the estimated excess returns for the 10-year maturity and for maturities shorter than 6 months. In the short end of the curve, the mean excess return from the FRB dataset is two and three times higher than those obtained from the other datasets. As expected, excluding bills produces important differences with respect to the other datasets. However, the largest difference in absolute terms among datasets is observed for the 10-year maturity. The omission of on-the-run bonds from the FRB basket of assets yields larger excess returns in this particular analysis.
Campbell (1995) also analyzes whether excess returns on long bonds over short bonds are unforecastable over the life of the short bond. In this experimental design, the expectations hypothesis holds if the slope coefficient from the regression of long-rate changes on a constant and the long-short yield spread equals one (first row of Campbell's original Table 2, page 139). Panels A to C of our Table 7 replicate these tests using the three datasets, and Panel D reproduces Campbell's (1995) original Table 2. The differences among the results obtained from the alternative yield curves can be observed at all maturities, in both level and statistical significance. Tests using the DoT dataset provide markedly different results from those obtained using the other yield curves, especially for the shortest maturity (2-month) and for the 2- and 4-year maturities.
Table 6. Replication of "Table 1 Means and Standard Deviations of Term Structure Variables" (Campbell, 1995, p. 135). Continuously compounded yields calculated from zero-coupon interest rates from the three external yield curve datasets (Department of the Treasury, DoT; Federal Reserve Board, FRB; Bloomberg, F082) and from our own yield curve estimates, applying the weighted Svensson model to the four different datasets from GovPx asset prices. The sample period ranges from January 1994 to December 2017. The yield spread is defined as the difference between the bond yield and the short yield. The excess return is calculated as the difference between the return on a bond and the short yield. The short yield is proxied by the 1-month yield reported by the DoT dataset. Standard deviations are shown in parentheses. * indicates significance at the 5% level in a two-tailed test. Campbell footnote: "Source: Author's calculations using estimated monthly zero-coupon yields, 1952-1991, from McCulloch and Kwon (1993). The data are measured monthly, but expressed in annualized percentage points. Each row shows the mean of the variable, with the standard deviation below in parentheses. Excess returns and yield spreads are measured relative to 1-month Treasury bill rates."
Table 7. Replication of "Table 2 Regression Coefficients" (Campbell, 1995, p. 139). Campbell footnote: "Source: Author's calculations using estimated monthly zero-coupon yields, 1952-1991, from McCulloch and Kwon (1993). Each row shows a regression coefficient b, with the standard error below in parentheses. Each coefficient should be one if the expectations hypothesis holds. The regression in the first row is $y_{m-1,t+1} - y_{m,t} = a + b\,(y_{m,t} - y_{1,t})/(m-1)$, where m is the long bond maturity in months. The regression in the second row is … The standard error in the second row is corrected for serial correlation in the error term of the regression."
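A minimal sketch of the first-row regression of Campbell's Table 2, $y_{m-1,t+1} - y_{m,t} = a + b\,(y_{m,t} - y_{1,t})/(m-1)$, with an OLS slope and a just-identified instrumental-variables slope; under the expectations hypothesis b should equal one. The helper names are hypothetical, and so is the instrument: the text states that instrumental variables are used but not which ones, so `z` is left to the user. Standard errors, including the serial-correlation correction in Campbell's second row, are not computed here.

```python
from statistics import fmean


def campbell_regression_data(m, y_m, y_m1, y_1):
    """Regressor and regressand of Campbell's first-row regression from aligned monthly yields.

    Regressand: y_{m-1,t+1} - y_{m,t}; regressor: (y_{m,t} - y_{1,t}) / (m - 1), for t = 0..T-2.
    """
    x = [(y_m[t] - y_1[t]) / (m - 1) for t in range(len(y_m) - 1)]
    dep = [y_m1[t + 1] - y_m[t] for t in range(len(y_m) - 1)]
    return x, dep


def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def ols_slope(x, dep):
    """OLS slope b in dep = a + b * x + error."""
    return _cov(x, dep) / _cov(x, x)


def iv_slope(x, dep, z):
    """Just-identified IV slope using a single instrument z (intercept included)."""
    return _cov(z, dep) / _cov(z, x)
```

Running the same functions on each dataset's yields, with the same m, sample and instrument, isolates the effect of the yield curve source on the estimated b.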
In summary, the results obtained in this section suggest that the conclusions of a simple test such as that conducted by Campbell (1995) can differ significantly depending on the yield curve estimates used to perform the test. The differences observed may affect both the size of the parameter estimates and their statistical significance. These differences go beyond the inclusion or exclusion of Treasury bills. Most discrepancies affect medium and long maturities.

Conclusions
This paper reports on the nature of alternative zero-coupon yield curve datasets and on how such datasets can lead to somewhat different results for some common purposes in financial economics. It is somewhat reassuring that, in many of the instances we examine, the results are generally robust across the three datasets. In some cases, however, the empirical results are quite sensitive to the yield curve data.
The most obvious difference among datasets is the liquidity of the securities considered. The on/off-the-run phenomenon is well known in the literature. The basket of assets used as input for the DoT estimates includes only on-the-run securities, whereas the FRB excludes all bills as well as the on-the-run and first-off-the-run bonds, and the F082 includes all outstanding bills and bonds. However, other relevant aspects may also generate different behavior and shapes in the yield curves. The three data providers use different parametric and non-parametric models in the fitting process, and they also use different inputs: quotes, end-of-day prices, or yields to maturity.
In this paper, we examine different properties of the yield curves and their indirect implications for two relevant variables: the resulting volatility term structure and the correlations between forward rates. The outcomes suggest that even if bond prices are replicated accurately by any of the three popular datasets examined, the impact on other variables can be remarkable, especially at both ends of the yield curve. Differences in volatilities primarily affect the short end of the yield curve, whereas the discrepancies in the correlations between long-term forward rates are large.
Finally, we study the extent to which the choice among alternative yield curve datasets matters for the results of simple tests of the expectations hypothesis. The outcomes clearly indicate that when short rates (less than one year) or long rates (more than ten years) are involved, the results may differ significantly depending on the set of yield curves used to test the expectations hypothesis. Excluding bills is not the only source of discrepancies among datasets.
In general, the DoT dataset performs well at the short end of the yield curve, so it should be a good input for institutional investors interested in pricing liquid short-term assets. Moreover, for valuing interest rate derivatives and for monetary economics purposes, the smooth on-the-run yield curve obtained by the DoT provides forward rates consistent with theory. For financial data users interested in pricing bonds and other medium- and long-term instruments with a "regular" liquidity level, the FRB dataset is a good choice. The information reported by the FRB allows users to compute zero-coupon rates for any maturity, yielding a continuous yield curve with a flexible and reliable shape. In addition, its low volatility levels for most maturities are a desirable property for risk management purposes. Finally, the F082 dataset provided by Bloomberg fits the entire maturity spectrum "too" well and reflects the "average" liquidity level in the US bond market. In some cases, this overfitting may reveal important relationships that could be modeled and are overlooked by the other datasets; in other cases, it simply fits the noise present in prices. These yield curves provide an accurate view of the current overall market situation.
In conclusion, choosing one zero-coupon interest rate dataset or another can lead to significantly different empirical results: the dataset choice may matter. When empirical results are sensitive to the dataset, researchers, traders, portfolio managers, policymakers, etc. should be cautious in selecting the most appropriate dataset, and even in accepting those particular results or the empirical method that led to them. Final users of yield curve data should check their empirical results against alternative datasets, or at least justify the use of one of them for their particular application.
Notes
9. See, e.g., Amihud and Mendelson (1991), Duffie (1996), Warga (1992), Longstaff (2000), Krishnamurthy (2002), Goldreich et al. (2005), and Vayanos and Weill (2008).
10. See, e.g., Fleming (2000), Pasquariello and Vega (2009), Graveline and McBrady (2011), and Fontaine and Garcia (2012).
11. In fact, the BIS (2005) notes that most central banks exclude part of the maturity range for which debt instruments are available. For instance, certain central banks consider only the interval from one to ten years.
12. Our GovPX sample ranges from January 1994 to December 2006. The transaction data available until May 2001 include the last trade time, size, and side (buy or sell), the price (or yield in the case of bills), and the aggregate volume (volume in millions traded from 6 pm on the previous day to 5 pm). The quote data used from June 2001 include the best bid and ask prices (or the discount rate, actual/360, in the case of bills) and the mid-price and mid-yield (actual/365).
13. Callable bonds are excluded. When two securities have the same remaining maturity, only the youngest one is considered.
14. If we assume that forward rates contain information regarding economic agents' expectations about future interest rates, it would be sensible to assume that the 30-year forward rate should be similar to the 31-year forward rate.
15. Gürkaynak et al. (2007) recognize that the Svensson specification "assumes that forward rates eventually asymptote to a constant. The downward tilt to forward rates at long horizons is an important characteristic of the U.S. yield curve; for example, the instantaneous forward rate ending 25 years ahead has continuously been below the instantaneous forward rate ending 20 years ahead for the past decade."
16. Note that negative values for this parameter are not allowed. Theoretically, this parameter should be positive, as it is the asymptotic value of the instantaneous forward rate when the term to maturity approaches infinity.
17. The period from fall 1998 to the end of 1999 is characterized by several episodes of flights to quality. It has been suggested that the Year 2000 bug (Y2K) effect was one of the triggers.
18. The results obtained from EGARCH volatilities can be requested from the authors.
19. We apply the Augmented Dickey-Fuller test, the Phillips-Perron test, and the Kwiatkowski-Phillips-Schmidt-Shin test to check the stationarity of volatility changes, concluding that volatility changes do not have a unit root.
20. To check the statistical significance of the model, we applied the VAR Granger-causality Wald test using EViews 7. All statistical results are available upon request.
21. Forward rates may embed term premiums that increase with maturity. Our analysis examines the possible impact of using one interest rate dataset or another on the correlation between forward rates at three horizons. For each correlation, we compare its behavior across datasets, so the results of our analysis should not be driven by term premiums that are the same for all three datasets. However, these premiums could affect the order of magnitude of the differences in correlations between datasets at different horizons.
22. In the case of the F082 dataset, this result could be caused by the method used to fit the yield curve between the 20- and 30-year maturities. First, there are only a few observations with maturities longer than ten years. Second, the F082 estimate at the long end of the curve merely consists of a linear interpolation between the 20- and 30-year interest rates, without a real estimation of intermediate rates. Thus, $F_{10,10.5}$ and $F_{29.5,30}$ become extremely dependent on the level of the 20-year rate. If it is high (with respect to the 10- and 30-year rates), it pushes $F_{10,10.5}$ up while pushing $F_{29.5,30}$ down. Conversely, if the 20-year rate is low (compared with the 10- and 30-year rates), $F_{10,10.5}$ will decrease while $F_{29.5,30}$ will rise.
23. As the correlation coefficient can take values from +1 to -1, the X/Y ratio is defined as the mean correlation coefficient for the X dataset plus one divided by the corresponding mean for the Y dataset plus one.
24. The DoT provides market yields that are either investment yields or bond-equivalent yields, computed with simple interest and the actual/actual day count convention. We recalculate these yields as continuously compounded interest rates. No 1-month rates are available in this dataset until August 1, 2001; prior to this date, we estimate the corresponding interest rate using cubic interpolation.
25. The data in Table 7 are reported in annualized percentage points, i.e., the natural monthly variables are multiplied by 1,200.

Funding
This work was supported by the Spanish Ministerio de Economía, Industria y Competitividad (ECO2017-89715-P).