Combined forecasts to improve Survey of Professional Forecasters predictions for quarterly inflation in the U.S.A.

Abstract The main aim of this study is to evaluate and improve the Survey of Professional Forecasters (S.P.F.) quarterly inflation rate forecasts. According to the Diebold–Mariano test, on the horizon 1991:Q1–2015:Q1, there were no significant differences in accuracy between the four types of predictions provided by the S.P.F. (mean forecasts, median predictions, predictions of financial service providers [f1] and predictions of non-financial service providers [f2]). The main contribution is the use of the stochastic search variable selection algorithm to construct Bayesian combined predictions. Considering the horizon 2013:Q1–2015:Q1, the proposed Bayesian combined predictions for the rate of change in the quarterly average headline consumer price index (C.P.I.) level outperformed the initial experts’ expectations. The combined predictions based on the Bayesian approach and principal component analysis for core inflation and personal consumption expenditures inflation improved the accuracy of S.P.F. predictions and naïve forecasts on the horizon 2015:Q1–2016:Q1.


Introduction
The Survey of Professional Forecasters (S.P.F.) is one of the best-known and most appreciated providers of predictions for the U.S.A., drawing on forecasting experts mostly from the economic environment. A large number of studies, especially in recent years, have assessed these predictions. In many cases, comparisons with alternative providers were made in order to determine a leader in forecasting. The S.P.F. is the oldest quarterly survey of macroeconomic predictions for the U.S.A.
The aim of this study is to assess some S.P.F. forecasts of the inflation rate in the U.S.A. and also to improve these experts' predictions. More specifically, we are interested in assessing and improving the quarterly inflation rate predictions of the S.P.F. by introducing a Bayesian technique that has not previously been applied in the literature for generating combined forecasts.

KEYWORDS: forecasts; combined forecasts; stochastic search variable selection; inflation

The evaluation of S.P.F. forecasts in the literature
The delay of five years in the publication of Greenbook (see Note 1) forecasts made individuals look for alternative forecasts provided by private experts in order to guide decision-making. The S.P.F. is the best-known provider of predictions in the private sector, having published forecasts since 1968. The survey is conducted among forecasters from the business environment and Wall Street. The American Statistical Association and the National Bureau of Economic Research conducted the S.P.F. before 1990; the Federal Reserve Bank of Philadelphia then took it over, bringing some important improvements. The involved forecasters have three main features: anonymity, speciality and volatility. Most of these forecasters are directly involved in the economic or financial environment, forecasting being, according to Croushore (1993), their actual job. Anonymity ensures that the forecasters can provide their own expectations without external pressure from other experts. The number of forecasters varies over time, with some disappearing and new ones appearing; therefore, there are many missing values in the S.P.F. database.
The literature review focuses on the empirical results of the evaluation of the point and density forecasts made by S.P.F. and on the methods to improve S.P.F. forecasts. This evaluation aims to measure prediction performance, which reflects the forecasts' quality from three perspectives: accuracy, bias and efficiency. Prediction bias reveals persistent differences between actual values and predictions for an indicator, with a common measure for bias being the expected value of forecast error. Forecast accuracy, known as the converse of error, shows how close to actual values the predicted values are. Efficiency is a concept that is closer to statistical sufficiency, measuring the forecast reliability. Strong efficiency implies a strong form of rational expectations.
The literature regarding the evaluation of S.P.F. forecast accuracy shows that these predictions outperform even complex macroeconomic models, according to Ang, Bekaert and Wei (2007). The S.P.F. mean brings a significant improvement in accuracy compared to individual expectations, because of the individual forecasts' high volatility over time. Moreover, household expectations are significantly influenced by the S.P.F., as Carroll (2003) stated. The forecast median is widely utilised in practice, but it has not been demonstrated that this median is superior to individual expectations. The accuracy of S.P.F. consensus predictions was evaluated by Sinclair, Stekler and Carnow (2012) for the following variables: inflation rate, real G.D.P. growth and unemployment rate on the horizon 1968:Q4-2011:Q1. The S.P.F. forecast accuracy was also evaluated by Lahiri, Monokroussos and Zhao (2013), who showed that these predictions are better than yield spread predictions for the current quarter and the next one, whereas yield spread expectations are more accurate for longer horizons. The headline inflation forecasts of the S.P.F. have deteriorated in the past few years and, at present, are outperformed by predictions based on lagged headline inflation. However, according to Trehan (2015), the corresponding predictions for core inflation remain acceptable.
In terms of efficiency, the S.P.F. percentiles have been assessed by Lee and Wang (2012) according to the following aspects: predictive ability, the predictions' rationality and forecast encompassing. These dimensions are evaluated in order to determine whether they are in line with Greenbook expectations. For inflation forecasts, most of the percentiles are not as predictive as the Greenbook ones, but they are encompassed by Greenbook. The main conclusion was that the S.P.F. inflation median overpredicted the phenomenon. Many economists have tried to test whether people's expectations are rational or not. Results at the beginning of the 1980s confirmed that expectations were biased, and some researchers considered these irrational expectations to be a stylised fact. This conclusion was refuted over time. The S.P.F. mean predictions for inflation and core inflation may include more useful information than naïve predictions, as Liu and Smith (2014) showed. Rossi and Sekhposyan (2015) assessed the predictions of the growth rate of the G.N.P./G.D.P. deflator provided by the S.P.F., Blue Chips (see Note 2) and Greenbook. The authors concluded that S.P.F. forecasts are weaker than Greenbook expectations, with the best predictions belonging to Blue Chips on the horizon 1980:Q1-2005:Q1. The traditional test of rationality did not reject the hypothesis of rational expectations for the S.P.F. and Greenbook.
Tests of bias were applied by Croushore (2012) to S.P.F. predictions over time. The revision of the main macroeconomic variables' expectations raised many problems in terms of measuring accuracy. The bias test results depend on two key elements: the selected subsample and the measure used for the actual values of the variables. Consistent instability across subsamples was also observed by Giacomini and Rossi (2010), who did not identify any overall stylised facts. For quarterly inflation forecasts, S.P.F. expectations are almost unbiased when inflation rises. These forecasts were combined with Michigan predictions by Ang et al. (2007). The authors assessed the prediction performance of several surveys (S.P.F., the Michigan survey and Livingston), analysing bias measures.
Even if the rationality under symmetric loss is, in most cases, rejected for S.P.F. output predictions, Elliott, Komunjer and Timmermann (2008) showed that only a low degree of asymmetry is required in order to overturn the rejections of the prediction rationality assumption. The variance pattern in the S.P.F. is carefully scrutinised, as Capistrán and Timmermann (2009) stated. In S.P.F. inflation predictions, there is a significant relationship between bias and disagreement. The asymmetry test was developed by Romer and Romer (2000), who proved that S.P.F. median inflation forecasts are outperformed by Greenbook expectations.
Tests based on regressions were also employed by Clements (2002) in order to assess the S.P.F. density predictions of inflation and G.D.P. growth. The author tested whether the predicted probabilities of certain events equalled the real probabilities and also checked whether any systematic divergences between the two were linked to the variables in the experts' information sets at the time when the predictions were made. The quality of the event probability predictions was evaluated using forecast-encompassing tests.
Besides the S.P.F. forecast evaluations, some authors have tried to improve these predictions using various techniques. For example, different forecast combinations were constructed by Genre, Kenny, Meyler and Timmermann (2013) using S.P.F. expectations. Only a few schemes of combinations for unemployment and output growth outperformed the equally weighted scheme. For inflation, many other schemes outperformed this benchmark. Bratu Simionescu (2012) improved the S.P.F. predictions using regression models that explain the actual values. Moreover, the author proposed better forecasts based on a historical accuracy measure.

Methodology
Considering a multiple regression model, we address the canonical problem of variable selection. If Y is the dependent variable and X_1, X_2, …, X_p are p exogenous variables, the main objective is to select the best model with the subset X*_1, X*_2, …, X*_q from the initial set. The model has the following form:

Y_t = β*_1 X*_1t + β*_2 X*_2t + … + β*_q X*_qt + e_t,   (1)

where β*_1, β*_2, …, β*_q are parameters and e_t is an error term. A Bayesian procedure for selecting the best subsets of variables that influence the dependent variable, stochastic search variable selection (S.S.V.S.), was proposed by George and McCulloch (1997). This algorithm specifies a Bayesian hierarchical prior mixture under which better models receive higher posterior probability. S.S.V.S. avoids the difficulty of computing probabilities for all 2^p models by using Gibbs sampling to simulate from the posterior distribution, as George and McCulloch (1993) explained. An important advantage is the fast and efficient simulation ensured by the estimation algorithm: the solution is obtained rapidly because models with high posterior probability are visited with high frequency.

The aim is to describe a general model with a hierarchical mixture. Let us start from a standard linear model that relates the dependent variable to the set of possible predictors (X_1, X_2, …, X_p):

Y = Xβ + ε,   ε ~ N_n(0, σ²I),

where X is an n × p matrix, Y is an n × 1 vector, β is a p × 1 vector of parameters and σ is an unknown positive scalar. If the estimates of the predictors' parameters have small values, those variables are eliminated from the initial model. Each possible subset of predictors is indexed by the vector γ = (γ_1, …, γ_p)′, where γ_i = 0 if the estimate of parameter i is small and γ_i = 1 if the estimate of parameter i is large.
If q_γ ≡ γ′1 is the dimension of the γ-th subset, the uncertainty related to predictor selection is modelled using a prior mixture. In the case of the γ-th subset of predictors, β is modelled as a realisation of a multivariate normal prior:

β | γ ~ N_p(0, D_γ R D_γ),

where R is a prior correlation matrix and D_γ is a diagonal matrix whose i-th diagonal element is τ_i when γ_i = 0 and c_i τ_i when γ_i = 1. The properties of the hierarchical priors are deduced from the specification of D_γ R D_γ. The residual variance σ² for the γ-th model is modelled as a realisation of an inverse gamma prior:

σ² | γ ~ IG(ν_γ/2, ν_γ λ_γ/2).

The last relationship is equivalent to ν_γ λ_γ/σ² ~ χ²_{ν_γ}. Even if λ_γ might be constant, it is preferable to decrease it when the number of predictors in the subset increases. λ_γ can serve as a prior estimate of σ², and ν_γ can be considered a prior sample size. If there is no prior information about σ², the selection λ_γ ≡ s²_LS is recommended, where s²_LS is the usual ordinary least squares (O.L.S.) estimate of σ². ν_γ is chosen so that the prior π(σ² | γ) attributes high probability to a plausible interval for σ². Even if γ can be modelled as a realisation of any prior π(γ) over the 2^p possible values of γ, a common form of the prior is the independent Bernoulli specification:

π(γ) = ∏_{i=1}^{p} w_i^{γ_i} (1 − w_i)^{1−γ_i},

where w_i is the prior probability that predictor i enters the model. The marginal posterior distribution π(γ | Y) includes the essential information for variable selection: knowing the data Y, the posterior π(γ | Y) updates the prior probabilities of each possible value of γ.
The prior hyper-parameters, especially those in D_γ R D_γ, are chosen so that the posterior π(γ | Y) assigns higher probability to the more promising sets of predictors.
The S.S.V.S. algorithm is based on Gibbs sampling, which is used to simulate a sequence of parameter draws. At each step, each distribution is conditioned on the most recently generated values of the other parameters:

β^(j) ~ π(β | σ^(j−1), γ^(j−1), Y),
σ^(j) ~ π(σ | β^(j), γ^(j−1), Y),
γ_i^(j) ~ π(γ_i | β^(j), σ^(j), γ_{−i}^(j), Y),   i = 1, …, p.

For the application in the next section, the following model is used:

Y_t = β_0 + β_1 X_1t + β_2 X_2t + β_3 X_3t + β_4 X_4t + e_t,

where the candidate predictors X_1, …, X_4 are the four S.P.F. forecast series (mean, median, f1 and f2). γ_i = 1 indicates that the predictor is selected in the model; γ_i = 0 indicates that β_i is almost 0 and the associated predictor is not selected. Several levels are considered for Gibbs sampling with proper hierarchical priors. In our application, the spike-and-slab mixture above is placed on each β_i at the first level, with Bernoulli priors on the γ_i and an inverse gamma prior on the residual variance. Bayes' formula is used to update the conditional posterior of γ_i, and the conditional posteriors of β_i, ω_i and s² have conjugate forms.
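The Bayes-formula update of the conditional posterior of γ_i can be sketched in a few lines of Python. This is only an illustrative sketch of the spike-and-slab step, not the authors' code; the hyper-parameter values (tau, c, prior inclusion probability w) are hypothetical.

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) evaluated at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def gamma_inclusion_prob(beta_i, tau=0.1, c=10.0, w=0.5):
    """P(gamma_i = 1 | beta_i): Bayes' formula applied to the mixture of a
    spike N(0, tau^2) (exclusion) and a slab N(0, (c*tau)^2) (inclusion)."""
    slab = w * normal_pdf(beta_i, c * tau)         # gamma_i = 1 branch
    spike = (1.0 - w) * normal_pdf(beta_i, tau)    # gamma_i = 0 branch
    return slab / (slab + spike)

# A large draw of beta_i favours inclusion; a draw near zero favours exclusion.
p_large = gamma_inclusion_prob(3.0)    # close to 1
p_small = gamma_inclusion_prob(0.01)   # well below 0.5
```

Inside the Gibbs sampler, γ_i would be drawn as a Bernoulli variable with this probability, conditional on the current draw of β_i.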

The assessment and improvement of S.P.F. quarterly inflation predictions
The variable used in this analysis is the rate of change in the quarterly average headline consumer price index (C.P.I.) level (annualised percentage points). In the U.S.A., inflation is mainly the effect of money supply increases. In this study, we used the actual values of the quarterly inflation rate (seasonally adjusted; annualised percentage points) provided by Federal Reserve Economic Data (F.R.E.D.) and the corresponding forecasts provided by the S.P.F. in the following variants: the prediction mean, the prediction median and the individual predictions made by a financial service provider (f1) and by a non-financial service provider (f2). The variable representing the mean of the predictions is computed as a simple arithmetic average of the individual forecasts that are given by experts. The median of the predictions is represented by the inflation forecast that is located in the middle of the set of predictions. Financial service providers of inflation forecasts refer to: commercial and investment banking, insurance, payment services, mutual and hedge funds, asset management and Association of Financial Service Providers. In the category of non-financial service providers, the following entities that offered inflation predictions were included: universities, consulting companies, pure research companies, manufacturers, investment advisors and forecasting firms. The average inflation rate follows the trend of the actual inflation rate. In 2008, the financial and non-financial services providers failed to accurately predict inflation rate evolution, while the average predicted inflation was quite close to the registered values. As Figure 1 shows, the financial service providers encountered the biggest difficulties in predicting inflation rate, with the comparison being made with the average predictions of individual forecasters and with those of non-financial services providers. 
Indeed, the low inflation at the beginning of the economic crisis was totally unexpected by financial institutions. Inflation reduction is used as a measure for alleviating economic recession severity by stimulating the labour market to adjust faster. The monetary authorities are those who control monetary policy by maintaining a low and stable inflation rate.
For missing values in the forecasters' data, the imputations were made using the most recent previous forecasts in the data-set. The S.P.F. forecasts are taken from the 'historical S.P.F. forecasts data'. The time series covers the period from the first quarter of 1991 to the first quarter of 2015 (1991:Q1-2015:Q1). The source of the data for the actual inflation values is the Organisation for Economic Co-operation and Development. The data-set is seasonally adjusted.
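The imputation rule described above (replace a missing forecast with the forecaster's most recent previous value) amounts to a forward fill. A minimal sketch, using hypothetical values rather than the actual S.P.F. series:

```python
def forward_fill(series):
    """Replace each missing value (None) with the most recent observed
    forecast; leading missing values remain missing."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# A forecaster absent for two consecutive quarters keeps the last reported value:
print(forward_fill([None, 1.8, None, None, 2.1]))
# -> [None, 1.8, 1.8, 1.8, 2.1]
```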
As we can observe from the graph in Figure 1, in the last quarter of 2008 and the first quarter of 2009, a strong deflation was registered. This phenomenon was caused by a plunge in energy costs. This deflation generated economic problems for the F.E.D. Indeed, the sustained price slide might have delayed consumer purchases. Moreover, persistent deflation is dangerous because it decreases corporate profits, later generating a bear market in stocks. Deflation was also observed in the last quarter of 2014 and in the first quarter of 2015. S.S.V.S. is applied in order to identify the predictions that best explain inflation evolution. First, the algorithm is applied to the entire forecast horizon (1991:Q1-2015:Q1). The correlation matrix did not indicate a multicollinearity problem. The estimation results for the model with all of the candidate predictors are presented in Table 1.
The mean, median and forecasts of financial service providers positively influenced the actual inflation values. The forecasts of non-financial service providers had a lower and negative impact on registered values. The variable inclusion probabilities are computed in Table 2.
The values of the posterior means indicated that the highest probability of inclusion is registered by the median of the forecasts, followed by the average predictions. Thus, the aggregate predictions explained the actual values better than the individual forecasts did. The refined regression model includes only one variable, according to Table 3.
After the application of the S.S.V.S. algorithm, the average forecasts and the predictions of the individual providers are excluded at an acceptance probability of 0.3. In the final regression, only the median of the forecasts is chosen. We also computed some accuracy indicators in order to identify the best predictions: mean error, mean absolute error, root mean square error and mean of the relative error in absolute value. The calculations are presented in Table 4. The individual error is computed as the difference between the actual and the predicted value.
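The four accuracy indicators can be written compactly. The sketch below follows the convention stated above (error = actual − predicted) and uses hypothetical numbers rather than the data in Table 4.

```python
import math

def accuracy_indicators(actual, predicted):
    """Mean error (ME), mean absolute error (MAE), root mean square error
    (RMSE) and mean of the relative error in absolute value (MRAE),
    with the individual error defined as actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mrae = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return me, mae, rmse, mrae

# Hypothetical actual and forecast values for four quarters:
me, mae, rmse, mrae = accuracy_indicators([2.0, 4.0, 2.0, 4.0],
                                          [1.0, 6.0, 1.0, 6.0])
# me = -0.5 (overestimation on average), mae = 1.5, rmse = sqrt(2.5), mrae = 0.5
```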
All of the experts' predictions were on average overestimated, with the mean error having a negative value. The least mean error was registered by the forecast mean. For the other indicators (mean absolute error, root mean absolute error and mean of the relative error in absolute value), the forecasts of non-financial service providers had the lowest values.  However, these results are contrary to the S.S.V.S. algorithm procedure, which showed that the forecasts of non-financial service providers were less correlated to the actual values. The use of the accuracy measure alone is not enough. In the literature, the use of a statistical test for checking accuracy differences is recommended. In this case, the sample is large enough and the Diebold and Mariano (1995) (D.M.) test is applied (see results in Table 5).
According to D.M. tests for all paired variables, there are no significant differences between experts' forecasts at a significance level of 5%.
The S.S.V.S. procedure is also applied in order to determine new predictions as an alternative to the initial ones of the S.P.F. experts. The algorithm is applied on another period (1991:Q1-2012:Q4) in order to make predictions for the horizon 2013:Q1-2015:Q1. We consider several acceptance probabilities (0.3, 0.4 and 0.5), as can be seen from Table 6.
For acceptance probabilities of 0.3 and 0.5, the mean, median and forecasts of non-financial service providers positively influenced the actual inflation values, while the forecasts of financial service providers had a lower and negative impact on registered values. For an acceptance probability of 0.4, the median and the forecasts of financial service providers positively influenced the actual inflation values, while the mean and the forecasts of non-financial service providers had a lower and negative impact on registered values. The variable inclusion probabilities are presented in Table 7.
For an acceptance probability of 0.3, the median has the highest probability of being included in the final model. For the acceptance probabilities of 0.4 and 0.5, the highest inclusion probabilities belong to the forecasts of non-financial service providers and to the mean, respectively. The estimation results for the refined regressions are presented in Table 8.
After the application of the S.S.V.S. algorithm, for an acceptance probability of 0.3, all of the variables, excluding the constant, are included in the final regression. For an acceptance probability of 0.4, the mean, median and forecasts of non-financial service providers are selected. For an acceptance probability of 0.5, only the mean and the median forecasts were introduced in the final regression. These Bayesian regressions are used in order to construct combined forecasts, which are denoted by C1, C2 and C3. Another important strategy for improving the experts' predictions might be the use of principal component analysis for the forecasters' predictions. The equations from the principal component analysis are used to build new predictions. This method solves the problem of multicollinearity that appears in traditional regression models. We use the data from the period 1991:Q1-2012:Q4 in order to extract the principal component. The corresponding equation will be used to determine the new predictions for 2013:Q1-2015:Q1. The communalities are presented in Table 9. The highest communalities are registered by the mean and the median of the experts' forecasts: 0.843 and 0.844, respectively. The lowest communality was computed for the forecasts of non-financial service providers.
Only one principal component was extracted using the Kaiser criterion (the principal component for which the eigenvalue is greater than 1). The total variance explained can be seen in Table 10.
This principal component explained 68.46% of the variation in the forecasts; the first two principal components together explained 88.922% of the variation. The selected principal component (inflation) is a linear combination of the four forecast series (t is the index for time). The new combined forecast based on principal component analysis is denoted by C4. The combined predictions are presented in Table 11.
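Extracting the first principal component of the forecast series can be sketched with a power iteration on their correlation matrix. This is a pure-Python illustration with hypothetical data, not the authors' computation, which relied on standard statistical software and the Kaiser criterion.

```python
import math

def standardise(col):
    """Centre a variable and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]

def first_principal_component(columns, iters=200):
    """Return (loadings, explained_share, scores) for the first principal
    component, via power iteration on the correlation matrix."""
    z = [standardise(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(p)]
            for i in range(p)]
    v = [1.0] * p                        # power-iteration start vector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    scores = [sum(v[i] * z[i][t] for i in range(p)) for t in range(n)]
    return v, eigenvalue / p, scores     # share = eigenvalue / number of variables

# Two strongly correlated hypothetical forecast series: the first PC captures
# almost all of their common variation.
loadings, share, scores = first_principal_component(
    [[1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]])
```

The PC scores play the role of the combined forecast after being mapped back to the scale of the target variable.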
Moreover, the traditional combination schemes were applied: the optimal scheme (O.P.T.), the scheme with weights inversely related to the mean squared error (I.N.V.) and the scheme with equal weights (E.Q.). Let us consider two predictions at time t, p1_t and p2_t. The prediction errors follow normal distributions N(0, σ_i²), i = 1, 2, where σ_1² and σ_2² are the error variances. If the forecast error covariance is σ_12 = ρ σ_1 σ_2, with ρ the error correlation coefficient, the linear combination of the two predictions at time t (c_t) based on the weight m is:

c_t = m p1_t + (1 − m) p2_t.

For E.Q., the weights are equal (m = 1/2). The weights m_opt and m_inv are determined using the following formulae for the O.P.T. and I.N.V. schemes:

m_opt = (σ_2² − σ_12) / (σ_1² + σ_2² − 2σ_12),
m_inv = σ_2² / (σ_1² + σ_2²).

The following notations are used for the new combined predictions: C5 (mean-median), C6 (mean-f1), C7 (mean-f2), C8 (median-f1), C9 (median-f2) and C10 (f1-f2). These combined forecasts are provided for each classical combination scheme (see Table 12).
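The O.P.T. and I.N.V. weights can be sketched directly from the classical Bates–Granger formulae; the variance and correlation values below are hypothetical. Note that when the error correlation ρ is zero, the optimal weight reduces to the inverse-M.S.E. weight.

```python
def m_opt(sigma1, sigma2, rho):
    """Optimal (minimum-variance) weight on the first forecast."""
    cov = rho * sigma1 * sigma2
    return (sigma2 ** 2 - cov) / (sigma1 ** 2 + sigma2 ** 2 - 2.0 * cov)

def m_inv(sigma1, sigma2):
    """Weight inversely proportional to the mean squared error."""
    return sigma2 ** 2 / (sigma1 ** 2 + sigma2 ** 2)

def combine(p1, p2, m):
    """c_t = m * p1_t + (1 - m) * p2_t for each period t."""
    return [m * a + (1.0 - m) * b for a, b in zip(p1, p2)]

# With uncorrelated errors (rho = 0) the two schemes coincide:
print(m_opt(1.0, 2.0, 0.0), m_inv(1.0, 2.0))  # 0.8 0.8
```

The equal-weights scheme (E.Q.) is the special case `combine(p1, p2, 0.5)`.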
Except for the mean-median combination, the weight m for O.P.T. is lower than the weight m for I.N.V. The combined forecasts based on O.P.T. are presented in Table 13.
The predictions based on O.P.T. are quite close to the actual values, and there are insignificant differences between the combined predictions of this scheme.
The combined forecasts based on I.N.V. are presented in Table 14. For the first three quarters of 2013, I.N.V. led us to the same values of the combined predictions. However, for the other quarters, the predictions are very close.
The combined forecasts based on E.Q. are presented in Table 15. We also computed some accuracy indicators for experts' expectations and for the proposed combined forecasts in order to identify the best predictions: mean error, mean absolute error, root mean square error and mean of the relative error in absolute value (Table 16).
The assessment of forecast accuracy led us to the conclusion that our Bayesian technique of combination improved the experts' predictions. All of the accuracy measures registered lower values for the combined predictions. The smallest mean absolute error and the smallest mean of the relative error in absolute value were registered by C1, corresponding to the Bayesian regression with an acceptance probability of 0.3 in the variable selection. The smallest mean error and root mean square error were computed for C3, corresponding to the Bayesian regression with an acceptance probability of 0.5 in the variable selection. The predictions based on principal component analysis performed worse than the experts' forecasts. The new predictions refer to the horizon 2015:Q1-2016:Q1 (see Tables 17-19). Our Bayesian combined predictions performed better than naïve forecasts according to all accuracy measures.
The mean error of the predictions based on O.P.T. is lower than the mean error of C1, C2, C3 and C4 and of the experts' predictions. C6 (combination between mean and f1) registered the lowest value for the mean error. The mean of the relative error in absolute value was improved for the combined forecasts of O.P.T., C5 (combination between mean and median) and C6 (combination between mean and f1), being better than C1 according to this criterion. In general, I.N.V. and E.Q. performed worse than the combined forecasts based on the S.S.V.S. algorithm. If we make the comparison between these combined predictions based on classical schemes and naïve forecasts, all of the predictions based on I.N.V. … (Table 18). The values of R.M.S.E. for these Bayesian combined forecasts are presented in Table 19. Core inflation reflects the long-run trend in the price level. It excludes items with volatile price movements, eliminating transitory price modifications and short-run price volatility. Products in the food and energy sectors are excluded because they experience temporary price shocks that could diverge from the overall inflation trend.
In the case of P.C.E. inflation, consumer goods and services are taken into account. P.C.E. includes imputed and actual expenditures made by households (non-durable and durable goods and services).
In the U.S.A., the core inflation varied between 1.3% and 2.4% over 2015:Q1-2016:Q1, with small changes from one quarter to another. P.C.E. inflation was quite unstable, with a negative value in the first quarter of 2015 and the maximum value on this horizon in the second quarter of 2015. For such an indicator, the naïve predictions or trend extrapolation are not good solutions. Our approach based on S.S.V.S. or principal component analysis might provide better results.
In the case of the S.S.V.S. procedure, the acceptance probabilities are 0.3, 0.4 and 0.5 and the predictions are denoted by C1, C2 and C3. The predictions based on principal component analysis are denoted by C4.
The combined forecasts based on the proposed methods have close values for core inflation, but also for P.C.E. inflation. These new predictions will be compared to naïve forecasts and to S.P.F. expectations as an average of individual predictions and values of financial and non-financial service providers.
According to R.M.S.E., all of the proposed combined forecasts outperformed the naïve predictions and the S.P.F. expectations. In the case of core inflation, the most accurate predictions were given by principal component analysis. Financial service providers offered the worst predictions for core inflation, but there are no large differences between these values and the average.
In the case of P.C.E. inflation, the S.S.V.S. approach at 0.5 acceptance probability provided the best forecasts. The naïve forecasts had the lowest accuracy, and this result is expected because a negative value was registered for P.C.E. inflation in the first quarter of 2015, followed by positive values.

Conclusions
The assessment of forecast accuracy is a priority for prediction providers. In the context of the economic crisis, the necessity of getting better forecasts has considerably increased. The inflation rate is still an important indicator of economic health. The S.P.F. is one of the main providers of forecasts for this variable, which is also predicted by Blue Chips, the Government Administration and the Congressional Budget Office. In this study, we employed the rate of change in the quarterly average headline C.P.I. level (annualised percentage points). According to D.M. tests, on the horizon 1991:Q1-2015:Q1, there were no significant differences in accuracy between the four types of predictions provided by the S.P.F. (mean, median, f1 and f2). The main results of the research showed that S.S.V.S. helped us to get better S.P.F. predictions on the horizon 2013:Q1-2015:Q1. However, the optimal combined predictions outperformed these forecasts in terms of mean absolute relative error. For core and P.C.E. inflation on the horizon 2015:Q1-2016:Q1, the proposed combined forecasts using Bayesian and principal component analysis approaches performed better than naïve and S.P.F. forecasts. A limitation of the research is the fact that the results of the forecast accuracy evaluations depend on the forecast horizon and on the length of the data-set used for making estimations. Therefore, the proposed technique has an empirical character, and its generalisation is marked by a degree of uncertainty that might be minimised only by checking many different sets of data.
In future research, new techniques of forecast combination might be proposed in order to improve the S.P.F. experts' predictions. For an indicator such as inflation, the extrapolation of the last value in the data-set (the construction of naïve forecasts) might be a better solution than trend extrapolation.

Notes
1. Greenbook, or the Greenbook of the Federal Reserve Board of Governors, is a publication that presents projections for different economic indicators of the U.S. economy. These predictions are provided by the Federal Reserve Board.
2. Blue Chips economic indicators offer predictions for the current year and the subsequent year from each panel member, as well as a mean or consensus of their forecasts.

Disclosure statement
No potential conflict of interest was reported by the authors.