Forecasting the Estonian rate of inflation using factor models

ABSTRACT The paper presents forecasts of headline and core inflation in Estonia with factor models in a recursive pseudo out-of-sample framework. The factors are constructed with a principal component analysis and are then incorporated into vector autoregressive (VAR) forecasting models. The analyses show that certain factor-augmented VAR models improve upon a simple univariate autoregressive model but the forecasting gains are small and not systematic. Models with a small number of factors extracted from a large dataset are best suited for forecasting headline inflation. The results also show that models with a larger number of factors extracted from a small dataset outperform the benchmark model in the forecast of Estonian headline and, especially, core inflation.


Introduction
Inflation and changes in inflation are key measures of macroeconomic performance, so it follows that forecasting inflation is important in countries around the world, including Estonia. Volatile dynamics such as the pre-crisis rise in inflation, which has been largely attributed to the supply-side shocks that hit the small open economy (Benkovskis, Kulikov, Paula, & Ruud 2009), have challenged the forecasting skills of central bankers and policy-makers. 1 Forecasters earlier relied on models with only a few predictors, until increasing amounts of data became available at high levels of sectoral, regional and temporal disaggregation. Those macroeconomic, microeconomic and financial time series hold information that may be useful for economic forecasting and empirical analysis of monetary policy (Ibarra-Ramírez 2010). Bernanke & Boivin (2003) point out, however, that researchers who use a small number of variables in their analysis can exploit only a limited amount of information. Small-scale models have some advantages in their simplicity and tractability, but they are prone to omitted variable bias (Gavin & Kliesen 2008).
Factor models in which the individual macroeconomic and financial time series are driven by a small number of factors can be used to address the shortfalls of small-scale models. First, factor models summarize the information contained in a big dataset, which allows a richer information set to be incorporated in the analysis. Second, factor models are flexible in the way that they can simultaneously accommodate data released at different times, frequencies and areas. Finally, their methods for extracting driving factors are statistically rigorous, as they are agnostic about the structure of the economy (Bernanke & Boivin 2003).
This paper investigates the properties of factor model forecasts of Estonian headline and core inflation for the period from the second quarter of 2011 to the second quarter of 2014. Factors are constructed using a principal component analysis and are then incorporated into differently parametrized forecasting models. To assess the predictive abilities of the factor-augmented models, their forecasting errors are compared with those of a univariate benchmark model.
This paper contributes to the growing literature on forecasting in a data-rich environment in three ways. 2 First, it is the first systematic study to analyse the applicability of a factor-augmented vector autoregressive (VAR) model for forecasting inflation in Estonia. Second, it examines the importance of the number of factors in the inflation forecasting model, when the factors are extracted from datasets where consumer price indicators are excluded or from subgroups of variables. Third, the paper analyses the impact of small changes in the dataset on the forecast error distributions of different factor-augmented forecasting models.
This paper is organized as follows. Section 2 briefly reviews the existing literature. Section 3 discusses the econometric framework. Section 4 presents the data used in the econometric model. Section 5 presents the empirical results. The final section concludes. Tables 1-4 are displayed in the main text. Appendix 1 presents the factor analysis result tables and graphs and Appendix 2 presents the robustness test results. The Online Appendix displays the data used in the benchmark model.

Literature review
Forecasting using factor models has received a considerable amount of attention in recent years. Various studies have provided compelling evidence in support of the factor model forecast methodology. However, the literature is less conclusive in answering questions of how many factors to use in the model, the size of the dataset and the forecasting horizon.  Stock & Watson (2002b) review the forecast performance of factors, which they call diffusion indexes. The authors extract those factors from large datasets and estimate the consistency of models with time variation. They show that their diffusion index models, or factor models, offer substantial improvements over univariate autoregressive models, leading indicator and VAR models in an out-of-sample forecast of the Federal Reserve Board's Index of Industrial Production. Lin & Tsay (2005) compare the forecasts of simple factor models with those produced by advanced large predictor models such as partial least squares, BMA and combination forecasts models. Their findings indicate that partial least squares outperform other models in short-horizon forecasts using a dataset of 141 predictors. The factor model provides good forecast accuracy when the number of common components is between three and five. Gosselin & Tkacz (2001) compare the forecasting performance of four different factor models with that of univariate models. They conclude that the factor models are as accurate as more advanced models in forecasting the Canadian inflation rate. They include 344 Canadian variables together with 110 U.S. macroeconomic and financial variables. Small factor models that contain one, two or three factors yield the best forecast accuracy. The researchers provide evidence that gains in forecast efficiency can be obtained for a small open economy by combining foreign macroeconomic and domestic time series. 
Angelini, Henry, & Mestre (2001) extract up to 4 factors from large cross-sectional datasets comprising 278 variables for 11 EMU countries. They conclude that factor models have relatively good forecasting performance in four and eight quarter-ahead forecasts. Their  findings indicate that small models with two or three factors match the best alternative forecast models in an out-of-sample forecasting framework, especially if those factors are related to nominal developments. Bruneau, De Bandt, Flageollet, & Michaux (2007) investigate the forecasting performance of dynamic factors, which are extracted from 200 macroeconomic variables for France. Their results indicate that the dynamic factor model has good forecasting properties, especially when forecasting the core inflation rate. Factors extracted from datasets with blocks of homogeneous variables, particularly variables related to labour markets, improve their forecasts considerably. They also provide small-horizon factor-augmented VAR forecasts, finding that the FAVAR forecasts outperform the standard dynamic linear regression forecasting equation models at times of rising core inflation. Schumacher & Dreger (2004) study the performance of large-scale factor models for economic activity in Germany. They extract the factors from a dataset of 121 time series and calculate the prediction errors in out-of-sample forecasts, and they find that factor models outperform simpler univariate benchmark models. However, their forecasting gains prove to be limited and not systematic. Artis, Banerjee, & Marcellino (2005) construct a dynamic factor model from a U.K. dataset consisting of 81 variables. They consider forecasting models with between 4 and 12 factors and up to 3 lags. Their results are in line with those of previous studies for the U.S.A., such as Stock & Watson (2002b), who find that factor-based forecasts outperform standard benchmark models for price developments at both short and longer horizons.
The literature on factor model forecasts is less extensive for countries in Central and Eastern Europe (CEE), particularly for inflation forecasting. Ajevskis & Davidsons (2008) compare the forecasting performance of a diffusion index model with a generalized dynamic factor model for Latvia's gross domestic product (GDP). They use 126 quarterly time series to extract up to 12 factors. Both models outperform simpler models but the differences are not statistically significant. For short horizons, a model with four factors and two lags provides the best forecasting performance, but models with more factors and zero lags lead to better forecasting results for longer horizons. Stakenas (2012) focuses on Lithuanian GDP forecasting and uses simple and advanced principal component analysis to extract factors from a dataset of 52 monthly variables. He finds that factor models outperform naive univariate benchmark models. The most suitable models for the Lithuanian case encompass two factors irrespective of whether the factors are extracted by a generalized or static principal component method. In addition, the forecasts produced by a state-space model give similar results to those from forecasting using the principal component method.
For Estonia, Schulz (2007) derives common factors with a small-scale state-space model and with a large-scale diffusion index model, and subsequently forecasts real economic growth. The factor models show a better forecasting performance for most forecasting periods than univariate and multivariate benchmark models do. Schulz (2007) emphasizes that even though many data series are available for the Baltic states, those series are not very long and this makes it difficult to compare the results with those from mature Western countries.

Empirical model
The forecasting model uses a two-step approach. First, the factors are extracted and then they are incorporated in a forecasting model. This paper closely follows the static principal component approach of Stock & Watson (2002b) for the factor extraction. The forecasting equation is based on the approach proposed by Bernanke, Boivin, & Eliasz (2005).

Econometric framework
For the formal setup, let $X_t$ be an $N \times 1$ vector of time series with $t = 1, \dots, T$. It is assumed that both $N$ and $T$ are large. Those time series are driven by a few ($q$) unobserved common factors. In the general formulation of a dynamic factor model, each element of the vector $X_t = [x_{1t} \dots x_{it} \dots x_{Nt}]'$, for $i = 1, 2, \dots, N$, can be represented as follows:

$$x_{it} = \lambda_i(L) f_t + e_{it}, \qquad (1)$$

where $f_t$ is the $q \times 1$ vector of common factors, $\lambda_i(L)$ is a lag polynomial in non-negative powers of $L$ and $e_{it}$ is the idiosyncratic error term. The lag polynomial adds dynamics to the factor loadings $\lambda_i$, which are the weights that form a linear combination of the original variable when multiplied with the latent component. It is assumed that the innovation of the common factor $f_t$ has an autoregressive structure and that the idiosyncratic error term and the common factor are mutually orthogonal at all leads and lags. Moreover, in the so-called exact dynamic factor model, it is assumed that $E(e_{it} e_{js}) = 0$ for all $s$ if $i \neq j$, meaning that the idiosyncratic errors are mutually uncorrelated at any leads and lags (Stock & Watson 2011). Equation (1) has an alternative formulation in finite lag form:

$$X_t = \Lambda F_t + e_t, \qquad (2)$$

where $F_t = (f_t', \dots, f_{t-p}')'$ is an $r \times 1$ vector, so that $r = (p + 1) \times q$ factors drive the variables. $\Lambda$ is the factor loading matrix that relates the common factors to the observed series. It can be seen that the high-dimensional time series vector $X_t$ is driven by a vector of latent factors, $F_t$, and a vector of mean-zero idiosyncratic disturbances, $e_t$.
The static representation of the dynamic factor model yields the advantage that the factors can be estimated using principal components. It should be noted that since $X_t$ can contain lagged values, $F_t$ can be understood as containing arbitrary lags of factors. When the number of predictors $N$ and the number of observations $T$ grow large, the factors can be estimated consistently by the principal components of the $T \times T$ covariance matrix of $X_t$. 3 Stock & Watson (2002b) show that consistency is preserved even in an approximate factor model with factor loadings and idiosyncratic errors that are serially and weakly cross-sectionally correlated (Soares 2013).
The intuition behind this property is that only the linear combination of factors will remain after the weighted averages of the idiosyncratic disturbances have converged to zero because of the law of large numbers (Stock & Watson 2011).
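The principal component step can be illustrated with a minimal Python sketch. It assumes a balanced $T \times N$ panel held in a NumPy array and uses the normalization $F'F/T = I$; the function name `extract_factors` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def extract_factors(X, r):
    """Estimate r static factors from a T x N panel by principal components.

    Columns are standardized first. Returns the T x r factors (normalized so
    that F'F / T = I), the N x r loadings and the variance shares of the
    first r components.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The left singular vectors of Z are the eigenvectors of the T x T matrix ZZ'.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    F = np.sqrt(T) * U[:, :r]
    loadings = Z.T @ F / T
    share = s**2 / np.sum(s**2)
    return F, loadings, share[:r]
```

Working with the singular value decomposition of the standardized panel avoids forming the covariance matrix explicitly, which is convenient when $N$ is much larger than $T$.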
The forecasting equation is based on the approach proposed by Bernanke et al. (2005), who extract the factors in a similar manner to Stock & Watson (2002b) and then proceed by estimating a factor-augmented VAR. Though the variable of interest is the inflation rate, more economic variables could be incorporated in the VAR model. Let $Y_t$ denote an $M \times 1$ vector of observable macroeconomic variables. Along with the vector of observable time series, additional economic information is contained in a $k \times 1$ vector of unobserved factors, $F_t$. Given a vector $Y_t$ of important macroeconomic variables and a vector $F_t$ of unobserved driving factors, it is reasonable to assume joint dynamics for $(F_t, Y_t)$.
The joint dynamics are given by

$$\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \Phi(L) \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + \varepsilon_t, \qquad (3)$$

where $\Phi(L)$ is a conformable lag polynomial of finite order $d$ in the lag operator $L$ and $\varepsilon_t$ is an error term with a mean of zero and a covariance matrix $Q$.
If at least one of the terms of $\Phi(L)$ that relate $Y_t$ to $F_{t-1}$ is non-zero, Equation (3) is referred to as a factor-augmented vector autoregression, or FAVAR; otherwise, this system reduces to a standard VAR in $Y_t$. Since it is assumed that $M + k \ll N$, the FAVAR model can handle more information than standard small-scale VAR models, as the informational content of the large dataset of size $N$ is summarized in a small set of $k$ factors.
The $h$-step-ahead forecast for $(F_t', Y_t')'$ is obtained recursively.
The point estimate obtained is compared to the actual observed value, forming the forecast error $e^{t}_{t+h}$, which is used to calculate the root-mean-square errors (RMSE) (Hamilton 1994).
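The recursive forecast described above can be sketched in Python as follows. The sketch assumes that the factors have already been estimated, that the VAR is fitted by OLS with an intercept, and that $Y_t$ contains a single series; the function name `favar_forecast` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def favar_forecast(F, y, h, lags=1):
    """Iterated forecasts of y for steps 1..h from a VAR(lags) in (F_t', y_t)'."""
    Z = np.column_stack([F, y])                     # T x (k + 1)
    T = Z.shape[0]
    # Regressor rows: intercept followed by `lags` lags of the stacked vector.
    R = np.array([np.concatenate([[1.0]] + [Z[t - j] for j in range(1, lags + 1)])
                  for t in range(lags, T)])
    B, *_ = np.linalg.lstsq(R, Z[lags:], rcond=None)
    hist = [Z[T - j] for j in range(1, lags + 1)]   # most recent observation first
    path = []
    for _ in range(h):
        z_next = np.concatenate([[1.0]] + hist) @ B
        path.append(z_next)
        hist = [z_next] + hist[:-1]                 # feed the forecast back in
    return np.array(path)[:, -1]                    # last column is y
```

Each step feeds the previous forecast of the whole stacked vector back into the estimated system, so the factor forecasts drive the inflation forecast at horizons beyond one quarter.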

Number of factors and lag structure
Factor forecast applications differ not only in the factor estimation method employed but also in the number of factors used. The basic factor approach suffers from an important shortcoming: the factors that are extracted are ordered by how well they express the common movement in the whole dataset, which does not take account of the specific variables being forecast. Nor is the forecast horizon considered, though this could be of significance when targeted predictors co-move with the variable to be forecast more in certain periods than in others. Periods of stronger co-movement can be expected to yield better forecast performance (Eickmeier & Ziegler 2008). Dias, Pinheiro, & Rua (2010) point out that including only the first few factors in the forecasting equation might exclude other factors that have a high correlation with the target variable or the forecast horizon.
One important determinant of the predictive power of the factors and the number of them to be included in the forecasting equation is the size and composition of the dataset. Studies have shown the relevance of targeted predictors (Bai & Ng 2008;Boivin & Ng 2006). Somewhat in contradiction to the principle that large datasets are beneficial, oversampling problems are reported when arbitrary variables that are irrelevant for the time series to be forecast are added. Boivin & Ng (2006) point out that reducing the sample size can help sharpen the factor structure, and that as a result forecast efficiency improves when certain series show idiosyncratic error cross-correlations. 4 A second argument in favour of pre-selecting variables is that economic considerations might mean subgroups of variables related to the variable of interest would enhance the forecasting abilities of the factors extracted. For example, Bruneau et al. (2007) extract the first factor from a block of homogeneous sets of variables such as the Survey Block or the Employment Block to compare the forecasting performance of pre-selected subgroups with the ungrouped dataset forecasts. The assumption that removing or grouping targeted predictors should affect the forecasting performance is tested in two ways. First, I extract the same number of factors from a benchmark sample set and a reduced size one. The reduced size dataset excludes time series of domestic and foreign consumer prices, which should constitute targeted predictors of the headline and core inflation rates, therefore potentially worsening the predictive abilities of the extracted factors. Second, I construct sets of homogeneous variables and extract the factors from those subgroups. In the next step, I compare the forecasting performance of subgroup factor model forecasts.
In addition, I also combine the first factor from each of the subgroups, which is usually considered to contribute most to the forecast, and compare the forecasting performance of that factor with the performance of the individual subgroup forecasts.
While some studies base the number of factors on formal restrictions, others choose the number of factors heuristically. Following Bernanke et al. (2005), I use a heuristic approach and construct various FAVAR models with different numbers of factors and lag structures from different sized datasets, and use performance measures to assess their forecasting abilities. The reason for doing this is to allow the lag length and the number of statistically significant factors to be re-estimated in recursive out-of-sample forecasts for each period when the in-sample window is extended. However, assessing the impact of the number of factors and their lags on the forecasting performance is more difficult when these are reestimated for each period, making it challenging to draw conclusions if models with fewer factors and lags have higher predictive abilities than models with more factors.

Forecasting procedure and evaluation
Multistep ahead forecasts are made at one-quarter to six-quarter-ahead horizons, so h = 1, . . . , 6. I use a recursive pseudo out-of-sample forecasting method. The forecast performance is evaluated on the out-of-sample set. The in-sample set is used to initialize the methods of factor estimation, model estimation and lag order selection. The dataset starts in the first quarter of 2004 and ends in the second quarter of 2014. The choice of the starting date reflects the aim of incorporating a large number of balanced time series in the analysis. For every quarter, the forecast h-steps-ahead is obtained recursively.
From $y^{h}_{2011Q2+h}$ to $y^{h}_{2014Q2+h}$, the forecast mechanism is repeated 12 times. The iterative forecasts at the end of the out-of-sample set produce forecast values that are not used for further analysis, as the actual observed headline and core inflation values were not available at the time of analysis. Therefore, fewer observations enter the forecast performance evaluation for longer forecast horizons.
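The expanding-window mechanism can be sketched in Python as below. The AR(1) forecaster, the function names and the index `first_origin` are illustrative assumptions only; in the paper the lag order is selected by BIC and the factor models are re-estimated at every step.

```python
import numpy as np

def ar1_forecast(y, h):
    """Iterated h-step forecast from an AR(1) with intercept fitted by OLS."""
    R = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(R, y[1:], rcond=None)
    f = y[-1]
    for _ in range(h):
        f = c + phi * f
    return f

def recursive_oos_errors(y, first_origin, h, forecaster=ar1_forecast):
    """Forecast errors for horizon h from an expanding estimation window."""
    errors = []
    for origin in range(first_origin, len(y) - h):
        fc = forecaster(y[: origin + 1], h)   # uses data up to the origin only
        errors.append(fc - y[origin + h])     # target must already be observed
    return np.array(errors)
```

Because the target `y[origin + h]` has to lie inside the sample, the number of usable forecast errors falls by one for each additional quarter of horizon.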
To compare the forecast accuracy of the models, the RMSEs are calculated for each model from the differences in the values for the quarter-on-quarter inflation rate.
So that the forecast results are comparable, the RMSEs of all the forecast models are also computed relative to the RMSE of the benchmark autoregressive (AR) forecasts. The relative RMSE of the benchmark AR is therefore 1.00, or 100%.
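As a small illustration of these two measures, the following Python sketch computes the RMSE and the RMSE relative to a benchmark. The function names are hypothetical and the inputs are assumed to be equal-length sequences of forecasts and realized values.

```python
import math

def rmse(forecasts, actuals):
    """Root-mean-square error of a sequence of point forecasts."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def relative_rmse(model_forecasts, benchmark_forecasts, actuals):
    """RMSE of a model divided by the RMSE of the benchmark forecasts."""
    return rmse(model_forecasts, actuals) / rmse(benchmark_forecasts, actuals)
```

A value below 1.00 means that the model has a smaller RMSE than the benchmark over the evaluation window.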
I abstain from using the Diebold-Mariano test (Diebold & Mariano 1995) to test formally the statistically significant difference between the models in their predictive abilities. 5 Researchers tend to conduct forecasting exercises on different time periods but testing the model on different time periods proves difficult in the Estonian case, as the length of the data sample for the factor estimation is limited. Instead, I test for the impact of removing one observation by excluding the second quarter of 2014 from the calculation of the RMSE for every forecast horizon. The new RMSE are calculated using data from the second quarter of 2011 to the first quarter of 2014. If the RMSE do not deviate by significant margins between the two time periods, the results obtained are considered to be robust for small changes. In addition to testing for the impact of small changes to the forecasting period, I draw 2000 random samples from the benchmark dataset of 388 variables (see Section 4) and create datasets of 329 variables. The same principle is applied to the reduced dataset, with the number of variables in each random draw cut by 37, or about 15%. Those 2000 different datasets are used to extract the factors and forecast the headline and core inflation rates in the way described earlier. In the next step, the distributional properties of the 2000 consecutive individual model forecast errors are analysed. Specifically, I plot the frequency distribution of the FAVAR models and analyse their shape, centre, spread and position relative to the benchmark AR model. I also test for the impact of different stationarity-inducing transformation schemes on the forecast performance.
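The random draws of variables described above remove 388 - 329 = 59 series (about 15.2%) from the benchmark dataset and 37 of 246 series (about 15.0%) from the reduced dataset. A minimal Python sketch of the sampling step follows; the function name, the seed and the use of column indices are assumptions made for illustration.

```python
import random

def draw_subsets(n_vars, n_keep, n_draws, seed=0):
    """Draw n_draws random subsets of n_keep column indices without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_vars), n_keep)) for _ in range(n_draws)]

# Example: 2000 draws of 329 out of 388 variables.
subsets = draw_subsets(388, 329, 2000)
```

Each list of indices would then select the columns of the panel from which the factors are extracted and the forecasts are recomputed.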

Factor-augmented VAR forecast models
The FAVAR forecasts are constructed by choosing the number of factors to be included and the lag order. I estimate 13 FAVAR models, the results for 7 of which are reported in detail. 6 All the 13 FAVAR forecasts share the same properties for the M vector. The M vector is a one-variable vector that contains either the headline inflation rate or the core inflation rate, depending on the forecasting exercise.
For the models with a fixed lag length, I start testing from small dimensional FAVAR models and then add more factors and lags. 'FAVAR 1F.1 Lag' contains the first factor (1F.) and has a lag length of one (1 Lag). 'FAVAR 12F. 1 Lag' is a three-variable vector, containing the inflation rate plus the first two factors (12F.). The model 'FAVAR 123F. 1 Lag' contains the third factor as well. Equal size k-factor models were also tested for lag lengths of two and three.
The forecast results of the FAVAR models are compared to the results of the benchmark model. Following Stock & Watson (2002b), a univariate autoregressive model of order p is used as the benchmark. The benchmark AR is based on the headline inflation rate and the core inflation rate. The lag length of the estimated lag polynomial is selected iteratively by BIC, and is allowed to vary between one and three (1 ≤ p ≤ 3). 7 Given that the ARMA model forecasts do not improve upon the AR model forecasts, they are not reported in Section 5. 8 In the spirit of Rünstler et al. (2009), the forecasting abilities of the FAVAR models are also tested against the averages of N varying-length bivariate VARs. For each time series $x_{i,t}$, $i = 1, \dots, N$, the VAR is estimated in the vector $(y_t, x_{i,t})'$, where $y_t$ is the inflation rate and $x_{i,t}$ is a quarterly indicator. The average of the N forecasts is then

$$\hat{y}_{t+h|t} = \frac{1}{N} \sum_{i=1}^{N} \hat{y}_{i,t+h|t},$$

where $\hat{y}_{i,t+h|t}$ denotes the inflation forecast from the $i$-th bivariate VAR.
Stock & Watson (2010) also posit that since the financial crisis, it has become increasingly difficult to improve systematically upon simple univariate forecasting models like the random walk model (RWM) of Atkeson & Ohanian (2001). Therefore, an RWM constitutes the last alternative benchmark model.
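The BIC-based choice of the benchmark AR order can be sketched in Python as follows. The sketch assumes OLS estimation with an intercept on a common estimation sample and uses the Gaussian form of the BIC; the function name `select_ar_order_bic` is hypothetical.

```python
import numpy as np

def select_ar_order_bic(y, max_p=3):
    """Return the AR order in 1..max_p that minimizes the BIC."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]                   # same sample for every candidate order
    n = len(target)
    best_p, best_bic = None, np.inf
    for p in range(1, max_p + 1):
        lagged = [y[max_p - j: len(y) - j] for j in range(1, p + 1)]
        R = np.column_stack([np.ones(n)] + lagged)
        beta, *_ = np.linalg.lstsq(R, target, rcond=None)
        resid = target - R @ beta
        bic = n * np.log(resid @ resid / n) + (p + 1) * np.log(n)
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p
```

In a recursive exercise this selection would be repeated every time the in-sample window is extended, so the chosen order can change from one forecast origin to the next.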

Data
The data section contains two parts. Section 4.1 briefly presents the variables and their treatment in the dataset and Section 4.2 reports the results of the factor analysis.

Variables
The series chosen for the panel used in the analysis are similar to the variables used by Stock & Watson (2002b). First, credit aggregates such as credit to firms and households are included along with data for different credit maturities, such as long-term and short-term credit. Similarly, series such as deposits from companies and deposits from individuals have been included. State budget revenues and state budget expenditures series are used in addition.
Various interest rates such as the 6-month Euribor rate and short-term interest rates are included in the dataset, as are money supply aggregates such as the M3 rate and key data on the balance of payments. Further statistics on trade in consumer and capital goods are used so as to account for Estonia's open economy structure. The series of the composite leading indicators (CLI) may help to predict the future economic climate and are also included.
Labour market dynamics can play a significant role in the development of wages and prices, so I include the unemployment and job vacancy rates among other statistics. Next, I include data on the output of total, intermediate and capital goods, and data on new orders such as new orders for manufacturing goods. Like the CLI, business survey statistics give information on economic expectations, so turnover and sales are included in the dataset as they can be seen as indicators of consumer sentiment.
Following the findings of Gosselin & Tkacz (2001), who conclude that the macroeconomic dynamics of trading partners are of importance for factor modelling of inflation and output in open economies, I also consider price aggregates and the composite leading indicator series of Estonia's biggest trading partners. The aggregate PPI index for the whole euro area enters the panel, alongside the individual PPI indexes for Finland, Lithuania, Latvia, Germany and other main trading partners. In addition, the indexes are split up into sub-categories such as producer prices for energy, and food and beverages.
Another major group of variables is the harmonized consumer price indexes (HICP) of Estonia and Estonia's trading partners. The foreign consumer price indexes can be interpreted as foreign inflation proxies. First, the HICP series from trading partners in the European Union are included in the dataset and second, the sub-indexes such as the HICP energy series or HICP food and beverages series also enter the dataset. In total, more than 140 different harmonized consumer price indexes are included in the dataset.
Financial market dynamics should also be considered, so I include stock price data from the Helsinki stock exchange (OMXH) and the Russian RTS index. The effects of productivity changes are captured by incorporating data on the number of hours worked, average wages by employment and nominal and real unit labour costs. The last major items included in the dataset are various economic deflators.
Only a few variables on personal consumption are available for Estonia and the same applies to detailed payroll and housing sales statistics. There are no Estonian sovereign debt securities or Estonian inflation-protected securities. This is unfortunate as inflation-protected securities may be used to compute measures of inflation expectations (Shen & Corning 2001).
The variables to be forecast are the Estonian headline inflation rate and the Estonian core inflation rate. Headline inflation is defined as the official measure of consumer price inflation in Estonia for goods and services. Core inflation is a sub-category of headline inflation that excludes energy, food, alcohol and tobacco items.
The first panel used in this paper consists of 388 domestic and foreign time series at 42 quarterly observations, ranging from the first quarter of 2004 until the second quarter of 2014. This panel is labelled the 'benchmark dataset'.
To test for panel size effects and targeted predictor effects, a second panel with 246 time series was created. Its basis is the benchmark dataset, with all domestic and foreign HICP series excluded. Those series were excluded because the factor analysis in Section 4.2 indicated their importance for the first factor and, as a possible consequence, for its forecasting performance. This also makes it possible to test how a reduced-size dataset built on the idea that removing targeted predictors should limit the predictive abilities of the extracted factors compares with a reduced-size dataset where the series to be removed are determined by random sampling. This procedure is explained in more detail in Section 5.4. The panel excluding all domestic and foreign HICP series is labelled the 'reduced dataset' or, for clarity, the 'reduced-size dataset'. A complete list of the variables used in the benchmark dataset is reported in the Online Appendix (see Table C1).
The untreated dataset contained monthly and quarterly time series, so the monthly series were transformed into quarterly series. First, this yields the advantage that the quarterly series do not have to undergo a linear interpolation procedure to generate monthly series. Second, Eickmeier & Ziegler (2008) point to evidence that quarterly data are better suited to factor forecasts than monthly data. The process of transformation involved averaging the monthly values as quarterly values, summing up the monthly values, or taking the value for the end of the last month as the quarterly value.
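The three aggregation rules mentioned above (averaging, summing and taking the end-of-quarter value) can be illustrated with a short Python sketch. The function name and the `how` argument are hypothetical, and the input is assumed to start in the first month of a quarter and to cover whole quarters.

```python
def to_quarterly(monthly, how="mean"):
    """Aggregate a list of monthly values into quarterly values."""
    if len(monthly) % 3:
        raise ValueError("expected a whole number of quarters")
    quarters = [monthly[i:i + 3] for i in range(0, len(monthly), 3)]
    if how == "mean":    # e.g. index levels or rates
        return [sum(q) / 3 for q in quarters]
    if how == "sum":     # e.g. flow variables
        return [sum(q) for q in quarters]
    if how == "last":    # e.g. end-of-period stocks
        return [q[-1] for q in quarters]
    raise ValueError(f"unknown aggregation rule: {how}")
```

Which rule suits which series is a judgment about the nature of the variable; the examples in the comments are illustrative and not a statement of the paper's choices.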
Missing observations were treated with a regularized iterative missing principal component analysis algorithm to avoid the overfitting problems associated with using an expectation-maximization algorithm (Josse & Husson 2012). In the next step, the seasonal effects were removed from the set of variables. Time series that were already seasonally adjusted according to the issuing source were still put through this stage to remove any residual seasonality. The augmented Dickey-Fuller test was performed on all the seasonally adjusted time series. Non-stationary series were marked and then subjected to a stationarity-inducing transformation. The transformations involved taking the log differences for series that included non-negative values. For series that included positive and negative values, the first difference was taken. The exact treatment of every time series can be found in the Online Appendix C.1. In the last step, all the series were standardized to have sample mean zero and unit sample variance.
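The last two treatment steps can be sketched in Python as below. The sketch assumes that a series has already been flagged as non-stationary (the unit root test itself is not shown) and applies log differences only when all values are strictly positive, since the logarithm is undefined otherwise; the function names are hypothetical.

```python
import numpy as np

def make_stationary(series):
    """Log difference for strictly positive series, first difference otherwise."""
    x = np.asarray(series, dtype=float)
    if np.all(x > 0):
        return np.diff(np.log(x))
    return np.diff(x)

def standardize(series):
    """Rescale a series to sample mean zero and unit sample variance."""
    x = np.asarray(series, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)
```

Standardization matters for the principal component step because series measured in large units would otherwise dominate the extracted factors.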

Factor analysis
I start the analysis of the factors with the benchmark dataset (N = 388). As described in Section 3.4, a maximum of five factors is used in the vector autoregressive models. The principal components summarize the variance in a dataset. The first component explains 21.94%, the second 16.68%, the third 7.99%, the fourth 5.65%, and the last one 3.82% of the total variance in the dataset. The cumulative share of the total variation of the macroeconomic variables explained by the first three factors is 46.61% and that explained by the first five factors is 56.08%.
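As an arithmetic check on the cumulative shares reported above, the individual shares can be accumulated in a few lines of Python; the variable names are chosen for illustration only.

```python
from itertools import accumulate

# Variance shares (in %) of the first five principal components, benchmark dataset.
shares = [21.94, 16.68, 7.99, 5.65, 3.82]
cumulative = [round(c, 2) for c in accumulate(shares)]
print(cumulative)  # [21.94, 38.62, 46.61, 52.26, 56.08]
```

The third and fifth entries reproduce the 46.61% and 56.08% quoted in the text.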
For the reduced-size dataset, the variance explained by the first principal component is almost six percentage points more than the variance explained by the first principal component in the big dataset. The cumulative explained variance of the first three common components is 46.24% and that for the first five components is 56.44%, which is about the same as in the big dataset.
In the next step, the latent common components are extracted. The dynamics of these factor indexes over the span of the dataset are captured in the time series plot of Figure A1 in the appendix. To make the presentation clearer, only the first three factors are depicted. The initially unobserved factor dynamics are plotted together with the observed headline inflation rate. The visual analysis indicates that all three factors show either strong co-movements or converse movements with the inflation rate. Those movements seem either to coincide with or to lead the inflation rate, which should give them predictive abilities. For the smaller dataset (N = 246), co-movements of the factors and the inflation rate are visible but not as conspicuous.
The correlation between the observed variables and the unobserved common component can be analysed by extracting the variables that are most characteristic for each dimension obtained by principal component analysis. This means that the statistically significant variables are identified and ranked by their correlation coefficient for the particular factors. The significance threshold at which a variable characterizes the dimension is set at 0.05. Only the variables with the 10 highest positive and negative correlation coefficients are extracted and analysed.
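The ranking step can be sketched in Python as follows. The sketch only orders the variables by their correlation with a given factor and returns the strongest positive and negative ones; the significance screen at the 0.05 level is deliberately omitted, and the function name and arguments are hypothetical.

```python
import numpy as np

def top_correlated(X, factor, names, n=10):
    """Return the n most positively and n most negatively correlated variables."""
    X = np.asarray(X, dtype=float)
    corr = np.array([np.corrcoef(X[:, j], factor)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(corr)                       # ascending correlation
    top_neg = [(names[j], corr[j]) for j in order[:n] if corr[j] < 0]
    top_pos = [(names[j], corr[j]) for j in order[::-1][:n] if corr[j] > 0]
    return top_pos, top_neg
```

Applied to the first factor, the head of the positive list would correspond to the kind of ranking reported in the appendix table.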
An example of the correlation between the observed variables and the unobserved common component is given in Table A1 in the appendix. The table reports the correlation of variables with the direction of the first factor. The producer price indexes (PPI) of Estonia's trading partners contribute most to the first factor, with the Finnish PPI excluding construction being the most important. PPI Industry Lithuania (ex construction) and PPI Intermediate goods of the European Union 15 are ranked as the fourth and fifth most important variables in terms of correlation. The turnover and sales of intermediate goods and the output of intermediate goods are also strongly positively correlated with the first factor.
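The selection rule described above (keep variables whose correlation with a factor is significant at 0.05, then report the 10 strongest positive and negative ones) can be sketched as follows. This is a minimal illustration, not the procedure used for Table A1: the text does not state which significance test is applied, so the Fisher z approximation is an assumption, and all function names and the toy data are hypothetical.

```python
import math
from statistics import NormalDist

def pearson(x, y):
    """Sample Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Two-sided p-value for zero correlation (assumed Fisher z approximation)."""
    if abs(r) >= 1.0 - 1e-12:  # (near-)perfect correlation: atanh diverges
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def characteristic_variables(factor, panel, alpha=0.05, top=10):
    """Return the `top` significant positive and negative correlates of a factor."""
    significant = []
    for name, series in panel.items():
        r = pearson(factor, series)
        if p_value(r, len(factor)) < alpha:
            significant.append((name, r))
    positive = sorted((v for v in significant if v[1] > 0), key=lambda v: -v[1])[:top]
    negative = sorted((v for v in significant if v[1] < 0), key=lambda v: v[1])[:top]
    return positive, negative

# Toy data (hypothetical): one factor and three observed series.
factor = [0.1, 0.5, -0.3, 0.8, -0.6, 0.2, 0.9, -0.4, 0.0, 0.7, -0.8, 0.3]
panel = {
    "ppi_a": [2.0 * f + 1.0 for f in factor],  # moves with the factor
    "ppi_b": [-f for f in factor],             # moves against the factor
    "noise": [1.0, -1.0] * 6,                  # weak, insignificant link
}
positive, negative = characteristic_variables(factor, panel)
print(positive, negative)
```

In the toy panel the weakly related series is filtered out by the significance threshold, so only the two strongly related series are ranked.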

Benchmark dataset forecasting results
The results for the forecast errors from the benchmark dataset are reported in Table 1. First, the relative root-mean-square errors (RMSE) of the benchmark AR, the alternative forecasts and the FAVAR forecasts are reported. The RMSE of each of the forecasting models is shown relative to the RMSE of the benchmark AR model (so the autoregressive forecast has a relative RMSE of 1.00). The six columns show the relative forecast error for one- to six-quarter-ahead forecasts. To give an example, the forecast error of the simulated alternative RWM is 114.7% of the forecast error of the autoregressive forecast at the one-quarter horizon. Lower values of the relative RMSE indicate smaller forecast errors. The results for the lowest relative RMSE, which indicates the highest predictive abilities of the factor models, are given in bold.
The last row in the table shows the RMSE of the autoregressive benchmark for the given forecast horizon. The RMSE of the benchmark AR model can be interpreted as the percentage deviation of the forecast point estimates from the actual observed values over the full forecast window.
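The relative RMSE used in the tables can be illustrated with a short sketch. The numbers below are invented for illustration and are not taken from Table 1; the function names are hypothetical.

```python
import math

def rmse(forecasts, actuals):
    """Root-mean-square error of a list of point forecasts."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def relative_rmse(model_forecasts, benchmark_forecasts, actuals):
    """RMSE of a model divided by the RMSE of the benchmark (benchmark = 1.00)."""
    return rmse(model_forecasts, actuals) / rmse(benchmark_forecasts, actuals)

# Illustrative values only (not from the paper).
actual = [2.0, 3.0, 4.0, 3.5]
benchmark = [2.4, 2.8, 4.2, 3.1]   # hypothetical AR forecasts
candidate = [2.2, 2.9, 4.1, 3.3]   # hypothetical FAVAR forecasts

ratio = relative_rmse(candidate, benchmark, actual)
print(f"relative RMSE = {ratio:.2f}")
```

A value below one means the candidate model has smaller forecast errors than the benchmark; a value of 1.147, as quoted for the RWM at the one-quarter horizon, means errors 14.7% larger.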
First, in many cases, the performances of the FAVAR forecasts are better than those of the benchmark forecasts, but the differences are generally quite small. For example, from the results of the one-quarter-ahead forecasts in the first column, it can be observed that factor-augmented VAR forecasts including only the first factor show an improvement in forecasting performance over the AR benchmark and the other alternative models.
In line with the results of Stock & Watson (2002b), models with a low lag order tend to perform better for all horizons. In most cases, the FAVAR models with two lags show the best forecasting performance, and they show a tendency to improve on the benchmark at short horizons. The smallest forecast errors are usually obtained for forecasts two quarters ahead. At the one-year horizon, only the FAVAR model with the first factor and two lags improves on the benchmark, by 5%. For the forecasts six quarters ahead, no FAVAR model is able to outperform the benchmark AR model. In contrast, the RWM seems to capture the inflation dynamics appropriately on longer horizons, outperforming the benchmark AR model by almost 20%.
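To make the mechanics of such forecasts concrete, the following is a minimal sketch of an iterated h-step forecast from a one-factor, one-lag FAVAR estimated equation by equation with OLS. It is an illustration under stated assumptions, not the estimation code behind Tables 1 and 2: the function names, the single lag and the toy series are hypothetical, and the factor is taken as given rather than extracted by principal components.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, target):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

def favar_forecast(inflation, factor, h):
    """h-step-ahead inflation forecast from a VAR(1) in (inflation, factor)."""
    rows = [[1.0, p, f] for p, f in zip(inflation[:-1], factor[:-1])]
    b_pi = ols(rows, inflation[1:])   # inflation equation
    b_f = ols(rows, factor[1:])       # factor equation
    p, f = inflation[-1], factor[-1]
    for _ in range(h):                # iterate the system forward
        p, f = (b_pi[0] + b_pi[1] * p + b_pi[2] * f,
                b_f[0] + b_f[1] * p + b_f[2] * f)
    return p

# Toy data (hypothetical): a noise-free VAR(1), so OLS recovers it exactly.
pi, fa = [2.0], [1.0]
for _ in range(11):
    pi.append(0.2 + 0.3 * pi[-1] + 0.4 * fa[-1])
    fa.append(1.0 - 0.8 * fa[-1])

forecast = favar_forecast(pi[:10], fa[:10], h=2)
print(forecast, pi[11])
```

Because the toy series contain no noise, the two-step forecast from the first ten observations coincides with the held-out value; with real data the same loop would be repeated at every forecast origin and lag order.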
Turning to the results for the core inflation in Table 2, it can be seen that the RMSE of the benchmark model are smaller than those for the headline inflation forecasts. This is in line with the theoretical arguments; given that the core inflation rate is less volatile than the headline inflation rate, it should be easier to forecast and therefore should yield smaller forecasting errors.
The FAVAR models tend to have slightly higher predictive abilities for headline inflation than for core inflation and this is especially true at longer forecast horizons. The forecasts from the FAVAR models outperform the benchmark AR forecasts only on the one- and two-quarter forecast horizon for core inflation.

Forecasting results for the reduced dataset
The forecasting results for headline inflation using the reduced size dataset are shown in Table 3. It may be presumed that domestic and foreign consumer price indexes constitute important predictors of the Estonian headline and the core inflation rates. Removing those targeted predictors may change the factor structure and consequently the forecasting performance of the FAVAR model. Decreasing the size of the dataset and removing the targeted predictors may thus produce higher RMSE, indicating lower predictive abilities. It is, however, also possible that decreasing the dataset size from 388 to 246 variables removes less important predictors that dilute the extracted factors, so that FAVAR forecasts based on the resulting factors have lower RMSE than FAVAR forecasts with factors extracted from the benchmark dataset. As the underlying time series of the benchmark model have not changed, and so neither have their RMSE, the absolute and relative forecasting errors of the FAVAR models can be directly compared between the two different sized datasets.
Stock & Watson (2002b) found that the performance of comparable models is usually better when factors from a full dataset are used than when those from a reduced size subset are used. However, the assumption that removing predictors from the dataset would lead to worse RMSE values cannot be confirmed by the results obtained for Estonia. First, the best performing headline FAVAR models from the benchmark dataset contain fewer factors than the best performing headline FAVAR models from the reduced dataset. Second, for forecasts within one to three quarters, the FAVAR model with the first factor shows the lowest RMSE in the benchmark dataset, whereas the FAVAR model with the first three factors shows lower forecasting errors in the reduced size dataset for the four to six quarters horizon.
Comparing these different models, I find that the forecasting performance is quite similar and that most models outperform the benchmark by small margins. Table 4 shows the forecasting results for core inflation when the reduced dataset is used. From the earlier results, the benchmark AR model is known to be the main competitor to the FAVAR forecasts, as the RWM does not seem to capture the less volatile dynamics of the core inflation rate very well.
Notable differences appear when the headline and the core inflation forecasts are compared within each of the two datasets and are also apparent between the benchmark and the reduced-size datasets. The best headline FAVAR forecasts show lower forecasting errors than the benchmark AR for the forecasts one, three, four, five and six quarters ahead even though the performance improvement is partly weak in economic terms. The best core FAVAR model forecasts improve upon the benchmark AR in all the one- to four-quarter-ahead forecasts. However, the forecast improvement gains are much higher, especially for the forecasts one, two and three quarters ahead. To give an example, for the forecasts one to four quarters ahead, the core FAVAR models improve on average upon the headline FAVAR forecasts by 11%.
Comparing the headline inflation forecasts of the benchmark dataset with the headline inflation forecasts of the reduced dataset, it can be seen that FAVAR models with one factor have the lowest forecasting errors in the benchmark dataset, whereas FAVAR models with the first three factors have the lowest RMSE in the reduced dataset. The differences in forecasting errors between these two forecasting models are, however, small and not systematic.
The results for the core inflation forecasts are more conclusive. Not only can a tendency for multi-factor models to have better forecasting abilities than models with only the first factor be observed, but the best performing core FAVAR forecasts are obtained when the factors are extracted from the reduced size dataset. In addition, those forecast errors are the smallest of any model at all forecasting horizons.
The results from Tables 1 to 4 indicate that the forecasting performance of the FAVAR models is directly related to the number of factors included in the model. There is a clear tendency for FAVAR models with the first three factors to have higher predictive abilities than models containing only the first factor when those factors are extracted from a reduced size dataset. The forecasting performance also depends on the number of factors and the inflation measure to be forecast. These dynamics are interesting and deserve some discussion.
One possible explanation why models including the first three factors have a similar forecasting performance to that of models with only the first factor, depending on the size of the dataset, is that the information content of the benchmark dataset is higher than the information content of the reduced size dataset. When the factors are extracted and included in an FAVAR model, the number of factors needed in the model for it to exhibit good predictive abilities reflects the additional information content of the dataset. An FAVAR model with only the first factor from the benchmark dataset seems to capture an appropriate amount of additional predictive information. In contrast, the first three factors have to be included in an FAVAR model to obtain similar predictive abilities when those factors are extracted from a reduced size dataset with presumably lower information content. One explanation might be that the benchmark dataset contains more targeted predictors. For that reason, the first factor shows good predictive abilities, whereas more than just the first factor is needed to achieve similar predictive abilities when the size of the dataset is reduced and thereby possible targeted predictors are excluded.
The second question that arises is why the core inflation forecasts with factors extracted from the reduced-size dataset improve upon the benchmark dataset and headline inflation forecasts by significant margins. One possible reason underlying this observation may be derived from the factor analysis in Section 4.2. When the factors are extracted from the small dataset, their dynamics are less pronounced. It may be conjectured that the interdependencies of these three factors and the inflation rate in the VAR system are more accurate in capturing the less volatile dynamics of the core inflation rate.
Finally, the results for the random walk forecasts deserve attention. The only model which consistently outperforms the benchmark AR by economically meaningful margins is a unit-root-based forecast, which is arguably a surprise. Comparing the results of the random walk forecasts for headline inflation and for core inflation, it is clearly observable that random walk forecasts have substantially better forecasting abilities than all other models when headline inflation is forecast three to six quarters ahead. No such pattern is visible for core inflation, and the random walk forecast tends to worsen with increasing forecast horizon.
The results for headline inflation are in line with the findings in Atkeson & Ohanian (2001) who found that backward-looking Phillips curve forecasts cannot improve upon naive RWMs. Even though it has been shown that those findings are sensitive to the sample period and the parametrization of the Phillips curve model, Stock & Watson (2007) admit that on average, it is difficult for multivariate models to beat simple univariate models. Stock & Watson (2007) argue therefore that the value added of more complex multivariate models compared to simple univariate models is limited.
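The comparison between a random walk forecast and an autoregressive benchmark in an expanding-window, pseudo out-of-sample exercise can be sketched as follows. This is a simplified illustration: the benchmark in the paper selects between one and three lags, whereas the sketch assumes a fixed AR(1) fitted by OLS, and the function names and the toy series are hypothetical.

```python
import math

def fit_ar1(y):
    """OLS estimates (intercept, slope) of y_t = c + phi * y_{t-1}."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi

def ar1_forecast(history, h):
    """Iterated h-step-ahead AR(1) forecast."""
    c, phi = fit_ar1(history)
    value = history[-1]
    for _ in range(h):
        value = c + phi * value
    return value

def rw_forecast(history, h):
    """Random walk forecast: the last observation, at every horizon."""
    return history[-1]

def recursive_rmse(y, forecaster, h, first_origin):
    """Expanding-window RMSE: re-estimate at each origin, forecast h steps ahead."""
    errors = []
    for origin in range(first_origin, len(y) - h):
        history = y[: origin + 1]
        errors.append(forecaster(history, h) - y[origin + h])
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Toy series (hypothetical): a pure linear trend.
toy = [float(t) for t in range(12)]
rw_rmse = recursive_rmse(toy, rw_forecast, h=2, first_origin=3)
ar_rmse = recursive_rmse(toy, ar1_forecast, h=2, first_origin=3)
print(rw_rmse, ar_rmse)
```

On a trending toy series the random walk misses the drift by exactly h units while the AR(1) with intercept tracks it; on persistent series without drift, as the headline inflation results at longer horizons suggest, the ranking can reverse.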

Forecasting results for the subgroups
So that homogeneous subgroups are obtained, only the monthly data series are used. To give an example of the series construction, consider the composition of the money and price indexes subgroups. The money subgroup groups together credit, deposits, finance, interest rates and money supply variables. The price indexes subgroup brings together PPI trade partners, domestic PPI, HICP trade partners, and import and export prices. Forecasts are conducted for each subgroup. 9 In addition, the first factor from each of the six subgroups is estimated. In the next step, all six factors enter a separate factor-augmented VAR and the forecasting procedure is repeated. Last, only the first factors from the three best performing subgroups (money, price indexes and real factors) are combined to form an FAVAR model that includes three factors. Table 5 shows the forecasting results for headline inflation for the subgroup price indexes. An FAVAR model where all three factors are included shows impressive forecasting performance, outperforming not only the benchmark model but also the random walk forecasts.
On the four-quarter horizon, the subgroup model shows forecast errors that are less than half those of the benchmark model. 10 Even the worst three-factor forecast improves upon the benchmark model by almost 20%.
However, the price indexes subgroup forecasts do not have the same predictive power when the core inflation rate is forecast, showing forecast improvements only at short horizons (Table A18 in the appendix). Similarly, the headline inflation forecasts of the subgroup money show good predictive abilities on longer horizons, outperforming the benchmark model by approximately 30% when the first factor is used for forecasting (Table A12 in the appendix). Also, the one-factor forecasts of the subgroup including real variables show lower prediction errors than the benchmark model at short horizons. 11
In the last step, I combine the first extracted factor from each subgroup and include them in the forecasting equation (Table 6). The forecasts fail to improve upon the benchmark forecasts under most specifications and show higher forecasting errors than models where the factors are extracted from a single dataset. This dismal forecasting behaviour might arise because factors with proven forecasting abilities, such as those from the price indexes or money subgroups, are combined with factors that do not improve forecasts when they are used to augment the VAR model. Surprisingly, an FAVAR that includes the first factor from the price indexes, money and real subgroups does not improve solidly over the model with all six subgroup factors at short horizons. At longer horizons in contrast, grouping only the three subgroup factors yields considerable forecasting gains. On the six-quarter horizon, a model composed of the first factor from the price indexes, money and real subgroups shows forecasting errors that are almost 50% lower than those of the benchmark AR model. However, these forecasting gains vanish with core inflation and no model shows stronger predictive abilities than the benchmark model.
The results of the subgroup forecasts indicate that factors extracted from homogeneous blocks of nominal variables, in particular those derived from consumer and producer prices, show superior forecasting abilities at most horizons when headline inflation is forecast. I also find good forecasting properties from the money subgroup, especially at longer horizons. These encouraging results are in line with the results obtained by other authors who use similar subgroups of nominal variables to extract factors to forecast nominal economic variables (Banerjee, Marcellino, & Masten 2014;Bruneau et al. 2007). Nevertheless, the question remains whether the strong forecasting performance stems from removing series with highly cross-correlated errors in the factor model or from linking nominal price series to the variable to be forecast or a combination of both. 12
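The step of extracting the first factor from each homogeneous subgroup can be illustrated with a small sketch. It assumes that the first principal component of the standardized series is obtained by power iteration, which is a swapped-in numerical shortcut rather than the routine used in the paper; the function names, subgroup contents and toy series are hypothetical, and constant series are assumed away.

```python
import math

def standardize(series):
    """Demean and scale a (non-constant) series to unit variance."""
    m = sum(series) / len(series)
    sd = math.sqrt(sum((v - m) ** 2 for v in series) / len(series))
    return [(v - m) / sd for v in series]

def first_factor(panel, iterations=200):
    """First principal component scores of a list of series (power iteration)."""
    x = [standardize(s) for s in panel]
    n, t_len = len(x), len(x[0])
    w = [float(i + 1) for i in range(n)]      # arbitrary starting loadings
    for _ in range(iterations):
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        f = [sum(w[i] * x[i][t] for i in range(n)) for t in range(t_len)]
        w = [sum(x[i][t] * f[t] for t in range(t_len)) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    return [sum(w[i] * x[i][t] for i in range(n)) for t in range(t_len)]

def correlation(a, b):
    """Pearson correlation via standardized series."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / len(a)

# Toy subgroups (hypothetical): each is driven by its own common series.
common = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5]
other = [1.0, 0.5, -0.5, -1.0, 0.0, 1.5, -1.5, 0.25]
subgroups = {
    "price indexes": [common, [2.0 * v + 3.0 for v in common],
                      [0.5 * v - 1.0 for v in common]],
    "money": [other, [3.0 * v for v in other]],
}

# One factor per subgroup; these would then enter the FAVAR jointly.
factors = {name: first_factor(series) for name, series in subgroups.items()}
print(correlation(factors["price indexes"], common))
```

The sign of a principal component is not identified, so only the absolute correlation with the underlying driver is meaningful.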

Robustness analysis
First, I check if the transformation procedure impacts the forecast performance of the models. Following Stock & Watson (2002a), I use a group-wise stationarity-inducing transformation scheme on the benchmark dataset. To do this, I take, for example, the first difference of time series in the finance class and the second difference of time series in the credit class. 13 After the transformation, I test again for stationarity of the variables, then I repeat the complete forecasting exercise for headline and core inflation. The results indicate that the group-wise transformation scheme leads to inferior forecasts under all specifications (headline, core, reduced-size and subgroup forecasts). 14 This might indicate that the group-wise transformation over-differences the variables in the dataset, diminishing their predictive power even before their information content is summarized by the factors.
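A group-wise transformation of this kind can be sketched as below. Only the two examples mentioned in the text (first differences for the finance class, second differences for the credit class) are encoded; the series names, values and function names are hypothetical, and the full scheme is the one referenced in note 13.

```python
def difference(series, order=1):
    """Apply the difference operator `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def transform_groupwise(panel, classes, orders):
    """Difference every series according to the order assigned to its class."""
    return {name: difference(values, orders[classes[name]])
            for name, values in panel.items()}

# Differencing order per class, as in the two examples given in the text.
orders = {"finance": 1, "credit": 2}

# Toy panel (hypothetical series names and values).
panel = {
    "stock_index": [100.0, 102.0, 101.0, 105.0],
    "loans": [0.0, 1.0, 4.0, 9.0, 16.0],
}
classes = {"stock_index": "finance", "loans": "credit"}

transformed = transform_groupwise(panel, classes, orders)
print(transformed)
```

Each round of differencing shortens a series by one observation and removes low-frequency variation, which is one way to read the concern that applying it by class rather than by series may strip out predictive content.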
Analysing the sensitivity of the forecasts to small changes in the time period reveals that the results obtained for both datasets tend to be quite robust; see Tables A7-A10 in Appendix 2. When one quarter is removed from each period for which the forecast errors are calculated, the RMSEs change only a little in most cases. The models with the best forecasting abilities in the full datasets also tend to have the highest predictive abilities in the datasets where one period was removed. Overall, the robust forecasting performance can be attributed to the models obtained from the benchmark and the reduced size datasets. The results for the core inflation forecasts indicate even less sensitivity to changes in the time sample.
Table 6. Headline inflation out-of-sample forecasting results, one to six quarters: combination of the first factor from the subgroups.
Next, I analyse whether small changes in the composition of the dataset affect the forecasting results. The frequency distribution for the headline inflation forecasts with factors extracted from the benchmark dataset can be seen in Figure A3. The vertical line represents the AR benchmark, while the frequency plots report the distribution of the forecasting errors of the FAVAR models. Model distributions that are to the left of the vertical line have lower forecasting errors than the benchmark AR model does.
The distribution of the forecast errors supports the hypothesis that FAVAR models with the first factor and two lags (FAVAR 1F. 2 Lags) outperform the benchmark model. Up to the forecast horizon five quarters ahead, the mass of the distribution is centred clearly to the left of the benchmark AR model with a low spread. On the five-quarter-ahead horizon, the mass is still centred to the left of the benchmark, although for some samples the RMSE are larger than for the benchmark. Forecasting six quarters ahead, the FAVAR models do not manage to outperform the benchmark in the majority of the sample cases.
The sampling distribution for core inflation (see Figure A4) shows that five FAVAR models have lower forecast errors than the benchmark AR model has at the one- and two-quarter forecasting horizons. In particular, the RMSE of the samples from the FAVAR models with the first factor and one lag (FAVAR 1F. 1 Lag) and the first three factors and one lag (FAVAR 123F. 1 Lag) clearly outperform the benchmark AR. Analysing the graphs over longer forecasting horizons leads to the same conclusions as those in the analysis of Table 2, as all the FAVAR model forecasts fail to outperform the benchmark AR forecasts in most cases.
When the number of variables in the reduced size dataset is decreased by 15%, the forecasting error frequency distribution for headline inflation (see Figure A5) shows less stable behaviour. At the one-quarter forecasting horizon, the distribution of the forecasting errors of the FAVAR model with the first factor and two lags (FAVAR 1F. 2 Lags) and the FAVAR models with the first factor and three lags (FAVAR 1F. 3 Lags) are clearly to the left of the benchmark AR model. At the two-quarter forecasting horizon, the mean of the FAVAR model with the first factor and two lags (FAVAR 1F. 2 Lags) is centred slightly to the right of the benchmark. More interestingly, the FAVAR model with the first three factors and one lag (FAVAR 123F. 1 Lag) has a spread-out distribution with a mean slightly to the left of the benchmark AR. For the one-factor two lags or one-factor three lags FAVAR models, the distributions tend to be to the left of the benchmark value for the three- and four-quarter-ahead forecasts. The distributions of the first three factors model (FAVAR 123F. 1 Lag) are spread between a range of approximately 0.9 and 1.1, with a tendency to be centred slightly to the left of the benchmark AR.
For core inflation, the distributional properties (see Figure A6) of the forecasting errors of the FAVAR models are similar to those of the headline inflation forecasts. At short forecasting horizons of one and two quarters, the FAVAR model including the first factor and one lag (FAVAR 1F. 1 Lag), the model with the first and the second factor and one lag (FAVAR 12F. 1 Lag) and the model with the first three factors (FAVAR 123F. 1 Lag) clearly outperform the benchmark AR, even when the reduced size dataset is shrunk in size by 15%. At the three- and four-quarter forecasting horizon, only the model with the first three factors (FAVAR 123F. 1 Lag) tends to outperform the benchmark AR. However, the distribution is spread out and asymmetric. Similarly, for the five- and six-quarter forecast horizon, some samples from the same model (FAVAR 123F. 1 Lag) show lower forecasting errors than those of the benchmark, but most values lie at the right side of the spread-out distribution.
In summary, the results show that small arbitrary changes to the number of variables in the two datasets have only a small impact on the forecast performance of FAVAR models that include only the first factor. For the reduced size dataset, however, a different dataset composition has substantial effects on the FAVAR model with the first three factors and one lag (FAVAR 123F. 1 Lag). At the three- to four-quarter forecasting horizon in particular, the slightly asymmetric spread of the distribution around the benchmark AR value of one makes it difficult to draw a conclusion as to whether the FAVAR 123F. 1 Lag model forecasts outperform the benchmark model or not. This indicates that arbitrary changes to the number of predictors have a stronger impact on the reduced-size dataset than on the larger benchmark dataset.
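The resampling idea behind these frequency distributions can be sketched as follows. The sketch assumes that the 15% of variables to be dropped are drawn at random without replacement and that the number dropped is rounded to the nearest integer; neither detail is stated in the text. Re-extracting the factors and re-running the forecasts for each draw is left out, and the relative RMSE values in the last step are invented placeholders.

```python
import random

def drop_variables(names, drop_share=0.15, rng=random):
    """Return the variable names that remain after dropping a random share."""
    n_drop = round(drop_share * len(names))   # assumed rounding rule
    dropped = set(rng.sample(names, n_drop))
    return [name for name in names if name not in dropped]

def share_beating_benchmark(relative_rmses):
    """Fraction of draws whose relative RMSE lies left of the benchmark (1.0)."""
    return sum(1 for r in relative_rmses if r < 1.0) / len(relative_rmses)

rng = random.Random(0)                               # fixed seed for repeatability
names = [f"series_{i}" for i in range(246)]          # reduced-size dataset, N = 246
subsets = [drop_variables(names, 0.15, rng) for _ in range(5)]
print([len(s) for s in subsets])

# For each subset one would re-extract the factors and recompute the
# relative RMSE; the values below are placeholders, not results.
placeholder_rmses = [0.90, 0.95, 1.05, 1.10]
print(share_beating_benchmark(placeholder_rmses))
```

A distribution whose mass lies mostly below one corresponds to a model that outperforms the benchmark AR across most perturbed datasets.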

Final comments
This paper investigates the performance of factor-augmented VAR models when they are used to predict the Estonian headline and core inflation rates. The factors are extracted by a principal component method from a big benchmark dataset with 388 quarterly economic and financial time series, and a reduced size dataset consisting of 246 series. In addition, factors are extracted from subgroups of variables that are formed by economic intuition. The FAVAR forecasts range from the second quarter of 2011 to the second quarter of 2014 and their forecasting errors are compared to naive benchmarks, such as an autoregressive forecast.
The analysis of the forecasts of Estonian headline and core inflation at various forecast horizons and using different sample sets yields interesting and arguably surprising results. Five findings can be highlighted. First, factor model forecasts can improve upon an autoregressive forecast but in most cases the forecasting gain is limited. Second, some models with one factor have smaller forecasting errors when the factors are extracted from a big benchmark dataset. Third, certain big factor models that contain three factors perform better than models with fewer factors when the factors are taken from a smaller dataset where the consumer price indicators have been excluded. This indicates that the dataset size and dataset composition matter for forecasting performance. Fourth, factors extracted from homogeneous subgroups of nominal variables show the best performance for projecting headline inflation but have less predictive power for core inflation forecasts. Fifth, the forecasting performance is less contingent upon small arbitrary changes in the dataset composition when the factors are extracted from a large dataset than is the case with small arbitrary changes in a small dataset.
Surprisingly, essentially similar forecasting results for the Estonian inflation rate, and even better ones in certain cases, emerge when the factors are extracted from a reduced-size dataset that excludes domestic and foreign consumer price indicators. These effective forecasts can be obtained from FAVAR models with the first three factors and one lag. However, the robustness analysis for this model indicates that small changes in the composition of the reduced-size dataset might have a substantial impact on the first three factors and therefore also on the forecasting performance.
Extracting factors from subgroups of nominal variables, in particular those derived from price and money data, yields substantial forecasting improvements at all horizons, whereas forecasts from other subgroups show dismal forecasting performance. Combinations of the first three factors from the subgroup prices improve upon the benchmark model by almost 50%. However, combinations of the first factor from all six subgroups or the best performing three subgroups do not improve substantially upon the benchmark model.
Even though the results point to notable differences between the headline and core inflation forecasts, a clear statement of whether FAVAR models are better suited to forecasting one or the other is difficult to derive. Headline inflation forecasts show a tendency to perform better at longer horizons, whereas core inflation forecasts have slightly better predictive abilities at short horizons when the factors are extracted from the benchmark dataset. However, for the FAVAR models with more factors, when the factors are extracted from the reduced-size dataset, the core inflation results clearly outperform the headline inflation results in the first four quarters. Restricting the dataset size further by constructing homogeneous subgroups of variables fails to improve forecast accuracy upon the benchmark model where core inflation is concerned.
The findings provide evidence that simple factor model forecasts such as factor-augmented VAR models can improve upon naive forecasts under certain circumstances. The forecast performance depends greatly on the number of factors included in the model, the size of the dataset from which the factors are extracted, the time series to be forecast, and lastly, the forecasting horizons.
Forecasting inflation still remains a challenge and this also applies to Estonian inflation. Among the models examined, substantial forecasting gains can only be reaped from two distinct models. Even from the perspective of an experienced forecaster, it is still difficult to assess a priori how many factors should be incorporated in the model in relation to the size of the dataset. Forecasting with factors extracted from subgroups built on economic intuition can improve upon forecasts with factors extracted from a single dataset. However, categorizing variables into subgroups and combining the correct number of factors from different subgroups is non-trivial, especially compared to constructing an FAVAR model from a single large dataset. For Estonia, the results indicate that using an FAVAR model with the first factor extracted from a large dataset provides good forecasting performance, even when the exact size and composition of the dataset are unknown.
3. The objective function for the estimation of the factors $F_t$ is given by
$$V(F, \Lambda) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (X_{it} - L_i F_t)^2,$$
where $F = [F_1, \ldots, F_t, \ldots, F_T]'$ and $L_i$ is the $i$th row of $\Lambda$. $F$ and $\Lambda$ are subject to the constraint $F'F/T = I_r$, where $I_r$ is the $r \times r$ identity matrix. Hence, applying the principal components method means that the residual sum of squares is minimized subject to the normalization that $F'F/T = I_r$.
4. One formal way to separate targeted predictors from uninformative time series is proposed by Bai & Ng (2008). They suggest partitioning the panel of predictors into two subsets. The first subset should include all time series (targeted predictors) that are relevant for the specific variable to be forecast and the other subset should include all series that are non-informative. The partitioning is done with thresholds defined by the least absolute shrinkage and selection operator (LASSO) and the elastic net rules.
While those shrinkage models are interesting from a technical perspective and most researchers in the field acknowledge the importance of targeted predictors, practitioners tend to rely on heuristics to determine which time series to include in their dataset.
5. The Diebold-Mariano test suffers from two shortcomings when the forecasting approach of Bernanke et al. (2005) is followed. First, the finite sample properties of the estimators on which the forecasts may depend are not preserved asymptotically. Second, the DM test is prone to nested model bias (Giacomini & White 2006). That presents a problem under the out-of-sample extending window forecasting procedure when the competing forecasts are obtained from autoregressive and factor-augmented VAR models.
6. The six models not reported include an FAVAR forecast where the lag order is allowed to vary, a model including the first five factors, and models including only the second factor at different lag lengths. The forecasting results for those models are available upon request.
7. To ensure that the AR model constitutes a competitive benchmark model, the RMSE of different lag length intervals were compared. Neither a fixed lag order of one, two or three lags nor intervals ranging from 1 ≤ p ≤ 2 up to 1 ≤ p ≤ 12 show lower forecasting errors for the benchmark model than the forecasts obtained from AR models where p is allowed to vary between one and three.
8. The results for the autoregressive moving average (ARMA) are identical to the results from the AR benchmark forecasts. Within the order constraints given, which are a maximum of three lags for any autoregressive component and a maximum of three lags for any moving average component, the Bayesian information criterion (BIC) determined unanimously that the given process does not include any moving average terms. Therefore, the lag structure is equal to the lag structure of the benchmark AR process, and the forecasting results are identical.
9. The exact grouping scheme can be found in Table A11.
10. Adding the third factor from the subgroup price indexes improves substantially upon the benchmark model. A factor analysis reveals that the HICP components of Estonia's Central and Northern European trading partners, namely the U.K., Germany and Finland, contribute most to the third factor. The most important HICP components are non-energy industrial goods (NEIG), industrial goods and goods (ex services). In contrast, the HICP components of Estonia's Baltic trading partner Latvia dominate the variables that are negatively correlated with the third factor. The most important HICP component is once more non-energy industrial goods (NEIG).
11. The forecasting results for the additional subgroups for headline and core inflation are shown in Tables A12-A24 in the appendix.
12. To test whether the good forecasting performance is driven by a sharpened factor structure, I use the LASSO on the benchmark dataset. The LASSO operator is constructed in a similar fashion to that used in Bai & Ng (2008), the difference being that the LASSO tuning parameter λ is chosen by cross-validation. The LASSO model indicates that the reduced panel should only contain seven variables from the given benchmark dataset. Six of those variables are nominal price indicators, with the seventh being the survey of consumer price trends over the last 12 months. This means that the selection operator chooses a majority of the variables that are included in the subgroup prices, albeit in smaller numbers. One interpretation of this finding is that the variables to be included in a model suggested by the LASSO operator are similar to those variables grouped by economic intuition. This might indicate that the good results are driven in part by a sharpened factor structure and in part by a close link between the nominal variables the factors are extracted from and the nominal variable to be forecast.
13. The detailed transformation scheme can be found in the appendix, Table 11.
14. The results are available upon request.
Figure A2. Estimated common factors and Estonian core inflation rate: reduced-size dataset.
Figure A4. Frequency distribution, Estonian core inflation: benchmark dataset.