Investigating Growth at Risk Using a Multi-country Non-parametric Quantile Factor Model

We develop a Bayesian non-parametric quantile panel regression model. Within each quantile, the response function is a convex combination of a linear model and a non-linear function, which we approximate using Bayesian Additive Regression Trees (BART). Cross-sectional information at the pth quantile is captured through a conditionally heteroscedastic latent factor. The non-parametric feature of our model enhances flexibility, while the panel feature, by exploiting cross-country information, increases the number of observations in the tails. We develop Bayesian Markov chain Monte Carlo (MCMC) methods for estimation and forecasting with our quantile factor BART model (QF-BART), and apply them to study growth at risk dynamics in a panel of 11 advanced economies.

The existing literature, with few exceptions, uses quantile models for a single variable of interest.These models are specified conditional on a specific quantile and assume a linear relationship between the predictors and the quantile function of some outcome variable. 1For macroeconomic data this assumption might be warranted in normal times but in turbulent times it could be that regression relationships change or turn non-linear.Moreover, often several variables rather than a single one are of interest, and in these cases a joint model would be preferable.These observations motivate the model we develop in the present paper.
In contrast to much of the existing literature we propose a non-parametric model which involves multiple equations and allows for assessing whether the quantile response function is linear or unknown and possibly highly non-linear.In particular, the model we propose is a multi-country, non-parametric quantile regression, which we then use to investigate growth at risk in a panel of 11 advanced economies.
The justification for adopting non-parametric methods is provided by Huber, et al. (2020) and Clark, et al. (2021), which found Bayesian non-parametric vector autoregressions (VARs) to be able to successfully model the tails of predictive densities of macroeconomic variables in a flexible and accurate manner.These papers found that Bayesian Additive Regression Trees (BART) are an effective non-parametric method that is particularly useful in crisis times (e.g., the Financial Crisis or the Covid-19 pandemic) when growth at risk issues are of particular importance.However, in normal periods, the predictive gains from using BART are more muted (and sometimes negative).In the present paper we extend the BART regression methods used in these papers to the quantile BART case.Since the predictive gains of BART vary over the business cycle, we assume that within each quantile, the response function is a convex combination of a linear model and some unknown non-linear function, which we approximate using BART.Studies such as Taddy and Kottas (2010) have developed other Bayesian approaches to nonparametric model-based quantile regression.
The justification for use of a multi-country model is that a panel dimension can often improve forecasts with respect to single country models; see, among many others, Bai, et al. (2020) and Feldkircher, et al. (2021).Moreover, and specifically for the quantile case, macroeconomic data sets are short, leading to a small number of observations in the tails of the distribution.
We develop a model for the p th quantile that includes a factor that summarizes the available cross-country information at that quantile.In addition, as indicated below, our Bayesian model specification has features that allow information from other countries to inform estimates for a given country.Exploiting this cross-country information through a pooling prior improves predictive accuracy by parsimoniously including international information to inform coefficients associated with domestic quantities.
We then develop Bayesian Markov Chain Monte Carlo (MCMC) methods for estimation and forecasting with our quantile factor BART model (QF-BART).These methods are scalable to large panels with a potentially large number of exogenous regressors.
In terms of empirical results, our proposed models commonly improve on the benchmark single country linear quantile model in recursive growth forecast comparisons, more so in the tails than near the center of the distribution.Importantly, estimating the weight assigned to the BART component of the model as compared to the linear component is helpful to forecast accuracy.The estimated combination weight is smaller (i.e., the model is more linear) for the 25 and 75 quantiles than in the tails, i.e., for the 5 and 95 quantiles.Moreover, some form of international information definitely pays off (either via the new pooling prior, or by outright including non-domestic series).The effects of the common (volatility) factor are also relevant, as it seems to explain a large fraction of the forecast error variance in most countries, in particular in the tails.A shock to this factor, which can be interpreted as an uncertainty shock, has different (stronger negative) effects in the left tail than the right tail of the growth distribution.In the left tail, the BART piece with an estimated weight tends to slightly mitigate the effects of the shock.Finally, a financial shock in the US spills over to other countries.There is asymmetry in the responses in the sense that a positive shock affects the growth quantiles, whereas a negative shock's effects are not as sharp.Moreover, the effects and asymmetry are more pronounced in the 2010-19 period than earlier.
The remainder of the paper is structured as follows.The next section defines and motivates our QF-BART model including its prior and discusses MCMC estimation.The third section contains our empirical work while the fourth section concludes the paper.

A multi-country non-parametric regression model
We model the joint distribution of (for simplicity, demeaned) output growth for a panel of N countries.These are stored in an N −dimensional vector y t = (y 1t , . . ., y N t ) with y it denoting time t output growth in country i.Domestic real activity might depend on the lags of y t as well as other, exogenous factors.These pre-determined quantities are included in a K-dimensional vector x it .We adopt a notational convention where x it is structured such that J domestic quantities for country i are always ordered first, followed by all K − J non-domestic variables.
We assume that y it follows a quantile regression model which, for the p th quantile, is given by: with g ip : R K → R denoting unknown country-specific functions and β ip is a K × 1-dimensional vector of regression coefficients.ω ip is a quantile and country-specific parameter that controls how much weight is placed on the non-linear part of the model.The case ω ip = 1 would correspond to a fully non-linear model whereas ω ip = 0 would be a (conditionally) linear quantile regression specification.Contemporaneous relations across the elements in y t are introduced through a static factor model with λ ip denoting the country-specific factor loading and f pt the corresponding international factor.Finally, ip,t follows an asymmetric Laplace distribution (ALD) scaled by a parameter σ ip with its p th quantile being equal to zero.2 We assume that the latent factor is uncorrelated over time and arises from a Gaussian distribution: with h pt being a (logarithmic) variance that evolves according to an AR(1) process: Here we let µ p denote the unconditional mean, ρ p the autoregressive parameter, and ς 2 p the error variance of the log-volatility process.This model possesses several features which should improve not only its predictive capabilities but also allow for additional inferential opportunities.First, the presence of the quantile-specific weights allows for data-based selection of the degree of non-linearities across quantiles.Recent literature indicates that, in the lower tails of the distribution of output growth, macroeconomic relations change and might be subject to substantial non-linearities.While such non-linearities may be important in the extremes of a distribution, linear models might describe the behavior well in tranquil periods of the business cycle (e.g., in the center of the distribution).
Our model allows for this by setting the corresponding weights ω ip appropriately.Second, our model allows for lagged relations across countries.The key point to notice is that these dynamic interdependencies can differ across quantiles.For instance, it could be that in the presence of a global adverse economic shock, cross-country dependencies are more important than in tranquil times.Third, the presence of a common static factor that exhibits conditional heteroscedasticity can capture contemporaneous relations across the elements in y t for a specific quantile.Moreover, since the factor is conditionally heteroscedastic, it can also control for sudden shifts in the conditional variance of the dependent variables.Inclusion of this stochastic volatility factor allows us to control for unobserved heterogeneity, a feature which might be extremely important during periods such as the recent Covid-19 pandemic (see, e.g., the discussion in Carriero, et al. (2021)).
The model in Eq. ( 1) is quite general and nests several commonly used alternatives in the literature.For instance, setting ω ip = 0, λ ip = 0 for all p, and setting x it such that it includes only the first lag of GDP plus a (lagged) measure of financial conditions yields a model very closely related to the one proposed in Adrian, Boyarchenko, and Giannone (2019).Notice that this model essentially rules out cross-country relations.Setting ω ip = 1 for all p yields a non-parametric quantile regression model which, depending on λ ip and the choice of x it , allows for cross-country relations in a flexible manner.Setting ω ip = 0 returns the linear quantile regression model but with a common volatility factor.

Approximating the unknown functions using BART
We treat the function g ip as unknown and approximate it using BART (Chipman, George, and McCulloch, 2010).Though other alternatives are possible, BART has been successfully employed in economics for forecasting financial time series in Huber and Rossini (2020), nowcasting GDP in selected European economies in Huber, et al. (2020), and tail forecasting of output, inflation, and unemployment in Clark, et al. (2021).BART is a sum-of-trees model that approximates g ip by summing over many individual trees that all take a simple form and act as "weak learners." The BART approximation for g ip is given by: with v denoting a tree function that is determined by a tree structure T s ip and a vector of terminal node parameters µ s ip .This terminal node parameter vector has dimension b s ip .The tree structure consists of multiple decision rules that ask whether a covariate exceeds a threshold and, according to these simple binary rules, produces (disjoint) partitions of the input space.These take the form x ij,t > c or x ij,t ≤ c, with x ij,t denoting the j th element of x it and c being a splitting/threshold value.Sequences of these decision rules lead to a terminal node coupled with a corresponding terminal node parameter in µ s ip .When S is large, the BART approximation is prone to overfitting if no further regularization is introduced.Chipman, George, and McCulloch (2010) use regularization priors to force the trees v to be simple.We achieve this through shrinkage priors on the tree structure and the terminal node parameters.Following Chipman, George, and McCulloch (1998), the prior on T s ip is obtained by constructing a tree-generating stochastic process.The prior p(T s ip ) comprises of three aspects.First, tree complexity ultimately depends on the depth of intermediate nodes d.
If d is large, the tree is complex and thus might overfit the data.To force the individual trees to be simple, we assume that a given node at depth d is non-terminal with probability proportional to: where α is between 0 and 1 and ζ > 0. Notice that this probability decreases in d: growing more complicated trees becomes unlikely if d is already large.The amount of shrinkage is controlled by α and ζ.These hyperparameters are often set to α = 0.95 and ζ = 2, implying that trees with two or three terminal nodes receive over 80% of total prior probability.Chipman, George, and McCulloch (2010) found that, for over 40 data sets, this choice performs well, and extensive cross-validation for α and ζ only improves predictive accuracy by small margins.The second and third aspect are concerned with how decision rules are constructed.To this end, we use discrete uniformly distributed priors to select the variables showing up in the decision rule as well as a uniform prior over the splitting/threshold values.
The second source of shrinkage is a Gaussian shrinkage prior on µ s ij,p , the j th element of µ s ip .Chipman, George, and McCulloch (2010) recommend scaling the prior using the range of the data.More specifically, let y i,min and y i,max denote the minimum and maximum of the observed data in country i.The corresponding Gaussian prior is then given by: with γ > 0 being a prior scaling parameter, typically set equal to 2. The prior implies that if the number of trees S is large, the prior variance decreases and the amount explained by a single tree is decreased.This is consistent with the notion that each tree acts as a "weak learner," explaining only a small share of variation in the response variable, but the ensemble model provides sufficient flexibility to capture even complicated conditional mean relations.Another feature, noted by Huber, et al. (2020), is that the prior variance increases in the range of the data.Hence, if outliers arise, the prior becomes increasingly loose and allows for more flexibility in terms of capturing observations far outside the range of past data.
The priors on the tree structures and the terminal node parameters constitute the main ingredients of BART.Since our model also features a linear part and additional coefficients, we also need to specify priors on β ip and ω ip .We discuss them in the next sub-section.

Priors on the remaining coefficients of the model
On the coefficients β ip we use a horseshoe-type prior on each element β ij,p : where C + denotes a half-Cauchy distribution, ψ ij,p is a coefficient and quantile-specific scaling parameter, and ϕ is a global shrinkage parameter that is common to all coefficients.Notice that the presence of ϕ introduces dependencies across coefficients (including across countries) and across quantiles.The key advantage is that the presence of the local shrinkage parameters ψ ij,p allows the detection of signals (i.e., non-zero or heterogeneous β ij,p over the cross section) even if ϕ is close to zero.
The prior mean β jp pools information over the cross section.In our hierarchical specification, it is estimated from the data using a Gaussian prior for the domestic variables, and deterministically set to zero for non-domestic quantities: β jp ∼ N (0, φj ) for j = 1, . . ., J, β jp = 0 for j = J + 1, . . ., K.
The parameter φj is the prior variance of the common mean, which we set to a weakly informative value of 10 for the empirical application.We refer to this prior as the pooled horseshoe (HSP), while setting the common mean to a zero vector of size K yields the conventional horseshoe (HS) that we consider as an alternative.
For the factor loadings λ ip , we use a set of independent Gaussian priors for all i, p: Note that λ ip is a scalar and, hence, we use this relatively non-informative prior rather than a prior such as the HS which is used to avoid over-parameterization as might occur with high dimensional parameters.
On the weights ω ip we consider a Uniform prior: This prior introduces no particular prior information on the amount of non-linearities.If we wish to be informative on ω ip we can also use a Beta prior and specify the hyperparameters appropriately.
The remaining coefficients of the model relate to the error term.Kozumi and Kobayashi (2011) write the ALD using a scale-location mixture of Gaussians: , e ip,t ∼ N (0, 1), and z ip,t ∼ Exp(1).On the scale parameter σ ip we use an inverse Gamma prior: with the relatively non-informative choices of a σ = 1 and b σ = 1.
This completes the prior setup.In the next sub-section we briefly discuss the Markov chain Monte Carlo (MCMC) algorithm used to carry out estimation and inference.

Full conditional posterior simulation
We use Markov Chain Monte Carlo (MCMC) techniques to obtain a draw from the joint posterior of the latent quantities and coefficients of the model.Specifically, the following steps of the algorithm are carried out for each equation (i.e., country) i and quantile p: The full conditional posterior of the tree structures takes no well-known form.Chipman, George, and McCulloch (2010) propose a Bayesian backfitting strategy to set up a Metropolis Hastings (MH) algorithm to sample the trees individually, conditionally on the other S − 1 trees.This step is carried out marginally of µ s ip .The terminal node parameters can then, under our conjugate prior, be simulated from a set of independent Gaussian distributions that take a well-known form.
• Sampling p(β ip |•).The regression coefficients are, conditional on the remaining parameters and latent states, simulated from a multivariate Gaussian posterior distribution with known moments: ỹip is a T −dimensional response vector with t th element given by ỹip,t = (y it −ω ip ĝip (x it )− , and V ip is a prior variance matrix with main diagonal element The prior mean β p = (β 1p , . . ., β Jp , 0 K−J ) collects the estimated common means β jp in the corresponding position of the J domestic variables with the remaining elements being zero for the HSP prior, while β p = 0 K for the conventional HS prior.
• Sampling from p(β jp |•).The posterior distribution for the non-zero elements for j = 1, . . ., J of the prior mean for HSP is The full conditional posterior of ω ip takes no well-known form.
Since the support of ω ip is bounded and the target density univariate, we adopt a slice sampler (see, e.g., Neal (2003)) that is straightforward to implement and mixes well.
• Sampling from p(λ ip |•).The factor loadings are obtained by simulating from univariate Gaussian conditional posterior distributions: ŷip denotes the T × 1 response vector with typical t th element given by ŷip,t = (y it − , and fp has typical element fpt = f pt /(τ p √ σ ip ν ip,t ).
• Sampling from p(σ ip |•).Kozumi and Kobayashi (2011) show that the conditional posterior of the scaling parameter σ ip follows an inverse Gamma distribution: • Sampling from p(ν ip,t |•).For each t, we simulate ν ip,t from a generalized inverse Gaussian (GIG) posterior distribution: where cip = w it /(τ 2 p σ ip ) and dip = 2/σ ip + θ 2 p /(τ 2 p σ ip ).3 • Sampling from p(ψ 2 ij,p |•).The local, coefficient-specific scaling parameters are simulated using the scheme outlined in Makalic and Schmidt (2015).Conditional on auxiliary shrinkage parameters ξ and η ij,p , the posterior of ψ 2 ij,p is inverse Gamma distributed: These steps relate to the quantities we have to simulate for each country (or equation) and quantile.Next we turn to the quantities that we simulate per quantile and thus pool over countries.
• Sampling from p(f pt |•).For each t, we simulate f pt from a sequence of independent Gaussian posterior distributions as follows: λ p = (λ 1p , . . ., λ N p ) , and • Sampling from p(h p |•) and p(µ p , ρ p , ς p |•).We sample the full history of log-volatilities h p = (h p1 , . . ., h pT ) and the parameters of the state evolution equation using the efficient sampler proposed in Kastner and Frühwirth-Schnatter (2014).This algorithm samples the log-volatilities, conditional on everything else, all without a loop.
The final step refers to the global shrinkage parameter of the horseshoe prior.This step pools information across all equations and quantiles.Since we rely on auxiliary random variables to obtain a well-known full conditional posterior distribution, we first simulate from p(ξ|•) and then from p(ϕ|•).
• Sampling from p(ϕ 2 |•) and p(ξ|•).The conditional posteriors of the global shrinkage parameter ϕ and the auxiliary global parameter ξ are, respectively, inverse Gamma distributed: This completes our MCMC algorithm.In all our empirical work we repeat the different steps 30, 000 times and discard the first 15, 000 draws as burn-in.One key advantage of the present algorithm is that it is scalable to larger data sets (i.e., including more countries, additional endogenous variables, or more covariates) because, conditional on the factors and ϕ, the different posterior quantities are independent across equations and quantiles.

Empirical results
In this section we first investigate whether our modeling approach improves upon a set of simpler, nested alternatives by means of a forecasting horse race.We then focus on international growth at risk dynamics in two ways.First, we analyze how GDP growth reacts to changes in the common factor.Afterwards, we focus on how a shock to US financial conditions spills over to the other economies in our sample.

Data overview, competing models and forecasting design
Our sample runs from 1975Q1 to 2020Q4.We use annualized quarterly growth rates of GDP data from the Main Economic Indicators (MEI) database, maintained by the OECD, and the composite indicator of systemic stress (CISS) by the European Central Bank (ECB).For data availability reasons we include Austria (AT), Denmark (DK), Finland (FI), France (FR), Germany (DE), Italy (IT), Netherlands (NL), Spain (ES), Sweden (SE), United Kingdom (UK), and the United States (US).
We estimate the models for p ∈ {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 095}.For each model we consider two different choices for the covariates.The first, which we label CISS, includes the CISS and a single lag of y it in x it , implying that K = 2.The second includes cross-country information in x it by including the first lag of GDP growth and the CISS of all countries; hence, K = 2N .The latter is referred to as CISS-CC to indicate that the information set includes cross-country data.
Since our model is quite flexible and nests several competing models, we also include a range of restricted variants of the general model outlined in Section 2. First, we obtain the ABG model by using the CISS covariates and setting Λ = 0 and ω ip = 0. We use frequentist methods to carry out estimation so as to be the same as ABG, while we estimate all other models using Bayesian methods with either the HS or HSP prior (so, for example, we will consider both CISS-CC-HS and CISS-CC-HSP specifications).ABG will serve as our benchmark model to which we compare all other specifications.We then add features to this benchmark.We begin by remaining linear (ω ip = 0) but adding the international factor to the ABG model by letting Λ = 0 in order to investigate whether it plays an empirically important role.All subsequent models also let Λ = 0. We next investigate non-linearities by setting ω ip = 1 and thus obtain a multi-country quantile BART model with a common international factor.Finally our most flexible model allows for ω ip to be estimated from the data.An overview of all model specifications is provided in Table 1.
We compute direct forecasts for h ∈ {1, 4}.HS (shrinkage to zero) ω = 0 (parametric) Λ = 0 (independence) CISS-CC (cross-country) HSP (pooling cross-section) ω = 1 (nonparametric) Λ = 0 (dependence) ω ∈ (0, 1) (estimated) Notes: "Data" refers to the information set for individual country models."Prior" indicates the prior on the parametric part of the model; we consider the conventional horseshoe prior (HS) shrinking towards zero and the pooled horseshoe (HSP) prior pushing the model towards cross-sectional homogeneity."Weights" refers to the specification of the conditional quantile function: parametric, nonparametric, or whether we estimate weights on the parametric and nonparametric part."Factor" indicates whether an international factor modeling the cross-sectional covariance structure within quantiles is present.We consider all possible combinations of these specification choices.

Tail forecasting results
Table 2 reports the forecast comparison of the various models based on the relative qw-CRPS.
Each cell in the heatmap shows the qw-CRPS relative to the ABG benchmark model.Numbers smaller than one indicate outperformance (green colored) vis-á-vis the ABG model whereas numbers exceeding one suggest a weaker performance (red colored) than the benchmark.
Four main comments can be made.First, and focusing on aggregate results across countries, our proposed models commonly improve on the benchmark ABG model.The gains are about 20 percent when looking at the standard CRPS, decrease to about 10 to 15 percent in the left tail, and increase to about 30 percent in the right tail (based on additional results reported in the Appendix).Overall, these results suggest that at each quantile, and particularly in the right tail, it pays off to allow for non-linearities and for cross-country relations.
Second, while there are small differences between setting ω ip = 0 (linear quantile) or ω ip = 1 (BART quantile), there is often some benefit to estimating the weight ω ip , in turn allowing for both linear and BART pieces in the model.The key advantage of estimating ω ip is that it combines the best of both worlds and thus translates into a model that is strongly non-linear and non-parametric in the tails and close to a linear quantile regression model in the center of the distribution.Such a behavior is beneficial if loss functions which evaluate the full predictive distribution are used.
Third, the HSP prior, that includes pooling, is typically better than HS, but the differences shrink or are eliminated once cross-country information is included in the model.It is noteworthy that once we use a pooling prior the predictive benefit of adding cross-country information directly diminishes sharply.This points towards the fact that, through pooling, our approach successfully picks up cross-sectional information in a very parsimonious manner.
Finally, there is some heterogeneity across the countries under analysis.In particular, for Spain, France, Italy, and the UK the results are broadly in line with those mentioned above.In contrast, for Austria and Sweden estimating the weight ω yields little gains (setting ω ip = 1 is often best), and for the other countries it is overall difficult to beat the benchmark.

Estimated weights over time
In the previous sub-section we have shown that our proposed framework yields forecast distributions which are often more precise than the ones obtained from the ABG benchmark and simpler nested alternatives.One key advantage of the model is that it allows for different weights ω ip across countries and quantiles and this improves forecasts when the full predictive density is evaluated.In this sub-section, we investigate whether our intuition that non-linearities are relevant in the tails while linear models are adequate in the center of the distribution is supported by our model.
Figure 1 reports the estimated weight ω ip over our hold-out period.Darkblue cells indicate a weight close to one while gray shaded cells imply a weight close to zero.We focus on two models, the CISS and CISS-CC models coupled with the pooled Horseshoe prior (HSP). 4 It turns out that for most countries, our conjecture is confirmed.That is, we observe weights which approach unity if we move out in the tails (i.e., the model becomes more nonlinear).When we focus on the center of the distribution, the combination weights approach zero (i.e., the model is linear).Comparing the right and left tails reveals that the estimated is often larger for the 5 percent quantiles than the 95 percent quantiles.This indicates that non-linearites are important when our focus is on modeling sharp upswings in GDP growth but become even more important when interest centers on capturing downturns in GDP growth that are extreme (i.e., below the 5 percent quantile).
When we compare the model which does not utilize cross-country information (CISS) to 4 The results for the remaining specifications look similar and may be found in the Appendix.the one which explicitly includes cross-sectional data (CISS-CC), we find only modest differences in combination weights.These differences are mainly related to somewhat smaller weights on the BART specification in the upper tail of the distribution (for some selected countries such as AT, DK, the UK, and the US), but the main finding that, in the center of the distribution, our model assigns no weight to the non-linear model still holds.
Zooming into the different results reveals that most countries share the general dynamics described in the previous paragraphs (i.e., ω ip close to one in the tails and ω ip ≈ 0 in the center of the distribution).One exception to this broad-based finding is France, which displays larger combination weights across all quantiles.In addition, there exists temporal heterogeneity.For instance, in several countries we observe that, after the global financial crisis (and sometimes slightly earlier), combination weights decrease markedly in the upper tails of the distribution.

The role of the common volatility factor
In this sub-section, we investigate the common volatility factor across quantiles.In a first step, we assess the relevance of the common factor volatility specification by considering time averages of variance decompositions.These are computed, by taking the Gaussian representation of the ALD (the distribution of ip,t in Equation ( 1)), as follows: , with Var( ip,t ) denoting the variance of ip,t .decomposition provides information on the share of variation in the shocks (conditional on the quantile) that is explained through the common factor (similar to Stock and Watson (2005)).
Table 3 reports time averages of variance decompositions resulting from the CC-HSP model.Interestingly, for most countries the commonality is larger and more substantial in the tails than at the center of the distribution, and a bit larger in the right than in the left tail.
These larger contributions in extreme periods can be traced back to the fact that several of the recessions in our hold-out period can be viewed as shocks with a pronounced global dimension (such as the global financial crisis or the Covid-19 pandemic) and the factor is picking this up.
Across countries, we find a considerable degree of homogeneity within country groups.
For instance, Finland, Denmark, and Sweden feature commonalities that are very pronounced in the tails but decline once we approach the center of the distribution both from left and right.
The US and the UK share a rather similar pattern in terms of commonalities (high shares in the tails and for the median, smaller shares for the quantiles in between).
The heterogeneity across quantiles in the role of the volatility factor is further supported by Figure 2, which reports estimates of the factors (upper panels) and associated log-volatility per quantile (lower panels).In the upper panel, we observe that especially in the tails the factor moves sharply during global events such as the global financial crisis and the Covid-19 pandemic.
Turning to the evolution of the log-volatilities in the lower panel generally yields consistent insights with the findings discussed for the level of the factor.The log-volatility spikes during recessions (i.e., in the early 1990s, 2008-2009, and 2020), and for p = 0.5 the level of the logvolatility is much smaller than for the other quantiles but then exceeds the increases in volatility observed for the other quantiles of the distribution.This finding also sheds light on why the amount of variation explained through the factor for most countries is lowest but still sizable in the 50 percent quantile.In most periods, the volatility factor is small (around −5 to −10 on the log-scale) but then during the pandemic it rapidly increases and reaches values of around 5 on the log-scale.This suggests that in tranquil periods, the factor only explains little variation in the shocks but in recessions (or turbulent times) this share increases appreciably and approaches 1.

Generalized impulse responses to a global business cycle shock
The discussion on the qualitative and quantitative properties of the estimated factor provides evidence that it can be interpreted as a global business cycle shock since, depending on the quantile adopted, it closely tracks events such as global recessions.Following Stock and Watson (2005), we now consider how changes to the factor, labeled factor shocks, impact GDP growth across countries and quantiles.
The posterior quantiles of the generalized impulse response functions (GIRFs) for a common factor shock as estimated with the CISS-CC-HSP model are reported in Figure 3.This figure includes the GIRFs for our model with ω ip estimated (gray shaded areas) and for ω ip = 0 (solid blue lines).
A first interesting finding is that, for all countries, a factor shock has different effects in the left tail than the right tail.In both tails, growth is negatively affected, confirming that higher volatility/uncertainty is detrimental for growth, but the size of the effect (and persistence of the negative effect) is much larger in the left tail.Moreover, notice that for the right tail we also observe an overshoot in real activity in response to an adverse business cycle shock.
Second, when we consider the left tail, the piece with an estimated weight tends to mitigate the effects of the shock.This is most likely driven by the fact that, if we rule out non-linearities, there is more to be explained through the factor model and this might translate into factor dynamics which not only pick up business cycle shocks but also soak up information left in the error term potentially arising from ignoring non-linear dynamics between GDP growth and the CISS.
A third striking pattern is the pronounced degree of cross-country heterogeneity in the 5 percent quantile (and, to a somewhat lesser extent, in the 10 percent quantile).When our focus is on the left tail, we observe that France, Italy, the UK, and Spain exhibit sharp declines in GDP growth.Once we consider higher quantiles the GIRFs become much more similar across countries.For instance, we find only modest differences if we focus on p = 0.5.

Generalized impulse responses to a US financial conditions shock
The previous sub-section emphasized that our latent factor can be interpreted as a global business cycle shock.In this sub-section we will instead focus attention on the international effects of a shock to US financial conditions and whether the real effects of such a shock differ from the ones arising from changes in f pt .weight ω and either with (Figure 4) or without (Figure 5) the common factor in the model's innovation component.Recall that the CISS is defined so that higher values represent tighter financial conditions; a positive shock may be expected to reduce GDP growth.
As the model is non-linear, the sign and size of the shocks can matter to determine the effects (i.e., contrary to the linear case, the effects are not proportional to the size of the shock, or symmetric).Hence, we have recomputed the model for various sub-samples to analyze how different global business cycle conditions impact the estimates of the GIRFs.
Comparing the two figures shows that the factor volatility has little effect on the results, which is not surprising as it should not affect much the point estimates of the GIRFs (rather   their precision).In both cases, a financial shock in the US spills over to other countries.There is asymmetry in the sense that a positive shock affects the growth quantiles, whereas a negative shock's effects are not as sharp.Moreover, the effects and asymmetry are sharper in the 2010-2019 period than earlier.
Analyzing cross-country differences provides additional interesting insights.For some countries (DE, FR, DK, and ES), we find substantial time variation in the GIRFs.Prior to 2010, the corresponding heatmaps feature a great deal of gray colored cells, implying no reactions at all.After the global financial crisis, US-CISS shocks have pronounced effects for these countries that are mostly located in the left tail of the distribution of GDP growth.For other countries (IT, UK, US, and NL), we find less evidence in favor of time-variation in the GIRFs.In these countries, positive (negative) shocks to the US-CISS have negative (positive) effects on GDP growth for p ∈ {0.05, 0.1}.Notice, however, that when we consider p ∈ {0.9, 0.95}, the effect of a CISS shock seems to reverse sign; a positive shock to the CISS has positive effects on GDP growth if it is already historically high and a negative shock triggers a decline in GDP growth.6: Cumulative (one-year ahead) generalized impulse response functions of GDP by quantile (p) to a {−3, −2, −1, 1, 2, 3} in-sample standard deviation US financial conditions shock.Model: CISS-CC, Λ = 0 and ω = 0, HSP prior.Difference between conditional and unconditional forecast across countries (grouped average over time by indicated period).more linear; positive and negative shocks to financial conditions in the US have similar effects, of opposite sign.In addition, the pattern is clearer in the data since 2007 than before.The effects also look smaller and more stable over time than in the non-linear quantile specification, and in general they are more marked at the lower quantiles.These marked differences in the empirical findings highlight the importance of allowing for non-linearities also in the context of quantile regressions.

Conclusions
In this paper we propose a non-parametric quantile panel regression model which assumes that the conditional mean is a convex combination of a linear and an unknown non-linear function.
We learn the unknown functions using BART, a successful tool closely related to random forests.
To decide on how much weight the BART piece should receive in the p th quantile, we estimate it alongside the remaining model parameters.This non-parametric feature enhances model flexibility, especially in the tails.Using cross-sectional information, in addition, enables us to improve predictive accuracy.This is achieved by proposing a novel pooling prior as well as introducing cross-country information directly.To carry out estimation and inference we design a scalable MCMC algorithm and apply the model to investigate "growth at risk" using an international panel of 11 countries.
In terms of empirical results, our proposed models commonly improve on the benchmark single country linear quantile model in recursive growth forecast comparisons, more so in the tails than near the center of the distribution and in particular when estimating the weight ω, in turn allowing for both linear and BART pieces in the model.The estimated combination weight is smaller (i.e., the model is more linear) for the 25 and 75 percent quantiles than in the tails, i.e., for the 5 and 95 percent quantiles.Moreover, some form of international information definitely pays off (either via the new pooling prior, or by outright including non-domestic series).The effects of the common (volatility) factor are also relevant, as it seems to explain a large fraction of the forecast error variance in most countries, in particular in the tails.A shock to this factor, which can be interpreted as an uncertainty shock, has different (stronger negative) effects in the left tail than the right tail of the growth distribution.In the left tail, the BART piece with an estimated weight tends to mitigate a bit the effects of the shock.Finally, a financial shock in the US spills over to other countries.There is asymmetry in the responses in the sense that a positive shock affects the growth quantiles, whereas a negative shock's effects are not as sharp.Moreover, the effects and asymmetry are sharper in the 2010-20 period than earlier.The responses are instead much more proportional and symmetric in the linear model, highlighting the importance of allowing for non-linearities in the specification of quantile regressions.

Appendix
Further forecast results Relative quantile weighted cumulative ranked probability scores (CRPS) for h = 1 and models with Λ = 0.The results are benchmarked to CISS with Λ = 0 and ω = 0. Lower ratios (shaded in green) indicate better performance (and vice versa, shaded in red).

Figure 1 :
Figure 1: Non-parametric weights ω ip for Λ = 0 with the HSP prior plotted over time with respect to the training/holdout samples (h = 1).

Figure 2 :
Figure 2: Estimates of the factors and associated log-volatility per quantile, HSP prior.

Figures 6
Figures 6 and 7 are similar to Figures 4 and 5 but now the weight ω is set to zero, so that the quantile part is linear.Here, too, the common (volatility) factor component does not seem to have an obvious effect on the results.Instead, the effects of financial conditions are now

Figure 8 :
Figure 8: Relative quantile weighted cumulative ranked probability scores (CRPS) for h = 4 and models with Λ = 0.The results are benchmarked to ABG with Λ = 0 and ω = 0. Lower ratios (shaded an green) indicate better performance (and vice versa, shaded in red).

Figure 10 :
Figure 10: Relative quantile scores (QSs) for h = 4 and models with Λ = 0.The results are benchmarked to ABG with Λ = 0 and ω = 0. Lower ratios (shaded an green) indicate better performance (and vice versa, shaded in red).

Table 3 :
Time averages of variance decompositions, ω sampled, CC model and HSP prior.