Gaussian Process Vector Autoregressions and Macroeconomic Uncertainty ∗



Introduction
Economic relations can change over time for a variety of reasons, such as technological progress, institutional changes, and major policy interventions, but also wars, terrorist attacks, stock market crashes and pandemics. Standard econometric models, such as linear univariate and multivariate regressions, instead assume stability of the parameters characterizing the conditional first and second moments of the dependent variables. When stability is formally tested, it is often rejected (see, e.g., Stock and Watson, 1996). This has led to the development of a variety of methods to handle structural change in econometric models.
Parameter evolution is assumed to be either observable (i.e., driven by the behavior of observable economic variables) or unobservable, and either discrete and abrupt or continuous and smooth. Examples include threshold and smooth transition models (see, e.g., Tong, 1990; Teräsvirta, 1994), Markov switching models (see, e.g., Hamilton, 1989), and doubly stochastic models (see, e.g., Nyblom, 1989). In all these models, a specific and fully parametrized type of parameter evolution is assumed, and then linear or non-linear filters are used for estimation in a classical context, or Markov chain Monte Carlo (MCMC) methods in a Bayesian framework. Examples of economic applications of these methods include Koop and Korobilis (2013), Aastveit, Natvik, and Sola (2017), Caggiano, Castelnuovo, and Pellegrino (2017), Alessandri and Mumtaz (2019), and Caggiano, Castelnuovo, and Pellegrino (2021). Assuming a specific type of parameter evolution increases estimation efficiency but can lead to mis-specification. A more flexible alternative allows for a smooth evolution of parameters without specifying the form of parameter time variation. In a classical context, the evolution can be either deterministic (see, e.g., Robinson, 1991; Chen and Hong, 2012) or stochastic (see, e.g., Giraitis, Kapetanios, and Yates, 2014, 2018; Kapetanios, Marcellino, and Venditti, 2019, for the specific case of (possibly large) vector autoregressive models (VARs)). Kernel estimators are the main tool used in this literature.
Alternative approaches, which can also capture non-linear relationships between the target and explanatory variables, are, e.g., based on using functional-coefficient regressions (Cai, Fan, and Yao, 2000; Kowal, Matteson, and Ruppert, 2017), regression trees (Chipman, George, and McCulloch, 2010; Huber, et al., 2020; Coulombe, 2020), neural networks (Hornik, Stinchcombe, and White, 1989; Gu, Kelly, and Xiu, 2021; Coulombe, 2022) or infinite mixtures (Hirano, 2002; Bassetti, Casarin, and Leisen, 2014; Kalli and Griffin, 2018; Billio, Casarin, and Rossini, 2019; Jin, Maheu, and Yang, 2022). Most of these approaches, however, are fairly different from the VAR models that are the workhorse of modern time series econometrics, making interpretation of the estimation results and computation of quantities such as impulse response functions difficult. In addition, they typically focus on extending the specification of the conditional mean, while assuming a constant conditional variance, which can be restrictive for macroeconomic and financial data. Finally, some of the methods do not scale well into high dimensions and are thus not particularly suited for the large datasets nowadays used in macroeconomics.
In this paper, we propose a new model that belongs to the non-parametric class and is capable of capturing, in a very flexible way, not only parameter evolution but also general non-linear relationships. Our model can be applied in a large data context while keeping the flexibility and ease of use of VARs, and allowing for time-varying conditional variances. Specifically, we combine the statistical literature on Gaussian process (GP) regressions (see, e.g., Crawford, et al., 2019) with that on VARs to obtain a GP-VAR model. Borrowing ideas from the literature on Bayesian Minnesota-type VARs, the model assumes, for each endogenous variable, a different non-linear relationship with its own lags and with the lags of all the other variables (and possibly of additional exogenous regressors). Gaussian processes are used to model non-linearities in a flexible but efficient way. They can be viewed as a non-parametric alternative to the adaptive Minnesota-type shrinkage proposed in Chan (2021). They are also similar to neural networks, in the sense that they are universal approximators based on infinite mixtures of Gaussian distributions. We develop an efficient (Bayesian) estimation procedure, based on the structural form of the GP-VAR, which has the additional benefit that its complexity is linear in the number of endogenous variables and does not depend on the number of lags. Hence, estimation can be parallelized and, in addition, the conjugate structure of the model we use allows for pre-computing various matrix multiplications and kernel operations, which further speeds up computation. As a result, estimation is feasible also for very large models.
As in all Bayesian procedures, an assumption on the distribution of the errors is required. As is common in the Bayesian VAR literature, we assume that the errors are Gaussian. Yet, we also permit the variance of the errors to change over time, adopting a stochastic volatility (SV) specification. Already in linear BVARs, the use of SV allows for time variation in the conditional distribution of the variables, so much so that BVARs with SV are empirically a good alternative to quantile regressions (see Carriero, Clark, and Marcellino, 2022). Moreover, multi-step predictive densities, used to produce forecasts and generalized impulse responses, are non-Gaussian. The use of SV in the GP-VAR adds flexibility and prevents overfitting, in the sense of avoiding that large realizations of the shocks are interpreted as changes in the conditional mean.
We illustrate the GP-VAR with synthetic data generated from a highly non-linear multivariate data generating process (DGP) that has both Gaussian and non-Gaussian shocks. This DGP assumes that some equations feature parameters that exhibit structural breaks while others depend non-linearly on the lags of the endogenous variables. To assess whether the GP-VAR is also capable of recovering linear relations, one equation is a standard linear regression model. In all these cases, our approach works reasonably well in terms of detecting the nature of non-linearities.
Our GP-based model has a vast range of applicability, for both reduced form and structural analysis. This mirrors the possible applications of standard VARs but allows for much more general dynamic relationships across the variables. Besides the evaluation with synthetic data, we consider a forecasting application and a more structural economic analysis. In the forecasting application, we compare the performance of the GP-VAR with other linear and non-linear competitors that allow for parameter change, and with the BVAR with SV. Predicting US output, inflation and interest rates, we show that the GP-VAR improves upon all competing models, with gains that are particularly pronounced at the four-quarter-ahead horizon.
As an example of a structural economic analysis, and to gather new insights on a topic that has recently attracted considerable attention (see, e.g., Bloom, 2014), we use the GP-VAR to investigate the effects of exogenous uncertainty shocks on US macroeconomic and financial time series. Comparing the responses of the GP-VAR with the ones of a standard linear BVAR reveals that our model produces sensible responses for real activity and stock markets. Differences between the responses relate to the shape and magnitudes, with the GP-VAR producing stronger reactions of uncertainty, real GDP growth, and stock markets returns while yielding similar reactions of employment growth. Considering models of differing sizes shows that the impulse responses do not differ markedly across model sizes.
Our proposed framework naturally allows for analyzing potential asymmetries in transmission channels. The responses to a positive uncertainty shock (higher unexpected uncertainty) are typically much stronger than the ones to a negative shock. Interestingly, the shape of the IRFs also differs, with positive shocks leading to responses that peak later. In addition, our findings suggest that the relationship between real activity and uncertainty becomes proportionally slightly smaller for large shocks, while financial markets react relatively more strongly to larger increases in uncertainty. Finally, our framework also allows us to investigate whether transmission mechanisms have changed over time.
Doing so reveals that the effects of uncertainty have been smaller in the great inflation period (1970Q1 to 1984Q4), more pronounced through the great moderation (1985Q1 to 2006Q4) before again turning more muted during the post great moderation phase (2007Q1 to 2019Q4).
The paper is structured as follows. Section 2 provides an introduction to Gaussian process regression. In Section 3 we develop the GP-VAR model. This section also provides necessary details on the prior setup and posterior computation. We then analyze model performance using synthetic data in Section 4. Sections 5 and 6 include our empirical work. In the former section we briefly discuss the dataset and provide some in- and out-of-sample model evidence. In the latter, we focus on the macroeconomic implications of an uncertainty shock. The final section briefly summarizes and concludes the paper.
The Online Appendix contains additional empirical results and technical details on the specification, estimation and use of the GP-VAR model.

A brief introduction to Gaussian processes
In this section we briefly discuss Gaussian process (GP) regressions with a focus on time series data (for a textbook treatment, see Williams and Rasmussen, 2006). GP regressions are a non-parametric technique to establish a flexible relationship between a scalar time series y_t and a set of K predictors x_t in period t. The key advantage of this approach is that it does not rely on parametric assumptions on the precise functional relationship between y_t and x_t.
In general, a non-parametric regression is given by:

y_t = f(x_t) + ε_t,

with f being some unknown regression function f : R^K → R and ε_t denoting an independent Gaussian shock with zero mean and constant variance σ². We relax the constant variance assumption in Sub-section 3.1 to allow for heteroskedastic shocks. An assumption on the error distribution is needed in a Bayesian context and Gaussianity is the most common one, though different distributions can be easily accommodated by exploiting a scale-location mixture of Gaussians representation (see, e.g., Escobar and West, 1995).
In standard regression models, the function f is assumed to be linear with f(x_t) = β′x_t, where β is a K × 1 vector of linear coefficients. If mean relations are non-linear, this assumption might be too restrictive. To gain more flexibility one can embed the covariates in x_t into a higher dimensional space, such as the space of powers ψ(x_t) = (x_t′, (x_t²)′, . . . , (x_t^R)′)′, with x_t² = (x_t ⊙ x_t) and higher orders defined recursively. Conditional on choosing a sufficiently large integer R, this would provide substantial flexibility to approximate any smooth function f. However, adequately selecting R is key and the mapping, moreover, is ad hoc in the sense that there exist infinitely many non-linear mappings ψ.
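As a concrete numerical illustration, the power-space embedding can be sketched as follows. This is a minimal sketch, not the authors' code: `power_basis` is a hypothetical helper name, and taking powers element-wise is one reading of the recursive definition.

```python
import numpy as np

def power_basis(x, R):
    """Embed a K-dimensional covariate vector into the space of powers
    psi(x) = (x', (x^2)', ..., (x^R)')', with powers taken element-wise."""
    return np.concatenate([x ** r for r in range(1, R + 1)])

x_t = np.array([0.5, -2.0])     # K = 2 covariates
psi = power_basis(x_t, R=3)     # dimension K * R = 6
```

A linear regression on ψ(x_t) then approximates a smooth f, but at the cost of having to choose R.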
Standard Bayesian methods place a prior on the coefficients associated with the covariates (and possible non-linear transformations thereof) and thus control for uncertainty with respect to these basis functions but at the cost of remaining within a class of functions (such as linear, polynomial or trigonometric functions). By contrast, in GP regressions we treat the function f as an unknown quantity and let the data decide about the appropriate form (and degree) of non-linearities.

Estimating unknown functions: the function space view
The key inferential goal in GP regression is to infer the function f from the data under relatively mild assumptions. This is achieved by specifying a prior on f(x_t). A typical assumption is that f(x_t) follows a Gaussian process prior:

f(x_t) ∼ GP(µ(x_t), k_ϑ(x_t, x_τ)),

with µ : R^K → R being the mean function and k_ϑ : R^K × R^K → R denoting a kernel (or covariance) function that determines the relationship between f(x_t) and f(x_τ) for periods t and τ. The kernel is typically parameterized by a low dimensional vector of hyperparameters ϑ and controls the behavior of the function f. This kernel needs to be positive semidefinite and symmetric.
In what follows, we will set the function µ(x t ) = 0 for all t. This is without loss of generality, since any explicit basis function for µ(x t ) can be used to model the mean process. If the focus is on modeling stationary data, µ(x t ) = 0 implies that a priori the process is centered around a white noise process. In case one would like to model persistent or non-stationary data it would be straightforward to implement a prior that forces the system towards a set of random walk processes. This can be achieved by setting µ(y t−1 ) = ρy t−1 , where ρ denotes a persistence parameter with prior mean E[ρ] = 1.
Alternatively, one could specify the prior on f to imply persistence in y t . This possibility is discussed in much more detail in Section A.1 of the Online Appendix.
A common choice in GP regressions is the Gaussian (or squared exponential) kernel function:

k_ϑ(x_t, x_τ) = ξ exp(−(κ/2) ‖x_t − x_τ‖²),

with ξ denoting a scaling parameter and κ the (inverse) length scale, and thus ϑ = (ξ, κ)′.
Larger values of κ lead to a GP which displays more high frequency variation whereas lower values imply a slowly varying mean function. The parameter ξ controls the prior variance of the function f. To see this, note that if t = τ, the kernel reduces to k_ϑ(x_t, x_t) = ξ. This specification is quite flexible and fulfills several convenient conditions. For instance, Williams and Rasmussen (2006) show that the use of the Gaussian kernel implies that f(x_t) is mean square continuous and differentiable. Moreover, this kernel function represents a positive semidefinite and symmetric covariance function. Furthermore, Mercer's theorem (Mercer, 1909) states that, under this kernel, the GP regression can be written in terms of an infinite number of basis functions. These basis functions are Gaussians with different means and variances. This suggests a connection to the literature on Bayesian non-parametrics (Escobar and West, 1995; Neal, 2000; Kalli and Griffin, 2018; Frühwirth-Schnatter and Malsiner-Walli, 2019) that relies on infinite mixtures of Gaussians to estimate unknown densities. The link between GPs and infinite mixtures of Gaussians shows that the Gaussian assumption on ε_t is not too restrictive, as the model allows for recovering non-Gaussian features in the data.
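To make the roles of κ and ξ concrete, the following sketch builds the Gaussian kernel and draws a function from the implied prior. All names are ours; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(X, Z, xi, kappa):
    """Squared exponential kernel: k(x_t, x_tau) = xi * exp(-kappa/2 * ||x_t - x_tau||^2)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return xi * np.exp(-0.5 * kappa * sq_dists)

rng = np.random.default_rng(0)
T = 50
X = np.sort(rng.normal(size=(T, 1)), axis=0)

K_smooth = gaussian_kernel(X, X, xi=1.0, kappa=0.1)   # small kappa: slowly varying f
K_wiggly = gaussian_kernel(X, X, xi=1.0, kappa=10.0)  # large kappa: high-frequency variation

# A draw from the prior f ~ N(0, K) via the Cholesky factor (jitter for stability)
jitter = 1e-8 * np.eye(T)
f_draw = np.linalg.cholesky(K_smooth + jitter) @ rng.normal(size=T)
```

Comparing draws under `K_smooth` and `K_wiggly` reproduces the behavior described above: larger κ yields wigglier sample paths, while ξ fixes the marginal prior variance on the diagonal.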
The GP prior represents an infinite dimensional prior over the space of functions.
This implies that the estimation problem is infinite dimensional as well. However, since we sample data in a discrete manner, the GP prior becomes a multivariate Gaussian prior:

f ∼ N(0_T, K_ϑ(X, X)),

with 0_T being a T × 1 vector of zeros, K_ϑ(X, X) a T × T kernel matrix with typical element k_ϑ(x_t, x_τ), and X = (x_1, . . . , x_T)′. This implies that, in terms of full data matrices, the GP regression is given by:

y = f + ε,  ε ∼ N(0_T, σ² I_T),

where y = (y_1, . . . , y_T)′, f = (f(x_1), . . . , f(x_T))′, and I_T denotes a T × T identity matrix.
Assuming for the moment that σ² is known, the posterior of f follows a multivariate Gaussian distribution:

f | y ∼ N(f̄, V_f),

with variance-covariance matrix V_f and posterior mean vector f̄ given by:

f̄ = K_ϑ(X, X) (K_ϑ(X, X) + σ² I_T)^{−1} y,
V_f = K_ϑ(X, X) − K_ϑ(X, X) (K_ϑ(X, X) + σ² I_T)^{−1} K_ϑ(X, X).

The mean function f̄ can be interpreted as a weighted average of the values of the endogenous variable:

f̄_t = Σ_{τ=1}^{T} α_τ k_ϑ(x_t, x_τ),

where α = (α_1, . . . , α_T)′ = (K_ϑ(X, X) + σ² I_T)^{−1} y. This (finite dimensional) representation shows how one moves from an infinite dimensional problem to a finite dimensional one.
The expression for the variance-covariance matrix V f also has an intuitive interpretation. The first term is the prior variance (i.e., the kernel matrix). The second term measures how much of the variance is expressed through the covariates in X and thus the posterior covariance indicates how much the model learns from X.
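These posterior moments translate directly into a few lines of linear algebra. The sketch below is our own illustrative code (the function name and toy data are ours): it computes f̄, V_f and the weights α for a simulated series.

```python
import numpy as np

def gp_posterior(K, y, sigma2):
    """Posterior of f under f ~ N(0, K) and y = f + eps, eps ~ N(0, sigma2 * I):
    f_bar = K (K + sigma2 I)^{-1} y,  V_f = K - K (K + sigma2 I)^{-1} K."""
    T = len(y)
    A_inv = np.linalg.inv(K + sigma2 * np.eye(T))
    alpha = A_inv @ y          # weights in the weighted-average representation
    f_bar = K @ alpha
    V_f = K - K @ A_inv @ K
    return f_bar, V_f, alpha

rng = np.random.default_rng(1)
T = 60
x = np.linspace(-2, 2, T)[:, None]
sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * 2.0 * sq)                      # Gaussian kernel with xi = 1, kappa = 2
y = np.sin(2 * x[:, 0]) + 0.1 * rng.normal(size=T)

f_bar, V_f, alpha = gp_posterior(K, y, sigma2=0.01)
```

With a small error variance the posterior mean tracks the data closely; as σ² grows, f̄ is shrunk towards the zero prior mean, exactly as the formula suggests.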
The predictive distribution of f(x_{T+h}) can be easily derived by exploiting basic properties of the multivariate Gaussian:

f(x_{T+h}) | y ∼ N(f̄_{T+h}, V_{T+h}),

whereby

f̄_{T+h} = K_ϑ(x_{T+h}, X) (K_ϑ(X, X) + σ² I_T)^{−1} y,
V_{T+h} = k_ϑ(x_{T+h}, x_{T+h}) − K_ϑ(x_{T+h}, X) (K_ϑ(X, X) + σ² I_T)^{−1} K_ϑ(X, x_{T+h}).

Similar to the posterior mean f̄, the predictive mean f̄_{T+h} is a weighted average of the values of the endogenous variable y, with the weights depending on the relationship between X and a realization of the vector of covariates x_{T+h} related to the h-step-ahead horizon. The predictive variance V_{T+h}, again, equals a term that is purely driven by the prior evaluated at x_{T+h} minus a term that measures the informational content in the covariates.
Before proceeding to the discussion on how to set the kernel, it is worth noting that what we have discussed above is often labeled the function-space view of the GP. This is because the prior is elicited directly on f. Another way of analyzing GPs is based on the weight-space view. Under the weight-space view one can rewrite the GP regression as a standard regression model as follows:

y = W_ϑ η + ε,  η ∼ N(0_T, I_T),

with W_ϑ denoting the lower Cholesky factor of K_ϑ(X, X) = W_ϑ W_ϑ′ and η being a Gaussian shock vector with zero mean and unit variance. This is a standard regression model with T regressors, a coefficient vector η, and a Gaussian prior on η. Standard textbook formulas for the Bayesian linear regression model (see, e.g., Koop, 2003, Chapter 4) can be used to carry out posterior inference.
This also shows that if we set K_ϑ(X, X) = X V_ϑ X′, we obtain a linear regression model that features a Gaussian prior with zero mean and a typical prior variance-covariance matrix V_ϑ. This kernel implies many more parameters than the parsimony inducing Gaussian kernel. Hence, the resulting fit and forecasts can be expected to have posterior distributions with larger variances than those associated with the Gaussian kernel, though the bias would be lower if the true model is linear and features stable parameters.
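The equivalence of the two views is easy to verify numerically: the ridge-type posterior mean of η, mapped back through W_ϑ, reproduces the function-space posterior mean. The code below is our own illustration (notation and toy data are ours).

```python
import numpy as np

rng = np.random.default_rng(2)
T = 40
X = np.sort(rng.normal(size=(T, 1)), axis=0)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * 2.0 * sq) + 1e-8 * np.eye(T)   # Gaussian kernel plus jitter
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=T)
sigma2 = 0.01

# Function-space view: f_bar = K (K + sigma2 I)^{-1} y
f_function_space = K @ np.linalg.solve(K + sigma2 * np.eye(T), y)

# Weight-space view: y = W eta + eps with K = W W', prior eta ~ N(0, I).
# Posterior mean of eta is the standard Bayesian ridge formula (W'W + sigma2 I)^{-1} W'y.
W = np.linalg.cholesky(K)
eta_bar = np.linalg.solve(W.T @ W + sigma2 * np.eye(T), W.T @ y)
f_weight_space = W @ eta_bar
```

The two fitted vectors coincide because W (W′W + σ²I)^{−1} W′ = K (K + σ²I)^{−1}, a standard push-through identity.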

Choosing a kernel and the role of the hyperparameters
The expressions derived in the previous sub-section for the posterior of f and for the predictive density of f(x_{T+h}) suggest that the kernel and its hyperparameters play an important role. In this sub-section, we discuss this issue in more detail.
One of the key advantages of GPs is that by constructing suitable kernels, one can determine the space of possible functions. This gives rise to substantial flexibility and allows for capturing a large range of competing models within a single econometric model. For instance, Williams and Rasmussen (2006, Chapter 6) discuss how kernels can be constructed to mimic the behavior of neural networks, regression splines, polynomial and linear regressions. Tree-based techniques such as Bayesian additive regression trees (BART, Chipman, George, and McCulloch, 2010) can be cast in this framework by exploiting the ANOVA-representation of the model and then the weight-space view of the GP. In principle, and we will build on this feature later, summing over the corresponding kernels gives rise to another kernel and suitable weights could be constructed to select, in a data-driven way, which model summarizes the data best.
As stated in the previous sub-section, our focus will be on the Gaussian kernel due to its excellent empirical properties and analytical tractability. The two hyperparameters κ and ξ control the curvature and the marginal variance of the function, respectively. We illustrate the effect of κ on the prior and posterior of f in Figures 1 and 2, using an example based on US inflation, a series which features a persistent stochastic trend. Once we increase κ, we observe that the draws from the prior display more high frequency variation with shorter cycles between peaks and troughs. Once this prior is combined with the data, the estimated mean functions display much more curvature and fit the actual data increasingly well.
If κ is set too high, the functions arising from the prior vary substantially and are likely to capture even very small deviations of inflation from its trend. This translates into a close-to-perfect fit of the posterior mean of the functions and gives rise to serious overfitting concerns.
To see how a GP regression captures a possibly non-linear relationship between y t and x t , Figure 2 shows the functional relationship between output growth and lagged macroeconomic uncertainty. If κ = 0.01, the regression relationship is almost linear and suggests that high levels of (lagged) macroeconomic uncertainty are accompanied by negative output growth rates. When we set κ = 0.1 we observe much more curvature (both in the prior and the posterior) in the relationship, indicating that if uncertainty is between 0 and around 1.7, GDP growth is between 2.3 and 2.5 percent. However, once a certain threshold in the first lag of uncertainty is reached, the relationship becomes strongly negative until it becomes essentially flat for very high levels of uncertainty. A similar finding, but slightly more pronounced, arises if we set κ = 4. In this case GDP growth does not change much as long as lagged uncertainty is between 0 and 1.7 and then the relationship becomes, again, strongly negative.
As is clear from these stylized examples, the kernel and its hyperparameters crucially impact the posterior estimates of the function f. Setting κ too small leads to a model which might miss important (higher frequency) information, whereas a κ set too large translates into an overfitting model which might yield a very strong in-sample fit but poor out-of-sample predictions. Setting κ is thus of crucial importance, and in all our empirical work we will infer it through Bayesian updating.

Figure: GP regression of US GDP growth on the first lag of the macroeconomic uncertainty indicator. Notes: In this figure, we showcase the GP regression with US GDP growth data using the first lag of macroeconomic uncertainty as the only regressor. The left panels report, for different values of κ, the 5th and 95th prior percentiles (with the area in between shaded in light red), three draws from the prior (dashed red lines), and the actual values of GDP growth (black dots). The right panels report the 90% posterior credible sets (shaded in light red), the posterior medians (solid red lines), and actual GDP growth (black dots).
Another key question is whether the estimated function converges to the true underlying function. The literature deals with this question under various assumptions on the error distribution (mostly setting σ² = 0), or studies how the GP regression behaves if the underlying function f differs in terms of smoothness from the GP prior controlled by the kernel (see, e.g., Stone, 1982; van der Vaart and van Zanten, 2008; Yang, Bhattacharya, and Pati, 2017; Teckentrup, 2020). Stone (1982), focusing on iid data, shows that the optimal rate of estimation of a ζ-smooth function is T^{−ζ/(2ζ+K)}, so that the rate deteriorates in K and improves in the smoothness of the true function. Building on this finding, Teckentrup (2020) analyzes the contraction properties of a Gaussian process regression under a general Matérn kernel function and provides error bounds that also depend on the relationship between the smoothness of the true and estimated functions. If these agree, one can achieve a convergence rate of T^{−ζ/K}.
After having provided the necessary foundations on Gaussian process regression, we will now focus on developing a model that is suitable for macroeconomic analysis.

Large-dimensional Gaussian process VARs
In this section, we first develop the GP-VAR in Sub-section 3.1. Next, Sub-sections 3.2 to 3.4 are devoted to the development of efficient MCMC schemes to carry out posterior and structural inference. Finally, Sub-section 3.5 details how to compute forecasts and (generalized) impulse response functions for the GP-VAR.

The Gaussian process VAR
In the following discussion, let y t = (y 1t , . . . , y M t ) denote an M × 1 vector of macroeconomic and financial variables. 5 Moreover, x t = (x 1t , . . . , x M t ) denotes an M p × 1 vector with x jt = (y jt−1 , . . . , y jt−p ) storing the "own" lags of the j th endogenous variable and z t = (z 1t , . . . , z M t ) an (M − 1)M p × 1 vector of "other" lags. Hence, z jt = (y −jt−1 , . . . , y −jt−p ) , where y −jt denotes the vector y t with the j th element excluded.
We discriminate between own and other lags of y_t because we assume that lags of other endogenous variables impact a given endogenous variable differently from its own lags. The literature on Bayesian VARs (see Bańbura, Giannone, and Reichlin, 2010; Koop, 2013; Huber and Feldkircher, 2019; Chan, 2021) has captured this through shrinkage priors that treat coefficients on own and other lags differently. We wish to capture this equation-specific asymmetry by specifying our GP-VAR to depend on two latent processes: one driven by x_t and one by z_t. The structural form of the resulting GP-VAR is then given by:

y_t = Q y_t + F(x_t) + G(z_t) + ε_t,

with F(x_t) = (f_1(x_1t), . . . , f_M(x_Mt))′ and G(z_t) = (g_1(z_1t), . . . , g_M(z_Mt))′, and f_j and g_j being equation-specific functions. The function f_j controls how y_jt depends on its own lags while g_j encodes the relationship between y_jt and the lags of the other endogenous variables. The functions f_j and g_j, and hence F and G, differ because we construct different kernels with distinct hyperparameters. The matrix Q is an M × M lower triangular matrix with zeros along its main diagonal. This matrix defines the contemporaneous relations across the elements in y_t.
Finally, ε_t is an M × 1 vector of Gaussian shocks with zero mean and an M × M time-varying variance-covariance matrix H_t = diag(ω_1t, . . . , ω_Mt). We will assume that ω_jt follows a flexible stochastic volatility (SV) model:

ω_jt = exp(h_jt),  h_jt = ρ_hj h_jt−1 + ν_jt,  ν_jt ∼ N(0, σ²_hj),

with h_jt = log ω_jt being assumed to evolve according to a stationary AR(1) state equation. We let ρ_hj denote the persistence parameter, σ²_hj the error variance, and h_j0 the initial state of the log-volatility process.
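A minimal simulation makes the SV specification concrete. This is our own sketch under simplifying assumptions: the log-volatility is centered at zero, and the exact state equation in the paper may differ (e.g., by a level term).

```python
import numpy as np

def simulate_sv(T, rho_h, sigma2_h, h0=0.0, seed=0):
    """Simulate h_t = rho_h * h_{t-1} + nu_t, nu_t ~ N(0, sigma2_h),
    starting from the initial state h0, and return omega_t = exp(h_t)."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    prev = h0
    for t in range(T):
        prev = rho_h * prev + np.sqrt(sigma2_h) * rng.normal()
        h[t] = prev
    return h, np.exp(h)

h, omega = simulate_sv(T=200, rho_h=0.9, sigma2_h=0.04)
# Heteroskedastic shocks: eps_t ~ N(0, omega_t)
eps = np.sqrt(omega) * np.random.default_rng(1).normal(size=200)
```

Persistent swings in ω_t generate volatility clustering in ε_t, i.e., fat-tailed unconditional behavior from conditionally Gaussian shocks.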
Allowing for time variation in the shock variances provides additional flexibility and enables us to capture non-Gaussian features in the shocks (not only, but also because h_jt enters the model non-linearly). One could also introduce additional scaling factors that arise from inverse Gamma distributions to obtain a model with t-distributed shocks. In principle, we could also allow for unknown functional relations between the contemporaneous terms of the preceding j − 1 equations and the response of equation j. However, this would lead to a complicated non-linear covariance structure. Since we are interested in carrying out structural identification based on zero impact restrictions, we opt for this simpler approach, which implies multivariate Gaussian reduced form shocks, but with a time-varying covariance matrix. Given that the literature on GPs typically assumes the shocks to be Gaussian and homoskedastic, this is already a substantial increase in flexibility (a rare exception is Jylänki, Vanhatalo, and Vehtari, 2011, who propose a GP regression with heavy tailed errors and mainly focus on fast and robust approximate inference of a posterior that is analytically intractable due to a t-distributed likelihood).

The model in Eq. (2) assumes that the shocks in ε_t are, conditional on Qy_t, orthogonal and hence estimation can be carried out equation-by-equation. We will exploit this representation for simplicity and computational tractability. The j-th equation, in terms of full-data matrices, is given by:

y_j = f_j + g_j + Ỹ_j q_j + ε_j,  ε_j ∼ N(0_T, Ω_j),

where y_j = (y_j1, . . . , y_jT)′, f_j = (f_j(x_j1), . . . , f_j(x_jT))′, g_j = (g_j(z_j1), . . . , g_j(z_jT))′, Ỹ_j denotes the matrix of contemporaneous values of the first j − 1 elements of y_t with q_j the corresponding free elements of the j-th row of Q, and Ω_j = diag(ω_j1, . . . , ω_jT). It is worth stressing that one could also think of our decomposition in terms of a new function with a kernel that is given by the sum of the kernels of the functions f_j and g_j.
Notice that our estimation strategy is not invariant with respect to reordering the elements in y t , a common problem if this orthogonalization strategy is used. In Subsection C.2 of the Online Appendix, we show that different orderings have only a small impact on the estimated impulse responses.

Conjugate Gaussian process priors
In this sub-section, our focus will be on the priors on f j and g j . The priors on the remaining, linear quantities are standard and thus not discussed in depth. We use a Horseshoe prior (Carvalho, Polson, and Scott, 2010) on the free elements in Q, a Beta prior on the (transformed) persistence parameter (ρ hj + 1)/2 ∼ B(25, 5), and an inverse Gamma prior on the state innovation variances σ 2 hj . This prior is specified to have mean 0.1 and variance 0.01.
For the equation-specific functions f_j and g_j, we specify two GPs, one conditional on X_j = (x_j1, . . . , x_jT)′ and one conditional on Z_j = (z_j1, . . . , z_jT)′:

f_j ∼ N(0_T, K_ϑj1(X_j, X_j)),  g_j ∼ N(0_T, K_ϑj2(Z_j, Z_j)),

with suitable kernels with typical elements given by:

k_ϑj1(x_jt, x_jτ) = ξ_j1 exp(−(κ_j1/2) (x_jt − x_jτ)′ D_Xj^{−1} (x_jt − x_jτ)),
k_ϑj2(z_jt, z_jτ) = ξ_j2 exp(−(κ_j2/2) (z_jt − z_jτ)′ D_Zj^{−1} (z_jt − z_jτ)).

For j = 1, . . . , M, ϑ_j1 and ϑ_j2 are equation- and kernel-specific hyperparameters and the matrices D_Xj, D_Zj are diagonal matrices with typical i-th elements σ̂²_Xj,i and σ̂²_Zj,i. These are set equal to the empirical variances of the i-th column of X_j and Z_j, respectively.
Inclusion of the diagonal scaling matrices D X j and D Z j serves to control for differences in the scaling of the explanatory variables. Notice that since the hyperparameters are allowed to differ, we essentially treat own and other lags asymmetrically through different functional approximations f j and g j .
The kernel is scaled with the error variances in Ω_j. A typical diagonal element of the corresponding re-scaled kernel is given by ω_jt × k_ϑj1(x_jt, x_jt) = ω_jt ξ_j1 and ω_jt × k_ϑj2(z_jt, z_jt) = ω_jt ξ_j2. Typical off-diagonal elements are given by √(ω_jt ω_jτ) × k_ϑj1(x_jt, x_jτ) and √(ω_jt ω_jτ) × k_ϑj2(z_jt, z_jτ). The interaction between the kernel and the error variances gives rise to convenient statistical and computational properties.
First, note that if ω_jt is large, the corresponding prior on the unknown functions is more spread out. In macroeconomic data, ω_jt is typically large in crisis periods, when x_jt and z_jt are far away from their previous values. Since the diagonal elements of the kernels are effectively determined by ξ_j = (ξ_j1, ξ_j2)′, the presence of ω_jt allows for larger values in the marginal prior variance and thus makes large shifts in the unknown functions more likely. Second, the interaction between ω_jt and ω_jt−1 implies that the covariances are scaled down if ω_jt ≫ ω_jt−1, suggesting that the informational content decreases if increases in uncertainty are substantial (i.e., ∆ω_jt is large). If ω_jt ≈ ω_jτ and both are large, the corresponding covariance will be scaled upwards. This implies that our model learns from previous crisis episodes as well. Third, as we will show in Sub-section 3.4, interacting the kernel with the error variances leads to a conjugate Gaussian process structure, which implies that we can factor out the error volatilities and do not need to update several quantities during MCMC sampling. This speeds up computation enormously and allows for estimating large models.
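A small numerical sketch of this rescaling may help. We read the volatility-scaled kernel as Ω^{1/2} K Ω^{1/2}, i.e., diagonal elements ω_t ξ and off-diagonals √(ω_t ω_τ) k(·,·); this reading, and all names below, are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 30
X = rng.normal(size=(T, 1))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)                      # homoskedastic Gaussian kernel with xi = 1
omega = np.exp(0.6 * rng.normal(size=T))   # stand-in volatility path, omega_t > 0

s = np.sqrt(omega)
K_scaled = s[:, None] * K * s[None, :]     # Omega^{1/2} K Omega^{1/2}
# Diagonal is omega_t * xi: a large omega_t spreads out the prior on f at time t,
# while off-diagonals are rescaled by the geometric mean sqrt(omega_t * omega_tau).
```

Because the same √ω_t multiplies rows and columns, the scaling also factors out of posterior computations, which is the conjugacy exploited for fast estimation.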
Before discussing how we select the hyperparameters, it is worth highlighting a possible identification problem of our model. In our baseline specification we center f_j and g_j around zero a priori. If we introduce an additional intercept term (or a simpler mean function) no identification issues arise. However, if we believe that f_j and g_j are centered on non-zero values, we cannot separately identify them. In our empirical work, we normalize the grand mean of g_j to be equal to zero. It is worth stressing, however, that if interest is in predictions or impulse responses, this does not cause any additional issues, since the conditional mean function (which is the sum of f_j and g_j) is identified. Exploiting basic properties of the Gaussian distribution one can easily show that the sum of f_j and g_j in Eq. (4) gives rise to a new latent process m_j which is, again, Gaussian:

m_j ∼ N(0_T, K_ϑj1(X_j, X_j) + K_ϑj2(Z_j, Z_j)).

Hence, our model can also be viewed as a standard Gaussian process that combines the information in X_j and Z_j by summing over two different kernels. This immediately implies that if we are interested in sampling from the posterior predictive distribution of y_t+h (and related functions such as impulse responses), it is sufficient to estimate m_j.

Selecting the hyperparameters associated with the kernel
So far, we have always conditioned on the hyperparameters that determine the shape of the Gaussian kernel. A simple way of specifying ϑ j1 and ϑ j2 is the median heuristic approach stipulated in Chaudhuri et al. (2017). This choice works well in a wide range of applications featuring many covariates (see, e.g., Crawford et al., 2019). The median heuristic fixes ξ j1 = ξ j2 = 1 and sets the inverse of the bandwidth parameter equal to the reciprocal of the median of the pairwise distances between the regressors, for j = 1, . . . , M . This simple approach has the convenient property that it automatically selects a bandwidth which is consistent with the time series behavior of the elements in y t .
To illustrate this, suppose that y jt is a highly persistent process (e.g., inflation or short-term interest rates). In this case, for τ = t − 1, the Euclidean distance between x jt and x jτ will be quite small and, hence, the mean function f j adjusts smoothly. If y jt is less persistent and displays large fluctuations (e.g., stock market or exchange rate returns), the Euclidean distance between x jt and x jτ will be large and, thus, f j is able to capture this behavior. The dispersion in z jt might have important implications for y jt if the aim is to model a trend in y jt that depends on other covariates. This could arise in a situation where the prior on f j is set very tight (i.e., the posterior of f j will be centered on zero) and information not coming from x jt would then determine the behavior of y jt . This discussion highlights how the median heuristic allows for flexibly discriminating between signal and noise and thus acts as a non-linear filter which purges the time series of high frequency variation, if necessary.
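As a sketch, the median heuristic can be computed as follows. This is the generic textbook version based on pairwise squared Euclidean distances; the paper's exact scaling may differ.

```python
import numpy as np

def median_heuristic(A):
    # Inverse bandwidth set to the reciprocal of the median pairwise
    # squared Euclidean distance between the rows of A (T x q).
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    iu = np.triu_indices(A.shape[0], k=1)   # off-diagonal pairs only
    return 1.0 / np.median(d2[iu])

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))   # hypothetical regressor matrix
kappa_bar = median_heuristic(A)
```

Because the heuristic adapts the bandwidth to the dispersion of the regressors, rescaling the data rescales the inverse bandwidth accordingly (doubling the data divides it by four), which is what makes the bandwidth consistent with the time series behavior of each variable.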
Given that we work with potentially large panels of time series, it is questionable whether the median heuristic works equally well for all elements in y t . As a solution, we propose to use the median heuristic to set up a discrete grid for both ξ j1 (ξ j2 ) and κ j1 (κ j2 ), and to specify a hyperprior over each element of this grid. For j = 1, . . . , M , we use Gamma priors on the linear shrinkage hyperparameters and on the bandwidth parameters. Here, c ξ1 , c ξ2 , c κ1 and c κ2 are scalars that define the tightness of the hyperprior. In the empirical application, we set c ξ1 = c ξ2 = c ξ and c κ1 = c κ2 = c κ . Because these parameters strongly influence the shape of the conditional mean, they are crucial modeling choices, and we set them through cross-validation. In our empirical application, we find that small values of c κ work reasonably well, yielding an informative prior that forces κ j1 and κ j2 towards zero.
Based on this set of priors, we can derive the conditional posterior distribution. Since ξ j1 (ξ j2 ) and κ j1 (κ j2 ) are placed on a grid, we can pre-compute several quantities related to the kernel (such as inverses and Cholesky factors) while at the same time inferring them from the data with sufficient accuracy, which is crucial for precise inference. In what follows, we center these grids around the median heuristic and additionally take into account the considerations behind the informative Gamma priors: κ jk ∈ [0.1κ̄ jk , 2κ̄ jk ] and ξ jk ∈ [0.04, 4], for j = 1, . . . , M and k = 1, 2, with κ̄ jk denoting the median heuristic value.
Here, the intervals indicate the minimum (maximum) value supported for each hyperparameter. Within this two-dimensional range, we define a discrete grid of around 1000 combinations with equally sized increments along each dimension. 10 The corresponding posterior is discrete, and we can use inverse transform sampling to carry out posterior inference. Further details are provided in Sub-section 3.4.
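Sampling from a discrete posterior over a hyperparameter grid reduces to inverse transform sampling. A minimal sketch follows; the posterior ordinates here are random placeholders, whereas in the actual sampler they come from the conditional likelihood times the Gamma prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical grid of (kappa, xi) combinations with 20 points per
# dimension; the bounds mimic the ranges discussed in the text.
grid = [(k, x) for k in np.linspace(0.1, 2.0, 20)
               for x in np.linspace(0.04, 4.0, 20)]
log_post = rng.standard_normal(len(grid))  # placeholder log ordinates

# Normalize on the log scale for numerical stability.
w = np.exp(log_post - log_post.max())
probs = w / w.sum()

# Inverse transform sampling: invert the discrete CDF at a uniform draw.
cdf = np.cumsum(probs)
u = rng.uniform()
idx = min(int(np.searchsorted(cdf, u)), len(grid) - 1)
kappa_draw, xi_draw = grid[idx]
```

Because the grid is fixed across MCMC iterations, all kernel-related quantities can be pre-computed once per grid point and the per-iteration cost reduces to evaluating the ordinates and one CDF inversion.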

Posterior computation
Posterior inference for the GP-VAR is carried out using a novel yet conceptually simple MCMC algorithm which cycles between several steps. In this section we focus on how to sample from the posterior of f j , p(f j |•), with • denoting conditioning on everything else, and from that of ϑ j1 . Sampling from p(g j |•) and p(ϑ j2 |•) works analogously, with some adjustments. These relate to the fact that we introduce the linear restriction (ι′ι) −1 ι′ g j = 0, with ι denoting a T × 1 vector of ones. The corresponding conditional posterior distribution is a hyperplane-truncated Gaussian, for which efficient sampling algorithms are available (see Cong, Chen, and Zhou, 2017). Further details can be found in Sub-section A.2 of the Online Appendix. It is worth stressing that we sample f j and g j separately. This increases the computational burden slightly but allows us to consider both latent processes separately from each other. If our focus is purely on prediction or impulse response analysis, one can also simulate the process m j defined in Sub-section 3.2 without any additional restriction. Both procedures yield exactly the same results.
Generalizing the results in Section 2, it can be shown that the posterior of f j is Gaussian for all j, with posterior moments in which, for j = 1, the term Σ_{k=1}^{j−1} q jk Y k is excluded. In principle, computing the inverse and the Cholesky factor of V_{f j}^{−1} constitutes the main bottleneck when it comes to sampling from p(f j |•). This is especially so if T is large.
But in common macroeconomic applications which use quarterly US data, T is moderate and thus computation is feasible. In our case, even if interest centers on using monthly data or even higher frequencies, we can exploit the convenient fact that, conditional on the hyperparameters ξ j1 and κ j1 , the kernel, as well as its Cholesky factor B_{f j}, can be pre-computed. These practical properties (due to the conjugate structure) substantially speed up computation in terms of sampling from p(f j |•).
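The caching idea can be illustrated with a generic conjugate GP regression update (hypothetical kernel and data; not the paper's exact posterior moments, which additionally involve the triangularized VAR terms): the precision matrix and its Cholesky factor depend only on the kernel hyperparameters and can therefore be computed once per grid point and reused across MCMC sweeps.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 40
X = rng.standard_normal((T, 2))

# Hypothetical squared-exponential kernel over the rows of X.
sq = np.sum(X**2, axis=1)
K = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

sigma2 = 0.5                       # error variance (illustrative)
y = rng.standard_normal(T)

# Generic conjugate GP update: precision V^{-1} = K^{-1} + I/sigma2,
# posterior mean = V y / sigma2. The precision and its Cholesky factor
# are functions of the hyperparameters only, hence cacheable.
jitter = 1e-8 * np.eye(T)
V_inv = np.linalg.inv(K + jitter) + np.eye(T) / sigma2
B = np.linalg.cholesky(V_inv)      # cacheable per grid point
mean = np.linalg.solve(V_inv, y / sigma2)

# Draw f ~ N(mean, V) as mean + B^{-T} z with z ~ N(0, I).
z = rng.standard_normal(T)
f_draw = mean + np.linalg.solve(B.T, z)
```

Drawing via the precision Cholesky avoids forming V explicitly: one triangular solve per draw replaces a full matrix factorization per iteration.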
These results are conditional on the hyperparameters. As outlined in the previous sub-section, we estimate them by defining a discrete two-dimensional grid of 1000 combinations. For each hyperparameter combination on this grid, we compute the corresponding kernel K ϑ j1 (X j , X j ) as well as all relevant quantities (i.e., C f j ). Based on these values, we jointly evaluate the conditional posterior ordinate by applying Bayes' theorem.
The conditional likelihood is Gaussian, and the shape of K ϑ j1 (X j , X j ) depends on the hyperparameters ϑ j1 = (ξ j1 , κ j1 ), which we want to update. For each pair of values ϑ (s) j1 on our two-dimensional grid (with s denoting a specific combination), we compute the corresponding kernel. Hence, within our sampler, evaluating the likelihood is straightforward and computationally efficient. All that remains is to multiply the likelihood with the prior. The corresponding posterior ordinates for each ϑ (s) j1 are used to compute the probabilities needed to perform inverse transform sampling from p(ϑ j1 |•).
Conditional on f j and g j , the remaining parameters (i.e., the free elements in Q, the log-volatilities and the associated coefficients in the corresponding state equations) can be sampled through (mostly) standard steps. One modification relates to how we sample the volatilities in Ω j , which stems from the fact that these volatilities also show up in the prior on f j and g j . To circumvent this issue, we adjust the original sampler by integrating out the latent processes f j and g j first and then sampling the log-volatilities using an independent Metropolis-Hastings update similar to the one proposed in Chan (2017). We provide additional details and the full posterior simulator in Section C of the Online Appendix.

Forecasts and generalized impulse responses
In non-parametric models such as the GP regression described in Section 2, the effect of the covariates on y t is typically analyzed through so-called partial dependence plots (Friedman, 2001). Our large dimensional setting and the fact that we have a VAR-type structure in the conditional mean imply that partial dependence plots are difficult to compute and visualize, since they would require integration over a large number of covariates. Moreover, VARs are dynamic models, and partial dependence plots are difficult to employ in dynamic settings. To capture non-linear model dynamics and possible relations across variables, impulse responses are used to investigate the effects of structural shocks on y t . In this paper, we follow this route as well. Because the model is highly non-linear, we need to resort to generalized impulse responses (GIRFs), originally proposed in Koop, Pesaran, and Potter (1996). As GIRFs are based on (multi-step-ahead) forecasts, and since forecasting with the GP-VAR is of interest by itself, we start with a discussion of forecast computation.
We begin by computing the predictive distribution of the one-step-ahead forecasts p(y t+1 |I t ), with I t denoting all available information up to time t. The one-step-ahead predictive distribution is obtained by simulating from the predictive distribution of m t+1 , p(m t+1 |I t ), and sampling from the marginal distribution of the shocks ε t+1 ∼ N (0, H t+1 ).
The draw from the one-step-ahead predictive density is used to set up x t+2 and z t+2 .
Based on these, we can compute the corresponding kernels and obtain a draw m t+2 from the density p(m t+2 |I t ). Again, a draw from y t+2 ∼ p(y t+2 |I t ) is obtained by sampling from the marginal shock distribution ε t+2 ∼ N (0, H t+2 ) and adding this draw to m t+2 .
Higher order forecasts are obtained analogously. The resulting predictive distribution of y t+h will be highly non-Gaussian and might feature heavy tails and/or asymmetries. This forms the baseline.
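Schematically, multi-step forecasting proceeds by iterating one-step simulation. The sketch below uses a hypothetical stand-in for the GP draw of m t+h and homoskedastic shocks, whereas the actual model draws both from their respective posteriors.

```python
import numpy as np

rng = np.random.default_rng(3)

def cond_mean(x):
    # Hypothetical stand-in for a draw of the latent process m_{t+h};
    # in the GP-VAR this draw comes from the GP predictive distribution.
    return 0.5 * np.tanh(x)

def simulate_path(y_t, H, vol=0.1):
    # One path y_{t+1}, ..., y_{t+H}: draw the conditional mean given
    # the current state, add a shock, and feed the draw back in.
    path, state = [], y_t
    for _ in range(H):
        eps = vol * rng.standard_normal(state.shape)  # shock draw
        state = cond_mean(state) + eps
        path.append(state)
    return np.array(path)

# The predictive distribution is approximated by many simulated paths.
paths = np.stack([simulate_path(np.zeros(3), H=4) for _ in range(200)])
```

Each path feeds its own draws back into the conditional mean, which is why the resulting h-step-ahead distribution can become non-Gaussian even with Gaussian shocks.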
The GIRFs are computed as follows. To analyze the effects of a structural disturbance (such as an uncertainty shock), we assume that the uncertainty indicator is (without loss of generality) in the j th position in y t . A shock of size ς in time t to the uncertainty indicator shifts all elements in y t by ς q j , i.e., the j th column of (I − Q) −1 scaled by a scalar that reflects the shock size ς. The other shocks are sampled, again, from their marginal distributions, i.e., for all i ≠ j we have that ε it ∼ N (0, ω it ). Based on this, we draw from the predictive distribution conditional on the uncertainty shock, y t+1 ∼ p(y t+1 |I t , ε jt = ς), and use this draw to compute x̂ t+2 and ẑ t+2 . For higher order conditional forecasts we proceed as in the case of the unconditional forecast distribution, simulating from the marginal distribution of the structural shocks, which are added to the conditional mean forecasts m̂ t+h .
The corresponding dynamic responses are then obtained by subtracting the mean of the unconditional predictive distribution from the conditional (on the uncertainty shock) predictive density: δ ht = E(y t+h |I t , ε jt = ς) − E(y t+h |I t ). Notice that δ ht is state-dependent and, due to the non-linear nature of the conditional mean function, allows for asymmetries in how y t reacts to shocks. 11 This gives rise to two inferential opportunities. First, one can assess how a given shock has impacted the economy at a given point in time. This allows us to investigate whether transmission mechanisms depend on the underlying state of the economy. Second, the non-linear mean function directly implies that shock transmission can be asymmetric, so that positive shocks might feed through the economy differently than negative shocks, and non-proportional, so that larger shocks can have proportionally different effects than smaller shocks. In all our empirical work we exploit both dimensions and focus on asymmetries in the sign, and non-proportionality in the size, of the shock, as well as explicitly consider state dependencies by computing δ ht over time. Finally, we also integrate out uncertainty with respect to the state of the economy by averaging over all values of t.

Illustration using synthetic data
In this section, we illustrate the computational merits of our approach and evaluate whether it successfully recovers different features of a highly non-linear DGP.
To illustrate our methods, we simulate T = 200 observations from a highly non-linear small-scale VAR with M = 3 equations. The three equations differ in terms of whether they are linear or non-linear in the parameters, but also with respect to the distribution of the shocks. Non-linearities are captured in two ways: first, we assume a break point and, second, we assume non-linear relations between the response variable and the lags of the other variables. In all these equations, we assume that the functions f j and g j differ, to assess whether our approach is capable of discriminating between the two. In the DGP, k = 1, . . . , p, and I(•) denotes the indicator function that equals one if its argument is true and zero otherwise. The free elements in Q are also simulated from a Gaussian distribution with q jk ∼ N (0, 0.1²). Moreover, we introduce an SV specification with heavy tails for the structural error variances, ω jt = λ jt ω̄ jt , with each h̄ jt = log ω̄ jt following an independent random walk law of motion: h̄ jt = h̄ jt−1 + σ h̄j u h̄t , with u h̄t ∼ N (0, 1).
For each equation, we set the initial state ω̄ j0 = exp(h̄ j0 ) = 0.01 and the state innovation variance σ h̄j = 0.01. We consider (conditionally) t 3 -distributed errors with three degrees of freedom in the first equation, by simulating λ 1t ∼ G −1 (3/2, 3/2), and (conditionally) Gaussian-distributed errors for the second and third equation, by setting λ 2t = λ 3t = 1 for all t.

Notes to Figure 3: Horizontal panels refer to each equation of the DGP while vertical panels show the different components. The red shaded areas represent the 90% posterior credible set of f j , g j , m j (= f j + g j ) and y j , while the solid black line denotes the actual outcome of these quantities.
The figure suggests that our approach is capable of detecting different functional relations between y t , x t and z t . Across all three equations, we find that the estimated conditional mean function m j tracks the actual value rather well. It is also worth stressing that if the DGP features non-Gaussian shocks, the model recovers the true mean function particularly well (see the first row of Figure 3). Zooming into the estimates for the different latent components reveals that most of this strong fit is driven by accurate estimates of f j . Considering the estimates for g j suggests that, if the actual relationship between z t and y t is non-existent, our approach accurately detects this behavior. However, there are some cases where f j soaks up variation in g j . This, however, does not impact the mean estimate m j .
Since this discussion has been based on a single draw from the DGP, one might ask whether the strong performance of the GP-VAR is due to a particularly favorable realization from the DGP. To briefly investigate whether this is the case, we report summary results in the first row of Table 1. Equations two and three feature substantial high-frequency movements which, in our framework, are mostly picked up by the stochastic volatility component.
One key advantage of the GP-VAR is that we can remain agnostic on the form of non-linearities the conditional mean function might take. This is confirmed for synthetic data where, irrespective of the non-linear nature of the model, the estimated sum of f j and g j (i.e., m j ) closely tracks the dynamics of the actual outcome. This finding holds both for the linear case (i.e., the first equation), even with fat-tailed errors, and for highly non-linear situations (i.e., the second and third equations).
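The volatility and mixture components of the DGP above can be sketched as follows (conditional mean equations omitted; for this sketch the innovation parameter 0.01 is treated as a standard deviation):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200

# Log-volatility random walk h_t = h_{t-1} + 0.01 * u_t, u_t ~ N(0, 1),
# initialized at omega_bar_0 = exp(h_0) = 0.01 as in the design.
h = np.log(0.01) + np.cumsum(0.01 * rng.standard_normal(T))
omega_bar = np.exp(h)

# t_3 shocks via the scale mixture: with lambda_t ~ InvGamma(3/2, 3/2),
# eps_t ~ N(0, lambda_t * omega_bar_t) is conditionally t-distributed
# with three degrees of freedom (as in the first equation of the DGP).
lam = 1.0 / rng.gamma(shape=1.5, scale=1.0 / 1.5, size=T)
eps = np.sqrt(lam * omega_bar) * rng.standard_normal(T)
```

The inverse-Gamma draw is obtained as the reciprocal of a Gamma draw with matching shape and rate, which is the standard scale-mixture representation of the Student-t distribution.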
We have stressed that our approach is computationally efficient and scalable to large datasets. To investigate this claim more carefully, Figure 4 shows the time required to generate 1000 draws from the joint posterior for a given equation across different values of K and for T = 200. We show the computation times for our GP-VAR and a VAR with SV. Since our approach is embarrassingly parallel, the actual times can be reduced further by estimating the equations in parallel. The most striking takeaway from the figure is that the computation time of the GP-VAR does not depend on K. This implies that increasing the number of lags and/or endogenous variables does not impact estimation times considerably. By contrast, the time necessary to generate a draw from the joint posterior of the VAR rises rapidly in K.
This shows that our approach scales well in high dimensions and, in fact, is much faster than competing approaches to non-linear VAR models such as TVP-VARs or regime-switching VARs.
To provide a rough gauge of actual estimation times for practitioners: MCMC estimation of the GP-VAR with eight endogenous variables and five lags takes around 30.5 minutes on a standard desktop computer (for 10,000 MCMC draws). These are gross estimation times, i.e., they include the pre-computation of matrices used during MCMC simulation and are not based on parallel computation of the individual equations (which is possible due to the structural form). Estimating larger models (such as the 64-variable GP-VAR) takes around four hours.
12 Since we need to augment the j th equation with the contemporaneous values of the preceding j − 1 equations, this statement is only approximately valid.

Modeling the US economy using GP-VARs
In this section, we apply our GP-VAR to US macroeconomic data. Sub-section 5.1 provides details on the data and Sub-section 5.2 discusses whether the GP-VAR fits the data well, shows some of its key in-sample features, and investigates its forecast performance.

Data overview
We use the quarterly version of the dataset proposed in McCracken and Ng (2016).

• GP-VAR-16: In addition to the variables in the VAR with M = 8 endogenous variables, we also include the components of GDP (such as real personal consumption and real private fixed investment), additional labour market variables (such as unemployment, initial claims, average weekly working hours and average hourly earnings across all sectors), housing starts, as well as the real M2 money stock.
• GP-VAR-32: On top of the variables of the VAR with M = 16 variables, we include important financial variables, further data on housing and data on loans.
• GP-VAR-64: The largest model we consider features M = 64 endogenous variables. The set of endogenous variables is obtained by taking the variables with M = 32 and including additional financial variables and data on manufacturing.
All variables are transformed to be approximately stationary and we include five lags of the endogenous variables. The precise variables included (and transformations applied to each variable) are shown in Table B.1 in the Online Appendix. We consider these different model sizes for several reasons. First, we would like to assess how adding additional information impacts the forecasting performance and the responses of key variables to an uncertainty shock. Second, we are interested in the relationship between non-linearities and the size of the model.

Predictive evidence and in-sample features
In this section, we start by providing some predictive evidence for our GP-VAR and investigate how the parameters associated with the kernel impact predictive accuracy.
To this end, we employ a recursive forecasting design. Our initial training period goes from 1960Q1 to 1999Q4. After computing the one- and four-step-ahead predictive distributions, we add an additional observation, recompute all models and simulate from the corresponding predictive densities. This procedure is repeated until the end of the sample (2019Q4) is reached.
We consider three ways of specifying the equation-specific (j = 1, . . . , M ) and kernel-specific (k = {1, 2}) hyperparameters. The first one is a semi-automatic approach that places the (inverse) length scale and the linear scaling parameter on a two-dimensional grid, κ jk ∈ [0.1κ̄ jk , 2κ̄ jk ] and ξ jk ∈ [0.04, 4]. Notice that the median heuristic is used to determine the lower and upper bound of the grid. The second approach (labeled "semi-automatic w/o linear scaling" in Table 2) puts κ jk ∈ [0.1κ̄ jk , 2κ̄ jk ] on a grid and sets ξ jk = 1. Finally, we also consider a specification that does not rely on the median heuristic (labeled "naive" in the table). The grid is κ jk ∈ [0.1, 2] and ξ jk is again set equal to 1.
As competing models, we include the standard Minnesota BVAR with SV, a TVP-VAR-SV similar to the one used in Primiceri (2005) and the BART-VAR with SV proposed in Huber and Rossini (2022). All models are estimated for different model sizes and benchmarked to a small-scale Minnesota BVAR with SV for the three variables we focus on: real GDP growth (RGDP), CPI inflation (CPI), and the Fed funds rate (FFR).
For the three focus variables, we compute log predictive Bayes factors (LPBFs) relative to the small-scale BVAR with SV, so that positive numbers indicate that a given model works better than the benchmark while negative values suggest a weaker forecasting performance. The LPBFs do not only take into account how well a given model predicts the realization of a given variable but also factor in forecasting performance for higher-order features of the predictive distribution (for a discussion, see Geweke and Amisano, 2010).

Notes to Table 2: The table shows joint and marginal LPBFs relative to a small-scale BVAR with SV for the one- and four-step-ahead horizons. The first column indicates model size, the second column provides information on how the hyperparameters are treated. We consider three cases: "semi-automatic" refers to a two-dimensional grid for both the (inverse) length scale and the linear scaling parameter which is scaled using the median heuristic, "semi-automatic w/o linear scaling" to a grid only for the (inverse) length scale with ξ jk = 1, while "naive" refers to a grid only for the (inverse) length scale without scaling the grid using the median heuristic and ξ jk = 1 (j = 1, . . . , M ; k = {1, 2}). Positive values imply that a particular model improves upon the small-scale BVAR with SV, while negative values suggest that the benchmark is preferred. The best specification for each variable and horizon combination is shown in bold. The red shaded entries associated with the benchmark give the actual log predictive likelihoods (LPLs).
To investigate whether controlling for heteroskedasticity pays off in the GP-VAR, we also consider homoskedastic variants of the GP-VAR in the upper part of the table. For one-year-ahead predictions of GDP growth, the single best performing model is the largest (M = 64) homoskedastic GP-VAR with the semi-automatic approach that fixes the linear scaling parameter to one. Strikingly, at that horizon and for this specific variable, using a homoskedastic specification is almost uniformly better than the corresponding SV setup. For inflation and the Fed funds rate, the smaller-sized GP-VARs again yield the best density forecasting performance, outperforming all competing models.
Finally, focusing on interest rate forecasts shows that GP-VARs with SV do well (for all model sizes) but once we turn off SV forecasts become highly imprecise. This is driven by the fact that during the zero lower bound, the conditional variance of the interest rate equation approaches zero and a model which assumes homoskedasticity fails to take that into account.
After having established that the different GP-VARs with SV do well when used to forecast US macroeconomic quantities, we focus on some in-sample features of the GP-VAR with eight endogenous variables. To get an impression of how the linear shrinkage parameter evolves over time, Figure 5 plots the product of the error variances and the linear shrinkage parameter that determines the kernel of f j and g j . Two interesting features emerge. First, less shrinkage (larger parameter values) is applied to the own lags than to the lags of the other variables. This holds for most variables under scrutiny (except for CPI inflation and S&P 500 returns). Second, less shrinkage is typically applied during recessionary times. In particular, we introduce little shrinkage during the recessions in the early 1980s and the financial crisis. Notice, however, that there are also some exceptions to this pattern (such as stock market returns, hours worked or employment). This finding is, again, in line with results from the forecasting literature showing that more information is particularly useful during problematic times (see, e.g., Koop, 2013).

Notes to Figure 5: This figure reports the posterior means of the product of the error variances ω jt and the linear scaling parameters for own lags (ξ j1 ) and for other lags (ξ j2 ), respectively. These two quantities correspond to the diagonal elements of the re-scaled kernels ω jt × k ϑ j1 (xt, xt) = ω jt ξ j1 and ω jt × k ϑ j2 (zt, zt) = ω jt ξ j2 .
Finally, we investigate differences in κ j1 and κ j2 across equations (j = 1, . . . , M ) and variable types. Boxplots showing the posterior distribution of the inverse of the length scale parameters for own and other lags are in Figure 6. Recall that large values of κ jk (k = 1, 2) imply more variation in the latent processes, whereas values of κ jk close to zero imply less variation in f j and g j . A general pattern is that, for all variables, the hyperparameters associated with the kernel on own lags are considerably larger than the ones for the kernel related to the other lags. This indicates that the own lags of a given endogenous variable require more flexibility (i.e., functions that allow for much more variation) than the lags of the other variables.

The macroeconomic effects of uncertainty shocks
We now analyze the effects of uncertainty shocks using our GP-VAR and focus on assessing how macroeconomic uncertainty feeds through the economy. In Sub-section 6.1, as a benchmark exercise, we compare our impulse responses to those obtained from a model similar to that used by JLN. In Sub-section 6.2 we leverage the non-linear nature of the GP-VAR and analyze how the effects of uncertainty shocks change according to the sign or size of the shocks, and over time.

Comparison with standard BVAR analysis
We benchmark the IRFs of our GP-VAR-8 against the ones of a BVAR with SV that is closely related to the specification used by JLN (Jurado, Ludvigson, and Ng, 2015).

Notes to Figure 7: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. GP-VAR-8 refers to the smallest variant of our non-parametric model and BVAR-8 refers to a small-scale BVAR with SV, which is closely related to the specification used in Jurado, Ludvigson, and Ng (2015).
In Figure 7 we report the (average over time in the case of the GP-VAR) posterior quantiles (16th, 50th and 84th) of the responses to a macroeconomic uncertainty shock in the JLN model (in gray) and in the corresponding GP-VAR (in orange) with eight endogenous variables and uncertainty ordered second after stock market returns. The response of uncertainty to its own shock differs slightly between the GP-VAR and the linear model.
These differences relate to responses within the first five quarters after the shock hits the system. The BVAR yields uncertainty reactions that peak after one quarter and decline steadily afterwards. As opposed to this swift reaction, the GP-VAR generates endogenous uncertainty reactions which peak after five quarters, declining sharply afterwards. After around eight quarters, both IRFs (almost) coincide.
This uncertainty reaction has direct implications for how the other variables in the model react. Real GDP growth reacts in a hump-shaped manner under both models.
However, driven by the somewhat later peak in uncertainty, the GP-VAR produces much stronger output growth reactions that peak slightly later (after around five quarters).
When we focus on employment growth, the IRFs differ less. Both models suggest a peak decline of around 0.8 percentage points, with the GP-VAR generating a somewhat slower response, reaching its trough after about six to seven quarters. In principle, the responses of the linear and non-parametric models tell a similar story.
Finally, financial market reactions measured through the S&P 500 suggest a much stronger decline in stock prices under the GP-VAR. Interestingly, the shape of the IRFs suggests that the linear model generates the strongest reaction after around two quarters. In the GP-VAR, we find that stock markets react faster and more strongly to uncertainty shocks, with substantial reactions within the first year after the shock hits the system.
To conclude, in Figure 8 we report the GIRFs to the uncertainty shock for the same four variables displayed in Figure 7.

Asymmetries in the transmission of uncertainty shocks
The non-linear and non-parametric nature of our model allows for asymmetries in the impulse response functions. This implies that shocks propagate non-linearly through the model, giving rise to differences in the GIRFs over time as well as across shock magnitudes and signs.

Asymmetries with respect to the sign of the shock
The first aspect we consider relates to whether positive and negative uncertainty shocks trigger different responses of the economy. In Figure 9 we report the responses to negative and positive uncertainty shocks from the GP-VAR-8, averaged over time. The figure thus shows GIRFs to a positive (in orange), a negative (in blue) and a negative shock multiplied by -1 (in gray, to ease comparison). From the figure we observe some differences. These differences mostly relate to peak reactions as well as short-run (i.e., within two years) responses. In general, we find that positive shocks (higher uncertainty) trigger a stronger reaction of uncertainty, which in turn translates into more pronounced reactions of real activity and stock market quantities.

Notes to Figure 9: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a negative (positive) one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, negative ×(−1) denotes a negative one standard deviation shock with the respective responses mirrored across the x-axis.
More specifically, considering the endogenous reaction of the uncertainty indicator shows that responses to a positive uncertainty shock peak after around a year and quickly die out afterwards. If uncertainty unexpectedly declines, however, the peak occurs on impact and is much smaller than for an adverse uncertainty shock.
Turning to real GDP and employment growth, we find that positive shocks trigger stronger reactions for both variables. Interestingly, the timing of the peak responses is similar for negative and positive shocks but reactions appear much more pronounced for the latter. Stock market reactions also differ markedly across positive and negative shocks. For positive shocks we, again, find that the peak effect happens after one year and that it is more pronounced as compared to the negative shock. Overall, the picture that emerges from the GP-VAR is that higher unexpected uncertainty has stronger effects on the economy than lower uncertainty, a feature that is a priori ruled out in linear VARs.

Asymmetries with respect to the size of the shock
Our GP-VAR also allows for analyzing how shocks of different sizes impact the economy.
As opposed to a standard VAR which assumes that shocks enter linearly (and thus responses to shocks of different sizes are exactly proportional to each other) our GP-VAR is more flexible and allows for investigating whether shocks of different magnitudes trigger different dynamics in the GIRFs.
In Figure 10 we consider two shock sizes: a one standard deviation and a two standard deviation shock. To permit straightforward comparison of the shapes of the responses to differently sized shocks, we also add responses to a two standard deviation shock which are re-scaled to match the impact of the one standard deviation shock (the gray shaded area in the figure).

Notes to Figure 10: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive two (one) standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, 1 sd refers to a one standard deviation shock, 2 sd indicates a two standard deviation shock, and 2 sd ×(1/2) denotes a two standard deviation shock with the respective responses divided by two.
This figure gives rise to at least two observations. First, when we compare the shape of the responses to a one standard deviation shock with those to a two standard deviation shock, we find differences in the timing (and more generally in the shape) of the IRFs.
A stronger shock triggers a faster peak reaction of GDP and employment growth. Stock market reactions display a somewhat different shape. After a sharp immediate reaction (for both shock sizes), the peak effect occurs on impact if the shock is large, whereas it materializes after one year if the shock is smaller.
Second, in terms of magnitudes we find that a two standard deviation shock triggers peak responses that are less than twice the magnitudes of the responses to a one standard deviation shock. This is particularly visible for employment and output reactions. For stock market responses, the impact reactions are (almost) proportional to each other.

Asymmetries over time
After showing that the economy reacts asymmetrically with respect to the sign and size of the uncertainty shock, this section asks whether the effect of uncertainty shocks changes over time (see Section 2.5 of Castelnuovo (2022), or Mumtaz and Theodoridis (2018) for some evidence using TVP-VARs). The main feature emerging from the figure is the different behavior of the response of uncertainty across sub-samples. In the final two sub-samples, uncertainty responses increase up to four quarters after the shock, with peak effects being strongest in the great moderation period and becoming slightly weaker in the final sub-sample. Moreover, sign effects of uncertainty responses increase appreciably in the last two sub-samples.
These differences in the responses of uncertainty trigger differences in the IRFs of the other quantities, which relate not only to the magnitudes but also to the shapes of the responses. We find that GDP growth, employment and the S&P 500 display the strongest reactions in the great moderation regime, becoming slightly weaker in the post great moderation period. The weaker reaction of real activity over time corroborates findings in Mumtaz and Theodoridis (2018), who also report smaller responses of real activity to uncertainty shocks. As opposed to their findings, we observe that stock market reactions do not change much in magnitude but differ in shape (in accordance with the different shape of the uncertainty reaction described above). We moreover observe that asymmetries in terms of the sign of the shocks have decreased over time for GDP and employment growth. Only for stock market reactions have these sign asymmetries increased, with benign uncertainty shocks yielding a much weaker positive reaction of stock markets during the post great moderation regime.
Finally, to conclude this section, we note that GIRFs averaged over sub-samples may still mask important differences over time within sub-samples.
To shed light on whether IRFs change within regimes, Figure 12 reports responses averaged by year within each sub-sample. Overall, we can conclude that the effects of uncertainty change both across sub-samples defined by economic considerations and sometimes also within each sub-sample.

Notes: Impulse response functions to a positive one standard deviation shock in macroeconomic uncertainty in the GP-VAR-8 across sub-sample periods. Solid lines denote the yearly averaged posterior medians, with colors ranging from yellow (start of the sample) to red (end of the sample).
This kind of time variation is a priori ruled out in linear VAR models, which can therefore lead to biased estimates of the effects of uncertainty.

Conclusions
In this paper, we have developed a flexible multivariate model that uses Gaussian processes to model the unknown relationship between a panel of macroeconomic time series and their lagged values. Our GP-VAR is a very flexible model which remains agnostic about the precise relations between the endogenous variables and the predictors. This model can be viewed as a very flexible and general extension of the linear VAR commonly used in empirical macroeconomics. We also control for changes in the error variances by introducing a stochastic volatility specification. While a more flexible conditional mean can reduce the need for a time-varying conditional variance, empirically we find heteroskedasticity to be relevant also for GP-VARs.
We develop efficient MCMC estimation algorithms for the GP-VAR, which are scalable to high dimensions, so much so that for large models estimation is even faster than for the corresponding BVAR-SV. Scaling the covariance of the Gaussian process by the latent volatility factors is particularly helpful for achieving computational gains, as it permits pre-computing several quantities before MCMC sampling. This speeds up computation enormously.
To illustrate the practical workings of the GP-VAR, we first test it on simulated data from different linear and non-linear models, finding that it is capable of reproducing a variety of non-linear patterns (but also linear behavior). Then, we show in a forecasting exercise that our model yields favorable density forecasts of US output, inflation and short-term interest rates with respect to both linear and other non-parametric and time-varying specifications.
In the main part of our empirical work we re-assess the effects of uncertainty shocks by replicating and extending, with the GP-VAR, the analysis carried out by Jurado, Ludvigson, and Ng (2015) based on linear VARs. Overall, our empirical results suggest that measuring uncertainty and its effects with a simple linear VAR can lead to several incorrect conclusions. Not only can the effects of uncertainty be over-stated, but they can also be treated as stable over time, symmetric for positive and negative shocks, and proportional to the shock size. Instead, the GP-VAR model, which is preferred to the linear VAR in terms of fit and forecasting performance, returns time variation in the responses, asymmetry and non-proportionality. Hence, the empirical features we uncover should also be replicated by theoretical models of uncertainty and its effects, which at the moment typically assume stability and symmetry (see, e.g., the survey in Bloom, 2014).

Appendix
Niko Hauzenberger (University of Salzburg), Florian Huber (University of Salzburg), Massimiliano Marcellino (Bocconi University, IGIER and CEPR), and Nico Petz (University of Salzburg)

A.1 Handling persistent time series with GPs
In this sub-section, we discuss how to handle persistent time series with GPs. In principle, appropriately choosing κ allows for capturing slowly evolving trends in y_t. But one elegant aspect of GPs is that stochastic trends in y_t can also be modeled explicitly through the kernel. This can be achieved as follows. Let B denote a T × T lower triangular matrix with 1's on the main diagonal and on all elements below it.
The corresponding weight-space representation is then given by:
y = Bη, with η = (η_1, ..., η_T)' and η_t ∼ N(0, r),
where r is a prior scaling parameter that controls the average jump size of the shocks in η_t. Notice that this equation can be represented in component form as follows:
y_t = y_{t-1} + η_t,
which implies that y_t is driven by a latent random walk factor, with the matrix B capturing the state evolution dynamics and r representing the state innovation variance. This is a standard unobserved components model. Notice that the corresponding kernel matrix can be derived as rBB'. Adding this to the Gaussian kernel discussed in the main text yields a combination of a kernel that captures unknown relations between y_t and x_t and one that captures possible stochastic trends in y_t.
The properties of this linear persistence kernel are illustrated in the following figure.

Notes: In this figure we showcase the GP regression with US inflation data, setting B to be a lower triangular matrix with 1's as the diagonal and off-diagonal elements. The linear kernel is given by rBB'. The left panels report, for different values of r, the 5th and 95th prior percentiles (with the area in between shaded in light red), three draws from the prior (in dashed red), and the actual values of inflation (black dots). The right panels report the 90% posterior credible sets (shaded in light red), the posterior medians (in solid red), and actual inflation (black dots).
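A minimal sketch of this construction, with illustrative values for T, r, and the squared-exponential inverse length scale κ (not the exact specification estimated in the paper), is:

```python
import numpy as np

T, r, kappa = 100, 0.1, 0.05
B = np.tril(np.ones((T, T)))              # lower triangular matrix of 1's
K_rw = r * B @ B.T                         # linear persistence kernel r * B B'

# Squared-exponential kernel over time indices (illustrative stand-in)
t = np.arange(T, dtype=float)
K_se = np.exp(-kappa * (t[:, None] - t[None, :]) ** 2)

# Combined kernel: unknown relations plus a possible stochastic trend
K = K_rw + K_se

# Entry (t, s) of r * B B' equals r * (min(t, s) + 1): the random-walk covariance
assert np.isclose(K_rw[4, 9], r * 5)

# Three draws from the implied GP prior (jitter added for numerical stability)
L = np.linalg.cholesky(K + 1e-8 * np.eye(T))
draws = L @ np.random.standard_normal((T, 3))
```

Draws from the combined prior wander like a random walk while still exhibiting the local smoothness induced by the squared-exponential component.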

A.2 Sampling the factors under linear restrictions
As stated in the main text, we identify the conditional mean of the process by introducing the linear restriction that Gg_j = 0 with G = (ι'ι)^{-1} ι'. This can be efficiently achieved through the sampler proposed in Cong, Chen, and Zhou (2017).
Suppose our goal is to simulate g_j from N_S(ḡ_j, V_{g_j}) with restriction set S = {g_j : Gg_j = 0} and moments ḡ_j and V_{g_j}. This can be achieved using Algorithm 2 of Cong, Chen, and Zhou (2017), which consists of two steps. In the first step, we sample g*_j from the unrestricted distribution N(ḡ_j, V_{g_j}). In the second step, we obtain a draw from the restricted distribution by setting:
g_j = g*_j − V_{g_j} G' (G V_{g_j} G')^{-1} G g*_j.
All involved quantities are trivial to compute and this step does not introduce additional computational hurdles. This normalization essentially subtracts a constant term from g*_j, the unrestricted draw of g_j, so that the resulting grand mean equals zero. This leaves the conditional mean function m_j unchanged.
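A sketch of this two-step sampler, with hypothetical placeholder moments for g_j (the actual moments come from the GP posterior), could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
iota = np.ones((T, 1))
G = np.linalg.inv(iota.T @ iota) @ iota.T     # G = (iota' iota)^{-1} iota', the grand mean

# Hypothetical posterior moments of g_j (placeholders for the GP moments)
V = 0.5 * np.eye(T)
mean = rng.standard_normal(T)

# Step 1: draw from the unrestricted Gaussian N(mean, V)
g_star = mean + np.linalg.cholesky(V) @ rng.standard_normal(T)

# Step 2: correct the draw so that G g_j = 0 holds exactly
g = g_star - (V @ G.T @ np.linalg.inv(G @ V @ G.T) @ (G @ g_star)).ravel()

assert np.allclose(G @ g, 0.0)                # grand mean of the draw is zero
```

With G the grand-mean operator and V proportional to the identity, the correction reduces to demeaning the unrestricted draw, which makes the "subtracts a constant term" interpretation above concrete.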

A.3 Sampling the log-volatilities
We sample the log-volatilities marginally of the factors. This is achieved by exploiting the weight-space view of the GP. Integrating out f_j and g_j allows us to rewrite the j-th equation as
Ỹ_j = √Ω_j W̃_j ε_j, with ε_j ∼ N(0, I_T),   (A.1)
where W̃_j denotes the lower Cholesky factor of K_{ϑ_j1}(X_j, X_j) + K_{ϑ_j2}(Z_j, Z_j) + I_T and Ω_j = diag(ω_j) with ω_j = (ω_j1, ..., ω_jT)'. Our goal is to sample the log-volatilities h_j = log ω_j from their conditional posterior distribution p(h_j | Ỹ_j, W̃_j, θ_hj), with θ_hj = (ρ_hj, σ²_hj, h_j0)' collecting the parameters associated with the state equation of the log-volatility process, which evolves according to a stationary AR(1).
The corresponding T-dimensional full conditional posterior distribution can be expressed as:
p(h_j | Ỹ_j, W̃_j, θ_hj) ∝ p(Ỹ_j | h_j, W̃_j) × p(h_j | θ_hj),   (A.2)
where p(Ỹ_j | h_j, W̃_j) refers to the likelihood and p(h_j | θ_hj) to the prior. While the prior is Gaussian and defined by the state equation in Eq. (3), the likelihood takes a multivariate Gaussian form only when conditioned on the log-volatilities h_j (and thus Ω_j):
Ỹ_j | h_j, W̃_j ∼ N(0, √Ω_j W̃_j W̃_j' √Ω_j),   (A.3)
where √Ω_j W̃_j W̃_j' √Ω_j refers to the variance-covariance matrix of Ỹ_j. At this point it proves convenient to use the fact that Ω_j is a diagonal matrix. The main challenge is that h_j enters Eq. (A.3) non-linearly and is never observed directly (but evolves according to a latent process), which complicates the evaluation of the likelihood.
Moreover, the conjugate nature of our GP regression complicates likelihood evaluation further. The algorithm proposed by Kim, Shephard, and Chib (1998), based on auxiliary mixture indicators, cannot be used because Ω_j (and thus h_j) is serially correlated and its elements are not independent from each other, as the full symmetric kernel matrices K_{ϑ_j1}(X_j, X_j) and K_{ϑ_j2}(Z_j, Z_j) enter the Cholesky factor W̃_j. 1 As a remedy, we follow Chan (2017), who proposes computationally efficient sampling techniques. Our algorithm is a variant of the independence Metropolis-Hastings (MH) algorithm which can be readily applied in the present setting. 2 As noted by Chan (2017), this algorithm is fast and exhibits good mixing properties for several reasons: the independence MH step samples the log-volatilities jointly from their full conditional posterior distribution, the proposal distribution is constructed such that the acceptance rate of the MH step is sufficiently high, and, in addition, we resort to sparse matrix methods to speed up computation.
In the following, we provide a detailed discussion on our independence MH step.
We use a second-order Taylor approximation with respect to h j and then employ the Newton-Raphson optimization algorithm to represent the non-trivial conditional posterior in Eq. (A.2) in the form of a multivariate Gaussian distribution. This approximation is centered on the mode of p(h j |Ỹ j ,W j , θ hj ) and the variance-covariance is given by the inverse of the negative Hessian of log p(h j |Ỹ j ,W j , θ hj ) evaluated at the obtained mode. This Gaussian distribution provides a sufficiently accurate approximation to the full conditional posterior of h j and thus ensures a high acceptance rate when used as the proposal density in our MH step, leading to favorable mixing properties.
To construct our Gaussian proposal density thus requires evaluating the gradient and Hessian of the logarithm of the full conditional posterior of h j . According to Eq. (A.2), the log-posterior distribution is the sum of the log-likelihood and log-prior distribution, implying that the gradient/Hessian of the log-posterior is also the sum of the gradient/Hessian of these two components.
We first focus on the gradient and the Hessian of the log-prior distribution, since these quantities are mostly standard (see, e.g., Chan et al., 2019). For the t-th element of h_j, the prior is defined by the AR(1) state equation in Eq. (3).

1 In particular, this implies that the measurement errors log(ε²_j) of the linearized observation equation (which is obtained by squaring and taking the logarithm of Eq. (A.1)) are no longer independently log(χ²_1) distributed and cannot readily be approximated by a known mixture of Gaussian distributions to render the likelihood Gaussian conditional on the mixture indicators.
In the following, let r_P(h_j) denote the first derivative (gradient) and R_P(h_j) the second derivative (Hessian) of the log-prior distribution with respect to h_j, evaluated at a point h̄_j. While r_P(h̄_j) is a T × 1 vector, R_P(h̄_j) is a tridiagonal matrix of dimension T, with non-zero elements on the main diagonal and the two diagonals directly below and above the main one.
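To see why the log-prior Hessian is tridiagonal, the following sketch builds the precision matrix implied by a stationary AR(1) state equation (initial-condition handling simplified relative to the paper):

```python
import numpy as np

def ar1_prior_precision(T, rho, sigma2):
    """Precision of the Gaussian AR(1) prior h_t = rho * h_{t-1} + e_t, e_t ~ N(0, sigma2).
    With H bidiagonal (1 on the diagonal, -rho on the first subdiagonal), the
    precision is H' H / sigma2, which is tridiagonal."""
    H = np.eye(T) - rho * np.eye(T, k=-1)
    return H.T @ H / sigma2

P = ar1_prior_precision(5, 0.9, 0.1)

# The Hessian of the log-prior is -P: banded, zero outside the first off-diagonals
assert np.allclose(P, np.triu(np.tril(P, 1), -1))
assert np.isclose(P[0, 1], -0.9 / 0.1)
```

Because the log-prior is Gaussian, its Hessian is the (constant) negative precision matrix, and the gradient at any point is the precision applied to the deviation from the prior mean.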
Next, we focus on the log-likelihood, which has the following form:
log p(Ỹ_j | h_j, W̃_j) = c − (1/2) ι'_T h_j − (1/2) Ŷ_j' (W̃_j W̃_j')^{-1} Ŷ_j,
where c is a constant, all the quantities involving W̃_j can be pre-computed, and Ŷ_j = Ỹ_j ⊙ exp(−h_j/2), with ⊙ denoting component-wise multiplication. Now, let the T × 1 vector r_L(h_j) denote the gradient and the T × T matrix R_L(h_j) the Hessian of the log-likelihood, again with respect to h_j and evaluated at h̄_j. It is easy to see that the presence of (W̃_j W̃_j')^{-1} renders the Hessian R_L(h_j) a dense matrix. In what follows, we exploit the special structure of (W̃_j W̃_j')^{-1}. Elements of the two kernels K_{ϑ_j1}(X_j, X_j) and K_{ϑ_j2}(Z_j, Z_j) are defined by a squared exponential function that by construction imposes the view that periods close to each other (usually this is the case for t−1, t, and t+1) feature relatively large covariances, while dissimilar periods have very small covariances. This structure implies that the elements of (W̃_j W̃_j')^{-1} on the tridiagonal main band carry most of the information, while entries far away from this main band are typically close to zero. Hence, we approximate R_L(h_j) by a T × T matrix R̃_L(h_j) that keeps only the elements of R_L(h_j) on the tridiagonal main band and sets all other entries to zero. This approximation has the convenient feature that both R̃_L(h_j) and R_P(h_j) have the same band sparse structure. Consequently, the Hessian of the log-posterior distribution is also a tridiagonal matrix, which greatly facilitates computation through the use of band sparse matrix algorithms.
Combining the gradient (the approximated Hessian) of the log-likelihood with that of the log-prior distribution, we obtain the gradient (the approximated Hessian) of the log-posterior distribution for h_j evaluated at h̄_j. Finally, to define the moments of our Gaussian proposal density for the independence MH step, we employ the Newton-Raphson optimization method. To obtain the mode ĥ_j of the log-posterior distribution of h_j, we initialize the algorithm with a suitable starting value h̄_j = h^(1) and iterate until the change across successive iterates falls below ε = 10^{-4}, with ε being sufficiently close to zero and acting as convergence criterion for the numerical maximization algorithm. While ĥ_j is used as the mean of the Gaussian proposal density, the variance-covariance matrix is set to the inverse of the (approximated) negative Hessian evaluated at ĥ_j. Hence, the proposal is given by:
h_j ∼ N(ĥ_j, −[R_P(ĥ_j) + R̃_L(ĥ_j)]^{-1}).
This proposal has good empirical properties. In all our empirical work it leads to acceptance rates between 35 and 50 percent.
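The proposal construction can be sketched as repeated Newton-Raphson steps until the update falls below the convergence threshold, after which the mode and the inverse negative Hessian serve as the Gaussian moments. The toy target below is a simple Gaussian, so the details of the paper's likelihood are deliberately not replicated:

```python
import numpy as np

def gaussian_proposal(grad, hess, h0, tol=1e-4, max_iter=100):
    """Newton-Raphson search for the log-posterior mode; returns the mean and
    variance-covariance of the Gaussian proposal for the independence MH step."""
    h = h0.copy()
    for _ in range(max_iter):
        step = np.linalg.solve(-hess(h), grad(h))   # Newton step
        h = h + step
        if np.max(np.abs(step)) < tol:              # convergence criterion
            break
    return h, np.linalg.inv(-hess(h))               # mean = mode, var = (-Hessian)^{-1}

# Toy target: log p(h) = -0.5 * sum((h - 1)^2 / 0.5), so the mode is at 1
grad = lambda h: -(h - 1.0) / 0.5
hess = lambda h: -np.eye(h.size) / 0.5
mode, var = gaussian_proposal(grad, hess, np.zeros(3))
assert np.allclose(mode, 1.0)
```

In the paper's setting the Hessian is the tridiagonal approximation described above, so the linear solve can exploit band sparse matrix routines rather than a dense solver.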

A.4 A sketch of the posterior simulator
Our posterior simulator comprises several steps that involve the full conditional distributions of the corresponding latent quantities and parameters of the model. Since we use a collapsed sampler, the ordering of the steps is crucial for sampling from the correct joint posterior distribution.
The joint posterior distribution associated with the j-th equation is given by: p(f_j, g_j, q_j1, ..., q_jj−1, Ω_j, ϑ_j1, ϑ_j2, d_j, ν_j | Data), together with the hyperparameters associated with the Horseshoe prior, where d_j denotes the mixture indicators used in the Gaussian approximation to the log-χ²_1 distribution and ν_j the coefficients associated with the SV state equation. Let q_j• = (q_j1, ..., q_jj−1)' denote the j-th row of Q.
Conditional on adequately chosen starting values, our MCMC algorithm draws from the joint posterior of equation j th coefficients and states by iterating through the following steps.
Step 1: Sample a value of q_j• from its multivariate Gaussian conditional posterior p(q_j• | f_j, g_j, Ω_j, Data), whose moments V_{q_j} and q̄_{j•} take a standard form. 4 Step 2: Simulate the full history of the (log-)volatilities from p(Ω_j | q_j•, ϑ_j1, ϑ_j2, d_j, ν_j, Data) using the independence MH algorithm outlined in Sub-section A.3. Notice that this step is marginally of f_j and g_j. Step 3: The factor f_j is obtained from the multivariate Gaussian conditional posterior p(f_j | g_j, q_j•, ϑ_j1, Ω_j, Data) as described in Sub-section 3.4.
Step 4: The factor g j is obtained from the multivariate Gaussian conditional posterior p(g j |f j , q j• , ϑ j2 , Ω j , Data) as described in Sub-section A.2.
Step 5: ϑ_j1 is obtained from its discrete posterior distribution using multinomial sampling.
Step 6: ϑ_j2 is obtained analogously from its discrete posterior distribution using multinomial sampling.
Step 7: Sample the parameters of the state equation associated with the log-volatilities. The corresponding conditional distributions all take standard forms.
Step 8: Sample the hyperparameters of the Horseshoe prior. We use the Gibbs updating step described in Makalic and Schmidt (2015), which only samples from inverse Gamma distributions.
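Schematically, one sweep of the algorithm for equation j can be sketched as follows; the sample_* functions are hypothetical placeholders that only record the ordering (a step sampling ϑ_j2 is assumed to mirror the step for ϑ_j1):

```python
# Schematic of one collapsed sweep for equation j; the sample_* functions are
# hypothetical placeholders that record which step ran.
order = []
sample_q     = lambda: order.append("q")       # Step 1: contemporaneous coefficients (skip if j = 1)
sample_omega = lambda: order.append("omega")   # Step 2: volatilities, marginally of f_j and g_j
sample_f     = lambda: order.append("f")       # Step 3: factor f_j, conditional on Omega_j
sample_g     = lambda: order.append("g")       # Step 4: factor g_j, conditional on Omega_j
sample_kern1 = lambda: order.append("kern1")   # Step 5: kernel hyperparameters, own lags
sample_kern2 = lambda: order.append("kern2")   # Step 6 (assumed analogous): other lags
sample_sv    = lambda: order.append("sv")      # Step 7: SV state-equation parameters
sample_hs    = lambda: order.append("hs")      # Step 8: Horseshoe hyperparameters

for step in (sample_q, sample_omega, sample_f, sample_g,
             sample_kern1, sample_kern2, sample_sv, sample_hs):
    step()

# The collapsed update requires Omega_j to be drawn before f_j and g_j
assert order.index("omega") < order.index("f") < order.index("g")
```

The recorded ordering encodes the key constraint discussed below: the volatilities are updated marginally of the factors first, and the factors are then drawn conditionally on the volatilities.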
The precise ordering of the steps in our algorithm is crucial for obtaining draws from the correct stationary distribution, since we rely on marginalizing out some of the states/parameters in some steps of the MCMC algorithm. 5 Notice that f_j and g_j are sampled conditionally on Ω_j, while Ω_j is sampled marginally of f_j and g_j. This implies that we sample Ω_j marginally of the factors and the factors conditionally on the volatilities, leading to a joint update from p(Ω_j, f_j, g_j | •).

4 If j = 1, ignore this step and proceed to step 2.
5 Van Dyk and Park (2008) discuss a general approach on how to construct collapsed Gibbs samplers in order to maintain the correct stationary distribution.

A.5 Computing generalized impulse response functions
Computing the GIRFs is achieved as follows. We proceed on an equation-by-equation basis using the structural representation of the GP-VAR in Eq. (2). In this case, the one-step-ahead predictive distribution of m_1t+1 is Gaussian, with predictive moments defined through the kernels. Here, we let W_j = (X_j, Z_j) with t-th row W_jt and K*_{ϑ_j}(W_j, W_j) = K_{ϑ_j1}(X_j, X_j) + K_{ϑ_j2}(Z_j, Z_j). A draw from the one-step-ahead predictive distribution of y_1t+1 is then obtained by drawing ε_1t+1 ∼ N(0, ω_1t+1) and adding this to a draw of m_1t+1. Draws from the forecast distribution of y_1t+1 are used to form the predictive distribution for m_2t+1 (and thus y_2t+1, and so on). In general, the predictive density for the conditional mean of equation j > 2 is defined analogously, with the predictive variance defined as in the case of j = 1. A draw from the predictive distribution of y_jt+1 is obtained by drawing ε_jt+1 ∼ N(0, ω_jt+1) and adding this to m_jt+1.
Drawing from the posterior of the forecasts for all M elements in y_t yields the one-step-ahead forecast distribution p(y_t+1 | I_t). Higher-order forecast distributions are then obtained by using y_t+1 ∼ p(y_t+1 | I_t) to construct W_jt+2, then computing m_jt+2 for j = 1, ..., M, and drawing from the marginal distribution of the structural shocks. In general, h-step-ahead predictions y_t+h are obtained similarly by drawing from p(y_t+h | I_t).
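The GIRF logic, the difference between forecasts conditional on the shock and baseline forecasts averaged over draws of the structural shocks, can be sketched with a simple stand-in for the predictive step (a scalar AR(1) here, so this is purely illustrative and not the GP-VAR predictive):

```python
import numpy as np

rng = np.random.default_rng(1)

def forecast_path(y0, shock, horizons, a=0.8):
    """Iterate one-step-ahead forecasts; a scalar AR(1) stands in for the GP-VAR
    predictive mean, with a structural shock drawn at every horizon."""
    path, y = [], y0 + shock
    for _ in range(horizons):
        y = a * y + 0.1 * rng.standard_normal()
        path.append(y)
    return np.array(path)

def girf(y0, size, horizons, n_draws=2000):
    """GIRF(h) = E[y_{t+h} | shock of given size] - E[y_{t+h} | no shock]."""
    shocked = np.mean([forecast_path(y0, size, horizons) for _ in range(n_draws)], axis=0)
    base = np.mean([forecast_path(y0, 0.0, horizons) for _ in range(n_draws)], axis=0)
    return shocked - base

irf = girf(0.0, 1.0, 4)
# For this linear stand-in the GIRF is approximately a^h times the shock size
assert np.allclose(irf, 0.8 ** np.arange(1, 5), atol=0.05)
```

In the GP-VAR the shocked and baseline paths are computed with the non-linear predictive means, so the resulting GIRFs can depend on the sign, size, and timing of the shock.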
As described in the text, we focus on an uncertainty shock with the uncertainty index being located in the j-th position in y_t. Let ε_jt denote the structural innovation in time t and assume that ε_jt = ς. This implies that y_t changes by q_j ς. Again, we can compute point forecasts, but instead of using W_jt+1 (which comprises y_t and its p − 1 lags), Ŵ_jt+1 is constructed based on ŷ_t = y_t + q_j ς + ε̃_t. The shock vector ε̃_t is obtained from the marginal distribution of the shocks, but its j-th element equals zero. Using Ŵ_jt+1, we can compute the predictive distributions of y_t+1 given a shock of size ς in ε_jt. As an intermediate step, we need to compute the mean forecast based on ε_jt = ς, with mean and variance defined as above but based on Ŵ_jt+1. These one-step-ahead forecasts can again be used to construct iterative higher-order predictions conditional on the structural shock. Doing so for each of the M variables yields a draw from y_t+h ∼ p(y_t+h | I_t, ε_jt = ς).

Notes: This figure reports the posterior means of the product of the error variances ω_jt and the linear scaling parameters for own lags (ξ_j1) and for other lags (ξ_j2), respectively. These two quantities correspond to the diagonal elements of the re-scaled kernels ω_jt × k_{ϑ_j1}(x_t, x_t) = ω_jt ξ_j1 and ω_jt × k_{ϑ_j2}(z_t, z_t) = ω_jt ξ_j2.

Notes: This figure reports posterior summaries in the form of simplified boxplots of the inverse length scale parameters for own lags (κ_j1) and other lags (κ_j2), respectively. The solid black lines denote the posterior medians, while the blue (green) shaded areas represent the 50% posterior credible sets (i.e., the posterior interquartile ranges).

C.2 Robustness with respect to different orderings
This figure shows heatmaps of the correlations between the posterior medians of the GIRFs for 25 different variable orderings (chosen at random). To ensure that the identification does not impact the results, we fix the ordering of the first two elements in y_t (the S&P 500 is ordered first and the uncertainty index second). The heatmaps show that for our focus variables, the correlations are (almost) always above 0.9. This indicates that the ordering does not play a particular role for the estimation of the IRFs. The only responses that display slightly stronger changes are those of the S&P 500. In this case, the differences in IRFs are, however, mostly related to higher-order IRFs which are insignificant. Short-term reactions are almost identical and thus do not change much if we alter the orderings.

Notes: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. GP-VAR-8 refers to the smallest variant of our non-parametric model and BVAR-8 refers to a small-scale BVAR with SV, which is closely related to the specification used in Jurado, Ludvigson, and Ng (2015).

Notes: Period-specific generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive (orange) and negative (blue) one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. We consider economic recessions and expansions according to the NBER Business Cycle Dating Committee.

Notes: Impulse response functions to a positive one standard deviation shock in macroeconomic uncertainty in the GP-VAR-8 across sub-sample periods. Solid lines denote the yearly averaged posterior medians, with colors ranging from yellow (start of the sample) to red (end of the sample).
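The correlation computation behind these heatmaps can be sketched as follows, using simulated stand-ins for the posterior-median GIRFs (the response shape and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
horizons, n_orderings = 20, 25

# Hypothetical stand-ins for posterior-median GIRFs of one focus variable under
# 25 random orderings: a common shape plus small ordering-specific noise
base_shape = np.exp(-0.2 * np.arange(horizons))
girfs = base_shape + 0.02 * rng.standard_normal((n_orderings, horizons))

corr = np.corrcoef(girfs)                    # 25 x 25 correlations across orderings
off_diag = corr[~np.eye(n_orderings, dtype=bool)]

# Responses are nearly identical across orderings
assert off_diag.min() > 0.9
```

Each cell of the heatmap is one pairwise correlation between the median GIRF paths obtained under two different orderings, so values close to one indicate ordering invariance.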
C.4 Additional results on model size and asymmetries

Notes: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a negative (positive) one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, negative ×(−1) denotes a negative one standard deviation shock with the respective responses being mirrored across the x-axis.

Notes: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive two (one) standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, 1 sd refers to a one standard deviation shock, 2 sd indicates a two standard deviation shock, and 2 sd ×(1/2) denotes a two standard deviation shock with the respective responses divided by two.

Notes: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a negative (positive) one standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, negative ×(−1) denotes a negative one standard deviation shock with the respective responses being mirrored across the x-axis.

Notes: Average generalized impulse responses (GIRFs, outlined in Sub-section 3.5) to a positive two (one) standard deviation shock in macroeconomic uncertainty. Solid lines denote the posterior medians, while shaded areas correspond to the 68% posterior credible sets. Here, 1 sd refers to a one standard deviation shock, 2 sd indicates a two standard deviation shock, and 2 sd ×(1/2) denotes a two standard deviation shock with the respective responses divided by two.