A Bayesian approach for estimation of weight matrices in spatial autoregressive models

We develop a Bayesian approach to estimate weight matrices in spatial autoregressive (or spatial lag) models. Datasets in the regional economics literature are typically characterized by a limited number of time periods T relative to the number of spatial units N. When the spatial weight matrix is subject to estimation, severe problems of over-parametrization are likely. To make estimation feasible, our approach focuses on spatial weight matrices which are binary prior to row-standardization. We discuss the use of hierarchical priors which impose sparsity in the spatial weight matrix. Monte Carlo simulations show that these priors perform very well in settings where the number of unknown parameters is large relative to the number of observations. The virtues of our approach are demonstrated using global data from the early phase of the COVID-19 pandemic.


Introduction
Spatial econometrics deals with the study of cross-sectional dependence and interactions among (spatial) observations. A particularly popular spatial econometric model is the spatial autoregressive (or spatial lag) specification, where spatial interdependence between observations is governed by a so-called spatial weight matrix. The spatial weight matrix is typically assumed non-negative, row-standardized and exogenously given, with spatial weights based on some concept of neighbourhood. Geographic neighbourhood is often preferred due to exogeneity assumptions. However, when relying on geographic information, several competing approaches exist for constructing the weight matrix (for a thorough discussion, see LeSage and Pace 2009).
Recently, Kelejian and Piras (2014), Qu and Lee (2015), Han and Lee (2016), and Hsieh and Lee (2016) have proposed specifications which relax the assumption of an exogenously given spatial weight matrix. Since direct estimation of a spatial weight matrix requires estimating at least N(N − 1) parameters (ignoring the other model parameters), only few approaches target direct estimation of spatial weight matrices. Recently, Ahrens and Bhattacharjee (2015) and Lam and Souza (2020) tackle this problem through LASSO-based approaches (Tibshirani 1996), which involve (a priori) expert knowledge about the interactions between spatial units, while allowing the final estimates of the spatial weights to slightly deviate from it. However, for regional economic panels, where the time dimension T is often limited relative to the number of spatial observations N, estimation results in a deleterious proliferation of the number of parameters.
In this paper we describe a novel and flexible Bayesian approach for the estimation of spatial weight matrices. Our definition of spatial weight matrices fulfils the typical assumptions employed in the vast majority of the spatial econometric literature. The resulting spatial weight matrices are assumed non-negative, and specific requirements for identification of the parameters can be easily implemented in a Markov chain Monte Carlo (MCMC) sampling strategy. Although our primary focus is on row-standardized spatial weight matrices, weights without row-standardization are also implementable. (Ahrens and Bhattacharjee 2015 consider the case of sparsity in the spatial weights by employing shrinkage towards the zero matrix.) To make our estimation approach applicable to spatial panels where the number of time periods T is limited as compared to the number of spatial units N, we focus on spatial weight matrices which are binary prior to potential row-standardization.
In this paper we primarily focus on scenarios where no a priori information on the spatial structure is available. However, we also discuss how a priori spatial information can be incorporated in a very simple and transparent way. For cases where the number of unknown parameters is large relative to the number of observations, we discuss hierarchical prior setups which impose sparsity in the weight matrix. In a Monte Carlo study, we show that these sparsity priors perform particularly well when the number of spatial observations N is large relative to the number of time periods T. We show that our approach can be implemented in an efficient Gibbs sampling algorithm, which implies that the estimation strategy can be easily extended to other spatial econometric specifications. Among several others, such extensions include shrinkage estimation to avoid overparameterization (Piribauer and Cuaresma 2016), more flexible specifications of the innovation process (LeSage 1997), controlling for unobserved spatial heterogeneity (Cornwall and Parent 2017; Piribauer 2016), or allowing for non-linearity in the slope parameters (Basile 2008; Krisztin 2017). It is moreover worth noting that the proposed approach can be easily adapted to matrix exponential spatial specifications (LeSage and Pace 2007), spatial error specifications (see LeSage and Pace 2009), or local spillover models (Vega and Elhorst 2015).
The rest of the paper is organized as follows: the next section outlines the panel version of the considered spatial lag model. Section 3 discusses the Bayesian estimation approach of the spatial weights along with several potential prior setups. Section 4 presents the Bayesian MCMC estimation algorithm and also discusses how to efficiently deal with the computational difficulties when updating the spatial weights in the MCMC sampler. Section 5 assesses the accuracy of the sampling procedure via a Monte Carlo simulation study. Section 6 illustrates our approach using data on global infection rates of the very first phase of the recent COVID-19 pandemic. The final section concludes.

Econometric framework
We consider a panel version of a global spillover spatial autoregressive model (SAR) of the form:

y_t = ρ W y_t + μ + α_t ι_N + X_t β_0 + ε_t,

where y_t denotes an N × 1 vector of observations on the dependent variable measured at period t. μ and α_t represent parameters associated with fixed effects for the spatial units and time periods, respectively, and ι_N is an N × 1 vector of ones. X_t is an N × k_0 full rank matrix of explanatory variables, with corresponding k_0 × 1 vector of slope parameters β_0. ε_t is a standard N × 1 disturbance term with ε_t ∼ N(0, σ² I_N). The N × N matrix W denotes a spatial weight matrix and ρ is a (scalar) spatial dependence parameter. W is non-negative with w_ij > 0 if observation j is considered as a neighbour to observation i, and w_ij = 0 otherwise. A vital assumption is moreover that w_ii = 0, in order to avoid the case that an observation is assumed to be a neighbour to itself. A frequently made assumption amongst practitioners is that W is row-stochastic, with rows summing to unity. In this paper, we mainly present results relating to row-stochastic weight matrices. However, as the decision on row-standardizing depends on the empirical application, it is worth noting that the proposed approach may be easily adapted to problems without row-standardization of W.
The reduced form of the SAR model is given by:

y_t = (I_N − ρW)^{-1} (μ + α_t ι_N + X_t β_0 + ε_t),

where (I_N − ρW)^{-1} is a so-called spatial multiplier matrix. To ensure that (I_N − ρW) is invertible, appropriate stability conditions need to be imposed. For row-stochastic spatial weight matrices, a sufficient stability condition for the spatial autoregressive parameter often employed is ρ ∈ (−1, 1) (see, for example, LeSage and Pace 2009).
In most applications, the elements of W are treated as known. In the spatial econometric literature, there are various ways of constructing such a spatial weight matrix. In this study we focus on the estimation of weight matrices which are binary prior to row-standardization.
We also consider specifications with a spatial lag of the temporally lagged dependent variable. Sampling strategies for these cases are presented in the appendix.
Thorough discussions on the implications of row-standardization are provided by Plümper and Neumayer (2010) and Liu et al. (2014). We assume that the typical element w_ij of our spatial weight matrix can be obtained from an unknown N × N spatial adjacency matrix Ω (with typical element ω_ij). We therefore define W = f(Ω), where f(·) denotes the row-standardization function:

w_ij = ω_ij / Σ_{k=1}^{N} ω_ik.    (3)

The elements of the adjacency matrix Ω are assumed to be unknown binary indicators, which are subject to estimation. It is worth noting that the assumption of a binary Ω covers a wide range of specifications commonly used in the literature, such as contiguity, distance band, or nearest neighbours (see, for example, LeSage and Pace 2009).
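As a concrete illustration, the row-standardization function f(·) can be sketched in a few lines (a minimal sketch using numpy; the function name and the handling of zero-neighbour rows are our own choices, not taken from the paper):

```python
import numpy as np

def row_standardize(omega):
    """Row-standardization f(.): divide each row of a binary adjacency
    matrix by its row sum; rows with zero neighbours are left as zeros."""
    omega = np.asarray(omega, dtype=float)
    row_sums = omega.sum(axis=1, keepdims=True)
    # avoid division by zero for observations without neighbours
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return omega / safe

# a small binary adjacency matrix with zero diagonal
Omega = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 0, 0]])
W = row_standardize(Omega)
```

The third row illustrates the zero-neighbour case mentioned below: without further prior restrictions, an observation may end up with an all-zero row in W.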
To alleviate further notation, we collect the respective dummy variables associated with the fixed effects along with the explanatory variables in an N × k matrix Z_t, with corresponding k × 1 parameter vector β. Moreover, define Y = [y_1, . . . , y_T] and let D = {Y, Z} denote the data. The Gaussian likelihood p(D|·) is then given by:

p(D|ρ, β, σ², Ω) ∝ (σ²)^{−NT/2} |I_N − ρW|^T exp( −(1/(2σ²)) Σ_{t=1}^{T} e_t' e_t ),  with e_t = (I_N − ρW) y_t − Z_t β.

When the elements of the spatial weight matrix are subject to estimation, the number of unknown parameters is likely much larger than the number of observations. Since spatial economic panels often feature limited T relative to N, the proposed estimation approach has to address the issue of over-parametrization. We discuss different ways to tackle this problem. First and foremost, one may reduce the dimensionality of the problem by imposing a priori information on the spatial weights or by assuming symmetry of the spatial neighbourhood structure. Alternatively, we consider hierarchical prior setups which impose sparsity in the weight matrix.
When estimating spatial weights in addition to the spatial and slope parameters, identification issues are more complicated as compared to models assuming exogenous spatial weights. We therefore follow De Paula et al. (2019), who provide a thorough discussion on parameter identification for rather general spatial autoregressive model specifications. As mentioned before, we consider spatial weight matrices which are non-negative and satisfy w_ii = 0 for all i. Further standard assumptions include Σ_j w_ij ≤ 1 ∀ i, |ρ| < 1, and ‖β_0‖ < c for some positive c ∈ R, as well as β_0 ≠ 0. As an additional identifying assumption, it is important that the main diagonal elements of W² are not proportional to a vector of ones. De Paula et al. (2019) moreover show that a strongly connected spatial network is needed for global identification. Since strong a priori information on the spatial weight matrix is often not available (or desired), we assume ρ ∈ (0, 1) and only consider positive spatial autocorrelation, which is a typical assumption in empirical applications.

Note that Eq. (3) implies that some observations may have zero neighbours. However, priors on the number of neighbours can be easily elicited to rule out such situations. Moreover, a researcher might easily abstain from row-standardization by neglecting the transformation in Eq. (3); the function f(·) may simply be dropped when considering models without row-standardization of W.

Bayesian estimation of W
In this paper we use a Bayesian estimation approach to obtain estimates and inference on the unknown quantities ρ, β, σ², as well as the elements of Ω. After eliciting suitable priors for the unknown parameters, we employ a computationally efficient MCMC algorithm.
Let p(ω_ij = 1) denote the prior belief in including the ij-th element of the spatial weight matrix. Conversely, for a proper prior specification the prior probability of exclusion is then simply given by p(ω_ij = 0) = 1 − p(ω_ij = 1). With Ω_{−ij} denoting the elements of the neighbourhood matrix without ω_ij, the posterior probabilities of ω_ij = 1 and ω_ij = 0 conditional on all other parameters are given by:

p(ω_ij = 1|Ω_{−ij}, ρ, σ², β, D) ∝ L_1 p(ω_ij = 1),    p(ω_ij = 0|Ω_{−ij}, ρ, σ², β, D) ∝ L_0 p(ω_ij = 0),    (5)

where L_1 and L_0 are the likelihoods obtained through updating the spatial weight matrix via setting ω_ij = 1 and ω_ij = 0, respectively. Using the law of total probability, it is straightforward to show that the resulting conditional posterior for ω_ij is Bernoulli:

ω_ij|Ω_{−ij}, ρ, σ², β, D ∼ Bernoulli( p̄(1) ),    (6)

with p̄(1) = p(ω_ij = 1|Ω_{−ij}, ρ, σ², β, D) and p̄(0) = p(ω_ij = 0|Ω_{−ij}, ρ, σ², β, D) given in Eq. (5). Since the conditional posterior follows a convenient and well-known form, efficient Gibbs sampling can be employed.

The most obvious case where the identifying assumption on the diagonal of W² would be violated is a fully connected W with w_ij = 1/(N − 1) for all i ≠ j. The identifying assumptions can be checked during estimation by using standard rejection sampling techniques in the MCMC sampling steps (see, for example, LeSage and Pace 2009, or Koop 2003). Rejection sampling rejects draws of parameter combinations which do not fulfil these assumptions.
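The Bernoulli update can be sketched as follows (a hypothetical helper, assuming the two conditional log-likelihoods have already been evaluated; working on the log scale avoids numerical underflow when the likelihood values are tiny):

```python
import numpy as np

def gibbs_update_omega(loglik1, loglik0, prior1, rng):
    """Bernoulli draw for a single adjacency element omega_ij.

    loglik1, loglik0: conditional log-likelihoods with omega_ij set to 1 and 0;
    prior1: prior inclusion probability p(omega_ij = 1)."""
    a = np.log(prior1) + loglik1        # unnormalized log posterior, inclusion
    b = np.log(1.0 - prior1) + loglik0  # unnormalized log posterior, exclusion
    m = max(a, b)                       # subtract the max before exponentiating
    p_bar = np.exp(a - m) / (np.exp(a - m) + np.exp(b - m))
    return p_bar, int(rng.random() < p_bar)

# equal likelihoods and a flat prior give an inclusion probability of 1/2
p_bar, draw = gibbs_update_omega(-10.0, -10.0, 0.5, np.random.default_rng(0))
```

In a full sampler this update is applied sequentially, in random order, to every free element of the adjacency matrix.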
A Bayesian estimation framework requires elicitation of a prior on Ω. Obvious candidates are independent Bernoulli priors on the unknown indicators ω_ij:

p(ω_ij = 1) = p_ij,    (7)

where p_ij denotes the prior inclusion probability of ω_ij. Conversely, the prior probability of exclusion then simply takes the form p(ω_ij = 0) = 1 − p_ij.
A natural prior choice would involve setting p_ij = 1/2 for i ≠ j, and p_ij = 0 otherwise, which implies that each off-diagonal element in Ω has an equal prior chance of being included.
However, in many cases a researcher has possible a priori information on the underlying structure of the spatial weight matrix. The following stylized examples, depicted in Figure 1, demonstrate how to incorporate such information in a flexible and straightforward way. Case (A) shows a prior specification without any prior uncertainty on the spatial links; this setup implies an exogenous W, and no estimation of the weights is involved. Case (B) depicts the opposite case, where no prior spatial information is available. Specifically, this case considers full estimation of all N² − N potential links with respective prior inclusion probability p_ij = 1/2 for i ≠ j.

Subplots (C) and (D) in Figure 1 depict prior setups where a priori spatial information is available to the researcher, but is associated with uncertainty. Case (C) illustrates a prior where the general spatial domain is assumed to be a priori known, but uncertainty over specific linkages exists. In empirical practice, spatial weight matrices based on geographic information are often viewed as preferable due to their exogeneity to (socio-)economic data. The illustrated prior specification follows this idea while still allowing for uncertainty and flexibility in the spatial neighbourhood.

Recent contributions to the spatial econometric literature propose selecting (Piribauer and Cuaresma 2016) or combining (Debarsy and LeSage 2018) multiple exogenous spatial weight matrices. Case (D) follows a similar idea by depicting a mixture of a distance-band and a contiguity matrix (i.e. neighbourhood if regions share a common border). The intersecting elements of the two spatial structures are assumed to be included by setting p_ij = 1.

To reduce the dimensionality of the parameter space, an interesting alternative is the assumption of a symmetric Ω, which halves the number of free elements in the spatial weight matrix. This assumption can be imposed by simply updating ω_ij = ω_ji simultaneously.

Notes to Figure 1: Alternative prior setups for a linear city of N = 15 spatial observations. Case (A) shows a prior specification without any prior uncertainty on the spatial links; this setup implies an exogenous W and no estimation of the weights is involved. Case (B) involves no spatial prior information, and each element has a prior probability of inclusion p_ij = 1/2 ∀ i ≠ j. Case (C) shows uncertainty of the linkages in Ω only within a certain spatial domain. Case (D) is a stylized prior specification considering uncertainty among two (or more) weight matrices, with p_ij = 1 in regions where the two matrices overlap.

Hierarchical prior setups and sparsity
The prior structure in Eq. (7) involves fixed inclusion probabilities p_ij, which implies that the number of neighbours of observation i follows a Binomial distribution with a prior expected number of neighbours of (N − 1)p_ij. However, such a prior structure has the potentially undesirable effect of promoting a relatively large number of neighbours. For example, when p_ij = 1/2, the prior expected number of neighbours is (N − 1)/2, since combinations of ω_ij resulting in such a neighbourhood size are dominant in number.
To put more prior weight on parsimonious neighbourhood structures and therefore promote sparsity in the adjacency matrix, one may explicitly account for the number of linkages in each row of the adjacency matrix Ω = [ω_1, . . . , ω_N]'. We consider a flexible prior structure on the number of neighbours n_i = Σ_j ω_ij that corresponds to a beta-binomial distribution BB(N − 1, a, b) with two prior hyperparameters a, b > 0. The beta-binomial distribution is the result of treating the prior inclusion probability as random (rather than fixed) by placing a hierarchical beta prior on it. For n_i, the resulting prior can be written as follows:

p(n_i) = [Γ(a + b) / (Γ(a)Γ(b))] C(N − 1, n_i) [Γ(a + n_i)Γ(b + N − 1 − n_i) / Γ(a + b + N − 1)],    (8)

where Γ(·) denotes the Gamma function, C(·,·) the binomial coefficient, and a and b are prior hyperparameters.
In the case of a = b = 1, the prior takes the form of a discrete uniform distribution over the number of neighbours. By fixing a = 1, we follow Ley and Steel (2009) and anchor the prior on a desired expected number of neighbours m by choosing b accordingly.
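The beta-binomial prior mass and its anchoring can be written out as follows (a sketch; the anchoring rule b = (N − 1 − m)/m used in the demo is our reading of the Ley and Steel (2009) device, which sets the prior mean number of neighbours (N − 1)a/(a + b) equal to m):

```python
from math import lgamma, exp

def log_betabinom(n_i, N, a, b):
    """Log of the beta-binomial prior mass for n_i neighbours
    out of N - 1 candidate links (Eq. 8)."""
    K = N - 1
    log_choose = lgamma(K + 1) - lgamma(n_i + 1) - lgamma(K - n_i + 1)
    return (lgamma(a + b) - lgamma(a) - lgamma(b) + log_choose
            + lgamma(a + n_i) + lgamma(b + K - n_i) - lgamma(a + b + K))

N, m = 15, 2
a, b = 1.0, (N - 1 - m) / m                 # anchor the prior mean at m neighbours
pmf = [exp(log_betabinom(k, N, a, b)) for k in range(N)]
prior_mean = sum(k * p for k, p in zip(range(N), pmf))
```

With a = b = 1 the same function reproduces the discrete uniform prior over the number of neighbours mentioned above.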

Bayesian MCMC estimation of the model
This section presents the Bayesian MCMC estimation algorithm for the proposed modelling framework. Estimation is carried out using an efficient Gibbs sampling scheme. The only exception is the sampling step for the (scalar) spatial autoregressive parameter ρ, where we propose using a standard griddy Gibbs step. The sampling scheme involves the following steps:
I. Set starting values for the parameters (e.g. by sampling from the prior distributions).
II. Sequentially update the parameters by sampling from the conditional posterior distributions presented in this section.
Step II. is repeated a large number of times, with an initial set of draws discarded as burn-in.
The resulting conditional posterior distribution for β is Gaussian and of well-known form (see, for example, LeSage and Pace 2009). The conditional posterior of σ² is inverted Gamma. (A random-walk Metropolis-Hastings step for ρ might be employed as an alternative to the griddy Gibbs step.)
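Under standard conjugate priors, the two draws can be sketched as follows (a minimal sketch; the notation, the zero prior mean for β, and the hyperparameter names V0_inv, c0, d0 are our own choices; y_tilde stacks the spatially filtered outcomes (I_N − ρW)y_t over all periods):

```python
import numpy as np

def draw_beta_sigma2(Z, y_tilde, sigma2, V0_inv, c0, d0, rng):
    """One Gibbs pass for beta (Gaussian) and sigma^2 (inverse gamma),
    conditional on the spatially filtered outcome y_tilde."""
    n = Z.shape[0]
    Sigma = np.linalg.inv(Z.T @ Z / sigma2 + V0_inv)   # posterior variance
    mu = Sigma @ (Z.T @ y_tilde / sigma2)              # posterior mean (zero prior mean)
    beta = rng.multivariate_normal(mu, Sigma)
    resid = y_tilde - Z @ beta
    # inverse-gamma draw via the reciprocal of a gamma draw
    sigma2_new = 1.0 / rng.gamma(c0 + n / 2.0, 1.0 / (d0 + resid @ resid / 2.0))
    return beta, sigma2_new

# demo on simulated data with beta = (-1, 1) and sigma^2 = 0.5
rng = np.random.default_rng(42)
Z = rng.standard_normal((2000, 2))
y = Z @ np.array([-1.0, 1.0]) + rng.normal(0.0, np.sqrt(0.5), 2000)
beta, s2 = draw_beta_sigma2(Z, y, 0.5, np.eye(2) / 100.0, 0.01, 0.01, rng)
```

With many observations the draws concentrate tightly around the values used in the simulation.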

Sampling
For the spatial parameter ρ, we use a standard Beta prior distribution (see LeSage and Pace, 2009, p. 142). The conditional posterior is given by:

p(ρ|β, σ², Ω, D) ∝ |I_N − ρW|^T exp( −(1/(2σ²)) Σ_{t=1}^{T} e_t' e_t ) p(ρ),  with e_t = (I_N − ρW) y_t − Z_t β.

Note that the conditional posterior for ρ does not follow a well-known form and thus requires alternative sampling techniques. We follow LeSage and Pace (2009) and use a griddy Gibbs step (Ritter and Tanner 1992) to sample ρ.
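A griddy Gibbs step for ρ can be sketched as follows (a minimal sketch; the grid resolution, the flat prior, and the ring-neighbour demo data are our own choices, not the paper's):

```python
import numpy as np

def griddy_gibbs_rho(W, Y, Zb, sigma2, grid, rng):
    """Draw rho by evaluating the log conditional posterior on a grid
    over (0, 1) and inverting the resulting discrete CDF.
    Y and Zb are N x T arrays (outcomes and Z_t beta columns)."""
    N, T = Y.shape
    logp = np.empty(len(grid))
    for g, rho in enumerate(grid):
        A = np.eye(N) - rho * W
        _, logdet = np.linalg.slogdet(A)
        E = A @ Y - Zb                                        # residuals, all periods
        logp[g] = T * logdet - (E ** 2).sum() / (2 * sigma2)  # flat prior on rho
    p = np.exp(logp - logp.max())                             # stabilize, normalize
    cdf = np.cumsum(p) / p.sum()
    return grid[np.searchsorted(cdf, rng.random())]

# demo: simulate a small SAR panel with rho = 0.5 and recover it
rng = np.random.default_rng(1)
N, T, rho_true = 30, 50, 0.5
diff = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
W = ((diff == 1) | (diff == N - 1)).astype(float)             # ring neighbours
W /= W.sum(axis=1, keepdims=True)
Y = np.linalg.inv(np.eye(N) - rho_true * W) @ rng.normal(0.0, np.sqrt(0.5), (N, T))
rho_draw = griddy_gibbs_rho(W, Y, np.zeros((N, T)), 0.5, np.linspace(0.01, 0.99, 99), rng)
```

Because ρ is scalar with bounded support, the grid evaluation is cheap relative to the determinant updates for Ω discussed below.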

Sampling the elements of the adjacency matrix
As discussed in the previous section, we propose two alternative prior specifications for the unknown indicators ω_ij of the spatial weight matrix: first, an independent Bernoulli prior structure with fixed inclusion probabilities in Eq. (7); second, a hierarchical prior structure which treats the inclusion probabilities as random in Eq. (8). After eliciting the prior, the binary indicators can be sequentially sampled in random order from the Bernoulli conditional posterior given in Eq. (6).

Fast computation of the determinant terms
For the Bayesian MCMC algorithm, it is worth noting that repeated sampling from Eq. (6) is required. However, this requires evaluating the conditional probabilities p(ω_ij = 1|·) and p(ω_ij = 0|·). The main computational difficulty lies in the calculation of the determinants |A_0| and |A_1| (with A = I_N − ρW), which has to be carried out per Gibbs sampling step for each of the N² − N unknown elements of the spatial adjacency matrix. The computational costs associated with direct calculation of these determinants rise steeply with N, in fact by a factor of O(N³). This makes direct evaluation of the determinant prohibitively expensive, especially for large values of N. To avoid direct evaluation, we provide computationally efficient updates for the determinant, allowing for estimation of models with larger sample sizes.

Since the support of ρ is limited, the griddy Gibbs approach (or sometimes inversion approach) relies on univariate numerical integration of the conditional posterior for ρ and uses the cumulative density function to produce draws of ρ. A Metropolis-Hastings step may be used as a standard alternative, but such steps typically produce less efficient draws with poorer mixing properties (see also LeSage and Pace 2009).
It is worth noting that it is not necessary to directly calculate the determinant of the N × N matrix A = I_N − ρW, where W denotes the spatial weight matrix obtained by setting ω_ij = 1 and ω_ij = 0, respectively. Direct evaluation of |A| can be largely avoided, since updating ω_ij changes only the i-th row of W, provided we do not restrict Ω to be symmetric (we will address this case shortly). To illustrate, let Ω^(c) denote the current (to be updated) spatial adjacency matrix, and A^(c) = I_N − ρW^(c) the associated matrix. Using the so-called matrix determinant lemma, we can efficiently calculate:

|A| = |A^(c) − ρ e_i v'| = (1 − ρ v'(A^(c))^{-1} e_i) |A^(c)|,    (12)

where e_i is an N × 1 vector of zeros, except for its i-th entry, which is unity. The N × 1 vector v contains the differences between the i-th row of W and the i-th row of W^(c). Eq. (12) thus provides a simple update of the determinant, requiring only |A^(c)| and the inverse of A^(c). Direct evaluation of (A^(c))^{-1} is, similar to direct evaluation of the determinant, prohibitively expensive for moderate to large N, since it has to be carried out for each unknown element of Ω. However, we can rely on the so-called Sherman-Morrison formula to avoid direct evaluation of the matrix inverse:

A^{-1} = (A^(c))^{-1} + ρ (A^(c))^{-1} e_i v' (A^(c))^{-1} / (1 − ρ v'(A^(c))^{-1} e_i).    (13)

Combining the formulas in Eqs. (12) and (13) thus provides a numerically cheap and viable way to update the elements of the spatial adjacency matrix.
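The two rank-one updates can be verified numerically against direct evaluation (a sketch; function and variable names are our own):

```python
import numpy as np

def rank_one_update(A_inv, logdet_A, i, v, rho):
    """Update log|A| and A^{-1} after the i-th row of W changes by v,
    i.e. A_new = A - rho * e_i v' (matrix determinant lemma and
    Sherman-Morrison formula)."""
    u = A_inv[:, i]                       # A^{-1} e_i
    c = 1.0 - rho * (v @ u)               # 1 - rho v' A^{-1} e_i
    A_inv_new = A_inv + rho * np.outer(u, v @ A_inv) / c
    return A_inv_new, logdet_A + np.log(c)

# check against direct evaluation on a random stable example
rng = np.random.default_rng(0)
N, rho, i = 8, 0.4, 3
W = rng.random((N, N)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)         # row-stochastic weight matrix
A = np.eye(N) - rho * W
v = rng.normal(0.0, 0.1, N); v[i] = 0.0   # change in row i of W (zero diagonal)
A_new = A.copy(); A_new[i] -= rho * v
A_inv_new, logdet_new = rank_one_update(np.linalg.inv(A), np.linalg.slogdet(A)[1], i, v, rho)
```

Each update therefore costs O(N²) instead of the O(N³) of a fresh determinant and inverse; in the symmetric case discussed below, the same function is simply applied twice, once for row i and once for row j.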
Note the implication that an update of ρ necessitates a direct evaluation of the determinant |A| and the matrix inverse A^{-1}, as in this case no convenient update equations exist. An update of ρ, however, has to be performed only once per Gibbs step, as opposed to the N² − N updates necessary for Ω, thus justifying the relatively higher computational costs. The binary nature of Ω can moreover be exploited for additional computational gains: either A_0 or A_1 always exactly equals A^(c), and thus its determinant and inverse are already known. It therefore suffices to calculate |A| and A^{-1} for either ω_ij = 1 or ω_ij = 0, but not both.
If a symmetric spatial adjacency matrix is assumed, the update process remains generally the same; however, the determinant and matrix inverse updates have to be performed iteratively. In this case, both ω_ij and ω_ji (for i ≠ j) are set to either 1 or 0. Thus, both the i-th and the j-th row of W differ from W^(c). Following the notation in the non-symmetric case, let us denote the differences between these rows as v_i and v_j. To obtain an update of |A| and A^{-1}, we first evaluate Eqs. (12) and (13) based on e_i, v_i, |A^(c)|, and (A^(c))^{-1}. Using the resulting determinant and matrix inverse, as well as e_j and v_j, we again evaluate Eqs. (12) and (13), which yield |A| and A^{-1}.

Simulation study
To assess the accuracy of our proposed approach, we evaluate its performance in a Monte Carlo study. Our benchmark data generating process comprises two randomly generated explanatory variables, as well as spatial unit and time fixed effects:

ỹ_t = ρ̃ W̃ ỹ_t + μ̃ + α̃_t ι_N + X̃_t β̃_0 + ε̃_t.

To maintain succinct notation, we denote the simulated values in the Monte Carlo study with a tilde. The matrix of explanatory variables X̃_t is defined as X̃_t = [x̃_1t, x̃_2t], where both x̃_1t and x̃_2t are normally distributed with zero mean and a variance of one (k̃_0 = 2). The corresponding vector of coefficients is defined as β̃_0 = [−1, 1]'. The vector of residuals ε̃_t is generated from a normal distribution with zero mean and σ̃² = 0.5. The fixed effects parameters μ̃ and α̃_t are randomly generated from a standard normal distribution.
The row-stochastic spatial weight matrix W̃ is based on an adjacency matrix Ω̃, which is generated from an N/20 nearest-neighbour specification, additionally assuming symmetry of the weight matrix prior to row-standardization. (More specifically, Ω̃ = (Ω̃_0 + Ω̃_0')/2, where Ω̃_0 is an N/20 nearest-neighbour adjacency matrix.) The nearest-neighbour specification is based on a randomly generated spatial location pattern, sampled from a normal distribution with zero mean and unit variance. In the Monte Carlo study we vary T ∈ {10, 40} and N ∈ {20, 100}.
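The data generating process can be sketched as follows (a sketch with numpy; we impose symmetry via the union of the nearest-neighbour links, which is one simple reading of the symmetrization described above, and all function and variable names are our own):

```python
import numpy as np

def simulate_sar_panel(N, T, rho, beta, sigma2, rng):
    """One Monte Carlo replication: symmetric N/20 nearest-neighbour
    adjacency on random 1-d locations, row-standardized W, and panel
    data generated from the reduced form of the SAR model."""
    k = max(int(N / 20), 1)
    loc = rng.standard_normal(N)                  # random spatial locations
    dist = np.abs(loc[:, None] - loc[None, :])
    np.fill_diagonal(dist, np.inf)                # exclude self-neighbourhood
    Omega0 = np.zeros((N, N))
    for i in range(N):
        Omega0[i, np.argsort(dist[i])[:k]] = 1.0  # k nearest neighbours
    Omega = np.maximum(Omega0, Omega0.T)          # union imposes symmetry
    W = Omega / Omega.sum(axis=1, keepdims=True)  # row-standardize
    M = np.linalg.inv(np.eye(N) - rho * W)        # spatial multiplier
    mu = rng.standard_normal(N)                   # unit fixed effects
    alpha = rng.standard_normal(T)                # time fixed effects
    X = rng.standard_normal((T, N, len(beta)))
    Y = np.empty((T, N))
    for t in range(T):
        eps = rng.normal(0.0, np.sqrt(sigma2), N)
        Y[t] = M @ (mu + alpha[t] + X[t] @ beta + eps)
    return Y, X, W, Omega

rng = np.random.default_rng(7)
Y, X, W, Omega = simulate_sar_panel(100, 10, 0.8, np.array([-1.0, 1.0]), 0.5, rng)
```

The call above reproduces the most challenging design of the study (N = 100, T = 10, strong spatial dependence).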
For the Monte Carlo simulation study, we compare the following prior setups:
1. Fixed (p_ij = 1/2) prior: this prior corresponds to the fixed Bernoulli prior specification in Eq. (7), where we set p_ij = 1/2 for i ≠ j.
2. Sparsity prior: this prior corresponds to the hierarchical beta-binomial prior setup in Eq. (8).
For all prior specifications under scrutiny, we consider two alternative estimation setups by assuming that the adjacency matrix is either symmetric or non-symmetric. We moreover report the predictive performance of two alternative specifications using exogenous weight matrices.

In these cases the employed weights are based on the true (symmetric) adjacency matrix, fixing the accuracy at the 99% and 95% level, respectively. We simulate such cases by randomly switching 1% and 5% of the elements in the true binary adjacency matrix Ω̃, respectively. The resulting exogenous adjacency matrices thus exhibit exactly 99% and 95% overlap with the true adjacency matrix, while maintaining the same level of sparsity.
The prior setup for the remaining parameters is as follows. We assume a Gaussian prior for β with zero mean and a variance of 100. We use an inverse gamma prior for σ² with rate and shape parameters equal to 0.01. The prior for the spatial autoregressive parameter ρ is a symmetric Beta specification with both shape parameters equal to 1.01. The chosen priors can thus be considered highly non-informative.
In Table 1 we use several criteria to evaluate the performance of the alternative specifications. For the spatial autoregressive and the slope parameters we report the well-known root mean squared error (RMSE). For assessing the ability to estimate the spatial adjacency matrix, we use the measure of accuracy, defined as the number of correctly identified unknown elements divided by the total number of elements to be estimated. This measure is calculated separately for each posterior draw; the reported value is an average over all posterior draws and Monte Carlo iterations. In addition, the last two columns in Table 1 show the results for the benchmark SAR models using exogenous, randomly perturbed adjacency matrices with accuracy fixed at the 99% and the 95% level, respectively. (Note that a direct comparison of the results between symmetric and non-symmetric specifications does not appear reasonable, since the adjacency matrix in the data generating process is assumed symmetric.)
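The accuracy measure for a single posterior draw can be computed as follows (a short sketch; the function name is ours):

```python
import numpy as np

def adjacency_accuracy(Omega_draw, Omega_true, symmetric=False):
    """Share of correctly identified free adjacency elements for one
    posterior draw: off-diagonals, or the upper triangle when the
    adjacency matrix is restricted to be symmetric. Averaging over
    draws and Monte Carlo runs gives the reported accuracy."""
    N = Omega_true.shape[0]
    if symmetric:
        idx = np.triu_indices(N, k=1)            # N(N-1)/2 free elements
    else:
        idx = np.where(~np.eye(N, dtype=bool))   # N^2 - N free elements
    return float(np.mean(Omega_draw[idx] == Omega_true[idx]))

Omega_true = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
Omega_draw = Omega_true.copy()
Omega_draw[0, 2] = 1                             # one wrong off-diagonal
acc = adjacency_accuracy(Omega_draw, Omega_true)
```

In the small example, one error among six free off-diagonal elements yields an accuracy of 5/6.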
Intuitively, the precision of the estimation improves as the number of observations increases in proportion to the number of unknown parameters. The results in Table 1 largely confirm this intuition. The performance indicators for both ρ and Ω also clearly improve for high levels of spatial autocorrelation (ρ = 0.8). In scenarios where the number of unknown parameters is smaller than the number of observations, our approach even manages to outperform both rather hard benchmarks using exogenous spatial weight matrices close to the true DGP. This relative outperformance appears particularly pronounced when the strength of spatial dependence is large. In these settings, symmetric specifications (which resemble the true DGP) even manage to produce accuracy in the adjacency matrix close to unity.
Particularly interesting results appear in the most challenging Monte Carlo scenarios, where the number of unknown parameters is particularly large relative to the number of observations (N = 100 and T = 10). In these scenarios, the number of parameters to be estimated exceeds the number of observations by a factor of more than ten. In these cases, prior specifications without shrinkage appear to fail at recovering the underlying spatial structure, producing rather poor accuracy measures. However, when employing sparsity priors, the table reveals that our approach still manages to produce relatively accurate predictive results. In the presence of pronounced spatial autocorrelation, the sparsity specifications even manage to closely track the predictive performance of the rather tough exogenous benchmarks.
Note that the symmetric specifications (where we impose ω_ij = ω_ji) typically outperform their non-symmetric counterparts due to their resemblance to the true DGP. However, for settings where the number of unknown parameters is smaller than the number of observations, both setups track each other closely. Among the alternative prior specifications under scrutiny, the table shows rather similar results (no clear best specification emerges) in scenarios where N is small relative to T. However, for particularly over-parametrized settings (high N and low T), the proposed sparsity priors clearly outperform the fixed setups. Specifically, even in the scenario with N = 100 and T = 10, the sparsity priors still perform comparatively well. (The number of unknown parameters amounts to N² − N + N + T + k_0 + 2 and N(N − 1)/2 + N + T + k_0 + 2 for non-symmetric and symmetric spatial weight matrices, respectively.)

Empirical illustration
To illustrate our proposed approach using real data, we estimate spatial panel specifications based on country-specific daily infection rates in the very early phase of the coronavirus pandemic. We use the COVID-19 data set provided by Johns Hopkins University (Dong et al. 2020). Countries without any (official) infections in the starting period have been excluded from the sample. We moreover exclude India as a clear outlier, due to its particularly small (official) infection rates throughout the observation period.
With a biweekly time lag, the dependent variable thus captures data from the 2nd of April to the 20th of April (T = 19). For better comparability, we have fixed the time period captured by the dependent variable for all alternative specifications. It is moreover worth noting that a notably earlier starting date would result in relatively few (cross-sectional) observations. However, our results are rather robust when considering a longer time horizon.
We follow Han et al. (2021), among others, and use panel versions of a spatial growth specification for the country-specific COVID-19 infections:

Δy_t = ρ W Δy_{t−h} + μ + α_t ι_N + X_{t−14} β_0 + ε_t,    (14)

where Δy_t = y_t − y_{t−14}, and y_t is an N × 1 vector comprising the (logged) daily number of official cases per 100,000 inhabitants per country for time period t = 1, ..., T. μ and α_t represent fixed effects for the countries and the time periods, respectively. W denotes the spatial weight matrix with spatial autoregressive parameter ρ as defined before. We again primarily focus on row-stochastic weight matrices. Results based on spatial weight matrices without row-standardization are presented in the appendix.
We also consider alternative model specifications using contemporaneous as well as temporal lags of the spatial lag (W Δy_{t−h} with h ∈ {0, 14}). A plethora of recent studies exploit the contemporaneous spatial information (h = 0) for modelling the spread of COVID-19 infections (among others, see Han et al. 2021, Jaya and Folmer 2021, Kosfeld et al. 2021, Guliyev 2020, or Krisztin et al. 2020). Using contemporaneous spatial information appears reasonable when the primary interest lies in quantifying spatial co-movements of infection rates. However, for many questions of interest, a temporal spatial lag W Δy_{t−h} (h > 0) might be an interesting alternative, since it reflects the notion that the spatial process of virus transmission takes some time to manifest (Elhorst 2021; Mitze and Kosfeld 2021). Since our proposed estimation approach can be easily applied to these alternative specifications, we provide estimates for both.
In addition to the initial infections variable y_{t−14}, the matrix X_{t−14} contains three explanatory variables on a daily basis. Several studies emphasize the importance of climatic conditions for the spread of the COVID-19 virus; for a survey on the effects of climate on the spread of the COVID-19 pandemic, see Briz-Redón and Serrano-Aroca (2020). We therefore use daily data on the country-specific maximum measured temperature (Temperature) and precipitation levels (Precipitation) as additional covariates. Both variables stem from a daily database of country-specific data, which was compiled via the Dark Sky API. As a third variable, we also include the well-known stringency index (Stringency) put forward by Hale et al. (2020), which summarizes country-specific governmental policy measures to contain the spread of the virus. In this application, we use the biweekly average of the reported stringency index. Since all these influences arguably require some time to be reflected in the official infection figures, we use a biweekly lag of 14 days (in accordance with h in the alternative variants).

The spatial growth regression in Eq. (14) may alternatively be specified in levels rather than in log-differences by setting Δy_t = y_t; results using this alternative specification are very similar and are presented in the appendix. It is moreover worth noting that in the special case of h > 0, computational efficiency is tremendously increased, as no log-determinant calculations are required in the MCMC algorithm. The sampling strategy for these cases is presented in the appendix.

Table 2 summarizes the estimation results. The left part of the table contains results for specifications using a contemporaneous spatial lag W Δy_t, while the right part summarizes results for the case W Δy_{t−14}. For each specification, the first rows contain the posterior means and standard deviations for the slope parameters, followed by estimates of ρ and σ². Posterior quantities which appear significantly different from zero using a 90% posterior credible interval are depicted in bold. The table moreover presents the average posterior expected number of neighbours, which is given by the average row sum of the matrix of posterior inclusion probabilities based on p(ω_ij = 1|D).
This quantity can be viewed as a measure of sparsity in the estimated matrix of linkages. All specifications moreover contain fixed effects for both countries and time periods. Table 2 shows rather similar posterior quantities for ρ and σ² under the flat and the sparsity prior.
However, there appear some marked differences between the specifications Wy_t and Wy_{t−14}. In all cases, spatial dependence appears strong and precisely estimated, but appears particularly high in the temporal lag specification Wy_{t−14}. At the same time, the table reveals higher estimates of the nuisance parameter σ² for the temporal spatial lag models. The table shows rather precise and negative coefficients for the initial infections variable, indicating conditional convergence patterns. For most model variants, the table moreover suggests a significant negative impact of the stringency index on infection growth. The majority of the slope parameter estimates associated with the variables temperature and precipitation appear more muted and insignificant. Overall, the table clearly demonstrates that a hierarchical prior setup can enforce sparsity in the resulting adjacency matrix: both sparsity specifications result in a smaller average number of neighbours than the models with fixed prior specifications.

The weather data are available at https://www.kaggle.com/datasets/vishalvjoseph/weather-dataset-for-covid19-predictions. As a robustness check, we have also tried a shorter lag length of one week; the estimated spatial structures appeared very similar to the biweekly benchmarks. All these additional robustness checks, along with the R codes, are available from the authors upon request. For the benchmark specifications, the number of unknown parameters and observations amounts to 753 and 513, respectively.

For the sake of visualization, we distinguish between negligible evidence for inclusion (< 0.50; white colour), moderate evidence (0.50–0.75; grey colour), and strong evidence (> 0.75; black colour).
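The sparsity measure and the evidence thresholds above can be sketched in a few lines. The following is a minimal illustration (in Python, although the paper's own code is in R; the function names and the toy matrix are purely illustrative): it computes the average posterior expected number of neighbours as the mean off-diagonal row sum of the inclusion-probability matrix, and classifies each link according to the visualization thresholds.

```python
import numpy as np

def expected_neighbours(pip):
    """Average posterior expected number of neighbours: the mean
    row sum of the posterior inclusion probabilities, excluding
    the diagonal (no self-neighbours)."""
    pip = np.asarray(pip, dtype=float)
    off_diag = pip - np.diag(np.diag(pip))
    return off_diag.sum(axis=1).mean()

def evidence_class(pip):
    """Classify links: negligible (< 0.50, white), moderate
    (0.50-0.75, grey), or strong (> 0.75, black) evidence."""
    pip = np.asarray(pip, dtype=float)
    return np.where(pip > 0.75, "strong",
                    np.where(pip >= 0.50, "moderate", "negligible"))

# toy 3x3 matrix of posterior inclusion probabilities
pip = np.array([[0.0, 0.9, 0.1],
                [0.6, 0.0, 0.2],
                [0.8, 0.3, 0.0]])
print(expected_neighbours(pip))  # (1.0 + 0.8 + 1.1) / 3, i.e. about 0.967
```

A smaller value of this statistic indicates a sparser estimated adjacency structure, which is how the sparsity priors are compared against the fixed-prior specifications in Table 2.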
The two upper plots in Figure 2 depict posterior inclusion probabilities P(ω_ij = 1 | D) for the specifications involving a contemporaneous spatial lag Wy_t, while the lower part shows the temporal spatial lag specifications Wy_{t−14}. In both cases, the left subplots present results based on independent prior inclusion probabilities of p = 1/2, while the right plots are based on sparsity priors with the prior expected number of neighbours set to seven. The columns in the subplots indicate the marginal posterior importance of the countries as predictors of coronavirus infections in linked countries; conversely, rows depict the countries to be predicted. The results using sparsity priors generally produce similar patterns as the fixed prior specifications and clearly demonstrate their ability to reduce the dimension of the connectivity structure. For the contemporaneous spatial lag specification (upper plots), the figure suggests a slightly more pronounced regional dependency structure as compared to the temporal spatial lags. The figure moreover reveals marked spill-out effects from Asian countries, as well as from Iran and Italy.
Results based on a biweekly temporal spatial lag Wy_{t−14} show even more pronounced spill-out effects from Asian countries (most notably China, the Republic of Korea, and Singapore). For European countries, the results similarly suggest Italy as a further important source country of spatial virus transmission. The estimated spatial linkages are thus in close agreement with the actual origins of the overall virus transmission in the very early period of the global outbreak of the pandemic.
To showcase convergence of the posterior MCMC chains, Figure 3 depicts trace plots for ρ, σ², and the slope parameters. Overall, the trace plots show rather good mixing and convergence properties. Convergence of the chains has moreover been checked using the diagnostics proposed by Geweke (1992), as implemented in the R package coda (Plummer et al. 2006). Results moreover appear rather robust with respect to alternative modelling frameworks; estimation results for these alternative specifications are presented in the appendix.

The regional dependency structure appears particularly pronounced when a level specification of the infection dynamics is imposed. Sensitivity checks based on this alternative specification are presented in Figure A1 in the appendix. When comparing the results, it is important to note that for all specifications under scrutiny, we have fixed the time period of the dependent variable (t ranges from the 2nd of February to the 20th of February, i.e. T = 19). The biweekly temporal spatial lag specification thus inherently comprises spatial information prior to the sample period of the dependent variable. Estimates using a smaller time lag of seven days also appear very similar; results along with the R codes used are available from the authors upon request.

Notes: Posterior draws based on 5,000 MCMC draws, where the first 2,500 were discarded as burn-ins.

Concluding remarks
In this paper we propose a Bayesian approach for estimation of weight matrices in spatial econometric models. A particular advantage of our approach is the simple integration into a standard Bayesian MCMC algorithm. The proposed framework can therefore be adapted and extended in a simple and computationally efficient way to cover a large number of alternative spatial specifications prevalent in recent literature. Our approach may thus be easily extended to cover inter alia non-Gaussian models such as spatial probit (LeSage et al., 2011) or logit specifications (Krisztin and Piribauer, 2021), local spillover models (Vega and Elhorst, 2015), or spatial error models (LeSage and Pace, 2009).
Our approach does not necessarily rely on specific prior information for the spatial linkages.
Spatial information, however, can easily be incorporated in a flexible and transparent way. We moreover motivate the use of hierarchical priors which impose sparsity in the resulting spatial weight matrix. These sparsity priors are particularly useful in applications where the number of unknown parameters exceeds the number of observations. The virtues of our approach come at the price that we focus on spatial neighbourhood structures which are binary (prior to row-standardization). However, this restriction is implicit in many spatial applications in the regional economic literature, where spatial weight matrices are constructed based on concepts of contiguity, distance bands, or nearest neighbours.
Based on Monte Carlo simulations, we show that our approach appears particularly promising when the number of spatial observations N is large relative to the time dimension T, which is a rather common characteristic of data sets in the regional science literature. We moreover demonstrate the usefulness of our approach using real data on the outbreak of the COVID-19 pandemic.

Estimation strategies for alternative spatial lag specifications
In the empirical application, the paper also considers model variants with a spatial lag on the temporal lag of the dependent variable. The considered specification can be written as

y_t = ρ W y_{t−1} + X_t β + ε_t,  ε_t ∼ N(0, σ² I_N),  (15)

where y_{t−1} now denotes the temporal lag of the dependent variable and the other quantities are defined as before. From a Bayesian perspective, it is worth noting that an additional temporal lag of the dependent variable y_{t−1} can be treated like any other explanatory variable and can thus be made part of the matrix of covariates X.
From a computational perspective, the specification in Eq. (15) is much easier to deal with than SAR models involving a contemporaneous spatial lag in the dependent variable (i.e. ρWy_t). This is due to the fact that the likelihood function does not involve a determinant term.
To maintain succinct notation, we again collect the fixed effects along with the explanatory variables in a matrix X and stack the quantities as before over t = 1, . . . , T, with y and y_{−1} denoting the stacked NT × 1 vectors of the dependent variable and its temporal lag, respectively.
Defining e = y − ρ(I_T ⊗ W)y_{−1} − Xβ, the likelihood reduces to a much simpler form:

p(y | ρ, β, σ²) ∝ (σ²)^{−NT/2} exp(−e′e / (2σ²)).

Using the same prior specifications as in the SAR case, the conditional posterior probabilities of including or excluding a single link ω_ij are then given by the posterior odds

p(ω_ij = 1 | ·) / p(ω_ij = 0 | ·) = [p / (1 − p)] exp(−(e_1′e_1 − e_0′e_0) / (2σ²)),

where e_1 and e_0 denote the updated vectors of residuals when ω_ij = 1 and ω_ij = 0, respectively.
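As an illustration of why this case is computationally cheap, the stacked residuals and the determinant-free Gaussian likelihood can be evaluated directly. The sketch below is in Python rather than the R used for the paper's results, and all names are illustrative:

```python
import numpy as np

def residuals_and_loglik(y, y_lag, X, W, rho, beta, sigma2):
    """Residuals e = y - rho (I_T kron W) y_lag - X beta and the
    Gaussian log-likelihood of the temporal spatial lag model.
    No log-determinant term is required, unlike in the SAR case."""
    NT = y.shape[0]
    T = NT // W.shape[0]
    e = y - rho * (np.kron(np.eye(T), W) @ y_lag) - X @ beta
    loglik = -0.5 * NT * np.log(2.0 * np.pi * sigma2) - (e @ e) / (2.0 * sigma2)
    return e, loglik
```

In practice one would avoid forming the Kronecker product explicitly and instead apply W blockwise per time period; the explicit form above simply mirrors the stacked notation.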
The conditional Bernoulli posterior for ω_ij follows directly from these posterior odds. The remaining conditional posterior distributions required for the MCMC sampler (those for β and σ²) take well-known forms. Unlike the other parameters, the conditional posterior for ρ again takes no well-known form and can be sampled using a griddy-Gibbs or a tuned Metropolis-Hastings step. When using a normal prior distribution for ρ, it is worth noting that the spatial lag can simply be captured in the matrix of explanatory variables, such that the parameter ρ is incorporated in the vector β. However, in order to pay particular attention to model stability as well as prior consistency with the benchmark SAR specification in the main body of the paper, we similarly employ a beta prior for ρ, which results in the non-standard form of the conditional posterior for ρ.
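Under these conditional posteriors, the inclusion decision for a single link reduces to a Bernoulli draw based on two residual sums of squares. A minimal sketch (Python for illustration; e1 and e0 denote the residual vectors with the link included and excluded, p the prior inclusion probability, and the function name is ours):

```python
import numpy as np

def sample_link(e1, e0, sigma2, p, rng):
    """Bernoulli update for one spatial link omega_ij.
    e1, e0: residual vectors with the link included / excluded;
    sigma2: current error variance draw; p: prior inclusion
    probability. Computed on the log scale for stability."""
    log_odds = (np.log(p) - np.log1p(-p)
                - (e1 @ e1 - e0 @ e0) / (2.0 * sigma2))
    prob_include = 1.0 / (1.0 + np.exp(-log_odds))
    return rng.random() < prob_include
```

When the link improves the fit (e1′e1 < e0′e0), the exponent is positive and the inclusion probability rises above the prior; no determinant evaluations are involved.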
When considering specifications with a spatial lag in the explanatory variables (typically referred to as SLX models), the MCMC sampling scheme is rather similar and likewise considerably reduces the computational burden as compared to SAR frameworks.

Empirical results for additional model specifications and Monte Carlo diagnostics
This section provides results based on alternative model specifications. We provide estimates and inference for three different specifications. First, we consider a specification where the dependent variable is based on the log levels of infection rates rather than (biweekly) log-differences (all else being equal). These results (labelled level specification) are presented in Table A1 and Figure A1. Overall, the results appear very similar to the benchmark specifications. Second, we also consider specifications without row-standardization of the spatial weight matrix. A summary of the estimation results along with the posterior results for the spatial weight matrix is provided in Table A2 and Figure A2, respectively. Third, to show the merits of our approach in highly over-parametrized environments, we also present a robustness check with only T = 10. Results are presented in Table A3 and Figure A3. We have moreover tried various other robustness checks, including versions using a shorter time lag of only seven days or even shorter time periods, which produce similar results.

Notes: Posterior quantities based on 5,000 MCMC draws, where the first 2,500 were discarded as burn-ins. Values in bold denote significance under a 90% credible interval. Level specifications refer to specifications using (all else being equal) log levels of infection rates rather than log-differences as the dependent variable.
Note that the interpretation of the initial infections variable in the level specifications differs from the benchmark case using log-differences as the dependent variable. Specifically, in the former, parameters smaller than unity (as compared to negative parameters in the benchmark specifications) point towards convergence.
From an econometric point of view, estimation is the same as for the row-stochastic counterparts, without conducting the standardization step in the MCMC sampler. However, in this case several caveats arise. Most notably, row-standardization of W has the great advantage that the parameter space for the spatial autoregressive parameter ρ is clearly defined, such that the inverse (I_N − ρW)^{−1} exists. To ensure stationarity of the MCMC sampler in the case of no row-standardization, we have therefore implemented a rejection step, rejecting draws resulting in singular solutions.
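The rejection step described above can be implemented as a simple numerical check on each candidate draw. The sketch below (Python; the tolerance and the function name are our own choices) accepts a draw only if I_N − ρW is numerically nonsingular:

```python
import numpy as np

def draw_is_stable(W, rho, tol=1e-8):
    """Rejection check for non-row-standardized W: accept the
    draw only if (I_N - rho W) has full rank, so that its
    inverse exists."""
    n = W.shape[0]
    A = np.eye(n) - rho * W
    return np.linalg.matrix_rank(A, tol=tol) == n
```

For example, with a binary two-country adjacency matrix and ρ = 1, the matrix I − ρW is singular and the draw would be rejected, while ρ = 0.5 would be accepted.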
Specifically, in these specifications we reduce the end date of the dependent variable accordingly.

Notes: Posterior inclusion probabilities of spatial links based on 5,000 MCMC draws. Inclusion probabilities of 0.50–0.75 (moderate evidence for inclusion) are coloured grey; strong evidence for inclusion (> 0.75) is indicated by black colour. Level specifications refer to specifications using (all else being equal) log levels of infection rates rather than log-differences as the dependent variable.

Notes: Trace plots and posterior densities based on 1,000 MCMC draws, where the first 500 were discarded as burn-ins. Dashed lines denote prior distributions.