A calibrated gravity model of interregional trade

ABSTRACT Lack of interregional trade data is often a major obstacle when doing economic analysis at the subnational level. This paper discusses a calibration procedure for estimating bilateral trade between the regions of a country. Our approach can be equivalently characterized as an application of the gravity-RAS or the doubly constrained gravity model method. Either way, a crucial element is represented by the distance elasticity parameter, which the user is expected to provide exogenously. We propose a way of estimating that parameter using standard econometric methods with readily available data and demonstrate our calibration procedure in a case study of Italy.


INTRODUCTION
Modelling the differential impacts of economic policy on the regions of a country often requires data on how those regions interact with each other through trade. The fact that in most applied contexts such data do not exist has been a long-standing problem for regional analysts. Over decades, researchers' efforts to get around this obstacle have given rise to a large and varied body of literature (Miller & Blair, 2009). Even so, no consensus has emerged over a standard set of methods. As a result, how to overcome the lack of interregional trade data remains a lively area of investigation.
Recently, renewed consideration has been given to a family of approaches some refer to as 'gravity-RAS' (Cai, 2021; Distefano et al., 2020; Fournier Gabela, 2020; Wilson, 2016). Earlier examples of these methods can be found in Riddington et al. (2006) and Sargento et al. (2012). As the name suggests, gravity-RAS techniques seek to reconstruct the bilateral flows in an unobserved trade network using a gravity model in conjunction with the RAS algorithm. A mainstay of empirical trade analysis, the gravity model expresses bilateral trade between two regions as a function of their respective economic masses (e.g., gross domestic product, GDP) and distance (Baldwin & Taglioni, 2006). To increase realism, this basic model formulation is often augmented with a variety of additional explanatory variables. In the gravity-RAS method, a gravity equation is used to generate a preliminary estimate of an origin-destination trade matrix of interest on a product-by-product (or industry-by-industry) basis. This step of the procedure [...] their analysis, we propose that in the absence of better information, it may be viable to estimate the distance elasticity of trade within a country from data on trade between countries. Evidently, extrapolating distance elasticities from an international to a domestic setting must be done with caution. In the Anderson and Van Wincoop (AvW) model, distance acts as a proxy for trade costs. There is a risk that the elasticities obtained from international data end up embodying a variety of cost factors that are not as relevant in a subnational context (e.g., tariffs or costs stemming from the use of different languages, currencies and legal systems). To the extent that such effects can be partialled out, however, one can expect the international distance elasticities to approximate their interregional counterparts reasonably well.
In fact, this study is not the first to use international trade data to fill a gap in regional statistics (e.g., Jahn, 2017; Lahr et al., 2020). Still, we are not aware of any other attempts to do so within a comprehensive theoretical framework. While our analysis focuses on the case in which all regions form part of the same country, the approach is conceptually easy to extend to assemblages of regions located in different countries.
In addition to providing us with a way around the conundrum of estimating interregional distance elasticities without any interregional trade data, working with a structural gravity model helps us clarify how to handle intra-regional flows. Very often, analyses of interregional trade use different methods to estimate trade between regions and trade within regions (Fournier Gabela, 2020; Jahn, 2017; Sargento et al., 2012). In our approach, on the other hand, there is nothing about intra-regional trade that would warrant special treatment. In our view, this results in an estimation procedure that is both internally coherent and easy to implement.
Finally, we note that the AvW model on which this paper relies is only one of various microfoundations that the economic literature has proposed for the gravity equation. In principle, our calibration procedure can be easily adapted to work with alternative theoretical formulations. This could provide us with a way of exploiting data other than international trade statistics to parametrize our interregional model. Using the model of Chaney (2018), for example, one could conceivably work out the distance elasticities of interest from information on the distribution of firm sizes.
After a more formal description of our methodological framework (section 2), we demonstrate how the proposed gravity model calibration procedure plays out in an application to Italy (section 3). Our exercise contemplates 11 distinct product categories, each with a corresponding trade matrix to be estimated. Product by product, we econometrically obtain the relevant distance elasticity using data from the World Input-Output Database (WIOD) (Timmer et al., 2015) and use it, along with pairwise distances, to compute the starting values for RAS balancing. Finally, we attempt to validate our results against a competing set of estimates that we construct from the findings of a unique survey carried out by the Bank of Italy (section 4). Incidentally, Italy's most widely used supply-and-use tables rely on this very dataset for information on interregional trade (Cherubini & Paniccià, 2013; Paniccià & Rosignoli, 2018). Validation exercises of the kind conducted here are not common in this strand of methodological literature. Given the general unavailability of subnational trade data, new techniques are generally evaluated on international (Lahr et al., 2020; Lamonica & Chelli, 2018) or simulated data (Bonfiglio & Chelli, 2008; Cai, 2021). Also, performance is often assessed focusing on broader aggregates (e.g., input-output multipliers) rather than bilateral flows (Jahn et al., 2020; Rokicki et al., 2020).

METHODOLOGICAL FRAMEWORK
2.1. A standard gravity model set-up
Consider a country whose economy consists of $n > 1$ regions and $m$ products. Let $\mathbf{X}_k$ denote the $n \times n$ origin-destination matrix of bilateral trade in a generic product $k$. Its $(i, j)$ entry, $x_{ijk}$, represents exports of $k$ from region $i$ to region $j$. Summing along the $i$-th row gives region $i$'s overall output of $k$, that is, $y_{ik} = \sum_j x_{ijk}$. Conversely, the sum along the $j$-th column, $e_{jk} = \sum_i x_{ijk}$, is region $j$'s total expenditure on $k$. In vector form, the row and column totals of $\mathbf{X}_k$ will be denoted $\mathbf{y}_k = [y_{ik}]$ and $\mathbf{e}_k = [e_{jk}]$, respectively.
Interest lies in bilateral trade between regions, which unfortunately cannot be observed directly. Hence, our aim will be to estimate $\mathbf{X}_k$ for $k = 1, \ldots, m$. Here it is assumed that the $y$'s and $e$'s are all known. In applications, they might not be. In most cases, however, they can be estimated using standard methods and readily available data (e.g., Eding et al., 1999; Jackson, 1998; Kronenberg, 2009; Lahr, 2001; Madsen & Jensen-Butler, 2005; Többen & Kronenberg, 2015; Yamada, 2015). Knowledge of the row and column totals only places $2n - 1$ independent constraints on the $n^2$ unknown entries of $\mathbf{X}_k$ (one of the $2n$ adding-up conditions is redundant because the row totals and the column totals share the same grand total). To put more structure on the problem, we postulate that bilateral trade adheres to the gravity model of Anderson and Van Wincoop (2004). In that formulation, trade is separable and products are differentiated by region of production, with preferences over varieties taking the constant elasticity of substitution form. Then, $x_{ijk}$ can be written as $x_{ijk} = e_{jk} (p_{ijk}/\rho_{jk})^{1-\sigma_k}$. The notation is as follows: $p_{ijk}$ represents the price charged in region $j$ for product $k$ from region $i$; $\rho_{jk}$ is a consumer price index for product $k$ in region $j$; $\sigma_k$ denotes the elasticity of substitution among varieties of product $k$. The model posits that the cost of trade behaves as an ad-valorem tax. Specifically, it is assumed that $p_{ijk} = t_{ijk} p_{ik}$, where $p_{ik}$ represents the price received by region $i$ producers of $k$ and $t_{ijk} > 1$ captures the trade costs. If all product markets clear, then:
$$x_{ijk} = \frac{y_{ik}\, e_{jk}}{y_k} \left( \frac{t_{ijk}}{\pi_{ik}\, \rho_{jk}} \right)^{1-\sigma_k} \tag{1}$$
where $y_k = \sum_i y_{ik}$ is total product $k$ output and $\pi_{ik}$ is an index reflecting how costly it is to export product $k$ from region $i$. Note that as long as all variables have economically meaningful values, $x_{ijk}$ can only be zero if $y_{ik} = 0$ or $e_{jk} = 0$ (or both). In other words, a zero entry can only appear in $\mathbf{X}_k$ as a part of a row or a column of zeros.
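The count of independent constraints implied by the marginal totals can be checked numerically. The following is a minimal sketch, not part of the original analysis: it stacks the 2n adding-up conditions on the entries of an n × n matrix and computes their rank by exact elimination. The function names `margin_constraints` and `rank` are hypothetical.

```python
from fractions import Fraction

def margin_constraints(n):
    """Rows of the 2n x n^2 system linking the entries of X (row-major) to its margins."""
    rows = []
    for i in range(n):  # adding-up condition for the i-th row total
        rows.append([1 if r == i else 0 for r in range(n) for c in range(n)])
    for j in range(n):  # adding-up condition for the j-th column total
        rows.append([1 if c == j else 0 for r in range(n) for c in range(n)])
    return rows

def rank(matrix):
    """Rank by exact Gauss-Jordan elimination on rational numbers."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

n = 4
constraint_rank = rank(margin_constraints(n))   # independent constraints
free_dimensions = n * n - constraint_rank       # degrees of freedom left open
```

With four regions, the margins leave most of the matrix undetermined, which is why additional structure such as a gravity equation is needed.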

2.2. Calibration to the marginal totals
Given our goal of estimating $\mathbf{X}_k$, the main problem with equation (1) is that in a typical applied setting the trade cost factor $t_{ijk}$ is itself unknown. To get around this difficulty, we follow standard practice and express $t_{ijk}$ as a parametric function of observables. For simplicity, at this stage it is assumed that the cost factor is a log-linear function of only one variable, bilateral distance $d_{ij}$:
$$t_{ijk} = \gamma_k\, d_{ij}^{\delta_k} \tag{2}$$
with $d_{ij}$ known and strictly positive for all $i$ and $j$ (including when $i = j$). Substitution into (1) yields the following gravity equation:
$$x_{ijk} = \frac{\gamma_k^{1-\sigma_k}}{y_k} \cdot \frac{y_{ik}}{\pi_{ik}^{1-\sigma_k}} \cdot \frac{e_{jk}}{\rho_{jk}^{1-\sigma_k}} \cdot d_{ij}^{\theta_k} \tag{3}$$
where $\theta_k = (1 - \sigma_k)\delta_k$. The parameter $\theta_k$ represents the distance elasticity of trade for product $k$ and plays a central role in our estimation problem. If $\theta_k$ were known, then $\mathbf{X}_k$ could actually be solved for. This is perhaps easier to see when (3) is written in matrix notation. To this end, define $a_{ik} = (\gamma_k^{1-\sigma_k}/y_k)(y_{ik}/\pi_{ik}^{1-\sigma_k})$ and $b_{jk} = e_{jk}/\rho_{jk}^{1-\sigma_k}$, with $\mathbf{a}_k = [a_{ik}]$ and $\mathbf{b}_k = [b_{jk}]$ denoting the corresponding $n$-dimensional vectors. Also, let $\mathbf{F}_k = [d_{ij}^{\theta_k}]$ represent the $n \times n$ matrix of exponentiated bilateral distances. Then, the trade matrix for product $k$ can be written as:
$$\mathbf{X}_k = \langle \mathbf{a}_k \rangle\, \mathbf{F}_k\, \langle \mathbf{b}_k \rangle \tag{4}$$
Angled brackets are used to indicate a square matrix with the relevant vector placed along the main diagonal and the off-diagonal entries all set equal to zero. Given that $\mathbf{X}_k$ must match the known row totals:
$$\langle \mathbf{a}_k \rangle\, \mathbf{F}_k\, \langle \mathbf{b}_k \rangle\, \boldsymbol{\iota} = \mathbf{y}_k \tag{5}$$
where $\boldsymbol{\iota}$ represents a vector of ones with suitable length. Analogously, the column totals imply:
$$\boldsymbol{\iota}'\, \langle \mathbf{a}_k \rangle\, \mathbf{F}_k\, \langle \mathbf{b}_k \rangle = \mathbf{e}_k' \tag{6}$$
Clearly, if $\theta_k$ is known, $\mathbf{F}_k$ is also known. Then, the system (5-6) can be solved for the values of $\mathbf{a}_k$ and $\mathbf{b}_k$ (up to a multiplicative constant). Plugging those two vectors back into (4) gives $\mathbf{X}_k$. Effectively, $\mathbf{X}_k$ can be recovered by means of the RAS algorithm. Using RAS with $\mathbf{F}_k$ as the seed and the pair $(\mathbf{y}_k, \mathbf{e}_k)$ as the marginal totals yields $\mathbf{X}_k$ as the fitted matrix (Cai, 2021). The approach can be applied separately to each of the $m$ products in the economy. In applications, this procedure is not feasible because $\theta_k$ is unknown.
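The calibration step described above can be sketched in a few lines. The example below is illustrative only: the distances, the elasticity value and the marginal totals are hypothetical, and `ras` is a plain alternating row/column scaling routine written for this sketch rather than a reference implementation.

```python
def ras(seed, row_totals, col_totals, tol=1e-9, max_iter=10_000):
    """Scale a strictly positive seed matrix to given margins (biproportional fitting)."""
    n_rows, n_cols = len(seed), len(seed[0])
    x = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row so that it sums to its target.
        for i in range(n_rows):
            s = sum(x[i])
            x[i] = [v * row_totals[i] / s for v in x[i]]
        # Column step: rescale each column so that it sums to its target.
        col_sums = [sum(x[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            x[i] = [x[i][j] * col_totals[j] / col_sums[j] for j in range(n_cols)]
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(x[i]) - row_totals[i]) for i in range(n_rows))
        if gap < tol:
            break
    return x

# Hypothetical inputs for one product: distances, elasticity and consistent margins.
theta = -1.5
dist = [[50.0, 300.0, 600.0],
        [300.0, 60.0, 350.0],
        [600.0, 350.0, 80.0]]
y = [120.0, 80.0, 100.0]   # outputs (row totals)
e = [110.0, 90.0, 100.0]   # expenditures (column totals), same grand total as y

seed = [[d ** theta for d in row] for row in dist]  # exponentiated distances
x_hat = ras(seed, y, e)                             # calibrated trade matrix
```

The row and column scaling factors accumulated by the routine play the role of the two unknown vectors in the matrix form of the gravity equation, which is why they never need to be computed explicitly.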
A natural workaround is to replace $\theta_k$ with an estimate. Suppose that a distance elasticity estimate $\hat{\theta}_k$ is indeed available for product $k$. Then, the matrix $\hat{\mathbf{F}}_k = [d_{ij}^{\hat{\theta}_k}]$ can be used as the seed for RAS instead of $\mathbf{F}_k$ to produce an estimate, say, $\hat{\mathbf{X}}_k$, of the true trade matrix $\mathbf{X}_k$. Note that if $\hat{\theta}_k$ is unbiased for $\theta_k$, then $E(d_{ij}^{\hat{\theta}_k}) \approx d_{ij}^{\theta_k}$. This way of proceeding (i.e., fixing the value of the free parameter $\theta_k$ before solving for the remaining model parameters from a single observation on a benchmark equilibrium $(\mathbf{y}_k, \mathbf{e}_k)$) is reminiscent of the way many economic models (e.g., applied general equilibrium models) are operationalized in policy analysis practice (Sancho, 2009). It is in keeping with the terminology used in that strand of literature that we refer to our approach as 'calibration'. While calibration enables us to operate despite the very limited availability of data, the fundamentally deterministic nature of the framework makes it difficult to account for uncertainty. In our setting the gravity model is unlikely to hold in the exact form of equation (3); at best, it holds as an approximation. Furthermore, the elasticity estimate $\hat{\theta}_k$ is intrinsically uncertain and, if they have to be estimated, the row and column totals will be too. An analysis of how these sources of uncertainty may end up affecting the calibrated flows can be found in Cai (2021).

2.3. Empirical distance elasticities from international data
At this point, the question becomes how to obtain $\hat{\theta}_k$. Here, we will use an econometric approach. Like Anderson and Van Wincoop (2003), we modify equation (3) by moving $y_{ik}$ and $e_{jk}$ to the left-hand side, taking logs and including a stochastic error term $\varepsilon_{ijk}$. We will account for the indices $\pi_{ik}$ and $\rho_{jk}$ using fixed effects (Feenstra, 2002). Thus, our estimations will be based on the following empirical counterpart of (3):
$$\log z_{ijk} = \xi_k + \mu_{ik} + \nu_{jk} + \theta_k \log d_{ij} + \varepsilon_{ijk} \tag{7}$$
with $z_{ijk} = x_{ijk}/(y_{ik} e_{jk})$, $\xi_k = (1 - \sigma_k) \log \gamma_k - \log y_k$, $\mu_{ik} = (\sigma_k - 1) \log \pi_{ik}$ and $\nu_{jk} = (\sigma_k - 1) \log \rho_{jk}$.
Equations of this kind are commonplace in applied trade analysis and can be estimated using standard regression methods. In our setting, the challenge is once again given by the lack of data, as the left-hand side of (7) is inherently unobservable. In fact, if $z_{ijk}$ were known, there would be no need for an estimate of $\theta_k$ in the first place. To overcome the lack of regional-level data, we posit that our gravity model of trade applies both within and between countries. Accordingly, we will estimate equation (7) using bilateral trade data at the international level, of which there is no shortage. In other words, $\hat{\mathbf{X}}_k$ will be computed using a value of $\hat{\theta}_k$ extrapolated from international trade patterns.
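To illustrate how a fixed-effects regression of this kind recovers the distance elasticity, the sketch below simulates a small complete origin-destination panel and estimates the slope on log distance. It relies on the fact that, with a complete square panel, sweeping out origin and destination fixed effects is equivalent to two-way demeaning. All data are simulated, and the internal distance of 50 and the noise level are arbitrary assumptions.

```python
import math
import random

def fe_slope(log_z, log_d):
    """Slope on log distance after sweeping out origin and destination fixed effects.

    For a complete n x n panel, two-way demeaning is equivalent to including
    a full set of origin and destination dummies.
    """
    n = len(log_z)

    def demean(a):
        row = [sum(a[i]) / n for i in range(n)]
        col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
        grand = sum(row) / n
        return [[a[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

    zt, dt = demean(log_z), demean(log_d)
    num = sum(zt[i][j] * dt[i][j] for i in range(n) for j in range(n))
    den = sum(dt[i][j] ** 2 for i in range(n) for j in range(n))
    return num / den

# Simulated data (purely hypothetical): six economies placed on a 1000 x 1000 map.
rng = random.Random(0)
n, theta_true = 6, -1.5
points = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(n)]
log_d = [[math.log(math.dist(points[i], points[j]) if i != j else 50.0)
          for j in range(n)] for i in range(n)]
mu = [rng.gauss(0, 1) for _ in range(n)]   # origin fixed effects
nu = [rng.gauss(0, 1) for _ in range(n)]   # destination fixed effects
log_z = [[mu[i] + nu[j] + theta_true * log_d[i][j] + rng.gauss(0, 0.05)
          for j in range(n)] for i in range(n)]

theta_hat = fe_slope(log_z, log_d)  # should lie close to theta_true
```

In applied work one would normally use a regression package that also reports robust standard errors and handles incomplete panels; the demeaning shortcut is used here only to keep the example free of dependencies.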
The idea is spelled out in greater detail in the context of our case study (section 3.2). It is natural, however, to ask to what extent it is acceptable to model economic flows within countries on the basis of a distance elasticity that reflects flows between countries. Given that $\theta_k = (1 - \sigma_k)\delta_k$, the assumption that, for a given product $k$, the same value of $\theta_k$ applies to both intra- and international flows can be broken down into two parts. The first one is a requirement that the elasticity of substitution $\sigma_k$ is the same for all geographical entities. This is a standard regularity assumption that underlies a lot of empirical research (Anderson & Van Wincoop, 2004). The second part of the assumption is more problematic, as it demands that the sensitivity of trade costs to distance, as measured by $\delta_k$, be the same whether or not a country border is involved. Starting with McCallum (1995), a sizable body of literature has argued that there are significant trade frictions associated with national borders (due not only to tariffs, but also, for example, to differences in languages, business practices and regulatory frameworks). In that case, an estimate of $\theta_k$ obtained from international data might overstate how rapidly trade costs increase with distance in a domestic context. Concerns of this kind certainly have merit but can be greatly alleviated by adopting a less restrictive specification for the trade cost factor $t_{ijk}$. In the empirical section of the paper, for example, equation (2) is augmented with a border dummy and other explanatory variables.

EMPIRICAL APPLICATION
3.1. Case study overview
Assessing how closely a certain estimation strategy approximates actual interregional trade flows has generally proved difficult because suitable validation datasets are hard to come by. Here, we take advantage of information collected by the Bank of Italy in the 2009 Survey of Industrial and Service Firms (SISF). With annual periodicity, the SISF collects information on a variety of business topics from a stratified sample of Italian firms. In addition to a fixed component, the questionnaire has a variable part that covers different themes each year. Exceptionally, the 2009 wave asked firms with 50 employees or more about the geographical breakdown of their sales. These questions were introduced at the behest of Istituto Regionale Programmazione Economica Toscana (IRPET, the Regional Institute for Economic Planning of Tuscany), which has been using the resulting information in the construction of its multiregional supply and use system for Italy ever since (Cherubini & Paniccià, 2013; Paniccià & Rosignoli, 2018). Although not part of the official system of statistics, the IRPET tables are a recognized source of regional-level input-output data for Italy. Another input-output regionalization effort that relies on the 2009 SISF is that of Cai and Lorenzoni (2014), who attempt to improve on the spatial and industry resolution of the interregional trade data using a generalized least squares approach à la Chow and Lin (1971).
The interregional trade data available from the SISF are indeed aggregated quite heavily. In the spatial dimension, the survey divides Italy's territory into four macro-areas: Northwest, Northeast, Centre and South. Below, these geographical entities will be designated as 'regions'. They do not correspond, however, to Italy's administrative regions (of which there are twenty). In terms of economic activities, the survey distinguishes 11 broad aggregates, delimited on the basis of the NACE 1.1 classification. We will refer to those aggregates as 'products'. We do so for consistency with section 2, even though we recognize that in the national accounting and input-output literature they would be more commonly designated as branches of economic activity or industries. More information on geographical and product aggregation can be found in Appendix A in the supplemental data online.
Ultimately, the 2009 SISF yields a 4 × 4 origin-destination trade matrix for each of 11 products. Effectively, this gives us a set of benchmark estimates against which to evaluate the performance of our calibration procedure. Below, we will treat those benchmark matrices as if they were the 'true' trade matrices that we need to estimate. Accordingly, they will be referred to as the validation data. Appendix B in the supplemental data online examines the benchmark matrices in the context of Italy's official statistics and highlights some of their limitations as a validation dataset.

3.2. A calibration approach
The problem at the centre of our case study can be formalized as follows. Again, we will proceed one product at a time. As customary, we can represent international trade in generic product $k$ as an origin-destination matrix linking the $N$ countries that make up the world economy. For ease of exposition, we will assume that the main country of interest, Italy, is the $N$-th of those countries. In the trade matrix for product $k$, let us then blow up country $N$ into its $n$ constituent regions. Clearly, $n = 4$ in our case study. Bilateral trade in product $k$ can then be written in the form of a $(N + n - 1) \times (N + n - 1)$ partitioned matrix. With a slight change in notation from section 2, we will now use the symbol $\mathbf{X}_k$ to denote this partitioned matrix as a whole (as opposed to its intra-national component only):
$$\mathbf{X}_k = \begin{bmatrix} \mathbf{X}^{11}_k & \mathbf{X}^{12}_k \\ \mathbf{X}^{21}_k & \mathbf{X}^{22}_k \end{bmatrix}$$
The top-left block, $\mathbf{X}^{11}_k$, is a matrix of international flows that can actually be observed. In the bottom-right corner, $\mathbf{X}^{22}_k$ represents interregional trade among Italian regions. This matrix is unknown and it is our ultimate goal to estimate it. Finally, the off-diagonal blocks $\mathbf{X}^{12}_k$ and $\mathbf{X}^{21}_k$ account for the international imports and exports of Italy's regions, respectively. They are not objects of primary interest in this case study. Correspondingly, we can construct an analogously partitioned matrix of bilateral distances $\mathbf{D}$ in which the distances between Italy's regions are represented by the bottom-right block $\mathbf{D}^{22}$.
In line with the arguments presented in section 2, our approach to the problem of recovering $\mathbf{X}^{22}_k$ assumes that all the entries of $\mathbf{X}_k$ are described by the AvW model. In practical terms, this amounts to treating Italy's regions as if they were individual countries. Provided the appropriate explanatory variables are accounted for, this assumption does not seem especially problematic. On a theoretical level, there is indeed no consensus as to the appropriate level at which the gravity equation should hold (Redding & Weinstein, 2019). Empirically, gravity models have been used successfully at various scales, e.g., at the firm (Bas et al., 2017), sectoral (Head & Ries, 2001), regional (McCallum, 1995) and national levels (Redding & Venables, 2004). Furthermore, when it comes specifically to our case study, Italy's regional economies are comparable in size (e.g., GDP) to small or medium-sized EU countries. This applies to the four macro-regions considered in this case study, but is also true for the overwhelming majority of the 21 administrative regions of the country (i.e., at the NUTS-2 level) (see Figure C1 in the supplemental data online).
In this framework, the entries of $\mathbf{X}^{11}_k$ represent a set of observations on the left-hand side variable of the gravity equation and can be exploited to estimate the parameters of our model by standard econometric methods. To operationalize this approach, we will obviously need to be more explicit about several features of the model, e.g., what variables appear on the right-hand side, what sources of data to use and how to specify the error term. We deal with such empirical issues in section 3.3. Once parameter estimates are available, we can approximate $\mathbf{X}^{22}_k$ using our RAS-based calibration procedure. Even though the gravity model specification we will use in our case study is richer than equation (7), it will still be true that the only model coefficient for which we will actually need an estimate is the distance elasticity, $\hat{\theta}_k$ (section 3.4). We thus compute our seed matrix as $\hat{\mathbf{F}}^{22}_k = [(d^{22}_{ij})^{\hat{\theta}_k}]$. We will then use the RAS algorithm to scale $\hat{\mathbf{F}}^{22}_k$ to the row and column totals of $\mathbf{X}^{22}_k$. Appendix D in the supplemental data online presents a simplified numerical example of the entire procedure.
Clearly, for this approach to be feasible we need knowledge of the marginal totals of $\mathbf{X}^{22}_k$. In a typical application, estimates of those totals would have to be constructed separately. As an aside, that might require data on (or estimates of) the column totals of $\mathbf{X}^{12}_k$ (international imports) and the row totals of $\mathbf{X}^{21}_k$ (international exports). In this case study, however, the exceptional availability of the SISF data means that we can use the row and column totals of the benchmark trade matrix itself. This is not merely a matter of convenience. It also simplifies the task of assessing our distance elasticity estimate, as it removes inaccurate marginal totals as a potential source of discrepancy between the calibrated and the benchmark matrix.

3.3. Distance elasticity estimation
To estimate the distance elasticity coefficients, we fit a close relative of (7) to international trade data. As discussed, there is a risk that, unless they are properly accounted for, the additional trade costs associated with national borders may distort our distance elasticities. To control for this possibility, we modify equation (2) so that the cost of trading between $i$ and $j$ depends not only on distance $d_{ij}$, but also on a binary variable, $b_{ij}$, whose value is zero for $i = j$ and one otherwise. We further augment equation (2) with a common language indicator, $l_{ij}$, which takes the value one when $i$ and $j$ share an official language and zero in all other cases. Hence, the trade cost factors take the form:
$$t_{ijk} = \gamma_k\, d_{ij}^{\delta_k}\, \tau_k^{b_{ij}}\, \lambda_k^{l_{ij}} \tag{8}$$
Accordingly, (7) becomes:
$$\log z_{ijk} = \xi_k + \mu_{ik} + \nu_{jk} + \theta_k \log d_{ij} + \phi_k b_{ij} + \psi_k l_{ij} + \varepsilon_{ijk} \tag{9}$$
with $\phi_k = (1 - \sigma_k) \log \tau_k$ and $\psi_k = (1 - \sigma_k) \log \lambda_k$.
To estimate equation (9), we need data with at least some variation in $b_{ij}$. International trade databases contain information on flows between countries but not within countries. Therefore, we obtain the left-hand side for our regressions from the WIOD (Timmer et al., 2015). Constructed by harmonizing trade statistics and supply-use tables, a multi-country input-output framework such as the WIOD constitutes an internally coherent dataset of both intra- and international economic transactions. Specifically, we use the world input-output table for 2009 from the 2013 release of the WIOD. As in our validation dataset, economic activities are classified according to the NACE 1.1 scheme, although with finer detail. We aggregate the table to reproduce the 11 products observed in the SISF data. Product by product, we then add up over users to obtain the desired bilateral trade matrices, from which $z_{ijk}$ is straightforwardly computed. When it comes to the right-hand side of (9), data on bilateral distances and shared languages were extracted from the GeoDist database of the Centre d'Études Prospectives et d'Informations Internationales (CEPII) (Mayer & Zignago, 2011). Specifically, the distance measure we use is distwces, a population-weighted aggregate of bilateral distances between major cities. Because they use consistent definitions for both intra- and international links, GeoDist distances are the standard choice for applied work on border effects. Also, contrary to simpler alternatives (e.g., the distance between country centres), our distance specification does not imply that $d_{ij} = 0$ whenever $i = j$.
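The construction of the dependent variable can be sketched as follows. The nested-list layout used here (origin, destination, user) is a hypothetical simplification and does not reproduce the actual file structure of the WIOD; the numbers are made up.

```python
def bilateral_matrix(deliveries):
    """Sum deliveries[i][j][u] over users u to obtain the bilateral flow x[i][j]."""
    return [[sum(users) for users in dest] for dest in deliveries]

def scaled_flows(x):
    """Divide each flow by origin output and destination expenditure: z = x / (y * e)."""
    n = len(x)
    y = [sum(x[i]) for i in range(n)]                        # output of each origin
    e = [sum(x[i][j] for i in range(n)) for j in range(n)]   # expenditure of each destination
    return [[x[i][j] / (y[i] * e[j]) for j in range(n)] for i in range(n)]

# Hypothetical table for one product: 2 countries, 3 users per destination
# (e.g., two using industries and final demand). Diagonal blocks are domestic sales.
deliveries = [
    [[5.0, 3.0, 2.0], [1.0, 0.5, 0.5]],
    [[2.0, 1.0, 1.0], [6.0, 2.0, 2.0]],
]
x = bilateral_matrix(deliveries)
z = scaled_flows(x)
```

Because the table records domestic deliveries alongside cross-border ones, the resulting matrix has a populated diagonal, which is what gives the border dummy its variation.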

3.4. Bilateral trade calibration
Equipped with our dataset, we estimate equation (9) separately for each product. Once the distance elasticities have been obtained, the trade matrices can be calibrated using the procedure outlined in section 2.2. For this purpose, we use a measure of interregional distance that is entirely consistent with the specification used in the econometric estimations, that is, with GeoDist's distwces. Specifically, bilateral distances between Italian macro-regions are computed on the basis of population data and road distances between main urban areas (provincial capitals, 'capoluoghi di provincia').
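A population-weighted distance of this type can be sketched as below. To our understanding, GeoDist aggregates city-to-city distances with population-share weights in a CES-type mean, with an exponent of −1 for distwces; both that exponent and the treatment of a city's distance to itself (a fixed 10 km here) should be read as assumptions of this sketch, and the cities, populations and road distances are invented.

```python
def weighted_distance(cities_i, cities_j, city_dist, exponent=-1.0):
    """Population-weighted CES-type mean of city-to-city distances between two regions.

    cities_i, cities_j: dicts mapping city name -> population.
    city_dist: function (a, b) -> strictly positive distance.
    """
    pop_i = sum(cities_i.values())
    pop_j = sum(cities_j.values())
    total = 0.0
    for a, pa in cities_i.items():
        for b, pb in cities_j.items():
            total += (pa / pop_i) * (pb / pop_j) * city_dist(a, b) ** exponent
    return total ** (1.0 / exponent)

# Hypothetical road distances (km) and populations.
road_km = {("A1", "A2"): 80.0, ("A1", "B1"): 100.0, ("A2", "B1"): 200.0}
WITHIN_CITY_KM = 10.0  # assumption: positive placeholder for a city's distance to itself

def city_dist(a, b):
    if a == b:
        return WITHIN_CITY_KM
    return road_km[(a, b)] if (a, b) in road_km else road_km[(b, a)]

region_a = {"A1": 3.0, "A2": 1.0}
region_b = {"B1": 2.0}

d_ab = weighted_distance(region_a, region_b, city_dist)  # between regions
d_aa = weighted_distance(region_a, region_a, city_dist)  # within region A, strictly positive
```

The same function handles inter- and intra-regional pairs, so the diagonal of the distance matrix is obtained without any special treatment.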
Note that replacing the simple trade cost specification of equation (2) with the more flexible version of equation (8) does not affect our calibration procedure. In fact, allowing trade costs to depend on border and common language dummies does modify our gravity equation (3). Namely, it introduces an extra multiplicative term, $\eta_{ijk} = \tau_k^{(1-\sigma_k) b_{ij}} \lambda_k^{(1-\sigma_k) l_{ij}}$, on its right-hand side. Our estimation exercise, however, focuses on a single country. In the interregional trade matrix $\mathbf{X}^{22}_k$ we are trying to recover, all the entries are characterized by $b_{ij} = 0$. In addition, all regions share a common official language, so that $l_{ij} = 1$ for all the relevant $(i, j)$ pairs. As a result, those elements have $\eta_{ijk} = \lambda_k^{1-\sigma_k}$. For any given $k$, this term is merely a multiplicative constant and can be reabsorbed into either $a_{ik}$ or $b_{jk}$, indifferently, just like $(\gamma_k^{1-\sigma_k}/y_k)$. Looking back at the estimation of equation (9), the fact that our calibration exercise does not actually require an estimate of the border effect parameter has another interesting implication: it is after all not indispensable that the estimation dataset contain information on intra-national flows. Without variation in $b_{ij}$, we could indeed reabsorb the term $\phi_k b_{ij}$ into the intercept of the model and estimate $\theta_k$ from any standard source of international trade data. Here, we prefer to stick to the approach of section 3.2. There are applications in which the ability to explicitly model border effects would be desirable. For example, in Europe one often needs estimates of bilateral trade between regions located in different countries (Brandsma et al., 2015). This is a very natural extension of our calibration framework, but it is not pursued any further in this paper.
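The claim that a constant multiplicative term leaves the calibrated matrix unchanged is easy to verify numerically. In the sketch below, which uses hypothetical distances, margins and elasticity, the seed is multiplied by an arbitrary constant standing in for the common-language term, and both versions are passed through a simple fixed-iteration RAS routine written for this illustration.

```python
def ras(seed, row_totals, col_totals, iterations=500):
    """Alternately rescale rows and columns of a positive seed (fixed iteration count)."""
    x = [row[:] for row in seed]
    n, m = len(x), len(x[0])
    for _ in range(iterations):
        x = [[v * row_totals[i] / sum(x[i]) for v in x[i]] for i in range(n)]
        col = [sum(x[i][j] for i in range(n)) for j in range(m)]
        x = [[x[i][j] * col_totals[j] / col[j] for j in range(m)] for i in range(n)]
    return x

theta = -1.5  # hypothetical distance elasticity
dist = [[40.0, 250.0, 500.0], [250.0, 60.0, 300.0], [500.0, 300.0, 70.0]]
y, e = [90.0, 60.0, 50.0], [80.0, 70.0, 50.0]  # hypothetical margins, equal grand totals

seed = [[d ** theta for d in row] for row in dist]
constant = 0.37  # arbitrary stand-in for a term common to all origin-destination pairs
seed_scaled = [[constant * v for v in row] for row in seed]

x_a = ras(seed, y, e)
x_b = ras(seed_scaled, y, e)
max_gap = max(abs(a - b) for ra, rb in zip(x_a, x_b) for a, b in zip(ra, rb))
```

The two calibrated matrices coincide up to floating-point error, since the first row-scaling step already removes the constant.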
The reasoning on border effects extends naturally to any other sources of trade friction (tariff and nontariff barriers) that may be relevant in an international but not in a domestic context. As long as the bilateral flows we are trying to estimate take place within a single country (or free-trade area), the effect of those frictions does not vary across origin-destination pairs and can thus be left in the RAS scaling factors. By contrast, one may worry that neglecting tariff and other trade barriers in the preliminary econometric work on the distance elasticity might make $\hat{\theta}_k$ noisier and possibly biased. Such concerns, however, can be easily addressed by introducing the appropriate control variables or by carrying out the estimations on a smaller and more homogeneous sample of countries.
Finally, we ponder a question of accounting consistency. To what extent will our calibrated $\mathbf{X}^{22}_k$ matrix aggregate back to what is known about country-level trade flows? In this respect, the first thing to notice is that country-level data only place one constraint on the elements of $\mathbf{X}^{22}_k$, namely, that they add up to total intra-national trade in product $k$. Generally speaking, this constraint has little practical significance: because international trade statistics do not collect information on trade within countries, total intra-national trade in product $k$ is simply unknown. It is true, however, that certain datasets (notably, the WIOD data we use in our exercise) do include estimates of a country's trade with itself. In such a case, the entries of our calibrated $\mathbf{X}^{22}_k$ matrix will only add up to that estimate if the marginal totals used for RAS are consistent with it. In our application this will not happen, as the SISF data that provide the target margins for our RAS procedure inevitably fail to match the WIOD's intra-national trade figures for Italy. In our view, this is hardly a reason for concern. In a typical application, interregional trade is estimated as a component of an input-output (or social accounting) matrix. It is with the other parts of that matrix that our estimates must be consistent, not necessarily with the datasets used to estimate ancillary parameters like the distance elasticities. That kind of accounting consistency can naturally be enforced through the row and column totals we use to scale our seed matrices.

3.5. Evaluating accuracy
We then compare our calibrated trade matrix $\hat{\mathbf{X}}^{22}_k$ with the corresponding benchmark matrix computed from the SISF. With this aim, we use three summary indicators, which we compute separately for each of the 11 products. First, like several related studies (Cai & Rueda-Cantuche, 2019; Jackson & Murray, 2004; Sargento et al., 2012), we compute an overall discrepancy measure known as the standard total per cent error (STPE). Given a generic matrix $\mathbf{Z}$ and an estimate thereof $\hat{\mathbf{Z}}$, the STPE is given by $\sum_{i,j} |\hat{z}_{ij} - z_{ij}| / \sum_{i,j} z_{ij} \times 100$. In our case, given that by construction the entries of $\hat{\mathbf{X}}^{22}_k$ add up to the same grand total as those of the benchmark, the value of STPE has an intuitive interpretation: it is twice the share of total trade in product $k$ that the matrix $\hat{\mathbf{X}}^{22}_k$ allocates to an incorrect origin-destination pair. (Each unit of good $k$ that is attributed to a wrong $(i, j)$ link increases the numerator by 2.) Second, we compute the diagonal total per cent error (DTPE) associated with our estimates. The DTPE is defined as $(\sum_i \hat{z}_{ii} - \sum_i z_{ii}) / \sum_i z_{ii} \times 100$ and measures the tendency for the estimates to over- or under-state intra-regional trade. In many applications (e.g., most input-output regionalization exercises), interest lies primarily in estimating what share of a region's needs is sourced from the region itself, whereas breaking down interregional imports by origin is not a major preoccupation. Finally, our third indicator acknowledges the fact that the trade flows in the SISF validation dataset are themselves estimates: the entries of the benchmark matrix have in fact been constructed from a sample survey and are subject to a non-negligible amount of uncertainty. Thus, for our third discrepancy indicator what we extract from the SISF data is no longer point estimates of the bilateral flows, but rather 95% confidence intervals (CI95). We then ask how often our own estimate falls within that interval.
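The three indicators are straightforward to compute. The sketch below implements them for matrices stored as nested lists; the 2 × 2 benchmark, the estimate and the interval half-widths are made-up numbers chosen so that both matrices share the same margins.

```python
def stpe(est, true):
    """Standard total per cent error: sum of absolute deviations over the true grand total."""
    num = sum(abs(a - b) for re, rt in zip(est, true) for a, b in zip(re, rt))
    den = sum(b for rt in true for b in rt)
    return 100.0 * num / den

def dtpe(est, true):
    """Diagonal total per cent error: signed relative error on total intra-regional trade."""
    n = len(true)
    diag_est = sum(est[i][i] for i in range(n))
    diag_true = sum(true[i][i] for i in range(n))
    return 100.0 * (diag_est - diag_true) / diag_true

def share_in_ci(est, lower, upper):
    """Percentage of estimated entries falling inside the benchmark confidence intervals."""
    hits = [lo <= a <= hi
            for re, rl, ru in zip(est, lower, upper)
            for a, lo, hi in zip(re, rl, ru)]
    return 100.0 * sum(hits) / len(hits)

# Hypothetical benchmark and calibrated matrices with identical row and column totals.
true = [[60.0, 40.0], [30.0, 70.0]]
est = [[55.0, 45.0], [35.0, 65.0]]
half_width = [[8.0, 4.0], [4.0, 8.0]]  # hypothetical CI half-widths
lower = [[t - h for t, h in zip(rt, rh)] for rt, rh in zip(true, half_width)]
upper = [[t + h for t, h in zip(rt, rh)] for rt, rh in zip(true, half_width)]

stpe_value = stpe(est, true)              # 10 units misallocated out of 200, counted twice
dtpe_value = dtpe(est, true)              # negative: intra-regional trade is understated
coverage = share_in_ci(est, lower, upper)
```

In this toy case 5% of total trade is assigned to the wrong origin-destination pair, and the STPE is exactly twice that share, in line with the interpretation given above.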

Distance elasticity estimation
Our attempt to reconstruct the Italian interregional trade network begins by estimating a full set of distance elasticities for the 11 product categories of interest. To this end, we fit equation (9) to our WIOD-based estimation dataset using the fixed-effects estimator on a product-by-product basis. The sample consists of 27 European countries: pre-Brexit EU member states excluding Croatia, for which data are unavailable. Having limited our consideration to countries that form part of the European Single Market, we do not need to worry about how a variety of tariff and nontariff barriers to trade might affect our estimates.
The main results are displayed in Table 1. As can be verified from the number of observations in each regression, zero trade flows are not a major concern at this degree of commodity aggregation: having dropped all the zero flows from the sample, the observation count for every single product departs little if at all from 729, the number of entries that make up a 27 × 27 origin-destination matrix. In terms of the $R^2$ coefficient, our empirical model fits the data reasonably well for all products. The estimated distance elasticities, which represent the main object of interest, are all highly significant and have the appropriate (negative) sign. Ranging from −1.92 to −1.17, the estimates display non-negligible product-to-product variation. In absolute value, we find systematically larger coefficients for manufacturing (i.e., products numbered 1-7) than for services (products 8-11). Taken together, the estimates are in line with the economy-wide value of −1.55 used by Jahn (2017). Such results are evidence that interregional trade might decay more steeply with distance than postulated by several related studies, which have often proceeded from the working assumption that the distance elasticity would be −1 (Distefano et al., 2020; Fournier Gabela, 2020; Sargento et al., 2012). Finally, we turn to the coefficients on the border and common language dummies. With few exceptions, they have the expected sign and are statistically significant at conventional significance levels both individually and jointly. For reasons of space, the corresponding estimates are not displayed in Table 1; see Appendix E in the supplemental data online.

Table 1. Note: Estimates obtained by fitting equation (9) to EU trade data on a product-by-product basis. In each case, columns Observations and $R^2$ report the size of the sample and the coefficient of determination, respectively. The two rightmost columns show the estimated distance elasticity and the associated robust standard error. For additional results, see Appendix E in the supplemental data online.

Table 2. Note: Compared are the calibrated and benchmark trade matrices in two scenarios: in panel (a), calibration uses the distance elasticity estimates of Table 1; panel (b) relies on the alternative sets of elasticities introduced in section 4.3. The indicators STPE (standard total per cent error), DTPE (diagonal total per cent error) and % in CI95 (95% confidence interval) are defined in section 3.5. All calculations are carried out one product at a time.

Calibrated versus survey-based trade matrices
Product by product, we apply the three-step procedure of section 3.3 using the elasticities in Table 1 as an input. In passing, it is worth noting that all bilateral distances in the Italian data fall within the range spanned by the WIOD-based estimation dataset. Finally, each calibrated trade matrix is compared with its counterpart from the SISF by means of the summary indicators introduced in section 3.5. The results are displayed in Table 2 (panel a).
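As a rough illustration of what calibrating one product's trade matrix involves, the sketch below implements a generic gravity-RAS routine: a gravity equation provides an initial matrix, which the RAS algorithm then rescales until it matches given row and column totals. This is not necessarily identical to the three-step procedure of section 3.3, which is not reproduced here. The function name `gravity_ras`, the three-region margins and the distances are hypothetical.

```python
import numpy as np

def gravity_ras(supply, demand, dist, theta, tol=1e-10, max_iter=1000):
    """Gravity seed followed by RAS balancing.

    supply : row totals (regional sales of the product)
    demand : column totals (regional purchases), same grand total as supply
    dist   : matrix of bilateral distances, with positive internal distances
    theta  : distance elasticity (negative)
    """
    supply = np.asarray(supply, float)
    demand = np.asarray(demand, float)
    # Gravity seed: origin mass * destination mass * distance ** theta.
    x = np.outer(supply, demand) * np.asarray(dist, float) ** theta
    for _ in range(max_iter):
        x *= (supply / x.sum(axis=1))[:, None]   # rescale rows
        x *= (demand / x.sum(axis=0))[None, :]   # rescale columns
        if np.abs(x.sum(axis=1) - supply).max() < tol:
            break
    return x

# Hypothetical three-region example (made-up numbers).
supply = np.array([100.0, 60.0, 40.0])
demand = np.array([80.0, 70.0, 50.0])
dist = np.array([[50.0, 300.0, 600.0],
                 [300.0, 40.0, 350.0],
                 [600.0, 350.0, 60.0]])

x_cal = gravity_ras(supply, demand, dist, theta=-1.5)
print(np.round(x_cal, 2))
```

Since the balancing step rescales rows and columns, any multiplicative constant in the gravity seed is immaterial; only the distance elasticity and the margins shape the result.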
Our measure of overall dissimilarity, the STPE, varies between 15% and 35%. The largest value is observed for Hospitality, an aggregate for which the SISF has known coverage issues (see Appendix B in the supplemental data online). For the median product, the STPE is 22%. Such values of the STPE compare reasonably well with those reported in similar exercises by Sargento et al. (2012) and Fournier Gabela (2020), but it should be kept in mind that those studies differ from ours in a number of important methodological aspects (e.g., number of regions, degree of product aggregation, treatment of intra-regional flows).
Naturally, summary measures like the STPE conceal any systematic pattern of error. Hence, Figure 1 presents a series of product-by-product scatterplots of the calibrated against the SISF-based bilateral flows. As can be seen, it is predominantly smaller flows (often those originating from the South and Centre regions) that are affected by sizable relative errors. Conversely, larger flows are generally estimated more precisely.
While this is encouraging, other indicators in Table 2 cast doubt on the accuracy of the calibrated matrices. For the median product, the bilateral flows produced by our procedure fall within the 95% confidence interval computed from the SISF only 62% of the time. For some products (Food and Hospitality), the share is merely half. Furthermore, the DTPE, which focuses on the elements along the main diagonal, indicates significant amounts of error in our intra-regional trade estimates. In this respect, it is useful to make a distinction between manufacturing (products 1-7) and services (products 8-11). In the case of manufactured products, the DTPE shows a general tendency to overestimate intra-regional trade. Whenever that happens, the constraints given by the row and column totals imply that interregional trade must be correspondingly underestimated. In the most severe case (Food), total within-region trade is overstated by as much as one fourth. If we limit our consideration to products 1-6, every single intra-regional link but one (Metalwork sales within the Northeast) turns out to be overestimated. Product 7 (Other industry) represents the only exception to this pattern of systematic overestimation. Overall, these results suggest that for virtually all manufactured products the distance elasticities we have used to calibrate the model may be too high (in absolute value). Indeed, specifying too steep a distance decay process exaggerates the frictions between faraway locations while playing down those between nearby locations. Naturally, this inflates all intra-regional flows, which in general take place over comparatively short distances.
The case of services, on the other hand, is symmetric to that of manufacturing. The DTPE is negative for all products and all intra-regional flows are underestimated, which seemingly implies that the distance elasticity we extrapolated from the WIOD is lower than appropriate (again, in absolute value). As a matter of fact, this is consistent with IRPET's finding that in the SISF interregional trade data distance decay is steeper for services than for manufacturing (Cherubini & Paniccià, 2013; Paniccià & Rosignoli, 2018).
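The link between the sign of the DTPE and a misspecified distance elasticity can be checked in a small numerical experiment. In the sketch below, which uses made-up margins and distances for three hypothetical regions, a benchmark matrix is generated with an elasticity of −1.5 and the model is then recalibrated with flatter and steeper values. The helper `balanced_gravity` is a bare-bones doubly constrained gravity routine written for this illustration only.

```python
import numpy as np

def balanced_gravity(supply, demand, dist, theta, n_iter=500):
    """Gravity seed rescaled to the given row and column totals."""
    x = np.outer(supply, demand) * dist ** theta
    for _ in range(n_iter):
        x *= (supply / x.sum(axis=1))[:, None]
        x *= (demand / x.sum(axis=0))[None, :]
    return x

# Made-up margins and distances; internal distances lie on the diagonal.
supply = np.array([100.0, 60.0, 40.0])
demand = np.array([80.0, 70.0, 50.0])
dist = np.array([[50.0, 300.0, 600.0],
                 [300.0, 40.0, 350.0],
                 [600.0, 350.0, 60.0]])

# Pretend the "true" elasticity is -1.5 and build the benchmark from it.
benchmark = balanced_gravity(supply, demand, dist, theta=-1.5)

dtpe_by_theta = {}
for theta in (-1.0, -1.5, -2.0):
    est = balanced_gravity(supply, demand, dist, theta)
    dtpe_by_theta[theta] = (
        100.0 * (np.trace(est) - np.trace(benchmark)) / np.trace(benchmark)
    )
    print(f"theta = {theta:+.1f}  DTPE = {dtpe_by_theta[theta]:+.2f}%")
```

In this set-up an elasticity that is too steep (−2) yields a positive DTPE, the pattern reported above for manufacturing, while one that is too flat (−1) yields a negative DTPE, as observed for services.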

An alternative set of elasticities
Given that the calibrated trade matrices obtained using the results of Table 1 are not entirely satisfactory, we experiment with an alternative set of elasticities. We repeat our calibration exercise setting $\hat{u}_k = -1$ for all manufactured products and $\hat{u}_k = -2$ for all services. These somewhat arbitrary assumptions are motivated by our earlier finding that the distance decay processes implied by our econometric estimates are quite likely too steep for manufacturing and too flat for services. As an aside, Table 1 shows that the WIOD data confidently reject these parametric assumptions.
The new set of calibrated matrices is compared with the SISF data in Table 2 (panel b). Relative to Table 2 (panel a), dissimilarity metrics generally improve. The STPE decreases for virtually all products and its median value drops to 15%. The only exception is represented by the Other industry aggregate. In the same spirit, there is a marked increase in the share of trade flows that fall within the 95% confidence interval from the SISF, which for the median product is now 88%. When it comes to intra-regional trade, the DTPE occasionally flips sign but generally indicates more moderate amounts of error. Finally, as we move from panel (a) to panel (b) of Table 2, the improvement in accuracy tends to be more pronounced for manufactured products than for services.

Discussion
The results of our case study can be looked at in two ways. On one level, they represent a test of the gravity-RAS approach to interregional trade estimation. In this sense, Table 2 (panel b) is evidence that, provided the user-supplied distance elasticity is chosen appropriately, a simple implementation of the method is capable of recovering an unknown trade matrix with a degree of accuracy suitable for most policy analysis applications. On another level, our empirical exercise is an experiment in estimating the crucial distance elasticity parameter from international trade statistics. From this point of view, our results are mixed. On one hand, the calibration approach outlined in section 2 has proved to be practically feasible and easy to operationalize. On the other, we find that the distance elasticities calculated from the WIOD are often some way off from those implicit in the SISF data. As a result, our calibrated trade matrices reproduce the benchmark trade flows with a certain amount of bias. Arguably, adopting a richer gravity model specification than the relatively simple set-up we have used here could improve the accuracy of our econometric estimates. It is far from obvious, however, that it would also move them closer to the distance elasticities embodied in the benchmark data. After all, our estimates are broadly consistent with the findings of similar analyses (Jahn, 2017) and with the empirical gravity model literature (Disdier & Head, 2008; Head & Mayer, 2014).
But why would the SISF trade matrices be characterized by distance elasticities that are systematically different from those of their WIOD counterparts? We find it difficult to believe that this might be due to fundamental differences in economic behaviour between the inter- and the intra-national level. More likely, the reason should be sought in the design of the SISF. One possible explanation, for instance, has to do with the fact that the SISF population covers only comparatively large firms. In fact, the subsample for which interregional trade data are available is delimited even more restrictively. Assuming that it is predominantly large firms that engage in long-distance trade, with small firms mostly focused on short-distance transactions, trade will appear to decline less rapidly with distance if the dataset considers only large firms than if it covers the total economy (Chaney, 2018). Indeed, for virtually all manufactured products we do find evidence that the distance elasticity we calculated from the WIOD is systematically larger (in absolute value) than that embodied in the benchmark tables. This result does not hold up for services. In our analysis, however, services are often problematic in other ways as well. In the case of Hospitality, for example, the descriptive analysis of Appendix B in the supplemental data online suggests that what is covered by the SISF is only a narrow segment of the market. For Trade, on the other hand, the relationship between the SISF and the WIOD is complicated by accounting differences. While the SISF collects data on sales, the national accounts on which the WIOD is based report the value of the service provided by the vendor (e.g., sales minus the cost of the goods purchased for resale). More generally, the international trade data available to the compilers of the WIOD were of far lower quality for services than for manufacturing (Dietzenbacher et al., 2013) and it is possible that this is reflected in our econometric elasticity estimates.
On the whole, the characteristics of the SISF give us reason to question whether the benchmark matrices provide an entirely faithful representation of interregional trade patterns in Italy's total economy. 2 In other words, the distance elasticities that emerge from the validation data are likely to be themselves distorted representations of those that characterize the Italian economy as a whole. In this sense, the finding that the elasticities computed from the WIOD often represent fairly inaccurate approximations of those embodied in the benchmark trade matrices does not necessarily constitute damning evidence against the calibration procedure put forth in this paper. Overall, as a test of whether interregional trade flows for a national economy can be reliably predicted using distance elasticities computed from international trade statistics, our case study, though instructive, seems rather inconclusive.

CONCLUSIONS
A common problem for regional analysts is lack of data on the patterns of trade in goods and services within the borders of a country. This paper has considered a simple calibration procedure that can be used to reconstruct an interregional trade network from statistical information that is usually available. In terms of the existing literature, our approach can be characterized equivalently as an application of the gravity-RAS or of the doubly constrained gravity model method. Although easy to implement, these techniques rely on a user-supplied (and product-specific) estimate of the distance elasticity of interregional trade. Severe misspecification of the distance elasticity potentially biases the estimates quite significantly. Yet, there is no standard accepted way of determining the value of this parameter in applications.
A way around the problem of the missing elasticities, we have argued, emerges naturally once we consider the microeconomic foundations of the gravity equation. Most applications in the area of interregional trade estimation have used the gravity equation merely as a robust empirical regularity. Our approach, on the other hand, is to work with a gravity equation derived from microeconomic principles, the AvW model. Doing so gives us an econometric framework for estimating the distance elasticity of trade between the regions of the same country using data on trade between different countries, while controlling for potentially confounding factors such as border effects. Unlike data on interregional trade, which are extremely difficult to come by, international trade statistics are easily available with quite detailed levels of commodity disaggregation.
The empirical part of the paper is concerned with a case study of Italy. It demonstrates the operation of the calibration procedure and attempts to validate its results against survey data. To estimate the necessary distance elasticities, we have used bilateral trade data extracted from the WIOD world input-output table. The validation dataset, on the other hand, has been constructed using information on interregional trade collected as a one-off by a Bank of Italy survey of manufacturing and service firms. The same data have already been used extensively in the construction of subnational input-output accounts (Cherubini & Paniccià, 2013; Paniccià & Rosignoli, 2018). Our results show that gravity model calibration is indeed easy to operationalize and capable of reproducing the validation data with an encouraging degree of accuracy. Curiously, calibration tended to produce more accurate trade estimates when the distance elasticity parameter was set on the basis of a naive guess (namely, −1 for manufacturing and −2 for services) than when the WIOD-based econometric estimates were used. We are reluctant to interpret this finding as evidence of a fundamental flaw in our approach to distance elasticity estimation, as it is apparent that the limitations of the validation dataset do play a role in the analysis. Indeed, having been assembled from a survey that focuses on comparatively large firms, our benchmark matrices seem to offer only a partial view of Italy's interregional patterns of trade. Although very rare, survey-based interregional input-output databases should be less susceptible to the kind of difficulties we have encountered working with raw survey data. Like other techniques before it (Fournier Gabela, 2020; Jahn et al., 2020), the gravity model calibration approach could in the future also be tested on such data. In doing so, however, it is important to keep in mind that the interregional trade data contained in subnational input-output tables are often heavily modelled, in many cases using a gravity framework akin to the one we are attempting to assess (Yamada, 2015; Zheng et al., 2021).
It is worth noting that, even though our case study has only considered regions that form part of the same country, the calibration procedure described in this paper could be easily generalized to reconstruct trade networks that stretch across national borders. Another extension possibly worth exploring concerns the use of alternative data sources to estimate the distance elasticities of the model. Economists have identified various causal mechanisms through which gravity-like trade patterns may arise. Some of those (e.g., Chaney, 2018) may suggest ways to calibrate our model from data other than international trade statistics (e.g., from data on firm size distribution). Finally, gravity model calibration is easy enough to implement that it represents an alternative to popular nonsurvey input-output regionalization techniques, such as location quotients (Flegg et al., 1995; Flegg & Tohmo, 2013, 2016; Kowalewski, 2015) or CHARM (Kronenberg, 2009; Többen & Kronenberg, 2015). Given a complete set of calibrated trade matrices, computing the trade coefficients for a single- or multiregional input-output model is indeed immediate (Miller & Blair, 2009). In fact, the calibrated trade flows could even be embedded in a fully balanced multiregional supply and use system (Boero et al., 2018; Temursho et al., 2021).