Linking multisectoral economic models and consumption surveys for the European Union

Multisectoral models usually have a single representative household. However, more diversity of household types is needed to analyse the effects of multiple phenomena (e.g. ageing, gender inequality, distributional income impact). Household consumption surveys’ microdata are a rich data source for these types of analysis. However, feeding multisectoral models with this type of information is not simple, and recent studies show how even slightly inaccurate procedures might result in significantly biased results. This paper presents the full procedure for feeding household consumption microdata into macroeconomic models and, for the first time, provides in a systematic way an estimation of the bridge matrices needed to link European Union Household Budget Surveys’ microdata with the most popular multi-regional input–output frameworks (e.g. Eurostat, WIOD, EORA, OECD).


Introduction
Most economic models, such as Input-Output (IO) models or Computable General Equilibrium (CGE) models, represent private consumption using a single representative agent. However, this assumption of the representative agent has been widely criticised (Brock & Durlauf, 2001; Hoover, 2008; Kirman, 1992; Lehtinen & Kuorikoski, 2007; Savard, 2004; Stiglitz, 2018) and the inclusion of heterogeneous household profiles in economic modelling is increasingly encouraged.
For example, the recommendations of Stiglitz et al. (2009) included '2. Emphasising the household perspective' and '4. Give more prominence to the distribution of income, consumption and wealth'. The G-20 'Data Gaps Initiative' also recommended establishing a link between National Accounts (NA) data and distributional information as a conceptual/statistical framework.
As Kim et al. (2015), among others, have shown, treating the household aggregate and its consumption structures in a heterogeneous way (differentiating household types) reveals important insights into structural change driven by socio-economic changes. There is a continuum between introducing many (hundreds of) household groups in a model and linking a multisectoral (IO or CGE) model with a micro-simulation model. Recently there has been some development in this area (see, among others, Colombo, 2008) and it has been shown how this approach outperforms the representative agent approach for many policy issues that involve income distribution (Savard, 2004).
Nevertheless, a large part of IO and CGE models do not take full advantage of the information available on consumption structures by household types in increasingly publicly available official data. Indeed, several factors contribute to this state of 'under-research'.
One important issue is that, in all cases, the structural information in household surveys (Household Budget Surveys, HBS) needs to be bridged to the consumption structure information in IO statistics, which is not straightforward given the lack of publicly available 'contingency matrices' (commonly known as 'bridge matrices') and the different valuation methods of the two datasets involved. Both issues complicate and hinder analyses of the implications of household heterogeneity for the effectiveness of public policies or the distributional effects of policies.
Contingency/bridge matrices are part of the national accounts (NA) but national statistical institutes (NSI) do not usually publish them. In consequence, most of the bridge matrices used in the literature are ad hoc estimates and the methods used for their construction are not always explained in depth. Among other works, some past examples are Kehoe et al. (1988a, 1988b) for the construction of social accounting matrices (SAMs) and CGE models, and Wier et al. (2001), Flores & Mainar (2009) and Steen-Olsen et al. (2016) for the analysis of environmental impacts. Serrano and Fernandez-Vázquez (2017) recently emphasised the importance of using accurate contingency matrices by showing how significantly consistency can affect the results in terms of impact analysis. Notice that in demand-driven models, the size of the impact is often more due to the size of the shock than to the technological structure (see for example Arto & Dietzenbacher, 2014). Therefore, an error in the size of the shock is especially relevant.
In addition, there are some accounting differences between the household expenditures reported in surveys and the consumption data in the IO framework. For example, expenditure data from surveys is reported in purchasers' prices while consumption data in IO tables is reported in basic prices. Also, the data have to be adjusted due to differences in the geographical scope (expenditure data follows the residence principle while IO data follows the territorial principle) and, when the IO tables are industry by industry, expenditure data have to be transformed from product to industry. Thus, linking the two datasets requires a number of transformations that are not always known by modellers and practitioners. Relevant steps to follow when linking consumption (e.g. household) surveys to multisectoral models were explained in Mongelli et al. (2010) and Min and Rao (2017).
In this context, this article introduces a systematic method for linking consumption data from the IO framework and expenditure surveys and provides estimates of the bridge matrices for the 28 countries that made up the European Union (EU-28) in 2016 (when the consumption survey microdata currently available was released).
The remainder of the article is organised as follows. Section 2 presents the data and methods for bridging information from consumption surveys and macroeconomic models, including a brief description of the different steps for linking the two datasets and the method for estimating the bridge matrices, which are a central element of the procedure. Section 3 shows the results of the estimation of contingency matrices for the EU-28 and tests the main assumptions. Section 4 concludes and discusses the results.

Methods
At first sight, linking expenditure data from surveys and consumption data from the IO framework could be seen as a simple problem that can be solved with a direct conversion of classifications and an adjustment method such as a RAS (this is represented by the dashed arrow in Figure 1). However, as we show in Section 2.1, this approach is completely wrong and results in biased outcomes. Producing results at the household level sounds very appealing; however, before starting the calculations, it is important to understand and apply the right method in order to produce robust results. As shown in the lower panel of Figure 1, the procedure involves a number of steps and relevant concepts of the National Accounts (NA). Therefore, the following steps should be considered before using consumption-survey-based profiles in macroeconomic models:
(1) Align consumption microdata to NA principles.
On the one hand, survey data does not follow the accounting principles of NA which are a compendium of multiple data sources. On the other hand, multisectoral models are mainly based on NA; thus, it is necessary to adapt the data from surveys to the national accounting principles to be able to use it correctly in macroeconomic modelling (more details of the reasons for this adaptation and the procedure are given in Section 2.1).
(2) Convert consumption microdata aligned to NA principles to production-based classifications.
Consumption data is usually available in classifications of expenditure according to purpose such as COICOP (Classification of Individual Consumption by Purpose). However, multisectoral multi-industry models follow a classification of products aligned with an industry classification such as CPA (Classification of Products by Activity, Eurostat, 2019a). Thus, it is necessary to bridge the two. For such bridging, estimations of bridge matrices based on public official data of COICOP vs CPA are used. The sub-steps of this central step are as follows:
(a) Prepare the available official contingency tables to be used as priors: they should have the proper aggregation and territorial to residential adjustment.
(b) Prepare the consumption data from the use tables to be used as input for the estimation process: this data should be in purchasers' prices and have the proper aggregation.
(c) Identify similar countries to select the most suitable proxies for the countries for which official contingency tables are not available, to be used as priors.
(d) Estimate the bridge tables consistently for each database, linking the consumption in the use tables and in the final consumption expenditure of households by consumption purpose (COICOP 3-digit level) (part of the National Accounts).
In Section 2.2 we introduce further details of this step and in Section 3 we show the results of the estimation procedure.
(3) Change the valuation of consumption microdata in NA principles and production-based classifications to basic prices (bp).
Consumption microdata and Household Final Consumption Expenditure (HFCE) in NA are in purchasers' prices (pp). However, multisectoral models usually work at basic prices (producer prices). The difference consists of the net taxes (i.e. taxes less subsidies) paid by the user but not received by the producer, and the trade and transportation margins, which need to be reallocated to the 'margin' industries (e.g. trade and transport industries). In Section 2.3, we explain how to convert the prices from 'pp' to 'bp'.
(4) Adjust the data from the product classification (i.e. CPA, see EC, 2008) to the industry classification (i.e. NACE) when the IO table is industry by industry, as explained in Section 2.4.
The remainder of this section explains the method.

Alignment of consumption microdata to NA principles
Multisectoral models such as IO and CGE models are based on Supply and Use tables and IO tables, which are a core element of the NA (see Eurostat, 2019b). Therefore, the accounting principles of NA are reflected in the data and consequently in the model. However, this is not the case for consumption surveys. Thus, it is crucial to align the accounting principles of all the datasets involved. Aggregate HFCE data of the whole economy is often reported by NSIs using COICOP. These data are part of the NA and their totals are consistent (excluding vintage issues) with the total HHFC in the CPA reported in the Supply and Use tables and IO tables and with other main aggregates of the NA, such as the split of the GDP from the expenditure side. However, the total HFCE resulting from summing up the HFCE of individual households reported in consumption surveys does not match the aggregate HFCE of NA, with differences ranging from 50% to 97% for EU Member States in 2010 (Eurostat, 2018a).
Furthermore, not only the aggregates, but, more importantly, the structures of the consumption surveys are inconsistent with the HFCE figures in NA (see Table 16 in Eurostat, 2015). The coverage rate of the consumption surveys with respect to the NA consumption varied across the different COICOP categories from 6% to 119% for EU member states in 2010 (Eurostat, 2018a). This is due to the fact that in the compilation of HFCE of NA, consumption surveys are a major data source but not the only one. Data from survey results is complemented with additional information such as tax statistics or transportation surveys to produce the HFCE of the whole economy. Amores (2018) details some of the sources of the differences such as conceptual and classification differences, measurement errors (e.g. under/over-estimation of some categories like alcohol, tobacco, housing, water, energy or food) and estimation errors (e.g. high-income households being underrepresented).
To make the microdata of the consumption surveys consistent with the HFCE in NA, we suggest the following procedure. First, for every COICOP category, we calculate the ratio between the HFCE in NA and the total for the whole population in the consumption survey. These ratios can be interpreted as scaling coefficients of the average representative household of the NA. Second, we use these coefficients to align the consumption profile extracted from the survey to the NA accounting standard. To do so, each category of the consumer profiles from the survey is scaled up or down by multiplying it by the coefficient of that category. The existence of unmatched COICOP categories between HFCE in NA and consumption surveys makes a stepwise procedure necessary. Further details on this procedure can be found in Section A3.1 in the Supplementary Material.
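The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the COICOP figures and the treatment of unmatched categories (left unscaled here) are hypothetical, and the stepwise procedure of Section A3.1 is not reproduced.

```python
def scaling_coefficients(na_hfce, survey_totals):
    """Ratio NA / survey total for every COICOP category found in both sources."""
    return {
        cat: na_hfce[cat] / survey_totals[cat]
        for cat in na_hfce
        if survey_totals.get(cat, 0.0) > 0.0
    }


def align_profile(profile, coeffs):
    """Scale a household profile category by category.

    Assumption: categories without a coefficient are left unchanged,
    which is a simplification of the stepwise treatment of unmatched categories.
    """
    return {cat: value * coeffs.get(cat, 1.0) for cat, value in profile.items()}


# Hypothetical totals (same monetary unit in both sources)
na_hfce = {"CP011": 120.0, "CP021": 30.0, "CP041": 200.0}
survey_totals = {"CP011": 100.0, "CP021": 12.0, "CP041": 160.0}
coeffs = scaling_coefficients(na_hfce, survey_totals)

# Hypothetical annual expenditure of one household type
profile = {"CP011": 4000.0, "CP021": 300.0, "CP041": 6000.0}
aligned = align_profile(profile, coeffs)
```

In this made-up example the survey covers only 40% of the NA value of CP021, so that category is multiplied by 2.5, while CP011 is multiplied by 1.2.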

Conversion from COICOP to CPA
Once the consumption data in COICOP has been adapted to the NA principles, they must be converted into CPA. This is done using the so-called contingency or bridge matrices. In the case of the European Union, NSIs build these bridge tables to compile NA. However, these matrices are not part of the datasets compulsorily submitted to Eurostat (2014) and NSIs do or do not publish them depending on different considerations (transparency, quality, confidentiality, resources, etc.). In the case of the European Union, only eight countries (Austria, Czechia, Denmark, Estonia, Finland, Slovakia, Sweden and the United Kingdom) make them available, and it is very unlikely that the majority of NSIs will publish such data in the short run.
In this context, we suggest a systematic procedure to estimate bridge matrices that we apply to the estimation of the 20 countries of the European Union for which data is not available. The starting points are the vector of consumption in CPA (64 products) and COICOP (47 categories) from the NA of Eurostat. In the cases in which we found differences between the two vectors, we preserve the vector of CPA, in order to keep it consistent with the IO tables. We combine this information with the set of eight available official contingency matrices for 2010 that will be used as a benchmark to estimate the matrices of the remaining 20 countries.
The contingency tables are matrices with a dimension of 64 (CPA products) x 47 (COICOP categories). The element $x_{i,j}$ of the matrix represents the total quantity of product i (e.g. chemical products) that is used for the purpose j (e.g. routine household maintenance); the sum row-wise gives the total HHFC of the use table in purchasers' prices (CPA) and the sum column-wise gives the total HFCE of the NA (COICOP).
One key element for the estimation of the contingency tables is the selection of the benchmark country whose structure will be used as a prior. In our procedure, we try to identify the most suitable proxies for the countries for which official contingency tables are not available by comparing a set of macro indicators with those of the eight potential benchmarks. In particular, we compare the structure of the HHFC from the use table in pp (CPA classification); the structure of the HFCE from the NA (COICOP classification); the GDP per capita as an indicator of development stage, and the sociocultural distance.
As Supplementary material, data for the 28 matrices can be found in Annex A1 (to use before Annex A2 with the transformation of data from purchaser's prices to basic prices), and further details on their estimates in Annex A3.

Transformation of data from purchaser's prices to basic prices
Once the data are matched with the proper classification (CPA vs COICOP) and with the accounting principles (NA vs surveys), the valuation of the data must be aligned. Definitions of types of prices can be found in the UN System of National Accounts 2008 and the European System of Accounts 2010 (UN, 2009 and EC, 2013 respectively). As shown in Figure 1, consumption data, both from surveys and from NA, is based on purchasers' prices. In contrast, the multisectoral models usually work with basic prices.
This step is required for two reasons. First, the vector of consumption in basic prices does not include net taxes on products and, in consequence, it is lower than the vector in pp. For example, just looking at the totals of households' consumption from the Eurostat use tables, the value in purchasers' prices is on average for the EU-28 around 13% higher than the one in basic prices, reaching in some cases more than 20%. Therefore, apart from all issues involving the structure of consumption, feeding a consumption profile from surveys directly into an IO table would on average overestimate the shock by around 13%. Second, the vector of consumption in basic prices records all trade and transportation margins associated with final consumption in the so-called margin industries, while in the vector of consumption in purchasers' prices (i.e. coming from consumption surveys) these margins are reported as part of the value of the product consumed. Thus, the direct link of consumption profiles in purchasers' prices to an IO table in basic prices would result in an underestimation of the impact linked to the demand for trade and transportation services (hidden in the purchasers' prices) and an overestimation in all other industries.
Data in purchasers' prices is transformed into basic prices using the following information: the vectors of HHFC in basic prices and purchasers' prices, the vectors of margins associated with HHFC from the table of trade and transportation margins, and the net taxes associated with HHFC from the table of taxes less subsidies on products. We use this information to calculate the implicit net tax rates and margin rates per product as described by Mongelli et al. (2010) who also explain how to use them, as well as the limitations of the approach.
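A minimal sketch of this conversion is given below. It assumes that the margins vector is positive for products that carry margins and negative for the margin products themselves, and that the margins stripped from a profile are reallocated to the margin products in the national proportions. The function names and all figures are hypothetical; the exact treatment and its limitations are those described by Mongelli et al. (2010), not this sketch.

```python
def implicit_rates(hhfc_pp, net_taxes, margins):
    """Implicit net tax rates, margin rates and reallocation shares per product.

    margins: positive for products carrying trade/transport margins,
    negative for the margin products themselves (entries sum to zero).
    """
    tax_rate = {p: net_taxes[p] / hhfc_pp[p] if hhfc_pp[p] else 0.0 for p in hhfc_pp}
    margin_rate = {
        p: max(margins[p], 0.0) / hhfc_pp[p] if hhfc_pp[p] else 0.0 for p in hhfc_pp
    }
    total_margins = sum(m for m in margins.values() if m > 0)
    # Share of the margin pool received by each margin product (assumption:
    # national proportions also hold for every household profile)
    realloc_share = {
        p: max(-margins[p], 0.0) / total_margins if total_margins else 0.0
        for p in hhfc_pp
    }
    return tax_rate, margin_rate, realloc_share


def to_basic_prices(profile_pp, tax_rate, margin_rate, realloc_share):
    """Remove net taxes and margins, then add the margins back to margin products."""
    stripped = {p: v * margin_rate[p] for p, v in profile_pp.items()}
    pool = sum(stripped.values())
    return {
        p: v * (1.0 - tax_rate[p]) - stripped[p] + realloc_share[p] * pool
        for p, v in profile_pp.items()
    }


# Hypothetical national vectors for three products
hhfc_pp = {"food": 100.0, "trade": 0.0, "transport": 20.0}
net_taxes = {"food": 10.0, "trade": 0.0, "transport": 2.0}
margins = {"food": 30.0, "trade": -25.0, "transport": -5.0}
rates = implicit_rates(hhfc_pp, net_taxes, margins)

# Hypothetical household profile in purchasers' prices (CPA, aligned to NA)
profile_pp = {"food": 1000.0, "trade": 0.0, "transport": 200.0}
profile_bp = to_basic_prices(profile_pp, *rates)
```

In this example the profile in basic prices adds up to 1080, that is, the 1200 in purchasers' prices minus 120 of net taxes; the 300 of margins embedded in food is moved to trade (250) and transport (50) rather than removed.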
The Supplementary material includes a tool to convert consumption profiles from HBS in purchasers' prices (already in CPA and aligned to NA) into basic prices (Annex A2) and further details (Annex A3.3).

Adaptation of the data to the type of IO table (products vs industries)
The final step consists of linking the consumption profiles adjusted to NA, in CPA and basic prices, to the IO tables. If the IO table is product by product, this can be done in a straightforward way. However, if the IO table or CGE model is industry by industry (e.g. WIOD or OECD-ICIO), an additional transformation is required in order to transform the profiles from products (CPA) to industries (NACE classification). This should be done following the same approach followed for transforming the supply-use table into the symmetrical IO table.
In the case of MRIO tables, developers often use the fixed product sales structure assumption (Model D, see Box 12.3 in UN, 2018). Therefore, if consumption profiles are to be applied to an MRIO, the recommendation is to use Model D for transforming the consumer profiles as well.
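As an illustration, under the fixed product sales structure assumption the transformation matrix is the market-share matrix derived from the supply (make) table, and a consumption profile by product is premultiplied by it. The sketch below assumes a make matrix arranged as industries by products; the function names and the numbers are hypothetical.

```python
def market_shares(make):
    """T[i][p] = make[i][p] / q[p], where q[p] is the total output of product p.

    make: list of rows, one per industry, each with one entry per product.
    """
    n_products = len(make[0])
    q = [sum(row[p] for row in make) for p in range(n_products)]
    return [
        [row[p] / q[p] if q[p] else 0.0 for p in range(n_products)]
        for row in make
    ]


def products_to_industries(profile_cpa, shares):
    """Consumption by industry = T times consumption by product."""
    return [sum(t * y for t, y in zip(row, profile_cpa)) for row in shares]


# Hypothetical make matrix: industry 0 produces both products, industry 1 only product 1
make = [
    [90.0, 10.0],
    [0.0, 50.0],
]
shares = market_shares(make)

# Hypothetical consumption profile by product, in basic prices
profile_cpa = [30.0, 60.0]
profile_industry = products_to_industries(profile_cpa, shares)
```

Because every column of the market-share matrix adds up to one, the total of the profile is preserved: 90 by product becomes 40 and 50 by industry.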
NSIs, in turn, often apply hybrid models (with extensive internal data) that cannot be exactly reproduced by practitioners. However, these hybrid models are often closer to the commodity technology assumption than to the alternative industry technology assumption. A drawback is that the commodity technology assumption tends to produce negatives that need to be removed (e.g. applying Almon, 1970).
The omission of this step can also generate important errors. As an example, we compared the vector of domestic demand of households in the International Use tables 2010 from WIOD (Timmer et al., 2015) with the WIOT 2010 (constructed following Model D, see UN, 2018) domestic households vector (Table 1). This gives us a measure of the size of the potential error if this step is omitted.
We calculated the relative differences between the two tables by product for each country, and looked at the median and the maximum across countries for each product (upper part of Table 1). The relative differences in each product can be large. The median across countries of the median across products is 9%; however, the median across countries of the maximum across products is 100%, and the relative difference can be as high as 3189% (Manufacture of coke and refined petroleum products in Croatia). Then, we took the median and maximum across the products of each country independently, which we summarised across countries (bottom part of Table 1). The differences are above 10% in half of the countries. To interpret these figures, note that products do not carry the same weight in the consumption basket: some are much more relevant than others. That is why we calculated the weighted average deviations for each country (WAPE), which can be very significant as well (9% median across countries, but up to 63% in Croatia and 58% in Luxembourg).

Results of estimation of contingency matrices for the EU-28
This section shows the results of the estimation of the contingency matrices for the EU-28. Because the method relies on using an official contingency matrix as the starting point for approximating the desired one, we first examine the similarities across the official contingency matrices that are available. Building on this comparison, we then identify benchmark tables and explain the criteria that may lead to good choices of benchmark. Finally, we test these assumptions and present the final results.

Similarities across official contingency matrices
The comparison of available official contingency matrices provides an idea of possible sources and sizes of errors when using the structure of the matrix of one country to estimate the matrix of a second country. In order to compare the similarity or difference among official contingency tables, we compare the structure of coefficients (or shares) of how each of the COICOP categories is distributed across CPA. We use the Weighted Average Percentage Error (WAPE) of the shares of each cell of the contingency matrix in the total of the objective country O with respect to those of a benchmark country B, WAPE_s O,B (Annex A4 presents alternative measures). The WAPE is defined as:
$$\mathrm{WAPE}_s^{O,B} = 100 \times \frac{\sum_{i}\sum_{j}\left|s_{ij}^{B}-s_{ij}^{O}\right|}{\sum_{i}\sum_{j}s_{ij}^{O}}$$
where $s_{ij}^{O}$ and $s_{ij}^{B}$ are the shares of cell (i, j) in the contingency matrices of the objective country O and the benchmark country B, respectively.
The results are shown in Table 2. The percentages indicate the global deviations that were obtained departing from the bridge matrix of a benchmark country (in columns) to estimate that of the objective country (in rows). In the last two columns we show the two countries that would lead to the best approximations (i.e. lowest WAPE_s O,B). The results show that Finland is the best benchmark in four out of eight cases, followed by Czechia and Sweden (two cases each). Finland is also the second-best benchmark in two cases.
Once we have compared the difference between the matrices, we will test the size of the potential errors that could stem from using a …
• 'Full naïve tables': the structure of the prior matrix is constructed using the marginal frequencies of the elements of the column (HHFC/CPA) and row (HFCE/COICOP) vectors. As we will see, this is obviously a very crude guess to start with, since it creates non-zero values in every cell as long as there is a value in the corresponding objective row and column totals.
• 'Non-zero naïve tables': the tables are constructed from the same marginal frequencies but only for the non-zero cells known from the official contingency table (hence this matrix does not generally achieve the objective column and row values). An alternative would have been constructing them based on the Eurostat (2018b) correspondence.
• Count-seed RAS: Cai and Vandyck (2020) recently estimated contingency tables for the European Union using the count-seed RAS method (Cai & Rueda-Cantuche, 2019), in which a single benchmark matrix is constructed by counting the number of items that simultaneously contribute to a given aggregate cell of the table in a disaggregated mapping between CPA and COICOP (3 000 CPA categories and more than 100 COICOP ones); then a RAS is used until convergence.
Note: … (2020), which is essentially the same as CPA, and with 35 (instead of 47) COICOP accounts. Their tables for these 8 countries are also compared to the aggregated official contingency ones. Source: Own elaboration.
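The two naïve priors can be illustrated with a small sketch. It assumes that 'marginal frequencies' means the product of the row and column totals divided by the grand total; the function names and the 2 x 3 example are hypothetical.

```python
def full_naive_prior(row_totals, col_totals):
    """Prior built only from the margins: cell (i, j) = row_i * col_j / grand total."""
    grand_total = sum(col_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def nonzero_naive_prior(row_totals, col_totals, official):
    """Same prior, kept only where the official table has a non-zero cell."""
    full = full_naive_prior(row_totals, col_totals)
    return [
        [f if o != 0 else 0.0 for f, o in zip(full_row, official_row)]
        for full_row, official_row in zip(full, official)
    ]


# Hypothetical margins (2 CPA products x 3 COICOP categories) and official pattern
row_totals = [60.0, 40.0]
col_totals = [50.0, 30.0, 20.0]
official = [
    [50.0, 10.0, 0.0],
    [0.0, 20.0, 20.0],
]

full = full_naive_prior(row_totals, col_totals)
nonzero = nonzero_naive_prior(row_totals, col_totals, official)
```

In this example the full naïve prior fills every cell, including the two that are zero in the official table, while the rows of the non-zero naïve prior add up to 48 and 20 instead of 60 and 40, so a balancing step is still needed.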
The WAPE_s O,B obtained are reported in Figure 2. The best results are obtained using the tables of the benchmark country as priors. In all the cases, this method shows the lowest figures, with an average WAPE_s O,B of 15%. The second-lowest average WAPE_s O,B is obtained from the 'most distant' method (30%) and from Cai and Vandyck (33%). Comparing these two methods, we observe that the 'most distant' results in a lower WAPE_s O,B in six out of the eight cases. Furthermore, this occurs in all the cases if we look at the values for the 'most distant except for Denmark (DK)' for all other countries as shown in the figure and in brackets in the table below.
Finally, the 'full naïve' method would result in a relatively poor approximation, with the WAPE_s O,B exceeding 170% in all cases. The test shows that the choice of the prior plays a crucial role when estimating the contingency tables. Furthermore, from Figure 2 we can conclude the following: The use of an official table as a benchmark for the estimation of contingency tables results in lower errors than any of the other methods tested. This is especially true if the benchmark table is the closest to the country objective.
The difference across the official matrix used is not very large, compared to the alternative methods. Therefore, using an official matrix (even selected at random) as the prior is better than using the count-seed RAS method or naïve priors. The use of naïve priors would introduce huge biases, even considering the non-zero structure; therefore, practitioners are strongly advised to avoid them.
In this section, we have identified for each country the closest benchmark by comparing the structure of its official table with the structure of the other seven official tables. The question now is how to identify the benchmark for the 20 countries for which the contingency tables are not available.

Identification of benchmarks
We have concluded in the previous section that it is crucial to use an official matrix as a prior although which one is selected is less critical. However, to make it as good as possible, one of the key elements of estimation of contingency tables is the identification of the countries with official contingency matrices that are the 'most similar' to those we want to estimate.
Based on the studies on the variables explaining consumption and, especially, the structure of consumption, we identify four indicators to assess the similarity in consumption across countries: the structure of the HHFC by CPA, the structure of the HFCE by COICOP (these two are the target row and column totals), the GDP per capita and the sociocultural distance.
As shown in Table 3, the first two indicators are the aforementioned HHFC and HFCE, based on their WAPE_s O,B for 2010. The third one is the Pearson correlation among countries of the 2005-2015 GDP per capita at current prices, as a proxy for the economic development stage. The sociocultural distance ranking departs from a geographical distance and a cultural distance metric (from Kaasa et al., 2016) but is ultimately defined by the authors on the basis of expertise. Further information on this metric is found in Annex A3.
For each of these four metrics we identify the closest country with an official contingency matrix. We use this information to identify the benchmark country. The results are reported in Table 3. For each criterion, we record not only the closest partner but also two additional ones. This allows the choice of a country that appears better ranked across the different criteria, even when it may not be the closest in some of the criteria, rather than a country that appears first in one or two criteria but far away in others. For these reasons, in some countries like Bulgaria or Romania, the choice does not appear in first position in any of the metrics considered.

Table 3 (excerpt). Columns: HHFC-CPA | HFCE-COICOP | GDP per capita | Sociocultural distance | Choice
SK Slovakia: SK (CZ) | SK (CZ) | SK (AT) | SK (CZ) | SK (CZ)
UK United Kingdom: UK (FI) | UK (FI) | UK (EE) | * | UK (FI)
Italics: Countries with official contingency table available. In these cases, the closest countries different from themselves are shown in brackets in the 'choice' column to test the selection criteria. Bold: Match between the closest country for the 4 criteria and the chosen one. * The sociocultural distance metric is not applied for cases in which it was considered from the quantitative metrics and authors' logic that the proximity was not sufficiently clear-cut. Source: Own elaboration.
The results of the benchmark selection (last column in Table 3, shown in Figure 3) show that Austria would be the best choice for the estimation of the contingency matrix of six of the missing countries, Czechia for five, Finland and Slovakia for three each, Estonia for two and the United Kingdom for one. In a few countries, in particular Lithuania and Poland (for which the closest countries are Estonia and Slovakia, respectively, for every metric), we find some general closeness with a particular country. For Latvia, as well, the closest country found is Estonia for three metrics. However, this is clearly not always the case; for other countries, the same country is the closest in two metrics at most. In those cases, we choose as a benchmark the country ranked the most times in the top three across all metrics. A rare case in this regard is Portugal, which, while closer to Austria in both the HHFC-CPA and HFCE-COICOP metrics, is far from it in GDP per capita, whereas Czechia is among the closest three countries in all metrics. Alternatively, although not provided in this article for simplicity (available upon request), we also estimated a set of bridge tables by choosing only the first criterion (closeness by CPA), and another set by choosing only the second criterion (closeness by COICOP).
[Figure 3: map of the benchmark country selected for each country (see Table 3).]
Note: Solid colours indicate countries whose data are used as priors for the benchmarked countries (patterned, with the same base colour). Source: Own elaboration using https://mapchart.net/.
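The counting rule described above (the country ranked the most times in the top three across all metrics) can be sketched as follows. The rankings are hypothetical and the tie-breaking rule (lower sum of rank positions) is an assumption of this sketch; the selection in Table 3 also relies on the authors' judgement, which is not captured here.

```python
from collections import Counter


def choose_benchmark(top_three_by_metric):
    """Pick the country that appears most often in the top three of all metrics."""
    counts = Counter()
    rank_sum = Counter()
    for ranking in top_three_by_metric.values():
        for position, country in enumerate(ranking, start=1):
            counts[country] += 1
            rank_sum[country] += position
    # Most appearances first; ties broken by the lower sum of positions (assumption)
    return min(counts, key=lambda c: (-counts[c], rank_sum[c]))


# Hypothetical top-three rankings for one country without an official table
top_three = {
    "HHFC-CPA": ["AT", "CZ", "SK"],
    "HFCE-COICOP": ["AT", "CZ", "FI"],
    "GDP per capita": ["CZ", "SK", "EE"],
    "Sociocultural distance": ["SK", "CZ", "EE"],
}
benchmark = choose_benchmark(top_three)
```

Here AT is first in two metrics but absent from the other two, whereas CZ appears in the top three of all four metrics and is therefore selected.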

Testing the assumptions
We tested the method for the eight countries for which contingency matrices are available. We estimated the contingency matrix taking as priors all the other available tables and applying a RAS of 100 iterations. Then we calculated the WAPE:
$$\mathrm{WAPE}_x^{O,B} = 100 \times \frac{\sum_{i}\sum_{j}\left|\hat{x}_{ij}^{O,B}-x_{ij}^{O}\right|}{\sum_{i}\sum_{j}x_{ij}^{O}}$$
where $x_{ij}^{O}$ represents the values of the official contingency matrix of the objective country O and $\hat{x}_{ij}^{O,B}$ are the values of the contingency matrix of the objective country O estimated taking as a prior the contingency matrix of the benchmark country B.
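The test can be sketched with a plain RAS (alternating row and column scaling) followed by the WAPE, assumed here to take its usual form: the sum of absolute deviations over the sum of the official values, in percent. The function names and the small matrices are hypothetical.

```python
def ras(prior, row_targets, col_targets, iterations=100):
    """Scale rows and columns of the prior in turn towards the target totals."""
    x = [row[:] for row in prior]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            row_sum = sum(x[i])
            if row_sum > 0:
                x[i] = [v * target / row_sum for v in x[i]]
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in x)
            if col_sum > 0:
                for row in x:
                    row[j] *= target / col_sum
    return x


def wape(estimate, official):
    """Weighted average percentage error of an estimate against the official table."""
    abs_error = sum(
        abs(e - o)
        for est_row, off_row in zip(estimate, official)
        for e, o in zip(est_row, off_row)
    )
    total = sum(abs(o) for off_row in official for o in off_row)
    return 100.0 * abs_error / total


# Hypothetical official table of the objective country O and prior from benchmark B
official = [
    [40.0, 20.0, 0.0],
    [10.0, 10.0, 20.0],
]
prior = [
    [30.0, 10.0, 0.0],
    [5.0, 15.0, 10.0],
]
row_targets = [sum(row) for row in official]        # CPA totals of O
col_targets = [sum(col) for col in zip(*official)]  # COICOP totals of O

estimate = ras(prior, row_targets, col_targets, iterations=100)
error = wape(estimate, official)
```

The estimate reproduces the margins of the objective country, but its cells still differ from the official ones because a RAS preserves the interaction structure of the prior; the WAPE measures that remaining gap.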
The resulting WAPE_x O,B are shown in Table 4, together with a column showing the closest country (lowest WAPE_x O,B ) and the benchmark country resulting from applying our selection criteria (last column in Table 4, and countries in brackets in the last column in Table 3). The results show that the prior tables that provide closest approximations to the official tables are essentially the same as those obtained with our method, except for Czechia and Finland. In these cases, we find that the WAPE_x O,B associated with the country selected following our method is very close to the lowest WAPE_x O,B : for Czechia, the WAPE_x O,B of the benchmark country according to Table 4 is 10 (Slovakia) and the lowest experimental WAPE is 8 (Finland); for Finland, the WAPE_x O,B of the benchmark country according to Table 4 is 10 (Sweden) and the lowest WAPE is 9 (Czechia).
Also, none of the individual criteria of Table 3 or alternative combinations of them would have led to any choice closer to the results from the WAPE_x O,B . This gives us confidence in the decision to use the choices of Table 3 based on a combined set of criteria. However, we do not claim infallibility, being aware of the fact that the availability of official contingency matrices may have some influence on these results, notably considering the high representation and matching of results from the Nordic countries. Given the similarities in the results among tables derived from different countries when no anomalies exist (e.g. some zeros that make it impossible to obtain others, especially in the totals of the benchmarks), we made more effort to make sure that none of these anomalies exist, in order to avoid infeasibilities and thus trying to obtain robust results.

The contingency tables of the European Union
Finally, as a result of this process, we have a set of 28 contingency tables for the European Union: 8 official and 20 estimated. All the tables match the vectors of consumption by CPA and COICOP in the NA. The estimated tables are obtained with a RAS procedure that uses as the prior the table of the country reported in the last column of Table 3. The tables for the countries where official tables are available are also benchmarked to the latest CPA/COICOP data in the NA in order to correct for possible inconsistencies (vintage data, currency differences and other country-specific issues⁵).
As indicated above, Annex A1 provides these contingency tables. The tables are provided at the Eurostat standard of NA for CPA (64 categories) and 3-digit level of COICOP (with 47 categories, from CP011 to CP127).
With simple aggregations, one can obtain the contingency tables from which to derive the bridge matrices at the level of the 56 CPA categories of the international WIOD tables (Release 2016) and at the level of the 61 categories of EU countries in EORA. The OECD aggregation in ISIC Rev. 4 (36 industries) can also be obtained by aggregation, given that the only industry that is more disaggregated in the OECD ICIO tables than in the Eurostat tables is CPA_B (mining), which is not directly consumed by households.
Similarly, COICOP categories of the contingency tables can be aggregated to accommodate them to the level of detail desired in the model used.
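As an illustration of these aggregations, the following Python sketch aggregates a contingency table with 0/1 concordance matrices and then derives a bridge matrix as column shares. The orientation (CPA products in rows, COICOP categories in columns), the function names and the toy 3x3 table are assumptions made for the example; the actual tables in Annex A1 have 64 CPA and 47 COICOP categories.

```python
import numpy as np

def aggregate(table, row_map, col_map):
    """Aggregate rows and columns with 0/1 concordance matrices.

    row_map: (aggregated rows x original rows)
    col_map: (original columns x aggregated columns)
    """
    return row_map @ table @ col_map

def bridge_from_contingency(table):
    """Share of each COICOP column allocated to each CPA row."""
    table = np.asarray(table, dtype=float)
    col_sums = table.sum(axis=0)
    # Columns with zero consumption keep a zero share instead of dividing by 0.
    return np.divide(table, col_sums,
                     out=np.zeros_like(table), where=col_sums > 0)

# Hypothetical contingency table: 3 CPA products x 3 COICOP categories.
table = np.array([[8.0, 0.0, 1.0],
                  [2.0, 3.0, 0.0],
                  [0.0, 7.0, 9.0]])

# Merge the first two CPA products into one; keep COICOP unchanged.
row_map = np.array([[1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
col_map = np.eye(3)

agg = aggregate(table, row_map, col_map)
bridge = bridge_from_contingency(agg)

# Convert a COICOP expenditure vector into the aggregated CPA classification.
coicop_expenditure = np.array([100.0, 50.0, 20.0])
cpa_expenditure = bridge @ coicop_expenditure
```

Aggregating the contingency table first and computing shares afterwards keeps the bridge consistent with the aggregated totals, whereas summing or averaging the shares of a detailed bridge directly would in general give different results.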

Conclusions
Using the detailed information contained in consumption surveys that is becoming increasingly publicly available opens up research avenues for better understanding changes in consumption structure, driven by individual behavioural changes and sociodemographic trends. Integrating that into economic modelling constitutes a major step forward in analysing how changes in consumption structures drive aggregate structural change in an economy.
The consistent introduction of information from household consumption surveys into the IO framework also opens up a whole area of research on the impact of household characteristics on economic and environmental variables. That includes issues like the various links (in both directions) between the labour market and income distribution, carbon footprints by income/age group and other household characteristics and many more.
However, the proper integration of the information of consumption surveys into the structure of economic models is not straightforward. The process requires a number of data manipulations that are not always well known by modellers, because they require highly specialised expertise in National Accounts and surveys. Furthermore, it also requires additional data in the form of contingency tables to derive the suitable bridge matrices, which in most cases are not publicly available. Indeed, although NSIs build these bridge tables to compile the National Accounts and the Supply-Use tables of the IO framework, it is very unlikely that the majority of them will publish this kind of data in the near future.
We briefly presented a method, which summarises the main tasks for making this link: (1) align survey data with National Accounts, (2) convert survey data from the expenditure/consumption classification (COICOP) to the product classification (CPA), and (3) change the valuation of consumption data from purchasers' prices to basic prices. All these tasks are required in order to make this link. However, they are often neglected, unknown or non-transparent in many of the works using data from consumption surveys.
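The three tasks listed above can be sketched as a short Python pipeline. This is a simplified illustration under stated assumptions, not the procedure of the paper: all numbers are invented, the alignment is a plain proportional scaling to NA totals, and the tax and margin shares by product are hypothetical inputs, with all margins reassigned to a single trade product.

```python
import numpy as np

# Hypothetical survey expenditure: 2 household groups x 2 COICOP categories.
survey = np.array([[30.0, 10.0],
                   [50.0, 30.0]])
na_coicop = np.array([100.0, 50.0])  # assumed NA totals by COICOP

# (1) Align survey data with National Accounts: scale each COICOP column
#     so that the sum over household groups matches the NA total.
aligned = survey * (na_coicop / survey.sum(axis=0))

# (2) Convert from COICOP to CPA with a bridge matrix
#     (3 CPA products x 2 COICOP categories; each column sums to 1).
bridge = np.array([[0.8, 0.0],
                   [0.0, 0.9],
                   [0.2, 0.1]])
cpa_pp = aligned @ bridge.T  # purchasers' prices, groups x CPA

# (3) Move from purchasers' prices to basic prices: remove taxes less
#     subsidies and reallocate margins to the trade product.
tax_share = np.array([0.1, 0.2, 0.0])     # assumed shares of purchasers' value
margin_share = np.array([0.2, 0.1, 0.0])  # trade product carries no margin
trade_index = 2                           # assumed position of the trade product

def to_basic_prices(cpa_pp, tax_share, margin_share, trade_index):
    taxes = cpa_pp * tax_share
    margins = cpa_pp * margin_share
    basic = cpa_pp - taxes - margins
    # Margins are output of the trade sector, so they are added back there.
    basic[:, trade_index] += margins.sum(axis=1)
    return basic, taxes

basic, taxes = to_basic_prices(cpa_pp, tax_share, margin_share, trade_index)
```

In this sketch, total expenditure of each household group is preserved across steps (2) and (3) once taxes are added back to the basic-price values, which is a useful consistency check in any real implementation.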
The method and the set of contingency tables presented in this paper aim to provide a comprehensive toolbox to make the link between consumption surveys and economic models in a rigorous manner. In this regard, we consider that the linking method and contingency tables presented in this work, together with the tools in the Annexes, can facilitate the use of microdata from consumption surveys in IO analysis and economic modelling.
The main contribution of this article, apart from clarifying the linking process, is the set of contingency tables that practitioners can use to derive the bridge matrices at their desired disaggregation for the 28 countries that made up the European Union in 2016 (when the consumption survey microdata currently available was released). These contingency tables constitute a key element for this data integration. As analysed by Serrano and Fernandez-Vázquez (2017), inaccurate bridges cause important biases in the analytical studies of consumption. In addition, as the proper estimation of these tables is very time-consuming and requires in-depth knowledge and expertise on National Accounts, this paper provides a ready-to-use database that will be useful for many practitioners.
We started from 8 official, publicly available contingency tables, rearranged them consistently to common classifications, and estimated the remaining 20 tables.
We also found that using an official table as a prior, even if randomly selected, outperforms recent methods such as the count-seed RAS (Vandyck, 2020 and Rueda-Cantuche, 2019) and significantly outperforms the naïve prior.
We also developed a method for identifying which of the seven selected available tables is the best to use as a prior to estimate the contingency table of each missing country, finally working with all seven of them as priors. We successfully tested the robustness of this benchmark identification method. All in all, as more or less summarised by a reviewer, what emerges from this work is that 'if you want to obtain a table of country "D", make a RAS of the indicated bridge matrix of country "A" and you will get a good approximation'. The limitation of the test is the concentration in northern European countries of the available official tables used as benchmarks. Having some benchmark contingency tables available for southern European countries would improve our estimates for this set of countries.
In addition to the limited availability of contingency matrices over time, the article has a geographical focus on the European Union, since it has microdata on consumption available for research purposes, which is not common worldwide. However, the method is general and valid whenever similar data exist and the targeted and benchmark countries are sufficiently similar.
Furthermore, we found stability in the choice of prior countries according to the structures of the target total vectors of the contingency matrices. Consequently, our estimates are valid for the European Union countries of the WIOD and OECD databases, and we consider the application of the method to other countries, as well as to relevant databases such as EXIOBASE or GTAP, worthy of further research. Finally, we recommend that future updates of all these databases also include information bridging the data on expenditure with classifications like COICOP. This should comprise not only the bridge between COICOP and the product classification (specific to the database), but also the survey to NA adaptation and the purchasers' prices to basic prices transformation.
It should be stressed that, due to the importance of these contingency tables for economic analysis, ideally this exercise should be carried out within official statistical programmes of NSIs. This would reduce the number of assumptions and increase the quality of the data. For example, at the level of Eurostat or the OECD, being able to gather official contingency matrices in a consistent manner, with different links to standard classifications and other parts of the National Accounts, would enormously reduce uncertainty, biases and assumptions made by researchers and users.