Explaining value chain differences in MRIO databases through structural path decomposition

ABSTRACT Many multiregional input–output (MRIO) databases are used to calculate consumption-based accounts. Results feature in climate policy discussion on emissions reduction responsibilities; yet studies show that outcomes produced by each database differ. This paper compares the emissions associated with value chains from Eora, EXIOBASE, GTAP and WIOD. Structural path analysis identifies the largest paths in each database and the differences in common paths are calculated. For the top 100 value chain paths that contain the largest difference, structural path decomposition is used to identify the contribution each part of the value chain makes towards the difference. The results identify and quantify key flows that are the cause of difference in the databases. From these, we can conclude that key MRIO database construction decisions, such as using the residence or territorial principle for emissions allocation and whether energy spends are reallocated based on physical data, are the major causes of differences.


Introduction
Understanding how to develop policy to mitigate greenhouse gas (GHG) emissions has become difficult in an increasingly globalised world. To appreciate the role of trade in terms of emissions, calculations involving multiregional input-output (MRIO) databases have become the dominant and most progressive method. These databases centre on the evaluation and manipulation of trade flows between regions and industrial sectors, using a flow matrix approach. For example, the flow of steel from steel production into car manufacturing is associated with the CO2 consequent upon that use, allowing the full supply chain emissions of cars to be calculated. The number and types of policy applications of MRIO suggested by both academics and policy makers is growing exponentially. There are a number of leading databases available and each database produces a different set of consumption-based accounts (CBA). Each country's difference from the multi-model mean CBA is shown in Figure 1. However, there has been little appreciation as to why they produce different results. Owen et al. (2014) used structural decomposition techniques to attribute the difference in CBA calculated by the Eora (Lenzen et al., 2013), GTAP and WIOD (Dietzenbacher et al., 2013) MRIO databases to the component parts of the environmentally extended Leontief equation. This paper investigates the causes of model difference, and the resulting effect on output, in more depth by considering differences within individual value chains. For this study, we also include the EXIOBASE MRIO database (Tukker et al., 2013; Wood et al., 2014).
If policy is to be developed for tackling consumption-based emissions, and this policy addresses the goods and services consumed, then clearly understanding the reliability of the emission estimates of groups of goods and services is paramount. One way to understand the reliability is to focus on the consistency of supply chain impacts across the models. Hence the first aim of this paper is to use structural path analysis (SPA) to find, for each database pairing, the paired value chains that exhibit the largest differences. For example, the value chain that describes the emissions associated with the electricity used to make steel that ends up in cars bought by German consumers might not be the largest path in calculating the CBA for Germany using Eora or WIOD. However, when the size of this particular path is compared between the two database calculations, it might have a large difference. Once the one hundred largest path differences (PD) are calculated for every common country, for each database pairing, the second aim is to use decomposition techniques to determine which part of the value chain is responsible for the highest portion of the difference. This technique has become known as structural path decomposition (SPD) (Wood and Lenzen, 2009).

Structural path analysis
SPA is a technique that decomposes a consumption-based account to the sum of an infinite number of production chains, sometimes called paths. Wood and Lenzen (2003, p. 371) describe this process as 'unravelling the Leontief inverse using its series expansion'. The SPA technique was first described by Defourny and Thorbecke (1984) and Crama et al. (1984). SPA can be used to find those production chains that contribute most to a particular CBA. Paths are categorised according to their length. For example, a zero-th order path represents an industry's direct on-site emissions arising from final demand of the product produced by that particular industry. This could be the emissions from transport in providing a transport service. A first order path has one further step in the chain: for example, the emissions from the steel production that is used to make cars for final demand. Most SPA studies rank these chains in order of their importance. Because there are an infinite number of paths of decreasing importance that sum to the total CBA, most authors will display the top 20 or so chains. Writing in 2006, Peters and Hertwich state that there are very few I-O studies that apply SPA and that hybrid Life Cycle Assessment techniques are a more popular method employed to consider production chains. By 2016, this is still the case: SPA methods remain relatively underexposed, particularly amongst research using MRIO databases. Wood and Lenzen (2003) use SPA and a 1995 I-O database for Australia to compare the CBA of two Australian research institutions. Their analysis reveals a large proportion of the two institutions' ecological footprint impacts occurring upstream in first or second order paths. Using the same database, Lenzen (2003) furthers this work to analyse the Australian economy as a whole and considers CBAs calculated using energy, land, water, GHG, NOx and SO2 emissions.
Lenzen (2003) demonstrates that when considering energy and emissions rather than land use, the zero-th order paths dominate the rankings. The reason for this is that direct land use only applies to a few industrial sectors. A production chain has to start with one of these sectors to show as having significant impact. This means that chains will often have to be at least first order to link to the land-using sectors. By contrast, a wider proportion of industrial sectors have significant direct emissions, meaning that many zero-th order paths will be significant. The advantage of an emissions-based study is that the largest paths will be relatively short and quick to find during the SPA procedure. Both Lenzen's (2003) and Peters and Hertwich's (2006) analyses, of Australia and Norway, respectively, find that zero-th order paths involving electricity, metals, chemicals and transport services are significant.
SPA can also be used to consider 'sustainable chain management'. Foran et al. (2006) explain that by identifying all the processes in the production of a product, it is possible to decide whether to focus on improving direct onsite production processes or the indirect contributions further along the supply chain. Foran et al.'s (2006) study was based on a request by the Australian government to look at the Triple-Bottom-Line performance of industry sectors. Treloar (1997) uses SPA to investigate the embodied energy paths of the Australian residential buildings sector. Based on his investigation, Treloar (1997) concludes that SPA techniques can improve the completeness of process-based analysis of product impacts and suggests such a hybrid technique. However, taking a process-based life-cycle approach and combining it with I-O data to account for the missing information will result in the double-counting of certain paths. The identification and correction of double-counted paths has been the subject of much debate between Strømman et al. (2009), Crawford (2009) and Strømman (2009). Acquaye et al. (2011) use this hybrid methodology, specifically combining a UK focused two-region MRIO database with process LCA data, to consider the upstream paths that contribute to the production of biofuels. The authors discuss how SPA has been used in this case to identify carbon 'hot spots', or rather the highest carbon intensity path of the upstream supply chain for biodiesel.
Other methodological enhancements include work by Sonis et al. (1997, p. 278) whose block SPA technique identifies 'paths of influence' and can reveal the 'finer structure of economies' using a macro level approach rather than the micro level of each individual path. And, beyond the field of CBA, SPA has also been suggested by Suh (2005) and then implemented by Lenzen (2007) to explore and understand relationships between components of an ecosystem. Suh (2005, p. 256), however, warns that applying ideas from the field of economics to ecology might not always be appropriate but notes that the I-O system is an 'efficient way of presenting . . . data for a network structure'.

Structural path decomposition
SPD was developed by Wood and Lenzen (2009) as a combination of decomposition analyses and SPA. Wood and Lenzen (2009) use SPD to understand changes in a production chain between two points in time. Where decomposition analyses assign proportions of the difference in CBA to elements in the environmentally extended Leontief input-output equation, SPD assigns difference proportions to elements in a product's supply chain. For example, the largest difference in a chain between times t0 and t1 could occur in a zero-th order path, such as the onsite emissions from producing electricity for final demand, or in a first order path, such as the emissions from livestock that are used to make food products for final demand. In addition to identifying the chains that contribute most to the difference, SPD identifies which part of the chain has the highest difference associated with it. For example, in the first order path representing the livestock emissions associated with final demand for food, the difference between this path in t0 and t1 can be shared between the three parts of the chain: the emissions intensity of livestock production; the amount of livestock needed to make a food product; and the amount of food product bought by final demand consumers. Wood and Lenzen (2009) use the logarithmic mean divisia index (LMDI) decomposition technique for the SPD methodology and apply it to Australian I-O tables for 1995 and 2005. They find that between 1995 and 2005, the largest changes in emissions production paths involved livestock and electricity. The element most responsible for difference tends to be either a change in level of domestic final demand or a change in level of demand for export.
Since Wood and Lenzen's (2009) initial paper, there have been very few applications of the technique in the literature. Oshita (2012) uses SPD to look at changes in CO2 emissions in Japanese supply chains between 1990 and 2000 and Gui et al. (2014) consider changes in CO2 emissions in Chinese supply chains between 1992 and 2007. Both examples use SPD to explain a change in emissions over time but, rather than use the LMDI technique, both Oshita (2012) and Gui et al. (2014) opt for polar decompositions.
Clearly, there is an opportunity for SPD techniques to be applied to different MRIO systems rather than different time frames. The work presented in this paper may be the first application of SPD for this use. There is also an option to use the Dietzenbacher and Los (D&L) (Dietzenbacher and Los, 1998) or Shapley-Sun (S-S) (Sun, 1998) decomposition technique within the SPD calculations, both of which are considered more suitable than polar decompositions (de Boer, 2009).

Data and methods
As explained in the paper by Owen et al. (2014), in order to compare Eora, EXIOBASE, GTAP and WIOD, the databases need to be the same size and have the same structure. This means that each of the databases must be mapped on to a common classification (CC) containing a common set of regions and sectors, presented in the same order, format and currency. This paper adopts the same CC system, of 41 regions and 17 sectors, used in Owen et al. (2014) and Steen-Olsen et al. (2014) and the reader is directed to these papers for more information as to how the aggregation system was generated. This study uses a CC that is slightly different to those described in Owen et al. (2014) and Steen-Olsen et al. (2014) in that a symmetric input-output table (SIOT) structure is preferred to the supply and use table (SUT) format. Finding structural paths with MRIO databases in SUT format adds complexity to the task of uncovering database differences, so the aggregated databases were converted into an industry-by-industry SIOT format using the fixed product sales structure assumption. This paper uses data from the year 2007 since this is the one year common to all four databases. In their comparison of MRIO databases, Moran and Wood (2014) use a harmonised CO2 emissions vector. For this study we take the CO2 emissions data provided by each database supplier to allow for investigation into whether the emissions or economic data is more responsible for difference in value chains. We calculate the largest paths in the following databases:
• Eora CC: Eora mapped to the common classification
• EXIOBASE CC: EXIOBASE mapped to the common classification
• GTAP CC: GTAP mapped to the common classification
• WIOD CC: WIOD mapped to the common classification
We then compare the same paths in the corresponding databases (based on the same aggregations) to find the top 100 paths for each country with the largest path difference.

SPD equations used
A series expansion is used to calculate the largest paths in each database:

$$q = \sum_{i} e_i y_i + \sum_{i,j} e_i a_{ij} y_j + \sum_{i,j,k} e_i a_{ik} a_{kj} y_j + \sum_{i,j,k,l} e_i a_{il} a_{lk} a_{kj} y_j + \cdots \tag{1}$$

adapted from Wood and Lenzen (2003). Here $q$ is the total consumption-based emissions, $e$ is the emissions intensity vector, $A$ is the direct requirements matrix and $y$ is the vector of total final demand for the specific country. $i$, $j$, $k$ and $l$ are component sectors. A first order path from sector $i$ into sector $j$ is calculated by $e_i a_{ij} y_j$. A second order path from sector $i$ via sector $k$ into sector $j$ is calculated by $e_i a_{ik} a_{kj} y_j$ and so on (Peters and Hertwich, 2006). For the SPD, rather than find the path difference associated with the elements from $e$ and $A$, it was thought more useful to consider that $e$ is constructed from the emissions vector $f$ divided by total output $x$, and that each element $a_{ij}$ of $A$ is the corresponding element $z_{ij}$ of the transactions matrix $Z$ divided by the corresponding column sum, that is, the total output element $x_j$.
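The series expansion can be illustrated with a minimal sketch. The two-sector economy below is entirely hypothetical (the sector labels, `e`, `A` and `y` are invented for illustration and are not taken from any of the four databases). The sketch enumerates every path up to a chosen order, ranks the paths by size, and checks that the truncated series approaches the total obtained from the Leontief inverse.

```python
from itertools import product

# Hypothetical two-sector economy; all numbers are illustrative only.
sectors = ["ELGW", "MANF"]
e = [0.9, 0.2]              # emissions intensity (emissions per unit of output)
A = [[0.10, 0.30],          # A[i][j]: input from sector i per unit output of sector j
     [0.05, 0.20]]
y = [50.0, 100.0]           # final demand by sector


def path_value(path):
    """Value of one path (i, k, ..., j): e_i * a_ik * ... * a_kj * y_j."""
    value = e[path[0]]
    for src, dst in zip(path, path[1:]):
        value *= A[src][dst]
    return value * y[path[-1]]


def enumerate_paths(max_order):
    """Return all (path, value) pairs up to max_order, largest value first."""
    n = len(sectors)
    paths = []
    for order in range(max_order + 1):
        for path in product(range(n), repeat=order + 1):
            paths.append((path, path_value(path)))
    return sorted(paths, key=lambda p: p[1], reverse=True)


def leontief_total():
    """Exact q = e (I - A)^-1 y for the 2x2 case, using the closed-form inverse."""
    a, b = 1 - A[0][0], -A[0][1]
    c, d = -A[1][0], 1 - A[1][1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(e[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))


ranked = enumerate_paths(max_order=12)
for path, value in ranked[:5]:
    label = " -> ".join(sectors[s] for s in path)
    print(f"order {len(path) - 1}: {label:<24} {value:8.3f}")

series_sum = sum(value for _, value in ranked)
print(f"truncated series: {series_sum:.4f}, Leontief total: {leontief_total():.4f}")
```

Exhaustive enumeration is only workable for a toy system because the number of paths grows as the number of sectors raised to the path order. For full-size MRIO tables some form of pruning would be needed; that is an implementation choice not described in the text.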
Substituting $e_i = f_i x_i^{-1}$ and $a_{ij} = z_{ij} x_j^{-1}$, zero-th, first, second and third order value chains can be characterised thus:

$$f_i x_i^{-1} y_i \tag{2}$$
$$f_i x_i^{-1} z_{ij} x_j^{-1} y_j \tag{3}$$
$$f_i x_i^{-1} z_{ik} x_k^{-1} z_{kj} x_j^{-1} y_j \tag{4}$$
$$f_i x_i^{-1} z_{il} x_l^{-1} z_{lk} x_k^{-1} z_{kj} x_j^{-1} y_j \tag{5}$$

The difference can now be interpreted to consider the effect that the emissions vector has on its own rather than being combined with the effect of total output. However, we must be aware that decompositions will sometimes show very large positive effects for $f$ and negative effects for $x^{-1}$ which, in effect, cancel each other. In these cases, we must ensure that we consider the effect of the other factors, even though their net effects are smaller. It is also easier to interpret the difference between individual elements in $Z$ than in $A$, where they are intrinsically linked to the remainder of the items in the column because each item shows the proportion of the column sum. Dietzenbacher and Los (2000) warn that decomposition analyses need to be treated with care due to the dependency problem. A decomposition equation assumes that each term is independent of each other term. However, the authors point out in their example that 'changes in intermediate input coefficient and in value added coefficient affect each other' (Dietzenbacher and Los, 2000, p. 4). Decomposition of consumption-based emissions often requires the calculation of the emissions per unit of output. It is not appropriate to assume that a change in emissions efficiency can occur independently of the technology matrix used to calculate the Leontief inverse. A solution to the dependency problem is suggested by Dietzenbacher and Los (2000) but most decomposition studies do not address it. In fact, few, with the exception of Hoekstra and van den Bergh (2002) and Minx et al. (2011), mention the issue. The equations presented above split emissions efficiency into the component parts $f$ and $x^{-1}$, which removes the efficiency vector from the equation. This amendment does not follow the form proposed by Dietzenbacher and Los (2000) for cases with dependent determinants.
Moreover, by introducing $Z$ and $x^{-1}$ as a substitute for $A$, the dependency issue remains since $x$ is directly dependent on $Z$. There is no simple way of amending the terms to create independence and we highlight that the dependency issue is problematic for all decompositions that assess changes in emissions and energy (Minx et al., 2011).
Splitting $e$ into the separate elements $f$ and $x^{-1}$, and splitting $A$ into $Z$ and $x^{-1}$, means that paths of zero-th order now contain three elements rather than two. Fourth order paths, which can still give large emissions values, now contain eleven elements rather than six. The D&L (Dietzenbacher and Los, 1998) structural decomposition approach, used in Owen et al. (2014), is too complex for an eleven element comparison. The S-S (Sun, 1998) approach is instead used to decompose the difference in paths to each element in the value chain equation. S-S is equivalent to the mean effect calculated by D&L but it does not provide the full range of equivalent decompositions. This means that we cannot comment on the variation associated with the contribution each term makes to the difference. The general format for PD for paths of zero-th to third order value chains is shown in Equations 6-9, respectively.
For the general case $x = y_1 y_2 \ldots y_n$, the general format for the S-S decomposition equation is:
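A minimal sketch of how such a decomposition can be computed for a product of n factors is given here. It is not the authors' implementation: it assumes the Shapley-value reading of S-S, in which each factor is credited with its average marginal effect over every order in which the factors can be switched from one database to the other. The helper names `path_factors` and `shapley_sun` and all data values are hypothetical. As in the sentence above, the factors are generic terms, not the final demand vector.

```python
from itertools import combinations
from math import factorial, prod


def path_factors(path, f, x, Z, y):
    """List the 2n+3 elements of an order-n path (i, ..., j):
    f_i, 1/x_i, then (z, 1/x) for each step in the chain, then y_j."""
    factors = [f[path[0]], 1.0 / x[path[0]]]
    for src, dst in zip(path, path[1:]):
        factors += [Z[src][dst], 1.0 / x[dst]]
    factors.append(y[path[-1]])
    return factors


def shapley_sun(a, b):
    """Contribution of each factor to prod(a) - prod(b).

    Factor i receives its Shapley-weighted average marginal effect over all
    subsets of the other factors that have already been switched from b to a.
    """
    n = len(a)
    effects = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for switched in combinations(others, size):
                rest = prod(a[k] if k in switched else b[k] for k in others)
                total += weight * (a[i] - b[i]) * rest
        effects.append(total)
    return effects


# Hypothetical two-sector data for two databases, "a" and "b".
f_a, x_a, y_a = [120.0, 40.0], [300.0, 500.0], [60.0, 200.0]
Z_a = [[30.0, 90.0], [20.0, 50.0]]
f_b, x_b, y_b = [100.0, 45.0], [280.0, 520.0], [70.0, 180.0]
Z_b = [[25.0, 140.0], [22.0, 55.0]]

path = (0, 1)  # first order path: sector 0 -> sector 1 -> final demand
fa = path_factors(path, f_a, x_a, Z_a, y_a)
fb = path_factors(path, f_b, x_b, Z_b, y_b)
effects = shapley_sun(fa, fb)

for label, effect in zip(["f", "x^-1", "z", "x^-1", "y"], effects):
    print(f"{label:>5}: {effect:+.4f}")
print(f"path difference: {prod(fa) - prod(fb):+.4f}")
```

The effects sum to the path difference by construction, so nothing is left as an unexplained residual. The cost grows as 2^(n-1) subsets per factor, which remains manageable for the eleven elements of a fourth order path.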

Structural path analysis
To illustrate the results produced by a SPA we first use the example of the UK value chains from the GTAP CC and WIOD CC databases. Table 1 shows the top 20 value chains from GTAP. The largest path in the aggregated GTAP database for the UK is the path representing the emissions from GBR electricity, gas and water supply that go directly to the final demand for that product. This path represents 11.1% of the total CBA for the UK. All of the paths in the top 20 are either zero-th or first order paths. This fits with the findings of Lenzen (2003) who suggests that for SPA using energy and emissions data, most of the large paths are zero-th and first order. The top 20 represents 42% of the overall footprint. Paths originating from the electricity, gas and water supply industry contribute a significant portion of the largest paths. These sectors also featured highly in Peters and Hertwich's (2006) SPA of Norway.
For the corresponding WIOD data, shown in Table 2, the two largest paths are the same but the path in third place is the eighth largest in the GTAP system. Similarly the third largest path in the GTAP data is eighth largest for WIOD. The next stage is to find the largest differences between corresponding paths. For example, the difference between the zero-th order path from the GBR electricity, gas and water supply in the GTAP and WIOD systems is 2.4 MtCO2. This path is the largest in both tables, but the difference may not be the largest. To find the largest differences, we need to look beyond the top 20 paths. To identify the top 100 PD, the top 1000 zero-th, first, second, third and fourth order paths were found using GTAP and WIOD. Matching path descriptions were found for each order and the difference calculated. PD were then ranked and any outside the top 100 discarded.
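The matching and ranking step can be sketched as follows. This assumes each database's paths are held in a dictionary keyed by the path description; the path values below are invented for illustration and are not the figures from Tables 1 to 3.

```python
def top_path_differences(paths_a, paths_b, keep=100):
    """Match paths present in both databases and rank them by absolute difference."""
    common = paths_a.keys() & paths_b.keys()
    diffs = {path: paths_a[path] - paths_b[path] for path in common}
    ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:keep]


# Hypothetical path values in MtCO2, keyed by (origin, ..., final product).
gtap_paths = {
    ("GBR ELGW",): 62.0,
    ("GBR TRNS",): 36.0,
    ("GBR ELGW", "GBR ELGW"): 4.0,
    ("GBR ELGW", "GBR TRAD"): 3.0,   # no counterpart in the other list
}
wiod_paths = {
    ("GBR ELGW",): 60.0,
    ("GBR TRNS",): 21.0,
    ("GBR ELGW", "GBR ELGW"): 18.0,
    ("GBR ELGW", "GBR PAEH"): 2.5,   # no counterpart in the other list
}

ranked = top_path_differences(gtap_paths, wiod_paths, keep=100)
for path, diff in ranked:
    print(f"{' -> '.join(path):<24} {diff:+6.1f}")
```

A positive value means the first database reports the larger path. Paths that appear in only one database's shortlist are dropped, which is why the shortlist for each order has to be much longer than the number of differences finally kept.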
The CBA for the UK as calculated by the GTAP and WIOD databases using the CC system differs by 27.6 MtCO2, with WIOD calculating the footprint to be higher. Table 3 shows the top 20 value chain differences. The largest difference is for the path from the emissions associated with GBR Transport to the final demand for that same sector. For this path, GTAP calculates a higher value than WIOD. The second largest difference is for the path from the emissions associated with the GBR electricity, gas and water supply industry that are used for intermediate demand for the same sector and final demand for the same sector. This path is 14.3 MtCO2 larger in WIOD than in GTAP (hence negative in the MtCO2 column). Because this path is a first order path, it contains an interaction with a cell in the Z matrix. In addition, since this path difference is larger than the path difference associated with the zero-th order path from GBR electricity, gas and water supply (ranked number 16 in Table 3), one would assume that it is data from the transactions matrix causing the difference. The total 27.6 MtCO2 difference between GTAP and WIOD is the sum of thousands of PD, both positive and negative. The next stage is to find out which element in the series expansion equation, used to calculate the size of a value chain, is responsible for the majority of the difference in paths and to calculate the percentage contribution each element makes to the overall difference. Table 4 shows the elements in the emissions vector $f$, the inverse output vector $x^{-1}$, the transactions matrix $Z$ and the final demand vector $y$ from GTAP and WIOD that make up the paths shown in Table 3. As Table 3 shows, the path with the largest difference between GTAP and WIOD is the value chain of emissions for transport that go to make a final demand of the same product. Table 4 shows that the industrial emissions associated with the UK transport sector are 131.4 MtCO2 in GTAP and 91.9 MtCO2 in WIOD.
The inverse output values are $3.35 \times 10^{-6}$ and $4.27 \times 10^{-6}$. Final demand of UK transport by UK consumers is 82,228 million US dollars (USD) in GTAP and 53,326 million USD in WIOD. Each of the $f$, $x^{-1}$ and $y$ elements differs between the two databases, but which element contributes the most to the path difference of 15.3 MtCO2? From Table 3, we suspected that the second largest path difference might have something to do with the differences in the Z matrix. Table 4 shows that there is a large difference between the value from GTAP's Z matrix (6,994) and the value from WIOD's (46,163). But again, how much of the overall difference can be explained by this? SPD is used to calculate the contribution each element in the path makes towards this difference and the results are shown in Table 5.
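The arithmetic for the UK transport path can be checked with a short sketch using only the values quoted above. It assumes the first inverse output value belongs to GTAP and the second to WIOD, and that final demand is in million USD in both cases. It then applies the three-factor form of the S-S decomposition. The function names are hypothetical and the resulting effects are illustrative; they are not a reproduction of Table 5.

```python
# Values quoted in the text for the zero-th order UK transport path.
# Assumed units: f in MtCO2, x_inv per million USD, y in million USD.
gtap = {"f": 131.4, "x_inv": 3.35e-6, "y": 82228.0}
wiod = {"f": 91.9, "x_inv": 4.27e-6, "y": 53326.0}
names = ("f", "x_inv", "y")


def path_value(db):
    """Zero-th order path: f * x^-1 * y."""
    return db["f"] * db["x_inv"] * db["y"]


def ss_effect(name):
    """Three-factor S-S effect of one element on the GTAP minus WIOD difference."""
    p, q = [k for k in names if k != name]
    delta = gtap[name] - wiod[name]
    same = wiod[p] * wiod[q] + gtap[p] * gtap[q]    # others taken from one database
    mixed = wiod[p] * gtap[q] + gtap[p] * wiod[q]   # others taken from both databases
    return delta * (same / 3 + mixed / 6)


difference = path_value(gtap) - path_value(wiod)
effects = {name: ss_effect(name) for name in names}
gross = sum(abs(v) for v in effects.values())

print(f"GTAP path: {path_value(gtap):.2f} MtCO2, WIOD path: {path_value(wiod):.2f} MtCO2")
print(f"path difference: {difference:.2f} MtCO2")
for name in names:
    share = 100 * abs(effects[name]) / gross
    print(f"{name:>6}: {effects[name]:+7.2f} MtCO2 ({share:.0f}% of gross difference)")
```

The three effects sum to the net path difference, while the percentage shares are taken against the gross difference (the sum of absolute effects), so positive and negative contributions are both visible.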

Structural path decomposition
As expected, the second row of Table 5 reveals that the element that contributes most to the path difference is the element in the transactions matrix Z. Each element can either contribute positively to the difference (meaning that using the GTAP element rather than the WIOD element makes the difference positive) or negatively (meaning that using the GTAP element rather than the WIOD element makes the difference negative). Both Z and y, in this case, contribute towards the negative difference, whereas the inverse output has a positive effect. The emissions vector f makes little difference in this case. The overall difference of −14.3 MtCO2 is the sum of the positive and negative differences and is therefore the net difference between the paths. The percentage values in each row show the influence each element has on the gross difference. The first row of Table 5 is the path representing UK transport emissions in transport products. Here the difference is positive, meaning that GTAP's path is higher than WIOD's, and the largest share of the difference (41%) is due to the final demand element in GTAP being larger than the element in WIOD.

Global results
The UK case study was used to explain how results were generated and to give an example of how to interpret the findings. This study has calculated the structural paths for the 40 common countries from Eora CC, EXIOBASE CC, GTAP CC and WIOD CC. Table 6 summarises the characteristics of the 100 largest paths from each database. The sum of the largest 100 paths is largest in Eora and a greater proportion of the total is made up of very large (> 500 MtCO2) paths in the Eora database. In each database, zero-th order paths make up the majority of the total and paths originating from the USA contribute to just under half of the sum of the largest 100 paths. There is some disagreement in the proportion of paths originating in different industrial sectors. Eora and GTAP show 26% and 22% of paths originating in the transport sector, whereas EXIOBASE and WIOD estimate 8% and 7%. Eora reports a lower proportion of paths originating in the electricity, gas and water sector and a larger proportion in the construction sector.
The remainder of this section addresses the following questions:
• Is one database consistently calculating the larger of the paths when two databases are compared?
• What is the frequency distribution by size of path difference?
• What orders of paths contribute to the total of the top 100 PD?
• Are there particular countries that tend to produce large PD?
• Are there particular sectors that tend to produce large PD?
• Are there particular elements within the series expansion equation that tend to be responsible for most of the difference between paths?
• In what type of paths does the emissions data contribute most to the difference?
• In what type of paths does the monetary data contribute most to the difference?

Is one database consistently calculating the larger of the paths when two databases are compared?
In general, Eora estimates CBAs to be larger than the estimates from EXIOBASE, GTAP and WIOD (see Figure 1). This finding is also demonstrated in the SPA where Eora paths tend to be larger than their counterparts in EXIOBASE, GTAP and WIOD. Figure 2 shows that from the sum of the top 100 PD, Eora paths contribute 72% of the gross difference when paired with EXIOBASE, 74% when paired with GTAP and 75% when paired with WIOD. EXIOBASE paths make up slightly more of the difference when paired with GTAP and WIOD. GTAP and WIOD paths share the difference almost equally.
In the following questions, the difference assigned to each pairing is disaggregated to show characteristics of the difference. Figures 3-6 display the findings and Table 3 in the SI gives the full results.

What is the frequency distribution by size of path difference?
Each database pairing contains a small number of very large PD and the majority of the PD are between 10 and 20 MtCO2. When Eora and EXIOBASE are compared (see Figure 3, top left), we find two paths that differ by more than 500 MtCO2. Together these contribute 1,225 MtCO2 (26%) to the gross difference. To put this into context, the United Nations (UNFCCC, 2007) reports global CO2 emissions to be 30,113 MtCO2, so 1,225 MtCO2 represents 4% of the global total. Paths where Eora reports the higher value are the only ones where differences of over 500 MtCO2 are observed. This finding is consistent with the character of the Eora paths shown in Table 6. This also reinforces the conclusions drawn in Owen et al. (2014) that GTAP and WIOD are more similar to each other than when either database is paired with Eora. EXIOBASE and WIOD have the fewest very large PD, with just over 50% of the gross difference from PD of over 20 MtCO2.

What orders of paths contribute to the total of the top 100 PD?
In all six pairings, the majority of gross difference is from zero-th order paths, as shown in Figure 4. These are paths from the source emissions straight to final demand of the same product, by-passing the transactions matrix Z. This means that the cause of the difference must lie in the emissions vector f, the output vector x or the final demand vector y. In the Eora and EXIOBASE SPD comparison, 97% of the difference is in zero-th and first order paths. For Eora and GTAP this figure is 96%, Eora and WIOD 97%, EXIOBASE and GTAP 91%, EXIOBASE and WIOD 95% and finally for GTAP and WIOD, 93%. Only pairings involving GTAP have PD that are third order in the top 100. For a third order path to appear in the top 100 differences, there are likely to be large differences in the Z matrix.

Are there particular countries that tend to produce large PD?
There are no paths in the top 100 PD for any of the six pairings where the path crosses a country border. All paths with large PD are contained within a single country. This is not surprising since none of the 100 largest paths identified by the SPA cross borders. Figure 5 shows that for every pairing, the gross difference is made up of paths from the USA, followed by China, India and Russia. These four nations make up 87%, 90%, 88%, 80%, 76% and 80% of the gross difference from the top 100 PD from the Eora and EXIOBASE; Eora and GTAP; Eora and WIOD; EXIOBASE and GTAP; EXIOBASE and WIOD; and GTAP and WIOD SPA calculations, respectively. Interestingly, Table 6 shows that paths originating in India contribute to 5-6% of the sum of the 100 largest paths in each of the databases, but paths originating in India contribute to 14% of the gross difference when Eora and EXIOBASE are paired, 11% for Eora and GTAP and 15% for Eora and WIOD. Figure 5 also reveals that it is not the case that one database consistently calculates larger paths for those originating in India indicating that the difference is unlikely to be caused by one database simply containing larger values.

Are there particular sectors that tend to produce large PD?
When comparing Eora and EXIOBASE, Figure 6 shows that 36% of the gross difference is from paths which originate in the transport sector, with Eora reporting higher paths in most cases. The electricity, gas and water sector; petroleum, chemicals and non-metallic mineral sector; and construction sector also feature heavily in the contribution to the overall difference. PD involving GTAP feature the electricity, gas and water sector for a larger proportion of the paths than the Eora and EXIOBASE or Eora and WIOD pairings. Interestingly, Table 6 shows that transport is the origin sector of 7% of the 100 largest paths in WIOD and 26% in Eora, yet for the paths with the largest difference, we find transport originating paths contribute to 35% of the gross difference between Eora and WIOD, with Eora always reporting the larger path. Transport does not feature in many of the paths with high differences when comparing EXIOBASE and WIOD. This finding is discussed later.

Are there particular elements within the series expansion that tend to be responsible for most of the difference between paths?
SPD allows us to identify the contribution towards the difference that each element in the path makes. To summarise the information we first consider which element contributes most to the path difference. Figure 7 shows that, for the gross difference from the top 100 PD between the Eora and EXIOBASE databases, the element from the emissions vector is the largest contributor of difference (45%). The final demand figure contributes 30% of the differences, followed by total output (17%) and the element in the transactions matrix (8%). The emissions vector (f) contributes most to the difference in pairings involving Eora and the transactions matrix (Z) contributes most to the difference in pairings involving GTAP.

What are the characteristics of paths where the emissions or the monetary data contribute most to the difference?
Finally, we characterise the types of paths where emissions are the causes of difference and the types of paths where the monetary information is the cause of difference. Table 7 shows the top 10 paths where the element in the emissions vector was the largest contributor to the difference. The effect of each element is shown in the adjacent 'MtCO2' column. We find that the transport (TRNS), construction (CNST), trade (TRAD) and public administration, education, health and defence (PAEH) sectors are where the emissions vectors disagree. Surprisingly, the electricity, water and gas (ELGW) sector does not appear high in the list of paths where the emissions contribution differs substantially.
The three Eora pairings show very large PD where the value chain starts with the emissions from USA TRNS and China CNST. Paths starting with China CNST do not appear in the remaining pairings, which suggests that Eora may overestimate the emissions from this sector. Table 8 shows the top 10 paths where either total output, the transactions matrix or the final demand matrix was the highest contributor towards the path difference. Emissions for the ELGW sector seem to align between databases, but the monetary data differs markedly and is one of the major contributors towards PD. The three GTAP pairings show very large PD where the value chain involves a transaction with the ELGW sector. Compared to Eora, EXIOBASE and WIOD, GTAP underestimates the monetary information for the USA ELGW sector and overestimates it for China's ELGW sector.

Domestic value chains
In the top one hundred paths with the largest differences, every path from every database pairing is entirely contained within one single country. There are no paths with very large differences that describe imports to final or intermediate demand. Emissions in trade account for around one quarter of global emissions (Davis and Caldeira, 2010; Peters et al., 2012), but the 'off-diagonal' blocks within MRIO databases, which show the imports to intermediate and final demand, are often estimated based on proportionality assumptions (Bouwmeester and Oosterhaven, 2007; Tukker et al., 2009; Erumban et al., 2011; Peters et al., 2011). Owen et al. (2015) demonstrate that the sections of the transactions matrix Z that represent imports align less between databases than the domestic transactions. However, the authors also find that the effect of the difference in the import blocks on the overall difference in CBA is not as significant as that of other factors such as total emissions. Some individual country level results do show paths that contain imports as having large differences for nations that rely on traded goods, and Table 5, which shows the largest PD for the UK using GTAP and WIOD, has several such paths. However, the nations that have the largest emissions CBAs and the largest individual emissions supply chains tend to be countries like the USA, China and Russia that are not overly reliant on traded goods for intermediate and final demand. In addition, the largest paths often involve electricity, water and gas, which are more likely to be domestically sourced. Owen et al. (2014) conclude that differences in the emissions vector are a major cause of difference between Eora and GTAP and between Eora and WIOD. Similarly, Moran and Wood (2014) find that harmonising the emissions vector causes CBA calculated using Eora, EXIOBASE, GTAP and WIOD to converge.
This study finds that the emissions element is the greatest cause of difference in 63 of the top 100 paths with large differences between Eora and WIOD, contributing 43% of the sum of the gross difference, and in 61 of the top 100 paths when Eora and EXIOBASE are compared (45% of the gross difference sum). Table 9 reveals that the total global emissions differ between databases, with Eora estimating total global emissions to be slightly larger than EXIOBASE, GTAP and WIOD. In particular, Eora's industrial emissions figure is considerably larger than that used in the other databases and the figure assigned to household direct emissions from home heating and private transportation is lower.

Sources of difference from the emissions vector
In this study, the SPD calculations focus on the difference in supply chains between the four MRIO databases and hence exclusively use the industrial emissions figures. Because Eora's industrial emissions figure is so much larger than the amount used by the other three databases, it is unsurprising that Eora's emissions are a large source of difference.
There are two main reasons why databases attribute different proportions of total emissions to industries rather than households: (1) whether the residence or territorial principle is used for emissions allocation; and (2) how road transport activity is allocated to users. The residence principle is used in a national accounting framework and states that emissions from the activity of a resident unit (i.e. a person or company) are allocated to the territory of residence (Genty et al., 2012). Specifically, when calculating a national account, the activities of tourists are removed and reallocated to the tourist's country of residence, and any activities of domestic residents abroad are added. This affects the overall level of the transport component in each country. The territorial principle allocates emissions to the country where they take place and is used in national statistics. The second effect relates to whether road transport (classified as an activity in its own right in energy balances) is allocated to the industries and households undertaking the activity (as in energy accounts). In energy balances published by each country or by international institutes such as the International Energy Agency (IEA), road transport is represented as a single activity. Such a representation is not consistent with the industry/household delineation in National Accounts, where the emissions should be recorded under the establishment or household undertaking the activity (Eurostat, 2014; UNDESA, 2015). Both WIOD and EXIOBASE use the residence principle and allocate road transport to the user (Genty et al., 2012; Tukker et al., 2013), and Figure 6 shows that the transport industry is not as significant a source of difference between EXIOBASE and WIOD as it is in any of the other five database pairings.
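The two adjustments described above can be sketched as simple arithmetic. This is an illustration only: the function names, the user categories and all figures are hypothetical, and the proportional split by fuel use is an assumption rather than the documented procedure of any of the databases.

```python
def residence_total(territorial, nonresidents_on_territory, residents_abroad):
    """Move from a territorial to a residence-based emissions total.

    Emissions caused by non-residents (e.g. tourists) on the territory
    are removed and emissions caused by residents abroad are added.
    """
    return territorial - nonresidents_on_territory + residents_abroad


def allocate_road_transport(total_emissions, fuel_use_by_user):
    """Share a single road transport total between the users of road fuel.

    Assumption: emissions are split in proportion to each user's fuel use.
    """
    total_use = sum(fuel_use_by_user.values())
    return {
        user: total_emissions * use / total_use
        for user, use in fuel_use_by_user.items()
    }


# Hypothetical figures, in MtCO2 and arbitrary fuel units
adjusted = residence_total(100.0, 5.0, 8.0)
road = allocate_road_transport(
    40.0,
    {"transport industry": 2.0, "other industries": 1.0, "households": 1.0},
)
```

Under the territorial treatment the full 40 units would sit with a single road transport activity, whereas after allocation to users only half remains with the transport industry in this toy case. This is the kind of shift between industry and household emissions that can change the size of transport-originating paths.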
In addition, the EXIOBASE and WIOD column in Table 7, which shows the top PD where the emissions vector is the most significant cause of difference, does not have a single path containing the transport sector. Eora does not correct for the residence principle or road transport, which may explain why the allocation to households and industry is different in Table 9 and why transport appears prominently in both Table 7 and Figure 6 where Eora is compared to EXIOBASE and WIOD. GTAP also uses a territorial treatment but allocates international transportation to consumers, not producers. GTAP v6 used a 50/50 (gasoline) and a 75/25 (diesel) split for fuels used in road transport to the road transport industry and to households respectively, ignoring usage by other industries (McDougall and Lee, 2006).

Sources of difference from the monetary data
This study finds that the majority of the difference in paths, where the monetary data is the largest contributor towards the overall path difference, involves the electricity, gas and water sector. Either the total output, the element from the transactions matrix (Z) or the final demand figure for this sector is very different between the databases. In Table 10 we show the proportion of the electricity, gas and water production mix for each country in the CC that is supplied by that sector itself (including intermediate imports of electricity, gas and water from abroad), taking the values from the Z matrix for each database. Table 10 shows that there is a large difference in the electricity, gas and water proportion across the databases and this discrepancy was brought to our attention by the SPA. There are a number of reasons as to why the monetary data could differ for this sector. The definition of what is included as electricity, gas and water may be different for the different databases. For example, in some countries gas manufacturing is treated as a margin sector.
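A minimal sketch of the kind of own-supply proportion reported in Table 10 is given below. It assumes that the denominator is the sum of all intermediate inputs into the country's ELGW sector taken from the relevant column of Z. The region codes, the `MINE` sector label and the values are hypothetical.

```python
def own_supply_share(inputs_to_elgw):
    """Share of a country's ELGW intermediate inputs that come from ELGW.

    inputs_to_elgw maps (region, sector) to the value, taken from Z, that
    the supplier delivers to the ELGW sector of the country of interest.
    ELGW inputs from any region count as own supply, so intermediate
    imports of electricity, gas and water are included.
    """
    total = sum(inputs_to_elgw.values())
    own = sum(
        value
        for (region, sector), value in inputs_to_elgw.items()
        if sector == "ELGW"
    )
    return own / total


# Hypothetical column of Z for the ELGW sector of one country
column = {
    ("GBR", "ELGW"): 40.0,
    ("FRA", "ELGW"): 5.0,
    ("GBR", "MINE"): 35.0,
    ("GBR", "TRNS"): 20.0,
}
share = own_supply_share(column)
```

Computing the same share from each database for the same country gives four directly comparable figures of the sort compared in Table 10.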
In general, Eora, EXIOBASE and WIOD agree on the electricity, gas and water proportions and GTAP is the outlier. However, reading down Table 10, the proportions vary significantly between countries. Eora, EXIOBASE and WIOD report values of over 40% for Austria, Brazil, the UK, Portugal, Slovakia and Turkey but less than 1% for Canada. going to final demand is insensitive to the value of the intra-sectorial flow (e.g. see Miller and Blair 2009, p. 279). Apart from the intra-sectoral flows, electricity is a sector further fraught with difficulty when monetary data is used to describe the distribution of electricity use. Different industrial sectors spend different amounts of money to receive the same kWh of electricity because the price per kWh differs by sector. GTAP does not rely on user-submitted values in the energy rows of the I-O tables. Instead, physical data on energy use in Joules is taken from the IEA, converted to monetary values and placed in the I-O tables. This removes the problem of electricity prices described.
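The price effect described above can be made concrete with a small sketch. All figures are hypothetical, and the reallocation shown (re-sharing the total spend according to physical use) is an assumed simplification of the idea, not a description of GTAP's actual procedure.

```python
# Hypothetical electricity spend (monetary units) and price per kWh by user
spend = {"MANU": 30.0, "DOM FD": 70.0}
price_per_kwh = {"MANU": 0.05, "DOM FD": 0.20}


def shares(values):
    """Return each entry as a proportion of the total."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


# Physical use implied by the spend and the user-specific price
kwh = {user: spend[user] / price_per_kwh[user] for user in spend}

monetary_shares = shares(spend)
physical_shares = shares(kwh)

# Assumed reallocation: keep the total spend, distribute it by physical use
total_spend = sum(spend.values())
reallocated_spend = {
    user: total_spend * share for user, share in physical_shares.items()
}
```

Because manufacturing pays the lower unit price in this toy case, its share of physical use is larger than its share of spend, so allocating emissions by spend would understate the electricity embodied in manufacturing.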
In Table 11, we calculate, from the Z matrix, the proportion of domestic electricity, gas and water supply that goes to each other domestic sector, including domestic final demand. Electricity, gas and water own use is zeroed, to highlight the proportional allocation of impact to consumers, and exports to intermediate and final demand in other countries are combined in the export column. Table 11 allows us to compare the proportional use by selected countries for all four MRIO databases. If the proportional use deviates from the multi-model mean by more than one standard deviation, the value is bold and shaded. Table 11 shows that for large developed nations, such as Australia, Canada, Germany, Spain, the UK and the US, GTAP allocates a greater share of electricity, gas and water to manufacturing (MANU) and a smaller share to domestic final demand (DOM FD). This implies that the price per unit for manufacturing sectors is lower than average and Eora, EXIOBASE and WIOD thus underestimate the energy use in these sectors. It is not conclusive whether this pattern holds for China, India and Russia because there are larger overall differences between the four MRIO databases for these nations. Further investigation into this re-proportioning construction method is needed before we can conclusively state that the electricity, gas and water sector is more reliably described in GTAP.
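The calculation behind Table 11 can be sketched as follows. The sketch assumes that the sample standard deviation across the four databases is used for the flagging rule, which the text does not specify. The row values and the manufacturing shares are hypothetical.

```python
from statistics import mean, stdev


def supply_shares(row, own_sector="ELGW"):
    """Proportion of domestic ELGW supply going to each user.

    row maps each user (domestic sectors, DOM FD, EXPORT) to a value.
    Own use is set to zero before the proportions are calculated.
    """
    adjusted = dict(row)
    adjusted[own_sector] = 0.0
    total = sum(adjusted.values())
    return {user: value / total for user, value in adjusted.items()}


def flag_outliers(value_by_database):
    """Flag databases more than one standard deviation from the mean.

    Assumption: the sample standard deviation (statistics.stdev) is used.
    """
    values = list(value_by_database.values())
    centre = mean(values)
    spread = stdev(values)
    return {
        database: abs(value - centre) > spread
        for database, value in value_by_database.items()
    }


# Hypothetical ELGW supply row for one country in one database
example = supply_shares(
    {"ELGW": 50.0, "MANU": 20.0, "DOM FD": 20.0, "EXPORT": 10.0}
)

# Hypothetical MANU shares for one country across the four databases
manu = {"Eora": 0.20, "EXIOBASE": 0.22, "GTAP": 0.40, "WIOD": 0.21}
flags = flag_outliers(manu)
```

In this toy example only the GTAP value is flagged, mirroring the pattern in which one database allocates a noticeably larger share of supply to manufacturing than the other three.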

Residence or territorial principle?
This study recommends using the residence principle for allocation within the emissions vector. The residence principle is the technique used within the system of national accounts, and this should be reflected in the data used to construct consumption accounts. It is perhaps an indication of the intention of the MRIO database construction community that the most recent database, EXIOBASE, adopts the residence principle (Usubiaga and Acosta-Fernández, 2015).

Economic data as a proxy for physical flows
An interesting finding is that GTAP's method of reallocating electricity spends to match the proportions of energy used makes a significant difference when comparing structural paths. The investigations in this study have highlighted several classic issues in I-O analysis. If the energy sector covers both the energy producing and distributing functions of energy supply, then this is an example of the allocation uncertainty issue identified by Lenzen (2000). Lenzen (2000) warns of proportionality assumption uncertainties, explaining that when monetary data is used in I-O tables to represent a physical flow of commodities between industries one assumes that a dollar spent on energy by the energy sector is the same

Table 11. Proportion of electricity, water and gas supplied to each sector (excluding electricity, water and gas own use) by model and for selected countries.

Future use of MRIO outcomes in policy
Whilst calculations of national CBA have been shown to be robust (Lenzen et al., 2010), this study indicates that calculations which involve extracting smaller portions of national level results tables, such as finding a product footprint, may not be as accurate. Wiedmann et al. (2011) explain that product footprints may become policy relevant if eco-labelling becomes a requirement of product sustainability standards. MRIO databases are less similar at this level of detail and the data is subject to higher levels of uncertainty because the assumptions made in database construction start to have an effect at this scale. In this study, the most detailed level of data is explored: the value chain. Results show that there are large variations in the size of supply chains between databases. These differences obviously reflect the different source data used, but choice of source data does not impede the recommendation for using MRIO databases to assess global value chains, since alternative source data can always be supplemented in the database. The effect of different construction techniques is more of a concern here. There is no agreed set of steps for constructing the emissions vectors, dealing with missing data or balancing the database, and thus each MRIO database has its own unique construction method. The findings from this study suggest that the choice of territorial or residence principle for generating the emissions vector and the technique used in GTAP for dealing with electricity price variations have large effects on the outcomes. It is therefore suggested that global value chain data is not yet robust enough to be used in climate policy. Nevertheless  and  exploration of this approach shows its potential in demonstrating the interconnectedness of consumers, producers and associated environmental impacts in an increasingly globalised world.

Using aggregated data
The conclusions drawn are based on aggregated versions of the original MRIO databases. Owen et al. (2015) demonstrate that the aggregated versions are reasonable representations of the original databases using a series of matrix difference statistics. One could argue that care needs to be taken in interpreting results that are highly aggregated. However, this study calculated SPD on paths of length 11, which represented fourth order paths. Finding and identifying the fourth order paths from the original versions of the databases would be a computationally heavy calculation because of the sizes of the original matrices. The aggregated databases are quicker to use, and results using the aggregated versions could be seen as an initial sifting process. Now that the paths with cause for concern have been identified, the sectors involved could be studied in more detail at the disaggregated level.

Conclusion
This study represents the first time that SPD has been used with an S-S decomposition and is the first to compare PD between multiregional input-output (MRIO) databases. We show that SPD is an important technique for highlighting and explaining differences in the global value chains produced by MRIO databases in the calculation of consumption-based accounts. The work expands upon the findings from Owen et al. (2014) by including the EXIOBASE MRIO database and allowing consideration of difference at the sector level.
The findings presented in this paper will be of great interest to constructors of MRIO databases since they help to explain why MRIO database outcomes differ. The findings also point to key areas where harmonisation of source data and construction techniques could bring about convergence of results. For example, we find that sources of difference can be traced to the different emissions source data used, as well as choice of construction techniques, such as using the residence principle rather than the territorial for emissions allocation and redistributing energy spends based on physical data. This work should also be of interest to the users of MRIO database outcomes since it highlights at which scales results are most consistent. For example, national level consumption-based accounts are more robust than global value chain information.
We recommend that this work be extended to include future MRIO systems and to consider data from different years.