New technique to analyse global distributions of CO2 concentrations and fluxes from non-processed observational data

We have developed a new observational screening technique for inverse model. This technique was applied to our transport models with re-analysed meteorological data and the inverse model to estimate the global distribution of CO2 concentrations and fluxes. During the 1990s, we estimated a total CO2 uptake by the biosphere of 1.4–1.5 PgC yr-1 and a total CO2 uptake by the oceans of 1.7–1.8 PgC yr-1. The uncertainty of global CO2 flux estimation is about 0.3 PgC yr-1. We also obtained monthly surface CO2 concentrations in the marine boundary layer to precisions of 0.5–1.0 ppm. To utilize non-processed (statistical monthly mean) observational data in our analysis, we developed a quality control procedure for such observational data including a repetition of inversion. This technique is suitable for other inversion setups. Observational data by ships were placed into grids and used in our analysis to add to the available data from fixed stations. The estimated global distributions are updated and extended every year.


Introduction
Of the greenhouse gases, CO 2 contributes most to global warming. However, the distribution of its sources and sinks is not fully understood (IPCC, 2007). The World Meteorological Organization (WMO) has launched the Global Atmosphere Watch (GAW) programme, which recommends, in its strategic plan for 2008-2015, that observational data be integrated by numerical models and data assimilation techniques according to the International Global Atmospheric Chemistry Observations (IGACO) strategy. Such integrated products are important and effective tools for environmental monitoring. Following this strategy, the Japan Meteorological Agency (JMA) has provided an integrated product of the distribution of CO 2 concentrations since 2009 by putting non-processed (statistical monthly mean) observational data from cooperative laboratories into our transport model and the inverse model, which is one of the most useful tools for this purpose (Rayner et al., 1999;Gurney et al., 2002;Rödenbeck et al., 2003;Gurney et al., 2004;Baker et al., 2006;Stephens et al., 2007;Gurney et al., 2008). Although many studies use processed observational data in the inverse model, we used non-processed observational data in the analysis to estimate CO 2 fluxes with a high spatial and temporal resolution because there must be some signals that are contained only in nonprocessed data. However, some observational data are affected by local fluxes and such data may easily give adverse impacts on analysis results; the quality of the estimation depends on how to screen non-processed data. Therefore, we have developed a new data screening technique for estimating CO 2 distributions. The important feature of our analysis is that we only used data from selected observational site that were screened by utilizing results from the inverse model. This method is simple but rational in that we can select observational data that match our inversion setup (number of regions, pre-subtracted and prior fluxes, uncertainty of fluxes and distribution of observational sites). If certain observational data are used only in a specific phase in the inversion, we can do so by determining flexibly which data to be used in one phase but not in another. Michalak et al. (2005) showed an important approach in surface flux inversions. In this method, observational errors were estimated using a Maximum Likelihood approach. However, this method did not completely reject observational data, which were affected by local fluxes and might cause impacts on analysis results.

Inversion method
We utilized a Bayesian Synthesis Inversion method (Tarantola, 1987;Enting, 2002) to estimate surface CO 2 fluxes from observational data, in combination with an atmospheric transport model and prior fluxes statistically determined for 22 regions. Our analysis method is based on TransCom 3, which is described by Baker et al. (2006). The methods and parameters for our analysis are summarized here. We assumed that observed CO 2 concentrations be determined linearly from regional fluxes through a matrix of transport coefficients, as shown in eq. (1), where y i , and x j are observed CO 2 concentrations and regional CO 2 fluxes, respectively, and a ij are coefficients representing the contributions of regional fluxes to observations which are calculated by the transport model.
In general, meaningless fluxes may be calculated when eq. (1) is solved simply to find CO 2 fluxes that minimize the differences between observations and model simulations. To avoid this disadvantage, we used the following techniques. First, prior fluxes (x p ) and their uncertainties (C x ) were introduced to determine an x that minimized the cost function S(x) in eq. (2), which calculated more suitable regional fluxes using prior information as shown in the second term of the right-hand side in eq. (2). Secondly, the transport model results from pre-subtracted fluxes (emission from fossil fuel burning and biospheric and oceanic exchanges) were subtracted from observations and then residual concentrations were inversed using eq. (1). In eqs. (2) and (3), C y is the uncertainty of observed CO 2 concentrations and the matrix A (a ij ) contains the result of transport model from all unit fluxes and A T means its transposed (adjoint) matrix. The posterior regional flux estimate x may ⎛ be solved for analytically as eq. (3). From the calculated posterior CO 2 fluxes and subsequent simulations with the transport model, CO 2 concentrations were obtained globally in three-dimensional grids.
2.1.1. Prior fluxes and their uncertainties. Surface fluxes were estimated for 22 geographical regions, as for TransCom 3 (Gurney et al., 2000), for the period from 1985 to 2007. The transport model was run for 3 years with a pre-determined pattern of CO 2 fluxes in the re-analysed wind fields (JRA-25, Onogi et al., 2007) to obtain pre-determined regional flux patterns for different regions and months of the year in a way consistent with TransCom 3 (Baker et al., 2006). We also adopted prior source fluxes and their uncertainties as done in the previous study (Baker et al., 2006).
To obtain realistic flux estimates, pre-subtracted fluxes were used as for TransCom 3. The spatial pattern of emissions from fossil fuel burning in 1990 (Andres et al., 1996) and 1995 (Brenkert, 1998), CASA biosphere fluxes (Randerson et al., 1997) and oceanic exchanges (Takahashi et al., 1999) were treated as known fluxes and the contributions of these known fluxes were subtracted from the observed concentrations before calculating the minimal cost function. Recent CO 2 emissions from fossil fuel burning were estimated from annual global statistics from 1980 to 2007 (Marland et al., 2007). We used the values for 2007 for the years thereafter.
2.1.2. Observational data and their uncertainties. Previous studies (Baker et al., 2006;Gurney et al., 2008) used processed data from GLOBALVIEW (NOAA, 2007), which were smoothed, interpolated and extrapolated. However, such data may not retain signals that can only be found in non-processed observational data. In this study, we used monthly mean observational data submitted to the WDCGG from different laboratories for the inversion (Rödenbeck et al., 2006). This is an important feature of our analysis. To put non-processed observational data into the inversion, the data were subjected to quality control and their uncertainties were estimated. In the first step of quality control, only data traceable to the WMO standard scale were selected (Zhao and Tans, 2006). In the second step, we chose the longest one from plural records in a single location. Next, missing data were filled by interpolation or extrapolation (Nakazawa et al., 1991;WMO, 2000) to create data sets representing average seasonal cycle. Uncertainties of the data were defined as the standard deviation of the differences between the observed and the smoothed data as done by many previous studies. Data in periods of no observation were filled with the data created earlier, but were given a large uncertainty (150 ppm) to avoid affecting analysis results. The data were subsequently screened in the following steps ( Fig. 1): 1. All available observational data were put into the inversion model to obtain global estimates of concentrations.
2. Monthly observations with a larger mismatch (analysisobservation) than the threshold (1 sigma for the first cycle or three sigmas for the subsequent cycles. The sigma is the square root of the mean squared value of the distance of each analysis from the observation for all observational sites) were rejected (given a large uncertainty (150 ppm)): data selection.
3. To reduce computational cost, large uncertainties were given to all observational data in the sites from which more than a half of observational data were rejected (site selection).
4. If any data were rejected in Step 2, the procedure from Step 1 was repeated.
5. The inversion results (concentrations) so far were treated as those in the OBS case.
6. Predetermined uncertainties, defined as the standard deviations from the seasonally smoothed data, were given to the sites from which less than a half of observational data were rejected (data restoration in the selected sites).
7. As the final step, the inversion was performed once again and the resulting concentrations were treated as those in the CNTL case.
In general, we repeated three to four times of Steps 1-4 to finish the analysis. This process leads to the rejection of observational data that differ from the concentration estimated by the inverse model based on a majority of the observational data. It is emphasized that the rejected data are not represented in the sparse horizontal and spatial resolution of the model. The observational sites are distributed as shown in Fig. 2, in which colours indicate the rates of monthly values selected before Step 5. In general, the rates are large at oceanic sites and in the southern hemisphere, but small at land sites and in the Northern Hemisphere. In the European area, there are many sites where data selection rate is extremely small. In this area, many data measured in summer and winter were rejected. This is likely due to a large spatial flux variability from local anthropogenic or biospheric fluxes. Therefore, a larger number of geographical regions are required in model calculations for a more detailed analysis.
In addition to the data from fixed stations on land, we used observational data from ships in the western Pacific; these data were placed into grids by averaging over 5 • × 5 • square boxes and then over a month to increase the horizontal resolution of the analysis and minimize the period of no observational data. The uncertainties of such data were fixed at 0.5 ppm, based on the uncertainties of neighbouring sites and the temporally uneven distribution of observational data.
As a result, data from as many as 146 sites/grids, of which 58 were from fixed land stations, 13 from flask aircraft observations at the upper troposphere and lower stratosphere (Matsueda et al., 2002;Machida et al., 2008) and 75 from ship observations, were selected for use in the analysis. These numbers were larger than in previous studies. The important feature of this observational network is that we could select more than 90% of flask aircraft observations (The north-south circle coloured red from Japan to Australia in Fig. 2). This means that a combination of our transport model and analysed surface flux could represent CO 2 concentrations variability in lower troposphere. This may be supported by previous study (Stephens et al., 2007). The contributing institutions are listed in Table 1 and a time series of the numbers of sites/grids is shown in Fig. 3. There were about 70-80 sites/grids available in the 1990s, but only 20-40 sites/grids in the 1980s. In total, we introduced 21 738 monthly mean observations, of which 19 527 were selected in the CNTL case and 17 539 in the OBS case. We also tested to use GLOBALVIEW data experiment without data selection procedure shown earlier, which is referred to as the GV case, in which   Sasaki et al., 2003) was used for the analysis. This is an offline model based upon JMA's operational global spectral model (Sugi et al., 1990) to calculate distributions of passive tracers (CO 2 , 222 Rn, SF 6 , etc). The model has a horizontal resolution of 2.5 • and 32 vertical layers. It can simulate horizontal and vertical transport, vertical diffusion by turbulence (Mellor and Yamada 1974), cumulus convection (Kuo, 1974) and shallow convection with surface fluxes of given tracers. The model was utilized in several TransCom experiments and received good evaluations in terms of vertical mixing strength (Stephens et al., 2007). The vertical mixing coefficients of JMA-CDTM are relatively high compared with other transport models Patra et al., 2008). Data from the Japanese 25-year re-analysis (JRA-25, Onogi et al., 2007) and JMA Climate Data Assimilation System (JCDAS), following the same methodology as JRA-25, were used to calculate the transport of tracers. This high-quality and long-term meteorological   -25 (1985-25 ( -2004-25 ( , Onogi et al., 2007 and JCDAS (2005-present) Transport scheme Semi-Lagrangian (horizontal) and box scheme (vertical) Vertical diffusion Turbulent (Mellor and Yamada 1974), cumulus convection (Kuo, 1974) and shallow convection.
data set enabled us to analyse CO 2 concentrations from 1985 to 2007. The main features of the analysis method are described in Table 2.

Estimated regional fluxes
In this section, we show estimated fluxes in three experiments: CNTL, OBS and GV cases. In the CNTL case consisting of all processes of data selection, examined the contribution of meteorological variability. In the OBS case, Step 7 in Section 2.1.2 was skipped, so that only extreme outliers were excluded from the observational data. In the GV case, using processed GLOBALVIEW data, we did not apply the data selection method described earlier because the data were already quality controlled. Figure 4, which relates the global land and ocean fluxes to the ENSO index, defined as the sea surface temperature (SST) deviations from the climatological averages over a sliding 30-year period in the east tropical Pacific. The important feature is that there was no large difference in the results between the three cases. This shows that our data and site screening approach is replaceable to other data quality control systems in previous studies (e.g. GLOBALVIEW). After 1994, small (0.1-0.3 PgC yr −1 ) differences are seen between CNTL case and GV, OBS cases. This may be due to the effect of data (not site) selection. The results also shows that the total land CO 2 fluxes increased following El-Niño events (1987-1988, 1997-1998 and 2002-2003), with delays of several months. This relationship was not observed in 1991-1992, due to the eruption of Mt. Pinatubo in June 1991 and the introduction of large amounts of aerosols into the atmosphere, thus affecting the climate. This study also shows that ocean global fluxes increased during El-Niño events, but that the peaks in 1997-1998 in land flux were smaller than previously reported (Baker et al., 2006;Gurney et al., 2008), thus contributing to the increase in global CO 2 concentration. This may be due to the specifications of relatively strong vertical mixing in the lower troposphere in JMA-CDTM.
To evaluate the sensitivity of data selection in the flux estimation in a regional scale, the three methods (CNTL, OBS and GV cases) were compared. With respect to carbon fluxes, the global difference between the three cases was not very large, but there were some regional differences (Figs 5 and 6). In land regions, large differences were seen in less constrained regions where many data were rejected, for example Temperate Asia, Europe, Boreal Eurasia, Temperate North America and Tropical Africa. The interesting result was that the fluxes in Temperate Asia increased only in the CNTL case. The OBS and GV cases did not represent the features of observational data (summer minimum and winter maximum). In ocean regions, large differences were seen in the Northern Ocean, north Pacific, west tropical Pacific and south Indian Ocean. The large difference in the north Pacific, a well-constrained region, may be due to smaller rates of data selection at many sites/grids in this region, as shown in Fig. 2. Flux variability (standard deviation from the average) was generally the smallest in the OBS case and the largest in the CNTL case. The result is due to the features of observational data in each case. In the CNTL case, we used all observational data from the selected sites. In the GV case, we used all processed observational data from GLOBALVIEW. In the OBS case, we used only selected data from selected sites and the data rejection rate was relatively high in summer and winter. The estimated uncertainties of fluxes are summarized in Tables 3 and 4. The former indicates that the uncertainties for posterior sources did not differ much between the cases in many regions. The differences are only seen in the regions (Europe, Temperate Asia and Temperate North America) where many data are rejected in the OBS case. To validate such results, we needed to validate our results with independent observational data (e.g. continuous aircraft observation; Machida et al., 2008).
Carbon fluxes in the 1980s (1985-1990), 1990s (1991-2000) and 2000s (2001-2007) were estimated for different regions and for the whole globe. In the 1980s, the averaged land and ocean uptake without fossil fuel emission was 0.9 and 1.3 PgC yr −1 , respectively. The tropical emission was 1.1 PgC yr −1 , the northern uptake was 3.2 PgC yr −1 and the southern flux was nearly zero. In the 1990s, due to the effect of the eruption of Mt. Pinatubo in June 1991, the total pre-subtracted flux was about 1.0 PgC yr −1 larger than the increased amount of atmospheric CO 2 . The northern and southern land regions contributed to the reduced fluxes. In the 2000s, as the total pre-subtracted flux increased, land and ocean uptake remained at the level of the 1990s. One important feature of this analysis is that the uptake by the Southern Ocean was weaker than the prior flux, as suggested previously (Gurney et al., 2002;Baker et al., 2006).

Sensitivity tests
As described in the previous sections, the analysis in this study features the use of non-processed (statistical monthly mean) observational data, including ship measurements and re-analysed meteorological data for a period longer than two decades. To evaluate the impacts of these features (data selection method, meteorological variability and ship observational data) and analysis precision, we assessed sensitivity by the following tests.
The first test, referred to as the 'CNTL' case, consisted of all processes of data selection and examined the contribution of meteorological variability. The second 'CLMT' case used the averaged results from the transport model for the whole analysis period as the transport coefficients. In the other tests, where the results of the transport model for specific years were used as a transport matrix, are referred to as the 'Y1997', 'Y1998', 'Y1999', 'Y2000' and 'Y2001' cases, based on the starting year of the transport model simulation with re-analysed meteorology. The mismatches in CO 2 concentration between the observational data and the results of these tests are shown for different regions of the world in Table 5. The mismatch was the smallest in 'CLMT'. The mismatch in 'CNTL' was a little larger than that of 'CLMT', but smaller than the cases from 'Y1997' to 'Y2001'. This series of tests showed that model results are suboptimal when using specific years as a transport matrix. The reason why 'CLMT' provided similar mismatch to 'CNTL' may be due to some imperfection in the transport model, especially in vertical transport.
In the second set of calculations, several sites including 'Ryori (142 • E, 39 • N)', 'Minamitorishima (154 • E, 24 • N)', 'Yonagunijima (123 • E, 24 • N)', 'Iceland (20 • W, 63 • N)' and 'Easter Island (109 • W, 27 • S)' were removed from the 'CNTL' case to check the stability of the analysis and the contribution of observational sites. These stations are located in the area where the estimated fluxes are largely different between the CNTL and OBS cases described in the previous section. Table 6 shows that data mismatches between the observation and analysis were mostly in ranges of 0.5-1.0 ppm. The mismatches were small in the oceans and Southern Hemisphere, but large in the land areas and Northern Hemisphere due to the error in pre-subtracted fluxes (especially biospheric flux). The mismatches became larger in all cases when specific sites were removed. The increase in data mismatch was significant when there were no other sites around the removed site as was the case in Iceland, whereas the mismatches did not change when many other sites existed around the removed site (e.g. Minamitorishima and Yonagunijima). The data mismatches were larger for land sites like Ryori, where analysed concentrations were sensitive to regional land flux, than for oceanic sites. The reason why the mismatch increased small in Easter Island may be that concentration variability is small in the Southern Hemisphere.
In the third set of calculations, 'gridded' ship sites were excluded from the 'CNTL' case. Posterior flux uncertainties were compared because no significant changes were expected in the distribution of CO 2 concentrations and data mismatches between the two cases. As are shown in Tables 3 and 7, posterior flux uncertainties became smaller than in the 'CNTL' case for all regions except Tropical Indian Ocean (+0.1%). The flux uncertainties were largely reduced in neighbouring (Boreal North America, Boreal Eurasia, Tropical Asia and Temperate Asia) and less constrained (Temperate South America, Boreal Eurasia and Southern Ocean) regions.

Comparison with a similar study
The monthly CO 2 concentrations in the 'CNTL' case were compared with those in CarbonTracker , which estimated carbon fluxes and concentrations at a high resolution from 2001 to 2007 using the Ensemble Kalman Filter (EnKF). In Fig. 7, the data mismatches in the 'CNTL' case were compared with those in CarbonTracker for different observational sites. The averaged data mismatches were similar to each other, 0.9 ppm for CNTL and 1.1 ppm for CarbonTracker and their geographical patterns did not differ greatly, except for northern land regions where data mismatches were smaller in Carbon-Tracker. This may be due to the higher spatial and temporal resolutions in carbon fluxes for CarbonTracker, especially in land regions.

Result and conclusion
We have shown here the ability to estimate global distributions of CO 2 concentrations over more than two decades with a new observational data and site screening technique. The estimated global CO 2 fluxes were quite similar to those shown previously (Stephens et al., 2007). Due to the specifications of our transport model regarding strong vertical mixing, the estimated uptake was relatively weak in the northern lands but relatively strong in the tropics. In the Southern Hemisphere and oceanic areas, CO 2 concentrations were estimated to precisions of 0.5-1.0 ppm. Our analysis is characterized by dependence on the use of observational data and on its selection of sites, especially the use of more marine boundary and less land sites. Theoretically, more land sites are included by adopting an analysis scheme with higher horizontal and temporal resolutions. This method of data selection may easily be introduced to other inversion method studies without developing new methods for site and data selection or data quality control.
The first experiment showed that interannual variability in meteorology was important in estimating CO 2 fluxes; the averaged climatological meteorology produced the equivalent results, given the current specifications of our transport model. Modified transport processes, especially a vertical transport scheme, would improve a performance in using interannual meteorology.
The second experiment showed the stability of our analysis. In areas of densely distributed observational sites, the precision of the analysis was stable, independent of an increase or decrease in the number of observational sites. However, in areas of sparse distribution, analysed CO 2 concentrations were sensitive to changes in the observational network especially in the northern hemisphere. To perform highly precise and stable analyses, data from other types of observation should be introduced, including continuous measurements on board aircraft and observations from satellites. The former produces observational data with a similar precision to observations on the surface (Machida et al., 2008), and the latter covers almost the whole world.
The third experiment showed that our analysis improved as observational data increased, beginning in 1993, and that our analysis was highly precise and stable especially in the Pacific area. To make such evaluations for other areas, other types of observational data should be introduced, as described earlier.
When we compared our analysis with that of CarbonTracker in terms of data mismatches, we found that the performance of our analytical method was sufficiently high. Considering the current observational network, the quality of our analysis was constantly high for more than 20 years.

Summary
A new technique has been developed to treat observational data in analysing global CO 2 concentrations. This technique does not require any other method for the quality control of observational data. Using this technique, non-processed observational data were easily analysed to estimate CO 2 fluxes and distributions for more than two decades in accordance with an inversion setup without developing a quality control system for observational data. This is an important feature of our analysis that researchers can make use of this technique with their own inversion setup. With the high-resolution transport model and flux regions, we can use observational data that were not available in the current inversion. However, we may result in rejecting site/data that should be used in the inversion due to imperfect inversion setups (e.g. transport model and prior surface fluxes). We need to improve further our inversion setup to obtain robust carbon fluxes. The estimated CO 2 fluxes were similar in quantity to those previously reported. In the 1990s, CO 2 uptakes totalled about 1.4 PgC yr −1 on land areas and 1.8 PgC yr −1 in the oceans. The uncertainty of global CO 2 flux estimation was about 0.3 PgC yr −1 . Estimated CO 2 uptakes were relatively weak in the northern hemisphere and strong in the tropics. These results are almost consistent with previous studies. The analysed CO 2 concentrations had precisions of about 0.5-1.0 ppm in the marine boundary layer. The mismatches between the observed and estimated concentrations were generally larger in northern land areas than in other areas. Several modifications would reduce the mismatches in our analysis. The first is the inclusion of Table 4. Long-term averaged CO 2 flux estimates from this study for the 1980s (1985-1990), 1990s (1991-2000) and 2000s (2001-2007)   additional types of observational data by aircraft (Machida et al., 2008) and satellites (GOSAT). Secondly, inversion settings such as increased number of regions and reduced error in the presubtracted fluxes should improved the precision of the analysis.
Thirdly, our transport model should interpolate the limited meteorological parameters to simulate the processes of tracer transport more precisely, by combining the current off-line model with GCM MJ98 (Shibata et al., 1999), which calculates GCM parameters without temporal or spatial interpolations. The EnKF would be an option for the analysis scheme to minimize computational costs and to realize a more precise analysis (Miyazaki, 2009).