Quantifying the benefit of A-SCOPE data for reducing uncertainties in terrestrial carbon fluxes in CCDAS

ESA’s Earth Explorer candidate mission A-SCOPE aims at observing CO2 from space with an active LIDAR instrument. This study employs quantitative network design techniques to investigate the benefit of A-SCOPE observations in a Carbon Cycle Data Assimilation System. The system links the observations to the terrestrial vegetation model BETHY via the fine resolution version of the atmospheric transport model TM3. In the modelling process chain the observations are used to reduce uncertainties in the values of BETHY’s process parameters, and then the uncertainty in the process parameters is mapped forward to uncertainties in both long-term net carbon flux and net primary productivity over three regions. A-SCOPE yields considerably better reductions in posterior uncertainties than the ground-based GLOBALVIEW station network. This is true for assimilating monthly mean values and instantaneous values, and it is true for two potential vertical weighting functions. The strength of the constraint through A-SCOPE observations is high over the range of observational uncertainties.


Introduction
CO2 is the most important anthropogenic greenhouse gas, and the continued increase of atmospheric CO2 is accepted to be the major reason for present, observed global warming. The increase of CO2 is clearly of anthropogenic origin, but it is tempered by uptake from natural reservoirs. Therefore, understanding and predicting the cycling of this gas through natural and human-controlled systems is a matter of special importance. Despite considerable advances, major questions remain about the magnitude and distribution of present sources and sinks of this gas as well as their evolution, and their controlling mechanisms, especially their response to climate change. A clear requirement is the development of monitoring tools to ascertain the current sources and sinks and their changes. Such changes may occur as a result of climate change or of deliberate mitigation strategies. Hence, the monitoring of CO2 and other greenhouse gases is immensely important, both for fundamental Earth System Science (as a necessary complement to global modelling), and for international climate policy.
A first awareness of global change began in the late 1950s when Charles David Keeling from the Scripps Institution of Oceanography, San Diego, developed a technique precise enough to detect the rise in atmospheric CO2. Keeling's time series of atmospheric CO2 concentration measurements from Mauna Loa (Keeling and Whorf, 2002) is an icon of contemporary environmental science. Major observational programs were subsequently put in place by various institutions from many different countries to create a global network, providing spatial gradients of CO2 concentrations to constrain location and strength of CO2 sources and sinks.
The current ground-based in situ measurement network is mostly based on flask samples which are collected weekly to biweekly. The sampling stations are mainly located at remote sites to sample the CO2 concentration of the marine boundary layer. A single flask measurement can be done with a high accuracy (≈ 0.2 ppmv) such that a homogenized and gap-filled data product of this network like GLOBALVIEW (GLOBALVIEW-CO2, 2004) reports uncertainties of 0.5-1 ppmv depending on the station location. However, the temporal (weekly to biweekly) and spatial (large gaps, for instance in the tropics) resolution is fairly poor. In the last few years new continuous atmospheric CO2 observations have become available at some of these remote measurement stations to overcome the limitation in the temporal domain. Nevertheless, the network is still too sparse to quantify CO2 sources and sinks on a regional to continental scale. This reflects the underdetermined nature of the inverse problem of inferring two-dimensional surface flux fields from point measurements (Kaminski and Heimann, 2001).
In addition to the ground-based flask and continuous station measurement system, there are a number of tall tower sites around the globe, which also provide in situ continuous measurements of CO2. These towers supply measurements of CO2 in the continental boundary layer that are more representative of regional to continental scale fluxes. For instance, NOAA is running around ten towers in the United States, and the EU's CHIOTTO and then CarboEurope projects have set up around ten towers in Europe and Siberia. It is expected that this network will further expand in the future. Furthermore, there is an ever expanding network (see http://www.fluxnet.ornl.gov) providing local-scale direct flux observations via eddy covariance techniques.
Remote sensing of atmospheric CO2 from space has the potential to deliver the data needed to substantially reduce the currently large uncertainties in the spatial and temporal distribution of CO2 sources and sinks. Several sensitivity studies have evaluated the improvement in atmospheric inversion simulations of CO2 that would be enabled by precise, global space-based integrated column CO2 data. A pioneering study was performed by Rayner and O'Brien (2001), who established the required precision for column-integrated CO2 concentration data to be useful in constraining surface sources. Using an atmospheric synthesis inversion, they found that the precision of monthly averaged column data (uniform weighting function) must be better than 2.5 ppmv (1.5 ppmv for oceanic coverage only) on an 8° × 10° footprint to achieve performance comparable with the existing surface network. They also reported that space-based column CO2 observations with 1 ppmv precision were predicted to substantially reduce the uncertainties of inferred annual mean CO2 fluxes from greater than 1.2 GtC region⁻¹ yr⁻¹ to less than 0.5 GtC region⁻¹ yr⁻¹ when averaged over the annual cycle and for regions at the scale of continents and ocean basins. Since then further studies (Pak and Prather, 2001; Rayner et al., 2002; Patra et al., 2003) have essentially confirmed the message that the overall precision of the measurements, including instrument uncertainties, noise, and uncertain atmospheric properties, needs to be better than 1% (or 3.6 ppmv) to provide a constraint on CO2 fluxes comparable with the ground-based network.
More recently some studies have taken into account the characteristics and therefore potential benefits of different types of satellite instruments in synthetic inversion approaches. For example, Houweling et al. (2004) have distinguished between thermal infrared (AIRS) and near infrared (SCIAMACHY, OCO) spectrometers. The thermal infrared instrument AIRS has the advantage that it can measure the entire globe independent of daylight or surface albedo, and thus has a relatively high number of measurements. Because AIRS does not depend on a high surface albedo, it performs notably better over the oceans than SCIAMACHY. In contrast, OCO is able to measure in sunglint mode over the oceans. A crucial factor is the ability to measure at low altitudes; therefore the near infrared instruments SCIAMACHY and OCO will certainly be more favourable as the thermal infrared instrument has a rather limited sensitivity to CO2 near the surface. Their overall conclusion is that OCO will be the most promising satellite concept of those tested. Miller et al. (2007) and Chevallier et al. (2007) specifically looked at the contribution of OCO column integrated CO2 retrievals (XCO2) to the reduction of uncertainties in the estimation of CO2 sources and sinks. Both showed that, given the estimated error characteristics of the OCO instrument, OCO observations would significantly reduce the uncertainties of CO2 surface fluxes, in the case of Chevallier et al. (2007) even at weekly timescale and grid point resolution of the underlying transport model (2.5° × 3.75°) over land (reduction of 15-40% of prior uncertainties) and on monthly and basin-wide resolution over oceans (reduction of 20-40% of prior uncertainties). Two further recent studies also addressed the assessment of the OCO mission (Baker et al., 2008; Feng et al., 2009).
Unfortunately the launch of NASA's OCO mission in February 2009 failed. However, another satellite mission specifically aimed at measuring CO2 from space, the GOSAT mission of the Japan Aerospace Exploration Agency, was successfully launched in January 2009. GOSAT carries both a thermal and a near infrared spectrometer. The thermal spectrum provides similar information to what AIRS has been measuring, whereas the near infrared provides information about the total CO2 column, which is more important for flux estimation. Chevallier et al. (2009) have quantified the potential of GOSAT data in a sensitivity study similar to the above mentioned studies. They found that GOSAT should significantly reduce uncertainties in CO2 flux estimations over terrestrial vegetated areas at the scale of weeks and a few hundred kilometres; over the oceans improvements are only seen over larger scales (e.g. ocean basins and over a year).
The above-mentioned approaches to convert atmospheric CO2 concentrations into estimates of surface CO2 fluxes are all based on 'top-down' inverse modelling of atmospheric transport. While this approach yields insights into the recent past and present, it cannot have predictive ability for the future. Future predictions, in contrast, are based on results from 'bottom-up' process-based model simulations. These simulations, however, lack the rigorous inclusion of the observational constraint.
An alternative method that is fully consistent with both the philosophy of inverse modelling, and the approach of predictive modelling employs techniques from variational data assimilation. In a first step both process parameters and initial conditions are estimated with the best possible accuracy using the best available observational constraints at the appropriate scale of the problem. A second (prediction or 'prognostic') step is then using not only standard modelling techniques employing the optimized parameters and initial conditions to arrive at a forecast, but also techniques of uncertainty propagation to estimate uncertainty ranges for the prediction (Scholze et al., 2007). This is a significant advance over current modelling techniques.
The Carbon Cycle Data Assimilation System (CCDAS) (Scholze, 2003; Rayner et al., 2005) is so far a unique example of the above outlined approach. It builds upon the study by Kaminski et al. (2002) who have used the seasonal cycle of atmospheric CO2 to constrain a simplified terrestrial biosphere model. In CCDAS this simplified model is replaced by the more comprehensive, prognostic terrestrial biosphere model BETHY (Knorr, 2000; Knorr and Heimann, 2001). CCDAS uses a reduced version of BETHY which has no phenology scheme and no water balance. Instead it uses pre-optimized leaf area index (LAI) and plant available soil moisture. Global vegetation is mapped onto 13 plant functional types (PFT). Fifty-seven control parameters affect the photosynthesis scheme, and both the autotrophic and heterotrophic respiration schemes. The assimilation of time series of atmospheric CO2 flask data in CCDAS is controlled by a gradient algorithm, which searches the parameter space by iterative evaluation of a cost function and its gradient with respect to the parameters. The gradient information is provided efficiently by the model's adjoint. At the cost function minimum, an uncertainty of the estimated parameter set that is consistent with assumed observational and model uncertainties is approximated by the inverse of the function's full Hessian matrix, evaluated for the optimal parameter set up to machine precision. The calibration process, hence, delivers a set of optimized parameters, together with their uncertainties (see Fig. 1).
CCDAS makes considerable use of derivative code, i.e. the adjoint code providing the gradient of the cost function, the Hessian code used to approximate parameter uncertainties, and Jacobian code to propagate these uncertainties forward. All derivative code is generated directly from the model's source code (Kaminski et al., 2003) by the automatic differentiation tool Transformation of Algorithms in Fortran [TAF, Giering and Kaminski (1998)]. Since CCDAS has only 57 parameters, the evaluation of the full Hessian and the full Jacobian are computationally feasible.
In this study, we employ CCDAS to explore the benefit of the concept for ESA's Earth Explorer candidate mission A-SCOPE, which differs from the above mentioned concepts for observing CO2 from space (AIRS, OCO and GOSAT) in that it relies on an active LIDAR instrument. NASA is pursuing a similar concept with the ASCENDS mission (Michalak et al., 2008). The advantages of an active mission are that it does not require the sun as a light source, and can therefore provide both day and night, all-seasons and all-latitude measurements and thus will provide an increase in the number of observations by a factor of two to three compared to passive missions. But more importantly, such an active mission concept provides a direct measurement of the atmospheric path and thus can assure the observation of the entire atmospheric column. This is an advantage over the OCO and GOSAT concepts, which are particularly sensitive to the presence of aerosols, leading to potentially large gaps in regions with relatively persistent high levels of aerosols, such as some tropical regions (e.g. India), southeast Asia, or the Sahara. Two potential wavelengths, namely 1.6 µm and 2.0 µm, have been identified because of their high signal-to-noise ratio and favourable, near-uniform vertical weighting functions (see ESA, 2008 for details on the A-SCOPE mission concept).
As a measure for the performance of A-SCOPE data we use the posterior uncertainty on regionally aggregated surface fluxes. Methodologically we are addressing a network design problem in a quantitative way. Quantitative network design was introduced to biogeochemistry by Rayner et al. (1996), who used an inverse model of the atmospheric transport to design surface networks, and it was also Rayner and O'Brien (2001) who first applied the approach to mission design. Kaminski et al. (2002) demonstrated the application of quantitative network design techniques for assimilating a synthetic flux measurement together with global atmospheric CO2 samples and vegetation greenness approximated by AVHRR observations.
The remainder of this paper is organized as follows. Section 2 introduces quantitative network design methods and the extensions of CCDAS that were necessary to conduct the study. Section 3 describes and discusses the experiments that have been performed. Section 4 summarizes the main findings and draws conclusions.

Methodology
Methodologically, assessing the potential of a particular data stream in terms of quantifying a target quantity belongs to the class of so-called network design problems. This section gives a brief introduction to the mathematical formalism for quantitative network design, and then describes the extensions of CCDAS that were required to conduct the study.

Brief introduction to quantitative network design
Quantitative network design uses data assimilation systems and is, thus, closely linked to data assimilation. Hence, our introduction (following Kaminski and Rayner, 2008) starts off with the formalism behind Fig. 1. In the formulation of the inverse problem it is convenient to quantify the state of information on a specific physical quantity by a probability density function (PDF): the prior information is quantified by a PDF in the space of control variables (here, process parameters of BETHY and the initial concentration), the observational information by a PDF in the space of observations, and so on. Tarantola (1987) describes this probabilistic framework in detail and provides examples. Enting (2002) introduces the same framework with an exhaustive overview on applications to biogeochemistry.
If the input to the inverse problem can be characterized by Gaussian PDFs, the model that links control variables to observations is linear, and the model error follows a Gaussian PDF as well, then the posterior information is also quantified by a Gaussian PDF (see Tarantola, 1987). The mean of that PDF is given by
$$x = x_0 + \left[ M^T C(d)^{-1} M + C(x_0)^{-1} \right]^{-1} M^T C(d)^{-1} (d - M x_0) \tag{1}$$
and the covariance of its uncertainty is given by
$$C(x)^{-1} = M^T C(d)^{-1} M + C(x_0)^{-1}, \tag{2}$$
where $M$ denotes the Jacobian matrix of the model, and $x_0$ and $C(x_0)$ the mean and covariance of the prior information's PDF. $d$ and $C(d)$ denote the mean and the covariance of uncertainty of the observations. In the inversion procedure the corresponding PDF has to reflect errors in both the observational process and our ability to correctly model the observations. We achieve this via
$$C(d) = C(d_{obs}) + C(d_{mod}) \tag{3}$$
and by subtracting the mean model and observational errors from $M x_0$ and $d$, respectively. Note that, in practice, these mean errors are usually difficult to assess.
It is easy to verify that $x$ (from eq. 1) minimizes the cost function
$$J(x) = \frac{1}{2} \left[ (M x - d)^T C(d)^{-1} (M x - d) + (x - x_0)^T C(x_0)^{-1} (x - x_0) \right] \tag{4}$$
(the exponent of the Gaussian posterior PDF) and that the Hessian matrix $H(x)$ of $J$, i.e., the matrix composed of its second partial derivatives $\partial^2 J / \partial x_i \partial x_j$, is constant and given by
$$H(x) = M^T C(d)^{-1} M + C(x_0)^{-1} = C(x)^{-1}. \tag{5}$$
If the model is non-linear or any of the PDFs of the inputs are non-Gaussian, eqs (1) and (2) do not hold anymore. But we can still approximate the posterior PDF by a Gaussian with mean $x$ given by the minimum of eq. (4) (with the matrix $M$ generalized to the non-linear model $M(x)$) and covariance given by eq. (5).
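As an illustration of eqs (1), (2) and (5), the following minimal sketch evaluates the posterior mean and covariance for a linear model with two control variables and three observations. The Jacobian, the prior, the data values and the helper functions (`matmul`, `transpose`, `add`, `inv2`) are illustrative assumptions and are not taken from CCDAS.

```python
# Sketch of eqs (1), (2) and (5) for a linear model with two control variables.
# All numbers are illustrative assumptions, not CCDAS values.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

M = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # Jacobian: 3 observations, 2 parameters
x0 = [[0.0], [0.0]]                        # prior mean as a column vector
C_x0_inv = [[1.0, 0.0], [0.0, 1.0]]        # inverse prior covariance (unit variances)
d = [[1.0], [3.0], [4.0]]                  # observations
sigma_d = 0.5                              # data uncertainty, equal for all observations
C_d_inv = [[1.0 / sigma_d**2 if i == j else 0.0 for j in range(3)] for i in range(3)]

Mt_Cd_inv = matmul(transpose(M), C_d_inv)
hessian = add(matmul(Mt_Cd_inv, M), C_x0_inv)               # Hessian, eq (5)
C_x = inv2(hessian)                                         # posterior covariance, eq (2)
misfit = [[di[0] - mi[0]] for di, mi in zip(d, matmul(M, x0))]
x_post = add(x0, matmul(C_x, matmul(Mt_Cd_inv, misfit)))    # posterior mean, eq (1)

print("posterior mean:", [row[0] for row in x_post])
print("posterior variances:", C_x[0][0], C_x[1][1])
```

In this toy set-up both posterior variances fall below the prior variance of 1, which is the uncertainty reduction that the formalism quantifies.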
In practice, any variational data assimilation system, for example, in operational numerical weather prediction or oceanography, is based on eq. (4). The optimization mode of CCDAS uses an iterative procedure to minimize the cost function of eq. (4), which yields x, and computes C(x) via eq. (5). As long as the uncertainties in the individual data streams are independent, the contribution of each of them to the right hand side of eq. (4), and, hence, also to eq. (5), can be quantified by a separate term in the sum. In this formalism, the contribution of a synthetic data set (e.g. synthetic A-SCOPE observations) is to be handled as follows:
(1) The mean value is generated with the model itself, that is, the equation d = M(x) is applied, where x is the best possible parameter value, taken from a minimization of eq. (4) for the existing observational network.
(2) The covariance of uncertainties (eq. 3) is specified such as to reflect the expected characteristics of the observational products generated by the instrument and our ability to simulate them.
By construction of the synthetic data, their cost function contribution (eq. 4) at the optimum, x, is zero but positive in the neighbourhood of x. This means the synthetic data increase the curvature of the cost function. In mathematical terms, the curvature is expressed by the Hessian in eq. (5), which takes full account of the specified uncertainty in the synthetic data and their sensitivity to the model parameters. The effect of the synthetic data is a reduction of posterior parameter uncertainty.
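The effect described above can be made concrete with a one-parameter sketch. The prior uncertainty, the model sensitivity `m` and the data uncertainty are hypothetical values chosen for illustration only.

```python
# One-parameter sketch: synthetic data add no misfit at the optimum but add curvature.
# sigma_prior, the sensitivity m and sigma_d are illustrative assumptions.

def data_term(x, x_opt, m, sigma_d):
    """Cost contribution of one synthetic observation generated as d = m * x_opt."""
    return 0.5 * ((m * x - m * x_opt) / sigma_d) ** 2

def posterior_sigma(sigma_prior, m, sigma_d):
    """Inverse square root of the curvature (second derivative) of the cost function."""
    curvature = 1.0 / sigma_prior**2 + m**2 / sigma_d**2
    return curvature ** -0.5

x_opt, m, sigma_d, sigma_prior = 1.2, 2.0, 1.0, 0.5
at_optimum = data_term(x_opt, x_opt, m, sigma_d)
nearby = data_term(x_opt + 0.1, x_opt, m, sigma_d)
sigma_post = posterior_sigma(sigma_prior, m, sigma_d)

print(at_optimum, nearby)   # zero at the optimum, positive in its neighbourhood
print(sigma_post)           # smaller than sigma_prior
```

The posterior uncertainty depends only on the sensitivity and the specified data uncertainty, not on the synthetic data values themselves.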
The second step in Fig. 1 is the estimation of a diagnostic or prognostic target quantity $y$, in our case some spatio-temporal mean carbon flux. The target quantity's PDF can be approximated by a Gaussian with mean
$$y = N(x) \tag{6}$$
and the covariance
$$C(y) = D \, C(x) \, D^T + C(y_{mod}), \tag{7}$$
where $N$ is the model (in Fig. 1 denoted as diagnostic/prognostic model) that maps the control variables onto the target quantity, $D$ is its linearization around the mean of the posterior PDF of the control variables, also denoted as the Jacobian matrix of $N$, and $C(y_{mod})$ is the uncertainty in the model result from errors in the model. Only if $y$ coincides with one of the observations used in the inversion step is this uncertainty already accounted for in $C(x)$, and we omit the $C(y_{mod})$ contribution. If $N$ is linear and the posterior PDF of the control variables Gaussian, then the PDF of the target quantity is Gaussian as well, and completely described by eqs (6) and (7).
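For a scalar target quantity the propagation of eq. (7) reduces to a quadratic form in the row vector D. The sketch below assumes a hypothetical two-parameter posterior covariance and Jacobian; `target_sigma` is an illustrative helper, not a CCDAS routine.

```python
import math

def target_sigma(D, C_x, sigma_y_mod=0.0):
    """sqrt(D C(x) D^T + sigma_y_mod^2) for a scalar target and a row vector D."""
    n = len(D)
    variance = sum(D[i] * C_x[i][j] * D[j] for i in range(n) for j in range(n))
    return math.sqrt(variance + sigma_y_mod**2)

# Illustrative (assumed) posterior parameter covariance and target Jacobian.
C_x = [[0.04, 0.01], [0.01, 0.09]]
D = [2.0, -1.0]

sigma_perfect_model = target_sigma(D, C_x)            # parametric uncertainty only
sigma_with_model_error = target_sigma(D, C_x, 0.2)    # adds an assumed model error term
print(sigma_perfect_model, sigma_with_model_error)
```

The off-diagonal covariance entries matter here: with a positive parameter correlation and sensitivities of opposite sign, part of the uncertainty cancels in the target quantity.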

Including A-SCOPE data in CCDAS
One of the objectives of this study is to assess the data uncertainty for A-SCOPE that is required to achieve a given posterior uncertainty $\sigma_y$ in a scalar target quantity. Since we will use a diagonal $C(d)$, with only two different entries for data over ocean and land, respectively denoted by $\sigma^2_{d,O}$ and $\sigma^2_{d,L}$, this can be done in a particularly efficient way.
Denoting the diagonal entries of $C(d)$ by $\sigma^2_{d,i}$, and the corresponding components of (the vector valued function) $M$ and $d$ respectively by $M_i$ and $d_i$, we rewrite the first term of eq. (4) in least squares form
$$J(x) = \frac{1}{2} \left[ \sum_i \frac{(M_i(x) - d_i)^2}{\sigma_{d,i}^2} + (x - x_0)^T C(x_0)^{-1} (x - x_0) \right]. \tag{8}$$
Taking second derivatives (one with respect to $x_k$ and one with respect to $x_l$) yields
$$\frac{\partial^2 J}{\partial x_k \partial x_l} = \sum_i \frac{1}{\sigma_{d,i}^2} \left[ \frac{\partial M_i}{\partial x_k} \frac{\partial M_i}{\partial x_l} + (M_i(x) - d_i) \frac{\partial^2 M_i}{\partial x_k \partial x_l} \right] + \left[ C(x_0)^{-1} \right]_{kl}. \tag{9}$$
For the synthetic data the misfit $M_i(x) - d_i$ vanishes at the optimum, so only the products of first derivatives remain in the sum. Accumulating all summands for data over ocean and all summands for data over land, inserting into eq. (5) yields
$$C(x)^{-1} = H_0 + \frac{1}{\sigma_{d,O}^2} H_{A,O} + \frac{1}{\sigma_{d,L}^2} H_{A,L}, \tag{10}$$
where the Hessian contribution of the prior is denoted by $H_0$ and the contribution by A-SCOPE is decomposed into two terms $H_{A,O}$ and $H_{A,L}$ determined by model characteristics and two factors depending on the data uncertainties over ocean and land. Inserting in eq. (7) yields
$$\sigma_y^2 = D \left( H_0 + \frac{1}{\sigma_{d,O}^2} H_{A,O} + \frac{1}{\sigma_{d,L}^2} H_{A,L} \right)^{-1} D^T + \sigma_{y,mod}^2, \tag{11}$$
where with our scalar target quantity, the Jacobian $D$ takes the form of a row vector. Thanks to this decomposition we can precompute $H_{A,O}$ and $H_{A,L}$, such that a plot of $\sigma_y$ over $\sigma_d$ can be produced by pure matrix algebra without further CCDAS simulations.
For assessing the effect of A-SCOPE as an extension of the ground-based network, we can repeat the above algebra but starting from a cost function that has in addition to eq. (8) a third term representing the fit to the ground-based network. This situation is also covered by eq. (11), if we generalize the meaning of $H_0$ to represent the Hessians of all cost function contributions except the A-SCOPE term. In other words, in this situation $H_0$ denotes the Hessian when inverting against data from the ground-based network only.
In summary, the simulations require code for the computation of the Hessian contributions $H_0$, $H_{A,O}$ and $H_{A,L}$ and of the Jacobian $D$.
Figure 2 shows the forward modelling chain of the extended CCDAS. The Biosphere Energy Transfer HYdrology scheme (BETHY, Knorr, 2000; Knorr and Heimann, 2001) is used to simulate the surface fluxes of CO2 from the terrestrial vegetation.
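The practical benefit of the decomposition is that the target uncertainty can be re-evaluated for any pair of data uncertainties from stored matrices. The sketch below assumes hypothetical 2 x 2 matrices for the prior and the two A-SCOPE Hessian contributions and a hypothetical Jacobian D; the factor of three between ocean and land uncertainties is likewise only an example, and the terrestrial model error term is left out.

```python
import math

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def sigma_y(D, H0, H_AO, H_AL, sigma_dO, sigma_dL):
    """Target uncertainty from precomputed Hessian contributions (no model error term)."""
    H = [[H0[i][j] + H_AO[i][j] / sigma_dO**2 + H_AL[i][j] / sigma_dL**2
          for j in range(2)] for i in range(2)]
    C_x = inv2(H)
    return math.sqrt(sum(D[i] * C_x[i][j] * D[j] for i in range(2) for j in range(2)))

# Illustrative (assumed) matrices; in practice they come from derivative code.
H0 = [[1.0, 0.0], [0.0, 1.0]]      # prior contribution
H_AO = [[1.0, 0.0], [0.0, 0.0]]    # synthetic data over ocean, unit uncertainty
H_AL = [[2.0, 1.0], [1.0, 5.0]]    # synthetic data over land, unit uncertainty
D = [1.0, 1.0]

# Sweep the data uncertainty: only small matrix operations, no new model runs.
curve = [(s, sigma_y(D, H0, H_AO, H_AL, 3.0 * s, s)) for s in (4.0, 2.0, 1.0, 0.5)]
for s, value in curve:
    print(f"sigma_d,L = {s:4.1f}  sigma_y = {value:.4f}")
```

For very large data uncertainties the result approaches the prior-only value, and it decreases monotonically as the data become more precise.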

Representation of atmospheric transport
Atmospheric CO2 concentrations are modelled by the atmospheric transport model TM3 (Heimann and Körner, 2003), in its fine 4° × 5° horizontal resolution, with 19 vertical σ levels. As in Houweling et al. (2004), the model uses meteorological driving fields for the year 2000 as provided by Kalnay et al. (1996). Owing to the linearity of the atmospheric transport of CO2, the vector $c$ of the changes in the total column CO2 (XCO2) at each observational location and time in response to a given flux field $f$ can be represented by its Jacobian matrix $A$:
$$c = A f. \tag{12}$$
The flux field $f$ is represented in the full 4° × 5° resolution of the transport model and monthly temporal resolution. To compute one column of the Jacobian matrix corresponding to a given surface grid cell and month, the model is run with a unit emission in that grid cell and month (Enting, 2002). For each component of $c$, the simulated XCO2 value corresponding to the observational location and time is recorded.
The above procedure would require one model run per grid cell and month in the period from the start of the CCDAS integration until the month in which the last XCO2 observation takes place. To reduce the number of required model runs, we make two simplifications:
(1) For fluxes more than 4 yr prior to a given XCO2 observation we assume that their CO2 emission is completely mixed within the global atmosphere. This means all columns in $A$ corresponding to fluxes more than 4 yr ago contain a constant $a$ that quantifies the response in a completely mixed atmosphere (i.e. in a one box model).
(2) For fluxes less than 4 yr prior to a given XCO2 observation but not in the same month or the 2 months before, we assume that their CO2 emission has the same effect as all other fluxes emitted in the same latitude band. This means all columns in $A$ corresponding to fluxes for a given month in said time span and in a given latitude band have the same response. Here, we use 8 latitude bands.
Hence, to simulate $c$, the response to the fluxes of the most recent 3 months, $f_f$, is represented in full spatial resolution (full Jacobian $A_f$), the response to the fluxes of the preceding 4 yr (actually minus 3 months), $f_l$, in latitudinal band resolution (latitudinal Jacobian $A_l$), and the response to the fluxes of all previous years, $f_g$, by a single number (global Jacobian $A_g$), i.e.
$$c \approx A_f f_f + A_l f_l + A_g f_g. \tag{13}$$
Figure 3 illustrates how the Jacobian is composed of blocks for full, latitudinal, and global Jacobians. The first row of blocks belongs to the earliest observations and the last row to the last observations. The first column belongs to the earliest fluxes and the last column to the last fluxes. Each block represents the impact of a particular month of fluxes on a particular month of concentrations. Concentrations are column integrated, taking the spectral weighting function and the orbit parameters into account.
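The block structure can be sketched as a simple rule that assigns each pair of observation month and flux month to one of the three Jacobian types. The function `block_type` and the exact boundary at a lag of 48 months are assumptions made for illustration; the text only distinguishes fluxes less than and more than 4 yr before the observation.

```python
from collections import Counter

FULL_MONTHS = 3            # same month and the 2 months before: full resolution
LATITUDINAL_MONTHS = 48    # assumed boundary: lags below 48 months use latitude bands

def block_type(obs_month, flux_month):
    """Jacobian type used for the response at obs_month to fluxes in flux_month."""
    lag = obs_month - flux_month
    if lag < 0:
        return "none"          # fluxes after the observation have no effect
    if lag < FULL_MONTHS:
        return "full"
    if lag < LATITUDINAL_MONTHS:
        return "latitudinal"
    return "global"

# 20 yr of monthly fluxes, observations in the last year (months counted from 0).
N_FLUX_MONTHS = 240
obs_months = range(N_FLUX_MONTHS - 12, N_FLUX_MONTHS)
counts = Counter(block_type(o, f) for o in obs_months for f in range(N_FLUX_MONTHS))
print(counts)
```

Most blocks fall into the global category, which is why the single-number representation saves so many transport model runs.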
Fig. 3. Structure of the Jacobian transport matrix.
Tellus 62B (2010), 5
We illustrate the computational savings through the matrix approximation in a sample calculation with 20 yr of fluxes and observations in the last year, the set-up used for this study. To provide the full Jacobian, $A_f$, we perform 12 sets of runs over 3 months each. The first covers the period from January to March, the second the period from February to April, etc. Each set consists of as many runs as we have grid cells, i.e. 72 × 46 (excluding the poles). We record the concentrations over the 3 months. The annual periodicity of the transport model's meteorological driving data is exploited in the following way: The simulation starting in December provides the response at the last concentration month (i.e. December) to fluxes in the same month, but it is also used to provide the response at the first concentration month (January) to fluxes 1 month prior. Without the periodicity we would need two additional sets of runs, for November and December in the year before the observations start. On the other hand, the sets of runs starting in November and December of the last year can be restricted to an integration period of 2 and 1 months, respectively. In total the computation of $A_f$ requires 39 744 runs over 3 months each, that is, 9936 yr of transport model simulation. To provide the latitudinal Jacobian, $A_l$, we perform 12 sets of runs over 48 months each, where each set consists of as many runs as we use latitudinal bands in which all fluxes produce the same response, in our case 8. In total the computation of $A_l$ requires 96 runs over 48 months each, i.e. 384 yr of transport model simulation. Finally, the global Jacobian, $A_g$, requires a single run over, say, 5 yr. By contrast, without the approximation we would require 244 sets of runs, with integration periods decreasing from 244 months to 1 month. Each set would consist of as many runs as we have grid cells, that is, 72 × 46. This would produce a total of about 8.2 million years of transport model simulation.
The transport Jacobian is computed in two versions. The first version (instantaneous Jacobian) uses instantaneous samples at days 7, 14, 21, 28 of each month, and at 0.00 and 12.00 GMT. Note that for our assessment the modelled Jacobian does not need to match the exact date of the observation but only a meteorological situation typical for that time of the year. We can then assign every XCO2 observation to the closest date in the record. The alternative version of the Jacobian uses monthly mean concentrations.
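The run counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; all input numbers are those given in the text.

```python
# Arithmetic behind the quoted simulation costs (all inputs taken from the text).
GRID_CELLS = 72 * 46                      # one run per surface grid cell

full_runs = 12 * GRID_CELLS               # 12 sets of runs for the full Jacobian
full_years = full_runs * 3 // 12          # each run covers 3 months

lat_runs = 12 * 8                         # 12 sets of runs, 8 latitude bands
lat_years = lat_runs * 48 // 12           # each run covers 48 months

global_years = 5                          # a single run over about 5 yr

# Without the approximation: 244 sets, integration periods from 244 months down to 1.
brute_months = sum(range(1, 245)) * GRID_CELLS
brute_years = brute_months // 12

approx_years = full_years + lat_years + global_years
print(full_runs, full_years, lat_runs, lat_years)
print(brute_years, approx_years, brute_years / approx_years)
```

The last line prints the ratio between the two totals, which gives the saving achieved by the block approximation.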

Uncertainties in observations and model
For both Jacobian versions the specification of the data uncertainty is complicated by the fact that the simulated and observed quantities differ. The observed quantity is XCO2 for a short interval in time and space (almost a point measurement) whereas the simulated quantity is a mean XCO2 value. The monthly mean Jacobian simulates a mean over a horizontal grid cell and 1 month, whereas the instantaneous Jacobian simulates the mean over a horizontal grid cell and the model time step of 30 min.
Computing the difference between observed and simulated quantities brings in an additional source of uncertainty reflecting the error we make when transforming one quantity into the other. This error is called representation error (see, e.g. Heimann and Kaminski, 1999). We take this uncertainty into account by including an additional term $C(d_{rep})$ in eq. (3):
$$C(d) = C(d_{obs}) + C(d_{mod}) + C(d_{rep}). \tag{14}$$
$C(d_{rep})$ is hard to specify. We use a diagonal matrix, with the square of a constant $\sigma_{rep}$ on the diagonal. We derive a value based on the conservative assumption of $n$ point samples of a Gaussian distributed XCO2 within the grid cell with standard deviation $\sigma_{het}$ ('het' standing for heterogeneity):
$$\sigma_{rep} = \frac{\sigma_{het}}{\sqrt{n}}, \tag{15}$$
where $\sigma_{het}$ can, in principle, be observed. We use a conservative value of 3 ppmv for the total column, to reflect the fact that we also sample downstream of large fossil fuel emissions or over forests in the growing season. For the monthly mean Jacobian we use $n = 30$, which is about the (temporally and spatially varying) average sample size per horizontal grid cell and month, as derived from orbit simulations using MODIS cloud cover. For the instantaneous Jacobian $n = 1$, because we use each sample individually. A potential correlation between cloud cover and XCO2 could be taken into account by a methodological refinement. First, $C(d_{rep})$ would not be diagonal (uncertainty correlation across grid cells). Second, eq. (15) is too optimistic, because it is based on uncorrelated uncertainties of samples within the same grid cell. None of this is addressed in this study. For our base case with a vertical weighting function based on the 1.6 µm band (Ehret et al., 2008) we use observational uncertainties of 0.5 ppm over land and 1.5 ppm over the ocean, respectively denoted $\sigma_{obs,L}$ and $\sigma_{obs,O}$. This corresponds to the target and threshold requirements of the A-SCOPE mission as given in the A-SCOPE Report for Assessment (ESA, 2008). For the 2.0 µm band we increase the observational uncertainties by a factor of two.
Here again we are slightly optimistic by neglecting correlations in the observational uncertainties, that is, by assuming only random errors. This is not too severe, because the A-SCOPE Report for Assessment (ESA, 2008), assumes only 10% of the observational uncertainty to be systematic. Also, calibration against ground measurements may help to build a model of the systematic error, that is, with a uniform mean value (bias) plus a random component. The mean value can then be subtracted from the observations prior to assimilation. Only the random component then translates into a correlated uncertainty.
The uncertainty due to model error is also hard to specify. For the monthly mean Jacobian we use a diagonal form, with the square of a constant $\sigma_{mod}$ of 0.5 ppmv on the diagonal. This is probably conservative given that the value we specify here has to be characteristic for the performance of a state-of-the-art model with a resolution of 4° × 5°. For the instantaneous Jacobian we also use a diagonal form but with a $\sigma_{mod}$ of 1.5 ppmv. We use this larger value for two reasons. First, the model error is larger for instantaneous values than for monthly mean values, where a fraction of the error cancels out. Second, the approximation of diagonal uncertainties is better for the monthly mean values than for instantaneous values, where correlated uncertainty from model error may play a larger role. Correlated uncertainties among a number of observations have the following effect on the cost function. They reduce the weight in the direction of the sum (average) of the observations and increase the weight in the direction of their differences. In our diagonal formulation we mimic the effect on the average by inflating the uncertainty. It is important to note that at this point we address only the residual model error for perfect parameter values and without any representation error, because the former is addressed by our method and the latter is accounted for by a separate term in eq. (14). Also, the positive correlations are partly compensated by processes involving mass conservation (e.g. within a carbon pool), which tend to create negative correlations among uncertainties.
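Combining the three contributions for the 1.6 µm base case gives the total data uncertainty per observation. The sketch assumes that the diagonal variances simply add, as in eq. (14), and uses the values quoted in the text; the function names are illustrative.

```python
import math

def sigma_rep(sigma_het, n):
    """Representation error for n uncorrelated samples within a grid cell."""
    return sigma_het / math.sqrt(n)

def sigma_d(sigma_obs, sigma_mod, sigma_rep_value):
    """Total data uncertainty from the sum of the three variances."""
    return math.sqrt(sigma_obs**2 + sigma_mod**2 + sigma_rep_value**2)

SIGMA_HET = 3.0                                   # ppmv, heterogeneity within a grid cell
jacobians = {"monthly mean": (0.5, 30),           # (sigma_mod in ppmv, samples n)
             "instantaneous": (1.5, 1)}
sigma_obs = {"land": 0.5, "ocean": 1.5}           # base case, 1.6 µm band

totals = {}
for version, (s_mod, n) in jacobians.items():
    for surface, s_obs in sigma_obs.items():
        totals[(version, surface)] = sigma_d(s_obs, s_mod, sigma_rep(SIGMA_HET, n))
        print(f"{version:14s} {surface:6s} {totals[(version, surface)]:.2f} ppmv")
```

For the instantaneous Jacobian the representation error of 3 ppmv dominates the total, whereas for the monthly mean Jacobian all three contributions over land are of similar size.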
For plotting the target uncertainty σ y over the observational uncertainty σ obs we introduce a scaling factor k for the observational uncertainty and combine eq. (11) with eq. (14) to obtain eq. (16). In eq. (16), σ y,mod refers to the uncertainty due to errors in the terrestrial model, and the first term specifies the uncertainty in case we had a perfect model for y. σ y,mod is, of course, strongly dependent on the model and difficult to assess. In order not to mask the assessment of A-SCOPE through a rather arbitrary assumption on σ y,mod, we do not include it in the default computation of σ y. We do this consistently throughout all experiments, that is, our A-SCOPE assessment and our benchmark, the ground-based station network. For both the A-SCOPE assessment and the benchmark exactly the same model error is to be used in eq. (16). Owing to the simple dependency of σ y on σ y,mod, one can easily combine it with the uncertainties we provide.

We show, however, an example calculation that estimates the model error from an ensemble of terrestrial biosphere models. Cramer et al. (2001) (Fig. 5) compare NPP and NEP simulated by six terrestrial biosphere models. Both quantities depend on the process representations in the individual models. For the 1990s the global NPP of the six models spans a range of about 15 GtC yr⁻¹, while NEP spans a range of about 1.5 GtC, and with one outlier removed below 0.5 GtC. For the definition of relative ranges it makes sense to refer to global NPP also for the NEP range, because NEP is the difference between two large fluxes. The ranges relative to 60 GtC (a typical value for global NPP; our model's global NPP is 64.9 GtC) are then 25% for NPP and 2.5% for NEP. For our three regions the 25% of NPP are conservative. The global map in fig. 2 of Cramer et al. (1999) shows a lower relative range (in this study for the 16 models of the Potsdam NPP intercomparison) over these regions, but a larger relative range over Africa.
This is also confirmed by fig. 1 of Kicklighter et al. (1999), who show the NPP range spanned by 90% of the same models over latitude. Over our 20 yr integration period the long-term NPP average per year is 2.6 GtC over Europe, 6.5 GtC over Russia, and 6.1 GtC over Brazil. Associating the above derived relative ranges with a ±1 standard deviation interval (again a conservative assumption) yields, for Europe, Russia and Brazil respectively, 0.3 GtC, 0.8 GtC and 0.8 GtC per year for NPP and 0.03 GtC, 0.08 GtC and 0.08 GtC per year for NEP. Our computational example is based on spreads in model simulations representing the state-of-the-art around the year 2000. We can hope that these spreads decrease with progress in terrestrial modelling, and with systematic model calibration against observations. Our example is also conservative in that the spreads include the parametric uncertainty (caused by wrong values of the process parameters), a source of uncertainty that we explicitly specify in eq. (16).
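The regional values quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that a relative range interpreted as a ±1 standard deviation interval corresponds to a standard deviation of half that range; the function and variable names are illustrative and not part of CCDAS.

```python
# Long-term mean regional NPP in GtC per year, as quoted in the text.
npp = {"Europe": 2.6, "Russia": 6.5, "Brazil": 6.1}

# Relative model ranges from the text, both expressed relative to NPP.
REL_RANGE_NPP = 0.25    # 25% for NPP
REL_RANGE_NEP = 0.025   # 2.5% for NEP


def sigma_from_range(mean_npp, rel_range):
    """Assumption: the full range equals a +/- 1 sigma interval, so sigma = range / 2."""
    return 0.5 * rel_range * mean_npp


sigma_npp = {r: sigma_from_range(v, REL_RANGE_NPP) for r, v in npp.items()}
sigma_nep = {r: sigma_from_range(v, REL_RANGE_NEP) for r, v in npp.items()}

for region in npp:
    print(f"{region}: sigma_y,mod(NPP) = {sigma_npp[region]:.1f} GtC/yr, "
          f"sigma_y,mod(NEP) = {sigma_nep[region]:.2f} GtC/yr")
```

Rounded to one significant digit, the results agree with the values of 0.3, 0.8 and 0.8 GtC per year for NPP and 0.03, 0.08 and 0.08 GtC per year for NEP.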

Experiments
This section describes the four experiments that are performed with the extended CCDAS. All experiments are run with the CCDAS configuration determined by the optimized parameter values (see Table 1) from Scholze et al. (2007), who use 41 sampling sites from the GLOBALVIEW-CO2 (2004) network. The total data uncertainty σ d is 1.08 ppmv per observation on average. For a detailed description of the parameters we refer to Rayner et al. (2005). The simulation period covers 20 yr, with the ground-based network sampling over the entire period and A-SCOPE only in the final year. The Hessian for the ground-based network (H 0 in eqs 10, 11 and 16) is taken from Scholze et al. (2007). The same holds for the Jacobian mapping parameter uncertainty onto flux uncertainty (D in eqs 11 and 16). The experiments use the following seven target quantities: net carbon flux (NEP) and net primary productivity (NPP) over Europe, Russia, and Brazil, as well as global NEP for consistency checks. The experimental setups are as follows:
(1) Base experiment: This experiment applies vertical weighting for the 1.6 µm band (see Table 2) to the transport Jacobian with monthly mean concentrations and assumes global coverage with observational uncertainties of 0.5 ppmv over land and 1.5 ppmv over ocean plus an uncertainty of 0.5 ppmv reflecting model error. The first simulation (Case 1) uses both the ground-based flask sampling network and A-SCOPE, the second simulation (Case 2, base case) uses only A-SCOPE sampling without the ground-based flask station network, and the third simulation (Case 3) uses the ground-based station network only. This experiment allows us to assess the benefit of A-SCOPE and the ground-based flask sampling network separately and in conjunction. Cases 2* and 2** illustrate the effect of including the uncertainty due to model error (σ y,mod term in eq. 16).
(2) Sensitivity to temporal and horizontal sampling: This experiment repeats Experiment 1, Case 2, but with different sampling. Based on the above-mentioned A-SCOPE orbit specification by Breon et al. (2009) it assumes sampling of instantaneous concentrations, for which the sensitivity uses the nearest location, day and time of day in the instantaneous transport Jacobian.
(3) Sensitivity to vertical weighting: This experiment repeats Experiment 1, Case 2, but with the 2.0 µm band vertical weighting (see Table 2) instead of the 1.6 µm band weighting and with observational uncertainties increased by a factor of two.
(4) Sensitivity to data uncertainties: This experiment repeats Experiment 1, Case 2 with a joint scaling factor for the two observational uncertainties over land and ocean in eq. (10) and samples 25 values of this scaling factor.

Notes to Table 1: Units are V max : µmol(CO2) m⁻² s⁻¹, a J,T : (deg C)⁻¹, a ,T : µmol(CO2) mol(air)⁻¹ (deg C)⁻¹, activation energies E: J mol⁻¹, τ f : years, offset: ppm, all others unit-less. Uncertainties are in percentage except for log-normally distributed parameters for which a range is given. Uncertainties represent one standard deviation.

Table 3 lists the prior and posterior uncertainties for Experiments 1-3. In Experiment 1, Case 2, which is our base case, the posterior uncertainties are strongly reduced compared to the prior uncertainties. This is the case for both NPP and NEP over all three regions. The strongest reduction occurs for global NEP, with a posterior uncertainty of 0.013 GtC yr⁻¹. For a consistency check we can calculate a lower bound for that value by thinking of a well-mixed atmosphere that is sampled at the end of the integration period by all A-SCOPE observations. The number of observations on the 72 × 48 grid over 12 months is about 40 000, and the average data uncertainty about 1 ppmv, which yields an uncertainty of 1/√40 000 = 0.005 ppmv for the mean concentration. We can infer, as a single unknown, a 20 yr global mean NEP by inverting the box model. This is particularly easy if we assume that the initial concentration (parameter 57) is known and neglect the contribution of the prior uncertainty in eq. (2). Using a conversion factor of 20 × 0.5 yr ppmv per GtC, we end up with a value of 0.0005 GtC yr⁻¹. This is consistent with the experiment's posterior uncertainty of 0.013 GtC yr⁻¹ for global NEP, which is well above this lower bound.
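The lower-bound estimate above can be retraced numerically. This is a minimal sketch of the box-model argument using only the numbers given in the text; the variable names are illustrative, and the exact observation count (72 × 48 × 12) is used in place of the rounded value of 40 000.

```python
import math

# Grid dimensions and sampling period quoted in the text.
n_lon, n_lat, n_months = 72, 48, 12
n_obs = n_lon * n_lat * n_months           # 41 472, "about 40 000" in the text

sigma_d = 1.0                              # average data uncertainty in ppmv

# Uncertainty of the mean of n_obs independent observations.
sigma_mean = sigma_d / math.sqrt(n_obs)    # about 0.005 ppmv

# Conversion factor from the text: 20 yr x 0.5 ppmv per GtC.
years = 20
ppmv_per_gtc = 0.5
lower_bound = sigma_mean / (years * ppmv_per_gtc)   # GtC per year

posterior_global_nep = 0.013               # GtC per year, Experiment 1, Case 2

print(f"number of observations: {n_obs}")
print(f"uncertainty of mean concentration: {sigma_mean:.4f} ppmv")
print(f"lower bound on global NEP uncertainty: {lower_bound:.5f} GtC/yr")
print(f"posterior exceeds lower bound: {posterior_global_nep > lower_bound}")
```

The independence of the observations is the idealization that makes this a lower bound; any correlation in the data uncertainties would raise the value.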
Note that posterior uncertainties for A-SCOPE derived by CCDAS are generally lower than those of assessments for the OCO mission by classical transport inversions (Chevallier et al., 2007; Miller et al., 2007; Baker et al., 2008) or Kalman filters (Feng et al., 2009). Their values, however, refer to considerably shorter temporal averaging periods than our 20 yr. Extending the averaging period typically reduces the uncertainties owing to negative correlations in uncertainties along the temporal axis. This is well known in flux inversions but also holds in CCDAS (Scholze et al., 2007). A detailed attribution of the differences in posterior uncertainties to factors such as the averaging period, the additional constraint through our terrestrial model, and the mission concept is far beyond the scope of the present study.
The benefit from the observational constraint is limited by the uncertainty from errors in the terrestrial model. The two bottom rows of Table 3 (labelled Experiment 1, Cases 2* and 2**) illustrate the effect of including a σ y,mod term in eq. (16) for the calculation of Experiment 1, Case 2. For Case 2* we use the values from the example calculation for σ y,mod based on model spread representing the state-of-the-art around the year 2000 (see Section 2.2). Note that this calculation produces the total uncertainty in model output, that is, it also includes the parametric uncertainty, which our method already accounts for. Hence, assigning this total uncertainty to σ y,mod (the fraction of uncertainty in model output that is not produced by parametric uncertainty) is extremely cautious. This is also indicated by the large uncertainties produced by the prior parameter uncertainties (row 1). Meanwhile a number of benchmarking activities (e.g. Randerson et al., 2009; Cadule et al., 2010) are aiming at separating realistic from unrealistic process formulations in terrestrial biosphere models. Such activities will drastically narrow down σ y,mod. To reflect the anticipated progress, Case 2** uses a σ y,mod of 10% of the above-calculated total uncertainty.
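To illustrate how a σ y,mod term can be combined with the uncertainties reported here, the sketch below assumes that the two contributions are independent and therefore add in quadrature. This additive form is an assumption made for illustration and should be checked against eq. (16). The σ y,mod values are those of the example calculation; the parametric posterior uncertainty is a hypothetical placeholder, not a value from Table 3.

```python
import math


def combine(sigma_param, sigma_y_mod):
    """Assumed quadrature sum of parametric and model-error uncertainty."""
    return math.hypot(sigma_param, sigma_y_mod)


# sigma_y,mod from the example calculation (GtC per year), used in Case 2*.
sigma_y_mod_full = {
    ("Europe", "NPP"): 0.3, ("Russia", "NPP"): 0.8, ("Brazil", "NPP"): 0.8,
    ("Europe", "NEP"): 0.03, ("Russia", "NEP"): 0.08, ("Brazil", "NEP"): 0.08,
}

# Case 2** uses 10% of these values to reflect anticipated modelling progress.
CASE_2SS_FRACTION = 0.10
sigma_y_mod_reduced = {k: CASE_2SS_FRACTION * v for k, v in sigma_y_mod_full.items()}

# Hypothetical placeholder for a parametric posterior uncertainty (GtC per year).
sigma_param_placeholder = 0.05

key = ("Europe", "NEP")
print("Case 2* :", combine(sigma_param_placeholder, sigma_y_mod_full[key]))
print("Case 2**:", combine(sigma_param_placeholder, sigma_y_mod_reduced[key]))
```

Under this assumption the total uncertainty is dominated by whichever of the two terms is larger, which is why the choice of σ y,mod can mask the observational constraint.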
The reduction of uncertainty relative to the prior uncertainty quantifies the strength of the observational constraint on the respective target quantity. It is shown for all target quantities in Fig. 4. For each of the regions, NEP is better observed than NPP. The comparison between Cases 2 (black bars) and 1 (grey bars) shows that adding the station network yields only a slight improvement. The constraint by the station network alone is about a factor of 20-40 weaker than the constraint by A-SCOPE.
Uncertainty reductions for Experiments 2 and 3 are in the same range as for Case 2 of Experiment 1. This means the good performance of A-SCOPE is robust against the horizontal and temporal averaging in the Jacobian and a change of the spectral band with the associated change in weighting function and observational uncertainty. In Experiment 2, the inflated uncertainty for σ mod in eq. (14), meant to compensate for challenges in modelling instantaneous samples as well as possible correlations in uncertainties due to model error, did not degrade the performance of A-SCOPE.
Experiment 4 enables us to plot the posterior uncertainty in all seven target quantities over the scaling factor. Figure 5 shows a modest sensitivity of the target uncertainty to the scaling. This is because the scaling is deliberately restricted to the observational uncertainty while uncertainties reflecting model and representation error are kept constant.
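The modest sensitivity can be made plausible with a simple calculation of the total data uncertainty as a function of the scaling factor. The sketch below rests on several assumptions that are not confirmed by the text: the observational and model-error terms are combined in quadrature, the representation error term is omitted, and the 25 scaling factors are evenly spaced from 0.1 to 2.5 (a spacing that reproduces the range of 0.05 to 1.25 ppmv over land and 0.15 to 3.75 ppmv over ocean). All names are illustrative.

```python
import math

# Base-case observational uncertainties and model error in ppmv.
SIGMA_OBS = {"land": 0.5, "ocean": 1.5}
SIGMA_MOD = 0.5

# Assumption: 25 evenly spaced scaling factors from 0.1 to 2.5.
scaling_factors = [round(0.1 * i, 1) for i in range(1, 26)]


def total_sigma(k, surface):
    """Scaled observational uncertainty combined with constant model error.

    Assumption: quadrature sum; the representation error term is left out.
    """
    return math.hypot(k * SIGMA_OBS[surface], SIGMA_MOD)


for k in (scaling_factors[0], 1.0, scaling_factors[-1]):
    print(f"k = {k:.1f}: land {total_sigma(k, 'land'):.3f} ppmv, "
          f"ocean {total_sigma(k, 'ocean'):.3f} ppmv")
```

Because the model error stays constant, the total uncertainty over land varies far less than the factor of 25 spanned by the scaled observational uncertainty.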

Conclusions
The present study investigated the benefit of A-SCOPE observations in a CCDAS that links the terrestrial vegetation model BETHY (Knorr, 1997, 2000) to observations of CO2 total column content via the fine resolution version of the atmospheric transport model TM3 (Heimann and Körner, 2003). In the modelling process chain the observations are used to reduce uncertainties in the values of BETHY's process parameters, and then the uncertainty in the process parameters is mapped forward to uncertainties in both NEP and NPP over three regions. Note that traditional transport inversions cannot handle target quantities other than NEP.
For the assessment, other sources of carbon dioxide (the so-called background fluxes) such as fossil fuel emissions, land use change fluxes and exchange fluxes with the ocean were prescribed as fixed values without uncertainty. We are thus likely to overestimate the A-SCOPE constraint on the terrestrial process parameters and, hence, also on the calculated fluxes.
A-SCOPE yields considerably better reductions in posterior uncertainties than the ground-based GLOBALVIEW station network used by Scholze et al. (2007). This is true for assimilating monthly mean values and instantaneous values, and it is true for both the 1.6 µm band and the 2.0 µm band vertical weighting function. The strength of the constraint through A-SCOPE observations is high over the range of observational uncertainties from 0.05 to 1.25 ppmv over land and from 0.15 to 3.75 ppmv over ocean. A potential A-SCOPE mission would, thus, have a major impact on our understanding of the global carbon cycle and narrow down the currently large uncertainties in future climate simulations owing to the climate-carbon cycle feedback (Friedlingstein et al., 2006).
The reasons for the strong constraint lie in the real global coverage and the much larger number of observations compared to the GLOBALVIEW station network. In contrast to pure transport inversions, the CCDAS approach exploits the powerful constraint provided by the terrestrial process formulations within the vegetation model. The model classifies global vegetation into 13 PFTs grouped according to the plants' morphology, physiology, phenology as well as bioclimatic limits. Each PFT has its own set of process parameters and provides a strong link between the various regions of occurrence. Hence, an observation over one region can help to constrain fluxes over another region. For a model with more PFTs, or spatially varying parameter values, the observational constraint would be weaker.
Similar to pure transport inversions, our study results also depend on the assumptions on uncertainties that reflect model and representation errors. This study neglected correlated uncertainties for the observations and for the sampling of the horizontal mean XCO2 concentrations over a TM3 grid cell. The A-SCOPE Report for Assessment (ESA, 2008) specifies a systematic error contribution as low as 10% of the total observational uncertainty. It is desirable to stay with a mission design that assures a low and uniform systematic error. This is because calibration against ground measurements may help to build a model of the systematic error, for example, with a uniform mean value plus a random component. The mean value can then be subtracted from the observations prior to assimilation and only the random component enters the inversion in the form of a correlated uncertainty.
Our estimate of the uncertainty that reflects representation error is based on independent sampling of the XCO2 concentration within a horizontal transport model grid cell of 4° × 5° and an ad hoc assumed variability of 3 ppmv. This yields a relatively low representation error. It would be desirable to derive an improved estimate reflecting observed small-scale variability. Traditional transport inversions would also benefit in the same way from a better quantification of representation error.
Any comparison based on published work with classical transport inversions is difficult, because differences in posterior flux uncertainties are affected by a number of factors such as the constraint through the terrestrial model as mentioned above, but also through differences in averaging periods and regions, transport model, and mission concepts.
This study has quantified the benefit of A-SCOPE in conjunction with the ground-based flask sampling network only. Both observational types are similar in that they constrain the net flux of carbon dioxide via the atmospheric concentration. There are or will be, however, further remote sensing data streams available from optical sensors (e.g. MERIS) or microwave sensors (e.g. SMOS). Such instruments provide direct constraints on vegetation phenology (MERIS) or hydrology (SMOS), which are tightly coupled to the terrestrial carbon cycle. It is expected that such data streams provide constraints complementary to A-SCOPE. It is highly desirable to set up a system that can benefit from these multiple constraints in terms of uncertainty reduction in carbon fluxes. Since standard transport inversions lack any process representation of terrestrial phenology and hydrology, such a quantitative assessment has to be performed in the framework of a CCDAS.

Acknowledgments
This study was supported by the European Space Agency under contract 21061/07/NL/JA. The authors thank Simon Blessing for his help with shell-scripting, Martin Heimann for providing the atmospheric transport model TM3, and Christiane Textor for helpful comments on the manuscript. Computing support was provided by the QUEST computer cluster at the University of Bristol, UK.