Towards determining uncertainties in global oceanic mean values of heat, salt, and surface elevation

Abstract. Lower bounds on uncertainties in oceanic data and a model are calculated for the 20-year time means and their temporal evolution of oceanic temperature, salinity, and sea surface height, during the data-dense interval 1994–2013. The essential step of separating stochastic from systematic or deterministic elements of the fields is explored by suppressing the globally correlated components of the fields. Justification lies in the physics, in the brevity of a 20-year estimate relative to the full oceanic adjustment time, and in the inferred near-linearity of the response on short time intervals. Lower-bound uncertainties, reflecting only the stochastic elements of the state estimate, are then calculated from bootstrap estimates. Trends in surface elevation, temperature, and salinity are estimated as 3.2 mm/y, 0.0011 ± 0.0001 °C/y, and (−2.825 ± 0.17) × 10^−5 /y respectively, with formal two-standard-deviation uncertainties. The temperature change corresponds to a 20-year average ocean heating rate, of which 0.1 W/m^2 arises from the geothermal forcing. Systematic errors must be determined separately.


Introduction
Many papers have been directed at estimating multi-decadal ocean heat uptake (Purkey and Johnson, 2010; Lyman and Johnson, 2014), salinity change as an indicator of fresh-water injection (Wadhams and Munk, 2004; Boyer et al., 2005), and sea level (elevation) changes (Nerem et al., 2006; Cazenave et al., 2014), or all together (Peltier, 2009; Forget and Ponte, 2015). Many more such calculations have been published than can be listed here. A great difficulty with most of these estimates is the historical inhomogeneity of the various data sets employed, and the consequent reliance on nearly untestable statistical hypotheses used to extrapolate and interpolate into data-sparse times and places (see Boyer et al., 2016, and Wunsch, 2016, for generic discussions). A number of papers have claimed 'closure' of the sea level change budget, but that closure is accomplished only through the large and uncertain error budgets of the various components.
Ocean general circulation models (GCMs) and coupled climate models have also been used to calculate space- and time-mean oceanic temperature (T), salinity (S), and sea surface elevation (η). Most models, including the ECCO system (Estimating the Circulation and Climate of the Ocean; Wunsch and Heimbach, 2013; Forget et al., 2015), compute the ocean state in a deterministic fashion. Given initial conditions and time-varying meteorological boundary conditions, the model time-steps the state vector as though the external fields, including initial conditions, were fully known. Ensemble (Monte Carlo) methods attempt to estimate the uncertainties of the state at a particular time, usually a forecast time, by computing families of disturbed initial and/or boundary conditions. A general discussion of the accuracy or precision of general circulation and climate models does not appear to exist. As in all systems, errors will always include systematic ones, e.g. from lack of adequate resolution or improperly represented air-sea transfer processes, amongst many others. Stochastic errors will arise from noisy initial and boundary conditions, as well as from rounding errors and interior instabilities of many types, both numerical and physical. Analysis of systematic and stochastic errors requires completely different methods (Henrion and Fischhoff, 1986, and especially their Fig. 1, reproduced in Wunsch, 2015, p. 48). The central purpose of this paper is to suggest a potentially useful direction towards partially solving the problem of determining uncertainties in global-scale quantities calculated with both observations and models. Examples are produced for oceanic T, S, η values and their variability from the nearly homogeneous (in the observational-network sense) data sets of 1994–2013. Specifically, a separation is attempted between the stochastic and systematic errors inevitably present.
Stochastic errors are those often labeled 'formal' and are derived from estimates of the random noise present. Estimation of systematic errors involves a near line-by-line discussion of the individual computer codes used to calculate oceanic states. Errors will be different in calculations of the mean state and in those of its temporal and spatial changes. For reasons discussed below, as the duration of a computation is extended, the distinction between stochastic and systematic errors can no longer be made; paradoxically, it is the comparatively short (20-year) time interval of the state estimate that supports the methodology outlined.
The ECCO version 4 solution employed here represents the output of an oceanic general circulation model whose initial and boundary (including meteorological) conditions and interior empirical coefficients have been adjusted in a least-squares sense to fit 25 years of global data. All data are assigned an error variance or covariance and the final state is obtained from the free-running adjusted system. For 'state estimation' as done in ECCO (ECCO Consortium, 2017a,b; Fukumori et al., 2018), two major obstacles loom if Monte Carlo methods are to be used: (1) the immense state and control vector dimensions; (2) the absence of quantitative estimates (probability distributions) of the stochastic contributions in the initial/boundary conditions and of the stochastic structures generated by internal instability and turbulence. The same obstacles to uncertainty calculation loom in any ocean or coupled-climate model run for a long time, whether or not based upon combinations with data.
A number of methods exist for calculating uncertainties in systems such as that of ECCO. To the extent that the system is linearizable, the method of Lagrange multipliers (adjoint method) used there can be shown (Wunsch, 2006) to have uncertainties identical to those obtained from sequential estimates, such as the Rauch-Tung-Striebel (RTS) smoother. This approach is very well understood and is practical for small systems (Goodwin and Sin, 1984; Brogan, 1991; Wunsch, 2006). It involves calculating covariance matrices whose dimensions are the squares of the state vector dimension and of the control vector dimension at any time, t. For ECCO version 4, the state vector dimension at each time step is approximately N = 11 million, and updating the state and its covariance requires running the model N + 1 times at each time step. Similar dimensions and issues apply to the system control vector.
Other methods include calculation of inverse Hessians (Kalmikov and Heimbach, 2014), sometimes using Lanczos methods. Hypothetically, one could solve a Fokker-Planck equation corresponding to the model (Gardiner, 2004) and its initial/boundary conditions, or apply the prediction-particle filtering methods of Majda and Harlim (2012). None of these methods is computationally practical for the global ocean or climate system with today's computers, although that should gradually change in the future.
Nonetheless, some form of useful uncertainty estimate is necessary for values calculated from models, whether from ordinary forward calculations or from a state estimate. For example, as described by ECCO Consortium (2017a) and Fukumori et al. (2018), the 20-year average ocean temperature is 3.531 °C, found from the N = 2.4 × 10^6 volume-weighted grid points of the adjusted model (centers of cells). How reliable is that number? On the one hand, it is extremely accurate, up to the machine precision of 2^−64. A standard error might be calculated by dividing the variance of the volume-weighted elements by 2.4 × 10^6, but such a number is meaningless: (1) much of the thermal structure of the ocean is deterministic on the large scale and, together with other effectively permanent sub-basin-scale structures, stable over 20+ years. Treating that structure as stochastic would be a major distortion.
(2) The distribution of values is very inhomogeneous over the three-dimensional volume and any supposition of uniform probability densities or of near-Gaussian values is incorrect.
Parts of the ocean structure and of the meteorological forcing fields are deterministic processes over decades. For example, the depth and properties of the main thermocline, or of the dominant wind systems, do not vary significantly over 20 years, and the response times of the global ocean extend to thousands of years and beyond (Wunsch, 2015, p. 338). Superimposed upon the initial and boundary conditions are noise fields best regarded, in contrast, as stochastic, but which cannot build up global-scale covarying structures in a restricted time period.
When integrated through a time-stepping fluid model, the stochastic elements, even disturbances that are white noise in space and/or time, will give rise to complex structured fields (Fig. B5 of Wunsch, 2002). A crux of the uncertainty problem for model outputs, then, is to separate the deterministic from the stochastic elements. Ensemble methods, generated by stochastic perturbations of initial/boundary conditions and parameters, face the same difficulty: what are the appropriate joint probability distributions to use in generating the ensembles (Evensen, 2009)? To the extent that the stochastic influence can be regarded as perturbations about a stable deterministic evolution, the probability densities will be centered about the deterministic fields, as in Eq. (3.5.9) of Gardiner (2004). Systematic errors will remain as part of the deterministic components, and must be dealt with separately. A literature on systematic errors in numerical models does exist, much of it directed at the accuracies of advection schemes and other numerical approximations (Hecht et al., 1998). As is always the case with real systems, the stochastic error provides only a lower bound on the true uncertainty.
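The build-up of structure from unstructured forcing can be illustrated with a toy analogue. The sketch below (a hypothetical one-dimensional damped-diffusion model, not the ECCO system; all parameter values are arbitrary) time-steps spatially white noise and shows that the integrated field acquires strong neighbor correlation that the raw forcing lacks:

```python
# Toy illustration: white-noise forcing integrated through a damped diffusion
# operator acquires spatial correlation, so "structure" alone cannot identify
# the deterministic component. Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, dt, kappa, damp = 200, 0.1, 1.0, 0.01
x = np.zeros(n)
for _ in range(5000):
    lap = np.roll(x, 1) - 2.0 * x + np.roll(x, -1)  # periodic 1-D Laplacian
    x += dt * (kappa * lap - damp * x) + np.sqrt(dt) * rng.standard_normal(n)

def lag1(v):
    """Lag-1 spatial autocorrelation estimate."""
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

print(lag1(x))                       # integrated field: strong neighbor correlation
print(lag1(rng.standard_normal(n)))  # raw white noise: near zero
```

The diffusion length scale here, roughly sqrt(kappa/damp) grid points, plays the role of the long adjustment time in the text: the longer the integration, the larger the correlated structures the noise alone can generate.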
What follows is largely heuristic and without any statistical rigor. Methods for separation of deterministic from stochastic elements in large volumes of numbers do not appear to have been widely explored. (This issue should not be confused with the problem of separating 'deterministic chaos' from true stochastic elements familiar in dynamical systems theory; Strogatz, 2015).

Mean values
A start is made with the time-mean three-dimensional fields, which permits introducing the basic ideas while greatly reducing the volume of numbers required. A supposition is thus made that only the time-average fields are available and sampled, temporarily suppressing the information contained in the time variability. Suppression of the deterministic component, so as to leave a stochastic field, is required for both the mean and the time variations.

Temperature
Consider the problem of determining the 20-year global ocean average temperature and its corresponding uncertainty. A 20-year average computed 50 years in the future might usefully be compared with the present 20-year average. Hourly values of the state estimate, averaged over the 20 years 1994–2013, produce point-wise calculated mean potential temperatures, T_i, at each grid point. The mean temperature at one depth can be seen in Fig. 1, displaying the classical large-scale features that are clearly deterministic over 20 years, with superimposed stochastic elements. A histogram of the gridded mean temperatures can be seen in Fig. 2, and its heavily skewed behavior is apparent.
To form an average, the volume represented by each grid point is accounted for by weighting each value T_i by the relative volume contribution a_i, such that ⟨T⟩ = Σ_i a_i T_i, with Σ_i a_i = 1, producing the value quoted above. A histogram of the similarly skewed weighted values can also be seen in Fig. 2. A standard deviation computed from the volume-weighted values has no meaning because: (a) much of the field is effectively deterministic and (b) the probability density of the stochastic elements is unknown.
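The weighting convention assumed above can be sketched in a few lines; the temperatures and cell volumes below are synthetic stand-ins, not the ECCO grid:

```python
# Volume-weighted global mean: a_i = V_i / sum(V_i), <T> = sum(a_i * T_i).
# T and V here are random stand-ins for gridded mean temperatures and cell volumes.
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(-2.0, 30.0, size=1000)  # stand-in point-wise mean temperatures, °C
V = rng.uniform(0.5, 2.0, size=1000)    # stand-in grid-cell volumes
a = V / V.sum()                         # relative volume contributions, summing to 1
T_mean = np.sum(a * T)                  # volume-weighted global mean
```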
Here the bootstrap and related jackknife methods, in the elementary sense described by Efron and Tibshirani (1993), Mudelsee (2014), and others, are used. The basic assumption of the bootstrap is that the values making up the subsampled population are independent, identically distributed (iid) values. Any assumption that the stochastic elements in cold, deep temperatures are drawn from the same population as the much warmer near-surface values, or that this structure is dominantly stochastic, cannot be correct.
An assumption is now made that the strongest globally spatially varying structures represent the deterministic component. As already alluded to, this assumption is based on considerations of physics alone, and not upon any statistical methodology: any three-dimensional, globally correlated structure can only have been generated by very long-term, effectively systematic processes. If a process can be rendered indistinguishable from white noise, then at zero order most covariance structure has been removed. Integration of stochastic fields does, of course, produce globally correlated structures (Wunsch, 2002, Appendix B), but these take many decades and longer to appear. To the extent that the stochastic background has generated large-scale covarying fields, the results here will be biased low. Experiments with simpler models (not shown) support the assumption that, over time intervals short compared to the full adjustment time of the system, stochastic disturbances can be so evaluated.
To render the assumption concrete, a residual population, θ′_i = (a_i T_i)′, that is iid over the full water column everywhere is generated by subtracting the globally correlated structures. In particular, let the three-dimensional matrix of volume-weighted temperatures be N(λ_i, φ_j, z_k), written with columns in longitude, latitude, and depth. Setting temperatures to zero over land is the simplest choice. One might, alternatively, mask out these regions from the computation. Experiments (not shown) with the mean temperature field showed only very slight differences from those obtained with the zero assumption (apart from weak structures appearing in the zero region, masked out here). The choice has the virtue that the same method can be used in conjunction with coupled ocean-atmosphere models, where land values must be specified. Note, too, that zeros are also assigned to regions below the local water depth, so that the v_i vectors (defined below) can capture the influence of topography over a fixed interval of orthogonality.
To proceed, map this three-dimensional matrix N(λ_i, φ_j, z_k) into two dimensions by stacking the latitude columns, to form N′(r_a, z_k), where r_a is just a reordering of longitude and latitude. Write N′ in terms of its singular value decomposition, N′ = U_K Λ_K V_K^T, where the vertical dimension is described by the K vectors v_i making up the columns of the matrix V_K; the columns of U_K are denoted u_i. K is the number of non-zero singular values and hence is the rank of N′. (U_K, V_K contain the first K columns, etc., and Λ_K is a K × K diagonal matrix with elements λ_i = diag(Λ_K)_i.) The fraction of the total variance accounted for by each singular vector pair, λ_i^2 / Σ_j λ_j^2, is shown cumulatively in Fig. 3. The first singular vector pair, u_1, v_1, accounts for over 90% of the variance (Fukumori and Wunsch, 1991). A horizontal chart of u_1 is displayed in Fig. 3. Subtraction of q pairs leaves a residual with a very much reduced spatially correlated structure. The variance with depth of both θ_i and θ′_i (Fig. 2) shows that removal of the first pair alone (q = 1) leaves a variance much closer to being depth-independent, with a histogram that is unimodal and not heavily skewed, so that a standard deviation has some simple meaning. (A chart of u_2, not shown, has a spatially complex structure with a plausibly stochastic behavior, which could be tested in turn.) The crux of the statistical problem is the decision about how many pairs, q, should be removed. A rigorous answer to this question is not provided here. Three basic criteria are used: (1) the residual should be unimodal; (2) the residual variance should not show a major depth dependence; (3) a singular vector pair is removed only if it accounts for no less than 30% of the total variance. The criteria permit use of the estimated variance as a useful uncertainty parameter, and support the iid assumption.
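The deflation step described above can be sketched as follows. The field is synthetic (one planted large-scale structure plus noise, with arbitrary dimensions), but the operations — stacking into a 2-D matrix, SVD, and subtraction of the leading q singular-vector pairs — follow the text:

```python
# Sketch of removing the dominant singular-vector pairs from the stacked field
# N' (horizontal position x depth). The synthetic field has one planted
# large-scale structure, standing in for the deterministic component.
import numpy as np

rng = np.random.default_rng(2)
n_horiz, n_depth, q = 500, 50, 1
structure = np.outer(rng.standard_normal(n_horiz),
                     np.exp(-np.arange(n_depth) / 10.0))   # large-scale pattern
N = 10.0 * structure + 0.5 * rng.standard_normal((n_horiz, n_depth))

U, s, Vt = np.linalg.svd(N, full_matrices=False)
frac = s**2 / np.sum(s**2)            # variance fraction per singular pair
residual = N - U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]      # subtract q pairs

print(frac[0])                        # dominant pair carries most of the variance
```

With q chosen so that no remaining pair dominates, `residual` plays the role of the (a_i T_i)′ population passed to the bootstrap.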
A statistically rigorous result requires a method analogous to the use of the AIC (Akaike Information Criterion) or the BIC (Bayesian Information Criterion) in linear regression analyses (Priestley, 1981).
For temperature, removal of only the first pair, q = 1, reduces the norm of N′_q(r_a, z_k) by over 90%, and the histogram of volume-weighted values (Fig. 2) is now unimodal. Over 20 years, the response of the ocean is dominantly linear, producing a normal stochastic field; that inference is supported, e.g., by the discussion of Gebbie and Huybers (2018) and by the known physics of short-time-scale adjustment. With the risk of over-estimating the uncertainty, q = 1 is chosen, and the elements of N′_q(r_a, z_k) are assumed to be iid and thus suitable for computing a bootstrap mean and standard error. Fifty bootstrap reconstructions produce a two-standard-deviation uncertainty of 1.9 × 10^−3 °C owing to the stochastic elements, or ⟨T⟩ = 3.531 ± 0.002 °C, the 'formal' error. (For a Gaussian process, two standard deviations is approximately a 95% confidence interval.) Much structure exists in both the suppressed and the retained singular vector pairs, which will ultimately need examination and physical explanation.
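The bootstrap computation itself is elementary and can be sketched as below; the residual values are synthetic stand-ins for the deflated, volume-weighted field, and the ~50 resamples follow the text:

```python
# Bootstrap two-standard-deviation uncertainty of the mean, treating the
# deflated residuals as iid. Residual values here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(3)
residuals = 0.1 * rng.standard_normal(10_000)   # stand-in iid residual population
n_boot = 50                                     # number of bootstrap samples
boot_means = np.array([
    rng.choice(residuals, size=residuals.size, replace=True).mean()
    for _ in range(n_boot)
])
two_sigma = 2.0 * boot_means.std(ddof=1)        # the 'formal' ~95% uncertainty
```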

Salinity
The time-mean salinity (34.727, dimensionless on the practical scale), determined from the volume-weighted values a_i S_i, has the histograms shown in Fig. 4. Taking now q = 2 removes about 94% of the variance, and two standard deviations of the bootstrapped residual produce 34.72 ± 0.02. The variance with depth is rendered much more nearly uniform, albeit imperfectly so, and, as for temperature, the residual histogram of a_i S_i is now unimodal. That q = 2 for salinity rather than q = 1 as for temperature is likely explained by the much smaller deviations of S about its volumetric mean, but for present purposes that is immaterial.
For reference, under the assumption that the total oceanic salt content remains unchanged, Δh ≈ −h_0 ΔS/S_0, where h_0, S_0 are the mean values of depth and salinity (Munk, 2003), ignoring the density change due to salinity and the contribution of melting sea ice, the uncertainty ΔS = ±0.02 corresponds to a sea level change uncertainty of about Δh = ±2.2 m. This value may seem surprisingly large, but it simply says that the salinity data permit inference of the total amount of freshwater to about Δh = 2.2 m out of a total average depth of about h = 3800 m, or about 0.06%, which by most standards is remarkable accuracy. One can hope that a comparison 50 years hence will not find changes in Δh that are significantly different from zero! The formal uncertainty in the present state is sufficiently small that, in determining systematic errors, care must be taken over the definition/calibration changes that may take place in the future, as they have in the past; see Millero et al. (2008) for a discussion of systematic changes in definitions.
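The arithmetic behind the Munk (2003) relation, using the values quoted above, can be checked directly:

```python
# Check of Δh ≈ −h0 ΔS / S0 with the text's values: h0 ≈ 3800 m, S0 ≈ 34.727,
# and the two-standard-deviation salinity uncertainty ΔS = 0.02.
h0 = 3800.0        # mean ocean depth, m
S0 = 34.727        # mean salinity, practical scale
dS = 0.02          # salinity uncertainty (two standard deviations)
dh = h0 * dS / S0  # implied sea level uncertainty magnitude, m
print(round(dh, 1))  # prints 2.2, as quoted
```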

Sea surface height/dynamic topography
The mean sea surface height, η, the 'dynamic topography' of the present ocean state, can in principle be compared to its value determined as a 20-year average 50 years, or any other time interval, into the future. Values in the ECCO version 4 state estimate are determined relative to the best available geoid known today (Reigber et al., 2005). The dynamical variables are the horizontal gradient elements, and thus if, in the future, a different geoid were used, offset by a constant from the one used in ECCO, that change would be of no significance. On the other hand, care would be needed in the future to accommodate changed geoids, for example those with higher spatial resolution, temporal changes, and significant regional implications. Because the dynamics respond only to horizontal gradients in η, the result is most directly connected to the altimeter, gravity, and hydrographic data.
The assumption used so far, that globally covarying fields over 20 years can be interpreted as the deterministic components, is physically sensible for temperature and salinity. For η, however, the ability of the ocean to transmit barotropic (depth-independent) signals globally within a few days makes the assumption dubious. Nonetheless, with this caveat, the global time-mean value of η and an estimate of its accuracy are calculated within the model context. Because of the complexities of geoid error and the rapid achievement of barotropic motion equilibrium, only the time differences of the mean surface height are discussed. For the record, the area-weighted mean of the values in Fig. 5 is 8.48 ± 0.13 cm, with the error based upon q = 3, but no more discussion is provided here.

Time changes: difference of last and first years
Time changes, represented for now by the difference between two yearly averages, years t_1, t_2, should largely remove the deterministic components contained in the initial/boundary conditions. A trend, e.g. in the exchange of heat between ocean and atmosphere as part of the global warming signal and part of the surface boundary conditions, might be regarded as deterministic. But, as has been noted in numerous publications (Ocaña et al., 2016), a 20-year record is far too short to distinguish a true deterministic trend from the long-term stochastic shifts characteristic of red-noise processes. Thus, any trend present is treated as though arising from a stochastic process. Discussion of temporal changes is done in two ways: (1) via the difference of the first and last years (the 20-year difference), which makes no assumptions about the nature of the trend; (2) via bootstrapped or jackknifed estimates of the trends, assumed linear, using all of the intermediate year averages.

Temperature/heat content
One interesting example is the difference between the mean ocean temperature in 2013 and what it was in 1994 (shown for two depths in Figs. 6–7), as a constraint on the rates of global warming. This difference is again a static field and can be analyzed in the same fashion as the time mean was treated. The spatial pattern of warming and cooling is a complicated one, with large-scale structures corresponding to known physical regimes, e.g. the eastern tropical Pacific, the near-Gulf Stream system/subpolar gyre, and the Southern Ocean. As one might expect, temperature-difference variances are much larger near the surface than in the abyss, and the gridded temperature field itself is far from iid. One can proceed as above, subtracting the appropriate three-dimensional singular vector pairs. To shorten the discussion, the temperature differences are here integrated in the vertical to different depths, including the bottom. The top-to-bottom, area-weighted integrals and their histogram are shown in Fig. 8.
Note that the two yearly estimates are not independent ones; they are connected through the time-evolving equations of motion. To the extent that any systematic error in the ECCO system is time-independent, it will be subtractive in the time difference. The vertical integral, top-to-bottom, has no dominant singular vector pair, with the largest squared singular value accounting for less than 30% of the variance. Proceeding under the assumption that the 20-year time difference has a vertical integral that is close to iid (q = 0), the bootstrap produces a temperature change of 0.0206 ± 6.7 × 10^−4 °C (two standard deviations). The underlying ECCO GCM uses a constant heat capacity of c_p = 3994 J/°C/kg, so that with an ocean mass of 1.37 × 10^21 kg, 0.02 °C corresponds to a net heat change over 20 years of (1.1 ± 0.037) × 10^23 J. As with all the results here, these values represent lower bounds on the uncertainty. Geothermal heating itself has uncertainties, which should be separately analyzed, and the dependence of heat capacity and density on temperature, salinity, and pressure will introduce systematic errors.
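The conversion from temperature change to heat content uses only the constants stated above and can be verified directly:

```python
# Heat-content arithmetic: ΔQ = M * c_p * ΔT, with the constants quoted in
# the text (constant heat capacity, fixed ocean mass).
M = 1.37e21     # ocean mass, kg
c_p = 3994.0    # specific heat capacity, J/°C/kg (constant in the model)
dT = 0.0206     # 20-year mean temperature change, °C
dQ = M * c_p * dT   # net heat change, J; ≈ 1.1e23 J as quoted
```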

Salinity/freshwater
The pattern of vertically integrated differences of salinity between 1994 and 2013 (Fig. 9) is already visually somewhat stochastic in character, and thus no further structure is removed (the largest singular value corresponds to only 20% of the variance). The mean salinity change between the two years is (−5.4 ± 0.84) × 10^−4 from the bootstrap estimate with q = 0. Using the above expression for the addition of freshwater, the net increase in water depth is 0.06 ± 0.01 m, or 3.0 ± 0.5 mm/y.

Surface height
The difference in height over 20 years (Fig. 10) is 6.37 ± 0.3 cm, or an average change of 3.2 ± 0.015 mm/y, where the standard error is obtained from the bootstrap with q = 2. Nerem et al. (2006) quote a rate from altimeter data alone of 3.1 ± 0.4 mm/y. Although the estimates are not independent (the state estimate uses all the altimeter data, and the altimeter data do not extend over the ice-covered region), the results are approximately consistent. (See Lanzante, 2005, for the interpretation of overlapping uncertainty estimates.) η is calculated in the model using the full equation of state, thus accounting for the addition of freshwater, the thermal expansion from external heating, and any interior redistribution of heat and salt as functions of position and depth. Of the 6.4 ± 0.3 cm change in elevation over 20 years, roughly 5–6 cm is then crudely attributable to the addition of freshwater, with any remainder attributable to thermal expansion.

Estimated linear trends
A generalization for both temperature and salinity is that the top 100 m are noisy year-to-year, but integrals to 700 m are much cleaner and visually very close to linear. In both fields, the abyssal region, defined as the levels below 3600 m, shows a counter-trend to that of the water-column total.

Temperature/heat content
The integrated temperature to various depths is shown in Fig. 11. The best-fitting, assumed linear, trend over 20 years is sought; whether it is deterministic or a red-noise random walk is immaterial at this stage. The least-squares mean slope for the top-to-bottom change is 0.0011 ± 6.9 × 10^−5 °C/y. The standard error is computed from a bootstrap of the full field, under the assumption that the time differences are basically stochastic, which likely slightly overestimates the uncertainty. (A jackknife estimate was identical.) The mean slope implies a change over 20 years of 0.0213 ± 0.0014 °C, which differs only slightly from the value computed between the first and last years, as might have been inferred from Fig. 11.

Fig. 8 (caption). Vertical integral, top-to-bottom, of the temperature difference between 2013 and 1994, area weighted (10^−6 °C). The result is treated as fully stochastic, which may somewhat overestimate the formal uncertainty (q = 0). The histogram is again unimodal. The spatial complexity here is indicative of the difficult oceanic sampling problem.
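The trend-plus-bootstrap procedure described above can be sketched as follows. The annual values are synthetic stand-ins (a linear trend matching the quoted slope, plus noise of arbitrary amplitude), and residuals of the least-squares fit are resampled to obtain the standard error:

```python
# Least-squares linear trend of annual means, with a bootstrap (resampling the
# fit residuals) giving a two-standard-deviation slope uncertainty. The annual
# temperatures are synthetic stand-ins for the curves of Fig. 11.
import numpy as np

rng = np.random.default_rng(4)
years = np.arange(1994, 2014)
temps = 3.52 + 0.0011 * (years - 1994) + 0.0005 * rng.standard_normal(years.size)

slope, intercept = np.polyfit(years, temps, 1)   # best-fitting straight line
fit = slope * years + intercept
resid = temps - fit

boot_slopes = [
    np.polyfit(years, fit + rng.choice(resid, resid.size, replace=True), 1)[0]
    for _ in range(500)
]
two_sigma = 2.0 * np.std(boot_slopes, ddof=1)    # slope uncertainty, °C/y
```

A jackknife variant would instead refit with each year deleted in turn; as noted in the text, the two give essentially identical answers here.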

Salinity trends
Integrated salt anomalies are displayed for each year, to several depths, in Fig. 12. An overall freshening, top-to-bottom, is evident, along with a slight increase in salinity at and below 3600 m. This abyssal change accompanies the general cooling seen below 3600 m in Fig. 11, but the physics is not further described here.
As with temperature, the best-fitting least-squares straight line differs slightly from the calculation using only the first and last years, producing a value of (−2.825 ± 0.137) × 10^−5 /y, for a net change over 20 years of (−5.65 ± 0.274) × 10^−4. (For comparison, Boyer et al., 2005, estimated the trend as −5.4 × 10^−4 /y from a much longer and much more inhomogeneous data set. No uncertainty was specified.) An estimate of the corresponding freshwater trend is (2.2 ± 0.15) × 10^−3 m/y, again from a bootstrap average (with q = 0), i.e. about 2 mm/y. The corresponding mean surface height change is then 4.4 ± 0.15 cm over the 20 years. Attribution of the combined heating and salinity changes to an equivalent sea level trend is a complex matter that must include gravity-field changes (see Church et al., 2010, and Pugh and Woodworth, 2014, for complete discussions).

Discussion
Although quantitative 20-year time means and changes in the global-average oceanic heat, salt, and dynamic topography (sea surface height) have been estimated, the main goal here is not those numbers per se: the intention is to begin the discussion of the separation of random and systematic errors in model property estimates. Results here are almost entirely heuristic, but the approach using resampling (bootstrap) methods applied to spatially decorrelated fields can perhaps be made rigorous. In particular, methods for separating the deterministic and stochastic elements of three-dimensional, time-dependent fields, in the absence of real knowledge of the probability distributions, should be explored. Formal errors here are sufficiently small that a reasonable expectation is that systematic ones dominate, but that is only surmise. Attention must then turn to the issue of systematic errors in the model and state estimate. These will never be zero, but because of the data-fitting in the state estimation process, they are expected to be much reduced compared to those found in unadjusted climate models.

Fig. 11 (caption). Annual mean temperature anomalies integrated to different depths, including the bottom, in °C. Two-standard-deviation bars are derived from bootstrapping the full temperature-difference field each year. Ttot is integrated top-to-bottom; T100m, T700m, and T3600m are integrated to 100, 700, and 3600 m, respectively; Tabyss is integrated from 3600 m to the bottom. Cooling in the region below 3600 m was discussed by Wunsch and Heimbach (2014) and Gebbie and Huybers (2018).
A full discussion of the structures and causes of the various fields appearing in the means and in the heating/cooling, salinification/freshening, and elevation increases/decreases in time and space requires a specialized study of each field separately and is not attempted here.