Coexchangeable process modelling for uncertainty quantification in joint climate reconstruction

Any experiment with climate models relies on a potentially large set of spatio-temporal boundary conditions. These can represent both the initial state of the system and/or forcings driving the model output throughout the experiment. Whilst these boundary conditions are typically fixed using available reconstructions in climate modelling studies, they are highly uncertain, that uncertainty is unquantified, and the effect on the output of the experiment can be considerable. We develop efficient quantification of these uncertainties that combines relevant data from multiple models and observations. Starting from the coexchangeability model, we develop a coexchangable process model to capture multiple correlated spatio-temporal fields of variables. We demonstrate that further exchangeability judgements over the parameters within this representation lead to a Bayes linear analogy of a hierarchical model. We use the framework to provide a joint reconstruction of sea-surface temperature and sea-ice concentration boundary conditions at the last glacial maximum (19-23 ka) and use it to force an ensemble of ice-sheet simulations using the FAMOUS-Ice coupled atmosphere and ice-sheet model. We demonstrate that existing boundary conditions typically used in these experiments are implausible given our uncertainties and demonstrate the impact of using more plausible boundary conditions on ice-sheet simulation.


Introduction
Numerical experiments are vital tools to climate science.Knowledge of physical processes embedded into software can simulate events that are otherwise impossible to observe at the required spatial and temporal resolutions.Most climate simulators utilise boundary conditions to represent the non-computed physical processes on which the simulator relies.Generally, boundary conditions are fixed using a reference run or runs from existing multimodel ensembles (MMEs) that explicitly model the boundary condition's physical process.For example, to run an ice-sheet model over the last glacial maximum (LGM) requires climate variables such as temperature, and precipitation (Gregoire et al., 2012) that can be obtained using the Paleoclimate Model Intercomparison Project (PMIP) ensemble of simulator runs (Ivanovic et al., 2016).MMEs can be biased and do not necessarily span the uncertainty of the physical process (Salter et al., 2022).
If modelling boundary conditions is viewed as a statistical reconstruction problem, there is a rich literature in statistics that attempts to combine data from multiple models and historical observations to infer spatio-temporal climate properties.Rougier et al. (2013) present a generalised framework, therein termed the coexchangeable model, where exchangeability judgements over an MME along with an assumption that each model in the ensemble has the same a priori covariance with the field they aim to represent lead to a simple model that can be used to estimate a true unobserved process.Second order inference via Bayes linear methods (Goldstein and Wooff, 2007) for this model ensures scalable reconstructions for a spatio-temporal field, and the requirement for only prior means and variances implies an easier prior modelling task.The coexchangeable model has seen some interest in more recent research, such as in Sansom et al. (2021) where it is used to model emergent constraints for future climate projections.Alternative to the methodology of Rougier et al. (2013), a line of research stemming from Chandler (2013) proposes a similar fully probabilistic model with individual weightings of each simulator rather than via an assumption of exchangeability.As noted in Rougier et al. (2013), the model in Chandler (2013) is a special case of the coexchangeable model.Other statistical reconstructions of boundary conditions exist; however, they are either derived from simulator data alone and are limited by the span of the ensemble (e.g.Salter et al., 2022), or are reconstructed from large and dense datasets (e.g.Liu and Guillas, 2017;Zhang et al., 2020;Sha et al., 2019) and so are inappropriate here given the relative data sparsity at the LGM.There are a family of post-processing methodologies in the climate sciences, therein termed offline data assimilation, that adjust model output with direct observations or proxy data based on ensemble methods (e.g.Hakim et al., 2016;Steiger et al., 2014).We show that these methodologies, as well, may be subsumed by the coexchangeable model.Online data assimilation techniques are not possible for large climate ensembles: the simulators are run on large supercomputers, are highly bespoke, and very limited access to the model architectures is given outside of the modelling groups.
Joint reconstruction of two or more physically distinct fields may, from a theoretical perspective, appear straightforward within hierarchical Bayesian frameworks such as those proposed by Chandler (2013).However, scalability becomes a much bigger challenge.This is particularly so when the fields are highly dependent, as many in climate are.In this study we jointly model sea-surface temperature (SST) and sea-ice concentration (SIC) to drive a coupled atmosphere and ice-sheet model, where the dependence between the boundary conditions is critically important.The need for scalable inference and generality makes a coexchangeable approach desirable; though extensions are required to incorporate the additional assumptions of conditional exchangeability between the fields.Thus, we develop a coexchangeable process model that offers a Bayes linear analogy to the natural Bayesian hierarchical model for the problem.From simple and natural exchangeability judgements, we develop efficient, scalable inference for joint reconstruction.
This article proceeds as follows.Section 2 provides background and context to the applied modelling problem.Section 3 reviews the existing statistical literature on Bayes linear statistics and exchangeability analysis.Section 4 extends the coexchangeable model for coupled processes, providing a Bayes linear analogy to the Bayesian hierarchical model for which we then present scalable inference via geometric updating.Section 5 uses the developed methodology to reconstruct SST and SIC.Section 6 discusses the results of using the reconstructions of SST and SIC to simulate ice sheet-atmosphere interactions at the LGM, and Section 7 concludes.Code and data are provided as part of an R package downloadable from github.com/astfalckl/exanalysis.

Reconstructing boundary conditions of SST and SIC
Modelling and understanding paleoclimate events is crucial in understanding potential effects of future climate change: one way to calibrate climate and ice-sheet behaviour is to simulate the past (Kageyama et al., 2017;Harrison et al., 2015;Schmidt et al., 2014).The last major deglaciation (∼21-7 kya) is the most natural period to study, given the relatively rich source of observations on the climate and ice sheets compared to the more distant past.To begin to study deglaciation, an ice sheet needs to be grown within the model under the steady state boundary conditions.The ice-sheet is grown by forcing an ice-sheet model with the relatively stable conditions of the LGM (∼23-19 kya) until convergence.Seasonal variation is known to play a significant role in this process (Joughin et al., 2012), so the boundary conditions contain the seasonal cycles.The character of the resulting ice sheets are sensitive to these boundary conditions and so it is crucial to use an accurate reconstruction, realistic uncertainty quantification and a method for perturbing the boundary conditions under uncertainty to force the ice sheet model.
Existing reconstructions of LGM SST and SIC are predominantly derived from either paleodata syntheses or via numerical simulation.Data only reconstructions include CLIMAP (CLIMAP Project, 1981), GLAMAP (Sarnthein et al., 2003), MARGO (Kucera et al., 2005), and more recently Paul et al. (2020).As these global reconstructions are based solely on proxy-based paleodata, they are subject to large measurement error, biases in polar regions, incomplete spatial coverage, and poor temporal resolution.Coupled simulations of the LGM are useful to ensure spatio-temporal coverage and consistency of SST and SIC, but these can also be very different from observations in critical regions for growing ice sheets (Salter et al., 2022).PMIP is the most notable experimental body that guide protocols for coupled ocean-atmosphere models to simulate the LGM climate (Kageyama et al., 2017(Kageyama et al., , 2021)).The PMIP community has produced MMEs for different phases of the project run with different generations of model and slight adjustments in inputs.It is common practice to use a PMIP simulator output to directly force ice sheet simulations (e.g.Gregoire et al., 2016).One study to use both paleodata and simulators is Tierney et al. (2020) who adjust a model with observations to provide SST reconstruction, with uncertainty, based on a single simulator.Differences in model physics typically induce more variability than perturbations to the parameters of a single model.Therefore, reconstructions based on a single model may be biased and overconfident.Our methodology allows for the first joint reconstruction of SST and SIC that coherently combines the PMIP models (we use iterations PMIP3 and PMIP4) with available proxy data to deliver boundary conditions with uncertainty quantification.
We use three sources of data to inform our reconstructions: PMIP simulations, SST proxy data, and maximum sea-ice extents.As with Rougier et al. (2013) and Sansom et al. (2021), we select a single representative simulation from each modelling group that contributed to the PMIP3 and PMIP4 MMEs in order to make the assumption of prior exchangeability reasonable.We use the MARGO SST proxy data compilation where seabed sediment core samples were used to infer the true LGM SST, supplemented with some more recent data from Benz et al. (2016).Note, many of the SST measurements are far from any icesheets but can still useful in constraining a global climate simulation.Finally, we use the Southern Hemisphere maximum sea-ice extent as published in Gersonde et al. (2005); Northern Hemisphere extents are only available for specific regions (e.g.De Vernal et al., 2005;Crosta et al., 1998;Xiao et al., 2015), and so we use a simple estimate of the Northern Hemisphere sea-ice extent provided by subject matter experts.SST proxy data and maximum sea-ice extents are shown in Figure 2 in Section 5 alongside the specification of the statistical model.

Existing statistical methodology
SST and SIC are dependent quantities: warmer waters will support less sea-ice and viceversa.The physics that govern the relationship between SST and SIC is represented by a series of partial differential equations, the structure and parametrisations of which change across the simulators and with reality.For example, certain simulators are predisposed to supporting more or less sea-ice at a given SST.The dependencies between SST and SIC are naturally modelled as an emergent property of the model and captured within a hierarchical framework using conditional probability statements.However, in climate modelling, specification of the probability distributions is not obvious and computation for large climate models is prohibitive.An alternative view treats expectation, rather than probability, as the primitive quantity (De Finetti, 1975); probabilities are then the expectations of indicator functions for events.This motivates second order approaches such as Bayes linear methods (Goldstein and Wooff, 2007), where inference concerns expectations and variances directly rather than as a by-product of probabilistic inference.Model specification thus only concerns specifying expectation and variance, rather than the full probability distributions.Rougier et al. (2013) show the advantages of second order specification for climate modelling and further show that based on certain judgements of exchangeability, efficient methods for inference of MMEs may be formed.Our application requires extending the theory of Rougier et al. (2013) to exchangeable processes, that is, second order hierarchical models.First, we review the existing methodology: Section 3.1 provides a short introduction to Bayes linear theory, Section 3.2 defines second order exchangeability, and Section 3.3 presents the coexchangeable model of Rougier et al. (2013).

Bayes Linear Theory
Under the Bayes linear paradigm, beliefs on random quantities are described via expectations and variances and are then adjusted by data.The belief specifications define an inner product space in which the random quantities live; the inner product space is analogous to a probability distribution in probabilistic Bayesian analysis.Consider random quantity, X, with observations X i , defined on the Hilbert space X endowed with inner product ⟨X, Y⟩ = E[X ⊺ Y].Denote by D the collection of observed data, D = (X 1 , . . ., X m ), as a concatenated vector of the observations.In general, the X i do not necessarily have the same length or a-priori belief specifications and can represent multiple sources of data that inform X.The random quantity of interest, X, may be multivariate or univariate, in which case its inner product is simply E[XY].Later, X will denote the unknown spatio-temporal field of SST, and the X i , the observed simulator outputs over which we will assume exchangeability.We write the adjusted expectation in terms of X and D as E D [X], that is, the expectation of our beliefs X adjusted by data D. Adjusted expectation is defined as the element in the subspace D = span[1, D] that minimises ∥X − E D [X]∥ and has solution where var[D] † is any pseudo-inverse of var [D], most commonly the Moore-Penrose inverse.Equation (1) describes the orthogonal projection of each element of our beliefs, X, onto D.

Bayes linear analysis of exchangeable data
Exchangeability for a sequence of random quantities within a probabilistic Bayesian analysis represents a simple a priori symmetry judgement that implies that any finite sub-collection of quantities within the sequence have the same distribution.Second order exchangeability for such a sequence is an analogue for judgements when expectation is primitive, and implies that any finite sub-collection of quantities share the same joint inner product space, i.e. prior expectation and variance.If data, D, are second order exchangeable we may, according to the second order representation theorem (Goldstein, 1986), write for a common mean M and uncorrelated residuals R i .Second order exchangeability implies that all observed X i are of the same length, cov[X i , ∀i, i ′ .For second order exchangeable sequences there is predictive sufficiency for updating beliefs on X by only updating M by the data D. Geometrically, this means that given M, D and X are orthogonal and thus uncorrelated; these results are established in Goldstein (1986).Further, the sample mean, X = 1 m m i=1 X i , is Bayes linear sufficient for updating beliefs on M, and consequently on X. Second order exchangeability affords two main advantages: first, belief specifications are simplified; and second, sufficiency of the sample mean makes inference independent of the number of samples m.Rougier et al. (2013) leverage second order exchangeability to describe a Bayes linear approach for modelling MME's.In what follows, we explicitly define multivariate quantities, represented by the bold font.Let X := {X 1 , . . ., X m } be a collection of q-dimensional outputs from the m simulators that form the MME; X * , the true unobserved process that the simulators aim to model; and Z X , the noisy and incomplete observation of X * .The model requires only two assumptions: first, that the X i are second order exchangeable and, second, that the X i are coexchangeable with the truth X * , implying cov[X * , X i ] = Σ, ∀i.As in (3), exchangeability within X implies

Exchangeability analysis for multi-model ensembles
where M X is a shared mean term, the R i X are the zero-mean, uncorrelated residuals of each simulator with common variance, and the M X and R i X are uncorrelated.Coexchangeability between X * and X implies sufficiency of M X for X * .This allows us to write where A is a known matrix, and U X represents the ensemble discrepancy that is uncorrelated with M X and the R i X .The data, Z X are modelled as for measurement error W X , and known incidence matrix H X .The statistical model defined by ( 4)-( 6) will hereafter by referred to as the coexchangeable model ; a graphical representation is provided in Figure 1a at the end of Section 4. Inference for this model makes use of Bayes linear sufficiency of the ensemble mean, X = 1 m m i=1 X i , for updating by X and is therefore very efficient.Updated beliefs on X * is done in two stages: first the update of our beliefs by the ensemble, and second by the data; the inferential procedure is outlined in the supplementary material.Climate post-processing routines, as in Hakim et al. (2016) and Steiger et al. (2014), are subsumed by the coexchangeable model, and simply describe (6) with beliefs E[X * ] = X and var[X * ] = P. Calculations that rely on ensemble methods approximate P by the empirical covariance matrix of the ensemble, or some linear transformation thereof (see definitions in Whitaker and Hamill, 2002;Snyder and Hakim, 2022).Our contributions, that build on the coexchangeable model, may also be considered as similar developments to such post-processing routines popular in the climate sciences.
4 The coexchangeable process model

Exchangeable processes
Consider each simulator as producing output pairs of q-dimensional fields (X i , Y i ) with the corresponding fields in reality denoted (X * , Y * ), for which we have partial observations Z X and Z Y , made with some error.In what follows we define SST by X and SIC by Y. Changes in SST and SIC are driven by complex physical relationships.We capture structural dependencies between random quantities (X, Y), by first only imposing the coexchangeable model for X i , X * , and then considering exchangeability judgements over the processes Y i , given X i , and coexchangeability of Y * , given X * .Whilst it is enticing to consider the coexchangeable model over both fields simultaneously, it is deficient here in two ways.First, due to the dependence of Y i on X i , the assumption of second order exchangeability between the Y i is violated.Second, the natural way to construct the model is to define the process of Y i given X i parametrised by some β i as in a hierarchical model.Inference on the β i is of scientific interest; however, the coexchangeable model does not allow for this.The model sophistication required to handle the hierarchical structure requires nuanced judgements of conditional second order exchangeability.In this section, we simply state the exchangeable process model, and defer detailed discussion of the conditional exchangeability judgements to Section 4.2.
For any single simulator we model E[Y i ] = M(X i ; β i ) as a process of X i parametrised by some β i , specific to the ith simulator.The mean function M(X i ; β i ) may represent any relationship between X i and β i and, to infer β i from Y i , we need only specify a joint inner product space in which they reside.We set M(X i ; β i ) = ϕ(X i )β i where ϕ(•) maps X i to some specified family of basis functions.In our application, we use a monotonic decreasing spline basis for ϕ(•) to reflect the property that warmer SSTs will tend to lead to less SICs.We may then write where R i Y is some associated residual term.A simple example to consider is where X i and Y i represent temporal observations at a single location, ϕ(X i ) = X i and β i is a scalar.This simply models X i and Y i as scalar multiples of each other, where the multiplier is simulation dependent.Such examples are common when modelling emergent constraints in climate modelling.Note, the choice to model M(X i , β i ) linear in β i is not required by the theory but it allows for natural specification of the inner product space and leads to sufficiency arguments, discussed below, that aid computation.To complete the analogy with a Bayesian hierarchical model, we impose second order exchangeability over β 1 , β 2 , . . .so that with expectation . Conditional exchangeability does not lead to sufficiency of the sample mean and so we must calculate the belief updates in the much larger joint inner product space.Define Φ i := ϕ(X i ); equations ( 7) and ( 8) imply that the joint space of the Y i and the β i can be formed as where 0 a×b is a a × b zero matrix, J a×b is a a × b matrix of ones, and I a is the a × a identity matrix.
Calculating the belief updates using ( 9) is not immediately obvious as the random quantities β i appear on both the left-hand side as data and the right-hand side as unknown parameters.Noting that (8) is equivalently restated as 0 = M β + R i β − β i , and using the re-parametrisation of Hodges (1998), we can re-express (9) as  2); the joint belief specifications and specific updating equations are provided in the supplementary material.We note that the specification of (9) differs from the standard approach to modelling exchangeable processes in Bayes linear statistics where inference only concerns M β by, in effect, substituting (8) into (7) (see Goldstein and Wooff, 1998).By jointly modelling the β i and M β we provide a closer analogy to Bayesian hierarchical models.
As we are required to update our beliefs in the joint inner product space, computation of E Y [B] and var Y [B] can be difficult.These calculations require an order O(m 3 (q + k) 3 ) matrix inversion of var [Y], where m is the number of simulations in the MME, q is the dimension of the climate simulation, and k is the dimension of the β i .Theorem 4.1 shows that a Bayes linear sufficiency argument can be made that permits a smaller computational order of O(8k 3 m 3 ) for k < q, thus enabling efficient inference.Theorem 4.1 says, in effect, that the beliefs on B may be equivalently updated by the projection of the Y i onto the k-dimensional column space of Φ i .For climate models, q is generally very large.For many basis designs k ≪ q, and when groups, i, index MIP simulations, m is generally small.
Then β is Bayes linear sufficient for Y for adjusting B if the column space of projection matrix Proof.Proof available in the supplementary material.
Matrix Φ i being of full rank is sufficient, but not necessary, in satisfying the condition C(P i ) = C, ∀i in Theorem 4.1, and most sensible basis designs will ensure this.Assuming C(P i ) = C, ∀i, and appealing to the sufficiency of β for Y for adjusting B, we write the joint specification of the exchangeability judgements made in (10) as where and var β[B] follows (1) and (2); the specific equations are given in the supplementary material.We also provide partial updating equations for E β[M β ] and var β[M β ] which are calculated with computational order O(k 3 m 3 ) and may be used in equations (S4) and (S5) that update the full coexchangeable process model.

Repeated observations of a process
Assume, within each simulator output i, we observe a sequence (X i 1 , Y i 1 ), (X i 2 , Y i 2 ), . . . of observation/covariate pairings; here, each (X i t , Y i t ) is some p-dimensional sub-level process contained in (X i , Y i ) and the β i are invariant to the dimension indexed by t.In the simple example provided above, each (X i t , Y i t ) are pairings of scalar observations, and so p = 1, and β i is invariant in time.Similarly, for our application, each t indexes time, but the X i t and Y i t are spatial observations within the spatio-temporal X i and Y i ; though we could instead index space, both space and time, or some other feature of the process.The exchangeable process model implicitly requires an assumption of conditional exchangeability in (7), the second order equivalent of the exchangeability judgements used in Bayesian regression (for discussion see Williamson and Sansom, 2019).Specifically, conditional second order exchangeability implies that the Y i 1 , Y i 2 , ... are second order exchangeable given some X i t that is constant for all t.We may thus specify a mean process, E[Y i t ] = M(X i t ; β i ) for some known covariate X i t and unknown parameter β i ; above, we assume the mean process to be E[Y i t ] = ϕ(X i t )β i .According to the second order representation theorem, where cov Yt ] and we arrive at (7) by building the joint representation, that is, stacking the instances of t.In can be intuitive to think of the full simulator outputs (X i , Y i ) as matrices with the variant and invariant dimensions (here, space and time) indexed across the rows and columns, respectively.The mathematics in Section 4.1 simply then require X i and Y i to be substituted by their vectorised equivalents, vec(X i ) and vec(Y i ).

Reality as a coexchangeable process
We now show how judgements of coexchangeability may be made to incorporate (10) or ( 11) into the coexchangeable model.Define the true process of Y i that the simulators attempt to resolve as Y * .An assumption coexchangeability of Y * and Y i given fixed X i = X ∀i, and hence Φ i = Φ ∀i, is equivalent to assuming coexchangeability of Y * and the β i .Bayes linear sufficiency of M β for the Y i for adjusting Y * follows.We write a model for where U Y is a model-mismatch term uncorrelated with the Y i and the assumption of coexchangeability permits A Y to be any matrix of suitable dimensions.The choice of A Y = ϕ(X * ) is obvious in our context but different choices are permissible should they make sense to the application.Finally, the data, Z Y are modelled as As with the coexchangeable model in Section 3.3, we show an graphical representation of coexchangeable process model in Figure 1b; the equations for the two-stage belief update are provided in the supplementary material.
The coexchangeable model of Rougier et al. (2013) b) The coexchangeable process model Figure 1: Graphical representation of the coexchangeable and coexchangeable process models.Boxes represent observed quantities, dashed circles represent unobserved quantities over which we make prior belief specifications, and solid circles represent unobserved quantities for which we calculate updated beliefs.Analogous to conditional independence in probabilistic models, arrows may be used to identify Bayes linear sufficiency between quantities.
For simplicity, residual terms are omitted.
5 Joint reconstruction of Paleo sea-surface temperature and sea-ice with PMIP3 and PMIP4

Paleodata
Geological, ecological and geochemical measurements of the LGM have large associated uncertainties, and these uncertainties are further compounded by relating the measured processes into proxies for SST and SIC.We use judgements from subject matter experts to account for sampling bias and the errors in the proxy-data, which are considered to be systematic in space.In some cases we directly incorporate these judgements into the belief structure of the model, namely, expectations, variances, and covariances.In other cases, similar to Rougier et al. (2022), we use 'pseudo-observations' to reflect subject matter experts' judgements in sparsely observed areas.SST data, Z X , uses proxy measurements obtained from sea-bed sediment core samples recorded either as annual or summer means.The measurements are shown in Figure 2; annual and summer means are depicted with points and triangles, respectively.There is very likely some strong observation bias in the foraminifera-based proxy measurements from the Arctic and Nordic seas, approximately north of 62 • N. In this region, very cold water and full sea-ice coverage are common; both inhospitable conditions for most foraminifera species.The non-existence of foraminifera in ocean sediments are rarely reported, since the absence of foraminifera leaves little to be analysed in ecological or geochemical studies.
Thus, observations may be biased towards the warm climate events naturally present within the inter-decadal variability, when foraminifera are found.We account for this through the specification of the measurement error term in (6), W X .Define B N as a set of indices that index observations from the Nordic Seas in Z X .We set As the bias originates from a systematic reporting error we believe the measurement errors to be correlated.We partition var[W X ] = V D (I + V B )V D and set V D as the diagonal matrix of the reported standard deviations in the MARGO dataset, and V B[b,b ′ ] = 1 ∀b, b ′ ∈ B N where b ̸ = b ′ , and 0 otherwise.This represents the weakest possible belief specification as it makes the measurement error in the Nordic seas perfectly correlated, in essence reducing the information into a single observation.Without accounting for the bias in these observations, the Nordic Seas would be too warm to support sea-ice, which is known from geological records not to be the case.The specification of E[W X ] b = 2 comes from subject matter experts; even still, due to the imposed correlation structure the analysis is not sensitive to this specification.Finally, we set incidence matrix in ( 6), H X = H T X ⊗ H S X , where H T X calculates either the annual or summer means of X * as necessary, and H S X spatially interpolates these averages from simulator's spatial grid to the data locations.Point-wise proxy measurements of SIC are difficult to interpret and unreliable.More robust are estimates of maximum sea-ice extent.The Northern and Southern extents used for Z Y are shown in Figure 2 by the red dashed lines.Z Y is a 4145-dimensional vector, i.e. the same spatial resolution as the numerical models, that records a 1 at spatial locations within the extent boundaries, and a 0 outside.Incidence matrix, H Y in ( 14) is a p × q matrix that collates, from Y * , the February SIC from the Northern hemisphere and August SIC from the Southern hemisphere.We specify measurement error, W Y , to be spatially correlated and certain of SIC estimates in the poles, where we are confident that there is full sea-ice coverage, and mid-latitude and equatorial regions, where we are confident there is zero sea-ice.We set var[W Y ] = Kcor[W Y ]K where K is a diagonal matrix that represents our marginal uncertainty and cor[W Y ] is spatially correlated error; descriptions of K and cor[W Y ] are given in the supplementary material.

Fitting the coexchangeable process model
To fit the coexchangeable process model we first model SST via the coexchangeable model described in Section 3.3 and the process of SIC given SST is modelled using the methodology developed in Section 4. As described above, we build a MME by selecting a single representative simulation from each of the PMIP3 and PMIP4 modelling groups, with the exception of the HadCM3 model simulations where we make use of all three available PMIP4 simulations that use different ice sheet boundary conditions.The m = 13 models selected are The assumption of coexchangeability is a prior judgement, and at the time of the analysis each of these models was deemed coexchangeable by the project's SMEs.Each ensemble member was projected onto the FAMOUS ocean grid with 4145 spatial locations, and we use the 12 monthly means of SST and SIC.

Sea-surface temperature
Following notation in Section 3.3, define the ensemble SSTs as X = {X 1 , . . ., X m }, and assume second order exchangeability within the ensemble.This leads to the representation in equation ( 4), for which we require prior specification of var[M X ] and var[R i X ].Coexchangeabilty of X i and X * leads to (5), for which we require prior specification of model mismatch terms E[U X ], var[U X ] and incidence matrix A. We specify var[R X ] = α 2 var[M X ] so that var[X] = (1 + α 2 )var[M X ] and, following Rougier et al. (2013), set E[U X ] = 0 and A = I.
For most climate fields, including SST, we can exploit spatio-temporal structure so that computation of adjusted beliefs is scalable.The most obvious way to do this is to assume separability through space and time so that var[X] and var[X * ], and hence var[U X ], have a Kronecker structure.For example, var[X] = var[X T ] ⊗ var[X S ] where T denotes time and S denotes space.If we similarly partition var[U X ] = var[U X T ] ⊗ var[U X S ] and equate either var[X T ] = var[U X T ] or var[X S ] = var[U X S ] computation of our adjusted beliefs is efficient.Note that this assumption is weaker than the application in Rougier et al. (2013) where it is assumed that κ 2 var[X] = var[U].Here, we set var[X T ] = var[U X T ] = J n×n as our subject matter experts believe that discrepancies between the X i and reality are predominantly spatial and constant in time.
We set var[X S ] as the positive semi-definite matrix that minimises the distance between var[X] and the empirical covariance matrix of X, S X , under the Frobenius norm.This specification follows similar arguments to Rougier et al. (2013) where var[X] = S X , but preserves Kronecker structure in var[X] to allow for efficient computation.Details of this calculation are in the supplementary material.Elements of var[U X S ] are defined via the where τ ≥ 6, c ∈ (0, π], (a) + = max(0, a), and d(i, j) is the geodesic distance between locations i and j.The C 4 -Wendland covariance function is commonly chosen so as to define a smooth process on the sphere; (see, for example Astfalck et al., 2019).Parameters are specified as κ = 1.61, c = 0.92, and τ = 6; these values are selected to represent the subject matter experts' beliefs as to the magnitude and correlation lengths of U X .
A sensitivity analysis is provided in the supplementary material to highlight how these judgments influence inference.
The 2-stage Bayes linear update follows Rougier et al. (2013).As with Rougier et al. (2013), we assume the first update E Here, we choose α 2 = 1 and have m = 13 and so do not assume var , where var X[X * T ] = J n×n and var . The above specifications lead to a prior predictive for X * that is warmer than our true beliefs at the poles where we are certain there was full sea ice coverage and so the SST must be −1.92 • C.This is problematic due to the data sparsity in the poles, and so we correct our prior by adding 10 equally longitudinally spaced pseudo-observations at 80 • N and 80 • S, each of −1.92 • C. Figures 3a and 3b plot E X[X * ] and marginal standard deviation of var X[X * ], respectively, for January.
The second update is calculated by where var 3c and 3d show these updates, also for January, respectively.Figures 3e and 3f give indication of what information is gained from the data: Figure 3e shows the difference the ensemble mean and E X,Z X [X * ]; Figure 3f shows the empirical standard deviation of the ensemble, which when compared to Figure 3d is shown to be more uncertain.Regions of cooling are apparent westward of large continental masses, i.e. on the North American Pacific Coast, or the African Atlantic Coast.This is indicative of models not capturing up-welling phenomena, and is a pattern previously observed in the error between models and pre-industrial simulations (Eyring et al., 2019), as well as the cooling in the Southern Ocean, and warming the Indian and Pacific oceans.Larger uncertainty is seen in the Pacific where measurements are sparse.

Sea-ice concentration given sea-surface temperature
Following notation in Section 4 we now consider the process of SIC given SST.Define the jointly-observed ensemble of SST and SIC as (X, Y) = {(X 1 , Y 1 ), . . ., (X m , Y m )} where each Y i is a spatio-temporal vector of SIC that we model dependent on SST, X i .The Y i comprise 12 (i.e.monthly) 4145-dimensional conditionally second order exchangeable spatial processes, Y i t , where i indexes the ensemble member and t indexes time.We assume E[Y i t ] = ϕ(X i t )β i which leads to the representation in ( 7), for which we require specifications of ϕ(•), E[β i ], var[β i ], and var [R i  Yt ].We assume the β i to be second order exchangeable which leads to the representation in (8) that requires specification of E[M β ], var[M β ] and var[R β ].Together, these exchangeability judgements lead to the representation for the Y i in (9), and equivalently (10).
For modelling SIC given SST we require spatial variation in the physics of the process.For example, the relationship between SIC and SST is different in the Nordic seas where sea-ice is supported at warmer SSTs than other locations.To account for this, we specify ϕ(X i t ) = Ψ(X i t )Θ i where Ψ(Y i t ) models the behaviour of SIC and SST at individual locations using spline bases, and Θ i is a fixed-rank spatial basis of the spline coefficients.Note that we can model the spline coefficients individually in space, in which case ϕ(X i t ) = Ψ(X i t ), but as the size of β i then scales with the spatial resolution; this is not feasible for climate models.Here we specify Ψ(X i t ) with I-spline bases, a basis family commonly used for monotone functions (Ramsay, 1988).To ease computation, we project the spatial basis Θ i onto a principal component design calculated using a projection of the Y i onto the column space of the Ψ(X i ), Θi .Approximating the spatial coefficients using principal components restricts inference for the β i and M β to linear combinations of Θi ; we argue this is appropriate for modelling the mean MME process.We use a more flexible parametrisation in the model discrepancy below so that inference on Y * given the SIC data is not restricted to linear combinations of the principal component design.Full specification of Ψ(X i t ) and the Θ i are in the supplementary material.All Φ i = ϕ(X i ) are full rank and so the conditions of Theorem 4.1 are met and β = ( β1 , . . ., βm ) is sufficient for (X, Y) for updating B = (β 1 , . . ., β m , M β ).We specify var [R i  Yt ] as a heteroskedastic error process; smaller variance is specified for very cold SSTs, where we are confident there is sea-ice, and warm SSTs, where we are confident there is no sea-ice, and larger variance is specified for SSTs approximately between −1 • C and 3 • C where sea-ice behaviour is vari- Adjusted beliefs E β[B] and var β[B] are calculated as in the supplementary material.As mentioned in Section 4.1 an advantage of specifying our model jointly over the β i and M β is that we obtain inference for each β i (as opposed to only M β as in Goldstein and Wooff, 1998).We look at the individual fits in Figure 4 where we plot, at four locations, the fit ensemble members prior to the coexchangeable adjustment.Location A shows a point in the Arctic where the models largely agree.Location B shows a region that occasionally sees small concentrations of sea-ice in the models but is predominantly too warm to support full sea-ice coverage.Location C shows three MME members whose relationship of SIC given SST disagree with the rest of the ensemble.Finally, location D shows sea-ice behaviour at the Antarctic sea-ice edge.Given the SSTs that are observed, the differing physics predominantly lead to differences in Winter sea-ice.S6) and (S7).Expectations and marginal standard deviations of both updates, calculated with X * = E X,Z X [X * ], are shown in Figure 5.The expected SSTs do not produce adequate seaice when only considering learnt relations from the MME (Figure 5a) and marginal standard deviation of SIC is large in regions where the expected SST is cold enough to guarantee sea-ice coverage (Figure 5b).Updating SIC using the data leads to more extensive seaice cover (Figure 5c) and a reduction of standard deviation everywhere except the sea-ice boundary (Figure 5d).We also show, in Figures 5e and 5f the empirical ensemble mean and standard deviation, respectively.Our final reconstruction produces more sea-ice in the Southern Hemisphere, but more crucially, drastically reduces the sea-ice uncertainty in the sea-ice interior.Similar behaviour is seen in the Northern Hemisphere winter.
6 Sampled boundary conditions and their influence on glacial ice sheet modelling outputs The remit of this work was to reconstruct, with uncertainty, joint SST and SIC fields to act as boundary conditions into FAMOUS-Ice (Smith et al., 2021), an 'atmosphere only' global climate model coupled to a ice sheet model.To examine the effects that the reconstruction has on the atmosphere-ice sheet model outputs we run a small ensemble varying only the SST and SIC boundary conditions.Bayes linear analysis updates our beliefs of the first two moments of X * and Y * .Define the Cholesky decompositions L X and L Y , so that where Z is a vector of independent random variables Z i with E[Z i ] = 0 and var[Z i ] = 1.Note, assigning a distributional form to the Z i is a further choice; for example, we could assign a Gaussian, uniform, or any other distribution that would make sense given the context.Similar to how ensemble design is considered in history matching (e.g.Salter et al., 2019), we may eschew such distribution assumptions and set the bounds of the Z i with an appeal to Chebyshev's inequality.As is standard in the history matching literature, we set the concentration parameter k = 3 and sample Z i ∼ U(−k, k); we call this the plausible set, and stress that it is not a probabilistic design, rather, a bounding notion of plausibility.
To generate a joint sample ( X * , Ỹ * ), we first sample X * and then Ỹ * .The structural dependencies between the sampled X * and Ỹ * are captured by our updated beliefs on M β and U Θ .An example of a joint sample ( X * , Ỹ * ) for the months of February and August is given in Figure 6.
We generate an ensemble of 25 runs comprising a single reference run utilising the mean SST and SIC fields produced as part of the PMIP4 LGM experiments, and 24 randomly generated plausible samples of SST and SIC.Averaging over PMIP models to act as boundary conditions in ice-sheet modelling is commonplace (e.g.Kageyama et al., 2017).Crucially, it should be noted that the reference run and each of the PMIP runs (X, Y) do not lie within the calculated plausibility bounds.To examine the impact of running with plausible boundary conditions, we compare reference and plausible ice sheet heights at four geographically distinct locations: Arctic Canada, Central Greenland, Hudson Bay, and the Pacific coast; this is shown in Figure 7.The simulator is run for 5000 years, with the ice sheet initialised with the LGM Glac-1D reconstruction (Tarasov et al., 2012).Beyond the SST and SIC fields, no other model parameters were changed and the model set-up was based on previous simulations of the Greenland ice sheet (Gregory et al., 2020).
Figure 7 shows that the boundary conditions have a strong influence on the simulated ice sheet.SST and SIC can affect ice sheets either through changing the evaporation over the ocean that transforms into snow falling on the ice sheet, or by warming/cooling the regions close to the oceans thus affecting the surface melt rate.We find that the primary differences in ice sheet size are due to changes in snow accumulation.Our ensemble mainly produces lower ice elevation than the reference run.Examination of the individual ensemble members revealed that this was due to the cooler Eastern Pacific and Western Atlantic boundary conditions reducing evaporation over these oceans causing a reduction in the accumulation of snow onto the ice sheet.The difference between the reference run and the samples is most pronounced at the Pacific coast.This matches our expectation that ocean-proximal sites are more sensitive to marine influence of the SST and SIC than more continental sites.This is also where we see the strongest reduction in SST in our samples compared to the PMIP model simulations.Indeed, the second update causes a strong Pacific cooling along the coast of North America (Figure 2e) relative to what we expect based on the PMIP models.This is a region where models tend to underestimate the upwelling of cold waters from the deep ocean.Such biases are common in climate models, but are particularly problematic for coupled climate-ice sheet models where the strong feedbacks between climate and ice sheets amplify the effects of climate biases, which can lead to runaway ice sheets (amplified growth or decay) and unrealistic geometries.The ensemble shows a substantial spread of ice elevation (5-10% of height) caused by the variance in reconstructed SST and SIC, highlighting the importance for considering this source of uncertainty for modelling past ice sheets.Our results show that correcting for biases and incorporating uncertainty in surface ocean conditions has a substantial effect on the simulated ice sheet, which in turn influences the internal dynamics of the ice sheet and its vulnerability to collapse or propensity to grow.The ice sheet geometry itself has direct impact on atmospheric circulation and an indirect influence on ocean circulation from runoff, thus directly impacting global heat distributions and surface climate conditions.Our methodology provides a way to reduce climate biases by prescribing ocean surface conditions compatible with observations, while at the same time exploring the effects of this source of uncertainty on other parts of the earth system.

Discussion
By exploiting natural conditional exchangeability judgements we develop theory for the coexchangeable process model, as an extension to Rougier et al. (2013), that combines multi-model ensembles and data to model correlated spatio-temporal processes.We provide results for efficient and scalable inference that may be used for large spatio-temporal problems where probabilistic Bayesian methods are often computationally infeasible (see, for example, Sansom et al., 2021).Our methodology requires fewer assumptions and less onerous belief specifications than that required by a probabilistic Bayesian analysis.To achieve these advancements we develop a Bayes linear analogue to a hierarchical Bayesian model.By combining exchangeability judgements and using the reparameterisation of Hodges (1998), we extend the Bayes linear exchangeable regressions methodology.We obtain hitherto missing desirable properties present in traditional Bayesian hierarchical models such as the ability to make individual group level inference.
Large scale computational models often have complex spatio-temporal boundary conditions.This is particularly true for Earth system modelling, when any simulation of part of the Earth system, requires other spatio-temporal fields as boundary conditions.Our application looked at palaeo-era ice-sheet modelling, where our model had a coupled atmosphere and ice-sheet, with the SST and SIC as prescribed boundary conditions.These are usually specified using results from a reference simulation, or using a member of a Model Intercomparison Project (MIP).However, individual simulations of Earth system components are known to have biases and any individual simulation cannot adequately represent uncertainty due to boundary conditions.An idea for future investigation is to use the differences between MIP phases to estimate the ensemble discrepancy.This would allow for differences between older and newer MIP phases to inform the discrepancy between the newer models and reality.Whilst outside the scope of this work, using MIP iterations to inform model discrepancy is an interesting avenue, especially for models of the present-day where lots of data are available for validation.
Our methodology allows MIP simulations to be combined with observations efficiently, leading to joint reconstructions of climate boundary conditions that can be used in any area of Earth System modelling.We demonstrate its efficacy by reconstructing last glacial maximum SST and SIC to force an ice-sheet model.We show that the differences between reference ice-sheets and ice-sheets under plausible boundary conditions were considerable and that the uncertainty in the ice-sheet due to propagated boundary condition uncertainty is not ignorable.Other aspects of the Earth system are likely to be sensitive to their boundary conditions, so that joint reconstructions of the type we present here would allow MIP simulations and data to be combined in order to correct existing biases and quantify an important source of uncertainty.The use of MIP ensembles to drive simulations of Earth system components leads to important questions around how these ensembles should be designed.Our method makes the case that a priori exchangeability across as many models as possible is an important design goal.
10) with the familiar linear canonical form Y = XB + E. As we consider the joint representation, expectation is taken jointly over the Y i and β i .The adjusted expectation, E Y [B], and adjusted variance, var Y [B], are calculated via (1) and ( for measurement error W Y , and known incidence matrix H Y .Similar toRougier et al. (2013), we assume sufficiency of Y * for Z Y , allowing us to update beliefs on Y * in two stages.The assumption of sufficiency of Y * for Z Y is akin to an assumption of conditional independence between Z Y and the Y i in a probabilistic analysis.The two-stage update of beliefs on Y * may thus be equivalently thought of as either the joint update by the Y i and Z Y or as sequential updates by the Y i and then Z Y .

Figure 2 :
Figure2: Proxy-based measurements of LGM SST and maximum sea-ice extents.Annual and summer mean SST's are represented by points and triangles, respectively.Sea-ice extents are represented by a red dashed line.The Southern sea-ice extent is as reported inGersonde et al. (2005) and the Northern sea-ice extent is provided by coauthors.

Figure 3 :
Figure 3: Adjusted beliefs of January SST: (a) expectation of SST adjusted by X, equal to the ensemble mean; (b) marginal standard deviation of SST adjusted by X; (c) expectation of SST adjusted by X and Z X ; (d) marginal standard deviation of SST adjusted by X and Z X ; (e) contribution of the data to our expected beliefs of SST E X,Z X [X * ] − X; and (f), marginal standard deviation of the ensemble.All plots are shown in degrees Celsius.
able.As above we assume E[M β ] is well approximated by the empirical mean of the MME members so that E[M β ] = 1 m m i=1 βi ; further, similar to Rougier et al. (2013), we set var[β i ] = λ where λ is a diagonal matrix of the eigenvalues calculated from the principal component decomposition described above.As before we set var[R β ] = var[M β ] and so var[R β ] = var[M β ] = λ/2.

Figure 4 :
Figure4: Spline fits, parameterised by β i , of SIC given SST to each ensemble member prior to the coexchangeable adjustment.Shaded regions denote ±2 standard deviations.

Figure 5 :
Figure 5: Adjusted beliefs of August SIC given expected SST, X * = E X,Z X [X * ]: (a) expectation of SIC adjusted by β; (b) marginal standard deviation of SIC adjusted by β; (c) expectation of SIC adjusted by β and Z Y ; (d) marginal standard deviation of SIC adjusted by β and Z Y ; (e) ensemble mean; and (f), marginal standard deviation of the ensemble.Values are of sea-ice concentration measured between 0 and 1.

Figure 6 :
Figure 6: A single plausible sample from the SST and SIC reconstructions.Figures 6a and 6b show SSTs from February and August, respectively and Figures 6c and 6d show SICs from February and August, respectively.

Figure 7 :
Figure 7: Ensemble time-series plots at four spatial locations: Arctic Canada (A), Central Greenland (B), Hudson Bay (C), and the Pacific Coast (D).The reference run is shown in red, the simulations forced with sampled boundary conditions in black.