Mesh-clustered Gaussian process emulator for partial differential equation boundary value problems

Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as finite element methods, yielding solutions over the spatial domain. However, obtaining these solutions is often prohibitively costly, limiting the feasibility of exploring parameters in PDEs. In this paper, we propose an efficient emulator that simultaneously predicts the solutions over the spatial domain, with theoretical justification of its uncertainty quantification. The novelty of the proposed method lies in the incorporation of the mesh node coordinates into the statistical model. In particular, the proposed method segments the mesh nodes into multiple clusters via a Dirichlet process prior and fits Gaussian process models with the same hyperparameters in each of them. Most importantly, by revealing the underlying clustering structures, the proposed method can provide valuable insights into qualitative features of the resulting dynamics that can be used to guide further investigations. Real examples demonstrate that our proposed method has smaller prediction errors than its main competitors, with competitive computation time, and identifies interesting clusters of mesh nodes that possess physical significance, such as satisfying boundary conditions. An R package for the proposed methodology is provided in an open repository.


Introduction
Computer models have become essential for studying physical systems that are expensive or infeasible to investigate directly, and have been successfully applied in a variety of scientific research, ranging from cell adhesion (Sung et al., 2020) to designing a rocket injector (Mak et al., 2018).
Typically, a physical system is described by a computer model consisting of a series of partial differential equations (PDEs) (Evans, 2010) and evaluated in a two- or three-dimensional space. For instance, in Wang et al. (2018), the governing PDEs are evaluated to simulate the characteristics of a three-dimensional swirling flow in a cylindrical chamber.
These PDEs are typically solved by numerical methods, such as finite element methods (FEMs) or a collocation method (Fornberg and Flyer, 2015), based on a mesh specification in a two- or three-dimensional space, and the numerical solutions are evaluated at these mesh node coordinates (also called grid points (Mak et al., 2018; Tan, 2018a)). The number of nodes is usually fairly large to ensure the numerical accuracy of PDE solutions. Such evaluations, however, are often prohibitively costly for input space exploration. For instance, the high-fidelity simulation in Mak et al. (2018) generates around 100,000 grid points and takes six days of computation time for a given input. Thus, it is essential to develop a cheaper statistical emulator as a surrogate model that approximates the solutions in a timely fashion.
The paper focuses on developing an efficient emulator for a series of solutions at many coordinates in a PDE boundary value problem, where the coordinates are fixed across inputs. A PDE boundary value problem involves solving a set of PDEs subject to specified conditions at the boundaries of the domain. A popular statistical emulator for computer simulations is through Gaussian process (GP) modeling (Santner et al., 2018; Gramacy, 2020), which provides a flexible approximation to the relationship between simulation output and inputs and quantifies uncertainty through its predictive variance; however, the GP is mainly for predicting a scalar output and is not directly applicable to the context of many outputs. One idea is to simultaneously emulate the output at each coordinate separately using independent GPs, which is discussed in Qian et al. (2008), Conti and O'Hagan (2010), and Lee et al. (2011, 2012), and another idea is using GPs with a special shared covariance structure (Gu and Berger, 2016). However, these methods do not incorporate the information of mesh node locations into their models, making it challenging to provide predictions at any coordinates in the domain of interest beyond the mesh coordinates.
Another idea is to perform dimension reduction to approximate the simulation outputs using basis expansion, such as functional principal component analysis (Ramsay and Silverman, 2005) or Karhunen-Loève expansion (Karhunen, 1947; Loève, 1955), and then fit GP models on the coefficients, the number of which is usually much smaller than the number of outputs.
The methods adopting this idea include Higdon et al. (2008), Rougier (2008), Rougier et al. (2009), Marrel et al. (2011), Mak et al. (2018), and Tan (2018a). Such methods, however, achieve the dimension reduction by a finite truncation of the expansion in a function basis, and the approximation error can introduce additional bias to the predictions.
In this paper, we propose a novel emulator, called the mesh-clustered Gaussian process (mcGP) emulator, for predicting the outputs at many fixed coordinates by incorporating the information of mesh node coordinates. Specifically, instead of fitting separate GPs with different hyperparameters at each node coordinate, the proposed method makes use of the divide-and-conquer idea, which segments the node coordinates into clusters with a soft-assignment clustering approach; within each cluster, GPs with the same hyperparameters are fitted at each coordinate. In particular, the Dirichlet process (DP) prior is adopted here, facilitating a flexible clustering structure for the proposed mixture model without the need for specifying a fixed number of clusters. In addition, a basis expansion representation is employed for mesh-based numerical solutions and the GPs are used to model the coefficients.
This approach enables us to make predictions across the entire spatial domain, extending beyond the mesh coordinates. Note that this basis expansion does not perform dimension reduction, so no additional bias will be introduced to our predictions. Importantly, given such a sophisticated mixture model, the proposed method can be fitted efficiently by adopting the variational Bayesian inference method (Jordan et al., 1999; Wainwright and Jordan, 2008), which provides an analytical approximation to the posterior distribution of the latent variables and parameters, facilitating faster computation and scalability when compared with traditional approaches like Markov chain Monte Carlo (MCMC).
In addition to efficient emulation, our method also provides two important features.
First, our method enables efficient uncertainty quantification for emulation with theoretical guarantees. In particular, we provide an error analysis that not only considers the uncertainty of the emulator given limited training samples, but also accounts for the numerical error arising from the discrete approximation of the continuous domain through a mesh configuration in numerical methods for solving PDEs. Second, by revealing the clustering structures, the proposed method provides valuable insights into qualitative features of the resulting dynamics. Unlike traditional reduced-basis methods for flow simulations in physics and engineering, such as proper orthogonal decomposition (POD) (Lumley, 1967), which are unsupervised learning methods purely based on the flow data, the latent clustering structure of the proposed method is determined by both inputs and outputs as in Joseph and Mak (2021) and Sung et al. (2023), which can be used to separate a simulated flow into key instability structures, each with its corresponding spatial features.
It is important to note that recent developments in the field of finite element methods have led to the emergence of the statistical finite elements (statFEM) approach, which holds promise in advancing uncertainty quantification and model predictions through a Bayesian statistical construction of finite element methods. See, for example, Duffin et al. (2021), Girolami et al. (2021), and Akyildiz et al. (2022). Specifically, statFEM introduces a hierarchical statistical model to handle uncertainties in data, mathematical models, and finite element discretization. It decomposes data into three components, namely the finite element solution, model misspecification, and noise, each treated as a random variable with a corresponding prior probability density. In contrast, our focus in this paper is primarily on the development of an efficient surrogate model, providing a faster alternative to costly finite element simulations.
The rest of this article is organized as follows. Section 2 provides a brief overview of PDE boundary value problems and mesh-based numerical methods. The proposed model is introduced in Section 3, and its error analysis is derived in Section 4. Real examples are demonstrated in Section 5. Section 6 concludes with directions for future work. Theoretical proofs, and the R (R Core Team, 2018) code for reproducing the numerical results, are provided in Supplementary Materials.

PDE boundary value problems and mesh-based numerical methods
PDEs are an essential tool in the description of complex systems drawing from scientific principles. A PDE boundary value problem typically can be expressed by a set of partial differential equations with boundary conditions, and the solutions are often dependent on certain inputs, denoted as x ∈ χ ⊂ R^p, typically representing parameters in the equations and boundary conditions, where χ is assumed to be compact and convex. Specifically, a PDE boundary value problem can be written in the form of

L(u; x) = f(s; x),  s ∈ Ω,
G(u; x) = g(s; x),  s ∈ ∂Ω,    (1)

where Ω is a compact domain in R^d with a Lipschitz boundary denoted by ∂Ω, L(•; x) and G(•; x) are differential operators on Ω and ∂Ω given the input x, respectively, f(•; x) and g(•; x) are two known functions on Ω and ∂Ω with the input x, and u(s) is the solution to the partial differential equations (1). Given the input x, the exact solution to the equations when uniqueness holds, denoted by u_0(s; x), usually cannot be written explicitly. Instead, numerical methods are used to approximate the exact solution, for which mesh-based numerical methods are most widely used, including the finite difference method (FDM), finite volume method (FVM), and finite element method (FEM) (Brenner and Scott, 2007; Tekkaya and Soyarslan, 2019). The extensions to mesh-free methods, such as the collocation method (Golberg et al., 1999; Wendland, 1998, 1999; Fasshauer, 1996, 1999), are straightforward and will be discussed in the remarks. Specifically, mesh-based numerical methods subdivide a large system into smaller, simpler parts by a particular discretization in the space dimensions, which is implemented by the construction of a mesh of the object. Figure 1 demonstrates a mesh specification for a 2-dimensional FEM problem, in which the triangular elements connect all characteristic points (called nodes) that lie on their circumference, and these connections are mathematically expressed through a set of functions called shape functions. Suppose that there are N nodes in
this mesh-based numerical method, the coordinates of which are denoted by S_N = (s_1, s_2, . . ., s_N), where s_j ∈ Ω ∪ ∂Ω. We assume that S_N is fixed across different inputs x. In the case of an adaptive mesh, where S_N may vary across different inputs, we explore potential methods in Section 6. The numerical solutions can be expressed as

u_N(s; x) = ∑_{j=1}^{N} β_j(x) v_j(s),    (2)

where v_j(s) is the (given) shape function depending on the discretization, and β_j(x) is the corresponding coefficient, which is independent of s. The shape function has the Kronecker delta property, that is, v_j(s_i) = 1 if i = j and v_j(s_i) = 0 if i ≠ j. This ensures that the function interpolates the solution at the mesh nodes, i.e., u_N(s_j; x) = β_j(x) for j = 1, . . ., N.
In other words, the solution of the PDE at the mesh node s_j is equal to β_j(x). As an example, for a triangular element with six nodes as in Figure 1, the quadratic shape functions, which will be adopted in our later implementation in Section 5, can be defined as:

v_1 = ξ_1(2ξ_1 − 1),  v_2 = ξ_2(2ξ_2 − 1),  v_3 = ξ_3(2ξ_3 − 1),
v_4 = 4ξ_1ξ_2,  v_5 = 4ξ_2ξ_3,  v_6 = 4ξ_3ξ_1,    (3)

where ξ_1, ξ_2, and ξ_3 represent the barycentric coordinates associated with the three vertices (Nodes 1, 2, and 3, respectively), which are given by ξ_j(s) = a_j + b_j s_1 + c_j s_2, where a_j, b_j, and c_j are determined based on the geometry and orientation of the triangle, and ensure that the shape functions are equal to 1 at the corresponding nodes and equal to 0 at the other nodes. This formulation can be easily extended to multiple triangular elements; we refer to Bathe (2006) for more detailed information and variations of shape functions.
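The shape functions above can be checked numerically. The following sketch (function names are illustrative, not from the paper's code) evaluates the standard six-node quadratic shape functions on a reference triangle and verifies the Kronecker delta property at the three vertices and three edge midpoints:

```python
import numpy as np

def quadratic_shape_functions(xi1, xi2):
    """Six-node quadratic shape functions in barycentric coordinates
    (xi1, xi2, xi3) with xi3 = 1 - xi1 - xi2, as in equation (3)."""
    xi3 = 1.0 - xi1 - xi2
    return np.array([
        xi1 * (2 * xi1 - 1),  # vertex node 1
        xi2 * (2 * xi2 - 1),  # vertex node 2
        xi3 * (2 * xi3 - 1),  # vertex node 3
        4 * xi1 * xi2,        # mid-side node 4 (between nodes 1 and 2)
        4 * xi2 * xi3,        # mid-side node 5 (between nodes 2 and 3)
        4 * xi3 * xi1,        # mid-side node 6 (between nodes 3 and 1)
    ])

# Barycentric coordinates of the six nodes: three vertices, three edge midpoints.
nodes = [(1, 0), (0, 1), (0, 0), (0.5, 0.5), (0, 0.5), (0.5, 0)]
V = np.array([quadratic_shape_functions(*p) for p in nodes])

# Kronecker delta property: v_j(s_i) = 1 if i = j and 0 otherwise.
assert np.allclose(V, np.eye(6))
```

The resulting 6 × 6 evaluation matrix is the identity, which is exactly the interpolation property u_N(s_j; x) = β_j(x) used in the text.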
The coefficients are obtained by solving a linear system,

L(x) β_N(x) = b(x),    (4)

where β_N(x) = (β_1(x), . . ., β_N(x))^T is a vector of the coefficients, L(x) ∈ R^{N×N} is the stiffness matrix, and b(x) ∈ R^N is the load vector, both of which are determined by the numerical method.
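To make the linear system concrete, here is a minimal sketch (not the paper's FEM setup) for a toy one-dimensional Poisson problem −u'' = f on (0, 1) with u(0) = u(1) = 0, discretized by central finite differences; the nodal coefficients are recovered by solving a system of the form (4):

```python
import numpy as np

N = 99                         # number of interior nodes
h = 1.0 / (N + 1)              # mesh spacing
s = np.linspace(h, 1 - h, N)   # interior node coordinates

# Tridiagonal stiffness matrix L and load vector b; the right-hand side
# f = pi^2 sin(pi s) is manufactured so that the exact solution is sin(pi s).
L = (np.diag(2 * np.ones(N))
     - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2
b = np.pi**2 * np.sin(np.pi * s)

beta = np.linalg.solve(L, b)   # nodal solution values beta_j

# Second-order accuracy: the nodal error is O(h^2).
assert np.max(np.abs(beta - np.sin(np.pi * s))) < 1e-3
```

Each training run of the simulator amounts to one such solve per input x, which is exactly the expensive step the emulator is designed to avoid.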
The main challenge of mesh-based numerical methods is that directly solving the PDE boundary value problem for any input x, or equivalently, solving the linear system (4), can be computationally demanding (e.g., the high-fidelity simulation in Mak et al. (2018)). Thus, an efficient emulator that can approximate the solution, u_N(s; x), for any s ∈ Ω, x ∈ χ, is called for.
Mesh-clustered Gaussian process (mcGP) emulator

From (2), since emulating u_N(s; x) is equivalent to emulating {β_j(x)}_{j=1}^N (because v_j is a known function), we aim to build an efficient emulator that approximates β_N(x) := {β_j(x)}_{j=1}^N for any x ∈ χ. Suppose that n computer simulations with the inputs, x_1, . . ., x_n, are conducted, and their corresponding solutions at the N nodes are {β_N(x_i)}_{i=1}^n. Clearly, this is a multi-output regression problem, because for each input x_i, the output β_N(x_i) returns a vector of size N, where N can be fairly large. To this end, we propose an efficient emulator that couples over clusters of Gaussian process (GP) emulators, with the aid of the mesh specification to find the clustering structure.
To begin, we briefly introduce the GP emulator in the following subsection.

Gaussian process (GP) emulator
A GP is a popular tool for building an emulator for computer experiments (Santner et al., 2018; Gramacy, 2020) due to its flexibility and its capability of uncertainty quantification through the predictive distribution. Specifically, suppose that we aim to emulate the single output β_j(x); then the function β_j can be assumed to have a GP prior with zero mean and a positive-definite covariance function, Cov(β_j(x), β_j(x′)) = τ_j² Φ_{θ_j}(x, x′), where τ_j² is a positive scale and Φ_{θ_j} is a positive-definite correlation function that depends on some hyperparameters θ_j. We denote such a GP as β_j ∼ GP(0, τ_j² Φ_{θ_j}). The GP assumes that the random vector b_j = (β_j(x_1), . . ., β_j(x_n))^T follows an n-dimensional multivariate normal distribution with mean zero and covariance matrix τ_j² Φ_{θ_j}(X_n, X_n), where X_n = (x_1, . . ., x_n). For the N outputs β_N(x) = (β_1(x), . . ., β_N(x))^T, we assume that the GPs are independent, implying that the N random vectors, b_1, . . ., b_N, are independent.
For a new input x, it can be shown that the posterior predictive distribution is a normal distribution with the mean

μ_j(x) = Φ_{θ_j}(x, X_n) Φ_{θ_j}(X_n, X_n)^{−1} b_j,    (5)

and the variance

σ_j²(x) = τ_j² ( Φ_{θ_j}(x, x) − Φ_{θ_j}(x, X_n) Φ_{θ_j}(X_n, X_n)^{−1} Φ_{θ_j}(X_n, x) ),    (6)

where Φ_{θ_j}(x, X_n) = (Φ_{θ_j}(x, x_1), . . ., Φ_{θ_j}(x, x_n)) and Φ_{θ_j}(X_n, X_n) is the n × n matrix whose (i, l)-th entry is Φ_{θ_j}(x_i, x_l). The posterior predictive mean can be used to predict β_j(x) and the posterior predictive variance can be used to quantify the prediction uncertainty.
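A minimal sketch of the posterior predictive mean (5) and variance (6), assuming a generic correlation function and no nugget (the function and variable names are illustrative):

```python
import numpy as np

def gp_posterior(x_new, X, y, corr, tau2):
    """Posterior predictive mean (5) and variance (6) of a zero-mean GP."""
    Phi = corr(X, X)                       # n x n correlation matrix
    phi = corr(x_new, X)                   # cross-correlations to the new input
    mean = phi @ np.linalg.solve(Phi, y)   # equation (5)
    var = tau2 * (corr(x_new, x_new) - phi @ np.linalg.solve(Phi, phi.T))
    return mean, np.diag(var)              # equation (6), diagonal only

# Toy check with a squared-exponential correlation on scalar inputs.
def sqexp(A, B):
    A, B = np.atleast_2d(A).T, np.atleast_2d(B).T
    return np.exp(-(A - B.T) ** 2 / 0.5)

X = np.array([0.0, 0.5, 1.0])
y = np.sin(X)
m, v = gp_posterior(np.array([0.5]), X, y, sqexp, tau2=1.0)

assert np.isclose(m[0], np.sin(0.5))   # interpolates at a training point
assert abs(v[0]) < 1e-8                # zero predictive variance there
```

Without a nugget, the predictor interpolates the training data exactly, and the predictive variance collapses to zero at observed inputs, which is the behavior (5) and (6) encode.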
Two families of correlation functions are widely used in practice: the power exponential correlation function and the Matérn correlation function (Santner et al., 2018; Stein, 1999). For instance, the Matérn correlation function has the form of

Φ_θ(x, x′) = (2^{1−ν} / Γ(ν)) ( √(2ν) ∥x − x′∥_θ )^ν B_ν( √(2ν) ∥x − x′∥_θ ),    (7)

where ∥a∥_θ² = ∑_{k=1}^p (a_k/θ_k)² with the p-dimensional lengthscale hyperparameter θ = (θ_1, . . ., θ_p), ν > 0 is the smoothness parameter (Cramér and Leadbetter, 1967), and B_ν is the modified Bessel function of the second kind.
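The Matérn form (7) can be implemented directly via the modified Bessel function; a sketch as a function of the scaled distance, checked against the well-known closed form at ν = 5/2:

```python
import numpy as np
from scipy.special import kv, gamma

def matern(h, nu):
    """Matern correlation (7) as a function of h = ||x - x'||_theta."""
    h = np.asarray(h, dtype=float)
    out = np.ones_like(h)               # correlation 1 at zero distance
    nz = h > 0
    t = np.sqrt(2 * nu) * h[nz]
    out[nz] = (2 ** (1 - nu) / gamma(nu)) * t ** nu * kv(nu, t)
    return out

# For half-integer nu the Bessel form reduces to a closed form; at nu = 5/2:
h = np.linspace(0.0, 3.0, 50)
closed = (1 + np.sqrt(5) * h + 5 * h**2 / 3) * np.exp(-np.sqrt(5) * h)
assert np.allclose(matern(h, 2.5), closed)
```

The ν = 5/2 closed form is the one adopted later in the numerical studies (Section 5).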

Model specification
While it is possible to model multivariate GPs, instead of independent GPs, using a separable covariance function (Bonilla et al., 2007; Qian et al., 2008) or a nonseparable covariance function (Fricker et al., 2013; Svenson and Santner, 2016), fitting these models can be computationally prohibitive when N is large. In addition, recent studies (Zhang and Cai, 2015; Kleijnen and Mehdad, 2014; Li et al., 2020) have shown that such multivariate GPs could actually yield worse prediction accuracy than the independent (univariate) GPs described in Section 3.1. On the other hand, Gu and Berger (2016) consider independent GPs that share the same lengthscale hyperparameters over the N outputs, i.e., θ_1 = · · · = θ_N, but assume different τ_j's over the N outputs.
From the perspective of statistical learning, independent univariate GPs sharing the same hyperparameters over the N outputs could be underparametrized, while those with different hyperparameters could be overparametrized. To this end, we propose a flexible model that serves as a compromise between the two:

β_j ∼ GP(0, τ_j² Φ_{θ_j}),  (τ_j², θ_j) | G ∼ G,  G ∼ DP(H, α_0),    (8)

where DP(H, α_0) denotes a Dirichlet process (DP) prior (Ferguson, 1973) with a positive real scalar α_0 and H being a distribution over τ_j² and θ_j. The parameter α_0 is often called the concentration parameter; a smaller α_0 yields more concentrated distributions.
The DP prior is a Bayesian nonparametric model and is a popular tool for developing mixture models, which are often called infinite mixture models, because such mixture models have a countably infinite number of mixture components; therefore, these models do not require pre-specifying a fixed number of mixture components, which can be difficult to determine in practice. A DP can be constructively defined by the stick-breaking construction (Sethuraman, 1994), by which (8) can be equivalently expressed as

β_j | z_j = k ∼ GP(0, τ_k² Φ_{θ_k}),  (τ_k², θ_k) iid∼ H,    (9)
z_j | {π_k}_{k=1}^∞ ∼ Categorical(π_1, π_2, . . .),    (10)
π_k = γ_k ∏_{l=1}^{k−1} (1 − γ_l),    (11)
γ_k iid∼ Beta(1, α_0),    (12)

where "iid" denotes independently and identically distributed, and "Categorical" and "Beta" denote a categorical distribution and a Beta distribution, respectively. From (9), z_j is a latent variable indicating the assignment of β_j(x) to the k-th GP having the hyperparameters θ_k and τ_k², where the number of GPs is countably infinite. This forms an infinite mixture of GPs for multivariate outputs, with the mixing proportion Pr(z_j = k) = π_k.
The proportion π_k is given by (11), which provides the stick-breaking representation of a draw G = ∑_{k=1}^∞ π_k δ_{(τ_k², θ_k)}, where δ_{(τ_k², θ_k)} is the indicator function whose value is one at location (τ_k², θ_k) and zero elsewhere. The proportions {π_k}_{k=1}^∞ always sum to one, and the construction can be seen as resembling the breaking of a unit-length stick into a countably infinite number of pieces (hence the name). That is, a portion of a unit-length stick is broken off according to γ_k and assigned to π_k, so (11) can be understood by considering that after the first k − 1 values have been assigned their portions, the length of the remaining stick is ∏_{l=1}^{k−1} (1 − γ_l), and this remaining piece is broken according to γ_k and assigned to π_k. It is important to note that because the π_k's decrease exponentially quickly, only a small number of components will be used to model the data a priori, which allows the number of clusters to be automatically determined. More details about DP applications can be found in Neal (1992), Lo (1984), and Rasmussen (2000).
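The stick-breaking construction is easy to simulate; the following sketch (with an illustrative truncation at K = 50 sticks and the paper's later choice α_0 = 0.5) shows that the proportions never exceed the unit stick and that almost all of the mass falls on the first few components:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha0, K, R = 0.5, 50, 1000           # concentration, truncation, replications

gamma_ = rng.beta(1.0, alpha0, size=(R, K))   # gamma_k ~ Beta(1, alpha0), as in (12)
remaining = np.concatenate(
    [np.ones((R, 1)), np.cumprod(1.0 - gamma_[:, :-1], axis=1)], axis=1)
pi = gamma_ * remaining                 # pi_k = gamma_k * prod_{l<k} (1 - gamma_l), as in (11)

# The pieces of the unit stick cannot exceed one in total, and for small
# alpha0 the first few components carry almost all of the mass.
assert np.all(pi.sum(axis=1) <= 1.0 + 1e-12)
assert pi[:, :5].sum(axis=1).mean() > 0.95
```

This is precisely the behavior exploited later: although the mixture is formally infinite, only a handful of clusters receive non-negligible weight.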
We further let the latent indicator variable be mesh-dependent, implying that the clustering structure is determined by the mesh coordinates. Specifically, assume that a node coordinate s for the cluster k follows a d-dimensional multivariate normal distribution,

s | z = k ∼ N_d(μ_k, Σ_k^{−1}),    (13)

where μ_k is the mean and Σ_k is the precision matrix, and their priors are a multivariate normal distribution and a Wishart distribution, respectively, that is,

μ_k ∼ N_d(μ_0, Σ_0^{−1}),  Σ_k ∼ Wishart(W_0, κ_0),    (14)

where μ_0, Σ_0, W_0, and κ_0 are fixed hyperparameters. Unlike a conditional model where z|s is employed to model the dependence of z and s, the model (13) is a generative model (Ng and Jordan, 2001), which assumes that the mesh coordinates {s_j}_{j=1}^N, while observed and fixed, are treated as instances or realizations generated from the underlying distribution s|z.
This conceptualization allows us to incorporate uncertainty related to distinct instances of mesh coordinates within the same cluster, and offers a consistent way of specifying each component's responsibility for a given node coordinate (Meeds and Osindero, 2005; Sun and Xu, 2010). It is also worth noting that the multivariate normal assumption for s|z aligns with a well-known classification method, quadratic discriminant analysis (QDA), implying that mesh nodes will be categorized into distinct classes/clusters based on this model choice.
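The "responsibility" of each component for a node coordinate under the generative model (13) can be sketched as a Bayes-rule soft assignment (all parameter values below are made up for illustration; this is not the paper's fitted model):

```python
import numpy as np

def responsibilities(S, pis, mus, precs):
    """Soft assignments Pr(z_j = k | s_j) proportional to
    pi_k * N(s_j; mu_k, Sigma_k^{-1}), with precision matrices precs."""
    S = np.atleast_2d(S)
    d = S.shape[1]
    logp = []
    for pi_k, mu, P in zip(pis, mus, precs):
        diff = S - mu
        quad = np.einsum('ni,ij,nj->n', diff, P, diff)   # Mahalanobis terms
        logdet = np.linalg.slogdet(P)[1]
        logp.append(np.log(pi_k) + 0.5 * (logdet - d * np.log(2 * np.pi)) - 0.5 * quad)
    logp = np.stack(logp, axis=1)
    logp -= logp.max(axis=1, keepdims=True)              # stabilize the softmax
    w = np.exp(logp)
    return w / w.sum(axis=1, keepdims=True)

# Two well-separated clusters in 2-D: nodes near each mean get assigned to it.
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
precs = [np.eye(2), np.eye(2)]
r = responsibilities(np.array([[0.1, -0.1], [4.9, 5.2]]), [0.5, 0.5], mus, precs)
assert r[0, 0] > 0.99 and r[1, 1] > 0.99
```

With cluster-specific means and precisions, the decision boundaries between clusters are quadratic in s, which is the QDA connection noted above.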
Remark 3.1. Although the latent indicator variable is mesh-dependent particularly for mesh-based numerical methods, the idea can also be extended to mesh-free numerical methods, such as the collocation method (Wendland, 1998), where the basis function v_j(s) becomes a radial basis function with some given knot locations, and the knots can then be clustered in a similar manner. In addition, the idea can be naturally extended to applications other than solving PDEs that may have different dependence structures, such as the spatially-related dependence structure between proximate sites in Dahl and Bonilla (2019) and the network-related dependence structure in Wilson et al. (2012).
Combining the above models, a graphical representation of the proposed model is given in Figure 2. It should be noted that the proposed mcGP is intrinsically different from the infinite mixture of GPs in Rasmussen and Ghahramani (2001), Meeds and Osindero (2005), and Sun and Xu (2010). In particular, the mixture models therein focus on dividing the input space of x to address both the computational complexity and the stationarity assumption of a univariate GP, whereas our proposed method addresses multi-output regression problems by dividing the mesh node coordinates, {s_j}_{j=1}^N, into regions, within which separate univariate GPs with the same hyperparameters make predictions.

Parameter estimation: Variational inference
Although Markov chain Monte Carlo (MCMC) methods, such as Gibbs sampling, can be naturally used to draw the posterior distribution of hidden variables for DP mixture models (see, e.g., Rasmussen (2000), Neal (2000), and Rasmussen and Ghahramani (2001)), they are often computationally demanding. Therefore, variational inference (VI) (Jordan et al., 1999; Ganguly and Earp, 2021) is adopted here to approximate the posterior, which leads to faster computation and efficient scalability.
Denote the observational data by D = {β_j(x_i) : j = 1, . . ., N; i = 1, . . ., n}, and let ϕ collect the latent variables {z_j}_{j=1}^N and {γ_k, μ_k, Σ_k}_{k=1}^∞. By the graphical model representation in Figure 2, the joint distribution is

p(D, ϕ) = ∏_{j=1}^N p(b_j | z_j) p(s_j | z_j, μ_{z_j}, Σ_{z_j}) p(z_j | γ) ∏_{k=1}^∞ p(γ_k) p(μ_k) p(Σ_k),    (15)

where p(b_j | z_j) is the probability density function (pdf) of the n-dimensional multivariate normal distribution from (9), i.e., b_j | z_j = k ∼ N_n(0, τ_k² Φ_{θ_k}(X_n, X_n)), p(s_j | z_j, μ_{z_j}, Σ_{z_j}) is the pdf of the multivariate normal distribution from (13), p(γ_k) is the Beta distribution from (12), and p(μ_k) and p(Σ_k) are the multivariate normal distribution and the Wishart distribution from (14), respectively.
Clearly, the posterior, p(ϕ|D) = p(D, ϕ)/p(D), has a complex probability density which cannot be represented in a closed form. To this end, we apply VI to provide an analytical approximation to p(ϕ|D). Specifically, VI finds a distribution that is restricted to belong to a family of distributions of simpler forms, denoted by q(ϕ), such that q(ϕ) ≈ p(ϕ|D), which is called the variational distribution. This can be done by finding the variational distribution that minimizes the Kullback-Leibler (KL) divergence of q(ϕ) from p(ϕ|D), which is equivalent to maximizing

ELBO(q) = E_q[log p(D, ϕ)] − E_q[log q(ϕ)],    (16)

because log p(D) = KL(q(ϕ)||p(ϕ|D)) + ELBO(q) and log p(D) is fixed with respect to q(ϕ). The function (16) is called the evidence lower bound (ELBO), which is a lower bound for the log-evidence of the data (hence the name), that is, log p(D) ≥ ELBO(q) for any q. This can be shown by the fact that KL(q(ϕ)||p(ϕ|D)) ≥ 0 for any q.
Here we adopt the mean-field approximation method (see Blei et al. (2017) for more details) to formulate the variational distribution q(ϕ), which considers the mean-field variational family where the variables ϕ are mutually independent and each governed by a distinct factor in the variational density. In addition, we approximate the posterior DP by a truncated stick-breaking representation as in Ishwaran and James (2001), Blei and Jordan (2006), and Sun and Xu (2010). This can be done by setting q(γ_K = 1) = 1 with a fixed value K, indicating that the stick is no longer broken after K − 1 steps. The length of the remainder of the stick is assigned to π_K, while the lengths of the sticks beyond K become zero. This implies that the mixing proportion π_k = 0 for any k > K. Together, the mean-field approximation method uses the following factorized variational distribution to approximate p(ϕ|D):

q(ϕ) = ∏_{j=1}^N q(z_j) ∏_{k=1}^K q(γ_k) q(μ_k) q(Σ_k).    (17)

Given this mean-field variational family, the variational distribution for each ω ∈ ϕ can be optimized by maximizing the ELBO using coordinate ascent variational inference (Bishop, 2006), which gives the optimal variational distribution

q(ω) ∝ exp( E_{ϕ\ω}[log p(D, ϕ)] ),    (18)

where the expectation E_{ϕ\ω} is with respect to the variational distribution over all variables ϕ except for ω. For example, from (15), the optimal variational distribution for γ_k satisfies q(γ_k) ∝ p(γ_k) exp( E_{ϕ\γ_k}[ ∑_{j=1}^N log p(z_j | γ) ] ), where p(z_j | γ) can be found from (10) and (11). Taking the expectation with respect to the variational distribution of z_j, i.e., q(z_j), changing the order of summation, and dropping the terms that are constant with respect to γ_k, it follows that

E_{ϕ\γ_k}[ ∑_{j=1}^N log p(z_j | γ) ] = ∑_{j=1}^N q(z_j = k) log γ_k + ∑_{j=1}^N q(z_j > k) log(1 − γ_k) + const.

Since p(γ_k) follows a Beta distribution as in (12), we have q(γ_k) ∝ γ_k^{∑_{j=1}^N q(z_j = k)} (1 − γ_k)^{∑_{j=1}^N q(z_j > k) + α_0 − 1}, which implies that q(γ_k) follows Beta( ∑_{j=1}^N q(z_j = k) + 1, ∑_{j=1}^N q(z_j > k) + α_0 ). The optimal variational distributions for the remaining variables ω ∈ ϕ can be derived in a similar manner. The specific results are provided in the E-step of Algorithm 1, and for detailed derivations, we refer to Supplementary Materials S1. Note that the truncation level K is a variational parameter which can be freely set. Although it is possible to optimize K with respect to the KL divergence, we hold it fixed as in Blei and Jordan (2006) and Sun and Xu (2010) throughout this paper.
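The coordinate-ascent update for q(γ_k) can be sketched in a few lines, given current responsibilities q(z_j = k); the responsibilities below are randomly generated placeholders, not fitted values:

```python
import numpy as np

alpha0, K, N = 0.5, 4, 200
rng = np.random.default_rng(1)
r = rng.dirichlet(np.ones(K), size=N)      # placeholder q(z_j = k); rows sum to one

# First Beta parameter: sum_j q(z_j = k) + 1.
a = r.sum(axis=0) + 1.0

# Second Beta parameter: sum_j q(z_j > k) + alpha0, via suffix sums over k.
suffix = r[:, ::-1].cumsum(axis=1)[:, ::-1]          # sum over k' >= k
tail = np.concatenate([suffix[:, 1:], np.zeros((N, 1))], axis=1)  # sum over k' > k
b = tail.sum(axis=0) + alpha0

# Expected stick proportions under the updated factors, E[gamma_k] = a / (a + b).
E_gamma = a / (a + b)
assert np.all((E_gamma > 0) & (E_gamma < 1))
assert np.isclose((a - 1.0).sum(), N)      # responsibilities account for all N nodes
```

Each E-step pass simply refreshes these Beta parameters (and the analogous updates for q(z_j), q(μ_k), and q(Σ_k)) from the current responsibilities.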
Given the optimal variational distributions, the hyperparameters of the GPs, θ_k and τ_k², can be estimated by maximizing the expected log-likelihood with respect to the approximated distributions, which is called variational EM in the literature (Blei et al., 2017). Specifically, the estimates, denoted by θ̂_k and τ̂_k², can be obtained by maximizing ∑_{j=1}^N q(z_j = k) E_q[log p(b_j | z_j = k)] with respect to θ_k and τ_k². The estimators are given in the M-step of Algorithm 1, and the detailed derivations are provided in Supplementary Materials S1. The E- and M-steps repeat iteratively by updating the parameters of the variational distributions until the ELBO converges, where the explicit form of the ELBO (derived from (16)) is provided in Supplementary Materials S1. In total, each iteration going through all the observations takes at most O(KNn³), which is linear with respect to the number of mesh nodes, N. We have developed an open-source R package mcGP, enabling the implementation of the variational EM algorithm described in Algorithm 1.

Prediction
For an unknown x ∈ χ, the predictive posterior distribution of β_j(x) can be constructed as:

p(β_j(x) | D) ≈ ∑_{k=1}^K q(z_j = k) p(β_j(x) | z_j = k, D),

where the approximation replaces p(z_j = k | D, ϕ) with its variational distribution q(z_j = k) as in Sun and Xu (2010), and p(β_j(x) | z_j = k, D) is the pdf of a normal distribution with the mean (5) and the variance (6) by replacing θ_j and τ_j² with θ̂_k and τ̂_k², respectively.
Accordingly, the posterior predictive mean of u_N(s; x) is

û_N(s; x) = ∑_{j=1}^N v_j(s) ∑_{k=1}^K q(z_j = k) Γ_{θ̂_k}(x, X_n) b_j,    (19)

where Γ_{θ̂_k}(x, X_n) = Φ_{θ̂_k}(x, X_n) Φ_{θ̂_k}(X_n, X_n)^{−1}, and the posterior predictive variance is

σ̂_N²(s; x) = ∑_{j=1}^N v_j(s)² [ ∑_{k=1}^K q(z_j = k) ( σ_{j,k}²(x) + μ_{j,k}²(x) ) − ( ∑_{k=1}^K q(z_j = k) μ_{j,k}(x) )² ],    (20)

where μ_{j,k}(x) = Γ_{θ̂_k}(x, X_n) b_j and σ_{j,k}²(x) = τ̂_k² ( Φ_{θ̂_k}(x, x) − Γ_{θ̂_k}(x, X_n) Φ_{θ̂_k}(X_n, x) ). The derivation of (20) is provided in Supplementary Materials S2. The posterior predictive mean û_N(s; x) of (19) is used to predict u_N(s; x), and its error analysis is studied in the next section.
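Because each β_j(x) is predicted by a mixture of normals, its predictive mean and variance follow the standard mixture-moment identities; a minimal sketch with made-up weights and component moments:

```python
import numpy as np

def mixture_moments(weights, means, variances):
    """Mean and variance of the mixture sum_k q(z_j = k) N(m_k, v_k)."""
    w, m, v = map(np.asarray, (weights, means, variances))
    mean = np.sum(w * m)
    # Law of total variance: within-component plus between-component spread.
    var = np.sum(w * (v + m**2)) - mean**2
    return mean, var

m, v = mixture_moments([0.7, 0.3], [1.0, 2.0], [0.1, 0.2])
assert np.isclose(m, 1.3)
assert np.isclose(v, 0.34)
```

The between-component term is why the mixture variance exceeds the weighted average of the component variances whenever the cluster-specific predictions disagree.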

Error analysis of mcGP emulator
The error analysis is crucial for understanding the uncertainty of the emulator. By the triangle inequality, the norm of the difference between the posterior predictive mean û_N(s; x) and the true solution can be decomposed into evaluation and emulation components:

∥û_N − u_0∥_{L²(Ω,χ)} ≤ ∥u_N − u_0∥_{L²(Ω,χ)} + ∥û_N − u_N∥_{L²(Ω,χ)},

where the L²-norm is defined as ∥f∥_{L²(Ω,χ)} = ( ∫_Ω ∫_χ f(s; x)² dx ds )^{1/2}. The first term on the right-hand side is the numerical error, which measures the discrepancy between the numerical solution u_N(s; x) and the ground truth u_0(s; x), and the second is the emulation error, which is the error of the emulator û_N(s; x) given limited evaluations of the simulator u_N(s; x). To save space, we investigate the numerical error and emulation error in Supplementary Materials S3 and S4, respectively, and then apply the error bound to a common PDE problem, elliptic equations, in Supplementary Materials S5.

Numerical studies
Numerical studies are conducted in this section to examine the performance of the proposed method. Specifically, three real-world computer models comprised of PDEs are considered, which are solved via FEM using the quadratic shape functions as in (3).
In these numerical studies, the hyperparameters in the priors (12) and (14) can be quite generic without the need of estimation. Similar to Yuan and Neubauer (2009) and Sun and Xu (2010), the hyperparameter μ_0 in (14) is set to the sample average of the node coordinates S_N, and Σ_0 is set to the sample inverse covariance of S_N; the parameter κ_0 is the number of degrees of freedom of the Wishart distribution, which is set to the dimension of s_j, that is, κ_0 = d; the scale parameter W_0 of the Wishart distribution is set to Σ_0/d such that the mean of Σ_k is the sample inverse covariance of S_N. The concentration parameter α_0 in (12) is set to 0.5.
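These generic settings can be computed directly from the node coordinates; a sketch with placeholder coordinates (the real S_N comes from the mesh):

```python
import numpy as np

rng = np.random.default_rng(2)
S_N = rng.uniform(size=(401, 2))     # placeholder mesh node coordinates

d = S_N.shape[1]
mu0 = S_N.mean(axis=0)                               # sample average of the nodes
Sigma0 = np.linalg.inv(np.cov(S_N, rowvar=False))    # sample inverse covariance
kappa0 = d                                           # Wishart degrees of freedom
W0 = Sigma0 / d                                      # Wishart scale matrix
alpha0 = 0.5                                         # DP concentration

# Sanity check: the Wishart(W0, kappa0) prior mean is kappa0 * W0 = Sigma0,
# i.e., the sample inverse covariance, matching the stated rationale.
assert np.allclose(kappa0 * W0, Sigma0)
```

Since all quantities are deterministic functions of S_N, no additional estimation step is needed for these prior hyperparameters.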
For each individual GP, the Matérn correlation function (7) is considered. The smoothness parameter ν is set to 5/2, which leads to a simplified form of (7):

Φ_θ(x, x′) = ( 1 + √5 ∥x − x′∥_θ + (5/3) ∥x − x′∥_θ² ) exp( −√5 ∥x − x′∥_θ ).

A small nugget parameter is added for numerical stability, which is set to g ≈ 1.5 × 10⁻⁸.
The truncation level K is set to 10. These numerical experiments were performed on a MacBook Pro laptop with an Apple M1 Max chip and 32 GB of RAM.
Two performance measures are considered to examine the prediction performance. The first measure is the root mean squared error (RMSE), which is calculated as

RMSE = ( (1/(n_test N)) ∑_{i=1}^{n_test} ∑_{j=1}^N ( u_N(s_j; x_i^test) − û_N(s_j; x_i^test) )² )^{1/2},

where n_test is the number of test input points, u_N(s_j; x_i^test) is the numerical solution of the i-th test input point, x_i^test, at the j-th node location, s_j, and û_N(s_j; x_i^test) is the posterior predictive mean as in (19). The second measure is the average continuous ranked probability score (CRPS) (Gneiting and Raftery, 2007), which is a performance measure for the predictive distribution of a scalar observation. Since the predictive distribution is a mixture of normal distributions as in Section 3.4, the CRPS can be computed by an analytical formula as in Grimit et al. (2006). For both RMSE and CRPS, lower values indicate better prediction accuracy.
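Both measures are straightforward to compute; the sketch below implements the RMSE and, for illustration, the single-Gaussian special case of the CRPS (the mixture-of-normals formula of Grimit et al. (2006) reduces to this when one component carries all the weight):

```python
import numpy as np
from scipy.stats import norm

def rmse(u_true, u_pred):
    """Root mean squared error over all test inputs and mesh nodes."""
    return np.sqrt(np.mean((np.asarray(u_true) - np.asarray(u_pred)) ** 2))

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

assert np.isclose(rmse([1.0, 2.0], [1.0, 4.0]), np.sqrt(2.0))
# CRPS of N(0, 1) evaluated at its own mean: the z-dependent term vanishes.
assert np.isclose(crps_gaussian(0.0, 0.0, 1.0), 2 * norm.pdf(0) - 1 / np.sqrt(np.pi))
```

Lower values of both quantities indicate better point prediction (RMSE) and better calibrated predictive distributions (CRPS), respectively.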
To provide a comprehensive evaluation, we include the following methods for comparison: uGP, the independent univariate GPs sharing the same lengthscale hyperparameter (θ_j) but different scale hyperparameters (τ_j²) across the N mesh nodes, which is similar to Gu and Berger (2016); iGP, the independent GPs with different hyperparameters (θ_j and τ_j²) across the N mesh nodes; and pcaGP, which uses a functional principal component analysis (FPCA) with truncated components (Wang et al., 2016), approximating the outputs by the leading M eigenfunctions {ψ_l(s)}_{l=1}^M and the corresponding coefficients {α_l(x_i)}:

u_N(s; x_i) ≈ u_0(s) + ∑_{l=1}^M α_l(x_i) ψ_l(s),

where u_0(s) is the mean function, which can be estimated by ∑_{i=1}^n u_N(s; x_i)/n. The number of components, M, is selected by finding the leading M eigenfunctions that explain 99% of the variance over all n training cases. Then, iGP is applied to the M coefficients {α_l(•)}_{l=1}^M. This approach is similar to Dancik and Dorman (2008) and Mak et al. (2018).
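The 99%-variance truncation used by the pcaGP competitor can be sketched with a plain SVD on centered outputs (the data below are synthetic rank-3 outputs, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 20, 400
# Synthetic outputs with exactly three underlying modes of variation.
U_data = rng.standard_normal((n, 3)) @ rng.standard_normal((3, N))

centered = U_data - U_data.mean(axis=0)       # subtract the estimated mean function
_, svals, Vt = np.linalg.svd(centered, full_matrices=False)

# Cumulative fraction of variance explained by the leading components.
explained = np.cumsum(svals**2) / np.sum(svals**2)
M = int(np.searchsorted(explained, 0.99) + 1)  # smallest M reaching 99%

assert 1 <= M <= 3
assert explained[M - 1] >= 0.99
```

The rows of `Vt[:M]` play the role of the discretized eigenfunctions ψ_l, and the projections of each training run onto them give the coefficients α_l(x_i) that iGP then emulates.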

Poisson's Equation
In this subsection, we explore the performance for emulating a PDE boundary value problem on an L-shaped membrane based on Poisson's equation, which has broad applicability in electrostatics and fluid mechanics (Evans, 2010). The model is represented as: ∆u = (x² − 2π²) e^{x s_1} sin(πs_1) sin(πs_2) + 2xπ e^{x s_1} cos(πs_1) sin(πs_2), s = (s_1, s_2) ∈ Ω, where x is the one-dimensional input variable with x ∈ [−1, 1], the operator ∆ is defined by ∆u = ∂²u/∂s_1² + ∂²u/∂s_2², Ω is an L-shaped membrane, and the Dirichlet boundary condition, u = 0 on ∂Ω, is considered. The geometry and mesh, along with the solutions through FEM when x = −1, 0, 1, are demonstrated in Figure 3. The Partial Differential Equation Toolbox of MATLAB (2021) is used to create the geometry and mesh and to solve the equation.
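The right-hand side above appears to be manufactured from a closed-form solution; assuming this reading, u(s; x) = e^{x s_1} sin(πs_1) sin(πs_2) satisfies the equation, which the sketch below checks by comparing a finite-difference Laplacian of u against the stated right-hand side at one interior point:

```python
import numpy as np

def u(s1, s2, x):
    # Candidate closed-form solution (an assumption inferred from the RHS).
    return np.exp(x * s1) * np.sin(np.pi * s1) * np.sin(np.pi * s2)

def f(s1, s2, x):
    # Right-hand side of the Poisson equation as stated in the text.
    return ((x**2 - 2 * np.pi**2) * np.exp(x * s1) * np.sin(np.pi * s1)
            + 2 * x * np.pi * np.exp(x * s1) * np.cos(np.pi * s1)) * np.sin(np.pi * s2)

# Five-point finite-difference approximation of the Laplacian at one point.
s1, s2, x, h = 0.3, 0.4, 0.7, 1e-4
lap = (u(s1 + h, s2, x) + u(s1 - h, s2, x)
       + u(s1, s2 + h, x) + u(s1, s2 - h, x) - 4 * u(s1, s2, x)) / h**2

assert np.isclose(lap, f(s1, s2, x), atol=1e-3)
```

This candidate also vanishes whenever s_1 or s_2 is an integer, consistent with the zero Dirichlet boundary condition on the L-shaped domain.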
We conduct a computer experiment of size n = 5, where the input locations are equally spaced in the input space [−1, 1], i.e., x_i = 0.4i − 1.2 for i = 1, . . ., 5. The mesh size is set to 0.2, yielding N = 401 mesh nodes, as shown in Figure 3. The proposed method is applied to this experimental data. Figure 4 illustrates the variational distribution, q(z_j = k), for k = 1, . . ., 4, and Figure 5 presents the corresponding hyperparameter estimates, τ²_k and θ_k. Note that k = 5, 6, . . ., 10 are not shown in Figures 4 and 5 because q(z_j = k) < 0.001 for all j, indicating that only four mixture components are needed in this mixture model. This shows that, even though the proposed model considers an infinite mixture of GPs through a DP prior, the mixing proportions decrease exponentially, so that only a small number of components are used to model the data a priori. The fitting result reveals interesting scientific insights. First, the cluster k = 3 has higher probability mass on the nodes at the boundaries and at the locations where s_1 = 0 and s_2 = 0 (Figure 4), and the corresponding hyperparameters of the GP are τ²_3 ≈ 0 and θ_3 ≈ 7.5 (Figure 5), which makes sense because these nodes correspond to the solution value of 0 imposed by the boundary condition.
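The rapid decay of the a priori mixing proportions can be seen directly from the stick-breaking construction of the DP, π_k = v_k ∏_{l<k}(1 − v_l) with v_l ~ Beta(1, α). The following sketch uses a hypothetical concentration value α = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0                     # hypothetical DP concentration parameter
K = 10                          # truncation level

v = rng.beta(1.0, alpha, size=K)                       # stick-breaking fractions
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

# E[pi_k] = alpha^(k-1) / (1 + alpha)^k decays geometrically in k
expected = alpha ** np.arange(K) / (1.0 + alpha) ** (np.arange(K) + 1)
print(expected[:4])   # -> [0.5, 0.25, 0.125, 0.0625] for alpha = 1
```

The geometric decay of E[π_k] explains why only a handful of mixture components carry appreciable mass a priori.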
Second, the cluster k = 1 has higher probability mass on the nodes in the regions where the magnitude of the solutions is the highest and their shapes are similar. The cluster shares high estimates of τ²_1 and θ_1, indicating that the input-output relationship is smooth but the output values in these regions have relatively high variability across different input settings. More interestingly, the cluster k = 2 covers a group of nodes along s_1 = 0 in the neighbourhood of the cluster k = 3, while the cluster k = 4 contains a group of nodes in the vicinity of the cluster k = 1. This example shows that, unlike traditional reduced-basis methods such as proper orthogonal decomposition (POD) (Lumley, 1967), the clustering structures produced by the proposed method not only provide useful insight for discovering the underlying physics, but also reveal shared input-output relationships.
To illustrate the prediction performance, Figure 6 shows the FEM solution (left) at a test input point alongside the mcGP prediction. The prediction uncertainty can be quantified by the posterior predictive standard deviation; the most uncertain predictions are located at the nodes of cluster k = 1, which is expected given that the outputs are most dynamic, with large magnitude, in this cluster. We further examine the prediction and computation performance on 201 test input points equally spaced in [−1, 1] under various mesh sizes: 0.4, 0.2, 0.1, 0.05, and 0.025. The results are presented in Figure 7. From the left two panels, uGP and pcaGP perform worse than the other two methods in terms of prediction accuracy (both RMSE and CRPS). While there is no significant difference between iGP and mcGP in the predictive scores, mcGP generally yields lower RMSEs than the other methods.
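Since the predictive distribution at each node is a mixture of normals, its posterior predictive standard deviation follows from the standard mixture moments: Var = Σ_k w_k(σ_k² + μ_k²) − (Σ_k w_k μ_k)². A minimal sketch with hypothetical weights and component moments:

```python
import numpy as np

def mixture_mean_sd(w, mu, sigma):
    """Mean and standard deviation of a Gaussian mixture sum_k w_k N(mu_k, sigma_k^2)."""
    w, mu, sigma = map(np.asarray, (w, mu, sigma))
    mean = np.sum(w * mu)
    var = np.sum(w * (sigma ** 2 + mu ** 2)) - mean ** 2
    return float(mean), float(np.sqrt(var))

# hypothetical cluster weights and per-cluster predictive moments at one mesh node
m, s = mixture_mean_sd([0.7, 0.3], [1.0, 2.0], [0.1, 0.2])
```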
The right two panels present the computational cost. It is of no surprise that the proposed method requires more fitting time due to the added complexity of a mixture model; however, as indicated in Section 3.3, the third panel (from the left) shows that the computational cost of fitting mcGP is linear in N, which is reasonably tractable in practice.
Besides, in the context of emulation, the computation time for predictions is of greater interest, and the proposed method (and its competitors) can predict much faster than running an actual FEM simulation. It is important to note that, while iGP demonstrates comparable prediction accuracy in Figure 7, it is less accurate than uGP and mcGP in the problems presented in Sections 5.2 and 5.3. In contrast, mcGP generally exhibits better prediction accuracy than both iGP and uGP.

Laminar flow past a cylinder
In this subsection, we investigate a system of two-dimensional flow past a circular cylinder, which is a classical and interesting problem in fluid mechanics (Rajani et al., 2009; Seo et al.). The n = 30 training input points are generated by a space-filling design method (Joseph et al., 2015), as shown in the left panel of Figure 8. The computer simulations at these points are performed using the FEM solver in the QuickerSim CFD Toolbox (QuickerSim Ltd., 2022). Figure 9 shows the computational mesh and the computed solutions of v at two different input settings. The total number of mesh nodes is N = 3778.
The proposed mcGP method is applied to this experimental data. Figure 10 illustrates the variational distribution, q(z_j = k), for j = 1, . . ., N and k = 1, . . ., 5, 8, 9, 10; the remaining clusters are not shown because q(z_j = k) < 0.1 for all j. The result shows that different clusters exhibit high probability mass on the nodes in different fluid regions. For example, the cluster k = 4 has higher probability mass on the nodes at the upstream and vertical boundaries, with small τ²_4 and large θ_4, while the cluster k = 3 puts higher probability mass on the downstream boundary and the associated neighbouring regions. These two clusters share a common feature of low variation in the vertical velocity, as manifested in Figure 9. The components k = 8 and k = 9 put higher probability mass in the frontal area of the cylinder, where the output magnitude is highest, and this comes with large hyperparameter estimates τ²_8 and τ²_9. In addition, the clusters k = 1, 2, and 10 have higher probability mass in the wake region. The regions of high probability mass from the different clusters together constitute the entire computational domain, implying that the clustering strategy is efficient.
The prediction and computation performance is examined on 100 uniformly random test input points from the input space. Three (out of 100) test FEM simulations, along with the mcGP predictions, are presented in Figure 11. From visual inspection, the point-wise predictions of mcGP are fairly accurate at the three input points. All dynamic structures in the flow are well captured, including the large-magnitude region in front of the cylinder and the wake region behind it. Table 1 shows the results over the 100 test data points in comparison with the other GP models, indicating that the proposed mcGP outperforms the others in terms of prediction accuracy with reasonable computation time. Similar to the previous subsection, pcaGP performs the worst, which is not surprising because the approximation error of the dimension-reduction approach can introduce additional bias into the predictions (Sung et al., 2024b). Unlike the previous subsection, uGP outperforms iGP in terms of both RMSE and CRPS. This again demonstrates that mcGP can serve as a middle ground between these two models.

Thermal stress analysis of jet engine turbine blade
In this subsection, we investigate the performance of the proposed method on a thermal stress analysis application for a jet engine turbine blade under a steady-state operating condition.
More details can be found in Wright and Han (2006), Carter (2005) and Sung et al. (2024a).
In this study, we aim to emulate the thermal stress and deformation of a turbine blade, with the training design points shown in the left panel of Figure 12. Figure 14 illustrates the variational distribution of the selected components, q(z_j = k), and the corresponding hyperparameter estimates, τ²_k and θ_k, are shown in the middle and right panels of Figure 12. The clusters k = 6, 7, 8, and 9 are not shown here because q(z_j = k) < 0.3 for all j at these clusters. The distribution of the nodes in the resulting clusters corresponds to the distribution of thermal stress in a physically meaningful manner. The cluster k = 4 has higher probability mass on the top center of the blade airfoil, where the thermal load is relatively large with high variability (τ²_4 ≈ 1584) across different input settings; on the other hand, the cluster k = 3 puts higher probability mass on the platform area, where the thermal load is relatively small with subtle variability (small τ²_3). In addition, the cluster k = 1 has higher probability mass on the leading and trailing edges of the blade next to the platform, which also tend to exhibit high variability (τ²_1 ≈ 4147) in the thermal stress.

Conclusion

PDE simulations based on mesh-based numerical methods have become essential in various applications, ranging from engineering to health care. In this paper, we propose a new surrogate model for expensive PDE boundary value problems that simultaneously emulates the PDE solutions over a spatial domain. An important innovation of this work lies in its incorporation of mesh node locations into a statistical model, forming a mixture model that balances the bias-variance trade-off in the context of many outputs. Furthermore, we develop a rigorous theoretical error analysis for the proposed emulator, which provides important insight into its uncertainty quantification. Three real examples show that the method not only has advantages in prediction accuracy, but also enables the discovery of interesting physics by interpreting the mixture clusters.
The proposed method suggests several avenues for future research. First, in addition to a fine mesh over the spatial domain, the proposed method can be extended to simulations with fine grids over both the spatial and temporal domains. This can be done naturally by incorporating the time information into the latent model (13) to segment the temporal domain, and it is conceivable that the resulting clustering structures can reveal interesting dynamic features. Second, although the method developed herein assumes that the mesh specifications are identical across different input settings, it can be modified to handle differing mesh specifications by utilizing the common-grid idea of Mak et al. (2018). Specifically, we recommend first determining a common grid, denoted as S̃ = (s̃_1, . . ., s̃_Ñ)ᵀ, which serves as a reference for predictions. By evaluating the training outputs u_N(s̃_j; x_i) for each i = 1, . . ., n and j = 1, . . ., Ñ via (2), we can subsequently model {u_N(s̃_j; x)}_{j=1,...,Ñ} as an mcGP, as introduced in Section 3. This strategy enables predictions at untried points x ∈ X on the common grid S̃. Lastly, it is worthwhile to investigate the incorporation of boundary conditions into the proposed model, as in Tan (2018a,b), to further improve prediction accuracy. We leave these to future work.
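The common-grid strategy sketched above amounts to interpolating each run's mesh outputs onto the reference grid before fitting mcGP. A minimal Python sketch using linear interpolation (the meshes, outputs, and grid here are hypothetical):

```python
import numpy as np
from scipy.interpolate import griddata

def to_common_grid(nodes, values, common_grid):
    """Interpolate FEM outputs from a run-specific mesh onto a common grid.

    nodes: (N_i, 2) node coordinates of run i; values: (N_i,) outputs;
    common_grid: (N_tilde, 2) reference node locations.
    """
    return griddata(nodes, values, common_grid, method='linear')

# hypothetical example: one run on a scattered mesh, mapped to a common grid
rng = np.random.default_rng(1)
nodes = rng.uniform(0.0, 1.0, size=(200, 2))
vals = np.sin(np.pi * nodes[:, 0]) * nodes[:, 1]
common = np.array([[0.5, 0.5], [0.25, 0.75]])
u_common = to_common_grid(nodes, vals, common)
```

Repeating this for each training run yields outputs on a shared grid, after which mcGP can be fitted as in Section 3.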
Supplemental Materials

Additional supporting materials can be found in the Supplemental Materials, including the detailed derivations for Algorithm 1, the derivation of (20), the detailed error analysis of Section 4, and the R code for reproducing the results in Section 5.

Figure 1 :
Figure 1: Illustration of the finite element method; similar to the one from Tekkaya and Soyarslan (2019).

Figure 8 :
Figure 8: The left panel presents the design points in the laminar flow application, and the middle and right panels show the hyperparameter estimates of mcGP: τk (middle) and θk (right).

Figure 9 :
Figure 9: The left panel demonstrates the geometry and mesh in the laminar flow application, and the middle and right panels show two (out of 30) training examples of the simulations, where the input settings are indicated by the values of x.

Figure 10 :
Figure 10: Variational distribution q(z j = k) in the laminar flow application.

Figure 11 :
Figure 11: Validation performance of mcGP prediction in the laminar flow application. The upper panels present three (out of 100) test FEM simulations (with the input settings indicated by the values of x) and the bottom panels are the corresponding posterior predictive means of mcGP.

Figure 12 :
Figure 12: The left panel presents the design points in the turbine blade application, and the middle and right panels show the hyperparameter estimates of mcGP: τ_k (middle) and θ_k (right).

Figure 13 :
Figure 13: Three (out of 30) examples of the training FEM simulations in the turbine blade application, where the input settings are indicated by the values of x.

Figure 14 :
Figure 14: Variational distribution q(z j = k) in the turbine blade application.

Figure 15 :
Figure 15: Validation performance of mcGP prediction in the turbine blade application. The upper panels present three (out of 100) test FEM simulations (with the input settings indicated by the values of x) and the bottom panels are the corresponding posterior predictive means of mcGP.

Table 1 :
Performance comparison in terms of prediction accuracy and computational cost in the laminar flow application, in which the better performances are boldfaced. Note that the test FEM simulations take, on average, 1068 milliseconds per run.

The prediction results are presented in Table 2, which again shows that the proposed method can outperform the others in terms of prediction accuracy with reasonable computation time.

Table 2 :
Performance comparison in terms of prediction accuracy and computational cost in the turbine blade application, in which the better performances are boldfaced. Note that the test FEM simulations take, on average, 3819 milliseconds per run.