Covariance–Based Rational Approximations of Fractional SPDEs for Computationally Efficient Bayesian Inference

ABSTRACT The stochastic partial differential equation (SPDE) approach is widely used for modeling large spatial datasets. It is based on representing a Gaussian random field u on R^d as the solution of an elliptic SPDE L^β u = W, where L is a second-order differential operator, 2β ∈ N is a positive parameter that controls the smoothness of u, and W is Gaussian white noise. A few approaches have been suggested in the literature to extend the approach to allow for any smoothness parameter satisfying β > d/4. Even though those approaches work well for simulating SPDEs with general smoothness, they are less suitable for Bayesian inference since they do not provide approximations which are Gaussian Markov random fields (GMRFs) as in the original SPDE approach. We address this issue by proposing a new method based on approximating the covariance operator L^{-2β} of the Gaussian field u by a finite element method combined with a rational approximation of the fractional power. This results in a numerically stable GMRF approximation which can be combined with the integrated nested Laplace approximation (INLA) method for fast Bayesian inference. A rigorous convergence analysis of the method is performed and the accuracy of the method is investigated with simulated data. Finally, we illustrate the approach and corresponding implementation in the R package rSPDE via an application to precipitation data which is analyzed by combining the rSPDE package with the R-INLA software for full Bayesian inference. Supplementary materials for this article are available online.


Introduction
Handling many observations from a Gaussian random field in spatial statistics can be challenging since the related computational tasks involve factorizations of large covariance matrices which are usually dense. This is often referred to as the "big N problem" (Banerjee et al., 2015), and various approaches have been suggested to handle the computational issues (see, e.g., Heaton et al., 2019, for a recent comparison). One of the most widely used methods is the SPDE approach by Lindgren et al. (2011). This is based on the fact that a centered Gaussian random field u on the spatial domain D = R^d with an isotropic Matérn covariance function (Matérn, 1960),

ϱ(h) = (σ^2 / (2^{ν-1} Γ(ν))) (κ‖h‖)^ν K_ν(κ‖h‖),    (1)

can be represented as a solution to the stochastic partial differential equation (SPDE)

(κ^2 - ∆)^β (τu) = W  on R^d.    (2)

Here, Γ(·) is the Gamma function, K_ν is a modified Bessel function of the second kind, ∆ is the Laplace operator, and W is Gaussian white noise. The parameter κ > 0 controls the practical correlation range, σ^2 is the variance, τ^2 = Γ(ν)/(σ^2 κ^{2ν} (4π)^{d/2} Γ(ν + d/2)), and the fractional power β is related to the smoothness parameter ν > 0 via the relation 2β = ν + d/2 (Whittle, 1963). Lindgren et al. (2011) used this representation to construct computationally efficient Gaussian Markov random field (GMRF) approximations of Gaussian Matérn fields by considering the SPDE on a bounded domain D, restricting the smoothness to 2β ∈ N, and then performing a finite element method (FEM) discretization.
The SPDE approach has become widely used in applications, and has initiated a great number of extensions and generalizations (Lindgren et al., 2022). The reason for this is not only the computational benefits, but also that it provides a flexible framework for defining more sophisticated models for spatial data. It, for example, facilitates the construction of non-stationary Gaussian random fields by allowing the parameters κ and τ to be spatially varying (Lindgren et al., 2011; Fuglstad et al., 2015), and allows for the construction of Matérn-like random fields on more general manifolds by defining such fields via the SPDE (2) posed on the manifold (Lindgren et al., 2011; Bolin and Lindgren, 2011).
One of the main criticisms of the SPDE approach is the requirement 2β ∈ N, which restricts the possible values of the corresponding smoothness parameter ν of the Matérn covariance function. Given the importance of ν when performing prediction, as shown by Stein (1999) and Bolin and Kirchner (2023), several methods for removing the restriction of 2β ∈ N have been proposed. Lindgren et al. (2011, Authors' response) proposed to construct a GMRF approximation by approximating the spectrum of a Gaussian Matérn field by a spectrum that is a reciprocal of a polynomial. This method is applicable for stationary models, but it cannot be applied to non-stationary models, and it has a fixed accuracy which may not be sufficient for certain applications. Bolin et al. (2020) proposed combining the FEM approximation of Lindgren et al. (2011) with a quadrature approximation of the fractional operator to obtain a numerical method that works for any β > d/4 and can be made arbitrarily accurate. That work also provided a theoretical convergence analysis of the method, which was extended in Bolin et al. (2018) and Herrmann et al. (2020). Bolin and Kirchner (2020) later proposed a different type of approximation, referred to as the rational SPDE approach, which has a lower computational cost.
Even though the methods that work for non-stationary models with general smoothness are computationally efficient, they are much less used than the standard SPDE approach for statistical applications. The reason for this is that non-fractional SPDE models work in combination with the integrated nested Laplace approximation (INLA) method (Rue et al., 2009) and are implemented in the widely used R-INLA (Lindgren and Rue, 2015) R (R Core Team, 2022) package. This software facilitates including SPDE-based models in general Bayesian latent Gaussian models, and the great majority of all applications of the SPDE approach have been done via this software.
Unfortunately, the methods of Bolin et al. (2020) and Bolin and Kirchner (2020) provide approximations which are not compatible with R-INLA. The reason is that the methods do not yield a Markov approximation, so the precision matrix obtained from the approximations is not sparse. The covariance matrix of the approximations is of the form PQ^{-1}P, where both P and Q are sparse matrices that depend on the parameters of the model. To achieve a sparse precision matrix, which is necessary for R-INLA, Bolin and Kirchner (2020) showed that one can work with a latent model with sparse precision matrix Q if the projection matrix A, which connects the locations of the mesh for the FEM approximation and the observation locations (see Section 2 for details), is adjusted to Ā = AP. This matrix, however, depends on the model parameters, which is not allowed in R-INLA.
The main goal of this work is to solve this problem by proposing an alternative rational approximation. The main idea is to approximate the covariance operator of the random field directly, instead of first approximating the solution u and then deriving the corresponding covariance operator. This provides an approximation suitable for R-INLA, which is more numerically stable than that of the original rational SPDE approach. The proposed method is implemented in the R package rSPDE (Bolin and Simas, 2023), which is available on CRAN and has an interface to R-INLA. Using the package, we show that the proposed method facilitates full Bayesian inference of all model parameters, including β, for latent Gaussian models based on fractional SPDEs.
The outline of the paper is as follows. In Section 2, we give an overview of the model structure of the proposed approximation and show how it can be used for computationally efficient inference. The mathematical details and justifications of the method are provided in Sections 3 to 5. Specifically, in Section 3, we introduce the generalized Whittle-Matérn fields, which contain most of the previously proposed non-stationary SPDE-based Gaussian random fields as special cases, and for which our proposed method is applicable. In that section, we also provide the details of the FEM approximations. The new covariance-based rational approximation is introduced in Section 4, where we also prove that it provides an approximation of the covariance function of the generalized Whittle-Matérn field with an explicit rate of convergence in the L^2-norm. In Section 5, we show that the covariance-based rational approximation can be represented as a GMRF, and illustrate how this can be used for statistical inference. Some of the details of the rSPDE implementation are discussed in Section 6, and a comparison in terms of the accuracy of approximating the covariance function by our method and some other methods is provided in Section 7. An application to modeling of precipitation data is presented in Section 8, and the article concludes with a discussion in Section 9. Finally, the supplementary materials contain seven appendices which provide further technical details and proofs.

Overview of the approximation strategy
As mentioned in the introduction, the main idea behind our strategy is to directly approximate the covariance operator of the random field. In this section we show the structure of the resulting approximation and provide an illustration of how it can be used for inference in a simple application. More details will be given in later sections. The covariance-based rational approximation of the Whittle-Matérn field u(s) defined in (2), whose covariance operator is (κ^2 - ∆)^{-2β}, uses a combination of the finite element method and rational approximations in order to approximate u(s) as u_n(s) = Σ_{j=1}^{n} w_j φ_j(s), where {w_j}_{j=1}^{n} are stochastic weights and {φ_j}_{j=1}^{n} are FEM basis functions. We denote w = [w_1, ..., w_n]^⊤. With our approximation, w can be expressed as a sum of m + 1 independent GMRFs with sparse precision matrices:

w = Σ_{i=1}^{m+1} x_i,  x_i ∼ N(0, Q_i^{-1}).    (3)

Any linear predictor in R-INLA has this form, which means that we can perform inference in a computationally efficient manner based on the covariance-based rational approximation by using the same ideas as are used in R-INLA. For example, suppose that we observe

y_i = u_n(s_i) + ϵ_i,  i = 1, ..., N,    (4)

where N ∈ N, s_1, ..., s_N ∈ R^d are spatial locations, and ϵ_1, ..., ϵ_N are independent N(0, σ_ϵ^2) measurement errors. The model (4) can be written in matrix form as y = Aw + ϵ, where A is the projector matrix with elements A_{ij} = φ_j(s_i). Let X = [x_1^⊤, ..., x_{m+1}^⊤]^⊤. Then, the precision matrix of X is the block diagonal matrix

Q = diag(Q_1, ..., Q_{m+1}).    (5)

Writing the model in terms of the weights X allows us to equivalently write the model as y = ĀX + ϵ, where Ā is a block matrix of size N × n(m + 1) obtained by combining m + 1 copies of A as Ā = [A ⋯ A], so that X ∼ N(0, Q^{-1}), where Q is given in (5). Standard results for latent Gaussian models then give us that the posterior distribution of X is X|y ∼ N(µ_{X|y}, Q_{X|y}^{-1}), where

Q_{X|y} = Q + σ_ϵ^{-2} Ā^⊤ Ā,  µ_{X|y} = σ_ϵ^{-2} Q_{X|y}^{-1} Ā^⊤ y.

Finally, we can obtain the marginal likelihood, ℓ(y), of y as

log ℓ(y) = (1/2) log|Q| - (1/2) log|Q_{X|y}| - N log σ_ϵ - (N/2) log(2π) - (1/2)(σ_ϵ^{-2} y^⊤ y - µ_{X|y}^⊤ Q_{X|y} µ_{X|y}).

The sparsity of Q is essential for computation. For instance, evaluating log|Q| in the likelihood can be done efficiently based on the sparse Cholesky decomposition (Rue and Held, 2005). Sparsity of Q also facilitates computationally efficient sampling of w, and hence of u.
See Appendix E for further details on the methods for sampling and likelihood evaluation.
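To make these formulas concrete, the following toy sketch assembles the posterior precision, posterior mean, and marginal log-likelihood with sparse matrices, and cross-checks the result against a dense computation. It is written in Python with scipy for illustration only (the actual implementation is the R package rSPDE), and the precision matrix, projector matrix, and noise level below are all made up.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, N = 50, 30                      # latent size and number of observations (toy values)
sigma_e = 0.1

# Toy sparse SPD precision Q (stands in for one diagonal block of (5))
Q = sp.diags([-0.9 * np.ones(n - 1), 2.0 * np.ones(n), -0.9 * np.ones(n - 1)],
             [-1, 0, 1], format="csc")
A = sp.random(N, n, density=0.2, random_state=1, format="csc")  # toy projector matrix
y = rng.standard_normal(N)

# Posterior precision and mean: Q_post = Q + A'A/sigma_e^2,
# mu_post = sigma_e^{-2} Q_post^{-1} A'y
Q_post = (Q + (A.T @ A) / sigma_e**2).tocsc()
mu_post = splu(Q_post).solve(A.T @ y / sigma_e**2)

# Marginal log-likelihood of y via the two log-determinants
# (in practice both would come from sparse Cholesky factorizations)
_, ld_Q = np.linalg.slogdet(Q.toarray())
_, ld_Qp = np.linalg.slogdet(Q_post.toarray())
quad = (y @ y) / sigma_e**2 - mu_post @ (Q_post @ mu_post)
loglik = 0.5 * (ld_Q - ld_Qp) - N * np.log(sigma_e) \
         - 0.5 * N * np.log(2 * np.pi) - 0.5 * quad

# Dense cross-check: y ~ N(0, sigma_e^2 I + A Q^{-1} A')
Sy = sigma_e**2 * np.eye(N) + A @ np.linalg.solve(Q.toarray(), A.T.toarray())
loglik_dense = multivariate_normal(mean=np.zeros(N), cov=Sy).logpdf(y)
```

The sparse path only factorizes banded matrices, whereas the dense cross-check forms the full N × N covariance; the two agree to numerical precision.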

Whittle-Matérn fields and FEM approximation
In this section we introduce the class of fractional-order SPDEs we are interested in as well as their FEM approximations.The model assumptions are presented in Section 3.1, and in Section 3.2 we introduce the FEM approximations and study their convergence.
Let us begin by introducing some notation that will be needed later on. Given a bounded domain D ⊂ R^d, d ∈ {1, 2, 3}, we denote by L^2(D) the Lebesgue space of square-integrable real-valued functions endowed with the inner product (ϕ, ψ)_{L^2(D)} = ∫_D ϕ(x)ψ(x)dx. We denote the Sobolev space of order k by

H^k(D) = {w ∈ L^2(D) : D^γ w ∈ L^2(D) for all multi-indices γ with |γ| ≤ k},

where we are using the multi-index notation for the differential operator D^γ, and (·, ·)_{H^k(D)} denotes the corresponding inner product. We denote by H_0^1(D) the closure of C_c^∞(D) with respect to the norm of H^1(D), where C_c^∞(D) is the set of infinitely differentiable functions with compact support on D. Additional notation needed for the theoretical analysis is given in Appendix A.

Model assumptions
We are interested in the class of Gaussian random fields on D that can be represented as solutions to SPDEs of the form

L^β (τu) = W  in D,    (8)

where L^β is a fractional power (in the spectral sense) of a second-order elliptic differential operator L which determines the covariance structure of u, τ > 0 is a constant parameter, and W is Gaussian white noise on L^2(D). We have the following assumption on D:

Assumption 1. The domain D is an open, bounded, convex polytope with closure D̄.
Under Assumption 1, we may define H_N^2(D) = {w ∈ H^2(D) : ∂w/∂ν = 0 on ∂D}, where ν is the outward unit normal vector to ∂D. Indeed, the expression ∂w/∂ν = 0 on ∂D makes sense since the trace of Dw is well-defined in this case (see, e.g., Evans and Gariepy, 2015, Theorem 4.6). Let us now describe the assumptions on the differential operator L:

Assumption 2. The operator L is given in divergence form by Lu = -∇ · (H∇u) + κ^2 u, and is equipped either with homogeneous Dirichlet or Neumann boundary conditions. Furthermore, the function H : D → R^{d×d} is symmetric, Lipschitz continuous and uniformly positive definite, and κ : D → R is an essentially bounded function, that is, ess sup_{x∈D} |κ(x)| < ∞. Under Neumann boundary conditions, we additionally require that ess inf_{x∈D} κ(x) ≥ κ_0 for some κ_0 > 0, where the essential infimum is taken with respect to λ, the Lebesgue measure on D.
The SPDE (8) under Assumptions 1 and 2 defines a class of models that have previously been considered by Bolin et al. (2020), Cox and Kirchner (2020), Herrmann et al. (2020), and Bolin and Kirchner (2020), and is referred to as the class of generalized Whittle-Matérn fields. It contains many previously proposed non-stationary SPDE-based spatial Gaussian random field models as special cases, such as those by Lindgren et al. (2011), Fuglstad et al. (2015, 2019), and Hildeman et al. (2021), and the method that we later introduce thus also applies to those models and their fractional extensions.
In the case of Dirichlet boundary conditions, define the space V = H_0^1(D) ⊂ L^2(D), and in the case of Neumann boundary conditions let V = H^1(D) ⊂ L^2(D). Then, under Assumptions 1 and 2, L induces the following continuous and coercive bilinear form on V:

a_L(u, v) = (H∇u, ∇v)_{L^2(D)} + (κ^2 u, v)_{L^2(D)},  u, v ∈ V.    (9)

Remark 1. Under Assumptions 1 and 2, if f ∈ L^2(D), then there exists a unique solution u of Lu = f and the operator L is H^2(D)-regular; that is, u ∈ H^2(D) ∩ H_0^1(D) under Dirichlet boundary conditions, whereas under Neumann boundary conditions we have u ∈ H_N^2(D). See, for instance, Grisvard (2011, Theorem 3.2.1.2) for Dirichlet boundary conditions or Grisvard (2011, Theorem 3.2.1.3) for Neumann boundary conditions.
By Remark 1, specifically by the existence and uniqueness of the solution to the equation Lu = f, we can define the inverse operator L^{-1} : L^2(D) → L^2(D). By the Rellich-Kondrachov theorem (Evans and Gariepy, 2015, Theorem 4.11), L^{-1} is a compact operator, and it is also self-adjoint; see Appendix A for a justification. Hence, by the spectral theorem for self-adjoint compact operators, there exists an orthonormal basis {e_j}_{j∈N} of L^2(D) formed by eigenvectors of L whose eigenvalues {λ_j}_{j∈N} are non-negative and can be arranged in non-decreasing order.
Remark 2. Under Assumptions 1 and 2, the operator L satisfies Weyl's law; that is, there exist c, C > 0 such that for every j ∈ N, cj^{2/d} ≤ λ_j ≤ Cj^{2/d}. See Davies (1995, Theorem 6.3.1) for the Dirichlet case. For the Neumann case, Weyl's law holds when H is a constant diagonal matrix (Fedosov, 1963, 1964); in particular, it holds for the Neumann Laplacian. The result for a general H satisfying Assumption 2 is a direct consequence of Weyl's law for the Neumann Laplacian together with Proposition 4 in Appendix B and the min-max principle.
Our goal is to obtain approximations of the covariance operator L^{-2β} of the Gaussian random field u which solves equation (8). Let

ϱ_β(x, y) = Σ_{j∈N} λ_j^{-2β} e_j(x) e_j(y).

Then, one can readily check, by Steinwart and Scovel (2012, Theorem 3.10), that the covariance operator L^{-2β} is a kernel operator with kernel ϱ_β(·, ·). That is, for any f ∈ L^2(D), we have (L^{-2β} f)(x) = ∫_D ϱ_β(x, y) f(y) dy for a.e. x ∈ D. It is well known that there exists a centered square-integrable Gaussian random field u that solves (8) if, and only if, its covariance operator L^{-2β} has finite trace (Lototsky and Rozovsky, 2017, Theorem 3.2.5). Under Assumptions 1 and 2, one can use Weyl's law (Remark 2) to show that L^{-2β} has finite trace if, and only if, β > d/4. Hence, if β > d/4, then u is a centered square-integrable Gaussian random field with covariance function ϱ_β(x, y) = E[u(x)u(y)], where the equality holds for a.e. (x, y) ∈ D × D.
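The spectral definition of the kernel can be illustrated in a case where the eigenpairs are known in closed form. The sketch below (Python, for illustration only) takes L = κ^2 - d^2/dx^2 with Neumann boundary conditions on D = (0, 1), for which λ_j = κ^2 + (πj)^2 and e_0 = 1, e_j(x) = √2 cos(πjx), and evaluates a truncated version of ϱ_β together with its (finite, since β > d/4) trace. The truncation level J and the parameter values are arbitrary illustrative choices.

```python
import numpy as np

kappa, beta, J = 2.0, 0.75, 2000   # beta > d/4 = 1/4; J = truncation level
x = np.linspace(0.0, 1.0, 101)

# Neumann eigenpairs of L = kappa^2 - d^2/dx^2 on D = (0, 1):
#   lambda_j = kappa^2 + (pi j)^2,  e_0 = 1,  e_j = sqrt(2) cos(pi j x) for j >= 1
j = np.arange(J + 1)
lam = kappa**2 + (np.pi * j)**2
E = np.where(j[None, :] == 0, 1.0,
             np.sqrt(2.0) * np.cos(np.pi * j[None, :] * x[:, None]))

# Truncated kernel of L^{-2 beta}: rho(x, y) = sum_j lambda_j^{-2 beta} e_j(x) e_j(y)
rho = (E * lam**(-2.0 * beta)) @ E.T

# The trace sum_j lambda_j^{-2 beta} converges precisely because 4 beta > d
trace = np.sum(lam**(-2.0 * beta))
```

The resulting kernel matrix is symmetric and positive semi-definite, as a covariance function must be.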

Finite element approximation
The goal is now to provide a convergence analysis for FEM approximations of the covariance operator L −2β .Let us start by describing the setup we will use.
Assumption 3. Let V_h ⊂ V be a finite element space that is spanned by a set of continuous piecewise linear basis functions {φ_j}_{j=1}^{n_h} (see Appendix D), with n_h ∈ N, defined with respect to a triangulation T_h of D indexed by the mesh width h := max_{T∈T_h} h_T, where h_T := diam(T) is the diameter of the element T ∈ T_h. We assume that the family (T_h)_{h∈(0,1)} of triangulations inducing the finite-dimensional subspaces (V_h)_{h∈(0,1)} of V is quasi-uniform; that is, there exist constants K_1, K_2 > 0 such that ρ_T ≥ K_1 h_T and h_T ≥ K_2 h for all T ∈ T_h and h ∈ (0, 1). Here, ρ_T > 0 is the radius of the largest ball inscribed in T ∈ T_h.
We are now in a position to describe the FEM discretization of the model (8). Let L_h : V_h → V_h be defined in terms of the bilinear form a_L as its restriction to V_h, that is, (L_h φ, ψ)_{L^2(D)} = a_L(φ, ψ) for all φ, ψ ∈ V_h. Note that L_h is a positive-definite, symmetric, linear operator on the finite-dimensional space V_h. Hence, we may arrange the eigenvalues of L_h in non-decreasing order, λ_{1,h} ≤ ⋯ ≤ λ_{n_h,h}, with corresponding eigenvectors {e_{j,h}}_{j=1}^{n_h}, which are orthonormal in L^2(D). Let W_h denote Gaussian white noise on V_h. That is, there exist independent standard Gaussian random variables ξ_1, ..., ξ_{n_h} such that W_h = Σ_{j=1}^{n_h} ξ_j e_{j,h}. Then, we refer to the following SPDE on V_h as the discrete model of (8):

L_h^β (τ u_h) = W_h.    (10)

Let u_h be a solution of (10). Then the covariance operator of u_h is given by L_h^{-2β}, and

ϱ_β^h(x, y) = Σ_{j=1}^{n_h} λ_{j,h}^{-2β} e_{j,h}(x) e_{j,h}(y),

for a.e. (x, y) ∈ D × D, is the corresponding covariance function. We have the following result regarding the convergence of the FEM approximation ϱ_β^h to the exact covariance function ϱ_β in the L^2(D × D)-norm defined by ‖f‖^2_{L^2(D×D)} = ∫_D ∫_D f(x, y)^2 dx dy. The proof is given in Appendix B.

Proposition 1. Let β > d/4. Under Assumptions 1, 2 and 3, for every ε > 0 and sufficiently small h,

‖ϱ_β - ϱ_β^h‖_{L^2(D×D)} ≤ C h^{min{4β-d/2-ε, 2}},    (11)

where the constant C > 0 is independent of h.
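The discrete covariance function ϱ_β^h can be formed directly from the generalized eigenproblem for L_h. The sketch below (Python, illustrative only; a uniform 1D mesh and made-up parameter values) assembles the P1 mass and stiffness matrices, solves the eigenproblem with eigenvectors orthonormal in the mass inner product, and evaluates ϱ_β^h at the mesh nodes.

```python
import numpy as np
from scipy.linalg import eigh

def fem_matrices(n):
    """P1 mass matrix C and stiffness matrix G on a uniform mesh of [0, 1]."""
    h = 1.0 / (n - 1)
    C = np.zeros((n, n))
    G = np.zeros((n, n))
    for e in range(n - 1):                     # element-by-element assembly
        idx = np.ix_([e, e + 1], [e, e + 1])
        C[idx] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        G[idx] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return C, G

kappa, beta, n = 2.0, 0.75, 60                 # made-up parameter values
C, G = fem_matrices(n)
L = G + kappa**2 * C                           # Galerkin matrix (Neumann case)

# Discrete eigenpairs L e = lambda C e; eigh returns ascending eigenvalues
# with eigenvectors orthonormal in the C (i.e., discrete L^2) inner product
lam_h, V = eigh(L, C)

# FEM covariance kernel at the mesh nodes:
#   rho_h(x_i, x_j) = sum_k lambda_{k,h}^{-2 beta} e_{k,h}(x_i) e_{k,h}(x_j)
rho_h = (V * lam_h**(-2.0 * beta)) @ V.T
```

All discrete eigenvalues are positive (the smallest is close to κ^2), and the resulting kernel matrix is symmetric positive semi-definite.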
Remark 3. Cox and Kirchner (2020, Theorem 1) proved the bound (11) in the case of homogeneous Dirichlet boundary conditions. They did not provide a bound for the case of homogeneous Neumann boundary conditions. Proposition 1 arrives at the same bound for the Neumann case. For this, we additionally require that ess inf_{x∈D} κ(x) ≥ κ_0 > 0 and that the domain D is a convex polytope in the Neumann case. As far as we know, this is a new result. The key step in the proof is to obtain a result analogous to Cox and Kirchner (2020, Lemma 2), which is given by Proposition 4 in Appendix B.

Rational approximation
Having introduced the FEM approximation, we are now ready to define the complete approximation of the covariance operator of the generalized Whittle-Matérn fields. The approximation is obtained by combining a rational approximation of the fractional power of the covariance operator with the FEM approximation. We begin by introducing the method and then provide a theoretical justification by showing an explicit rate of convergence of the approximate covariance function to the correct one in the L^2(D × D)-norm.
In Bolin and Kirchner (2020), the authors obtained an approximation of the solution to (8), which also implicitly defines an approximation of the corresponding covariance operator. However, as we have previously mentioned, this results in an approximation that is not implementable in R-INLA. Also, for statistical applications there is usually no need to have an approximation of the solution itself, since only the corresponding distribution matters for inference. With this in mind, we propose to directly approximate the covariance operator L^{-2β}. To this end, we first split

L_h^{-2β} = L_h^{-⌊2β⌋} L_h^{-{2β}},

where {x} = x - ⌊x⌋ is the fractional part of x. Then, we approximate L_h^{-{2β}} with a rational approximation. This yields an approximation

L_{h,m}^{-2β} := L_h^{-⌊2β⌋} p(L_h^{-1}) q(L_h^{-1})^{-1},    (12)

where p and q are polynomials obtained from a rational approximation of order m of the real-valued function f(x) = x^{{2β}}. That is,

f(x) ≈ p(x) q(x)^{-1}  for x ∈ [λ_{n_h,h}^{-1}, λ_{1,h}^{-1}],

which covers the spectrum of L_h^{-1}. The coefficients are computed as the best rational approximation in the L^∞-norm, which, for example, can be obtained via the second Remez algorithm (Remez, 1934) or by the recent, and more stable, BRASIL algorithm (Hofreither, 2021). See Appendix F for details about this algorithm and a justification for the choice of using the best rational approximation in the L^∞-norm.
By defining the covariance function

ϱ_β^{h,m}(x, y) = Σ_{j=1}^{n_h} λ_{j,h}^{-⌊2β⌋} (p(λ_{j,h}^{-1}) / q(λ_{j,h}^{-1})) e_{j,h}(x) e_{j,h}(y),

for a.e. (x, y) ∈ D × D, we have that ϱ_β^{h,m} is the kernel of the covariance operator L_{h,m}^{-2β}. There are two sources of errors when we consider ϱ_β^{h,m} as an approximation of the true covariance function ϱ_β of the generalized Whittle-Matérn field: the FEM approximation and the rational approximation. The following proposition, whose proof is given in Appendix B, shows that we have control of these two sources of errors via the FEM mesh width h and the order of the rational approximation m.
Proposition 2. Let β > d/4. Under Assumptions 1, 2 and 3, for every ε > 0 and for sufficiently small h, we have

‖ϱ_β - ϱ_β^{h,m}‖_{L^2(D×D)} ≤ C (h^{min{4β-d/2-ε, 2}} + h^{-d/2} e^{-2π√({2β}m)}),    (13)

where the constant C > 0 does not depend on h or m.

Remark 4. We can calibrate the accuracy of the rational approximation with the finite element error by choosing m proportional to (log h)^2. This ensures that the rate of convergence in (13) is min{4β - d/2 - ε, 2}. See Section 7 for further details on the choice of m.
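To give a feeling for how accurate low-order rational approximations of f(x) = x^α with α ∈ (0, 1) (α playing the role of {2β}) can be, the sketch below fits a degree-(m, m) rational function by a simple linearized least-squares fit on a grid. This is only a crude stand-in for the best L^∞ approximation computed by the Remez or BRASIL algorithms; the interval, α, and m are arbitrary illustrative choices.

```python
import numpy as np

# f(x) = x^alpha, with alpha in (0, 1) playing the role of {2*beta}
alpha, m, delta = 0.5, 2, 0.1          # made-up illustrative values
x = np.linspace(delta, 1.0, 400)
f = x**alpha

# p(x) = a_0 + ... + a_m x^m and q(x) = 1 + b_1 x + ... + b_m x^m, fitted
# through the linearized least-squares problem p(x) - f(x) q(x) ~ 0
P = np.vander(x, m + 1, increasing=True)       # columns 1, x, ..., x^m
M = np.hstack([P, -(f[:, None] * P[:, 1:])])
coef, *_ = np.linalg.lstsq(M, f, rcond=None)
a, b = coef[:m + 1], coef[m + 1:]

approx = (P @ a) / (1.0 + P[:, 1:] @ b)
max_err = np.max(np.abs(approx - f))           # uniform error on the grid
```

Even this non-optimal fit with m = 2 yields a small uniform error on the interval, illustrating why low orders of rational approximation already suffice in practice.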

GMRF representation
The goal of this section is to obtain a sparse matrix representation of the precision operator of the rational approximation from the previous section, so that the methods in Section 2 can be used for computationally efficient sampling and likelihood evaluation.
The solution u_h in (10) at spatial location s can be represented as u_h(s) = Σ_{j=1}^{n_h} w_j φ_j(s), where {w_j}_{j=1}^{n_h} are stochastic weights and {φ_j}_{j=1}^{n_h} are the piecewise linear finite element basis functions. We will now show how to represent w = [w_1, ..., w_{n_h}]^⊤ as a sum of independent GMRFs, each with a sparse precision matrix. The key step is to apply a partial fraction decomposition in (12):

p(L_h^{-1}) q(L_h^{-1})^{-1} = Σ_{i=1}^{m} r_i (L_h - p_i I_{V_h})^{-1} + k I_{V_h}.    (14)

Here, {r_i}_{i=1}^m, {p_i}_{i=1}^m and k are real numbers, and I_{V_h} is the identity operator mapping the finite element space to itself. Let C be the mass matrix with elements C_{i,j} = (φ_i, φ_j)_{L^2(D)}, and let L be the matrix determined by the bilinear form a_L(·, ·) induced by the differential operator L, which has elements L_{i,j} = (H∇φ_i, ∇φ_j)_{L^2(D)} + (κ^2 φ_i, φ_j)_{L^2(D)}. Then, we can use (14) to obtain the covariance matrix of w as (see Appendix C for a derivation)

Σ_w = Σ_{i=1}^{m} r_i (L^{-1}C)^{⌊2β⌋} (L - p_i C)^{-1} + k K_{⌊2β⌋},    (15)

where K_{⌊2β⌋} = (L^{-1}C)^{⌊2β⌋} C^{-1}. In the Matérn case, that is, when κ is a constant and H is an identity matrix, we simply have L = G + κ^2 C, where G is the stiffness matrix with elements G_{i,j} = (∇φ_i, ∇φ_j)_{L^2(D)}.
Since we have the same degree for the numerator and denominator in the rational approximation, we can use the BRASIL algorithm (Hofreither, 2021) to compute the coefficients {a_i}_{i=0}^m and {b_i}_{i=0}^m in (12) and thus the coefficients {r_i}_{i=1}^m, {p_i}_{i=1}^m and k in (14). Another option, commonly used in practice, is to use a "near best" rational approximation. One such option, which was used in Bolin and Kirchner (2020), and which is also implemented in the rSPDE package, is the Clenshaw-Lord Chebyshev-Padé algorithm (Baker and Graves-Morris, 1996). See Appendix F for details about this algorithm. Also, observe that the interval on which one should compute the rational approximation may vary with the parameters κ and H, and recomputing the coefficients {a_i}_{i=0}^m and {b_i}_{i=0}^m for different values of these parameters is not practical for implementations. To avoid this, recall from Assumption 2 that κ_0^2 is a lower bound for the eigenvalues of L in the case of Neumann boundary conditions and that λ_1 ≤ λ_{1,h} (see Proposition 3 in Appendix B). We can then re-scale the operator L_h as L_h/κ_0^2 so that we can replace the interval [λ_{n_h,h}^{-1}, λ_{1,h}^{-1}] by [δ, 1], where, ideally, δ is chosen in such a way that δ ≤ κ_0^2/λ_{n_h,h} for all considered mesh sizes h. In the rSPDE package, the choices δ = 0 and δ = 10^{-(5+m)/2} are implemented. However, the difference in accuracy with respect to approximating the covariance function is negligible between these two choices.
For these options, we verified empirically that if f_β(x) = x^{{2β}}, {2β} = 2β - ⌊2β⌋, and f_{β,m} is the rational approximation of f_β whose numerator and denominator both have degree m, then the resulting coefficients in the partial fraction decomposition (14) are such that {p_i}_{i=1}^m are negative real numbers and {r_i}_{i=1}^m and k are positive real numbers. This, together with the fact that the BRASIL algorithm is only implemented for rational approximations with numerator and denominator of the same degree, are the main reasons we chose to consider the numerator and denominator having the same degree m. Bolin and Kirchner (2020) instead considered a rational approximation where the numerator has degree m and the denominator has degree m + 1. However, with this choice the partial fractions would not yield a decomposition into positive-definite operators in our case.
Since {p_i}_{i=1}^m are negative real numbers and {r_i}_{i=1}^m and k are positive real numbers, we have that r_i (L^{-1}C)^{⌊2β⌋} (L - p_i C)^{-1}, for i = 1, ..., m, and k K_{⌊2β⌋} are valid covariance matrices. Thus, w can be expressed as a sum of m + 1 independent random vectors x_i as in (3), where Q_i is the precision matrix of x_i. By (15), we obtain that

Q_i = r_i^{-1} (L - p_i C)(C^{-1}L)^{⌊2β⌋},  i = 1, ..., m,  and  Q_{m+1} = k^{-1} C (C^{-1}L)^{⌊2β⌋}.    (16)

Then, the precision matrix of X is the block diagonal matrix shown in (5). The final step in order to obtain a GMRF representation is to use the mass lumping technique as for the standard SPDE approach; see Appendix C.5 in Lindgren et al. (2011). Thus, the mass matrix C in (16) is replaced by a lumped mass matrix C̃, where C̃ is a diagonal matrix with C̃_{ii} = Σ_{j=1}^{n_h} C_{ij}, for i = 1, ..., n_h. With this adjustment, Q in (5) is sparse and we have thus obtained a GMRF representation.
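The structure of the precision blocks can be illustrated with a small 1D example. The sketch below (Python, illustrative only) assembles C and L on a uniform mesh, lumps the mass matrix, and forms blocks of the form r_i^{-1}(L - p_i C)(C^{-1}L)^{⌊2β⌋}, checking that they are symmetric, positive definite, and banded (hence sparse). The partial-fraction coefficients used here are made up; they only mimic the sign pattern described above (negative poles, positive weights).

```python
import numpy as np

def fem_1d(n, kappa):
    """P1 mass matrix C and operator matrix L = G + kappa^2 C on [0, 1]."""
    h = 1.0 / (n - 1)
    C = np.zeros((n, n))
    G = np.zeros((n, n))
    for e in range(n - 1):
        idx = np.ix_([e, e + 1], [e, e + 1])
        C[idx] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        G[idx] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return C, G + kappa**2 * C

n = 40
C, L = fem_1d(n, kappa=2.0)
C_lump = np.diag(C.sum(axis=1))        # mass lumping: diagonal of row sums
Ci = np.diag(1.0 / C.sum(axis=1))      # its (diagonal, hence sparse) inverse

# Made-up partial-fraction coefficients with the sign pattern described in
# the text: negative poles p_i, positive weights r_i and k
r, p, k = [0.6, 0.3], [-0.5, -2.0], 0.1
fl2b = 1                               # integer part of 2*beta

# Precision blocks Q_i = r_i^{-1} (L - p_i C)(C^{-1} L)^{floor(2 beta)},
# plus the k-block; with the lumped mass matrix all blocks are banded
Qs = [(L - pi * C_lump) @ np.linalg.matrix_power(Ci @ L, fl2b) / ri
      for ri, pi in zip(r, p)]
Qs.append(C_lump @ np.linalg.matrix_power(Ci @ L, fl2b) / k)
```

Because the poles are negative, each factor L - p_i C is a sum of positive-definite matrices, and the products of banded matrices stay banded, which is what makes the GMRF representation computationally useful.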

Implementation and the rSPDE package
The proposed covariance-based rational approximation method has been implemented in the R package rSPDE.In the following sections, we will use this package to illustrate the performance of the method.In this section, we give a brief introduction to the package and how it can be used in combination with R-INLA for computationally efficient Bayesian inference of latent Gaussian models involving the generalized Whittle-Matérn fields.
The usual workflow of fitting standard SPDE models in R-INLA can be divided into six steps: constructing the FEM mesh, defining the SPDE model, creating a projector matrix, building the INLA stack, specifying the model formula, and finally calling the function inla to fit the model. Details about this can be found in Lindgren and Rue (2015). To fit a model with a fractional SPDE, this procedure remains the same. The only difference is that when defining the SPDE model, creating the projector matrix, and building the index for the INLA stack, we use functions from the rSPDE package. These functions are very similar to the corresponding R-INLA functions in terms of functionality. For example, a fractional SPDE model can be created with the command

model <- rspde.matern(mesh = mesh)

where mesh is a FEM mesh that can be obtained by the inla.mesh.2d function from R-INLA. The default order of the rational approximation in this function is m = 2, which provides a good trade-off between computational cost and accuracy; see Figures 1 and 6. As for the corresponding inla.spde2.matern function that can be used to define non-fractional SPDE models in INLA, one can also set priors for κ and τ in rspde.matern. Further, we can also define a prior for the smoothness parameter ν or fix ν so that an SPDE model with a fixed smoothness parameter is generated. This feature can be used, for example, in the case that one already knows what ν is or wants to compare two different models with different ν, as we will do in Section 8.
The projector matrix A for a given mesh and observation locations loc is computed as

A <- rspde.make.A(mesh = mesh, loc = loc)

As for the creation of the model, the default order of the rational approximation when creating the projector matrix is m = 2, which can be changed by the user. The other arguments of the function are the same as those in the corresponding R-INLA function inla.spde.make.A. In the step of building the INLA stack, usually an index set is needed.
The index can be computed with the function rspde.make.index, which replaces the R-INLA function inla.spde.make.index and has the same arguments. With these functions, the fractional models can be used as any other random effect in R-INLA. After fitting the model with the R-INLA function inla, posterior samples from the latent field and the hyperparameters can be obtained by using inla.posterior.sample, and posterior distributions of the model parameters can be extracted via the rSPDE function rspde.result.
Besides the INLA-related functions, the rSPDE package also provides various utility functions. For example, once a fractional SPDE model, model, has been created with the rspde.matern function, one can simulate from it by calling simulate(model) to obtain a prior sample for a given choice of parameters, and the marginal log-likelihood from Section 2 can be computed by

l <- rSPDE.matern.loglike(model, y, A, sigma.e)

Here, y is the observed data and sigma.e is the standard deviation of the measurement noise. In addition, if a model is fitted with this approach, then kriging predictions and posterior samples can be obtained by using the predict function. For further details and examples, we refer the reader to the vignettes at https://davidbolin.github.io/rSPDE.
Finally, rSPDE also provides an interface to the inlabru package (Bachl et al., 2019), which simplifies the construction of spatial models. This was used in the application in Section 8, where the entire code for defining and fitting the fractional model is:

mesh <- inla.mesh.2d(loc = loc, max.edge = c(0.5, 10), cutoff = 0.35)
spde <- rspde.matern(mesh = mesh, nu.upper.bound = 1)
res <- bru(z ~ -1 + field(coordinates, model = spde), data = data)

Here, loc are the measurement locations and data is a data frame with the locations and observations.

Numerical experiments
In this section, we compare the accuracy of the covariance-based rational approximation with the operator-based method from Bolin and Kirchner (2020), and with the "parsimonious" method from Lindgren et al. (2011). Since the latter method is implemented in R-INLA, we refer to it as the INLA approximation. We also note that the INLA method constructs a covariance-based Markov approximation (see also Bolin and Kirchner, 2020, Section 2), so it can be viewed as a covariance-based rational approximation of order 0.
For the comparison, we consider the SPDE model (2) with homogeneous Neumann boundary conditions on the unit square D = [0, 1]^2, with τ chosen such that σ^2 in the Matérn covariance is one. The reason we consider the square domain is that we have an explicit expression for the covariance function of the solution u. Indeed, we have, from Khristenko et al. (2019, Eq. (2.13)), that the covariance function of u is given by the folded covariance

ϱ_D(x, y) = Σ_{k∈Z^2} Σ_{a∈{-1,1}^2} ϱ(‖x - a ⊙ y + 2k‖),

where a ⊙ y = (a_1 y_1, a_2 y_2), ‖·‖ is the Euclidean norm on R^2, and ϱ(·) is the Matérn covariance function in (1) with σ = 1 and ν = 2β - 1. To compare the accuracy of the covariance approximations, we evaluate the true and approximate covariance functions on a regular mesh on [0, 1]^2 with N = 100 equally spaced nodes on each axis. We compare these approximations with respect to the L^2([0, 1]^2 × [0, 1]^2)-norm and the supremum norm on [0, 1]^2 × [0, 1]^2. Appendix G shows in detail how we approximate the errors in these two norms.
For the operator-based and covariance-based rational approximations, we consider rational approximations of orders m = 1, 2, 3, 4. We choose smoothness parameters ranging from 0.1 to 3.1 with steps of size 0.05. Further, we test three possible values of κ. These values of κ, say κ_1(ν), κ_2(ν) and κ_3(ν), are chosen in such a way that the practical range ρ = √(8ν)/κ is fixed at 0.1, 0.5 and 1, respectively, for all values of ν. The resulting errors for the different methods are shown in Figure 1.
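The parameterization by practical range can be checked directly from the Matérn correlation function: at distance ρ = √(8ν)/κ the correlation is roughly 0.1, nearly independently of ν, which is the usual motivation for this convention. A small sketch (Python with scipy.special; the values of ν and ρ are illustrative):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(h, kappa, nu):
    """Matern correlation: 2^(1-nu)/Gamma(nu) * (kappa h)^nu * K_nu(kappa h)."""
    h = np.asarray(h, dtype=float)
    out = np.ones_like(h)                      # correlation 1 at distance 0
    pos = h > 0
    x = kappa * h[pos]
    out[pos] = 2.0**(1.0 - nu) / gamma(nu) * x**nu * kv(nu, x)
    return out

# For each smoothness nu, choose kappa so the practical range
# rho = sqrt(8 nu)/kappa equals 0.5, and evaluate the correlation there
rho = 0.5
vals = [matern_corr(np.array([rho]), np.sqrt(8.0 * nu) / rho, nu)[0]
        for nu in (0.6, 1.0, 2.0)]
# Each value is close to 0.1 regardless of nu
```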
We begin by observing that for smoothness parameters ν = 1, 2 or 3, there is no rational approximation and the errors come only from the FEM approximation. With this in mind, one should note that for smaller range parameters most of the approximation error comes from the FEM approximation, yielding small differences in errors across the different methods. However, for larger ranges, such as, in this case, practical range equal to 1, the errors differ by orders of magnitude as the order of the rational approximation increases, with the errors from the operator-based and covariance-based approximations of the same rational approximation order having approximately the same order of magnitude. Furthermore, we can observe numerical instabilities of the operator-based approximations of orders 3 and 4 as ν increases for both practical ranges 0.5 and 1, whereas the covariance-based method is stable for all orders of approximation.
In order to further illustrate the effect of the FEM error on the rational approximation of the covariance operator, we repeated the analysis from above with a coarser FEM mesh, consisting of 50 equally spaced nodes on each axis over the domain [0, 1]². The results are shown in Figure 6 in Appendix G. We now observe that for practical range 0.1, there is no visible difference between the covariance-based and operator-based rational approximations of orders 1 to 4, and only a very small difference between the "parsimonious" INLA approximation and the remaining rational approximations. Further, for practical ranges 0.5 and 1, we hardly see any differences between the rational approximations of orders 2, 3 and 4. The only noticeable difference is that for large values of ν, the operator-based rational approximation becomes numerically unstable. On the other hand, it is noteworthy that for practical ranges 0.5 and 1, there is a difference of orders of magnitude between the rational approximations of orders 0 and 1 and the remaining orders.
To summarize, the results indicate that the covariance-based method generally has a similar accuracy as the operator-based method, which is higher than the accuracy provided by INLA's method. The results also show that the covariance-based method is more numerically stable, especially for larger values of the order m of the rational approximation.
It is important to remember that the INLA method only provides a fixed approximation and only works in the case of stationary parameters, whereas the other methods are applicable also to non-stationary models and can be made arbitrarily precise by increasing the order m. As previously mentioned, the operator-based method is not suitable for inference in R-INLA, but the covariance-based method is. In conclusion, the covariance-based method thus facilitates inference for stationary and non-stationary fractional SPDE-based models in R-INLA, and is also more accurate than the current INLA method for stationary models.
The numerical experiments in this section were implemented using the rSPDE package. All plots in this section, along with several more for different choices of all the parameters involved, can be found in a shiny (Chang et al., 2021) app available at https://github.com/davidbolin/rSPDE. The results above were obtained by the Clenshaw-Lord Chebyshev-Padé algorithm with δ = 0 (see Section 5). The shiny app also contains the results obtained by the BRASIL algorithm and by the Clenshaw-Lord Chebyshev-Padé algorithm with δ = 10^(−(5+m)/2). We also include results on likelihood errors in Appendix G.

Application
In this section, we illustrate the usage of the covariance-based rational approximation method through an application to a spatial dataset of precipitation observations. The dataset, available at https://www.image.ucar.edu/Data/precip_tapering/, contains annual precipitation anomalies observed by weather stations in the United States (standardized by the long-run mean and standard deviation for each station). We study the data from the year 1962, which contains observations from 7352 stations throughout the contiguous United States. We chose this dataset because it is simple enough to use a stationary model, which allows us to highlight the advantages of the fractional model without having to construct a complicated hierarchical model. Kaufman et al. (2008) also studied this data as an illustration of the covariance tapering method.

Figure 2: The finite element mesh over the contiguous US and the stations shown as dots.
We model the data by (4), where ϵ is independent Gaussian measurement noise with Q_ϵ = σ_ϵ⁻² I and u is a Whittle-Matérn field obtained as a solution to (2), where D is a bounded region (see Figure 2). The field is discretized using a finite element mesh that covers the contiguous United States with 9485 nodes. Figure 2 shows the mesh and the 7352 stations. Our interest is to compare the stationary SPDE models with either a fractional smoothness parameter ν (referred to as the fractional model) or a fixed parameter ν = 1 (referred to as the integer model) in terms of predictive power. In order to more easily interpret the parameters, we consider a parameterization of the Whittle-Matérn field in terms of the standard deviation σ = √(Γ(ν)/(τ²κ^(2ν)(4π)Γ(ν + 1))), the practical correlation range ρ = √(8ν)/κ, and the smoothness ν. The prior distributions for the parameters are chosen as the default choices from the rSPDE package. That is, the priors of log(ρ) and log(σ) are independent Gaussian distributions with variance 10, and the mean values are chosen based on the size of the domain. Further, the prior of ν is a Beta distribution on the interval (0, 1) with mean 1/2 and variance 1/16. The choice of prior for ν is motivated by the fact that we do not believe that this should be a very smooth field. We also tested Beta distributions on larger intervals and found that this did not affect the parameter estimates or the predictive performance of the model much.
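In general dimension d, the reparameterization reads σ² = Γ(ν)/(τ²κ^(2ν)(4π)^(d/2)Γ(ν + d/2)) and ρ = √(8ν)/κ (as in the introduction). A small sketch of the mapping between parameterizations (the helper names are ours, with d = 2 as in this application):

```python
import math

def sigma_rho(tau, kappa, nu, d=2):
    """Marginal standard deviation and practical range of a Whittle-Matérn field."""
    sigma2 = math.gamma(nu) / (
        tau ** 2 * kappa ** (2 * nu) * (4 * math.pi) ** (d / 2) * math.gamma(nu + d / 2)
    )
    return math.sqrt(sigma2), math.sqrt(8 * nu) / kappa

def tau_for_sigma(sigma, kappa, nu, d=2):
    """Inverse map: the tau that yields marginal standard deviation sigma."""
    return math.sqrt(
        math.gamma(nu)
        / (sigma ** 2 * kappa ** (2 * nu) * (4 * math.pi) ** (d / 2) * math.gamma(nu + d / 2))
    )
```

The two functions are inverses of each other, which makes it easy to, e.g., fix σ = 1 when choosing τ as in the numerical experiments.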
We fit the models using R (version 4.2.1) and the rSPDE package (version 2.2.0) combined with inlabru (version 2.7.0) and R-INLA (version 23.02.17), running on a machine with an Intel i9-12900KF CPU, 64GB RAM and an Ubuntu operating system. The complete code for the analysis can be found in the supplementary materials. The total times for fitting the fractional and integer models are 38.4s and 15.4s, respectively. The posterior distributions of the parameters of the Gaussian field and the standard deviation of the measurement noise for the two models are shown in Figure 3. One can note that the posterior mode of ν for the fractional model is around 0.52, which indicates that a fractional smoothness is needed. Compared to the fractional model, the integer model has a smaller σ and a larger σ_ϵ, indicating that the latent field explains less of the variability of the data. Finally, the practical correlation range of the integer model is substantially smaller than that of the fractional model, which is likely caused by the fact that a small range is needed to better explain the short-range behavior of the data if the smoothness parameter is forced to be an integer.
To further compare the models, we perform two leave-group-out pseudo cross-validation studies (Liu and Rue, 2022). In the first, for each station, we predict the value at the station based on all data except that from stations closer than a certain distance D (referred to as the distance of removed data). We then vary this distance and compute the accuracy of the predictions as a function of D. In the second, for each station, we instead remove the data from the k nearest stations and compute the accuracy of the predictions as a function of k. According to the screening effect (Stein, 2002), in both cases the removed observations are the most informative. The quality of the predictions is measured in terms of mean squared errors (MSE) and the negative log-score (LS) (Good, 1952). Both metrics are negatively oriented, which means that a lower value indicates a better result. The results of the two cross-validation studies are shown in Figure 4. We see that the fractional model outperforms the integer model in both cases. For example, the fractional model with the distance of removed data being 400 km achieves the same levels of MSE and negative LS as the integer model with the distance of removed data being 300 km (indicated by dashed lines). Also, the fractional model with 125 removed data points achieves the same levels of MSE and negative LS as the integer model with 100 removed data points. We can note that the two models have similar performance when the distance of removed data and the number of removed data points are close to zero. This is expected due to the mean-square continuity of the latent fields, combined with the fact that the models have nugget effects. This means that both models will have an MSE close to the variance of the measurement error when the distance of removed data or the number of nearest removed data points is close to zero.
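Both scores have simple closed forms when the predictive distributions are Gaussian, N(μ_i, s_i²); a minimal sketch (helper names are ours):

```python
import numpy as np

def mse(y, mu):
    """Mean squared error of point predictions mu for observations y."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return float(np.mean((y - mu) ** 2))

def neg_log_score(y, mu, sd):
    """Negative log-score: minus the mean Gaussian predictive log-density."""
    y, mu, sd = (np.asarray(a, float) for a in (y, mu, sd))
    return float(np.mean(0.5 * np.log(2 * np.pi * sd ** 2)
                         + (y - mu) ** 2 / (2 * sd ** 2)))
```

Both are negatively oriented, matching the convention used in the text: a perfect point prediction gives MSE 0, and tighter, well-calibrated predictive distributions give a lower negative log-score.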

Discussion
We have introduced a new rational SPDE approach which provides stable and computationally efficient approximations of the covariance structure of generalized Whittle-Matérn Gaussian random fields with general smoothness β > d/4. We further derived an explicit rate of convergence of the method, which provides a theoretical justification for the approach. Compared to the rational SPDE approach of Bolin and Kirchner (2020), the main advantage is that we obtain a GMRF representation of the approximation. This allowed us to implement the method so that fractional SPDE models can now be estimated in R-INLA, where in particular we can estimate the smoothness parameter from data.
The current version of rSPDE has truncated log-normal and beta priors for the smoothness parameter ν as possible choices. A natural question for future research is how this prior should be chosen in a more systematic way. A potential way to do this is to follow the idea of penalized complexity priors (PC-priors) (Simpson et al., 2017). Fuglstad et al. (2019) derived PC-priors for κ and τ of the Whittle-Matérn fields assuming a fixed value of ν. We plan to extend that work by deriving PC-priors for all three parameters. Another potential area of future work is to extend the proposed method to spatio-temporal SPDE models such as those proposed by Lindgren et al. (2020).

A Additional notation
In this section, we introduce some notation that we will use for the technical details in the following sections. Let (E, ∥·∥_E) and (F, ∥·∥_F) be two separable Hilbert spaces with norms ∥·∥_E and ∥·∥_F, respectively. Then (E, ∥·∥_E) → (F, ∥·∥_F) means that E ⊂ F and there exists a constant C such that for any x ∈ E, we have ∥x∥_F ≤ C∥x∥_E. In this case, we say that E is continuously embedded in F. We let L(E, F) denote the Banach space of bounded linear operators from E to F endowed with the operator norm, that is, ∥A∥_{L(E,F)} = sup_{∥u∥_E = 1} ∥Au∥_F, where A ∈ L(E, F). Similarly, we let L₂(E, F) denote the Banach space of Hilbert-Schmidt operators, endowed with the Hilbert-Schmidt norm, that is, ∥A∥²_{L₂(E,F)} = Σ_{j∈N} ∥Ae_j∥²_F, where {e_j}_{j∈N} is an orthonormal basis of E. Recall that a bounded linear operator T on a Hilbert space E is self-adjoint if, for all f, g ∈ E, (Tf, g)_E = (f, Tg)_E. Now let us show that L⁻¹ is self-adjoint, where L is the operator from Section 3.1. We have that L⁻¹ is compact and thus bounded on L²(D). For any g ∈ L²(D), L⁻¹g ∈ V (V is defined in Section 3.1). Thus, for any f, g ∈ L²(D), let Lu = f, so that u = L⁻¹f. By the symmetry of the bilinear form (9), we then have (L⁻¹f, g)_{L²(D)} = a_L(L⁻¹f, L⁻¹g) = (f, L⁻¹g)_{L²(D)}.
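In finite dimensions the Hilbert-Schmidt norm coincides with the Frobenius norm, which gives a quick numerical check of the definition ∥A∥²_{L₂} = Σ_j ∥Ae_j∥² (a sketch with an arbitrary matrix, not part of the method itself):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

# Sum of ||A e_j||^2 over the standard orthonormal basis e_j of R^5
hs2 = sum(np.sum((A @ e) ** 2) for e in np.eye(5))
```

The square root of hs2 agrees with the Frobenius norm of A, as the definition predicts.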

B Proofs of Proposition 1 and Proposition 2
Let us start by providing some relations between the eigenvalues of L and L_h. Recall, from Section 3, that {λ_j}_{j∈N} are the eigenvalues of L and {λ_{j,h}}_{j=1}^{n_h} are the eigenvalues of L_h, both given in non-decreasing order. We have the following standard result: Proposition 3. Under Assumption 3, we have that 1.

Let now Ḣ^σ_L(D) = {ψ ∈ L²(D) : Σ_{j∈N} λ_j^σ (ψ, e_j)²_{L²(D)} < ∞}, with inner product and norm given by ⟨ϕ, ψ⟩_{Ḣ^σ_L(D)} = Σ_{j∈N} λ_j^σ (ϕ, e_j)_{L²(D)} (ψ, e_j)_{L²(D)} and ∥ψ∥²_{Ḣ^σ_L(D)} = ⟨ψ, ψ⟩_{Ḣ^σ_L(D)}, respectively. Further, we define [H₁, H₂]_σ as the real interpolation between the Hilbert spaces H₁ and H₂ (see Bolin et al., 2022, Appendix A for a brief review of real interpolation of Hilbert spaces).
We consider the fractional Sobolev space H^σ(D) of order σ, with 0 < σ < 2, σ ≠ 1. By Cox and Kirchner (2020, Lemma 2), we have the corresponding identification of Ḣ^σ_L(D) under Dirichlet boundary conditions. We want to apply Cox and Kirchner (2020, Theorem 1); however, it was only proved under Dirichlet boundary conditions. Therefore, we need some additional auxiliary results to conclude an analogous result in the case of Neumann boundary conditions. To this end, we prove the following result, which is a version of Cox and Kirchner (2020, Lemma 2) for Neumann boundary conditions: Proposition 4. Under Neumann boundary conditions, we have (19), where H²_N(D) was defined in Section 3. Moreover, (20) holds, where the norms ∥·∥_{Ḣ^σ_L(D)} and ∥·∥_{H^σ(D)} are equivalent on Ḣ^σ_L(D) for σ ≠ 3/2.
Proof. First, observe that Ḣ⁰_L = L²(D). Also, since the bilinear form a_L is continuous, coercive and symmetric, a_L is an inner product on H¹(D) whose corresponding norm is equivalent to ∥·∥_{H¹(D)}. Now, by the definition of ∥·∥_{Ḣ¹_L(D)}, we have that for every ϕ ∈ H¹(D), a_L(ϕ, ϕ) = ∥ϕ∥²_{Ḣ¹_L(D)}. This means that the norm induced by a_L coincides with the norm ∥·∥_{Ḣ¹_L(D)}, which shows the equivalence between ∥·∥_{Ḣ¹_L(D)} and ∥·∥_{H¹(D)}. Now, observe that by the Lax-Milgram lemma, for every i ∈ N, the eigenvector e_i of L belongs to H¹(D) and satisfies (e_i, e_j)_{Ḣ¹_L(D)} = a_L(e_i, e_j) = λ_i δ_{i,j}, where δ_{i,j} is the Kronecker delta. For any ϕ ∈ Ḣ¹_L(D), we have that ϕ = Σ_{i∈N} a_i e_i, with Σ_{i∈N} a_i² λ_i < ∞ and a_i = (ϕ, e_i)_{L²(D)}. Since Σ_{i∈N} a_i² λ_i < ∞, the partial sums of the series Σ_{i∈N} a_i e_i are Cauchy in H¹(D), which implies that the series converges in H¹(D) since H¹(D) is a Hilbert (complete) space. On the other hand, the series converges to ϕ in L²(D), so the series must converge to ϕ in H¹(D) as well because of the inclusion H¹(D) ⊂ L²(D). Thus, ϕ ∈ H¹(D).
Conversely, let ψ ∈ H¹(D), expand it in the eigenbasis {e_i}_{i∈N}, and conclude that ψ ∈ Ḣ¹_L(D). We obtain (19) by the same arguments as in the proof of Bolin et al. (2022, Corollary 10). Similarly, to prove (20), it is enough to show that (Ḣ²_L(D), ∥·∥_{Ḣ²_L(D)}) and (H²_N(D), ∥·∥_{H²(D)}) coincide with equivalent norms. To this end, first let ϕ ∈ Ḣ²_L(D) and write ϕ = Σ_{i∈N} a_i e_i, with a_i = (ϕ, e_i)_{L²(D)}. Let ϕ_N = Σ_{i=1}^N a_i e_i; by linearity of L, we have that Lϕ_N = Σ_{i=1}^N a_i λ_i e_i. Now, observe that Σ_{i∈N} λ_i² a_i² < ∞ implies that Lϕ_N converges to some g ∈ L²(D). On the other hand, since L : Ḣ²_L(D) → L²(D) is self-adjoint, it is a closed operator. Therefore, Lϕ = g. We now apply the H²(D)-regularity of L (Remark 1) to conclude that ϕ ∈ H²_N(D). Finally, it follows from the closed graph theorem that (Ḣ²_L(D), ∥·∥_{Ḣ²_L(D)}) → (H²_N(D), ∥·∥_{H²(D)}). By the Kirszbraun theorem (Kirszbraun, 1934), H can be extended to a Lipschitz function on R^d with the same Lipschitz constant. Denote this extension by H̄. Now, let R > 0 be such that D ⊂ B(0, R), where B(0, R) stands for the ball with center 0 and radius R in R^d. Then, by convexity of B(0, 2R), φ is Lipschitz and bounded. This implies that φH̄ is Lipschitz, since it is the product of bounded Lipschitz functions, φH̄ ∈ C_c(R^d), and the restriction of φH̄ to D is H. Therefore, by Grisvard (2011, Theorem 1.4.1.1), H∇ψ ∈ (H¹(D))^d. In particular, Lϕ ∈ L²(D). Thus, Lϕ = Σ_{i∈N} b_i e_i with Σ_{i∈N} b_i² < ∞, where b_i = (Lϕ, e_i)_{L²(D)}. We then apply the Gauss-Green formula (Grisvard, 2011, Theorem 1.5.3.1) twice, together with the fact that ϕ and e_i satisfy the Neumann boundary condition, to conclude that ϕ ∈ Ḣ²_L(D). Now, we repeat the same argument from the previous inclusion to obtain that (H²_N(D), ∥·∥_{H²(D)}) → (Ḣ²_L(D), ∥·∥_{Ḣ²_L(D)}) from the closed graph theorem. This proves (20).
We are now in a position to obtain a version of Theorem 1 of Cox and Kirchner (2020) (more precisely, of Remark 8 in Cox and Kirchner (2020)) that works for both Dirichlet and Neumann boundary conditions.
Remark 5. From Assumptions 1 and 3, there exists a linear operator I_h : H²(D) → V_h such that for every 1 ≤ θ < 2, I_h : H^θ(D) → V_h is a continuous extension, and there exists a constant C, which only depends on κ, H and D, such that ∥v − I_h v∥_{L²(D)} ≤ C h^θ ∥v∥_{H^θ(D)} for all v ∈ H^θ(D), where 1 ≤ θ ≤ 2. Indeed, this follows by Ciarlet (2002, Theorem 3.2.1) together with Chandler-Wilde et al. (2015, Theorem 3.5).
Remark 6. Note that ϱ^β_h is the covariance function of the stochastic process obtained as the solution of (10). Now we are ready to give the proof of Proposition 1.
We begin by handling the first term on the right-hand side of (22). Recall that if H is a Hilbert space and A, B : H → H are linear operators, then the Hilbert-Schmidt norm of the product satisfies ∥AB∥_{L₂(H,H)} ≤ ∥A∥_{L(H,H)} ∥B∥_{L₂(H,H)}. Now, let τ = 2β − d/4 − δ > 0 and apply Lemma 1 (where we take the ε in its statement as ε/2 and γ = 0) together with equation (23). We then bound the Hilbert-Schmidt norm of L^{−τ} using item 2 of Proposition 3 and Weyl's law (Remark 2). Therefore, since 2δ < ε/2 and h is sufficiently small, we obtain (24). Now let us give a bound for the second term on the right-hand side of (22). Let γ = min{4β − d − 4δ, 2} > 0, so γ ≤ 2. Observe that in order to apply Lemma 1, we must choose δ such that γ ≠ 1/2 in the Dirichlet case, or γ ≠ 3/2 in the Neumann case. This is possible, since we can reduce δ if necessary.
The natural domain of the operator L^{−(2β−d/4−δ)} is L²(D). Furthermore, by the definition of the Ḣ^σ_L(D) spaces, we have that L^{−(2β−d/4−δ)} : L²(D) → Ḣ^{4β−d/2−2δ}_L(D) is bounded. Recall that {e_j}_{j∈N} is an orthonormal basis of L²(D). Expanding in this basis and combining the resulting bound with Lemma 1 (where we chose the ε in its statement as ε/2), and again using that 2δ < ε/2 and that h is sufficiently small, we arrive at (25). The result now follows from (24) and (25).

C Derivation of the GMRF representation
In this section, we derive equation (15). Recall the rational approximated covariance operator in (14), where L_h was defined in Section 3.2, L^{−2β}_{h,m} was defined in Section 4 and I_{V_h} is the identity map on the finite element space V_h. The terms in the first part of this expression are positive definite. They are also self-adjoint, and thus valid covariance operators. We will deal with each term in the partial fractions expansion separately. We begin with the terms of the form rL, where L = (L_h − pI_{V_h})L_h and z ∈ V_h (see Section 3.2 for the definition of V_h). Let {φ_j}_{j=1}^{n_h} be the finite element basis of V_h. We can write z in the finite element basis as z = Σ_{j=1}^{n_h} z_j φ_j. Similarly, we have that x = Σ_{j=1}^{n_h} x_j φ_j. Let us now obtain a relation between z = [z_1, ..., z_{n_h}]⊤ and x = [x_1, ..., x_{n_h}]⊤. Observe that, for each l = 1, ..., n_h, we have (z, φ_l)_{L²(D)} = Σ_{j=1}^{n_h} z_j (φ_j, φ_l)_{L²(D)}. However, by (28) and (29), we also have a corresponding expression in terms of x. To this end, let B be the matrix of the operator L_h in the basis {φ_j}_{j=1}^{n_h}, C the mass matrix with entries C_{lj} = (φ_j, φ_l)_{L²(D)}, and L the matrix of L in this basis. The relation z = (C⁻¹L)ⁿ x now follows by induction on n.

D The finite element method

Each basis function takes the value 1 at its associated mesh node, decreases linearly to the value 0 at all the neighboring mesh nodes, and takes the value 0 constantly elsewhere on the domain.
In practice, the basis functions are first defined for a reference element and then mapped to the physical elements of the mesh. For example, consider a one-dimensional domain [a, b] and let a = x_1 < x_2 < ... < x_{n_h} = b be a partition of the domain. Each sub-interval [x_i, x_{i+1}], i = 1, ..., n_h − 1, is an element. We can choose the reference element as [0, 1] and define two basis functions on this element as φ_{r,1}(X) = 1 − X and φ_{r,2}(X) = X for X ∈ [0, 1]. Through the change of variables x = x_i + (x_{i+1} − x_i)X for x ∈ [x_i, x_{i+1}], i = 1, ..., n_h − 1, we can find the basis functions defined on the mesh. For the interior nodes, we have φ_i(x) = (x − x_{i−1})/(x_i − x_{i−1}) for x ∈ [x_{i−1}, x_i], φ_i(x) = (x_{i+1} − x)/(x_{i+1} − x_i) for x ∈ [x_i, x_{i+1}], and φ_i(x) = 0 otherwise. For the two boundary nodes, we have φ_1(x) = (x_2 − x)/(x_2 − x_1) for x ∈ [x_1, x_2] and φ_{n_h}(x) = (x − x_{n_h−1})/(x_{n_h} − x_{n_h−1}) for x ∈ [x_{n_h−1}, x_{n_h}], with both equal to zero otherwise. This type of basis function is also referred to as a hat function. The basis functions in higher-dimensional spaces generalize naturally from the one-dimensional case. Figure 5 shows an illustration of a basis function on a two-dimensional domain. In the case of Dirichlet boundary conditions, we remove all basis functions centered at mesh nodes on the boundary, so that the FEM approximation satisfies the Dirichlet boundary conditions.
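A minimal sketch of the 1D hat functions above (the helper name is ours; the mapping from the reference element is absorbed into the closed-form expressions):

```python
import numpy as np

def hat(i, x, nodes):
    """Evaluate the i-th piecewise linear hat basis function at points x."""
    x = np.asarray(x, dtype=float)
    phi = np.zeros_like(x)
    if i > 0:                                   # rising part on [x_{i-1}, x_i]
        m = (x >= nodes[i - 1]) & (x <= nodes[i])
        phi[m] = (x[m] - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
    if i < len(nodes) - 1:                      # falling part on [x_i, x_{i+1}]
        m = (x >= nodes[i]) & (x <= nodes[i + 1])
        phi[m] = (nodes[i + 1] - x[m]) / (nodes[i + 1] - nodes[i])
    return phi
```

The hat functions equal 1 at their own node, 0 at all other nodes, and form a partition of unity on [a, b], which is the property that makes nodal values and basis coefficients coincide.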

E Likelihood evaluation and posterior sampling
It is computationally efficient to evaluate the likelihood in (7) since Q, Q_{X|y} and Q_ϵ are sparse. Specifically, to compute the marginal likelihood (7), Algorithm 1 can be used.
Algorithm 1 Marginal likelihood computation

F Ideas of rational approximation algorithms
In this section, we will briefly introduce the ideas of the BRASIL algorithm and the Clenshaw-Lord Chebyshev-Padé algorithm that were mentioned in Sections 4 and 5.
The idea of the BRASIL algorithm is that one can achieve the best rational approximation of a continuous function on a compact interval [a, b] ⊂ R by interpolating at a certain number of points (depending on the degree of the rational function) chosen such that the maximum errors of the approximation on the sub-intervals determined by those points are all equal. The BRASIL algorithm first initializes a partition of the interval [a, b] by a set of points, then uses barycentric rational interpolation on those points, and iteratively adjusts the partition so that the maximum absolute errors on the sub-intervals are approximately equal. See Hofreither (2021, Section 3) for a complete description of the algorithm.
The Clenshaw-Lord Chebyshev-Padé algorithm approximates the target function by a combination of a Padé approximation and a Chebyshev series. Padé approximation consists of approximating a target function f by a rational function R_{[m/n]}, with degrees m and n for the numerator and denominator polynomials, respectively. The coefficients of the polynomials are computed so that the derivatives at 0 agree with the derivatives of the target function up to the highest possible order, that is, f^(k)(0) = R^(k)_{[m/n]}(0) for k = 0, ..., m + n. The two algorithms compute the best, or nearly the best, coefficients of the rational approximation in the sense of the L∞-norm. The main reason for computing the coefficients in this way is that, by Stahl (2003), we then have an explicit rate of convergence of the error, which allows us to compute explicit bounds for the covariance error. If we only had a bound in the L²-norm (say), it would be less clear how to use that in the theoretical analysis. Further, as far as we know, there are no known methods for obtaining optimal rational approximations with respect to other norms that have known rates of convergence.
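The Padé conditions reduce to a small linear system for the denominator coefficients; the following is a generic sketch (not the rSPDE implementation), assuming the Taylor coefficients of f are available:

```python
import numpy as np
from math import factorial

def pade(c, m, n):
    """[m/n] Padé approximant from Taylor coefficients c[0..m+n].
    Returns numerator coefficients a and denominator coefficients b, with b[0] = 1."""
    c = np.asarray(c, dtype=float)
    # Denominator: sum_{j=0}^{n} b_j c_{k-j} = 0 for k = m+1, ..., m+n
    A = np.array([[c[k - j] if k >= j else 0.0 for j in range(1, n + 1)]
                  for k in range(m + 1, m + n + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(A, -c[m + 1:m + n + 1])))
    # Numerator: a_k = sum_{j=0}^{min(k, n)} b_j c_{k-j}
    a = np.array([sum(b[j] * c[k - j] for j in range(min(k, n) + 1))
                  for k in range(m + 1)])
    return a, b
```

For f = exp this recovers the familiar [1/1] approximant (1 + x/2)/(1 − x/2).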
There are other methods for computing rational approximations of fractional powers of elliptic operators, based on alternative representations of the fractional power. One example is the method of Bonito and Pasciak (2015), which was applied in Bolin et al. (2020). That method, however, has a much higher error for low orders of the rational approximation (Bolin and Kirchner, 2020), and is therefore not suitable in our context.

G Further numerical experiments
In this section, we provide some additional details on the numerical experiments, together with some plots of the absolute relative errors of the likelihood in (7) for different orders of the rational approximations.
First, we describe how we approximate the L²([0, 1]² × [0, 1]²)-norm and the supremum norm on [0, 1]² × [0, 1]². In order to approximate these norms, we first need to build some matrices induced by the covariance operators. First, denote by {s_i}_{i=1}^{N²} the locations of the mesh nodes. For two continuous functions ρ, ρ̃ : [0, 1]² × [0, 1]² → R, let Σ and Σ̃ be N² × N² matrices with corresponding (i, j)th elements given by Σ(i, j) = ρ(s_i, s_j) and Σ̃(i, j) = ρ̃(s_i, s_j), respectively. The L²([0, 1]² × [0, 1]²)-distance between ρ and ρ̃ can be approximated, on this regular mesh, by the quadrature N⁻²∥Σ − Σ̃∥_F, where ∥·∥_F stands for the Frobenius norm. Similarly, we can approximate the supremum distance by ∥Σ − Σ̃∥_max, where ∥·∥_max stands for the max norm. Thus, to approximate the errors, we just need to assemble the true covariance matrix and the covariance matrix of the approximation. Let us now describe how this is done. To this end, fix some smoothness parameter ν > 0, and let β = ν/2 + d/4. We build the covariance matrix Σ^β of size N² × N², associated with the true covariance function, by setting its (i, j)th element to Σ^β_{i,j} = ϱ^β_u(s_i, s_j), where ϱ^β_u is given in (17). In practice, we truncate the sum in (17) to a sufficiently large range of k ∈ Z². Let Q_{I,β} be the precision matrix obtained from INLA's method of general smoothness, with corresponding covariance matrix Σ^β_I = Q⁻¹_{I,β}. Now, fix some order m for the rational approximation and let Q_{m,O,β} be the precision matrix from the operator-based rational approximation of order m. The covariance matrix associated with the operator-based rational approximation is given by Σ^β_{O,m} = Q⁻¹_{m,O,β}. Finally, let Q_{m,C,β} be the precision matrix given by (5), with corresponding covariance matrix Σ^β_{C,m}. The results of the covariance error for the coarser FEM mesh with 50 equally spaced nodes on each axis can be seen in Figure 6. We now consider a similar comparison for the likelihood errors of the
different methods. For the comparison, we generate 1000 sets of samples, where each contains 1000 observations on D = [0, 1]² generated from (4), where u has covariance function (17). For each set of samples y_i, we compute the true log-likelihood value ℓ(y_i) and the approximation ℓ̃(y_i) for each of the three methods, and finally store the absolute relative error |1 − ℓ̃(y_i)/ℓ(y_i)|. The medians of the 1000 absolute relative errors for the three methods are presented in log scale in Figure 7.
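The two covariance error norms described at the start of this appendix reduce to matrix norms of the difference of the assembled covariance matrices; a minimal sketch (assuming the N⁻² quadrature weight of the regular mesh, and with a helper name of our own):

```python
import numpy as np

def cov_errors(Sigma_true, Sigma_approx, N):
    """Approximate the L2 and supremum distances between two covariance
    functions on [0,1]^2 x [0,1]^2 from their N^2 x N^2 matrices."""
    D = Sigma_true - Sigma_approx
    l2 = np.linalg.norm(D, 'fro') / N ** 2   # Frobenius norm with weight 1/N^2
    sup = np.max(np.abs(D))                  # max (entrywise supremum) norm
    return l2, sup
```

For instance, a constant pointwise error of 0.01 gives both an approximate L2 error and a supremum error of 0.01, as expected.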
We can note that the error tends to decrease when the order m of the rational approximation increases. Recall that the error for ν ∈ N comes solely from the FEM error, so we can see that there is no need for a large m to obtain an error which is on the same scale as the FEM error. In fact, the likelihood errors for integer ν and non-integer ν are quite similar as long as m ≥ 2. This means that we essentially have the same likelihood error for a general ν with our method as the standard SPDE approach has for integer values of ν (where the error only comes from the FEM discretization). Finally, we can also note that the covariance-based method has better numerical stability with respect to m compared with the operator-based method. More comparisons can be found in the Shiny app at https://github.com/davidbolin/rSPDE.

Figure 1: Errors in L²(D × D)-norm (top) and supremum norm (L∞(D × D), bottom) on D = [0, 1]² for different practical ranges ρ and different values of ν. All methods use the same FEM mesh, with 100 equally spaced nodes in each direction.

Figure 4: MSE and negative Log-Score as functions of distance (in km) of removed data (top) and number of removed data (bottom).

Figure 5: Example of a finite element basis function on a mesh in two dimensions.
2: Compute Q_{X|y} and µ_{X|y}, where µ_{X|y} is computed by solving Q_{X|y} µ_{X|y} = A⊤Q_ϵ y for µ_{X|y} with Rue and Held (2005, Algorithm 2.1).
3: Compute log|Q| by exploiting the sparsity of Q. First, compute the Cholesky decomposition of Q: Q = LL⊤. Then, compute log|Q| = log|LL⊤| = 2 log|L| = 2 Σ_i log L_ii, where L_ii denotes the ith diagonal element of L.
4: Compute log|Q_{X|y}| and log|Q_ϵ| in the same way as Step 3.
5: Compute the likelihood of y by using (7).

Similarly, samples from predictive distributions of the latent field can be obtained effectively via Algorithm 2.

Algorithm 2 Predictive distribution sampling
Input: Locations s_1, ..., s_N where u(s) should be sampled.
1: Assemble Q and Q_ϵ.
2: Compute Q_{X|y} and µ_{X|y} in the same way as Step 2 in Algorithm 1.
3: Use Q_{X|y} and µ_{X|y} to sample X|y by following Rue and Held (2005, Algorithm 2.4).
4: Construct a projection matrix A_new for the locations s_1, ..., s_N.
5: Return A_new X|y as a sample from π(u(s_1), ..., u(s_N)|y).
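Step 3 above is the standard Cholesky log-determinant identity; a dense numerical sketch follows (in practice a sparse Cholesky factorization is used to exploit the GMRF structure, and the helper name is ours):

```python
import numpy as np

def logdet_chol(Q):
    """log|Q| for symmetric positive definite Q via log|Q| = 2 * sum_i log L_ii."""
    L = np.linalg.cholesky(Q)
    return 2.0 * np.sum(np.log(np.diag(L)))

# A random symmetric positive definite test matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
Q = A @ A.T + 50.0 * np.eye(50)
```

The result agrees with a direct log-determinant computation, but the Cholesky route is the one that scales to sparse precision matrices.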
That is, f^(k)(0) = R^(k)_{[m/n]}(0) for k = 0, ..., m + n. Now, for any continuous function f on the interval [−1, 1], there is a unique Chebyshev series, of the form f(x) = Σ_{k=0}^∞ a_k T_k(x), that converges uniformly to the function f. Here, {a_k} are called the Chebyshev coefficients and {T_k} are the Chebyshev polynomials of the first kind, defined by the recurrence relation T_0(x) = 1, T_1(x) = x, and T_{n+1}(x) = 2xT_n(x) − T_{n−1}(x), for x ∈ [−1, 1]. The coefficients can be computed as a_k = (2/π) ∫_{−1}^{1} f(x)T_k(x)/√(1 − x²) dx. One can obtain the Chebyshev series of a continuous function on a compact interval [a, b] through a change of variables. To approximate a continuous function on an interval [a, b], the Clenshaw-Lord Chebyshev-Padé algorithm first expands the function in its (truncated) Chebyshev series, then uses Padé approximation to approximate the series.
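A sketch of these two ingredients (helper names are ours): the recurrence for T_k, and the coefficient integral evaluated via the substitution x = cos θ, which turns a_k into (2/π)∫₀^π f(cos θ) cos(kθ) dθ:

```python
import numpy as np

def cheb_T(k, x):
    """T_k via T_0 = 1, T_1 = x, T_{n+1} = 2 x T_n - T_{n-1}."""
    x = np.asarray(x, dtype=float)
    t_prev, t_cur = np.ones_like(x), x.copy()
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur

def cheb_coef(f, k, n=2000):
    """a_k = (2/pi) int_{-1}^{1} f(x) T_k(x) / sqrt(1-x^2) dx via x = cos(theta),
    approximated with the midpoint rule on (0, pi)."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    return 2.0 / n * np.sum(f(np.cos(theta)) * np.cos(k * theta))
```

For f(x) = x² = T₀(x)/2 + T₂(x)/2 the formula gives a₂ = 1/2, and T_k(x) = cos(k arccos x) provides an independent check of the recurrence.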

Figure 6: Errors in L²(D × D)-norm (top) and supremum norm (L∞(D × D), bottom) on D = [0, 1]² for different practical ranges ρ and different values of ν. All methods use the same FEM mesh, with 50 equally spaced nodes in each direction.
Σ^β_{C,m} = I Q⁻¹_{m,C,β} I⊤, where I is a block matrix of size N² × N²(m + 1) obtained by combining m + 1 copies of the N² × N² identity matrix I_{N²} as I = [I_{N²} ⋯ I_{N²}].

Figure 7: The log-scaled log-likelihood errors, where ρ is the range parameter and the standard deviation of the measurement noise is 0.1 (top) and 0.01 (bottom). All methods use the same FEM mesh, with 100 equally spaced nodes in each direction.