A Particle Method for Solving Fredholm Equations of the First Kind

Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, reconstruction of distorted noisy observations and indirect density estimation and also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many techniques currently available require a preliminary discretization of the domain of the solution and make strong assumptions about its regularity. For example, the popular expectation maximization smoothing (EMS) scheme requires the assumption of piecewise constant solutions which is inappropriate for most applications. We propose here a novel particle method that circumvents these two issues. This algorithm can be thought of as a Monte Carlo approximation of the EMS scheme which not only performs an adaptive stochastic discretization of the domain but also results in smooth approximate solutions. We analyze the theoretical properties of the EMS iteration and of the corresponding particle algorithm. Compared to standard EMS, we show experimentally that our novel particle method provides state-of-the-art performance for realistic systems, including motion deblurring and reconstruction of cross-section images of the brain from positron emission tomography.


Introduction
We consider Fredholm equations of the first kind of the form

h(y) = ∫_X g(y | x) f(x) dx, (1)

where f(x) and h(y) are probability densities on X and Y, respectively, while g(y | x) is the density of a Markov kernel from X to Y. Given g and samples from h, we aim to estimate the density f. This class of equations has numerous applications in statistics and applied mathematics. For example, h might correspond to a mixture model for which we wish to estimate its mixing distribution, f, from samples from h. This problem is known as density deconvolution or indirect density estimation (Delaigle et al., 2008; Ma, 2011; Pensky et al., 2017; Yang et al., 2020). In epidemiology, these equations link the unknown reproduction number of a disease to the observed number of deaths (Goldstein et al., 2009; Gostic et al., 2020; Marschner, 2020). In instrumental variable regression and causal inference, Fredholm equations can be used to estimate a nonlinear regression function or identify causal effects in the presence of confounders (Hall et al., 2005; Miao et al., 2018). Since the seminal work of Vardi et al. (1985) and Vardi and Lee (1993), Fredholm equations of the first kind have also been widely used in positron emission tomography. In this and similar contexts, f corresponds to an image which needs to be inferred from noisy measurements (Aster et al., 2018; Clason et al., 2019; Zhang et al., 2019).
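The generative structure implied by (1) — draw x from f, then y from g(· | x), so that y is marginally distributed according to h — can be sketched as follows; the Gaussian choices for f and g below are purely illustrative and not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (ours, not the paper's): f is N(0, 1) and
# g(y | x) adds N(0, sigma^2) noise, so h is N(0, 1 + sigma^2).
sigma = 0.5
n = 100_000

x = rng.normal(0.0, 1.0, size=n)          # x ~ f
y = x + rng.normal(0.0, sigma, size=n)    # y ~ g(. | x); marginally y ~ h

# The empirical variance of y should be close to 1 + sigma^2 = 1.25,
# confirming that y-samples are indeed draws from h.
```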
In most interesting cases, Fredholm integral equations of the first kind are ill-posed and it is necessary to introduce a regularizer to obtain a unique solution. Solving the regularized problem remains computationally very challenging. Standard approaches require discretization of the domain, X, which restricts their application to low-dimensional scenarios (Burger et al., 2019; Tanana et al., 2016; Ma, 2011; Kleefeld, 2011; Koenker and Mizera, 2014; Jose and Rajan, 2017; Qi-Nian, 2000; Yang et al., 2020). They additionally make the assumption that the solution is piecewise constant (Burger et al., 2019; Tanana et al., 2016; Ma, 2011; Koenker and Mizera, 2014; Yang et al., 2020). This is the case for the popular Expectation Maximization Smoothing (EMS) scheme (Silverman et al., 1990), a smoothed version of the infinite dimensional expectation maximization algorithm of Kondor (1983).
In this paper, our contributions are three-fold. First, we provide novel theoretical results on the EMS scheme on continuous spaces, establishing that it admits a fixed point under weak assumptions. Second, we propose a novel particle version of EMS which does not suffer from the limitations of the original scheme. This Monte Carlo algorithm provides an adaptive stochastic discretization of the domain and outputs a sample approximation of f through which a smooth approximation can be obtained via a natural kernel density estimation procedure. Although this algorithm is related to sequential Monte Carlo (SMC) methods which have been widely used to perform inference for complex Bayesian models (Liu and Chen, 1998;Liu, 2001;Del Moral, 2004;Doucet and Johansen, 2011;Douc et al., 2014), standard SMC convergence results do not apply to this scheme so we also provide an original theoretical analysis of the algorithm. Third, we demonstrate this algorithm on both illustrative examples and on two realistic image processing applications.
The rest of this paper is organized as follows. In Section 2, we review Fredholm integral equations of the first kind, the EMS algorithm and establish existence of a fixed point for the continuous version. In Section 3, we introduce a particle approximation of the EMS recursion and provide convergence results for this scheme. We demonstrate the application of the algorithm in Section 4 and then briefly conclude.

Fredholm integral equations of the first kind
We recall that we consider equations of the form (1). We concern ourselves in particular with the case in which (A0a) X ⊂ R^{d_X} and Y ⊂ R^{d_Y} are bounded subsets of Euclidean spaces; and, (A0b) g can be evaluated pointwise and it is feasible to obtain samples from h.
In most applications the space H = X × Y ⊂ R^{d_X} × R^{d_Y} is closed and bounded and (A0a) is satisfied. For instance, in image processing both X and Y are typically of the form [−a, a] × [−b, b] for a, b > 0, f and h are continuous densities on X and Y respectively, and the available data are the values of h over the discretization of Y induced by the pixels of the image (e.g. an image with 10 × 10 pixels induces a discretization on Y in which the intervals [−a, a] and [−b, b] are each divided into 10 bins).
In applications the analytic form of h is often unknown, and the available data arise from discretization of h over Y or from sampling. The first case is considered in, e.g., Vardi and Lee (1993) and extensions to the case of sampling are considered in, e.g., Ma (2011). We focus here on the sampling case. In applications in which only a fixed number of such samples are available, sampling from h can be approximated in a bootstrap-like manner by drawing from the empirical distribution of the available samples.
As observed by Chae et al. (2018, Section 6), considering (1) in the context of probability densities is not too restrictive. A wider class of Fredholm integral equations of the first kind can be recast in this framework: if f, h and g are positive and appropriately integrable functions, then they can be normalized to probability densities and a Markov kernel density satisfying an equation of the form (1). If f and h are bounded below and g is positive, then the shifted functions f(x) + t and h(y) + t ∫_X g(y | x) dx are positive for some appropriate t > 0 and, assuming integrability, we can apply the normalization described above. Finally, if g is not necessarily non-negative, (1) can be transformed into a non-negative integral equation by considering the positive and negative parts of g (Chae et al., 2018, Section 6).
As the set of probability densities on X is not finite, if the kernel g is not degenerate then the resulting integral equation is in general ill-posed (Kress, 2014, Theorem 15.4). Fredholm's alternative (see, e.g., Kress (2014, Corollary 4.18)) gives a criterion to assess the existence of solutions of (1); however, the lack of continuous dependence on h causes the solutions to be unstable and regularization techniques are needed (Kress, 2014;Groetsch, 2007). Common methods are Tikhonov regularization (Tikhonov, 1963) and iterative methods such as those in Landweber (1951); Kondor (1983). See Yuan and Zhang (2019) for a recent review.

Expectation Maximization
From a statistical point of view, (1) describes an indirect density estimation problem: the density f has to be estimated from a set of observations from h. This can in principle be achieved by maximizing an incomplete data likelihood for f through the Expectation Maximization (EM) algorithm (Dempster et al., 1977). Nevertheless, the maximum likelihood estimator is not consistent, as the parameter to be estimated (i.e. f) is infinite dimensional, while only a finite number of observations from h is available (Laird, 1978). Additionally, the ill-posedness of (1) aggravates this problem and even full knowledge of h would in general result in poor approximations of f (Silverman et al., 1990).
We briefly review a number of iterative schemes based on the EM algorithm which aim to find approximate solutions of (1) through regularization. The starting point is the iterative method of Kondor (1983), an infinite dimensional EM algorithm,

f_{n+1}(x) = f_n(x) ∫_Y g(y | x) h(y) / h_n(y) dy, with h_n(y) := ∫_X f_n(z) g(y | z) dz, (2)

which minimizes the Kullback-Leibler divergence

KL(h, h_f) = ∫_Y h(y) log( h(y) / h_f(y) ) dy, with h_f(y) := ∫_X f(x) g(y | x) dx, (3)

with respect to f over the set of probability densities on X (Mülthei et al., 1989). Minimizing (3) is equivalent to maximizing a continuous version of the incomplete data log-likelihood for the function f (Mülthei et al., 1989). This scheme has a number of good properties: Mülthei et al. (1987, Theorem 7) establishes that iterating (2) monotonically decreases (3) and Mülthei et al. (1987, Theorem 8) that if the iterative formula converges to a limit, then this limit is a minimizer of (3). However, the minimizer need not be unique. Convergence of the EM iteration (2) to a fixed point has recently been proved under the existence of a sequence (f_s)_{s≥1} with h_s(y) = ∫_X f_s(x) g(y | x) dx, such that KL(h, h_s) converges to the infimum of (3), together with some additional integrability conditions (Chae et al., 2018).
In general, implementing the recursive formula (2) analytically is not possible and discretization schemes are needed. Under the assumption of piecewise constant densities f, h and g, with the discretization grid fixed in advance, the EM recursion (2) reduces to the EM algorithm for Poisson data (Vardi and Lee, 1993), known as the Richardson-Lucy (RL) algorithm in the image processing field (Richardson, 1972; Lucy, 1974), where the intensities of pixels are modeled as Poisson counts:

f_b^{n+1} = f_b^n Σ_{d=1}^D g_{db} h_d / ( Σ_{b'=1}^B g_{db'} f_{b'}^n ), (4)

where f_b for b = 1, . . . , B and h_d for d = 1, . . . , D are the constant values over the deterministic discretizations of the spaces for f and h respectively, and g_{db} is the corresponding discretization of g. The Iterative Bayes (IB) algorithm of Ma (2011) considers the case in which only samples from h are available. These samples are used to build a kernel density estimator (KDE) for h, which is then plugged into the discretized EM iteration (4).
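A minimal sketch of the discretized EM/RL update (4); the two-bin kernel matrix and target below are toy choices of ours, used only to show that the multiplicative update recovers a recoverable mixing vector:

```python
import numpy as np

def em_step(f, G, h):
    """One discretized EM / Richardson-Lucy step, cf. (4).

    f : (B,) current estimate, nonnegative, sums to 1
    G : (D, B) kernel matrix, G[d, b] ~ g(y_d | x_b); columns sum to 1
    h : (D,) observed density over the y-bins, sums to 1
    """
    h_pred = G @ f                      # current fit h_n at the y-bins
    f_new = f * (G.T @ (h / h_pred))    # multiplicative EM update
    return f_new / f_new.sum()

# Tiny synthetic check (illustrative, not from the paper): a two-bin
# mixing problem with invertible G, so the true f is identifiable.
G = np.array([[0.8, 0.3],
              [0.2, 0.7]])
f_true = np.array([0.25, 0.75])
h = G @ f_true                          # noiseless "observed" density

f = np.array([0.5, 0.5])                # uniform initialization
for _ in range(500):
    f = em_step(f, G, h)
# f is now very close to f_true
```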
As discussed earlier, despite being popular and easy to implement, the EM algorithm (4) has a number of drawbacks. In particular, after a certain number of iterations the EM approximations deteriorate, resulting in unstable, spiky estimates of f that lack smoothness (Silverman et al., 1990; Nychka, 1990). Byrne and Eggermont (2015) emphasize that minimizing (3) does not deal with the ill-posedness of the problem and that regularization is needed.
A natural way to introduce regularization is via maximum penalized likelihood estimation (MPLE), maximizing a penalized version of the log-likelihood,

∫_Y h(y) log h_f(y) dy − P(f),

where P is a penalty term (see, e.g., Green (1990)).
In most cases, an updating formula like (4) cannot be obtained straightforwardly for MPLE because the derivative of P(f) usually involves several derivatives of f. A possible solution is to update the estimate of f from iteration f_n to f_{n+1} evaluating the penalty term at f_n, rather than at the new value f_{n+1}. This is known as the one-step late (OSL) algorithm (Green, 1990). The resulting update formula is usually easier to compute but there is no guarantee that each iteration will increase the penalized log likelihood. However, if convergence occurs, the OSL algorithm converges more quickly than the corresponding EM for the penalized likelihood.

Expectation Maximization Smoothing
An easy-to-implement regularized version of the EM recursion (4) is the EMS algorithm of Silverman et al. (1990), an EM-like algorithm in which a smoothing matrix K is applied to the EM estimates at each iteration:

f_b^{n+1} = Σ_{b'=1}^B K_{bb'} f_{b'}^n Σ_{d=1}^D g_{db'} h_d / ( Σ_{b''=1}^B g_{db''} f_{b''}^n ). (5)

The EMS algorithm has long been attractive from a practical point of view as the addition of the smoothing step to the EM recursion (4) gives good empirical results, with convergence occurring empirically in a relatively small number of iterations (e.g. Silverman et al. (1990); Li et al. (2017, 2020); Becker et al. (1991)). Latham and Anderssen (1992) show that under mild conditions on the smoothing matrix the discretized EMS recursion (5) has a fixed point, while Nychka (1990) shows that with a particular choice of smoothing matrix, the fixed point of (5) minimizes a penalized likelihood with a particular roughness penalty. With this choice of penalty, the OSL and the EMS recursions have the same fixed point (Green, 1990). Fan et al. (2011) establish convergence of (5) to local-EM, an expectation maximization algorithm for maximum local-likelihood estimation, when the smoothing kernel is a symmetric positive convolution kernel with positive bandwidth and bounded support. Under the same assumptions, if the space on which the EMS mapping is defined is bounded, the discrete EMS mapping is globally convergent when the bandwidth is sufficiently large.
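In the same notation as the EM sketch above, the EMS update (5) simply composes the EM step with a row-stochastic smoothing matrix; the grid, kernel matrix and Gaussian smoother below are illustrative choices of ours, not the paper's experimental setup:

```python
import numpy as np

def ems_step(f, G, h, K):
    """One EMS step: multiplicative EM update (4), then smoothing (5)."""
    f_em = f * (G.T @ (h / (G @ f)))    # EM step
    f_new = K @ f_em                    # smoothing step
    return f_new / f_new.sum()

# Illustrative setup: X = Y = [0, 1] with B = D = 50 bins,
# g(y | x) Gaussian with scale 0.1, Gaussian smoothing matrix.
B = 50
x = np.linspace(0.0, 1.0, B)
G = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
G /= G.sum(axis=0, keepdims=True)       # columns: g(. | x_b) as a pmf

K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
K /= K.sum(axis=1, keepdims=True)       # row-stochastic smoother

f_true = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
f_true /= f_true.sum()
h = G @ f_true

f = np.full(B, 1.0 / B)                 # uniform initialization
for _ in range(100):
    f = ems_step(f, G, h, K)
# f is a smooth probability vector peaked near the true mode at 0.3
```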
The focus of this work is a continuous version of the EMS recursion, in which we do not discretize the space and use smoothing convolutions Kf(·) := ∫_X K(u, ·) f(u) du in place of smoothing matrices:

f_{n+1}(x) = ∫_X K(u, x) f_n(u) ∫_Y g(y | u) h(y) / h_n(y) dy du, (6)

where h_n(y) := ∫_X f_n(z) g(y | z) dz.

Properties of the Continuous EMS Recursion
Contrary to the discrete EMS map (5), relatively little is known about the continuous EMS mapping.
We prove here that it also admits a fixed point in the space of probability distributions. This result is obtained under the following assumptions on the kernel g and the smoothing kernel K:

(A1) The positive kernel g(y | x) is continuous, bounded and bounded away from 0: there exist constants 0 < g_min ≤ g_max < ∞ such that g_min ≤ g(y | x) ≤ g_max for all (x, y) ∈ H.

(A2) The density of the smoothing kernel is specified via a continuous bounded density, T, over R^{d_X}, such that inf_{v∈X} ∫_X T(u − v) du > 0, as K(u, v) = T(v − u) / ∫_X T(w − u) dw.

Assumption (A1) is common in the literature on Fredholm integral equations as continuity of g rules out degenerate integral equations which require special treatment (Kress, 2014, Chapter 5). The boundedness condition on g ensures the existence of a minimizer of (3) (Mülthei, 1992, Theorem 1). Assumption (A2) on T is mild and is satisfied by most commonly used kernels for density estimation (Silverman, 1986). The EMS map describes one iteration of this algorithm: for a probability density f, F_EMS(f) = K F_EM(f). It is the composition of linear smoothing by the kernel K defined in (A2) and the non-linear map F_EM corresponding to the EM iteration,

F_EM(f)(x) = f(x) Ḡ_f(x) / f(Ḡ_f), with Ḡ_f(x) := ∫_Y g(y | x) h(y) / h_f(y) dy,

where f(Ḡ_f) is the appropriate normalizing constant (here and elsewhere we adopt the convention that for any suitable integrable function, ϕ, and probability or density, f, f(ϕ) = ∫ f(x)ϕ(x) dx). That is, F_EM corresponds to a simple reweighting of a probability, with the weight being given by Ḡ_f. The existence of the fixed point of F_EMS can be established using results from non-linear functional analysis:

Proposition 1. Under (A1)-(A2), the EMS map, F_EMS, has a fixed point in the space of probability distributions over X.
The proof is detailed in Appendix B.

Particle implementation of the EMS Recursion
In order to make use of the continuous EMS recursion in practice, it is necessary to approximate the integrals which it contains. To do so, we develop a particle method specialized to our context via a stochastic interpretation of the recursion.

Particle methods
Particle methods, also known as Sequential Monte Carlo (SMC) methods, are a class of Monte Carlo methods that sequentially approximate a sequence of target probability densities {η_n(x_{1:n})}_{n≥1} of increasing dimension, whose evolution is described by Markov transition kernels M_n and positive potential functions G_n,

η_n(x_{1:n}) ∝ η_{n−1}(x_{1:n−1}) G_{n−1}(x_{n−1}) M_n(x_n | x_{n−1}), (8)

where each η_n is defined on the product space H^n (Del Moral, 2004, 2013). These sequences of densities naturally arise in state space models (e.g. Liu and Chen (1998); Doucet and Johansen (2011); Li et al. (2016)) and a number of inferential problems can be described by (8) (see, e.g., Liu and Chen (1995), Chopin (2002), Johansen et al. (2008), Kantas et al. (2015), Zhou et al. (2016)). The approximations of η_n for n ≥ 1 are obtained through a population of Monte Carlo samples, called particles. The population consists of a set of N weighted particles {X_n^i, W_n^i}_{i=1}^N which evolve in time according to the dynamic in (8). Given the equally weighted population at time n−1, {X̃_{n−1}^i, 1/N}_{i=1}^N, new particle locations X_n^i are sampled from M_n(· | X̃_{n−1}^i) to obtain the equally weighted population at time n. Then, the fitness of the new particles is measured through G_n, which gives the weights W_n^i. The new particles are then replicated or discarded using a resampling mechanism, giving the equally weighted population at time n, {X̃_n^i, 1/N}_{i=1}^N. Several resampling mechanisms have been considered in the literature (Douc et al. (2014, page 336), Gerber et al. (2019)), the simplest of which consists of sampling the number of copies of each particle from a multinomial distribution with weights {W_n^i}_{i=1}^N (Gordon et al., 1993).
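The generic mutate/weight/resample loop just described can be sketched as follows; the random-walk mutation and Gaussian potential form a toy Feynman-Kac model of our own choosing, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def smc(x0_sampler, mutate, potential, n_steps, N):
    """Generic SMC skeleton: mutate, weight, multinomially resample."""
    x = x0_sampler(N)                       # initial particles
    for _ in range(n_steps):
        x = mutate(x)                       # X_n^i ~ M_n(. | X_{n-1}^i)
        w = potential(x)                    # unnormalized weights G_n(X_n^i)
        w = w / w.sum()                     # normalized weights W_n^i
        x = x[rng.choice(N, size=N, p=w)]   # multinomial resampling
    return x

# Toy model (illustrative): random-walk mutation with a Gaussian
# potential; the particle cloud concentrates near 0.
particles = smc(x0_sampler=lambda N: rng.normal(0.0, 5.0, N),
                mutate=lambda x: x + rng.normal(0.0, 0.5, len(x)),
                potential=lambda x: np.exp(-0.5 * x ** 2),
                n_steps=50, N=5000)
```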
At each n, the empirical distribution of the particle population provides an approximation of the marginal distribution of X_n under η_n via

η_n^N(x) := Σ_{i=1}^N W_n^i δ_{X_n^i}(x).

Throughout, in the interests of brevity, we will abuse notation slightly and treat η_n^N as a density, allowing δ_{x_0}(x) dx to denote a probability concentrated at x_0. These approximations possess various convergence properties (e.g. Del Moral (2004, 2013)), in particular L^p error estimates and a strong law of large numbers for the expectations η_n^N(ϕ) for sufficiently regular test functions ϕ (Crisan and Doucet, 2002; Míguez et al., 2013).

A stochastic interpretation of EMS
The EMS recursion (6) can be modeled as a sequence of densities satisfying (8) by considering an extended state space. Denote by η_n the joint density of (x, y) ∈ H defined over H by η_n(x, y) = f_n(x) h(y), so that f_n(x) = η_{n|X}(x) = ∫_Y η_n(x, y) dy. This density satisfies a recursion similar to that in (6),

η_{n+1}(x, y) = h(y) ∫_H K_{n+1}(u, x) g(v | u) η_n(u, v) / h_n(v) du dv, (9)

where h_n(y) := ∫_X η_{n|X}(z) g(y | z) dz and we allow extra flexibility by letting the smoothing kernel K_{n+1} change at each iteration. With a slight abuse of notation, we also denote by η_n the joint density of (x_{1:n}, y_{1:n}) ∈ H^n obtained by iterative application of (9) with the integrals removed. The following result holds:

Proposition 2. The sequence of densities {η_n}_{n≥1} defined over the product spaces H^n by (8) with

M_n((x_{n−1}, y_{n−1}), (x_n, y_n)) = K_n(x_{n−1}, x_n) h(y_n) (10)

and

G_n(x_n, y_n) = g(y_n | x_n) / ∫_X η_{n|X}(z) g(y_n | z) dz (11)

satisfies, marginally, recursion (9). In particular, the marginal distribution over x_n of η_n,

∫ η_n(x_{1:n}, y_{1:n}) dx_{1:n−1} dy_{1:n} = ∫_Y η_n(x_n, y_n) dy_n,

satisfies the EMS recursion (6) if we make the identifications f_n(x) = η_{n|X}(x) and h_n(y) = ∫_X f_n(x) g(y | x) dx.
To facilitate the theoretical analysis we separate the contribution of the mutation kernels (10) and of the potential functions (11); in particular, we denote the weighted distribution obtained from η_n by

η̂_n(x, y) := G_n(x, y) η_n(x, y) / η_n(G_n), (12)

so that the exact evolution can be written

η_{n+1}(x', y') = ∫_H M_{n+1}((x, y), (x', y')) η̂_n(x, y) dx dy. (13)

A particle method for EMS
Having shown that the EMS recursion describes a sequence of densities satisfying (8), we follow the well-trodden path of employing SMC to approximate this recursion, replacing the true density at each step of the recursion with a sample approximation obtained at the previous iteration, giving rise to Algorithm 1. As written, the algorithm suggests that the entire history of the particle system must be stored; however, as we are interested in approximating only the marginal distributions at time n, η_n, we need only store the particles at times n − 1 and n to be able to compute the weights. Nevertheless, as the EMS recursion (6) aims at finding a fixed point, after a certain number of iterations the approximation of f provided by the SMC scheme stabilizes. We could therefore average over approximations obtained at different iterations to get more stable reconstructions. When the storage cost is prohibitive, a thinned set of iterations could be used.

The resulting SMC scheme is not a standard particle approximation of (8) because of the definition of the potential (11): G_n cannot be computed exactly, because η_{n|X} is not known. The SMC scheme provides an approximation for η_{n|X} at time n. Let us denote by η_{n|X}^N the particle approximation of the marginal η_{n|X} in (12),

η_{n|X}^N(x) := (1/N) Σ_{i=1}^N δ_{X_n^i}(x). (14)

We can approximate the denominator h_n(y_n) = ∫_X η_{n|X}(z) g(y_n | z) dz of (11) by its particle counterpart h_n^N(y_n) := (1/N) Σ_{j=1}^N g(y_n | X_n^j), to obtain approximate potentials

G_n^N(x_n, y_n) = g(y_n | x_n) / h_n^N(y_n). (15)

[Algorithm 1: Particle Method for Fredholm Equations of the First Kind.]

The use of G_n^N within the importance weighting corresponds to an additional approximation which is not found in standard SMC algorithms. In particular, the quantities in (15) are biased estimators of the true potentials (11).
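A compact sketch of this scheme follows, under illustrative choices of ours: X = [0, 1], uniform f_1, a Gaussian smoothing kernel K_n with small scale, Y-draws bootstrapped from the available h-samples, and multinomial resampling performed at every step rather than adaptively. It is a sketch of the method's structure, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_ems(y_data, g_pdf, N, n_steps, eps):
    x = rng.uniform(0.0, 1.0, N)              # X_1^i ~ f_1 (uniform)
    for _ in range(n_steps):
        x = x + rng.normal(0.0, eps, N)       # mutate: X_n^i ~ K_n(X_{n-1}^i, .)
        y = rng.choice(y_data, size=N)        # Y_n^i ~ h (bootstrap)
        # approximate potentials (15): h_n(y) replaced by (1/N) sum_j g(y | X_n^j)
        gm = g_pdf(y[:, None], x[None, :])    # gm[i, j] = g(Y_n^i | X_n^j)
        w = np.diag(gm) / gm.mean(axis=1)
        w /= w.sum()
        x = x[rng.choice(N, size=N, p=w)]     # multinomial resampling
    return x

# Toy deconvolution: f = N(0.3, 0.05^2) and g(y | x) Gaussian with scale
# 0.1, so h = N(0.3, 0.05^2 + 0.1^2); the Gaussian normalizing constant
# cancels in the ratio defining the weights.
g_pdf = lambda y, x: np.exp(-0.5 * ((y - x) / 0.1) ** 2)
y_data = rng.normal(0.3, np.sqrt(0.05 ** 2 + 0.1 ** 2), 5000)
particles = particle_ems(y_data, g_pdf, N=1000, n_steps=30, eps=0.01)
```

The N × N evaluation of g in the denominator makes each iteration O(N²), mirroring the computational cost discussed in Section 3.3.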
As a consequence, it is not possible to use arguments based on extensions of the state space (as in particle filters using unbiased estimates of the potentials (Liu and Chen, 1998;Del Moral et al., 2006;Fearnhead et al., 2008)) to provide theoretical guarantees for this SMC scheme. If G n itself were available then it would be preferable to make use of it; in practice this will never be the case but the idealized algorithm which employs such a strategy is of use for theoretical analysis.
In principle, one could reduce the variance of associated estimators by using a different proposal distribution within Algorithm 1 just as in standard particle methods (see, e.g., Del Moral (2004, Section 2.4.2), Doucet and Johansen (2011, Section 25.4.1)) but this proved unnecessary in all of the examples which we explored as we obtained good performance with this simple generic scheme (the effective sample size was above 70% in all the examples considered).
At time n + 1, we estimate f_{n+1}(x) by computing a kernel density estimate (KDE) of the weighted particle approximation and then applying the EMS smoothing kernel K_{n+1}. This approach may seem counter-intuitive, but the KDE kernel and the EMS kernel fulfil different roles and both are required to obtain a degree of smoothness consistent both with the EMS algorithm (regularizing the ill-posed inverse problem at hand) and with the finite sample size (in the KDE).
We consider standard d_X-dimensional kernels for KDE, u ↦ s_N^{−d_X} S(u / s_N), where s_N is the smoothing bandwidth and S is a bounded symmetric density (Silverman, 1986). To account for the dependence between samples, when computing the bandwidth s_N we use, instead of N, the effective sample size (Kong et al., 1994)

ESS := 1 / Σ_{i=1}^N (W_n^i)². (16)

The resulting estimator,

f_{n+1}^N(x) := ∫_X K_{n+1}(u, x) Σ_{i=1}^N W_n^i s_N^{−d_X} S( (u − X_n^i) / s_N ) du, (17)

satisfies the standard KDE convergence results in L^1 (Devroye and Wagner, 1979) and in L^2 (Silverman, 1986) (see Section 3.4.2). A similar approach has been shown empirically to give good smooth approximations of Fredholm integral equations of the second kind (Doucet et al., 2010). As we have considerable freedom over the choice of both K_{n+1} and S, the integral in (17) can typically be computed analytically.
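A sketch of the ESS-based bandwidth choice and the resulting weighted KDE, assuming a Gaussian S and the Gaussian plug-in (Silverman) bandwidth with N replaced by the ESS (16); the final convolution with K_{n+1} in (17) is omitted from this sketch for brevity:

```python
import numpy as np

def ess(w):
    """Effective sample size (16) of normalized weights."""
    return 1.0 / np.sum(w ** 2)

def weighted_kde(x_query, particles, w):
    """Weighted Gaussian KDE with Silverman's plug-in bandwidth,
    using the ESS in place of N."""
    n_eff = ess(w)
    mu = np.average(particles, weights=w)
    sd = np.sqrt(np.average((particles - mu) ** 2, weights=w))
    s = 1.06 * sd * n_eff ** (-0.2)         # Silverman's rule, N -> ESS
    z = (x_query[:, None] - particles[None, :]) / s
    return (w * np.exp(-0.5 * z ** 2)).sum(axis=1) / (s * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
p = rng.normal(0.0, 1.0, 5000)              # stand-in for the particles
w = np.full(5000, 1.0 / 5000)               # equal weights, so ESS = N
est = weighted_kde(np.linspace(-3, 3, 7), p, w)
# est[3] approximates the standard normal density at 0 (about 0.399)
```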
There are a number of algorithmic parameters which must be specified. The number of particles, N, can be set using the usual approaches for particle methods and the number of iterations by identifying approximate stationarity (see Section 4 and Appendix E.1 for an exploration of this). There are two other parameters, specific to Algorithm 1, which must also be chosen. The initial density, f_1, must be specified, but we did not find performance to be sensitive to this choice (see Appendix E.1). We advocate choosing f_1 to be a diffuse distribution with support intended to include that of f, because the resampling step allows SMC to forget overly diffuse initializations more quickly than overly concentrated ones. For problems with bounded domains, choosing f_1 to be uniform over X is a sensible default choice and the one which we use in Section 4. The degree of smoothness provided by the smoothing kernel K_n is a more important algorithmic parameter for the SMC implementation of EMS. If the expected smoothness of the fixed point of the EMS recursion (6) is known, K_n should be chosen accordingly. If nothing is known about the expected smoothness, the level of smoothing introduced could be picked by cross validation, comparing, e.g., the reconstruction accuracy or smoothness.
We end this section by identifying a further degree of freedom which can be exploited to improve performance. If sampling from h is computationally cheap, a variance reduction can be achieved by averaging over several Y ∼ h(·) when computing the approximated potentials G_n^N. At time n, draw M i.i.d. samples Y_n^{ij} ∼ h for j = 1, . . . , M for each particle i = 1, . . . , N and compute the approximated potentials by averaging over the M replicates Y_n^{ij}.
This incurs an O(MN) computational cost and can be justified by further extending the state space to X × Y^M. Unfortunately, the results on the optimal choice of M obtained for pseudo-marginal methods (e.g. Pitt et al. (2012)) cannot be applied here, as the estimates of G_n given by (15) are not unbiased. We discuss the choice of M in Appendix E.1 through an example. In the examples shown in Section 4 we selected M = N, which results in an O(N²) computational cost, but smaller values of M could be considered (see Appendix E.1). Alternatively, we could store Y_{1:n}^{ij} and average over an increasing number of samples drawn from h (as, e.g., Rubenthaler et al. (2009) advocate in a different context) but this approach is not explored here.
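A sketch of this averaging step, with the Y^{ij} drawn by bootstrap from the available h-samples and an illustrative Gaussian g of ours; note that in this naive vectorization the denominator also sums over all N particles, so the cost here is O(MN²) rather than the O(MN) sample count discussed above:

```python
import numpy as np

rng = np.random.default_rng(4)

def averaged_potentials(x, y_data, g_pdf, M):
    """Estimate the potential at each particle by averaging the
    approximate potentials (15) over M draws Y^{ij} ~ h per particle."""
    N = len(x)
    Y = rng.choice(y_data, size=(N, M))     # Y[i, j] = Y_n^{ij}
    # denominator h_n^N(Y_ij) = (1/N) sum_k g(Y_ij | X_k)
    denom = g_pdf(Y[:, :, None], x[None, None, :]).mean(axis=2)
    num = g_pdf(Y, x[:, None])              # g(Y_ij | X_i)
    return (num / denom).mean(axis=1)       # average over the M replicates

# Illustrative usage on toy data.
g_pdf = lambda y, x: np.exp(-0.5 * ((y - x) / 0.1) ** 2)
x = rng.uniform(0.0, 1.0, 50)
y_data = rng.normal(0.5, 0.15, 1000)
w = averaged_potentials(x, y_data, g_pdf, M=20)
```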

Convergence properties
As the potentials (11) cannot be computed exactly but need to be estimated, the convergence results for standard SMC (e.g., Del Moral (2004, 2013)) do not hold. We present here a strong law of large numbers (SLLN) and L^p error estimates for our particle approximation of the EMS and also provide theoretical guarantees for the estimator (17).

Strong law of large numbers
For simplicity, we only consider multinomial resampling (Gordon et al., 1993). Lower variance resampling schemes can be employed but considerably complicate the theoretical analysis (Douc et al. (2014, page 336), Gerber et al. (2019)). Compared to the SLLN proof for standard SMC methods, we need to analyze here the contribution of the additional approximation introduced by using G N n instead of G n and then combine the results with existing arguments for standard SMC; see, e.g., Míguez et al. (2013).
The SLLN is stated in Corollary 1. This result follows from the L^p inequality in Proposition 3, the proof of which is given in Appendix C.1 and follows the inductive argument of Crisan and Doucet (2002); Míguez et al. (2013). Both results are proved for bounded measurable test functions ϕ, a set we denote B_b(H). The only assumption needed to prove Proposition 3 is (A1). As a consequence of (A1), the potentials G_n and G_n^N are bounded and bounded away from 0 (see Lemma 1 in Appendix C.1), a strong mixing condition that is common in the SMC literature and is satisfied in most of the applications which we have considered.
Proposition 3 (L^p-inequality). Under (A1), for every time n ≥ 1 and every p ≥ 1 there exist finite constants C_{p,n}, C̃_{p,n} such that for every bounded measurable function ϕ ∈ B_b(H),

E[ |η̂_n^N(ϕ) − η̂_n(ϕ)|^p ]^{1/p} ≤ C_{p,n} ||ϕ||_∞ / √N and E[ |η̃_n^N(ϕ) − η̂_n(ϕ)|^p ]^{1/p} ≤ C̃_{p,n} ||ϕ||_∞ / √N,

where η̂_n^N denotes the particle approximation weighted with the approximated potentials G_n^N, η̃_n^N the equally weighted approximation obtained after resampling, and the expectations are taken with respect to the law of all random variables generated within the SMC algorithm.
The first inequality controls the error between the exact evolution (13) at iteration n and the evolution given by the particle population with the approximated potential G_n^N, while the second inequality also takes into account the resampling step at time n. The SLLN is a corollary to the L^p-inequality (Shiryaev, 1996, page 254):

Corollary 1 (Strong law of large numbers). Under (A1), for all n ≥ 1 and for every ϕ ∈ B_b(H), we have η_n^N(ϕ) → η_n(ϕ) almost surely as N → ∞.

A standard approach detailed in Appendix C.2 yields convergence of the particle approximations η_n^N themselves:

Proposition 4. Under (A1), for all n ≥ 1, η_n^N converges weakly to η_n with probability 1, in the sense that the integrals η_n^N(ϕ) converge to η_n(ϕ) for all continuous bounded functions ϕ : H → R (almost surely).
This result is particularly interesting, because it shows that the particle approximations of the distributions themselves obtained with the SMC scheme converge (almost surely in the weak topology) to the sequence in (13), whose marginal over x satisfies the EMS recursion (6).

Convergence of kernel density estimator
Proposition 3 allows us to bound the L^p error of the estimator f_{n+1}^N(x) of f_{n+1}(x).
Corollary 2. Under (A1)-(A2), for every n ≥ 1 and every p ≥ 1, the estimator in (17) satisfies an L^p inequality for fixed x ∈ X, with a finite constant D depending on n, p and the choice of KDE.
Corollary 2 implies convergence in L^1 and of the mean integrated square error (MISE) for f_{n+1}^N.

Corollary 3. Under (A1)-(A2), f_{n+1}^N converges almost surely to f_{n+1} in L^1 for every n ≥ 1, and the MISE, E[ ∫_X ( f_{n+1}^N(x) − f_{n+1}(x) )² dx ], converges to 0 as N → ∞.

Proof. Similarly to Theorem 4.4 in Crisan and Míguez (2014), convergence of the MISE to 0 is obtained by making use of the fact that the space X is bounded (A0a), as established in Appendix D.2.

Examples
This section shows the results obtained using the SMC implementation of the recursive formula (6) on some common examples. A one-dimensional toy example in which the analytic form of both f and h is known is also investigated in Appendix E.1. In this section we consider a simple density estimation problem and two realistic examples of image restoration problems, motion deblurring and positron emission tomography (Hohage and Werner, 2016). In the first example, the analytic form of h is known and is used to implement the discretized EM and EMS. IB and SMC are implemented using samples drawn from h. In the case of motion deblurring, the RL algorithm (i.e. EM for Poisson counts) is implemented by considering the data image as a discretization of the unknown density h into bins. The same image is used to draw the samples necessary for the SMC implementation.
A number of parameters have to be set in order to run the EM, EMS and IB algorithms and the SMC implementation of EMS. For all algorithms we need to specify an initial density f_1 and the number of iterations n. Unless otherwise stated, the number of iterations is n = 100 (we observed that convergence to a fixed point occurs in a smaller number of steps for all algorithms; see Appendix E.1) and the initial distribution f_1 is uniform over X. For the smoothing kernel K, we use isotropic Gaussian kernels with marginal variance ε². The bandwidth s_N is the plug-in optimal bandwidth for Gaussian distributions, where the effective sample size (16) is used instead of the sample size N (Silverman, 1986, page 45). The deterministic discretization of EM and EMS ((4) and (5) respectively) is obtained by considering B equally spaced bins for X and D for Y. The number of bins, as well as the number of particles, N, for SMC varies depending on the example considered. In the first example, the choice of D, B and N is motivated by a comparison of error and runtime. For the image restoration problems, D and B are the numbers of pixels in each image. The number of particles N is chosen to achieve a good trade-off between reconstruction accuracy and runtime.
For the SMC implementation, we use the adaptive multinomial resampling scheme described in Liu (2001, page 35). At each iteration the effective sample size (16) is evaluated and multinomial resampling is performed if ESS < N/2. This choice is motivated by the fact that up to adaptivity (which we anticipate could be addressed by the approach of Del Moral et al. (2012)) this is the setting considered in the theoretical analysis of Section 3.4 and we observed only modest improvements when using lower variance resampling schemes (e.g. residual resampling, see Liu (2001)) instead of multinomial resampling.
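The ESS-triggered rule just described can be sketched as follows (a minimal sketch: multinomial resampling fires only when the ESS (16) drops below N/2):

```python
import numpy as np

rng = np.random.default_rng(5)

def adaptive_resample(x, w, threshold=0.5):
    """Multinomial resampling, triggered only when ESS < threshold * N."""
    N = len(w)
    if 1.0 / np.sum(w ** 2) < threshold * N:   # ESS (16) below N/2
        x = x[rng.choice(N, size=N, p=w)]
        w = np.full(N, 1.0 / N)                # equal weights after resampling
    return x, w

x = np.arange(100, dtype=float)
w_flat = np.full(100, 0.01)                    # ESS = 100: no resampling
x1, _ = adaptive_resample(x, w_flat)
w_deg = np.zeros(100); w_deg[7] = 1.0          # ESS = 1: resampling fires
x2, _ = adaptive_resample(x, w_deg)
```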
The accuracy of the reconstructions is measured through the integrated square error ISE(f) = ∫_X (f̂^N_{n+1}(x) − f(x))² dx. (23)
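Numerically, the ISE is approximated on a grid; a minimal sketch (rectangle rule on a uniform grid, an implementation choice assumed here):

```python
import numpy as np

def ise(f_hat, f, grid):
    """Integrated square error on a uniform grid (rectangle rule)."""
    dx = grid[1] - grid[0]
    return np.sum((f_hat(grid) - f(grid)) ** 2) * dx
```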

Indirect density estimation
The first example is the Gaussian mixture model used in Ma (2011) to compare the Iterative Bayes (IB) algorithm with EM. Take X = Y = R; although f is supported on the whole real line, almost all of its probability mass lies in [0, 1]. The initial distribution f_1 is Uniform on [0, 1] and the bins for the discretized EMS are B equally spaced intervals in [0, 1], noting that discretization schemes essentially require known compact support and this interval contains almost all of the probability mass. First, we analyze the influence of the number of bins B and of the number of particles N on the integrated square error of the reconstructions and on the runtimes of the deterministic discretization of EMS (5) and of the SMC implementation of EMS (Figure 1). Despite having higher runtimes for fixed numbers of bins/particles, the SMC implementation gives better results in terms of ISE(f) for any particle size and, indeed, for given computational cost: finer discretizations for EMS do not have a significant effect on ISE(f).
Next, we compare the reconstructions provided by the proposed SMC scheme with those given by deterministic discretization of the EM iteration (4), deterministic discretization of the EMS iteration (5) and deterministic discretization of the EM iteration when only samples from h are available (IB).
Having observed a small decrease in ISE(f) for large B, we fix the number of bins B = D = 100. For the SMC scheme, we compare N = 500, N = 1,000 and N = 5,000. We discard N = 10,000, as it shows little improvement in ISE(f) with respect to N = 5,000, and N = 100, because of its higher ISE(f). We draw M = 1,000 samples from h; this sample is used to build a kernel density estimator of h for the IB algorithm, and points (re)sampled from it provide the samples used in every step 2 of Algorithm 1.
We set ε = 10^{−3} and we compare the smoothing matrix obtained by discretization of the Gaussian kernel (EMS (K)) with the three-point smoothing proposed in Silverman et al. (1990, Section 3.2.2) (EMS (3-point)). The reconstruction process is repeated 1,000 times and the reconstructions are compared by computing their means and variances, the integrated squared error (23) and the Kullback–Leibler divergence between h and the reconstruction of h obtained by convolution of f̂^N_{n+1} with g (Table 1). To characterize the roughness of f̂^N_{n+1}, we evaluate both f̂^N_{n+1} and f at the 100 bin centers x_c and for each bin center we approximate (with 1,000 replicates) the mean squared error (MSE) (24); Table 1 shows the 95th percentile w.r.t. the 100 bin centers x_c. The discretized EM (4) gives the best results in terms of Kullback–Leibler divergence (restricting to the [0, 1] interval and computing by numerical integration). This is not surprising: IB is an approximation of EM when the analytic form of h is not known, and the EMS algorithms (both those with the deterministic discretization (5) and those with the stochastic one given by the SMC scheme) do not seek to minimize the KL divergence, but to provide a more regular solution. However, the solutions recovered by EM have considerably higher ISE than those given by the other algorithms and are considerably worse at recovering the smoothness of the solution. The reconstructions given by IB are also not smooth; this is not surprising, as no smoothing step is present in the IB algorithm.
SMC is generally better at recovering the mean of the solution µ = 0.43333, the global shape of the solution (its ISE is at least two times smaller than those of EM and EMS (K) and about half those of EMS (3-point) and IB) and the smoothness of the solution (the 95th percentile of MSE(x_c) is at least two times smaller). For the discretized EMS (5) and the SMC implementation the estimates of the variance are higher; this is a consequence of the addition of the smoothing step and can be controlled by selecting smaller values of ε.
IB and SMC give similar values for the KL divergence. The slight increase observed for the SMC scheme with N = 5,000 is most likely due to the high sensitivity of this divergence to tail behavior. Indeed, we observed that when the bandwidth at the last time step is taken to be independent of N, the value of the KL divergence is the same for all SMC schemes.

Motion deblurring
Consider a simple example of motion deblurring where the observed picture h is obtained while the object of interest is moving with constant speed b in the horizontal direction (Vardi and Lee, 1993; Lee and Vardi, 1994). The constant motion in the horizontal direction is modeled by multiplying the density of a uniform random variable on [−b/2, b/2], describing the motion in the horizontal direction, and a Gaussian, N(v; y, σ²), with small variance, σ² = 0.02², describing the relative lack of motion in the vertical direction. We obtain the corrupted image in Figure 2b from the reference image in Figure 2a using the model above with constant speed b = 128 pixels and adding multiplicative noise as in Lee and Vardi (1994, Section 6.2). Figure 2b is a noisy discretization of the unknown h(u, v) on a 300 × 600 grid. The addition of multiplicative noise makes the model (1) misspecified, but still suitable to describe the deconvolution problem when the amount of noise is low. For higher levels of noise, the noise itself should be taken into account when modeling the generation of the data corresponding to h.
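A sketch of this forward (blurring) model, treating each image point as a particle (illustrative; the mixed scaling of b in pixels versus the vertical σ follows the text):

```python
import numpy as np

def blur_sample(points, b=128.0, sigma=0.02, rng=None):
    """Corrupt points (x, y): horizontal uniform motion of width b plus
    a small Gaussian perturbation in the vertical direction."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, y = points[:, 0], points[:, 1]
    u = x + rng.uniform(-b / 2, b / 2, size=len(x))   # constant-speed motion
    v = y + rng.normal(0.0, sigma, size=len(y))       # near-static vertical
    return np.column_stack([u, v])
```

Binning the output (u, v) pairs on a 300 × 600 grid yields a discretized noisy h as in Figure 2b (multiplicative noise not included in this sketch).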
We compare the reconstruction obtained using the SMC scheme with that given by the deconvlucy function in MATLAB (The MathWorks Inc., 1993), an efficient implementation of the Richardson–Lucy (RL) algorithm for image processing.
The smoothing parameter is ε = 10 −3 , and the number of particles is N = 5, 000. These values are chosen to achieve a trade-off between smoothing and accuracy of the reconstruction and to keep the runtime under three minutes on a standard laptop.
The distance between the reconstructions and the original image is evaluated using both the ISE (23) and the match distance, i.e. the L 1 norm of the cumulative histogram of the image, a special case of the Earth Mover's Distance for gray-scale images (Rubner et al., 2000). SMC gives visibly smoother images and is better at recovering the shape of the original image (ISE(f ) is 1.4617 for SMC and 2.0863 for RL). In contrast, the RL algorithm performs better in terms of match distance (0.0054 for RL and 0.0346 for SMC).
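The match distance can be computed as the L_1 distance between cumulative gray-level histograms; a sketch (the number of bins and the [0, 1] gray-level range are illustrative choices):

```python
import numpy as np

def match_distance(img_a, img_b, bins=256):
    """L1 distance between cumulative gray-level histograms,
    a special case of the Earth Mover's Distance for gray-scale images."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0.0, 1.0))
    hb, _ = np.histogram(img_b, bins=bins, range=(0.0, 1.0))
    ca = np.cumsum(ha / ha.sum())
    cb = np.cumsum(hb / hb.sum())
    return np.abs(ca - cb).sum()
```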

Positron emission tomography
Positron Emission Tomography (PET) is a medical imaging technique used to analyze internal biological processes from radial projections collected outside the body, in order to detect medical conditions such as schizophrenia, cancer, Alzheimer's disease and coronary artery disease (Phelps, 2000). The distribution of the radial projections h(φ, ξ) is defined on [0, 2π] × [−R, R] for R > 0 and is linked to the cross-section image of the organ of interest, f(x, y), defined on the 2D square, through the kernel g describing the geometry of the PET scanner. The Markov kernel g(φ, ξ | x, y) gives the probability that the projection onto (φ, ξ) corresponds to the point (x, y) (Vardi et al., 1985) and is modeled as a Normal distribution centered at 0 with small variance (e.g. σ² = 0.02²) to mimic the alignment between projections and recovered emission.
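A sketch of sampling projections from such a kernel, assuming the standard line-projection geometry ξ = x cos φ + y sin φ with Gaussian detection noise (an illustrative assumption; the scanner geometry used in the experiments may differ):

```python
import numpy as np

def pet_project(points, sigma=0.02, rng=None):
    """Sample radial projections (phi, xi) for emission points (x, y),
    with xi = x cos(phi) + y sin(phi) plus Gaussian detection noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    phi = rng.uniform(0.0, 2 * np.pi, size=len(points))
    xi = points[:, 0] * np.cos(phi) + points[:, 1] * np.sin(phi)
    xi += rng.normal(0.0, sigma, size=len(points))
    return np.column_stack([phi, xi])
```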
The data used in this work are obtained from the reference image in the final panel of Figure 3, a simplified imitation of the brain's metabolic activity (e.g. Shepp and Vardi (1982)). The collected data are the values of h at 128 evenly spaced projections over 360° and 185 values of ξ in [−92, 92], to which Poisson noise is added. Figure 3 shows the reconstructions obtained with the SMC scheme. As observed earlier, convergence to a fixed point occurs in less than 100 iterations. The smoothing parameter is set to ε = 10^{−3} and the number of particles is N = 20,000. Figure 10 in the supplementary material shows the relative error and ISE for the reconstructions in Figure 3. The ISE between the original image and the reconstructions at iterations 50 to 100 stabilizes below 0.08.

Conclusions
We have proposed a novel particle algorithm to solve a wide class of Fredholm equations of the first kind. This algorithm has been obtained by identifying a close connection between the continuous EMS recursion and the dynamics (8). It performs a stochastic discretization of the EMS recursion and can be naturally implemented when only samples from the distorted signal h are available. Additionally it does not require the assumption of piecewise constant solutions common to deterministic discretization schemes.
Having established that the continuous EMS recursion admits a fixed point, we have studied the asymptotic properties of the proposed particle scheme, showing that the empirical measures obtained by this scheme converge almost surely in the weak topology to the sequence of densities given by the EMS recursion as the number of particles N goes to infinity. This result is a consequence of the L_p convergence of expectations and of a strong law of large numbers which we extended to the particle scheme under study. We have also provided theoretical guarantees on the proposed estimator for the solution f of the Fredholm integral equation. This algorithm outperforms the state of the art in this area in several examples.

Supplementary Material
The supplementary material contains the analysis of the EMS map, proofs of all results, an additional example and additional results for the PET example. MATLAB code to reproduce all examples is available online.¹

A Notation
For the convenience of the reader, we summarize the notation adopted in the following arguments. A slightly more technical presentation is adopted than within the main manuscript as a little care is required in order to obtain rigorous results. We work on a probability space (Ω, A, P) rich enough to allow the definition of the particle system introduced in Section 3 for all N ∈ N. All expectations and probabilities which are not explicitly associated with some other measure are taken with respect to P.
For any H ⊆ R d , we consider the Borel σ-algebra B(H), and we endow any product space with the product Borel σ-algebra.
Let the Banach space of real-valued bounded measurable functions on H, endowed with the supremum norm ‖ϕ‖_∞ = sup_{u∈H} |ϕ(u)|, be denoted by B_b(H), and let B_1(H) denote the subset of B_b(H) of functions with ‖ϕ‖_∞ ≤ 1.
Let M(H) be the Banach space of signed finite measures on (H, B(H)) endowed with the total variation (TV) norm ‖µ‖_TV = sup_{ϕ∈B_1(H)} |µ(ϕ)|. (25) For each ω ∈ Ω, we obtain a realization of the particle system with N particles at time n and a corresponding random measure, denoted by η^N_n : ω ∈ Ω → η^N_n(ω) ∈ P(H).

B Existence of the Fixed Point
Let us formally define the EMS map as a map from the set of positive finite measures to the set of probability measures, F_EMS : M_+(X) → P(X), such that F_EMS(η)(A) = ∫_A ∫_X K(x′, x) ∫_Y [g(y | x′) h(y)/η(g(y | ·))] dy η(dx′) dx for A ∈ B(X), and the EM map, F_EM : M_+(X) → P(X), as in (7), slightly more formally as F_EM(η)(A) = ∫_A ∫_Y [g(y | x) h(y)/η(g(y | ·))] dy η(dx) for A ∈ B(X). We introduce the smoothing operator, K : P(X) → P(X), corresponding to the smoothing kernel in (A2), and observe that F_EMS η = K(F_EM(η)) = (F_EM η)K.
In order to prove that the EMS map admits a fixed point, a number of properties of the EM map, of the smoothing operator K and of the EMS map itself must be established. We show that F_EMS is a compact operator on M_+(X) (Corollary 4): to do so, we show that F_EM is continuous and bounded (Proposition 5), and then we prove that K is compact (Proposition 6). Compactness is needed to prove the existence of a fixed point.

B.1 Properties of the Continuous EMS Map
Proposition 5. Under (A1), the EM map F_EM in (26) is a continuous and bounded operator on (M_+(X), ‖·‖_TV).
Proof. First we prove continuity on M_+(X) with the TV norm (25). We make extensive use of Fubini's theorem, whose applicability is granted by the boundedness of g, which implies the boundedness of h. Let η ∈ M_+(X) and {η_n}_{n≥1} be a sequence of measures in M_+(X) with ‖η_n − η‖_TV → 0 as n → ∞. For any ϕ ∈ B_1(X) consider the decomposition

F_EM η_n(ϕ) − F_EM η(ϕ) = ∫_X ϕ(x) ∫_Y g(y | x) h(y) [1/η_n(g(y | ·)) − 1/η(g(y | ·))] dy η_n(dx) + (η_n − η)(Φ_η), where Φ_η(x) := ϕ(x) ∫_Y g(y | x) h(y)/η(g(y | ·)) dy.

For fixed y, g(y | ·) ∈ B_b(X), thus |η(g(y | ·)) − η_n(g(y | ·))| ≤ m_g ‖η_n − η‖_TV; using η(g(y | ·)) ≥ ‖η‖_TV/m_g, the first term is bounded by (m_g⁴ ‖ϕ‖_∞/‖η‖_TV) ‖η_n − η‖_TV. The function Φ_η is measurable, since g, h are continuous, ϕ is measurable and the continuity of y ↦ η(g(y | ·)) follows from the continuity of g and the Dominated Convergence Theorem; it is bounded by m_g² ‖ϕ‖_∞/‖η‖_TV. Thus the second term is bounded by (m_g² ‖ϕ‖_∞/‖η‖_TV) ‖η_n − η‖_TV. The two bounds show that if ‖η_n − η‖_TV → 0 then ‖F_EM η_n − F_EM η‖_TV → 0, proving that the EM map (26) is continuous on M_+(X).
Finally, consider boundedness. A non-linear operator is bounded if and only if it maps bounded sets into bounded sets (e.g. Zeidler (1985, page 757)). The EM map F EM maps the space of positive finite measures M + (X) into the space of probability measures P(X), whose elements have TV norm uniformly bounded by 1; in particular F EM maps any bounded subset of M + (X) into a uniformly bounded subset of P(X), showing that F EM is a bounded operator.
Proposition 6. The smoothing operator K defined in (27) is compact on (P(X), ‖·‖_TV).
Proof. To prove that K is compact we need to prove that it is continuous and maps bounded subsets into relatively compact subsets.
To prove continuity we just observe that K in (27) is a bounded linear operator: since K : P(X) → P(X) and K η = ηK with ‖ηK‖_TV = ‖η‖_TV = 1, we have that the operator norm ‖K‖ = 1 and K is continuous.
To prove that K maps bounded subsets into relatively compact subsets we use the Arzelà–Ascoli theorem. The diameter of the image of K is uniformly bounded by 1, as ηK ∈ P(X) implies ‖ηK‖_TV = 1 for all η ∈ P(X). To show that K is equicontinuous, for any η ∈ P(X) take η′ ∈ P(X), so that η − η′ ∈ M_0(X). We need to prove that for every ε > 0 there is a δ(ε) > 0, independent of η, η′, such that ‖η − η′‖_TV < δ(ε) implies ‖ηK − η′K‖_TV < ε. (28) As K is a Markov kernel, its Dobrushin coefficient is at most 1 (Dobrushin, 1956): ‖ηK − η′K‖_TV ≤ ‖η − η′‖_TV, and (28) holds with δ(ε) = ε. Equicontinuity and uniform boundedness give the relative compactness of {ηK : η ∈ P(X)} by the Arzelà–Ascoli theorem (e.g. Zeidler (1985, page 772)), proving that K is a compact operator.
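The Dobrushin bound used above can be checked numerically on a finite state space, where a Markov kernel is a row-stochastic matrix:

```python
import numpy as np

# On a finite state space a Markov kernel is a row-stochastic matrix K,
# and ||mu K - nu K||_TV <= ||mu - nu||_TV (Dobrushin coefficient <= 1).
rng = np.random.default_rng(1)
K = rng.random((5, 5))
K /= K.sum(axis=1, keepdims=True)                  # make rows sum to 1
mu = rng.random(5); mu /= mu.sum()
nu = rng.random(5); nu /= nu.sum()
tv = lambda a, b: 0.5 * np.abs(a - b).sum()        # total variation distance
assert tv(mu @ K, nu @ K) <= tv(mu, nu) + 1e-12
```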
Corollary 4 (Compactness of F EMS ). The EMS map F EMS is compact on (M + (X), · T V ).
Proof. The EMS map is the composition of the continuous and bounded operator F EM (by Proposition 5) which maps bounded sets into bounded sets with the compact smoothing operator K (by Proposition 6) which maps bounded sets into relatively compact sets. It follows that F EMS is continuous and maps bounded sets into relatively compact sets, hence F EMS is compact (e.g. Zeidler (1985, page 54)).

B.2 Proof of Proposition 1
Proof. Let us denote by s : M(X) → R the map s : µ ↦ µ(X). This map is continuous: |s(µ) − s(ν)| = |(µ − ν)(X)| ≤ ‖µ − ν‖_TV. The set of probability measures P(X) ⊂ M(X) is the intersection of the set {µ : s(µ) = 1} with the set of non-negative measures {µ : ‖µ‖_TV = s(µ)}. As both sets are closed, P(X) is closed too. Moreover, P(X) is non-empty, bounded (since all of its elements have TV norm equal to 1) and convex: take µ, ν ∈ P(X) and t ∈ [0, 1]; then tµ + (1 − t)ν is a non-negative measure with total mass ts(µ) + (1 − t)s(ν) = 1, as t, 1 − t ≥ 0 and P(X) ⊂ M_+(X), so tµ + (1 − t)ν ∈ P(X).
These properties and the compactness of the EMS map (Corollary 4) give the existence of a fixed point by Schauder's fixed point theorem; see, e.g., Zeidler (1985, Theorem 2.A).

C Convergence of the SMC Approximation
The theoretical characterization of the particle method approximating the EMS recursion is carried out by decomposing Algorithm 1 into three steps: mutation, reweighting and resampling. This decomposition is standard in the study of SMC algorithms (Crisan and Doucet, 2002; Chopin, 2004; Míguez et al., 2013) and allows us to examine the novelty of the particle approximation introduced in Section 3 by directly considering the contribution to the overall approximation error of the use of approximate weights G^N_n. First, consider the following decomposition of the dynamics in (8) with potentials (11) and Markov kernels (10). In the selection step, the current state is weighted according to the potential function G_n: η̂_n(x_{1:n}, y_{1:n}) ≡ Ψ_{G_n}(η_n)(x_{1:n}, y_{1:n}) = G_n(x_n, y_n) η_n(x_{1:n}, y_{1:n})/η_n(G_n); then, in the mutation step, a new state is proposed according to M_{n+1}: η_{n+1}(x_{1:n+1}, y_{1:n+1}) ∝ η̂_n(x_{1:n}, y_{1:n}) M_{n+1}(x_{n+1} | x_n).
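One selection/mutation cycle, with the potential approximated from samples of h as in the algorithm, can be sketched as follows (schematic NumPy code; the kernel `g`, the mutation step and all tuning choices are illustrative, not the paper's released implementation):

```python
import numpy as np

def smc_step(x, y_samples, g, mutate, rng):
    """One selection/mutation cycle: weight each particle by an
    approximate potential built from samples of h, resample, mutate."""
    n = len(x)
    gmat = g(y_samples[:, None], x[None, :])       # g(y_j | x_i), shape (M, N)
    h_hat = gmat.mean(axis=1)                      # particle estimate of h_n(y_j)
    w = (gmat / h_hat[:, None]).mean(axis=0)       # approximate potential per particle
    idx = rng.choice(n, size=n, p=w / w.sum())     # multinomial resampling
    return mutate(x[idx], rng)                     # mutation (smoothing) step
```

Here `g` is a vectorized likelihood and `mutate` implements the smoothing move, e.g. a Gaussian jitter with standard deviation given by the bandwidth.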
Each step of the evolution above is then compared to its particle approximation counterpart: the weighted distribution Ψ_{G_n}(η^N_n) is compared with Ψ_{G_n}(η_n), the resampled distribution η̂^N_n is compared with η̂_n and finally η^N_n is compared with η_n. The proof of the L_p-inequality in Proposition 3 follows the inductive approach of Crisan and Doucet (2002) and Míguez et al. (2013) and consists of four Lemmata. Lemmata 2, 4 and 5 are due to Crisan and Doucet (2002) and Míguez et al. (2013) and establish L_p-error estimates for the reweighting step performed with the exact potential G_n (exact reweighting), the multinomial resampling step and the mutation step. Lemma 3 compares the exact reweighting with the reweighting obtained by using the approximated potentials G^N_n and is the main element of novelty in the proof.
In the following we commit the usual abuse of notation and we denote by η n both a measure and its density with respect to the Lebesgue measure.

C.1 Proof of Proposition 3
Before proceeding to the proof of Proposition 3 we introduce the following auxiliary lemma giving some properties of the approximated potentials G^N_n.

Lemma 1. Under (A1), the approximated and exact potentials are positive functions, bounded and bounded away from 0. Moreover, we have the following decomposition, for fixed (x, y) ∈ H:

G^N_n(x, y) − G_n(x, y) = G_n(x, y) [η_n|_X(g(y | ·)) − η^N_n|_X(g(y | ·))]/η^N_n|_X(g(y | ·)) = G^N_n(x, y) [η_n|_X(g(y | ·)) − η^N_n|_X(g(y | ·))]/η_n|_X(g(y | ·)).
Proof. The boundedness of G_n and G^N_n follows from definitions (11) and (15) and the boundedness of g. The second assertion is proved by dividing and multiplying G^N_n − G_n by η^N_n|_X(g(y | ·)) and η_n|_X(g(y | ·)), respectively.
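The decomposition in Lemma 1 is a simple algebraic identity, which can be checked numerically with scalar surrogates for g, η_n|_X(g(y | ·)) and η^N_n|_X(g(y | ·)):

```python
import numpy as np

# Numerical check of the decomposition in Lemma 1: with G_n = g/a and
# G^N_n = g/b, where a stands for eta_n|_X(g(y|.)) and b for eta^N_n|_X(g(y|.)),
# G^N_n - G_n = G_n (a - b)/b = G^N_n (a - b)/a.
rng = np.random.default_rng(0)
g, a, b = rng.uniform(0.5, 2.0, size=(3, 100))
G, G_N = g / a, g / b
assert np.allclose(G_N - G, G * (a - b) / b)
assert np.allclose(G_N - G, G_N * (a - b) / a)
```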
Lemma 2 (Exact reweighting). Assume that for any ϕ ∈ B_b(H) and some finite constant C_{p,n}, E[|η^N_n(ϕ) − η_n(ϕ)|^p]^{1/p} ≤ C_{p,n} ‖ϕ‖_∞/N^{1/2}. Then, after the reweighting step performed with the exact potential G_n, E[|Ψ_{G_n}(η^N_n)(ϕ) − Ψ_{G_n}(η_n)(ϕ)|^p]^{1/p} ≤ C̃_{p,n} ‖ϕ‖_∞/N^{1/2} for any ϕ ∈ B_b(H) and some finite constant C̃_{p,n}.

Proof. The proof follows that of Crisan and Doucet (2002, Lemma 4) by exploiting the boundedness of G_n (Lemma 1).
Lemma 3 (Approximate reweighting). Assume that for any ϕ ∈ B_b(H) and some finite constant C_{p,n}, E[|η^N_n(ϕ) − η_n(ϕ)|^p]^{1/p} ≤ C_{p,n} ‖ϕ‖_∞/N^{1/2}. Then, E[|Ψ_{G^N_n}(η^N_n)(ϕ) − Ψ_{G_n}(η^N_n)(ϕ)|^p]^{1/p} ≤ C̄_{p,n} ‖ϕ‖_∞/N^{1/2} for any ϕ ∈ B_b(H) and for some finite constant C̄_{p,n}.
Proof. Apply the definitions of Ψ_{G_n} and Ψ_{G^N_n} and consider the following decomposition:

Ψ_{G^N_n}(η^N_n)(ϕ) − Ψ_{G_n}(η^N_n)(ϕ) = [η^N_n(G^N_n ϕ) − η^N_n(G_n ϕ)]/η^N_n(G^N_n) + Ψ_{G_n}(η^N_n)(ϕ) [η^N_n(G_n) − η^N_n(G^N_n)]/η^N_n(G^N_n).

Then, for the first term, since η^N_n(G^N_n) ≥ m_g^{−2} by Lemma 1, |η^N_n(G^N_n ϕ) − η^N_n(G_n ϕ)|/η^N_n(G^N_n) ≤ m_g² ‖ϕ‖_∞ η^N_n(|G^N_n − G_n|). For the second term, |Ψ_{G_n}(η^N_n)(ϕ)| ≤ ‖ϕ‖_∞ and |η^N_n(G_n) − η^N_n(G^N_n)| ≤ η^N_n(|G^N_n − G_n|), so the same bound applies. Hence,

|Ψ_{G^N_n}(η^N_n)(ϕ) − Ψ_{G_n}(η^N_n)(ϕ)| ≤ 2 m_g² ‖ϕ‖_∞ η^N_n(|G^N_n − G_n|).

By applying Minkowski's inequality and the decomposition of the potentials in Lemma 1, for fixed (x, y),

|G^N_n(x, y) − G_n(x, y)| ≤ m_g³ |η_n|_X(g(y | ·)) − η^N_n|_X(g(y | ·))|.

Then, consider S^N_n := σ(Y^i_n : i ∈ {1, …, N}), the σ-field generated by all the Y^i_n at time n. By construction, the evolution of X^i_n for i = 1, …, N is independent of S^N_n (this is due to the definition of the mutation kernel (10)). Conditionally on S^N_n, the Y^i_n are fixed for i = 1, …, N, and we can use the fact that the integrals of functions from X to R with respect to η_n and η_n|_X coincide, as do their integrals with respect to η^N_n and η^N_n|_X; thus, for fixed y,

E[|η_n|_X(g(y | ·)) − η^N_n|_X(g(y | ·))|^p | S^N_n]^{1/p} ≤ C_{p,n} ‖g(y | ·)‖_∞/N^{1/2} ≤ C_{p,n} m_g/N^{1/2},

where the first inequality follows from the hypothesis since g(y | ·) is a bounded and measurable function for all fixed y ∈ Y. Hence, since Y^i_n is S^N_n-measurable and independent of η^N_n|_X, we have

E[|Ψ_{G^N_n}(η^N_n)(ϕ) − Ψ_{G_n}(η^N_n)(ϕ)|^p]^{1/p} ≤ 2 C_{p,n} m_g⁶ ‖ϕ‖_∞/N^{1/2}.

Therefore, the result holds with the constant C̄_{p,n} = 2 C_{p,n} m_g⁶.
Lemma 4 (Multinomial resampling). Assume that for any ϕ ∈ B_b(H) and some finite constant Ĉ_{p,n}, E[|Ψ_{G^N_n}(η^N_n)(ϕ) − η̂_n(ϕ)|^p]^{1/p} ≤ Ĉ_{p,n} ‖ϕ‖_∞/N^{1/2}. Then, after the resampling step performed through multinomial resampling, E[|η̂^N_n(ϕ) − η̂_n(ϕ)|^p]^{1/p} ≤ C̆_{p,n} ‖ϕ‖_∞/N^{1/2} for any ϕ ∈ B_b(H) and for some finite constant C̆_{p,n}.
Proof. The proof follows that of Crisan and Doucet (2002, Lemma 5) using the Marcinkiewicz-Zygmund type inequality in Del Moral (2004, Lemma 7.3.3) and the hypothesis.
Lemma 5 (Mutation). Assume that for any ϕ ∈ B_b(H) and some finite constant C̆_{p,n}, E[|η̂^N_n(ϕ) − η̂_n(ϕ)|^p]^{1/p} ≤ C̆_{p,n} ‖ϕ‖_∞/N^{1/2}. Then, after the mutation step, E[|η^N_{n+1}(ϕ) − η_{n+1}(ϕ)|^p]^{1/p} ≤ C_{p,n+1} ‖ϕ‖_∞/N^{1/2} for any ϕ ∈ B_b(H) and for some finite constant C_{p,n+1}.
Proof. The proof follows that of Crisan and Doucet (2002, Lemma 3), where after applying Minkowski's inequality we can bound the first term with the Marcinkiewicz-Zygmund type inequality in Del Moral (2004, Lemma 7.3.3) and the second term with the hypothesis.
The proof of the L_p-inequality in Proposition 3 is based on an inductive argument which uses Lemmata 2–5.

Proof of Proposition 3. At time n = 1, the particles (X^i_1, Y^i_1), i = 1, …, N, are i.i.d. samples from η_1, so E[|η^N_1(ϕ) − η_1(ϕ)|^p]^{1/p} ≤ C_{p,1} ‖ϕ‖_∞/N^{1/2} for every ϕ ∈ B_b(H) and some finite constant C_{p,1}. Then, assume that the result holds at time n: for every ϕ ∈ B_b(H) and some finite constant C_{p,n}, E[|η^N_n(ϕ) − η_n(ϕ)|^p]^{1/p} ≤ C_{p,n} ‖ϕ‖_∞/N^{1/2}. The L_p-inequality in (18) is obtained by combining the results of Lemma 2 and Lemma 3 using Minkowski's inequality: E[|Ψ_{G^N_n}(η^N_n)(ϕ) − η̂_n(ϕ)|^p]^{1/p} ≤ (C̃_{p,n} + C̄_{p,n}) ‖ϕ‖_∞/N^{1/2} for every ϕ ∈ B_b(H) and some finite constants C̃_{p,n}, C̄_{p,n}; thus Ĉ_{p,n} = C̃_{p,n} + C̄_{p,n}. Lemma 4 gives E[|η̂^N_n(ϕ) − η̂_n(ϕ)|^p]^{1/p} ≤ C̆_{p,n} ‖ϕ‖_∞/N^{1/2} for every ϕ ∈ B_b(H) and some finite constant C̆_{p,n}, and Lemma 5 gives E[|η^N_{n+1}(ϕ) − η_{n+1}(ϕ)|^p]^{1/p} ≤ C_{p,n+1} ‖ϕ‖_∞/N^{1/2} for every ϕ ∈ B_b(H) and some finite constant C_{p,n+1}. The result holds for all n ∈ N by induction.
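The O(N^{−1/2}) rate in the base case can be illustrated empirically for an i.i.d. empirical measure and a bounded test function:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.tanh                                      # bounded test function, E[phi(Z)] = 0

def lp_error(N, p=2, reps=400):
    """Monte Carlo estimate of E[|eta_N(phi) - eta(phi)|^p]^(1/p)
    for the empirical measure of N i.i.d. standard normals."""
    errs = np.array([abs(phi(rng.normal(size=N)).mean()) for _ in range(reps)])
    return (errs ** p).mean() ** (1.0 / p)
```

Quadrupling N should roughly halve the L_2 error, consistent with the N^{−1/2} rate.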

C.2 Proof of Proposition 4
Using standard techniques following Dudley (2002, Chapter 11, Theorem 11.4.1) and Berti et al. (2006) and given in detail for the context of interest by Schmon et al. (2020, Theorem 4), the result of Corollary 1 can be strengthened to the convergence of the measures in the weak topology.

D.1 Proof of Corollary 2
Proof. The result follows from (18) applied, for fixed x ∈ X, to the function ϕ = K_{n+1}(·, x). The density of K_{n+1}(x, ·) is continuous and bounded uniformly in x by (A2), and S is a bounded density; thus K_{n+1}(·, x) ∈ B_b(X) and we can control the expectation by (18) in Proposition 3. The resulting constant depends on n, p, the bandwidth s_N, the covariance matrix Σ and the choice of kernels T, S.

D.2 Proof of Corollary 3
Proof. The L_p inequality (20) gives, for every fixed x ∈ X, convergence in L_p of f^N_{n+1}(x) to f_{n+1}(x) as N → ∞. Then, Corollary 1 gives pointwise almost sure convergence of f^N_{n+1}(x) to f_{n+1}(x) for all x ∈ X.
As both f^N_{n+1}(x) and f_{n+1}(x) are probability densities on X, we can extend them to R^{d_X} by taking ψ^N_{n+1}(x) = f^N_{n+1}(x) for x ∈ X and 0 otherwise, and ψ_{n+1}(x) = f_{n+1}(x) for x ∈ X and 0 otherwise, respectively. Both ψ^N_{n+1}(x) and ψ_{n+1}(x) are probability densities on R^{d_X} and are measurable functions. Moreover, ψ^N_{n+1}(x) converges almost surely to ψ_{n+1}(x) for all x ∈ X. Hence, we can apply Glick's extension of Scheffé's lemma (e.g. Devroye and Wagner (1979)) to obtain almost sure convergence in L_1; the final bound follows from the boundedness of X given by (A0a). Hence, the result follows.

E.1 Analytically tractable example
Here we consider a toy example involving Gaussian densities for which both the EM recursion (2) and the EMS recursion (6) can be solved at least implicitly. The Fredholm integral equation we consider is h(y) = ∫_X g(y | x) f(x) dx with f(x) = N(x; µ, σ_f²) and g(y | x) = N(y; x, σ_g²), so that h(y) = N(y; µ, σ_f² + σ_g²). The initial distribution f_1(x) is N(x; µ, σ²_{EMS,1}) for some σ²_{EMS,1} > 0. The fixed point of the EMS recursion (6) with Gaussian smoothing kernel K_n(x′, x) = N(x; x′, ε²) is a Gaussian with mean µ and variance σ²_EMS solving an implicit fixed-point equation. We can compute the Kullback–Leibler divergence achieved by f_∞(x), KL(h, ∫_X f_∞(x) g(· | x) dx), as ∫_X f_∞(x) g(y | x) dx is the Gaussian density N(y; µ, σ²_EMS + σ_g²). The fixed point for the EM recursion (2) is obtained setting ε = 0; the corresponding value of the Kullback–Leibler divergence is 0. Figure 4 shows the dependence of σ²_EMS and of the KL divergence on ε. The conjugacy properties of this model allow us to obtain an exact form for the potential (11), G_n(x_n, y_n) = g(y_n | x_n)/h_n(y_n) = N(y_n; x_n, σ_g²)/N(y_n; µ, σ_g² + σ²_{EMS,n}), where σ²_{EMS,n} is the variance of f_n(x). We use this example to show that the maximum likelihood estimator obtained with the EM iteration (4) does not enjoy good properties, and to motivate the addition of a smoothing step in the iterative process (Figure 5).
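The conjugacy underlying this example, with f = N(µ, σ_f²) and g(y | x) = N(y; x, σ_g²) so that h is the convolution N(µ, σ_f² + σ_g²), can be checked by simulation (µ = 0.5 is an illustrative choice; only σ_f² and σ_g² are fixed by the text):

```python
import numpy as np

# With f = N(mu, sf2) and g(y | x) = N(y; x, sg2), h is the convolution
# N(mu, sf2 + sg2); mu = 0.5 is an illustrative choice.
mu, sf2, sg2 = 0.5, 0.043 ** 2, 0.045 ** 2
rng = np.random.default_rng(0)
x = rng.normal(mu, np.sqrt(sf2), size=200_000)     # x ~ f
y = rng.normal(x, np.sqrt(sg2))                    # y | x ~ g, so y ~ h
assert abs(y.mean() - mu) < 1e-2                   # h has mean mu
assert abs(y.var() - (sf2 + sg2)) < 1e-4           # and variance sf2 + sg2
```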
Taking σ_f² = 0.043² and σ_g² = 0.045² we have |1 − ∫_0^1 f(x) dx| < 10^{−30}, thus we can restrict our attention to [0, 1] and implement the discretized EM and EMS by taking B = D = 100 equally spaced intervals in this interval. The number of iterations n = 100 is fixed for EM, EMS and SMC. The number of particles for SMC is N = 10⁴ and ε = 10^{−2}. The smoothing matrix for EMS is obtained by discretization of the smoothing kernel K_n(x′, x) = N(x; x′, ε²).
We do not include the approximation obtained with IB as this coincides with that of EM. The reconstruction provided by EM (Figure 5) does not recover the correct shape and is not smooth. On the contrary, both EMS and SMC give good reconstructions of f while preserving smoothness. We then compare the deterministic discretization (5) of the EMS recursion (6) with the stochastic one given by SMC with the exact potential (31). To do so, we consider the variance of the obtained reconstructions, their integrated square error (23), the mean integrated square error between h and ĥ^N_{n+1}(y) = ∫_X f̂^N_{n+1}(x) g(y | x) dx, and the Kullback–Leibler divergence KL(h, ĥ^N_{n+1}) (restricting to the [0, 1] interval and computing by numerical integration) as the value of the smoothing parameter ε increases (Figure 6). We consider one run of discretized EMS and compare it with 1,000 repetitions of SMC for each value of ε (this choice follows from the fact that discretized EMS is a deterministic algorithm). The number of particles for SMC is N = 10³. Both algorithms correctly identify the mean for every value of ε, while the estimated variances increase from that obtained with the EM algorithm (ε = 0) to the variance of a Uniform distribution over [0, 1] (Figure 6, top left). Unsurprisingly, the ISE for both f and h increases with ε (Figure 6, top right and bottom left), showing that an excessive amount of smoothing leads to poor reconstructions. In particular, for values of ε ≥ 0.5 the reconstructions of f become flatter and tend to coincide with a Uniform distribution in the case of EMS and with a normal distribution centered at µ and with high variance (≥ 0.08) in the case of SMC. This difference is reflected in the behavior of the Kullback–Leibler divergence, which stabilizes around 133 for EMS while it keeps increasing for SMC (Figure 6, bottom right).
We now consider the effect of the use of the approximated potentials G^N_n in place of the exact ones G_n in the SMC scheme. We compare the ISE for f given by the SMC scheme with exact and approximated potentials for values of M, the number of samples Y^{ij}_n drawn from h at each time step, between 1 and 10³, with 1,000 repetitions for each M. Through this comparison we also address the computational complexity O(MN) of the algorithm, with focus on the choice of the value of M. Figure 7 shows the results for N = 10³ and ε = 10^{−3/2}. The behavior for different values of N and ε is similar. The plot of ISE(f) shows a significant improvement when M > 1 but little further improvement for M > 10.
To further investigate the choice of M we compare the reconstructions obtained using the exact and the approximated potentials for M = 10, M = 10² and M = N = 10³. Figure 8 shows pointwise means and pointwise MSE (24) for 1,000 reconstructions. The means of the reconstructions with the exact potentials (blue) coincide for the three values of M; the means of the reconstructions with the approximated potentials (red) also coincide, but have heavier tails than those obtained with the exact potentials. The MSE is similar for reconstructions with exact and approximated potentials with the same value of M. In particular, the small improvement in MSE from M = 10² to M = 10³ suggests that M = 10² could be used instead of M = N = 10³ if computational resources are limited. Using M = 10² instead of M = 10³ reduces the average runtime by ≈ 90% for both the algorithm using the exact potentials and that using the approximated potentials.

Silverman et al. (1990, Section 5.4) conjectured that under suitable assumptions the EMS map (6) has a unique fixed point. This conjecture is empirically supported by the results in Figure 9. We run EM, EMS and SMC with approximated potentials for n = 100 iterations starting from three initial distributions f_1(x): a Uniform on [0, 1], a Dirac δ centered at 0.5 and the solution N(x; µ, σ_f²). The number of particles is set to N = 10³ and the smoothing parameter to ε = 10^{−1}. Both EMS and SMC converge to the same value of the Kullback–Leibler divergence regardless of the starting distribution. The speed of convergence of the three algorithms is similar; in each case little further change is observed once 4 iterations have occurred.