Conditional particle filters with bridge backward sampling

Conditional particle filters (CPFs) with backward/ancestor sampling are powerful methods for sampling from the posterior distribution of the latent states of a dynamic model such as a hidden Markov model. However, the performance of these methods deteriorates with models involving weakly informative observations and/or slowly mixing dynamics. Both of these complications arise when sampling finely time-discretised continuous-time path integral models, but can occur with hidden Markov models too. Multinomial resampling, which is commonly employed with CPFs, resamples excessively for weakly informative observations and thereby introduces extra variance. Furthermore, slowly mixing dynamics render the backward/ancestor sampling steps ineffective, leading to degeneracy issues. We detail two conditional resampling strategies suitable for the weakly informative regime: the so-called `killing' resampling and systematic resampling with mean partition order. To avoid the degeneracy issues, we introduce a generalisation of the CPF with backward sampling that involves auxiliary `bridging' CPF steps that are parameterised by a blocking sequence. We present practical tuning strategies for choosing an appropriate blocking. Our experiments demonstrate that the CPF with a suitable resampling and the developed `bridge backward sampling' can lead to substantial efficiency gains in the weakly informative and slow mixing regime.


Introduction
Sampling from the posterior distribution of the latent states of a dynamic model, such as a hidden Markov model (HMM), is a common statistical task. Such 'smoothing problems' arise in many fields of science, such as ecology [Wood, 2010], epidemiology [Rasmussen et al., 2011] and genetics [Mirauta et al., 2014]. The particle Markov chain Monte Carlo (PMCMC) methods [Andrieu et al., 2010] have proved invaluable for solving such problems. In particular, the conditional particle filter (CPF) with multinomial resampling and backward sampling (BS) [Andrieu et al., 2010, Whiteley, 2010], or the probabilistically equivalent particle Gibbs with ancestor sampling algorithm [Lindsten et al., 2014], has been found to perform well with challenging HMMs and long data records [Künsch, 2018, Lee et al., 2020].
However, the performance of CPF with BS (CPF-BS) deteriorates when the observations are weakly informative, that is, when the weights (or the potential functions in the Feynman-Kac (FK) distribution) are nearly constant [cf. Chopin et al., 2022]. In such scenarios, the multinomial resampling steps introduce too much noise by eliminating particles excessively. Furthermore, when the dynamics of the model of interest are slowly mixing, the backward/ancestor sampling step has only a limited effect. We mitigate these issues by devising new resampling algorithms and an alternative to backward sampling for such weak potentials and slowly mixing scenarios.
A time-discretisation of a continuous-time Feynman-Kac (FK) path integral model leads to an 'extreme' weakly informative and slow mixing scenario [cf. Chopin et al., 2022]. Motivated by the successes of CPF-BS in the usual discrete time settings, we sought a CPF-BS analogue which is stable with respect to refined time-discretisations. It is relatively easy to see that the direct application of BS degenerates under such refined discretisations, except for limited cases, such as when the driving Markov process admits jumps, as considered in [Miasojedow and Niemiro, 2015], or in a univariate case where the trajectories can cross with positive probability.
To address the inherent inefficiency of multinomial resampling, we draw inspiration from the recent works of Arnaudon and Del Moral [2020] and Chopin et al. [2022], where other types of resampling are shown to be more effective for weakly informative potentials. Arnaudon and Del Moral [2020] propose a continuous-time version of the CPF with 'killing' resampling; however, this is an idealised algorithm in the sense that practical diffusion models need to be time-discretised. The work of Chopin et al. [2022] studies the stability of resampling for particle filters (and not the CPF) as the time discretisation is refined. There, a new systematic resampling method is proposed, which incorporates a 'mean partition' step that has not been developed for the CPF yet. The mean partition step further decreases superfluous resampling [Chopin et al., 2022], and empirical results suggest improved performance.
The main contributions of this paper are as follows.
• We detail two new conditional resampling algorithms: 'killing' resampling and systematic resampling with mean partition (Section 4). These are conditional versions of resampling algorithms which were shown to be stable in the weak potentials setting (in the continuous-time limit) [Chopin et al., 2022]. We also detail a generic sufficient condition for conditional resamplings (Assumption 7), which guarantees validity of the CPF (Theorem 2) and complements the result of Chopin and Singh [2015].
• Our main contribution is a new CPF with bridge backward sampling (CPF-BBS) (Section 5), which may be regarded as a generalisation of BS to an arbitrary 'blocking sequence,' and which can avoid the degeneracy problem of CPF-BS with refined discretisations.
• The performance of the CPF-BBS relies on an appropriately chosen blocking sequence, which depends on the model at hand. Therefore, a significant portion of our work focuses on finding practical, computationally inexpensive and robust tuning criteria for choosing such a sequence (Section 6). We introduce a method for blocking sequence selection that requires a small number of independent runs of the standard particle filter for the model of interest. Our developments related to blocking sequence selection can be of independent interest, and are potentially useful with other methods based on blocking, such as the blocked particle Gibbs [Singh et al., 2017].
Systematic resampling in the context of the CPF has been proposed before [Chopin and Singh, 2015], but not the more efficient mean partition version. Furthermore, Chopin and Singh [2015] do not discuss or demonstrate efficiency in the context of weak potentials, which is our primary motivation.
The CPF-BBS is a general method, but requires evaluation of, and simulation from, the conditional distributions of (multiple steps of) the proposal distributions. In practice, this typically means that the proposals are linear-Gaussian, arising for instance from a linear stochastic differential equation (SDE). The latter can occur in single molecule studies [d'Avigneau et al., 2022], and one of our numerical examples demonstrates how animal movement modelling based on telemetry data [Johnson et al., 2008] can be combined with a path integral model to account for habitat preferences, instead of the so-called step-selection formulation [cf. Hooten et al., 2017, Thurfjell et al., 2014]. Linear-Gaussian state dynamics are common with structural time series models [Durbin and Koopman, 2012] too, and smoothing distribution approximations can lead to weak potentials [cf. Vihola et al., 2020].
The CPF-BBS features 'bridging' CPF steps, which resemble the intermediate block importance sampling suggested in [Lindsten et al., 2015a], and the MCMC rejuvenation considered in [Lindsten et al., 2015a, Carter et al., 2014]; there are similarities also with the bridging particle filter suggested in [Del Moral and Murray, 2015]; see also [Mider et al., 2021]. We believe that our approach is more efficient than direct importance bridging, and because our approach can be intuitively related to a continuous-time analogue (through [Chopin et al., 2022]), it is expected to behave well with respect to refinement of the time-discretisation, unlike the MCMC bridging.
Our experiments (Section 8) demonstrate how the developed resamplings outperform standard multinomial resampling in the weak potentials and slow mixing scenario, and we establish empirically an order between their performance, which follows a similar pattern as the results of Chopin et al. [2022] for the standard particle filter applied to FK path integral models. Empirical results using the CPF-BBS show a significant improvement over CPF-BS, and reveal how the method is stable with respect to refined discretisation. Finally, our tuning algorithm appears to deliver blocking sequences that facilitate efficient inference with little additional specification from the user.

Preliminaries and notation
We aim at inference of a smoothing distribution, typically arising from a hidden Markov model (HMM), which consists of a latent Markov state X_{1:T} taking values in X, with initial (prior) density m_1(x_1) and transition probability densities (m_k(x_k | x_{k-1}))_{2≤k≤T}, as well as conditionally independent observations Y_{1:T} with observation densities (g_k(y_k | x_k))_{1≤k≤T}. The smoothing distribution is the posterior density of the latent states X_{1:T}, with normalising constant Z := ∫ γ(x_{1:T}) dx_{1:T} ∈ (0, ∞). We approach the inference of (1) via an alternative representation often referred to as a Feynman-Kac (FK) distribution [Del Moral, 2004]. In our context, the FK distribution consists of the 'components' (M_{1:T}, G_{1:T}) (described below), which are chosen such that the FK distribution coincides with γ(x_{1:T}) defined in (1), up to normalisation. The components M_1(x_1) and M_k(x_k | x_{k-1}) define, respectively, an alternative initial distribution and 'proposal' transition densities of a Markov chain on X, and the components G_1 : X → [0, ∞) and G_k : X × X → [0, ∞) are non-negative 'potential' functions. The FK distribution (2) is a generalisation of the HMM smoothing problem (1). Indeed, taking M_k ≡ m_k, G_1(x_1) = g_1(y*_1 | x_1) and G_k(x_{k-1:k}) ≡ g_k(y*_k | x_k) satisfies (2). However, it is often beneficial to choose another 'proposal' family M_k ̸≡ m_k, in which case the 'weights' in (2) may be taken as G_k(x_{k-1:k}) = m_k(x_k | x_{k-1}) g_k(y*_k | x_k) / M_k(x_k | x_{k-1}) for k ≥ 2 to satisfy (2). The FK distribution (2) arises also in contexts beyond the HMM scenario, such as with time-discretised path-integral models, as discussed in Section 7.
Above and hereafter, 'dx' stands for a σ-finite dominating measure on a general state space X, integers are equipped with the counting measure, and product spaces are equipped with products of the dominating measures. We use the shorthand notation x_{k:u} := (x_k, . . . , x_u) for sequences, and similarly for superscript ranges. We also denote [N] := {1, . . . , N}, and use U(a, b) and U([N]) to denote the uniform distribution on the interval (a, b) and on the integers [N], respectively. Finally, test and potential functions are implicitly assumed measurable.

The conditional particle filter
The particle filter is a sequential Monte Carlo algorithm, which includes sampling from the Markov dynamics M_k, and resampling proportional to weights arising from G_k. The resampling operation r(a^{(1:N)} | g^{(1:N)}) defines a probability distribution on [N]^N which depends on non-negative 'unnormalised weights' g^{(1:N)}. That is, if A^{(1:N)} ∼ r( · | g^{(1:N)}) on [N]^N, then P(A^{(1:N)} = a^{(1:N)}) = r(a^{(1:N)} | g^{(1:N)}). We will only consider unbiased resamplings [Crisan et al., 1999] r, which means that the expected number of offspring of each particle is proportional to its weight, that is, for all j ∈ [N]: E[#{i : A^{(i)} = j}] = N g^{(j)} / Σ_{i=1}^N g^{(i)}. Algorithm 1 describes the particle filter targeting the FK distribution (2) using N particles and an unbiased resampling r( · | g^{(1:N)}). The boldface notation X_k stands for the latest particles augmented with their ancestors, generated during the algorithm. In what follows, we use underlining to denote 'all particles' at one time instant, so for instance X_k = X^{(1:N)}_k. The conditional particle filter (CPF) introduced in [Andrieu et al., 2010] is similar to the particle filter, but its motivation is different: it implements a π-invariant Markov transition kernel from an input (so-called 'reference' path) X*_{1:T} ∈ X^T to a newly chosen path. The original scheme of Andrieu et al. [2010] assumed multinomial sampling with ancestor tracing. Algorithm 2 presents a generic version of the CPF with N particles and with ancestor tracing, using a generic conditional resampling r^{(p,n)}, which keeps track of the ancestor of the reference. This makes it possible to write Algorithm 2 such that the reference trajectory can be located at arbitrary indices B_{1:T}, unlike earlier formulations, which assume the reference at index 1 [e.g. Chopin and Singh, 2015]. The arbitrary reference indices turn out to be convenient when we introduce the bridge backward sampling CPF in Section 5.
Definition 1 gives a sufficient condition that r (p,n) is a valid conditional resampling for use with Algorithm 2.
The ancestor tracing version of the CPF typically requires N = O(T) in order to remain efficient [Andrieu et al., 2018, Lindsten et al., 2015b]. If the potentials G_k are nearly uniform, this issue can be mitigated to some extent by an efficient resampling. For instance, when the model is a time discretisation of a continuous-time model (cf. Section 7), N does not need to increase with respect to the number of time steps, but in general, N still has to increase with respect to the time horizon.
Whiteley [2010] suggested, in a discussion note to [Andrieu et al., 2010], that if the transition densities M_k can be calculated, backward sampling may be used in the CPF (with multinomial resampling), instead of ancestor tracing. Later, a probabilistically equivalent variant of BS called 'ancestor sampling' (AS) [Lindsten et al., 2014] was introduced. The Markov kernels of CPF with BS/AS are reversible with respect to π, and guaranteed to outperform AT in the asymptotic variance sense [Chopin and Singh, 2015]. The improvement has been found substantial in many empirical studies, and the BS/AS variants have been found stable with constant N and very large T; see [Lee et al., 2020] for a theoretical result which consolidates these findings. We present a generalisation of BS in Section 5, which remains efficient with slowly mixing M_k, and in particular, with refined time discretisations.

New conditional resampling algorithms
The simplest unbiased resampling, that is, one satisfying (3), is multinomial resampling, where the A^{(k)} are drawn independently from the categorical distribution Categ(w^{(1:N)}) with normalised weights w^{(j)} = g^{(j)} / Σ_{i=1}^N g^{(i)}. However, in the context of this work, multinomial resampling is wasteful, and we focus instead on conditional versions of two resampling algorithms that were recently shown to be stable under refined discretisations [Chopin et al., 2022]. We briefly discuss these resampling algorithms below, and then describe their conditional variants.
The first resampling is the 'killing' resampling, defined as follows [cf. Del Moral, 2013]: independently for each i ∈ [N], A^{(i)} = i with probability g^{(i)}/g*, and otherwise A^{(i)} ∼ Categ(w^{(1:N)}), where g* = max_{i∈[N]} g^{(i)} (and in case g* = 0, r_kill may be defined arbitrarily). The killing resampling is valid also with any other choice of g* as long as g^{(j)} ≤ g*, but we consider the above choice, which minimises the resampling rate, and which was also used in [Murray et al., 2016]. Like multinomial resampling, the components of the random vector A^{(1:N)} are independent, but unlike in multinomial resampling, they are not identically distributed.
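The unconditional killing resampling described above can be sketched in a few lines; note that a particle with the maximal weight always survives, so nearly constant weights lead to almost no resampling. This is an illustrative sketch only (the conditional variant used with the CPF additionally forces the reference particle's index, which is not shown here):

```python
import numpy as np

def killing_resample(g, rng):
    """Unconditional 'killing' resampling.

    Each particle i survives (A[i] = i) with probability g[i] / g*,
    where g* = max(g); otherwise A[i] is redrawn from Categ(g / sum(g)).
    The components of A are independent but not identically distributed.
    """
    g = np.asarray(g, dtype=float)
    n = len(g)
    g_star = g.max()
    w = g / g.sum()
    survive = rng.random(n) < g / g_star   # survival indicator per particle
    a = np.arange(n)
    n_dead = n - int(survive.sum())
    a[~survive] = rng.choice(n, size=n_dead, p=w)
    return a

rng = np.random.default_rng(1)
a = killing_resample([1.0, 1.0, 1.0, 1.0], rng)
# With exactly constant weights, no particle is killed, so a == [0, 1, 2, 3].
```

With constant weights the resampling is the identity, which is precisely the behaviour that avoids the extra variance multinomial resampling would introduce in the weak potential regime.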
The second resampling is 'systematic resampling with mean partition' (Definition 5), a variant of 'standard' systematic resampling (Definition 3) in which the weights w^{(1:N)} are processed in a particular 'mean partition' order (Definition 4).
Definition 5. (Systematic resampling with mean partition). Let F_ϖ^{-1} denote the generalised inverse distribution function corresponding to the re-indexed weights w^{(ϖ(1:N))}, where ϖ is a mean partition order as in Definition 4. Set A^{(i)} := ϖ(F_ϖ^{-1}(Ǔ_i)), where Ǔ_{1:N} are defined as in Definition 3.
Systematic resampling with mean partition was introduced in [Chopin et al., 2022], where it was shown to have the smallest resampling rate among a number of other algorithms, including killing resampling. The empirical evidence in [Chopin et al., 2022] further suggests that the mean partition order variant improves performance over the 'standard' systematic resampling. The work of Chopin et al. [2022] featured also another resampling algorithm, the Srinivasan sampling process [Gerber et al., 2019], which was on a par with systematic. However, it seems difficult to devise an efficient implementation of the conditional version of the latter.
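The following sketch illustrates systematic resampling applied to re-ordered weights. The ordering used here (indices whose normalised weight is at least the mean 1/N placed first) is an assumption consistent with the name 'mean partition'; the exact construction is given in Definition 4, and our sketch does not implement the conditional variant:

```python
import numpy as np

def mean_partition_order(w):
    # An order consistent with a mean partition (assumed form; see
    # Definition 4): indices with weight >= the mean 1/N come first.
    w = np.asarray(w, dtype=float)
    idx = np.arange(len(w))
    big = w >= 1.0 / len(w)
    return np.concatenate([idx[big], idx[~big]])

def systematic_mean_partition(w, rng):
    """Systematic resampling on mean-partition-ordered, normalised weights.

    One shared uniform U generates the stratified points (U + i) / N,
    which are mapped through the generalised inverse CDF of the
    re-indexed weights.
    """
    w = np.asarray(w, dtype=float)      # assumed normalised to sum to 1
    n = len(w)
    perm = mean_partition_order(w)
    u = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(w[perm])
    pos = np.searchsorted(cdf, u, side="left")   # generalised inverse CDF
    return perm[pos]
```

With uniform weights the output is the identity assignment (each particle has exactly one offspring), and zero-weight particles are never selected, reflecting the low resampling rate noted above.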
Algorithms 5 and 6 describe the conditional variants of killing resampling and systematic resampling with mean partition, respectively. These algorithms are valid conditional resamplings, as stated in Lemma 6, whose proof is given in Appendix A. Both algorithms make use of a cyclic shift of 1:N by s, denoted by σ^N_s, that is, σ^N_s(i) := 1 + (i + s − 1) mod N. The conditional systematic resampling with mean partition is an extension of the conditional systematic resampling of Chopin and Singh [2015]. The mean partition may be found in O(N) time, and our implementation is based on Hoare's scheme [Hoare, 1962]; see Algorithm 12 in Appendix E.

Bridge backward sampling
The backward/ancestor sampling CPF [Whiteley, 2010, Lindsten et al., 2014] often has impressive performance even with small N and large T [Lee et al., 2020]. The selection probabilities in the backward sampling step involve the transition density M_{k+1}(X^{(B_{k+1})}_{k+1} | X^{(i)}_k). When M_{k+1} is slowly mixing, this density is typically very small for all i except for the ancestor i = A^{(B_{k+1})}_k, and therefore the backward sampling step essentially reduces to ancestor tracing.
We discuss next the conditional particle filter with bridge backward sampling (CPF-BBS), which is a generalisation of the CPF with backward sampling (CPF-BS) [Whiteley, 2010] suitable for slowly mixing M_k. The backward sampling step is replaced by a 'bridging' procedure which spans multiple time steps, and which requires tractable dynamics {M_k} in the following sense: for block boundaries ℓ ≤ u, we are able to simulate from, and evaluate the density of, the conditional distribution of x_ℓ given x_{ℓ−1} and x_u under the law of the proposal Markov chain.
We further assume that we are able to evaluate the conditional density M_{u|ℓ}(x_u | x_ℓ) of x_u given x_ℓ. Assumption 7 is restrictive, and is satisfied for instance by linear-Gaussian M_k. Note, however, that M_k need not necessarily correspond to the statistical model, but may be another 'proposal' distribution, as long as the Feynman-Kac model (2) corresponds to the smoothing distribution (cf. discussion in Section 9).
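The tractability requirements are easiest to see in the simplest linear-Gaussian case. The following sketch, purely illustrative and not part of the paper's algorithms, gives both conditional laws for scalar Gaussian random-walk dynamics x_k = x_{k−1} + N(0, σ²) on unit time steps, where the bridge conditional is a discrete Brownian bridge:

```python
import math

def bridge_step(x_prev, x_u, l, u, sigma2):
    """Mean and variance of x_l given x_{l-1} = x_prev and x_u, for a
    Gaussian random walk: a discrete Brownian bridge over the
    m = u - l + 1 remaining steps."""
    m = u - l + 1
    mean = x_prev + (x_u - x_prev) / m
    var = sigma2 * (m - 1) / m
    return mean, var

def block_transition_logpdf(x_u, x_l, l, u, sigma2):
    """log M_{u|l}(x_u | x_l): Gaussian with variance (u - l) * sigma2,
    since the random-walk increments accumulate over u - l steps."""
    v = (u - l) * sigma2
    return -0.5 * (math.log(2 * math.pi * v) + (x_u - x_l) ** 2 / v)
```

For instance, one step before the upper boundary (m = 2), the bridge mean is the midpoint of x_prev and x_u with variance σ²/2, and as u − ℓ grows the conditional approaches the unconditional transition, as expected.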
Algorithm 7 gives the pseudocode of the CPF-BBS algorithm. The first step (line 1) invokes the forward CPF (Algorithm 3). The bridging procedure (line 5), which replaces the usual backward sampling step of CPF-BS, requires a fixed 'blocking sequence' 1 = T_1 < · · · < T_L = T that gives rise to the blocks (T_{i−1}, T_i), i = 2, . . . , L. For each block, ℓ = T_{i−1} and u = T_i are referred to as the block lower and upper boundaries, respectively, and Algorithm 8 attempts to change the ancestor of state X^{(B*_u)}_u at time ℓ from B*_ℓ to a different particle from the pool X_ℓ. Success of this step hinges on Algorithm 8 being able to generate particles that could equally well explain the future state X^{(B*_u)}_u being conditioned on (see line 8 of Algorithm 8), for which the conditional densities of Assumption 7 are needed in its forward simulation procedure (lines 1-7). The new ancestor from the pool X_ℓ is then found by ancestor tracing (line 9). Success in this step relies on an efficient resampling strategy (line 3) to avoid particle degeneracy, so that many particles from X_ℓ can survive to line 8. The choice of the blocking sequence is important, and we devise a practical design procedure for this in Section 6.
The following result, whose proof is given in Appendix B, ensures the validity of CPF-BBS.
With dense blocking sequence T 1:T = 1:T , the bridging CPF and its tracing (lines 2-7 and 9 of Algorithm 8, respectively) are eliminated, and therefore the CPF-BBS simplifies to the backward sampling CPF (CPF-BS) of Whiteley [2010]. This means that the CPF-BBS can be viewed as a true generalisation of CPF-BS for arbitrary blockings.
The other extreme case, that is, the trivial blocking sequence T_1 = 1, T_2 = T, leads to running a CPF and then another CPF with the same initial particles, targeting the conditional distribution π(x_{1:T−1} | X*_T) (cf. Lemma 12). This may not be practically useful, but can give insight into what the 'bridge CPF' is about.
We conclude this section with two remarks about methods related to Algorithm 7.
(i) If we modify the algorithm by replacing BridgeCPF with a single importance sampling step over the block, we get a CPF version of the extended importance sampling for particle filters suggested in [Doucet et al., 2006]. We did not investigate such an 'importance bridging' method further, but we believe that our bridge CPF generally allows for longer block sizes, for similar reasons why particle filters tend to be more efficient than importance sampling.
(ii) Algorithm 7 has similarities with the blocked particle Gibbs (or blocked CPF) of Singh et al. [2017], but we believe that CPF-BBS can be substantially more efficient with the same computational complexity, because:
• CPF-BBS uses a block-wide 'lookahead', which is possible to implement thanks to Assumption 7, instead of using a modified potential only at the last time instant like (the direct implementation of) blocked particle Gibbs.
• The BBS update is not conditioned on a single point at the block start, like the blocked particle Gibbs, but uses all particles generated by the 'forward' CPF (Algorithm 3).
Note, however, that the blocked particle Gibbs is directly parallelisable, unlike the CPF-BBS. Finally, we note that the blocks are updated in reverse order because a block's update depends on the value at the lower boundary of the subsequent block. Although not pursued here, it might also be possible to devise a forward-only implementation, following Lindsten et al. [2014].

Blocking sequence selection
The CPF-BBS (Algorithm 7) is valid with any choice of the blocking sequence T 1:L . However, its choice affects simulation efficiency, that is, the mixing of the Markov chain. In this section, we discuss a computationally inexpensive method that can be used in practice to determine a suitable blocking sequence prior to running the CPF-BBS in order to facilitate efficient mixing.
We begin in Section 6.1 by discussing a proxy for the integrated autocorrelation time (IACT) of the Markov chain output by the CPF-BBS. Then, Section 6.2 details an estimator we have developed for the proxy. Finally, Section 6.3 describes a practical algorithm for blocking sequence selection that is based on the estimator of Section 6.2. We will study the methods presented in this section empirically in Section 8.
6.1. The probability of lower boundary updates (PLU). A theoretically attractive candidate strategy for blocking sequence selection is monitoring the IACT for variables of interest, based on the output of the CPF-BBS. Efficient inference could then be obtained by choosing the blocking sequence that minimises the IACT. However, this approach is typically computationally demanding or even infeasible, since the estimation of the IACT is notoriously difficult and often requires extensive simulation of Markov chains.
For these reasons, we base the selection of the blocking sequence on a proxy for IACT that is easier to work with. We call the proxy the 'probability of lower boundary updates' (PLU): using the notation of Algorithm 8, PLU(ℓ, u) is the probability that the bridge CPF (Algorithm 8) on block (ℓ, u) updates the value at the block lower boundary ℓ. Intuitively, higher values of PLU should be associated with lower IACT. Indeed, our experiments in Section 8 indicate that maximising PLU (with respect to the block size) appears to yield a block size that approximately minimises IACT.
6.2. Approximate estimator for the PLU. Even though PLU(ℓ, u) is much easier to estimate than IACT, it still requires iterating the CPF-BBS for each candidate blocking, which is computationally demanding. We have developed an estimator for PLU(ℓ, u) which avoids this, and which is based on a single 'stationary' CPF state (the generated particles X^{(1:N)}_{1:T} and reference indices B_{1:T}), used for any block boundaries ℓ, u. The practical algorithm, postponed to Section 6.3, will be based on this idea, but assumes further that such a stationary CPF state can be well approximated by an independent particle filter.
The estimator from a single CPF state is presented in (9) below and is based on two 'asymptotic' characterisations of PLU, for small and large block sizes, respectively. The idea behind the characterisations is that a lower boundary update occurs when a trajectory traced back from the generated particle tree in the bridge CPF has a different value at the block lower boundary than the reference.
Consider first the case of a small block size, that is, u − ℓ ≈ 1. In this case, PLU is approximately characterised by PLU_M(ℓ, u) in (6), which is one minus the ratio of the reference's block transition density M_{u|ℓ}(X*_u | X*_ℓ) to the sum Σ_{i=1}^N M_{u|ℓ}(X*_u | X^{(i)}_ℓ), where X*_ℓ and X*_u refer to the ℓth and uth value of the reference trajectory. The rationale for (6) comes from CPF-BS being a special case of the CPF-BBS for the dense blocking with unit block sizes. Letting b_t and b_{t+1} denote the indices of the current reference, the probability of choosing b_t in backward sampling [Whiteley, 2010] is proportional to W^{(b_t)}_t M_{t+1}(X^{(b_{t+1})}_{t+1} | X^{(b_t)}_t). Under the weak potential setting (i.e. with approximately constant potentials), this probability approximately reduces to M_{u|ℓ}(X*_u | X*_ℓ) / Σ_{i=1}^N M_{u|ℓ}(X*_u | X^{(i)}_ℓ), since ℓ = t and u = t + 1 with a unit block size. The probability of choosing a non-reference index is therefore approximately given by (6).
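As a small numerical illustration of this kind of block-transition-ratio quantity, the sketch below computes one minus the ratio of the reference's block transition density to the sum over all particles, for scalar Gaussian random-walk dynamics. Both the ratio form (our reading of the small-block characterisation) and the dynamics are illustrative assumptions, not the paper's exact formulas:

```python
import math

def plu_small_block(x_l_particles, x_l_ref, x_u_ref, steps, sigma2):
    """1 - M_{u|l}(x_u* | x_l*) / sum_i M_{u|l}(x_u* | x_l^(i)),
    for Gaussian random-walk dynamics with block variance steps * sigma2.
    The reference value x_l_ref is assumed to be one of x_l_particles,
    and Gaussian normalising constants cancel in the ratio."""
    v = steps * sigma2
    dens = [math.exp(-0.5 * (x_u_ref - x) ** 2 / v) for x in x_l_particles]
    d_ref = math.exp(-0.5 * (x_u_ref - x_l_ref) ** 2 / v)
    return 1.0 - d_ref / sum(dens)
```

When all particles explain the future equally well, the quantity equals 1 − 1/N (maximal update probability); when the reference explains the future far better than the rest, it is near 0, matching the degeneracy intuition for slowly mixing dynamics.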
On the other hand, if the block size is large, PLU(ℓ, u) is approximately characterised by the quantity PLU_G(ℓ, u) given in (7), which involves quantities p_k, each equal to the probability that a resampling event occurs at time k, divided by N; for systematic resampling with mean-partitioned weights, p_k has the explicit form (8). The justification of (7) comes from a calculation detailed in Appendix F, which shows that PLU_G(ℓ, u) approximately equals the expected proportion of particles whose ancestor at time ℓ is not the reference, after an 'artificial' conditional particle system has evolved for u − ℓ time steps from time ℓ. Therefore, PLU_G(ℓ, u) may be loosely interpreted as approximating the probability of choosing a non-reference index at time ℓ, when the ancestry of a particle chosen uniformly at time u is traced back until time ℓ.
Our estimator for PLU(ℓ, u), given in (9), is constructed by 'interpolating' (6) and (7), with a scaling added so that the estimator approximately reduces to (6) and (7) for short and long blocks, respectively, in the weak potential setting. The estimator (9) was derived assuming access to a CPF state with N particles. It is also possible to estimate PLU(ℓ, u) from a CPF (or particle filter) state which has a different number of particles N_0 (which it is often useful to take 'large' in practice, so that N_0 ≫ N). In this case, we can estimate PLU_G and PLU_M as follows, and then use (9) with the desired N in the scaling.
To estimate PLU_G, we simply compute p_k using (8) from the N_0 particles and substitute it directly into (7) with the desired N < N_0. For PLU_M we use an alternative estimator (10), which follows by assuming (11) and (12): in other words, the block transition density for the reference is assumed to be approximately equal to a constant c(ℓ, u) times a 'typical' value of the block transition densities for particles not including the reference. The estimator (10) may be derived by appropriate substitution of (11) and (12) into (6).
6.3. Algorithm for blocking sequence selection. In this section we describe a practical method based on (9) to choose the blocking sequence. Algorithm 9 describes a method that uses (9) to evaluate S candidate blocking sequences T^{(s)}_{1:L^{(s)}}, s = 1, . . . , S, given a number of particles and a number of iterations, which are tuning parameters of the blocking candidate evaluation. Here, we use indexing notation where A[i, j, k] stands for the element in row i, column j and slice k of an array A. Furthermore, the columns of arrays need not have the same number of rows, and indexing operations with ':' mean 'all elements' in the particular dimension.
One iteration of the main loop in Algorithm 9 consists of running the standard particle filter (Algorithm 1) with mean partitioned systematic resampling, followed by a traceback using ancestor tracing (Algorithm 4), in lines 2-3. Then, given the output of the particle filter, we estimate PLU using Algorithm 10 (see below) on line 4 for each block (ℓ, u) within each blocking sequence. The computation is a straightforward application of Equations (6)-(9) using the particle filtering results. Finally, lines 6-8 summarise the estimates of PLU by taking their mean over the n replicate runs of the particle filter. The element φ_PLU[i, s] in the output of Algorithm 9 describes, in terms of PLU, how efficient the ith block in blocking sequence s was.
(Algorithm 10 computes PLU_M(ℓ, u) using (6) and PLU_G(ℓ, u) using (7), sets ϕ_PLU[i, s] = PLU(ℓ, u) given in (9), and returns ϕ_PLU.) Algorithm 9 may in principle be used to evaluate any candidate blocking sequence, but we suggest using it with Algorithm 13, given in Appendix E.2, which constructs dyadic blocking sequences: the block sizes T_{k+1} − T_k are powers of two. More precisely, if T = 2^{p*} + 1 for some p*, Algorithm 13 returns blocking sequences T^{(i)}_{1:L^{(i)}} for i = 1, 2, . . . , p* + 1, where the block sizes of the ith sequence are all equal to 2^{i−1}, except for a possible 'residual block' of length < 2^{i−1} as the last block in each sequence i.
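The dyadic construction can be sketched as follows; this is an assumed form consistent with the description above (block sizes 2^{i−1} with a possible shorter residual block at the end), not a transcription of Algorithm 13:

```python
def dyadic_blockings(T):
    """Candidate blocking sequences 1 = T_1 < ... < T_L = T with
    constant block sizes 1, 2, 4, ..., plus a shorter residual block
    when the size does not divide T - 1 (assumed form of Algorithm 13)."""
    seqs = []
    size = 1
    while size <= max(T - 1, 1):
        seq = list(range(1, T, size)) + [T]   # boundaries, ending at T
        seqs.append(seq)
        size *= 2
    return seqs
```

For T = 2^{p*} + 1 this yields p* + 1 sequences, from the dense blocking 1:T (which reduces CPF-BBS to CPF-BS) down to the trivial blocking {1, T}.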
Finally, Algorithm 11 describes a method based on Algorithms 9 and 13 for choosing a single blocking sequence to be used with the CPF-BBS and a given FK distribution. In summary, Algorithm 11 first constructs the candidate blocking sequences using Algorithm 13. Then, Algorithm 9 is run to obtain φ_PLU given these sequences. The data φ_PLU is then reinterpreted as a set D_PLU, whose element (ℓ, b, e_PLU) describes the estimated PLU, e_PLU, of the block with lower boundary ℓ and upper boundary ℓ + b. Finally, D_PLU is processed such that blocking sequences with the largest block sizes are considered first, and at each block lower boundary, the best-performing block size in terms of estimated PLU is selected into the output blocking sequence.

Linear diffusions with path integral weights
We discuss next a class of continuous-time models and their discretisations, for which the methods of Section 5-6 are particularly useful. We will consider instances of these models also in the experiments (Section 8).
We start with the continuous-time model on a time interval [0, τ]. The prior dynamics M correspond to the solution of a linear stochastic differential equation (SDE) dX_t = F X_t dt + K dB_t, where B_t is a d-dimensional Brownian motion, F and K are matrices of appropriate dimension, and μ_init and Σ_init are the mean and covariance of the initial distribution, respectively. The law of interest is M weighted by non-negative weights w(x_{[0,τ]}) determined by 'potential' functions that 'penalise' the trajectories of M. That is, the distribution of interest is proportional to M(dx_{[0,τ]}) w(x_{[0,τ]}).
In practice, we assume a time discretisation of [0, τ], 0 = t_1 < t_2 < · · · < t_T = τ, which leads to the discrete-time FK distribution (2). The dynamics M_{1:T} in (2) correspond to the marginals of X_{[0,τ]} ∼ M on the grid, that is, M_k(x_k | x_{k−1}) is the transition density of X_{t_k} given X_{t_{k−1}} = x_{k−1}; these are linear-Gaussian. Appendix C details how M_k can be derived from the parameters of the SDE, and also how the conditional distributions required by Assumption 7 can be determined. The potential functions G_{1:T} in (2) stem from approximating the path integral by a Riemann sum over the grid, which leads to potentials depending on the grid increments |Δ_k| = t_k − t_{k−1}.
Remark 9. The scenario detailed above can be generalised and/or modified in a number of ways. Indeed, the potentials G_k can also include purely discrete-time elements, as in our Cox process experiment (Section 8.2). The law M, or equivalently M_k, can also correspond to the law of a linear SDE conditioned on a number of linear-Gaussian observations. In such a case, the distributions M_k are still linear-Gaussian, and we can derive the required conditional laws. This can be useful in many practical settings, and indeed was essential for our movement model example (Section 8.3).
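For a linear SDE dX = F X dt + K dB, the exact Gaussian transition over a grid step can be computed with the standard matrix-exponential ('Van Loan') recipe. The sketch below is a generic illustration of this well-known construction, not a transcription of Appendix C, and is checked against the closed-form Ornstein-Uhlenbeck discretisation:

```python
import numpy as np
from scipy.linalg import expm

def discretise_linear_sde(F, K, dt):
    """Exact transition of dX = F X dt + K dB over a step dt:
    X_{t+dt} | X_t = x  ~  N(A x, Q), via Van Loan's block-matrix
    exponential (a standard recipe, sketched here)."""
    d = F.shape[0]
    Qc = K @ K.T                       # continuous-time diffusion matrix
    C = np.zeros((2 * d, 2 * d))
    C[:d, :d] = F
    C[:d, d:] = Qc
    C[d:, d:] = -F.T
    E = expm(C * dt)
    A = E[:d, :d]                      # transition matrix exp(F dt)
    Q = E[:d, d:] @ A.T                # transition covariance
    return A, Q

# Scalar OU check: dV = -beta V dt + sigma dB has
# A = exp(-beta dt) and Q = sigma^2 (1 - exp(-2 beta dt)) / (2 beta).
beta, sigma, dt = 0.7, 1.3, 0.25
A, Q = discretise_linear_sde(np.array([[-beta]]), np.array([[sigma]]), dt)
```

Applying this per grid interval yields the linear-Gaussian M_k; the same machinery also gives the multi-step transitions M_{u|ℓ} by taking dt equal to the block length.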

Experiments
8.1. Comparison of conditional resamplings. We first investigate the performance of the CPF-BBS (Algorithm 7) using the conditional resamplings ρ_kill and ρ_syst. For reference, we also study conditional multinomial resampling with conditioning indices i and k, denoted ρ_mult. This conditional resampling may be implemented simply by first drawing the ancestor indices A^(1:N) ∼ Categ(w^(1:N)) as in standard multinomial resampling, and then enforcing the condition A^(k) = i (since the A^(1:N) are independent).
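A minimal sketch of this conditional multinomial resampling, using 0-based indices and NumPy's weighted `choice` in the role of Categ(w):

```python
import numpy as np

def conditional_multinomial(w, i, k, rng):
    """Multinomial resampling conditioned on ancestor A[k] = i.

    w: normalised weights (length N); i, k: 0-based conditioning indices.
    Because multinomial ancestors are i.i.d. Categ(w), conditioning on
    A[k] = i amounts to fixing that single entry.
    """
    N = len(w)
    a = rng.choice(N, size=N, p=w)  # A^(1:N) ~ Categ(w), independently
    a[k] = i                        # enforce the conditioned ancestor
    return a
```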
In this section, we study a correlated random walk incorporating a path-integral type potential function, hereafter called the CTCRW-P model. The dynamics of the model X_t = (V_t, L_t)^T are driven by the linear SDE (17), where B_t is a standard Brownian motion, σ, β_v and β_x are parameters, and (L_t)_{t≥0} and (V_t)_{t≥0} represent the location and velocity processes, respectively. The FK representation (14) & (16) of the CTCRW-P involves a parameter η, and T_{t_{k−1},t_k}, Q_{t_{k−1},t_k} and S, the transition matrix, conditional covariance matrix and stationary covariance matrix, respectively, arising in the solution of the linear SDE (17). Their expressions are given in Appendix D.1, in Equations (41), (42)-(43) and (44)-(45), respectively.
The simulations were run with all combinations of the algorithm and model configurations described above. We estimated PLU (discussed in Section 6.1) by tallying the iterations in which the state at the block lower boundary was updated and dividing by their total number, and estimated the IACT for L_{0.0} using batch means [Flegal and Jones, 2010]. Figure 1 summarises the results of this experiment. The mean PLU shown in the top row is computed over the number of blocks (given here by τ/blocktime). The figure shows systematic and killing resampling performing better than multinomial resampling, as seen from the lower IACTs and higher mean PLU. The performance with multinomial resampling is poor here, as expected, since the model has weak potentials with |∆_k| = 2^{−7}. In contrast, killing and systematic resampling behave nearly uniformly, with systematic resampling performing slightly better. This finding aligns well with the theoretical and empirical findings of [Chopin et al., 2022] for the particle filter in a similar context of path integral potentials and |∆_k| close to 0.
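The tallying estimate of PLU amounts to counting, over CPF iterations, how often the sampled state at a block lower boundary changes; a minimal sketch, where the array layout (one sampled trajectory per iteration) is our assumption for illustration:

```python
import numpy as np

def plu_estimate(chain, lower_boundaries):
    """Estimate PLU per block from CPF output.

    chain: array of shape (n_iter, T) holding the sampled trajectories,
           one row per iteration (after burn-in).
    lower_boundaries: time indices of the block lower boundaries.
    Returns, for each block, the proportion of iterations in which the
    value at the block lower boundary was updated.
    """
    changed = chain[1:] != chain[:-1]          # (n_iter - 1, T) booleans
    return changed[:, lower_boundaries].mean(axis=0)
```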
The CPF-BBS coincides with the CPF-BS when blocktime = |∆_k|, which corresponds to the first value on the horizontal axis. Even though increasing N naturally improves the performance of the CPF-BS too, the CPF-BBS has better simulation efficiency with an appropriately chosen blocktime, for every N in the simulation. Note that the estimation of the IACT is quite noisy here, since the mixing is poor, especially with multinomial resampling and with the poorly chosen blocking sequences induced by some values of blocktime. In contrast, the computed mean PLU appears less noisy, and in the case of systematic and killing resampling the best blocktime in terms of the IACT is identified. We also investigated the relationship of PLU with the IACT for L_{32.0}, and the findings were similar. A further experiment fixing σ = 1.0 and varying η ∈ {0.125, 0.5, 2.0} instead also resulted in similar findings (see supplementary Figure 8).

8.2. Choice of the blocking sequence. As already illustrated empirically in Figure 1 and discussed in Section 6, the choice of the blocking sequence is a tuning parameter affecting the sampling efficiency of the CPF-BBS. Figure 2 exemplifies this further by taking another look at the results of the experiment in the previous section. Here, the logarithm of the inverse relative efficiency (IRE) is plotted at each time point when systematic resampling was used. The IRE is obtained by scaling the IACT by the number of particles, and measures the asymptotic efficiency of estimators with varying computational costs [Glynn and Whitt, 1992]. The panes from left to right show the results with odd blocktime values and represent a range of algorithms beginning from the CPF-BS (blocktime = 2^{−7}). Here, the algorithms that are tuned well use only 4 particles, motivating the search for an appropriate blocktime (or blocking sequence). By visual inspection, it appears that the best blocktime values found in the experiment are roughly 2^1 for σ = 2.0, 2^3 for σ = 0.5 and 2^3-2^5 for σ = 0.125. When the value of σ is decreased, a larger blocktime achieves better performance, since a small σ leads to 'stiff' dynamics M_1:T.
The best found blocktimes represent balances between two phenomena. On the one hand, the blocks are large enough so that the dynamic model has sufficient time to bridge from the block lower boundaries to the values conditioned on at the upper boundaries (in Algorithm 8). On the other hand, the blocks are small enough to avoid particle degeneracy within the blocks.
Next, we investigate how well the estimates φ̂_PLU computed using Algorithm 9 coincide with PLU. We studied the relationship of φ̂_PLU and PLU with respect to blocktime (that is, with blocking sequences constructed with constant block sizes) using the CTCRW-P model with N ∈ {2^1, 2^2, . . . , 2^10} and the parameter σ ∈ {0.03125, 0.125, 0.5, 2, 4}. The rest of the model configuration was as in Section 8.1. To estimate PLU, we ran 1100 iterations of Algorithm 7 with the first 100 discarded as burn-in, monitoring for each block the proportion of iterations in which the state at the block lower boundary, x_ℓ, was updated. In Algorithm 9, we used n = 50 runs of the particle filter and N as reported above. Figure 3 visualises the results for N ∈ {2, 8, 32, 128} (the results for other N yield no further conclusions). The estimated PLU and φ̂_PLU appear to be in close agreement, with only slight discrepancies seen for large blocktimes. This finding motivates the use of φ̂_PLU as a maximisation criterion for finding a block size that likely results in a high overall PLU. Next, we turn to study Algorithm 11 for selecting the blocking sequence based on φ̂_PLU. We investigate this with a model that differs slightly from the form (16): a Cox process model incorporating a reflected Brownian motion (CP-RBM), first appearing in [Chopin et al., 2022] and briefly detailed (with minor changes) below.
The CP-RBM model is a time-inhomogeneous Poisson process with random intensity, generating observed event times τ̃ on [0, τ]. The intensity function is piecewise constant, determined by the latent process X and the parameters α and β via (18). The process (X_{t_k})_{k=1,...,T} is distributed according to (19), where N^(r)(µ, σ^2, a, b) is a distribution we call the 'reflected normal distribution', with parameters µ and σ and bounds a and b. To simulate from N^(r)(µ, σ^2, a, b), one first draws Z ∼ N(µ, σ^2) and then sets X = reflect(Z, a, b), where 'reflect' is an operation that recursively reflects (that is, mirrors over a boundary) Z with respect to a (if Z < a) or b (if Z > b) until a value within (a, b) is obtained and output.
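The reflected-normal sampler just described can be sketched as follows (the function names are ours):

```python
import numpy as np

def reflect(z, a, b):
    """Recursively mirror z over the boundaries a and b until z is in (a, b)."""
    while not (a < z < b):
        if z < a:
            z = 2 * a - z   # mirror over the lower boundary
        elif z > b:
            z = 2 * b - z   # mirror over the upper boundary
    return z

def reflected_normal(mu, sigma, a, b, rng):
    """Draw from the 'reflected normal' N^(r)(mu, sigma^2, a, b)."""
    z = rng.normal(mu, sigma)
    return reflect(z, a, b)
```

Each pass moves z closer to (a, b), so the loop terminates for any z (landing exactly on a boundary has probability zero for a continuous draw).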
To apply the CPF-BBS to the CP-RBM, we use the FK representation (20), in which |∆_T| = 0. Note that these M_k differ from those of [Chopin et al., 2022] and satisfy Assumption 7, even though the statistical dynamic model (19) does not. Here, the intractable reflection part of the model dynamics is accounted for in the potential functions G_k. The FK distribution above is valid for the inference of the CP-RBM provided that the time discretisation is made fine enough that each ∆_k contains at most one observation. The density N^(r)(x; µ, σ^2, a, b) contains an infinite sum (see Appendix D.2), which we truncate to the first ten terms. We first drew a realisation of the process X using (19) with |∆_k| = 2^{−6}, σ = 0.3, a = 0, b = 3 and time interval length τ = 2^8. Then, conditional on this realisation, we simulated one dataset of event times τ̃ from the Poisson process with intensity (18) with α = 1 and β = 0.5. Finally, we augmented the discretisation of the process X with the time points τ̃, leading to a model of the form (20).
For the blocking sequences, we considered the sequences induced by the constant blocktimes {2^{−6}, 2^{−5}, . . . , 2^5} and an (inhomogeneous) blocking sequence constructed using Algorithm 11 with n = 50 and N = 8. Here, we made a minor change to the choice of candidate blockings (that is, to Algorithm 13): instead of constructing them using block sizes (integers) in powers of two as discussed in Section 6, we constructed them using the power-of-two blocktimes 2^{−6}-2^5, which is more natural for a continuous-time model. For each blocking sequence, we then applied the CPF-BBS with N = 8 for 26000 iterations with the first 1000 discarded as burn-in. Figure 4 summarises the results of the experiment. The top pane shows the true simulated state, the observed event times τ̃ and the 50% and 95% credible intervals of the distributions X_t | τ̃. The middle pane compares the IACTs obtained from the samples of said distributions with some of the considered blocking strategies; the inhomogeneous blocking is highlighted in red. The blocking strategies omitted from the figure yield no further conclusions, and the IACTs for the blocktimes > 2^3 were greater than for the strategies depicted. Finally, the bottom pane visualises the inhomogeneous blocking sequence obtained using Algorithm 11.
In terms of the IACT, the blocking sequence returned by Algorithm 11 appears to perform similarly to the best choices among the blocking sequences constructed with constant blocktimes, indicating that the method provides adequate performance without trial runs of the CPF-BBS. The bottom pane shows how the blocktime of the inhomogeneous blocking switches between 1/2, 1 and 2.

8.3. Movement modelling with terrain preference. We conclude with an application of the CPF-BBS to a movement modelling scenario. Here, we are interested in modelling the movement of an object on a plane based on noisy observations and knowledge of the terrain in which the object moves. We assume that the object has a 'preference' for spending time in certain terrain types. To model such a setting, we build on the continuous-time correlated random walk (CTCRW) model suggested for animal movement modelling based on telemetry data [Johnson et al., 2008]. The dynamics of the CTCRW model arise from a special case of the SDE (17), obtained by setting β_x = 0 and denoting β := β_v. Using this SDE independently in the x and y dimensions yields a 4-dimensional state X_t = (V_t^(x), L_t^(x), V_t^(y), L_t^(y))^T and a movement model on the plane, which we call the CTCRW SDE. The full CTCRW model also incorporates two-dimensional location observations z = (z_k)_{k=1,2,...,K_z} observed at times (t_k)_{k=1,2,...,K_z}. Each observation is related to the location state variables by z_k | X_{t_k} ∼ N((L_{t_k}^(x), L_{t_k}^(y))^T, η^2 I_2), where η is a standard deviation and I_2 stands for the 2 × 2 identity matrix. We use the initial distribution X_{t_1} ∼ N((0, z_{11}, 0, z_{12})^T, diag(σ_V^2, σ_L^2, σ_V^2, σ_L^2)), where z_{11} and z_{12} are the first and second coordinates of the first observation, respectively, σ_V^2 = σ^2/(2β) (the stationary variance of the velocity component) and σ_L^2 is a parameter. The details regarding the solution of the CTCRW SDE are given in Appendix D.3.
Our model, which we denote CTCRW-T ('T' standing for 'terrain'), differs from the CTCRW model of Johnson et al. [2008] by incorporating the effect of terrain. We use a discretisation of the CTCRW SDE conditioned on the observations as the sequence of M_k's in the FK representation of the CTCRW-T. More specifically, we define

(21) M_k( · | x_{k−1}) := Law(X_{t_k} | X_{t_{k−1}} = x_{k−1}, Z = z),

where X_t stands for the state of the CTCRW model at time t, Z stands for all observations and z for their realised values. The distributions in (21) are Gaussian, and they can be computed as marginal and conditional distributions of the joint normal distribution (36) in Appendix C.
The CTCRW-T models terrain preference through its potentials G_k, which are of the form (16) with V(x) = −log(v_i) when x is in terrain i. We call the values v_i ∈ [0, 1], i = 1, . . . , K_T, 'terrain coefficients'; they induce the potential values for each of the K_T terrain types.
We apply the CTCRW-T model in a region of Finland containing lakes, plotted in the background of Figure 5. The colours of the background map depict the value of V, with black representing larger values, that is, lower potential. We define the terrain types based on the Corine Land Cover classification [Finnish Environment Institute SYKE, 2018], which classifies each 20 × 20 metre cell in Finland into one of five classes. The terrain types and their associated terrain coefficients (in parentheses) are 'Artificial surfaces' (0.2), 'Agricultural areas' (0.6), 'Forests and semi-natural areas' (0.5), 'Water bodies' (0.0) and 'Wetlands' (0.5). The terrain coefficient of 'Water bodies' is set to zero, since we want to constrain the movement to land only.
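Since V(x) = −log(v_i), one discretisation step of the path-integral potential reduces to v_i^{|∆_k|}, because exp(−|∆_k| · (−log v_i)) = v_i^{|∆_k|}. A sketch, with a hypothetical terrain lookup standing in for the 20 × 20 m Corine raster:

```python
def potential(x, dt, terrain_of, coeffs):
    """One-step potential G_k = exp(-dt * V(x)) with V(x) = -log v_i.

    terrain_of: function mapping a location x to a terrain label
                (hypothetical stand-in for the raster lookup);
    coeffs: terrain coefficients v_i in [0, 1].
    """
    v = coeffs[terrain_of(x)]
    return v ** dt   # equals exp(-dt * (-log v)); 0.0 ** dt == 0.0 for water

# Terrain coefficients from the experiment.
coeffs = {"artificial": 0.2, "agricultural": 0.6, "forest": 0.5,
          "water": 0.0, "wetland": 0.5}
```

Note that the hard 'Water bodies' constraint is recovered automatically: a zero coefficient yields a zero potential for any positive step length.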
With the potential map constructed this way, we set τ = 16 and hand-picked 16 observed locations in a clockwise pattern around the lakes, spacing the observation times equidistantly in time. The observed locations appear as crosses in Figure 5. The CTCRW model parameters β and σ were fit via maximum likelihood, and we set η = 50 and σ L = 50.
We then applied the CPF-BBS with systematic resampling, N = 16 and blocktime = 1.0 for 11000 iterations, discarding the first 1000 as burn-in. The discretisation mesh |∆_k| was set to 2^{−7}. The right pane of Figure 5 shows 250 of the simulated location trajectories from the CTCRW-T model. In comparison, the left pane shows trajectories simulated from the CTCRW model conditioned on the observed locations, simulated using (21). We observe that the trajectories simulated from the CTCRW-T model are influenced by the conditioning on the observations, while avoiding water bodies, as desired.
We also tested the performance of the CPF-BBS with the blocking sequence obtained using Algorithm 11 (using N = 512 and n = 25), as well as the CPF-BS, in this example. Here, the number of particles for Algorithm 11 had to be set somewhat higher to ensure that a sufficient number of particles end up in regions of positive potential (due to the hard constraint induced by 'Water bodies'). Figure 6 compares the three algorithms by plotting the IACT of the state variable L_t^(x). We also experimented with the above three algorithms using a higher value of |∆_k|, a situation where a greater discretisation error in the approximation (15) may be tolerated. We found that when |∆_k| was increased to 0.125, the resulting IACTs of L_t^(x) were similar between the three algorithms (see Figure 7 in the supplementary material).

Discussion
The methods presented in this paper make inference more efficient (and feasible) for an important class of statistical models, which includes hidden Markov models (HMMs) involving weakly informative observations and slowly mixing dynamics and, in particular, time-discretisations of continuous-time path integral models.
Our first contribution was to present two new conditional resampling algorithms for CPFs in such a context: the killing resampling ρ_kill, and the systematic resampling with mean partitioned weights ρ_syst. Based on our experiments, ρ_syst performs slightly better than ρ_kill, coinciding with the recent theoretical and empirical findings of Chopin et al. [2022] for the particle filter in a similar context. Based on our findings, we generally recommend using ρ_syst with the CPF in the weakly informative regime, although the simpler killing resampling ρ_kill can also be sufficient for most purposes.
Adaptive resampling [Liu and Chen, 1995] can also be used to reduce the number of resampling steps and thereby stabilise the particle filter in the weakly informative regime. Adaptive resampling has also been suggested in the context of the conditional particle filter [Lee, 2011, Algorithm 5.3]. However, it is not obvious how to implement a valid (bridge) backward sampling with adaptive resampling.
Our main contribution is a new CPF with bridge backward sampling (CPF-BBS), which may be regarded as a generalisation of the celebrated CPF with backward sampling (CPF-BS) [Whiteley, 2010]. The key ingredient of the CPF-BBS, which avoids the performance issues of the CPF-BS in the weak-potentials and slow-mixing context, is the bridging CPF step that updates the latent trajectory subject to a blocking sequence, which acts as a tuning parameter of the method. We presented a computationally cheap procedure for finding an appropriate blocking sequence, based on a proxy for the integrated autocorrelation time of the output Markov chain: the so-called probability of lower boundary update (PLU), which measures the probability that the bridging CPF updates the value at the block lower boundary. We derived an estimator for PLU, which we suggest using for blocking sequence tuning via Algorithm 11; the algorithm uses a small number of trial runs of the standard particle filter with ancestor tracing to estimate PLU prior to running the CPF-BBS.
The CPF-BBS is generally applicable, assuming that the conditional distributions M u|ℓ and M k|k−1,u related to the individual blocks (ℓ, u) may be computed (Assumption 7). In principle, it is always possible to choose such proposals M k , but careless choice might result in informative potentials G k and therefore poor performance. The contrary is also possible: with suitably chosen M k , the G k can be weakly informative, even if the HMM observations are informative. This can be achieved by designing M k and G k by suitable 'lookaheads' or 'twisting' [Guarniero et al., 2017], such as an approximate smoothing distribution from a Laplace approximation [cf. Vihola et al., 2020, Section 8.1].
The experiments suggest that our estimator for PLU is in good agreement with the true PLU. Algorithm 11, which finds an appropriate blocking automatically, showed promising behaviour in our experiments, leading to performance similar to 'hand-tuning' the blocking sequence. Using Algorithm 11 is easy: it only requires the user to specify the number of iterations and the number of particles used in the selection to obtain adequate performance 'out of the box'. In all of the examples we studied, we found 50 iterations to suffice for block selection, but we presume that the number of particles has to be chosen on a model-by-model basis.
The performance of the CPF-BBS in practice was promising: we found that the method can provide a substantial performance improvement over CPF-BS in the weak potentials and slowly mixing dynamics setting. This was particularly clear with our movement modelling experiment, which can be of independent interest in certain applications. For instance, our model could be a potentially useful alternative modelling approach for 'step-selection' type analyses for territorial animals.
We believe that PLU and the ideas behind the estimator we derived for it can be of interest in other contexts, too. In Section 6.2 we discussed the possibility of obtaining estimates of PLU for N ≠ N_0, where N_0 is the number of particles used for the necessary computations. We found empirically (results not reported) that the agreement between PLU and φ̂_PLU remains similar to that in Figure 3 when this alternative estimation procedure is used. This method could potentially be elaborated into a heuristic for choosing the number of particles N for the CPF-BBS. One potential way forward is to determine a 'cutoff level' for how large a PLU is 'large enough', and to choose the smallest N reaching this level. Further developments of these ideas are beyond the scope of the present paper.
In some applications relevant to the weakly informative context, the initial distribution M_1 can be diffuse (relative to the smoothing distribution), even an (improper) uniform measure. In such a case, the CPF, and also the CPF-BBS, will suffer from poor mixing, but there are relatively direct extensions that are applicable also with the CPF-BBS. Indeed, Fearnhead and Meligkotsidou [2016] discuss general state augmentations that can be useful, and a straightforward implementation is often possible in terms of M_1-reversible transitions [Karppinen and Vihola, 2021].
Multilevel Monte Carlo type methods [e.g. Vihola, 2018] have recently been used for inference with increasingly refined discretisations of continuous-time models. The CPF-BBS could be useful for devising multilevel estimators for path integral models.

We start by stating an easy lemma, whose proof is immediate.
Next we derive the conditional distribution of A^(1:N); a simple calculation yields that A^(1:N) ∼ ρ̃( · | g^(1:N)) may be drawn by first drawing B from the marginal distribution of A^(k).

Lemma 11. Suppose that ϖ is a permutation of [N], and ϖ* is a cyclic shift of ϖ, that is, ϖ*(i) = ϖ(σ_s(i)) for some s ∈ [N]. Then, it holds that A^{1:N} and A*^{1:N} have the same distribution.

Proof. Without loss of generality, we may consider the case s = 1 and ϖ(i) = i, in which case ϖ*(i) = σ_1(i). Define Ũ_i := ((U_i − w_1) mod 1), let j = arg min_i Ũ_i, and let Ũ*_i := Ũ_{σ_j(i)}. Observe that Ũ*^{1:N} and U^{1:N} have the same distribution, so the claim follows once we show that A^{1:N} = F^{−1}(U^{1:N}) and Ā*^{1:N} = σ_1(F_{σ_1}^{−1}(Ũ*^{1:N})) are equal, up to a cyclic shift. Indeed, we will see that for all i ∈ [N] and 0 ≤ k ≤ N − 1: σ_1(F_{σ_1}^{−1}(Ũ_i)) = k + 1 ⇐⇒ Ā_i = k + 1. Let us first assume k ≥ 1; then the expression on the left is equivalent to the one on the right, because F_{σ_1}(ℓ) = F(ℓ + 1) − w_1. Whenever U_i − w_1 ≥ 0, we have Ũ_i + w_1 = U_i, and the expression on the right simplifies to Ā_i = F^{−1}(U_i) = k + 1, as desired.
Then, by Lemma 11, these indices have the same distribution as the indices from systematic resampling with order ϖ, shifted by σ_C. In particular, note that Definition 1 (iii) holds for the latter.
Consider then the count of indices equal to i, N_i := #{j ∈ [N] : A_j = i}. Since A_j = i ⇐⇒ I_{σ_C(j)} = 1 and the indices I^{1:N} are ascending, N_i admits an explicit expression in which the max is zero in case the set is empty. The event N_i = n is equivalent to a condition on U and C, and we deduce that only two values of n have nonzero probability (for U ∈ (0, 1)), where r = N w_i − ⌊N w_i⌋. Furthermore, the conditional probabilities of the events N_i = n follow from a direct computation of the numerator.

Since
• P(N_i = ⌊N w_i⌋ + 1) = r and P(N_i = ⌊N w_i⌋) = 1 − r (from above),
• P(C = c | N_i = n) = 1/N (because C is independent of I^{1:N} and therefore of N_i),
• P(A_k = i | C = c, N_i = n) are deterministic, either zero or one, and precisely n of them are one,
it holds that P(A_k = i) = (⌊N w_i⌋ + r)/N = w_i. Observe also that the random variable U conditional on A_k = i and N_i = n has the density of U(0, r) if n = ⌊N w_i⌋ + 1 and of U(r, 1) if n = ⌊N w_i⌋. This follows since U is conditionally independent of the event A_k = i given N_i = n, because U depends on the event A_k = i only through N_i. In practice, we can simulate C from this distribution by first drawing C̃ ∼ U{1, . . . , n}. Algorithm 6 thus proceeds by first drawing n ∼ N_i | A_k = i. Then, c ∼ C | N_i = n, A_k = i is drawn, after which A^{1:N} (satisfying A_k = i) may be drawn by drawing Ā^{1:N} and shifting by c. □

Appendix B. Validity of CPF-BBS
We start with two auxiliary results about marginal distributions after partial ancestor tracing and a partial CPF. In what follows, we assume that G_k(x_{1:k}) = G_k(x_{k−1:k}) for k ∈ {2:T}. Using the definition of M_k as in Appendix A, let us fix some notation: for u = 1, . . . , T, denote by π̃_u^{(N)}(x_{1:u}, a_{1:u−1}, b_u) the law of the particle system up to time u, and let γ_{u:T}(x_{u:T}) := ∏_{k=u+1}^{T} M_k(x_k | x_{k−1}) G_k(x_{k−1:k}) with γ_{T:T}(x_T) ≡ 1; then the following define probability distributions µ_u^{(N)}(x_{1:u}, a_{1:u−1}, b_u, x*_{u+1:T}) for u = 1, . . . , T.

Lemma 12. Suppose that r^{(p,n)} is a valid conditional resampling scheme, with respect to the resampling r in Definition 1, and that (X_{1:u}, A_{1:u−1}, B_u, X*_{u+1:T}) ∼ µ_u^{(N)}.

Proof. In case (i), the joint density of all variables may be written, by Lemma 10 (ii), in a product form from which the result (i) follows as we marginalise x^(i). Adding the variables generated on lines 2-7 of Algorithm 8 leads to π̃. Thanks to Lemma 10 (ii), and because the fraction in (29) does not depend on b_ℓ, we may marginalise over b_ℓ, . . . , b_{u−1} and add the distribution of B̃_{u−1} ∼ Categ(ω). Introducing b̃_{ℓ:u−2} by AncestorTrace leads to the addition of the terms 1(b_{k−1} = ã_{k−1}). Then, calculations similar to the above, but in reverse order, lead to (29) with b_{ℓ:u−1} replaced by b̃_{ℓ:u−1}. The result follows by marginalising over X̃_{ℓ+1:u−1} and Ã_{ℓ+1:u−2}. □

Proof of Theorem 8. We start by observing that, by Theorem 2, (X_{1:T}, A_{1:T−1}, B_T) ∼ µ_T^{(N)}. The proof then relies on an iterative application of Lemma 12 (i) and (ii), with (ℓ, u) = (T_{L−1}, T_L), . . . , (T_1, T_2), which yields (X_1, X*_{2:T}, B_{1:T}) ∼ µ_1^{(N)}(x_1, b_1, x*_{2:T})/N^{T−1}, so that (X*_{1:T}, B_{1:T}) ∼ Z^{−1} γ_{1:T}(x_{1:T})/N^T = π(x_{1:T})/N^T. □

Appendix C.
Computing the conditional distributions in Assumption 7 for discretisations of linear SDEs. The practical application of Algorithm 7 requires, for each block, the computation of the conditional distributions M_{u|ℓ} and M_{k|k−1,u}, where k is a time index, and ℓ and u refer to the block lower and upper boundaries, respectively. This section discusses how these distributions may be computed when:
• M_{1:T} stem from a discretisation of a linear SDE;
• M_{1:T} stem from a discretisation of a linear SDE that is conditioned on a set of noisy linear-Gaussian observations.
Note that it is enough to consider only the second case, since the first may be obtained by omitting the conditioning on the observations (see the discussion at the end of this section).
Following [Särkkä and Solin, 2019], the conditional means and covariance matrices of the SDE (13) are given for t > s by

E[X_t | X_s = x_s] = expm(F(t − s)) x_s,
Cov[X_t | X_s = x_s] = ∫_s^t expm(F(t − u)) K K^T expm(F(t − u))^T du,

where expm denotes the matrix exponential. We introduce the notation

(32) T_{s,t} := expm(F(t − s)),    Q_{s,t} := Cov[X_t | X_s = x_s].
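One way to compute T_{s,t} and Q_{s,t} jointly is Van Loan's matrix-fraction trick; the sketch below is our illustration (assuming SciPy's `expm` is available), not necessarily how the paper's implementation proceeds:

```python
import numpy as np
from scipy.linalg import expm

def discretise_lti_sde(F, K, dt):
    """Transition matrix T = expm(F dt) and conditional covariance
    Q = int_0^dt expm(F s) K K' expm(F s)' ds, via Van Loan's block trick."""
    d = F.shape[0]
    H = np.zeros((2 * d, 2 * d))
    H[:d, :d] = -F
    H[:d, d:] = K @ K.T
    H[d:, d:] = F.T
    M = expm(H * dt)
    T = M[d:, d:].T      # = expm(F dt)
    Q = T @ M[:d, d:]    # = int_0^dt expm(F s) K K' expm(F s)' ds
    return T, Q
```

For a scalar Ornstein-Uhlenbeck process (F = −β, K = σ) this reproduces the familiar T = exp(−β dt) and Q = σ²(1 − exp(−2β dt))/(2β).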
Assuming a Gaussian initial distribution, we have

(33) X_{t_1} ∼ N(µ_init, Σ_init),    X_{t_k} | X_{t_{k−1}} = x_{t_{k−1}} ∼ N(T_{t_{k−1},t_k} x_{t_{k−1}}, Q_{t_{k−1},t_k}),    k = 2, . . . , T,

where the time discretisation corresponds to

(34) 0 = t_1 < t_2 < · · · < t_T = τ

as in Section 7, and where µ_init and Σ_init are the initial mean and covariance, respectively. Suppose then that there are observations Ỹ = (Ỹ_k)_{k=1,...,K_y} observed at times t̃_k, k = 1, . . . , K_y, where each observation time is one of the times in the discretisation (34). Further suppose that the Ỹ_k are distributed as

(35) Ỹ_k | X_{t̃_k} = x_{t̃_k} ∼ N(Z_k x_{t̃_k}, H_k),

where Z_k and H_k are the observation matrices and observation covariance matrices, respectively. We may then define the augmented observations Y = (Y_k)_{k=1,...,T} with times (34). The random vector Y is distributed like the observations Ỹ at their respective times, and has missing elements otherwise.
Consider then the joint distribution of X_{t_k}, X_{t_{k−1}} and X_{t_u} for k = ℓ + 1, . . . , u − 1, conditioned on the observations Y_{1:T} = y_{1:T}. Since all variables involved are jointly Gaussian, this conditional distribution is the joint normal (36), where we have used the notation

µ_{k|n} := E[X_{t_k} | Y_{1:n} = y_{1:n}],    Σ_{k|n} := Var[X_{t_k} | Y_{1:n} = y_{1:n}],    Σ_{p,s|n} := Cov[X_{t_p}, X_{t_s} | Y_{1:n} = y_{1:n}].
Here, conditioning on a missing observation should be understood as the observation being removed from the condition.
To obtain the (cross-)covariances in (36), the following backwards recursion for s = t − 1, t − 2, . . . from [de Jong and Mackinnon, 1988] may be used (with a matrix transpose applied to the result as needed):

(37) Σ_{s,t|T} = Σ_{s|s} T_{t_s,t_{s+1}}^T Σ_{s+1|s}^{−1} Σ_{s+1,t|T}.

An inspection of Equations (36) and (37) reveals that all the quantities required are computed routinely by the Kalman filter and smoother [cf. Durbin and Koopman, 2012] applied to the linear Gaussian state space model composed of (33) and (35). Note that the Kalman filter automatically handles any missing values in the observation sequence.
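The recursion (37) uses only quantities a Kalman filter/smoother already produces. The following scalar sketch (illustrative model and parameter values of our choosing) checks the recursion against brute-force conditioning of the joint Gaussian:

```python
import numpy as np

# Scalar model x_{k+1} = a x_k + N(0,q), y_k = x_k + N(0,r), x_1 ~ N(0,p0).
a, q, r, p0, T = 0.9, 0.5, 0.4, 1.0, 5
rng = np.random.default_rng(1)
y = rng.normal(size=T)

# Kalman filter, storing filtered and predicted variances.
m, P = 0.0, p0
Pf, Pp = np.zeros(T), np.zeros(T)          # P_{k|k} and P_{k|k-1}
for k in range(T):
    if k > 0:
        m, P = a * m, a * a * P + q        # predict
    Pp[k] = P
    Kg = P / (P + r)                       # Kalman gain
    m, P = m + Kg * (y[k] - m), (1 - Kg) * P
    Pf[k] = P

# RTS smoother for the marginal smoothed variances P_{k|T}.
Ps = Pf.copy()
for k in range(T - 2, -1, -1):
    G = Pf[k] * a / Pp[k + 1]              # smoother gain
    Ps[k] = Pf[k] + G * G * (Ps[k + 1] - Pp[k + 1])

# Cross-covariance recursion (37), scalar form:
# Sigma_{s,t|T} = P_{s|s} a / P_{s+1|s} * Sigma_{s+1,t|T}.
t = T - 1
cross = np.zeros(T)
cross[t] = Ps[t]
for s in range(t - 1, -1, -1):
    cross[s] = Pf[s] * a / Pp[s + 1] * cross[s + 1]

# Brute force: condition the joint Gaussian of x_{1:T} on y_{1:T} directly.
Pr = np.zeros((T, T))                      # prior covariance of x_{1:T}
var = p0
for i in range(T):
    if i > 0:
        var = a * a * var + q
    Pr[i, i] = var
    for j in range(i + 1, T):
        Pr[i, j] = Pr[j, i] = a ** (j - i) * var
post = np.linalg.inv(np.linalg.inv(Pr) + np.eye(T) / r)
assert np.allclose(cross, post[:, t])      # recursion matches brute force
```

The posterior covariance of a linear Gaussian model does not depend on the data, which is why the comparison needs only the variance recursions.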
By elementary properties of the Gaussian distribution, the distribution M_{u|ℓ}, that is, the law of X_{t_u} | X_{t_ℓ} = x_{t_ℓ}, Y_{1:T} = y_{1:T}, is Gaussian, and so is, for k = ℓ + 1, . . . , u − 1, the distribution M_{k|k−1,u}, that is, the law of X_{t_k} | X_{t_{k−1}} = x_{t_{k−1}}, X_{t_u} = x_{t_u}, Y_{1:T} = y_{1:T}; both are obtained by standard Gaussian conditioning of (36). In the case where M_{1:T} simply corresponds to a discretisation of the linear SDE (13), the above computations can be repeated with the conditional means, variances and covariances replaced by their unconditional counterparts. In practice, an easy way to compute the unconditional means and variances is to set all observations missing in the Kalman filter. The unconditional covariances can then be obtained from (37) as before.

Appendix D. Models
This section gives additional details related to the models appearing in Section 8.

D.1. CTCRW-P. The CTCRW-P SDE (17) may be placed into the form of the linear SDE (13) by a suitable choice of F and K. The expressions for T_{s,t} and Q_{s,t} in (32) are then given as follows. A direct computation yields (41) when β_v ≠ β_x; if β_v = β_x, the first element of the second row is replaced by t exp(−β_v t).
The transition matrix T s,t may be obtained from (41) by substituting t − s for t.
If β_v ≠ β_x, the elements q_ij, 1 ≤ i, j ≤ 2, of Q_{s,t} are given by (42). If β_v = β_x, the element q_11 remains as in (42), but the elements q_12 and q_22 become (43). Finally, the stationary covariance matrix S with elements s_ij used in the initial distribution of the CTCRW-P model is obtained by taking the limit (t − s) → ∞ in the previous equations; see (44)-(45).