Concave-Convex PDMP-based Sampling

Abstract Recently nonreversible samplers based on simulating piecewise deterministic Markov processes (PDMPs) have shown potential for efficient sampling in Bayesian inference problems. However, there remains a lack of guidance on how to best implement these algorithms. If implemented poorly, the computational costs of simulating event times can outweigh the statistical efficiency of the nonreversible dynamics. Drawing on the adaptive rejection literature, we propose the concave-convex adaptive thinning approach for simulating a piecewise deterministic Markov process, which we call CC-PDMP. This approach provides a general guide for constructing bounds that may be used to facilitate PDMP-based sampling. A key advantage of this method is its additive structure—adding concave-convex decompositions yields a concave-convex decomposition. This makes the construction of bounds modular, as given a concave-convex decomposition for a class of likelihoods and a family of priors, they can be combined to construct bounds for the posterior. We show that constructing our bounds is simple and leads to computationally efficient thinning. Our approach is well suited to local PDMP simulation where conditional independence of the target can be exploited for potentially huge computational gains. We provide an R package and compare with existing approaches to simulating events in the PDMP literature. Supplementary materials for this article are available online.


Introduction
Monte Carlo methods based on continuous-time Markov processes have shown potential for efficient sampling in Bayesian inference challenges (Goldman and Singh, 2021; Fearnhead et al., 2018). These new sampling algorithms are based on simulating piecewise-deterministic Markov processes (PDMPs). Such processes evolve deterministically between random event times, with possibly random dynamics at event times (see Davis, 1993, for an introduction to PDMPs). Monte Carlo methods based on simulating a PDMP with a desired target distribution of interest were first proposed in the computational physics literature, where they were motivated as examples of non-reversible Markov processes (Peters and de With, 2012; Michel et al., 2014). These ideas transitioned to the statistics community as an alternative to traditional Markov chain Monte Carlo (MCMC) samplers. Popular algorithms in this development include the Bouncy Particle Sampler (Bouchard-Côté et al., 2018) and the Zig-Zag sampler (Bierkens et al., 2019), amongst others (Vanetti et al., 2017; Wu and Robert, 2020; Michel et al., 2020; Bierkens et al., 2020).
There is substantial theoretical and empirical evidence to support the claim that non-reversible MCMC samplers offer more efficient sampling, as they reduce the chance of the sampler moving back to areas of the state space that have recently been visited (Diaconis et al., 2000; Bierkens and Roberts, 2017; Bouchard-Côté et al., 2018). Samplers based on PDMPs simulate from a density π(θ) by introducing a velocity component v which is used to evolve the position of the state, θ, with deterministic dynamics. At stochastic event times the velocity component is updated using a Markov kernel, after which the process continues with the new velocity. The main challenge in implementing a PDMP-based sampler is the simulation of the random event times. This involves simulating the next event time in a time-inhomogeneous Poisson process whose rate function depends on the current position of the sampler.
Simulation of event times in a Poisson process is often facilitated through a method known as thinning (Lewis and Shedler, 1979). The aim of this paper is to provide a general framework to aid practitioners in implementing PDMP-based samplers. Specifically, we introduce the concave-convex thinning approach to facilitate simulating a PDMP-based sampler (CC-PDMP). This method can be applied whenever the rate function of the PDMP can be decomposed into the sum of concave and convex functions, and can be used to facilitate thinning from polynomial rate functions, allowing for broad applicability of the method. We discuss the efficiency of the method compared to alternative approaches in the literature for exact sampling of event times.
Related to our work is the concave-convex adaptive rejection sampler of Görür and Teh (2011), which uses the same upper-bounding approach to construct bounds on the log-density within a rejection sampler that draws independent samples from univariate density functions. In contrast, by applying this technique within PDMP sampling, the bounds are constructed on the gradient of the log-density (i.e. the rate) and used within the non-reversible Markov process framework to generate samples from multivariate densities. Moreover, the CC-PDMP framework allows for both subsampling and local updating schemes to facilitate these methods for high-dimensional sampling. A short comparison between concave-convex adaptive rejection sampling and CC-PDMP for sampling from a univariate density is given in Appendix C. Some authors have proposed using automatic but approximate methods for simulating a PDMP-based sampler. Among these approaches, Cotter et al. (2020) propose numerical integration and root-finding algorithms to facilitate simulation. Others have considered using approximate local bounds which can be simulated exactly (Goldman and Singh, 2021; Goan and Fookes, 2021; Pakman et al., 2017). Both of these approaches sacrifice exact sampling of the posterior and involve a trade-off between the computational cost and the approximation of the sampling distribution.
The paper is organised as follows. In Section 2, we introduce the technical details of simulating from a PDMP-based sampler and the details of some popular samplers. Section 3 reviews the literature on thinning for PDMP-based samplers. We introduce the concave-convex PDMP approach for adaptive thinning in Section 4. Empirical evaluation of CC-PDMP, and comparison to existing methods, is given in Section 5. We conclude with a discussion of limitations and possible extensions of the method. Code for implementing our method and replicating our results is available at https://github.com/matt-sutton/ccpdmp.
2 Sampling using a PDMP

Piecewise deterministic Markov processes
Before we explore PDMP-based samplers, we first review PDMPs. A PDMP is a stochastic process, and we will denote its state at time t by z_t. There are three key components that define the dynamics of a PDMP:

1. A set of deterministic dynamics defined as z_{t+s} = ψ(z_t, s).
2. Event times that are driven by an inhomogeneous Poisson process with rate λ(z_t).
3. A Markov transition kernel q(z | z_{t−}), where z_{t−} := lim_{s↑t} z_s.
A PDMP with these specifications will evolve according to its deterministic dynamics between event times, and at event times will update according to the Markov transition kernel q(z | z_{t−}). The event times must be simulated from a Poisson process with rate λ(z_t), which depends on the position of the process at time t. The result of simulating this process is a PDMP skeleton {(t_k, z_{t_k})}_{k=1}^n, where for k = 1, ..., n, t_k denotes the kth event time and we have stored the state of the process immediately after this event time, z_{t_k}. Given this skeleton it is possible to fill in the values of the state between the events using the deterministic dynamics.

PDMP-based samplers
Assume we wish to construct a PDMP process to sample from a target distribution of interest, π(θ). Current PDMP-based samplers are defined on an augmented space z ∈ E, where E = X × V, that can be viewed as having a position, θ_t, and a velocity, v_t. As described below, the dynamics of the PDMP can be chosen so that the PDMP's invariant distribution has the form ν(z) = π(θ)p(v|θ), for some conditional distribution of the velocities, p(v|θ). As a result, the θ-marginal of the invariant distribution is the target distribution we wish to sample from. We will consider target distributions of the form π(θ) ∝ exp(−U(θ)), where θ ∈ R^p and U(θ) is the associated potential. An important feature of PDMP samplers is that they need to know the target distribution only up to a constant of proportionality.
If we simulate the PDMP, and the process converges to stationarity, then we can use the simulated skeleton of the PDMP to estimate expectations with respect to the target distribution in two ways. Assume we are interested in a Monte Carlo estimate of ∫ g(z)ν(z)dz. The first estimator averages over the continuous-time path of the PDMP,

(1/(t_n − t_1)) Σ_{k=1}^{n−1} ∫_0^{t_{k+1} − t_k} g(ψ(z_{t_k}, s)) ds.

This requires that the integral of g(ψ(z_{t_k}, s)) with respect to s may be computed. The second, simpler, approach is to "discretise" the PDMP path. This involves taking an integer M > 0, defining s = (t_n − t_1)/M, calculating the value of the state at discrete-time points, z_{t_1+s}, ..., z_{t_1+Ms}, and then using the standard Monte Carlo estimator

(1/M) Σ_{j=1}^M g(z_{t_1 + js}).

The points z_{t_1+js} may be found by evolving the process along its deterministic dynamics for the appropriate event-time interval.
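As a concrete illustration of the discretisation estimator, here is a minimal Python sketch (the paper's package is in R; the function and array layout below are our own illustrative choices), assuming linear dynamics θ_{t+s} = θ_t + s v_t:

```python
import numpy as np

def discretise_skeleton(times, positions, velocities, M):
    """Evaluate the PDMP state on an M-point uniform grid using the
    linear deterministic dynamics theta_{t+s} = theta_t + s * v_t."""
    t1, tn = times[0], times[-1]
    step = (tn - t1) / M
    grid = t1 + step * np.arange(1, M + 1)
    # index of the last event time <= each grid point
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 1)
    elapsed = grid - times[idx]
    return positions[idx] + elapsed[:, None] * velocities[idx]

# toy 1-d skeleton: events at t = 0, 1, 3 with velocities +1, -1, +1
times = np.array([0.0, 1.0, 3.0])
positions = np.array([[0.0], [1.0], [-1.0]])
velocities = np.array([[1.0], [-1.0], [1.0]])
samples = discretise_skeleton(times, positions, velocities, M=6)
```

The grid samples can then be passed to a standard Monte Carlo average of g.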

Bouncy particle sampler
The Bouncy Particle Sampler (Bouchard-Côté et al., 2018) takes the velocity space V to be either R^p or the unit sphere S^{p−1}, and targets an invariant distribution ν(z) = π(θ)p(v), where p(v) is either the standard p-dimensional Normal distribution or a uniform distribution on S^{p−1}. This sampler evolves a PDMP with linear deterministic dynamics φ((θ_t, v_t), s) = (θ_t + s v_t, v_t) between events. The canonical event rate is given by

λ_c(z_t) = max{0, ⟨v_t, ∇U(θ_t)⟩}.

At an event time t generated according to this rate the velocity is updated as

v_t = v_{t−} − 2 (⟨v_{t−}, ∇U(θ_t)⟩ / ||∇U(θ_t)||²) ∇U(θ_t).

This update can be interpreted as reflecting the velocity off the hyperplane tangential to the gradient of U(θ). Additionally, at random times an additional event occurs according to a homogeneous Poisson process with rate λ_ref > 0, in which the velocity is refreshed by drawing v_t ∼ p(v). Refreshment ensures that the Bouncy Particle Sampler samples the invariant distribution and does not get "stuck" on contours of the potential (Bouchard-Côté et al., 2018). Formally, the Bouncy Particle Sampler has event rate λ(z_t) = λ_c(z_t) + λ_ref, with a Markov transition kernel that applies the reflection above with probability λ_c(z_{t−})/λ(z_{t−}) and otherwise refreshes the velocity from p(v); here δ_a(A) denotes the Dirac delta function with point mass at a. The Bouncy Particle Sampler is an example of a global PDMP, as all components of the velocity v are changed according to the Markov transition kernel.
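The reflection at an event can be written as a one-line function. Below is a minimal Python sketch of this update (illustrative only; the paper's implementation is in R):

```python
import numpy as np

def bps_reflect(v, grad):
    """Reflect velocity v in the hyperplane orthogonal to grad = grad U(theta).
    The reflection preserves ||v|| and flips the sign of <v, grad>."""
    return v - 2.0 * (v @ grad) / (grad @ grad) * grad

v = np.array([1.0, 1.0])
grad = np.array([0.0, 2.0])
v_new = bps_reflect(v, grad)
```

A useful sanity check on any implementation is that the speed is unchanged and the rate ⟨v, ∇U⟩ changes sign at the event.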

Zig-Zag sampler
The Zig-Zag sampler (Bierkens et al., 2019, 2021a) takes the space of velocities, V, to be {−1, 1}^p, with p(v) = 2^{−p} uniform on this space. The Zig-Zag sampler also uses linear dynamics φ((θ_t, v_t), s) = (θ_t + s v_t, v_t) between events, but unlike the BPS only a single element of the velocity vector is updated at event times. The overall Zig-Zag process can be viewed as a composition of p local event rates, where each event rate has a corresponding Markov transition kernel which operates on a single component of the velocity. Specifically, the ith component of the velocity is updated with local event rate

λ_i(z_t) = max{0, v_i ∂U(θ_t)/∂θ_i}

for i = 1, ..., p. Simulating an event time in this local framework consists of simulating an event time for each local event rate and then taking the minimum time as the event time t for the overall process. At an event time t triggered by event rate λ_i the velocity is updated by flipping its ith component, v_i → −v_i. Once the velocity is updated for component i, local event times must be re-simulated for all rates λ_j(z_t) which are functions of θ_i. This fact can induce massive computational savings when there is conditional independence between the components of θ. To see this, consider the extreme case where the target is fully independent across dimensions. In this case each event rate, λ_i(z_t), will only depend on the ith components of θ_t and v_t. Thus at each event, it is only the time of the next event for the component whose velocity changes that needs to be recalculated. The computational cost of simulation in this case will scale as O(1) per event, as opposed to O(p) for the global Bouncy Particle Sampler.
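For a product of standard normals, U(θ) = Σ_i θ_i²/2, the Zig-Zag rates are max{0, v_iθ_i + t}, which are linear in t and can be inverted exactly, so no thinning is needed. Below is a hedged Python sketch of this special case (our own illustrative code, redrawing all clocks after each event rather than using the local trick described above; the paper's package is in R):

```python
import numpy as np

def zigzag_gaussian(theta0, n_events, rng):
    """Zig-Zag sampler for a product of standard normals.
    Component i has rate max{0, a_i + s} with a_i = v_i * theta_i, whose
    first event time solves int_0^tau max{0, a + s} ds = Exp(1) exactly."""
    p = len(theta0)
    theta = np.array(theta0, dtype=float)
    v = rng.choice([-1.0, 1.0], size=p)
    t = 0.0
    times, skeleton = [0.0], [theta.copy()]
    for _ in range(n_events):
        a = v * theta
        e = rng.exponential(size=p)
        # exact inversion: a > 0 gives sqrt(a^2 + 2e) - a;
        # a <= 0 gives a waiting period of -a, then sqrt(2e)
        tau = np.where(a > 0.0, np.sqrt(a**2 + 2.0 * e) - a, -a + np.sqrt(2.0 * e))
        i = int(np.argmin(tau))
        t += tau[i]
        theta = theta + tau[i] * v
        v[i] = -v[i]                      # flip only the triggering component
        times.append(t)
        skeleton.append(theta.copy())
    return np.array(times), np.array(skeleton)

rng = np.random.default_rng(1)
times, skeleton = zigzag_gaussian([0.0, 0.0], n_events=5000, rng=rng)
```

Redrawing every clock after each event is valid but wasteful; the local scheme in the text only re-simulates the clocks whose rates actually changed.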

Global and local PDMP methods
As described above, we can divide PDMP samplers into two main classes: global methods and local methods. These PDMPs differ in how they operate on the velocity vector when an event is triggered. Global methods are PDMP-based methods where the transition kernel acts on the entire velocity vector. By contrast, local methods are defined so that the transition kernel only affects a subset of the velocity. In this section we will give a general algorithm for constructing a local PDMP-based sampler (though see also Bouchard-Côté et al., 2018). Let v_[S] denote the sub-vector of v with elements indexed by S ⊆ {1, ..., p}, and v_[−S] denote the sub-vector of v without the elements indexed in S. For some set S, define the local kernel q_S^θ(v | v_{t−}) as one that only updates the sub-vector v_[S]. Suppose we have a partition of the components of θ_1, ..., θ_p given by S = {S_1, S_2, ..., S_F}. Subset S_f of v is updated with local kernel q_{S_f}^θ(v | v_{t−}) and associated event rate λ_f(z_t), for f = 1, ..., F. Simulating a local PDMP with this structure involves simulating an event time for each of the F rates and applying the local transition kernel corresponding to the smallest simulated event time. For the partition S, define the neighbourhoods N = {N_1, ..., N_F}, where m ∈ N_f if the rate of events associated with S_m depends on some element of θ_[S_f]. When an event is triggered for a local rate f, the velocity v_[S_f] is updated and new event times are simulated for rates which depend on the value of one or more θ_i for i ∈ S_f. This simulation process is summarised in Algorithm 1.
In this framework the Zig-Zag sampler is a local PDMP with partition S = {{1}, {2}, ..., {p}} and local kernels that flip the sign of the corresponding velocity component. If there is a single factor S = {S} containing all indices S = {1, ..., p}, we recover the global Bouncy Particle Sampler.
Algorithm 1: Simulating a PDMP with local structure.

Algorithms for event-time simulation
Simulating the first event time, τ, from a Poisson process with event rate λ(t) is equivalent to simulating u ∼ Uniform[0, 1] and solving for τ in

∫_0^τ λ(s) ds = −log(u).

If the event rate is sufficiently simple this can be done exactly. For example, if the rate is constant or linear then there are analytical solutions for τ in terms of u. If the rate function is convex, a solution may be found using numerical methods (Bouchard-Côté et al., 2018). For more complex functions, the event time can be simulated by a process known as thinning. This process makes use of Theorem 1.

Theorem 1 (Lewis and Shedler (1979)). Let λ(t) and λ̄(t) be continuous functions satisfying λ(t) ≤ λ̄(t) for all t. Suppose the proposed event times τ̄_1, τ̄_2, ... are simulated from a Poisson process with rate λ̄(t), and each proposal τ̄_i is accepted independently with probability λ(τ̄_i)/λ̄(τ̄_i). Then the accepted times form a realisation of a Poisson process with rate λ(t).
The efficiency of simulating via thinning depends on how tightly the rate λ̄ upper-bounds λ and how costly it is to simulate from λ̄. Another key tool for enabling the simulation of event times is known as superposition; the general result is stated in Theorem 2.
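Thinning with a constant bound over a finite window can be sketched in a few lines of Python (illustrative only; the rate λ(t) = t and the bound λ̄ = 3 on [0, 3) are hypothetical choices for the example):

```python
import random

def first_event_by_thinning(lam, lam_bar, horizon, rng):
    """First event of an inhomogeneous Poisson process with rate lam(t)
    on [0, horizon), by thinning a homogeneous proposal process with
    constant rate lam_bar >= lam(t) on the window (Theorem 1)."""
    t = 0.0
    while True:
        t += rng.expovariate(lam_bar)        # next proposed event time
        if t >= horizon:
            return None                      # no event in this window
        if rng.random() < lam(t) / lam_bar:  # accept with prob lam(t)/lam_bar
            return t

rng = random.Random(42)
# rate lam(t) = t, so P(T > t) = exp(-t^2 / 2)
draws = [first_event_by_thinning(lambda t: t, 3.0, 3.0, rng) for _ in range(20000)]
events = [d for d in draws if d is not None]
```

When no event occurs before the horizon, a PDMP implementation would evolve the process to the horizon and repeat with a fresh bound, exactly as described for CC-PDMP later in the paper.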
Theorem 2 (Kingman (1992)). Suppose that Λ_1, ..., Λ_n are a set of independent Poisson processes with rates λ_1(t), ..., λ_n(t) and first arrival times τ_[1], ..., τ_[n]. The Poisson process with rate λ(t) = Σ_{i=1}^n λ_i(t) may be simulated by returning the first arrival time as τ = min_{i=1,...,n} τ_[i].

Both superposition and thinning provide useful tools in the simulation of event times from a non-homogeneous Poisson process. However, constructing the upper-bounds required for thinning is largely an ad hoc and time-consuming process for the practitioner. A common desire for a practitioner is to reuse simulation knowledge. For example, if an event rate can be written as the sum of several sub-event rates that can be exactly simulated, ideally this knowledge should be used to simulate from the sum. We refer to this class of event rates as additive rates. Specifically, we will refer to a Poisson process with a rate of the form λ(t) = max{0, f_1(t) + f_2(t)} as a process with an additive rate. The superposition method from Theorem 2 facilitates additive event-time simulation by thinning using the event rate λ̄(t) = max{0, f_1(t)} + max{0, f_2(t)}, which satisfies λ(t) ≤ λ̄(t). We can simulate the first event from a process with this rate as τ = min(τ_1, τ_2), where τ_1 is simulated with rate λ̄_1 = max{0, f_1(t)} and τ_2 with rate λ̄_2 = max{0, f_2(t)}. This simulated time will be accepted with probability

max{0, f_1(τ) + f_2(τ)} / (max{0, f_1(τ)} + max{0, f_2(τ)}).

Figure 1: Convex bounds are constructed using linear segments connecting evaluations of f_u(t).
We slightly abuse notation and write λ for the event rate whether we are viewing it as a function of the state or, as here, as a function of time. Suppose λ(t) = max{0, f(t)}, where f(t) can be decomposed as

f(t) = f_u(t) + f_n(t),

with f_u(t) convex and f_n(t) concave. If the event is rejected, we can reuse the information from the evaluations of f_u(τ) and f_n(τ), with an additional evaluation of the derivative f_n'(τ), to refine the simulation on the abscissae. If an event does not occur on the range t ∈ [0, τ_max), the PDMP process is evolved by τ_max and the thinning process is repeated. We give the general construction of the piecewise linear function ℓ(t) for t ∈ [t_1, t_2), and note this construction may be applied iteratively over the abscissae. This construction is also depicted visually in Figure 1. A convex function f_u(t) is, by definition, a function that can be upper-bounded by the line segment connecting any two function evaluations. So, for any t ∈ [t_1, t_2), we have the upper-bound f_u(t) ≤ ℓ_u(t), where

ℓ_u(t) = f_u(t_1) + (f_u(t_2) − f_u(t_1)) (t − t_1)/(t_2 − t_1).

For a concave function f_n(t) with derivative f_n'(t), the line segment corresponding to the tangent of f_n(t) will upper-bound the function. Thus f_n(t) will be upper-bounded by

ℓ_n(t) = min{ f_n(t_1) + f_n'(t_1)(t − t_1), f_n(t_2) + f_n'(t_2)(t − t_2) }

for t ∈ [t_1, t_2). This minimum will switch at the point of intersection, t*, between the lines f_n(t_1) + f_n'(t_1)(t − t_1) and f_n(t_2) + f_n'(t_2)(t − t_2). It is simple to find this intersection point (Görür and Teh, 2011), which will lie in the interval [t_1, t_2] provided the derivatives f_n'(t_1) and f_n'(t_2) are not equal. If f_n'(t_1) = f_n'(t_2), the linear segment will not change over the interval and we take t* = t_2. So we take the first tangent on [t_1, t*) and the second on [t*, t_2), and combining these bounds we have ℓ(t) = ℓ_u(t) + ℓ_n(t), which is a piecewise linear function upper-bounding f(t) ≤ ℓ(t) for t ∈ [t_1, t_2).
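The construction above can be sketched in a few lines of Python; the decomposition f_u(t) = t², f_n(t) = −t³ on [0, 1] is our own illustrative example, not one from the paper:

```python
import numpy as np

def cc_bound(f_u, f_n, dfn, t1, t2):
    """Piecewise-linear upper bound for f = f_u + f_n on [t1, t2]:
    a chord over the convex part f_u, plus the minimum of the two
    endpoint tangents over the concave part f_n.  Returns ell(t)."""
    chord = lambda t: f_u(t1) + (f_u(t2) - f_u(t1)) * (t - t1) / (t2 - t1)
    tang1 = lambda t: f_n(t1) + dfn(t1) * (t - t1)
    tang2 = lambda t: f_n(t2) + dfn(t2) * (t - t2)
    return lambda t: chord(t) + np.minimum(tang1(t), tang2(t))

# example: f(t) = t^2 - t^3, with f_u(t) = t^2 convex and
# f_n(t) = -t^3 concave on t >= 0
ell = cc_bound(lambda t: t**2, lambda t: -t**3, lambda t: -3 * t**2, 0.0, 1.0)
ts = np.linspace(0.0, 1.0, 101)
gap = ell(ts) - (ts**2 - ts**3)   # bound minus function, nonnegative on [0, 1]
```

The chord touches f_u at both abscissae and the tangents touch f_n at each abscissa, so the bound is tight at the endpoints.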

Concave-convex decompositions
The proposition below gives simple conditions for the class of non-homogeneous Poisson processes that admit thinning using the concave-convex approach.
Proposition 1. If a Poisson process has a rate function that can be written as λ(t) = max{0, f(t)}, where f(t) is continuous except possibly at a finite number of known points t_1 < t_2 < · · · < t_m < ∞, then the process admits thinning using the concave-convex adaptive thinning approach.
Proof. Consider f(t) on the interval [0, τ_max), where τ_max = t_1 if t_1 exists and otherwise τ_max is some arbitrary value τ_max > 0. On this interval f(t) is continuous and, by the Stone-Weierstrass theorem (Stone, 1948), for any ε > 0 there exists a polynomial g(t) = Σ_i a_i t^i on the interval [0, τ_max] with ||f − g||_∞ < ε. The function g is a polynomial, so it admits the concave-convex decomposition with convex function g_u(t) = Σ_{i: a_i > 0} a_i t^i and concave function g_n(t) = Σ_{i: a_i < 0} a_i t^i. The process with rate λ(t) admits thinning on the interval [0, τ_max) using the Poisson process with rate λ̄(t) = max{0, g(t) + ε}, which has a concave-convex decomposition. If an event does not occur on the interval [0, τ_max), then the process is considered on the next continuous interval.
The proof of Proposition 1 relies on finding a polynomial that can upper-bound the process over a finite interval. If a bound can be given for higher-order derivatives of f(t), then there are two main approaches for finding an upper-bounding polynomial. The first uses the Taylor expansion of f(t) about zero: if there is a known constant M such that |f^(k+1)(t)| ≤ M, we can employ thinning based on the polynomial

Σ_{i=0}^k f^(i)(0) t^i/i! + M t^{k+1}/(k+1)!.

Alternatively, polynomial upper-bounds can be constructed based on interpolation, where the error is controlled. For example, Lagrange polynomial interpolation on [0, τ_max] with k + 1 evaluations of the function has error bounded by τ_max^{k+1} M/(k+1)!. Adding this bound as a constant yields an upper-bounding polynomial without requiring evaluation of the derivatives of f(t), as the Taylor expansion does.

Concave-convex adaptive thinning and PDMP-based samplers
We will refer to the process of using concave-convex adaptive thinning for simulating a PDMP-based sampler as CC-PDMP sampling. In this section we illustrate how CC-PDMP sampling is applied when the target distribution corresponds to a Bayesian posterior,

π(θ) ∝ exp(−U^(ℓ)(θ) − U^(p)(θ)),

where U^(ℓ)(θ) is the potential of the likelihood and U^(p)(θ) = −log p_0(θ) is the potential for the prior distribution p_0. Consider the kth rate for a fully local PDMP such as the Zig-Zag sampler, λ_k(t) = max{0, f_k(t)}. Here we could apply concave-convex thinning with f_k(t) = f_k^(ℓ)(t) + f_k^(p)(t), the sum of likelihood and prior parts: if f_k^(ℓ) and f_k^(p) have concave-convex decompositions, then f_k has a concave-convex decomposition. This allows for reuse of bounding knowledge; priors and likelihoods can be swapped provided their decompositions are known. Moreover, to implement this fully local method the rates λ_k(t) = max{0, f_k(t)} must have a concave-convex decomposition for k = 1, ..., p, so we can reuse this information to find a decomposition for a global PDMP-based sampler with rate

λ(t) = max{0, Σ_{k=1}^p f_k(t)}.

Terms can be trivially regrouped to facilitate simulation from any choice of local PDMP structure, as described in Section 2.5.
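For polynomial rates, this additive structure can be illustrated by representing a decomposition as a pair of coefficient vectors and summing them. The Python sketch below (function names are ours, not the ccpdmp API) splits coefficients by sign as in Proposition 1:

```python
import numpy as np

def cc_split(coefs):
    """Concave-convex decomposition of a polynomial on t >= 0
    (as in Proposition 1): positive coefficients form the convex part,
    negative coefficients the concave part.  coefs[i] multiplies t**i."""
    c = np.asarray(coefs, dtype=float)
    return np.where(c > 0, c, 0.0), np.where(c < 0, c, 0.0)

def cc_add(dec1, dec2):
    """Adding decompositions gives a decomposition of the summed rate."""
    (u1, n1), (u2, n2) = dec1, dec2
    k = max(len(u1), len(u2))
    pad = lambda a: np.pad(a, (0, k - len(a)))
    return pad(u1) + pad(u2), pad(n1) + pad(n2)

prior_dec = cc_split([0.0, 1.0])             # hypothetical prior part: f(t) = t
lik_dec = cc_split([2.0, -3.0, 1.0])         # hypothetical likelihood part: 2 - 3t + t^2
post_u, post_n = cc_add(prior_dec, lik_dec)  # decomposition of the posterior rate
```

Swapping in a different prior only changes `prior_dec`; the likelihood decomposition is reused unchanged, which is the modularity the text describes.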

Choice of abscissae
A key user choice in CC-PDMP sampling is the position and number of abscissae on the interval [0, τ_max). Suppose that CC-PDMP simulation is implemented with abscissae t_0 = 0 < t_1 < · · · < t_n = τ_max. If an event does not occur on the interval [0, τ_max), the PDMP will be evolved by τ_max and the thinning process will repeat, where the function evaluations at τ_max can be reused at t_0 = 0. Thus one simulation with n abscissae is computationally equivalent to n evolutions of the CC-PDMP approach using abscissae with only two points, t_0 = 0 and t_1 = τ_max. This encourages choosing a minimal number of abscissae and instead choosing the length of the interval more carefully.
There are two main issues that can occur when tuning the parameter τ_max. If τ_max is too small, then proposals will frequently lie outside the interval [0, τ_max), and simulating an event will require many iterations of bounding the event rate. Whilst if τ_max is too large, the bound on the rate may be poor, leading to many proposed events being rejected. In this article, we refer to an iteration of the CC-PDMP that does not generate an event time (i.e. a rejected proposal, or no proposal on the interval) as a shadow event. The total number of iterations is the sum of the number of events and shadow events. Efficiency is measured as the proportion of iterations that are events rather than shadow events, calculated as the ratio of the number of events to the number of iterations.
Adapting the value of τ_max will not change the sampling dynamics. Consequently, unlike in adaptive MCMC, this parameter may be adapted throughout the course of simulating the PDMP. Ideally this parameter should be a little larger than the average event time, so that events are proposed on the interval. A simple automatic approach is to set τ_max equal to the qth percentile of previously simulated event times, updating τ_max every 100 iterations of the sampler. We found that setting q = 80 worked well for automatically selecting τ_max.
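The adaptive rule can be sketched as follows; the class below is a Python illustration (names, window size and defaults are our own choices, not the ccpdmp implementation):

```python
from collections import deque

class TauMaxAdapter:
    """Adapt tau_max to the q-th percentile of recent inter-event times,
    refreshing every `every` iterations (a sketch of the rule described
    in the text)."""
    def __init__(self, tau_init=1.0, q=80, window=1000, every=100):
        self.tau_max, self.q, self.every = tau_init, q, every
        self.times = deque(maxlen=window)
        self.count = 0

    def update(self, event_time):
        self.times.append(event_time)
        self.count += 1
        if self.count % self.every == 0:
            ordered = sorted(self.times)
            # integer arithmetic for the q-th percentile index
            idx = min(len(ordered) - 1, (self.q * len(ordered) + 99) // 100 - 1)
            self.tau_max = ordered[idx]
        return self.tau_max

adapter = TauMaxAdapter(tau_init=1.0)
for k in range(1, 201):
    tau = adapter.update(k * 0.01)   # feed in dummy event times 0.01, ..., 2.0
```

Because adapting τ_max leaves the invariant distribution untouched, this update can run continuously, with no burn-in or diminishing-adaptation conditions.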
We investigate the dependency on this tuning parameter by implementing the Zig-Zag sampler on the Banana distribution. The 2-dimensional Banana distribution has potential

U(θ) = (θ_1 − 1)² + κ(θ_2 − θ_1²)²,

where κ controls how much mass concentrates around the region θ_2 ≈ θ_1².
Since the potential is polynomial in both parameters, it is trivial to see that the Zig-Zag rate functions will also be polynomial. Specifically, the rates will be polynomial functions of time t, of degree 3 and 2 for parameters θ_1 and θ_2 respectively. Further implementation details may be found in the supplementary material and associated GitHub repository.

Figure 2 shows the efficiency of the Zig-Zag sampler on the Banana distribution with κ = 1 when changing the interval length τ_max from 0.1 to 4 with two abscissae. The Zig-Zag algorithm was simulated for 10000 events at each value of τ_max. The effect of selecting τ_max too small or too large is clearly seen, and the red line shows the performance of the adaptive approach. While it is possible to get low efficiency if the parameter τ_max is chosen poorly, we have found that either using the adaptive approach or a short pilot run of the sampler to tune τ_max gives reliable performance.

Figure 2: Efficiency, defined as the proportion of iterations that are events rather than shadow events, for the Zig-Zag sampler using CC-PDMP thinning with varying choices of τ_max on the Banana target. The red line shows the performance of an automatic choice for this parameter.

Choice of decomposition
The choice of concave-convex decomposition is not unique (Görür and Teh, 2011). Arbitrary convex functions can be added to f_u(t) and subtracted from f_n(t) to give a new valid decomposition. A concave-convex decomposition is minimal if there is no non-affine convex function that can be added to f_n(t) and subtracted from f_u(t) while preserving the convexity of the decomposition (Hartman, 1959). Consider the function f(t) = −t³ + 3t² − 3t + 3, for which f''(t) = −6t + 6. It is simple to see that the function is convex for t < 1 and concave for t > 1. A minimal decomposition is given by the piecewise functions f_u(t) = f(t) for t ≤ 1 and f_u(t) = f(1) for t > 1, with f_n(t) = f(t) − f_u(t). Görür and Teh (2011) give a general construction for a minimal concave-convex decomposition.
However, this construction relies on finding all points of inflection, which are points where the function changes convexity. Alternatively, Proposition 1 gives the simple decomposition f_u(t) = 3t² + 3 and f_n(t) = −t³ − 3t. While the decomposition from Proposition 1 is not minimal, it does not require finding all points of inflection. These two decompositions are shown in Figure 3, using abscissae at 0 and 1. The minimal decomposition gives a tighter bound, here reducing the bounding rate by approximately 0.33 on average across the interval. However, using this bound comes with the additional computational cost of finding inflection points, which has the potential to reduce the overall efficiency of the method. Our experience is that the efficiency gains from using the minimal decomposition are generally not sufficient for it to be beneficial.

Experiments
We now present an empirical evaluation of CC-PDMP and a comparison with other approaches to simulating PDMPs. Our experiments were implemented using the R package ccpdmp, available at https://github.com/matt-sutton/ccpdmp. The package enables the user to simulate a PDMP provided they specify the concave and convex decomposition of the rate function. If the rate function is polynomial (or bounded by a polynomial) the practitioner may pass this instead, and the concave-convex decomposition will be handled internally. The package contains some basic example use cases and code to reproduce the experiments. Code snippets are given in the additional material.

Application in generalised linear models
Generalised linear models (GLMs) provide a rich class of models that are frequently used in Bayesian inference. Let {(y_i, x_i)}_{i=1}^n be the observed data, where y_i is an observed response and x_i ∈ R^p is a vector of associated covariates for i = 1, ..., n. The expected value of y_i is modelled by g^{−1}(x_i^T θ), where g^{−1} : R → R is the inverse link function. The potential of the likelihood has the form

U^(ℓ)(θ) = −Σ_{i=1}^n φ(x_i^T θ, y_i),

where φ : R² → R is the GLM mapping function φ(a, y) = log p(y | g^{−1}(a)), returning the log-likelihood for observation y given a. The partial derivative with respect to θ_k has the form

∂U^(ℓ)(θ)/∂θ_k = −Σ_{i=1}^n x_{ik} φ'(x_i^T θ, y_i),

where φ'(a, y) = ∂φ(a, y)/∂a, and higher-order derivatives are defined similarly. Over the time interval this gives

f_k(t) = −v_k Σ_{i=1}^n x_{ik} φ'(a_i(t), y_i),

where a_i(t) = x_i^T(θ + tv). We can use this to define local rates as λ_k(t) = max{0, f_k(t)} for k = 1, ..., p. For GLMs, repeated application of the chain rule yields

f_k^(j)(t) = −v_k Σ_{i=1}^n x_{ik} (x_i^T v)^j φ^(j+1)(a_i(t), y_i),

where ∂a_i/∂t = x_i^T v and f_k^(j) denotes the jth derivative of f_k(t). When there are bounds on φ^(j) we can use the upper-bounding Taylor polynomial. In Appendix B we provide bounds for several modelling choices.
To date, only linear bounds for logistic regression have been used for thinning (Bierkens et al., 2018; Bouchard-Côté et al., 2018). These are based on the bound |φ''(a, y)| ≤ 1/4. Table 1 shows the efficiency for thinning using polynomials of order 1-3 (using the bounds |φ''(a, y)| ≤ 1/4, |φ'''(a, y)| ≤ 1/(6√3) and |φ^(4)(a, y)| ≤ 1/8). The linear first-order bound matches the bound used in Bierkens et al. (2018) for logistic regression with the Zig-Zag sampler. Table 1 shows that the higher-order polynomial thinning facilitated by CC-PDMP allows more efficient thinning. For the linear bounds (polynomial order 1) we used a fixed τ_max = 1, as linear proposals are exactly simulated using the concave-convex approach. For the other polynomial orders we used the adaptive procedure described in the tuning section to select τ_max.
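These derivative bounds can be checked numerically for the logistic model, where φ(a, y) = ya − log(1 + e^a) and the derivatives beyond the first do not depend on y. A small Python sketch (our own check, not code from the paper):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# For phi(a, y) = y*a - log(1 + exp(a)), with s = sigmoid(a):
#   phi''(a)   = -s(1 - s),                  |phi''|   <= 1/4
#   phi'''(a)  = -s(1 - s)(1 - 2s),          |phi'''|  <= 1/(6*sqrt(3))
#   phi''''(a) = -s(1 - s)(1 - 6s + 6s**2),  |phi''''| <= 1/8
a = np.linspace(-10, 10, 20001)
s = sigmoid(a)
d2 = -s * (1 - s)
d3 = -s * (1 - s) * (1 - 2 * s)
d4 = -s * (1 - s) * (1 - 6 * s + 6 * s**2)
```

The maxima of |φ''| and |φ^(4)| occur at a = 0, and the bound on |φ'''| is attained at s = (1 ± 1/√3)/2, matching the constants quoted above.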

Comparison to thinning via superposition
In this example we demonstrate the advantages of CC-PDMP when the event rate is additive. In particular, we compare thinning using the CC-PDMP approach with thinning using superposition, as outlined in Section 3. Let {y_i}_{i=1}^n be the observed data with model y_i ∼ Poisson(exp(θ_i)), independently for i = 1, ..., n, with independent standard Normal priors on the θ_i. The derivative of the potential is

∂U(θ)/∂θ_k = exp(θ_k) − y_k + θ_k,

so that f_k(t) = v_k exp(θ_k + v_k t) + v_k(θ_k + v_k t − y_k). This has a concave-convex decomposition in which the exponential term is convex when v_k > 0 and concave when v_k < 0, with the linear term assigned to either part. The global Bouncy Particle Sampler rate can be defined as λ(t) = max{0, Σ_{k=1}^p f_k(t)}, which has a concave-convex decomposition defined by the decompositions of the individual f_k functions. In contrast, thinning via superposition involves simulating an event time for the linear and exponential terms of the rate individually; the proposed event time is the minimum of all simulated times. This approach to simulating event times for exponential likelihoods and Poisson-Gaussian Markov random fields has been used in previous PDMP implementations (Bouchard-Côté et al., 2018).

Figure 4 compares the thinning and overall computational efficiency of the Bouncy Particle Sampler applied to the Poisson likelihood problem. In the CC-PDMP approach we used the adaptive update for τ_max described in the tuning section. As the dimension increases, we see that the proportion of iterations that are events using superposition drops quickly; consequently the overall computation time for this approach scales poorly. The poor performance of the superposition approach with increasing dimension is the result of the large number of exponential terms, v_k exp(θ_k + v_k t), in the event rate. The thinning acceptance probability is the ratio of max{0, Σ_k f_k(τ)} to the sum of the positive parts of the individual linear and exponential terms. When v_k < 0 these exponential terms contribute zero to the denominator of the superposition approach. In comparison, the CC-PDMP approach uses a linear upper-bound on these exponential terms which can be negative and so reduce the denominator, leading to more efficient thinning proposals. This is seen in Figure 4, where the thinning efficiency remains roughly constant with increasing dimension.

Local Methods
Local PDMP methods take advantage of conditional independence between parameters. Consider the following extension to the previous Poisson example: y_i ∼ Poisson(exp(θ_i)), independently for i = 1, ..., n, where the prior on θ corresponds to an AR(1) process with correlation parameter ρ = 0.5. The partial derivative of the potential has the same linear and exponential form as the rate in Section 5.2, so we can use an analogous concave-convex decomposition for this rate. In this section we consider local PDMP implementations. The CC-PDMP implementation facilitates simple construction of thinning bounds for local PDMP factorisations. For the partition S = {S_1, ..., S_F}, the local rate for factor f is

λ_f(t) = max{0, Σ_{k ∈ S_f} f_k(t)},

for f = 1, ..., F. We consider a number of local decompositions of the form S^(j) = {S_1, ..., S_F}, where S_1 = {1, 2, ..., j}, S_2 = {j + 1, j + 2, ..., 2j}, ..., S_F = {p − j + 1, ..., p − 1, p}.
If p is not a multiple of j, the final factor will have fewer elements than the previous factors. For these decompositions, the conditional independence between parameters gives neighbourhoods N = {N_1, ..., N_F} with N_f = {f − 1, f, f + 1} (truncated at the boundaries), since the AR(1) prior only induces dependence between adjacent components. This local decomposition offers computational advantages, since updating a single rate requires re-simulation of at most three rates regardless of the dimension of the problem. However, this computational efficiency comes at a loss of statistical efficiency. In particular, a global PDMP will be able to move further in stochastic time than local methods before requiring a new event simulation, as the global rate function will lower-bound the local rates. The overall efficiency of the PDMP method should therefore be considered in terms of both its computational and statistical efficiency.
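The block partitions S^(j) and their chain-structured neighbour sets can be computed as follows (a Python sketch with illustrative function names):

```python
def block_partition(p, j):
    """Contiguous blocks S_1 = {1..j}, S_2 = {j+1..2j}, ... (1-indexed);
    the final block is shorter when p is not a multiple of j."""
    return [list(range(s, min(s + j - 1, p) + 1)) for s in range(1, p + 1, j)]

def chain_neighbours(F):
    """Neighbour sets N_f = {f-1, f, f+1} for a chain-structured (AR(1))
    prior, truncated at the boundaries."""
    return [[m for m in (f - 1, f, f + 1) if 1 <= m <= F]
            for f in range(1, F + 1)]

blocks = block_partition(p=10, j=4)   # [[1..4], [5..8], [9, 10]]
nbrs = chain_neighbours(len(blocks))  # [[1, 2], [1, 2, 3], [2, 3]]
```

For a general target, the neighbour sets would instead be read off the sparsity pattern of the potential; the chain rule here is a special case of that construction.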
In Figure 5 we investigate the scaling performance of local methods in comparison with alternative state-of-the-art MCMC methods. In particular, we compare local PDMP methods with well-tuned implementations of the Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo (HMC). MALA is tuned to obtain an acceptance rate of approximately 0.5 and HMC is tuned to an acceptance probability of approximately 0.6. For MALA this involves scaling the variance of the proposal, and for HMC this involves adjusting the number of leapfrog steps per iteration. These implementations are in line with theoretical scaling results when the parameters of the distribution are all independent. As expected, the computational efficiency of the local methods remains (approximately) constant with increasing dimension. Based on the fitted models shown alongside the legend, the computational efficiency of both MALA and the Bouncy Particle Sampler decays approximately as O(d^{-1}), while that of HMC decays faster than O(d^{-1}) owing to the increasing number of leapfrog steps. The statistical efficiency of the local methods improves with larger local factors and is highest for the global Bouncy Particle Sampler. The statistical efficiency of MALA drops at a rate roughly equal to O(d^{-1/3}). On the log scale, overall efficiency is the sum of the log computational and log statistical efficiencies. The Zig-Zag and local Bouncy Particle Sampler methods attain the best overall efficiency scaling rates, around O(d) or better. There is a clear trade-off between the statistical efficiency of larger local factors and their increased computational cost. The factorisation can thus be viewed as an additional tuning choice for a PDMP implementation, one that is easy to explore using the CC-PDMP approach to thinning. Despite poorer scaling with dimension, HMC remains competitive with the local methods up to very high dimensions.

Figure 5: Log-log breakdown of computational, statistical and overall empirical efficiency scaling with dimension. The ESS values are calculated with respect to the first coordinate θ_1 using the coda R package on a discretised trajectory of the PDMPs. Plotted are the average rates and error bars for all methods, calculated from 50 repeated runs. The legend shows the slope and intercept fitted for each method, giving empirical evidence for the scaling rates.

Conclusion
PDMP-based samplers have shown advantages over traditional MCMC samplers, but their use has been limited by the perceived difficulty of simulating the PDMP dynamics. We have introduced CC-PDMP as a general approach that can simulate a PDMP provided we can specify a concave-convex decomposition of the event rate. This method has broad applicability, enables simple implementation of local PDMP methods, and empirically outperforms alternative simulation approaches. Further generalisations of the CC-PDMP approach are possible by drawing on other ideas from adaptive rejection sampling. For example, to bound the concave term f_n(t) of the rate function we require that the derivative f_n'(t) is known on the interval t ∈ [0, τ_max); we could instead use a looser bound that does not require this derivative information, based on the adaptive rejection bounds proposed in Gilks (1992). If the rate function is particularly computationally intensive, a lower bound based on the abscissae at which f_u and f_n are evaluated may be used to perform early rejection in the thinning. Additional generalisations may also be found in the difference-of-convex-functions programming literature (Le Thi and Pham Dinh, 2018). A useful approach to finding a decomposition was to construct a polynomial approximation that upper-bounds the rate. An approximate version of our algorithm could be implemented in which this polynomial approximation is estimated via interpolation, allowing thinning without an explicit concave-convex decomposition. The overall algorithm would, however, be biased unless the interpolation error is added to the polynomial rate.
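To illustrate the squeeze idea behind early accept/reject decisions, here is a small sketch (our own, hypothetical helper names): a cheap lower bound on the rate, built from the same abscissae as the upper bound, can resolve many thinning decisions without ever evaluating the expensive rate. For a convex rate such as e^t on [0, 1], the tangent at 0 gives a lower bound and the chord an upper bound.

```python
import math
import random

def squeeze_thin_step(expensive_rate, lower, upper, t, rng, counter):
    """One thinning accept/reject decision at a proposed time t, assuming
    lower(t) <= expensive_rate(t) <= upper(t). The squeeze resolves the
    decision without calling expensive_rate whenever the uniform draw
    falls below the cheap lower bound."""
    u = rng.random() * upper(t)
    if u <= lower(t):
        return True              # early acceptance: no expensive evaluation
    counter["evals"] += 1        # only now pay for the expensive rate
    return u <= expensive_rate(t)

# Demo: true rate e^t on [0, 1]; tangent at 0 (1 + t) is a lower bound
# and the chord (1 + (e - 1) t) an upper bound, since e^t is convex.
rng = random.Random(0)
counter = {"evals": 0}
decisions = [
    squeeze_thin_step(math.exp,
                      lambda t: 1.0 + t,
                      lambda t: 1.0 + (math.e - 1.0) * t,
                      rng.random(), rng, counter)
    for _ in range(1000)
]
```

The fraction of proposals that require an expensive evaluation shrinks as the gap between the two bounds closes, which is what makes squeezing attractive for costly rates.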
Finally, while our CC-PDMP approach has been illustrated only on PDMPs with linear dynamics, it could be used more generally for PDMPs with nonlinear dynamics, such as the Boomerang sampler of Bierkens et al. (2020). The only requirement is that the rate function can be bounded by a function with a concave-convex decomposition.

B.3 Gaussian Spike and Slab Prior
Following Chevallier et al. (2020) and Bierkens et al. (2021b), the spike-and-slab prior can be simulated using PDMP dynamics. The Gaussian spike-and-slab prior has the form θ_j ∼ w_j N(0, σ) + (1 − w_j)δ_0 for j = 1, ..., p, where w_j is the prior probability of inclusion and δ_0 is the Dirac spike at zero.
For RJPDMPs there are three types of event: a reversible-jump (RJ) move if a variable hits the axis (θ_j = 0); events according to the likelihood and the prior N(0, σ) for θ_j nonzero; and the RJ move that reintroduces a variable. The rate to reintroduce a variable is calculated in Chevallier et al. (2020), where p_rj is a tuning parameter for the probability of jumping to the reduced model when hitting the axis. The time to hit the axis for parameter θ_j is −θ_j/v_j when θ_j v_j < 0 and ∞ otherwise. Finally, simulating non-RJ events is equivalent to simulating from the Gaussian slab, and thus the part of the event rate contributed by the prior on parameter θ_j is v_j(θ_j + v_j t)/σ² for t ∈ [0, τ_max]. This rate is linear in time and so is simulated exactly by the CC-PDMP approach.
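These two event mechanisms are simple enough to sketch directly. The helper names below are our own; we assume v_j ≠ 0 and write sigma2 for σ², with the event time for the linear prior rate obtained by inverting its integrated rate at an Exp(1) draw.

```python
import math
import random

def axis_hitting_time(theta, v):
    """Time at which theta + v*t reaches zero: finite only when theta
    and v have opposite signs (theta * v < 0)."""
    return -theta / v if theta * v < 0 else math.inf

def gaussian_slab_event_time(theta, v, sigma2, rng):
    """Exact event time for the linear prior rate max(0, v*(theta + v*t)/sigma2),
    i.e. max(0, a + b*t) with a = v*theta/sigma2 and b = v*v/sigma2 > 0."""
    a = v * theta / sigma2
    b = v * v / sigma2
    e = rng.expovariate(1.0)
    if a >= 0:
        # invert a*t + b*t^2/2 = e
        return (-a + math.sqrt(a * a + 2.0 * b * e)) / b
    # rate is zero until t0 = -a/b, then grows linearly from zero
    t0 = -a / b
    return t0 + math.sqrt(2.0 * e / b)
```

Because the inversion is exact, every proposed time from this component is accepted, which is what is meant by the linear prior rate being "exactly simulated" by the thinning scheme.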

Figure 1: Upper bounds for a rate function based on concave (left) and convex (right) information at three abscissae. Concave bounds are formed using linear segments given by the gradient of f_n(t); convex bounds are constructed using linear segments connecting evaluations of f_u(t).

Figure 3: The upper bounds resulting from two different concave-convex decompositions of the function f(t) = −t^3 + 3t^2 − 3t + 3, based on abscissae at 0 and 1. The top row corresponds to the decomposition from Proposition 1 and the bottom row to an optimal decomposition which recognises f(t) as a convex function on [0, 1). The columns show the functions f_u, f_n and f, as well as their piecewise-linear upper bounds.

Figure 4: Efficiency of thinning for the global BPS using thinning via superposition and CC-PDMP thinning, with increasing dimension. Efficiency of the thinning (left) and total computation time (right) are averaged over 20 repeated runs. The samplers were simulated for 1000 event times on each run.