Approximation methods for piecewise deterministic Markov processes and their costs

ABSTRACT In this paper, we analyse piecewise deterministic Markov processes (PDMPs), as introduced in Davis (1984). Many models in insurance mathematics can be formulated in terms of the general concept of PDMPs. There one is interested in computing certain quantities of interest such as the probability of ruin or the value of an insurance company. Instead of explicitly solving the related integro-(partial) differential equation (an approach which can be used only in a few special cases), we adapt the problem in a manner that allows us to apply deterministic numerical integration algorithms such as quasi-Monte Carlo rules; this is in contrast to applying random integration algorithms such as Monte Carlo. To this end, we reformulate a general cost functional as a fixed point of a particular integral operator, which allows for iterative approximation of the functional. Furthermore, we introduce a smoothing technique which is applied to the integrands involved, in order to use error bounds for deterministic cubature rules. We prove a convergence result for our PDMP approximation, which is of independent interest as it justifies phase-type approximations on the process level. We illustrate the smoothing technique for a risk-theoretic example, and compare deterministic and Monte Carlo integration.


Introduction
Many models in risk theory can be formulated as piecewise deterministic Markov processes (PDMPs), a general class of Markov processes with finite-variation sample paths introduced by Davis [14]. This applies, among others, to the classical Cramér-Lundberg model, renewal risk models, and the multi-portfolio models recently introduced by Albrecher and Lautscham [2]. Moreover, PDMPs are sufficiently general to allow for non-constant model parameters, i.e., quantities such as the hazard rate or the premium rate may be state dependent. Examples of PDMPs and their control in the field of insurance mathematics are, e.g., Dassios and Embrechts [13], Embrechts and Schmidli [21], Schäl [42], Rolski et al. [41], Cai et al. [8], Leobacher and Ngare [35], Eichler et al. [20].
The general theory of PDMPs is well developed; see for example the monographs by Davis [15], Jacobsen [26], or Bäuerle and Rieder [7] for general results on PDMPs and their optimal control. More specialised contributions to the control theory of PDMPs can be found in Davis [15], Lenhart and Liao [34], Costa and Davis [10], Dempster and Ye [19], Almudevar [3], Forwick et al. [23], Bäuerle and Rieder [6], Costa and Dufour [11], or Davis and Farid [16] for viscosity solutions of associated Hamilton-Jacobi-Bellman equations, and Colaneri et al. [9] for a general comparison principle for solutions to control problems for PDMPs.
For the numerical treatment of (control) problems for PDMPs, however, only problem-specific solutions have been provided. A standard approach is to link expected values representing a quantity of interest in the problem to the solution of an associated integro-(partial) differential equation, see, e.g., Asmussen and Albrecher [4]. In only very few cases is it possible to derive an explicit solution to this integro-(partial) differential equation. Requiring an explicit solution typically restricts the complexity of the model significantly. One possibility is to solve the integro-(partial) differential equation numerically. This carries all the intricacies and difficulties of a combined numerical method for differential and integral equations. Alternatively, one can apply crude Monte Carlo methods, see, e.g., Riedler [40]. Those methods, while robust, are limited in speed by the Monte Carlo convergence rate. Another, highly sophisticated, approach uses quantisation of the jump distribution, see de Saporta et al. [18].
In this article we concentrate on methods that are, like Monte Carlo, particularly easy to implement. The aim is to adapt the problem in a way that also allows for deterministic numerical integration algorithms such as quasi-Monte Carlo (QMC). QMC has been applied successfully to problems in risk theory, see Tichy [45], Coulibaly and Lefèvre [12], Siegl and Tichy [43], Albrecher and Kainhofer [1], Preischl et al. [39]. It should be noted that the finiteness of the total variation needed for the convergence estimate [1, Theorem 1] has not been proven.
We would like to highlight two features of our approach. Inspired by [1], we reformulate a general cost functional as a fixed point of a particular integral operator, which allows for iterative approximation of the functional. In terms of numerical integration this means that we get a high-dimensional integration problem of fixed dimension, where the dimension is a multiple of the number of iterations. Having a fixed dimension is required for the application of standard QMC or other deterministic cubature rules. The second contribution addresses another requirement for the application of QMC, namely some degree of regularity of the integrand. We introduce a smoothing technique which, in its simplest case, leads to C^2 integrands. For those we obtain deterministic error bounds. We prove convergence in distribution of the smoothed integrand to its unsmoothed limit.
This particular convergence result has an additional benefit for a typical situation in risk-theoretic modelling. In the literature on the analysis of ruin probabilities, or more generally, on Gerber-Shiu functions, the assumption of a claim size distribution of mixed exponential or phase-type form is quite common. Apart from the possibility to obtain explicit expressions for quantities of interest in such setups, this modelling approach is motivated by the fact that the class of phase-type distributions is dense in the class of distributions with support on [0, ∞), see [41, Theorem 8.2.3]. Under mild assumptions on the claim size distribution we want to approximate, our convergence result applies and justifies the phase-type approximation procedure even on the process level. Furthermore, efficient and easy to implement numerical methods for the computation of important targets such as Gerber-Shiu functions and expected discounted future dividend payments of an insurance company are of particular importance when models become more general and hence also more complicated. This makes our contribution valuable from both the analytical and the numerical point of view.
The paper is structured as follows. In Section 2 we recall the definition of a PDMP and provide some risk-theoretic examples. In Section 3 we derive the fixed point approach for valuation of a cost functional of a PDMP. Section 4 reviews deterministic numerical integration of possibly multivariate C^k functions. Subsequently, Section 5 is devoted to the aforementioned smoothing procedure, and presents a stability result. Section 6 contains an application of the smoothing to one of the risk-theoretic examples and a comparative study of deterministic and Monte Carlo integration for this example.

Piecewise deterministic Markov processes
In this section we first define piecewise deterministic Markov processes. Then we give a couple of examples of practical interest.
A PDMP is a continuous-time stochastic process with (possibly random) jumps, which follows a deterministic flow, e.g., the solution of an ordinary differential equation, between jump times. We will not give the most general definition of PDMPs here, but instead refer to the monograph by Davis [15]. For a subset A of R^d we denote by A°, Ā, and ∂A its interior, closure, and boundary, respectively. We write B(A) for the Borel σ-algebra on A.
If φ is a flow on A, then we write ∂φ⁻A for the set of points on the boundary of A from which the trajectory moves into A° immediately, and ∂φ⁺A for the set of points on the boundary of A to which a trajectory moves from A° without passing other points on the boundary in-between. Furthermore, we write

The classical example of a flow arises through ordinary differential equations (ODEs). Let g : R^d → R^d be Lipschitz continuous. By the classical Picard-Lindelöf theorem on existence and uniqueness of solutions of ODEs, for every x ∈ R^d there exists a continuously differentiable function κ : R → R^d such that κ(0) = x and κ'(s) = g(κ(s)) for all s ∈ R. For t ∈ R we define φ(x, t) = κ(t). The function φ defines a flow on R^d. If A ⊆ R^d, then the restriction of φ to A × R is a flow on A.

Definition 2.3. Let K be a finite set and let d : K → N be a function which satisfies that, for every
(i) The state space (E, E) of a PDMP is the measurable space defined by
(ii) The flow of a PDMP is defined by φ = {φ_k}_{k∈K}.
(iii) The active boundary of the PDMP is defined by
(iv) The jump intensity λ of a PDMP is defined by a family of functions λ = {λ_k}_{k∈K} with λ_k : E_k → [0, ∞) measurable and bounded for all k ∈ K.
(v) The jump kernel Q of a PDMP is a map such that Q(A, ·) is measurable for every A ∈ E, and Q(·, x) is a probability measure on (E, E) for every x ∈ E with Q({x}, x) = 0.
We call the triple (φ, λ, Q) the local characteristics of a PDMP.
Given a state space (E, E) and local characteristics (φ, λ, Q) of a PDMP we define the function

Definition 2.4. Let (E, E) be a state space, let (φ, λ, Q) be local characteristics of a PDMP, let x ∈ E, and let (Ω, F, P) be a probability space. A piecewise deterministic Markov process starting in x is a stochastic process X : [0, ∞) × Ω → E which satisfies the following. There exists a sequence of random variables (T_n)_{n∈N} with values in [0, ∞] such that T_n ≤ T_{n+1} a.s. for all n ∈ N, lim_{n→∞} T_n = ∞ a.s., and
(i) it holds P-a.s. that X_0 = x,
(ii) for all n ∈ N, t ∈ [T_n, T_{n+1}), and for (k, y) ∈ E with X_{T_n} = (k, y) it holds P-a.s. that
(iii) for all s, t ∈ [0, ∞) it holds P-a.s. that
(iv) for all n ∈ N and all A ∈ E it holds P-a.s. that

X is a Markov process.
Theorem 2.5. Let (E, E) be a state space, let (φ, λ, Q) be local characteristics of a PDMP, and let x ∈ E. There exist a probability space (Ω, F, P_x) and a stochastic process X : [0, ∞) × Ω → E such that X is a PDMP starting in x with state space E and local characteristics (φ, λ, Q). Furthermore, X has the strong Markov property.
Proof. The proof of Theorem 2.5 for a more general setup that also allows for the possibility of explosions and countable K can be found in [15, Section 2.25].
Figure 1 illustrates a path of a PDMP.
Let f : E → R be a function. For all k ∈ K we denote by f_k the function E_k ∋ y ↦ f(k, y). We write C^n(E, R^m) for the space of n-times differentiable functions on E and C^n_b(E, R^m) for the space of functions in C^n(E, R^m) for which all derivatives are bounded. Moreover, C^n_0(E, R^m) is the space of functions in C^n_b(E, R^m) for which all derivatives vanish at infinity.
Further, for f : E → R, a PDMP X, and t ∈ (0, ∞) we write E(f

In the remainder of this section we provide some illustrative examples from risk theory. For other examples and applications in different fields we refer to Davis [15], de Saporta et al. [17], Riedler [40].

Classical Cramér-Lundberg model
Let X = (X_t)_{t≥0} be a stochastic process given by X_t = x + ct − S_t, where x, c ≥ 0, N = (N_t)_{t≥0} is a homogeneous Poisson process with intensity λ_N > 0, {Y_i}_{i∈N} is a family of positive i.i.d. random variables with distribution function F_Y, and S_t = Σ_{i=1}^{N_t} Y_i for all t ≥ 0. A usual assumption in this kind of model is the independence of {Y_i}_{i∈N} and N. In risk theory the process X represents a standard model for the surplus of an insurance portfolio. A quantity of interest is the probability of X ever becoming negative, i.e., we are interested in P(τ < ∞), where τ = inf{t ≥ 0 : X_t < 0}. The model translates into a PDMP via

where we have used the notation y − B = {y − z : z ∈ B} for all y ∈ R and B ∈ B(R). For y ∈ E_2, any definition of Q will do, since the jump intensity is 0 there, but the above definition is provided for definiteness.
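As a baseline for the comparisons carried out later in the paper, the ruin probability in this model can be estimated by crude Monte Carlo simulation of the surplus paths. The following sketch does this for exponentially distributed claims, for which the closed form P(τ < ∞) = (λ_N/(cμ)) exp(−(μ − λ_N/c)x) is available as a check; all parameter values, the time horizon, and the path count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ruin_prob_mc(x, c, lam, mu, t_max=50.0, n_paths=20_000):
    """Crude Monte Carlo estimate of P(ruin before t_max) for the classical
    Cramer-Lundberg surplus X_t = x + c*t - S_t with Exp(mu) claims."""
    ruined = 0
    for _ in range(n_paths):
        t, surplus = 0.0, x
        while t < t_max:
            w = rng.exponential(1.0 / lam)          # inter-claim time, Exp(lam)
            t += w
            surplus += c * w - rng.exponential(1.0 / mu)  # premium income minus claim
            if surplus < 0.0:                       # ruin can only occur at claim times
                ruined += 1
                break
    return ruined / n_paths

x, c, lam, mu = 2.0, 1.5, 1.0, 1.0
exact = lam / (c * mu) * np.exp(-(mu - lam / c) * x)  # closed form for Exp claims
est = ruin_prob_mc(x, c, lam, mu)
print(est, exact)
```

With the positive safety loading c > λ_N/μ assumed here, ruin after the truncation horizon is very unlikely, so the finite-horizon estimate is close to the infinite-horizon quantity.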

Cramér-Lundberg model with dividend payments
A classical modification of the model from Section 2.1.1 is the introduction of a dividend barrier at level b > 0. Then, once the surplus reaches the barrier, the incoming premium rate is immediately distributed as a dividend. Furthermore, if the process starts above b, the excess is distributed as a lump sum dividend, such that X_{0+} = min{x, b}. A typical quantity of interest is the expected value of discounted future dividend payouts until ruin of the company, which is given by

where δ > 0 is a preference-based discount factor and τ = inf{t ≥ 0 : X_t < 0}. The model translates into a PDMP via
• φ_1(y, t) = y + ct ∀y ∈ E_1 and ∀t ∈ R,
• φ_2(y, t) = y ∀y ∈ E_2 and ∀t ∈ R,
• φ_3(y, t) = y ∀y ∈ E_3 and ∀t ∈ R.

Note that only initial values x ∈ (−∞, b] translate to a viable initial value for the PDMP. However, this is sufficient for determining V(x) for all x ∈ R via (2).
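The expected discounted dividends in this barrier model can likewise be estimated by simulating the surplus between claims, where the only subtlety is integrating the discounted dividend rate over the part of each inter-claim interval spent at the barrier. The sketch below assumes exponential claims and illustrative parameter values; it is a crude Monte Carlo baseline, not the deterministic method developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def dividends_mc(x, b, c, lam, mu, delta, t_max=80.0, n_paths=5_000):
    """Crude Monte Carlo estimate of the expected discounted dividends paid
    until ruin in the Cramer-Lundberg model with a dividend barrier at b."""
    total = 0.0
    for _ in range(n_paths):
        s = min(x, b)
        paid = max(x - b, 0.0)                 # lump-sum dividend if started above b
        t = 0.0
        while t < t_max:
            w = rng.exponential(1.0 / lam)     # time until the next claim
            tau = max(0.0, (b - s) / c)        # time needed to reach the barrier
            if w > tau:                        # at the barrier, dividends flow at rate c
                paid += c * (np.exp(-delta * (t + tau)) - np.exp(-delta * (t + w))) / delta
            s = min(s + c * w, b) - rng.exponential(1.0 / mu)  # premium income, then claim
            t += w
            if s < 0.0:                        # ruin stops the dividend stream
                break
        total += paid
    return total / n_paths

x, b, c, lam, mu, delta = 2.0, 5.0, 1.5, 1.0, 1.0, 0.05
v = dividends_mc(x, b, c, lam, mu, delta)
print(v)   # necessarily between 0 and c/delta, the value of a perpetual dividend stream
```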

Cramér-Lundberg model with time dependent dividend barrier
In Albrecher and Kainhofer [1] the model from Section 2.1.2 is further extended to include a time dependent barrier

The quantity of interest is again the expected value of discounted future dividend payments until the time of ruin, i.e.,

for x ≤ b_0, where again τ = inf{t ≥ 0 : X_t < 0} and δ > 0 is a preference-based discount factor.

Cramér-Lundberg model with loan
In Dassios and Embrechts [13] the model from Section 2.1.2 is modified such that the insurance company is not ruined when the surplus hits zero, but has the possibility to take up a loan at an interest rate ρ > 0. The time of ruin is given by τ = inf{t ≥ 0 : X_t < −c/ρ}. The corresponding quantity of interest is

for x ≤ b, where δ > 0 is a preference-based discount factor. The model translates into a PDMP via

Multidimensional Cramér-Lundberg model
In Albrecher and Lautscham [2] a two-dimensional extension of the model in Section 2.1.2 is studied. The basis are independent surplus processes X^(j)_t, j ∈ {1, 2}, modelling two insurance portfolios, where c^(1), c^(2) ≥ 0 and S^(j) are compound Poisson processes with intensities λ^(1), λ^(2) and jump size distributions F_{Y^(1)}, F_{Y^(2)}. Furthermore, b^(1), b^(2) ≥ 0 are barriers. As a new feature, the drift of the component at the barrier is added to the other component's drift, causing faster growth of the latter. Dividends are only paid when both surplus processes have reached their individual barriers. We show how the model translates into a PDMP, namely

The flow is given by

and

It remains to describe the jump behaviour. We get deterministic "jumps" at the active boundaries of E_1, E_2, E_3 which do not manifest themselves as jumps of the process, i.e.,

and similarly for the other active boundaries. Since each surplus process is a compound Poisson process with drift, jumps in the components occur due to realisations of independent identically distributed exponential random variables (independence implies that simultaneous jumps occur with probability zero). The two-dimensional process thus jumps at the minimum of the individual jump times. This means that we have a constant jump intensity λ_k = λ^(1) + λ^(2) for k = 1, 2, 3, 4, and λ_5 = 0. If a jump occurs at time t ≥ 0, it happens with probability λ^(1)/(λ^(1) + λ^(2)) in the first surplus process with jump size distribution F_{Y^(1)}, and with probability λ^(2)/(λ^(1) + λ^(2)) in the second surplus process with jump size distribution F_{Y^(2)}. It remains to describe the jump kernel for the jumps from x ∈ E. To this end define, for (y^(1), y^(2)) ∈ E_{k_1},
B^(1) = {z^(1) ∈ R : (z^(1), z^(2)) ∈ B, z^(2) = y^(2)},
B^(2) = {z^(2) ∈ R : (z^(1), z^(2)) ∈ B, z^(1) = y^(1)}.

Furthermore,
A quantity of interest in this model is again the expected value of discounted future dividend payments until the time of ruin of one of the portfolios,

for x^(1) ≤ b^(1), x^(2) ≤ b^(2), with τ = inf{t ≥ 0 : (X^(1)_t, X^(2)_t) ∈ E_5}, and δ > 0 being a preference-based discount factor.

Iterated integrals and a fixed point approach
In this section we derive a method for the numerical approximation of the quantities of interest appearing in the models introduced in the previous section. We rewrite the quantity of interest as a sum of integrals of fixed dimension and an error term that goes to zero exponentially fast with increasing dimension of the integral. This allows for the use of deterministic integration rules. The starting point for the derivation of this integral representation is the observation that the quantity of interest is a fixed point of a certain integral operator associated to the PDMP.

Definition 3.1. Suppose there exists a set K_c ⊆ K such that for all k ∈ K_c it holds that λ_k(x) = 0 and φ_k(x, t) = x for all x ∈ E_k and all t ∈ R. We call E_c a cemetery of the PDMP.

Definition 3.2. Let a PDMP be given and let E_c ≠ ∅ be a cemetery of the PDMP. A running reward function ℓ :

where τ = inf{t ≥ 0 :

Let T_1 be the first jump time. Equation (4) can be rewritten as follows,

Since X is a PDMP and hence a Markov process, this yields

Recall that for every t ≥ 0 it holds that P_x(T_1 > t) = exp(−∫_0^t λ(φ(x, s)) ds) =: 1 − F_W(t, x), and denote the corresponding density by f_W. With this, the function H and the operator G admit representations as integrals,

Note that H(x) corresponds to the expected discounted rewards collected before the first jump at time T_1 when starting in x. GV(x) represents the expected discounted rewards from time T_1 onwards, conditional on the event {X_{T_1} ∉ E_c, X_0 = x}. Iterating the above steps n ∈ N times leads to

Lemma 3.3. Let Ψ : E_c → R and ℓ : E → R be bounded, assume that for all k ∈ K the functions λ_k are bounded by C_λ ∈ (0, ∞), and for all x ∈ E let t*(x) = ∞. Then for all x ∈ E and for all n ∈ N it holds that

Proof. The boundedness of ℓ and Ψ implies that V is also bounded by

From the original definition of G we have that

Recall that

For every n ∈ N let Z_n ∼ Erlang(n, C_λ) be an Erlang-distributed random variable. Combining this with (6) we get that

The latter expression converges to zero as n → ∞, uniformly in x ∈ E.
Combining Lemma 3.3 with (5) results in the error estimate

Finally, we obtain the following representation,

In (8) we denote by {t_j}_{j∈{1,...,i−1}} the family of inter-jump times and by {x_j}_{j∈{1,...,i−1}} the family of post-jump locations.
Remark 3.4. Solving the integral G^{i−1}H(x_0) brings several advantages compared to a crude Monte Carlo approach. First, (8) is an integral of fixed dimension. Hence, it can be approximated using deterministic integration rules like quasi-Monte Carlo, for which deterministic error bounds are available. Second, the bias from restricting oneself to a fixed number of jumps can be estimated uniformly in x = x_0 using the bias estimate in Lemma 3.3. Third, rare events like surviving a large number of jumps are, in this formulation, not rare in the sense that it is unlikely to draw such a realisation, which has the effect of importance sampling.
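The fixed-dimension representation can be sketched numerically for the classical model of Section 2.1.1 with exponential claims: each integration node in [0, 1)^{2n} supplies n inter-claim times and n claim sizes through inverse-CDF transforms, and the node set is a deterministic Sobol' sequence. This is a minimal sketch under illustrative parameter choices, not the full operator iteration of the paper; note also that the raw integrand below is an indicator, i.e. not smooth, which is precisely the issue addressed by the smoothing in Section 5.

```python
import numpy as np
from scipy.stats import qmc

def ruin_qmc(x, c, lam, mu, n_jumps=100, m=13):
    """Fixed-dimension estimate of the probability of ruin within the first
    n_jumps claims: every Sobol' point in [0,1)^{2*n_jumps} supplies all
    inter-claim times and claim sizes via inverse-CDF maps."""
    d = 2 * n_jumps
    u = qmc.Sobol(d=d, scramble=True, seed=7).random_base2(m=m)  # 2^m nodes
    w = -np.log1p(-u[:, :n_jumps]) / lam      # Exp(lam) inter-claim times
    y = -np.log1p(-u[:, n_jumps:]) / mu       # Exp(mu) claim sizes
    # post-claim surplus after claims 1..n_jumps; ruin iff it ever drops below 0
    surplus = x + c * np.cumsum(w, axis=1) - np.cumsum(y, axis=1)
    return float(np.mean(surplus.min(axis=1) < 0.0))

x, c, lam, mu = 2.0, 1.5, 1.0, 1.0
exact = lam / (c * mu) * np.exp(-(mu - lam / c) * x)  # closed form for Exp claims
est = ruin_qmc(x, c, lam, mu)
print(est, exact)
```

The truncation at n_jumps claims introduces exactly the bias controlled by Lemma 3.3; with the positive loading assumed here it is negligible at n_jumps = 100.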
Cubature rules for C^κ functions

In order to obtain convergence estimates for numerical integration methods such as quasi-Monte Carlo (QMC) methods or other cubature rules, we need more regularity of the integrands than they admit in many practical applications. For example, we may need to bound a certain norm of the Hessian matrix of the integrand. In Section 5, we will rewrite the problem introduced in Section 3 so that the integrand is a C^κ function for some κ ∈ N. We outline two different methods for treating such integrands f by cubature rules.

Quasi-Monte Carlo methods
Quasi-Monte Carlo methods are equal-weight cubature rules with M deterministically chosen integration nodes. Let the integrand f :

In order to obtain a convergence estimate for numerical integration of f using QMC, we require a so-called Koksma-Hlawka type inequality. The original Koksma-Hlawka inequality bounds the integration error of a QMC rule by the product of the variation of the integrand (in the sense of Hardy and Krause) and the so-called discrepancy of the integration node set (see, e.g., [36, Chapter 2]). We remark, however, that we cannot easily apply the classical Koksma-Hlawka inequality in this paper, as we cannot rely on the integrands to have bounded variation in the sense of Hardy and Krause. Hence, we are going to resort to a variant of the Koksma-Hlawka inequality which was recently proven in Pausinger and Svane [38].
Let Q_{M,d}(f) = (1/M) Σ_{i=1}^M f(x_i) be a QMC rule using M integration nodes x_1, ..., x_M ∈ [0, 1)^d. Then by [38, Theorem 3.12] we have

where M(f) = sup_{x∈[0,1]^d} ‖Hess(f, x)‖, Hess(f, x) is the Hessian matrix of f at x, ‖·‖ denotes the usual operator norm, and where Disc_I(x_1, ..., x_M) is the isotropic discrepancy of the integration node set,

where μ_d denotes the Lebesgue measure on R^d. Now let x_1, ..., x_M ∈ [0, 1)^d,

where by Disc*(x_1, ..., x_M) we denote the star discrepancy of x_1, ..., x_M, defined as

It is well known that common point sequences that are employed in QMC methods, such as Sobol' sequences or Halton sequences, have a star discrepancy of order (log M)^d/M (and it is known that this order can, if at all, only be improved with respect to the exponent of the log-term). Hence, by using, e.g., Sobol' points in a QMC method for numerically integrating a C^2-function, we cannot expect an error that converges to zero faster than (log M)/M^{1/d}. As we shall see below, this order of magnitude can, with respect to the disadvantageous dependence on d, not be improved further for C^2-functions. However, there is room for improvement if we make additional smoothness assumptions on the integrand.
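The practical gain of low-discrepancy nodes over random nodes is easy to observe on a smooth test integrand. The following sketch compares plain Monte Carlo with scrambled Sobol' points on a product integrand with known integral 1; the dimension, the integrand, and the node counts are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import qmc

d, m = 5, 12
M = 2 ** m                                          # number of integration nodes
f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)      # smooth integrand, exact integral 1

rng = np.random.default_rng(3)
err_mc = abs(float(f(rng.random((M, d))).mean()) - 1.0)        # plain Monte Carlo
pts = qmc.Sobol(d=d, scramble=True, seed=3).random_base2(m=m)  # scrambled Sobol' nodes
err_qmc = abs(float(f(pts).mean()) - 1.0)
print(err_mc, err_qmc)   # the QMC error is typically much smaller for smooth f
```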

Product rules
In Hinrichs et al. [25] it is shown that, by using products of Gauss rules, one can obtain the following result. Let f : [0, 1]^d → R be such that f ∈ C^κ for some κ ∈ N. Then, by using a product rule Q_{G,M̃,d} of M̃-point Gauss quadrature rules, one obtains

where c_κ = (π/2)(e/(6√3))^κ, and where

where D^β denotes the (weak) partial derivative of order β for β ∈ N_0^d. A d-fold Gauss product rule as described above uses M = M̃^d points in total, and hence yields a convergence order of M^{−κ/d}. It is known due to [5] that this convergence order is best possible. For the special case κ = 2, we only obtain a relatively small improvement over the bound implied by (9). However, there is an additional advantage in the bound (10). By requiring that the function f satisfies additional smoothness assumptions, namely that f ∈ C^κ for some κ ∈ N which is possibly larger than 2, we obtain an improved convergence rate. Hence, we face a trade-off between imposing a higher degree of smoothness on the integrand f to obtain higher accuracy in the quadrature rule, and the error we make by smoothing the integrand to that extent. It is therefore likely that the method needs to be fine-tuned on a case-by-case basis. In practice, product rules often cannot be applied, since, for example, integrating a d = 1024-variate integrand using only M̃ = 2 integration nodes per coordinate requires a point set consisting of M = 2^1024 integration nodes. To overcome the latter problem, it might be useful to apply the theory of weighted integration as introduced in [44], possibly combined with truncation (see, e.g., [30]) or multivariate decomposition methods (see, e.g., [32]). A detailed analysis of these approaches applied to the present problem is left open for future research.
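A Gauss product rule of the kind described above is straightforward to assemble from one-dimensional Gauss-Legendre nodes and weights. The following sketch does so for a smooth two-dimensional test integrand with a known integral; the integrand and the node count are illustrative assumptions.

```python
import numpy as np

def gauss_product_rule(f, n_1d, d):
    """d-fold product of n_1d-point Gauss-Legendre rules on [0,1]^d
    (uses n_1d**d nodes in total, illustrating the curse of dimension)."""
    t, w = np.polynomial.legendre.leggauss(n_1d)
    t, w = (t + 1.0) / 2.0, w / 2.0                  # map nodes/weights from [-1,1] to [0,1]
    grids = np.meshgrid(*([t] * d), indexing="ij")
    pts = np.stack([g.ravel() for g in grids], axis=1)          # all n_1d**d nodes
    wts = np.prod(np.meshgrid(*([w] * d), indexing="ij"), axis=0).ravel()  # product weights
    return float(wts @ f(pts))

f = lambda x: np.exp(x.sum(axis=1))                  # smooth (C^infinity) integrand
exact = (np.e - 1.0) ** 2                            # integral of exp(x+y) over [0,1]^2
approx = gauss_product_rule(f, n_1d=8, d=2)
print(approx, exact)
```

For this analytic integrand the 8-point product rule is already accurate to near machine precision, illustrating the high order attainable under strong smoothness.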

Smoothing of the integrand
The integrand in (8) is not necessarily a C^κ-function. Therefore, in this section we provide a technique for smoothing the integrand, in order to apply the convergence results for integration rules described in Section 4.
The piecewise construction of the process described in Definition 2.4 leads to the situation that X_t = φ(X_{T_{j−1}}, t − T_{j−1}) for t ∈ [T_{j−1}, T_j) is a function of X_{T_{j−1}} and T_{j−1}. In particular, all subsequent pre-jump locations depend on all previous post-jump locations and jump times, via φ and λ. Consequently, regularity of the integrand depends on regularity of the flow φ and the intensity function λ. The analysis in this section is restricted to the case where the flow originates from autonomous ODEs, i.e., for all k ∈ K there exist Lipschitz continuous functions

General results from the literature on ODEs, see, e.g., [24], yield that the derivatives ∂φ_k/∂y, ∂²φ_k/∂y², and ∂φ_k/∂t can be described by so-called associated first and second order variational equations, for which one requires g_k to be a C^2-function.
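The first variational equation can be solved numerically alongside the flow itself: if φ(x, ·) solves κ' = g(κ), then J(t) = ∂φ(x, t)/∂x solves J' = g'(φ(x, t)) J with J(0) = 1. The sketch below does this for a linear drift of the form g(y) = c + ρy (our reading of the loan example's drift on the loan region; an assumption for illustration), where the closed-form flow is available as a check.

```python
import numpy as np
from scipy.integrate import solve_ivp

c, rho = 1.5, 0.1
g = lambda y: c + rho * y          # assumed drift (linear, as in the loan region)
dg = lambda y: rho                 # g'(y), needed for the variational equation

def flow_and_derivative(x, t):
    """Solve the flow ODE together with its first variational equation
    J' = g'(phi) J, J(0) = 1, so that J(t) = d phi(x, t) / dx."""
    rhs = lambda s, z: [g(z[0]), dg(z[0]) * z[1]]
    sol = solve_ivp(rhs, (0.0, t), [x, 1.0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1], sol.y[1, -1]

x, t = 2.0, 3.0
phi, dphi_dx = flow_and_derivative(x, t)
# closed form for linear drift: phi = (x + c/rho) e^{rho t} - c/rho, dphi/dx = e^{rho t}
print(phi, (x + c / rho) * np.exp(rho * t) - c / rho)
print(dphi_dx, np.exp(rho * t))
```

The same construction extends to the second-order variational equation when g is C^2, which is exactly the regularity required above.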
For the density f_W of the inter-jump times to be C^2 we need that λ ∈ C^2(E, R). Also we need ℓ ∈ C^2_b(E, R) and Ψ ∈ C^2_b(E, R), since they appear in the integral defining H. A serious problem with respect to smoothness arises if the PDMP model allows for jumps from the active boundary. Suppose (k, y) ∈ E and t*(k, y) < ∞. Then, conditional on X_t = (k, y), the time of the next jump is distributed as min(T, t*(k, y)), where T has distribution function

But in general neither t*(k, y) nor min(T, t*(k, y)) will depend smoothly on y, even if λ_k has arbitrarily high regularity. We are not aware of a general remedy for this problem. However, for all PDMP models put forward in Section 2.1, the jumps from the active boundary do not constitute jumps of the original problem. In the following subsection we describe by example how PDMPs can be approximated by PDMPs that do not allow for jumps from the boundary.
Concerning the jump kernel Q, it is hard to state general sufficient regularity conditions. An exemplary favourable situation arises if the jump kernel Q admits a C^2-density f_Y in the sense that Q(A, x) = ∫_A f_Y(x_1, x) dx_1 for all A ∈ E and all x ∈ E. In the one-dimensional examples from risk theory in Sections 2.1.1-2.1.4 this condition is satisfied, and for the two-dimensional example in Section 2.1.5 we present a smoothing technique in Section 5.2.

Smoothing of the flow
Consider the example from Section 2.1.4 without dividend barrier. We can describe the problem alternatively with a state space consisting of three components:

for some c > 0, ρ > 0. The function φ_2 is given by φ_2(y, t) = y for all y ∈ E_2 and all t ∈ R, and φ_3 by φ_3(y, t) = y for all y ∈ E_3 and all t ∈ R. Here, g_1 is not differentiable at 0. However, we may smoothen g_1 using a "smoothened Heaviside function". Note that Γ* = ∅.
Proof. The elementary proof is left to the reader.
There are various possible choices for the smoothing (from the left, from the right, or symmetrically). Figure 2 depicts these three possible smoothings for a function with a discontinuity at ξ = 1. A concrete example for a function h that satisfies the above requirements is given by

For every ε > 0, a smoothed version of the function g_1 defined in (11) is given by

We can finally formulate a PDMP corresponding to the new model, where the flow has been smoothened,

Note that Γ* = ∅. Since the dividend barrier b is never reached, we also have to smoothen the reward function in such a way that the region where dividends are paid can be reached, i.e., ℓ_ε(y) = c h((y − b + ε)/ε). We will show convergence of the corresponding value functions in Section 6.
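One concrete C^2 choice of a smoothened Heaviside function is the quintic smoothstep, whose first and second derivatives vanish at both ends of the transition interval. The sketch below uses it to smooth a drift of the assumed form g_1(y) = c + ρ min(y, 0) (our reading of the loan example's kink at 0; the paper's own choice of h is not reproduced here), smoothing from the left over the interval (−ε, 0).

```python
import numpy as np

def h(x):
    """A C^2 'smoothened Heaviside function': 0 for x <= 0, 1 for x >= 1,
    quintic smoothstep 10x^3 - 15x^4 + 6x^5 in between (one concrete choice;
    its first and second derivatives vanish at x = 0 and x = 1)."""
    x = np.clip(x, 0.0, 1.0)
    return x**3 * (10.0 - 15.0 * x + 6.0 * x**2)

def g_eps(y, c=1.5, rho=0.1, eps=1e-2):
    """Smoothed drift for the assumed kink g_1(y) = c + rho*min(y, 0):
    the interest term is fully active for y <= -eps and smoothly switched
    off on (-eps, 0); hypothetical parameters for illustration."""
    return c + rho * y * (1.0 - h(y / eps + 1.0))

print(h(np.array([-0.2, 0.0, 0.5, 1.0, 1.3])))   # transition from 0 to 1
print(g_eps(np.array([-1.0, -0.5, 0.0, 1.0])))   # matches c + rho*min(y,0) outside (-eps, 0)
```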

Smoothing of jump measures
We give an example for a class of jump measures that can be approximated by measures leading to C^2-integrands in (8).
Let (E, E) be the state space of a PDMP and let (φ, λ, Q) be its local characteristics. Let the probability kernel Q satisfy the following assumption.

Assumption 5.3. We assume that
1. there exists a positive integer n such that for every k ∈ K and every y ∈ E_k there exist sets B_1(k, y), ..., B_n(k, y) such that
(i) for every j ∈ {1, ..., n} there exists
(ii) for every j ∈ {1, ..., n} it holds that {(ȳ, z)
2. for every k ∈ K and every j ∈ {1, ..., n} the mapping from
4. for every x ∈ E and every j ∈ {1, ..., n} there exists a C^2-mapping G_{j,x} : [0, 1]^{dim(B_j)} → B_j such that for all A ∈ E it holds that

where μ_m denotes the m-dimensional Lebesgue measure,
5. for every k ∈ K and every j ∈ {1, ..., n} the mapping from

Note that Assumption 5.3.1 implies that, for every x ∈ E, B_j(x) is a C^2-manifold, and that for all

Under Assumption 5.3 we have, for x ∈ E,

Σ_{j=1}^n p_j(x) ∫ f(G_{j,x}(u)) du,

where p_j(x) = Q(B_{k,j}, x) for all x ∈ E. For the integral in (8) this implies that we have iterated sums for each jump, which increases the complexity for large numbers of jumps. Instead, we may write the sum as an integral over [0, 1],

∫_0^1 Σ_{j=1}^n 1_{(q_{j−1}(x), q_j(x)]}(u_0) ∫ f(G_{j,x}(u)) du du_0,

where q_0(x) = 0 and q_j(x) = p_1(x) + ... + p_j(x). However, with this "trick" we have lost the property of the integrand being C^2. So, using again our smoothened Heaviside function h : R → [0, 1], we can smoothen the indicator functions,

This expression, considered as a function of x, is C^2 as it is a composition of C^2-functions.
Theorem 5.4. In the setup of this section we have for all

Proof. It holds that

f(G_{j,x}(u)) du.

For our concrete example of h the first integral can be estimated by (5/8)ε, yielding the statement of the theorem.

Now, consider the example from Section 2.1.5. Here, a jump is either a jump in the x_1-direction or a jump in the x_2-direction, i.e.,

In this case we can find functions

Remark 5.5. If we consider, say, i = 100 in (8), then we get a very high number of terms to be summed in the integral. However, we always assume ε to be very small; in particular, we may assume that per jump at most two, and in most situations only one, of the terms h(ε^{−1}(u − q_{j−1}(x))) + h(ε^{−1}(q_j(x) − u)) are nonzero.
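The passage from a sum over jump components to a single smooth integral over the selection variable can be sketched with an explicit C^2 surrogate for the indicators. Below we use telescoping differences of a smoothstep, χ_j(u) = h((u − q_{j−1})/ε) − h((u − q_j)/ε), which sum to (almost) one; this is one concrete construction under our own choice of h, not necessarily the exact form used in the paper.

```python
import numpy as np

def h(x):
    """C^2 quintic smoothstep: 0 below 0, 1 above 1."""
    x = np.clip(x, 0.0, 1.0)
    return x**3 * (10.0 - 15.0 * x + 6.0 * x**2)

def smoothed_mixture(u, f_list, q, eps=1e-3):
    """Evaluate sum_j chi_j(u) * f_j(u), where chi_j is a C^2 surrogate for
    the indicator 1_{(q_{j-1}, q_j]}(u) of the j-th jump component; the
    chi_j telescope, so away from the cut points they form a partition of one."""
    total = 0.0
    for j, f in enumerate(f_list):
        chi = h((u - q[j]) / eps) - h((u - q[j + 1]) / eps)
        total += chi * f(u)
    return total

# Two jump components with selection probabilities p_1 = 0.3, p_2 = 0.7:
q = [0.0, 0.3, 1.0]                        # cumulative probabilities q_0, q_1, q_2
f_list = [lambda u: 1.0, lambda u: 2.0]    # stand-ins for u -> f(G_{j,x}(u))
print(smoothed_mixture(0.15, f_list, q))   # well inside component 1
print(smoothed_mixture(0.60, f_list, q))   # well inside component 2
```

Away from ε-neighbourhoods of the cut points q_j, the smoothed selection reproduces the original mixture exactly, consistent with Remark 5.5.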

Convergence
In this section we prove a general convergence result for approximated versions of PDMPs with smoothing as above. We will exploit results on Feller processes presented in Kallenberg [28, Chapter 19] and Ethier and Kurtz [22, Chapters 4.2 and 4.8]. For the remainder of this section we make the following assumptions:

With this, we can utilise the following theorem.
Theorem 5.6 ([15, Theorem 27.6]). If t*(x) = ∞ for all x ∈ E, if λ ∈ C_b(E, R), and if the mapping x ↦ ∫_E f(y) Q(dy, x) is continuous for all f ∈ C_b(E, R), then the PDMP is a Feller process.
We give an example for a class of jump kernels which comprises the jump kernels of the one-dimensional examples in Section 2.1 and which satisfies (iii).
Example 5.7. Let E_k ⊆ R be an interval for every k ∈ K and let f_Y be a bounded density function on R. Furthermore, let, for every x = (k, y) ∈ E and every

Since, by assumption, all f_j are continuous and all E_j are intervals, the integrand is bounded by 2‖f_j‖_∞ and goes to zero as y_1 → y_2 for almost all ȳ. Therefore, bounded convergence implies that the above sum converges to 0. From this the desired continuity follows.
The generator of X in the setup of the current section is given by Af =

where

Proposition 5.9 ([28, Proposition 19.9]). If A is the generator of a Feller semigroup (P_t)_{t≥0}, then any dense, (P_t)_{t≥0}-invariant subspace D ⊆ D(A) is a core for A.
Proposition 5.10. Under the assumptions made in this section, and for A being defined as in (13),

Proof. We certainly have that

[15, p. 76], since the PDMP is Feller by Theorem 5.6.
We have to prove that C^∞_b(E, R) is invariant under (P_t)_{t∈[0,∞)}. We show this by proving that, for all k ∈ N,

For k = 0 this is just the Feller property. Since all derivatives are bounded in the sup-norm, differentiation and application of P_t commute, i.e.,

Theorem 5.11 ([28, Theorem 19.25]). Let X be a Feller process in E with semigroup (P_t)_{t≥0} and generator A with domain D(A), and for all n ∈ N let X_n be Feller processes in E with semigroups (P^n_t)_{t≥0} and generators A_n with domains D(A_n). Let D be a core for A. Then the following statements are equivalent:
(ii) for all t > 0 we have P^n_t → P_t as n → ∞ in the strong operator topology,
(iii) for every f ∈ C_0(E, R) and every t_0 ∈ (0, ∞) it holds that

Remark 5.12. The notion of weak convergence of processes in Item (iv) needs an explanation.
Proof. Let σ denote the Skorokhod metric on D([0, ∞), E). Let ε > 0. There exists t > 0 such that ∫_t^∞ e^{−δs} ‖f‖_∞ ds < ε/4. By Skorokhod continuity of F_2 there exists an η > 0 such that for all

We are now in a position to show that cost functionals indeed commute with weak limits of PDMPs.

Lemma 5.14. Let X be a PDMP and (X_n)_{n∈N} be a sequence of PDMPs on the same state space E and with the same cemetery E_c, and let ℓ : E → R and Ψ : E → R be a running reward function and a terminal cost function, respectively. Assume that both ℓ and Ψ are continuous and bounded. Assume further that X^n_0 = x for all n ∈ N, X_0 = x, and

as n → ∞.
Proof. Recall that ℓ ≡ 0 on E_c, and Ψ ≡ 0 on E \ E_c. Moreover, if ω is a path of the PDMPs, then ω(s) = ω(τ) for all s ≥ τ, such that ∫_τ^∞ δ e^{−δs} Ψ(ω(s)) ds = e^{−δτ} Ψ(ω(τ)). This completes the proof.

Also finite-time ruin probabilities, i.e., the probability of the PDMP reaching the cemetery before a given time horizon t, commute with weak limits, as we show next.
Lemma 5.15. Let X be a PDMP and (X^n)_{n∈N} be a sequence of PDMPs on the same state space E and with the same cemetery E_c. Assume further that X^n_0 = x for all n ∈ N and X_0 = x, and that X^n → X weakly as n → ∞. Then the finite-time ruin probabilities of X^n converge to those of X.

Proof. Consider a functional of the same form as F_1 in Remark 5.12, with f_1 = 1_{E_c}. Since the cemetery is a union of entire components {k} × E_k, the indicator function of the cemetery is continuous. Therefore, if we define ψ(x, t) and ψ_n(x, t), n ∈ N, as the finite-time ruin probabilities of X and X^n, respectively, we have lim_{n→∞} ψ_n(x, t) = ψ(x, t) for all x ∈ E and all t ≥ 0.
The following theorem specifies conditions under which Theorem 5.11 is applicable in the PDMP setting.
Theorem 5.16. Let X be a Feller PDMP with local characteristics (φ, λ, Q) and let X^n, n ∈ N, be Feller PDMPs with local characteristics (φ_n, λ_n, Q_n). Further, assume that g_n → g and λ_n → λ uniformly in x ∈ E and that Q_n → Q in the sense of (14). Then X^n → X weakly as n → ∞.

Proof. Let D(A_n), n ∈ N, and D(A) be the domains of the generators A_n, n ∈ N, and A, corresponding to X^n and X, respectively. By Proposition 5.10, D = C^∞_b(E, R) is a core for all generators involved. For every f ∈ D we set f_n = f for all n ∈ N, such that trivially f_n → f as n → ∞. Next, observe that for all n ∈ N the difference A_n f_n − Af splits into the terms (15) and (16). Since Q_n, n ∈ N, and Q are probability measures on (E, B(E)), and since, by assumption, g_n → g and λ_n → λ uniformly in x ∈ E, the terms in (15) converge to zero. The term in (16) can be estimated further; the resulting expression tends to zero, since for all x ∈ E it was assumed that (14) holds, and since λ_n → λ uniformly in x ∈ E. Thus, Item (i) of Theorem 5.11 holds, which implies that Item (iv) of Theorem 5.11 holds. The latter is equivalent to the assertion of this theorem.
Remark 5.17. Note that in the Feller case we can move to another external state only due to a purely random jump, i.e., a jump determined by Q_n for n ∈ N or by Q. Therefore, if we assume uniform convergence of the local characteristics across all state components, and in particular also Q_n → Q in the sense of (14), then the result of Theorem 5.16 still holds.
Since uniform convergence of the local characteristics and the assumption that t*(x) = ∞ are essential in the proof of Theorem 5.16, we need an alternative argument for situations with an active boundary, or for situations in which a smooth approximation fails. A prototypical univariate example for both cases is a drift of the form g(x) = c·1_{x≤b} for some b ∈ R. Here one faces either a discontinuity or a subdivision of R into two state components, i.e., R = {x ∈ R : x ≤ b} ∪ {x ∈ R : x > b}, with a continuous drift on each component. In the next section we present a method for dealing with this particular situation in a specific example.
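To illustrate, here is a minimal sketch of one possible smoothing of such a drift, replacing the jump at the barrier b by a logistic transition of width of order ε. The function names and the logistic profile are our own illustration, not the construction of the next section:

```python
import math

# Discontinuous drift g(x) = c * 1_{x <= b} and a smooth stand-in g_eps.
# The logistic profile is an illustrative choice; it converges to g(x)
# pointwise for every x != b as eps -> 0, but not uniformly around b.
def g(x, c=5.0, b=3.24289):
    return c if x <= b else 0.0

def g_eps(x, c=5.0, b=3.24289, eps=0.01):
    # c / (1 + exp((x - b)/eps)), written via tanh for numerical stability
    return 0.5 * c * (1.0 - math.tanh((x - b) / (2.0 * eps)))
```

Far from the barrier, g_eps agrees with g up to an exponentially small error; directly at x = b it returns c/2, reflecting the pointwise (but not uniform) convergence of the smoothed drift.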

Application to the Cramér-Lundberg model with loan
In this section we apply our smoothing technique to the example presented in Subsection 2.1.4 and calculate the quantity of interest using different numerical integration methods. In this setup, φ_1 solves the ODE ∂/∂t φ_1(y, t) = g_1(φ_1(y, t)) for all y ∈ E_1 and all t ∈ R, with initial condition φ_1(y, 0) = y. In what follows, suppose that Assumption 6.1 holds. We apply the variable substitution t_j = −ln(v_j) and y_j = (χ_{j−1} + c/ρ) z_j, where v_j ∈ [0, 1] and z_j ∈ [0, 1], and set χ̃_j(v_1, ..., v_j, z_1, ..., z_j) = χ_j(t_1, ..., t_j, y_1, ..., y_j). Due to the recursive structure of the functions χ̃_1, χ̃_2, ..., χ̃_{i−1}, the Jacobi matrix of the substitution has lower triangular shape, such that its determinant is the product of the diagonal elements. For being able to reasonably apply (9) we need to bound the Hessian of the integrand. If, for example, the jump size distribution is the Gamma distribution with parameters α, β > 0, i.e., dF_Y(y) = (β^α / Γ(α)) y^{α−1} e^{−βy} dy, then this boils down to the conditions β ≥ 3 and δ + λ > 3, which imply that the integrand is bounded at 0. In the original problem statement this corresponds to an additional integrability condition on the jump size distribution. Finally, for x ∈ E of the form x = (4, b) we obtain an analogous representation.

Remark 6.2. In Section 5.3 the stability of the considered functional of the process with respect to the smoothing parameter ε is dealt with in a fairly general setting. Unfortunately, because of the discontinuity of the drift g in the present example, we cannot achieve uniform convergence of the smoothed drift around the barrier level b, whereas point-wise convergence does hold.

Theorem 6.3. In the setup of this section, there exists a constant C(ε) with lim_{ε→0} C(ε) = 0 such that |φ^ε_1(t, y) − φ_1(t, y)| ≤ C(ε); in particular, |φ^ε_1(t, y) − φ_1(t, y)| ≤ ε for sufficiently small ε > 0.
Recall that T_1 is the time of the first jump of X conditional on X_0 = (1, y). Since the jump intensity is constant on E_1, T_1 is exponentially distributed with intensity λ_N. Hence we can write the corresponding expectation accordingly, where C_1(ε) = max{1, (3/16)(e^{ρC(ε)} − 1)}, see (19). With this, the first term in (20) can be estimated. Next, observe that (we remind the reader that the states x ∈ E are denoted by x = (k, y), which is why in the following the terms y_1, y_2 are not to be confused with the integration variables y_j used in and below (17)) the flows started in y_1 and y_2 remain close. Combining this with (19), we can estimate the second term in (20) by a constant times e^{ρC(ε)} − 1. Furthermore, the third term in (20) can be estimated analogously. Taking the supremum over y ∈ E_1 in (20) and using (21), (22), and (24), we obtain an upper bound Cε for some constant C and for sufficiently small ε. This completes the proof.
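The substitution t_j = −ln(v_j) used above maps the unit interval to the positive half-line and turns integrals against an exponential jump-time density into integrals over [0, 1], the domain required by quasi-Monte Carlo rules. A minimal one-dimensional sketch, where the midpoint rule and the test integrand are our own choices:

```python
import math

# After substituting t = -ln(v), an expectation against the Exp(1) density
# becomes an integral of v -> h(-ln(v)) over the unit interval, which a
# deterministic rule can approximate; the midpoint rule avoids v = 0,
# where -ln(v) is singular.
def integrate_unit(h, n=100_000):
    return sum(h((i + 0.5) / n) for i in range(n)) / n

# Example: E[T] = 1 for T ~ Exp(1), i.e. the integral of -ln(v) over (0, 1).
mean_T = integrate_unit(lambda v: -math.log(v))
```

In the multivariate setting of the text, one such substitution is applied per jump time, which is what produces the lower triangular Jacobi matrix.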

Numerical experiment
We now solve the example presented above numerically. We set the following parameter values: the initial value of the PDMP x_0 = 0, the premium income rate c = 5, the credit rate ρ = 0.05, the intensity of the Poisson process λ = 4, the jump size distribution F_Y(x) = 1 − e^{−αx} for all x ∈ [0, ∞) with α = 1, and the discount rate δ = 0.02. With this, the optimal dividend threshold according to [13] is b = 3.24289. Furthermore, we set the smoothing parameter ε = 0.01.
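For orientation, the following is a plain Monte Carlo sketch of the underlying surplus process with these parameter values, estimating a finite-time ruin probability. It is not the paper's quasi-Monte Carlo scheme: for brevity it ignores the credit rate ρ and the dividend barrier b, and all names (including the horizon T) are our own:

```python
import random

# Simulate Cramér-Lundberg surplus paths x_t = x0 + c*t - sum of claims,
# with Exp(lam) inter-claim times and Exp(alpha) claim sizes, and count
# how many paths drop below zero before the horizon T.
def ruin_probability(x0=0.0, c=5.0, lam=4.0, alpha=1.0, T=10.0,
                     n_paths=20_000, seed=1):
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        t, x = 0.0, x0
        while True:
            w = rng.expovariate(lam)          # inter-claim time
            if t + w > T:
                break                          # survived up to the horizon
            t += w
            x += c * w                         # premium income between claims
            x -= rng.expovariate(alpha)        # Exp(alpha) claim size
            if x < 0:
                ruined += 1
                break
    return ruined / n_paths
```

With these parameters the safety loading is c/(λ/α) − 1 = 0.25, so ruin from x_0 = 0 occurs on a sizeable fraction of paths.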
For computing the flow it is enough to solve the corresponding ODE once and to store the solution for repeated use. We implemented Monte Carlo (random), quasi-Monte Carlo with the Sobol' sequence (Sobol), and quasi-Monte Carlo with a scrambled version of the Halton sequence (scrambled Halton), where scrambling refers to a permutation of digits (see, e.g., [37]). The Sobol' point generator we used was taken from Frances Y. Kuo's homepage [31] and is based on [27].
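As an illustration of the deterministic node sets involved, here is a pure-Python sketch of (unscrambled) Halton points via radical inversion; a digit-scrambled variant, as used in the experiment, would additionally permute the base-b digits (see [37]). All names are our own:

```python
# Radical inverse of n in the given base: mirror the base-b digits of n
# across the decimal point; the Halton sequence uses one prime base per
# coordinate.
def radical_inverse(n, base):
    x, f = 0.0, 1.0 / base
    while n > 0:
        n, d = divmod(n, base)
        x += d * f
        f /= base
    return x

def halton(m, primes=(2, 3, 5, 7)):
    # first m Halton points in dimension len(primes)
    return [[radical_inverse(i + 1, p) for p in primes] for i in range(m)]
```

In base 2 the sequence starts 1/2, 1/4, 3/4, 1/8, ..., filling the unit interval far more evenly than independent uniform draws.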
The reference solution was calculated using Monte Carlo with M = 5000 · 2^10 = 5,120,000 sample paths and d = 1024, meaning that the maximum number of jumps we allow for is 512. In our plots we show the results plotted over an increasing number of integration nodes M ∈ {50 · 2^j : 1 ≤ j ≤ 16}.
Figure 3 shows the estimated standard deviation of the estimate, which is calculated using 50 repetitions with randomly shifted versions of our integration nodes.
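A one-dimensional sketch of this error indicator, with a simple shifted node set and a test integrand of our own choosing: the same deterministic nodes are shifted modulo one several times, and the spread of the resulting estimates serves as the error estimate.

```python
import math, random

# Randomly shift a deterministic node set modulo 1 and collect the resulting
# integral estimates; their sample standard deviation is the kind of error
# indicator plotted in Figure 3.
def shifted_estimates(f, nodes, n_shifts=50, seed=0):
    rng = random.Random(seed)
    ests = []
    for _ in range(n_shifts):
        s = rng.random()
        ests.append(sum(f((x + s) % 1.0) for x in nodes) / len(nodes))
    return ests

nodes = [i / 64 for i in range(64)]          # an equidistant 1-d stand-in
ests = shifted_estimates(lambda x: x * x, nodes)
mean = sum(ests) / len(ests)
std = math.sqrt(sum((e - mean) ** 2 for e in ests) / (len(ests) - 1))
```

Each shifted estimate is an unbiased estimator of ∫_0^1 x^2 dx = 1/3, so the mean stays close to 1/3 while std quantifies the integration error of the node set.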
So the domain D(A) of the generator consists of all functions in C_b(E, R) which are continuously differentiable along the trajectories of the flow on all components, cf. Ethier and Kurtz [22, page 8], and C^1_b(E, R) ⊆ D(A).

Definition 5.8 ([28, Chapter 19]). Let A be a closed linear operator with domain of definition D(A). A core for A is a linear subspace D ⊆ D(A) such that the restriction A|_D has closure A.

Figure 3: The estimated standard deviation of the estimate.