A static replication approach for callable interest rate derivatives: mathematical foundations and efficient estimation of SIMM–MVA

The computation of credit risk measures such as exposure and Credit Value Adjustments (CVA) requires the simulation of future portfolio prices. Recent metrics, such as dynamic Initial Margin (IM) and Margin Value Adjustments (MVA), additionally require the simulation of future conditional sensitivities. For portfolios with non-linear instruments that do not admit closed-form valuation formulas, this poses a significant computational challenge. We address this problem by proposing a static replication algorithm for interest rate options with early-exercise features under an affine term-structure model. Under the appropriate conditions, we can find an equivalent portfolio of vanilla options that replicates these products. Specifically, we decompose the product into a portfolio of European swaptions. The weights and strikes of the portfolio are obtained by regressing the target option value with interpretable, feed-forward neural networks. Once an equivalent portfolio of European swaptions is determined, we can leverage closed-form expressions to obtain the conditional prices and sensitivities, which serve as an input to exposure and SIMM-driven MVA quantification. For a consistent forward sensitivity estimation, this involves the differentiation of the portfolio weights. The accuracy and convergence of the method are demonstrated through several representative numerical examples, benchmarked against the established least-squares Monte Carlo method.


Introduction
This research contributes to the quantification of risk measures related to counterparty credit risk (CCR) for interest rate derivatives that are traded over-the-counter (OTC). CCR refers to the risk that the counterparty in a bilateral agreement may default before the final settlement of every cash-flow and will fail to meet its financial obligations (Bank for International Settlements 2019). This type of risk has gained a lot of attention in the aftermath of the global financial crisis of 2007-08. Since then, the regulators have introduced numerous measures to mitigate systemic credit risk that have resulted in quantifying CCR through various value adjustments (xVA) (Gregory 2020). The family of xVAs includes CVA, which is defined as the expected loss that is incurred from a counterparty defaulting. Other xVAs quantify the cost related to funding, collateral and capital requirements. A general introduction to the challenges related to CCR quantification can be found in Zhu and Pykhtin (2007).
An important means to reduce CCR for uncleared OTC derivatives is to post collateral (Gregory 2020). For interest rate derivatives we can in general distinguish two types of collateral: Variation Margin (VM) and Initial Margin (IM). VM matches the current value of the underlying portfolio and needs to be updated on a regular basis. The funding cost related to VM is known as FVA and is reflected by the expected exposure of the portfolio (Burgard and Kjaer 2013), similar to CVA. Posting VM significantly reduces the exposure to counterparty risk but does not bring it down to zero. Typically there will be a delay between the event of default and the settlement of all outstanding positions, which is known as the margin period of risk (Gregory 2020). IM serves as a protection against exposure changes due to market moves during this close-out period. In general, IM is supposed to cover the potential future exposure of a contract over a time horizon of 10 business days with a 99% confidence level. In that regard, the volume of IM typically reflects the 99% value-at-risk (VaR) of the 10-day move in portfolio value. A portfolio's VaR varies over time and should therefore (just like VM) be updated on a regular basis.
A VaR measure is highly model-dependent and computationally intensive to quantify. To address these problems, the International Swaps and Derivatives Association (ISDA) has developed the Standard Initial Margin Model (SIMM). Since its publication in 2016, SIMM has become an industry-standard approach to IM quantification. Benefits of such a common methodology include a transparent dispute resolution and a consistent regulatory governance of collateral (International Swaps and Derivatives Association, Inc 2016). In short, SIMM is a sensitivity-based approach, in which VaR measures are approximated using the instrument's sensitivity to the shock of an underlying risk-factor. For the user of SIMM, this means that IM quantification reduces to the calculation of a set of bucketed portfolio sensitivities, such as Delta and Vega.
The exchange of collateral needs to be funded, which comes at a cost for the dealer. The total expected funding cost over the life-time of a derivative contract is known as the Margin Value Adjustment (MVA) (Green and Kenyon 2015) and is a recent addition to the xVA-collection. An ISDA margin survey reported that market participants collected over $300 billion of IM for non-cleared derivatives at year-end 2021 (International Swaps and Derivatives Association, Inc 2021). Due to its volume, an adequate quantification and management of IM-induced funding cost is highly relevant. Under the assumption that the size of IM is implied by SIMM and thus by sensitivities, it follows that MVA is driven by the future distribution of portfolio sensitivities over the life-time of the trade. Within a Monte Carlo simulation framework, this means sensitivities need to be calculated along the Monte Carlo path. For vanilla derivatives, which admit closed-form Greeks, the computational cost may be manageable. However, for exotic derivatives, such as options with early-exercise features, this is a demanding problem.
In this work, we present a universal approach to the computation of prices, Greeks, exposures and sensitivities along the Monte Carlo path for callable interest rate derivatives. Our focus will be on Bermudan swaptions, a class of exotic OTC derivatives that is heavily traded in the market. Our methodology is an extension of the work presented in Lokeshwar et al. (2022) and Hoencamp et al. (2023) and relies on the concept of static replication. A static replication is a portfolio of vanilla instruments that mirrors the value of the original exotic option until it is either exercised or matured. The portfolio composition of a static replication is constant throughout the life-time of the trade. This is in contrast with a dynamic replication, which needs to be continuously rebalanced (such as a traditional Delta hedge), or a semi-static replication, which needs to be rebalanced on a finite number of instances. The decomposition of a complex product into a portfolio of vanilla options greatly simplifies its risk analysis and allows for an efficient price and sensitivity calculation.
In the context of CVA, methodologies that facilitate efficient exposure calculations have received a lot of attention in the recent literature. Ordinary least-squares regression techniques for American-style options embedded in a simulation framework have been popularized by Carriere (1996), Tsitsiklis and Van Roy (2001) and Longstaff and Schwartz (2001). For exposure purposes, this regression methodology is extended by Joshi and Kwon (2016) and Feng et al. (2016) to reduce regression bias and noise along the Monte Carlo path. In other works, Monte Carlo methods have been combined with finite-difference estimation (De Graaf et al. 2014, Simaitis et al. 2016), the COS-method (Shen et al. 2013, Feng et al. 2016), the Stochastic Grid Bundling Method (SGBM) (Karlsson et al. 2016) and Chebyshev interpolation (Glau et al. 2021). In recent advances, machine learning has been applied to facilitate efficiency in exposure calculations. This includes a Deep Optimal Stopping algorithm to estimate the exercise strategy (Andersson and Oosterlee 2021) and a Deep xVA Solver, which relies on a neural network-based BSDE solver.
The publication of SIMM and the increasing significance of MVA have been incentives to investigate methodologies that facilitate efficient sensitivity calculation. For practitioners, the default approach is 'bump-and-reval', due to its simplicity and straightforward implementation. For spot calculations, this is feasible, but for an application such as MVA, it would imply path-wise model re-calibration and option re-evaluation for each scenario, time-step and sensitivity-component. Thus, the computational burden would be excessive. In Glasserman (2004), unbiased Monte Carlo sensitivity estimators are proposed, which yield better convergence properties than the traditional finite-difference estimator. These are known as the likelihood ratio method, which relies on differentiating the payoff probability density, and the path-wise derivative method, which relies on differentiating the payoff function itself. Yet the computational improvement of these methods can be limited for path-dependent instruments.
The application of algorithmic differentiation (AD) to the path-wise derivative method has been shown to yield a significant efficiency gain. The concept of AD is that the estimation routine is decomposed into a series of basic operations, which are differentiated by a repeated application of the chain rule (Griewank and Walther 2008). The differentiation is either performed in a forward iteration (tangent mode) or backward iteration (adjoint mode, AAD). See for example Giles and Glasserman (2006), Capriotti (2011) or Capriotti et al. (2017) for an application with American-style derivatives. The tangent mode is efficient if the number of independent variables (i.e. the dimensionality of the function input) is small. It is therefore less suitable for the calculation of large gradients. In contrast, the adjoint mode is efficient if the number of dependent variables (i.e. the dimensionality of the function output) is small. A limitation of AAD, however, is that it is demanding in terms of memory and its implementation is often technical. Others have combined adjoint differentiation with LSMC (Antonov et al. 2017, Caspers and Lichters 2018), stochastic automatic differentiation (Fries 2019) or SGBM (Jain et al. 2019). Alternative approaches to path-wise sensitivity approximations include principal component analysis (Kappen 2017), Chebyshev interpolation (Zeron and Ruiz 2018) and least-squares regression (Lakhany and Zhang 2021).
With a focus on Bermudan swaptions under an affine term-structure model, our contribution to the literature is threefold. First, we extend the semi-static replication algorithm that is presented in Lokeshwar et al. (2022) and Hoencamp et al. (2023) to a replication approach that is truly static. The method is based on formulating the portfolio optimization as the regression of a shallow neural network to the target option value. As a result, the trained parameters represent the weights and strikes of the portfolio composition. Second, we show that with the proper constraints on the regression, a Bermudan can be replicated with a portfolio of European swaptions. The accuracy is demonstrated through several numerical examples. Third, we derive efficient estimators for sensitivities along the Monte Carlo path, which serve as an input to dynamic IM quantification. Here we exploit the fact that Deltas and Vegas of European swaptions are available in closed form. We also show that for complete and accurate sensitivities, a differentiation of the portfolio weights needs to be incorporated. We demonstrate the performance of the algorithm through several numerical examples and convergence analyses, benchmarking the results against the established least-squares Monte Carlo method (Longstaff and Schwartz 2001).
The outline of the article is as follows. In section 2, we define the modeling landscape, the considered interest rate derivatives and the risk metrics of interest. In section 3, we present the regression algorithm and show how we can achieve a static replication for Bermudan swaptions. In section 4, we derive the estimators used for price and sensitivity calculation along the Monte Carlo path. Section 5 presents several representative numerical examples. We conclude with a summary of the results in section 6.

Problem formulation
A problem overview is provided in figures 1-4.

Problem overview
(1) Initial Margin (IM) is a form of collateral, which serves as a protection against the 99% value-at-risk of the exposure change during the margin period of risk (figure 1).
(2) The industry standard approach to IM quantification is the Standard Initial Margin Model (SIMM), which takes the portfolio sensitivities as input (figure 2).
(3) The expected funding cost of posting IM is called Margin Value Adjustment (MVA). Quantifying MVA requires the simulation of future IM (i.e. portfolio sensitivities), which is a computational challenge (figure 3).
(4) We propose a method to statically replicate a callable derivative with a portfolio of vanilla options. This simplifies the computation of path-wise sensitivities and future IM (figure 4).

Model and assumptions
First, we fix a finite time-horizon $[0, T]$, on which we consider a continuous-time financial market. We assume the market is frictionless and free of arbitrage. We let the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ represent all possible states of the economy and the filtration $\mathbb{F} = (\mathcal{F}_t)_{t\in[0,T]}$ all information generated by the economy up to time $t$.
We consider the notion of a bank account or money market account. Investments in the money market are assumed to compound at a continuous, risk-free interest rate $r_t$. We refer to $r_t$ as the short-rate. The time-$t$ value of a unit of currency invested in the money market at time zero is denoted as $M_t$, and we assume it satisfies the following dynamics

$$\mathrm{d}M_t = r_t M_t\,\mathrm{d}t, \qquad M_0 = 1.$$

We denote by $\mathbb{Q}$ the risk-neutral measure equivalent to $\mathbb{P}$, which is associated with $M_t$ as the numéraire. Attainable claims denominated by the numéraire are assumed to be martingales under $\mathbb{Q}$, which guarantees the absence of arbitrage (Harrison and Pliska 1981).
Throughout this paper we shall consider short-rate dynamics that are captured by an affine term-structure model, as introduced in Duffie and Kan (1996). That means that the short-rate can be written as

$$r_t = g(x_t)$$

for some function $g : \mathbb{R}^d \to \mathbb{R}$. The vector $x_t := \{x_1(t), \ldots, x_d(t)\}$ denotes a Markovian system of state-variables in $\mathbb{R}^d$. The state-variables are assumed to satisfy an SDE of the form

$$\mathrm{d}x_t = \mu(t, x_t)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}W_t,$$

with $W_t$ a $d$-dimensional Brownian motion under $\mathbb{Q}$ and $\sigma(t)$ a state-independent diffusion coefficient. We let $g(\cdot)$ and $\mu(t, \cdot)$ be affine functions of $x_t$, satisfying the standard regularity conditions, such that $x_t$ admits a strong solution. Note that we restrict our scope to the Gaussian subclass of affine short-rate models. This is done to impose an intuitive relationship between model parameters and model-implied option volatilities (see section 3.5), which facilitates the tractability of the computations, the estimation of Vega in particular. Generalizations to state-dependent diffusion coefficients should be possible but will involve the differentiation of Fourier transforms. This increases the complexity of the problem and falls outside the scope of this paper.
A zero-coupon bond is a contract that guarantees the holder one unit of currency at a pre-specified maturity date $T$. We will denote its time-$t$ value as $P(t, T)$ for $t \le T$. An important result for affine term-structure models is that zero-coupon bond prices are exponential-affine in $x_t$. See for example Andersen and Piterbarg (2010) and Duffie and Kan (1996) for details. Next to zero-coupon bonds, we will often refer to a related quantity, known as the zero-rate. We define a zero-rate as

$$R(t, T) := -\frac{\log P(t, T)}{\tau},$$
where $\tau$ denotes the year fraction between date $t$ and $T$. For simplicity, we will assume that the collateral rate used for discounting and the instantaneous rate used to derive term-rates are both implied by the same short-rate $r_t$. In other words, our set-up will be a classic single-curve model environment. As term rates, we consider the classic, forward-looking LIBOR. Similarly, one could consider a backward-looking, RFR-based term rate. In certain markets, the IBOR benchmark is discontinued in favor of such set-in-arrears term rates (see Lyashenko and Mercurio 2019). As forward rates of forward- and backward-looking term rates are equivalent before the start of the accrual period, their treatment within this framework would mostly be the same.
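To make the exponential-affine bond price and the zero-rate definition concrete, the sketch below uses a one-factor Vasicek short rate as a stand-in for the Gaussian affine class. The Vasicek specification, the function names and all parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def vasicek_zcb(r, tau, a=0.1, theta=0.03, sigma=0.01):
    """Zero-coupon bond price P(t, t + tau) = exp(A - B * r) under the
    (assumed) Vasicek model dr = a * (theta - r) dt + sigma dW."""
    if tau == 0.0:
        return 1.0
    b = (1.0 - math.exp(-a * tau)) / a
    log_a = (theta - sigma**2 / (2.0 * a**2)) * (b - tau) - sigma**2 * b**2 / (4.0 * a)
    return math.exp(log_a - b * r)


def zero_rate(price, tau):
    """Zero rate R(t, T) = -log P(t, T) / tau."""
    return -math.log(price) / tau


# Example: 5-year bond price and zero rate when the short rate equals 2%.
p5 = vasicek_zcb(r=0.02, tau=5.0)
r5 = zero_rate(p5, 5.0)
```

The log bond price is linear in the state variable, which is the property the later closed-form valuations rely on.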

Interest rate derivatives
The focus of this work will be on modeling Bermudan swaptions. Here we briefly introduce related derivatives and notation.
Interest rate swap: We consider fixed-for-floating interest rate swaps, where floating LIBOR payments are exchanged against fixed rate payments on dates $T_0, \ldots, T_M$. The year-fraction between $T_{j-1}$ and $T_j$ will be denoted $\tau_j$.
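European swaption: A European swaption is a contract which gives the holder the right, but not the obligation, to enter a fixed-for-floating interest rate swap with pre-specified fixed rate $K$ at a pre-specified future inception date $T_0$. Considering the annuity $A_{0,M}(t) := \sum_{m=1}^{M} \tau_m P(t, T_m)$ as numéraire, the risk-neutral value of a swaption (see Filipovic 2009) can be expressed as

$$V^{\mathrm{Swpt}}(t, K) = A_{0,M}(t)\,\mathbb{E}^{\mathbb{Q}^{0,M}}\Big[\max\big\{\delta\,(S_{0,M}(T_0) - K), 0\big\}\,\Big|\,\mathcal{F}_t\Big], \tag{2}$$

where $S_{0,M}$ denotes the swap rate and the expectation is taken under the annuity measure $\mathbb{Q}^{0,M}$. The parameter $\delta$ distinguishes between a payer ($\delta = 1$) and a receiver ($\delta = -1$) swaption. Under the assumption that the swap rate has Gaussian dynamics, the swaption price can be evaluated using Bachelier's formula (see Musiela and Rutkowski 1997). Given that a swaption has market price $V^{\mathrm{mkt}}(t, K)$, the implied normal volatility is defined as the unique scalar $\sigma_{\mathrm{imp}}$ that solves the equation

$$V^{\mathrm{mkt}}(t, K) = V^{\mathrm{Bach}}(t, K; \sigma_{\mathrm{imp}}),$$

with $V^{\mathrm{Bach}}$ denoting the Bachelier price.

Bermudan swaption: A Bermudan swaption is a contract which gives the holder the right to enter a fixed-for-floating interest rate swap with maturity $T_M$ at any of the dates $T_0, \ldots, T_{M-1}$ at a pre-specified fixed rate $K$. Should the holder of the Bermudan choose to exercise at $T_m$, the payoff is given by

$$h_m(x_{T_m}) := \max\big\{\delta\,A_{m,M}(T_m)\,(S_{m,M}(T_m) - K), 0\big\}.$$

The value of a Bermudan swaption can be represented through a dynamic programming formulation, in which the price is captured by means of a backward recursion (Glasserman 2004), given by

$$V(T_m, x_{T_m}) = \max\big\{h_m(x_{T_m}),\, C_m(T_m, x_{T_m})\big\}, \qquad C_m(T_m, x_{T_m}) = M_{T_m}\,\mathbb{E}^{\mathbb{Q}}\Big[\frac{V(T_{m+1}, x_{T_{m+1}})}{M_{T_{m+1}}}\,\Big|\,\mathcal{F}_{T_m}\Big],$$

where $C_m$ represents the continuation value of the contract, with $C_{M-1} \equiv 0$ at the final exercise date.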
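The Bachelier valuation and the implied normal volatility described for the European swaption can be sketched as follows. This is a minimal illustration that assumes a Gaussian swap rate with a constant normal volatility; the function names, the bisection bounds and the numerical inputs are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bachelier_swaption(annuity, swap_rate, strike, vol, expiry, delta=1):
    """Payer (delta=1) or receiver (delta=-1) swaption under a Gaussian swap rate."""
    std = vol * math.sqrt(expiry)
    intrinsic = delta * (swap_rate - strike)
    if std == 0.0:
        return annuity * max(intrinsic, 0.0)
    d = intrinsic / std
    return annuity * (intrinsic * norm_cdf(d) + std * norm_pdf(d))


def implied_normal_vol(price, annuity, swap_rate, strike, expiry, delta=1,
                       lo=1e-8, hi=1.0, tol=1e-12):
    """Bisection on the Bachelier price, which is increasing in the volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bachelier_swaption(annuity, swap_rate, strike, mid, expiry, delta) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price a payer swaption at 80bp normal vol and recover the vol.
price = bachelier_swaption(4.5, 0.03, 0.035, 0.008, 2.0, delta=1)
vol = implied_normal_vol(price, 4.5, 0.03, 0.035, 2.0, delta=1)
```

Bisection is used here only to keep the sketch dependency-free; any bracketing root finder would serve the same purpose.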

Credit risk metrics
In this work, we will consider credit exposure and forward sensitivity profiles of Bermudan swaptions. Exposure is essential for CVA computation. Product sensitivities, Delta and Vega in particular, serve as an input to ISDA SIMM calculations for initial margin.
2.4.1. Exposure and CVA. The expected positive exposure (EPE) of a financial contract with price $V_t$ at time $t \ge 0$ is

$$\mathrm{EPE}(t) := \mathbb{E}^{\mathbb{Q}}\Big[\frac{\max\{V_t, 0\}}{M_t}\,\Big|\,\mathcal{F}_0\Big].$$

Expected positive exposure represents the expected loss on a claim in the event of a counterparty defaulting at a given time $t \ge 0$ (Gregory 2020). The expectation is evaluated under the risk-neutral measure $\mathbb{Q}$. Exposure is a key ingredient in the quantification of counterparty risk at trade or counterparty level through CVA. CVA is the difference between the total value of a derivative in a market that is completed with counterparty risk and its default-free value (Burgard and Kjaer 2013). Let $\tau \ge 0$ denote the default time of the counterparty. Assuming $\tau$ is independent of exposure (i.e. ignoring wrong- or right-way risk), CVA can be computed as follows (Green 2015)

$$\mathrm{CVA} = (1 - \mathrm{RR}) \int_0^T \mathrm{EPE}(t)\,\mathrm{d}\mathrm{PD}(t),$$

with RR denoting the recovery rate, $\mathrm{LGD} = 1 - \mathrm{RR}$ the loss given default and PD the probability of default of the counterparty. A common approach in the literature is to model the default time $\tau$ as an $\mathcal{F}_t$-adapted stopping time using a Cox-construction (Jeanblanc and Li 2020). In that case, the survival function of the counterparty follows an exponential distribution, i.e. $\mathbb{Q}(\tau > t) = \exp\{-\Lambda(t)\}$ with $\Lambda(t)$ denoting a hazard function. The hazard function can be specified in terms of a hazard rate $\lambda(t)$, which yields $\Lambda(t) = \int_0^t \lambda(u)\,\mathrm{d}u$. Therefore, the probability of default can be written as

$$\mathrm{PD}(t) = 1 - \exp\Big\{-\int_0^t \lambda(u)\,\mathrm{d}u\Big\}.$$

2.4.2. Initial margin and MVA. An important means to reduce the exposure at risk is the exchange of collateral. In this work, we will consider initial margin (IM) and its related funding costs. IM is a protection against exposure changes during the margin period of risk, the time interval after a default when outstanding positions are not yet settled, but VM is no longer updated (Green and Kenyon 2015). Typically, IM reflects the 99% value-at-risk of the portfolio change over a 10-day time interval. Throughout this work, we will consider the Standard Initial Margin Model (SIMM) to be the default method for IM computations. The main take-away from SIMM-IM is that it is a sensitivity-based approach. The idea is that the response of an instrument to a risk-factor shock is efficiently approximated by multiplying the instrument's risk-factor sensitivity by the corresponding shock size (International Swaps and Derivatives Association, Inc 2016). We will provide a brief summary of SIMM below, but refer to International Swaps and Derivatives Association, Inc (2020) for details.
Consider risk-factors $\theta := (\theta_1, \ldots, \theta_K)$ and denote the 10-day risk-factor increments as $\Delta\theta_1, \ldots, \Delta\theta_K$. The variance of the total response of an instrument $V_t$ to the shocks $\Delta\theta_i$ is computed as

$$\mathrm{Var}\Big(\sum_{i=1}^{K} \frac{\partial V_t}{\partial \theta_i}\,\Delta\theta_i\Big) = (D_\theta V_t)^\top\,\Sigma_\theta\,D_\theta V_t,$$

where $D_\theta V_t$ denotes the gradient of $V_t$ w.r.t. $\theta$ and $\Sigma_\theta$ denotes the covariance matrix of $\Delta\theta$. If the total response defined above is assumed to be approximately Gaussian with mean zero, the VaR can be estimated as follows

$$\mathrm{VaR}_{99\%} \approx \Phi^{-1}(0.99)\,\sqrt{(D_\theta V_t)^\top\,\Sigma_\theta\,D_\theta V_t},$$

where $\Phi^{-1}$ denotes the inverse of the standard normal CDF. For interest rate derivatives, ISDA distinguishes three relevant types of risk-factors, namely Delta-risk, Vega-risk and Curvature-risk, the latter referring to the second-order Gamma impact. The total IM is given by

$$\mathrm{IM} = \mathrm{IM}_{\mathrm{Delta}} + \mathrm{IM}_{\mathrm{Vega}} + \mathrm{IM}_{\mathrm{Curvature}},$$

where each component represents a VaR estimation w.r.t. a type of underlying risk-factors, i.e. market interest rates and market implied volatilities. The risk-factors of each type are subdivided into twelve buckets, corresponding to the yields at the tenors $\mathcal{T} := \{2W, 1M, 3M, 6M, 1Y, 2Y, 3Y, 5Y, 10Y, 15Y, 20Y, 30Y\}$. Hence for each component one needs to compute the corresponding 12-dimensional sensitivity vector. These are specified as (see also International Swaps and Derivatives Association, Inc 2020):

Delta: $\Delta_k := \dfrac{\partial V_t}{\partial R(t, t + \tau_k)}$, for $\tau_k \in \mathcal{T}$.

Vega: $\nu_k := \dfrac{\partial V_t}{\partial \sigma_k}$, for $\tau_k \in \mathcal{T}$. Here, $\sigma_k$ denotes the implied at-the-money volatility of a swaption with an expiry time equal to $\tau_k \in \mathcal{T}$.

Curvature: Curvature margin represents the second-order impact of the interest rate risk-factors on the VaR estimate. In SIMM, the Gamma is approximated with a Vega-Gamma relationship. Therefore, the computation of Curvature margin takes the Vega sensitivity vector as input, similar to Vega margin.
For SIMM-IM quantification in practice, the user only needs to compute the portfolio sensitivities. The other parameters are provided by ISDA in an annual publication.
A key difference between VM and IM is that the former is symmetric. VM reflects the value of the underlying trade and can be rehypothecated. This means the receiver can reuse the collateral, for example in the margin agreement of an opposite transaction. In contrast, IM reflects the risk of a trade and must be exchanged in both directions, without netting the amounts. The collateral is often required to be posted on a segregated account and therefore cannot be rehypothecated (Gregory 2020). As a result, IM must be funded by the dealer, which comes at a cost. The expected lifetime cost of posting IM against a portfolio is called Margin Value Adjustment (MVA) (Green and Kenyon 2015).
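The sensitivity-based VaR approximation that underlies SIMM can be illustrated with a short delta-normal calculation. The sketch below is not the SIMM methodology itself: it omits the ISDA risk weights, bucket correlations and concentration thresholds, and the sensitivities and covariance matrix used in the example are made-up numbers.

```python
from statistics import NormalDist

import numpy as np


def delta_normal_var(sensitivities, covariance, confidence=0.99):
    """Quantile of a zero-mean Gaussian P&L response g . dtheta,
    with g the gradient and `covariance` the covariance of the 10-day shocks."""
    g = np.asarray(sensitivities, dtype=float)
    cov = np.asarray(covariance, dtype=float)
    variance = float(g @ cov @ g)
    return NormalDist().inv_cdf(confidence) * np.sqrt(variance)


# Hypothetical example: two rate buckets with 10-day shock volatilities of
# 20bp and 25bp and a correlation of 0.8 (illustrative values only).
vols = np.array([0.0020, 0.0025])
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
cov = np.outer(vols, vols) * corr
im_proxy = delta_normal_var([1.2e6, -0.4e6], cov)
```

In a path-wise setting this calculation would be repeated per scenario and time-step, which is why cheap conditional sensitivities matter.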
The size of MVA is driven by the dynamic IM profile of the portfolio and the funding spread that applies to the IM posting. As the volume of IM is dependent on the state of the economy, $\{\mathrm{IM}(t) : t \in [0, T]\}$ represents a stochastic process. The expected initial margin (EIM) of a financial contract at time $t \ge 0$ is defined as

$$\mathrm{EIM}(t) := \mathbb{E}^{\mathbb{Q}}\Big[\frac{\mathrm{IM}(t)}{M_t}\,\Big|\,\mathcal{F}_0\Big].$$

EIM represents the expected volume of IM that needs to be posted by the dealer at a given time $t \ge 0$ (Gregory 2020).
With this expression at hand, MVA can be computed as follows

$$\mathrm{MVA} = \int_0^T \mathrm{FS}(t)\,\mathrm{EIM}(t)\,\mathrm{d}t,$$

with FS denoting the funding spread, which reflects the spread on the collateral rate w.r.t. the risk-free rate (Green and Kenyon 2015).
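Both CVA and MVA reduce to one-dimensional integrals over an expected profile once that profile has been simulated. The following sketch discretizes them on a time grid, assuming a constant hazard rate, a constant funding spread and profiles that are already discounted; these simplifications, the function names and the flat example profiles are assumptions for illustration.

```python
import numpy as np


def default_probability(times, hazard_rate):
    """PD(t) = 1 - exp(-lambda * t) for a constant hazard rate lambda."""
    return 1.0 - np.exp(-hazard_rate * np.asarray(times, dtype=float))


def cva(times, epe, hazard_rate, recovery=0.4):
    """(1 - RR) * sum_i EPE(t_i) * [PD(t_i) - PD(t_{i-1})], with PD(0) = 0."""
    pd_increments = np.diff(default_probability(times, hazard_rate), prepend=0.0)
    return (1.0 - recovery) * float(np.sum(np.asarray(epe, dtype=float) * pd_increments))


def mva(times, eim, funding_spread):
    """Trapezoidal approximation of FS * int_0^T EIM(t) dt."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(eim, dtype=float)
    return funding_spread * float(np.sum(0.5 * (e[1:] + e[:-1]) * np.diff(t)))


# Flat illustrative profiles on a monthly grid over five years.
grid = np.linspace(0.0, 5.0, 61)
cva_value = cva(grid, np.full(grid.size, 1.0), hazard_rate=0.02)
mva_value = mva(grid, np.full(grid.size, 10.0), funding_spread=0.01)
```

With flat profiles both quantities have closed forms, which makes the discretization easy to sanity-check before plugging in simulated EPE and EIM curves.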

The static replication method
In this section, we describe the algorithm that is used to compose a static replication portfolio for a Bermudan swaption. The algorithm is embedded in a Monte Carlo framework and relies on a regress-later technique. See Glasserman and Yu (2004) for a general introduction to regress-later. The method is strongly inspired by the work of Jain and Oosterlee (2015) and extends the approach that is presented in Lokeshwar et al. (2022) and Hoencamp et al. (2023).
In Jain and Oosterlee (2015), it is shown that early-exercise options can be priced with an application of a regress-later technique. As such, the 'later' option price $V_{T_{m+1}}$ is regressed against the 'later' risk-factor realization $x_{T_{m+1}}$. The continuation value is subsequently estimated by evaluating the conditional expectation of the regressed option value, i.e.

$$C_m(T_m, x_{T_m}) \approx \mathbb{E}^{\mathbb{Q}}\Big[\frac{M_{T_m}}{M_{T_{m+1}}} \sum_{k} \beta_k\,\phi_k(x_{T_{m+1}})\,\Big|\,\mathcal{F}_{T_m}\Big],$$

with $\beta_k$ denoting the regression coefficients, which can be computed in (semi) closed-form for an appropriate choice of basis-functions $\phi_k$.
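A toy version of the regress-later idea may help fix ideas. The sketch below regresses a 'later' value on monomials of the 'later' state and then takes the conditional expectation of each monomial analytically. It assumes a driftless Gaussian state, zero interest rates and a quadratic toy value function; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, dt, n_paths = 0.2, 0.5, 20_000

x_now = rng.normal(0.0, 1.0, n_paths)                               # state at T_m
x_later = x_now + sigma * np.sqrt(dt) * rng.normal(size=n_paths)    # state at T_{m+1}
v_later = (x_later - 0.1) ** 2                                      # toy 'later' value

# Step 1: regress the later value on basis functions of the LATER state.
basis_later = np.vander(x_later, 3, increasing=True)                # columns 1, x, x^2
beta, *_ = np.linalg.lstsq(basis_later, v_later, rcond=None)

# Step 2: take the conditional expectation of each basis function in closed form:
# E[1 | x] = 1, E[x' | x] = x, E[x'^2 | x] = x^2 + sigma^2 * dt.
cond_basis = np.column_stack([np.ones(n_paths), x_now, x_now**2 + sigma**2 * dt])
continuation = cond_basis @ beta
```

Because the regression is performed at the later date, the fit itself carries no conditional-expectation noise; all smoothing comes from the analytic second step.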
In Lokeshwar et al. (2022), it is proposed to perform the regression with a shallow neural network (i.e. a feed-forward neural network with a single hidden layer). By considering the appropriate network structure, it is shown that the regression can be interpreted as a portfolio of short-maturity options. As a consequence, the continuation value can be evaluated by simply pricing the replication instruments. In Hoencamp et al. (2023), this approach is extended to an interest rate modeling framework. There it is shown that a Bermudan swaption can be semi-statically replicated by an options portfolio written on zero-coupon bonds. The central concept of the latter two studies is that if a portfolio perfectly reproduces the value function of a derivative security at some future time $T_j$ for every realization of the state-variables, the no-arbitrage condition implies that this portfolio will also replicate the security at any time $t < T_j$, as long as no cash-flows can occur between $t$ and $T_j$. Where a dynamic replicating portfolio needs to be rebalanced continuously through time, the semi-static replication only needs rebalancing on a finite number of instances.
The algorithm that is proposed in this paper builds further on the foundations that were laid by Jain and Oosterlee (2015), Lokeshwar et al. (2022) and Hoencamp et al. (2023). We suggest two main novelties compared to the earlier studies:
• We propose to use the swap rate as the regression variable, which is an implicit risk-factor in our model set-up. As a consequence, the regression can be interpreted as the payoff of a portfolio of European swaptions. We further elaborate on this in Section 3.4.
• We propose a new variation to the algorithm, which allows us to compose a replication that is not semi-static, but in fact truly static. In other words, this algorithm lets one compose a portfolio at time zero, which will mirror the value of the Bermudan until it is either exercised or expired, without the need of updating the portfolio at intermediate time-points. We further elaborate on this in Section 3.3.

The algorithm
The algorithm is executed in an iterative manner, moving backwards in time. Starting at the final exercise date $T_{M-1}$, a regression is performed to determine the weights and strikes of an options portfolio which replicates the Bermudan's payoff. This replication is subsequently used to estimate the value of the Bermudan at the preceding monitor date, yielding the target function of the consecutive regression. This process is repeated until the first monitor date $T_0$ is reached and a portfolio has been composed which statically replicates the Bermudan between time zero and its final exercise date. Below we will describe the algorithm in more detail.
Start by sampling $N$ trajectories of the underlying state-variable. For each realization of $x_{T_{M-1}}$, compute the Bermudan swaption payoff $h_{M-1}(x_{T_{M-1}})$ at $T_{M-1}$ (the final exercise opportunity). This will act as the target function of the first regression. Now recursively execute the following steps for $m = M-1, \ldots, 0$:

Select and sample the regression variable
Select an asset $z_m(t) = \{z_{m,1}(t, x_t), \ldots, z_{m,d}(t, x_t)\}$ taking value in $\mathbb{R}^d$ that will act as the independent regression variable. For compactness, we will suppress the dependency of $z_m$ on the state variable $x_t$, when clear from the context. The asset(s) should satisfy the following conditions: where $f_m$ denotes a target function, which we define in (8) below. Subsequently compute for each sample path the realizations of $z_m(T_m)$.

Regress the target function
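Consider a regression function $G_m : \mathbb{R}^d \to \mathbb{R}$ that is defined as

$$G_m(z) := A_2 \circ \varphi \circ A_1\,(z). \tag{7}$$

Here we denote by $A_1 : \mathbb{R}^d \to \mathbb{R}^n$, $A_2 : \mathbb{R}^n \to \mathbb{R}$ affine functions of the form

$$A_1(z) = W_1 z + b_1, \qquad A_2(y) = w_2^\top y + b_2,$$

and by $\varphi$ the component-wise activation $\varphi(y)_i = \max\{y_i, 0\}$. The network is fitted to the target function

$$f_m(x) := \frac{\max\{h_m(x) - C_m(T_m, x),\, 0\}}{q_m(x)}, \tag{8}$$

where $q_m(x)$ denotes a possible scaling factor. The rationale behind this target function is that we aim to regress the fraction of the option value that is lost when an exercise date passes but the option is not exercised. We provide an elaborate motivation and interpretation in section 3.3. Subsequently, optimize the parameters

$$\xi_m := \{W_1, b_1, w_2, b_2\}.$$

The parameters are determined by minimizing the cost function

$$L(\xi_m) := \frac{1}{N} \sum_{n=1}^{N} \Big(G_m\big(z_m(n)\big) - f_m\big(x_{T_m}(n)\big)\Big)^2,$$

where $z_m(n)$ and $x_{T_m}(n)$ denote sampled realizations of the asset and the state variable, respectively, on a corresponding MC path.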
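The regression step can be sketched with a small, hand-written training loop. The example below fits a one-dimensional shallow ReLU network to a toy target by plain full-batch gradient descent on the mean-squared-error cost. The optimizer, initialization, learning rate and the toy target are assumptions made for illustration; the paper's actual training set-up may differ.

```python
import numpy as np


def forward(z, w1, b1, w2, b2):
    """Shallow network G(z) = w2 . relu(w1 * z + b1) + b2 for scalar inputs z."""
    pre = np.outer(z, w1) + b1          # hidden pre-activations, shape (N, n)
    hidden = np.maximum(pre, 0.0)       # ReLU layer
    return pre, hidden, hidden @ w2 + b2


def fit(z, target, n_hidden=4, lr=0.01, n_steps=2000, seed=0):
    """Full-batch gradient descent on the mean-squared-error cost."""
    rng = np.random.default_rng(seed)
    w1, b1 = rng.normal(size=n_hidden), rng.normal(size=n_hidden)
    w2, b2 = 0.1 * rng.normal(size=n_hidden), 0.0
    losses = []
    for _ in range(n_steps):
        pre, hidden, out = forward(z, w1, b1, w2, b2)
        err = out - target
        losses.append(float(np.mean(err**2)))
        g_out = 2.0 * err / z.size                  # d cost / d output
        g_pre = np.outer(g_out, w2) * (pre > 0.0)   # back-propagate through ReLU
        w2 = w2 - lr * (hidden.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        w1 = w1 - lr * (g_pre * z[:, None]).sum(axis=0)
        b1 = b1 - lr * g_pre.sum(axis=0)
    return (w1, b1, w2, b2), losses


# Toy target with a single kink, standing in for f_m on sampled paths.
rng = np.random.default_rng(1)
z = rng.normal(size=512)
target = np.maximum(z - 0.5, 0.0)
params, losses = fit(z, target)
```

After training, the entries of `params` play the role of the weights and strikes of the replicating portfolio.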

Estimate the continuation value
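Finally, the continuation value of the Bermudan swaption at the preceding exercise date $T_{m-1}$ is estimated by calculating

$$C_{m-1}(T_{m-1}, x_{T_{m-1}}) \approx \sum_{i=m}^{M-1} \mathbb{E}^{\mathbb{Q}}\Big[\frac{M_{T_{m-1}}}{M_{T_i}}\,q_i(x_{T_i})\,G_i\big(z_i(T_i)\big)\,\Big|\,\mathcal{F}_{T_{m-1}}\Big]. \tag{11}$$

With an appropriate choice of $z_m$, this can be computed in closed-form or approximated with an efficient numerical routine.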

Interpretation of the regression function
The regression function as presented in (7) can be considered a shallow neural network, i.e. a feed-forward neural network with a single hidden layer. The structure is graphically represented in figure 5(a). The outcome of the first hidden layer is a vector in $\mathbb{R}^n$, consisting of $n$ neurons. Each individual neuron $O_{m,i}$ is calculated as

$$O_{m,i} := \max\Big\{\sum_{j=1}^{d} w_{1,ij}\,z_{m,j} + b_{1,i},\; 0\Big\}.$$

The functional form of each hidden node thus corresponds to the payoff of an arithmetic basket option written on $z_{m,1}, \ldots, z_{m,d}$.
The outcome of the second layer is calculated as

$$G_m(z_m) = \sum_{i=1}^{n} w_{2,i}\,O_{m,i} + b_2.$$

The functional form of the outcome of the neural network hence corresponds to the payoff of a weighted portfolio of basket options, plus a zero-coupon bond maturing at $T_m$.
Computing the conditional expectation given in (11) can be interpreted as evaluating the risk-neutral price of this portfolio.

A static replication
Let $\Pi_m(t)$ denote an options portfolio maturing at $T_m$ with payoff equal to

$$\Pi_m(T_m) := q_m(x_{T_m})\,G_m\big(z_m(T_m)\big). \tag{12}$$

By the deliberate choice of the target function $f_m$ in (8), the payoff of this portfolio yields the approximation

$$\Pi_m(T_m) \approx \max\big\{h_m(x_{T_m}) - C_m(T_m, x_{T_m}),\, 0\big\},$$

which is graphically represented by the red area in figure 6(a). As a consequence of the no-arbitrage condition, it follows that at any time $t < T_m$ the risk-neutral portfolio value yields the approximation $\Pi_m(t) \approx C_{m-1}(t, x_t) - C_m(t, x_t)$. By a repeated argument, whenever $t \in (T_{m-1}, T_m)$ we have

$$\sum_{i=m}^{M-1} \Pi_i(t) \approx C_{m-1}(t, x_t).$$

The combined portfolio $\Pi(t) := \sum_{i=0}^{M-1} \Pi_i(t)$, composed at $t = 0$, can be interpreted as a fully static replication of the Bermudan swaption. Whenever a monitor date $T_m$ is reached, subportfolio $\Pi_m$ expires. Then, two things could happen:

• $C_m(T_m, x_{T_m}) \le h_m(x_{T_m})$: The holder exercises the Bermudan and the value of $\sum_{i=m+1}^{M-1} \Pi_i$ plus the payoff of $\Pi_m$ will equal the exercise value.
• $C_m(T_m, x_{T_m}) > h_m(x_{T_m})$: The holder holds the Bermudan, subportfolio $\Pi_m$ expires worthless and $\sum_{i=m+1}^{M-1} \Pi_i$ will continue to replicate the Bermudan.

Thus, portfolio $\Pi$ will mirror the Bermudan swaption value until it is either matured or exercised.
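The telescoping argument above is model-free and can be checked exactly on a small lattice. The sketch below prices a Bermudan-style put on a binomial tree and, alongside it, the European claims paying the 'lost' value max(h_m - C_m, 0) at each exercise date. The equity-style tree, the put payoff and all parameters are assumptions chosen purely to keep the example short; the identity itself does not depend on them.

```python
import numpy as np


def bermudan_and_static_parts(s0=100.0, strike=100.0, r=0.05, sigma=0.2,
                              dt=0.25, steps=4):
    """Return the Bermudan value at time zero and the time-zero values of the
    sub-portfolios paying max(h_m - C_m, 0) at each exercise date (steps 1..steps)."""
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    disc = np.exp(-r * dt)
    p = (np.exp(r * dt) - d) / (u - d)            # risk-neutral up probability

    def spots(m):
        j = np.arange(m + 1)                      # number of up-moves
        return s0 * u**j * d**(m - j)

    def rollback(values, n_steps):
        for _ in range(n_steps):                  # discounted one-step expectations
            values = disc * (p * values[1:] + (1.0 - p) * values[:-1])
        return values

    cont = np.zeros(steps + 1)                    # no continuation after the last date
    parts = []
    for m in range(steps, 0, -1):
        h = np.maximum(strike - spots(m), 0.0)    # exercise value at date m
        lost = np.maximum(h - cont, 0.0)          # value lost if date m passes unexercised
        parts.append(float(rollback(lost, m)[0]))
        cont = rollback(np.maximum(h, cont), 1)   # continuation value at date m - 1
    return float(cont[0]), parts[::-1]


bermudan, parts = bermudan_and_static_parts()
```

On the tree the sum of the sub-portfolio values reproduces the Bermudan value to machine precision; in the paper's setting the equality becomes an approximation because each payoff is fitted by regression.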
The algorithm presented in this work is a variation to the semi-static replication approach presented in Lokeshwar et al. (2022) and Hoencamp et al. (2023). There, the target function $f_m$ is selected such that the portfolio payoff $\Pi_m$ yields the approximation

$$\Pi_m(T_m) \approx \max\big\{h_m(x_{T_m}),\, C_m(T_m, x_{T_m})\big\},$$

which is graphically represented by the red area in figure 6(b). In that case, subportfolio $\Pi_m$ is set up on $T_{m-1}$, it replicates the Bermudan on the time-interval $(T_{m-1}, T_m)$ and expires at $T_m$. Its payoff will equal the exercise value of the Bermudan, or suffice to set up subportfolio $\Pi_{m+1}$ in case the Bermudan is continued. The advantage of the latter approach is that only $\Pi_m(t)$ needs to be priced in order to value the Bermudan, rather than $\sum_{i=m}^{M-1} \Pi_i(t)$. A disadvantage is that the replication is only semi-static and needs rebalancing on each monitor date $T_m$.

Estimation of the continuation value
How the continuation value as given in (11) is computed depends on the specification of the model and the selection of the regression asset. In the general case, the portfolio $\Pi_m$ can be interpreted as a portfolio of arithmetic basket options. The valuation of a basket option is typically still not an easy exercise. However, with the appropriate constraints on the neural network structure used for the regression, the replicating portfolio can be reduced to a set of European options. A first approach would be to consider a one-dimensional regression asset, which corresponds to the design depicted in figure 5(b). This would be appropriate under a one-factor model. A second approach would be to constrain matrix $A_1$ to have only one non-zero entry in each row. This corresponds to the neural network design depicted in figure 5(c). In the numerical examples included in this work, we will be considering a one-factor model environment. Generalizations to multi-factor models have been described in Hoencamp et al. (2023). We will consider the following assumptions: (t,x t ) . Under the assumptions stated above, the (sub)portfolio $\Pi_m$ as defined in (12) can be interpreted as a portfolio of European swaptions, written on the swap rate $S_{m,M}(t)$.† The valuation of $\Pi_m(t)$ for some $t < T_m$ comes down to where the expectation in the last line is taken under the annuity measure $\mathbb{Q}^{m,M}$ associated to the numéraire $A_{m,M}(t)$. The expression above should be recognized as a linear combination of European swaption prices (see equation (2)). The parameter $w_{1,i}$ distinguishes between a payer ($w_{1,i} > 0$) and receiver ($w_{1,i} < 0$) swaption. The parameter $w_{2,i}$ distinguishes between a long ($w_{2,i} > 0$) and a short ($w_{2,i} < 0$) position in the swaption contract.

† For simplicity we will ignore the zero-coupon bond in the replicating portfolio (see section 3.2). This is achieved by constraining $b_2 = 0$. Our numerical experiments indicate a minimal impact on the accuracy of the replication.
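The sign conventions for $w_{1,i}$ and $w_{2,i}$ can be illustrated with a minimal sketch. It assumes that the fitted network has the single-hidden-layer ReLU form $G(s) = \sum_i w_{2,i} \max(w_{1,i} s + b_{1,i}, 0)$ with output bias $b_2 = 0$, and that payoffs are expressed in units of the annuity. The function names and the dictionary layout are hypothetical and only serve the illustration.

```python
def network_output(s, w1, b1, w2):
    """Single-hidden-layer ReLU network with zero output bias (b2 = 0)."""
    return sum(v * max(a * s + b, 0.0) for a, b, v in zip(w1, b1, w2))


def to_swaption_positions(w1, b1, w2):
    """Map each hidden node to one European swaption position (assumed mapping)."""
    positions = []
    for a, b, v in zip(w1, b1, w2):
        if a == 0.0:
            raise ValueError("a node with w1 = 0 is a constant, not an option")
        positions.append({
            "kind": "payer" if a > 0 else "receiver",  # sign of w1
            "strike": -b / a,                           # location of the ReLU kink
            "notional": abs(a) * v,                     # sign of w2: long or short
        })
    return positions


def portfolio_payoff(s, positions):
    """Payoff of the swaption positions at swap rate s, per unit of annuity."""
    total = 0.0
    for p in positions:
        sign = 1.0 if p["kind"] == "payer" else -1.0
        total += p["notional"] * max(sign * (s - p["strike"]), 0.0)
    return total
```

Each hidden node contributes $|w_{1,i}|\, w_{2,i} \max(\pm(s - K_i), 0)$ with $K_i = -b_{1,i}/w_{1,i}$, so the network output and the portfolio payoff coincide for every swap rate $s$.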

Valuation of the replicating portfolio
Under the assumptions of section 3.4, portfolio $\Pi_m$ consists of European swaptions, which means the valuation comes down to pricing each individual option. We apply a coefficient freezing technique, described in Andersen and Piterbarg (2010) and Schrager and Pelsser (2006). With this technique, the swap rate process is approximated as a generalized arithmetic Brownian motion, by freezing the stochastic terms in its diffusion. The option's implied volatility can then be approximated by integrating the 'frozen' diffusion coefficient. We briefly summarize the method below, and refer to Andersen and Piterbarg (2010) and Schrager and Pelsser (2006) for details. Consider the dynamics of the swap rate. By Itô's lemma, it can be established that $S_{m,M}$ satisfies the SDE Considering the annuity as numéraire, the swap rate can be regarded as the quotient of some tradable assets and the numéraire itself. As a consequence, the swap rate is a martingale under the annuity measure $\mathbb{Q}^{m,M}$ (Jamshidian 1997). Hence, for $0 \le t \le T \le T_m$ we can write The diffusion coefficient is stochastic, due to the $P(t,T_i)/A_{m,M}(t)$ terms. However, these terms again represent the quotient of a tradable asset and the numéraire. It is conjectured in Schrager and Pelsser (2006) that these terms are martingales with low quadratic variation. A good approximation is therefore achieved by freezing them at their time-$t$ value, resulting in The approximation $S_{m,M}(T)$ conditioned on $\mathcal{F}_t$ is normally distributed. Applying the Itô isometry allows us to approximate the implied volatility of an option written on $S_{m,M}$, which yields Substituting $\sigma_{\text{imp}}(t, T)$ into Bachelier's formula results in the desired swaption price.
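The last two steps can be sketched as follows. This is a minimal illustration, assuming a scalar frozen diffusion that is piece-wise constant in time, so that the implied volatility is its root-mean-square over $[t, T]$, and assuming the standard Bachelier (normal) formula scaled by the annuity. The function names and arguments are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rms_vol(breaks, levels, t, T):
    """Root-mean-square over [t, T] of a piece-wise constant diffusion.

    levels[k] is assumed to apply between breaks[k] and breaks[k + 1].
    """
    total = 0.0
    for k, level in enumerate(levels):
        lo, hi = max(t, breaks[k]), min(T, breaks[k + 1])
        if hi > lo:
            total += level * level * (hi - lo)
    return math.sqrt(total / (T - t))


def bachelier_swaption(annuity, swap_rate, strike, vol, tau, payer=True):
    """Bachelier price of a European swaption with normal volatility vol."""
    sign = 1.0 if payer else -1.0
    std = vol * math.sqrt(tau)
    if std == 0.0:
        return annuity * max(sign * (swap_rate - strike), 0.0)
    d = sign * (swap_rate - strike) / std
    return annuity * (sign * (swap_rate - strike) * norm_cdf(d) + std * norm_pdf(d))
```

For an ATM option the formula reduces to $A\,\sigma\sqrt{\tau}/\sqrt{2\pi}$, which gives a quick sanity check on any implementation.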

Numerical algorithms and mathematical consistency
In this section, we describe the numerical routines for computing sensitivities along the Monte Carlo path.In short, we show how to compute the sensitivities of the replicating portfolio in a time-consistent manner.This involves the differentiation of the replicating instrument values (i.e.greeking) and the differentiation of the portfolio weights.
Our approach to the latter is inspired by the work of Jain et al. (2019).In Jain et al. (2019) a regress-later approach is considered, which utilizes ordinary least-square regression.As such, the weights of the basis-functions can be obtained explicitly as a function of the risk-factor and consistently differentiated.In this paper, only an implicit relation between the risk-factors and the portfolio weights is known through the minimized cost function (equation ( 10)).Yet, we show that an estimator of the weight sensitivities can still be obtained through an application of the implicit function theorem.
An advantage of our approach is that it should be easier to evaluate sensitivities to implicit risk-factors, which are not directly modeled. In the context of MVA this concerns the sensitivities to the realizations of future market rates and future implied volatilities. With regard to static replication, the replicating portfolio is selected such that the required future Delta and Vega are tractable to compute. In contrast, with, for example, an application of AD to LSM, this may not be straightforward. It would require the computation of pseudo Jacobian-inverses or the addition of many pseudo-interpolation nodes to the term-structure of zero-rates and instantaneous volatilities, which may be a tedious task.

Notation and assumptions
Let t ∈ [0, T) and let V t denote the time-t risk-neutral value of a Bermudan swaption with strike K and tenor structure T 0 , T 1 , . . .T M , conditioned on the event that the option is not yet exercised at t. Next to the Bermudan swaption, we consider a static replicating portfolio consisting of European swaptions.We write where m denotes a subportfolio of swaptions, written on the swap rate S m,M .The subportfolios are defined as Where we denote π m (t) := (π m,1 (t), . . ., π m,n (t)) , w m := (w m,1 , . . ., w m,n ) and k m := (K m,1 , . . ., K m,n ) To keep computations feasible, we will consider two assumptions: • The time-zero zero-rate curve τ → R(0, τ ) is assumed to be an interpolation between nodes {R 0 , R 1 , . . ., R K }, with R k := R(0, τ k ).The nodes are assumed to be the result of a calibration bootstrap against market rates, such as swap-or deposit rates.• The instantaneous volatility η(t) is assumed to be piece-wise constant between dates {τ 0 , τ 1 , . . ., τ K }.
The volatility level η k between τ k−1 and τ k , is assumed to be the result of a bootstrap against market implied volatilities σ k of European swaptions with maturity τ k .

Sensitivities along the Monte Carlo path
In this work, we focus on two types of sensitivities, namely Delta and Vega. The time-zero sensitivities are specified as follows: Let $\Delta := (\Delta_0, \Delta_1, \ldots, \Delta_K)$, where $\Delta_k$ denotes the sensitivity of $V_0$ to an infinitesimal bump of the zero-rate node $R_k$. It is defined as $\Delta_k := \partial V_0 / \partial R_k$. The impact of a bump in a zero-rate node, while keeping other nodes fixed, is graphically represented in figure 7(a). Let $\mathcal{V} := (\mathcal{V}_0, \mathcal{V}_1, \ldots, \mathcal{V}_K)$, where $\mathcal{V}_k$ denotes the sensitivity of $V_0$ to an infinitesimal bump of implied volatility $\sigma_k$ corresponding to the swaption with maturity $\tau_k$. It is defined as $\mathcal{V}_k := \partial V_0 / \partial \sigma_k$. A bump in $\sigma_k$, while keeping other implied volatilities fixed, will impact the level of $\eta_k$ and $\eta_{k+1}$. This impact is graphically represented in figure 7(b).
The sensitivities along the Monte Carlo path are understood as future-time generalizations of the quantities defined above. That is, future Delta $\Delta_k(t)$ is interpreted as the sensitivity of $V_t$ to an infinitesimal bump of the hypothetical zero-rate node $R_k(t) := R(t, t + \tau_k)$, conditional on $\mathcal{F}_t$. This is graphically represented in figure 7(c). Future Vega $\mathcal{V}_k(t)$ is interpreted as the sensitivity of $V_t$ to an infinitesimal bump of the time-$t$ implied volatility $\sigma_k(t) := \sigma_{\text{imp}}(t, t + \tau_k)$ corresponding to an ATM swaption maturing at $t + \tau_k$, conditional on $\mathcal{F}_t$. This is graphically represented in figure 7(d).

Computing the sensitivity
First we consider the general case and denote where either θ k (t) := R k (t) or θ k (t) := σ k (t).Later on we will distinguish between the two cases.Without loss of generality, assume that t ∈ (T m−1 , T m ].The aim is to compute We iteratively compute the sensitivities of each subportfolio, for i = M − 1, . . ., m. Starting with i = M − 1, a static replication of V T M −1 is achieved with a single European swaption, i.e.M −1 := π M −1,1 .Thus, setting w M −1,1 = 1 and K M −1,1 = K.The Delta and Vega simply correspond to the sensitivities of a European swaption, i.e.Strike sensitivities For the same reason as for the weights, the sensitivities for the strikes should be considered.However, we conjecture the impact of the strike sensitivities is relatively small, which is supported by our numerical experiments.We will ignore this term for the remainder of this work.
A further derivation of the Delta/Vega sensitivities is the subject of sections 4.4 and 4.5.A further derivation of the weight sensitivities is subject of section 4.6.

The delta sensitivities
We start by working out the terms of the form D θ π m and consider the case θ k (t) = R k (t) := R(t, t + τ k ).We perform the derivation for the general setting 0 ≤ t ≤ T ≤ T m and aim to compute The D θ π m in (19) corresponds to the case t = T.The case t < T will serve as a path-wise sensitivity estimator, which will be required to compute the weight sensitivities in the subsequent section.
We first characterize the forward Delta sensitivity of a zerocoupon bond in the following lemma.
Lemma 4.1 Let 0 ≤ t ≤ T 0 ≤ T 1 and assume that the zerorate curve τ → R(t, t + τ ) is an interpolation between nodes {R 0 (t), . . ., R K (t)}, where R k (t) := R(t, t + τ k ).Then the forward Delta sensitivity is given by where τ i = T i − t and α k denotes an F t -measurable coefficient, dependent on the interpolation scheme of the zero-rate nodes.If the interpolation is linear, the coefficient is given by A proof to this lemma is given in appendix 1.Now, for brevity denote H m,i (t Then by an application of the chain rule and the product rule, we can write Annuity sensitivity (21) Working out each of the three partial derivatives above, we find Bachelier Delta Under the swap rate dynamics approximation given in ( 15), this term is computed as the standard Bachelier Delta sensitivity, i.e.
Annuity sensitivity From Lemma 4.1 it directly follows Swap rate sensitivity Applying the chain rule and Lemma 4.1, it follows
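The role of the interpolation coefficient $\alpha_k$ can be illustrated for the simplest case of a spot zero-coupon bond. The sketch below is not the statement of Lemma 4.1. It assumes continuously compounded zero-rates, $P = e^{-R(\tau)\tau}$, linear interpolation between the nodes and flat extrapolation outside them, so that $\partial P / \partial R_k = -\tau\, \alpha_k\, P$ with $\alpha_k = \partial R(\tau) / \partial R_k$. All function names are hypothetical.

```python
import math


def interp_weights(tau, nodes):
    """alpha_k = dR(tau)/dR_k under linear interpolation (flat outside the nodes)."""
    weights = [0.0] * len(nodes)
    if tau <= nodes[0]:
        weights[0] = 1.0
    elif tau >= nodes[-1]:
        weights[-1] = 1.0
    else:
        for k in range(len(nodes) - 1):
            if nodes[k] <= tau <= nodes[k + 1]:
                x = (tau - nodes[k]) / (nodes[k + 1] - nodes[k])
                weights[k], weights[k + 1] = 1.0 - x, x
                break
    return weights


def bond_price(tau, nodes, rates):
    """Zero-coupon bond price with time to maturity tau."""
    zero_rate = sum(a * r for a, r in zip(interp_weights(tau, nodes), rates))
    return math.exp(-zero_rate * tau)


def bond_delta(tau, nodes, rates):
    """Sensitivity of the bond price to each zero-rate node."""
    price = bond_price(tau, nodes, rates)
    return [-tau * a * price for a in interp_weights(tau, nodes)]
```

Only the two nodes that bracket $\tau$ carry a non-zero Delta, which is why the bucketed sensitivities move from one tenor bucket to the next as the time to maturity shrinks.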

The vega sensitivities
Now consider the case Just as for Delta, we perform the derivation for the general setting 0 ≤ t ≤ T ≤ T m and aim to compute Again, the D θ π m in ( 19) corresponds to the case t = T.The case t < T will serve as a path-wise sensitivity estimator, which will be required to compute the weight sensitivities in the subsequent section.We first characterize the forward implied volatility Vega in the following two lemmas.
Lemma 4.2 Let S m,M satisfy the Gaussian approximation of ( 15) and denote σ (t, T) := A proof is given in appendix 2. The sensitivity of σ (t, T) w.r.t.σ k (t) is an F t -measurable random variable, which is dependent on the calibration procedure of the instantaneous volatility η(t).If η 1 , . . ., η K are scalars obtained by a bootstrap procedure, the sensitivities can be further characterized by the following lemma.
Lemma 4.3 Let η(t) be scalar-valued and piece-wise constant.Let m,M be as in ( 14) and let k denote the diffusion of the swap rate underlying the option corresponding to σ k (t) := σ imp (t, t + τ k ).Then the forward implied volatility Vega is given by where τ := T − t and α k denotes an F t -measurable coefficient.If by approximation m,M ≈ k ≈ k+1 , the coefficient is given by where we denote T k := t + τ k .
A proof is given in appendix 3.By an application of the chain rule, we obtain Considering each of the partial derivatives above, we find Bachelier Vega Under the swap rate approximation given in (15), this term can be computed as the Bachelier Vega sensitivity, i.e.
Bachelier Delta Under the swap rate approximation given in ( 15), this term can be computed as the Bachelier Delta sensitivity.Hence, similar to ( 22), we have Imp. vol.sensitivity The forward implied volatility sensitivity is characterized in Lemma 4.2.Under the assumptions of Lemma 4.3, the sensitivity can be approximated as Swap rate sensitivity Under the swap rate approximation given in ( 15), we can write 2 du.It follows that the forward swap rate sensitivity can be estimated as Under the assumptions of Lemma 4.3, this can be further approximated as
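The Bachelier Delta and Vega terms that enter both derivations have simple closed forms. The sketch below assumes the standard normal-model formulas with the annuity held fixed, namely Delta $= \pm A\,\Phi(d)$ and Vega $= A\sqrt{\tau}\,\varphi(d)$ with $d = \pm(S - K)/(\sigma\sqrt{\tau})$. The function names are hypothetical and the annuity and swap rate sensitivities of the full chain rule are not included.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bachelier_price(annuity, swap_rate, strike, vol, tau, payer=True):
    sign = 1.0 if payer else -1.0
    std = vol * math.sqrt(tau)
    d = sign * (swap_rate - strike) / std
    return annuity * (sign * (swap_rate - strike) * norm_cdf(d) + std * norm_pdf(d))


def bachelier_delta(annuity, swap_rate, strike, vol, tau, payer=True):
    """Derivative of the price with respect to the swap rate (annuity fixed)."""
    sign = 1.0 if payer else -1.0
    d = sign * (swap_rate - strike) / (vol * math.sqrt(tau))
    return sign * annuity * norm_cdf(d)


def bachelier_vega(annuity, swap_rate, strike, vol, tau):
    """Derivative of the price with respect to the normal volatility."""
    d = (swap_rate - strike) / (vol * math.sqrt(tau))
    return annuity * math.sqrt(tau) * norm_pdf(d)
```

Vega is identical for payer and receiver swaptions because $\varphi$ is symmetric, so the payer flag is only needed for Delta.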

The weight sensitivities
The weights and strikes of m are obtained through an optimization scheme as described in section 3.1.The objective function for this optimization can equivalently be expressed as where we denoted g m (t) := f m (x t ) q m (x t ) .The parameters w m and k m are determined by Assuming L m attains a local minimum in (w m , k m ), it follows by the first order optimality condition that ∇L m = 0.In particular, we have where we can compute In order to characterize the weight sensitivities, we first consider the Jacobian of λ m (i.e. the Hessian of L m w.r.t.w).
The Jacobian can be obtained by again differentiating the expression above, which yields Hence, note that $D_w \lambda_m$ is specified by the auto-correlation matrix of the random vector $\pi_m$. In other words, it depends on the cross-sectional covariance matrix of the portfolio payoffs. With this expression at hand, we characterize the weight sensitivities in the following proposition.
Proposition 4.4 Consider a swaption portfolio $\Pi_m$ written on swap rate $S_{m,M}$ and let its weights and strikes satisfy equation (25). Furthermore assume that the strikes $K_1, \ldots, K_n$ are all distinct. Then the derivative of $w_m$ w.r.t. $\theta_t$ is well-defined and its Jacobian matrix at $(w_m, \theta_t)$ is given by This result is a consequence of the implicit function theorem. A rigorous proof is given in appendix 4.
Remark 1 The first term in (28) is given by the inverse of the matrix in (27). Note that this term is both path-independent and time-independent. Hence, although its computation requires numerical estimation and matrix inversion, it can be used along each MC path, at each time-step without nested re-evaluation.
What remains to be done is computing D θ λ m .We therefore compute Note that the second term is zero if subportfolio m yields a perfect fit to the target function g m .Therefore, we can expect this term to be small and we will ignore it.Working out the first term, we find In the first term of the expression above, D θ π m (T m ) represent the Deltas/Vegas of subportfolio m .This term needs to be numerically estimated.The path-wise estimator has been derived in sections 4.4 and 4.5.The second term also needs to be numerically estimated.The path-wise estimator involves D θ g m (T m ), where g m (t Working out this term and denoting the event In the expression above, the term . ., M − 1 were assumed to be known from previous iterations.
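The implicit-function-theorem argument can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the estimator of this section: it uses two call-like payoffs $\max(x + \theta - K_i, 0)$ on a deterministic grid of states, a hypothetical target $g = (x + \theta)^2$, and a scalar parameter $\theta$. It evaluates $\mathrm{d}w/\mathrm{d}\theta = -(D_w \lambda)^{-1} D_\theta \lambda$ and, unlike the approximation made above, retains the residual term so that the result can be compared with a re-fit under a bumped $\theta$.

```python
STRIKES = (-0.5, 0.25)  # hypothetical, distinct strikes


def payoffs(x, theta):
    return [max(x + theta - k, 0.0) for k in STRIKES]


def d_payoffs(x, theta):
    # Path-wise derivative of each payoff with respect to theta.
    return [1.0 if x + theta > k else 0.0 for k in STRIKES]


def target(x, theta):
    return (x + theta) ** 2


def d_target(x, theta):
    return 2.0 * (x + theta)


def solve_2x2(a, b):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def gram_and_rhs(xs, theta):
    n = len(xs)
    gram, rhs = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
    for x in xs:
        p, g = payoffs(x, theta), target(x, theta)
        for i in range(2):
            rhs[i] += p[i] * g / n
            for j in range(2):
                gram[i][j] += p[i] * p[j] / n
    return gram, rhs


def fit_weights(xs, theta):
    """Least-squares weights, i.e. the root of lambda(w, theta) = grad_w L = 0."""
    gram, rhs = gram_and_rhs(xs, theta)
    return solve_2x2(gram, rhs)


def weight_sensitivity(xs, theta):
    """dw/dtheta = -(D_w lambda)^{-1} D_theta lambda at the fitted weights."""
    w = fit_weights(xs, theta)
    gram, _ = gram_and_rhs(xs, theta)
    n = len(xs)
    d_theta_lambda = [0.0, 0.0]
    for x in xs:
        p, dp = payoffs(x, theta), d_payoffs(x, theta)
        resid = w[0] * p[0] + w[1] * p[1] - target(x, theta)
        d_resid = w[0] * dp[0] + w[1] * dp[1] - d_target(x, theta)
        for i in range(2):
            d_theta_lambda[i] += 2.0 * (dp[i] * resid + p[i] * d_resid) / n
    d_w_lambda = [[2.0 * v for v in row] for row in gram]  # Hessian of L in w
    return solve_2x2(d_w_lambda, [-v for v in d_theta_lambda])
```

The Hessian depends only on the second moments of the payoffs, so in a Monte Carlo setting it can be estimated once and reused, while $D_\theta \lambda$ is re-evaluated per path and time-step.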
Algorithm 1: The static replication algorithm Generate N risk-factor scenarios for x T 0 , . . ., x T M −1 ;

The algorithms
Below we summarize the algorithms for composing the replicating portfolios (Algorithm 1) and computing the path-wise sensitivities (Algorithm 2).For the execution of the path-wise sensitivity algorithm, it is assumed that the static replication has been performed and subportfolios 0 , . . ., M −1 have been derived from the trained parameters β 0 , . . ., β M −1 .

Numerical experiments
We demonstrate the performance of the algorithms presented in this work by considering several case studies. We select a one-factor Gaussian model with piece-wise constant instantaneous volatility to perform our numerical experiments (see Andersen and Piterbarg 2010). That is, we consider state-variable dynamics given by where the scalar $a$ denotes a constant mean-reversion rate, $W_t$ is a one-dimensional Brownian motion under $\mathbb{Q}$ and $f(t, T) := -\partial \log P(t,T) / \partial T$ denotes the instantaneous forward rate. The time-zero yield curve is assumed to be a linear interpolation between zero-rates with tenors $\tau_k$ as given in table 1. We consider a flat yield curve with all zero-rates equal to 3%. The instantaneous forward rate is assumed to be piece-wise constant between the tenor dates $\tau_k$. Each volatility level $\eta_k$ is bootstrapped against an ATM European payer swaption with maturity $\tau_k$ and implied volatility $\sigma_k$, such that $\sigma_{\text{imp}}(0, \tau_k) = \sigma_k$. All implied volatility targets $\sigma_k$ are set to 50 bps, resulting in the instantaneous volatilities presented in table 1. The mean-reversion rate is fixed a priori at $a = 0.01$. The tenors $\tau_k$ have been selected to coincide with the SIMM tenor buckets specified by ISDA (International Swaps and Derivatives Association, Inc 2020).
Algorithm 2: The path-wise sensitivity algorithm. Generate N risk-factor scenarios for $x_{t_0}, \ldots, x_{t_J}$; using equation (29) and (21) / (23);
Sampling required for regression data, pricing, sensitivities, exposures and IM profiles is done using Monte Carlo simulation. The risk factor is simulated through an Euler discretization scheme. The discretization uses weekly time-steps, i.e. $\Delta t = 1/52$.
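A minimal sketch of such an Euler scheme is given below. It assumes the one-factor Gaussian state dynamics in the form $\mathrm{d}x_t = (y(t) - a\,x_t)\,\mathrm{d}t + \eta(t)\,\mathrm{d}W_t$ with $x_0 = 0$ and the deterministic variance term satisfying $y'(t) = \eta(t)^2 - 2a\,y(t)$, $y(0) = 0$, which is the parametrization found in Andersen and Piterbarg (2010). Whether this matches the exact specification used in the experiments is an assumption, and the function name and arguments are hypothetical.

```python
import math
import random


def simulate_state(n_steps, dt, a, eta, seed=0):
    """Euler path of the state variable x on a uniform grid with step dt.

    eta is a callable returning the instantaneous volatility at time t.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [x]
    for i in range(n_steps):
        vol = eta(i * dt)
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        x += (y - a * x) * dt + vol * dw      # Euler step for x, using y at time t
        y += (vol * vol - 2.0 * a * y) * dt   # Euler step for the variance term y
        path.append(x)
    return path
```

With weekly steps, a one-year path would be generated by `simulate_state(52, 1 / 52, 0.01, lambda t: 0.005)`, where the flat volatility level is purely illustrative.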
As case studies, we consider a 1Y × 5Y and a 1Y × 10Y Bermudan receiver swaption with unit notional. The underlying of the $T_0 \times T_M$ Bermudan is assumed to be a fixed-for-floating interest rate swap with fixed rate $K$ and a tenor structure $\mathcal{T} = \{T_0, \ldots, T_M\}$ of annual payments. Unless stated otherwise, the fixed rate $K$ is selected to be ATM, i.e. $K = K_{\text{ATM}} := S_{0,M}(0)$. The exercise dates of the Bermudan are set to coincide with the fixing dates of the underlying swap, i.e. $T_0$ being the first and $T_{M-1}$ being the last exercise opportunity.
The composition of the static replication portfolio relies on the regression of feed-forward neural networks with a single hidden layer.Below, we list some details concerning the fitting procedure of the neural networks.
• As an independent regression variable for G m we select the swap rate S m,M (T m ).The regression data ized from a uniform distribution U(−1, 1).For m < M − 1, the parameters of G m are initialized with the optimized parameters of G m+1 .

Pricing
We start by analyzing the pricing accuracy of the static replication. As each neural network has 8 hidden nodes, every As a benchmark, we report price estimates obtained from the least-square method (LSM) introduced by Longstaff and Schwartz (2001). As basis functions for the ordinary least-square regression, we use $\{1, x_t, x_t^2, x_t^3\}$. The LSM estimate is the mean of 25 independent runs of 80 000 MC paths each. In brackets, we report the standard error generated by these 25 runs.
From table 2, we observe a close correspondence between SR and LSM, in particular for ATM strikes. The accuracy slightly deteriorates for far OTM options. This can be attributed to a limited number of ITM paths at future monitor dates, yielding only a few non-zero data-points for the regression. SR is not subject to a standard error as the portfolios are evaluated at time zero. SR is however subject to numerical errors due to the inaccuracy of the neural network regression. On the normalized data, the fits typically show a mean squared error in the order of $10^{-6}$ to $10^{-5}$.

Exposure profiles.
Next, we consider the exposure profile and related CVA quantities of the Bermudan swaptions. Exposure depends on the future distribution of the option price. Figure 8 shows the expected positive exposure (EPE) simulated with the static replication next to an LSM implied benchmark. The benchmark is generated with the LSM-bundle algorithm, presented in Feng et al. (2016). We briefly summarize this method below. The LSM-bundle algorithm allows one to estimate the EPE of a Bermudan option when combined with the LSM algorithm of Longstaff and Schwartz (2001). Within LSM, the continuation value $C_m(T_m)$ is estimated by regressing realized (discounted) cashflows at $T_{m+1}$ against the risk-factor realizations at $T_m$ on in-the-money MC paths. However, to compute the exposure, an unbiased estimate of the continuation value is required on each MC path on every exposure date (not just monitor dates). In Feng et al. (2016) it is therefore proposed to first perform an LSM sweep. Then in a second sweep, for each exposure date $t$ the MC paths are divided into two bundles. If $t \in (T_{m-1}, T_m]$, the two bundles are defined as: Subsequently for both bundles an ordinary least-square regression is performed against the discounted realized cashflows that were computed in the first LSM sweep. The regression functions provide an unbiased estimate of the exposure for a given realization of the risk-factor in their respective domains. We refer to Feng et al. (2016) for details. Note that for the SR method, no second sweep of regressions is required. Exposures are obtained by simply pricing the replicating portfolio along the MC path.
The EPE profiles in figure 8 represent the mean of 25 MC simulations of 3000 paths each.The shaded areas represent a window of one standard error for both respective methods.We observe the typical staircase nature that corresponds to the exposure profile of a Bermudan option.Each discontinuity represents the passing of an exercise opportunity, after which the option is exercised for a certain fraction of the scenarios.Similar to the time-zero prices, we observe a good correspondence between the SR estimates and the benchmark.Additionally, it is noted that the standard error for SR is significantly smaller compared to LSM-bundle.This can be attributed to the benefit of the regress-later approach of SR compared to the regress-now approach of LSM.In the latter, the continuation value is estimated by regressing the (noisy) realizations of the cashflow.In the former, the regression is performed against the deterministic option's target value (numerical errors aside).As a result, the MC noise is integrated out more efficiently with SR.In general, this indicates that a significantly smaller number of MC paths would be required to reach a similar level of accuracy.
We further study the convergence of the standard error, by considering the CVA statistic.To do so, we assume a constant LGD = 1 and constant hazard rate of λ(t) = 0.01.The CVA is estimated using equation (3). Figure 9 shows the mean and standard deviation of the CVA estimate generated with 25 independent MC runs, as a function of the number of MC paths per run.We observe that on average the SR standard error is a factor 4 smaller compared to the LSM-bundle benchmark.For an increasing number of paths, the two methods show a consistent convergence.
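Equation (3) is not reproduced in this section, so the sketch below should be read as an assumption about its discretized form rather than as the estimator itself. It uses $\text{CVA} \approx \text{LGD} \sum_j \text{EPE}(t_j)\,\bigl(e^{-\lambda t_{j-1}} - e^{-\lambda t_j}\bigr)$ with a constant hazard rate $\lambda$, where the EPE values are taken to be already discounted. The function name and arguments are hypothetical.

```python
import math


def cva_constant_hazard(times, epe, hazard_rate, lgd=1.0):
    """Discretized CVA from a discounted EPE profile.

    times: increasing exposure dates t_1 < ... < t_J (the grid starts at t_0 = 0).
    epe:   discounted expected positive exposure at each date in times.
    """
    total, prev_t = 0.0, 0.0
    for t, exposure in zip(times, epe):
        # Default probability over (prev_t, t] under a constant hazard rate.
        default_prob = math.exp(-hazard_rate * prev_t) - math.exp(-hazard_rate * t)
        total += exposure * default_prob
        prev_t = t
    return lgd * total
```

For a flat exposure of one, the sum telescopes to the cumulative default probability $1 - e^{-\lambda T}$, which is a convenient check of the discretization.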

Sensitivities
We continue by analyzing the accuracy in the sensitivity computation for the static replication.We distinguish between time-zero sensitivities and future sensitivities along the MC path.The Delta and Vega sensitivities in this section are understood in accordance with the definitions provided in section 4.2.

Time-zero sensitivities.
Figure 10 reports the time-zero Delta sensitivities for the 12 tenor buckets. The sensitivities have been scaled to represent the basis point value w.r.t. the zero-rate $R_k$. The SR results (presented in blue) are calculated in accordance with the Delta routine described in section 4.2. As a benchmark, we consider an LSM bump-and-reval estimate (presented in orange). For Delta, this means a zero-rate corresponding to one of the predefined tenors is bumped by one basis point and the yield curve is re-interpolated accordingly. The same seed is used for the LSM base run as for the bumped LSM run, to facilitate variance reduction. Figure 11 reports the time-zero Vega sensitivities for the 12 tenor buckets. These sensitivities have been scaled to represent the basis point value w.r.t. implied volatility $\sigma_k$. The bump-and-reval benchmark for Vega is obtained by bumping one of the implied volatilities by one basis point and re-bootstrapping the volatility term-structure accordingly. Also here the same seed is used for the base and the bumped LSM run.
The results shown in figures 10 and 11 are the mean of 25 independent runs of 80 000 paths each. For the LSM bump-and-reval, this means 13 LSM simulations were involved for each run (1 base simulation and 12 bumped simulations). The SR estimate only requires a single simulation, in order to estimate the weight sensitivities of the replicating portfolio. The swaption sensitivities can be readily evaluated. The error bars indicate the standard error of the LSM estimate. The SR standard errors were each 2 to 3 orders of magnitude smaller compared to LSM and are therefore not included in the figures. Overall, we observe a satisfactory correspondence between the SR sensitivities and the LSM benchmark. For the LSM Vega, we observe relatively large standard errors compared to Delta. This can be attributed to the fact that a bumped yield curve translates one-to-one into bumped zero-coupon bond prices. A bumped volatility term-structure only indirectly impacts the risk-factor distribution and is therefore more susceptible to a higher standard error.
Table 3 reports the SIMM Delta, Vega and Curvature margins computed with the sensitivity estimates of SR and LSM, respectively, for different levels of moneyness.In brackets we report the standard error of the LSM implied quantities.For the Delta and Vega margin estimates, we see the accuracy of the SR method confirmed.The Curvature margin LSM estimates demonstrate a significantly larger standard error in the relative sense.Curvature margin reflects the Gamma sensitivity of the option, which in SIMM is approximated through a Vega-Gamma relationship (International Swaps and Derivatives Association, Inc 2016).Second-order sensitivity approximations are notorious for amplifying numerical errors of the first order sensitivities, which we see reflected in the LSM benchmark.The SR estimate does not suffer from such standard errors.
The convergence of the margin estimates is demonstrated in figure 12.Here we present the mean and standard error as a function of the number of MC paths, obtained from 25 independent runs.We observe that the consistency of the SR method is confirmed for an increasing number of MC paths.

Future sensitivities: European swaption.
We finally analyze the expected future sensitivity profiles of a Bermudan swaption.Calculation of expected future sensitivities through the LSM bump-and-reval method would impose a severe computational burden.A nested MC simulation is required for each time-step, on each MC path, for each sensitivity.Benchmarking future sensitivity profiles is hence not feasible with LSM and out of scope for this paper.
To verify the accuracy of the SR method along the MC path, we therefore start by considering a 5Y × 5Y European payer swaption.This can be considered a special case of the Bermudan, with only a single exercise opportunity at T 0 .We proceed to replicate this swaption, while selecting the 15Y swap rate as regression asset.As a result, the 5Y × 5Y swaption is replicated with a portfolio of 5Y × 15Y swaptions.The same neural network structure is used as for the Bermudans.Subsequently, we analyze the sensitivity profiles of the replicating portfolio, which we benchmark with the analytical sensitivities of the swaption.
The results are displayed in figure 13. The Vega profiles on the right show a near-perfect match between SR and the benchmark. We do observe some numerical errors in the Delta 10Y, 15Y and 20Y buckets. Around $t = 0$, the latter two are dominant for the 5Y × 15Y swaptions in the replicating portfolio, but zero for the target 5Y × 5Y swaption. If the replication is perfect, the sensitivities in this bucket cancel with the weight sensitivities. Here we note that some small errors remain. Yet, overall, we observe a satisfactory agreement between SR and the analytical profiles.

Future sensitivities: Bermudan swaption.
Figure 14 shows the expected future Delta sensitivity profiles of a Bermudan swaption. The profiles are generated by taking the mean of 25 independent MC runs of 3000 paths each. The shaded areas (where visible) represent a single standard error window. As the underlying is a receiver swap, we note that the Delta profiles are mostly negative. The tenor buckets surrounding the time until maturity of the underlying swap are dominant. As time passes, the time to maturity decreases and the dominant sensitivities shift from one bucket to the next, implying the triangular-shaped profiles. We see this for example reflected in the (red-colored) Delta profile of the 5Y bucket in figure 14(a). In the first year, the sensitivity grows (in the absolute sense) to its maximum until the maturity of the underlying swap is exactly 5Y away. Then, the sensitivity of this bucket starts to drop and flows into the 3Y bucket. The discontinuous jumps signify the passing of an exercise opportunity. Despite the small number of paths, we observe a satisfactorily low standard error.
Figure 15 shows the corresponding expected Vega profiles and their standard error window.The sensitivities are scaled by the implied volatility in basis points of each corresponding bucket, which is in line with the SIMM routines.The Vega sensitivities are bucketed according to their time to expiry.Hence, we observe triangular patterns resulting from Vegas shifting from one bucket into the next as the time until the future monitor dates decreases.Also here, we observe that the shaded areas representing the standard errors are minor or even invisible in the figure, even though the number of paths is relatively small.
Figure 16 shows the expected initial margin profiles, obtained from the simulated sensitivity distributions. The shaded areas again denote a single standard error window. The figure shows the separate profiles corresponding to the Delta, Vega and Curvature margins. The sum of these three yields the total expected initial margin as a function of time. From this plot, we observe that overall the Delta margin is dominant for the computation of IM and MVA. For Delta, we observe the typical staircase profile. Each discontinuity represents the passing of an exercise date and the subsequent decrease in the option's total Delta sensitivity. The volume of Vega scales with the square root of the times until expiry of the underlying options. Hence the smooth decay of the Vega margin over time, with similar jumps after each monitor date. The contribution of the Curvature margin to the total IM and MVA is very limited. In terms of shape, the profile shows a similar nature as Delta.
We study the convergence of the margin estimates by considering MVA as a function of the number of MC paths. Figure 17 shows the Delta, Vega and Curvature components of MVA in separate plots. We report the mean and standard error, obtained from 25 independent runs. The MVA estimates have been generated from the future IM distributions, while assuming a constant funding spread of FS(t) = 0.01. From the figures, we observe a consistent convergence and see the low standard errors of the sensitivity profiles confirmed. We conclude that generally, 3000 MC paths suffice to guarantee a relative standard error of below 1%.
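How an MVA figure follows from an expected IM profile can be sketched as below. The MVA definition itself is not restated in this section, so the form used here, $\text{MVA} \approx \int_0^T \text{FS}(t)\, \mathbb{E}[D(t)\,\text{IM}(t)]\,\mathrm{d}t$ with a constant funding spread and a trapezoidal rule, is an assumption, and so are the function name and its arguments. Survival-probability weighting, if present in the original definition, is not included.

```python
def mva_constant_spread(times, discounted_expected_im, funding_spread):
    """Trapezoidal integral of funding_spread * E[D(t) IM(t)] over the time grid."""
    total = 0.0
    for j in range(1, len(times)):
        dt = times[j] - times[j - 1]
        mid = 0.5 * (discounted_expected_im[j] + discounted_expected_im[j - 1])
        total += funding_spread * mid * dt
    return total
```

Applying the same routine to the Delta, Vega and Curvature margin profiles separately gives the component-wise breakdown, since the integral is linear in the IM profile.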

Conclusion
This paper presented a static replication algorithm for Bermudan swaptions under an affine term-structure model. We showed that under the appropriate conditions, one can construct a portfolio of European swaptions that mirrors the value of the exotic instrument until it is either exercised or matured. This replication is static, i.e. despite the early-exercise features of the instrument, the portfolio does not need to be re-balanced throughout the lifetime of the original contract. We continued to derive efficient estimators for the price and sensitivities of the portfolio along the Monte Carlo path. These quantities are essential for the quantification of modern risk metrics such as exposure, CVA, IM and MVA.
Exploiting closed-form price and sensitivity approximations for European swaptions allowed us to avoid cumbersome nested simulations that are associated with a naive bump-and-reval approach.Moreover, this enabled us to accurately estimate high dimensional, bucketed Delta-and Vega sensitivities, despite the fact that the risk-factors may be embedded in a low dimensional model.This yielded a richer handle for sensitivity modeling compared to, for example, a parallel shift in the model variables.Through several representative numerical examples, we demonstrated the performance of our approach under a one-factor model, benchmarked to the established least-square Monte Carlo method of Longstaff and Schwartz (2001).Overall, we observed superior convergence for the static replication method with regard to exposures, sensitivities and IM statistics.Thus, we demonstrated static replication to be a suitable alternative for exposure, dynamic IM and MVA quantification.

Disclosure statement
The opinions expressed in this work are solely those of the authors and do not represent in any way those of their current and past employers. No potential conflict of interest was reported by the authors.

Appendix 3. Proof of Lemma 4.3
Proof. Recall that it is assumed that the instantaneous volatility in (1) takes values in $\mathbb{R}$ and is piece-wise constant between dates $\{\tau_0, \ldots, \tau_K\}$. Furthermore, it is assumed that the volatility levels $\eta_1, \ldots, \eta_K$ are obtained through a bootstrap procedure, such that the implied volatility approximations $\sigma_{\mathrm{imp}}$ given in (17) agree with the implied volatilities $\sigma_k$ observed in the market. Lastly, note that only $\eta_k$ and $\eta_{k+1}$ are sensitive to a change in $\sigma_k$ if the other market-implied volatilities are kept fixed. For compactness of notation, we show the proof for $t = 0$. The generalization to $t > 0$ follows the same line of thought.
First consider the case $\tau_{k-1} \le T < \tau_k$. Note that we can write

Figure 7. Impact of bumping the market rates (left) and implied volatilities (right) at time zero (top) and at a future simulation date (bottom). All bumps correspond to bucket $\tau_k = 10$. (a) Zero-rate bump at t = 0. (b) Instant. volatility bump at t = 0. (c) Zero-rate bump at t = 7. (d) Instant. volatility bump at t = 7.

Figure 8. Expected positive exposure profile: LSM-bundle (orange) vs. static replication (blue). The LSM and static replication standard error windows are represented by the orange and blue shaded areas, respectively, based on 25 independent runs of 3000 MC paths. (a) 1Y × 5Y Bermudan and (b) 1Y × 10Y Bermudan.

Figure 13. Bucketed Delta (left) and Vega (right) profiles for static replication of a European 5Y × 5Y swaption. SR sensitivities (solid) are shown next to an exact benchmark (dashed). (a) Delta profiles and (b) Vega profiles.

Figure 14. Bucketed Delta profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y × 5Y Bermudan and (b) 1Y × 10Y Bermudan.

Figure 15. Bucketed Vega profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y × 5Y Bermudan and (b) 1Y × 10Y Bermudan.

Table 1.
• The neural network parameters are optimized using the AdaMax optimizer, which is a stochastic gradient-descent method (Kingma and Ba 2014). The batch size (i.e. the number of training points used per iteration) is set to 32. The learning rate (i.e. the step-size scaling per iteration) is set to 0.01.
• The parameters of $G_{M-1}$ are randomly initial-

Table 2. $V_0$ estimates of the Bermudan swaption for different levels of moneyness, presented in basis points of the notional. Table 2 reports the $V_0$ price estimates implied by the risk-neutral valuation of the static replication portfolio (SR) for different levels of moneyness.