A gradient based calibration method for the Heston model

The Heston model is a well-known two-dimensional financial model. Because the Heston model contains implicit parameters that cannot be determined directly from real market data, calibrating its parameters to real market data is challenging. In addition, the model depends non-linearly on some of the parameters, which makes it difficult to find the global minimum of the optimization problem within the calibration. In this paper, we present a first step towards a novel space mapping approach for parameter calibration of the Heston model. Since the space mapping approach requires an optimization algorithm, we focus on deriving a gradient descent algorithm. To this end, we determine the formal adjoint of the Heston PDE, which is then used to update the Heston parameters. Since the methods are similar, we consider both constant and time-dependent parameter sets. Numerical results show that our calibration of the Heston PDE handles the various challenges in the calibration process well and meets the requirements for later incorporation into the space mapping approach. Since the model and the algorithm are well known, this work is formulated as a proof of concept.


Introduction
In finance, calibrating model parameters to fit real market data is challenging because most model parameters are implicit in the real market data [4,7,10,12,13,15]. We consider the well-known two-dimensional Heston model, which contains at least four parameters implicit in the market data. Heston proposed a two-dimensional stochastic differential equation (SDE) model to simulate the behavior of the stock price [5] and presented a closed-form valuation formula for his model. Some calibration techniques are based on this formula [13,4]. The closed-form equation has some restrictions w.r.t. improving the model by considering non-constant parameters [13]. Another strategy to improve the model is to introduce additional processes, which increases the difficulty of the calibration process [15]. Since the stock price and the variance are stochastic processes in the Heston model, Monte Carlo optimization methods can be applied [15].
The parameter calibration is formulated as a constrained optimization problem to minimize a cost functional. The cost functional describes the difference between the reference data, i.e., the observed market data, and the data obtained by numerically solving our model. In the space mapping approach [1], a coarse and a fine optimization solver are used. This paper presents the first step, a gradient descent algorithm for the Heston model, towards a space mapping approach in finance. Later, within space mapping, the gradient descent algorithm will compute a coarse and cheap approximation for the calibration problem of the Heston model.
Our goal is to introduce the space mapping approach to financial applications, and therefore we use the log-transformed normalized Heston model formulated as a partial differential equation (PDE) and the gradient descent algorithm as a pre-step, since both are well known and thus we can focus on the novel aspects. During our research, we did not find any other PDE-based calibration approach for the Heston model. Of course, we are aware that there are already algorithms to compute the exact solution of the Heston calibration problem [4]. However, these methods are limited by the assumption that the parameters are constant, whereas our approach covers both time-dependent and constant parameter calibration. Furthermore, if the parameters are assumed to be time-dependent, an analytical solution cannot be derived and thus an exact solution is not available. Therefore, as mentioned before, this paper is considered a proof of concept introducing a new calibration method for financial applications.
We focus on deriving a gradient descent algorithm for the Heston model that satisfies the requirements for its use in the space mapping approach. In addition to our financial model, the gradient descent algorithm requires the adjoint of the model. Therefore, we formally derive the adjoint of the Heston model and construct the gradient using well-known techniques from PDE-constrained optimization [6,16]. Gradient-based calibration has previously been applied to the Heston model using neural networks, where the gradient is not computed explicitly [11]. In this work, we focus on the calibration rather than on the numerical solution of the model, i.e., our results can be further improved by using more accurate numerical methods. Nevertheless, the calibration results obtained are remarkable. Our approach can easily be extended to time-dependent parameters for the variance process, thus providing an improvement over current calibration strategies based on the semi-analytical solution of the Heston model.
The rest of the article is organized as follows. In Section 2, we introduce the Heston model and our approach to solving the resulting log-transformed Heston PDE. We then focus on the parameter calibration: in Section 3, we derive the adjoint of the log-transformed Heston model and the corresponding gradient. We discretize the method in Section 4. Finally, in Section 5 we present numerical results for the gradient descent algorithm and analyze its behavior for constant and time-dependent parameter calibration. The paper ends with a conclusion and an outlook in Section 6.

The Heston Model
The Heston model was proposed by Heston in 1993 [5] and describes the dynamics of the underlying asset S ∈ R_{>0} and the variance ν ∈ R_{>0} by a two-dimensional SDE. The model is an extension of the well-known Black-Scholes model, which considers only a stochastic process for the asset and reads

    dS_t = r S_t dt + σ_S S_t dW_t^S,    (1)

where r ∈ R_{>0} is the risk-free rate, σ_S ∈ R_{>0} is the volatility, and W_t^S is a Brownian motion. By definition, the variance is the square of the volatility of the asset, ν = σ_S². Heston considered a Cox-Ingersoll-Ross process for modeling the variance, which leads to the SDE system of Heston's model under the risk-neutral measure,

    dS_t = r S_t dt + √ν_t S_t dW_t^S,
    dν_t = κ_ν (µ_ν − ν_t) dt + σ_ν √ν_t dW_t^ν,    (2)

where κ_ν ∈ R_{>0} is the mean reversion rate, µ_ν ∈ R_{>0} is the long-term mean, and σ_ν ∈ R_{>0} is the volatility of variance. The Brownian motions W_t^S and W_t^ν are correlated by the constant ρ ∈ [−1, 1] [5]. For the variance process to remain positive, the Feller condition 2κ_ν µ_ν ≥ σ_ν² must be satisfied. If the Feller condition is violated, the square root in (2) may become complex, which leads to computational problems. Using Kolmogorov's backward equation, we derive the Heston PDE under the risk-neutral measure,

    V_t + ½ ν S² V_SS + ρ σ_ν ν S V_Sν + ½ σ_ν² ν V_νν + r S V_S + κ_ν (µ_ν − ν) V_ν − r V = 0,    (3)

where V(S, ν, t) denotes the fair price of the option. For the European plain vanilla put option, the terminal condition ("pay-off") is

    V(S, ν, T) = max(K − S, 0),    (4)

where K ∈ R_{>0} denotes the predefined strike price. Heston proposed the following boundary conditions for the put:

    V(0, ν, t) = K exp(−r(T − t)),    (5)
    lim_{S→∞} V(S, ν, t) = 0,    (6)
    V_t(S, 0, t) + r S V_S(S, 0, t) + κ_ν µ_ν V_ν(S, 0, t) − r V(S, 0, t) = 0,    (7)
    lim_{ν→∞} V(S, ν, t) = K exp(−r(T − t)).    (8)

The boundary conditions for the underlying follow directly from the assumptions on the financial market. We reverse the time direction by introducing τ = T − t, so that the pay-off condition (4) becomes an initial condition. Next, it is advantageous to use the variable transformation x = log(S/K) for the asset, as it simplifies the numerical implementation, i.e., we obtain a log-transformed variable normalized by the strike price K. We rewrite the option price as Ṽ(x, ν, τ) = V(S, ν, t)/K and obtain the so-called log-transformed normalized Heston PDE

    Ṽ_τ = ½ ν Ṽ_xx + (r − ½ ν) Ṽ_x + ρ σ_ν ν Ṽ_xν + ½ σ_ν² ν Ṽ_νν + κ_ν (µ_ν − ν) Ṽ_ν − r Ṽ,    (9)

defined on the semi-unbounded domain x ∈ R, ν ≥ 0, 0 ≤ τ ≤ T, supplied with the initial condition

    Ṽ(x, ν, 0) = max(1 − exp(x), 0)

and the boundary conditions

    Ṽ(x, ν, τ) → exp(−rτ) as x → −∞,
    Ṽ(x, ν, τ) → 0 as x → ∞,
    Ṽ(x, ν, τ) → exp(−rτ) as ν → ∞.

The boundary conditions for the underlying asset are the normalized log-transformations of Heston's proposed boundary conditions (5), (6) and (8). At ν = 0 the parabolic PDE degenerates to a first-order hyperbolic PDE, and therefore we consider the Fichera theory [2,9] to assess whether it is necessary to provide analytic boundary conditions there or not.
In the sequel we assume that the Feller condition 2κ_ν µ_ν ≥ σ_ν² is satisfied. Therefore we obtain an outflow boundary at ν = 0 and must not impose any analytical boundary condition there. However, we obviously need a numerical boundary condition to complete the scheme, which will be discussed later in Section 4.
In addition to the Heston PDE with constant parameters for the variance process, we also consider time-dependent parameters κ_ν(τ), µ_ν(τ), σ_ν(τ), as well as ρ(τ). The corresponding PDE formulation is given by

    Ṽ_τ = ½ ν Ṽ_xx + (r − ½ ν) Ṽ_x + ρ(τ) σ_ν(τ) ν Ṽ_xν + ½ σ_ν(τ)² ν Ṽ_νν + κ_ν(τ) (µ_ν(τ) − ν) Ṽ_ν − r Ṽ.    (15)

In the numerical simulations in Section 5, we will also discuss combinations of constant and time-dependent parameters.
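Although our calibration is PDE-based, the SDE system (2) gives useful intuition and synthetic test data. The following is a minimal Euler-Maruyama sketch, not part of the paper's method; the parameter values and the "full truncation" treatment of the square root are illustrative assumptions.

```python
import numpy as np

def simulate_heston(S0, nu0, r, kappa, mu, sigma, rho, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama simulation of the Heston SDE system (2)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    nu = np.full(n_paths, nu0, dtype=float)
    for _ in range(n_steps):
        # Brownian increments correlated by the constant rho
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_paths)
        nu_pos = np.maximum(nu, 0.0)  # "full truncation" keeps the square root real
        S = S + r * S * dt + np.sqrt(nu_pos) * S * np.sqrt(dt) * z1
        nu = nu + kappa * (mu - nu_pos) * dt + sigma * np.sqrt(nu_pos) * np.sqrt(dt) * z2
    return S, nu

# illustrative parameters; the Feller condition 2*kappa*mu >= sigma^2 holds
kappa, mu, sigma = 2.0, 0.04, 0.3
assert 2.0 * kappa * mu >= sigma ** 2
S_T, nu_T = simulate_heston(S0=100.0, nu0=0.04, r=0.03, kappa=kappa, mu=mu,
                            sigma=sigma, rho=-0.7, T=1.0, n_steps=200, n_paths=10_000)
```

Under the risk-neutral measure, the sample mean of S_T should approximate S0·exp(rT), which provides a quick sanity check for the discretization.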

Derivation of a gradient based optimization strategy
For a given data set V_data and reference parameters ξ_ref, we formulate the calibration as an optimization task with the cost functional

    J(Ṽ, ξ) = ½ ∫_0^T ∫_Ω (Ṽ − V_data)² dΩ dτ + (λ/2) ∥ξ − ξ_ref∥².

As our aim is to fit the parameters to real market data, for which no ξ_ref exists, we set λ = 0 and the cost functional reduces to

    J(Ṽ, ξ) = ½ ∫_0^T ∫_Ω (Ṽ − V_data)² dΩ dτ.

In the following, we derive a gradient-based algorithm that allows us to calibrate the parameters numerically. To this end we use a Lagrangian approach.
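On the discrete level, the reduced cost functional (λ = 0) can be read as half the squared mismatch between model and data, weighted by the mesh widths. A minimal sketch, where the array shapes and the simple quadrature weights are illustrative assumptions:

```python
import numpy as np

def cost_functional(V_model, V_data, dx, dnu, dtau):
    """Half the squared L2 mismatch over the space-time grid (lambda = 0)."""
    diff = V_model - V_data
    return 0.5 * float(np.sum(diff ** 2)) * dx * dnu * dtau

# tiny illustrative arrays of shape (n_tau, n_nu, n_x)
V_model = np.ones((2, 3, 4))
V_data = np.zeros((2, 3, 4))
value = cost_functional(V_model, V_data, dx=0.5, dnu=0.5, dtau=0.5)  # 0.5 * 24 * 0.125 = 1.5
```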

First-order optimality conditions for the Heston model
Let us denote the Lagrange multipliers by ψ = (φ, φ_a, φ_b, φ_c, φ_d), set Ω = (0, ν_max) × (x_min, x_max) and split the boundary ∂Ω into the four parts Γ_a, Γ_b, Γ_c and Γ_d. First, we focus on the log-transformed normalized Heston equation with constant parameters (9) and write it in operator form. Next, we define the operator e which represents the constraint in the Lagrangian; since at Γ_c no boundary condition has to be imposed, e collects the PDE on Ω together with the conditions on the remaining boundary parts. The Lagrangian for the constrained parameter calibration problem is then the cost functional augmented by the constraint e paired with the multipliers ψ. We formally compute the first-order optimality conditions by setting dL = 0; for details on the method we refer to [6,16]. Before computing the Gâteaux derivatives of L in arbitrary directions [6], we note that by Green's first identity the derivatives can be shifted from the variation onto the multiplier. As e is linear in Ṽ, the Gâteaux derivative in an arbitrary direction h follows directly, and the derivative of the cost functional is computed analogously. To identify the adjoint equation, we consider the variation w.r.t. Ṽ for arbitrary h. Note that we are not allowed to vary Ṽ at τ = 0, as the initial condition is fixed; therefore we have h(x, ν, 0) ≡ 0.
We consider the four boundary conditions separately. At Γ_c, the parabolic adjoint PDE also degenerates to a first-order hyperbolic PDE, and thus we have to consider the Fichera theory [2,9] for the variance again.
The Fichera condition w.r.t. the variance at ν = 0 of the adjoint is the same as before. Therefore no analytic boundary condition is supplied on this boundary, as we assume that the Feller condition holds. On Γ_a, choosing variations with ∇h ̸= 0, (19) must still hold; this yields φ = 0 on Γ_a. As Ṽ(x_min, ν, τ) = exp(−rτ) is given and independent of x, we obtain φ_x = 0 there and thus φ_a = φ = 0 at this boundary. Similarly, we find φ = 0 on Γ_b. On Γ_d, as Ṽ(x, ∞, τ) = exp(−rτ) is given and independent of ν, we obtain φ_ν = 0 and thus φ_d = φ = 0 at this boundary. Altogether, we obtain the adjoint equation (21) with terminal condition φ(T) = 0, with φ = 0 on the boundaries Γ_a, Γ_b and Γ_d, and with an outflow boundary at ν = 0.

Derivation of the Gradient
Let ξ = (σ_ν, ρ, κ_ν, µ_ν) be the parameters to be identified; r and q are given by the data. We compute the optimality condition by setting d_ξ L(Ṽ, ξ, ψ) = 0. Since the multipliers vanish on the boundaries Γ_a, Γ_b and Γ_d, we focus on Ω. In the following we state the derivatives w.r.t. the different parameters separately, first for σ_ν and then, in the same manner, for the other parameters. Note that d_ξ L(Ṽ, ξ, ψ)[h_ξ] = 0 needs to hold for arbitrary directions h_ξ; therefore, we can read off the gradient from these expressions. We extend this gradient formulation to time-dependent parameters ξ = (σ_ν(τ), ρ(τ), κ_ν(τ), µ_ν(τ)); the gradient is then time-dependent as well.

Gradient descent algorithm for the parameter calibration
Solving the first-order optimality conditions all at once is difficult due to the forward-backward structure. Therefore, we propose a gradient descent algorithm in the following. For a given initial parameter set ξ_0, we solve the state equation for the Heston model with constant parameters, (9), or time-dependent parameters, (15). With the state solution at hand, we can solve the corresponding adjoint equation (21) or (22) backwards in time. Then we have all the information available to compute the gradient and update the parameter set using a gradient step. The procedure is sketched in Algorithm 1.

Algorithm 1: Gradient descent method for Heston parameter calibration
Result: calibrated parameters for the Heston model
initialize parameters ξ_0;
while ∥gradient∥ > ϵ do
    solve (9) or (15);
    solve (21) or (22);
    compute the gradient;
    line search for the step size;
    update the parameter set;
end

Since the parameter domains for κ_ν, µ_ν, σ_ν and ρ are restricted, and additionally the Feller condition has to be fulfilled, we use the projected Armijo rule [16]. There, we choose the maximum σ_k ∈ {1, 1/2, 1/4, ...} for which

    J(ξ_k(σ_k)) − J(ξ_k) ≤ −(γ/σ_k) ∥ξ_k(σ_k) − ξ_k∥²,  with ξ_k(σ) = P(ξ_k − σ ∇J(ξ_k)),

where P denotes the projection onto the admissible parameter set. Here γ ∈ (0, 1) is a numerical constant, which is problem-dependent and typically chosen as γ = 10^{−4}. We will use this value for the numerical results later on.
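To illustrate Algorithm 1 together with the projected Armijo rule, the following sketch applies the loop to a toy quadratic cost with box constraints standing in for the PDE-constrained problem; the projection operator, the bounds, and the stopping rule based on the projected gradient step are illustrative assumptions.

```python
import numpy as np

def project(xi, lower, upper):
    """Projection onto the box of admissible parameters."""
    return np.clip(xi, lower, upper)

def projected_gradient_descent(J, gradJ, xi0, lower, upper,
                               gamma=1e-4, eps=1e-8, max_iter=100):
    xi = project(np.asarray(xi0, dtype=float), lower, upper)
    for _ in range(max_iter):
        g = gradJ(xi)
        # stationarity: the projected gradient step leaves xi (almost) unchanged
        if np.linalg.norm(project(xi - g, lower, upper) - xi) <= eps:
            break
        sigma = 1.0
        while sigma > 1e-12:  # projected Armijo line search
            xi_new = project(xi - sigma * g, lower, upper)
            step = xi_new - xi
            if J(xi_new) - J(xi) <= -(gamma / sigma) * np.dot(step, step):
                break
            sigma *= 0.5
        xi = xi_new
    return xi

# toy stand-in for the PDE-constrained cost: minimize ||xi - target||^2 on [0, 1]^2
target = np.array([0.3, 1.5])
J = lambda xi: float(np.sum((xi - target) ** 2))
gradJ = lambda xi: 2.0 * (xi - target)
xi_star = projected_gradient_descent(J, gradJ, xi0=[0.9, 0.1], lower=0.0, upper=1.0)
# converges to the projected minimizer (0.3, 1.0)
```

In the paper's setting, J(ξ) requires a state solve of (9) or (15) and gradJ(ξ) an adjoint solve of (21) or (22); the toy functions above only stand in for these expensive evaluations.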

Discretization
In this section we introduce a closure boundary condition at ν = 0 for the Heston PDE and its adjoint, and perform a domain truncation to discretize the problem. Following the Fichera theory and assuming that the Feller condition holds, no analytical boundary condition needs to be imposed at ν = 0, neither for the Heston model nor for its adjoint, since both PDEs have a pure outflow boundary there. Nevertheless, we need a closure condition at this boundary for the discretization. We suggest to follow Heston's approach, as discussed in [9,3], and use the reduced hyperbolic formulation of the Heston PDE and its adjoint. At Γ_c, setting ν = 0 in (9), we obtain for the log-transformed normalized PDE

    Ṽ_τ = r Ṽ_x + κ_ν µ_ν Ṽ_ν − r Ṽ,

and similarly for the adjoint. Now, we perform a domain truncation to obtain a rectangular grid instead of a semi-unbounded domain, and introduce the grid points. We consider uniform meshes in each direction and obtain for the spatial directions x_i = x_min + i∆_x for i = 0, ..., N_x with ∆_x = (x_max − x_min)/N_x and ν_j = j∆_ν for j = 0, ..., N_ν with ∆_ν = ν_max/N_ν, as well as τ_k = k∆_τ for k = 0, ..., N_τ with ∆_τ = T/N_τ for the temporal direction. Since this is a proof of concept, simple and well-known spatial and temporal discretization methods are used to illustrate our approach. For the time discretization, we use the well-known alternating direction implicit (ADI) method, more precisely the Hundsdorfer-Verwer scheme [8]. This scheme is of second order for any choice of the weighting parameter θ (similar in spirit to the θ-method) and is able to handle mixed derivative terms. To present the ADI method, we use a general second-order PDE formulation; since the log-transformed normalized Heston PDE and its adjoint are both second-order PDEs, we use the same discretization for both.
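The uniform tensor grid described above can be set up as follows; the concrete domain bounds in the example call are illustrative, not the paper's values.

```python
import numpy as np

def make_grid(x_min, x_max, nu_max, T, Nx, Nnu, Ntau):
    """Uniform tensor grid: x_i = x_min + i*dx, nu_j = j*dnu, tau_k = k*dtau."""
    x = np.linspace(x_min, x_max, Nx + 1)    # x_0, ..., x_Nx
    nu = np.linspace(0.0, nu_max, Nnu + 1)   # nu_0 = 0, ..., nu_Nnu
    tau = np.linspace(0.0, T, Ntau + 1)      # tau_0 = 0, ..., tau_Ntau
    return x, nu, tau

# illustrative bounds for the truncated domain
x, nu, tau = make_grid(x_min=-5.0, x_max=5.0, nu_max=1.0, T=1.0,
                       Nx=100, Nnu=50, Ntau=20)
```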
In a first step, we split the discretized operator of the PDE into three parts, A = A_0 + A_1 + A_2, where A_0 contains the mixed derivative term, A_1 the derivatives in x-direction and A_2 the derivatives in ν-direction. Here D_x denotes the discretization matrix of the first derivative w.r.t. x and accordingly D_xx that of the second derivative w.r.t. x, D_ν and D_νν those of the first and second derivative w.r.t. ν, and I denotes the identity matrix. The discretization matrices are derived using central finite differences. Let u_{i,j}^k ≈ u(x_i, ν_j, τ_k) and abbreviate u_k ≈ u(x, ν, τ_k). In each time step of the Hundsdorfer-Verwer scheme, we solve the system of equations

    Y_0 = u_{k−1} + ∆_τ A u_{k−1},
    Y_j = Y_{j−1} + θ ∆_τ A_j (Y_j − u_{k−1}),  j = 1, 2,
    Ỹ_0 = Y_0 + ½ ∆_τ A (Y_2 − u_{k−1}),
    Ỹ_j = Ỹ_{j−1} + θ ∆_τ A_j (Ỹ_j − Y_2),  j = 1, 2,
    u_k = Ỹ_2.

We choose θ = 3/4 and improve the implementation of [14] by using a matrix-based instead of a vector-based representation of u_k.
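A dense sketch of the central finite-difference matrices D_x and D_xx (in practice one would store them sparsely, and the boundary rows would be overwritten by the boundary and closure conditions). The verification uses the fact that central differences are exact for quadratics.

```python
import numpy as np

def diff_matrices(n, h):
    """Central difference matrices for the first and second derivative."""
    off = np.ones(n - 1)
    # first derivative: (u_{i+1} - u_{i-1}) / (2h)
    Dx = (np.diag(off, 1) - np.diag(off, -1)) / (2.0 * h)
    # second derivative: (u_{i+1} - 2 u_i + u_{i-1}) / h^2
    Dxx = (np.diag(off, 1) - 2.0 * np.eye(n) + np.diag(off, -1)) / h ** 2
    return Dx, Dxx

# verification on u(x) = x^2, where the interior rows must give 2x and 2 exactly
x = np.linspace(0.0, 1.0, 11)
Dx, Dxx = diff_matrices(x.size, x[1] - x[0])
u = x ** 2
```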
At this point, we have to discuss the boundary conditions for the Heston model again. We start with the x-direction. We set Ṽ(x_min, ν, τ) = exp(−rτ) and similarly Ṽ(x_max, ν, τ) = 0, choosing a sufficiently small x_min and a sufficiently large x_max. For the variance boundaries, we follow the approach of Kùtik and Mikula [9]. Due to the Fichera theory, at ν = 0 we obtain an outflow boundary and no information can enter the domain from the region ν < 0. Since the same holds for the adjoint, we propose the same approach there and extend the numerical domain at this boundary.

Numerical results

As the calibrated values differ over the test cases, it is reasonable to assume that we only find local minima. However, the results are remarkable.
Since in the real market the parameters are not constant, we improve the approach by considering some parameter sets as time-dependent. From the relative change in the constant calibration setting, we choose the (additional) test cases listed in Table 6. Furthermore, the table links to the cost functional reduction tables and to the figures of the cost functional evolution for the different cases.
For each of the time-dependent test cases, V_data is generated as before and ξ_init is assumed to be constant, and thus is also the same as before.
To quantify the different calibrations, we compute the average relative cost functional reduction (31) and summarize the results in Table 14. Note that all cost functional reduction averages are large. We observe that a time-dependent calibration of one parameter alone (C1-C4) does not improve the cost functional reduction significantly. The first slight improvement can be found by using at least two time-dependent parameters (C5-C7). Surprisingly, C7, where κ_ν is the only parameter calibrated as a constant, gives the best calibration results, while C5, where all parameters are calibrated as time-dependent, gives the least improvement among the combinations of time-dependent parameter calibration. The fact that V_data is generated with constant parameters while the best case considers time-dependent parameters indicates that time-dependency is a good way to escape local minima. These results are in line with the literature [13,15].

Conclusion and Outlook
We began with an introduction to the Heston model, followed by the derivation of a gradient-based optimization, including a gradient descent algorithm. Next, the discretization of the schemes and algorithms was presented and numerical results were illustrated. These show that the gradient descent algorithm is a feasible calibration technique. The relative cost functional reduction is remarkable, even though we can expect to find only local minima; thus, our approach is in line with recent research. Furthermore, the assumption of at least time-dependent parameters is better suited to the real market situation, since in the real market almost no parameter is constant. In future work, we plan to incorporate the gradient descent algorithm into the space mapping approach and to test it on real market data.

Figure 2 :
Figure 2: Cost functional evolution per iteration for the different test sets in scenario C1.

Figure 3 :
Figure 3: Cost functional evolution per iteration for the different test sets in scenario C2.

Figure 4 :
Figure 4: Cost functional evolution per iteration for the different test sets in scenario C3.

Table 1 :
Five initial value sets for κ_init

Table 2 :
Five initial value sets for µ_init

Table 3 :
Five initial value sets for σ_init

Table 4 :
Five initial value sets for ρ_init for the different percentages, and the corresponding calibrated values for C0.

Table 5 :
Relative reduction of the cost functional computed with (31) for the different test cases for ξ_init and ξ_cal in the constant calibration setting.

Figure 1 :
Figure 1: Cost functional evolution per iteration for the test cases within the constant parameter calibration.

Table 6 :
Different cases for the calibration setting.

Table 7 :
Relative cost function reduction for the C1 calibration.


Table 8 :
Relative cost function reduction for the C2 calibration.

Table 9 :
Relative cost function reduction for the C3 calibration.

Table 10 :
Relative cost function reduction for the C4 calibration.

Table 11 :
Relative cost function reduction for the C5 calibration.

Table 12 :
Relative cost function reduction for the C6 calibration.

Table 13 :
Relative cost function reduction for the C7 calibration.

Table 14 :
Average cost functional reduction in percentage for the different cases over the 20 test cases, together with references to the corresponding relative cost functional reduction tables and to the figures of the cost functional evolution.