Pricing vanilla options using artificial neural networks: Application to the South African market

Abstract In this paper, a feed-forward artificial neural network (ANN) is used to price Johannesburg Stock Exchange (JSE) Top 40 European call options using a constructed implied volatility surface. The prices generated by the ANN were compared to the prices obtained using the Black-Scholes (BS) model. It was found that the pricing performance of the ANN improves significantly when the number of training samples is increased and that ANNs are able to price European call options in the South African market with a high degree of accuracy.


Introduction
This paper aims to answer two common questions on machine learning applications in quantitative finance asked by researchers and practitioners. The first is whether machine learning techniques such as artificial neural networks (ANNs) can be applied to real-world financial derivative pricing problems. The second is how these techniques compare to traditional frameworks such as the Black-Scholes (BS) model of (Black & Scholes, 1973). These questions are evaluated by gauging the effectiveness of a feed-forward ANN in pricing European call options using a constructed implied volatility surface in a South African market context.
Numerous studies have been done in the past on the pricing performance of machine learning techniques. (Hutchinson et al., 1994) showed that learning networks can be effectively used to price and hedge derivative securities when traditional parametric models fail. (Bennell & Sutcliffe, 2004) established that the performance of ANNs is significantly improved when the price of the underlying asset and the resultant option price are normalised using the strike price. (Culkin & Das, 2017) found that a deep neural network can be trained using artificially generated input data to accurately price European call options. (Liu et al., 2019) concluded that ANNs are able to calculate option prices and implied volatilities efficiently and accurately and have the ability to function as efficient approximation techniques for asset price processes that require time-consuming computations.

ABOUT THE AUTHOR
Ryno du Plooy currently works in industry as a model validation analyst and is pursuing a master's degree in Quantitative Finance at the University of Johannesburg. His research interests include quantitative finance and machine learning.
Pierre J. Venter currently works in industry as a quantitative analyst. He is busy with a PhD in Actuarial Science at the University of Pretoria. He is also appointed as a research associate at the Department of Finance and Investment Management, University of Johannesburg. His research interests include quantitative finance and financial econometrics.

PUBLIC INTEREST STATEMENT
The popularity of machine learning has grown exponentially in recent years with numerous applications spanning multiple industries including the financial industry. The focus of this paper is the development of an artificial neural network to price vanilla European call options using option price data from the market. The findings presented in this paper highlight the potential of machine learning applications in derivative pricing and will hopefully help contribute to future research in the field.
The 21st century has seen the emergence of the fourth industrial revolution, the age of artificial intelligence and automation. With reference to previous studies done on machine learning applications in financial derivative pricing problems, ANNs were shown to be powerful data-driven nonparametric alternatives to traditional derivative pricing models. As stated by (Bennell & Sutcliffe, 2004), ANNs are capable of modelling complex non-linear relationships and are not bound by the restrictive assumptions of traditional models such as the BS model. ANNs do not depend on the assumptions of the underlying stochastic process or require an explicit formula.
The BS model will serve as the benchmark for developing the ANN in this paper since the BS model is considered the market standard for pricing vanilla financial derivatives. The aim is to develop an ANN that is able to learn the BS model by modelling the relationship between the inputs to the BS model and the option prices obtained using the closed-form solution of the BS model. Thus, if it can be shown that ANNs are able to approximate the BS model and accurately price European call options using option price data from the market, then the real-world applications of machine learning techniques can be extended to more complex financial derivatives. This paper consists of the following sections: Section 2 reviews the recent and relevant literature on the use of ANNs in financial derivative pricing problems. Section 3 presents the research methodology used in this paper, which includes the theory behind ANNs and how they learn, as well as a review of the BS framework. Section 4 comprises the data and the results of this paper, and the findings and concluding remarks are presented in Section 5.

Literature review
The application of machine learning techniques in financial derivative pricing and other areas in quantitative finance has been extensively explored in the past. Resources such as Keras and PyTorch help facilitate the implementation of machine learning techniques in a straightforward manner and have paved the way for a resurgence in machine learning research focused on financial applications. In this section, the relevant literature on the applications of ANNs in financial derivative pricing is reviewed.
(Hutchinson et al., 1994) explored non-parametric methods for pricing and hedging derivative securities since the dynamics of the underlying security can be learned by these methods with minimal assumptions made on the nature of the underlying process. The study compared the pricing performance of four learning networks, namely ordinary least squares (OLS), radial basis function (RBF) networks, multilayer perceptrons (MLPs) and projection pursuit regression (PPR), to the traditional BS model. In the first section of the study, the learning networks were trained using artificial BS option prices and in the second section, the learning networks were trained using daily S&P 500 futures options observed for a 5-year period from January 1987 to December 1991. The study was also the first to propose the use of the homogeneity hint by (Merton, 1973), which entails scaling the underlying spot price with the strike price to reduce the number of inputs to the learning networks. The study concluded that learning networks are able to accurately price and hedge derivative securities and that non-parametric learning networks are useful substitutes for cases when parametric models fail.
(Bennell & Sutcliffe, 2004) compared the performance of the BS model with an ANN, or more specifically an MLP, in pricing European call options on the FTSE 100 index. Data on FTSE 100 European call options traded on the London International Financial Futures and Options Exchange (LIFFE) were collected over a period spanning from 1 January 1998 to 31 March 1999. This resulted in 9,556 observations after cleaning the data set. A key question the study investigated was whether normalising the spot prices of the underlying and the European call option prices produced more favourable results when training the ANN. The training set consisted of data that ranged from 1 January 1998 to 31 December 1998 and the testing set comprised the remaining data. A third of the training set was further divided into a validation set to test the performance of the ANN during training. Additionally, the data were further categorised into two groups, namely "in-the-money" if the ratio of the spot price scaled by the corresponding strike price is strictly greater than one and "out-of-the-money" otherwise. It was concluded that normalisation significantly improves the pricing performance of ANNs and is a key property that needs to be incorporated when using ANNs for financial derivative pricing problems.
(Culkin & Das, 2017) applied a deep neural network to the pricing of vanilla European call options and compared the performance of the deep neural network to the BS model using an approach similar to that followed by (Hutchinson et al., 1994). Artificial input data were generated for each of the parameters, which resulted in 300,000 call option prices. The input data were partitioned into 240,000 samples for the training set and 60,000 samples for the validation set. The spot prices of the underlying and call option prices were normalised using the homogeneity hint. The input data were fed into a deep neural network consisting of four hidden layers with 100 neurons in each hidden layer. (Culkin & Das, 2017) noted the importance of selecting an appropriate output activation function to ensure that the output of the deep neural network results in a non-negative European call option price. The study found that simplistic deep neural networks can be trained to price European call options accurately.
In a recent paper by (Liu et al., 2019), the performance of ANNs in the pricing of financial derivatives and the calculation of implied volatility was investigated. The ANN was trained on an artificially generated data set where the trained ANN acted as an agent of three different solvers considered in the study. These solvers were the closed-form solution given by the BS model, the COS method (Fourier-cosine series expansions) for the Heston stochastic volatility model and Brent's iterative root-finding method for calculating implied volatilities. The ANN consisted of four hidden layers with hyperparameters such as the number of neurons, activation functions and batch sizes being optimised through the use of a random search threefold cross-validation (CV) test. Model selection was performed using 200 epochs and the mean-squared error (MSE) as the loss function. One million random samples for the input parameters were generated from both wide and narrow parameter ranges. The randomly generated input samples were then converted into European call option prices using the BS model. The data set consisted of four inputs, namely the spot price scaled by the strike price, time to maturity, the risk-free rate and the volatility parameter. These inputs to the ANN were used to produce a normalised European call option price as output. The results of the study show that ANNs are able to calculate European call option prices and implied volatilities efficiently and accurately, and that the pricing performance is improved when a wider range of parameters is used to train the ANN. It was also found that ANNs have the ability to serve as approximation techniques for asset price processes that require time-consuming computations.
Two shortcomings associated with previous literature have been identified. First, the training and testing sets consist either only of artificial option price data or only of real-world option price data. This may skew the accuracy of results and fail to capture how well machine learning techniques are able to generalise to unseen data. Second, the accuracy of the results is measured using performance metrics, but no information is provided on what the actual option prices generated by these techniques are. This paper aims to fill this gap in the literature by training an ANN on artificially generated data and then using the trained ANN to price JSE Top 40 European call options using option price data from the South African market. The European call option prices generated by the ANN are then compared to the prices obtained using the closed-form solution of the BS model to gain better insight into what the actual pricing error is.
The methodology of this paper is presented in the next section.

Methodology
In this section, the theoretical framework of the BS model for European call options, background to feed-forward ANNs, as well as general concepts that facilitate the learning process of ANNs will be examined.

Black-Scholes (BS) model
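The value of a financial derivative at any time $t$ should depend only on time and the value of the underlying asset at that time (Shreve, 2008). More formally, the value of a European call option at time $t$ that provides a payoff of $\max\{S_T - K, 0\}$ at maturity $T$ is simply equal to the discounted expected payoff under the risk-neutral measure $\mathbb{Q}$:

$$c_t = e^{-r\tau}\,\mathbb{E}^{\mathbb{Q}}\left[\max\{S_T - K, 0\} \mid \mathcal{F}_t\right],$$

where:
• $e^{-r\tau}$ is the discount factor,
• $r$ is an arbitrary risk-free rate,
• $\tau$ is the year fraction between the valuation date $t$ and the maturity date $T$,
• $S_T$ is the spot price of the underlying at maturity $T$,
• $K$ is the strike price.

Under the assumption that the underlying follows a geometric Brownian motion with constant volatility $\sigma_{BS}$ under $\mathbb{Q}$, that is $dS_t = r S_t\,dt + \sigma_{BS} S_t\,dW_t^{\mathbb{Q}}$, a closed-form solution for the value of a European call option on a non-dividend paying stock is given by:

$$c = S_0 N(d_1) - K e^{-r\tau} N(d_2),$$

$$d_1 = \frac{\ln(S_0/K) + \left(r + \tfrac{1}{2}\sigma_{BS}^2\right)\tau}{\sigma_{BS}\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma_{BS}\sqrt{\tau},$$

where:
• $S_0$ is the spot price of the underlying at the valuation date $t = 0$,
• $N(\cdot)$ is the standard normal cumulative distribution function,
• $\sigma_{BS}$ is the volatility of the underlying.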
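As an illustration, the closed-form BS solution for a European call can be written in a few lines of Python. This is a minimal sketch using only the standard library; the function names `norm_cdf` and `bs_call` and the example inputs are illustrative assumptions and are not taken from the code or data used in this paper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, k: float, tau: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call on a non-dividend paying asset."""
    if tau <= 0.0 or sigma <= 0.0:
        # Degenerate case: the option is worth its discounted intrinsic value.
        return max(s0 - k * math.exp(-r * max(tau, 0.0)), 0.0)
    sqrt_tau = math.sqrt(tau)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d2 = d1 - sigma * sqrt_tau
    return s0 * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


# Illustrative at-the-money inputs (assumed values, not market data).
price = bs_call(s0=100.0, k=100.0, tau=1.0, r=0.05, sigma=0.2)
print(f"BS call price: {price:.4f}")
```

Because the formula is homogeneous of degree one in the spot and strike, calling `bs_call` with spot `s0 / k` and strike `1.0` returns the scaled price `c / k` directly.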

Artificial neural networks
In the most basic sense, an ANN can be visualised as an artificial representation of the biological brain. ANNs consist of numerous layers of interconnected information processing units called neurons that make up the structure of the ANN. These neurons receive information in the form of outputs from neurons in the preceding layer and process this information using an activation function. The processed information is transferred forward through the network on a neuron-by-neuron basis (Haykin, 1999). (Cybenko, 1989) and (Hornik et al., 1989) proved that a feed-forward ANN with a single hidden or internal layer and a continuous non-linear activation function, such as the sigmoidal function, can approximate any continuous function on a compact domain to an arbitrary degree of precision. A simple diagram of the proposed architecture of the ANN used in this paper can be seen in Figure 1, where:
• $\varphi(\cdot)$ is an arbitrary activation function,
• $+1$ is the bias term,
• $c/K$ is the normalised European call option price.

Forward propagation
The flow of information in an ANN as described by (Haykin, 1999) is characterised by a forward pass of the information in a sequential manner throughout the network, which is known as forward propagation. To help ease the burden of notation, let the indices $i$ and $j$ denote neurons in different layers of the ANN, where neuron $i$ is located in the layer to the left of neuron $j$. By making use of this notation, the flow of information is given by:

$$z_j(n) = \sum_{i=0}^{m} w_{ji}(n)\,a_i(n), \qquad a_j(n) = \varphi_j\big(z_j(n)\big),$$

where:
• $m$ is the number of neurons in the layer preceding neuron $j$,
• $z_j(n)$ is the weighted sum of the inputs to neuron $j$,
• $a_j(n)$ is the output of neuron $j$,
• $w_{ji}(n)$ is the synaptic weight connecting neuron $i$ to neuron $j$,
• the bias weight $w_{j0}(n)$ is connected to a fixed input $a_0 = +1$.

Gradient-based learning
(Rumelhart et al., 1986) derived the formal framework for the learning procedure of ANNs known as the back-propagation algorithm. A feed-forward ANN will calculate the difference between the actual output $y(n)$ and the predicted output $a(n)$ using a loss function $J(n)$, which is a function of the iteration $n$, and subsequently update the synaptic weights in order to minimise the difference between the actual and predicted outputs. To better illustrate the mechanics behind the back-propagation algorithm, consider a single neuron $j$ which is connected to neuron $i$ in the previous layer by $w_{ji}(n)$. The process of updating the synaptic weight $w_{ji}(n)$ under the generalised delta rule for a feed-forward ANN according to (Haykin, 1999) can be formally defined as:

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\,a_i(n), \tag{1}$$

or in an alternative form as:

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial J(n)}{\partial w_{ji}(n)}, \tag{2}$$

where:
• $n$ denotes a single iteration,
• $\eta$ is the learning parameter,
• $\Delta w_{ji}(n)$ is the adjustment to the synaptic weight connecting the $i$th and $j$th neuron,
• $\delta_j(n)$ is the local gradient,
• $\partial J(n) / \partial w_{ji}(n)$ is the partial derivative of the loss function with respect to the synaptic weight connecting the $i$th and $j$th neuron.

The representation of the local gradient $\delta_j(n)$ can change depending on whether neuron $j$ is located within the output layer or within a hidden layer.

To help with deriving an expression for $\delta_j(n)$ in equation (1) when neuron $j$ is located within an output layer containing $C$ neurons, it is useful to first define the following, with the loss taken to be the squared error:

$$\varepsilon_j(n) = y_j(n) - a_j(n), \qquad J(n) = \frac{1}{2}\sum_{j=1}^{C}\varepsilon_j^2(n), \qquad a_j(n) = \varphi_j\big(z_j(n)\big).$$

Through the application of the chain rule of calculus, the local gradient $\delta_j(n)$ in equation (1) when neuron $j$ is located in the output layer can be defined as:

$$\delta_j(n) = -\frac{\partial J(n)}{\partial z_j(n)} = -\frac{\partial J(n)}{\partial \varepsilon_j(n)}\,\frac{\partial \varepsilon_j(n)}{\partial a_j(n)}\,\frac{\partial a_j(n)}{\partial z_j(n)} = \varepsilon_j(n)\,\varphi_j'\big(z_j(n)\big). \tag{3}$$

For the second case, let neuron $j$ be located within one of the hidden layers, where neuron $j$ is connected to neuron $i$ in the previous layer by $w_{ji}(n)$ and to neuron $k$ in the output layer by $w_{kj}(n)$. It is possible to define a new set of functions to derive an expression for $\delta_j(n)$ when neuron $j$ is located within a hidden layer as:

$$\varepsilon_k(n) = y_k(n) - a_k(n), \qquad J(n) = \frac{1}{2}\sum_{k=1}^{C}\varepsilon_k^2(n), \qquad z_k(n) = \sum_{j=0}^{\kappa} w_{kj}(n)\,a_j(n), \qquad a_k(n) = \varphi_k\big(z_k(n)\big),$$

where the number of neurons in the output layer is equal to $C$ and the number of neurons in the hidden layer preceding the output layer is equal to $\kappa$. Using the chain rule, the local gradient $\delta_j(n)$ in equation (1) when neuron $j$ is located within a hidden layer can be defined as:

$$\delta_j(n) = -\frac{\partial J(n)}{\partial z_j(n)} = -\sum_{k=1}^{C}\frac{\partial J(n)}{\partial \varepsilon_k(n)}\,\frac{\partial \varepsilon_k(n)}{\partial a_k(n)}\,\frac{\partial a_k(n)}{\partial z_k(n)}\,\frac{\partial z_k(n)}{\partial a_j(n)}\,\frac{\partial a_j(n)}{\partial z_j(n)} = \varphi_j'\big(z_j(n)\big)\sum_{k=1}^{C}\varepsilon_k(n)\,\varphi_k'\big(z_k(n)\big)\,w_{kj}(n). \tag{4}$$

Using the solution for $\delta_j(n)$ in equation (3) for the expression $\varepsilon_k(n)\,\varphi_k'\big(z_k(n)\big)$ and replacing the subscript $j$ with $k$, equation (4) can be rewritten as:

$$\delta_j(n) = \varphi_j'\big(z_j(n)\big)\sum_{k=1}^{C}\delta_k(n)\,w_{kj}(n). \tag{5}$$
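The forward pass and the generalised delta rule can be sketched for a very small network in plain Python. The sketch below assumes one hidden layer, a single output neuron, Softplus activations throughout and a squared-error loss; the layer sizes, the learning parameter and the sample values are hypothetical and chosen only to keep the example short. A finite-difference check is included to confirm that the analytic gradient $-\delta_j(n)\,a_i(n)$ agrees with a numerical derivative of the loss.

```python
import math
import random


def softplus(z):
    return math.log1p(math.exp(z))


def softplus_prime(z):
    # The derivative of Softplus is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass; index 0 of every weight row is the bias (fixed input +1)."""
    a_in = [1.0] + list(x)
    z_hidden = [sum(w * a for w, a in zip(row, a_in)) for row in w_hidden]
    a_hidden = [1.0] + [softplus(z) for z in z_hidden]
    z_out = sum(w * a for w, a in zip(w_out, a_hidden))
    return a_in, z_hidden, a_hidden, z_out, softplus(z_out)


def loss(x, y, w_hidden, w_out):
    """Squared-error loss J = 0.5 * (y - a)^2 for a single output neuron."""
    return 0.5 * (y - forward(x, w_hidden, w_out)[-1]) ** 2


def backprop_step(x, y, w_hidden, w_out, eta):
    """One update of all weights using delta_w = eta * delta_j * a_i."""
    a_in, z_hidden, a_hidden, z_out, a_out = forward(x, w_hidden, w_out)
    # Output layer: delta = error * phi'(z).
    delta_out = (y - a_out) * softplus_prime(z_out)
    # Hidden layer: delta_j = phi'(z_j) * sum_k delta_k * w_kj (one output, so one term).
    delta_hidden = [softplus_prime(z) * delta_out * w_out[j + 1]
                    for j, z in enumerate(z_hidden)]
    new_w_out = [w + eta * delta_out * a for w, a in zip(w_out, a_hidden)]
    new_w_hidden = [[w + eta * d * a for w, a in zip(row, a_in)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, delta_hidden, delta_out


random.seed(0)
x, y = [1.1, 0.5, 0.07, 0.2], 0.15  # hypothetical (S0/K, tau, r, sigma) and target c/K
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]

loss_before = loss(x, y, w_hidden, w_out)
new_w_hidden, new_w_out, delta_hidden, delta_out = backprop_step(
    x, y, w_hidden, w_out, eta=0.01)
loss_after = loss(x, y, new_w_hidden, new_w_out)

# Central finite-difference check of dJ/dw for hidden neuron j=0 and input i=1.
h = 1e-5
up = [row[:] for row in w_hidden]
down = [row[:] for row in w_hidden]
up[0][1] += h
down[0][1] -= h
numeric_grad = (loss(x, y, up, w_out) - loss(x, y, down, w_out)) / (2.0 * h)
analytic_grad = -delta_hidden[0] * x[0]
```

In practice a library such as Keras performs these steps automatically over mini-batches; the sketch only mirrors the notation of the derivation.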

Activation functions
Activation functions are fundamental to how feed-forward ANNs process the information received from the neurons in the preceding layer. An activation function can also be referred to as a "squashing function" since the activation function limits the output of a neuron to a specific range of values. A key property for an activation function as stated by (Haykin, 1999) is that the function must be differentiable. The rectified linear unit (ReLU) and Softplus functions used in this paper are defined as:
(1) ReLU
$$\varphi(z) = \max(0, z),$$
(2) Softplus
$$\varphi(z) = \ln\left(1 + e^{z}\right).$$
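The two activation functions are simple enough to write out directly. The following is a minimal sketch in plain Python; the numerically stable rearrangement of Softplus and the sample points are choices made for this illustration and are not taken from the paper.

```python
import math


def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def softplus(z: float) -> float:
    """Softplus ln(1 + e^z), written as max(z, 0) + ln(1 + e^{-|z|}) to avoid overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
relu_values = [relu(z) for z in samples]
softplus_values = [softplus(z) for z in samples]
```

Softplus is strictly positive for every finite input and lies above ReLU everywhere, which is the property that makes it attractive as an output activation for option prices.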

Performance metrics
Performance metrics are fundamental to the evaluation of how well an ANN performs when comparing the predicted outputs with the actual outputs. The mean-squared error (MSE), root-mean-square error (RMSE) and the coefficient of determination ($R^2$) are some of the most widely used metrics when evaluating regression-based machine learning problems (Albon, 2018). These metrics are defined as:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},$$

where:
• $N$ is the number of observations,
• $y_i$ is the actual value,
• $\hat{y}_i$ is the predicted value,
• $\bar{y}$ is the mean of the actual values.
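For concreteness, the three metrics can be computed as follows. This is a small sketch in plain Python; the two lists of normalised prices are made-up values used purely to show the arithmetic.

```python
import math


def mse(actual, predicted):
    """Mean-squared error."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Made-up normalised prices c/K, for illustration only.
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.39]

print(mse(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

Every prediction in this toy example is off by 0.01, so the MSE is 0.0001, the RMSE is 0.01 and $R^2$ is 0.992.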
The data generating process, ANN architecture and results of this paper will be presented in the next section.

Results
This section covers the data generation process for the training, validation and testing data set, ANN architecture as well as the results of using a feed-forward ANN to generate a European call option price surface from an implied volatility surface.

Data
The following section describes the process followed to generate artificial European call option data for training and validating the ANN. The section also includes the process involved with transforming the constructed implied volatility surface into a call option price surface that will be used as the testing set to evaluate the pricing performance of the ANN.

Training and validation data
Since high-quality option price data are scarce in the South African market due to illiquidity, it was necessary to resort to generating artificial training data. The artificial training data were randomly sampled from wide ranges for the input parameters, which were transformed into European call option prices using the closed-form solution of the BS model. Some of the parameter ranges were exaggerated to enhance the uniqueness of the artificially generated data, resulting in a substantial number of input parameter combinations to aid the ANN in generalising to unseen data. To better understand the relationship between the number of training samples and the pricing performance of an ANN, two separate training sets were artificially generated and an ANN was trained on each respective set. These two cases are represented as:
• ANN Case 1: ANN trained on 200,000 artificially generated data samples.
• ANN Case 2: ANN trained on 1,000,000 artificially generated data samples.
The same parameter ranges were also used to generate the validation sets consisting of 20% of the number of training samples for each case. The parameter ranges used for generating the two artificial training sets can be seen in Table 1.
After obtaining call option prices via the closed-form solution of the BS model using the input data generated for both cases, a similar approach to that of (Hutchinson et al., 1994; Bennell & Sutcliffe, 2004) was followed. By applying the homogeneity hint by (Merton, 1973), the inputs for the training and validation set are $x_{\text{Train}} = \{S_0/K, \tau, r, \sigma_{BS}\}$ and $x_{\text{Val}} = \{S_0/K, \tau, r, \sigma_{BS}\}$ respectively. The homogeneity hint was also applied to the European call option prices obtained from the artificial training and validation data, which resulted in outputs of the form $y_{\text{Train}} = \{c/K\}$ and $y_{\text{Val}} = \{c/K\}$.
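The data generation step can be sketched as follows, using only the Python standard library. The sampling ranges in `RANGES` are hypothetical placeholders (the ranges actually used are those reported in Table 1), uniform sampling is an assumption, and the sample counts are reduced so that the example runs quickly. The BS formula is evaluated directly in scaled form, so that the function takes $S_0/K$ as input and returns $c/K$.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_scaled(moneyness, tau, r, sigma):
    """Black-Scholes call price divided by the strike (c/K) as a function of S0/K."""
    vol_sqrt_tau = sigma * math.sqrt(tau)
    d1 = (math.log(moneyness) + (r + 0.5 * sigma ** 2) * tau) / vol_sqrt_tau
    d2 = d1 - vol_sqrt_tau
    return moneyness * norm_cdf(d1) - math.exp(-r * tau) * norm_cdf(d2)


# Hypothetical sampling ranges, NOT the ranges from Table 1 of the paper.
RANGES = {
    "moneyness": (0.5, 1.5),   # S0/K
    "tau": (0.05, 3.0),        # time to maturity in years
    "r": (0.0, 0.15),          # risk-free rate
    "sigma": (0.05, 1.0),      # BS volatility
}
FEATURES = ("moneyness", "tau", "r", "sigma")


def generate_samples(n, seed):
    """Draw n input vectors uniformly from RANGES and label them with c/K."""
    rng = random.Random(seed)
    inputs, outputs = [], []
    for _ in range(n):
        x = [rng.uniform(*RANGES[name]) for name in FEATURES]
        inputs.append(x)
        outputs.append(bs_call_scaled(*x))
    return inputs, outputs


n_train = 1000  # the paper uses 200,000 and 1,000,000 samples
x_train, y_train = generate_samples(n_train, seed=42)
x_val, y_val = generate_samples(n_train // 5, seed=7)  # validation set: 20% of n_train
```

Working with $S_0/K$ and $c/K$ removes the strike as a separate input, so the network has four inputs instead of five.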

Testing data
The testing data for the ANN is in the form of an All Share Index (ALSI) implied volatility surface constructed using data obtained from the JSE dated 9 April 2019. A granular implied volatility surface was constructed by assuming linear interpolation in strike, and linear variance in the maturity space. Both of these interpolation techniques are consistent with market best practices. This resulted in a constructed implied volatility surface consisting of 10,000 implied volatility estimates. The price of the underlying index (JSE ALSI Top 40) on the valuation date was R51,564.09 and the risk-free rate was assumed to be equal to the 3-month T-bill rate of 7.01%, which was obtained from the South African Reserve Bank (SARB) on the valuation date. It was also assumed that the continuous dividend yield is equal to zero.
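The interpolation scheme can be illustrated with a short sketch. It assumes that "linear variance in the maturity space" means linear interpolation of total implied variance $\sigma^2\tau$ between quoted maturities, which is a common convention but is an interpretation rather than something stated explicitly above. The grid values and the names `interp_vol` and `_bracket` are hypothetical, and extrapolation outside the quoted grid is deliberately not handled.

```python
import math
from bisect import bisect_right


def _bracket(grid, x):
    """Return the two neighbouring node indices and the linear weight of the upper node."""
    if not grid[0] <= x <= grid[-1]:
        raise ValueError("point lies outside the quoted grid; extrapolation is not handled")
    hi = min(max(bisect_right(grid, x), 1), len(grid) - 1)
    lo = hi - 1
    return lo, hi, (x - grid[lo]) / (grid[hi] - grid[lo])


def interp_vol(strikes, maturities, vols, k, tau):
    """Interpolate an implied volatility; vols[i][j] is quoted at maturities[i], strikes[j]."""
    j_lo, j_hi, wk = _bracket(strikes, k)
    i_lo, i_hi, wt = _bracket(maturities, tau)
    # Step 1: linear interpolation in strike on the two neighbouring maturities.
    vol_lo = (1.0 - wk) * vols[i_lo][j_lo] + wk * vols[i_lo][j_hi]
    vol_hi = (1.0 - wk) * vols[i_hi][j_lo] + wk * vols[i_hi][j_hi]
    # Step 2: linear interpolation of total variance sigma^2 * tau in maturity.
    var_lo = vol_lo ** 2 * maturities[i_lo]
    var_hi = vol_hi ** 2 * maturities[i_hi]
    total_var = (1.0 - wt) * var_lo + wt * var_hi
    return math.sqrt(total_var / tau)


# Hypothetical quotes, not JSE data.
strikes = [40000.0, 50000.0, 60000.0]
maturities = [0.25, 1.0]
vols = [[0.25, 0.20, 0.18],
        [0.24, 0.21, 0.19]]

vol_mid = interp_vol(strikes, maturities, vols, k=50000.0, tau=0.625)
```

Evaluating `interp_vol` on a fine grid of strikes and maturities would produce a dense surface of the kind described above.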
The BS model was used to convert the input parameters into call option prices. The homogeneity hint was once again used to normalise the testing set, which resulted in the following inputs: $x_{\text{Test}} = \{S_0/K, \tau, r, \sigma_{BS}\}$ and outputs: $y_{\text{Test}} = \{c/K\}$. No additional feature scaling was performed on the other input parameters. The JSE ALSI implied volatility surface and BS price surface are illustrated in Figure 2.

Artificial neural network architecture
The ANN in this paper was implemented in Python using the Keras Sequential Application Programming Interface (API), which was initially developed by (Chollet, 2015), running on TensorFlow 2.0. Calculations were performed on a Dell Inspiron 3567 with an i5-7200U CPU @ 2.50 GHz and 4 GB of installed RAM. The ANN architecture proposed to price JSE Top 40 European call options is displayed in Table 2.
To facilitate a dynamically driven process of training the ANN, validation losses were monitored after each epoch to prevent the model from over-fitting. Thus, if the ANN's performance on the validation set starts degrading after a few epochs, the training process is automatically terminated. This dynamic training process can further be enhanced by using model checkpoints, which store the ANN configuration after each epoch. Once training is completed, the optimal ANN configuration is saved, which can then be deployed to price European call options.
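The stopping logic described above can be sketched independently of any framework. In Keras this behaviour is typically obtained with the `EarlyStopping` and `ModelCheckpoint` callbacks; the plain-Python function below only mimics the decision rule. The `patience` value and the validation-loss curve are hypothetical, since the paper does not report them here.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. The best epoch plays the role of the
    saved model checkpoint.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_loss:
            # New best model: this is where a checkpoint would be written.
            best_loss, best_epoch = val_loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve: improves, then starts to over-fit.
history = [0.90, 0.40, 0.22, 0.15, 0.16, 0.17, 0.19, 0.25]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(f"stopped at epoch {stop_epoch}, best checkpoint from epoch {best_epoch}")
```

With a patience of three epochs, training on this curve stops at epoch 6 and the configuration from epoch 3 is the one that would be deployed.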
A very important feature to consider, as highlighted by (Culkin & Das, 2017), is the use of an output activation function which results in a non-negative price. It is common practice to use a linear function in the output layer for financial derivative pricing problems, but this may result in occasional negative prices since the linear function is zero-centred. According to (Hull, 2009), the lower bound of a European call option at time $t = 0$ is given by:

$$c \geq \max\left(S_0 - K e^{-r\tau},\, 0\right).$$

This inequality holds due to the optionality embedded in the option, since the holder of the option has the right but not the obligation to exercise the option. Thus, the value of a European call option cannot be negative. To avoid the violation of this fundamental property, the Softplus function was chosen to be the output layer activation function.

Numerical results
After extensively training an ANN on each artificially generated training set, the trained ANNs from both cases were deployed into production. Given input data consisting of the implied volatility surface and other parameters observed on the valuation date, the ANNs from both cases were used to generate the prices of JSE Top 40 European call options. These predicted prices of the form $c/K$ were compared to the actual prices obtained by converting the implied volatility surface into a call option price surface using the BS model. The actual prices generated by the BS model were also scaled to be in the form $c/K$. The performance on the testing set can be seen in Table 3.
From Table 3, it can be seen that the ANN in both cases produced accurate results. The MSE and RMSE improved in the second case due to the greater number of training samples. The $R^2$ values reported for both cases were close to one. A graphical representation of the performance of the ANN from both cases on the testing set can be seen in Figure 3.
Although the results in Figure 3 are promising, real-world applications demand a non-scaled price. This is achieved by multiplying the generated European call option prices by the corresponding strike price $K$, which results in a price for every point on the implied volatility surface. The price surfaces generated by the ANN from both cases compared to the actual price surface in Figure 2b are illustrated in Figure 4. By evaluating Figure 4a and 4b, it is evident that the ANN from both cases produced a European call option price surface that is graphically almost identical to the original price surface. From the absolute errors in Figure 4c and 4d, it can be seen that the magnitude of the differences reduced significantly when the number of training samples was increased in the second case. This same observation can be made when comparing the relative errors in Figure 4e and 4f. The size of the relative error for deep "out-of-the-money" short-dated European call options is quite significant compared to the rest of the price surface. This overestimation bias is, however, solely attributable to the Softplus function used in the output layer of the ANN, since the function is not zero-centred and the value of deep "out-of-the-money" options is very close to zero. A simple solution to this phenomenon is to increase the number of samples used to train the ANN or to search for an alternative output layer activation function. A more detailed view of the results obtained can be seen in Table 4.
From Table 4, it is evident when considering the range of absolute errors that the pricing performance of the ANN significantly improved in the second case, where the number of training samples was increased to 1,000,000 samples. The findings of this paper will be concluded in the next section.
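The rescaling step and the two error measures discussed above can be illustrated with a short sketch. All numbers below are invented for the example and are not results from this paper; the function name `price_errors` is likewise hypothetical. The example is constructed to show how a deep "out-of-the-money" option can have the smallest absolute error and yet the largest relative error, because its actual price is close to zero.

```python
def price_errors(strikes, scaled_actual, scaled_predicted):
    """Convert c/K back to prices and return (actual, predicted, abs_err, rel_err) rows."""
    rows = []
    for k, y, y_hat in zip(strikes, scaled_actual, scaled_predicted):
        actual, predicted = y * k, y_hat * k  # undo the homogeneity scaling
        abs_err = abs(predicted - actual)
        rel_err = abs_err / actual if actual > 0.0 else float("inf")
        rows.append((actual, predicted, abs_err, rel_err))
    return rows


# Invented values: in-the-money, near-the-money and deep out-of-the-money.
strikes = [40000.0, 52000.0, 70000.0]
scaled_actual = [0.3000, 0.0500, 0.00001]
scaled_predicted = [0.3002, 0.0501, 0.00005]

rows = price_errors(strikes, scaled_actual, scaled_predicted)
for actual, predicted, abs_err, rel_err in rows:
    print(f"actual={actual:10.2f} predicted={predicted:10.2f} "
          f"abs_err={abs_err:6.2f} rel_err={rel_err:8.4f}")
```

In this toy example the deep "out-of-the-money" option is mispriced by only 2.80 in absolute terms, but this is four times its actual price of 0.70.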

Conclusion
The purpose of this paper was to investigate the performance of ANNs when applied to the pricing of European call options in the South African market. The research question can be divided into two parts. Firstly, given a feed-forward ANN trained on artificial data, what European call option prices will be generated from an implied volatility surface? Secondly, what degree of error can be expected when comparing the generated prices to prices obtained using a traditional option pricing model such as the BS model? This paper revisited previous literature on the use of ANNs in modern financial derivative pricing problems such as the work done by (Hutchinson et al., 1994; Bennell & Sutcliffe, 2004; Culkin & Das, 2017; Liu et al., 2019). By making use of an approach that is consistent with previous studies, artificial European call option price data were generated, which resulted in the creation of two training sets. The first training set consisted of 200,000 samples and the second training set consisted of 1,000,000 samples. After training an ANN on each of the training sets, it was found that for both cases, an ANN is able to price European call options with a high degree of accuracy when given an implied volatility surface constructed using option price data from the South African market. It was also found that an increase in the number of training samples resulted in a significant improvement in the pricing performance of an ANN.
Areas for further research include investigating sampling techniques to generate higher quality input data that will result in the need for fewer training samples and thus be computationally less expensive. Furthermore, modern pricing frameworks that incorporate collateral and valuation adjustments should also be considered. Finally, the performance of ANNs applied to exotic options in the South African market should also be investigated.