Measurement of extreme market risk: Insights from a comprehensive literature review

The experience of past financial market turmoil suggests that, in addition to eroding investor wealth, the severe consequences of rare extreme market events can spill over and impair the broader real economy. In this context, this paper evaluates the methodological and empirical advances in the measurement of extreme market risk. It argues that a major reason for the origin of such risks since the 1980s has been the unintended consequence of asymmetric monetary policy aimed at sustaining the rise of financial markets. The review then identifies the value at risk (VaR) and the VaR-based alternative, expected shortfall (ES), as the principal measures of extreme market risk. The deficiencies in the standard modelling approaches for VaR-ES measures have led to several advanced estimation methodologies. However, the lack of identification of an optimal methodology in the internal models approach (IMA) regime, where financial institutions (FIs) can choose a suitable VaR-ES modelling technique, incentivizes regulatory arbitrage and other inconsistencies. Therefore, this paper investigates the theoretical and empirical research literature on VaR and ES estimation for financial asset market prices. It finds that the extreme value theory (EVT), followed closely by the filtered historical simulation (FHS), are highly accurate methodologies. In addition, mixture distributions, asymmetric and non-linear versions of the conditional quantile (CQ) approach, (volatility) asymmetry and long memory conditional volatility

ABOUT THE AUTHORS Gourab Chakraborty is a graduating PhD student in the Quantitative Finance area at the Institute for Financial Management and Research (IFMR), Chennai. His research interests lie in financial market risk measurement and asset pricing. He is also a Financial Risk Manager (FRM) Level II candidate.
Prof G. Balasubramanian is the former dean of IFMR. His research interests are financial modelling and analytics.
Prof G. R. Chandrashekhar is currently a professor in the area of Data Science at IFMR. His research interests span investigating strategic actions, unravelling complexity, and data-based exploration.
This literature review paper is a part of Gourab's PhD thesis. The alternative methodologies identified in this paper are being used in working papers to measure extreme market risk across asset classes and financial markets.

PUBLIC INTEREST STATEMENT
We live in an age of financialization, where economic policy is geared to sustain the rise of financial markets, as a drop in financial asset prices can wreak havoc on the real economy. Yet, arguably, the unintended consequences of policy action have led to more frequent financial market turmoil since the 1980s, especially during the decade following the global financial crisis (GFC) of 2008. Therefore, the measurement of extreme market risk is of paramount importance for financial institutions (FIs) and the regulatory authorities. While the current regulatory guidelines do not mandate that FIs measure this risk with any specific methodology, the range of available methodologies is enormous. Against this background, this study reviews the empirical literature on extreme market risk estimation methodologies. It finds that the few estimation approaches that better explain the statistical properties of extreme returns also measure extreme financial market risk more accurately.

Introduction
In this paper, we review the literature on the measurement of extreme (financial) market risk, in particular the estimation of Value at Risk (VaR) and Expected Shortfall (ES). Mishkin (2009) and Orlowski (2012) suggest that extreme market events are unforeseen or rare, i.e. low frequency but high severity (LFHS), and refer to the associated risk as "tail risk". From a statistical standpoint, LFHS market crises are tail events, i.e. outliers that lie in the tails of the distribution. While Kemp (2011) limits tails to the region below the lower 10% quantile and beyond the upper 90% quantile of a returns distribution, Smith (1987) offers a much stronger and narrower definition with tighter bounds: beyond 2.6 standard deviations of asset returns away from the mean. The term "tail risk" concerns the "chances", measured by the probability (estimated by empirical and theoretical probability distributions), and the "consequences", measured by the Value at Risk (VaR) and Expected Shortfall (ES) estimates, of adverse returns in the tails of the returns distribution (Tolikas, 2008, 2014). Kemp (2011) suggests that tail risk stems from fat tails, which are statistically referred to as (left-skewed) leptokurtosis. Fat tails are an empirical regularity that arises when asset prices have lower levels of volatility during ordinary market periods and considerably elevated volatility levels during market turmoil. This paper aims to present a researcher in the field of extreme (market) risk measurement with a wide spectrum of methodologies to accurately estimate VaR and ES, and to acquaint them with the contours of the progress made in this area.
It is unsurprising that the academic literature that proposes new VaR-ES models and backtesting methodologies, with the prima facie aim of improving extant ones, and that compares the alternative VaR-ES estimation methods and backtesting techniques has expanded rapidly in the last two decades. This forms the primary motivation for this paper.
In the following sections, the review identifies four broad themes that direct the research on the measurement of extreme market risk and aligns the survey along them. These broad themes are: the origin of market tail risk as perceived in the literature; the measurement frameworks of tail risk from the past; the weaknesses in the traditional tail risk measurement approaches and the consequent evolution of newer models; and lastly, the impact of this evolution of tail risk measurement on the empirical literature. Figure 1 offers a schematic progression of this literature review.
The evolution of newer models and the impact on the empirical literature occur simultaneously. Hence, the studies aligned to these two latter themes have been consolidated. Accordingly, this review of literature can be aligned into three strands: the antecedent strand, the evolution of extreme risk measures, and the consequent strand. This alignment guides the organization of the paper. Section 2 reviews the antecedent strand of literature on the critical nature of extreme market risks and highlights the importance of their measurement. Section 3 outlines the second strand of literature on the historical evolution of alternative extreme market risk measures. Section 4 reviews the consequent seminal and empirical literature on the alternative VaR and ES estimation methodologies. The conclusions are presented in section 5. This paper argues that the principal drivers of research in tail risk measurement are the increasingly frequent occurrence, since the 1980s, of extreme financial market events with debilitating consequences for the real economy (section 2), and the glaring weaknesses of the traditional tail risk estimation methodologies together with the lack of identification of the optimal tail risk estimation methodology in the Basel guidelines (section 4.2).
The key contributions of this paper are threefold. First, this paper argues that the origins of tail risk, especially since the 1980s, can be attributed to financialization. Second, it provides a comprehensive review that combines the origins of tail risk with the range of extreme market risk measures and estimation methodologies. Finally, it reviews a vast body of empirical literature on extreme market risk and categorically identifies the extreme value theory (EVT) as the most accurate methodology for extreme tails.
[Figure note: While the FHS is an improvement over the HS (a non-parametric) approach, the FHS framework per se is a semiparametric technique. Source: Adapted from studies reviewed in section 4.2.]

The origins of extreme market risk and the high severity of rare extreme market events: the need for accurate measurement
In this section, this paper traces the origins of tail risk, the first broad research driver, to the enormous and wide influence of financial markets on the real economy, especially on economic policy making. This nexus is argued to encourage asset price bubbles, which subsequently burst upon the arrival of aggravating market information.
The March 2020 crash provides a relevant context to appreciate the importance of extreme financial market risk measurement. Studies such as Zhang, Hu and Ji (2020) and Wojcik and Ioannou (2020), which have evaluated the rapid asset price declines during the COVID19 pandemic in mature financial markets and in China, indicate that while unconventional monetary policy (UMP) appears to be the favoured response of central banks to mitigate the adverse economic impact of the pandemic, its undesirable consequences are likely to aggravate financial market risks.
[Figure source: Authors' estimation from market prices of indices and exchange rates available on Yahoo Finance and investing.com.]
This suggestion with regard to the monetary policy is consistent with the contention of Lian et al. (2018) and Cieslak et al. (2019) that UMP incentivizes destabilizing speculation in financial markets that may trigger unsustainable asset price inflation for 15 to 40 months (Gottesman & Leobrock, 2017), which culminates in extreme market events.
To elaborate, Figure 2 compares the 21-day rolling average value of a notional $1 investment in 8 select emerging and frontier markets with that in the USA during the COVID19-induced crash.
This figure suggests that investments stand to lose a substantial portion of their value during extreme market events and that the time to recovery can be long. In fact, while the zoonotic origin of this extreme market event is novel, the occurrence of rare yet extreme financial market events and the ensuing destruction of wealth in the pre-COVID era are extensively chronicled in financial history since the Dutch Tulip crisis of 1636.
A scrutiny of financial market history suggests that steep financial asset price deflation in extreme market events that affect multiple financial markets and asset classes was rare before the 1980s. This survey argues that a major driver of research into extreme market risk is the fact that, although market disasters remain rare, they have become increasingly frequent since the 1980s.
To explain, the 1700s witnessed just 2 events: the South Sea Bubble in 1720 and the crash at the end of the "Seven Years War" in 1763. Thereafter, the 1800s saw 4 extreme events, from the sovereign bond defaults during 1825-26 across Europe and Latin America through to the Panic of 1873 in the US, Austria and Germany. The 20th century prior to the 1980s saw only 3 extreme market episodes induced by banking crises: the panics of 1907 and 1929 (the latter triggering the "Great Depression") and the Spanish leg of the "Big Five Crises" in 1977. The reader may refer to Table A1 in the appendix for a chronological account of extreme market events up to the COVID19-induced March 2020 financial market crash.
To sum up, financial markets worldwide witnessed 10 extreme market events between 1600 and the 1980s. By contrast, the post-1980s era has experienced 13 extreme market events, from the Latin American Debt Crisis of the 1980s through the East Asian financial crisis, the Dot Com Bubble, the global financial crisis, and the Eurozone crisis to the COVID19-induced March 2020 crash (Cieslak et al., 2019; Coombs & Van Der Heide, 2020; Nageswaran & Natarajan, 2019).
An enquiry into the drivers of the high incidence of extreme market risk since the 1980s highlights that the exposure of broad stakeholders in an economy to financial markets is pervasive in an era of financialization of the physical economy. Tomaskovic-Devey and Lin (2011) point out that since the 1980s, financialization has led to the substitution of active labor income with passive income from financial assets as the dominant component of national wealth. In fact, Cieslak et al. (2019) illustrate that while unemployment in the US falls by less when equity indices rise, it rises by much more when market indices decline. Corporations and governments prefer to borrow via financial markets rather than banks. Financial intermediaries (FIs) earn a greater share of their profits from trading financial securities, (securitized) loans, and market-linked products than from extending credit and other intermediation. Thus, financialization is characterized by the substitution of credit risk with market risk. Karwowski et al. (2017) contend that as financial markets rise to replace banking as the dominant source of funds, the need to implement asymmetric or unconventional monetary policy (UMP) via financial markets and to sustain financial asset price inflation becomes a tacit policy objective.
In the asymmetric monetary policy, central banks use (inter-bank) overnight rate rather than the money supply rate as the monetary policy instrument, and monetary policy is highly sensitive to financial asset price deflation. Central banks write real (monetary) put options by lowering overnight rates to push up falling asset prices and write real (monetary) call options when reluctant to raise the overnight rates to arrest asset price inflation. In fact, overnight rates have lost efficacy at near zero levels. Therefore, central banks have been lately targeting (often lowering) the interest rates along the yield curve by dealing in (often buying) government and pseudo-government securities, as in quantitative easing (QE). A downward shift in the yield curve has contributed to corporate bond price inflation.
However, an accommodative monetary policy, especially UMP, signals that the central banks are implicitly underwriting the price inflation of liquid financial securities with higher ex-ante returns (e.g. equities and bonds). Accordingly, long-run capital investments, which are illiquid and cumbersome, appear less attractive than liquid financial securities. Also, a lower rate of interest signals less scarcity of goods and services in the future. Hence, long-run investment may not deliver the expected profitability. In addition, an implicit central bank underwriting of risky securities in a lower interest rate regime lowers the opportunity cost of not investing in safer investments and induces savers to invest in riskier market-linked investment products. Therefore, these works argue that asymmetric and unconventional monetary policies tend to cause destabilizing speculation rather than long-term physical investment, and that monetary policy can be argued to be a major compounder of the financial market fragilities that lead to extreme market events.
An exploratory data analysis of the daily logarithmic returns during these extreme market episodes highlights that the fall in financial asset prices was usually more than 3 standard deviations below the mean and that the subsequent periods of recovery were protracted. Therefore, the fallout of these extreme market episodes can spill over into the broader real economy and cripple it, especially households.
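The screening described above can be illustrated with a minimal sketch. It assumes daily closing prices are available as a plain list; the price path below is hypothetical and built only for illustration, and the function names `log_returns` and `extreme_falls` are not from the reviewed studies.

```python
import math
import statistics

def log_returns(prices):
    """Daily logarithmic returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def extreme_falls(returns, k=3.0):
    """(index, z-score) of every return more than k sample SDs below the mean."""
    mu = statistics.fmean(returns)
    sd = statistics.stdev(returns)
    return [(i, (r - mu) / sd) for i, r in enumerate(returns) if r < mu - k * sd]

# Hypothetical price path: 60 quiet trading days with one crash on day 40.
prices = [100.0]
for day in range(1, 61):
    step = 0.002 if day % 2 else -0.001
    if day == 40:
        step = -0.12  # assumed one-day crash, for illustration only
    prices.append(prices[-1] * math.exp(step))

rets = log_returns(prices)
flags = extreme_falls(rets)
for index, z in flags:
    print(f"return #{index} lies {abs(z):.1f} SDs below the mean")
```

Note that the sample standard deviation is itself inflated by the crash day, so a threshold based on the full-sample SD is a conservative screen.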
The pervasive nature of financial market exposure and the severity of losses during extreme market events demonstrate that financial intermediaries (FIs) need to assign sufficient risk capital to protect themselves and the financial system against rare market catastrophes, and that the elementary objective of the national and supra-national financial regulators and supervisory authorities is to improve the early identification of extreme market risk and mitigate the fallout. Therefore, the measurement of extreme market risk is an elementary responsibility for the former and of acute concern for the latter. Supranational consortia of financial regulators, such as the Basel Committee on Banking Supervision (BCBS) at the Bank for International Settlements (BIS), mandate FIs to estimate market risk capital buffers derived from VaR and, lately, ES (BCBS, 2019); these buffers are set aside to absorb losses during future extreme market events.

Extreme market risk measures: evolution, attractions and limitations
In this section, this paper tries to partly address the second and third broad themes of research on extreme market risk measurement. To be specific, this section reviews the strand of literature (Alexander, 2008;Aragones et al., 2001;Berk & DeMarzo, 2012;Clare et al., 2013;Dowd, 2005; Ellis, 2017) that examines the evolution of extreme market risk measures over the years. The review along this broad theme finds that the extreme market risk measures have improved over the past three decades from the traditional standard deviation (SD) of asset price returns through to the recent alternatives of value at risk (VaR) and expected shortfall (ES).
Over time, the naïve SD estimator was supplemented with stop loss limits during market trades. In addition, gap analysis examines the difference in net income due to the sensitivity of assets and liabilities to interest rates. Moreover, sensitivity metrics consist of the first and second order mathematical derivatives of asset prices with respect to changes in risk factors. These include "beta" for equities, "duration" and "convexity" for bonds, and "greeks" for options. Further, capital buffers such as Margin Amount and Risk Capital represent extreme market risk in terms of a capital amount. These tools are devised to absorb extreme market losses and protect the integrity of financial markets and banks, respectively. This review finds that several of these extreme risk measures also serve as tools to absorb and mitigate the fallout of extreme market risks. Importantly, no single estimator is perfect and every alternative has its attractions and limitations. Nonetheless, all of the measures discussed mostly represent the consequences of loss and do not explicitly summarize the chances (precisely, the probability distribution) of loss.
This major limitation was overcome by the Value at Risk (VaR) and the Expected Shortfall (ES), which are probabilistic statements of extreme losses. The VaR is a tail quantile that represents the probability boundary for extreme market losses. The ES measures the average of the losses greater than the VaR. The ES is also referred to as the expected tail loss (ETL) and the Conditional VaR (CVaR). Part of the literature has argued for replacing the ES with the median shortfall (MS). The MS estimates the median of the losses worse than the VaR, whereas the ES is the mean of those losses. The MS at a particular confidence level CV can be estimated as the VaR at the confidence level CV' = 0.5 × (1 + CV). On balance, this section views VaR and ES as the most widespread (not necessarily sufficient) extreme risk measurement frameworks.
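The three definitions above can be made concrete with a small empirical sketch. It assumes a sample of losses (positive numbers denote losses) and uses simulated Gaussian data purely as a hypothetical input; the helper names are illustrative. It also checks numerically that the MS at confidence level CV sits at the VaR evaluated at 0.5 × (1 + CV), up to one order statistic.

```python
import math
import random
import statistics

def var_hs(losses, c):
    """Empirical VaR: the ceil(c * n)-th smallest loss."""
    ordered = sorted(losses)
    k = math.ceil(round(c * len(ordered), 8)) - 1  # round() guards against float noise
    return ordered[max(k, 0)]

def exceedances(losses, c):
    """Losses strictly worse than the VaR at confidence level c."""
    threshold = var_hs(losses, c)
    return [x for x in losses if x > threshold]

def es_hs(losses, c):
    """Empirical ES: mean of the losses beyond the VaR."""
    return statistics.fmean(exceedances(losses, c))

def ms_hs(losses, c):
    """Empirical MS: median of the losses beyond the VaR."""
    return statistics.median(exceedances(losses, c))

rng = random.Random(7)
# Hypothetical daily losses in percent (simulated, not market data).
losses = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

c = 0.99
var_c = var_hs(losses, c)
es_c = es_hs(losses, c)
ms_c = ms_hs(losses, c)
var_shifted = var_hs(losses, 0.5 * (1 + c))  # VaR at CV' = 0.995

print(f"VaR({c}) = {var_c:.3f}, ES = {es_c:.3f}, MS = {ms_c:.3f}")
print(f"VaR(0.995) = {var_shifted:.3f}  (close to the MS)")
```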
That said, the studies cited in this section and those discussed in the subsequent section indicate that while the VaR and ES are simple in theory, they are nontrivial to estimate in practice. In addition, the BCBS guidelines do not suggest any specific estimation methodology for VaR and ES; rather, they advise FIs to employ their "internal models", subject to the internal (VaR-ES) risk model qualifying in the backtesting of the VaR estimates. Against this backdrop, the primary aim of the following section is to evaluate the alternative VaR and ES methodologies.

An evaluation of VaR and ES estimation approaches
In this section, the paper aligns the remaining parts of the literature review along the second and third broad themes of research and addresses the fourth theme, which evaluates the effect of the progression of tail risk estimation frameworks on the empirical literature. This section suggests that the three fundamental estimation methodologies for VaR and ES, namely the analytical procedures based on the Gaussian distribution, the non-parametric Historical Simulation (HS), and the semiparametric Monte-Carlo Simulation (MCS) (Best, 1999; Choudhry, 2013, page 32), suffer from major weaknesses that have spawned the development and application of alternative estimation techniques over the past two decades.

Deficiencies in the three elementary VaR-ES estimation techniques and the need for advanced VaR-ES estimation methodologies
An overwhelming fraction of the consequent strand of literature highlights that the normal distribution fails to account for "fat tails" and other empirical regularities. The data imply that the Gaussian distribution assigns unreasonably negligible probabilities to tail events. For instance, the magnitude of the extreme price deviations during the crash of 19 October 1987, the East Asian financial crisis of 1998, and the LTCM crisis was 20 SDs away from mean returns, which is practically impossible under a Gaussian distribution. To sum up, the Gaussian VaR-ES framework has been documented to severely underestimate extreme market risk. This underestimation is profound for non-linear instruments with discontinuous payoffs, such as derivatives.
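To see how negligible these Gaussian tail probabilities are, the following sketch evaluates the probability of a return lying k standard deviations or more below the mean under a standard normal distribution. The function name is illustrative; the calculation uses only the complementary error function from the Python standard library.

```python
import math

def gaussian_left_tail(k):
    """P(Z <= -k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

for k in (3, 5, 10, 20):
    p = gaussian_left_tail(k)
    print(f"{k:>2} SDs below the mean: probability {p:.3e}")
```

For a 20 SD move the probability is of the order of 10^-89, so under the Gaussian assumption such a day would essentially never be observed in any realistic sample of daily returns.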
In addition, studies such as Hendricks (1996), Jackson et al. (1997), and Vlaar (2000) that applied the historical simulation (HS) approach have found the HS-derived VaR estimates to outperform those from Gaussian approaches. These studies highlight the possible strengths of the HS technique. First, the empirical distribution accommodates fat tails. Second, the absence of a distributional assumption lends it the theoretical flexibility to be applied to derivatives. Third, HS is intuitive and conceptually simple. Fourth, it is easy to produce confidence intervals for (non-parametric) VaR and ES estimates; and lastly, it is easy to implement and to communicate the results. In addition, Huang and Tseng (2009) find the HS VaR to be marginally more accurate than the MCS VaR. This superior accuracy of the HS estimates can be attributed to a closer matching of tail probabilities.
However, the HS estimator is imprecise, i.e. its estimates have high standard errors, particularly at the high confidence levels that represent the tails (rare events). Therefore, HS estimates are difficult to verify.
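The loss of precision in the far tail can be illustrated with a bootstrap, which is one common way (assumed here, not prescribed by the studies above) to attach a standard error to an HS estimate. The data are simulated from a hypothetical two-regime Gaussian mixture so that the sample has fat tails; the names `var_hs` and `bootstrap_se` are illustrative.

```python
import math
import random
import statistics

def var_hs(losses, c):
    """Historical-simulation VaR: the ceil(c * n)-th smallest loss."""
    ordered = sorted(losses)
    k = math.ceil(round(c * len(ordered), 8)) - 1
    return ordered[max(k, 0)]

def bootstrap_se(losses, c, n_boot=200, seed=0):
    """Standard deviation of the HS VaR across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(losses)
    estimates = [var_hs(rng.choices(losses, k=n), c) for _ in range(n_boot)]
    return statistics.stdev(estimates)

rng = random.Random(42)
# Hypothetical fat-tailed losses: 5% of days come from a high-volatility regime.
losses = [rng.gauss(0.0, 4.0 if rng.random() < 0.05 else 1.0) for _ in range(1000)]

se_95 = bootstrap_se(losses, 0.95)
se_999 = bootstrap_se(losses, 0.999)
print(f"bootstrap SE of HS VaR at 95%:   {se_95:.3f}")
print(f"bootstrap SE of HS VaR at 99.9%: {se_999:.3f}")
```

With 1,000 observations, the 99.9% VaR rests on the one or two largest losses in the sample, which is why its resampled value is far less stable than the 95% VaR.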
While MCS VaR estimates outperform Gaussian VaR estimates in backtesting (Pritsker, 1997; Bao, Lee, and Saltoglu, 2006), the MCS approach is found to produce relatively inaccurate estimates compared to the HS, kernel smoothing, and CaViaR approaches (Huan, Lin, Chien, and Lin, 2004; Bao et al., 2006). Abad et al. (2014) too observed that it performs worse than the HS and (parametric) Student t approaches. These studies document that the extreme value theory (EVT) approach is far more accurate and conservative than the MCS method.
The lacunae in the traditional approaches and the need to accurately characterize extreme market risk have motivated the development of alternative VaR-ES estimation methodologies. A review of the studies examining the conventional approaches and of those applying the alternative techniques strongly suggests that the accurate measurement of extreme market risk acquires higher importance because it is often difficult to model unforeseen phenomena that usually lie outside the domain of available observations.

An evaluation of alternative VaR-ES estimation methodologies
This subsection critiques the 64 best-known alternative methodologies for VaR and ES estimation, based on different statistical (distribution) and econometric (volatility) methods. However, such a large number of approaches and methods complicates the selection of a methodology to measure VaR and ES. A systematic classification and comparison of the different methods will simplify the optimal selection. Figure 3 attempts to simplify this classification and comparison.
In univariate risk metric estimation, the extreme risk model can be perceived to consist of 3 kinds of statistical models: risk factor mapping, the data generation process (DGP) and the risk resolution model. To explain, the estimation of VaR-ES measures for all assets (including non-linear/exotic ones) in a large, complex portfolio can be a computational ordeal. Risk factor mapping replaces the current/market value of assets with exposures to a set of fundamental risk factors (equity market indices, the yield curve, commodities, foreign exchange rates) by applying partial derivatives to an analytical pricing function (e.g. the Black-Scholes formula) or by regression (e.g. the CAPM). In many instances, risk mapping approximates a working linear relationship between a (possibly non-linear) asset's market value and the primitive risk factors.
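As a minimal sketch of the regression route to risk factor mapping, the following estimates a single-factor (CAPM-style) beta by ordinary least squares and converts a position into an equivalent exposure to the factor. The return series, the position value and the function name `beta_mapping` are all hypothetical; real mappings typically involve several factors.

```python
import statistics

def beta_mapping(asset_returns, factor_returns):
    """OLS slope of asset returns on one risk factor (a CAPM-style beta)."""
    mean_a = statistics.fmean(asset_returns)
    mean_f = statistics.fmean(factor_returns)
    cov = sum((a - mean_a) * (f - mean_f)
              for a, f in zip(asset_returns, factor_returns))
    var = sum((f - mean_f) ** 2 for f in factor_returns)
    return cov / var

# Hypothetical daily returns of an equity index (the risk factor) and of a stock
# constructed to move 1.5 times as much as the index.
factor = [0.010, -0.020, 0.005, 0.015, -0.010, 0.000]
asset = [1.5 * f for f in factor]

beta = beta_mapping(asset, factor)
position_value = 1_000_000.0             # assumed market value of the holding
factor_exposure = beta * position_value  # equivalent exposure to the index

print(f"beta = {beta:.2f}, index-equivalent exposure = {factor_exposure:,.0f}")
```

Once positions are expressed as factor exposures, the VaR-ES calculation only needs the joint behaviour of the few risk factors rather than of every individual asset.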
The data generation process (DGP) step augments the fitting of the returns distribution with volatility clustering effects to better describe the evolution of the returns series over time. In essence, a theoretical distribution is assumed for returns conditional on an assumed evolution of the returns process, i.e. the conditional volatility model. The returns distribution is assumed to be parametric (in figure 1.2) or left as empirical. The parametric distribution can be fitted to the entire returns distribution or only to the tails. If the volatility clustering effects are not estimated, then the risk model is referred to as "unconditional".
The interaction of the risk mapping and the data generation process lends itself to 3 broad risk resolution models: non-parametric, parametric and semi-parametric. The choices among these three components are intertwined. For instance, if the returns distribution is empirical, then the risk resolution model cannot be analytic; it must be non-parametric, or EVT-based for the tails. By contrast, if the returns are assumed to be independently and identically distributed (IID), then an unconditional risk resolution model is used.

Improvements over the standard Historical Simulation (HS) technique
A review suggests that Hill (1975) and Pickands (1975) type estimators are notable non-parametric improvements over HS techniques. These estimators measure the tail fatness of the extreme returns that the peaks over threshold (POT) approach filters from the returns distribution. It is noteworthy that the POT method not only identifies extreme returns but also leads to parametric modelling of the extreme returns by the generalized Pareto (GP) distribution. Moreover, the HS technique has been significantly improved by the non-parametric age weighting (Boudoukh et al., 1998) and kernel smoothing (Huang and Tseng, 2007) approaches, and by the semi-parametric volatility weighting (Hull and White (HW), 1998a, 1998b) and filtered historical simulation (FHS) approaches.
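A minimal sketch of a Hill-type estimator of tail fatness is given below. It assumes the k largest losses are treated as the extreme observations, with the (k+1)-th largest loss acting as the threshold; the choice of k is an assumption and is known to matter in practice. The test data are simulated from a Pareto distribution with a known tail index so that the estimate can be checked.

```python
import math
import random

def hill_estimator(losses, k):
    """Hill (1975) tail-index estimate xi from the k largest positive losses.

    xi is the average log-excess of the k largest losses over the (k+1)-th
    largest loss; larger xi means a fatter tail.
    """
    ordered = sorted(losses, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest loss
    if threshold <= 0:
        raise ValueError("threshold must be positive; use fewer tail observations")
    return sum(math.log(x / threshold) for x in ordered[:k]) / k

rng = random.Random(11)
# Hypothetical losses drawn from a Pareto law with shape 3, i.e. true xi = 1/3.
losses = [rng.paretovariate(3.0) for _ in range(5000)]

xi_hat = hill_estimator(losses, k=250)
print(f"Hill estimate of xi: {xi_hat:.3f} (true value 0.333 for this sample)")
```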
Empirical studies such as Barone-Adesi et al. (2002), Angelidis and Degiannakis (2005), Alexander & Sheedy (2008), and I. Roy (2011) find that the FHS approach is a highly accurate estimation approach, with forecast accuracy comparable to that of the extreme value theory (EVT) approach.
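The idea behind FHS can be sketched as follows: returns are first standardized by a conditional volatility estimate, the empirical quantile of the standardized residuals is taken, and the result is rescaled by the latest volatility forecast. The sketch below is a simplification under stated assumptions: it uses an EWMA filter with a decay of 0.94 in place of a fitted GARCH model, reads the quantile directly instead of bootstrapping multi-day paths, and runs on simulated returns with a hypothetical late volatility spike.

```python
import math
import random

def ewma_variances(returns, lam=0.94):
    """One-step-ahead EWMA variance forecasts.

    Element t is the forecast used for returns[t]; the final element is the
    forecast for the next, unobserved day. The recursion is seeded with the
    mean squared return of the first 20 observations (a simplifying assumption).
    """
    seed = returns[:20]
    var = sum(r * r for r in seed) / len(seed)
    out = [var]
    for r in returns:
        var = lam * var + (1.0 - lam) * r * r
        out.append(var)
    return out

def empirical_quantile(values, c):
    ordered = sorted(values)
    k = math.ceil(round(c * len(ordered), 8)) - 1
    return ordered[max(k, 0)]

def fhs_var(returns, c=0.99, lam=0.94):
    """One-day FHS VaR: latest volatility forecast times the quantile of standardized losses."""
    variances = ewma_variances(returns, lam)
    std_losses = [-r / math.sqrt(v) for r, v in zip(returns, variances)]
    return math.sqrt(variances[-1]) * empirical_quantile(std_losses, c)

rng = random.Random(1)
# Hypothetical returns: 750 calm days followed by 50 turbulent days.
returns = [rng.gauss(0.0, 0.01) for _ in range(750)] + \
          [rng.gauss(0.0, 0.03) for _ in range(50)]

hs = empirical_quantile([-r for r in returns], 0.99)  # plain HS VaR
fhs = fhs_var(returns, 0.99)
print(f"plain HS VaR(99%): {hs:.4f}   FHS VaR(99%): {fhs:.4f}")
```

Because the filter reacts to the recent rise in volatility, the FHS figure in this example is markedly higher than the plain HS figure, which weights calm and turbulent days equally.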

Parametric advances over the analytical approach based on Gaussian distribution
The insights from empirical studies on parametric improvements to Gaussian VaR-ES estimation can be classified into 3 broad buckets. The first strand searches for a suitable conditional (time-varying) volatility model, preferably with volatility asymmetry and long memory. The second strand tries to fit a skewed and fat-tailed parametric distribution that best explains the empirical returns distribution. The third strand aims to improve on the Gaussian distribution by incorporating the higher-order conditional moments, skewness and kurtosis.
The following inferences are gathered on the performance of VaR and ES estimates based on conditional volatility models.
• Second, the exponentially weighted moving average (EWMA) process is inaccurate (Abad et al., 2014) for VaR-ES estimation.
• Overall, with minor departures (Gonzalez-Rivera et al., 2004), the empirical findings suggest that the forecast accuracy of the VaR estimates based on the stochastic volatility (SV) family models and the GARCH family models is similar. There is no significant improvement in migrating from the GARCH framework to the SV approach.
• In general, conditional volatility models with leverage effect and long memory, such as the fractionally integrated asymmetric power ARCH (FIAPARCH) model and the realized volatility (RV) models of Asai et al. (2012), are seen to provide highly accurate forecasts of VaR and ES. Giot and Laurent (2003) and Brownlees and Gallo (2011) observe that an RV model yields more accurate VaR estimates than a GARCH family model under a Gaussian distribution; under a skewed and leptokurtic distribution, such as the skewed (S)-t distribution, however, both competing frameworks produce similar levels of VaR forecast accuracy.
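For readers less familiar with the conditional volatility models compared in the list above, the sketch below shows the variance recursions of a plain GARCH(1,1) process and of EWMA, which is the special case with zero intercept and weights that sum to one. The parameter values, the return series and the function names are hypothetical; in practice the GARCH parameters are estimated by maximum likelihood, which is not shown here. The last function converts a volatility forecast into a Gaussian VaR.

```python
import math
from statistics import NormalDist

def garch_variance_path(returns, omega, alpha, beta, initial_variance):
    """GARCH(1,1) recursion: var_{t+1} = omega + alpha * r_t^2 + beta * var_t."""
    var = initial_variance
    path = [var]
    for r in returns:
        var = omega + alpha * r * r + beta * var
        path.append(var)
    return path

def ewma_variance_path(returns, lam, initial_variance):
    """EWMA is GARCH(1,1) with omega = 0, alpha = 1 - lam and beta = lam."""
    return garch_variance_path(returns, 0.0, 1.0 - lam, lam, initial_variance)

def gaussian_var(sigma, c):
    """VaR at confidence c for a zero-mean Gaussian return with volatility sigma."""
    return NormalDist().inv_cdf(c) * sigma

# Hypothetical daily returns and parameters, for illustration only.
returns = [0.004, -0.012, 0.007, -0.031, 0.015]
ewma_path = ewma_variance_path(returns, lam=0.94, initial_variance=0.0001)
garch_path = garch_variance_path(returns, omega=2e-6, alpha=0.08, beta=0.90,
                                 initial_variance=0.0001)

for name, path in (("EWMA", ewma_path), ("GARCH(1,1)", garch_path)):
    sigma_next = math.sqrt(path[-1])
    print(f"{name}: next-day volatility {sigma_next:.4f}, "
          f"Gaussian VaR(99%) {gaussian_var(sigma_next, 0.99):.4f}")
```

Neither recursion distinguishes between positive and negative returns, which is why the asymmetric and long memory extensions discussed above exist.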
The second strand suggests that when asymmetric and fat-tailed distributions are considered, the accuracy of the VaR and ES estimates improves considerably:
• To elaborate, a symmetric fat-tailed distribution such as the Student t distribution fits the data better in the tails than the Gaussian distribution but is restricted by the symmetry assumption. Therefore, it can underestimate the probability mass in the left tail and consequently offer inaccurate VaR and ES estimates (Brooks and Persand, 2003). In addition, the t distribution places no constraint on the maximum loss and can produce misleadingly high risk estimates at higher confidence levels. Thus, tail risk estimates derived from the t distribution are unreliable. Moreover, the t distribution is not stable, and therefore the VaR estimates cannot be forecasted over longer horizons (Jorion, 2010).
• A group of studies has compared the statistical accuracy of the VaR estimates from the Student t distribution with those from the generalized error distribution (GED), the skewed (S) version of the GED (S-GED), the skewed t distribution (STD), and the skewed generalized t (SGT) distribution. Notable studies include Angelidis et al. (2004), Huang et al. (2004), Lin and Shen (2006), Maghyereh and Awartani (2012), and Assaf (2015). These studies find that while modeling the empirical returns distribution with the GED, S-GED, and the STD provides more accurate tail risk estimates than the t distribution, the SGT distribution offers the most accurate results. In fact, these studies attribute the notable improvement of the latter three distributions over the t distribution to distributional asymmetry. Further insights obtained from this strand of literature are as follows:
(i) Fan et al. (2008) find that the GARCH-GED process is superior to the GARCH-N model and the HS (ARMA forecast) approach for VaR estimation. In addition, Bali and Theodossiou (2007) note that the VaR measures derived from the GARCH-GED model are more accurate than the generalized t derived VaR estimates, whereas the GARCH-SGED model outperforms the GARCH-GED and GARCH-skewed t models in statistical backtesting. Nonetheless, the GARCH-SGT model outperforms the GARCH-SGED model in VaR and ES estimation. Lee et al. (2008) note that the GARCH-SGED model is superior to the GARCH-GED model for extreme market risk estimation. However, in a study on real estate markets, Zhou and Anderson (2012) suggest that neither the VaR estimates obtained by assuming that the extreme values of the GARCH-filtered residuals follow the GP distribution nor the FHS-implied VaR estimates are more accurate than the estimates from a GARCH-type GED process.
(ii) In contrast, Angelidis et al. (2004) find that the GARCH, exponential (E)-GARCH and threshold (T)-GARCH models with t-distributed innovations perform better in backtesting than their counterparts with GED innovations. The leverage effect, or volatility asymmetry, can be better captured by EGARCH-t and AP-ARCH-t models than by the GARCH-t model. Su and Knowles (2006) show that the improvement of the GED and SGED over the Student t distribution in the accuracy of VaR and ES estimates is marginal. In fact, Bali et al. (2008) find that the GED VaR can underestimate extreme market risk.
• This review has gained the following additional insights with regard to the skewed t distribution: (i) Studies with a favourable assessment: Giot and Laurent (2003) find that the VaR and ES forecasts from the skewed t distribution are more accurate than the estimates under the Gaussian and Student t distribution assumptions. In addition, Altun et al. (2018) suggest that the skewed t distribution provides more accurate VaR forecasts than the skewed N distribution. That study also indicates that the backtesting performance of the skewed t VaR-ES models is similar to that of the GED and SGED models. (ii) Studies with an unfavourable assessment: Angelidis and Degiannakis (2005) highlight that capturing the leverage effect, or volatility asymmetry, matters more for the accuracy of conditional VaR estimates than the choice of return distribution. Moreover, Corlu et al. (2016) argue that the generalized lambda (GL) distribution is a more suitable candidate than the skewed t distribution.
This review has gained the following insights from studies that have investigated the skewed generalized t distribution in extreme market risk measurement: (i) Features: the SGT distribution is a skewed extension of the generalized t distribution (McDonald & Newey, 1988). The SGT nests several well-known distributions such as the generalized t (GT), the skewed t (ST) of Hansen (1994), the SGED of Theodossiou (2000), and the normal, Laplace, uniform, GED, and Student t distributions. (ii) Advantages: Hence, it can accommodate highly diverse levels of kurtosis and skewness and can therefore model the return distribution of a wide variety of assets. [Source: Harris et al. (2004), Bali and Theodossiou (2007), Bali et al. (2008), Cheng and Hung (2011), and Lin et al. (2014).] In fact, Lin et al. (2014) suggest that the accuracy of the SGT VaR models improves considerably if the tail fatness is estimated by the modified Hill estimator proposed by Huisman et al. (2001).
(iii) Disadvantages: By contrast, Polanski and Stoja (2010) and BenSaïda and Slim (2016) find that other parametric approaches outperform SGT models. The former observe that the Gram-Charlier (GC) (24) expansion VaR and other GC expansion VaRs outperform the SGT VaR at moderately high and extremely high tails. The latter find that, in equity indices, the Generalized Hyperbolic (GH) distribution explains the physical returns better than the SGT distribution. However, in exchange rates, the SGT distribution is a superior alternative.
This survey notes that mixture distributions offer reasonably accurate VaR and ES estimates. To elaborate, the insights specific to the mixture distributions are as follows:
• Studies such as Venkataraman (1997), White (1998), and Alexander and Lazar (2006) report that the VaR and ES estimates obtained from models involving a mixture of distributions [normal (MN) and Student t (Mt)] are generally quite accurate. In particular, Su and Knowles (2006) observe that the accuracy of GARCH-MN VaR models is comparable to that of the Markov Switching (MS) model. In addition, the GARCH-MN VaR model outperforms the GARCH-N and GARCH-t VaR models (Lee & Lee, 2011; Xu & Wirjanto, 2010). Accuracy improves further in skewed and fat-tailed mixture distribution VaR models (Alexander & Lazar, 2006; Miftahurrohmah et al., 2017) and in non-linear GARCH-mixture distribution VaR models (Nikolaev et al., 2013).
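To illustrate how a mixture-distribution VaR can be obtained, the following is a minimal Python sketch under stated assumptions. It uses a two-component normal mixture whose weights, means and standard deviations are hypothetical (they are not taken from any reviewed study), and it locates the quantile by bisection on the mixture CDF. For comparison, it also computes the 99% VaR of a single normal distribution with the same overall mean and variance.

```python
import math
from statistics import NormalDist

def mixture_normal_cdf(x, weights, means, stdevs):
    """CDF of a finite mixture of normal distributions evaluated at x."""
    return sum(w * NormalDist(m, s).cdf(x)
               for w, m, s in zip(weights, means, stdevs))

def mixture_normal_var(alpha, weights, means, stdevs, tol=1e-10):
    """Left-tail quantile at probability alpha, found by bisection.

    The result is reported as a positive loss, i.e. VaR = -quantile.
    """
    lo = min(m - 12.0 * s for m, s in zip(means, stdevs))
    hi = max(m + 12.0 * s for m, s in zip(means, stdevs))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_normal_cdf(mid, weights, means, stdevs) < alpha:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)

# Hypothetical daily-return regimes: a calm and a turbulent component.
weights = [0.9, 0.1]
means = [0.0005, -0.002]
stdevs = [0.008, 0.03]
var_99_mixture = mixture_normal_var(0.01, weights, means, stdevs)

# Single normal with the same overall mean and variance, for comparison.
mix_mean = sum(w * m for w, m in zip(weights, means))
mix_var = sum(w * (s * s + m * m)
              for w, m, s in zip(weights, means, stdevs)) - mix_mean ** 2
var_99_normal = -NormalDist(mix_mean, math.sqrt(mix_var)).inv_cdf(0.01)
```

Under these assumed parameters the mixture places more probability mass in the far left tail than the moment-matched normal, so its 99% VaR is the larger of the two. This is the mechanism by which mixture models can reduce the underestimation of tail risk.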
The third strand of literature examines the parametric adjustments that aim to augment the performance of the Gaussian approach with higher conditional moments (e.g. skewness and kurtosis). These improvements are the Cornish-Fisher (CF) expansion, the Gram-Charlier (GC) expansion, and the Saddle Point Approximation (SPA) on the one hand, and the Johnson SU distribution and the Fourier transformation on the other:
• Early research such as Pitchler and Selitsch (1999) and Mina and Ulmer (1999) suggests that the (uncorrected) Cornish-Fisher (CF) expansion is a very fast technique that is less robust than MCS. These studies also highlight that the GC expansion, Johnson's transformation and the SPA have forecast accuracy similar to that of the CF technique. Fuss et al. (2007) suggest that the CF expansion is a significant improvement over GARCH-N.
• The CF VaR is found to deflate the artificially superior risk-adjusted returns of hedge funds and institutional investments that are obtained with the standard deviation, Sharpe ratio, semi-deviation, and Gaussian VaR. Studies that share this finding are Favre and Galeano (2002), Amenc et al. (2003), Gueyié and Amvella (2006), Liang and Park (2010), and Boudt et al. (2013).
• Empirical studies like Tesfalidet, Desmond, Hailu, and Singh (2014) note that uncorrected CF VaR and ES estimates can be non-monotonic, i.e. more extreme tails may yield lower risk values. A mathematical and simulation-based line of research originating with Jaschke (2002) investigates the causes of non-monotonicity. Christoffersen (2003) and Giamourdis and Ntoula (2009) highlight that the CF VaR is monotonic and well-behaved when the skewness and kurtosis parameters in the CF expansion formula lie within their narrow domains of validity (DVs); for the skewness coefficient, the DV lies between -1.2 and +1.2.
• However, statistical improvements have been developed to overcome the non-monotonicity and obtain highly accurate risk estimates. Specifically, Chernozhukov, Fernandez-Val, and Galichon (2010) propose an increasing rearrangement formula to overcome the narrowness of the DV of the CF formula. Later, Maillard (2012) points out that the skewness and kurtosis parameters in the CF expansion are not the same as the corresponding empirical statistics and that most researchers confuse the two pairs. That study proposes a methodology to extract the formula coefficients (that satisfy the DV) from sample estimates. Further, Amedee-Menasme, Fabric, and Maillard (2019) improve upon this extraction by providing analytical expressions for a range of empirical skewness and kurtosis values. The analytical formulae are derived using response surface methodology (RSM) polynomial regressions.
These seminal studies have also backtested the model-generated VaR and ES estimates and demonstrated significant improvement in statistical accuracy. However, this review did not find any empirical study that examines the comparative performance of these improvements against competing VaR-ES models.
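The non-monotonicity of the uncorrected CF expansion discussed above can be made concrete with a short Python sketch. It assumes the standard fourth-moment CF quantile formula with the skewness and excess kurtosis supplied directly as formula parameters; the function names and all numerical inputs are hypothetical and serve illustration only. None of the corrections proposed in the reviewed studies is implemented here.

```python
from statistics import NormalDist

def cornish_fisher_quantile(p, skew, excess_kurt):
    """Uncorrected fourth-moment CF quantile of a standardised return."""
    z = NormalDist().inv_cdf(p)
    return (z
            + (z ** 2 - 1.0) * skew / 6.0
            + (z ** 3 - 3.0 * z) * excess_kurt / 24.0
            - (2.0 * z ** 3 - 5.0 * z) * skew ** 2 / 36.0)

def cornish_fisher_var(p, mean, stdev, skew, excess_kurt):
    """VaR reported as a positive loss at left-tail probability p."""
    return -(mean + stdev * cornish_fisher_quantile(p, skew, excess_kurt))

def is_monotonic(skew, excess_kurt, probs):
    """Check that the CF quantile increases with the probability level."""
    q = [cornish_fisher_quantile(p, skew, excess_kurt) for p in probs]
    return all(a < b for a, b in zip(q, q[1:]))

# Probability grid from 0.1% to 99.9%.
probs = [i / 1000.0 for i in range(1, 1000)]

# Hypothetical inputs: moderate versus very strong negative skewness.
well_behaved = is_monotonic(-0.5, 1.0, probs)
ill_behaved = is_monotonic(-3.0, 0.0, probs)

# With the extreme skewness input, the 99.9% VaR falls below the 99% VaR.
var_99 = cornish_fisher_var(0.01, 0.0, 1.0, -3.0, 0.0)
var_999 = cornish_fisher_var(0.001, 0.0, 1.0, -3.0, 0.0)
```

With the moderate inputs the quantile function is increasing over the whole grid, whereas with the extreme skewness input it is not, and the VaR at the more extreme tail is the smaller of the two. A check of this kind is a simple diagnostic before an uncorrected CF VaR is used at very high confidence levels.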
In addition to these improvements, the Filtered Historical Simulation (FHS) method, the conditional quantile (CQ) approach [also known as the Conditional Autoregressive VaR (CAViaR)] and the extreme value theory (EVT) framework stand out as the preeminent semi-parametric improvements.
The conditional quantile (CQ) approach is based on directly modelling the quantiles of the returns distribution rather than the entire distribution. Given that the VaR and ES measures are tightly linked to the standard deviation of the returns distribution, which exhibits clustering, the CQ approach uses a conditional autoregressive specification to formalize VaR and ES clustering; volatility clustering implies that the returns distribution, and hence its quantiles, is autocorrelated.
This review found only limited literature on the CQ/CAViaR technique. For instance, Bao et al. (2006) and Polanski and Stoja (2010) observe that while the VaR-ES forecasts from the standard symmetric absolute value (SAV) version of the CQ approach are accurate in quieter market phases, the accuracy drops during market turbulence. However, forecast accuracy during market turmoil increases substantially, especially at extreme confidence levels, when asymmetric extensions of the CQ method that capture the leverage effect and other nonlinearities of returns are applied. Notable asymmetric versions that lead to greater accuracy in VaR and ES estimates are the asymmetric slope (AS), the Indirect GARCH(1,1) (IG), a combination of T-GARCH and the Mixture ARCH model of Wong and Li (2001) (Yu, Li, and Jin, 2010), and the non-linear dynamic quantile (NLDQ)-AS extension (Gerlach, Chen, and Chan, 2011; Sener et al., 2012).
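The autoregressive quantile recursions named above can be sketched in a few lines of Python. The sketch assumes the commonly used forms of the SAV and AS specifications, with VaR expressed as a positive number; the parameter values and the short return series are hypothetical, and parameter estimation (minimising the quantile loss over the parameters) is deliberately left out.

```python
def sav_caviar_path(returns, beta0, beta1, beta2, q_init):
    """SAV recursion: q_t = beta0 + beta1 * q_{t-1} + beta2 * |r_{t-1}|."""
    q = [q_init]
    for r_prev in returns[:-1]:
        q.append(beta0 + beta1 * q[-1] + beta2 * abs(r_prev))
    return q

def as_caviar_path(returns, beta0, beta1, beta_pos, beta_neg, q_init):
    """AS recursion: positive and negative lagged returns get separate slopes."""
    q = [q_init]
    for r_prev in returns[:-1]:
        q.append(beta0 + beta1 * q[-1]
                 + beta_pos * max(r_prev, 0.0)
                 + beta_neg * max(-r_prev, 0.0))
    return q

def pinball_loss(returns, var_path, alpha):
    """Average quantile (tick) loss of the alpha-quantile forecast -VaR_t."""
    total = 0.0
    for r, v in zip(returns, var_path):
        u = r + v                      # return minus the quantile forecast (-v)
        total += u * (alpha - (1.0 if u < 0 else 0.0))
    return total / len(returns)

def hit_rate(returns, var_path):
    """Share of days on which the loss exceeded the VaR forecast."""
    return sum(1 for r, v in zip(returns, var_path) if r < -v) / len(returns)

# Hypothetical inputs for illustration only.
returns = [0.01, -0.03, 0.002, -0.001]
sav_path = sav_caviar_path(returns, 0.001, 0.8, 0.3, 0.02)
as_path = as_caviar_path(returns, 0.001, 0.8, 0.1, 0.5, 0.02)
```

In the AS path, a larger slope on negative lagged returns makes the VaR forecast react more strongly to losses than to gains of the same size, which is how the asymmetric extensions accommodate the leverage effect. When the two slopes are equal, the AS recursion reduces to the SAV recursion.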
Finally, extreme value theory (EVT) is a credible methodology from probability theory that can be used to describe and forecast low frequency high severity (LFHS) events. In the area of risk measurement, the EVT framework has been applied in the insurance industry, portfolio optimization, and the measurement of operational risk (Wong, 2013). A major attraction of the EVT is that it characterizes the tails of the returns distribution without requiring a model of the entire distribution. The true distribution of returns is unknown and can be estimated only from the empirical distribution, parametric densities and semi-parametric approaches. To elaborate, if suitable assumptions are satisfied, then the tools in EVT can be used to characterize the extreme realizations of a given random process or distribution. The principles of EVT were founded by Fisher and Tippett (1928), who showed that the asymptotic distribution of the adequately scaled extreme realizations within a random sample from the majority of distributions can, and usually does, converge to one out of three extreme value distributions: the generalized extreme value (GEV), generalized logistic (GL) and generalized Pareto (GP) distributions. This powerful result allows the extreme quantiles, i.e. the VaR and ES measures, to be estimated without regard to the exact form of the entire distribution of returns. This characterization of a few massive losses rather than a sequence of medium-sized losses is argued to mitigate the underestimation of VaR and ES. The former is of greater concern for protecting investment values. Flexibility is further increased because the EVT can characterize the right and left tails of the asset returns distribution independently. Several of the reviewed studies in the EVT strand of literature cite Embrechts et al. (1997) and McNeil et al. (2005) as detailed and systematic reference materials for the EVT and its application in finance.
These works and the studies reviewed also highlight the limitations of EVT. For instance, basic EVT assumes that extreme values are realized from IID samples, whereas empirical returns are usually serially correlated. While the challenges that this stylized fact poses to EVT modelling can be overcome by a few adjustments, there remains no agreement on the most suitable technique. In addition, a salient feature of EVT is the inevitable trade-off between the limited sample that is available and the need for vast numbers of realizations. The former arises because the EVT's domain of interest is extreme occurrences, which by definition should be rare. The latter is attributed to the asymptotic nature of the theory. Therefore, the selection of the sample data can be a critical step in applying the EVT. Moreover, the scale parameter and, to a greater extent, the location or threshold parameter of the theoretical EVT distribution are sensitive to the cut-off chosen when preparing the sub-sample of extreme values. A lower cut-off can induce bias while a higher cut-off can inflate the standard error. Furthermore, multivariate EVT is more complex and can encounter far more computational impediments than univariate EVT. Koedijk et al. (1990) is an early application of the EVT that examines the fat-tailed nature of eight European currency exchange rates against the dollar. This study examined the fat-tailed nature of financial returns, initially applying the non-Gaussian stable distributions with infinite variance and later the Student t distribution and its extensions. Its findings suggest that the empirical distributions are better described by the stable laws than by the t distribution.
In a pioneering study, Longin (1996) defined an extreme return as the highest or lowest return of an asset over a period. This definition stands out because what qualifies as extreme in financial markets differs from what qualifies as extreme in physical phenomena: the occurrence of an extreme event is not contingent on any exogenous event. It implies that rational expectations may not hold and that extreme returns may be realized without any major news.
The empirical literature on the use of EVT to study extreme market risk is quite extensive. In fact, this study could review a total of 52 empirical papers on EVT VaR and ES estimation. Therefore, we have classified the papers into four broad categories to render the comparative evaluation of EVT literature more organized and insightful.
These studies argue that the EVT framework is a strong modelling alternative to the Gaussian, Student t, GED, S-GED, HS, and MCS frameworks, especially at extremely high tails (greater than the 99% confidence level). On a standalone basis, the GEV and GP distributions fit the returns distribution well and perform well in VaR backtesting. The FHS VaR comes close to the EVT VaR in forecast accuracy.
A narrow cohort of studies has tested and compared the performance of conditional EVT models with that of unconditional EVT models. These include Danielsson and De Vries (2000), Bystrom (2004), Kuester et al. (2006), Samuel (2008), Marimoutou, Raggad, and Trabelsi (2009), and Zikovic and Filer (2013). Most of these studies support the hypothesis that conditional GEV and GPD-derived VaR and ES estimates are superior to their unconditional counterparts in forecasting accuracy. However, Bystrom (2004) suggests that the improvement in the conditional estimates is marginal. In particular, Samuel (2008) infers that a Markov Switching (MS) ARCH process can significantly improve statistical backtesting performance. Moreover, Zikovic and Filer (2013) find that GPD-derived VaR estimates marginally outperform those estimated with the FHS approach.
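The distinction between conditional and unconditional EVT estimates can be summarized in a short worked formulation. The notation below is an assumption introduced here for illustration and is not taken from any one reviewed study: \ell_t denotes the loss (the negative return), \mu_t and \sigma_t are the conditional mean and volatility from a filter such as a GARCH-type model, and z_t are the standardized residuals to which the EVT distribution is fitted.

```latex
\begin{aligned}
\ell_t &= \mu_t + \sigma_t z_t, \qquad z_t \ \text{i.i.d.},\\
\mathrm{VaR}_{q,\,t+1} &= \mu_{t+1} + \sigma_{t+1}\,\mathrm{VaR}_q(z),\\
\mathrm{ES}_{q,\,t+1} &= \mu_{t+1} + \sigma_{t+1}\,\mathrm{ES}_q(z).
\end{aligned}
```

In this sketch, the unconditional approach applies the EVT tail estimator directly to \ell_t and yields a single VaR and ES for the whole sample, whereas the conditional approach applies it to z_t and rescales the result with the one-step-ahead forecasts \mu_{t+1} and \sigma_{t+1}, so that the risk estimates respond to current volatility.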
(a) Fitting the EVT distributions to the empirical distribution of extreme returns: Empirical studies that have tried to explain the empirical distribution of extreme returns with theoretical distributions report the following. (i) In the developed markets, the GP distribution was found to fit the empirical returns distribution better than the Frechet distribution (a particular form of the GEV distribution) (Jondeau & Rockinger, 1999). Similarly, the GP distribution was found to be more accurate than the GEV distribution when fitted to the empirical returns (Walls and Zhang, 2006).
(ii) Over the past two decades, a new research agenda has emerged in which empirical works such as Gettinby, Sinclair, Power, and Brown (2004), Gettinby et al. (2006), Tolikas (2008), Tolikas and Gettinby (2009), Tolikas and Fifield (2011), and Tolikas (2014) have modelled the extreme returns distribution with the theoretical GL distribution. These studies find that the GL distribution explains the physical distributions better than the GEV distribution.
(a) Modelling the extreme returns distribution with the peaks over threshold (POT) approach: The strand of EVT literature that identifies extreme returns with the POT approach can be categorized into two sub-strands. The first sub-strand describes the empirical distribution of extreme returns with the theoretical generalized Pareto (GP) distribution, whereas the second uses non-parametric estimators of the tail-fatness index.
These studies suggest that the GP is a robust candidate distribution to describe the extreme returns distribution and the (POT)-GP distribution framework is a superior alternative over the HS, MC, EWMA and skewed GARCH approaches to model the VaR and ES measures.
Additional insights obtained from reviewing this sub-strand include: (i) Youssef et al. (2015) found that conditional volatility processes with long memory and volatility asymmetry (e.g. FI-APARCH) considerably improve the accuracy of the VaR and ES estimates.
(ii) In addition to volatility asymmetry, the incorporation of asset price seasonality (e.g. in commodities) and jumps in the conditional volatility models leads to more accurate and conservative VaR and ES estimates. Chan and Gray (2006) advocate the use of seasonality while Ze-To (2008) and Liu et al. (2018) support the adjustment for jumps.
(iii) Kellner and Rosch (2016) quantified the model risk of both measures and determined that the ES measure carries more "Model Risk" than VaR.
(iv) Muela et al. (2017) observed that GPD VaR forecasts are more conservative and accurate than uncorrected CF VaR forecasts.
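The POT-GP approach reviewed in this sub-strand can be illustrated with a minimal Python sketch. Several choices below are assumptions made for illustration: the GP parameters are fitted by the method of moments as a simple stand-in for the maximum likelihood estimation that is typical in practice, the losses are simulated Pareto draws rather than market data, and the threshold is set at the 95th sample percentile. The VaR and ES expressions are the standard POT tail estimators.

```python
import math
import random
import statistics

def fit_gpd_moments(exceedances):
    """Method-of-moments GP fit; returns (xi, beta) = (shape, scale).

    Only meaningful when the exceedances have finite variance (xi < 0.5).
    """
    m = statistics.fmean(exceedances)
    s2 = statistics.variance(exceedances)
    ratio = m * m / s2
    xi = 0.5 * (1.0 - ratio)
    beta = 0.5 * m * (ratio + 1.0)
    return xi, beta

def pot_var_es(losses, threshold, q):
    """VaR and ES at confidence level q from the POT-GP tail estimator."""
    exceed = [x - threshold for x in losses if x > threshold]
    n, n_u = len(losses), len(exceed)
    xi, beta = fit_gpd_moments(exceed)
    tail = (n / n_u) * (1.0 - q)
    if abs(xi) < 1e-9:                 # exponential-tail limit of the GP
        var = threshold - beta * math.log(tail)
    else:
        var = threshold + (beta / xi) * (tail ** (-xi) - 1.0)
    es = (var + beta - xi * threshold) / (1.0 - xi)
    return var, es

# Hypothetical heavy-tailed losses: Pareto draws with tail index 5.
rng = random.Random(7)
losses = [rng.paretovariate(5.0) for _ in range(20000)]
threshold = sorted(losses)[int(0.95 * len(losses))]   # 95th percentile cut-off
var_99, es_99 = pot_var_es(losses, threshold, 0.99)
```

Because the simulated losses are exactly Pareto, the true 99% quantile is known in closed form, (1 - 0.99)^(-1/5), and can be compared with `var_99`. Re-running the sketch with other thresholds gives a feel for the sensitivity of the estimates to the cut-off choice.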
The second sub-strand of the EVT-POT literature, which models the extreme returns distribution with the non-parametric Hill estimator of the tail fatness index and its extensions, consists of studies such as Ponwall and Koedijk (1999), Huisman, Koedijk, Kool, and Palm (2001), Odening and Hinrichs (2003), Gencay and Selcuk (2004), Bao, Lee, and Saltoglu (2006), Walls and Zhang (2006), Bhattacharyya and Ritolia (2008), and Straetmans et al. (2008). The two common findings across the studies in this line of research are the following: (i) While the tail (fatness) index estimates from the POT-Hill type estimators are more stable than the tail index estimates for the POT-GP distribution, the GP distribution-derived VaR estimates are more accurate than the Hill-derived VaR estimates. (ii) In addition, VaR and ES measures derived from the Hill and modified Hill estimates have greater forecast accuracy than the naïve HS VaR estimates.
In addition, a few salient insights gained from individual studies are as follows: (i) The Huisman et al. (2001) modification of the Hill (1975) tail fatness index estimator reduces the small sample size bias associated with rare extreme market events. (ii) Bao et al. (2006) observed that while the Conditional Quantile (CQ) or Conditional Autoregressive VaR (CAViaR) approach is a close modelling alternative, the Hill estimator, GPD and GEV distribution-based methods perform exceptionally well in providing accurate VaR and ES estimates during market stress periods. (iii) In an empirical study on the US equity market, after , Straetmans et al. (2008) found that pure new sectors (for example PCs, Biotech, Internet) have greater extreme market risk and correlated market risk than new-old sectors (Utilities, Bank, Insurance, Pharma).
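The Hill-type estimation discussed in this sub-strand can be sketched as follows. The sketch implements the basic Hill (1975) estimator and the usual tail-quantile extrapolation that follows from it; it does not implement the Huisman et al. (2001) modification. The simulated Pareto losses and the choice of k (the number of upper order statistics used) are hypothetical.

```python
import math
import random

def hill_tail_index(losses, k):
    """Hill estimate of the tail index alpha from the k largest losses."""
    x = sorted(losses, reverse=True)
    # Average log-excess over the (k+1)-th largest loss estimates 1/alpha.
    gamma = sum(math.log(x[i] / x[k]) for i in range(k)) / k
    return 1.0 / gamma

def hill_var(losses, k, q):
    """Extrapolated quantile: X_(k+1) * (k / (n * (1 - q))) ** (1 / alpha)."""
    x = sorted(losses, reverse=True)
    n = len(x)
    alpha = hill_tail_index(losses, k)
    return x[k] * (k / (n * (1.0 - q))) ** (1.0 / alpha)

# Hypothetical heavy-tailed losses: Pareto draws with tail index 3.
rng = random.Random(11)
losses = [rng.paretovariate(3.0) for _ in range(20000)]
alpha_hat = hill_tail_index(losses, 1000)
var_999 = hill_var(losses, 1000, 0.999)
```

The estimator requires strictly positive losses in the upper tail, and its output depends on k in the same way that the POT-GP estimates depend on the threshold: a small k uses few observations and is noisy, while a large k reaches into the centre of the distribution and can be biased.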
The reviewed empirical literature on EVT comes across as heavily devoted to evaluating the left tails of returns distribution or equivalently the right tail of the loss distribution. In fact, this literature review could identify only 3 empirical works viz Gencay and Selcuk (2004), and Karmakar (2013), which have compared the left and right tail risks. The findings suggest that the left and right tail risks are statistically asymmetric. From a financial economics perspective, the findings indicate that the upside extreme rewards for a long position asset holder are lower than the downside extreme market risk. Equivalently put, the extreme market risks of a long investor appear to be higher than those for a short investor.
Moreover, an evaluation of the EVT literature suggests a nuanced common acknowledgement: while the EVT is a reliable and highly accurate (extreme market risk) estimation framework, the accuracy of the EVT-based risk estimates is likely to be only comparable to that of risk measures from competing methodologies at moderate tails. Indeed, the EVT estimates may not be highly accurate for confidence levels lower than 99% (usually at 95%) and, in a few assets, may even perform worse than alternative models. However, the accuracy and reliability of the EVT quantile (i.e. VaR and ES) estimates improve as one moves farther into the tails, at higher confidence levels (e.g. at 99% and beyond), where the estimates significantly outperform those of alternative approaches. In fact, Kuester et al. (2006) suggest that this observation is highly relevant for conditional rather than for unconditional EVT models. Nonetheless, the results of the empirical EVT literature argue in favour of the estimation of VaR and ES measures in the conditional EVT framework. In this context, Bystrom (2004) argues that the unconditional VaR and ES measures are of greater relevance to long-run investment whereas short-run traders need to pay attention to small-horizon risks measured by conditional estimates.

Conclusions
This article has argued that since the 1980s, financial markets have risen rapidly to acquire deep and widespread influence not just on financial intermediaries (FI's) but also on the broader real economy: corporations, ordinary citizens, and, quite interestingly, monetary policy. Therefore, in addition to disrupting financial markets and eroding investor wealth, the severe, systemic consequences of rare extreme market events can spill over and impair the wider economies, globally. The experience of past extreme financial market events supports this contention.
A review of the literature shows that an overwhelming majority of the empirical research has measured extreme market risk with quantile estimators, predominantly Value at Risk (VaR) and, more recently, Expected Shortfall (ES). This paper finds that the stark deficiencies in the three fundamental models, i.e. Gaussian, Historical Simulation and Monte Carlo Simulation, have spawned the development of several considerably improved alternative VaR-ES measurement frameworks, especially after the global financial crisis of 2008. However, the liberty that FI's enjoy under the internal models approach (IMA) of the Basel norms to use any statistically accurate VaR-ES methodology from an array of approaches for risk measurement, and for subsequent monitoring, control and reporting, is strongly anticipated to promote regulatory arbitrage. Hence, the identification of the most suitable/accurate VaR-ES estimation model is the principal aim of this review, which presents the crucial insights gained from examining a wide range of VaR-ES models. It discusses the relative strengths and weaknesses of the modelling alternatives. Specifically, it finds that extreme value theory (EVT) is a highly accurate candidate framework to model the tails of the returns distribution where extreme market events are realized. The EVT methodology is followed closely by the filtered historical simulation (FHS). In addition, the nonparametric Hill (1975) and Pickands (1975) family extensions of tail fatness index estimators and the asymmetric and non-linear extensions of the semi-parametric Conditional Quantile (CQ) approach yield quite accurate estimates. Moreover, in the parametric framework, conditional volatility models that assume skewed and leptokurtic distributions (Skewed Generalized t (SGT), followed by the skewed generalized error and skewed t distributions) and that can accommodate volatility asymmetry and long memory provide superior VaR forecasts.
This is best observed in the realized volatility (RV) models, followed by the FIAPARCH models in the GARCH family. In fact, a few empirical works like Asai et al. (2012) and Abad and Benito (2013) suggest that the choice of the theoretical return distribution dominates the choice of the conditional volatility model in the accuracy of VaR-ES estimates. Lastly, the corrected Cornish-Fisher (CF) approach can offer reasonably accurate VaR-ES estimates.