Pricing electricity day-ahead cap futures with multifactor skew-t densities

Short-term risk management is becoming increasingly significant in power trading as the intermittent renewable generators introduce more weather risk into the price formation dynamics. There is a vacuum in hedging instruments at the day-ahead stage to protect retailers in particular from such volatility and price spikes. Motivated by this requirement, this paper analyses a flexible hedging product, day-ahead cap futures. For pricing this product, we parametrically predict the probability distribution of day-ahead prices using the multifactor Generalized Additive Model for Location, Scale and Shape (GAMLSS) based upon the skew-t distribution with weather forecasts and calendar information as explanatory variables. In particular, we reveal that this higher-order moment model is superior to several lower-order models such as the normal distribution in all the following three aspects: fairness as pricing method, underwriting risk of the risk-taker and the variance reduction effect of the risk hedger.


Introduction
Hedging the spot price risk of commodities close to delivery is analytically challenging because of the special idiosyncrasies and fundamental factors which materialize when physical supply and demand are matched. The Samuelson effect of volatility increasing as futures approach maturity has been researched since the 1960s, but some commodities also manifest substantial changes in their skewness and kurtosis close to real-time. Furthermore, there is often a requirement by producers and consumers for financial products, as yet unfulfilled by the market, to hedge these short-term risks. Electricity is a good example and an important one in terms of value. Weather conditions have always affected electricity demand but, with the increasing penetration of renewable generators, weather now has substantial effects on supply as well. Thus, close to delivery, for example, the distribution of spot prices will often exhibit changes in skewness and kurtosis, driven by weather conditions, sometimes flipping the sign of the skewness significantly . Evidently such a radical switch in the skewness can switch the balance of risk between producers and consumers. In contrast longer * Corresponding author. Email: mtakuji1122@gmail.com term forward trading is based upon average expectations and the daily futures returns tend to behave more regularly, being stochastically distributed closer to normal variates. In this paper, therefore, we develop a hedging product to support the short-term, day-ahead hedging requirements of electricity retailers and generators who are facing the increased spot market price risks induced by the penetration of renewable energy facilities. The approach which we develop can be generalized to other commodity markets, but electricity is particularly appropriate in its complexity to show the effectiveness of the methodology.
For the stylized setting, consider an electricity retailer at the day-ahead stage. The most liquid wholesale electricity 'spot' markets worldwide tend to be the day-ahead auctions in which generators and retailers set prices for delivery at hourly intervals during the next day. These auctions typically take place around noon, matching bids and offers from the retailers and generators, with the consequent market clearing prices for all 24 h of the next day being revealed within an hour or so later. In some markets the delivery periods are less than an hour, being 30, 15 or even 5 m, but we refer to the hourly periods as generic. Forward trading prior to the day-ahead auction, if it were active, would not have been at the hourly granularity, but rather in the form of average daily or at best average daytime prices. So prior to the day-ahead auction, the retailer would be only partially hedged for the next day's hourly prices; perhaps not at all if the forward prices were not liquid, or carried high forward premia, or if the retailer did not want to incur the collateral costs of holding derivatives. Just before the auction opens, the retailer will make price predictions to support its bids and in doing so may be concerned about the risk of very high prices at some hours. Since the retailer will typically have already sold power to its customers on contracts over longer durations, the cash flow risk of high, scarcity-induced, spikes in some hours would be a serious concern. On the other side of the market, if a renewable generator had sold power under a forward contract and it was concerned about a drop in output because of the weather, it may also be buying in the auction to fulfill its prior contract and could be similarly concerned about a price spike. In both these cases, a selective hedge against high prices in particular hours would be desirable. We therefore look at the design of a market product based upon an adaptation of day-ahead cap futures (DCF) † to hourly electricity prices, as an instrument to provide this protection, and we examine how it could be fairly priced.
The paper is therefore structured as follows. In the next section, we provide some of the background research and practice in order to position our research contribution. Then we formulate the proposed DCF contract schemes and define performance evaluation indexes for pricing it. In Section 4, we provide an electricity market overview of the application to Japan Electric Power Exchange (JEPX) in order to specify an appropriate forecasting/pricing model. In section 5, we construct the price density forecast model and then in Section 6 we verify the estimated results in terms of the accuracy of the forecast model. Section 7 demonstrates the evaluation of the DCF pricing and Section 8 concludes.

Background
The process of hedging price risk in electricity markets with derivatives has evolved in several markets in line with the progress of liberalization and diversification of participants. Futures and forwards have become widely used, but swaps, standard options, and various exotic options, traditionally used in the field of financial markets, have been introduced more slowly into electricity markets (Eydeland andWolyniec 2003, Aid 2015). Electricity derivatives with call option type payoffs at various granularities are listed on the exchanges of some countries, attracting varying degrees of liquidity. The most basic options are Calendar-year, Quarterly, and Monthly Physical Options. These correspond to the financial products that have call option type payoff functions with the annual average, quarterly average, and monthly average spot prices as the underlying assets, respectively (Eydeland and Wolyniec 2003). Furthermore, there are some finer time granularity options called Fixed-Strike Option (a.k.a. Daily Option) and † Noting that spot electricity products are not priced until the dayahead auction is executed and that the proposed hedge position caps the spot price for the buyer, we follow the convention in some markets for this instrument to be referred to as 'cap-futures' instead of 'call options'.
Floating-Strike Option (a.k.a. Index or Cash Option) in the US markets. These options have in common that they allow the owners to make daily decisions of exercise during the exercise month (contract month), and the difference between the two is that the former is an option with a fixed exercise price and the latter has a floating strike price that is determined the beginning according to a monthly index (Eydeland and Wolyniec 2003).
Similar to options, but more selective in their risk management role, are the so-called Cap (Floor) Futures as listed on ASX in Australia (Aoude et al. 2016) and EEX in Germany (Hinderks and Wagner 2019). Their difference to regular options is that 'they act as futures contracts, meaning that the payment is made at the end of the delivery period' (Hinderks and Wagner 2019). The Cap/Floor Futures which started trading on EEX in 2015 has a time granularity that is finer than the other electricity derivative products. They use the hourly intraday price in the EPEX SPOT market as the underlying, and the exercise period is one week (that is, the contract decision is made every week). These products were designed for participants that want to protect on a weekly basis against extreme prices. As of 2020, however, this weekly product had attracted very little liquidity.
A less mature market than Australia or Germany is Japan, where the power exchange liquidity is still emerging and new market instruments are actively under consideration to meet the participants' risk management requirements. We use Japan, therefore as a practical case study of our approach, since the need there is particularly topical. However, our proposed instrument would be relevant to most day-ahead electricity markets. Thus, following full retail liberalization in 2016, many retailers entered the Japanese electricity market and the competitiveness of the market has been increasing year by year. ‡ Moreover, coupled with the fact that the price spike risks have increased due to recent changes in the power supply mix (see Section 4), the importance of spike protecting measures is significant. For example, in Japan, there is a widely used hedging contract called 'Joji Back-Up (JBU)' which allows the retailers to procure next-day electricity at a fixed price from power generators (EMSC 2017), but there is an institutional motivation to look for more selective and flexible hedging instruments which would be especially appropriate for new entrants. § We do not, however, in this research consider the design of an instrument to cap hourly prices risks as part of a government intervention, which may involve some kind of subsidy, but rather analyze whether a fair price can emerge from the market for such an instrument as offered by an intermediary, or negotiated between participants. Further, we analyze whether such a fairly priced cap ‡ In addition to the rapid expansion of the scale of electricity trading at JEPX since 2018 (e.g., JEPX's share of domestic electricity demand in the summer of 2019 (July-September) was 36.6%, 2.0 times that of the same period of the previous year), the share of electricity sold by retailers has also been steadily increasing, and the market considered to be more competitive (EMSC 2019).
§ Regulators have considered abolishing JBU (METI 2020), stating that it is 'not desirable that new entrants are dependent on JBU', and it may incur 'problems in terms of fair and effective competition' (JSEC and METI 2020). Thus, the need for flexible and fair hedging instruments to replace JBU is under active consideration.
traded between a retailer and a generator or an intermediary can be mutually profitable. A cap product which has the effect of partially removing price risk can be considered to be a one-way contract for difference (CfD), whereby if the market price is above a strike price, a counterparty will repay the difference. There is an important distinction, however, with the CfDs which are becoming a favored approach by governments to incentivise investment in renewable facilities. For example, two-way CfDs are administered by the UK government to de-risk renewable generators' price exposure, thereby encouraging investment. If the electricity market price is below (above) the strike price the difference will be paid to (or by) the renewable generators with the government as counterparty. This takes out the price risk completely for the generator but the strike price is set by the government (or via a government auction) at a level above average wholesale prices. Thereby it provides a subsidy. Furthermore, such a strike price is set for a long period of time to provide a bankable cash flow to support new investment. Thus, these two-way CfDs are serving a different purpose from the fair-priced daily caps that we envisage in this research. In the DCF trading scheme we propose, each hedger is assumed to be buying or selling the fair-priced products selectively and repeatedly on a daily basis; thereby having the desired effect of reducing only the risk of daily profit/loss fluctuations and not being confounded with any investment subsidies.
There are many previous studies on pricing option-type electricity derivative products. In much of the research, pricing models based on stochastic process originally developed for conventional financial instruments are generally applied (for example using Black-Scholes or Hull-White models) that assume a normal (or lognormal) distribution to obtain the option prices analytically, or by using GARCH or regime switching methods to incorporate volatility fluctuations with Monte Carlo methods for the pricing. Examples of these widely-used option pricing methods are in Di Masi et al. (1994), Guo (2001), Duan et al. (2002), or Buffington and Elliott (2002), and such models and pricing methods are reviewed by Eydeland and Wolyniec (2003). As for the relatively new EEX 'Cap/Floor Futures' mentioned above, Hinderks and Wagner (2019) also proposed a pricing method using the Hull-White model. But one of our key motivating observations is that over the short-term, day-ahead horizon, the conditional density function for each hourly price is not normal, but much better represented by a four-parameter density that can capture dynamic skewness and kurtosis as well as stochastic volatility. Furthermore, at the day-ahead stage, weather variables will be significant factors in driving the dynamic changes in the price density functions. Thus, we argue that the existing body of research on electricity options is not sufficiently accurate in its stochastic specifications for short-term day-ahead cap futures.
We therefore draw upon the research on four parameter density forecasting methods for day-ahead electricity prices using the generalized additive model for location, scale and shape (GAMLSS) with various exogenous variables, as undertaken by Serinaldi (2011), Bello et al. (2016), Gianfreda and Bunn (2018), Bunn et al. (2018), and Abramova and Bunn (2019). These studies report superior accuracy compared to conventional density models such as Quantile Regression and GARCH. In this research, we undertake the pricing method for DCF by first forecasting the price densities with this approach, which gives closed-form representations of the density functions. These allow an analytical calculation of the integral for the specified DCF payoff function. To the best of our knowledge, this study is the first attempt to apply GAMLSS to the pricing of financial derivatives in electricity markets. In the empirical analysis, we use multiple benchmark models including a normal distribution. As performance evaluation for pricing DCF, we calculate the expected net payoffs and prediction biases to evaluate fairness in pricing and measure the underwriting risk of the risk taker (insurer) and the variance reduction effect of hedger. We reveal that the highorder moment model using GAMLSS has advantages through a comparatively smaller prediction error, an improvement in fairness and effective hedging.
Note that for density forecasts, various machine learning methods can be considered as alternatives to GAMLSS. Despite their computational intensity and their ability to use nonlinearities, they may not necessarily result in better forecast accuracy (Weron 2014). They also have the disadvantage that the structure of the forecasting model is not transparent. In contrast, GAMLSS has the advantage in that it is easy to understand (highly transparent) because the density (and price of DCF) is described by the explicit formula with some forecasted parameters, and the functional form of the density predictions make it easily tractable for practitioners using derivative contract/pricing models.

Day-ahead cap futures and the pricing method
For a DCF, figure 1 illustrates the transaction and settlement procedure.
First, the DCF has a payoff (face value) function P(S t , K) := max(S t − K, 0) with the day-ahead price (the 'spot price' in JEPX) as the underlying, where S t is the observed spot price at time t and K, the 'strike price', is predefined. Just before the day ahead market, † the seller ('insurer') offers the buyer a price based on the expected value of the DCF: is the price density forecast (see Section 5). It should be emphasized here that DCF pricing requires density estimation of its underlying asset, i.e. the day-ahead price. This is because the DCF payoff is a non-linear function and point estimation may thereby bias the expected DCF price. It should be intuitively clear from figure 1 that the DCF price P t (K) will be different if the density's scale or shape changes.
If the buyer accepts this price of the DCF, the contract is concluded, and after the day ahead market (that is, after the announcement of the day-ahead price), the net payoff, π f (S t , K) := P(S t , K) − P t (K) (i.e. the difference between face value and the price quoted in advance), is paid by the seller to the buyer. In this way, the net payoff is designed to have an expected value of 0, so it can be said to be a fair † In JEPX day ahead (spot) market, the auction closes at 10 am the day before the delivery hours, and the clearing prices are announced promptly thereafter. derivative contract between risk-neutral players. In practice, risk-averse buyers (retailers) may be willing to pay a premium for this protection and insurers will want a transaction cost, but the formation of risk premiums is not considered in our initial formulation.
We introduce the two types of DCF: one with a fixed strike price and the other with a variable strike price linked to the price forecast. In the case of variable strike price, we consider a DCF whose strike price is k times the expected spot price S t (that is, the variable strike price is defined as k S t , where k is a fixed multiplier). We assume rational expectations and that the insurer would offer the variable strike price in accordance with the expectation of the forecasted spot price density function. We evaluate the effectiveness of the predictive model against benchmarks in three ways. One evaluation is in terms of minimizing the bias of the net payoff, which corresponds to the fairness of the pricing, the second is minimizing the underwriting risk (we use variance) of the risk taker (insurer), and the third is minimizing the post-hedged risk (i.e. maximizing hedge effect) of the risk hedger (retailer or power producer), which corresponds to the efficiency of risk burden/allocation.
There are many ways in which DCFs may be introduced and used in the market and in figure 2, we summarize just three possible schemes for comparison. In the figure, 'P', 'I', and 'R' refer to power producer, insurer, and retailer, respectively. The start point of the arrow means the seller and the end point means the buyer. The solid arrow represents a long-term (year-round) contract between two parties, and the dashed arrow represents a transaction that purchases only on the required days.
Of the two types, the fixed strike price DCF ([A] in the figure), corresponding to the cap/floor futures in ASX and EEX, assumes that the retailers will make a decision to purchase it as needed and not continuously throughout the year. On the other hand, there may be a need for retailers to purchase hedge products systematically all year to reduce the daily price fluctuation risk as efficiently as possible and with this in mind case [B] shows a DCF whose strike price is linked to the forecast price. † In [A-1] and [B] it is assumed that an insurer will be the seller and retailers will be the buyers, but in [A-2], we envisage that the power producer will be the seller and retailers will be the buyers, with an insurer acting as the broker.
To evaluate the fairness of pricing regarding issues (i) and (iv) in figure 2, respectively, we use a special notation for sample average of the net payoffs as is the sample average given observations of S t . In practice, since the same contract will be made multiple times, it is desired that the average net payoff over a certain period should approach 0, hence the sample average is an appropriate measure of fairness. (Conversely, the further this average net payoff is from 0, the more this derivative is a disadvantageous contract in which either the seller or the buyer suffers a final loss, throughout the contract period, and consequently, as such, the contract would not be considered fair).
The risk mitigation performance indices use variance, as shown in table 1 for each player's contract schemes. The pricing performance for the insurer (risk taker) evaluates how much the underwriting risk (variance of net payoffs) is suppressed during a certain contract period, that is V I,f (K) := Var[−π f (S t , K)] for the fixed strike price case and V I,v (k) := Var[−π v (S t , k)] for the variable strike price case, where Var[·] denotes the sample variance given observations of S t . The sample variance calculated in this way reflects the risk of fluctuations in the daily cash flow of the insurer, so naturally, the smaller the sample variance, the better in terms of risk mitigation.
As for retailers, they would like the net payoff to be close to zero on average whereas their variations are small so that the protection is effective. One possible measure of this cash flow risk is its variance and so the ratio of reduced variance of spot prices purchased with DCF protection to the original variance of spot prices as a Variance Reduction Ratio (VRR) is calculated for effectiveness (i.e. the smaller the VRR, the greater the hedging effect). As a result, the retailer's VRR is evaluated as a power producer is offering the protection, it could be done in several ways. The most likely way would be through its trading department and the process would then involve marking positions to market, in the same way as the insurer. It is possible however that it might take an asset-backed trading position and cover its exposure to the net payoff from its spot market generation sales. This is viable if its marginal cost is below K and thereby offers a fixed profit margin to the extra sales under the DCF. The generator's VRR is therefore defined as in table 1 as VRR P,f (K) := Var[ P t (K)]/Var[P(S t , K)]. VRR (VRR R,v (k) or VRR P,f (K)) so defined represents how much the exposure of the retailer's or generator's original position could be reduced by the portfolio combined with the DCF, so the smaller VRR, the higher the DCF's hedging effect. (Note that Yamada (2019b, 2021) define 1-VRR as the hedging effect and use it to evaluate the hedging performance of weather derivatives).

JEPX overview
In order to apply and backtest the DCF formulations in Section 3 to empirical data from JEPX we need to review some aspects of the Japanese electricity market, since ultimately the comparison is about using accurate price density forecasts. Firstly, figure 3 displays the changes in power generation shares over 2014-19. Evidently, solar power has been rapidly progressing, and nuclear power has restarted. These changes in the power supply mix also affect the price fluctuations. As shown in figure 4, the JEPX monthly average price has been strongly linked to global oil (WTI) through the LNGoil indexation, but in the more recent years the fall in prices in spring and autumn has been remarkable due to the increase in solar power generation. On the other hand, the probability of price spikes has been increasing in summer and winter, possibly due to the gradual abolition of thermal power that functions as the flexible power resource. Thus, when seeking accurate forecasts, it is necessary to incorporate strong seasonality and the fact that the seasonal effects themselves evolve over time.
Japanese derivatives have been slow to mature. The electricity futures, whose experimental listing started in September 2019, are transacted on 'base load' (24 h) and 'daytime load' (8:00-20:00). We observe that JEPX system price spikes around the 16:00-20:00 (as shown in figure 5) and so we define the period 16:00-20:00 as 'peak' for introducing the DCF described previously.
The densities of peak load prices from 2014 to 2019 are shown in figure 6. In the more recent years, the positive (right) skewness have become more noticeable, indicating that the risk of upward price spikes is increasing overall (base load and daytime load also have same tendencies as peak load, see Appendix 1). The density functions of these histograms were estimated by Skew-t distribution type 5 ('ST5') and the evolution of the estimated distribution parameters is shown in figure 7 (the reason for choosing ST5 will be described in Section 5). The parameter 'nu' corresponds to skewness and 'tau' corresponds to kurtosis. †

Density forecast model
The GAMLSS methodology was introduced by Rigby and Stasinopoulos (2005) and Akantziliotou et al. (2002) as a way of overcoming some of the limitations associated with the Generalized Linear Models, GLM (Nelder and Wedderburn 1972), and Generalized Additive Models, GAM (Hastie and Tibshirani 1990). GAMLSS are semi-parametric regression type models. They are parametric, in that they require a parametric distribution assumption for the response variable, and 'semi' in the sense that the functions of explanatory variables may involve using non-parametric smoothing functions (Stasinopoulos and Rigby 2007). GAMLSS is different from classical GLM framework allowing not only the location parameters but also shape and scale parameters of a relevant number of distributions to be modeled as functions of exogenous explanatory variables.
Assuming the probability density function of response variable Y for T observations is given by f Y (y t |θ t ), where θ t = (θ t, 1 , θ t, 2 , θ t, 3 , θ t, 4 ) T = (μ t , σ t , ν t , τ t ) T is a vector of distribution parameters on day t (representing the location μ, the volatility σ , skewness ν and kurtosis τ ), we use 'parametric' linear GAMLSS framework which relates distribution parameters defined as follows: (1) † Skewness and kurtosis are calculated by complicated formulas of distribution parameters (see Appendix 4), so they are not equal to nu and tau, respectively, but there is a certain correspondence between the distribution parameters and the orders of the moment.  where l ∈ {1, 2, 3, 4} specifies the distribution parameter corresponding to μ, σ , ν, τ respectively. θ t ∈ R T is a vector comprised of values for distribution parameter l over t = 1, . . . , T time steps, T is the number of observations, g l (·) is the monotonic link function of distribution parameter l, η l ∈ R T is the linear predictor vector for distribution parameter l, J l is the number of exogenous variables for parameter l, and X l ∈ R T×J l +1 is the design matrix for independent variables. We assume identity link functions g 1 (·) and g 3 (·), for the location and skewness parameter, whereas logarithmic link functions are used for g 2 (·) and g 4 (·) to ensure positivity for standard deviation and kurtosis.
The challenge in constructing the JEPX price model is how to model strong seasonal trends and the interaction between seasonal trends, annual changes, and explanatory variables. Yamada et al. (2015) estimated the JEPX seasonal trend as a non-parametric smoothing function by using GAM. In present study, however, in order to ensure the robustness and avoid computational overload, we do not use the non-parametric method, but instead, we use yearly cyclical dummy variables based on Fourier series expansion, thereby estimating all parameters parametrically.
First, we define the following Fourier expansion-based seasonal functions (yearly cyclical trends).
where θ i, t = 2πi × Seasonal t /365.25, Seasonal t (= 1, . . . , 365(or 366)) is yearly cyclical dummy variables allocated from 1st Jan. to 31st Dec., Period t is the elapsed days from the beginning of sample period †, α, s and c are coefficients (constant) estimated by GAMLSS, and m is the number assigned to the seasonal terms in each formula of distribution parameter l. Next, we introduce the following exogenous variables. The temperature prediction error Temp t (calculated by removing the yearly cyclical trend from the measured value as with Matsumoto and Yamada (2019a)), sunny dummy Sun t which is 1 when previous-day weather forecast is 'Sunny', and 0 † More concretely, in this work, we defined Period t as 1 − exp[−Day t ],where Day t is number of elapsed days (annualized) from 1st Jan. 2013 considering solar power's rapid introduction from 2013 and recent declining trend in Japan as with Matsumoto and Yamada (2019a). otherwise (note that averaged across the targeted load time zone), and WTI t , WTI spot price in the previous month of the month to which the day t belong. Compared to elsewhere, there is very little transparency of system data nor are there any (public) price forecasts and system forecasts for load, generation and reserve margins available in Japan, as we see in some other markets. In the future, the model can be refined by incorporating further market data as it becomes available.
Nevertheless, the above variables are able to demonstrate the value of the methodology. Using the above periodicity function and explanatory variables, the formulae for each parameter corresponding to (1) can be specified. First, we specify the locational parameter μ. We incorporate the strong correlation with WTI in the model as indicated in figure 4. Also, we added the interaction of Period t × Sun t to capture the increasing influence of solar power generation.
Next, the following equation is defined for time-varying volatility σ . As with Serinaldi (2011), we include absolute value of price change |S t−1 − S t−2 |, which summarizes the daily price volatility.
Similarly, skewness and kurtosis parameters ν and τ are defined by the following equations. In order to estimate highorder parameters robustly, we use the lower dimensioned Fourier dummy valuables (or constant) for the coefficients (but the coefficient of temperature is assumed to be periodic as it seems to significantly change in summer and winter).
According to the multifactor formulations of equations (3)-(6), three models were successively estimated with a Skewed t density, the ST5 variant, as described below: 'M4' with all four parameters time-varying, 'M3' with time-varying μ, σ , and ν but constant τ , 'M2' with time-varying μ and σ but constant ν and τ . In addition, as a case to compare with ST5, we also consider 'NO' model that estimates time-varying μ and σ (3)-(4) with a normal distribution and 'OLS' that estimates only time-varying μ (3) by the least-squares method. It is possible to estimate GAMLSS with a number of four parameter density functions. Keeping in mind that the skewness parameter can be both positive and negative, we extracted the following possible distributions as in Gianfreda and Bunn (2018): the 'skew-t' (ST1 in Azzalini 1986, ST2 in Azzalini and Capitanio 2003, ST5 in Jones and Faddy 2003, the 'Jonson's SU' (JSUo in its original parametrization as in Johnson 1949 and JSU in its alternative parametrization as in Johnson 1955), the 'sinh-arcsinh' (SHASHo and SHASHo2 in Jones and Pewsey 2009), and the 'skew exponential power' (SEP1 and SEP2 in Fernadez et al. 1995). To identify the best fitting density functions, values of three measures of the goodness-of-fit (AIC, Global Deviance, and RSQ), which were obtained by applying equations (3)-(6) to each distribution parameter, are reported in table 2. † From this analysis, we decided to focus on ST5. ‡ To improve the forecast accuracy with extra adaptation, we included a 'short-term error correction' variable for the expected value, as in Yamada et al. (2015). § First, the predicted priceS t obtained from the density distribution ϕ t (S t ) = † SEP1 and SEP2 are excluded from the Table 2, because the model estimation could not be robustly performed, producing errors during the convergence calculation. ‡ Previous studies that modeled the density of electricity prices include Huurman et al. (2012) that used the GARCH-type model forecasting (log) normal distributions and Voronin et al. (2014) that used the Gaussian Mixture model (GMM; Reynolds (2009)), which is not implementable in GAMLSS. In this study, we focused on the GAMLSS framework because it can effectively forecast densities using exogenous variables and therefore on the densities covered in related studies using GAMLSS. The Skew-t density has been found to be the most effective and flexible density for use within GAMLSS for electricity prices (see, e.g., Panagiotelis and Smith 2008, Abramova and Bunn 2019. § Yamada et al. (2015) combined a long-term trend estimation with a short-term (V)AR model for the residual series to predict JEPX price and demonstrated that the short-term model significantly reduced the prediction errors. Although such autocorrelation could be considered by estimating the GAMLSS in daily rolling manner adding the variable of the previous day's price to the equation for 'mu', we adopted ϕ t (S t ; μ t , σ t , ν t , τ t ) estimated by the GAMLSS is given by the following equation:S Then, an AR(1) model is applied to the time series of the prediction errors ε t := S t −S t to correct any bias and autocorrelation that may emerge over the short term: where OLS estimation is used and η t is the random residual. Consequently, a predicted value of ε t is obtained asε t = a 1 ε t−1 + c, and the corrected forecast priceŜ t is obtained by the following equation:Ŝ Similarly, the corrected forecast price density is given by the following equation (note that μ t in ST5 is a location parameter that translates the average value as shown in Appendix 4 (A2)). In this paper, we use this corrected forecast density ϕ t (S t ) for DCF pricing:

Forecast accuracy
To validate the forecast accuracy, we use JEPX Tokyo area price ¶ for the day-ahead spot price S t [JPY/kWh] and for exogenous variables the following empirical data are used: • Weather forecast of 'sunny' Sun t and maximum temperature forecast error Temp t [°C] from previous-day forecast published by Japan Metrological Agency (JMA) • WTI oil price WTI t [JPY/bbl] from historical WTI spot price FOB * * In order to verify the robustness and validity of the model, we use three different load cases of 'Base-load (24 h)', ' ' and two different data period cases as follows (note that daily rolling short-term error correction using sample data for last 30 days is performed for each case, and this relationship is shown in figure 8  the same approach as Yamada et al. (2015), with an emphasis on the advantage of the ease of calculation and the consideration of short-term autocorrelation. ¶ Downloaded from http://www.jepx.org/market/index.html. Daily prices are obtained by averaging half-hourly data. Downloaded from http://pe-sawaki.com/WeatherForecast/, website that records past weather forecasts by the JMA. For obtaining temperature forecast error, we removed seasonal trend from observed data by using spline function. * * Downloaded from https://www.eia.gov/dnav/pet/hist/Leaf Handler.ashx?n=PET&s=RWTC&f=M.  The significance of the coefficients of M4 in equations (3)-(6) is shown in table 3. Overall, the coefficients of each term are significant, including the interaction terms (see Appendix 2 for each estimated parameter with seasonality). The coefficient of the second-order Fourier dummy for sunny weather was not confirmed in the 2018 forecast case, but confirmed in the 2019 forecast case, probably because the introduction of solar power increased the impact of sunny weather on prices.
In this section, the Pinball Loss (PL) score (see, e.g. Nowotarski and Weron 2018) is calculated in order to evaluate the prediction accuracy of the quantiles of the density forecasts, as in Gianfreda and Bunn (2018). First, figure 9 illustrates the average of all days of the pinball scores at 99 quantiles (on 1%, 2%, . . . , 99%) plotted by model. The line graph with dots represents the PL score for the basic forecast model, whereas the bar graph represents that with short-term correction. For comparison and to assess the value of information, the results of forecasts without weather forecasts and lagged prices are additionally shown as the line graph with crosses. † In each case, PL score decreases as the number of incorporated distribution parameters increases. The † It mean the models consisting of only terms with coefficients of β 1 , f l1, (t), f l2, (t), and f l3, (t) out of equations (3)-(6). latest information such as weather forecasts contributes significantly to the improvement of PL score, and the effect is particularly remarkable in the higher-order moment models. The short-term error correction also significantly improves the PL score for all model and load cases.
Similarly, PL scores for each percentile are also measured in the same way as Nowotarski and Weron (2018) as shown in figure 10. Overall, the PL score is low for the high-order moment model at all percentiles. Focusing on the peak load with strong skewed density, the PL score of M2 becomes relatively higher than M3 and M4 especially around high percentiles. That is, M3 and M4 are superior in terms of forecasting upper quantiles (long tail due to price spikes) at peak load.
As some studies have demonstrated that GAMLSS is superior to other models in terms of the accuracy of point estimates as well as the quantile estimates ‡, we also calculated the RMSE for the forecasted (expected) value. As shown in ‡ For example, Serinaldi (2011) reported the superiority of GAMLSS in Mean Week Error (MAE normalized by week) by comparing over ARX and AR-GARCH. In the field of real estate price forecast, Cajias (2018) verified that the prediction error (calculated as RMSE and MAPE, etc.) of GAMLSS is smaller for both in-sample and out-of-sample than other point estimation models such as OLS and GAM, and similarly, Mayr et al. (2012) demonstrated superiority of GAMLSS in point prediction accuracy (by using mean squared errors) compared with GAM. Table 3. Significance of each coefficient for GAMLSS. Figure 9. Pinball Loss scores for each model. Note: The bar graph shows the value when both the weather forecast and short-term correction are incorporated, the line graph with dots shows the value when short-term correction is not incorporated, and the line graph with crosses shows the value when both are not incorporated. Figure 10. Pinball scores on each percentile for each model. Figure 11. RMSE for each model. Note: The bar graph shows the value when both the weather forecast and short-term correction are incorporated, the line graph with dots shows the value when short-term correction is not incorporated, and the line graph with crosses shows the value when both are not incorporated. figure 11, the tendency was confirmed that the higher the incorporated distribution-parameters of the model, the smaller the RMSE for all six cases. Interestingly, NO model has a larger prediction error than OLS. Since the normal distribution for prices is not originally suitable, the NO model with its stronger specification constraints may have been overfitted † (in this regard, we have added a more detailed discussion in Appendix 3, showing the daily fitting results). In addition, RMSE is significantly reduced by incorporating the weather forecast into the GAMLSS model as well as combining it with the short-term error correction, as with the PL score in figure 9. Thus short-term exogenous and adaptive data is valuable for more accurate predictions.
Regarding the prediction accuracy using RMSE for the other possible second-and third-order parameter models, the results are displayed in table 4 (Note that the short-term error correction is not considered here; that is, the values correspond to the line graphs with dots of figure 11). The models of LOGNO (lognormal distribution), SN1, and SN2 (skewnormal type1 and type2) produced unstable results. For SN1, there were also cases of nonconvergence (displayed as NA). † In both NO and OLS, maximum likelihood estimation (MLE) is performed assuming that residuals have normal distributions (OLS corresponds to MLE with an assumption of normality for the regression error term (e.g., see Press et al. 1992)), but there is the difference in that NO model of GAMLSS imposes the constraint also to the time-varying size of standard deviation whereas OLS does not incorporate such constraint.
Additionally, as other possible three parameter models, we also tried fitting to distributions of GIG ‡ and exGAUS §, but they did not converge in all the six cases. ¶

Evaluation of the DCF pricing
In this section, in order to evaluate the performance of the pricing models for DCF, the results for the performance indices, related to (i)-(vi) in figure 2 and Table 1, are produced first for the fixed strike price case and then for the variable case. Note that all the data used in this section for backtesting is the out-of-sample data described in Section 6.

Fixed strike price case
For the fixed strike price case, we measure the performance indices when K is changed from 0 to 20 [JPY/kWh].
Net Payoff of insurer: −π f (S t , K) := Mean[ P t (K) − P(S t , K)] ‡ Generalized Inverse Gaussian distribution. It is widely used for financial time series modeling.
§ Exponential Gaussian distribution. Carr and Madan (2009)    As shown in figure 12, the net payoff of the insurer is closer to 0, as desired for fairer pricing, for the high-order moment model than for NO. Regarding NO model, insurer's net payoff (corresponding to overestimation for the DCF payoff forecast) is large for daytime and peak load cases. † Each plot has a peak near 10 [JPY/kWh] (which is close to the average day-ahead price: see figure 6, figures A1 and A2). This result reflects the possibility that the 'overestimation † In the OLS case, the price of DCF is obtained by simply substituting the point estimate S t into the face value function instead of using the spot price density; therefore, it essentially leads to a downward bias due to 'Jensen's inequality.' This may be thought of an interesting trade-off relationship between retailer's variance reduction and the pricing bias in the variable strike price case (detailed in Section 7.2, figure 17), and therefore, we included the OLS case here for reference. bias' of the DCF's price P t (K), which results from the estimation of density 'shape,' still remains albeit slightly. The reason for this is that P t (K) is less affected by the change of estimated density's shape when K approaches 0 or increases, whereas if K approaches the average of the distribution (around 10 [JPY / kWh]), P t (K) is sensitive to change depending on the density shape (i.e. left-skewed or right-skewed). ‡ This may be easily understood by moving only K in figure 1. Note that the same intuition can be applied to figure 15.
Variance of insurer: V I,f (K) := Var[−π f (S t , K)] ‡ In this regard, the results here mean that the estimation bias of the density shape has not been completely removed even for the high-order moment models. However, this incompleteness could be avoided as much as possible by selecting the appropriate strike price multiplier in the variable strike price case (described later in Section 7.2).   As shown in figure 13, the variance of net payoff of DCF is the smallest for M4 in all six cases and strike prices, which reflects the higher accuracy of M4. Note that the variance at K = 0 corresponds to the variance of the forecast error for the original spot price (since JEPX price is always positive). It also corresponds to the net payoff variance of daily 'futures' instead of 'cap futures'.
Note that the reason why in each graph, V I,f (K) is almost constant in the range where K is small but decreases as K becomes large is as follows. First, when K is large, both the payoff value P(S t , K) and the price P t (K) are small, so naturally net payoff π f (S t , K) is also small; accordingly, its variance V I,f (K) also becomes smaller. On the contrary, when K is small and especially when K < S t is almost always satisfied (i.e. P(S t , K) = S t − K), the following formula holds: Thus, π f (S t , K) is a value that does not depend on K. Therefore, the variance of net payoff V I,f (K) is constant when K is small (note that the 'mean' of the net peyoff is also constant, so the same is true for figure 12).
VRR of the power producer: As shown in figure 14, the VRR of the power producer is the lowest for M4 overall (M2 is the lowest only for the peak load 2019 case). The VRR of the NO is extremely high probably because the large forecast error resulted in increasing the variance of the forecast values (in the numerator of VRR P, f (K)). †

Variable strike price case
For dealing with the variable strike price case, we measure the performance indexes by changing the strike price multiplier as k ∈ {0, 0.1, . . . , 1.5}.
Net Payoff of insurer: The absolute value of the insurer's net payoff (the forecast bias) is small overall in the high-order moment models (M3 and M4). Particularly it is close to 0 for k = 0.8 − 0.9. OLS generally has negative net payoffs (i.e. underestimates) because the DCF's latent value stemmed from the distribution of price forecast error is not considered by OLS at all. ‡ Variance of insurer: In general, the pricing models that reduce the underwriting risk for insurer are also based on the high-order moment † Regarding VRR P, f (K), the relationship with K is not so obvious because, as K becomes larger, both the numerator and the denominator become smaller (additionally, if K is extremely large, only the denominator may become 0 and VRR P, f (K) may diverge). ‡ For example, when k = 1, the net payoff of OLS is always negative (i.e., −π v (S t , 1) = Mean[0 − P(S t , S t )] = Mean[−max(S t − S t , 0)] < 0). In other words, OLS (with k = 1) is a contract that just pays face value, and does not provide fair pricing. models (M3 and M4) for all six cases (figure 16). The variance of the M2 model is also small, but it increases at peak load (due to the relatively large forecast error). These results are almost the same as V I,f (K) in figure 13, and as a matter of course, they are the same values at k = 0(K = 0). Note that the reason why V I,v (k) becomes constant in the range where k is small and decreases as k increases is the same as explained in figure 13 for the fixed strike price case.
VRR of retailer: The high-order moment models turned out to have an advantage also in reducing the VRR of the retailer, as shown in figure 17. The VRR of NO is comparatively higher than that of high-order moment models, possibly because NO with low forecast accuracy tend to produce wrong high pricing even such a time when the resulted face value payoff is 0. The retailer's VRR in M4, which was confirmed to have overall the lowest insurer's variance among all models, was around 60% at the minimum when k is 0.8-0.9. Furthermore, at this time, the net payoff bias became just as close to 0 as previously shown in figure 15. The retailer's VRR by OLS is the smallest (40-50%) at k = 1 among all the models; however, this is a natural result because such a DCF contract is not fair in that the buyers can always receive the face value for free as mentioned.
The additional interesting aspect here is that the VRR of DCF with strike price 0 (k = 0), which is equivalent to that of daily 'futures', is larger than VRR of daily 'cap futures' with optimal strike price. In other words, if k is set around 0.8-0.9, such DCF is superior to normal futures in terms of the variance reduction effect for retailers (i.e. Var[S t − π v * (S t , k * )] < Var[S t − π v (S t , 0)]). It should be clear that when k is large, the effect of 'capping' on the original cash flow S t is small, and that the variance of the portfolio incorporating DCF becomes close to the same as the original variance (i.e. VRR approaches 1). On the other hand, it may not be obvious that there is an optimal solution for VRRs when k is around 0.8-0.9, so this result is a new insight revealed by this study. This means that as a day-ahead hedging product, 'cap-futures' (with non-linear payoffs) can have a superior hedging effect than 'futures' (with linear payoffs), and to price the capfutures, the accurate density forecasting method is highly crucial. In this respect, we argue that the proposed approach in this study (DCF product design and its pricing method) is of practical significance.

Conclusion
In this study, we developed a new hedge product 'Day-ahead Cap Futures' to meet the short-term hedging needs in electricity markets. We proposed a pricing method for it based on the multifactor skew-t price density forecast and verified its effectiveness by using empirical data from JEPX. For benchmarking comparison, we used different models such as Figure 17. VRR of retailer VRR R,v (k) (variable strike price case).
OLS, and variations of the multifactor specification replacing skew-t with the normal distribution, and other variants. As a result, it was found that the multifactor model that incorporates higher-order parameters has overall better prediction accuracy, contributes to reducing the insurer's underwriting risk and simultaneously enhancing the DCF hedge effectiveness on cost/revenue fluctuations.
In summary, this study's contribution includes the following details: • We applied the multifactor skew-t specification, estimated with GAMLSS to electricity derivatives' pricing for the first time and showed its effectiveness. Particularly, we confirmed that this approach, which can estimate a continuous price density more accurately with four parameters, has analytical tractability and is suitable for derivatives pricing. • We modeled intertwined seasonality trends that change over time even in the higher order density parameters by using a Fourier expansion-based dummy, confirmed its significance, and verified its effectiveness. The robustness was demonstrated under multiple cases and strike prices with several distribution types. • We revealed that this approach, assuming timevarying high-order distribution parameters, has the potential to improve the accuracy of mean values as well as quantiles, and conversely, that applying a conventional but misspecified normal distribution to electricity prices, in the context of strong skewness or kurtosis, would not be suitable for DCF pricing. • Aiming for more effectively suppressing cost/ revenue fluctuation risks of retailers and power producers, we proposed a DCF with variable strike price linked to the forecasted spot price. As a result, it was found that the high-order moment model is superior in terms of both increasing hedging effect of hedger and reducing the insurer's underwriting risks.
Overall, this research has shown the value of accurate density forecasts, which become particularly apparent in applications that relate to the extreme risks in tails of the distributions. Protecting retailers against the risk of price spikes in the dayahead electricity market was the motivating example of this analysis. The results clearly show that a cap futures product priced with the multifactor skew-t can be effective. We have not considered how much premium an insurer or a power producer might require in order to offer such a product, but our results imply there is sufficient headroom in the net payoffs for this to be viable. Furthermore, as we noted in the background review that similar cap/floor futures have already been traded on the power exchanges of Germany and Australia and so their viability is a fact. Rather, we have shown the value of a well-specified stochastic model for the conditional density forecasts which are required in the pricing formulas. Furthermore, we have argued that the short-term products at the day-ahead stage may become increasing attractive as renewable facilities introduce more weather risk into the daily price formation processes. Figure A2. JEPX day-ahead Tokyo price densities (daytime load).