Determinants of price fluctuations in the electricity market: a study with PCA and NARDL models

In modern electricity markets, negative prices and spike prices coexist as a pair of opposite economic phenomena. This study investigates how these extreme prices act as determinants of price fluctuations in the electricity market. We construct a two-stage analysis consisting of a principal component analysis (PCA) and a nonlinear autoregressive distributed lags (NARDL) model, and apply it to the wholesale Pennsylvania, New Jersey and Maryland (PJM) electricity market. According to PCA, at the level of individual transmission lines, spike prices are the determinants with the largest explanatory power for the variation of prices. According to NARDL, from the standpoint of the overall market, negative prices have a larger potential effect on both the real-time market and the forward market. These results can inform policy decisions by managers and operators in electricity markets.

ARTICLE HISTORY Received 3 June 2018; Accepted 25 February 2019


Introduction
Electricity system reform in all major countries is moving towards marketisation. The modern electricity system is no longer only a transmission grid; it has become a wholesale energy market with multiple new functions. Besides energy production and transmission, the electricity market is also responsible for securing a reliable electricity supply, managing energy prices and, consequently, improving market efficiency.
The marketisation of the electricity system brings more market phenomena. The original transmission organisation has become a wholesale market where the wholesale price of electricity is matched between generators and demanders. Like the financial markets, the electricity market uses electricity prices to reflect energy demand and supply. Along with the development of the wholesale electricity market second largest wholesale electricity market in the world. It coordinates the movement of power in 13 states and the District of Columbia, covering over 240,000 square miles of territory, 84,000 miles of transmission lines and 65 million people. It includes over 11,000 transmission lines with hourly updated electricity prices. Findings from the PJM market can provide valuable lessons and experiences for other electricity markets.
Inspired by a series of recent studies, this study designs a two-step research procedure. First, following Baek, Cursio, and Cha (2015), Chakrabarty and Tyurin (2011), and Li, Cursio, and Sun (2018), we construct a Principal Component Analysis (PCA) model to explore the negative and spike prices in each individual transmission line of the PJM market, and examine how the two types of prices affect price fluctuation at the level of individual transmission lines. This micro-level analysis can bring insights to the managers and operators of electricity markets. Second, we construct the nonlinear autoregressive distributed lags (NARDL) model and use it to assess the determinants of price fluctuation from the angle of the overall electricity market. Results from the NARDL model can shed light on market supervision and help market operators make decisions on policy making and regulation adjustments.
Through the results of PCA we select six components and interpret their implications in relation to the covariates. We find that the components with the largest explanatory power for the variation of prices are highly related to spike locational marginal prices (LMPs) and to the position and extent of concentration of the overall LMPs. Therefore, at the level of individual transmission lines, over-demand occurs with high frequency. In contrast to PCA, results of the NARDL model suggest that negative prices have a larger potential effect on both the real-time market and the forward market. As an implication, this finding confirms the contribution of the renewable energy incentive mechanisms and suggests that renewable energy will fulfil the energy demand and help balance the energy equilibrium in the future.
The remainder of this paper is organised as follows. Section 2 introduces previous literature and research methodology. Section 3 describes the PJM market, data and covariates. Section 4 presents results of the PCA and NARDL models and implications. Section 5 concludes.

Literature review
As a special type of energy, electricity cannot be stored in large quantities. Therefore, the electricity market is widely viewed as the most difficult market in which to balance supply and demand. For example, He and Victor (2017) study the electricity system in China after the Chinese government provided access to electricity to its entire population in 2015. They summarise lessons and experiences about electricity supply in this large emerging country and conclude that power equilibrium is very hard to achieve. Yang (2017) also takes the Chinese electricity system as the target and points out that power equilibrium and efficiency have not yet been achieved, even after the installation of advanced metering infrastructure. Li, Cursio, Jiang, and Liang (2019) study the U.S. smart grid electricity system and find that abnormal price movement in the electricity market, which is viewed as a signal of inequilibrium between supply and demand, is significantly related to calendar effects.
Thus, previous studies have widely acknowledged that it is extremely challenging for electricity generators (especially those with unstable output) to fulfil the demand with a fixed amount of supply. As stated by existing studies (Bilitewski, 2012; Frömmel, Han, & Kratochvil, 2014; Yuan, Bi, & Moriguichi, 2006), a stable price level is a signal of equilibrium between power supply and demand. A reduction in price swings is an achievement in both market efficiency and resource efficiency. Therefore, many studies pay attention to price swings and contribute to their reduction from the aspects of methodologies and phenomena.
Many studies focus on the extreme price values in the electricity market and try to explore their trends and patterns. These studies can be sorted into two groups. One group focuses on extremely high price records, which are usually called spike prices, and attributes the price swings to the prevalence of spike prices. For example, Hadsell and Shawky (2006) focus on the high prices during peak hours and examine the volatility characteristics of the New York Independent System Operator (NYISO) electricity markets. They find evidence that links the occurrence of spike prices to market price volatility. Joskow and Wolfram (2012) and Dutta and Mitra (2017) introduce the progress of spike pricing in the electricity market, and discuss candidate technologies which could reduce spike pricing and thus control the proliferation of time-varying electricity pricing.
Another group of studies focuses on negative pricing, a phenomenon that distinguishes the electricity market from financial markets. Negative pricing arises because certain types of generators (e.g., nuclear, hydroelectric, and wind energy) pay demanders to take power instead of lowering their output due to technical and economic factors, even when demand is insufficient to absorb their output (U.S. Energy Information Administration, 2012a, 2012b). Genoese et al. (2010) find that negative pricing has an increasing trend and an unbalanced distribution in German markets, and that it enlarges price volatility. Barbour, (2014) state that negative pricing is the key factor for energy efficiency and directly affects the development of relevant technologies, such as energy storage. Therefore, like spike prices, negative prices are also critical factors in the price swings in the electricity market.
Some existing studies suggest that the electricity market is driven by specific factors with dominant power. For example, Simanaviciene, Virgilijus, and Simanavicius (2017) investigate psychological factors and their influence on energy efficiency in households, in order to identify and track individuals' energy consumption behaviour. Therefore, we explore the factors with dominant effects on electricity market price swings by using factor-related methods. One candidate method is NARDL, in which nonlinearities are introduced via positive and negative partial sum decompositions of the explanatory variables (Shin, Yu, & Greenwood-Nimmo, 2013). A number of studies show that NARDL is an ideal tool to examine price relations (Ibrahim, 2015; Jammazi, Lahiani, & Nguyen, 2015; Nusair, 2017; Shin et al., 2013). For example, Ibrahim (2015) examines the relations between food and oil prices for Malaysia using a nonlinear ARDL model. Jammazi et al. (2015) use a wavelet-based NARDL to investigate the fluctuations in exchange rates and their impact on crude oil prices.
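The partial sum decomposition mentioned above can be illustrated with a minimal sketch. The function name `partial_sums` and the toy series are assumptions made for illustration; they are not taken from the cited studies.

```python
import numpy as np

def partial_sums(x):
    """Split a series into cumulative positive and negative changes.

    Returns (x_pos, x_neg) such that x[t] == x[0] + x_pos[t] + x_neg[t].
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x, prepend=x[0])            # change at t = 0 is set to zero
    x_pos = np.cumsum(np.maximum(dx, 0.0))   # running sum of increases
    x_neg = np.cumsum(np.minimum(dx, 0.0))   # running sum of decreases
    return x_pos, x_neg

# Hypothetical toy series, used only to show the identity.
series = np.array([1.0, 3.0, 2.0, 2.5, 0.5])
x_pos, x_neg = partial_sums(series)
```

Entering `x_pos` and `x_neg` as separate regressors is what allows increases and decreases of the same variable to carry different coefficients.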
Another method is PCA, which is considered one of the most widely used techniques in multivariate statistical inference. Recent studies summarise the advantages of PCA in three aspects: (1) PCA reduces the dimensionality of multivariate statistical problems, replacing a number of variables with a smaller number of PCs which effectively summarise a large part of the variation of the data (Baek et al., 2015; Bai, 2003; Bai & Ng, 2002; Stock & Watson, 1998); (2) PCA is a preferable approach for studies with large data sizes (Aït-Sahalia & Xiu, 2019; Cao & Huang, 2007; Skiadopoulos, Hodges, & Clewlow, 2000); (3) PCA constructs a latent common structure of factors and discovers the structural meaning (Chakrabarty & Tyurin, 2011; Forni, Hallin, Lippi, & Reichlin, 2000; Forni & Lippi, 2001). In this study, we follow the structure of the method in Li, Cursio, and Sun (2018) and build our PCA model.
Following the existing studies, in this paper we explore both spike and negative prices. This is new in comparison with previous studies, which focus on one type of extreme price only. From the perspective of economics, spike prices reflect over-demand while negative prices reflect over-supply. The managers and operators of electricity markets need to know the patterns and trends of market price movements, and consequently make appropriate preparations for different types of extreme cases. Moreover, the existing studies suggest that factor-related analytical methods are powerful tools for studies with big data. Therefore, as another research target, in this paper we use the NARDL and PCA methods as effective tools to explore the price fluctuations in the electricity market. This paper is an extension of the previous study (Li, Cursio, & Sun, 2018).

NARDL framework
In this study we use the NARDL approach proposed by Shin et al. (2013). The basic model is described as follows:

where the dependent variable $y_t$ and its lag $y_{t-1}$ are scalar variables, and $x^{+}_{t-j}$ and $x^{-}_{t-j}$ are the decomposed independent variables. The NARDL model has advantages for large data since it yields valid results regardless of whether the underlying variables are integrated of order one, zero, or a combination of both (Pesaran, Shin, & Smith, 2001). According to Shin et al. (2013), Jammazi et al. (2015) and Nusair (2017), in the NARDL model the existence of a long-run relationship among a set of variables can be tested without any prior knowledge about the order of integration of the individual variables, which avoids problems associated with unit root pre-testing. Moreover, both the dependent variable and the independent variables can be introduced in the model with lags, which makes the test procedure more flexible than other methods.
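For orientation, the levels form of the NARDL(p, q) model in Shin et al. (2013) is commonly written as below. The coefficient symbols $\phi_j$ and $\theta_j^{\pm}$, the lag orders $p$ and $q$, and the error term $\varepsilon_t$ are notation assumed here and may differ from the notation used elsewhere in this paper.

```latex
y_t = \sum_{j=1}^{p} \phi_j \, y_{t-j}
    + \sum_{j=0}^{q} \left( \theta_j^{+\prime} x_{t-j}^{+} + \theta_j^{-\prime} x_{t-j}^{-} \right)
    + \varepsilon_t ,
\qquad
x_t = x_0 + x_t^{+} + x_t^{-} ,
\qquad
x_t^{+} = \sum_{i=1}^{t} \max(\Delta x_i, 0) ,
\quad
x_t^{-} = \sum_{i=1}^{t} \min(\Delta x_i, 0) .
```

Asymmetry is present whenever $\theta_j^{+} \neq \theta_j^{-}$; with equal coefficients the model reduces to a linear ARDL.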

PCA framework
In this section we briefly review the mathematical mechanism of PCA. Principal Component Analysis is one of the most widely used techniques in multivariate statistical inference. Mathematically, PCA transforms a set of correlated variables into the same number of uncorrelated new variables. Suppose we have a column vector of $n$ random variables $x = [x_1, x_2, \ldots, x_n]^T$ whose mean vector is the zero vector ($E[x] = 0$). Since these $n$ random variables may be correlated, we transform them into normalised linear combinations and find the combination that explains most of the total variability. That is, we look for a non-zero column vector $B = [b_1, b_2, \ldots, b_n]^T$ with $B'B = 1$ that maximises the variance of $B'x$. The variance of $B'x$ can be written as

$$\mathrm{Var}(B'x) = E[(B'x)^2] = E[B'xx'B] = B'E[xx']B.$$

Since the covariance matrix of $x$ is $C = E[xx']$, the variance of $B'x$ can be written as

$$\mathrm{Var}(B'x) = B'CB.$$

To find $B$, we solve the following Lagrange function

$$L(B, \lambda) = B'CB - \lambda (B'B - 1),$$

where $\lambda$ is a Lagrange multiplier. As the first order condition (FOC), the vector of partial derivatives is

$$\frac{\partial L}{\partial B} = 2CB - 2\lambda B = 0.$$

We can simplify the FOC into the equation $CB = \lambda B$, which is the eigenvalue equation. Therefore, $\lambda$ is an eigenvalue of the covariance matrix $C$, and $B$ is the corresponding eigenvector. For each eigenvalue $\lambda_i$ ($i = 1, 2, \ldots, n$), the corresponding eigenvector $B_i$ defines the linear combination $B_i'x$, the $i$-th principal component (PC) of $x$. Its variance equals $\lambda_i$, so $\lambda_i$ measures how much of the total variability the component explains.
Finally, we have a total of $n$ PCs, which are mutually uncorrelated, as a substitute for $x$ in explaining the variability.
The leading PCs keep most of the important information contained in the original variables $x$. PCA enables us to identify the PCs as a new set of orthogonal factors.
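The derivation above can be checked numerically. The following is a minimal sketch on synthetic data (the data-generating choices are assumptions for illustration only): it computes the eigen-decomposition of the sample covariance matrix and forms the PCs as $B_i'x$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 500 observations of 3 correlated variables (illustrative only).
latent = rng.normal(size=(500, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])
X = X - X.mean(axis=0)                   # centre the data so that E[x] = 0

C = np.cov(X, rowvar=False)              # sample covariance matrix C
eigvals, eigvecs = np.linalg.eigh(C)     # eigh is suited to symmetric matrices
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X @ eigvecs                     # column i holds the i-th PC, B_i' x
explained = eigvals / eigvals.sum()      # share of total variance per PC
```

Each column of `eigvecs` satisfies $CB = \lambda B$, the sample variance of each column of `scores` equals the matching eigenvalue, and the columns of `scores` are uncorrelated.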

PJM market
In this section we introduce the PJM electricity market. PJM was established in 1927 and is currently the leading electricity transmission system in the world. As early as 1962, PJM installed its first online computer to control generation and then established the first energy management system (EMS). In 1997, PJM opened its first bid-based energy market and evolved into the largest deregulated wholesale electricity market in the world. In 2013, PJM launched a new stage of smart-grid development and implemented the Advanced Control Center in order to ensure uninterrupted operation of the electric system and maintain the steadiness of the electricity market.
More importantly, PJM serves as a clearing house for electric power. Market participants, including power generators and consumers, offer and bid for electricity on a real-time basis. PJM matches bids and offers and gives the market-clearing price within minutes of the spot trade. Electricity is then generated and transmitted to each service area. PJM is therefore not only an electric system; it also performs functions similar to those of financial markets, as discussed by Bessembinder and Lemmon (2002), Geman and Roncoroni (2006), Longstaff and Wang (2004), and Seifert and Uhrig-Homburg (2007).
Today, PJM is the biggest regional transmission organisation (RTO) of power in the United States, and coordinates the movement of power in 13 states and the District of Columbia. Areas served by PJM are divided by the transmission lines which are referred to as the pricing nodes (Pnode). The market-clearing price is referred to as the locational marginal price (LMP) and updated hourly. LMP is the sum of the cost of energy, the marginal cost of transmission loss, and the marginal cost of congestion, which are the leading contributors to volatility in electricity prices. It represents the incremental value of an additional MW of power transported to a particular Pnode.
Thus, in this study we take PJM as the research target and treat it as a market from the perspective of finance and economics. Findings from PJM will shed light on the efficient management of electricity markets.

Data and covariates
We use the hourly LMP data for 2013-2016, covering 11,574 distinct Pnodes. Table 1 presents the descriptive statistics of LMPs. There are about 392 million LMP records on these Pnodes. We identify all non-positive LMP records as negative pricing. There are over two million negative LMPs, which account for about the bottom 1% of the total LMPs. Likewise, we define the spike LMPs as the top 1% of LMPs for each Pnode, consistent with previous studies (Walawalkar, Blumsack, Apt, & Fernands, 2008). The mean of negative LMPs is −$26.22 and the mean of the spike LMPs is $326.21. The standard deviation of negative LMPs is 47.37 while the standard deviation of spike LMPs is 240.14. The overall ranges of both groups are wide: negative LMPs spread between −$2240.3 and 0, whereas the spike LMPs spread between $175.96 and $4643.74. The distributions of negative and spike LMPs are depicted in Figure 1.
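The two definitions above can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the function name `flag_extremes`, the strict inequality at the cut-off, the use of NumPy's default quantile interpolation and the simulated prices are all hypothetical choices.

```python
import numpy as np

def flag_extremes(lmp, spike_quantile=0.99):
    """Flag negative and spike records in one Pnode's hourly LMP series."""
    lmp = np.asarray(lmp, dtype=float)
    is_negative = lmp <= 0.0                        # non-positive LMPs
    threshold = np.quantile(lmp, spike_quantile)    # node-specific cut-off
    is_spike = lmp > threshold                      # top 1% of this node's LMPs
    return is_negative, is_spike

# Hypothetical data: 1,000 simulated hourly prices for a single Pnode.
rng = np.random.default_rng(1)
prices = rng.normal(loc=35.0, scale=20.0, size=1000)
neg, spike = flag_extremes(prices)
```

With 1,000 distinct prices the spike flag marks the ten highest records, and the two flags never overlap as long as the cut-off is positive.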
For further analyses we use the 16 covariates listed in Table 2. These covariates are categorised into three categories of LMPs. We exclude the maximum of negative LMPs because it is zero in all Pnodes.
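As a sketch of how such covariates could be computed for a single Pnode, the function below returns the 16 names that appear in the PC1 equation later in the paper. The exact definitions are assumptions: population moments, non-excess kurtosis, and a spike cut-off supplied by the caller (the function name and the simulated prices are hypothetical as well).

```python
import numpy as np

def describe(values):
    """Mean, standard deviation, skewness and (non-excess) kurtosis."""
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    z = (v - mean) / std
    return mean, std, (z ** 3).mean(), (z ** 4).mean()

def pnode_covariates(lmp, spike_threshold):
    """Compute the 16 covariates for one Pnode's hourly LMP series."""
    lmp = np.asarray(lmp, dtype=float)
    neg = lmp[lmp <= 0.0]                 # negative (non-positive) LMPs
    peak = lmp[lmp > spike_threshold]     # spike LMPs
    if neg.size < 2 or peak.size < 2:
        raise ValueError("need at least two negative and two spike records")
    mean, _, skew, kurt = describe(lmp)
    n_mean, n_std, n_skew, n_kurt = describe(neg)
    p_mean, p_std, p_skew, p_kurt = describe(peak)
    return {
        "Mean": mean, "Skewness": skew, "Kurtosis": kurt,
        "Neg_Per": neg.size / lmp.size, "Neg_Min": neg.min(),
        "Neg_Mean": n_mean, "Neg_Std": n_std,
        "Neg_Sku": n_skew, "Neg_Kur": n_kurt,
        "Peak_Per": peak.size / lmp.size, "Peak_Min": peak.min(),
        "Peak_Mean": p_mean, "Peak_Std": p_std,
        "Peak_Sku": p_skew, "Peak_Kur": p_kurt, "Peak_Max": peak.max(),
    }

# Hypothetical data: 2,000 simulated hourly prices for one Pnode.
rng = np.random.default_rng(3)
prices = rng.normal(loc=35.0, scale=20.0, size=2000)
threshold = np.quantile(prices, 0.99)
cov = pnode_covariates(prices, threshold)
```

The maximum of negative LMPs is left out, matching the exclusion stated above. Stacking one such row per Pnode gives the covariate matrix that the PCA step takes as input.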

Results and implication
In this section, we present our analyses and results in two steps. First, following Li et al. (2018), we use the PCA model to explore the determinants of electricity prices across Pnodes. These results observe the price swings from the standpoint of individual transmission lines (Pnodes). Second, we use the NARDL model to see how these extreme prices and the energy inequilibrium behind them affect the stability of the whole market. These results are from the standpoint of the overall electricity market, which can bring more insights for market managers and operators.

Table 3 presents the variation explained by the eigenvalues of PCA. PC1, the component with the largest eigenvalue (5.67), contributes 35% of the explanatory power for the variation of the data. PC2 has the second largest eigenvalue (3.36) and explains 21% of the variation. The cumulative explanatory contribution of PC1 and PC2 reaches 56%, as shown in the cumulative column. With the first six components (PC1 to PC6), the cumulative explanatory contribution reaches 98%. Therefore, PCA helps reduce the dimensionality. We extract the first six components. They are constructed as linear combinations of covariates so that they have orthonormal loading coefficients. Table 4 presents how these PCs relate to our covariates and lists the coefficients of covariates for each PC in the columns. For example, PC1 is expressed as the linear combination of our original covariates by the following equation:

$$
\begin{aligned}
PC1 ={} & 0.3697\,\text{Mean} - 0.3410\,\text{Skewness} - 0.37\,\text{Kurtosis} \\
& - 0.0947\,\text{Neg\_Per} - 0.0347\,\text{Neg\_Min} - 0.0925\,\text{Neg\_Mean} + 0.0926\,\text{Neg\_Std} + 0.0639\,\text{Neg\_Sku} - 0.0810\,\text{Neg\_Kur} \\
& + 0.3177\,\text{Peak\_Per} + 0.3704\,\text{Peak\_Min} + 0.3742\,\text{Peak\_Mean} + 0.1286\,\text{Peak\_Std} - 0.3020\,\text{Peak\_Sku} - 0.2868\,\text{Peak\_Kur} + 0.0641\,\text{Peak\_Max}
\end{aligned}
\tag{9}
$$
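The figures quoted from Table 3 and the PC1 loadings can be cross-checked with a few lines of arithmetic. The sketch assumes that PCA was run on standardised covariates, so that the total variance equals the number of covariates (16); that assumption is not stated in the text, but it reproduces the reported shares.

```python
import numpy as np

n_covariates = 16
eigenvalues = {"PC1": 5.67, "PC2": 3.36}   # eigenvalues reported in the text

# Assumption: standardised covariates, so total variance = number of covariates.
share = {pc: lam / n_covariates for pc, lam in eigenvalues.items()}
cumulative = sum(share.values())

# PC1 loadings in the order they appear in the PC1 equation.
pc1 = np.array([
    0.3697, -0.3410, -0.37,                                 # Mean, Skewness, Kurtosis
    -0.0947, -0.0347, -0.0925, 0.0926, 0.0639, -0.0810,     # Neg_* covariates
    0.3177, 0.3704, 0.3742, 0.1286, -0.3020, -0.2868, 0.0641,  # Peak_* covariates
])
norm_sq = float(pc1 @ pc1)   # should be close to 1 for a normalised loading vector
```

Under this assumption the shares round to 35% and 21% (56% cumulative), and the squared loadings sum to approximately one, in line with the statement that the loading coefficients are orthonormal.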

PCA results
In Equation (9), the coefficients with the largest absolute values, all close to 0.37, belong to Peak_Mean (0.3742), Peak_Min (0.3704), Kurtosis (−0.37) and Mean (0.3697). Similarly, in PC2, we find that two covariates have significantly larger absolute values of coefficients: Peak_Std (0.4198) and Peak_Max (0.4160). Both covariates are different from the dominant covariates for PC1, and are from the spike LMP group. Similar to PC1, PC2 can be interpreted as the distribution and dispersion of spike LMPs.
For PC3, three covariates are dominant as observed in Table 4: Neg_Min (−0.4527), Neg_Std (0.4240) and Neg_Mean (−0.3811). PC3 can be interpreted as the position and distribution of negative LMPs. Similarly, PC4's dominant covariates are the skewness and kurtosis of negative LMPs (Neg_Sku and Neg_Kur), so PC4 is also a representative of the negative LMP group. But compared with PC2, both PC3 and PC4 have less explanatory power. Only one covariate dominates in each of PC5 and PC6: Peak_Std (−0.4019) in PC5 and Neg_Per (0.8583) in PC6. As shown in Table 3, PC5 and PC6 have less explanatory power and capture supplementary variation.
In summary, through the result of PCA we select six components and interpret their implication related to the covariates. We find that components with the largest explanatory power to the variation of prices are highly related to spike LMPs and the position and the extent of concentration of the overall LMPs. Therefore, in the next part, we further examine the determinants from the perspective of the overall electricity market.

NARDL results
Results from PCA suggest that features from the overall market are critical to interpret the variation of electricity market prices. In this part, we further examine how these extreme prices and the energy inequilibrium behind them affect the stability of the whole market. Unlike the PCA results, which focus on the variation across the transmission lines, here we take the standpoint of the overall electricity market and examine the overall market's price movement from a time series perspective. This is new compared with previous studies that focused on the individual transmission lines only (Hadsell et al., 2004; Hadsell & Shawky, 2006; Holland & Mansur, 2006; Simanaviciene et al., 2017; Li et al., 2018).
The current PJM market system offers two basic types of markets in which participants may trade electricity. The first functions as a real-time market. In this market, participants can enter sale offers and purchase bids for electricity on a real-time basis, and depending on circumstances, electricity can often be generated and transmitted within minutes of the spot trade. The second market in the PJM system is a one-day-ahead forward market. In this market, participants submit offers to sell and bids to purchase electricity for delivery during the subsequent day.
According to this market structure, we aggregate the hourly LMP data to the daily level. For the 24 hours in day $t$, we calculate the standard deviation of the overall market as $STD_t$, and the average hourly LMP as $MEAN_t$. We include the percentage of negative LMPs ($Neg\_Per_t$) and the percentage of spike LMPs ($Peak\_Per_t$). Using this pair of variables about negative and spike LMPs, we can analyze and compare their effects on the market fluctuation. We use the NARDL approach on the basis of previous studies (Jammazi et al., 2015; Nusair, 2017; Shin et al., 2013). The basic model is described as follows:

Table 5 presents the results of NARDL. According to the two-level market structure, the lengths of lags (N, O, P, and Q) are set to 1 because the one-day-ahead market at day $t-1$ can affect the subsequent day $t$. We examine the aspects of negative and spike prices and compare their impacts on the standard deviation of the overall market.
We first compare the percentage of negative LMPs ($Neg\_Per_t$) and the percentage of spike LMPs ($Peak\_Per_t$). Since both extreme prices are components of the overall market price records, at day $t$, both $Neg\_Per_t$ and $Peak\_Per_t$ have positive effects on the market fluctuation ($STD_t$). But the coefficient of $Neg\_Per_t$ (0.5347) is larger than that of $Peak\_Per_t$ (0.1194), indicating that the occurrence of negative prices has a larger influence on the market fluctuation. A similar situation appears between the one-lag variables $Neg\_Per^{-}_{t-1}$ and $Peak\_Per^{+}_{t-1}$. The coefficient of $Neg\_Per^{-}_{t-1}$ is −0.1848. It implies that the occurrence of negative pricing on the previous day alerts the market managers and makes them adjust the real-time market on the subsequent day to reduce the market fluctuation. By contrast, the coefficient of $Peak\_Per^{+}_{t-1}$ is −0.0011 and not statistically significant, implying that spike pricing does not influence the market fluctuation on the subsequent day in the way negative pricing does.
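The daily aggregation described above can be sketched as follows. The input layout (one row per hour, one column per Pnode), the pooling of all node-hours within a day, the caller-supplied spike cut-off and the function name `daily_features` are assumptions for illustration; the text does not specify these details.

```python
import numpy as np

def daily_features(lmp, spike_threshold):
    """Aggregate hourly LMPs into daily market-level series.

    lmp: array of shape (n_days * 24, n_pnodes), one row per hour.
    """
    lmp = np.asarray(lmp, dtype=float)
    n_days = lmp.shape[0] // 24
    # Each row of by_day pools all node-hours that belong to one day.
    by_day = lmp[: n_days * 24].reshape(n_days, -1)
    return {
        "STD": by_day.std(axis=1),                        # STD_t
        "MEAN": by_day.mean(axis=1),                      # MEAN_t
        "Neg_Per": (by_day <= 0.0).mean(axis=1),          # share of negative LMPs
        "Peak_Per": (by_day > spike_threshold).mean(axis=1),  # share of spike LMPs
    }

# Hypothetical data: 30 days of simulated hourly prices at 5 Pnodes.
rng = np.random.default_rng(4)
hourly = rng.normal(loc=35.0, scale=20.0, size=(30 * 24, 5))
features = daily_features(hourly, spike_threshold=80.0)
```

Each returned array has one entry per day, giving the four series that enter the time series model.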
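To make the role of the one-lag partial-sum terms concrete, the sketch below fits a generic one-lag asymmetric regression by ordinary least squares on simulated data. It is not the specification behind Table 5: the single regressor, the simulated coefficients (0.5, 0.6, 0.8, 0.2) and the function names are assumptions chosen only so that the recovery of known values can be verified.

```python
import numpy as np

def partial_sums(x):
    """Cumulative positive and negative changes of a series."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x, prepend=x[0])
    return np.cumsum(np.maximum(dx, 0.0)), np.cumsum(np.minimum(dx, 0.0))

def fit_nardl_11(y, x):
    """OLS of y_t on [1, y_{t-1}, x+_t, x-_t, x+_{t-1}, x-_{t-1}]."""
    y = np.asarray(y, dtype=float)
    x_pos, x_neg = partial_sums(x)
    design = np.column_stack([
        np.ones(len(y) - 1),
        y[:-1],                   # lagged dependent variable
        x_pos[1:], x_neg[1:],     # contemporaneous partial sums
        x_pos[:-1], x_neg[:-1],   # one-period lags of the partial sums
    ])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return coef

# Simulated, noise-free data with a stronger response to increases than decreases.
rng = np.random.default_rng(2)
x = rng.normal(size=300).cumsum()
x_pos, x_neg = partial_sums(x)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 + 0.6 * y[t - 1] + 0.8 * x_pos[t] + 0.2 * x_neg[t]

coef = fit_nardl_11(y, x)
```

Because the simulated data contain no noise, the fit recovers the intercept, the autoregressive coefficient and the two contemporaneous coefficients, and returns values near zero for the lagged partial sums. Comparing the coefficients on the positive and negative components is the same kind of comparison the text makes between the negative and spike price terms.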
The results from the NARDL model bring new insights from the overall market level. Different from PCA which focuses on the changes in the individual transmission lines, the NARDL model suggests that the negative prices have a larger potential effect on both the real-time market and the forward market.

Implications
In the modern world, the electricity system has evolved into a market for power trading, and its original power-transmission function has been diversified. The modern electricity market not only takes charge of the maintenance of software, networks, and hardware units, but focuses more on the establishment and enforcement of regulations and protocols for market participants, and on the management of market-clearing settlement prices (Geman & Roncoroni, 2006; Longstaff & Wang, 2004). Marketisation forces managers and operators to revise their original view of the electricity system and to study the new market from the perspective of economics. According to economics, the foremost mission of market operators and managers is to balance power supply and demand, maintain the wholesale electricity price and avoid the frequent occurrence of extreme prices. They must therefore monitor negative LMPs and spike LMPs, the signals of economic inequilibrium between supply and demand. Unlike other markets, however, electricity markets have a distinctive feature: negative pricing. Negative pricing is an outcome of boosting renewable energy sources. In the United States, the government has launched a series of policies to promote diverse types of renewable energy. For example, wind power is one of the renewable energy sources that the U.S. government is promoting (Deng, Hobbs, & Renson, 2015; Zhao & Wu, 2014). Wind power generators receive large tax credits as a subsidy from the government to encourage continuous production.
However, our results show that the government's efforts have not significantly accomplished the foremost mission of reducing the occurrence of extreme prices in the electricity market. The emergence of renewable energy does not reduce spike prices, but brings negative prices that coexist with them in the market. Figure 2 depicts the distribution of negative and spike LMPs over the 24 hours of the day. There is a reciprocal relationship between the numbers of negative and spike LMPs. However, the majority of negative LMPs appear between midnight and the early morning, when a large amount of energy is not needed. In the daytime, and especially during working hours, the number of spike LMPs is far larger than the number of negative LMPs. Figure 2 confirms the dominant place of spike prices and the results from the PCA model. It further suggests that the current electricity market still faces a shortage of energy, even after the adoption of renewable energy incentive mechanisms. Although the appearance of negative prices has limited power to reduce the over-demand situation and the accompanying spike prices, promoting renewable energy is still a promising and meaningful policy for the long-term development of the electricity market. The NARDL model provides insights from a different angle. Its results are from the overall market level rather than the regional level, and indicate that negative prices have a significant effect on the fluctuation of both the real-time market and the forward market. Developing diverse renewable energy will eventually fulfil the large demand and consequently facilitate market price stability.
The production of renewable energy is tightly driven by time-related factors. For example, wind power is determined by the seasonality of wind, which prevents it from becoming a steady supplier. In addition, the limitations of transmission capacity are also a critical cause of geographical energy imbalance. As a practical resolution, the development of electrical energy storage (EES) can save the excess energy during over-supply periods and release it during over-demand periods. It may help reduce and remove extreme prices of both types (Liu, Woo, & Zarnikau, 2017; Sioshansi et al., 2009). As another possible resolution, upgrading the transmission capacity may also help reduce the power inequilibrium. According to the U.S. Energy Information Administration (2012a, 2012b), transmission loss and congestion are causes of the daily price fluctuations. Upgrading the transmission lines can provide a smoother power transmission from over-supply regions to over-demand regions and consequently address the spatial power inequilibrium.

Conclusion
As two frequently observed phenomena and the constituents of extreme values, spike and negative prices have opposite economic meanings. Negative prices indicate oversupply while spike prices indicate over-demand. This study assesses the impact of negative pricing and spike pricing on the price fluctuation. We evaluate the price fluctuation by the standard deviation of prices for each transmission line. We analyze the price data from the PJM electricity market including over 11,000 transmission lines with hourly updated records. For both negative and spike price groups, we calculate 16 relevant covariates by transmission lines. These covariates capture the distributions of spike prices and negative prices respectively.
To compare the effects of negative prices and spike prices on price fluctuations, we employ a two-stage analysis with a PCA model and a NARDL model. Through the results of PCA we select six components and interpret their implications in relation to the covariates. We find that the components with the largest explanatory power for the variation of prices are highly related to spike LMPs and to the position and extent of concentration of the overall LMPs. Therefore, at the level of individual transmission lines, over-demand occurs with high frequency. In contrast to PCA, results of the NARDL model suggest that negative prices have a larger potential effect on both the real-time market and the forward market. As an implication, our finding confirms the contribution of the renewable energy incentive mechanisms and suggests that renewable energy will fulfill the energy demand and help balance the energy equilibrium in the future.
In summary, our results indicate that in the current electricity market, although various types of renewable energy generators are already participating and contributing, over-demand and energy shortage are still major issues. The time-varying inequilibrium between power supply and demand is common in any area. From the perspective of policy makers, developing EES and upgrading the transmission capacity should be the resolution to reduce market price fluctuations.
Additionally, our results suggest that PCA and NARDL are efficient tools for electricity market analyses. Using PCA, we can reduce the dimensionality of multivariate analysis, and discover the structural meaning of factors by constructing latent common structures. The NARDL model enables us to take a comprehensive picture of the overall market, and figure out the determinants of price fluctuations.