Electric short-term load forecast integrated method based on time-segment and improved MDSC-BP

ABSTRACT This paper proposes an integrated short-term load forecast method based on multi-source data, which combines an improved maximum deviation similarity criterion (MDSC) with a time-segment BP neural network. Existing short-term load forecast methods for power systems do not consider multi-stage load changes and weather fluctuations, which leads to low accuracy or even failure of the prediction. To address this problem, an improved similar-day category screening method is combined with a time-segment BP neural network model: the regional load characteristic law is used to divide the daily load into seven time periods, and a BP neural network model is built for each period. Based on the feature vector and the real-time meteorological data of the forecast day, the trained model provides the load values of the forecast day and overcomes the restriction of relying on historical load data alone. The prediction accuracy and the training time are also improved under fluctuating meteorological conditions. Finally, a load forecast for a certain area shows that the prediction accuracy for different types of days exceeds 96%, which illustrates the effectiveness of the proposed method.


Introduction
Scientific load forecasting is of great significance in power systems (Kang et al., 2017). With the rapid development of smart grid technology, the growing size of the power grid increases the complexity of electric power data, which places higher demands on the accuracy and real-time performance of load forecasting. Meeting these demands has become the goal of further research on power system load forecasting theory (Liao et al., 2011).
At present, scholars at home and abroad have conducted in-depth research on load forecasting and proposed a range of methods, which fall into two groups. The first group consists of traditional statistical methods, including linear regression, correlation analysis, time series and the grey system method. These methods are simple, but they are not suitable for nonlinear influencing factors. The second group consists of machine learning methods, including fuzzy inference systems, artificial neural networks, wavelet transforms and support vector machines. Each of these methods has obvious disadvantages. Fuzzy systems cannot learn autonomously. Support vector machines handle nonlinear and local-minimum problems well, but have difficulty with large-scale data. Artificial neural networks have strong self-learning and nonlinear processing ability, but they require manual selection of time characteristics, which can lead to unsatisfactory prediction results (Lei et al., 2016). An improved grey correlation and bat-optimized neural network method is proposed in (Wu et al., 2018): the bat algorithm is used to optimize the BP neural network prediction model, similar-day samples are selected for training, and the prediction results are better than those of traditional neural network methods. In (Tang, 2017), an alternating particle algorithm is used to optimize the network weights of the BP network so that the errors of the values predicted by the BP model are reduced. In (Wang & Meng, 2015), the prediction accuracy is improved by the EEMD method: the historical data are decomposed into more stable component sequences, and a separate prediction model is constructed for the characteristics of each sequence.
CONTACT Jing Lu lujing@hpu.edu.cn
In (Miao et al., 2015), the Bayesian evidence method is used to optimize the LSSVM parameters, which makes the final predicted value more accurate. The above studies show that the factors affecting load changes include not only the date type, meteorological factors and regional characteristics, but also real-time changes of uncertain factors. Many studies consider only the similarity analysis of historical data, in which the influence of daily meteorological factors is handled through their effect on load similarity. However, considering similarity alone can neither capture how meteorological factors shape the load characteristics nor reflect the effect of sudden changes in meteorological factors in different time periods.
In view of the above analysis, this paper proposes an integrated BP neural network forecast method based on an improved MDSC. Firstly, the improved MDSC algorithm is used to cluster the load data, so as to determine the similar-day category of the forecast day. Then, taking the real-time meteorological factors and the influence of sudden changes into account, the clustered similar-day load data are used to train a BP neural network for each time period. Given the feature vector and the real-time meteorological data of the forecast day, the trained time-segment integrated BP neural network model outputs the load values of the forecast day. In this way, the proposed method improves the load prediction accuracy.

Improved MDSC algorithm
In this paper, the MDSC algorithm is used to cluster the power load data according to the shape similarity of the load curves. When individual points of two load curves differ greatly, for example at a jump, the original method has the disadvantage that curves clustered as similar can still differ excessively in value. To solve this problem, a deviation limit interval is added to the original MDSC algorithm, so that load curves are clustered into the same type only if they have similar shapes and their deviation stays within a certain range, which reduces the instability of the load data.
Step 1: Suppose that the input sample load data consist of $n$ $m$-dimensional vectors, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ represents the $i$-th sample and $x_{ik}$ represents the load value at the $k$-th time point of the $i$-th sample, $i = 1, 2, \ldots, n$, $k = 1, 2, \ldots, m$. The absolute difference at corresponding time points is $s_{ij,k} = |x_{ik} - x_{jk}|$, $i, j = 1, 2, \ldots, n$, $k = 1, 2, \ldots, m$. Let $n_{ij}$ be the number of similar points of $x_i$ and $x_j$, that is, the number of time points $k$ satisfying $s_{ij,k} \le \gamma$. Let $m_{ij}$ be the maximum number of consecutive time points $k$ satisfying $\gamma < s_{ij,k} < \delta$; $m_{ij}$ is called the maximum number of consecutive deviation points between $x_i$ and $x_j$. Here $\gamma$ is a preset constant, called the maximum deviation, which is the threshold used to measure the similarity of the load values at two corresponding time points, and $\delta$ is the allowable deviation of the maximum deviation point (Xi et al., 2019). If $s_{ij,k} \le \gamma$, then $x_{ik}$ and $x_{jk}$ are similar; otherwise they are not similar. If $s_{ij,k} > \delta$, the two curves are not similar. The expression of $m_{ij}$ is as follows:

$$ m_{ij} = \max \{\, l \ge 0 : \exists\, k_0 \text{ such that } \gamma < s_{ij,k} < \delta \text{ for all } k = k_0, k_0 + 1, \ldots, k_0 + l - 1 \,\} \quad (1) $$

In Equation (1), $\gamma$ is 0.10 and $\delta$ is 0.25. Such values help ensure that the clustered load data are stable.
Step 2: Calculate $n_{ij}$ and $m_{ij}$ with $x_i$ as the comparison centre. If the following two conditions are met at the same time, $x_j$ is said to be similar to $x_i$ under the maximum deviation similarity criterion: (1) $n_{ij} \ge n_0$, $n_0 = [\alpha \times m]$, $0 \le \alpha \le 1$, where $\alpha$ is a preset similarity constant; to ensure high similarity of the load curves, $\alpha$ is set to 0.80. (2) $m_{ij} \le m_0$, $m_0 = [\beta \times m]$, $0 \le \beta \le 1$, where $\beta$ is a preset deviation constant; to ensure low deviation of the load curves, $\beta$ is set to 0.20.
Step 3: Compare the calculated values of $n_{ij}$ and $m_{ij}$ with $n_0$ and $m_0$, select the curves $x_j$ that meet the MDSC, and let $S(x_i)$ denote the set of these curves, as in Equation (2). If $x_i$ is the load curve that minimizes the value of $D(x_i)$, then $x_i$ is the class centre of $S(x_i)$, where $i, j = 1, 2, \ldots, n$.
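The pairwise test in Steps 1 and 2 can be illustrated with a minimal Python sketch. It assumes that the curves are already normalized, that the second condition reads $m_{ij} \le m_0 = [\beta \times m]$, and that $[\cdot]$ denotes the integer part; the function names `similarity_counts` and `is_similar` are hypothetical and are not taken from the original method.

```python
import math
from typing import Sequence, Tuple

GAMMA = 0.10  # maximum deviation (value quoted in the text)
DELTA = 0.25  # allowable deviation limit (value quoted in the text)
ALPHA = 0.80  # similarity constant (value quoted in the text)
BETA = 0.20   # deviation constant (value quoted in the text)


def similarity_counts(xi: Sequence[float], xj: Sequence[float],
                      gamma: float = GAMMA,
                      delta: float = DELTA) -> Tuple[int, int, bool]:
    """Return (n_ij, m_ij, within_limit) for two equal-length curves."""
    if len(xi) != len(xj):
        raise ValueError("curves must have the same number of points")
    n_ij = 0             # number of points with s <= gamma
    m_ij = 0             # longest run of points with gamma < s < delta
    run = 0              # length of the current deviation run
    within_limit = True  # becomes False once any s exceeds delta
    for a, b in zip(xi, xj):
        s = abs(a - b)
        if gamma < s < delta:
            run += 1
            m_ij = max(m_ij, run)
            continue
        run = 0
        if s <= gamma:
            n_ij += 1
        elif s > delta:
            within_limit = False
        # s == delta is left uncounted here; the text does not cover this case.
    return n_ij, m_ij, within_limit


def is_similar(xi: Sequence[float], xj: Sequence[float],
               alpha: float = ALPHA, beta: float = BETA,
               gamma: float = GAMMA, delta: float = DELTA) -> bool:
    """Check the two MDSC conditions plus the deviation limit (assumed form)."""
    m = len(xi)
    n0 = math.floor(alpha * m)  # assumed: [.] is the integer part
    m0 = math.floor(beta * m)
    n_ij, m_ij, within_limit = similarity_counts(xi, xj, gamma, delta)
    return within_limit and n_ij >= n0 and m_ij <= m0
```

For a 10-point toy curve, `is_similar([0.5] * 10, [0.55] * 10)` holds because every point deviates by less than $\gamma$, whereas a single point deviating by more than $\delta$ is enough to reject the pair.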

Load characteristic analysis
The predictability of the load is reflected in the regularity and randomness of the load data, and the interplay between the two is an important factor affecting the accuracy of load prediction. The typical daily load curve of the selected area is shown in Figure 1. The area does not have the morning and evening peaks seen in first-tier and coastal cities; instead, following the local production and living patterns, it has a noon peak and a late peak (Gao et al., 2017). The load is in the trough period from 0:00 to 6:00, with the lowest load at around 4:00. After 6:00, the load rises linearly until 9:00, then stabilizes, reaches a peak at 11:30 and decreases linearly. After levelling off, it starts to rise again at around 16:00, peaks at 17:45 and then declines. The regional characteristics and long-established living and working patterns make the daily peak and valley loads of the area basically fixed. These regularities are beneficial to the study of load forecasting in this area.

Division of time periods
Existing studies find that the regularity of the load is mainly reflected in the time and space dimensions. Therefore, based on the analysis of the clustering results of historical load data, a better approach is to establish separate prediction models for different periods (Pan et al., 2004).
The analysis of the regional load data above reveals the load composition and characteristics of different periods. Accordingly, the 96-point daily load is divided, as prior knowledge, into 7 periods: 0:00-5:00, 5:00-8:00, 8:00-11:00, 11:00-14:00, 14:00-17:00, 17:00-21:00 and 21:00-24:00. The load within each period is strongly regular and self-similar. Therefore, after the historical load data of the region are clustered, the real-time meteorological data for each period of the forecast day are collected, the index describing the resulting load change in that period is derived from the meteorological factors, and the load is forecast with the time-segment MDSC-BP model of that period. Finally, the results of the seven period models are combined to obtain the load data of the forecast day.
The integrated prediction method adopted in this paper distinguishes the different shares of industrial, commercial and residential load in different periods, and coarsely grades the degree to which meteorological factors influence the load (Luo et al., 2007). Making the prediction of each period closer to the actual load value improves the prediction accuracy to some extent. In addition, the segmented model of the daily load greatly reduces the scale of each prediction model, and with it the difficulty and time of training.
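The period division can be made concrete with a short Python sketch. It assumes 15-min sampling (4 points per hour) with point 0 at 0:00; the helper names `period_slices`, `split_day` and `merge_periods` are hypothetical.

```python
from typing import List, Sequence

# Period boundaries in hours, as listed in the text.
PERIOD_HOURS = [(0, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 21), (21, 24)]
POINTS_PER_HOUR = 4   # 15-min sampling interval
POINTS_PER_DAY = 96


def period_slices() -> List[slice]:
    """Index ranges of the seven periods within a 96-point day."""
    return [slice(start * POINTS_PER_HOUR, end * POINTS_PER_HOUR)
            for start, end in PERIOD_HOURS]


def split_day(load: Sequence[float]) -> List[List[float]]:
    """Split one 96-point daily load curve into the seven period segments."""
    if len(load) != POINTS_PER_DAY:
        raise ValueError("expected a 96-point daily load curve")
    return [list(load[s]) for s in period_slices()]


def merge_periods(segments: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-period forecasts back into one 96-point curve."""
    merged = [value for segment in segments for value in segment]
    if len(merged) != POINTS_PER_DAY:
        raise ValueError("segments do not add up to 96 points")
    return merged
```

With these boundaries the segments hold 20, 12, 12, 12, 12, 16 and 12 points, so each per-period model handles a much smaller output than a full-day model would.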

Weather factor impact index
Because the load is objectively random, a time-segment integrated load forecast model is proposed, which takes into account the load feature vector and the real-time weather factors in different time periods. Combining these elements can improve the accuracy of the load forecast.
The influence index of weather factors reflects the load changes in different periods under different weather conditions. The relationship between weather factors and load changes in each period is therefore mapped to an influence index that represents the effect of the weather factors on the load (Fan et al., 2017). The historical weather data and load data of this region for 2017-2018 are classified, and the load change caused by weather factors in each period is taken as the influence index and normalized. The results are shown in Table 1.

Time-segment BP neural network
The BP neural network is characterized by strong generalization ability: it can learn complex nonlinear mappings between input and output. Most neural network models in the literature forecast the full-day load at once, which cannot fully reflect the load fluctuations caused by real-time influencing factors in different time periods. Therefore, this paper puts forward a time-segment integrated BP neural network forecast model based on the improved MDSC. The improved MDSC is used to cluster the load curves into stable classes, and the similar-day load data of the forecast day are selected as training samples. The historical load values, sampled every 15 min, and the influence index of weather factors in each period are then used to train the BP neural network, which yields one model per period. Next, the real-time weather factor influence index derived from the weather forecast of the forecast day is used to distinguish the data of different time periods under the same weather factor. Finally, the load of each time period of the forecast day is predicted, and the 96-point load data of the forecast day are obtained by combining the period models and applying inverse normalization.
In this paper, a three-layer BP neural network prediction model is selected. The number of neurons $H$ in the hidden layer is calculated according to

$$ H = \sqrt{M + N} + a $$

where $M$ is the number of neurons in the input layer, $N$ is the number of neurons in the output layer, and $a$ is a constant in $[0, 10]$. The parameters of the neural network are set as follows: the maximum number of iterations is 20, and the target value is 0.00001 (Saif et al., 2016). A transfer function is specified for the hidden-layer neurons. The forecast-day load feature vector $V$ is composed of the average load $v_1$, maximum load $v_2$, minimum load $v_3$, daily power consumption $v_4$ and daily weather type index $v_5$ of the forecast day. Assume that the class centre of the $i$-th load curve class $S(x_i)$ is $x_i$, with average load $x_{i1}$, maximum load $x_{i2}$, minimum load $x_{i3}$, daily power consumption $x_{i4}$ and daily weather type $x_{i5}$. The distance $d(x_i, V)$ between the class centre and $V$ is calculated with Equation (7), where $\omega_j$ is the weight coefficient of the $j$-th feature, with $\omega_1 = 0.3$, $\omega_2 = 0.3$, $\omega_3 = 0.3$, $\omega_4 = 0.1$ and $\omega_5 = 0.5$. The type of load curve on the forecast day is the type $Z$ that minimizes $d(x_i, V)$, as shown in Equation (8):

$$ Z = \arg\min_i \, d(x_i, V) \quad (8) $$

where $i$ runs over all categories of the historical load curves, and $Z$ is the sequence number of the load curve $x_i$ that minimizes $d(x_i, V)$.
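The hidden-layer sizing rule and the category selection can be sketched in Python as follows. The sketch rests on explicit assumptions: the sizing rule is the common empirical formula $H = \sqrt{M + N} + a$ with the square root rounded to the nearest integer, and the distance $d(x_i, V)$ is taken to be a weighted sum of absolute feature differences, which is only one plausible form because the exact distance expression is not reproduced in this text. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Weight coefficients as quoted in the text (w1..w5).
WEIGHTS = (0.3, 0.3, 0.3, 0.1, 0.5)


def hidden_neurons(m: int, n: int, a: int) -> int:
    """Empirical hidden-layer size; rounding the square root is an assumption."""
    if not 0 <= a <= 10:
        raise ValueError("a is expected to lie in [0, 10]")
    return round(math.sqrt(m + n)) + a


def weighted_distance(centre: Sequence[float], v: Sequence[float],
                      weights: Sequence[float] = WEIGHTS) -> float:
    """ASSUMED form of d(x_i, V): weighted sum of absolute differences."""
    if not len(centre) == len(v) == len(weights):
        raise ValueError("centre, v and weights must have the same length")
    return sum(w * abs(c - f) for w, c, f in zip(weights, centre, v))


def select_category(centres: Sequence[Sequence[float]],
                    v: Sequence[float]) -> int:
    """Return Z, the 1-based index of the class centre closest to V."""
    distances: List[float] = [weighted_distance(c, v) for c in centres]
    return min(range(len(distances)), key=distances.__getitem__) + 1
```

Whatever the exact distance, the selection step itself is just an arg-min over the class centres, so swapping in a different `weighted_distance` leaves `select_category` unchanged.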

Time-segment BP neural network prediction steps based on improved MDSC
The improved MDSC algorithm is first used to cluster the load data of this region from 2017/12/01 to 2019/02/23, taking the power consumption characteristics of the load into account. The closer a date is to the forecast day, the better its data reflect the load characteristics of that day. Therefore, the data of the 100 days preceding the forecast day are selected as training data. To reduce the complexity of model training and simplify calculation, the initial load data $x_{ik}$ are first normalized according to

$$ y_{ik} = \frac{x_{ik} - x_{\min}}{x_{\max} - x_{\min}} $$

where $y_{ik}$ is the normalized load data; $x_{ik}$ is the load data of the $i$-th sample at time $k$; $x_{\max}$ is the maximum load data in the sample after the initial processing; and $x_{\min}$ is the minimum load data in the sample after the initial processing.
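A minimal Python sketch of this normalization and of the inverse step applied to the network output is given below. It assumes standard min-max scaling with the extrema taken over the whole sample set; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(samples: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (x_min, x_max) over all points of all sample curves."""
    flat = [value for row in samples for value in row]
    return min(flat), max(flat)


def normalize(samples: Sequence[Sequence[float]],
              x_min: float, x_max: float) -> List[List[float]]:
    """Map every load value into [0, 1] by min-max scaling."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("x_max and x_min must differ")
    return [[(value - x_min) / span for value in row] for row in samples]


def denormalize(values: Sequence[float],
                x_min: float, x_max: float) -> List[float]:
    """Inverse normalization: map model outputs back to load units."""
    return [value * (x_max - x_min) + x_min for value in values]
```

The same `x_min` and `x_max` obtained from the training samples would be reused when converting the predicted values back to load units.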
Step 1: Calculate the similarity degree $n_{ij}$ and deviation degree $m_{ij}$ of $x_i$ and $x_j$ for every load curve $x_i$ in the samples according to the formulas, then traverse $x_i$ and judge the category according to the MDSC to obtain the clustering result $R$ and the number of categories $r$.
Step 2: Calculate $S(x_i)$ and $D(x_i)$ for each of the $r$ classes, and find the class centre $x_i$ that minimizes the value of $D(x_i)$.
Step 3: Initialize the BP network and set the parameters of the neural network: the maximum number of iterations is 20, and the target value is 0.00001.
Step 4: Normalize the historical load data.
Step 5: Input the characteristic quantities $v_1$, $v_2$, $v_3$, $v_4$ and $v_5$ of the load data of the 100 days before the forecast day, and output the feature vector $V$ of the forecast day.
Step 6: Calculate the distance $d(x_i, V)$ between the forecast-day feature vector $V$ and each class centre $x_i$, using the classes $S(x_i)$ and centres $x_i$ obtained in Step 2.
Step 7: Determine the similar category $Z$ of the forecast day.
Step 8: Input the load data of category $Z$ into the BP neural network as training data and calculate the output error. If the output error meets the accuracy requirement, go to the next step; otherwise repeat Step 8.
Step 9: Collect the weather data of the forecast day and derive the weather type index of each period from them. Use the BP neural network model of each period to predict the load of that period and output the prediction result.
Step 10: Integrate the forecast results of all periods to obtain the 96-point load values of the forecast day.

Case verification and result analysis
The load data used in this paper come from a county power supply company of the State Grid in Henan Province. The sampling interval is 15 min, giving 96 sampling points per day. The weather data were collected from meteorological stations in this region with a sampling interval of 1 h, giving 24 sampling points per day, and were then processed by cubic spline interpolation. The load data of the region from 2017/12/01 to 2019/02/23 were selected, and the 96-point load curves of 2019/02/24 to 2019/02/26 were predicted with the method proposed in this paper.

Clustering results analysis of improved MDSC
The improved MDSC algorithm was used to cluster the 450 days of historical sample data selected in this area. The clustering produced 52 categories, of which the first 6 accounted for 89% of all samples, 401 days in total. The remaining 46 categories each contain only one day (or two days) of data. The specific clustering results are shown in Table 2 and Figure 2. From the data in Table 2, it can be seen that the first category contains 193 curves and corresponds to working days; the second contains 108 curves and corresponds to weekends; the third contains 40 curves and corresponds to the days before and after holidays; the fourth contains 33 curves and corresponds to national holidays; the fifth contains 20 curves and corresponds to other holidays, such as foreign holidays like Christmas; and the sixth contains the fewest curves, 7, and corresponds to major maintenance days. Further, to obtain the load-level feature vector $V$ of the forecast day, the feature quantities $v_1$, $v_2$, $v_3$, $v_4$ and $v_5$ of the load data of the 100 days before the forecast day are input to the BP neural network, and the feature vector $V$ of the forecast day is obtained as output. The distance $d(x_i, V)$ between $V$ and each class centre $x_i$ is then calculated. According to the above clustering results, only the first 6 categories are treated as valid for this calculation, and the similar category $Z$ that minimizes $d(x_i, V)$ is determined. The calculated distances $d(x_i, V)$ are shown in Table 3. According to Table 3, for the first forecast day the distance between $V$ and the second category is the smallest, so the load data of the second category are selected as training input samples of the BP neural network for that day. For the second and third forecast days, the distance between $V$ and the first category is the smallest, so the load data of the first category are selected as training input samples of the BP neural network.
Figure 2. Comparison of load forecast results on the first day.

Establishing the time-segment BP neural network model
Once the training data have been obtained with the above clustering method, a time-segment BP neural network model is established to distinguish the data of different periods within the same category of days. The load data of each period are trained according to the influence index of weather factors. In the prediction process, the weather type fluctuation index is obtained from the real-time meteorological data of the forecast day, the weather factor influence index of similar time periods is looked up in the historical data, and these are input to the trained neural network model, which yields a neural network prediction model for each of the seven periods. Finally, the forecast data of all periods are integrated to obtain the load data of the forecast day.

Comparison of prediction results and error analysis
To verify the prediction performance of the improved MDSC-BP time-segment integrated prediction method, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are used as evaluation indexes:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$

$$ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% $$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of samples. The MDSC-BP prediction model is selected as the baseline for comparison with the improved MDSC time-segment BP neural network integrated prediction model. Both methods were used to predict the 96-point load curves of a certain area of Henan Province from 2019/02/24 to 2019/02/26. The prediction results are compared in Figures 2-4, and the prediction performance indexes are shown in Table 4. As Figures 2-4 show, the improved MDSC-BP time-segment integrated prediction method proposed in this paper performs well both in the stability of the predicted value at individual points and in the overall prediction effect, whereas the MDSC-BP method predicts peaks that are too high or too low and then deviates from the actual load. The reason is that the improved MDSC is used for clustering before prediction, which distinguishes the load characteristics and keeps the values stable. In addition, real-time meteorological factors are taken into account, and the BP neural network prediction is carried out separately in different periods, so that uncertain factors do not have too great an influence on the prediction model. This demonstrates the effectiveness of the improved MDSC-BP time-segment integrated prediction method. Table 4 lists the performance indexes of the two forecast methods; the improved method has a better prediction effect on the different categories of days.
RMSE measures the difference between the predicted and actual values. For the improved MDSC-BP time-segment integrated prediction method, the RMSE values on the first, second and third forecast days were 2.21, 2.03 and 1.79, whereas the RMSE values of the MDSC-BP method were all above 5, which shows that the improved method has stronger nonlinear fitting ability and higher prediction accuracy. The MAPE of the improved method is below 2%, while that of the MDSC-BP method is around 5%. These performance indexes show that the improved MDSC-BP time-segment integrated prediction method has a lower overall prediction error and a better effect.
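For reference, the two evaluation indexes can be computed with a few lines of Python. This is a minimal sketch using the standard definitions of RMSE and MAPE (MAPE expressed in percent); the function names are hypothetical and the numbers in the usage note are toy values, not results from this study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted loads."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((y - p) / y) for y, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios) * 100.0
```

For example, with actual loads `[100.0, 200.0]` and predictions `[110.0, 190.0]`, `rmse` gives 10.0 and `mape` gives approximately 7.5.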

Conclusion
This paper addresses the low accuracy, or even failure, of prediction methods under multi-stage load changes and weather fluctuations. To improve the accuracy of load forecasting, a time-segment BP neural network integrated forecast method based on the improved MDSC is proposed, which integrates historical load data, meteorological data and other multi-source data. The case verification shows that the prediction error is greatly reduced, the prediction accuracy is improved and the training time of the prediction model is shortened, which verifies the rationality and practicability of the method. Finally, because different regions have different regional and load characteristics, the prediction method should be selected according to local conditions to improve its prediction accuracy.