Wind speed forecasting based on wavelet decomposition and wavelet neural networks optimized by the Cuckoo search algorithm

Wind speed forecasting is of great importance for wind farm management and plays an important role in grid integration. Wind speed is volatile in nature and therefore it is difficult to predict with a single model. In this study, three hybrid multi-step wind speed forecasting models are developed and compared, both with each other and with earlier proposed wind speed forecasting models. The three models are based on wavelet decomposition (WD), the Cuckoo search (CS) optimization algorithm


Introduction
Owing to the environmental problems caused by fossil fuels, renewable energy plays an important role in the energy sector because of its sustainability. Among the renewable energy sources, wind energy has gained popularity recently. Currently, it supplies about 4% of the total electrical energy demand worldwide (GWEC 2018), a figure that is expected to reach 15%-18% by 2050 (Philbert et al. 2013). China is the largest wind energy producer, with an installation capacity of 188 232 MW (GWEC 2018).
The optimal performance of a wind farm depends mainly on the quality of the wind speed forecasting within the wind farm. However, wind speed has strong randomness and volatility, which makes forecasting difficult. Accurate prediction of wind speed is important in wind power generation and the safe operation of wind farms. Therefore, wind speed-related forecasting techniques have become a focal point in the wind energy sector (Salcedo-Sanz et al. 2009).
Early studies used different techniques to address the forecasting of wind speed. These methods include physical methods (Landberg 1999; Negnevitsky and Potter 2006; Lange and Focken 2009), statistical methods (Ma et al. 2009), and a combination of the two (Zhang et al. 2014). Currently, artificial neural networks (ANNs) are widely used in wind speed forecasting, including the multi-layer perceptron (Madhiarasan and Deepa 2017), radial basis function networks, and recurrent neural networks (Shao, Deng, and Jiang 2018; Shi, Liang, and Dinavahi 2018), amongst others. Early studies also showed that artificial intelligence (AI) methods appear to be more accurate than traditional statistical models. As an AI technique, ANNs have been widely applied in different forecasting fields. The ANN method requires only one input parameter (e.g. wind speed) that is suitable for online measurement. Compared with other methods, the ANN method has higher error tolerance. Proposed by Rumelhart, Hinton, and Williams (1986), the back propagation neural network (BPNN) is a typical ANN method that, according to mathematical theory, can implement any complex nonlinear mapping and approximate an arbitrary nonlinear function with satisfactory accuracy (Zhang et al. 2008). However, ANN models that use sigmoid functions as the transfer function easily fall into local minima (Alexandridis and Zapranis 2013), resulting in lower accuracy of wind speed forecasting.
In addition, traditional ANNs use the gradient descent method to calculate the input weights, output weights, and hidden-node bias, which can converge slowly and tends to get stuck in local minima. To overcome these drawbacks, Zhang and Benveniste (1992) introduced the wavelet neural network (WNN) as an alternative to the traditional ANN. The WNN is a single-hidden-layer feed-forward neural network (SLFN) that applies a wavelet as the activation function instead of the classical sigmoid function. This technique has been shown to improve wind speed forecasting compared with traditional ANNs (Falamarzi et al. 2014; Chitsaz, Amjady, and Zareipour 2015). Recently, Wu, Wang, and Chi (2013) and Zhao et al. (2016) introduced AI for parameter determination, among which bionic heuristic algorithms seem the most promising, including particle swarm optimization (PSO) (Liu et al. 2018), genetic algorithms (GAs) (Zheng et al. 2018), and more.
Early studies on combining wavelet analysis and ANNs adopted the WD-ANN structure (Chandra et al. 2014; Liu et al. 2013; Babu and Arulmozhivarman 2013) or a WNN (Falamarzi et al. 2014; Chitsaz, Amjady, and Zareipour 2015). Here, we use the Cuckoo search (CS) algorithm, proposed by Yang and Deb (2009), as an optimizing technique. CS is a relatively new metaheuristic optimization algorithm that has been widely and successfully used in optimizing the parameters of models and in other practical problems (Babu and Arulmozhivarman 2013). CS algorithms have fewer parameters requiring fine tuning, and a stronger ability to search for optimal solutions of multimodal objective functions, than PSO, GAs, and other algorithms. Three hybrid multi-step forecasting models, which are combinations of the CS algorithm, wavelet analysis, and an ANN, are introduced and validated with observations from two wind farms for the winter and summer periods. These three models are named the CS optimized WD-ANN (CS-WD-ANN) model, the CS optimized WNN (CS-WNN) model, and the CS optimized WD-WNN (CS-WD-WNN) model.
We explain the methods and data used in this study in section 2. In section 3, we present the results of the simulation from the three hybrid models, and compare them with earlier proposed forecasting models. Finally, we close the paper with a discussion and conclusion in section 4.
2. Methods, study area, and data

2.1. Three multi-step forecasting methods combining WD, WNN, and CS
Wavelet analysis and WNNs have been widely applied in wind speed forecasting (Chandra et al. 2014; Doucoure, Agbossou, and Cardenas 2016). However, the accuracy of the forecasting result depends on the mechanisms used to determine the optimization parameters. Here, we develop three hybrid multi-step models that use the CS optimization algorithm for parameter determination (Yang and Deb 2009; Valian and Valian 2013). The three models represent three different combinations of WD, ANN, and WNN, each with the CS algorithm as the optimization tool.

CS-WD-ANN
One widely used method to construct forecasting models for complex data series like wind speed is to decompose the original series into several sub-series, and then establish sub-forecasting models. Applying this concept, the CS-WD-ANN model discussed in this study employs wavelet analysis for series decomposition. A series $x$ that undergoes WD into approximation series $A_j$, $j = 1, 2, \ldots, n$, and detail series $D_j$, $j = 1, 2, \ldots, n$, can be represented by

$$x = A_n + \sum_{j=1}^{n} D_j. \tag{1}$$

The decomposition is stopped at a level determined by an empirical relation (Doucoure, Agbossou, and Cardenas 2016; Siwek and Osowski 2012) between $\mathrm{SD}(A_j)$ and $\mathrm{SD}(x)$, where SD stands for the standard deviation, $A_j$ is the approximation component at the $j$th decomposition level, and $x$ is the original series. After the wavelet-based decomposition, the CS-WD-ANN method establishes a neural network for each sub-series (Figure 1(a)). The SLFN method is used as a sub-forecasting model, in which the parameters are optimized by the CS algorithm. Let $N_j$ be the sub-forecasting model of $D_j$, $j = 1, 2, \ldots, n$, and $N_{n+1}$ be the sub-forecasting model of $A_n$. The multi-step forecasting result is denoted as the vector $\hat{x}$, expressed by

$$\hat{x} = \sum_{j=1}^{n} N_j(D_j) + N_{n+1}(A_n). \tag{3}$$
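The decompose, forecast, and recombine idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the text does not name the mother wavelet, so the sketch assumes a Haar multiresolution split (block means), and the names `haar_mra`, `reconstruct`, and `combine_forecasts` are hypothetical.

```python
def haar_mra(x, levels):
    """Additive Haar split (assumed wavelet): x = A_n + D_1 + ... + D_n."""
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx_prev = list(x)  # A_0 is the series itself
    details = []
    for j in range(1, levels + 1):
        block = 2 ** j
        approx = []
        for start in range(0, n, block):
            mean = sum(x[start:start + block]) / block
            approx.extend([mean] * block)  # A_j: block means, piecewise constant
        # D_j is what is lost when going from A_{j-1} to the coarser A_j
        details.append([p - a for p, a in zip(approx_prev, approx)])
        approx_prev = approx
    return approx_prev, details


def reconstruct(approx, details):
    """Add the coarsest approximation and every detail series back together."""
    out = list(approx)
    for d in details:
        out = [o + v for o, v in zip(out, d)]
    return out


def combine_forecasts(sub_forecasts):
    """Sum the sub-model forecasts element-wise to get the final forecast."""
    return [sum(values) for values in zip(*sub_forecasts)]
```

Because the detail series telescope, `reconstruct(*haar_mra(x, levels))` returns the original series up to rounding, which is the property that lets the sub-forecasts be summed into one forecast.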

CS-WNN
A CS-optimized WNN model, named CS-WNN, is established. The structure of the model is shown in Figure 1(b). Let the input and output of the WNN be $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$, respectively. The output of the $j$th hidden-layer neuron can be expressed as

$$h(j) = h_j\!\left(\frac{\sum_{i=1}^{n} \omega_{ij} x_i - b_j}{a_j}\right), \quad j = 1, 2, \ldots, l, \tag{4}$$

where $l$ is the number of hidden-layer neurons; $\omega_{ij}$ denotes the weights from input layer to hidden layer; $h_j$ is the wavelet function of the $j$th hidden-layer neuron; and $a_j$ and $b_j$ are the scale and position factors, respectively. In this study, the Gaussian second-derivative wavelet, with its argument shifted by $b$ and scaled by $a$, is applied as the activation function, where $a$ and $b$ are the scale and position factors, respectively. The network weights and the scale and position factors of the activation functions are optimized by the CS algorithm. The output of the WNN can be calculated by

$$y(k) = \sum_{i=1}^{l} \omega_{ik}\, h(i), \quad k = 1, 2, \ldots, m, \tag{6}$$

where $l$ and $m$ denote the numbers of hidden-layer neurons and output-layer neurons, respectively; $\omega_{ik}$ is the weight from hidden layer to output layer; and $h(i)$ is defined in Equation (4).

Figure 1. Structure of the three hybrid models: (a) CS-WD-ANN, (b) CS-WNN, and (c) CS-WD-WNN. In (a) and (c), the original series passes through Step 1 (wavelet decomposition), Step 2 (sub-forecasts with CS in (a); sub-forecasts with WNN optimized by CS in (c)), and Step 3 (reconstruction) to give the final result.
The output error of the WNN, computed over the $M$ samples from the desired output $y_n(k)$ and the forecast value $y(k)$, serves as the fitness function in the CS algorithm.
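A forward pass through such a network can be sketched as follows. The sketch rests on assumptions that should be read as such: the activation is taken to be the unnormalized Mexican hat form of the Gaussian second derivative, and the fitness is taken to be a sum of absolute errors, which is only one plausible reading of the output error. The function names are hypothetical.

```python
import math


def mexican_hat(t):
    """Gaussian second derivative up to sign and scale (assumed form)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wnn_forward(x, w_in, a, b, w_out):
    """Forward pass of a single-hidden-layer wavelet network.

    x     : list of n inputs
    w_in  : l rows, each with n input-to-hidden weights
    a, b  : l scale factors and l position factors
    w_out : m rows, each with l hidden-to-output weights
    """
    hidden = []
    for w_row, a_j, b_j in zip(w_in, a, b):
        weighted_sum = sum(w * xi for w, xi in zip(w_row, x))
        # Shift by the position factor, divide by the scale factor
        hidden.append(mexican_hat((weighted_sum - b_j) / a_j))
    # Each output is a weighted sum of the hidden-layer outputs
    return [sum(w * h for w, h in zip(w_row, hidden)) for w_row in w_out]


def fitness(desired, forecast):
    """Sum of absolute errors over the samples (assumed error measure)."""
    return sum(abs(d - f) for d, f in zip(desired, forecast))
```

In an optimizer, every entry of `w_in`, `a`, `b`, and `w_out` would be flattened into one parameter vector, and `fitness` evaluated on the training samples would be the quantity to minimize.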

CS-WD-WNN
The wind speed data are decomposed by WD into sub-sequence waveforms with different frequencies. These sub-sequence waveforms are used as the input to the WNN (Figure 1(c)). The CS algorithm is employed to optimize the parameters of the WNN for each sub-sequence, and the sub-forecasts are reconstructed to obtain the final result.
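For readers unfamiliar with CS, the following is a minimal sketch of a Cuckoo search loop in the spirit of Yang and Deb (2009), written as a generic minimizer over a parameter vector. The population size, abandonment probability `pa`, step scale, bounds, and the greedy same-index replacement are illustrative assumptions, not the settings used in this study.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One Levy-distributed step length via Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(fitness, dim, n_nests=15, pa=0.25, step=0.01,
                  bounds=(-1.0, 1.0), iters=200, seed=0):
    """Minimize `fitness` over a box; returns (best_nest, best_score)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(z):
        return min(hi, max(lo, z))

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [fitness(nest) for nest in nests]
    for _ in range(iters):
        best = list(nests[scores.index(min(scores))])
        # Levy flights: propose a new egg for each nest, keep it if better
        for i in range(n_nests):
            cand = [clip(z + step * levy_step(rng) * (z - bz))
                    for z, bz in zip(nests[i], best)]
            cand_score = fitness(cand)
            if cand_score < scores[i]:
                nests[i], scores[i] = cand, cand_score
        # Abandon a fraction pa of nests, but never the current best
        best_idx = scores.index(min(scores))
        for i in range(n_nests):
            if i != best_idx and rng.random() < pa:
                nests[i] = random_nest()
                scores[i] = fitness(nests[i])
    best_idx = scores.index(min(scores))
    return nests[best_idx], scores[best_idx]
```

To tune a WNN this way, `fitness` would unpack the vector into weights, scale factors, and position factors and return the training error. Keeping the best nest out of the abandonment step means the best score never gets worse from one iteration to the next.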

Study area and data
The wind data used with the CS-WD-ANN, CS-WNN, and CS-WD-WNN models are collected at 80 m above ground level from the anemometer towers of two wind farms in Shandong Province, eastern China. The two wind farms are located at (36.03°N, 116.22°E) and (37.50°N, 120.97°E), respectively. The samples were obtained in summer and winter because, compared with spring and autumn, wind speed in winter and summer is more volatile and more difficult to predict. The data from the first wind farm for the periods 19-25 January 2014 and 1-7 July 2014 are referred to as case1winter and case1summer, respectively. The data from the second wind farm for the periods 5-11 February 2014 and 20-26 August 2014 are referred to as case2winter and case2summer, respectively. In each case, 696 data points with a 15-min interval are used.

Evaluation criteria
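We apply four criteria to evaluate the forecasting performance of the three models: the mean absolute error (MAE) (Equation (8)), the mean absolute percentage error (MAPE) (Equation (9)), the root-mean-square error (RMSE) (Equation (10)), and the Pearson correlation coefficient (r) (Equation (11)):

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|\hat{x}_t - x_t\right|, \tag{8}$$

$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{\hat{x}_t - x_t}{x_t}\right| \times 100\%, \tag{9}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{x}_t - x_t\right)^2}, \tag{10}$$

$$r = \frac{\sum_{t=1}^{T}\left(x_t - \bar{x}\right)\left(\hat{x}_t - \bar{\hat{x}}\right)}{\sqrt{\sum_{t=1}^{T}\left(x_t - \bar{x}\right)^2}\,\sqrt{\sum_{t=1}^{T}\left(\hat{x}_t - \bar{\hat{x}}\right)^2}}. \tag{11}$$

Here, $\hat{x}_t$ represents the forecast value of the corresponding observation $x_t$, $T$ is the number of forecast periods, and $\bar{x}$ and $\bar{\hat{x}}$ are the means of the observations and of the forecasts, respectively.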
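The four criteria can be computed directly from paired observations and forecasts. The sketch below uses their usual textbook definitions, which is an assumption about the exact form used here; the function names are illustrative.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and forecasts."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)
```

MAPE is undefined when an observation is exactly zero, so calm periods with zero wind speed would need to be excluded or handled separately.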

Results and comparative analysis
To validate the effectiveness of CS-WD-ANN, CS-WNN, and CS-WD-WNN, four wind speed series (presented in Figure S1) are used to perform multi-step wind speed forecasting in case1winter, case2winter, case1summer, and case2summer, respectively. It is important to note that the 601st to 696th data points (the last day of each case) are used as the test data in all cases. WD is used to decompose the original wind speed series into components of various frequencies (Figures S2-5) at decomposition level 6. Depending on the decomposition result, the developed models establish a CS-optimized SLFN for each sub-series. In this study, the number of neurons ($h$) for the hidden layer in all developed hybrid models is determined by

$$h = \sqrt{n + m} + a, \quad a \in [1, 10], \tag{12}$$

where $n$ and $m$ denote the numbers of input-layer neurons and output-layer neurons, respectively. According to the autocorrelation of the wind speed sequence, we choose the number of input-layer neurons to be 20. The number of output neurons is set to 1 (one-step), 3 (three-step), and 5 (five-step) ahead. The number of hidden-layer neurons is decided by using the empirical expression in Equation (12), which gives $h \approx 5 + a = 6, 7, 8, \ldots, 15$. After testing all values of $h$, 6 proved to achieve the highest prediction accuracy. Figure 2 and Figures S6-8 show the forecasting results for case1winter, case2winter, case1summer, and case2summer, respectively. In each case and prediction step, the plot that compares the results of the developed hybrid models with the observations is labeled (a), and the three scatterplots that compare each developed model with the observations are labeled (b), (c), and (d). Table 1 shows the mean errors for different experimental tests in case1winter and case2winter; Table S1 shows the mean errors for different experimental tests in case1summer and case2summer.
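The experimental setup above (20 lagged inputs, 1, 3, or 5 outputs, the last 96 points held out, and candidate hidden-layer sizes from the empirical rule) can be sketched as follows. Rounding the square root to the nearest integer is an assumption made to reproduce the range 6 to 15; the helper names are hypothetical.

```python
import math


def hidden_candidates(n_in, n_out):
    """Candidate hidden-layer sizes: rounded sqrt(n_in + n_out) plus a, a = 1..10."""
    base = round(math.sqrt(n_in + n_out))  # rounding is an assumption
    return [base + a for a in range(1, 11)]


def make_windows(series, n_in, n_out):
    """Build (inputs, targets) pairs: n_in past values predict the next n_out."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


def split_last_day(series, n_test=96):
    """Hold out the last n_test points (96 points = one day at 15-min steps)."""
    return series[:-n_test], series[-n_test:]
```

For a 696-point series, `split_last_day` leaves 600 points for training and holds out points 601 to 696, and `hidden_candidates(20, m)` gives 6 through 15 for each of m = 1, 3, and 5.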

Comparison among the three developed models
As shown in Table 1, Figure 2, and Figure S6, the prediction accuracy decreases as the forecast step increases for all developed models in the two cases (case1winter and case2winter). For example, the RMSE of CS-WD-WNN in case1winter for one-step, three-step, and five-step predictions is 0.3870, 0.5772, and 0.7697, respectively. Similar results are obtained for the other two models and for case2winter. Figure 2, Figure S6, and the error results in Table 1 indicate that CS-WD-WNN outperforms the other developed hybrid models, whereas CS-WD-ANN performs much worse than the other two. The values of MAE, MAPE, and RMSE for CS-WD-WNN are the smallest in both cases. For case1winter, the RMSE of CS-WD-WNN is lower than that of CS-WD-ANN and CS-WNN by 45.95% and 43.86%, respectively, for one-step, 40.39% and 30.52% for three-step, and 29.06% and 15.60% for five-step predictions. For case2winter, the RMSE is reduced by 41.04% and 16.72% for one-step, 47.77% and 28.09% for three-step, and 24.83% and 6.27% for five-step predictions.
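The percentage reductions quoted above are presumably relative differences with respect to the reference model's RMSE; that reading is an assumption, since the formula is not stated. A minimal helper, with a hypothetical name, would be:

```python
def rmse_reduction_percent(rmse_reference, rmse_model):
    """Percentage by which rmse_model is lower than rmse_reference."""
    if rmse_reference <= 0:
        raise ValueError("reference RMSE must be positive")
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference
```

With purely illustrative values that are not taken from Table 1, a reference RMSE of 0.8 and a model RMSE of 0.6 correspond to a reduction of 25%.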
The scatterplots in Figure 2(b-d) and Figure S6(b-d), and the calculated r values in Table 1, for both cases, indicate that the result of CS-WD-WNN is closer to the actual wind speed from the wind farms. The value of r in case1winter is greater than 0.98 (close to 1) in all prediction steps, while it is greater than 0.93 for one-step and three-step predictions in case2winter. In five-step prediction, the value of r is much smaller compared to the other two prediction steps for case2winter. As pointed out earlier, the forecasting accuracy reduces as the number of prediction steps increases.
Table S1 and Figures S7 and S8 lead to the same conclusion for summer as for winter. As the number of forecasting steps increases, the accuracy decreases. Also, comparing the three models, CS-WD-WNN still shows the highest accuracy. For case1summer, the RMSE of CS-WD-WNN is lower than that of CS-WD-ANN and CS-WNN by 24.37% and 20.77%, respectively, for one-step, 11.23% and 7.98% for three-step, and 34.45% and 16.32% for five-step predictions. For case2summer, the RMSE is reduced by 37.92% and 30.44% for one-step, 13.19% and 3.15% for three-step, and 31.04% and 22.35% for five-step predictions. The scatterplots (Figures S7(b-d) and S8(b-d)) and the r values (Table S1), for both cases, indicate that the result of CS-WD-WNN is closer to the actual wind speed from the wind farms.
We can see that the accuracy of each model in summer is not as good as that in winter; however, comparing the three models, CS-WD-WNN still performs the best.

Comparison of the developed hybrid models with other wind forecasting models
To further discuss the forecasting performance of the three hybrid models, this section compares them with earlier proposed models, including the BPNN, Autoregressive Integrated Moving Average (ARIMA), WNN, wavelet transformation decomposed Particle Swarm Optimization WNN (PSO-WD-WNN), and Persist methods. Specifically, the BPNN, ARIMA, and Persist models are the generally used benchmarks for very-short-term wind speed forecasting. Moreover, WNN and PSO-WD-WNN are related hybrid models corresponding to our three models. The results of the error comparison among the different models are displayed in Table 2, with the MAE, MAPE, RMSE, and r used as evaluation criteria. Figure 3 compares the performance of the developed hybrid models with earlier wind forecasting models for both cases, including the forecasting error. Most wind speed forecasting applications require multiple-step predictions. However, the effective prediction step size of statistical models usually does not exceed 3-5 steps. Here, for application purposes, we choose three-step forecasting for the comparison with the earlier proposed wind speed forecasting models.
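Of these benchmarks, the Persist model is the simplest to state. Assuming it refers to the usual persistence benchmark, in which every step ahead is forecast as the most recent observation, it can be sketched as follows; the function name is hypothetical.

```python
def persistence_forecast(history, steps):
    """Forecast every future step as the last observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * steps
```

For a three-step forecast, `persistence_forecast(history, 3)` returns the latest observation three times, which gives a baseline that any trained model should be expected to beat.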
From Table 2, the average MAE, RMSE, and MAPE for CS-WD-WNN are 0.4597, 0.5578, and 6.08%, respectively. These errors are the smallest among all the models. The box plots in Figure 3 show that the boxes of CS-WD-WNN are narrow, with forecasting errors distributed around zero and very few outliers compared with the other models. This indicates that the predicted result of the model is stable and close to the actual wind speed.
As shown in Table S2, the two cases (case1summer and case2summer) show that the forecasting performance of each model in summer is not as good as that in winter. The wind speed data of different seasons show different volatility and statistical characteristics, so even with the same model, the forecasting errors in winter and summer may differ. The results show that, even in summer, our proposed CS-WD-WNN model gives better forecasting results than the other models. In addition, the introduction of CS in the developed hybrid models improves the forecasting result compared with the previous models. From WNN to CS-WNN, the average RMSE is reduced by 12.69% and 8.56% (Table 2 and Table S2, respectively). This indicates that CS optimization in CS-WNN has an advantage over the gradient descent method in WNN. Comparing the CS-WD-WNN and PSO-WD-WNN models, CS also shows an advantage over PSO, with the average RMSE reduced by 14.25% and 2.86% from PSO-WD-WNN to CS-WD-WNN (Table 2 and Table S2, respectively).

Conclusion
In this study, we develop three hybrid multi-step wind speed forecasting models (CS-WD-ANN, CS-WNN, and CS-WD-WNN). The models combine wavelet analysis, SLFN, and CS. The CS-WD-ANN model employs wavelet analysis to decompose the original series into several sub-series, and then CS-optimized SLFN is applied to each sub-series. In CS-WNN, the CS algorithm is applied to the wavelet activation functions to determine the forecasting parameters. Lastly, in CS-WD-WNN, the original series is decomposed by wavelet analysis into sub-sequence waveforms with different frequencies. These waveforms are used as the input in the WNN. The CS algorithm is used to optimize the parameters of each WNN to obtain the final result. The observational data from two wind farms in Shandong Province, eastern China, are used as an input for the developed models.
The results show that CS-WD-WNN performs best among the three developed hybrid models, with the lowest values of MAE, MAPE, and RMSE, and the highest values of r in all cases. CS-WD-ANN performs worst among the three developed hybrid models in all cases. Furthermore, the models are compared with the earlier proposed wind forecasting models, including BPNN, Persist, ARIMA, WNN, and PSO-WD-WNN. CS-WD-WNN still outperforms all these models; its average MAE, MAPE, and RMSE are the lowest among the models compared. The use of the CS algorithm in our developed hybrid models improves the forecasting results relative to the other models. For example, comparison between WNN and our CS-WNN model shows that the errors in CS-WNN are smaller than in WNN. The CS algorithm also shows an advantage over PSO in the wind speed forecasting results of the two cases. The CS-WD-WNN model performs well in wind speed prediction, and its accuracy is higher than that of earlier proposed models.

Acknowledgments
We are grateful to the members of our research group who are not listed as coauthors, for their helpful discussions and comments.

Disclosure statement
No potential conflict of interest was reported by the authors.