Particle swarm optimized extreme learning machine for feature classification in power quality data mining

ABSTRACT This paper proposes an enhanced particle swarm optimization (PSO) with a craziness factor, combined with an extreme learning machine (ELM), for the feature classification of single and combined power quality disturbances. In the proposed method, the S-transform is applied for feature extraction, and PSO with a craziness factor is applied to adjust the input weights and hidden biases of the ELM. To test the effectiveness of the proposed approach, eight possible combinations of single and combined power quality disturbances are assumed in sampled form and the performance of the proposed approach is investigated. In addition, white Gaussian noise at different signal-to-noise ratios is added to the signals and the performance of the algorithm is analysed. The results indicate that the proposed approach can be effectively applied for the classification of power quality disturbances.


Introduction
A major challenge in the area of power systems is the monitoring and detection of power quality disturbances [1]. Power quality monitoring equipment should be capable of recognizing, capturing and classifying power quality disturbances [1,2]. In the past few years, researchers have focused on the development of feature selection and classification algorithms that can effectively detect power quality disturbances. Techniques such as the discrete wavelet transform, the S-transform and the Hilbert transform have been applied for feature extraction [3][4][5][6][7][8][9]. Analysis of the literature indicates that, among the available feature extraction techniques, the performance of the S-transform is attractive for signals with noise. The S-transform preserves the phase information during decomposition, as it uses a variable window length and employs the Fourier transform kernel [3]. With respect to classification algorithms, researchers have proposed rule-based, fuzzy-logic-based and neural-network-based approaches. Of these, the neural network based approaches have been widely applied for the classification of power quality disturbances [10][11][12][13][14][15][16]. Gradient-based learning algorithms, such as back-propagation and its variant, the Levenberg-Marquardt method, have been applied for training multilayer feed-forward networks. Even though the performance of these algorithms is satisfactory, their major problems are that they learn slowly, can get stuck in local minima and require an appropriate choice of stopping criteria, learning rate and number of learning epochs [17]. In recent years there has been growing research interest in the extreme learning machine (ELM), applied to single-hidden-layer feed-forward neural networks.
ELM, introduced by Huang et al., is a fast learning algorithm in which the input weights and hidden biases are randomly generated and the output weights are analytically determined using the least squares method. Due to its fast learning and good generalization ability, ELM has been successful in a wide variety of applications [18][19][20]. However, the random choice of input weights and biases creates uncertainty in regression and classification problems [19]. To alleviate this problem, researchers have focused on optimization techniques for choosing the number of neurons in the hidden layer and tuning the input weights and hidden biases [17,21,22]. The authors in [18] proposed a real-coded genetic algorithm to select the optimal number of hidden nodes, input weights and biases. The B-ELM algorithm [23] optimizes the output layer weights by Bayesian linear regression. An improved particle swarm optimization (PSO) algorithm has been proposed to enhance the performance of ELM [24]. From a detailed analysis of the literature [18-20,23-26], it is evident that optimization techniques have been applied to enhance the performance of ELM. Among optimization algorithms, several variants have been proposed with the objective of evading local minima and improving search performance. In this perspective, this work focuses on the application of a chaotic PSO-based ELM for feature classification. The method incorporates a craziness operator with chaotic weight update to maintain diversity among the particles.
The paper is organized into eight sections. Section 1 explains the significance of power quality disturbance classification and the merits and challenges of ELM-based classifiers. Section 2 details feature extraction using the S-transform, and Section 3 discusses the basics of the ELM classifier. Sections 4 and 5 focus on the principles of PSO and the enhanced PSO algorithm, respectively, while Section 6 details the application of chaotic PSO for adjusting the weights and biases of ELM. The results are analysed in Section 7, and concluding remarks are given in Section 8.

Feature extraction using S-transform
The S-transform [27] provides the information necessary for analysing and detecting power quality events [11]. The method combines the short-time Fourier transform and the wavelet transform: the S-transform is derived from the wavelet transform by adding a phase correction to the window function (mother wavelet). For a signal x(t), the S-transform is defined as

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, g_f(\tau - t)\, e^{-j 2\pi f t}\, dt, \qquad (1)

where g_f(t) is the Gaussian modulation function, defined as

g_f(t) = \frac{1}{\sigma(f)\sqrt{2\pi}}\, e^{-t^2 / 2\sigma^2(f)}, \qquad (2)

f denotes the linear frequency and the window width varies inversely with the frequency:

\sigma(f) = \frac{1}{|f|}. \qquad (3)

Substituting (2) and (3) in (1), the expression becomes

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}\, e^{-j 2\pi f t}\, dt. \qquad (4)

The discrete version is calculated from the fast Fourier transform. The discrete Fourier transform of the time series x(kT) is obtained as

H\!\left[\frac{n}{NT}\right] = \frac{1}{N} \sum_{k=0}^{N-1} x(kT)\, e^{-j 2\pi n k / N}. \qquad (5)

The discrete S-transform is obtained by letting f \to n/NT and \tau \to jT:

S\!\left[jT, \frac{n}{NT}\right] = \sum_{m=0}^{N-1} H\!\left[\frac{m+n}{NT}\right] e^{-2\pi^2 m^2 / n^2}\, e^{j 2\pi m j / N}, \quad n \neq 0. \qquad (6)

The discrete inverse of the S-transform can be obtained as

x(kT) = \sum_{n=0}^{N-1} \left\{ \frac{1}{N} \sum_{j=0}^{N-1} S\!\left[jT, \frac{n}{NT}\right] \right\} e^{j 2\pi n k / N}. \qquad (7)

The rows and columns of the complex matrix obtained as the output of the S-transform represent frequency and time, respectively. In this paper, the distinctive features belonging to each event signal have been extracted by applying the S-transform to the power quality event data. In comparison with other time-frequency analysis methods, the S-transform has better noise immunity, which makes it an efficient tool for recognizing power quality disturbances [28].
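As a rough illustration (not the authors' MATLAB implementation), the FFT-based computation of the discrete S-transform described above can be sketched in Python; the 50 Hz test tone, signal length and sampling rate are arbitrary choices:

```python
import numpy as np

def s_transform(x):
    """Discrete S-transform of a real signal via the FFT.

    Returns an (N//2, N) complex S-matrix whose rows index frequency
    and whose columns index time, matching the convention in the text.
    """
    N = len(x)
    H = np.fft.fft(x) / N                 # normalized spectrum H[n/NT]
    m = np.fft.fftfreq(N) * N             # signed frequency-shift indices m
    S = np.zeros((N // 2, N), dtype=complex)
    S[0, :] = np.mean(x)                  # n = 0 row: time average of the signal
    for n in range(1, N // 2):
        W = np.exp(-2.0 * np.pi**2 * m**2 / n**2)      # Gaussian window in frequency
        S[n, :] = np.fft.ifft(np.roll(H, -n) * W) * N  # voice at frequency n/NT
    return S

fs = 1600                                  # sampling rate in Hz (arbitrary)
t = np.arange(256) / fs
sig = np.sin(2 * np.pi * 50 * t)           # 50 Hz test tone
S = s_transform(sig)
row = np.argmax(np.abs(S[1:, :]).max(axis=1)) + 1
print(row * fs / len(sig))                 # dominant frequency -> 50.0 Hz
```

For a pure tone the energy concentrates in the S-matrix row corresponding to the tone's frequency, which is what makes the matrix useful for disturbance localization.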
[Figure: the simulated normal voltage signal (a), its time-maximum amplitude plot (b) and its frequency-maximum amplitude plot (c) [29].]
In this work, 20 features are extracted from the S-matrix for the classification of power quality events, as tabulated in Table 1. By carefully analysing the characteristics of the different disturbances, it is evident that for proper classification the features need to be extracted from time, frequency and amplitude, as well as from the high- and low-frequency regions separately [29]. The classification accuracy depends on the number of features and on their characteristics. The features have been obtained by applying standard statistical techniques to the S-matrix.
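Table 1's exact feature set is not reproduced here, but as an illustration of applying standard statistical measures to an S-matrix, a few representative quantities might be computed as follows (the random matrix below merely stands in for a real |S|-matrix, and the chosen statistics are generic examples, not the paper's 20 features):

```python
import numpy as np

rng = np.random.default_rng(5)

# stand-in for the magnitude S-matrix (rows: frequency, cols: time);
# in practice this comes from the S-transform of an event signal
A = np.abs(rng.normal(size=(64, 256)) + 1j * rng.normal(size=(64, 256)))

half = A.shape[0] // 2
features = [
    np.std(A.max(axis=0)),    # std of the time-maximum amplitude curve
    np.mean(A.max(axis=1)),   # mean of the frequency-maximum amplitude curve
    np.sum(A ** 2),           # total energy of the S-matrix
    np.std(A[:half, :]),      # amplitude spread in the low-frequency half
    np.std(A[half:, :]),      # amplitude spread in the high-frequency half
]
print(len(features))
```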

ELM classifier
ELM, proposed by Huang et al. [30], is a single-hidden-layer feed-forward network in which the input weights are chosen randomly and the output weights are calculated analytically. Sigmoid, sine, Gaussian and hard-limiting functions can be used as activation functions for the hidden layer, while a linear activation function is used for the output neurons [25]. ELM has several significant features that distinguish it from the traditional gradient-based learning algorithms for feed-forward neural networks: its advantages include faster learning speed, good generalization capability and avoidance of issues such as local minima [25].
Let us consider an ELM network with N observation samples \{P_i, Q_i\}, where P_i = [p_{i1}, \ldots, p_{in}]^T \in \mathbb{R}^n is the n-dimensional feature vector of sample i and Q_i = [q_{i1}, \ldots, q_{iC}]^T \in \mathbb{R}^C is its coded class label. If sample P_i is assigned to class label c_k, then the kth element of Q_i is one (q_{ik} = 1) and the other elements are −1. The output \hat{Q}_i of the ELM network with H hidden neurons and C distinct classes is defined as

\hat{Q}_i = U\, [G_1(P_i), \ldots, G_H(P_i)]^T, \qquad (8)

where W represents the H \times n input weights, B the H \times 1 biases of the hidden neurons and U the C \times H output weights. G_j(\cdot), the output of the jth hidden neuron, is defined as

G_j(P_i) = G(W_j P_i + B_j), \quad j = 1, \ldots, H, \qquad (9)

where G(\cdot) is the activation function and W_j denotes the jth row of W.
In the case of the radial basis function, the output of the jth neuron is defined as

G_j(P_i) = G\!\left(b_j \lVert P_i - W_j \rVert\right), \qquad (10)

where W_j and b_j are the centre and width of the radial basis function neuron.
In the case of the sigmoidal activation function, the output of the jth neuron is defined as

G_j(P_i) = \frac{1}{1 + e^{-(W_j P_i + B_j)}}, \qquad (11)

so that the network output can be written compactly as

\hat{Q} = U Q_H, \qquad (12)

where Q_H = [G(P_1), \ldots, G(P_N)] is the H \times N hidden-layer output matrix. In the ELM algorithm, for a given number of hidden neurons, the input weights W and biases B of the hidden neurons are selected randomly. By setting the predicted output \hat{Q} equal to the coded labels Q, the output weights are estimated analytically as

\hat{U} = Q\, Q_H^{+}, \qquad (13)

where Q_H^{+} is the Moore-Penrose generalized pseudoinverse of the hidden-layer output matrix.
The ELM algorithm consists of the following steps:
1. Select the number of hidden neurons and a suitable activation function for the given problem.
2. Randomly choose the input weights (W) and biases (B).
3. Analytically calculate the output weights (U).
4. Use the calculated weights (W, U, B) to estimate the class label.
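The four steps above can be sketched in a few lines of Python (a toy example on synthetic two-class data; the data, dimensions and sigmoid activation below are illustrative choices, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy two-class problem (sizes and data are illustrative, not the paper's setup)
N, n_feat, H, C = 200, 4, 25, 2
X = rng.normal(size=(N, n_feat))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
Q = -np.ones((C, N))
Q[labels, np.arange(N)] = 1.0              # coded labels: +1 for own class, -1 otherwise

# steps 1-2: H hidden neurons, sigmoid activation, random W and B in [-1, 1]
W = rng.uniform(-1, 1, size=(H, n_feat))
B = rng.uniform(-1, 1, size=(H, 1))
Q_H = 1.0 / (1.0 + np.exp(-(W @ X.T + B))) # hidden-layer output matrix, H x N

# step 3: output weights via the Moore-Penrose pseudoinverse
U = Q @ np.linalg.pinv(Q_H)                # C x H

# step 4: classify by the largest output component
pred = np.argmax(U @ Q_H, axis=0)
acc = np.mean(pred == labels)
print(f"training accuracy: {acc:.2f}")
```

Because step 3 is a single least-squares solve rather than an iterative descent, training is essentially instantaneous compared with back-propagation.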
In this work, the inputs to the ELM are the 20 features extracted from the S-matrix, while the output is the integer value of the class label.

Particle swarm optimization
In recent years, PSO has been applied to solve a wide variety of engineering problems [31]. In PSO, the particles constituting a swarm move around the search space to determine the best solution [32]. Each particle adjusts its movement considering its own experience as well as the experience of the other particles.
The velocity of the ith particle in a D-dimensional space is expressed as V_i = (V_{i1}, \ldots, V_{id}, \ldots, V_{iD}), the position of the ith particle is expressed as X_i = (X_{i1}, \ldots, X_{id}, \ldots, X_{iD}), the best position of the ith particle is expressed as Pbest_i and the global best position among all the particles is expressed as gbest.
The velocity and position of the particles are updated using Equations (14) and (15):

V_{id}^{k+1} = \omega V_{id}^{k} + c_1 r_1 (Pbest_{id} - X_{id}^{k}) + c_2 r_2 (gbest_{d} - X_{id}^{k}), \qquad (14)

X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}, \qquad (15)

where 1 \leq i \leq n and 1 \leq d \leq D.
In the above expression, ω is the inertia weight, c 1 and c 2 are learning coefficients, r 1 and r 2 are separately generated uniformly distributed random numbers in the range [0,1], k is the iteration number and n is the number of particles.
The following weighting function is usually used in Equation (14):

\omega = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{k_{\max}}\, k, \qquad (16)

where ω_max and ω_min are the initial and final weights, respectively, k is the current iteration number and k_max is the maximum iteration number. The parameter ω regulates the trade-off between the global and local exploration abilities of the swarm. The first term of the velocity update equation (14) reflects the memory of the particles, the second term their cognitive behaviour and the third their social behaviour. The convergence of the PSO algorithm is influenced by the proper selection of the parameters in the velocity update equation [33].
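For illustration, a minimal PSO with the linearly decreasing inertia weight described above can be written as follows (the sphere test function, search bounds and parameter values are placeholders, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(fitness, n=30, D=5, k_max=100, c1=1.5, c2=1.5, w_max=0.9, w_min=0.5):
    """Plain PSO with a linearly decreasing inertia weight."""
    X = rng.uniform(-5, 5, size=(n, D))                 # particle positions
    V = np.zeros((n, D))                                # particle velocities
    pbest = X.copy()
    pbest_f = np.apply_along_axis(fitness, 1, X)
    g = pbest[np.argmin(pbest_f)].copy()                # global best position
    for k in range(k_max):
        w = w_max - (w_max - w_min) * k / k_max         # decreasing inertia weight
        r1, r2 = rng.random((n, D)), rng.random((n, D))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (g - X)  # velocity update
        X = X + V                                       # position update
        f = np.apply_along_axis(fitness, 1, X)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = X[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

best, best_f = pso(lambda x: np.sum(x**2))  # sphere function, optimum at the origin
print(best_f)
```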

Enhanced PSO with craziness factor (crazy PSO)
To avoid being trapped in local optima, attempts have been made to improve the performance of PSO [34,35]. In this work, a craziness operator together with chaotic weight update is introduced into the conventional PSO algorithm to enhance its performance. In bird flocking or fish schooling, a bird or a fish frequently changes its direction; this phenomenon is modelled using a "craziness" factor [32]. To maintain the diversity of the particles, it is necessary to introduce a craziness operation in the PSO algorithm [36]. This is done by perturbing the velocity of the particles using

V_{id}^{k+1} = V_{id}^{k+1} + P(r_3)\, \mathrm{sign}(r_3)\, V_{cr}, \qquad (17)

and then applying the usual equations (14) and (15) to modify the position and velocity, where r_3 and V_{cr} are random numbers uniformly distributed between 0 and 1, and P(r_3) and sign(r_3) are defined, respectively, as

P(r_3) = \begin{cases} 1 & \text{if } r_3 \leq p_{craziness} \\ 0 & \text{otherwise} \end{cases}, \qquad \mathrm{sign}(r_3) = \begin{cases} -1 & \text{if } r_3 \geq 0.5 \\ 1 & \text{if } r_3 < 0.5, \end{cases}

where p_craziness is a predefined probability of craziness.
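The craziness perturbation can be sketched as below (p_craziness and the craziness-velocity scale are illustrative values chosen here; the paper does not specify them at this point):

```python
import numpy as np

rng = np.random.default_rng(4)
p_craziness, v_cr_scale = 0.3, 1.0   # illustrative values, not taken from the paper

def crazify(V):
    """Apply a craziness perturbation to a (particles x dims) velocity matrix."""
    r3 = rng.random(V.shape[0])                  # one draw per particle
    P = (r3 <= p_craziness).astype(float)        # P(r3): perturb this particle or not
    sgn = np.where(r3 >= 0.5, -1.0, 1.0)         # sign(r3)
    v_cr = rng.random(V.shape[0]) * v_cr_scale   # craziness velocity in [0, v_cr_scale)
    return V + (P * sgn * v_cr)[:, None]

V_new = crazify(np.zeros((5, 3)))
print(V_new)   # some particles receive a uniform kick, others are unchanged
```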
The inertia weight is an important control parameter that affects the convergence of PSO. Applying chaotic dynamics to update the inertia weight improves the searching behaviour and helps the swarm avoid entrapment in local optima [34,35]. In this context, a chaotic PSO approach based on the logistic equation is applied to determine the weight factor. The logistic equation is defined as

y(k+1) = \mu\, y(k)\, (1 - y(k)), \qquad (18)

where k is the sample index and μ is the control parameter, 0 < μ ≤ 4. The behaviour of the system represented by (18) changes greatly with the variation of μ: its value determines whether (18) stabilizes at a constant value, oscillates between a limited sequence of values, or behaves chaotically in an unpredictable pattern. The behaviour of the system is also sensitive to the initial value of y [37]. The system becomes deterministic, displaying chaotic dynamics, when μ = 4 and y(1) ∉ {0, 0.25, 0.5, 0.75, 1}. The parameter ω of (14) is then modulated by the chaotic sequence of (18):

\omega(k) = y(k) \left( \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{k_{\max}}\, k \right). \qquad (19)
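One possible reading of the chaotic weight update is sketched below; note that combining the logistic sequence with the linearly decreasing inertia weight is an assumption on our part, and all parameter values are illustrative:

```python
import numpy as np

def chaotic_weights(w_max=0.9, w_min=0.5, k_max=100, mu=4.0, y0=0.37):
    """Inertia weights modulated by the logistic map (mu = 4 gives chaos
    for y0 outside {0, 0.25, 0.5, 0.75, 1})."""
    y, w = y0, []
    for k in range(k_max):
        y = mu * y * (1.0 - y)                            # logistic map iteration
        # modulate the linearly decreasing weight with the chaotic sequence
        w.append(y * (w_max - (w_max - w_min) * k / k_max))
    return np.array(w)

w = chaotic_weights()
print(w.min(), w.max())   # the weights stay within [0, w_max]
```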

Crazy PSO-based ELM
In ELM, since the computation of the output weights is based on the input weights and hidden biases, their choice greatly influences the performance of ELM. The issues with the random generation of these parameters have been discussed in the literature [20]. In order to ensure a better generalization ability of ELM, this paper proposes chaotic PSO-based selection of the input weights and biases. The detailed steps of the proposed method are as follows:
• First, the swarm is randomly generated. Each particle in the swarm is composed of a set of input weights and hidden biases, with all components randomly initialized within the range [−1, 1].
• Second, for each particle, the corresponding output weights are computed and the fitness of the particle is evaluated. In this work, the fitness of each particle is the root mean square error over the training set.
• Third, using the fitness of all the particles, the Pbest_i and gbest of the swarm are updated.
• Fourth, each particle updates its position and velocity according to (14) and (15), and a new population is generated.
• Finally, the above optimization process is repeated until the maximum number of iterations is reached.
Thus, the enhanced PSO algorithm is used to optimize the input weights and hidden biases.
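The per-particle fitness evaluation described above (decode the particle into W and B, solve for the output weights analytically, score by training RMSE) might look like this in Python (toy data; all sizes are illustrative, not the paper's 20-feature problem):

```python
import numpy as np

rng = np.random.default_rng(2)

# toy training data (illustrative sizes)
n_feat, H, C, N = 4, 10, 2, 120
X = rng.normal(size=(N, n_feat))
labels = (X.sum(axis=1) > 0).astype(int)
Q = -np.ones((C, N))
Q[labels, np.arange(N)] = 1.0              # coded class labels

def fitness(particle):
    """Training-set RMSE for one particle, a flattened [W | B] vector."""
    W = particle[:H * n_feat].reshape(H, n_feat)
    B = particle[H * n_feat:].reshape(H, 1)
    Q_H = 1.0 / (1.0 + np.exp(-(W @ X.T + B)))   # hidden-layer output
    U = Q @ np.linalg.pinv(Q_H)                  # analytic output weights
    return np.sqrt(np.mean((U @ Q_H - Q) ** 2))  # RMSE over the training set

# each particle packs H*n_feat input weights plus H biases, initialized in [-1, 1]
swarm = rng.uniform(-1, 1, size=(30, H * n_feat + H))
best_rmse = min(fitness(p) for p in swarm)
print(best_rmse)
```

The PSO loop then simply moves the particles and re-evaluates this fitness each generation; only W and B are searched, since U always follows analytically.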

Results and discussion
In order to investigate the performance of the enhanced PSO-based ELM algorithm for the classification of power quality disturbances, around 1100 samples of 8 classes of disturbances are generated using MATLAB, with each waveform based on the models specified in reference [29]. As electric power systems operate in noisy environments, the sampled signals are accompanied by noise, and the proposed approach must therefore be analysed under noisy conditions. In this paper, noisy signals are generated for all 8 classes by adding white Gaussian noise at signal-to-noise ratios of 30, 40 and 50 dB.
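Generating noisy test signals at a prescribed SNR can be done with a generic additive-white-Gaussian-noise helper such as the one below (the 50 Hz sinusoid and sampling rate merely stand in for the disturbance models of [29]):

```python
import numpy as np

rng = np.random.default_rng(3)

def add_awgn(x, snr_db):
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(scale=np.sqrt(p_noise), size=x.shape)

t = np.arange(0, 0.2, 1 / 3200)            # 0.2 s at 3200 Hz (illustrative)
clean = np.sin(2 * np.pi * 50 * t)          # placeholder for a disturbance waveform
for snr in (30, 40, 50):
    noisy = add_awgn(clean, snr)
    measured = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
    print(f"target {snr} dB, measured {measured:.1f} dB")
```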
The proposed PSO-ELM approach has been implemented in MATLAB, and the percentage classification accuracy for the training and testing samples is tabulated in Tables 2 and 3, respectively. The results are presented for an ELM structure with 100 hidden neurons and the sine activation function. The parameter settings for PSO are as follows: 50 particles, acceleration constants c_1 = 1.5 and c_2 = 1.5, inertia weights ω_max = 0.9 and ω_min = 0.5, and a maximum of 100 iterations. These parameters were selected carefully for efficient performance of the algorithms.
From the tables, it is evident that the proposed approach achieves better classification accuracy than both the conventional ELM approach and the PSO-based ELM approach. Further, the classification accuracy of the proposed algorithm is superior to that of the other algorithms for the signals with disturbances.
The comparison with the support vector machine (SVM) and the back-propagation neural network presented in Table 4 illustrates the superiority of the proposed classifier. The results are obtained by choosing the structure of the back-propagation neural network to be similar to that of the ELM model, while in the SVM the radial basis function kernel is used, with the softness factor and the kernel function parameter set to 5 and 4, respectively.

Conclusion
This work envisages the application of craziness-based PSO to enhance the performance of ELM for feature classification of single and combined power quality disturbances. The input weight and hidden biases are adjusted using the optimization technique. The proposed technique is compared with PSO-based ELM and conventional ELM to highlight the classification accuracy. The results reveal that the proposed approach has superior classification accuracy under normal and noisy conditions. The fact that the proposed classifier results in better classification accuracy in comparison with the SVM and back propagation neural network augurs well for its application in power quality event detection and classification. The addition of a regression network along with the proposed classifier to indicate the severity of the disturbance will help in initiating the corrective measures depending upon the nature and magnitude of the disturbance.

Disclosure statement
No potential conflict of interest was reported by the authors.