Fault diagnosis of transformer based on fuzzy clustering and the optimized wavelet neural network

ABSTRACT To overcome the disadvantages of the traditional wavelet neural network (WNN) algorithm in transformer fault diagnosis, such as uneven distribution of training samples, high diagnostic error rate and long training time, an improved fault diagnosis method based on fuzzy clustering and the flower pollination algorithm is proposed. Firstly, fuzzy clustering is applied to the transformer fault sample data to remove bad data; secondly, the flower pollination algorithm is applied to obtain the optimal parameters of the WNN. The example analysis shows that the WNN based on the flower pollination algorithm (FPA-WNN) has better convergence, a lower diagnostic error rate and a shorter training time than the WNN based on the particle swarm optimization algorithm (PSO-WNN), and is more suitable for transformer fault diagnosis.


Introduction
The power transformer is the core equipment in power transmission and distribution, and the safe and reliable operation of power transformers has an important impact on the power grid and the national economy. Diagnosing and predicting transformer faults and latent faults accurately and quickly is therefore significant for the stable operation of the power system.
Dissolved gas analysis (DGA) technology is one of the most convenient and effective methods for fault diagnosis of oil-immersed transformers. It can accurately and reliably diagnose latent faults which may cause serious damage. In recent years, various criteria for transformer fault diagnosis have been proposed at home and abroad using the DGA gas ratio as a characteristic parameter, such as the IEC ratio (Yang, Liu, Li, & Hu, 2007) and the improved Rogers ratio (Rogers, 1978). However, most of these fault diagnosis criteria are based on field experience, and misdiagnosis occurs in practical applications.
Artificial intelligence (AI) technology has been widely used in transformer fault diagnosis because it can establish the complex nonlinear relationships between DGA gas content and transformer faults, and it has the advantages of continuous learning and updating compared with the conventional DGA 'hard criteria'. Among AI methods, the support vector machine and the artificial neural network are widely used in transformer fault diagnosis and have achieved good results. Xue, Zhang, Li, and Peng (2015) proposed a method which optimized the parameters of the support vector machine using the cuckoo algorithm to obtain the best diagnosis model. However, the method was prone to falling into a local optimum. Ma (2008) proposed a method based on the genetic algorithm and WNN, but the method had the disadvantages of a relatively complicated network structure, long training time and low accuracy. Song and Wang (2015) proposed a method which optimized the weights and scaling factors of WNN using the PSO algorithm, but the method also had the disadvantages of high diagnostic error rate and long training time. Moreover, there were fault diagnosis methods based on grey theory (Li, Sun, Chen, Zhou, & Du, 2003), Bayesian classifiers (Wang, Zhang, Jin, & Guo, 2018) and the expert system (Shi, Shi, Mu, Li, & Liu, 2014), but most of them had the disadvantages of overfitting, long training time and low diagnostic accuracy.
CONTACT Maofa Gong sdgmf@163.com
In this paper, an improved fault diagnosis method is proposed based on fuzzy clustering and the flower pollination algorithm. Firstly, the fuzzy c-means clustering algorithm (FCM) is used to process the collected fault sample data in order to filter out isolated samples and avoid an uneven distribution of the samples, which would otherwise reduce the accuracy of transformer fault diagnosis. Secondly, the flower pollination algorithm is applied to obtain the optimal parameters of the WNN. Finally, the effectiveness of the method in transformer fault diagnosis is verified by a simulation experiment. The theory and simulation results show that the proposed method has global optimization capability and a high accuracy rate for transformer fault diagnosis.

Fuzzy C-means clustering algorithm
As a typical clustering algorithm, FCM has been widely used in engineering and scientific fields, such as medical imaging, bioinformatics, pattern recognition and data mining. FCM categorizes a given data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ into $c$ fuzzy subsets by minimizing the following objective function:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \|x_k - v_i\|^2 \tag{1}$$

where $c$ is the number of clusters and is selected as a specified value in this paper, $n$ is the number of data points, $u_{ik}$ is the membership of $x_k$ in class $i$, $m$ is the quantity controlling clustering fuzziness and $V$ is the set of cluster centres ($v_i \in \mathbb{R}^p$). The matrix $U$ with the $ik$th entry $u_{ik}$ is constrained to contain elements in the range $[0, 1]$ such that $\sum_{i=1}^{c} u_{ik} = 1$, $\forall k = 1, 2, \ldots, n$. The function $J_m$ is minimized by a well-known alternating iterative algorithm.
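The alternating iteration mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fcm`, the random initialization, the default `m = 2`, the stopping tolerance and the toy data are all choices made here for the example. The two update rules are the standard ones for fuzzy c-means.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means: alternate between updating centres V and memberships U."""
    rng = random.Random(seed)
    n, p = len(points), len(points[0])
    # Random initial memberships, normalised so that each column sums to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    centres = []
    for _ in range(iters):
        # Centre update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centres.append(
                [sum(w[k] * points[k][t] for k in range(n)) / total for t in range(p)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        shift = 0.0
        for k in range(n):
            dist = [
                max(sum((points[k][t] - centres[i][t]) ** 2 for t in range(p)) ** 0.5, 1e-12)
                for i in range(c)
            ]
            for i in range(c):
                new = 1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                shift = max(shift, abs(new - u[i][k]))
                u[i][k] = new
        if shift < tol:  # memberships have stopped changing
            break
    return u, centres


# Toy data (hypothetical, two well-separated groups).
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
memberships, centres = fcm(toy, c=2)
```

Each column of `memberships` sums to 1, matching the constraint on $U$.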

The introduction of WNN
WNN (Chen, Pan, Wang, & Yun, 2007) is a new network based on wavelet transforms. The architecture of the WNN is based on a multilayer perceptron (MLP), with the discrete wavelet function used as the node activation function. Because the wavelet space is used as the characteristic space of pattern recognition, the characteristic extraction of the signal is realized by the weighted sum of the inner products of the wavelet bases and the signal vector. Furthermore, because it combines the time-frequency localization of the wavelet transform with the self-learning ability of the neural network, the network has good approximation capability and robustness.
The topology of the WNN is shown in Figure 1. The number of input layer nodes is $m$ ($k = 1, 2, \ldots, m$); the number of hidden layer nodes is $n$ ($j = 1, 2, \ldots, n$); the number of output layer nodes is $N$ ($i = 1, 2, \ldots, N$); the $k$th input of the input layer is $x_k$; the actual output value of the $i$th node of the output layer is $y_i$; the expected output value of the $i$th node of the output layer is $\hat{y}_i$; the connection weight between the input layer node $k$ and the hidden layer node $j$ is $w_{kj}$ and the connection weight between the hidden layer node $j$ and the output layer node $i$ is $w_{ji}$; the dilation and translation coefficients of the $j$th hidden layer node are $a_j$ and $b_j$. The hidden layer wavelet neurons use the Mexican Hat wavelet function $\psi$, and the output layer nodes use the Sigmoid function $f$.
The input of the $j$th wavelet element of the hidden layer is

$$net_j = \sum_{k=1}^{m} w_{kj} x_k \tag{2}$$

The output of the $j$th wavelet element of the hidden layer is

$$h_j = \psi\left(\frac{net_j - b_j}{a_j}\right) \tag{3}$$

Then the output of the $i$th node of the network output layer is

$$y_i = f\left(\sum_{j=1}^{n} w_{ji} h_j\right) \tag{4}$$

The network output error function is

$$E = \frac{1}{2} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2 \tag{5}$$

Therefore, once the numbers of input, hidden layer and output nodes are determined, the key to constructing a suitable WNN is to determine the parameters of the network. The selection of these parameters depends on the WNN training algorithm, and thus it is especially important to find a suitable one.
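The forward pass described above can be sketched as follows. This is an illustrative sketch only: the function names, the argument layout, the unnormalized form of the Mexican Hat wavelet, the hidden-layer argument $(net_j - b_j)/a_j$ and the factor 1/2 in the error are assumptions made here, since the text does not fix them.

```python
import math


def mexican_hat(t):
    """Mexican Hat mother wavelet, written without a normalising constant."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def sigmoid(z):
    """Logistic sigmoid used at the output layer."""
    return 1.0 / (1.0 + math.exp(-z))


def wnn_forward(x, w_in, a, b, w_out):
    """Forward pass of a three-layer WNN.

    x: list of m inputs; w_in[k][j]: input-to-hidden weights;
    a[j], b[j]: dilation and translation of hidden node j;
    w_out[j][i]: hidden-to-output weights.
    """
    n_hidden = len(a)
    n_out = len(w_out[0])
    # Weighted input of each wavelet neuron.
    net = [sum(w_in[k][j] * x[k] for k in range(len(x))) for j in range(n_hidden)]
    # Wavelet activation with dilation a_j and translation b_j.
    hidden = [mexican_hat((net[j] - b[j]) / a[j]) for j in range(n_hidden)]
    # Sigmoid output layer.
    return [
        sigmoid(sum(w_out[j][i] * hidden[j] for j in range(n_hidden)))
        for i in range(n_out)
    ]


def squared_error(y, y_expected):
    """Half the sum of squared differences between expected and actual outputs."""
    return 0.5 * sum((t - o) ** 2 for o, t in zip(y, y_expected))
```

The parameter set that a training algorithm has to find is exactly the four arguments `w_in`, `a`, `b` and `w_out`.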

Optimizing the parameters of WNN with FPA
The British scholar X.S. Yang proposed a flower pollination algorithm (Wang, 2016; Wang et al., 2018). There are two key steps in this algorithm: cross-pollination and self-pollination.
The cross-pollination (global pollination) is shown in the following equation:

$$x_i^{t+1} = x_i^t + L \left(g_* - x_i^t\right) \tag{6}$$

where $x_i^t$ is the pollen $i$ or solution vector $x_i$ at iteration $t$ and $g_*$ is the current best solution. The parameter $L$ is a step size which obeys a Lévy distribution.
The self-pollination (local pollination) is shown in the following equation:

$$x_i^{t+1} = x_i^t + \varepsilon \left(x_j^t - x_k^t\right) \tag{7}$$

where $x_j^t$ and $x_k^t$ are pollens from different flowers of the same plant species and $\varepsilon$ is a proportional coefficient between 0 and 1.
For the selected transformer fault data, the process of the wavelet neural network fault diagnosis model based on the flower pollination algorithm is as follows:
Step 1: Process the transformer fault data gathered from related references.
Step 2: Initialize the basic parameters of the FPA algorithm.
Step 3: Make each pollen position correspond to a set of parameters $\{w_{kj}, a_j, b_j, w_{ji}\}$, calculate the best solution $g_*$ among the initial pollen and assign the fitness at $g_*$ to $F_{min}$.
Step 4: For each pollen, if rand < p, where $p \in [0, 1]$ is a switch variable, the global agent is updated via Equation (6); otherwise the local agent is updated via Equation (7).
Step 5: Evaluate the new solution. If the new solution is better, update $g_*$ and $F_{min}$ accordingly.
Step 6: Check the termination condition; if it is not satisfied, go back to Step 3.
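The loop in these steps can be sketched on a toy problem. This is a minimal sketch under several assumptions that are not taken from the paper: the fitness is a simple sphere function standing in for the WNN training error, the Lévy steps are drawn with Mantegna's algorithm, each pollen is replaced only when its candidate is better, candidates are clipped to fixed bounds, and all names and default values (`fpa_minimise`, `levy_step`, `p = 0.8`, 20 pollens) are hypothetical.

```python
import math
import random


def levy_step(rng, lam=1.5):
    """Draw one Levy-distributed step using Mantegna's algorithm (an assumption)."""
    sigma = (
        math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
        / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))
    ) ** (1 / lam)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / (abs(v) or 1e-12) ** (1 / lam)


def fpa_minimise(fitness, dim, n_pollen=20, p=0.8, iters=200, bounds=(-5.0, 5.0), seed=1):
    """Minimise `fitness` with a basic flower pollination loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pollen)]
    fit = [fitness(x) for x in pop]
    best_idx = min(range(n_pollen), key=fit.__getitem__)
    g_best, f_min = pop[best_idx][:], fit[best_idx]
    for _ in range(iters):
        for i in range(n_pollen):
            if rng.random() < p:
                # Global pollination: Levy flight relative to the current best g*.
                cand = [pop[i][d] + levy_step(rng) * (g_best[d] - pop[i][d]) for d in range(dim)]
            else:
                # Local pollination: step along the difference of two random pollens.
                eps = rng.random()
                j, k = rng.sample(range(n_pollen), 2)
                cand = [pop[i][d] + eps * (pop[j][d] - pop[k][d]) for d in range(dim)]
            cand = [min(max(c, lo), hi) for c in cand]  # keep inside the search box
            f_new = fitness(cand)
            if f_new < fit[i]:  # greedy replacement of pollen i
                pop[i], fit[i] = cand, f_new
                if f_new < f_min:  # new overall best
                    g_best, f_min = cand[:], f_new
    return g_best, f_min


def sphere(x):
    """Toy fitness; in the diagnosis model this would be the WNN training error."""
    return sum(v * v for v in x)


best, f_min = fpa_minimise(sphere, dim=2)
```

To train a WNN this way, each pollen vector would hold the flattened parameters $\{w_{kj}, a_j, b_j, w_{ji}\}$ and `fitness` would return the network error on the training set.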

Transformer insulation fault gas treatment using the FCM method
In this paper, 241 sets of transformer fault data from related references (Li, 2014; Sun, Li, & Sun, 2001; Xiong et al., 2007; Yin, 2007; Yin, 2013; Zhou, 2010; Zeng et al., 2011) are collected, and each set of transformer fault data has a clear fault conclusion. Because the transformer insulation gases are highly dispersive, the data quality may not be ideal. As a result, an algorithm trained on these samples has poor diagnostic accuracy. To solve this problem, the FCM method is used to process the transformer fault gas data before the algorithm is trained.
The specific process is as follows:
Step 1: Select the appropriate number of FCM clusters according to the fault data.
Step 2: Cluster the fault data of each fault type using the FCM method.
Step 3: Delete the data in categories that contain less than 4% of the selected fault data.
Taking the low-temperature (LT) superheat data as an example, the results after using the FCM method are shown in Table 1. Categories that contain less than 4% of the selected fault data should be removed. As seen from Table 1, Cluster 3 contains less than 4% of the selected data, so the data in this category should be removed.
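The 4% rule can be sketched as a small filter on top of an FCM membership matrix. This is an illustrative sketch, not the authors' code: the function name `drop_small_clusters` is hypothetical, and assigning each sample to the cluster with its largest membership is an assumption, since the text does not say how samples are attributed to categories.

```python
from collections import Counter


def drop_small_clusters(samples, memberships, min_share=0.04):
    """Remove samples whose cluster holds less than `min_share` of all samples.

    memberships[i][k] is the fuzzy membership of sample k in cluster i.
    Each sample is assigned to its highest-membership cluster (an assumption).
    """
    n = len(samples)
    labels = []
    for k in range(n):
        column = [row[k] for row in memberships]
        labels.append(column.index(max(column)))
    sizes = Counter(labels)
    # Clusters large enough to be kept.
    keep = {cluster for cluster, size in sizes.items() if size / n >= min_share}
    return [s for s, lab in zip(samples, labels) if lab in keep]
```

With 30 samples, for example, a cluster holding a single sample (about 3.3%) would be dropped, while a cluster holding two samples (about 6.7%) would be kept.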

Determination of WNN topology and failure sample data
When the transformer fails, the transformer insulation oil decomposes and produces different types of insulating gas. Different faults produce different types and volumes of insulating gas. Therefore, the type of failure can be determined from the type and volume fraction of the insulating gas. Insulating gases such as H₂, CH₄, C₂H₆, C₂H₄ and C₂H₂ are mainly caused by abnormal faults in the transformer. However, the volume fraction of the fault gas may vary greatly, and large fluctuations in the fault data have a large effect on programme convergence. Therefore, the fault data are normalized by a formula in which $c_i$ is the attention concentration of the insulating gas and $x_{ix}$ is the actual value of the insulating gas. The attention concentration of insulating gas is shown in Table 2. The topology of the WNN of this paper is shown in Figure 2.

Table 2. The attention concentration of insulating gas.
$X_1$ (H₂) | $X_2$ (CH₄) | $X_3$ (C₂H₆) | $X_4$ (C₂H₄) | $X_5$ (C₂H₂) | $X_6$ (all)
The input values $X_1, X_2, \ldots, X_6$ represent the normalized volume fractions of H₂, CH₄, C₂H₆, C₂H₄, C₂H₂ and total hydrocarbons (sum of the total insulation gas volume normalized by the five gases), respectively. The output values $Y_1, Y_2, \ldots, Y_6$ represent the fault types of partial discharge, low energy discharge, high energy discharge, low-temperature overheat, medium-temperature overheat and high-temperature overheat, respectively. The number of hidden layer nodes affects the training speed and accuracy of the algorithm. The topological structure of the WNN has 6 inputs and 6 outputs, and according to relevant references the best number of hidden layer nodes in the WNN is about 12 (Fang, Peng, Li, & Shu, 2011).
In this paper, the best number of hidden layer nodes is determined by comparing the mean square error of the algorithm for different numbers of hidden layer nodes in the WNN. With the same input and output data in the training set, the mean square error for different numbers of hidden layer nodes is shown in Table 3. As seen from Table 3, the mean square error is smallest when the number of hidden layer nodes is 12. Therefore, the number of hidden layer nodes of the WNN is set to 12.

Case analysis
The transformer fault data collected from related references are randomly divided into two parts: 180 sets are selected as training data and the remaining 61 sets are used as test data. WNNs with a 6-12-6 structure are trained using PSO and FPA, and the best training result is shown in Figure 3. It can be seen from Figure 3 that FPA-WNN has a faster convergence rate than PSO-WNN. FPA-WNN reaches the target mean square error value in about 200 iterations, while PSO-WNN reaches it in about 270 iterations.
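The random division into 180 training sets and 61 test sets can be sketched as below. This is only an illustration: the function name `split_samples`, the fixed seed and the use of integer indices in place of the actual fault records are assumptions made for the example.

```python
import random


def split_samples(samples, n_train, seed=0):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Indices 0..240 stand in for the 241 fault records.
train, test = split_samples(range(241), n_train=180)
```

Fixing the seed keeps the split reproducible, so that both training algorithms can be compared on the same training and test sets.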
In order to verify the superiority of the proposed method, the 61 sets of test samples were fed into the trained PSO-WNN and FPA-WNN. The accuracy of the algorithm and the mean square error are shown in Table 4. It can be seen from Table 4 that the diagnostic

Conclusion
A WNN fault diagnosis model based on fuzzy clustering and the flower pollination algorithm can avoid problems that exist in other fault diagnosis methods, such as long training time and high diagnostic error rate. Simulation results show that the FPA-WNN algorithm has a lower fault diagnosis error rate and a shorter training time than the PSO-WNN algorithm, and therefore provides a better method for transformer fault diagnosis.

Disclosure statement
No potential conflict of interest was reported by the authors.