Pipeline signal feature extraction method based on multi-feature entropy fusion and local linear embedding

This paper considers the problem of effective feature extraction of acoustic signals from oil and gas pipelines under different working conditions. A feature extraction method for pipeline leakage detection is proposed based on multi-feature entropy fusion and local linear embedding (LLE). First, seven commonly used entropies that reflect the signal characteristics well are selected through experiments and extracted from the pipeline signal: permutation entropy, envelope entropy, approximate entropy, fuzzy entropy, energy entropy, sample entropy and dispersion entropy. Seven-dimensional feature vectors are obtained by feature fusion. Second, the LLE algorithm is used to reduce the dimension of the feature vectors to complete the secondary feature extraction. Finally, a support vector machine (SVM) is used to identify the working conditions of the pipeline. The experimental results show that, compared with other dimensionality reduction methods, single-feature entropy methods and multi-feature entropy fusion without dimensionality reduction, the proposed method identifies the types of pipeline working conditions effectively and reduces false negatives and false positives in pipeline leakage detection.


Introduction
Pipeline transportation is widely used for oil and gas because of its low cost and low power consumption (H. Lu et al., 2020). In recent years, as pipelines have remained in service longer, their ageing has become increasingly serious. Man-made damage and theft are also serious threats to the safety of oil and gas pipelines, and the risk of leakage is gradually increasing (Wang et al., 2019). Accurate, real-time monitoring of pipeline leakage is therefore an important topic. At present, there are many pipeline leakage detection methods, including negative pressure wave detection (L. Sun et al., 2010), acoustic detection (J. Lu et al., 2018), magnetic flux leakage detection (Liu et al., 2015) and optical fibre detection (Li et al., 2012). Among them, the acoustic method has been widely used because of its high detection accuracy, high sensitivity, low false alarm rate and low installation and maintenance cost (J. Lu et al., 2021; Yang et al., 2022). Therefore, the acoustic detection method is used to collect pipeline signals in this paper.
Feature extraction is one of the key steps in pipeline leakage detection and directly affects the diagnosis results. As requirements on detection accuracy rise, feature extraction techniques have attracted more and more attention. In Pang et al. (2018), the characteristic frequency band energy entropy is used to extract the defect features of a rotor, which are input into a support vector machine (SVM) to identify the rotor state accurately. In J. Sun et al. (2016), gas pipeline signals are collected and decomposed by local mean decomposition; the root mean square entropy of the decomposed components is extracted to construct a feature vector, which is input into an SVM to identify the aperture of the pipeline. In Ni et al. (2014), the characteristic entropy of the pipeline signal is extracted and input into an SVM optimized by particle swarm optimization, and the experimental results show that this method recognizes signals better under high noise. However, a single feature of the pipeline signal often fails to achieve a high recognition rate in different external environments, so a variety of features must be extracted to capture more information in the signal (J. Lu et al., 2019). In Zhou et al. (2020), 30 groups of features are extracted from pipeline signals, and the 12 most sensitive features are selected to form feature vectors, which are fed into an SVM for working condition recognition. In Feng et al. (2015), the sample entropy and power spectrum distribution features of the signal are extracted and input into an SVM for classification; this feature combination distinguishes the common noise around the pipeline accurately. In Zhu et al. (2021), a bearing fault feature extraction method is proposed that extracts the time-frequency features of the signal and performs further feature extraction to retain fault-sensitive features and remove insensitive ones; the resulting feature matrix is then used as the input of an SVM for bearing fault diagnosis.
Fault diagnosis based on multi-feature entropy fusion reduces the risk of losing effective information in the signal; however, it introduces redundant information into the extracted features, which affects the accuracy of subsequent fault identification. Therefore, a dimensionality reduction algorithm is needed for secondary feature extraction. In Cheng et al. (2016), kernel principal component analysis is used to reduce the dimension of the original feature set, which is then input into a learning vector quantization neural network to complete the fault diagnosis of a planetary gear. In Wan et al. (2018), the vibration signal is decomposed into several components by variational mode decomposition; the multi-feature entropy of each component is calculated to form a feature matrix, whose dimension is then reduced by principal component analysis (PCA). In Attoui et al. (2017), the collected rolling bearing signal is decomposed by a second-order wavelet packet, and the short-time Fourier transform is used to calculate the peak value of each main harmonic frequency band together with the energy distribution of the wavelet packet decomposition. A high-dimensional feature set of rolling bearing fault signals is thus constructed, linear discriminant analysis is used to reduce its dimensionality, and an adaptive neuro-fuzzy inference system is finally used for online fault identification.
CONTACT Jingyi Lu ljywdm@126.com
In this paper, a feature extraction method for pipeline leakage detection based on multi-feature entropy fusion and local linear embedding (LLE) is proposed. First, several entropy values, which together carry more information about the data set, are extracted from the pipeline signal and fused. Second, the LLE algorithm is used to extract secondary features from the fused feature vector to improve the recognition accuracy. Finally, pipeline leakage detection is carried out by using the SVM to identify the pipeline signals accurately under different working conditions. The effectiveness and superiority of the proposed method are verified by experiments.
This paper is organized as follows. Section 2 introduces the theory of multi-feature entropy fusion, LLE and SVM. Section 3 analyses the performance of the Entropy fusion-LLE-SVM model and compares it experimentally with other algorithms. Finally, Section 4 summarizes and concludes our work.

Multi-feature entropy fusion
In a physical sense, complexity reflects the rate at which a time series generates new patterns as the length of the series increases. The greater the complexity of a sequence, the more new patterns appear over time and the faster the data change, i.e. the changes are irregular and chaotic; when new patterns appear more slowly, the sequence changes regularly and shows obvious periodicity. Therefore, changes in the state of a system can be described by the complexity of its time series.
Inspired by thermodynamic entropy, Shannon first proposed information entropy to evaluate the complexity of a system (Ai et al., 2017). There are, however, various definitions of entropy, and applying entropy and complexity measures to describe and identify the hidden regularities of dynamical systems from nonlinear time series is an important research direction. The magnitude of the entropy value directly reflects the complexity of the fault signal: the larger the entropy, the more complex the signal.
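As a small illustration of the claim that a larger entropy indicates a more complex, less predictable signal, the sketch below computes Shannon entropy for two hand-picked probability distributions. The function name `shannon_entropy` and the example distributions are illustrative assumptions, not taken from the paper.

```python
import math


def shannon_entropy(probs, base=2):
    """H = -sum(p * log p), skipping zero-probability outcomes (0 log 0 = 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely states: maximal uncertainty
peaked = [0.97, 0.01, 0.01, 0.01]    # one dominant state: nearly predictable
print(shannon_entropy(uniform), shannon_entropy(peaked))
```

The uniform distribution reaches the maximum of 2 bits for four states, while the peaked one is far lower, matching the intuition that a regular, predictable signal has low entropy.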
Entropy usually measures the degree of uncertainty of information. In practical applications, a single entropy value extracted from the signal is not enough to reflect the characteristics of the pipeline leakage signal, so multiple entropy values must be extracted from different perspectives. In this paper, seven kinds of entropy that reflect the characteristics of the signal well are selected, namely permutation entropy T1 (Bandt & Pompe, 2002), envelope entropy T2 (J. Sun et al., 2014), approximate entropy T3 (Pincus, 1991), fuzzy entropy T4 (Chen et al., 2007), energy entropy T5 (Gao et al., 2020), sample entropy T6 (Alcaraz & Rieta, 2010) and dispersion entropy T7 (Rostaghi & Azami, 2016). The seven entropies are calculated for each of the n groups of signals and combined into the feature matrix

$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{7 \times n}, \qquad x_i = [T_1(i), T_2(i), \ldots, T_7(i)]^{\mathrm{T}},$$

where n represents the number of samples and $T_m(i)$ is the m-th entropy of the i-th sample.
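Two of the seven measures can be sketched in Python with NumPy. This is a minimal sketch of common formulations of permutation entropy and sample entropy; the defaults (m = 3, tau = 1 for permutation entropy; m = 2, r = 0.2 for sample entropy) and the helper `entropy_feature_matrix` are assumptions, since the paper does not report its entropy parameters. The remaining five entropies would be appended as further rows to obtain the 7 × n matrix.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Permutation entropy of a 1-D signal (ordinal patterns of length m, delay tau)."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    # Map every embedded vector to its ordinal pattern (the argsort of its values).
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * tau:tau], kind="stable"))
        for i in range(n_vectors)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_vectors
    h = -np.sum(probs * np.log(probs))
    # Normalise by log(m!) so the value lies in [0, 1].
    return h / math.log(math.factorial(m)) if normalize else h


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with tolerance r * std(x) and Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    n_templates = len(x) - m  # same template count for lengths m and m + 1

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        count = 0
        for i in range(n_templates - 1):
            # Compare template i with all later templates (self-matches excluded).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -math.log(a / b) if a > 0 else float("inf")


def entropy_feature_matrix(segments):
    """Hypothetical helper: rows = entropy measures, columns = samples (D x n layout)."""
    return np.array([[permutation_entropy(s) for s in segments],
                     [sample_entropy(s) for s in segments]])
```

A strictly monotonic signal produces a single ordinal pattern and hence zero permutation entropy, whereas white noise approaches the normalized maximum of 1.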

Locally linear embedding
LLE is a nonlinear dimensionality reduction algorithm that reflects the global characteristics of the data by analysing their local characteristics. Using reconstruction weight coefficients, LLE keeps the local structure of the original dataset unchanged while mapping the high-dimensional dataset into a low-dimensional coordinate system, so as to obtain low-dimensional features that cluster well. Moreover, LLE can fully exploit the low-dimensional features embedded in a high-dimensional nonlinear dataset and exclude redundant information.
LLE dimension reduction is carried out on the high-dimensional dataset $X_{D\times n}$, which is composed of n D-dimensional data points, and the data $Y_{d\times n}$ of n d-dimensional points ($d \ll D$) in the low-dimensional coordinate system are obtained. The specific steps of the algorithm are as follows:
(1) The number of nearest neighbour points k and the embedding dimension d are selected, and the Euclidean distance between each data point X(i) and every other data point X(j) is calculated; the k points closest to X(i) are taken as its nearest neighbours.
(2) Define the error function

$$\varepsilon(W) = \sum_{i=1}^{n} \left\| X(i) - \sum_{j=1}^{k} \omega_{ij} X(j) \right\|^2,$$

where k is the number of nearest neighbour points, $\omega_{ij}$ is the weight coefficient of the j-th nearest neighbour point X(j) of X(i), and the weights are constrained by $\sum_{j=1}^{k} \omega_{ij} = 1$. The k weight coefficients $\omega_{ij}$ of each X(i) are solved by the Lagrange multiplier method, and the n × n weight coefficient matrix W is formed by placing these weights at the corresponding positions and filling all other (non-neighbour) positions with 0.
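(3) Define the reconstruction error function

$$\phi(Y) = \sum_{i=1}^{n} \left\| Y(i) - \sum_{j=1}^{k} \omega_{ij} Y(j) \right\|^2,$$

subject to $\sum_{i=1}^{n} Y(i) = 0$ and $\frac{1}{n}\sum_{i=1}^{n} Y(i)\,Y(i)^{\mathrm{T}} = I$. With $M = (I - W)^{\mathrm{T}}(I - W)$, minimising $\phi(Y)$ is equivalent to minimising $\operatorname{tr}(Y M Y^{\mathrm{T}})$, so Y is obtained by eigenvalue decomposition: its rows are the eigenvectors of M associated with the 2nd to (d+1)-th smallest eigenvalues (the smallest eigenvalue, 0, belongs to the constant vector and is discarded).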
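The three steps can be sketched directly in NumPy. This is a minimal, unoptimised sketch under stated assumptions: the small regulariser `reg` added to the local Gram matrix (needed because that matrix is singular when k exceeds D, as with k = 13 and D = 7) is our addition, and the function name `lle` is hypothetical. Dense all-pairs distances and a full eigendecomposition are adequate for a few hundred samples.

```python
import numpy as np


def lle(X, k=13, d=3, reg=1e-3):
    """Locally linear embedding of X (D x n, one sample per column) into Y (d x n)."""
    n = X.shape[1]
    pts = X.T  # n x D, one row per sample

    # Step (1): Euclidean distances; the k closest other points are the neighbours.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]  # column 0 is the point itself

    # Step (2): weights minimising the local reconstruction error, rows summing to 1.
    W = np.zeros((n, n))  # non-neighbour entries stay 0
    for i in range(n):
        Z = pts[neighbours[i]] - pts[i]          # neighbours centred on X(i), k x D
        C = Z @ Z.T                              # local Gram matrix, k x k
        C += reg * np.trace(C) * np.eye(k)       # regulariser (assumption): C is singular if k > D
        w = np.linalg.solve(C, np.ones(k))       # Lagrange-multiplier solution up to scale
        W[i, neighbours[i]] = w / w.sum()        # enforce sum_j w_ij = 1

    # Step (3): minimise tr(Y M Y^T) with M = (I - W)^T (I - W).
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0); scale so that (1/n) Y Y^T = I.
    return np.sqrt(n) * eigvecs[:, 1:d + 1].T


# Example on random 7-D data standing in for the fused entropy features.
X = np.random.default_rng(0).standard_normal((7, 100))
Y = lle(X)
print(Y.shape)
```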

Support vector machine
SVM is a machine learning algorithm first proposed in 1995 (Cortes & Vapnik, 1995) and based on statistical learning theory and the structural risk minimization principle. By defining an appropriate kernel function, it implicitly applies a nonlinear transformation that maps the input space into a high-dimensional feature space in which the samples are linearly separable, and then finds the optimal separating hyperplane in that space. Parameter selection is an important problem in SVM modelling: in fault diagnosis and pattern recognition, satisfactory results can only be obtained by selecting an appropriate kernel function and appropriate parameters for the given sample set. The penalty parameter balances model complexity against training error, and its value affects both the complexity and the stability of the model. With an appropriate penalty parameter, the SVM achieves the desired balance between fitting the training sample set and generalizing to the test sample set. In addition, the parameter of the radial basis kernel function determines the mapping, and hence the distribution of the samples in the feature space (Cherkassky & Ma, 2004).
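The roles of the penalty parameter and the radial basis kernel parameter can be illustrated with scikit-learn, where they appear as `C` and `gamma` of `SVC`. The paper does not state how its parameters were chosen; the cross-validated grid search, the grid values and the synthetic data below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class data standing in for low-dimensional pipeline features.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# C: penalty parameter (model complexity vs training error);
# gamma: RBF kernel parameter in K(x, z) = exp(-gamma * ||x - z||^2).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Cross-validation scores each (C, gamma) pair on held-out folds, which targets the balance between fitting and generalization described above rather than training accuracy alone.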

Data acquisition
The experimental data used in this paper come from the natural gas pipeline leakage detection simulation platform at Northeast University of Petroleum. The pipeline is 169 m long, with a diameter of 150 mm and a wall thickness of 20 mm, and can transport both gas and liquid. Several leakage points are arranged along the pipeline to simulate leakage in the field, and the relevant pipeline parameters are monitored by the monitoring station. The leakage points are spaced 10 m apart, the pipeline pressure is 0.3 MPa, the flow velocity is 16 m/s and the leakage diameter is 16 mm. The experimental software is developed in the LabVIEW programming environment, and the signals are collected by an NI data acquisition board. The simulation experimental platform for pipeline leakage detection is shown in Figure 1.
Pipeline signal data under three working conditions are collected in this paper: normal signals, knocking signals and leakage signals. The normal signal is collected when the valve is closed and the gas in the pipeline is transported normally. The knocking signal is an interference signal collected when the valve is closed and the pipeline is knocked manually. The leakage signal is simulated by installing a 10-m high-pressure acoustic attenuation tube at the leakage point, with a plug with a leakage aperture of 1 mm and a 0.4-mm ball valve at the end of the tube, and then quickly switching the valve to simulate pipeline leakage while the signal is collected. The sampling frequency is 3 kHz. The time domain waveforms and frequency spectra of the three working conditions are shown in Figures 2-4.

The algorithm flow
The proposed method fully extracts multi-feature entropy from the pipeline signals to form a feature matrix. This matrix contains more of the information in the signal; however, it also contains a large amount of redundant and irrelevant information, which would inevitably reduce the accuracy of pipeline leakage detection if the matrix were classified directly. Therefore, multi-feature entropy is combined with LLE, exploiting the ability of manifold learning to process nonlinear, complex data, to obtain main features of low dimensionality and high sensitivity; the extracted low-dimensional features are then identified by SVM to detect pipeline leakage. The algorithm flow is shown in Figure 5.
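The flow can be expressed as a scikit-learn pipeline in which LLE passes its low-dimensional output to an RBF-kernel SVM. This is a sketch, not the authors' code: the synthetic 7-feature data stand in for the fused entropy features, and the LLE settings (13 neighbours, 3 components) are borrowed from the dimensionality reduction experiment. Note that scikit-learn stores samples in rows (n × D), the transpose of the D × n layout used in the LLE description.

```python
from sklearn.datasets import make_classification
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the fused features: 300 samples x 7 entropies, 3 conditions.
X, y = make_classification(n_samples=300, n_features=7, n_informative=4,
                           n_redundant=2, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

model = Pipeline([
    # Secondary feature extraction: 7-D entropy vectors -> 3-D LLE coordinates.
    ("lle", LocallyLinearEmbedding(n_neighbors=13, n_components=3,
                                   eigen_solver="dense")),
    # Working-condition recognition on the low-dimensional features.
    ("svm", SVC(kernel="rbf")),
])
model.fit(X, y)
print(model.predict(X[:5]))
```

Wrapping LLE inside the pipeline means it is fitted on training data only and new samples are mapped with `transform`; the dense eigensolver avoids ARPACK convergence problems on small datasets.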

Feature extraction
The pipeline acoustic signals under three different working conditions are collected by the experimental platform. Multiple features are extracted from the pipeline signals: permutation entropy, envelope entropy, approximate entropy, fuzzy entropy, energy entropy, sample entropy and dispersion entropy. The data samples have three labels: 0 for the normal signal, 1 for the knocking signal and 2 for the leakage signal. A total of 100 samples are selected for each label, and each data sample consists of 784 sampling points. The entropy values extracted from the signals reflect the state characteristics of the pipeline, as shown in Figure 6. Entropy describes the disorder and complexity of a signal. The normal pipeline signal is highly disordered and has large entropy, because it contains few shock (impulsive) components during normal operation. When leakage or interference occurs, the shock components become stronger and the disorder of the signal decreases, so the entropy decreases. Figure 6 shows that different entropies have different sensitivities in classifying the pipeline working conditions, and that a single entropy cannot reflect the characteristics of the signal accurately.
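The sample layout described above (three labelled conditions, 100 samples each, 784 points per sample) can be sketched as follows. At the 3 kHz sampling frequency, one 784-point sample spans 784 / 3000 ≈ 0.26 s, so 100 non-overlapping samples need 78,400 points (about 26 s) per condition. Cutting each recording into consecutive non-overlapping windows is an assumption, since the paper does not say how the samples were intercepted; `segment` and `build_dataset` are hypothetical helpers.

```python
import numpy as np

FS = 3000          # sampling frequency (Hz)
SEG_LEN = 784      # sampling points per data sample
N_PER_CLASS = 100  # samples per working condition
LABELS = {"normal": 0, "knocking": 1, "leakage": 2}


def segment(recording, seg_len=SEG_LEN, n_segments=N_PER_CLASS):
    """Cut a 1-D recording into consecutive, non-overlapping windows (assumed scheme)."""
    recording = np.asarray(recording, dtype=float)
    needed = seg_len * n_segments
    if len(recording) < needed:
        raise ValueError(f"need at least {needed} points, got {len(recording)}")
    return recording[:needed].reshape(n_segments, seg_len)


def build_dataset(recordings):
    """Map {condition name: 1-D recording} to (samples, labels)."""
    samples, labels = [], []
    for name, rec in recordings.items():
        windows = segment(rec)
        samples.append(windows)
        labels.append(np.full(len(windows), LABELS[name]))
    return np.vstack(samples), np.concatenate(labels)


print(SEG_LEN / FS)  # duration of one sample in seconds
```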

Multi-feature entropy dimensionality reduction
The seven kinds of feature entropy extracted from the pipeline signals are fused. The obtained multi-feature entropy curves show that most individual entropies discriminate poorly among normal, knocking and leakage signals, which can easily lead to misclassification later. Therefore, the multi-feature entropy must be fused. The fused feature vector matrix contains more information about the data set; however, it also contains a lot of redundant and irrelevant information. If the fused feature vector matrix were recognized directly, the pattern recognition performance would inevitably suffer.
In this paper, the LLE algorithm is used to reduce the dimension of the feature vector matrix, with the number of nearest neighbour points set to 13 and the target dimension set to 3. The visualization result after LLE dimension reduction is shown in Figure 7. Figure 7 shows that LLE distinguishes the types of pipeline signals effectively: the knocking signals are separated completely, and the samples of each type of pipeline signal are more tightly concentrated. For comparison, the SNE method (Kerstin et al., 2012) and the PCA method (Kirby & Sirovich, 2002) are also used to reduce the dimension of the feature vector matrix, and Figure 8 shows their visualization results. Figure 8 shows that these two methods cannot effectively distinguish between the normal and knocking conditions: the samples intersect and overlap to some degree, and the samples of the leakage condition are relatively scattered. Taken together, Figures 7 and 8 show that the dimensionality reduction effect of LLE is better than that of SNE and PCA, which verifies that the method used in this paper reduces dimensionality well.
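A comparison in the same spirit can be sketched with scikit-learn's `LocallyLinearEmbedding` (13 neighbours, 3 components) and `PCA`. SNE itself is not part of scikit-learn (its `TSNE` implements the related t-SNE), so it is omitted here. The synthetic data are a stand-in, and the silhouette score is our addition as a single-number proxy for the visual cluster separation in the figures; it is not a metric reported in the paper, and the printed values say nothing about the paper's data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the fused entropy features, in scikit-learn's n x D layout
# (300 samples x 7 entropies), i.e. the transpose of the D x n matrix X.
features, labels = make_classification(
    n_samples=300, n_features=7, n_informative=4, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=1)

reducers = {
    "LLE": LocallyLinearEmbedding(n_neighbors=13, n_components=3,
                                  eigen_solver="dense"),
    "PCA": PCA(n_components=3),
}
embeddings = {name: r.fit_transform(features) for name, r in reducers.items()}
for name, emb in embeddings.items():
    # Higher silhouette = tighter, better-separated classes in the 3-D embedding.
    print(name, emb.shape, round(silhouette_score(emb, labels), 3))
```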

Pattern recognition
To further demonstrate the effectiveness of the method, the SVM is used to classify the extracted low-dimensional features. In the experiment, the 300 samples are randomly divided in the ratio 8:2 into a training set of 240 samples and a test set of 60 samples. The SVM is trained with the training set, and the test set is then input into the trained SVM for prediction. The classification result of the SVM on the test set is shown in Figure 9. The proposed Entropy fusion-LLE-SVM method recognizes the pipeline signals under all working conditions with an accuracy of 100%, i.e. it correctly identifies the signals of all three working conditions in the test set. For comparison, the SVM is also used to classify the features obtained by multi-feature entropy with SNE dimensionality reduction, multi-feature entropy with PCA dimensionality reduction, multi-feature entropy without dimensionality reduction, and single entropy. The comparison results are shown in Figure 10. Most of these methods identify pipeline leakage signals accurately, which can effectively reduce the false alarm rate. However, their misrecognition rate for signals recorded during normal pipeline operation is higher, that is, normal signals are mistaken for interference or leakage signals. This may be because some of the entropies cannot distinguish normal signals from interference signals well, so the characteristics of the two signal types are not separated. The experiments therefore show that the proposed method distinguishes the working conditions of pipeline signals better and achieves a better recognition effect.
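The 8:2 split can be sketched with scikit-learn's `train_test_split`: with 300 samples and `test_size=0.2` it yields exactly 240 training and 60 test samples. Stratifying by label (so each condition keeps its share in both sets), the fixed random seed and the synthetic 3-D features are assumptions; the paper only states that the split is random.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for 300 LLE-reduced samples (3 features, labels 0/1/2).
features, labels = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0)

# Random 8:2 split, stratified by working condition (our assumption).
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(len(X_train), len(X_test), round(test_accuracy, 3))
```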
To show the superiority and effectiveness of the proposed method in oil and gas pipeline leakage detection more intuitively, Table 1 lists the classification accuracy and recognition ratio of the different methods on the test set. The experimental results show that, for oil and gas pipeline leakage detection, the proposed method combining multi-feature entropy fusion and LLE has the best recognition effect. The conclusions are as follows: (1) the dimensionality reduction effect of LLE is better than that of SNE and PCA; (2) extracting the multi-feature entropy of the pipeline signal and fusing it into a feature vector matrix captures more information in the data; (3) the fused multi-feature entropy contains redundant information, and the recognition accuracy is improved greatly after dimensionality reduction. In summary, the proposed method achieves the best identification effect among the compared methods, which verifies its superiority and effectiveness in oil and gas pipeline leakage detection.
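Assuming that the recognition ratio in Table 1 means the per-class recognition rate (the fraction of each condition's test samples that are classified correctly), both it and the overall accuracy follow from a confusion matrix. The sketch below uses only NumPy; `recognition_rates` is a hypothetical helper, and the labels in the example are made up to show the arithmetic, not results from the paper.

```python
import numpy as np


def recognition_rates(y_true, y_pred, n_classes=3):
    """Return overall accuracy, per-class recognition rates and the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true condition, columns: predicted condition
    per_class = cm.diagonal() / cm.sum(axis=1)  # recall of each condition
    accuracy = cm.trace() / cm.sum()
    return accuracy, per_class, cm


# Made-up example: labels 0 = normal, 1 = knocking, 2 = leakage, 20 test samples each.
y_true = [0] * 20 + [1] * 20 + [2] * 20
y_pred = [1, 1] + [0] * 18 + [1] * 20 + [2] * 20  # two normal samples called knocking
acc, rates, cm = recognition_rates(y_true, y_pred)
print(round(acc, 3), rates)
```

In this example, two of the 20 normal samples are labelled as knocking, so the normal-class rate is 18/20 = 0.9 while the overall accuracy is 58/60 ≈ 0.967; per-class rates expose exactly the kind of normal-signal misrecognition discussed above, which a single accuracy figure can hide.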

Conclusion
Aiming at the problem of effective feature extraction of acoustic signals from oil and gas pipelines under different working conditions, a feature extraction method for pipeline leakage detection based on multi-feature entropy fusion and LLE has been proposed. Experimental analysis of the collected original pipeline signals leads to the following conclusions:
(1) Multi-feature entropy fusion contains more effective information than single-feature entropy, and the recognition accuracy is improved greatly.
(2) There is redundant information after multi-feature entropy fusion. After dimension reduction by LLE algorithm, the differentiation of pipeline signals under various working conditions is improved.
This method has high recognition accuracy and effectively addresses the issues of false negatives and false positives in the process of leakage detection of oil and gas pipelines.
In actual pipeline operation, signals in the normal state are easy to collect, whereas abnormal signals are difficult to collect, which leads to sample imbalance. In the future, we will study pipeline leakage detection under imbalanced samples. In addition, improving the LLE algorithm to achieve better dimensionality reduction is also a meaningful research topic.

Disclosure statement
No potential conflict of interest was reported by the author(s).