Helicopter gearbox vibration fault classification using order tracking method and genetic algorithm

ABSTRACT In this paper, we implement a diagnostic system for vibration faults occurring in the PUMA helicopter gearbox. The approach combines Order Tracking signal analysis with a Genetic Algorithm. We first collected a database of vibration signals measured during periodic inspections. Because these signals were acquired under time-varying operating conditions, we used the Order Tracking method, which extracts fault features more accurately in that setting. The technique resamples the vibration data and then applies the Short Time Fourier Transform. To enable efficient and continuous monitoring of gearbox vibration faults from these features, we used a Genetic Algorithm to build a rule-based diagnostic model, with genetic operators adapted to the specificity of the problem in order to optimize the parameters of this model. The approach was successfully applied to the diagnosis of vibration defects of helicopter gearboxes, and the results were validated with test data. The diagnostic model can therefore be implemented on helicopter computers to detect faults in flight or on the ground. This approach has been used for the first time in the field of helicopter gearbox vibration fault diagnosis.


Introduction
Monitoring the mechanical condition of helicopters is an important safety issue. A defect that is not detected in time may worsen, spread and lead to significant property damage or even loss of life. This is why significant resources are deployed for the early detection of helicopter defects. Periodic maintenance has shown some limitations, especially for failures that occur randomly. In the aeronautical field, it is progressively being replaced by condition-based maintenance relying on regular tests.
In helicopters, whose most sensitive part is the gearbox, a suitable way to perform condition-based maintenance is to analyse the vibratory signals generated by the components. Defects on one of the gears cause changes in these vibration signals. Monitoring the condition of the transmission system during operation, in particular the gearboxes, is therefore crucial to prevent malfunctions that could shut the system down or even cause injury.
So far, condition monitoring and identification of gearbox damage has received a lot of attention from researchers engaged in multidisciplinary activities, especially in intelligent sensor technology, signal processing and evolutionary algorithms.
In the field of signal processing, several techniques have been adopted. The time-domain-based technique extracts scalar indicators that give information on the evolution of power and signal peaks (RMS, Peak Indicator, Kurtosis, Skewness, etc.) [1,2]. However, this method gives imprecise results during the diagnosis [3].
Spectral processing is the major tool for the study of vibratory signals of rotating machines. Many problems associated with the detection of faults in the components of a rotating machine can be solved by Fourier analysis [4]. Nevertheless, there are cases where simple Fourier analysis is inefficient, mainly the case of locally non-stationary signals.
Processing non-stationary signals requires a specific tool that allows analysis in the time-frequency domain. In this sense, the wavelet transform has gained popularity in the diagnosis of vibratory defects [5]. More recently, empirical mode decomposition (EMD) has been widely used. Similar to the wavelet transform, EMD breaks the signal down into a collection of intrinsic mode functions (IMFs). The IMFs are obtained iteratively, and the decomposition forms the first stage of the Hilbert-Huang transform [6,7].
However, if the vibratory signals measured on the gearboxes are not stationary and the rotational speeds of the shaft are not constant, as in the case of helicopter gearboxes, all these techniques will have some limits and cannot be applied effectively.
Order Tracking Signal Processing is a useful technique when the rotational speed of the shaft changes.
We used this method because it allows the features of the faults to be extracted precisely from the collected vibratory signals.
Nevertheless, the automatic and continuous monitoring of faults from the extracted variables is not an easy task. It can be carried out effectively with machine learning methods.
Neural Networks and Deep Learning have been applied successfully [8][9][10][11], but they require substantial computation and a large training database, and they are prone to overfitting. Decision trees have been used, but they are unstable on small samples. Random Forest is resistant to overfitting, but classification problems arise if the number of relevant variables is small [12,13]. Support Vector Machine is a good classifier but requires a lot of computation [14].
Other methods based on evolutionary algorithms have been used to achieve this goal [15,16]. These are stochastic algorithms whose principle is inspired by the theory of evolution to solve various problems. Among these algorithms, there are Genetic Algorithms that are metaheuristics inspired by the process of natural selection.
The purpose of this article is to implement a diagnostic system for vibratory faults occurring in gearboxes mounted on helicopters. To build this system, we used a database of vibration signals that was collected during periodic inspections of PUMA SA330 helicopters. We used the Order Tracking signal analysis method to extract fault features from the vibratory signals. This technique takes into account the fluctuations of the rotational speeds of the shaft to analyse the vibratory signals. To enable the detection and continuous identification of defects from the data calculated by the previous technique, we used Genetic Algorithms to construct a diagnostic model based on classification rules.
The originality of this work is to build a diagnostic model of vibratory faults combining the Order Tracking signal processing technique with a classifier based on Genetic Algorithms. This technique was used for the first time to set up a model that allows a quick and efficient diagnosis that is adapted to the specificity of the vibrations generated by helicopter gearbox. This paper is organized as follows. Section 2 describes the Order Tracking Signal Processing technique. Section 3 gives the basic concept of Genetic Algorithms and the classification task. The results of features extraction and data classification are discussed in Section 4. Finally, Section 5 concludes the paper.

Order tracking signal processing technique
Order Tracking is a technique for analysing the vibratory signals of rotating machines, such as engines, compressors, turbines and pumps. The vibration signal generated by a rotating machine is the superposition of the signals generated by its various mechanical components, such as gearboxes, bearings, blades and shafts. All these signals have harmonics that are multiples of the shaft rotation frequency.
Signal processing using the Fast Fourier Transform (FFT) method is widely used to analyse vibration signals. The FFT power spectrum can be used to diagnose rotating machines by associating the characteristic frequencies with the different mechanical components. If the machine is running at an invariable speed, peaks in the power spectrum can be identified at certain multiples of the shaft rotation frequency.
However, in rotating machines, the rotational speed of the shaft is not always constant, which makes mechanical faults difficult to observe. As the rotational speed changes, the frequency bands of the harmonics become wider, so some of them may overlap. Identifying the characteristic frequencies of the vibratory components from the power spectrum then becomes complicated, and the peaks associated with particular mechanical parts can no longer be distinguished. Order Tracking techniques are effective when the speed of rotation changes with time because they normalize the analysis with respect to the speed of rotation. The order components are the vibration harmonics of the rotational speed: order 1 is 1 times the speed of rotation of the shaft, order 2 is 2 times this speed, and so on. Thanks to Order Tracking, the harmonics hidden in the power spectrum can be easily distinguished, and the resulting spectrum shows more clearly the peaks associated with the different mechanical parts.
The first uses of this technique have their origins in the field of electronics [17]. The principle is that the acquisition systems are triggered by electronic circuits synchronized with speed sensors. Thus, thanks to these techniques, the data acquisition is done directly in the angular domain; the sampling is done in constant increments of the rotation angle of the shaft.
With the improved computing power of digital signal processors, it has become easier and economically more appealing to resample signals in the angular domain, thereby reducing the complexity of acquisition systems.
There are several methods adopted for Order Tracking analysis that can be grouped into three major families: Computed Order Tracking, Kalman filter based methods, and Order Tracking Transform methods.
Computed Order Tracking methods operate in the time domain and interpolate the signal in the angular domain by a resampling approach [18]. The second family uses the Vold-Kalman filter to estimate the amplitudes of the harmonics and the instantaneous rotational speed of the shaft [19,20]. The Order Tracking Transform approach performs both the time-domain synchronization with the shaft rotation speed and the Fourier transform to evaluate the amplitude and phase of each order. The harmonic amplitudes can thus be obtained without going through the resampling phase [21].
In our work, we used the Computed Order Tracking method, since our data acquisition system measures the rotational speed of the shaft with a tachometer.
Vibration frequencies are often multiples of the rotational speed. With this approach, we can extract them accurately. The principle is based on resampling and interpolation of the measured signal to obtain a constant number of samples per cycle (angle increment) [18,22].
In practice, the rotation speed ω_cyc is measured independently with a tachometer, which generates a pulse at each rotation of the shaft (Figure 1(a)). The angular vector θ can then be calculated according to Equation (1). Figure 1(b) shows the sampled data in the time and angle domains. The maximum order O_max that can be detected depends on the sampling frequency f_s of the signal and is calculated by Equation (2). The sampling frequency f_rsm of the signal in the angular domain must be greater than twice the value of O_max to avoid the aliasing phenomenon. In our study, we took a value four times higher than this value, as expressed by Equation (3). The vibratory signal sampled in the angular domain is represented in Figure 2.
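As an illustration, the resampling step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Matlab implementation used in this study: the tachometer is assumed to give one pulse per revolution, the angle between pulses is interpolated linearly, Equation (2) is assumed to take the usual form O_max = f_s / (2 ω_max) with ω_max in revolutions per second, and Equation (3) is read as f_rsm = 4 O_max. The function names and the synthetic run-up signal are hypothetical.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x (xs increasing)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    lo, hi = 0, len(xs) - 1
    while hi - lo > 1:                      # bisection for the bracketing interval
        mid = (lo + hi) // 2
        if xs[mid] <= x:
            lo = mid
        else:
            hi = mid
    w = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + w * (ys[hi] - ys[lo])

def max_order(fs, pulse_times):
    """Assumed form of Equation (2): fs / (2 * highest shaft frequency in rev/s)."""
    fastest = max(1.0 / (b - a) for a, b in zip(pulse_times, pulse_times[1:]))
    return fs / (2.0 * fastest)

def resample_to_angle(signal, fs, pulse_times, samples_per_rev):
    """Resample a time-domain signal at constant increments of shaft angle."""
    times = [i / fs for i in range(len(signal))]
    revs = list(range(len(pulse_times)))    # pulse k marks k completed revolutions
    out = []
    for j in range((len(pulse_times) - 1) * samples_per_rev):
        angle = j / samples_per_rev             # shaft angle in revolutions
        t = interp(angle, revs, pulse_times)    # angle -> time
        out.append(interp(t, times, signal))    # time -> amplitude
    return out

# Synthetic run-up: shaft angle theta(t) = 20 t + t^2 revolutions (20 to 24 rev/s).
fs = 2000.0
pulse_times = [-10.0 + math.sqrt(100.0 + k) for k in range(45)]   # theta(t_k) = k
signal = [math.sin(2 * math.pi * 3 * (20 * (i / fs) + (i / fs) ** 2))
          for i in range(4001)]                 # a pure order-3 component

o_max = max_order(fs, pulse_times)
samples_per_rev = math.ceil(4 * o_max)          # assumed reading of Equation (3)
angle_signal = resample_to_angle(signal, fs, pulse_times, samples_per_rev)
```

In the angular domain the order-3 component becomes a sinusoid with exactly three periods per revolution, even though its frequency in hertz drifts during the run-up.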
After this resampling step, we can apply the Short Time Fourier Transform (STFT) method to the vibratory signal in order to calculate the features. This method provides spectral information on nonstationary data and is often used to evaluate whether a signal is stationary or not [23,24].
The principle of the STFT is based on the calculation of the Fast Fourier Transform (FFT) of overlapping segments of the signal (see Figure 3). The FFTs of the segments are returned as a dataset that contains both time and frequency information. However, the weighting window must be well defined to improve temporal resolution and avoid spectral leakage (refer to Equation (4)), where F_T is the STFT of the signal S(i), L is the length and K is the time step of the sliding window W_in, and N is the number of frequency intervals. As the length of the window increases, the frequency resolution improves and the temporal resolution decreases. The percentage overlap (O_v) between the windows is given by Equation (5). After calculating the STFT and averaging it over time, we extracted the signal features by Equations (6), where S(k) and f_k are respectively the amplitude and the frequency of the kth order (k = 1, 2, ..., K, where K here denotes the number of spectrum lines).
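The windowing and averaging described above can be illustrated with a short Python sketch. It is not the processing chain of this study: a Hann window stands in for the weighting window, a direct DFT replaces the FFT for brevity, and Equation (5) is assumed to take the usual form O_v = 100 (L - K) / L. The function names and the test signal are hypothetical.

```python
import cmath
import math

def overlap_percent(L, K):
    """Assumed form of Equation (5): overlap of windows of length L moved by step K."""
    return 100.0 * (L - K) / L

def mean_stft_magnitude(x, L, K):
    """Time-averaged STFT magnitude using a Hann window of length L and step K."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]
    n_bins = L // 2 + 1
    total = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(x) - L + 1, K):
        seg = [x[start + n] * window[n] for n in range(L)]
        for k in range(n_bins):
            # Direct DFT of the windowed segment (an FFT would be used in practice).
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            total[k] += abs(coeff)
        n_segments += 1
    return [t / n_segments for t in total]

# Angle-domain test signal: 64 samples per revolution, orders 3 and 8.
samples_per_rev = 64
x = [math.sin(2 * math.pi * 3 * j / samples_per_rev)
     + 0.3 * math.sin(2 * math.pi * 8 * j / samples_per_rev) for j in range(512)]

L, K = 128, 64                                   # window of 2 revolutions, 50% overlap
spectrum = mean_stft_magnitude(x, L, K)
orders = [k * samples_per_rev / L for k in range(len(spectrum))]   # order axis
peak_order = orders[max(range(len(spectrum)), key=spectrum.__getitem__)]
```

Because the signal is sampled in the angular domain, the spectrum axis is expressed in orders rather than hertz, with a resolution of samples_per_rev / L orders per line.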

Genetic algorithm concept
In our work, we used Genetic Algorithms to construct a rule-based model that classifies defects from vibration data. From 12 features calculated from a vibratory signal, the rule base detects and identifies a possible defect. Here is an example of a rule that can be generated by the GA: where feat_1, feat_2, ..., feat_12 are the features calculated by signal processing and Class is the predicted fault.
In what follows we will introduce a brief overview of the concept of Genetic Algorithms.
GAs are evolutionary algorithms that imitate the evolution and adaptation of organisms in natural environments. Most of these algorithms are problem-solving processes based on Darwinian theory.
They have successfully solved difficult optimization problems in various fields [25][26][27][28]. One of the biggest advantages of GAs is their flexibility: the technique can be adapted to the specificity of the problem studied.
They have been used effectively in unsupervised learning, especially clustering, to discover the structure of unlabelled data [29,30], and in supervised classification to define rules for classifying data [31][32][33][34]. Different representations can be used by these classifiers, for example decision trees, classification rules, discriminant functions and many others.
For classification rules, discovered knowledge is usually represented by IF-THEN prediction rules, where the IF part contains predictive attributes and the THEN part contains the prediction of the class. The discovered rules can be evaluated according to several criteria, such as the degree of confidence in the prediction, the accuracy rate of the classifications, comprehensibility, etc. [35].
The Genetic Algorithm constructs the classification model by inserting new rules. Two approaches are used to codify the population of individuals (chromosomes): the Michigan approach and the Pittsburgh approach [36,37]. In the Michigan approach, each individual codes a single prediction rule, while in the Pittsburgh approach, each individual encodes a set of prediction rules. In this case, a population consists of a set of individuals where each represents a list of rules. In our study, we opted for the Michigan approach to codify the rules.
The Genetic Algorithm follows the steps described in the diagram in Figure 4. The main steps can be summarized as follows:
(1) An initial population is randomly generated, and the performance of the individuals in this population is evaluated.
(2) The following operations are then repeated until a stop criterion is met, which can be a maximum number of iterations or a performance level to be achieved:
• Individuals who will produce children are selected. This selection takes their performance into account: the better an individual is, the more likely it is to reproduce.
• Offspring are created by combining the selected parents.
• Some genes of the children may mutate randomly. This can bring new characteristics to the offspring and may increase their performance.
• The performance of the individuals in this population is evaluated.
• The individuals with the lowest performance are eliminated and will not be part of the next generation.
(3) At each iteration, the best solution to the problem (represented by an individual or a population) is retained. These solutions are the ones proposed by the algorithm as answers to the problem.
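The loop formed by these steps can be sketched as follows in Python. This is a generic skeleton on a toy one-dimensional objective, not the rule-discovery program of this study: the operators are passed in as functions, and the particular choices shown (tournament of two, averaging crossover, Gaussian mutation, keeping the best of parents and children) are hypothetical placeholders.

```python
import random

def run_ga(fitness, new_individual, select, crossover, mutate,
           pop_size=20, max_iter=50, target=None):
    """Generic GA loop: evaluate, select, recombine, mutate, replace."""
    population = [new_individual() for _ in range(pop_size)]      # step (1)
    best = max(population, key=fitness)
    for _ in range(max_iter):                                      # step (2)
        if target is not None and fitness(best) >= target:
            break                                                  # performance level reached
        children = []
        while len(children) < pop_size:
            parent1 = select(population, fitness)
            parent2 = select(population, fitness)
            children.append(mutate(crossover(parent1, parent2)))
        # Eliminate the least fit: keep the pop_size best of parents and children.
        ranked = sorted(population + children, key=fitness, reverse=True)
        population = ranked[:pop_size]
        best = population[0]                                       # step (3)
    return best

rng = random.Random(0)
fitness = lambda x: 1.0 - (x - 0.7) ** 2          # toy objective, maximum at x = 0.7
best = run_ga(
    fitness,
    new_individual=rng.random,
    select=lambda pop, fit: max(rng.sample(pop, 2), key=fit),     # tournament of 2
    crossover=lambda a, b: 0.5 * (a + b),
    mutate=lambda x: (min(1.0, max(0.0, x + rng.gauss(0.0, 0.05)))
                      if rng.random() < 0.3 else x),
)
```

Passing the operators as arguments mirrors the flexibility mentioned above: the same loop can be reused once encoding-specific operators are supplied.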
Having described the general principle of a Genetic Algorithm, we now describe the blocks necessary for its implementation.
After the codification of individuals, a fitness function has been defined to evaluate each individual. Then, genetic operators were adapted to this codification to produce a new population. They are the operators of selection, crossover, mutation and replacement.

Individual encoding
There are at least two ways to encode the individuals. They depend on how the class to predict (the THEN part of the rule) is represented [37]. The first possibility is to represent the predicted class in the genome of the individual, so that the code of the individual includes both the IF part and the THEN part [38]. In our work, we used a second possibility, where the individual's code includes only the IF part. All the individuals of the population are associated with the same class to predict, which remains unchanged during the execution of the algorithm. The algorithm is executed as many times as there are classes, and each execution discovers only the rules relating to a single class [39].
A chromosome is composed of several genes. The number of these genes is the same as that of the attributes or predictive variables obtained from the vibratory signal. Each of them represents a condition involving an attribute [38,40]. The first gene represents the first condition; the second gene represents the second condition, and so on (see Figure 5). All of these genes codify the IF part of the rule. Each gene is subdivided into three fields: Weight, Operator and Value.
The first field of a gene, "W", is the weight, a real number in the range [0, 1]. It gives the degree of importance of the attribute that corresponds to this condition. If its value is greater than a fixed threshold, the condition is accepted; otherwise, the condition is removed from the rule. The second field, "Op", corresponds to the relational operator; it can only take the two values 0 or 1, which encode the two operators "<" and "≥". The last field of a condition, "V", gives the value of the attribute, which is also normalized between 0 and 1.
The advantage of this type of coding is that the length of the rule is flexible even though the length of the chromosome is fixed. This is possible because a condition of a rule can be accepted or eliminated by comparing the weight "W" of the gene with a threshold value, which we set at 0.3.
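The decoding of a chromosome into an IF-THEN rule can be illustrated by the following Python sketch. The representation of a gene as a (W, Op, V) tuple, the function names and the three-gene example are hypothetical; an actual chromosome in this study has 12 genes. The 0.3 threshold and the 0/1 operator encoding follow the description above.

```python
WEIGHT_THRESHOLD = 0.3                 # threshold on the weight field W
OPERATORS = {0: "<", 1: ">="}          # encoding of the Op field

def active_conditions(chromosome, threshold=WEIGHT_THRESHOLD):
    """Keep only the genes whose weight exceeds the threshold."""
    return [(i, OPERATORS[op], value)
            for i, (weight, op, value) in enumerate(chromosome, start=1)
            if weight > threshold]

def rule_fires(chromosome, features, threshold=WEIGHT_THRESHOLD):
    """True if the feature vector satisfies every active condition of the rule."""
    for i, op, value in active_conditions(chromosome, threshold):
        x = features[i - 1]
        satisfied = x < value if op == "<" else x >= value
        if not satisfied:
            return False
    return True

def rule_text(chromosome, fault_class, threshold=WEIGHT_THRESHOLD):
    """Readable IF-THEN form of the rule encoded by the chromosome."""
    parts = [f"(F{i} {op} {value})"
             for i, op, value in active_conditions(chromosome, threshold)]
    return "IF " + " and ".join(parts) + f" THEN Fault Class = {fault_class}"

# Three genes (W, Op, V); the second is dropped because its weight is below 0.3.
chromosome = [(0.80, 1, 0.69), (0.10, 0, 0.50), (0.45, 0, 0.90)]
text = rule_text(chromosome, 3)
```

The chromosome length stays fixed at one gene per feature, while the number of conditions in the decoded rule varies with the weights.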

Fitness function
A fitness function is needed to evaluate individuals according to their performance and to select the best ones. This function is entirely specific to the problem: it takes an individual "I" as parameter and calculates a value "Val" which represents its level of performance (Equation (7)). For classification problems, the fitness function evaluates the performance of each individual (rule) [35]. Before defining it, we recall the basic concepts of the evaluation of a classification rule. Consider a rule whose antecedent is "A" and whose consequent (the predicted class) is "C", with the form "IF A THEN C". After using the rule to classify a data instance, one of the following four types of results can be observed, depending on the class predicted by the rule and the actual class of the instance (summarized in Table 1):
• True positive (TP): the actual class is C and the predicted class is also C.
• False negative (FN): the actual class is C, but the predicted class is not C.
• True negative (TN): the actual class is not C and the predicted class is also not C.
• False positive (FP): the actual class is not C, but the predicted class is C.
The calculation of the fitness function is based on the number of times these results occur when the individual is evaluated on each line of the database. The Sensitivity (Se) and Specificity (Sp) indicators are combined to obtain the value of the fitness function f (Equations (8)). In our work, the fitness function that we propose is the same as that used by [35]. The weights c_1 and c_2 control the dependence of the fitness function on the values of TP, FP, TN and FN. For example, a decrease of c_1 or an increase of c_2 will generally improve the prediction accuracy but will increase the tendency to overfit. In this article, we set c_1 to 1 and c_2 to 20.
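The counting of the four outcomes and the two indicators can be sketched as follows in Python. Sensitivity and specificity use their standard definitions, Se = TP / (TP + FN) and Sp = TN / (TN + FP). The plain product Se × Sp in the sketch is only a stand-in for a combined score: it is an assumption and does not reproduce the c_1, c_2 weighted form of Equations (8). The function names and the six-line data set are hypothetical.

```python
def confusion_counts(rule_fires, rows, labels, target_class):
    """Count TP, FP, TN, FN for one rule predicting target_class."""
    tp = fp = tn = fn = 0
    for row, label in zip(rows, labels):
        predicted = rule_fires(row)
        actual = (label == target_class)
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def specificity(tn, fp):
    return tn / (tn + fp) if tn + fp else 0.0

def product_fitness(tp, fp, tn, fn):
    """Stand-in score Se * Sp (NOT the weighted form of Equations (8))."""
    return sensitivity(tp, fn) * specificity(tn, fp)

# Toy data: one normalized feature per line, rule "IF F1 >= 0.7 THEN class 3".
rows = [[0.90], [0.80], [0.40], [0.20], [0.75], [0.10]]
labels = [3, 3, 3, 1, 1, 2]
tp, fp, tn, fn = confusion_counts(lambda r: r[0] >= 0.7, rows, labels, target_class=3)
score = product_fitness(tp, fp, tn, fn)
```

Because each run of the algorithm targets a single class, the evaluation reduces to a two-class problem: the target class against all the others.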

Selection operator
This operator chooses the individuals that will generate offspring, favouring those with the greatest value of the fitness function. The roulette wheel selection method has been widely used, but it has problems when chromosome performance varies considerably. Other methods have replaced it, such as rank selection or tournament selection [41]. Rank selection first sorts the population by fitness and assigns each individual a rank according to its position. In this method, all chromosomes have a chance of being selected, but convergence is slower.
In our study, we used tournament selection, which gives lower-ranked individuals a chance to participate in the improvement of the population. A tournament consists of a competition between several individuals randomly selected from the population. The winner of the tournament is the individual with the best fitness.
The operators described next are the crossover and mutation operators. To avoid producing an invalid child, some restrictions have been imposed on these operators so that they produce only valid rule conditions.
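A tournament can be sketched in Python as follows. The tournament size k is not specified in this paper, so the value used here is a hypothetical choice, as are the function name and the toy population.

```python
import random

def tournament_select(population, fitness_values, k=3, rng=random):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness_values[i])
    return population[winner]

population = ["a", "b", "c", "d", "e"]
fitness_values = [0.9, 0.7, 0.5, 0.3, 0.1]

rng = random.Random(1)
counts = {name: 0 for name in population}
for _ in range(2000):
    counts[tournament_select(population, fitness_values, k=3, rng=rng)] += 1
```

A smaller k lowers the selection pressure and gives mid-ranked individuals more chances to reproduce. With k equal to the population size, the best individual always wins.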

Crossover operator
From the selected individuals, a new generation is created using the crossover operator. Offspring are obtained from the selected parents by a combination of genes. The crossover operator aims to direct the search towards promising areas of the search space.
This procedure applies with a certain probability which is called the crossover rate which generally ranges between 0.45 and 0.95. This rate represents the proportion of the parent population that will be used by a crossover operator. In the case of our study, we fixed this rate at 0.8.
This operator can be applied in several ways, such as single-point crossover and two-point crossover [42]. In our study, we adopted the Heuristic Crossover Operator, adapted to our encoding of individuals as follows. The combination between the chromosomes representing the individuals is carried out at the level of each field. If the fields are real values (the weight fields and the attribute value fields), the combination is done according to Equations (9). For the fields corresponding to the operators "<" and "≥", which are encoded by the values "0" and "1", the combination is performed by a permutation of the fields of both chromosomes. This operator is illustrated in Figure 6, where:
• Gene^i_1 and Gene^i_2 are respectively the ith genes of the parent chromosomes;
• W^i_(1,2) and W^i_of(1,2) are respectively the weight values of the parent chromosomes and the child chromosomes;
• Op^i_1 and Op^i_2 are comparison operators;
• V^i_(1,2) and V^i_of(1,2) are respectively the attribute values of the parent chromosomes and the child chromosomes; they are calculated in the same way as the weights;
• α is a random weighting value, which we set in our case between −0.2 and 1.2.
Figure 6. Crossover operator.
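A possible form of this crossover is sketched below in Python. The blend formula, child = α × parent_a + (1 − α) × parent_b, is an assumption about Equations (9), which are not reproduced here; drawing one α per real-valued field and clipping the result to [0, 1] to keep conditions valid are also assumptions. Genes are represented as hypothetical (W, Op, V) tuples.

```python
import random

def clip01(x):
    """Keep a real-valued field inside the valid range [0, 1]."""
    return min(1.0, max(0.0, x))

def blend(a, b, alpha):
    """Assumed blend of two parent values with weighting alpha."""
    return clip01(alpha * a + (1.0 - alpha) * b)

def crossover(parent1, parent2, rng=random, lo=-0.2, hi=1.2):
    """Field-wise crossover of two chromosomes made of (W, Op, V) genes."""
    child1, child2 = [], []
    for (w1, op1, v1), (w2, op2, v2) in zip(parent1, parent2):
        alpha_w = rng.uniform(lo, hi)       # weighting for the weight field
        alpha_v = rng.uniform(lo, hi)       # weighting for the value field
        # Real-valued fields are blended; operator fields are swapped.
        child1.append((blend(w1, w2, alpha_w), op2, blend(v1, v2, alpha_v)))
        child2.append((blend(w2, w1, alpha_w), op1, blend(v2, v1, alpha_v)))
    return child1, child2

parent1 = [(0.80, 1, 0.69), (0.10, 0, 0.50), (0.45, 0, 0.90)]
parent2 = [(0.20, 0, 0.30), (0.95, 1, 0.10), (0.60, 1, 0.05)]
child1, child2 = crossover(parent1, parent2, rng=random.Random(0))
```

Values of α outside [0, 1] let a child move slightly beyond the interval spanned by its parents, which helps exploration, hence the clipping step.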

Mutation operator
This operator changes the value of parts of a gene with a very low probability. It maintains the diversity of the population, which is essential for Genetic Algorithms, and prevents some genes favoured by chance from spreading to the detriment of others and from being present in the same place on all chromosomes. This operator also limits the risk of premature convergence to a local optimum by ensuring that each point in the search space can be reached. Thanks to this property, the global optimum remains reachable [15]. In our case, the mutation is adapted to the genotype of the individuals. There are two mutation operators: for real-valued fields, the mutation adds or subtracts a random value from the current value; for the comparison operator field ("<", "≥"), the mutation inverts its value.
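The two mutation mechanisms can be sketched in Python as follows. The per-field mutation probability, the maximum perturbation and the clipping to [0, 1] are hypothetical choices; genes are represented as (W, Op, V) tuples for illustration.

```python
import random

def clip01(x):
    """Keep a real-valued field inside the valid range [0, 1]."""
    return min(1.0, max(0.0, x))

def mutate(chromosome, rate=0.05, step=0.1, rng=random):
    """Mutate each field of each (W, Op, V) gene independently with probability rate."""
    mutated = []
    for weight, op, value in chromosome:
        if rng.random() < rate:
            weight = clip01(weight + rng.uniform(-step, step))   # real-valued field
        if rng.random() < rate:
            op = 1 - op                                          # invert "<" and ">="
        if rng.random() < rate:
            value = clip01(value + rng.uniform(-step, step))     # real-valued field
        mutated.append((weight, op, value))
    return mutated

chromosome = [(0.80, 1, 0.69), (0.10, 0, 0.50), (0.45, 0, 0.90)]
unchanged = mutate(chromosome, rate=0.0, rng=random.Random(0))
all_mutated = mutate(chromosome, rate=1.0, rng=random.Random(0))
```

A mutation of the weight field can move a gene across the acceptance threshold, so this single operator can also add a condition to a rule or remove one.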

Replacement operator
The replacement operator determines the final composition of the next generation. There are two main types of methods [15].
The first type, called stationary replacement, consists of keeping a constant population size. At each generation, children replace all or part of the parents. In this case, the best parents are kept for the next generation to maintain the same population size. The second type, which can be called elitist replacement, consists in having a growing population size. A child is included in the next generation only if it is at least better than the least successful of the parent generation. We can then imagine a whole range of variations between these two main methods. In our work, we chose the stationary replacement method.
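One way to realize a replacement with constant population size is sketched below in Python. The number of parents kept (n_keep) is a hypothetical parameter that is not specified in this paper, and the individuals are reduced to their fitness values for brevity.

```python
def stationary_replacement(parents, children, fitness, n_keep):
    """Keep the n_keep best parents and fill the remaining slots with the best children."""
    size = len(parents)                    # the population size stays constant
    survivors = sorted(parents, key=fitness, reverse=True)[:n_keep]
    newcomers = sorted(children, key=fitness, reverse=True)[:size - n_keep]
    return survivors + newcomers

parents = [0.1, 0.9, 0.5, 0.3]             # individuals represented by their fitness
children = [0.8, 0.2, 0.7, 0.6]
next_generation = stationary_replacement(parents, children, fitness=lambda x: x, n_keep=1)
```

Setting n_keep to 0 replaces all the parents, while larger values preserve more of the previous generation.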

Features extraction and order tracking signal processing results
The data we used in our study was collected from a PUMA SA330 helicopter maintenance centre. Only one defective component was present in the system during data acquisition. In our work we have selected six cases of study: (1) spiral input pinion bearing spalling; (2) spiral input pinion gear tooth scuffing; (3) helical input pinion chipping; (4) helical idler gear crack; (5) collector gear crack; (6) no defect.
The acquired data are as follows. For each case studied, we obtained 20 vibratory signals with a duration of 1 min and a sampling frequency f_s = 40,000 Hz (see Figure 7). A tachometer mounted on the measuring device records the rotational speed of the shaft. To increase the size of the database, we divided each signal into four parts.
We first interpolated the vibratory signal in the angular domain with a sampling frequency f_rsm (see Equations (1), (2) and (3)), then we calculated the spectrogram of the signal. We used the flat top window with a length of 2006 points and an overlap of 50% (see Figure 8).
With Matlab Signal Processing Toolbox, we analysed and extracted the fault features from all available vibratory signals. These data will be used by the Genetic Algorithm to construct a rule-based diagnostic model.

Genetic algorithm classification results
The features calculated from the vibration signal processing are first organized into a matrix of 480 lines and 12 columns (80 lines for each class). We carried out a post-processing by normalizing the data between 0 and 1 before starting the classification of the data. This step is performed so that the variables are treated with the same priority. Next, we divided the data into two parts: 50% of the data for training (240 lines) and 50% for the validation (240 lines). These data allow the Genetic Algorithm to construct a rules-based model for classifying defects in the six classes of defects mentioned in Section 4.1.
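The normalization and the 50/50 split can be sketched in Python as follows. Column-wise min-max scaling and a random shuffle before splitting are assumptions about how these steps were carried out, and the feature values below are random placeholders, not the measured data.

```python
import random

def min_max_normalize(rows):
    """Scale every column of the feature matrix to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)] for row in rows]

def split_half(rows, labels, rng):
    """Shuffle the lines and split them into two halves (training, validation)."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    half = len(order) // 2
    pick = lambda idx: ([rows[i] for i in idx], [labels[i] for i in idx])
    return pick(order[:half]), pick(order[half:])

rng = random.Random(0)
# Placeholder matrix: 480 lines x 12 features, 80 lines for each of the 6 classes.
features = [[rng.uniform(-5.0, 5.0) for _ in range(12)] for _ in range(480)]
labels = [1 + i // 80 for i in range(480)]

normalized = min_max_normalize(features)
(train_x, train_y), (valid_x, valid_y) = split_half(normalized, labels, rng)
```

Scaling to [0, 1] also matches the encoding of the "V" field of each gene, whose values are normalized to the same range.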
For each class, we used a population of 100 individuals. The maximum number of iterations was set to 50. The offspring are obtained from the combination of 80% of the population and the mutation of 30% of this population.
The diagnostic model based on Genetic Algorithms effectively classified the vibratory defects. The classification rate for the training data is 100%. For the validation data, the classification rate reached 99.16% (Tables 2 and 3).
All the signals are classified with a rate of 100%, except for the signals corresponding to faults of Class 2 (spiral input pinion gear tooth scuffing) and faults of Class 3 (helical input pinion chipping), which have classification rates of 97.62% and 97.37% respectively. Figure 9 shows the variation of the fitness function for class 1 and class 2 during training.
The Genetic Algorithm program was fully developed and executed with Matlab R2017b on a machine with the following performance: Intel Core i7 2.20 GHz processor with 8.0 GB of RAM. The execution time is 18 min 14.15 s. This program is run offline just to build the diagnostic model using the vibratory database.
After this step, the diagnosis can be made according to the Simulink model of Figure 10. The main steps can be summarized as follows:
(1) Phase 1: Acquisition of vibratory data by acceleration sensors at the gearbox.
(2) Phase 2:
• Application of the Order Tracking signal processing technique to extract 12 features.
• Post-processing of the data.
(3) Phase 3: Detection and identification of the fault by the rule-based model.
We used Genetic Algorithms to set up a rule-based classification system that can detect and locate vibration defects from vibration signals. The following example is a rule generated by the Genetic Algorithm: IF (F_1 ≥ 0.69) and (F_3 < 0.90) and (F_6 < 0.20) and (F_7 ≥ 0.60) and (F_9 ≥ 0.47) and (F_12 ≥ 0.92) THEN Fault Class = 3, where F_1, F_2, ..., F_12 are the features extracted from the vibratory signals by Equations (6) and "Fault Class = 3" corresponds to the fault "helical input pinion chipping".
The advantage of using Genetic Algorithms is that, from a few training data, we can build an efficient classification model.

Conclusion
In our work, we built a diagnostic model of vibratory faults that occur on gearboxes of PUMA SA330 helicopters. We used a database of vibratory signals collected during periodic inspections. Among the vibratory data available, we selected six classes, one corresponding to the faultless case and the others to five different types of defects. Conventional signal processing techniques were not used because the vibratory signals were collected under time-varying operating conditions. We opted for the Order Tracking technique, which takes into account the variation of the rotation speed by interpolating the signal in the angular domain. We then carried out the STFT to extract the features from each signal. From the computed data and Genetic Algorithms, we built a rule-based classifier.
Compared to other methods, this technique has the advantage of constructing a classification model from a smaller database. We obtained very satisfactory results, with a classification rate of 100% for training data and 99.16% for validation data. Thanks to these results, the model has been validated and retained for a possible computer implementation, on the ground or in flight, for the diagnosis of vibratory faults.
The technique we adopted was used for the first time to diagnose vibration faults in PUMA SA330 helicopter gearboxes. We have chosen a technique which is adapted to the specificity of the problem by a joint use of the order analysis technique for the signal processing and the Genetic Algorithms for the classification of the defects.

Disclosure statement
No potential conflict of interest was reported by the authors.