Neural network-based data-driven modelling of anomaly detection in thermal power plant

ABSTRACT Thermal power plants are among the most complex dynamical systems and must operate reliably at all times at the lowest possible cost. This calls for more sophisticated monitoring systems capable of early detection of failures and abnormal behaviour. Detecting anomalies in historical data using machine learning techniques can support system health monitoring. The goal of this research is to build a neural network-based data-driven model for anomaly detection in selected sections of a thermal power plant, namely the Steam Superheaters and the Steam Drum. The inputs to the neural networks are some of the most important process variables of these sections. All of the inputs are observable from the installed monitoring system of the thermal power plant, and their anomalous or normal behaviour is labelled on the basis of operator experience. The results of applying three different types of neural networks (MLP, recurrent and probabilistic) to the anomaly detection problem confirm that neural network-based data-driven modelling has the potential to be integrated into a real-time health monitoring system of a thermal power plant.


Introduction
Industrial systems exist to perform a given production task in a given time and at a given cost. A refinery, a gas plant or a thermal power plant needs to be operative at all times and must function properly at the lowest possible cost. Industrial units are damaged by continuous usage, and this damage should be detected as early as possible to prevent losses [1]. Anomalies detected through sensor data can be interpreted in many ways: one or more sensors may be faulty, some components may be faulty, or something else may be happening. It is therefore important to study these phenomena and their characteristics. An anomaly detection approach defines a region of n-dimensional data space representing normal behaviour and declares any observation that does not belong to this normal region an anomaly.
Recently, attention has been devoted to improved monitoring systems for power plants [2,3]. Today, a large number of parameters are measured and saved in databases for purposes such as historical analysis. More sophisticated monitoring systems are required that can be used in real time, for example through a graphical user interface, to detect failures and abnormal behaviour of power plants. For effective and efficient engine health monitoring (EHM), a host of parameters is usually monitored (speed, power, gas inlet pressure and temperature, exhaust and operating pressure) [4].
Various methods have been proposed by the scientific community and implemented in industrial applications. Most of the approaches can be classified into three groups: rule-based expert systems, data-driven approaches (also known as data mining or machine learning approaches) and model-based approaches [5]. Rule-based expert systems use the specific system knowledge of an expert to perform the diagnosis task. In this process, one or more rules are triggered by some deviation of a system parameter. The model-based approach encodes human knowledge into a model. However, building such a model is very time consuming and labour intensive, and the feasibility of modelling every part of a complex system is very low [1]. Data-driven approaches to anomaly diagnosis rely on the analysis of measured system data and thus exploit the historical records of the system behaviour. Data-driven modelling based on available process information has been presented for chemical batch process operation in [6], for gas turbines in [7,8] and for a coal-fired power plant in [9].
The main goal of our paper is to evaluate neural networks as a data-driven modelling approach aimed at early anomaly detection in a thermal power plant without using a huge amount of training data. For this purpose, operational data from the monitoring and control system of the Thermal Power Plant "Tuzla", Bosnia and Herzegovina, have been employed for the training of artificial neural network (ANN) models. Our scientific goal is to show that a neural network is effective in determining anomalous regimes despite the similarity of normal and anomalous data from the thermal power plant, and that it has the potential to be integrated into a real-time health monitoring system of the plant. The rest of this paper is organized as follows. Section 2 describes the state of the art in current anomaly detection methods for industrial plants. Section 3 describes the selected sections of the thermal power plant. Section 4 proposes the neural network-based data-driven modelling. In Section 5, experimental results are presented to demonstrate the effectiveness of the proposed approach in the selected sections of the thermal power plant. Finally, Section 6 concludes the paper.
CONTACT Lejla Banjanovic-Mehmedovic lejla.mehmedovic@untz.ba

Anomaly detection in industry plants
Anomaly detection refers to detecting patterns in a given data-set that do not conform to an established normal behaviour. The patterns thus detected are called anomalies [10]. Anomalies are also referred to as outliers.
Conventional anomaly detection techniques have been used for a long time, and the development of computer technology has made more modern techniques possible. Among them are distribution-based, depth-based, clustering-based, distance-based (k-nearest neighbour), density-based, spectral decomposition (principal component analysis, PCA) and classification approaches (support vector machines (SVM), neural networks) [11].
Many anomaly detection approaches have been developed and applied effectively to identify anomalies in industrial plants using different performance parameters. In paper [12], the proposed anomaly detector using PCA is validated using data from a power generation plant. The paper [13] presents a prognostics-based technique that reduces the LED qualification time. The similarity-based-metric test extracts features from the spectral power distributions using peak analysis, reduces the dimensionality of the features using PCA, and partitions the data-set of principal components into groups using a k-nearest neighbour clustering technique.
A combination of segmentation algorithms with a one-class SVM approach for efficient anomaly detection in oil platform turbo-machines is presented in paper [10]. In paper [1], data mining techniques for classifying data streams at a refinery are shown. After clustering and identifying sensor failures, a new model for forecasting the occurrence of next sensor failure was created. The paper [14] describes the design and development of a fuzzy-neural data fusion system for increased state-awareness of resilient control systems in hybrid energy systems.
The application of neural networks for anomaly detection in power plants is considered in papers [4,7,15,16].
The paper [14] describes a procedure for detecting normal and abnormal vibration data for a large steam turbine using an ANN. A self-organizing map (SOM) is trained with normal data obtained from a thermal power station and simulated with abnormal-condition data from a test rig developed in the laboratory. In [17], an event detection system using a neural network (multilayer perceptron, MLP) is trained with data from a nuclear power plant to help the operators in identifying anomalies and taking timely decisions. The example presented in [7] efficiently recognizes anomaly patterns of common combustion problems in a gas turbine using an ANN. Back propagation (BPNN) and generalized regression (GRNN) neural network models were implemented for the performance-based anomaly detection of a small gas turbine [4]. A comparison between neural network-based and statistical anomaly detection techniques for gas turbine data can be found in [18].
Our paper proposes neural network-based data-driven modelling for determining anomalous regimes in selected sections of a thermal power plant without using a huge amount of training data. This modelling could be useful for effective and efficient system health monitoring.

Selected sections of thermal power plant
A thermal power plant, as a large and complex system, consists of multiple smaller systems that work together and ensure continuous electricity generation. One of the most important systems involved in thermal power plant operation is the plant's boiler. The boiler represents the entire system that participates in the conversion of water into steam [19]. Its most important sections are the water-steam section, the feed-water system, the Steam Drum and the Steam Superheater section. The last two (Steam Superheaters and Steam Drums) are the boiler sections selected for anomaly detection, presented in Figure 1.

Water-steam section
This system is primarily engaged in converting water into steam and consists of multiple subsystems with separate functions. The system consists of multiple pipes and vessels. Heat exchange between different media occurs in this system in order to achieve optimal steam parameters. The steam is driven further to the turbine propelling it, which is essential for electricity generation. Given that the quality and parameters of steam directly affect the electricity generation, anomaly detection in this system is of great importance for the plant. The most important process variables related to steam are temperature, pressure and steam flow which directly affects the current power output.

Feed-water system
One of the important subsystems of the boiler is the system for its feedwater supply, because without the feedwater supply there is no steam generation. After raw water is treated at the water chemical treatment plant, the water is stored in the feedwater tank. The water is distributed further from the tank into the feedwater pipe system using feedwater pumps. The pumps maintain the specified feedwater flow, which is determined by the required power output. The purpose of this system is the distribution of feedwater to the most important subsystem of the water-steam system: the Steam Drum.

Steam Drums
The most important subsystem of the water-steam system is the Steam Drum, which is a large tank with the task of steam extraction from a water-steam mixture stored in the drum. Given the importance of the drum for electricity generation, it is logical that timely anomaly detection is required in this system. The most important process variable in the drum is the water level. In addition to that, steam pressure, conductivity and pH value are also measured.

Steam superheaters
The steam generated in the Steam Drum is distributed through this system. After distribution of steam through this system, it is called the superheated steam. Steam Superheaters system consists of pipes mounted in the boiler that distribute the steam to the turbine. This system has a major role in removing moisture from the steam, which improves its quality. The temperature of steam generated in the drum still does not match the temperature needed in the process.
The steam generated in the drum still contains a certain percentage of moisture. Such steam should not be distributed to the turbine because of the possibility of condensation on the blades. Every thermal power plant aims to produce steam of 100% quality. For that reason, the steam is heated in this system using flue gases. Increasing the steam temperature removes moisture from the steam, which is the primary goal of this system. The steam temperature is increased by around 160 °C compared to the steam temperature in the drum. The goal of the system is to maintain the temperature at around 535 °C, which is optimal for the boiler analysed in this paper.

Neural network-based data-driven modelling
The nature of the problem (classification, linear or non-linear, with or without underlying system dynamics) guides the choice of network composition and topology. The purpose of the assessment is to determine which type of neural network-based data-driven model fits the anomaly detection problem best. The neural network-based data-driven modelling framework includes several important phases: variable selection and data acquisition, plant behaviour modelling, model validation using performance metrics, and testing the performance of the different trained data-driven models.

Variables selection
Variables selection and data acquisition are the two key elements for successful modelling of systems behaviour and analysis [4]. Selected sections of thermal power plant for anomaly detection in our research are Steam Superheaters and Steam Drums. These sections consist of two separate systems each, which produce steam of adequate quality together.
The process variables from the Steam Superheaters for each separated system are: superheated steam temperatures (TS I and TS II), superheated steam flows (FS I and FS II) and superheated steam cooling water flows (CWF I and CWF II).
There are multiple temperature sensors measuring the steam temperature in the superheaters, but due to its importance, only the measurement in front of the turbine is used for the anomaly detection analysis. The temperature is the most important process variable in this section because it is used in the turbine and boiler protection systems. If the temperature falls below or rises above certain values for a certain duration in either of the two systems, protection is triggered instantly, causing an outage of the turbine and the boiler.
Superheated steam flow is a process variable which affects the unit power output and that is why it is important. Anomalies related to this variable can lead to generator power output anomalies, which, in some cases, may have an impact on the whole electric power system. The flow value must be above the required technological minimum for the unit to operate normally. This variable is used in the protection systems combined with superheated steam pressure. As the flow depends on the pressure, there is no exact minimum value of the flow, but the protection is designed according to the steam pressure-flow function.
Superheated steam cooling water flow plays a role in temperature control, so its anomalies are linked to anomalies of the superheated steam temperature. However, the water flow is not used in any of the protection systems.
The variables from the Steam Drums for each separated system are: drum levels (DL I and DL II), drum pressures (DP I and DP II) and feed-water flows (FWF I and FWF II).
Drum level must be maintained between the minimum and maximum limits during the unit operation. The plant can be severely damaged if these limits are exceeded. For this reason, the level measurement is used in the protection systems. This variable is measured in relation to some zero point, which means that the value can be negative.
There are no protections related to drum pressure itself, which is not the usual practice. However, the pressure difference between the two drums is used in the protection systems: if the absolute value of this difference rises above 17 bar, the protection is triggered and the coal and mazut supply to the boiler is stopped (mazut burners, coal feeders and coal pulverizers are shut down).
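The drum pressure-difference protection described above can be illustrated with a minimal sketch. Only the 17 bar limit comes from the text; the function name and the pressure values in the usage note are hypothetical and are not taken from the plant's control system.

```python
# Limit on the absolute pressure difference between the two drums (from the text).
DRUM_PRESSURE_DIFF_LIMIT_BAR = 17.0


def drum_pressure_protection_triggered(dp_1_bar: float, dp_2_bar: float) -> bool:
    """Return True when |DP I - DP II| rises above the 17 bar limit.

    Hypothetical helper for illustration; the real protection logic may
    include time delays or additional conditions not described in the paper.
    """
    return abs(dp_1_bar - dp_2_bar) > DRUM_PRESSURE_DIFF_LIMIT_BAR
```

For example, with assumed readings of 150 bar and 130 bar the difference is 20 bar and the check returns `True`, while 150 bar and 140 bar leave the protection inactive.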
Feedwater flow is related to the drum level control. If the flow is not sufficient to meet the requirements related to the amount of water in the drum, the outage related to low drum level may occur. Besides that, cracks in the feedwater pipe system can be identified by using the flow measurements. If the cracks are identified, the outage is usually planned to repair the pipe system.
The data representing anomalous behaviour contain information about the system state at the initial stage of such behaviour and for some time during it (5-10 min). The data do not necessarily represent a plant outage, but rather an unstable operating state. An outage may also occur for many other reasons (other than anomalous behaviour of the process variables used here), but it certainly has an impact on those variables because many of the plant's subsystems are linked. The variables used in this paper can affect those that are not used, and vice versa.
Tables 1 and 2 summarize the statistical characteristics of the process variables used for anomaly detection. The data are separated into those representing normal behaviour and those representing anomalous behaviour. The similarity between normal and anomalous data makes the problem particularly interesting for our research, because a simple detection method such as threshold values for the input parameters cannot be used. Scaling factors for the input variables were not used because the data do not differ significantly in magnitude.

Neural network modelling
In the case of very complex time-varying and non-linear systems, where reliable measurements are difficult to obtain and valid mathematical models do not exist, a number of different methods from the area of artificial intelligence have been proposed. ANNs are massively parallel interconnected networks that have the ability to perform pattern recognition, classification and prediction. ANN learning can cope with noisy and complicated training data and is robust to errors in the training data-set. ANNs represent an important class of anomaly detection techniques [15]. For anomaly detection, the measurement data must be related to the ideal performance in order to distinguish between normal and abnormal states [3]. The accuracy of classification by an ANN benefits from its classifier algorithm, which determines the best solution by trying to minimize the number of incorrectly classified cases during the training process. The choice of network architecture depends on the problem [4,20]. The implemented data-driven modelling approach utilizes three diverse supervised paradigms of artificial neural networks: MLP, Elman recurrent neural network (ERNN) and probabilistic neural network (PNN).
MLP neural networks are commonly used in pattern recognition, where the main problem is classification of an unseen instance into one of the existing classes. Due to this fact, MLP neural networks are a logical choice as an anomaly detection technique [21]. MLP is a feedforward ANN which consists of input, output and one or more hidden layers of nodes arranged in parallel between input and output layers. Logarithmic and sigmoid functions are commonly used activation functions in hidden layers, while linear functions are used in the output layer. MLP neural networks use a learning algorithm called back propagation (BP). Levenberg-Marquardt (LM) algorithm has been used for network training, validation and testing as it finds the best weights by minimizing the function.
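The feedforward structure described above can be sketched as a single forward pass. This is a minimal illustration, assuming one hidden layer with a logistic sigmoid activation and one linear output node; the weights, the 0.5 decision threshold and all function names are hypothetical, and the training step (back propagation or Levenberg-Marquardt) is not shown.

```python
import math


def logistic(v: float) -> float:
    """Logistic sigmoid activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-v))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: one sigmoid hidden layer followed by a linear output node.

    x        -- list of input values (e.g. six process variables)
    w_hidden -- one weight list per hidden neuron
    b_hidden -- one bias per hidden neuron
    w_out    -- one output weight per hidden neuron
    b_out    -- output bias
    """
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def is_anomaly(score: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: outputs above the threshold count as anomalies."""
    return score > threshold
```

With all hidden inputs at zero, each hidden neuron outputs 0.5, so two hidden neurons with unit output weights give a network output of 1.0, which the assumed rule would label as an anomaly.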
The weight update $\Delta w_{ji}(n)$ is defined by the generalized delta rule, in which $\eta$ represents the learning rate parameter used by the network, $e_j$ are the output node errors, $v_l$ are the weighted sums of the inputs to layer $l$ and $y_l$ are the outputs of layer $l$.
Recurrent neural networks (RNNs) use a feedback loop in their hidden layers [22]. Therefore, this type of network can be used to solve complex problems that some feedforward networks cannot solve, but the downside is that some additional learning difficulties may occur.
The RNN used as the anomaly detection technique in this paper is the ERNN, which usually has only one hidden layer with a feedback loop, although multiple hidden layers can also be used. The feedback loop present in this type of neural network returns the hidden layer output value, which is used as an input for the next iteration. The activation functions in the hidden and output layers are very similar to those of the MLP neural network (a tan-sigmoid function is used in the hidden layer, while a linear function is used in the output layer).
Synaptic weight adjustments are calculated from $v_k$, the confluence functions, $y_k$, the output node values, and $e_k$, the output node errors.
Probabilistic neural networks (PNNs) belong to the group of stochastic neural networks [20]. The PNN is based on the theory of Bayesian classification and the estimation of the probability density function (PDF).
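The feedback loop of an Elman network can be sketched as follows. This is a minimal illustration under stated assumptions: a single hidden layer with a tan-sigmoid (tanh) activation, a linear output node, no bias terms and a context vector initialised to zero. The function and weight names are hypothetical, and training is not shown.

```python
import math


def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: hidden = tanh(W_in x + W_ctx context); the output is linear."""
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[k], x))
            + sum(w * c for w, c in zip(w_ctx[k], context))
        )
        for k in range(len(w_in))
    ]
    output = sum(w * h for w, h in zip(w_out, hidden))
    # The hidden activations are returned so they can be fed back as context.
    return output, hidden


def elman_run(sequence, w_in, w_ctx, w_out):
    """Process a sequence of input vectors, feeding the hidden state back each step."""
    context = [0.0] * len(w_in)  # assumed initial context: all zeros
    outputs = []
    for x in sequence:
        y, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(y)
    return outputs
```

Because the hidden state is carried over, the output at a given step depends on earlier inputs as well as the current one; setting the context weights to zero reduces the sketch to a plain feedforward pass.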
The PNN works by creating a set of multivariate probability densities that are derived from the training vectors presented to the network. The summation layer neurons compute the maximum likelihood of pattern $x$ being classified into class $c_i$ by summing and averaging the outputs of all neurons that belong to the same class:
$$p_i(x) = \frac{1}{(2\pi)^{n/2}\sigma^{n}} \, \frac{1}{N_i} \sum_{j=1}^{N_i} \exp\left(-\frac{\lVert x - x_{ij} \rVert^{2}}{2\sigma^{2}}\right)$$
where $N_i$ denotes the total number of samples in class $c_i$, $n$ is the number of features of the input instance $x$, $\sigma$ is the smoothing parameter and $x_{ij}$ is a training instance corresponding to category $c_i$. A test instance with low probability with respect to the established PDFs is considered abnormal. The accuracy of these methods depends heavily on the threshold used.
If the a priori probabilities for each class are the same, and the losses associated with making an incorrect decision for each class are the same, the decision layer unit classifies the pattern $x$ in accordance with Bayes' decision rule based on the output of all the summation layer neurons:
$$C(x) = \arg\max_{i} \, p_i(x), \quad i = 1, 2, \ldots, m$$
where $p_i(x)$ is the output of the summation layer neuron for class $c_i$, $C(x)$ denotes the estimated class of the pattern $x$ and $m$ is the total number of classes in the training samples. If the a priori probabilities for each class are not the same and the losses associated with making an incorrect decision for each class are different, the decision is based on the output of all the summation layer neurons as
$$C(x) = \arg\max_{i} \, p_i(x) \, \mathrm{cost}_i(x) \, \mathrm{apro}_i(x), \quad i = 1, 2, \ldots, m$$
where $\mathrm{cost}_i(x)$ is the cost associated with misclassifying the input vector and $\mathrm{apro}_i(x)$ is the prior probability of occurrence of patterns in class $c_i$.
The advantages of this network type are: rapid training process (faster than BP based networks), guaranteed convergence to the optimal classification with increasing data set and the ability to change the number of learning inputs with minimal or no additional training.
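The summation and decision layers of a PNN can be sketched in a few lines. This is a minimal illustration assuming a Gaussian Parzen kernel with a single smoothing parameter `sigma`, equal prior probabilities and equal misclassification costs. The function names and the toy training vectors in the usage note are hypothetical, and `sigma` is not necessarily the same quantity as the spread coefficient reported for the experiments.

```python
import math


def pnn_class_density(x, class_samples, sigma):
    """Summation layer: average Gaussian kernel over one class's training samples."""
    n = len(x)  # number of features
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n * len(class_samples)
    total = 0.0
    for sample in class_samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, sample))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / norm


def pnn_classify(x, training_sets, sigma):
    """Decision layer: return the label whose estimated density at x is largest.

    training_sets maps a class label to its list of training vectors.
    Equal priors and equal costs are assumed.
    """
    return max(
        training_sets,
        key=lambda label: pnn_class_density(x, training_sets[label], sigma),
    )
```

Given two toy classes, `{"normal": [[0.0, 0.0], [0.1, 0.0]], "anomaly": [[1.0, 1.0], [0.9, 1.1]]}`, a query near the origin is assigned to `"normal"`. Training amounts to storing the samples, which is one way to see why adding new training instances requires little or no retraining.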

Performance metrics
The choice of an evaluation measure depends on the domain of use and the given problem. Each of them has specific characteristics that emphasize different aspects of the evaluation of algorithms [23,24]. There are four possible outcomes of anomaly detection. True positive (TP) and true negative (TN) outcomes represent a correct classification, while false negative (FN) and false positive (FP) outcomes represent an incorrect one. Both types of incorrect classifications represent a hazard. FPs can cause an action which is not needed, but FNs are actually more dangerous because an anomaly would be ignored. Based on these counts, the following performance metrics are calculated: accuracy (ACC), sensitivity or true positive rate (TPR) or recall, specificity or true negative rate (TNR), precision (PR) or positive predictive value (PPV), negative predictive value (NPR) and F1 score [24].
The main verification measure is accuracy (ACC), which is defined as the proportion of correctly classified instances among all (correctly and incorrectly classified) instances. It is calculated as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)$$
In addition to this measure, measures that take into account whether the outcome is positive or negative are also used. The most commonly used pair is sensitivity or true positive rate (TPR) and specificity or TNR. These are calculated as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (11)$$
$$\mathrm{TNR} = \frac{TN}{TN + FP} \qquad (12)$$
These measures take into account every type of anomaly that may occur and are not influenced by class distribution because each of them refers to only one class. In contrast to sensitivity and specificity, there are also measures that are influenced by class distribution. One such pair is recall, calculated using (11), and precision (PR), which is calculated as follows:
$$\mathrm{PR} = \frac{TP}{TP + FP} \qquad (13)$$
These measures are commonly used if the number of TNs exceeds the number of TPs by far. The last pair of measures used in this paper is a pair of predictive values (positive and negative) based on precision. The positive predictive value (PPV), calculated using (13), represents precision in positive outcomes, and the negative one (NPR) represents the same for negative classification outcomes and is calculated as follows:
$$\mathrm{NPR} = \frac{TN}{TN + FN} \qquad (14)$$
These measure pairs are used as verification measures from different points of view. Maintaining good scores for all of the measures is often an optimization problem. Sometimes improving the score of one measure decreases the score of the other and vice versa. If verification using only one measure is desired, it can be done by fixing the score of the other measure of the same pair. More often, however, a single measure called the F1 score is used. It is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{PR} \cdot \mathrm{TPR}}{\mathrm{PR} + \mathrm{TPR}} \qquad (15)$$
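The measures named above can be computed directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, which may differ from how the reported values were normalised in the experiments; the function name and the counts in the usage note are hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute ACC, TPR, TNR, PR, NPR and F1 from confusion-matrix counts.

    Standard definitions are assumed. No guard against empty classes is
    included, so every denominator must be non-zero.
    """
    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    pr = tp / (tp + fp)    # precision / positive predictive value
    npr = tn / (tn + fn)   # negative predictive value
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tpr,
        "TNR": tnr,
        "PR": pr,
        "NPR": npr,
        "F1": 2 * pr * tpr / (pr + tpr),
    }
```

For instance, the assumed counts TP = 90, TN = 80, FP = 20 and FN = 10 give an accuracy of 0.85, a sensitivity of 0.90 and a specificity of 0.80, which shows how a single accuracy figure can hide an imbalance between the two error types.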

Evaluating classifiers
All measures for evaluating a model rest on measuring its effectiveness, i.e. estimating the ability of the classifier to correctly classify as many examples as possible that were not involved in creating the model. Therefore, it is not customary to use all the available examples with a known classification when generating models. The most popular validation or evaluation technique is cross-validation [24]. The goal of cross-validation is to set aside a data-set on which to test the model during the training phase, in order to limit problems such as overfitting and to give insight into how the model will generalize to an independent data-set (i.e. an unknown data-set, for instance from a real problem). For this reason, the initial set of examples is divided into three parts: the training, validation and test data-sets. The training data are used to build the model, and the validation data are usually used for parameter selection and to avoid overfitting. The test data-set, in contrast, is used only to test the performance of a trained model. There are no general rules on how to choose the number of observations in each of the three parts. The idea is to separate the available data into a training data-set (50%, 60% or 70% of the data) and to split the remainder equally between validation and testing (25%, 20% or 15% each). A better approach is to repeat this procedure multiple times, which is known as k-fold cross-validation. The idea behind k-fold cross-validation is to divide all the available data into k roughly equal-sized groups. In each iteration, k-1 groups are used for training and the remaining one is used for testing. After the first iteration, the next group is used for testing and the remaining data are used for training. This procedure repeats until every group has been used for testing once.
The main disadvantage is that the validation results may vary because the groups are formed stochastically at the beginning of the validation process, so k-fold cross-validation can give different results each time it is performed. This can be avoided by repeating the process multiple times and using the mean validation result. In order to improve the training phase, 10-fold cross-validation was selected for the final estimation of the algorithms [12].
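The k-fold procedure described above can be sketched as an index generator. This is a minimal illustration, not the routine used in the experiments; the function name and the seed parameter are assumptions. Fixing the seed makes the stochastic group forming repeatable, and changing it between repetitions gives the repeated runs whose results can be averaged.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # stochastic group forming, seeded
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized groups
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

With 1924 samples and k = 10, each test group holds 192 or 193 samples and the remaining samples form the training group for that iteration.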

Experimental results
In this paper, the MLP, Elman and PNN neural network-based data-driven models for anomaly detection were developed using the Matlab/NN toolbox functions. The performance and robustness of the different networks were compared so that the best data-driven model in terms of accuracy, performance and cost could be selected among the available architectures.

Data setting
The input variables of the ANN have been selected based on their physical significance and on the working and thermodynamic principles of the thermal turbine operation. For the current work, six input parameters (whose statistical characteristics are presented in Tables 1 and 2) are used for training and validation of the ANN model as well as for testing and simulations for each selected section. The operational data were collected from the actual monitoring and control system of the thermal power plant "Tuzla" with a sampling period of 1 s. The input data-set contains 962 instances that represent normal behaviour and the same number representing anomalous behaviour. The data-set is divided into a training data-set (70% of the data), a validation data-set (15% of the data) and a test data-set (15% of the data). Scores of an ideal classification would all be the same (value 1 for ACC, PR, NPR and F1), except TPR and TNR (value 0.5). Given that the initial synaptic weights are chosen randomly, it is possible to obtain different results if a neural network is trained and tested multiple times. Therefore, all reported results are computed from averaged confusion matrices.
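The 70/15/15 division and the averaging of confusion matrices over repeated runs can be sketched as follows. This is a minimal illustration with assumed rounding and shuffling; the function names are hypothetical and the confusion matrices in the usage note are invented for demonstration only.

```python
import random


def split_dataset(n_samples: int, train_frac: float = 0.70,
                  val_frac: float = 0.15, seed: int = 0):
    """Shuffle sample indices and split them into training, validation and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def average_confusion(matrices):
    """Element-wise mean of 2x2 confusion matrices from repeated training runs."""
    n = len(matrices)
    return [[sum(m[r][c] for m in matrices) / n for c in range(2)] for r in range(2)]
```

For the 1924 instances (962 normal and 962 anomalous), this rounding gives 1347 training, 289 validation and 288 test samples. Averaging two hypothetical matrices `[[10, 2], [4, 8]]` and `[[12, 0], [2, 10]]` yields `[[11.0, 1.0], [3.0, 9.0]]`.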

Neural network parameters setting
The parameters used for MLP neural network-based anomaly detection are: the number of neurons in the hidden layers, the learning rate, the number of epochs and the learning momentum. The number of hidden layers can be increased depending on the problem. However, an MLP with three hidden layers is sufficient to map every continuous function by adding a certain number of neurons to meet the required complexity [16,25].

Discussion and recommendations
The neural network data-driven models (MLP, ERNN and PNN) were compared by varying the network parameters. If the problem is treated as a multiobjective optimization task, in which each objective corresponds to varying a specified neural network parameter, the result is an optimal front consisting of several suboptimal results, presented in Table 3 for the Steam Superheaters section and in Table 4 for the Steam Drums section. For the PNN, the spread coefficient is the only parameter that defines the search space of network structures. We calculated all performance metrics (ACC, TPR, TNR, PR, NPR, F1) [27]. In this research, we determined an optimal front of neural network structures rather than the optimal parameters of the neural networks, which would require more complex methods.
For both selected sections, the PNN provides the best results for all performance metrics. Namely, the PNN model achieves an accuracy, PR, NPR and F1 score of 99.9% for the best selected parameters of NN modelling.
The ERNN provides slightly better results (81%-98%) than the MLP neural network (72%-97%) on the test data from the Steam Superheaters. The situation is similar for the test data from the Steam Drums: the ERNN provides slightly better results (94%-98%) than the MLP neural network (89%-97%).
Both neural network types (MLP and ERNN) give better classification results for the Steam Drums section (87%-97%) than for the Steam Superheaters section (72%-97%). ACC and F1 score are higher for similar parameters for the Steam Drums section than for the Steam Superheaters section. This is somewhat expected, because there is a greater difference between the data representing anomalous behaviour and the data representing normal behaviour for the Steam Drums. The effectiveness of the different neural network approaches for anomaly detection is demonstrated in Figure 2 (for the Steam Superheaters section) and in Figure 3 (for the Steam Drums section). The same number of anomalous and normal samples is used to present the results for both sections (Steam Superheaters and Steam Drums).
Although the MLP and the ERNN give good results, a small number of FP classifications is noticeable in both figures (from Table 3 for the Steam Superheaters section, the TPR values are 0.53-0.62 for the MLP and 0.53-0.57 for the ERNN). This is compensated by a reduced number of FN classifications (TNR of 0.38-0.49 for the MLP and 0.43-0.49 for the ERNN), which is more important. This is one more reason why PNN data-driven modelling can be chosen as the best for anomaly detection on these data from the thermal power plant.
Figure 2. (a) The three process variables from the first separated system (TS I, FS I and CWF I); (b) the three process variables from the second separated system (TS II, FS II and CWF II); (c) comparison of the MLP (10,0.3,100,0.1), Elman (10,0.05,50,0.9) and PNN (5) outputs (anomaly detection) with the desired output variable for the Steam Superheaters section.
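One generic way to extract a front of non-dominated configurations from a table of scores is a Pareto filter. The paper does not state how its optimal front was obtained, so the sketch below is only an assumption about one possible procedure; the function names, configuration labels and scores are hypothetical, and all metrics are treated as "higher is better".

```python
def dominates(a, b) -> bool:
    """True if score tuple a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: dict) -> dict:
    """Keep the configurations whose score tuples are not dominated by any other."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }
```

For example, with hypothetical (ACC, F1) scores `{"A": (0.9, 0.8), "B": (0.8, 0.9), "C": (0.7, 0.7)}`, configurations A and B remain on the front while C is dominated by both.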

Conclusion
Anomaly detection is an important problem that has been researched within different research areas and application domains. Industrial units are damaged by continuous usage, and this damage should be detected as early as possible to prevent losses. Since the thermal power plant is one of the most complex dynamical systems and must function properly at all times at the lowest possible cost, correct anomaly detection in the system is very important.
This paper presents a comparative study of different neural network-based data-driven models to explore the possibilities of anomaly detection in selected sections of the "Tuzla" thermal power system. All of the inputs are observable from the monitoring system of the thermal power plant, and their anomalous or normal behaviour is labelled on the basis of operator experience. Experimental results demonstrate that neural networks are highly successful for early anomaly detection. In particular, the PNN model achieves an accuracy, PR, NPR and F1 score of 99.9% for the best selected parameters of NN modelling.
Only some data from large sections of the boiler were included in the analysis provided in this paper. Future research could be extended in the following directions: to many other (smaller) sections as data sources, to optimal neural network structures, and to including this anomaly detector as a standalone application integrated into a modern real-time health monitoring system of the thermal power plant.
Figure 3. (a) The three process variables from the first separated system (DL I, DP I and FWF I); (b) the three process variables from the second separated system (DL II, DP II and FWF II); (c) comparison of the MLP (10,0.3,100,0.1), Elman (10,0.01,200,0.9) and PNN (5) outputs (anomaly detection) with the desired one for the six process input variables from the Steam Drums section.

Disclosure statement
No potential conflict of interest was reported by the authors.