Dynamic multimode process monitoring using recursive GMM and KPCA in a hot rolling mill process

The increasingly competitive market places higher demands on the iron and steel production process, which is high-dimensional, nonlinear and coupled across multiple scales. The emerging internet of things (IoT) and advanced communication technologies have promoted the widespread application of data-driven process monitoring methods. To deal with the multimode and non-stationary properties of hot rolling production, a dynamic multimode process monitoring method is proposed based on the recursive Gaussian mixture model (RGMM) and recursive kernel principal component analysis (RKPCA). The proposed approach is applied to monitoring thickness oversizing of hot-rolled strip, and comparative experiments against KPCA and GMM-KPCA are conducted on actual production data. The results show that the proposed method outperforms the conventional methods in terms of fault detection rate and false alarm rate when detecting time-varying multimode faults. The proposed method has also been integrated into an actual system and has been running smoothly in large steel mills in China.


Introduction
With increasingly fierce market competition, customers' demand for multi-variety, multi-specification and high-quality products is growing, which makes the process monitoring of steel production an active area of both academic and industrial research (Ma et al., 2017). The steel production process is composed of multiple interdependent physical and chemical processes and is characterized by heterogeneity, high dimensionality, multi-scale coupling and a complex genetic mechanism. Taking the hot rolling production process as an example, it consists of a reheating furnace, a roughing mill, a finishing mill, and a cooling and coiling sub-process. The final quality of hot-rolled coil products is described by three types of indices: size (e.g. strip thickness and width), shape (e.g. crown, wedge and flatness) and surface (e.g. oxidized scale and longitudinal cracks). Most of these quality indices cannot be measured on-line. These characteristics make traditional fault diagnosis and process monitoring methods based on mechanism models difficult to apply to the dynamic nonlinear hot rolling process.
CONTACT Hongwei Wang hongweiwang@zju.edu.cn
After decades of applying advanced sensing, communication technology and distributed control systems (DCS), most large steel mills have formed a five-level automation and information architecture comprising the basic automation system, the process control system, the manufacturing execution system, the manufacturing management system and the business decision-making system. The emerging industrial internet of things (IIoT) and cyber-physical system (CPS) technologies have further broken down the barriers between different information systems and promoted the collection, fusion and storage of multi-source heterogeneous data (Cao et al., 2020; Han et al., 2020). The IIoT platform and the development of data science enable the widespread application of data-driven process monitoring methods (Nkonyana et al., 2019), in which normal operating conditions are modelled with historical process data and the state of the monitored process is then examined by evaluating the deviation of monitoring indicators (Quiñones-Grueiro et al., 2019). The most typical data modelling methods are based on feature extraction, such as principal component analysis (PCA) and independent component analysis (ICA) (Guo et al., 2019; Jiang et al., 2016; Jiang & Yan, 2018; Zhou et al., 2016). They use dimensionality reduction to extract the main features that reflect whether a sample is normal or abnormal. Another category comprises correlation-based methods, such as canonical correlation analysis (CCA) and partial least squares (PLS) (Liu et al., 2017; Liu et al., 2018). Some machine learning and deep learning methods, such as the autoencoder (AE) and its variants and Bayesian networks, have also been applied to process monitoring owing to their powerful feature extraction capability for high-dimensional problems (Lee et al., 2019; Song et al., 2020; Yu & Zhao, 2019).
However, the variety of product specifications and the multiple system states give the hot rolling production process multimode characteristics, which challenges the monitoring of such a complicated and changeable industrial process. Existing approaches to multimode process monitoring can be divided into two main categories: global models and multiple models. In the first category, Zhang et al. proposed a basis-vector-based method for extracting common and specific features and used a Kullback-Leibler distance-based metric to measure the changes in both kinds of features. Huang et al. divided the multimode process into a common pattern and a mode-specific pattern and applied sparse coding in dictionary learning to obtain the reconstruction error. The second category applies clustering algorithms to divide the sample data into different modes and then establishes a monitoring model for each mode. Chen et al. integrated the K-means algorithm with just-in-time-learning-aided CCA to monitor processes with operating mode changes and slowly time-varying behaviour, and conducted comparative experiments with kernel-based methods (Chen et al., 2021). Song et al. proposed a neighbourhood subtractive clustering algorithm to divide the training dataset into several sub-datasets and constructed the process monitoring model of each mode based on the elastic network (EN). For online monitoring, kNN and a voting strategy were applied to select the most suitable monitoring model, and a contribution plot-based method was used for fault diagnosis (Song et al., 2020). The Gaussian mixture model (GMM) has been widely applied in multimode process monitoring to determine the number of modes automatically, since it works without any prior knowledge about the mode number. Wang et al. applied the Dirichlet process Gaussian mixture model (DPGMM) to classify the dataset and t-distributed stochastic neighbour embedding (t-SNE) to obtain low-dimensional features. Based on these features, the monitoring index for fault detection was constructed using support vector data description (SVDD). However, most of the existing GMM-based methods ignore the time-varying behaviour of the monitored process, which undermines the online adaptation of mode classification and identification.
In addition to the multimode characteristics, the non-stationary behaviour of process parameters and equipment states makes hot rolling production time-varying. Several adaptive methods based on multivariate statistical process monitoring (MSPM) models have been proposed. Liu et al. developed a moving window KPCA (MWKPCA) monitoring approach, which adapted the data mean and covariance matrix of the feature space in the updating procedure and approximated the eigenvalues and eigenvectors of the Gram matrix in the downdating procedure (Liu et al., 2009). To overcome the heavy computation and memory burden of KPCA, Jaffel et al. proposed a reduced KPCA approach in which the reduced kernel matrix was constructed based on the projection value of the transformed data (Jaffel et al., 2016). Zhang et al. improved the dynamic recursive KPCA method based on the singular value decomposition technique and applied it to fault detection in the continuous annealing process and the penicillin fermentation process (Zhang et al., 2012). However, most existing adaptive methods are built on single-mode process monitoring approaches. In this paper, a method that integrates recursive GMM and recursive KPCA is developed for dynamic and multimode fault detection in the hot rolling process. The proposed hybrid framework takes into account the multimode, nonlinear and non-stationary characteristics of the hot rolling production process. In this framework, GMM performs mode identification and KPCA performs fault detection, and a recursive mechanism is combined with both GMM and KPCA to achieve fully online adaptation of the monitoring process.
The rest of this paper is organized as follows. Section 2 introduces the preliminary knowledge of GMM and KPCA and presents the updating process of RGMM and RKPCA in detail. Section 3 describes the mechanism model and the parameter selection for thickness monitoring. Section 4 tests the proposed RGMM-RKPCA approach in comparative experiments. Finally, conclusions are drawn in Section 5, together with some discussion of potential future work.

RGMM
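Assuming that a set of samples $X = \{x_l \in \mathbb{R}^D\}_{l=1}^{N}$ is generated from $K$ multivariate Gaussian components, the distribution of $x$ can be expressed as
$$p(x \mid \Theta) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
where $\omega_k$ is the weight of the $k$th Gaussian component, and $\mu_k$ and $\Sigma_k$ are the mean and covariance of the $k$th component. The whole model parameter set is represented by $\Theta = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. The log-likelihood of the joint density of the sample set $X$ can be calculated as
$$\log L(\Theta) = \sum_{l=1}^{N} \log \sum_{k=1}^{K} \omega_k \, \mathcal{N}(x_l \mid \mu_k, \Sigma_k)$$
The EM algorithm can be used to obtain the model parameters.
In the E-step, the conditional expectation $\gamma_{lk}$ is calculated, which denotes the posterior responsibility of sample $x_l$ belonging to the $k$th component. $\gamma_{lk}$ is obtained according to Bayes' theorem,
$$\gamma_{lk} = \frac{\omega_k \, \mathcal{N}(x_l \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \omega_j \, \mathcal{N}(x_l \mid \mu_j, \Sigma_j)}$$
In the M-step, the parameters of the probability model are estimated by maximizing the log-likelihood function. The Lagrange function is used to find the extreme value under the constraint $\sum_{k=1}^{K} \omega_k = 1$.
Then $\omega_k$, $\mu_k$ and $\Sigma_k$ are given by
$$\omega_k = \frac{1}{N}\sum_{l=1}^{N} \gamma_{lk}, \qquad \mu_k = \frac{\sum_{l=1}^{N} \gamma_{lk}\, x_l}{\sum_{l=1}^{N} \gamma_{lk}}, \qquad \Sigma_k = \frac{\sum_{l=1}^{N} \gamma_{lk}\,(x_l - \mu_k)(x_l - \mu_k)^{T}}{\sum_{l=1}^{N} \gamma_{lk}}$$
When a new sample $x_{N+1}$ arrives, the model parameters in RGMM are updated recursively.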
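The recursive update formulas are not reproduced in the text above, so the following is only a minimal sketch of one way to fold a single new sample into a fitted GMM. It assumes that the responsibilities of past samples stay fixed and that no forgetting factor is used; under that assumption the M-step averages can be updated exactly from the accumulated responsibility $N\omega_k$. The function name `rgmm_update` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian at a single point x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)

def rgmm_update(weights, means, covs, n, x):
    """Fold one new sample x into a K-component GMM fitted on n samples.

    weights: (K,), means: (K, D), covs: (K, D, D), all float arrays.
    Assumption (not from the paper): past responsibilities are kept fixed.
    """
    K = len(weights)
    # E-step for the new sample only: posterior responsibilities gamma_k
    log_r = np.array([np.log(weights[k]) + log_gauss(x, means[k], covs[k])
                      for k in range(K)])
    gamma = np.exp(log_r - log_r.max())
    gamma /= gamma.sum()

    new_w = np.empty(K)
    new_mu = np.empty_like(means)
    new_cov = np.empty_like(covs)
    for k in range(K):
        s_old = n * weights[k]          # accumulated responsibility of mode k
        s_new = s_old + gamma[k]
        d = x - means[k]
        new_w[k] = s_new / (n + 1)
        new_mu[k] = means[k] + (gamma[k] / s_new) * d
        new_cov[k] = ((s_old / s_new) * covs[k]
                      + (s_old * gamma[k] / s_new**2) * np.outer(d, d))
    return new_w, new_mu, new_cov, gamma
```

With a single component the update reproduces the batch mean and (population) covariance of all $N+1$ samples, which is a convenient sanity check. A forgetting factor could be added to discount old data, but that is a design choice this sketch does not make.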

RKPCA
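Let $X_M = \{x_i \in \mathbb{R}^D\}_{i=1}^{M}$ be the set of zero-mean sample data. The basic idea of the KPCA method is to map the input space to a high-dimensional feature space through a nonlinear mapping $\Phi$.
The covariance matrix $C$ in the feature space is
$$C = \frac{1}{M}\sum_{j=1}^{M} \Phi(x_j)\,\Phi(x_j)^{T}$$
The eigenvalues $\lambda \ge 0$ and eigenvectors $v$ are calculated by
$$\lambda v = C v$$
Considering that all eigenvectors $v$ can be expressed as a linear expansion of $\Phi(x_1), \ldots, \Phi(x_M)$,
$$v = \sum_{i=1}^{M} \alpha_i \, \Phi(x_i)$$
the eigenvalue problem is equivalent to
$$\lambda\,(\Phi(x_k) \cdot v) = (\Phi(x_k) \cdot C v) \tag{9}$$
where $k = 1, 2, \ldots, M$. Defining the kernel matrix $K_{\mu\nu} := (\Phi(x_\mu) \cdot \Phi(x_\nu))$, (9) can be simplified to
$$M \lambda \alpha = K \alpha$$
with $\alpha = [\alpha_1, \ldots, \alpha_M]^{T}$. For a new testing sample $x$, its projection in the eigenvector space is
$$t_k = (v_k \cdot \Phi(x)) = \sum_{i=1}^{M} \alpha_i^{k}\,(\Phi(x_i) \cdot \Phi(x))$$
where $k = 1, \ldots, l$ ($l$ denotes the number of retained KPCs).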
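The two statistics are calculated as
$$T^2 = [t_1, \ldots, t_l]\,\Lambda^{-1}\,[t_1, \ldots, t_l]^{T}, \qquad Q = \sum_{j=1}^{n} t_j^2 - \sum_{j=1}^{l} t_j^2$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_l)$ contains the eigenvalues of the retained KPCs and $n$ is the number of non-zero eigenvalues. In the recursive KPCA, when a new sample $x_{M+1}$ arrives, only the first $m$ eigenvalues of $X_M$ are retained to reduce the computational complexity of KPCA. The parameters of the KPCA model are then updated recursively.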
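The updated KPCA projection direction is then calculated. Since the number of PCs changes over time, the cumulative percent variance (CPV) method is used to determine the number of PCs in RKPCA modelling:
$$\mathrm{CPV}(r) = \frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%$$
The $r$ PCs are chosen with a predetermined CPV. The $r$-dimensional nonlinear principal component $t$ of the new sample $x$ in the feature space is
$$t = [t_1, \ldots, t_r]^{T}, \qquad t_k = \sum_{i=1}^{M} \alpha_i^{k}\,(\Phi(x_i) \cdot \Phi(x))$$
The confidence limit for $Q$ is approximated by
$$Q_{\lim} = g\,\chi^2_{h,\beta}, \qquad g = \frac{\rho^2}{2\mu}, \qquad h = \frac{2\mu^2}{\rho^2}$$
where $\mu$ is the mean of the $Q$ statistics and $\rho^2$ is their variance. The confidence limit for $T^2$ is approximated by
$$T^2_{\lim} = \frac{r(M-1)}{M-r}\,F_{r,\,M-r,\,\beta}$$
where $\beta$ is the confidence level.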
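To make the monitoring statistics concrete, the following minimal sketch fits a batch (non-recursive) KPCA model, selects the number of PCs by CPV, and evaluates $T^2$ and $Q$ (SPE) for one sample. Several points are assumptions rather than details from the paper: the Gaussian kernel and its `width` parameter, the explicit centring of the kernel matrix, and the helper names `rbf_kernel`, `fit_kpca` and `monitor`. The recursive update and the confidence limits are not included.

```python
import numpy as np

def rbf_kernel(A, B, width):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / width)

def fit_kpca(X, width, cpv=0.90):
    """Fit KPCA on normal data X (M x D) and choose r PCs by CPV."""
    M = X.shape[0]
    K = rbf_kernel(X, X, width)
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one     # centre in feature space
    eigval, eigvec = np.linalg.eigh(Kc / M)        # solves (K/M) a = lambda a
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1] # sort in descending order
    keep = eigval > 1e-10 * eigval[0]              # non-zero eigenvalues only
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    # scale alpha so that each feature-space eigenvector v has unit norm
    alpha = eigvec / np.sqrt(M * eigval)
    ratio = np.cumsum(eigval) / eigval.sum()
    r = min(int(np.searchsorted(ratio, cpv)) + 1, len(eigval))
    return dict(X=X, K=K, alpha=alpha, eigval=eigval, r=r, width=width)

def monitor(model, x):
    """Return (T2, SPE) for one new sample x (1-D array)."""
    X, K, alpha = model["X"], model["K"], model["alpha"]
    k = rbf_kernel(x[None, :], X, model["width"])[0]
    # centre the test kernel vector with the training statistics
    kc = k - K.mean(axis=0) - k.mean() + K.mean()
    t = kc @ alpha                                 # scores on all non-zero PCs
    r = model["r"]
    t2 = np.sum(t[:r]**2 / model["eigval"][:r])
    spe = np.sum(t**2) - np.sum(t[:r]**2)
    return t2, spe
```

On the training data itself each retained score has variance $\lambda_k$, so the average $T^2$ equals the number of retained PCs; this gives a quick check that the scaling of $\alpha$ is consistent with the eigenvalue problem.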

Description of the hot-rolled strip thickness oversizing
With the development of the national economy, industries such as automobile manufacturing, machining, electronics and electrical appliances have an increasing demand for strip steel, and their quality requirements are also rising. Because strip steel has a large width-to-thickness ratio, thickness is one of its most important dimensional quality indicators. Thickness accuracy is usually divided into strip thickness deviation and head thickness hit rate. This paper focuses on the latter, which requires the difference between the thickness of the strip head and the required thickness of the finished product to be within the allowable accuracy range. Because the adjustment accuracy of the automatic gauge control (AGC) system is generally 30 to 50 μm, a thickness deviation of more than ±50 μm from the standard is defined as a product quality problem. Figure 1 shows examples of strip thickness oversizing.
To determine the key parameters of the multimode process monitoring model, we first analyse the factors that influence thickness, and how thickness changes, from the perspective of the mechanism model. When the rolled piece bites into the rolling mill, the rolls exert a large rolling force on it, resulting in plastic deformation. At the same time, the roll system is subjected to a reaction force of equal magnitude and opposite direction, which causes the rolls to bend and deform. The resulting loaded roll gap equals the thickness of the rolled strip. Therefore, the thickness setting model is based on the elastic deformation theory of the rolls and the plastic deformation theory of the rolled product. Figure 2 shows a schematic diagram of the thickness change during strip rolling together with the classical P-h curve, which couples the spring equation and the rolling force formula; the rolling force P is the ordinate, and the thickness and roll gap value are the abscissa. The P-h curve can be used to analyse intuitively the various causes of thickness deviation (Figure 2). Corresponding to the P-h curve is the spring equation for the rolling mill shown below.
$$h = S_0 + \frac{P - P_0}{K} + S_F + G + \delta$$
where $h$ is the thickness of the rolled piece, $S_0$ is the no-load roll gap, $P$ is the rolling force, $P_0$ is the prepressing force shown in Figure 6, $K$ is the stiffness coefficient of the rolling mill, $S_F$ is the thickness change caused by the bending force, $G$ is the zero position of the roll gap, and $\delta$ is the oil film thickness compensation. $P$ is related to strain accumulation, dynamic recrystallization, static recrystallization, phase change and other factors. Existing rolling force models are simplified derivations from the relevant influencing factors under specific constraints, and the following expression can be used.
$$P = B \, l_c \, Q_p \, K \, K_T$$
where $B$ is the strip width and $l_c$ is the horizontal projection length of the contact arc between the roll and the rolled piece, which is related to the radius of the roll. $Q_p$ is the stress state coefficient, which is determined by the shape parameter of the deformation zone. In this expression $K$ denotes the metal deformation resistance (not the mill stiffness of the spring equation), which is related to the deformation temperature, deformation speed and degree of deformation. $K_T$ is the influence coefficient of the front and back tensile stress on the rolling force.
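As a small worked example of how the spring equation links rolling force to thickness, the sketch below assumes the additive form $h = S_0 + (P - P_0)/K + S_F + G + \delta$, with every compensation term treated as a signed quantity. All numerical values (roll gap, forces and mill stiffness) are hypothetical and chosen only for illustration; they are not plant data from this study.

```python
def exit_thickness(s0, p, p0, k_mill, s_f=0.0, g=0.0, delta=0.0):
    """Strip thickness (mm) from the additive spring equation.

    s0: no-load roll gap (mm); p, p0: rolling and prepressing force (kN);
    k_mill: mill stiffness (kN/mm); s_f, g, delta: compensation terms (mm).
    """
    return s0 + (p - p0) / k_mill + s_f + g + delta

# Hypothetical operating point giving a 3.4 mm strip.
h_nominal = exit_thickness(s0=2.4, p=15000.0, p0=10000.0, k_mill=5000.0)

# Same set-up with a 250 kN rolling-force disturbance.
h_disturbed = exit_thickness(s0=2.4, p=15250.0, p0=10000.0, k_mill=5000.0)

deviation_um = (h_disturbed - h_nominal) * 1000.0
print(f"nominal {h_nominal:.3f} mm, deviation {deviation_um:.1f} um")
```

Because the thickness depends on the force through $1/K$, a force deviation of 250 kN at a stiffness of 5000 kN/mm corresponds to 250/5000 = 0.05 mm, i.e. the ±50 μm limit used above to define a thickness quality problem.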

Parameters' selection
According to the above mechanism model, we selected 48 parameters as inputs of the process monitoring model, divided into incoming slab-related, rolling force-related and rolling mill-related parameters. Table 1 shows the range and distribution of some of the variables. Because of fluctuations in the data acquisition equipment, such as unstable sensor readings in a complex and changeable environment, the recorded process parameters fluctuate. A large fluctuation violates the statistical regularity of the data set and is regarded as an outlier, whereas a small fluctuation is regarded as noise. Given the complex working conditions of industrial production and changes in the production environment, noise is inevitable, and accounting for it during modelling can improve the robustness and stability of the model. It is therefore necessary to select an algorithm that retains noisy data while eliminating outliers. In this paper, the isolation forest (iForest) algorithm is selected to eliminate the abnormal values in the thickness data of hot-rolled strip production. iForest is a fast outlier detection method based on ensemble learning, which has linear time complexity and high accuracy and is suitable for continuous numerical data. Unlike other anomaly detection algorithms, which use distance and density indicators to describe the degree of alienation between samples, iForest detects outliers by isolating sample points. Compared with traditional algorithms such as LOF (Local Outlier Factor) and K-means, iForest is more robust to high-dimensional data. The algorithm flow is shown in Figure 3.
Given the sample data, an isolation tree is constructed by randomly selecting a feature and a split value and recursively dividing the data set until any one of the following conditions is met: the tree reaches the height limit; the node contains only one sample; or all samples on the node have identical feature values. The task of anomaly detection is to rank samples by their degree of abnormality. The commonly used ranking is based on the path length or the anomaly score of the sample points. The path length is the number of nodes that a sample passes through from the root node to the terminal node of the isolation tree.
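The relation between path length and anomaly score can be sketched as follows. The text above does not give the score formula, so this example assumes the standard isolation forest definition, $s(x, n) = 2^{-E[h(x)]/c(n)}$, where $E[h(x)]$ is the mean path length of a sample over all trees and $c(n)$ is the average path length of an unsuccessful binary search tree search over $n$ points. The function names are hypothetical, and $n \ge 2$ is assumed.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Score in (0, 1]; values close to 1 indicate likely outliers."""
    return 2.0 ** (-mean_path_length / c_factor(n))

# A sample isolated after very few splits scores higher than one that
# needs roughly the average number of splits (sub-sample size 256).
short_path = anomaly_score(2.0, 256)
average_path = anomaly_score(c_factor(256), 256)
```

A sample whose mean path length equals $c(n)$ receives a score of exactly 0.5, so scores well above 0.5 flag samples that are isolated unusually quickly.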

Experimental results
The steel industry IoT platform, independently developed by our team, has been deployed in several large- and medium-sized steel companies in China. The big data platform collects data from the automation control systems of different manufacturers on the hot strip production line and from the L2 database management system, and builds a complete factory-level database of quality, process, equipment and energy consumption. To eliminate the influence of steel grade, thickness, width, etc., we fix the steel grade as Q195L, the thickness of the intermediate billet as 45 mm and the thickness of the finished product as 3.4 mm. A total of 2300 production records were obtained from the field, and 238 abnormal records were eliminated by the isolation forest algorithm. We select 2000 records from the remaining set as the historical data set for algorithm training and testing. The overall offline learning and online monitoring process is shown in Figure 4.
Two commonly used indices, the fault detection rate (FDR) and the false alarm rate (FAR), are applied in this paper to evaluate the performance of the process monitoring methods. FDR is the ratio of the number of correctly detected faults to the total number of faults, while FAR is the proportion of normal samples that are falsely reported as faults among all normal samples. To verify the effectiveness of the GMM algorithm in mode identification and the advantage of RKPCA over KPCA in dealing with dynamic processes, we compared three methods: KPCA, GMM-KPCA and RGMM-RKPCA. The CPV threshold is 90%, and the confidence level of all methods is 99% (Table 2). Figure 5 shows the results of the different methods in monitoring thickness-related variables in the hot rolling production process. In the KPCA method, we first use 1000 normal samples for model training. The test data consist of four parts: 250 normal samples of mode 1, 250 normal samples of mode 2, 250 faulty samples of mode 1 and 250 faulty samples of mode 2. The SPE and T² statistics of KPCA detect the faults of mode 2 well, but their accuracy is low for the faults of mode 1, as shown by an FDR of 22%. In addition, the FAR of the SPE statistic of KPCA is 7.2% in mode 2, which indicates that the number of false alarms increases because the KPCA algorithm ignores the data characteristics of the different modes.
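The two evaluation indices defined at the start of this section can be computed from per-sample alarm flags and ground-truth fault labels. The sketch below is illustrative only; the function name `fdr_far` and the toy labels are hypothetical and are not the experimental data.

```python
import numpy as np

def fdr_far(alarm, is_fault):
    """Fault detection rate and false alarm rate from boolean flags.

    alarm[i]    : True if the monitoring statistic exceeded its limit.
    is_fault[i] : True if sample i is actually faulty.
    """
    alarm = np.asarray(alarm, dtype=bool)
    is_fault = np.asarray(is_fault, dtype=bool)
    fdr = alarm[is_fault].mean()     # detected faults / all faults
    far = alarm[~is_fault].mean()    # false alarms / all normal samples
    return fdr, far

# Toy example: 4 faulty samples (3 detected), 4 normal samples (1 alarm).
fdr, far = fdr_far(alarm=[1, 1, 0, 1, 0, 0, 0, 1],
                   is_fault=[1, 1, 1, 1, 0, 0, 0, 0])
```

In the toy example three of four faults raise an alarm and one of four normal samples does, so the FDR is 0.75 and the FAR is 0.25. In a multimode setting the same function can be applied per mode by first selecting the samples assigned to that mode.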
In the latter two algorithms, we first use the GMM algorithm to classify the 1000 normal samples and use the normal samples of each mode to establish the corresponding KPCA model. In the online monitoring process, the GMM algorithm identifies the mode of the new sample, and the corresponding KPCA model is used for diagnosis. The difference between the two is that the RGMM-RKPCA algorithm updates the model parameters according to the diagnosis results of the samples during the training phase, so that the SPE and T² indicators also change dynamically. Comparing KPCA with GMM-KPCA shows that, after mode identification, all the fault data in mode 1 are detected with a FAR of 4%. Although the fault detection rate in mode 2 is reduced by 7.6% and 1.8% for the SPE and T² statistics, respectively, the false alarm rate is also much lower. It can be inferred that establishing multiple nonlinear monitoring models for different steady-state processes can effectively address the multimode problem. However, the non-stationary problem is still not well addressed by GMM-KPCA, as its false alarm rate shows. Figure 5(e,f) shows that, compared with the KPCA algorithm, the T² and SPE statistics of GMM-KPCA better identify fault data in different modes, and that RKPCA further reduces the proportion of false alarms in each mode. By adapting the sample data of the KPCA model online, the monitoring statistics of RKPCA also become adaptive and robust to slight faults.
The algorithms proposed in this paper have been running smoothly for six months in the actual system. Figure 6 shows the GUIs of the monitoring software on the site of a steel plant. Figure 6(a) shows the monitoring of different indices in the hot rolling process, such as thickness, width and FDT (finisher delivery temperature). Figure 6(b) illustrates the parameter importance ranking that affects different monitoring indices, and the corresponding reasonable parameter interval recommendation.

Conclusion
To better deal with the multimode and time-varying properties of hot rolling production, this study proposes a dynamic multimode process monitoring method based on the recursive Gaussian mixture model (RGMM) and recursive kernel principal component analysis (RKPCA). GMM is applied to classify the dataset and determine the mode of a new sample, and KPCA is used to construct the monitoring index for fault detection in each mode. The proposed approach is applied to monitoring thickness oversizing of hot-rolled strip and compared with KPCA and GMM-KPCA on actual production data. The results show that the proposed method outperforms the conventional methods in terms of fault detection rate and false alarm rate when detecting time-varying multimode faults.
It is worth mentioning that the proposed method has some shortcomings, which suggest directions for future research. The time complexity of the proposed approach can be reduced for real-time industrial application. For nonlinear monitoring, incorporating emerging deep learning algorithms, such as the recurrent neural network (RNN) and deep belief network (DBN), could be a promising extension. As the step following fault diagnosis, fault location and repair mechanisms are necessary in the industrial field to guide the operation of physical entities. In addition, as the dimensionality of industrial process data grows, distributed parallel approaches can be considered to handle big process data more efficiently.

Disclosure statement
No potential conflict of interest was reported by the author(s).