Multivariate statistical process control methods for batch production: a review focused on applications

ABSTRACT In this paper, we highlight the basic techniques of multivariate statistical process control (MSPC) under the dimensionality criteria, such as Multiway Principal Component Analysis, Multiway Partial Least Squares, Structuration à Trois Indices de la Statistique, Tucker3, Parallel Factors, Multiway Independent Component Analysis, Multiset Canonical Correlation Analysis, Slow Features Analysis, and Parallel Coordinates. Furthermore, we summarize the procedures of each statistical technique and the usage of multivariate control charts. In addition, we review the most significant proposals and applications in practical cases. Finally, we compare and discuss the benefits and limitations of the methods.


Introduction
Multivariate statistical process control focuses on process monitoring in which several related variables are involved (Bersimis et al., 2007). Nowadays, there are many industrial scenarios where simultaneous monitoring or control of two or more related quality/process characteristics is necessary. In particular, several industries rely on batch processing, including, for example, chemicals, pharmaceuticals, food products, fabrics, metals, bakeries, pulp, and paper manufacturers. In this specific situation, high-quality products are commonly described by quality characteristics (variables), each of which must be controlled within specifications to maintain customer satisfaction and to describe the process performance as the batch progresses (Lewis, 2014).
Pioneering work on multivariate process control was first illustrated in an air-testing case of sample bombsights (Hotelling, 1947). This key contribution gave way to many other techniques, such as multivariate extensions of all kinds of univariate control procedures and control charts, like the multivariate Shewhart-type, multivariate cumulative sum (MCUSUM), and multivariate exponentially weighted moving average (MEWMA) charts (Bersimis et al., 2007). However, these procedures only took into consideration the multivariate approach, leaving aside situations where independence and multinormality are not verifiable.
In batch processes, traditional multivariate procedures became unusable, and the principal contributions appeared with multivariate projection methods. As far as our concern goes, the first novel approaches to batch monitoring were mostly grounded on multiway partial least squares (MPLS) and multiway principal component analysis (MPCA) (Nomikos & MacGregor, 1995a). From that moment on, several strategies and methods have been developed in order to meet statistical assumptions typical of batch production models, such as correlation/autocorrelation structure, distribution- and parameter-free settings, and non-linearity, among others (Lewis, 2014).
As is known, multivariate process control is one of the most rapidly developing areas of statistical process control, and the need for review papers is increasingly compelling for boosting new research ideas in batch processing (Bersimis et al., 2007; Woodall & Montgomery, 1999). This paper is the result of a literature review of the most recent developments in the area of multivariate statistical process control, considering batch datasets.
The remaining sections are organized as follows. A brief review of the article search methodology is detailed in Section 2. The standard terminology used throughout the paper is specified at the beginning of Section 3. Dimensionality and non-dimensionality reduction methods are described in Sections 3.1 and 3.2, respectively. Relevant proposals and applications of these strategies are summarized in Section 4. Finally, some concluding remarks are delivered in Section 5.

Materials & methods
Google Scholar and several databases were searched for relevant articles published from 1990 to 2018, using the keywords 'MSPC', 'batch', 'application', and 'review'. As a result, about 160 articles were retrieved and reviewed. Only 37 publications from 23 journals and 9 databases were selected from the previous search and analyzed in detail, following the criterion of practical application in industrial processes. Articles with applications on matrices instead of multiway arrays were given less consideration. The distribution of articles by database can be observed in Table 1. In some cases, the third way is not a 'K' number of observations; instead, an ordered time succession is considered.
A wide variety of methods are used to control batches in MSPC. Most of them rely on dimensionality reduction via decomposition of the multilinear array [X] (Figure 2), considering a lower-rank approximation to represent it, ending with an estimated array [X̂] and its corresponding error tensor [E], obtained as [E] = [X] − [X̂]. An unfolding procedure of the three-way array is performed in several methods. Three different slicing perspectives (Figure 3) are possible within this context; then, according to the subject of the study, 12 types of unfolding can be done, as shown in Table 2. For simplicity, in this review the unfolded matrices are denoted X(I), X(J), and X(K), according to the unfolding direction. After the unfolding step, the application of singular value decomposition (SVD) is usual, where two or more principal components are the subject of the main analysis using Hotelling's T² statistic, or a similar one like the D² statistic. The error block is also monitored with statistics such as Q and SPE. Two principal components allow practical visual approaches to be generated; nevertheless, for T² control charts, cross-validation can be applied to define an optimum number R of components to use.
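The unfolding-plus-SVD monitoring scheme just described can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical array sizes and random data; the D² variant and the cross-validated choice of R are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 20, 4, 15                     # batches x variables x times (hypothetical sizes)
X = rng.normal(size=(I, J, K))

# Batch-wise unfolding: each batch becomes one row of length J*K
Xu = X.reshape(I, J * K)

# Column centering and scaling before the SVD
Xc = (Xu - Xu.mean(axis=0)) / Xu.std(axis=0)

# SVD and projection onto R retained components
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
R = 2
T = Xc @ Vt[:R].T                       # scores
E = Xc - T @ Vt[:R]                     # error (residual) block

# Hotelling's T^2 on the scores and SPE (Q) on the residuals, per batch
lam = (s[:R] ** 2) / (I - 1)            # variance of each retained score
T2 = np.sum(T ** 2 / lam, axis=1)
SPE = np.sum(E ** 2, axis=1)
```

In a control-charting context, each batch's T² and SPE values would then be compared against control limits derived from the reference data.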
Based on the literature reviewed during this study, two main groups of multivariate statistical methods can be discerned: those that reduce dimensionality and others that preserve data original dimension.
According to the presented classification of MSPC methods for batch production, the reviewed articles were distributed as shown in Figure 4. It is important to point out that some of these articles consider two or more methods to compare their performance and highlight a specific one, under predefined criteria.

Multiway principal component analysis (MPCA)
MPCA was first proposed by Wold et al. (1987), and later Nomikos and MacGregor suggested a formal methodology for batch processes. This method allows reliable feature extraction from the data and projection onto a subspace of lower dimension, easier for graphing and interpretation (Lu et al., 2008).
From a given multiway array [X] of reference batches, MPCA performs K-unfolding (Figure 5) and column centering; then conventional PCA is applied (Abdi & Williams, 2010). The preprocessed matrix X(K) exhibits the PCA decomposition X(K) = P Qᵀ + E, from which scores (P) and loadings (Q) are obtained. The common practice is to use the scores in a biplot (t1-t2 scores plot), which involves R retained components, typically the first and second ones (MacGregor, 1997). Also, data projected onto the t1-t2 scores plot help to define confidence regions (Geladi et al., 2003).
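As a hedged sketch of the MPCA steps, the following NumPy code K-unfolds a hypothetical batch tensor, applies conventional PCA via the SVD, and computes the semi-axes of an F-distribution-based 95% confidence ellipse for the t1-t2 scores plot (this ellipse formula is one common choice for PCA score limits, assumed here rather than taken from the text):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
I, J, K = 30, 5, 10                        # hypothetical sizes
X = rng.normal(size=(I, J, K))

# K-unfolding as used here: one row per batch (I x JK), then column centering
Xk = X.reshape(I, J * K)
Xc = Xk - Xk.mean(axis=0)

# Conventional PCA via SVD; keep the first two components for a t1-t2 plot
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                     # t1, t2 score columns
var = (s[:2] ** 2) / (I - 1)               # variance of each retained score

# Semi-axes of a 95% confidence ellipse on the t1-t2 plane
R = 2
crit = R * (I - 1) * (I + 1) / (I * (I - R)) * f.ppf(0.95, R, I - R)
axes = np.sqrt(var * crit)                 # ellipse semi-axes along t1 and t2
```

Plotting `scores[:, 0]` against `scores[:, 1]` with this ellipse overlaid reproduces the kind of confidence-region biplot described above.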

Multiway partial least squares (MPLS)
PLS is also referenced as Projection to Latent Structures because the method's purpose is to obtain a reduced representation of the main information, using latent variables. It is based on the strong correlation between process and quality variables sets, X and Y respectively (Nomikos & MacGregor, 1995a).
MPLS performs very well when enough historical data are available, allowing inferences to be drawn (Flores-Cerrillo & MacGregor, 2005). As in MPCA, the tensors of process variables [X] and quality variables [Y] must be K-unfolded into two-way matrices, and then the PLS steps are applied (Abdi, 2010). The preprocessed matrices X(K) and Y(K) are decomposed as X(K) = P_X Q_Xᵀ + E_X and Y(K) = P_Y Q_Yᵀ + E_Y, so that scores (P_X, P_Y) and loadings (Q_X, Q_Y) are obtained for the process and quality variables. Further analysis includes the multivariate regression Y(K) = X(K) B, where B is computed as shown by Rosipal and Krämer (2005) (Figure 6).

Structuration des tableaux à trois indices de la statistique (STATIS)
The method was initially proposed by L'Hermier des Plantes (1976) and later described by Escoufier (1987) and Lavit, Escoufier, Sabatier and Traissac (Lavit et al., 1994). This multivariate data analysis method is aimed at exploring and analyzing the structure of several data tables obtained under different scenarios. The method reduces data dimensionality through a similarity measure based on Euclidean distances between point configurations. First step (interstructure): considering the three-way array [X], the cross-product matrices X_k X_kᵀ are obtained for every slice and stored in a larger matrix Z; then its singular value decomposition provides weights for the slices. In this step, every slice is represented onto the interstructure space denoted by C = Z Zᵀ.

Second step (intrastructure): the slices are juxtaposed in a unique array X(I). The matrix X(I), the masses matrix M, and the weights matrix A conform the triplet (X(I), A, M), on which a generalized singular value decomposition (GSVD) is performed. In terms of statistical process control, the pioneering work by Scepi (2002) was the first contribution of STATIS. A recent review of this multivariate method and its extensions is available in Abdi et al. (2012).
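The interstructure step can be sketched as follows in NumPy. This is a simplified illustration with hypothetical slice sizes: the slice normalization and masses/weights used by full STATIS variants are omitted, and the inter-slice similarity is computed directly as Frobenius inner products of the cross-product matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 12, 5, 6                         # rows, variables, slices (hypothetical)
X = rng.normal(size=(K, I, J))             # K slices, each an I x J table

# Interstructure: one cross-product matrix per slice
W = np.array([Xk @ Xk.T for Xk in X])      # K matrices of order I x I

# Matrix of scalar products between slices (RV-type similarity)
C = np.array([[np.trace(Wa @ Wb) for Wb in W] for Wa in W])

# Leading eigenvector of C gives the weight of each slice in the compromise
vals, vecs = np.linalg.eigh(C)             # eigh returns ascending eigenvalues
weights = np.abs(vecs[:, -1])              # entries can be taken non-negative
compromise = np.tensordot(weights, W, axes=1)   # weighted sum of the W_k
```

The `compromise` matrix is the consensus configuration that subsequent STATIS analysis (and its projection of individual slices) is built on.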

Parallel factors analysis (PARAFAC/CANDECOMP)
This method was developed in 1970 as PARAFAC (PARAllel FACtors) by Harshman and as CANDECOMP (CANonical DECOMPosition) by Carroll and Chang, independently and simultaneously (Carroll & Chang, 1970; Harshman, 1970). Because of the nature of the decomposition, PARAFAC can be compared with a three-way PCA, with one set of score vectors and two sets of loading vectors stored in matrices. The PARAFAC method has been successfully applied for statistical control and analysis in batch production (Meng et al., 2003).
In this case, Figure 8 reveals the decomposition of the three-way array [X], which can be described as x_ijk = Σ_r a_ir b_jr c_kr + e_ijk (r = 1, …, R), where a_ir, b_jr and c_kr are elements of the matrices A, B and C, of orders I × R, J × R and K × R, respectively.
The K-unfolded matrix X(K) can be decomposed considering the Khatri-Rao product ⊙ (Smilde et al., 2004) as X(K) = C (B ⊙ A)ᵀ + E(K). The computation of the factor sets is often slow and difficult; alternating least squares can be applied, but it is not very efficient. In addition, Kiers and Krijnen (1991) proposed a three-step method that improves the process and makes it more feasible to use.
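A plain alternating-least-squares (ALS) fit of the PARAFAC model can be sketched as below (the Kiers-Krijnen acceleration mentioned above is not implemented; this is basic ALS on a small noiseless rank-R tensor, with hypothetical sizes, so that the factors can be recovered):

```python
import numpy as np

def khatri_rao(P, Q):
    # Column-wise Kronecker product; rows ordered with the second factor fastest
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

rng = np.random.default_rng(4)
I, J, K, R = 6, 5, 4, 2
# Noiseless rank-R tensor built from known factors, so ALS can recover them
A0, B0, C0 = rng.normal(size=(I, R)), rng.normal(size=(J, R)), rng.normal(size=(K, R))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

# ALS: update each factor in turn against the matching unfolding
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
for _ in range(500):
    A = np.linalg.lstsq(khatri_rao(B, C), X.reshape(I, -1).T, rcond=None)[0].T
    B = np.linalg.lstsq(khatri_rao(A, C), X.transpose(1, 0, 2).reshape(J, -1).T, rcond=None)[0].T
    C = np.linalg.lstsq(khatri_rao(A, B), X.transpose(2, 0, 1).reshape(K, -1).T, rcond=None)[0].T

X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)   # relative fit error
```

The relative error `err` dropping to (near) zero confirms that the three factor matrices jointly reproduce the tensor, which is the property batch-monitoring applications of PARAFAC rely on.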

Tucker3
This method was first proposed by Tucker (1966), as shown in Figure 9. It can be used for higher-dimensional data arrays; for instance, Tucker3 is a decomposition of [X] into four parameter sets: the loadings A, B, C (as in PARAFAC) and an array [G] called the 'core'. Every observation can be expressed as x_ijk = Σ_{r1} Σ_{r2} Σ_{r3} g_{r1 r2 r3} a_{i r1} b_{j r2} c_{k r3} + e_ijk, where a_{i r1}, b_{j r2} and c_{k r3} are elements of the three matrices A, B and C, of orders I × R1, J × R2 and K × R3, respectively, and g_{r1 r2 r3} represents an element of the core [G], of order R1 × R2 × R3.
The K-unfolded matrix X(K) can be decomposed considering the K-unfolded core G(K) and the Kronecker product ⊗ (Bellman, 1997) as X(K) = C G(K) (B ⊗ A)ᵀ + E(K). The Tucker3 model allows the contribution of many more factors than PCA and PARAFAC, and its core structure can describe the level of interaction between these factors. When the core is a superidentity array (ones on the superdiagonal and zeros elsewhere), Tucker3 is equivalent to a PARAFAC analysis. A comparison between MPCA, PARAFAC and Tucker3 was carried out on a semi-batch process in Louwerse and Smilde (2000).
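Tucker3 is usually fitted by ALS; as a simpler hedged sketch, the following code computes a truncated higher-order SVD (HOSVD), which yields orthonormal loadings for each mode and a core by projection (tensor sizes and core dimensions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
I, J, K = 6, 5, 4
R1, R2, R3 = 3, 3, 2                        # core dimensions, chosen arbitrarily
X = rng.normal(size=(I, J, K))

def unfold(tensor, mode):
    # Matricize: bring `mode` to the front and flatten the rest
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# HOSVD: loadings = leading left singular vectors of each unfolding
A = np.linalg.svd(unfold(X, 0))[0][:, :R1]
B = np.linalg.svd(unfold(X, 1))[0][:, :R2]
C = np.linalg.svd(unfold(X, 2))[0][:, :R3]

# Core: project X onto the three loading bases
G = np.einsum('ijk,ia,jb,kc->abc', X, A, B, C)

# Reconstruction X ≈ G x1 A x2 B x3 C
X_hat = np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)
```

Because the loadings have orthonormal columns, the core `G` summarizes how the factors of the three modes interact, which is the interpretive advantage of Tucker3 noted above.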

Multiset canonical correlation analysis (MCCA)
MCCA extends the CCA constraint of 'two sets of variables' to allow analyzing multiple sets at the same time. The purpose is to find weight vectors that maximize the correlation among the canonical variables. Within the multiway array [X], the m-th weight vector u_k^(m) is obtained per K-slice from the SVD of a square matrix (Parra, 2018), and its corresponding canonical variable is computed as z_k^(m) = X_k u_k^(m) (Figure 10). A more detailed mathematical formulation and computation are available in Jiang et al. (2018) and Wang et al. (2017).

Multiway independent component analysis (MICA)
Often used to find independent signals that recreate complex ones, MICA is a powerful method when MPCA does not fit the data in an optimum way. MICA is equivalent to applying ICA on an unfolded and mean-centered matrix X(K) (as in MPCA), leading to the decomposition X(K) = A S, where A contains the scores and S the information of the independent components of X(K), every one of them statistically independent from the others. These components are computed from a weights matrix W. The iterative procedure to estimate W consists of calculating weight vectors w_i with maximum non-gaussianity, using an objective function with kurtosis as a non-gaussianity measurement. After every non-gaussianity maximization, w_i is decorrelated from the others previously computed. The MICA decomposition is summarized in Figure 11. This method is a big aid when non-Gaussian variables become difficult to manage. More details and mathematical specifications are available in Hyvärinen and Oja (2000).
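A hedged sketch of MICA using scikit-learn's `FastICA` on the unfolded, mean-centered matrix (note that FastICA's default objective is a negentropy approximation rather than raw kurtosis; the non-Gaussian Laplace sources and tensor sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
I, J, K = 200, 3, 5
# Hypothetical batch tensor driven by two non-Gaussian latent sources
S = rng.laplace(size=(I, 2))
M = rng.normal(size=(2, J * K))
X = (S @ M).reshape(I, J, K) + 0.01 * rng.normal(size=(I, J, K))

# MICA here = ICA on the unfolded, mean-centered matrix X(K)
Xk = X.reshape(I, J * K)
Xc = Xk - Xk.mean(axis=0)
ica = FastICA(n_components=2, random_state=0)
scores = ica.fit_transform(Xc)     # analogous to A in X(K) = A S
weights = ica.components_          # the estimated demixing weights W
```

Up to sign and scale, the recovered score columns match the original non-Gaussian sources, which is why MICA discriminates unusual batch behavior better than MPCA when the data are non-Gaussian.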

Multiway slow features analysis (MSFA)
MSFA is equivalent to applying SFA over a J-unfolded matrix X(J). The basic idea of SFA, as described by Wiskott (1999) and Wiskott and Sejnowski (2002), is to find a series of instantaneous scalar input-output mapping functions so that the output signals vary as slowly as possible. In this sense, the method's goal is to find a set of constrained functions y(t), called features, with the capability to model complex behavior from a dataset. The optimization problem is described as follows: the functions y(t) should have a 'slow variation', measured by the variance of their first-order derivatives with respect to time, and should also present zero mean, unit variance, and mutual decorrelation.
The MSFA process starts with the unfolding of [X] and mean centering. Denoting x_i as the i-th row of X(J), two J × J covariance matrices are defined: A = ⟨ẋ ẋᵀ⟩, the covariance of the first-order differences, and B = ⟨x xᵀ⟩, the covariance of the signals. Solving the optimization problem of SFA leads to the generalized eigenvalue problem (GEVD) A W = B W Ω, which allows new observations to be projected using the w_j vectors, as depicted in Figure 12.
For a more detailed and extensive analysis of the method, see Zhang et al. (2017).
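The SFA core of the method can be sketched directly from the two covariance matrices and the GEVD (the mixed sinusoidal signals are an illustrative assumption; the slowest extracted feature should recover the slowest underlying source):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
T, J = 500, 4
t = np.linspace(0, 2 * np.pi, T)
# Mix one slow sine with faster signals; SFA should isolate the slow one
latent = np.column_stack([np.sin(t), np.sin(11 * t), np.sin(23 * t),
                          rng.normal(size=T)])
X = latent @ rng.normal(size=(4, J))
X = X - X.mean(axis=0)

# GEVD: minimize the variance of the time derivative subject to unit variance
Xdot = np.diff(X, axis=0)
A = Xdot.T @ Xdot / (len(Xdot) - 1)     # covariance of first-order differences
B = X.T @ X / (T - 1)                   # covariance of the signals
vals, W = eigh(A, B)                    # ascending: slowest feature first
slow = X @ W[:, 0]                      # slowest extracted feature
```

In a batch-monitoring setting, the columns of `W` are the projection vectors applied to new observations, and the slow features feed the monitoring statistics.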

Parallel coordinates
Parallel coordinates as a graphical representation system were proposed by Alfred Inselberg, allowing multivariate data to be represented in a two-dimensional system (Inselberg & Dimsdale, 1990). Each of the K observations of [X] within the I-slices is represented over the J variables by a polygonal line of J − 1 segments. In addition, the correlation analysis between the J variables focuses on the intersections of the polygonal lines constructed from the raw I-slices (Inselberg, 2009; Wegman, 1990). When lines present similar behavior, this tool is very useful for identifying clusters of observations (Inselberg, 2009).
In this sense, parallel coordinates can be used for 'visual clustering', i.e. to find groups of similar points based on visual features such as the proximity of lines or line density (Heinrich & Weiskopf, 2013). For control and monitoring, the clustering and summarization capacity of parallel coordinates is of great importance in plotting confidence regions (Dunia et al., 2012), as illustrated in Figure 13.
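The construction of a parallel-coordinates representation can be sketched as follows: each observation of a slice is min-max scaled per variable and turned into the vertices of a polygonal line across J parallel axes (sizes are hypothetical; actual drawing with matplotlib is indicated in the final comment):

```python
import numpy as np

rng = np.random.default_rng(9)
I, J = 8, 5                      # observations x variables for one slice
X = rng.normal(size=(I, J))

# Min-max scale each variable so all parallel axes share a [0, 1] range
Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Each observation becomes a polygonal line with J vertices (J - 1 segments):
# x-coordinates are the parallel axes 0..J-1, y-coordinates the scaled values
axes_x = np.arange(J)
polylines = [np.column_stack([axes_x, row]) for row in Xn]

# With matplotlib, each polyline would be drawn as plt.plot(axes_x, row)
```

Reference batches drawn this way define the band against which a new batch's polyline is visually compared, which is how the confidence regions of Dunia et al. (2012) are read.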

Applications timeline
In 1994, Nomikos and MacGregor considered simulated data from an industrial process for semi-batch emulsion polymerization of styrene-butadiene rubber (SBR) latex to validate the monitoring of batches using MPCA on historical data. This kind of approach has many advantages because it allows the quality of the process and its products to be controlled even without a deep understanding of the process dynamics. Kosanovich et al. (1994) applied MPCA to real data from an industrial single batch reactor, including stage separation. This improved the detection of the principal causes of batch failure, such as variable reactor temperature (stage 1) and heat input rate (stage 2).
In 1995, Nomikos and MacGregor used MPCA to summarize the variables and time of real data from an industrial batch polymerization reactor into low-dimensional spaces, which allowed the process to be monitored visually (Nomikos & MacGregor, 1995b). Also, Kourti et al. (1995) applied the MPLS technique to real data from the same industrial batch polymerization process with one and two stages, considering multiblock arrangements. They constructed monitoring charts which helped to detect faults quickly and easily, allowing the isolation of the possible causes of the fault. In addition, Nomikos and MacGregor (1995a) pointed out that MPLS can be easily implemented for almost any batch or semi-batch process. They applied this technique to simulated data of a polymerization process to control not only process variable trajectories but also final quality measurements when the batch production ends. Gallagher et al. (1996) showed that using MPCA on real data from a nuclear waste storage tank helped to monitor the process and identify differences between the start-up period and the calibration period. The main advantage of this method is shown in terms of input data reduction.
MPCA was also applied in 1996 to simulated data from a batch polymerization reactor study, based on a pilot-scale methyl methacrylate (MMA) reactor; the proposed method used special confidence bounds and validated the capacity of the method to monitor the data (Martin et al., 1996). Control strategies using MPCA also help to improve process understanding, as shown in Kosanovich et al. (1996), where the major sources of variability were identified within the polymerization process. Extensions of MPCA, such as nonlinear PCA, were considered in Dong and McAvoy (1996) to improve the performance of the monitoring scheme on two sets of simulated data: one for emulsion polymerization and the other for a two-stage jacketed exothermic batch chemical reactor.
Process knowledge was combined in 1998 with an MPCA and MPLS approach, as found in Neogi and Schlags (1998). The combination was applied to real data of emulsion polymer production to detect potentially anomalous batches, the exact times of abnormalities, and possible causes of variability. Later, in 1999, a successful MPCA application was developed based on real historical data from industrial batch production of PVC via a polymerization process (Tates et al., 1999). Also, Martin et al. (1999) proposed the application of MPCA to simulated data for a pilot-scale MMA reactor process; combined with inverse MPLS (IMPLS), it led to predicting desired profiles for the variables, which enhanced the control and monitoring performance.
MPCA was considered by Chen and Liu (2000) for use in two batch processes: a catalyst reaction (simulated) and wafer plasma etching (real). This strategy proved simple and easy, yet still capable of reducing the dimensionality of the dataset to a few factors in a latent space. A comparison between MPCA, PARAFAC, and Tucker3 was done in Louwerse and Smilde (2000) using simulated data of a semi-batch emulsion polymerization of SBR. The main conclusion of this work is that there is no 'king method'; each of them has its advantages and disadvantages, depending on the data and analysis goals. In 2001, the same simulated dataset was used by Yuan and Wang (2001) to test a combination of MPCA and wavelet analysis to deal with the dynamic evolution of variables. PARAFAC2 was also considered in Wise et al. (2001) for fault detection and diagnosis in a semiconductor etch process; the method was compared with MPCA, PARAFAC, and PCA. The main advantage reported for PARAFAC2 was the ability to deal with data records of variable length without preprocessing requirements.
In 2003, the benchmark process of fed-batch penicillin production (simulated data) was considered by Ündey et al. (2003) for an MPLS application combining Iterative Learning Control (ILC) and Manipulated Variable Trajectories (MVTs). This strategy helped to predict the end-of-batch quality measurements and increase the process control performance. In Meng et al. (2003), a simulated process of semi-batch emulsion polymerization was employed to highlight the capacity of methods like MPCA and PARAFAC to monitor batch processes. The authors noted that there will always be a delay in signaling at the start (unless it is a major fault); however, more sophisticated methods will not be required.
In 2004, researchers showed that MPCA could provide a powerful tool for estimating variable trajectories and their expected behavior. They applied this method to real Nylon polymerization process data and a simulated dataset of batches for an emulsion polymerization process (García-Muñoz et al., 2004). The same year, Lee et al. (2004) added a kernel analysis to conventional MPCA to perform MKPCA on a simulated fed-batch penicillin production dataset. This approach was able to identify significant deviations that could cause a lower quality of the final products. Also, it could manage nonlinear relationships among the process variables, leading to better performance compared to MPCA. A different approach was considered by Albazzaz (2004) and Yoo et al. (2004), who applied MICA to control two simulated processes: a semi-batch polymerization reactor and fed-batch penicillin production, respectively. Results from these analyses exhibited superior monitoring performance compared to MPCA, achieved because MICA can extract more interpretable results in the main components when non-gaussianity is present in the data.
In 2005, Gourvénec et al. (2005) exploited the STATIS method to analyze three-way arrays in two sets of real data for the control and monitoring of a pharmaceutical batch process of maleic acid formation and a yeast bakery production. This approach offers high-quality visualization tools and is a big aid for dealing with batches of variable duration, preserving the structure of the dimensions (times, variables, batches). In addition, MPCA was used as a reliable diagnostic tool in 2006, as shown in Cho et al. (2006), where it was applied to a real PVC batch process and a simulated dataset of a fed-batch penicillin process. By 2007, MPCA and MPLS were used in Ferreira et al. (2007) on a real dataset of industrial pilot-scale fermentation, where process knowledge was improved and abnormal variations were detected.
Advances in control monitoring appeared in 2008 with an MPCA variant called Dynamic PCA, proposed by Doan and Srinivasan (2008) to monitor a simulated fermentation process. The main results of this research were process phase recognition and fault sensitivity improvement. On the other hand, Lin and Chen (2008) tested MPLS performance combined with ILC and MVTs, similar to the work presented in Ündey et al. (2003). In this application to simulated data, the control objective was to maintain the concentrations of two reaction products at predefined desired levels after the batch run finishes. The goal was reached, tracking quality in the batch operation process using only the information contained in the historical database rather than detailed knowledge of the operation process model. The STATIS approach was considered by Niang et al. (2009) for control monitoring, which allowed variable trajectories within the batch to be projected and significant differences to be detected when compared to expected trajectories from historical data. The data used in this study corresponded to a batch polymerization reactor (real) and an industrial process with batches of variable duration (simulated). STATIS was also applied by Fogliatto and Niang (2009) using simulated data for a process with batches of variable duration. Off-line and on-line control were achieved, time-wise monitoring was performed, and fault times were correctly identified.
Fed-batch penicillin fermentation, a simulated process, was studied once more in 2013 by Yu et al. (2013) to validate the monitoring ability of MICA in batch processes. The proposal has a superior capability for multiphase fault detection and diagnosis: not only can abnormal events be precisely captured, but the major faulty variables can also be accurately identified. STATIS was used in Niang et al. (2013) to control a batch polymerization reactor process (simulated). The results illustrate the good performance of the method for off-line and on-line monitoring. Another STATIS application was developed in 2015, as detailed in Filho and Luna (2015), where a toy model (simulation) of an industrial process was considered to test the control abilities of the method. The monitoring scheme and the non-parametric control analyses demonstrated their discriminative ability by preserving the correlation structure between variables and time. Slow feature analysis (SFA) and canonical correlation analysis (CCA) were applied to the batch scheme in 2017 using, in both cases, simulation studies of a numerical example and the well-known fed-batch penicillin fermentation process. First, Zhang et al. (2017) applied the multiway kernel version of SFA (MKSFA) and others of similar use, such as MKPCA, MKICA, and BDKPCA; they then enhanced the monitoring capacity by considering a global preserving scope, leading to MGKSFA. This approach was able to manage nonlinear characteristics of the variables and the inherent time-varying dynamics of the data. The proposed method (MGKSFA) performed better than the others in the comparison in terms of fault detection time and rate. Afterwards, Wang et al. (2017) applied multiset CCA (MCCA) to parallel-running processes, which allowed individual and joint features to be monitored. The case studies validated the efficiency of the method, in terms of control and monitoring, for the parallel processes.
Finally, in 2018, a new MPCA application appeared in Cho and Kim (2018) with real data from an industrial PVC batch process. The method took full advantage of the information contained in the data to detect abnormalities of new batches earlier. An extension of the previous MCCA application can be found in Jiang et al. (2018), where a numerical example and a real data application on an injection molding machine were used as case studies to test the approach, improved with the consideration of key operation units. The proposed monitoring scheme detected a local fault by incorporating both the variables within the key units and the information from other related units. Another application of STATIS, in its dual version and combined with parallel coordinates (PARCOORDS), was proposed in Ramos-Barberán et al. (2018), where a numerical example and the well-known benchmark process of fed-batch penicillin production were considered for validation. Monitoring was achieved without special assumptions about the data distribution, bagplot-based regions enhanced the discrimination capacity to detect faults, and the visualization of reference/new batches using PARCOORDS became a powerful tool for abnormal pattern recognition. The reviewed methods and their applications to control batch processes are summarized in Table 3; most of them are related to chemical industrial processes.

Results and discussion
MSPC applied to batches usually performs dimensionality reduction over the multilinear array analyzed. Dataset unfolding is the main step towards treating this multiway analysis as a two-way problem, involving matrices instead of arrays. Depending on the chosen method and the research objective, the results obtained are influenced by the slicing choice and unfolding direction.
In addition, dimensionality reduction methods rely on SVD to solve their corresponding optimization problems considering restrictions and cost functions (Rosipal & Krämer, 2005). Therefore, a comparison between these methods can be done.
The preferred method, for its ease of interpretation and well-known implementation, is MPCA; nevertheless, more robust approaches can be slightly better in terms of fault detection rates (Zhang et al., 2017). While MPCA maximizes the explained variance along one array, MCCA focuses on maximizing data correlations between selected slices, and MPLS computes maximum covariance between two arrays (Rosipal & Krämer, 2005).
Tucker3 and PARAFAC models need fewer parameters than MPCA to decompose the multilinear array. However, MPCA captures more variance than these models. On the other hand, Tucker3 and PARAFAC principal components can be computed for every dimension of the batch (Louwerse & Smilde, 2000).
When non-gaussianity is present in the dataset, MICA exhibits an advantage over MPCA due to better extraction of latent variables and an optimal fit to the data. This characteristic allows any unusual behavior within batches to be discriminated. Both MSFA and MICA are usually applied in blind source separation (Zafeiriou et al., 2013); analogously to batch variables, they perform latent-variable extraction, with a crucial difference for MSFA, where the variation of the extracted features is smoothed.
Parallel coordinates, as a non-dimensionality-reduction method, differ from the others because no latent variables are needed, preserving the multilinear nature of the dataset. The values of each variable and observation within batches can be displayed in an all-in-one system, which allows local patterns and their global dynamic behavior to be identified.

Conclusions
To sum up, the present review does not pretend to select the best method; instead, it shows that each of the described techniques may perform better under specific conditions, considering the practical implications of batch processes and the researchers' optimization goals. Finally, combining the analyzed methods widens their application under different scenarios, helps to overcome their limitations, and enhances their individual advantages in terms of result interpretation.

Disclosure statement
No potential conflict of interest was reported by the authors.