Identification of a nonlinear rational model based on bias compensated multi-innovation stochastic gradient algorithm

The nonlinear rational model is a generalized nonlinear model and has been gradually applied in modelling many dynamic processes. The parameter identification of a class of nonlinear rational models is studied in this paper. This identification problem is very challenging because of the complexity of the rational model and the coupling between model inputs and outputs. To identify the nonlinear model, a bias compensated multi-innovation stochastic gradient algorithm is presented. The multi-innovation technique replacing the scalar innovation with an information vector is adopted to accelerate the traditional stochastic gradient algorithm. However, the estimate obtained by the accelerated algorithm is biased because of the correlation between the information vector and the noise. To overcome this difficulty, a bias compensation strategy is used. The bias is calculated and compensated to get an unbiased estimate. Theoretical analysis shows that the proposed algorithm can give biased estimates with linear complexity. The proposed algorithm is validated by a numerical experiment and the modelling of the propylene catalytic oxidation.


Introduction
To describe the dynamic characteristics of the nonlinear systems, many nonlinear structures have been developed, such as the NARMAX model, Volterra series model, block-oriented nonlinear model and so on [1][2][3][4]. In recent years, a model named nonlinear rational model (NRM) has been gradually applied in the modelling and control of nonlinear systems, particularly in some chemical processes and mechanistic systems [5][6][7][8]. The NRM is a kind of generalized nonlinear model. Traditional rational model, NARMAX model, integral model, output affine model and linear difference equation model can be seen as its special form [9].
The NRM is defined as the ratio of two polynomial expansions of past inputs, outputs and prediction errors [10]. The identification of the NRM is quite difficult because the NRM cannot be parameterized into a linear-in-parameter system [9][10][11] and the coupling between model input and output. Despite the difficulties, researchers have reported some results [7,9,12]. For example, to identify the parameters of the NRM, a prediction error algorithm and a new rational model estimation (RME) algorithm were proposed [11,13]. To decrease the computational cost of the above two algorithms, a recursive RME algorithm, an error back propagation algorithm, an implicit leastsquares iterative algorithm, two maximum likelihood algorithms and a globally convergent algorithm were derived [5,7,10,14,15]. To determine the NRM's structure, an orthogonal RME algorithm and a genetic algorithm were investigated [9,11]. Zhu et al. summarized the advances in NRM identification and control [8].
Although these algorithms work well for many NRMs, they have at least O(n 2 ) complexity, which makes them unsuitable for online applications. To decrease the complexity, the stochastic gradient (SG) algorithm is an alternative because it costs only O(n) flops each iteration [16]. There are many gradientbased algorithms, among which, a key term separation gradient iterative algorithm was derived to identify a fractional-order nonlinear system [17], a three-stage forgetting factor SG method was proposed for a Hammerstein system [18] and an auxiliary model stochastic gradient method was studied for a Wiener-Hammerstein system [19].
However, the estimate for the NRM given by the traditional SG algorithm is biased because the information vector is correlated to the noise [10,20,21]. To get unbiased estimates, the bias compensation (BC) technique and instrumental variable (IV) technique are often used, for example, a BC-based method was proposed to estimate the state of charge for lithium-ion batteries [22], a BC-based sign algorithm was addressed to estimate the weight vector of an unknown system [23], an IV method was implemented for detecting and correcting parameter bias within structural equation models [24], a unified model-implied IV approach was reported for structural equation modelling with mixed variables [25]. Sometimes, for the IV method, although there are some guiding principles, it is still very difficult to select an appropriate instrumental variable. Therefore, the BC method is adopted to obtain an unbiased estimate for the NRM in this paper [10].
This paper considers the parameter identification of the so-called ARX-NRM. This NRM contains a process model with nonlinear rational form and its noise model has the same denominator as the process model. This paper investigates this parameter identification in the time domain, without considering modelling error, time-delay estimation, uncertainties in the real systems and frequency-domain identification [26][27][28][29]. Meanwhile, this paper discusses the integer adaptive methods for the nonlinear models. Recently, the fractional adaptive method has been an important part of adaptive algorithms. For details, please see [30][31][32][33].
The main contributions of this paper are as follows: (1) The parameter identification of an ARX-type nonlinear rational model is considered, which is quite difficult because this model is a nonlinear-inparameter system and its output is coupled with the input. (2) To accelerate the stochastic gradient algorithm, the multi-innovation technique is integrated into the algorithm, in which the scalar innovation is replaced by the innovation vector. The rest of this work is organized as follows. In Section 2, the ARX-NRM to be estimated is described. Then an unbiased multi-innovation SG algorithm is presented in Section 3. In the next section, the performance of the proposed algorithm is analysed. Numerical examples and case study are adopted to validate the proposed algorithm in Section 5. Finally, conclusion is summarized in Section 6.

Problem description
Consider an ARX-NRM depicted in Figure 1, where u(k), y(k) and v(k) are the input, output and noise, respectively. f (k) and g(k) are two nonlinear polynomials concerning y(k − i) and/or u(k − j), i, j = 1, 2, . . . , r.
From Figure 1, we can express the output y(k) by where It can be seen from Figure 1 and Equation (1) that the structure of this NRM is similar to that of the ARX model, [21]. The difference is that the numerator f (k) and denominator g(k) of the NRM are both nonlinear polynomials, and the input u(k − i) is implicit in the nonlinear transfer function. Thus the NRM in Figure 1 is named ARX-NRM.
Multiplying both sides of Equation (1) by 1 + g(k) yields and Then we can parameterize the ARX-NRM as follows: where The identification of the ARX-NRM in Figure 1 is transformed into the estimation of the parameter vector θ based on the observations {u(k), y(k)} N k=1 , where N is the data length.

Stochastic gradient (SG) algorithm
Consider the parameterized system in Equation (5) and denote the error e(k) as whereθ(k − 1) is the parameter estimate at time k − 1.
Sometimes, e(k) is also called innovation [16]. Define a cost function as follows: The SG algorithm updates the parameter estimate along the negative gradient direction of the criterion function The SG algorithm for identification of θ is as follows [20]:θ where η(k) is the variable step size that can be calculated by [16]

Multi-Innovation SG (MI-SG) algorithm
Although the SG algorithm costs less calculation than the least-squares algorithm, it converges slowly. To accelerate the algorithm, multi-innovation is introduced [16]. Replacing the single innovation e(k) in Equation (9) with the multi-innovation vector E p (k), and replacing the single information vector ϕ(k) with the information matrix p (k) gives a stacking gradient G p (k) as follows: where p is the stacking length, G p (k), E p (k) and p (k) are the stacked gradient, stacked innovation vector, and stacked information matrix, respectively. Expanding G p (k) yields (13) where g(i) denotes the gradient at time i. It can be seen from Equation (13) that the stacked gradient G p (k) is the sum of recent p gradients, and it can also be regarded as the weighted information vectors, and the weighting coefficient is the innovation at the corresponding time. In short, this summation or weighting increases the size of the gradient, modifies the direction of the gradient and is conducive to accelerating the gradient algorithm. This gradient algorithm using multi-innovation is called multi-innovation SG (MI-SG) algorithm.

Bias compensated MI-SG (BC-MI-SG) algorithm
Let us study the properties of the parameter estimate given by the MI-SG algorithm. Considering Equations (7) and (9), the stacked gradient G p (k) in Equation (12) is rewritten as where T(i) = ϕ(i)v(i), = ϕ(i)ϕ T (i) and θ 0 denote the true parameter vector.
Considering Equations (15) and (14) becomeŝ Taking expectation on both sides of Equation (16) yields Supposing is not related to θ (i − 1), and using the conditional expectation formula [20] E{f The fourth item on the right side of Equation (17) can be written as Equation (17) becomes When k → ∞, E{θ(k)} = E{θ(k − 1)}, we have It can be seen from Equation (21) that the parameter estimate obtained by the MI-SG algorithm is biased. We can find that the bias −1 k i=k−p+1 E{T(i)} is caused by the η(k) k i=k−p+1 T(i) on the right of Equation (16). To obtain an unbiased estimate, this term must be subtracted from the right of Equation (14), i.e. Figure 1, considering Equation (6)

Computational analysis
The calculation costs of each iteration of the SG, MI-SG, RLS and BC-MI-SG algorithms are shown in Table 1. It is seen that: (1) The computational burden of the three SG algorithms is all O(n).

Numerical example
Consider an ARX-NRM in Equation (1) with where the input u(k) is a Gaussian signal with mean zero and variance 1.0 2 . A noise v(k) with mean zero is added to the model. 600 observations are collected and depicted in Figure 2. The initial value of each entry of the parameter vector is set to 1 × 10 −6 and the estimation error is defined as δ(%) = θ(N)−θ 0 θ 0 × 100.

1) Results using BC-MI-SG, MI-SG, SG and RLS algorithms
The parameter estimates using the proposed BC-MI-SG are shown in Table 2, and the estimation errors are depicted in Figure 3, where σ 2 = 0.01 2 , λ = 0.4, p = 3. For comparison, the parameter estimates given by the SG, RLS and MI-SG algorithms are also listed in Table 2,  and the estimation errors using the last three algorithms are also depicted in Figure 3.
It can be seen that: (1) In Figure 3, all curves decrease when k increases, which means that the estimation errors of the three algorithms become small with the new data being used. (2) The error curves of the two algorithms with MI are almost the same, which are far lower than that of the SG algorithm. In other words, the parameter estimates given by the two MI-SG algorithms are more accurate than those given by the SG algorithm. (3) Among the two curves with MI, the curve of the proposed BC-MI-SG algorithm is at the bottom, which shows that the estimation error given by the proposed algorithm is smaller. That is to say, the bias compensation can improve the estimation accuracy of the MI-SG algorithm.  (4) In the second half of identification, the curve of the RLS algorithm has little difference with the proposed algorithm, which implies the RLS algorithm can give an accurate estimate for the ARX-NRM. However, the RLS algorithm costs too much computation to prevent its application in some situations that needs fast identification (see Table 1 for more details). 2) Results using the BC-MI-SG algorithm with different noise variances To show the performance of the proposed algorithm under different noise levels, we estimate the ARX-NRM in Equation (28) using the BC-MI-SG algorithm with σ 2 = 0.01 2 , 0.02 2 , 0.04 2 . The estimation errors using different σ 2 are depicted in Figure 4. It can be seen that: (1) For a given noise variance σ 2 , the overall trend of the estimation error decreases with the increase of k.
(2) When the variance is small, the curve of the estimation error is relatively smooth. With the increase of variance, the fluctuation of the error curve increases. (3) The estimation error of σ 2 = 0.01 2 is smaller than those of the σ 2 = 0.02 2 and σ 2 = 0.04 2 . That is, a larger σ 2 is not conducive to the improvement of the estimate's accuracy.

Case study
A chemical model describing propylene catalytic oxidation with the following structure [5,15,34] is used to validate the proposed algorithm, where two inputs C o (k) and C p (k) are the oxygen and propylene concentrations at time k respectively. The rate of disappearance of propylene y(k) is taken as the output variable. The true values are a 0 = 0.231, b 0 = 7.28 × 10 −4 . The inputs C o (k) and C p (k) are taken as random integers between (1, 100) and between (1, 10) respectively, {v(k)} is taken as a white noise sequence with mean zero and variance σ 2 = 0.1 2 . The curve of 600 observed data is shown in Figure 5. Following ARX-NRM structure is used: Estimate using proposed BC-MI-SG algorithm with λ = 0.4, p = 3 is listed in Table 3, where the estimation error is calculated by the following formula:  For comparison, the estimates of the SG and MI-SG algorithms are also shown in Table 3. It is easy to find that the estimation error of the proposed algorithm is the smallest one among the three algorithms, which means the model obtained by the BC-MI-SG algorithm is the most accurate model of the three.

Conclusion
To identify the parameters of an ARX-NRM, a bias compensated multi-innovation stochastic gradient algorithm is presented. To accelerate traditional stochastic gradient algorithm, a multi-innovation is integrated into the algorithm. The multi-innovation technique replaces the scalar innovation in the SG algorithm with an information vector. Theoretical analysis shows that the MI-SG algorithm gives a biased estimate because the output contained in the information vector is correlated to the noise. To get an unbiased estimate, the bias is calculated firstly and then compensated to the MI-SG algorithm. The proposed algorithm is validated by numerical experiments and the modelling of the propylene catalytic oxidation. Results indicate that the proposed algorithm can give accurate estimates using less computation.

Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Disclosure statement
No potential conflict of interest was reported by the author(s).