Designing optimal models of nonlinear MIMO systems based on orthogonal polynomial neural networks

ABSTRACT This paper presents a new method for modelling of dynamic systems by using specially designed orthogonal polynomial neural networks. These networks utilize the feature that the basis made of orthogonal functions can be used for approximation of arbitrary function, while their property of orthogonality enables optimal performances in the sense of both convergence time and approximation error. In this regard, generalized quasi-orthogonal polynomials, specifically tailored for the application in the modelling of complex dynamic systems with time-varying behaviour, are considered. Adaptivity of the designed model is achieved by using variable factors inside the orthogonal basis. The designed orthogonal neural network is applied in modelling of laboratory twin-rotor aero-dynamic system as a representative of nonlinear multiple input-multiple output systems. Detailed comparative analysis is performed for a different number of polynomials in expansion with the purpose of finding the optimal model in the sense of trade-off between model accuracy and complexity.


Introduction
Efficient modelling of complex, MIMO (multiple input-multiple output) nonlinear systems with time-varying behaviour, prone to changes in a plant environment and deviations in the plant structure, always represented a great challenge in control system theory and practice. In that sense, neural networks proved to be an excellent tool for modelling of an arbitrary dynamic system [1,2] because of their theoretical capability of approximating any measurable function to any desired degree of accuracy, having the right structure [3]. Also, modelling by neural networks belongs to black box modelling methods, where we only need input-output signals, while the knowledge of physical principle of modelled plant and solution of a possibly complicated set of mathematical equations is not required. On the other hand, finding the optimal architecture (number of hidden layers and processing elements), determining training method for the networks' weights and setting their initial values can still be a very tedious process if we need to find the right measure of the trade-off between accuracy and network complexity [4,5].
The remainder of the paper is structured as follows. Section 2 describes deriving specially tailored generalized Legendre quasi-orthogonal basis to be used for designing the neural network. These polynomials have adaptive factors incorporated inside themselves. Section 3 explains the proposed orthogonal neural network model in detail, as well as the training procedure. Section 4 describes the laboratory twin-rotor aerodynamic system, which will be used as a case study in Section 5 for designing an orthogonal neural network model based on a black box approach, i.e., input-output experimental data sets. The comparative analysis is also conducted for the different number of orthogonal polynomials in expansion with the purpose to find an optimal model in the sense of trade-off between model accuracy and complexity. Similarly, an optimally designed orthogonal model was compared with some other known state-ofthe-art models of twin-rotor aero-dynamic system, where it demonstrated best overall performances regarding accuracy (modelling error), training time (model complexity) and robustness to internal perturbations and external disturbances. These features make designed model a perfect candidate for application in modern control algorithms, like Model Predictive Control, based on reliable models which can be simulated quickly during complex online optimization with constraints. Finally, Section 6 concludes the paper.

Generalized quasi-orthogonal Legendre polynomials
Legendre polynomials were discovered in 1782 during an effort to find the force of attraction between the celestial bodies during their revolution [25]. After that, orthogonal polynomials and their properties were a subject of extensive study of many mathematicians and scientists from other fields where orthogonal functions found numerous applications, control systems engineers among them.
First, let us consider the classical definition of orthogonality and orthogonal polynomials [26]. If we have the set of polynomials P n x ð Þ: then, we say that these polynomials are orthogonal over the interval (a, b) if and only if: where a and b are the boundaries of orthogonality interval and w(x) is weight function. For our analysis of dynamical systems operating in real-world conditions, which are prone to a deviation of nominal values of their components because of wearing over time or under environmental changes (temperature, humidity, pressure), it is convenient not to consider orthogonal polynomials in their pure form (2), but rather polynomials are known in literature as almost orthogonal [12]. The integral is no longer equal to the zero for n�m, but, because of the mentioned conditions of real-time operating, is some constant very close to zero. That adaptive variable will enable us to partially overcome the drawback of the black-box modelling approach that it is generally valid only for a narrow set of signals, it was calculated from.
One important class of orthogonal polynomials, convenient for technical applications because of their properties [26], are Legendre polynomials, defined over orthogonality interval [−1, 1] and with weight function w(x) = 1. For applications in control systems theory, we should rather consider shifted Legendre polynomials in their explicit form where the interval of orthogonality is shifted to [0, 1] by using affine transformation, i.e., shifting function x ! 2x À 1 : Shifting enables later application of Legendre polynomials inside technical systems that operate in real-time, so, with additional substitution x ¼ e À t , we can obtain exponential polynomials orthogonal over the real-time interval 0; 1 ð Þ. By applying the Laplace transform to the exponential orthogonal polynomials, we get orthogonal rational functions, capable of emulating transfer functions of arbitrary dynamic systems [12].
If we now apply the definition of almost orthogonality on shifted Legendre polynomials (3), we obtain the sequence of almost orthogonal Legendre polynomials: where: Г is the symbol for the gamma function and δ ≈ 1 is a constant very close to one. The final form of generalized Legendre polynomials can be derived by using the following definition of quasi-orthogonality [27]: where k represents the order of quasi-orthogonality, and a and b are the boundaries of quasi-orthogonality interval. Filters based on these polynomials enable simpler modelling of the systems with the difference in the degree of polynomials in the numerator and denominator in transfer function greater than one (equal to k + 1). If we apply definition (6) to the sequence of almost orthogonal Legendre polynomials given by (4) and (5), we obtain generalized quasi-orthogonal Legendre polynomials over the interval (0, 1): where: For example, here are the first few second-order (k = 2) generalized quasi-orthogonal Legendre polynomials of this sequence, which will be used for designing the orthogonal neural network with the task of modelling twin-rotor system: x þ δ 2 12 ; It should be mentioned that another orthogonal polynomial basis could be also used for designing orthogonal neural network model described in the next section, for instance Chebyshev or Laguerre. Legendre polynomials are chosen because authors have already developed whole mathematical framework for generalized quasi-orthogonal variant of these polynomials with a lot of useful mathematical relations, including relations analogue to Christoffel and Rodrigues formulas and Bonnet recurrence relation valid for classical orthogonal polynomials [13,14,28]. Programming modules were also previously developed with all the functions necessary for easy programming of designed models.

Orthogonal neural network models
Developed quasi-orthogonal functions given by (7) and (8) can be used for designing very efficient adaptive polynomial neural networks. These networks can be treated as a special subtype of higher-order neural networks [29]. The structure of such a network is presented in Figure 1. It produces a function that represents a combination of Legendre (or other) quasi-orthogonal polynomials and can be used for approximation of an arbitrary function. Thereby, orthogonal polynomial expansion is superior to approximations by other functions because of natural in-build optimality of orthogonal basis, leading to the better accuracy of the approximation and shorter convergence time for a similar number of terms in the expansion. The polynomial neural network shown in Figure 1 is applied in modelling of an arbitrary plant (object of modelling). Theoretical background for such an application can be found in the fact that an arbitrary function y(X), where X represents the vector of inputs X = [x 1 ,x 2 , . . ., x m ], can be approximated by orthogonal expansion of quasiorthogonal polynomials p i (X) of a neural network: where w i is weight of the network, f is an activation function and R(X,y plant ,δ,n) is the approximation error that decreases with an increase of the number of expansion terms (n) [30]. Label x in Figure 1 and equation (10), marking input signals to the plant, should not be confused with x in equations (7) and (8), used for generating quasi-orthogonal Legendre polynomials, where x is just a label of a general variable. As already explained in Section 2, δ is an adaptive variable with a value close to one that enables the feature of almost orthogonality i.e., shifted orthogonality which can describe small deviations in plant structure or operating mode because of wearing over time or environmental changes. In that way, designed models will gain some level of flexibility and validity over a wider range of operating conditions. Past instances of the plant inputs/outputs are also included in orthogonal neural network design, providing its recurrence. These past samples of input/output signals are provided by blocks labelled TDL (Tapped Delay Line) which memorize signal values in previous instances. Block named Legendre expansion then generates quasi-orthogonal Legendre polynomials (P i ) based on current and time-delayed values of inputs and outputs and the value of δ. In Figure 1 there are a previous input values and b previous output values. We experimented with up to three delayed input/output samples and, in experiments presented in this paper, we used a = 1 and b = 2.
An adequate model of the plant of interest is obtained by finding the optimal values for network weights w i (i = 1, 2, . . ., n) through the training of the polynomial network. For that purpose, the chosen training algorithm uses some performance function, usually based on modelling error -difference between the plant (y plant ) and model (y model ) outputs, with the goal to minimize it. Several papers deal with optimal training methods for polynomial neural networks [6,9,31]. The performed comparative analysis demonstrated that the best training method for polynomial neural networks in the sense of accuracy and convergence time is the Levenberg-Marquardt algorithm or damped leastsquares method [32,33]. This algorithm is based on a gradient vector and the Jacobian matrix and is designed specifically for performance (cost) function given in the form of a sum of squared errors: where m is the number of instances in the input-output data set. The Levenberg-Marquardt algorithm is practically modified Newton's method for function optimization with respect to a specific variable. During the execution of this training algorithm, the change in weight vector is calculated as: where w is a weight vector with n adjustable parameters and J represents Jacobean matrix with size m � ncontaining the derivatives of the errors with respect to the weights: for i = 1, . . ., m and j = 1, . . ., n. Parameter λ is called the damping factor, and, it is usually initialized to be large, so at first, updates are small steps in the gradient descent direction.
In the next step, if the cost function increases, then λ should be increased even more. Otherwise, if the cost function decreases, λ should be decreased, and, in that case, the Levenberg-Marquardt algorithm approaches the Gauss-Newton method. The Levenberg-Marquardt algorithm is considered to be very fast and easy to implement by simple modifications in the traditional backpropagation method.

Case study: twin-rotor aero-dynamic system
As a case study for modelling of dynamical systems by using designed orthogonal neural network models, explained in Section 3, laboratory twin-rotor aero-dynamic system (TRAS), shown in Figure 2, was used. This system is convenient because of its multiple input-multiple output (MIMO) nature with present high order nonlinear dynamics and cross-couplings, very difficult to model with conventional methods based on Lagrange equations. It is also very prone to parameter variations, external disturbances, and unwanted induced vibrations. The system is controlled in real-time from a PC in the MATLAB/Simulink environment [34]. As its name suggests, the main components of TRAS (see Figure 2) are two rotors -the main rotor and tail rotor, driven by DC motors. These rotors (propellers) are attached at the ends of the beam with a counterbalance pivoted on its base and enable the beam to rotate in the horizontal and vertical planes, so the system has two degrees of freedom defined by pitch and azimuth angles. Besides these two variables measured by position sensors (encoders) at the pivot point, the state of the beam is also described by corresponding angular velocities which are software reconstructed. There are two more state variables -angular velocities of the rotors, measured by associated tacho-generators.
It should be mentioned that this laboratory system acts like a helicopter only in some aspects and it is not suitable for serious considerations regarding aeromechanics and flight control. It cannot fly through the air and imitate all aerodynamical effects like gusts of wind, but it will be good enough for our purposes, because we only need system with multiple inputs, cross-couplings, easy appliance of perturbations or disturbances, and developed computer functions for comfort experimenting, in order to validate designed modelling method presented in this paper. Controlling the TRAS system usually means stabilizing the beam in an arbitrary desired position or making it track the desired trajectory. Controlled variables (output signals) are horizontal position (azimuth angle) and vertical position (pitch angle) of the TRAS beam (α h and α v , respectively). Control inputs for TRAS are supply voltages of the DC motors (U h and U v in Figure 3). Change in input voltage results in the different rotation speed of the corresponding rotor (propeller) and appropriate change in the beams' position. However, there are also significant cross-couplings between the actions of the rotors, resulting in each rotor influencing both position angles.
Modern methods of design of real-time controllers require high-quality models of the system, and, as already stated, because of non-linear dynamics, cross-couplings, parameter variations, susceptibility to external disturbances and other factors, modelling of twin-rotor aero-dynamic system is not an easy task, if we approach it from the point of the white box -from the point of complete mathematical-physical analysis. This approach, although it provides a general model valid for the whole range of plant inputs and states, is only suitable for simple plants with a small number of parameters. Comprehensive models of TRAS with necessary mathematical and physics background can be found in [35][36][37].
In this paper, for the purpose of comparing designed adaptive orthogonal neural network model with some state-of-the-art models, two other white-box models will be analysed: linearized model described in [38] and default nonlinear model provided with laboratory set-up given in Figure 3 without further explaining of labels, which can be found in [34].

Modelling of twin-rotor aero-dynamic system by using adaptive orthogonal polynomial neural network
As a case study for designing optimal models of nonlinear MIMO systems by using orthogonal polynomial neural network, laboratory twin-rotor aero-dynamic system -TRAS (Figure 2) was considered. General neural network structure (Figure 1) was adapted for two input-two output system, where inputs are voltages of DC motors: U h -the horizontal DC-motor PWM control signal (tail rotor) and U v -the vertical DCmotor PWM control signal (main rotor), and outputs are: α h -horizontal position (azimuth angle) and α v -vertical position (pitch angle) of the TRAS beam. The adjusted model in parallel to the laboratory system is presented in Figure 4. Block named Legendre expansion produces generalized orthogonal polynomials according to (7) and (8) by using input/output current and time delayed samples (one previous sample for inputs and two for outputs have been used in presented experiments). Having both inputs in polynomial development, we can easily emulate input-output cross-coupling inside the plant. In all experiments, second-order (k = 2) Legendre polynomials (9) were used because of their property to model well internal transfer functions characteristic for the twin-rotor system with the difference of three in orders of numerator and denominator. Starting experiments were performed for strictly quasi-orthogonal polynomials with value of parameter δ equal to one. Later, δ was perturbated in order to simulate adaptability to changed environmental factors, sensed disturbances, measurement errors, and occurred uncertainties. As activation function, classical sigmoidal function f x ð Þ ¼ 1 1þe À x was applied, guarantying a smooth, differentiable model. Bias b was not needed in our experiments, i.e., it was set to zero. Weights in the model were adjusted by Levenberg-Marquardt training algorithm based on error, i.e., the difference between real -measured outputs and model outputs, as explained in Section 3.
Modelling process described in this paper is based on a black-box approach (inputoutput experimental data sets). The same input signals were applied both to the orthogonal model and a plant (TRAS). Thereby, the input signals U h and U v are normalized in the range [−1, +1] corresponding to the input voltage range of [−24 V, +24 V]. In nonlinear system identification, input signals are usually chosen to have different levels of amplitudes to be able to excite most of the dynamic modes of interest. So, the following normalized input voltages ( Figure 5) driving the motors were applied: the azimuth input -square wave with 0.15 amplitude and 1/50 [Hz] frequency, and the pitch signal -sine wave with the amplitude 0.25 and frequency 1/50 [Hz]. Signals in input/output data sets were sampled with period of 0.01 seconds with overall duration of the excitation of 100 seconds.
To find an optimal model in the sense of trade-off between model accuracy and complexity, experiments were repeated for a different number of terms in the polynomial expansion. Two criteria were considered: model accuracy and training time. Model accuracy was evaluated by using the performance index in the form of root-meansquare error (RMSE): After training the network by applying training input data set and calculating RMSE, model accuracy was also considered for two more testing sets with different shapes than original. Testing set 1 had for the azimuth input -sine wave with 0.  Table 1.
We can see from results that model accuracy expectedly increases with the increase of the number of terms (polynomials) in orthogonal expansion. On the other hand, training time also increases rapidly, so, if the network complexity and speed of training are most important, higher-order models soon become unusable. Small gains in accuracy do not always justify unnecessary complex networks. Similar trends regarding model accuracy are also present for two testing sets (two responses to different inputs), with obvious discrepancy for the model built with seven terms in the polynomial expansion. The reason for this decrease in accuracy can be found in the effect of overfitting, i.e., reduction in generalization ability for higher-order polynomials. On the other hand, lower-order  networks are inadequate to achieve targeted modelling precision. We can see that there exists some optimal value for the number of expansion terms which achieves the best accuracy, not only for training, but also for testing data, and all that with bearable training time. In this particular case, that optimal expansion is with six terms. Parallel responses of the laboratory TRAS and optimal orthogonal model with six terms are given in Figures 6 and 7. Let's now compare designed orthogonal polynomial models with some other known models of twin-rotor aero-dynamic system:   Figure 3. • Nonlinear ARX neural network [39] with two hidden layers with 10 neurons each.
The first and the second model are white-box first principle models with included nominal values of the laboratory set-up parameters given in [34], and third is blackbox model -neural network trained just like our orthogonal neural network. Modelling results are presented in Table 2.
It should be noted that for linearized and nonlinear models we do not have a training phase, so given results are actually for training set used just as another testing set. We can see that the linearized model cannot catch all the complexities inside the system, resulting in the worst accuracy results. The nonlinear system is better, but not as good as NARX neural network, confirming the assumption that black-box models, although not interpretable as the first principle model, provide much better accuracy for complex MIMO systems, being able to mimic all dynamic modes in these systems. NARX network achieves satisfying results and, it should be mentioned, with much shorter training time, but that result is still with almost double RMSE than for optimal orthogonal neural network.
The true power of orthogonal networks comes to the surface if we introduce changes to original nominal systems imitating environmental changes, measurement errors or occurred uncertainties in the form of perturbations around nominal values of 1% and 3% and repeat the calculation of RMSE for developed models of TRAS. Perturbations are incorporated into experiments by programming artificial measurement error (noise) into feedback information obtained by position sensors (encoders) responsible for reporting current TRAS position, i.e., azimuth and pitch angles. Percentage of change in relation to normal (nominal) system response is calculated as a ratio of overall signal power, i.e. noise to signal ratio. We can see from the results in Table 3 that other models are rigid and cannot follow the changes well. On the other hand, designed orthogonal model possess incorporated measure of variations -δ that can be now set to 1.01 and 1.03 for changes of 1% and 3% respectively, making models adaptive to changes by shifting a little their orthogonal inner products, without the need to train the model again, and obtaining similar accuracy, that way.

Conclusion
This paper considers a method for modelling complex nonlinear MIMO systems based on designing one new type of orthogonal polynomial neural networks. These networks use the basis of specially tailored quasi-orthogonal functions suitable for modelling dynamic systems that utilize the feature of orthogonal polynomial expansions to approximate arbitrary functions with any given accuracy (approximation error), of course, by using the sufficient number of terms in the expansion. Thereby, orthogonal polynomials are naturally superior to other functions in the sense of optimal development (same accuracy for fewer terms in expansion or better accuracy for the same number of terms). Applied quasi-orthogonal polynomials (Legendre in this case) are very convenient for catching time-varying behaviour and perturbations in real systems thanks to the incorporated adaptive factor inside the polynomial terms. Coefficients in the polynomial expansion are weights in a neural network that can be optimized -trained by using classical training methods. In this research, the modified Levenberg-Marquardt algorithm was applied. Thanks to the orthogonal basis and single-layer structure, orthogonal neural network avoids the main problem of traditional feedforward neural networks -finding the optimal structure in terms of the numbers of layers and processing elements, as well as the initial values of weights. The required number of processing elements, which are composed of the expansion terms of quasi Legendre polynomials, is determined according to the desired output accuracy. Due to uniqueness of the weights, rapid convergence during training of weights is ensured.
Modelling potential of designed adaptive orthogonal polynomial neural networks was verified on the example of twin-rotor aero-dynamic system as a suitable representative of complex nonlinear multiple input-multiple output systems with present cross-couplings, and prone to external disturbances and unwanted induced vibrations. Modelling of such a system can be very challenging for traditional first-principles methods, but also to other black-box methods based on identification by using input-output data sets. First, a set of models were designed with a different number of terms in the polynomial expansion. It was proven that there exists a sweet spot, i.e., optimal number of terms that guarantee the best trade-off between model accuracy and complexity, expressed by computation time. The effect of overfitting was also noticed for larger polynomial expansions.
The next step was to compare the optimal orthogonal polynomial model with three other known models of twin-rotor aero-dynamic system: first -obtained by analytic linearization, second -nonlinear Simulink model provided by laboratory twin-rotor aero-dynamic system manufacturer and third -Nonlinear ARX neural network. Optimal orthogonal neural network demonstrated best overall performances among all models, especially better than the first two -white-box models. The superiority of the orthogonal model is even more evident when we introduce disturbances and measurement noise into the system. Thanks to the adaptive factor, which is part of the quasi-orthogonal expansion, the orthogonal model can recognize perturbations and provide the same level of accuracy under new conditions. Designed orthogonal models can be especially useful in modern control algorithms based on reliable models, like Model Predictive Control (MPC) [40]. MPC is a feedback control algorithm that uses the model to predict future plant outputs and solves optimization problem to select optimal control, while satisfying a set of constraints. This type of control can handle well MIMO systems, like TRAS, because it is multivariable control that controls the outputs simultaneously by considering all the interactions between system variables [41]. On the other hand, for instance, PID controller design would be challenging in this situation because the separate control loops for different system variables would operate independent of each other, as if there are no interactions between them. Another problem would be tuning too many controller gains.
The biggest problem of MPC, that slowed the applications of that control algorithm for several decades, is the need for a lot of calculation, because the algorithm must solve a complex online optimization problem with constraints at each time step. That can be very challenging in the case of complex models of the plants and can require very strong processors with a large memory. Thanks to the single-layer structure and less model parameters, orthogonal polynomial models presented in this paper demand lot less calculations during determining outputs and should provide much better performances for MPC in the sense of shorter calculation time for the same time prediction horizon or the larger horizon for the same calculation time, comparing to other models. First experiments in that direction have already proved to be encouraging.