Physically motivated structuring and optimization of neural networks for multi-physics modelling of solid oxide fuel cells

ABSTRACT Neural network models for complex dynamical systems typically do not explicitly account for structural engineering insight and mutual interrelations of various subprocesses that are related to the multi-physics nature of such systems. For that reason, they are commonly interpreted as a kind of data-driven, black box modelling option that is in opposition to a physically inspired equation-based system representation for which suitable parameters are subsequently identified in a grey box sense. To bridge the gap between data-driven and equation-based modelling paradigms, this paper proposes a novel approach for a physics-inspired structuring of neural networks. The derivation of this kind of structuring, an optimal choice of network inputs and numbers of neurons in a hidden layer as well as the achievable modelling accuracy are demonstrated for the thermal and electrochemical behaviour of high-temperature fuel cells. Finally, different network structures are compared against experimental data.


Introduction
Solid Oxide Fuel Cells (SOFCs) [1][2][3][4] are high-temperature fuel cells that are investigated as an option to set up a decentralized supply with electricity and heat. This kind of cogeneration approach is especially promising for the implementation of future distributed power supply grids due to the wide range of different fuels that can be used by SOFCs. Moreover, SOFCs are characterized by high efficiency factors. They operate in an environmentally friendly manner if the required fuel -that is composed of hydrogen and/or hydro-carbonates -is produced from renewable sources. When modelling SOFCs for control purposes, it can be observed that numerous effects from different physical domains are strongly interconnected. These effects include heat and mass transport phenomena, exothermic electrochemical reactions, and electric power supply of consumers via the systems' terminals. From the modelling perspective, these phenomena need to be described in terms of balance relations for the change of the internal energy of a fuel cell stack module -related to its temperature distribution -and by balancing mass transport phenomena of the gases supplied at both the anode and cathode sides. Finally, also strongly nonlinear phenomena of the electrical characteristics need to be addressed which include regions of activation polarization, Ohmic polarization, and concentration polarization in the electric current-power as well as current-voltage relations [1,3,[5][6][7][8][9][10][11].
On the one hand, the process of electric power production is directly related to the supplied gas mass flows and the resulting fuel utilization. On the other hand, its achievable efficiency shows a strong dependence on the fuel cell stack temperature. Especially the latter effect becomes important if a fuel cell is operated not only in the close vicinity of a predefined quasi-static operating point, but if a certain variability of the provided electric power is also desired in the case that the fuel cell is operated dynamically. In such cases, the mass flow of supplied anode gas (typically hydrogen or the output of a gas reformer unit mixed with nitrogen) is then utilized to control the power production process and to prevent fuel starvation. In addition, controlling the inlet temperature of preheated ambient air at the cathode side [12] allows for keeping the stack module's temperature within certain limits [13]. Lower temperature bounds need to be obeyed to ensure sufficiently good ion conductivity of the cell material, while an upper temperature threshold must not be violated to prevent thermal stress and material degradation that may lead to accelerated ageing and undesirably high costs of maintenance.
The aforementioned goals can be achieved by implementing nonlinear control procedures that are designed on the basis of low-order, however, sufficiently accurate dynamical system representations [13][14][15]. Due to the reason that these control approaches typically require the real-time capable estimation of non-measurable system states and disturbance variables, during both transient heating phases and non-stationary high-temperature operation, suitable system models are often restricted to finitedimensional sets of ordinary differential equations (ODEs) [16]. Although the underlying dynamics posses a distributed parameter characteristics, most control-oriented models for SOFCs are derived by means of an early lumping procedure, i.e., introducing a finitedimensional approximation before designing (temperature) controllers. For that purpose, finite element, finite difference, or finite volume schemes can be used. In the latter case, the distributed system properties are approximated by nonlinear ODEs which represent locally averaged variations of the thermal energy and, accordingly, spatially averaged variations of temperatures within finitely large elements [7]. Because these finite volume models have to be valid over large intervals for the operating temperature and because the internal storage variables represent the variations of thermal energy in an averaged sense, experimental parameter identification routines are inevitable to find the most suitable system parameters [17,18]. Modelling approaches that are closely related to this type of system representation are given in terms of Hammerstein models [19]. Other alternatives, like nonlinear models of coupled partial differential (PDE) as well as partial differential-algebraic equations [20,21] are more commonly used during offline design stages and are more challenging concerning their computational complexity if real-time control implementations are desired. The other extremal model, namely purely algebraic (look-up table-like) input-output mappings cannot be used for variable operating conditions due to their limited domains of validity [22,23].
As an alternative to an equation-based system modelling, the derivation of neural networks can be seen as a data-driven substitute. From a modelling and application point of view, both equation-based and data-driven alternatives have well-known advantages and disadvantages. While data-driven procedures allow for rapidly determining accurate simulation models by means of training neural networks with a predefined structure, the equation-based option has big advantages if model-based control procedures are to be derived. Those control procedures successfully exploit structural properties such as quasi-linearity of the dynamical ODE model 1 or affine couplings between control inputs and internal system states. For that reason, this paper tries to bridge the gap between data-driven and equation-based modelling paradigms by means of a physically inspired structuring of neural network representations of complex multi-physics processes. The physically inspired structuring of neural network models in this paper, which in a similar way is also used in [24], makes use of the following fundamental stages: (1) Define physically independent subprocesses such as heat conduction and exothermal reactions for the SOFC system with their corresponding subsystem inputs (e.g. gas mass flows and preheater temperatures) and outputs (variation rates of the internal energy or temperature); (2) Define interconnections of these subnetworks (e.g. additive superposition of different sources for temperature variations) and multiplicative couplings (representing proportionality of the exothermal heat production and the electric SOFC power with respect to the measurable stack current); (3) Define those network outputs for which experimental data are available so that they can be employed together with measured system inputs to set up appropriate training data sets for the structured network representation; (4) Include further structural insight such as linear dynamical elements to turn feedforward function approximation networks into nonlinear ODE models for the respective subsystem dynamics.
Moreover, this kind of structuring approach allows for simplifying the derivation of an optimal configuration of neural networks, in terms of the required numbers and couplings of both hidden layers and associated neurons. In addition, those input quantities can be identified systematically that represent the most relevant information content for the investigated input-output relations.
To structure the corresponding system models for SOFCs according to the four fundamental steps above, a finite number of storage variables is firstly specified in accordance with the equation-based system representations of previous work [7]. Then, the nonlinear relations at both the system inputs and outputs are defined as feedforward neural networks [25,26], for which the most suitable numbers of neurons in the hidden layers as well as the combinations of non-redundant and informationcarrying inputs are determined by a principal component analysis method exploiting a singular value decomposition of suitable input-output pairs according to [27].
At this point, it should be mentioned that the area of physics-inspired neural network modelling has gained a significant attention [28][29][30][31] in recent years. The current research directions in this field mostly employ deep-learning networks to approximate the multidimensional space and time dependencies of distributed process variables such as volume flows and pressures in fluidic systems. Based on the approximation of these process variables (which can be interpreted as a solution to a PDE), tasks such as the highly accurate training of the PDE structure, PDE discretizations, simulations for alternative sets of input data and boundary conditions or the identification of selected parameters can be solved. For that purpose, the complete PDE model is typically constructed by applying techniques for automatic differentiation [32] to the aforementioned deep neural network [33,34] to compute the involved (space) derivatives. Due to the fact that the space and time dependence of the solution is directly approximated, a large number of training data are required for such approaches. Loss functions are either energy-related expressions or squared solution deviations at initial and boundary points as well as at specific collocation points. In contrast to these techniques, the current paper proposes to structure a shallow neural network by means of physical insight as sketched above to obtain a computationally inexpensive model that is yet sufficiently accurate to perform control and state estimation tasks in future work.
To highlight the practical applicability of this approach, this paper performs a comparison of the physically structured neural network modelling procedure with equation-based alternatives derived in previous work of the authors, especially with the model described in [7] which served as the basis for a robust control design in [13]. The physically structured neural network models for the thermal system behaviour and exothermic reaction processes are composed of Hammerstein-type input nonlinearities and nonlinear state dependencies. In such a way, these mappings allow for a representation of the temporal variations of all storage variables so that they can be combined with linear dynamical elements representing the required integration with respect to time. A more general structuring of neural networks into static Hammerstein and Wiener parts that are coupled with linear transfer function elements can be found in [35]. This article and the references therein provide a good overview of current research activities concerning training techniques for finding optimal parameters of the static nonlinearities as well as of the included dynamical (sub-)transfer functions. These approaches as well as the approaches for a stability analysis of feedforward neural networks in a NARX configuration published in [36] could be combined in future work with the models and structuring techniques developed in this paper.
In addition, the structured neural network models proposed in this paper are extended to describe nonlinear dynamics in the electric power production that depend on the temporal variations of the supplied hydrogen mass flow, the fuel cell temperature (in terms of couplings with the thermal subsystem networks), and the electric current. In contrast, state-of-the-art neural network models for fuel cell systems are typically restricted to a representation of the quasi-stationary input-output behaviour [37][38][39] (both from an electrochemical and thermal point of view) which are less suitable for a control synthesis aiming at a dynamical system operation or, if they are given by dynamical representations, they focus directly on temporally discretized state equations [40] (in this reference for predictive control).
The remainder of this paper is structured as follows. Sec. 2 summarizes the basics of an equation-based modelling of the fuel cell system and as such provides the basis for the structural restrictions imposed to the neural network models derived in Sec. 3. Besides the presentation of three different system structures for the thermal SOFC behaviour and one dynamical model for the electric SOFC power, the optimization of these models with respect to the required numbers of hidden neurons as well as the relevant input quantities are discussed. Sec. 4 gives a comparison of the modelling accuracy of the equation-based and neural-network based system models. This section further focuses on the advantages of the presented neural network representations, highlights a simulation-based robustness analysis of two of the proposed novel network models, and gives an outline on how they can be interfaced with a robust control design that is based on the use of linear matrix inequality techniques for dynamical systems [41] with a polytopic uncertainty representation. Finally, conclusions and an outlook on future work are presented in Sec. 5.

Equation-based modelling of the thermal behaviour of a solid oxide fuel cell stack
In previous work of the authors [7,13], it has been shown that an equation-based model for the thermal behaviour of an SOFC system can be applied successfully to perform tasks such as model-based parameter and state estimation as well as control design for the heating and high-temperature reaction phases. Besides floating-point techniques for parameter identification, making use of local either gradient-based or gradient-free approaches, also guaranteed identification techniques on the basis of interval analysis were employed [42]. In the current paper, the equation-based model of the SOFC stack, which is separated from the gas preheater dynamics in Figure 1, serves as a reference solution against which different neural network models are compared. Those data-driven neural network models represent the system dynamics alternatively. If an equationbased, lumped parameter model is of interest, the overall dynamics of the SOFC stack is typically subdivided into a model for the temperature distribution in the interior of the stack module and into its associated electrochemical behaviour.
The lumped parameter representation of the thermal SOFC behaviour according to [7] is based on integral energy balance equations in a finite volume representation. It serves as a substitute for more detailed physically-oriented descriptions that could be given by nonlinear PDEs [1,2]. Although PDE models theoretically represent the distributed parameter nature of processes such as heat exchange and transport phenomena of ions in the interior of the fuel cell, they are difficult to handle if a real-time capable state estimation or control task is to be solved. The reason for this is twofold. The complex geometry as well as the large variety of involved materials in the interior of the SOFC stack lead, on the one hand, to an excessive number of variables to be specified (respectively, identified Figure 1. Semi-discretization of the fuel cell stack module with gas preheaters according to [7] for a parallel flow configuration of cathode and anode gas. experimentally) and to strongly coupled nonlinear equations for which boundary and interface conditions can hardly be specified in an accurate way. On the other hand, the spatial resolution of such models is by far too detailed (assuming that it would be possible to solve the aforementioned parameterization task perfectly) for any real-time control task.
Therefore, finite-dimensional sets of ODEs are commonly preferred as a compromise between modelling accuracy and computational complexity. These ODEs result from a spatial semi-discretization of the SOFC stack (shown by the cuboid in Figure 2 with the lengths L L , L M , and L N ). When performing this spatial discretization by means of a finite volume technique, the stack is subdivided into n x ¼ L � M � N equally sized elements. For each sub-cuboid with the edge lengths Figure 2, the element temperature # I :¼ # I ðtÞ, I :¼ ði; j; kÞ 2 fð1; 1; 1Þ; . . . ; ðL; M; NÞg, is described by an ODE Here, the individual summands account for the following phenomena according to Figure 3: • HT: heat transfer due to heat conduction and convection (as a linearized model for heat radiation), • G: enthalpy flows of the supplied gases, where χ in Figure 2 denotes all relevant gas fractions, • R: exothermic reaction enthalpy, and EL: Ohmic losses due to electric currents I I .
In addition to the thermal subsystem model, the electrochemical behaviour is commonly described by equivalent circuit models or phenomenological linear transfer function representations, see also Sec. 2.2.

Quasi-linear structure of the dynamical system model for the thermal SOFC behaviour
Following the procedure described in [43,44], the nonlinear ODE model for the thermal system behaviour can be re-written into the quasi-linear, input-affine state equations This results from the assumption of heat transfer terms _ Q I HT that are linear in the stack temperatures as well as from the representation of the heat capacities of all gases and reaction enthalpies in _ Q I G;I À j and _ Q I R by means of temperature-dependent polynomials. Hence, a multiplicative coupling between the system matrix A :¼ A x; p ð Þ and the state vector x containing all finite volume element temperatures is obtained. Therefore, this system matrix, as well as the input matrix B :¼ B x; p ð Þ in which Ohmic losses _ Q I EL are included, depend on the state and parameter vectors x and p, respectively.
If a discretization using three finite volume elements located in the direction of the gas mass flow according to L ¼ N ¼ 1 with M ¼ 3 is performed, the structure of the system and input matrices turns into Then, the state vector is defined as x ¼ # ð1;1;1Þ # ð1;2;1Þ # ð1;3;1Þ � � T ; all inputs (control and disturbance variables, i.e., the ambient temperature, the anode and cathode gas inlet temperatures and the electric current) are summarized in the vector where a locally homogeneous current distribution according to I ð1;1;1Þ ¼ I ð1;2;1Þ ¼ I ð1;3;1Þ ¼ 1 3 I is assumed. If an in-depth structural analysis of this system model is performed in the frame of a model-based, robust control and state observer design [43,44], the following properties become visible: (1) all off-diagonal elements in the system matrix (3) are non-negative (so-called Metzler matrix); (2) all diagonal elements in A x; p ð Þ need to be strictly negative to reflect physically motivated stability properties if no further disturbance terms are included in (2); (3) all elements of B x; p ð Þ are non-negative; (4) from a thermal point of view, # CG;in is the control input, all other entries in u are disturbance inputs.

Modelling of the electric fuel cell power
As stated in [7,44], the electric fuel cell current I k ¼ Iðt k Þ at the sampling instant t k as well as the consumed mass flow of hydrogen are related to each other via Faraday's law of electrochemistry. In addition, the terminal voltage U k ¼ Uðt k Þ of the fuel cell stack can be determined with the help of information on the stack's open-circuit voltage (related to the Nernst potential), from which various voltage drop phenomena need to be subtracted. These voltage loss phenomena (namely, activation polarization, Ohmic polarization, and concentration polarization) can be represented with the help of quasi-static relationships, such as the analytic approximation This approximation was investigated in [8] for the purpose of a maximum power point tracking procedure which is based on the online estimation of the coefficients a i , i 2 f0; . . . ; 6g, by means of a Kalman Filter procedure. An extension of this estimation approach towards a real-time implementable optimization of the fuel efficiency was published in [45]. In (4), the included logarithmic term of the electric current is inspired by Tafel's equation [46,47]. Multiplying this voltage expression with the total electric current directly yields the quasi-static electric power in terms of P EL;k ¼ U k � I k .
Due to the dependence on the overall mass flow of hydrogen and the electric current (which is proportional to the consumed hydrogen mass flow), stationary operating points can be predicted with such kind of models. However, if the coefficients are not adapted online, this model is only valid in the close vicinity of the stack and inlet temperatures for which the parameters were identified. Dynamical dependencies further arise due the relatively fast exchange of charge carriers in the interior of the SOFC stack and, if the power is controlled by using the hydrogen mass flow over longer time scales, by fluidic inertia of the transported gases and by couplings with the thermal system state.
For a constant thermal operating point, it was shown in [15] that linear transfer functions between _ m H 2 and P EL of at least third order are required to implement control procedures that allow for smooth transitions between various electric operating points. However, these models show errors in stationary gains and time constants of more than 50 % (compensated for in [15] by a robustification 2 of the control procedures) if the stack and input temperatures significantly deviate from the point of identification.

Physically structured neural network modelling
The structural investigation of an equation-based model gives rise for three different alternative neural network configurations for the thermal dynamics of the SOFC stack. These models are sketched in the following subsections, where all sigmoid-type hidden layers are implemented as hyperbolic tangent functions. All connections for which the weights are a-priori defined, for example, for the representation of matrix-vector products or unweighted summations of various sub-network outputs, are highlighted by means of non-filled arrow heads in the corresponding graphical representations; all connections with weighting factors to be adapted during training are visualized with filled ones.

Fundamental data-driven thermal SOFC model
As stated in the introduction of this paper, it is desired to derive continuous-time network models. These are given by means of a static mapping between the current system states as well as the collection of all measurable (time-dependent) control, input, and disturbance variables in the input layer of a function approximation neural network and the respective outputs of the net. Before the training takes place, all input and output vectors of the fundamental network structure in Figure 4 are low-pass filtered with identical time constants so that undesirable phase shifts between the input-output relations are avoided. The system dynamics are then represented by interpreting the network outputs as the right-hand side of a set of ODEs In such a way, the neural network representation serves as a substitute for the previously described analytic system model summarized in Eqs. (1) and (2).
In the fundamental neural network model according to Figure 4, the state vector x t k ð Þ defined in (5) represents the finite volume element temperatures according to Sec. 2 with n ¼ 3. The model (8) does not explicitly distinguish between the different physical reasons for temperature variations. However, this knowledge and their separation are essential if a control synthesis is of interest. Such control procedures typically aim at manipulating the cathode gas enthalpy flow (in the following included in a heat transfer subnetwork) by variations of the cathode gas inlet temperature and the respective mass flow which are both components of the vector (6). Using such a control procedure, temperature variations caused by the exothermal reaction enthalpies are suppressed. Therefore, the models in the following two subsections provide a more detailed representation of the dynamics by distinguishing these two phenomena with the help of a structured network model.

Separation of heat transfer and exothermic reaction enthalpies
As a first in-depth modelling approach for the thermal system behaviour, the dynamics for temperature variations due to exothermic reaction enthalpies are separated from heat transfer phenomena in the differential equations (8). The exothermic processes are only relevant for non-zero (i.e., strictly positive) electric currents I t k ð Þ. This can be reflected by the variation rates _ x R;i , i 2 f1; . . . ; ng, which depend in a multiplicative way on the electric current, see also the input-affine structure in Eq. (2).
The respective subnetwork for the reaction enthalpies furthermore depends on the vector q t k ð Þ defined in (6) and on the current state information x t k ð Þ as the finite volume element temperatures (5). Note, for this subnetwork with the hidden neurons R 1 ; . . . ; R L 1 , the input as well as output weights of the multiplicative layer need to be treated as constant parameters that are all set to the value 1. The a-priori fixing of these parameters ensures that the corresponding temperature variation rates _ x R;i can be described independently by the respective subnetwork outputs and afterwards added to the heat transfer outputs _ x th;i without any further scaling. In a similar manner, however, without any multiplicative couplings at the subnetwork outputs, the heat transfer behaviour is described by the lower block in Figure 4. This leads to the overall system model  (5) and (6) as well as the network outputs (7).
that -as mentioned before -serves as a substitute for the differential equations (8). This model structure has the clear advantage that f net;R x t k ð Þ; q t k ð Þ ð Þ;0 holds perfectly for vanishing currents I t k ð Þ;0 during the system's heating phase so that the actual values of f net;R x t k ð Þ; q t k ð Þ ð Þ directly represent the disturbance heat flows to be compensated by a control procedure that aims at keeping the fuel cell stack temperature in the close vicinity of a desired thermal state.
However, disturbances which also arise in the interior of the heat transfer block, are not yet extracted by this representation. As shown by the simulation-based robustness analysis in the following section, this may lead to a certain tendency for overfitting the training data and hence to an insufficient capability to generalize to perturbed system inputs q t k ð Þ or uncertainty in the initial stack temperature x t 0 ð Þ. Therefore, the following subsection imposes a quasi-linear, input-affine structure for f net;th x t k ð Þ; q t k ð Þ ð Þ that is inspired by the equation-based model (2) in Sec. 2.1 so that the corresponding neural network model can be employed as the basis for a robust control synthesis in future research.

Quasi-linear data-driven heat transfer model
In the third neural network model, a quasi-linear structure of the thermal SOFC behaviour is identified. The model reflects the quasi-linear state-space representation (2) according to the model-based problem formulation in Sec. 2. As in the equation-based approach, a state-dependent system matrix A x t k ð Þ; q t k ð Þ ð Þ and a state-dependent input vector b x t k ð Þ; q t k ð Þ ð Þ are introduced. The latter is coupled in an input-affine way with the cathode gas inlet temperature # CG;in as the control variable, cf. Sec. 3.1. In addition, all terms that do not fit into this structure, as well as dependencies on the ambient temperature # A and the anode inlet temperature # AG;in are captured by the disturbance term d x t k ð Þ; q t k ð Þ ð Þ. To represent the corresponding matrix product in the term A x t k ð Þ; q t k ð Þ ð Þ � x t k ð Þ, its linear subnetwork output in Figure 6 represents all matrix elements in a column-wise notation. Then, the fixed gains for the subsequent multiplication and summation layers are set appropriately to the values 0 and 1 so that the matrix-vector product forming the first summand in (10) is obtained. Similarly, also the multiplicative output of the second block b x t k ð Þ; q t k ð Þ ð Þ � # CG;in t k ð Þ is parameterized. In both cases, as well as for summing up the terms f net;th x t k ð Þ; q t k ð Þ ð Þ with the subnetwork outputs f net;R x t k ð Þ; q t k ð Þ ð Þ for the reaction enthalpies, which remains unchanged as compared to Figure 5, the respective weighting factors are not adapted during any training stage.
In contrast to the analytic system model (2), zero elements are not imposed in A x t k ð Þ; q t k ð Þ ð Þ and b x t k ð Þ; q t k ð Þ ð Þ during the neural network training. From a modelling perspective, this can be interpreted as a degree of freedom to be exploited by the training algorithm so that not only interactions between directly neighbouring elements are accounted for. Interactions from more distant elements hence resemble higher-order spatial discretization techniques that are known from solving partial differential equations numerically [48].

Remark 1.
Currently, the optimization approach for the presented neural network for the quasi-linear system structure of the thermal dynamics is solely based on the minimization of the squared distance between the measured temperature derivatives and the corresponding model forecasts. Future work could further account for the following options: (1) Enforcing the structure of the matrix A x t k ð Þ; q t k ð Þ ð Þ as in Eq. (3) during training of the neural network (especially, enforcing a Metzler-type parameterization with guaranteed non-negative off-diagonal elements); (2) Minimizing the term d x t k ð Þ; q t k ð Þ ð Þ by treating it as a penalty term added to the squared forecast error so that the dynamics are represented in a quasi-linear way without possible redundancies in the term d x t k ð Þ; q t k ð Þ ð Þ.

Remark 2.
The use of a single hidden layer with the neurons H 1 ; . . . ; H L 2 for the thermal system behaviour and using this as the input for all three output terms highlighted in grey colour in Figure 6 represents the influence of all physical phenomena (especially, the non-exothermic heat transfer in the SOFC stack) by common nonlinear input-and statedependent relations.

Data-driven modelling of the electric fuel cell power
In analogy to the unstructured model for the thermal system behaviour according to Figure 4, a neural network-based representation for the dynamics of the electric fuel cell power can be derived as a dynamical system representation. It describes the relation between the vector q t k ð Þ and the electric voltage Uðt k Þ as the inputs to a finite number of time derivatives of Uðt k Þ. To avoid unnecessarily stiff system models and to focus on long-term voltage variations with their dependence on the thermal operating state, it is possible to restrict this network to a single scalar output variable _ Uðt k Þ. The direct proportionality of the electric power to the measurable current is then established by integrating _ Uðt k Þ over time and multiplying it with Iðt k Þ. For generalizations of this model towards a fractional-order representation of the dynamics, the reader is referred to [45].
Alternatively, a static feedforward function approximation network can be trained at this stage for which Uðt k Þ needs to be removed from the input layer and treated directly as the network output.
For all neural network models presented so far in this section, optimal numbers L (resp. L 1 and L 2 ) of neurons in the hidden sigmoid layers can be determined by the principal component analysis approach [27] summarized in Secs. 3.6 and 3.7. In addition, also connections to redundant inputs and not sufficiently rich entries (with respect to their information content) of the vectors q t k ð Þ are removed by the procedure described in the following subsection to limit the network complexity and to reduce the computational burden during both training and evaluation.  Figure 5). Remark 3. The continuous-time system formulation, in which the nonlinear dynamics are represented as static mappings between system inputs, current system states and forecasted time derivatives of all state variables, was preferred in this paper over recurrent discrete-time network models or other discrete-time representations such as NARX models in which temporally delayed state variables are introduced as further network inputs. By the proposed choice of modelling, a system representation is obtained in Figure 6 that allows in future work for combining data-driven modelling methodologies with model-based techniques for a robust control synthesis. With the help of the quasilinear system structure according to Figure 6 (or, respectively, Eqs. (2) and (10)), a stabilizing control synthesis can be performed if linear matrix inequality techniques are employed. For the purpose of the control design, it will be necessary to extract worstcase bounds of all entries of the matrices and vectors A x t k ð Þ; q t k ð Þ ð Þ and b x t k ð Þ; q t k ð Þ ð Þ, respectively. In such a way, the procedures from [43,49] can be extended to system models, where the dynamics are internally represented in the form of neural networks. Fundamentally, it will become necessary to extract the worst-case realizations from Here, determining the respective matrix element bounds by techniques from the field of interval analysis will be a promising direction for future work.

Application of physically motivated network structures to other dynamical systems
Besides modelling of SOFC systems, the proposed physically motivated structuring of subsystem representations is useful for a large variety of other modelling tasks. An example is the control-oriented representation of state equations for polymer electrolyte fuel cells. There, techniques for impedance spectroscopy could provide the identification data for the current-voltage characteristics, cf. [50], where a multiplication with the actual current leads to the corresponding electric power in analogy to the previous subsection. Moreover, direct dependencies of models for the gas partial pressures on the supplied hydrogen mass flow could be represented by multiplicative output layers (cf. Figure 5).
Apart from modelling and identification of dynamical systems in electrochemistry, compartmental models that are widely used for representing biological systems, medical processes or tasks of water purification are obvious candidates. There, suitable neural network structures could be extracted from a graph topology that can be used to describe such processes [51,52]. For wastewater treatment processes, for example, these may be the interconnections between aeration tanks and specific layers in a subsequent settler, where oxygen supply necessary for the growth of substrate consuming bacteria only takes place in the first component [53].

Optimal network inputs
The selection of optimal (non-redundant and sufficiently information carrying) network inputs in this subsection as well as the choice of the optimal numbers of hidden layer neurons in the next subsection are performed by means of a subset selection that is based on a singular value decomposition of suitable matrices [27].
For the case of choosing the optimal network inputs, the training data matrix T 2 R l�m is considered, where l denotes the number of training samples; m is the number of inputs to the networks (excluding the state variables that are fed back in the dynamical system representations discussed before).
Using this matrix T, a singular value decomposition is performed according to where U T where σ T;1 � σ T;2 � . . . � σ T;m � 0 are the singular values sorted in descending order and 0 ðlÀ mÞ�m 2 R ðlÀ mÞ�m is a zero matrix of appropriate dimension. Due to the large number of sampling points available as measurements in this application (typically a few hundred thousand), l is chosen during a correlation-based data aggregation stage to be larger than m by at least a factor of 100. The actual choice of the l data points for the singular value decomposition is described further in Remark 4. The number η T > 1 of relevant system inputs is then identified as the largest integer for which holds with the normalized singular values and the sufficiently small threshold value 0 < � T � 1. Now, as described in [27], define the matrix � V T as the first η T columns of V T and partition it into Performing a QR factorization of � V T T with column pivoting yields a permutation matrix P T 2 R m�m such that holds so that R T;1 is upper triangular and Q T T Q T ¼ I. Now, the selected input subset T 1 is obtained as where T 1 2 R l�η T .

Remark 4.
For the selection of the number l of training data, the percentage from the overall experimental network inputs is successively decreased so that this percentage of data -chosen randomly -still reflects the fuel cell's overall input-output relationship. For that purpose, the matrix T is expanded by all corresponding system outputs (i.e., the time derivatives of the element temperatures or the measured stack voltage). Then, the matrix of correlation coefficients C 2 R ðmþnÞ�ðmþnÞ is computed, both for the original and reduced data set after deleting certain rows of T and the associated outputs. If the correlation coefficients of the original and the reduced data do not deviate significantly, the reduction is admissible. This reduction stage eliminates points before the singular value decomposition that aims at the optimal input selection. The neural network training, however, is fully independent of this reduction stage and may contain its own data aggregation. Especially after the reduction of network input data by this computation of correlation coefficients, one obtains so-called tall skinny matrices for which a principal component analysis can effectively be evaluated on standard CPUs (for a number of training samples around l¼10 5 with m < l=100 input vector components, where this latter value turns into the number L of hidden neurons in the following subsection). For larger values of both l and m, this principal component analysis can be carried out on modern GPU architectures. Then, approximate solutions of singular value decompositions by truncation techniques seem to be one of the promising solutions [54][55][56][57]. However, GPU implementations were not investigated in this paper, because a standard CPU implementation was sufficiently efficient for the application scenario in this paper.

Remark 5.
Input variables in the set of training data that have a negligibly small correlation to all system outputs in the same data set are natural candidates to be omitted when training the networks in Figures 4-6. The detection of such candidates can be performed by a classical correlation analysis. Strictly speaking, this technique is only applicable if besides the second-order moments of the combined input-output distribution also all further higher-order moments of these input-output pairs are practically zero. In those rare scenarios, where the correlation-based input exclusion leads to poor system approximations, this input variable reduction should be reverted.

Optimal number of hidden layer neurons
The selection of an optimal number of hidden neurons basically follows the same stages as in the previous subsection. The major difference is that the matrix T from the previous subsection is now replaced by a matrix H 2 R l�L , where L is the number of neurons in the hidden layer under investigation. This matrix is determined from a simulation of an over-parameterized neural network that has been trained up to the point of reasonable convergence. Then, the respective matrix H is used to determine the singular value decomposition U H , Σ H , V H , from which the 1 � η H < L most important singular values are extracted after specifying the threshold � H according to the same procedure as before. This singular value decomposition approach reveals that η H neurons represent the main system information in the hidden network layer. This means that the outputs of the remaining L À η H neurons can be described (despite the nonlinear sigmoid activation functions whose outputs are included in the data H) by a linear combination of the most important η H neurons. Deleting those non-relevant neurons is reasonable because the outputs of the hidden layers in all networks in Figures 4-6 are combined linearly within the subsequent layer(s). After deleting the non-relevant neurons and retaining the pre-trained weights and bias values, training is continued with two further options. If the approximation quality (measured as the sum of squared output errors over the complete training data set) deteriorates significantly for the reduced network size (typically due to an inappropriate threshold � H ), η H is increased to a larger value (in the limit case, all neurons may be kept, i.e., leading to η H ;L). If the quality of the reduced network is still acceptable, a further reduction is possible as shown by the break conditions in Figure 7.

Remark 6.
For networks with multiple hidden layers, this procedure can be employed individually for each of the sigmoid layer outputs.

Remark 7.
Individual network links with sufficiently small weights (corresponding to sufficiently small correlation coefficients between the hidden layer inputs and outputs) may be deleted fully in order to limit the computational effort. This option, however, is not further considered in this paper.

Training and experimental results
In Figure 7, the overall training and optimization procedure for all neural networks presented in this paper is summarized. The basis for the training and numerical evaluation of all networks has been the acquisition of measured data from an SOFC test rig available at the University of Rostock for a time span of slightly more than eight hours from which the high-temperature reaction phase takes place in the last 1:5 hours. The use of these training data -with relatively many points during the electrochemically idle phase -has the advantage that the influence of the cathode gas enthalpy flows can be described accurately. This allows for developing accurate temperature control strategies following the aims mentioned in Sec. 3.1.
The experimental data used in this section (comprising information on all segment temperatures, gas mass flows, stack inlet temperatures, as well as the electric terminal current and voltage) are sampled with a temporal resolution of 100ms. Before the optimal input selection for the networks has been carried out, these data have been lowpass filtered with a first-order linear transfer function with an edge frequency of 1Hz. This frequency is also used for a low-pass filtered derivative estimation after its automatic numerical discretization using the FixedStepAuto option in SIMULINK R2019b.
To perform the optimal input selection, the total number of approx. 300;000 data points has been reduced to approx. l ¼ 10;000 points by a random choice, where the latter corresponds to the statements of Remark 4 with a change of the correlation coefficients that is smaller than 0:1%. After that, the singular value decomposition has been evaluated with the results presented in Table 1 (containing also the optimized numbers of hidden layer neurons). The ✓ symbols in Table 1 represent those inputs that are -according to the singular value decomposition -relevant for describing the corresponding system dynamics, while ✘ denotes the irrelevant inputs. For example, the heat transfer subnetwork according to Figure 5 should not use the electric current, voltage, and ambient temperature as inputs, while only the ambient temperature (which was practically constant for the whole experiment) could be eliminated from the reaction enthalpy subnetwork.
The training of the neural networks has been preceded by a data aggregation stage. For the thermal system models, 1-minute averages of all measurements have been formed (which are still shorter time spans than the typical thermal time constants), while the electrochemical behaviour has been identified for 1-second averages. The network   Figure 6 (reaction enthalpies)   training has been carried out in MATLAB using the standard Bayesian regularization backpropagation algorithm, parallelized on two CPU cores with a maximum number of 5000 epochs in which a worsening of the validation performance was allowed in 50 subsequent iterations. From the aggregated data sets, multiple random subdivisions into training (70%), test (15%), and validation (15%) data were performed. For all presented results, the cost function was chosen as the sum of squared output errors with respect to the derivatives of the state variables. Table 2 summarizes the best possible performance after optimization in terms of the root of the mean-square approximation error (RMS). There, ✓ symbols indicate that the specific quantity in the columns for training data represented the desired network output, while ✘ in the training data columns indicated that the corresponding data are not used for the respective network option. In addition, ✘ in the column optimized? Denotes that the corresponding number of hidden layer neurons is the outcome of the singular value decomposition-based optimization procedure. This optimization was continued until the number of 7 hidden layer neurons was reached in the option T4. There, the RMS value started to increase (but was still below the analytic, equation-based system model's quality from the authors' previous work). Therefore, the value of hidden layer neurons was left unchanged in T5-T8, denoted by the sign (✓) in parentheses, while the hidden layer optimization was independently repeated for the electric power networks, see P1-P4.
Having a closer look at the optimization results, it becomes obvious that the heat transfer subnetworks in Figures 5 and 6 do not make use (as stated above) of the electric voltage and current. Moreover, one of the stack temperatures and the cathode gas mass flow have been identified to be irrelevant for approximating the stack voltage (P1-P4). It can be seen clearly from Table 2 and from the graphical visualizations in Figures 8a and 8b that the RMS of the thermal model was reduced by using all neural network variants by a factor of at least to 1:6 in comparison with the analytic system model (R0). Moreover, Figure 8c displays the measured electric power. A comparison of the time span of non-zero powers with the regions of largest errors of the model R0 shows that the analytic system representation becomes less accurate as soon as the exothermal reaction starts to take place.
All neural networks, for which the approximation errors are depicted in Figures 9 and 10 reduce this error noticeably. Especially, the robustified options help to stabilize the approximation errors also during the high-temperature reaction phase. According to the structure diagram in Figure 7, this robustification was achieved by adding further Figure 10. Experimental identification of the thermal fuel cell behaviour with Δ# ð1;j;1Þ :¼ # m;ð1;j;1Þ À # ð1;j;1Þ , j 2 f1; 2; 3g, cont'd.
training data points with random temperature disturbances of 2K standard deviation to the measured quantities. This simple, yet heuristic measure helps obviously to prevent overfitting of the measured quantities and preserves the networks' capabilities for robustness and generalization to inputs which deviate from the training data set. This property is later discussed in more detail after the presentation of the results for modelling the electric fuel cell power.
In addition, Figure 11 and Figure 12 show the accurate modelling of the electric power by using both neural network options which outperform the LTI transfer function derived in [15] especially in constant power phases in which temperature changes occur. It should also be noted that the dynamical voltage (power) model is more accurate in most phases except for stack powers close to 90W. The reason for this is most likely that the training data for powers close to this value contain mostly constant operating points and no significant power variations in transient system operation. Figure 10. Experimental identification of the electric fuel cell behaviour (phases with missing data correspond to an operation of the SOFC with vanishing current, leading trivially to P EL ¼ 0 W).
From an application point of view, it is not only necessary that the trained neural networks allow for accurately reproducing the training data. In addition, they must also be insensitive against real-life input variations and must be stable in the vicinity of the training Figure 11. Experimental identification of the electric fuel cell behaviour (phases with missing data correspond to an operation of the SOFC with vanishing current, leading trivially to P EL ¼ 0W), cont'd. data set. These properties are checked by randomized simulations (100 simulation runs for each of the networks T6 and T8) in Figure 13. For that purpose, the initial stack temperatures as well as the cathode and anode gas inlet temperatures were disturbed by normally distributed random numbers with a standard deviation of 3K, the hydrogen mass flow by a standard deviation of 2% of its actual value, and the stack voltage and current independently by 0:5%. From Figure 13, it becomes obvious that the network model T6 has the drawback that some of the disturbed simulations tend to become unstable during the high-temperature reaction phase. This is caused by the lack of a model-free disturbance term d x t k ð Þ; q t k ð Þ ð Þ that was included additionally in T8, see also Figure 6. Moreover, the heuristic robustification approach by adding small temperature disturbances to the training data set also helped to make this network model less sensitive against input variations. Hence, the robustified quasi-linear structure according to Figure 6 implemented as the neural network T8 represents the basis for a future robust control design for the thermal system behaviour. It also enables possible interfaces between the presented physically motivated neural network models and interval-based control and state estimation schemes according to [13,43,44]. As far as the modelling of the electric fuel cell power is concerned, all of the presented models P1-P4 are capable of representing the actual system dynamics and can serve as a basis for deriving power controllers as well as for validating procedures that allow for maximum power point tracking or for the optimization of the fuel efficiency according to [8,45].

Conclusions and outlook on future work
In this paper, various neural network structures were presented and optimized for the multi-physics behaviour of SOFC systems. It was shown that physical insight, inspired by a first-principle modelling, allows for deriving networks with a specific (in this case quasi-linear) structure between system states and identified nonlinear characteristics. In such a way, it was not only possible to bridge the gap between purely equation-based and data-driven paradigms but modelling errors were also reduced significantly by the specific system structure under consideration of robustness constraints.
As already discussed, these structured neural networks will be used in future work as the basis for a reliable, guaranteed stabilizing control synthesis. For that purpose, it will be investigated how far methods from the field of interval analysis, possibly in combination with redefining the neural networks' parameters as interval quantities according to [58], can be used to bound the worst-case network outputs and how this information can then be employed for a control synthesis with certified stability over a predefined range of system inputs and operating conditions. Notes 1. A quasi-linear continuous-time system model is typically given in the form _ x ¼ A x ð Þ � x þ B x ð Þ � u for which linear control and state estimation procedures can be applied by introducing state-dependent gain matrices. This procedure is also often easier to apply than more general nonlinear control techniques and is denoted as extended linearization. It is discussed in further detail in [59][60][61] and the references therein. Please note that we also use the term quasi-linear in the frame of our neural network modelling if the corresponding subsystem representation makes use of this specific system structure. 2. The influence of this uncertainty is reduced either by an integral output error feedback, sliding mode controllers, or by disturbance estimates on the basis of an internal model control approach.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
A. Rauh would like to thank Lab-STICC (Robex) of ENSTA Bretagne for covering the article processing charges. The contribution of J. Kersten and W. Frenkel to this study was funded by the European Union within the European Regional Development Fund [TBI-V-1-256-VBW-090].