Cycle-by-Cycle Combustion Optimisation: Calibration of Data-based Models and Improvements of Computational Efficiency

ABSTRACT Modern combustion engines require an efficient cycle-by-cycle fuel injection control scheme to optimise the single combustion events during transient operation. The online optimisation of the respective control inputs typically requires models of the combustion quantities that are accurate yet sufficiently simple. Based on a recently presented cycle-by-cycle optimisation scheme with a hybrid model, this paper focuses on two aspects that enhance the accuracy as well as the computational efficiency of an online computation. Firstly, the proper calibration of Gaussian processes nested in a combined physics-/data-based model structure is addressed. The corresponding test bench measurements and a tailored two-step training procedure are presented. Secondly, the computational efficiency of the online cycle-by-cycle optimisation is increased by mapping computationally intensive calculations into the data-based models through offline preprocessing. In addition, a data-driven approximation of the complete optimisation scheme is proposed to further minimise the computational demand. Simulation studies are used to evaluate the performance of these approaches.


Introduction
Despite the ongoing electrification of the mobility sector, diesel-driven vehicles will continue to play an important role in public transportation in the near and mid-term future, especially in heavy-duty and off-road applications [1,2]. Thus, increasing diesel engine efficiency, i.e. decreasing the emitted CO2, while minimising harmful emissions such as nitrogen oxides (NOx) and soot remains a high priority [3]. Transient engine operation offers great optimisation potential, which is particularly relevant since recent legislation requires diesel engines to meet emission limits, previously enforced only for less dynamic scenarios, also under highly transient conditions, i.e. Real Driving Emissions (RDE) [4,5].
Focusing on transient engine operation, sophisticated control schemes aim to optimally regulate the combustion process via the actuated inputs of, e.g., the air or the fuel injection system [6,7]. In this field, cycle-by-cycle control approaches try to optimise the single combustion events during engine transients. For this task, the parameters of the fuel or even water injection [8] are the main manipulated inputs, as they directly affect the combustion, whereas air system variables, such as intake pressure or oxygen fraction, constitute time-varying boundary conditions [9]. In order to control this system optimally, mathematical models are employed to precisely capture the cross-relations between the various actuated inputs, time-varying conditions, and combustion quantities (emissions and torque). Since accurate physics-based approaches are often too complex for fast-running cycle-by-cycle control schemes, data-based modelling techniques [10][11][12] are utilised due to their compact mathematical structure [13,14], despite their limited extrapolation capability, e.g., for extreme conditions or unseen transients. Further, combined physics-/data-based, i.e. hybrid, models [15] are also in the scope of research, as they aim to unite the advantages of both modelling domains. Based on the modelled relations between the combustion inputs and outputs, the actuated signals are determined online or offline through optimisation-based model inversion [11,16,17].
The real-world application of the previously characterised controllers necessitates a consistent calibration of their data-based models as well as sufficient computational efficiency to satisfy real-time requirements. In detail, the data-based model calibration involves gathering reference data that suits the system specifics and the model purpose. Approaches discussed in the literature range from sophisticated transient measurement procedures to stabilise Homogeneous Charge Compression Ignition [12] to steady-state approaches that describe the emissions and torque of engines with conventional combustion [10,18]. As these concepts show, the design of experiments must also account for the employed data-based modelling approach. For neural networks, e.g., the authors of [11] propose a wide-range design with mixed space filling and full factorial parts, whereas the local linear models discussed in [10] require a spatially divided concept tailored to their distributed nature. Even if suitable data is available, the overall model structure may further complicate the data-based model generation. For instance, in a hybrid model, the coupling of the physics- and data-based parts must be considered to determine appropriate training data. Calibration approaches for such coupled structures have not yet been widely discussed.
In order to enhance the computational efficiency of cycle-by-cycle control schemes, various concepts are discussed in the literature. Online optimisation approaches are, e.g., tuned via their initial conditions (warm vs. cold start), termination criteria, or number of iterations [16]. To further reduce the online effort, the optimisation problem is preprocessed offline and the results are stored in data-based surrogate models [11,17,19]. These concepts increase the computational efficiency but also require a suitable sampling of the optimisation problem to properly calibrate the surrogate models. In addition, they reduce the flexibility of the original optimisation since, e.g., configurable weights are fixed in the offline solutions. Thus, keeping the online optimisation instead of a data-based approximation, but reducing its computational effort through mathematical simplifications, appears to be a promising yet little-discussed concept.
Following this literature overview, this paper focuses on the calibration of data-based models located in a hybrid model structure and discusses enhancements of the computational efficiency of an online optimisation scheme. The paper's contributions are based on a cycle-by-cycle combustion control scheme that was previously presented in [20]. This approach adapts the fuel injection pattern cycle-by-cycle and includes a hybrid cylinder chamber model [21], in which combined physics-/data-based parts predict the emissions and the indicated mean effective pressure (IMEP) per engine cycle. The calibration of the data-based part becomes challenging since, e.g., one of the models does not predict an output signal but is instead integrated closed-loop into the state space representation. Therefore, the paper extends a mixed space filling / full factorial measurement design similar to [11] by a two-step calibration procedure that generates the state space and output data-based models separately. Furthermore, the computational effort of the online optimisation arises from certain pre-calculations, parts of the physics-based model, and an equality constraint. To exclude these elements from the online execution, the paper describes a concept to map their characteristics into the data-based models through offline preprocessing. As a result, they still affect the online optimisation although they are no longer executed individually. In addition, the paper discusses the derivation of a fully data-based approximation of the online optimisation scheme that retains its original flexibility, e.g. by preserving configurable weights.
The paper is structured as follows. Section 2 recapitulates the hybrid cylinder chamber model of [21] and its application to the combustion optimisation in [20]. Section 3 introduces a calibration procedure for the data-based parts of the hybrid model scheme and discusses the design of test bench measurements in order to gather appropriate reference data. Section 4 describes measures that reduce the computational effort of the original combustion optimisation through adaptations of the optimisation scheme and the data-based models. Section 5 evaluates these improvements by means of simulation studies and the conclusions are discussed in Section 6.

Combustion optimisation with a hybrid cylinder chamber model
In order to optimally control the combustion during transient engine operation, an optimisation scheme was introduced in [20] that determines correction values for the fuel injection parameters under consideration of their effect on the engine NOx and soot emissions as well as on the IMEP. These adaptations are set cycle-by-cycle in reaction to the current air system state and the fuel pressure, which are typically delayed during engine transients. The optimisation-based approach thereby extends the standard fuel injection control, as depicted in the system overview of Figure 1. In the course of the optimisation, the considered engine outputs are determined by a combined physics-/data-based, i.e. hybrid, cylinder chamber description from previous work [21]. Since the current paper discusses the calibration of the data-based model part as well as its utilisation for improving the computational efficiency of the optimisation, this section recapitulates the relevant aspects of the initial modelling and control approach. Accordingly, Section 2.1 introduces the hybrid cylinder chamber description and Section 2.2 describes its utilisation for the combustion optimisation.

Description of the physics-/data-based combustion chamber model
The cylinder chamber modelling approach proposed in [21] transforms a conventional, lumped-parameter description concept [22][23][24] into a hybrid approach that combines physics- and data-based models. Thereby, the cylinder interior as well as the gas exchange and fuel injection valves define the balance area boundaries, as indicated by the dashed lines in the cylinder sketch in Figure 2. The model further differentiates between two gas fractions, namely oxygen ($\mathrm{O_2}$) and the pseudo-component not-oxygen ($\overline{\mathrm{O_2}}$). Overall, its state $x$ comprises the total gas mass $m$, the oxygen fraction $X_{\mathrm{O_2}}$, and the cylinder pressure $p$.
In detail, the physics-/data-based cylinder chamber description separates the combustion cycle $k$ into the three stages depicted in Figure 3, namely gas exchange (blue), compression (grey) and combustion (yellow). Mathematically, this arrangement corresponds to a periodic hybrid automaton [25] that comprises these phases as well as their transition events, i.e. exhaust valve opening at $t^{\mathrm{EO}}_k$, intake valve closing at $t^{\mathrm{IC}}_k$ and the start of the first fuel injection at $t^{\mathrm{I1}}_k$, see Figure 3. At each transition, the final state of the currently active phase defines the initial state $x_0$ of its successor. Apart from this requirement, each phase is free in its specific modelling concept. This generic framework of sequentially connected phases allows the evolution of the cylinder chamber state $x(t)$ to be described by means of the state space description (2). In this approach, the gas exchange and compression phases (phases 1 and 2) are modelled by means of the continuous-time, lumped-parameter cylinder modelling concept [22][23][24], represented by the functions $f_{\mathrm{P1}}(\cdot)$ and $f_{\mathrm{P2}}(\cdot)$, respectively. The input signals of phases 1 and 2 in (2) account for the external air system dependencies, i.e. the thermodynamic states in the intake (IM) and exhaust (EM) manifolds, see Figure 2, as well as the engine speed $n_\mathrm{E}$. The continuous-time input variables of the gas exchange phase in (3) also make it possible to attach further models that describe, e.g., air system delay and dead-time effects [26][27][28].
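The phase-wise structure can be illustrated by a minimal simulation loop. The sketch below only mirrors the chaining rule, i.e. the final state of each phase initialises its successor. The `Phase` class and the fixed phase durations are hypothetical simplifications: they replace the event times $t^{\mathrm{EO}}_k$, $t^{\mathrm{IC}}_k$ and $t^{\mathrm{I1}}_k$. The explicit Euler integrator is likewise a placeholder. The right-hand sides stand in for the lumped-parameter dynamics $f_{\mathrm{P1}}(\cdot)$ and $f_{\mathrm{P2}}(\cdot)$, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

import numpy as np


@dataclass
class Phase:
    """One phase of the periodic hybrid automaton (hypothetical interface)."""
    name: str
    duration: float                                                  # phase length in s (assumed fixed)
    rhs: Optional[Callable[[float, np.ndarray], np.ndarray]] = None  # continuous dynamics, e.g. f_P1, f_P2
    jump: Optional[Callable[[np.ndarray], np.ndarray]] = None        # discrete map for the combustion phase


def run_phase(phase: Phase, x0: np.ndarray, t0: float, dt: float = 1e-5) -> Tuple[np.ndarray, float]:
    """Evolve one phase from its initial state x0 and return (final state, end time)."""
    if phase.jump is not None:
        # Discrete-time phase: only the final state is computed.
        return phase.jump(x0), t0 + phase.duration
    if phase.rhs is None:
        raise ValueError(f"phase {phase.name!r} needs either rhs or jump")
    n_steps = max(1, int(round(phase.duration / dt)))
    h = phase.duration / n_steps
    x, t = x0.copy(), t0
    for _ in range(n_steps):
        x = x + h * phase.rhs(t, x)  # explicit Euler step (placeholder integrator)
        t += h
    return x, t


def simulate_cycles(phases: Sequence[Phase], x0: np.ndarray, n_cycles: int) -> np.ndarray:
    """Chain the phases so that the final state of each phase initialises its successor."""
    x, t = np.asarray(x0, dtype=float), 0.0
    history = [x.copy()]
    for _ in range(n_cycles):
        for phase in phases:
            x, t = run_phase(phase, x, t)
        history.append(x.copy())  # state at exhaust valve opening of the next cycle
    return np.array(history)
```

Each phase exchanges only its final state with its successor. Continuous-time and discrete-time phases can therefore be mixed freely, which is the property exploited for the combustion phase.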
In contrast to gas exchange and compression, the combustion process can only be described accurately by complex models [29][30][31][32], which are, however, unsuitable for the desired combustion optimisation. Thus, for phase 3 in (2), the continuous-time evolution of the states $x(t)$ is replaced by a discrete-time approximation that only determines the final state $x(t^{\mathrm{EO}}_{k+1})$ of the combustion phase, since this state is required to initialise the succeeding gas exchange phase, see Figure 3. In detail, physics-based surrogate models predict the gas mass $m(t^{\mathrm{EO}}_{k+1})$ and the oxygen fraction $X_{\mathrm{O_2}}(t^{\mathrm{EO}}_{k+1})$. They are derived assuming that the injected fuel mass $m^{\mathrm{I\Sigma}}_k$ evaporates completely and combusts stoichiometrically with the oxygen demand described by the stoichiometric factor $\mu_{\mathrm{O_2}}$. Thus, the overall cylinder gas mass $m$ increases and the oxygen fraction $X_{\mathrm{O_2}}$ decreases. Due to the hard-to-describe combustion physics, a more complex, data-based model $M^{\mathrm{P3}}_p(\gamma^{\mathrm{P3}}_M)$ determines the cylinder pressure $p(t^{\mathrm{EO}}_{k+1})$ in (5). Its input vector $\gamma^{\mathrm{P3}}_M$ contains the following variables:
- the initial state $x(t^{\mathrm{I1}}_k)$ calculated by the physics-based model of the compression phase (phase 2) of (2);
- the fuel injection parameters, comprising the start positions and fuel mass distribution of the injection pulses according to Figure 4;
- the fuel pressure $p_\mathrm{F}$ and the engine speed $n_\mathrm{E}$.

These variables are selected as they affect the cylinder pressure evolution during the combustion phase via the governing differential equation. Further, the data-based model defines the mapping (7) from the input vector $\gamma^{\mathrm{P3}}_M \in \mathbb{R}^9$ to the scalar pressure signal $p(t^{\mathrm{EO}}_{k+1})$. Gaussian process regression with a squared exponential kernel [33] is utilised throughout the paper to set up these data-based mappings, as it has proven its suitability in the engine modelling domain [13] and is also supported by recent engine control units [34].
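A possible form of the physics-based surrogate for the gas mass and oxygen fraction follows from a simple mass balance over the closed cylinder. The fuel mass adds to the charge, and stoichiometric combustion consumes $\mu_{\mathrm{O_2}} m^{\mathrm{I\Sigma}}_k$ of oxygen, which gives:
- $m^+ = m + m^{\mathrm{I\Sigma}}_k$;
- $X^+_{\mathrm{O_2}} = (X_{\mathrm{O_2}} m - \mu_{\mathrm{O_2}} m^{\mathrm{I\Sigma}}_k)/m^+$.

These expressions are an assumed reading of the description above, not the exact equations of [21]. The function name and the numbers in the usage example (e.g. $\mu_{\mathrm{O_2}} = 3.4$) are illustrative.

```python
def combustion_phase_balance(m, x_o2, m_fuel, mu_o2):
    """Gas mass and oxygen fraction at the end of the combustion phase.

    Assumed balance (illustrative, not the exact model of [21]): the injected fuel
    m_fuel evaporates completely and burns stoichiometrically, consuming
    mu_o2 * m_fuel of oxygen; the valves are closed, so the cylinder mass grows
    by m_fuel. Inputs m and x_o2 refer to the state at the start of injection.
    """
    m_o2_end = x_o2 * m - mu_o2 * m_fuel  # remaining oxygen mass
    if m_o2_end < 0.0:
        raise ValueError("not enough oxygen for stoichiometric combustion")
    m_end = m + m_fuel                    # total gas mass increases
    return m_end, m_o2_end / m_end        # oxygen fraction decreases


# Example with illustrative values in kg: 1 g charge, 20 mg fuel.
m_end, x_end = combustion_phase_balance(1.0e-3, 0.21, 20e-6, 3.4)
```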
The Gaussian process hyper-parameters are estimated by a maximum likelihood approach utilising the software ASCMO (Advanced Simulation for Calibration, Modelling and Optimization) [35].
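For illustration, the following NumPy sketch shows three things: Gaussian process regression with a squared exponential kernel, a maximum likelihood estimate of its hyper-parameters, and the resulting predictive mean and standard deviation. It is not the ASCMO implementation. The zero prior mean and all function names are assumptions made to keep the example short. So is the coarse grid search with a shared length scale, which stands in for a gradient-based optimiser with one length scale per input.

```python
import numpy as np


def se_kernel(A, B, length_scales, signal_var):
    """Squared exponential kernel with one length scale per input dimension."""
    d = (A[:, None, :] - B[None, :, :]) / length_scales
    return signal_var * np.exp(-0.5 * np.sum(d ** 2, axis=-1))


def _cholesky_solve(X, y, length_scales, signal_var, noise_var):
    K = se_kernel(X, X, length_scales, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return L, alpha


def neg_log_marginal_likelihood(X, y, length_scales, signal_var, noise_var):
    """-log p(y | X, theta) of a zero-mean Gaussian process."""
    L, alpha = _cholesky_solve(X, y, length_scales, signal_var, noise_var)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)


def fit_by_grid_search(X, y, length_grid, signal_grid, noise_grid):
    """Maximum likelihood estimate over a coarse grid (shared length scale for brevity)."""
    best = None
    for ell in length_grid:
        for s2 in signal_grid:
            for n2 in noise_grid:
                theta = (np.full(X.shape[1], ell), s2, n2)
                nll = neg_log_marginal_likelihood(X, y, *theta)
                if best is None or nll < best[0]:
                    best = (nll, theta)
    return best[1]


def gp_predict(X, y, X_new, length_scales, signal_var, noise_var):
    """Posterior mean and standard deviation of the latent function at X_new."""
    L, alpha = _cholesky_solve(X, y, length_scales, signal_var, noise_var)
    K_s = se_kernel(X_new, X, length_scales, signal_var)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Far away from the training data, the predicted mean falls back to the prior mean and the standard deviation grows towards the signal level. This behaviour is what makes the standard deviations $\sigma_{\mathrm{NO_x}}$ and $\sigma_\mathrm{S}$ useful as uncertainty indicators.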
At each engine cycle $k$, the cylinder chamber model also describes the output signals, which comprise the IMEP $P$ as well as the NOx and soot emissions $E_{\mathrm{NO_x}}$ and $E_\mathrm{S}$. Due to the phase-wise model structure, see Figure 3, and the integral character of the output signals, the output vector $y_k$ is calculated as the sum of the individual contributions of each phase, i.e. $y_k = y^{\mathrm{P1}}_k + y^{\mathrm{P2}}_k + y^{\mathrm{P3}}_k$. Similar to the state space description (2), the outputs $y^{\mathrm{P1}}_k$ and $y^{\mathrm{P2}}_k$ of phases 1 and 2 are derived from the physics-based models. In detail, the IMEP is determined from the in-cylinder pressure trace $p(t)$ according to [36]. Further, no emission components are assumed to be aspirated during the gas exchange. Regarding the combustion phase (phase 3), the single elements of $y^{\mathrm{P3}}_k$ are determined by the data-based surrogate models $M^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_M)$, $\alpha \in \{P, \mathrm{NO_x}, \mathrm{S}\}$. Similar to (7), they define individual mappings from the input vector $\gamma^{\mathrm{P3}}_M \in \mathbb{R}^9$ (6) to the respective scalar output signals, which can be summarised in a vector-valued dependency expression. These data-based mappings are also set up by Gaussian process regression. Thus, they are characterised by their mean values $(E_{\mathrm{NO_x}}, E_\mathrm{S})$ and standard deviations $(\sigma_{\mathrm{NO_x}}, \sigma_\mathrm{S})$.
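The IMEP contribution of a phase can be obtained from a sampled pressure-volume trace with the standard definition $P = \frac{1}{V_\mathrm{d}} \oint p \,\mathrm{d}V$, where $V_\mathrm{d}$ denotes the displacement volume. The procedure of [36] may differ in detail. The sketch below uses a trapezoidal rule, so the contributions of consecutive phases add up to the cycle value. It then sums the phase-wise output vectors $[P, E_{\mathrm{NO_x}}, E_\mathrm{S}]$. The function names and the ordering of the output vector are assumptions.

```python
import numpy as np


def indicated_work(p, V):
    """Trapezoidal approximation of the integral of p dV along a sampled trace."""
    p = np.asarray(p, dtype=float)
    V = np.asarray(V, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(V)))


def imep_contribution(p, V, V_d):
    """IMEP contribution of a (possibly partial) trace: work divided by displacement volume."""
    return indicated_work(p, V) / V_d


def cycle_outputs(y_p1, y_p2, y_p3):
    """y_k = y_P1 + y_P2 + y_P3 with y = [IMEP, E_NOx, E_S].

    In this sketch the caller passes zero emission entries for phases 1 and 2,
    reflecting the assumption that no emission components are aspirated.
    """
    return np.asarray(y_p1, float) + np.asarray(y_p2, float) + np.asarray(y_p3, float)
```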

Fuel injection-based combustion optimisation
The combustion optimisation in [20] determines desired fuel injection parameters $u^{\mathrm{I,D}}_k$ that are tailored to transient engine operation. To this end, it computes offsets that adapt the steady-state fuel injection parameters under consideration of the actual emission quantities, which are, e.g., negatively affected by the slow air system and fuel pressure dynamics. Thus, the desired fuel injection parameters for transient engine operation are calculated on a cycle-by-cycle basis via
$$u^{\mathrm{I,D}}_k = u^{\mathrm{I,st}}_k + V\,\Delta u^{\mathrm{I,dy}}_k. \qquad (16)$$
This approach considers all degrees of freedom of the two-pulse fuel mass flow profile, see Figure 4:
- the adaptations $\Delta m^{\mathrm{I\Sigma}}_k$ and $\Delta m^{\mathrm{I1}}_k$ of the overall and pilot fuel mass;
- the shift $\Delta\varphi^{\mathrm{I2}}_k$ of the main injection start;
- the change $\Delta\varphi^{\mathrm{I12}}_k$ of the distance between the pilot and the main injection.

To determine the pilot injection shift $\Delta\varphi^{\mathrm{I1}}_k$, the conversion matrix $V$ sums up its relative offset $\Delta\varphi^{\mathrm{I12}}_k$ and the total shift $\Delta\varphi^{\mathrm{I2}}_k$ of the main injection. The signal scheme in Figure 5 visualises the integration of the correction approach (16) into the standard fuel injection control concept. This control section originally determines the steady-state parameters $u^{\mathrm{I,st}}_k$ via the base maps $f^{\mathrm{st}}_\alpha(m^{\mathrm{I\Sigma}}_k, n_\mathrm{E})$, $\alpha \in \{m^{\mathrm{I1,st}}_k, \varphi^{\mathrm{I12,st}}_k, \varphi^{\mathrm{I2,st}}_k\}$, depending on the total fuel mass $m^{\mathrm{I\Sigma}}_k$ and the engine speed $n_\mathrm{E}$. The offsets $\Delta u^{\mathrm{I,dy}}_k$ are additionally calculated by the section "Dynamic Correction" based on a numerical optimisation approach that utilises the physics-/data-based cylinder chamber model (1)-(13). Further, the optimisation considers the following quantities:
- certain NOx and soot limits $E^{\mathrm{lim}}_{\mathrm{NO_x}}$ and $E^{\mathrm{lim}}_\mathrm{S}$;
- the engine speed $n_\mathrm{E}$;
- the air system coupling variables $v_\mathrm{O}$;
- the fuel pressure $p_\mathrm{F}$;
- the desired IMEP $P^{\mathrm{D,O}}_k$.

The inputs $v_\mathrm{O}$ and $p_\mathrm{F}$ thereby introduce the current state of the air system and fuel pressure dynamics into the offset calculation.
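The correction law described above adds $V$ times the offsets to the steady-state parameters and can be sketched as a matrix-vector product. The ordering of the vector elements below is an assumption for illustration:
- parameter vector: total fuel mass, pilot fuel mass, pilot start and main start;
- correction vector: the same, except that the distance offset $\Delta\varphi^{\mathrm{I12}}_k$ takes the place of the pilot start.

With this ordering, the third row of $V$ adds $\Delta\varphi^{\mathrm{I12}}_k$ and $\Delta\varphi^{\mathrm{I2}}_k$.

```python
import numpy as np

# Assumed element ordering (illustrative):
#   correction  du_dy = [dm_total, dm_pilot, dphi_12, dphi_2]
#   parameters  u     = [m_total,  m_pilot,  phi_1,   phi_2]
V = np.array([
    [1.0, 0.0, 0.0, 0.0],  # total fuel mass offset
    [0.0, 1.0, 0.0, 0.0],  # pilot fuel mass offset
    [0.0, 0.0, 1.0, 1.0],  # pilot start shift = distance offset + main start shift
    [0.0, 0.0, 0.0, 1.0],  # main start shift
])


def desired_injection(u_st, du_dy):
    """Desired parameters u_D = u_st + V @ du_dy for one engine cycle."""
    return np.asarray(u_st, dtype=float) + V @ np.asarray(du_dy, dtype=float)


# Illustrative values: mg for fuel masses, degCA for injection timings.
u_d = desired_injection([20.0, 2.0, -10.0, 2.0], [1.0, 0.5, 1.0, -2.0])
```

In this example, the pilot start moves by 1.0 + (-2.0) = -1.0 °CA, so the pilot keeps its commanded distance to the shifted main injection.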
The actual corrections $\Delta u^{\mathrm{I,dy}}_k$ (14) are derived by solving the optimisation problem (17). It consists of the objective function (17a) and the constraints (17b)-(17g); in particular, (17b) evaluates phases 1 and 2 of (2) to calculate $p(t)$. In detail, the objective function (17a) has the structure (18). It thus weights, via $w_\alpha$, $\alpha \in \{\mathrm{NO_x}, \sigma_{\mathrm{NO_x}}, \mathrm{S}, \sigma_\mathrm{S}, \mathrm{F}\}$, between the fuel consumption, the NOx and soot emissions, and their uncertainties (standard deviations). To prevent the optimisation from seeking non-physical results, e.g. zero emissions, the emission quantities are further limited smoothly by means of $E^{\mathrm{lim}}_{\mathrm{NO_x}}$ and $E^{\mathrm{lim}}_\mathrm{S}$. Due to the max function, emissions are only minimised if they exceed their limit, see Figure 6.
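The following sketch shows one plausible structure of the objective (18) that is consistent with the description above. It is a weighted sum of three parts:
- the fuel mass;
- the emission excess over the limits $E^{\mathrm{lim}}_{\mathrm{NO_x}}$ and $E^{\mathrm{lim}}_\mathrm{S}$;
- the predictive standard deviations.

Several elements are assumptions: the softplus used as a smooth replacement for $\max(\cdot, 0)$, the sharpness parameter `beta`, the linear treatment of the uncertainty terms, and the dictionary keys for the weights $w_\alpha$.

```python
import numpy as np


def smooth_max0(z, beta=50.0):
    """Softplus approximation of max(z, 0); larger beta approaches the hard max."""
    return np.logaddexp(0.0, beta * np.asarray(z, dtype=float)) / beta


def combustion_objective(m_fuel, e_nox, sigma_nox, e_soot, sigma_soot,
                         lim_nox, lim_soot, w):
    """Weighted sum of fuel mass, emission excess over the limits, and model uncertainty.

    The additive structure and the softplus are assumptions for illustration; all
    quantities are assumed to be normalised to comparable magnitudes.
    """
    return (w["F"] * m_fuel
            + w["NOx"] * smooth_max0(e_nox - lim_nox)
            + w["sigma_NOx"] * sigma_nox
            + w["S"] * smooth_max0(e_soot - lim_soot)
            + w["sigma_S"] * sigma_soot)
```

Below their limits, the emission terms contribute practically nothing. The optimiser then only trades fuel consumption against model uncertainty, which reflects the behaviour depicted in Figure 6.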
The expressions in (17b)-(17e) comprise the hybrid cylinder chamber model (1)-(13). In detail, (17b) represents the physics-based part, i.e. phases 1 and 2 in (2), and describes the pressure trajectory $p(t)$ in the time interval from exhaust valve opening to the start of the first injection. The subsequent constraints represent the combustion phase approximation in (2) and predict the IMEP $P^{\mathrm{P3}}$, the NOx and soot emissions $E^{\mathrm{P3}}_{\mathrm{NO_x}}$ and $E^{\mathrm{P3}}_\mathrm{S}$, and the pressure $p(t^{\mathrm{EO}}_{k+1})$ according to (5) and (12). The identifier O denotes the utilisation of the data-based models in the optimisation. Their input vector is derived from (6) and comprises the cylinder state $x(t^{\mathrm{I1,O}}_k)$ calculated by (17b) as well as the optimised fuel injection parameters determined according to (16). Similar to the dependency relation (13), the data-based models $O^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_O)$ define individual mappings from the input vector $\gamma^{\mathrm{P3}}_O \in \mathbb{R}^9$ (19) to the scalar outputs, as summarised in a corresponding vector-valued dependency relation. Additionally, the equality constraint (17f) requires the actual IMEP $P^\mathrm{O}_k$ to match the desired IMEP $P^{\mathrm{D,O}}_k$. The actual IMEP $P^\mathrm{O}_k$ is determined in (17e) for the current corrections $\Delta u^{\mathrm{I,dy}}_k$ according to the combined physics-/data-based calculation approach (9). The reference value $P^{\mathrm{D,O}}_k$ is also estimated by means of (9). For this purpose, however, the fuel injection parameters are left uncorrected, i.e. the steady-state parameters $u^{\mathrm{I,st}}_k$ are utilised without any offset $\Delta u^{\mathrm{I,dy}}_k$. The corresponding model input thus only considers the steady-state fuel injection properties, which is also indicated by the identifier $\phi$.
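The following sketch illustrates how a problem of the form (17) can be solved numerically. It uses SciPy's SLSQP method with an IMEP equality constraint and box bounds on the corrections. The linear and quadratic surrogates, all numerical values and the limits are hypothetical stand-ins for the Gaussian process models and their training range. Only the bound on the fuel offset mirrors the 5 mg to 35 mg range of the total fuel mass. The paper does not state which solver is used, so SLSQP is merely one possible choice.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy surrogates standing in for the Gaussian process models O_P3.
# Decision vector du = [dm_total (mg), dm_pilot (mg), dphi_12 (degCA), dphi_2 (degCA)].
M_ST = 20.0                     # steady-state total fuel mass in mg (illustrative)
LIM_NOX, LIM_SOOT = 1.0, 0.6    # normalised emission limits (illustrative)
W_F, W_NOX, W_SOOT = 0.01, 1.0, 1.0


def imep(du):
    return 0.05 * (M_ST + du[0]) * (1.0 - 0.002 * du[3] ** 2)


def nox(du):
    return 1.2 - 0.05 * du[3] + 0.02 * du[1]


def soot(du):
    return 0.5 + 0.03 * du[3] - 0.05 * du[1]


def smooth_max0(z, beta=50.0):
    return float(np.logaddexp(0.0, beta * z) / beta)


def objective(du):
    return (W_F * (M_ST + du[0])
            + W_NOX * smooth_max0(nox(du) - LIM_NOX)
            + W_SOOT * smooth_max0(soot(du) - LIM_SOOT))


# Reference IMEP from the uncorrected steady-state parameters (du = 0).
P_D = imep(np.zeros(4))

# Box bounds; the fuel offset keeps the total fuel mass between 5 mg and 35 mg.
bounds = [(5.0 - M_ST, 35.0 - M_ST), (-1.0, 1.5), (-2.0, 6.0), (-6.0, 6.0)]

result = minimize(objective, x0=np.zeros(4), method="SLSQP", bounds=bounds,
                  constraints=[{"type": "eq", "fun": lambda du: imep(du) - P_D}])
du_opt = result.x
```

The reference IMEP is evaluated with zero corrections, mirroring how $P^{\mathrm{D,O}}_k$ is computed from the uncorrected steady-state parameters. Starting from `du = 0` is a feasible warm start because it satisfies the equality constraint by construction.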
Finally, the inequality constraints (17g) limit the value range of the fuel injection corrections $\Delta u^{\mathrm{I,dy}}_k$. The upper and lower bounds ensure that the data-based models (21) are utilised only within the training data range discussed in Section 3.3. The fuel mass offset $\Delta m^{\mathrm{I\Sigma}}_k$ is restricted indirectly such that the resulting total fuel mass $m^{\mathrm{I\Sigma,O}}_k$ remains within the lower and upper bounds (5 mg and 35 mg) depicted in Figure 9. The limits of the other optimisation variables directly originate from their respective value ranges in the training data.

Generation of data-based models for the combustion optimisation
The combustion optimisation (17) for transient engine operation utilises the physics-/data-based, i.e. hybrid, cylinder chamber description (1)-(13). The parameters of the physics-based part are typically well known, since they mainly result from geometrical properties. In contrast, the data-based models $O^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_O)$, $\alpha \in \{p, P, \mathrm{NO_x}, \mathrm{S}\}$, require tailored reference data of their input and output signals to fit the generic Gaussian processes (training) and to evaluate their accuracy (test). Since steady-state test bench measurements are employed to gather this data, their design needs to ensure that the static models are also valid during engine transients, as discussed in [18,19,37]. The nested location of the data-based models within the hybrid structure further complicates their generation. The remainder of this section addresses these issues:
- Section 3.1 describes a test bench setup that enables the required measurements.
- Section 3.2 defines the variables to be varied in the measurement campaign.
- Section 3.3 proposes a concept to shape their variation range.
- Section 3.4 introduces a calibration procedure that processes the steady-state reference data into training and test data, so that the data-based models can be generated properly.

Description of the test-bench setup
The test bench setup utilised for gathering the measurement data comprises all components from the system overview in Figure 1, i.e. an engine control unit, the air and fuel injection system, as well as the core engine with an attached electric brake to control its speed. Additional test bench equipment supervises the overall system, controls the measurement procedure, and manages the sensor signal processing. The utilised sensors are visualised in Figure 7. They measure the thermodynamic state in the intake and exhaust manifold, i.e. the pressure, temperature, and oxygen fraction, which also correspond to the air system coupling variables $v_\mathrm{O}(t)$ required by the optimisation, see Figure 5. The soot and NOx emissions as well as the cylinder pressure trace are also measured by this setup.

Determination of the variation variables
Steady-state test bench measurements are executed to gather reference data for the training and test of the data-based models $O^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_O)$. To ensure the suitability of the data sets, the measurement design would ideally vary the elements of $\gamma^{\mathrm{P3}}_O$ in a range that fits their application in the optimisation. However, due to the limited controllability of certain elements of $\gamma^{\mathrm{P3}}_O$, the variables $\gamma_{\mathrm{Vari}}$ (26) are varied during the test bench measurements instead. The engine speed $n_\mathrm{E}$, the fuel injection parameters $u^\mathrm{I}_k$, and the fuel pressure $p_\mathrm{F}$ are inherited from $\gamma^{\mathrm{P3}}_O$, since they can be set directly by the test bench electric brake or the engine control unit. In contrast, no actuators are available to control the cylinder pressure $p(t^{\mathrm{I1}}_k)$, gas mass $m(t^{\mathrm{I1}}_k)$, and oxygen fraction $X_{\mathrm{O_2}}(t^{\mathrm{I1}}_k)$. However, the air system actuators allow these parameters to be set indirectly, e.g. via the intake manifold conditions. Accordingly, Figure 8 visualises a mapping that shows the main dependencies between the air system actuators and the cylinder filling properties. The EGR valve is thus associated with $X_{\mathrm{O_2}}(t^{\mathrm{I1}}_k)$, since it affects the intake manifold oxygen fraction $X^{\mathrm{IM}}_{\mathrm{O_2}}$. Similarly, the waste gate valve allows $p(t^{\mathrm{I1}}_k)$ to be varied by means of the intake pressure $p_{\mathrm{IM}}$. Since the actual values of $X^{\mathrm{IM}}_{\mathrm{O_2}}$ and $p_{\mathrm{IM}}$ are measured by the test bench, see Figure 7, corresponding control loops are established to vary both individually. However, no additional air system actuator is available to set the intake temperature $T_{\mathrm{IM}}$, and consequently the gas mass $m(t^{\mathrm{I1,O}}_k)$, independently of the oxygen fraction $X^{\mathrm{IM}}_{\mathrm{O_2}}$ and the pressure $p_{\mathrm{IM}}$. As a result, both induce a certain cylinder gas mass $m(t^{\mathrm{I1,O}}_k)$, which is suboptimal from the measurement design perspective but unavoidable due to the lack of actuators. The previous analysis only considers the major dependencies and neglects, e.g., the impact of the exhaust manifold state. Due to the limited number of actuators, however, such secondary influences could not be controlled anyway.

Figure 7. Overview of the test bench structure focussing on the sensors that are located in the intake and exhaust manifold.

Design of the parameter variation
The analysis in Section 3.2 defines the signals of $\gamma_{\mathrm{Vari}}$ (26) to be varied during the steady-state test bench measurements to gather reference data for the training and test of the data-based models $O^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_O)$, $\alpha \in \{p, P, \mathrm{NO_x}, \mathrm{S}\}$. This section focuses on the derivation of the data points that are actually measured in the measurement campaign. To derive the list of samples to be measured, the multidimensional variation space defined by $\gamma_{\mathrm{Vari}}$ must be shaped and filled with data points. It needs to be limited to regions aligned with the desired area of application, whereas technically inapplicable regions must be dismissed. Since the data-based models are utilised in the fuel injection-based combustion optimisation (17), the fuel injection parameters $u^\mathrm{I}_k$ in $\gamma_{\mathrm{Vari}}$ (26) are varied around their base calibration values. Further, the models are intended to be used during transient engine operation. Thus, the variation space also needs to include steady-state operating points that represent the conditions during dynamic engine operation, e.g. in the course of a delayed rise of the intake manifold and fuel pressure after a load step. However, this also pinpoints a limitation of the steady-state measurement concept: operating points that are only reachable during transient engine operation cannot be covered.

Figure 8. Visualisation of the main relations between the air system actuators and the cylinder state.
The diagrams in Figure 9 visualise the variation range of the single elements of $\gamma_{\mathrm{Vari}}$ (26). The black dots represent the desired data points of the measurement procedure. The engine speed and fuel mass in D1 are varied between upper and lower bounds with a slightly modified full factorial design [38], in which adjacent data points are shifted to improve the space coverage. Further, each engine speed/fuel mass tuple represents a base point at which the remaining variables of $\gamma_{\mathrm{Vari}}$ are varied with a space filling design [38]. Diagrams D2-D4 of Figure 9 show the variation range of the intake pressure $p_{\mathrm{IM}}$ (D2) and the oxygen fraction $X^{\mathrm{IM}}_{\mathrm{O_2}}$ (D3) as well as of the fuel pressure $p_\mathrm{F}$ (D4). Since all of them exhibit a delayed or overshooting response during engine transients, the purely steady-state measurement procedure also has to cover these regions of operation. Therefore, base maps (blue) describe the standard value of these parameters depending on the engine speed $n_\mathrm{E}$ and fuel mass $m^{\mathrm{I\Sigma}}_k$. To cover positive and negative deviations during transient engine operation, the green and red maps define variation ranges above and below the base values. For the intake pressure $p_{\mathrm{IM}}$ and the fuel pressure $p_\mathrm{F}$, the base maps are shifted upwards and downwards by a certain offset. However, the positive offset of $p_{\mathrm{IM}}$ is smaller than the negative one for engine safety reasons. During engine transients, the intake oxygen fraction $X^{\mathrm{IM}}_{\mathrm{O_2}}$ may overshoot or undershoot the base value. Thus, its upper limit in D3 (red) equals the highest possible value, i.e. the oxygen fraction of fresh air. The lower limit (green) is derived from prior measurements in which the EGR valve is opened step by step to determine the minimal oxygen fraction that ensures safe engine operation.
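The modified full factorial design in D1 can be sketched as a grid in which every second engine speed level is shifted by half a fuel mass step. This is one assumed interpretation of "adjacent data points are shifted". The speed levels and the grid resolution below are illustrative, while the fuel mass bounds of 5 mg and 35 mg follow Figure 9.

```python
import numpy as np


def staggered_full_factorial(speed_levels, fuel_min, fuel_max, n_fuel):
    """Full factorial grid over (engine speed, total fuel mass) in which every second
    speed level is shifted by half a fuel mass step (assumed form of the modification)."""
    step = (fuel_max - fuel_min) / (n_fuel - 1)
    points = []
    for i, speed in enumerate(speed_levels):
        offset = 0.5 * step if i % 2 else 0.0
        for j in range(n_fuel):
            fuel = fuel_min + j * step + offset
            if fuel <= fuel_max + 1e-9:  # drop shifted points beyond the upper bound
                points.append((speed, fuel))
    return np.array(points)


# Illustrative speed levels in rpm; fuel masses in mg.
grid = staggered_full_factorial([1000.0, 1500.0, 2000.0], 5.0, 35.0, 4)
```

Each row of `grid` then serves as a base point for the space filling variation of the remaining variables.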
Diagrams D5-D7 of Figure 9 depict the variation design for the fuel injection parameters, namely the main injection start $\varphi^{\mathrm{I2}}_k$ in D5, the distance $\varphi^{\mathrm{I12}}_k$ between the pilot and main injection in D6, and the pilot injection mass $m^{\mathrm{I1}}_k$ in D7. In order to enable the optimisation of the fuel injection parameters, the green and red limit maps define a certain variation range around the base values (blue), which correspond to the steady-state fuel injection parameters $u^{\mathrm{I,st}}_k$. According to Figure 5, the base maps are described by $f^{\mathrm{st}}_\alpha(m^{\mathrm{I\Sigma}}_k, n_\mathrm{E})$, $\alpha \in \{m^{\mathrm{I1,st}}_k, \varphi^{\mathrm{I12,st}}_k, \varphi^{\mathrm{I2,st}}_k\}$, depending on the fuel mass $m^{\mathrm{I\Sigma}}_k$ and the engine speed $n_\mathrm{E}$. For the main injection start, the map is shifted symmetrically by ±6 °CA (D5). The pilot injection start is varied between -2 °CA and 6 °CA relative to the main injection (D6). The negative limit is smaller in magnitude than the positive one to avoid interactions between the pilot and main injection. Finally, the pilot injection mass is varied between 1 mg and 3.5 mg (D7), whereas its variation range decreases for fuel masses $m^{\mathrm{I\Sigma}}_k$ below 10 mg. The described experimental design thus combines full factorial and space filling parts whose variables are varied between non-box-shaped boundaries. To determine sets of data points that satisfy these requirements, the toolbox ExpeDes of the software ASCMO [35] is utilised, in which the space filling design is determined by a Sobol sequence. To increase the robustness against drifts and measurement failures, the test plan contains six sections, each of which independently describes a complete experiment with 275 data points. This number of data points is a trade-off between test bench allocation, space coverage, and total model complexity, i.e. the computational effort of the Gaussian processes.
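The space filling part around each base point can be illustrated with SciPy's Sobol generator. Its samples are scaled to bounds defined relative to base maps, which yields non-box-shaped boundaries in absolute coordinates. Several elements are hypothetical: the base maps `phi2_base` and `phi12_base`, the linear narrowing of the pilot mass range below 10 mg, and the interpretation of the -2 °CA to 6 °CA range as an offset to the base distance. The sketch does not reproduce the ASCMO ExpeDes toolbox.

```python
import numpy as np
from scipy.stats import qmc


def phi2_base(speed, fuel):
    """Hypothetical base map for the main injection start in degCA."""
    return -2.0 - 0.002 * (speed - 1000.0) - 0.1 * (fuel - 20.0)


def phi12_base(speed, fuel):
    """Hypothetical base map for the pilot-to-main distance in degCA."""
    return 12.0 + 0.001 * (speed - 1000.0)


def pilot_mass_range(fuel):
    """Pilot mass range 1 mg to 3.5 mg, narrowed below 10 mg total fuel
    (the linear narrowing is an assumption)."""
    upper = 3.5 if fuel >= 10.0 else 2.0 + 1.5 * (fuel - 5.0) / 5.0
    return 1.0, upper


def injection_variations(speed, fuel, n_points=8):
    """Space filling injection variations around one (speed, fuel) base point."""
    unit = qmc.Sobol(d=3, scramble=False).random(n_points)  # n_points: power of two
    m1_lo, m1_hi = pilot_mass_range(fuel)
    lower = np.array([phi2_base(speed, fuel) - 6.0, phi12_base(speed, fuel) - 2.0, m1_lo])
    upper = np.array([phi2_base(speed, fuel) + 6.0, phi12_base(speed, fuel) + 6.0, m1_hi])
    # Box in offsets relative to the base maps, non-box in absolute values.
    return lower + unit * (upper - lower)
```

The number of points is kept at a power of two, the sample size for which Sobol sequences retain their balance properties.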

Calibration of the data-based models
The test bench measurements designed in Section 3.3 are intended to generate appropriate reference data to train and test the data-based models $O^{\mathrm{P3}}_\alpha(\gamma^{\mathrm{P3}}_O)$, $\alpha \in \{p, P, \mathrm{NO_x}, \mathrm{S}\}$, of the combustion optimisation (17). As they are part of the physics-/data-based cylinder chamber model (1)-(13), they are evaluated in closed loop with the state space equations of the gas exchange and compression phases. Since this complicates the generation of the data-based models, this section proposes a dedicated calibration strategy.
The calibration procedure of the data-based models assumes that measurement data has been generated in accordance with Section 3.3. The top section of the overview sketch in Figure 10 depicts this prerequisite. Based on this data, the models $O^{\mathrm{P3}}_\alpha(\cdot)$ are generated in a two-step procedure, as visualised in the lower section of Figure 10.
Step one generates the approximation model $O^{\mathrm{P3}}_p(\cdot)$ that determines the cylinder pressure $p(t^{\mathrm{EO}}_{k+1})$ at the end of the combustion phase. With this model, the state space description (2) can be simulated stand-alone.
Step two generates the remaining data-based models $O^{\mathrm{P3}}_\beta(\gamma^{\mathrm{P3}}_O)$, $\beta \in \{P, \mathrm{NO_x}, \mathrm{S}\}$, while the previously built instance of $O^{\mathrm{P3}}_p(\cdot)$ is utilised to simulate the cylinder chamber state $x$.
The generation of the data-based models requires training and test data sets. Each of them consists of single samples that comprise signals of the inputs $\gamma^{P3}_{O}$ and the respective designated outputs. The signal flow depicted in Figure 10 visualises the assembly of these data samples at calibration steps one and two based on the reference data from test bench measurements. At both steps, the fuel injection parameters $u^{I}_{k}$ Ⓐ, the fuel pressure $p_F$ Ⓑ and the engine speed $n_E$ Ⓒ of the input vector $\gamma^{P3}_{O}$ are inherited from the reference data. In contrast, the derivation of the cylinder state $x(t^{I1}_{k})$ Ⓖ requires the evaluation of the cylinder chamber model, since the respective signals are not provided by the measurement data. To execute the respective simulations, the measured air system coupling variables $v^{O}(t)$ Ⓓ are required. Further, at step one, the cylinder pressure approximation $O^{P3}_{p}(\cdot)$ within phase 3 does not exist yet (phase 3 is crossed out at step one in Figure 10). Thus, the cylinder pressure $p(t^{EO}_{k+1})$ at the end of phase 3 is set to the measured signal Ⓔ. Consequently, the same signal is also assigned to the output element, since $O^{P3}_{p}(\cdot)$ is trained at step one. During the second calibration step, the previously created model $O^{P3}_{p}(\cdot)$ allows the cylinder chamber model to be simulated in a stand-alone fashion. As a result, the simulated cylinder state also contains the unavoidable modelling error that is introduced by the cylinder pressure approximation $O^{P3}_{p}(\cdot)$. This ensures better data consistency than a single-step calibration of all data-based models. At step two, the input vector signals are assembled in the same way as at step one. However, the output values for NO$_x$, soot, and the IMEP of the combustion phase are inherited from the reference data via Ⓕ.

Figure 10. Assembly of input-output samples of the training and test data sets at calibration steps one and two for the generation of the data-based models $O^{P3}_{\alpha}(\gamma^{P3}_{O}),\ \alpha \in \{p, P, \mathrm{NO_x}, S\}$ of the optimisation problem (17).
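The calibration order can be illustrated with a minimal sketch. It assumes a plain least-squares fit as a stand-in for the Gaussian processes and uses made-up reference data; `fit_linear`, `pressure_model` and `nox_model` are hypothetical names. The point it shows is that step two builds its input samples from the *simulated* pressure, so the output models are trained on inputs that carry the same modelling error they will see online.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares fit used here as a cheap stand-in for a Gaussian process."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

# Hypothetical reference data: two injection/operating inputs per sample,
# a measured end-of-phase pressure and a measured emission output
n = 200
u = rng.uniform(0.0, 1.0, size=(n, 2))
p_meas = 2.0 + 1.5 * u[:, 0] - 0.5 * u[:, 1] + 0.05 * rng.standard_normal(n)
nox_meas = 1.0 + 0.8 * p_meas + 0.05 * rng.standard_normal(n)

# Step one: the pressure surrogate is trained on the measured pressure signal
pressure_model = fit_linear(u, p_meas)

# Step two: the state entering the output models is simulated with the
# surrogate instead of the measurement, so its modelling error is in the data
p_sim = pressure_model(u)
X_step2 = np.column_stack([u, p_sim])
nox_model = fit_linear(X_step2, nox_meas)  # targets still come from the measurement
```

A single-step calibration would instead use `p_meas` in `X_step2`, which is exactly the inconsistency the two-step order avoids.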
The measurement procedure designed in Section 3.3 comprises six sections that independently sample the full range of operation. Three of them are assigned to the training data and three to the test data. The sections with the highest share of successful runs are utilised for training. The symmetric split also accounts for the computational effort of the Gaussian process regression, which scales with the size of the training data. Overall, the training and test data sets comprise 753 and 524 data points, respectively.
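The selection rule can be sketched as follows. The per-section run counts are invented for illustration and `split_by_success` is a hypothetical helper. Keeping the training set small matters because exact Gaussian process training involves a Cholesky factorisation whose cost grows cubically with the number of training points.

```python
# Hypothetical bookkeeping per measurement section: (planned runs, successful runs)
sections = {
    "S1": (300, 270), "S2": (300, 210), "S3": (300, 285),
    "S4": (300, 240), "S5": (300, 195), "S6": (300, 255),
}

def split_by_success(sections, n_train=3):
    """Assign the sections with the highest share of successful runs to training."""
    share = {name: ok / planned for name, (planned, ok) in sections.items()}
    ranked = sorted(share, key=share.get, reverse=True)
    return ranked[:n_train], ranked[n_train:]

train, test = split_by_success(sections)
print(train, test)  # ['S3', 'S1', 'S6'] ['S4', 'S2', 'S5']
```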

Improvements of the computational effort of the combustion optimisation
The fuel injection control concept introduced in Section 2.2 calculates the correction values $\Delta u^{I,dy}_{k}$ for the fuel injection parameters to optimise the combustion during transient engine operation. However, the proposed approach requires the optimisation problem (17) to be solved online, which may be critical with respect to the available computational power or timing. Accordingly, this section proposes measures to reduce the computational effort of the correction value calculation. The proposed adaptations of the optimisation scheme take advantage of the flexibility of the data-based models, e.g., their ability to learn further relations in addition to those of their original training data. In detail, Section 4.1 describes a simplifying restructuring of the original optimisation problem (17) that maps certain calculations into the data of the data-based models to remove them from the optimisation scheme. Section 4.2 extends this concept and also projects the IMEP equality constraint (17f) into the training data of the data-based models. Finally, Section 4.3 proposes an alternative approach that solves the optimisation problem offline and stores the results in dedicated correction maps.

Restructuring of the original optimisation problem
The solution of the optimisation problem (17) requires parts of the physics-based model as well as the base maps $f^{st}_{\nu}(m^{I\Sigma}_{k}, n_E),\ \nu \in \{m^{I1,st}_{k}, \varphi^{I12,st}_{k}, \varphi^{I2,st}_{k}\}$ to be evaluated, see Figure 5. To eliminate this overhead, Sections 4.1.1 and 4.1.2 transform the data-based models of the optimisation scheme (17) such that parts of the physics-based model as well as the base maps $f^{st}_{\nu}(m^{I\Sigma}_{k}, n_E)$ are mapped into their training data. Section 4.1.3 updates the optimisation problem accordingly. Finally, Section 4.1.4 discusses the accuracy of the updated data-based models.

Substitution of the physics-based calculations of the compression phase
In the optimisation problem (17) […], since both result from the actual pressure trace $p(t)$ determined by the physics-based model (17b). To substitute these calculations, approximative data-based mappings are introduced to directly obtain $p(t^{I1,O}_{k})$ and $P^{P2,O}_{k}$. Their input vector comprises signals that are aligned with the pressure rise during the compression phase, i.e. the initial cylinder state $x(t^{IC}_{k})$, the start $\varphi^{I1}_{k}$ of the pilot injection, the engine speed $n_E$, and the overall fuel mass $m^{I\Sigma}_{k}$. The identifier Z generally denotes that the physics-based calculations of the compression phase (phase 2) are replaced by the mapping (27). Thus, the cylinder chamber state $x(t^{I1,O}_{k})$ can be approximated by […]. The substitution (29) […], which no longer relies on the compression phase model to calculate $p(t^{I1,O}_{k})$. This change also requires the aligned data-based mappings (21) to be updated to […]. Since the cylinder pressure $p(t^{I1}_{k})$ is replaced by $p(t^{IC}_{k})$ in $\gamma^{P3}_{Z}$, the calculations of the physics-based model of phase 2 in (17b) are projected into the data-based models (31).
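The idea of moving a physics-based calculation into the data of a surrogate can be sketched as follows. This assumes, purely for illustration, a polytropic compression from intake valve closing to the pilot injection start and a simplified volume curve without connecting-rod term; these are not the paper's phase 2 model. A fixed-degree polynomial in log space stands in for the Gaussian process, and all constants and names are hypothetical.

```python
import numpy as np

KAPPA = 1.35           # assumed polytropic exponent
V_C, V_D = 0.1, 1.5    # assumed clearance and displacement volume (arbitrary units)

def volume(phi_deg):
    """Simplified cylinder volume over crank angle (0 deg = TDC)."""
    return V_C + 0.5 * V_D * (1.0 - np.cos(np.radians(phi_deg)))

def compression_physics(p_ic, phi_i1, phi_ic=-180.0):
    """Reference 'physics' calculation: polytropic compression from IVC to phi_i1."""
    return p_ic * (volume(phi_ic) / volume(phi_i1)) ** KAPPA

# Offline: sample the physics model over the expected input range
rng = np.random.default_rng(1)
p_ic = rng.uniform(1.0, 3.0, 500)     # bar
phi = rng.uniform(-40.0, -5.0, 500)   # deg CA before TDC
p_ref = compression_physics(p_ic, phi)

# Surrogate trained on these samples: log(p / p_ic) as a polynomial in phi
coef = np.polyfit(phi, np.log(p_ref / p_ic), deg=5)

def compression_surrogate(p_ic, phi_i1):
    return p_ic * np.exp(np.polyval(coef, phi_i1))

# Online, only the cheap surrogate is evaluated
phi_test = np.linspace(-40.0, -5.0, 50)
rel_err = np.abs(compression_surrogate(2.0, phi_test) / compression_physics(2.0, phi_test) - 1.0)
```

In the paper, the mapping additionally takes the remaining cylinder state, the engine speed and the fuel mass as inputs; the sketch keeps only two inputs to stay short.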

Substitution of the total fuel injection parameters by their correction values
The previous section introduces the data-based models $Z^{P3}_{\alpha}(\gamma^{P3}_{Z}),\ \alpha \in \{p, P, \mathrm{NO_x}, S\}$ in (31) and $Z^{P2}_{\beta}(\gamma^{P2}_{Z}),\ \beta \in \{P, p\}$ in (27), which both avoid computing the physics-based model of the compression phase in (17b) during the optimisation. However, their input vectors $\gamma^{P2}_{Z}$ (28) and $\gamma^{P3}_{Z}$ (30) still contain the total fuel injection parameters $u^{I,O}_{k}$ (20), which require the steady-state parameters $u^{I,st}_{k}$ (15) to be determined by the base maps $f^{st}_{\nu}(m^{I\Sigma}_{k}, n_E),\ \nu \in \{m^{I1,st}_{k}, \varphi^{I12,st}_{k}, \varphi^{I2,st}_{k}\}$, see Figure 5. To save this effort, these dependencies should also be mapped into the training data of the data-based models.
In the input vector $\gamma^{P2}_{Z}$ (28) of the data-based models $Z^{P2}_{\beta}(\gamma^{P2}_{Z})$, the pilot injection start $\varphi^{I1,O}_{k}$ is a function of the corrected fuel mass $m^{I\Sigma,st}_{k} + \Delta m^{I\Sigma}_{k}$, the individual shifts $\Delta\varphi^{I2}_{k}$ and $\Delta\varphi^{I12}_{k}$ of the main and pilot injection, respectively, as well as of the engine speed $n_E$, see Figure 5. These known dependencies allow the input vector $\gamma^{P2}_{Z}$ (28) to be rewritten as […]. Expanding the dependencies of $f_{\varphi^{I1,O}_{k}}(\cdot)$ further turns $\gamma^{P2}_{Z}$ into the vector […], which no longer relies on the total pilot injection start $\varphi^{I1,O}_{k}$. This is also indicated by the identifier Y. Further, the vector dimension changes from $\mathbb{R}^{6}$ to $\mathbb{R}^{7}$. The input vector $\gamma^{P3}_{Z}$ (30) of the data-based models $Z^{P3}_{\alpha}(\gamma^{P3}_{Z})$ is transformed similarly. Due to the dependencies of $u^{I,O}_{k}$ resulting from Figure 5, it is rewritten as […]. After expanding the dependencies of $f_{u^{I,O}_{k}}(\cdot)$, the input vector $\gamma^{P3}_{Z}$ turns into […], which likewise relies only on the fuel injection parameter corrections $\Delta u^{I,dy}_{k}$. Due to the input vector transformations (32) and (33), the steady-state fuel injection parameters $u^{I,st}_{k}$ and consequently their associated base maps $f^{st}_{\nu}(m^{I\Sigma}_{k}, n_E)$ no longer need to be evaluated during the optimisation. In other words, the transformation inherently integrates the base maps $f^{st}_{\nu}(\cdot)$ into the training data of the data-based models. As a result, the data-based mappings (27) and (31) are updated to […]. Since the cylinder pressure mapping $Z^{P2}_{p}(\cdot)$ in (27) only represented an interim result of Section 4.1.1, it is omitted in (34). […] are both projected into the training data of the data-based models. Accordingly, the optimisation problem (17) […]
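How a base map ends up "inside" the training data can be shown with a small sketch. It assumes a hypothetical base map for the steady-state main injection start, tabulated over fuel mass and engine speed and evaluated by bilinear interpolation; the table values, sample values and function names are invented. Each stored sample is rewritten from total parameters to corrections once, offline, so a model trained on `X_train` never needs the base map at runtime.

```python
import numpy as np

# Hypothetical base map for the steady-state main injection start (deg CA)
M_GRID = np.array([10.0, 30.0, 50.0])        # total fuel mass in mg
N_GRID = np.array([1000.0, 2000.0, 3000.0])  # engine speed in 1/min
PHI_ST = np.array([[-2.0, -4.0, -6.0],       # rows: fuel mass, columns: speed
                   [-1.0, -3.0, -5.0],
                   [ 0.0, -2.0, -4.0]])

def base_map(m, n):
    """Bilinear interpolation in the (fuel mass, speed) table."""
    i = np.clip(np.searchsorted(M_GRID, m) - 1, 0, len(M_GRID) - 2)
    j = np.clip(np.searchsorted(N_GRID, n) - 1, 0, len(N_GRID) - 2)
    tm = (m - M_GRID[i]) / (M_GRID[i + 1] - M_GRID[i])
    tn = (n - N_GRID[j]) / (N_GRID[j + 1] - N_GRID[j])
    return ((1 - tm) * (1 - tn) * PHI_ST[i, j] + tm * (1 - tn) * PHI_ST[i + 1, j]
            + (1 - tm) * tn * PHI_ST[i, j + 1] + tm * tn * PHI_ST[i + 1, j + 1])

# Stored samples contain total parameters; rewrite them as corrections offline
m = np.array([20.0, 30.0, 45.0])
n = np.array([1500.0, 2000.0, 2500.0])
phi_total = np.array([-1.5, -3.0, -6.0])
delta_phi = phi_total - base_map(m, n)        # [1.0, 0.0, -2.75]
X_train = np.column_stack([m, n, delta_phi])  # model inputs no longer need the base map
```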

Reformulation of the optimisation problem
The physics-based model part in (36b) now only describes the gas exchange (phase 1) to determine $x(t^{IC}_{k})$. Furthermore, the data-based model $Y^{P2}_{P}(\cdot)$ calculates the compression phase IMEP $P^{P2,O}_{k}$ in (36e) instead of the physics-based model utilised in (17e).

Generation and evaluation of the data-based models
The data-based models $Y^{P3}_{\alpha}(\gamma^{P3}_{Y}),\ \alpha \in \{p, P, \mathrm{NO_x}, S\}$ and $Y^{P2}_{P}$ that are introduced in Section 4.1.2 differ from the models of the original optimisation problem (17). Thus, they need to be generated and tested individually. Their calibration procedure basically follows the approach described in Section 3.4, except that the input signals of $\gamma^{P3}_{Y}$ are used instead of $\gamma^{P3}_{O}$ and the IMEP model $Y^{P2}_{P}(\cdot)$ is additionally trained during step two. The correlation plots in Figure 11 visualise the training (green) and test (red) errors of the data-based models that are utilised in closed loop in the optimisation. The results for NO$_x$, soot and the phase 3 IMEP (D1 to D3) are comparable to the results originally discussed in [20]. Further, the newly introduced IMEP model for phase 2 exhibits small training and test errors (D4).
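Correlation plots of this kind are usually summarised by scalar error measures. The paper does not state which measures underlie Figure 11, so the sketch below assumes two common ones, the root-mean-square error and the coefficient of determination, computed separately for training and test predictions. The sample values are illustrative only.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination; 1.0 means all points lie on the diagonal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (not taken from Figure 11)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))  # approximately 0.158 and 0.98
```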

Elimination of the IMEP equality constraint
Section 4.1 introduced several modifications of the original optimisation problem (17) to reduce the calculation overhead. However, the resulting optimisation scheme (36) still requires a certain computational effort to balance the IMEP equality constraint (36f) at each optimisation iteration. Accordingly, Section 4.2.1 proposes to project this constraint into the data of the data-based models such that their predictions inherently satisfy the constraint, which makes its explicit consideration superfluous. Section 4.2.2 reformulates the optimisation problem (36) accordingly. Finally, Section 4.2.3 describes the generation and evaluation of the updated data-based models.

Definition of data-based models with inherent IMEP constraint
The IMEP equality constraint (36f) of the optimisation problem (36) can be written as a balance $A = B$ of two terms (37). The term A, which originates from the desired IMEP $P^{D,O}_{k}$ (22), only depends on parameters that are constant during a given optimisation run, i.e. the cylinder state $x(t^{IC}_{k})$ at intake valve closing, the engine speed $n_E$, the fuel pressure $p_F$, and the steady-state fuel injection parameters $u^{I,st}_{k}$. In contrast, term B, which results from the actual IMEP $P^{O}_{k}$ (36e), also depends on the optimised injection parameter corrections $\Delta u^{I,dy}_{k}$. In the course of the optimisation, terms A and B of (37) are balanced continuously while the objective function (36a) is minimised. Since the objective consists purely of data-based models, the IMEP constraint may be projected into their training data such that terms A and B of (37) are balanced for all of their data points. As the predictions of such data-based models would inherently satisfy the IMEP constraint, its dedicated balancing during the optimisation becomes superfluous.
However, terms A and B of (37) are unbalanced by default for all training and test data samples created according to Section 3. Hence, a post-processing of the data points is proposed, i.e. a virtual rerun of the test bench measurements, to derive modified samples that inherently satisfy the constraint (37). This data processing approach is visualised in Figure 12. In detail, the fuel mass $m^{I\Sigma,orig}_{k}$ of each data sample (green circle) is modified by $\Delta m^{I\Sigma,adp}_{k}$ into the virtual fuel mass $m^{I\Sigma,virt}_{k}$, which causes the equality constraint (black line) to be satisfied. In order to preserve the homogeneous, space-filling structure of the data sets, the other fuel injection parameters as well as the engine speed $n_E$, fuel pressure $p_F$, and cylinder state $x(t^{IC}_{k})$ remain unchanged. Since the modified fuel mass $m^{I\Sigma,virt}_{k}$ changes the emissions, the NO$_x$ and soot values $E_{\mathrm{NO_x}}$ and $E_{S}$ of the processed data samples are updated to maintain data consistency. The models $Y^{P3}_{\mathrm{NO_x}}(\cdot)$ and $Y^{P3}_{S}(\cdot)$ from (35) are utilised for these updates. This data processing introduces a redundancy into the training and test data. For example, the adapted fuel mass $m^{I\Sigma,virt}_{k}$ is coupled to the shift $\Delta\varphi^{I2}_{k}$ of the main injection via the desired IMEP, which equals term A of (37). The redundancy allows the definition of the data-based mapping […] with the input vector […] to determine the corrected fuel mass $m^{I\Sigma,st}_{k} + \Delta m^{I\Sigma}_{k}$ that maintains the desired IMEP $P^{P23,\phi}_{k}$ under consideration of the other fuel injection parameter corrections, the engine speed $n_E$, the fuel pressure $p_F$, and the cylinder state $x(t^{IC}_{k})$. The identifier X indicates that the data of the respective data-based models inherently satisfy the IMEP constraint. The elements of the input vector $\gamma^{P3}$ […] (38). The redundancy in the data also allows the definition of the mapping […] with the input vector $\gamma^{P3}$ […] to describe the main injection start correction $\Delta\varphi^{I2}_{k}$ for a certain adapted fuel mass $(m^{I\Sigma,st}_{k} + \Delta m^{I\Sigma}_{k})$ and desired IMEP $P^{P23,\phi}_{k}$. Finally, the data-based models $Y^{P3}_{\mathrm{NO_x}}(\gamma^{P3}_{Y})$ and $Y^{P3}_{S}(\gamma^{P3}_{Y})$ of the exhaust emissions specified in (35) must be redefined for the updated data such that their predictions inherently satisfy the IMEP constraint. Their input vector $\gamma^{P3}_{Y}$ (33) is extended with the redundancy mapping (41) of the main injection start correction $\Delta\varphi^{I2}_{k}$, leading to […]. The expansion of the dependencies of $X^{P3}$ […], where $\Delta\varphi^{I2}_{k}$ is substituted with the desired IMEP $P^{P23,\phi}_{k}$. Accordingly, the data-based mappings (35) of the emission predictions are updated to […]. Due to the data preprocessing approach of Figure 12 and the adapted input vector $\gamma^{P3}_{X}$ (43), the data-based models $X^{P3}_{\mathrm{NO_x}}(\cdot)$ and $X^{P3}_{S}(\cdot)$ predict emissions that inherently align with the IMEP constraint (36f).

Figure 11. Evaluation of the data-based models $Y^{P3}_{\alpha}(\gamma^{P3}_{Y})$.

Figure 12. Concept for processing a data point to inherently satisfy the IMEP constraint equation (37).
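The virtual rerun of a single data sample can be sketched as a one-dimensional root search on the fuel mass. The sketch assumes hypothetical stand-ins `imep_model` (term B) and `nox_model` (the emission update) with made-up coefficients, and uses bisection as one simple choice of root finder; none of these are the paper's models. Only the fuel mass is changed, the other inputs of the sample are kept, and the emission value is refreshed afterwards.

```python
def imep_model(m_fuel, dphi_main):
    """Hypothetical stand-in for term B: IMEP rises with fuel mass and drops
    as the main injection is shifted away from its optimum."""
    return 0.45 * m_fuel - 0.08 * dphi_main ** 2 - 0.05 * dphi_main

def nox_model(m_fuel, dphi_main):
    """Hypothetical emission model used to refresh the NOx value of a sample."""
    return 2.0 + 0.1 * m_fuel - 0.3 * dphi_main

def virtual_fuel_mass(imep_desired, dphi_main, lo=0.0, hi=100.0, tol=1e-9):
    """Bisection on the fuel mass so that term B matches term A (desired IMEP)."""
    f = lambda m: imep_model(m, dphi_main) - imep_desired
    if f(lo) * f(hi) > 0:
        raise ValueError("constraint cannot be met within the fuel mass bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# One original sample: only the fuel mass is adapted, the other inputs are kept
sample = {"m_fuel": 30.0, "dphi_main": 2.0, "imep_desired": 12.0}
m_virt = virtual_fuel_mass(sample["imep_desired"], sample["dphi_main"])  # about 27.6
processed = dict(sample, m_fuel=m_virt, nox=nox_model(m_virt, sample["dphi_main"]))
```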

Reformulation of the optimisation problem without IMEP constraint
Section 4.2.1 introduces the data-based models $X^{P3}_{\mathrm{NO_x}}(\cdot)$ and $X^{P3}_{S}(\cdot)$, which provide emission predictions that inherently satisfy the IMEP constraint (36f). Hence, the optimisation problem (36) […], where the IMEP constraint (36f) is excluded. As the main injection correction $\Delta\varphi^{I2}_{k}$ is substituted by the desired IMEP $P^{P23,\phi}_{k}$ in the input $\gamma^{P3}_{X}$ (43) of the data-based models $X^{P3}_{\mathrm{NO_x}}(\cdot)$ and $X^{P3}_{S}(\cdot)$, it is also removed from the optimisation variables of (45). However, at the end of each optimisation run, the value of $\Delta\varphi^{I2}_{k}$ is determined by means of $X^{P3}$ […] in order to determine $m^{I\Sigma,\min}_{k}$ and $\gamma^{P3}$ […] to derive $m^{I\Sigma,\max}_{k}$. In order to consider these restrictions within the constraint (45e), the bounds of $\Delta m^{I\Sigma}_{k}$ from (24) and (25) […]. In case only the total fuel mass is optimised via $\Delta m^{I\Sigma}_{k}$, i.e. for $\Delta\varphi^{I12}_{k} = 0\,^{\circ}$CA and $\Delta m^{I1}_{k} = 0$ mg, the limits (46) and (47) are constant during an optimisation run. Otherwise, they need to be updated at each optimisation iteration.

Generation and evaluation of the data-based models
The data-based models $X^{P3}_{\alpha}(\gamma^{P3}_{X})$ […] must be generated and tested. In contrast to the previous models, their data is derived by post-processing existing data sets according to Section 4.2.1. The data of the data-based models from Section 4.1.4 is utilised for this purpose.
The correlation diagrams in Figure 13 visualise the training and test data errors of the data-based models. The errors for NO$_x$ (D1) and soot (D2) are smaller than those of the previous models $Y^{P3}_{\alpha}(\gamma^{P3}_{Y})$ in Figure 11. This results from the smooth training data, which is derived from sampling $Y^{P3}_{\mathrm{NO_x}}(\cdot)$ and $Y^{P3}_{S}(\cdot)$ according to Section 4.2.1. The redundancy models $X^{P3}$ […]

Offline learning of the optimisation results
The previous Sections 4.1 and 4.2 introduce approaches to minimise the computational effort of the original optimisation problem (17). Instead of solving it online during runtime, the following approach determines the corrections $\Delta u^{I,dy}_{k}$ offline and stores the results in respective data-based surrogate models, as depicted in the signal flow diagram in Figure 14. To establish this concept, Section 4.3.1 proposes an input-output structure for these correction models that also preserves the flexibility of the original optimisation problem, e.g., regarding variable weights. As different variants can be created, Section 4.3.2 describes those investigated in this paper. Finally, Section 4.3.3 describes the generation and evaluation of the surrogate correction models.

Definition of data-based models for the fuel injection parameter corrections
In order to derive data-based models that substitute the optimisation (17), a set of input signals $\gamma_W$ must be defined that unambiguously describes the correction values $\Delta u^{I,dy}_{k}$. These signals are selected according to the external dependencies of the original optimisation problem (17) depicted in Figure 5. In the left part of the diagram in Figure 14, these relations are restructured to derive the signals that specify the external dependencies of (17). Hence, both IMEP set points $P^{D,O}_{k}$ and $P^{D,st}_{k}$, the cylinder filling properties $x(t^{IC}_{k})$ at intake valve closing, the engine speed $n_E$, the fuel pressure $p_F$, and the weights of the objective function (18) are required. For simplicity, the weights $w^{\sigma}_{\mathrm{NO_x}}$ and $w^{\sigma}_{S}$ of the model uncertainty terms are neglected. Consequently, the data-based mappings can be defined to explicitly describe the fuel injection corrections $\Delta u^{I,dy}_{k}$. The identifier W denotes that these data-based models substitute the online optimisation.

Derivation of the data for the data-based correction models
The generation of the substituting data-based models (49) requires suitable training and test data. According to Section 3.3, such data sets must contain samples that vary the input signals $\gamma_W$ over a meaningful range and also provide the output signals, i.e. the fuel injection parameter corrections $\Delta u^{I,dy}_{k}$. In contrast to the data-based models utilised in the original optimisation problem (17), this input-output data does not originate from measurements but from sampling (17) for various boundary conditions. The sketch in Figure 15 shows the concept that is utilised to perform this sampling of the optimisation scheme. Since the data derived according to Section 3.3 already contains a strong variation of the air system conditions, engine speed, fuel pressure, and load, it is used to define a base variation for the boundary conditions of the optimisation. In addition, the weights are varied at each data point. Within this paper, the weight sampling variants V1 and V2 depicted in Figure 15 are investigated. V1 considers a single weight case in which NO$_x$ and fuel mass are equally prioritised. In contrast, V2 considers multiple combinations of different NO$_x$ and fuel mass weights. For each of the resulting samples of the input $\gamma_W$, the optimisation problem (17) is solved in order to determine the corresponding fuel injection corrections $\Delta u^{I,dy}_{k}$.
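The sampling loop can be sketched as follows. It assumes a hypothetical one-dimensional stand-in for problem (17), in which the main injection shift trades NO$_x$ against the fuel mass needed to keep the IMEP, solved by a dense grid search; `toy_optimisation`, the coefficients and the boundary conditions are invented. Each pair of boundary condition and weight yields one input-output sample for the correction map, and V2 simply contains more weight values per boundary condition than V1.

```python
import numpy as np

def toy_optimisation(nox0, w_nox, dphi_bounds=(-5.0, 5.0)):
    """Hypothetical stand-in for problem (17): pick the main injection shift that
    minimises a weighted NOx/fuel objective. The fuel term represents the fuel
    needed to keep the IMEP constant, so no explicit constraint is needed."""
    w_fuel = 1.0 - w_nox
    dphi = np.linspace(dphi_bounds[0], dphi_bounds[1], 2001)  # deg CA, grid search
    fuel = 1.0 + 0.004 * dphi ** 2 + 0.01 * dphi              # normalised fuel mass
    nox = nox0 * np.exp(-0.15 * dphi)                          # later injection, less NOx
    cost = w_nox * nox + w_fuel * fuel
    return float(dphi[np.argmin(cost)])

# Hypothetical boundary conditions, reduced here to one NOx level per operating point
boundary_conditions = [0.5, 1.0, 2.0]
weight_variants = {"V1": [0.5], "V2": [0.1, 0.3, 0.5, 0.7, 0.9]}

datasets = {}
for name, weights in weight_variants.items():
    rows = [(bc, w, toy_optimisation(bc, w)) for bc in boundary_conditions for w in weights]
    datasets[name] = np.array(rows)  # columns: boundary condition, NOx weight, target shift
```

With a high NO$_x$ weight the toy optimum runs into the upper bound of the shift, which produces the kind of flat, saturated region discussed in the following section.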

Generation and evaluation of the data-based models
As Section 4.3.2 proposes an approach to determine the training and test data for the data-based correction models (49), this section focuses on their generation and evaluation. In detail, individual models based on Gaussian process regression are created for the weight variants V1 and V2 of Figure 15. Further, correction models are only generated for the main injection shift $\Delta\varphi^{I2}_{k}$ and the fuel mass offset $\Delta m^{I\Sigma}_{k}$, since both were identified in [20] as the dominant degrees of freedom of the combustion optimisation.
The correlation plots in Figure 16 visualise the training and test data errors of the models $W_{\Delta m^{I\Sigma}_{k}}(\cdot)$ (D1)/(D3) and $W_{\Delta\varphi^{I2}_{k}}(\cdot)$ (D2)/(D4) for the weight variants V1 and V2, respectively. Overall, the error measures indicate that all models fit the data very well. However, both weight variants show an increased local error in regions where no corrections are requested, i.e. around $\Delta m^{I\Sigma}_{k} = 0$ or $\Delta\varphi^{I2}_{k} = 0$, as well as where the main injection shift $\Delta\varphi^{I2}_{k}$ is limited by its lower or upper bound (17g). Since the fuel injection corrections turn into constant values, i.e. flat surfaces, in these regions, the Gaussian process regression has difficulty describing this behaviour accurately.

Figure 15. Sampling concept of the optimisation problem to derive training and test data for the data-based models $W_{\alpha}(\gamma_W),\ \alpha \in \{\Delta m^{I\Sigma}_{k}, \Delta m^{I1}_{k}, \Delta\varphi^{I12}_{k}, \Delta\varphi^{I2}_{k}\}$. The weight variation cases V1 and V2 are investigated.
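The effect can be reproduced on a one-dimensional toy case. The sketch assumes a hypothetical correction map that is linear in its input and saturates at a bound, and fits it with a zero-mean Gaussian process with a squared-exponential kernel and fixed, hand-picked hyperparameters (no hyperparameter optimisation, unlike a full calibration tool). A smooth stationary kernel cannot represent the sharp transition into the flat region, so the largest errors appear at the kinks rather than in the linear part.

```python
import numpy as np

def rbf(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4):
    """Zero-mean GP regression with fixed, hand-picked hyperparameters."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf(x_test, x_train) @ alpha

def correction(x):
    """Hypothetical correction map that saturates at its bounds of +/-2."""
    return np.clip(2.0 * x, -2.0, 2.0)

x_train = np.linspace(-2.0, 2.0, 11)  # coarse, space-filling training inputs
x_test = np.linspace(-2.0, 2.0, 401)
err = np.abs(gp_posterior_mean(x_train, correction(x_train), x_test) - correction(x_test))

near_kink = np.abs(np.abs(x_test) - 1.0) < 0.1  # where the map turns flat
smooth = np.abs(x_test) < 0.3                   # linear region away from the bounds
print(f"max error near kink: {err[near_kink].max():.3f}, linear region: {err[smooth].max():.3f}")
```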

Simulation-based evaluation of the improvements in the computational efficiency of the combustion optimisation
Section 4 introduces several concepts to improve the computational efficiency of the fuel injection-based combustion optimisation (17). This section analyses and compares their effects on the accuracy of the fuel injection control as well as on the time required to determine the correction values $\Delta u^{I,dy}_{k}$. The comparison is realised in the simulation-based test bench environment introduced in [20]. All tested variants are implemented in Simulink via embedded function blocks. The optimisation-based approaches are solved by the Matlab function fmincon with the interior-point algorithm. The Gaussian process models are generated by the software ASCMO [35] and are executed as m-code. The run time referred to in the following analysis represents the time needed to determine the fuel injection parameter offsets, e.g. via fmincon or the data-based correction models. This measure excludes the effort of the gas exchange calculations, since they are executed equally by each approach once per cycle prior to the correction value calculation. In a standard, non-optimised Simulink environment, this model part requires approximately 0.6 s to run.
The different variants of the combustion optimisation are compared on a specific transient test cycle. Its engine speed and accelerator pedal trajectories are depicted in diagrams D7 and D8 of the simulation results overview in Figure 17. In detail, the engine speed rises as the accelerator pedal is pushed for a certain period of time. The remaining subplots of Figure 17 visualise simulation results, i.e. the emissions and the IMEP generated by cylinder 1 (D1) to (D3), the optimised fuel injection parameters (D4) and (D5), as well as timing measures for the optimisation procedure in D6 and D9.
The coloured lines in the plots of Figure 17 indicate the different evaluation cases. Their properties are summarised in the table below the plots. All variants use the same weight configuration, which equally prioritises NO$_x$ and fuel mass. In detail, the black case represents a reference in which the fuel injection parameters are not optimised. The red line shows the behaviour when the corrections $\Delta u^{I,dy}_{k}$ are calculated according to (36), where certain computational overhead is removed compared to the original formulation (17). The light and dark green cases both visualise the simulation results of the optimisation problem (45), which utilises data-based models that inherently contain the IMEP constraint. The light green case additionally shows the performance of a warm start procedure, i.e. the optimisation is initialised with the final results of the previous run. The light and dark blue cases both represent approaches where the fuel injection adaptations $\Delta u^{I,dy}_{k}$ are determined by the purely data-based correction maps (49). The dark blue line corresponds to the weight variation case V1 of Figure 15, i.e. it is generated with the data of only one weight configuration. In contrast, the light blue line corresponds to variant V2, which is trained with data from multiple weight configurations.
As the red case in Figure 17 nearly equals the original optimisation formulation (17), it shows the expected behaviour of the combustion optimisation. In the first section, which is marked by the dotted lines (...) in the subplots of Figure 17, the overall injected fuel mass is decreased by 4 % compared to the black case. To maintain the IMEP, the main injection start is shifted forward, which is admissible since the non-optimised NO$_x$ fraction is below its limit $E^{\lim}_{\mathrm{NO_x}}$ (D1). As a result, the emitted NO$_x$ mass increases by 25 %. In the second cycle section, which is marked by (--), the main injection start is mainly shifted backwards compared with the black reference curve in order to decrease the high NO$_x$ emissions. To maintain the IMEP, the fuel mass is increased accordingly. As a result, the emitted NO$_x$ mass decreases by 9 % while the fuel mass increases by 4 %. Overall, 5 % of NO$_x$ is saved at the cost of 3 % more fuel mass. Although soot is ignored by the optimisation due to its zero weight ($w_S = 0$), the soot mass is reduced by 2 %. On average, a single optimisation run requires 285 ms.
The optimisation results of the green cases, where the IMEP constraint is projected into the data-based models, match those of the red case very closely. In particular, the IMEP trajectory (D3) remains unchanged. Both green cases, however, require fewer iterations per optimisation run (D6). The warm start approach (light green) reduces the iteration count even further and consequently the overall run time required by the optimisation (D9). Compared to the red reference, the average optimisation run time decreases by 56.9 % for the dark green case and by 64.9 % for the light green variant.
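Why a warm start saves iterations can be illustrated with a generic sketch. It assumes a hypothetical convex one-dimensional objective $c\,e^{-x} + x^2/2$ whose parameter $c$ drifts slightly from cycle to cycle, solved with Newton's method rather than the interior-point solver used in the paper. Starting each run from the previous solution places the iterate close to the new optimum, so fewer Newton steps are needed than when starting from a fixed initial guess.

```python
import math

def solve_newton(grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton iteration on a smooth 1-D problem; returns solution and iteration count."""
    x = x0
    for it in range(1, max_iter + 1):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            return x, it
    return x, max_iter

def cycle_problem(c):
    """Gradient and curvature of the hypothetical per-cycle objective c*exp(-x) + x**2/2."""
    return (lambda x: -c * math.exp(-x) + x), (lambda x: c * math.exp(-x) + 1.0)

iters_cold, iters_warm, x_prev = 0, 0, 0.0
for k in range(50):
    grad, hess = cycle_problem(1.0 + 0.01 * k)          # boundary conditions drift slowly
    _, n_cold = solve_newton(grad, hess, 0.0)           # cold start: fixed initial guess
    x_prev, n_warm = solve_newton(grad, hess, x_prev)   # warm start: previous solution
    iters_cold += n_cold
    iters_warm += n_warm
print(iters_cold, iters_warm)
```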
The light and dark blue cases, which both utilise purely data-based correction maps to calculate $\Delta u^{I,dy}_{k}$, produce slightly different simulation results compared with the red and both green cases. In particular, in the middle of the first section (-) as well as at the end of the second (--), the determined fuel injection parameter corrections deviate from those of the online optimisation approaches. However, the total reduction of NO$_x$ by 4 % and the fuel mass increase of 2 % or 3 % are nearly equal to those of the cases with online optimisation. In contrast, the run time (D9) decreases significantly, i.e. the average duration of the light blue case is 86.1 % below the time required by the light green variant. The dark blue case, which comprises data-based models that are specifically tailored to the currently tested weights $w_{\mathrm{NO_x}} = w_F = 0.5$, see V1 in Figure 15, is even 79.1 % faster than the light blue variant. The data-based models of the light blue case are more complex, since they also support weight factor combinations that differ from $w_{\mathrm{NO_x}} = w_F = 0.5$, see V2 in Figure 15. This flexibility is demonstrated by the simulation results depicted in Figure 18. In contrast to Figure 17, NO$_x$ has a lower priority than the fuel mass ($w_{\mathrm{NO_x}} = 0.3$, $w_F = 0.7$). As the dark blue case is trained only for the weight combination $w_{\mathrm{NO_x}} = w_F = 0.5$, its corrections strongly deviate from those of the online optimisation (red). However, the light blue case, which is trained with the full range of the NO$_x$ and fuel mass weights, adapts its predictions accordingly and shows the same error level that was already discussed for Figure 17.

Figure 18. Test of the data-based correction models V1 and V2, see Figure 15, for a weight factor configuration that is not included in their training data.

1) Each approach also simulates the gas exchange phase once per cycle, which requires approximately 0.6 s in addition.
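The relative run time figures quoted above can be chained into approximate absolute averages. This is an illustrative calculation derived only from the percentages in the text, under the assumption that each percentage denotes a reduction relative to the named reference case; the resulting millisecond values are not reported by the source. Colours follow the case definitions given for Figure 17.

```python
# Average run time of the red reference case, as stated in the text
t_red = 285.0  # ms

# Quoted reductions, read as "X % less run time than the named reference"
t_dark_green = t_red * (1 - 0.569)          # problem (45), cold start
t_light_green = t_red * (1 - 0.649)         # problem (45), warm start
t_light_blue = t_light_green * (1 - 0.861)  # correction maps, variant V2
t_dark_blue = t_light_blue * (1 - 0.791)    # correction maps, variant V1

for name, t in [("red", t_red), ("dark green", t_dark_green), ("light green", t_light_green),
                ("light blue", t_light_blue), ("dark blue", t_dark_blue)]:
    print(f"{name:12s} {t:7.1f} ms")
```

Under this reading, the chain gives roughly 122.8 ms, 100.0 ms, 13.9 ms and 2.9 ms for the dark green, light green, light blue and dark blue cases, all excluding the approximately 0.6 s gas exchange simulation per cycle.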

Conclusion and outlook
In order to improve the transient engine operation, combustion control schemes based on a cycle-by-cycle online optimisation require accurate yet sufficiently simple optimisation models. Accordingly, this paper proposes methods to enhance their consistency and computational efficiency. The respective results are discussed based on an online optimisation scheme that contains a hybrid cylinder chamber description in which the states and outputs are calculated by coupled physics-/data-based models. To consistently calibrate the data-based parts of this hybrid set-up, the proposed two-step training procedure defines a calibration order for the data-based models involved in the state space and output calculations. In detail, the cylinder pressure surrogate model is generated prior to those which predict the emissions and torque of the combustion phase. The additionally proposed test bench measurements further support the generation of the data-based models, as the gathered data is tailored to their application in the combustion optimisation. Regarding the test bench measurements, future research could additionally vary the intake manifold temperature, besides the pressure and oxygen fraction, to increase the diversity of the data.
To improve the computational efficiency of the optimisation-based control, time-intensive calculations are moved from the online optimisation scheme into the training data of the data-based models. Different concepts are proposed and compared in a simulation-based test environment. According to the results, projecting the IMEP constraint into the data-based models strongly reduces the computational effort without any loss of accuracy. A warm start strategy further increases the speed of this approach. Concepts that determine the fuel injection parameter adaptations from purely data-based correction maps are even faster, since they are trained offline with given optimisation results and are thus free of any optimisation overhead during runtime. However, this speed-up comes at the expense of accuracy and flexibility, since changes in the emission limit maps or untrained weight values cannot be considered. As these results are determined in a simulation-based test environment, the approaches still need to be implemented, tested, and verified in a real-world set-up, i.e. in a rapid control prototyping system or in an engine control unit. Consequently, the computational efficiency of the gas exchange calculations will also need to be improved. Although the different host environment will affect the total run time of the discussed approaches, the identified trends are expected to persist.

Disclosure statement
No potential conflict of interest was reported by the author(s).