Data-driven surrogate modeling of aerodynamic forces on the superstructure of container vessels

ABSTRACT The operation of fluid engineering systems is usually governed by a wide range of different parameters. Investigations of the entire parameter spectrum using classical, first-principle based CFD methods are costly with regard to CPU and wall-clock time. Therefore, a near real-time assessment of complex flows using CFD to support the operation is deemed infeasible. The paper is concerned with methods for data-based surrogate models to predict the forces exerted by the aerodynamic pressure field on the superstructure of a full-scale container ship for different container loading conditions and wind directions. The strategy aims to assist a fuel-efficient operation and is based on a two-step approach. During an initial step, a reduced representation of pressure fields obtained from 3D Navier–Stokes simulations is compiled. To this end, a classical proper orthogonal decomposition is compared with convolutional neural network autoencoders. A subsequent parameterization employs a feedforward neural network to link the reduced model with the operational parameters, i.e. the angle of attack and container loading condition, and to enable a rapid on-board assessment. Both methods provide a similar agreement for the pressure fields, as well as the resulting forces, with the CNN-based surrogate model being significantly more compact.


Introduction
The environmental footprint and approximately half of the direct operating costs of merchant ships are governed by their fuel consumption, which in turn is dominated by the fluid dynamic resistance. Although wind forces have a noticeable effect on the drag and influence the manoeuvring or dynamic positioning of ships, the optimization of ship operations mostly focuses on the analysis of hydrodynamic forces. For a secure and fuel-efficient operation, it is, however, generally desirable to monitor all acting forces on board and to feed the status back to an operation management tool in order to assist operational decisions.
To predict the loads in near real-time on board a ship, data-based surrogate models are a viable approach. The goal of the present research is the development and assessment of such strategies for the prediction of aerodynamic ship loads. Standard Reynolds-averaged Navier-Stokes (RANS) methods or hybrid large eddy simulation (LES)/RANS methods can provide high-quality data to compile a surrogate model. However, the offline computational effort to generate the data basis is considerable in conjunction with RANS or LES, and the challenge is, therefore, to develop reliable surrogate models with a small number of considered RANS/LES cases. In the present study, data obtained from hybrid RANS/LES simulations is compiled for a larger number of operating conditions. Subsequently, the data of the full-order CFD model is structured and fed into an order reduction procedure. During the second step, a regression is performed to link the operating parameters with the coefficients of the reduced-order model (ROM). Different from most previous studies, we aim at parameterized ROMs of the pressure field around the superstructure rather than the integral forces on the vessel. Moreover, we compare classical with machine learning-based reduced order approaches.
CONTACT Rupert Pache rupert.pache@tuhh.de Institute for Fluid Dynamics and Ship Theory, Hamburg University of Technology, Hamburg, 21073 Germany

Ship aerodynamics
With attention given to merchant ships, the wind influence on the ship resistance and side force is the main point of ship aerodynamic interest. In regard to container vessels, the topology of the air-wetted superstructure is governed by the stowed cargo and related variations, which significantly influence the aerodynamic flow pattern in combination with the angle of attack. The surrogate model must therefore be able to handle different geometry and flow topologies. To overcome this challenge, a unified flow field based on an extrapolation strategy is applied herein. Other aerodynamic aspects refer to the funnel performance as well as comfort aspects on deck of ferries and passenger ships. For special-purpose or research vessels, the interaction between the external flow and the exhaust gas flow influences the helicopter operations and the placement of sensor installations.
Wind tunnel experiments were previously the standard source of information for the analysis of ship aerodynamics, e.g. Blendermann (1996) or Andersen (2013) for the effect of different container loadings on the wind forces of the ship. Experimental results mainly focus on forces. They are nowadays complemented by an increasing number of numerical simulations which can provide more comprehensive data. For example, Janssen et al. (2017) describe steady RANS simulations to assess the influence of different container stack shapes on the aerodynamic resistance of ships. The aerodynamics of most ship superstructures refers to bluff-body flow, featuring geometry-induced open vortical separation. Here, forces are governed by the pressure field. In such flows, RANS methods often provide less accurate predictions and the use of scale-resolving approaches yields substantial predictive improvements, as outlined by Angerbauer and Rung (2020). Therefore, hybrid RANS/LES approaches, i.e. Detached Eddy Simulations (DES), introduced by Strelets (2001), lie at the root of the present ROM strategy.

Order reduction and regression
Ship aerodynamics are controlled by different parameters, e.g. the approach flow angle, the floating position of the ship or the distribution of the container stacks. It is a high-dimensional, nonlinear problem. The basic idea of the order reduction is to closely mimic the response of the elaborate Detached Eddy Simulations (DES) to the control parameters in a compact form. In this paper, Proper Orthogonal Decomposition (POD) and Convolutional Neural Network (CNN) based Autoencoder (AE) techniques are utilized as two alternative order reduction strategies, which serve as a basis for a subsequent regression to arrive at the final surrogate model.
The POD method belongs to the family of modal analysis methods. It was introduced by Lumley (1967) to fluids engineering for the analysis of turbulent flows and later resumed by Sirovich (1987). A more recent overview of different modal analysis applications in fluid dynamics is given by Taira et al. (2017). The POD method is perhaps the most widely used technique for a compact analysis of flow fields, and therefore also used herein as a classical approach next to machine learning methods. The POD aims at extracting an orthonormal basis which is optimal in a least-square sense. The strategy supports a reduced representation of the original problem in two different ways, i.e. the data-driven reduction of the governing equations or the reduction of a data set describing a dynamical system. In the current work, the latter practice is adopted by means of a model-free, data-driven approach for which the input data is generated using time-averaged results of 3D transient CFD simulations. The approach is often referred to as snapshot-method, following the work of Ly and Tran (2001), who utilized field data snapshots as input for a POD analysis of the Rayleigh-Bénard convection. Nowadays, POD based surrogate modeling serves a wide range of applications, e.g. the design and optimization of complex systems or the quantification of uncertainties. An excellent overview is reported by Benner et al. (2015), where the parametric model reduction pursued herein is also mentioned as an option for surrogate modeling.
To complete the surrogate model, a mapping between the coefficients of the reduced-order model and available control parameters employed by the online assessment on board a ship is required. The present control parameters refer to the wind angle and container loading. Computing the coefficients of a reduced basis from a regression or interpolation of the initial coefficients is frequently labeled non-intrusive reduced-order modeling. Different techniques were previously used for such mapping exercises. Walton et al. (2013) use radial basis functions to describe the flow over an oscillating wing in response to the pitch and heave of the tip. The development of a decision-making tool for an aerial vehicle surrogate model, which is controlled by the structural integrity of the vehicle, is described by Mainini and Willcox (2015), who use self-organizing maps for the mapping. A more recent publication of Swischuk et al. (2019) extensively compares the influence of different regression strategies for the mapping. The authors use multivariate polynomial regression, k-nearest-neighbor, decision tree regression and artificial neural networks to support the reconstruction of pressure fields around a 2D airfoil in response to two parameters, i.e. the Mach number and the angle of attack. Such efforts to employ machine learning methods for the mapping between the control parameters and the reduced coefficients are sometimes labeled physics-informed neural networks (PINN). Interestingly, the study of Swischuk et al. (2019) suggests that less complex regression tools provide more satisfactory results. The present research partly follows a similar route for a more complex 3D flow problem, featuring different air-wetted domains. Contrary to Swischuk et al. (2019), we employ an artificial neural network for the mapping in combination with different order reduction approaches based on both machine learning and POD.
Figure 1. Schematic description of an autoencoder. The input data is processed by the encoder to obtain a condensed, reduced variable representation in the latent space of the data. An approximation of the original data is obtained by applying the decoder.
The simplicity and efficiency of the POD is based on the linear basis of the method. However, the linearity also imposes limitations when addressing nonlinear phenomena. Regarding the reduction of nonlinear phenomena, machine-learning approaches have recently gained attention, and machine learning procedures are nowadays used in parallel to classical CFD approaches. Recently published examples include Ghalandari et al. (2019), who combine a genetic algorithm with an artificial neural network for the design optimization of compressor blades, Hu et al. (2021), who apply intelligent labeled models based on machine learning for a low-cost prediction of flood control dams, and Ma et al. (2021), who design a 24 h prediction model of the significant wave height by combining classic forecast models with machine learning algorithms. Artificial neural network (NN) based autoencoders (AE) are used for the nonlinear feature extraction in dynamical systems, cf. Brunton et al. (2020) for fluid dynamic applications. Autoencoders are an unsupervised learning technique composed of three parts: an encoder which collects the input data, a bottleneck with the latent or reduced variables and a decoder which finally provides the output, cf. Figure 1. Similar to other machine learning approaches, activation functions are used to transfer the data between neighboring layers. It can be demonstrated that autoencoders employing linear activation functions in combination with an $L_2$ loss function represent a POD model (Baldi & Hornik, 1989). Using nonlinear activation functions for the autoencoder enables improved compression capabilities in comparison with the POD method, as outlined by Milano and Koumoutsakos (2002) for a near-wall turbulent flow. The simplest autoencoder consists of one encoder and one decoder layer, supplemented by the latent space in between.
Adding more layers to the encoder and decoder results in a deep autoencoder and allows the model to extract the features of the data more precisely. Maintaining the spatial correlations featured by fluid dynamic data is usually desirable but can be an issue for applications of neural networks. Convolutional Neural Networks (CNN) take spatial correlations into account, which might explain the popularity of multilayer CNN-autoencoders (CNN-AE) to reduce the dimension of flow field information. Murata et al. (2020) describe the mode decomposition of a flow around a cylinder with a CNN-autoencoder. Agostini (2020) employs the reduced information of a CNN-autoencoder for different applications, e.g. clustering, prediction of temporal evolution and sparse reconstruction.
The present paper compares a POD and a CNN-AE for an order reduction of the pressure field around a 3D container ship in full scale. Both reduction strategies are based on snapshot DES simulations for different deck loading conditions and wind angles. The reduced models are supplemented by a regression method to complete the surrogate model. In this regard, a NN based regression is used for the mapping between the coefficients of the reduced-order model and system parameters. The different steps of the procedure are illustrated in Figure 2.
We aim to assess the attainable accuracy of the two reduction strategies for a confined reduced space in a real-world case, and highlight the predictive differences returned by the POD and the CNN-AE reduction strategy when NN regression is employed to connect the reduced space with the operating variables. The objective of our procedure is to predict the pressure field as accurately as possible for a wide range of operating parameters. An accurate prediction of the pressure field can be of importance for various reasons, e.g. for flow control, optimizations or locating sensors to be used for diagnostic purposes. These tasks require the consideration of a large parameter space. In the present study, the wind loads on the ship superstructure are determined on the basis of the pressure fields. The remainder of the paper is structured as follows: Section 2 outlines the scope of the study and introduces the test case. The subsequent section describes the computational approach. This involves brief descriptions of the Navier-Stokes procedure and the employed discrete model, the POD method and the CNN approach, followed by an overview of the regression strategy. Section 4 reports the results of the POD-based and the CNN-based surrogate models. Final conclusions are outlined in Section 5.

Outline of present study
The current study investigates the full-scale aerodynamic flow field around the superstructure of an existing container feeder, cf. Figure 3. The length of the ship refers to $L = 140$ m, the airflow velocity is assigned to $U = 20$ m s$^{-1}$, and the Reynolds number reads $Re_{U,L} = 1.9 \cdot 10^{8}$.
Only the aerodynamic flow above the calm water level is considered, and therefore only a single-phase discretization is applied using slip flow boundary conditions to represent the free surface.
Two exemplary operating parameters, i.e. the wind encounter angle β and the deck loading condition, are considered to reconstruct the time-averaged 3D pressure field initially obtained from transient, scale-resolving DES simulations. The study is supported by 119 DES simulations for different wind and loading conditions. Assessed wind encounter angles range from 0° to 180° and are discretized into equal steps of 5° (or coarser). The loading cases used in the present study refer to seven stack sizes involving $n_H$ = 0, 1, 2, 3, 4, 5, 6 containers. Note that the height of all container stacks is constant for each configuration. DES results are structured into training or learning, validation and testing, viz.
Training data is used to learn or train the reduced model based upon DES results for $n_H$ = 0, 2, 4, 6, cf. Figure 4, and β = 0°, 20°, ..., 160°, 180° (set 1) or β = 0°, 10°, ..., 170°, 180° (set 2). To investigate the influence of the data allocation distance, the training data therefore involves data sets with either 4 × 10 (set 1) or 4 × 19 (set 2) operating points.
Validation data is used to assess the predictions of the reduced model against DES results for $n_H$ = 0, 2, 4, 6 and β = 25°, 65°, 105°, 145°. Thus, the validation data involves 4 × 4 operating points which refer to the trained container stacks but different angles of attack. Both the training and the validation data belong to the same container stack data group. Approximately 29% [18%] of the data volume of this group is used for the validation in conjunction with the first [second] training set.
Test data is used to assess the predictions of the complete surrogate model against DES results for $n_H$ = 1, 3, 5, cf. Figure 5, and β = 5°, 25°, ..., 145°, 165°. Thus, the test data involves 3 × 9 operating points not used for the training.
During the first step, a POD and a CNN-autoencoder are used to reduce the dimension (cf. upper parts of Figure 2). Both alternatives are investigated for two reduced variable amounts: the investigated number of latent variables refers to 3 or 5, and the number of considered POD modes refers to 5 and 10, respectively. Subsequently, the coefficients of the considered latent variables and POD modes are linked with the operating parameters using a neural network (cf. middle parts of Figure 2). In conclusion, 8 different surrogate models (2 reduction strategies × 2 latent space dimensions × 2 training sets) are generated and validated against data extracted from 4 × 4 operating points and subsequently tested for 3 × 9 operating points.
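The bookkeeping of the operating points can be checked with a few lines of Python. The sketch below only enumerates the (stack height, wind angle) pairs listed above; the variable names are illustrative and not taken from the study.

```python
from itertools import product

# (n_H, beta in degrees) pairs as listed in the text.
train_set1 = list(product([0, 2, 4, 6], range(0, 181, 20)))   # 4 x 10
train_set2 = list(product([0, 2, 4, 6], range(0, 181, 10)))   # 4 x 19
validation = list(product([0, 2, 4, 6], [25, 65, 105, 145]))  # 4 x 4
test = list(product([1, 3, 5], range(5, 166, 20)))            # 3 x 9

# Set 1 is contained in set 2, so the distinct DES runs are
# training set 2 plus the validation and test points.
all_runs = set(train_set2) | set(validation) | set(test)

# 2 reduction strategies x 2 reduced dimensions x 2 training sets.
n_models = 2 * 2 * 2

print(len(train_set1), len(train_set2), len(validation), len(test), len(all_runs))
```

The union of the second training set with the validation and test points gives 76 + 16 + 27 = 119 distinct operating points, which matches the number of DES simulations stated above.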

Navier-Stokes procedure
The data used to compute the POD and to train the CNN-autoencoder is obtained from time-dependent 3D CFD simulations. The simulations employ an implicit second-order accurate finite volume method as described by Rung et al. (2009) and Völkner et al. (2017). The algorithm is based on the strong conservation form of the momentum equations. A pressure-correction algorithm is implemented to determine the pressure for (in-)compressible flows. Unstructured grids based on arbitrary polyhedral cells are used for the spatial discretization together with a cell-centered, collocated variable arrangement. Time derivatives are approximated by an implicit three time-level method and spatial integrals are approximated by the mid-point rule. The approximation of convective fluxes employs hybrid central / upwind-biased second-order accurate approximations. Figure 6 depicts a fraction of the unstructured grid. The extensions of the grid domain read 7L (length), 7L (width) and 1.4L (height). Three grids were investigated prior to the study. Table 1 outlines the size of the grids and the predicted (longitudinal) drag forces for an exemplary loading with 4 vertical containers. Major resolution differences refer to the near-wall regime, where the resolution is successively doubled when moving from grid 1 to grid 2 and grid 3. The baseline grid 1 features a spatial resolution of approximately 2.2 cells/meter in the vicinity of the vessel. The 4.6% drag force deviation between the coarse baseline grid and the fine grid is deemed moderate, and the coarse baseline grid 1 is utilized in this study due to the large number of required DES simulations. Note that the larger, dominating vortex structures are still resolved by grid 1 and a satisfactory agreement of the pressure field is seen.

Flow field discretization and turbulence modeling
Turbulence modeling refers to the Improved Delayed Detached Eddy Simulation (IDDES) strategy, cf. Gritskevich et al. (2012), based on the k-ω Shear Stress Transport (SST) model of Menter et al. (2003). To limit the computational effort, wall functions are employed along the surface of the ship, cf. Gritskevich et al. (2017). The validation of the present computational model to simulate the flow around the superstructure of a feeder ship (including the employed DES closure) has been described in a previous publication of the authors (Angerbauer & Rung, 2020). This reference reports a 5-10% maximum deviation between simulated and measured forces, depending on the loading configuration. Except for minor geometrical simplifications, the current work employs the same feeder ship and grid resolution. Hence, we do not repeat the details herein but refer to the previous publication. Due to the different configurations, the CFD grid arrangement and the related wetted domain change. This hampers the reduction step, which requires a joint data structure. To obtain jointly structured input data, we employ a fixed meta-grid of approximately 320 × 64 × 64 Cartesian nodes. As depicted in Figure 7, the meta-grid covers only a small cutout of the CFD domain around the ship geometry. The meta-grid consists of wetted fluid nodes and unwetted nodes inside the containers. The pressure field is interpolated from the unstructured CFD grid to the meta-grid prior to the compilation for the ROM. For this purpose, an inverse distance weighted interpolation is used, which involves the pressure in the nearest cell center of the finite volume grid and computed pressures in its adjacent neighbors. Note that the number of fluid nodes in the CFD meshes varies due to the different height of container stacks. For this reason, the pressure field is extrapolated to the inner area of the container stacks to preserve a constant number of meta-grid pressure points.
Pressure forces of the CFD simulation simply follow from a (numerical) integration of the pressure along the surface of the geometry. Pressure forces of the surrogate model are extracted from the pressures predicted on the meta-grid, which comprises wetted (fluid) and unwetted (solid) nodes. To this end, the respective nearest point of the meta-grid is identified and assigned to each surface element of the unstructured CFD grid. Subsequently, the surface normal vector $\mathbf{n}_i$ and the area $A_i$ of each CFD surface element are transferred to its associated meta-grid point. Further details of the force calculation are described in Section 3.6.
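The interpolation onto the meta-grid can be illustrated with a minimal NumPy sketch of inverse distance weighting. It is not the implementation used in the study: the function name `idw_interpolate`, the exponent `power` and the use of the `k` nearest cell centers (instead of the mesh-adjacent neighbors of the nearest cell) are assumptions made for illustration.

```python
import numpy as np

def idw_interpolate(targets, centers, values, k=4, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation from scattered cell centers.

    Assumption: the k nearest cell centers stand in for "nearest cell plus
    adjacent neighbors"; a real mesh would use its face connectivity.
    """
    targets = np.atleast_2d(targets)
    # Pairwise distances, shape (n_targets, n_centers); brute force for clarity.
    dist = np.linalg.norm(targets[:, None, :] - centers[None, :, :], axis=2)
    idx = np.argsort(dist, axis=1)[:, :k]          # indices of the k nearest centers
    d = np.take_along_axis(dist, idx, axis=1)
    w = 1.0 / (d + eps) ** power                   # eps avoids division by zero
    w /= w.sum(axis=1, keepdims=True)              # normalize weights per target
    return (w * values[idx]).sum(axis=1)

# Toy data: the eight corners of a unit cube carry the field p = x.
centers = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                   dtype=float)
values = centers[:, 0]
p_mid = idw_interpolate([0.5, 0.5, 0.5], centers, values, k=8)   # cube center
p_node = idw_interpolate(centers[7], centers, values, k=4)       # on a data point
```

At the cube center all eight corners are equally distant, so the result is the plain average of the corner values. A target that coincides with a cell center essentially reproduces the value stored there.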

Proper orthogonal decomposition
The proper orthogonal decomposition (POD) is an established tool to analyze flow fields. The flow fields used as input to the method are labeled snapshots (Ly & Tran, 2001). In the current work, the snapshots follow from the time-averaged results of DES-simulated pressure fields $\mathbf{p}(\boldsymbol{\alpha})$ for different operating parameter combinations $\boldsymbol{\alpha}$. The pressure fields enter the snapshot matrix $\mathbf{P} \in \mathbb{R}^{n \times m}$ as column vectors, where the number of snapshots is denoted by $m$ and the spatial dimension of the meta-grid refers to $n$.
The singular value decomposition (SVD) of the snapshot matrix $\mathbf{P}$ reads
$$\mathbf{P} = \boldsymbol{\Phi}\,\boldsymbol{\Sigma}\,\mathbf{V}^{T}, \tag{1}$$
where the left and right singular vectors form the columns of the matrices $\boldsymbol{\Phi} \in \mathbb{R}^{n \times m}$ and $\mathbf{V} \in \mathbb{R}^{m \times m}$. The diagonal matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{m \times m}$ contains the singular values of $\mathbf{P}$ sorted in descending order of importance. The first $R$ left singular vectors in $\boldsymbol{\Phi}$, which belong to the largest singular values, constitute a reduced POD basis. The pressure field $\mathbf{p}$ can be approximated by a linear combination of the orthonormal POD basis vectors $\boldsymbol{\phi}_i$, also known as POD modes,
$$\tilde{\mathbf{p}}(\boldsymbol{\alpha}) = \sum_{i=1}^{R} r_i(\boldsymbol{\alpha})\,\boldsymbol{\phi}_i, \tag{2}$$
where $\tilde{\mathbf{p}}$ is the POD-based approximation of the pressure field for the parameters $\boldsymbol{\alpha}$, and $r_i$ are the POD coefficients.
To approximate the pressure field in a reduced space, only $R \ll m$ modes are considered. The present study employs $R = 5$ and $R = 10$ modes and the number of snapshots refers to either $m = 40$ or $m = 76$. For a given snapshot $\mathbf{p}$ the POD coefficients usually follow from
$$r_i(\boldsymbol{\alpha}) = \boldsymbol{\phi}_i^{T}\,\mathbf{p}(\boldsymbol{\alpha}). \tag{3}$$
The learning/training results of Equation (3) are subsequently used to train the mapping between the reduced coefficients $\tilde{r}_i(\tilde{\boldsymbol{\alpha}})$ and arbitrary input (system) parameter values $\tilde{\boldsymbol{\alpha}}$ as described in Section 3.5. Moreover, one can use (3) to validate a reduced basis $\boldsymbol{\phi}_i^{L}$ that was derived from a learning set $\boldsymbol{\alpha}^{L}$, in combination with a validation set $\mathbf{p}(\boldsymbol{\alpha}^{V})$. To this end, we simply introduce the projection $r_i^{L}(\boldsymbol{\alpha}^{V}) = (\boldsymbol{\phi}_i^{L})^{T}\,\mathbf{p}(\boldsymbol{\alpha}^{V})$ into (2) and compare $\tilde{\mathbf{p}}$ with $\mathbf{p}$, cf. Section 4.1.
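The snapshot POD described in this section can be sketched in NumPy as follows. The snapshot matrix is filled with synthetic low-rank data as a stand-in for the DES pressure fields, and the toy sizes as well as the helper names `project` and `reconstruct` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, R = 500, 40, 5          # toy sizes; the study uses n = 320 * 64 * 64

# Synthetic rank-8 snapshots plus small noise stand in for pressure fields.
P = rng.standard_normal((n, 8)) @ rng.standard_normal((8, m))
P += 1e-3 * rng.standard_normal((n, m))

# Economy SVD: P = Phi @ diag(sigma) @ Vt, singular values sorted descending.
Phi, sigma, Vt = np.linalg.svd(P, full_matrices=False)
Phi_R = Phi[:, :R]            # reduced POD basis: the first R modes

def project(p):
    """POD coefficients r_i = phi_i^T p for the R retained modes."""
    return Phi_R.T @ p

def reconstruct(r):
    """Approximate field as the linear combination sum_i r_i * phi_i."""
    return Phi_R @ r

p = P[:, 0]                                   # one snapshot
p_tilde = reconstruct(project(p))             # its rank-R approximation
rel_err = np.linalg.norm(p - p_tilde) / np.linalg.norm(p)
print(f"relative reconstruction error with R = {R}: {rel_err:.3f}")
```

Projecting a field that was not part of the snapshot matrix with the same `project` and `reconstruct` calls corresponds to the validation of a learned basis against a separate validation set.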

Convolutional neural network autoencoder
The convolutional neural network (CNN) is widely used for machine learning-based image recognition, but it is also applied in the field of fluid dynamics (Brunton et al., 2020). In contrast to fully connected neural network layers, which are not feasible for high-dimensional data, the CNN conserves the spatial information of the input field. A standard CNN is composed of convolutional layers with a nonlinear activation function and pooling layers. Deep CNNs consist of several layers arranged in sequence. Optional fully connected layers complement the full structure of the network.
In a fully connected layer each perceptron of the layer is connected to all outputs $\mathbf{a}$ of the previous layer. Each of the connections is multiplied by a weight $W_i$ and a bias $b_i$ is added. Afterwards an activation function $f$ is applied to the value of the perceptron to get the output $d$. The output of an entire layer can be calculated by
$$\mathbf{d} = f(\mathbf{W}\,\mathbf{a} + \mathbf{b}). \tag{4}$$
The nodes of a convolutional layer are not fully connected. Instead, the convolutional layer applies filters to the input and is described by
$$z_{x,y,z,m} = \sum_{i=0}^{K-1}\sum_{j=0}^{K-1}\sum_{k=0}^{K-1}\sum_{n} a_{x+i,\,y+j,\,z+k,\,n}\,W_{i,j,k,m} + b_m,$$
where $a_{x,y,z,n}$ is the input of the layer with the three dimensions $x, y, z$, and $n$ counts the channels of the input layer or the feature layers of the previous layer. The weights of the $m$ 3D filters $W_{i,j,k,m}$ and one bias $b_m$ for each filter have to be learned during the training process of the network. $K$ specifies the depth, height and width of the convolutional operation. For the sake of simplicity, a uniform value for the different dimensions is used here and in the later use of the model. An activation function $f$ is applied to the output $z_{x,y,z,m}$ of the convolutional layer:
$$d_{x,y,z,m} = f(z_{x,y,z,m}).$$
In the current work, the Exponential Linear Unit (ELU) (Clevert et al., 2016) is used for the activation function, because it converges faster and produces more accurate results for many applications compared to simpler activation functions.
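A convolutional layer followed by an ELU activation can be written out naively in NumPy to make the index pattern explicit. This is a sketch, not the network of the study. Two assumptions are made: the kernel array carries a separate set of weights per input channel (shape `(K, K, K, n_in, n_out)`, as is common in deep learning libraries), and no padding is applied. For a single-channel input such as a pressure field, this reduces to one 3D kernel per filter.

```python
import numpy as np

def elu(z, alpha=1.0):
    """Exponential Linear Unit: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def conv3d_valid(a, W, b):
    """Naive 3D convolution layer without padding, stride 1, ELU activation.

    a: input, shape (X, Y, Z, n_in)
    W: kernels, shape (K, K, K, n_in, n_out)  (assumed per-channel weights)
    b: biases, shape (n_out,)
    """
    K = W.shape[0]
    X, Y, Z, _ = a.shape
    out = np.empty((X - K + 1, Y - K + 1, Z - K + 1, W.shape[-1]))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                patch = a[x:x + K, y:y + K, z:z + K, :]     # (K, K, K, n_in)
                # Contract the patch with every filter, then add the biases.
                out[x, y, z, :] = np.tensordot(patch, W, axes=4) + b
    return elu(out)

# Toy check: constant input, all-ones kernels, two filters with different biases.
a = np.ones((4, 4, 4, 1))
W = np.ones((3, 3, 3, 1, 2))
b = np.array([0.0, -30.0])
d = conv3d_valid(a, W, b)
```

With a 3 × 3 × 3 kernel of ones on a constant input, each pre-activation equals 27 plus the bias. The first filter therefore stays in the linear branch of the ELU, while the second one (27 − 30 = −3) falls into the exponential branch.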
To select only the most important features of the filters, maximum pooling layers are used. With the used size of 2 for all dimensions, the spatial resolution is halved. For the pooling layers, no additional parameters have to be learned.
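The halving of the resolution by a pooling layer of size 2 can be reproduced with a short NumPy sketch. The function name `max_pool3d` and the channel-last array layout are illustrative assumptions.

```python
import numpy as np

def max_pool3d(a, size=2):
    """Non-overlapping max pooling over the spatial axes of an (X, Y, Z, C) array."""
    X, Y, Z, C = a.shape
    # Trim so that each spatial extent is divisible by the pool size.
    a = a[:X - X % size, :Y - Y % size, :Z - Z % size, :]
    a = a.reshape(X // size, size, Y // size, size, Z // size, size, C)
    return a.max(axis=(1, 3, 5))   # maximum within each size^3 block

field = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4, 1)
pooled = max_pool3d(field)
print(field.shape, "->", pooled.shape)
```

The operation has no trainable parameters. It only keeps the largest value of each 2 × 2 × 2 block, so every pooling layer reduces the number of spatial points by a factor of eight.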
The concept of an autoencoder consists of three different parts: the encoder $G$, the latent variables $\mathbf{r}$ and the decoder $H$. The encoder finds a reduced representation with the latent variables $\mathbf{r}$ of the input field $\mathbf{p}$:
$$G : \mathbf{p} \rightarrow \mathbf{r}.$$
The target of the decoder is to restore the source dimensions of the fluid field from the latent variables and to obtain an accurate representation $\tilde{\mathbf{p}}$ of the original fluid field:
$$H : \mathbf{r} \rightarrow \tilde{\mathbf{p}}.$$
The learning process of the network layers aims to find the optimized values for the weights and biases of the network minimizing an error function $\varepsilon$, which is also called loss function. The value of the mean absolute error (MAE) loss function is determined by
$$\varepsilon = \frac{1}{n}\sum_{j=1}^{n} M_j\,\left|p_j - \tilde{p}_j\right|,$$
where a weighting matrix $\mathbf{M}$ is used to ensure that only points in the fluid field and not inner geometry points are considered by the loss function:
$$M_j = \begin{cases} 1 & \text{if node } j \text{ is a fluid node}, \\ 0 & \text{otherwise}. \end{cases}$$
The network is trained using the backpropagation algorithm (Rumelhart et al., 1986) and the Adam optimizer (Kingma & Ba, 2015) is used to update the weights efficiently.
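A masked mean absolute error of this kind can be sketched in a few lines of NumPy. The function name `masked_mae` is illustrative, and the normalization by the total number of nodes (rather than by the number of fluid nodes) is an assumption that the text does not settle.

```python
import numpy as np

def masked_mae(p, p_tilde, fluid_mask):
    """Mean absolute error with solid (unwetted) nodes weighted by zero.

    Assumption: the sum is divided by the total number of nodes.
    """
    M = fluid_mask.astype(float)           # 1 for fluid nodes, 0 for solid nodes
    return np.sum(M * np.abs(p - p_tilde)) / p.size

p = np.array([[1.0, 2.0], [3.0, 4.0]])          # reference field
p_tilde = np.array([[1.0, 1.0], [0.0, 0.0]])    # reconstructed field
fluid = np.array([[True, True], [True, False]]) # last node lies inside the geometry
loss = masked_mae(p, p_tilde, fluid)
```

In this toy case the largest deviation (4.0) sits on the masked node and does not contribute, so the loss is (0 + 1 + 3 + 0) / 4 = 1.0.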

Coefficient mapping
The above described reduced representation of the pressure field either refers to the reduced coefficients of the POD modes or the latent variables of the CNN-AE. Both are labelled reduced variables in the remainder and denoted by $\mathbf{r}$. The length of the reduced variable vector depends on the dimension of the reduced space. In the present study, we employ $R = 5\,[10]$ for the POD and $R = 3\,[5]$ for the CNN-AE approach. Subsequent to the dimension reduction, a feedforward neural network is employed to map the $m = 2$ input parameters to the $R$ reduced variables by a shallow network of fully connected layers, i.e. the input layer ($l_0$), two hidden layers ($l_{1,2}$) and an output layer ($l_3$). The number of hidden layer neurons is assigned to 12 ($l_1$) and 16 ($l_2$), respectively, and kept constant in this study. The number of neurons of the input and output layers corresponds to the number of input parameters $m$ and the number of reduced variables $R$. The basic concept of the neural network-based mapping $\boldsymbol{\alpha} \rightarrow \mathbf{r}$ is already described in Section 3.4. Different from the CNN-AE, each neuron of the layers 1-3 receives input from all outputs of the previous layer. Inputs are again multiplied by a weight and a bias is added. An ELU activation function
$$f_{\mathrm{elu}}(z) = \begin{cases} \alpha_e\,(e^{z} - 1) & \text{if } z \le 0, \\ z & \text{if } z > 0 \end{cases}$$
is applied in line with Equation (4). The weights $\mathbf{W}$ and biases $\mathbf{b}$ of the network are optimized by minimizing a loss function which refers to the mean square error of the approximated reduced variables $\tilde{r}_i$, i.e.
$$\varepsilon_{\mathrm{MSE}} = \frac{1}{R}\sum_{i=1}^{R}\left(r_i - \tilde{r}_i\right)^{2}.$$
This approach is sometimes called a projection-driven neural network. The backpropagation algorithm (Rumelhart et al., 1986) and the Adam optimizer (Kingma & Ba, 2015) are used for the training of the neural network. The training data agrees with the data employed for the dimensionality reduction, cf. Section 2. The system parameters aligned with a particular pressure field are used as input parameters for the training of the NN-regression. In addition, polynomial and Gaussian process-based regressions were employed for the considered application. The results for the pressure field and the extracted pressure forces on the vessel significantly deteriorated for both alternatives compared to the NN-regression and were therefore not pursued further. Note that this contradicts to some extent the findings of Mainini and Willcox (2015).
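The layout of the mapping network (2 inputs, hidden layers with 12 and 16 neurons, R outputs) can be sketched as a plain NumPy forward pass. The weights are random placeholders, training with backpropagation and Adam is omitted, and two details are assumptions rather than statements of the text: the output layer is kept linear, and the inputs are scaled to the range [0, 1].

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU activation: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def init_mlp(sizes, rng):
    """Random weights and zero biases for a chain of fully connected layers."""
    return [(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, alpha):
    """Map operating parameters alpha = (beta, n_H) to R reduced variables."""
    a = np.asarray(alpha, dtype=float)
    for W, b in params[:-1]:
        a = elu(W @ a + b)        # hidden layers: d = f(W a + b)
    W, b = params[-1]
    return W @ a + b              # assumed linear output layer

def mse(r, r_tilde):
    """Mean square error between target and predicted reduced variables."""
    return np.mean((np.asarray(r) - np.asarray(r_tilde)) ** 2)

R = 5
params = init_mlp([2, 12, 16, R], np.random.default_rng(0))
# Assumed input scaling: beta / 180 degrees and n_H / 6.
r_tilde = forward(params, [25.0 / 180.0, 4.0 / 6.0])
n_weights = sum(W.size + b.size for W, b in params)
print(r_tilde.shape, n_weights)
```

For R = 5 this layout has 36 + 208 + 85 = 329 trainable parameters, which illustrates how small the regression network is compared with the reduction step.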

Determination of the aerodynamic forces
The considered aerodynamic flow fields are governed by massive separation featuring large separated flow regions with negative pressures. The forces on the vessel are dominated by pressure, and friction forces are at least one order of magnitude smaller than the pressure forces. Results obtained for the container configuration 0 at an approach flow angle of β = 0° reveal a longitudinal friction force of −1.3 kN, which is slightly below 1% of the corresponding pressure force of −141.6 kN. The ratio is very similar for other approach flow angles and loading configurations.
Therefore, only the pressure field is considered for the reduction process. The forces acting on the vessel are based on the surrogate model pressures. To this end, the surface pressure is summed over all wall-neighboring points of the meta-grid and the total force $\mathbf{F}$ follows from
$$\mathbf{F} = \sum_{i} p_i\,A_i\,\mathbf{n}_i,$$
where $p_i$ is the pressure at the wall-neighboring meta-grid point $i$, and $A_i$ and $\mathbf{n}_i$ denote the surface area and surface normal vector based on the triangulated representation of the geometry surface. Note that the surface information is transferred from the CFD mesh to a corresponding point of the meta-grid (cf. Section 3.2).
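The summation of pressure contributions over the surface elements can be sketched as follows. The function name `pressure_force` is illustrative, and the orientation of the normals (pointing from the fluid into the body, so that no sign flip is required) is an assumption, since the text does not state the convention.

```python
import numpy as np

def pressure_force(p, area, normal):
    """Total pressure force F = sum_i p_i * A_i * n_i.

    Assumption: unit normals point from the fluid into the body, so positive
    pressure pushes along the normal and no sign flip is needed.
    """
    return np.sum(p[:, None] * area[:, None] * normal, axis=0)

# Six faces of a unit cube, one surface element each, inward-pointing normals.
normal = np.array([[-1, 0, 0], [1, 0, 0], [0, -1, 0],
                   [0, 1, 0], [0, 0, -1], [0, 0, 1]], dtype=float)
area = np.ones(6)

# Uniform pressure on a closed body gives no net force.
uniform = pressure_force(np.full(6, 100.0), area, normal)
# Pressure only on the face at x = 1 pushes the body in the negative x direction.
one_face = pressure_force(np.array([100.0, 0.0, 0.0, 0.0, 0.0, 0.0]), area, normal)
```

The uniform-pressure case is a useful sanity check for any transfer of areas and normals to meta-grid points: if the transferred surface is closed, the resulting force should vanish.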

Results and discussion
The section assesses the performance of the surrogate model. The assessment distinguishes between the two building blocks of the surrogate model. Initially, the ROM representation is analyzed (validation). Subsequently the parameterization of the coefficients is included to assess the entire model for non-trained configurations (testing).

POD based model reduction
The first snapshot matrix is compiled from $m = 40$ snapshots, i.e. 10 equally spaced wind angles × 4 stack heights. The spatial resolution of the pressure field follows the 320 × 64 × 64 points of the meta-grid, viz. $n = 1{,}310{,}720$. The evolution of singular values and cumulative energy with increasing POD modes is illustrated in Figure 8. Similar to many other problems, an exponential decay of the singular values is seen, which justifies the use of a low-dimensional basis as an adequate approximation of the high-fidelity simulations. Considering the first 5 POD modes, 90% of the field energy is captured, which can be increased to 94% using 10 modes. Figures 9 and 10 compare the POD-based pressure fields with the CFD data for 5 and 10 modes, respectively. The performance of the POD is demonstrated for the (trained) container configuration 4 at four different inflow angles, which were not considered during the training of the POD modes $\boldsymbol{\phi}_i$. Inaccuracies are particularly observed adjacent to the surface of the geometry. Using 10 instead of 5 POD modes reduces the extent of the deviations and significantly improves the accuracy of the pressure fields returned by the POD.
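The cumulative energy curve can be computed directly from the singular values. The sketch below assumes the common convention that the energy of mode i is proportional to the square of its singular value; the text does not state the definition used for Figure 8. The singular values are synthetic and only mimic an exponential decay.

```python
import numpy as np

def cumulative_energy(sigma):
    """Cumulative energy fraction; energy per mode assumed to be sigma_i ** 2."""
    energy = np.asarray(sigma, dtype=float) ** 2
    return np.cumsum(energy) / energy.sum()

def modes_for_energy(sigma, threshold):
    """Smallest number of leading modes whose energy reaches the threshold."""
    return int(np.searchsorted(cumulative_energy(sigma), threshold) + 1)

sigma = 2.0 ** -np.arange(10)      # synthetic, exponentially decaying values
frac = cumulative_energy(sigma)
print(np.round(frac[:3], 4), modes_for_energy(sigma, 0.9))
```

For these synthetic values the first mode carries about 75% of the energy and two modes are needed to exceed 90%. The same two functions applied to the singular values of a snapshot matrix give the number of modes required for a prescribed energy level.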
To enable the prediction of pressure fields in response to the control parameters, a mapping of the input parameters to the POD coefficients is required in the next step. Prior to this, the variation of the POD coefficients $r_i$ with the loading height and the wind encounter angle is displayed for the first 10 modes in Figure 11. The figure reveals that the dependence of the coefficients on the input parameters appears less correlated for modes 6-9, which presumably makes the mapping of these coefficients difficult.

CNN-AE based model reduction
The training of the CNN autoencoder employs the same 40 snapshots as the POD. The encoder and decoder are each composed of 4 convolutional and pooling layers. They are connected by a dense layer formed by the reduced variables in the latent space, cf. Figure 1. The details of the autoencoder architecture are summarized in Table 2. The employed CNN involves approximately 370,000 model parameters, which have to be determined during the learning process to minimize the deviation between input and output fields. The network is trained with the Adam optimizer (Kingma & Ba, 2015) with a learning rate of $1.25 \cdot 10^{-4}$ and 2 steps per epoch.
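To give a feeling for the spatial compression achieved by four pooling stages, the following sketch traces the grid dimensions through the encoder. It assumes that each stage halves every spatial direction and that the convolutions preserve the spatial size; neither is stated in this section (Table 2 holds the actual architecture), and the helper `encoder_shapes` is hypothetical.

```python
def encoder_shapes(input_shape, n_stages, pool=2):
    """Trace the spatial grid size through n_stages pooling stages.

    Assumes size-preserving convolutions and a pooling factor `pool`
    in every direction (illustrative assumptions).
    """
    shapes = [tuple(input_shape)]
    for _ in range(n_stages):
        prev = shapes[-1]
        if any(d % pool for d in prev):
            raise ValueError(f"shape {prev} is not divisible by {pool}")
        shapes.append(tuple(d // pool for d in prev))
    return shapes

def n_cells(shape):
    """Number of grid points of a spatial shape."""
    total = 1
    for d in shape:
        total *= d
    return total

shapes = encoder_shapes((320, 64, 64), n_stages=4)
reduction = n_cells(shapes[0]) // n_cells(shapes[-1])
```

Under these assumptions the dense latent layer receives a 20 × 4 × 4 grid per feature channel, i.e. the pooling alone reduces the number of spatial points by a factor of 4096 before the final compression to a handful of latent variables.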
To evaluate the quality of compression and reconstruction of the pressure fields, the high-fidelity CFD data is compared with the CNN-AE output for 3 and 5 latent variables in Figures 12 and 13. The comparison again refers to container configuration 4 and 4 different wind encounter angles. The accuracy of the reconstruction increases significantly when using 5 reduced variables instead of 3, and a satisfactory agreement is observed for 5 reduced variables. The deviations between the original and the reconstructed fields for R = 3 are of similar magnitude to the POD results based on 10 modes. Remarkably, the areas of inaccuracies and deviations agree for the CNN-AE and the POD reconstruction. Figure 14 depicts the course of the 3 reduced variables as a function of the two input parameters. For the POD method the modes are arranged in order of importance, while such an ordering is clearly missing for the CNN-AE based method. Similarly, Figure 15 displays the courses for 5 reduced variables. No direct correspondence between the function courses obtained with 3 and 5 reduced variables is discernible. Compared to the POD coefficients, the courses of the CNN-AE variables are smoother.

Performance and costs of the surrogate models
To parameterize the reduced-order models, a mapping of the model coefficients $\mathbf{r}$ to the system input parameters is needed. For the regression of the coefficients, a small shallow neural network is constructed and trained. Details of the employed neural network are summarized in Table 3. The number of nodes in the last layer depends on the number of reduced variables considered in the specific case. For the POD based approach, four different variants are investigated. They differ in the number of POD modes, i.e. $n_r = 5$ and $n_r = 10$, and the extent of the training data. The latter features an increment of either 20° or 10° within the wind encounter angle interval [0°, 180°]. The same variants are evaluated for the CNN-AE based surrogate model. The test data described in Section 2 and Figure 5 is used to evaluate the performance of the surrogate models. The high-fidelity reference data is available for 3 different loading configurations and 9 different wind angles. Note that the test data was used neither for creating the reduced order models nor for learning the subsequent parameter mapping.
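The structure of the parameter mapping can be sketched as follows. The sketch builds the two training input grids (20° and 10° increments) and evaluates an untrained feedforward network that maps the two inputs to $n_r$ coefficients. The stack height labels, the hidden layer widths, the tanh activation, the input scaling and the random weights are placeholders chosen for illustration; the actual network is specified in Table 3.

```python
import numpy as np

def training_inputs(angle_step_deg, heights):
    """All (wind angle, stack height) pairs for angles in [0, 180] degrees."""
    angles = np.arange(0.0, 180.0 + angle_step_deg, angle_step_deg)
    return np.array([(a, h) for h in heights for a in angles])

def init_mlp(sizes, rng):
    """Random placeholder weights; a real model would be trained."""
    weights = [rng.standard_normal((a, b)) / np.sqrt(a)
               for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    return weights, biases

def mlp_forward(x, weights, biases):
    """Hidden layers with tanh activation, linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

heights = [1, 2, 3, 4]                 # placeholder labels for 4 stack heights
coarse = training_inputs(20.0, heights)
fine = training_inputs(10.0, heights)

n_r = 5                                # number of reduced variables
rng = np.random.default_rng(0)
weights, biases = init_mlp([2, 16, 16, n_r], rng)

x = coarse / np.array([180.0, max(heights)])   # scale inputs to [0, 1]
r_pred = mlp_forward(x, weights, biases)       # one coefficient vector per case
```

The 20° increment reproduces the 40 snapshots used for the model reduction, while the 10° increment yields 76 training cases for the same four stack heights. Only the width of the output layer changes between the variants with different numbers of reduced variables.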
An average of the errors obtained for each model is summarized in Table 4. The indicated errors are based on the normalized interval deviations for the pressure field, the longitudinal (X) force and the lateral (Y) force. Tabulated force errors are averages over all 3 × 9 test cases. The pressure-related errors involve all field data points of the meta grid for all 27 cases, which enter the average in an unweighted manner. As indicated by the validation results depicted in Figures 9-13, near-wall pressures, which govern the forces, are more difficult to predict. In contrast, the predictive agreement improves towards the far field. Because the far-field points are far more numerous, they reduce the average pressure error.
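The dilution of near-wall errors by the unweighted field average can be illustrated with a small synthetic example. The error measure used here, the mean absolute deviation divided by the value range of the reference data, is only an assumed reading of the normalized interval deviation; the one-dimensional field and the error levels are invented for illustration.

```python
import numpy as np

def normalized_error(pred, ref, mask=None):
    """Mean absolute deviation normalized by the range of the reference.

    If a boolean mask is given, only the selected points enter the mean,
    while the normalization still uses the full reference range.
    """
    span = ref.max() - ref.min()
    dev = np.abs(pred - ref)
    if mask is not None:
        dev = dev[mask]
    return dev.mean() / span

n_points = 1000
ref = np.linspace(-1.0, 1.0, n_points)      # synthetic reference field
near_wall = np.zeros(n_points, dtype=bool)
near_wall[:50] = True                       # 5% of the points are near-wall

# Large deviation near the wall, small deviation in the far field
err = np.where(near_wall, 0.2, 0.01)
pred = ref + err

full_field = normalized_error(pred, ref)
wall_only = normalized_error(pred, ref, mask=near_wall)
```

In this example the unweighted field error is roughly ten times smaller than the error restricted to the near-wall points, although only the latter would enter a force evaluation.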
The averaged pressure field error reveals clear improvements for a larger number of reduced variables. Likewise, the use of the larger training data set often, but not always, reduces the error to a small extent. Compared to the POD based models, the CNN-AE based approaches yield smaller average errors. However, the overall performance and force prediction displayed by both strategies is similar. Figures 16-21 outline the force predictions obtained for the 8 surrogate models tabulated in Table 4. In conjunction with the lower number of reduced variables, i.e. $n_r = 5$ for the POD and $n_r = 3$ for the CNN-AE strategy, more significant deviations occur. In contrast, a fair predictive agreement with the CFD data is observed when the number of reduced variables is approximately doubled. The fair agreement of the surrogate force predictions with the forces predicted by the full-order CFD is attributed to partially compensating wall-adjacent pressure deviations. A closer inspection of the pressure fields reveals that the pressure deviations obtained from the surrogate models are often similar on corresponding forward and rearward faces, cf. right column of Figures 12 and 13. Exemplary comparisons of the total forward- and rearward-facing contributions predicted by the CNN-AE approach for the loading configuration $n_H = 5$ are displayed in Figure 22. The figure indicates that the deviation 'symmetry' often causes well-balanced total forces, in particular between 45° and 135°, and that the changes observed when increasing from $n_r = 3$ to $n_r = 5$ reduced variables maintain this 'symmetry'. A particularly striking example is the 60° wind angle, where an over-prediction for $n_r = 3$ turns into an under-prediction for $n_r = 5$ on the rearward faces and vice versa on the forward faces. In conclusion, pressure prediction improvements obtained from enlarging the reduced variable space do not necessarily translate into corresponding force prediction changes, cf. Figure 22 between 45° and 135°. Exceptions occur for wind angles close to 0° and 180°. Here, the stagnation regimes, e.g. along the upper deckhouse, are usually predicted fairly accurately by the surrogate models, and the challenge lies in the correct prediction of the under-pressure along leeward pointing faces, which usually benefits from a larger number of reduced variables.
The aerodynamic behavior is more challenging to capture in the interval [60°, 120°], where local maxima and minima might occur for configuration 1, cf. Figures 16 and 19. Hence, additional high-fidelity CFD data with a step size of 5° has been compiled in this regime to support an assessment of the surrogate models. Although these CFD data points were not included in the training data, the extreme values are recovered by the POD based surrogate model with $n_r = 10$ and the CNN-AE based model with $n_r = 5$. Overall, the POD model with $n_r = 10$ and the CNN-AE model with $n_r = 5$ provide comparable results. The coarser training data set with an increment of 20° appears sufficient for a prediction of the pressure field, and the denser database with an increment of 10° does not significantly improve the predictive accuracy. A minor exception is a narrow band between 0° and 30°. Here, small benefits are observed in combination with the more extensive training data for the POD strategy, in particular for the loading configuration $n_H = 5$. Since the generation of high-fidelity data during the offline part is costly, the satisfactory quality observed for the coarse training data set is an essential advantage. Due to its improved suitability for nonlinear phenomena, the CNN-AE based reduced representation is substantially more compact than the POD based variant. Accordingly, the predictive accuracy returned by the CNN-AE is much less sensitive to the number of reduced variables than that of the POD.
At the same time, the online effort to predict a pressure field from the CNN-AE based surrogate model is reduced in comparison to the POD based strategy. In contrast, the offline effort to generate the reduced model is significantly lower for the POD approach. The singular value decomposition and the subsequent training of the NN for the coefficient mapping require a moderate effort. The wall-clock time for generating the complete POD-based surrogate model amounts to approximately 30 minutes on a standard 2020 multi-core workstation. The training of the CNN-AE requires more computational effort, i.e. about 12 hours on a workstation equipped with a 2020-generation GPU (e.g. NVIDIA Tesla V100) in the present study.
All surrogate model variants reveal weaknesses in predicting the longitudinal force for angles around 0° ± 30°. Here, the POD reveals slight improvements over the CNN-AE.

Conclusion
The present study demonstrates that the POD and the CNN-AE strategy offer a reasonable basis to develop a data-driven surrogate model, and to rapidly reconstruct complete pressure fields on board merchant vessels, as well as to determine related forces. Based on O(100) full-scale fluid dynamic simulations for two different input/control parameters, a data-driven approach is developed and assessed. During the first offline step, the dimensionality of the data is reduced. As expected, the nonlinearity of the activation functions used for the CNN-AE enables a more compact data representation than the POD. The second step maps the coefficients or variables of the reduced space to the input parameters.
In this step, a uniform neural network regression based scheme is applied for both variants. The performance of the models is demonstrated for a relatively complex aerodynamic flow around the superstructure of a container feeder ship in full scale. Different inflow angles and container loading heights are considered as input parameters. The pressure fields to be reconstructed are of different dimensions due to the different loading cases. The variation of the field dimensions is handled by an appropriate data mapping onto a fixed meta grid.
Results indicate that the pressure fields and the related loads predicted by both surrogate models are in satisfactory agreement with the CFD data, although only a confined reduced space is used for the reconstruction step. While the number of reduced variables or modes is of great importance, in particular for the POD approach, the influence of the training data size is subordinate in the considered application. The POD is a linear algebra method and, therefore, partly interpretable. The CNN-AE and its reduced variables are not interpretable and are much harder to analyze with respect to the attainable accuracy. The NN based mapping of the input parameters to the coefficients of the reduced variables is well suited for this purpose, whereas simpler regression approaches revealed a substantial performance degradation. To summarize, it has been shown that machine learning approaches and their combination with traditional order reduction methods offer great potential for decision support systems during the operation of vehicles. The presented surrogate models are only aware of their training data and are thus limited, e.g., to the specific geometry employed. For other flow topologies, the database must at least be extended, and the adaptation of the structure of the CNNs requires further studies. Reliable predictions are only achievable within the parametric range of the database. Future work will be devoted to more elaborate and perhaps more accurate training strategies. For instance, by considering additional field variables, such as the velocities, the evaluation of the physical consistency/compatibility between the different flow fields can be included in the loss function.