Machine learning-accelerated aerodynamic inverse design

The computational cost of iterative design methods has long been a challenge in aerodynamics. In this research, data-driven acceleration of an iterative inverse design method was implemented to reduce its computational cost. Although iterative design methods are robust, a large amount of otherwise discarded data is generated during their intermediate stages. Inverse design methods rely on correcting an initial geometry based on a given target parameter distribution. The data generated during the early iterations of the inverse design was incorporated into two deep learning models to accelerate attainment of the target geometry. The deep learning models were used to recognize the correlation between the pressure distribution and the corresponding geometry, as well as the meaningful changes of geometry and pressure distribution toward their targets. The models were validated in viscous and inviscid compressible flows for various benchmark aerodynamic problems. In conclusion, a 70-80% decrease in computational cost was observed for online use of the machine learning module with the inverse design algorithm. This approach suggests incorporating machine learning techniques into design algorithms by exploiting their intermediate data for further improvement. We also draw a new interpretation of the LSTM network by learning dynamic changes through consecutive iterations instead of typical time-dependent problems.


Introduction
Aerodynamic shape design in internal and external flows has long been a paradigmatic problem in engineering and physical science. Overall, aerodynamic problems can be categorized into flow-field-analysis (direct) problems (Ji et al., 2022; Maeyama et al., 2022; Yuhui et al., 2022) and inverse design (ID) problems (Duan et al., 2020). In flow-field-analysis problems, the geometry is known, and the fluid flow characteristics and details, such as the velocity and pressure fields, are desired. In ID problems, the goal is to determine the target geometry for a set of desired flow parameters. This goal can be achieved through either shape optimization techniques (Halder & Kumar, 2020; Kou et al., 2023; Li, Du, et al., 2022) or ID methodologies (Nili-Ahmadabadi et al., 2021; Warey et al., 2022; Yu et al., 2020). While shape optimization methods offer more straightforward access to optimal parameters by simply making corrections to the geometry and tracking the changes in the target variable, ID techniques are more computationally efficient, since the goal is to reach a predetermined distribution of the target variable, so their search space is bounded and constrained. In addition, ID methods determine the target geometry corresponding to a target flow parameter (e.g. pressure or velocity) distribution on the solid boundary, also known as flow surface design (Dulikravich et al., 2012), or control some global property such as entropy minimization (flow field design). There are two types of ID methods: (a) coupled (non-iterative) (Chung-Yuan & Dulikravich, 1986; Hicks & Henne, 1978; Woods, 1955) and (b) decoupled (iterative) approaches (Madadi et al., 2015; Zhang et al., 2023). Coupled methods utilize a formulation in which the surface coordinates are implicitly expressed as dependent variables. Iterative ID methods solve a sequence of flow-field-analysis problems and iteratively correct the geometry until the target pressure along the boundary is achieved. The iterative approach has been extensively used in the design of two-dimensional (Chen et al., 2021) and three-dimensional airfoils (Ma et al., 2022), supercritical airfoils (Wang, Li, et al., 2022), compressor blades (Liang & Yao, 2023), and airfoil cascades (Zhu et al., 2020).
The iterative approach proves to be more robust due to its capability to be paired with different solvers and its applicability to complex flows. However, these benefits come with a significant computational cost, since a considerable amount of data regarding geometry and flow variables is produced through the intermediate steps of an iterative ID. This data is left unused after each geometry correction step; however, it can be used to train machine learning (ML) models to accelerate the ID or optimization algorithm toward its target, as shown in Figure 1. If the inverse design algorithm is stopped before convergence and after generating sufficient data (Figure 1(a)), a dataset from the early iterations of the ID algorithm is produced. This dataset can be used to train the deep learning model to learn various features of the data for predicting the target geometry (Figure 1(b)). Thus, by feeding the generated data, including the flow-field analysis of corrected geometries within the early iterations (before the convergence criteria are met), into a deep learning model, a significant decrease in computational cost is expected. However, the effectiveness of this approach depends on the accuracy of the deep learning model in predicting the target geometry. This computational cost reduction has always been a need in the industry and a driving force for all technologies in the field of aerodynamic design (Dulikravich, 1997).

Figure 1. Overview of the proposed approach: (a) An initial geometry is inputted into the inverse design algorithm to achieve a known airfoil target pressure distribution. As the inverse design algorithm changes the geometries to reach the target distribution, a dataset of geometries and their pressure distributions is created; this process is stopped at an iteration well before the convergence criteria are met. (b) The training set produced in the previous stage is utilized to train a data-driven deep learning model to estimate the target geometry from the target pressure distribution by learning the mapping between the pressure distribution and geometry as well as the trend of geometry changes.
Nowadays, ML technology has a credible position in both scientific communities and industry, as it has shown excellent results and unique potential. ML algorithms build a model based on sample data, known as training data, to make predictions or decisions without being explicitly programmed to do so (Koza et al., 1996; Mitchell, 1997). ML methods increasingly utilize a family of techniques called deep learning. Deep learning enables computational models to extract and learn new features by processing the input data in multiple layers. Each layer acts as a non-linear transformation that changes the composition or representation of the data to extract and abstract new features from the data obtained in the previous layer (LeCun et al., 2015).
The development of the backpropagation algorithm (Jackson, 1999) and multilayer neural networks (NNs) facilitated the evolution of a variety of NNs and various applications in the field of deep learning, such as the convolutional neural network (CNN) (LeCun et al., 1989) and the Long Short-Term Memory (LSTM) network (Hochreiter & Schmidhuber, 1996, 1997). In the field of fluid mechanics, NNs have been extensively used for heat transfer (Doner et al., 2023), turbomachinery (Chen, Liu, et al., 2022), turbulent flows (Teng et al., 2022), and aeronautics (Rozov & Breitsamter, 2021). Many studies on the use of super-resolution techniques for computational cost reduction of computational fluid dynamics (CFD) analysis have been conducted (Fukami et al., 2019; Kochkov et al., 2021; Ling et al., 2016).
ID (Li, Wang, et al., 2022) and optimization (Wang et al., 2023) algorithms predate ML methods in the aerodynamic design field (Chen et al., 2020). The adapted applications and interpretations of prominent NNs, such as the feedforward neural network (FNN), CNN, and LSTM network, to physical problems in fluid mechanics (Fukami et al., 2019; Liu et al., 2020) can allow further comprehension and the invention of specialized networks in fluid mechanics. Recently, Wang et al. (Wang, Wang, et al., 2022) trained a Generative Adversarial Network (GAN) with 8000 pressure distributions and their geometries obtained by CFD solution. They augmented their dataset to train two artificial NNs to obtain the correlations between pressure distribution and aerodynamic coefficients and between pressure distribution and geometry, and to optimize their nacelle geometry. Yang et al. (Yang et al., 2022) used a two-step deep learning approach to address the ambiguity and challenges of determining the target pressure distribution. Tandis and Assareh (Tandis & Assareh, 2017) developed a new population-based hybrid algorithm to improve accuracy and convergence speed in the inverse design of airfoils. Chen et al. (Chen, Sun, et al., 2022) used their numerical and experimental data as the training set to construct a data-driven ML model mapping between the design variables and aerodynamic forces of high-speed trains. In addition, data-driven mapping from the geometry space to the design variable space (Kou & Zhang, 2021) or ML-based order reduction of the design space (Hoang et al., 2022) is another approach to improve the efficiency of inverse design methods.
Recurrent neural networks (RNNs) and LSTM networks, which had a revolutionary role in speech recognition and artificial intelligence, have been used in the reduced-order modelling of dynamical systems and extreme event prediction (Vlachas et al., 2018), the development of a non-linear aeroelastic model for a flat plate (Torregrosa et al., 2021), and non-linear unsteady bridge aerodynamics (Li et al., 2020). However, there has been very little emphasis on the use of such networks, as compared to other NNs, in the field of ID (Tyan et al., 2022).
Considering the significant computational costs associated with design procedures, surrogate modelling has emerged as a popular approach to address this challenge. There has been a notable emphasis in modern surrogate modelling on utilizing artificial neural networks (ANNs) to explore the design space for optimal points of the target function (Sun & Wang, 2019). Therefore, surrogate models are commonly intertwined with optimization methods, and ANNs have been extensively used as surrogate models. While the types of input and output are quite similar across studies, the major differences lie in the methodology and application of the ANNs as surrogate models. For instance, using data obtained from parallel high-fidelity simulations or prior experimental cases, ANNs have been employed to reconstruct complex flow fields (Yu & Hesthaven, 2018), predict aerodynamic characteristics as a substitute for expensive adjoint solvers (Renganathan et al., 2021), and improve the accuracy of results from a low-resolution grid (Zhang et al., 2021). Furthermore, ANNs have been utilized as the primary inverse design tool to obtain the airfoil corresponding to a given pressure distribution (Sun et al., 2015), although this latter case, unlike optimization methods, does not strictly follow the design-variables-to-target-variables pattern.
In the proposed approach, the ML model works effectively in tandem with the primary ID module rather than replacing it, enhancing overall performance and data efficiency in the online state. The ML module is uniquely integrated into the design algorithm and leverages the data generated within the main algorithm, eliminating the need for parallel simulations to create a training set. The unconventional choice of the LSTM network allows for the interpretation of time-related dependencies as dynamic changes across consecutive iterations. This results in learning the correlation between the pressure space and the geometry space as well as identifying directional changes in the design and target variables.
Utilizing a previously trained NN or an external database as the training set, while its relevance to the evaluated data is ambiguous, is the common ground in past studies. Most similar past studies rely on large datasets containing tens to hundreds of thousands of samples, which makes them hard to work with or reproduce. The present study solely utilizes the data derived from previous geometry corrections of an ongoing ID for training the learning machine, which imposes no additional computational cost, thereby expediting the prediction of the target geometry. Our datasets, which were only as large as a few hundred pressure distributions and their corresponding geometries, were incorporated into a machine learning module to make the method self-sufficient in terms of data. This novel approach maximized the use of the geometries and flow-field analysis data generated during the initial iterations of the elastic-surface ID method to accelerate the prediction of the target geometry. The computational cost of the hybrid ML-ID method was evaluated for different airfoils in offline and online connections to the ID loop. The method can be extended to other iterative design algorithms with the necessary adjustments and without reliance on external datasets or pre-trained neural networks.

ID module
ID problems are defined by a set of governing equations for the flow field, a computational domain, boundary and initial conditions, and a geometry correction algorithm. Residual-correction ID algorithms aim to iteratively correct the initial geometry to achieve a predefined target parameter distribution, usually the pressure distribution, and minimize the residual value. In the computational domain shown in Figure 2, the far-field boundary condition is applied to L∞ via the free-stream conditions and Mach number. The no-slip boundary condition is applied to the airfoil surface (L).
The fluid flow is solved and the target parameter distribution, in this case the current pressure distribution (p_d), on the airfoil surface (L) is calculated. Since the fluid flow equations are solved on a surface other than the target geometry (L ≠ L*), a residual value (ϑ) is generated according to Eq. 1.
Thus, for a simplified set of flow equations, the inverse design problem can be formulated as the minimization of the ϑ value in the above equation by finding the surface L* that achieves this minimum. The system of flow equations and boundary conditions is solved as a well-posed inverse problem for an arbitrary initial geometry to iteratively correct the current geometry (L^(k)) and reduce the gap between the current and target pressure distributions. Therefore, the geometry in the k-th iteration must change iteratively for the residual value (ϑ) to approach zero. Iterative inverse design algorithms define the geometry correction (δL^(k)) for the geometry L^(k) in the k-th iteration to minimize the homogeneous function f, which includes the residual values based on the current (k-th) pressure distribution.
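The residual-correction loop described above can be sketched in a few lines. The paper's implementation is in MATLAB; this is a hypothetical Python sketch in which `solve_flow` and `correct_geometry` stand in for the CFD solver and the geometry correction step, and the discrete norm stands in for the residual measure of Eq. 1:

```python
import numpy as np

def residual(p_current, p_target):
    """Discrete stand-in for the residual measure ϑ of Eq. 1."""
    return np.linalg.norm(p_current - p_target)

def inverse_design(geometry, p_target, solve_flow, correct_geometry,
                   tol=1e-3, max_iter=100):
    """Generic residual-correction ID loop: solve the flow on the current
    geometry, then correct the geometry until the residual is small.
    Every intermediate (geometry, pressure) pair is kept -- this is the
    data the ML module later trains on."""
    history = []
    for k in range(max_iter):
        p = solve_flow(geometry)
        history.append((geometry.copy(), p.copy()))
        if residual(p, p_target) < tol:
            break
        geometry = correct_geometry(geometry, p, p_target)
    return geometry, history
```

With a toy solver (pressure equal to the ordinates) and a relaxation-style correction, the loop converges in a dozen iterations while accumulating the training history as a side effect.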

Elastic surface algorithm (ESA)
In this work, the elastic surface algorithm (ESA), a residual-correction method, was used for the geometry correction (δL^(k) in Eq. 2) during the ID process. In this method, the airfoil walls are considered as an elastic bent beam that deforms under the difference between the current (iteration) and target pressure distributions, as shown in Figure 3(a). Until the convergence criterion for the pressure difference is satisfied, the geometry correction continues according to the flowchart shown in Figure 3(b). The Timoshenko beam theory is used as a fluid-solid interaction (FSI) approach to provide fast and stable convergence toward the target geometry. Geometry corrections are carried out by solving for the deformation vector in Eq. 3, utilizing a finite element method.
where K, U, and F are the stiffness matrix, the deformation vector, and the acting force vector, respectively, and the iteration index is indicated by the superscript n. Further details about the ESA ID algorithm are beyond the scope of this paper and can be found in the reference (Safari et al., 2014).
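A single ESA correction step then amounts to one linear solve of K U = F. The sketch below is a hypothetical Python illustration (not the authors' FEM code): `stiffness` stands in for an already-assembled Timoshenko-beam stiffness matrix, and the pressure mismatch is used directly as the load:

```python
import numpy as np

def correct_geometry_fsi(y, p_current, p_target, stiffness):
    """One elastic-surface correction step (Eq. 3, K U = F): the
    pressure mismatch acts as a load on the elastic surface and the
    resulting deformation U updates the surface ordinates y."""
    F = p_target - p_current           # acting force from the pressure difference
    U = np.linalg.solve(stiffness, F)  # deformation vector
    return y + U                       # corrected surface ordinates
```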

CFD flow solver
The computational domain was discretized using a structured grid, as shown in Figure 4. The free-stream conditions for the simulation were a Mach number of 0.45 (M∞ = 0.45) and a Reynolds number of 10^7. To enforce the pressure far-field boundary condition, the free-stream Mach number and the angle of attack (AOA) were taken into account. A zero-degree AOA was used for the NACA-0012 airfoil, a four-degree AOA for the DSMA-523A, and a one-degree AOA for the FX63-137 airfoil.
The CFD solution was evaluated based on a compressible flow model with atmospheric boundary conditions, where the pressure and temperature were set to 1 atm and 298 K (P∞ = 1 atm, T∞ = 298 K). An inviscid flow solver numerically solved the flow over the NACA-0012 airfoil, whereas a viscous turbulent flow solver resolved the flow over the DSMA-523A and the FX63-137. The turbulence model is the one-equation Spalart-Allmaras model (Spalart & Allmaras, 1992), which calculates the turbulent dynamic viscosity. ANSYS Fluent 2022R2 (ANSYS, Canonsburg, Pennsylvania) was used as the CFD flow solver. The solution data was then extracted into a MATLAB (MathWorks, Natick, Massachusetts) code in which the discretization of the domain and the generated codes of the ID algorithm and ML model were implemented automatically as separate modules. This step was followed by the generation of the next geometry and evaluation of the convergence criterion (a maximum of 1% deviation from the target pressure).
Given that the geometry correction in the ID algorithm is based on the difference between the calculated and target pressure distributions, the independence of the pressure distribution from the grid size needs to be investigated. Figure 5 illustrates that a grid with 20,372 elements is sufficient for the CFD evaluation of the pressure distribution.
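A grid-independence check of this kind can be sketched as comparing the surface pressure distribution between successive refinements. The relative-change criterion below is a hypothetical illustration, not the authors' exact test:

```python
import numpy as np

def grid_independent(p_coarse, p_fine, tol=0.01):
    """Declare grid independence when the surface pressure distribution
    changes by less than `tol` (relative to the peak magnitude on the
    finer grid) between two successive grid refinements."""
    change = np.max(np.abs(p_fine - p_coarse)) / np.max(np.abs(p_fine))
    return change < tol
```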

ML module
The main goal of combining the ID algorithm and the ML module is the fast prediction of the target geometry at the output upon receiving the corresponding pressure distribution at the input. Therefore, appropriately defining the inputs and outputs is necessary. Additionally, the correct representation of the input and output data to achieve the maximum number of learnable features is investigated later in this section. After determining the input and output structures, the network architecture can be determined following design guidelines through a trial-and-error procedure.

Input and output format
The input and target (or output) matrices were defined based on the m points used to evaluate the pressure value or geometrical Cartesian coordinates on the airfoil surface generated in the k-th iteration out of the total of G iterations in the dataset. In the input matrix x, X_i^(k) and P_i^(k) indicate the x-component and static pressure of the i-th point on the airfoil surface in iteration k, according to Eq. 4. Similarly, in the output matrix t, X_i^(k) and Y_i^(k) denote the x- and y-components of the i-th point on the airfoil surface in the k-th iteration in Cartesian coordinates, according to Eq. 5. Typically, 24-59 design points are required for the representation of an airfoil or a rotor geometry (Chahine et al., 2012; Mueller et al., 2013). In this research, 30 points were used to parametrize the geometry and pressure distribution, meaning the value of m is equal to 30, whereas the value of G can vary depending on the dataset size used.
Since the P_i elements of the input and the Y_i elements of the output matrices are interpolated on invariable X_i points, the network is insensitive to the constant X_i elements. Therefore, all X_i elements and the first and last airfoil geometry points (Y_1 and Y_m) are excluded during preprocessing. This leaves the input and output matrices with orders of (m−2)×G and m×G, respectively. Additionally, the input and target matrices are mapped into the [−1, 1] interval to address the internal covariate shift problem as well as to unify the role of each input or output component in the loss function.
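The preprocessing steps above can be sketched as follows. This is a hypothetical Python illustration: `build_dataset` assumes (m, G) arrays sampled on fixed chordwise stations, drops the constant rows from one side to match the (m−2)×G / m×G orders stated in the text, and rescales both matrices into [−1, 1]:

```python
import numpy as np

def to_unit_range(M):
    """Linearly map a matrix into the [-1, 1] interval."""
    lo, hi = M.min(), M.max()
    return 2.0 * (M - lo) / (hi - lo) - 1.0

def build_dataset(P, Y):
    """Assemble input/target matrices from G sampled ID iterations.
    P, Y: (m, G) arrays of surface pressures and ordinates on fixed
    chordwise stations X. The constant X rows carry no information for
    the network and are omitted entirely; the endpoint rows are dropped
    from the input side to give the (m-2) x G order from the text."""
    x = to_unit_range(P[1:-1, :])   # (m-2) x G input matrix
    t = to_unit_range(Y)            # m x G target matrix
    return x, t
```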
Early investigations showed poor results when using the L1 loss function. The error function was therefore chosen in the form of the L2 loss function, L(t, y), also known as the mean squared error (MSE), according to Eq. 6, where N_P and N_G indicate the number of points and the total number of geometries in the dataset, whereas t_ij and y_ij represent the elements of the NN's target and output matrices, respectively.
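Under the definitions above, the loss of Eq. 6 is simply the squared error averaged over all N_P points and N_G geometries, for example:

```python
import numpy as np

def l2_loss(t, y):
    """L2 (MSE) loss of Eq. 6, averaged over all N_P points and
    N_G geometries in the target matrix t and output matrix y."""
    n_p, n_g = t.shape
    return np.sum((t - y) ** 2) / (n_p * n_g)
```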

Design and training of NNs
There are two main approaches for designing the architecture of a NN: (a) creating a model that is large enough to solve the problem and gradually reducing the model size to achieve a smaller model with an acceptable error, and (b) selecting a very small model and incrementally expanding it until the accuracy of the output reaches a desirable margin of error. In this work, the second approach was chosen because it reduces the model's complexity, computational cost, and RAM usage. This strategy follows a rule known as Occam's razor (Blumer et al., 1987), which prefers a simpler model due to its higher generalizability potential. In programming, the Minimum Description Length principle is used to measure and compare the complexity of possible models (Allender, 1992; Schmidhuber, 1997). This principle indicates that a NN with a lower complexity of weights corresponds to a higher probability in the Bayesian view (Neal, 1996; Titterington, 2004) and a higher performance with more generalizability (Baum & Haussler, 1989). In other words, the model is less likely to overfit the training data. After a thorough investigation and evaluation of different architectures, an optimal architecture was designed for a deep feedforward neural network (DFNN), shown in Figure 6, with four hidden layers and 313 parameters. Figure 7 illustrates the capability of the deep learning model in estimating the target geometry from different pressure distributions of various airfoils in the validation set. The noisy and inaccurate output of the model is evident before the optimization of the network architecture, settings, and preprocessing of the data.
In an FNN, each layer is described as a linear transformation, and each input is connected to all outputs of the next layer, as shown in Figure 6. In many advanced networks, such as CNNs and LSTM networks, the number of connections is much smaller, so each input unit is only connected to a small subset of output neurons in each layer (Goodfellow et al., 2016). CNNs are mainly used in computer vision and statically structured applications, whereas LSTM networks are usually employed for dynamic problems such as transforming a dynamic signal into another signal or a numerical vector.
For any perception of the structures and features in the input data, a certain depth of the network architecture is required. In special applications, FNNs are unable to reach such depth due to the aforementioned points. This results in networks with a small width or a complex response manifold, which are not desired since they are hard to train and can easily overfit. As seen in Figure 8, Goodfellow et al. (Goodfellow et al., 2014) also confirmed that increasing the number of parameters without increasing the number of layers and the depth of the network does not improve the network performance.
Because very small datasets are used, improving the deep learning model's performance requires recognizing the correlations within the input data and its meaningful changes during the geometry corrections. RNNs are the deepest of all NNs; they are general computers more powerful than FNNs, and in principle they can create and process memories of arbitrary sequences of input patterns (Siegelmann & Sontag, 1991). RNNs, in comparison to FNNs, use their output from the previous state as feedback, which helps them to remember their previous states. For this reason, they are used when there is a meaningful order or sequence in the input data. Hence, using the ID data to train such networks is reasonable, because the pressure/geometry corrections are ordered and meaningful toward their target.
In this paper, the input pressure and output geometry vectors are interpreted as two dynamic signals that change continuously due to the ID shape modification. Therefore, the LSTM network enables the ML module to learn not only the pressure-geometry correlation but also short- and long-term dependencies, indicating that it is able to track the geometry changes due to its recurrent structure. The architecture of an LSTM unit is shown in Figure 9. Deciding how much of the new information should be added to the cell state (C_t) is done using Eq. 7 and Eq. 8 in the input gate. The subscripts i, c, f, and o differentiate the weight matrices (W) and bias vectors (b) used to produce the gate values i_t, c_t, f_t, and o_t through the sigmoid function (σ) shown in Figure 9. The subscript t indicates the time step of each value and of the hidden output h_t.
In the forget gate, it is decided how much of the past information should be forgotten, based on the previous time step's hidden output (h_{t−1}) and the current time step's input (x_t), according to Eq. 9.
In the output gate, after the final value of the cell state is determined using the candidate value (C̃_t) and the previous time step's cell state (C_{t−1}) according to Eq. 10, the final output value for this time step (h_t) is calculated from the o_t and C_t values, as described in Eq. 11 and Eq. 12.
All variables, paths of information, and more details are shown in Figure 9.
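A single LSTM time step, as described by Eqs. 7-12, can be written out explicitly. This is a generic sketch of the standard LSTM cell (not the authors' implementation); the weight matrices act on the concatenation of the previous hidden output and the current input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eqs. 7-12. W and b are dicts of weight
    matrices and bias vectors keyed by gate ('i', 'c', 'f', 'o'); each
    W[k] acts on the concatenated vector [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W['i'] @ z + b['i'])      # input gate value        (Eq. 7)
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # candidate cell value    (Eq. 8)
    f_t = sigmoid(W['f'] @ z + b['f'])      # forget gate value       (Eq. 9)
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state          (Eq. 10)
    o_t = sigmoid(W['o'] @ z + b['o'])      # output gate value       (Eq. 11)
    h_t = o_t * np.tanh(c_t)                # hidden output           (Eq. 12)
    return h_t, c_t
```

A bi-LSTM layer, used later in the paper, simply runs two such recurrences, one forward and one backward over the sequence, and concatenates their hidden outputs.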
The loss function for the LSTM network is the same L2 loss function defined in Eq. 6. Early evaluation of the LSTM network showed that substituting the normal LSTM layers with bidirectional LSTM (bi-LSTM) layers and keeping the constant geometrical points (at the leading and trailing edges) in the output target matrix (adjusting them later) improves the network performance, especially on smaller datasets, as shown in Figure 10. Although this practice is uncommon in the field of computer science, from the ID and fluid mechanics perspective, estimating the static leading- and trailing-edge points allows the LSTM network to make a better estimation of the full geometry. Hence, the number of responses (NR) in the LSTM output increased from 28 points to 30.
The substitution of a normal LSTM layer with a bi-LSTM layer enables the data and learned features to flow in both the upstream-to-downstream and downstream-to-upstream directions. Treating the 30 points used to evaluate the pressure and geometry values on the airfoil surface as a time series, in a normal LSTM layer the influence of the 30th pressure point would be neglected because the 30th point of the geometry is fixed. In a bi-LSTM layer, however, the input pressure points at the end of the time series, physically located near the trailing edge, can affect the earlier geometry points in the output, which is more consistent with the physics of the problem.
The networks' architectural design was carried out by dividing the data generated from the ID of the NACA-0011 airfoil into training, validation, and test sets. The first 70% and the following 30% of the dataset (excluding the target geometry data) were assigned to the training and validation sets, respectively. The target geometry data, i.e. the final geometry achieved in the ID and its corresponding pressure distribution, were assigned to the test set.
The ADAM algorithm (Kingma & Ba, 2014) was used as the training function of the LSTM network. This algorithm updates the network parameters θ using the gradient of the loss function, E(θ), and its squared value, where l indicates the iteration number, β_1 and β_2 are the decay rates for producing the element-wise moving averages m_l and v_l of the loss function gradient and its squared value with an added momentum term in Eq. 13 and Eq. 14, respectively, α is the learning rate, and ε is a small value that prevents the denominator of Eq. 15 from becoming zero (Bishop, 2006; Murphy, 2012; Pascanu et al.).

A schematic of an LSTM layer's input and output configuration and the nomenclature used in this work are shown in Figure 11. To achieve the best physical connection and feature extraction from the input and output data for a better estimation of the target geometry, the LSTM network was evaluated with three different representations of the input and output data. Table 1 contains the architecture of the designed networks for the three studied representations.

First, the number of pressure distributions in the input and their corresponding geometries in the output was considered as the Number of Observations (NO). The pressure and geometry changes on the airfoil surface were interpreted as a time series; hence, the number of variables used to parametrize the geometry/pressure distribution was considered as the Sequence Length (SL). In this representation, each geometry correction is interpreted as a separate observation. Since the number of ID iterations is almost always greater than the number of pressure/geometry variables, this representation maximizes NO (Figure 12).
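The ADAM update described earlier (Eqs. 13-15) can be sketched as a single step. This is the standard form of the algorithm; the exact placement of the paper's added momentum term may differ slightly, so take this as an assumed baseline rather than the authors' exact update:

```python
import numpy as np

def adam_step(theta, grad, m, v, l, alpha=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update: moving averages of the gradient (m, Eq. 13) and
    its element-wise square (v, Eq. 14), bias-corrected by the iteration
    number l, drive the parameter update (Eq. 15)."""
    m = beta1 * m + (1 - beta1) * grad           # Eq. 13
    v = beta2 * v + (1 - beta2) * grad ** 2      # Eq. 14
    m_hat = m / (1 - beta1 ** l)                 # bias correction
    v_hat = v / (1 - beta2 ** l)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Eq. 15
    return theta, m, v
```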
In the second representation, all data were interpreted as a single observation (NO = 1) and the geometry/pressure points were assembled in order to create a sequence of length 155 × 30 to increase the feature extraction in time. This representation interprets all the changes across each individual point and all geometry corrections as one time series. Since all input/output variables were taken to have the same role, the Number of Features (NF) equals 1, and this representation maximizes SL, with a length of 4650, according to Figure 13.
In the third representation, each geometry/pressure point was interpreted as a separate feature, giving a total of 30 features (NF = 30). This means that each point is tracked across the geometry/pressure distribution corrections of the ID, and that the corrections are interpreted as a time series forming a single observation. In other words, it is assumed that, at each time step, a new pressure distribution/geometry with 30 variables is received, in which every variable is independent of the others. Since any padding and trimming would hurt the physics of the problem, and due to the similarity of this representation's output mode to that of the second representation, no new architecture needed to be evaluated. This representation tries to mimic the ID behaviour and physics; having the highest number of features among all the representations, it is based on maximizing NF, as depicted in Figure 14.
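The three representations differ only in how the same G × m table of samples is reshaped into (observations, sequence length, features). A minimal sketch, using the paper's G = 155 corrections and m = 30 points (the array contents here are placeholder values):

```python
import numpy as np

# Hypothetical dataset: G = 155 geometry corrections, m = 30 surface points.
G, m = 155, 30
pressures = np.arange(G * m, dtype=float).reshape(G, m)

# Max.NO: each correction is a separate observation; the m points along
# the surface form the sequence (NO = G, SL = m, NF = 1).
max_no = pressures.reshape(G, m, 1)

# Max.SL: all corrections chained into one long sequence
# (NO = 1, SL = G * m = 4650, NF = 1).
max_sl = pressures.reshape(1, G * m, 1)

# Max.NF: each surface point is a feature tracked across the corrections
# (NO = 1, SL = G, NF = m).
max_nf = pressures.reshape(1, G, m)
```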
The three previously mentioned representations are named Max.NO (Figure 12), Max.SL (Figure 13), and Max.NF (Figure 14), respectively. The DFNN and the LSTM network under these representations were compared in terms of their capability in estimating the target geometry (loss function value on the test set) and their computational cost, in Figures 15 and 16, respectively. As shown in Figure 15, the minimum error (or loss) on the smallest dataset, with only 7% of the initial ID data, belongs to the LSTM network under the Max.NO representation. The LSTM network was not only able to estimate the target geometry with a lower error but also showed better consistency in loss value reduction with increasing dataset size, as compared to the DFNN. Although the DFNN has a lower error than the Max.SL and Max.NF representations, the LSTM network with the Max.NO representation generally has a lower error, especially on smaller datasets. For example, the Max.NO LSTM error on the second smallest dataset (10% of all data) is approximately one order of magnitude lower than that of the DFNN, according to Figure 15.
According to Figure 15, the network's error decreases exponentially with increasing dataset size. Therefore, providing the maximum information, whether through the correct choice of input/output data representation or through the utilization of the maximum data in each evaluated dataset, has a crucial role in improving the NN's performance in estimating the target geometry. Accordingly, once the network architecture, the correct representation, and the other hyperparameters were determined, regularization was used instead of early stopping in the network training process. Switching to regularization enables one to use the maximum data in each dataset without having to spend a noticeable portion of the dataset on the validation set, which also increases the accuracy.

As shown in Figure 16, the minimum computational cost belongs to the DFNN due to its smaller number of network parameters. However, the inconsistencies of its loss function value with increasing dataset size are noticeable. These inconsistencies can be reduced by increasing the number of evaluated initial points in the network's training process, although this significantly increases the computational cost. In Figure 16, the Max.NF representation has the lowest computational cost among the LSTM representations, while having the highest correspondence to the ID algorithm physics.

Results
After the network design procedure and the choice of the correct architecture and appropriate representation of the input/output data were accomplished, the NN was evaluated and validated in its ability to estimate the target geometry by utilizing the data generated during previous iterations of the ID. Therefore, the LSTM network with the Max.NO representation was used for its superior performance, especially on smaller datasets.

Hybrid ID-ML module in the offline state
In the offline connection between the ID and ML modules, the ID loop has no direct connection with the ML loop. In other words, the ML loop only uses a subset of the total data generated during the ID, without redirecting the NN's output geometry back into the ID loop. This dataset, obtained from a previously converged ID, is introduced to the network as a training dataset for the ML model to estimate the target geometry (Figure 17). These datasets follow the order of the ID geometry corrections, starting from the initial geometry. To determine the reduction percentage of geometry corrections when using the ML module to estimate the target, the dataset size starts from a small value and increases until the deep learning model reaches an acceptable estimate of the target geometry. Hence, if n(Tr), n(Va), and n(Te) represent the numbers of members of the training, validation, and test sets, respectively, the dataset size n(D) can be computed from n(D) = n(Tr) + n(Va) + n(Te). A small n(D) with respect to the total available data for estimating the target geometry is desired.
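The dataset bookkeeping above can be sketched as follows. Only the relation n(D) = n(Tr) + n(Va) + n(Te) comes from the text; the function name, the 70/15/15 default split ratios, and the iteration-ordered slicing are illustrative assumptions.

```python
def split_dataset(pairs, train_frac=0.7, val_frac=0.15):
    """Split the ID-generated (pressure, geometry) pairs into training,
    validation, and test sets while preserving the iteration order of the
    geometry corrections. The split ratios are illustrative assumptions."""
    n_d = len(pairs)                      # n(D)
    n_tr = int(train_frac * n_d)          # n(Tr)
    n_va = int(val_frac * n_d)            # n(Va)
    n_te = n_d - n_tr - n_va              # n(Te) = n(D) - n(Tr) - n(Va)
    return pairs[:n_tr], pairs[n_tr:n_tr + n_va], pairs[n_tr + n_va:]

tr, va, te = split_dataset(list(range(100)))
assert len(tr) + len(va) + len(te) == 100   # n(D) = n(Tr) + n(Va) + n(Te)
```

Keeping the slices contiguous (rather than shuffling) preserves the ordered progression of the ID data, which the LSTM relies on.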
As mentioned previously, each network was evaluated from 10 initial points (initial weights) to ensure that the NN training process was not trapped in a local minimum. This method is fairly common, but it imposes a huge computational cost, as the NN training cost increases tenfold. Additionally, since the validation set is excluded for maximum training data efficiency, no solid criterion is available to choose between networks trained from different initial weights. In this work, however, an innovative approach was used, which assumes that the lower the chosen network's error on an unseen subset of data, the higher the chance that this network will be closer to the global minimum after the training process. Therefore, the dataset was initially divided into training and validation sets with 70% and 30% ratios, named the 'initial training set' and 'initial validation set,' respectively. The NN weights were initialized and the network was trained with a very small learning rate for a few iterations; this process was repeated for 100 different initial points. The weight matrix (W) of the network with the best performance on the initial validation set was then used to initialize the main network, which was trained on the whole dataset as the training set using the L2 regularization method (Eq. 16), in which the regularization parameter, λ, was determined beforehand.
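The multi-start selection procedure above can be sketched as follows. The 70/30 split and the 100 restarts follow the text; the weight-vector size and the stub loss function are illustrative assumptions standing in for the brief low-learning-rate training and validation evaluation.

```python
import random

def brief_training_loss(weights, init_train, init_val):
    """Stub standing in for a few low-learning-rate training iterations
    followed by evaluation on the initial validation set. In the paper this
    is the actual NN training; here it is only a deterministic placeholder."""
    return sum(w * w for w in weights) / len(weights)

def select_initial_weights(data, n_starts=100, n_weights=8, val_frac=0.3):
    """Multi-start initialization described above: split the data 70/30 into
    an 'initial training set' and 'initial validation set', briefly train
    n_starts random initializations, and keep the weight matrix W that
    performs best on the initial validation set. W then initializes the main
    network, which is trained on the whole dataset with L2 regularization
    (Eq. 16). n_weights and the stub loss are assumptions."""
    split = int((1 - val_frac) * len(data))
    init_train, init_val = data[:split], data[split:]
    best_w, best_loss = None, float("inf")
    for seed in range(n_starts):
        rng = random.Random(seed)
        w = [rng.gauss(0.0, 0.1) for _ in range(n_weights)]
        loss = brief_training_loss(w, init_train, init_val)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

The key design point is that the 100 restarts are cheap (a few iterations each), so the selection adds little cost compared to fully training 10 networks as in the conventional approach.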

Benchmark problems in aerodynamic inverse design
A. NACA-0012. The NACA-0012, a symmetrical airfoil, was considered to test and verify the developed hybrid ML-ID method. An inviscid flow solver solves the Euler equations to calculate the pressure distribution at each geometry correction step. A pressure far-field boundary condition with a Mach number of 0.45 and zero angle of attack was considered for the flow analysis (Figure 4). The ID of the airfoil was carried out with two different initial geometries to investigate the effect of the initial geometry on the deep learning model's performance. The initial and converged geometries, with a maximum 1% deviation from the target pressure distribution, are shown for the two initial geometries in Figure 18.
Incorporating deep learning techniques into the ID algorithm for estimating the target geometry increases the convergence rate and reduces the computational cost. During the ID, especially for complex flows and geometries, the convergence rate drops after getting close to the target geometry. Since each geometry correction in the ID is computed from the difference between the current and target pressure distributions, convergence toward the target geometry is very fast during the early iterations and then slows down significantly as the pressure differences shrink (Figure 19). Hence, a considerable number of geometries and their corresponding flow data are produced fairly near the target geometry (Figure 19). Therefore, the deep learning model is able to accelerate the ID process by estimating the target geometry from the previously generated data, especially the data close to the target geometry, removing this weak point of the ID. The geometry corrections toward the target geometry during the ID are shown in Figure 19. Altogether, 157 geometry corrections were needed to reach the target pressure distribution with 1% maximum error, producing 158 geometries and their respective pressure distributions. Now, the conditions for using the LSTM network under the Max.NO representation are met. To evaluate the network performance, the first N (an arbitrary number) data generated by the ID are used to train the network, which then estimates the target geometry as the output given the target pressure distribution as the input. The plots of the LSTM network training process and its error histogram are shown in Figure 20 (a) and (b), respectively. Figures 21 and 22 show the LSTM network's estimation of the target geometry using the mentioned dataset sizes out of the original database of 158 data, before and after smoothing by the Bezier curve.
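The "learning dynamic changes through consecutive iterations" idea can be made concrete with a small data-preparation sketch: the first N ID iterations are arranged as ordered sequences for the LSTM, so that each sample exposes the direction of change from one geometry correction to the next. The function name and the window length are illustrative assumptions, not details from the paper.

```python
def build_iteration_sequences(pressures, geometries, n_train, window=5):
    """Arrange the first n_train ID iterations as supervised LSTM samples:
    each input is a window of consecutive (pressure, geometry) pairs in
    iteration order, and the target is the geometry of the following
    iteration. The sequence axis is thus the ID iteration index rather than
    physical time. The window length is an illustrative assumption."""
    samples = []
    for i in range(n_train - window):
        inputs = [(pressures[j], geometries[j]) for j in range(i, i + window)]
        target = geometries[i + window]
        samples.append((inputs, target))
    return samples
```

At inference time, the target pressure distribution replaces the last pressure in the window, and the network extrapolates the geometry it should correspond to.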
As shown in Figure 23, the NN estimated the target geometry within a 1% error margin between the final and target pressure distributions by using only the first 28% of the total data. Even with smaller datasets, the developed NN obtained a good estimate of the target geometry in the leading-edge area (Figure 21), which is the last and hardest area to converge during the ID. Hence, achieving the convergence conditions with even smaller datasets is very likely if the ID algorithm and ML module are incorporated in an online state. In this way, given the ML module's demonstrated capability in estimating high-curvature areas such as the leading edge, returning the estimated geometry to the ID algorithm can yield faster convergence after a few geometry corrections.

B. DSMA-523A.
The lower surface of the DSMA-523A was chosen to verify the hybrid ML-ID method, owing to its high curvature and the existence of a turning point on its surface. The free-stream conditions are the same as those of the previous case (Mach 0.45 and 1 atm). However, the ID was carried out in a viscous flow at a 4-degree angle of attack. Convergence was achieved after 261 geometry corrections. The thickness distribution of the initial geometry, as shown in Figure 24, is 50% of that of the DSMA-523A airfoil. Figure 24 illustrates the slower convergence rate compared to the NACA-0012 case, due to the geometry and fluid flow complexities, by showing some of the intermediate geometries. The training process and performance of the LSTM network with 56% of the total data generated during the ID process are shown in Figure 25 (a) and (b), respectively.
Figures 26 and 27 show the NN's estimation of the lower surface of the target geometry for training datasets containing the first 52% and 56% of the total data, respectively. As shown in Figure 27, the NN estimated the target geometry correctly by using only the first 56% of the total ID data. The predicted geometry is considered converged upon reaching the maximum 1% deviation from the target pressure distribution, as depicted in Figure 28.
However, as shown in Figure 29, the ML module requires a much smaller training dataset (only 46% of the total data) to give a good estimate of the target geometry. Therefore, considering the faster convergence rate of the ID algorithm in low-curvature areas, a faster convergence rate and higher computational cost reduction are expected by integrating the ML and ID modules in the online state rather than the offline state. This enables the ML module to predict a geometry rather close to the target with much smaller training datasets and to converge to the target geometry by passing a few shape modifications through the ID loop.
C. FX63-137. The FX63-137 asymmetrical airfoil was chosen for its high curvature, complex geometry, and popularity in wind turbine applications. The free-stream conditions are the same as in the previous case, and the ID was conducted at a one-degree angle of attack in a viscous flow regime. The objective was to achieve a maximum deviation of 3% from the target pressure distribution, which was accomplished through 336 geometry corrections. Due to its geometric complexity, this case required the highest number of geometry corrections of all the cases. Figure 30 illustrates the initial geometry as well as some intermediate geometries and the target geometry.
In this case, as both surfaces went through the inverse design, the LSTM network was employed twice, with each network dedicated to predicting one surface. The networks were trained separately and used independently to predict their respective surfaces. Figure 31 (a) and (b) show the changes in the loss function during the training of the NNs for the upper and lower surfaces, respectively, with 52% of the FX63-137 data. The approximated airfoil using 52% of the FX63-137 ID data, as well as the target geometry, is shown in Figure 32; the NN output is considered converged within a 3% margin of the target pressure distribution, as demonstrated in Figure 33.
In the previous section, the target geometry was estimated by training an LSTM network on a small portion of the ID data and giving the target pressure distribution as the input. Considering the computational cost of each shape modification in the ID and the time required for training the deep learning model, the computational cost of the hybrid ML-ID method is reduced in proportion to the difference between the percentage of data used for training and the total data. Table 2 and Figure 34 show this computational cost reduction, taking into account the average time required for each shape modification during the ID, the time required for training the NN, and the reduced number of shape modifications needed to reach the final geometry. For the FX63-137 airfoil, the computational cost of training the NN for each surface was taken into account, as listed in Table 2 for the upper and lower surfaces, respectively.
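The cost accounting described above can be sketched numerically. The formula below is an assumption about the bookkeeping behind Table 2 and Figure 34 (the paper's exact accounting may differ): the hybrid method pays for only the geometry corrections used as training data plus the NN training time.

```python
def offline_cost_reduction(n_total, n_train, t_correction, t_training):
    """Fractional cost reduction of the offline hybrid ML-ID method versus
    the pure ID: n_train of n_total geometry corrections are performed, each
    costing t_correction on average, and the NN training time t_training is
    added back. This formula is an illustrative assumption."""
    pure_id_cost = n_total * t_correction
    hybrid_cost = n_train * t_correction + t_training
    return 1.0 - hybrid_cost / pure_id_cost

# NACA-0012 example: using the first 44 of 158 iterations (~28% of the data)
# with the NN training time neglected gives roughly a 72% reduction,
# consistent with the figure quoted in the conclusion.
reduction = offline_cost_reduction(158, 44, 1.0, 0.0)
```

When the NN training time is not negligible, it directly erodes the reduction, which is why the per-surface training cost is tracked separately for the FX63-137 case.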

Hybrid ID-ML module in the online state
In this section, the ID of the DSMA-523A and FX63-137 is accomplished by using the hybrid ML-ID module in the online state. The boundary and far-field conditions are the same as those mentioned in section 1.
According to Figure 35, after a certain number of iterations of the ID algorithm, an initial dataset is formed and the NN is trained on the available data. The target pressure distribution is then fed into the NN as the input, and the estimation of the target geometry is received as the output. This output is afterwards returned to the ID loop for any further shape modifications required. This process is repeated until the next specified step of using the ML module is reached or the convergence condition is met. In this way, a good estimate of the target geometry is expected from the deep learning model, with further minor geometry corrections performed by the ID algorithm.

Benchmark problems in aerodynamic inverse design
A. DSMA-523A. The ID of the DSMA-523A, which was achieved in 262 iterations without the ML module, was repeated with the online incorporation of the ML module into the ID module. Initially, after 75 iterations, the data were fed into the ML algorithm; the estimation of the target geometry by the LSTM network is shown in Figure 36. After the output was returned to the main ID loop and 25 further geometry corrections were passed, the data were fed into the LSTM network once again; the network's estimation of the target geometry at this step is shown in Figure 37.
After the implementation of the ML module at the 101st step, the output geometry was fed back into the ID loop, and the ID converged at the 110th iteration. Figure 38 illustrates the geometries and corresponding pressure distributions of the initial, target, and ML-estimated geometries at the 76th and 101st iterations, and the converged geometry at the 110th iteration.
Figure 39 depicts the variation of the dimensionless Pressure Error Function (PEF) across iterations, according to Eq. 17, in which M denotes the number of points on the airfoil's surface. A significant reduction of the PEF can be observed at the iterations where the ML module was used.
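Eq. 17 itself is not reproduced in this excerpt; one plausible dimensionless form consistent with the description (a sum over the M surface points of the normalized pressure mismatch) would be the following, which is an assumption rather than the paper's exact definition:

```latex
\mathrm{PEF} \;=\; \frac{1}{M} \sum_{i=1}^{M}
\left| \frac{p_i - p_i^{\text{target}}}{p_\infty} \right|
```

Any such form vanishes at convergence and is dimensionless, which is all the surrounding discussion relies on.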
Figure 40 compares the computational cost of the pure ID and the hybrid ML-ID module in the offline and online states. As indicated, the reduction in computational cost for the online hybrid ML-ID module is 57.39%, which is 14% more than that of the offline hybrid ML-ID module.
To study the effect of the iteration step at which the ML module is used during the ID, the LSTM network was instead first used after 25 geometry corrections of the ID. The output of the LSTM is compared to the target geometry in Figure 41. Then, the LSTM output was fed back into the ID loop and, after 25 geometry corrections by the ID, it was re-inputted into the ML module. Figure 42 shows the output of the second use of the LSTM network and compares the output geometry with the target. Notably, immediately after the second use of the LSTM network, the ID converges, meeting the 1% maximum deviation from the target pressure distribution criterion at the 51st iteration. Figure 43 illustrates the geometries and corresponding pressure distributions for the initial, target, and ML-estimated geometries at the 26th and 51st iterations, and the converged geometry at the 51st iteration.
Figure 44 shows the changes in the PEF, using Eq. 17, across iterations. A significant reduction in the PEF value is seen at the iterations corresponding to the first and second uses of the ML module. Figure 45 compares the computational cost of the pure ID as well as the offline and online hybrid ML-ID modules. As indicated, the reduction in computational cost for the online hybrid ML-ID module is 80.55%, which is 37% higher than that of the offline hybrid ML-ID module.

B. FX63-137.
Using the ID module alone, the FX63-137 airfoil converged to within 3% deviation from the target pressure in 337 iterations. In this section, the online ML-ID module is employed, with the ML module used every 25 iterations starting from the 50th iteration, in a similar fashion to the DSMA-523A case. The only difference is that the online ML-ID module was applied to both the upper and lower airfoil surfaces, even though the FX63-137 is geometrically more complex. Figure 46 shows the output of the NN trained on the data generated during the first 50 iterations. The NN estimation was fed back into the ID loop and subsequently passed to the ML module again after 25 ID iterations; the estimated geometry at the 76th iteration is shown in Figure 47.
Similarly, the ML module was used again at the 101st iteration after another 25-iteration step, according to Figure 48. The final geometry was attained after the output was fed back into the ID module for three more geometry corrections. The estimated geometries at the 51st, 76th, and 101st iterations, as well as the initial, target, and converged geometries and their corresponding pressure distributions, are illustrated in Figure 49.
Figure 50 shows the variation of the PEF on each surface of the FX63-137 airfoil. Significant drops are observed at the iterations where the ML module was employed, with one exception. The only instance of a PEF jump is at the 51st iteration on the upper surface, where the NN makes a flawed prediction of the lower surface that is amplified by the change of the upper surface. However, the NN learned and adjusted the direction of the changes, making improvements at the 76th and 101st iterations that compensate for it. Another notable effect of the online use of the ML-ID module is the intensified effect of the ID corrections immediately after the ML usage. As shown in Figure 50, noticeable pressure-error drops occurred on the lower surface after the 76th and 101st iterations. As intended for the online state, this intensified drop in the PEF is a deliberate outcome of the ML prediction, after which the ID module can make slight local adjustments to the points that contribute the most to the pressure distribution deviation. Consequently, this results in the intensified PEF drops in the follow-up iterations.
As shown in Figure 51, the online ML acceleration of the aerodynamic ID algorithm resulted in a 68.51% computational cost decrease compared to the pure ID, and a 39.69% reduction compared to the offline ML-ID module.

Conclusion
This paper proposed a novel approach for incorporating deep learning models into an iterative inverse design algorithm, maximizing the use of the data generated in the early stages of the inverse design procedure to reduce its computational cost. The inverse design algorithm was stopped well before the convergence iteration, and the data from the early iterations were used to train the learning machine, which, unlike in past studies, imposed no further computational cost. The neural network was not pre-trained, and the provenance of the training data was as transparent as the inverse design algorithm itself, as shown in multiple figures. The datasets were only as large as a few hundred pressure distributions and their corresponding geometries. This was feasible due to the nature of inverse design methods, in which the pressure distributions and geometries change in an ordered and meaningful way toward their targets, preventing the learning machine from making inappropriate extrapolations despite the use of very small and unconventional datasets. Furthermore, these meaningful changes were exploited to teach an LSTM network and increase its performance. A new interpretation of the LSTM network was drawn, in which learning dynamic changes through time, as in classic problems, was replaced with learning dynamic changes through consecutive iterations.
A deep learning model was first carefully designed and optimized by evaluating different structures and representations of the input and output data under different interpretations, allowing the deep learning model to learn different sets of features. The results were compared in terms of mean squared error and computational cost to select the best machine learning model to combine with the elastic surface inverse design method. The results showed that the hybrid inverse design-machine learning method is not only able to detect the correlation between the input pressure distributions and output geometries, but can also detect the direction of their changes toward the corresponding targets. The hybrid machine learning inverse design was first used in the offline combination to validate the machine learning module and its competency to predict the target on its own, resulting in 72%, 44%, and 48% computational cost decreases for the NACA-0012, DSMA-523A, and FX63-137, respectively. Combining the strengths of both modules in the online state, the reduction percentage was even higher, reaching up to 70% and 80% computational time reductions for the two latter cases.
This method proved robust for high-Reynolds-number flows across various airfoil geometries. With appropriate adjustments, the methodology used in this research can be applied to other iterative design methods to enhance their performance and data efficiency, without the need to import any external datasets or neural networks. Furthermore, the data generated during the flow-field analysis of each geometry, including flow-field variables other than pressure, can be leveraged to increase the method's robustness even further and to address other aspects and weaknesses of that particular method. Finally, a better interpretation of neural network concepts in aerodynamics paves the way for more effective use and development of specialized neural networks.

Figure 2 .
Figure 2. Schematic of the computational domain, airfoil surface L, and far-field boundary condition L∞.

Figure 3 .
Figure 3. (a) Schematic of the beam model used for the airfoil walls in ESA. (b) Methodology of the finite element solution to the non-linear beam equations.

Figure 4 .
Figure 4. Computational grid and boundary conditions for the investigation of the computational fluid dynamics (CFD) solution.

Figure 5 .
Figure 5. Evaluation of the grid-size independency of the pressure distribution for the (a) lower and (b) upper surfaces.

Figure 6 .
Figure 6. Designed architecture of the DFNN for target geometry estimation.

Figure 7 .
Figure 7. Deep learning model output for various airfoils in the validation set: (a), (b), (c), and (d) before the optimization of the network, and (e), (f), (g), and (h) after the improvements to the network architecture, settings, and preprocessing of the data.

Figure 8 .
Figure 8. Effect of increasing the network parameters, without increasing the number of layers and the network's depth, on its performance on the test set (Goodfellow et al., 2014).

Figure 9 .
Figure 9. Architecture of an LSTM unit with inputs, outputs, and key parameters.

Figure 10 .
Figure 10. Impact of the preservation of the constant geometrical points and the NR increase on the LSTM performance.

Figure 11 .
Figure 11. Schematic of an LSTM layer's input and output configurations and its terms.
Each network was trained and evaluated with 10 different weight initializations.

Figure 15 .
Figure 15. Loss function value on the test set for the DFNN and LSTM network under the described representations, across differently sized datasets of inverse design data. A lower loss value corresponds to a better estimation of the target geometry.

Figure 16 .
Figure 16. Computational cost of training the DFNN and LSTM network under various input representations of the data, for differently sized datasets of inverse design data.

Figure 17 .
Figure 17. Flowchart of incorporating the machine learning techniques into the inverse design algorithm in the offline mode.

Figure 18 .
Figure 18. The initial and final geometries with a maximum 1% deviation from the target pressure distribution for the two initial geometries: (a) case 1: 50% thickness of the target airfoil; and (b) case 2: a linear thickness distribution of 150% to 50% of the target airfoil from the leading edge to the trailing edge.

Figure 19 .
Figure 19. Geometry corrections from the initial geometry toward the final geometry during the inverse design of the NACA-0012 airfoil, with 50% of the target geometry's thickness as the initial geometry.

Figure 20 .
Figure 20. The LSTM network performance using 28% of the total data generated in the inverse design of the NACA-0012 airfoil as the dataset: (a) the training and test set error changes during network training and (b) the error histogram.

Figure 21 .
Figure 21. Comparison of the NN output and the target geometry using the first 22% of the NACA-0012 total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 22 .
Figure 22. Comparison of the NN output and the target geometry using the first 28% of the NACA-0012 total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 23 .
Figure 23. Initial and target geometries as well as the converged ML-estimated geometry (right-hand side) and their corresponding pressure distributions (left-hand side), using only 28% of the NACA-0012 total data.

Figure 24 .
Figure 24. Geometry corrections from the initial geometry toward the final geometry during the inverse design of the DSMA-523A airfoil, with 50% of the target geometry's thickness as the initial geometry.

Figure 25 .
Figure 25. The LSTM network performance using 56% of the total data generated during the inverse design of the DSMA-523A airfoil as the dataset: (a) the training and test set error changes during network training and (b) the error histogram.

Figure 26 .
Figure 26. Target geometry and NN output using the first 52% of the DSMA-523A total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 27 .
Figure 27. Target geometry and NN output using the first 56% of the DSMA-523A total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 28 .
Figure 28. Initial and target geometries as well as the converged ML-estimated geometry (right-hand side) and their corresponding pressure distributions (left-hand side), using only 56% of the DSMA-523A total data.

Figure 29 .
Figure 29. Target geometry and NN output using the first 46% of the DSMA-523A total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 30 .
Figure 30. Geometry corrections from the initial geometry toward the target geometry during the inverse design of the FX63-137 airfoil, with 50% of the target geometry's thickness as the initial geometry.

Figure 31 .
Figure 31. The training and test set error changes of the LSTM network, using 52% of the total data generated during the inverse design of the FX63-137 airfoil as the dataset: (a) upper surface and (b) lower surface.

Figure 32 .
Figure 32. Target geometry and NN output using the first 52% of the FX63-137 total data generated during the inverse design, before (right-hand side) and after (left-hand side) smoothing with the Bezier curve.

Figure 33 .
Figure 33. Initial and target geometries as well as the converged ML-estimated geometry (right-hand side) and their corresponding pressure distributions (left-hand side), using only 52% of the FX63-137 total data.

Figure 34 .
Figure 34. A comparison of the computational cost of the inverse design and the hybrid ML-ID method in estimating the target geometry in the offline state.

Figure 35 .
Figure 35. The inverse design module in online connection with the machine learning module.

Figure 36 .
Figure 36. Comparison of the target geometry and the online ML-ID output at the 76th iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side).

Figure 37 .
Figure 37. Comparison of the target geometry and the online ML-ID output at the 101st iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side), after the ML-ID output at the 76th iteration was fed back into the ID module.

Figure 38 .
Figure 38. Initial, target, and ML-estimated geometries at the 76th and 101st iterations, and the converged geometry at the 110th iteration with a 1% maximum deviation from the target pressure distribution (right-hand side), together with their corresponding pressure distributions (left-hand side).

Figure 39 .
Figure 39. Pressure error function variation versus iteration for the inverse design of the lower surface of the DSMA-523A, when using the ML module at the 76th and 101st iterations in the online hybrid ML-ID.

Figure 40 .
Figure 40. Comparison of the computational cost of the pure inverse design and the offline and online hybrid ML-ID modules for the DSMA-523A, when using the ML module at the 76th and 101st iterations.

Figure 41 .
Figure 41. Comparison of the target geometry and the online ML-ID output at the 26th iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side).

Figure 42 .
Figure 42. Comparison of the target geometry and the online ML-ID output at the 51st iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side), after the ML-ID output at the 26th iteration was fed back into the ID module.

Figure 43 .
Figure 43. Initial, target, and ML-estimated geometries at the 26th and 51st iterations, and the converged geometry at the 51st iteration with a maximum 1% deviation from the target pressure distribution (right-hand side), together with their corresponding pressure distributions (left-hand side).


Figure 44 .
Figure 44. Pressure error function variation versus iteration for the inverse design of the lower surface of the DSMA-523A, when using the ML module at the 26th and 51st iterations in the online hybrid ML-ID.

Figure 45 .
Figure 45. Comparison of the computational cost of the pure inverse design and the offline and online hybrid ML-ID modules for the DSMA-523A, when using the ML module at the 26th and 51st iterations.

Figure 46 .
Figure 46. Comparison of the target geometry and the online ML-ID output at the 51st iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side).

Figure 47 .
Figure 47. Comparison of the target geometry and the online ML-ID output at the 76th iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side), after the ML-ID output at the 51st iteration was fed back into the ID module.

Figure 48 .
Figure 48. Comparison of the target geometry and the online ML-ID output at the 101st iteration (right-hand side) and the geometries smoothed with the Bezier curve (left-hand side), after the ML-ID output at the 76th iteration was fed back into the ID module.

Figure 49 .
Figure 49. Initial, target, and ML-estimated geometries at the 76th and 101st iterations, and the converged geometry at the 104th iteration with a 3% maximum deviation from the target pressure distribution (right-hand side), together with their corresponding pressure distributions (left-hand side).

Figure 50 .
Figure 50. Pressure error function variation versus iteration for the inverse design of both surfaces of the FX63-137, when using the ML module at the 51st, 76th, and 101st iterations in the online hybrid ML-ID.

Figure 51 .
Figure 51. Comparison of the computational cost of the pure inverse design and the offline and online hybrid ML-ID modules for the FX63-137, when using the ML module at the 51st, 76th, and 101st iterations.

Table 1 .
Designed LSTM Network Architectures for different representations of input/output data.
a Number of Hidden Units.

Table 2 .
Computational cost reductions by using the ML and ID modules in the offline state.