Accurate Discharge Coefficient Prediction of Streamlined Weirs by Coupling Linear Regression and Deep Convolutional Gated Recurrent Unit

Streamlined weirs, a nature-inspired type of weir, have gained tremendous attention among hydraulic engineers, mainly owing to their established performance with high discharge coefficients. Computational fluid dynamics (CFD) is considered a robust tool to predict the discharge coefficient. To bypass the computational cost of CFD-based assessment, the present study proposes data-driven modeling techniques, as an alternative to CFD simulation, to predict the discharge coefficient based on an experimental dataset. To this end, after splitting the dataset using a k-fold cross-validation technique, the performance of classical and hybrid machine learning-deep learning (ML-DL) algorithms is assessed. Among ML techniques, linear regression (LR), random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN) and decision tree (DT) algorithms are studied. In the context of DL, long short-term memory (LSTM), convolutional neural network (CNN) and gated recurrent unit (GRU) techniques and their hybrid forms, such as LSTM-GRU, CNN-LSTM and CNN-GRU, are compared using different error metrics. It is found that the proposed three-layer hierarchical DL algorithm, consisting of a convolutional layer coupled with two subsequent GRU levels and hybridized with the LR method, leads to lower error metrics. This paper paves the way for data-driven modeling of streamlined weirs.


Introduction
Weirs are among the most useful and common hydraulic structures and are applied in settings such as irrigation networks, sewage networks and water supply systems (Abdollahi et al., 2017). According to the crest type, weirs are classified into sharp-, broad-, and short-crested weirs. Circular-crested, overflow (ogee) and streamlined weirs are special kinds of short-crested weirs (Bagheri & Kabiri-Samani, 2020a).
Streamlined weirs, as a nature-inspired type of weir, have gained tremendous attention among hydraulic engineers due to their well-known performance: a high discharge coefficient, stable overflow behaviour and minimized fluctuation of the free water surface.
The general shape of streamlined weirs, which is designed according to aerofoils, is originally derived from the topology of birds' wings. The importance of streamlined weirs, purported to be the most state-of-the-art form of weirs, is well documented in the hydraulic engineering field (Rao & Rao, 1973; Bagheri & Kabiri-Samani, 2020). However, due to the geometric complexity of their design, streamlined weirs have received less attention among practitioners. The estimation of the discharge coefficient of weirs is an important subject, and many experimental and/or numerical studies have been undertaken recently on different types of weirs (Arvanaghi et al., 2014; Arvanaghi & Oskuei, 2013; Borghei et al., 1999; Johnson, 2000; Mahtabi & Arvanaghi, 2018; Qu et al., 2009; Rady, 2011; Tullis, 2011). For the last two decades, computational fluid dynamics (CFD) has drawn tremendous attention from both academia and industry for modeling problems that involve fluid domains and their corresponding boundary conditions and interactions. OpenFOAM, an open-source toolbox, is widely used in high-fidelity computational models because it incorporates a wide variety of solvers compatible with different ranges of fluid flows. Although CFD-based performance assessment of fluid-flow phenomena leads to reliable results, it suffers from computationally demanding procedures and requires profound academic knowledge of fluid mechanics (Bagheri & Kabiri-Samani, 2020a, 2020b). Data-driven modelling offers a framework to assess a model as a black box. Hence, it is possible to analyse a broader range of models and systems irrespective of the nature of the problem.
In particular, ML-DL modelling is an active field of research in other engineering fields such as structural and earthquake engineering (Abasi et al., 2021; M S Barkhordari & Es-haghi, 2021; Mohammad Sadegh Barkhordari & Tehranizadeh, 2021; Esteghamati & Flint, 2021; Hariri-Ardebili & Salazar, 2020; Pourkamali-Anaraki et al., 2020; Soraghi & Huang, 2021) and biomedical engineering (Alizadehsani et al., 2021; Ayoobi et al., 2021). Other applications of ML-DL techniques can also be found (Aswin et al., 2018; Athira et al., 2018; Selvin et al., 2017; Vinayakumar et al., 2017).
Recently, different ML and surrogate modeling algorithms have been applied to various hydraulic engineering problems such as dams, sedimentation and spillways (Bhattacharya et al., 2007; Hariri-Ardebili et al., 2021; Roushangar et al., 2014; Torres-Rua et al., 2012). It is recognized that an empirical relationship for the discharge coefficient based on experimental or hydraulic models faces some limitations regarding hydraulic and geometric parameters (Ebtehaj et al., 2018). The main motivation of the present study is to bypass the computational cost of discharge coefficient prediction via a CFD framework by investigating the potential capability of hybrid ML-DL algorithms as an alternative to CFD-based simulations. The comparison between the CFD-based discharge coefficient and the proposed data-driven techniques is graphically illustrated in Figure 1, which is inspired by Bagheri & Kabiri-Samani (2020a, 2020b). The data-driven modelling part of Figure 1 will be discussed comprehensively in Sections 4 and 5. The various geometric and hydraulic parameters affecting the hydraulic operation of weirs require an accurate model to determine their discharge coefficients. In this context, proposing an accurate technique for the estimation of the discharge coefficient is a challenging task.
In this work, a group of 12 classical and hybrid ML-DL algorithms are employed to predict the discharge coefficient of streamlined weirs based on an experimental dataset.
In the following, Section 2 describes literature related to different usages of ML-DL techniques in weirs. Section 3 explains the data employed in this study. Section 4 describes the ML-DL algorithms including the proposed one. Section 5 illustrates results obtained by different data-driven techniques. Finally, in Section 6, the conclusion is presented, and future works are outlined.

Related works
The determination of the discharge coefficient is a crucial factor in the design of weirs. Several studies have used various ML-DL algorithms to predict the discharge coefficient. In this section, some of the state-of-the-art ML-DL techniques related to the estimation of the discharge coefficient are presented in Table 1, considering different weir configurations. One may note that none of the existing studies investigated the potential capability of ML-DL techniques for streamlined weirs, which reflects the main motivation of the present study.

Data description
The flow rate over a short-crested weir is computed based on the continuity and Bernoulli equations, as expressed in Equation (1), where $C_d$ is the weir discharge coefficient; $b$ represents the weir width; $H_1 = h_1 + h_v$ describes the total head; $h_1$ is the upstream head over the crest; $h_v$ indicates the upstream velocity head and equals $V^2/2g$; $V$ refers to the approach velocity; and $g$ denotes the acceleration due to gravity.
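As a small worked illustration of the head definitions above, the following sketch computes the total head as the upstream head over the crest plus the upstream velocity head. The function names, the SI units and the sample values are assumptions made for illustration only; they are not taken from the experimental dataset.

```python
G = 9.81  # acceleration due to gravity in m/s^2 (SI units assumed)


def velocity_head(approach_velocity: float, g: float = G) -> float:
    """Upstream velocity head: approach velocity squared over 2g."""
    return approach_velocity ** 2 / (2.0 * g)


def total_head(upstream_head: float, approach_velocity: float, g: float = G) -> float:
    """Total head: upstream head over the crest plus the velocity head."""
    return upstream_head + velocity_head(approach_velocity, g)


# Hypothetical values: 0.10 m head over the crest, 0.30 m/s approach velocity.
print(total_head(upstream_head=0.10, approach_velocity=0.30))
```

For low approach velocities the velocity head is a small correction to the measured upstream head, which is why the two are often close in laboratory flumes.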
In this research, an experimental dataset for 120 models of streamlined weirs, which are designed based on the Joukowsky transform function, is used (Bagheri & Kabiri-Samani, 2020a). The model is graphically illustrated in Figure 2 and the related hydraulic parameters are shown in Table 2. The data consist of two groups, namely streamlined weirs with and without a base-block. In models without a base-block, the corresponding base-block parameter is set to zero. Table 2 shows 9 parameters, which are considered as model inputs in the proposed method. The discharge coefficient is the model output.

Methods
In this section, the studied ML-DL methods are introduced first (Section 4.1), together with the details of the implemented methods and their parameters. The proposed method is then introduced in detail. All data-driven techniques are implemented in the Python programming language, using the "sklearn" and "keras" packages with the "tensorflow" backend. The implementation hardware is a GeForce GTX 950 GPU with 16 GB of DDR4 RAM.

Machine-Deep Learning Algorithms
With the development of ML-DL methods, a good variety of ML-DL-based models have been introduced and have received extended attention (see Table 1). In the context of DL, LSTM, CNN and GRU (Cho et al., 2014) and their hybrid forms, such as LSTM-GRU, CNN-LSTM and CNN-GRU, are analysed using different error metrics. In the following, DL techniques are introduced briefly, while a detailed discussion of the proposed algorithm is provided. As a variant of the recurrent neural network (RNN), LSTM has a long-term memory function that is suitable for processing important events with long intervals and delays in time series. Therefore, a neural network structure primarily composed of LSTM units with memory functions can make decisions based on previous states to adapt to various running scenarios (Guo et al., 2021). LSTM has been widely used for problems involving sequential data, such as natural language processing (NLP), voice recognition, and time series analysis (Sezer & Ozbayoglu, 2018).

The original idea of CNN was modeled on mammalian vision. This type of network is able to achieve results similar to human vision in some cases and even better in others. A CNN is made up of a number of convolutional layers, and the combination of these layers forms a deep neural network. CNN has been widely used and has achieved brilliant results in image processing, image classification and computer vision (Sammut & Webb, 2011).
Similar to LSTM, GRU is another variant of RNN. In general, two main gates are implemented in GRU. The first (the update gate) determines how much of the previous information should be passed along to the future. The second (the reset gate) determines how much of the past information must be discarded (Ayoobi et al., 2021). GRU leads to better performance for smaller and less frequent datasets in comparison to LSTM (Gruber & Jockisch, 2020).
Model parameters of these classical DL techniques are summarized in Table 4.
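The two GRU steps described above, deciding how much past information to carry forward and how much to discard, correspond to what are commonly called the update and reset gates. The following NumPy sketch shows a single GRU step in the formulation of Cho et al. (2014). The parameter names, the random initialisation and the toy input sequence are assumptions for illustration and are unrelated to the settings of Table 4.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru_params(n_inputs, n_hidden, seed=0):
    """Small random weights for the update (z), reset (r) and candidate (h) paths."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        params["U" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        params["b" + gate] = np.zeros(n_hidden)
    return params


def gru_step(x, h_prev, p):
    """One GRU time step: returns the new hidden state."""
    # Update gate: how much of the previous state is carried forward.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    # Reset gate: how much of the previous state is discarded
    # when forming the candidate state.
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_cand


# Toy usage: feed a hypothetical 3-step scalar sequence through the cell.
params = init_gru_params(n_inputs=1, n_hidden=4)
h = np.zeros(4)
for value in [0.2, 0.5, 0.1]:
    h = gru_step(np.array([value]), h, params)
print(h.shape)
```

Some libraries swap the roles of z and (1 - z) in the last line; the two conventions are equivalent up to relabelling of the gate.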

Proposed Method (LR-CGRU)
The dataset is split into "training" and "testing" groups to generate meta-inputs for the proposed algorithm. A successful out-of-sample technique for this purpose is k-fold cross-validation (CV). In this context, the whole dataset is partitioned into k mutually exclusive and collectively exhaustive subsets; one subset is used for testing and the remaining (k-1) subsets are used in the training procedure.
In addition, the initial weight assignment of ML-DL algorithms is commonly performed by a random configuration. Hence, the k-fold CV technique can lead to an unbiased assessment.
In the ML-DL algorithm proposed in the present study, k=5 is used for CV.
According to Razavi-Far et al. (2019), the predictive models are trained in a "one-step-ahead" configuration.
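The 5-fold procedure described above can be sketched with scikit-learn as follows. This is a minimal illustration using LR as the example learner, not the authors' script: the data are synthetic stand-ins with the same shape as the dataset (120 samples, 9 inputs), and the function name, the shuffling and the random seeds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (NOT the experimental dataset): 120 samples, 9 inputs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 9))
y = X @ rng.uniform(size=9) + 0.01 * rng.normal(size=120)


def cross_validated_predictions(X, y, n_splits=5, seed=0):
    """Train on (k-1) folds, predict the held-out fold, repeat for every fold."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_pred = np.empty(len(y), dtype=float)
    fold_mse = []
    for train_idx, test_idx in kfold.split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = model.predict(X[test_idx])
        fold_mse.append(float(np.mean((y[test_idx] - y_pred[test_idx]) ** 2)))
    return y_pred, fold_mse


y_pred, fold_mse = cross_validated_predictions(X, y)
print(len(fold_mse), float(np.mean(fold_mse)))
```

Because every sample appears in exactly one testing fold, `y_pred` holds one out-of-sample prediction per sample, which is convenient when predictions are later passed to another model as meta-inputs.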
A three-layer hierarchical DL algorithm consisting of a convolutional layer coupled with two GRU levels is introduced as the final DL algorithm. It is hybridized with the LR method as the ML technique, because LR gives lower CV errors (the error metrics and the values obtained for the ML-DL algorithms are discussed in detail in Section 5). Accordingly, LR-CGRU is the combination of LR, CNN and GRU: in the DL phase, it uses a convolutional layer as the first layer, followed by two GRU layers. A graphical representation of the proposed algorithm is shown in Figure 3. The proposed model is trained 5 times because the 5-fold CV technique is used.
In each of the 5 iterations, the model is trained on 80 percent of the data and tested on the remaining 20 percent. Accordingly, 5 predicted datasets are obtained from both the ML and DL algorithms, and the computed data are averaged for both the ML and DL methods.
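The DL part of the architecture described above (one convolutional layer followed by two GRU layers) might be sketched in Keras as shown below. This is only a sketch under stated assumptions: the 9 inputs are treated as a length-9 sequence with a single channel, and the filter count, kernel size, GRU unit counts, optimizer and loss are placeholders rather than the values used in the study. The way the LR predictions are combined with the DL output is not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cgru(n_features=9, filters=32, kernel_size=3, gru_units=(32, 16)):
    """Conv1D -> GRU -> GRU -> Dense(1) regression model (hypothetical sizes)."""
    model = keras.Sequential([
        # Each sample is reshaped to (n_features, 1): an assumed layout.
        keras.Input(shape=(n_features, 1)),
        layers.Conv1D(filters, kernel_size, padding="same", activation="relu"),
        # The first GRU returns the full sequence so the second GRU can consume it.
        layers.GRU(gru_units[0], return_sequences=True),
        layers.GRU(gru_units[1]),
        # Single linear output: the predicted discharge coefficient.
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_cgru()
model.summary()
```

Setting `return_sequences=True` on the first GRU layer is what allows two recurrent layers to be stacked; without it, the second GRU would receive a single vector instead of a sequence.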

Verification of the proposed algorithm
In this section, as a first stage, the predicted results of all ML methods, including SVM, RF, LR, KNN and DT, are compared with the experimental results, as demonstrated graphically in Figure 4(a)-(e). It can be observed that the LR and RF methods provide better results than the other ML techniques in terms of the YY plot.
Figure 4. Comparison between the experimental data and ML methods.
Evaluating an ML-DL model can be tricky. The dataset is usually split into training and testing sets. Then, the model performance is evaluated with an error metric to quantify the precision of the model. However, this technique is not reliable enough, as the accuracy computed for one test set may be very different from that for another.
To cope with this problem, k-fold cross-validation (CV) is performed. As mentioned in Section 4.2, the 5-fold CV technique is used for all applied ML-DL algorithms. In detail, in the first iteration, the first fold is used to test the ML-DL model and the rest of the data is used as the training set. In the next iteration, the second fold is used as the testing set and the rest of the data is used as the training set. This procedure continues until all 5 folds have been used as the testing set.
To assess the performance of each ML-DL method, eight error metrics are used, including mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE), where $y$ describes the real (i.e., experimental) dataset and $\hat{y}$ refers to the predicted outputs.
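The three metrics named above can be computed from the experimental and predicted values using their standard definitions, as in the sketch below. The function names and the sample values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return float(np.sqrt(mse(y, y_hat)))


def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


# Hypothetical experimental and predicted discharge coefficients.
y_true = [0.90, 1.00, 1.10, 1.20]
y_pred = [0.95, 0.98, 1.10, 1.15]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE is expressed in the same units as the target, whereas MSE penalises large errors more strongly, so the two are usually reported together with MAE.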

Comparison with previous works
Finally, the data-driven outputs are compared with those of previous related works. Bagheri and Kabiri-Samani (2020a) proposed an algebraic equation, Equation (10), to compute the discharge coefficient of streamlined weirs ($C_d$) using dimensional analysis and the curve-fitting tool in MATLAB; based on Equations (10) and (11), the coefficient was then determined. In Carollo & Ferro (2021), according to dimensional analysis and self-similarity theory, the stage-discharge relationship was obtained (Equation (13)). By combining Equations (11) and (12) and substituting Equation (13) into Equation (14), the discharge coefficient was obtained in the last step (Equation (16)). In Figure 7, results from the equations proposed by Bagheri & Kabiri-Samani (i.e., Eq. (10)) and Carollo & Ferro (i.e., Eq. (16)) are compared with those of the proposed LR-CGRU algorithm. As can be seen, the proposed data-driven technique provides more accurate outputs than the algebraic expressions introduced by Bagheri & Kabiri-Samani (2020a) and Carollo & Ferro (2021), which highlights the superiority of ML-DL-driven techniques for the prediction of the discharge coefficient.

Conclusion and future works
This paper aims to predict the discharge coefficient of streamlined weirs, which are known as a state-of-the-art type of weir. The potential of machine learning-deep learning algorithms is investigated as an alternative to the computational fluid dynamics procedure for predicting the discharge coefficient of this nature-inspired type of weir.
Five classical machine learning techniques, namely linear regression, random forest, support vector machine, k-nearest neighbours, and decision tree, are applied. In addition, amongst deep learning algorithms, long short-term memory (LSTM), convolutional neural network (CNN) and gated recurrent unit (GRU) and their hybrid forms (i.e., LSTM-GRU, CNN-LSTM and CNN-GRU) are compared by eight different error metrics.
To enhance accuracy, a three-layer hierarchical deep learning algorithm consisting of a convolutional layer coupled with two subsequent GRU levels, which is also hybridized by the linear regression method (i.e., LR-CGRU), is proposed. In general, hybrid deep data-driven algorithms provide more accurate results than the classical ones.
Furthermore, it is clearly demonstrated that the LR-CGRU technique outperforms the other eleven machine-deep learning algorithms.
Finally, the superiority of the proposed data-driven technique is demonstrated by a comparative analysis against previously introduced algebraic expressions for predicting the discharge coefficient. Results indicate that the LR-CGRU algorithm can act as an alternative tool to forecast the discharge coefficient of streamlined weirs accurately, which paves the way for data-driven modelling of streamlined weirs. Although the capabilities of twelve machine-deep learning algorithms to predict the discharge coefficient are investigated, there is still a need for future studies to enhance both the accuracy and the efficiency of the estimation.
Furthermore, investigation on the application of the proposed ML-DL algorithm in probabilistic risk assessment (Ali Kia et al., 2021) of streamlined weirs can be performed in future works.