Design of modified structure multi-layer perceptron networks based on decision trees for the prediction of flow parameters in 90° open-channel bends

ABSTRACT A modified multi-layer perceptron (MLP) model based on decision trees (DT-MLP) is presented to predict velocity and water free-surface profiles in a 90° open-channel bend. The ability of the new hybrid model to predict the velocity and flow depth in a 90° sharp bend is investigated and compared with the abilities of MLP and multiple-linear regression (MLR) models. The MLP and DT-MLP networks are trained and tested using 520 and 506 experimental data measured for velocity and flow depth, respectively, at five different discharge rates of 5, 7.8, 13.6, 19.1 and 25.3 l/s. The MLP and DT-MLP comparison results against MLR reveal that the two artificial neural networks (ANNs) are 84% and 16% more accurate than the MLR model in predicting the velocity and flow depth variables, respectively. According to the results, the root mean square error (RMSE) value of the DT-MLP model decreases by 9% and 7.5% in predicting velocity and flow depth, respectively, compared with the MLP model. It was found that the hybrid decision-tree-based method can significantly improve MLP neural network performance in forecasting velocity and free-surface profiles in a 90° open-channel bend.


CONTACT Hossein Bonakdari bonakdari@yahoo.com

Introduction
Most rivers and open channels have curved paths. Therefore, it is essential to understand the hydraulic behavior of flow in bends. In sharp bends ($R_c/b \le 3$, where $R_c$ is the central radius of the channel and $b$ is the channel width), the flow patterns are more complex than in mildly curved bends ($R_c/b > 3$; Leschziner & Rodi, 1979). Flow in sharp bends is influenced more than flow in mild bends by the centrifugal force and its main property, i.e., the secondary flow of Prandtl's first kind. Secondary flow affects different flow variables, such as velocity and water depth profiles, even in the cross sections before and after a bend, making the flow structure complex. Secondary flow is the main reason that fluctuations occur in the velocity components of bends (Naji, Ghodsian, Vaghefi, & Panahpur, 2010). Therefore, many research works have concentrated on examining the flow patterns, secondary flows and their power, and velocity and flow turbulence components in curved channels. Shukry (1950) and Rozovskii (1961) were the first to present extensive experiments on curved channels. Rozovskii investigated the hydraulic characteristics of flow in curved channels with 90° mild bends and 180° sharp bends, and also examined the location of the maximum velocity and its displacement. Ye and McCorquodale (1998) performed an experimental study on flow patterns in curved channels. According to the results, secondary flows and super-elevation begin in the upstream cross sections and slowly reach the inner cross sections of the bend. Channel bed level changes in a sharp bend with a movable bed were investigated by Blanckaert and Graf (2001), who reported a minor secondary rotating flow cell at the outer bend wall. Gholami, Akhtari, Minatour, Bonakdari, and Javadi (2014) employed experimental and numerical models to investigate the flow pattern in a 90° sharp bend.
The results indicated that the inner wall and cross section after the bend face maximum velocity, subsequently transmitting it to the outer wall. Numerous experimental studies have been conducted on the hydraulic behavior of flow within a bend, including examinations of the main flow properties, i.e., the velocity and depth in channel bends (Anwar, 1986; Bergs, 1990; Blanckaert & DeVriend, 2004; DeVriend & Geldof, 1983; Sui, Fang, & Karney, 2006; Uddin & Rahman, 2012).
In the numerical field, Jung and Yoon (2000) studied flow patterns and bed topography in a 180° mild bend. It was observed that, generally, in mild bends with any type of bed material, the maximum velocity at the beginning of the bend is oriented toward the inner wall, and as the flow moves toward the end of the bend, the location of the maximum velocity gradually shifts toward the outer wall. The flow pattern in a 270° sharp bend as well as the velocity components and water surface level within the channel were numerically modeled and examined by DeMarchis and Napoli (2006). Bodnar and Prihoda (2006) numerically modeled the turbulent free-surface fluid flow in a 90° sharp bend. To examine the flow pattern in a 180° mildly curved channel, Zhang and Shen (2008) presented a three-dimensional model that was able to effectively represent water surface level variations, longitudinal and transversal velocity profiles and the phenomenon of flow separation in curved channels. However, it could not accurately predict the longitudinal water surface profile on the channel axis. Ramamurthy, Han, and Biron (2013) performed a comprehensive study to propose the best modeling parameters for free-surface flow in sharp bends. They found that the Reynolds stress turbulence model is in best agreement with the experimental results.
According to the results, the suggested networks are able to predict velocity values in different cross sections well. However, the genetic algorithm method was more accurate than the MLP method in flow velocity prediction. Bilhan, Emiroglu, and Kisi (2011) examined an ANN model in predicting the discharge of triangular side weirs in curved channels. Baghalian, Bonakdari, Nazari, and Fazli (2012) studied and compared the performance of MLP with an analytical solution and a numerical model to investigate the flow patterns in curved channels. Zaji and Bonakdari (2014) assessed the performance of a number of ANNs in predicting the discharge capacity of triangular side weirs. The results indicated that the MLP model is the least precise means of predicting these parameters among the studied models. Sahu, Jana, Agarwal, and Khatua (2011) used an MLP model to study the velocity field within a curved channel. Rowiński, Piotrowski, and Napiórkowski (2005) employed MLP modeling to calculate the longitudinal dispersion coefficient in rivers, but the results obtained using ANNs were not fully satisfactory.
In the present research, a new hybrid MLP model based on decision trees (DT-MLP) is designed for the purpose of improving MLP performance. Two models, namely DT-MLP and MLP, are generated to predict velocity and two separate models are created to predict water surface depth. Extensive experimental tests were undertaken by the authors in a 90° sharp bend to train and test the networks (Akhtari, Abrishami, & Sharifi, 2009; Bahrami, Ghaneeizad, & Akhtari, 2009). The inputs to both models are the point coordinates (X and Y) and the flow discharge rate (Q), while the outputs are velocity and flow depth. The multiple-linear regression (MLR) model also serves to predict the two abovementioned variables, and the performance of the MLP, DT-MLP and MLR models is evaluated by comparing their error values. Moreover, the flow variables predicted by the MLP and DT-MLP models at different discharge rates are compared with the experimental results.

Experimental model
The experimental research was conducted on a flume in the hydraulic laboratory at the Ferdowsi University of Mashhad, Iran (Gholami et al., 2014).

Geometric properties of the flume
The flume components were as follows: (1) Straight input channel: 3.6 m long, located upstream; this length was chosen to create a flow that was stable and developed in the channel before reaching the bend.
(2) Curved channel: the central angle of the bend was 90° and the central radius ($R_c$) was 60.45 cm. The bend was sharp ($R_c/b = 1.5 < 3$) in terms of channel width ($b$ = 40.3 cm).
(3) Straight output channel: 1.8 m long, located downstream; this channel was selected to create flow that developed after the bend and to prevent turbulence caused by overflow upon reaching the bend.
The channel's cross section was 40.3 × 40.3 cm (width × height). The channel boundaries were smooth with a Manning's roughness coefficient (n) of 0.008. The geometric properties of the channel are shown in Figure 1.
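The sharpness criterion quoted for the flume can be checked with a few lines of arithmetic. The following is a minimal sketch using only the radius and width reported above; the function names are illustrative and not part of the original study.

```python
# Dimensions reported for the flume, in centimetres.
R_C_CM = 60.45  # central radius of the bend
B_CM = 40.3     # channel width


def curvature_ratio(radius: float, width: float) -> float:
    """Return R_c / b, the relative curvature of a bend."""
    return radius / width


def is_sharp_bend(radius: float, width: float, threshold: float = 3.0) -> bool:
    """Class a bend as sharp when R_c / b <= threshold (3 in the text)."""
    return curvature_ratio(radius, width) <= threshold


ratio = curvature_ratio(R_C_CM, B_CM)
print(f"R_c/b = {ratio:.2f}, sharp bend: {is_sharp_bend(R_C_CM, B_CM)}")
```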

Experimental process
The main reservoir was placed at the beginning of the channel's entrance. A sharp-crested triangular weir in the main reservoir measured the discharge rate, whereas a sharp-crested rectangular weir was installed downstream of the straight channel's end. In order to regulate the channel water depth, the weir height was adjustable. The velocity was read after regulating the discharge rate and water depth within the channel. Upon reading the velocity, the channel bed and water surface values were read for the above-mentioned cross sections with a micrometre. A one-dimensional velocity meter (propeller) was used to read the axial velocities in the flume (Armfield Limited, Co., 1995). A micrometre also measured the water depth. Depth was measured with 0.1 mm precision and velocity with 2 cm/s precision. A view of the channel is presented in Figure 2. In the present research, the water entered the flume at five different discharge rates (5, 7.8, 13.6, 19.1 and 25.3 l/s) by changing the opening size of the valve to the main reservoir. The various hydraulic properties applied in the lab are shown in Table 1. The Froude and Reynolds numbers are defined as $Fr = V/\sqrt{gD}$ and $Re = VD/\nu$, where $V$ is the velocity, $g$ is the gravity acceleration, $D$ is the hydraulic depth, and $\nu$ is the kinematic viscosity ($0.804 \times 10^{-6}$ m²/s).
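The two dimensionless numbers defined above are straightforward to evaluate. The sketch below assumes SI units, takes g = 9.81 m/s² (not stated in the text), and uses a hypothetical flow depth of 0.15 m purely for illustration; the actual depths are those listed in Table 1. For a rectangular channel the hydraulic depth is taken equal to the flow depth.

```python
import math

G = 9.81        # gravitational acceleration in m/s^2 (assumed value)
NU = 0.804e-6   # kinematic viscosity from the text, taken to be in m^2/s


def froude_number(velocity: float, hydraulic_depth: float, g: float = G) -> float:
    """Fr = V / sqrt(g * D)."""
    return velocity / math.sqrt(g * hydraulic_depth)


def reynolds_number(velocity: float, hydraulic_depth: float, nu: float = NU) -> float:
    """Re = V * D / nu, using the hydraulic depth as the length scale."""
    return velocity * hydraulic_depth / nu


# Illustrative case: Q = 25.3 l/s in the 0.403 m wide flume.
discharge = 25.3e-3   # m^3/s
width = 0.403         # m
depth = 0.15          # m, hypothetical value for illustration only
velocity = discharge / (width * depth)

print(f"V = {velocity:.3f} m/s")
print(f"Fr = {froude_number(velocity, depth):.3f}")
print(f"Re = {reynolds_number(velocity, depth):.0f}")
```

A Froude number below 1 indicates subcritical flow; the values actually reached in the experiments depend on the measured depths.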

Soft computing methods
In this section, the hybrid DT-MLP method is explained. The proposed method comprises MLP ANN regression and a DT classification algorithm.

Multi-layer perceptron (MLP) neural network
The flexible form of MLP in simulating nonlinear problems with high applicability (Haykin, 1999) has led to the widespread use of this method in practical situations (Kim, Shiri, & Kisi, 2012; Kisi, 2009). An MLP is constructed from three layers: input, hidden and output. The model's input variables are introduced by the input layer via neurons and are then transferred to the hidden layer (Bishop, 1995). The hidden layer accumulates the input layer neurons by using a weighted summation and subsequently passes the result through a non-linear activation function. MLP models often employ sigmoid activation functions (Chang & Liao, 2012; Ebtehaj & Bonakdari, 2013; Moharrampour, Kherad, Abachi, Zoghi, & Asadi, 2012; Rezaeian Zadeh, Amin, Khalili, & Singh, 2010). Any function that is bounded and has a proportional relation between the input and output variables is a sigmoid function (Smith, 1993). In the present study, the hyperbolic tangent activation function, $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, is utilized for the hidden layer (Dawson & Wilby, 2005; Khan & Coulibaly, 2006; Yonaba, Anctil, & Fortin, 2010). The output layer of the MLP performs like a linear regressor. Therefore, a weighted summation of the hidden layer neurons is carried out in this layer to evaluate the final model results. The numbers of neurons in the input and output layers equal the numbers of model input and output variables, whereas no rule is available to specify the number of hidden layer neurons. If the neuron number is extremely low, the analysis capability and numerical accuracy of prediction will consequently reduce. However, if the hidden layer neuron number is extremely high, the model will undergo overtraining and memorize rather than analyse the data. Thus, trial and error should be employed to determine the number of hidden layer neurons (Bilhan et al., 2010; Gholami et al., 2015; Kisi, 2008b; Kisi & Cigizoglu, 2007). As stated earlier, the hidden and output layers require weighted summations.
Determining the weight coefficients in the MLP method is called training. In this study, the Levenberg-Marquardt (LM) method is applied (Levenberg, 1944) for the training process. LM uses the backpropagation algorithm to determine the models' weights. The training 'stop' criterion of the present models is 100 epochs, by which point the models have converged completely (Kisi, 2008a; Kisi & Cigizoglu, 2007; Zaji & Bonakdari, 2015b). The number of epochs (iterations) is selected with respect to each model. Training continues for up to 100 iterations, until both networks reach an acceptable error level between the data obtained from the ANN models and the experimental data.
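The forward computation of the network described in this section (a tanh hidden layer followed by a linear output layer) can be sketched in a few lines. This is only an illustration of the layer arithmetic: the weight names `iw`, `b1`, `lw` and `b2` follow the notation of the appendices, the numerical values are placeholders, and the Levenberg-Marquardt training step is not shown.

```python
import math


def mlp_forward(inputs, iw, b1, lw, b2):
    """Evaluate a single-hidden-layer MLP with a scalar output.

    inputs: list of n input values
    iw:     n x h matrix of input-to-hidden weights (list of rows)
    b1:     list of h hidden-layer biases
    lw:     list of h hidden-to-output weights
    b2:     output bias (scalar)
    """
    hidden = [
        math.tanh(sum(v * iw[i][j] for i, v in enumerate(inputs)) + b1[j])
        for j in range(len(b1))
    ]
    # Linear output layer: weighted sum of hidden activations plus bias.
    return sum(h * w for h, w in zip(hidden, lw)) + b2


# Placeholder weights for a network with 3 inputs (X, Y, Q) and 2 hidden neurons.
iw = [[0.1, -0.2], [0.05, 0.3], [0.02, -0.1]]
b1 = [0.0, 0.1]
lw = [1.5, -0.7]
b2 = 0.25

print(mlp_forward([10.0, 20.0, 5.0], iw, b1, lw, b2))
```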

Hybrid decision-tree-based multi-layer perceptron (DT-MLP) neural network
The hybrid DT-MLP functions with the DT classification algorithm introduced by Breiman, Friedman, Olshen, and Stone (1984) in combination with the MLP regression method in order to increase the method's performance. The class variable $Y$ takes values of 1 to $k$, where $k$ is the number of determined problem classes. The purpose of classification is to predict $Y$ using the input variables $X_1$ to $X_n$, where $n$ is the number of input variables. The DT algorithm has some advantages over other classification methods. The first benefit of DT is its simplicity; DT applies trial and error to find the optimum split locations of the dataset. The second advantage of DT is that its results are presented in a tree structure; thus, presenting classifications of three or more input variables is very convenient.
In the DT-MLP method, a DT is used to optimize the MLP regression method by separating the entire dataset into segments. Therefore, instead of using one large MLP model for the entire dataset, the dataset and the MLP are divided into smaller parts. The DT-MLP procedure is as follows. First, the DT algorithm is trained with the training dataset. In this step, classification precision plays an important role. High classification precision leads to a large, impractical tree as well as a high probability of overtraining, which occurs when the classification accuracy for the training dataset is better than that for the testing dataset. On the other hand, using smaller trees leads to low classification precision, which consequently results in low DT-MLP regression performance. Therefore, trial and error is employed in the current study to determine the optimum DT classification precision. The DT algorithm splits the dataset into k classes. The maximum allowable number of classes is selected by trial and error according to the ranges over which the dataset varies. Following dataset splitting, the single large MLP is divided into k smaller MLP models. The number of hidden nodes in the single large MLP model is set equal to the sum of the numbers of hidden layer neurons in the smaller MLP models that are developed in the DT-MLP model. Subject to this maximum acceptable number of hidden layer neurons, trial and error is used to find the optimum configuration of the large MLP and of the smaller MLP models in the DT-MLP method. There are 12 and 15 hidden layer neurons in the MLP model for velocity and water depth, respectively. To allow a fair comparison of the two models (MLP and DT-MLP), the divisions in the DT-MLP model are chosen so that the hidden layer neurons in the 3 velocity classes and the 5 water depth classes sum to 12 and 15, respectively, while giving the highest accuracy.
The final step in DT-MLP entails collecting the separate class results of each smaller MLP to generate the final model outcomes. In this stage, the results of the smaller MLP models, which were applied to the classified dataset, are gathered to export the final model results. The most suitable number of classes differs for each dataset case and should be determined using trial and error. In the present study, five different classes, namely 'very low', 'low', 'medium', 'high' and 'very high' are considered for depth value prediction and three classes of 'low', 'medium' and 'high' are considered for velocity prediction. The DT-MLP procedure is presented in Figure 3. The designed DT-MLP model presents equations for velocity and water surface depth, as shown in Appendices 1 and 2, respectively.
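The routing idea behind the procedure described above can be sketched as follows: a classifier assigns each input point to a class, and the small MLP belonging to that class produces the prediction. Everything numerical here is hypothetical. The `classify` function is a hand-written stand-in for the trained decision tree (the real split variables and thresholds come from training and are shown in the classification tree figures), and the weights are placeholders rather than the values in the appendices. Only the structure, three classes with four hidden neurons each for velocity, follows the text.

```python
import math


def mlp_forward(inputs, iw, b1, lw, b2):
    """Single-hidden-layer MLP: tanh hidden layer, linear scalar output."""
    hidden = [
        math.tanh(sum(v * iw[i][j] for i, v in enumerate(inputs)) + b1[j])
        for j in range(len(b1))
    ]
    return sum(h * w for h, w in zip(hidden, lw)) + b2


def classify(x, y, q):
    """Hypothetical stand-in for the trained decision tree (splits on Q only)."""
    if q < 7.0:
        return "low"
    if q < 16.0:
        return "medium"
    return "high"


def placeholder_params(n_inputs, n_hidden, scale):
    """Placeholder weights; a real model would obtain these by training."""
    return {
        "iw": [[scale * 0.01 * (i + j + 1) for j in range(n_hidden)]
               for i in range(n_inputs)],
        "b1": [0.0] * n_hidden,
        "lw": [scale] * n_hidden,
        "b2": 0.0,
    }


# One small MLP per class: 3 classes x 4 hidden neurons = 12 neurons in total.
SUB_MODELS = {
    "low": placeholder_params(3, 4, 1.0),
    "medium": placeholder_params(3, 4, 2.0),
    "high": placeholder_params(3, 4, 3.0),
}


def dt_mlp_predict(x, y, q):
    """Route the point to its class, then evaluate that class's MLP."""
    label = classify(x, y, q)
    return label, mlp_forward([x, y, q], **SUB_MODELS[label])


total_hidden = sum(len(p["b1"]) for p in SUB_MODELS.values())
label, value = dt_mlp_predict(10.0, 20.0, 5.0)
print(label, value, total_hidden)
```

Keeping the total number of hidden neurons equal to that of the single large MLP is what makes the comparison between the two models like-for-like.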

Model performance evaluation
The performance of the MLR, MLP and DT-MLP models in this work is verified with the mean absolute error (MAE), the root mean square error (RMSE), the coefficient of determination ($R^2$) and the average absolute deviation (δ). In these indexes, $t_i$ is the observed output parameter, $O_i$ is the parameter predicted by the models, $\bar{O}$ is the mean of the predicted parameters and $N$ is the number of samples. The RMSE and MAE show the difference between the modeled and observed data in the same unit. Higher model accuracy leads to RMSE and MAE values closer to zero. In order to investigate the performance of the considered models in practical situations with different ranges of input and output variables, non-dimensional statistics are used. δ is the non-dimensional parameter that facilitates the comparison of different models regardless of dimensions and size. $R^2$ measures how well the observed outcomes are replicated by the model; it is based on the linear regression line between the values predicted by the models and the observed values.
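The four indexes can be computed as in the sketch below, which uses the symbols of this section (`t` for observed values, `o` for predicted values). MAE and RMSE follow their standard definitions. The forms used for $R^2$ (squared Pearson correlation) and δ (sum of absolute errors divided by the sum of predictions, in percent) are assumptions, since the exact expressions are not reproduced in this text.

```python
import math


def mae(t, o):
    """Mean absolute error."""
    return sum(abs(oi - ti) for ti, oi in zip(t, o)) / len(t)


def rmse(t, o):
    """Root mean square error."""
    return math.sqrt(sum((oi - ti) ** 2 for ti, oi in zip(t, o)) / len(t))


def r_squared(t, o):
    """Squared Pearson correlation between observed and predicted values (assumed form)."""
    t_mean = sum(t) / len(t)
    o_mean = sum(o) / len(o)
    cov = sum((ti - t_mean) * (oi - o_mean) for ti, oi in zip(t, o))
    var_t = sum((ti - t_mean) ** 2 for ti in t)
    var_o = sum((oi - o_mean) ** 2 for oi in o)
    return cov ** 2 / (var_t * var_o)


def delta_percent(t, o):
    """Average absolute deviation in percent (assumed form)."""
    return 100.0 * sum(abs(oi - ti) for ti, oi in zip(t, o)) / sum(o)


# Small made-up example, not data from the study.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mae(observed, predicted), rmse(observed, predicted))
print(r_squared(observed, predicted), delta_percent(observed, predicted))
```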

Data and model analyses
In this section, the data used to train and test the models, the data measurement positions in the experimental tests and the input and output data are explained. The process involved generating two models (MLP and DT-MLP) to predict the velocity and two separate models to predict the water surface depth. Three inputs, i.e., the coordinates of the points in two directions (X and Y) and the flow discharge rate (Q), along with one output, namely the depth-averaged velocity corresponding to these points, are considered in the two velocity prediction models. For the water depth prediction models, the same three inputs (X, Y and Q) are considered, along with one output, namely the water depth corresponding to these points. In both velocity prediction models, 104 experimental data were used for each discharge rate, making a total of 520 experimental data for all five discharge rates. These data were divided into two groups: training and testing. Out of 520 data, 364 (70%) were randomly selected for training and 156 (30%) for testing. A total of 506 water surface experimental data were used to predict the water depth, 354 of which (70%) served to train the networks and 152 of which (30%) were used for testing. The data utilized were related to eight different cross sections of a 90° bend (0°, 22.5°, 45°, 67.5°, 90°, 40 cm before the bend and 40 and 80 cm after the bend; Akhtari et al., 2009). Each cross section has 13 transverse points. Each velocity value is the depth-averaged velocity at that location. The eight cross sections, the 13 points across each cross section's width, and the distances and coordinates of the studied points are shown in Figure 4.
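The dataset sizes quoted above follow from simple counting (8 cross sections × 13 points × 5 discharge rates = 520) and a random 70/30 split. The sketch below reproduces those counts; the seed and the use of Python's `random` module are illustrative assumptions, not the procedure used by the authors.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


n_velocity = 8 * 13 * 5   # cross sections x transverse points x discharge rates
n_depth = 506             # number of water surface samples given in the text

vel_train, vel_test = split_indices(n_velocity)
dep_train, dep_test = split_indices(n_depth)

print(n_velocity, len(vel_train), len(vel_test))
print(n_depth, len(dep_train), len(dep_test))
```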

Velocity prediction
This section presents the results of the depth-averaged velocity predicted by the MLP and DT-MLP models. As mentioned above, three classes, namely 'low', 'medium' and 'high', are considered in the velocity simulation. The classification tree is presented in Figure 5. A comparison of the MLP and DT-MLP results with the experimental values in the training dataset is presented in Figure 6, a careful examination of which reveals that both the MLP and DT-MLP models are consistent with the experimental results. However, both models present larger values when predicting the maximum points and smaller values when predicting the minimum points. Both models completely overlap at all points except for the maximum and minimum points. The DT-MLP model predicted the minimum points better than the MLP model. Also, the minimum points predicted by the DT-MLP model completely overlapped with the experimental results. However, both models are alike at the maximum points. A scatter plot of the velocity values predicted by the MLP and DT-MLP models is given in Figure 6 against the experimental values from the testing dataset stage. The velocity values are clustered on both sides of the fit line for both models, but the data are scattered more for the MLP model, indicating that the DT-MLP model is more accurate with an $R^2$ value close to 1. The DT-MLP hybrid model ($R^2$ = 0.9540) is more accurate than the MLP model ($R^2$ = 0.9160). It could therefore be stated that using the DT-MLP hybrid algorithm increases the model accuracy when it comes to predicting the flow velocity in a 90° curved channel, as the value of $R^2$ is 4% higher.
The line is fitted to the equation $y = C_1 x + C_2$, meaning that the model works better when $C_1$ approaches 1 and $C_2$ approaches 0. The DT-MLP model has a $C_1$ value of 0.9540, which is closer to 1 than the $C_1$ value of the MLP model, which is equal to 0.9154. The $C_2$ values also indicate a significant difference between the hybrid and MLP models, with a $C_2$ of 3.28 for DT-MLP. Both models make a slight underestimation. However, the DT-MLP model's amount of underestimation is insignificant and can be overlooked compared with that of the MLP model. Thus, it can be stated that for the DT-MLP model the fit line nearly overlaps the exact line, meaning that the model performed very well.
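The coefficients $C_1$ and $C_2$ of such a fit line can be obtained by ordinary least squares between the experimental values (x) and the predicted values (y). The sketch below is a minimal illustration with made-up numbers; it is assumed, not confirmed by the text, that the fit line in the scatter plots was obtained this way.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = c1 * x + c2; returns (c1, c2)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c2 = y_mean - c1 * x_mean
    return c1, c2


# Made-up example: predictions that lie exactly on y = 0.95 x + 3.
experimental = [0.0, 10.0, 20.0, 30.0]
predicted = [0.95 * v + 3.0 for v in experimental]

c1, c2 = fit_line(experimental, predicted)
print(c1, c2)
```

A perfect model would give $C_1 = 1$ and $C_2 = 0$, so the distance of the fitted coefficients from these values is a compact summary of systematic over- or underestimation.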

Water surface prediction
The results for the open-channel bend's water surface are presented in this section. As noted above, the most appropriate number of classes for this dataset is five. Therefore, the dataset is classified using the DT-MLP method into 'very low', 'low', 'medium', 'high' and 'very high'. The optimum classification tree is presented in Figure 7.
The scatter plots of the water depth values predicted by each model are drawn in Figure 8 against the experimental values. The water depth predictions for five different discharge rates are separated and drawn on the opposite sides of the corresponding experimental values. It is clear from this figure that the scatter plots of all discharge rates have an $R^2$ of 0.9980 with both models, which is almost equal to 1 since the water depths are different for each discharge rate. This indicates that both models (MLP and DT-MLP) predicted water depth highly accurately. The models' accuracy can, however, be compared more easily by breaking each graph down so that the results of both models (obtained from one run of each model) are separated for the different water depths. The MLP model has the highest accuracy for a discharge rate of 5 l/s with an $R^2$ value of 0.9635 and the lowest accuracy for a discharge rate of 19.1 l/s with an $R^2$ of 0.9135. The DT-MLP model also attains its maximum accuracy at a discharge rate of 5 l/s, with an $R^2$ value of 0.9724, and its lowest accuracy at a discharge rate of 25.3 l/s, with an $R^2$ value of 0.9105. Both models thus performed with maximum accuracy at the minimum discharge rate, and the DT-MLP model performed with minimum accuracy at the maximum discharge rate. The value of $R^2$ increased with the DT-MLP model in relation to the MLP model at the discharge rates of 5 and 7.8 l/s, so the DT-MLP model performed better at these rates. The value of $R^2$ rose from 0.9135 with the MLP model to 0.9650 with the DT-MLP model at a discharge rate of 19.1 l/s; therefore, $R^2$ increased by almost 6% when the hybrid DT-MLP model was used compared with the simple MLP model. Both models performed similarly at the median discharge rate of 13.6 l/s with almost the same $R^2$ value. It can therefore be stated that using the hybrid DT-MLP algorithm increases the model accuracy for most discharge rates and leads to higher $R^2$ values in comparison to the simple MLP model.
This increased accuracy is especially evident at low discharge rates, which is one of the benefits of using hybrid algorithms.
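The percentage improvements quoted for $R^2$ are relative changes with respect to the MLP value. As a quick check, the sketch below recomputes them from the $R^2$ values given in the text; the helper name is illustrative.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Water depth at 19.1 l/s: MLP R^2 = 0.9135, DT-MLP R^2 = 0.9650.
depth_gain = percent_change(0.9135, 0.9650)

# Velocity (testing dataset): MLP R^2 = 0.9160, DT-MLP R^2 = 0.9540.
velocity_gain = percent_change(0.9160, 0.9540)

print(f"depth, 19.1 l/s: {depth_gain:.1f}%")
print(f"velocity: {velocity_gain:.1f}%")
```

These evaluate to roughly 5.6% and 4.1%, consistent with the "almost 6%" and "4%" stated in the text.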

Evaluation of depth-averaged velocity profiles
The transverse distributions of depth-averaged velocities in different cross sections at discharge rates of 5, 7.8, 13.6, 19.1 and 25.3 l/s that were predicted by the MLP and DT-MLP models are compared with the corresponding experimental values in Figure 9. The MAEs of these profiles are presented in Table 2. The MLP model error is almost 5 times that of the DT-MLP model at a discharge rate of 25.3 l/s, which indicates that the DT-MLP model is highly accurate and shows an acceptable consistency level with the experimental values in this cross section. The velocity increase is obvious at the inner wall in the entrance cross section of the bend, which was predicted by all three models for all discharge rates. The error of the DT-MLP model decreased by 45%, 78% and 40% in relation to the MLP model at discharge rates of 5, 7.8 and 25.3 l/s, respectively. The maximum velocity was at the inner wall in the 45° cross section, since the bend under study is a sharp one. Carefully examining the values in the table for this section also reveals that the DT-MLP model performed more successfully at low discharge rates with a 74% accuracy increase compared with the discharge rate of 25.3 l/s and with a 57% decrease in error. At a discharge rate of 5 l/s and in the bend's entrance and middle cross sections, the error value of both models was larger than for the rest of the discharge rates. Therefore, using the DT algorithm to decrease the error in this cross section is one of the most important and fundamental benefits of the hybrid model. The error value of the MLP model is nearly 2.5 times that of the DT-MLP model at a discharge rate of 25.3 l/s in the 90° cross section as opposed to the entrance and middle cross sections of the bend, while at the lowest discharge rate of 5 l/s, the error of the DT-MLP model only decreased by 33% in relation to the MLP model (see Figure 9).
This is because the MLP and DT-MLP models are unable to predict the return of the velocity profile at the inner wall at low discharge rates. However, only the DT-MLP model is able to predict this at higher discharge rates; therefore, the error value decreases dramatically. The error value of the MLP model is almost 2.3 times that of the DT-MLP model in the cross section located 80 cm after the bend for a discharge rate of 5 l/s. Therefore, it can be generally stated that using the DT-MLP model at low discharge rates decreases the error value by an average of 80% in comparison with the simple MLP model. The DT-MLP model error decreases by an average of 88% at high discharge rates, which is more obvious in the cross sections before and after the end of the bend. Unlike at high and low discharge rates, the error decrease of the DT-MLP model is not as obvious at the three middle discharge rates of 7.8, 13.6 and 19.1 l/s. Therefore, as mentioned previously, using the DT-MLP hybrid algorithm to predict the velocity profile in different cross sections is very effective and efficient in decreasing the error at high and low discharge rates. The DT-MLP equation for predicting the depth-averaged velocity is presented in Appendix 1. According to the equation, the matrix includes 3 rows and 4 columns, where the number of neurons in each class and the number of velocity classes are 4 and 3, respectively. Hence, there are a total of 12 neurons for velocity.

Evaluation of water surface profiles
The transverse distribution of the water surface is compared with the corresponding experimental values in Figure 10 at the cross sections of 0°, 45°, 90° and 40 cm after the bend under different discharge rates. The MAEs of these profiles are presented in Table 3. The MLP and DT-MLP models were both run once for all discharge rates. However, in order to compare them better and more precisely with regard to Table 4 and since the water surface prediction models performed similarly, the results related to each discharge rate were separated and the models were evaluated. The presence of centrifugal force (due to the channel's curve) in the initial cross section of the bend creates a transverse gradient in the water surface such that the flow depth increases at the outer wall and decreases at the inner wall. It is clear that both models can efficiently predict the free-surface profile at the channel cross section entrance under all discharge rates. According to Table 3, the DT-MLP model exhibited a 67% error decrease compared with the MLP model in this cross section at a discharge rate of 5 l/s and thus is more consistent with the experimental results. The DT-MLP model was more accurate than the MLP model at a discharge rate of 7.8 l/s and its error decreased by 65%. Both models performed alike at the median discharge rate of 13.6 l/s and the DT-MLP model had a slightly greater error value. DT-MLP was again more accurate than MLP at discharge rates of 19.1 the middle discharge rates. The maximum error decrease in the DT-MLP model is 162.5% and 180% at the discharge rates of 5 and 19.1 l/s, respectively, at the cross section at the end of the bend. It can be concluded that the suggested modified model is quite practical for predicting water depth in channels so that they can be constructed better and their wall heights designed correctly. The DT-MLP equation for water surface simulation is presented in Appendix 2. In this equation, the matrix has 3 rows and 3 columns, meaning that the number of neurons in each class and the number of water depth classes are 3 and 5, respectively. Therefore, there are a total of 15 neurons for depth.
Figure 10. The transverse profile of water surface predicted by the MLP and DT-MLP models compared with the experimental results at different discharge rates at cross sections of (a) 0°, (b) 45°, (c) 90° and (d) 40 cm after the bend.

Comparison of models' capabilities
The performance of the MLP and DT-MLP models used to predict the depth-averaged velocity and water surface is presented in Table 4 through the RMSE, MAE, $R^2$ and δ (%) statistical indexes. The two parameters predicted by the MLR model are also included, and the model's performance compared to the MLP and DT-MLP models is evaluated. The values show that the MLP and DT-MLP models predicted the velocity with an $R^2$ of 0.9200 and 0.9200, which are higher than the value obtained with the MLR model (0.5). The $R^2$ value of the MLR model in predicting water surface depth is smaller than that of the two other models. The MLP and DT-MLP models predicted both velocity and flow depth more accurately than the MLR model. The values in Table 4 water depth prediction. This $R^2$ value is related to each model's results for all discharge rates. When separating the results related to one run (Figure 8), the $R^2$ values were in the 0.9 to 1 range. In general, however, the DT-MLP model (RMSE = 0.054) outperformed the MLP model (RMSE = 0.058). Figure 11 illustrates the bar graph of MAE values with both the MLP and DT-MLP models in predicting velocity and water depth at different discharge rates with the test dataset. The MAEs are drawn for the MLP and DT-MLP models on the vertical axes of these graphs. Generally, viewing these graphs clarifies that the error value is smaller for the water depth prediction model (Figure 11(a)) than for the velocity prediction model (Figure 11(b)); therefore, it can be stated that both the MLP and DT-MLP models performed well in predicting water depth. The graph in Figure 11(a), water surface prediction, shows that the error value of the DT-MLP model significantly decreases at the discharge rates of 7.8 and 19.1 l/s compared with the MLP model.
This error value also decreases at a discharge rate of 5 l/s with the DT-MLP model; however, since both the MLP and DT-MLP models performed well at this discharge rate, the error value is very small and invisible on the graph. Both models performed quite similarly at a discharge rate of 13.6 l/s and the MLP model had a smaller error value at a discharge rate of 25.3 l/s. In Figure 11(b), velocity prediction, the error decrease of the DT-MLP model compared with the MLP model is very noticeable at all discharge rates. The decreasing error value of DT-MLP is especially noticeable at the low discharge rate of 5 l/s, similar to the water depth prediction model. This may be considered one of the advantages of the present research in terms of using the DT-MLP algorithm to decrease the error of both the velocity and water depth prediction models in shallow water (approaching supercritical flow). Comparing Figure 11(a) and (b), it can be seen that the DT-MLP hybrid algorithm is more successful when employed in models that predict velocity rather than depth. Because the error value decrease is so tangible at higher as well as lower discharge rates in these models, the velocity prediction of the DT-MLP model in the present research is effective and successful for all discharge rates. Using the proposed DT-MLP hybrid model in the present study is therefore very effective in practical cases.

Conclusion
An attempt was made in the present research to evaluate the performance of an MLP model in predicting velocity and water surface variables in a 90° bend before and after the model was combined with a decision tree (MLP vs. DT-MLP). A simple MLP model was also designed to test the suggested model and then both were compared. A total of 520 and 506 depth-averaged velocity and water surface experimental samples, respectively, were used for training and testing the networks at five different discharge rates. The performance of the two models was examined and compared with the MLR and experimental results. In velocity and water surface depth prediction, the MLP and DT-MLP models exhibited lower error than the MLR model. The results indicate that the DT-MLP model (MAE = 1.52) performed better in predicting flow velocity than the MLP model (MAE = 1.76).
The DT-MLP model accuracy was greater than that of the MLP model in flow depth prediction as well, with a significant reduction in model error. According to the comparison of model results for different flow discharge rates, the DT-MLP model was substantially more accurate than the MLP model at high and low discharge rates (5 and 19.1 l/s) than at other discharge rates. Therefore, using the DT-MLP hybrid algorithm proposed in the current research decreases the simple MLP model's error to a large degree and can be used in practical cases. Other soft computing methods in combination with a DT model can be used to predict various additional flow variables in bends, for instance shear stress, pressure and turbulence intensity.

Appendix 1. DT-MLP equation for velocity prediction
The matrix row numbers represent the numbers of model inputs (inputs: X, Y, Q). The column numbers signify the numbers of neurons used in each class. There are 12 hidden layer neurons for the velocity prediction models. $\mathrm{Velocity} = \mathrm{linear}(\tanh(\mathrm{input} \times iw + b_1) \times lw + b_2)$.

Appendix 2. DT-MLP equation for water surface prediction
The matrix row numbers represent the numbers of model inputs (inputs: X, Y, Q). The column numbers signify the numbers of neurons used in each class. There are 15 hidden layer neurons for the flow depth prediction model. $\mathrm{Watersurface} = \mathrm{linear}(\tanh(\mathrm{input} \times iw + b_1) \times lw + b_2)$. For the 'very low' class: