Monthly streamflow prediction using a hybrid stochastic-deterministic approach for parsimonious non-

Accurate streamflowprediction is essential in reservoirmanagement, flood control, and operation of irrigation networks. In this study, the deterministic and stochastic components of modeling are considered simultaneously. Two nonlinear time series models are developed based on autoregressive conditional heteroscedasticity and self-exciting threshold autoregressive methods integrated with the gene expression programming. The data of four stations from four different rivers from 1971 to 2010 are investigated. For examining the reliability and accuracy of the proposed hybrid models, three evaluation criteria, namely the R2, RMSE, and MAE, and several visual plots were used. Performance comparison of the hybrid models revealed that the accuracy of the SETAR-type models in terms of R2 performed better than the ARCH-type models for Daryan (0.99), Germezigol (0.99), Ligvan (0.97), and Saeedabad (0.98) at the validation stage. Overall, prediction results showed that a combination of the SETAR with the GEP model performs better than ARCH-based GEP models for the prediction of the monthly streamflow. Abbreviations: ADF = Augmented Dickey-Fuller; AIC = Akaike Information Criterion; ANFIS = Adaptive Neuro-Fuzzy Inference System; ANNs = Artificial Neural Networks; AR = Autoregressive Models; ARIMA = Autoregressive Integrated Moving Average; ARCH = Autoregressive Conditional Heteroscedasticity; ATAR = Aggregation Operator Based TAR; BL = Bilinear Models; BNN = Bayesian Neural Network; CEEMD = Complete Ensemble Empirical Mode Decomposition; DDM = Data-Driven Model; GA = Genetic Algorithm; GARCH = Generalized Autoregressive Conditional Heteroscedasticity; GEP = Gene Expression Programming; KNN = K-Nearest Neighbors; KPSS = Kwiatkowski–Phillips–Schmidt–Shin; LMR = Linear and Multilinear Regressions; LR = Likelihood Ratio; LSTAR = Logistic STAR; MAE = Mean Absolute Error; PACF = Partial Autocorrelation Function; PARCH = Partial Autoregressive Conditional Heteroscedasticity; R2 = Coefficient of Determination; RMSE = Root Mean Square Error; RNNs = Recurrent Neural Networks; SETARMA = SelfExciting Threshold Autoregressive Moving Average; SETAR = Self-Exciting Threshold Autoregressive; STAR = Smooth Transition AR; SVR = Support Vector Regression; TAR = Threshold Autoregressive; TARMA = Threshold Autoregressive Moving Average; ULB = Urmia Lake Basin; VMD = Variational Mode Decomposition; WT = Wavelet Transforms ARTICLE HISTORY Received 5 July 2020 Accepted 25 September 2020


Introduction
Streamflow, as a nonlinear hydrological variable, has a primary role in decisions and management of water engineering sector managers and experts (Attar et al., 2020;Ravansalar et al., 2017). Managing streamflow is an essential task because it has a direct impact non-deterministic, non-stationary, and stochastic behavior . Therefore, streamflow modeling is a difficult and challenging issue. Moreover, there are other factors that directly affect streamflow, such as heterogeneity, noise, hydrological variability, seasonal and periodic patterns, precipitation, temperature, and characteristics of the watershed (Tikhamarine, Souaggamane, et al., 2019). Accurate modeling and forecasting streamflow can be resulted in well designing of structures such as dams, waterway canals, irrigation systems, etc. (Yaseen, Allawi, et al., 2018). They also help decisionmakers to save money and time (Diop et al., 2018). Thus generating a reliable and accurate model is still an essential task for model developers in the water sector . In order to attain an accurate model for predicting, there are some sophisticated tools and ways to modeling hydrological processes in different time scales, such as long term and short term stages, which are in two general categories, namely physical (hydrological) based models and datadriven models (DDMs) (Yaseen, Awadh, et al., 2018). Physically-based models attempt to develop streamflow equations using the hydrological processes of a watershed like the geomorphology of the basin. While DDMs are black-box methods which they extract the streamflow patterns of previous and historical data for modeling and forecasting the future. Recently, DDMs are the most widely used methods and recognized as being modern methods in modeling hydrological parameters by scholars . Due to their excellent accuracy, these methods are commonly used by experts in modeling hydrological processes such as linear and multilinear regression (LMR) (Abdulelah Al-Sudani et al., 2019;A. Ahani, Shourian, & Rahimi Rad, 2018), autoregressive models (AR) (Banihabib et al., 2019), genetic algorithm (GA) combination models (Yaghoubi et al., 2019), gene expression programming (GEP) (Das et al., 2019), artificial neural networks (ANNs) (Ghose & Samantaray, 2019;Xu et al., 2009), wavelet transform (WT) (Freire et al., 2019;Honorato et al., 2019;Ravansalar et al., 2017), adaptive neuro-fuzzy inference system (ANFIS) (Chang et al., 2019;Yaseen et al., 2017), bayesian neural network (BNN) (Ren et al., 2018), recurrent neural networks (RNNs) (Tian et al., 2018), support vector regression (SVR) Yu et al., 2020) support vector machine (Ghorbani et al., 2018). Sabzi et al. conducted monthly streamflow modeling utilizing ANFIS, the standalone models of ANN and autoregressive integrated moving average (ARIMA), and an integrated ANN-ARIMA model by using snow telemetry data in Elephant Butte reservoir at Mexico city (Zamani Sabzi et al., 2017). Hydrological parameters, in most cases, have nonlinear behavior between each other, so this is one of the advantages of DDMs to model them in both linear and nonlinear conditions (Abdollahi et al., 2017;Liang et al., 2018;Shiri et al., 2012). GEP model has received much attention in the last few years as a great DDM (Kisi et al., 2014), and it can find full empirical deterministic equations. Ferreira introduces GEP with the cutting edge paper in 2001 (Ferreira, 2001). The principle of gene expression programming dates back to the using of genetic algorithms that deal with encoding linear chromosomes of fixed length (Kiafar et al., 2017). Shiri et al. examined artificial intelligence approaches, including GEP, ANFIS, and ANN, for forecasting streamflow on a daily scale. Their experiments conducted that the GEP model has better estimation in comparison with the other models (Shiri et al., 2012). Al-Juboori et al. calculated streamflow on a monthly scale using the GEP model in three rivers, Hurman in Turkey, Diyalah, and Lesser Zab in Iraq, and their results were compared with markovian model and ARIMA models. In their analysis, the GEP model performed better than two other methods (Mahmood Al-Juboori & Guven, 2016). However, these DDMs have a limitation regarding dealing with stochastic parts of equations. These models give us the mathematical equations of streamflow without considering the influence of random parts and stochastic terms of this phenomenon. Considering this issue, time series framework can influence these models in positive ways. Traditional time series models such as AR, ARMA, ARIMA techniques were mainly used as linear time series modeling and they focused on considering the mean behavior of streamflow modeling. By investigating the literature, it was found that the streamflow behavior is not only fitted by linear models but also nonlinear models. In this regard, comparing linear and nonlinear models is not correct and there should be some models which consider both mean and variance of data. Recently developments in nonlinear time series modeling by researchers could be divided into considering the structural asymmetry of models in conditional mean and kurtosis. Nonlinear time series has received much attention in recent years (Fathian et al., 2019).
Autoregressive conditional heteroscedasticity (ARCH) is generally used in economics, finance, and recently in hydrology to consider and to deal with time series and representing the changes of variance during the time. This model was first introduced by Engle and Bollerslev (Bollerslev et al., 1994;Engle, 1982). Variance volatility with time is essential to develop accurate models though considering both deterministic and stochastic behavior of data.
ARCH considered heteroskedasticity of conditional variance of noisy data and related it with a linear combination of observed data. It is evident that there are complexities in streamflow data such as inconsistency, noise, variance, parsimonies, nonlinearity, and stochastic behavior (Rahmani-rezaeieh et al., 2019). Using ARCH and ARCH family models will help the water policymakers to get attention to uncertainties and parsimonies in streamflow data. Bearing this in mind, the ARCH model could consider variance volatility in time series data.
Threshold autoregressive (TAR) model is one of the important nonlinear models which is called regime switching model utilizing a threshold value. Self-exciting threshold autoregressive (SETAR) categorized a stochastic process by changing between two regimes of stream flow data. This application is one of the threshold family models which describes the asymmetries in the observed data. There have been some studies about the application of TAR and TAR family models in hydrological science as follows: Komornik et al. have been tried to forecast the mean monthly flow of rivers in the Tatry alpine mountain region in Slovakia as a nonlinear model. They used threshold autoregressive (TAR), smooth transition autoregressive (STAR), self-exciting threshold autoregressive (SETAR), logistic smooth transition autoregressive (LSTAR), and aggregation operator based threshold autoregressive (ATAR). The performance of all nonlinear models for rivers is compared, and the ATAR model outperformed all the models (Komorník et al., 2006). Tongal proposed the predicting performances of SETAR and k-nearest neighbors (KNN) models for the Nolin river located at the Green River basin in Kentucky, the USA, in a daily scale and as a result, he found that the SETAR model outperforms the KNN model in predicting streamflow (Tongal, 2013). Tongal and Berndtsson proposed two nonlinear models, including phase-space reconstruction and SETAR for lake water level forecasting in three lakes in Sweden; Vanern, Vattern, and Malaren. As a result of utilizing the Akaike information criterion (AIC), the best SETAR models were chosen (Tongal & Berndtsson, 2014). Considering previous studies and their linear and non-linear models, common of this literature suggest SETAR models because of it's ability in streamflow forecasting. This model also draws heavily to other linear models. Thus, considering particular streamflow regime, this study presents the applicability of SETAR models in terms of forecasting stream flow.
As many research studies on the use of ARCH models, Chen et al. have been developed an analysis of streamflow forecasting of the Wu-Shi river in Taiwan with the aim of a combination of ARCH, bilinear models (BL), TAR, and threshold autoregressive moving average (TARMA) models. They tried to compare linear models (ARMA) with nonlinear models (TAR, TARMA, BL, and ARCH), and they find out that nonlinear models are better than the linear models in considering the variations of data. As the overall result of this work, they recommend the use of TAR and TARMA models, and for peak streamflow values, the BL model resulted effectively (Chenet al., 2008). Szolgayová et al. developed a hybrid model for forecasting daily river discharges of the Hron and Morava rivers in Slovakia by using the generalized autoregressive conditional heteroscedasticity (GARCH) model (Szolgayová et al., 2017). Based on the literature, threshold time series along with conditional heteroscedasticity models in the case of streamflow modeling are still novel nonlinear techniques which they help the precious and rigorous streamflow predicting models (Fathian et al., 2019).
In the light of the reviewed high impacted research papers on streamflow prediction, there is still a shortage of research on the nonlinear behavior of streamflow data and their prediction. The primary goal of this study is to analyze errors by two different nonlinear approaches, namely ARCH and SETAR models, and then combining them with the GEP models in four stations in Iran. Thus, in the present study, in order to make dynamic models with threshold variables, two regime SETAR models were used. Also, ARCH models were considered the variance behavior of streamflow time series data. Finally, these two ARCH-type and SETAR-type components were combined with the GEP model in order to develop new streamflow data lag-based accurate equations. As discussed above, the main reason for selecting the GEP model is to determine the deterministic component (as an equation) of monthly streamflow time series. In the author's point of view, there is rarely specific work and similar studies in combination between DDMs and nonlinear error analyzing methods with considering the comparison between two nonlinear models, namely ARCH and SETAR, in the estimation of monthly streamflow.
The paper is organized as follows: firstly, the study area, and used streamflow data with their statistical analysis were provided in section 2.1. Secondly, each model, including GEP, ARCH, and SETAR model, is discussed in sections 2.1-2.4. In the next step, proposed standalone and integrated models were developed and discussed in section 2.5. The information about evaluation metrics was presented in section 3. Subsections 4.1-4.4 have summarized the results, combined model assessments. Discussion about the study were presented in Section 5, and finally, all the remarks and conclusions and suggested future works were demonstrated in section 6.

Study area, data, and statistical analysis
Iran is a country located in Asia in the southwest of the middle east and has been divided into six large basins, namely the Caspian Sea, Persian Gulf, Urmia, Central, Hamoon, and Sarakhs basins (H. Ahani & Kherad, 2013). As shown in Figure 1 is a map of the study area that is placed in the Urmia Lake basin (ULB) in East Azerbaijan that is located between 36°,45 to 39°,26 North and 45°,5 to 48°, 22 East. Four stream flows were used in this study, with 40 years' monthly data from 1971 to 2010. From each river, one station was selected, including Daryan on Daryan river, Germezigol on Ganbarchai, Saeedabad on Saeedabad chai, and Ligvan on Ligvanchai. It is worth mentioning that chai in Persian means river. Table 1 illustrates the spatial characteristics of stations, including the longitude and latitude of each station with their elevation. As seen in the table, Ligvan station with 2200 m and Daryan station with 1600 m have the highest and lowest elevation, respectively. In this study, the data of streamflow were divided into two sections, including calibration and validation stages. For this purpose, 75% of data (480months × 0.75 = 360months) were considered as train series, and 25% (480-360 = 120months) of remain data were considered as test series. Figure 2 demonstrates the average annual values of the monthly streamflow of all stations. Statistical characteristics of monthly streamflow time series of the case study stations in both calibration and validation stages are shown in Table 2. The statistical characteristics consist of mean, standard deviation, skewness and kurtosis coefficients, and the lag-one autocorrelation coefficient. The highest amount for correlation is stated in Saeedabad station (0.773 and 0.699) in calibration and validation stages, respectively.

GEP model
As stated above, GEP, first announced by (Ferreira, 2001), is one of the popular DDMs that are in the group of artificial intelligence models based on theories in Darwinian evolution algorithms that they deal with neurons of the human brain. These algorithms tried to search for the best solution, which has the best fit among the other solutions. Each solution is stated as an individual. Weak quality solutions are eliminated by using fitness functions (Ashrafian et al., 2020). In the GEP model, chromosomes are performed as an expression tree, and they can be coded by linking functions. The results of the GEP model could be reported by mathematical equations and decision tree structures, while they are a correlation of the input variables of the user. GEP is an application that directly represents genes. GEP selected each gene's terminals in random behavior. These terminals are extracted from the head and tail of each gene. Terminals are essential because they help the model modification for developing next-generation and evolution process. GEP establishes an original problem set with the first individual names 'parent node' this individual multiply to 'offspring node' with new top-performance operators. The genetic information of each parent transfers to a new generation with new modified environmental adaption, and it resulted in better fitness. GEP tries to find the best offspring node with a low error. In this way, the evolutionary process grows in a better way. The main advantages of the GEP model can be discussed as follows (a) chromosomes are small, easy to recombine, mutate, duplicate; (b) The generated decision trees are units of their chromosomes and based on fitness, they selected to reproduce modified chromosomes (Conditioning et al., 2019). The main steps in modeling by GEP are summarized in six steps. (1) selecting the fitness function like RMSE, (2) selecting input variables in order to engender chromosomes, (3) choosing operators, (4) scheming the design of chromosomes, (5) choosing the linking functions, and in the last step, (6) choosing the genetic operators (Rahmani-rezaeieh et al., 2019). More information can be found in (Ferreira, 2001).

ARCH model
Streamflow time series is defined as the volume of water, which varies in time, which is volatile and has fluctuations in time. Modeling streamflow time series has the principal purpose that we can have insights on data and can forecast or predict that variable. Meanwhile, modeling, the main focus is on the average of used data. However, for modeling hydrological data, significantly streamflow variance of data should be considered. So models that can use the variances of data become highlighter than the others. ARCH was first introduced by (Engle, 1982). ARCH model is one of the abovementioned models that deal with a variance of data. This model can consider the parsimonious effects of observed data and provides a framework that is considered nonlinear volatility in data (Hamilton, 1994). Equations (1) and (2) illustrate the ARCH model (Bollerslev et al., 1994).
Where σ 2 t is conditional variance, ε t is a discrete-time stochastic process, and α 0 are the ARCH model's parameters, q is the model's order, and z t is the regular and standard series.

SETAR model
TAR models are dealt with nonlinear systems in a discrete time scale, also considers anomalies', fluctuations, and seasonality of proposed data. This method was first developed by (Tong, 1983a). This model is also called the regime-switching model, which has a value of the threshold for switching between the upper regime and the lower regime of streamflow. These models have Kth parts of AR (p) models in which each part is different from the other parts, and p shows the order of AR models. TAR models can be depicting by TAR (k, p). The whole modeling process is consisting of four main steps; (a) model developing, (b) statical identification, (c) parameters estimation (d) predicting (Pan et al., 2017).
SETAR is a nonlinear time series in the family of TAR models (Tong, 2015). These models are supposed to use in modeling and forecasting streamflow because of their nonlinearity characteristics. SETAR models have some advantages which make them popular such as considering time irreversibility, jump and chaos phenomena, the harmonic bias in data. Therefore, SETAR models are an excellent tool for modeling a parsimonious non-linear time series (Gonzalo & Wolf, 2005). This model also can have two or more regimes depending on the levels of lagged variables in its structure. Using SETAR models by considering two high and low regimes was defined by (Tong, 1983b). In the current study, the SETAR model was used in predicting monthly streamflow, divided streamflow time series data into subregimes (two regimes). This method is another way of developing nonlinear dynamical methods (Tongal, 2013). Streamflow time series data set with (Q 1 . . . ,Q n ), in which n is the number of time series, considering two regime SETAR model can be defined by Equations (3) and (4) as follows: Where ∅ are the coefficients of the AR model, p 1 and p 2 are the orders of AR models for lower and upper regimes in a two-regime SETAR model, d is a lag or delay time, e t is the residual value at time t.
If the threshold variable is chosen as the lagged value of Q t , then it can be written as SETAR (d,p 1 ,p 2 , . . . ,p j ).
The threshold values can be selected from (−∞, ∞) and the threshold (r) and the delay (d) parameters are an experimental procedure that they can be used to select the most proper model for a time series by minimizing the AIC (Tongal & Berndtsson, 2017). Based on Tongal (Tongal, 2013), because of simplicity, efficiency, and lower computational process, the two regime SETAR were selected in this study.

Development and integration procedure of models
In general, pre-processing is a fundamental step in analyzing historical time series data in various phases, such as designing, application, and methodologies of new models. Data pre-processing techniques could be divided into five groups, including stationary tests, independence tests, normality tests, periodic estimation, consistency, and homogeneity tests. More information about these tests was discussed in (hydrology & 1993, n.d.). Figure 3 demonstrates the whole study steps flowchart. In order to start modeling, the very first step is to collecting streamflow data from hydrometric stations. The next step, splitting the historical data into two segments of calibration and validation phases, is another critical step in every modeling. To reach the best relationship between input and output data, the calibration step is one of the essential parts in modeling (Tongal, 2013). As mentioned before, in this study, the streamflow data for each station were split into 75% calibration and 25% validation. The next step is the data-normalization step in which Delleur and Karamouz equations were used for the normalization step in this study. More information can be found (Attar et al., 2020). The next step is checking if the data have any trend or seasonality effects or not. When the statistical characteristics (mean, variance, autocorrelation) of historical time series data are all constant over some time, and there is no trend in data series, data series are called stationary. In the current study, the stationary of data was evaluated by two critical tests, including unit-root test augmented dickey-fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. A unit root test of data series was done by ADF null hypothesis that shows the process has a unit root, and the alternative hypothesis has no unit root, and that means the data series are stationary, and the result of the test is always a negative number. The degree of negativity of this number shows the degree of stationary data. A stationary around means that is tested by the KPSS model, and in contrast with the ADF test, the null hypothesis in KPSS is that the data is stationary, and the alternative hypothesis is that the data is not stationary and the process has a unit root.
Moreover, in order to start modeling with nonlinear methods, the LR was used in this study. LR test can discover nonlinearity by using AR(p) and TAR(P) model. The model with AR(p) is the null hypothesis of the LR model, and the alternative hypothesis is modeling with TAR(p).
A vigorous model was achieved when the optimum lags were considered in each modeling scenario. The proper number of lags for each scenario is entirely dependent on input streamflow data. It can be defined based on autocorrelation analysis (partial autocorrelation) methods. There are some steps to defining the optimal number of lags (a) normalization of data, (b) data stationaries, (c) autocorrelation analysis (which lags can be defined based on pacf diagrams in the confidence level of 95%).
In the present study, in order to develop GEP models, each model needs a series of sorted and complete data with defined lags as input data. The input data were considered with five lags (months) for modeling standalone GEP as follows: The foremost motive for choosing the GEP model is because to create a deterministic equation with different input values while utilizing user-defined linking functions.
As stated before, the procedure of modeling streamflow data with the GEP model considers the mathematical equation of model based on historical time-series data through defined lags by a user. Thus, the main focus on streamflow as a stochastic random phenomenon is hidden while using a standalone GEP model. For this reason, in this study, two models were introduced for modeling random parts of the equation, namely ARCH and SETAR models. In this direction, ARCH and SETAR models consider the variance of the observed data besides considering the statistical mean of streamflow data. Herein, after defining both the deterministic and random parts of the streamflow data using GEP, ARCH, and SETAR models, new series were generated by combining these parts. The combined GEP-ARCH hybrid based model can be defined as: Where, D t is the modeled deterministic part of the streamflow time-series by GEP and ε t is the modeled random part of the streamflow series by ARCH. As shown in Equation (5), the equation obtained from GEP models and the results of ARCH modeling is considered as D t and ε t respectively. The combined GEP-SETAR hybrid based model can be defined as: Where, D t is the modeled deterministic part of the streamflow time-series by GEP and e t is the modeled random part of the streamflow series by SETAR. As shown in Equation (6), the equation obtained from GEP models and the results of SETAR modeling is considered as D t and e t respectively.

Assessment of models
In practice, for rigorous accuracy, selecting the most effective and reliable model is an essential task. The accuracy of models can be examined by evaluation criteria. To find the best model, it is necessary to examine these criteria. The evaluation criteria used in this study were the coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute error (MAE), which can be formulated as follows: Where N is the number of observations, O i are the actual observations of streamflow,Ō is the mean of the observed data, and P i is the estimated value of streamflow.

Results and discussions
In the present study, the benchmark and proposed prediction hybrid models were established using two novel variance-based stochastic nonlinear methods, namely ARCH and SETAR combining with selected best fitted standalone GEP model for four selected hydrometric stations in the Urmia Lake basin. This result part is divided into subsections, including the results of premodeling tests, assessment of standalone SETAR models, the results of combined GEP-ARCH and GEP-SETAR models, and in last, the conclusion and future works were recommended.

Pre-modeling tests
In order to test if the data are stationary and have the potential to nonlinearity modeling, three tests, including ADF, KPSS, and LR tests, examined with a specific hypothesis for each test. The results of these tests, demonstrated in Table 3, were showed that by examining their p-value amounts, all data are stationary, and the series has a nonlinearity indefinite significance level. For instance, in Daryan station, the alternative hypothesis is accepted under a 95% confidence level, which means that the series is stationary. The KPSS test is on the other side of the ADF test, which means that based on Table 3, the null hypothesis is accepted, which means that the data series is stationary under a 95% confidence level. In the case of the likelihood ratio (LR) test, the alternative hypothesis is accepted, which means that the streamflow data series is nonlinear, and SETAR models can fit them.

Assessment of applied standalone nonlinear SETAR models
Before starting SETAR modeling, the threshold and delay parameters with the orders of AR models should be firstly estimated. In order to test data independence and orders, ACF and PACF figs were calculated. The degree of SETAR models was estimated by PACF. Also, the satisfactoriness of SETAR models is established when all the ACF coefficients were located between confidence intervals.
As shown in Figure 4, the orders for monthly streamflow for all the stations can be defined (four lags for Daryan, Germezigol, Ligvan stations, and two lags for Saeedabad station). They directed that the SETAR models are useful for monthly streamflow modeling. Next, delay (d), order values (P 1 , P 2 ), and the threshold value (regime-switching threshold value) was estimated based on the minimum AIC criterion. These values for each station are shown in Figure 5. For instance, in Daryan station the SETAR model values (d = 10, P 1 = 3, P 2 = 2), for Germezigol station (d = 5, P 1 = 2, P 2 = 2), for Ligvan stations (d = 3, P 1 = 2, P 2 = 1) and for Saeedabad station (d = 1, P 1 = 10, P 2 = 4) were calculated.
Based on the obtained delay parameters and order values from Figure 5, the threshold value for SETAR models based on Equations (5) and (6) are calculated for each station. Equations (10-13) illustrate the results of two regime SETAR models for all stations including Daryan, Germezigol, Ligvan, Saeedabad with SETAR (2,3,2), SETAR (2,1,2), SETAR (2,2,1) and SETAR (2,2,2) respectively. The complementary information like the order, the delay parameter, and threshold values for each station can be obtained from these equations. In the equation number 10, the equation (Q t = 0.0118 + 0.7726Q t−1 -0.2374Q t−2 +0.206Q t−3 ) with the threshold of (Q t−10 ≤ 1.132) shows the lower regime of the SETAR model.
with the threshold of (Q t−10 > 1.132) shows the upper regime. The behavior of the basin can be determined based on these threshold values of streamflow data. As if, streamflow water level surpassed the threshold value, it increases the runoff probability in the basin. In opposite, if the stream flow water level is below (when there is little precipitation) the threshold value, then drought would happen.
Results of fitted SETAR two-regime models to the monthly streamflow Datasets of Germezigol station with SETAR (2,1,2) Results of fitted SETAR two-regime models to the monthly streamflow Datasets of Ligvan station with SETAR (2,2,1) (12) Results of fitted SETAR two-regime models to the monthly streamflow Datasets of Saeedabad station with SETAR (2,2,2) Where Q t−1 and Q t−2 are the streamflow with the lags of 1 and two months, respectively.

Assessment of applied combined models
The following values were considered in this study for the input variables of the GEP model. Number of Genes = 3, number of chromosomes = 30, head size = 8, mutation rate = 0.04, inversion rate = 0.1, gene recombination rate = 0.1, one-point recombination rate = 0.3, two-point recombination rate = 0.3, Gene transposition rate = 0.1, insertion sequence transposition rate = 0.1, root insertion sequence transposition rate = 0.1. Then, as formerly explained, in this study GEP model, combined by nonlinear time series models, including SETAR and ARCH models. In this part, the results of the study were divided into two sections, namely GEP-ARCH and GEP-SETAR combinations. Figures 6-9 demonstrate the time series plots for each station in a combination of GEP lags with ARCH models. As shown in Figure 6, the observed streamflow (black lined) were compared with all models; it could be concluded that GEP5-ARCH is closer to observed values.

GEP-ARCH combination
Tables 4-7 represent the obtained performance criteria of each combination of different lags of the GEP model with nonlinear ARCH models. As shown in the following tables, evaluation indices have convincing results for each combination in both calibration and validation stages. In Daryan hydrometric station, in the calibration stage GEP5-ARCH model with the correlation coefficient values of (R 2 = 0.737) error values                 Figures 11-14 show the time series plots for each station of Daryan, Germezigol, Ligvan, and Saeedabad. Tables 8-11 represent the obtained performance criteria of each combination of different lags of the GEP model with nonlinear two regime SETAR models. As shown in these tables, evaluation indices have convincing results for each combination in both calibration and validation stages.

GEP-SETAR combination
In Daryan station, GEP4-SETAR model was selected as best performed model among others because of evaluation indices, R 2 = 0.995, RMSE = 0.002 m 3 /s, MAE = 0.017 m 3 /s for calibration station and R 2 = 991, In this section, the observed and modeled data of each GEP-SETAR combination based on their coefficient of efficiency values were shown in Figure 15. The results showed satisfactory and acceptable values for each station by considering the calibration and validation stages. In comparison to GEP-ARCH type hybrid models, these combined GEP-SETAR models have very high values of correlation between observed and estimated values. Therefore, SETAR type models perform better than hybrid ARCH methods.

Comparison between selected ARCH-type and SETAR type models
As shown in previous sections, two deterministic and stochastic parts of both GEP and time-series models combined by Equations (5) and (6). As an example of Daryan station, the results of combined GEP-ARCH models in both calibration and validation stages show (R 2 = 0.737, RMSE = 0.042 m 3 /s, and MAE = 0.475 m 3 /s) and (R 2 = 0.477, RMSE = 0.079 m 3 /s, and MAE = 0.464 m 3 /s) respectively. While the results of combined GEP-SETAR models in both calibration and validation stages in Table 8 shows (R 2 = 0.995, RMSE = 0.002 m 3 /s, and MAE =

Discussion
In this study, the concept of nonlinearity, parsimonious, and complexity of monthly streamflow data was considered for defining the superior model. At the first step, 40 years of streamflow data with the 30 days (monthly) scale were selected from four rivers located in ULB. Streamflow data were modeled with GEP methods (including five different scenarios and inputs), and the equations and the estimated data were defined, and they compared with the historical data in two calibration and validation stages. It is clear that the GEP model has the capacity to model the linear and nonlinear relations between the input variables. Also, the GEP only exported the deterministic components of the streamflow equations. Considering both deterministic and stochastic concepts of every hydrological model. Here, in this study, ARCH models were combined with the best performed GEP model, and the results were reported. In the next step, in order to find the mean behavior of streamflow data and the changes between upper and lower regime types (switching regimes) and considering the threshold between them, SETAR models were utilized. Finally, the results of SETAR models were combined with superior GEP models, and the results showed a satisfying and acceptable increase in the models' performances in comparison to hybrid GEP-ARCH models based on evaluation criteria. In other words, to address these findings, the relative error index was calculated and depicted in Figure 16 for all the proposed stations. For Daryan station, as it shown in blue box plots, the error plots were demonstrated that the GEP-SETAR has the lowest value following the standalone GEP and the hybrid GEP-ARCH models. In Germezigol station (green error box plots), it can be seen that both hybrid models, including ARCH type and SETAR type models, performed better than the sole GEP model. Yellow color error box plots reveal the relative error-index between 0 and 3, and the GEP-SETAR model draws heavily in comparison to the two other models. Finally, the same results can be seen at the pink color error box plot for Saeedabad station, which presented the lowest error model.
The overall performance comparison of two regime SETAR models along with ARCH type models for two primary calibration and validation stages were demonstrated in Table 12. Utilizing the error type (RMSE, MAE) and accuracy type (R 2 ) evaluation metrics, Table 12 demonstrates the ratio of SETAR-type models to ARCH-type models' performance in percent. Overall, as a significant result, all SETAR-type hybrid models in this study have better performance than ARCH-type models.

Conclusion and future works
Nowadays, providing a method for enhancing hydrological forecasting accuracy is a challenging issue for engineers and scholars in water resources planning and management. Relatively, with the literature review, there is little attention in utilizing both deterministic DDMs and stochastic time series for forecasting hydrological parameters, especially streamflow. In this study, the predictability of newly hybrid applied DDMs from monthly streamflow of four stations, Daryan, Germezigol, Ligvan, and Saeedabad, at Urmia basin was assessed using performance metrics called R 2 , RMSE, and MAE and visual plots. One commonly used DDM called GEP was used in this study as a primary deterministic model, while two-time series models called ARCH and SETAR models as nonlinear models as a stochastic determination parts were used. By comparing the latest results of best performed GEP and ARCH and SETAR models, it is found out that in all calibration and validation stages within all stations, SETAR models have the best performance, comparing to ARCH models. In terms of accuracy, prediction results for Daryan, Germezigol, Ligvan, and Saeedabad, respectively, improved by about 51%, 15%, 1%, and 2% when the GEP model integrated with SETAR compared to ARCH. In addition, the RMSE value for the ratio of the SETAR-type models to ARCH-type models decreased to 45% (Daryan), 74 (Germezigol), 88% (Ligvan), and 16% (Saeedabad) at the validation stage. Overall, it can be stated that hybrid GEP-SETAR models are demonstrated improving classical models, and they can be used for analyzing, modeling, and predicting stream flows for future water management. For future studies, it can be useful to consider other statistics like mean and kurtosis using new methods such as SETARMA, BL, BL-ARCH, or other autoregressive conditional heteroscedasticity models such as GARCH, partial autoregressive conditional heteroscedasticity (PARCH), Non-linear GARCH. In contrast, this study applied the variance statistics of observed streamflow data. Another suggestion is for data decomposition as nonlinearity, non-stationary, complexity, and random distribution can be seen on hydrological processes (i.e. streamflow, rainfall, water stage, groundwater, etc.). To overcome those difficulties, several data pre-processing tools such as complete ensemble empirical mode decomposition (CEEMD), improved CEEMD, and variational mode decomposition (VMD), can be addressed.