A decomposition and multi-objective evolutionary optimization model for suspended sediment load prediction in rivers

Suspended sediment load (SSL) estimation is essential for both short- and long-term water resources management. Suspended sediments are taken into account as an important factor of the service life of hydraulic structures such as dams. The aim of this research is to estimat SSL by coupling intrinsic time-scale decomposition (ITD) and two kinds of DDM, namely evolutionary polynomial regression (EPR) and model tree (MT) DDMs, at the Sarighamish and Varand Stations in Iran. Measured data based on their lag times are decomposed into several proper rotation components (PRCs) and a residual, which are then considered as inputs for the proposed model. Results indicate that the prediction accuracy of ITD-EPR is the best for both the Sarighamish (R 2 = 0.92 and WI = 0.96) and Varand (R 2 = 0.92 and WI = 0.93) Stations (WI is the Willmott index of agreement), while a standalone MT model performs poorly for these stations compared with other approaches (EPR, ITD-EPR and ITD-MT) although peak SSL values are approximately equal to those by ITD-EPR. Results of the proposed models are also compared with those of the sediment rating curve (SRC) method. The ITD-EPR predictions are remarkably superior to those by the SRC method with respect to several conventional performance evaluation metrics.


Background
Water is an essential substance for all creatures. Nowadays, the growing world population is directly dependent on water adequacy. Because of different water distribution systems in various parts of the world, the idea of using reservoirs is highlighted and has great significance. Thus, reservoir capacity is a topic of the utmost importance. Yet, problems related to reservoir capacity are serious challenges in this growing world (Cieśla et al., 2020). Suspended sediment load (SSL) in rivers, which is affected by hydrological and meteorological variables, is one of environmental and hydrological issues in watersheds (Adnan et al., 2019). The sedimentation rate is one watershed management tools, and the volume of sediments also has effects on CONTACT Chengwen Wu sj_wcw@wzu.edu.cn; Shahab S. Band shamshirbands@yuntech.edu.tw; Amir H. Mosavi amir.mosavi@mailbox.tu-dresden.de water quality (contaminant transport), aquatic animals' health, ecological impacts, geomorphology, channels' and hydraulic structures' design, river bed sustainability, and dams and reservoirs engineering (Khan et al., 2019;Kisi & Zounemat-Kermani, 2016). Furthermore, sediment rates have effects on the volume of dams, in which silting and erosion phenomena cause a reduction in capacity (due to the increase of dead storage). The hydrology of the basin can be used to estimate the concentration of suspended sediment carried by rivers in the watershed .
In general, numerical environment modeling needs ubiquitous dynamic modeling systems in which the generated models can be very near to the observed historical data. Therefore, bearing in mind the architectures of hydrological data, which consists of both stochastic and deterministic behavior, hydrologists try to model hydrological variables in both physical applications and stochastic models. For this, there are many computational methods, such as data-driven models (DDM), which are gateways for accurate environmental forecasting. Consequently, DDM techniques have vast application areas in environmental modeling and have been used by many scholars in recent years. Modeling and prediction of SSL are one of the essential issues in water resources engineering, and decision-makers use watershed sediment modeling globally. Precise SSL modeling, like modeling in other fields, depends on the significance of data, input parameters, lag of input variables and data scale (hourly, daily, weekly, monthly and yearly). The rigorous estimation of SSL is fundamental in hydraulic and sediment engineering in a river basin (Zounemat-Kermani et al., 2020;Sharghi, Nourani, Najafi, & Gokcekus, 2019). There exists a sediment rating curve (SRC) that defines the relationship between the discharge of streamflow and sediment concentration. SSL time series are affected by different variables, including hydrological, morphological, and meteorological variables, which render SSL data complex (Kisi & Yaseen, 2019). Additionally, due to the complexity and nonlinearity of SSL time series, data-driven models (DDMs) have delivered more efficient results compared with those from physical and empirical models. DDMs have long been associated with SSL modeling. They are able to learn from the behavior of input data and result in fast computing and rigorous, flexible and accurate prediction results.

Literature review: forecasting models
There is a rich literature reporting on SSL modeling with DDMs over the past few decades. Various models, such as artificial neural networks (ANN) (Fard & Akbari-Zadeh, 2014;Kumar et al., 2019;Liu et al., 2019;Samet et al., 2019;Zounemat-Kermani et al., 2020), emotional ANNs (Sharghi, Nourani, Najafi, & Soleimani, 2019), adaptive neuro-fuzzy systems (ANFISs) (Adnan et al., 2019;Ehteram et al., 2019), genetic programming (GP) (Jaiyeola & Adeyemo, 2019; Khozani et al., 2020;Safari & Mehr, 2018), local weighted linear regression (LWLR) (Kisi & Ozkan, 2017), extreme learning machines (ELMs) (Ebtehaj, Peterson et al., 2018;Roushangar et al., 2021), support vector machines (SVMs) (Ebtehaj, Bonakdari, Shamshirband, & Mohammadi, 2016;Meshram et al., 2020;Rahgoshay et al., 2019), multivariate adaptive regression splines (MARS) (Adnan et al., 2019), fuzzy c-means clustering techniques (Kisi & Zounemat-Kermani, 2016), etc., were able to model and predict SSL. Zounemat-Kermani et al. (2020) employed hybrids of ANFIS and SVR with genetic algorithms (GA-ANFIS and GA-SVR) for SSL and bedload (BL) prediction, and their results were compared with those by two customary models, namely SRC and MLR, at Grande de Loíza River on Puerto Rico Island, USA. Utilizing discharge, SS and BL data on a daily scale, they found that GA-ANFIS and GA-SVR models performed better than standalone models. Rajaee (2011) evaluated a preprocessing based DDM (termed WANN) for predicting daily SSL in Yadkin River at Yadkin College, New York City, USA using 30 years of data from 1957 to 1987. Their outcomes were compared with those of MLR and SRC models. Based on his results, WANN was selected as the best performing model among MLR and SRC models. Mirbagheri et al. (2010) predicted SSL using three models, namely ANN, ANFIS and wavelet neuro-fuzzy (WNF), along with SRC. They utilized river discharge and SC data on a daily scale obtained from Rio Rosario Station located in Hormigueros, Puerto Rico, USA. They found that WNF was effective on hysteresis phenomena and it outperformed ANN and SRC. Alizadeh et al. (2017) employed ensemble WANN for modeling and forecasting a one-time step ahead of SSC at Skagit River near Mount Vernon in Washington County, Pennsylvania, USA. They selected observed and forecasted timeseries data of SSC as input variables. They found that each step ahead WANN performed better than the previous one. Martins and Poleto (2017) tested maximum entropy in modeling SSC with two data series (Coleman, 1986). They compared the results of empirical equations of maximum entropy (Tsallis and Shannon entropy) with Prandtl von Karman methods and the Rouse equation. They found that Tsallis and Shannon entropy performed better in modeling SSC.
Owing to the nature of hydrological phenomena such as SSL, their behavior is generally characterized by high non-stationarity and nonlinearity changes. Therefore, creating an accurate predictive model is highly challenging owing to the existing high complexity issue. In this regard, DDMs that can create nonlinear relationships between inputs and outputs are considered. Although DDMs have been successfully applied in SSL prediction, they have some weaknesses. For example, some DDMs such as ANN, ANFIS and SVM techniques have unknown parameters that have remarkable influences on their accuracies. Most literature reviews commonly use classical training algorithms for training these models. Nevertheless, these models may be trapped in a local optimum. Moreover, some DDMs lack a comprehensive expression for use in practical tasks and may also produce uncertainties in terms of predicted values. These reasons lead to the development of equation-based models such as EPR and MT for the estimation of hydrological phenomena. In addition, owing to the nonlinearity and seasonality of time-series datasets, applying datasets directly to models may not provide significant insights for these phenomena. Hence, the employment of data pre-processing techniques, which can extract the embedded features of non-stationary and dynamical time-series signals, is highly recommended. Among various pre-processing approaches, intrinsic time-scale decomposition, as one of the noise-assisted data analysis approaches, is considered for decomposing input and output variables with a few PRCs, which can convert non-stationary to stationary signals. Hence, a novel beneficial approach coupling equation-based models and pre-processing techniques is developed in this study to create a robust predictive model Rezaie-Balf et al., 2019).

Problem statement
SSL consists of small particles of silt, sand, clay, gravel, etc. that are smaller than 63 μm in size. Under certain critical weather conditions (e.g. climate change), strong typhoons or stormwater runoff, extreme sediment transportation occurs and particles are carried extensively from upstream areas into downstream plains and watersheds (Huang et al., 2019). Once the flow velocity reduces, these particles will settle and gather in river bottoms or hydraulic structures like channels, reservoirs, dams, etc. For instance, sediment deposition in dams causes a reduction in their active storage capacities (Adnan et al., 2019). SSL can also diminish soil quality and agricultural crop yield. Therefore, understanding sediment behavior, modeling and prediction of SSL is a significant step in hydrological management.
All DDMs and regression models necessitate knowledge of the exact structure of input sediment data. Thus, attention has also been focused on input data (preprocessing techniques) prior to modeling. Meanwhile, sediment data, like other hydrological data, have some characteristics such as stochastic nature, time and frequency domain, anomalies, trends, seasonality, periodicities, etc. (Attar et al., 2020). Directly utilizing raw data for model generation causes some uncertainties and errors, and also diminishes the accuracy of the model. Over the past few years, a growing appeal of utilizing pre-processing techniques has been found in hydrological issues, which has successfully enhanced model performance. In recent years, several signal processing techniques have emerged, which can separate fluctuating signals into individual smaller sequences. In order to analyze nonlinear and complex data series (SSL time-series data), these decomposition methods are adaptable and practical tools. They try to decompose the original data into a limited number of residuals (Zeng et al., 2020). Principal component analysis (PCA) (Loska & Wiechuła, 2003), continuous wavelet transforms (CWTs) (Tiwari & Chatterjee, 2010), wavelet multi-resolution analysis (Nourani et al., 2014), maximum entropy (Singh & Krstanovic, 1987), singular spectrum analysis (Rocco S, 2013), and some noise control techniques of time-series data such as empirical mode decomposition (EMD) techniques (Kuai & Tsai, 2012), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) (Adarsh & Reddy, 2015), intrinsic time-scale decomposition (ITD)  are some algorithms that have been used by researchers. ITD, which was first presented by Frei et al. in 2006, practised new processes of constructing the baseline through a piecewise linear function (Frei & Osorio, 2007). This is a novel model based on adaptive decomposition techniques and can expose the signal decomposition in a better way. At the endpoints, ITD can rigorously control the endpoints effect. Because of the mono level iteration of the ITD algorithm, fast running is another advantage. In ITD, the original data is divided into several (more than five) monotonic PRCs from high to low frequencies. These PRCs can then be applied to analyze immediate information (Zeng et al., 2012).

Objectives and motivations
Real-world data such as SSL in rivers often have some complex characteristics such as non-stationary, nonlinear, noisy and limited dynamical information. Thus, high quality and novel techniques are entailed in modeling these data taking their characteristics into. Today, there is significant development in the use of DDMs in real-world applications. In some cases, results from standalone DDM models have shown lower accuracy compared with those from hybrid models (like ITD-DDMs) (Hassanpour et al., 2019). Recently, hybrid models (pre-processing-DDMs) have been found to be able to give an acceptable level of accuracy and havebecome the most essential ideas in the hydrological modeling literature, including SSL prediction. The objective of the present study is to enhance the accuracy of selected novel DDMs with innovative pre-processing techniques in SSL prediction and modeling at two different stations in Iran, namely Sarighamish Station on ZarrinehRoud River in the Saeengale sub-basin and Varand Station on Chahardangeh River in the Neka subbasin. To the best knowledge of the authors, there is no related research on using ITD pre-processing techniques in hydrological problems, especially sediment transport problems.

Study area and data description
ZarrinehRoud and Chahardangeh Rivers are selected from two different sub-basins (Saeenghale and Sari) in north and north-western Iran, respectively, with one hydrometric station on each river. It is worth mentioning that these two stations (Sarighamish and Varand) are selected based on their area size for evaluating models from two important basins in Iran, namely the Urmia Lake basin (ULB) and the Mazandaran basin. Information about their geographical characteristics, including longitude, latitude, elevation, and their related sub-basins of Sarighamish and Varand Stations, are given in Table 1. Moreover, the locations of the selected stations and key information about input data (SSL and discharge), including their statistical characteristics such as mean, standard deviation, kurtisos, maximum, minimum, variance and skewness are illustrated in Figure 1. Based on an analysis of the input data from these two rivers, the mean values of streamflow discharge along with SSL for Sarigamish Station in both the training and testing stages are significantly greater than those of Varand Station.
As stated above, the area of the Saeenghale sub-basin is 7160 km 2 , which is seven times greater than that of the Sari sub-basin with 1198.3 km 2 . ULB is located in northwestern Iran (44°7 to 47°53 E and 35°40 to 38°30 N) with 51,876 km 2 , which amounts to 3.15% of the area of all basins in Iran. ULB consists of 24% land and 11% Urmia Lake. ZarrinehRoud River is one of the biggest rivers in Urmia Lake basin, which debouches to Urmia Lake (24% of the total intake to Urmia Lake). It is 302 km long, with an average annual water flow of 1583 MCM. This river originates in Chehelcheshme Mountains, crosses Shahindezh, Miandoab and Keshavarz cities and finally reaches the southern part of Urmia Lake. Based on the literature, the flow rate of ZarrinehRoud River varies from 120 to 10 m 3 /s annually. This significant fluctuation in river flow rate depends on climatic conditions. There are also dissolved components in ZarrinehRoud River that lead to dissolved sediment loads. These data are measured by hydrological experts in Urmia environmental and water organizations. Sarighamish is a hydrometric station located on this river. This station is located in the Saeenghale sub-basin of ULB (Emdadi et al., 2016;Ghavidel & Montaseri, 2014). Sari City is the capital city of Mazandaran, Iran, which is located at 36°34 4 N, 53°T able 1. Geographical locations and characteristics of stations. 3 31 E. Based on the literature, the climate of this city is mild, with much rain in winter (Vatanpour et al., 2020). The annual average temperature is reported to be 16.7°C. Moreover, on average, the month with the highest temperature is August at 25.2°C and February is defined as the coldest month with a temperature of 25.2°C. Additionally, the average precipitation of Sari City is 690 mm per year (Ghanbarpour et al., 2013). The lowest rainfall occurs in June (23 mm), while the highest rainfall (98 mm) occurs in December. Tajan River is known as the most important river of Sari City since it provides water supply to the whole watershed. Chahardangeh River is one of the principal tributaries of Tajan River, which initiates in Kiyasar region mountains. Chahardangeh River, of total length 95.73 km, is located in the Sari sub-basin of Mazandaran province with an area of 1198 km 2 . Moreover, Sari sub-basin is located between (86°46 38.8 to 30°55 19. 2 N) and (52°46 44.8 to 6°56 30.1 E); it also has some hydrometric stations. Hydrometric data from Varand Station on Chahardangeh River from July 1995 to August 2017 are employed in this study.
In the present study, for modeling discharge and SSL data, all data series are divided into two categories, namely 75% and 25% of the entire data series for training and testing, respectively. From Sarighamish Station, data for 15 years (0.75 × 244 = 184) are for training, and data for the remaining five years (0.25 × 244 = 61) are considered for testing. From Varand Station, data for 17 years (0.75 × 262 = 197) are for training, and data for the remaining six years (0.25 × 262 = 65) are for testing. Additionally, input data for modeling in this study are SSL and discharge data from the selected stations.

Sediment rating curve (SRC)
The SRC is a mathematical equation relating water discharge and sediment concentration. The SRC is an essential tool when there is no manpower, regular sediment sampling or laboratory investigation. The SRC can be displayed as either an equation or a graph, relating sediment and discharge, which help to solve sediment related problems efficiently. SRC equations and curves are mainly used in sediment transport estimation in rivers (Demirci & Baltaci, 2013). The SRC often comes in the form of power equation (Kisi & Zounemat-Kermani, 2016). Additionally, a rating curve can help experts to estimate sediment loads using streamflow data. Equation (1) demonstrates the equation in detail, which is basically an exponential regression between SSL and discharge: (1) in which Q stands for discharge (m 3 /s), SSL denotes suspended sediment load (mg/L) and a and b are constant coefficients.

Intrinsic time-scale decomposition (ITD)
ITD, a time-frequency analysis method, was first presented by Frei & Osorio (2007). ITD has the ability to decompose complex, nonlinear and non-stationary signals into a group of PRCs with few iterations. The principal purpose of ITD is to assimilate higher signals into several PRCs. ITD is developed with the following steps: The operator L is considered to represent local extrema of X t , which removes the baseline signal. The performance of the operator results in precise rotation. PRCs can be computed by i.e. the mean of signals.
The steps of ITD algorithms include the following.
(a) Defining the extreme points of x(t) as an input signal with the occurrence time of τ k , where k = 0, 1, 2, . . . with τ 0 = 0 as the first signal (b) Defining x(t), L(t) and H(t) as input signal and operators on the interval [0, τ k + 2], respectively. It is noteworthy that operator L is considered to be a linear function on the interval [τ k , τ k + 1] during the baseline extraction process. The baseline extraction operator is computed as follows: and in which 0 < α < 1 is taken as a constant parameter (α = 1/2). The process of making monotonic original signal generation x(t) between the extrema points is an essential step in defining PRCs. (c) H(t) can be defined as follows for extracting PRCs: PRCs can be computed by subtracting the baseline from the input signal.
(d) The process is repeated (Equations 9 and 10) until a monotonic function is constructed by baseline (L(t)), which is a single signal divided into PRCs: in which p is the number of PRCs constructed.
The critical concepts about the ITD model can be summarized as low computation time, free transient smoothing, smearing in time-scale space solving, iteratively generating optimum PRCs and constant shifting. Additionally, because of feature extraction and filtering, this method can perfectly adapt to realtime data (naturally accrued) while maintaining its initial structure. More details about the ITD method can be found in Frei and Osorio (2007) and Wang et al. (2021).

Multi-Objective genetic algorithm-based EPR (MOGA-EPR)
EPR is one kind of hybrid and nonlinear regression mathematical method that is hybridized with genetic algorithms (GAs) (Giustolisi & Savic, 2006). In other words, a GA is used for searching for exponents in a symbolic formula based on a regression technique. This method demonstrates the physical polynomial structure of a system. The main objective of polynomial models is to combine output variables with functions. EPR can be summarized in two main steps, namely (a) structure generation made by a GA, and (b) estimating constant variables using the least squares (LS) method. Based on the EPR model, the prevalent mathematical formula is shown as follows: where y is the independent variable (output), a i are parameters needing to be adjusted, F and f are processesbased user-defined functions, X is an input matrix and m is the number of terms excluding a 0 . The input matrix can be defined as It is worth mentioned that the performance of the EPR model has a direct relationship with certain factors, including the number of inputs, selected functions (e.g. logarithmic, hyperbolic, exponential, etc.) (Giustolisi & Savic, 2006). Based on the GA algorithm, a population of solutions as individual chromosomes is generated by the LS method (minimizing the sum of squared errors). Thus, the general EPR equation is expressed as follows: in which m is the maximum number of additive terms, X i andŶ are model inputs and outputs, respectively, f and ES are user-defined functions and k is the number of input variables. More details about the EPR method can be found in Bonakdari et al. (2017) and Ghaemi et al. (2021).

Model tree
The MT method was first introduced by Quinlan (1992). It is a kind of machine learning method that deals with the classification of extensive data and, as a result, generates robust and rigorous models. MT is based on a binary decision tree; thus, it generates tree-based models and is used for continuous class learning. It is noteworthy that decision trees are suitable for categorical type data, but it is applicable for numerical type data too (Quinlan, 1992). It is used to find a proper relationship between input(s) and output variables by utilizing linear regression models to primary leaf nodes (parent nodes). MT is based on the "divide-and-conquer" method, in which the data sets either connect to a leaf (parent node) or split into subsets in order to evaluate results within the process. In some cases, these divisions result in complex structures; thus, it prunes back by new subtrees and leaves. Computing the standard deviation (sd) of data sets is the first step in the MT method. The data sets are then split to generate a decision tree. Eliminating over fitted outcomes (pruning) is the second step in the MT method. Pruning techniques, which are based on regression functions, are useful in omitting sub-trees.
The standard deviation (error) is computed as follows: in which "sd" signifies the standard deviation, T is a set of examples that reach the primary node and T i denotes the subset of patterns that possess the ith outcome of the potential data set. By inspecting all outcomes, the MT method selects a node that minimizes the error. The strong points of the MT method include error estimation in unseen cases, utilizing multivariate regression methods in each node, simplification of linear models to minimize error and smoothing predicted values. More information about the MT method can be found in Ghaemi et al. (2019).

Hybrid ITD decomposition-based models
In nature, hydro-climatic time series often comprise many intrinsic mode functions with different frequencies and exhibit complex nonlinear characteristics. In addition, the results of climate change (such as extreme weather) and human activities (such as environmental pollution and deforestation) are becoming prominent, and these adverse hydrological parameters might lead to deviation from normal climatic patterns and thus accurate prediction could be difficult to achieve. Consequently, the performance of a single prediction model is often not advised if the original signal (or one resolution component) is directly adopted as the input variable.
Signal decomposition algorithms can decompose hydrological time series into a set of relatively stable sub-series and reduce modeling difficulty. Hence, subseries decomposed by the pre-processing algorithm, ITD, include only a similar scale of hydrological variables and thus are easier to predict. ITD is firstly used to decompose suspended sediment load and river flow time series into a set of sub-series and then a standalone MT and MOGA-EPR are applied to build adequate models for the prediction of each sub-series according to their own characteristics. A schematic diagram of the proposed hybrid ITD-MT and ITD-EPR models is given in Figure 2.
The pre-processing-based models (i.e. ITD-MT and ITD-EPR) comprise the following main steps. In the first step, the original data is divided into two parts, namely the training and testing parts. Secondly, the ITD procedure is employed in order to decompose the original input and output time series E(t) into several PRC components H(t) (i = 1, 2, 3, . . . , n). In the next step, for each extracted PRC component (for example PRC1), MT and EPR models are established as SSL predicting tools to simulate the decomposed PRC components, and to compute each component by the same sub-series (PRC1) of input variables, respectively. Finally, the predicted values of all extracted PRC components using MT and EPR models are aggregated to generate the SSL, and then the error via the predicted data set is evaluated. To summarize, hybrid ITD and MT/EPR predictive approaches are employed according to the "decomposition and ensemble" idea. The decomposition is used to simplify the forecasting skill, while the aim of the ensemble is to formulate a consensus prediction on the original data. In this study, for validating and making the pattern of the provided PRC components (e.g. PRC1, PRC2, PRC3) reflect the prediction model and enhancing the SSL prediction accuracy, data sets from Sarighamish and Varand Stations in Iran are selected.

Statistical performance evaluation measures
In order to determine the best model, five statistical criteria, namely the coefficient of determination (R 2 ), the root mean square error (RMSE), Legates-McCabe's Index (LMI), the ratio of RMSE to standard deviation (RSD), Willmott's index of agreement (WI) and the Akaike information criterion (AIC) are implemented as shown in Equations (11) to (15) (Moeeni et al., 2017;Sun et al., 2021;Zeynoddin et al., 2019): where SSL o and SSL m are the observed and modeled values of SSL, respectively. Moreover, SSL o and SSL m denote average values of the observed and modeled SSL data, respectively, N is the total number of data points and K indicates the number of parameters.

Results and discussion
The main aim of this study is to evaluate the accuracy of ITD-EPR/MT in predicting SSL at Sarighamish and Varand Stations. Thus, in this section, the performance of the models with different input combinations (some   197) are for training, and the remaining six years of data (0.25 × 262 = 65) are for testing. After that, the performance of standalone models of EPR and MT, and integrating them with ITD pre-processing, are investigated using some evaluation benchmarks (namely R 2 , RMSE, WI, LMI, RSD and AIC) in the training and testing stages at the two proposed stations.

Optimum input variable selection for SSL prediction
In this study, the proposed prediction models are developed in a MATLAB R environment. One of the most remarkable steps in the development of model architectures is to determine the best input variable for modeling. The original (non-ITD) dataset with its statistically substantial lagged variables, determined by cross-correlation functions (CCFs) and partial autocorrelation functions (PACFs) operating in a 95% confidence interval, are applied as inputs for developing the models. For the time-series datasets of this present study, the time delays of input/output parameters (SSL and Q) are computed by the above-mentioned function. PACF and Partial CCF diagrams of Sarighamish and Varand Stations are shown in Figure 3, in which the vertical axis indicates the time delay (lag number) and the horizontal axis shows the PACF and CCF. Time delays applied to the models are marked in all diagrams. According to Figure 3, two lags of Q are important for modeling SSL at Sarighamish Station. Moreover, PACF is applied for lag time selection of the output variable (SSL). Clearly, the PACF value for Sarighamish Station equals two, whereas no time lag is determined for either the input or output variables at Varand Station.
After having considered special input variables for the proposed stations, the best models standing at an acceptable level of accuracy returned by EPR for Sarighamish and Varand Stations are expressed, respectively, as follows:

Prediction results at Sarighamish Station
As seen in Table 2, the integration of ITD-EPR/MT gives preferable accuracy (i.e. generally the largest R 2 and the lowest RMSE) compared with the other standalone models, which indicates that intrinsic time-scale decomposition has a high influence in increasing the accuracy of DDM models at Sarighamish and Varand Stations. On the other hand, regarding the performance comparison of the proposed methods at the training stage, it is obvious that, for SSL forecasting, the best predicted SSL values (based on the observed values) are yielded by ITD-EPR with statistical parameters (highest R 2 = 0.95, lowest RMSE = 1385.4 and RSD = 0.257) compared with inferior results for ITD-MT (R 2 = 0.93, RMSE = 1537.1  and RSD = 0.285), and standalone models such as EPR (R 2 = 0.82, lowest RMSE = 2293.7 and RSD = 0.426).
In the testing stage, evaluation metrics in terms of RSD (0.35) and RMSE (291.81) by ITD-EPR outperform those by other methods such as SRC with higher RSD (40%) and RMSE (43.73%), which stands at the second rank. Despite of acceptable performance of ITD-MT for the training dataset, it has lower accuracy than standalone EPR (15.18% of R 2 and 37.5% of LMI) for estimated SSL values. In addition, ITD-EPR is selected as the most appropriate model based on AIC (700.484) with the lowest error.
Furthermore, in order to achieve a thorough concept of the efficiency of the proposed models, scatter plots of the observed and estimated values in the training and testing stages are displayed in Figure 4. Scatterplots show a linear regression line (SSL for = a·SSL obs + b, where a is the slope and b is the ordinate intercept) between the measured and estimated values. A greater R 2 illustrates a better agreement between the observed and forecasted SSL values, and the ITD-EPR model outperforms other methods applied.
Peak SSL values observed and forecasted by different methods in the training and testing stages are presented  in Figure 5. According to Figure 5, among the models, the difference between observed and predicted SSL by the SRC model is the highest. On the contrary, SSL predicted by ITD-EPR (the solid blue line) is the closet to the observed TDS time series (the dotted black line), which explains the best accuracy of this model compared with other models. It is also obvious from Figure 5 that the dispersion of relative errors between the observed and forecasted SSL by the ITD-EPR model is closest to zero.

Prediction results at Varand Station
Similarly, the evaluation metrics R 2 , RMSE, WI, LMI and RSD applied for SSL prediction at Varand Station for training and testing datasets and results are presented in Table 3.  (Table 3). Similar to the training stage results, comparison of the proposed models in Table 3 indicates ITD-EPR (RMSE = 168.91; WI = 0.93; LMI = 0.69) and SRC (RMSE = 321.08; WI = 064; LMI = 0.48) attain the highest and lowest SSL forecasting accuracy in the testing stage, respectively. Moreover, an evaluation of the efficiency of ITD in SSL prediction indicates that the integration of this pre-processing method can improve the accuracy of the proposed models. It is shown from Table 3 that, although the MT model estimates SSL values poorly (R 2 = 0.69 and LMI = 0.5), its integration with the ITD method increases R 2 and LMI values by roughly 27.53% and 10%, respectively.
Additionally, scatter plots, relatively error and time series of estimated SSL values versus the observed ones are presented in Figures 6 and 7. In terms of scatter plots, the slope of SSL values for the ITD-EPR model is closest to the ideal line and a number of SSL estimated values are underestimated. Moreover, hybrid models have better performance than other models in estimating peak values. Taking the maximum SSL value as an example (1123.678), the proposed methods, namely EPR, MT, SRC, ITD-EPR and ITD-MT, underestimate by about 18.95%, 35.31%, 63.06%, 5.11% and 13.56%, respectively. In the case of the relative error for the testing dataset, the variations of relative error values are between −5 and 5, whereas those for the best standalone model (EPR) are between −8 and 8.

Discussion and study limitations
The SSL simulations in the present study reveal that there are obvious differences between the original DDMs and data decomposition-based models, indicating the significance of the training framework in prediction models. As stated above, for a standalone prediction model, it is often hard to fully and accurately reflect the formation and changing mechanisms of natural hydrological variables such as SSL since only one resolution component is used for establishing the predicting module. It indicates that other resolution sub-components in the original SSL time series cannot be separated effectively. To avoid this problem, decomposition algorithms are suggested to select various resolution intervals, and after that, the features of each sub-series can be separated. Therefore, the performance of hybrid methods (such as ITD-MT and ITD-EPR) are superior to those of the standard MT and EPR methods, which match the results of recent similar studies (Napolitano et al., 2011;Wu & Huang, 2009).
In addition, for further comparison, box plots are employed for evaluating the accuracy of the proposed models in SSL forecasting at two proposed stations. In this study, box plots indicate the spread of relative errors between observed and estimated SSL values based on quartiles so that the whiskers explain the variations outside the 25th and 75th percentiles (Figure 8) (Prasad et al., 2018). Regarding the distribution of the relative errors of the models, the higher and lower capability of ITD-EPR and SRC are clear compared with other approaches. Furthermore, it is obvious that a huge number of SSLs estimated by models are underestimated.
Moreover, the prediction performances of various methods for peak SSL values are compared. The thresholds for sediment series for Sarighamish and Varand Stations are equal to 11164 and 564.721 ton/day, respectively. Therefore, 10 SSL observations are selected exceeding the threshold and their corresponding forecasted values by the proposed models are shown in Figure 9. In case of Sarighamish Station, except the SRC model, which has the lowest accuracy for all maximum SSL values, all other models have approximately the same error for SSL peak values with less than 10,000 ton/day. However, for other peak values, the efficiency of the ITD technique in SSL forecasting is undeniable. The dominant superiority of ITD-EPR in SSL forecasting is summarized with an average error percentage value (13.60%), which is significantly lower than the corresponding ITD-MT (26.79%), EPR (32.26%), MT (41.45%) and SRC (75.56%) for 10 sediment peaks. In the case of Varand Station, all models are roughly incapable of predicting SSL peak values, especially for values greater than 10,000 ton/day. Additionally, ITD-EPR has a good performance in predicting peak values of SSL with an average error percentage value of 25.88% compared to the corresponding ITD-MT (26.52%), EPR (47.51%), MT (45.19%) and SRC (67.16%) values for 10 sediment peak amounts.
Although the proposed model has an acceptable accuracy in SSL prediction, it is possible to employ other evolutionary DDMs and modern algorithms for integration with pre-processing methods, such as vibrational mode decomposition (VMD) and complete ensemble empirical mode decomposition (CEEMD), to create more accurate models in SSL prediction. With the aim of more accurate SSL estimation, more data samples or different input variables with daily or hourly timescales may be attempted. Consequently, for future work, they may be suggested as the potential measures for improving the accuracy of SSL for forecasting both data samples and different input variables.

Conclusions
In this study, the capability of integrating ITD and DDM methods is investigated in SSL forecasting at Sarighamish and Varand Stations in Iran. This study focuses on the effect of the time delay of input/output parameters in SSL predicting. A comparison of result indicates that the ITD data-decomposition method, via decomposing datasets and solving non-stationary behaviors of SSL time-series datasets, has a notable influence on increasing model accuracy. For instance, at Sarighamish Station, the estimated SSL using ITD-EPR has lower errors for RMSE (291.81) and RSD (0.35) compared with other methods. By comparing the performance of EPR and ITD-EPR, it can be seen that the computed value of RMSE decrease from 431.06 to 291.81 and the value of RSD also decreases, from 0.55 to 0.35. At Varand Station, the outcomes demonstrate that EPR and MT are accurate models with the help of the ITD algorithm. However, the results indicate that the MT model attains lower accuracy compared to other DDM models for SSL prediction in terms of R 2 (0.73) and WI (0.91) at Sarighamish Station and 0.69 and 0.79 at Varand Station, respectively.
Additionally, the proposed models are compared with the sediment rating curve (SRC) empirical method. Outcomes illustrate that SRC provides the greatest average error percentage values for both Sarighamish (75.56%) and Varand (67.16%) Stations compared with those from DDMs.
The application of the decomposition method is thus highly recommended for SSL forecasting with the same scale of input/output variables and watershed properties for evaluating DDM generalization. Although other non-linear programming simulation methods can be applied to find contributors to SSL in river bodies, it may typically be subjected to a prohibitory computational burden, especially for complex and large river systems estimation.
It should be noted that the selection of the number of training datasets in DDMs has a great impact on the estimation accuracy. It can be expected that the accuracy increases to a remarkable level by increasing the size of the data, since it leads to a better capability of models to predict SSL variability over different periods. The outcomes of this study can assist in obtaining a range of datasets to provide an optimum model for SSL estimation. It appears that the future of SSL prediction by different evolutionary DDMs and creating modern algorithms will be very bright and promising.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study is supported by the Scientific Research Fund of Zhejiang Provincial Education Department+Research on Key Technologies for Intelligent Prediction of Soil Erosion (Y202147738). This work also was supported by Taif university Researchers Supporting Project Number (TURSP-2020/114), Taif University, Taif, Saudi Arabia. Open Access Funding by the Publication Fund of the TU Dresden.