Artificial intelligence models for suspended river sediment prediction: state-of-the art, modeling framework appraisal, and proposed future research directions

River sedimentation is an important indicator for ecological and geomorphological assessments of soil erosion within any watershed region. Sediment transport in a river basin is therefore a multifaceted field yet being a dynamic task in nature. It is characterized by high stochasticity, non-linearity, non-stationarity, and feature redundancy. Various artificial intelligence (AI) modeling frameworks have been introduced to solve river sediment problems. The present survey is designed to provide an updated account of the latest and most relevant AI-based applications for modeling the sediment transport in river basin systems. The review is established to capture the subsequent developments in the advanced AI models applied for river sediment transport prediction. Also, several hydrological and environmental aspects are identified and analyzed according to the results produced in those studies. The merits and constraints of the well-established AI models are further discussed in much detail, particularly considering state-of-the art, modeling frameworks and their application-specific appraisal, and some of the key proposed future research directions. Together with the synthesis of such information to drive a new understanding of models and methodologies related to suspended river sediment prediction, this review provides a future research vision for hydrologists, water scientists, water resource engineers, oceanography and environmental planners.


Introduction
River systems are often regulated for multipurpose usage, such as, but not limited to, water supply, irrigation, navigation, flood control, and hydropower generation 2018; Yaseen, Zigale, et al., 2019). Effective utilization of a river system's resources and the mitigation of risk posed to this system usually requires water treatments with different forms of civil structures and hydrological control systems Yaseen, Mohtar, et al., 2019). Furthermore, altering the watershed runoff conditions by increasing the land use intensity for agricultural or recreation activities, and by removing the vegetation cover for urbanization, building roads and highways constructions, or mining operations can significantly affect the dynamics and the morphology of a river system (Heathcote, 2009). When a river system's equilibrium is disturbed by human-induced activity, it often tends to adjust to a new equilibrium state by scouring the bed, depositing the sediments, or changing its overall planform. Such changes may be localized or may also extend over a long reach. The issues of aggradation and degradation, siltation of reservoirs, channel scours during a flood event, local scour around the structures, and river bend migration, are the examples of such changes to a natural river system. Excessive amount of sediment in water creates problems for the operation of hydraulic machinery (Betrie et al., 2011;Walling & Collins, 2008). Large concentration of sediment also affects the overall quality of water. Therefore, addressing such problems requires an prediction of suspended loads, which represent about 95% of the total sediment loads (Simons & Şentürk, 1992). Development of a reliable suspended sediment transport model still remains a challenge due to the complex character of a river system's geometry that governs the water velocity, and the turbulence structure of flow, which in turn, control the sediment-carrying capacity of the water flowing through the river system (Armanini et al., 2015).
Despite the importance of sediment transport in environmental pollution, its pattern understanding from the engineering point of view is less than satisfactory, perhaps due to the complex interrelations of a large number of variables that influence the transport process (Shojaeezadeh et al., 2018). Traditionally, hydrologic studies have relied on similitude analysis through experiments, because there appear to be no reliable and comprehensive theoretical formulae that can describe the two-phase phenomenon of fluid and sediment transport. Numerous analytical and experimental methods were developed to compute sediment transport variables. Some of these methods describe the geometric boundary and its resistance to water flow, sediment transport rate, and mass conservation of sediment. This article aims to review some of the useful and practical artificial intelligence (AI) models that have been applied to model suspended sediment in a river system. The AI-based models are described in the next section after reviewing the relevant literature related to suspended sediment transport.
Accurate prediction of the amount of suspended sediment in rivers and streams is critically important for the operation of canals, diversions, and dams (i.e. hydraulic structures) (Cigizoglu, 2004;Liu, Zhou, et al., 2019;Sharafati et al., 2019;Suif et al., 2016). In watershed systems, sediment transport and erosion are a complex hydrological and environmental problems so the impact of sediments present in a river in terms of the global utilization of surface water resources has become a major research area (Greig et al., 2005;Malagó et al., 2017;Sinha et al., 2019). Several natural processes influence sediment dynamics in river basins, including deforestation, overgrazing, and agricultural activities that erode the soil surface and contribute much of the sediment input. Given the difficulty faced by physical-deterministic models (e.g. issues associated with initial or boundary conditions, stochasticity of river flow, and non-stationarity of the flow), the prediction of sediment load in a river is more likely to be achieved using AI-based modeling frameworks that are capable of handling nonlinear relationships between water flow and environmental factors.
It is of prime interest to the present discussion that we consider suspended sediment prediction as a complex and nonlinear process, given the several influencing factors . In general, these factors, which are likely to drive our understanding of the causal inference that exists between sediment transport and its related predictor variables, could be classified into four primary categories as Afan et al. (2016): (i) Meteorological origin (such as precipitation characteristics and erosive effect of rainfall). (ii) Hydrological origin (such as base flow, runoff amount, and floodwater rates). (iii) Geological origin (such as river basins' soil characteristics). (iv) Watershed geomorphology (drainage pattern, topography, and hypsometry).
Due to its interdisciplinary nature, sediment transport remains a widely explored research area, spanning across several methods that have been developed for the prediction (estimation) and the forecasting of suspended sediment in rivers (Williams & Berndt, 1976). However, the adoption of purely physical or deterministic methods can be inaccurate or rather time-consuming (particularly incorporating the initial or boundary conditions into the physical-based models that differ significantly for different watersheds). Furthermore, such methods can be overparameterized due to the nonlinear and complicated nature of suspended sediment transport in river bodies. Even though such spatial data can be obtained from satellite or other remotely sensed sources, they are difficult to obtain at the precise level of initial condition for different watersheds and also, they may need calibration before they are ready to be used as a physical model input (Akay et al., 2008).
The determination of physical processes involved in the transportation of suspended sediment to water bodies is important for the practical implementation and putting in place the necessary measures to mitigate sediment deposition (Adams et al., 2018;Sadeghi & Singh, 2017). Several studies have recently reported the issues associated with active storage and estimation of a reservoir's lifespan (De Vente et al., 2005). The issue of reservoir sedimentation is thus a universal problem, which, if solved, can be of universal value. Another issue is the progression of reservoir sedimentation that is generally is a complex transport mechanism (Verstraeten et al., 2003).
Various studies have been shown that the complexity of physical processes related to the current-flow density can contribute to the difficulty in predicting the concentration of suspended sediment (Leisenring & Moradkhani, 2012;Wu et al., 2018). The density current-flow refers to the specific flow pattern experienced. This can happen when the water-specific gravity of the inflow turbidity is greater than that of water in the reservoir . The water of high relative density settles at the base of the reservoir and maintains its flow under relatively clean water. However, there is a clear demarcation between the two fluids of varying densities. A conceptual model's capability to represent reality is a function of its conceptual background; hence, it is necessary to ensure that such models are built using out-of-range data. Thus, the dynamics of density current flow can be established using a conceptual model developed with reliable data in the first stage.
Water pollution is another sedimentation-related problem of interest to the present study . Soil degradation by water erosion causes the transportation of pollutants from the soil surfaces into the river bodies (Halecki et al., 2018;Zhao et al., 2017). Numerous models and empirical equations have been developed for soil losses estimation, especially to estimate the size distribution of sediments leaving the field, e.g. empirical equations and physically based models (Rice & Church, 1996). The physically-based models usually operate with a high quality, and huge data volume, which sometimes may be greater than the actual available amount of data in the study area. These data are often required to develop the empirical relationship(s) between the watershed and relevant predictors, and they must be fed as initial conditions, or empirical constants into the physics-based equations for model validation purposes. On the other hand, the composition of sedimentassociated with individual events cannot be predicted by such empirical equations, prompting the need for the often unavailable data (Cigizoglu & Alp, 2006).
Suspended sediment load (SSL) has been modeled using several methods like empirical or AI models (such as those using data-driven approaches), hydraulic/ numerical methods (such as sediment transport or physics-based models), statistics-based models (such as copula function or joint distribution models), as well as physical/mathematical models (such as distributed physically-based and lumped conceptual models) (Ab Ghani & Azamathulla, 2014;Merkhali et al., 2015). In this respect, some studies have attempted to analyze the pattern of SSL in stream flow by using one factor such as high flow and a flood event during individual hydrologic events, and these have been achieved mainly by the use of mathematical models (Duy Vinh et al., 2016;Fang & Wang, 2000;Huang et al., 2015;Lenzi & Marchi, 2000). These studies have reported that the relationship between SSL and the flow rate is not so robust, and that it is likely to follow a non-constant (i.e. non-linear) trend. Furthermore, most of these studies have pointed out a lag between the peak of SSL and that of the flow rate. Regarding the hydraulic/numerical models, these tend to be time-consuming and rather complicated because they involve solving a set of differential equations in 2-phase stream flow discharge and sediment transport (Demirci & Baltaci, 2013;Jha & Bombardelli, 2011). Mathematical models may not work favorably, unless the true spatial distribution of the data for most of the important variables e.g. evaporation and precipitation are provided and are of relatively high quality. They also require data on the spatial changes in watershed properties; hence, such models involve a prolonged modeling process. Owing to the unavailability of required data in most developing countries, the results deduced from physical and mathematical models are often uncertain and can be impractical for diverse watershed regions (Afan et al., 2016).
Traditionally, either simple statistical models (e.g. sediment rating curves (SRC)) or numerical models (e.g. finite difference methods) have been employed to simulate the behavior of SSL in rivers or streams (Nguyen et al., 2009;Walling, 1977). Recently, the emergence of artificial intelligence (AI) or machine learning (ML) models, that operate by the integration of soft computing methods and data mining approaches, have led to many promising results, especially in simulating nonlinear systems related to hydrological procedures to solve water resources problems (Qin et al., 2019;Zounemat-Kermani et al., 2009). Considering recent developments in artificial intelligence models and their implementations in modeling sediment transport in river basins, it is necessary to therefore create new grounds that can advance the past and current progress on AI-based models.
The diverse nature of independent parameters required to model sediment problems as well as the complicated nonlinear process of SSL have provided motivation to explore the capability and efficiency of AI-based techniques for SSL modeling (Francke et al., 2008;Olyaie et al., 2015;Shamaei & Kaedi, 2016). To date, several supervised learning, AI-approaches have been developed for the modeling of SSL. The frequently used keywords on river sediment prediction that employed AI-models and the countries where these works were undertaken are reported in Figure 1. Determined from the literature, these are based on the works reported in Scopus database with Figure 2 presenting the implemented models as being: (i) Network-based AI models include algorithms, such as: • Artificial Neural Network (ANN) (Alp & Cigizoglu, 2007). • Adaptive neuro-fuzzy inference system (ANFIS) (Kisi & Zounemat-Kermani, 2016). • Wavelet transformation (Rajaee, 2011).
The current study is aimed to survey the recent literature on the applicability of AI methods for sediment transport modeling, particularly over the past four years, with the primary aim of providing a best practice, summarized guideline on the application and enhanced capability of AI models. The survey also highlights the progress made in terms of AI methods and related advanced computer-aided methodologies for modeling river sediment transport. The review also discusses each of the AI model's merits and constraints, the necessity for additional exploration of various river types, their unresolved issues related to data collection, and recommendations for future research. Since previous studies embarked on an extensive review of the AI models (Afan et al., 2016), this study serves as an extension recognizing recent advancements in the deployment of AI models as prediction tools. We also aim to make recommendations for novel approaches that demonstrate the versatility of AI models in river systems engineering, water resources management and sustainability.

Artificial neural network
Artificial neural network is a powerful computation technique that can handle non-linear relationships and complex problems (Qin, Wang, Lin, Zhang, Xia, et al., 2018). ANN models have been applied in many areas, such as speech and image recognition, chemical research, medicinal and molecular biology research, and ecological and environmental studies (Lek & Guégan, 1999). ANN models were developed, considering the problem at hand and the solution to achieve the goal (Qin, Wang, Lin, Zhang, and Bilal, 2018). An ANN architecture consists of three layers, the first one is called the input layer where computation of weighted sum is performed, the second layer is used for data processing i.e. hidden layer, which can be converted to multiple layers depending on the complexity of the problem, and lastly, the final result is produced in the output layer. Each layer is made up of neurons which unite to form the architectural framework (Singh et al., 2009). Conventional ANN models typically consist of multi-layer feed-forward neural network with application of backpropagation algorithm (BPNN) (Rumelhart et al., 1988); similarly, radial basis function neural network (RBFNN) applies feed-forward neural network and one hidden layer for model execution (Chen & Cowan, 1991), multi-layer perceptron (MLP) network using feedforward neural network (Rosenblatt, 1961), recurrent neural network (RNN) using backpropagation and additional layer connected to hidden layer (Du & Swamy, 2019), Levenberg-Marquart (LM), Bayesian regularization (BR), and gradient descent and adaptive learning  (Maier & Dandy, 1999;Samarasinghe, 2006). The general architecture of three layers is presented in Figure 3.

Support vector machine
SVM, based on statistical learning theory, has a resilient conjectural statistical arrangement which makes it a robust model. SVM has been successful in handling classification, regression and other kinds of forecasting or predictive modeling. The SVM model is based on statistical learning theory and structural risk minimization hypothesis. It was developed by (Cortes & Vapnik, 1995) and since it is a kernel-based model, it is able to reduce model's complexity and prediction error. Usually, kernel type models have good adaptability, can achieve great results in global optimization and generalization, and are capable of managing small samples and minimizing empirical risk (Raghavendra & Deka, 2014). Kernels are special nonlinear functions created by non-linear mapping and allow the model to separate complicated hyperplanes. The correct selection of a kernel function is the key to produce excellent model performance. The most applied kernel functions are linear, polynomial, radial basis function, and sigmoid. The modified versions popularly applied are least-square SVM (Suykens & Vandewalle, 1999), linear programming SVM (Zhou et al., 2002), and Nu-SVM (Schölkopf et al., 2000). The general architecture of the SVM model is presented in Figure 4.

Adaptive neuro-fuzzy inference system
Fuzzy logic models are a powerful tool for dealing with difficult computational problems (Wang, Kisi, Zounemat-Kermani, Zhu, et al., 2017), and can deal with non-linearity, uncertainty, and subjective data. These models overcome the shortcomings of numerical modeling. The most popularly applied model is ANFIS, whose architecture consists of a multilayer feed-forward network that utilizes a neural network learning algorithm and can identify non-linear boundaries and fuzzy logic to distinguish non-linear equations and together can map the input-output space (Takagi & Hayashi, 1988) which allows it to achieve high non-linear mapping for nonlinear time series data. The stages of ANFIS consist of choosing the type of interference system, such as Mamdani, Sugeno and Tsumoto (Mamdani & Assilian, 1975;Takagi & Sugeno, 1985), aggregation, and defuzzification. The general five-layer design of ANFIS is presented in Figure 5.

Bayesian networks (BNs)
Bayesian methods provide conformity for reasoning in terms of the fractional beliefs under the condition of uncertainty. Various numerical parameters imply the degree of belief and later they parameters are combined and manipulated as per the rules of probability theory, considering conditional probabilities, absolute certainty, and conditionally independent variables (Pearl, 2014). They give rise to the conjugate form of Bayesian theorem and graphical theory, which adhere to probability rules for conducting interference. Thus, in a simple form, BNs are probabilistic, directed acyclic graphical models (Gregory, 2005). For experimental data analysis, a conditional distribution is computed using the rearrangement of the chain rule which allows for the computation of conditional probability of the discrete state of any variable with any number of parents. Additionally, nodes stipulate no parents, indicating that random variables are not conditionally dependent, and the joint nodes are conditionally dependent which can be calculated by a joint probability distribution which is the product of nodes and parents. States of nodes and their conditional probability can be allotted as per a qualitative or quantitative procedure. The efficient BNs are as good as the proximity of dependencies, thus only the most relative number of variables and  . Basic Bayesian network design that satisfies the local Markov property which states that a node conditionally independent of its non-descendants' parent (e.g. E & F). Joint probability (e.g. P(G|F), product of P (node|parental node)) is calculated using the chain rule of probability.
their association with the problem should be used for construction (Mount & Stott, 2008). Consequently, the computational cost and complexity can increase exponentially, if the number of nodes is not controlled which may happen in real-life problems. A schematic representation of a basic Bayesian network design is shown in Figure 6.

Bibliography
Three AI models, including SVR, ANFIS and ANN models, were employed for SSL simulation at Coruh River, Turkey (Buyukyildiz & Kumcu, 2017). The models were built using a daily scale of discharge (Q) and SSL hydrological data, and the accuracy of prediction by the SVM model over other implemented AI models was confirmed.
Kaveh et al. (2017) examined different trained ANFIS models with parametric algorithms, such as Levenberg-Marquardt and back-propagation. They used the daily scale of Q and SSL dataset gathered from Schuylkill River, United States and results showed the capability of the applied AI models for modeling suspended sediment load. Bharti et al. (2017) developed an ANN, least support vector regression (LSSVR), reduced error pruning tree (REPT), and M5 Tree model for the modeling of Q and SSL at the Pokhariys watershed, India. Using climatological variables with monthly scale, including rainfall, air temperature (AT), relative humidity (RH), pan evaporation (ET o ), solar radiation, sunshine duration, and wind speed, they showed the influence of climatological variables on the modeled Q and SSL. However, the performance of AI models was related to the variance results.
Pektas and Cigizoglu (2017a) presented a traditional perdition investigation for daily scale sediment transport in Yadkin River using the feasibility of ANN and MLR models. The ANN model reported an acceptable prediction performance. The same case study was investigated for ANN internal tuning (Pektas & Cigizoglu, 2017a, 2017b. On the same trend, ANFIS and ANN models were implemented for sediment prediction (Riahi-Madvar & Seifi, 2018). Predictive models constructed, based on hydraulic parameters, including hydraulic radius, water surface slope, sediment particle dimensions, sediment shear, river discharge, and water depth. From statistical results and uncertainty analysis, the ANFIS model showed better results than the ANN model. The ANFIS and ANN models were developed for simulating suspended sediment concentration for Thames River, London, Ontario, using daily water temperature, river discharge, and electrical conductivity, and results evidenced the potential of ANN in modeling SSC. Moeeni and Bonakdari (2018) combined the integrated autoregressive moving average exogenous (ARMAX) model with ANN to predict SSL based on river Q and SSL using daily time scale. Normalizing the data using the exponential and Box-Cox transformations, they found the combined model to have an excellent predictive performance. Hamaamin et al. (2018) designed two AI models, including ANFIS and Bayesian regression (BR), to predict SSL for Saginaw River, United States, with daily data of air temperature, rainfall, and SSL as input variables. The models were validated against the soil and water assessment tool (SWAT) and both models provided a reliable alternative for SWAT to predict SSL. Gholami et al. (2018) applied the ANN model to evaluate soil erosion from Kaslilian watershed, Iran. The Geographic information system (GIS) was used for data pre-processing of the spatial variation of soil erosion, where the predictors were rainfall intensity and amount, soil moisture, vegetation cover, slope, air, and soil temperature. The ANN model performed well for the watershed.
The classification and regression tree (CART), adaptive neuro-fuzzy inference system, multi-layer perceptron, and two support vector regression models were compared for simulating SSL of Haraz watershed in the northern region of Iran (Choubin et al., 2018). These models were constructed, based on several hydrometeorological data (e.g. river water level, river discharge, rainfall, and SSL). The CART model was found to be superior, especially for one-month lead time. Samet et al. (2019) predicted SSL in Gizlarchay River, Maku Dam, Iran, using ANFIS, ANN, and GP using water temperature, river discharge, and three-section sediment sampling (CM) and found ANFIS to be far better than ANN and GP. Khan et al. (2019) predicted SSC of the Ramaganga River, India, using the ANN model and with daily river discharge and suspended sediment concentration. The SSC prediction was promising for the watershed. Emamgholizadeh and Demneh (2019) used daily data of Q and SSL from Telar and Kasilian Rivers, Iran, to construct several AI models (i.e. GEP, ANN and ANFIS) which were validated against the SRC approach. The evolutionary GEP model showed potential for modeling SSL. Bisoyi et al. (2019) employed the ANN model to predict SSL for Narmada River, India, using daily rainfall, Q and SSL data. Considering various hydrological variables and focusing on one peak event based on monsoon season, it was necessary to use rainfall and Q datasets.
Proper input selection is essential for predictive models (Noori et al., 2011). Kumar et al. (2019) used ANN and ANFIS along with Gamma Test (GT) for input selection for modeling river sediment and discharge. Daily antecedent Q and SSL from Pathagudem and Polavaram watersheds in India for the period 1996-2010 were used. Results showed the suitability of GT as a prior stage for the learning process. Singh et al. (2018) integrated GT and correlation function (CF) with AI models, including multi-layer perceptron (MLP), co-active adaptive inference system (CAN-FIS), self-organization map (SOM), and radial basis function (RBF), and two regression models (e.g. SRC and MLR). Models were constructed for hydrological data obtained from Burhabalang basin, Orissa, India. Both GT and CF were used to select the appropriate input variables as a prior modeling procedure. The MLP model with one lead time of SSL and Q was the best prediction model among all other models. Malik et al. (2019) used the RBF, SOM, least square support vector regression, and MARS models with GT for input selection to predict SSC from the Ashti, Tekra and Bamini stations located at Godavari River basin, India. They found the RBF model to be the best model.

Wavelet transform
The limitation of various AI models to handle nonstationary data has led to incorporate wavelet transform, developed by (Grossmann Jean, 1984), as s signal preprocessing method to provide insights into timescaling and its relationship through numerical analysis and manipulation of multi-dimensional signals. This technique can be used for diagnostic classification and forecasting (Nourani et al., 2014). The hybrid model applies the wavelet transform, however, its performance is based on the selection of a suitable mother wavelet and decomposition level. Frequently used mother wavelets are Daubechies-1, Daubechies-2, Daubechies-4, Haar, db2 and db4. The two common forms of wavelet analysis are discrete wavelet analysis which deals with discrete signals and decomposes the series into sub-signals at a specific wavelet and decomposition level (Daubechies, 1988). The other is continuous wavelet transform which deals with continuous signals and is useful for disclosing series features under multi-temporal scales (Percival & Walden, 2000). Wavelet transform is applied for wavelet decomposition, wavelet de-noising, wavelet aided complexity explanation, and wavelet aided predicting (Sang, 2013). A schematic diagram of hybrid wavelet-AI model is shown in Figure 7.

Bibliography
A complementary predictive model, based on the integration of data-mining M5 decision tree (M5Tree) with wavelet data pre-processing technique, was developed for SSL prediction for the upper Rio Grande and Lighvanchai Rivers . Discharge and sediment data with daily and monthly time scales were used to build the predictive models, and Standalone M5Tree and ANN models were used for comparison. The complementary WA-M5Tree was found to be a robust prediction model for both rivers and time scales. (Sharghi, Nourani, Najafi, and Soleimani, 2019) coupled complementary wavelet exponential smoothing (WES) with ANN model to enhance predictive performance based on time series data pre-processing as a prior stage to prediction. WES-ANN was validated against autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), and ANN models. The Upper Rio Grande and Lighvanchai Rivers daily SSL data were used to build the complementary predictive and benchmark models. The proposed model exhibited improved prediction. Himanshu et al. (2017) evaluated the wavelet SVM (W-SVM) and SVM models for daily SSL prediction for different lead times (1, 3, and 6 days), using the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA). The WSVM model was found to be applicable for the prediction of SSL over all targeted leads, and results showed the bias of TMPA precipitation on SSL modeling for data-sparse regions. Alizadeh et al. (2017) equated the W-ANN model and classical ANN model for predicting multiple scale ahead of SSC at Skagit River, United States. Daily Q and SSC data were used for initiating the models based on the correlated lead times. Results indicated that the W-ANN model was superior to the classical ANN model. Sharghi, Nourani, Najafi, and Gokcekus (2019a) developed a complementary model based on waveletemotional neural network (W-ENN) for predicting daily and monthly SSL in Rio Grande and Lighvanchai Rivers. The motivation of the wavelet preprocessing approach application to establish the fact that sediment transport is related with stochasticity, seasonality, and non-linearity. The proposed W-ENN model was validated against standalone ENN and ANN models and was found to be having potential for modeling SSL. Himanshu et al. (2016) used an ensemble WA-SVR model with hydrometeorological variables for the prediction of SSL, including the peak values of sediment and accumulated sediment for reservoir operation, at multiple steps ahead daily scales, including (1-, 3-, 6-and 9-days). Daily Q, rainfall, and SSL data over 40-years from two watersheds (i.e. Muneru and Marol) in India were used.  used the Hilbert-Huang transform (HHT) to identify the time scale of daily horizon Q, RF, SSC and normalized difference vegetation index (NDVI) of Kuye River, China. Several predictive models, including ANN, MLR, and their complementary version integrated with ensemble empirical mode decomposition (EEMD), were employed. The intrinsic mode function (IMF) was used to detect the correlation between RF, Q, NDVI, and SSC. The integrated EEMD-ANN model was more accurate than MLR, ANN and EEMD-MLR. The non-linear IMF provided antecedent values of the predictors.

Introduction to nature-inspired optimization algorithm
Nature-inspired meta-heuristic algorithms have gained popularity in engineering applications because of their ability to solve optimization problems. They mimic biological incidents, attend local optima, and are flexible. They can be divided into evolutionary-based, physicallybased, and swarm-based. Evolutionary-based methods are inspired by the laws of evolution. They allow optimizing by selecting the best individual which leads to the better next generation. Some of the popular evolutionary algorithms are genetic algorithm (Goldberg & Holland, 1988), evolutionary strategies (Rechenberg, 1994), genetic programming (Koza, 1994), and gene expression programming (Ferreira, 2001). The swarm-based method is based on the social behavior of groups of animals and insects. The most popular optimizers are (i) Particle Swarm Optimization (PSO) found in the social behavior of flocking birds where particles fly in the search space to find the best result (Kennedy & Eberhart, 1995); (ii) ant colony optimization centered around the social behavior of ants in the ant colony to find the shortest path to home and food (Dorigo et al., 1996); and Artificial Bee Colony (ABC) inspired by the behavior of honey bee swarms (Karaboga & Basturk, 2007). In general, the nature-inspired/heuristic algorithms can be combined with AI models to serve as efficient optimization algorithms with the potential to improve their accuracy and precision (Cicek & Ozturk, 2021;Fadaee et al., 2020). The combined models may be called integrative or hybrid AI models. In this study, we use the term hybrid for an embedded AI model with a nature-inspired or heuristic algorithm that improves the performance of a standalone AI model. Adib and Mahmoodi (2017) used the hybrid (integrative) ANN-GA model to predict annual SSL in Marun River, Iran. Zounemat-Kermani (2017) tested several AI models (i.e. ANN, ANN-PSO, ANFIS and GEP) for modeling SSC in highly dynamic river discharge, and sufficient lead times were computed using mutual information. Daily sediment data was gathered from San Joaquin River, United States. The hybrid ANN-PSO was more efficient and robust for SSC prediction than ANN, ANFIS and GEP.

Bibliography
A new AI model, integrating genetic algorithm with SVR model, was developed for modeling daily scale SSL at two earth dams located in Iran (Rahgoshay et al., 2018). Two AI models, including multivariate adaptive regression spline (MARS) and M5Tree, were developed. Results concluded the superiority of MARS and M5Tree models. The river discharge with three-day antecedent values had a major correlation to predict one day ahead SSL.
Integrating the continuity equation and fuzzy patternrecognition into a structure of double artificial neural networks, (Chen & Chau, 2016) developed a hybrid double feedforward neural network model for daily SSL estimation, by integrating continuity equation and fuzzy pattern-recognition into a structure of double ANNs. The results showed that the hybrid model outperformed other three benchmarking conventional ML models.
A fuzzy C-means clustering (FCM) approach was hybridized with an SVR model for internal parameter optimization and the hybrid model was implemented for predicting SSL of Sistan River, Iran (Hassanpour et al., 2019). The proposed hybrid FCM-SVR model was developed using univariate modeling scheme where only daily SSL data were used. Several benchmark models were developed, including SRC, ANN, ANFIS and SVR, and results demonstrated the hybrid FCM-SVR model accurately predicted SSL.
A new hybrid AI model was developed based on a binary nature-inspired optimization algorithm with feedforward neural network model to quantify the monthly sediment load of Narmada River, one of India's largest rivers (Meshram et al., 2019). Using ten years of Q, RF and SSL historical data, the model was validated against the ANFIS model and was found superior to the ANFIS model and accurately modeled SSL. Rahgoshay et al. (2019) hybridized SVR model with two nature inspired optimization algorithms, including GA and particle swarm optimization (PSO) for SSL prediction. M5 Tree and MARS were used for validation purposes. For two case studies (i.e. Veynakeh and Royan), located in Iran, the hybridization of PSO with SVR model was promising for modeling sediment transport. Yadav et al. (2018) developed two hybrid predictive models by tuning ANN and SVR models using genetic algorithm (GA) for modeling SSL using RF, Q, T and SSL data from Mahanadi River, India. Two predictive models, including MLR and sediment rating curve (SRC), were used for model assessment. The hybrid models exhibited a noticeable prediction accuracy.

Introduction to ensemble artificial intelligence methods
In contrast to the conventional or stand-alone AI models, in the ensemble learning, multiple base algorithms are trained. In other words, ensemble learning attempts to improve the performance of conventional AI models by combining multiple hypotheses for the same dataset. Alternatively, ensemble learning can also be accomplished for conjoining the results of different models (e.g. SVM and ANN) for the same dataset (Wang et al., 2011). Ensemble methods are learning algorithms that create classifiers and categorize new data points by considering weighted and unweighted voting of predictions. The basic method is Bayesian average but newer methods have been developed, such as error-correction output coding, bagging, and boosting (Dietterich, 2000). Ensemble classifiers are more accurate than any individual member, and two classifiers producing different errors are considered as diverse classifiers. Additionally, diverse classifiers are judged good since one classifier result is unrelated to the other, giving a better voting advantage when error rates below 0.5 are chosen. There are three advantages over other models like ANN or decision tree: (i) It can produce good accuracy without sufficient data; (ii) it is constructed by running local search from many different starting points which provide better approximation resulting in a better computational technique; and (iii) it has better representation by using effective space of hypothesis search (Zhang & Ma, 2012).
Random forest (RF) is a bagging type ensemble learning method which has an additional layer of randomness. It is the combination of predictors creating a distribution pattern of trees in forest and generalizes the error of tree power of individual trees and correlation among them. Inputs are selected randomly so that the node can grow . Its accuracy is good; it is robust to outliers and noise, faster than bagging and boosting, and provides good internal assessment of the error, strength, correlation and variable importance (Breiman, 2001). The basic architecture RF is presented in Figure 8. Shamaei and Kaedi (2016) applied the ensemble stacking method to predict SSL based on the results of two different AI techniques, including genetic programming and neuro-fuzzy models. It was found that the ensemble technique drastically improved the prediction accuracy of the single soft computing models.

Classification and regression tree (CART)
Sequential binary splits, applied for explanatory variables using a decision rule for allocating items to a group, are the basic concept of a classification tree. CART was initially developed to handle clinical data due to its ability to simultaneously manage combinations of categorical variables and continuous information and search for the best way to split the range of continuous variables into two groups (Breiman et al., 1984). The tree was able to split at different points or nodes till the end or leaf node. The CART addresses linear and regression problems and helps in linear logistic and additive logistic models for classification problems. Binary recursive portioning is used for splitting the sequential data into homogeneous subsets until the condition is fulfilled. Each split depends on the definite variable value and divides into two subsets generating a binary tree structure. The construction features include: selection of binary splits of the measurement space, decision of creating a node or continuous splitting and assigning each terminal node to a class (Crichton et al., 1997). Classification components are dependent variables, independent variables, learning dataset and future dataset. Regression components are prior probabilities from each outcome and cost matrix. The progress of CART models has been fairly slow due to the complexity of analysis in some cases. These models are effective to understand multifaceted interactions between predictors in respect of traditional multi-variate techniques. They can handle highly skewed data and categorical data, deal with missing data by using surrogate variables, need little amount of input for analysis, and are relatively simple to interpret (Lewis et al., 2000).

Multivariate adaptive regression splines (MARS)
Multivariate adaptive regression spline is a flexible regression non-parametric modeling approach which can work with high dimensional non-linear data and is the form of expansion of spline basis function (Wang, Kisi, Zounemat-Kermani, and Li, 2017). It utilizes the technique of recursive partitioning approach and regression analysis (Bhagat, Paramasivan, et al., 2021;Friedman, 1991), is flexible and accurate, and can forecast continuous and binary output. It has been applied in the fields of energy, and environmental and ecological studies and can interpret complex and nonlinear relationships between predictors and response variable and does not make any assumption between input and output variables (Bhagat, Paramasivan, et al., 2021). It consists of knots which are breaks between regions, basic functions for distinct intervals of predictors with two phases known as forward and backward phases. The forward phase generates all possible basis functions and the backward phase eliminates basis functions which create overfitting which in turn improves prediction accuracy (Yilmaz et al., 2018). The general schematic of the MARS model is shown in Figure 9.

M5 model tree
M5 model tree is a piecewise linear model proposed by Quinlan (1992). It maps the inputs and the output, constructed in two different stages i.e. the tree growth and tree pruning. The first stage splits the input into subsets using a linear regression model and diminishes the error between measured observations and predicted values and instantaneously creates decision tree. The second stage consists of the pruning of tress from each leaf (Heddam & Kisi, 2018). The basic architecture of M5 model tree is shown in Figure 10.

Regression model
The regression method is a baseline statistical modeling technique which takes account of the relationship between variables and finds a meaningful relation and differences. A regression model can be linear, non-linear, or parametric and non-parametric, depending on the type of analysis. The basic models consist of predictor variables and response variables. The response variables are assumed continuous, whereas predictors can be discrete or continuous with various assumptions on correlation. There are many frequently applied regression models, such as simple linear regression (Younger, 1979), multiple linear regression (Andrews, 1974), logistic regression (Hosmer et al., 2013), ordinal regression (Harrell, 2015), multinomial regression, and discriminant analysis (Klecka et al., 1980). Khosravi et al. (2018) developed several data mining predictive models to quantify hourly scale SSL measured  at Andean Catchment, Chile. Their models were standalone reduced error pruning tree (REPT), instancebased learning (IBK), M5P, hybrid models (i.e. bagging-M5P, random sub-space-REPT (RS-REPT), and random committee-REPT (RC-REPT)). The input combinations were constructed based on Q, water temperature (WT), and electrical conductively (EC) in addition to SSL. The hybrid bagging-M5P data mining model accurately predicted hourly SSL in comparison with other models. Ulke et al. (2017) inspected five different empirical models, including models of Einstein, Lane and Kalinske, Chang-Simons-Richardson, and Brooks, for predicting SSL of three major rivers in Turkey. The models were formulated using Q, SSL, river cross-section area, material size of the transported material, and Manning coefficient (n). Among the empirical formulations, the Brooks model attained the best prediction results. However, when genetic algorithm was integrated with the Brooks model for internal parameters tuning, the GA-Brooks model showed results comparable to those attained from classical ANN and ANFIS models. The empirical model revealed that the particles size influenced the SSL transport. Overall, the GA-Brooks model exhibited an ability for modeling SSL of this particular region. Afan et al. (2017) employed three AI models, including ANN, response surface method (RSM), and RSM basis global harmony search (GHS), for modeling SSL. The RSM model is integrated as an input selection method prior to prediction. Daily Q and SSL over a decade from Johor River were used to construct the models. Results showed the viability of integrating RSM with GHS to select the optimal antecedent records for prediction. Kisi and Ozkan (2016) developed a local weighted linear regression (LWLR) model for SSC prediction based on daily discharge and suspended sediment concentration data in Eel River, United States. LSSVR, ANN and SRC models were used for validation, and the LWLR model showed a superior prediction performance.

Bibliography
Using rainfall, river flow discharge, and sediment yield, Sudhishri et al. (2016) employed a non-linear dynamic (NLD) model for modeling daily sediment transport in a large mountainous watershed, Bino Watershed, India. Modeling was done using rainfall, river flow discharge, and sediment yield. The NLD model showed superior prediction results over ANN and W-ANN models. However, under-estimation was experienced during the events of high land-sliding and flash floods.

Appraisal of the literature
The exploration of related studies leaves little room for uncertainty as to the superiority of mathematical datadriven models (viz. regression-based) and soft computing data-driven models (viz. artificial intelligence or machine learning methods) over traditional sediment rating curve (SRC) method in simulating SSL (Singh et al., 2018;Zounemat-Kermani et al., 2016). In a recent review, Zounemat-Kermani, Matta, et al. (2020) acknowledged the absolute superiority of neuro-computing AI models over the traditional and statistical methods, such as SRC, MLR or ARIMA. The findings of studies show that AI models performed better than standard mathematical data-driven models. However, the question is which category (network-based, tree-based, support vector-based or evolutionary-based) or what type of AI model can be chosen as the best AI model for SSL simulation, is still ongoing. Further research in this particular area and related hydrological areas would lead to an answer but would also make it more indistinguishable and open-ended. Even though the majority of studies affirm the promising results from AI models in SSL simulation over traditional methods (e.g. Buyukyildiz & Kumcu, 2017;Choubin et al., 2018;Singh et al., 2018;Zounemat-Kermani et al., 2016). Different reasons might be associated with dissimilarities in announcing the best AI model by various studies. Indeed, existing contrasts in the datasets applied (e.g. length of dataset, input vector, and lead time), feature selection methods (e.g. auto-correlation, average mutual information, Gamma Test), data preparation techniques (e.g. holdout method, cross-validation method), and model tuning techniques (e.g. learning algorithms, model architecture, and termination factors) could be pointed out as some of the leading causes. In such studies (AI application to SSL), all the details related to modeling (e.g. data division strategy, used algorithm or method and calibration of its control parameters) should be provided by the modeler. In this way, the same methodology can be repeated and more robust conclusions about the implemented method may be derived from the results. Wu et al. (2014) suggested a protocol for ANN implementation and evaluated it using drinking water quality data. They examined 81 journal papers since 2000 and reported that there was no systematic protocol for the development of ANN models. The proposed protocol included the reasons why a specific method was selected, methods used and details of implementation, data collection and pre-processing, input selection, data splitting, model architecture selection, and model calibration and validation (replicative, predictive and structural). The study emphasized that similar protocols should be developed for other AI methods to better follow the implemented methodology for other case studies.
The literature suggests that the development of most AI models is based on a multivariate modeling strategy. The suspended sediment is modeled, based on various hydro-meteorological variables, possibly because AI models can establish the non-linear relationship that describes the physical progression between the predictand and the predictors by exploiting the advantages of the randomized learning process. Sediment transport modeling was conducted, using various hydrological and geophysical variables as causally related predictors (Tao et al., 2019). For modeling, it is assumed that the predictor-predictand relationship is stationary, but this assumption does not hold in real cases, as it depends on the actual data. When there is a non-stationary predictor-predictand relationship, the model results can be uncertain. Hence, the reasons for the non-stationary predictor-predictand relationship must be identified, and the selection of predictors must be carefully done to prevent the presence of any non-stationarity. Different statistical methods have been used to select the most suitable input combinations to develop sediment load prediction models. These methods are the gamma test, correlation analysis, intrinsic mode function, auto-correlation, average mutual information, neighborhood component analysis, Boruta feature selection, and iterative input selection, to name a few (Ahmed, Deo, Raj et al., 2021;Ahmed, Deo, Ghahramani et al., 2021;Kumar et al., 2019;Singh et al., 2018). These methods first evaluate each candidate input's relation with the output separately and rank them, and then the higher-ranked inputs are selected for model development. However, individually highly correlated inputs do not guarantee good performance when working together due to collinearity. Recursive Feature Elimination (RFE), a wrapper feature selection algorithm, has shown prominence in selecting ML models' inputs. The RFE has been recently used to select inputs for a sediment heavy metal prediction model (Bhagat, Tiyasha, Awadh, et al., 2020; and a sediment chemical prediction model (Sakizadeh, 2020). The advantage of RFE is its ability to select the most suitable input combination by iteratively removing the less influencing inputs. However, the use of different kinds of wrappers, including RFE, is still limited in the AI sediment prediction model development.
Despite the excellent performance of AI models in modeling sediment transport and abstracting nonlinearities that characterize the physical process via multivariate modeling strategies, it is often necessary to build new univariate models, which will serve as 'time-series prediction models.' Such models utilize the patterns and magnitudes of the study variables to predict their potential value in the future. This is important for most developing countries where there are limited or unavailable hydro-meteorological data at the metrological monitoring stations. Therefore, there is a need to develop such a univariate modeling strategy for implementation using the feasibility of soft computing models. It should be noted that the predictive ability of AI models for univariate input data applications has been verified for sediment simulation.
The present review also points to the significant improvement of AI models in their integration with data pre-processing approaches to achieve a 'complementary modeling approach.' Such studies point to the dependence of integrative models on the input space decomposition with several frequency-based information levels used to enhance performance. However, according to a recent study by Quilty and Adamowski (2018), some of the current research studies inaccurately developed wavelet-based AI models and thus they could not be properly applied for practical purposes, mainly related to forecasting applications. The issues in these research studies are related to: (i) the potential use of future data as an input in the development of selected models, (ii) an inappropriate choice of the decomposition level and wavelet filters, and (ii) the imprudent partitioning of training and testing data. Because of not addressing the reflection boundary conditions in applying the wavelet decomposition, the researchers have incorrectly implemented wavelet-based AI models, resulting in much better accuracy (with a relative positive bias) than what is normally achievable. To resolve this issue, the subsequent study of Quilty and Adamowski (2018) reported a new strategy to avoid such errors and to adequately using wavelet decomposition method. The strategic recommendation mentioned in the study should be considered in future research studies related to wavelet-based complementary modeling approach. To demonstrate the newly proposed approach, a few other research studies (e.g. (Al-Musaylh et al., 2020;Ghimire et al., 2019;Prasad et al., 2017)) have already adopted maximum overlap discrete wavelet transform for water resources, solar radiation and energy prediction studies.
In the recent decade, the ensemble machine learning (EML) models, like stacking, bagging, and boosting methods, have been used dramatically in hydrologic modeling; and simulating sediment transport in water bodies has followed the same trend (Oehler et al., 2012;Sharafati et al., 2020). Literature shows that the prevalent approaches in generating ensemble models aim to enhance the general capability and accuracy of individual AI models in simulating/predicting/forecasting suspended sediment. This claim can be supported, based on the results of: (i) stacking EML models (Alizadeh et al., 2017;Shamaei & Kaedi, 2016), (ii) model averaging (Wang et al., 2015), (iii) bagging EMLs (Nhu et al., 2020), (iv) boosting EMLs (Shadkani et al., 2020) and (v) ensemble empirical model decomposition with adaptive noise algorithm for multi-scale river flow prediction (Wen et al., 2019).
There is no universal AI model that can capture the global watershed features, mainly because each model presents its own limitations based on the algorithm and watershed input features. This is evidenced by the nature of various patterns due to the 'regionalization effect of the model accuracy in terms of hydrological alterations.' Meanwhile, the results of AI models have proven the sophisticated nature of these predictive tools or the predictive algorithms when implemented in various types of watershed areas with diverse features, while modeling and applying the non-linear hydrological parameters.
Optimization in engineering, specifically in hydrological and water management engineering, has been a challenge for many decades. Additionally, optimization methods play a vital role in the training of AI models. As an illustration, standard types of AI using mathematical techniques, such as traditional gradient-based optimization methods, were used to solve problems of interest. Some studies reported the successful practice of certain types of gradient-based optimization methods like the Levenberg-Marquardt algorithm in hydrological modeling (Qanza et al., 2019;Zounemat-Kermani, 2012). Nonetheless, due to their intrinsic shortcomings in addressing complexities in non-linear, chaotic, and stochastic hydrological systems, effective and reliable nature-inspired optimization techniques, called metaheuristic algorithms, have been under development. Due to their robustness and efficiency in coping with the complexities of noisy environment, they are superior to traditional mathematical algorithms for exploring the problem search space (Gomes et al., 2018). Recent literature shows that several attempts have been made to enhance the efficiency of meta-heuristic optimizers in training ordinary AI models .
Comparing different versions of AI models, it is clear that the integrated meta-heuristic AI models are the most appropriate for utilization as a potential alternative to the existing models for modeling river suspended sediment over diverse hydrological regions. Integrated metaheuristic AI models represent a hybridization between nature-inspired optimization algorithms and classical AI models, such as ANFIS, SVR, ANN, etc (Sharafati et al., 2021). In this regard, Fadaee et al. (2020) assessed the capability of using integrative (hybrid) AI models in improving the suspended sediment predictions based on two types of meta-heuristic algorithms, namely the butterfly optimization algorithm (BOA) and the genetic algorithm (GA). In general, the meta-heuristic algorithm increased the accuracy of AI models (ANFIS and ANN) by almost eight per cent. In such studies, however, limited data were used, and therefore, their generalization was also limited. More comprehensive studies related to the implementation of AI techniques with nature-inspired algorithms, considering that various stations have distinct geographical characteristics or climate, are needed. Different optimization algorithms have been found to work better with different AI algorithms for suspended sediment modeling. Rahgoshay et al. (2019) showed that the SVR model performed better in modeling the sediment transport amount when it was hybridized with PSO than with the GA method. However, Darabi et al. (2021) showed that ANFIS performed better when hybridized with sine-cosine algorithm (SCA) than certain natureinspired optimization algorithms, such as PSO. There are no guidelines on which optimizer should be hybridized with which AI algorithm for better prediction of sediment load. Different AI algorithms were individually optimized in previous studies using different optimizers to find the best AI-optimizer combination. However, only a few AI and optimization algorithms have been evaluated. Darabi et al. (2021) examined the performance of three AI models, namely ANN, ANFIS and RBRNN models, with four optimization algorithms, namely sinecosine algorithm (SCA), PSO, firefly algorithm, and bat algorithm, for sediment load prediction. Experiments can be conducted using more AI and optimization algorithms for comparative performance evaluation of different AI-optimizer combinations.
It is evident from the literature that linear regression and sediment rating curves are generally used and compared with AI methods for modeling SSL, and these methods produce inferior results compared to the latter methods. The MLR and SRC methods cannot adequately map the hysteresis behavior of the Q-SSL relationship. Therefore, it seems that comparison of highly non-linear AI methods with MLR and/or SRC in modeling SSL is not reasonable.
The other important issue in modeling SSL is that in most studies, previous SSL values were used as model inputs, which is hard to apply in practice because of the difficulty in measuring SSL data especially in the case of extreme events. On the other hand, the use of only water level data instead of discharge carries much more importance in modeling SSL, especially for the developing countries where effective variables are not available or missing for most of the stations.
It is worth noting some of the limitations to the use of AI approaches in modeling the SSL data series. It is known that two variables, SSL and streamflow, are generally used for modeling SSL and these variables have high spatial and temporal variability and mostly having highly skewed distributions. The main limitations of the implemented AI models are their low applicability to other basins which may have different morphological and climatic characteristics. The other limitation is their black box nature and their physical interpretation is a challenge and requires further analysis in quantifying the relationship between the independent and dependent parameters in a more reasonable way. Although the AI methods have some advantages in local modeling of SSLs, their limitations need to be addressed.

Insights for future research direction
Although several studies have endorsed ensemble learning as a more robust and effective artificial intelligence paradigm, there still exists a significant gap in utilizing ensemble learning in constructing AI models for suspended sediment concentration (Baskin et al., 2017;Wang et al., 2011). In this sense, further research is encouraged to explore the capability of different types of ensemble learning, particularly novel methodologies such as, AdaBoost and bagging techniques. Another topic that should be pursued in the area of suspended sediment modeling is the importance of bedload transportation in river systems and its influence on the variations of SSL. Although some studies consider both bedload and SSL modeling using AI methods (Pektaş & Doğan, 2015;Zounemat-Kermani, Mahdavi-Meymand, et al., 2020), the vast majority of the published studies have focused solely on the SSL as the target variable. Since employing AI models facilitates the process of encountering and embedding different input parameters, it is suggested to examine the effect of adding the bedload variable to the input vector in improving the final accuracy.
There is a need to channel more effort towards extending the current scope of research in this area. Considering these outcomes and the extensive state-of-the-art, the following recommendations can be considered for future studies: This review provides interesting findings on AI models, especially in the area of their practical applicability. For instance, ELM, advanced version of ANN model has shown fast-computational learning capability, which has been qualified as an online expert predictive system with great real-time application potential. It has been recommended for the monitoring of sediment transport, which contributes to the operation of reservoir systems and development of dependable irrigation systems and water pollution control. Another benefit of SSL prediction is its importance to river engineering sustainability from hydrological, environmental, and ecological perspectives. The general architecture of ELM is shown in Figure 11.
In the single hidden layer ELM model, the random initialization of weight of internal network throughout the learning process can determine its efficiency. However, this framework can be an issue from the soft computing viewpoint, since the learning performance of the network can be affected by the single hidden layer which can lead to inaccurate predictions. Therefore, the extension of ELM to a deep learning NN model is needed, where recurrent hidden layers are hosted in the feature space, thereby ensuring a better determination of the internal weight. The presence of deep-learning-based AI models to solve hydrological related problems is yet to be explored; however, their high accuracy and fast training/testing times have rendered them desirable for modeling sediment transport. Future applications of ELM as a hydrological prediction tool in areas with prominent hydrological feature memories could hinge on the use of advanced versions of the deep learning (DL) process which utilizes gradientbased Long Short-term Memory Network (LSTM) along with recurrent (or convolutional) layers (Hochreiter & Schmidhuber, 1997) as a predictive tool. The LSTMbased recurrent ELM can be applied in a way that LSTM can learn to bridge the marginal time lags in the case of discrete steps in order to enforce a constant error flow through error carousels, thereby permitting multiplicative gate units to learn the open/close access to the constant error flow. Such a model can implement input data time-series lagged behavior, and the result can be promising (Hochreiter & Schmidhuber, 1997;Sak et al., 2014;Tian & Pan, 2015). The local features in space and time can also be searched using LSTM-based ELM, since it has a generally little computational complexity, which can facilitate the extensive extraction of features with a low latency of the hydrological related variable output.
Another possible future direction is the assessment of prediction uncertainty, which has become a requirement for most modeling within the hydrology and water quality studies (Bayram et al., 2013;Wan Mohtar et al., 2019). Using the Lower Upper Bound Estimation method, Chen and Chau (2019) attempted to address uncertainty analysis on a novel hybrid double feedforward neural network model for producing the sediment load prediction interval with promising results. This provides a preparatory work and other uncertainty methods can be further explored. Beven (2016) suggested that the development of hydrological science was being threatened by scarce resources. Hence, the use of cost-effective soft computing models should be considered rather than embarking on costly experimental investigations. Therefore, this advantage must be exploited to build expert systems, which can help solve real-world problems.
Suspended sediment flow is a complex non-linear process that AI algorithms try to map by establishing a relationship between inputs and output. It is often suggested that AI models' inputs should be selected based on the non-linear relationship between inputs and output, rather than using conventional linear statistical approaches like correlation and auto-regression. In recent years, RFE method has been integrated with AI algorithms like SVM and ANN to use the RFE-SVM and RFE-ANN to select input features based on non-linear inputsoutput relationships. For example, Pour et al. (2020) used RFE-SVM to select inputs for a rainfall prediction model of peninsular Malaysia using AI. Such non-linear feature selection methods can be employed to improve AI models' sediment prediction accuracy. The relationship between suspended sediment and its driving factors may change with time. This is more likely due to changes in rainfall and other drivers of river sedimentation in the contexts of climate and other environmental changes. Therefore, the predictive model developed using historical data becomes obsolete within a short period. In recent years, the forward rolling method has been used to make the AI prediction model adaptable to changes in the input-output relationship. For example, Khan et al. (2021) used a forward rolling AI algorithm to develop a climate change resilient heatwave prediction model for Pakistan. Such approach can be used in modeling suspended sediment to make them robust to environmental changes.
Sediment data is unavailable in most of the rivers, even in developed countries. Transformation of basin information from one gauged catchment to another nearby ungauged catchment is often suggested to model the ungauged catchment's hydrological process. Recently, AI models have been used to predict streamflow of one catchment using rainfall and streamflow data of nearby gauged catchment. For example, Kim and Song (2020) trained a convolution neural network with data of a gauged catchment and used it for streamflow prediction of a nearby catchment. Kratzert et al. (2019) conducted a review on predictions in ungauged basins using AI and its potential. A similar study can be conducted to predict sediment flow in ungauged or poorly gauged catchments using nearby catchment data.
Big data is considered as the future solution to complex prediction problem. An extensive amount of data from different sources, including satellite, dense observational network and governmental statistics, are used. AI models can be used to integrate satellite high-resolution precipitation, land use, soil, and digital elevation model data for real-time prediction of river sedimentation's spatial distribution. Such models have more potential to provide the required information for river sedimentation management.
Obviously, the efficiency of AI models must not only be based on its numerical accuracy, rather, but its practical implementation for water resources management must be considered, especially, knowledge-based expert systems. The physically-based modeling approaches, for instance, have been proven useful and as excellent hydrological methodologies. Therefore, hybridization of the AI modeling technique with the current decision-making outline is still a difficult task, especially when considering the complicated nature of the system physics and availability of information.
The impact of human activity on modeling has been noted as a significant factor missing in the extensive review. It is certain that human activities can significantly affect the behavior of a basin of a multi-criteria catchment; therefore, such a constructive factor must not be neglected as it can influence model performance.
Based on an experimental study (Juez et al., 2018) concluded that the dynamics of fine sediment jointly observed with discharge found in nature was different in different reaches of the same river system. The morphological evolution of the riverbed within a reach (degradation or aggradation) is controlled by the importance of the sediment available at that location relative to the incoming sediment. The type of hysteretical loop modulated by the interaction of both the distal and proximal supply is intrinsically related with these two types of morphological processes of the riverbed. Consequently, sediment processes as sediment hysteresis are difficult to model adequately. This shows that adequately modeling of SSL needs new modeling strategies considering the inputs including the morphological changes in the riverbed (Juez et al., 2018).

Disclosure statement
No potential conflict of interest was reported by the author(s).