Introducing an evolutionary-decomposition model for prediction of municipal solid waste flow: application of intrinsic time-scale decomposition algorithm

Owing to the importance of municipal waste as a determining factor in waste management, developing data-driven models of waste generation data is essential. In the current study, solid waste generation is taken as a function of several parameters, namely month, rainfall, maximum temperature, average temperature, population, household size, educated men, educated women, and income. Two stand-alone computational models, namely gene expression programming and the optimally pruned extreme learning machine, are used in this study to establish their reliability in municipal solid waste generation forecasting, in combination with Mallow's coefficient feature selection method; the lowest Mallow's coefficient defines the optimal parameters for solid waste generation forecasting. The novel hybrid models of intrinsic time-scale decomposition-gene expression programming and intrinsic time-scale decomposition-optimally pruned extreme learning machine based on Monte-Carlo resampling are employed, and an empirical equation is presented for solid waste generation prediction. To examine the reliability of these models, five statistical criteria, namely coefficient of determination, root mean square error, percent mean absolute relative error, uncertainty at 95%, and Willmott's index of agreement, are implemented. Considering Willmott's index, the Monte Carlo-intrinsic time-scale decomposition-gene expression programming model attains the value closest to the ideal (0.957) in the training stage and 0.877 in the testing stage. The hybrid ensemble model of intrinsic time-scale decomposition-gene expression programming presented lower values of root mean square error (12.279) and percent mean absolute relative error (4.310) in the training phase and in the testing phase compared to gene expression programming with 12.194 and 5.195, respectively.
Overall, the predictions of the hybrid intrinsic time-scale decomposition-gene expression programming model using the Monte-Carlo resampling technique agree well with the observed solid waste generation data.

Keywords: waste management; artificial intelligence; intrinsic time-scale decomposition (ITD) algorithm; gene expression programming; machine learning; circular economy (CE)

Introduction
A growing population and the agricultural activities needed to provide food and water lead to massive increases in waste generation, polluting the natural environment (Samal et al., 2020). These environmental issues are especially perceptible in developing countries, where there is no dedicated waste management infrastructure (Arena et al., 2003). Such issues, namely river contamination, poor agricultural practices, sanitation problems, unpleasant odor, groundwater quality pollution caused by leachate seepage from landfills, and the proliferation of flies and mosquitoes, can increase death rates year by year (El-Fadel et al., 1997; Wen et al., 2019). Waste can be classified into different categories, namely municipal (Nęcka et al., 2019), hazardous, industrial, agricultural, bio-medical, etc. The main focus of this research is MSW, which generally contains degradable waste (like food waste), partially degradable waste (like wood), and non-degradable waste (like plastics) (Soni et al., 2019; Tenodi et al., 2020). SWG is one of the crucial environmental challenges (Abbasi et al., 2019); managing it properly can conserve natural resources and protect the everyday environment, which directly affects human health (Mozhiarasi et al., 2020).
Forecasting the municipal waste generation rate is one of the vital recent issues for decision-makers developing and planning waste management systems; it can be a reliable solution for future waste management, especially in populated megacities around the world (Abdulredha et al., 2020; Duan et al., 2020; Kannangara et al., 2018). Waste generation data are reported as time series at different scales, such as daily, monthly, and yearly (Liu et al., 2019), and these data have a highly dynamic nature (Vu et al., 2019). Due to their complexity and nonlinearity, a rigorous model is essential for achieving acceptable and accurate results. As mentioned previously, different parameters affect SWG models; since these drivers fluctuate strongly, considering them as the main predictors for modeling is of the highest importance (Araiza-Aguilar et al., 2020).
More recently, new signal processing techniques have appeared in the literature that separate large fluctuating signals into smaller individual sequences. These decomposition methods are adaptable and useful tools for analyzing non-linear and complex data series; they decompose the original data into a limited number of components plus a residual (Zeng, Ismail, et al., 2020). Intrinsic time-scale decomposition (ITD) was first presented by Frei and Osorio (2007). It is a novel adaptive decomposition technique that exposes the structure of a signal more effectively; ITD captures complex dynamics through a new process of constructing the baseline as a piecewise linear function. Its advantages include low computational complexity and fast speed. In ITD, the original data are divided into several (more than five) monotonic PRCs, from high to low frequencies, which are then applied to analyze instantaneous information (Zeng et al., 2012).
The core objective of the present study is to present an accurate prediction model of MSW using DDMs combined with a new pre-processing data decomposition technique (the ITD algorithm) and a resampling technique (Monte-Carlo) to decompose the municipal waste data series into several subseries and thereby gain prediction accuracy. These techniques, together with ITD as mentioned above, result in outstanding improvements in data quality before modeling and predicting the non-stationary and non-linear MSW data. According to the literature, there are no studies applying the ITD algorithm to environmental systems, especially MSW prediction. The novelty of this study lies in the integration of FS, ITD, and DDMs in waste management and MSW forecasting. Another contribution is the extraction of a comprehensive relationship from the GEP model for computing monthly municipal SWG. In addition, this study performs a sensitivity analysis to determine the most important input variable for monthly municipal SWG.
The rest of the present research is structured as follows. Section 2 addresses materials and methods, including empirical equations, the methodology of the selected models, and input data selection; Section 3 details the case study and the data used; Section 4 presents the model assessment criteria; Section 5 reports the results and discussion; and finally, Section 6 gives the conclusions.

Empirical equations
According to previous studies, some parameters are very effective in forecasting municipal SWG. Few studies in the literature have developed models based on SWG predictors. Benítez et al. (2008) developed a model for RSW per day with EDU, NH, and income as predictors. They found that the best linear model, with four variables (three independent and one dependent), could explain 51% of the results and produced the best coefficient of determination value (Benítez et al., 2008).
Ali Abdoli et al. (2012) presented two regression models (simple linear and logarithmic) for SWG forecasting in Mashhad (one of the megacities of Iran) based on income, POP, and Tmax as input variables, obtaining the model coefficients by multiple linear regression analysis. They found that the log-log model performed better than the simple linear model, with coefficient of determination values of 0.72 and 0.64, respectively (Ali Abdoli et al., 2012). Silva et al. (2020) developed an empirical equation for waste generation (t/day) using POP and WGI (kg/person/day); they declared that the WGI data were obtained from SNIS (Silva et al., 2020).

Gene expression programming (GEP)
Genetic programming (GP) is an application of genetic algorithms first conducted by Koza (2007). GP is a programming method that evolves structures, much as genetic algorithms evolve binary strings, in order to represent complex and non-linear relationships. The method is inspired by the human brain and Darwinian evolutionary theory, and it solves problems through genetic operators such as reproduction, mutation, and crossover. In the reproduction phase, the method decides which programs are carried over to subsequent epochs. The structure of this method is based on parse trees, and in the reproduction stage a defined number of trees are substituted in the implementation stage. The mutation stage shields the generated model from premature convergence, and lastly, the crossover phase controls all parameters. GP has some shortcomings, such as having only three crossover operators for generating parse trees, lacking independent genomes, and being unable to evolve simple expressions. Therefore, in 2001, the GEP model, based on the evolutionary population concept, was developed by Ferreira (2001) as a modified version of the GP method. GEP integrates the linear chromosomes of genetic algorithms with the parse trees of genetic programming. The required parameters for GEP models are as follows: the set of terminals, the set of functions, the fitness function, the control parameters, and the terminal inputs. A notable change in GEP compared to GP is that the genome is transformed into the next generation without replication or mutation, which results in simple linear genomes.
The fitness function (f_i) of an individual program (i) can be computed as follows:

f_i = Σ_{j=1}^{C_t} (M − |C_(i,j) − T_j|)

in which M is the selection range, C_(i,j) is the value returned by individual chromosome i for the jth fitness case, C_t is the number of fitness cases, and T_j is the target value for the jth fitness case.
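As an illustrative sketch of the absolute-error fitness above (the default selection range M = 100 is an arbitrary illustration, not a value taken from the study):

```python
def gep_fitness(predictions, targets, M=100.0):
    """Absolute-error GEP fitness: f_i = sum_j (M - |C_ij - T_j|).
    M is the selection range; a perfect program scores M * number_of_cases."""
    return sum(M - abs(c - t) for c, t in zip(predictions, targets))
```

A program that reproduces every target exactly attains the maximum fitness M × C_t, which is how GEP's selection pressure rewards accuracy.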
Besides, multiple genes are organized into single chromosomes, each including a head and a tail. In the GEP model, each gene has constant variables and terminal sets along with their arithmetic functions. The overall steps of the GEP model comprise the following:
Step (1) Fixed-length chromosomes are created for individuals.
Step (2) Chromosomes are expressed as expression trees.
Step (3) In the reproduction stage, the best-fitted individuals are selected.
Step (4) The iteration process (replication, modification, and generation) continues until the best solution is defined (Iqbal et al., 2020).
As discussed previously, two critical components of GEP are chromosomes and ETs. The integration of ETs with user-defined linking functions is an essential rule in GEP. Based on individual problem and the input variables, GEP gives some sub-ETs; thus, considering this integration, the sequence of a gene could be defined.

Optimally pruned extreme learning machine (OPELM)
OPELM is an AI technique rooted in the extreme learning machine first introduced by Huang and Chen (2006), which in turn originates from the ANN model. According to the literature, OPELM gives faster results than most other ELM-based algorithms while preserving model accuracy (Miche et al., 2010). Like an FFNN, OPELM assigns the weights of the hidden neurons of the ANN. In comparison to SLFN models, OPELM has some advantages. A typical SLFN with M samples, L hidden nodes, and activation function h(x) can be described as follows (Feng et al., 2016):

Σ_{i=1}^{L} β_i h(w_i · x_j + b_i) = t_j,  j = 1, ..., M

which can be written compactly as Hβ = T, where H is the hidden layer output matrix whose (j, i) entry is h(w_i · x_j + b_i). The training objective is:

Minimize: ||Hβ − T||² and ||β|| (8)

According to Şahin (2013), the main objective of OPELM is to minimize ||Hβ − T||² together with ||β||. As discussed in Miche et al. (2010), the OPELM algorithm reduces model training time and also has a simpler structure. The least-squares method is used in this model to compute the output weights and biases (Heddam & Kisi, 2017). It uses four types of kernel functions, namely sigmoid, Gaussian, non-linear, and linear (Miche et al., 2010).
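The SLFN training step above can be sketched as a minimal ELM in Python with NumPy. This is a sketch under simplifying assumptions: the neuron-ranking and pruning step that distinguishes OPELM is omitted, and the network sizes, tanh activation, and seed are illustrative choices rather than values from the study:

```python
import numpy as np

def elm_train(X, T, L=20, seed=0):
    """Minimal ELM sketch: random hidden layer, least-squares output weights.
    (OP-ELM additionally ranks and prunes the hidden neurons; omitted here.)"""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # input-to-hidden weights w_i
    b = rng.standard_normal(L)                # hidden biases b_i
    H = np.tanh(X @ W + b)                    # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T              # minimise ||H beta - T|| and ||beta||
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only the output weights β are solved (by a pseudo-inverse rather than iterative backpropagation), training is a single linear-algebra step, which is what makes ELM-family models fast.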

Intrinsic time-scale decomposition (ITD)
ITD is a method for extracting the instantaneous frequency of non-stationary data series by separating the signal into high-frequency and low-frequency components called PRCs, using an iterative time-frequency decomposition algorithm. ITD is a new non-stationary, non-linear signal processing method first introduced by Frei and Osorio (2007). The method is similar to wavelet decomposition algorithms but without their basis choices: the basis functions are extracted directly from the original data series, and since each signal is unique, the basis is not reused for other signals. ITD performs the extraction accurately and in real time, producing PRCs together with a residual component and their frequency content. The spectral analysis used with ITD is based on the Hilbert transform; a proper rotation is denoted H(t) and the baseline signal L(t).
Denoting the baseline-extraction operator by L, the baseline L(t) = Lx(t) represents the low-frequency mean of the signal, and the input signal x(t) is decomposed as:

x(t) = Lx(t) + (1 − L)x(t) = L(t) + H(t)

where H(t) = (1 − L)x(t) is the proper rotation (PRC). The ITD algorithm follows these steps: (1) Let τ_k, k = 0, 1, 2, ..., denote the occurrence times of the extreme points of the input signal x(t), with τ_0 = 0.
(2) The input signal x(t) is considered on the interval [0, τ_{k+2}], with L(t) and H(t) defined as operators over the time interval [0, τ_k]. The baseline extraction operator on (τ_k, τ_{k+1}] is designed as a piecewise linear contraction of the signal between successive extrema, with knot values

L_{k+1} = α [x_k + ((τ_{k+1} − τ_k)/(τ_{k+2} − τ_k))(x_{k+2} − x_k)] + (1 − α) x_{k+1}

where x_k = x(τ_k) and α is a constant value between 0 and 1 (typically α = 0.5). This linear contraction of the original signal keeps x(t) monotonic between the extreme points, which is necessary for PRCs. (3) To extract PRCs, the proper-rotation operator H = 1 − L is applied, i.e. H(t) = x(t) − L(t). (4) The processes in equations (11) and (12) are repeated iteratively, with the baseline treated as the new input, until the baseline L(t) becomes a monotonic function, at which point the signal has been fully divided into PRCs.
The complete decomposition can therefore be written as:

x(t) = Σ_{i=1}^{p} H_i(t) + L_p(t)

where p is the number of extracted PRCs and L_p(t) is the final monotonic residual. Thus, in this study, the ITD algorithm provides a spectral de-noising analysis of the original data, used as a pre-processing technique for the original SWG data. Besides, ITD is computationally simple, improves data quality, and avoids the smoothing of transients and the smearing in time caused by sifting (Zeng, Li, et al., 2020).
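A simplified single-sift sketch of the baseline extraction above, assuming α = 0.5 by default and pinning the baseline to the signal endpoints (both simplifications of this sketch, not details from the study):

```python
import numpy as np

def itd_baseline(x, alpha=0.5):
    """One ITD sifting step (simplified sketch): build the piecewise-linear
    baseline L from the extrema of x; the proper rotation is H = x - L."""
    # indices of local extrema, plus the two endpoints
    dx = np.diff(x)
    ext = [0] + [k for k in range(1, len(x) - 1)
                 if dx[k - 1] * dx[k] < 0] + [len(x) - 1]
    tau, xv = np.array(ext), x[np.array(ext)]
    # baseline knots: L_{k+1} = alpha*[x_k + ((tau_{k+1}-tau_k)/(tau_{k+2}-tau_k))
    #                                  *(x_{k+2}-x_k)] + (1-alpha)*x_{k+1}
    L_knots = xv.copy()
    for k in range(len(ext) - 2):
        interp = xv[k] + (tau[k + 1] - tau[k]) / (tau[k + 2] - tau[k]) * (xv[k + 2] - xv[k])
        L_knots[k + 1] = alpha * interp + (1 - alpha) * xv[k + 1]
    L = np.interp(np.arange(len(x)), tau, L_knots)  # piecewise-linear baseline
    return L, x - L  # baseline L(t) and proper rotation H(t)
```

Iterating `itd_baseline` on each successive baseline until it becomes monotonic yields the full set of PRCs plus the residual, matching the decomposition written above.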

Robust optimal input selection (Mallow's coefficient)
In practice, selecting the most effective and reliable variables is an essential task in developing rigorous descriptive models. The prediction ability of a model is profoundly affected by the parameters chosen before model generation, so the right selection of input parameters is crucial for decent cross-validation and a precise model (Ashrafian et al., 2020). There is a vast number of feature selection approaches for finding the best set of input variables, such as forward selection, AIC, backward elimination, BIC, and CP. Based on the literature, Mallow's coefficient performs well in selecting predictor parameters (Sattar et al., 2019); moreover, it also helps determine the influence of each input variable on the output variable. To this end, the parameters of the original MSW time series dataset are optimized using the CP feature selection technique to minimize the number of predictor input variables for accurate prediction of MSW (Ghaemi et al., 2019). CP yields the regression equation with the best fit among all equations using a minimum number of input variables. For a descriptive model with k available variables and p selected input predictors (k > p), Mallow's coefficient is computed as:

C_p = RSS_p / MSE_k − (n − 2p)

in which RSS_p is the residual sum of squares of the model with p predictors, MSE_k is the mean square error of the model with all k variables, and n is the sample size. More information about this coefficient can be found in Olejnik et al. (2000).
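A minimal sketch of the coefficient above (the intercept handling and the plain least-squares fit are this sketch's own choices, not details from the study):

```python
import numpy as np

def mallows_cp(X_sub, X_full, y):
    """Mallow's Cp for a candidate subset: Cp = RSS_p / MSE_k - (n - 2p),
    where MSE_k comes from the full model with all k predictors."""
    n = len(y)

    def rss(X):
        X1 = np.column_stack([np.ones(n), X])          # add intercept column
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares
        r = y - X1 @ beta
        return r @ r

    k = X_full.shape[1]
    mse_k = rss(X_full) / (n - k - 1)  # full-model error variance
    p = X_sub.shape[1] + 1             # subset predictors + intercept
    return rss(X_sub) / mse_k - (n - 2 * p)
```

For the full model itself, C_p reduces exactly to p, which provides a quick sanity check; subsets whose C_p falls near or below p are the good candidates.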

Data resampling -Monte-Carlo technique
Resampling is a pre-processing step that changes the original data distribution in order to meet some user-prescribed criteria. That is, the resampling method does not rely on generic distribution tables, such as normal distribution tables, to compare probability values. In this technique, samples are drawn randomly with replacement from the original data series, so that the number of sampled cases matches the original series. Resampling approaches include cross-validation, jackknife resampling, random subsampling, and nonparametric bootstrapping (Fox, 2002). Monte-Carlo techniques, or MC experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The concept relies on randomness to address problems: repeated random sampling and statistical analysis are conducted to obtain the appropriate results. In this technique, the data series is divided randomly into multiple patches, the predictive models are validated on each patch, and the average of the error values obtained for the patches is returned. The most important feature of the MC resampling technique is that it ensures an accurate comparison of predictive models and avoids biased results (Bokde et al., 2020).
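The patch-wise validation described above can be sketched as follows; `model_error` is a hypothetical callable (not from the study) that trains a model on the training patch and returns its error on the test patch:

```python
import random

def monte_carlo_cv(data, model_error, n_splits=100, test_frac=0.25, seed=0):
    """Monte-Carlo resampling sketch: repeatedly split the series at random
    into train/test patches and average the model error over all splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_splits):
        idx = list(range(len(data)))
        rng.shuffle(idx)                       # random patch assignment
        cut = int(len(data) * (1 - test_frac))
        train = [data[i] for i in idx[:cut]]
        test = [data[i] for i in idx[cut:]]
        errors.append(model_error(train, test))
    return sum(errors) / len(errors)           # averaged validation error
```

Averaging over many random splits, rather than one fixed split, is what makes the comparison between candidate models less sensitive to a lucky or unlucky partition.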

Description of ensemble ITD-based models
This part establishes a platform for estimating SWG in the Tehran megacity by conducting the stand-alone models (GEP and OPELM) and the hybrids (ITD-GEP and ITD-OPELM). The goal of the ITD pre-processing technique is to find out whether it increases the accuracy of the stand-alone models. One reason for integrating ITD with the proposed DDMs is that traditional stand-alone models face several difficulties when interfacing with the vibration and noise of non-linear and complex input data. Therefore, the primary advantage of coupling ITD-based algorithms with stand-alone models is decreasing the noise of the historical data while increasing the accuracy of the generated data, which makes the models closer to the primary SWG data. The whole study procedure for predicting SWG in Tehran is demonstrated by the flowchart in Figure 1. Before modeling with the stand-alone models and their combinations with the decomposition-based (ITD) models, the optimum input data are selected using the Mallow's CP coefficient. Thus, the most critical and influential variables are selected first from the relevant parameters of the Tehran SWG data. Next, the selected data are analyzed, and missing data, outliers, and duplicated data are detected and omitted. Afterward, the SWG time series is divided into a training group (75% of the SWG data) and a testing group (the remaining 25%). The study procedure comprises three main steps:
• Modeling SWG with two important and commonly used DDMs (GEP and OPELM) to obtain the equation of the SWG time series.
• To enhance the productivity and accuracy of the abovementioned stand-alone models, decomposing the SWG data with the ITD-based technique, which divides the data into several PRC components. Based on Figure 2, the primary SWG data are decomposed by the ITD algorithm into four PRCs and one residual component. Next, each PRC and the residual component are modeled by the two proposed DDMs to compute SWG. Finally, the forecasted SWG values of the extracted PRC and residual sub-series are aggregated to generate the final SWG data.
• In the final step, comparing the results of the stand-alone DDMs in SWG modeling with the predicted values of the hybrid ITD models.
The idea of decomposing and aggregating the input data, integrated with the two DDM techniques (ITD-GEP and ITD-OPELM), yields an accurate estimation of SWG with ten input variables for the Tehran megacity. Decomposing the input data simplifies the data structure, and aggregation helps the modelers generate an appropriate formula. Finally, to identify the best model, all generated models are evaluated with statistical performance metrics.
As stated above, the ITD method is used to decompose the input data signals into several PRCs to extract the main components; finding the pattern of the input data series and transforming these patterns from high to lower dimensions is essential. Figure 2 shows the initial SWG data in pink. The ITD method decomposes the SWG data into four PRCs and a residual line, shown in yellow (with month on the x-axis). From Figure 2, it is evident that the sum of all PRCs and the residual reproduces the input signal.
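The decompose-predict-aggregate flow described above can be sketched generically; `fit_model` is a hypothetical callable standing in for GEP or OPELM training on one sub-series (it is not an interface from the study):

```python
def hybrid_predict(prcs, residual, fit_model):
    """Decompose-predict-aggregate sketch: fit one data-driven model per
    PRC sub-series (plus the residual), then sum the component forecasts."""
    components = prcs + [residual]
    models = [fit_model(series) for series in components]

    def predict(t):
        # final SWG forecast = sum of the sub-series forecasts
        return sum(m(t) for m in models)

    return predict
```

Because ITD guarantees that the sub-series sum back to the original signal, summing the component forecasts is the natural aggregation rule for the hybrid models.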

Case study and data collection
Tehran, the capital and largest city of Iran, is located on the slopes of the central Alborz mountains at 35°42'55.07" N, 51°24'15.63" E. Tehran covers an area of over 700 km² and has a population of approximately 12.5 million. The elevation of Tehran varies from 1800 m in the north to 1200 m in the central parts and 1050 m in the south. Tehran's climate is categorized as semi-arid, with an annual rainfall of 245-316 mm, and its temperature ranges annually from a minimum of 18°C to a maximum of 38.7°C. TWMO has the duty of collecting the waste, measuring its weight at landfill sites, and reporting the data. According to TWMO, a vast amount of waste (around 8000 tons/day) is generated daily as a result of the massive population. A significant proportion (70%) of the waste is organic and biodegradable, which is mostly categorized as wet waste. Figure 3 shows the map of the study area with the input data for SWG modeling.
As mentioned above, many factors are related to waste. Therefore, in this study, the parameters month, rainfall, maximum temperature, average temperature, population, household size, educated men, educated women, GDP, and income over the period 1991 to 2013 are selected as input variables, with MSW as the output in SWG modeling.
Some studies have found that an increase in income can change the consumption patterns of households, resulting in changed composition and quantities of waste (Trang et al., 2017; Abbasi & El Hanandeh, 2016). According to Silva et al. (2020), socio-economic factors such as income, education level, and GDP also contribute significantly to variations in SWG. Climate change, already happening due to human activities, is another influential factor: the world may experience changes in seasonal precipitation, higher temperatures, and more extreme rainfall events. In the future, these changes can directly affect a range of economic, social, and environmental processes; accordingly, waste generation activities are evolving with economic, social, and environmental change (Bebb & Kersey, 2003).
The whole input data include 22 years of monthly data (274 months), 75% of which (207 months) are defined as training, and the remaining 25% (69 months) are selected as testing data. The statistical characteristics are shown in Table 1 for the training and testing periods.

Model assessment criteria
The reliability of the proposed models is evaluated using several statistical criteria, including the coefficient of determination (R²), root mean square error (RMSE), percent mean absolute relative error (PMARE), uncertainty at 95% (U95), and Willmott's index of agreement (WI), which are computed as follows:

R² = [Σ_{i=1}^{N} (SWG_o,i − SWḠ_o)(SWG_m,i − SWḠ_m)]² / [Σ_{i=1}^{N} (SWG_o,i − SWḠ_o)² Σ_{i=1}^{N} (SWG_m,i − SWḠ_m)²]
RMSE = [(1/N) Σ_{i=1}^{N} (SWG_o,i − SWG_m,i)²]^{1/2}
PMARE = (100/N) Σ_{i=1}^{N} |SWG_o,i − SWG_m,i| / SWG_o,i
U95 = 1.96 (SD² + RMSE²)^{1/2}
WI = 1 − Σ_{i=1}^{N} (SWG_o,i − SWG_m,i)² / Σ_{i=1}^{N} (|SWG_m,i − SWḠ_o| + |SWG_o,i − SWḠ_o|)²

where SWG_o and SWG_m denote the observed and modeled values of SWG, respectively; SWḠ_o and SWḠ_m denote the average values of the observed and modeled SWG data, respectively; SD is the standard deviation of the prediction errors; and N is the number of the entire data.
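Assuming the standard forms of these criteria (in particular, taking SD in U95 as the standard deviation of the prediction errors, an assumption of this sketch), the metrics can be computed as:

```python
import numpy as np

def metrics(obs, mod):
    """Assessment-criteria sketch: RMSE, PMARE, U95 and Willmott's index."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    err = obs - mod
    rmse = np.sqrt(np.mean(err ** 2))
    pmare = 100.0 * np.mean(np.abs(err) / obs)           # percent MARE
    u95 = 1.96 * np.sqrt(np.std(err) ** 2 + rmse ** 2)   # 95% uncertainty band
    wi = 1 - np.sum(err ** 2) / np.sum(
        (np.abs(mod - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return {"RMSE": rmse, "PMARE": pmare, "U95": u95, "WI": wi}
```

A perfect model gives RMSE = PMARE = U95 = 0 and WI = 1, which matches the ideal values the study compares against.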

Application results and discussion
After the input parameters of the MSW time series have been decomposed with the ITD algorithm, the CP coefficient is used to decrease model complexity, and nonessential parameters are omitted. Afterward, two important DDMs, namely GEP and OPELM, are employed for MSW prediction. The advantage of these DDMs (GEP and OPELM) is that they consider operational parameters and can return a functional set of input variables as well as constants.

Determining optimum input variables
The best subset of input variables is selected by Mallow's coefficient, which is an effective way of reducing a large number of candidate parameters (Fadaee et al., 2020). The results of the optimum input variables for SWG modeling are shown in Table 2, which provides R², CP, and the standard deviation of all input variables. Ten subsets and models are constructed from the input variables for SWG forecasting. The parameters M, T_ave, GDP, HS, and income in model number 5, which has the lowest CP value, are the most critical; in terms of R² values, models 5-10 are close to one another. The reported values of R², CP, and Stdev for the selected subset are 71.4, 9.9, and 146.27, respectively.

Stand-alone model configuration
The primary aim of this section is to define the selected models' parameters along with their initializations (GEP and OPELM). The following subsections discuss each model's parameters and scenarios. Table 3 lists the GEP input parameters, such as the function set (+, −, ×, /, exp, power), chosen in this study by trial and error. The mutation and inversion rates are set to 0.138 and 0.546, respectively, and the gene recombination and transposition rates share the same value of 0.277. The maximum tree depth at each node is 6, and the number of chromosomes is 30, with three genes.

Development of the GEP model
The results of SWG modeling with stand-alone GEP are presented in two forms: the obtained formula and the decision tree. Based on the CP results, the group of parameters HS, income, M, T_ave, and GDP is selected to form the SWG equation via the GEP algorithm. Figure 4 demonstrates the three sub-trees of the decision tree in SWG forecasting using the GEP model. Moreover, Table 4 provides the non-linear mathematical SWG formula (ET_Total) generated by the GEP method using the optimally selected variables (from CP).

Development of OPELM model
The OPELM algorithm is designed to substantially improve stand-alone OPELM models in estimating SWG. To achieve this, the OPELM algorithm is implemented in MATLAB. As previously mentioned, the OPELM model scenario starts with training the input data (considering the feature vectors). The second step assigns the weights from the input layer to the hidden layer; the hidden matrix is computed in the third step; next, the pseudo-inverse of the hidden matrix is computed; and finally, the output-layer weights are determined.

Comparison of the proposed models for SWG prediction
Tables 5 and 6 provide the performance metrics of the models in SWG forecasting, including GEP, OPELM, ITD-GEP, ITD-OPELM, and the MC-based AI models, in the training and testing stages. For the training phase, the results show that, at the 95% confidence level, MC-based ITD-GEP has the lowest uncertainty, indicating the best performance among the models in SWG forecasting. Considering the WI index in Table 5, ITD-GEP with the Monte-Carlo resampling technique yields satisfactory results in SWG forecasting. The results in the testing stage (Table 6) likewise demonstrate the applicability of MC resampling with the ensemble ITD-GEP and ITD-OPELM models. In terms of the R² coefficient, MC-ITD-GEP scores higher than ITD-OPELM; in other words, MC-ITD-GEP achieves 4.7% and 17.2% increases in accuracy compared to the stand-alone GEP and ITD-GEP models, respectively. Considering RMSE, the error of ITD-GEP with the MC technique decreases by 1.22 units, while the MC-ITD-OPELM error decreases by 2.4 units, relative to their stand-alone counterparts. The PMARE metric also shows that the MC-ITD-GEP method has the lowest error among the ensemble and stand-alone methods. The uncertainty at the 95% confidence level is highest for the OPELM model in the testing stage. Finally, the WI metric demonstrates that MC-ITD-GEP outperforms all other models.
The scatter plots of the selected models, including stand-alone GEP and OPELM and their coupled versions (ITD-GEP and ITD-OPELM), are drawn for the training and testing stages in blue and red, respectively (Figures 5 and 6). Figure 5 shows the observed and predicted SWG (M Ton) in the training phase for the stand-alone models (GEP, OPELM) and the ITD-integrated models (ITD-GEP and ITD-OPELM). As shown at the top left, the equation y = 0.6912x + 69.922 describes the relationship between the modeled and observed data sets, and the R² value is satisfactory (0.7767), indicating a proper fit between the observed and predicted SWG data. In terms of R², the OPELM method is followed by stand-alone GEP with a value of 0.7271. By integrating the novel ITD signal processing method with the stand-alone models, the R² values increase to 0.8214 for ITD-GEP and 0.7801 for ITD-OPELM, respectively. Comparing the two ITD hybrids, the ITD-GEP model, with the fit y = 0.8898x + 19.219 between observed and predicted SWG values, shows greater agreement than ITD-OPELM; thus, the best-performing method in the training stage is ITD-GEP.

Figure 6 provides the same information for the testing stage, comparing the stand-alone GEP and OPELM models with their ITD hybrids via the fitted equations between predicted and observed SWG data. Compared to the training results, the R² values are smaller. Nevertheless, among the stand-alone models, GEP again outperforms OPELM in SWG forecasting, while among the decomposed models, ITD-GEP, with the fit y = 0.6635x + 58.479 and R² = 0.735, has the highest value and is selected as the best and most accurate model for Tehran SWG forecasting.

Further analysis and discussion
A clear understanding of waste generation is one of the essential issues in environmental studies such as waste collection and waste treatment in cities, especially megacities in developing countries. Temperature, growing population, GDP, income, household size, number of educated people, etc. are critical parameters controlling waste generation, and historical data on each of these parameters can help policymakers manage and make better decisions. According to Pan et al. (2019), GDP and population are essential variables for SWG in mega- and super-megacities, respectively. This study presents a case study on deriving an SWG prediction formula for the Tehran megacity in Iran. The best input values are selected from the candidates M, T_max, T_ave, Rain, GDP, POP, EM, EW, HS, and income using the CP coefficient. The applicability of the stand-alone models (GEP and OPELM) is evaluated, and their results are compared with those obtained from the ITD-decomposed SWG input values. The results indicate that the hybrid ITD-based models (ITD-GEP and ITD-OPELM) show the greatest similarity to the observed SWG data. Figure 7 plots all predicted SWG time series; comparing all models shows that the ITD-based hybrid models outperform the two stand-alone models and remedy the accuracy shortcomings of stand-alone GEP and OPELM. Overall, ITD-GEP is nominated as the best-performing model in SWG forecasting.
Another analysis addressed in this study is the sensitivity analysis (SA) of the variables, which indicates the effect of different predictor values on the target. For each predictor, SA% is computed as follows [53]:

N_i = t_max(x_i) − t_min(x_i)
SA_i (%) = 100 × N_i / Σ_{j=1}^{n} N_j

where t_max(x_i) and t_min(x_i) are the maximum and minimum of the estimated target over the ith input domain, while the other predictors are held at their average values. The variable importance results for modeling municipal SWG based on the GEP model (the best stand-alone model) are shown in Figure 8, which indicates that the most effective variable for monthly SWG is GDP.
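A one-at-a-time sketch of this analysis, evaluating each predictor at the endpoints of its domain while holding the others at their means, might read as follows (evaluating only the endpoints is a simplification of this sketch, adequate for monotone responses; a denser grid could be scanned otherwise):

```python
def sensitivity(model, x_mean, domains):
    """One-at-a-time sensitivity sketch: vary predictor i over its domain
    while the others stay at their means; SA_i% = 100 * N_i / sum_j N_j."""
    N = []
    for i, (lo, hi) in enumerate(domains):
        outs = []
        for v in (lo, hi):          # endpoints of the ith input domain
            x = list(x_mean)
            x[i] = v
            outs.append(model(x))
        N.append(max(outs) - min(outs))  # N_i = t_max - t_min
    total = sum(N)
    return [100.0 * n / total for n in N]
```

The returned percentages sum to 100, so each value is directly comparable as a share of the total output variation, as in Figure 8.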
Overall, the performances of noise detection decomposition models (ITD-based) based on 22 years of SWG data are evaluated, and the results reveal that hybrid ITD-GEP and ITD-OPELM outperform their stand-alone models.

Conclusion
Since MSWG is an essential issue in an era of increasing population, it deserves to be simulated by DDMs for better management. Many factors significantly influence SWG, such as month, Rain, Tmax, Tave, POP, HS, EM, EW, GDP, and income; considering all of them as input parameters for designing an appropriate model takes much time and effort. In this regard, CP is utilized as the FS technique to determine the optimum input parameters for SWG forecasting. After defining the best inputs, two stand-alone DDMs, namely GEP and OPELM, are used in this study. As the first step, this study proposes a comprehensive and promising formula for predicting monthly municipal SWG using fewer, more influential predictors. In addition, since the waste data are complex, non-stationary, and non-linear, the ITD signal decomposition method is utilized to extract the SWG input data features into straightforward, linear baseline signals and predominant PRCs. A critical aspect of ITD-based methods is that they generate piecewise linear operators, which make the models computationally simpler and faster. Decomposing the input data into several PRCs, modeling them with DDMs, and aggregating them again to report the output value is the other important objective of this study. The Monte-Carlo resampling technique is then used to handle imbalanced datasets. Several performance metrics, particularly R² and WI, together with visual plots, demonstrate that by decomposing the non-stationary and non-linear SWG into relatively stationary signals and appropriately overcoming the noise terms hidden in the original SWG, the predictions can be improved significantly. Based on the sensitivity analysis, GDP is found to be the most important variable for monthly SWG in Tehran, while the average temperature is the least important.
To sum up, the present study proposes ensemble AI models, namely MC-ITD-GEP and MC-ITD-OPELM, of which the MC-ITD-GEP model outperforms all others for SWG prediction. Since waste management is a significant issue, future work should address SWG prediction using brand-new methods and other relevant parameters as inputs.