The least-square support vector regression model for the dyes and heavy metal ions removal prediction

Abstract Wastewater treatment plants are typically complex because they involve physical, chemical, and biological processes. Meanwhile, the efficiency of the removal of pollutants such as auramine O (AO), methylene blue (MB), and cadmium ions (Cd (II)) from wastewater is difficult to measure directly in real time, as this measurement requires time-consuming laboratory analysis. Soft sensors could be a solution for real-time prediction of the AO, MB, and Cd (II) removal efficiencies. Hence, this study investigates the performance of a soft sensor, namely least-square support vector regression (LSSVR), in estimating the AO, MB, and Cd (II) removals. Two wastewater-related case studies involving AO, MB, and Cd (II) removals were used to evaluate the predictive performance of the LSSVR, and its results were compared with those of other soft sensors. For both case studies, LSSVR gives the best results for AO, MB, and Cd (II) removals: its root mean square errors, mean absolute errors, and approximate errors are the lowest, with reported differences of 83% to 1,756% from the other soft sensor models. Moreover, its coefficients of determination, denoted $R^2$, are the highest, all above or close to 0.9 for the AO, MB, and Cd (II) removals, even for the testing data that were not used to develop the LSSVR model. In conclusion, LSSVR is the more suitable model for evaluating the effectiveness of the AO, MB, and Cd (II) removals.


Introduction
Wastewater treatment plants (WWTPs) are generally complicated, since they integrate physical, chemical, and biological processes (Nasr et al. 2021). These processes cause high fluctuations in pollutant load, flow rate, and hydraulic conditions. As a result, certain crucial quality indicators are hard to assess in real time, although they are critical for optimization, process monitoring, and control (Wang et al. 2021). This limitation has drawn attention to implementing artificial intelligence (AI) in the wastewater industry to facilitate the online monitoring of wastewater parameters. AI methods including soft sensors have been used to model wastewater treatment processes (Ngu et al. 2023). These techniques create predictive mathematical models to achieve real-time prediction of hard-to-measure variables using easy-to-measure variables (Yeo et al. 2023). Soft sensors allow reliable and consistent assessment of wastewater parameters and have become a prominent approach in the field of machine learning (ML) (Shang et al. 2014; Ngu and Yeo 2022).
ML algorithms or models can detect the trend in given data without the need for prior knowledge or human interaction (Zhong et al. 2021). Moreover, the objective of ML is to replicate how humans interpret data with the intent of achieving a target. The support vector machine (SVM) is a popular modern ML approach. SVMs can effectively perform linear and non-linear classification by using a technique known as the kernel trick, whereby the inputs are mapped into high-dimensional feature spaces (Mahesh 2020; Pervez et al. 2023c). SVM can generalize well and is resilient to high-dimensional data. However, its training is slower and its predictive ability depends on the parameters used (Singh et al. 2016). Least squares support vector regression (LSSVR) is a type of SVM that offers dependable and superior generalization performance by aiming to reduce the upper bound of the generalization error of SVMs (Yeo and Lau 2021). Its performance is, however, dependent on the specification of the kernel and regularization parameters (Liu et al. 2014). LSSVR began to be employed in chemical fields, where the processes are generally nonlinear as they fluctuate over time (Pervez et al. 2023a). This LSSVR model has been used by Pervez et al. (2023a), Ngu and Yeo (2023b), Pervez et al. (2023d), Malang et al. (2023b), and Yeo and Lau (2021). However, very few studies use LSSVR as a modeling tool to overcome issues in wastewater treatment.
Environmental issues might arise as a result of dye and heavy metal ion contamination in wastewater. The infiltration of different dyes and heavy metal ions into water can damage marine life, ultimately killing an entire ecosystem. Around 15% of the dyes produced worldwide are discharged as effluents (Rahman et al. 2012). These discharged dyes include toxic organic cationic dyes, such as Auramine O (AO) and methylene blue (MB), that are employed in the manufacturing of paint and dye. MB is an essential dye that is utilized extensively in the textile and printing industries, as well as in some large-scale medical applications (Wang et al. 2005). However, in excessive amounts MB may have detrimental effects on human health, such as vomiting, increased heart rate, and shock (Mazaheri et al. 2017). Meanwhile, AO is used in coloring leather, food, carpets, paper, and textiles (Ali et al. 2021). However, even a small amount of AO is extremely injurious and carcinogenic (Cancer 1987). Furthermore, cadmium ions (Cd (II)) are heavy metal ions that pose a serious threat to the environment due to their high mobility and toxicity. Besides, the production of pigments, alloys, and phosphate fertilizers makes extensive use of Cd (II).
Moreover, Cd (II) is discharged into water from used batteries and galvanized pipe corrosion, as well as from volcanic eruptions. To address these issues, activated carbon has been proven to be a successful and inexpensive adsorbent for removing heavy metals like Cd (II) and dyes including MB, owing to its porosity and high surface-to-volume ratio (Neolaka et al. 2022). Besides activated carbon, zinc ferrite (ZnFe₂O₄) is another adsorbent that can be used to eliminate AO, MB, and Cd (II) from wastewater (Zhao et al. 2022). The adsorption capability of an adsorbent and its removal process are significantly influenced by the quantity and presence of surface functional groups. Acid treatment can change the chemical and physical characteristics of carbon, which affects how well it adsorbs substances (Wang et al. 2005). Typically, utilizing nontoxic adsorbents derived from natural resources such as wood and coal can improve the efficiency of the adsorption process. This adsorption method is eco-friendly, as it converts agricultural waste into beneficial adsorbents that can filter out metals and organic compounds from effluents. Additionally, adsorbents derived from agricultural wastes are a more affordable and environmentally friendly alternative to pricey and nonrenewable adsorbents synthesized from polymers and petroleum wastes.
Nevertheless, very few studies apply the LSSVR model, which has a high non-linear prediction ability, to optimizing the AO, MB, and Cd (II) removals. Therefore, to address this research gap, this study aims to investigate and analyze the predictions of the LSSVR for dye and heavy metal ion removals. To evaluate the predictive performance of the LSSVR, two different case studies were used. In each case study, the results obtained from LSSVR were compared with those from principal component regression (PCR), partial least square regression (PLSR), locally weighted partial least square regression (LW-PLSR), artificial neural network (ANN), and support vector regression (SVR).

Methodology
In this section, the two case studies that were utilized to develop and evaluate the predictive performance of the LSSVR model are described. This is followed by the LSSVR model development, data splitting and parameter setting, and the error metrics.

Case study 1: The removal of MB and Cd (II) using natural walnut carbon
A case study for a wastewater treatment process involving MB and Cd (II) removals using an adsorption process was adopted from Mazaheri et al. (2017). A UV-visible spectrophotometer was used to record the absorbance spectra for MB in the region of 300 to 750 nm, while Cd (II) was measured at its maximum absorbance wavelength of 228.8 nm using a Varian AA 220 atomic absorption spectrophotometer. Activated carbon was created from walnut wood waste to act as an adsorbent, as this waste is widely available, affordable, and locally produced. To activate it, the carbon from the walnut wood waste was mixed for 8 h in 150 mL of 5 mol L⁻¹ nitric acid (HNO₃). The HNO₃-activated carbon was filtered and washed with deionized water at 50 °C until it reached pH 6, and then dried at 105 °C for a day before it was used as an adsorbent to remove MB and Cd (II). In this adsorption process, the environmental factors are pH, stirring duration, adsorbent mass, contact time, and the concentrations of MB and Cd (II). Hence, they serve as the input variables for the least-square regression models. Meanwhile, the output variables are the removal percentages of MB and Cd (II).

Case study 2: The removal of dyes and heavy metal ions using ZnFe₂O₄ nanoparticles
This case study was adopted from Zhao et al. (2022), in which ZnFe₂O₄ nanoparticles were used as adsorbents to remove AO, MB, and Cd (II). In this batch experiment, the dye (AO and MB) and heavy metal (Cd (II)) solutions were prepared at different concentrations and their pH values were adjusted. To synthesize the adsorbent, 25 mL of iron (III) chloride hexahydrate (0.4 M) was added to 25 mL of zinc acetate (0.2 M), 25 mL of NaOH solution (3 M) was added, and the resulting red precipitate was placed in a Teflon autoclave at 300 °C for 12 h. The final precipitate, the synthesized ZnFe₂O₄ nanoparticles, was then cleaned with distilled water and dried in an oven at 80 °C. A given amount of these ZnFe₂O₄ nanoparticles was then mixed with the solutions. After a certain time, the concentrations of the dyes (AO and MB) and the heavy metal (Cd (II)) in the solutions were determined using a UV-Vis spectrophotometer and atomic absorption spectroscopy, respectively. The main operating parameters for this case study that serve as input variables are adsorbent amount, analyte concentration, pH of the solution, and sonication time, while the output variables are the removal percentages of AO, MB, and Cd (II). Figure 1 is a schematic diagram illustrating the input and output variables for Case studies 1 and 2.

Least squares support vector regression (LSSVR) model development
The and Ngu and Yeo (2022). The function of primal space can be expressed as shown in Eq. (1).
whereby $w$ denotes the m-dimensional weight vector, while $\varphi$ and $b$ correspond to the mapping function and the bias term. The objective function of minimization can be expressed in Eq. (2).
which follows the equality constraint shown in Eq. (3).
whereby Z = By applying the Lagrangian principle to Eq. (1), Eq. (4) is obtained as: whereby $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$. Upon eliminating $w$ and $e$, Eq. (5) is obtained: whereby y Then, the final form of the function of the LSSVR model for estimation is expressed in Eq. (6).
Moreover, the kernel function used in this LSSVR is the polynomial kernel that is shown in Eq. (7).
where $p$ is the kernel parameter. Besides LSSVR, other models such as PCR, PLSR, LW-PLSR, ANN, and SVR were also developed using the experimental data from Case studies 1 and 2. More details of these models can be found in Yeo (2023), Jana et al. (2022), Thien and Yeo (2021), and Yeo (2021).
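As an illustration of the model development described in this section, the following is a minimal sketch of the standard textbook LSSVR, in which eliminating the weight vector and the errors from the Lagrangian leaves a single linear system in the bias and the Lagrange multipliers. It is not the authors' implementation. The kernel form `(a . b + c0) ** p`, the offset `c0`, the regularization name `gamma`, and the toy data are all assumptions made for illustration; only the use of a polynomial kernel with parameter p comes from the text.

```python
import numpy as np


def polynomial_kernel(A, B, p=2, c0=1.0):
    """Polynomial kernel (a . b + c0) ** p. The exact form is an assumption."""
    return (A @ B.T + c0) ** p


def lssvr_fit(X, y, gamma=10.0, p=2):
    """Solve the LSSVR dual linear system for the bias b and multipliers alpha."""
    n = X.shape[0]
    K = polynomial_kernel(X, X, p)
    # Block system: [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def lssvr_predict(X_train, b, alpha, X_new, p=2):
    """Evaluate the fitted model: sum_i alpha_i * K(x, x_i) + b."""
    return polynomial_kernel(X_new, X_train, p) @ alpha + b


# Hypothetical toy data (not from either case study): a quadratic response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2

b, alpha = lssvr_fit(X, y, gamma=1e4, p=2)
y_hat = lssvr_predict(X, b, alpha, X, p=2)
```

Because training reduces to one linear solve, the cost is dominated by the size of the training set rather than by an iterative optimizer, which suits small experimental datasets such as the ones used here.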

Data splitting, parameter setting and software configuration
In this study, the two case studies, which consist of different numbers of samples, were randomly split into training and testing (validation) datasets with a ratio of 90:10 for soft sensor model development and evaluation. Table 1 shows the RMSE of LSSVR using multiple splits of the data for Case studies 1 and 2. From Table 1, it can be seen that the train:test split ratio of 90:10 provides a lower range of RMSE for both case studies. Hence, the train:test split ratio of 90:10 was used in this study. There are 52 experimental datasets for Case Study 1, while Case Study 2 has 30 experimental datasets. Based on the abovementioned ratio and the number of datasets, the total number of data, $N_T$, the number of training data, $N_1$, and the number of testing data, $N_2$, can be determined. The features of the datasets in both case studies were standardized. A latent variable that refers to the number of latent variables adopted in each regression

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \quad (8)$$

whereby $y_i$ refers to the actual output, $\hat{y}_i$ refers to the predicted output, and $N$ is the number of samples in the dataset being evaluated. MAE is another error metric that is commonly utilized to evaluate regression models; taking absolute values prevents positive and negative errors from cancelling to zero. The formula for MAE can be demonstrated by Eq. (9) (Pervez et al. 2023b).

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \quad (9)$$
$R^2$ typically lies between 0 and 1, with a value nearest to 1 indicating better predictive performance; it becomes negative when a model predicts worse than the mean of the actual outputs. Eq. (10) shows the formula for $R^2$ (Yeo et al. 2020; Thien and Yeo 2021).

$$R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2} \quad (10)$$

whereby $\bar{y}$ denotes the mean value of the actual output. $E_a$ is employed to account for errors related to overfitted models. A situation that may arise while analyzing the results is that the prediction error of a model is the largest for the training dataset but the smallest for the testing dataset. Hence, $E_a$ can be utilized to calculate the overall prediction error of the model, in which a lower value suggests better predictive accuracy. The formula for $E_a$ can be demonstrated by Eq. (11) (Ngu and Yeo 2023a; Yeo et al. 2022).
whereby $\mathrm{RMSE}_1$ and $\mathrm{RMSE}_2$ denote the RMSE values for the training and testing datasets, respectively. Figure 2 illustrates the framework of the AI models used in this study.
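The random 90:10 split, the feature standardization, and the RMSE, MAE, and $R^2$ metrics described in this section can be sketched as follows. This is a minimal illustration rather than the authors' code. The function names, the rounding rule for the test-set size, and the choice to standardize with training-set statistics are assumptions, and the numeric inputs are hypothetical.

```python
import numpy as np


def split_and_standardize(X, y, test_fraction=0.10, seed=0):
    """Randomly split the samples, then standardize features with training statistics."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = max(1, round(test_fraction * len(y)))  # rounding rule is an assumption
    test_idx, train_idx = order[:n_test], order[n_test:]
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std
    return X_train, y[train_idx], X_test, y[test_idx]


def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical data with the same sample count as Case Study 1 (52 samples).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(52, 4))
y_demo = rng.normal(size=52)
X_tr, y_tr, X_te, y_te = split_and_standardize(X_demo, y_demo)

# Hypothetical predictions: one reasonable, one reversed.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.5, 1.5, 3.5, 3.5])
bad = np.array([4.0, 3.0, 2.0, 1.0])
r2_good = r2(y_true, good)
r2_bad = r2(y_true, bad)  # negative: worse than predicting the mean
```

The reversed predictions show how $R^2$ can fall below zero, which is the situation a model is in when it reports large negative $R^2$ values on testing data.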

Assessing the results from the AI models
The simulation results for the training and testing datasets from LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR are summarized in Tables 3 and 4 for MB and Cd (II) removals, respectively. As mentioned earlier in Section 2.5, lower values of RMSE, MAE, and $E_a$ indicate higher predictive performance of a model, whereas an $R^2$ value closer to 1 means better accuracy. It can be observed that the LSSVR model Besides, it can be seen from Tables 3 and 4 that ANN performed better than LW-PLSR, PLSR, PCR, and SVR for both the training and testing data for both MB and Cd (II) removals, especially for Cd (II) removal, for which its $R^2$ values are higher than 0.7. Also, from Tables 3 and 4, the RMSE and MAE values of the ANN are between 11.22 and 18.12 for MB removal and between 8 and 9.96 for Cd (II) removal, which are the second lowest after LSSVR. This is attributed to the rectified linear unit (ReLU) activation function of the ANN, which transforms the summed weighted input of a node into the activation or output of that node, thus increasing its prediction accuracy (Galety et al. 2021). Other than that, for the training data for both MB and Cd (II) shown in Tables 3 and 4, LW-PLSR has better $\mathrm{RMSE}_1$, $\mathrm{MAE}_1$, and $R_1^2$ values than PLSR, PCR, and SVR. LW-PLSR shows better prediction performance because its locally weighted (LW) algorithm selects the nearest samples, characterized by a minimum distance in the scores space, for the local PLSR model calculation to handle the non-linear relationship between the data (Perez-Guaita et al. 2013).
On the other hand, for the testing data, although PLSR, PCR, and SVR have better $\mathrm{RMSE}_2$ and $\mathrm{MAE}_2$ than LW-PLSR for both MB and Cd (II) removals, LW-PLSR still has a better $R_2^2$ owing to its LW algorithm. From Tables 3 and 4, it can also be observed that PLSR and PCR have similar predictive performance, as their RMSE, MAE, and $E_a$ values for the training and testing data for MB and Cd (II) are all in the range of 17.24 to 27.37. PLSR, however, achieves somewhat better prediction for the training data than PCR because it considers both output and input variables (Yeo 2021). This is supported by their $\mathrm{RMSE}_1$, $\mathrm{MAE}_1$, and $R_1^2$ values in Tables 3 and 4, whereby PLSR outperforms PCR. Nevertheless, the $E_a$ values for PLSR and PCR for the removal of MB and Cd (II) are still much worse than those of LSSVR. Besides PLSR and PCR, it can also be seen in Tables 3 and 4 that SVR with a Gaussian kernel function did not give good results, especially for the testing data for the removal of MB and Cd (II), since its $R_2^2$ values are large negative values. Overall, the results from LSSVR presented in this study are also superior to those of the simulation study conducted by Mazaheri et al. (2017), which used only training data for evaluation. Hence, LSSVR can be an alternative model for similar applications. In a nutshell, for both training and testing data, the results displayed in Tables 3 and 4 show that LSSVR is the best-performing model for accurately predicting the effectiveness of MB and Cd (II) removals.
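The ReLU activation mentioned above for the ANN can be illustrated with a single node. This is a minimal sketch; the weights, bias values, and inputs are hypothetical and are not taken from the networks trained in this study.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: passes positive inputs, clips negatives to zero."""
    return np.maximum(0.0, z)


def node_output(x, w, b):
    """Activation of one node: ReLU applied to the summed weighted input plus bias."""
    return relu(np.dot(w, x) + b)


# Hypothetical inputs and weights; the weighted sum w . x equals -0.5.
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.5, -0.25])

active = node_output(x, w, b=0.6)     # pre-activation is positive, so it passes through
inactive = node_output(x, w, b=-0.2)  # pre-activation is negative, so the output is zero
```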

Visualizing the comparisons of the models
To further evaluate the predictive results from the LSSVR, ANN, LW-PLSR, PLSR, PCR, and SVR, the comparisons between their predicted percentages of MB and Cd (II) removals are visualized in this section. Figure 3(a)-(d) compares the predicted output variables, which are the percentages of MB and Cd (II) removals from the LSSVR, PCR, PLSR, LW-PLSR, ANN, and SVR for Case Study 1, with the actual percentages of MB and Cd (II) removals. From Figure 3(a)-(d), it is obvious that the predicted outputs from the LSSVR are mostly closer to the actual outputs and also within the error bars of the actual outputs. Moreover, it can also be seen in Figure 3(a)-(d) that the predicted outputs from PLSR, PCR, and SVR are mostly far from the actual outputs and their error bars. Although some of the predicted outputs from ANN and LW-PLSR are close to the actual outputs, they are not as close as the predicted outputs from the LSSVR. In conclusion, for Case Study 1, the LSSVR provides the best predicted outputs for the percentages of MB and Cd (II) removals.

Predictive performance of the LSSVR model
In addition, this section provides more detailed comparisons between the predicted outputs from the LSSVR and the actual outputs. Figure 4(a)-(d) plots the predicted outputs of the training and testing data from the LSSVR model against the actual outputs for Case Study 1 for the prediction of MB and Cd (II). In Figure 4(a)-(d), the closer the predicted outputs are to the y = x line, the closer they are to the actual values and the higher their accuracy. It can be observed from these figures that all the predicted outputs from the training and testing datasets are close to the y = x lines. Thus, these results indicate that the LSSVR offers accurate predicted outputs for the removal percentages of MB and Cd (II).

Assessing the results from the AI models
To further examine the predictive performance of the LSSVR, another case study, Case Study 2, on the removal of dyes and heavy metal ions using ZnFe₂O₄ nanoparticles was used. This section presents the results for the training and testing datasets from LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR, which are summarized in Tables 5 to 7 for the AO, MB, and Cd (II) removals, respectively. Similar to Case Study 1, the results shown in Tables 5 to 7 are the RMSE, MAE, $E_a$, and $R^2$ for both the training and testing data of the AO, MB, and Cd (II) removals. The smaller the values of the RMSE, MAE, and $E_a$, and the closer the $R^2$ values to 1, the better the performance of a model. From Tables 5 to 7, among all AI models, the LSSVR again provides the best performance for the AO, MB, and Cd (II) removals, with its $E_a$ values being the smallest by reported differences of 249% to 1,642%. Besides, the RMSE and MAE values for the training and testing data from the LSSVR are the lowest, with reported differences of 228% to 1,756% from the other AI models. This is because the LSSVR uses the polynomial kernel function, which can map the nonlinear data in both the training and testing data for the AO, MB, and Cd (II) removals to a high-dimensional space, and LOO cross-validation, which can provide the optimal parameters; hence, it has a more accurate prediction ability (Yeo and Lau 2021; Pervez et al. 2023d). Also, in Tables 5 to 7, LSSVR produced high values for $R_1^2$ and $R_2^2$, between 0.95 and 0.99, for the AO, MB, and Cd (II) removals.
Other than LSSVR, ANN is the second-best model, since it has the second-lowest $E_a$ values for the AO, MB, and Cd (II) removals, which are 14.02, 21.38, and 12.17, respectively. Although LW-PLSR has better $\mathrm{RMSE}_1$, $\mathrm{MAE}_1$, and $R_1^2$ than ANN for the AO and Cd (II) removals, its other results are worse than those of ANN. The ReLU function in ANN accelerates the convergence of gradient descent toward the global minimum of the loss function owing to its linear, non-saturating property (Khan et al. 2020). After ANN, LW-PLSR performed slightly better than PLSR, PCR, and SVR for the training data for the AO, MB, and Cd (II) removals, since its $\mathrm{RMSE}_1$, $\mathrm{MAE}_1$, and $R_1^2$ are better. This is because LW-PLSR applies weights to a dataset to increase its prediction accuracy (Yeo et al. 2022).
In Case Study 2, PLSR, PCR, and SVR obtained almost similar results since their In this case study, the LSSVR performs well for both the training and testing datasets. Thus, in the final analysis, LSSVR is an appropriate model for attaining the optimal conditions for the AO, MB, and Cd (II) removals.
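Percentage differences above 100% cannot be reductions measured against the larger error, because a reduction of that kind is capped at 100%. One plausible reading, which is an assumption since the text does not state the formula, is that each difference is expressed relative to the LSSVR error. The sketch below contrasts the two conventions using hypothetical error values; the function names are illustrative only.

```python
def percent_gap(other_error, lssvr_error):
    """Difference expressed as a percentage of the LSSVR error (can exceed 100)."""
    return 100.0 * (other_error - lssvr_error) / lssvr_error


def percent_reduction(other_error, lssvr_error):
    """Difference expressed as a percentage of the competing model's error (below 100)."""
    return 100.0 * (other_error - lssvr_error) / other_error


# Hypothetical errors: a competing model at 18.0 and LSSVR at 1.0.
gap = percent_gap(18.0, 1.0)
reduction = percent_reduction(18.0, 1.0)
```

With these hypothetical values the first convention gives a four-digit percentage while the second stays just under 95%, so stating which base is used matters when comparing models.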

Visualizing the comparisons of the models
For Case Study 2, this section compares the predicted output variables from the LSSVR, ANN, LW-PLSR, PLSR, PCR, and SVR with their respective actual output variables for the AO, MB, and Cd (II) removals. Figure 5

Predictive performance of the LSSVR model
Like Case Study 1, the predictive performance of the LSSVR model is further discussed in this section, where comparisons are made between the predicted outputs from the LSSVR and the actual outputs. For Case Study 2, Figures 6(a

Conclusion
The presence of dyes and heavy metal ions in water sources could have dangerous effects on human life and other organisms. Besides, they are categorized as among the most dangerous pollutants in water systems. Hence, the efforts to remove
accuracy, since it gives the best results for AO, MB, and Cd (II) removals among the AI models. Its RMSE, MAE, and $E_a$ are the lowest, with reported differences of 83% to 1,756% from the LW-PLSR, PLSR, PCR, ANN, and SVR models. In addition, the $R^2$ values for the LSSVR are the highest, all above or close to 0.9 for the AO, MB, and Cd (II) removals, even for the testing data.
Hence, it can be concluded that LSSVR is the most appropriate model to be used in assessing the efficacy of AO, MB, and Cd (II) removals.
Figure 1. A schematic diagram illustrating the input and output variables for Case studies 1 and 2.

Figure 2. The framework of the AI models used in this study.

Figure 3. Comparisons of the actual outputs and predicted outputs from the LSSVR, PCR, PLSR, LW-PLSR, ANN, and SVR for Case Study 1, (a) training data for the prediction of MB, (b) testing data for the prediction of MB, (c) training data for the prediction of Cd (II), and (d) testing data for the prediction of Cd (II).

Figure 4. Predicted outputs from the LSSVR model against the actual outputs' plots for Case study 1, (a) training data for the prediction of MB, (b) testing data for the prediction of MB, (c) training data for the prediction of Cd (II), and (d) testing data for the prediction of Cd (II).
)-(f) demonstrates the plots of the predicted outputs from the LSSVR model against the actual outputs of the training and testing data for the AO, MB, and Cd (II) removals. Figure 6(a)-(f) shows that the predicted outputs from the LSSVR are near the y = x line and that their $R^2$ values are very high too, even for the testing data shown in Figure 6(b), (d), and (f), which were not involved in the LSSVR model development. Again, these results indicate that the LSSVR model fits the AO, MB, and Cd (II) removals very closely.
the dyes like AO and MB, as well as heavy metals such as Cd (II), from water sources or systems are essential. The efficiency of the removal of these pollutants using adsorbents such as activated carbon produced from natural walnut and ZnFe₂O₄ nanoparticles has been proven, and the optimization of these methods is still ongoing. Hence, this study introduces the LSSVR model to determine the optimum operating conditions for the removals of AO, MB, and Cd (II). In this study, the performance of the LSSVR model in estimating the AO, MB, and Cd (II) removals was examined. Two different case studies for the AO, MB, and Cd (II) removals were utilized to develop and evaluate the prediction of the LSSVR. Based on these case studies, the results show that LSSVR is the best-performing model with the highest

Figure 5. Comparisons of the actual outputs and predicted outputs from the LSSVR, PCR, PLSR, LW-PLSR, ANN, and SVR for Case Study 2, (a) training data for the prediction of AO, (b) testing data for the prediction of AO, (c) training data for the prediction of MB, (d) testing data for the prediction of MB, (e) training data for the prediction of Cd (II), and (f) testing data for the prediction of Cd (II).

Figure 6. Predicted outputs from the LSSVR model against the actual outputs' plots for Case study 2, (a) training data for the prediction of AO, (b) testing data for the prediction of AO, (c) training data for the prediction of MB, (d) testing data for the prediction of MB, (e) training data for the prediction of Cd (II), and (f) testing data for the prediction of Cd (II).

Table 1. The RMSE of LSSVR using multiple splits of the data for Case studies 1 and 2.

Table 2. Parameters employed for all AI models in both Case studies 1 and 2.
Leave-one-out (LOO) cross-validation is integrated with the LSSVR to tune the optimum parameters (c, k, and p) as displayed in Table 2 (Yeo and Lau 2021; Malang et al. 2023a). Overall, LSSVR produces the best results in comparison to other regression models where its RMSE, MAE, and E

Table 3. Prediction results of LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR for MB removal for Case study 1 on training and testing datasets.

Table 4. Prediction results of LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR for Cd (II) removal for Case study 1 on training and testing datasets.

Table 5. Prediction results of LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR for AO removal for Case study 2 on training and testing datasets.

Table 6. Prediction results of LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR for MB removal for Case study 2 on training and testing datasets.
(a)-(f) visualizes the comparisons of the actual outputs and predicted outputs from the LSSVR, PCR, PLSR, LW-PLSR, ANN, and SVR for Case Study 2. From Figure 5(a)-(f), it can be seen that only the predicted outputs from the LSSVR are close to the actual outputs, whereas the majority of the predicted outputs from the other AI models are far from the actual outputs. Again, in Case Study 2, the LSSVR model is still the most suitable model for the prediction of the AO, MB, and Cd (II) removals.

Table 7. Prediction results of LSSVR, LW-PLSR, PLSR, PCR, ANN, and SVR for Cd (II) removal for Case study 2 on training and testing datasets.