Modeling monthly pan evaporation using wavelet support vector regression and wavelet artificial neural networks in arid and humid climates

ABSTRACT Evaporation rate is a key parameter in determining ecological conditions and plays an essential role in the proper management of water resources. In this paper, the efficiency of two data-driven techniques, support vector regression (SVR) and artificial neural networks (ANN), and of their combinations with wavelet transforms (WSVR and WANN) was investigated for predicting evaporation rates at Tabriz (Iran) and Antalya (Turkey) stations. Four statistical indicators were used to evaluate the performance of the studied techniques: the root mean square error (RMSE), the mean absolute error (MAE), the correlation coefficient (R), and the Nash–Sutcliffe efficiency (NSE). Additionally, Taylor diagrams were used to assess the similarity between the observed and predicted data. The results showed that at Tabriz station, ANN3 (ANN with the third input combination, air temperatures and solar radiation), with RMSE of 0.701, MAE of 0.525, R of 0.990 and NSE of 0.977, performed better than WANN, SVR and WSVR. Thus, the wavelet transform did not improve the precision of the ANN and SVR predictions at Tabriz station. Approximately the same trend was seen at Antalya station, where ANN5 (ANN with the fifth input combination, air temperatures, relative humidity and solar radiation), with RMSE of 0.923, MAE of 0.697, R of 0.962 and NSE of 0.898, gave the most accurate predictions. In contrast, the wavelet transform reduced the prediction errors of SVR at Antalya station: WSVR5, with RMSE of 1.027, MAE of 0.728, R of 0.950 and NSE of 0.870, predicted the evaporation rates of Antalya station more precisely than the other SVR models. In conclusion, the results of the current study showed that ANN provided reasonable estimates for evaporation modeling at both Tabriz and Antalya stations.


Introduction
Evaporation plays a prominent role in the hydrologic cycle. Estimating it is therefore necessary, as it affects water resources, especially in arid and semi-arid areas like Iran. A considerable share of precipitation is lost to evaporation: the total evaporation from the land surface is approximately 61% of the entire global precipitation (Chow, Maidment, & Mays, 1988). Every year, several million cubic meters of fresh water, collected at high cost and with great effort, evaporate from dams and reservoirs. It is therefore of great significance to estimate the amount of evaporation for water resource monitoring. In regions with hot weather and low rainfall, evaporation can cause a noticeable loss of water and hence a drop in the water surface elevation. Typically, there are two approaches, direct and indirect, for estimating evaporation. The evaporation pan is the most prevalent direct method for measuring evaporation (Stanhill, 2002). The Class A pan, the most frequently used pan, is 4 ft (122 cm) in diameter and 10 inch (25 cm) deep and is placed about 6 inch (15 cm) above the soil surface. Although pan measurements appear to give precise values, indirect approaches for approximating pan evaporation (PE) are easier and more cost-effective, because installing pans and meteorological stations is expensive. Indirect methods for estimating PE are based on meteorological variables. Empirical and semi-empirical models have been widely used for calculating PE from meteorological variables such as wind speed, sunshine hours, relative humidity, radiation, atmospheric pressure and, as the main factor, temperature.
CONTACT Shahaboddin Shamshirband shahaboddin.shamshirband@tdtu.edu.vn
Because so many variables are involved, estimating the evaporation rate is complex and strongly non-linear. A number of scholars have tried to predict evaporation from different meteorological variables (Chang, Sun, & Chung, 2013; Kisi, 2006; Lin, Lin, & Wu, 2013). In recent years, machine learning approaches including artificial neural networks (ANN), the adaptive neuro-fuzzy inference system (ANFIS), fuzzy genetic (FG) models and support vector regression (SVR), as well as integrations of these methods with wavelet or other data preprocessing approaches, have been effectively applied in water-related fields such as water resource engineering, prediction of suspended sediment load in rivers, forecasting of monthly stream flow, estimation of the friction factor in irrigation pipes, and PE modeling (Chau, 2017; Choubin et al., 2018; Choubin, Malekian, Samadi, Khalighi Sigaroodi, & Sajedi Hosseini, 2017; Fotovatikhah et al., 2018; Jain & Srinivasulu, 2004; Keshtegar, Piri, & Kisi, 2016; Kisi, 2005; Kisi, Genc, Dinc, & Zounemat-Kermani, 2016; Nourani, Alami, & Daneshvar Vousoughi, 2015; Rajaee, Mirbagheri, Zounemat-Kermani, & Nourani, 2009; Samadianfard et al., 2018; Samadianfard, Delirhasannia, Kisi, & Agirre-Basurko, 2013; Samadianfard, Nazemi, & Sadraddini, 2014; Samadianfard, Sattari, Kisi, & Kazemi, 2014; Taormina, Chau, & Sivakumar, 2015; Wang, Xu, Chau, & Chen, 2013). Goyal, Bharti, Quilty, Adamowski, and Pandey (2014) applied ANN, ANFIS, least squares-SVR (LS-SVR) and fuzzy logic to increase the precision of PE estimation using various meteorological variables, such as daily rainfall, sunshine hours, minimum and maximum air temperatures and humidity. The outcomes showed that LS-SVR and fuzzy logic can be used effectively to predict PE values. Guven and Kisi (2013) Kim and Kim (2008) for PE and evapotranspiration modeling. The outcomes indicated the specific capabilities of both studied models for the precise approximation of PE and evapotranspiration values.
Recently, Shafaei and Kisi (2015) applied ANFIS, SVR and autoregressive moving average (ARMA) models separately and then combined each of them with wavelets, giving the WANFIS, WSVR and WARMA models. The efficiencies of the two groups (single and combined models) were compared. The results showed that the combined models predicted lake levels in the study zone more accurately than the single models. Previous research has thus sought to improve the accuracy of PE estimates, and the remaining gap lies in developing and implementing new methods that further decrease the prediction errors. Therefore, in the current research, the potential of two machine learning approaches, ANN and SVR, and of their wavelet-integrated versions, Wavelet-ANN (WANN) and Wavelet-SVR (WSVR), was examined for forecasting evaporation rates at Tabriz and Antalya stations, and their efficiencies were compared. The goals of the study were (i) to evaluate the performance of these models in estimating PE, and (ii) to investigate the role of climate in predicting PE by considering humid and arid climates. The rest of the paper is structured as follows. Section 2 describes the study area, the implemented methods and the evaluation parameters. Section 3 discusses the obtained results, and the conclusion is presented in Section 4.

Study area
Monthly data from two synoptic stations in two countries, Tabriz station (latitude 38°05′ N, longitude 46°17′ E) in Iran and Antalya station (latitude 36°42′ N, longitude 30°44′ E) in Turkey, which lie at approximately the same latitude, were used in the current study (Figure 1). Tabriz, situated in northwestern Iran, lies in a mountainous basin with a semi-arid climate and cold winters. Antalya, located in the southern region of Turkey, has a Mediterranean climate characterized by temperate wet winters and hot arid summers. The elevations above sea level are 1350 and 64 m for the Tabriz and Antalya stations, respectively. In summary, as can be seen from Table 1, Antalya has a hot and wet climate, whereas Tabriz has cold and dry weather. The meteorological variables used in this study are air temperatures (T), solar radiation (Rs), wind speed (W), relative humidity (RH) and PE, covering 1992-2016 for Tabriz station and 1986-2006 for Antalya station. Table 1 presents the monthly statistical parameters of these variables for both stations. The mean values of PE at Tabriz and Antalya stations are 5.425 and 5.568 mm, respectively. The minimum PE at Tabriz station is zero because in winter, when the temperature falls below zero, the water in the pan freezes and evaporation becomes negligible. Most of the variables have low skewness values and are therefore close to normally distributed; the exception is wind speed at Antalya station, which has the highest skewness and shows a skewed distribution. This might be due to the location of this station in a coastal area (Kisi, 2009). Figure 2.

Artificial neural networks
In recent decades, the ANN approach has been regarded by many researchers as a dependable computational tool. Since the initial neural model introduced by McCulloch and Pitts (1943), researchers have developed several different models, and most activity has concentrated on backpropagation and its extensions (Salas, Markus, & Tokar, 2000). This algorithm (Rumelhart & McClelland, 1986) is used in feed-forward ANNs, in which the neurons are arranged in several layers and pass their signals forward, while the errors are propagated backward. The network receives its inputs through the input layer neurons, and the network output is obtained from the output layer neurons. The number of hidden layers is typically determined by trial and error. ANNs are highly capable of modeling and simulating linear and non-linear systems.
In the current study, the input layers were determined by a trial-and-error process, and the number of neurons in each layer was selected between 1 and 10. Figure 3 shows a typical configuration of the ANN model.
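As a rough illustration of the feed-forward structure and of the search over 1 to 10 neurons, the following minimal sketch runs a forward pass through a network with one hidden layer. The tanh activation, the single linear output neuron, the random initial weights and the sample inputs are all assumptions made for illustration; backpropagation training, which would be needed to compare the candidate sizes on real data, is omitted.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, hidden, output):
    """Forward pass: tanh hidden layer followed by one linear output neuron."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]

rng = random.Random(0)
x = [0.6, 0.4]  # hypothetical scaled inputs, e.g. T and Rs

# One untrained candidate network per hidden-layer size from 1 to 10.
candidates = {}
for n_hidden in range(1, 11):
    hidden = init_layer(len(x), n_hidden, rng)
    output = init_layer(n_hidden, 1, rng)
    candidates[n_hidden] = forward(x, hidden, output)
```

In a full trial-and-error search, each candidate would be trained and then ranked by its error on held-out data rather than by its untrained output.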

Support vector regression
The support vector machine (SVM) is a popular and extensively used estimator whose fundamental idea was introduced by Vapnik (1995, 1998). A regression technique, SVR, was then developed from SVM models in order to solve complex problems effectively. The model is built on structural risk minimization. For this purpose, the ε-insensitive loss function is defined, which means that the model tolerates errors of up to ε in the training data set. SVR therefore seeks a linear function, where F and L denote the coefficients of the weight vector of the linear expression. This linear regression problem can be expressed as an optimization problem in which the constant C > 0 is a trade-off factor weighting the empirical error. C and ε are predefined parameters, and ξ_i and ξ_i^* are called slack variables. To solve this problem, SVR uses the kernel trick (Smola & Scholkopf, 2004). In this study, four kernel functions were used: the Pearson VII function-based kernel, the radial basis function (RBF), the polynomial kernel and the normalized polynomial kernel. To build an optimum SVR model, its parameters, such as the kernel function, C and ε, must be selected carefully. Figure 4 shows the schematic configuration of the SVR model.
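The role of ε and C can be made concrete with a short sketch of the textbook linear ε-SVR objective: a regularization term on the weights plus C times the ε-insensitive losses, which at the optimum equal the sum of the slack variables. The notation (`w`, `b`), the sample data and the parameter values are assumptions for illustration and need not match the symbols or settings of the study. An RBF kernel is included as one example of the kernels listed above.

```python
import math

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def linear_predict(w, b, x):
    """Linear function f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svr_objective(w, b, X, y, C, epsilon):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    losses = sum(epsilon_insensitive_loss(yi, linear_predict(w, b, xi), epsilon)
                 for xi, yi in zip(X, y))
    return regularizer + C * losses

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical one-dimensional data and parameters.
X = [[1.0], [2.0], [3.0]]
y = [1.1, 2.0, 3.6]
objective = svr_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, epsilon=0.2)
```

Only the third sample falls outside the ε tube here, so it alone contributes to the loss term; a larger C would penalize that deviation more heavily relative to the weight norm.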

Wavelet transform
Developed by Grossman and Morlet (1984), the wavelet transform (WT) has been broadly applied in different fields of science and engineering. It is a mathematical transformation used to extract information from data that is not obvious in its raw form. The transformation can reveal patterns in the data such as discontinuities, breakdown points, and minima and maxima. Its applicability to any type of time series has also made it a useful tool. As a consequence, it appears to be an effective alternative to the Fourier transform for preserving local, non-periodic and multi-scaled phenomena (Goyal et al., 2014). The WT generates several sub-series from the main time series, which are then used to build wavelet data-driven models. The WT exists in different forms, such as continuous (CWT) and discrete (DWT). Computing the CWT requires a large amount of data and a long calculation time. In contrast, the DWT needs less calculation time and is simpler to apply. The DWT uses two sets of filters, low-pass and high-pass, to decompose the main time series (Figure 5) (Nourani, Alami, & Aminfar, 2009). The choice of wavelet type and decomposition level are the key points of the WT, and these factors should be determined carefully. A trial-and-error procedure is therefore frequently used to find an appropriate mother wavelet. Different wavelet families were examined in the current research, and the Daubechies wavelet, which performed relatively better, was selected for PE estimation. Furthermore, an empirical survey by Nourani, Hosseini Baghanam, Adamowski, and Kisi (2014) recommended that the optimum performance of the WT occurs when the decomposition level is set equal to the integer part of [log (N)]. Accordingly, this parameter was set to two for PE estimation.
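The choice of two decomposition levels can be checked with a small calculation. The sketch below assumes that N is the number of monthly records, that the logarithm is base 10, and that the records are complete over the stated periods; none of these is stated explicitly in the text, but together they reproduce the reported level.

```python
import math

def decomposition_level(n_samples):
    """Integer part of log10(N); the base-10 logarithm is an assumption."""
    return int(math.log10(n_samples))

# Record lengths implied by the stated periods (inclusive years * 12 months).
n_tabriz = (2016 - 1992 + 1) * 12   # 300 months
n_antalya = (2006 - 1986 + 1) * 12  # 252 months

levels = {
    "Tabriz": decomposition_level(n_tabriz),
    "Antalya": decomposition_level(n_antalya),
}
```

Under these assumptions both stations give a level of 2, since log10(300) and log10(252) both lie between 2 and 3.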

Wavelet artificial neural networks and wavelet support vector regression
In this investigation, wavelet analysis was combined with ANN and SVR for monthly pan evaporation prediction, as presented in Figure 5. The wavelet types mentioned above were evaluated, and the measured monthly time series, H, were then decomposed by the optimum DWT into multi-frequency time series comprising the details (H_D1, H_D2, ..., H_Dn) and the approximation (H_a). All of the decomposed components were used as inputs for ANN and SVR separately. The output is the prediction, as shown in Figure 6.
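The decomposition of a series into details and an approximation can be illustrated with a minimal two-level DWT. The sketch uses the Haar wavelet (the simplest member of the Daubechies family) as a stand-in, because the order of the Daubechies wavelet used in the study is not given here, and the sample PE values are hypothetical. The names `h_a`, `h_d1` and `h_d2` mirror H_a, H_D1 and H_D2.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One DWT level: low-pass (approximation) and high-pass (detail) halves."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_dwt(signal, level):
    """Return (approximation, [detail_1, ..., detail_level])."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = list(signal), []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt by undoing the levels from coarsest to finest."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        signal = rebuilt
    return signal

pe = [1.2, 2.5, 4.8, 7.9, 10.4, 11.6, 10.1, 7.3]  # hypothetical monthly PE
h_a, (h_d1, h_d2) = haar_dwt(pe, level=2)
restored = haar_idwt(h_a, [h_d1, h_d2])
```

The coefficient arrays are shorter than the original series because each level halves the length; how the sub-series were aligned with the monthly targets before being fed to ANN and SVR is not described here, so that step is left out.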

Evaluation parameters
To evaluate the precision of the studied models, four evaluation functions were used: the root mean squared error (RMSE), the mean absolute error (MAE), the correlation coefficient (R) and the Nash-Sutcliffe efficiency (NSE). The RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - o_i\right)^2} \tag{3}$$

where $p_i$ is the $i$th value predicted by the models, $o_i$ is the $i$th observed value, and $n$ is the total number of samples. The MAE, R and NSE functions can be stated as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - o_i\right| \tag{4}$$

$$R = \frac{\sum_{i=1}^{n}\left(o_i - \bar{o}\right)\left(p_i - \bar{p}\right)}{\sqrt{\sum_{i=1}^{n}\left(o_i - \bar{o}\right)^2 \sum_{i=1}^{n}\left(p_i - \bar{p}\right)^2}} \tag{5}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(o_i - p_i\right)^2}{\sum_{i=1}^{n}\left(o_i - \bar{o}\right)^2} \tag{6}$$

where $\bar{o}$ and $\bar{p}$ are the means of the observed and predicted data, respectively. In addition to the statistical parameters defined in Equations (3-6), the Taylor diagram (Taylor, 2001) was used to validate the accuracy of the considered models. This diagram is a detailed illustration of the agreement between the observed and predicted data (e.g. IPCC, 2007). Taylor proposed a single diagram that presents several evaluation parameters simultaneously, highlighting the accuracy of predictive models as points on a polar plot. In this diagram, the correlation coefficient, standard deviation and RMSE between the predicted and observed values can be displayed together (Gleckler, Taylor, & Doutriaux, 2008; Taylor, 2001). The diagrams in this study were plotted with code developed in Wolfram Mathematica (version 11.0.1.0).
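The four indicators can be computed directly from paired observed and predicted values. The following sketch uses their standard definitions; the function names and the sample values are illustrative assumptions, not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    o_mean = sum(obs) / len(obs)
    p_mean = sum(pred) / len(pred)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    o_mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - sse / sst

observed = [2.0, 4.0, 6.0, 8.0]    # hypothetical PE values
predicted = [2.5, 3.5, 6.5, 7.5]   # hypothetical model output
scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R": pearson_r(observed, predicted),
    "NSE": nse(observed, predicted),
}
```

RMSE and MAE carry the units of PE and are better when smaller, whereas R and NSE are dimensionless and approach 1 for a good model; NSE can become negative when the model is worse than predicting the observed mean.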

Results and discussion
The performance of the considered methods, ANN, WANN, SVR and WSVR, in forecasting PE at two stations, Tabriz (northwest of Iran) and Antalya (southwest of Turkey), was compared. For all models, the data were partitioned so that 70% was used for training and the remaining 30% for testing. For Tabriz station, monthly data from 1992 to 2008 formed the training data set, while the measured data from 2009 to 2016 served as the testing data set. For Antalya station, the training and testing periods were 1986 to 1999 and 2000 to 2006, respectively. Based on the meteorological variables, seven input combinations were considered. Because T has the highest correlation with PE at both Tabriz and Antalya stations, it was included in all input combinations to increase the prediction accuracy. As given in Table 2, the evaluated input combinations are: (1) T and RH, (2) T and W, (3) T and Rs, (4) T, RH and W, (5) T, RH and Rs, (6) T, W and Rs, (7) T, RH, W and Rs.
The statistical parameters, including the RMSE, MAE and R values, for the prediction of PE at Tabriz station are given in Table 3. It can be seen from Table 3 that WT did not have a substantial positive effect in reducing the prediction error at Tabriz station. In other words, the ordinary ANN and SVR methods were more precise than their counterparts with WT. Thus, it can be concluded that temperature and radiation had the greatest effect on the accurate prediction of PE at Tabriz station. Relative humidity also helped reduce the prediction errors in scenario V. A somewhat different trend was seen for Antalya station. At this station, ANN5, with temperature, radiation and relative humidity as inputs, with RMSE of 0.923, MAE of 0.697, R of 0.962 and NSE of 0.895, gave more accurate predictions than the other methods and scenarios.
WANN5 was then selected as the second best, with the same input parameters and with RMSE of 0.927, MAE of 0.682, R of 0.968 and NSE of 0.894. ANN6, with temperature, radiation and wind speed as inputs, was selected as the third best. It is therefore clear that, as at Tabriz station, temperature and radiation contributed most to the accurate prediction of pan evaporation. Relative humidity also increased the prediction precision at Antalya station. This may be because Antalya is next to the sea, which suggests that relative humidity is an important factor for accurately estimating pan evaporation near the coast. Overall, the ANN showed a clear capability to reduce the prediction error of PE values. This may be due to the architecture of ANN models. Figures 7 and 8 show the observed and predicted values of PE in the test stage at Tabriz and Antalya stations, respectively. The predictions for Tabriz station are clearly in better agreement with the observed data than those for Antalya station. It is deduced that in wet areas the high humidity reduces the accuracy of the predictions, since humidity has the lowest correlation with PE. Figures 9 and 10 show scatter plots of the observed (x-axis) and predicted (y-axis) PE values for the best scenario of each method in the test stage at both stations. At Tabriz station, ANN3 appears better than the best scenarios of the other methods (WANN, SVR and WSVR); in other words, the estimates of ANN3 are closer to the exact line than the others. For Antalya station, the predictions of ANN5 are less scattered than those of the best scenarios of the other methods. Thus ANN3 and ANN5, with the least scattered predictions, appear to be the best models for Tabriz and Antalya stations, respectively.
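The experimental setup behind these comparisons, a chronological train/test partition and seven input combinations, can be written down compactly. The sketch below assumes complete monthly records for the stated years; the record structure and the helper names are hypothetical.

```python
# Seven input combinations; T appears in every one.
INPUT_COMBINATIONS = {
    1: ("T", "RH"),
    2: ("T", "W"),
    3: ("T", "Rs"),
    4: ("T", "RH", "W"),
    5: ("T", "RH", "Rs"),
    6: ("T", "W", "Rs"),
    7: ("T", "RH", "W", "Rs"),
}

def monthly_index(first_year, last_year):
    """One placeholder record per month, assuming no missing months."""
    return [{"year": year, "month": month}
            for year in range(first_year, last_year + 1)
            for month in range(1, 13)]

def chronological_split(records, last_train_year):
    """Train on records up to last_train_year, test on all later records."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test

tabriz_train, tabriz_test = chronological_split(monthly_index(1992, 2016), 2008)
antalya_train, antalya_test = chronological_split(monthly_index(1986, 2006), 1999)

train_share = {
    "Tabriz": len(tabriz_train) / (len(tabriz_train) + len(tabriz_test)),
    "Antalya": len(antalya_train) / (len(antalya_train) + len(antalya_test)),
}
```

With complete records, the stated year ranges correspond to training shares of about 68% for Tabriz and 67% for Antalya, so the 70/30 figure appears to be a rounded description of a split made on whole years. Splitting by time rather than at random keeps the test period strictly after the training period.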
It can be seen from Figure 10 that at Antalya station, the larger the observed evaporation, the more the estimates deviate from the exact line. Thus, high evaporation values are estimated less accurately than low values. However, Figure 9 shows that this deviation is not noticeable for Tabriz station. The statistical parameters of the models in the test period are presented in Table 3. The overall results showed that the accuracy of the studied models at Tabriz station was higher than at Antalya station. This may be due to the fact that the PE values in arid regions are lower than in humid regions, which affects the predictions. The obtained results are in line with the related literature. Kisi (2009) applied ANN to model pan evaporation data from inland (Fresno) and coastal (Los Angeles and San Diego) stations in the USA and found that the models generally perform better at the inland station than at the coastal ones. Moreover, a comparison with the findings of Ghorbani, Kazempour, Chau, Shamshirband, and Taherei Ghazvinei (2018) showed that the RMSE of ANN3 at Tabriz station (0.701) is lower than the RMSE of the hybrid MLP-QPSO model at Talesh station. The probability distributions of the observed and predicted data in the test period are presented in Figures 11 and 12. These figures represent the probability of occurrence of a specific PE within a particular interval (Al-Shammari et al., 2016). The probability distributions of the values predicted by ANN3 and SVR3 (Tabriz station) and by ANN5 and WANN5 (Antalya station) are close to the observed values for most intervals.
Figure 11. Histograms of pan evaporation at Tabriz station.
Furthermore, Figure 13 shows Taylor diagrams for the best models. In these diagrams, the radial distance from the green point (i.e. the observed PE) is a measure of the RMSE (Taylor, 2001). Consequently, the most accurate model is the one whose point lies closest to an R value of 1 with the minimum RMSE. It is evident from Figure 13 that ANN3 and ANN5 produced the best predictions of PE at Tabriz and Antalya stations, respectively.
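The geometry of the Taylor diagram rests on a law-of-cosines relation between the two standard deviations, the correlation coefficient and the centered RMS difference (Taylor, 2001). The sketch below checks that relation numerically on hypothetical values. Note that the distance plotted on the diagram is the centered RMS difference, which coincides with the ordinary RMSE only when the predictions have no mean bias.

```python
import math

def taylor_statistics(obs, pred):
    """Standard deviations, correlation and centered RMS difference."""
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    sd_o = math.sqrt(sum((o - o_mean) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - p_mean) ** 2 for p in pred) / n)
    r = sum((o - o_mean) * (p - p_mean)
            for o, p in zip(obs, pred)) / (n * sd_o * sd_p)
    centered_rms = math.sqrt(
        sum(((p - p_mean) - (o - o_mean)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return sd_o, sd_p, r, centered_rms

observed = [1.0, 3.0, 6.0, 9.0, 11.0, 8.0]    # hypothetical PE values
predicted = [1.5, 2.6, 6.4, 8.1, 10.2, 8.3]   # hypothetical model output

sd_o, sd_p, r, centered_rms = taylor_statistics(observed, predicted)

# Law of cosines: E'^2 = sd_o^2 + sd_p^2 - 2 * sd_o * sd_p * R
from_cosine_law = math.sqrt(sd_o ** 2 + sd_p ** 2 - 2.0 * sd_o * sd_p * r)
```

Because the three statistics are tied together in this way, a single point per model is enough to display all of them, which is why the point nearest the observed reference marks the best model.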

Conclusion
In the current study, the capabilities of ANN and SVR, and of their combinations with WT, in predicting PE at Tabriz (Iran) and Antalya (Turkey) stations were examined. Time series of monthly PE (1992-2016 for Tabriz and 1986-2006 for Antalya) were prepared; 70% of the data was used for training and the remaining 30% for testing. Seven scenarios with different meteorological parameters were defined to analyze the effectiveness of each parameter in increasing the prediction accuracy. The performances of the studied methods, ANN, WANN, SVR and WSVR, each with seven scenarios, were comprehensively examined using four statistical parameters, namely RMSE, MAE, R and NSE. Additionally, Taylor diagrams were used for a joint assessment of model accuracy using R and RMSE values. The results showed that at Tabriz station, ANN3, SVR3 and ANN5 gave better predictions of PE than the other methods, while ANN5, WANN5 and ANN6 proved to be the best models for Antalya station. Moreover, the results showed that PE depends more strongly on temperature and radiation than on the other parameters. In conclusion, ANN can provide reasonable predictions of PE at both Tabriz and Antalya stations, whereas WT did not have a remarkable influence on reducing the prediction errors. Finally, to extend the current research, the mentioned models could be tested in other locations with different climates in future work.

Disclosure statement
No potential conflict of interest was reported by the authors.