Daily global solar radiation modeling using data-driven techniques and empirical equations in a semi-arid climate

Solar radiation, moisture and temperature are the most vital meteorological variables affecting plant growth. Because global solar radiation (GSR) is seldom measured at meteorological stations in developing countries, it is commonly estimated by data-driven techniques or by empirical equations. In this study, support vector regression (SVR), model trees (MT), gene expression programming (GEP), the adaptive neuro-fuzzy inference system (ANFIS) and several empirical equations were applied to assess the relations between GSR and several meteorological variables: minimum temperature (T min ), maximum temperature (T max ), relative humidity (RH), sunshine hours (n), maximum sunshine hours (N), corrected clear-sky solar irradiation (ICSKY), day of year (DOY) and extra-terrestrial radiation (R a ). For this purpose, daily GSR measured from the beginning of 2011 to the end of 2013 at Tabriz synoptic station, located in a semi-arid region of Iran, was used. A strong direct relationship was observed between GSR and n. The performance of the studied techniques was evaluated with four statistical indicators, namely root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC) and Willmott's index of agreement (WI). Additionally, a Taylor diagram was used to test the similarity between the observed and predicted GSR values. The results indicated that SVR-6, with input parameters R a , RH, T min , T max and n/N, predicted GSR more accurately than the other models, with RMSE of 1.656, MAE of 0.990, CC of 0.980 and WI of 0.990. Moreover, MT-6 ranked as the second-best model in the prediction of GSR values. Notably, the studied empirical equations were less accurate than the best SVR, MT and GEP models. For instance, the Angstrom-Prescott equation, the best empirical equation, gave RMSE of 1.786, MAE of 1.156, CC of 0.977 and WI of 0.988. In conclusion, the results of the current study showed that SVR provided reasonable GSR estimates at Tabriz synoptic station. Furthermore, MT models, which consist of linear equations, can be implemented simply and with acceptable precision in GSR estimation.


Introduction
Energy plays a vital role in modern societies, accelerates economic development and has been regarded as one of the critical global issues of the last few decades (Gairaa, Khellaf, Messlem, & Chellali, 2016). Furthermore, given the sharp decline in world oil reserves and the heavy pollution that oil causes, many believe that solar energy is one of the best substitutes for fossil fuels because of unique characteristics such as worldwide accessibility and environmental friendliness (Shaddel, Javan, & Baghernia, 2016). Solar energy is mainly utilized in the design of solar systems (Qazi et al., 2015), radiant floor cooling systems (Feng, Schiavon, & Bauman, 2016), environmental and agricultural studies (Kaufmann & Hagermann, 2015; Lamnatou & Chemisana, 2013) and managing the effects of global warming (Ming, De_Richter, Liu, & Caillol, 2014). However, despite the broad range of applications of solar energy, direct measurements of solar radiation are not available in most countries, especially developing ones. In some regions, solar radiation sensors have not been installed at the meteorological stations, and even at stations with such sensors, the measured data can be missing or inaccurate because of technical problems. The common practice in these situations is therefore to use mathematical, empirical or, more recently, data-driven techniques, established on the basis of measured meteorological parameters, to obtain precise estimates of actual solar radiation (Ozgoren, Bilgili, & Sahin, 2012; Sun, Zhao, Zeng, & Yan, 2015).
CONTACT Shahaboddin Shamshirband shahaboddin.shamshirband@tdtu.edu.vn
Data-driven techniques have numerous applications in hydrological engineering and contemporary real-life problems (Chau, 2017; Fotovatikhah et al., 2018; Jeihouni, Delirhasannia, Alavipanah, Shahabi, & Samadianfard, 2015; Moazenzadeh, Mohammadi, Shamshirband, & Chau, 2018; Samadianfard, Delirhasannia, Kisi, & Agirre-Basurko, 2013; Samadianfard, Nazemi, & Sadraddini, 2014; Samadianfard, Sattari, Kisi, & Kazemi, 2014; Taormina, Chau, & Sivakumar, 2015; Wu & Chau, 2011). Researchers have therefore attempted to develop accurate predictive models of solar radiation at various time scales, which is a crucial issue in solar energy science. Data-driven techniques have been widely used for predicting solar radiation over the last decade. Their key merit is that, unlike empirical equations, they do not require a prior model of the relationships between input and output data. Tymvios, Jacovides, Michaelides, and Scouteli (2005) compared the capabilities of artificial neural network (ANN) models with the Angstrom-Prescott linear equation for estimating daily global solar radiation (GSR) in Athalassa, Cyprus. The results showed that the ANN model using the climatological parameters n, N and T max provided more precise estimates of GSR than the studied empirical equations. Benghanem, Mellit, and Alamri (2009) developed different ANN models, using sunshine hours (n), temperature (T), relative humidity (RH) and day of year (DOY), for predicting GSR in Madinah, Saudi Arabia. They reported that the ANN model with input parameters n and T produced accurate predictions, with a correlation coefficient (CC) of 0.976. Koca, Oztop, Varol, and Koca (2011) tested the effects of different combinations of meteorological parameters on the accuracy of GSR estimates from ANN models in seven cities of Turkey. The results showed that the predictions for Antakya were more precise than those for the other studied cities. 
Additionally, they stated that the coordinates of the cities and their sunshine hours had undeniable effects on the accuracy of the GSR predictions. Wu and Liu (2012) estimated GSR values from temperature parameters by developing support vector machine (SVM) based models for 24 stations in China. The SVM model with the input combination of T and T max - T min gave lower errors. Mostafavi, Saeidi Ramiyani, Sarvar, Izadi Moud, and Mousavi (2013) proposed a new hybrid approach combining genetic programming and simulated annealing for estimating GSR values. Through sensitivity analysis, they found that T min and T max had vital effects on the accurate estimation of solar radiation. In other research, Ramedani, Omid, Keyhani, Khoshnevisan, and Saboohi (2014) inspected the capabilities of several kernel functions of support vector regression (SVR) for estimating GSR values in Tehran. They stated that SVR with a radial basis function gave better predictions than the adaptive neuro-fuzzy inference system (ANFIS) and ANN models. Kisi (2014) examined the capabilities of a fuzzy genetic (FG) approach for GSR modeling in seven cities of Turkey. The coordinates of the studied locations and the month of the year were used as input parameters of FG, and the results were compared with those of the ANN and ANFIS methods. He reported that the accuracy of the FG outputs was superior to that of the corresponding ANN and ANFIS predictions. In another study, SVR models with radial basis and polynomial kernel functions were used by Piri, Shamshirband, Petkovic, Tong, and Rehman (2015) for estimating GSR from the climatological parameters n, minimum temperature (T min ), maximum temperature (T max ) and RH at two stations, Zahedan and Bojnurd, Iran. According to the statistical analysis, the SVR estimates were more precise than the corresponding empirical results. 
Mohammadi, Shamshirband, Anisi, Alam, and Petkovic (2015) used SVR to estimate horizontal GSR with n and maximum sunshine hours (N) as inputs and compared the results with the corresponding estimates of empirical equations. The outcomes demonstrated the superior capabilities of SVR compared with the empirical equations. Citakoglu (2015) applied ANN and ANFIS to estimate GSR from meteorological parameters at various stations in Turkey. The results showed that ANN was more accurate than ANFIS and the empirical equations. Mousavi, Mostafavi, Jaafari, Jaafari, and Hosseinpour (2015) developed genetic programming (GP)-based approaches for estimating daily GSR from meteorological parameters. They stated that the presented models outperformed regression-based models. Mehdizadeh, Behmanesh, and Khalili (2016) examined the capabilities of gene expression programming (GEP), ANN and ANFIS, in addition to various empirical equations, for estimating daily GSR in Kerman City, Iran, over the period 1992-2009. The findings indicated that the ANN and ANFIS models with sunshine-based parameters outperformed the other studied methods. Sharifi, Rezaverdinejad, and Nourani (2016) compared ANN, GEP, wavelet regression (WR) and empirical equations for estimating daily GSR. The results demonstrated the high-ranking capabilities of ANN models in GSR estimation. Vakili, Sabbagh-Yazdi, Khosrojerdi, and Kalhor (2017) presented an ANN-based model for estimating GSR from particulate matter, wind speed, RH and T in Tehran, Iran. They reported that adding particulate matter to the climatological parameters increased the accuracy of GSR predictions. 
Rao, Premalatha, and Naveen (2018) examined the effects of different combinations of meteorological parameters on GSR estimation by ANN models. The results showed that the ANN model with theoretical sunshine hours and extraterrestrial radiation as inputs was the most accurate in GSR estimation. Kaba, Sarıgül, Avcı, and Kandırmaz (2018) implemented a deep learning approach for GSR estimation at 34 stations in Turkey. They concluded that the considered model had suitable accuracy in comparison with many previous studies of GSR estimation. Benali, Notton, Fouilloy, Voyant, and Dizene (2019) predicted hourly GSR values at the site of Odeillo, France, using three soft computing techniques: ANN, smart persistence and random forest. They stated that the random forest models were considerably more precise than the other studied models in GSR prediction.
Even though GSR is one of the most frequently measured parameters worldwide, stations that measure it are still scarce, mainly in developing countries. So, despite the large number of studies aimed at estimating GSR at different stations worldwide, flexible and accurate techniques for valid predictions are still needed because of the importance of accurate GSR data. Furthermore, given the limited research on applications of model tree (MT) and SVR in GSR estimation and the significance of different combinations of climatological parameters for estimation accuracy, the main goal of the current study is to assess the capabilities of the MT and SVR methodologies and to compare them with ANFIS, GEP and empirical equations in GSR estimation. For a comprehensive investigation of these models, the comparisons are based on widely used statistical measures. Additionally, a Taylor diagram was used to select the most accurate model for GSR estimation.

Study area and data collection
The daily climatic data of an automated weather station (latitude 38°05′ N, longitude 46°17′ E) located in Tabriz, Iran, were used in the current study (Figure 1). Tabriz, which is situated in northwestern Iran, is a mountainous basin with a semi-arid climate and cold winters. The meteorological variables (Table 1) used for this study are minimum temperature (T min ), maximum temperature (T max ), relative humidity (RH), sunshine hours (n), maximum sunshine hours (N), corrected clear-sky solar irradiation (ICSKY), day of year (DOY), extra-terrestrial radiation (R a ) and global solar radiation (GSR), covering the period 2011-2013. Table 2 presents the daily statistical parameters of the applied meteorological variables for Tabriz station in both the training and test phases. Most of the variables are approximately normally distributed, as indicated by their low skewness values, except n/N and GSR, which show negatively and positively skewed distributions, respectively. R a shows the highest correlation with GSR in both the training and test data, whereas DOY has the lowest correlation with GSR among the meteorological parameters. Overall, R a , T max and T min have the first, second and third highest correlations with GSR values, respectively. Moreover, there is a high inverse correlation between RH and GSR. The minimum GSR value of the training data (1.97) is higher than the corresponding value of the test data (1.09), which may cause extrapolation difficulties for the applied data-driven models when estimating low GSR values in the test period. The observed meteorological data are shown in Figure 2.

Empirical equations
Empirical equations, which are based on meteorological parameters, are useful for estimating solar radiation. Angstrom (1924) and Prescott (1940) proposed the Angstrom-Prescott method, which predicts GSR from n/N through a linear relationship. Swartman and Ogunlade (1967) added the RH parameter to improve the accuracy of GSR estimation, also using a linear regression-based equation. Bristow and Campbell (1984) and Allen (1997) estimated GSR from minimum and maximum temperatures using nonlinear equations. Elagib and Mansell (2000) used an exponential function of n/N, and Chen, Ersi, Yang, Lu, and Zhao (2004) presented accurate predictions of GSR values by adding the difference between maximum and minimum temperature. The empirical equations selected for comparison with the data-driven techniques are listed in Table 3.
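The linear Angstrom-Prescott relationship can be illustrated with a minimal sketch. It assumes the commonly cited form GSR = R a (a + b n/N) and fits a and b by ordinary least squares; the function names and the synthetic values (generated from a = 0.25, b = 0.50) are illustrative assumptions and are not the coefficients calibrated in this study.

```python
def fit_angstrom_prescott(gsr, ra, sunshine_ratio):
    """Least-squares fit of GSR/Ra = a + b * (n/N); returns (a, b)."""
    y = [g / r for g, r in zip(gsr, ra)]   # clearness index GSR/Ra
    x = list(sunshine_ratio)               # relative sunshine duration n/N
    m = len(x)
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def angstrom_prescott(ra, sunshine_ratio, a, b):
    """Estimate GSR from Ra and n/N with fitted coefficients a and b."""
    return ra * (a + b * sunshine_ratio)


# Synthetic, illustrative data only (not measurements from Tabriz).
ra = [20.0, 30.0, 40.0, 35.0]
ratio = [0.2, 0.5, 0.9, 0.7]
gsr = [r * (0.25 + 0.50 * s) for r, s in zip(ra, ratio)]

a, b = fit_angstrom_prescott(gsr, ra, ratio)
estimate = angstrom_prescott(40.0, 0.9, a, b)
```

Because the regression is performed on GSR/R a against n/N, the fit stays linear even though GSR itself varies with season through R a .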

Support vector regression
The support vector machine (SVM) is a widely used estimator that was developed by Vapnik (1995) and is based on the concepts of supervised learning. Later, a regression-based technique (SVR) was presented, using the theory of SVM and structural risk minimization to solve complicated problems. In this method, a kernel function is employed to convert a nonlinear problem into a linear one. An ε-insensitive loss function is adopted, which means that the model tolerates errors of up to ε in the training data sets. SVR therefore seeks a linear function in which F and L represent the coefficients of the weight vector of the linear expression. In the associated regression problem, C is a predefined constant that controls the trade-off with the empirical error. Additionally, kernel functions are one of the vital factors that allow SVR to solve complex problems accurately (Smola & Schölkopf, 2004). In the current study, four different kernel functions, namely the polynomial, normalized polynomial, Pearson VII function-based and radial basis function kernels, were used. Figure 3 shows the schematic configuration of the SVR model.
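The ε-insensitive loss and two of the kernels named above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the function names and the parameters gamma, sigma and omega are hypothetical, and the Pearson VII kernel is written in its commonly stated form.

```python
import math


def eps_insensitive_loss(observed, predicted, eps):
    """Errors smaller than eps are ignored; larger ones grow linearly."""
    return max(0.0, abs(observed - predicted) - eps)


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def pearson_vii_kernel(x, y, sigma, omega):
    """Pearson VII function-based kernel (commonly stated form, assumed here).

    sigma controls the half-width and omega the tailing of the peak.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    base = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + base ** 2) ** omega


inside_tube = eps_insensitive_loss(10.0, 10.3, eps=0.5)    # within tolerance
outside_tube = eps_insensitive_loss(10.0, 11.0, eps=0.5)   # penalized part only
half_height = pearson_vii_kernel([0.0], [0.5], sigma=1.0, omega=1.0)
```

In this form the Pearson VII kernel equals 1 for identical inputs and falls to one half at a distance of sigma/2, which is why sigma is often read as a width parameter.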

Gene expression programming
Gene expression programming (GEP) evolves computer programs for solving complicated problems (Ferreira, 2001a, 2001b). The main distinction between GEP and its predecessors, GP and the genetic algorithm (GA), lies in the nature of the individuals (Mesbah, Soroush, & Rostampour Kakroudi, 2017). The GEP algorithm begins by generating an initial population through the random formation of fixed-length chromosomes. Then, every chromosome is evaluated by a fitness function in order to guide the reproduction of the next generation. Various fitness functions can be considered for a GEP model. Because the root relative squared error (RRSE) fitness function has been applied frequently in prior studies (e.g. Emamgolizadeh, Bateni, Shahsavani, Ashrafi, & Ghorbani, 2015), it was preferred herein. This process continues for a predefined number of generations or until an appropriate solution is found (Ferreira, 2006). A simple schematic of the GEP algorithm is shown in Figure 4.
Note: The terms X mean , X min , X max , S x , C v and C sx denote the mean, minimum, maximum, standard deviation, coefficient of variation and skewness, respectively.
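The RRSE criterion mentioned above can be sketched as follows, assuming its usual definition: the squared error of a candidate model relative to the squared error of a model that always predicts the observed mean. The function name and sample values are illustrative assumptions.

```python
import math


def rrse(observed, predicted):
    """Root relative squared error: 0 for a perfect model, 1 for the mean predictor."""
    mean_obs = sum(observed) / len(observed)
    model_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    baseline_error = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(model_error / baseline_error)


# Illustrative values only.
obs = [5.0, 10.0, 15.0, 20.0]
perfect = rrse(obs, obs)
mean_only = rrse(obs, [12.5] * len(obs))
```

Chromosomes with RRSE below 1 therefore do better than simply predicting the mean, which gives the evolutionary search a natural reference level.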

Model trees
Recently, there has been an increasing trend toward using decision trees to model complex nonlinear problems. The M5 model tree (MT), one of the commonly used conventional decision tree algorithms, was introduced by Quinlan (1992). It is based on a divide-and-conquer methodology and can be applied to the prediction of several variables (Singh, Sachdeva, & Pal, 2016).
MT handles a large number of attributes efficiently and is a strong technique for predictive purposes (Solomatine & Dulal, 2003). Moreover, MT is a mathematical predictive algorithm in which the nodes of the tree are chosen over the attributes that minimize the expected error, expressed as a function of the standard deviation of the target parameter (Zhang & Tsai, 2007). MT implements three steps: breaking up the input space, establishing the tree and finally extracting knowledge from it (Behnood, Olek, & Glinicki, 2015). This process is presented schematically in Figure 5. First, the input data are separated into different regions, as shown in Figure 5(a). The separation is based on linear regression and on reducing the standard deviation of the errors between measured and predicted values. Then, using the split space from the first step, a decision tree, with the root at the top and leaves at the bottom, is generated (typically presented in Figure 5(b)). The last step consists of predicting the target value using the obtained linear regression equations. At this step, the input data enter at the root and pass down through the nodes according to the splitting parameters (X i ) (see Figure 5(c)). The splitting criterion, namely the standard deviation reduction (SDR), can be computed as follows:

$$\mathrm{SDR} = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i)$$

where T denotes the set of examples that reaches the node, T i indicates the subset of examples produced by the split and sd represents the standard deviation. As a result of the splitting procedure, the data in the child nodes have a lower standard deviation than those in the parent node and are therefore purer. After investigating all potential splits, MT selects the one that maximizes SDR. This procedure usually creates a large tree that may cause overfitting. To solve this issue, the tree must be pruned back, for instance by replacing a subtree with a leaf. Hence, the next stage consists of pruning the overgrown tree and substituting the subtrees with linear regression functions. For additional detailed information, readers are referred to Quinlan (1992).
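The split selection described above can be sketched for a single attribute as follows. The sketch assumes the population standard deviation and candidate thresholds at midpoints between sorted attribute values; the function names and the toy data are illustrative assumptions, and pruning and leaf regression are omitted.

```python
import math


def sd(values):
    """Population standard deviation of a list of target values."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sdr(parent, subsets):
    """Standard deviation reduction of splitting parent into subsets."""
    total = len(parent)
    return sd(parent) - sum(len(s) / total * sd(s) for s in subsets)


def best_split(x, y):
    """Return (threshold, SDR) of the split on x that maximizes SDR."""
    pairs = sorted(zip(x, y))
    targets = [t for _, t in pairs]
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        gain = sdr(targets, [targets[:i], targets[i:]])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: low targets for low attribute values, high targets otherwise.
ratio = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
target = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
threshold, gain = best_split(ratio, target)
```

On this toy set the chosen split separates the two groups completely, so both child nodes have zero standard deviation and the reduction equals the standard deviation of the parent node.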

Adaptive neuro-fuzzy inference system
ANFIS, one of the artificial intelligence methods, is based on the Takagi-Sugeno fuzzy inference system (Saini & Kumar, 2016). This method has been widely used for solving complicated engineering problems (for instance, Olatomiwa, Mekhilef, Shamshirband, & Petković, 2015; Piri & Kisi, 2015). ANFIS performs fuzzy logic operations based on the theory of membership functions and uses a set of if-then rules. ANFIS consists of three key parts: a fuzzy rule base, data and a suitable inference procedure. The membership functions are defined by the data and the inference process is performed by the procedure (Akbarpour, Mohajeri, & Akbarpour, 2016). Moreover, ANFIS employs a first-order Sugeno-type fuzzy inference system, which for two inputs and one output is defined as follows:
Rule 1: if x is A 1 and y is B 1 , then f 1 = p 1 x + q 1 y + r 1
Rule 2: if x is A 2 and y is B 2 , then f 2 = p 2 x + q 2 y + r 2
where x and y are the inputs, A 1 , A 2 , B 1 and B 2 are the fuzzy sets described by the nonlinear parameters, p 1 , q 1 , r 1 , p 2 , q 2 and r 2 are the coefficients of the linear functions and f(x,y) is a first-order polynomial (Ertunc, Ocak, & Aliustaoglu, 2013). A common ANFIS model consists of five layers: fuzzification, multiplication, normalization, defuzzification and summation. A typical diagram of ANFIS is presented in Figure 6. For additional information about these layers, readers are referred to Akbarpour et al. (2016).
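The five layers can be traced in a minimal forward-pass sketch of the two-rule system above. Gaussian membership functions are an assumption made here for illustration (the membership function type is not stated in the text), and all names and parameter values are hypothetical; parameter training is omitted.

```python
import math


def gaussian_mf(value, center, width):
    """Gaussian membership degree (assumed membership function shape)."""
    return math.exp(-0.5 * ((value - center) / width) ** 2)


def sugeno_two_rule(x, y, mf_a, mf_b, consequents):
    """Forward pass of a two-rule, first-order Sugeno system.

    mf_a, mf_b: [(center, width), (center, width)] for A1, A2 and B1, B2.
    consequents: [(p1, q1, r1), (p2, q2, r2)].
    """
    # Layer 1, fuzzification: membership degrees of x and y
    mu_a = [gaussian_mf(x, c, w) for c, w in mf_a]
    mu_b = [gaussian_mf(y, c, w) for c, w in mf_b]
    # Layer 2, multiplication: firing strength of each rule
    strengths = [mu_a[i] * mu_b[i] for i in range(2)]
    # Layer 3, normalization: relative firing strengths
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    # Layer 4, defuzzification: linear rule outputs f_i = p_i x + q_i y + r_i
    outputs = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5, summation: weighted sum of the rule outputs
    return sum(w * f for w, f in zip(normalized, outputs))


# Hypothetical parameters; the input lies midway between the two rule centers.
result = sugeno_two_rule(
    1.0, 1.0,
    mf_a=[(0.0, 1.0), (2.0, 1.0)],
    mf_b=[(0.0, 1.0), (2.0, 1.0)],
    consequents=[(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)],
)
```

With the input midway between the rule centers, both rules fire equally and the output is the plain average of f 1 = 2 and f 2 = 3.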

Evaluation parameters
For a comprehensive assessment of the performance of the studied data-driven techniques and empirical equations, graphical plots and the following statistical parameters were used:
I: Correlation coefficient (CC) (Samadianfard et al., 2018)
II: Root mean square error (RMSE) (Willmott & Matsuura, 2005)
III: Mean absolute error (MAE) (Chai & Draxler, 2014)
IV: Willmott's index of agreement (WI) (Willmott, Robeson, & Matsuura, 2012)
V: Taylor diagrams (Taylor, 2001)
In these indicators, O i and P i are the observed and predicted i th values of the GSR. The Taylor diagram has also been used in hydrological modeling to assess the precision of predicted data. It is a single diagram that simultaneously summarizes statistical parameters such as the standard deviation, RMSE and CC. Taylor diagrams highlight the accuracy of different models in comparison with observations as a series of points on a polar plot (IPCC, 2007; Taylor, 2001).
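The indicators listed above can be sketched in a few lines. CC, RMSE and MAE follow their standard definitions; for WI the original Willmott form, 1 - Σ(P i - O i )² / Σ(|P i - Ō| + |O i - Ō|)², is assumed here, which may differ from the variant used in the study. The last lines check the geometric relation that underlies the Taylor diagram (Taylor, 2001), in which the centered RMSE, the two standard deviations and CC are linked by the law of cosines. Function names and sample values are illustrative.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, pred):
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    return _mean([abs(p - o) for o, p in zip(obs, pred)])


def cc(obs, pred):
    """Pearson correlation coefficient."""
    o_bar, p_bar = _mean(obs), _mean(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def willmott_index(obs, pred):
    """Index of agreement, original Willmott form (an assumption here)."""
    o_bar = _mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def std(values):
    m = _mean(values)
    return math.sqrt(_mean([(v - m) ** 2 for v in values]))


def centered_rmse(obs, pred):
    """RMSE after removing the mean of each series (Taylor diagram distance)."""
    o_bar, p_bar = _mean(obs), _mean(pred)
    return math.sqrt(_mean([((p - p_bar) - (o - o_bar)) ** 2
                            for o, p in zip(obs, pred)]))


# Illustrative values only (not data from this study).
observed = [5.0, 10.0, 15.0, 20.0, 25.0]
predicted = [6.0, 9.0, 16.0, 19.0, 27.0]

# Taylor (2001): E'^2 = sd_o^2 + sd_p^2 - 2 * sd_o * sd_p * CC
lhs = centered_rmse(observed, predicted) ** 2
rhs = (std(observed) ** 2 + std(predicted) ** 2
       - 2.0 * std(observed) * std(predicted) * cc(observed, predicted))
```

This relation is what allows the standard deviation, CC and centered RMSE of each model to be read from a single point on the diagram.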

Results and discussion
The capabilities of four data-driven techniques, namely SVR, MT, GEP and ANFIS, in predicting daily GSR at Tabriz station, Iran, were compared with the performances of six empirical equations, namely A-P, E-M, A, B-C, S-O and C. In the present study, about 70% of the whole dataset was used to train the models and the remaining 30% was used to test them. In other words, daily data from the beginning of 2011 to the end of 2012 were selected for training and the measured data of 2013 were used for testing.
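The chronological split described above can be sketched with standard date arithmetic. The sketch assumes a complete daily record with no missing days, which is an assumption rather than a statement about the Tabriz data; under it, the calendar split gives about two-thirds of the days for training, so the stated 70%/30% shares read as approximate.

```python
from datetime import date, timedelta

# Assumption: one record per calendar day from 2011-01-01 to 2013-12-31.
start, end = date(2011, 1, 1), date(2013, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Chronological split: 2011-2012 for training, 2013 for testing.
train_days = [d for d in days if d.year <= 2012]
test_days = [d for d in days if d.year == 2013]

train_share = len(train_days) / len(days)
```

Splitting by year rather than at random keeps the test period strictly after the training period, which avoids leaking information from neighboring days into the evaluation.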
Based on the meteorological parameters defined in Table 1, nine input combinations were considered in the current computations (Table 4). It is clear from Table 2 that R a and the temperature values have the strongest direct correlations with GSR, while RH has the strongest inverse correlation with GSR. Hence, R a , T min , T max and RH were included in most input combinations to increase the precision of the models. As shown in Table 4, the nine input combinations tried in the study are (1) R a , (2) R a , n/N, (3) R a , RH, (4) R a , T min , T max , (5) R a , RH, T min , T max , (6) R a , RH, T min , T max , n/N, (7) R a , RH, T min , T max , n/N, DOY, (8) R a , RH, T min , T max , n/N, ICSKY and (9) R a , RH, T min , T max , n/N, DOY, ICSKY.
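For implementation purposes, the nine input combinations can be written as a simple lookup table. The key names and the helper function below are illustrative assumptions; the sample record contains made-up values.

```python
# Input combinations 1-9 as listed in the text (variable names are assumed labels).
INPUT_COMBINATIONS = {
    1: ("Ra",),
    2: ("Ra", "n/N"),
    3: ("Ra", "RH"),
    4: ("Ra", "Tmin", "Tmax"),
    5: ("Ra", "RH", "Tmin", "Tmax"),
    6: ("Ra", "RH", "Tmin", "Tmax", "n/N"),
    7: ("Ra", "RH", "Tmin", "Tmax", "n/N", "DOY"),
    8: ("Ra", "RH", "Tmin", "Tmax", "n/N", "ICSKY"),
    9: ("Ra", "RH", "Tmin", "Tmax", "n/N", "DOY", "ICSKY"),
}


def select_inputs(record, combination_id):
    """Return the feature vector of one daily record for a given combination."""
    return [record[name] for name in INPUT_COMBINATIONS[combination_id]]


# Hypothetical daily record, for illustration only.
sample = {"Ra": 30.0, "RH": 45.0, "Tmin": 8.0, "Tmax": 21.0,
          "n/N": 0.8, "DOY": 120, "ICSKY": 25.0}
features = select_inputs(sample, 2)
```

A model labeled SVR-6 or MT-6 in the results then corresponds to the technique trained on the features of combination 6.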
The statistical indicators (CC, RMSE, MAE and WI) for predicting daily GSR at Tabriz station with the studied models and the considered empirical equations are presented in Tables 5 and 6. Overall, the precision of the predictions increased from input combination 1 to input combination 6 for all data-driven techniques. As mentioned earlier, four kernel functions, namely the polynomial, normalized polynomial, Pearson VII function-based and radial basis function kernels, were examined for all SVR models. It was found that, except for SVR-1, for which the polynomial kernel produced the most accurate results, the Pearson VII function-based kernel gave more precise predictions than the other kernel functions for all SVR models. Among the SVR models, SVR-6, which used the input combination of R a , RH, T min , T max and n/N, performed best, with RMSE of 1.656, MAE of 0.990, CC of 0.980 and WI of 0.990. SVR-7 ranked as the second-best SVR model, which indicates that adding DOY did not improve the prediction accuracy. Approximately the same trend was seen among the MT models. MT-6, which used the input combination of R a , RH, T min , T max and n/N, had the highest accuracy among the MT models, with RMSE of 1.672, MAE of 1.008, CC of 0.979 and WI of 0.989. However, the RMSE and MAE of MT-6 were 0.97% and 1.82% higher, respectively, than those of the best SVR model (SVR-6). Unlike the SVR and MT models, input combination 7 gave the most accurate predictions among the GEP models: GEP-7, with the input combination of R a , RH, T min , T max , n/N and DOY, gave RMSE of 1.681, MAE of 1.064, CC of 0.979 and WI of 0.989. Nevertheless, the RMSE of GEP-7 was 1.51% and 0.54% higher than those of SVR-6 and MT-6, and its MAE was 7.47% and 5.56% higher, respectively. Finally, the results of the ANFIS models showed that, similar to the SVR and MT models, ANFIS-6, which uses the input combination of R a , RH, T min , T max and n/N, provided the best prediction among the ANFIS models, with RMSE of 1.937, MAE of 1.171, CC of 0.972 and WI of 0.985. However, the precision of all ANFIS models is lower than that of the SVR, MT and GEP models.
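The percentage differences quoted above follow directly from the reported test errors. As a short worked check, the sketch below computes how much higher each error is relative to a reference model; the dictionary and function names are illustrative, while the RMSE and MAE values are those reported in the text.

```python
# (RMSE, MAE) of the best model of each data-driven technique, as reported.
TEST_ERRORS = {
    "SVR-6": (1.656, 0.990),
    "MT-6": (1.672, 1.008),
    "GEP-7": (1.681, 1.064),
    "ANFIS-6": (1.937, 1.171),
}


def percent_higher(value, reference):
    """Relative increase of an error measure over a reference, in percent."""
    return (value - reference) / reference * 100.0


def compare(model, reference):
    """Return (RMSE % higher, MAE % higher) of model relative to reference."""
    rmse_m, mae_m = TEST_ERRORS[model]
    rmse_r, mae_r = TEST_ERRORS[reference]
    return (round(percent_higher(rmse_m, rmse_r), 2),
            round(percent_higher(mae_m, mae_r), 2))


mt_vs_svr = compare("MT-6", "SVR-6")
gep_vs_svr = compare("GEP-7", "SVR-6")
gep_vs_mt = compare("GEP-7", "MT-6")
anfis_vs_svr = compare("ANFIS-6", "SVR-6")
```

The differences among SVR-6, MT-6 and GEP-7 are small in RMSE terms, whereas ANFIS-6 is clearly separated from the other three.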
Moreover, the statistical parameters of the empirical equations in Table 6 show that the commonly used A-P model, which employs R a and n/N, gives more accurate predictions than the other empirical equations, with RMSE of 1.786, MAE of 1.156, CC of 0.977 and WI of 0.988. Although A-P was the best empirical equation in the current study, its RMSE and MAE are 7.85% and 16.77% higher, respectively, than those of the best data-driven model (SVR-6). E-M, which has the same input parameters as A-P, was the second-best empirical equation, with RMSE of 1.869, MAE of 1.254, CC of 0.975 and WI of 0.986. The error measures of A-P and E-M therefore indicate that the linear relationship between GSR and the input parameters R a and n/N (as in the A-P equation) provides more accurate predictions than the exponential relationship (as in the E-M equation). Figure 7 illustrates the observed and predicted values of GSR in the test period for the best model of each data-driven technique (SVR-6, MT-6, GEP-7 and ANFIS-6) and the two best empirical equations (A-P and E-M). The predictions of SVR-6 agree better with the observed GSR values than those of the other models. Furthermore, Figure 8 presents the observed and predicted GSR for the selected best models in the test period in the form of scatterplots. It can be seen from Figure 8 that the estimates of SVR-6 and MT-6 are less scattered around the 1:1 line; in other words, they lie closer to the 1:1 line than those of the other studied models.
As mentioned in the literature review, Ramedani et al. (2014) concluded that SVR with a radial basis function, with RMSE of 3.3, predicted GSR with acceptable accuracy at Tehran station, Iran. Comparing the results of the current study with the findings of Ramedani et al. (2014) indicates that SVR-6 achieved a considerably lower RMSE than the SVR model implemented by Ramedani et al. (2014).
The observed values and the values predicted by the SVR-6, MT-6, GEP-7, ANFIS-6, A-P and E-M models were investigated further. Figure 9 illustrates the probability distributions of the testing data. Such plots are useful for understanding the capabilities of the studied models in GSR estimation. It is apparent from Figure 9 that the probability distribution of the GSR predicted by SVR-6 is closer to the observed one for most intervals in the test period.
For a thorough examination of the results obtained with the best model of each data-driven technique and the two best empirical equations (SVR-6, MT-6, GEP-7, ANFIS-6, A-P and E-M), a Taylor diagram was used to evaluate the statistical parameters between the observed GSR and the GSR predicted by these models. Figure 10 presents this diagram for the best models, in which the distance from the reference point is a measure of the centered RMSE (Taylor, 2001). Therefore, the best model is identified by the point with the higher CC and the lower RMSE (Heo et al., 2014). As is clear from Figure 10, SVR-6 generated accurate results that were much closer to the observations than those of the other studied models. MT-6 and GEP-7 are also better than the empirical models according to the Taylor diagram.
The results of the Taylor diagram confirmed that SVR-6, which used the input combination of R a , RH, T min , T max and n/N, performed best among the studied models, with RMSE of 1.656, MAE of 0.990, CC of 0.980 and WI of 0.990.
Moreover, one of the key features of GEP is its ability to provide a mathematical expression relating the input and output parameters. In other words, a GEP model is capable of finding an explicit mathematical function that relates the inputs to the output. The resulting equation of GEP-7, the best GEP model for predicting daily GSR, is as follows:
GSR = −1.00888 − 0.763834 × exp(−n/N) × (R a − 1.00888) + R a − 1.408 + RH − R a RH + arcsin(n/N) + tan(cos(sin(0.123332 × T max ))) (8)
It should be noted that Equation (8) is dimensionless; the parameters, however, are used in the units given in Table 1. Not all parameters of input combination 7 appear in the final equation of GEP-7 (Equation (8)): T min and DOY are absent. This can be the result of the automatic procedure by which GEP selects the more effective input parameters for modeling GSR values. Also, the tree structure obtained by MT-6, an accurate MT model, is illustrated in Figure 11. The simplest trees that still gave acceptable and accurate results were selected as the best MT models. The pictured tree is based only on the values of R a and n/N. The MT produced four rules, which are shown in Figure 11. These simple linear equations can be implemented straightforwardly to predict daily GSR at Tabriz station.

Conclusion
As mentioned before, direct measurement of solar radiation is considered a difficult task, especially in developing countries. In the current research, therefore, the performances of four data-driven techniques (SVR, MT, GEP and ANFIS) and six empirical equations in predicting GSR at Tabriz station, Iran, were examined. For that purpose, time series of daily GSR and several meteorological parameters (R a , RH, T min , T max , n/N, DOY and ICSKY) for the period 2011-2013 were gathered; about 70% of the data were selected for training and 30% for testing. Nine combinations of input parameters were then considered for training the data-driven techniques and finding the combination that gives the most precise predictions of GSR. The capabilities of the studied models and empirical equations were inspected using four statistical parameters, namely RMSE, MAE, CC and WI, and a Taylor diagram was used for a full assessment of the studied models. The results revealed that SVR-6, which uses the input combination of R a , RH, T min , T max and n/N, performed best among the data-driven techniques and empirical equations, with RMSE of 1.656, MAE of 0.990, CC of 0.980 and WI of 0.990. The Taylor diagram confirmed the effectiveness of SVR-6 in GSR modeling. Additionally, it was found that only the A-P and E-M empirical equations had acceptable accuracy; the other empirical equations did not provide accurate predictions of GSR. Furthermore, MT-6, with four simple and explicit linear equations, can also be applied for precise predictions of GSR. In conclusion, the SVR and MT models proved to be accurate in predicting GSR at Tabriz station and could be used in practice for solar radiation estimation. Future work can focus on examining the applicability of other data-driven techniques for increasing the accuracy of solar radiation prediction.