Novel Genetic Algorithm (GA) based hybrid machine learning-pedotransfer Function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity

ABSTRACT Saturated hydraulic conductivity (Ks) is an important soil characteristic that controls how water moves through the soil. However, its measurement is difficult, time-consuming, and expensive; hence, Pedotransfer Functions (PTFs) are commonly used for its estimation. Despite significant development over the years, PTFs have shown poor performance in predicting Ks. Using the Genetic Algorithm (GA), two hybrid Machine Learning based PTFs (ML-PTFs), i.e. combinations of GA with Multilayer Perceptron (MLP-GA) and Support Vector Machine (SVM-GA), were proposed in this study. We compared the performances of four machine learning algorithms for different sets of predictors. The predictor combination containing sand, clay, Field Capacity, and Wilting Point (K-5) showed the highest accuracy for all the ML-PTFs. Among the ML-PTFs, the SVM-GA algorithm outperformed the rest, including the MLP-GA algorithm. The SVM-GA PTF paired with the K-5 predictor variables was therefore selected as the reference model for hydraulic conductivity prediction. The proposed PTFs were compared with 160 models from past literature and were found to be an improvement over them. The current model would help in efficient spatio-temporal estimation of hydraulic conductivity using pre-available databases.


Introduction
Saturated hydraulic conductivity (Ks) of soil is one of the most important characteristics influencing water flow rate, soil quality, pollutant and chemical transport in soil, nutrient and water absorption by plants, and crop growth (Indoria et al., 2020). It further reflects the geometry, sizes, and connectivity of pores in soil (Pot et al., 2020). Ks is one of the most important hydraulic properties from an agricultural point of view, as it indicates the productive potential of soil (Rousseva et al., 2017). It is also crucial for predicting heat and mass transport in soil and for distributed hydrological modelling (Jena et al., 2021). Hence, precise prediction of Ks is essential to minimize the prediction uncertainty in these models, thereby increasing their practical applicability. Hydraulic conductivity can be measured in the field or in the laboratory. Recently, there has been significant progress in the direct measurement of Ks in terms of sample collection techniques, laboratory equipment (e.g. K-Sat), and infiltrometers with automatic data logging features for in-situ Ks determination, yet direct measurements are still time-consuming, difficult, and expensive (Naganna et al., 2017). The Ks value can also change significantly within a short period of time, which affects the related hydrologic systems (Assouline and Or, 2013). Processes such as soil compaction, aggregation, burrowing by soil fauna and flora, and drying-wetting cycles significantly affect hydraulic conductivity over a period of time (Kuncoro et al., 2014). Extreme alterations in soil structure due to processes like tillage can raise the hydraulic conductivity several times (de Almeida et al., 2018; Jorda et al., 2015).
CONTACT Kanhu Charan Panda kanhucharan.bm@gmail.com. Supplemental data for this article can be accessed at https://doi.org/10.1080/19942060.2022.2071994.
As soil hydraulic characteristics show high spatial variation, it is typically impossible to measure hydraulic conductivity with adequate spatial density. The lack of high-resolution spatial hydraulic conductivity data is frequently cited as a vital flaw for land surface models that simulate hydrologic processes across large areas with high frequency. Further, the prediction of Ks has remained a matter of global concern among the research community for a couple of decades.
These issues can be solved by using Pedotransfer Functions (PTFs), i.e. models that can simulate the hydraulic properties of soil using readily available soil properties (Padarian et al., 2018; Van Looy et al., 2017). A PTF is fundamentally a relationship between a soil hydraulic property and the variables that strongly affect it. PTFs that consider some soil structural factors can be very helpful in simulating alterations in soil hydraulic characteristics caused by changes in soil structure. The major factors affecting Ks include Total Organic Carbon (TOC), Wilting Point (WP), Field Capacity (FC), Available Water Content (AWC), soil texture, and the sand, silt, and clay contents of the soil. Thus, considering these variables in the PTFs for Ks prediction is of prime importance. The TOC induces water repellence in soils, thereby increasing soil stability, reducing bulk density, and increasing soil porosity (Blanco-Canqui and Benjamin, 2013). Thus, the Ks value of soil increases. The hydraulic conductivity value increases (decreases) when the sand (clay) content of soil increases. This is due to an increased number of macropores (micropores) in the soil (Zhang et al., 2017). The value of Ks declines with a rise in WP and FC of the soil because (i) WP and FC increase with the rise in soil matric potential and osmotic potential (Sheldon et al., 2017), and (ii) hydraulic conductivity increases with a reduction in soil matric potential (Blanco-Canqui et al., 2017). Ks decreases with an increase in AWC because of the rise in the number of micropores and the decline in the number of macropores (Zhang et al., 2017). The value of Ks is relatively high (low) for coarse (fine) textured soils. The PTF approach for predicting hydraulic properties is gaining popularity because of the rapidly growing spatio-temporal database of soil hydraulic parameters and tremendous advancement in statistical modelling techniques.
Past studies reported that PTF predictions are further improved when structural factors are included in the modelling (Araya and Ghezzehei, 2019). Ks is strongly affected by the physicochemical properties of the soil. The relationships between Ks and these soil properties are nonlinear and very complex in nature. Past studies (Otchere et al., 2021; Sameen et al., 2020; Zhang et al., 2020; Ali et al., 2021; Zhou et al., 2021) have suggested that Machine Learning (ML) based PTFs (ML-PTFs) like Multilayer Perceptron (MLP) and Support Vector Machine (SVM) are some of the most feasible techniques available to date for addressing nonlinearity by extracting hidden patterns from the data.
One of the most frequently employed statistical approaches for PTF development is Multiple Linear Regression (MLR). Sachindra et al. (2014) reported that MLR shows poor accuracy for nonlinear predictor-predictand relationships. Compared to the MLR technique, Generalized Linear Models (GLM) can better address nonlinearity; nonetheless, the GLM requires a large dataset and is susceptible to outliers (Chandler, 2020). The Multilayer Perceptron (MLP) approach might be used to reduce the impact of outliers in nonlinear problems (Singh et al., 2020; Otchere et al., 2021; Abdalrahman et al., 2022). The Support Vector Machine (SVM) is another nonlinear regression approach that performs well with outliers and overcomes overfitting difficulties (Sameen et al., 2020; Pei et al., 2021). The MLP models were superior to the MLR models because the MLP models could handle predictor-predictand nonlinearity better (Kouadri et al., 2021; Kumar et al., 2019a). According to Goly et al. (2014), SVM approaches outperformed positive coefficient regression (PCR) and stepwise regression (SR) models for two reasons: (i) overfitting issues of the SR and PCR models, and (ii) the superiority of the SVM in handling nonlinear predictor-predictand relationships. In the light of the above reviews, MLP and SVM were found to predict the variable of interest more accurately than the other techniques mentioned above. The predictive ability of the MLP and SVM models has been well documented in several diverse fields, e.g. streamflow measurement (Meng et al., 2019), sediment yield prediction (Tao et al., 2021), rainfall prediction (Chen et al., 2022), soil hydraulic properties prediction (Kashani et al., 2020), environmental and chemical studies (Kalantary et al., 2019), water quality estimation (Deng et al., 2021), nutrient transport modelling (Choubin et al., 2018), and climate change impact assessment (Sachindra et al., 2018).
Many past studies have demonstrated the application of machine learning methods in several fields. Le et al. (2019) demonstrated the application of MLP for estimation of the heating load of buildings and energy efficiency for smart city planning. Moosavi et al. (2019) employed MLP to predict the laboratory-scale performance of CO 2 foam flooding for improving oil recovery. Past studies have used MLP to detect deadly diseases like dengue (Gambhir et al., 2017). Geethanjali et al. (2008) utilized a machine learning approach to predict the amount of dissolved calcium carbonate concentration throughout oil field brines. Several research works have reported the successful use of machine learning techniques for enhanced oil recovery (Ahmadi, 2015b;Ahmadi and Pournik, 2016). The use of MLP to predict the oil flow rate of the reservoir was demonstrated by Ahmadi et al. (2013). Ahmadi and Chen (2020) used MLP to predict permeability impairment due to scale deposition. Estimation of permeability and porosity of oil reservoirs via petrophysical logs using the MLP technique was carried out by Ahmadi and Chen (2019) and Ahmadi (2015a). Ahmadi et al. (2015b) utilized MLP for predicting minimum gas miscibility pressure. Past studies have also successfully employed the SVM approach in several fields (Ahmadi et al., 2014;Ahmadi et al., 2015a;Ahmadi, 2016;Ahmadi and Mahmoudi, 2016;Ahmadi and Bahadori, 2015).
Parameter tuning is one of the most important steps for establishing the ML-PTFs. In the conventional ML models (e.g. MLP and SVM), the tuning is carried out by trial-and-error method, making it challenging to obtain the optimum model parameters. The implementation of optimization techniques could solve these issues. One of the frequently used optimization approaches is the Genetic Algorithm (GA). The successful application of hybrid GA-based ML techniques in the fields like hydrological modelling (Molajou et al., 2021), evapotranspiration modelling (Gong et al., 2021), and groundwater modelling (Seyedpour et al., 2019) has been well documented. Many past studies had reported that the GA improved the efficacy of standalone ML models like MLP and SVM (Zhang et al., 2020;Kumar et al., 2019b;Ali et al., 2021;Zhou et al., 2021). However, the suitability of the GA-based hybrid ML-PTFs for predicting hydraulic conductivity is yet to be assessed.
Considering the above reviews, the research gap in past studies is outlined as follows: (1) past studies only employed classical ML models for the prediction of hydraulic conductivity, and (2) the ability of optimization algorithms to enhance the efficiency of traditional ML models for prediction of hydraulic conductivity is yet to be tested thoroughly. Thus, two hybrid ML-PTFs, i.e. GA-based MLP (MLP-GA) and SVM (SVM-GA) PTFs, were proposed in this study. These hybrid ML-PTFs were also compared with standalone MLP and SVM PTFs. In the light of the research gaps, the following objectives were set for the study: (1) screening of the appropriate predictor combination for hydraulic conductivity prediction, (2) development of hybrid GA-based ML-PTFs (i.e. MLP-GA and SVM-GA PTFs) for prediction of Ks, and (3) comparison of the predictive capacity of the MLP, SVM, MLP-GA, and SVM-GA PTFs. The outcomes of the study would help engineers and researchers predict Ks precisely using pre-available databases.

Study area
The research was conducted in the Sheonath basin (area = 27,116 km²), located in Chhattisgarh, India. The basin lies between latitudes 20°15′ N and 22°39′ N and longitudes 80°24′ E and 82°36′ E (Figure 1). The major land uses of the basin include agriculture (63%), forest (22%), and barren terrain (10%). The basin has a tropical climate characterized by a hot and humid summer season. The summer temperature in the basin can reach up to 42 °C. Most rainfall occurs during the Monsoon season (June-September), with an average rainfall of 1,292 mm. The northern and southern portions of the basin are mountainous, while the middle half is a plain region. Around 80% of the basin population lives in rural areas, with agriculture and agriculture-based small industries serving as the primary source of income. The primary crops grown in the basin include paddy, maize, jowar, groundnut, gram, and wheat. Because the study area contains a wide range of soil (Figure 2) and crop types, the Sheonath River Basin was chosen to test the efficiency of the proposed hybrid ML-PTFs.

Estimation of soil physical parameters
In the current study, 286 soil samples were collected from widespread locations in the study area (Figure 2). The samples were collected from a depth of 0-30 cm. The variation of the dominant soil hydraulic properties of the samples is presented on the textural triangle in Figure 2. The major soil groups in the study area included sandy-loam, sandy-clay-loam, sandy-clay, clay-loam, and clay. The hydraulic conductivity of the study area ranged between 5-30 mm/hr, the FC between 7-34%, and the WP between 6-24%. The hydraulic conductivity decreased, whereas the FC and WP increased, with the increase in clay content.

Hydraulic conductivity determination by Constant head permeameter test (CHPT)
The CHPT is one of the most reliable methods for determining the hydraulic conductivity of soils (Yuan et al., 2019). The method works on the principle of Darcy's law.
The test determines the amount of fluid passing through the soil sample in a fixed time period. The testing apparatus is equipped with an adjustable constant head reservoir and an outlet reservoir which maintains a constant head during the test. De-aired water at a constant temperature is used for testing. Knowing the height of the soil sample column L, the sample cross-section A, the constant pressure difference h, the volume of passing water Q, and the time interval T, one can calculate the hydraulic conductivity (K) of the sample as:
$$K = \frac{Q\,L}{A\,h\,T}$$
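The constant-head calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard constant-head form of Darcy's law, K = QL/(AhT), with Q taken as the collected volume; the function names, the choice of centimetre-second units, and the sample readings are hypothetical and are not taken from the study.

```python
def constant_head_ks(volume_cm3, length_cm, area_cm2, head_cm, time_s):
    """Hydraulic conductivity in cm/s from K = Q*L / (A*h*T).

    Assumed units: Q in cm^3, L in cm, A in cm^2, h in cm, T in s.
    """
    if min(length_cm, area_cm2, head_cm, time_s) <= 0 or volume_cm3 < 0:
        raise ValueError("geometry, head and time must be positive; volume non-negative")
    return volume_cm3 * length_cm / (area_cm2 * head_cm * time_s)


def cm_per_s_to_mm_per_hr(k_cm_s):
    """Convert cm/s to mm/hr (10 mm per cm, 3600 s per hr)."""
    return k_cm_s * 10.0 * 3600.0


# Hypothetical reading: 50 cm^3 collected in 10 min through a 10 cm long,
# 50 cm^2 core under a 20 cm head difference.
k_cm_s = constant_head_ks(volume_cm3=50.0, length_cm=10.0,
                          area_cm2=50.0, head_cm=20.0, time_s=600.0)
k_mm_hr = cm_per_s_to_mm_per_hr(k_cm_s)
print(f"K = {k_cm_s:.3e} cm/s = {k_mm_hr:.1f} mm/hr")
```

Keeping the unit conversion in a separate function makes it easier to report values in mm/hr, the unit used for the measured range in this study.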

FC and WP determination by Pressure plate apparatus
One of the most eminently successful research tools for the development of soil moisture characteristic curve is the Pressure Plate apparatus (Do et al., 2018). It provides a convenient, reliable means of removing soil moisture, under controlled conditions, from soil samples throughout the whole plant growth range without disturbing the soil structure. The soil moisture was extracted at 0.33 and 15 bar to determine the FC and WP, respectively (Alaboz et al., 2021). The AWC was determined by subtracting the WP from the FC.

Texture determination by Hydrometer method
Textural analysis of the soil was performed using the Bouyoucos hydrometer method (Bouyoucos, 1962). Fifty grams of soil sample were taken in a beaker, and 50 mL of 6% H2O2 was added to it. The beaker was covered with a watch glass and heated on a heating plate until the organic matter was completely oxidized. The content was transferred into a dispersing cup with about 400 mL of distilled water. Then, 100 mL of 5% Calgon solution was added to it, and the suspension was stirred with an electric stirrer for 10 min. The suspension was transferred into a measuring cylinder, and the volume was made up to 1 L. It was then shaken vigorously for 5 min with a plunger. The hydrometer was placed into the suspension, and readings were recorded exactly after 4 s and after 2 h. The sand, silt, and clay contents were calculated from the corrected hydrometer readings as follows, and the textural class was determined with the help of the Textural Triangle.
Corrected hydrometer reading = R + (T − 67) × 0.2
where R is the hydrometer reading and T is the temperature of the suspension in °F.
% (silt + clay) = (corrected hydrometer reading at 4 s / weight of the soil) × 100
% clay = (corrected hydrometer reading at 2 h / weight of the soil) × 100
% sand = 100 − % (silt + clay)
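The hydrometer calculation above can be expressed as a short Python sketch. The temperature correction and the percentage formulas are those given in the text; the silt fraction is obtained as the difference between the (silt + clay) and clay percentages, which follows from those formulas. The function names and the sample readings are hypothetical.

```python
def corrected_reading(reading, temp_f):
    """Temperature-corrected hydrometer reading: R + (T - 67) * 0.2, T in deg F."""
    return reading + (temp_f - 67.0) * 0.2


def texture_fractions(first_reading, late_reading, temp_first_f, temp_late_f,
                      soil_mass_g=50.0):
    """Return sand, silt and clay percentages from the two hydrometer readings.

    first_reading: the early reading, which measures silt + clay in suspension.
    late_reading:  the 2 h reading, which measures clay in suspension.
    """
    silt_clay = corrected_reading(first_reading, temp_first_f) / soil_mass_g * 100.0
    clay = corrected_reading(late_reading, temp_late_f) / soil_mass_g * 100.0
    return {
        "sand": 100.0 - silt_clay,
        "silt": silt_clay - clay,  # difference of the two suspended fractions
        "clay": clay,
    }


# Hypothetical readings for a 50 g sample at 72 deg F.
fractions = texture_fractions(first_reading=24.0, late_reading=9.0,
                              temp_first_f=72.0, temp_late_f=72.0)
print(fractions)
```

The three percentages sum to 100 by construction, so they can be placed directly on the textural triangle.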

Bulk density determination by the Core method
The core method is one of the most reliable and simple methods available for determining soil bulk density (Al-Shammary et al., 2018). The method involves sampling a soil core from the desired depth under its most natural condition using a cylindrical core sampler and determining the oven-dried mass of soil per unit volume of the core.

Total organic carbon (TOC) determination
TOC of the soil samples was estimated using the Walkley-Black chromic acid wet oxidation method (Walkley and Black, 1934). The wet oxidation technique utilizes exothermic heating and oxidation of the organic carbon of the sample with potassium dichromate and concentrated H2SO4, followed by titration of the excess dichromate with 0.5 N ferrous ammonium sulphate solution to a sharp one-drop endpoint. In this procedure, 10 ml of 1 N K2Cr2O7 solution is added to 0.2-0.5 g of dried sample in a 500 ml Erlenmeyer flask and mixed by swirling. Twenty (20) ml of concentrated H2SO4 is then added and mixed gently. The mixture is allowed to stand for 30 min and is then diluted to 200 ml with distilled water, after which 10 ml of 85% H3PO4, 0.2 g NaF, and 15 drops of diphenylamine indicator are added. The solution is back-titrated with 0.5 N ferrous solution (Gaudette et al., 1974).

ML techniques
In this study, primarily three ML algorithms were used viz., MLP, SVM, and GA. Hybrid PTFs were proposed by combining GA with MLP and SVM approaches.

Multilayer Perceptron (MLP)
MLPs are among the most popular and frequently used neural networks employed to solve a wide range of issues (Meshram et al., 2022;Otchere et al., 2021;Kouadri et al., 2021;Singh et al., 2018). MLPs utilize a supervised process that develops a model using sample data with known output. The MLP derives the relationship only from the examples provided that include all important information required for the connection. Typically, MLP is made up of three layers (i.e. input, hidden, and output layers), and each layer could contain several neurons that give it the nonlinear computational ability. The data is routed from the input to the output layer via the hidden layer. The output layer performs essential tasks such as classification and prediction. The main computational power of the MLP lies in the hidden layers situated between the input and output layers. In an MLP, data passes from the input to the output layer in the forward direction, similar to a feed-forward network. The backpropagation approach is used to train the neurons of the MLP. MLPs can tackle issues that aren't linearly separable and are meant to simulate any continuous function.

Support vector machine (SVM)
SVMs are one of the most reliable and precise machine learning approaches (Sameen et al., 2020). The goal of SVM is to discover separating hyperplanes that will effectively partition the samples into various groups (Deng et al., 2021;Tanveer et al., 2022). The hyperplanes are far away from the closest members of the groups when the datasets are linearly separable. SVM attempts to simulate this condition as closely as feasible. Nonlinear SVM substitutes different manifolds for hyperplanes, but the underlying idea stays the same. SVMs are theoretically sound, take very few training samples, and are unaffected by the number of predictors. Instead of empirical risk optimization, the SVM works on the structural risk minimization induction approach. A detailed theoretical introduction to SVM could be found in Kalantary et al. (2019).

Genetic algorithm (GA)
GAs are metaheuristic techniques that may be employed for a wide range of optimization problems. The theory of natural genetic evolution forms the basis of the GA. A standard GA approach comprises four stages: fitness evaluation, selection, genetic operations, and replacement. In a fundamental GA loop, a population pool of chromosomes is maintained. The chromosomes are coded representations of the candidate solutions and are used in all GA operations except fitness evaluation, which works on their decoded form. Initially, the population is generated arbitrarily, and the best solution is identified by evaluating the objective function on the decoded form of the chromosomes. After the population pool is established, the evolution of the GA begins. At the beginning of every generation, a mating pool is generated by choosing appropriate chromosomes from the population. The fitness of the offspring is evaluated, and at the end of the generation, some chromosomes in the population are replaced by offspring as per the substitution scheme. The generations are iterated until the termination requirements are met. By replicating natural selection, the best solutions, i.e. the fittest chromosomes, can appear in the final population. A brief introduction to GA can be found in the literature (Ali et al., 2021; Liu et al., 2021). Figure 3 describes the process for the development of the MLP-GA and SVM-GA PTFs.
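The four GA stages described above (fitness evaluation, selection, genetic operations, and replacement) can be illustrated with a small, self-contained Python sketch. It is not the configuration used in this study: the real-valued encoding, tournament selection, uniform crossover, Gaussian mutation, elitist replacement, and all parameter values are assumptions chosen for brevity, and `toy_error` is a hypothetical stand-in for the validation error of an MLP or SVM as a function of two hyperparameters.

```python
import random


def toy_error(chromosome):
    """Hypothetical objective: stand-in for a model's validation error."""
    x, y = chromosome
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def tournament(population, fitness, rng, k=3):
    """Selection: pick k random chromosomes and return the fittest (lowest error)."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitness[i])
    return population[winner]


def run_ga(objective, bounds, pop_size=30, generations=60,
           mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Initial population is generated arbitrarily within the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []  # best error found in each generation
    for _ in range(generations):
        fitness = [objective(c) for c in population]      # fitness evaluation
        best = min(range(pop_size), key=lambda i: fitness[i])
        history.append(fitness[best])
        offspring = [population[best][:]]                 # elitism: keep the best
        while len(offspring) < pop_size:
            p1 = tournament(population, fitness, rng)     # selection
            p2 = tournament(population, fitness, rng)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped to the search bounds.
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[j] = min(hi, max(lo, child[j] + step))
            offspring.append(child)
        population = offspring                            # replacement
    fitness = [objective(c) for c in population]
    best = min(range(pop_size), key=lambda i: fitness[i])
    history.append(fitness[best])
    return population[best], history


best_solution, history = run_ga(toy_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best_solution, history[-1])
```

Because the fittest chromosome is copied unchanged into the next generation, the best error recorded in `history` can never increase. In a hybrid ML-PTF, the objective would instead train the model with the decoded hyperparameters and return its error on held-out data.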

Performance evaluation criteria
In the current study, four statistical measures were adopted for the performance evaluation of the ML-PTFs. These measures include the Nash-Sutcliffe efficiency (NSE; Panda et al., 2022), the weighted version of the coefficient of determination (wR²; Biru and Kumar, 2018), the ratio of the root mean square error to the standard deviation of measured data (RSR; Martínez-Salvador and Conesa-García, 2020), and percent bias (PBIAS; Yang et al., 2021). The above metrics are presented in equations 1-4.
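The equations did not survive in this copy of the text. The block below gives the forms in which these four measures are commonly written; it is a reconstruction under stated assumptions, not a copy of the original equations. In particular, the numbering follows the order in which the measures are listed, n (the number of samples) and b (the slope of the linear regression of predicted on observed values) are symbols introduced here, and the sign convention for PBIAS (observed minus predicted) is assumed.

```latex
\begin{align}
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n} \left(K_{o,i} - K_{p,i}\right)^{2}}
                         {\sum_{i=1}^{n} \left(K_{o,i} - \bar{K}_{o}\right)^{2}} \tag{1} \\[4pt]
wR^{2} &= \begin{cases}
            |b| \, R^{2}, & b \le 1 \\
            |b|^{-1} R^{2}, & b > 1
          \end{cases},
\qquad
R^{2} = \frac{\left[\sum_{i=1}^{n} \left(K_{o,i} - \bar{K}_{o}\right)\left(K_{p,i} - \bar{K}_{p}\right)\right]^{2}}
             {\sum_{i=1}^{n} \left(K_{o,i} - \bar{K}_{o}\right)^{2} \, \sum_{i=1}^{n} \left(K_{p,i} - \bar{K}_{p}\right)^{2}} \tag{2} \\[4pt]
\mathrm{RSR} &= \frac{\sqrt{\sum_{i=1}^{n} \left(K_{o,i} - K_{p,i}\right)^{2}}}
                    {\sqrt{\sum_{i=1}^{n} \left(K_{o,i} - \bar{K}_{o}\right)^{2}}} \tag{3} \\[4pt]
\mathrm{PBIAS} &= 100 \times \frac{\sum_{i=1}^{n} \left(K_{o,i} - K_{p,i}\right)}
                                  {\sum_{i=1}^{n} K_{o,i}} \tag{4}
\end{align}
```

With these definitions, NSE = 1 − RSR², which is consistent with the values reported for the models in this study (e.g. NSE = 0.974 with RSR = 0.162).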
Where $K_o$ and $K_p$ are the observed and predicted values of saturated hydraulic conductivity, and $\bar{K}_o$ and $\bar{K}_p$ are the averages of the observed and predicted values, respectively.
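For readers who want to compute these measures, the following Python sketch implements them with the standard library only. It assumes the commonly used definitions: the weighted R² multiplies R² by |b| when |b| ≤ 1 and divides by |b| otherwise, where b is the slope of the regression of predicted on observed values, and PBIAS uses the observed-minus-predicted sign convention. The function names and the sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(sse) / math.sqrt(sum((o - mo) ** 2 for o in obs))


def pbias(obs, pred):
    """Percent bias; positive values mean underestimation (assumed convention)."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)


def weighted_r2(obs, pred):
    """R^2 weighted by the regression slope b of pred on obs."""
    mo, mp = _mean(obs), _mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    r2 = sxy ** 2 / (sxx * syy)
    b = abs(sxy / sxx)
    return b * r2 if b <= 1.0 else r2 / b


# Hypothetical observed values (mm/hr) and predictions with a constant +1 offset.
observed = [5, 10, 15, 20, 30]
predicted = [6, 11, 16, 21, 31]
scores = {
    "NSE": nse(observed, predicted),
    "wR2": weighted_r2(observed, predicted),
    "RSR": rsr(observed, predicted),
    "PBIAS": pbias(observed, predicted),
}
print(scores)
```

The example shows why several measures are reported together: a constant offset leaves wR² at 1 while NSE, RSR, and PBIAS all register the error.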
The correlation between the dependent and independent variables is presented schematically in Figure 4. The results revealed that soil hydraulic conductivity showed a significant negative correlation at the 95% confidence level with clay (−0.72), wilting point (−0.762), and field capacity (−0.766), and a significant positive correlation with sand (0.589). The rest of the parameters did not show any significant correlation. As seen in Figure 5, hydraulic conductivity decreased with an increase in clay content because (i) the number of macropores decreases (Zhang et al., 2017) and (ii) hydraulic conductivity is proportional to the number of macropores (Ilek et al., 2019). Similarly, hydraulic conductivity increased with an increase in sand percentage due to an increase in the number of macropores. Hydraulic conductivity decreased with an increase in wilting point and field capacity of soil because (i) WP and FC increase with the increase in soil matric potential and osmotic potential (Sheldon et al., 2017), and (ii) hydraulic conductivity increases with a decrease in soil matric potential (Blanco-Canqui et al., 2017). Parameters like silt content, AWC, TOC, and BD did not show any systematic variation with hydraulic conductivity. The efficiency of the ML-PTFs for several combinations of the predictor variables was tested in this study.

ML model development
The ML-PTFs were tested for eight predictor combinations, namely K-1 to K-8 (Figure 6). Finding the appropriate training data length is one of the essential steps of any modelling process. An ideal training data set should capture all the patterns that occurred in the past. The training data length of machine learning models mainly depends on the complexity of the problem and of the learning process (Schmidt et al., 2019). The M-test is among the most reliable techniques to determine the appropriate training data length (Shamim et al., 2016). In the M-test, the data length at which the gamma and SE values reach a steady state and become asymptotic to the X-axis is considered the standard data length. In the present study, these indices became constant at the 217th data point, accounting for approximately 75% of the data (Fig. S2 of the supplementary material). Hence, 75% of the data was used for calibration (training), and the remaining 25% was used for validation (testing). Using more data than the length obtained by the M-test would lead to overfitting due to the increased complexity of the learning process, whereas using less data would reduce the efficiency of the models significantly, as the models would not be able to learn the entire data pattern (Singh et al., 2018). Thus, using a data length other than the standard length would be a major hindrance in developing a potential model. The results of training and testing of the ML-PTFs for the eight predictor combinations are presented in Figure 7. The K-5 predictors for the SVM-GA model showed the best performance (NSE = 0.974, RSR = 0.162, wR² = 0.948, PBIAS = 0.229), followed by the K-5 predictors for the MLP-GA PTF (NSE = 0.97, RSR = 0.172, wR² = 0.94, PBIAS = 2.894).
Similarly, the least accuracy was demonstrated by the K-1 predictors for the MLP model (NSE = 0.513, RSR = 0.698, wR² = 0.315, PBIAS = −3.242), followed by the K-7 predictors for the SVM (NSE = 0.701, RSR = 0.547, wR² = 0.484, PBIAS = −0.788) and the K-2 predictors for the MLP PTF (NSE = 0.709, RSR = 0.539, wR² = 0.504, PBIAS = −1.896). The MLP PTF with the K-3 predictors was the most biased model (PBIAS = −10.26), whereas the SVM model combined with the K-3 predictor variables was the least biased model (PBIAS = −0.15). The performance of each model during the training and testing periods is given in Table S1 of the supplementary material. Additionally, the learning (accuracy) curve for the best predictor combination is provided to better illustrate model efficacy at different iterations (Fig. S1 of the supplementary material). The hybrid machine learning models showed smaller accuracy drops from the training to the testing period than the conventional ML models, i.e. the MLP and SVM models. The accuracy curve shows a high deviation in model accuracy below approximately 200 iterations. This might be due to overfitting and underfitting during hyperparameter tuning, as well as the complexity of the learning algorithms.
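The calibration-validation split described in this section can be sketched as follows. The sample count (286) and the training length (217) come from the text; the random shuffling with a fixed seed is an assumption, since the text does not state how individual samples were assigned to each subset, and the function name is hypothetical.

```python
import random


def split_indices(n_samples, n_train, seed=0):
    """Shuffle sample indices and return (training, testing) index lists."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must lie strictly between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 286 soil samples, of which 217 (the M-test data length) are used for training.
train_idx, test_idx = split_indices(n_samples=286, n_train=217)
train_share = len(train_idx) / 286
print(len(train_idx), len(test_idx), round(train_share, 2))
```

Fixing the seed keeps the split reproducible, so that all four PTFs can be trained and tested on the same samples.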

Model Ranking
In this study, two types of model ranking were carried out for precise assessment of the model performance, i.e. (i) ranking of the predictor combinations for various ML-PTFs (Figure 8(a)) and (ii) ranking of the ML-PTFs for the predictor combinations (Figure 8(b)). It was noticed that the ML-PTFs performed the best for the K-5 predictor combination, followed by the K-4 predictor set (Figure 8(a)). The superior performance of the models for K-5 predictors might be attributed to two reasons, (i) a large number of predictors lead to overfitting of MLP and SVM PTFs (Sameen et al., 2020;Ghazvinei et al., 2018), and (ii) the inability of the models to explain the total variance of the predictand dataset when very few numbers of predictors were used (Panda et al., 2022). The result suggested that the number of predictors in the K-5 combination was optimum for hydraulic conductivity prediction, further supporting the findings presented in Figure 4. The ML-PTFs showed the worst performance for the K-7 predictor set, followed by the K-1 predictors. It might be due to (i) the overfitting effect of the MLP and SVM-based PTFs (Sameen et al., 2020) and (ii) the inability of the PTFs to handle the noise in data introduced by increasing the number of predictor variables (Otchere et al., 2021).
It could be noticed that the performance of the SVM-GA PTF was superior to that of the other PTFs, followed by the MLP-GA PTF (Figure 8(b)). The efficiency of the GA-based PTFs was higher than that of the standalone MLP and SVM models because of the GA's (i) ability to remove redundant information from the dataset (Dong et al., 2018) and (ii) capability of creating new data patterns more efficiently (Kumar et al., 2021). It was observed that the efficacy of the SVM PTF in predicting hydraulic conductivity was higher compared to the MLP PTF because (i) … (Sun et al., 2021), and (ii) SVM is more capable of dealing with noise in an unbalanced dataset compared to MLP (Moraes et al., 2013). The outcomes were supported by the Taylor diagram (Figure 9). It could be seen from Figure 4 that the predictor variables have a high correlation among themselves. The high efficiency of the hybrid ML-PTFs indicated the superiority of the proposed PTFs in handling the inter-variable correlation compared to the traditional ML-PTFs (Ali et al., 2021; Zhou et al., 2021). The architecture of the ML algorithms used in this study is presented in Table 2.

Model Comparison
The distribution and the extreme values of the percentage error of the ML-PTFs are presented in Figure 10. It was noticed that the MLP showed the highest error percentage, whereas the SVM-GA PTF demonstrated the least error percentage in predicting hydraulic conductivity. This might be due to two reasons: (i) SVM is superior to MLP in capturing the nonlinearity in the predictand variable (Otchere et al., 2021; Yafouz et al., 2021), and (ii) the GA optimizes the performance of the standalone SVM and MLP models by removing redundant information and improving the prediction ability of the models (Dong et al., 2018; Kumar et al., 2021). It was also observed that the models showed a high error percentage in simulating low values. This could be because a slight fluctuation in a low value can cause a considerable percentage error. However, the GA reduced the error in predicting low values significantly.
Figure 11. Spatial variation of observed and simulated hydraulic conductivity for different predictor combinations
The ability of the best performing ML-PTFs for all predictor combinations (Figure 8(a)) to capture the spatial variation of the observed hydraulic conductivity is presented in Figure 11. The percentage error of these PTFs in capturing the spatial variation of the observed hydraulic conductivity is shown in Figure 12. It was noticed that the models showed high spatial variation and a high percentage of error in simulating hydraulic conductivity for areas with extreme hydraulic conductivity values. The observation supported the previously reported outcomes. For all the predictor combinations, the respective best-performing ML models simulated the moderate hydraulic conductivity values better than the extreme values (Melesse et al., 2011). From the above result, it could be concluded that the ML-PTFs showed a high degree of certainty (uncertainty) in predicting average (extreme) values of hydraulic conductivity (Shamshirband et al., 2019). In other words, the predictive ability of the ML models was more reliable for medium-textured soil than for coarse- and fine-textured soil. The SVM-GA model for the K-5 predictors showed the highest efficiency in simulating extreme hydraulic conductivity values. Hence, this study chose the SVM-GA PTF with the K-5 predictor combination as the base model.
Figure 12. Spatial variation of error in the best performing ML-PTFs of each predictor set

Comparison of the hybrid ML-PTFs with past studies
It was noticed that the benchmark model proposed in this study, i.e. the SVM-GA model combined with the K-5 predictors (RMSE = 0.12 cm/hr), was an improvement over the hybrid multi-model ANN (MM-ANN; RMSE = 2.57 cm/hr) proposed by Kashani et al. (2020). This was due to the inherent ability of GA to optimize the standalone ML models and the superior ability of SVM to address nonlinearity compared to ANN models, as discussed earlier. The current model is also an improvement over the wavelet-based Random Forest (RF) technique (RMSE = 1.73 cm/hr) employed by Singh et al. (2020) for hydraulic conductivity prediction. As reported by Deng et al. (2022), the inaccuracy of the RF model might be due to the following reasons: (1) too little training data, (2) a large number of trees, and (3) inappropriate feature selection. Similarly, the efficacy of the best performing ML-PTFs for all predictor combinations used in the current study was compared with the efficiencies of 160 models obtained from past studies (provided in Table S2 of the supplementary material). It was noticed that the R² (RMSE) values of the models were significantly higher (lower) compared to past studies (Figure 13). The improved efficiency of the models used in this study might be due to (i) optimum predictor screening, which was neglected by past research works, and (ii) implementation of optimization techniques (i.e. GA) with the ML-PTFs.

Conclusion
In the current study, two hybrid ML-PTFs (i.e. MLP-GA and SVM-GA) were proposed for predicting soil hydraulic conductivity. The predictive accuracy of the hybrid PTFs was compared with standalone MLP and SVM PTFs for eight predictor combinations (i.e. K-1 to K-8). The outcomes of the study lead to the following conclusions:
i. The K-5 predictor combination (i.e. sand, clay, WP, and FC) showed optimum accuracy for all ML-PTFs due to the optimum number of variables in the combination.
ii. The SVM-GA PTF demonstrated the highest efficiency for all predictor combinations because of the superior ability of the model to handle nonlinear predictor-predictand relationships compared to the other models.
A follow-up to the current paper will concentrate on developing a novel approach to improve the ability of the model to replicate extreme values.

Data Availability Statement
In the study, only primary datasets were used, which could be made available upon request.

Disclosure statement
No potential conflict of interest was reported by the authors.