Robustness analysis of machine learning classifiers in predicting spatial gully erosion susceptibility with altered training samples

Abstract The present research intended to assess the robustness of three popular machine learning models, i.e. random forest (RF), boosted regression tree (BRT) and naïve bayes (NB) in spatial gully erosion susceptibility modelling in Jainti River basin, India. A gully inventory map of 208 gullies was prepared through field survey and Google earth imageries. Following the 70/30 ratio, three randomly sampled groups of altered training and validation gully sets G1, G2 and G3 were prepared for modelling gully erosion susceptibility. Using information gain ratio and multi-collinearity analysis, 14 gully conditioning factors (GCF) were selected. The discrimination ability and reliability of the models were measured through Kappa coefficient, efficiency, receiver operating characteristic curve, root-mean-square-error (RMSE) and mean-absolute-error (MAE). The stability of the machine learning models was estimated by comparing the accuracy statistics and the departure in areal outcomes among intra-model and inter-model. RF model was found as the most consistent. With the highest mean AUC (0.903), efficiency (91.17), Kappa coefficient (0.835) and lowest RMSE (0.192) and MAE (0.081), RF was found to be more consistent when the training and validation data sets were altered. The effectiveness of each input GCFs was determined using map removal sensitivity analysis technique. This study could be supportive in ascertaining model deployment for mapping gully erosion and managing the land resource.


Introduction
Gully is the most visible expression of soil erosion and can be defined as a channel with >0.093 square metres cross-sectional area (USDA-SCS 1966;Poesen et al. 1996). The gully-affected areas are designated with some harmful characteristic such as unproductive and degraded land, soils with low nutrients and high contaminants, drainage sedimentation, desertification, destruction of roads and anthropogenic infrastructures (Poesen et al. 1998;Fox et al. 2016). Ekholm and Lehtoranta (2012) pointed out that from an ecological perspective, gully erosion decreases the surface vegetation cover and accelerates the process of desertification. Gullies are formed when erosion by water or erodibility of the soils is greater than the geomorphic threshold of the specific unit or an area (McCloskey et al. 2016). Gully formation and development, within a river basin or specific unit, is controlled by numerous factors, i.e. climatic characteristics, soil type and properties, drainage characteristics, anthropogenic activities and various favourable topographical attributes (Rahmati et al. 2016;Piacentini et al. 2018). Alternative dry-hot and wet seasons accelerate gully erosion during boiling and dry summer, with the presence of favourable soil and wilted vegetation, fine cracks appear on the dried soil surface (Roose 1996). Thereafter, during wet season, the first quick precipitation concentrates in these cracks, and the flow velocity and volume of water get increased and stimulate the erosive power, which leads to the formation of rill and gullies (McCloskey et al. 2016).
Globally, gully erosion has been considered as one of the major concern as it accounts for 10% to 94% share of soil loss by water-related erosion. In India, declining of land efficiency due to gully erosion is the major concern in some specific areas, and approximately 3.98 million hectare land all over India has been affected by gully erosion (Ghosh and Guchhait 2016). It has an adverse effect on land quality and productivity both at on-site or at the source areas and off-site or at the downstream areas (Wu et al. 2008).
To mitigate and control the gully expansion in a watershed scale, different researchers pointed out gully erosion susceptibility mapping (GESM) as an imperative tool or effort (Sidorchuk et al. 2003;Li et al. 2016). During the last decades, a lot of work has been carried out on spatial gully erosion susceptibility modelling in different countries (Conforti et al. 2011;Conoscenti et al. 2014;Saha et al. 2019). A general review of these works has revealed that specifically in case of GESM, there are two parallel approaches. In the earlier works, various GIS-based bivariate and multivariate statistical methods were used to generate GESMs, such as information value (Conforti et al. 2011), logistic regression (Conoscenti et al. 2014;Saha et al. 2019), evidential belief function and weight of evidence , index of entropy (Zakerinejad and Maerker 2014), fuzzy-AHP (Rahaman et al. 2015), conditional probability (Rahmati et al. 2017), certainty factor (Azareh et al. 2019). Besides these, some index-based assessments were performed such as stream power index (SPI), gully density index (GDI) and normalized topographic method (Castillo et al. 2014;Zakerinejad and Maerker 2015). An accurate GESM could provide useful information to formulate soil and water conservancy policy (Bouaziz et al. 2011). Therefore, to identify the future risk areas of gully erosion, spatial modelling of gully erosion susceptibility is essential and is the first step towards the management of land.
Recently, data mining techniques and machine learning models have been used in spatial assessment of GESM. Machine learning models evaluate and predict an event based on learning of training and testing data using computer-based algorithms and artificial intelligence (Kuhnert et al. 2010;Aburomman and Ibne Reaz, 2017)). The machine learning algorithms have been popularized as the models reflect high to very-high robustness and excellent prediction capability (Zabihi et al. 2019). Among these data mining models, frequently applied models in the field of spatial susceptibility assessment are support vector machine (Yunkai et al. 2010), naïve bayes, multilayer perception Roy and Saha 2021), random forest (Kuhnert et al. 2010), support vector machine, classification and regression trees, random forest, Bagged CART and boosted regression trees (Zabihi et al. 2019). These models have shown different performance in various fields of spatial susceptibility assessment (Hipni et al. 2013).
Gully formation is a complex process that involves multiple factors, including soil type and properties, climatic elements, relief properties, hydrology and anthropogenic activities (Rahmati et al. 2017). The generation of a reliable GESM is a challenging work till now. Despite huge work and effort on GESM, there is still an argument over which method, model or technique are robust and finest. Furthermore, superiority of the outcome of the model not only depends on the selection of a model rather directed by the quality of data (i.e. training and validation set partitioning and factor selection) incorporated in building the model (Tien et al. 2016). In modelling approach, a collection of inventory data is essential to build and run the model. The inventory data set is categorized into training and validation data through random sampling and partitioning method. Therefore, use of only one set of training and validation data set may not justify the model results as the error may occur in the sampling and selection of the data. The model's capability and robustness could be checked using different and altered inventory data sets (both training and validation) and accuracies could be compared among different machine learning models in both dimension, i.e. within the model or intra-model and among the models or intermodel (Rahmati et al. 2017). Two important aspects of model accuracy assessment are discrimination ability and the reliability (Pearce and Ferrier 2000). In gully erosion susceptibility zonation, discrimination ability refers to the separation capability of a model between gully presence and absence areas, and reliability refers to the agreement between observed gully locations and predicted gully locations (Pearce and Ferrier 2000). An ideal model should have both dimensions to represent its robustness (Pearce and Ferrier 2000). Therefore, to resolve the above-stated concerns as reported in the literature, the present analysis was envisioned to (i) prepare GESMs using random forest (RF), boosted regression tree (BRT) and naïve bays (NB) machine learning classifiers, (ii) prepare different training and validation gully samples using proper technique, (iii) analyse the stability of the models evaluating the produced GESMs based on altered training gully samples, (iv) find out inter-model and intra-model areal departure when training samples were altered, (v) compare the prediction accuracies of these machine learning models using three discrimination indices, i.e. efficiency, receiver operating characteristics, kappa coefficient and two reliability measures, root-meansquare-error (RMSE) and mean-absolute-error (MAE) to identify the best fit and robust model, (vi) consider the relative contribution of conditioning factors of gully erosion. RF, BRT and NB models were selected for this study since these models have been frequent and effective use in the field of spatial susceptibility assessment of natural hazards. Besides this, these models handle multidimensional independent variables, i.e. categorical, nominal, ordinal, discrete and continuous variable without any kind of pre-assumptions.

Study area
The most part of Jainti Basin is extended in the Deoghar and Giridih districts of Jharkhand state, India (Figure 1). With a catchment area of 542.69 km 2 , the river drains through the lateritic surface in the eastern fringe of the Chotanagpur Plateau which is classified as one of the most gully erosion-prone areas in India (Ghosh et al. 2015;Ghosh and Guchhait 2016). Jainti is the sixth-order left-bank tributary of the river Ajay. This is trellis pattern drainage system, and it has two major tributaries, i.e. Dilia and Baghdaru River. The western segment has a relatively higher elevation, i.e. 380 m, including few steep-sloped small flat-topped hills and the eastern portion of the basin characterized with relatively lesser elevation (<170 m) and gently sloped. The climatic pattern perceived in this catchment varies from sub-humid to sub-tropical monsoon type (Singh 1957;Ghosh et al. 2015). Annual average rainfall is about 1239 mm, of which more than 80% occurs by the monsoonal rainfall between June and September ( Figure 2) just after the extremely hot (yearly mean temperature varies between 8 and 43 C) and dry summer (March to early June). These climatic characteristics of the basin, from the perspective of a morphogenetic region, can be regarded as Tropical Wet-Dry unit (AW climate type) with the occurrence of basal chemical weathering, formation of Badlands and crusting of Fe and Al minerals over the basin surface (Chorley et al. 1984). The presence of dense vegetation cover in the whole basin is very few rather some sparse vegetation. Butea monosperma, Bombax ceiba, Eucalyptus globulus and shrubs are spread throughout the basin. The diversity in soil type is very less, and most of the parts are dominated by loamy skeletal haplustepts as reported by the National Bureau of Soil Survey and Land Use Planning (NBSSLUP), India. The slope condition varies abruptly from flat to 25˚ (Figure 6(b)). According to classified land use/land cover (LULC) map, the presence of barren (14.78%) and fallow land (26.16%) indicates the existing of soil erosion processes followed by the climate prevails and geological/geomorphological setting.
The rationale behind the selection of the Jainti River basin for the present analysis is multi-fold. Firstly, the availability of topographical, geological and climatic data sets and accessibility, i.e. relief and drainage, precipitation (rainfall) amount and satellite images. Secondly, during the field survey, it was observed that some gullies are gradually encroaching on the agricultural fields and erosional processes are somewhat accelerating by the anthropogenic activities along with topographic and climatic influence such as deforestation, unlawful soil excavation, inappropriate farming practice and overgrazing, which pave the way of gully erosion; thereby, agricultural sector is facing trouble and environmental condition is degrading in this region as well as in this basin (Table 1). Such activities demand the need for conservation measures and therefore, delineation of susceptible areas can be considered as the first step. Besides this, in India, laterite soil is reported as one of the most erosion vulnerable soils owing to its own physical and chemical elements, i.e. the presence of highly erosionprone kaolinitic clay in B layer, less moisture retention capability, crusting of FeO (iron oxide) on the surface and unfavourable for vegetation growth (Ghosh et al. 2015).

Data used and sources
In the present analysis, prerequisite data sets were collected from various government organizations, websites and filed study ( Table 2). The watershed boundary was delineated using topographical maps and digital elevation model (DEM). Topographical maps (sheet no. 72 L/16, 72 L/15, 72 L/12, 72 L/11, 72 L/8, and 72 L/7) were collected from the Survey of India (SOI) of 1:50000 scale and DEM was downloaded from the USGS (United States Geological Survey) Earth Explorer acquired on February 2011 of 30 m Ã 30 m spatial resolution. Land use/land cover (LULC) map   Figure 3.

Gully erosion inventory
Gully erosion inventory map is essential to provide baseline information about the distribution pattern and characteristics of gullies and specific erosional processes involved in relation to the geophysical components present in the affected areas (Duman et al. 2005;Rahmati et al. 2017). In our earlier study, we mapped a total of 152 gullies in this basin. Among those, some were mapped using field survey between 4 and 10 February 2018, and others were demarcated using Google Earth images. In the present analysis, to produce the gully inventory map, an extensive field survey was carried out in between 20 to 26 January 2020 for the second time with global positioning systems (GPS) and a total of 208 gullies were identified in polygon format (Figures 4 and 5). Finally, all the gullies were overlapped on a Google Earth satellite image to verify the gully locations. Thus, collection of field data (gully inventory) before modelling an event (gully erosion susceptibility) is essential in the field of stochastic modelling (Rahmati et al. 2016). Most of the gullies are permanent types, which need special measures to check the further development processes and characterized with varying length, width, shape and depth (Table 1). In this study, gully polygons were transformed to points (head-cut point). The positions of the points were used in training and testing of models. The equal number and percentage of nongully point locations were selected randomly and used in the processes of training and testing of models.

Splitting and sampling
In modelling an event, it requires two sets of data such as training data set and validation data set during pre-modelling phase (Arabameri et al. 2018). Therefore, rational partitioning of the collected field gullies into training and validation set is important for constructing the models. After reviewing the existing literature concerning the splitting of data set, it was found that there are two ratios, i.e. 70/30 and 75/25 to split the samples into training and testing gullies (Rahmati et al. 2016;Arabameri et al. 2018). The reviews also asserted that it is quite enough to segregate among the gullies to ensure effective model building and validation (Pourghasemi and Rahmati, 2018). Therefore, for sampling and splitting the data, three schemes are suggested such as consideration of space, time and random segregation (Chung and Fabbri 2003). The exact time or duration of gully formation was unknown; therefore, the inclusion of time scheme was unfeasible for the present study. Hence, space and random segregation strategy were applied together. All the 208 gully polygons were converted into point (head-cut point) and 208 gully points were selected for training and validating the models. Same number of non-gully points was selected randomly for running the model. Among the total 208 gully locations, 70% (146) were used to train the models, and 30% (62) were used to validate the models. Repeating these process three times, sample gully sets G1, G2 and G3 were prepared to realize the data sensitivity (space and random variation of the training and validation data set) in relation to model's output and robustness (Cama et al. 2017). The gully heat cut points of sample data sets (G1, G2 and G3) were then overlaid with each conditioning factor to find out gully pixels (positive event) and non-gully pixels (negative event) of each layer according to their nature of appearance. In modelling an event using machine learning methods, a positive event (gully presence) and negative event (gully absence) were recommended in various literatures (Cama et al. 2017).

Selecting the gully erosion conditioning factors
For evaluating the gully erosion susceptibility in any region, selection of effective factors is essential. The selection of effective factors is a very difficult task because there are no standard criteria (Saha et al. 2020;Rahmati et al. 2017). In this study first, factors were chosen for modelling the gully erosion susceptibility considering the previous literatures and geo-environmental condition of the basin (Rahmati et al. 2017;Roy et al. 2020;Saha et al., 2021). Afterwards, to select the effective gully erosion conditioning factors (GCFs) multi-collinearity test and information gain ratio method were used. Finally, 14 geo-environmental gully erosion conditioning factors, i.e. geomorphology, normalized differential vegetation index (NDVI), LULC, slope, elevation, distance from river, rainfall erosivity factor, topographical positioning index (TPI), aspect, geology, soil types, topographical wetness index (TWI), slope length (LS factor) and stream power index (SPI) were selected for modelling the gully erosion susceptibility in a basin.

Multi-collinearity diagnosis
Collinearity specifies a linear function between independent parameters, showing a strong correlation among them which reduces the accuracy of the models (Arabameri et al. 2018). Variance inflation factor (VIF) and tolerance were considered as suitable and efficient indicators for diagnosing multi-collinearity between the parameters (Oh and Pradhan 2011). A VIF of !5 and tolerance of 0.20 reveal the multi-collinearity problem (Saha et al. 2020;Roy and Saha 2019). This analysis was done through extracting the pixel information of each rasterized factor in ArcGIS 10.3.1 and then importing these in SPSS V.20 to perform the test.

Information gain ratio (IGR)
The all geo-environmental factors may not have equal predictive capability, and some of them may reduce the accuracy of the models (Chapi et al. 2017). If the less capability or null predictive factors are removed from the models, the results would be more accurate and prediction would be better ). There are many procedures to measure the predictive capability of geo-environmental factors and to select the suitable factors causing gully erosion including Fuzzy-Rough sets, Relief-f and Information Gain Ratio (IGR) (Quinlan 1993;Kononenko 1994;Chapi et al. 2017). Information gain depends on the information theory that leads to reduce entropy to demonstrate the significance of conditioning factors (Chapi et al. 2017). The IGR is a standard technique for quantifying the predictive capability of the factors in data mining in which a higher ratio designates a greater predictive capability of the factors (Chapi et al. 2017;Shirzadi et al. 2019).
Given a training data G consists of n input samples, n(E i , G) is the number of samples in the training data G belonging to the class E i (gully erosion, non-gully erosion). To classify G, the average amount of information or entropy [Entropy(G)] is needed, which is classified following the Equation (1) (Quinlan 1993): The quantity of information that is essential to split G into G 1 , G 2 , … ,G m and regarding the GCF such as elevation [Entropy(G, Elevation)] was calculated by Equation (2) EntropyðG, ElevationÞ ¼ EntropyðGÞ (2) The IGR for a certain GCF elevation was computed using Equation (3) IGRðD, ElevationÞ ¼ EntropyðGÞÀEntropyðG, ElevationÞ SplitEntropyðG, ElevationÞ where SplitEntropy (G, P) denotes the probable information produced by partitioning the training data G into m subsets. SplitEntropy (G, P) was calculated following Equation (4):

Conditioning factor description
Selection of GCFs is an essential and vital step to assess the prediction capability of the applied models (Poesen et al. 2003). In the present work, 14 GCFs were selected according to the nature of the river basin, accessibility and availability of data, previous literatures, multi-collinearity test and IGR method. The influence of these GCFs is illustrated below.

Topographic factors
Topographic factors have large control over the formation and development of gullies (Nagarajan et al. 2000). The topographic factors included in this analysis were categorized into primary factors, i.e. slope steepness (Figure 6 Table 3. Thematic layers of these factors were extracted from a DEM using raster calculator in ArcGIS 10.3.1 and SAGA-GIS softwares. Elevation was taken as a gully conditioning factor ). It may develop in different altitude, relying upon the gully conditioning mechanisms . Slope steepness depends on continuous land evolution processes and stimulates surface and sub-surface runoff, soil loss, and drainage pattern. The areas having a steep slope and high surface flow are more susceptible to gully initiations (Debanshi and Pal 2020). Slope aspect is known as a significant parameter for susceptibility studies of denudation processes and natural hazard and reins the soil moisture content, exposition of land to sunlight, physical weathering, vegetation growth which indirectly influence the gully development (Tehrany et al. 2015). LS factor reflects the horizontal distance on the surface between the source point of overland flow and the change point of slope or flow concentration point (Renard 1977). It influences the flow velocity and erosion magnitude. The TWI and SPI account for the erosive capability of surface flow based on expected discharge, which promptly accelerates toe erosion and river incision (Nefeslioglu et al. 2008). TPI compares the average  altitudinal difference among the neighbour pixels in a DEM and signifies the land capability/resistance of accumulation/erosion (Weiss 2001). The values of TPI are positive for hill-slopes and uplands and on the other hand negative for valley areas and zero for flat areas (Vorpahl et al. 2012).

Hydrological factors
Proximity to the river: Stability of the slope segment largely depends upon the distance from the river channels. The potentiality of slumping and displacement of soil particles increases as distance from the river decreases (Figure 6j).
3.5.3.1. Rainfall erosivity (R) factor. Rainfall erosivity represents the capability of erosion by a rainstorm phenomenon within a specific area. The R factor value for each meteorological station was calculated following the formula of the RUSLE model using average annual rainfall data of 35 years (from 1982 to 2016) for this basin (Choudhury and Nayak 2003).
where 'ra' denotes average annual rainfall in mm.

Lithological factors
Geology: Geologically, entire basin comes under the Chhotanagpur Gneiss and Granophyre formation except the lower-middle part which belongs to the Lower Gondwana system. Most of the gullies are located on the Chotanagpur gneiss and granophyre formation (Figure 6(l)).

Geomorphic factors
Geomorphology: The Jainti basin is a denudational plateau in nature with some small geomorphic features such as karst origin, badlands, structurally originated from dissected hills and valleys. Dissected nature of this plateau fringe river basin and a large share of badlands epitomize the erosion processes and potential of gully development (Figure 6(k)).

Soil-related factors
Soil texture type: The State Agriculture Management and Extension Training Institute (SAMETI) of Jharkhand has categorized the soil into four classes in this basin following USDA soil classification scheme such as loamy skeletal haplustepts, fine loamy paleustalfs and rhodustalfs, fine mixed hyperthermic typic paleustalfs and fine loamy haplustepts and haplustalfs (Figure 6(n)). Among them most of the basin is covered by loamy skeletal soil haplustepts, which is very erosive in nature (Haldar et al. 1996).
3.6. Machine learning models 3.6.1. Random Forest (RF) RF was first introduced by Breiman (2001) and is an extended form of the model CART (Classification and Regression Tree). In regression and classification problems, RF model is a strong tool as illustrated in the literature (Trigila et al. 2015). RF model is an ensemble one and modified bootstrap aggregation (Pourghasemi and Rahmati 2018). This model represents the modified bagging or bootstrap aggregating method and therefore ensembles in nature. Integrated decision rules of RF model generate a very large number of trees to form a 'forest', and these are grown on the basis of bootstrapped sample applying CART framework with a random subset of factor chosen at every node. The final discernment about class membership and model building is decided on the basis of election of all decision trees (Micheletti et al. 2014). In the current research, to run the RF model, 'randomForest' package of 'RStudio' was used along with ArcGIS 10.3.1.

Boosted regression tree (BRT)
BRT model is an ensemble machine learning approach and statistical application. This novel modelling approach improves the prediction capability of a particular model by combining various techniques (Elith et al. 2008). This model involves two sets of rules from the CART models, i.e. boosting and regression and also assembles the strong points of these algorithms so that the variance in the final outcomes could be lowest and analytical accuracy could be enhanced (Aertsen et al. 2010). Boosting algorithm enhance the set of assemblage tree providing new trees in it and excluding residual errors that found in it (Cao et al. 2010). BRT model systematically boosts the number of trees that are instinctively set through the intricacy of tree structure, intercross-validation, rate of learning (influence of each tree over developing model) and bag fraction, which were governed by trial-and-error process (França and Cabral 2015). Therefore, BRT model overcomes the prime disadvantage of a single tree model. In this study, BRT model was performed in 'R Studio' software using 'gbm' package (Elith et al. 2008).

Naive bayes (NB)
This model follows the rules of Bayes' theorem and it operates under an assumption of conditional independence of the preconditioning factors (He et al. 2019). It is very robust to the irrelevant factors and can perform classification to estimate essential parameters using a small figure of training data (Bhargavi and Jyothi 2009). In this research, the Naïve Bayes models operations were performed in 'RStudio' using 'rminer' package. The further methodological details can be found in the literature of Pourghasemi and Rahmati (2018). For preparing and testing the accuracy and goodness-of-fit three machine learning models, i.e. RF, BRT, and NB were used. The data used in this study have different spatial resolutions. The resolution of the DEM (30 m Ã 30m) was considered as the base resolution for preparing the final gully erosion susceptibility maps and the other factors having different resolutions were resampled into the selected base scale. All the factors were normalized because in dimensions, 14 variables are distinct. The popular approach for data processing (Choi et al. 2010) is to turn the data into values between 0 and 1.
where Yi refers the normalized values of Xi, Xmin and Xmax signify the minimum and maximum value of Xi, respectively. This was achieved by translating the nominal and interval class group data to constant values ranging from 0.1 to 0.9. Consequently, the constant values were not ordinal data, but nominal data, and the numbers show that the input data are classified. Produced susceptibility indices of these models were then classified into five categories following natural breaks classification algorithm namely very-low (VL), low (L), moderate (M), high (H) and veryhigh (VH).

Validation of the results
The goodness-of-fit, robustness and predictive capability of a model can be judged through several statistical measures assessment (Lombardo et al. 2015). Assessment of predictive proficiency is mandatory in order to judge the ability of prediction of models, and without these the resultant maps are not obvious and improbable to the users for designing a plan and making decisions (He et al. 2019). In this regard, the robustness of the models was assessed considering the alternating training and validation gullies (Rahmati et al. 2017). In this analysis, both threshold dependent and independent methods were applied to measure the goodness-of-fit and prediction capability (Rahmati et al. 2017).

Discrimination accuracy measures
Efficiency It is the ratio of gully presence and gully absence pixels which were categorized by the models (Rahmati et al. 2017). The formula used for calculating the efficiency is as follows: where True Positive (TP) and True Negative (TN) are the correctly categorized pixels as gully presence and gully absence accordingly. False Positive (FP) refers to the number of gully absence pixels that were erroneously classified to the gully presence class, and False Negative (FN) denotes the number of gully presence pixels that were wrongly classified as gully absence class. Jaccard index Jaccard index denotes the true positive (TP) numbers of whole samples that are positive predictions or real positives (Jaccard 1901).
Matthews's correlation coefficient (MCC) Matthews's correlation coefficient includes all the samples, i.e. TP, FP, TN, FN in the computation and the resultant values ranges from þ1 to À1 (wrong classification to true classification) and random result is indicated by the value 0 (Lever et al. 2016).
Kappa coefficient This index indicates the sudden chance of occurrence (Equation (11)) and computed as the gap between expected (P expect ) and observed (P observe ) agreements (Guzzetti et al. 2006).
where P observ and P expect can be expressed as below: where N is the total number of pixels in the map layer. The kappa coefficient values were calculated for measuring the accuracy following Landis and Koch (1977) ( Table 5).
Receiver operating characteristics (ROC) ROC curve is a threshold-dependent approach, which has been extensively used in the assessment of the performance of models (Romer and Ferentinou 2016). Area under curve (AUC) epitomizes model efficiency of which value ranges from 0 to 1 or between 0% and 100%. AUC value nearer to 1 signifies the highest prediction capability of models and AUC value less than 0.5 indicates weak performance (Rahmati et al. 2017). The ROC curve graphically represents true positive cases or sensitivity against false positive cases or 1-specificity (Rahmati et al. 2017).

Reliability accuracy measures
RMSE RMSE (Root Mean Square Error) is the square root of the ratio of the difference between predicted values and actually observed values (Arabameri et al. 2019;Fitri et al. 2019). The formula can be expressed as follows: where N is the size of sample, V predict is the predicted and V observ is the observed values of the dependent factor. MAE As like RMSE, the mean absolute error was calculated dividing the total difference between observed and predicted values by a total number of observations (Arabameri et al. 2018).  (Landis and Koch 1977

Analysis of factors' sensitivity
In various research papers, sensitivity analysis was used to quantify the effectiveness of each input GCF (Ferretti et al. 2016). Map Removal Sensitivity Analysis (MRSA) was used by Lodwick et al. (1990) in the area of spatial risk modelling. It determines differences in derived GESM when one particular input is omitted from the model at the time of training . MRSA was therefore used to define the main determinants and to measure the effect of individual parameters on the uncertainty of GESM. MRSA model results of Percentage of Contribution (PC) of each factor were calculated by the following equation (Park 2015): where ALL AUC is the AUC value of GESM using all GCFs; and F iAUC is AUC for GESM when the i-th factor is excluded. AUC values for the RF model are higher than the other models as result this model was undertaken for the sensitivity of the GCFs.

Multi-collinearity assessment
The multi-collinearity test ensured that among the selected GCFs no multi-collinearity is present for this study area (Table 6). In this analysis, VIF value ranges from 1.037 (slope aspect) to 1.940 (TWI) and the lowest tolerance value is 0.521 (soil types). Therefore, all the selected GCFs can be included in the modelling process (Table 6).

Information gain ratio (IGR)
According to the result of IGR, the chosen factors have good predictive capability (

Factor contribution analysis
In this work, factors importance was assessed for each data set, i.e. G1, G2 and G3 samples using RF and BRT models. According to the BRT model, the factors responsible for making the area susceptible to gully erosion were geomorphology, NDVI, land use/land cover, R factor, slope aspect, proximity to the river, topographical wetness index (Figure 7). In the case of RF model, most significant factors were geomorphology, NDVI, R factor, slope steepness, TWI, TPI, distance from the river and elevation (Figure 7). In case of NB model, the most significant factor for gully erosion in the study area is geomorphology. In earlier studies, the direct or indirect effect of these parameters on gully erosion was reported (Rahmati et al. 2016). Among the geomorphic units, most of the basin comes under the denudational plateau, and most of the surveyed gullies are distributed over this unit and representing geomorphology as the most important factor. Non-vegetative and exposed lands cop up with denudational plateau was found to be more threatening parts for gullying. These facts recognized geomorphology and NDVI as relatively most important factors in both BRT and RF models. The RF model showed the most acceptable performance followed by BRT and NB, respectively. The next important factors were R factor, slope steepness, TPI and distance from the river. The influence of the rest of the factors is less important in these models. This fact may be accredited to the point that factors have not played a significant role in spatial gully development.

Spatial gully erosion susceptibility maps (GESMs)
The applied methods produced total nine GESMs, three (G1, G2 and G3) from each model. Models of gully erosion susceptibility using machine learning algorithms were developed considering training data sets to predict gully erosion in the study area. Gully pixels and non-gully pixels were used as the training data sets. For training the models, 70% pixels from both the classes were randomly taken as training data sets (Guzzetti et al. 2005). All the models were successfully performed and during the training period, the relational weights of the models were used to measure the gully erosion susceptibility indices for all pixels. The zoning of susceptibility indices for each model was done using the natural breaks classification method and classified into five classes of susceptibility (Figures 8-10) such as very high, high, moderate, low and very low (Guzzetti et al. 2005). The findings of each model showed that all the gully-affected areas are located in the upper catchment and some are in the southern edge of the Jainti River basin. However, the areal shares of the susceptibility zones of each model for different training samples are presented in Figure 11. The intra-model areal departure among the G1, G2 and G3 sample-based modelling results and inter-model areal difference for three sample-based modelling approaches are shown in Table 8. The areal departure in RF model is the lowest followed by BRT and NB, respectively. The mean probability value and departure in probability (SD) values of different pixels are the lowest in case of RF model followed by the BRT and NB models, respectively (Table 8).

Models accuracy assessment
In terms of efficiency, Kappa coefficient, Jaccard coefficient, MCC and AUC, RF model achieved the best performance in all sample data sets, i.e. G1, G2, G3 followed by BRT and NB model (Table 10), respectively. The RMSE and MAE values were also the lowest for the RF model, indicating the satisfactory reliability in comparison with the BRT and NB models. The mean efficiency percentage (Table 10) (Table 9).

Comparison of the GESMs
To obtain a better outcome and enhance the prediction performance of the machine learning models, numerous preconditioning factors have been used in various research works. Romer and Ferentinou (2016) and Conoscenti et al. (2013) argued that the precision of models could be improved using a large number of factors according to the nature of the study area. To figure out the main drivers of the constructed model, an evaluation on relative importance of the factors was carried out in previous literatures (McCloskey et al. 2016;Hembram et al. 2019). The analysis figured out NDVI, geomorphology, R-factor, LULC, elevation, slope steepness and distance from the river as relatively more important factors in both BRT and RF models in case of the Jainti River basin (Figure 7). In the badlands and denudational dissected hill and valley areas have a large concentration of gullies of this basin. Less Figure 11. The percentage share of basin area and actual gullies by different gully erosion susceptibility classes in RF, BRT and NB models when the sample training set is altered. Source: Author.
intensity and quantity of surface cover over these geomorphic units are triggering the erosion processes. R-factor denotes the soil erodibility capability of the rainstorm and it has a great role in gully erosion combined with favourable geographic set up (Choudhury and Nayak 2003). Stability of slope sometime governed by the types of LU/LC, i.e. barren and fallow land, is often more hazardous areas for rapid soil disintegration, since these are widely exposed lands threatened by rainfall and runoff . Coassociation of high altitude and steep slope favored the gullying processes by regulating runoff, drainage properties, etc. ). Short water flows join to the main river are often marked as rill and gully-forming flows. Furthermore, compactness and wetness of the soil-forming material vary with the distance from the main channel, which plays a significant role in gully development (Conoscenti et al. 2013). Thus, rather than individual, the association of these factors drives the gully erosion processes in this river basin. The produced GESMs using machine learning models quantified the gully erosion susceptibility in the Jainti River basin. RF, BRT and NB models with G1, G2 and G3 training sample data sets notified diverse results of gully erosion susceptibility (Table  8). These maps would be beneficial to identify the very high-susceptible areas to gully erosion and also the areas that are more likely to be affected by gully erosion in the  near future (Kuhnert et al. 2010). The percentage share of each susceptibility class with respect to the total basin area considering each training sample data set in each model is presented in Table 8. In all cases, results of the used models were satisfactory by fulfilling the two basic agreements that are field site gullies (both training and validation set) were found in the high to very-high susceptibility zones of the present GESMs (Figures 8-10) and the extreme susceptible zones (high and very high) covered a small area. Besides this, areal departure (positive or negative) between/among the models was computed (Table 8) following the work of Mahato and Pal (2019). The computational results showed that stability of RF model was greater in all sets of training sample followed than those of the other BRT and NB models, respectively, which was also founded by Rahmati et al. (2017). By overlapping the inventory gullies, it was seen that with the increased susceptibility level percentage of those gullies also increased. Figure 11 showed that within very-low, low and moderate susceptibility classes, percentage share of gullies is less compared with the high and very-high susceptibility class in each spatial GESMs, which was established a constructive agreement of the results and models.

Comparison of the models' accuracy
To verify the effectiveness and to compare the capabilities of the RF, BRT and NB machine learning models in producing GESMs, seven different statistical measures were applied in this analysis. Along with this, models' stability and sensitivity were tested considering the alternating three different training gully sets. For all training sample gully sets, the stability was very good in all of three models (Table 10). However, in terms of AUC values, the performances of BRT and NB model were very good, but RF model was excellent . The efficiency, Jaccard index, MCC and kappa coefficient values also revealed the same. In numerous previous studies on gully erosion vulnerability assessment, mainly the discrimination capability was used to evaluate the model performance (Zakerinejad and Maerker 2014;Rahmati et al. 2016) but choosing a single dimension in this regard is not a robust approach and there may remain a possibility of inappropriate model selection (Kuhnert et al. 2010 Bengio and Grandvalet (2004) and Pourghasemi and Rahmati (2018). With the measures of Jaccard index and MCC, similar performance can be found with the aforementioned results. Therefore, based on the overall assessment of these three machine learning models, the RF model was depicted as the best fit model to predict spatial gully erosion susceptibility in this study. During the modelling phase, RF model includes all the conditioning factors of various types without excluding any factors (Zhang et al. 2018). This model has the ease to deal with a very large set of data and with different categories of data . Through the process of recurrent prediction of each operation by means of combining various tree algorithms, it can identify and cogitate the nonlinear connection between conditioning and dependent variables (Zhang et al. 2018). RF model has better advantages than of other models as it is recognised with high prediction ability, nonparametric, produce nonlinearity, and computes variables importance. Although it has some drawbacks in some cases especially when dealing with a large set of data, it consumes time and high memory, a large number of end nodes in overlap (Arabameri et al. 2018). The best fit of the RF model in terms of stability and prediction accuracy was also confirmed in a large number of research studies in the field of spatial susceptibility assessment of natural hazards such as Trigila et al. (2015) and Rahmati et al. (2017). Despite the quite popular application of NB model in susceptibility mapping (Tzu-Tsung 2012; Pham et al. 2017); its performance showed relatively less accuracy than that of RF and BRT in this analysis.
Garosi articulated its inadequacy of use in susceptibility mapping of gully erosion. Though it can deal with categorical and continuous variables of various kind, its hindrance is very much sensitive to multi-collinearity, means it needs absolute independency of the factors and performs well with less training data (Tsangaratos and Ilia 2016). Pham et al. (2017) also reported its less performance than some other machine learning models. Rahmati et al. (2017) compared between RF and BRT model and state BRT model's relatively less performance in terms of prediction and goodness-of-fit. The mapping of gully erosion susceptibility through these machine learning models could be beneficial to the decisionmakers considering robustness of these models. However, the limitation lies within the quality of input data such as scale, resolution and availability for wider applicability and effective outcome through these models.

Analysing the sensitivity
The effect of GCFs on GESMs MRSA was considered (Table 11). Table 11 indicates the percentage of the AUC excluding each factor. The findings of the RMSA indicate that all the variables chosen have a good effect on the GESM. Among the GECFs, geomorphology (14.71% in G1, 13.24% in G2, 13.37% in G3 data sets) has the highest sensitive factor, thereafter LULC, NDVI, slope and R-factor. Soil and geology, on the other hand, have the lowest sensitivity to GESM. MRSA would help in understanding the contribution and relevance of GECFs for modelling gully erosion.

Conclusions
In semi-arid regions, soil erosion mainly occurs in the forms of the gully and rill erosion. In case of large study areas, viz. large watersheds, preparation of an acceptable GESM is essential but challenging and cost-efficient. The machine learning models are getting very popular in the hazard assessment. To verify the robustness, performance stability and accurateness of the models, the application of altered testing and training data set is quite important for better acceptability. It is difficult to completely solve the problem of over-fitting. Even with the multi-collinearity test and IGR, noise is still present in the GCFs used in this study and change in the number of selected GCFs might produce a change in GESMs. Future research to improve the precision of the following gully erosion susceptibility models may include recognition of differences among various methods for the selection of effective factors; selection of more effective gully erosion conditioning factors; evaluation of models using specific techniques and preparation of GESMs using additional ensemble approaches. The present research is contributed to a methodical evaluation and comparison of three machine learning models, i.e. RF, BRT and NB considering the three sets of training and validation samples.
Among the RF, BRT and NB model, the RF model produced the best performance in terms of discrimination accuracy (e.g. mean efficiency, Kappa coefficient and AUC) and reliability performance (e.g. mean RMSE and MAE). It may be concluded that to produce precise GESM and to deliver insight about the discriminative capability of geo-physical predictors, RF model can be very effective.
In terms of stability of result, the RF has also produced the best result even when using alternative training and validation samples, indicating better model transferability (i.e. the proficiency of performance in other areas). Delineating gully erosion susceptible areas using field-based methods are very expensive and time-consuming, especially for the large watersheds. Therefore, as a very modern tool, application of machine learning models along with GIS-based data and interfaces could be very operative in producing GESMs. Finally, the produced GESMs for the Jainti River basin showed the areas that are prone to gully development, which could be an effective tool for agriculture planners and policy-makers.