Integrating multilayer perceptron neural nets with hybrid ensemble classifiers for deforestation probability assessment in Eastern India

Abstract The rapid expansion of human settlement, agricultural land and roads because of population growth in several regions of the world has contributed to the depletion of forest land. In this study, novel ensemble intelligent approaches using bagging, dagging and rotation forest (RTF) as meta classifiers of multilayer perceptron (MLP) were used to predict spatial deforestation probability (DP) in Gumani Basin, India. The success rate and correctness of prediction of the ensemble models were compared with MLP. A total of 1000 deforested pixels and 14 deforestation determining factors (DDFs) were used. The ensemble models were trained using 70% of the deforested pixels and validated with the remaining 30%. DDFs were chosen by applying the information gain ratio and Relief-F test methods. Distance to settlement, population growth and distance to roads were the most important factors. The results of DP modelling demonstrated that nearly 16.82%–12.64% of the basin had very high DP. All four models created DP maps with reasonable prediction accuracy and goodness of fit, but the best map was produced by MLP-bagging. The accuracy of the MLP neural net model was increased 2-3% after ensemble with the hybrid meta classifiers (RTF, bagging and dagging). The proposed method could be used for deforestation prediction in other areas having similar geo-environmental conditions. Furthermore, the findings might be used as a basis for future research and could help planners in forest management.


Introduction
Deforestation is a quasi-natural phenomenon occurring on our planet's surface (Wan Mohd Jaafar et al. 2020). Worldwide, forests are affected by several threats, including population increase in urban areas, expansion of farming land and amenities, illegal mining and unregulated property rights (Gaveau et al. 2009; Newman et al. 2014; Robinson et al. 2014). The conservation of biodiversity and the protection of substantial carbon sinks may help reduce carbon dioxide concentrations (Buchanan et al. 2008; Wang et al. 2009). Climate change, ambient carbon cycle imbalance and ecosystem degradation are the main environmental threats correlated with deforestation. Deforestation is considered one of the most remarkable aspects of land use/land cover change. Forest is a vital natural resource that provides a large range of ecological goods and services and plays a crucial role in balancing atmospheric conditions and, thus, the climate; therefore, forest cover change has become a global concern (Kumar et al. 2014). The growing strain on the environment has culminated in habitat destruction, deforestation and biodiversity depletion (Nandy et al. 2007; Sun and Southworth 2013). Furthermore, the increased rate of soil erosion due to loss of forest cover may increase environmental risks, such as landslides, water pollution and degradation of wetland ecosystems, which may have a major detrimental effect on human well-being on a large scale (Glade 2003; Wahab et al. 2019). Thus, identifying the underlying forces behind forest cover modification is crucial for recognising the transformation in our planetary ecosystem and reducing the uncertainty regarding spatial and temporal deforestation probability (DP) (Bax et al. 2016). The deforestation process occurs in a haphazard fashion. Forested lands are converted into other land uses on the basis of a set of suitable and desirable physical and anthropogenic characteristics.
For instance, forest patches near roads may have a high chance of being deforested. Similarly, low-elevation areas with gentle slopes are more favourable for cultivation and farmland than rough terrain (Turner et al. 2001). Understanding the causes of deforestation is, therefore, important in the formulation of effective mitigation steps and policies (Hosonuma et al. 2012). Causes of deforestation and the severity of their effects differ considerably from one region to another and change over time. Most causes have been described as leading to rather than accelerating deforestation (Geist et al. 2002). Some deforestation research has focused on anthropogenic forces, although the analysis of deforestation processes requires considering the natural and anthropogenic aspects of the ecosystem (Bax et al. 2016; Wan Mohd Jaafar et al. 2020).
Traditional approaches used for analysing deforestation suffer from a series of limitations: (1) correlation cannot be regarded as a clear indicator of cause; (2) statistical models selected for prediction may have minimal explanatory importance; (3) relationships can be nonlinear. With recent advances in remote sensing (RS), geographic information systems (GIS) and various statistical techniques, spatial DP can be forecasted more accurately (Pontius and Schneider 2001; Houet and Hubert-Moy 2006; Arekhi 2011; Hamzah et al. 2020; Saad et al. 2020). In the Carpathian Mountains, the increasing accessibility of large temporal satellite imagery and the development of GIS and RS tools have facilitated the comprehensive study of past human-induced forest depletion. Many areas have also been studied at national (Munteanu et al. 2014, 2015) and international scales (Sobala et al. 2017; Kaim et al. 2018; Szymura et al. 2018; Wan Mohd Jaafar et al. 2020). Several scholars have prepared DP models based on logistic regression algorithms in tropical areas (Kumar et al. 2014; Bavaghar 2015; Kucsicsa and Dumitrică 2019). Traditional techniques, including regression analysis (Ludeke et al. 1990), change vector analysis (Nackaerts et al. 2005) and principal component analysis (Deng et al. 2008; Ortega et al. 2020), have been widely used to detect changes in forest cover. Artificial intelligence (AI) and machine learning (ML) algorithms have been widely adopted for mapping different hazards and potentiality, such as gully erosion susceptibility, landslide susceptibility (Roy et al. 2019), flood susceptibility, land subsidence (Tien ), individual tree crown detection and delineation (Wan Mohd Jaafar et al. 2018) and groundwater potentiality mapping (Tien ). In all those cases, ML and AI methods have shown good capability in modelling hazards. ML techniques have also been used for the prediction of deforestation. Ortega et al. (2020) used the deep learning technique and support vector machine to detect deforestation. Random forest and reduced error pruning trees (REPTree) have been used for modelling the DP. Dlamini (2016), Krüger and Lakes (2015) and Mayfield et al. (2017) used Bayesian networks for assessing DP, which provided reasonable results.
In recent years, several authors have used hybrid ensemble methods for mapping landslides (Fang et al. 2020), gully erosion and groundwater potentiality (Rahmati et al. 2018), and these techniques have achieved better results than individual models. An ensemble method is a learning approach in which several models, such as classifiers, are systematically produced and integrated to solve a specific computational intelligence problem. Ensemble methods are mainly used to enhance a model's efficiency (classification, estimation, etc.) or to minimise the possibility of unexpectedly selecting a weak model. The ensemble of hybrid meta classifiers and artificial neural networks has not yet been used in the field of deforestation modelling. On the basis of the accuracy of the hybrid ensemble models used in the abovementioned fields, the current work addressed the question of whether hybrid ensemble methods are equally accurate for DP modelling. We selected multilayer perceptron neural nets (MLPnn) and three hybrid ensemble models, i.e. MLP-bagging, MLP-dagging and MLP-rotation forest (RTF), to prepare DP maps of the study area.
The novelty of this work is that the employed hybrid ensembles of MLPnn and bagging, dagging and RTF models have not been used for deforestation modelling. This work not only included these methods but also used Friedman and Wilcoxon signed-rank tests for judging the difference among the DP maps produced by these models, which are also relatively new in this field. Information about the forest cover changes of this area remains limited. In this situation, RS is a vital source of data for the effective monitoring of this region. The forest cover changes were demarcated using the normalised difference vegetation index (NDVI). The DP maps would help the researchers and decision makers of this region. In addition, these sorts of methods have not yet been used in this area, as well as in India for the evaluation of DP. The detailed explanation of all of these methods and parameters would direct future researchers working in this field.
The purpose of this research is to evaluate the DP in the Gumani River Basin, India by applying the hybrid ensemble frameworks of MLPnn and ensemble strategies, i.e. bagging, dagging and RTF. Preparation of the probability map for deforestation is helpful to policymakers for identifying the areas susceptible to deforestation and evaluating the current forest management.

Description of the study area
The Gumani River is located in the fringe area of the Chhota Nagpur Plateau of India. It is a tributary of the Ganga River with a length of 120.09 km. Geographically, this basin extends from 24°37′39″N to 25°7′19″N latitude and from 87°21′20″E to 87°54′20″E longitude (Figure 1), encompassing an area of 1274.57 km². The forested area decreased from 24.11% (1990) to 14.33% (2020) of the total area of the basin (Landsat TM 1990 and OLI 2020 images of the USGS Earth Explorer). The lower part of the basin is agriculturally prosperous, whilst the upper part has a high concentration of population and settlement. Population growth is high in this study area; the total population was 560,000 in 1991 and increased to 750,000 in 2011 (Census of India 1991). Therefore, population increase has a detrimental effect on the forest cover, whilst attention should be given to geographical context and other criteria of forest depletion. Geologically, this area comprises Rajmahal Traps, lower Vindhya system, lower Gondwana system and new alluvium. The basin has a varied geomorphological character because the upper portion belongs to the undulating plateau and the lower portion is a plain area. The elevation of the study area ranges from 17 m to 581 m above mean sea level. The climate varies from subtropical humid to subhumid (Chandniha et al. 2017). Rainfall in this basin mainly occurs between June and September (Chandniha et al. 2017). The mean annual rainfall is 1,300 mm (Chandniha et al. 2017). According to the National Bureau of Soil Survey and Land Use, the prevalent soils are fine loamy, loamy skeleton and clay skeleton. The forest concentration is high in the upper portion of the basin and low in the lower portion. For protecting the remaining forest areas in the basin, prediction of deforestation area and formulation of suitable strategies by the local government are necessary. Our work would help the decision makers in this respect.

Background theory of methods employed
3.1. Ensemble model for DP assessment
DP models using ensemble structures of MLPnn and bagging, dagging and RTF for spatial DP were obtained through the following key stages (Figure 2).

1. Selection of deforestation determining factors (DDFs): After a survey of the published literature, the DDFs were selected. The selected parameters were justified using two statistical methods, i.e. information gain ratio (IGR) and Relief-F. Deforestation affecting factors were divided into two classes in the DP analysis, namely, natural factors (viz. altitude, slope, forest density, distance to forest edge, proximity to river, aspect and topographic position index [TPI]) and anthropogenic factors (viz. population density, agricultural land density, distance from agricultural land, proximity to road, settlement density, proximity to settlement and population growth).
2. Collection and preparation of data layers: Data regarding deforested locations and DDFs were collected to predict spatial DP. In January 2020, an intensive field investigation with a handheld global positioning system was conducted to validate the deforested locations collected through the interpretation of Google Earth images and NDVI.
3. Assessment of the contribution of the DDFs: A frequency ratio (FR) model was used, and the percentage share of the sample deforestation points was calculated for judging the significance of the DDFs.
4. Preparation of deforestation models and DP maps: To construct deforestation models, ensemble methods were first implemented to refine the training data set. The configured input data were then utilised to categorise the groups for the probability of spatial deforestation by using the MLPnn base classifier. Finally, frameworks of ML ensemble were built for DP models.
5. Validation and comparison of models: The DP maps were validated and compared using the receiver operating characteristic (ROC) curve, efficiency, accuracy, mean absolute error (MAE) and root mean square error (RMSE) on the training and testing datasets. Friedman and Wilcoxon signed-rank tests were performed to check whether differences exist amongst the DP models.
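The validation measures named in stage 5 can be illustrated with a minimal sketch. It assumes binary labels (1 for deforested, 0 for non-deforested) and predicted DP values in [0, 1]; the function names and the six toy samples are hypothetical and are not taken from the study. The area under the ROC curve is computed here through its rank interpretation, i.e. the share of deforested/non-deforested pairs that the model orders correctly.

```python
import math

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mae(labels, scores):
    """Mean absolute error between observed class and predicted probability."""
    return sum(abs(y - s) for y, s in zip(labels, scores)) / len(labels)

def rmse(labels, scores):
    """Root mean square error between observed class and predicted probability."""
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels))

# Hypothetical toy validation sample: observed classes and predicted DP values.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

print(round(roc_auc(labels, scores), 3))
print(round(mae(labels, scores), 3))
print(round(rmse(labels, scores), 3))
```

The same three functions can be applied separately to the training and testing subsets to obtain success-rate and prediction-rate figures.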

Deforestation map
The forest cover change (1990-2020) was considered the dependent variable (Figure 3) for DP modelling. NDVI was measured from the Landsat images of 30 m × 30 m resolution (Figure 3a to Figure 3d) via GIS tools, and NDVI values greater than 0.3 were considered forest (Weier and Herring 2000). The forest cover areas are 24.11%, 20.96%, 16.56% and 14.33% of the total basin area for the years 1990 (3a), 2000 (3b), 2010 (3c) and 2020 (3d), respectively. During these decades, forest cover equivalent to about 9.8% of the total basin area was lost. The NDVI map of 1990 was considered the base map for this study. A binary map with the groups 'deforestation' and 'non-deforestation' was produced for the period 1990-2020 by differencing the forest cover maps of 1990 and 2020 (Figure 3e). For preparing the DP models and obtaining enhanced results, 1000 pixels for both classes, i.e. deforested and non-deforested, were randomly selected from the total deforestation and non-deforestation pixels (Süzen and Doyuran 2004). Amongst them, 70% were considered for modelling, and 30% were selected for validating the models.
Figure 3. a. 1990, b. 2000, c. 2010, d. 2020, and e. deforestation map (1990-2020).
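The workflow described above (NDVI thresholding, differencing of two forest masks and a 70/30 split) can be sketched as follows. This is only an illustration under stated assumptions: NDVI is taken in its standard form (NIR - Red) / (NIR + Red), the 2 × 2 reflectance grids are invented toy values, and the function names are hypothetical rather than part of the GIS tools used in the study.

```python
import random

FOREST_NDVI = 0.3  # threshold stated in the text (Weier and Herring 2000)

def ndvi(nir, red):
    """Standard NDVI; returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def forest_mask(nir_band, red_band):
    """True where NDVI exceeds the forest threshold."""
    return [[ndvi(n, r) > FOREST_NDVI for n, r in zip(n_row, r_row)]
            for n_row, r_row in zip(nir_band, red_band)]

def deforestation_map(forest_early, forest_late):
    """1 where a pixel was forest in the early year and is not in the late year."""
    return [[1 if (a and not b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(forest_early, forest_late)]

def split_70_30(samples, seed=0):
    """Random 70/30 split into training and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy reflectances for a 2 x 2 window in the two years.
nir_1990 = [[0.5, 0.5], [0.4, 0.2]]
red_1990 = [[0.1, 0.1], [0.1, 0.2]]
nir_2020 = [[0.5, 0.3], [0.2, 0.2]]
red_2020 = [[0.1, 0.2], [0.2, 0.2]]

change = deforestation_map(forest_mask(nir_1990, red_1990),
                           forest_mask(nir_2020, red_2020))
train, valid = split_70_30(range(1000))
print(change, len(train), len(valid))
```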

Preparation of DDFs
For constructing the DP models, seven natural factors (i.e. altitude, slope, forest density, distance from forest edge, proximity to river, aspect and TPI) and seven anthropogenic factors (i.e. density of population and agricultural land, distance from agricultural land, proximity to road, settlement density, proximity to settlement and population growth rate) were selected (Table 1). These factors were considered independent variables, and a thematic layer for each variable was prepared. Table 1 presents the methods of preparing the factors and the sources of data. The regional topographic conditions play an important role in forest cover change. Spatial variation in the deforestation process is influenced by slope, altitude, aspect and TPI (Bax et al. 2016; Szymura et al. 2018). The slope classes determine the spatial variability in the deforestation process (Siles 2009; Kumar et al. 2014; Bavaghar 2015; Vanonckelen and van Rompaey 2015; Bax et al. 2016; Szymura et al. 2018). A slope map (Figure 4a) was extracted from ASTER DEM with a resolution of 30 m × 30 m (Table 1). Aspect (Figure 4j) controls the amount of sunlight and rainfall of a particular region (Kumar et al. 2014; Bavaghar 2015; Bax et al. 2016). It affects the composition and development of forest cover. The degree of deforestation is also indirectly connected to slope face (Bayat 2000). The DEM of the basin was considered the altitude map (Figure 4k). In high-altitude areas, natural hazards, such as weathering, aeolian flooding and landslide, are the main drivers of deforestation; in low-altitude areas, deforestation is induced mostly by anthropogenic activities (Ercanoglu and Gokceoglu 2002). Distance to the river is a parameter that determines the stability and instability of slope, indirectly influencing the forest cover change (Saha et al. 2002; Yalcin 2008). Waterbodies may give access to forested areas and serve as secondary routes for timber collection (Nackaerts et al. 2005).
For distance to river, a thematic layer was prepared in a GIS environment by using the Euclidean distance buffer tool (Figure 4c). The distance from the margins of forest is an important factor that can regulate deforestation (Matlack 1994). This factor is an intermediate area from which forest destruction continues at the border of existing forest (Arekhi 2011; Kumar et al. 2014). DP is determined using the nature and features of forest edge in the core forest region. This thematic layer was also produced in a GIS environment (Figure 4f).
Note to Table 1: Here, a to i indicate the cell values of the 3 × 3 window.
Figure 4. Independent variables used for modelling the deforestation probability of the study area: a) slope, b) forest density, c) proximity to river, d) distance to agricultural land, e) proximity to road, f) proximity to forest edge, g) population density, h) proximity to settlement, i) settlement density, j) slope aspect, k) altitude, l) agricultural land density, m) population growth and n) TPI.
Different sociocultural and economic practices are mainly responsible for the degradation and loss of forest (Boudreau et al. 2005). The potential for deforestation is multiplied as the population continues to grow near a forested area (Vanonckelen and van Rompaey 2015; Szymura et al. 2018). As a result, population growth (Figure 4m), population density (Figure 4g), distance to settlement (Figure 4h) and settlement density (Figure 4i) are the main reasons for deforestation. A reciprocal relationship exists between forest cover change and settlement density. As settlement density (Figure 4i) increases, the probability of deforestation in its neighbouring parts increases and vice versa. The installation of road systems across land cover divides the forest land and is the first move towards forest depletion. The road network is a vital deforestation-triggering factor because the forest close to the road is highly prone to degradation and vice versa (Chomitz and Gray 1999). The chances of deforestation are high in accessible areas (Bavaghar 2015). Here, a distance-to-road map was produced using the Euclidean distance buffer tool (Figure 4e). Rapid population increase is the main cause of deforestation (Michalski et al. 2008). A larger population needs more food and housing and, hence, considerable land for farms and houses (Cropper and Griffiths 1994). Overpopulation is considered the major cause of forest destruction according to international organisations, including FAO. The population density map of the study area was constructed on the basis of data from the 2011 census (Figure 4g). Agricultural land density (Figure 4l) is an important factor for assessing the DP of a particular region because it identifies the concentration of agricultural land in a particular area. The chances of deforestation are high where the density of agricultural land is high.
The distance to agricultural land (Figure 4d) is also an important land use predictor for determining DP. The chances of deforestation increase as the distance decreases and vice versa because building or other human land use is highly probable near an agricultural field. Population growth can be followed by a high rate of forest cover change (Vanonckelen and van Rompaey 2015; Szymura et al. 2018). The population growth (Figure 4m) data were collected from the Census of India (2011). High rates of population growth lead to the expansion of settlement and agricultural area into forest-covered land (Minetos and Polyzos 2010).

Factor selection
The selection of conditioning variables is a challenging task in any study because no specific criteria are available. Tien  and Roy and Saha (2020) identified effective factors by using statistical models for natural hazard assessment. Gayen and Saha (2018) used multicollinearity analysis for selecting DDFs. Different statistical methods, such as correlations, regressions, Relief-F tests, IGR, probabilistic models and ML models, can also be used to select DDFs. In this study, the IGR and Relief-F methods were applied for selecting the important deforestation determining factors. IGR corrects the bias of information gain towards attributes that take on a large number of distinct values, which can lead to overfitting the training set. IGR has been used to assess which of the factors are the most significant. Relief-F is a feature selection algorithm that is applied in a pre-processing stage, well before the model is trained, and is one of the most powerful pre-processing algorithms.

Information gain ratio (IGR)
For DP, anthropogenic and natural factors do not have the same diagnostic power, and some may even reduce the predictive capacity of a model. If the irrelevant DDFs are removed from the model, enhanced findings and predictions can be obtained (Martínez-Álvarez et al. 2013). IGR is amongst the most effective factor selection strategies (Tien ). Information is gained on the basis of an intelligent principle that helps reduce variance and shows the importance of influencing variables. In data mining, IGR is an important strategy for quantifying factor predictability (Witten and Tibshirani 2011). Quinlan (1993) established the IGR, in which a high ratio means a great predictive capacity. The equations used to calculate IGR are given in the supplementary material (S1). In this study, IGR was used to identify and select the important DDFs.
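Because the IGR equations are only given in the supplementary material, the following sketch uses the standard gain-ratio definition associated with Quinlan (1993): information gain divided by the split information of the factor. It is an illustration under that assumption, not a reproduction of equations S1. The factor names (`road`, `aspect`) and the eight toy samples are hypothetical, and continuous DDFs are assumed to have been classified beforehand.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(factor_classes, labels):
    """Information gain of a classified factor divided by its split information."""
    n = len(labels)
    groups = {}
    for cls, label in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy sample: 1 = deforested, 0 = non-deforested.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
road = ["near"] * 4 + ["far"] * 4                   # separates the classes perfectly
aspect = ["N", "S", "N", "S", "N", "S", "N", "S"]  # carries no class information

print(gain_ratio(road, labels), gain_ratio(aspect, labels))
```

A factor with a ratio near zero, such as the toy `aspect` variable here, would be a candidate for removal.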

Relief-F test method
The Relief-F method, implemented by Kira and Rendell (1992), iteratively changes the weights of features in accordance with their capacity to distinguish between neighbouring instances. The principal concept of the Relief-F algorithm is similar to that of the k-nearest neighbour algorithm (Altun et al. 2007). Instances of the same class are likely to lie close to a given instance. If an attribute is useful, the nearest instances of the same class are expected to be closer to the given instance along this attribute than the nearest instances of all other classes (Altun et al. 2007). Mathematically, X is assumed to be a randomly drawn sample in a binary classification problem. Its two closest neighbours, one from the same class (nearest hit, NH) and the other from the other class (nearest miss, NM), are evaluated. Then, the weight (wi) for the i-th feature is updated via a heuristic computation (Cai and Ng 2012).
Further information on the algorithm is provided in the paper of Liu and Motoda (2008).
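The nearest-hit/nearest-miss idea can be sketched as follows. This is the basic binary Relief update with a single hit and a single miss and range-normalised absolute differences; Relief-F generalises it to several neighbours, and the exact form of the update used in the study is not recoverable from the text, so the update rule here is an assumption. The function name and the six toy samples are hypothetical.

```python
import random

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic binary Relief: reward features that differ on the nearest miss
    and penalise features that differ on the nearest hit."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    lows = [min(row[j] for row in X) for j in range(n_feat)]
    highs = [max(row[j] for row in X) for j in range(n_feat)]
    spans = [(highs[j] - lows[j]) or 1.0 for j in range(n_feat)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_feat))

    weights = [0.0] * n_feat
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        nh = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        nm = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(n_feat):
            weights[j] += (diff(X[i], X[nm], j) - diff(X[i], X[nh], j)) / n_iter
    return weights

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.10, 0.50], [0.20, 0.10], [0.15, 0.90],
     [0.90, 0.40], [0.80, 0.95], [0.85, 0.05]]
y = [1, 1, 1, 0, 0, 0]
w = relief_weights(X, y)
print(w)
```

Features with larger weights are the better discriminators; in this toy case the first feature receives the higher weight.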

Deforestation occurrence in relation to DDFs and analysis of its influence
The percentage of deforestation samples and the FR of subclasses of each factor were calculated to understand the influences of the selected DDFs on the deforestation process. The percentage of deforested samples in the subclasses of each explanatory variable was calculated by overlaying each raster representing an independent variable with the randomly selected deforestation pixels. FR compares the proportion of deforestation pixels in a specific category of an input layer with the proportion of all pixels in that category (Lee and Pradhan 2007). FR values based on the frequency of deforestation samples were calculated using Equation 2:
$$FR = \frac{f / tf}{x / tx} \qquad (2)$$
where f refers to the pixels of deforestation in the explanatory variable subclass, tf indicates the total deforestation pixels, x denotes the total pixels in the explanatory variable subclass, and tx is the total number of pixels.
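A short worked example may help in reading the FR values. The sketch below evaluates the ratio of the deforestation share to the area share for three subclasses of one factor; the subclass names and all pixel counts are invented for illustration and are not taken from the study.

```python
def frequency_ratio(f, tf, x, tx):
    """FR = (f / tf) / (x / tx): share of deforestation pixels in a subclass
    divided by the share of all pixels in that subclass."""
    return (f / tf) / (x / tx)

# Hypothetical subclasses of distance to road:
# (deforestation pixels in subclass, total pixels in subclass).
subclasses = {
    "0-500 m": (450, 20000),
    "500-1000 m": (350, 30000),
    ">1000 m": (200, 50000),
}
tf = sum(f for f, _ in subclasses.values())   # total deforestation pixels
tx = sum(x for _, x in subclasses.values())   # total pixels

fr = {name: frequency_ratio(f, tf, x, tx) for name, (f, x) in subclasses.items()}
for name, value in fr.items():
    print(name, round(value, 2))
```

Values above 1 indicate that a subclass holds more deforestation than its area share would suggest, and values below 1 indicate the opposite.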

Base classifier of MLPnn
MLPnns are a type of artificial neural network (ANN) and are commonly utilised in classification (Haykin 2009). MLPnn is a feedforward neural network that uses backpropagation for the training process. An MLP comprises three main types of layers, i.e. input, hidden and output layers (Figure 5). In accordance with a specific application, every layer in a network contains a suitable number of neurons. The input layer is passive and merely receives data (e.g. data from the various DDFs). Hidden and output layers actively process the information. Input layers represent the variables influencing deforestation, output layers give the classified outcomes, i.e. deforested or non-deforested groups, and hidden layers are the classifying layers that convert inputs into outputs. MLP neural nets have been shown to perform better than conventional classification methods (Benediktsson et al. 1990). This approach has several benefits: (i) there are no pre-assumptions as to the distribution of the training dataset, (ii) there is no need to decide on the relative importance of the various input measures, and (iii) the weights are adjusted during the training process to select the most relevant input measures (Gardner and Dorling 1998).
MLPnns work in two key phases: (I) inputs are transmitted via the hidden layers to the output values, and the output values are then compared with the target values to estimate the error; (II) to achieve the best performance, the weights are adjusted to minimise this error. Let $x = x_i$, $i = 1, 2, \ldots, 14$, be the vector of the 14 factors impacting deforestation, and $y = 1$ (deforested) or $0$ (non-deforested). The number of neurons in the input and output layers is generally determined by the application. The number of hidden layers and their neurons is determined by trial and error (Gong 2009). For a classification problem, MLPnn data processing includes three stages: learning, weighting and classification. The learning phase starts with the assignment of random initial connection weights, which are continuously revised until the desired training performance is achieved. Subsequently, the adjusted weights derived from the trained network are used to process test data and assess the overall precision and effectiveness of the application. The network efficiency is assessed by evaluating the consistency of training and test data in terms of the percentage and overall accuracy of classification (Congalton 1991). Information from the input neurons is passed through the hidden neurons to obtain the information of the output neurons. The input received by neuron j in the first hidden layer from the neurons i in the input layer can be represented as:
$$x'_j = \sum_{i=1}^{t} w_{ij}\, p_i$$
where $w_{ij}$ reflects the weight of the connection between input neuron i and hidden neuron j, $p_i$ is the data at input neuron i, and t is the number of input neurons. The output value generated in the hidden neuron j, $p_j$, is the transfer function f evaluated at the sum provided to neuron j, $x'_j$:
$$p_j = f(x'_j)$$
The transfer function f can be described as
$$f(x') = \frac{1}{1 + e^{-x'}}$$
Function f is typically a nonlinear sigmoid function that is applied to the weighted sum of the input data before the data are transferred to the next stage.
The sum of the squared differences between the expected and actual values of the output neurons, E, is defined as follows (Subasi 2007):
$$E = \sum_{j} \left( Y_{dj} - Y_j \right)^2$$
where $Y_{dj}$ is the expected output of neuron j and $Y_j$ is the actual output of neuron j. Each weight $w_{ij}$ is adjusted to reduce the value of E on the basis of the training algorithm used. In this study, MLPnn was fitted with 500 epochs, one hidden layer and a validation threshold of 20, values obtained by trial and error to avoid overfitting.
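The forward pass and the error term described above can be sketched in a few lines. The sketch assumes 14 inputs (one per DDF), a single hidden layer and a sigmoid transfer function, and it omits bias terms because the weighted-sum description in the text does not mention them. The number of hidden neurons (5), the random weights and the input vector are hypothetical; no training is performed here.

```python
import math
import random

def sigmoid(value):
    """Nonlinear sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-value))

def forward(p, w_hidden, w_out):
    """One forward pass: inputs -> hidden layer -> single output neuron."""
    hidden = [sigmoid(sum(w * p_i for w, p_i in zip(w_j, p))) for w_j in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def squared_error(expected, actual):
    """Sum of squared differences between expected and actual outputs."""
    return sum((yd - y) ** 2 for yd, y in zip(expected, actual))

rng = random.Random(0)
n_inputs, n_hidden = 14, 5   # 14 DDFs; 5 hidden neurons is an arbitrary choice
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
p = [rng.random() for _ in range(n_inputs)]   # one pixel's scaled factor values

hidden, dp = forward(p, w_hidden, w_out)
error = squared_error([1.0], [dp])            # pixel assumed to be deforested
print(round(dp, 3), round(error, 3))
```

Backpropagation would then adjust `w_hidden` and `w_out` in the direction that reduces `error`, repeating over the training pixels for the chosen number of epochs.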

ML ensemble techniques
3.6.1. RTF
RTF is an ensemble approach assembled from individual decision trees (Kuncheva and Rodríguez 2007) and initially proposed for classification by Rodriguez et al. (2006). It is based on the concept of a random forest approach aimed at creating reliable and flexible classifiers (Rodriguez et al. 2006). An individual tree is built inside the RTF on data sets projected into a feature space rotated using principal component analysis (PCA). In this model, bootstrap samples are used as a training set for specific classifiers (Kuncheva and Rodríguez 2007). Throughout this process, points are derived from the training datasets using the base classifier to generate learning sub-training datasets (Pham, Bui et al. 2016). The vector of DDFs in this analysis is $x = (x_1, x_2, \ldots, x_n)$, and $Y = (y_1, y_2)$ denotes the class labels, deforested or not deforested. D stands for the training data. $F_1, F_2, \ldots, F_n$ are categorised in accordance with the ensemble. T specifies a certain set of DDFs and is divided into k subsets. A new nonempty training subset $X'_{ij}$ is prepared by applying the bootstrap method, where $F_{ij}$ is the j-th subset of features used to run classifier $D_i$. Further, a linear transformation is applied to $X'_{ij}$ to obtain the coefficient matrix $C_{ij}$, wherein the size of each matrix is $M \times 1$ with the coefficients $r_{ij}^{(1)}, \ldots, r_{ij}^{(k)}$. Ensemble RTF is established on the basis of the rotation matrix formed using the basic methods of characterisation and conversion (Xia et al. 2014). The obtained coefficients are organised in a sparse rotation matrix called $R_i$. The final rotation matrix is obtained by rearranging $R_i$: its columns are reorganised as per the original features, and the reorganised rotation matrix is called $R_i^r$, wherein $x R_i^r$ signifies the transformed training set for classifier $D_i$. All classifiers are run in a similar way. The outputs of the classifiers are combined via the average combination strategy:
$$\mu_j(x) = \frac{1}{L} \sum_{i=1}^{L} d_{ij}\left(x R_i^r\right), \qquad j = 1, \ldots, c$$
where $\mu_j(x)$ is the confidence allocated to class $y_j$, $d_{ij}(x R_i^r)$ is the probability assigned by classifier $D_i$ to the hypothesis that x is from class $y_j$, L is the number of classifiers in the ensemble, and c is the number of classes (Rodriguez et al. 2006).

Dagging
Dagging is a well-known resampling ensemble approach that builds and combines a number of classifiers using the same learning algorithm for the base classifiers. Ting and Witten proposed dagging in 1997. The procedure differs in several respects from boosting and bagging. For example, boosting adapts the distribution of the training data set on the basis of the outcome of the previously generated classifiers and uses the success of each classifier as its voting weight, whereas bagging modifies the training data set stochastically. Dagging uses multiple disjoint samples, instead of bootstrap samples, to obtain the base classifiers (Ting and Witten 1997; Kotsianti and Kanellopoulos 2007). Furthermore, strong empirical evidence indicates that dagging is far more resilient than boosting in noisy settings. As a resampling ensemble strategy, it merges multiple classifiers through majority voting to improve the predictive performance of the base classifiers (Kotsianti and Kanellopoulos 2007). For this purpose, we created an ensemble in this research using dagging with the MLPnn base classifier and a voting methodology.
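The disjoint-sample and majority-vote mechanics of dagging can be sketched as follows. To keep the example short and dependency-free, a nearest-centroid classifier stands in for the MLPnn base classifier; this substitution, the class and function names, the stratified way of forming the folds and the synthetic two-factor data are all assumptions made for illustration.

```python
import random
from collections import Counter

class CentroidClassifier:
    """Hypothetical stand-in for the MLPnn base classifier."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

def dagging_fit(X, y, n_folds=3, seed=0):
    """Train one base classifier on each disjoint, class-stratified fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return [CentroidClassifier().fit([X[i] for i in f], [y[i] for i in f])
            for f in folds]

def dagging_predict(models, x):
    """Majority vote over the base classifiers."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic samples: class 1 (deforested) near (0.2, 0.2), class 0 near (0.8, 0.8).
rng = random.Random(42)
X, y = [], []
for label, centre in ((1, 0.2), (0, 0.8)):
    for _ in range(30):
        X.append([centre + rng.uniform(-0.1, 0.1), centre + rng.uniform(-0.1, 0.1)])
        y.append(label)

models = dagging_fit(X, y, n_folds=3)
print(dagging_predict(models, [0.25, 0.15]), dagging_predict(models, [0.75, 0.90]))
```

Because the folds do not overlap, each base classifier sees different pixels, which is the main difference from the bootstrap samples used in bagging.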

Bagging
Bagging, designed by Breiman (1996), draws several samples from the training dataset and uses the bootstrap aggregation technique to achieve results of strong predictive precision centred on a base classifier (Wu et al. 2020). It was used to provide a precise mapping of DP. For very large ensembles, bagging gives great results; a greater number of estimators increases the accuracy of this approach in comparison with the RTF model. Such an ensemble is chosen because slight changes in the training data can enhance the capacity for estimation (Wu et al. 2020). The three main steps in bagging are the random selection of bootstrap samples to create a range of training subsets, the generation of a classifier model for each subset and the combination of these classifiers in the final model (Tien ). The bagging classifier uses sampling with replacement to produce a bootstrap sample from the actual training dataset. In bootstrap sampling, roughly one third of the instances are left out of each sample; these cases were recognised by Breiman (1996) as out-of-bag cases. The bagging hybrid ensemble improves the performance of each set of classifiers by linking them to the original feature scheme in the bagging classification phase. Bagging fits each base classifier on a random subset of the initial dataset and then aggregates the individual predictions to form a final prediction (either by voting or by averaging).
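The statement that roughly one third of the instances fall outside each bootstrap sample can be checked numerically. The sketch below draws bootstrap samples of the same size as the training set (a 100% bag size) and averages the out-of-bag share over 16 bags; the training-set size of 1000 and the function names are assumptions for illustration. For large n the expected share approaches 1/e, i.e. about 0.368.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag_fraction(n, n_bags, seed=1):
    """Average share of instances that never appear in a bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bags):
        drawn = set(bootstrap_indices(n, rng))
        total += (n - len(drawn)) / n
    return total / n_bags

# 16 bags as in the bagging configuration; n = 1000 is a hypothetical size.
frac = out_of_bag_fraction(n=1000, n_bags=16)
print(round(frac, 3))
```

Each bag would be used to train one base classifier, and the out-of-bag instances of that bag can serve as an internal check of its predictions.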

Construction of models and DP maps
DP models utilising hybrid ML ensemble frameworks were developed using training data sets to predict deforestation in the study area. Continuous and categorical factors were used to run the ML models. The continuous DDFs were classified with the natural break classification method so that the influence of the sub-categories of each DDF could be assessed through the frequency ratio (FR) model. Deforested and forested pixels were considered as the training datasets. Pixels (70%) from both classes were randomly set as training datasets for running the models. The deforestation and non-deforestation were characterised as 0 and 1 codes, respectively. Once all four models had been run in the training phase, the weights of the trained models were applied to compute the DP indices for all pixels. The model parameters were tuned during training via the trial-and-error method to construct the DP models. Generally, 1 to 2 hidden layers are enough for pixel-based mapping. ArcGIS and RStudio were used to model DP with the ensemble models. The caret, rpart, ipred, rotationForest and neuralnet packages of R were used for predicting the deforestation probability. In this analysis, we used 1 hidden layer, a learning rate of 0.3, a momentum of 0.2, a seed of 0, a training time of 500 and a validation threshold of 20 for the MLPnn (Pham et al. 2017;Onan 2016). These parameters decide the quantity of data for reduced-error pruning, update the weights, add momentum to the weights, divide the data, build the ensemble and terminate the validation testing. The validation threshold is the value at which validation testing is terminated. A threshold function is a Boolean function that determines whether the value of its inputs crosses a certain threshold. The bag size percentage gives the size of each bag as a percentage of the training set size (Sedano et al. 2013). Likewise, 16 iterations, a seed of 1, a bag size of 100% of the training set and MLPnn as the base classifier were set for bagging. 
Eighteen iterations, 2 seeds and MLPnn as base classifiers and 8 iterations, 1 seed and principal component analysis as base filters were used.

Validation techniques
Threshold-dependent methods
The ROC curve remains the most effective and widely accepted approach for testing models (Kumar and Indrayan 2011). In this study, three threshold-dependent methods, i.e. ROC, precision and accuracy, were used to evaluate the performance of the models. The area under the curve (AUC) indicates the effectiveness and consistency of the models (Pepe 2000). The ROC curve has been used in various disciplines (e.g. engineering and medicine). Accuracy and precision were considered for checking the robustness of the models. The equations of AUC, sensitivity, specificity, precision and accuracy are given in the supplementary material (S2). High values of AUC, precision and accuracy indicate good model capability. AUC values vary from 0 to 1; a value of 1 indicates a perfect estimation, whereas a value < 0.5 implies poor results (Can et al. 2005).
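As the equations themselves are in the supplementary material, the following Python sketch uses the standard textbook definitions, which are assumed here to match those of the study. The AUC is computed as the probability that a randomly chosen deforested pixel receives a higher score than a randomly chosen forested pixel; the function names and the 0.5 classification threshold are illustrative assumptions.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, TN and FN after thresholding the predicted scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def precision(tp, fp):
    """Share of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def accuracy(tp, fp, tn, fn):
    """Share of all samples that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sample of about 1000 pixels but would need a sort-based variant for full rasters.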

Statistical techniques
Statistical evaluation techniques, namely the mean absolute error (MAE) and the root mean square error (RMSE), were selected for this study to validate the models. MAE is the mean of the absolute differences between the predicted and actual DP values of the datasets. RMSE is the square root of the mean of the squared differences between the predicted and actual values (Supplementary material-S3). Can et al. (2005) set a cut-off value of 0.5. A value above 0.5 suggests poor results, whereas a value less than 0.5 suggests good performance.
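A minimal Python sketch of the two error measures, using their standard definitions (the exact equations of the study are in the supplementary material and are not reproduced here); the function names are illustrative.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)
```

RMSE is never smaller than MAE on the same data because squaring gives large errors more weight.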

Friedman and Wilcoxon signed-rank tests
The focus of this subsection was to compare the results of the ensemble ML classifiers via statistical tests on multiple data sets. The ML ensemble classifiers were tested using the same random samples. The main objective of these tests was to determine which of the methods differ statistically in performance. In this respect, the Friedman and Wilcoxon tests are suitable because they are non-parametric and do not assume normal distributions or homogeneity of variance (Tien . The Friedman (1937) test and the Wilcoxon (1945) signed-rank test were applied in this work to analyse the significant differences amongst the model outputs. The decision was based on the p-value; if the p-value is below the significance level, then a significant difference exists amongst the models and vice versa (Tien . The Wilcoxon signed-rank test determines the statistical significance of the systematic pairwise differences amongst the DP models. For this test, the p-value and z-value were considered to determine the significant differences amongst the models. If the p-value is smaller than 0.05 and the z-value lies beyond the critical z values (−1.96 and +1.96), then the null hypothesis is rejected and the results of the DP models are significantly different (Tien Bui, Pham et al. 2016).
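Both tests can be sketched in Python with the standard library alone. The sketch below is an illustration under stated assumptions, not the procedure of the study: it computes the Friedman mean ranks and chi-square statistic without tie correction, and the Wilcoxon z-value by the large-sample normal approximation. The function names and the input layout (one row per sample, one column per model) are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def friedman(blocks):
    """blocks[b][m] = error of model m on sample b.
    Returns (mean rank per model, chi-square statistic with k - 1 df)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    chi2 = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
    return [r / n for r in rank_sums], chi2

def wilcoxon_z(a, b):
    """Normal-approximation z for the signed-rank test (zero differences dropped)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w_plus - mean) / sd
```

With four models, the Friedman statistic has three degrees of freedom, so a value above 7.815 corresponds to p < 0.05. A Wilcoxon z-value outside the interval from −1.96 to +1.96 indicates a significant pairwise difference at the same level.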

Relief-F test and IGR
The IGR and Relief-F approaches were used to examine the relative importance of each of the DDFs for modelling DP. IGR and Relief-F were calculated for the training data, as shown in Figure 6 and Table 2. The resulting IGR and Relief-F values indicated that the selected variables have good predictive capability. Distance from settlement showed the maximum predictive capability; its IGR and Relief-F values were 0.3100 and 0.0922, respectively. Aspect contributed the least predictive value, with IGR and Relief-F values of 0.0023 and 0.0052, respectively.
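For readers unfamiliar with the IGR, the following Python sketch computes it for one classified factor using the usual definition (information gain divided by the split information of the factor). This definition is assumed rather than taken from the study, and the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(factor_classes, labels):
    """Information gain of the factor divided by its split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(factor_classes, labels):
        groups.setdefault(f, []).append(y)
    # Expected entropy of the labels after splitting on the factor classes
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(factor_classes)
    if split_info == 0:
        return 0.0  # a factor with a single class carries no information
    return (entropy(labels) - conditional) / split_info
```

A factor whose classes separate deforested from forested pixels perfectly yields a ratio of 1, whereas a factor unrelated to the labels yields 0.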

Frequency of deforestation in relation to DDFs
The selected input factors led to a spatial heterogeneity in deforestation process across the study area. The percentage of deforestation samples and FR value in each subclass of DDFs was calculated to understand the influences of DDFs. The histograms (Figure 7) depict the relationship of deforestation with the different DDFs.
For each slope class, deforestation varied (Figure 7a). The maximum deforested samples were identified in the low-slope class (56.7%), followed by those in the moderate-slope class. Similarly, the FR value was highest in the low-slope class, i.e. 1.08. The relationship between deforestation occurrence and aspect was also analysed (Figure 7b). The percentage of deforested samples and FR value (Table 3) were maximum for the flat area. For elevation (Figure 7c), the percentage of deforestation pixels was 67% between 17 and 145 m elevation, and it reduced in the high-altitude classes. The FR value was maximum (1.13) for the 79-145 m elevation class. A similar pattern could be observed in TPI (Figure 7d). The highest deforested samples were observed on flat land (53%). Most of the forest reductions were connected with distance to forest edge. More than 46% of the overall deforested samples were concentrated in the first 62 m buffer ring, and 92% of the samples lay within 0.5 km (Figure 7j). The FR value was also maximum (1.49) for the first buffer ring (0-62 m). A remarkable relationship was found between deforestation occurrence and proximity to the river. The maximum FR value (1.29) was achieved in the 0-156 m buffer ring. The incidence of forest loss decreased with increasing distance from settlement and roads (Figure 7f and k). For proximity to settlement and road, 91% and 87% of the total deforested sample pixels were concentrated within 0.5 km. The FR value of the 0.10-0.50 km road buffer ring was the maximum at 2.12, and the 71-142 m settlement buffer ring had the maximum FR value of 1.11 (Table 3). Deforestation occurrence was negatively associated with forest density (Figure 7g). The percentage share of deforestation samples and the FR value were highest for the low forest density class.
Figure 6. Contribution of DDFs in making the area potential for deforestation calculated using IGR and Relief-F.
A negative association was also found in case of distance to agricultural land (Figure 7m). A high rate of deforestation occurrence (73%) was determined at less than 200 m from agricultural land, and FR value was maximum for the 0-58 m buffer ring. The concentration of deforestation samples and FR values were high in the areas with high settlement (Figure 7i) and agricultural land density (Figure 7l). Figure 7e and n reveal that heavy deforestation occurred in areas marked by high population density and fast population growth.
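The FR values reported above can be reproduced from pixel counts. The Python sketch below assumes the commonly used definition of the frequency ratio (share of deforested pixels in a class divided by the share of all pixels in that class), which the text does not state explicitly; the class names and counts in the usage note are invented for illustration and are not data from the study.

```python
def frequency_ratio(class_pixels, class_deforested):
    """FR per class = (% of deforested pixels in class) / (% of all pixels in class)."""
    total_pixels = sum(class_pixels.values())
    total_deforested = sum(class_deforested.values())
    return {
        name: (class_deforested[name] / total_deforested)
              / (class_pixels[name] / total_pixels)
        for name in class_pixels
    }
```

For example, with hypothetical counts of 500, 300 and 200 pixels in the low, moderate and high classes and 60, 30 and 10 deforested pixels, the FR values are 1.2, 1.0 and 0.5. A value above 1 means that deforestation is over-represented in that class.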

Analysing the deforestation probability
The DP indices of all pixels in the study area were calculated, and each pixel was allocated a specific probability index. The probability indices for deforestation were then reclassified using a statistical approach, namely the geometric interval method. The geometric interval approach is ideal for classifying continuous data such as DP indices whilst minimising variance (Frye 2007). The DP indices were classified into five probability classes on the basis of this method, namely, very low, low, moderate, high and very high (Figure 8). The outcome of the MLP model indicated that 25.16%, 22.19%, 21.02%, 14.81% and 16.82% of the overall forest area of the basin fell under the very low, low, moderate, high and very high DP classes, respectively (Table 4). The outcomes of the MLP-RTF

Validation and comparison of DP models
The robustness of the DP models was judged using three threshold-dependent methods (AUC of ROC, precision and accuracy), two threshold-independent methods (MAE and RMSE) and two statistical tests (Friedman and Wilcoxon signed-rank tests). The AUCs of the DP maps exceeded 0.86 (86%) for the training and validation data sets (Table 5). The MLP-bagging method achieved the highest accuracy for both training and testing, followed by MLP-dagging, MLP-RTF and MLPnn. The AUC values of the success rate curve (training data) and the prediction rate curve (test data) were the highest for MLP-bagging (0.902 and 0.943, respectively) and the lowest for MLPnn (0.869 and 0.885, respectively) (Figure 9). The highest values of precision and accuracy were obtained by MLP-bagging and the lowest by MLPnn (Table 5). The values of the statistical measures, i.e. MAE and RMSE, were calculated for the training and validation data sets. The lowest values (0.24 and 0.38) were obtained for the MLP-bagging ensemble model, whereas the highest values (0.29 and 0.43) were obtained by the MLPnn model. Therefore, the validation results showed that the accuracy of the MLP model improved after it was combined with the three selected meta classifiers. On average, the AUC of the prediction and success rate curves increased by 3%. The highest increases in the AUC values of both curves were found for the MLP-bagging ensemble model, i.e. 5.4% (success rate curve) and 5.8% (prediction rate curve). Overall, as per the results of ROC, precision, accuracy, MAE and RMSE, the robustness of the MLP-bagging model was higher than that of the MLPnn and the other ensemble models.
Friedman and Wilcoxon signed-rank tests were used to compare the DP models. The results of the Friedman test are presented in Table 6. The mean ranking values for the MLPnn, MLP-bagging, MLP-dagging and MLP-RTF models were 2.77, 2.22, 2.42 and 2.48, respectively.
The Wilcoxon signed-rank test was applied to determine the pairwise differences amongst the ML models at a significance level of 5% (Table 7). When the p-value is < 0.05 (5%) and the z-value lies beyond the critical values (−1.96 and +1.96), the capabilities of the models differ significantly [106]. The analysis (Table 7) suggested a significant difference amongst all DP models.

Discussion
The changes in the forest cover of the Gumani River Basin are well recognised, with numerous factors primarily focused on institutional, financial and economic aspects (Vanonckelen and van Rompaey 2015), the low performance of protected areas (Bălteanu et al. 2016) and environmental disruptions (Săvulescu and Mihai 2011). The estimated evaluations for DP are limited, with only a few works assessing the relative impacts of biophysical, socio-demographic and land use approaches on the changes in the forest cover at temporal scales (Munteanu et al. 2015;Vanonckelen and van Rompaey 2015). Thus, we measured the future possibility of deforestation across the Gumani River Basin in this study by using the hybrid ensemble frameworks MLP-bagging, MLP-dagging and MLP-RTF. To prepare the DP models, hybrid ensemble methods were first used to optimise the input data using the training dataset. Thereafter, the optimised input data were used to categorise classes for spatial DP with the MLPnn base classifier. Ultimately, frameworks of the machine learning ensemble were developed for the DP models. The results of the training sets of DP were used for the creation of the DP maps. Ensemble approaches are classification methods for data processing, whilst MLPnns are regarded as ANNs with excellent results in the spatial modelling of deforested areas.
The findings of this study indicated that all probability models of deforestation utilising hybrid ML ensembles increased the efficiency of the MLPnn (AUC = 0.869) base classifier. This result is reasonable because hybrid ML ensemble systems are well recognised to be very successful in enhancing the efficiency of base classifiers. The DP models in this analysis produced satisfactory results on the basic performance indicators (such as accuracy, precision, AUC, RMSE and MAE). Owing to their lower error and very low overfitting, the ensemble methods provided better results than previous works by different scholars. The quantity or overall area of deforestation is helpful for planning or zoning, but the models cannot be used to measure it. Another drawback of the models is that the assumed predictors of deforestation do not alter with time. This drawback is common amongst many ML models, but it is especially applicable to our models because the deforestation predictors were chosen on the basis of predisposing risk factors for deforestation (Mas et al. 2004). Despite these drawbacks, the findings showed that publicly accessible data sets could be used to estimate the DP within the research area. DP models utilising ensemble frameworks were compared. The DP maps were evaluated using ROC, precision, accuracy, MAE, RMSE and two statistical tests, i.e. Friedman and Wilcoxon signed-rank tests (Tables 5-7). The results showed that MLP-bagging considerably outperformed the other models. MLP-bagging (AUC = 0.943) had the strongest predictive capacity, followed by MLP-dagging (AUC = 0.928), MLP-RTF (AUC = 0.884) and MLP models (AUC = 0.902).
Figure 9. Validation and comparison of DP models by ROC curves: (a) success rate curve (training data set) and (b) prediction rate curve (validation data set).
MLP-bagging is more efficient in reducing variance and bias compared with other ensemble approaches (Pham et al. 2017;Sedano et al. 2013). Feature selection is widely used to test the predictive capacity of variables and to improve model performance by eliminating unwanted or unimportant factors in advance (Pham, Pradhan et al. 2016). The Relief-F and IGR methods were utilised in this analysis for selecting and judging the predictive potential of the different DDFs for the DP models. On the basis of these methods, the distance to settlement, the distance to road and population growth showed the strongest influences on the DP models because most of the deforested locations were identified on or along roads and settlements. The remaining factors, such as forest density, distance to forest edge, proximity to river, population density, agricultural land density, distance to agricultural land, density of settlement, altitude, slope and aspect, also contributed well to the DP models, as confirmed in other similar studies (Sahana et al. 2018). The comparison of the DP models on the basis of the ROC curve showed a relative difference of nearly 3%, which was nevertheless substantial for the DP maps (Table 5). Therefore, even minor changes in the efficiency of the DP models contribute to larger changes in the reliability of the DP maps. Furthermore, the efficiency of such probability models for deforestation depends greatly on optimising the predictive parameters.
The output of this research might help researchers to analyse deforestation in other areas. Hybrid ensemble approaches could also be used to assess data and serve as reliable alternatives to conventional computational strategies for modelling DP. The use of soft computing approaches would inspire the scientific communities to use sophisticated techniques for precisely modelling probable deforestation areas. In populated countries, such as India, this work would assist the policymakers in making strategic plans for managing the existing forest cover.

Conclusions
In this research, hybrid ensemble frameworks, MLP-bagging, MLP-dagging and MLP-RTF, were effectively implemented for the analysis of DP of the Gumani River Basin. ROC, accuracy, precision, MAE, RMSE and Friedman and Wilcoxon signed-rank tests were used to validate and compare four DP models. The findings indicated that DP models utilising ML ensemble systems worked well in this study, and substantial differences existed amongst the models.
Among the MLPnn, MLP-bagging, MLP-dagging and MLP-RTF models, the MLP-bagging model produced the best performance in terms of accuracy (precision, accuracy and AUC) and reliability (RMSE and MAE). It may be concluded that the MLP-bagging model can be very effective for preparing an accurate deforestation probability map. After the meta classifiers were combined with the base classifier, the accuracy of the MLPnn model increased significantly. Delineating deforestation probability areas by means of field-based methods is expensive and time-consuming, especially for large watersheds. Therefore, as an alternative, the application of ensemble machine learning models along with RS-GIS-based data and interfaces could be very effective in creating deforestation probability maps. Finally, the deforestation probability maps produced for the Gumani River Basin displayed the areas having high and very high probability of deforestation, which could be an effective tool for policymakers and environmental planners.