Multivariate statistical algorithms for landslide susceptibility assessment in Kailash Sacred landscape, Western Himalaya

Abstract Landslide susceptibility mapping plays an imperative role in mitigating hazards and determining the future direction of developmental activities in mountainous regions. Here, we used 518 landslide occurrences and nine landslide-conditioning parameters to build landslide vulnerability models in the Kailash Sacred Landscape (KSL), India. Four multivariate statistical models were applied, namely the generalized linear model (GLM), maximum entropy (MaxEnt), Mahalanobis D2 (MD), and support vector machine (SVM), to calibrate and compare four maps of landslide susceptibility. The results demonstrated the outperformance of Mahalanobis D2 for predictability compared to other models obtained from the area under the receiver operating characteristic curve (ROC). The ensemble model data shows that 10.5% of the landscape has susceptible conditions for future landslides, whereas 89.50% of the landscape falls under the safe zone. The occurrence of landslides in the KSL is linked to the middle elevations, vicinity to water bodies, and the motorable roads. Furthermore, the observed patterns and the resulting models exhibit the major variables that cause landslides and their respective significance. The current modelling approach could provide baseline data at the regional scale to improve the developmental planning in the KSL.


Introduction
Due to the adverse interplay of several factors, including the rugged topography, diverse and inconsistent precipitation patterns, lithology, and seismic activity, the Himalayan mountains are prone to various natural hazards, including glacial lake outburst flood, landslides, debris slow and ice/rock avalanches (Kaushik et al. 2020;Abdo 2022).In recent decades, landslides in the Himalayas have garnered significant scientific attention due to their ability to cause widespread devastation among all the natural hazards in the region (Das et al. 2023).However, the current paradox is that such landslides are further exacerbated by ever-increasing infrastructure, road construction or hydropower projects in Himalayan mountainous regions (Sarkar et al. 2011;Kundu and Patel 2019;Sangeeta and Singh 2023).Thus, in the recent past, landslides have caused large-scale destruction and fatalities, resulting in severe life and property loss in the region (Das et al. 2023).Mitigation of landslides is the need of the hour to ensure sustainability in the area.In this quest, landslide susceptibility mapping is considered a reliable tool made possible by recent advancements in remote sensing and GIS (Hearn and Hart 2019;Lee 2019;Rahman et al. 2022;Basharat et al. 2023).This tool helps in decision-making for mitigating risks arising from landslides.Thus, accurate landslide susceptibility maps become essential for planning disaster risk reduction and advance alerting.However, several attempts have been made over the past few decades to map landslides using different mathematical modelling algorithms.A majority of methods applied in landslide susceptibility modelling and environmental modelling assessment include weighted overlay (Singh et al. 2019;Abdo et al. 2022;Naceur et al. 2022), frequency ratio (Wang et al.2020;Shu et al. 2021), generalized additive modelling (Bordoni et al. 2020), weighted logistic regression, evidential belief function (Zhao et al. 2022), fuzzy logic (Hejazi et al. 2022), neuro-fuzzy (Ghorbanzadeh et al. 2020), Naïve Bayes (Pham et al. 2017), Multilayer Perception Neural Network, Functional Tree (Pham et al. 2017) and many more.The reliability and practicality of output results are significantly affected by the appropriate selection of models, data availability, and quality.However, the effect of different landslide mapping approaches on susceptibility models has not been extensively explored.
In the last few decades, numerous statistical models have been developed and employed to forecast the distribution of animal and plant species, aiding research in conservation biology, ecology, and evolution (Zimmermann et al. 2010).Although some models, such as Maximum Entropy Algorithm (MaxEnt; Phillips and Dud ık 2008), Generalized Linear Model (GLM; Atkinson and Massari 1998), Support Vector Machine (SVM; Naimi and Ara ujo 2016), and Random Forests (RF; Breiman 2001) have been utilized for predicting landslide-susceptible zones, many others such as Mahalanobis D 2 (Farber and Kadmon 2003) have not been fully explored in landslide susceptibility modelling except few in recent years (Pourghasemi et al. 2020).The first objective of this study was to assess the feasibility of predicting landslide-susceptible zones in the western Himalayan region using the Mahalanobis D 2 algorithm and to compare its performance with other commonly utilized modeling algorithms like GLM, SVM, and (Pourghasemi et al., 2020).All models and analytical procedures here rely on statistical assumptions that the conditions leading to landslide occurrences resemble those of previously recorded landslides (Guzzetti et al. 2005).We aim to predict areas where no evidence of landslides is currently available, but those areas may be prone to future landslides.In this modelling exercise, we assume that if the area shares similar conditioning features (e.g.topographic, vegetation, geology, etc.) with existing landslides, they are more likely to be potential landslide sites in the future.
Over the past decade, there has been an increased focus on the multi-model ensemble approach to evaluate landslide susceptibility zonation.Several studies (Chen et al. 2017;Roy et al. 2019;Chen et al. 2019;Kadavi et al. 2018) have highlighted the benefits of the multi-model ensemble approach in enhancing the predictive performance of models and handling complex, high-dimensional data.This approach has been shown to offer advantages over the single-model approach by improving model accuracy and reducing uncertainty (Rokach 2010).Employing an ensemble model can also minimize the variability and dispersion of predictions and enhance model performance (Weigel et al. 2008).The second aim of this study is to employ an ensemble of GLM, SVM, MaxEnt, and Mahalanobis D 2 models and assess their performance in the Himalayan landscape like KSL.
The landslide models generate maps on a continuous scale, which may not be directly applicable to land managers and decision-makers in implementing effective mitigation strategies.However, most studies across the globe reclassify the map into 2-5 classes (i.e.high, medium, and low), which may not be statistically correct to delineate only landslide susceptible areas.Determining the optimal thresholds to identify landslide-prone areas is crucial in producing effective landslide susceptibility maps.Multiple statistical methods are applied to optimize thresholds of statistical models in animal and plant ecological studies, including maximizes (sensitivity þ specificity)/2, maximizes Kappa, mean predicted probability, and minimizes the distance between ROC plots etc. (Weigel et al. 2008;Cantor et al. 1999).However, the applicability of these statistical measures for delineating landslide-susceptible zones from a landslide susceptibility map on a continuous scale has not been widely tested.The last objective of this study is to explore the potential of the Boyce Index, commonly used in species distribution modeling, in both evaluating the quality of the model and identifying meaningful thresholds between the landslide-susceptible and safe zones of the final ensemble model (Hirzel et al. 2006).The Boyce index provides ratio curves between predicted and expected values and offers valuable information about the model's robustness, resolution, and deviation from randomness.Furthermore, it can assist in reclassifying predicted maps into susceptibility classes with practical meaning (Hirzel et al. 2006).The results of this study offer valuable insights into creating more effective and precise landslide-prone areas, which land-use managers and decision-makers can utilize to mitigate the risks posed by landslides within the KSL.

Study area
The geographic area of Indian Kailash Sacred Landscape, KSL -India (29 30 0 N-30 44 0 N and 80 24 0 E-82 49 0 E) is 31,175 km 2 , located at the tri-junction of north-India, western Nepal, and the south-western part of the Tibetan Autonomous Region (TAR) of China.The Indian landscape of the KSL is divided between the districts of Pithoragarh (almost entirely) and Bageshwar (partially).It comprises an area of 7120 km 2 inside the state of Uttarakhand.The region falls within the central Himalaya (Figure 1).This area is divisible into six eco-climatic regions, viz., (i) arctic (over 5500 m asl), (ii) sub-arctic (4500-5500 m asl), (iii) cold-temperate (3500-4500 m asl), (iv) cool-temperate (2000-3500 m asl), (v) warm-subtropical (700-2000 m asl), and (vi) warm-tropical (below 700 m asl).The study area has over 2500 settlements (permanent or seasonal) representing >1700 revenue villages.It has a deep cultural tie to Nepal and TAR.It also serves as an entrance to Mount Kailash, known for its historical legacy and faith values to Hindus, Buddhists, Jains, Sikhs, Bons, and many other religions.Geological instability, steep topology, hard weather conditions, and the turbulent nature of rivers make the entire landscape vulnerable to natural hazards like landslides, avalanches, and flash floods.Cloud bursts and heavy rainfall cause these hazards, often resulting in immense human lives and property loss.

Inventory of landslides and background points
This study used four popular models (i.e.Generalized linear model, Mahalanobis distance (D2), MaxEnt, and Support vector Machine) and ensemble them to predict the landslide susceptibility in KSL -India.Figure 2 presents the complete methodological flowchart about the modelling process.Landslides are easily identified in freely available Google Earth software (Slingsby and Slingsby 2019).Therefore, a high-resolution Google image for 2015 was used for identifying fresh landslides within KSL -India.We interpreted landslides based on open soil structures, distortions in the forest cover or grasslands, fresh open rocks, and other specific geomorphological features (Pradhan 2013;Pham et al. 2017).Since all landslide occurrences within the study area were identified, it can be inferred that total sampling was used.Each identified landslide was marked using the 'add polygon' feature of Google Earth.A total of 518 polygons were identified as potential landslide spots.Among them, 29% of random landslides were verified based on ground-truthing from December 2015 to August 2016, which yielded 100% accuracy (Supplementary Figure 1; the supplementary material includes a few ground pictures and corresponding landslides through google earth imagery).We merged all polygons into a single polygon shapefile in ArcGIS 9.3 (ESRI 2008).Further, we generated fifteen random GPS locations within each polygon that yielded 7,770 GPS locations.After removing spatial autocorrelations using the SDM toolbox (Brown 2014), yielded 537 locations, of which 70% of locations were randomly drawn for training points to make landslide susceptibility models and 30% were treated as test data sets to validate each model performance.Similarly, 537 random points were generated across the KSL -India to consider them as background points for the flowing modeling purpose.

Landslide predictor variables
We used ten landslide predictor variables for landslide susceptibility assessment, viz., digital elevation model (DEM), angle of slopes, euclidean distance from the roads, euclidean distance from the drainage line, aspect, Land Use Land Cover (LULC), Normalized Difference Vegetation Index (NDVI), soil depth, geology, and soil texture (Figure 3).More details of these predictors were given in Supplementary materials.Detail classes of soil texture and geology were provided in Supplementary Tables 1 and 2.

Statistical analysis of correlation
The pair-wise correlation or multicollinearity among all predictor variables was tested using the Pearson Correlation Coefficient (PCC, r) in R-Studio (Benesty et al. 2009).PCC has a range of À1 to 1, with a positive number indicating a perfect positive relationship between the variables, and the negative value indicates the ideal negative relationship.The zero value presents relatively no relationship among the variables (Guo et al. 2014).The PCC can be estimated by applying Eq. (1): where r xy ¼ PCC between the variables X and Y.The x i is the values for X variables, y i values for Y variables, x stands for the average of the X variable samples and y is for the average of the Y values dataset.Table 1.Estimating relative weights using a generalized linear model under different conditions.

Observable pattern of landslide occurrence
We measured the percentage of landslides in all predictive variables.The landslide intensity was assessed within each categorical variable using Levely's Index by applying Eq. ( 2).
where U ij is the percentage occurrence of the landslide within each category (j) of the predictor variable (i).A ij is the percentage of land area available in each category (j) of predictor variables (i).

Generalized linear model
Here we used logistic regression (LR) to identify the most suitable model that can depict the association between a dependent variable and several independent variables (Ozdemir and Altural 2013).In its simplest form, the logistic regression model can be represented as follows: The logistic regression model can be expressed as having an estimated probability, denoted as P, of an event occurring.The probability varies between 0 and 1, forming an S-shaped curve, since parameter Z can range from À1 to þ1.The parameter Z is defined as: where Z is the linear combination of the intercept (B 0 ) and the independent variables (X i ) weighted by their respective coefficients (B i ).Based on Eqs. ( 3) and ( 4), the logistic regression can be denoted as: (5) Mahalanobis distance statistics (D 2 ) The landslide susceptibility model was calibrated using the Mahalanobis distance.The model was first used in the distribution of species modelling by Clark et al. (1993).
Here, the Mahalanobis distance measures the dissimilarity between the average landslide characteristics in each cell.The average of the landslide characteristics has been estimated based on the geographical distribution of landslide occurrence.Assuming multivariate normalcy, D 2 has a v 2 distribution with n degrees of freedom, where n ¼ total number of predictors and the model outcome can be converted to probability values.According to Erdas (ERDAS Inc 1999), the Mahalanobis distance has been calculated using Eq. ( 6) at each cell.
where D 2 represents the Mahalanobis distance.The x represents a vector of landslide characteristics associated with the predictors of each spatial cell.The m is an average vector of landslide characteristics recorded at each cell of present landslide locations and T specifies the transpose of a matrix.The C À1 represents the covariance matrix for the vector of landslide features.The 'mahal' function of the 'sdm' package in R (Naimi and Ara ujo 2016) has been used for the present Mahalanobis D 2 model construction.

MaxEnt model
The Maximum Entropy (MaxEnt) model is derived on the basis of the entropy maximization principle of statistical physics (Banavar et al. 2010).This is established from the information-theoretic approach (Ruddell et al. 2013) applicable to predict spatiotemporal patterns of landslide occurrence.It is pattern-oriented and was initially used within ecosystems to predict species distribution using species occurrence data (Grimm et al. 2005;Phillips and Dud ık 2008).It has subsequently acquired popularity in the same field (Elith et al. 2011).However, MaxEnt has not been explored extensively for landslide susceptibility modelling (Felic ısimo et al. 2013).MaxEnt detects the potential set of predictors for which a pattern might emerge.Therefore, sets of predictor variables and landslide occurrence data can create the most 'susceptible' conditions in this case.Here, ten categorical variables were used for predicting landslide-susceptible conditions using MaxEnt.In general, the MaxEnt is based on information theory and statistics.We used MaxEnt default parameters to estimate probability values of landslides ranging from 0 to 1.The proportional influence of independent predictor variables and categories within each predicted variable to the MaxEnt model was also calculated.The function 'MaxEnt' of the 'dismo' package of the R statistical platform has been utilized to present MaxEnt model construction (Hijmans et al. 2017).MaxEnt defines the landslide distribution as a probability distribution that allocates importance weights to the sample area's locations (X) (Felic ısimo et al. 2013).Independent variables, such as the existence or lack of landslide sites, both are included to evaluate the landslide probability distributions.Then, using the maximum entropy rule, extend the even distribution function to the most likely sites (Felic ısimo et al. 2013).The MaxEnt modelling primarily aims to find the Gibbs distribution, which maximizes the probability of reprimanded log-likelihood and can be calculated using Eq. ( 7) (Phillips and Dud ık 2008;Park 2015).
Here, b j represents the normalization constant for the variables included for the jth features.The first term of Eq. ( 5) denotes the log probability function, maximizing the model value and improving data accuracy.In contrast, the second term represents the value of normalization.Following that it is evident that the Gibbs distribution function through MaxEnt modelling provides the best fit of data.

Support Vector Machine
The Support Vector Machine (SVM) algorithm is a theory-based universal constructive learning technique.It can handle classification and nonlinear regression difficulties by converting the input variables into a large-dimensional space in the form of an inner product derived from positive definite kernel functions (Yao and Dai 2006).The performance on several learning problems, including classification, regression, and novelty detection, has enhanced the popularity of the SVM (Karatzoglou et al. 2006).The model was first used in species distribution modelling by Guo et al. (2014) and later became more popular in several other sectors.The model was used in a few studies for landslide hazard modelling (Yao and Dai 2006;Yilmaz 2012;Pradhan 2013), of which Yao and Dai 2006 explained the use of the SVM algorithm in landslide susceptibility modelling.Here, ten categorical variables were used for predicting landslide-susceptible conditions using SVM.The function 'ksvm' in package 'kernlab' of the R has been utilized for the present SVM model construction.This study included Radial Basis Function (RBF) linear kernel to prepare the SVM model.The objective of the SVM classifier is to determine the best-separating hyperplane from the given tanning sets that can discriminate between two classes: occurring landslide regions and non-occurring landslide regions.A hyperplane can be defined using Eq. ( 8) in the case of linearly separable data: where 'w' stands for a vector coefficient that estimates the hyper plane's orientation in the feature space, 'd i ' is the positive slack variables, and 'b' is the offset of hyperplane from the origin (Cortes and Vapnik 1995).An optimal hyperplane leads to get the solution using Lagrangian multipliers as given in Eqs. ( 9) and (10).

Minimize
In the above equation, a i denotes Lagrange multipliers, C represents the penalty, and slack di allows for the penalized violation of the constraint.A decision function that classifies the new data can be described as Eq. ( 11).
Model performance Here we used the Receiver Operating Characteristic (ROC) Curve is used to analyse the outcome of statistical models by estimating their accuracy (Williams et al. 1999).This gives a diagnostic for distinguishing between two independent events and evaluates the performance of the classifier (Swets 1988).The curve represents the probability of getting an adequately expected event response (true-positive rate) versus the probability of receiving an erroneously predicted event response (false-negative rate or false-positive rate).The ROC Curve calculates the future chances of landslide occurrence, indicating the quality of the model, and it may be applied to evaluate the model prediction capabilities (Brenning 2005).Finally, the Area Under ROC Curves (AUC) was estimated, which is the basis for calculating the accuracy of the model.The AUC ranges between 0 and 1, of which 1 indicates the perfect model performance (i.e. an AUC value near 1 indicates good predictive power of the model).In contrast, an AUC value 0.5 indicates an inaccurate model (i.e. a value near the 0.5 value reflects the bad performance of the model) (Pradhan 2013).'sdm' and 'ROCR' packages of R were used for building all ROC plots and calculating AUC values (Naimi and Ara ujo 2016).

Ensemble model and its binary classification
All models have achieved an AUC value > 0.85, indicating the high reliability of the susceptibility maps.Rather than relying on a single 'best' model, some authors, such as Thuillier et al. (2003), have advocated using multiple models and ensemble forecasting.The ensemble model performance may not lose any place (in reality) where there is a high probability of landslide susceptibility or vice versa (Hong et al. 2018).
There are two main reasons to use an ensemble over a single model for better results.The 1st one is better generalization performance, and the 2nd one is its robustness which is the spread or dispersion of the predictions and model performance (J.Brownlee 2021).Therefore, the output maps can be used for the plan of risk mitigation enforcement by decision-makers.To get appropriate model averaging, all raster outcomes of models were re-scaled between 0 and 1, and the AUC weighted mean model explained in the sdm package in R (Naimi and Ara ujo 2016) was used to constructed using the feature tool of raster calculation in ArcGIS 9.3 geospatial software.
Although the gradient of the sensitivity map (value range: 0-1) provides more information, it is more convenient for decision-makers to visualize the entire landscape in a few different categories (e.g.susceptible, and safe zones).It is critical to note that a continual scale is frequently inaccurate for landslide hazard management because, in a natural environment, the susceptibility index may not be proportionate to the probability of a landslide (Hirzel et al. 2006).Steps or exponential patterns can be seen in real curves.As a result, a curve showing the projected frequency ratio of evaluation points may produce more relevant findings.This curve gives an objective criterion for determining thresholds to reclassify susceptibility intensity into two or more groups and the accuracy of the landslide susceptibility map.
As suggested by Hirzel et al. (2006), the average model probability map was subsequently categorized into ten classes (with probability intervals of 0.10), and predictedto-expected ratios (F i ) for each category were determined using Eq. ( 12): where p i is the expected frequency of assessment points in class i, and E i denotes the expected frequency expressed as the relative area covered by each class.The Ei was plotted against class intervals.The susceptibility map was reclassified into two major zones (Safe zone and landslide susceptibility zone) by selecting Fi curve threshold points.Fi ¼ 1 denotes a random model.This point is considered the threshold between the safe (F i < 1) and landslide-susceptible (F i > 1) zone.This predictive power of the landslide susceptible map has been visualized by the Boyce Index, which ranges between À1 and þ1, where positive, zero, and negative values of the index indicate good predictive power of the landslide susceptibility, random model, and incorrect model, respectively (Hirzel et al. 2006).The binarized map was further validated with ground-truthing data.The percentage of landslide susceptible areas within each eco-climatic zone was also computed.

Observed pattern of landslide occurrence
The majority of the landslides (96.1%) were confined to <3000 m asl.The landslides occurrences were much more frequent in areas having > 20 slopes and having south (21.82%), south-east (21.23%, east (16.51%), and south-west (15.73%) aspects, of which eastern slopes were affected the most.The observed landslides had manifested primarily in scrubland (35.08%), followed by forested land (26.47%) and least represented in snow regions (2.25%).Likewise, incidences of landslides had a high correlation with proximity to water sources.Landslide occurrences were high (38.39%)within 50 m distance of water sources, followed by the area of 50-100 m distances of water (34.00%) and 100-150 m distances of water sources (16.27%).Also, there was a high correlation between the distance to roads and landslides.Among NDVI classes, the landslide has not occurred within the water area, whereas other classes of NDVI have evidence of landslide occurrences.Deep soil-containing area was the least preferred, whereas the landslide phenomenon mostly preferred shallow and moderately deep soil-containing areas.Loamy with modified erosion and loamy-skeletal on steep sloppy regions (6.39%) were selected mainly by landslide occurrences.Detailed percentage frequency and preferences of landslides regarding nine independent predictor variables are available in the supplementary material (Supplementary Figures 2-11).

Model outcome
The Pearson correlation coefficients (r) among predictor variables show that none of the ten used predictor variables has a strong positive or negative correlation among themselves (Figure 4).Therefore, all predictors were used for several landslide susceptible model buildings.There is a clear difference among the characteristics of presence and absence point locations.The density plot of the predicted values by presence and absence data are depicted from the density plot and box plots (Figure 5).Results were analyzed with six different combinations using a generalized linear model (Table 1).Relative model weights were determined using a performance-based criterion with AUC.Also, the Akaike information criterion (AIC) is used to predict error; thereby, the relative quality of statistical models to incorporate information-based criteria of each model can be estimated.Coefficients of each variable contribution on the best-fitted GLM model have been given in Table 2.All ten variables used in the GLM model are considered best-fitted models, as can be seen from Table 2, Figure 6, of which the AIC value is lowest, and the AUC value is highest among all other models.The SVM model is shown in Figure 6.The models derived from Mahalanobis D 2 and MaxEnt have been provided in Figure 6.The independent variable contribution and influence of each category of different predictors in the MaxEnt model are shown in Figure 7.It can be seen from Figure 7 that soil texture, geology, and LULC contributed the highest among all predictor variables.In contrast, slope and NDVI have the most negligible influence in the built landslide susceptible model using MaxEnt.

Model performance
The combination of training and testing presence data, followed by random partitioning, resulted in higher AUC values for all models (without adjustment) (Figure 8).The comparisons between landslide occurrence and randomly generated point occurrences have an AUC of 0.5, rejecting the null hypothesis of random landslide distribution.Among all four models, the Mahalanobish D 2 model performance was best (AUC ¼ 0.98), followed by the MaxEnt (AUC ¼ 0.91), SVM (AUC ¼ 0.91) and the GLM (AUC ¼ 0.85) (Figure 8).

Ensemble model
The landslide susceptibility map derived ensembling (i.e.combined model outcome) revealed that most landslide-susceptible zones fall at the middle elevational range of the landscape (Figure 9).The susceptible zone further extends towards the southern region and beside the rivers.The map also clearly shows that higher elevational ranges of the landscape have the most negligible occurrence of landslides.Fi value of the predictive map ranges between 0 and 3.2.Boyce Index (Spearman correlation coefficient r: 0.94, p < 0.001) described the good predictive power of the ensemble susceptibility map.It is revealing that 10.5% area of the landscape has susceptible conditions for future landslides, whereas 89.50% of the landscape has been predicted as a safe zone.Further validating the binarized map with ground truthing data shows out of 154 locations, 148 (96%) accurately fall within the risk zone.

Discussion and conclusion
The present study aimed to develop a landslide susceptibility model for the KSL-India, using four different models: GLM, SVM, Mahalanobis D 2 , and MaxEnt.Though GLM (Saha et al. 2022;Brenning 2005), SVM (Huang and Zhao 2018) and MaxEnt (Javidan et al. 2021) was used in several regional level analysis globally but the potential of Mahalanobis D 2 has not been explored yet globally except few (Rabby 2023;Tsangaratos and Benardos 2014) and the model is used first time in the Himalayan landscape to assess landslide susceptibility.Interestingly, the results of the study showed that the Mahalanobis D 2 model performed the best, followed by the MaxEnt and SVM, while the GLM model performed the worst, suggesting use of the Mahalanobis D 2 in future for regional landslide modelling particularly at western Himalayan landscape, which can inform better decision-making and risk management strategies.
The study also aimed to create an ensemble model of the four individual models to obtain a more robust and accurate prediction of landslide susceptibility as ensemble frameworks can reduce both bias and variance and avoid overfitting problems against base classifiers to improve their predictive capability (Kuncheva 2004;Kadavi et al. 2018).The ensemble model had a high predictive ability, with 96% of the ground truthing data falling within the risk zone.To keep the modelling simplicity, we applied weighted AUC ensembling approach, there are more complex ensembling approaches explained in Kadavi et al. (2018) and Thuiller et al. (2009) could be explored for future landslide susceptibility research in the fragile Himalayan landscapes to identify more precise landslide susceptible zones.
It is important to consider the practical application of landslide models and how they can inform decision-making for effective mitigation strategies.While landslide models typically produce maps on a continuous scale, it may not be immediately useful for land managers and decision-makers.Many studies have addressed this issue by reclassifying the maps into 2-5 classes, such as high, medium, and low susceptibility areas.However, such classifications may not accurately delineate only landslide susceptible areas and may not be statistically sound (Shano et al. 2020).In this study, first time, we used the continuous Boyce index not only to evaluate the performance of the ensemble landslide susceptibility, but also reclassified the map meaningfully into safe and risk prone areas.The monotonic increase in predicted-to-expected ratios (Fi) values through Boyes Index indicates that the ensemble model has a high predictive ability.For the Fi ¼ 1 in the curve was used as a threshold between safe (Fi 1) and risk-prone (Fi > 1) zones.
Landslides are geo-environmental severe hazards in the Himalayas, causing many life and property losses.Limited studies are carried out to assess the characteristics of landslides (Sarkar et al. 1995;Mehrotra et al. 1996;Hewitt 1998;Hasegawa et al. 2009), structural and geomorphological features of landslide hazards (Dunning et al. 2009).The relationship between forest and landslide event (Haigh et al. 1995), the effect of the earthquake on land-sliding (Khattak et al. 2010), anthropogenic influences on landslide (Barnard et al. 2001), the effect of precipitation on landslide occurrence (Dahal et al. 2008;Dahal and Hasegawa 2008).Few papers decipher associated risk in landslide occurrence (Anbalagan and Singh 1996;Bhasin et al. 2002;Owen et al. 2008) in several pockets in Himalayan regions.Our study showed that landslides were mainly confined to elevations below 3000 m asl, with a high occurrence on slopes greater than 20 , particularly on south, southeast, east, and southwest aspects.The eastern slopes were the most affected, and landslides occurred primarily in scrubland and forested land, while snow regions were the least affected.Proximity to water sources was also found to be a significant predictor of landslide occurrence, with a high incidence of landslides within 50 m distance of water sources.Moreover, the study found a high correlation between the distance to roads and landslide occurrences.The findings of the study have significant implications for land-use planning and management in the KSL-India.For example, >20 slopes besides roads can be the priority for locally restoration projects.In addition, the high incidence of landslides within 50 m distance of water sources suggests that development activities should be restricted in these areas to minimize the risk of landslides.Furthermore, the high correlation between the distance to roads and landslide occurrences highlights the need for careful road construction and maintenance to minimize the risk of landslides.A similar pattern of landslide occurrences has also been reported by Sarkar et al. (2011) and Pham et al. (2017) from the study in the middle of the Pauri Garhwal, Uttarkashi, and Tehri Garhwal Districts of Uttarakhand Himalaya.However, the study has some limitations that need to be considered when interpreting the results.The study used only ten predictor variables, and other important variables that may affect landslide susceptibility, such as rainfall intensity, were not considered.
In conclusion, the study developed a landslide susceptibility model for the KSL-India, that is essential tools for land managers because it can help predict areas where landslides are likely to occur.By identifying areas at risk, land managers can develop mitigation strategies to minimize the potential impact of landslides.For example, they can establish early warning systems, implement land use zoning to restrict development in high-risk areas, and undertake engineering measures to stabilize slopes and other factors.Moreover, the study's ensemble model had a high predictive ability, therefore, it can help in disaster risk reduction planning and decision making.Land managers can use the information from landslide susceptibility models to prioritize areas for monitoring and targeted risk reduction measures.The developed model may reduce the potential loss of life and property damage due to landslides.

Figure 1 .
Figure 1.Geographical location of Kailash Sacred landscape (KSL) in the state of Uttarakhand, India.

Figure 2 .
Figure 2. Flow chart of research methodology to identify landslide susceptible/ risky zone at Kailash Sacred landscape (KSL).

Figure 4 .
Figure 4. Measure of pair-wise linear correlogram using Pearson correlation coefficients (r) amongst selected relevant geographical and environmental predictors/factors.

Figure 5 .
Figure 5. (A) Distribution density of both landslide presence location (red) and randomly generated absence location (blue).(B) Distribution of both presence (red) and absence (blue) locations represented using box plots.

Figure 9 (
Figure9(A) depicts a landslide susceptibility map in Kailash sacred landscape, Uttarakhand, India, whereas Figure9(B) Boyes Index with Probability of Landslide Occurrence in which the top left plot depicts predicted-to-expected ratios (Fi) of evaluation points against ten landslide probability bins starting from high to low.The monotonic increase in Fi values indicates that the map has a high predictive ability.The solid horizontal line (Fi ¼ 1) is the curve of a random model, which was used as a threshold between safe (Fi 1) and risk-prone (Fi > 1) zones.Supplementary Material 2 contains information about the proportion of areas at risk of landslides within different eco-climatic zones.

Figure 7 .
Figure 7. (A) Different variable contribution in MaxEnt model, (B) variable response in MaxEnt model.

Table 2 .
Individual predictor variable performance in the best-fitted model generated from the generalized linear modeling method.