Assessment of landslide susceptibility for Meghalaya (India) using bivariate (frequency ratio and Shannon entropy) and multi-criteria decision analysis (AHP and fuzzy-AHP) models

ABSTRACT The main goal of the present study is to generate the GIS-based landslide susceptibility map (LSM) of Meghalaya, India. For this purpose, two bivariate statistical (FR and SE) and multi-criteria decision analysis (AHP and Fuzzy-AHP) methods are utilised. The study area is situated over the Shillong Plateau in the North Eastern Region of India and is subjected to numerous landslides due to heavy rainfall and seismic activities. To obtain the LSMs, a landslide inventory of 1330 events and 15 conditioning factors are prepared. The inventory dataset is randomly split into a 70/30 ratio to make training and testing datasets for the study. The results show that the southern escarpment, the southeast region and the hillslopes along the roadsides are highly susceptible to a landslide. The LSMs are validated using the area under the curve (AUC) values and F1-scores, in which AHP (AUCAHP = 0.913; F1-score = 0.884) shows the highest prediction accuracy for Meghalaya. The generated LSMs can help in landslide risk mitigation strategies and sustainable infrastructure development.


Introduction
Landslides are one of the most frequently occurring natural hazards and have proven to be disastrous by causing massive damage to infrastructure, human settlements and loss of lives worldwide. According to the International Disaster Databases (EM-DAT), landslides account for 5% of total natural disasters globally over the last two decades, the fifth highest after floods, storms, earthquakes and extreme temperatures (Mizutori & Guha-Sapir, 2020). Additionally, between 2004 and 2016, 75% of fatal landslide events (3642 out of 4862) occurred in Asia alone (Froude & Petley, 2018). In Asia, India is the second most affected country by this disaster after China, as per the Centre for Research on the Epidemiology of Disasters (CRED; Guha-Sapir et al., 2012). Approximately 12.6% of the Indian landmass (about 0.42 M km 2 ) is prone to landslides and accounts for about 8% of global fatalities due to landslides (NDMA, 2019). The landslides are more frequent in the Himalayan regions than in the rest of India and account for about 80% of total landslide events in the country .
The Himalayas are the youngest fold mountains, risen due to the interaction of the Indian and Eurasian plates with the northward movement of the Indian plate, causing significant stress on rock mass and making them friable, unstable and prone to earthquakes and landslides (NDMA, 2019). The torrential rains in the monsoon season further aggravate the problem of slope failures in the region. In combination with climatic and tectonic activities, the increased anthropogenic activities are also a major causative factor for this hazard (Balamurugan et al., 2016). To reduce the adverse impact of landslides and associated losses and for landslide management of a region, the landslide susceptibility studies (LSS) at a regional scale are proven to be an effective tool (Kanungo et al., 2006;Pourghasemi et al., 2012a). The susceptibility studies tell the probability of an event's occurrence in the considered area under various causative factors (Gudiyangada Nachappa et al., 2020). The outcome of such studies is in the form of landslide susceptibility maps (LSM) which show the spatial distribution of different susceptibility classes and locations with high risks (Chen & Li, 2020). However, the reliability of the LSM depends upon the selected conditioning factors, historical landslides, quality of data and the applied methodology for the analysis and modelling (Sarkar & Kanungo, 2004). The conditioning factors are associated with topography, geomorphology, geology, land use land cover (LULC), anthropogenic activity, rainfall, seismicity, etc. (Shano et al., 2020) and are responsible for slope failure. The relation of these factors with past landslides forms the basis for estimating future susceptibility to landslide occurrence (Chimidi et al., 2017).
In recent times, with the use of geographic information systems (GIS) and remote sensing techniques, several LSS have been carried out worldwide (e.g. Sarkar & Kanungo, 2004;Pradhan & Lee, 2010;Pourghasemi et al., 2012aPourghasemi et al., &, 2012bSur et al., 2016;Ghasemian et al., 2022). Broadly, these landslide susceptibility models can be divided into qualitative (knowledge-based), quantitative (physical or statistical-based), and semi-quantitative (a hybrid of qualitative and quantitative) approaches (Shano et al., 2020). The qualitative approach includes geomorphic and landslide inventory techniques. For comparison, the semi-quantitative methods include both the past landslide data and an indirect process involving multicriteria decision analysis (MCDA) techniques based on expert judgement to evaluate the weightage of different thematic data layers (Yilmaz, 2009). The most popular semi-quantitative approaches include the analytical hierarchy process (AHP; Kayastha et al., 2013;Zhao et al., 2017), weighted overlay (Pandey et al., 2008;Shit et al., 2016) and fuzzy logic-based methods (Pham et al., 2021;Balamurugan et al., 2016;Ercanoglu & Gokceoglu, 2004;Pourghasemi et al., 2012a;Sur et al., 2016). The quantitative approaches rely upon historical landslides and their spatial correlation with each conditioning factor to derive the weight of these factors. Frequency ratio (Chimidi et al., 2017;Gudiyangada Nachappa et al., 2020), Shannon entropy (Roodposhti et al., 2016;Zhao et al., 2017), weights of evidence (Chahal et al., 2017;Nohani et al., 2019), logistic regression , neural networks (Kanungo et al., 2006), random forest (Pham et al., 2021), support vector machine (Kavzoglu et al., 2014) and hybrid approaches (Chen & Chen, 2021) are the most frequently utilised quantitative methods for LSSs in the literature. The merits and limitations of various LSS models have been recently reviewed by Shano et al. (2020). Although there is abundant literature on susceptibility studies worldwide, no single approach can be considered ideal. Therefore, for landslide susceptibility mapping, selecting an appropriate model depends on the nature, scale, quality and data availability for the assessed area of interest .
The present study aims to compare the most popular bivariate statistical methods (frequency ratio and Shannon entropy) and semi-quantitative MCDA approaches (AHP and Fuzzy-AHP) for landslide susceptibility mapping and get an insight into their prediction capabilities in terms of accuracy in vulnerable hilly regions such as the Himalayas. There are many LSSs available for the western and central Himalayan regions of the Lesser andShivalik Himalayas (2021Pham et al., 2019;Kayastha et al., 2013;Mathew et al., 2009;Sarkar & Kanungo, 2004;Sur et al., 2016). However, such studies for the Eastern Himalayas in India's north-eastern region (NER) are minimal (Dikshit et al., 2020;Nerella et al., 2019). For example, Pandey et al. (2008) constructed the landslide hazard zonation map of Dikrong river basin of Arunachal Pradesh in the NER of India using the weighted overlay method. They reported slope, rainfall, complex geological setting and human interference as the major causative factors in the region. Ghosh and Carranza (2010) applied the evidence-belief-function model to generate a predictive map of rockslides at a local scale in Darjeeling in the Eastern Himalayas (India). For the same region, the index of entropy approach is utilised by Mondal and Mandal (2019). It identifies soil type as a major governing factor in the landslide susceptibility assessment of the region. Balamurugan et al. (2016) have utilised frequency ratio (FR) and fuzzy gamma operator models for landslide zonation of part of national highway-39 in Manipur, NER of India. Similarly, Chanu and Bakimchandra (2022) performed a landslide susceptibility assessment along a major highway in Manipur, NER of India, using the AHP model. In the NER of India, such studies concentrate on assessing susceptibility either along a road network or at a local scale, and region-specific LSS is lacking.
In this study, the landslide susceptibility map of Meghalaya, situated on the Shillong Plateau in the NER of India, is presented using bivariate and MCDA methods. Meghalaya has numerous faults, shear zones and other tectonic features. Heavy rainfall, high seismicity and numerous tectonic features make the area susceptible to hazards like landslides. The region in the recent past has experienced several such events, e.g. the Sonapur landslide in the East Jaintia Hills district, the landslide in the Jowai area in the West Jaintia Hills district and the Mawlai-Umjapung landslide in the East Khasi Hills district of Meghalaya (GSI). In 2020 alone, the considered study region has experienced more than 20 landslide events (NERDRR, 2020). Among several landslides, the Sonapur landslide, which was first activated in 1988 due to the 1987 Cachar earthquake, is still getting triggered during monsoon, causing large-scale destruction (Umrao et al., 2016). Therefore, to understand the severity of the landslide scenario and the influence of various factors governing it, evaluating landslide susceptibility at the regional scale and mapping for Meghalaya becomes indispensable. The details of the study area, various conditioning factors applied, methodology and results obtained are discussed in the following sections.

Description of the study area
Meghalaya, the abode of the clouds, is one of the seven sister states in NER of India and the rainiest place in the country, covering about 22,400 km 2 area (between 89° 49' 15" E to 92° 48' 14" E and 25° 1' 51" N to 26° 7' 5" N, Figure 1). The area has received an average annual rainfall of 1234-7467 mm over the last 30 years (Indian Meteorological Department). Rolling Shillong Plateau with a southern escarpment is the region's prominent landscape feature. It shares its boundary with Assam in the north and east while forming an international border with Bangladesh in the south and west. The elevation profile of the study area varies from 7 to 1962 m above the mean sea level with sudden rise along the southern escarpment within a short distance, forming a steep slope in the south of the study region. In the study area, the slope ranges from 0° to 76°.
The study region is also characterised by high seismicity due to complex seismotectonic and lithological settings. The area is influenced by active tectonic deformation (Agrawal et al., 2022). E-W trending Dauki Fault along the southern boundary, Oldham Fault in the north, N-S trending Dhubri Fault in the western portion, Barapani Shear Zone, Kopili and Umrangso Faults in the eastern and northeast parts of the study area are some of the major tectonic features. The area is covered by various lithologic formations, including Proterozoic (Paleoproterozoic, Mesoproterozoic) (Pr), Late Carboniferous-Permian (LcP), Mesozoic (Jurassic, Cretaceous) (Ms), Palaeogene (Oligocene, Eocene, Palaeocene) (Pl), Neogene (Miocene, Pliocene) (Neo) and Cenozoic (Holocene, Quaternary, Meghalyan, Middle-late Pleistocene) (Cn) types of formations (Figure 2(a)). Proterozoic formation covers approximately 51% of the area in the region (Table 1) and can be considered of the highest strength against deformation (Agrawal et al., 2022). On the other hand, sandstone, siltstone, conglomerates of a different era and recent Quaternary sediments are medium to low strength and cover about 37% of the study area (Table 1). These lithological units are primarily concentrated in the southern portion of the region. Concerning land use land cover, most of the study area is covered by dense vegetation (76.06%), followed by light vegetation (17.25%), human settlements and built spaces (3.22%), agricultural land (2.96%), water bodies (0.45%) and rock outcrop and bare lands (0.05%). These topological, geological and other geoenvironmental factors and increased anthropogenic activities make the study area prone to disastrous events like landslides.

Data collection
The present study collects data from several sources, such as the Bhukosh-Geological Survey of India (GSI) (https:// bhukosh.gsi.gov.in/Bhukosh/MapViewer.aspx) for the construction of landslide inventory, geomorphology

Landslide inventory
The prediction accuracy of the LSM primarily depends upon the accuracy of the inventory of the past landslide data (Reichenbach et al., 2018). Landslide data points are collected from the Bhukosh-GSI and Google-Earth images. A sum of 1330 landslides is obtained and mapped to produce the inventory map, as shown in Figure 1. The area of mapped landslides varies from 100 m 2 to 124,319 m 2 . As landslides with a size of less than one cell (10 m × 10 m) cannot be drawn, the minimum area is fixed at 100 m 2 , and landslides equal to or larger than this size are considered for the present LSS. The identified landslides are mostly rainfall-induced, followed by road constructions and other anthropogenic activity, and very few are associated with seismic activity. The failure mechanism of these landslides was either shallow rotational or translational failure with debris and rock-cum-debris movement (GSI). Finally, the landslide inventory data are randomly distributed in a 70/30 ratio to create the training and testing dataset, respectively (Gudiyangada Nachappa et al., 2020). The training dataset (at 933 locations, 70%) is used to build the model, and the testing dataset (397 sites, 30%) is used to validate the model.

Landslide conditioning factor
Landslides are often governed by many factors related to topography, hydrology, geology and human activities. These factors are known as landslide conditioning factors and are crucial pillars of any GIS-based LSS (Sarkar & Kanungo, 2004). Based on the previous studies and regional geological-geomorphologicalenvironmental characteristics, 15 landslide conditioning factors such as slope, elevation, aspect, plan curvature, distance from river, roads and faults, topography wetness index (TWI), stream power index (SPI), land use, normalised difference vegetation index (NDVI), rainfall, soil texture, lithology and geomorphology are considered in the present study (Pourghasemi et al., 2012a;Pradhan & Lee, 2010).

Slope (degrees), aspect and elevation.
The slope angles directly impact the landslides. With the increase in the slope angle, the effect of stress and gravity on the surface-forming material increases (Sarkar & Kanungo, 2004;Yilmaz, 2009). The amount of sunshine and rainfall received are affected by the slope aspect, which describes the direction of the slope face. It impacts the wetness index, weathering conditions and land cover (Ercanoglu & Gokceoglu, 2004). On the other hand, elevation influences landslides indirectly by affecting rainfall, surface-forming material and geological properties, land use/cover and tectonics Mathew et al., 2009). Figure 2(b-d) shows the slope, aspect and elevation map of the Meghalaya derived from the corrected DEM using the spatial analyst tool in ArcGIS 10.8 in the present study. The slope layer is classified into five classes, the aspect layer into nine categories and the elevation layer into eight, as shown in Table 3.

Plan curvature.
The plan curvature is also derived from DEM using the spatial analyst tool, shown in Figure 2(e). Curvature influences the surface erosion processes, especially during rainfall, by either converging or diverging the downhill flow and thus becomes one of the critical factors controlling the landslide (Gupta & Dixit, 2022; Ram & Gupta, 2022). The plan curvature is classified into three classes: concave (<-0.05), flat (−0.05-0.05) and convex (>0.05).

Stream power index (SPI) and topographic wetness index (TWI). SPI is a topographic factor that
reflects the erosive power of streams in any catchment assuming the discharge is proportional to a specific catchment area (A S ). It can be calculated by equation (1; Moore et al., 1991).
Where β is the local slope (degrees). TWI is another topographic factor frequently used in LSS, suggesting the tendency of water to accumulate at any point in the catchment and the tendency of movement of water along the slope under gravitational forces (Pham et al., 2021). Water accumulation at any point can affect the stability of the slope, depending on the surface-forming material and its effect on the geotechnical properties like permeability, pore water pressure and shear strength (Yilmaz, 2009). It can be defined by equation (2).
Where a is the upslope catchment area, and tan β is the surface slope. The SPI and TWI maps for the LSS assessment are prepared using SAGA GIS tools in QGIS and classified into five classes, shown in Figure 2(f-g).

Distance from the river.
Distance from the river is inversely related to landslides, as the closer the river, the more the chance of the slope being unstable. The proximity to streams increases the soil moisture and erodes the slope's toe, which increases the susceptibility to slope failure (Pourghasemi et al., 2012a). The stream network map is derived from DEM data in ArcMap using the hydrology-spatial analyst tool and classified into five different buffer zones from the river at a 150 m distance (Figure 2(h)).

Distance from road. An anthropogenic
activity like road construction alters the natural slope of the hilly area and increases the slope instability. In the past, numerous landslides have occurred in the vicinity of roads either constructed or under construction Roodposhti et al., 2016). The road network data (primary, secondary and tertiary roads) is collected from the OpenStreet map (https:// www.openstreetmap.org/export). Finally, the area is divided into five different buffer zones from the roads at a 150 m distance (Figure 2(i)).

Distance from fault. Fault represents struc-
tural discontinuities with reduced rock strength, making the area vulnerable to landslides (Chen & Li, 2020). In this study, major structural discontinuities are obtained from Bhukosh-GSI and buffered into five zones at 1000 m distance intervals (Figure 2(j)).

Land use land cover (LULC).
Any region's land use land cover (LULC) indirectly represents exposure to surface erosion and influences slope stability. The bare land and built space have positively impacted landslides in the past. The present study uses a global LULC map derived from Sentinel-2 imagery at 10 m resolution by Esri (Karra et al., 2021). The map has 10 land use classes: water, trees (forested area/dense vegetation), grass, flooded vegetation, crops, shrub, built space, bare ground, snow/ice and clouds. The LULC map is extracted by mask for the study area, and classes like grass and shrub are grouped into a single category named light vegetation. In contrast, flooded vegetation and crop are grouped into agricultural land (Figure 2(k)). Furthermore, the accuracy assessment of reclassified LULC map is done through randomly generated 300 points falling under different land-use classes ( Table 2). The overall accuracy is 85.33%, while the kappa coefficient (k) value is 0.824. The value of k > 0.8 shows that the used map is reasonably accurate (Gupta & Dixit, 2022).

Normalised difference vegetation index.
Normalised difference vegetation index (NDVI) is an indicator of green cover over an area and the health of the biomass. Higher NDVI values indicate more vegetation cover, and a healthy vegetation cover offers higher slope stability and reduces landslide probability (Nohani et al., 2019). The NDVI is obtained from near infrared (NIR) and red band (RED) by using equation (3). For this purpose, Sentinel-2 multispectral imagery with 10 m resolution is utilised, and obtained NDVI values are grouped into six classes, as shown in Figure 2(l) and Table 3.

Rainfall.
Precipitation, especially rains, is one of the foremost reasons for landslide occurrence on hill slopes. The amount of rainfall and its intensity depend upon the topographical characteristics and vary spatially and temporally, influencing the landslide susceptibility in the region. However, the influence of rain on landslides is governed by the slope-forming material, land cover and lithology (Kayastha et al., 2013). In the present study, rainfall data for the years 1991 to 2020 is collected from the India Meteorological Department, and the annual mean of the past 30 years` precipitation is considered to create the rainfall map in the GIS environment (Figure 2(m)).

Soil texture. The topsoil characteristics are
another essential factor affecting the landslide susceptibility of an area (Sarkar & Kanungo, 2004). Properties like permeability, cohesion and others are influenced by soil texture, affecting the nature and intensity of slope movement (Ghasemian et al., 2022).
In the present study, the soil map is derived from the FAO world soil map. The soil present in the area is mostly loam, sandy loam and clay (Figure 2(n)).

Geomorphology.
The geomorphology of an area also influences the landslide occurrence in the area and is considered in many susceptibility studies . A geomorphological map for the study area is obtained from the Bhukosh-GSI. The region is classified into seven geomorphological units (highly dissected plateau (HDP), moderate to low  dissected plateau (MDP), highly dissected hills and valley (HDHV), moderate to low dissected hills and valley (MDHV), pediment-pediplain complex (PC), alluvial-flood plain (AP) and water bodies (W); Figure 2(o)).

Lithology. The lithological units of an area
have varying rock strength profiles and permeability, thus, governing the susceptibility to failure. Therefore, in LSS, it is considered one of the essential factors (Pradhan & Lee, 2010). The lithological map of the study area is obtained from Bhukosh-GSI (at a scale of 1:2 M). The lithological formations are grouped into six classes depending upon the geological era, as mentioned in section 2 (Figure 2(a)).
All 15 landslide conditioning factors are transformed into a spatial resolution of 10 m before using the LSS.

Frequency ratio (FR)
This approach suggests the possibility of a future event based on past information (Chimidi et al., 2017;Yilmaz, 2009). FR derives the spatial relation between landslide location (number of landslide pixels) and different landslide conditioning factors (Nohani et al., 2019;Pradhan & Lee, 2010;Shano et al., 2020) using equation (4). As it represents the possibility of occurrence, the greater FR value shows higher chances of landslide occurrence and higher corresponding hazard (Pradhan & Lee, 2010la).
Where FR i = frequency ratio of i th class, LS i = total landslide area (number of landslide pixels) in the i th class, LS = total landslide area (total number of landslide pixels) in the study area, A i = area falling under i th class (total number of pixels of i th class) and A = total area (total number of pixels of the entire map).
These FR values of different classes (Table 3) are then used to obtain the prediction rate (PR) of each factor which depicts the weightage of the individual class, using equations (5) - (7).
Where RF is the relative frequency, MAX(RF i,j ) is the maximum value of RF of j th factor, MIN(RF i,j ) is the minimum value of RF of j th factor, PR j is the prediction rate of j th factor.
The FR values obtained using equation (4) will act like the weight of each class (w ij, FR ), and PR j is converted to a percentage, which will be the weight of the j th factor, i.e. W j, FR . To generate the landslide susceptibility index (LSI) using the FR method, the weightage to each class of every landslide conditioning factor is given as per the corresponding FR values obtained by equation (4) and then integrated with the corresponding weight of each element.

Shannon entropy (SE)
Entropy is the quantitative measurement of a system's deviation, variability, instability and uncertainty and can predict the future trend of a specified system (Lotfi & Fallahnejad, 2010). The Shannon entropy has been widely used for the weighted index calculation in landslide and other hazard studies (Nohani et al., 2019;Pourghasemi et al., 2012b;Zhao et al., 2017). It analyses the dissimilarity in the system in LSS, demonstrating the potential for each contributing factor to cause a landslide. A higher SE index indicates a more significant impact of the factor on the landslide occurrence (Roodposhti et al., 2016). Equations (8) to (10) are used for the calculation of the information coefficient (weighted index) based on SE (Pourghasemi et al., 2012b;Zhao et al., 2017). Where FR = frequency ratio as defined in equation (4), P ij = probability density for each class, D j = entropy of the j th conditioning factor, m j = number of classes in the j th factor, n = number of landslide conditioning factors and W j,SE = entropy weight of each factor. Table 3 shows entropy weights obtained for all the conditioning factors. These weights are normalised and integrated with FR to produce the LSM.

Analytical hierarchy process (AHP)
The analytical hierarchy process (AHP) is a semiquantitative, multi-criteria decision-making approach developed by Saaty (2000), which breaks down a complex problem into a simple and subjective hierarchy. It involves problem definition, objective, alternatives, pairwise comparison matrix for weight determination and overall priority of the factors (or sub-factors) contributing to landslide (Saaty, 2008;Shano et al., 2020). In LSS, it is frequently utilised for assigning weightage to conditioning factors and sub-factors (Kayastha et al., 2013;Shahabi & Hashim, 2015).
In AHP, conditioning factors (or their classes) are arranged in the hierarchical order and assigned a numerical value subjective to judgement based on their relative importance, forming a pairwise comparison matrix A ½ � n�n . In matrix [A], the scale of assigned value can vary between 1 and 9 based on the degrees of preference of one factor (on the vertical axis) over the other (on the horizontal axis) ( Table 4). A higher value shows greater dominance of that factor. Similarly, these values can vary inversely (1/9 to 1) when the element on the horizontal axis is more dominant than that on the vertical axis (Table 4). In the present study, for assigning the degree of preference scale to a factor (or their classes), the relative percentage of area affected by landslide in that class category is used to make the judgement. Thus, it allows the consideration of 'previous knowledge' and reduces the bias in the scheme (Yilmaz, 2009).
Once the pairwise comparison matrix is constructed, the weights (values; e.g. A ij ) are normalised, and the criteria weights (CW), representing the weight of each factor (or of their classes), can be obtained using equations (11) and (12).
The final step is to calculate the consistency index (CI) and check for the consistency ratio (CR) using equations (13) and (14), respectively, which should be less than 0.1.

CR ¼ CI=RI
Where λ max is the principal eigenvalue, n is the order of the matrix and RI is the random consistency index that depends upon the order of the matrix (Table 5).
In the present study, for the pairwise comparison matrix of conditioning factors, the CR is equal to 0.049 (Appendix Table A1). Also, for the comparison matrix of classes of each factor, the CR value is less than 0.10 and ranges from 0 to 0.054 (Appendix Table A2).

Fuzzy-AHP (FAHP)
The fuzzy set theory, first developed by Zadeh (1965), deals with the vagueness of the system by moving away from crisp values and considering the concept of partial truth. To handle the fuzziness and impreciseness, the fuzzy sets are integrated with popular statistical and non-statistical methods (Emrouznejad & Ho, 2017;Groumpos, 2010;Salmeron, 2009) and has been widely used in various field like environmental impact assessment (Aryafar et al., 2013), hazard and risk assessment (Aghataher et al., 2008), palaeontology (Papadopoulou et al., 2021), susceptibility studies (Mallick et al., 2018;Roodposhti et al., 2014;Sur et al., 2016) and many more. Various FAHP methods are available in the literature, such as the fuzzy priority method of Laarhoven and Pedrycz (1983), Buckley's (1985) geometric mean method and Chang's (1996) extent analysis method.
For the present study, Fuzzy-AHP (FAHP), using a geometric mean method developed by Buckley (1985), is adopted. This method is computationally   Kahraman et al., 2003). It is an extension of AHP using the linguistic variables, and the steps involved are summarised below (Buckley, 1985): Step 1: Fuzzification Fuzzification is the conversion of a linguistic term into a membership function. A triangular membership function is shown in Figure 3. The parameters l 1 , m 1 and u 1 denote the lowest value, most likely value (middle value) and the upper value that forms a fuzzy value (µ A , e.g. μ A;11 ¼ l 1 ; m 1 ; u 1 ð Þ) and is called TFN (Kahraman et al., 2003).
Using TFN, a pairwise comparison matrix M ¼ μ ij � � is constructed (Appendix Table A3 Where μ ij ¼ l ij ; m ij ; u ij À � , i, j = 1, 2, . . ., n is TFN. Step 2: Calculation of fuzzy geometric mean value (r i ) for i th criteria Step 3: For each criterion, calculation of fuzzy weights (w i ) Where Step 4: De-Fuzzification The next step is to de-fuzzify the fuzzy weights, i.e. transform them into crisp values. There are different defuzzification methods such as the centre of gravity (COG), the centre of the area (COA), mean of maxima (MOM), extended COA and Decade algorithm (Runkler, 1997;Saade, 2000), and a detailed discussion of the respective methods are given by Runkler (1997). In the present study, we adopted the COA method of defuzzification (Emrouznejad & Ho, 2017) as follows: Where w i is non-fuzzy weights. The normalised de-fuzzified weights are obtained for both conditioning factors (W j,FAHP ) and their classes (w ij,FAHP ), as shown in Table 6.
Finally, the weightage corresponding to each class (w ij,k ) is integrated with the respective conditioning factors for all four models using the raster calculator in ArcGIS 10.8, and LSI is generated using equation (19).
Where W j,k is the weight of the j th conditioning factor using the k th model, and k is equal to 1, 2, 3 and 4 for FR, SE, AHP and FAHP, respectively. The LSI obtained using all four methods is classified into five susceptibility classes (very low, low, moderate, high and very high) based on the natural breaks classification system (Pourghasemi et al., 2012a), and susceptibility maps are produced using ArcGIS 10.8. Finally, the model validation is performed. The flow chart summarising the adopted methodology is represented in Figure 4.

Model validation
In susceptibility studies, model validation is a nondisposable step that suggests the effectiveness of the applied models. In the present study, the maps obtained from all four models are compared with the existing landslide data (true points) to determine the performance of each model. For this purpose, the testing dataset (30% of total landslide inventory) is utilised. To evaluate the models' performance, F-score statistics and area under the receiver operating characteristic curve (AUC) are applied (Chen et al., 2018;Mathew et al., 2009).
The F-score is a way to evaluate the accuracy of a model with binary classification systems (Ghasemian et al., 2022). F-score statistics capture the model's precision rate and sensitivity (recall) into a single measure. It is also known as F1-score when balance emphasis is given to both precision and sensitivity metrics in the calculation (Shahabi et al., 2021). The F1-score is a harmonic mean of these two measures (i.e. precision and sensitivity) and is calculated as  Where beta = 1, which balances weight on precision and sensitivity precision ¼ TP=ðTP þ FPÞ sensitivity ¼ TP=ðTP þ FNÞ. TP (true positive) is the number of landslide pixels that are correctly classified, FN (false negative) is the number of landslide pixels that are classified as non-landslide and FP (false positive) is the number of pixels that are associated with the nonlandslide event but classified as landslide pixels.
The receiver operating characteristic (ROC) curve allows the comparison of sensitivity (i.e. true positive rate) with false positive rate (i.e. 1 -specificity) in the output maps (Roodposhti et al., 2014). It is a probability curve generated by plotting sensitivity on the y-axis against the false positive rate on the x-axis. The ability of a landslide susceptibility model to differentiate between various classes is measured by the area under the curve (AUC), which is used as a summary of the ROC curve. The AUC value varies from 0.5 to 1, and the higher the value, the better the model will be distinguishing between positive (landslides) and negative (nonlandslide) classes. In general, AUC value is classified as excellent (>0.9), very good (0.8-0.9), good (0.7-0.8), average (0.6-0.7) and poor (<0.6; Chen et al., 2018).

Identification of most influential factors and their classes
In GIS-based LSS, it is essential to identify the relative influence of each conditioning factor and its classes on the event's occurrence. The weights corresponding to  each factor and their classes are calculated using FR and SE methods, listed in Table 3. The FR value shows a spatial correlation between factors and landslide inventory. It is assumed that the higher the FR, the more significant the influence of a particular factor on the landslide. In the present study, pixels with slopes equal to or greater than 30° have higher FR. In AHP and FAHP models (Table 6), the subcategory of 30°-40° and >40° slope also show a more significant influence than others (Appendix Tables A1-A4). In the case of FR and SE models, subfactor of bare land of LULC, clay of soil texture, Mesozoic of factor lithology, and areas with SPI > 1.2, TWI < 5, rainfall > 6100 mm/year in the study region are showing greater susceptibility for landslide than other class categories of the respective conditioning factors (Table 3). Among 15 conditioning factors, slope, LULC, TWI, SPI and lithology are the most influential factors as per the FR model ( Figure 5). In the SE model, along with these factors, soil texture also shows a significant influence on landslide occurrence (Table 3). Using AHP, conditioning factors, such as slope, rainfall, distance from road, lithology and LULC are found with a higher weight share than others ( Figure 5), while the distance from fault is found with the least weightage (Table 6, Appendix Table A1). In the FAHP model, the dominant landslide factors remain the same as in AHP ( Figure 5, Table 6).

Spatial distribution landslide susceptibility using selected models
The present study employs four susceptibility models, namely FR, SE, AHP and fuzzy-AHP, to develop the LSM of Meghalaya (Figure 6(a-d)). For this purpose, 15 landslide conditioning factors and landslide training datasets are used in the model construction. The result shows that the area under the southern escarpment and southeast portion of the study area has moderate to very high susceptibility to landslides in all four cases ( Figure 6(a-d)). According to the FR model, 2.17%, 5.98% and 13.10% landmass of the study area have very high, high, and moderate susceptibility to landslides, respectively (Figure 7). Approximately similar distributions of susceptibility classes are obtained using the SE model, which revealed that 2.07%, 5.38% and 10.87% of the area fall under very high, high and moderate classes, respectively. Whereas 4.01%, 12.04% and 26.85% of the area are identified with very high, high and moderate susceptibility classes, respectively, using the AHP model (Figure 7). For the FAHP model (Figure 7), 3.88% and 12.15% area (second largest after AHP) show very high and high susceptibility categories. In comparison, 27.35% area shows moderate susceptibility to landslides, the highest among all four models (Figure 7). Along with the southern escarpment and southeast region of the study area, these classes are also concentrated near the road network of the study area in the case of AHP and FAHP models (Figure 6(c-d)).

Validation of landslide susceptibility maps
The testing dataset (397 events; 30% of total landslide inventory) is utilised for performance evaluation and validation of produced susceptibility maps. Figure 8 shows the distribution of the number of landslides in each susceptibility class corresponding to each model. In the case of the MCDA approach (AHP and FAHP), more than 80% of landslides have occurred in very high and high susceptibility classes (Table 7). Less than 5% of the total landslide points are erroneously classified and fall into the low and very low classes using AHP and FAHP approach. However, for the bivariate approaches, approximately 60% (in the case of FR) and 65% (in the case of SE) of known landslide events occurred in the landslide susceptibility class of very high to high. The FR approach shows ~16% of events in the low (13.33%, 53 of 397) and very low (2.52%, 10 of    Figure 8). In comparison, ~19% of events occurred in low (14.35%, 57 of 397) and very low (4.78%, 19 of 397) susceptibility classes in the case of LSM produced by the SE approach (Figure 8).
The LSM produced using the adopted models is validated using the ROC-AUC method and F1-score statistics. The ROC curve can also be drawn using a training dataset called the success rate curve; however, the success rate is not a correct method for evaluating the prediction capability of the models (Pourghasemi et al., 2012a). Therefore, ROC using the testing dataset only is adopted in the present study. The ROC curve produced using the testing dataset (prediction curve) for all four models is shown in Figure 9. According to obtained AUC values, the AHP model indicates more accurate results (AUC = 0.913) than the other applied models for the study area. For FAHP, FR and SE models, AUC values are 0.903, 0.896 and 0.888, respectively (Table 7). For F1-score statistics, the AHP method also possesses the highest F1 score (0.884), followed by FAHP (0.867), FR (0.772) and SE (0.739), as shown in Table 7. Moreover, all four models (AHP, FAHP, FR and SE) demonstrate good performance for the landslide susceptibility analysis (AUC > 0.8).

Discussion
Susceptibility mapping is one of the most-applied approaches and the first step towards landslide hazard and risk mitigation. The outcome of such susceptibility studies depends upon the applied conditioning factors (Nohani et al., 2019). However, there are no fixed criteria for selecting the conditioning factors at present . Therefore, based on the published literature on landslide susceptibility and past landslide  characteristics, 15 landslide conditioning factors are adopted in the present study. Among the selected set of factors, slope (degrees) is found as the most significant factor influencing landslides in the area. In this study, the landslides are primarily associated with locations having slope ranges from 30° to 40° and >40°, similar to Mathew et al. (2009). Besides Slope, Lithology, LULC, Rainfall, TWI and Distance from roads are also identified as critical factors influencing landslides, consistent with the previous studies (Chen & Li, 2020;Pourghasemi et al., 2012a;Shahabi & Hashim, 2015).
The present LSS applies prevalent and widely used bivariate statistical (FR and SE) and MCDA models (AHP and FAHP) for LSM of Meghalaya, India. The prediction power of each model is obtained using a testing dataset. We identified AHP (AUC AHP = 0.913, F1-score = 0.884) as the best model followed by FAHP, FR and SE for the considered study area (Figure 9). Kavzoglu et al. (2014) also reported the MCDA model (AHP) as a better model than other applied models in their study. Some studies reported fuzzy-AHP as a better model than AHP (Mallick et al., 2018;Sur et al., 2016). Zhao et al. (2017) also compared fuzzy-based SE and AHP models and reported SE with higher prediction accuracy than fuzzy AHP. In Fuzzy-AHP, the fuzzy comparison matrix lacks consistency (Duru et al., 2012), which may explain the better performance of AHP over FAHP in the present study. The prediction accuracy of SE is comparable to that of FR in the present study (Figure 9), which is consistent with others (Nohani et al., 2019;Youssef et al., 2015). Nevertheless, the spatial distribution of high to very high landslide susceptibility class for all four models is approximately similar and concentrated along the southern escarpment and southeast portion of the study area (Figure 6a-d).
The findings of the present study can be used for the estimation of the socioeconomic vulnerability to landslides in the study area in terms of socioeconomic losses and downtime (Agrawal et al., 2021). Overall, the applied four models are acceptable for the LSS of Meghalaya based on AUC and F1-score results. The LSS is data-driven and controlled by geologic conditions, anthropogenic activity and LULC. Therefore, the study has some inherent limitations, which can be reduced by applying a high-resolution dataset with advanced data mining techniques and considering temporal variations in the dataset.

Conclusion
This study uses FR, SE, AHP and FAHP models to generate the landslide susceptibility map of Meghalaya, NER of India. The landslide inventory of 1330 data points is prepared and distributed in a 70/ 30 ratio to form training and testing datasets. Based on the present study, the slope, rainfall and lithology are the most influencing factors among the selected 15 conditioning factors. The prediction performance of each model is evaluated by the AUC and F1 scores. The results showed that the AHP model performs better than the other three models in the present study, with an AUC value of 0.913 and an F1 score of 0.884. The produced LSMs reveal that the southern escarpment of the study area, the area in the southeast, and hillslopes along the roads possess greater susceptibility for future landslides. If the road network gets affected due to landslide events, the intra-district/state and inter-district/state connectivity get hampered and impart substantial economic losses to the population in the region. Therefore, the presented LSM for the considered study area can help the authorities and decisionmakers to plan and prepare the risk mitigation strategies for future landslides and plan the sustainable infrastructure development in the region accordingly.