An integrated approach based landslide susceptibility mapping: case of Muzaffarabad region, Pakistan

Abstract Landslides result in the devastation of property and loss of lives. This study assesses landslide susceptibility by employing geographic information systems (GIS) and machine learning techniques, that is, support vector machine (SVM) and artificial neural network (ANN), with the integration of advanced optimization techniques, that is, particle swarm optimization (PSO). The landslide-inducing factors considered in this study include fault density, lithology, road density, slope, elevation, flow direction, aspect, earthquake intensity, curvature, Normalized Difference Water Index (NDWI), waterways density, rainfall, and Normalized Difference Vegetation Index (NDVI). The resulting landslide susceptibility maps (LSMs) showed that the areas falling under the high and very high susceptibility class have higher rainfall levels, weak lithology, high NDWI, and flow direction. The accuracy assessment of the techniques showed that ANN with an Area Under the Curve (AUC) of 0.81 performed better than SVM with an AUC of 0.78 without the optimization. Similarly, the performance of ANN was also better than SVM using PSO. During the integrated modeling, the AUC of PSO-ANN was 0.87, whereas the AUC of PSO–SVM was 0.84. The accuracy assessment of the produced LSMs also showed a similar trend in terms of accuracy percentage as that of the models.


Introduction
Landslides are often considered to be one of the most destructive types of natural hazards.causing extensive destruction, including harm to natural resources, cause loss of life, and lead to the destruction of property (Mondini et al. 2021).Almost everywhere in the world, there are different types, frequencies, and intensities of landslides, such as rockfall and debris flow (Zhu et al. 2019;Abdo et al. 2022;Ghorbanzadeh et al. 2022).Primarily in the world's hilly regions, they are acknowledged as the most dangerous natural hazards causing colossal destruction (Zhu et al. 2019;Abdo et al. 2022;Maqsoom et al. 2022).Landslide hazard losses are anticipated to escalate in the future due to increased urbanization, economic growth, and atypical occurrence of severe regional precipitation brought on by climate change (Saha et al. 2021;Jakob 2022;Li et al. 2022;Naceur et al. 2022).Furthermore, early mapping of landslides in the aftermath of heavy rains or severe earthquakes is critical for speedy reaction, delivery of humanitarian aid, and other disaster mitigation measures.(Piralilou et al. 2021;Jaafari et al. 2022;Pham et al. 2022).Therefore, it is vital to research and assesses this severe natural hazard in order to prevent and lessen its devastating effects through susceptibility modeling and offering susceptibility maps (Bai et al. 2021;Piralilou et al. 2021;Yuan et al. 2022).
The creation of accurate and precise landslide maps is crucial for effective risk assessment.Achieving high-quality landslide susceptibility models typically involves utilizing various geospatial factors that trigger landslides, including elevation, land use, aspect, geology, land cover, slope, and other relevant variables (Mallick et al. 2021).Numerous studies were conducted in the past to examine landslide risk using various methods, as evidenced in the existing literature.In 2022, Saleh Yousefi and colleagues conducted a study to evaluate the susceptibility of a 6682 km area to landslides.They utilized a two-step methodology that involved creating landslide susceptibility maps using (a) boosted regression trees, (b) random forest, (c) multivariate adaptive regression splines, and (d) multiple discriminant analysis.In addition, they applied the analytic hierarchy process (AHP) to determine the exposure of roads to landslides by assigning weights to four buffer zones (i.e., 0-50 m; 50-150 m; 150-300 m; and >300 m) based on their proximity to the road network (Yousefi et al. 2022a).In 2019, Ataollah Shirzadi and co-authors conducted a study to examine the impact of sample size and raster resolution on the accuracy of landslide susceptibility modeling and prediction of shallow landslides.They used the Information Gain Ratio technique to assess the usefulness of each conditioning factor, and they employed the Alternating Decision Tree, which is not commonly used in landslide modeling, to develop their models (Shirzadi et al. 2019).Moreover, numerous other studies have established the superiority of ANN and support vector machine (SVM; Dou et al. 2015;Pham et al. 2016;Chen et al. 2017a;Xi et al. 2019;Yu and Chen 2020;Gautam et al. 2021;Liu et al. 2021;Maqsoom et al. 2022;Aslam et al. 2022c) when used individually or when integrated with other techniques.
For instance, Pham et al. (2016) measured the effectiveness of five ML methods for assessing the susceptibility of landslides in Uttarakhand, India: Fisher's Linear Discriminant Analysis (FLDA), LGR, Bayesian Network, NB, and SVM.They found that in comparison with the other methods, SVM performed better.For the landslide susceptibility mapping of the Indrawati watershed in Nepal, Gautam et al. (2021) employed four ML techniques: LGR, ANN, SVM, and frequency ratio.They discovered that ANN outperforms all of the other techniques.Also, Aslam et al. (2021) adopted a methodology involving the integration of conventional ML techniques, namely support vector machine (SVM), random forest (RF), and logistic regression (LGR), with the DL technique of convolution neural network (CNN) for landslide susceptibility mapping.For feature extraction and modeling during the integration phase, CNN and ML techniques were used, respectively.The outcomes demonstrated that SVM outperformed other models when used exclusively for modeling.Also, when integrated with CNN, SVM showed significant improvement, followed by LGR and RF.To further explore the performance of various techniques, Aslam et al. (2022a) compared different conventional and unconventional ML techniques, including Linear Discriminant Analysis, Naïve Bayes (NB), Quadratic Discriminant Analysis, RF, Multivariate Adaptive Regression Spline, Artificial Neural Network (ANN) for the same objective.The authors observed that the ANN outperformed the other techniques.Furthermore, witnessing the performance of ANN in Aslam et al. (2022a) and CNN in Aslam et al. (2021), Aslam et al. (2022b) compared multiple neural networks, including different CNN architectures and residual network (ResNet), and found that ResNet outperformed others.From these previously conducted studies, it can be concluded that among the conventional ML techniques, ANN and SVM were the best in terms of their performance and that the model integration enhances the predictions.
In general, researchers have employed different methodologies to generate final landslide susceptibility models.These methodologies include physical-based, heuristic, and statistical approaches (Huang et al. 2017;He et al. 2019).The physical-based approach involves the use of physical principles and equations to predict the probability of landslides.These methods require detailed topographical data and are best suited for small areas (Huang et al. 2017).Heuristic approaches, on the other hand, rely on expert knowledge and judgment to identify areas that are prone to landslides based on past experiences and observations, making them somewhat subjective and potentially biased.As a result, they often produce moderately accurate results (He et al. 2019).Statistical methods use quantitative analysis of data to identify the key factors that contribute to landslide susceptibility, but they require normally distributed triggering factors, which can be difficult to obtain (He et al. 2019).In contrast, machine learning algorithms have gained popularity due to advancements in remote sensing databases and algorithm development (Maqsoom et al. 2022) and some examples of commonly used algorithms are ANN, SVM, RF, and decision tree (Tien Bui et al. 2018;Achour and Pourghasemi 2020;Pandey et al. 2020).However, these algorithms have their limitations, including the potential for local optimum, over-fitting, and slow training speed (Hussain et al. 2022).
To address the limitations of traditional machine learning algorithms and generate high-quality landslide susceptibility maps, researchers have developed and utilized ensemble machine learning algorithms.These algorithms have demonstrated superior performance in generating landslide susceptibility maps, including naive Bayes, AdaBoost, random subspace, ANFIS, Rotation forest, Reptree, and bagging (Park et al. 2019;Chen et al. 2021b).The success of these algorithms has inspired scholars to produce and test new ensemble machine-learning models for generating even more precise landslide susceptibility maps.However, there is no consensus on the optimal approach for landslide susceptibility modeling, researchers suggest continued development and testing of new models in order to achieve the highest possible accuracy (Talukdar et al. 2020;Islam et al. 2021).
It has been noted that many ensemble machine-learning algorithms suffer from over-fitting (Talukdar et al. 2020).Eshtay et al. (2018) reported that the weights of the input layers in some ensemble machine learning algorithms are randomly produced and not optimized during the training phase, leading to unstable performance and the potential for overfitting.To address this issue, they proposed the use of metaheuristic optimization algorithms, such as particle swarm optimization (PSO), to search for the best parameters and optimize the weights of the input layer.Xi et al. (2019) in their study evaluated the effectiveness of particle swarm optimization (PSO), a cutting-edge optimization technique, in improving the effectiveness of ANN in simulating the seismic landslides in China's Ludian districts.Results from the PSO algorithm showed that the performance of ANN experienced a significant increase.Similarly, in the study of Zhao and Zhao (2021), SVM and PSO were combined, and the use of grid and slope units was examined in order to evaluate the methodologies for creating LSMs in Luoyang County, China.The outcomes demonstrated that PSO-SVM with slope units outperformed in terms of landslide susceptibility mapping in contrast to the same model with grid units and the individual SVM model with both units.In these previously mentioned two studies, the performance of SVM and ANN was improved after the optimization with PSO, which shows the efficacy of PSO.Therefore from the literature above, it is deduced that ML techniques, when optimized with PSO, provide more accurate results.
The district of Muzaffarabad is the capital of the state of Azad Jammu and Kashmir (AJK), which is an earthquake-prone region owing to its topographical settings.Previously, rainfalls and earthquakes are established majorly as the sources of activation of landslides in the region (Owen et al. 2008).The region has been exposed to frequent earthquakes of various degrees (Rossetto and Peiris 2009).An example of an extremely devastating earthquake event is the 2005 Kashmir earthquake.The earthquake brought havoc not only to Muzaffarabad city, which was the epicentre but also to its surrounding areas and disrupted several slopes prompting 158 landslides (Kamp et al. 2008;Khan et al. 2011).There are limited studies concerning the mapping of landslides in this region.To the best of our knowledge, the majority of these investigations only utilized conventional statistical and machine learning (ML) techniques.For example, Kamp et al. (2008) used the GIS-based multi-criteria evaluation method in their study for plotting landslide susceptibility in Kashmir during the earthquake of 2005 with the help of eight exploited landslide-inducing factors.Other studies that investigated the assessment of landslide dangers influencing the same area included Owen et al. (2008), Saba et al. (2010), Khattak et al. (2010), Riaz et al. (2018), Batool et al. (2021), Hussain et al. (2022), Ahmad et al. (2022) and were limited to conventional techniques.Therefore, the combination of machine learning techniques optimized with metaheuristic optimization algorithms such as PSO can provide reliable and better results for landslide susceptibility mapping in the Muzafarabad region, which can ultimately aid in risk assessment and better disaster risk management of the region.
The objective of this study is to conduct the landslide susceptibility investigation of the Muzaffarabad region using state-of-the-art ML techniques and an optimization technique.The used optimization technique is PSO, and ML techniques are SVM and ANN.The latest landslide inventory and a database of landslide-inducing factors were exploited to establish the models and ultimately generate the study area's landslide susceptibility maps (LSMs).Moreover, as established previously that the techniques involving integrated models perform better than the individual model.This study aims to assess if the integration of PSO and ML techniques fosters better results.Thus, the purpose of using the two ML techniques is to evaluate their effectiveness independently and in the case of integration with PSO.This study used 13 landslide-inducing factors derived from the latest available data for the specified objective stated above.

Study area
Geographically speaking, Muzaffarabad region lies in northern Pakistan's lower Himalayas, in the AJK region.The Muzaffarabad region is tectonically uplifted and dissected by the Main Boundary Thrust (MBT) and Bagh-Balakot faults (known for crustal deformation) (Kazmi and Jan 1997;Saba et al. 2010).The NW Himalayan Syntaxis, where crustal-scale north-leaning fold makeups are overlain over EW local thrusts, is where Northern Pakistan is located.Among these significant north-swerving folds is the Hazara-Kashmir Syntaxis, where significant regional thrusts are folded over Syntaxis' northern extremity.The MBT and the Panjal Thrust are somewhat amalgamated as they loop around the Syntaxis.The Muzaffarabad Thrust, a third fault that borders the Kashmir-Himalaya at its southwest corner and extends diagonally over the Hazara-Kashmir Syntaxis' central region before merging with the MBT-Panjal Thrust at the Syntaxis' western edge (Baig 2006).A 75 km stretch of this fault was ruptured by the 2005 Kashmir earthquake, running from Balakot in the northwest through Muzaffarabad to Bagh in the southeast.(Avouac et al. 2006).
The district is located on a very high and steep hilly terrain above the Neelum as well as the Jhelum river.The considered area is eminent for the earthquake having a magnitude of 7.5 in 2005, causing more than 80 thousand deaths (Avouac et al. 2006;Rai and Murty 2006).Muzaffarabad is the capital of AJK and has an area of 20,665 m 2 .It is located with the geographical coordinates of 34 21 0 30 00 N and 73 28 0 20 00 E. The city of Muzaffarabad suffered the most due to the devastating Kashmir 2005 earthquake (Rai and Murty 2006).
The climate of the region varies greatly.December to February are cold months with snowfall on high mountain peaks.Snowmelt occurs in June, July, and August because of the greater warmth of these months.In the winter, the typical maximum weather is 16 C and a minimum of 3 C, while in summer, the temperature ranges from 35 to 23 C. The annual high temperature recorded is 22.3 C, and the low temperature is 11.1 C. The area receives the highest rainfall for the monsoon spell between June to September every year and receives 1242.8 mm of average annual rainfall.The wettest month, July, receives an average precipitation of 328.7 mm, followed by August, which receives an average of 229.9 mm.From October to December, there is a slight rainfall, with the lowermost average of 37.2 mm observed in November.
The lithologies found in the region date back to the Precambrian age (Hussain et al. 2009).The fundamental lithologies are largely sedimentary or meta-sedimentary, containing schist, slate, shale, siltstone, sandstone, and limestone (Baig 2006;Hussain et al. 2009).These lithologies have been fractured and joined by a number of faults (such as the MBT and Panjal faults) and fold structures (such as the Hazara-Kashmir syntaxis), further weakening their stability.The rugged terrain characterized by steep slopes, deeply incised valleys, and high-relief mountains has exacerbated these precarious geological states (Kamp et al. 2008;Owen et al. 2008;Khan et al. 2019).The study area map is shown in Figure 1.

Materials and methods
As identified previously, the primary objective of this work is to check the applicability and performance of SVM and ANN for evaluating the landslide susceptibility individually and when integrated with PSO.For this, 13 landslide conditioning parameters were selected as the input data along with landslide inventory.A detailed outline of the approach is given in Figure 2, and the different steps involved are as follows:

Landslide inventory and training and testing datasets
The first step in the research was to create the spatial distribution of landslide locations.A widespread assumption is that past, and present landslides are the basis for all landslide prediction studies (Van Den Eeckhaut et al. 2006;Capitani et al. 2013).In other words, a specific set of conditioning elements determines slope failures, and future slope failures are predicted to occur under similar conditions.Consequently, accurate recognition of the landslide locations is substantial for the probabilistic investigation of landslide susceptibility (Pradhan et al. 2009;Pradhan 2010;Choi et al. 2012;Umar et al. 2014).
In the current study, the past landslide locations were marked using the official historical records and Landsat satellite images, which are available freely.The procedure of change vector analysis for change detection, as explained in Aslam et al. (2022c), was adopted to prepare the inventory.In order to determine the landslide distribution in the region, a landslide inventory map was created.Both landslides, as well as nonlandslide locations, were exploited for applying machine learning (ML) techniques, which is normal practice (Ballabio and Sterlacchini 2012;Chen et al. 2017a).Previous 606 landslide locations (the center points) were identified for the inventory and were labeled as '1'.While other 606 non-landslide randomly selected center points were labeled as '0'.In total, 1212 landslide points were used in the study.Furthermore, 1212 points of landslide inventory were split into 2/3 and 1/3 ratios for testing and training the datasets.Thus, 67% of the data were randomly labeled as a training dataset, and the rest of the 33% data was nominated as testing data.

Database of landslide-inducing factors
Although mapping of landslide susceptibility is still the center of attention of the research community, there are no agreed procedures for the assortment of landslide-inducing factors (Bui et al. 2016).However, it has been established by several researchers (Saha et al. 2005;Owen et al. 2008;Khattak et al. 2010;Saba et al. 2010;Pourghasemi et al. 2012;Kanwal et al. 2017;Ali et al. 2019;Rahman et al. 2022) that landslides tend to occur where few conditions such as elevated mountains with steeper slopes, high rainfall concentration, and seismicity are substantially found.Therefore, the choice of landslide-inducing factors in this study was based on the previously conducted research and according to the topography of the considered area.In total, 13 inducing factors, namely aspect, slope, elevation, curvature, lithology, flow direction, fault density, road density, rainfall, earthquake intensity, waterways density, Normalized Difference Water Index (NDWI), and Normalized Difference Vegetation Index (NDVI) were considered for the landslide susceptibility investigation.
The current study used the ASTER GDEM and Landsat-8 images with 30 m Â 30 m resolution.These data sources are open-access, and data can be downloaded for free.The five geomorphometric factors, elevation, aspect, slope, curvature, and flow direction, were extracted using DEM.NDVI and NDWI were taken out from Landsat 8 images acquired very recently.The infrared (IR) and red (R) bands were used to get the NDVI by exploiting the following relation (Hong et al. 2016;Chen et al. 2018;Yousefi et al. 2022b) in the raster calculator of ArcGIS: Similar to NDVI, the NDWI was also obtained through the raster calculator of ArcGIS but by using green (G) and near-infrared (NIR).The following relation (Du et al. 2016) was used: The geological maps of Pakistan were used for digitizing the layers like faults, earthquake intensity, and lithology at a scale of 1:2,000,000.The thematic layers of roads and waterways were digitized from the topographic maps of Pakistan.For the polyline features such as faults, roads, and waterways, the line density was used to calculate density in ArcGIS.The formulation of the rainfall layer was done by the station data from Pakistan Meteorological Department.The average monthly data for the last 21 years was used.The average yearly rainfall was calculated through the below-stated formula using the available data source (Arnoldus 1980): where p i denotes the average rainfall of a month while p denotes the average yearly rainfall.
Finally, all the thematic layers were standardized and normalized for further processing.For standardization, all three vector layers of landslide conditioning factors, including point, polygon, and polyline with distinct resolutions, were transformed with a resolution of 30 m Ã 30 m into a raster format.Afterwards, the Natural breaks (Jenks) classification technique was initially used to classify the thematic layers of all the factors into five classes.Subsequently, for the normalization, all factors were further reclassified into five categories (where 1 stand for very low susceptibility; 2 stands for low susceptibility; 3 stands for medium susceptibility; 4 stands for high susceptibility, and 5 stands for very high susceptibility) dependent on the potential of inducing landslide susceptibility.

Methods
The SVM and ANN models were trained using the training dataset.First, the models were operated without any enhancement.Afterwards, the models were integrated with PSO to enhance their functionality.During the training, the models examined the factors, such as hydrological, topographical, lithological, etc., against the landslide and non-landslide points.Then the trained model's performance was evaluated on the basis of the testing data set.Jupyter Notebook was used for the implementation of the techniques used.The 10-fold cross-validation process was adopted in order to reduce inconsistency and eliminate over-fitting.The models were fine-tuned to enhance their accuracy and performance.
As a result, this process assisted in advising the importance of individual factors.The final LSMs were then prepared in ArcGIS utilizing the weighted overlay analysis, together with the thematic layers of the factors.Afterwards, LSMs were prepared and were classified into five categories of susceptibility including very high; high; moderate; low; and very low; using the Natural breaks (Jenks) classification technique.

Multicollinearity analysis
The correlation between considered landslide-inducing factors was assessed by multicollinearity analysis.This statistical analysis highlights the strong correlation among two or more variables in a multiple regression model (O'brien 2007).The variance inflation factor (VIF) and tolerance (TOL) were assessed to identify multicollinearity among the causative components.The TOL value is the reciprocal of the VIF value.
The equation expressed below was used to calculate the VIF value: To understand the above equation, assume that X ¼ fX 1 , X 2 , … , X N g represents the independent variables set, and the coefficient of determination is represented by R 2 j : All the other variables in the regression model are regressed through the jth independent variable X j.The value of tolerance (TOL) indicates the correlation intensity amongst the independent variables.A variable with a VIF value of greater than 10 and TOL value of less than 0.1 exhibits multicollinearity and should be eliminated (Wang et al. 2019;Aslam et al. 2022b).

Swarm optimization (PSO)
PSO was proposed for the first time by Eberhart and Kennedy (1995).It is a vigorous evolutionary algorithm that possesses a superior learning rate and requires less memory.These aspects demonstrate the extraordinary brilliance of the PSO, contrasted with other optimization algorithms.Through the implementation of PSO, p best, which represents the most convenient personal, and g best shows the best global positions, are discovered by the particle activity.The following equation formulates the position of particles: And the velocity of the particles is given by the subsequent equation: where, X 1 , X 2 , indicates the current and new position, V 1 indicates the current and V 2 indicates new velocity of each particle.x signifies the inertia weight.Moreover, C 1 and C 2 represent two constant and positive acceleration rates that are chosen by the user.Furthermore, the terms r 1 and r 2 signify arbitrary values, which can be specified by the form of (0,1).

Support vector machine (SVM)
The most popular ML approach and most effective classifier used for supervised learning are the SVM, which was initially proposed by Vapnik (1999).The simple concept of SVM is based on the statistical learning theory (Cortes and Vapnik 1995).The advantage of using high dimensional and linearly non-separable datasets is why SVM is used widely in diverse classifications and regression problems (Mountrakis et al. 2011;Kavzoglu et al. 2014) comprising the landslide susceptibility prediction based on a set of input data (Gleason and Im 2012).SVM defines the margin of the hyperplane by using support vectors.Centered on the statistical methodology, SVM can distinguish the optimum hyperplane for differentiating two classes (Kavzoglu et al. 2014;Pham et al. 2016).Suppose that the vector of landslide conditioning factors is X ¼ x 1 , x 2 , … , x n , and the vector of classified variables (non-landslide and landslide) is represented by Y.The optimum distinguishing hyperplane can be established by resolving the subsequent classification function: where, a i is constant, c signifies the offset from the origin of the hyperplane, n represents the total number of conditioning factors and g x i ð Þ is the kernel function.In the present study, the kernel function used is the Gaussian Radial basis function.Aimed at a binary classification problem such as the present problem of landslide involving nonlandslide and landslide points, the constraint condition for solving the equation is: where In the above condition, W is the weighting factor, and h(X) structures a non-linear function that separates the input space from high-dimension spaces.

Artificial neural network (ANN)
Modern times have observed the usage of computational intelligence, particularly ANN, for resolving numerous problems.ANN is encouraged by the human neural network and is trained to ascertain the non-linear comparisons amongst a set of input-output data (Wang 2003).Compared with statistical techniques, the extraordinary benefit of ANN is its implementation efficiency.To put it in another way, the numerical data is not required to be categorized for use in ANN.Multilayer perceptron (MLP) is the most commonly used and robust type of ANN.MLP is highly pertinent in modeling functional relations (G€ unther and Fritsch 2010).
MLP is based on three layers, specifically the input, hidden, and output layers, including the computational nodes.The working of MLP is represented in graphical form in Figure 3.In general, MLP establishes the influence of every landslide conditioning factor by allocating weights and biases.In the input layer, let X be the input parameter.In the hidden layer, the weight (W) is multiplied by the input parameter, and then the bias (b) is added.In the end, in the output layer, an activation function (f(x)) is employed to the gained value to generate the local output.Mathematically, the following equation supports the representation of MLP: In the present study, the chosen (f(x)) is the Tan-sigmoid (Tansig) activation function because of its solid performance in preceding analyses (Seyedashraf et al. 2018;Xi et al. 2019).This relation is expressed in the following equation:

Model evaluation
To assess the performance of the suggested model framework, measurements of the Receiver Operating Characteristic (ROC) curve were used.The ROC curve is a typical method for the assessment of the performance of prediction techniques adopted by Bradley (1997).It is generated by mapping the true positive (true prediction made by the model) rate against the false positive (false prediction made by the model) rate at several threshold values.In statistics, the true positive and false positive rates are also mentioned as sensitivity and 100-specificity.Additionally, the Area under the Curve (AUC) measure has been utilized broadly to quantitatively assess the performance of various techniques in the context of landslide susceptibility mapping (Tsangaratos and Ilia 2016;Pham et al. 2017;Chen et al. 2017bChen et al. , 2017c;;Qi et al. 2021).In particular, a prediction approach is believed to be excellent if the AUC value is near 1 (Tsangaratos and Ilia 2016;Zhu et al. 2018).

Outcomes of multicollinearity analysis
Table 1 describes the outcomes of the multicollinearity analysis of landslide-inducing factors.None of the factors was removed from the further analysis as it was found that none of the variables had a VIF value greater than 10, the threshold value.Therefore, there is no need to remove any of the factors from further analysis.

Thematic layers of inducing factors
The aspect of the study area was found to have 9 orientations: Flat, West, Northwest, North, Northeast, East, Southeast, South, and Southwest.A substantial portion of the area has Northern and Southward orientations, as can be seen from Figure 4a.It is primarily the elevated mountain slopes of the area that are directed toward the North and South sides.Moreover, the elevation in the area varies from 574 to 4438 m.It can be observed from Figure 4b that the Northeast side has a high elevation while the southwest side has a lower elevation.The curvature map of the considered area represented in Figure 4c illustrates the geomorphological characteristic of the area by offering information about the divergent/convergent or accelerated/decelerated character of the flow.The positive values show the concave/divergent surfaces, and the negative values reflect the convex/convergent character of the surface of the study area.The convexity/convergence (accelerated flow) or concavity/divergence (decelerated flow) of the surface greatly influences the moisture-holding capacity of the soil.
The angles of the slopes in the area show that the area is dominated by steeper slopes which usually have a greater chance of sliding than mild slopes.The steeper slopes are mostly centered on the Southeastern part of the area, which also has a higher elevation than the rest of the area, as can be perceived from Figure 4d.Furthermore, the flow direction map represented in Figure 4e shows that the flow is from North to South, which is typical in mountainous regions.There is more cutting of slopes on the way to Southward flow, therefore, resulting in higher landslide susceptibility.
Since the considered area has experienced large-magnitude earthquakes, the earthquake intensity is comparatively high in this area.The Northern portion of the selected area has high seismicity and thus has high susceptibility.The Southern portion has relatively moderate seismicity, as evident from Figure 4f.Moreover, higher soil moisture is much more responsible for destabilizing the slopes as compared to less moisture soil.Thus, the NDWI is crucial.Most of the regions having high NDWI are on the Northeast and Southeast side, as can be seen in Figure 4g.The possible reason for high NDWI on these sides can be the snow on elevated mountains.However, some central regions also have high NDWI, which is basically due to the tributaries and rivers in the area.
In contrast to the NDWI, the high NDVI is primarily centered in the Northwest and Southwest direction, which is the low-lying area.Overall, the NDVI for most of  the area is high, as can be seen in Figure 4h.The area's high NDVI concentration is compatible with the dense vegetation seen in the hilly areas.
Higher vegetation cover on slopes leads to a reduction in slope failure and soil erosion.Contrary to this, no vegetation enhances the chances of failure as the surface is open.This was applied to the re-classification procedure of NDVI.Moreover, Figure 4i (rainfall map) shows that the rainfall in the studied area varies between 983 and 1324 mm and increases gradually from North to South.Normally, as compared to the lower level of rainfall, a higher level of rainfall has more potential to trigger landslides.
The study area is composed of diverse lithology.Limestone covers the larger part of the area, as shown in Figure 4j.Other protruding lithologies are slate, volcanic rock, and quartzite.Examples of weak lithologies are limestone and slate, which cover most of the area.The waterways in the area cover almost all of the area.Several small tributaries drain into the main river, which runs from North to South, as seen in Figure 4k.Normally, the areas nearer to the waterways possess higher landslide susceptibility because of the cutting of slopes and moisture.At the same time, with the increase in distance, susceptibility decreases.
The road network in the area is mainly centered in areas with low elevation and less steep slopes, as can be observed from Figure 4l.In mountainous areas like the one selected for the study purpose, usually, the slopes are unstable because of the cutting of toes of the slopes for the construction of the roads.Therefore, similar to waterways, susceptibility decreases with the increase in remoteness from the roads, whereas the areas closer to the roads are more prone to landslides.Additionally, as already shown in the earthquake intensity map, the region has high seismicity because two important fault lines traverse the region.These faults run from North to South along the East and West margins, as shown in Figure 4m.Like waterways and roads, the areas closer to the faults have a greater landslide susceptibility, while the susceptibility decreases with the increase in remoteness from the faults.

The relative importance of inducing factors
From Table 2, it can be observed that the significance of landslide-inducing factors is the same for all the models, with slight differences.It is also important to note that the influence of the similar controlling element varies in accordance with the distinct models.Some of the landslide-inducing factors, NDWI, flow direction, lithology, rainfall, elevation, and slope, have a higher effect on the models, whereas the remaining factors cause less effect.

Landslide susceptibility maps (LSMs)
After applying the SVM model, the resulting landslide susceptibility map (Figure 5a) shows that 23% of the area has low and 3.9% of the area has very low susceptibility.These regions are primarily situated on the Northeast and Southeast sides of the district.The area falling under the moderate susceptibility class covers 43.1%, and 23.32% of the total area has high susceptibility, as listed in Table 3.At the same time, 6.75% of the total area represents very high susceptibility and is mainly found in the West and Southwest of the district.The resulting landslide susceptibility map from the ANN model (Figure 5b and Table 3) portrays that 3.2% of the area lies in the very low susceptibility class, and 23.9% of the area is covered by the low susceptibility class.However, moderate susceptibility covers an area of 42.1%.Furthermore, 23.6% of the total area comes under the high, and the very high susceptibility class covers 7.2% of the total area.These regions are primarily positioned in the West and Southwest of the district, just like the dissemination in the LSM by SVM.The produced LSMs have very comparable distributions and proportions of respective susceptibility classes for both SVM and ANN models.Additionally, the LSM produced as a result of the integrated modeling of PSO-SVM shows that an area of 6.8% falls under the very high moderate susceptibility class (Figure 5c and Table 3), which is slightly lower than the results of ANN but is similar to the results of SVM.Furthermore, PSO-SVM based LSM shows a very comparable trend to SVMM and ANN for the rest of the susceptibility classes.
Interestingly, the LSM produced after the integrated modeling through PSO-ANN shows that 6.8% of the area has high susceptibility (Figure 5d and Table 3), which is slightly lower than the results of the ANN model but is the same as the results of the SVM, and PSO-SVM models.The percentage of area for other susceptibility classes is again very similar to the rest of LSMs.For all of the LSMs, the areas having high and very high susceptibility are generally located towards the West and Southwest of the district.This is considering the variation of inducing factors and their importance.It can  be observed that the West and Southwest of the district are low elevated, has lower slope angles, higher flow direction, higher NDWI, relatively high rainfall intensity, and limestone as the lithological unit.Moreover, in accordance with the results, the area falling under different susceptibility classes is also very similar.

Accuracy assessment
The accuracy assessment results are depicted in Figure 6.The ANN model with 0.81 AUC outperformed the SVM model having an AUC of 0.78 when used for modeling without the optimization.Moreover, a similar trend of model performances was observed with the optimization, with the only difference that the accuracies were enhanced.When integrated with PSO, the ANN model obtained an AUC of 0.87, whereas the AUC of the SVM model during the integration with PSO stood at 0.84.

Discussion
In this work, the Muzaffarabad district's landslide susceptibility was mapped using an integrated technique, which produced the landslide susceptibility maps (LSMs).The approach involves a well-known optimization technique: PSO, and two state-of-the-art Machine Learning techniques: SVM and ANN.
A precise landslide inventory map is required as a key step for performing such modeling and accurate analysis.Satellite imageries are the most reliable data source for detecting landslides and preparing inventory maps ( -Duri c et al. 2017;Ghorbanzadeh et al. 2022).The imageries from Landsat are also used to prepare a landslide inventory map in this study.Moreover, selecting inducing factors is critical in landslide susceptibility mapping (Hong et al. 2018;Shu et al. 2021;Li et al. 2022).There is not a single method that is universally acknowledged for choosing the components that cause landslides, although most of them are based on the area's geographical characteristics and landslide literature.On the basis of previous literature and topographical and geographical settings of the region, 13 landslide-inducing factors were chosen for plotting the landslide susceptibility (Saha et al. 2005;Owen et al. 2008;Khattak et al. 2010;Saba et al. 2010;Pourghasemi et al. 2012;Kanwal et al. 2017;Ali et al. 2019).
Afterwards, the considered factors were attained from relevant data sources and used along with the landslide inventory for further analysis.As multiple landslideinducing factors were used in this study, the correlation between them was assessed utilizing the multicollinearity analysis.The outcomes of the analysis showed no collinearity among the considered factors (Table 1).
The data were divided into testing and training data with a proportion of 67 and 33%, respectively.The training data was used to construct the models with 10-fold cross-validation, and testing data was used to check the functionality of the trained models.The models' processing resulted in advising weights for the inducing factors.Finally, the LSMs were produced in ArcGIS using the weights of the conditioning factors found from the models.
According to the produced LSMs, the high and very high susceptibility classes are centered in the West and Southwest of the district (Figure 5).These regions have lower elevations, less steep slopes, higher rainfall volumes, NDWI, and flow direction.Even though normally, the regions with higher elevation tend to have more susceptibility, as established in previous studies, the results of this study, however, show that the regions having relatively low elevation than the rest of the area have higher susceptibility (Figure 4b).The possible reason behind this is that the low elevated regions are the floodplains of the main river in the area, and the huge water flows make the slopes unstable.This can also be justified by the previous studies that landslides were possibly debris flow because most of them occurred along the rivers.Due to this reason, the models have forecasted that lower elevation areas are highly susceptible.The other possible reason can be the disturbance to the natural settings due to human activities in lower elevated regions (Pradhan and Kim 2014;Dragi cevi c et al. 2015).
The slope angles of most landslide-susceptible regions are between 0 and 23 , and rarely the susceptibility is high in areas where the slope is too sharp (Figure 4d).There is nearly no shallow landslide on the slope, which is too steep due to the thin soil layer (Chen et al. 2021a).Additionally, the regions with lower slope angles are more influenced by anthropogenic activities, which makes the slopes unstable.Anthropogenic activities greatly influence an area's inherent topography, which has consequences.With time, the risk of landslides has grown because of the increase in deforestation rates, rising population density, and unrestrained urbanization (Flentje and Chowdhury 2018;Froude and Petley 2018;Bragagnolo et al. 2020).
Furthermore, flow direction, rainfall, and NDWI are linked to water content and have been ranked among the most substantial inducing factors.Rainfall is related strongly to very high landslide susceptibility in the area, which is very pragmatic as the area receives significant rainfall over a year, varying from 983 to 1323 mm at different places (Figure 4i).The degree of saturation of the soil increases because of the rapid infiltration of water in the soil due to heavy rainfall (Mandal and Mandal 2018;Maqsoom et al. 2020).When the moisture upsurges, the slope material becomes loose, and the risk of sliding increases.
The flow direction was also found to be very crucial related to the landslide susceptibility, possibly owing to the topography and water regime of the area.The flow direction in the area is from North to South, as indicated by the major rivers in the area (Figure 4e).While flowing from North to south, the cutting of slopes due to huge flow always causes debris flow (Ballabio and Sterlacchini 2012;Rahim et al. 2018).The considered area in this research has experienced a huge number of landslides positioned near the rivers and debris flow (Saba et al. 2010).Moreover, the NDWI is used to calculate moisture growth at a particular location (Rahim et al. 2018).Higher soil moisture creates higher landslide susceptibility than lower moisture levels (Yang et al. 2017;Aslam et al. 2022c).
Table 3 provides a detailed depiction by comparing the results of different areas falling under different classes of susceptibility of LSMs.The comparison shows that all the models showed a similar trend in terms of area under different susceptibility classes with slight variations.However, for all of the models, the maximum area (around 8850 Sqm) has moderate susceptibility, and around 6100 Sqm area has high to very high susceptibility.Additionally, the assessment results proved that the PSO-ANN model has better accuracy (AUC ¼ 0.87) than PSO-SVM (AUC ¼ 0.84).ANN (AUC ¼ 0.81) also performed better than SVM (AUC ¼ 0.78) without the optimization.The results lead to the conclusion that the optimization improved the accuracy of both ANN and SVM.Aslam et al. (2021) in their study integrated CNN with multiple ML techniques, including SVM, and found that SVM outperformed other techniques with (AUC ¼ 0.87) and without (AUC ¼ 0.86) integration.The integration of SVM with CNN showed a noticeable improvement in accuracy.The accuracy of SVM in this study is less than the accuracy of SVM in the Aslam et al. ( 2021)study, and the reason is that they used a bigger area (combination of two Muzaffarabad and Nowshera districts) and also the choice of their inducing factors was different.Also, another possible reason for the low accuracy of SVM in this study as compared to Aslam et al. (2021) is that in this study, the kernel function used for the implementation of SVM is the Gaussian Radial basis function whereas they have used the Radial basis function.Moreover, another potential reason can be that Aslam et al. (2021) used a cross-fold validation process for the implementation of the models; however, in this study, the hold-out implementation process is used.Also, Aslam et al. (2022a), when comparing the performance of different conventional and unconventional ML techniques, including ANN, observed that the ANN (AUC ¼ 0.92) outperformed the other techniques.The accuracy of the ANN is very high than the accuracy of the ANN with and without the optimization in this study.However, again the reasons for the difference are very comparable to the reasons mentioned previously.The neural network used in this study and in Aslam et al. (2022a) is the same, which is MLP.The differences are the targeted area and a different implementation method, that is, hold-out.
Further, Xi et al. (2019) assessed the efficiency of PSO for improving the performance of the ANN for plotting landslide susceptibility and found that the performance of ANN experiences a substantial increase when optimized with PSO.The AUC of ANN increased from 0.77 to 0.83 for the validation dataset.Similarly, in this study, PSO also enhanced the accuracy of ANN from 0.81 to 0.87, which is a very significant increase.Moreover, Zhao and Zhao (2021), when combined PSO with SVM to prepare landslide susceptibility maps, also found an increase in the accuracy of the SVM.Based on the slope units, the accuracy of SVM (AUC ¼ 0.85) improved considerably when optimized with PSO (AUC ¼ 0.95).In these previously mentioned two studies, the performance of SVM and ANN improved considerably after the optimization with PSO, which resembles the outcomes of this study.
Moreover, the produced LSMs were assessed for their accuracy using the historical landslide locations.For the accuracy assessment, a correlation was calculated between the LSMs and the landslide locations.The results showed acceptable conformity for PSO-SVM, and PSO-ANN, with PSO-ANN performing better (89%) than the rest, as listed in Table 4.However, the ANN model-based LSM outperformed (82%) the SVM model for individual models.Thus, it can be concluded that the accuracy of an LSM is dependent on the assessment model and the inducing factors used for a given area.

Conclusion
Several techniques have been created and utilized to map landslide susceptibility, both individually and in an integrated manner where one technique is amalgamated with another technique.All of these techniques differ from one another based on their effectiveness.Whereas, in this study, to evaluate the landslide susceptibility of Muzaffarabad district, two state-of-the-art techniques, that is, SVM and ANN, are adopted to compare the results for two scenarios.In the first scenario, these methods were used alone; for the second scenario, they were integrated with a state-of-the-art optimization technique of PSO.Based on the landslide phenomenon-related knowledge and the characteristics of the considered area, 13 inducing factors, namely slope, fault density, curvature, elevation, road density, lithology, earthquake intensity, curvature, waterways density, flow direction, NDWI, rainfall, and NDVI were selected in this study.The outcomes of multicollinearity showed that there is no collinearity present among the considered factors.A landslide inventory containing 1212 landslide points (including 606 landslide points and the same amount of non-landslide points) was used.Out of the 1212 points, 67% were used as training and 33% as testing datasets.The models were built based on the training data and were validated using the testing data.
The models in both scenarios ranked NDWI, lithology, slope, flow direction, rainfall, and elevation as the most important landslide-inducing factors.However, their weights varied for every model.The resulting LSMs from every model revealed that almost 30% of the area is subjected to high to very high susceptibility, while approximately 40% has moderate susceptibility.The areas having high susceptibility are mainly on the Southwestern side, while the low and moderate susceptibility areas are mainly on the Northeastern side of the district.Considering the outcomes of this study relative to the referred literature, the performance of all the models used in this study was fairly decent.The accuracy assessment revealed that the performance of the ANN model was relatively greater than the SVM model with (AUC ¼ 0.87) and without (AUC ¼ 0.81) the PSO optimization.Moreover, the results of the accuracy assessment of the produced LSMs also showed that the PSO-ANN model-produced map had the highest accuracy (89%).Followingly it was the PSO-SVM models' LSM with an accuracy of 86%.The accuracy of SVM and ANN-produced LSMs are 82%, and 80% respectively.Thus, this study establishes the fact that integrated techniques can result in more efficient results than the individual models.
This study was limited to the integration of conventional techniques, that is, ANN and SVM with optimization technique PSO only.Nevertheless, as mentioned above, this study yielded better results as compared to the previous literature.Future studies can consider applying the used integrated techniques in different regions having similar or different topographical and environmental settings as of the area focused in this study for landslide susceptibility mapping.Also, the conventional techniques used in this study can be integrated with other optimization techniques such as Grey Wolf and Firefly for better results.Lastly, the LSMs produced in this study can help mitigate the damaging effects of this natural hazard by delineating the spatial distribution of potential landslide risk areas in the studied area.This makes it easier for the authorities to undertake necessary measures in a localized manner.

Figure 1 .
Figure 1.Map of Pakistan with the study area highlighted in red and map showing the elevation, fault lines, and waterways of the studied area.

Figure 2 .
Figure 2. Methodological flowchart of the study showing all the important steps and sequence of the work.

Figure 3 .
Figure 3.The general structure showing different kinds of layers and learning mechanism of ANN as an MLP.

Figure
Figure 4. Continued

Figure 6 .
Figure 6.The ROC curves of the used models obtained for the testing dataset.

Table 1 .
Multicollinearity analysis of landslide inducing factors.

Table 2 .
Obtained relative importance of landslide inducing factors from different models.

Table 3 .
Area division of landslide susceptibility types generated from different models.

Table 4 .
Outcomes of accuracy assessment of the generated maps from different models.