A generalized framework for quantifying and monitoring the severity of meteorological drought

Abstract The current study proposes a new framework for quantifying and monitoring the severity of meteorological drought. The proposed framework consists of three phases. The first phase uses a K-component Gaussian Mixture Distribution (GMD) in the computation of the drought index. The second phase is based on dissimilarity matrix-based clustering using the C-index and the Monte Carlo Feature Selection (MCFS) method. The third phase uses the Markov chain, transition probabilities and a non-homogeneous Poisson process under Bayesian estimation. The Relative Importance (RI) values are used to choose appropriate stations. The Deviance Information Criterion (DIC) is used to check model suitability, and the Root Mean Square Error (RMSE) is used to determine model performance. The proposed framework is validated on 52 meteorological stations in Pakistan for 49 years, from 1968 to 2016. The outcomes of the current analysis provide insight for quantifying and monitoring meteorological drought comprehensively and accurately.


Introduction
Drought is a shortage of water reserves relative to normal conditions, caused by high temperature and below-average precipitation in an area over a period of time (Moazzam et al. 2022; Shorachi et al. 2022). It is one of the most destructive and complex natural disasters and may lead to severe health, social and agricultural losses (Kron et al. 2019). In addition, it adversely affects ecosystems, livelihoods, and socioeconomic and environmental conditions (Asmall et al. 2021; Oyounalsoud et al. 2022). The effects of drought include wildfires, food crises, food shortages, immigration and irregularities in the climate and hydrological cycle (Zeng et al. 2022). Drought is generally classified into four major types: meteorological drought, ie insufficient precipitation; agricultural drought, ie insufficient soil moisture; hydrological drought, ie reduced groundwater and water flow; and socioeconomic drought, ie a gap between water demand and supply (Zhang et al. 2020; Wang et al. 2021).
Meteorological drought is the leading type of drought and arises from a shortage of precipitation (Jiménez-Donaire et al. 2020). When the shortage of precipitation persists for a moderate period, meteorological drought propagates to agricultural drought, ie the agricultural land becomes deficient in soil moisture, and a long duration of this situation shifts it toward hydrological drought (Shah et al. 2021). Water reservoirs are highly affected by hydrological drought, and the effect of the water shortage remains for a long period (O'Connor et al. 2022). All these types of drought are interconnected and occur sequentially, which makes them difficult to distinguish and introduces uncertainty into decisions when quantifying the drought effect. Drought events have various interconnected properties, ie recurrence interval, intensity, duration and severity. Drought is a complex natural hazard that cannot be eliminated, but its effects can be reduced by using decision support tools that quantify the physical characteristics (duration, intensity and severity) of droughts (Anshuka et al. 2019; Orimoloye 2022). For this purpose, researchers and climatologists around the world use drought indicators or indices to analyse the meteorological and hydrological cycles, and use this information to support the drought monitoring community in framing drought policies and making decisions (Alahacoon and Edirisinghe 2022; Kafy et al. 2023).
These decision support tools rely heavily on indices and indicators, which are widely used to quantify drought events along with their intensity, severity and magnitude (Alahacoon and Amarnath 2022; Vélez-Nicolás et al. 2022). Drought indices are used in a broad sense to aggregate parameters, ie precipitation, temperature, humidity, streamflow, etc., and are formulated to monitor drought events with all their characteristics (Niaz et al. 2021a, 2021b, 2022d). Based on the influential hydroclimatic variables, a drought index estimates a single value, which has a significant advantage over raw data in quantifying drought characteristics. Drought assessment tools have been improved continually over time. Drought indices play a highly significant role in drought assessment studies, and researchers have developed various indices for studying different types of drought at multiple levels/conditions (Alahacoon and Amarnath 2022; Vélez-Nicolás et al. 2022). The Standardized Runoff Index (SRI) is used to assess and analyse hydrological drought events, and the soil moisture percentile is used for agricultural drought assessment (Tigkas et al. 2016; Babre et al. 2022). Climatologists around the world have used the Standardized Precipitation Index (SPI) proposed by McKee et al. (1993), the Standardized Precipitation Evapotranspiration Index (SPEI) proposed by Vicente-Serrano et al. (2010) and the Standardized Precipitation Temperature Index (SPTI) proposed by Ali et al. (2017) to assess meteorological, agricultural and hydrological drought events at various time scales. Harisuseno (2020) used SPI and the Rainfall Anomaly Index (RAI), integrated through the Standardized Streamflow Index (SSI), in a comparative study to assess drought severity and duration in the Pekalen River Basin, Indonesia. Ellahi et al. (2020) and Ellahi et al. (2021) used 6-month and 12-month SPI for agricultural and hydrological drought assessment. Liu et al. (2021) used SPI and SPEI at multiple time scales in a spatiotemporal analysis of drought in Sichuan Province, China. Sein et al. (2021) used SPEI at a 3-month time scale to assess agricultural drought and to analyse the spatial-temporal pattern of drought and its effect on crop production in Myanmar. Although the SPI is widely used as a drought monitoring tool, it has some limitations (Mishra et al. 2022; Niaz et al. 2022a, 2022b, 2022c). The index relies only on precipitation and does not consider other variables that influence drought severity, such as temperature and evapotranspiration. Moreover, SPI may underestimate drought severity in areas where evapotranspiration is high and precipitation is low, or where precipitation is highly variable (Dibi-Anoh et al. 2023). Conversely, SPI may overestimate drought severity in areas with high precipitation but low evapotranspiration or high runoff. Researchers have discussed the use of different statistical approaches in various studies of drought analysis and provided the foundations for using advanced statistical tools/methods to improve these assessments (Ali et al. 2019; Vélez-Nicolás et al. 2022; Kafy et al. 2023).
Reliable and improved statistical tools for analysing the evolution and spatiotemporal pattern of drought events, and for predicting drought occurrences, are still needed to reduce the losses caused by such abnormal climatological events (Niaz et al. 2023a, 2023b). Researchers have therefore made significant efforts to improve the techniques currently employed for monitoring drought patterns and to address abnormalities in climatological variables efficiently. This may help management and authorities in decision-making and in the early mitigation of drought events, minimizing the damage such events produce. This study aims to propose an effective new framework based on a standardized drought monitoring index (ie an improved standardized precipitation evapotranspiration index (SPEI)) and advanced statistical methods. The study provides a pathway to help decision-makers better understand the effects of climatological variables on drought conditions and their worst impacts on living organisms, supporting the effective implementation of drought mitigation policies that help farmers and authorities/government reduce the resulting social and economic losses.

Standardized precipitation evapotranspiration index (SPEI)
Climatologists around the globe have introduced various statistical techniques for drought monitoring (Alahacoon and Amarnath 2022). SPI is a multi-scale drought index that considers only precipitation. SPEI, on the other hand, is an extension of SPI that incorporates both precipitation and potential evapotranspiration (PET). PET is an important climate factor and plays a significant role in categorizing drought at numerous time scales. The SPEI is calculated from the non-exceedance probability of the water balance, ie the difference between precipitation and PET. SPEI can capture more severe and moderate droughts than SPI (Tirivarombo et al. 2018). PET provides a reasonable indication of soil moisture conditions and soil water losses, so it is an important climatological factor in drought analysis. Timely and appropriate interventions against the impacts of drought require monitoring tools that can objectively quantify the drought severity level, and on the basis of these characteristics the SPEI is more effective for this purpose. Fitting an appropriate probability distribution to the water deficit data in the standardization procedure plays a significant role in the estimation of SPEI. For this purpose, K-component Gaussian distributions are used in the current analysis to enhance the accuracy of the standardization procedure.

K-component Gaussian mixture distributions (GMDs) in the computation of SPEI
A Gaussian mixture distribution (GMD) is a weighted linear combination of Gaussian distributions. Various recent studies have used GMDs to handle multi-distributed data, as they have many computational and theoretical benefits over a unimodal distribution (Xia et al. 2022; Siena et al. 2023). A GMD is a parametric probability density function represented as a weighted sum of Gaussian component densities, and GMDs are commonly used as parametric models of the probability distribution of continuous variables. Analysts and researchers have used GMDs to analyse data in various fields because they yield precise and accurate distribution fits (see Wang et al. 2015; Prabakaran et al. 2019). Chen and Sang (2018) used K-component GMDs in their proposed method to construct a model of human faces. In the calculation of SPEI, the water deficit data (calculated from precipitation and potential evapotranspiration (PET)) might follow a mixture distribution, and fitting a single distribution, as in previous studies, might diminish the accuracy of the SPEI calculation for drought monitoring and related analysis. At the standardization stage of the SPEI computation, the data set is therefore divided into K components, each component is fitted by a Gaussian distribution, and the cumulative Gaussian mixture distribution function for the K components is generated. Mathematically, it can be formulated as:

$$ f(Z(S_i)) = \sum_{k=1}^{K} W_{ik}\, f_k\big(Z(S_i) \mid \mu_{ik}, \sigma_{ik}^{2}\big) $$

where $f_k(Z(S_i) \mid \mu_{ik}, \sigma_{ik}^{2})$ presents the probability distribution of the kth component for the ith location in the mixture, $Z(S_i)$ is the spatial variable denoting the water deficit vector at the location $S_i$, $W_{ik}$ is the weight under the constraint $\sum_{k=1}^{K} W_{ik} = 1$, representing the contribution of the kth component to the mixture distribution for the location $S_i$, and $\mu_{ik}$ and $\sigma_{ik}^{2}$ are the mean and variance of the kth component for the ith location. After the cumulative probabilities of the mixture model are computed, the vector is standardized using the standardizing procedure adopted by Zhao et al. (2018). Some meteorological stations have similar distributions of drought characteristics and similar temporal patterns of drought occurrence. Such stations can be grouped into homogeneous clusters so that each group's drought characteristics can be analysed. Therefore, after SPEI is calculated for each observed meteorological station, the stations are distributed into homogeneous clusters for further analysis.
The clustering of multiple stations is significantly effective in drought mitigation and management policies as it helps in collecting comprehensive data, developing accurate early warning systems and developing confined drought management strategies.
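
The standardization step described in this section can be illustrated with a minimal sketch. It assumes that the mixture weights, means and standard deviations have already been estimated for a station, and it uses a plain inverse-normal transform of the mixture CDF, which may differ from the exact procedure of Zhao et al. (2018). The function names and the numeric parameters are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def mixture_cdf(d, weights, means, sds):
    """CDF of a K-component Gaussian mixture evaluated at water deficit d."""
    return sum(w * NormalDist(m, s).cdf(d) for w, m, s in zip(weights, means, sds))


def standardize(d, weights, means, sds, eps=1e-10):
    """Map a water deficit value to a standard-normal score (an SPEI-like value)."""
    p = mixture_cdf(d, weights, means, sds)
    p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
    return NormalDist().inv_cdf(p)


# Hypothetical two-component mixture for one station (illustrative numbers only).
weights, means, sds = [0.6, 0.4], [-20.0, 35.0], [10.0, 15.0]
spei_like = [standardize(d, weights, means, sds) for d in (-40.0, 0.0, 60.0)]
```

Because the mixture CDF is monotone in the water deficit, larger deficits always map to larger standardized values, so the ordering of months by dryness is preserved.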

Hierarchical clustering based on agglomerative strategy
The main purpose of clustering is to group the data so that variation is minimal within groups and maximal between groups. The challenging task in an optimal clustering process is to determine the optimal number of clusters and to adopt an appropriate clustering scheme from the various combinations of cluster numbers, distance measures and clustering methods. A hierarchical clustering algorithm based on the agglomerative strategy is used in our proposed framework.
The Euclidean distance measures are calculated as

$$ d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2} $$

where $x$ and $y$ are the vectors of two stations, each of length $n$, and $d(x, y)$ presents their Euclidean distance measure. The complete link aggregation method is used to build a hierarchy of clusters. The 'complete' method for the computation of between-cluster distances in cluster analysis is formulated as:

$$ D(C_p, C_q) = \max_{x \in C_p,\; y \in C_q} d(x, y) $$

The minimum number of clusters may lie between one and the number of stations minus one, whereas the maximum number of clusters may lie between two and the number of stations minus one and must be greater than or equal to the minimum number of clusters. In our experimental study, we consider 2 as the minimum and 10 as the maximum number of clusters. The C-index is an index that can be used to optimize the number of clusters for given data. Mathematically, the C-index is formulated as:

$$ C_{index} = \frac{S_w - S_{min}}{S_{max} - S_{min}}, \quad S_{max} \neq S_{min} $$

where $S_w$ is the sum of the within-cluster distances, $S_{min}$ is the sum of the $N_w$ smallest distances among all $N_t$ pairs of observations in the entire dataset ($N_w$ being the total number of pairs of observations from the same cluster and $N_t$ the total number of pairs of observations), and $S_{max}$ is the sum of the $N_w$ largest distances among all $N_t$ pairs of points in the entire data. The value of the C-index lies between 0 and 1. The number of clusters with the minimum C-index value is considered the appropriate number for the given data. For a detailed description of hierarchical clustering and the C-index, see Charrad et al. (2014). The homogeneity within each group is investigated after the observed meteorological stations are distributed into clusters. A single important station is then selected from each homogeneous cluster by Monte Carlo Feature Selection (MCFS).
Draminski et al. (2008) provided an algorithm called MCFS for feature selection that ranks each attribute or feature in terms of its relative importance or discrimination in high-dimensional data problems. According to this algorithm, classification trees are used to declare a feature important. The importance of a feature (its Relative Importance (RI)) is determined by building many trees for randomly chosen subsets of features (Niaz et al. 2020). In detail, let d denote the number of features, m the fixed number of randomly selected features in each subset, s the number of subsets and t the number of trees constructed for every subset. Every subset is split into training and test parts, and then the t trees are trained and validated. The relative importance of each feature, say $f_k$, is measured by

$$ RI_{f_k} = \sum_{c=1}^{s \cdot t} (wAcc_c)^{u} \sum_{n_{f_k}(c)} GR\big(n_{f_k}(c)\big) \left( \frac{\text{No. in } n_{f_k}(c)}{\text{No. in } c} \right)^{v} $$

where $wAcc_c$ is the weighted accuracy of the cth tree, $GR(n_{f_k}(c))$ denotes the gain ratio for the tree node $n_{f_k}(c)$, No. in $n_{f_k}(c)$ is the number of samples in the node $n_{f_k}(c)$ and No. in $c$ is the number of samples in the root of the cth tree. The values of the three parameters m, s and t are prespecified by the practitioner, and we set $u = v = 1$. In the main step of the procedure, $s \cdot t$ trees are generated overall, where s and t should be large enough that every feature has a chance of appearing in many different subsets. For more details, see Figure 1.
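
The relative importance sum can be made concrete with a short sketch. It assumes that the trees have already been built and summarized elsewhere, so only the weighting arithmetic is shown. The record layout (`wacc`, `n_root`, `nodes`) and the example numbers are hypothetical and are not taken from the MCFS software.

```python
def relative_importance(trees, feature, u=1.0, v=1.0):
    """Sum the weighted contributions of every node that splits on `feature`.

    Each tree is a dict with these (hypothetical) keys:
      "wacc"   - weighted accuracy of the tree on its test part
      "n_root" - number of samples in the root of the tree
      "nodes"  - list of (split_feature, gain_ratio, n_samples_in_node)
    """
    ri = 0.0
    for tree in trees:
        node_sum = sum(
            gain_ratio * (n_node / tree["n_root"]) ** v
            for split_feature, gain_ratio, n_node in tree["nodes"]
            if split_feature == feature
        )
        ri += tree["wacc"] ** u * node_sum
    return ri


# Two illustrative trees over features "A" and "B" (made-up values).
trees = [
    {"wacc": 0.8, "n_root": 100, "nodes": [("A", 0.5, 100), ("B", 0.2, 40)]},
    {"wacc": 0.6, "n_root": 100, "nodes": [("B", 0.4, 100), ("A", 0.1, 50)]},
]
ri_a = relative_importance(trees, "A")
ri_b = relative_importance(trees, "B")
```

In the framework the features are the stations of a cluster, so ranking stations by this quantity yields the representative station of each cluster.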

Non-homogeneous Poisson process
Non-homogeneous Poisson process (NHPP) models are widely used for the analysis of counting processes (failure or event occurrence, reliability) and for analysing the rate of change of a process (Guler Dincer et al. 2022; Liu and Xie 2022). Thaduri (2020) used the NHPP model to estimate the probabilities for rerouting traffic using TMS. That study analysed life data from a growth curve to calculate the mean number of repairs and the rate of failure occurrence, with the goal of nowcasting the present working condition. Al-Dousari et al. (2021) conducted a comparative study of NHPP models and used appropriate models to analyse the rate of change in the process and to predict COVID-19 confirmed cases, recoveries and deaths. The counting process $M(t)$, $t \geq 0$, is a stochastic process that is non-decreasing, integer-valued and non-negative for all $t \geq 0$, where $M(t)$ is the number of events that have occurred up to time $t$. A counting process in which events occur independently in non-overlapping time intervals has independent increments, and it has stationary increments if the distribution of the number of events occurring during the time interval $[t, t+h]$ depends only on the length $h$ of the interval. When $M(t)$ is a Poisson random variable whose intensity function $\lambda(t)$ varies over time, the process is an NHPP with mean value function

$$ \mu(t) = \int_{0}^{t} \lambda(x)\, dx $$

So, the probability mass function for the NHPP can be defined as:

$$ P\{M(t+h) - M(t) = k\} = \frac{[\mu(t+h) - \mu(t)]^{k}}{k!} \exp\{-[\mu(t+h) - \mu(t)]\}, \quad k = 0, 1, 2, \ldots $$

where $k$ is the number of events that occurred in the time interval $[t, t+h]$.
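
As a small illustration of the probability mass function of an NHPP, the sketch below evaluates the probability of observing k events in an interval, given any mean value function. The function name `nhpp_pmf` and the example mean value function are assumptions made for illustration; they are not fitted values from this study.

```python
import math


def nhpp_pmf(k, t, h, mean_value):
    """P(k events in the interval from t to t + h) for an NHPP.

    `mean_value` is a callable returning the mean value function at a time point.
    """
    m = mean_value(t + h) - mean_value(t)  # expected number of events in the interval
    return math.exp(-m) * m ** k / math.factorial(k)


def mu(t):
    """Hypothetical mean value function with an increasing event rate."""
    return 0.01 * t ** 1.5


# Probabilities of 0, 1, 2, ... drought months during months 12 to 24.
probs = [nhpp_pmf(k, 12.0, 12.0, mu) for k in range(50)]
```

With a linear mean value function the expression reduces to the ordinary homogeneous Poisson probability, which gives a quick sanity check.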

Bayesian estimation
The simulated samples for $P(\Theta \mid T_k)$ and the posterior summaries of interest for $\Theta$ are obtained using a standard MCMC method, ie Gibbs sampling. Gibbs sampling is usually initiated by providing initial values, or by generating the initial values of the parameters from the prior distribution, and then gradually converges to the target distribution. Trace plots and summary statistics are frequently used to check convergence, and they give a clear indication of whether the simulations have stabilized. The R library 'R2jags', introduced by Su et al. (2015), considerably simplifies these calculations. A Bayesian model-adequacy measure, the Deviance Information Criterion (DIC), is used to check the model's suitability.
The Root Mean Square Error (RMSE) is calculated for the estimated drought events to validate the fitted model's performance at each cluster-representative station. This is accomplished by comparing the observed drought events with those predicted by the model for each representative station. It indicates the accuracy of the fitted model at each location and helps to identify the locations where the model may need improvement. Mathematically, RMSE is presented as

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \text{Observed}_i - \text{Predicted}_i \right)^{2}} $$

where 'Observed' is the original data value, 'Predicted' is the value estimated by the fitted model and 'n' is the number of observations.
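
The RMSE computation is simple enough to show directly. The sketch below uses made-up cumulative counts of drought events, not values from the study, and the function name `rmse` is chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Illustrative cumulative drought-event counts at four time points.
observed = [1, 2, 4, 7]
predicted = [1.0, 3.0, 4.0, 5.0]
error = rmse(observed, predicted)
```

Smaller values indicate that the fitted cumulative curve tracks the observed counts more closely at that station.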

Proposed framework algorithm
The proposed framework mainly focuses on the study of regionally integrated clusters. First, it distributes the observed meteorological stations of the region into the optimal number of homogeneous clusters. Then it targets the single most important and representative location of each cluster for the analysis of drought events. The proposed framework consists of three phases. The flowchart of the proposed framework is presented in Figure 2. The steps of the algorithm at each phase are as follows:

Phase I: K-component Gaussian distribution-based SPEI calculation
1. Decide the study area and the meteorological stations from which to collect time series precipitation data.
2. Calculate the SPEI vector for each meteorological station by using the K-component Gaussian distribution in the standardization process, as discussed earlier in the section on K-component Gaussian mixture distributions.

Phase II: decision on the appropriate number of clusters using the C-index, dissimilarity matrix-based classification of meteorological stations and MCFS-based identification of representative meteorological stations
3. Use the C-index to identify the appropriate number of clusters for the given data. The mathematical description of the C-index is provided in the section on hierarchical clustering.
4. Apply the clustering scheme based on the dissimilarity matrix, using the Euclidean distance measures to calculate the dissimilarity matrix. The section on hierarchical clustering provides a detailed mathematical description of the clustering.
5. Use the MCFS algorithm to identify the representative meteorological station in each cluster based on its relative importance in the cluster. For the mathematical description and algorithm details, see the description of MCFS in the same section.

Phase III: drought events analysis using NHPP and the estimation of model parameters under Gibbs sampling
6. The Poisson events of interest are the months in which the SPEI measurement has crossed a given threshold of interest. Here, SPEI ≤ −1 is the threshold of interest, and each such month is counted as a drought event occurrence (ie an observed drought event).
7. The threshold corresponds to the overall average measurement of SPEI for each meteorological station. The number of times the monthly SPEI values cross the threshold over a given time interval is therefore a counting process, in particular an NHPP. It allows the probability of the number of drought events and the rate of change of drought event occurrences in a given time interval to be calculated.
8. Consider the NHPP, ie the Power Law Process (PLP) model, to study the accumulated number of drought event months at each representative meteorological station of the clusters for any time interval in the given period. The intensity and mean value functions assumed for the NHPP have parametric forms that depend on time and on unknown parameters.
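
For reference, a commonly used parameterization of the PLP is given below. It is offered as an assumption about the notation rather than as the exact form used in this study, with $\alpha$ and $\beta$ standing for the shape and scale parameters that the Results section reports as 'a' and 'b'.

```latex
\lambda(t \mid \alpha, \beta) = \frac{\alpha}{\beta} \left( \frac{t}{\beta} \right)^{\alpha - 1},
\qquad
\mu(t \mid \alpha, \beta) = \int_{0}^{t} \lambda(x \mid \alpha, \beta)\, dx = \left( \frac{t}{\beta} \right)^{\alpha},
\qquad t \geq 0,\ \alpha > 0,\ \beta > 0 .
```

Under this form, $\alpha > 1$ corresponds to an increasing rate of drought event occurrence, $\alpha < 1$ to a decreasing rate, and $\alpha = 1$ reduces the model to a homogeneous Poisson process with constant rate $1/\beta$.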

The statistical analysis for estimating the NHPP parameters uses a hierarchical Bayesian approach with non-informative priors on the model parameters under MCMC simulation methods (ie Gibbs sampling) to obtain the posterior summaries of interest. We further assume prior independence among the parameters and assess the appropriateness of the models for each representative meteorological station of each cluster using the Deviance Information Criterion (DIC).
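
As an illustration of Phase II, the C-index of a candidate partition can be computed from a dissimilarity matrix as in the sketch below. It covers only the index itself and not the hierarchical clustering that produces the partitions. The function name `c_index` and the toy data are hypothetical.

```python
from itertools import combinations


def c_index(dist, labels):
    """C-index of a partition.

    `dist` is a symmetric dissimilarity matrix (list of lists) and
    `labels` gives the cluster label of each observation.
    """
    pairs = list(combinations(range(len(labels)), 2))
    all_d = sorted(dist[i][j] for i, j in pairs)
    within = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    n_w = len(within)                       # number of within-cluster pairs
    s_w = sum(within)                       # sum of within-cluster distances
    s_min = sum(all_d[:n_w])                # sum of the n_w smallest distances
    s_max = sum(all_d[len(all_d) - n_w:])   # sum of the n_w largest distances
    if s_max == s_min:
        raise ValueError("C-index is undefined when S_max equals S_min")
    return (s_w - s_min) / (s_max - s_min)


# Toy example: four points on a line forming two obvious groups.
xs = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in xs] for a in xs]
good = c_index(dist, [0, 0, 1, 1])  # compact clusters
bad = c_index(dist, [0, 1, 0, 1])   # clusters that mix the two groups
```

Evaluating the index for each candidate number of clusters and keeping the partition with the smallest value mirrors the selection rule used in the framework.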

Application
Globally, the agricultural and economic sectors of various countries are affected by the adverse effects of climate change, and Pakistan is one of the highly affected countries.
Pakistan is facing many challenges of water deficiency and water contamination. Almost all regions of the country are affected by unbalanced and irregular precipitation. In Sindh and southern Punjab especially, several human deaths and livestock losses have been reported in the last three decades. It is therefore necessary to strengthen the monitoring of precipitation patterns and drought, and to generate effective mitigation policies for early warning, by developing efficient and accurate monitoring frameworks and tools (Tehreem et al. 2022). To evaluate the strength, efficacy and efficiency of the proposed framework, time series of precipitation and temperature (minimum & maximum) from various meteorological stations dispersed throughout Pakistan are considered. The monthly average precipitation, minimum temperature and maximum temperature data from January 1968 to December 2016 were collected from the Pakistan Meteorological Department. The study area and the observed locations are presented in Figure 3.

Results
At the initial stage of the proposed framework, the modified SPEI based on GMD is estimated. The theoretical and empirical CDFs of the K-CGMD are obtained for the water deficit values, which are calculated from the temperature and precipitation data sets. The CDFs for each station are used to standardize the water deficit values 'D' for calculating the SPEI values. Figure 4 presents the scatter plots and histograms of the theoretical and empirical CDFs for Cherat, Kakul, DIK and Drosh. Table 1 presents the K-CGMD Bayesian Information Criterion (BIC) values for selected stations. The small BIC values and the QQ plots for each station indicate the appropriateness of the 12-CGMD fit, which improves the accuracy of drought monitoring and its related analysis. The largest value of BIC is −5710.38 for Jiwani, and the smallest value of BIC is −6983.29 for Balakot. Figure 5 presents the temporal behavior of SPEI-1 for 49 years at Cherat, Kakul, DIK and Drosh. It clearly shows the variation in the annual pattern of SPEI-1 at each station, and these variations differ from station to station. After the SPEI-1 of each individual station is calculated, Phase II generates the clusters of all 52 meteorological stations based on these SPEI-1 values. The optimal number of clusters is selected based on the C-index before the clusters are generated. The numbers of clusters and their C-index values are given in Table 2. The minimum value of the C-index identifies the optimal number of clusters. The minimum value of the C-index is 0.5429, indicating that the optimal number of clusters is 9. The 52 stations are then distributed into 9 clusters based on their dissimilarity matrix, calculated using the Euclidean distance. The list of meteorological stations corresponding to each cluster is presented in Table 3. The number of stations varies between clusters, ie clusters 1, 2, 3, 4, 5, 6, 7, 8 and 9 have 4, 5, 4, 8, 2, 4, 12, 8 and 5 stations, respectively. Furthermore, the homogeneity within each cluster is analysed using the Pearson chi-square test of homogeneity, with the null hypothesis that the stations within a cluster are homogeneous.
The results of the Pearson chi-square test of homogeneity for each cluster are presented in Table 4, where the P-value for each cluster is ≥ 0.9976. This indicates that all the generated clusters are homogeneous, even though the number of stations varies between clusters.
The MCFS algorithm provides a ranked list based on the relative importance of each station in each cluster. Table 5 presents the relative importance values for the stations of each cluster except cluster 5, because it has only two stations and MCFS is unsuitable for such small clusters. The most important stations for clusters 1 to 4 are Peshawar, Muzaffarabad, DIK and Chitral, with relative importance values of 0.4234, 0.2381, 0.5190 and 0.6660, respectively. The least important stations for clusters 1 to 4 are Cherat, Murree, Sargodha and Astore, with relative importance values of 0.2187, 0.1973, 0.3442 and 0.1560, respectively. Similarly, the most important stations for clusters 6 to 9 are Hyderabad, Rohri, Nokkundi and Lahore, with relative importance values of 0.7748, 0.7536, 1.2512 and 0.5373, respectively. The least important stations for clusters 6 to 9 are Karachi, Barkhan, Zohub and Kotli, with relative importance values of 0.3845, 0.1436, 0.1380 and 0.1939, respectively. The important stations of each cluster are selected and used for the analysis of drought events. Based on the climate moisture categories in Liu et al. (2021), a drought event is recorded for every month with SPEI ≤ −1, and these months are taken as the observed drought events at each representative station. The accumulated number of observed drought events at time 'T' is the total number of months with SPEI ≤ −1 up to that time.
The Bayesian analysis for the NHPP-PLP model is conducted under MCMC simulation, and the Posterior summaries of interest are presented in Table 6, along with the DIC values for each important station and the two stations of cluster 5 (10 selected important stations).For all these stations, the values of MC errors and standard deviations are small enough.It is assumed that the prior distribution of 'a' and 'b' is Uniform(0, 100].The marginal posterior densities of the hyperparameters (a,b) and their trace plots for each station are presented in Figure 6.These plots are presented to observe the convergence in the process.The burn-in sample of size 5000 was considered to eliminate the effect of the initial values.The results for marginal posterior distributions with Monte Carlo estimates for DIC are generated based on 25,000 simulated Gibbs samples.
The estimated parameter values of the NHPP-PLP model for each station are used to estimate the cumulative drought events over the study period for Peshawar, Muzaffarabad, DIK, Chitral, Dir, Parachinar, Hyderabad, Rohri, Nokkundi and Lahore. The observed accumulated drought events over time and the estimated drought events for each station are presented in Figure 7, and the Mean Square Error for each station is provided in Table 7. In the NHPP model validation results, the MSE values for Peshawar, Muzaffarabad, DIK, Chitral, Dir, Parachinar, Hyderabad, Rohri, Nokkundi and Lahore are 2.765, 6.104, 1.709, 1.675, 6.071, 12.954, 1.316, 2.804, 1.595 and 0.994, respectively. In general, the results reveal that the PLP model predicts drought events over time efficiently, but its performance is less effective for the two stations of cluster 5 (see Figure 7 ei & eii).

Discussion
Researchers and climatologists have introduced various drought indices, of which standardized drought indices are the most frequently used (Tigkas et al. 2016; Ali et al. 2020; Harisuseno 2020; Liu et al. 2021; Niaz et al. 2021a, 2022b, 2022e, 2022f; Sein et al. 2021; Babre et al. 2022). The main difficulty in calculating standardized drought indices accurately is fitting the probability distribution during standardization. Ali et al. (2021) addressed this issue by introducing K-CGMD fitting in the standardization step, which significantly improved the values of standardized drought indices. Therefore, in our study, 12-CGMD is used to fit and standardize the distribution of water deficit values when calculating SPEI. The R library 'mixtools', introduced by Benaglia et al. (2009) for mixture models, is used for K-CGMD fitting. The BIC values of the fitted GMD at each station, together with the density plots and QQ plots, provide insight into the accuracy of the fitted distributions. Time series plots of the modified SPEI for four randomly selected stations are also provided in the study.

Although SPEI is widely used and combines temperature and precipitation data to provide a standardized measure of water availability, it assumes that the relationship between temperature and precipitation is constant over time. In some regions, however, this relationship may depend on other factors; in such areas, fluctuations in the relationship between precipitation and temperature significantly affect water availability and may lead to inaccurate estimates (Zaki and Noda 2022). The major factors affecting water availability are precipitation, evapotranspiration, land surface characteristics, humidity, wind speed and climate change (Mukherjee et al. 2018; Rehana et al. 2021), and SPEI does not consider some of these variables when calculating water availability. Nevertheless, SPEI is recommended as an alternative to SPI because it quantifies anomalies in the accumulated climatic water balance, incorporating potential evapotranspiration to investigate drought events (Tefera et al. 2019).

Clustering of meteorological stations may help in developing more accurate drought monitoring and early warning systems. It also helps to establish localized drought management strategies (Feeny 2017). By analysing a specific cluster, it is possible to identify the areas particularly exposed to drought and to design targeted interventions tailored to the specific needs of those societies (Aladaileh et al. 2019). Therefore, in our study, all 52 meteorological stations are clustered on the basis of their SPEI time series using hierarchical clustering. The optimal number of clusters is determined using the C-index, whose minimum value indicates the optimal number of clusters in the study area. The optimal number of clusters is 9, with a minimum C-index value of 0.5429. After the nine clusters are generated, a test for homogeneity indicates that the stations within each cluster are homogeneous. The next step of the proposed framework is to obtain the important representative station in each cluster. For this purpose, the MCFS technique provides an efficient and effective way to select each cluster's important station: the stations are ranked by their relative importance within each cluster. The selected stations enable us to assign a drought severity condition to each cluster for every month of the 49 years.

To implement the last phase, an SPEI threshold of −1.0 is used to obtain the vector of accumulated drought events that occurred at the 10 selected important stations. The PLP-based NHPP model is used to investigate the drought occurrence rate and to estimate the drought events over time. For the Bayesian analysis, we assume that the prior distributions of 'a' and 'b' are uniform on the range (0, 100], with prior independence among the parameters. The Monte Carlo error indicates that the margin of error is very small when the MCMC samples are used to estimate the posterior mean of each parameter at each station. Therefore, the means of the marginal posterior distributions of the parameters at each important, cluster-representative station are used to estimate the drought events over time and the rate of change of drought event occurrence. Finally, the comparison of estimated and observed drought events and the RMSE for each station show that the PLP model performs well for Peshawar, DIK, Chitral, Hyderabad, Rohri, Nokkundi and Lahore, performs moderately for Muzaffarabad and Dir, and performs worst for Parachinar, owing to some abrupt changes that occurred over time.
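The evaluation step of the last phase can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: a month is counted as a drought event when SPEI is at or below −1.0; the PLP is parametrized with intensity (a/b)(t/b)^(a−1) and mean value function (t/b)^a; and the function names, the SPEI series and the values of `a_hat` and `b_hat` are hypothetical, chosen only for illustration and not taken from the study.

```python
import math


def accumulated_events(spei, threshold=-1.0):
    """Running count of months whose SPEI is at or below the threshold."""
    counts, total = [], 0
    for value in spei:
        if value <= threshold:
            total += 1
        counts.append(total)
    return counts


def plp_mean(t, a, b):
    """Assumed PLP mean value function m(t) = (t / b) ** a."""
    return (t / b) ** a


def plp_intensity(t, a, b):
    """Assumed PLP intensity (rate of event occurrence) at time t."""
    return (a / b) * (t / b) ** (a - 1)


def rmse(observed, estimated):
    """Root mean square error between two equally long sequences."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up monthly SPEI values for one station (illustration only).
spei = [0.3, -1.2, -0.4, -1.5, -1.1, 0.8, -0.2, -1.7]
observed = accumulated_events(spei)

# Hypothetical posterior means of the PLP parameters.
a_hat, b_hat = 0.9, 1.8
months = range(1, len(spei) + 1)
estimated = [plp_mean(t, a_hat, b_hat) for t in months]

print("observed accumulated events:", observed)
print("RMSE of fitted mean value function:", round(rmse(observed, estimated), 3))
```

A smaller RMSE means the fitted mean value function tracks the observed accumulated events more closely, which is the sense in which the stations are compared above.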

Conclusion
The current study proposes a generalized framework for quantifying and monitoring meteorological drought severity. The performance of the proposed framework is validated and assessed for 52 stations in Pakistan. A modified and improved SPEI based on GMD is estimated within the framework. The clustering method recommended in the framework efficiently produces homogeneous clusters, and MCFS performs well in selecting the important representative station for each cluster. Bayesian MCMC simulation is used to fit the NHPP-PLP model, which estimates the average number of drought events along with the rate of change of event occurrence. The DIC values for the MCMC-based estimation of the NHPP-PLP model parameters and the RMSEs of the fitted NHPP-PLP model indicate the suitability and performance of the model, respectively. The outcomes of the proposed framework effectively quantify and monitor the severity of meteorological drought. Therefore, the proposed framework for the spatiotemporal assessment of drought events and their severity is a significant addition to the literature: it helps decision-makers to better understand the effects of drought occurrences and to implement drought mitigation policies that reduce the resulting losses. The framework can be further improved by adding new climatological variables to the calculation of the indices that quantify and monitor the severity of meteorological drought.

Ethics statement
All procedures followed were in accordance with the ethical standards of the Helsinki Declaration of 1975, as revised in 2000.

Figure 1. Monte Carlo Feature Selection algorithm flow chart.

Figure 2. Flow chart of the proposed framework.

Figure 3. Study area with observed meteorological stations.

Figure 4. Scatter plots and histogram of the theoretical and empirical CDF of K-CGMD.

Figure 6. Marginal posterior density and trace plots of the NHPP model parameters for each station.
Markov chain Monte Carlo (MCMC) simulation-based Bayesian inference estimation procedures, which incorporate prior distributions of the model parameters, are the most efficient and effective methods for parameter estimation. NHPP models rely on intensity and mean value functions that include various hyperparameters. These hyperparameters are estimated from their joint or marginal posterior distributions, which are calculated using MCMC simulation algorithms based on Bayesian estimation procedures (Al-Dousari et al. 2021; de Oliveira Peres et al. 2022; Kim and Kottas 2022). For the mathematical description of the Bayesian estimation, let $\Theta$ denote the parameters of the intensity function of the NHPP model, $P(\Theta)$ the joint prior distribution of the parameters, and $L(\Theta; T_k)$ the likelihood function of the model of interest, where $T_k = (k; t_1, t_2, \ldots, t_k; T)$ is the data set of $k$ observed occurrence times $t_1, t_2, \ldots, t_k$ on the time interval $[0, T]$, and let $P(\Theta \mid T_k)$ be the joint posterior distribution of $\Theta$ given the data $T_k$. Mathematically, by Bayes' theorem, $P(\Theta \mid T_k) \propto L(\Theta; T_k)\, P(\Theta)$.
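The posterior described above combines the likelihood with the prior, and MCMC draws samples from it without computing the normalizing constant. The following is a minimal sketch of that idea for a PLP-based NHPP, not the estimation code used in the study. It assumes the PLP intensity (a/b)(t/b)^(a−1), independent uniform priors on (0, 100] for both parameters, and a plain random-walk Metropolis sampler; the event times, step size, starting values and function names are hypothetical.

```python
import math
import random


def log_likelihood(a, b, times, T):
    """Log-likelihood of a PLP-based NHPP observed on [0, T].

    Assumed intensity: lambda(t) = (a / b) * (t / b) ** (a - 1),
    so the mean value function is m(t) = (t / b) ** a.
    """
    k = len(times)
    try:
        mean_at_T = (T / b) ** a
    except OverflowError:
        return -math.inf  # the likelihood is effectively zero here
    sum_log_t = sum(math.log(t) for t in times)
    return (k * math.log(a) - k * a * math.log(b)
            + (a - 1.0) * sum_log_t - mean_at_T)


def log_posterior(a, b, times, T, upper=100.0):
    """Log-posterior (up to a constant) under uniform priors on (0, upper]."""
    if not (0.0 < a <= upper and 0.0 < b <= upper):
        return -math.inf
    return log_likelihood(a, b, times, T)


def metropolis(times, T, n_iter=20000, burn_in=5000, step=0.3, seed=1):
    """Random-walk Metropolis sampler for (a, b); returns post-burn-in draws."""
    rng = random.Random(seed)
    a, b = 1.0, T / len(times)  # crude starting values
    current = log_posterior(a, b, times, T)
    draws = []
    for i in range(n_iter):
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, times, T)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        if i >= burn_in:
            draws.append((a, b))
    return draws


# Made-up drought occurrence times (in years) on [0, 49].
event_times = [2.1, 5.4, 9.8, 13.0, 17.5, 21.2, 24.9, 30.3, 33.8, 38.6, 42.0, 46.7]
T = 49.0

draws = metropolis(event_times, T)
a_mean = sum(a for a, _ in draws) / len(draws)
b_mean = sum(b for _, b in draws) / len(draws)
print(f"posterior mean of a: {a_mean:.3f}, posterior mean of b: {b_mean:.3f}")
```

The means of the retained draws play the role of the marginal posterior means used for estimation. In practice, convergence would also be checked with trace plots and the Monte Carlo error before those means are relied on.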

Table 1. K-CGMD BIC values for all observed stations.

Table 2. Number of clusters and the corresponding C-index values.

Table 3. Distribution of stations in each cluster.

Table 4. Pearson Chi-square test for homogeneity.

Table 5. MCFS-based estimated relative importance (RI) of the stations in each cluster.

Table 6. MCMC summary statistics of the NHPP model parameters for the representative station of each cluster.