Mapping thermokarst lakes in different physical states on the central Tibetan plateau

ABSTRACT Mapping and monitoring the spatial–temporal evolution of thermokarst lakes on the Tibetan Plateau can significantly contribute to the estimation of climate change impacts on permafrost degradation. Remote sensing-based methods have been developed to monitor the interannual changes of thermokarst lakes. However, the seasonal changes, which are vital for understanding thermokarst lake dynamics, are commonly overlooked. This paper developed automatic methods which integrated the normalized difference water index (NDWI), modified normalized difference water index (MNDWI), principal component analysis (PCA), and independent component analysis (ICA) with Markov random field (MRF) for rapid mapping of thermokarst lakes in different physical states characterized by season changes. To validate the effectiveness, the proposed methods were applied to map thermokarst lakes on the central Tibetan Plateau, using four Sentinel-2 images representing different evolutionary states. Specifically, the area-to-point regression kriging (ATPRK) method was employed to downscale the 20-m short-wave infrared (SWIR) band to 10 m. Using manually digitized lakes as a reference, the proposed methods achieved an average Kappa coefficient of 0.79, significantly outperforming the water index thresholding method that merely attained 0.44. The results corroborate that the proposed methods can be robustly applied in mapping thermokarst lake dynamics with both interannual and seasonal trends on the Tibetan Plateau.


Introduction
The Tibetan Plateau, commonly known as 'the Third Pole', has the world's highest and most extensive permafrost in middle and low latitudes (Zou et al. 2017).The Tibetan Plateau shows a more significant warming trend twice than the global average rate during past decades, thus becoming a hotspot of climate change (Liu and Chen 2000;You et al. 2021).Climate change has resulted in substantial ice-rich permafrost thawing on the Tibetan Plateau since the mid-1950s (Wu and Zhang 2008;Zhong et al. 2021), characterized by changes in the near-surface soil freeze-thaw cycle and the thickening of permafrost thaw and active layer (Liu et al. 2022).Thawing permafrost could accelerate the decomposition of soil organic matter and emit substantial carbon into the atmosphere, thus further aggravating the global warming (Walter Anthony et al. 2018 Edward , and Stuart Chapin 2006).The permafrost thawing process may dramatically alter the storage of ground ice and groundwater-surface water interactions, which greatly affects regional hydrologic cycles (Liljedahl et al. 2016).Additionally, permafrost degradation has posed threats to the local ecosystems and human infrastructure due to the substantial thawing settlement, which is unprecedentedly challenging the local socio-economic development and the welfare of humanity (Yang et al. 2010).
Thermokarst lakes are common landforms in ice-rich permafrost regions, which are formed by the thawing of ice-rich permafrost and subsequent ground subsidence (Walter Anthony et al. 2007).The formation of thermokarst lakes is a significant indicator of unstable and warming permafrost (Walter Anthony et al. 2018).It has been proven that thawing permafrost significantly influences thermokarst lake changes (lake abundance/size) and their related hydrologic variability (Karlsson, Lyon, and Destouni 2012;Yang et al. 2016;2021).Thus, thermokarst lake dynamics is highly sensitive to changes in climate and permafrost conditions.Current observations indicate that thermokarst lakes in the continuous permafrost region of the Tibetan Plateau, have been experiencing the process of rapid development due to climate change and human activities, with significant growth in lake abundance and area (Luo et al. 2022;Niu et al. 2011).As a result, mapping and monitoring thermokarst lake dynamics can help us better understand changes in permafrost conditions and climate change impacts on the Tibetan Plateau.
Given the vast permafrost distribution, low temperature, low atmospheric pressure, and highaltitude environment of the Tibetan Plateau, satellite remote-sensing data and techniques are effective tools to map, monitor and evaluate thermokarst lake dynamics at the large scale (Runge, Nitze, and Grosse 2022).In current studies, visual interpretation of remote sensing images is widely used for thermokarst lake detection because of its high accuracy, although it is usually laborious and time-consuming (Luo et al. 2015;2022).For automatic lake mapping, the most commonly utilized methods are the unsupervised water index thresholding and the supervised classification (Janiec, Nowosad, and Zwoliński 2023).The thresholding approach often involves automatic selection of one or more water indices to rapidly map lakes (Zhang et al. 2019b;Zou et al. 2018).However, their methods are commonly based on the generic threshold, which has limitation in describing thermokarst lake variations brought by satellite imaging conditions and water constituents, and may accordingly lead to underestimation of lake area and number over a large area (Zhang et al. 2020).Supervised machine learning approaches, including support vector machines, random forests, and convolutional neural networks, have demonstrated effectiveness in thermokarst lake mapping (Hughes-Allen et al. 2023;Janiec, Nowosad, and Zwoliński 2023;Pi et al. 2022;Sui et al. 2022;Wei et al. 2021;Yang et al. 2022).Nevertheless, the spectral variations among thermokarst lakes can arise from differences in lake size, depth, and the presence of suspended sediments and aquatic plants.Consequently, when applying supervised classification methods across different regions, the need for updating training samples can reduce the efficiency and portability of these techniques (Cordeiro, Martinez, and Peña-Luque 2021).Furthermore, these automatic methods were only designated to detect lakes with a relatively stable water level (i.e.around October) (Olthof, Fraser, and Schmitt 2015;Tian et al. 2017;Wei et al. 2021;Șerban et al. 2020).Due to the contribution of meltwater from ground ice and continuous changes in climate factors such as temperature and precipitation, the water levels and physical states of lakes may continuously vary throughout the annual cycle, especially for small thermokarst lakes (Burn 2002;Luo et al. 2022).It is important to monitor seasonal changes of thermokarst lakes because more comprehensive observations for estimating thermokarst lake dynamics in response to permafrost degradation and climate change can be achieved (Yang et al. 2021;Zhang, Zhang, and Zhu 2020).Therefore, it is essential to develop an automated method that can realize rapid mapping of thermokarst lakes in different physical states characterized by season changes.
The Google Earth Engine (GEE) cloud computing platform integrates multiple satellite imagery with planetary-scale analysis capabilities, and it enables the generation of global-scale and long-term water dynamics datasets.Pekel et al. (2016) produced the Global Surface Water Dataset (GSWD) based on the 30-m Landsat archives from 1984 to 2020.Another notable surface water change product is the 30-m global surface water dynamics from 1999 to 2021, which was also derived from the Landsat time-series, and released by the Global Land Analysis and Discovery (GLAD) lab (Pickens et al. 2020).Although these two global surface water datasets provide wide coverage and long-term observations, the challenges remain when directly using them to investigate the spatio-temporal changes of thermokarst lakes, primarily for two reasons.Firstly, Landsat images have a spatial resolution of 30 m, making it difficult to accurately identify thermokarst lakes with areas below 3600 m 2 , considering that the minimum size for a reliable feature identification is four to five pixels (Șerban et al. 2020;Wei et al. 2021).Secondly, both datasets categorize ice and snow-covered water areas as non-valid outputs, thus restricting the observation period merely to the unfrozen months.In fact, small and shallow thermokarst lakes are seasonally ice-covered.As a result, seasonal variations would be neglected if the observation is only focusing on the unfrozen months.Qin, Lu, and Li (2019) proposed a semi-automatic method that combined normalized difference water index (NDWI) with Markov random field (MRF) for thermokarst lake mapping on the Tibetan Plateau.Although this method utilized both the spectral and spatial contextual information to identify accurate lake boundaries for small lakes comprising just a few pixels, it is not perfect and needs to be further improved.In particular, it was applicable to map thermokarst lakes only in a limited period (i.e.December).The validity of mapping thermokarst lakes in different physical states has not yet been ascertained.Also, NDWI may raise challenges when there is suspended matter in lakes or environmental noises such as built-up areas and snow in remote sensing images (Jiang et al. 2014).Furthermore, only four 10-m bands from the Sentinel-2 imagery were used for the thermokarst lake identification.Given the considerable variation in lake size and depth across the Tibetan Plateau, sharp spectral heterogeneity may exist across different individual lakes (Tian et al. 2017), it is interesting to involve more bands for the precise mapping of small thermokarst lakes.
This paper further improved the thermokarst lake mapping method by integrating the NDWI, modified normalized difference water index (MNDWI), principal component analysis (PCA), and independent component analysis (ICA) with MRF for rapid mapping of thermokarst lakes in different physical states from remote sensing images.This proposed method has the following attractive characteristics: (1) it can achieve precise and efficient mapping of thermokarst lakes in different seasons; (2) it incorporates the downscaled Sentinel-2 short-wave infrared (SWIR) band for more accurate mapping; (3) it is a highly automated approach as only few parameters that need for tuning.It is expected that the proposed method can provide fundamental technical support for the dynamic monitoring of spatial-temporal evolution of thermokarst lakes on the Tibetan Plateau.

Study area and datasets
The study area is situated on the Chumarhe High Plateau of the central Tibetan Plateau, covering an area of approximately 90 km 2 (Figure 1).From the data (2015-2020) collected by the nearest Wudaoliang meteorological station, the annual average temperature is −4°C and the annual precipitation is 400 mm/y, mainly concentrated from May to August.Influenced by the cold climatic conditions, the vegetation type in this area is homogeneous.The ground surface is dominated by the dry alpine steppe and alpine meadow, without trees and shrubs (Niu et al. 2011).In the northwest of the study area, Salt Lake greatly expanded since 2011 due to the outburst of the headwater lake, and such outburst event was proven to further accelerate regional permafrost degradation (Lu et al. 2020).
The study area is underlain by continuous permafrost with massive ground ice.Extensive thermokarst lakes distributed along the railway was attributed to permafrost degradation (Luo et al. 2015).To date, it has difficulty in precisely distinguishing which lake can be identified as thermokarst lake.Past studies generally classified small lakes located within permafrost-influenced areas as thermokarst lakes (Luo et al. 2015;2022;Mu et al. 2020).Currently, there is no explicit definition for the maximum surface area for thermokarst lakes.Wei et al. (2021) identified lakes with an area less than 3 km 2 as thermokarst lakes on the Tibetan Plateau.In this study area, referring to Wei et al. (2021), we identify all the lakes as thermokarst lakes because all their sizes are less than 3 km 2 .In an annual cycle, the state of thermokarst lakes contains three phases: ice, ice-water mixture and water.Thermokarst lakes on the Chumarhe High Plateau usually freeze from mid-October to mid-May, and the ice-water mixture period often occurs in late October and in late April to mid-May (Niu et al. 2011).
In this study area, lakes are smaller than 0.1 km 2 , except for one lake covering an area of 0.15 km 2 .Thermokarst lakes with a surface area below 0.01 km 2 account for about 90% of the total lakes.To map small thermokarst lakes (< 0.1 km 2 ) on a regional scale, the high-resolution Sentinel-2 imagery was used in this study.The Sentinel-2 mission includes two polar-orbiting satellites with a frequent revisiting period of five days.The Multispectral Instrument on-board each satellite involves 13 spectral bands from the visible and near-infrared (NIR) bands to the SWIR band with different spatial resolutions from 10 to 60 m (Drusch et al. 2012).In this study, we downloaded the Sentinel-2 Level-2A data from the GEE platform, and the 10-m red, green, blue and NIR bands and the 20-m SWIR band were used for thermokarst lake mapping.To demonstrate the effectiveness of the proposed methods in mapping thermokarst lakes in different physical states, four Sentinel-2 images, captured on 22 January, 27 April, 1 June, and 11 July in 2018, were chosen (Figure 1).The cloud cover percentages for the whole image archive are 1.53%, 2.27%, 4.48%, and 41.83%, respectively.Within the study area, all four images exhibited no cloud cover.Thermokarst lakes are frozen in January.At the end of April, the ice in small lakes and lakeshore zones of larger lakes has melted.The evaporation in this area is greater than the precipitation (Niu et al. 2011).In the open-water season (i.e. from May to October), the water level of thermokarst lakes is lower in early June, whereas with the replenishment of precipitation and the melting of ground ice, the lakes expand rapidly with a high water level in July (Luo et al. 2022).

Methodology
To rapidly map thermokarst lakes in different physical states, as illustrated in Figure 2, we designed semi-automatic MRF-based methods that mainly include: (1) the fusion of Sentinel-2 images, (2) lake detection techniques for initial lake recognition, (3) the multi-threshold method for training sample generation, (4) the MRF model for optimal lake mapping, and (5) the postprocessing.

Lake detection techniques
In this study, four lake detection techniques, namely NDWI, MNDWI, PCA, and ICA, were utilized to enhance the spectral characteristics of thermokarst lakes.It is worth noting that NDWI had previously been utilized by Qin, Lu, and Li (2019).

Water index
NDWI refers to a water index using the green and NIR spectral bands for the normalized difference calculation to enhance water information in remote sensing images.MNDWI uses the SWIR band in place of the NIR band in NDWI.Existing literatures have shown that MNDWI is more conducive to water extraction than NDWI since water can absorb more SWIR spectrum than NIR spectrum (Du et al. 2016;Xu 2006).The calculation of NDWI and MNDWI is listed below: where green and NIR are the green and NIR bands, respectively, SWIR 10m represents the 10-m fused SWIR band generated by downscaling the 20-m SWIR band to 10 m.The image fusion approach will be further introduced in section 3.2.

Feature transformation
PCA and ICA are feature transformation methods that have been widely used to reduce data dependency and enhance feature separability.PCA transforms multispectral images into new linearly unrelated feature components (Abdi and Williams 2010), whereas ICA converts multispectral images into independent components (Hyvärinen and Oja 2000).In this study, the red, green, blue, and NIR bands were transformed into four PCA/ICA components.The primary spectral information from the original bands mainly exists in the first and second components.The components of PCA and ICA that highlight thermokarst lake characteristics were selected through visual interpretation.
The NDWI, MNDWI, PCA, and ICA images are shown in Figure 3.There are clear distinctions between thermokarst lakes and other features in the lake detection images.

Fusion of Sentinel-2 images
For Sentinel-2, MNDWI needs to be calculated by scaling the 10-m green band and 20-m SWIR band to the same resolution.In this study, the 20-m SWIR band was downscaled to 10 m using the area-to-point regression kriging (ATPRK) method (Wang et al. 2016).In the first step of the multiple regression modeling, the 10-m red, green, blue and NIR bands were first upscaled to 20 m.Next, a linear regression model was constructed where the four upscaled 20-m bands are the dependent variables and the original SWIR band is the independent variable, and the model coefficients were calculated by the least squares.Then, the 10-m regression prediction for the 20m SWIR band was generated by combining the model coefficients with the four original 10-m bands.In the second step, the area-to-point kriging (ATPK)-based interpolation method was performed to downscale the 20-m residuals generated by the above regression model to 10 m.Then, the 10-m fused SWIR band was produced by integrating the 10-m residual term into the 10-m regression prediction.Further details on the implementation of the ATPRK method can be found in Wang et al. (2016).

Multi-threshold method
To separate thermokarst lakes from lake detection images, the multi-threshold method was used for the generation of training sample masks (Li et al. 2016): where r is the value of the pixel in lake detection images, m and s are the mean and standard deviation of lake detection images.In PCA and ICA components, water bodies may be the brightest or the darkest part.If it is the latter, the new components need to be multiplied by −1 and then used the multi-threshold method.T and DT are parameters that determine the thresholds of training samples.T determines the maximum threshold for generating non-lake samples, and it was set as Table 1 by trial and error.DT determines the value range of uncertain pixels and was fixed to 1.5.
Only three values of 0, 1 and 2 exist in training sample masks (Figure 4), standing for non-lake, thermokarst lake and uncertain pixel, respectively.

Markov random field
To classify uncertain pixels in training sample masks as either lake or non-lake, MRF was integrated for further thermokarst lake mapping.MRF can be essentially regarded as the energy function minimization problem (Boykov and Kolmogorov 2004;Li et al. 2016;Lu et al. 2019;Rother, Kolmogorov, and Blake 2004).Firstly, MRF was constructed as a graph model G = V, E 〈 〉, as shown in Figure 5.The vertex set V includes all pixels in the image and two specific sets, S and T, indicating the Gaussian mixture models generated by the training samples of lake and nonlake, respectively.The edge set E contains two kinds of edges: t-links and n-links.The thickness of the edge represents its weight.The t-links connect the pixels in the image with the set of S and T. The weight of t-links is the regional energy E r (L), which can ensure the consistency between the label set L and training samples.The n-links connect the adjacent pixels in the image and its weight is decided by the boundary energy E b (L), indicating the similarity of adjacent pixels in the  4-neighborhood systems.Then, the st-mincut algorithm was implemented to find an edge set L with the minimum sum of the weights to divide the image into two parts: lakes and non-lakes.The energy function E(L) can thus reach a minimum, corresponding to the optimal segmentation.The MRF algorithm is specified as follows: where l is the non-negative weighting coefficient that represents the ratio of E r (L) to E b (L) in E(L).l was set to 50 according to Li et al. (2016) and Rother, Kolmogorov, and Blake (2004).In the label set L = {l 1 , l 2 , . . ., l m * n }, l i [ {0, 1} denotes the label of the ith pixel in the image (size of m * n).0 and 1 stand for non-lake and lake, respectively.The detailed implementation of energy minimization can be referred to Boykov and Kolmogorov (2004).

Postprocessing
The postprocessing involves the elimination of small lakes and the generation of the vectorized thermokarst lake maps.First, considering the 10-m spatial resolution of Sentinel-2 imagery, the minimum size of mapped thermokarst lake area was set as 400 m².Lakes smaller than four pixels were removed through the morphology erosion operation.Second, the lake mapping results were transformed into vector files using the 'Raster to Polygon' tool in ArcGIS Pro, and then, the area, perimeter, and shape index (SI) were stored as the attributes of lake vectors.SI quantifies how closely the shape of water body resembles a circle and is calculated as follows (Su et al. 2023): where A and E are the area and perimeter of water body, respectively.SI value closer to 1 denotes a more circular shape, while a large value indicates a more elongated irregular shape.

Accuracy assessment
To quantitatively evaluate the proposed methods, three accuracy assessment indices were used: Producer's accuracy, User's accuracy, and Kappa coefficient, indicating the omission error (i.e.completeness), commission error (i.e.correctness), and the overall performance of classification, respectively: where TP is the number of mapped lake pixels that overlap with the true lake pixels (true positives).
FN is the number of mapped non-lake pixels that overlap with the true lake pixels (false negatives).FP is the number of mapped lake pixels that do not overlap with the true lake pixels (false positives).N is the total pixel number, n is the number of classes, m ii is the number of pixels correctly mapped as class i, C i is the total number of pixels mapped as class i, and G i is the total number of pixels belonging to class i.The reference maps that reflect the true lake boundaries were manually digitized by visually interpreting the 10-m Sentinel-2 images in ArcGIS Pro (Figure 6).

Results
To validate the effectiveness of the proposed method, a comparison analysis was conducted between the results of the proposed method and those of the water index thresholding approach.
The thresholding experiment was performed using the non-parametric Otsu's algorithm in each of the four lake detection techniques to generate thermokarst lake maps for four different periods (Qin et al. 2023).
The quantitative evaluation results are presented in Table 2.During the periods of ice and high water level, both NDWI-, MNDWI-, PCA-, and ICA-based MRF methods performed well.Specifically, the User's accuracy of the NDWI-based MRF reaches above 99% with the Kappa coefficients up to 0.91.In the ice-water mixture period, the overall performance of the MNDWI-, PCA-and ICA-based MRF is better than the NDWI-based MRF, and the three improved methods can map lakes more completely than the NDWI-based MRF.When the water level of thermokarst lakes is lower, the PCA-and ICA-based MRF methods have obvious superiority over the NDWI-and MNDWI-based MRF methods, as demonstrated by the Kappa coefficient.The Producer's accuracy of the NDWI-and MNDWI-based MRF only reaches 33.33% and 47.65%, respectively, indicating only a small portion of lakes was mapped.These four experiments show that MNDWI is more robust than NDWI when they are combined with MRF to map thermokarst lakes in different physical states.During the periods of ice-water mixture and low water level, the Kappa coefficient using MNDWI is higher than that using NDWI.Furthermore, when compared to the water thresholding method, the proposed methods achieved higher Kappa coefficient in each experiment and the average kappa coefficient across the four experimental sets increased by 0.35.As a result, the accuracy assessment validates the effectiveness of applying the MRF model into the thermokarst lake mapping (Figure 7).

Applicability of the proposed methods
As shown in Figures 8-11, for each experiment group, we superimposed four automatic mapping results on the reference map and selected a typical subarea to demonstrate the applicability of the proposed methods.During the ice and high water level periods, all four MRF-based methods can efficiently map thermokarst lakes.However, the performance of the MNDWI-, PCA-and ICA-based MRF outperforms the NDWI-based MRF methods during the period of ice-water mixture and low water level.The applicability of each method mainly depends on the lake detection technique.In the study area, thermokarst lakes vary greatly in size, ranging from 400 m 2 to 0.15 km 2 based on the Sentinel-2 image acquired in July.This makes the spectral characteristics of thermokarst lakes with different sizes vary remarkably even in the same period.In different periods, as air temperature changes, thermokarst lakes exhibit different physical states.When the ice layer exists in thermokarst lakes, NDWI and MNDWI combined with the multi-threshold method might only identify ice (Figure 9).This is due to the pixel value of ice being greater than 0 in the NDWI and MNDWI images, while that of water is less than 0. The multi-threshold method cannot classify both ice and water into the same class.In the stage of low water level, different depths of lakes are spectrally heterogeneous in the NDWI and MNDWI images.Specifically, the pixel values of shallow water bodies are less than 0, while those of deep waters are greater than 0. The multi-threshold method fails to classify shallow water into the training samples of lakes (Figure 10 (a) and (b)).Although MRF can capture spectral and spatial contextual characteristics of thermokarst lakes, the lack of training samples representing the various spectral characteristics of lakes may generate incomplete mapping results.In comparison, PCA and ICA are more robust than NDWI and MNDWI.PCA and ICA methods transform the original bands into four new uncorrelated components by the linear transformation, enhancing data separability and reducing data redundancy (i.e.noise).The new component that primarily represents thermokarst lakes can contain different depths of lakes, thereby providing more complete training samples for MRF.However, the User's accuracy of PCA-and ICA-based MRF is lower than that of NDWI-and MNDWI-based MRF.This is mainly because PCA and ICA transform both the spectral information of thermokarst lakes and the linear engineering objects into the same component.This inadvertently causes the railway and highway being erroneously identified as lakes, thus raising over-detection.Considering that constructions in permafrost regions are limited and the vectorized data of linear engineering such as railways and highways are usually available, these construction areas can be readily masked out to minimize the commission errors in thermokarst lake mapping using the PCAand ICA-based MRF.

Adapt to map lakes with snow cover
Although the precipitation on the Tibetan Plateau is scarce and is mainly concentrated from May to August, it is considered that if there is snow cover, it would inevitably influence the remote sensing monitoring of thermokarst lakes.To test the robustness, the proposed methods were applied to map thermokarst lakes using the sentinel-2 image acquired on 3 December 2018, when there was a thin layer of snow on the ground and lake ice.The reference map in Figure 12 (a) was manually delineated by the visual interpretation from the Sentinel-2 image acquired on 23 December 2018, which was taken at the closest date without cloud and snow cover.
The quantitative results are shown in Table 3.The four MRF-based methods can effectively map thermokarst lakes with the kappa coefficient exceeding 0.90.We selected two sub-regions with obvious snow cover to qualitatively evaluate the results.As shown in Figure 12 (c) and (g), the MNDWI-based MRF method distinctly misclassifies the snow around lakes.This is mainly because the MNDWI values of snow and ice are similar (Liang et al. 2022).Snow was included in training samples of lakes, thus causing over-detection.Although snow cover interferes with the accurate mapping of thermokarst lakes, lake detection techniques can still enhance spectral discrepancies between ice and snow.Besides, MRF considers both spectral and spatial contextual information of lakes.Therefore, the MRF-based methods can be conveniently adapted to the automatic mapping of thermokarst lakes with snow cover.

Adapt to map lakes at a regional scale
To further validate the robustness of the proposed method, we extended its application in mapping thermokarst lakes at a larger scale, specifically on the northern bank of the Chumarhe River, covering an approximate area of 1650 km² (depicted in the entire scene of Figure 1 (a)).This region has complex surface features, with the southwestern area comprising mountains reaching altitudes exceeding 5000 m.Apart from the perennial Chumarhe River, there are seasonal streams formed by melting snow flowing down from the surrounding mountains.We utilized the same set of the Sentinel-2 images (Figure 1) except for the image acquired on 11 July 2018 due to cloud cover.Considering the absence of cloud-free images during the  adjacent time, the Sentinel-2 imagery captured on 15 August 2019 was chosen as an alternative (Figure S1).When mapping thermokarst lakes over wide areas, it is essential to mask out features like rivers and mountain shadows, which exhibit low reflectance similar to that of lakes, as well as lakes larger than 3 km².River mask utilized in this process was derived from the Global River Widths from Landsat (GRWL) Mask V01.01 dataset, which contains the global river masks at mean discharge (Allen and Pavelsky 2018).The SRTM DEM V3 was employed to generate slopes and shaded relief maps to exclude mountain shadows from lake mapping.Furthermore, lakes larger than 3 km² were masked out using the dataset 'The lakes larger than 1 km² in the Tibetan Plateau (V3.0) (1970s-2021)' (Zhang et al. 2019a).
Subsequently, the proposed methods were utilized to extract thermokarst lake maps for the four distinct periods within this region.Due to the continuous fluctuations in river flow across the year, it was proven inadequate to rely solely on the GRWL Mask dataset for river masking (Figure S1).During the postprocessing, the global river vector data products including the Open-StreetMap waterways, the GRWL Vector Product V01.01, and the HydroRIVERS dataset were used to further eliminate rivers (Lehner and Grill 2013;Wei et al. 2021).Rivers and streams often exhibit elongated and linear shapes.Water bodies with elongated shapes (SI > 3) located in proximity to river and stream vectors (< 100 m) were removed as potential river or stream features (Su et al. 2023).Finally, by visually interpreting the 500-m buffer zone around river and stream vectors, the misclassified rivers and streams were further eliminated, for generating the final thermokarst lake maps.As shown in Figure 13 and S2-S4, the results corroborate the robustness and feasibility of the proposed methods to map thermokarst lakes from Sentinel-2 imagery at regional scale.

Future work
In this study, the training sample masks play a crucial role in the quality of lake mapping results.Future research can develop new methods to generate training sample masks with even higher quality.In addition, compared with NDWI, MNDWI produced by the 10-m fused SWIR band is more robust in mapping thermokarst lakes with different physical states.Thus, it can be considered to downscale the other Sentinel-2 bands such as SWIR-2 (band 12) to increase more available spectral information for efficient mapping of thermokarst lakes (Wang et al. 2018).In future studies, the sub-pixel mapping (SPM) method may be used to further improve the accuracy of thermokarst lake mapping.SPM can obtain the lake mapping result with a finer spatial resolution derived from the input of coarse proportion imagery (Wang, Shi, and Atkinson 2014), which may be effective for identifying pixels without obvious spectral characteristics and reduce the threshold of the minimum area of reliable mapped thermokarst lakes using Sentinel-2 imagery.This study only mapped regional thermokarst lakes in different physical states on the Tibetan Plateau within one year.We did not observe and analyze the long-term evolution of thermokarst lakes at a wider scale.Future work will be centered on applying these proposed methods to monitor spatio-temporal changes in thermokarst lakes across the entire Tibetan Plateau.

Conclusion
Thermokarst lakes on the Tibetan Plateau are usually tiny and affected by the seasonal changes, thus raising substantial challenges for remote sensing-based mapping approaches.To solve this problem, this study improved the MRF-based thermokarst lake mapping methods by integrating NDWI, MNDWI, PCA, and ICA with MRF to rapidly map thermokarst lakes in different physical states characterized by season changes on the central Tibetan Plateau.In addition, the ATPRK method was used to downscale the 20-m SWIR band for producing the 10-m MNDWI image.To illustrate the effectiveness, the proposed methods have been applied to map thermokarst lakes in a lake-concentrated area with an area of approximately 90 km 2 along the Qinghai-Tibet railway.A series of experimental results have demonstrated that: (1) the proposed methods have broad applicability for mapping small thermokarst lakes in different physical states; (2) the downscaled Sentinel-2 SWIR band can increase the available spectral information for more accurate thermokarst lake mapping; and (3) the PCA-and ICA-based MRF are robust during periods of ice-water mixture and low water level.We consider the proposed methods can provide favorable technical support for monitoring interannual and seasonal changes of thermokarst lakes in response to permafrost degradation and climate warming over wide susceptible areas on the Tibetan Plateau.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1 .
Figure 1.Location of the study area (a) and datasets (b)-(e).The Sentinel-2 image was acquired on (b) 22 January, (c) 27 April, (d) 1 June, and (e) 11 July 2018, corresponding to the ice, ice-water mixture, low water level and high water level periods of thermokarst lakes, respectively.Sentinel-2 images are composed by the NIR, Red, and Green bands.

Figure 2 .
Figure 2. Flowchart of semi-automatic MRF-based methods for thermokarst lake mapping.

Figure 3 .
Figure 3. Lake detection images generated by NDWI, MNDWI, PCA, and ICA based on the Sentinel-2 images acquired in (a) January, (b) April, (c) June, and (d) July of 2018.

Figure 4 .
Figure 4. Training sample masks produced from lake detection images based on the Sentinel-2 images acquired in (a) January, (b) April, (c) June, and (d) July of 2018.The pixels marked by green, red, and black stand for non-lake, thermokarst lake and uncertain areas, respectively.

Figure 5 .
Figure 5. Diagram of MRF for thermokarst lake mapping.The MRF graph model refers to Li et al. 2016.

Figure 6 .
Figure 6.Reference maps.(a) The manually digitized thermokarst lakes (blue) superimposed on the Sentinel-2 images and (b) the binary reference maps.

Figure 12 .
Figure 12.Performance of the proposed methods in mapping snow-covered thermokarst lakes.(a) The manually digitized thermokarst lakes (blue) superimposed on the Sentinel-2 images acquired on December 3, 2018.(b)-(e) and (f)-(i) Thermokarst lake mapping results (red) of NDWI-, MNDWI-, PCA-, and ICA-based MRF methods in comparison with the manually digitized lake boundaries (blue curve) in Sub-area A and B, respectively.

Figure 13 .
Figure 13.Thermokarst lake mapping results on the northern bank of the Chumarhe River in the ice period.(a)-(d) Automatic results of NDWI-, MNDWI-, PCA-, and ICA-based MRF methods superimposed on the Sentinel-2 images acquired on January 22, 2018.

Table 1 .
Parameter setting for T in the training samples generation.

Table 2 .
Quantitative evaluation results of thermokarst lake mapping.

Table 3 .
Quantitative evaluation results of thermokarst lake mapping based on the Sentinel-2 image acquired on 3 December 2018.