Land cover change detection in the Aralkum with multi-source satellite datasets

ABSTRACT The Aral Sea, once the fourth largest freshwater lake on Earth, has lost circa 90% of its original water surface in 1960. Maps of different land cover categories provide a suitable baseline to plan and implement effective measures to combat ongoing desertification, such as reforestation of dried out Aral Sea soils. In this study, we used satellite-based remote sensing data and applied a machine learning method (Random Forest) to map land cover in the Aralkum in 2020. We tested different satellite data from optical (Landsat-8, Sentinel-2) and Radar instruments (Sentinel-1) and trained a random forest model for classifying different combinations of these data sets into ten distinct land cover classes. We further calculated per-pixel uncertainty based on posterior classification probability scores. An accuracy assessment, based on in-situ data, revealed that the average overall accuracy of land cover maps was 86.8%. Fusing optical and radar instruments achieved the highest overall accuracy (88.8%, with lower/higher 95% confidence interval values of 87.6%/89.9%, and a Kappa value of 0.865. Classification uncertainty was lower in more homogeneous landscapes (i.e. large expanses of a single land cover class like water or shrubland). Only around 9% of the study area was still water in 2020, while 32% was covered by saline soils with high erosion risk. Several potential applications of this land cover map in the Aralkum exist – spanning many areas of environmental impact assessment, policy, and planning and management or afforestation. This methodological framework can similarly provide a useful template for more broadly assessing large-scale, land dynamics at high-resolution in the entire Aralkum and surrounding areas.


Introduction
The Aral Sea was once the fourth largest freshwater lake on Earth, with an area of approx. 67,000 km 2 in 1960 (Micklin 2016). It has lost circa 90% of its original water volume over the past six decades as a result of massive human intervention in the water balance of the Aral Sea basin, e.g. for the development of irrigated lands alongside the two major tributaries of the lake, Amudarya and Syrdarya, and hydropower production (Dukhovny and de Schutter 2011;Micklin 2016;Dimovska 2019). The area of exposed, dry seabed reached 60,000 km 2 in 2009 (Löw et al. 2013), ultimately resulting in the emergence of a new desert, the Aralkum. This caused a series of severe environmental issues (Breckle et al. 2012). The hydrological and soil morphological changes that occurred during and after desiccation gave rise to new terrestrial surfaces that are exposed to varying degrees of soil erosion risk from wind (Dukhovny et al. 2008). Indeed, in Central Asia the Aralkum became one of the major sources of dust and salt storms (Spivak et al. 2012;Groll, Opp, and Aslanov 2013;Issanova et al. 2015;Ge et al. 2016;Shi et al. 2020;Zhang et al. 2020;Karami et al. 2021), emissions of organohalogen substances from dried salty soils (Kotte et al. 2012), and created an ideal habitat for locust breeding (Löw et al. 2016).
One potential mitigation measure to address the ongoing desertification and wind erosion is largescale afforestation of the dried seabed. Under the framework of the Concept of Transforming the Aral Sea Region with Ecological Innovations and Technologies together with Green Aral Sea Initiative (UNDP, 2021), systematic work is currently underway to establish protective forests in order to stabilize the environment and restore the natural balance to the affected soil. As a result, more than 1 million hectares of land have been prepared for planting, 461,000 hectares have been reforested, including 93 km of protective fences, and in 2019, 420 hectares of Saxaul nurseries were established, with planting continuing through the spring of 2020 by a decree the Cabinet Ministries of the Republic of Uzbekistan (LexUZ 2019). The planning of such measures, however, requires identifying and mapping of not only land cover in the Aralkum at high risk for soil erosion, but also that which can support the planting of appropriate vegetation such as black Saxaul shrubs (Haloxylon aphyllumin) (Chang et al. 2019).
Recent studies used satellite remote sensing to monitor the decrease in the water surface, water level, and related change in evaporation and increasing surface temperature arising from the desiccation of the Aral Sea (Jin et al. 2017;Singh et al. 2018;Deliry et al. 2020). Further studies mapped land cover change in the Aralkum and surrounding areas with optical satellite images (Löw et al. 2016;Shen et al. 2019), or identified potential areas for establishing vegetation (Kim et al. 2020). Usually, these studies calculate vegetation indices such as the Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1974) and other indices from satellite observations to separate different land cover categories (Löw et al. 2016;Shen et al. 2019). This is because these indices support the separation of different land cover categories, which are characterized by different vegetation types and densities, and which have different reflectance values in different parts of the electromagnetic spectrum. The classification of satellite imagery in these studies was performed by applying different supervised image classification methods, mostly in combination with machine learning algorithms such as random forest (RF) (Breiman 2001) or support vector machines (SVM) (Cortes and Vapnik 1995). The reported overall accuracies in studies using these algorithms range from 79% (Shen et al. 2019) to 95% (Löw et al. 2016). This enabled land cover change mapping in the whole Aralkum desert with a high classification accuracy.
Previous, satellite-based land cover assessments (e.g. Löw et al. 2013) revealed that within 8 years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008), the landscape had changed dramatically: in 2000, the Aral Sea and smaller water bodies still covered a huge part of the study area (circa 29.000 km 2 , i.e. 41% or the Aralkum), but by 2008 they had decreased significantly (circa 13,000 km 2 , 19%), leaving behind only small water bodies, such as the lakes Ribachie, Dzhyltyrbas, and Sudochie. Vegetation cover and species composition is influenced by on-going primary succession of the dried seabed and recent satellite vegetation assessments indicate a regrowth in vegetative cover such as in the eastern part of the Aralkum (Kim et al. 2020).
Despite these technological advancements, a sound understanding of the soil erosion risk in the Aralkum nevertheless requires land cover maps with more distinct land cover categories at a high spatial resolution (Dukhovny et al. 2008). The limited or non-availability of in-situ reference dataincluding soil analysis -often hampers the creation of such detailed maps, due to the associated costs of collecting ground-truth reference data. Complicating matters, satellite data availability for the Aralkum is limited by cloud cover in the winter season and frequent dust storms and haze in the summer season, limiting the use of optical instruments. It is for these reasons that previous studies focus either on images from the summer seasons (Shen et al. 2019) or moderate resolution images with a high revisit frequency that allow for certain interpolation techniques that can address cloud cover and data gaps (Löw et al. 2013(Löw et al. , 2016. With open satellite archives such as those provided by the National Aeronautics and Space Administration (NASA) (Landsat -30 m) or the European Space Agency (ESA) (Sentinel-1 and 2-10 m), an increasing amount and variety of highresolution satellite remote sensors have recently become available. Their full potential, however, has not yet been leveraged for Aralkum land cover mapping. Against this backdrop, this study combines cloud-based (i.e. Google Earth Engine, GEE) and local computational workflows to map land cover at high spatial resolution in the Aralkum. The main objectives are as follows: (i) To evaluate and compare different remote sensing data sets for land cover mapping; (ii) To map land cover in the observation period 2020; (iii) To discuss the results and make recommendations for future action.

Study area
The study area is the Aralkum, i.e. the desert that emerged within the Aral Sea border of 1960 ( Figure 1). The remaining portions of the Aral Sea and the newly formed Aralkum are located in Central Asia. In the literature, the Aral Sea surface in 1960 is usually placed between approximately 67,000 and 68,000 km 2 , depending on the data source and methodology used to estimate its area (Létolle et al. 2007;Micklin 2010).
The study area is characterized by continental climatic conditions with cold winters and hot summers. The average annual temperature in the southern portion (meteorological station "Moynak" in Uzbekistan, 59.02° E, 43.47° N) is 11.7°C, with highly variable annual precipitation falling between 60 and 140 mm. In the northern portion (meteorological station "Aralskoe More" in Kazakhstan, 61.67° E, 46.78° N) the average annual temperature is 7.8°C, and the average annual precipitation is 141 mm. Potential annual evaporation rates of 800-1,100 mm in the northern portion and of 1,000-1,300 mm in the southern portion are typical (Breckle et al. 2012).

In-situ reference data
A key dataset for this study was a comprehensive set of ground reference data from field visits in the Aralkum, which was used for calibrating and validating the applied remote sensing methods. A team of specialists from Scientific-Information Center of the Interstate Commission for Water Coordination of Central Asia (SIC-ICWC) and the International Innovation Center Prearalie under the President the Republic of Uzbekistan (IICP), Karakalpakistan Hydrogeological expedition, Institute of Bioorganic Chemistry of the Academy of Sciences of Republic of Uzbekistan, visited the Uzbek portion of the Aralkum in October 2019 and June 2020 to perform the ground measurements (Dukhovny, Stulina, and Kenjabaev 2020). The location of the in-situ samples is shown in Figure 2 and exemplary photographs in Figure 3.
The survey team collected comprehensive information about land cover conditions at 2,142 different "sample" locations in situ (see Table 1 and Figure 2). Photographs and locations recorded with a Global Positioning System (GPS), using Garmin Etrex 30x, and a location accuracy of circa 2-3 m were taken at each location and later complemented by on-screen collected reference data (see section 4.1.2 and). On each site, vegetation cover, species composition, and soil condition were recorded. The sites were selected to be homogeneous landscapes within a radius of at least 100 meters around the visited location. The Land cover Classification System (LCCS) developed by the United Nations (UN) Food and Agriculture Organization (FAO) (Di Gregorio and Jansen 2005) served as the basis for distinguishing land cover classes. It was adjusted and extended to contain information about 17 distinct land cover classes (Dukhovny et al. 2008), which were identified as relevant for describing the surface types and their associated eolian erosion risk. The original 17class schema was later reduced to 10 classes (see section 4.1.2). Figure 3 exemplarily shows photographs of different land cover categories observed in the Aralkum.

Landsat-8 satellite images
The data used to create the land cover maps included Landsat 8 Operational Imager (OLI) data (Table 2 and Figure 4). The Landsat-8 OLI images contain five visible and near-infrared (VNIR) bands, three short-wave infrared (SWIR) bands, and two thermal infrared (TIR) bands. The analysis-ready dataset, i.e. the United States Geological Survey (USGS) Landsat 8 Surface Reflectance Tier 1 product, is atmospherically corrected surface reflectance from the Landsat 8 OLI/ TIRS sensors. The data was atmospherically corrected by USGS using the C-Function of Mask (CFMask) algorithm for Landsat-8 (Foga et al. 2017) and includes a cloud, cloud and topographic shadow, water and snow mask, as well as a per-pixel saturation mask. Strips of collected data are packaged into overlapping "scenes" covering approximately 170 km x 183 km using a standardized reference grid. Both data sets, Landsat-8 and 5, were from 2020.

Sentinel-1 satellite data sets
The Sentinel-1 mission is a constellation of two polar-orbiting satellites, operating day and night performing C-band synthetic aperture radar imaging, enabling them to acquire imagery regardless of the weather (Figure 4). The Sentinel-1 C-band operates at a central frequency (ν) of 5.404 GHz in the microwave portion of the electromagnetic spectrum corresponding to a wavelength (λ) of 5.55 cm (ESA 2021a). Over the study area, Sentinel-1 acquisitions are in the so-called interferometric wave (IW) mode, which registers the backscattering of a vertically transmitted microwave signal in a vertical and horizontal receiver, creating a vertical transmit and vertical receive (VV) and vertical transmit and horizontal receive (VH) polarized band, respectively (ESA 2021a). The two-satellite constellation offers a 6-day exact repeat cycle at the equator (ESA 2021a). Since the orbit track spacing varies with latitude, the revisit rate is significantly greater at higher latitudes than at the equator. Starting from the Ground Range Detected (GRD) Level-1 product, the data was preprocessed with the Sentinel-1 Toolbox (ESA 2021b) through a) thermal noise removal, b) radiometric calibration, and c) terrain correction to create geocoded, backscatter coefficients (σ0) with 10 m pixel spacing (ESA 2021a).

Sentinel-2 satellite data sets
Sentinel-2 with its Multi-Spectral Instrument (MSI) is a wide-swath, high-resolution, multi-spectral imaging mission (ESA 2021c) (Table 2 and Figure 4). The full mission specification of the twin satellites (Sentinel-2A and 2B) flying in the same orbit but phased at 180°, is designed to give a high revisit frequency of 5 days. Sentinel-2 carries an optical instrument payload that samples 13 spectral bands: four bands at 10 m, six bands at 20 m and three bands at 60 m spatial resolution (Table 3). The orbital swath width is 290 km (ESA 2021d). Atmospherically corrected Sentinel-2 images (MSI Level-2A) were obtained from the ESA/ Copernicus portal (ESA 2021e). Only the 10 and 20 m bands were used for this study, while the coarser bands were omitted to be better comparable to Landsat-8.

Methods
We created land cover maps, based on different satellite data inputs, with ten land cover classes for the year 2020. We quantified the accuracy and uncertainty of our classifications and summarized and discussed land cover in the Aralkum, as well as potential applications of this method to support environmental monitoring assessments.

Land cover classification
Our land cover classification for the Aralkum combined cloud-based (Google Earth Engine) processing for the pre-processing of the satellite data and local computing approaches (using the software R) to classify, validate, post-process, and map the classes. The necessary processing steps are described below and illustrated in the flowchart in Figure 5. First, the reference data was collected during insitu field campaigns in the Aralkum, one in 2019 and another one in 2020 (see section 4.1.2). During these two campaigns, a total of 2,142 ground (insitu) samples were collected. The reference data was then complemented on-screen and further enhanced with a synthetic minority over-sampling technique (SMOTE) (Chawla et al. 2002). We defined ten distinct land cover classes (see) for our classification and evaluated several input data sets from different satellite systems (Sentinel-1 and 2, and Landsat-8), which we classified with a Random Forest algorithm. The land cover maps,  based on different inputs, where evaluated in terms of an accuracy assessment and a spatial measure of classification uncertainty was provided.

Calculation of predictor variables
The following steps were done in Google Earth Engine (Gorelick et al. 2017). We used the atmospherically corrected Landsat-8 and Sentinel-2 images to create a stack of images for 2020, the year for which we collected most of the in-situ data. We first calculated the percentage of area affected by cloud or cloud shadow (no-data pixels) for each optical image. These calculations used the CFMask algorithm for Landsat-8 and applied the 'QA60ʹ flag provided in the Sentinel-2 metadata by ESA. Images with percentages above 90% of no-data pixels and images without orthorectification according to their metadata were not processed. We then applied the CFMask algorithm (Landsat) and the ESA Sentinel 'QA60ʹ flag to mask clouds, cloud-shadows, snow, and dust/haze. Each stack consequently contained different valid observations at the pixel-level, depending on cloud, snow and haze/dust. Different spectral indices were then calculated and appended to the cloud-free Landsat-8 and Sentinel-2 image stacks, along with the spectral bands for use in land cover classification. To capture differences in land cover with optical sensors, we calculated vegetation indices (VI). Vegetative cover can be derived by applying such indices (e.g. the NDVI, see equation 1) (Rouse et al. 1974), which was also used in several studies about the Aralkum (Kim et al. 2020;Löw et al. 2013;Shen et al. 2019): where NIR is the near-infrared and Red the red multispectral bands of the satellite sensors (see Table 2). The Normalized Difference Water Index (NDWI, equation 2) (Gao 1996), also used in a previous Aralkum study (Shen et al. 2019), was selected to capture the subtler differences in vegetation vigor and potentially also in soil moisture: where SWIR is the shortwave infrared and NIR the near-infrared multispectral bands of the satellite sensors (see Table 2).
The Salinity Index (SI, equation 3) (Gorji, Sertel, and Tanik 2017) was utilized to improve the mapping of marsh solonchak and solonchak soils as well as the separation of non-saline, sandy and bare soils: SI ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where Blue is the blue and Red the red multispectral bands of the satellite sensors (see Table 2). The Tasseled Cap Transformation (TCT) is a technique commonly used in land cover mapping or other classification applications (Kauth and Thomas 1976). It takes the linear combination of satellite imagery bands and a specialized coefficient matrix to create a n-band image with the first three bands containing the majority of the useful information, similar to Principal Component Analysis (PCA). The first three bands represent brightness, greenness, and wetness, respectively. The coefficient matrix, which is unique to each imaging sensor, is based on image statistics and empirical observations. Brightness, greenness, and wetness indices were calculated for the Landsat-8 (Baig et al. 2014) and Sentinel-2 data (Shi and Xu 2019).
We kept the atmospherically corrected, multispectral bands with a spatial resolution of 10-30 m from the Landsat-8 (7 bands) and Sentinel-2 (10 bands) images, respectively, and harmonized the spatial resolution of the input datasets to a common pixel size of 10 m for all sensors using nearest neighbor resampling.
Maximum value composites were then produced by computing the maximum value of every predictor variable (multi-spectral bands, NDVI, NDWI, SI, and TCT indices) at the pixel-level in three distinct epochs, guided by data availability: (i) Epoch-1 between March and May, (ii) epoch-2 June-August, and (iii) epoch-3 between September and November. Gaps which occurred due to the previous masking of clouds, were thereby temporally filled.
For the Sentinel-1 data, we first applied speckle filtering using a Lee Sigma with a 5 × 5 window to the VV and VH bands, respectively, and then computed bi-monthly maximum composites of the filtered VV and VH polarized bands to reduce the amount of data. In addition, we computed the ratio of VV to VH as predictor variable.
We tested different combinations ("sets") of variables as predictors in the classification (Table 3). Under Set A, the multi-spectral bands, NDVI, NDWI, SI, as well as the TCT indices were calculated based on the Landsat-8 composites. Set B uses the same approach but is based on Sentinel-2 composites, while Set C combines Set A and Set B. Set D is exclusively composed of Sentinel-1 bi-monthly composites (VV, VH, and ratio). Finally, Set-E combines all optical (Set-C) and SAR (Set-D) sets, i.e. 159 predictor variables.

Preparation of reference data
The final class legend used for mapping consisted of 10 instead of 17 classes, as originally recorded during the field surveys (). After considering the constraints that the broad spectral resolution of the Landsat-8 and Sentinel-2 sensors impose when used for classifying spectrally similar land cover classes, especially different salt soils, and based on initial tests (not reported here), we decided to skip or merge some classes.
In addition to the reference data from ground sampling, sample locations were collected remotely for different classes in the whole study area due to the relative inaccessibility of these areas. This was done by visually interpreting high-resolution images (such as Google Maps) and different band combinations of Landsat-8 and Sentinel-2 images from 2020. In the same way, some outliers or samples from the field surveys represented by mixed pixels were removed.
As an additional means of quality control for the reference data, homogeneous regions in terms of reflectance were identified with a K-medoids clustering (Kaufman and Rousseeuw 1990). For this purpose, the predictor variables listed in Table 3 (Set-C) were used as input. Based on the structural Silhouette technique, the optimal number of clusters was automatically determined (Chiang and Mirkin 2010). Then, a circular buffer of 120 m was created around each reference data point (GPS or on-screen collected) and the sample was only kept for homogeneous regions (i.e. only one cluster was allowed to be present in the 120 m buffer area around the sample point location).
For the validation data, 50% of the samples (including the circular buffer area) were randomly set aside for the accuracy assessments per land cover category, the remaining samples were used for training of the classifier algorithms. One hundred points were randomly created inside the validation samples per land cover category and used to assess the accuracy of the maps (see section 4.2).
Despite the extensive in-situ and additional onscreen sampling, the number of training locations of some land cover class was small. This is attributed to the fact that the area is partially not well accessible. In the context of supervised classification, imbalanced training samples are often handled by over-and undersampling to achieve a more balanced training data set. Hence, classification was done using a balanced training dataset in which all classes had the same number of training samples as the majority class category.
In this study, we applied SMOTE (Chawla et al. 2002) in the programming software R (within the "DMwR" package, version 0.4.1) to generate synthetic calibration samples. SMOTE's core idea is to artificially generate new samples of the minority class using bootstrapping and k-nearest neighbors. As a hybrid method, SMOTE features both oversampling of the minority class and under sampling of the majority class (Waldner, Jacques, and Löw 2017). SMOTE was chosen because it was found to increase classification accuracy in previous studies (Johnson, Tateishi, and Hoan 2013;Johnson and Iizuka 2016;Pozi et al. 2015).

Land cover classifications
We used a RF classifier (Breiman 2001) in the programming language R (as implemented in the "randomForest" package, version 4.6-14) to classify the input data and to map the land cover classes. RFs generate a multitude of decision trees by randomly drawing samples with replacement from the training data and determining the best split at each decision tree node by considering a maximum number of randomly selected features ("max features"). We tested different parameter ranges for the number of trees and for "max features" and finally used 300 trees and 10% of the input features considered at each split to parameterize the RF classification models. We trained a RF model for each of the distinct input data sets A-E (see Table 3).
To enhance the reliability of the experiments, i.e. to avoid overfitting and increase the robustness of our results, each classification with RF was repeated 100 times using different folds of the training data set. For each fold, we performed a land cover classification using only 85% of the training data (note that the validation dataset was not subsampled as it was only used to validate the final ensemble of the independent classifications). The mode of the 100 independent classifications was used as the mostlikely land cover class for every pixel at each timestep, and validated with the independent validation data.

Post-classification
We used an erosion-dilation kernel of radius 1 in the programming language R (implemented in the "terra" package, version 1.3-4) eroding three times and dilating twice was then applied to smooth the land cover maps and remove isolated pixel noise ("salt and pepper," see Figure 6) (d'Andrimont et al. 2020).

Accuracy assessments and land cover change analysis
We evaluated the classification results with the 100 samples per land cover category (see section 4.1.2), applying standard accuracy assessment metrics based on error matrices (Congalton 1991). We followed the best practice recommendations for assessing classification accuracies and area estimation (Olofsson et al. 2014), as well as those for calculating accuracy estimates with confidence intervals (Foody 2009). We calculated the following metrics to assess the accuracy of the different classifier models, using the caret package (Kuhn 2012) in R: • Overall classification accuracy including 95% confidence intervals • Kappa coefficient of covariation • User´s (UA) and producer´s (PA) classification accuracy • Class-wise F1 scores, the harmonic mean between UA and PA (Van Rijsbergen 1979) The overall accuracies achieved by the different classification approaches were reported at 95% confidence intervals. The confidence interval of the difference (inequality, equation 4) in overall accuracy values between two classifier algorithms is be given as: where SE p 1 À p 0 is the standard error of the difference between two estimated proportions with z ¼ 1.96 and α ¼ 0.05. p 1 and p 0 are the proportions of correctly classified test samples of the two classifiers under comparison. Using confidence intervals reveals more information about the disparity in the compared classification accuracy values (Foody 2009). Besides the final class label non-parametric algorithms like RF can give, for each classified object or pixel x, a soft answer that provides an estimation of the membership degrees of x to the land cover category (Giacco et al. 2010). In the RF framework it is defined as the number of trees in the RF ensemble contributing to the final class (Loosvelt et al. 2012). This membership degrees is in the form of a vector (equation 5), which contains the class membership estimations associated with x: pr x ð Þ ¼ p 1 ; p 2 ; . . . ; p i ; . . . ; p n � � (5) where p i is the membership degree of x to class i, and n the number of land cover classes. Each of the elements in pr x ð Þ can be interpreted as a degree of belief or probability that a case x actually belongs to class i. In this study two measures of uncertainty were calculated from each vector pr x ð Þ. These are based on the maximum posterior classification probability value in pr x ð Þ, pr max x ð Þ, and the α -quadratic entropy (Pal and Bezdek 1994). The maximum probability pr max x ð Þ belongs to the class i that is usually taken as the final class when the soft results are transformed into a hard one. E ¼ 1 À pr max x ð Þ gives a measure of doubt that can be used to quantify map uncertainty. The αquadratic entropy (equation 6) is defined as where pr x ð Þ is the vector that contains the soft outputs, n the number of classes, and α an exponent that determines the behavior of H / pr x ð Þ ð Þ.The advantage of this measure is that it summarizes all the information contained in pr x ð Þ, and commits the probabilities of the other classes in the uncertainty evaluation. It also has a higher sensitivity compared to other measures such as the Shannon entropy H (Foody 1995) when the components of pr x ð Þ change. When values of / increase from 0 to 1, then H / pr x ð Þ ð Þ will become more and more selective if the components in pr x ð Þ tend toward equalization. In this study, / = 0.75 was chosen to impose a stronger penalty under such a scenario.
Finally, the ratio of the observed to the maximum possible H / pr x ð Þ ð Þ was normalized to a scale of [1-100]. If a pixel is found to have maximum probability (minimum entropy) of belonging to one class ( pr max x ð Þ ¼ 1) then it will have an alpha score H / pr x ð Þ ð Þ of 100. H / pr x ð Þ ð Þ was calculated for each class per input data set tested and maps of H / pr x ð Þ ð Þ values were created to visualize spatial variations in the reliability of land cover mapping (also referred to as map uncertainty).

Accuracy assessment of the land cover classification
The influence of the selected predictor variables, as defined by five distinct input data sets (A-E), was observed and expressed as variability in classification accuracy and uncertainty. For instance, at a global level (see Table 4), the overall accuracy for the 2020 land cover maps, based on different sets of predictor variables, ranged from 85.3% (Set-B, Sentinel-2) to 88.8% (Set-E, all data sets combined). Comparing all input data sets, the average overall accuracy was 86.8%, while the overall accuracy of the land cover map based on the combined optical and SAR predictor variables (Set-E) was highest (88.8%). In terms of statistical significance, however, there is no significant difference among all models tested at the 95% confidence level and when looking at overall accuracy, as indicated by the overlapping 95% confidence intervals. At a land cover class level, accuracies based on different input data sets varied more as compared to the overall accuracies (see Table 5). Among the different input data sets, Set E (all data sets combined) achieved the highest F1 scores for most of the land cover categories (seven out of ten). For instance, the F1 scores for Set E varied from 0.740 to 1.00 for "Solonchak" and "Water," respectively. The same pattern was found for other input data sets. "Water," "Shallow water," "Marsh/ shore solonchaks," and "Meadow" tended to have the highest F1 scores, while other land cover categories with no or only sparse vegetative cover (e.g. "Desert soils-1") tended to have the lowest. "Shrubland-1" and "Shrubland-2" were mapped with relatively high accuracies (F1 scores > 0.85 for all input data sets tested). Table 6 presents the median alpha scores H / pr x ð Þ ð Þ for different land cover classes and Figure 7 illustrates the spatial variability of classification uncertainty. Low levels of uncertainty, expressed by high alpha scores, were present in large areas of homogeneous land cover such as "Water," "Marsh/shore Solonchaks," "Solonchaks," or "Meadow." Conversely, areas with Table 4. Overall classification accuracies for different input data sets (predictor variables) in 2020. The maximum overall classification accuracy value (representing highest accuracy in the land cover maps) among the input data sets for a given class is boldface. spatially complex landscapes tended to present medium to low alpha scores i.e. the classifier assigned different land cover classes to the same pixel with similar probability. This was especially apparent in the vast transition zones between the salt soils (e.g. "Marsh/shore solonchaks" and "Solonchaks") and other, non-vegetative land cover types (see Figure 7) . Similar to the F1-scores (Table 5), combining all predictor variables decreased the classification uncertainty. Set E had the highest alpha scores (lowest uncertainty) in four out of ten land cover classes (see Table 6).

Land cover area assessment
The resulting land cover map revealed that in 2020, approx. 9% of the Aralkum territory was covered by water (compare also Table 7). This confirms the significant loss in the lake's surface area found in other studies, as well as continued shrinking of the water surface; in 2015, for example, surface area still approximated 14%, as measured with Landsat data (Shen et al. 2019).
The area of strong salt accumulation in the topsoil, including salt crusts, covered a large fraction of the total Aralkum (~32%), while shrublands could be Table 5. F1 scores for different input data sets (predictor variables) in 2020. The maximum F1-score value (representing highest perclass-accuracy in the land cover maps) among the input data sets for a given class is boldface. 1 = Water, 2 = Shallow water, 3 = Marsh/shore solonchak, 4 = Solonchak, 5 = Sandy soils with sparse vegetation, 6 = Desert soils-1, 7 = Desert soils-2, 8 = Meadow, 9 = Shrubland-1, 10 = Shrubland-2.  Table 6. Median values of the alpha score H / pr x ð Þ ð Þ calculated from the total of classified pixels, for each of the land cover classes and different sets of predictor variables. The maximum alpha score value (representing a minimum of uncertainty in the land cover maps) among the input data sets for a given class is boldface. The first value given is the F1 score, the second value the alpha score H / pr x ð Þ ð Þ. 1 = Water, 2 = Shallow water, 3 =Marsh/shore solonchak, 4 = Solonchak, 5 = Sandy soils with sparse vegetation, 6 = Desert soils-1, 7 = Desert soils-2, 8 = Meadow, 9 = Shrubland-1, 10 = Shrubland-2.  found on ~16% of the area (Table 7). While the area of salty soils increased compared to 2015 (Shen et al. 2019), which can be explained by the continued decrease of the water level (Singh et al. 2018), the area of shrubland differs from Shen et al. (2019) (2%) but is more in the range of Löw et al. (2013) (16%). The difference can be explained by a continued increase in the vegetative cover (Kim et al. 2020), but also by the different definition of land cover classes like shrub cover. The combined area of "shrubland" and "grassland" in the study of Shen et al. (2019) is in fact 15%, and our shrubland categories contain different abundancies of shrubland including herbaceous vegetation.
The desiccation of the Aral Sea is accompanied by a sinking groundwater table and conversion of marsh solonchak to more sandy soils (Wucherer and Breckle 2001), a process that is characterized by increasing salinization (Novikova and Aldyakova 2006). For this reason, "March/shore solonchaks" are most often found directly adjacent to the remaining water or in the shallow eastern part of the Aralkum, which becomes periodically inundated, depending on the inflow of the Amudarya river (Micklin 2016). With continued desiccation, soil particles are exposed, moisture is lost, and sandy soils proliferate (Stulina and Sektimenko 2004). This is reflected by a spatialtemporal sequence of different soil types on the exposed seabed that can be seen in our maps: automorphic to hydromorphic solonchaks closer to the shoreline ("March/shore solonchaks") and desert sandy soils or sands and dunes at a greater distance from the coastline (Figure 8). Over time, salt soil tends to shift to bare area (Löw et al. 2013;Shen et al. 2019). We purposefully distinguish between different salt affected soil categories, not only due to significant differences in wind erodibility, but also because they reflect these processes. It has been shown that the two types of land cover (salt vs. bare) do not always change in one direction (sand soil to bare area), but can also switch from bare area to salt soil (Shen et al. 2019). Finally, as vegetative encroachment continues, shrub vegetation might establish and secure the loose sandy top soils, ultimately decreasing the soil erosion risk (Dukhovny et al. 2008).

Potential Application of Data and Methods
Our results provide a comprehensive land cover and uncertainty map with ten dedicated classes for the Aralkum. Until now, such data was available only at coarse-spatial resolution or limited to smaller geographic areas (e.g. Löw et al. 2013Löw et al. , 2016 and lacked confidence information at pixel level. Studies using historical multiclass information at finer resolution (30 m) were found to be accurate but have been limited to only a few land cover classes (e.g. Shen et al. 2019), hindering the detailed assessment of different risk levels of soil erosion as previously proposed for the Aralkum (Dukhovny et al. 2008).
The high resolution and accuracy of our map combined with the per-pixel confidence layer make it suitable for assessing land cover and land cover change, and it can become a basis for other methods to assess erosion risk in the Aralkum. Mapping soil erosion risk could be used for spatially targeting afforestation measures, an appropriate method for reducing the effects of soil salinization in the Aralkum . Several strategies for afforestation and restoration of the former Aral Sea bed were presented by several agencies and projects (Roll et al. 2003), such as the joint project "Addressing the urgent human insecurities in the Aral Sea region through promoting sustainable rural development" of the United Nations Development Programme (UNDP) and the United Nations Educational, Scientific and Cultural Organization (UNESCO) (UNDP-UNESCO 2021). While selecting potential afforestation sites requires the recording of several parameters such as soil moisture, terrain, or salinity, areas with high Table 7. Land cover area in 2020 (square kilometers, sqkm) for the 2020 land cover classification, based on input data "Set-E." 1 =Water, 2 =Shallow water, 3 =Marsh/shore solonchak, 4 =Solonchak, 5 =Sandy soils with sparse vegetation, 6 =Desert soils-1, 7 =Desert soils-2, 8 =Meadow, 9 =Shrubland-1, 10 =Shrubland-2. erosion risk should also be included. The land cover map generated in this study is a suitable input for guiding these measures in the vast Aralkum area. Due the timespan and resolution of Landsat data, the proposed method is additionally suitable for backtracing changes in land cover, soil erosion risk, and ongoing desertification. Although the inclusion of Sentinel-1 Synthetic Aperture Radar (SAR) data increased the accuracy of the maps, the use of features from the Landsat-8 satellite performed nearly as well and could be applied to track land cover change in the years preceding the Sentinel missions in 2014. Supplementing with Landsat-5 or decision fusion with the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments ) may help to compensate for the lack of the SAR data in previous years.

Accuracy, uncertainty, and its sources
Besides enhancing the spatial resolution of available land cover information in the Aralkum (10 m), we provided a recent land cover map in 2020 with higher , per-pixel alpha score, which is high in landscapes with higher top-soil salt accumulation (e.g. shore solonchaks with visible salt crust) and low especially in the transition zones between the highly salt affected landscapes and the mostly non-vegetated landscapes around (other solonchaks without salt crust, sandy soils). classification accuracies and spatial resolution, as compared with previous studies (Löw et al. 2013;Shen et al. 2019). Our mapping product additionally offers previously unavailable per-pixel confidence information. With the exception of "Shallow water" and "Solonchaks," the F1 scores presented similar magnitudes for every class, indicating accurate estimations of land cover area.
In the case of the mostly non-vegetated land cover classes, commission and omission errors achieve lower accuracies, and are related to the semantic representation and generalized definition of the realworld land cover classes and the presence of large transitions zones between salty soils, sandy soils and vegetated areas. We defined our land cover classes to comply with the proposed scheme of assigning land cover categories to certain erosion risk classes (Dukhovny et al. 2008). In this context, a compromise is made between the reliability of the method for classifying distinct land cover categories (e.g. water and shrublands) and those that present similar spectral characteristics but needed to be separated to allow for a proper risk class assignment.
While restricting the number of predictor variables and increasing the training sample size used in the SMOTE classification or adding other predictor variables (different vegetation indices) may have increased the accuracy of the final classification, the effect is likely to be minimal.
The land cover map has an associated alpha score, which we used as a proxy of per-pixel-classification confidence. Such maps provide useful information about the reliability of the classifications as well as for areas with higher uncertainty (i.e. low alpha scores). We strongly recommend using the spatial uncertainty indicator when making use of these maps and guiding decisions related to managing areas prone to high erosion risk in the Aralkum. In addition, per-pixel confidence maps provide information about the areas where more ground-truth data might be needed. Areas with high uncertainty were similar across all tested input data sets and were most often concentrated in heterogeneous landscapes or in small patches. It should be further noted that due to the local conditions in the Aralkum and inaccessibility of some terrain, a random or stratified random sampling scheme could not be realized on ground. The sample was therefore subsequently balanced and enhanced by the SMOTE method. Previous studies found that SMOTE can improve the machine learning methods for remote sensing based land cover mapping (Douzas et al. 2019;Feng, Huang, and Bao 2019;Johnson and Iizuka 2016;Wang et al. 2019).
The mapping results also revealed further advantages of the Sentinel 1 sensor over its optical counterpart (Sentinel 2, Set B) and its suitability for classifying some land cover classes. "Shallow water" can be still distinguished from "Water," since the backscatter coefficient is a mixed signal of specular reflection on the water surface and double-bounce signals from the reed vegetation with stems rising above the water surface, while the infrared information of the optical sensor is absorbed in both classes. The dielectric constant, prominent in the top/surface-layer of the saline "Marsh/shore solonchak" provides further information on salinity content in soils; when soil moisture is low, the imaginary part of the dielectric constant increases with salinity (Taghadosi, Hasanlou, and Eftekhari 2019), i.e. high absorption of energy, resulting in a lower backscatter coefficient σ0 (Hoa et al. 2019). Conversely, the high sensitivity to surface roughness and soil moisture content in the first 5 cm of the topsoil results in a higher backscatter intensity thereby aiding in the identification of different soil types, soil granularity and mixed vegetation types (Benninga, van der Velde, and Su 2020).

Conclusion
We have evaluated several remote sensing-based predictor variables to map land cover at 10 m in the Aralkum in 2020, based on ground-truth data. The map is, to our knowledge, the first of its kind to be developed with ten distinct land cover classes for the Aralkum, and one of only a few of any kind available. It was created with a combination of optical (Sentinel 2, Landsat 8) and SAR (Sentinel 1) satellite data and has an overall accuracy of 88.8%. Per-class F1 scores were 0.88 on average (median) for ten land cover classes, underscoring the confidence of our estimates.
By exploiting the capabilities of GEE for analyzing huge amounts of data in the cloud, we have demonstrated that time-series mapping of land cover over the large Aralkum area at high-spatial resolution is feasible. This land cover assessment framework is suitable for understanding land dynamics and identifying risk zones where mitigation measures such as large-scale afforestation could be implemented. It can in turn become the basis for decision-making and monitoring of afforestation in targeted areas. To this end, our map provides a spatial indicator of classification accuracy that can guide the decision-making process.