A novel hybrid sand and dust storm detection method using MODIS data on GEE platform

ABSTRACT Accurate sand and dust storm (SDS) detection is important for assessing SDS disaster risk. Machine learning (ML) based SDS detection approaches have been widely used in recent years due to their higher accuracy and better detection results. However, these approaches usually require manual annotation of numerous training samples, which is laborious and time-consuming in practice. To overcome this challenge, we propose a novel hybrid SDS detection method that combines the support vector machine (SVM) algorithm, implemented on the Google Earth Engine (GEE) cloud computing platform, with a spectral index that aids the automatic labelling of training samples. The effectiveness and accuracy of this method were assessed on 8 SDS events captured by MODIS over Arid Central Asia (ACA) and compared to traditional approaches. The experimental results indicate that the proposed method can distinguish between mixed pixels (thin cloud and land surface) and SDS pixels and that it minimizes misdetection more effectively. The method achieved more than 98% training accuracy and validation accuracy in SDS detection.


Introduction
As a common meteorological hazard, sand and dust storms (SDS) present a formidable challenge to global sustainable development, particularly in arid and semiarid regions (Middleton et al., 2019). SDS have significant socio-economic impacts on transportation, infrastructure, industry, agriculture, human health and solar power plants (Cheng et al., 2015; Jugder et al., 2011; Middleton et al., 2019). Thus, effective SDS detection is critical to the implementation of the United Nations (UN) 2030 Agenda and sustainable development goals (SDGs; Gholizadeh et al., 2021). In arid regions, traditional ground-based observations are generally sparse and noncontinuous, so it is difficult to capture the spatial extent of SDS events with the desired accuracy. Satellite data of different types play different roles in SDS detection, depending on their spatial, temporal, and spectral resolutions (Li et al., 2021). The detection approach used to identify dust pixels also determines the effectiveness of SDS monitoring. There are basically two kinds of SDS detection algorithms: empirical-based and machine learning (ML)-based methods.
Empirical-based methods are relatively mature and commonly used. They rest on the premise that earth surface features have characteristic spectral signatures describing their absorption and scattering at different wavelengths, and that spectral indices can distinguish object types by highlighting the major differences between these signatures. Both solar reflectance bands and infrared emissivity bands have been used for this purpose. Spectral indices commonly applied in SDS detection research include the Normalized Difference Dust Index (NDDI; Qu et al., 2006), Enhanced Dust Index (EDI; Han et al., 2013), Improved Dust Identification Index (IDII; Zandkarimi et al., 2020) and Dust Storm Detection Index (DSDI; Jebali et al., 2021). The advantage of these methods is that they are based on empirical thresholds, which can be comprehensively determined by mineralogical dust composition and vertical dust characteristics (Darmenov & Sokolik, 2005). However, the phenomenon of "different objects with the same spectrum" often arises when these methods are used for SDS detection. Additionally, the index thresholds for SDS detection vary between regions (Jebali et al., 2021). These indices alone are insufficient for distinguishing dust pixels from mixed pixels of clouds, especially thin clouds, and other objects (Han et al., 2013). We therefore propose a new multi-band index to mask the misidentification caused by mixed pixels of thin clouds and other objects.
ML algorithms have attracted increasing interest for SDS detection, including random forest (RF; Berndt et al., 2021), support vector machine (SVM; Shi et al., 2020) and artificial neural networks (ANN; El-ossta et al., 2013). Shi et al. (2020) found that SVM-based supervised classification methods perform well in SDS detection. The advantage of an ML-based method is that it can fuse different datasets as input data to improve SDS detection accuracy. However, in traditional ML-based methods, including supervised, semi-supervised, weakly supervised, and even active learning, most of the training samples are manually labelled. The classification accuracy of these models usually depends more on the manually labelled samples than on the model itself. Thus, the SDS detection results of ML classifiers are highly dependent on a large number of independent, unbiased, labelled samples obtained from tedious and time-consuming manual labelling (Ahmed et al., 2020). This dependence makes such methods poorly suited to large-scale, multi-temporal, near real-time and continuous satellite observations, which compromises their SDS detection performance and reduces their practicality. Although the dependence of dust indices on empirical thresholds limits the empirical-based methods mentioned above, these methods have the advantage of simple and efficient mathematical calculation. Because of this advantage, they can supply the labelled samples required for ML-based classification.
Based on the findings mentioned above, this letter proposes a novel hybrid SDS detection method that integrates the advantages of empirical-based and ML-based methods to improve SDS detection accuracy and increase the potential scope of remote sensing (RS)-based SDS detection. In addition, we propose a Multi-Band Snow Cloud Index (MBSCI) that expresses the difference between snow, clouds, and other objects, providing a means to distinguish thin clouds from dust in particular. This study used MODIS data for SDS detection on the Google Earth Engine (GEE) platform.

Study area and datasets
Arid Central Asia (ACA) is one of the four largest SDS regions and one of the main potential SDS sources in the world (W. Wang et al., 2020). ACA is located between longitudes 46°E-107°E and latitudes 34°N-54°N, and includes five countries in Central Asia and the arid area of Northwest China (Figure 1). ACA is the largest arid region in the mid-latitudes of the Northern Hemisphere. Under the influence of the westerly circulation, large amounts of sand and dust produced in this region are carried by westerly winds to North China, the North Pacific and as far as the Atlantic Ocean (Indoitu et al., 2012). As shown by the red boxes in Figure 1, the study areas include two different deserts in ACA. As the Aral Sea shrinks, its lakebeds are gradually drying up, and the Aralkum Desert (AD), one of the new-born deserts, has become one of the most active SDS sources in ACA (Shen et al., 2016). In addition, the Taklimakan Desert (TD), the world's second-largest shifting sand desert, is the most active SDS source in ACA (Chen et al., 2017). In  (Table 1).
Table 1. Summary information on the datasets used

Methods
In previous SDS detection studies, dust pixels were difficult to distinguish from mixed pixels containing clouds and other objects. For example, thin cloud pixels are misclassified as SDS pixels because they carry mixed ground reflection information. Han et al. (2013) introduced the concept of dust optical density (DOD) to construct the EDI for SDS detection and reported good performance. DOD, as well as the dust endmember fraction (ɑ), was estimated by the linear spectral unmixing (LSU) method (Keshava & Mustard, 2002). Therefore, to emphasize the difference between SDS pixels and other objects, the endmember fractions were introduced as parameters for distinguishing different objects. In this study, spectral unmixing was implemented on the GEE platform, and the output values were constrained to be nonnegative. Endmember selection was based on the Pixel Purity Index (PPI) method, and the Vertex Component Analysis (VCA) method was adopted to select six types of endmember objects (dust, water, vegetation, cloud, desert, snow; Nascimento & Dias, 2005). In Figure 2, the first row shows the true colour images of SDS events. The second row shows the spectral unmixing results as an RGB composite (Red: dust, Green: cloud, Blue: snow, Black: other components). Meanwhile, EDI (Han et al., 2013), NDDI (Qu et al., 2006), NDVI (Tucker, 1979), NDWI (McFeeters, 1996), NDSI (Salomonson & Appel, 2004), MBWI (X. Wang et al., 2018) and MBSCI were computed according to the formulae shown in Table 2, where ɑ is the dust endmember fraction calculated with the LSU method and B1-B7 are the surface reflectances for bands 1-7 provided by MOD09GA version 6. Compared with the SDS event images shown in Figure 2, the MBSCI appears to perform well in distinguishing clouds and snow from other objects. The results indicate that MBSCI values in the range [1, 10] identify clouds.
When MBSCI is greater than 1 and less than 10, the pixel is regarded as cloud-covered; within this range, values below 2 are considered thin cloud. When MBSCI is greater than 10, the pixel is regarded as snow-covered.
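The threshold rule above can be written out as a small decision function. This is a minimal sketch, not the authors' implementation: the function name and label strings are hypothetical, the thin-cloud class is assumed to be the sub-range between 1 and 2, and values exactly equal to 1 or 10 are assigned to "other" because the text does not say how boundary values are treated.

```python
def classify_mbsci(mbsci: float) -> str:
    """Map an MBSCI value to a coarse label using the thresholds in the text.

    Assumptions (not stated in the source): thin cloud is the sub-range
    1 < MBSCI < 2, and boundary values (exactly 1 or 10) fall into "other".
    """
    if mbsci > 10:
        return "snow"
    if 1 < mbsci < 2:
        return "thin_cloud"
    if 1 < mbsci < 10:
        return "cloud"
    return "other"


# Illustrative values only; they are not taken from the paper.
labels = [classify_mbsci(v) for v in (0.3, 1.5, 5.0, 12.0)]
```

Ordering the checks from the most specific range to the most general keeps the thin-cloud case from being absorbed by the broader cloud range.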
To obtain a large number of training samples, the multi-threshold method was used to extract labelled samples of different objects: Cloud, dry Lakebed (Lb), Water Dust (WD), Thick Dust (TcD), Thin Dust (TnD), Water Bodies (WB), Bare Land (BL), Vegetated Land (VL), Snow and Ice (SI). Based on the cloud types issued by the World Meteorological Organization (WMO) and their spectral reflectance characteristics, clouds were divided into three types: Thick Cloud (TcC), Cloud-Cirrostratus (Cs), Thin Cloud (TnC). First, we investigated the band reflectance, spectral index values and endmember fractions of the different objects (Figure 3). Then, referring to previous studies and to the reflection characteristics of the different objects in the study area, a set of empirical thresholds was defined to extract training samples (Table 3).
Table 2. List of spectral indices adopted, abbreviation, equations, and bibliographic references
In this study, SVM was chosen as the machine learning classification algorithm for SDS detection, for the following reasons. First, the SVM classifier has performed well in previous SDS detection studies, especially those based on multi-spectral images (Rivas-Perea et al., 2013; Shahrisvand & Akhoondzadeh, 2013; Shi et al., 2020). Second, the SDS events captured by RS images are random in spatial scale and vary in intensity, whereas SVM works well with a limited amount of training data because it seeks one or more optimal hyperplanes that minimize the structural risk and classification errors. Third, in a trial-and-error procedure, the SVM classifier performed best for SDS detection among the ML methods provided by the GEE platform, such as RF, SVM, Gradient Boosting Decision Tree (GBDT) and Classification and Regression Trees (CART; Tamiminia et al., 2020). The LibSVM package was employed for multiclass SVM classification in this study (Chang & Lin, 2011).
In a comparison of several SVM types, C-Support Vector Classification (C-SVC) with a linear kernel and the cost parameter set to 1 was found to perform best. We therefore propose an SVM method that uses spectral-index-aided, automatically labelled training samples (AL-SVM). All classification models were trained and validated on the GEE platform. According to previous studies, SDS events often have specific source areas and spatially continuous dust plumes that diffuse from the source area (Boroughani et al., 2020). On this basis, to describe the SDS-covered area more realistically and to suppress the salt-and-pepper noise caused by image classification, we removed small SDS patches using the scale transformation method. The methodology and algorithm development used in this research are shown in Figure 4.

Results and discussion
To evaluate the performance of the proposed SDS detection method, the spectral index-based methods EDI and NDDI were selected as benchmarks. Figure 5 shows the true colour SDS events and the SDS-covered areas derived with NDDI, EDI and AL-SVM. The first row shows the true colour images of 8 SDS events derived from MOD09GA daily reflectance data with an RGB band combination (R: band 2, G: band 4, B: band 3). The second row shows the SDS-covered areas derived with the NDDI method. When NDDI is greater than 0 and less than 0.28, the pixel is regarded as covered by dust (Karimi et al., 2012). The third row shows the SDS-covered areas derived with the EDI method. When EDI is greater than −0.2, the pixel is regarded as covered by dust. The fourth and last rows show the derived SDS-covered areas and the classification results based on the AL-SVM supervised classification method. In this study, thick dust, thin dust and dust water are regarded as SDS-covered areas. The details of the 8 SDS events are shown in Table 4.
Table 4. Training accuracy (TA) and validation accuracy (VA) of SDS detection
The dark yellow pixels represent the SDS-covered areas (Figure 5). Among the results, NDDI and EDI could generally identify the dust pixels across the whole image. However, dust properties vary significantly among dust sources, and the reflection characteristics of dust plumes in satellite imagery also differ slightly. For example, the salt dust storm that occurred over the Aral Sea (Figure 5(d-f)) shows a higher albedo than the dust storm that occurred in the TD (Figure 5(g-h)). The traditional spectral index methods would therefore need different thresholds to match different SDS reflection characteristics, which brings greater uncertainty and limitations to SDS detection based on these methods.
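The two benchmark rules quoted above reduce to simple threshold tests. The sketch below is illustrative only: the function names are hypothetical, the NDDI expression is the form published by Qu et al. (2006), (B7 − B3)/(B7 + B3), which is assumed here to match the entry in Table 2, and EDI is taken as an already computed input because its formula is not reproduced in this text.

```python
import math


def nddi(b7: float, b3: float) -> float:
    """NDDI after Qu et al. (2006): (B7 - B3) / (B7 + B3).

    B7 and B3 are MODIS surface reflectances for bands 7 and 3.
    Returns NaN when the denominator is zero.
    """
    denom = b7 + b3
    if denom == 0:
        return math.nan
    return (b7 - b3) / denom


def is_dust_nddi(value: float, low: float = 0.0, high: float = 0.28) -> bool:
    """Benchmark rule from the text: dust when 0 < NDDI < 0.28."""
    return low < value < high


def is_dust_edi(value: float, threshold: float = -0.2) -> bool:
    """Benchmark rule from the text: dust when EDI > -0.2."""
    return value > threshold


# Illustrative reflectances only; they are not taken from the paper.
pixel_nddi = nddi(b7=0.30, b3=0.20)
flagged = is_dust_nddi(pixel_nddi)
```

Keeping the thresholds as default arguments makes it explicit that they are region-dependent tuning values and not fixed constants, which is the limitation discussed above.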
However, AL-SVM, which is based on supervised classification, can overcome this limitation. It trains the classifier on a large number of labelled samples, so that SDS with different reflection characteristics can be detected easily and accurately. Among these results, AL-SVM produced the best spatial extent of SDS events (Figure 5). To solve the salt-and-pepper noise problem, the extent of the main SDS-covered areas is first extracted in full using the upscaling method. Then, the main SDS event boundary is combined with the dust patch area to update the mask of the SDS detection results, thereby removing the salt-and-pepper noise. In Figure 5(a-e), the SDS that occurred over the water surface was not fully recognized by NDDI and EDI. Meanwhile, mixed pixels of thin clouds and deserts are easily recognized as SDS pixels. Therefore, we introduced endmember fractions into AL-SVM as an important basis for training sample selection. Mixed pixels covered by dust and water were distinguished using the land cover type obtained from the annual land cover dataset (MCD12Q1). Similarly, to distinguish between thick clouds and snow, we relied on MBSCI and used DEM data to optimize the sample labelling of snow. In this study, the lower elevation limit for snow samples labelled by MBSCI was set to 2500 m to reduce the generation of erroneous training labels. To detect all dust-covered areas as completely as possible and to measure the strength of SDS, we divided the SDS into TcD and TnD according to severity.
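One possible reading of the upscaling-based noise removal described above is sketched here. It is an assumption-laden illustration and not the authors' GEE implementation: the function name, the block size and the majority fraction are all hypothetical. The binary dust mask is aggregated into coarse blocks, blocks in which dust dominates are treated as the main SDS extent, and fine-resolution dust pixels are kept only inside those blocks.

```python
def remove_small_patches(mask, block=3, min_fraction=0.5):
    """Suppress isolated dust pixels using a coarse-resolution mask.

    mask: non-empty list of equal-length rows containing 0 (no dust) or 1 (dust).
    block: side length of the coarse cell in pixels (hypothetical default).
    min_fraction: share of dust pixels a coarse cell needs to count as
        part of the main SDS extent (hypothetical default).
    """
    rows, cols = len(mask), len(mask[0])
    cleaned = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Fine pixels that fall inside this coarse cell.
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            dust = sum(mask[r][c] for r, c in cells)
            # Keep the fine-resolution detail only where dust dominates.
            if dust / len(cells) >= min_fraction:
                for r, c in cells:
                    cleaned[r][c] = mask[r][c]
    return cleaned


# Illustrative mask: a compact plume (top left) and one isolated pixel.
example = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned_example = remove_small_patches(example)
```

The coarse mask decides where dust is plausible, while the original mask still supplies the fine boundary, so the plume edge is not blurred by the aggregation.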
Given the sparse distribution and lack of spatial continuity of weather stations, we did not use interpolated ground-based observations to assess accuracy. Instead, we set different random seeds to draw separate training and testing samples of equal size: 500 training samples and 500 testing samples were selected for each class (Cloud, Thin Cloud, Dry Lakebed, Dust Water, Water Bodies, Thick Dust, Thin Dust, Vegetated Land, Bare Land, Snow and Ice). The training accuracy (TA) and validation accuracy (VA) were then calculated from the training samples and testing samples, respectively. The accuracy of the traditional approaches (NDDI and EDI) was also calculated from the testing samples. As shown in Table 4, the VA of the proposed SDS detection method (83.80%) was higher than that of EDI (78.39%) and NDDI (65.68%). An experiment without MBSCI for sample selection (AL-SVM#) was also added to demonstrate the effectiveness of MBSCI. The results show that the VA of images with cloud interference improves significantly when MBSCI is used for sample selection (e.g. AD: 2014AD: -04-22, 2018TD: 2014-04-25). Additionally, the different SDS classes (thick dust and thin dust) are easily confused with each other. After removing this self-confusion (AL-SVM*), the training accuracy and validation accuracy (98.43%) of SDS detection increase significantly. Although some confusion remains between thick dust and thin dust, AL-SVM achieved better precision for SDS pixels. This also shows that dust thickness is difficult to identify from passive remote sensing data alone. Although automatically labelled training samples are not as reliable as manually, visually labelled samples, the approach saves considerable time and effort in building the training sample set while still delivering better model performance and good SDS detection results.
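The effect of removing the thick-dust/thin-dust self-confusion on the reported accuracy can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and the merged label "SDS" are hypothetical, overall accuracy is taken as the share of correctly classified samples, and merging only TcD and TnD is one reading of the AL-SVM* variant.

```python
def overall_accuracy(true_labels, predicted_labels, merge=None):
    """Share of samples whose predicted class equals the reference class.

    merge: optional mapping from class name to a merged class name,
        applied to both label lists before comparison.
    """
    merge = merge or {}
    pairs = [(merge.get(t, t), merge.get(p, p))
             for t, p in zip(true_labels, predicted_labels)]
    correct = sum(1 for t, p in pairs if t == p)
    return correct / len(pairs)


# Illustrative labels only; they are not taken from Table 4.
truth = ["TcD", "TcD", "TnD", "TnD", "Cloud", "BL"]
pred = ["TcD", "TnD", "TnD", "TcD", "Cloud", "BL"]

per_class = overall_accuracy(truth, pred)
dust_merged = overall_accuracy(truth, pred, merge={"TcD": "SDS", "TnD": "SDS"})
```

In this toy case the only errors are swaps between the two dust classes, so merging them removes every error; with real samples the gain depends on how much of the confusion lies between TcD and TnD.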

Conclusion
This letter proposed a novel hybrid SDS detection method using SVM classifiers trained with automatically generated labelled samples. The method exploits the inherent advantages of spectral indices and endmember fractions for labelling training data and uses SVM as the classifier to improve SDS detection accuracy. Based on eight typical SDS events in ACA, the results show that the proposed method performs better in distinguishing cloud-surface mixed pixels from SDS pixels. In addition, implementing the method on the GEE platform provides a basis for large-scale, long-term and continuous SDS detection applications. However, the proposed method still needs to be validated with more multispectral sensors, such as NPP-VIIRS, FY-4A/AGRI and others. In the future, it may become possible to produce highly accurate SDS risk distribution maps, which would be very important for studying the occurrence and development of SDS and identifying their sources.