The synergies of SMAP enhanced and MODIS products in a random forest regression for estimating 1 km soil moisture over Africa using Google Earth Engine

ABSTRACT Due to the coarse scale of soil moisture (SM) products retrieved from passive microwave observations (SMPMW), several downscaling methods have been developed to enable regional-scale applications. However, it can be challenging for users to access final data products and algorithms, and to manage different data sources and formats, various data processing methods, and the complexity of the workflows from raw data to information products. Here, the Google Earth Engine (GEE), which recently began offering SMPMW, is used to implement a workflow for retrieving 1 km SM at a depth of 0–5 cm using MODIS optical/thermal measurements, the SMPMW coarse-scale product, and a random forest regression. The proposed method was implemented on the African continent to estimate weekly SM maps. The results of this study were evaluated against in-situ measurements of three validation networks. Overall, in comparison to the original SMPMW product, which was limited by a spatial resolution of only 9 km, this method is able to estimate SM at 1 km spatial resolution with acceptable accuracy (an average correlation coefficient of 0.64 and a ubRMSD of 0.069 m3/m3). The results show that the proposed method in GEE provides a precise estimation of SM with a higher spatial resolution across the entire continent.

Remote Sensing (RS) techniques make it possible to observe SM at a global scale with sufficient time continuity, even in difficult and inaccessible locations (Brocca et al., 2023; Mohanty et al., 2017; Mohseni & Mokhtarzade, 2020). Various studies have demonstrated that passive microwave radiometers are one of the most efficient RS technologies for SM estimation, due to their high sensitivity to the soil dielectric constant, which has a direct relationship with SM (Xu et al., 2022). Since the late 1970s, data from various RS microwave radiometers have been utilized to retrieve SM. Soil Moisture and Ocean Salinity (SMOS), in orbit since November 2, 2009 (Mecklenburg et al., 2012), and Soil Moisture Active Passive (SMAP), in orbit since January 31, 2015 (Montzka et al., 2016; Reichle et al., 2022), are the first two satellites dedicated to directly estimating near-real-time global SM. Both SMAP and SMOS suffer from the same shortcoming as the previous RS radiometers regarding their coarse-scale spatial resolution (approximately 9 km to 40 km), which is inadequate for studies at the regional or local level (Mohseni & Mokhtarzade, 2021; Mohseni et al., 2022).
Several efficient methods for retrieving SM with moderate and high spatial resolution have been developed to combine SM obtained from passive microwave observations (SM PMW) with observations from other RS domains (such as optical, thermal, and active microwave) with higher spatial resolution (Peng et al., 2017). The underlying concept of these downscaling methods is that as SM influences emissivity, temperature dynamics, soil color, albedo, light absorption bands, and dielectric constant, a variety of SM-related indices can be calculated using different band ratios and band combinations of optical and thermal RS domains (Khellouk et al., 2020; Mohseni et al., 2021; Sahaar et al., 2022). Furthermore, active microwave backscattering signals of the land surface are also sensitive to SM (Mirmazloumi et al., 2021; Zribi et al., 2020). Both of these electromagnetic spectrum domains enable sensors to operate at higher spatial resolution than passive microwave, but SM retrieval is challenging without auxiliary data and more complex mathematical algorithms, considering, e.g., the complex spatio-temporal dynamics of vegetation cover (Pradhan et al., 2018). The use of these intermediate- and high-resolution observations in combination with SM PMW allows estimation of SM with higher spatial resolution (Petropoulos et al., 2015). The downscaling methods can be categorized into various groups based on the features used to disaggregate the SM PMW products: radar-based (Jagdhuber et al., 2019; Narayan & Lakshmi, 2008), radiometer-based (de Jeu et al., 2014; Gevaert et al., 2016), optical/thermal-based (Chul Jung et al., 2019; Peng et al., 2017; Zheng et al., 2021a), data assimilation-based (Fang et al., 2018; Lievens et al., 2015; Xu et al., 2020), and soil surface attributes-based methods (Montzka et al., 2018; Shin & Mohanty, 2013). Optical and thermal methods are widely used due to the strong theoretical relationship between SM, surface temperature, and vegetation condition and the easy access to data obtained from different optical/thermal satellites (Mohseni & Mokhtarzade, 2021).
So far, a vast number of empirical (Piles et al., 2016), physical-based (Merlin et al., 2005), data assimilation (Naz et al., 2020; Pellenq et al., 2003), and spatial interpolation methods (Kim & Barros, 2002) have been developed and applied to downscale SM PMW using optical/thermal indices (Merlin et al., 2010; Sabaghy et al., 2018). Piles et al. (2011) applied a Linear Regression (LR) model to downscale SMOS data using Land Surface Temperature (LST) and Normalized Difference Vegetation Index (NDVI) of MODIS. Song et al. (2013) used a single-channel technique to downscale the brightness temperature (BT) observed by AMSR-E using an LR between Microwave Polarization Difference Index (MPDI) and NDVI. They retrieved SM from downscaled BT using a Single-Channel Algorithm (SCA). Similarly, Sánchez-Ruiz et al. (2014) used Normalized Difference Water Index (NDWI) calculated with three different MODIS Short Wave Infrared (SWIR) bands along with NDVI and LST to downscale SMOS data over the REMEDHUS network in Spain. Fang et al. (2022) and Hu et al. (2020) applied the Geographically Weighted Regression (GWR) method to obtain an optimal local regression for downscaling SMAP data using 8-day LST and 16-day NDVI. Merlin et al. (2008) introduced a novel downscaling approach called Disaggregation based on Physical And Theoretical scale Change (DisPATCh) to disaggregate coarse-scale SM data based on a semi-empirical Soil Evaporative Efficiency (SEE) model and a first-order Taylor series expansion. Many studies have investigated the performance of DisPATCh over different study areas (Djamai et al., 2016; Fontanet et al., 2018; Malbéteau et al., 2016; Merlin et al., 2008; Montzka et al., 2018; Neuhauser et al., 2019; Qu et al., 2021; Zheng et al., 2021b).
Aside from the aforementioned empirical and semi-empirical methods, a number of studies have demonstrated the ability of Machine Learning (ML) models in downscaling SM PMW (Zhao et al., 2021). Srivastava et al. (2013) compared the performance of the Generalized Linear Model (GLM) and three ML techniques including Artificial Neural Network (ANN), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) to downscale SMOS products using MODIS LST. Abbaszadeh et al. (2019) presented 12 distinct Random Forest (RF) models to downscale the daily composite version of SMAP data using LST and NDVI data of MODIS along with soil texture and topography information of the study area. Similarly, Hu et al. (2020) used LST, NDVI, horizontally and vertically polarized BT, and SWIR reflectance as the input data of the RF method to downscale SMAP products over Inner Mongolia. Bai et al. (2019) also applied RF to downscale the SMAP product using the LST, NDVI, Enhanced Vegetation Index (EVI), NDWI, Leaf Area Index (LAI) derived from MODIS, backscattering coefficients of Sentinel-1, and surface elevation and slope as auxiliary data. Because downscaling methods are usually pixel-based and ignore information from neighboring pixels, Xu et al. (2021) proposed a new downscaling method based on a Convolutional Neural Network (CNN) and a map-to-pixel approach during the network construction. In a study conducted by Ghafari et al. (2020), SMAP radiometer BT was downscaled to 3 km using airborne SAR backscatter observations collected during the fifth SMAP Experiment (SMAPEx-5) conducted in southeastern Australia. The study compared the results of these downscaled data with SMAP/Sentinel-1 SM data, which also had a spatial resolution of 3 km. Xu et al. (2022) proposed an SM downscaling framework based on the Wide & Deep Learning (WDL) method to improve the spatial resolution of the SMAP product, using the horizontally and vertically polarized BT, LST, and surface reflectance. They also used topographic attributes, soil properties, climate types, and land cover types as auxiliary datasets to improve the accuracy of the WDL method.
Downscaling methods, whether using classical regression methods or machine learning, rely on the relationship between vegetation conditions, surface temperature, and SM.
However, these methods present challenges. Firstly, they require multiple data sources with different formats and software for processing. Secondly, filling missing values, especially for cloudy conditions, is necessary. Thirdly, adjusting temporal resolutions and overpass times of the data sets is crucial. Lastly, limited accessibility and modification options for algorithms pose difficulties for non-experts seeking local-scale SM information (Khazaei et al., 2023).
The use of cloud computing platforms like Google Earth Engine (GEE) overcomes these challenges by providing access to extensive datasets from satellites, RS products, and model-based measurements (Mutanga & Kumar, 2019). GEE offers accessibility to diverse datasets, ready-to-use image-driven products, easy data upload, a wide range of image processing algorithms, and various computing techniques (Amani et al., 2020; Ghorbanian et al., 2020; Tamiminia et al., 2020). Multiple SMAP products, including SMAP L3 Radiometer Global Daily 9 km Soil Moisture and SMAP L4 Global 3-hourly 9-km Surface and Root Zone Soil Moisture, are available through GEE. These products provide global land surface conditions, SM measurements, meteorological variables, and other relevant data (Chan et al., 2016; Chen et al., 2018; Colliander et al., 2017; Kim et al., 2018; Ma et al., 2019). The SMAP mission outperforms earlier satellites in terms of resolution and accuracy. By leveraging GEE and statistical methods, merging coarse-scale SM PMW data with other RS observations enables accurate high-resolution SM estimation.
The purpose of this study is to implement an optical/thermal-based downscaling procedure for estimating SM at 1 km spatial resolution at a depth of 0-5 cm using GEE for the African continent. The estimation of SM in Africa holds immense importance due to the consistent predictions of climate change models, indicating an increased occurrence of droughts and floods in various regions (Myeni et al., 2019). These climate change consequences pose a significant threat to the food security and livelihoods of smallholder farmers across the continent. SM plays a crucial role as a water source for plant growth between rainfall events, directly impacting crop yields and agricultural drought (Anderson et al., 2012). Therefore, accurate quantification of SM becomes a vital parameter in assessing weather-induced risks associated with climate change in Africa. Efforts have been made in recent years to establish in-situ SM monitoring networks, supporting satellite-based SM retrievals (McNally et al., 2016). Despite these advancements, there remains a persistent lack of comprehensive SM information in Africa with sufficient spatial resolution and acceptable accuracy. Consequently, further research and improvements are necessary to obtain SM data that adequately address the unique requirements and challenges faced in Africa.
The novelty of this study can be viewed from two distinct perspectives. Firstly, the study introduces a pioneering application of GEE for implementing a downscaling method, thereby surmounting the challenges associated with traditional downscaling approaches. Secondly, this study endeavors to estimate SM at moderate resolution for the entire continent of Africa, a task that has not been previously undertaken. Through the proposed methodology, it becomes feasible to consistently apply this approach across the entire continent, resulting in the generation of SM maps at 1-km resolution, obviating the need for auxiliary data.
The proposed method in this study leverages the power of the GEE computing platform and generates weekly SM maps for Africa. Similar to other downscaling methods, the proposed method has three main steps: (1) estimating SM-related features at moderate spatial resolution, (2) establishing a linking model between measured features and SM PMW, and (3) estimating the SM at the spatial resolution of SM-related features. The proposed method employs RF Regression (RFR), known as one of the most effective ML techniques for establishing regression models between non-relative values (Mirmazloumi et al., 2022). The trained RFR model is subsequently applied to optical and thermal data to estimate SM at higher spatial resolutions. The strength of this methodology lies in its accessibility and user-friendliness, enabling researchers and scientists to work with the data and derive meaningful insights without requiring specialized software or technical expertise. Additionally, this approach offers the advantage of generating up-to-date data on demand.
The study area and satellite data used in this study are detailed in the next section. In Section 3, the proposed method, the RFR, and the SM indices are described. Results and discussion are presented in Section 4 and Section 5, respectively. In Section 6, the conclusion is presented.

Study area and data
In order to evaluate the effectiveness of GEE for applying the SM PMW downscaling algorithm, Africa (between 18°W and 25°E and 4°N and 18°N) is considered as the core study area. Given the variety of climatological, biogeographical, pedological, and lithological properties over the African continent (Zribi et al., 2008), different facets of the performance of the downscaling approach in GEE can be assessed in this area.
Over the last few years, several temporary or permanent SM validation networks with in-situ stations have been installed in Africa to support satellite retrievals, satellite product improvements, and assessments (Myeni et al., 2019). Some of these validation sites are listed in the International Soil Moisture Network (ISMN) to support the further validation of RS products at the global scale (Dorigo et al., 2021). AMMA-CATCH, DAHRA, SD_DEM, and TAHMO are four important SM measuring networks located in different parts of Africa. In addition, the Cosmic-ray Soil Moisture Observing System (COSMOS) has launched 14 validation stations in Africa. The networks measure the SM every hour at various soil depths, ranging from 0-5 cm to 2 m. Cosmic-ray Neutron Probes, TERSOS, and HYDRA are some of the detectors utilized at different stations to measure SM in Africa. These stations are found in different types of landscape, including wetlands, watersheds, urban areas, and croplands. Since TAHMO has the greatest number of stations (70 stations), it is important to consider the time period availability of this network. Given this prerequisite and the availability of other sites' data, the largest time frame for which the maximum number of in-situ measurements is accessible and can be used for the validation process is the year 2020. For this time period, data from stations that measure SM up to 10 cm depth of the soil are gathered for the validation procedure. To ensure the reliability of our validation process, we have excluded in-situ data near water bodies and urban areas. This decision was made due to potential uncertainties introduced by the presence of these features, such as water bodies in wetlands, rivers, lakes, and the urban environment. By excluding such data points, we aim to minimize the impact of these sources on the accuracy of our SM measurements derived from passive microwave observations. Thirty-five stations from three different networks remain after the reference data analysis; their main characteristics are listed in Table 1, and their locations are displayed in Figure 1.
Based on Table 1, most of the stations are located in croplands and have tropical and temperate regimes. For each station, the SM measurements at the depth of 0-10 cm between 6 am and 6 pm are averaged for each week and applied as ground reference data in the validation process. Based on the information provided in Table 2, this method utilizes various MODIS and SMAP data with different spatial and temporal characteristics. For land surface temperature (LST) observations, version 6 of the 8-day Terra LST and Emissivity product with a 1 km spatial resolution (MODIS/V006/MOD11A2) was used. Regarding reflectance in various NIR and SWIR bands and a vegetation index, this study employed the Terra surface spectral reflectance product (MODIS/006/MOD09A1) and vegetation fraction product (MODIS/061/MOD13A1), both at 500 m spatial resolution. Moreover, the global and yearly land cover type map of MODIS (MODIS/061/MCD12Q1) was also used.

Satellite data
The proposed method is implemented using the SPL4SMGP.007 SM product available in GEE. SPL4SMGP.007 represents the level 4 of the SMAP product. This product merges SMAP observations into a physically based numerical land surface model simulating the water, energy, and carbon cycles. As a result, the level-4 data provides a global estimate of surface and root zone SM, surface and soil temperature, and land surface fluxes. Additionally, the product includes algorithm diagnostics derived from the ensemble-based data assimilation system (Entekhabi et al., 2014). SPL4SMGP.007 utilizes the SMAP radiometer data spatially enhanced to 9 km, based on the Backus-Gilbert optimal interpolation technique. This product is provided on the global cylindrical EASE-Grid 2.0 (Mohseni et al., 2022). For this study, data from January to December 2020 were averaged for each week before being used for downscaling SM.

Methodology
Figure 2 shows the proposed process of estimating SM at the local spatial resolution using GEE.
The underlying concept behind the proposed method is the relationship between SM, atmospheric condition, surface temperature, and vegetation. This approach is also known as the Universal Triangle Technique (UTT) or the Temperature-Vegetation (T-V) method (Carlson & Petropoulos, 2019). The theory states that for a given region, under particular climatic conditions, there is a unique relationship between SM, vegetation fraction (Fv), and surface temperature (Ts) (Carlson, 2007; Nichols, 2011). In general, SM is directly influenced by soil evaporation and vegetation transpiration, both of which influence the Ts observed by RS. That is, transpiration cools the plant canopy, so the observed Ts decreases. Without water available to transpire, the canopy heats up. Denser vegetation is better able to cool its canopy than sparse vegetation. Consequently, there is a negative relationship between the observed Ts and Fv. However, the rate of this relationship between Ts and Fv differs with SM and increases under dry soil conditions (Mohseni & Mokhtarzade, 2020). Through the corresponding relationship, SM values can be estimated from Ts and Fv, both of which are measurable using RS observations. So far, a wide range of SM-related indices have been developed and calculated using optical/thermal RS products. As part of the application of the UTT, SM PMW products and optical/thermal data have also been merged under the disaggregation method (Carlson & Petropoulos, 2019; Piles et al., 2010, 2011; Xu et al., 2018). There are three main steps in the downscaling method using optical/thermal observations: (1) estimating optical/thermal SM-related features at moderate spatial resolution, (2) establishing a linking model between optical/thermal features and SM PMW, and (3) estimating the SM at the spatial resolution of the MODIS product.

The estimation of optical/thermal SM-related features at moderate spatial resolution
In order to extract SM-related features from optical/thermal data, it is necessary to exclude all products and pixels that contain incorrect or inaccurate data from the input provided to the algorithm. For this purpose, any MODIS products with more than 20% of cloudy pixels were removed. Thermal features derived from MODIS observations may have some untraceable errors in LST estimation in woodlands and forests due to the limited penetration depth of optical and thermal bands within canopied areas. The accuracy of these features is a crucial factor that can impact the precision of the RFR model. The potential impact of woodland areas on estimating SM-related features, and in turn on the performance of the RFR, necessitates their exclusion from our proposed method. Subsequently, the flag data were processed by utilizing the MODIS LULC product to eliminate pixels corresponding to water bodies, artificial land, and temporary snow and ice. This step was also performed for the coarse-scale SM PMW. Consequently, any pixels that consisted of more than 20% of water bodies, artificial land, temporary snow, or ice were excluded from further processing. Additionally, all MODIS products were rescaled to a resolution of 1 km before being integrated into the downscaling workflow. This rescaling process was conducted in GEE by calculating the average values within 500 m and generating pixels with a spatial resolution of 1 km.
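The 500 m to 1 km rescaling described above can be illustrated with a minimal sketch, assuming that each 1 km pixel is the mean of the valid pixels in a non-overlapping 2 x 2 block of 500 m pixels. The function name `block_mean`, the use of `None` for masked pixels, and the sample values are illustrative assumptions and are not taken from the GEE workflow itself.

```python
def block_mean(grid, factor=2):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    Masked pixels are given as None and are ignored in the mean
    (an assumption); a block with no valid pixel stays None.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            vals = [grid[r + dr][c + dc]
                    for dr in range(factor)
                    for dc in range(factor)
                    if grid[r + dr][c + dc] is not None]
            out_row.append(sum(vals) / len(vals) if vals else None)
        out.append(out_row)
    return out


# Hypothetical 500 m reflectance tile (4 x 4 pixels); None marks masked pixels.
refl_500m = [
    [0.10, 0.20, 0.30, 0.30],
    [0.30, 0.40, 0.30, 0.30],
    [None, 0.50, None, None],
    [0.50, 0.50, None, None],
]
refl_1km = block_mean(refl_500m)  # 2 x 2 grid of 1 km pixels
```

In this toy tile, the lower-right 1 km pixel remains masked because none of its four 500 m pixels is valid.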
After removing pixels with weak data, SM-related features were extracted for the remaining pixels. In this study, and given the UTT method, the NDVI product is used as the vegetation index. Moreover, the MODIS LST at daytime (LSTday) and the LST difference between daytime and nighttime (ΔTs, Eq. (1)) are applied as the temperature features. The ΔTs is estimated using the observations at the ascending and descending overpass times of MODIS.
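Following the definition given above (the difference between daytime and nighttime LST), Eq. (1) can be written as below; the subscript notation is an assumption about the original typesetting.

```latex
\Delta T_s = LST_{day} - LST_{night} \qquad (1)
```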
Applying ΔTs is based on Thermal Inertia Theory (TIT), which states that in the same environmental temperature variation, materials with low water content and, thereby, low heat capacity and thermal inertia, have a greater surface temperature difference (Fang et al., 2013). Moreover, the reflectance at three different NIR/SWIR bands, as the water absorption bands, including bands 5-7 of MODIS, is also used in this study. The consideration of these three water absorption bands is grounded in previous studies that utilized NDWI or other indices derived from these specific bands to downscale SM PMW data (Abbaszadeh et al., 2019; Sánchez-Ruiz et al., 2014). As the reflectance of the water absorption bands is influenced by SM, they hold significant potential as robust features for estimating SM in relation to soil characteristics.
All daily features and RS products at both 9 km and 1 km spatial resolutions are then aggregated to produce weekly maps. It should be noted that several studies have attempted to predict shorter-term SM-related features, such as daily LST, daily evapotranspiration, etc., by employing different interpolation methods to estimate SM values at higher temporal resolutions (Han et al., 2010; Jung et al., 2017; Wang et al., 2011). However, the objective of using weekly inputs is to mitigate the impact of the accuracy of interpolation techniques on GEE performance and on the outputs of the downscaling workflow.
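The weekly aggregation can be sketched for a single pixel as follows. This is a minimal illustration that assumes weeks are ISO calendar weeks and that missing observations are simply skipped; the text does not specify how weeks are delimited, and the function name `weekly_means` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_means(daily):
    """Group (date, value) pairs by ISO (year, week) and average each group.

    None values (e.g. cloud-masked observations) are skipped, which is an
    assumption of this sketch.
    """
    groups = defaultdict(list)
    for day, value in daily:
        if value is None:
            continue
        iso = day.isocalendar()
        groups[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(groups.items())}


# Hypothetical daily SM values (m3/m3) for one pixel.
daily_sm = [
    (date(2020, 1, 6), 0.20),
    (date(2020, 1, 7), 0.24),
    (date(2020, 1, 8), None),   # missing observation
    (date(2020, 1, 13), 0.30),
]
weekly = weekly_means(daily_sm)  # keys are (ISO year, ISO week)
```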

The establishment of a linking model between optical/thermal features and SM PMW
In the second step of the proposed method, the relationship between SM and SM-related features extracted from RS data must be established (Eq. (2)) (Peng et al., 2017). This relationship can range from linear regression to various ML regression models (Sabaghy et al., 2018). Based on the UTT and TIT theory explained above, a relationship linking SM PMW to the optical/thermal features through a function f (Eq. (2)) can be established at the spatial resolution of SMAP. The spatial resolution of SM PMW in GEE is 9 km, which allows the estimation of the linking function f at this resolution. GEE has a number of built-in ML tools that are designed to work with multi-band raster data. In this study, we employed the RFR available in GEE as the linking function between optical/thermal features and SM PMW observations. RFR is an ML algorithm specifically designed for regression problems. It utilizes an ensemble of decision trees to make predictions. In the RFR algorithm, numerous decision trees are created, each making predictions based on a subset of the data features. These individual predictions are then combined to generate a final prediction. This approach helps mitigate the risk of overfitting and enhances the accuracy of the predictions. RFR is commonly utilized in scenarios involving multiple predictors or features that may contribute to the outcome. Its ability to handle non-linear relationships and to limit overfitting, achieved through the averaging of multiple decision trees, makes it valuable for establishing the relationship between different remote sensing-based observations (Zhao et al., 2018). Additionally, RFR has demonstrated good performance, especially when compared to traditional statistical models. It can also handle large datasets and high-dimensional feature spaces, which makes it well suited for big data applications and computing platforms such as GEE (Borah et al., n.d.). In this study, RFR implemented in GEE was trained using 80% of the 9 km coarse-scale pixels within the study area, while the remaining 20% of the pixels were used for testing. To achieve this, the optical/thermal features were upscaled to match the SM PMW resolution. It should be noted that RFR was executed separately for each week using only the data from that specific week. This would also enable the method to be extended to special circumstances and extreme events such as floods and droughts.
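Based on the features listed in the previous subsection (NDVI, daytime LST, the day-night LST difference, and the reflectance of MODIS bands 5-7), the linking relationship of Eq. (2) plausibly takes the following general form. The symbols for the band reflectances and the angle-bracket notation for averaging to 9 km are assumptions of this sketch, not the notation of the original equation.

```latex
SM_{PMW} = f\left( \langle NDVI \rangle,\; \langle LST_{day} \rangle,\; \langle \Delta T_s \rangle,\; \langle \rho_5 \rangle,\; \langle \rho_6 \rangle,\; \langle \rho_7 \rangle \right) \qquad (2)
```

Here the angle brackets denote features upscaled to the 9 km grid, and f is the RFR model fitted independently for each week.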

Estimating the SM at the spatial resolution of MODIS product
In this step, the observed relationships between optical/thermal features and SM PMW at a spatial resolution of 9 km are utilized to convert the original features extracted from MODIS, which are at a 1 km spatial resolution, into weekly SM values.
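The second and third steps can be sketched with the Earth Engine Python API as shown below. This is only an illustrative outline under stated assumptions, not the workflow used in the study: the function name `train_and_apply_rfr`, its arguments, the number of trees, and the random seed are hypothetical, and the inputs are assumed to be weekly composites that have already been masked and, for the 9 km stack, upscaled. The `ee` module is imported inside the function, so calling it requires the `earthengine-api` package and an initialized session.

```python
def train_and_apply_rfr(stack_9km, features_1km, region, predictor_bands,
                        target_band="sm", n_trees=100, seed=42):
    """Train a random forest regression at 9 km and apply it at 1 km.

    stack_9km:       ee.Image with predictor bands upscaled to 9 km plus the
                     coarse SM band named target_band (assumed input).
    features_1km:    ee.Image with the same predictor bands at 1 km.
    region:          ee.Geometry of the study area.
    predictor_bands: list of predictor band names.
    n_trees, seed:   illustrative values, not reported in the text.
    """
    import ee  # lazy import; requires earthengine-api and ee.Initialize()

    # One sample per 9 km pixel, tagged with a uniform random number.
    samples = stack_9km.sample(region=region, scale=9000, seed=seed,
                               geometries=False).randomColumn("random", seed)

    # 80% of the coarse pixels for training, 20% for testing.
    train = samples.filter(ee.Filter.lt("random", 0.8))
    test = samples.filter(ee.Filter.gte("random", 0.8))

    # Random forest in regression mode, trained at the coarse scale.
    rfr = (ee.Classifier.smileRandomForest(n_trees)
           .setOutputMode("REGRESSION")
           .train(features=train,
                  classProperty=target_band,
                  inputProperties=predictor_bands))

    # Apply the trained model to the 1 km features and to the test samples.
    sm_1km = features_1km.select(predictor_bands).classify(rfr, "sm_1km")
    test_pred = test.classify(rfr, "sm_pred")
    return sm_1km, test_pred
```

Because the model is fitted separately for each week, a function of this kind would be called once per weekly composite.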
In addition to the SM maps generated as output, we also applied a correction equation (Eq. (3)) developed and used by Fang et al. (2013) to the resulting SM values. The purpose of the correction equation is to minimize the discrepancy between the SMAP values and the average of the disaggregated SM values over the SMAP pixel (Fang et al., 2013).
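From this description, the correction plausibly has the additive form sketched below, in which every 1 km pixel inside a coarse pixel is shifted by the difference between the coarse value and the mean of the 1 km estimates. This is a reconstruction under that assumption and may differ in detail from the equation given by Fang et al. (2013).

```latex
SM_c(i,j) = SM(i,j) + \left[ SM_{PMW} - \frac{1}{N} \sum_{(i,j)} SM(i,j) \right] \qquad (3)
```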
where SM(i,j) and SMc(i,j) are the estimated weekly SM and the corrected SM of pixel (i,j) at the spatial resolution of 1 km, respectively. N is the number of 1 km SM(i,j) pixels within the corresponding SM PMW pixel, and the coarse-scale term of the original formulation, (…)/2, can change to the weekly SM PMW.
The output of this study will undergo validation using in-situ measurements of 0-5 cm SM collected at 35 stations, which are listed in Table 1. Similar to other RS studies, there are uncertainties associated with validating the outputs due to the significant spatial difference between the point-based reference measurements and the output map (Montzka et al., 2021). To address this issue, we mitigate the uncertainty by implementing a dense validation site, considering various validation protocols to select appropriate sites, study duration, and in-situ stations (Colliander et al., 2017, 2021; Gruber et al., 2020). Furthermore, the measurements from all stations within an SMAP pixel are averaged for the validation process, even for coarse-scale outputs such as 9 km or 1 km soil moisture.
In the evaluation statistics (R, Bias, RMSD, and ubRMSD), N is the number of samples applied to calculate the statistics, y(i) is the i-th measurement, and x(i) is its corresponding reference value.
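The four statistics reported in the results (R, Bias, RMSD, and ubRMSD) can be computed as in the following minimal sketch, which assumes the standard definitions: Pearson correlation, mean difference between estimate and reference, root-mean-square difference, and ubRMSD = sqrt(RMSD^2 - Bias^2). The function name `validation_stats` and the sample values are hypothetical.

```python
import math


def validation_stats(estimates, references):
    """Return R, Bias, RMSD and ubRMSD for paired samples y(i), x(i)."""
    n = len(estimates)
    if n != len(references) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_y = sum(estimates) / n
    mean_x = sum(references) / n

    # Bias: mean of (estimate - reference).
    bias = mean_y - mean_x
    # RMSD: root of the mean squared difference.
    rmsd = math.sqrt(sum((y - x) ** 2 for y, x in zip(estimates, references)) / n)
    # ubRMSD: RMSD with the bias contribution removed (clamped at zero).
    ubrmsd = math.sqrt(max(rmsd ** 2 - bias ** 2, 0.0))

    # Pearson correlation coefficient.
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(estimates, references))
    var_y = sum((y - mean_y) ** 2 for y in estimates)
    var_x = sum((x - mean_x) ** 2 for x in references)
    r = cov / math.sqrt(var_y * var_x)

    return {"R": r, "Bias": bias, "RMSD": rmsd, "ubRMSD": ubrmsd}


# Hypothetical example: estimates are the references plus a constant offset.
in_situ = [0.10, 0.20, 0.30]
estimated = [0.15, 0.25, 0.35]
stats = validation_stats(estimated, in_situ)
```

In this example the estimates differ from the references only by a constant offset, so the offset appears in Bias and RMSD while ubRMSD is close to zero and R is close to one.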

Estimating SM at 1 km using RFR in GEE
The proposed method was applied in GEE to obtain a 1 km spatial resolution map of SM for the entire African continent (see supplementary material). However, for the purpose of visualizing the results, we present those parts of the output map of African countries where the in-situ stations are located (see Figure 1). Therefore, the result map for Kenya, which has 11 stations of the TAHMO validation site and 2 stations of the COSMOS validation site, is shown in Figure 3. Similarly, the results for Ghana, the other African country, with 21 TAHMO in-situ stations, are illustrated in Figure 4. Both Figures 3(a) and 4(a) show the SMAP SM products at the spatial resolution of 9 km (SPL4SMGP.007), while Figures 3(b) and 4(b) represent ΔTs (LSTday − LSTnight), which is one of the SM-related features, retrieved from LST products of MODIS at the spatial resolution of 1 km.
The outputs of the proposed method for both countries are presented in Figures 3(c) and 4(c). These outputs were calculated using the RFR (which was trained at 9 km spatial resolution) and original RS-related features with the spatial resolution of 1 km in GEE. Based on Figures 3(c) and 4(c), it can be observed that the estimated SM in the two regions of interest exhibits a spatial pattern that lies between the spatial pattern of the SMAP products and other optical/thermal features used to estimate high-resolution SM values in GEE. Overall, the quality and accuracy of the SM estimation process are highly dependent on the availability and quality of the input data, as well as the selection of appropriate land cover types for the estimation process. As Figures 3(c) and 4(c) show, the outputs generated from the SM estimation process contain some pixels that do not have any estimated values. These pixels are usually limited to those that do not have any data from the optical/thermal and passive microwave observations. Based on the characteristics of passive microwave RS, this limitation of the downscaling method is predominantly related to the lack of data in optical or thermal measurements caused by adverse atmospheric conditions. However, in our study, we utilized the average weekly MODIS data, which contains thermal and vegetation information for most of the pixels. Despite this, there are still some pixels that do not have any estimated SM values. These pixels have been excluded from the estimation process because their land cover is not suited for SM estimation. Additionally, Figures 3(d) and 4(d) show the output of the proposed method after applying the correction formula (Eq. (3)) to ensure homogeneity between the results of the RFR and the SMAP product for Kenya and Ghana, respectively.
The proposed method focuses on estimating 1-km SM by considering three crucial factors: 1) the main original coarse-scale SM product, which is the 9-km SMAP product, 2) SM-related features with higher spatial resolution, which are derived from MODIS observations, and 3) the downscaling method, which is RFR. By leveraging these three factors, the estimation of SM at a resolution of 1 km becomes feasible. It can be said that the estimated SM reflects the variations present in all the data used for estimation, including LST, NDVI, LULC, the reflectance of the three bands, and the original coarse-scale SM product. However, it is expected that after pixel-mean correction, the spatial pattern and the range of the output SM map (Figures 3(d) and 4(d)) are closer to the SMAP product compared to the initial RFR result (Figures 3(c) and 4(c)). Since the SMAP product has been thoroughly examined in a number of research locations (Mohseni et al., 2022), the agreement between the pattern and range of calculated SM and the SMAP observations can be attributed to a major improvement and the success of the downscaling procedure.
However, as can be seen in Figures 3(d) and 4(d), there are two main impacts of this correction visible in the outputs, which may make it less effective compared to the initial RFR-retrieved SM. First, as shown in these sub-figures, the SMAP pixel watermark can be observed in the results, which is a fundamental challenge of all SM PMW downscaling methods. Most of the coarse-scale SMAP pixels can be identified in the results, but this problem does not exist in the estimated SM using the trained RFR. Secondly, there are also more pixels with no SM value after implementing the correction, due to the no-value pixels in SPL4SMGP.007, as shown by the black areas in Figures 3(d) and 4(d). This issue can pose a challenge in estimating SM values near water bodies, urban areas, and forests, where different landscapes have varying SM values whose spatial and temporal information is required for different studies. In conclusion, the correction coefficient can improve the estimated SM values, bringing them closer to the SMAP product. However, the presence of the SM PMW pixel watermark and the increased number of flagged and no-data pixels in the results have an inevitable and somewhat destructive effect.
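Both effects follow from the block-wise nature of the pixel-mean correction, which can be seen in a minimal sketch. The sketch assumes an additive correction in which each block of fine pixels is shifted so that its mean equals the coarse value, and it uses a toy 2 x 2 block instead of the roughly 9 x 9 block of 1 km pixels per SMAP pixel; the function name `pixel_mean_correction` and the sample values are hypothetical.

```python
def pixel_mean_correction(sm_fine, sm_coarse, factor):
    """Shift each factor x factor block of sm_fine so its mean matches sm_coarse.

    None marks no-data pixels. A block whose coarse value is None becomes
    entirely no-data, which mirrors the extra gaps seen after correction.
    """
    corrected = [row[:] for row in sm_fine]
    for cr, coarse_row in enumerate(sm_coarse):
        for cc, coarse_val in enumerate(coarse_row):
            cells = [(r, c)
                     for r in range(cr * factor, (cr + 1) * factor)
                     for c in range(cc * factor, (cc + 1) * factor)]
            vals = [sm_fine[r][c] for r, c in cells if sm_fine[r][c] is not None]
            if coarse_val is None or not vals:
                for r, c in cells:
                    corrected[r][c] = None
                continue
            # One offset per coarse pixel: the source of the watermark.
            offset = coarse_val - sum(vals) / len(vals)
            for r, c in cells:
                if sm_fine[r][c] is not None:
                    corrected[r][c] = sm_fine[r][c] + offset
    return corrected


# Hypothetical fine-scale estimates (2 x 4) and coarse product (1 x 2).
sm_rfr = [
    [0.10, 0.20, 0.20, 0.30],
    [0.30, 0.20, 0.30, 0.40],
]
sm_pmw = [[0.25, None]]  # second coarse pixel has no data
sm_corrected = pixel_mean_correction(sm_rfr, sm_pmw, factor=2)
```

Because the offset is constant within a coarse pixel and changes abruptly at its border, coarse pixel outlines can become visible in the corrected map, and any coarse pixel without data removes all fine pixels beneath it.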

Statistical evaluation of the proposed method
To statistically evaluate the performance of the proposed method, we compared the SM maps it generated with in-situ measurements taken over a 1-year period. Figure 5 presents the results of spatial correlation analyses conducted separately for Ghana and Kenya, comparing the outputs of the proposed method with in-situ SM measurements. Since SMAP SM values were used to train the RFR, this figure also presents the statistical evaluation of the SM PMW observations at 9 km resolution. The data used to generate the results in Figure 5 were limited to the weeks and stations for which both the RS products/outputs and in-situ measurements were available.
Figure 5 displays the R values, along with the average SM, bias, RMSD, and ubRMSD in m³/m³, for each area of interest. In this figure, panels (a) and (d) present the spatial correlation analysis results for the initial SMAP product (SPL4SMGP.007) at a 9 km spatial resolution, which was used for training the RFR in GEE. As can be seen in these panels, the R values between the SMAP product and the measured SM values in Kenya and Ghana are 0.63 and 0.71, respectively, indicating a moderate-to-strong correlation and accuracy for the SMAP product. The weekly SMAP observations have an average SM value of 0.32 m³/m³ with an RMSD of 0.089 m³/m³ and a ubRMSD of 0.087 m³/m³ in Kenya, while in Ghana the average SM value is 0.20 m³/m³ with an RMSD of 0.077 m³/m³ and a ubRMSD of 0.056 m³/m³.
The results of comparing the RFR outputs with the in-situ measurements at a 1 km spatial resolution are shown in Figures 5(b,e). Based on these panels, the proposed method can estimate weekly SM with correlation coefficients of 0.65 and 0.64 in Kenya and Ghana, respectively. The RMSD values for the proposed method in Kenya and Ghana are 0.082 m³/m³ and 0.095 m³/m³, respectively. Moreover, once the bias is removed, ubRMSD values of 0.077 m³/m³ and 0.061 m³/m³ are achieved in Kenya and Ghana, respectively. The spatial correlation analysis results for the downscaled SM maps of Kenya and Ghana after applying the pixel-mean correction are shown in Figures 5(c,f), respectively. The results indicate that the correlation coefficients do not change in either study area after applying the correction. Additionally, the RMSD and ubRMSD values of the SM maps after this correction are 0.11 m³/m³ in Kenya, and 0.095 m³/m³ and 0.058 m³/m³ in Ghana, respectively.
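The statistics reported above can be computed from paired estimates and in-situ values as in the following minimal sketch. It assumes the commonly used definitions (bias as the mean difference, RMSD as the root of the mean squared difference, ubRMSD as the square root of RMSD² minus bias², and R as the Pearson correlation coefficient); the evaluation equations themselves are not restated in this section, and the function name `validation_metrics` and the toy series are hypothetical.

```python
from math import sqrt
from statistics import mean


def validation_metrics(estimated, observed):
    """Return (R, bias, RMSD, ubRMSD) for paired estimates and in-situ values."""
    if len(estimated) != len(observed) or len(estimated) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [e - o for e, o in zip(estimated, observed)]
    bias = mean(diffs)
    rmsd = sqrt(mean(d * d for d in diffs))
    # Remove the mean offset from the RMSD; max() guards against a tiny
    # negative value caused by floating-point rounding.
    ubrmsd = sqrt(max(rmsd ** 2 - bias ** 2, 0.0))
    mean_e, mean_o = mean(estimated), mean(observed)
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / sqrt(var_e * var_o)
    return r, bias, rmsd, ubrmsd


# Toy series (m3/m3): the estimate equals the observation plus a constant offset.
obs = [0.10, 0.20, 0.30, 0.25]
est = [o + 0.05 for o in obs]
r, bias, rmsd, ubrmsd = validation_metrics(est, obs)
```

With a constant offset, the correlation is perfect and the RMSD equals the bias, while the ubRMSD is close to zero. This is why RMSD and ubRMSD can differ noticeably at a site where the product is systematically wetter or drier than the stations.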
As discussed in the preceding sections, our downscaling approach uses SMAP observations as a reference to train the RFR model at 9 km resolution, enabling the estimation of 1 km SM using GEE. Consequently, the accuracy of the SMAP measurements plays a pivotal role in the accuracy of our method's outputs. Figure 5 shows the accuracy of the 9 km SMAP product at both validation sites; because our method is trained to replicate SMAP measurements, it inherently cannot surpass the accuracy of the original RS observations. This is due to the absence of in-situ measurements for calibrating input values. Our primary objective is to increase spatial resolution across large geographical extents while preserving the accuracy of the input data. Hence, the value of the proposed method lies in its capability to extend high-resolution SM estimation to large regions, even without in-situ SM data, while preserving the accuracy of the input RS observations. It should be acknowledged that achieving very high accuracy in SM estimation from RS observations is a formidable challenge. The nominal accuracy of SMAP's SM PMW is 0.04 m³/m³ (Entekhabi et al., 2014); however, the actual accuracy varies from 0.04 to 0.13 m³/m³ across global regions (Colliander et al., 2017, 2021; Das et al., 2016; Mohseni et al., 2022). Furthermore, numerous studies report that various downscaling methods yield Root Mean Square Error (RMSE) values ranging from 0.04 to 0.12 across heterogeneous study areas and time periods (Abbaszadeh et al., 2019; Bai et al., 2019; Das et al., 2011; Fang et al., 2022; Hu et al., 2020; Kim et al., 2018; Xu et al., 2022; Zhao et al., 2018). This body of work establishes a consistent performance benchmark for downscaling methods.
The study also examined the temporal patterns of the original SMAP product, the initial output of the RFR, and the output after applying the correction equation (Eq (3)). The SM trends of these three series were plotted and compared, as shown in Figure 6. Overall, the three series showed similar trends. For most weeks in Ghana, the average SM values of the initial RFR output were closer to the SMAP data than the output after Eq (3). Similarly, in Kenya, the initial RFR output was closer to the SMAP data than the output after Eq (3), albeit in fewer weeks.
The results of our study indicate that the proposed method is effective in estimating SM at a significantly finer spatial resolution than the 9 km SMAP product. Furthermore, the proposed method achieves this improvement with almost no loss of accuracy. This is a significant development, as it can facilitate the estimation of SM at a continental or global scale, which is typically challenging, time-consuming, and expensive when processing RS data. Despite these promising results, applying the correction equation and using SMAP measurements to correct the estimated values did not lead to any significant improvement in the R, RMSD, or ubRMSD of the estimated SM. These findings indicate that while the correction may improve the accuracy of the RFR output in some study areas and under specific land cover and atmospheric conditions, it is not always necessary or effective in all regions and time periods.

The uncertainty of the outputs
In previous studies, the distribution of stations in validation sites has been identified as a crucial factor in validating SM products retrieved from SM PMW (Beck et al., 2021; Colliander et al., 2017; Mohseni et al., 2022). A similar issue arises when estimating higher spatial resolution SM from these data. In this study, measurements from multiple stations within a coarse-scale pixel are averaged and then compared with the coarse-scale values. Similarly, when evaluating SM outputs with higher spatial resolution, the value of each station is compared with the value of the pixel in which it is located. In line with this issue, Core Validation Sites (CVSs) have been established to investigate the performance of the SMAP and SMOS products (Colliander et al., 2017, 2021). However, within Africa, there are no CVSs and the available networks are very sparse, which limits the informative value of the validation, particularly when assessing the original SM map and the performance of the downscaling procedure.
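The two comparison schemes described above (averaging stations within a coarse pixel, and matching each station to its own fine pixel) can be sketched as follows. The sketch uses a regular latitude/longitude grid as a simplification, which is an assumption: the actual products use their own grid definitions, and the station coordinates, cell sizes, and function names here are hypothetical.

```python
from collections import defaultdict
from math import floor
from statistics import mean


def cell_index(lat, lon, cell_deg):
    """Index of the regular lat/lon cell that contains a point (simplified grid)."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def average_by_cell(stations, cell_deg):
    """Average the SM values of all stations that fall in the same grid cell.

    `stations` is a list of (lat, lon, sm) tuples.
    """
    groups = defaultdict(list)
    for lat, lon, sm in stations:
        groups[cell_index(lat, lon, cell_deg)].append(sm)
    return {cell: mean(values) for cell, values in groups.items()}


# Hypothetical stations: (lat, lon, SM in m3/m3).
stations = [(0.012, 36.013, 0.30), (0.055, 36.064, 0.20), (0.503, 36.507, 0.25)]

# Coarse cells of 0.09 degrees (roughly 9-10 km): the first two stations share a cell.
coarse_reference = average_by_cell(stations, cell_deg=0.09)

# Fine cells of 0.01 degrees (roughly 1 km): every station has its own cell.
fine_reference = average_by_cell(stations, cell_deg=0.01)
```

At the coarse scale the two nearby stations collapse into one averaged reference value, whereas at the fine scale each station is compared individually. With sparse networks, many coarse cells are therefore represented by a single station, which is one reason the validation carries limited informative value.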
Different remote sensing satellites are designed with specific spatial and temporal resolutions in accordance with their mission objectives and resource constraints. The MODIS sensor, which we used in our study, has a high temporal resolution of 1 day, enabling daily LST retrievals. However, atmospheric conditions may reduce the availability of LST observations to every 3 days or even longer. This variation in effective temporal resolution is a source of uncertainty. Interpolation methods can be employed to estimate LST values on days when direct measurements are unavailable due to atmospheric conditions (Zhao & Duan, 2020; Zhu et al., 2022). However, these interpolation techniques introduce significant uncertainties that can affect the accuracy of downscaling methods. Consequently, this uncertainty could propagate into any assessment of the performance and outcomes of the downscaling method. It should also be kept in mind that the top-layer SM (0-5 cm) values of the SPL4SMGP.007 product are used in this study (Chan et al., 2016; Escorihuela et al., 2010; Mohanty et al., 2017). However, the results are evaluated against in-situ SM reference data obtained at 10 cm depth, which may introduce additional uncertainties. At the weekly time steps analyzed in this study, this effect can be regarded as negligible.
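The effect of cloud-induced gaps on a weekly LST value, mentioned above, can be illustrated with a short sketch. It uses plain linear interpolation between the nearest valid days, which is a stand-in chosen for simplicity and not the method of the cited interpolation studies; the function names and the daily values are hypothetical.

```python
def fill_gaps_linear(series):
    """Linearly interpolate None entries that lie between two valid values.

    Leading and trailing gaps stay None because there is nothing to
    interpolate between.
    """
    filled = list(series)
    valid = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(valid, valid[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def weekly_mean(series):
    """Mean of the valid (non-None) values, or None if the week has none."""
    values = [value for value in series if value is not None]
    return sum(values) / len(values) if values else None


# Hypothetical week of daily LST (K); None marks days lost to cloud cover.
lst = [300.0, None, None, 306.0, None, 302.0, None]
filled = fill_gaps_linear(lst)

mean_without_filling = weekly_mean(lst)
mean_with_filling = weekly_mean(filled)
```

In this toy week, averaging only the three valid days and averaging the gap-filled series give different weekly means, so the choice of gap handling changes the input feature passed to the downscaling model. Any interpolation scheme adds its own assumptions about how LST evolves on cloudy days.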

Conclusion and outlook
This study proposes a workflow using the Google Earth Engine (GEE) to estimate SM at a spatial resolution of 1 km, overcoming the limitation of the tens-of-kilometers resolution obtained from passive microwave RS observations. The method uses MODIS optical/thermal measurements, SMAP coarse-scale SM products, and a Random Forest Regression (RFR) to estimate SM at a higher spatial resolution. The method was applied to the continent of Africa for the year 2020, producing 52 weekly SM maps that were evaluated using in-situ SM measurements from 35 stations across three validation networks. The results indicate that the proposed method can estimate SM at 1 km spatial resolution with acceptable accuracy, with an average correlation coefficient of 0.64 and a ubRMSD of 0.069 m³/m³. The proposed method in GEE can estimate SM at a much finer spatial resolution over a large area than the original SMAP measurement, which has a spatial resolution of 9 km, with similar correlation coefficient and RMSD values.
The current study used an averaging method to estimate weekly optical/thermal features and, thereby, weekly SM at a 1 km spatial resolution. However, the temporal heterogeneity of SM, especially in croplands, is very high, making weekly estimation inadequate for some applications. Therefore, future studies may focus on using interpolation methods to retrieve optical/thermal features at shorter time intervals, such as 2-3 days (the temporal resolution of SMAP), for use in the proposed method to obtain SM data at a higher temporal resolution for agricultural applications. This can support the improvement of agricultural practices, water resource management, and climate modeling, among other applications.

Figure 1. MODIS land cover map over the study area. In total, the 0-10 cm depth SM observations of 31 TAHMO stations, 1 COSMOS station, and 1 SD_DEM station are used in this study to validate the proposed method.

Figure 2. The workflow of the proposed method for estimating SM using GEE.

Figure 3. (a) SMAP SM product (SPL4SMGP.007) at 9 km resolution, (b) ΔTs (LST_day - LST_night) calculated using MODIS LST products with 1 km spatial resolution, (c) estimated SM map at 1 km spatial resolution using the proposed RFR method, optical/thermal features, and SM values of SMAP, and (d) 1-km SM_C map after applying the pixel correction equation on the estimated SM for Kenya, Africa.

Figure 4. (a) SMAP SM product (SPL4SMGP.007) at 9 km resolution, (b) ΔTs (LST_day - LST_night) calculated using MODIS LST products with 1 km spatial resolution, (c) estimated SM map at 1 km spatial resolution using the proposed RFR method, optical/thermal features, and SM values of SMAP, and (d) 1-km SM_C map after applying the pixel correction equation on the estimated SM for Ghana, Africa.

Figure 6. Temporal patterns of the original SMAP product with 9 km resolution, the 1 km SM map retrieved from RFR, and the SM map after applying the correction coefficient Eq (3) for 2020, together with in-situ measurements of the Kenya (a) and Ghana (b) validation sites.

Table 1. The primary attributes of the SM sites which were used in the validation process.
1 Trans-African Hydro-Meteorological Observatory (name of in situ network in Sahel Zone in Africa).
2 Cosmic-ray Soil Moisture Observing System.
3

Table 2 summarizes all satellite-based data applied in this study to estimate SM values at moderate spatial resolution. All of the data in Table 2 are available for download, processing, and use in the GEE cloud computing platform (https://developers.google.com/earthengine/datasets). A brief description of the main data pre-processing in this study is provided in Section 3.

Table 2. Main characteristics of the RS observations and products used in the downscaling methods.