Digital mapping of soil organic carbon stocks in the forest lands of Dominican Republic

ABSTRACT Mapping the spatial distribution of soil organic carbon (SOC) in lands covered by tropical forests is important to understand the relationship and dynamics of SOC in this type of ecosystem. In this study, the Random Forest (RF) algorithm was used to map SOC stocks of topsoil (0–15 cm) in forest lands of the Dominican Republic. The methodology was developed using geospatial datasets available in the Google Earth Engine (GEE) platform combined with a set of 268 soil samples. Twenty environmental covariates were analyzed, including climate, topography, and vegetation. The results indicate that Model A (combining all 20 covariates) was only marginally better than Model B (combining topographic and climatic covariates), and Model C (only combining multispectral remote sensing data derived from Landsat 8 OLI images). Model A and Model B yielded SOC mean values of 110.35 and 110.87 Mg C ha−1, respectively. Model A reported the lowest prediction error and uncertainty with an R2 of 0.83, an RMSE of 35.02 Mg C ha−1. There was a strong dependence of SOC stocks on multispectral remote sensing data. Therefore, multispectral remote sensing proved accurate to map SOC stocks in forest ecosystems in the region. GRAPHICAL ABSTRACT


Introduction
Soils hold the largest carbon (C) pool on Earth after the oceans, with an estimated total of 1,500-2,400 Pg C up to 1 m depth (Scharlemann et al., 2014;Tifafi et al., 2018). Soil organic carbon (SOC) directly influences the physicochemical properties nutrient retention capacity and infiltration rate of the soil (Scholten et al., 2017;Viscarra Rossel et al., 2016). In addition, SOC has the potential to mitigate the adverse impacts of current and future climate change (Edenhofer et al., 2014) and help improve the primary productivity of the biosphere (Grinand et al., 2017).
The Global Climate Observing System (GCOS) has identified 54 Essential Climate Variables (ECVs), including components of the cryosphere, biosphere, and hydrosphere that are "needed to understand and predict the evolution of climate, to guide mitigation and adaptation measures, to assess risks and enable attribution of climate events to underlying causes, and to underpin climate services" (WMO, 2020). The present work focuses on one of these ECVs, SOC stocks, particularly those stored in lands covered by tropical forests.
The global C stocks in forest biomass and their spatial distribution are relatively well documented and have been estimated with reasonable accuracy compared to SOC stocks (Baccini et al., 2012;Harris et al., 2012;Ruesch & Gibbs, 2008;Saatchi et al., 2011). Most local and international policies for climate change mitigation have focused on conserving and studying the C stored in forests. In addition to this, SOC is of great importance because the Earth's soils store two-to-three times as much carbon in organic form as there is C in the atmosphere globally (Trumbore, 2009). In this sense, the construction of a robust and transparent system to measure, report and verify (MRV) SOC changes represents a key tool to support compliance with the Sustainable Development Goals (SDG), specifically the SDG indicator 15.3.1 "Proportion of land that is degraded over the total land area" Jan & Jeffrey, 2018).
At present, there is still uncertainty about the amount of global SOC stocks and their spatial distribution, mainly due to the little attention given by decision makers at the local, national and international levels (Gianelle et al., 2010). In an analysis and review of 27 studies that estimated SOC globally, it was found that the SOC mean value is approximately 1,460.5 Pg C, ranging from 504 to 3,000 Pg C (Scharlemann et al., 2014). One of the main reasons for the uncertainties found in these estimates is the large number of factors that interfere in SOC dynamics combined with all the uncertainties leading to error propagation associated with the difficulty in assessing C and soil bulk density (Köchy et al., 2015).
Even though there is scientific interest in monitoring forests and soils, there is a lack of data to carry out efficient monitoring and determine the current state of these resources (Liang et al., 2016). In 2017, the Food and Agriculture Organization (FAO), the Intergovernmental Technical Panel on Soils (ITPS), the Intergovernmental Panel on Climate Change (IPCC), the Global Soil Partnership (GSP), the Science-Policy Interface of the United Nations Convention to Combat Desertification (UNCCD-SPI), and the World Meteorological Organization (WMO) jointly organized a Global Symposium on Soil Organic Carbon. This symposium provided guidelines for developing efficient systems and protocols for measuring SOC with higher accuracy (FAO, 2020a). In the last decade, digital soil mapping (DSM) approaches have focused on mapping SOC using remote sensing techniques as the main emerging tool to improve spatial estimates of SOC (Mahmoudzadeh et al., 2020;Padarian et al., 2019). DSM allows quantifying the spatial variation of SOC stocks using environmental covariates (Zhang et al., 2017), which describe the relationship of a soil attribute and its spatially implicit forming factors (Jenny, 1941). The environmental auxiliary variables of SOC can be obtained from digital elevation models (DEM) (Farr et al., 2007;B. Wang et al., 2018), remote sensing data (Duarte et al., 2020;Wulder et al., 2016;Xiao et al., 2019) and climatic data (Ermida et al., 2020;Veronesi & Schillaci, 2019). The easy accessibility of satellite images combined with Machine Learning (ML) techniques has significantly improved the accuracy of SOC mapping. In a review of 120 studies on SOC mapping, in which different ML techniques were applied, it was found that the Random Forest (RF) algorithm has optimum performance in the selection of environmental covariates for SOC mapping. At the same time, it also behaves better than other ML techniques and Multiple Linear Regression (MLR) (Lamichhane et al., 2019).
There are few studies on SOC mapping of Dominican Republic lands. The main report comes from the Global Soil Organic Carbon Map (GSOCmap) launched by FAO. In fact, GSOCmap represents the first global estimation of SOC content carried out with a participatory approach to compile all the available data on soils at the national level . With regard to the tropics, SOC estimates are very limited on forest lands since most of the research has focused on estimating SOC from agricultural lands.
The objective of this study was to estimate SOC content in forest lands of the Dominican Republic and their spatial distribution by applying ML techniques, a dataset of environmental covariates obtained from remote sensing (RS) and field data. We compared the influence of three groups of predictive variables for SOC mapping: (1) multispectral remote sensing variables, (2) topographic variables, and (3) climatic variables. The performance of the ML model was also evaluated. Our model was implemented in the Google Earth Engine (GEE) cloud-based computing platform (Gorelick et al., 2017).

Materials and methods
The overall structure of the method (Figure 3) consisted of five stages such as: (i) selection of a geospatial dataset; (ii) data pre-processing; (iii) model building/ development; (iv) evaluation of the model performance; and (v) mapping of SOC.

Study area
The study was carried out in the Island of Hispaniola (central region of the Caribbean), Dominican Republic. It corresponds to forest lands located between 17°36′ and 19°58′ latitude north, and 68°19′ and 72°01′ longitude west, belonging to the Greater Antilles. The territory of the country covers 48,198 km 2 (Figure 1).
The Dominican Republic has a tropical climate, which is altered only by the Alyssian winds of the Atlantic and topographical factors of the Island. The average annual temperature is 25°C, with August being the hottest month and January the coldest. Precipitation is distributed in two seasons: a rainy season, which goes from April to June and from September to November, with precipitation of 2,500 mm yr −1 ; and a dry season, which goes from December to March, with precipitation of 450 mm yr −1 . On the Island, the areas with the highest humidity are in the north because they are influenced by the Atlantic Ocean, while the driest areas are found to the south, along the Caribbean coast (Cano-Ortiz et al., 2015).
The soils are divided into 10 classes based on characteristics such as depth, slope, and drainage. Soil classes correspond to savannah, non-calcareous clay and calcareous soils as well as soils derived from igneous rocks, soils of volcanic and metamorphic origins, recent alluvial soils, organic soils, wetlands, coastal beaches and dunes (MARN, 2012). Our study focused on soils covered by pine, broadleaf, dry and mangrove forests, covering an area of 14,499 km 2 .

Forest mask
To estimate SOC stocks, the study area was defined with a forest mask. For this, we used the land cover map prepared in 2015 by the Ministry of the Environment and Natural Resources (MARN) of the Dominican Republic. On this map, the concept of forest was applied in accordance with the FAO Forest Resources Assessment (FRA-FAO) defined by the country as "Natural or planted ecosystem of at least 0.5 hectares covered by trees higher than 5 meters and with a canopy cover of more than 40%" (FAO, 2020b). Forest types ( Figure 2) that are part of this study are defined as follows (MARN, 2012): Broadleaf forests. represented by trees where the combination of broad-leaved species predominates; it comprises plant communities from semi-humid in transition to cloudy. It is the type of forest with the largest existence in the country. It is classified as cloudy broadleaf forest, located in areas with elevations from 600 to 2,300 m above sea level (m.a.s.l.); moist broadleaf forest in areas with elevations from 300 to 1,500 m.a.s.l. and semi-humid broadleaf forest located in areas with elevations up to 900 meters above sea level, or m.a.s.l.
Dry forests. mostly secondary forest; they are composed of semi-deciduous trees that develop at elevations below 500 m.a.s.l. The greatest presence of these forests is located in the lowlands, both southsouthwest and northwest of the country.
Pine forests. composed of pine species. The authentic Dominican pine forest is located especially in the highlands and the dominant species is Pinus occidentalis, which is found in the large mountain ranges with elevations above 2,000 m.a.s.l. The pine forest has two types of cover: high and low canopy cover density.

Mangrove forests.
Coastal and wet ecosystems found in swampy and flooded regions; they mainly belong to the Rhizophoraceae family, with exposed supporting roots. This forest is located at elevations below 20 m.a.s.l.

Soil organic carbon database
In this study, 268 soil samples from the National Forest Inventory (NFI) were used. The NFI was collected in 2018 by the MARN of the Dominican Republic with the support of the REDD/CCAD-GIZ program and the World Bank's Forest Carbon Partnership Facility (FCPF) (available in: https://www.sica.int/documentos/ inventario-forestal-nacional-de-republica-dominicana_ 1_126744.html) (MARN, 2021). The sampling design adopted by the NFI corresponds to stratified sampling for each forest stratum. For building the NFI, the methodology proposed by the REDD/CCAD-GIZ program was used (Feliz et al., 2019;MARN, 2021).
Data used were taken from 268 plots located in the different forest types ( Figure 2; Table 1). Each soil sample was collected from the NFI at a depth of 0 to 15 cm and their geographic coordinates were recorded with a global positioning system device (GPS). In the NFI, SOC data were reported with an extrapolation from depth of 0 to 15 cm. The samples were numbered, bagged, and brought back to the Laboratory of Soils and Water of the Dominican Institute of Agricultural and Forestry Research Agricultural Technology Center (CENTA) and the Dominican Agribusiness Laboratory (LAD). After drying, the samples were weighed and passed through a 2-mm sieve. The determination of SOC content (Mg C ha −1 ) is based on the Walkley & Black chromic acid wet oxidation method (Walkley & Black, 1934).
Bulk density (BD) was determined on subsamples dried at 105°C as described by (Dane Topp et al., 2002). Results were reported as g cm −3 on an ovendry basis and SOC was reported as g (100 g) −1 . Soil organic carbon stock (SOCS) was computed as the product of three variables, organic carbon content (C), bulk density (BD), and thickness (D). SOCS was calculated according to Equation 1: Where C is the concentration of soil carbon (g C (100 g) −1 ); BD is bulk density (g cm −3 ), D is the thickness of the layer (cm), gravel [%] is the percentage of gravel in the soil sample. Table 1 shows the descriptive statistics of SOC (0-15 cm depth) samples collected from the NFI. SOC contents ranged from 15.95 to 282.38, with a mean value of 110.35 and a median of 101.34 Mg C ha −1 . The coefficient of skewness is −0.46 Mg C ha −1 . The sampling point´s standard deviation (SD) is 63.78 Mg C ha −1 and is lower than the mean value.
Once the forest mask and environmental variables was defined, a method to estimate SOC content in forest soils with a geospatial dataset was developed, they were combined with the soil samples collected in the field, and a regression model was applied for each of the three models; the covariates were divided using the RF algorithm in the GEE platform; finally, the models were evaluated and the spatial distributions and SOC stock map was built ( Figure 3). We iterated the model A, B, and C and calculated the average standard deviation (SDs) to analyze the uncertainty of each model in predicting topsoil SOC ( Figure 6).

Environmental predictors
For the digital mapping of SOC, we selected 20 accessible and commonly used predictive environmental dataset covariates, which represent key factors for the spatial distribution and formation of SOC content such as: vegetation, soil, topography, and climate (McBratney et al., 2003). These covariates represent factors of soil formation according to (Jenny, 1941). Further spectral vegetation indices (SVIs) were calculated using Landsat-8 images (Table 2). From the combination of these dataset covariates with data soil samples, three models with different combinations of predictive variables were built, using the RF algorithm for the digital mapping of SOC. The models were as follows: For each of the models, the RF algorithm was applied, and its accuracy was evaluated. The relative importance of the variables in the model was also assessed. All the datasets were obtained and processed in the GEE cloud-based platform.
All Landsat-8 OLI surface reflectance data from the year 2018 ± 0.5 (available in the GEE platform) were used in this study: a total of 92 images from path 123 and row 32; 89 images from path 125 and row 34 and 94 images from path 125 and row 33. Landsat surface reflectance data were atmospherically corrected using the Landsat Surface Reflectance Corrected (LaSRC) (OLI) algorithms ; We used methods provided by Earth Engine for filtering image collections using the code "imageCollection.filterDate()" and we built a composite mosaic multiband and multi- The CFmask algorithm was used to mask cloud and shadow produced, as well as a per-pixel saturation mask (Zhu & Woodcock, 2012). In addition, the empirical Earth rotation model (ERM) was used as a basis to perform a terrain illumination correction algorithm (Tan et al., 2013), which allowed conducting the topographic correction for each image. For reflectance images, we used the medoid method (Flood, 2013).

Climatic variables.
We used the climatic datasets available in the GEE platform closest to the date soil sampling, such as Moderate Resolution Imaging Spectroradiometer (MODIS) MOD11A1 V6 product, which provides daily land surface temperature (LST) and emissivity Daily Global 1 km. GEE collection snippet: ee.ImageCollection("MODIS/006/MOD11A1").
The temperature value is derived from the MOD11_L2 swath product (Wan, 2014). Relative humidity data (2 m above ground) were obtained from the Global Forecast System (GFS). This is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP) and National Aeronautics and Space Administration (NASA). The GFS is a coupled model, composed of an atmosphere model, an ocean model, a land/soil model, and a sea ice model, which work together to provide an accurate picture of weather conditions (Saha et al., n.d.). GEE Collection snippet: ee. ImageCollection ("NOAA/GFS0P25").
For the precipitation data, we used the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) dataset, which builds on previous approaches to "smart" interpolation techniques and high resolution, long period of record precipitation estimates based on infrared Cold Cloud Duration (CCD) observations (Funk et al., 2015). GEE collection snippet: ee. ImageCollection("UCSB-CHG/CHIRPS/DAILY").
Topographic variables. Terrain analysis is crucial for modeling environmental systems. Specifically, the topography is considered as a variable that can largely explain SOC changes. In fact, models that take topographic attributes into account can provide better estimates of SOC stocks (McBratney et al., 2003). We used the digital elevation model (DEM) derived from NASA's Shuttle Radar Topography Mission (SRTM DEM) (Farr et al., 2007). We calculated the topographic slope, aspect and elevation from this SRTM V3 product (SRTM Plus) at a resolution of 1 arc-second (approximately 30 m). GEE collection Snippet: ee.Image("USGS/ SRTMGL1_003"). The number of terrain and climate-based covariates used within each dataset is shown in Table 3, and an example of its geographical representation is shown in Figure 4.

Google earth engine (GEE) platform
The model for mapping SOC stocks in forests developed in the present study was built in the GEE cloudbased computing platform. GEE is a platform designed for scientific analysis at the petabyte (PB) scale and has an extensive public data catalog for earth observation (Gorelick et al., 2017). One way to use this platform is using an online tool called The Code Editor, which lets the user access the platform using a scripting language (JavaScript).
GEE has hosted historical images of the Earth for more than forty years. The images collected daily are made available to the public for data mining on a global scale. GEE allows processing massive data of a raster format for large areas and with high volumes of information. In our case, topographic, climatic and vegetational variables were analyzed (Tables 2 and 3) with high performance and minimum user involvement in the processing. The algorithm used in GEE to estimate SOC contents was the Random Forest (RF). The codes developed in this study using the GEE cloud-based computing platform are available in Appendix A.

Random forest (RF) modelling
In this study, the RF algorithm was selected to predict the SOC stocks in the forest ecosystem of the Dominican Republic. RF is one of the most popular and most powerful supervised ML algorithms that can perform both regression and classification tasks. As the name suggests, this algorithm builds a set of regression trees. Each of the trees predicts the result in each pixel, while the final prediction is obtained averaging these values (Breiman, 2001). We used the RF to estimate the relative importance of the predictive variables. For the prediction accuracy, the 268 SOC samples were randomly divided into 2 sets: 70% of the total samples were used as model training data (n = 188), and the remaining 30% for model validation and accuracy assessment (n = 80). RF modelling was performed using the GEE cloud computing platform applying the following line of code: ee.Classifier. smileRandomForest.setOutputMode ("REGRESSION"). The principal parameters of the algorithm were: number of decision trees = 100 and default values to min leaf population (1), variables per split (square root of the number of variables), bag fraction (0.5), max nodes (defaults to no limit), seed (0) and set output mode = regression.

Model evaluation and uncertainty analysis
To evaluate the performance of the SOC model, five indexes were calculated using the following formulas: coefficient of determination (R 2 ), Lin's concordance correlation coefficient (LCCC) (Lin, 1989), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and mean absolute deviation (MAD).
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 1 n where n i ¼ 1; 2; . . . ; n ð Þ is the number of samples used for the ML model, y i is the value observed (Mg C ha −1 ), � y i is the corresponding mean value, f i is the predicted value (Mg C ha −1 ), � f i is mean value. ∂ x and ∂ y are the variances of the predicted and measured values; r is the correlation coefficient between the predicted value and the measured value.
The prediction algorithm with the lowest MAD, MAPE and RMSE, and highest R 2 and LCCC values are determined as the best model for SOC prediction. We iterated model A, B, and C and calculated the average standard deviation (SDs) to analyze the uncertainty of each model in predicting topsoil SOC ( Figure 6).

Geospatial environmental predictive datasets
The summary statistics for each of the 20 environmental covariates at each sampling site used in the present study are shown in Table 4. To describe the environmental covariates, they were divided into three groups that fit different models in our study: (i) imagery remote sensing data set, (ii) terrain-based covariate data set, and (iii) climatic-based covariates.
The relationships between SOC content and each predictive covariate is of great importance to know information about the contribution of each covariate in the model. Pearson`s correlation analysis between  SOC and predictive covariates was derived as shown in Figure 5. SOC stock was positively correlated with temperature, TWI, B blue, B red and B green band, but negatively correlated with precipitation, relative humidity, elevation, and slope. Interestingly, the correlations with the images were the most significant. Finally, we found that there was multicollinearity between the vegetation indexes derived from remote sensing and SOC.

Soil organic carbon models
SOC content in forest-covered soils of the Dominican Republic were estimated using three different models that grouped a series of geospatial datasets. Table 5 shows the descriptive statistics of soil organic carbon (values in Mg C ha −1 ) for each model. Model A recorded the highest mean value of SOC (110.35 Mg C ha −1 ). The analysis of variance applied showed no significant differences (p < 0.05) between the three models evaluated. The use of climatic and topographic covariates helped improve the model, but not significantly. Using only multispectral imaging can produce good results in digital mapping of SOC at a depth of 0-15 cm.

Spatial model performance
Comparisons of the performance for the three dataset models by cross-validation is shown in Table 6, Figures 6 and 7. We found that Model A is the model with the best performance, explaining 83% of the spatial variation of SOC. In general, the more predictors, the better the model. This model groups 20 predictive variables in a dataset: multispectral remote sensing variables + topographic variables + climatic variables. Model B (topographic and climatic variables) yielded an R 2 = 0.77 and an RMSE = 38.57 Mg C ha −1 . Interestingly, Model C (only multispectral remote sensing-derived variables) yielded an R 2 = 0.79 and an RMSE = 35.69 Mg C ha −1 . These results are consistent with the significant correlations between covariates and SOC (see, Figure 7). We iterated the model A, B, and C and calculated the average standard deviation (SDs) to analyze the uncertainty of each model in predicting topsoil SOC ( Figure 6). We found the highest SD in Model C for all forest types and Model A with the lower uncertainty compared to models B and C.

Covariates relative importance
The average relative importance of each covariate derived from the geospatial dataset to estimate SOC was calculated. To facilitate the analysis for each model, we combined the relative importance of all environmental covariates to 100% (Figure 8). For the dataset grouped in Model A, the 3 most important covariates were slope, temperature and NDVI (34% of the total relative importance). The vegetation indices were ranked at different levels. In Model B, which groups climatic and topographic covariates, the covariates of elevation and precipitation recorded the highest relative importance (48% of the total relative importance). For model C, which only groups covariates derived from Landsat 8 satellite images, the Index-Based built-up Index (IBI) was the covariate with the highest relative importance in the model. For the IBI index, three thematic indices were used: the Modified Normalized Difference Water Index (MNDWI), the Soil Adjusted Vegetation Index (SAVI), and the Normalized Difference Built-up Index (NDBI; Xu, 2008). Table 2 provides further details of the indices used and Figure 9 provides further details of the indices per each type of forest.

SOC stock spatial distribution
A spatially explicit SOC map was created using the GEE cloud computing platform. The results obtained with the three spatial distribution models to predict SOC content in forest-covered areas of the Dominican Republic are shown in Figure 10. Non-forest areas were excluded from the analysis and are shown without color on the map.  Analyzing the best model obtained in our study (Model A), identified that the spatial patterns of SOC are closely related to the type of forest. The highest SOC (0-15 cm depth) contents are found in the mangrove forests that are located in the coastal areas of the country, with average estimates of 131.87 Mg C ha −1 , and maximum and minimum values of 193.09 Mg C ha −1 and 63.91 Mg C ha −1 , respectively ( Table 7). The lowest SOC content is found in the soils covered by pine forests, especially those located in the elevated and steep slopes, in the central and southern region of the country. These soils have a SOC mean value of 89.06 Mg C ha −1 and a minimum value of 44.76 Mg C ha −1 . Most of these soils are dominated by degraded forests with low productivity and dry shrub vegetation. Their low SOC content is attributed to steeper slopes, which make soils more susceptible to erosion and greater water discharge.
We found that in the soils covered by forests in the Dominican Republic, a total of 144,051,831 Mg C is stored in the topsoil (0-15 cm depth), with 52.1% corresponding to soils covered by broadleaf forests, 31.2% covered by dry forests, 14.3% covered by pine forests, and 2.4% covered by mangrove forests. Table 7 shows more details of the results of SOC obtained in the three models for forest type.

Method for measuring and monitoring SOC stocks: a contribution to regional and global initiatives
SOC stocks have acquired great relevance due to the role they play in climate regulation and as an important indicator of soil quality. International organizations such as The United Nations Framework Convention on Climate Change (UNFCCC), the United Nations Convention to Combat Desertification (UNCCD) and the Convention on Biodiversity (CBD) have widely recognized the importance of SOC in the international framework of climate change mitigation. In this sense, there have been emerging regional initiatives aiming at the sustained production of soil information, such as the   Soil Information System for Latin America and the Caribbean (SISLAC), or global initiatives such as ISRIC -World Soil Information, legally registered as the International Soil Reference and Information Center, which has the mission to serve the international community as custodian of global soil information. Similarly, the Group of Earth Observation (GEO) has established a Global Soil Information System (GLOSIS) as part of the Global Earth Observation System of Systems (GEOSS). All of these initiatives have encouraged countries to establish national systems for monitoring and measuring SOC. Therefore, there is a need to develop accurate, replicable and low-cost methods to quantify and monitor SOC stock changes.
In the present study, we performed a digital mapping of SOC stocks (in the area under study), using a combination of freely accessible geospatial datasets provided in the GEE cloud computing platform, and field data using the RF algorithm to predict the distribution of SOC in forest soils of the Dominican Republic. To our knowledge, this is the first attempt to map the SOC stocks using this type of technique, in the country and the tropical region of Central America and the Caribbean. The results obtained are encouraging because the three models used had a good performance, even with variables derived only from Landsat 8 OLI images (Model C).
Compared with other methods used to map SOC stocks in tropical regions (Guevara et al., 2018;Ramesh et al., 2015;Rossi et al., 2009;Vasques et al., 2016), our method eliminated excessive soil sampling, which can result in a high cost, particularly in those territories covered by forests where access is very difficult.

Importance of variables in the SOC prediction model
The CLORPT model (CL: Climate; O: Organism, vegetation; R: Relief; P: Parent material; and T: time) (Jenny, 1941) and the SCORPAN model (S: property or soil class; C: climate; O: organisms; R: topography; P: parent material; A: age or time factor; and N: space, spatial position) (McBratney et al., 2003) are conceptual models commonly used for digital soil mapping since they relate environmental covariates to soil properties. However, these relationships between covariates and properties differ depending on the geographical area of the soil being analyzed. A review conducted by (McBratney et al., 2003) indicated that the key environmental covariates to infer soil properties were relief (80% of studies), followed by soil class (S) (35%), vegetation (O) and parent material (P) (both 25%), spatial position (N) (20%) and climate (C) (5%).
Several studies have indicated that topographic factors such as elevation and slope have a higher correlation with SOC changes (J.L. Boettinger et al., 2010;Hinge et al., 2018). The NDVI allowed us to understand the importance of the amount of biomass and vegetation cover to predict SOC stocks. Other studies have also reported that SOC can be estimated only by the presence of vegetation (Yang et al., 2008;Zhao & Shi, 2010). Therefore, the NDVI can be used as an approximation to determine SOC. The BSI allowed us to understand the importance of analyzing bare soil, especially in highly fragmented secondary forests such as the forests of the Dominican Republic and many tropical forests. This type of forest structure indicated that vegetation and bare soil combine to generate a forest with a low canopy density (mainly fragmented forests due to human intervention). Our results show that the IBI can significantly enhance the model to predict SOC in fragmented forests with a low canopy density effectively suppressing background noise caused for bare soil. Xu (2008) found that the IBI possesses a positive correlation with land surface temperature and negative correlations with the NDVI.
In terms of the climatic covariates used in our study, temperature was the second most important variable, which explains the SOC changes in Model A. In this sense, previous research has indicated that temperature is a direct predictor of SOC since it has a major influence on determining the type of vegetation, its growth and the microbial decomposition of organic matter (M. Wang et al., 2014).

Comparative analysis of other SOC measurements and mapping initiatives in the region
There have been different local initiatives for the measurement of SOC in the Central American Region in recent years. These have focused on collecting soil C data as part of a multipurpose methodology of local forest inventories, including the 5 pools of carbon defined by the IPCC. However, a wall-to-wall mapping of SOC has not been generated yet. Our study is the first report in which soil C data is used in combination with ML techniques and open-access dataset and available in the GEE cloud computing platform for geographically explicit mapping of the SOC.
ML techniques are widely used in digital SOC mapping as they combine complex and non-linear relationships between different soil attributes and predictive environmental covariates (Drake et al., 2006). Although various prediction algorithms have different capabilities, size of the training sample affects more than the selection of models to improve the prediction accuracy of SOC (Somarathna et al., 2017).
By using the RF algorithm, Model A yielded a SOC mean value (0 − 15 cm) of 110.35 Mg C ha −1 (Table 5). This value is higher than the mean value of 81.04 Mg C ha −1 reported in the Global Soil Organic Carbon Map (GSOCmap V1.5) prepared by the Food and Agricultural Organization (FAO). The global soil carbon map consists of national SOC maps, developed as 1 km soil grids, at a depth of 0-30 cm (FAO, 2020a); this is the main and most recent SOC mapping initiative existing in the region with which we can compare the results obtained herein. Another comparison with actual SOC mapping, shows that the SOC mean value obtained in the present study is lower than that of 128.80 Mg C ha −1 reported in SoilGrids -global gridded soil information; this is a system for digital soil mapping based on a global compilation of soil profile data (WoSIS) and environmental layers, which uses state-of-the-art ML methods to map SOC contents at a depth of 0-15 cm and 250 m resolution (Hengl et al., 2017). We believe that, due to the source of the data, depth of the soil sample, spatial resolution, scale of the map and the technique used to predict SOC contents, our results are slightly different from the results reported by these two global initiatives for the Dominican Republic. Figure 11 shows a comparative analysis of the three maps described.
Other reports on SOC in Central America are found in El Salvador and Costa Rica, where soil C was measured at 0-20 cm and 0-30 cm depths, respectively. Both measurements were part of the report of the National Multipurpose Forest Inventory, which was built to quantify the SOC stocks of those countries in order to report Forest Reference Emission Levels (FREL/REDD+) to the UNFCCC. However, geographically explicit maps of SOC stocks were not developed. In El Salvador, the mean SOC value in soils covered by forests was 137.45 Mg C ha −1 at 0-20 cm depth (García & MARN, 2018), and Costa Rica, the mean value was 108.81 Mg C ha −1 at 0-30 cm depth (Emanuelli et al., 2015); both reports are close to those obtained in our study (110.35 Mg C ha −1 ). As mentioned above, these are the most recent reports for the region, and they provide evidence of the potential that our methodology has as it can be replicated in other countries in the future, and thus contribute to SOC mapping at the regional and global levels.

Conclusions
The present study developed and applied a methodology for SOC mapping in forest lands, using geospatial datasets available in the GEE platform. This approach opens new possibilities for applying ML techniques that will allow countries to develop robust, transparent, consistent, and replicable systems for measuring and monitoring C in soils. In our study, Figure 11. Comparative map of soil organic carbon (Mg C ha −1 ) obtained with Model A versus Global Soil Organic Carbon Map (GSOCmap V1.5) and SoilGrid map 2.0. A1). Model A included all predictive covariates (Multispectral remote sensing variables + topographic variables + climatic variables); A2) zoomed-in image to SOC map derived from Model A. B1). Global Soil Organic Carbon Map (GSOCmap V1.5); B2) zoomed-in image to GSOCmap V1.5. C1). SoilGrids map V2.0); C2) zoomed-in image to SoilGrids map V2.0.
we found a better coefficient of determination in Model A (topographic, climatic, and Landsat 8 OLI imagery datasets), and determined that the SOC content is mostly related to slope, temperature, and NDVI.
This study shows that freely accessible multispectral optical imagery available in the GEE platform, such as Landsat 8 OLI, can by itself estimate SOC with adequate accuracy in the tropical forests of the Dominican Republic. Adding climatic and topographic covariates improved the model, but not significantly. The results obtained allow indicating that multispectral images are a good tool for SOC digital mapping at 0-15 cm depth. ML helps simplify model adjustments. This is a great advantage because ML allows mapping SOC content using many predictive variables of climatic, topographic, or vegetation type with minimum human interaction; however, using soil samples distributed and stratified by forest type was crucial to improve model prediction of SOC content with an R 2 of 83% We found that the GEE platform has excellent potential in "wall-to-wall" SOC mapping in forest lands. Further research is required on the use of these tools for SOC mapping in different land uses (e.g., agricultural and livestock soils), different ecosystems beyond the tropics, and at further depths. We can conclude that the methodology developed may encourage new research that favors the fulfillment of the Pillar 4 Implementation Plan towards a Global Soil Information System within the framework of the Global Soil Alliance, and especially the support of Indicator 15.3.1 of the Objectives of Sustainable Development.
Future research is needed to evaluate; i) new spatial datasets available such as SoilGrid database, Sentinel-1 Synthetic Aperture Radar (SAR) data, and other satellite images like MODIS, Sentinel -2 MultiSpectral Instrument (MSI), or Planet. ii) other algorithms such as deep learning (neural network), iii) perform hyperparameter estimation and optimization of the RF model to ensure maximum model accuracy, iv) increase the number of SOC stock samples to improve model performance, and v) assessing the land use and land cover change effect on soil organic carbon from landscape to national scale are some outlooks for future studies.