Forecasting monthly soil moisture at broad spatial scales in sub-Saharan Africa using three time-series models: evidence from four decades of remotely sensed data

ABSTRACT Soil moisture is a critical environmental variable that determines primary productivity and contributes to climatic processes. It is, therefore, important to forecast soil moisture to inform expectations of derivative outputs reliably. While forecasting soil moisture continues to advance, there is a need to extend it to different geoclimatic regions, including sub-Saharan Africa, where livelihoods predominantly rely on subsistence agriculture. We used remotely sensed soil moisture data produced by the European Space Agency – Climate Change Initiative (ESA CCI). The data, which covered the period 1978 to 2019, were used to forecast monthly soil moisture in different agroecological zones and land cover types. The Seasonal Random Walk, Exponential Smoothing and Seasonal Autoregressive Integrated Moving Average (SARIMA) forecasting models were trained on 70% of the data (November 1978 – August 2007) and subsequently applied to a test dataset (September 2007 – December 2019). All models showed solid prediction accuracies for all agroecological zones (unbiased root mean square error, ubRMSE ≤ 0.05 m³ m⁻³) and land cover types (ubRMSE ≤ 0.04 m³ m⁻³). This was corroborated by similarities in season-adjusted anomalies between observed and forecasted soil moisture for nearly all agroecological zones and land cover types, with a correlation coefficient of r > 0.5 for most locations. The broad-scale interpretation of soil moisture forecasting can inform moisture availability and variability by region; however, more research is encouraged to improve forecasting at spatially and temporally detailed levels to assist small-scale farming practices on the continent.


Introduction
Soil moisture is a crucial resource that serves as an immediate and main source of water for vegetation growth (Entekhabi et al., 2010; Petropoulos et al., 2015; Robock et al., 2000). In addition, it forms part of hydrological systems and thus controls the exchange of energy between the atmosphere and terrestrial ecosystems (Robinson et al., 2008; Seneviratne et al., 2010; Vereecken et al., 2008). Due to its vital ecosystem services and climate-moderating functions, soil moisture is recognized as an Essential Climate Variable that supports the goals of the United Nations Framework Convention on Climate Change (UNFCCC) and the Intergovernmental Panel on Climate Change (IPCC) (Bojinski et al., 2014). Understanding the amount and dynamics of soil moisture is vital to inform the determination of land capability for a specific purpose and to assist in the modelling of hydrological and energy cycles between land and the atmosphere (Agustí-Panareda et al., 2010; Koster et al., 2009; Seneviratne et al., 2010).
Traditional soil moisture monitoring involves regular recording at strategically located sampling points (Robock et al., 2000). While field-based observations remain a reliable method for quantifying and monitoring soil moisture, they are limited to sampling locations (Seneviratne et al., 2010; Susha Lekshmi et al., 2014). Therefore, these methods fail to capture the spatially distributed nature of soil moisture caused by variations in landscape characteristics such as topography, vegetation composition, soil type and hydrology (Zhang et al., 2019). Moreover, field surveys require substantial finance and labour to gather representative information from large spatial areas. On the other hand, remote sensing enables a synoptic observation of large spatial areas in a timely and cost-effective manner. Various space-borne sensors have been used to derive soil moisture-related information. Examples of widely used sensors to retrieve soil moisture globally include Soil Moisture Active Passive (SMAP), Advanced Microwave Scanning Radiometer (AMSR), Advanced Scatterometer (ASCAT), Soil Moisture and Ocean Salinity (SMOS) and Sentinel-1. Sabaghy et al. (2018) provide an overview of the remote sensing systems and their inherent data interpretation principles for characterizing soil moisture.
Remotely sensed soil moisture data have been used in different applications such as irrigation monitoring (e.g. Brocca et al., 2018), drought mapping (e.g. Liu et al., 2019), crop yield modelling (e.g. Gibon et al., 2018), agricultural risk monitoring (e.g. Champagne et al., 2015), vegetation status estimation (e.g. Boke-Olén et al., 2018) and runoff simulation (e.g. Liu et al., 2018), among others. The frequent data acquisition mode offered by remote sensing is ideal to characterize the time-series process of soil moisture. However, improving the accuracy of capturing the time-series process of soil moisture using remote sensing continues to be the focus of research (Fu et al., 2014; Konings et al., 2017; Loew & Schlenz, 2011; Zwieback et al., 2018). One of the improvements in soil moisture estimation is aligned with advances in spatial resolution of remotely sensed data (Babaeian et al., 2019; Korres et al., 2015). For example, Sadeghi et al. (2017) introduced the OPtical TRApezoid Model (OPTRAM) that exploits the relationship between the Normalized Difference Vegetation Index (NDVI) and shortwave infrared bands of remotely sensed data to estimate soil moisture. This model has two benefits: firstly, it captures the temporal dynamics of soil moisture through reliance on the physical evidence of NDVI, and secondly, it can provide a high level of spatial detail considering the prevalence of high-spatial-resolution optical data worldwide. However, such an approach remains limited to opportune atmospheric weather conditions that affect the quality of optical data. Therefore, remote sensing systems that use high-resolution radar sensors are the ideal alternative to optical remote sensing. In this regard, downscaled coarse-resolution data and advanced high-spatial-resolution images such as Sentinel-1 are becoming important sources of data for mapping soil moisture from the local to regional scale (Huang et al., 2020; Ma et al., 2020; Meyer et al., 2021; Vergopolan et al., 2021).
Another important approach to improve soil moisture estimation is by factoring in land cover types and climatic scenarios (Baik et al., 2019; Celik et al., 2022; Dente et al., 2013; González-Zamora et al., 2019; Wang et al., 2018; Zohaib et al., 2017). These scenarios influence the moisture-holding capacity of soil through processes such as evapotranspiration, energy flux, infiltration and percolation, among others (Seneviratne et al., 2010). For example, Feng and Liu (2015) showed the variation in the relationship between climatic variables (precipitation and temperature) and soil moisture for different land cover types in a humid environment in China. Their study, however, was limited to correlation analysis rather than a forecasting exercise. Wang et al. (2018) performed a time-series analysis of soil moisture grouped by land cover types and noted variations of overall trends with land cover types. They also revealed the differential influence of precipitation and temperature on soil moisture variation. Zohaib et al. (2017) studied the spatiotemporal pattern of root zone soil moisture (RZSM) stratified by major climatic zones in East Asia and reported RZSM trends being dependent on climatic zone. They also showed the temporal correlation of RZSM with climatic variables, including precipitation, skin (radiometric) temperature and actual evapotranspiration in each climatic zone.
While soil moisture estimation has improved significantly with remote sensing techniques, variation in uncertainty along land cover and climate gradients remains one area of focus for improvement (Dorigo et al., 2010; Kim et al., 2015; Pierdicca et al., 2015). In addition, the complex interplay of landscape factors controlling soil moisture characteristics (Di et al., 2019; Zwieback et al., 2018) suggests the need to explore soil moisture dynamics in different climatic and land cover setups. The above studies imply the need to investigate the time-series patterns of soil moisture in different environmental regions of the world, considering variations in climatic and landscape characteristics. In earlier work, Legates et al. (2010) highlighted the complexity of soil moisture driven by the spatial variations in biogeographical and climatic factors and suggested more research efforts to improve soil moisture monitoring. One of the suggestions refers to the need for improving modelling performances in regional-level assessments. This recommendation, though it applies to a variety of surface models, serves well to address the challenges of soil moisture estimation. The second recommendation encourages the development of soil moisture forecasting capacity. This study, therefore, aims to forecast soil moisture amounts in sub-Saharan Africa at broad land cover and agroecological scales using monthly time-series data spanning nearly four decades. The forecasting exercise in this study uses data aggregated at broad management units (agroecological and land cover zones). This approach indeed conceals localized variations; however, it is helpful as a quick indicator of regional-scale soil moisture patterns (Sehler et al., 2019; Seneviratne et al., 2010) that, in turn, aid in the prediction of other derivatives such as primary productivity and drought occurrences (McNally et al., 2016; Milly et al., 2008).

AEZs and land cover types of sub-Saharan Africa (SSA)
An agroecological zone (AEZ) represents the agricultural potential of an area using its climatic characteristics and available natural resources (Fischer et al., 2000; Snapp, 2017). Agro-ecology incorporates the socio-economic values of land and the ecosystem services it offers to humans and the environment. Agroecological zones can be delineated at different spatial scales ranging from global to localized levels (Fischer et al., 2000). This study used the AEZs created specifically for SSA (IFPRI, 2015). Assessing soil moisture across AEZs is relevant since the zonation is determined by the growth period for crops, climate and altitude (Dudal, 1980; FAO, 1978). The climate and soil conditions, including the amount of soil moisture available for crop growth, fundamentally define the growth period. According to the latest zonation (IFPRI, 2015), nine broad AEZs are represented in SSA (Figure 1a). These include Arid, Semi-Arid, Tropical Cool Arid, Tropical Cool Semi-Arid, Sub-Tropical, Humid, Sub-Humid, Tropical Cool Humid and Tropical Cool Sub-Humid. We used the land cover map of SSA produced as part of the Global Land Cover 2000 mapping project (Figure 1b). The map was created using multi-source remote sensing data, including SPOT VEGETATION, Japan Earth Resources Satellite (JERS-1) radar, European Remote Sensing (ERS) radar, Defence Meteorological Satellite Program (DMSP), Operational Linescan System (OLS) and digital elevation model (Bartholomé & Belward, 2005; Mayaux et al., 2003). Like the AEZ map, the land cover map had a spatial resolution of 1 km. Although this resolution is acknowledged as too coarse for Africa, where mixed land cover is common at localized scales, it provides a good indication of generic classes at broad spatial scales such as the one used in the present study. The map classifies SSA into six generic land cover types, five of which were used in the present study: agriculture, bare soil, forests, grasslands and woodlands/shrublands. The sixth land cover type representing water bodies was excluded from the analysis since it does not represent surface soil.

Soil moisture data
Soil moisture data covering the period November 1978 – December 2019 at a spatial resolution of ~25 km were obtained from the European Space Agency – Climate Change Initiative (ESA CCI, http://www.esa-soilmoisture-cci.org) and represented the latest version (version 04.2) in a series of releases (Dorigo et al., 2017; Gruber et al., 2017, 2019). The soil moisture coverage was derived by merging active and passive C-band microwave sensors at the global scale (Dorigo et al., 2015; Kidd and Haas, 2017). The data provide volumetric soil moisture for the top 2 cm of surface soil (Kidd and Haas, 2017). Validation studies have shown that the ESA CCI soil moisture data agree well with in situ observations (e.g. Dorigo et al., 2015; González-Zamora et al., 2019; McNally et al., 2016). Furthermore, studies have reported the product's superiority over other remote-sensing-based soil moisture estimation methods (e.g. Jing et al., 2018; Qiu et al., 2016; Zeng et al., 2015). González-Zamora et al. (2019) also showed the accuracy of the data across different environmental conditions in Spain. Although retrieval algorithms, sensor transitions and instrumental inconsistency may cause uncertainty (Su et al., 2016), the error level in soil moisture does not exceed the acceptable limit of 0.003 m³ m⁻³ yr⁻¹ (Global Climate Observing System, 2011). In this study, we compared the data with in situ observations taken at 22 sites collected from the COSMOS, PBO H2O, AMMA-CATCH, CARBOAFRICA and DAHRA networks (https://ismn.geo.tuwien.ac.at/en/networks/) located along the central (northern part of SSA) and southern African region (Dorigo et al., 2011; Dorigo, Wagner et al., 2011). The comparison showed a strong correlation (Pearson's correlation, r ≥ 0.61) across sites, with r > 0.75 for 15 sites.
Individual daily soil moisture coverages were spatially incomplete due to orbital differences of the multiple satellites that acquired the data (Dorigo et al., 2015). As a result, the monthly soil moisture at each pixel location was computed as the average over the days of that month for which records existed. We used the first month (November 1978) as a reference with which the pixels of all the other months (until December 2019) needed to coincide spatially. In enforcing this matching, missing values were imputed for each month by using a combination of Bayesian kriging and regression available in ArcGIS Pro (ESRI® ArcGIS Pro, version 9.2). Generically, spatial interpolation estimates values at unknown locations using values from known locations by assuming that features closer to each other are more similar than those farther apart. Unlike other interpolation methods, kriging provides a measure of estimation uncertainty and uses the spatial correlation of the data (as opposed to input from the analyst) to assign weights to observations (Cressie, 1990; Goovaerts, 2019). We specifically chose Bayesian kriging over the other variants of the interpolator as it uses a large number of simulations to optimize the kriging model parameters without the need for manual adjustment (Krivoruchko & Gribov, 2019). For the regression component, we used precipitation data obtained from WorldClim for the 1978–1980 period (Fick & Hijmans, 2017) and the Climate Hazards Infrared Precipitation with Stations (CHIRPS) for the 1981–2019 period (Funk et al., 2015). The combination of kriging (spatial autocorrelation) and regression that used precipitation as a predictor was preferred to limit the sole influence of each approach. Subsequently, the monthly soil moisture data were analyzed at the pixel level to maintain localized information in a modelling exercise that included training and testing using holdout data. Validation of the imputation at known locations showed an R² of 0.99 for each month. The resultant pixel-level model outputs were spatially averaged per AEZ and land cover type to answer the study's objective. Figure 2 provides a summary of the methodology followed in the study.
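The two averaging steps described above (a monthly value per pixel from gappy daily retrievals, then a spatial mean per zone) can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical and are not part of the study's ArcGIS Pro or R workflow; pixels with no valid day in a month are simply flagged here, whereas the study imputed them by kriging and regression.

```python
from statistics import mean

def monthly_mean(daily_values):
    """Average one pixel's daily retrievals for a month, skipping days without data (None)."""
    valid = [v for v in daily_values if v is not None]
    # None flags a pixel-month with no retrievals, which would need imputation.
    return mean(valid) if valid else None

def zone_means(pixel_values, pixel_zones):
    """Average pixel-level values within each zone (e.g. an AEZ or land cover class)."""
    groups = {}
    for value, zone in zip(pixel_values, pixel_zones):
        if value is not None:
            groups.setdefault(zone, []).append(value)
    return {zone: mean(values) for zone, values in groups.items()}

# Hypothetical daily retrievals (m3 m-3) for one pixel; None marks days without coverage.
daily = [0.12, None, 0.14, None, 0.16]
print(monthly_mean(daily))

# Hypothetical pixel-level values and the zone each pixel belongs to.
print(zone_means([0.10, 0.20, 0.30, None], ["Arid", "Arid", "Humid", "Humid"]))
```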

Methods
We explored three statistical methods to forecast soil moisture. These were the Seasonal Random Walk, Seasonal Autoregressive Integrated Moving Average (SARIMA) and Exponential Smoothing models. Prior to implementing the models, we tested the data for stationarity, a condition in which the mean, variance and autocorrelation are constant throughout the series. Subsequently, we divided the dataset into two parts: training and testing sets. The training set contained the first 70% of the time series (November 1978 – August 2007) and the remaining 30% (September 2007 – December 2019) was set aside for testing the three models. Details of the modelling exercises are given in the following sub-sections.
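As a quick check of the split described above, the following Python sketch enumerates the months from November 1978 to December 2019 and cuts the sequence chronologically at 70%. The helper names are hypothetical; only the dates and the 70/30 proportion come from the text.

```python
def month_sequence(start_year, start_month, end_year, end_month):
    """List (year, month) pairs from the start month to the end month, inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def chronological_split(items, train_fraction=0.7):
    """Split a time-ordered list without shuffling: earliest part trains, latest part tests."""
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]

months = month_sequence(1978, 11, 2019, 12)
train, test = chronological_split(months)
print(len(months), len(train), len(test))
print(train[-1], test[0])
```

The split is kept chronological rather than random so that the test period lies strictly after the training period, as a forecasting evaluation requires.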

Random walk model
The Random Walk represents the most straightforward and basic approach to forecasting a time-series process on which the other two methods and many others are built (Fildes & Kourentzes, 2011; Green et al., 2009). It found prominence after applications in statistical physics, such as Brownian motion (Kac, 1947; Weiss, 1983), and has since been widely used in zoology (Bearup & Petrovskii, 2015), ecology (Ahmed et al., 2021), climate (Yan & Wu, 2010) and hydrology (Yang & Liang, 2020), among others. Essentially, simple Random Walk modelling traces the dynamics of a process by assuming each step taken at a given time to be unbiased and unrelated to the previous event (Lin & Segel, 1974). This assumption renders the approach largely independent of the temporal resolution of the time-series data (Bearup & Petrovskii, 2015). The model's simplicity, coupled with its favorable accuracies, makes it popular in different fields. We applied a simple Random Walk in the present study to see how efficiently the model can capture monthly soil moisture dynamics. As such, the exploration indicates whether the dynamics reflect random fluctuations at each time step. The Random Walk model can be described as follows:

$$y_t = y_{t-1} + w_t \qquad (1)$$

where $y_t$ is soil moisture at time t, $y_{t-1}$ is soil moisture at time t-1 and $w_t$ is a Gaussian white noise series with mean zero and variance $\sigma^2$. A seasonal random walk is a counterpart to a random walk in the periodic domain. Typically, it is a non-stationary process and can be described by

$$y_t = \Phi_s\, y_{t-s} + w_t \qquad (2)$$

with $\Phi_s = 1$. It is seasonally integrated with order one since its seasonal difference is the white noise process $w_t$. In this study, s = 12 since we are analyzing a monthly seasonal time series. Combining the nonseasonal random walk model (Equation 1) and the seasonal random walk model (Equation 2) results in the equation:

$$y_t = y_{t-1} + y_{t-12} - y_{t-13} + w_t \qquad (3)$$

where $y_{t-12}$ is the value of the soil moisture from the same season in the previous year, and $y_{t-13}$ is the value of the soil moisture from the previous season in the previous year. For example, for the monthly soil moisture data, the seasonal effect for October 2018 would be the seasonal effect for October 2017, plus some random mean-zero white noise process. This approach works well in our case as it accounts for the natural variation that happens over a 12-month period when using monthly soil moisture data. The seasonal random walk model was fitted to the training dataset using the "forecast" package (Hyndman et al., 2021) in R 4.1.2 (R Core Team, 2021).
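To make the combined model concrete, the sketch below computes its one-step forecast, assuming the combined equation takes the standard form in which the forecast for month t is the previous month's value plus the change observed between the same two months a year earlier, i.e. y(t-1) + y(t-12) - y(t-13). This is an illustrative Python re-implementation on a synthetic series, not the study's R code; the function names and the series values are hypothetical.

```python
def seasonal_random_walk_forecast(history, season=12):
    """One-step forecast y_hat_t = y_{t-1} + y_{t-s} - y_{t-s-1} from past values only."""
    if len(history) < season + 1:
        raise ValueError("need at least season + 1 past values")
    return history[-1] + history[-season] - history[-season - 1]

def rolling_one_step_forecasts(series, season=12):
    """Forecast every value that has season + 1 predecessors, using only earlier data."""
    return [seasonal_random_walk_forecast(series[:t], season)
            for t in range(season + 1, len(series))]

# Synthetic monthly series (hypothetical values): linear drift plus a fixed 12-month cycle.
cycle = [0.00, 0.01, 0.03, 0.05, 0.04, 0.02, 0.00, -0.02, -0.04, -0.05, -0.03, -0.01]
series = [0.15 + 0.0005 * t + cycle[t % 12] for t in range(48)]

forecasts = rolling_one_step_forecasts(series)
errors = [f - y for f, y in zip(forecasts, series[13:])]
print(max(abs(e) for e in errors))
```

On a noise-free series with a linear drift and a perfectly repeating annual cycle the forecast errors vanish up to floating-point rounding, which is why the model is a natural baseline for strongly seasonal data.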

SARIMA model
SARIMA is a time-series model that considers the random nature and time dependence of variables under study. Research has shown the suitability of this model for time-series analysis of climate data, including soil moisture that is influenced by season-dependent patterns (e.g. Li et al., 2003). We applied the SARIMA model in the current study to capture the potential effect of climate seasonality on soil moisture.
As stated in Shumway and Stoffer (2016), a time series $\{y_t\}$, for t = 1, ..., n, that follows a SARIMA model can be expressed as follows:

$$\Phi_P(B^s)\,\phi(B)\,(1 - B^s)^D (1 - B)^d\, y_t = \Theta_Q(B^s)\,\theta(B)\, w_t \qquad (4)$$

where

$$\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}$$

is the seasonal autoregressive polynomial and

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$$

is the non-seasonal autoregressive polynomial. B is the backshift operator, while s represents the number of observations per season. The operator B represents a one-period lag in a time series. That is, given a time series $y_t$, the backshift operator B applied to $y_t$ yields $y_{t-1}$ (expressed as $B y_t = y_{t-1}$). This enables us to investigate how changes in a variable at one point impact its values in the future. This principle can be extended to increase the lag time, such as $B^2 y_t = y_{t-2}$ for two time lags. As in the case of Random Walk modelling, we used s = 12 in this research. d and D are the differencing terms for the non-seasonal and seasonal orders, respectively. Similarly,

$$\Theta_Q(B^s) = 1 + \Theta_1 B^s + \Theta_2 B^{2s} + \dots + \Theta_Q B^{Qs}$$

is the seasonal moving average polynomial and

$$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q$$

is the non-seasonal moving average polynomial. The model in Equation 4 can also be expressed using a standard ARIMA model with a seasonal component added to it as an ARIMA$(p, d, q) \times (P, D, Q)_s$ model, where p and P are the autoregressive terms for the non-seasonal and seasonal orders, respectively. Similarly, q and Q represent the moving average terms for the non-seasonal and seasonal orders, respectively.
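The role of the backshift operator and of the differencing orders d and D can be illustrated with a short Python sketch. It applies the seasonal difference (1 - B^12) and then the regular difference (1 - B) to a synthetic monthly series; the function name and the series values are hypothetical, and the example covers only the differencing step, not the estimation of the autoregressive and moving average terms.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): return y_t - y_{t-lag} for every t that has a lagged value."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic monthly series (hypothetical values): linear drift plus a fixed 12-month cycle.
cycle = [0.00, 0.01, 0.03, 0.05, 0.04, 0.02, 0.00, -0.02, -0.04, -0.05, -0.03, -0.01]
series = [0.15 + 0.0005 * t + cycle[t % 12] for t in range(48)]

seasonal_diff = difference(series, lag=12)   # (1 - B^12) y_t removes the annual cycle (D = 1)
both_diff = difference(seasonal_diff, lag=1)  # (1 - B)(1 - B^12) y_t also removes the drift (d = 1)

print(len(series), len(seasonal_diff), len(both_diff))
print(max(abs(v) for v in both_diff))
```

Each differencing pass shortens the series by its lag, so 48 monthly values leave 36 after seasonal differencing and 35 after both passes. For this noise-free series the doubly differenced values are zero up to rounding, which is the sense in which differencing renders a trending, seasonal series stationary.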
We used the Hyndman-Khandakar algorithm to fit the best SARIMA model for the training sample set (Hyndman & Khandakar, 2008; Hyndman et al., 2021). The algorithm was applied in R 4.1.2 (R Core Team, 2021) using the "auto.arima" function to search for the optimal p, d, q, P, D and Q values (Hyndman et al., 2021), and the corrected Akaike Information Criterion (AICc) (Hurvich & Tsai, 1989) was used to help choose the most suitable model. The algorithm uses maximum likelihood estimation (MLE), Kwiatkowski–Phillips–Schmidt–Shin (KPSS) stationarity testing (Kwiatkowski et al., 1992) and AICc to determine the best SARIMA model. The following formula is used to calculate the AICc:

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{AIC} = -2\ln(\hat{L}) + 2k \qquad (5)$$

where $\hat{L}$ is the maximized likelihood of the fitted model, k is the number of parameters, and n represents the sample size.
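The small-sample correction can be made tangible with a minimal Python sketch, assuming the usual definition AICc = AIC + 2k(k+1)/(n-k-1) with AIC = -2 ln(L) + 2k. The candidate model labels, log-likelihoods and parameter counts below are hypothetical numbers chosen for illustration; they are not results from the study, and the selection loop is only a stand-in for the stepwise search performed by "auto.arima".

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC for a model with k parameters fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidates: label -> (maximized log-likelihood, number of parameters).
candidates = {
    "ARIMA(1,0,0)(1,1,0)[12]": (-100.0, 3),
    "ARIMA(2,0,1)(1,1,1)[12]": (-99.5, 6),
}
n_obs = 50  # hypothetical sample size

scores = {label: aicc(loglik, k, n_obs) for label, (loglik, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

In this made-up case the larger model improves the log-likelihood only slightly, so the penalty on its extra parameters leaves the simpler model with the lower AICc.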

Exponential smoothing model
Unlike simple moving average time-series modelling, Exponential Smoothing apportions differential weighting to past observations when forecasting the future (Hyndman & Athanasopoulos, 2018). Specifically, the most recent observation carries the greatest weight, and the weight decreases exponentially for older observations. This approach can be used to forecast climatic variables that often follow a temporal sequence of events (e.g. Diodato et al., 2019; Papacharalampous et al., 2018). Therefore, it is justified to hypothesize that the amount of soil moisture at a given time reflects the amount of the most recent past, while the similarity decreases to the point of insignificance with time. A time series $\{y_t\}$ (for t = 1, ..., n) can be decomposed into trend, seasonal and irregular (or noise) components as follows:

$$y_t = \mu_t + \gamma_t + \varepsilon_t \qquad (6)$$

where $\mu_t$, $\gamma_t$ and $\varepsilon_t$ are the trend, seasonal and irregular components of the observed time series, respectively. For a model defined as $y_t = \mu_t + \varepsilon_t$, $\mu_t$ can be expressed as a weighted average of $y_t$ and $\mu_{t-1}$:

$$\mu_t = \alpha y_t + (1 - \alpha)\,\mu_{t-1} \qquad (7)$$

where α is the weight (smoothing constant). Equation 7 is a simple (or single) Exponential Smoothing model commonly used to forecast a series when there is no trend. In the presence of a trend in a series, Equation 7 can be modified as follows:

$$\mu_t = \alpha y_t + (1 - \alpha)(\mu_{t-1} + T_{t-1}) \qquad (8)$$

where the second update for the trend $T_t$ can be defined as

$$T_t = \beta(\mu_t - \mu_{t-1}) + (1 - \beta)\,T_{t-1} \qquad (9)$$

Equations 8 and 9 together are known as the Holt-Winters model (Mills, 2019). We can incorporate seasonality further in the series by updating the Holt-Winters framework as follows:

$$\mu_t = \alpha(y_t - \gamma_{t-s}) + (1 - \alpha)(\mu_{t-1} + T_{t-1}) \qquad (10)$$

where the seasonal updating equation, with seasonal smoothing constant γ (distinct from the seasonal component $\gamma_t$), is defined as

$$\gamma_t = \gamma(y_t - \mu_t) + (1 - \gamma)\,\gamma_{t-s} \qquad (11)$$

The Holt-Winters exponential smoothing model was fitted to the training dataset using the "forecast" package (Hyndman et al., 2021) in R 4.1.2 (R Core Team, 2021). As in the SARIMA modelling, the exponential smoothing model uses MLE to estimate the values of the smoothing parameters (such as α, β and γ).
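The recursive updates of additive Holt-Winters smoothing can be sketched in a few lines of Python, assuming the standard additive form: the level is updated from the deseasonalized observation, the trend from the change in level, and the seasonal term from the detrended observation. The initialization (level set to the mean of the first season, zero initial trend, seasonal terms set to first-season deviations) and the fixed smoothing constants are assumptions made for illustration; the study estimated the parameters by maximum likelihood in R.

```python
def holt_winters_additive(series, alpha, beta, gamma, season=12):
    """Return one-step-ahead forecasts for series[season:] from additive Holt-Winters updates."""
    if len(series) <= season:
        raise ValueError("need more than one full season of data")
    # Assumed initialization: mean of the first season, no trend, first-season deviations.
    level = sum(series[:season]) / season
    trend = 0.0
    seasonal = [value - level for value in series[:season]]
    forecasts = []
    for t in range(season, len(series)):
        # Forecast for time t uses only information available at t - 1.
        forecasts.append(level + trend + seasonal[t - season])
        previous_level = level
        level = alpha * (series[t] - seasonal[t - season]) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal.append(gamma * (series[t] - level) + (1 - gamma) * seasonal[t - season])
    return forecasts

# Synthetic monthly series (hypothetical values): a perfectly repeating annual cycle.
cycle = [0.00, 0.01, 0.03, 0.05, 0.04, 0.02, 0.00, -0.02, -0.04, -0.05, -0.03, -0.01]
series = [0.15 + cycle[t % 12] for t in range(48)]

forecasts = holt_winters_additive(series, alpha=0.3, beta=0.1, gamma=0.2)
errors = [f - y for f, y in zip(forecasts, series[12:])]
print(max(abs(e) for e in errors))
```

For a series that repeats exactly every 12 months the forecasts reproduce the observations up to rounding regardless of the smoothing constants, because none of the three components ever needs to be revised.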

Accuracy assessment
The present study developed the models to forecast soil moisture using the training dataset (November 1978 – August 2007). Rolling window estimation was applied to read the data over time in all the modelling approaches (Seasonal Random Walk, SARIMA and Exponential Smoothing). The rolling window was preferred to the fixed window method since it accommodates parameter instability expected to occur with time (Balcilar et al., 2014; Clark & McCracken, 2009). The trained models were subsequently applied to a 30% holdout validation dataset (September 2007 – December 2019). Since the validation dataset does not participate in model development, it provides an unbiased indication of model performances. We, therefore, used the validation data to evaluate the qualities of the models using the unbiased root mean square error (ubRMSE) that is widely used in time-series forecasting efforts. The ubRMSE is recommended for assessing soil moisture evolution since it corrects for extreme biases in the mean or amplitude of estimations that are otherwise unaccounted for in RMSE (Entekhabi et al., 2010; Rao et al., 2022). The ubRMSE can therefore be computed from RMSE as in Equation 12.

$$\mathrm{ubRMSE} = \sqrt{\mathrm{RMSE}^2 - \mathrm{bias}^2} \qquad (12)$$

for

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y'_t - y_t\right)^2} \quad \text{and} \quad \mathrm{bias} = \frac{1}{n}\sum_{t=1}^{n}\left(y'_t - y_t\right)$$

where $y'_t$ and $y_t$ are the forecasted and observed values, respectively, and n is the number of forecasted months. We also used the correlation coefficient, r, to compare the degree of similarity between predicted and observed soil moisture data (Dorigo et al., 2015; Entekhabi et al., 2010). The standard correlation statistic is affected by severe soil moisture fluctuations caused by seasonal variations (Liu et al., 2012). We, therefore, added a correlation analysis comparing anomalies of observed and forecasted values, as commonly used in time-series analysis of soil moisture (Dorigo et al., 2010, 2015; Entekhabi et al., 2010). An anomaly interprets soil moisture at a given time against one representing a longer-term pattern. Therefore, it is computed as the difference between the soil moisture at a given month and the average soil moisture computed from a moving window of months that includes that month (Dorigo et al., 2010, 2015). We computed the temporal average using a three-month moving window, assuming a four-seasons-per-year scenario.
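Both accuracy measures can be sketched in Python, assuming ubRMSE is obtained by removing the squared mean bias from the squared RMSE and that the three-month anomaly window is centred on the month of interest (the text does not state the window alignment, so the centring is an assumption). The function names and sample values are hypothetical.

```python
from math import sqrt

def rmse(forecast, observed):
    """Root mean square error between paired forecasts and observations."""
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

def bias(forecast, observed):
    """Mean difference between forecasts and observations."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def ubrmse(forecast, observed):
    """Unbiased RMSE: RMSE with the contribution of the mean bias removed."""
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return sqrt(max(rmse(forecast, observed) ** 2 - bias(forecast, observed) ** 2, 0.0))

def moving_window_anomaly(series, window=3):
    """Value minus the mean of a centred moving window (assumed alignment)."""
    half = window // 2
    return [series[t] - sum(series[t - half:t + half + 1]) / window
            for t in range(half, len(series) - half)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical observed values (m3 m-3) and a forecast with a constant +0.02 offset.
observed = [0.10, 0.14, 0.12, 0.18, 0.16, 0.20]
forecast = [value + 0.02 for value in observed]

print(rmse(forecast, observed), ubrmse(forecast, observed))
print(pearson_r(moving_window_anomaly(forecast), moving_window_anomaly(observed)))
```

A forecast that is offset by a constant has a non-zero RMSE but a ubRMSE of essentially zero and an anomaly correlation of essentially one, which shows why these two measures isolate errors in temporal dynamics from errors in the mean.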

Forecast accuracy by agroecological zone
Pixel-level ubRMSE of monthly soil moisture forecasting for September 2007 – December 2019 (148 months) is shown in Figure 3. Generally, the three forecasting approaches agreed in terms of accuracy. The pixel-level values were then averaged by AEZ (Figure 4). The ubRMSE never exceeded 0.05 m³ m⁻³, with the humid zones showing forecasting errors comparable to those of the drier aridic zones. A relative comparison of the modelling approaches showed that the SARIMA and Exponential Smoothing models yielded lower estimation errors than the Seasonal Random Walk approach in each AEZ.
Figure 5 presents the time-series soil moisture patterns for the nine agroecological zones. In general, the three models were able to forecast soil moisture well in most agroecological zones. Agroecological zones with relatively weakly forecast temporal dynamics include Sub-Tropical, Tropical Cool Humid and, to a lesser extent, Arid regions. It is important to note that these specific agroecological zones show erratic variability of soil moisture dynamics that fluctuate at different (short- and long-term) intervals. Although the three approaches were quite successful in forecasting locally variable fluctuations, the Seasonal Random Walk method tended to over- and under-estimate in the extreme fluctuation cases in most of the AEZs.
The aggregation of the pixel-level anomaly correlations by AEZ is summarized in Figure 6.More than 50% of the samples within each of the humid AEZs showed good agreement (r > 0.5) between the anomalies of the observed and modelled soil moisture using the three forecasting approaches (Figure 6a).A similar finding was obtained for Arid and Semi-Arid zones within the aridic AEZs.The mean correlations between anomalies revealed the comparability of accuracies across AEZs, with r exceeding 0.6 using the three modelling approaches (Figure 6b).

Forecast accuracy by land cover type
The overall soil moisture forecasting errors per land cover type were low, with ubRMSE never exceeding 0.04 m³ m⁻³ (Figure 7). Similar to the observation in the estimation per AEZ, the SARIMA and Exponential Smoothing models proved to be the best models in nearly all land cover types, though marginally. The time-series patterns shown in Figure 8 reveal detailed estimation accuracies obtained using the three models per land cover type. For example, forecasting was highly accurate in agricultural, forest, woodland and bare soil land cover types for nearly the entire time period of September 2007 to December 2019, irrespective of the model type. In contrast, the weakest soil moisture forecasting accuracy was observed for grassland. It is also important to note the similarities of forecasting among the three approaches, such as those observed in Bare soil, Forest and Woodland during the last few months of the forecast period (Figure 8). A comparison of the soil moisture estimations and observed values based on anomalies is shown in Figure 9. The number of samples that had r > 0.5 between observed and estimated soil moisture exceeded 50% of all samples within each land cover type except bare soil, in which only one modelling approach (Exponential Smoothing) yielded such a result (Figure 9a). The mean correlation between observed and estimated soil moisture anomalies within each land cover zone exceeded 0.7 (Figure 9b), indicating the stability of estimation across the season. This stability is similar to what was observed in the results comparing AEZs.

Forecasting accuracies at aggregated AEZs and land cover types
Monitoring and forecasting soil moisture are essential for hydrological, agricultural and environmental management efforts. The need for soil moisture prediction cannot be overemphasized for SSA, where subsistence agriculture predominates (Bjornlund et al., 2020) and biodiversity loss threatens environmental sustainability (Mohammed, 2020; Scheren et al., 2021; UNEP-WCMC, 2016). In the present study, we modelled monthly soil moisture dynamics of SSA using ~40 years of remotely sensed data at the pixel level and aggregated the results at regional scales (agroecological and land cover zones). Soil moisture estimation models that were calibrated using the rolling window forecasting method and implemented on independent test datasets showed the reliability of generalizing soil moisture in broad agroecological zones. The forecasting can be considered accurate in most humid zones, while it was still encouraging for the aridic zones (Figure 5). The high prediction accuracies in humid zones can be attributed to relatively dense vegetation that improves moisture-holding capacity (Feng, 2016). It is essential to highlight moderate underestimations for high soil moisture values in humid zones, such as between September 2016 and December 2019 (Figure 5). Estimation uncertainty occurs when soil moisture amounts are high, such as after dry (or relatively low moisture) periods. Almagbile et al. (2019) found similar results in which lower estimation accuracies were observed in the wet than in the drier season within a humid study area. It is also important to recognise the potential limitation of all the models in handling major fluctuations that deviate significantly from long-term trends, although this argument must be explored with better statistical evidence. Still, these underpredictions (as opposed to overpredictions) can be considered conservative estimations that prepare communities, particularly farmers, to be prudent in their expectations of soil moisture content. Such underpredictions are not expected in the drier (aridic) zones; this can be linked to the fact that the soil moisture in those regions is generally low. Therefore, any underprediction observed for certain times in the Tropical Cool, Arid and Sub-Tropical AEZs can be considered as deviations from long-term patterns. This can be ascertained by the fact that the underpredictions for the two specific AEZs were noted when soil moisture was above 0.15 m³ m⁻³.
Soil moisture information by land cover type should be interpreted carefully since the land cover type can obscure soil except in bare and agricultural lands during the off-season. However, knowledge of soil moisture per vegetation type remains relevant for regional or localized spatial scales (Feng, 2016). In the present study, soil moisture prediction was achieved at high accuracies within all land cover types and forecasting approaches (Figures 7 and 8). The predictions were successful in forecasting even in highly variable scenarios observed in Bare soil, where soil moisture amount and range were the lowest (0.07–0.11 m³ m⁻³). The high forecasting accuracy in agricultural areas can be linked to the relatively predictable seasonal patterns or the management efforts by farmers to maintain cropping patterns, such as routine irrigation applications, minimum tillage practices and mulching, that likely guarantee consistent soil moisture over the years (Fér et al., 2020; García-Moreno et al., 2013). The high prediction accuracies in Forest and Woodland land cover types can be explained by the long-term sustenance of vegetation structure (as opposed to short-term harvesting of crops), which tends to stabilize soil moisture dynamics (Feng & Liu, 2015). It is important to note the low forecasting accuracy in forest and, to a lesser extent, in woodland zones, especially at high values. This could be attributed to the higher complexity of vegetation forms than the land cover designation indicates. The LCCS classification used to produce the map classifies an area as forest cover if the tree cover exceeds 15%, while the rest can be covered by woody savanna and savanna covers (Bartholomé & Belward, 2005). Similarly, woodlands consist of woody plants interspersed by savannas (Bartholomé & Belward, 2005). Such heterogeneous land cover composition can complicate the soil moisture dynamics, making the forecasting accuracy relatively uncertain, especially when the soil moisture values are high. The high uncertainty of soil moisture estimation in forests compared to other land cover types such as croplands was also shown by Celik et al. (2022). The high level of soil moisture variability over time in grassland was the most difficult to reproduce (Figure 8), although the mean ubRMSE was still encouraging at less than 0.04 m³ m⁻³ and comparable to the other land cover types (Figure 8). Forecast uncertainty in grassland can be explained by high moisture variability within localized spatial scales (von Randow et al., 2012), and thus the broad-scale averaging used in our study could have contributed to the weaker accuracy in this land cover type. In addition, the low soil moisture holding/retention capacity of grasslands (Chen et al., 2019) can add to a rapid temporal variability that may not have been captured well by the monthly scale analysis in our study.
Furthermore, the impacts of extreme weather episodes were neglected in our study. Herbst et al. (2008) suggested that the impact of such episodes on soil moisture could be more significant than that of land cover type. This impact can most certainly be realized in the face of the increasing variability and unpredictability of the climate the world is experiencing (Green et al., 2019). In contrast, the impact of land cover on soil moisture outweighs that of weather factors under climatic conditions that fall within long-term patterns (Feng & Liu, 2015).

Effect of land cover dynamics on forecasting accuracy
Although not a stated objective, assessing the effect of land cover dynamics on soil moisture forecasting accuracy is worthwhile. By focusing on the period that was used for testing the forecasting models (2007–2019), we used Global Land Cover data updated in 2009 (Figure 10a) and 2014 (Figure 10b) to assess the forecasting accuracies. The maps clearly show shifts in land cover types with, for example, an increase in agriculture in 2014 compared with the year 2009. Similarly, a substantial increase in woodland/shrubland is observed, replacing grassland, particularly in the southern part of the continent. The land cover shifts are reflected in soil moisture variations between the two years, with significant differences observed in agriculture and woodland/shrubland land cover types (Figure 10c). In the agriculture zone, soil moisture remained stable across 2014 compared to 2009, while the opposite can be noted in the woodland zone. However, the forecast accuracies were high in all land cover types for both years despite the soil moisture dynamics. This is attributed to the fact that the forecasting exercise was applied at pixel level and thus the accuracy is determined at that level. Subsequent spatial aggregations are therefore unlikely to influence the forecast accuracies irrespective of potential land cover dynamism.
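The pixel-level argument above can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the synthetic arrays and the class codes are all hypothetical, and ubRMSE is taken here in its usual form (the RMSE after removing the mean of each series). The point is only that the per-pixel scores are computed once, and a change of land cover map merely regroups them.

```python
import numpy as np

def ubrmse(observed, forecast):
    """Unbiased RMSE along the time axis (axis 0): RMSE after removing each series' mean."""
    obs_anom = observed - observed.mean(axis=0)
    fc_anom = forecast - forecast.mean(axis=0)
    return np.sqrt(((fc_anom - obs_anom) ** 2).mean(axis=0))

def mean_by_class(pixel_scores, class_map):
    """Average the pixel-level scores within each land cover class."""
    return {int(c): float(pixel_scores[class_map == c].mean())
            for c in np.unique(class_map)}

# Synthetic data (assumption): 24 months x 6 pixels of soil moisture in m3 m-3.
rng = np.random.default_rng(0)
observed = 0.20 + 0.05 * rng.standard_normal((24, 6))
forecast = observed + 0.02 * rng.standard_normal((24, 6))

# One score per pixel, independent of any land cover map.
pixel_scores = ubrmse(observed, forecast)

# Hypothetical class codes for two map editions; one pixel is reclassified.
lc_2009 = np.array([1, 1, 1, 2, 2, 2])
lc_2014 = np.array([1, 1, 2, 2, 2, 2])

acc_2009 = mean_by_class(pixel_scores, lc_2009)
acc_2014 = mean_by_class(pixel_scores, lc_2014)
```

Under these assumptions the class means in `acc_2009` and `acc_2014` can differ only through the pixels that changed class, while `pixel_scores` itself is identical for both maps.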

Way forward for improved soil moisture forecasting
One of the limitations of our study is the use of coarse spatial scale remote sensing data (25 km × 25 km) and the further aggregation to regional spatial zones. Such data obscure local variations that would have been captured using higher spatial resolution data. The spatial aggregation is relevant to provide a generic outlook on land cover types and ecological zones, although such zones are temporally and spatially dynamic (FAO, 2017). One practical solution in this regard is to use more detailed land cover classes and AEZs than those used in the present study. The problem concerning the spatial resolution of remote sensing data can be addressed by combining low-resolution data with high-resolution data (Babaeian et al., 2019; Korres et al., 2015; Sadeghi et al., 2017; Song et al., 2019). For example, Song et al. (2019) integrated the 25 km soil moisture data derived from Advanced Microwave Scanning Radiometer-Earth Observing System-2 (AMSR-2) with NDVI and land surface temperature obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) to produce 1 km scale soil moisture data. A similar data fusion was adopted by Zhao et al. (2021) to downscale the ESA CCI soil moisture data to 1 km scale. Such an approach can be expanded by including landscape features as covariates to downscale soil moisture data (Gaur & Mohanty, 2019; Zhao et al., 2018). For instance, Zhao et al. (2018) combined MODIS-derived optical data, vegetation indices, temperature and topographic data to downscale the 36 km Soil Moisture Active and Passive (SMAP) soil moisture data to 1 km scale. Radar is another remote sensing system that allows soil moisture mapping in all weather conditions. The increasing availability of high-spatial resolution radar data provides an ideal opportunity for spatially detailed mapping purposes (Cui et al., 2022; Zeyliger et al., 2022). High-spatial resolution forecasting is a key missing factor in the SSA context, where agricultural fields in particular are small and fragmented (Giller et al., 2021).
We attempted to show the insensitivity of soil moisture forecasting to land cover dynamics if the estimation is implemented at the pixel level (Figure 10c). However, land cover type and its dynamics remain crucial variables that influence soil moisture content and, as such, can improve forecasting accuracy. Although certain landscape variables such as topography are considered time-invariant, their interactions with other non-constant biophysical and climatic characteristics (Bogena et al., 2010; Qiu et al., 2001) make them valid temporal inputs in soil moisture forecasting. Trends in land cover dynamics can therefore be considered in the forecasting process; however, this should be handled cautiously as the expected land cover change may not occur.
Adding biophysical variables such as topography and vegetation characteristics can improve soil moisture forecasting accuracy (Chen et al., 2019; Fathololoumi et al., 2021; Lawrence & Hornberger, 2007; Liang et al., 2017; Qiu et al., 2001; Vereecken et al., 2008; Yu et al., 2018). For example, Liang et al. (2017) compared soil moisture content across a topographic gradient and found a higher moisture content in a gully followed by a valley-head slope and side slope. Their study also showed the importance of factoring in soil porosity, hydraulic conductivity, vegetation density and basal area in quantifying soil moisture content. Similarly, Fathololoumi et al. (2021) demonstrated the response of soil moisture contents to environmental and biophysical variations at different spatial scales, indicating the value of including such characteristics in soil moisture forecasting. It is also beneficial to incorporate resource utilization cultures in the forecasting process. For example, cropping practices such as crop type and tillage methods impact soil moisture retention capability by changing soil structures and, subsequently, capillaries that serve as a conduit for water movement (Gabriel et al., 2017; Hatfield et al., 2001). While including several influencing factors improves the accuracy of soil moisture forecasting, the complex interactions of those factors must be well understood to make robust conclusions and interpretations (Chen et al., 2019; Fathololoumi et al., 2021).

Conclusion
The present study applied three modelling methods (SARIMA, Exponential Smoothing and Seasonal Random Walk) to estimate monthly soil moisture data using only antecedent soil moisture records as inputs. Although there were slight variations, with SARIMA and Exponential Smoothing performing better in terms of error, the methods generally agreed in forecast accuracies. This indicates that the choice of model may not significantly influence soil moisture forecasting, provided that well-established methods such as those used in this study are applied. All the models were able to forecast soil moisture using the test dataset (September 2007 – December 2019) at high accuracy levels at broad spatial scales, namely AEZ and land cover types of SSA. Relatively modest variations in estimation accuracy can be expected across AEZs, with the more humid and arid zones having the best accuracy compared to the other zones. Soil moisture forecasting by land cover type showed the best accuracies in Agriculture, Forest, Woodland and Bare soil, while estimations in most grassland areas were still encouraging. Season-normalized anomalies between observed and forecasted soil moisture were strongly correlated (Figures 6 and 9), confirming the forecasting accuracies even with soil moisture fluctuations expected with seasonal changes. The broad spatial scale averaging adopted in the study provides a synoptic guide of soil moisture scenarios that can be used as inputs for other applications such as crop yield forecasting and drought prediction.
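As an illustration of the simplest of the three methods and of the anomaly comparison summarised above, the following is a minimal sketch, not the implementation used in the study. It assumes a rolling form of the Seasonal Random Walk (each test month is predicted by the observation 12 months earlier), a 70/30 chronological split, and anomalies computed by subtracting the mean of each position in the 12-month cycle; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def seasonal_random_walk(series, start, period=12):
    """Forecast every value from index `start` onward with the value one period earlier."""
    series = np.asarray(series, dtype=float)
    if start < period:
        raise ValueError("need at least one full period before `start`")
    return series[start - period:len(series) - period]

def seasonal_anomalies(values, period=12):
    """Subtract the mean of each position in the seasonal cycle from the values."""
    values = np.asarray(values, dtype=float)
    anomalies = np.empty_like(values)
    for m in range(period):
        anomalies[m::period] = values[m::period] - values[m::period].mean()
    return anomalies

# Synthetic monthly series (assumption): seasonal cycle plus noise, 20 years long.
rng = np.random.default_rng(42)
months = np.arange(240)
series = 0.20 + 0.05 * np.sin(2 * np.pi * months / 12) + 0.01 * rng.standard_normal(240)

split = int(0.7 * len(series))           # chronological 70/30 split
observed = series[split:]                # test period
forecast = seasonal_random_walk(series, split)

# Pearson correlation between season-adjusted anomalies of observations and forecasts.
r = np.corrcoef(seasonal_anomalies(observed), seasonal_anomalies(forecast))[0, 1]
```

The synthetic series is only there to make the sketch executable; the value of `r` it yields says nothing about the correlations reported in Figures 6 and 9.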

Figure 1. Agroecological zones (a) and land cover types (b) of SSA.

Figure 2. Flow diagram showing the methods followed in the study to forecast monthly soil moisture in SSA.

Figure 3. Pixel-level unbiased RMSE (ubRMSE) of monthly soil moisture forecasting using three modelling approaches implemented on test datasets (September 2007 – December 2019) of SSA.

Figure 4. Unbiased RMSE (ubRMSE) of monthly soil moisture forecasting using three modelling approaches implemented on test datasets (September 2007 – December 2019) per agroecological zone of SSA.

Figure 5. Monthly soil moisture forecasted using three modelling approaches implemented on test datasets (September 2007 – December 2019) per AEZ of SSA.

Figure 6. Correlations of anomalies between observed and forecast soil moisture using three modelling approaches implemented on test datasets (September 2007 – December 2019) per AEZ of SSA. (a) the proportion of samples (pixels) with a correlation greater than 0.5 within each AEZ; (b) mean correlation.

Figure 7. Unbiased RMSE (ubRMSE) of monthly soil moisture forecasting using three modelling approaches implemented on test datasets (September 2007 – December 2019) per land cover type of SSA.

Figure 8. Monthly soil moisture forecasted using three modelling approaches implemented on test datasets (September 2007 – December 2019) per land cover type of SSA.

Figure 9. Correlations of anomalies between observed and forecast soil moisture using three modelling approaches implemented on test datasets (September 2007 – December 2019) per land cover type of SSA. (a) proportion of samples (pixels) with a correlation greater than 0.5 within each land cover type; (b) mean correlation.

Figure 10. Comparison of forecasting accuracy in a changing land cover scenario. (a) land cover map of 2009 derived from global land cover (GLC); (b) land cover map of 2014 derived from global land cover (GLC); (c) forecasting accuracy based on 2009 and 2014 land cover maps.