Assessing the Performance of Satellite-Based Models for Crop Yield Estimation in the Canadian Prairies

Abstract Timely monitoring of crop production using a remote sensing-based approach offers promise toward enhancing food security. Statistical models developed using satellite data typically employ a single vegetation index from a single sensor for yield estimation. With the increasing availability of satellite datasets, there is now an opportunity to investigate the potential of available vegetation indices from different sensors in estimating yields. The key objective of this study was to develop a best-performing satellite-based yield model for the Canadian Prairies for wheat, barley, and canola, trained using municipality-level data from 2009 to 2019. We tested the statistical performance of models built using (a) indices from different sensors (Landsat and Sentinel-2), (b) indices sensitive to different yield properties, and (c) single versus multiple vegetation indices. Results showed Landsat-NDWI as the best performing single-index across all indices and sensors for each crop. Sentinel-2-EVI performed best for wheat and canola and Sentinel-2-SR for barley; but these models were built using only 4 years of data. We found that best-performing single-index models recorded similar predictive accuracy as multi-index models during model validation. The results from this work suggest that satellite-based yield estimation can be improved by selecting the right index related to different crop properties.


Introduction
Timely, consistent, and reliable estimation of crop yields are key to making strategic decisions related to food security planning from local to global scales, especially under changing growing conditions.Crop yield is influenced by interactions between multiple factors such as weather, soil health, and management choices (e.g., irrigation, fertilizers).The influence of different factors that drive yields are reflected in crop growth conditions that can be directly measured using remotely sensed satellite imagery.Satellite data provides near-real-time intelligence on vegetation health status suitable for continuously monitoring and assessing crop yields over space and time (Sishodia et al. 2020).Historical and near-real time crop production insights are critical inputs for decision making by different agricultural stakeholders (farmers, crop insurers, policy makers, commodity traders) (Weiss et al. 2020;Karthikeyan et al. 2020).
Different agricultural stakeholders (e.g., producers, commodity traders, government, crop insurers) require different pieces of intelligence on agricultural production toward achieving diverse targets (Table 1).For example, producers want to optimize production quantity using historical and near-real time data, crop insurers aim to design satellite-based insurance contracts, and government or policy makers are interested in strengthening food security using satellite-based food monitoring systems.Accordingly, the role of satellite vegetation indices in providing context-specific yield insights in addition to accurate yield estimates is critical.Currently, most agricultural stakeholders rely on models using traditional single indices such as NDVI and EVI which may not be well suited to monitoring nutrientstressed crops (e.g., in nutrient-limited regions) or water-stressed crops (e.g., in irrigation-limited regions).The careful selection and inclusion of additional information by combining multiple, different indices measuring different crop properties could be useful for supporting existing satellite-based agricultural intelligence capabilities.
Most studies using a satellite-based approach for estimating crop yields employ a single vegetation index for deriving yield estimates (Liu et al. 2020;Bolton and Friedl 2013;Wall et al. 2008;Burke and Lobell 2017;Kouadio et al. 2014).Traditional vegetation indices derived from popular satellite data sources (AVHRR 1 , MODIS 2 , Landsat) include normalized difference vegetation index (NDVI) and two-band enhanced vegetation index (EVI2), both tracking the biomass component of yield.MODISbased EVI-2 has been found to perform better than NDVI at crop yield prediction previously (Liu et al. 2020).Some studies have found normalized water difference index (NDWI), which is sensitive to crop water content, to be an effective predictor of yields in more arid regions (Bolton and Friedl 2013).Other studies note that a green chlorophyll vegetation index (GCVI), tracking variation in chlorophyll concentration is better suited than NDVI or EVI to capture the effects of nutrient stress on resulting yields (Burke and Lobell 2017;Azzari et al. 2017).GCVI is used because it has been suggested to approximate LAI well (Nguy-Robertson et al. 2012).Another study using combined Landsat and Sentinel-2 data reports that best results were obtained by a model with NIR and red spectral bands (Skakun et al. 2019).Marszalek et al. (2022) highlights the importance of indices tracking crop water and nitrogen content derived from red-edge and nearinfrared bands of the Sentinel-2 sensor for yield prediction at the field level.
Few previous studies have evaluated the potential of combining multiple trait-specific vegetation indices for estimating yields or compared different satellite data sources and their value in estimating yields for multiple crops.Most previous studies (with a few exceptions; Zhao et al. (2020), Ulfa et al. (2022), Cavalaris et al. (2021) have focused on studying one crop type (region-specific major crops) using a single satellite data source.Zhao et al. (2020) investigated the potential of combining biomass and chlorophyll indices for wheat yield prediction at the field level using Sentinel-2 data.They found a small increase in model predictive skill when different indices at their respective peaks were combined.They recommend further testing of a multi-index modeling approach in different regions with diverse crop management practices.Similar results were also noted by Ulfa et al. (2022) in modeling field-level wheat yields using multi-indices derived from Landsat data.Currently, a multi-index crop yield estimation approach remains mostly constrained to the field level and one crop type with limited data records.
In this study, we investigate the statistical performance of satellite-based crop yield models including single vegetation indices and multiple vegetation indices tracking different yield traits derived using different satellite datasets.We train and cross-validate all models using observed historical yield data (2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) at the municipality level for three key crops (wheat, barley, canola) in the Canadian Prairies.The main objective of this study was to develop a best performing satellitebased crop yield model.Our study had the following three sub-objectives: (1) compare how well different Reference studies are listed below.

Study area: Canadian Prairies
The Canadian Prairies are an economically important region of Canada with field crop production accounting for almost half of agricultural operations in the region.Spring wheat, barley, and canola are the largest commodities grown by planted acreage.The study area covers the three provinces of Alberta, Saskatchewan, and Manitoba (Figure 1).Regional agro-climatic conditions vary year-to-year resulting in differing growing environments.Precipitation is highly variable (300-500 mm per annum), both spatially and temporally, leading to moisture deficit and crop health stress in irrigation-limited areas.The growing season for annual spring crops begins with seeding in April and concludes with harvesting between early to late fall.Majority of the agricultural land in Manitoba is located within the moist black and dark gray soil zones.Agricultural regions of Alberta and Saskatchewan are located across differing soil zones experiencing varying interactions between soil and climate resulting in diverse crop growth environments.
Crop yield and satellite data (MODIS, Landsat, Sentinel-2) There resolution with daily and near-daily (every 3 d) revisit cycles respectively, compared to Landsat (7&8) imagery which is available every 16 d.As a previous study in the Prairies found seasonal peak EVI2 to perform better at crop yield estimation than other metrics derived using MODIS data (Liu et al. 2020), we extracted a single-index, EVI2, using MODIS data.With Landsat and Sentinel-2 data, we retrieved seven and ten trait-specific vegetation indices respectively (Table 2).Annual cropspecific masks freely available from the Canadian annual crop inventory at 30 m spatial resolution from 2011 onward (crop masks are available at 56 m spatial resolution for years 2009 and 2010) were applied prior to calculation of crop-specific indices.The crop-specific masks captured both spatial and temporal cropping conditions of only cropping areas within each municipality targeting the specificity of data used for model training toward improving the quality of estimates.A sensor-specific cloud mask with a threshold of < 10% cloud coverage was applied to all satellite imagery to obtain cloud-free pixels prior to calculation of indices.All suitable Landsat images were selected to derive maximum monthly composites toward the calculation of indices.Due to a 16-day repeat cycle associated with the Landsat sensor, there were missing values for some municipalities.MODIS-and Sentinel-2-based indices were extracted as maximum weekly composites.A previous study noted that yield estimation using Sentinel-2 data extracted at a weekly resolution had a superior performance than monthly resolution data as a weekly temporal scale better reflected the crop growth status (Marszalek et al. 2022).With each image collection, vegetation indices were extracted as a spatial mean at the municipality level.All satellite datasets were accessed and processed in Google Earth Engine (GEE) due to the platform's capability in handling large volumes of satellite imagery (Gorelick et al. 2017).A selected summary of spatial distribution of satellite vegetation indices at municipality level used as model inputs is shown in Figure 1.

Crop yield estimation
We use a statistical linear regression modeling approach common in the literature to estimate yields from satellite indices (Johnson et al. 2021;Basso and Liu 2019).Two yield models, single-index and multiindex, for each crop were developed.Crop-specific biomass index, water index and chlorophyll index were combined in a multi-index modeling framework (Equation 1) to test for additional improvements over a single-index model (Equation 2).In each multiindex model, we only included a single-index representing a specific yield trait as a potential predictor to avoid multi-collinearity issues.We also conducted additional testing for multicollinearity between different indices using Variance Inflation Factor (VIF) before including respective indices in multi-index models and found that combinations of indices that had lower VIF values also had lower Akaike information criterion (AIC) values.As we wanted to conduct a comprehensive analysis using a multi-index approach, we tested all possible combinations of trait-specific indices with Landsat data (12 trait-specific index combinations) and Sentinel-2 data (24 trait-specific index combinations) to identify the best combination of indices for yield estimation (Table A7).Models using a single index (Equation 2) for all Landsat-and Sentinel-2-derived indices individually were compared to multi-index models.
Where I ¼ {B, W, C} In the regression Equations ( 1) and (2), y represents crop yield, I represents satellite vegetation index, M represents random municipality effects and T represents a linear year trend.BI, WI and CI represent a biomass, water and chlorophyll index respectively.
As imagery from different satellite sources are available at different times during the year, we developed both multi-index and single-index models at different time steps specific to the sensor to gauge which model performed best.With Landsat data, we developed monthly models within each growing season using monthly peak indices.Weekly models were developed for each of twenty-two weeks within each growing season using Sentinel-2 weekly index composites.With MODIS data, we developed a single model using seasonal peak vegetation index as peak stage was found to be the optimal time to obtain yield estimates by a previous study in the Prairies (Liu et al. 2020).We also investigated the performance of multi-stage 5 against single stage models and found that a single stage model performed better than a multi-stage model for each crop at yield estimation (Figure A6).All models include random municipality effects 6 to account for local variations at the municipality level due to differing management practices (e.g., irrigation, tillage, cultivars etc.) and linear year trend to account for the effects of technological developments on crop yield.

Model selection and assessments
We used the Akaike information criterion (AIC) as the model selection metric to identify the best-performing multi-index model combination over all possible index combinations considered.AIC as a model selection metric weight both model performance and model complexity.For models with the lowest AIC (AICc for small samples) value (Tables A4-A6), we performed leave-one-year-out cross-validation (LOOCV) to estimate the prediction error of chosen models at the municipality level.LOOCV as an out-of-sample validation test allows to evaluate the generalizability of a model in different years with different climatic conditions (Liu et al. 2020;Kouadio et al. 2014).Finally, we compared the prediction accuracy of chosen models using the following metrics: root-mean-square-error (RMSE) measured in tons per hectare, relative rootmean-square-error (RRMSE, %) to mean observed yield, and mean absolute error (MAE) measured in tons per hectare.We used an additional metric, Wasserstein distances (WD), to further evaluate model performance.WD is a distance metric that quantifies the difference or distance between two data distributions (Muskulus and Verduyn-Lunel 2011).The value of WD is equal to 0 if distributions are identical with the value of WD becoming larger as distributions become different.WD was calculated between the data distributions of observed and estimated yields.

Results
Overall performance of multi-index and singleindex models at the municipality level The best performing single-index and multi-index models using Landsat data during 2009-2019 performed quite similarly on cross-validation metrics (RRMSE range: 14.6-19.7%and 14.3-19.4% for single-index and multi-index models respectively across crops) (Table 3).We observed low WD values for best performing crop-specific single-index and multi-index Landsat models showing a good agreement between observed yields and estimated yields.For wheat, the multi-index combination EVI þ SR þ NDWI showed only a marginal increase in predictive power over NDWI, the best single-index predictor for wheat.A single-index model using NDWI performed slightly better than a multi-index model for barley and canola.
For barley and canola, the prediction error of singleindex NDWI model (multi-index model) is 18.69% (19.42%) and 16.45% (16.49%) respectively.The best performing multi-index combination for barley and canola were found to be EVI þ SR þ GI and EVI þ SR þ NDWI respectively.Overall, we note that a single index tracking water content in crops, NDWI, performed better than all other trait-specific Landsat indices tested for each crop.For example, compared to traditional indices such as NDVI derived using Landsat data, NDWI showed a slightly better performance.The prediction error (RRMSE (%)) of NDWI (NDVI) single-index model is 14.57% (15.14%) for wheat, 18.69% (19.69%) for barley and 16.45% Root-mean-square-error (RMSE) in t/ha, RMSE relative to mean observed yield (RRMSE (%)), mean absolute error (MAE) in t/ha, coefficient of determination (R 2 ), and Wasserstein Distance (WD).
(17.65%) for canola.For modeling cases tested using Landsat data, satellite predictors derived at the height of July (mid-season coinciding with peak of crop growth stage) was observed to perform better at yield estimation than predictors derived during other periods (early-and late-season) within the growing season.Further we note that yield estimation using Landsat data demonstrated improvements over yield estimates derived using MODIS data for each crop.A1 and A2).For wheat and barley, Landsat predictors derived at the height of July remained the optimal time period for obtaining yield estimates.Seasonal peak MODIS-EVI2 (RRMSE: 13.52%) had the lowest prediction error for canola (Table A3).With Sentinel-2 data, we observed that multi-index models were unable to demonstrate huge improvements over single-index models for each crop (Table 4).A single-index model using EVI (RRMSE: 18.17%) tracking biomass content had the lowest prediction error compared to a multiindex (EVI þ TCARI þ GI) model (RRMSE: 21.35%) for wheat.For barley and canola, a multi-index model performed marginally better than a single-index model.The best performing multi-index combination for barley and canola were found to be EVI þ SR þ NDWI and EVI þ TCARI þ GI respectively.The prediction error (RRMSE (%)) of the best performing multi-index (single-index) model is 18.57% (18.88%) for barley and 15.19% (15.31%) for canola.For models using Sentinel-2 data, indices derived toward end of June (week 25) for wheat and barley and at the beginning of August (week 32) for canola had lowest prediction errors.The cross-validation modeling results for the best performing single-index and multi-index models using Landsat and Sentinel-2 data during the 2016-19 period are presented in Table 4. Detailed results for 2016-19 period for wheat, barley and canola are presented in Tables A1-A3 respectively in the appendix.Figures 2 and 3 show the relationship between observed yield and yield estimated by the best performing single-index and multi-index models using Landsat and Sentinel-2 data and by a model using MODIS-EVI2 for periods 2009-2019 and 2016-2019 respectively.We observed an overall pattern of underestimation of high yields and overestimation of low yields using different satellitebased yield models.The spread of data points associated with the MODIS model for each crop is higher compared to Landsat and Sentinel-2 models during both modeling periods.

Interannual variation in estimated yields
The best performing multi-index and single-index models using Landsat and Sentinel-2 data and MODIS single-index model were used to estimate crop yields at the municipality level over the study period.Figure 4 shows the mean interannual variation in model-estimated yield against observed yield for each crop.Crop yields have increased over the 11-year period with some   Observed yield shown along the y-axis range from 0 to 6 t/ha for wheat (A-E), from 0 to 8 t/ha for barley (F-J) and from 0 to 4 t/ha for canola (K-O).
interannual variation.Yield models based on MODIS and Landsat data were able to capture the overall observed trend over 2009-2019.We observed overestimation of low yields and underestimation of high yields by all models.Landsat-based multi-index and singleindex models for canola demonstrated nearly identical estimation performance during the entire study period (Figure 4H).We noted some minor differences in estimation performance of both Landsat-based models for years 2013-2019 and 2009, 2015-2019 for wheat and barley respectively (Figures 4B and 4E).Interannual yield estimates varied by sensor-specific index predictors used to train models over the full period (Figure A1).Sentinel-2 multi-index and single-index model performance during the 2016-2019 period varied by crop type (Figure A2).A marginal difference in performance between both model types was seen for wheat and barley during this short time period (Figures 4C and 4F) whereas for canola, both models exhibited similar performance (Figure 4I).

Evaluation of model performance at municipality level
The comparison of model performance over the 2009-2019 period showed that results were quite similar when using a dataset including either the best set of Landsat-based indices or a MODIS index.(Figure 5).The ranges of RRMSE for wheat, barley and canola using Landsat (MODIS) data were À67% (À80%) to 56% (58%), À71% (À93%) to 87% (92%) and À61% (À68%) to 85% (85%) respectively.The overall spatial distribution of yields estimated using different satellite index predictors during periods, 2009 and 2019, for wheat, barley and canola is shown in Figures 6-8 respectively.The spatial distribution of estimated yields for each crop showed that model performances varied slightly across the agricultural municipalities by the index predictor(s) used for model training.The spatial distribution of model estimation error (underestimation, overestimation, no change) for wheat, barley and canola is presented in Figures A7-A9 respectively in the appendix.We note that models were able to capture the increase in yields observed between 2009 and 2019 in some municipalities.Overall, satellite-based yield models were able to satisfactorily reproduce spatial yield variations observed at the municipality level as documented by different yield reporting agencies which is highlighted by low WD values (Tables 3 and  4 and Tables A1-A3).WD values of best performing single-index and multi-index models using Landsat and Sentinel-2 data range between 0.11 À 0.33 and 0.10 À 0.46 respectively.

Discussion
This study investigated the potential of combining multiple, different trait-specific satellite-based vegetation indices over using single indices for crop yield estimation using different satellite datasets.Modeling results show that multi-index models based on Landsat and Sentinel-2 data were unable to demonstrate large improvements over models using sensorspecific single indices at yield prediction (Tables 3  and 4).Combining vegetation indices tracking different yield traits namely biomass, water content and chlorophyll content does not seem to provide additional information on crop health and it is likely that overlapping signals results in a lack of improvement in multi-index models.For example, Figure 9  In this study, single-index models showed prediction performance equivalent to the performance of Landsat-based single-index models again demonstrated better prediction performance than multiindex models for each crop during the 2016-19 period.Using Sentinel-2 indices, single-index model using EVI outperformed the multi-index model for wheat whereas prediction performance of single-index models and multi-index models was comparable for barley and canola.As EVI is less prone to saturation with dense crop biomass and better reflects crop health, Sentinel-2-EVI was similarly found to be an important predictor of crop yield in a previous study (Pejak et al. 2022;Li et al. 2022).
Modeling results show that a single-index modeling approach has the potential to perform well compared to a multi-index approach as long as a user employs an appropriate vegetation index suited to the particular satellite product in a yield estimation model.With a single-index approach, identifying the best crop-specific, region-specific and sensor-specific vegetation index is critical for obtaining high quality yield estimates.In contrast, a multi-index modeling approach is beneficial when a user wants to phase out the index identification and selection process.Using a multi-index approach, an user can let the model automatically select and use the index that best tracks a particular yield trait through a growing season (e.g., drought vs non-drought year) to produce estimates.Further testing of a multiindex modeling approach could provide higher utility at yield estimation in operational applications (e.g., forecasting yields, commodity arrival prediction) monitoring diverse agricultural regions.Users deploying a Landsatbased model in an operational environment might observe missing yield estimates in some agricultural regions due to less frequent (16-day) repeat cycle of the sensor and cloud cover.In the future, the application of a Sentinel-2-based model in such a scenario is promising as it has a better repeat cycle (near-daily, that is, every three days) than Landsat.A summary of the best performing crop-specific, satellite sensor-specific singleindex and multi-index combinations identified during the two periods tested in this study is presented in Tables 5 and 6 respectively.
Results further revealed the importance of some vegetation indices that could be useful toward improving yield estimates in the Prairie agricultural region.Landsat-based single-index, NDWI, which is sensitive to crop moisture content demonstrated predictive skill similar to multi-index models when tested during the full study period.This finding highlights the capability of NDWI in tracking plant water fluctuations and its resulting impacts on final crop yield.NDWI has been similarly found to be a more effective predictor of crop yields compared to indices such as NDVI and EVI in arid agricultural regions and in croplands with highly variable soil properties (e.g., salinity) (Bolton and Friedl 2013;Cavalaris et al. 2021).This suggests that the quality of existing regional yield forecast models could be improved by using NDWI instead of NDVI or EVI as commonly practiced.For example, the Canadian crop yield forecaster tool currently uses NDVI with climate data as inputs to forecast crop yields in agricultural regions of Canada (Chipanshi et al. 2015).Further exploration of predictors such as NDWI that better represent crop health status over the performance at prediction of wheat yields.Additionally, our findings on the sensor-specific optimal time period to source predictors, that is at the height of July with Landsat indices and during Juneto-August for crop-specific Sentinel-2 indices, could be useful toward developing within-season yield forecasting models.We further note that overall performance of Landsat-based indices were found to be better than MODIS-EVI2 during the 2009-19 period.Such an insight could be valuable for agricultural risk planning and management.Agricultural stakeholders such as crop insurers currently using MODIS-based data for applications such as designing index-based insurance could consider Landsat-based indices to identify any difference in value proposition for farmers when using different satellite data inputs for insurance design.In the near-recent modeling period (2016-19), MODIS-EVI2 had a better prediction performance for canola over all other indices.Such a finding indicates the importance of both better spatial resolution of Landsat and better temporal resolution of MODIS in tracking crop yields at large scales of aggregation.A previous study reported similar findings when MODIS-and Landsat-based indices were tested for monitoring crop yields in contrasting agricultural regions of the U.S. and India (Azzari et al. 2017).
Although crop yield models using Sentinel-2 indices were unable to demonstrate marked improvements over MODIS-and Landsat-based models at the coarser municipality level, many studies have highlighted the potential of Sentinel-2 data for yield prediction at finer spatial levels (e.g., field, farm, landscape) (Hunt et al. 2019;Segarra et al. 2022;Zhao et al. 2020;Cavalaris et al. 2021;Kayad et al. 2019).Our future work will look into evaluation of the best performing Sentinel-2 indices identified in the current study using high resolution field level crop yield data in the Canadian Prairies.For example, we found that green index (GI) outperformed NDWI in tracking plant water signals for wheat and canola in each multi-index combination tested using Sentinel-2 data.GI has been noted to be sensitive to irrigation status in a previous study (Deines et al. 2017) and further testing at the field level could help guide management decisions for yield optimization.
The spatial pattern of yields estimated using different models varied slightly for each crop across some agricultural regions of the Canadian Prairies.These results indicate that chosen satellite index/indices drive the location-specific performance of models requiring further testing and identification of location-specific best vegetation index for monitoring crop health at the sub-regional or field level in the Prairies.For example, Bolton and Friedl 2013 recommends a mixed model, NDWI in more arid regions (for its ability to detect irrigation) and EVI2 in other regions, for predicting maize yield in U.S. counties.
The results of this study showed that statistical yield models based solely on vegetation indices were unable to capture well the extreme training data points.Additionally, model performance was weak in some years shown by the upward and downward spikes in model residuals (Figures A1 and A2).Models highly underestimated crop yields in years 2009 and 2013 and overestimated yields in 2012.A study investigating climate variability impacts on crop production in the Canadian Prairies over the 2000-13 period found that drought conditions in 2009 and 2012 resulted in below average crop productivity, mostly in Alberta and Saskatchewan (Dong et al. 2016).Incorporating additional predictors related to weather and environmental data could help improve performance of models driven solely by vegetation indices in such scenarios.As pure vegetation indexbased methods are unable to track yield outcomes in atypical growth conditions (e.g., droughts, high temperatures), a study tested several input datasets (vegetation indices, weather data, environmental variables) and models (linear regression, polynomial regression and machine learning) for estimating yields (Sakamoto 2020).They found that a machine learning model, Random Forest, incorporating additional environmental variables provided accurate yield estimates during a drought year.Statistical models based on machine learning algorithms has advantages over a traditional regression model especially when some predictors are highly correlated (Jeong et al. 2016).Efforts are currently underway to integrate weather and environmental data in satellite-based models of crop yield.Introducing additional information representing variability of environmental conditions may improve the current performance of models in atypical production years.We plan to further evaluate the potential of state-of-the-art machine and deep learning algorithms in detecting spatiotemporal variation in yields over traditional regression models using high spatial resolution yield datasets.Such efforts will provide further exploration of predictors that improve representation of atypical impacts on final crop yield which could be useful for stakeholders such as crop insurers, commodity traders and policy analysts.

Conclusion
Timely and consistent monitoring of crop production using a satellite-based approach is one way to enhance food production and overall food security.A satellitebased approach is advantageous as it can be applied in data sparse regions to obtain crop yield estimates.This study assessed the performance of several vegetation indices derived from different satellite datasets within two modeling approaches for crop yield estimation in the Canadian Prairies.Both single-index and multi-index models demonstrated similar performance at yield estimation over a 11-year period (2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) for each crop (wheat, barley, canola).Single-index, NDWI, outperformed other Landsat-based indices for each crop during this period.Our analysis further showed that best performing crop-specific model using Landsat data produced yield estimates that were marginally better than models using MODIS-EVI2.Spatial examination of estimated yields revealed slightly varying location-specific performance of models trained using different index predictors.
Such findings indicate the need to identify the best vegetation index/indices within different agricultural subregions toward improving statistical-based yield estimates.Furthermore, introduction of additional weather and environmental variables within pure vegetation indexbased yield models may help improve prediction of interannual yield variability within a given administrative unit or in atypical production years (e.g., droughts, high temperatures).Overall, this study highlights the importance of exploring and testing different vegetation indices capturing different aspects of plant growth toward improving yield estimates and for further use in applications such as yield forecasting in agricultural regions under changing growing conditions.

Notes
1. AVHRR: Advanced Very High Resolution Radiometer 2. MODIS: Moderate Resolution Imaging Spectroradiometer 3. Saskatchewan crop yields: https://applications.saskatch ewan.ca/agrrmyields 4. Manitoba crop yields: https://www.masc.mb.ca/masc.nsf/mmpp_index.html 5.As including multi-step information from the same growing season introduces collinearity between predictors and influences sensitivity of model coefficients, we decided to eliminate multi-stage models from the linear regression modelling approach.6. Regression models with municipality as a fixed effect were also tested (results not shown).As models with fixed effects did not influence results significantly and due to flexibility of models including a random effect, we present and discuss results for latter models.
Table 1.A selection of use-case applications of satellite-based vegetation indices by different agricultural stakeholders.near-real time and HDA is historical data archive.
Figure 1.Spatial distribution of sensor-specific selected mean vegetation indices observed at the peak of the growing season at municipality level of the Canadian Prairies for wheat, barley and canola during the 2016-2019 period.Regions in white represent the absence of cropland within a given municipality.

Figure 2 .
Figure 2. Benchmarking of best performing single-index and multi-index models using Landsat (7&8) data against MODIS singleindex model for full time series (2009-2019) for wheat (Top panel), barley (Middle panel) and canola (Bottom panel).Observed yield shown along the y-axis range from 0 to 8 t/ha for wheat (A-C) and barley (D-F) and from 0 to 4 t/ha for canola (G-I).

Figure 3 .
Figure 3. Benchmarking of best performing single-index and multi-index models using Landsat (7&8) and Sentinel-2 data against MODIS single-index model for the time period 2016-2019 for wheat (Top panel), barley (Middle panel) and canola (Bottom panel).Observed yield shown along the y-axis range from 0 to 6 t/ha for wheat (A-E), from 0 to 8 t/ha for barley (F-J) and from 0 to 4 t/ha for canola (K-O).

Figure 4 .
Figure 4. Mean annual variation in estimated yield against observed yield of wheat (Top panel), barley (Middle panel) and canola (Bottom panel) using MODIS baseline model (A,D,G) and best performing Landsat (B,E,H) and Sentinel-2 (C,F,I) models.Crop yield shown along the y-axis range from 2 to 4 t/ha for wheat (A-C), from 3 to 5 t/ha for barley (D-F) and from 1 to 3 t/ha for canola (G-I).A simple yield average across all municipalities was calculated to get an annual mean yield value for inter-year comparison of model estimated yield to observed yield.
shows the relationship between best performing Landsatbased indices identified for multi-index model (EVI þ SR þ NDWI) for wheat.High correlations (> 0.75) are observed between biomass index (EVI) and chlorophyll index (SR) (Figure 9A) and between biomass index (EVI) and water index (NDWI) (Figure 9B) (Detailed correlation results for wheat, barley, and canola are shown in Figures A3-A5 respectively in the appendix).Additional research should consider the inclusion of raw band values of satellite images that are used to calculate indices as ratios of bands to investigate if yield estimation quality improves for a multi-index model trained with specific satellite bands.For example, one of the bands included in all chosen indices in the Landsat-based multi-index model for wheat, EVI þ SR þ NDWI, covers the near-infrared (NIR) spectrum.The NIR-based band tracks the chlorophyll aspect of yield quality and the importance of this band was previously reported(Skakun et al. 2019;Marszalek et al. 2022;Prey and Schmidhalter 2019).Marszalek et al. (2022) notes that Sentinel-2based raw bands covering the red and NIR spectrum achieved higher accuracies than the index-based approach at yield estimation.

Figure 5 .
Figure 5. Ranges of cross validation model prediction error (RMSE) relative to mean observed yield at the municipality level, 2009-2019 period, for best performing model using Landsat data (EVI þ SR þ NDWI for wheat, NDWI for barley and canola) and MODIS data (EVI2) for each crop.

Figure 6 .
Figure 6.Spatial distribution of observed yield and estimated yield of wheat using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multi-index combination using Landsat data for wheat is EVI þ SR þ NDWI.Regions in white represent absence of cropland within a given municipality.Regions in grey represent municipalities for which an model-specific yield estimate is unavailable.

Figure 7 .
Figure 7. Spatial distribution of observed yield and estimated yield of barley using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multi-index combination using Landsat data for wheat is EVI þ SR þ GI.Regions in white represent absence of cropland within a given municipality.Regions in grey represent municipalities for which an model-specific yield estimate is unavailable.

Figure 8 .
Figure 8. Spatial distribution of observed yield and estimated yield of canola using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multi-index combination using Landsat data for wheat is EVI þ SR þ NDWI.Regions in white represent absence of cropland within a given municipality.Regions in grey represent municipalities for which an model-specific yield estimate is unavailable.

Figure 9 .
Figure 9. Correlation between best performing Landsat-based vegetation indices used in a multi-index model for wheat (EVI þ SR þ NDWI).

Figure A1 .
Figure A1.Accuracy of yearly yield predictions by the best performing multi-index and single-index models using Landsat data against MODIS single-index model for 2009-2019 for wheat, barley and canola.Yield anomaly (%) is the model residual relative to mean observed yield.A simple average of yield anomaly across all municipalities was calculated to derive an annual mean anomaly value.

Figure A2 .
Figure A2.Accuracy of yearly yield predictions by the best performing multi-index and single-index models using Landsat and Sentinel-2 data against MODIS single-index model for 2016-2019 for wheat, barley and canola.Yield anomaly (%) is the model residual relative to mean observed yield.A simple average of yield anomaly across all municipalities was calculated to derive an annual mean anomaly value.

Figure A3 .
Figure A3.Correlations between different Landsat-based vegetation indices for wheat observed in July.Pairwise correlation coefficient and pairwise relationship between variables is shown in the upper triangle portion and lower triangle portion of the plot respectively.The curves on the diagonal show the density of respective variables where frequency count is reported along the y-axis and the variable value range is reported along the x-axis.

Figure A4 .
Figure A4.Correlations between different Landsat-based vegetation indices for barley observed in July.Pairwise correlation coefficient and pairwise relationship between variables is shown in the upper triangle portion and lower triangle portion of the plot respectively.The curves on the diagonal show the density of respective variables where frequency count is reported along the y-axis and the variable value range is reported along the x-axis.

Figure A5 .
Figure A5.Correlations between different Landsat-based vegetation indices for canola observed in July.Pairwise correlation coefficient and pairwise relationship between variables is shown in the upper triangle portion and lower triangle portion of the plot respectively.The curves on the diagonal show the density of respective variables where frequency count is reported along the y-axis and the variable value range is reported along the x-axis.

Figure A6 .
Figure A6.Yield estimation accuracy (RMSE) of single stage and multi-stage models using Sentinel-2 data for 2016-2019 for wheat, barley and canola.In the plots, x-axis show a week or combination of weeks using Sentinel-2 data.Single weeks are denoted by labels starting with '1'.The week number shown corresponds to a week in a calendar year.

Figure A8 .
Figure A8.Spatial distribution of observed yield of barley and estimation error of best-performing models using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multi-index combination using Landsat data for wheat is EVI þ SR þ GI.Regions in white represent the absence of cropland within a given municipality.Regions in grey represent municipalities for which model-specific yield estimation error is unavailable.

Figure A7 .
Figure A7.Spatial distribution of observed yield of wheat and estimation error of best-performing models using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multi-index combination using Landsat data for wheat is EVI þ SR þ NDWI.Regions in white represent the absence of cropland within a given municipality.Regions in grey represent municipalities for which model-specific yield estimation error is unavailable.

Figure A9 .
Figure A9.Spatial distribution of observed yield of canola and estimation error of best-performing models using different satellite index predictors for 2009 and 2019.Landsat-NDWI is the best performing single-index.Best performing multiindex combination using Landsat data for wheat is EVI þ SR þ NDWI.Regions in white represent the absence of cropland within a given municipality.Regions in grey represent municipalities for which model-specific yield estimation error is unavailable.

Table 2 .
Sensor-specific vegetation indices tracking different properties of crop yield.
M: MODIS-based model and L: Landsat-based model.SI: best performing single-index model and MI: best performing multi-index model.SI (L) model: NDWI for wheat, barley and canola.MI (L) model:

Table 4 .
Cross validation performance metrics of best performing single-index and multi-index during the time period corresponding to Sentinel-2 data availability (2016-2019).

Table A3 .
Cross validation performance metrics for canola of best performing multi-index models and single indices during the time period corresponding to Sentinel-2 data availability(2016)(2017)(2018)(2019).

Table A5 .
AICc values of Sentinel-2-based multi-index models for barley and wheat for 2016-2019.The best performing model is presented in bold.

Table A6 .
AICc values of Sentinel-2-based multi-index models for canola for 2016-2019.The best performing model is presented in bold.

Table A7 .
Sensor-specific multi-index combinations tested for modeling.