Phenological piecewise modelling is more conducive than whole-season modelling to winter wheat yield estimation based on remote sensing data

ABSTRACT Most of the existing remote sensing-based yield estimation methods adopt the mean or cumulative value of meteorological factors within the whole growing season, which may ignore the impact of adverse meteorological conditions on the growth of winter wheat in a certain phenological period. In this study, we distinguished the developmental progression of winter wheat as three phenological periods. In each phenological period, the vegetation indices and meteorological factors were optimized. Then the accuracy and spatiotemporal transferability of the phenological piecewise modelling was compared with that of the whole-season modelling based on four regression methods (i.e. multiple linear regression, artificial neural network, support vector regression and random forest). The results showed that the optimal combinations of variables for the whole-season modelling and the phenological piecewise modelling were different. Compared with the whole-season models, the R2 for the phenological piecewise models improved by 1.4% to 7.6%, the root mean square error (RMSE) decreased by 1.1% to 8.2% among four regression methods . In addition, compared with the whole-season models, the spatiotemporal transferability for the phenological piecewise models was generally better. The accuracies after spatiotemporal transfer for the phenological piecewise models were still higher than that for the whole-season models.


Introduction
The capacity of global food supply is facing increasing pressure due to the population growth, climate change, and ecological environment deterioration (Godfray et al., 2010;Rounsevell et al., 2005;Searchinger et al., 2008). Winter wheat is one of the most widely cultivated food crops in the world. Its growth and yield are closely related to global food security (Thenkabail et al., 2012). Therefore, timely and accurate estimation of the winter wheat yield in a large region is of great significance to guide agricultural production and improve the agricultural disaster response ability.
Remote sensing statistical models, crop growth models, and data assimilation models are the three commonly used methods to estimate crop yield . Crop growth models predict the crop yield by simulating the growth processes of crop from sowing to harvest (Challinor et al., 2004;Jones et al., 2003). It has a clear physiological mechanism and can be successfully applied at a field scale (Jego et al., 2012). However, the various model parameters of crop growth models are hard to obtain at a large scale and crops growth models are usually based on simplified growth processes, which may lead to some uncertainties . The data assimilation method can solve this problem to some extent by assimilating remote sensing data and crop growth models, but its calculation processes are complicated and the computational efficiency is not satisfactory at a large scale (Lee et al., 2010). Compared with the two methods above, the remote sensing statistical model is a type of empirical models. Although the established model may only perform well on specific crop cultivars or certain geographical regions (Doraiswamy et al., 2003;Fang et al., 2011), it is still commonly adopted to estimate the crop yield due to its advantages of few data requirements, fast computational efficiency, and applicable to a large scale (Kowalik et al., 2014;Liao et al., 2019;Nolasco et al., 2021). Since the vegetation indices calculated by remote sensing reflectance data can reflect the growth status of vegetation and has a high correlation with yield, the early developed statistical methods were based on the linear statistical regression models between the vegetation index and the actual yield (Mkhabela et al., 2011;Rasmussen, 1997;Siyal et al., 2015). After that, more and more studies notice the improvement effect of combining the remote sensing vegetation index with meteorological factors (such as temperature, precipitation and sunlight; Kamir et al., 2020;Prasad et al., 2006;Sharifi, 2020;Stepanov et al., 2020). In addition, the phenology variables such as start of season, heading date and mature date can reflect the duration of crop key growth periods, which control the accumulation of effective dry matter (Saseendran et al., 2000). In recent years, some studies improved the accuracy of crop yield statistical estimation models by introducing crop phenology variables as additional independent variables (Guo et al., 2021;Ji et al., 2021;Sun et al., 2020). Except for the abundance of model variables, the spatial resolution of remote sensing data has been also improved with the development of satellite sensors. The main sensor products used in crop yield estimation include: SPOT-VEGETATION composite products (1000 meters, 10 days; Kowalik et al., 2014), MODIS composite products (250/500 meters, 8/16 days;Funk & Budde, 2009;Leroux et al., 2019), Landsat products (30 meters, 16 days; Liao et al., 2019;Siyal et al., 2015), and Sentinel-2 products (10 meters, 3-5 days; He & Mostovoy, 2019;Hunt et al., 2019). Different satellite sensors may have different characteristics (e.g. with a higher spatial resolution but a lower temporal resolution). It is necessary to select appropriate remote sensing products according to the study area and research objectives.
Although the current remote sensing statistical yield estimation methods are relatively mature, there is still room for improvement. Firstly, in terms of the vegetation index, some studies have compared the performance of commonly used vegetation indices on crop yield estimation, such as the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI; Johnson, 2016;Nolasco et al., 2021;Tuvdendorj et al., 2019). However, some new vegetation indices with great potential in yield estimation are rarely evaluated, such as the near-infrared index (NIRv), which is closely related to solarinduced chlorophyll fluorescence (Badgley et al., 2017), and the normalized difference phenology index (NDPI), which is extremely insensitive to soil and contains vegetation water information Wang et al., 2017). In addition, different vegetation indices may show different applicability in different phenophases due to their different emphasis. Secondly, when introducing meteorological factors to improve the yield estimation model, most studies considered the meteorological factors within the whole growing season. Even in the study that integrating meteorology and phenology, the phenology was usually taken as new independent variables into the model (Guo et al., 2021;Ji et al., 2021). Since crops have different requirements for hydrothermal conditions in different growth phases, using the mean/ accumulated value of meteorological factors within the whole growing season may weaken/ignore the impact of adverse meteorological conditions on crop growth in a certain phase. Therefore, the phenological piecewise modelling, i.e. separating the growing season into different phenological periods and selecting appropriate input variables in each phenological period to construct the yield estimation model, may be more conducive to the improvement of yield estimation.
The main objectives of this study are: (1) to compare the optimized variables between the whole-season yield modelling and the phenological piecewise yield modelling; (2) to compare the estimation accuracy between the whole-season yield models and the phenological piecewise yield models; and (3) to compare the spatiotemporal transferability (i.e. the ability of models to predict the winter wheat yield in other years or regions whose samples did not participate in the model training) between the whole-season models and the phenological piecewise models.

Study area
The Henan Province is located in central China (110.4°E -116.6°E, 31.4°N -36.4°N), covering approximately 167,000 square kilometres (Figure 1a). Under the temperate monsoon climate, the annual averaged temperature in Henan Province in 2018 was 15.5°C, and the annual precipitation was 748.9 mm (Henan Statistics Bureau, 2019). Henan Province is one of the main grain-producing provinces in China, and the main crops are winter wheat and summer maize ( Figure 1b). According to the China Rural Statistical Yearbook (National Bureau of Statistics of China, 2019), in 2018, the sown area of winter wheat in the province was 5.74 million ha, and the total yield of winter wheat in the province was 36.03 million tons, ranking the first among all provinces in China. Henan Province mainly consists of plains, and also has some mountainous regions ( Figure 1c). In the mountainous regions in the west of Henan Province, the fields of winter wheat are relatively small and scattered ( Figure 1d). In contrast, the fields of winter wheat cultivated in the plains in the east of Henan Province are large and concentrated (Figure 1e and 1f). Based on the FROM-GLC10 land cover product with 10-m spatial resolution (Gong et al., 2019), the statistical results of the area of each cropland patch in Henan Province showed that more than 75% of the cropland patches are over 25 hectares (i.e. 500 m × 500 m) ( Figure S1).
where ρ NIR , ρ R , ρ B , ρ SWIR represent the surface reflectance of the near-infrared band (841-876 nm), red band (620-670 nm), blue band (459-479 nm) and short-wave infrared band (1628-1652 nm), respectively. To reduce the effects of clouds and atmospheric conditions on vegetation index time-series data, the change-weight filtering method (Zhu et al., 2012), which can effectively reduce noise and preserve crop phenology characteristics, was used to reconstruct the vegetation index time-series data with good quality.

Meteorological data
The meteorological data set CFSv2 (Climate Forecast System Version 2) was used in this study, which was released by the National Centers for Environmental Prediction (NCEP; Saha et al., 2014). The CFSv2 reanalysis product is produced by the Climate Forecast System, which is a fully coupled model representing the interaction between the Earth's atmosphere, oceans, land, and sea ice. The temporal resolution is 6 h, and the spatial resolution is 0.2°. The data set contains a variety of meteorological data worldwide since 1979 and was commonly used in related researches (Carbin et al., 2016;Pegion & Alexander, 2013). Considering the growth and development of crops are mainly affected by temperature, moisture and sunlight conditions, the daily mean temperature 2 m above the ground, total precipitation, mean specific humidity, mean soil moisture content 5 cm depth below the surface layer and total short-wave solar radiation were used and computed on the Google Earth Engine platform in this study.

Phenology data
The phenology data used in this study were the spatial distributions of the start dates of four winter wheat phenological events in Henan province: the start date of regreening (BBCH 21), the start date of jointing (BBCH 31), the start date of heading (BBCH 51) and the start date of milking maturity (BBCH 73). The regreening date is the date when the leaves of winter wheat begin to turn green after overwintering. It is the date that marks the beginning of winter wheat growth, that is, the start of the growing season. The jointing date is the date when the stem of winter wheat begins to elongate fast, which indicates that winter wheat has entered a period of rapid growth. The heading date is the date when the fully developed spike of winter wheat extends out of the top leaf with the elongation of the stem, which marks the transition from vegetative growth to reproductive growth. The milking maturity date is the date when the grain filling of winter wheat is complete, which indicates that winter wheat has ended its growth and entered the mature stage. According to the phenology calendar released by the Ministry of Agriculture and Rural Affairs of the People's Republic of China (http://www.moa.gov.cn/), in Henan Province, these four dates usually occur in early March, early April, late April, and late May. These four dates are important time nodes during the growing season of winter wheat, and the demands of winter wheat for hydrothermal conditions vary greatly during these periods (Table S1).
The start date of regreening was extracted by the dynamic threshold method (White et al., 1997) based on NDVI time-series data, and the threshold was set as 20%, which was verified to be suitable for extracting the regreening date of winter wheat (RMSE is about 10 days; Gan et al., 2020). The start date of heading was provided by Huang et al. (2020), which was extracted by the accumulated temperature method with a high accuracy (RMSE is about 5 days). The start dates of jointing and milking maturity were generated by adding a fixed duration (number of days) to the start date of regreening and the start date of heading, respectively. The fixed duration was obtained from local phenological observation data (see details in Appendix A).

Winter wheat map and yield statistics data
Since this paper focuses on the comparison of wholeseason yield modelling and phenological piecewise yield modelling, we pursue a high user accuracy to ensure the classified winter wheat pixels are as much as possible the true winter wheat pixels. The winter wheat map used in this study (Figure 1) was the intersection of the winter wheat maps in 2017 and 2018, which were both produced by a double threshold method (Huang et al., 2020; see details in Supplementary Materials). The spatial resolution is 250 m. The overall accuracies of winter wheat maps in 2017 and 2018 are 90.0% and 89.2%, respectively. The producer accuracies are 87.5% and 85.4%, respectively. The user accuracies are 95.4% and 96.1%, respectively. The kappa coefficients are 0.80 and 0.79, respectively.
The county-level statistical data of winter wheat yield from 2017 to 2018 in Henan Province, China were downloaded from the EPS data platform (http:// olap.epsnet.com.cn/) as the reference data to train the yield estimation models and assess accuracy. To reduce the uncertainty from some counties with small winter wheat planting area, the top 100 counties with winter wheat planting area in Henan Province (the number of winter wheat pixels for each county is between 1721 and 34,005, and the winter wheat cultivated area for each county is between 10.8 × 10 3 ha and 212.5 × 10 3 ha) were finally selected in this study ( Figure 1).

Methods
This study was conducted on the county level and the spatial aggregation of the data was realized by calculating the average value of pixels in each county. Therefore, the dependent variable is the winter wheat yield (kg/ha) in each county from 2017 to 2018, and the independent variables are the spatial mean values of optimized vegetation indices or meteorological factors corresponding to winter wheat pixels in each county from 2017 to 2018. The variable optimization, model building and accuracy assessment were conducted based on the samples from both years. In addition, to ensure the robustness of the results, these steps were also conducted based on the samples from each year respectively.

Variable optimization
Variable optimization was divided into two parts and based on the multiple linear regression model. The first part was the optimization of vegetation indices, and then the meteorological factors were optimized. In the optimization of vegetation indices, four vegetation indices, NDVI, EVI, NIRv and NDPI, as well as two calculation methods, the maximum value in a growth phase (VI max ) and the mean value in a growth phase (VI mean ), were considered. The combination of vegetation index and calculation method which showed the highest R 2 with the county-level yield statistics was selected. Then the optimal vegetation index was combined with different meteorological factors to build multiple linear regression models with different independent variables. Finally, the multiple linear regression model was selected based on two criteria: (1) it showed the highest R 2 with the county-level yield statistics; and (2) each independent variable of the model passed the P < 0.05 significance test. The independent variables of the selected model were considered as the optimal input variables. The meteorological factors included daily mean air temperature (T), total precipitation (P), mean soil moisture content (SM), mean specific humidity (SH) and total solar radiation (SR). In addition, since the precipitation, soil moisture content and specific humidity all characterize the water condition, only the variable with the highest R 2 among them was selected when combined with other meteorological factors.
The growth and development process of winter wheat was divided into three phenophases: from regreening to joining, from jointing to heading and from heading to milking maturity. In each phenophase, the vegetation indices and meteorological factors were optimized according to the above method. The selected variables in each phenophase were all used as independent variables to participate in the modelling.

Yield estimation models
To test whether the phenological piecewise modelling always performed better than the whole-season modelling based on different regression methods, in addition to the traditional multiple linear regression (MLR), three commonly used and state-of-the-art machine learning regression models, artificial neural network (ANN), support vector regression (SVR) and random forest regression (RF), were selected in this study. ANN, SVR, and RF were implemented based on the "nnet", "e1071", and "randomForest" packages within the R environment software, respectively. The zero-mean normalization was conducted and the parameters were optimized by comparing the errors of the models with different combinations of parameters (Table 1).

Multiple linear regression
Multiple linear regression is a linear regression model applied to the case that multiple independent variables jointly affect the dependent variable. Compared with the simple linear regression model, it is more effective and closer to reality (Sousa et al., 2007).

Artificial neural network
Artificial neural network imitates the natural neural network, and it can effectively solve the complex regression problem with a large number of correlated variables. The structure of an artificial neural network includes an input layer, multiple hidden layers and an output layer. Each hidden layer is formed by several artificial neural nodes, and the estimation error is transmitted to the feedforward neural network through the backpropagation algorithm. The weight of each node is adjusted through continuous iteration until the estimation error reaches an acceptable level (Tsai & Lee, 1999;Zhou, 1999).

Support vector regression
Support vector regression is a kind of regression models based on the kernel function. When solving nonlinear problems, low-dimensional variables are mapped to high-dimensional space by the kernel function. By finding the optimal hyperplane (the sum of the distances from all points to the hyperplane is the The number of hidden layers = 3 The number of nodes in each hidden layer = 5 Learning rate = 0.01 Momentum = 0.4 Error criterium = "LMS" Hidden layer = "Tansig" Output layer = "Tansig" Method = "BATCHgdwm" Support Vector Regression (SVR) Kernel = "radial" Gamma = 0.25 Cost = 1 Episode = 0.2 Random Forest (RF) The number of trees = 500 Mtry = 3 shortest) in the high-dimensional space, the nonlinear problem is transformed into a linear problem (Bennett & Demiriz, 1998;Suykens & Vandewalle, 1999).

Random forest regression
Random forest regression is an ensemble learning algorithm, which uses the bootstrap resampling method to generate different sample subsets to build different decision trees, and calculates the average of the prediction results of all decision trees as the final prediction result (Briem et al., 2001). It has become one of the most widely used machine learning regression models because it has few parameters, high prediction accuracy and can effectively solve the problem of interaction among independent variables (Leroux et al., 2019;Peng et al., 2020).

Accuracy assessment
This study adopted the 10-fold cross-validation method, and it was repeated 20 times to take the average of each accuracy indicator. The coefficient of determination (R 2 ), BIAS, mean absolute error (MAE), root mean square error (RMSE), and relative root mean square error (RRMSE) were used as the accuracy indicators. Among them, R 2 is used to evaluate the goodness-of-fit of a model, and a larger R 2 indicates a better goodness-of-fit. BIAS is used to evaluate the deviation between the estimation and the reference. If the value is greater than 0, it indicates that the estimated result is overall greater than the reference result, and vice versa. MAE and RMSE are used to evaluate the overall absolute error between the estimation and the reference, and RRMSE is used to evaluate the overall relative error between the estimation and the reference. The smaller the absolute values of these four indicators indicate the smaller error and the higher accuracy. The calculations of the five indicators were shown in Equations (5)-(9), respectively.
RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where Ŷ i is the estimated result of the i th sample, Y i is the corresponding reference value, � Y is the average of samples, N is the number of samples, covŶ; Y À � represents the covariance between the estimated results and the references, and varŶ À � and var Y ð Þ represent the variance of the estimated results and the references, respectively.
In addition, the independent samples t-test was used to test whether there were statistically significant differences (P < 0.05) in the accuracies between the phenological piecewise models and the whole-season models. The experimental results being tested were the accuracy indicators (e.g. RMSEs) from 10-fold crossvalidation (conducted 20 times). The Kolmogorov-Smirnov (K-S) test was performed in advance to verify that all samples were normally distributed.

Evaluation of the spatiotemporal transferability
In addition to the estimation accuracy, the spatiotemporal transferability of the whole-season models and the phenological piecewise models were also compared. In the evaluation of temporal transferability, the data in 2017 was first selected as training data to build the models, and the data in 2018 was selected as test data to assess the accuracy. Then the data in 2018 was selected as training data, and the data in 2017 was selected as test data. It is to evaluate the ability of models to predict the winter wheat yield in other years whose samples did not participate in modelling (i.e. the temporal transferability). In the evaluation of spatial transferability, the 100 counties were randomly divided into two equal groups, and the two groups also took turns as the training samples to build the models or test samples to assess the accuracy. Correspondingly, it is to evaluate the ability of models to predict the winter wheat yield in other regions whose samples did not participated in modelling (i.e. the spatial transferability). The assessment indicators were R 2 , MAE, RMSE and RRMSE mentioned above.

The optimal combination of variables for different models
For the whole-season model, the R 2 between mean value of EVI (EVI mean ) within the growing season and yield was the highest ( Figure S2a). After combining the vegetation index with meteorological factors, the R 2 was greatly improved (Figure 2). Among those combinations, the combination of air temperature, soil moisture content and solar radiation performed the best. The R 2 was the highest and all independent variables passed the significance test (P < 0.05). Therefore, the combination of EVI mean , air temperature, soil moisture content and solar radiation was the optimal combination for the whole-season yield model. Similar results were found based on samples from 2017 and 2018 respectively ( Figure S3 & Table S2).
For the phenological piecewise model, EVI also performed the best among the four vegetation indices in each phenophase. Specifically, EVI max was the best choice in the period of regreeningjointing, and EVI mean was the best choice in the periods of jointing-heading and heading-milking maturity ( Figure S2b, c and d). However, the importance of meteorological factors was different in different phenophases (Figure 3 & Table 2). For the period of regreening-jointing, EVI max combined with soil moisture content performed the best. The independent variables all passed the P < 0.001 significance test and the R 2 (0.42) was higher than that of other combinations. For the period of jointing-heading, the combination of EVI mean , soil moisture content and solar radiation was the best. For the period of heading-milking maturity, air temperature and solar radiation both significantly contributed to the yield model. Therefore, the combination of EVI max and SM in the period of regreening-jointing, EVI mean , SM and SR in the period of jointing-heading, and EVI mean , T and SR in the period of headingmilking maturity was the optimal combination for the phenological piecewise model. Similar results were found based on samples from 2017 and 2018 respectively ( Figure S4, Table S3 &  Table S4).

The estimation accuracy for different models
In general, the accuracies of the phenological piecewise models were higher than that of the wholeseason models (Figure 4). Except for SVR only with a little improvement (failed to pass P< 0.05 significance test), the accuracies of MLR, ANN and RF were all improved significantly (passed P< 0.05 significance test). The R 2 increased by 4.3% to 7.6% among the three models, the absolute value of BIAS decreased by 19.9% to 33.3%, the MAE decreased by 7.1% to 8.4%, and the RMSE and the RRMSE decreased by 3.0% to 8.2%. Similar improvements were found based on samples from 2017 and 2018 respectively ( Figure S5).

The spatiotemporal transferability for different models
After spatiotemporal transfer, the accuracies of the whole-season models and the phenological piecewise models both decreased to some extent ( Figure 5). After temporal (spatial) transfer, the R 2 for the whole-season models and the phenological piecewise models decreased by 0.01 to 0.03 (0.04 to 0.05) and 0.01 to 0.05 (0.03 to 0.05) among four regression models, respectively. The MAE increased by 23 kg/ ha to 89 kg/ha (28 kg/ha to 87 kg/ha) and 9 kg/ha to 44 kg/ha (22 kg/ha to 67 kg/ha), respectively. The RMSE increased by 13 kg/ha to 79 kg/ha (23 kg/ha to
81 kg/ha) and 1 kg/ha to 58 kg/ha (27 kg/ha to 70 kg/ha), respectively. The RRMSE increased by 0.2% to 1.3% (0.4% to 1.3%) and 0.0% to 1.0% (0.4% to 1.1%), respectively. Although the difference for decreased accuracies between the whole-season models and the phenological piecewise models was unobvious and varied depending on the specific regression model, the accuracies after spatiotemporal transfer for all four regression models based on phenological piecewise modelling were still higher than that based on whole-season modelling.

The difference for the optimal combination of variables between the whole-season modelling and the phenological piecewise modelling
For winter wheat, temperature, water and sunlight are all important to its growth and development (Xiao et al., 2016). Therefore, the air temperature, soil moisture content, and solar radiation were all selected to build the whole-season models. However, the degrees of importance of these meteorological factors are different in different phenophases in our study, which may be related to the different demands of crops for hydrothermal conditions in different phenophases and leads to the difference for the optimal combination of variables between the whole-season modelling and the phenological piecewise modelling.
During the period of regreening-jointing, winter wheat needs much water to differentiate its spike primordium, and the soil moisture content should reach 75%-85% of the field capacity (Song et al., 2006). Therefore, the soil moisture content may be the most important factor during this period. The jointing-heading is the period when winter wheat grows the fastest. The study area in this study can generally meet the needs of sunlight and temperature for the growth of winter wheat during this period, while the lack of water is an important factor restricting the growth of winter wheat in this area (Liu et al., 2018). Therefore, the meteorological factors that characterize the water condition show the highest importance during this period in our study. Similarly, Lei et al. (2010) found that the water deficit had a significant effect on the yield of winter wheat especially during the periods of Figure 4. The estimation accuracy for the whole-season models and the phenological piecewise models. * above the column indicates that there is a significant difference (P < 0.05) in the accuracy between the whole-season model and the phenological piecewise model based on this regression method. Figure 5. Assessment of the spatiotemporal transferability for the whole-season models and the phenological piecewise models. Note that the change after transfer for R 2 is decreased, while the change after transfer for the other indicators is increased.
regreening-jointing and jointing-heading. Heading-milking maturity is a critical period for the maturity of winter wheat. During this period, the requirements for temperature, water and sunlight conditions are very strict. Too high or too low will both have a great negative impact on the growth of winter wheat. In our study, the temperature contributes the most among the meteorological factors during this period. This may be because the period of heading-milking maturity is the most sensitive temperature stage for wheat (Sánchez et al., 2014), and the high temperature stress is one of the most important influence factors (Duncan et al., 2015;Teixeira et al., 2013).

The difference for the accuracy between the whole-season models and the phenological piecewise models
Due to the different optimal combinations of variables, the accuracies for the phenological piecewise models and the whole-season models are different. Compared with the accuracies of whole-season models, the accuracies of phenological piecewise models are higher. This may be explained by the stronger biophysical mechanism of phenological piecewise models. In addition to the different demands of crops for hydrothermal conditions in different phenophases mentioned above, the other reason is that the meteorological factors within the whole growing season cannot reflect the influence of adverse meteorological conditions in a certain phenophase. For example, the range of suitable temperature for the growth of winter wheat is 4-6°C during the period of regreening-jointing, 12-16°C during the period of jointing-heading, and 18-22°C during the period of heading-milking maturity (Gong, 1988;song et al., 2006). The suitable temperature for the growth of winter wheat varies greatly in different phenophases, so the mean temperature during the whole growing season cannot effectively reflect whether the growth of winter wheat is always in a suitable condition. Therefore, the phenological piecewise modelling is closer to the actual growth process of crops and can achieve a higher accuracy.

The difference for the spatiotemporal transferability between the whole-season models and the phenological piecewise models
In general, there is little difference for spatiotemporal transferability between the whole-season models and the phenological piecewise models. However, the difference for the spatiotemporal transferability between the specific regression models (i.e. MLR, ANN, SVR, RF) is relatively obvious. The spatiotemporal transferability of SVR and RF based on phenological piecewise modelling is stronger than that based on whole-season modelling, but for MLR and ANN, the results are opposite. Compared with the whole-season modelling, the phenological piecewise modelling needs more independent variables and contains more crop growth information. In this case, the spatiotemporal transferability or generalization ability of MLR and ANN is limited due to their disadvantage of easy overfitting (Karystinos & Pados, 2000;Lawrence & Giles, 2000). In contrast, the spatiotemporal transferability of SVR and RF is stronger by their advantage of applicable to handle high-dimensional data (Belgiu & Dragut, 2016;Chang & Lin, 2011;Maulik & Chakraborty, 2013).

The performance of different vegetation indices and regression methods
Among the four vegetation indices, EVI and NIRv performed the best in winter wheat yield estimation, followed by NDPI and NDVI. This may be related to the characteristics of the vegetation indices. NDVI can effectively reflect the greenness of vegetation, but it is easily saturated and susceptible to atmospheric conditions and soil background. Compared with NDVI, EVI solves these problems to a certain extent and can better reflect the growth status of vegetation. It shows a higher accuracy in yield estimation, which is consistent with the results of Johnson (2016) and Sharifi (2020). NDPI is mainly designed to eliminate the interference of soil background changes on vegetation signals in the early growth phase of vegetation (Wang et al., 2017), and its ability to resist the interference of soil background is stronger than that of NDVI . However, in the early growth phase of crops, the contribution of vegetation index to yield estimation is limited according to our results, which may explain there is no obvious advantage in yield estimation when comparing NDPI with other vegetation indices. NIRv can reflect the photosynthesis of vegetation because of its high correlation with the suninduced chlorophyll fluorescence. Therefore, it has great advantages in estimating the gross primary productivity and yield of crops Wu et al., 2020).
Among the four regression methods, RF performed the best, with the highest accuracy, significant improvement, and the best spatiotemporal transferability. This is similar to the results of Peng et al. (2020) and Guo et al. (2021). The response function between crop growth rate and meteorological factors generally presents a nonlinear relationship (Soltani & Sinclair, 2012). In this case, the machine learning regression methods, which are more suitable for building nonlinear relationships, perform better in crop yield estimation (Ludwig & Asseng, 2006). Moreover, there are many factors affecting crop growth, and the machine learning regression methods also perform better in dealing with the problem of the strong interaction between independent variables. In addition, compared with the other two machine learning regression methods (i.e. ANN and SVR), RF can better solve the curse of dimensionality. It requires fewer parameters and is less dependent on training data (Belgiu and Dragut, 2016), which may explain the superiority of RF.

Limitations and prospects
In this study, we aim to compare the difference between the whole-season modelling and the phenological piecewise modelling for winter wheat yield estimation. Except for input variables, other aspects of the two modelling schemes are generally identical. However, if the objective is to estimate crop yield, the uncertainties from data sources , parameter settings of models (Rodriguez-Galiano et al., 2015), scale effects (Tarnavsky et al., 2008), or some other factors should be more carefully considered. Moreover, a main potential limitation of this study is the resolution of MODIS data we used. The fields of winter wheat cultivated in the plains in the east of our study area are large and concentrated (Figure 1e & 1f), thus the resolution of MODIS can be well matched. However, in the mountainous regions in the west of our study area, there are a few small and scattered winter wheat fields cultivated (Figure 1d). These fields may be influenced by sub-pixels due to the sparse spatial resolution of MODIS. In this case, the remote sensing data with finer resolution (e.g. Sentinel-2) can be used to solve the problem. In addition, the phenological piecewise modelling requires additional phenology data. Considering the availability of phenology data and the relatively fixed time frame of crop growth, the phenophases distinguished by a fixed time frame may be also effective. Finally, it was found in this study that the phenological piecewise modelling is better than the whole-season modelling for winter wheat yield estimation, while for other crops and at other scales (e.g. the filed scale), the performance of phenological piecewise modelling needs to be further evaluated.

Conclusions
In this study, the optimized variables, accuracy and spatiotemporal transferability of the phenological piecewise models were compared with that of the wholeseason models. The accuracies of the phenological piecewise models were higher than that of the wholeseason models. The difference for spatiotemporal transferability between the phenological piecewise models and the whole-season models was little and the results varied depending on the specific regression model. However, the accuracies after spatiotemporal transfer for the phenological piecewise models were still higher than that for the whole-season models.
About the optimized variables, the enhanced vegetation index, air temperature, soil moisture content, and solar radiation were selected for the whole-season modelling, while for the phenological piecewise modelling, the optimal combinations of variables in different phenophases were different. It was mainly related to the different demands of crops for hydrothermal conditions in different phenophases. The phenological piecewise model has stronger biophysical mechanism and we highlight the empirical yield estimation models in the future should considered it.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This research was funded by the National Natural Science Foundation of China (NO. 41771047).

Appendix A Determination of the duration of phenophases
The fixed durations from regreening to jointing and heading to milking maturity were obtained from the observation data of 35 winter wheat phenology stations in Henan Province (Figure 1) from 2017 to 2018. The phenology stations were established by the China Meteorological Administration (CMA). The stations monitor the growth and development status of field crops through direct human observation. The specific records include the crop variety, the name of the crop development stage and its date, the anomaly of the development stage, etc.
A total of 61 (recording both regreening and jointing dates) and 63 (recording both heading and milking maturity dates) station-year samples were used. Considering the days from regreening to jointing and heading to milking maturity recorded by most stations did not vary greatly (the differences between the upper quartiles and the lower quartiles in Figure A1 were within 10 days), the median value of samples was taken as the corresponding fixed duration (i.e. 26 days for the duration from regreening to jointing and 34 days for the duration from heading to milking maturity). Figure A1. The days from regreening to jointing (reg-jot) and heading to milking maturity (hed-mlk) recorded by phenology stations from 2017 to 2018. n: the number of samples.