Early mapping of winter wheat in Henan province of China using time series of Sentinel-2 data

ABSTRACT Accurate mapping of winter wheat in its early stages is crucial for crop growth monitoring and crop yield forecasting. However, early mapping of winter wheat using remotely sensed data is challenging because remote sensing observations can only be used for a part of the growth period. In this study, a framework was proposed for early season mapping of winter wheat using spectral and temporal information of Sentinel-2 images. First, time series of temporal and spectral features were derived using Whittaker smoothing. Subsequently, sensitivities of different parameters (i.e. input features, time interval, and length of time-series data) to early mapping were analyzed. Finally, early maps of winter wheat were generated based on optimal parameters. Results show that the earliest identifiable timing was delayed as the time interval of the time series increased. Winter wheat can be mapped in the early overwintering period (5 months before harvest) with an overall accuracy of 0.91, which is comparable to that of post-season mapping (0.94). In addition, the misclassification in early mapping was caused by uneven sample spatial patterns, natural conditions, and planting management; however, most errors can be gradually amended during the green-up and jointing periods, and the overall accuracy remained stable after the jointing stage. This study demonstrates that it is feasible to implement large-scale early mapping of winter wheat using satellite observations. The proposed approach potentially provides a reference for early mapping of other crop types in agricultural regions worldwide.


Introduction
With future changes in climate and environment, global urbanization, and land cover/use, the crop planting structure is changing significantly every year (Qingshui et al. 2011; Wenbin, Verburg, and Tang 2014). Large-scale, high-resolution crop maps serve as input data for research on national food security, crop planting systems, production structures, and regional layouts (Jiang et al. 2019; Huang et al. 2016, 2019, 2022; Zhuo et al. 2019, 2022a, 2022b). A spatial distribution map of crops before maturity can help improve crop monitoring and provide early warnings and yield predictions (Jiang et al. 2019; Wang et al. 2020). Mapping early season crop distribution is therefore extremely valuable for dynamic crop growth monitoring and yield prediction.
Remote sensing is widely employed for spatial mapping of land cover products because of its advantages in obtaining large-scale geospatial information at high spatiotemporal resolutions.
Several global and regional land-cover products have been developed based on remote sensing observations for scientific research and data analysis. Recently, high-resolution land-cover datasets such as Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) products with spatial resolutions of 30 and 10 m were developed by Tsinghua University (Gong et al. 2013). Global-scale land-cover products have laid a solid foundation for studying global land cover and environmental changes (Giri, Zhu, and Reed 2005; Lunetta et al. 2006). In addition, some regional cropland products provide more detailed crop types, such as the Cropland Data Layer (CDL) released by the United States Department of Agriculture (USDA) at a 30-m spatial resolution (https://nassgeodata.gmu.edu/CropScape/) to support research on cultivated-land changes and crop distribution (Lunetta et al. 2006; Cai et al. 2018).
However, most of these studies involve post-season mapping, and there are relatively few studies on the early mapping of crops.
Remotely sensed satellite images from different sensors have been extensively applied in crop-type mapping (Dong et al. 2020a; Yang et al. 2019). Moderate Resolution Imaging Spectroradiometer (MODIS) data have been widely used for large-scale crop monitoring because of their high temporal resolution (Wardlow and Egbert 2008), but their coarse spatial resolution is not suitable for fine crop mapping. Landsat and Sentinel-2 images usually contain more detailed crop spatial distribution information because of their high spatial resolution (Yin et al. 2020; You and Dong 2020). Compared to Landsat, Sentinel-2 has a shorter revisit period (5 days) and additional red-edge bands, which further improves large-scale crop mapping (Defourny et al. 2019). Therefore, Sentinel-2 imagery was used for early mapping in this study.
Time series data can represent the phenology of crops, and different crops can be effectively characterized based on their phenological characteristics (Jinwei et al. 2016; Dong et al. 2020b; Niu et al. 2022; Feng et al. 2020; Zhong et al. 2011). For example, since winter wheat thrives during tillering, jointing, and heading periods but stops growing or even dies during the overwintering period, its vegetation index time series data have two local peaks and one valley, which is helpful for the identification of winter wheat (Pan et al. 2012; Zhong et al. 2019a). Momm, ElKadiri, and Porter (2020) established a multi-year normalized difference vegetation index (NDVI) time series with CDL reference data using Landsat-5 images, resulting in high classification accuracy and generalization capabilities. Tian et al. (2019) combined Landsat and Sentinel-2 images to extract two NDVI peaks and one valley value in the winter wheat phenology curve, which was used as a time-series feature with the threshold method to complete winter wheat mapping in the main producing areas of China. However, these studies integrated time-series characteristics of the entire growing period of crops, whereas only partial growing-period time-series data are available in early mapping. Thresholds commonly used in large-area crop mapping also introduce uncertainties. In addition, cloud cover seriously affects the spatiotemporal availability of surface observations (Prudente et al. 2020; Meraner et al. 2020). Therefore, early identification of crop distribution information with high spatial resolution in large-scale regions remains challenging.
The early mapping of high-resolution crop extents using optical remote sensing time-series observations faces many challenges. (1) It is difficult to composite time-series data efficiently when optical remote sensing images contain cloudy pixels and have relatively long revisit cycles. Several studies have used sufficient observations in their study areas to synthesize time-series data of key phenological stages during crop growth (Tian et al. 2019; Feng et al. 2020; Xie et al. 2019; Akbari et al. 2020). However, the time series available for early mapping is short, leading to the omission of critical phenological information and difficulty in completing early crop mapping.
(2) The spectral signature is not evident in the early stages of the crop growth period, and it is difficult to distinguish different crops. Therefore, the characteristics of a specific time period with large differences must be determined for different objects. For example, for rapeseed mapping, Zang et al. (2020) used the enhanced area yellowness index to enhance weak yellow signals during the flowering period. You and Dong (2020) found that the reflectance of maize and soybean red-edge bands differed during the heading period of maize and the pod period of soybean. (3) The storage, calculation, and preprocessing of remote sensing time series data on a large scale requires a large storage space and efficient computing resources.
The main objective of this study was to develop a framework for early mapping of winter wheat in a large-scale area. Based on the Google Earth Engine (GEE) platform, we used a random forest (RF) classifier and spectral and temporal features to explore the early mapping method of winter wheat in Henan province, China. Specifically, this study aimed to answer the following questions: (1) What input feature combination scheme would be optimal for the early mapping of winter wheat? (2) What is the earliest identifiable timing for winter wheat? (3) Is the early map reliable compared to the post-season map? (4) What causes omission and commission errors of winter wheat in early mapping?

Study area
The study area covers Henan Province (Figure 1), which is China's main winter wheat growing region. Topographically, Henan is high in the west and low in the east. The central and eastern areas are large-scale alluvial plains, most of which are in a warm temperate zone, while the south is a subtropical zone. The annual average temperature from south to north is 10.5-16.7°C, and the annual average rainfall is 407-1296 mm, which is suitable for the growth of various crops. The total sown area of crops is approximately 12.07 million hectares, of which grain crops are mostly planted. Winter wheat is the main food crop, accounting for more than 50% of the grain sown area, and its production has consistently accounted for more than 20% of the country's total, ranking first in China. Detailed phenological calendar data for winter wheat in Henan province (Table 1) were obtained from the Department of Crop Production, Ministry of Agriculture, China (http://www.zzys.moa.gov.cn/).

Data from ground samples
Ground samples were used to train the classification model for winter wheat and to evaluate it. From December 7-21, 2019, 3,883 ground samples were collected through field surveys using the global positioning system. The samples included winter wheat and non-winter wheat (e.g. other crops, such as garlic, and other land-cover types, such as forests and buildings). In addition, 128 non-winter wheat samples were added through visual interpretation of high-resolution images from Google Earth; these were mainly distributed in the western and southern mountainous regions. The collected sample set (Figure 1) consists of 2,165 winter wheat samples and 1,846 non-winter wheat samples (Table 2). The entire set was randomly divided into a training set (70%) and a validation set (30%).
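The random 70/30 division of the sample set can be illustrated with a minimal sketch. The sample records, the function name `split_samples`, and the seed are hypothetical; the text does not state whether the split was stratified by class, so a plain shuffle is assumed here.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=42):
    """Shuffle samples reproducibly and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical records (sample_id, label): 1 = winter wheat, 0 = non-winter wheat.
samples = [(i, 1) for i in range(2165)] + [(i, 0) for i in range(2165, 4011)]
train, valid = split_samples(samples)
print(len(train), len(valid))
```

With the 4,011 samples reported above, this yields roughly 2,808 training and 1,203 validation samples.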

Satellite data
All Sentinel-2 images archived in GEE were used to implement the classification. The Sentinel-2 multispectral instrument Level-2A product was chosen because of its high spatiotemporal resolution. Sentinel-2 A/B observes the Earth's land surface with a 5-day repeat visit and spatial resolution of 10-60 m. The Sentinel-2 Level-2A product is an orthoimage bottom-of-atmosphere-corrected reflectance product. Based on the winter wheat phenology, the start and end times of the time series were set as 10 October 2019, and 6 June 2020, respectively, during which 2,967 images were collected. Because of the massive amount of data, GEE was chosen as the computing platform because it provides an extensive catalog of satellite imagery and high-performance cloud computing services (Gorelick et al. 2017). Given that the QA band of the Sentinel-2 data catalog records data quality information, the QA60 band was used to perform cloud masking on each image. This process masked all pixels tagged as dense and cirrus clouds. In addition, slope data were used to reflect the topographical characteristics of winter wheat.
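The QA60 cloud masking step can be sketched on plain Python values. In the QA60 bitmask, bit 10 flags opaque (dense) clouds and bit 11 flags cirrus clouds; the helper names and the sample values below are illustrative assumptions, and the study itself performed this step in GEE.

```python
CLOUD_BIT = 10   # QA60 bit 10: opaque (dense) clouds
CIRRUS_BIT = 11  # QA60 bit 11: cirrus clouds

def is_clear(qa60):
    """Return True when neither the opaque-cloud nor the cirrus bit is set."""
    cloud = (qa60 >> CLOUD_BIT) & 1
    cirrus = (qa60 >> CIRRUS_BIT) & 1
    return cloud == 0 and cirrus == 0

def mask_pixels(values, qa60_values):
    """Replace cloudy observations with None so later steps treat them as gaps."""
    return [v if is_clear(q) else None for v, q in zip(values, qa60_values)]

# Hypothetical NDVI observations of one pixel and their QA60 values.
ndvi_obs = [0.42, 0.38, 0.51, 0.47]
qa60_obs = [0, 1024, 2048, 0]   # 1024 = bit 10 set, 2048 = bit 11 set
masked = mask_pixels(ndvi_obs, qa60_obs)
print(masked)
```

Masked observations become gaps in the time series, which is why a later gap-filling step is needed.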

Methodology
A framework was developed to implement the early season mapping of winter wheat (Figure 2). First, raw data were processed before classification. The sensitivity of remote sensing observations to early mapping was then tested. Finally, the winter wheat in the study area was mapped, and results were evaluated. Details of these steps are described in the following sections.

Data processing
Four spectral indices were calculated to indicate the growth status of winter wheat. The spectral index is a practical and empirical measure of vegetation status on the ground (Pan et al. 2015;Massey et al. 2017;Bargiel 2017). Collected images were used to calculate spectral indices, including NDVI, inverted red-edge chlorophyll index (IRECI), land surface water index (LSWI), and green chlorophyll vegetation index (GCVI). These four indices can characterize the conditions and changing laws of surface vegetation and non-vegetation (Table 3).
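As a minimal sketch, the four indices can be computed from surface reflectance with their commonly used definitions. The function names, the sample reflectance values, and the Sentinel-2 band assignment in the comments (B3 green, B4 red, B5 to B7 red edge, B8 NIR, B11 SWIR1) are assumptions for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def ireci(re3, red, re1, re2):
    """Inverted red-edge chlorophyll index."""
    return (re3 - red) / (re1 / re2)

def lswi(nir, swir1):
    """Land surface water index."""
    return (nir - swir1) / (nir + swir1)

def gcvi(nir, green):
    """Green chlorophyll vegetation index."""
    return nir / green - 1.0

# Hypothetical reflectance values (0-1) for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "re1": 0.10, "re2": 0.25,
         "re3": 0.35, "nir": 0.40, "swir1": 0.20}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "IRECI": ireci(pixel["re3"], pixel["red"], pixel["re1"], pixel["re2"]),
    "LSWI": lswi(pixel["nir"], pixel["swir1"]),
    "GCVI": gcvi(pixel["nir"], pixel["green"]),
}
print(indices)
```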
Whittaker's smoothing method was used to derive a spectral time series with regular intervals during the wheat-growing season (Whittaker 1922). We composited the time-series data by taking the maximum observation value within each time interval. You and Dong (2020) found that as the time interval increases, more pixels can satisfy the need for at least one good observation per period; for example, over a 30-day temporal interval, time-series observations are almost unaffected by clouds. However, this temporal frequency was insufficient for early mapping. To generate a complete curve with finer time intervals (10, 15, or 20 days), the Whittaker smoother was used to fill the gaps caused by no observations, insufficient data, or cloud cover. The Whittaker smoother is a penalized least-squares algorithm that is computationally efficient and requires few parameters. It strikes a balance between fidelity to the original time series and roughness of the smoothed series. Assuming that the fidelity is S (Equation (2)), the penalized least-squares method aims to find the optimal z, which minimizes the objective function Q (Equation (3)) (Kong et al. 2019):

$$S = \sum_{t=1}^{n} (y_t - z_t)^2 \quad (2)$$

$$Q = S + \lambda \lVert D z \rVert^2 \quad (3)$$

where y and z represent the original and smoothed time series, respectively; t is the time; λ denotes the roughness parameter; and D represents the difference matrix, formed as n-d rows (d is the difference order) and n columns. The value of d was set to 2 (Kong et al. 2019), and the value of λ was set to 0.5 after trial and error. As an example, spectral time-series data of winter wheat pixels were composited at a 10-day interval and smoothed (Figure 3).
Given that the number of features is directly related to the efficiency of the classification algorithm, we used only features with the highest importance for classification. The feature selection method was as follows. First, a time series of all images in the study area was composited at 10-day intervals, and 14 features of the spectral bands and indices were selected as input features for classification. The importance score of the RF classifier in GEE was then used to evaluate the importance of each feature in this classification scheme. Finally, highly important features were chosen for the early mapping of winter wheat.

Table 3. Index; formula; reference; description.
NDVI; $\mathrm{NDVI} = (\rho_{NIR} - \rho_{red}) / (\rho_{NIR} + \rho_{red})$; Tucker 1979; Reflects information such as crop growth trends and the health status of different vegetation.
IRECI; $\mathrm{IRECI} = (\rho_{RE3} - \rho_{red}) / (\rho_{RE1} / \rho_{RE2})$; Korhonen et al. 2017; Correlates with the plant canopy chlorophyll content and leaf area index.
LSWI; $\mathrm{LSWI} = (\rho_{NIR} - \rho_{SWIR1}) / (\rho_{NIR} + \rho_{SWIR1})$; Chandrasekar et al. 2010; Sensitive to water content and can effectively characterize soil and canopy water content of different vegetation.
GCVI; $\mathrm{GCVI} = \rho_{NIR} / \rho_{green} - 1$; Gitelson et al. 2005; Describes the activity of vegetation photosynthesis.
Note: $\rho_{green}$, $\rho_{red}$, $\rho_{NIR}$, and $\rho_{SWIR1}$ represent the green, red, near-infrared, and short-wave infrared bands, respectively; $\rho_{RE1}$, $\rho_{RE2}$, and $\rho_{RE3}$ represent the three red-edge bands B5, B6, and B7, respectively.
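The penalized least-squares smoother described in this section can be sketched in pure Python. This is an illustrative implementation, not the code used in the study: it assumes binary weights (1 for a valid composite, 0 for a gap), which turns the problem into solving (W + λDᵀD)z = Wy, and it uses a small dense solver that is only practical for short series. The values d = 2 and λ = 0.5 follow the text; all function names are hypothetical.

```python
def difference_matrix(n, d=2):
    """Build the (n - d) x n matrix D of d-th order differences."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return D

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def whittaker(y, lam=0.5, d=2):
    """Smooth y and fill gaps (None); needs at least d valid observations."""
    n = len(y)
    w = [0.0 if v is None else 1.0 for v in y]   # assumed binary weights
    y0 = [0.0 if v is None else v for v in y]
    D = difference_matrix(n, d)
    # Normal equations of the weighted problem: (W + lam * D^T D) z = W y
    A = [[lam * sum(D[k][i] * D[k][j] for k in range(n - d)) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        A[i][i] += w[i]
    b = [w[i] * y0[i] for i in range(n)]
    return solve(A, b)

# Hypothetical 10-day NDVI composites with one cloud-induced gap.
z = whittaker([0.1, 0.2, None, 0.4, 0.5])
print([round(v, 3) for v in z])
```

Because a second-order difference penalty does not penalize straight lines, the gap in this linear example is filled by linear interpolation; on real curves a larger λ gives a smoother result at the cost of fidelity.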

Sensitivity of remote sensing observations to early mapping
The input feature combination of the classifier, time interval of the composite remote sensing time-series data, and length of the time-series input to the classifier were chosen as variable parameters to explore the influence of these parameters on mapping performance. In this section, we attempt to obtain: (1) the optimal features of the classification model for winter wheat, (2) the optimal time interval of the composite time series, and (3) the earliest identifiable timing of winter wheat.

Time intervals
The quality of time-series data obtained from different time intervals of the composite time series was different. Three time intervals were tested (i.e. 10, 15, and 20 days) to construct the time series and evaluate their mapping performances. A shorter time interval can show crop growth status, but it produces more temporal gaps than a relatively coarse interval. The NDVI time-series curves of the three time intervals were plotted. The three curves show double peak characteristics of the growth of winter wheat (Figure 4). In Figure 4(a), four original observational data points are missing in time-series data. In contrast, Figure 4(c) shows that the temporal characteristics of winter wheat during the overwintering period are relatively weak.
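The trade-off between interval length and temporal gaps can be illustrated with a small maximum-value compositing sketch. The observation dates and values are hypothetical, and `composite_max` is an illustrative helper rather than the GEE routine used in the study.

```python
from datetime import date

def composite_max(observations, start, end, interval_days):
    """Group (date, value) observations into fixed windows and keep the maximum.

    Windows without a valid observation are returned as None (a temporal gap).
    """
    n_windows = (end - start).days // interval_days + 1
    composites = [None] * n_windows
    for day, value in observations:
        if value is None or not (start <= day <= end):
            continue
        idx = (day - start).days // interval_days
        if composites[idx] is None or value > composites[idx]:
            composites[idx] = value
    return composites

# Hypothetical cloud-free NDVI observations of one pixel.
obs = [(date(2019, 10, 12), 0.21), (date(2019, 10, 17), 0.25), (date(2019, 11, 3), 0.40)]
start, end = date(2019, 10, 10), date(2019, 11, 8)

by_interval = {step: composite_max(obs, start, end, step) for step in (10, 15, 20)}
for step, series in by_interval.items():
    print(step, series)
```

In this example the 10-day composite contains one empty window, whereas the 15- and 20-day composites have none, at the cost of fewer points along the curve.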

Length of time series
In general, when the time series input to the classifier is longer, the classification accuracy is higher. To obtain the earliest identifiable timing and optimal strategy, we set up different classification scenarios and compared the classification results of the time series of different lengths in each scenario. The three schemes with different input features and time intervals were combined for comparison (Table 4). The start time was set as early October (sowing period), and the length of the time series of each classification scenario was sequentially increased until early June of the following year (harvest period). The classification accuracy increased as the time-series length increased. The earliest identifiable timing (EIT) was defined as the first time the overall classification accuracy reached a threshold of 0.9. Each scenario was evaluated based on the EIT and overall accuracy to obtain the optimal strategy.
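The EIT definition amounts to scanning the accuracy curve for the first time-series end date at which the OA reaches the threshold. A minimal sketch follows; the function name and the accuracy values are hypothetical and serve only to show the logic.

```python
def earliest_identifiable_timing(end_dates, accuracies, threshold=0.9):
    """Return the first end date whose overall accuracy reaches the threshold.

    end_dates must be in chronological order; returns None if the threshold
    is never reached.
    """
    for end_date, oa in zip(end_dates, accuracies):
        if oa >= threshold:
            return end_date
    return None

# Hypothetical accuracy curve for one scenario with a 10-day interval.
end_dates = ["2019-11-29", "2019-12-09", "2019-12-19", "2019-12-29"]
accuracies = [0.86, 0.89, 0.91, 0.92]
eit = earliest_identifiable_timing(end_dates, accuracies)
print(eit)
```

Because the scan stops at the first crossing, a later temporary drop below the threshold would not change the EIT under this definition.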

Early mapping and accuracy assessment
Based on the RF classifier, the optimal strategy was used to map winter wheat at the earliest identifiable timing. The overall accuracy and F1 score were used to assess the winter wheat mapping results. RF can process multidimensional data, and it has been successfully applied to crop mapping (Fei et al. 2022; Akbari et al. 2020; Yuanhuizi et al. 2019). Generally, RF methods do not require complex parameter tuning. The most critical parameter is the number of decision trees, which was set to 100 based on previous research (Pelletier et al. 2016). Other parameters such as minLeafPopulation, bagFraction, and seed were left at their default values of 1, 0.5, and 0, respectively. Accuracy assessment is an essential process for analyzing classification results. Thus, the confusion matrix was calculated for evaluation (Liu, Frazier, and Kumar 2007). The overall accuracy (OA, Equation (4)), user accuracy (UA), producer accuracy (PA), and F1-Score (Equation (5)) can be obtained from the confusion matrix. The calculation formulas for OA and F1-Score are as follows:

$$\mathrm{OA} = \frac{\sum_{i} x_{ii}}{N} \quad (4)$$

$$\mathrm{F1} = 2 \times \frac{\mathrm{UA} \times \mathrm{PA}}{\mathrm{UA} + \mathrm{PA}} \quad (5)$$

where x_ii and N represent the number of correctly classified samples (category i) and the total number of samples, respectively.
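The accuracy measures can be derived from a confusion matrix as in the following sketch. It uses the usual definitions (OA as the share of correctly classified samples, F1 as the harmonic mean of UA and PA) and assumes that rows hold reference labels and columns hold predicted labels; the matrix values are hypothetical.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def producer_accuracy(cm, i):
    """Correct samples of class i divided by all reference samples of class i."""
    return cm[i][i] / sum(cm[i])

def user_accuracy(cm, i):
    """Correct samples of class i divided by all samples predicted as class i."""
    return cm[i][i] / sum(row[i] for row in cm)

def f1_score(cm, i):
    """Harmonic mean of user and producer accuracy for class i."""
    ua, pa = user_accuracy(cm, i), producer_accuracy(cm, i)
    return 2 * ua * pa / (ua + pa)

# Hypothetical matrix: class 0 = winter wheat, class 1 = non-winter wheat.
# Rows are reference labels, columns are predicted labels (assumed layout).
cm = [[90, 10],
      [5, 95]]
oa = overall_accuracy(cm)
f1_wheat = f1_score(cm, 0)
print(round(oa, 3), round(f1_wheat, 3))
```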

Optimal input feature scheme
Eight input features were chosen: LSWI, GCVI, red edge 2, IRECI, green, red edge 1, red, and NDVI. The importance of each feature (mentioned in Section 3.3) was evaluated and ranked from highest to lowest (Figure 5). The importance scores of the first eight features were significantly higher than those of the other six features, and the scores of the last six features were almost the same. Therefore, the first eight features were chosen as the input features of classification scenarios S Ⅶ, S Ⅷ, and S Ⅸ. The performances of the feature schemes were compared using the OA and EIT simultaneously. We concluded that Alt 3 is the optimal feature scheme for the winter wheat classification model. The OA for each scenario is shown in Figure 6. Each scenario was repeated ten times, and the OAs were averaged to reduce the uncertainty caused by stochasticity. The scenarios were divided into three groups according to time interval for the comparison of performances: (1) scenarios S Ⅰ, S Ⅳ, and S Ⅶ with 10-day intervals; (2) scenarios S Ⅱ, S Ⅴ, and S Ⅷ with 15-day intervals; and (3) scenarios S Ⅲ, S Ⅵ, and S Ⅸ with 20-day intervals. The features used in the three scenarios of each group correspond to the three input feature schemes (mentioned in Section 3.3). Results suggest that the performance of Alt 3 is better than that of the other two schemes. Based on OA, the input feature schemes in each group ranked Alt 3 > Alt 2 > Alt 1. For example, in the first group, the highest OAs of the three scenarios ranked S Ⅶ, S Ⅳ, and S Ⅰ from high to low. Within the same time interval, the highest OA of Alt 3 was significantly higher than that of Alt 1 and slightly higher than that of Alt 2. The comparison of EIT further shows that the performance of Alt 3 is better than that of the other two schemes.
In the first and second groups, the EITs of the Alt 1 and Alt 2 scenarios are the same, and the EIT of Alt 3 is 15-20 days earlier than that of the other two scenarios. In the third group, the EIT of Alt 3 is 40-60 days earlier than that of the other two schemes. Although the eight selected features have the potential for large-scale and early identification of winter wheat, some improvements might be required in the near future. Several winter crops (e.g. winter canola and garlic) have phenological and spectral characteristics that are similar to those of winter wheat, thereby reducing the reliability of classification results. It is worth noting that the potential of synthetic aperture radar (SAR) images and spatial signatures of high-resolution imagery should be considered in the near future to distinguish these confounding crop types. The backscattering intensity of SAR varies with crop canopy structure (d'Andrimont et al. 2020). Blaes, Vanhalle, and Defourny (2005) found that combining optical and SAR images can improve crop identification accuracy by at least 5%. Textural features can distinguish ground objects with similar spectral features (Haralick, Shanmugam, and Dinstein 1973). Fei et al. (2022) found that the textural features of a gray-level co-occurrence matrix effectively improved the identification accuracy for cotton. The textures of different land-cover types are different, and adding textural features can increase the ability to discriminate between them.

Optimal time interval and earliest identifiable timing
The performances of the three time intervals were compared using OA and EIT, and we concluded that the 10-day period is the optimal time interval for early mapping. The three scenarios (S Ⅶ, S Ⅷ, and S Ⅸ) using input feature scheme Alt 3 were compared at time intervals of 10, 15, and 20 days. The highest OAs for the three scenarios were 93.5%, 93.2%, and 93.0%, respectively. The time interval therefore has a relatively limited impact on classification accuracy, but the 10-day time interval achieves the best performance. The EIT of S Ⅶ is five days earlier than that of S Ⅷ and ten days earlier than that of S Ⅸ. The EITs of the three scenarios are relatively similar, and they all fall in the same phenological period. However, the EIT is delayed as the time interval increases, which is consistent with the findings of You and Dong (2020).
The optimal strategy and final EIT were determined by comparing the nine scenarios. The optimal strategy combines the 10-day interval with the Alt 3 input feature scheme, which showed the most effective performance for early mapping of winter wheat. The final EIT was determined according to scenario S Ⅶ and falls in the early overwintering period (December 19) of winter wheat, five months before the harvest period.
Crop mapping is usually performed late in the growing season (Zhong, Lina, and Zhou 2019b). Limited remote sensing observations have challenged early crop mapping (Song et al. 2017). Some studies have focused on early mapping of winter wheat. For example, Dong et al. (2020a) identified winter wheat by comparing the change in the cropland vegetation index with the seasonal change of standard winter wheat and judging their similarity. The EIT was obtained at the end of March, and the optimal OA was 89.88%. Tian et al. (2021) used Sentinel-2 imagery to analyze the differences in spectral signatures between early winter crop growth and other land-cover types and used a decision tree to classify winter crops, obtaining an EIT of March 18 with an OA of 96.62%. The EIT of our proposed method was three months earlier than that of these two methods at comparable classification accuracies. Our findings suggest that mapping of winter wheat five months before the harvest is feasible with limited remote-sensing observations. The early mapping method for winter wheat proposed in this study also has limitations. For example, Dong et al. (2020a) demonstrated that their method achieved better accuracy than machine learning when using a smaller number of training samples. Our study used an RF classifier, which places higher requirements on the quantity and quality of samples.

Spectral characteristics
The reason for the improved mapping performance of winter wheat was analyzed using time-series curves. The time-series curves of winter wheat and non-winter wheat are shown for visual interpretation (Figure 7). During the sowing and emergence periods, the feature values of winter wheat and non-winter wheat were relatively similar, resulting in a lower discrimination accuracy. For example, the GCVI, IRECI, and NDVI value ranges of winter wheat from October 10 to November 9 fell within those of non-winter wheat. The green, red, and red edge bands have extensive overlapping ranges. After tillering began, the differences between winter wheat and non-winter wheat gradually became apparent. In the early overwintering period (EIT), the differences in red edge 1 and the red band were the largest, followed by GCVI, IRECI, and NDVI, which also showed greater differences than the other features. The number of winter wheat leaves increased from tillering to early overwintering, and the leaf chlorophyll content continued to increase. The green band shows a strong reflection of chlorophyll, whereas the red edge is the band region where the reflectivity of vegetation changes rapidly at the junction of the strong absorption in the red band and the strong reflection in the NIR band. Therefore, red edge 1, red band, GCVI, IRECI, and NDVI were sensitive to changes in leaf chlorophyll, which is a critical factor in distinguishing winter wheat from other ground objects.
The importance of various time-series periods was evaluated for winter wheat mapping. The feature importance scores were calculated according to the period (Figure 8). Overall, the cumulative value was the lowest during the mid-winter period (January 18). Because the temperature is low during the overwintering period, the ground vegetation stops growing and gradually freezes until death. During this period, spectral index values, such as NDVI, gradually decreased, and the spectral difference between winter wheat and non-winter wheat was small. After the overwintering period, this value increased. Because winter wheat grows rapidly after the overwintering period, the spectral response of winter wheat is quite different from that of other ground objects, which is also reflected in the curve in Figure 7. Among the periods before the EIT, the value from October 30 to November 29 (tillering period) was relatively large. After the EIT, the value from February 27 to April 7 (regreening and jointing periods) was larger than that in other periods. Results suggest that the tillering period is crucial for early mapping of winter wheat, while the regreening and jointing periods are essential for post-season mapping. In addition, from the perspective of each feature, the LSWI appears to have a higher value in the sowing and emergence periods than in the other periods. Red edge 2 has the largest value during the heading period, while the other features have the largest values during the regreening and jointing periods.
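The cumulative importance per period can be understood as the sum of the importance scores of all features at that time step. The sketch below uses hypothetical scores and period labels; the importance values reported in the study came from the RF classifier in GEE.

```python
def cumulative_importance(scores):
    """Sum importance scores over features for each period.

    scores maps (feature, period) to an importance value.
    """
    totals = {}
    for (feature, period), value in scores.items():
        totals[period] = totals.get(period, 0.0) + value
    return totals

# Hypothetical importance scores for two features over three periods.
scores = {
    ("NDVI", "tillering"): 0.06, ("LSWI", "tillering"): 0.05,
    ("NDVI", "mid-winter"): 0.01, ("LSWI", "mid-winter"): 0.02,
    ("NDVI", "jointing"): 0.09, ("LSWI", "jointing"): 0.04,
}
totals = cumulative_importance(scores)
least_informative = min(totals, key=totals.get)
print(totals, least_informative)
```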

Comparison of early and post-season mapping
The optimal strategy was used to map winter wheat at the EIT, and the usability of the early map was evaluated. The early- and post-season maps were compared. Except for the end time of the time series, the mapping parameters of the two maps were the same: the time series ended at the EIT (19 December 2019) for early mapping and on 6 June 2020 for post-season mapping. As shown by these two results (Figure 9(a,b)), the distribution of winter wheat is almost the same. The OA of early mapping was 3 percentage points lower than that of post-season mapping, and the F1 score for winter wheat was also 3 percentage points lower (Table 5 and Table S1). Early mapping requires a balance between timeliness and accuracy, and an OA of 91% five months before harvest is acceptable.
The early map had more uncertainty than the post-season map (Figure 9(c)). The best early map performance was observed in the middle and east of the study area, which can be explained by the uneven distribution of the samples. Six typical areas (Figure 9(d-i)) were selected for analysis. Three show commission errors (regions d, e, and f), and the other three show omission errors (regions g, h, and i). Correctly identified and misclassified points were selected for comparison in each region. The early season and post-season NDVI curves were drawn for analysis, and red, green, and blue (RGB) composite images of different periods were used for visual comparison.
The main reasons for the commission errors (e.g. points A, C, and E) are that the early spectral characteristics of some crops (e.g. garlic) are similar to those of winter wheat (regions d, e, and f) (Figures 10, S1, and S2) and that samples are missing (regions d and f). It is worth noting that in region e, although the samples are evenly distributed, garlic is planted over a large area, and it has spectral characteristics very similar to those of winter wheat. The differences in the early spectral characteristics of winter wheat and some other crops were relatively small (Figures 10(m), S1m, and S2m). Although commission errors can be reduced by adding samples, it is important to find features that can better differentiate winter wheat from garlic at an early stage.
The primary reason for the omission errors (such as points G, I, and K) is that the growth of some winter wheat was significantly weaker than that of standard wheat in the early growing stage (Figures S3, S4, and S5), which was caused by factors such as natural conditions and planting management. Specifically, region g is in northern Henan, where the phenology is slightly later. Late rice planting in the previous season (harvest time around early November) led to the late sowing of winter wheat. Region h is located in a mountainous area, and it is easily affected by low temperatures and chilling damage. According to the Henan Meteorological Service (http://ha.cma.gov.cn/), winter wheat in southern Henan was planted about 10 days later than that in central Henan, and it suffered from drought during the emergence and tillering periods as well as insufficient irrigation, which seriously affected its growth. In addition, winter wheat at the edge of the field is often missed, which is also caused by weak growth in the early stages.
The post-season NDVI profile during the growing season was further analyzed, and most identification errors were amended during the green-up and jointing periods (17 February 2020 to 7 April 2020). Correspondingly, after the jointing period, the OA of S Ⅶ remains stable (Figure 6). Winter wheat may grow rapidly in these two periods, and its spectral difference from non-wheat crops increases rapidly. Simultaneously, winter wheat with weaker growth in the early stage gradually returns to a normal growth situation. For the commission errors, taking region d as an example, the early NDVI curves of points A and B were relatively similar (Figure 10(m)), whereas the NDVI value of point A was significantly reduced during the early green-up period (Figure 10(n)). Point C in Figure 10(j) became bare land and did not match winter wheat phenology. It is worth noting that the NDVI values of points C and D (region e) did not show a large difference until around 27 April 2020 (milk stage), when the NDVI values differed by approximately 0.2 (Figure S1n). For the omission errors, taking region g as an example, the early NDVI curves of points G and H were very different (Figure S3m), but in the green-up and jointing periods, the NDVI value of point G caught up with that of point H (Figure S3n). A comparison of Figures S3m and S3n shows that winter wheat at point G was not sown until mid-November, and the growth of winter wheat at point G did not match that at point H until the early green-up period (Figure S3).

Limitations and expansions of early mapping methods
The early winter wheat mapping method proposed in this study achieved satisfactory results. This method can extract winter wheat planting areas five months before harvest; however, it also has limitations. Machine learning algorithms, such as RF classifiers, are primarily data-driven and highly dependent on samples. The quality, spatial distribution, and number of samples in each category all contribute to classification uncertainty. Labeling errors and sample position offsets in field sampling reduce sample quality. Crop phenology and growth differ across the study area, so the samples should be distributed as evenly as possible throughout it. An imbalance in the number of samples of each land-cover type will bias the classification model. RF classifiers also have certain limitations. RFs require manual feature engineering and pre-definition of crop growth cycles, whereas deep neural networks can overcome this problem and generate high-dimensional features. The early mapping of crops using deep neural networks is therefore a potential improvement. For example, Zhong, Lina, and Zhou (2019b) used an enhanced vegetation index time series for crop mapping. They found that a deep learning model based on one-dimensional convolution can achieve higher accuracy than XGBoost, RF, and support vector machine algorithms.
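One common way to reduce the effect of the sample imbalance noted above is to weight each class inversely to its frequency. The sketch below computes such weights with the heuristic n_samples / (n_classes × n_class_samples); it is an illustration only and is not described as part of the method used in this study. The class names and sample counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical sample set with an uneven number of samples per land-cover type.
labels = ["winter_wheat"] * 600 + ["other_crop"] * 300 + ["non_crop"] * 100

weights = balanced_class_weights(labels)
for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Rare classes receive larger weights, so misclassifying them costs more during training. Such a dictionary could be passed to a classifier that accepts per-class weights.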
The early mapping method developed in this study can be applied to winter wheat in other regions. If the samples are in the target area, the early mapping framework can be used to obtain model variables that are suitable for the target area and perform early winter wheat mapping based on them. If the target area has no samples, the classifier transfer method can be used to map winter wheat (Wen et al. 2022). However, classifier transfer often produces errors owing to differences in remote sensing observations, environment, and management measures implemented by farmers. Climate variables (air temperature, precipitation, and soil moisture) and geographic information (elevation and aspect) can be introduced into the classification model, which enables the model to learn the differences in different climatic conditions to reduce classification errors.
The early mapping framework proposed in this study can serve as a reference for the early mapping of other crop types. In this framework, the elements that differ most between crops are crop phenology, crop samples, remote sensing data sources, and input feature combinations, and each of these must be considered when applying the framework to other crop types. First, the phenology of the crops in the target area is determined. Then, the appropriate time to collect samples from the target area is selected to obtain representative ground samples. Finally, the appropriate remote sensing data source is selected, the growth characteristics of the crop are studied, and the appropriate model input feature combination is applied.

Conclusions
A framework was developed for the early mapping of winter wheat in Henan province using spectral and temporal characteristics. First, we compared three input feature schemes and derived the optimal scheme (LSWI, GCVI, red edge 2, IRECI, green, red edge 1, red, and NDVI), which was better than the schemes that use only spectral bands or only spectral indices. Then, based on the obtained input features, we compared three time intervals and found that the 10-day interval achieved the best performance and that the EIT was delayed as the time interval increased. Finally, based on the determined optimal strategy, we concluded that the final EIT of winter wheat was in the early overwintering period (five months before harvest). The identification OA was 0.91, and the winter wheat F1-score was 0.91. This study also explains the improved mapping performance in terms of EIT from a spectral perspective and evaluates the importance of different features for classification in multiple periods. In addition, compared with post-season mapping (OA of 0.94 and winter wheat F1-score of 0.94), early mapping may produce misclassification, probably owing to uneven sample distribution, planting management, and natural conditions. However, with the rapid growth of winter wheat during the green-up and jointing periods, most errors can be gradually amended. The results show that this framework enables large-scale early mapping of winter wheat. The method can also be applied to the remote sensing identification of other crops and hence can serve as a reference for the early mapping of other crops worldwide.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was supported by the National Key Research and Development Program of China (project No. 2019YFA0607401).

Data availability statement
Due to the nature of this research, the participants of this study do not consent to share data publicly, and therefore, supporting data are not available.