Reconstructed NDVI and EVI datasets in China (ReVIChina) generated by a spatial-interannual reconstruction method

ABSTRACT Remote sensing-based vegetation index (VI) data are significantly impacted by cloud contamination. Spatiotemporal reconstruction methods demonstrate higher accuracy than temporal reconstruction methods. However, the computing time and random access memory (RAM) consumption of these spatiotemporal reconstruction methods for large-scale reconstruction remains unclear. In this study, a method called spatial-interannual reconstruction (SIR) was proposed to reconstruct cloud-contaminated pixels in MODIS normalized difference VI (NDVI) and enhanced VI (EVI) data. SIR has four major advantages: (1) High accuracy. The average mean absolute error of SIR was 0.0338, which was 20.2% and 23.4% lower than that of two state-of-the-art spatiotemporal reconstruction methods (i.e. interpolation of the mean anomalies (IMA) and Gapfill). (2) High computing speed. The average computing time of SIR was 99.7% and 98.8% lower than IMA and Gapfill, respectively. (3) Low RAM consumption. (4) Simultaneous reconstruction of all invalid values. Reconstructed 250 m spatial resolution and 16-day composite NDVI and EVI datasets in China from 2000 to 2022 (written as ReVIChina) were developed based on the SIR method and MODIS MOD13Q1 data. Spatiotemporal analyses revealed that the reconstructed datasets were more reliable than the original product and a similar dataset.


Introduction
Vegetation plays a critical role in the Earth's system.Existing research indicates a global greening trend in vegetation, which profoundly impacts vegetation productivity and the carbon cycle, water cycle and distribution, climate change and surface energy balance (Chen et al. 2019a;2019b;Forzieri et al. 2020;Piao et al. 2019;Zeng et al. 2018).Remote sensing is the most effective method for monitoring vegetation dynamics.Some remote sensing vegetation index (VI) products, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) normalized difference VI (NDVI) and enhanced VI (EVI), have been widely utilized in vegetation dynamics and related research (Cazorla et al. 2023;Dalagnol et al. 2023;Ims et al. 2019).
However, the quality of satellite-derived VI data is significantly affected by cloud contamination.Despite the use of the maximum value composite method in many VI products (e.g.MODIS and Global Inventory Monitoring and Modeling Studies (GIMMS)) to mitigate the impact of cloud contamination, residual effects persist widely, particularly in areas with frequent cloud cover (Li et al. 2021;Liu et al. 2017).For example, Liu et al. (2017) discovered a significant number of abnormally low values in MODIS NDVI products in humid areas due to cloud contamination.
To address this issue, various methods have been developed to reconstruct cloud-contaminated VI.Early methods primarily utilized temporal information (i.e.information from neighbouring dates), such as the Savitzky-Golay filter (Chen et al. 2004), the Double Logistic function (Beck et al. 2006), the Fourier-based method (Jakubauskas, Legates, and Kastens 2001), the Asymmetric Gaussian function (Jönsson and Eklundh 2002) and others (Kong et al. 2019;Yang et al. 2015).These methods are relatively simple and have been widely employed in vegetation dynamics and related research (Choler et al. 2021;Hilker et al. 2014;Tian et al. 2021).More recently, spatiotemporal reconstruction methods that incorporate both spatial and temporal information (i.e.information from neighbouring pixels and dates) have been widely developed (Cao et al. 2018;Chen et al. 2021;Chu et al. 2021;Gerber et al. 2018;Huang et al. 2021;Li et al. 2023;Militino et al. 2019;Yang et al. 2022).These methods have demonstrated higher accuracy compared to methods that solely rely on temporal information.For example, Cao et al. (2018) proposed a method that integrates spatiotemporal information with the Savitzky-Golay filter, achieving higher accuracy than the Savitzky-Golay filter, the Asymmetric Gaussian function, the Double Logistic function and the Fourier-based method.Chu et al. (2021) introduced a method that utilizes spatial, intra-annual and interannual information to reconstruct NDVI.The results showed that this method performs better than four temporal filtering methods and the method developed by Cao et al. (2018).Li et al. (2023) developed a spatiotemporal prefill method with harmonic analysis of time series (ST-HANTS) to reconstruct NDVI.ST-HANTS significantly outperformed the Savitzky-Golay filter, wavelet transform, HANTS and data assimilation.
Despite significant progress being made in reconstructing cloud-contaminated VI, two limitations still exist.First, the computing time and random access memory (RAM) consumption of these spatiotemporal reconstruction methods for large-scale reconstruction remains unclear.Previous studies have not assessed the computing time and RAM consumption (Cao et al. 2018;Chen et al. 2021;Gerber et al. 2018;Huang et al. 2021;Li et al. 2023) or assessed the computing time in small regions (Chu et al. 2021;Militino et al. 2019;Yang et al. 2022).Second, there is a lack of large-scale, long time series reconstructed VI datasets produced using spatiotemporal reconstruction methods with high-precision.Gao et al. (2023) developed a monthly reconstructed NDVI dataset with 250 m resolution in China.However, Gao et al. (2023) first used a spatial reconstruction method to reconstruct NDVI and then used the Savitzky-Golay filter to smooth the reconstructed NDVI.Both the spatial reconstruction method and the Savitzky-Golay filter have been proven to have lower accuracy than spatiotemporal reconstruction methods (Cao et al. 2018;Gerber et al. 2018;Li et al. 2023;Liu et al. 2019;Militino et al. 2019).Using low-quality datasets produced by low-precision methods may result in unreliable results.
Considering the aforementioned limitations, this study proposes a reconstruction method that simultaneously has high accuracy, high computing speed and low RAM consumption.Subsequently, 250 m spatial resolution and 16-day composite reconstructed NDVI and EVI datasets for China from 2000 to 2022 were developed using this method and MODIS MOD13Q1 data.In the following text, Section 2 introduces the study area and data.Section 3 details the preprocessing and reconstruction methods.Section 4 presents the results, including the accuracy and computing costs of the proposed method, as well as the developed reconstructed NDVI and EVI datasets.
Section 5 discusses the proposed method and developed datasets.Finally, Section 6 summarizes this study.

Study area and data
China, the largest developing country in the world (Figure 1), ranks second in terms of population and third in terms of land area.Across China, from the southeast to the northwest, a general decrease in NDVI can be observed (Figure 1(a)).This is accompanied by a gradual transformation of climate types from hot and humid to cold and dry, and a shift in dominant land cover types from forests to deserts.China is a hot research area for vegetation dynamics (Song, Feng, and Wang 2022;Yao et al. 2019;Zheng et al. 2019).Over the past few decades, China has exhibited the most prominent greening trend globally (Chen et al. 2019a;Piao et al. 2019;Zhang et al. 2017).
MODIS MOD13Q1 NDVI and EVI data (version 6.1, 250 m spatial resolution and 16-day composite) from February 2000 to December 2022 were used in this study.The NDVI is one of the most widely used vegetation indices and is sensitive to chlorophyll variations.The EVI was designed to enhance sensitivity in densely vegetated areas and mitigate atmospheric influences (Huete et al. 2002;Xue and Su 2017).The MOD13Q1 product incorporates quality assurance flags that provide information about the quality of each pixel (Huete et al. 2002).

Preprocessing
Pixels with quality other than 'good' in the MOD13Q1 NDVI and EVI data were set as invalid according to previous studies (Cao et al. 2018;Gerber et al. 2018).However, this filtering approach may unintentionally exclude some values that appear to be correct, and lead to an increase in computing time.To address this issue, the following preprocessing steps were employed to retain more pixels.
(1) Pixels with a multiyear average growing season (from Apr. to Oct.) NDVI lower than 0.1 were considered non-vegetated areas, primarily consisting of deserts and water bodies.The NDVI values of these pixels were directly set to 0.1, irrespective of their quality.The threshold of 0.1 aligns with previous studies (Liu et al. 2020;Zhao, Dai, and Dong 2018).The threshold for EVI was set to 0.067, which corresponds to the percentile of 0.1 in NDVI.
(2) Pixels with a multiyear average NDVI (EVI) on the same date lower than 0.1 (0.067) (mainly representing withered vegetation in winter) were directly set as 0.1 (0.067), regardless of their quality.
(3) In the MODIS VI products, there are pixels labelled as other quality, but their values appear to be correct (Figure 2).Therefore, the pixels labelled as other quality but with a VI higher than 80% of the multiyear average VI on the same date, were considered good quality and retained.(4) Pixels with an NDVI (EVI) lower than 0.1 (0.067) were directly set as 0.1 (0.067).The reason for using this step is to prevent the impact of extremely low values (e.g.negative values in some water bodies) on the reconstruction of the VI.

Reconstruction of the VI
In this study, a method was proposed to reconstruct cloud-contaminated VI.Because this method uses spatial and interannual information, it is called spatial-interannual reconstruction (SIR).A schematic of the SIR is shown in Figure 3, consisting of the following three steps.
(1) Calculating a multiyear average image.To reconstruct a target image, a multiyear average image on the same date was calculated.Only VI data with good quality or those retained in the preprocessing steps were considered for the calculation of the multiyear average image.If all the VI data were invalid (i.e.labelled as other quality and not retained in the preprocessing steps), the average of all the invalid VI data was used.Consequently, the multiyear average image did not contain any missing or invalid values.
(2) Defining a spatial window.For a target pixel in the target image to be reconstructed, a spatial window with an initial size of 11 × 11 pixels was established.If there was more than one valid VI data point within the spatial window in the target image, the VI data from both the target and multiyear average images within the spatial window were extracted to reconstruct the target pixel.Otherwise, the size of the spatial window was increased from 11 × 11-31 × 31, 111 × 111, 431 × 431, and so on until the aforementioned condition was met.The interval of size of the latter special window was set as four times that of the previous special window, to reduce the computing time needed for filling large gaps.(3) Reconstructing the target pixel.The VI of the target pixel was reconstructed using Equation (1): where VI x,t,1 is the reconstructed VI of the target pixel.x and y are the target pixel and a pixel in the spatial window, respectively.t and m represent the target image and multiyear average image, respectively.VI x,m and VI y,m are always valid, because the multiyear average image does not contain any invalid values.Therefore, if VI y,t is valid, VI x,t,1 can be reconstructed.This equation is based on the hypothesis that the difference in VI between x and y in the target image is equivalent to the multiyear average image.Within the spatial window, each valid VI can be utilized to reconstruct a VI value for the target pixel.The VI of the target pixel was finally computed as the weighted average of all the reconstructed VI values.The weight was calculated as: where D 2 x,n is the square of the distance between the target pixel x and pixel n.The power of D x,n was set as 2, to reduce the computing time.Distant pixels were given small weights, because closer pixels tend to be more similar.The formula on the right side of the multiplication sign in the denominator represents the difference in VI between the target pixel and pixel n  1. NDVI (EVI) values below 0.1 (0.067) were directly set to 0.1 (0.067).To further refine the time series of the VI data, a Savitzky-Golay filter (Savitzky and Golay 1964) was applied for smoothing.The use of an upper envelope was not necessary, as low outliers were already reconstructed.The R code for the SIR method is available from the authors.

Accuracy validation
The accuracy of VI reconstruction was validated in the North China Plain (NCP), Qinghai-Tibet Plateau (QTP) and Wuyi Mountains (WYM) (Figure 4).Each area is 200 × 150 km (or each image had 800 rows and 600 columns).These three areas were chosen to represent cropland/impervious surface mosaics, grassland and forest, respectively.Artificial gaps were introduced into the images during the growing season of 2015.These gaps were created with dimensions of 30 × 30 and 120 × 120 pixels, representing small and large cloud-contaminated areas, respectively.Subsequently, the SIR method was employed to fill these artificially generated gaps.Finally, the reconstructed VI data were compared with the original VI data.The mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R 2 ) were utilized to quantify the accuracy.
The performance of the SIR was compared with interpolation of the mean anomalies (IMA) (Militino et al. 2019) and Gapfill (Gerber et al. 2018).These methods were selected because: (1) they are spatiotemporal reconstruction methods and have shown higher accuracy compared to other reconstruction methods (Gerber et al. 2018;Militino et al. 2019).( 2) They can be easily implemented using R add-on packages (IMA: 'RGISTool'; Gapfill: 'gapfill').The accuracy validation and comparison were conducted using the preprocessed images.

Computational costs
The computing time of the three methods was compared in the WYM from 2 February 2003 to 10 June 2003 (a total of nine images).These specific images were chosen because the proportions of invalid values in them were evenly distributed between 0% and 100%.This selection allowed us to evaluate the performance of the different methods under varying proportions of invalid values.Subsequently, the computing time and RAM consumption of the SIR and Gapfill methods were tested for the entire region of China (the image has 16,179 rows and 19,381 columns) for 2015.The IMA method was not tested for the entire region of China because its RAM consumption approached the upper limit of the available RAM (128 gigabytes (GB)) on our computer when the image size exceeded 1100 rows and 1100 columns.These experiments were conducted using a computer equipped with an Intel Core i7-12700 Central Processing Unit (CPU) and 128 GB of RAM.

Accuracy of the VI reconstruction
The SIR method significantly outperformed IMA and Gapfill (Figures 5 and 6).For all cases, the SIR method consistently exhibited lower MAE and RMSE values, as well as higher R 2 values compared to the IMA and Gapfill methods.On average, the SIR method achieved an MAE of 0.0338, an RMSE of 0.0498, and an R 2 of 0.861.The average MAE of the SIR method was 20.2% and 23.4% lower than that of the IMA and Gapfill methods, respectively.The improvement in accuracy was particularly pronounced in the NCP region.To uncover the underlying reasons, we analyzed the intra-annual variations in the regional average NDVI and the MAE in the NCP region (Figure 7).The NDVI curves exhibited significant fluctuations, with two prominent peaks and valleys in each curve.This is because the dominant land cover type in the NCP region is cropland.Human activities, such as planting and harvesting, have a substantial impact on the NDVI values.The IMA and Gapfill methods, which rely on information from neighbouring dates, may introduce uncertainties.The MAEs of the IMA and Gapfill methods were relatively low on dates with drastic changes in NDVI.In contrast, the NDVI curves in different years exhibited similarity, and the NDVI values on the same dates were close.Consequently, utilizing interannual information proved to be more reliable than relying on intra-annual information in this particular case.The fluctuation of MAE in the SIR method was smaller than that of the IMA and Gapfill methods.In summary, the utilization of interannual information in the SIR method resulted in high accuracy within the NCP region.
The MAE and RMSE values for VI reconstruction were higher in the NCP than in the QTP and WYM (Figures 5 and 6).The average MAEs of the SIR method in the NCP, QTP and WYM were 0.0382, 0.0301 and 0.0332, respectively.This discrepancy can be attributed to the heterogeneity of the land cover types and the VI within the NCP region, making VI reconstruction more challenging.Additionally, the accuracy in filling the gaps of the 120 × 120 pixels was much higher than that for the gaps of the 30 × 30 pixels in the NCP region (Figures 5 and 6).For example, the average MAEs of the SIR method for filling the gaps of the 30 × 30 and 120 × 120 pixels in the NCP region were 0.0348 and 0.0415, respectively.This result is understandable, because distant pixels were used when filling the gaps of the 120 × 120 pixels.Distant pixels generally exhibit weaker correlations with the target pixel, thereby introducing higher errors in VI reconstruction, particularly in areas characterized by heterogeneous VIs and land cover types.In comparison, the average MAEs of the three methods for filling the gaps of the 30 × 30 pixels were similar to those for filling the gaps of the 120 × 120 pixels in the WYM and QTP regions.This can be attributed to the homogeneity of the VI and land cover type in these two regions.

Computing costs of the VI reconstruction
The SIR method exhibited a high computing speed (Table 1).On average, the computing time for the SIR method to reconstruct the images in the WYM was 13.89 s, which was 99.7% and 98.8% less than IMA and Gapfill, respectively.Most reconstructions using the SIR method were completed within 20 s.In comparison, Gapfill and IMA often took more than 5 min and 1 h to reconstruct images, respectively.In addition, when reconstructing images for all of China, the average computing time for the SIR method was 1433.22 s, with most reconstructions completed within 1 h (Table 2).In comparison, Gapfill often took more than 1 d to reconstruct images for the entire region of China.Due to the extensive computing time and RAM consumption, IMA was not tested for the entire region of China.It is important to note that the results were obtained using a single core of the CPU.Utilizing multiple cores in parallel could significantly increase the computing speed.For example, R software can support parallel computation of multiple windows, therefore, VI reconstruction can be easily run using multiple cores of the CPU in parallel.The reasons behind the high computing speed of the SIR method are analyzed in the discussion section.The RAM consumption of the SIR method was low (Table 2).The average RAM consumption for reconstructing NDVI images for the entire region of China in 2015 using the SIR method was 6.44 GB, which was 80.8% lower than that of Gapfill (33.61 GB).The differences in RAM consumption for reconstructing different images were small for both the SIR and Gapfill methods.With its low RAM consumption, the SIR method can easily be run using multiple cores of CPU.For example, with our computer equipped with 12 cores of CPU and 128 GB RAM, the SIR method could be efficiently run using 9 cores of CPU in parallel, with the CPU utilization rate approaching 100% and the RAM utilization rate below 70%.
The SIR method simultaneously reconstructed all of the cloud-contaminated pixels, similar to Gapfill (Tables 1 and 2).In contrast, IMA and many previous methods are unable to reconstruct all missing values simultaneously (Cao et al. 2018;Militino et al. 2019;Yang et al. 2022), or they require numerous complex steps for complete reconstruction (Li et al. 2018;Yao et al. 2023c).This suggests that the SIR method is a convenient and efficient approach.

Reconstructed NDVI and EVI datasets
The original MODIS MOD13Q1 NDVI and EVI data were preprocessed using the preprocessing methods described in Section 3.1.Subsequently, 250 m spatial resolution and 16-day composite reconstructed NDVI and EVI datasets were developed for China (written as ReVIChina) from 2000 to 2022 using the SIR method and preprocessed images.The ReVIChina NDVI data were compared to the original MOD13Q1 NDVI products and a monthly reconstructed NDVI product in Gao et al. (2023).Because the reconstructed NDVI product in Gao et al. (2023) is monthly composite, the original and ReVIChina NDVI data were averaged into a month for comparison.The WYM region was selected for comparison, because: (1) it has a humid climate, and the NDVI is strongly affected by cloud contamination.(2) The dominant vegetation type in the WYM region is forests, which are less affected by human activities.
Figure 8 presents the comparisons of the spatial distributions of the three NDVI datasets in the WYM region in 2002.It can be clearly seen that the original NDVI data were significantly affected by cloud contamination.The NDVI significantly decreased from April to May.The effect of cloud contamination on the NDVI was not completely removed by Gao et al. (2023).In contrast, the spatial distributions of ReVIChina NDVI in different months were similar.Low NDVI values were generally observed in impervious surfaces and rivers.These results indicate that the impact of cloud contamination on VI has been effectively eliminated in ReVIChina.
Figure 9 illustrates the intra-annual comparisons of the three NDVI datasets in the WYM region in 2002, 2005, 2007 and 2015.It is evident that the original VI curves exhibited numerous abnormally low outliers caused by cloud contamination.The impact of the cloud contamination on the regional average NDVI exceeded 0.3.The effect of the cloud contamination on the NDVI was also evident in Gao et al. (2023).In comparison, the ReVIChina NDVI curves were relatively smooth, with few extreme fluctuations.Additionally, there were strong correlations between the ReVIChina NDVI data in different years.For example, the Pearson's correlation coefficient between the ReVIChina NDVI in 2002 and 2005 was 0.955.In contrast, the Pearson's correlation coefficients for the original NDVI and the NDVI in Gao et al. (2023) were 0.622 and 0.919, respectively.These findings indicate that the quality of the VI data was significantly improved by the SIR method.
Figure 10 shows the reconstructed 23-year average growing season VI and the difference between the original and reconstructed 23-year average growing season VI.In the forests in southern China, the original VI was significantly lower than the reconstructed VI, indicating that the cloud contamination had a substantial impact in these areas.This can be attributed to two factors: (1) Southern China has a humid climate.VI is more frequently affected by cloud contamination.(2) Forests generally have high VI, thus leading to a large discrepancy between the cloud-contaminated and reconstructed VI values.In contrast, the difference between the original and reconstructed VIs was minimal in northern China due to its arid climate and generally low VI values.

The SIR method
In this study, a spatiotemporal reconstruction method called SIR was proposed to reconstruct cloud-contaminated VI data.The SIR method was shown to have high accuracy, high computing speed, and low RAM consumption, and could simultaneously reconstruct all the cloud-contaminated pixels.These four major advantages of the SIR method are discussed in detail below.
First, the SIR method had high accuracy.One of the reasons for its high accuracy is the utilization of interannual information, which enhances the reconstruction accuracy in the NCP (as discussed in Section 4.1).Additionally, SIR employs a weighted average of all reconstructed VI values as the final reconstructed VI (as described in step 3 of Section 3.2).This approach reduces the impact of outliers and improves the accuracy of VI reconstruction.
Second, the SIR method demonstrated high computing speed due to its well-designed framework.The main reasons for the high computing speed of the SIR include the following: (1) the interval of the size of the spatial window increases significantly (as explained in step 2 in Section 3.2) instead of remaining constant as seen in methods such as Gapfill and some previous approaches (Gerber et al. 2018;Li et al. 2018;Zeng et al. 2015).This strategy reduces the computing time needed for filling large gaps.( 2) SIR utilizes only one related image (the multiyear average image), which is significantly fewer compared to previous approaches (Gerber et al. 2018;Li et al. 2018;Sun et al. 2017;Yao et al. 2021).By reducing the number of related images, the computing time is decreased.(3) All the steps in the SIR method employ simple and fast mathematical operations, such as addition, subtraction, division, and averaging.SIR avoids complex and time-consuming operations.The lengthy computation time of the IMA method is attributed to its use of a thin plate spline method for interpolation, which is time-consuming (Militino et al. 2019).The computing time of the Gapfill is long because it ranks each image based on valuewise comparisons between all images (Gerber et al. 2018).
Third, the SIR method exhibited low RAM consumption.The reasons for the low RAM consumption of SIR are as follows: (1) SIR utilizes only one related image (the multiyear average image) to reconstruct the target image.Consequently, only two images (the target image and the multiyear average image) need to be read during the reconstruction process.(2) All the steps in the SIR method involve simple mathematical operations that do not require additional RAM allocation.
Fourth, the SIR method can simultaneously reconstruct all cloud-contaminated pixels, as can the Gapfill (Tables 1 and 2).This is because: (1) the related image does not contain any invalid values, and (2) there is no upper limit set for the size of the spatial window.In comparison, IMA and many previous methods cannot simultaneously reconstruct all missing values (Cao et al. 2018;Militino et al. 2019;Yang et al. 2022), or require many complex steps (Li et al. 2018;Yao et al. 2023c).The main reason for this is that the related images used in these methods contain invalid values.Overall, SIR is a robust reconstruction method with high accuracy and low computing costs.The main difference between SIR and previous methods is that SIR has been proven suitable for largescale reconstruction.
It is important to note that the SIR method is specifically designed for reconstructing VI data.It operates based on the hypothesis that the VI difference between closely spaced pixels in the target image is consistent with the multiyear average image (refer to step 3 in Section 3.2).However, this hypothesis may not hold true when reconstructing other types of data, such as land surface temperature and snow cover.This is because the spatial distribution of land surface temperature and snow cover can be influenced by various factors and can vary significantly across different years.For instance, land surface temperature can be affected by soil moisture, while snow cover can be influenced by snowfall.Therefore, SIR cannot be directly applied to reconstruct these types of data.Some modifications, such as adjusting the calculation method of related images, would be necessary when utilizing the SIR method for reconstructing such data.

The reconstructed NDVI and EVI datasets
This study developed 250 m spatial resolution and 16-day composite NDVI and EVI datasets in China from 2000 to 2022.The cloud-contaminated pixels were reconstructed using a high-precision spatiotemporal reconstruction method called SIR.The reconstructed NDVI and EVI datasets exhibited significantly higher quality compared to the original MODIS MOD13Q1 product (refer to Section 4.3).
Remote sensing VI data play a crucial role in various fields.First, it is extensively used to analyze vegetation dynamics and explore the relationships between vegetation and factors such as climate, hydrology, and human health (Geng et al. 2020;Liu et al. 2022;Yang et al. 2021).The improved quality of the developed VI datasets can enhance the reliability of conclusions drawn from these studies.Second, remote sensing VI data serve as an essential input for estimating crop yields, drought indices, groundwater recharge, aboveground biomass, and other parameters (Li et al. 2019;Parizi et al. 2020;Xie and Fan 2021;Zhang et al. 2021;2023).

Conclusions
In this study, a method called SIR was proposed to reconstruct cloud-contaminated VI data.SIR has high accuracy, high computing speed, and low RAM consumption, and can simultaneously reconstruct all cloud-contaminated pixels.
The accuracy of SIR was validated using simulated cloud-contaminated areas.The average MAE of SIR was 0.0338, which was 20.2% and 23.4% lower than the MAE of the other two methods tested.The average computing time of SIR was 99.7% and 98.8% lower than that of the other two reconstruction methods tested in the WYM.For reconstructing NDVI images across all of China, the average computing time and RAM consumption of SIR were 1433.22 s and 6.44 GB, respectively.The SIR method, due to its low computing costs, is suitable for large-scale reconstruction.The low computing costs of the SIR method can be attributed to its well-designed framework such as a reduced number of related images and simple mathematical operations.The 250 m spatial resolution and 16-day composite reconstructed NDVI and EVI datasets in China from 2000 to 2022 were developed based on the SIR method (available at https://doi.org/10.5281/zenodo.7988989and https://doi.org/10.5281/zenodo.7979989,approximately 220 GB) (Yao et al. 2023a;2023b).Spatiotemporal analyses demonstrated that the reconstructed datasets were more reliable than the original product and a similar dataset.
The developed reconstructed VI dataset can serve as essential input data in various fields such as ecology, hydrology, meteorology, agriculture, and forestry.Future studies can explore its applicability for reconstructing other types of data, such as land surface temperature and snow cover.

Figure 1 .
Figure 1.The study area of this study.The background maps display the 23-year average growing season reconstructed normalized difference vegetation index (NDVI) and the China land cover dataset (CLCD) in 2021 (Yangand Huang 2021).Note that the lower limit of the NDVI was set to 0.1 in the postprocessing steps.

Figure 2 .
Figure2.The NDVI, the quality of the NDVI, the 23-year mean NDVI and the land cover in an area of the Wuyi Mountains (WYM).Note that the lower limit of the NDVI was set to 0.1 in the preprocessing steps.

Figure 3 .
Figure 3. Schematic diagrams of the SIR method.

Figure 5 .
Figure 5.The accuracy of the three methods in filling the gaps of 30 × 30 pixels.IMA: Interpolation of the mean anomalies.MAE: mean absolute error.RMSE: root mean square error.R 2 : coefficient of determination.

Figure 6 .
Figure 6.The accuracy of the three methods in filling the gaps of 120 × 120 pixels.

Figure 7 .
Figure 7. Intra-annual variations in the regional average NDVI from 2014 to 2016 and the MAE in 2015 in the NCP region.

Figure 8 .
Figure 8. Comparisons of the spatial distributions of the original NDVI, the reconstructed NDVI in Gao et al. (2023) and the ReVI-China NDVI in the WYM.

Figure 9 .
Figure 9. Comparisons in the intra-annual variations of the original NDVI, the reconstructed NDVI in Gao et al. (2023) and the ReVIChina NDVI in the WYM.

Figure 10 .
Figure 10.The reconstructed 23-year average growing season VI (first row) and the difference between the original and reconstructed 23-year average growing season VI (second row).

Table 1 .
The computing time (unit: second) for the NDVI reconstruction and the proportion of valid values before and after the reconstruction in the WYM (800 rows and 600 columns).

Table 2 .
The computing time (unit: seconds) and random access memory (RAM) consumption (unit: GB) of the SIR method for the NDVI reconstruction in all of China (16,179 rows and 19,381 columns) in 2015.