A satellite-derived, ground-measurement-independent monthly PM2.5 mass concentration dataset over China during 2000–2015

ABSTRACT With the accelerated development of urbanization and industrialization, atmospheric particulate matter has become a significant threat to public health globally. Environmental health studies usually use the mass concentration of fine particles (PM2.5) as base data to predict the health risks of particulate exposure. However, PM2.5 data from ground monitoring stations in China were not provided by the Ministry of Environmental Protection of China until January 2013. Hence, an alternative dataset of PM2.5 spatiotemporal distributions extending to years earlier than 2013 is urgently needed, and is of great significance to atmospheric environment assessment and pollution prevention and control. Atmospheric aerosol products from the moderate-resolution imaging spectroradiometer (MODIS) have been released since 2000, which makes it possible to reconstruct historical PM2.5. However, most current methods cannot estimate PM2.5 mass concentration independently of ground observations. The PM2.5 mass concentration dataset produced by the physically based PM2.5 remote sensing (PMRS) model does not depend on ground observations, and is also not affected by the uncertainty of model emission sources or the completeness of chemical reaction mechanisms. This makes point-by-point validation of the PM2.5 mass concentration data more convincing, and the dataset can also be used for model assimilation and artificial intelligence training to improve their predictions. In this study, we calculate the monthly near-ground PM2.5 mass concentration over land in China using MODIS aerosol inversion products (aerosol optical depth and fine-mode fraction) and meteorological data (boundary layer height and relative humidity) provided by the Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2) dataset.
The results show that China has six pollution centers, mainly concentrated in the central and eastern regions. The highest PM2.5 mass concentration occurred in winter, whereas the polluted area was larger in summer. Of the validation sites, 63.4% have biases within ±20 μg m−3, and the expected error envelope of the monthly mean PM2.5 mass concentration is ±(15 μg m−3 + 30%). The monthly PM2.5 data are stored in NetCDF format with a spatial resolution of 1°×1°. The published data are available at http://www.dx.doi.org/10.11922/sciencedb.j00076.00061.


Introduction
As pollution by fine particulate matter (PM2.5) in the atmosphere becomes more serious, its harm to human health is becoming increasingly apparent. With a growing body of multidisciplinary research on pollution prevention and human exposure, the demand for PM2.5 products is increasing. Satellite remote sensing offers wide spatial coverage and strong objectivity for particle monitoring. Over the past 20 years, satellite remote sensing estimation of PM2.5 has gone through three major stages of exploration, giving rise to three mainstream families of algorithms: physical formula, multivariate statistical regression, and satellite-model coupling methods.
Immediately after the moderate-resolution imaging spectroradiometer (MODIS) sensor was launched into the space, validation of its aerosol products using the aerosol robotic network (AERONET) observation began. With the advancement of validation, Chu et al. (2003) found that daily average AOD from AERONET had a good correlation with 24-hour averaged PM 10 . If the particle vertical profile was available, the inversion of atmospheric particles concentration from satellite remote sensing would become possible. In the same year, Wang and Christopher (2003) evaluated the correlation between AOD from MODIS and PM 2.5 from United States Environmental Protection Agency (EPA). They pointed out that despite the influence of ambient humidity and aerosol layer height, AOD and PM 2.5 still showed a good correlation. These two studies jointly pointed out that the aerosol optical information retrieved by satellite remote sensing can be expected to be used to estimate the particulate mass concentration near ground. Therefore, a direct linear correlation between AOD and atmospheric particulate matter was established using a basic statistical regression method. The accuracy of the estimated particle mass concentration obtained by this method is limited. Engel-Cox, Holloman, Coutant, and Hoff (2004) used MODIS satellite observations to analyze the correlation between AOD and PM 2.5 over the United States. They found that the correlation strongly depended on the temporal and spatial changes and was also related to aerosol types. Further research showed that the correlation between PM 2.5 and AOD integrated by lidar in the boundary layer was better than that with the entire atmospheric AOD (Engel-Cox et al., 2006), indicating that the aerosol vertical distribution has an important influence on this correlation. Gupta et al. (2006) evaluated the correlation between AOD and particulate matter in typical sites in Europe, the United States, and Asia. 
They reported that this correlation was affected not only by vertical distribution but also by ambient humidity. However, the above methods remained simple statistical models until Koelemeijer, Homan, and Matthijsen (2006) proposed a more complete theoretical derivation, which laid the foundation for the physical method of remote sensing of particulate matter. Although Koelemeijer et al. (2006) obtained a formula to calculate particulate mass concentration, it could not yet be used to estimate the near-ground particle concentration directly. The reason is that, even though the vertical distribution, hygroscopic growth and effective density of particulate matter can be handled by appropriate model assumptions, there remains an unknown parameter (called S hereafter) related to the particle characteristics (particle size distribution, complex refractive index, etc.), which prevents the physical model from directly estimating near-ground particulate matter. This is also an important reason why physical methods were set aside while other methods developed rapidly.
The statistical method, represented by the research of Liu, Paciorek, and Koutrakis (2009), has been widely used in remote sensing estimation of particulate matter near the ground. They estimated PM2.5 mass concentration through a two-stage model using AOD, in-situ PM2.5 and a series of auxiliary data. Other popular statistical methods (Brokamp, Jandarov, Hossain, & Ryan, 2018; Fang, Zou, Liu, Sternberg, & Zhai, 2016; He & Huang, 2018; Hu et al., 2017; Ma, Hu, Huang, Bi, & Liu, 2014; Ma et al., 2016; Shen, Li, Yuan, & Zhang, 2018; Wei et al., 2019) include mixed effects models, random forest models, and deep learning. A major advantage of these methods is that the models are directly constrained by ground observations, so the uncertainty of the estimated particle mass concentration can be greatly reduced. However, because the physical meaning of the parameters is weakened in statistical methods, the causes of outliers cannot be analyzed. Furthermore, statistical models do not work as well in areas where ground observations are sparse.
Another typical method is the combination of an atmospheric chemistry transport model and satellite remote sensing, as represented by Van Donkelaar et al. (2006, 2010, 2012). This method has a great advantage when estimating the spatial distribution of PM2.5 with interannual variation, owing to the robustness of model predictions on long time scales. However, it is limited by the chemistry scheme and emission inventory, and its accuracy is relatively low on short time scales (daily or instantaneous observations).
Although the earlier physical methods have significant defects, a few scholars have continued to study them in depth. Raut, Chazette, and Fortain (2009) calculated the unknown parameter S based on mobile lidar observations, and also took the influence of aerosol vertical distribution and hygroscopic characteristics into account. However, this method was developed only for the wavelength of 355 nm and the aerosol type had to be set manually, which became an obstacle to its application. Kokhanovsky, Prikhach, Katsev, and Zege (2009) proposed using the Ångström exponent (AE) to parameterize the effective radius of particles in order to solve the problem of the unknown parameter S. However, the parameterized relationship for the aerosol effective radius was established using only a single typical aerosol model, which reduced the spatiotemporal generality of their method. Subsequently, some scholars (Lin et al., 2015; Wang, Xu, Spurr, Wang, & Drury, 2010b) obtained the spatial distribution of S within a certain period, which cannot easily be extended outside that period. Zhang and Li (2015) combined the advantages of these earlier methods, including that of Kokhanovsky et al. (2009), and established a semi-empirical physical model in which the volume-to-extinction ratio (VE_f) was parameterized by the fine-mode fraction (FMF) to quantify S for various typical aerosol models. Zhang et al. (2020) then developed a regional hygroscopic growth function to obtain the hygroscopicity of particulate matter for different aerosol types. This method can quickly obtain the PM2.5 mass concentration near the ground with acceptable accuracy (Li et al., 2016), which makes it suitable for satellite remote sensing. The semi-empirical physical model also has disadvantages: the uncertainty of the aerosol vertical distribution limits the accuracy of PM2.5 estimates, and error propagation is difficult to control because multiple correction schemes are combined.
At present, Van Donkelaar, Martin, Li, and Burnett (2019) and Wei et al. (2019) have released PM2.5 datasets generated, respectively, by a chemical transport model based on GEOS-Chem (http://fizz.phys.dal.ca/~atmos/martin/?page_id=140) and by a random forest algorithm (https://doi.org/10.5281/zenodo.3753614). Both datasets provide annual PM2.5 mass concentrations. The former describes the distribution of PM2.5 at the global scale well, but its accuracy is relatively low in urban areas, while the latter is the opposite. In this research, a monthly PM2.5 dataset over China based on satellite remote sensing is produced using the semi-empirical physical method developed by Zhang and Li (2015), with the aim of obtaining a reasonable dataset at all scales.

The PM 2.5 Remote Sensing (PMRS) model
In this study, we used the PMRS model to establish a long-term dataset of PM2.5 mass concentration. The model aims to bridge the gap between remote sensing and PM2.5 observation. PM2.5 is defined as the mass concentration of dry particles near the ground with aerodynamic diameter less than 2.5 μm, whereas the aerosol optical depth (AOD) from remote sensing describes the total extinction of ambient particles in the atmospheric column. To bridge this gap, a series of corrections needs to be performed.
First, AOD_f, the contribution of fine particles to AOD, is obtained from the fine-mode fraction (FMF):

$$\mathrm{AOD}_f = \mathrm{AOD} \times \mathrm{FMF} \qquad (1)$$

where FMF is the ratio of AOD_f to AOD and the subscript f denotes fine-mode particles. Next, we define a columnar volume-to-extinction ratio of fine particulates (VE_f) to convert AOD_f to the columnar fine particle volume V_{f,column}:

$$V_{f,\mathrm{column}} = \mathrm{AOD}_f \times \mathrm{VE}_f \qquad (2)$$

where VE_f is the key parameter linking columnar optical parameters to particle microphysical properties (i.e. volume). It has units of μm^3/μm^2, because the particle volume in the atmospheric column has units of μm^3/μm^2 and AOD_f is dimensionless. Further, aerosol particles are generally distributed in the aerosol layer near the surface. The columnar fine particle volume can be represented by the following formula:

$$V_{f,\mathrm{column}} = \int_{z_0}^{z_0+H} \int_{f} \frac{4}{3}\pi r^{3}\, n(r, z)\, dr\, dz \qquad (3)$$

where n(r, z) is the number concentration of ambient particles and the inner integral runs over the fine mode. To deal with the vertical distribution of aerosol particles from the ground (z_0, the ground altitude above sea level in km) to the top of the aerosol layer (H + z_0), the vertical profile is normalized by the particle concentration near the ground, n(r, z_0), and written as g'(z). The columnar fine particle volume then becomes

$$V_{f,\mathrm{column}} = \int_{f} \frac{4}{3}\pi r^{3}\, n(r, z_0)\, dr \int_{z_0}^{z_0+H} g'(z)\, dz \qquad (4)$$

The vertical integral within the aerosol layer can be written as

$$\int_{z_0}^{z_0+H} g'(z)\, dz = H \qquad (5)$$

Substituting the vertical integral (Eq. (5)) into Eq. (4) yields

$$V_f = \frac{V_{f,\mathrm{column}}}{H} \qquad (6)$$

where V_f is defined as the fine particle volume on the ground, i.e. $\int_{f} \frac{4}{3}\pi r^{3}\, n(r, z_0)\, dr$. Because fine particles are dried during sampling near the ground, f(RH) is used to characterize the hygroscopic properties of ambient particles:

$$V_{f,\mathrm{dry}} = \frac{V_f}{f(\mathrm{RH})} \qquad (7)$$

where V_{f,dry} is the dry volume of fine particles near the ground and RH is the relative humidity. Finally, the PM2.5 mass concentration can be obtained by

$$\mathrm{PM}_{2.5} = V_{f,\mathrm{dry}} \times \rho_{f,\mathrm{dry}} \qquad (8)$$

where ρ_{f,dry} is the effective density of dry fine-mode particles near the ground (g cm^−3). By combining Eqs. (1)-(8), the final formula for PM2.5 can be expressed as the product of remote sensing parameters and correction terms:

$$\mathrm{PM}_{2.5} = \mathrm{AOD} \times \mathrm{FMF} \times \mathrm{VE}_f \times \frac{\rho_{f,\mathrm{dry}}}{H \times f(\mathrm{RH})} \qquad (9)$$

It should be noted that, with the parameter units given above (VE_f in μm^3/μm^2, ρ_{f,dry} in g cm^−3 and H in km), the PM2.5 mass concentration calculated by Eq. (9) is in units of mg m^−3, and needs to be multiplied by 10^3 to be compared with in situ measurements (unit conversion to μg m^−3).
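The chain of corrections leading to Eq. (9) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `pm25_pmrs` and the sample input values are assumptions made up for the example. With VE_f in μm^3/μm^2 (that is, μm), density in g cm^−3 and layer height in km, the product comes out in mg m^−3, so a factor of 1000 converts it to μg m^−3.

```python
def pm25_pmrs(aod, fmf, ve_f, rho_dry, height_km, f_rh):
    """Near-ground PM2.5 in ug m^-3 from the product form of Eq. (9).

    aod       : total aerosol optical depth (dimensionless)
    fmf       : fine-mode fraction (dimensionless)
    ve_f      : columnar volume-to-extinction ratio (um^3/um^2)
    rho_dry   : effective density of dry fine particles (g cm^-3)
    height_km : aerosol layer height, scale height or PBLH (km)
    f_rh      : hygroscopic growth function f(RH) (dimensionless, >= 1)
    """
    if height_km <= 0 or f_rh <= 0:
        raise ValueError("height_km and f_rh must be positive")
    aod_f = aod * fmf                  # fine-mode AOD
    v_column = aod_f * ve_f            # columnar fine particle volume
    v_ground = v_column / height_km    # volume near the ground
    v_dry = v_ground / f_rh            # remove water uptake
    pm_mg_m3 = v_dry * rho_dry         # mass concentration in mg m^-3
    return pm_mg_m3 * 1.0e3            # mg m^-3 -> ug m^-3


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example = pm25_pmrs(aod=0.5, fmf=0.8, ve_f=0.25,
                    rho_dry=1.5, height_km=1.0, f_rh=1.5)
print(round(example, 3))
```

Keeping each correction as its own intermediate variable makes it easy to inspect which term dominates the uncertainty for a given grid cell.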

Columnar volume-to-extinction ratio in fine mode
VE_f, defined in the previous section, is sensitive to FMF and changes monotonically across diverse aerosol types. Zhang and Li (2015) used long-term observations of four typical aerosol types (urban/industrial, biomass burning, dust, and sea salt) to parameterize VE_f as a function of FMF (Eq. (10)). This parameterization is restricted to FMF between 0.1 and 1.0, because of the valid sample range and the lower accuracy of FMF and VE_f when FMF is less than 0.1.

Normalized vertical distribution
On polluted days, atmospheric particulate matter usually concentrates towards the ground layer and decreases sharply in the upper layer, while the particle vertical profile on clean days commonly shows a negative exponential distribution. Therefore, two models are used to characterize the normalized vertical distribution profile of PM2.5, namely the vertical uniform model and the negative exponential model. The vertical uniform model (Figure 1a) is the simplest of the normalized profile models of PM2.5 and describes a fully mixed state of PM2.5 in the aerosol layer. In this model, there is no significant particulate matter above the aerosol layer. Therefore, we can simply assume that the vertical uniform model is

$$g'(z) = \begin{cases} 1, & z_0 \le z \le z_0 + H \\ 0, & z > z_0 + H \end{cases} \qquad (11)$$

We can deduce from the vertical uniform model that PM2.5 near the ground is approximately inversely proportional to the aerosol layer height, which is in agreement with Koelemeijer et al. (2006). Taking the ground height as z_0 = 0, the integral of the vertical uniform model can be expressed as:

$$\int_{0}^{\infty} g'(z)\, dz = \int_{0}^{H} dz = H \qquad (12)$$

The negative exponential model (Figure 1b) is a distribution in which the PM2.5 concentration decreases exponentially with height. The height at which the concentration decreases to 1/e of the near-surface PM2.5 is defined as the scale height (SH) of particles. Thus, the negative exponential model can be written as:

$$g'(z) = \exp\left(-\frac{z}{\mathrm{SH}}\right) \qquad (13)$$

Integrating Eq. (13) yields:

$$\int_{0}^{\infty} \exp\left(-\frac{z}{\mathrm{SH}}\right) dz = \mathrm{SH} \qquad (14)$$

As for the vertical uniform model, the right side of the integrated form (Eq. (14)) is simply a height. Therefore, the integral of the normalized models can be written in a unified form:

$$\int_{0}^{\infty} g'(z)\, dz = H \qquad (15)$$

When H is the aerosol layer height, the normalized model corresponds to the vertical uniform model; when H is the scale height, the normalized model corresponds to the negative exponential model. It is worth noting that about 37% (1/e) of atmospheric particles are located above the scale height when atmospheric particles are distributed according to the negative exponential model.
Some studies (e.g. Lin et al., 2015) use the planetary boundary layer height (PBLH) in the estimation of PM2.5 mass concentration in place of the aerosol layer height or scale height. Although such treatment introduces some uncertainty, it is also an approach that can effectively extend the aerosol vertical profile. Therefore, driven by the same parameter, the aerosol vertical model can represent two different profiles.
In addition to the above two distributions, there are other types of vertical aerosol distribution, such as a model which assumes the aerosols are well-mixed within the boundary layer and exponentially decrease in the free-atmosphere. These models often need some hard-to-observe parameters (e.g. the height of the aerosol haze layer directly observed by lidar). Thus, these vertical models are not considered in this study. Although using only two types of vertical distribution as shown in Figure 1 can introduce a certain error, the error of PM 2.5 mass concentration is within a tolerable range (Wei et al., 2021).
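The statement that both normalized profiles integrate to a single height, and that about 37% of the particles lie above the scale height in the negative exponential model, can be checked numerically. The sketch below uses a plain midpoint rule; the function names and the 1.2 km height are illustrative assumptions, not values from the dataset.

```python
import math


def column_integral(profile, z_top_km, n_steps=20_000):
    """Midpoint-rule integral of a normalized profile g'(z) from 0 to z_top_km."""
    dz = z_top_km / n_steps
    return sum(profile((i + 0.5) * dz) for i in range(n_steps)) * dz


def uniform_profile(h_km):
    """Vertical uniform model: 1 inside the aerosol layer, 0 above it."""
    return lambda z: 1.0 if z <= h_km else 0.0


def exponential_profile(sh_km):
    """Negative exponential model with scale height sh_km."""
    return lambda z: math.exp(-z / sh_km)


H_KM = 1.2      # hypothetical layer height / scale height in km
TOP_KM = 20.0   # upper limit standing in for infinity

uniform_int = column_integral(uniform_profile(H_KM), TOP_KM)
exp_int = column_integral(exponential_profile(H_KM), TOP_KM)

# Fraction of the exponential column located above the scale height.
below_sh = column_integral(exponential_profile(H_KM), H_KM)
frac_above = 1.0 - below_sh / exp_int

print(uniform_int, exp_int, frac_above)
```

Both integrals return the driving height (here 1.2 km), which is why a single parameter can stand for either profile in the model.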

Hygroscopic growth function
Because particle hygroscopicity varies greatly between regions, a hygroscopic growth function that can characterize many types of aerosol is needed. For this purpose, the hygroscopic growth factor is reconstructed using the hygroscopic parameter (κ) to improve the representation of the spatial distribution of hygroscopic growth (Zhang et al., 2020). According to the κ-Köhler theory proposed by Petters and Kreidenweis (2007), and approximating the water activity by RH (expressed as a fraction), f(RH) in Equation (9) can be derived as follows:

$$f(\mathrm{RH}) = \frac{V_s + V_w}{V_s} = 1 + \kappa\, \frac{\mathrm{RH}}{1 - \mathrm{RH}} \qquad (16)$$

where V_s is the volume of the dry particulate matter and V_w is the volume of the water. The hygroscopic parameter κ can be directly measured or calculated from the aerosol chemical components using a simple mixing rule:

$$\kappa = \sum_i v_i\, \kappa_i \qquad (17)$$

where κ_i is the hygroscopic parameter of the individual aerosol components, which can be measured in the laboratory, and v_i is the dry component volume fraction. The distribution of κ over China can be obtained by ground-based measurements (e.g. Zhang et al., 2020), modeling or remote sensing.
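As an illustration of how κ feeds into the growth function, the following sketch applies the volume mixing rule and the κ-Köhler volume growth relation f(RH) = 1 + κ·RH/(1 − RH). It assumes that water activity can be approximated by RH (curvature effects neglected) and that RH is given as a fraction. The function names and the component κ values are hypothetical and chosen only for the example; they are not measurements.

```python
def kappa_mixture(volume_fractions, kappas):
    """Volume-weighted kappa of a mixture (simple mixing rule)."""
    if len(volume_fractions) != len(kappas):
        raise ValueError("need one kappa per component")
    if abs(sum(volume_fractions) - 1.0) > 1e-6:
        raise ValueError("dry volume fractions must sum to 1")
    return sum(v * k for v, k in zip(volume_fractions, kappas))


def growth_function(kappa, rh):
    """Volume growth f(RH) = (V_s + V_w) / V_s for RH given as a fraction.

    Assumes water activity ~ RH, i.e. the Kelvin (curvature) term is ignored.
    """
    if not 0.0 <= rh < 1.0:
        raise ValueError("rh must be in [0, 1)")
    return 1.0 + kappa * rh / (1.0 - rh)


# Hypothetical two-component aerosol: half strongly, half weakly hygroscopic.
kappa = kappa_mixture([0.5, 0.5], [0.6, 0.1])
print(kappa, growth_function(kappa, 0.5), growth_function(kappa, 0.8))
```

Because f(RH) grows without bound as RH approaches 1, an operational implementation would typically cap RH below saturation; the cap value is a design choice not specified here.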

Strength and limitation of PMRS model
The PMRS model describes the relationship between AOD and PM 2.5 based on physical processes, which does not rely on any ground observations. This advancement enables the PMRS model to avoid empirical calibration of long-term historical data, and enhances the function of historical and instantaneous PM 2.5 estimation. This also ensures that our point-by-point validation for PM 2.5 mass concentration data is more convincing. In addition, the PMRS model is simple, flexible, fast, and suitable for people who have no experience in modeling and statistics. The MODIS monthly products are used as input data in this study in order to reduce the uncertainty, since the PMRS model is sensitive to the errors of input parameters. The MODIS monthly products (MOD08) have high accuracy but the horizontal resolution is 1°×1°. In order to maintain the accuracy of the PM 2.5 dataset, the spatial resolution is sacrificed. In the future, as the accuracy of the inversion parameters (AOD & FMF) increases, the horizontal resolution of PM 2.5 dataset produced by the PMRS model can also be improved.

Description of input data
PM2.5 mass concentration is estimated from available MODIS data (onboard the Aqua and Terra satellites). We employ the monthly AOD and fine-mode fraction (FMF) data with 1°×1° horizontal resolution obtained from the MODIS Terra and Aqua Collection 5.1 (C5.1) monthly products over China, derived with the dark target (DT) method (https://ladsweb.modaps.eosdis.nasa.gov/search/). MODIS/Terra products are available from March 2000 and MODIS/Aqua products are available from July 2002. Data from both sensors through December 2015 are used in this study. We chose the C5.1 products because they include FMF, which is no longer available in newer versions such as C6 and C6.1. It should be noted that the AOD over China from these new versions differs substantially from the C5.1 product, by up to 0.2 (De Leeuw et al., 2018). In the PMRS method, the satellite data are matched with PBLH and RH data extracted from the Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2) reanalysis with a horizontal resolution of 2°×2.5° (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2). These are assigned to the MODIS 1°×1° grid without interpolation (e.g. inverse distance weighting), to prevent the rapidly changing PBLH and RH in coastal areas from being smoothed.
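One way to assign a coarse reanalysis field to the 1°×1° grid without interpolation is to copy, for each fine cell, the value of the nearest coarse cell center. The sketch below shows that idea in plain Python. Whether the dataset was built with exactly this nearest-center rule is an assumption, and the function names and the toy grids are hypothetical.

```python
def nearest_index(centers, value):
    """Index of the coarse cell center closest to value (first one on ties)."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def assign_to_fine_grid(coarse_lat, coarse_lon, coarse_field,
                        fine_lat, fine_lon):
    """Copy coarse values onto a finer grid without any interpolation.

    coarse_field[i][j] belongs to (coarse_lat[i], coarse_lon[j]).
    Returns a nested list indexed as [fine_lat][fine_lon].
    """
    lat_idx = [nearest_index(coarse_lat, lat) for lat in fine_lat]
    lon_idx = [nearest_index(coarse_lon, lon) for lon in fine_lon]
    return [[coarse_field[i][j] for j in lon_idx] for i in lat_idx]


# Toy example: a 2 x 2 coarse field mapped to a 4 x 3 fine grid.
coarse_lat = [20.0, 22.0]
coarse_lon = [100.0, 102.5]
coarse_pblh = [[1.0, 2.0],
               [3.0, 4.0]]
fine_lat = [19.5, 20.5, 21.5, 22.5]
fine_lon = [100.5, 101.5, 102.5]

fine_pblh = assign_to_fine_grid(coarse_lat, coarse_lon, coarse_pblh,
                                fine_lat, fine_lon)
for row in fine_pblh:
    print(row)
```

Since values are copied rather than blended, sharp land-sea contrasts in PBLH and RH are preserved, at the cost of blocky transitions at coarse cell edges.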

Data records
The dataset generated by the PMRS model is stored in NetCDF format. There are four variables in the data file: PM2.5, latitude, longitude and time. PM2.5 is a two-dimensional floating-point variable in units of μg m−3, with the horizontal grid determined by one-dimensional arrays of latitude and longitude. The horizontal resolution is 1°×1°, and the grid covers 72.5°E-135.5°E in longitude and 17.5°N-53.5°N in latitude. The time is a combination of year (4-digit) and month (2-digit), recorded as an integer (e.g. March 2000 is recorded as 200003). The PM2.5 variable has estimated values only over land in China, and invalid values over ocean and other regions. In addition, grid cells with missing retrievals due to a lack of satellite data caused by cloud or high surface reflectivity are also filled with invalid values.
Figure 2 presents the seasonal distribution of satellite-derived PM2.5 mass concentration over China. Each season consists of three consecutive months, with spring starting in March and ending in May. The winter average has one fewer observation than the other seasons because MODIS products started in March 2000. The spatial distribution of PM2.5 shows six pollution centers in China: the area among Hebei, Shandong and Henan provinces (HSH), the Yangtze River Delta (YRD), the Pearl River Delta (PRD), the Jianghan Plain (JHP), the Sichuan Basin (SCB), and the Xinjiang Tarim Basin (XTB). The inversion of aerosol properties using the MODIS DT method is difficult over the bright desert surface in XTB, but the concentration changes can be inferred from those at its edges. Comparing PM2.5 among the four seasons, the maximum value mostly appears in winter, reaching 123 μg m−3, and the polluted regions are mainly distributed in the HSH, the SCB and the JHP.
There are slightly polluted regions in the YRD and the PRD, and PM2.5 in the Northeast (Jilin and Heilongjiang provinces) is also higher in winter than in other seasons. The mean PM2.5 over China is also highest in winter, at 42 μg m−3. However, this may be related to the lack of observations caused by bright surfaces in winter. Although the peak PM2.5 mass concentration in winter is high, high concentrations appear to be confined to relatively small areas, presumably the vicinity of pollution sources, owing to unfavorable diffusion conditions. In summer, the peak PM2.5 mass concentration (107 μg m−3) is lower than in winter, but the pollution spreads over larger areas. The PM2.5 mass concentration averaged over eastern China (110°-120°E, 30°-40°N) is higher in summer (66 μg m−3) than in winter (57 μg m−3). The polluted region clearly moves from HSH to YRD, and that in Jilin and Heilongjiang provinces moves to the coastal areas of Liaoning province in summer. This is related to the high temperature and ambient humidity in the south and in coastal areas in summer, which favor the formation of secondary pollutants. PM2.5 in the other two polluted regions (JHP and SCB) drops below 90 μg m−3. In spring, good diffusion conditions in the northern region lead to a significant decrease in PM2.5 mass concentration, while that in the PRD and XTB increases. The mean PM2.5 mass concentration over eastern China is 53 μg m−3, higher than in autumn, because of more dust events in spring. Pollution is significantly weakened in autumn.
[...] especially in 2013, when it reaches 250 μg m−3.
From the perspective of spatial distribution, PM2.5 mass concentrations in the HSH, JHP, and SCB regions increase year by year, gradually spreading from these pollution centers to the surrounding areas and finally forming large-scale regional pollution in central and eastern China. By 2015, the PM2.5 mass concentration has decreased and the high-value areas have also narrowed significantly. The average value over central and eastern China drops from 55 μg m−3 in 2013 to 43 μg m−3, the lowest value in the 15 years and the same as in 2002. However, it should be noted that the pollution centers have not disappeared in 2015, and the PM2.5 mass concentration in the HSH region is still significantly higher than in 2002.
Figure 4 shows the trend of monthly average PM2.5 over China from March 2000 to December 2015. From 2000 to 2010, PM2.5 over China shows a slow upward trend, with an interannual rate of increase of 0.026 μg m−3 yr−1. In June 2003, PM2.5 has a significantly high value in eastern China, exceeding 60.0 μg m−3. This high value is related to straw burning and weather conditions unfavorable for diffusion in eastern China (Cao, Zhang, Zheng, & Wang, 2006). From 2011 to 2015, under air pollution control policies, the interannual rate of decrease of PM2.5 mass concentration is 0.172 μg m−3 yr−1. During this period, PM2.5 exceeds 60 μg m−3 only once, in January 2013.
Figure 5 shows the monthly changes of multi-year averaged PM2.5 in China as a box-and-whisker plot. The small squares represent the mean values, and the median values of PM2.5 are indicated by a short line inside each box. The top and bottom edges of each box represent the upper and lower quartiles, and the corresponding whiskers indicate the outliers.
We find that PM 2.5 mass concentration basically runs high in winter, runs low in summer and autumn, and only slightly increases in June and July, which may be related to the burning of straw (May and June) and pollutions caused by photochemical reactions in summer. Also, some errors of PM 2.5 in June and July may be introduced by fewer available observations under the influence of cloud and rain. The highest PM 2.5 mass concentration presents in January during the whole year, with an average value of 50 μg m −3 . The minimum of the monthly averaged PM 2.5 appears in September, but its upper quartile still exceeds that of August, indicating that some pollution events with high concentrations of PM 2.5 may still occur. This monthly changes of satellite-derived PM 2.5 in Figure 5 agree with the in-situ monitoring.
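The integer time encoding (for example, 200003 for March 2000) and the three-month season definition described in the data records can be handled with a few helper functions. The sketch below is illustrative only: the function names are hypothetical, and it does not read the NetCDF file itself because the variable names in the file are not stated here. Summer, autumn and winter follow from spring being March to May.

```python
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def decode_time(stamp):
    """Split an integer such as 200003 into (year, month)."""
    year, month = divmod(stamp, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in time stamp {stamp}")
    return year, month


def season_of(stamp):
    """Season label for a time stamp, with spring defined as March to May."""
    _, month = decode_time(stamp)
    return SEASON_BY_MONTH[month]


def monthly_stamps(start=200003, end=201512):
    """All monthly time stamps from start to end, inclusive."""
    year, month = decode_time(start)
    end_year, end_month = decode_time(end)
    stamps = []
    while (year, month) <= (end_year, end_month):
        stamps.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return stamps


stamps = monthly_stamps()
print(len(stamps), stamps[0], stamps[-1], season_of(201301))
```

Grouping the stamps by `season_of` gives the index sets needed to build seasonal means like those shown in Figure 2.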

Validation with ground-based observations
The estimated PM2.5 mass concentrations are evaluated by comparison with ground-based PM2.5 data from two sources. The first comprises 1442 ground sites in 372 cities over China during 2013-2015, published by the China National Environmental Monitoring Centre. The ground-observed PM2.5 values from these sites are averaged at city scale in order to enhance the spatial representativeness of the ground stations. The second is obtained from additional sources to extend the validation to the period before 2013, i.e. from the U.S. embassies in Beijing (2008-2015), Shanghai (2011-2015), Guangzhou (2011-2015) and Chengdu (2012-2015), and from the Hong Kong environmental protection agency (2000-2015). Figure 6 shows the biases of satellite-derived PM2.5 in cities over China averaged over 2013-2015. Sites with a bias within ±20 μg m−3 account for 63.4% of all sites. Only a few sites (7.1%), in Shandong, Jiangsu and Shanghai, have a bias larger than 20 μg m−3. In some northern regions, sites where PM2.5 is underestimated (bias < −20 μg m−3) account for 29.4%. Figure 7 shows the errors of the satellite-derived monthly PM2.5, which are derived from 7116 data pairs (satellite-derived vs. in-situ) and divided into 72 bins. We then calculate the median, mean, standard deviation, maximum and minimum of the errors in every bin. The expected error (dashed line in Figure 7) is defined as the envelope encompassing the standard deviation of the error, which is about ±(15 μg m−3 + 30%) relative to in situ observations. The error of satellite-derived PM2.5 increases with in-situ observed PM2.5 concentration. The median error indicates an underestimation of more than 10 μg m−3 at an in-situ concentration of about 70 μg m−3.
The error shows a slight overestimation below 50 μg m−3 and an underestimation above 65 μg m−3. The mean error in the interval between them (50-65 μg m−3) is close to zero (0.076 μg m−3). The standard deviations of the errors are basically within the lower limit of the expected error when in-situ PM2.5 is less than 85 μg m−3, whereas they often exceed the upper limit. Regarding the extreme values of PM2.5, high values run the risk of being underestimated, while low values are more likely to be overestimated, which is also [...]
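The expected error envelope ±(15 μg m−3 + 30%) can be applied to matched pairs as sketched below. The sketch assumes that the 30% term is taken relative to the in-situ value, by analogy with the usual expected-error convention for satellite aerosol products; the function names and the sample pairs are made up for illustration and are not validation data.

```python
def within_expected_error(estimate, observed, offset=15.0, relative=0.30):
    """True if |estimate - observed| <= offset + relative * observed.

    Both values are PM2.5 mass concentrations in ug m^-3.
    """
    return abs(estimate - observed) <= offset + relative * observed


def fraction_within(pairs, offset=15.0, relative=0.30):
    """Share of (estimate, observed) pairs inside the envelope."""
    if not pairs:
        raise ValueError("no data pairs given")
    hits = sum(within_expected_error(est, obs, offset, relative)
               for est, obs in pairs)
    return hits / len(pairs)


# Made-up (satellite-derived, in-situ) pairs in ug m^-3.
sample_pairs = [(60.0, 50.0), (100.0, 50.0), (20.0, 30.0), (80.0, 100.0)]
print(fraction_within(sample_pairs))
```

The same helper can be evaluated per concentration bin to see where the envelope is exceeded.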

Data set value
Compared with atmospheric particulate matter in the coarse mode, PM2.5, with its relatively small particle size, carries toxic substances and has a long residence time and transport distance in the atmosphere. Therefore, it is harmful to human health and air quality, and it also has great impacts on climate change. In China, pollution by atmospheric fine particles is a challenging environmental issue. Systematic near-surface PM2.5 monitoring was not fully implemented in China until 2013, so the lack of historical data is a key obstacle to the analysis of long-term PM2.5 trends.

Figure 7. Comparison of the satellite-derived monthly PM2.5 with in situ measurements. The expected error envelope is ±(15 μg m−3 + 30%). We set 72 bins in PM2.5 levels to calculate the median error (+), mean error (□), standard deviation (bar) and maximum and minimum errors (-).

This paper provides a historical dataset of PM2.5 mass concentration with a spatial resolution of 1°×1° over land in China from 2000 to 2015. The PM2.5 mass concentration is estimated with the PMRS remote sensing model, which includes size cutting, volume visualization, bottom isolation and particle drying procedures. The input data of the PMRS model are satellite remote sensing products (AOD & FMF) and MERRA-2 meteorological reanalysis data (PBLH & RH), which are completely independent of ground observations of PM2.5. Therefore, our dataset can be objectively validated against ground observations to assess its accuracy.
In addition, the PMRS model conforms to the laws of physics, which ensures the global consistency of PM 2.5 concentration estimation. In other words, compared with statistical methods, the accuracy of our data set does not change with the distribution of ground sites; and compared with model methods, our data set is not affected by the uncertainty of model emission sources and the completeness of chemical reaction mechanism, which increases the robustness of the data set. The historical PM 2.5 data set can not only provide basic data for the studies of human exposure to particulate matter, but also serve for historical assessment of air quality.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding

Notes on contributors
Yuanyuan Wei received her Ph.D. degree from the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China, in 2020. She received her bachelor's degree from Nanjing University of Information Science and Technology. She is currently a postdoctoral researcher at the National Engineering Laboratory for Remote Sensing Satellite Applications, Aerospace Information Research Institute, Chinese Academy of Sciences. Her research interests include aerosol remote sensing and atmospheric environment.
Zongren Peng received his Ph.D. degree in Theoretical Physics from Pierre and Marie Curie University -Paris 6 and Institute of Theoretical Physics in French Alternative Energies and Atomic Energy Commission in 2012. He is currently a research assistant at Aerospace Information Research Institute, Chinese Academy of Sciences, China. His research interests focus on remote sensing retrieval, particle physics and statistics.

Data availability statement
The data that support the findings of this study are openly available in SONET at https://www.sonet.ac.cn/ and in Science Data Bank at http://www.dx.doi.org/10.11922/sciencedb.j00076.00061.

Open scholarship
This article has earned the Center for Open Science badge for Open Data. The data are openly accessible at http://www.dx.doi.org/10.11922/sciencedb.j00076.00061.