Methods, availability, and applications of PM2.5 exposure estimates derived from ground measurements, satellite, and atmospheric models

Fine particulate matter (PM2.5) is a well-established risk factor for public health. To support both health risk assessment and epidemiological studies, data are needed on spatial and temporal patterns of PM2.5 exposures. This review article surveys publicly available exposure datasets for surface PM2.5 mass concentrations over the contiguous U.S., summarizes their applications and limitations, and provides suggestions on future research needs. The complex landscape of satellite instruments, model capabilities, monitor networks, and data synthesis methods offers opportunities for research development, but would benefit from guidance for new users. Guidance is provided to access publicly available PM2.5 datasets, to explain and compare different approaches for dataset generation, and to identify sources of uncertainties associated with various types of datasets. The three main sources used to create PM2.5 exposure data are ground-based measurements (especially regulatory monitoring), satellite retrievals (especially aerosol optical depth, AOD), and atmospheric chemistry models. We find inconsistencies among several publicly available PM2.5 estimates, highlighting uncertainties in the exposure datasets that are often overlooked in health effects analyses. Major differences among PM2.5 estimates emerge from the choice of data (ground-based, satellite, and/or model), the spatiotemporal resolutions, and the algorithms used to fuse data sources.
Implications: Fine particulate matter (PM2.5) has large impacts on human morbidity and mortality. Even though the methods for generating PM2.5 exposure estimates have improved significantly in recent years, there is a lack of review articles that document PM2.5 exposure datasets that are publicly available and easily accessible by the health and air quality communities. In this article, we discuss the main methods that generate PM2.5 data, compare several publicly available datasets, and show the applications of various data fusion approaches. Guidance to access and critique these datasets is provided for stakeholders in public health sectors.
PAPER HISTORY Received April 6, 2019; Revised August 1, 2019; Accepted August 22, 2019


Introduction
Particulate matter (PM) is a well-established health risk factor, with impacts on human morbidity and mortality through cardiovascular (Brook et al. 2010) and respiratory diseases (Dominici et al. 2006; Ni, Chuang, and Zuo 2015; Wu et al. 2018), lung cancer and cardiopulmonary mortality (Hoek et al. 2013; Pope et al. 2002), premature births (Malley et al. 2017), and other types of diseases (Lin et al. 2017; Tian et al. 2017). Particulate matter is a combination of solid particles and liquid droplets suspended in the air, typically classified by aerodynamic diameter (EPA 2018a). Of these, health studies have demonstrated that PM2.5 (i.e., particles that are 2.5 µm or smaller in aerodynamic diameter) is of particular concern for public health. PM2.5 also has a lower rate of gravitational settling than larger particles, so it can travel long distances in the atmosphere and affect regions far from the emission source, if not removed by precipitation (Ouyang et al. 2015). In this study, we focus on PM2.5, although many of the methodological issues relevant to PM2.5 also relate to PM10 and, in some cases, gas-phase pollutants as well.
Epidemiological and clinical studies have confirmed an association between both acute and long-term exposures to PM2.5 and adverse cardiorespiratory health effects (Dominici et al. 2006; Peters et al. 2001; Pope and Dockery 2006). PM2.5 was ranked the sixth highest risk factor in the Global Burden of Disease estimates of global premature mortality in 2016, and first among all outdoor air pollutants (State of Global Air 2018). In recent years, PM2.5 data have been used for (1) epidemiological studies, which quantify relationships between PM2.5 exposure and adverse health outcomes, (2) health benefit assessments, which combine pollution concentrations, population data, baseline rates of adverse health outcomes, and concentration-response functions from previously conducted epidemiologic studies to estimate the PM2.5-related health burden, and (3) tools to support public health interventions during episodic events such as wildfires and dust storms, including inexpensive, rapid-response health impact assessment tools (Anenberg et al. 2016).
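The health benefit calculation described in item (2) can be sketched with a log-linear health impact function, a form commonly used in such assessments. The sketch below is illustrative only: the function name, the relative risk, the baseline rate, and the population are hypothetical values chosen to show the arithmetic, not inputs taken from any study cited here.

```python
import math


def attributable_cases(delta_c, population, baseline_rate, beta):
    """Log-linear health impact function.

    delta_c       : change in PM2.5 concentration (ug/m3)
    population    : number of exposed people
    baseline_rate : baseline incidence per person per year
    beta          : concentration-response coefficient (per ug/m3)
    """
    return baseline_rate * population * (1.0 - math.exp(-beta * delta_c))


# Hypothetical inputs (assumptions for illustration only).
relative_risk_per_10 = 1.06                      # assumed RR per 10 ug/m3
beta = math.log(relative_risk_per_10) / 10.0     # convert RR to a per-ug/m3 slope
cases = attributable_cases(delta_c=5.0, population=1_000_000,
                           baseline_rate=0.008, beta=beta)
print(f"Estimated attributable cases per year: {cases:.1f}")
```

Because the exposure term enters through an exponential, any error in the PM2.5 surface propagates directly into the burden estimate, which is one reason the choice of exposure dataset matters.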
In annual trend analyses, the global and regional emissions of PM species are often found to be correlated with energy consumption (Klimont et al. 2017). In the U.S., anthropogenic emissions of PM2.5 from various sources (i.e., stationary, mobile, and fire sources) are documented in the National Emissions Inventory report by the EPA (EPA 2017). Wildfires are a notable example of fine particulate matter emissions on short timescales. In the U.S., 69% of the population was exposed to PM2.5 above 0.2 µg/m³ due to seasonal wildfire events (Munoz-Alpizar et al. 2017). Inhalation of PM2.5 from wildfire smoke has been associated with increased cardiopulmonary and cerebrovascular hospital admissions and emergency department visits in affected communities (Rappold et al. 2011; Wettstein et al. 2018), and studies show that wildfire smoke may be a triggering factor for acute coronary events (Haikerwal et al. 2015). Detailed evaluations have also investigated the mutagenicity and lung toxicity of particulate matter from different fuel types and burning phases of fires. Further progress on wildfire health impact studies is summarized in recent reviews (Henderson and Johnston 2012; Liu et al. 2015; Reid et al. 2016; Stefanidou, Athanaselis, and Spiliopoulou 2008).
In the U.S., compliance with the annual and daily (24 hr) average PM2.5 limits in the Environmental Protection Agency (EPA) National Ambient Air Quality Standards (NAAQS) is determined solely by ground-based regulatory monitoring data. While PM2.5 is monitored in most large U.S. cities, many parts of the U.S. have no ground-based measurements of PM2.5 to assess health impacts (EPA 2018b). Even in areas with multiple ground-based PM2.5 instruments, there are challenges in extrapolating point-based measurements to characterize regional air quality. Local emissions and variations in geography can lead to a high level of spatial heterogeneity in measurements, such that two monitors near each other may reflect very different PM2.5 levels. There is no consensus approach for generalizing point-based estimates to ascertain prevailing population-level PM2.5 exposure over a community or neighborhood. Furthermore, temporal variability, especially during episodic high exposure events such as wildfires that lead to short-term health-related effects, is an important public health concern.
Satellite retrievals and atmospheric models have been used to complement ground-based monitoring data, and to estimate ambient PM2.5 levels in areas with no direct measurements. Satellite instruments can provide information on atmospheric or land-use characteristics associated with air pollution levels but cannot measure near-surface PM2.5 in a manner directly comparable to ground-based measurements. Most satellite aerosol products are integrated values from the Earth's surface to the top of the atmosphere. As a result, these instruments cannot directly provide near-surface PM2.5 but instead retrieve the "aerosol optical depth" (AOD), which is defined as the light extinction due to particles in the entire column of air.
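The column-integrated nature of AOD can be illustrated by integrating an aerosol extinction profile over altitude. The sketch below is a minimal example under stated assumptions: the function name, the altitude grid, and both extinction profiles are hypothetical, and a simple trapezoidal rule stands in for whatever vertical discretization a real retrieval or model would use.

```python
def column_aod(altitudes_km, extinction_per_km):
    """Trapezoidal integral of the aerosol extinction coefficient over altitude."""
    aod = 0.0
    for i in range(1, len(altitudes_km)):
        dz = altitudes_km[i] - altitudes_km[i - 1]
        aod += 0.5 * (extinction_per_km[i] + extinction_per_km[i - 1]) * dz
    return aod


# Hypothetical profiles (extinction in km^-1), for illustration only.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
surface_layer = [0.30, 0.10, 0.00, 0.00, 0.00]    # aerosol concentrated near the ground
elevated_layer = [0.00, 0.00, 0.05, 0.20, 0.00]   # aerosol plume aloft

print(column_aod(z, surface_layer), column_aod(z, elevated_layer))
```

Both profiles integrate to essentially the same column value even though their surface extinction differs greatly, which is why AOD alone cannot be read as a near-surface concentration.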
The relationship between near-surface PM2.5 and satellite AOD is spatially and temporally heterogeneous. Models that estimate the AOD-PM2.5 relationship may be categorized as geoscience-based or statistical: geoscience-based methods use atmospheric models to solve equations of physical and chemical processes, which is a forward approach to simulate the near-surface PM2.5 and AOD; statistical models extrapolate data based on empirical associations. In the case of PM2.5, both geoscience-based and statistical methods have evolved as valuable approaches to estimate ground-level concentrations in areas lacking direct measurements. Geoscience-based models, also referred to as chemical transport models, are used extensively to inform air quality management programs in addition to characterizing AOD-PM2.5 relationships. Here we review estimates of spatially continuous PM2.5 data for the contiguous United States, and how these estimates were constructed using one or more of these three data sources: measurements, satellite data, and models. Several previous articles have reviewed technical methods to generate surface PM2.5 data, including over regions with few monitors (Zhang, Rui, and Fan 2018), from remote sensing techniques (Chu et al. 2016; Hoff and Christopher 2009), and data assimilation methods (Lynch et al. 2016). Here we focus instead on existing data sources that are publicly available and easily accessible by the health and air quality communities. The complex landscape of satellite instruments, model capabilities, monitor networks, and data synthesis methods offers opportunities at the spatial and temporal scales of relevance to address questions of interest. Thus, a main goal of this work is to present an array of options for estimating near-surface PM2.5 using publicly available datasets, and to provide guidance to new users to access and assess these data. We extend a previous review by van Donkelaar et al. (2010) that addressed the relationship between AOD and measured PM2.5 (Engel-Cox et al. 2004; Wang and Christopher 2003), meteorological factors (Gupta et al. 2006; Koelemeijer, Homan, and Matthijsen 2006; Liu et al. 2005), and the synthesis of multiple data sources on a global basis. Specifically, we provide an overview of methods to develop spatially continuous PM2.5 estimates (Section 2); compare existing PM2.5 estimates for the contiguous U.S. that have been used or cited in past studies and discuss applications of satellite-derived data for wildfires and the global burden of disease (Section 3); and finally summarize the current status and future directions for PM2.5 estimates (Section 4).

Overview of four basic types of PM2.5 datasets
We describe the generation and applications of four types of PM2.5 datasets: (1) ground-based monitor data; (2) ground-based monitor data merged with satellite data; (3) ground-based monitor data merged with model data; and (4) ground-based monitor data merged with satellite and model data. Indirect ground-based observations such as visibility are not discussed here, nor are short-term research field campaigns that offer extensive data for individual regions over a period of days to months.

Ground-based monitoring data
Over the U.S., extensive networks of ground-based instruments operated by state, local, and Tribal agencies continuously monitor PM2.5. Measurements from these monitors are archived by the U.S. EPA (https://www.epa.gov/outdoor-air-quality-data) and are freely available to support assessment at annual, daily, and often hourly time scales. Spatially continuous real-time maps can be viewed on the EPA AirNow website. These data are used by cities, counties, states, and the EPA to determine compliance with the NAAQS, by atmospheric modelers for model evaluation, and by health researchers as input to epidemiological studies and risk assessments. PM2.5 data from ground monitors can be used to study correlations with adverse health impacts using area-wide averaging or nearest-monitor exposure assignments, as in a previous study of correlation with daily mortality in six U.S. cities (Laden et al. 2000) and an American Cancer Society study that links particulate air pollution and mortality (Krewski et al. 2009). Alternatively, various interpolation methods can be employed to create spatially continuous data fields, such as land-use regression (LUR) modeling, ordinary kriging interpolation, and inverse distance weighted interpolation (Menut et al. 2013; Zhang, Rui, and Fan 2018). Relative to other sources, monitor data are often treated as the "gold standard," but the spatial coverage of pollution monitors limits the application of these data for health assessment, and can introduce errors where point-based monitors are used to assess health over a wider domain (Zeger et al. 2000). In certain cases, such as Keller and Peng (2019), spatial prediction models showed better accuracy than monitor averaging for exposure assignment, although both methods have limitations.
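Of the interpolation methods named above, inverse distance weighting is the simplest to write down. The following is a minimal sketch, assuming planar coordinates in kilometers and a power parameter of 2; the function name, coordinates, and concentrations are hypothetical and not drawn from any monitoring network.

```python
import math


def idw_estimate(monitors, target, power=2.0):
    """Inverse distance weighted PM2.5 estimate at a target location.

    monitors : list of (x_km, y_km, pm25) tuples
    target   : (x_km, y_km) location to estimate
    power    : distance exponent (2 is a common, but assumed, choice)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in monitors:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value              # target coincides with a monitor
        weight = distance ** -power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical monitors: (x_km, y_km, PM2.5 in ug/m3).
sites = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 30.0, 6.0)]
print(idw_estimate(sites, (5.0, 0.0)))
```

The estimate is always bounded by the lowest and highest monitored values, so this approach cannot represent a hotspot or a clean area that lies between monitors.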
Before 1999, all PM2.5 measurements were collected using a filter-based method at 24 hr or longer time averages. Since 1999, the EPA PM2.5 monitoring system has been providing daily, continuous mass measurements reported as hourly averages. Most PM2.5 monitors use either the Federal Reference Method (FRM) or the Federal Equivalent Methods (FEMs) (Noble et al. 2010). Many other types of ground-based monitors exist beyond FRM and FEMs with varying levels of measurement accuracy, and comparisons have been made of the measurements among various instruments (Allen et al. 1997; Chow et al. 2008; Hering and Cass 1999; Turpin, Huntzicker, and Hering 1994, and references therein). The aerosol water content in PM2.5 is operationally defined, with dependence on both the instrument and network. For example, filters collected by the U.S. EPA are equilibrated at 30-40% (within ±5%) relative humidity (RH) (Chow and Watson 1998), while the European standard is 50% RH (European Committee for Standardization 1998). Some instruments use heaters to evaporate aerosol water, which also evaporates semi-volatile material. These different practices complicate comparison across ground monitors and networks.
Related networks include the EPA Chemical Speciation Monitoring Network (CSN, launched in 2000) and the Interagency Monitoring of Protected Visual Environments (IMPROVE) program (Malm et al. 1994), which operates a network of ~170 PM monitors primarily in U.S. Wilderness Areas and National Parks. Both networks provide 24-hr average PM2.5 concentrations, measured every three to six days, where a chemical analysis is performed to identify the elemental carbon, organic carbon, ammonium sulfate, ammonium nitrate, sea salt, soil, and other trace constituents. The field and laboratory approach is similar for both networks, despite differences in sampling and chemical analysis (Solomon et al. 2014).
In addition to the EPA monitors (typically located in highly populated areas) and IMPROVE monitors (located in rural background areas), temporary PM2.5 monitors are deployed as a part of the Wildland Fire Air Quality Response Program (WFAQRP, https://wildlandfiresmoke.net/) during periods of wide-scale smoke impacts from wildfire. The temporary monitors are a mix of Environmental Beta Attenuation Monitors (EBAMs) and E-Samplers manufactured by Met One, Inc. These monitors are not FRM monitors, but can provide real-time information about PM2.5 exposure. They are often deployed in remote, small towns where smoke impacts are heavy and other monitors do not exist. A web-based monitoring data tool merges data from these monitors with PM2.5 data from the EPA AirNowTech system, providing time-series graphs and data that are downloadable (see https://tools.airfire.org/monitoring).

Data merged from ground monitors and satellites
Satellite remote sensing of AOD generally offers more spatially extensive observational information than ground-based PM2.5 measurements (except for regions with perpetual issues due to cloud cover, snow cover, bright surface, viewing geometry, etc.). In this way, satellite data directly complement point-based estimates of PM2.5 from monitors. Several studies have developed linear regression models for estimating PM2.5 concentrations from remotely sensed AOD (Al-Hamdan et al. 2009; Gupta and Christopher 2009; Gupta et al. 2006; Wang and Christopher 2003; Zhang, Hoff, and Engel-Cox 2009), and others have added meteorological parameters to develop multiple regression models or generalized additive models for surface PM2.5 (Hu et al. 2014a; Liu et al. 2005, 2007; Ma et al. 2014; Paciorek et al. 2008). Statistical approaches often incorporate ancillary data, such as meteorological fields, land use, and road density, to derive surface PM2.5 based on merged data from ground monitors and satellites. The relationships between satellite retrievals of AOD and PM2.5 measured by ground monitors are often evaluated by correlation coefficients (R) between estimated and observed values in cross-validation exercises. The R values have been reported to depend on geographical location and to show large spatial and temporal variability (Hu et al. 2014a, 2014b; Kloog et al. 2014; Paciorek et al. 2008). In fact, the performance of the derived PM2.5 in the statistical approach relies on the availability, quality, and consistency of both ground-based observations and ancillary data, so past studies have typically been limited to a single county or a group of U.S. states, e.g., in the southeast (Bi et al. 2019; Hu et al. 2014a, 2014b; Lee et al. 2016) and northeast (Kloog et al. 2014) U.S.
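The simplest of the statistical approaches above, a linear regression of monitored PM2.5 on collocated AOD evaluated by a cross-validated correlation coefficient, can be sketched as follows. This is an illustration under stated assumptions, not the model of any cited study: the function names are hypothetical, the paired values are invented, and leave-one-out cross-validation is used as one possible validation scheme.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def loo_cv_r(aod, pm25):
    """Leave-one-out cross-validated R between predicted and observed PM2.5."""
    predictions = []
    for i in range(len(aod)):
        slope, intercept = fit_line(aod[:i] + aod[i + 1:], pm25[:i] + pm25[i + 1:])
        predictions.append(intercept + slope * aod[i])
    return pearson_r(predictions, pm25)


# Hypothetical collocated AOD (unitless) and monitor PM2.5 (ug/m3) pairs.
aod = [0.08, 0.12, 0.20, 0.25, 0.33, 0.41, 0.50]
pm25 = [5.1, 6.8, 8.0, 10.9, 11.5, 15.2, 16.0]
print(f"Leave-one-out R = {loo_cv_r(aod, pm25):.2f}")
```

Each held-out point is predicted by a model that never saw it, so the resulting R reflects performance at unmonitored locations more faithfully than an in-sample fit would.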
Despite the value of satellite data in providing spatial coverage to complement information from point-based monitors, satellite data have several limitations. In terms of temporal coverage, instruments on polar-orbiting satellites (e.g., MODIS, MISR, VIIRS) can only sample AOD at overpass times, at most once or twice a day, and less often where retrievals fail (e.g., over cloudy or snow-covered regions). Only a few studies have used instruments aboard geostationary satellites to examine surface PM2.5, such as the Geostationary Operational Environmental Satellite (GOES) aerosol/smoke product (GASP) (Liu, Paciorek, and Koutrakis 2009) and the Korean Geostationary Ocean Color Imager (GOCI) AOD product (Lennartson et al. 2018; Xiao et al. 2016; Xu et al. 2015). There are also issues with missing data in satellite observations, particularly over clouds (Kokhanovsky et al. 2007) or other bright surfaces such as deserts and coastlines (Remer et al. 2005). Because AOD data are not missing at random, it is challenging to identify the most impacted communities in health studies. Further complicating the integration of AOD and surface PM2.5 is the choice of satellite instruments (e.g., MODIS, MISR, SeaWiFS, VIIRS, CALIOP) and their AOD retrieval algorithms (e.g., Dark Target and MAIAC products for MODIS, Deep Blue for MODIS and SeaWiFS). In addition, the relative humidity of the atmospheric column, the chemical composition, hygroscopicity, and size distribution of PM in the column, and the vertical distribution of PM all affect the total uncertainty of the daily satellite-derived ground-level PM2.5 (Ford and Heald 2013; Jin et al. 2019a).

Data merged from ground monitors and model simulations
Numerical models of atmospheric chemistry and transport offer another powerful option for estimating near-surface PM2.5. These models, often referred to as atmospheric chemical transport models (CTMs) or photochemical grid models, may be run at the global or regional scale. They estimate the distribution of PM2.5 on grids whose horizontal resolution ranges from a few kilometers (regionally) (McMillan et al. 2010; Vaughan et al. 2004) to hundreds of kilometers (globally). Examples of regional CTMs include the EPA Community Multiscale Air Quality (CMAQ; www.epa.gov/cmaq) model and the Ramboll Comprehensive Air Quality Model with Extensions (CAMx; www.camx.com). Examples of global CTMs include the GEOS-Chem model (Bey et al. 2001) and the TM5 global chemistry transport model (Huijnen et al. 2010; Van Dingenen et al. 2018). CTMs are used extensively in applications requiring predictions for policy scenarios that cannot be directly monitored due to their hypothetical nature. Models calculate pollution concentrations over a continuous spatial domain and time period, and are therefore free from the spatiotemporal coverage limitations of monitor and satellite data. Fusing model and observational data can help to leverage the accuracy of observational data as well as the spatial and temporal coverage of models.
The EPA Fused Air Quality Surfaces Using Downscaling (FAQSD) (EPA 2016) is one example of model-measurement data fusion. FAQSD constructs a surface of spatially varying regression coefficients by comparing monitor data with modeled PM2.5 from the CMAQ model. This surface of regression coefficients is then interpolated spatially between monitors and used to create a fused daily surface PM2.5 exposure map (Berrocal, Gelfand, and Holland 2010a, 2010b, 2012). FAQSD has been shown to provide more accurate exposure estimates (e.g., at monitors withheld from the data fusion process) than ordinary kriging of the observations alone (Berrocal, Gelfand, and Holland 2010a, 2012). In collaboration with EPA, the Centers for Disease Control and Prevention (CDC) National Environmental Public Health Tracking Network (EPHTN) has extended the methods used for FAQSD to generate continuous model-based estimates of surface PM2.5. The EPHTN dataset (Centers for Disease Control and Prevention 2018) is available at the census tract and county level for the contiguous U.S. In addition to the publicly available data, numerous previous studies developed fusion algorithms that combine surface measurements and regional modeling (CMAQ) (Friberg et al. 2016, 2017; Huang et al. 2018) to estimate speciated PM2.5 concentrations at regional to national scales within the U.S.
As necessarily simplified versions of the physical atmosphere, models only represent spatial gradients on scales larger than the model grid (e.g., 12 km for the FAQSD). Areas with few or no monitoring sites are more sensitive to the model and fusion method, potentially leading to inaccuracies in the derived PM2.5 concentrations (Berrocal, Gelfand, and Holland 2012).
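The general idea of fusing a model surface with monitor data, a correction that is estimated at monitors and varies smoothly in space, can be illustrated with a deliberately simplified sketch. This is not the Bayesian space-time downscaler used for FAQSD: here the correction is an additive bias (observed minus modeled) spread between monitors by inverse distance weighting, which is a swapped-in technique chosen for brevity. All function names, coordinates, and values are hypothetical.

```python
import math


def idw(points, target, power=2.0):
    """Inverse distance weighted value at target from (x, y, value) points."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value
        weight = distance ** -power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def fuse(model_grid, monitors):
    """Bias-correct a modeled PM2.5 grid using monitor observations.

    model_grid : dict mapping (x_km, y_km) -> modeled PM2.5
    monitors   : list of (x_km, y_km, observed, modeled_at_monitor)
    """
    biases = [(x, y, observed - modeled) for x, y, observed, modeled in monitors]
    # Concentrations cannot be negative, so clip the corrected value at zero.
    return {cell: max(0.0, value + idw(biases, cell))
            for cell, value in model_grid.items()}


# Hypothetical model grid and monitors (ug/m3).
grid = {(0.0, 0.0): 8.0, (5.0, 0.0): 10.0, (10.0, 0.0): 12.0}
obs = [(0.0, 0.0, 10.0, 8.0), (10.0, 0.0, 11.0, 12.0)]
print(fuse(grid, obs))
```

Far from any monitor the correction tends toward an average of the monitor biases, which mirrors the point above that sparsely monitored areas depend heavily on the model and the fusion method.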

Data merged from ground monitors, satellite data, and model simulations
Since there are strengths and shortcomings associated with ground monitors, satellite data, and model simulations, one method to estimate ground-level PM2.5 is to combine all three of these sources to leverage the strengths of each one. The incorporation of model data informs the vertical distribution of aerosols in the atmosphere, and the hygroscopicity and chemical speciation of ambient PM, both key issues supporting the fusion of satellite AOD (column) with monitor data (surface). For example, high AOD may be indicative of either high levels of surface PM or high levels of PM aloft, and AOD alone cannot distinguish between these two scenarios. The integration of model data addresses this problem, with the accuracy of the surface products tied to the model simulations of aerosol vertical profiles. This approach provides a basis for estimating surface PM2.5 from satellites, even in regions with few or no ground-based monitors (van Donkelaar et al. 2010).
Due to the incomplete spatiotemporal coverage in both monitor-based PM2.5 and satellite AOD observations, various approaches have been developed to fill in the gaps. Some of them are individual methods such as kriging interpolation and land-use regression models, while other approaches are hybrids of multiple individual methods, combining model simulations and/or satellite AOD with monitor data. Two hybrid approaches are widely used: the statistical and the geoscience-based approaches. The Bayesian statistical downscaling method is a typical statistical modeling method that regresses PM2.5-AOD relationships and downscales the coarser-resolution grid-mean values to locations at higher resolutions (Berrocal, Gelfand, and Holland 2012; Shaddick et al. 2018a, 2018b). The statistical modeling approach typically has lower computational cost and the ability to provide probabilistic uncertainty measures for subsequent health effect and impact studies. The geoscience-based (also known as process-based) approach predicts PM2.5 estimates based on monitor data, satellite AOD, and PM2.5-AOD relationships from model simulations. Such an approach is more computationally expensive, but usually provides higher-resolution datasets (e.g., 1 km in Dalhousie data) than the Bayesian statistical downscaling method (e.g., 12 km in EPA FAQSD). A widely used global estimate of ground-level PM2.5 incorporating data from monitors, satellites, and an atmospheric model was developed by van Donkelaar et al. (2016). This dataset was developed by combining AOD from multiple satellite products (MISR, MODIS Dark Target, MODIS and SeaWiFS Deep Blue, and MODIS MAIAC). A GEOS-Chem simulation of atmospheric aerosols provides relationships between AOD and ground-level PM2.5, which are further adjusted by a statistical distribution based on simulated aerosol speciation, elevation, and land-use information.
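The core step of the geoscience-based approach can be sketched by assuming that the simulated relationship is expressed as a simple ratio of modeled surface PM2.5 to modeled AOD, which is then applied to the satellite-retrieved AOD. The function name and all numbers below are hypothetical, and the subsequent statistical adjustment described above is omitted.

```python
def satellite_pm25(aod_satellite, model_pm25, model_aod):
    """Scale satellite AOD to surface PM2.5 using a model-derived ratio.

    aod_satellite : retrieved column AOD (unitless)
    model_pm25    : simulated surface PM2.5 in the same grid cell (ug/m3)
    model_aod     : simulated column AOD in the same grid cell (unitless)
    """
    if model_aod <= 0.0:
        raise ValueError("model AOD must be positive to form the ratio")
    ratio = model_pm25 / model_aod     # ug/m3 of surface PM2.5 per unit AOD
    return ratio * aod_satellite


# Hypothetical grid cells with the same satellite AOD but different model profiles.
print(satellite_pm25(0.30, model_pm25=12.0, model_aod=0.24))  # aerosol mostly near the surface
print(satellite_pm25(0.30, model_pm25=4.0, model_aod=0.24))   # aerosol mostly aloft
```

The same retrieved AOD maps to very different surface concentrations depending on the simulated ratio, so errors in the modeled vertical profile carry straight through to the estimate.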
Another example of data assimilation of ground monitor and satellite data into global modeling of PM2.5 is the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), the latest reanalysis using the Goddard Earth Observing System, version 5 (GEOS-5) (Buchard et al. 2017). MERRA-2 assimilates AOD from bias-corrected MODIS data, non-bias-corrected MISR data, and sun photometer measurements from the Aerosol Robotic Network (AERONET). In addition, the Navy Aerosol Analysis and Prediction System (NAAPS) has been used by the Naval Research Laboratory (NRL) to generate an 11-yr offline aerosol reanalysis by assimilating quality-assured and quality-controlled MODIS and MISR AOD (Lynch et al. 2016). Other studies have developed regionally specific algorithms to calculate near-surface PM2.5. For example, Reid et al. (2015) used machine learning to evaluate how various combinations of monitoring data, AOD, model estimates, and auxiliary datasets (land use, traffic, and meteorology) can be used to improve the estimation of surface PM2.5 in Northern California. Di et al. (2016a) used a hybrid model based on a neural network to assess PM2.5 exposures over the U.S., and Beckerman et al. (2013) estimated the spatiotemporal variability of PM2.5 in the U.S. using a land-use regression model. Two sources of error exist when inferring surface mass abundance of PM2.5 from columnar AOD: uncertainties in satellite-observed AOD, and uncertainties in modeled relationships between simulated PM2.5 and AOD. Ford and Heald (2016) estimated the contribution of these two error sources to uncertainties in the annual premature deaths associated with long-term PM2.5 to be 20% for the modeled PM2.5-AOD relationship and 10% for errors in the AOD retrieved from satellite instruments, based on MODIS AOD Collection 6 and 0.5° × 0.667° GEOS-Chem simulations over the U.S. and China. When AOD is assimilated into model simulations, the lack of information regarding aerosol speciation and aerosol vertical distributions may lead to degraded model performance (Buchard et al. 2017). The added value of satellite data may depend on the number of existing monitors, as in the case study of Lassman et al. (2017) during the 2012 wildfire season in Washington State, which found marginal improvements from incorporating satellite data and model simulations when numerous ground monitors are present, compared with substantial improvements when fewer monitors exist.

U.S. PM2.5 surfaces and health applications
Using the methods and data discussed above, multiple research groups and agencies have developed PM2.5 exposure estimates. Here we compare several frequently used, publicly available surface PM2.5 datasets and give examples of how they have been used for air quality and health applications.

Major PM2.5 exposure datasets
Table 1 shows several datasets providing spatially continuous PM2.5 fields over the U.S. Not all these data are directly comparable, as they may be available for different years or regions. We conduct an intercomparison of PM2.5 among four datasets for the overlapping year of 2011: CDC WONDER (Wide-ranging Online Data for Epidemiologic Research), CDC National Environmental Public Health Tracking Network (EPHTN, also mentioned above), Dalhousie University's Atmospheric Composition Analysis group data (hereafter referred to as the Dalhousie data for brevity), and the ground-based monitor data (AQS + IMPROVE). The goal of this section is to illustrate the large spatial variability that exists among publicly available PM2.5 products. In fact, even though ground-based monitor data have been fused in the CDC WONDER, EPHTN, and Dalhousie data, there are still large differences among these datasets. We chose to intercompare their county-mean values, rather than values at the geocoded-address level, for consistency among methods, since CDC fields were only available at the county level.
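A county-level intercomparison of this kind amounts to aggregating each gridded product to county means and then comparing counties that appear in both datasets. The sketch below assumes an unweighted mean of grid cells within each county (no area or population weighting) and uses mean bias and root-mean-square error as summary statistics; these choices, the function names, the county labels, and the values are all hypothetical.

```python
from collections import defaultdict


def county_means(cells):
    """Average grid-cell PM2.5 values by county.

    cells : iterable of (county_id, pm25) pairs, one per grid cell
    """
    totals = defaultdict(lambda: [0.0, 0])
    for county, value in cells:
        totals[county][0] += value
        totals[county][1] += 1
    return {county: total / count for county, (total, count) in totals.items()}


def compare_counties(a, b):
    """Mean bias (a minus b) and RMSE over counties present in both datasets."""
    shared = sorted(set(a) & set(b))
    diffs = [a[county] - b[county] for county in shared]
    mean_bias = sum(diffs) / len(diffs)
    rmse = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    return mean_bias, rmse


# Hypothetical gridded product and county-average monitor values (ug/m3).
gridded = [("A", 8.0), ("A", 10.0), ("B", 12.0)]
monitor_means = {"A": 8.0, "B": 12.0, "C": 5.0}
print(compare_counties(county_means(gridded), monitor_means))
```

Counties without monitors drop out of the comparison, so agreement statistics of this kind say little about the sparsely monitored areas where the datasets are most likely to diverge.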
The CDC WONDER data ( Figure 1a) were developed by Al-Hamdan et al. ) using satellite AOD from MODIS and PM 2.5 from EPA monitors, and were made available through the CDC WONDER website (http://wonder.cdc.gov/). CDC WONDER data were generated based on a regression model that derives surface PM 2.5 from satellite-based columnar AOD, and a B-spline smoothing model that generates 10-km resolution, daily, spatially continuous PM 2.5 surfaces for the contiguous U.S. by using combined AQS monitor PM 2.5 measurements and bias-corrected MODIS satelliteestimated PM 2.5 . This dataset provides daily PM 2.5 data at the county scale for the continental U.S. in 2003-2011, easily accessible from a menu-driven online system operated by the CDC. This national PM 2.5 dataset in CDC WONDER allowed public health researchers and policymakers to effectively include air pollution exposure data in the context of health-related data available in the WONDER online system and other health data systems, and link these exposure data with state and county public health datasets throughout the continental U.S. The CDC WONDER data have been used to study associations between PM 2.5 air pollution and several adverse health effects, such as risk of sepsis hospitalization Evaluation of the PM 2.5 estimates from CDC WONDER found a much stronger correlation with ground-based PM  over the eastern and the midwestern United States than those over the western United States . This same pattern is evident in comparing CDC WONDER for 2011 in Figure 1a with county-average monitoring data in Figure 1d. A potential cause for the performance differences between the eastern and western United States in CDC WONDER data is the B-spline smoothing method, which results in relatively higher predicted values particularly when ground monitors are sparse. 
When using a different smoothing methodthe inverse distance weighted method, PM 2.5 surface estimates in California showed lower maximum values at countylevel than the B-spline method (http://www.met.sjsu.edu/ weather/HAQAST/). Figure 1b shows surface PM 2.5 from the CDC EPHTN, produced by the National Environmental Public Health Tracking Program in collaboration with EPA (EPHTP, also known as the Tracking Program), and based on Bayesian space-time downscaling (Berrocal, Gelfand, and Holland 2010a, 2010b, 2012. The EPHTN PM 2.5 data were previously used in epidemiology studies that track associations between PM 2.5 concentrations and a series of adverse health outcomes, including asthma-related emergency department visits, respiratory emergency department visits (Strosnider et al. 2019), exacerbation of existing asthma (MirabelMirabelli et al. 2016), cardiovascular chronic diseases (Weber et al. 2016), chronic kidney disease and end-stage renal disease (Bowe et al. 2018). In addition, the PM 2.5 estimates from EPHTN data were incorporated in a distributed lag nonlinear model when assessing the association between extreme heat and hospitalizations . Although satellite data have not been incorporated into the EPHTN data, the Tracking Program partnered with NASA and Emory University to enhance spatial coverage of PM 2.5 in the southeast U.S. (Hu et al. 2014a), and evaluated various satellite-based data products for characterizing adverse health impacts resulting from wildfire smoke PM 2.5 . One feature of the EPHTN data is the relatively high concentrations in the western U.S. compared with the Dalhousie and AQS +IMPROVE fields in Figure 1. This difference is likely because EPHTN data used AQS monitors but not IMPROVE monitors in the fitting. 
PM 2.5 concentrations in rural areas may be overestimated when rural, low-concentration measurements from the IMPROVE network are not used, and the predictions are based on interpolation of fits from AQS monitors in urban areas. Figure 1c shows 2011 surface PM 2.5 fields from the Dalhousie data, which are based on models, satellites, and ground monitors. The data shown in Figure 1c are from V4.NA.02, and the data products are continually updated. van Donkelaar et al. (2010) describe the first global satellite-derived PM 2.5 dataset (V1.01), with subsequent datasets including additional satellite products (V2.01) and extending the time period of data availability (V3.01). The version shown in Figure 1c was developed for North America using AOD retrievals from multiple satellite products, combined with a GEOS-Chem simulation at 0.5° × 0.67° resolution, and incorporation of local ground-based monitors through statistical fusion (V4.NA.02; van Donkelaar et al. 2019). This regional version builds upon updates made during development of the global product (V4.GL.02) that combine AOD from multiple satellite products based upon their relative uncertainties. The resultant 1-km PM 2.5 estimates are highly consistent (spatial coefficient of determination R 2 = 0.81) with global out-of-sample cross-validated PM 2.5 concentrations from ground-based monitors. A variant of these data is also available (with similar performance) for the year 2014 (Shaddick et al. 2018b) as used in the GBD (Cohen et al. 2017). These datasets are currently being updated with newer satellite products, updated models, and new monitoring networks (Snider et al. 2015). The Dalhousie V4.NA.02 data for 2011 shown in Figure 1c capture the spatial gradients of the county-level monitoring data in Figure 1d. The Dalhousie data seem to show the best agreement with the AQS+IMPROVE data in the remote areas of the western U.S. 
and New England, which is consistent with another comparison study (Jin et al. 2019b). This feature indicates that incorporating satellite data can be valuable for improving estimation of PM 2.5 concentrations in remote areas where monitor coverage is sparse.
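The inverse distance weighted method mentioned above as an alternative to B-spline smoothing can be illustrated with a minimal sketch. This is not the code used for any dataset discussed here; the function name `idw_estimate`, the planar (x, y) coordinates in kilometers, and the power of 2 are assumptions made for illustration only.

```python
import math

def idw_estimate(target, monitors, power=2.0):
    """Inverse distance weighted PM2.5 estimate at one location.

    target   -- (x, y) location in projected coordinates (assumed km)
    monitors -- list of (x, y, pm25) tuples for nearby monitors
    power    -- distance exponent; 2 is a common choice, assumed here
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, pm25 in monitors:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a monitor: return its value directly.
            return pm25
        weight = 1.0 / distance ** power
        weighted_sum += weight * pm25
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical monitors (x km, y km, annual mean in ug/m3).
monitors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_value = idw_estimate((1.0, 0.0), monitors)
```

Because each estimate is a weighted average with positive weights, it cannot exceed the largest monitor value, whereas a spline surface is not bounded in this way. This property may contribute to the lower county-level maxima reported for the inverse distance weighted surfaces.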
For comparison, Figure 1d shows a fusion of EPA AQS PM 2.5 and IMPROVE PM 2.5 allocated to the county scale (i.e., data within the same county are averaged to a single county-mean value). The two datasets were combined to provide a more comprehensive representation of both urban and rural areas in the United States. For EPA AQS PM 2.5, only counties with PM 2.5 concentrations collected over four complete quarters in 2011 were chosen, and these were averaged to estimate an annual value. For IMPROVE PM 2.5, monitor locations were matched with county Federal Information Processing Standards (FIPS) codes and county-mean values were calculated. Then, the EPA AQS PM 2.5 and IMPROVE PM 2.5 were fused; that is, when both AQS and IMPROVE monitors occurred within a county, measurements from these two networks were averaged to derive the county mean. The agreement between the fused data fields (Figure 1a-c) and the AQS-IMPROVE fused data (Figure 1d) may relate to the weight given to local data in the data fusion algorithm, and/or to the representativeness of a PM 2.5 monitor for a county average. Nevertheless, comparisons with observations can be useful for identifying differences among datasets based on quantitative evaluation metrics. Figure 3a compares the frequency distributions of the three datasets with respect to the AQS-IMPROVE fused data. CDC WONDER consistently shows higher values than the other datasets. The Dalhousie dataset exhibits lower values of PM 2.5 overall with the widest frequency distribution (largest standard deviation in Figure 3b).
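The county-level fusion of AQS and IMPROVE data described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, the completeness check is applied per monitor record, and the fused value is assumed to be the mean of the two network county means (pooling all monitors would be another plausible reading).

```python
from collections import defaultdict
from statistics import mean

def aqs_annual_means(quarterly_records):
    """Keep AQS records with four complete quarters and average them.

    quarterly_records -- list of (fips, [q1, q2, q3, q4]); None marks a
    quarter without complete data (an assumed input layout).
    """
    annual = []
    for fips, quarters in quarterly_records:
        if len(quarters) == 4 and all(q is not None for q in quarters):
            annual.append((fips, mean(quarters)))
    return annual

def county_means(records):
    """Average (fips, value) records to one value per county FIPS code."""
    grouped = defaultdict(list)
    for fips, value in records:
        grouped[fips].append(value)
    return {fips: mean(values) for fips, values in grouped.items()}

def fuse_networks(aqs_county, improve_county):
    """Average the two network means where both exist; otherwise use one."""
    fused = {}
    for fips in set(aqs_county) | set(improve_county):
        available = [d[fips] for d in (aqs_county, improve_county) if fips in d]
        fused[fips] = mean(available)
    return fused

# Hypothetical inputs (ug/m3); FIPS codes are for illustration only.
aqs = [("06001", [10, 12, 8, 10]), ("06001", [14, 14, 14, 14]),
       ("06003", [5, None, 5, 5])]
improve = [("06001", 4.0), ("06003", 3.0)]
fused = fuse_networks(county_means(aqs_annual_means(aqs)), county_means(improve))
```

In this example the incomplete AQS record in county 06003 is dropped, so that county is represented by its IMPROVE value alone.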
Results from our analyses of county-mean PM 2.5 estimates in 2011 are summarized in Table 2. The minimum annual county-level mass concentrations of PM 2.5 for CDC WONDER, EPHTN and Dalhousie are 7.2 μg/m 3, 4.4 μg/m 3 and 3.3 μg/m 3, respectively, with CDC WONDER's minimum being the highest. Maximum values in the three datasets are similar (CDC WONDER: 14.9 μg/m 3; EPHTN: 16 μg/m 3; Dalhousie: 13.4 μg/m 3). R 2 and the normalized mean bias are calculated based on standard formulas (EPA 2018c). Figure 2 shows the linear regression slope and intercept values. In the county-mean to county-mean comparison with the AQS-IMPROVE fused data, CDC WONDER exhibited the lowest R 2 of the three datasets (0.106) and the highest normalized mean bias (33.3%). In grid-level validation, however, CDC WONDER's surfacing algorithm showed an R 2 of 0.88 and RMSE of 1.6 μg/m 3 for the Southeastern U.S. when compared with EPA AQS monitors collocated with the grid cells of the estimates (Al-Hamdan et al. 2009). CDC EPHTN demonstrated an R 2 of 0.649 and normalized mean bias of 12.2%. The Dalhousie dataset shows an R 2 of 0.693 and normalized mean bias of 1.6% if the satellite-based data are sampled coincidentally at monitor locations from both the EPA AQS and IMPROVE networks before averaging by county and comparing. Agreement degrades to an R 2 of 0.647 and normalized mean bias of −3.3% if county-level averages are taken before comparing with the in situ monitor data, and degrades still further to an R 2 of 0.527 and normalized mean bias of −9.2% if the IMPROVE data are also excluded. The reduced agreement implies caution when using a limited number of primarily urban ground-based monitors to represent county averages, and motivates consideration of more spatially representative information such as from satellites and models as reviewed here. 
When further restricting the linear regression analyses to PM 2.5 < 15 μg/m 3 only, the R 2 values for CDC WONDER and Dalhousie data increase only slightly, while the R 2 value for EPHTN remains the same.
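For readers who wish to reproduce this type of comparison, the two evaluation metrics can be computed as in the sketch below. It assumes that R 2 is the squared Pearson correlation of the linear regression and that normalized mean bias is the summed difference between estimates and observations divided by the summed observations, expressed in percent; the function names and the example values are hypothetical.

```python
def normalized_mean_bias(estimates, observations):
    """Normalized mean bias in percent: 100 * sum(est - obs) / sum(obs)."""
    bias_sum = sum(e - o for e, o in zip(estimates, observations))
    return 100.0 * bias_sum / sum(observations)

def r_squared(estimates, observations):
    """Squared Pearson correlation between estimates and observations."""
    n = len(observations)
    mean_est = sum(estimates) / n
    mean_obs = sum(observations) / n
    covariance = sum((e - mean_est) * (o - mean_obs)
                     for e, o in zip(estimates, observations))
    var_est = sum((e - mean_est) ** 2 for e in estimates)
    var_obs = sum((o - mean_obs) ** 2 for o in observations)
    return covariance ** 2 / (var_est * var_obs)

# Hypothetical county means (ug/m3): estimates are 20% above observations.
observed = [5.0, 10.0, 15.0]
estimated = [6.0, 12.0, 18.0]
nmb_percent = normalized_mean_bias(estimated, observed)
r2 = r_squared(estimated, observed)
```

The example illustrates that a dataset can be perfectly correlated with the monitor data while still carrying a substantial bias, which is why both metrics are reported in Table 2.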
This comparison highlights the importance of methodology in estimating PM 2.5 surfaces. All three datasets used data from the EPA AQS; two used satellite AOD from the MODIS instrument; and two used advanced computer models. However, the results among these vary widely, and can be sensitive to assumptions about the spatial scale represented by individual PM 2.5 monitors. This comparison is limited to county averages of annual mean for 2011 and to only those counties with data availability over four complete quarters; so more extensive inter-comparison of PM 2.5 surfaces (including daily, grid-level and county-level comparisons for more counties and years) in future studies would provide helpful information to health and air quality organizations in selecting the appropriate data set for new applications.

PM 2.5 concentrations in global-scale estimates
The highest profile application of satellite-derived PM 2.5 fields has been the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD). The GBD project continually evaluates PM 2.5 exposure using a consistent, globally applicable method representing the current state-of-the-science of data fusion of satellite, model, and monitor data. Rather than relying on a single global function to combine information from the different data sources (satellite-derived product, model, and in situ monitor data), a Bayesian hierarchical model was used to combine multiple streams of information with calibration coefficients defined at the country-specific scale (where possible). This allowed the final estimate to more heavily weigh the data source that yielded the most accurate PM 2.5 estimates (as evaluated through out-of-sample cross-validation) in different regions of the world. The GBD 2016 results reported in Gakidou et al. (2017) continue to use a Bayesian hierarchical model to fuse geophysical satellite-derived PM 2.5 estimates (V4.GL.02.NoGWR; van Donkelaar et al. 2016) with in situ PM 2.5 and PM 10 monitor data, using predictors, modeled aerosols from the global GEOS-Chem model, a factor related to elevation and urban proximity, and random effects and correlations across these terms (Shaddick et al. 2018a). Overall, the R 2 compared to out-of-sample cross-validation measurements increased from 0.64 to 0.91, and the root-mean-square error (RMSE) decreased from 23 μg/m 3 to 12 μg/m 3, compared to GBD 2013 estimates (Shaddick et al. 2018a).
Table 2 notes. *R 2 and normalized mean bias are calculated based on county-mean to county-mean comparison with respect to AQS+IMPROVE data of 2011 annual mean. **This is based on county-mean to county-mean comparison; the surfacing algorithm that was used to create the CDC WONDER dataset was also validated on the grid level for the Southeastern U.S.
Besides differences in PM 2.5 estimates among studies, the estimates of global exposure mortality can be affected by other factors such as the causes of death being considered, the regions being considered, and whether part of the PM 2.5 impact is being categorized as from indoor or ... to estimate the association between outdoor PM 2.5 and nonaccidental mortality, and predicted 8.9 million [95% confidence interval (CI): 7.5-10.3] deaths globally in 2015, which is 2.2 times the prediction in the GBD (4.0 million; 95% CI: 3.3-4.8) (Forouzanfar et al. 2016).

Species and source-specific exposure estimates
Recently, interest in identifying characteristics of PM 2.5 associated with health risks beyond total mass concentration has surged in the health and air quality fields, but progress is limited by the lack of robust evidence for the impact of PM 2.5 composition or emission sector on health outcomes (Kioumourtzoglou et al. 2015). A challenge in identifying such relationships has been the limited availability of species and source-specific exposure estimates. A review by Bates et al. (2019) summarized the recent progress in measurement techniques for particle-bound reactive oxygen species (ROS) and oxidative potential (OP). They discussed the compositional impacts on OP as well as health effects, and highlighted the importance to OP of specific components and emission sources, including metals, organic carbon, vehicles, and biomass burning. When measurements of OP are lacking at various times and locations, modeling approaches such as land-use regression and source impact regressions are often used (Bates et al. 2015; Fang et al. 2016; Yang et al. 2016). While models can be used to estimate species and sector-specific PM 2.5 exposures, model biases, limited resolution, and other uncertainties still limit the accuracy of model-based exposure estimates.
A number of different research groups have fused models with satellite and/or ground-based measurements, as discussed above. This same approach can be extended to the chemical speciation of aerosols available in models. For example, Ivey et al. (2015, 2017) used a combination of model sensitivity analysis and receptor modeling at monitor locations to estimate source contributions to total PM 2.5 from 20 different sources, including contributions from inorganic aerosol species and metals. Zhai et al. (2016) investigated mobile source contributions to air pollution concentration fields including PM 2.5 at 250-m fine resolution using a calibrated dispersion model and ground monitor data. Their results were applied to the estimation of prenatal exposure (Pennington et al. 2017) and childhood asthma outcomes (Pennington et al. 2018) due to traffic air pollution. Lee et al. (2015) used a combination of global adjoint sensitivity modeling and satellite-derived PM 2.5 concentrations to estimate the emitted species contributing to the global total premature deaths associated with long-term exposure to PM 2.5. Rundel et al. (2015) fused monitoring data and CMAQ output and provided summaries of five major PM 2.5 species for the continental U.S., including sulfate, nitrate, total carbonaceous matter, ammonium and fine soil/crustal material. A combination of global modeling (GEOS-Chem) and satellite-derived PM 2.5 concentrations has also been used to estimate trends in population-weighted speciated PM 2.5 concentrations worldwide. The species considered were sulfate, nitrate, ammonium, organic aerosol, black carbon, dust, and sea salt, and these were evaluated in comparison to speciated in situ ground monitor measurements in the U.S. The relative trends agreed to within a few percent for most species, with the largest differences being those for natural components (sea salt and dust). 
Di, Koutrakis, and Schwartz (2016b) calibrated GEOS-Chem model simulations using ground monitoring data and predicted 1 km × 1 km resolution PM 2.5 speciation data on a daily basis in the northeastern U.S. Additionally, van Donkelaar et al. (2019) directly extend the Dalhousie University satellite-derived PM 2.5 methodology to include estimates of chemical composition. Across all species, the agreement with speciated measurements in North America had an R 2 that ranged from 0.57 to 0.96, and a slope from 0.85 to 1.05, with generally the best agreement for sulfate, ammonium, and nitrate (R 2 between 0.86 and 0.96, slope between 0.99 and 1.01).
The upcoming MAIA (Multi-Angle Imager for Aerosols, preformulation~2021) satellite-based remote sensing instrument will make radiometric and polarimetric measurements to better derive aerosol composition from space, helping to provide additional records of species and source-specific PM 2.5 to health researchers in several major urban areas worldwide (Diner et al. 2018).
Improved species and source-specific PM 2.5 surfaces will be of value for several reasons. For example, they can assist in the refinement of concentration-response relationships for epidemiological studies of the health impacts posed by particulate exposure. In addition, they can assist in identifying the PM 2.5 components or the emission sectors (e.g., transportation versus power generation) most responsible for PM 2.5 's toxicity.

Data assimilation and forecasting for episodic air pollution events
To date, most applications of data fusion among ground-based monitors, satellites, and/or model simulations have supported public health assessment retrospectively, since the measurements are only available for the past. To project future air quality, a conventional method is the "relative response factor" approach (EPA 2018d), which multiplies a base year's fused concentration field by the ratio of the CMAQ model predictions between the future and base years. This method has been applied to system development for meeting the NAAQS as well as to impact analysis (EPA 2012). However, there is clear potential to extend data fusion approaches to air quality forecasting, using satellite and ground-based data as initial conditions assimilated into the models. Health-based PM forecasts tracking smoke from wildland fires are one example of the application of data assimilation.
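The relative response factor calculation described above reduces to a cell-by-cell scaling, as in the following minimal sketch. The function name, the flat list layout of the grids, and the handling of non-positive base-year model values are assumptions for illustration; operational guidance includes additional steps that are not shown here.

```python
def project_with_rrf(base_fused, model_base, model_future):
    """Scale a base-year fused PM2.5 field by the modeled future/base ratio.

    All three inputs are equal-length lists of grid-cell values (ug/m3).
    Cells where the base-year model value is not positive return None,
    because the ratio is undefined there (an assumed convention).
    """
    projected = []
    for fused, base, future in zip(base_fused, model_base, model_future):
        if base <= 0.0:
            projected.append(None)
        else:
            relative_response_factor = future / base
            projected.append(fused * relative_response_factor)
    return projected

# Hypothetical three-cell example.
base_fused = [10.0, 8.0, 6.0]      # base-year fused observations and model
model_base = [12.0, 4.0, 0.0]      # model prediction, base year
model_future = [9.0, 4.0, 1.0]     # model prediction, future year
future_field = project_with_rrf(base_fused, model_base, model_future)
```

Using the model only in ratio form means that a constant multiplicative model bias cancels out, while the absolute level is anchored to the fused base-year field.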
Wildfires are becoming increasingly important sources of PM 2.5 pollution hazards due to the large quantity of emissions (Larkin, Raffuse, and Strand 2014), the acute episodic nature of the events, the difficulties associated with controlling them, and impacts of a changing climate (Spracklen et al. 2009). In particular, for the Northwest U.S., a positive trend was found in the 98th percentiles of PM 2.5 due to the increasing total areas burned by wildfires. This is in contrast to the decreasing trend of PM 2.5 in the other areas of the contiguous U.S. (McClure and Jaffe 2018). Several previous studies examined the adverse health effects from exposure to wildfire smoke in previous years based on observations and simulations (e.g., Rappold et al. 2017). Compared with retrospective analysis, future-oriented forecasts of PM 2.5 emissions from wildfires are even more challenging. Because of the episodic nature of wildfire emissions, forecasts require daily or even hourly PM 2.5 estimates, rather than the annual exposure fields used in health studies like the GBD. In addition, fires typically occur in regions away from major cities where ground-based monitoring data are not always available, which poses challenges for data assimilation.
There are multiple smoke forecasting systems in the U.S., all of which use models to provide future estimates of ground-level PM 2.5. Currently, satellite data are used in these systems to inform the location, timing, and characteristics of fire. Fire detection characteristics are combined with land cover data to calculate emissions into the model, and satellite AOD is used to evaluate model performance (Draxler and Hess 1998; Herron-Thorpe et al. 2012; Larkin et al. 2009; Schroeder et al. 2008; Stein et al. 2015; Vaughan et al. 2004).
Among these smoke forecasting systems, the most direct integration of satellite AOD and wildland smoke prediction is performed by the Wildland Fire Air Quality Response Program (WFAQRP), which has been developed to address smoke issues from wildfires, bringing the latest in fire emissions and smoke transport science to Incident Management Teams, health and air quality agencies, and ultimately the public. This program provides smoke forecasting expertise, deploys temporary PM 2.5 monitors to augment existing monitoring systems, and communicates information on how to protect oneself from smoke. A daily Smoke Outlook is produced, forecasting expected smoke behavior and level of impact in the region of the fire (Figure 4). These smoke outlooks are initialized with a statistical model incorporating MODIS AOD and surface PM 2.5 monitoring data to forecast the AQI for the next day (Marsha and Larkin 2019).

Discussion and conclusion
Remaining barriers for obtaining and applying PM 2.5 exposure for health applications
This review discusses the methods, sources, and application of spatially continuous PM 2.5 datasets derived from combinations of ground-based data, satellites, and/or atmospheric models. The data fusion products leverage the benefits of each data source, providing PM 2.5 exposure estimates over continuous spatial scales. This new resource for air quality data has already been used for health assessments, and has potential for application to research, public outreach, and environmental management. Nevertheless, several challenges remain in obtaining and applying PM 2.5 exposure for health applications. For example, missing data coverage in both ground monitors and satellite observations requires additional efforts for merging multiple data sources or interpolating data in space and time. When choosing the methods for data fusion or interpolation, users often need to weigh the advantages and disadvantages of different methods. One may choose the efficient calculation of the Bayesian statistical downscaling method for lower spatial resolution (such as 12-km resolution in Wang et al. (2018)), while others may choose the more computationally expensive methods that combine CTM, satellite and ground monitors (van Donkelaar et al. 2019), or machine learning techniques (Di et al. 2019; Hu et al. 2017; Reid et al. 2015). Methods that incorporate dispersion modeling may also be useful in some applications (Ahangar, Freedman, and Venkatram 2019; Scheffe et al. 2016). Depending on the methods being used in data fusion, assimilating the ground monitor data does not guarantee similar output of surface PM 2.5, as demonstrated in our case study of four datasets in Section 3.1. Such variability among publicly available datasets calls for more intercomparison studies to contrast and explain their differences. 
Future intercomparison studies are recommended to isolate individual factors contributing to comparison results, including but not limited to spatiotemporal variabilities of surface PM 2.5 (e.g., seasonal variability, topography, eastern versus western U.S.), data sources being used (model, satellite and/or monitors), regression of AOD-PM 2.5 relationships, representativeness of ground monitor data (e.g., weighting functions of monitors over a larger scale), and spatial and temporal resolutions. As end users of these PM 2.5 datasets, researchers in public health should be mindful that the agreement between fused PM 2.5 data and monitored PM 2.5, as evaluated by R or R 2, is affected by multiple factors and assumptions used in such evaluation (e.g., representativeness of monitor data, spatiotemporal interpolation, and data resolutions). Varying sampling schedules for networks (i.e., higher frequency in urban areas versus lower frequency in rural areas) create another challenge when using ground monitor data as the gold standard, particularly for developing daily concentration fields. On days without rural sampling, predicted concentrations in rural areas might be overestimated because they are based on fits to predominantly urban monitoring. One way to combat this is to perform fusion on longer term monitor averages (monthly, quarterly, etc.). We recommend that researchers from both communities (i.e., data development and public health sectors) work together when applying the PM 2.5 estimates in health impact assessments and/or air quality management actions. One example is to guide model evaluations by the intended use of PM 2.5 fields in health studies, which may be focused on a specific concentration range (e.g., lower concentrations in rural areas versus higher concentrations in urban areas), or a specific population. These specific targets may not be thoroughly tested by national or regional cross-validation.

Web-based tools for PM 2.5 exposure analyses
Despite the benefits of these data and the rapidly advancing research in this area, there are several key steps toward wider utilization of these data. One key step is to downsize spatially continuous data based on the types of applications. Although research groups may provide global or national data, most users require a small subset of data over a particular region and time period. Online mapping tools and options that allow users to subset data prior to download can significantly reduce the data management burden on users. Another key step is to maintain and foster training tutorials and seminars on specialized mapping software. For instance, the NASA Applied Remote Sensing Training (ARSET) program has been providing in-person and online tutorials for these purposes since 2009 (NASA 2019).
Several web-based tools are available for analyzing and predicting PM 2.5 exposure. The EPA Remote Sensing Information Gateway (RSIG), developed by the U.S. EPA in collaboration with NASA, allows rapid retrieval and subsetting of satellite, model and ground-based data relevant to air quality. The RSIG website (https://www.epa.gov/hesc/remote-sensing-information-gateway) provides multi-source, daily PM 2.5 data at up to 12-km horizontal resolution. Users can select only the subset of data needed for a particular application to be downloaded. RSIG can also combine multiple variables to reduce the burden of data analysis on the user. For example, if one is interested in comparing CMAQ-modeled PM 2.5 with AQS observations, the AQS observations can be subsetted and regridded to the model grid by specifying the model configuration (e.g., domain range, projection, resolution) (EPA RSIG 2019). As of April 2018, multiple PM 2.5 datasets are available from RSIG, including the EPA AQS in situ monitor data, standard CMAQ model simulations over the continental U.S. conducted by the U.S. EPA, the FAQSD, and the combination of ground-based monitor and model data used by the CDC EPHTN. In addition, RSIG provides up-to-date NASA MODIS AOD and the National Environmental Satellite, Data and Information Service (NESDIS) Biomass Burning Emissions Data.
The University of California Berkeley's Earth Air Quality Map (http://berkeleyearth.org/air-quality-realtime-map/) provides near-real-time maps of AQI values for PM 2.5 concentrations in the U.S. and several countries and regions outside of the U.S. (e.g., China, Canada, and Europe) at 0.1-degree resolution. As specified on the website, preliminary data from surface station measurements are used with an automated quality control procedure, and are interpolated using the kriging method. Daily maps of PM 2.5 AQI values are available to download from June 2016 to March 2017.
Mapping data through a web application can significantly reduce the burden of creating a map or plot. However, users often want the ability to create maps in a standard software platform to integrate with other aspects of their work. Currently, netCDF is the most common format for atmospheric data, including PM 2.5 surfaces. Many programming languages, such as Python, the Interactive Data Language (IDL), MATLAB, and the free National Center for Atmospheric Research (NCAR) Command Language (NCL), support netCDF data. ArcView GIS also supports netCDF, but users may have difficulty plotting large netCDF files in GIS. Because GIS platforms are so widely used among the health, land planning, and policy communities, we recommend that data providers also include GIS shapefiles as a standard distributed data format. Typically, shapefiles provide data in political or geographic spatial domains (e.g., U.S. counties or census tracts) rather than the grid format, which is typically used in atmospheric research.

Ongoing and future research efforts
Besides providing more guidance on the tools to potential users, improved utilization of PM 2.5 exposure datasets can be supported by increased validation. Because ground-based monitors, satellites, and models are often combined to estimate surface PM 2.5, there are few independent data sources for validation. A recent study over New York State uses independent ground-based observations from the New York City Community Air Quality Survey (NYCCAS) Program and the Saint Regis Mohawk Tribe Air Quality Program to evaluate seven PM 2.5 products (Jin et al. 2019b). Jin et al. suggest that including satellite remote sensing improves estimates of surface PM 2.5 in remote areas, but yields little gain over urban areas. One of the networks with continuous efforts to evaluate satellite-derived PM 2.5 estimates is the publicly available Surface Particulate Matter Network (SPARTAN; www.spartannetwork.org) (Snider et al. 2015, 2016), which measures fine particle aerosol concentrations and composition continuously over multi-year periods at international sites where AOD is also measured by ground-based instruments. When validating PM 2.5 exposure data with ground-based observations, one caution is that the agreement between the fused data fields and in situ monitor data is related to the weight given to monitor data in the data fusion algorithm and the spatial and temporal scales that the in situ monitor data are assigned to represent. Certain criteria for data quality control may also cause a sampling bias. Linking satellite observations of AOD with ground-level PM 2.5 estimates requires accurately understanding the relationship between aerosol extinction and aerosol mass abundance, which depends on multiple factors such as aerosol size distribution, hygroscopic growth (Brock et al. 2016a; Ziemba et al. 2013) and ambient relative humidity (Brock et al. 2016b). Targeted research efforts have been underway to advance understanding of these issues (Jin et al. 2019a). 
For example, the NASA DISCOVER-AQ aircraft campaign took place over four urban areas in the U.S. from 2011 to 2014, and provided extensive profiling of aerosol optical, chemical and microphysical properties at locations coincident with ground-based PM 2.5 sites. This multi-platform suite of observations is ideal for analyzing the relationship between satellite column AOD and ground-level PM 2.5 abundances (Crumeyrolle et al. 2014; Jin et al. 2019a).
While we have focused our discussion here on AOD, the most widely used metric for ambient particulates, there are other remote sensing instruments that can expand the value of satellite-based information (some of these are already in use in the data products presented here). For example, instruments aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) provide vertical profiles of aerosol extinction with limited spatial coverage, which can be used to detect aerosols present above the surface (Ford and Heald 2013), and to correct model biases in the vertical profiles of aerosol extinction when quality-controlled retrievals are available (Li, Carlson, and Lacis 2015; van Donkelaar et al. 2013, 2016).
Ongoing research efforts have great potential to benefit spatially continuous PM 2.5 data fusion efforts. New geostationary satellites will provide improved temporal coverage, as they orbit with Earth's rotation, and can track the evolution of weather systems, wildfire smoke, urban pollution, and other factors that affect ground-level PM 2.5. For example, the GOES-16 satellite provides AOD at a 2-km horizontal resolution every 5 min over the continental U.S. and other areas in its field of view. Improvements in these data, and other upcoming satellites, hold the potential to revolutionize the role of satellite data in estimating ground-level PM 2.5. Ongoing improvement in numerical modeling of the atmosphere, supported by decreasing computational cost, will also improve these data fusion products, allowing for higher-resolution model simulations. The rapid rise in low-cost PM monitors also offers opportunities for obtaining surface PM 2.5 information around the world, if low-cost monitors can be reliably calibrated.
Publicly available datasets are already supporting a wide range of air quality and health applications benefiting from spatially continuous PM 2.5 data. Increased transparency of data products and methods, improved dissemination of data to support GIS mapping software, and plain-language communication of complex ideas have the potential to vastly expand the relevance of emerging data for health and air quality. As new users explore and evaluate these tools, their feedback to the research community can inform and improve future activities. A two-way dialog between researchers and stakeholders, e.g., through the NASA Health and Air Quality Applied Sciences Team (HAQAST 2019) and the NASA ARSET training program (NASA 2019), can be very helpful in defining research priorities and ensuring that outcomes serve wider needs.

Acknowledgments
The

About the authors
Minghui Diao is an assistant professor at San Jose State University. She received her B.S. degree from Peking University and Ph.D. degree from Princeton University. She is a member of the NASA Health and Air Quality Applied Sciences Team (HAQAST). Her current research interests include remote sensing of aerosol optical depth and PM2.5, and air quality assessment in California. For more information please see: www.cloud-research.org.
Xiaomeng Jin is a graduate student in the Department of Earth and Environmental Sciences at Columbia University. She works with Dr. Arlene Fiore on atmospheric chemistry. Her research uses satellite observations and atmospheric chemistry models to analyze urban surface ozone and particulate matter pollution, and their sensitivities to precursor emissions.
Arlene M. Fiore is a professor of atmospheric chemistry in the Department of Earth and Environmental Sciences of Lamont-Doherty Earth Observatory and Columbia University and an associate faculty member of the Earth Institute. For more information please see http://blog.ldeo.columbia.edu/atmoschem/.
Daven K. Henze is an associate professor of mechanical engineering at the University of Colorado Boulder. He received his Ph.D. in chemical engineering from Caltech. He is a member of the NASA HAQAST team and researches the sources and fates of aerosols and trace-gases using adjoint sensitivity analysis, inverse modeling, and data assimilation.
Forrest Lacey is an Advanced Study Program postdoc fellow at the National Center for Atmospheric Research. He also worked at the University of Colorado, Boulder as a postdoc and an adjunct professor.
Patrick L. Kinney is a professor in the School of Public Health at Boston University. He received his B.S. degree from the University of Colorado, Boulder, and his M.S. and Sc.D. degrees from Harvard School of Public Health.
Frank Freedman is an adjunct faculty member at San Jose State University. He received his B.S. and M.S. degrees from San Jose State University, and his Ph.D. degree from Stanford University.
Narasimhan K. Larkin is a research scientist and team leader with the USDA Forest Service Pacific Northwest Research Station's AirFire Team and a member of the NASA HAQAST science team. He received his Ph.D. from the University of Washington studying climate variability. Since 2001 he has led research into wildfires, climate, and smoke, including developing emissions inventories, smoke forecasting systems, and operational decision support tools used across the United States and in other countries.
Yufei Zou is a postdoctoral research associate at Pacific Northwest National Laboratory (PNNL). Before joining PNNL, he received his Ph.D. degree in Earth and Atmospheric Sciences from Georgia Institute of Technology in 2017 and then worked at the University of Washington in Seattle for two years. His research mainly focuses on natural hazards and extreme weather events such as large wildfires and severe air pollution associated with global climate change.
James T. Kelly is an Environmental Scientist in the Office of Air Quality Planning & Standards at the U.S. Environmental Protection Agency.
Ambarish Vaidyanathan is a health scientist with the Centers for Disease Control and Prevention. He received his master's and doctoral degrees in Environmental Engineering from Georgia Institute of Technology, Atlanta, GA. Dr. Vaidyanathan has broad training and work experience in epidemiology, air quality modeling, statistics, and geographic information systems. He has collaborated with NASA and NASA-funded partners on remote sensing projects since 2005.