Numerical simulation of an extreme haze pollution event over the North China Plain based on initial and boundary condition ensembles

ABSTRACT The North China Plain often suffers heavy haze pollution events in the cold season due to the rapid industrial development and urbanization in recent decades. In the winter of 2015, the megacity cluster of Beijing–Tianjin–Hebei experienced a seven-day extreme haze pollution episode with peak PM2.5 (particulate matter (PM) with an aerodynamic diameter ≤ 2.5 μm) concentration of 727 μg m−3. Considering the influence of meteorological conditions on pollutant evolution, the effects of varying initial conditions and lateral boundary conditions (LBCs) of the WRF-Chem model on PM2.5 concentration variation were investigated through ensemble methods. A control run (CTRL) and three groups of ensemble experiments (INDE, BDDE, INBDDE) were carried out based on different initial conditions and LBCs derived from ERA5 reanalysis data and its 10 ensemble members. The CTRL run reproduced the meteorological conditions and the overall life cycle of the haze event reasonably well, but failed to capture the intense oscillation of the instantaneous PM2.5 concentration. However, the ensemble forecasting showed a considerable advantage to some extent. Compared with the CTRL run, the root-mean-square error (RMSE) of PM2.5 concentration decreased by 4.33%, 6.91%, and 8.44% in INDE, BDDE and INBDDE, respectively, and the RMSE decreases of wind direction (−5.19%, −8.89% and −9.61%) were the dominant reason for the improvement of PM2.5 concentration in the three ensemble experiments. Based on this case, the ensemble scheme seems an effective method to improve the prediction skill of wind direction and PM2.5 concentration by using the WRF-Chem model. GRAPHICAL ABSTRACT


Introduction
With the rapid industrial development and urbanization that has taken place in eastern China during the last several decades, the frequency and intensity of haze pollution have also been increasing-especially in the megacity cluster of Beijing-Tianjin-Hebei (BTH) (Zheng 2013;Cai et al. 2017a;Wu, Ding, and Liu 2017;Li et al. 2018). BTH is located in the North China Plain region, which is one of the most severe haze pollution regions in China (Quan et al. 2014;Yang et al. 2015). In January 2013, a total of five heavy haze pollution episodes were recorded over this region, with the highest hourly PM 2.5 concentration reaching approximately 1000 μg m −3 (Ji et al. 2014). Such a sharp decline in air quality greatly threatens human health and traffic safety (Xu, Chen, and Ye 2014).
In view of the frequent occurrence and seriousness of haze pollution events, accurate air quality prediction is demanding. Numerical simulation experiment of past haze events is a good way for achieving an understanding of the formation mechanisms, regional transportation and source apportionment of air pollution (Yang et al. 2015;Li et al. 2017;Wang et al. 2018). However, due to the highly nonlinear nature of the atmosphere and chemical substances, it is still a challenge to accurately predict severe haze events, even with state-of-the-art numerical models (Mallet, Stoltz, and Mauricette 2009;Zhang et al. 2012;Tuccella et al. 2012;Zhang et al. 2017). Ensemble-based forecasting has been investigated to improve the predictability of PM 2.5 concentration. For example, Cai et al. (2017b) carried out PM 2.5 ensemble forecasting experiments over Tianjin using the Weather Research and Forecasting model coupled to chemistry (WRF-Chem) based on various boundary layer schemes, which reduced the relative error and root-mean-square error (RMSE) of PM 2.5 concentration. Wang et al. (2009) constructed a multimodel ensemble operational forecasting system based on NAQPMS (Nested Air Quality Prediction Modeling System), CMAQ (Community Multiscale Air Quality), CAM X (Comprehensive Air Quality Model with Extensions) and other models, and the correlation coefficient of daily mean PM 10 (PM with an aerodynamic diameter ≤ 10.0 μm) during the Beijing Olympic Games exceeded 0.7. Besides, the weighted integration method using different model components was found to be better than the arithmetic average result. Multi-model ensemble forecasting with the linear regression method, as reported by Huang et al. (2015), can reduce the uncertainty of PM 10 forecasts and improve the skill in predicting pollution episodes over the Beijing area.
The dominant multi-model ensemble or varying the model physics parameterization scheme ensemble in air quality forecasting aims to reduce the uncertainties caused by physical and chemical parameterizations and numerical calculation methods (Žabkar, Koračin, and Rakovec 2013;Petersen et al. 2018;Crippa et al. 2019). In fact, a lack of consideration for emission sources and meteorological conditions can also lead to large uncertainties in air quality simulation (Tang et al. 2010). However, few studies have investigated the effects of variations in meteorological fields on the simulation or prediction of PM 2.5 concentration. Therefore, the purpose of this study was to explore the effects of varying the initial and lateral boundary values of meteorological fields on the generation and transmission of PM 2.5 through ensemble runs with the WRF-Chem model, and to evaluate whether the ensemble experiments can effectively improve the simulation of PM 2.5 concentration in regional air quality models. An extreme haze pollution event in 2015 was selected as a typical case to carry out the study.
The North China Plain suffered a series of serious haze pollution events in winter 2015 (Zhang et al. 2017). From 26 November to 2 December, the megacity cluster of Beijing-Tianjin-Hebei (BTH) endured an extreme regional haze pollution event. Most of the hourly mean PM 2.5 concentrations during this haze episode exceeded 100 μg m −3 , and were even higher than 200 μg m −3 in Beijing and central and southern Hebei (Figure 1). The observed peak concentration of PM 2.5 reached 727 μg m −3 on 30 November in the city of Baoding, and it was higher than 500 μg m −3 in Beijing throughout most of the day on 1 December. Because of the severity of this event, many studies on its formation and evolution processes have been carried out, as well as the associated regional transportation and source apportionment Li et al. 2017). Li et al. (2017) found that this event can be divided into four stages: the slow formation stage (27-28 November); the peak-sustaining stage, except in Beijing (29-30 November); the rapidly increasing stage in Beijing (1 December); and the clearance stage (2 December). In our study, the focus was on an evaluation of ensemble simulation results. The data, model configuration, and experiment design used in this study are described in section 2. A basic model verification and detailed analysis of the ensemble simulation results are presented in section 3. A summary and concluding remarks are given in the final section.

Model configuration and experiment design
In this study, the WRF-Chem model, version 3.9.1.1 (Grell et al. 2005;Skamarock et al. 2008;Peckham et al. 2011;Powers et al. 2016) was adopted to simulate the heavy haze pollution event. Considering the influence of meteorological conditions on pollutant evolution, the effects of varying the initial conditions and lateral boundary conditions (LBCs) of WRF-Chem on the variation in PM 2.5 concentration were investigated through ensemble methods. We carried out a control run (CTRL) and three groups of ensemble forecasting experiments based on different atmospheric initial conditions and LBCs. All of the experiments shared the same physical and chemical parameterization schemes: the Morrison double-moment microphysics scheme, RRTMG short-and longwave radiation schemes, Noah land surface model, Yonsei University planetary boundary layer scheme, Grell-Freitas cumulus parameterization scheme, CBMZ chemical mechanism and MOSAIC using eight sectional aerosol bins including some aqueous reactions, and Fast-J photolysis. Besides, we opened sensible and latent heat fluxes on the ground and the radiation feedback mechanism for clouds in the physical process. The dry and wet deposition of aerosols and the direct effect between aerosols and clouds have been considered in the chemical process (Chapman et al. 2009). A 0.25°× 0.25°spatial resolution of anthropogenic emissions (monthly in 2012) was used as the chemical fields to initialize the WRF-Chem model, which was provided by the MEIC (Multi-resolution Emissions Inventory for China) group of Tsinghua University (http://www.meicmodel.org/) Liu et al. 2015;Cheng et al. 2016). Figure 1 shows the model domain with a grid size of 9 km. There were 30 σ-levels in the vertical direction and the model top was set at 50 hPa.
We used the latest released fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate (i.e., ERA5) as the model initial conditions and LBCs. The ERA5 dataset is composed of a 31 km high-resolution forecast (HRES) and a 62 km reducedresolution 10-member 4D-Var ensemble of data assimilation (EDA) in CY41R2 of ECMWF's Integrated Forecast System (IFS) . In our study, both HRES and the 10 EDA members were pre-interpolated to a regular 0.25°× 0.25°latitude/longitude grid with 137 hybrid sigma/pressure (model) levels in the vertical direction. HRES and the 10 EDA members are available hourly and every three hours, respectively. It was found that the meteorological forcing variables do show distinct differences between HRES and the 10 EDA members, almost everywhere in the model domain (e.g., 10 m meridional wind at 0000 UTC 25 November 2015; Figures S1a-c). Also, the standard deviation among the 10 EDA members was also obvious (Fig. S1d). In the CTRL run, HRES data were adopted as the initial conditions and LBCs, and therefore the LBCs were updated hourly. Next, three groups of ensemble experiments (named INDE, BDDE and INBDDE) were executed, and each group involved 10 ensemble members. In group INDE, the initial conditions of each run were derived from the 10 EDA members, and the LBCs remained the same as in the CTRL run (i.e., hourly HRES). In group BDDE, the LBCs of each run were derived from the 10 EDA members, which were updated every three hours, and the initial conditions were the same as in the CTRL run. In group INBDDE, both the initial conditions and LBCs in each of the 10 runs were from the 10 sets of EDA members. The differences between CTRL, INDE, BDDE and INBDDE are summarized in Table 1.
The simulations in both CTRL and the three groups of ensemble runs were initiated at 0000 UTC 25 November 2015 and then continuously integrated to 0000 UTC 3 December 2015. The first 16 hours of integrations were treated as the model spin-up time. The hourly model outputs during 1600 UTC (0000 LST) 26 November to 1500 UTC (2300 LST) 2 December 2015 were used for analysis.

Mathematical methods
The systematic bias and RMSE were used to evaluate the performance of different simulations. The systematic bias is defined as follows: where 'bias' is the systematic bias, c im represents the model output value of the ith sample, c iobs represents the observed value of the ith sample, and n is the total number of samples. The RMSE is defined as follows: (2) where diff i ð Þ ¼ c im À c iobs , and 'bias' and n have the same interpretations as in Equation (1) (He 2018). In each group of ensemble runs, the computations of bias and RMSE were based on an equal weight average of the 10 ensemble members.
The ensemble efficiency (EE) was defined as follows: where, RMSE CTRL refers to the computed RMSE in CTRL according to Equation (2) In addition, we also used the Pearson correlation coefficient (R) between the model outputs and the corresponding observations to evaluate the model skill.

Model verification
As mentioned in section 2.1, we selected a total of eight highly-polluted cities for model evaluation and subsequent analysis. All of them lay in the BTH megacity cluster. The simulation results were all interpolated to the exact in-situ environmental/meteorological observation sites. In general, CTRL reproduced the meteorological conditions of the haze event reasonably well. The R of 2-m relative humidity, temperature, and 10-m wind speed was 0.78, 0.83, and 0.61, respectively (eight cities, 1337 valid samples used).
To illustrate the model performance regarding PM 2.5 simulation, the hourly evolution of PM 2.5 concentration at the eight cities during 26 November to 2 December from both observation and CTRL model output are shown in Figure 2(a). Intuitively, the hourly PM 2.5 concentration fluctuated intensively, with sharp increases in BJ, BD, and HS especially. These anomalous fluctuations may have been caused by the rapid movement of locally contaminated air masses, which may have derived from the boundary layer jet or local valley wind circulation (Liao et al. 2016;Chen et al. 2018). However, the CTRL run of the WRF-Chem model showed quite different performance regarding PM 2.5 in the eight cities. The R varied between 0.29 and 0.69 (above 0.6 at BJ, SJZ and XT; between 0.4 and 0.5 at TJ, BD, CZ and HS). Specifically, the simulation underestimated the concentration during peak-sustaining stage (roughly 29 November to 1 December) at BJ, TJ, BD, CZ, and HS. The abovementioned sharp increases of PM 2.5  concentration were all missed by CTRL. Therefore, it is a challenge for WRF-Chem to simulate such a longlasting (7 days) extreme haze pollution eventespecially the sudden increase or decrease of instantaneous PM 2.5 concentration.

Ensemble simulation results
In order to evaluate the forecasting skills of the three groups of ensemble experiments (i.e., INDE, BDDE, INBDDE), a series of linear fitting and regression analyses were performed for the hourly PM 2.5 concentration between observation and CTRL, the ensemble mean values of ensemble runs. There are a total of 1337 valid sample points (7 (days) × 24 (hours) × 8 (cities), removing 7 missing values) in the four scatterplots shown in Figure 3. The r 2 statisticusually referred to as the determination coefficient, the ratio of regression variance to observation varianceis a statistical measure of how close the data are to the fitted regression line (Huang 2004).
In contrast to the single, deterministic initial conditions and LBCs in CTRL, all of the three groups of ensemble runs did indeed improve the prediction skill of PM 2.5 concentration. The r 2 (RMSE) increased (decreased) from 0.24 (124.23) in CTRL to 0.31 (113.75) in INBDDE (cf. Figure 3(a,d). Not limited to PM 2.5 , INBDDE showed the best performance in terms of the surface meteorological variables, followed by BDDE and INDE ( Table 2). The ensemble simulation experiment that simultaneously changed the initial conditions and LBCs (i.e., INBDDE) was an effective method to improve PM 2.5 forecasting compared to only changing the initial conditions (INDE) or LBCs (BDDE), in this case. However, what was modulating the variation in PM 2.5 concentration in the INBDDE experiment? We further examined the variation of near-surface meteorological factors to understand the possible mechanism. Table 2  −8.89%, and −9.61%) seemed to be more dominant.
Because we performed an eight-day simulation, the simulation of wind direction was more sensitive to the ensembles of the LBCs than those of the initial conditions in this case study. Also, the ensemble scheme involving simultaneously changing the initial conditions and LBCs led to the best performance for both wind direction and PM 2.5 concentration. Wind direction variation means that the regional transportation of PM 2.5 among cities may be changed, despite wind speed remaining steady. Next, we discuss the changes in PM 2.5 spatial distribution caused by the wind direction adjustment in INBDDE (with the most obvious variations). Figure 2 shows the hourly evolution of PM 2.5 and CO concentration in the eight cities from observations, CTRL and INBDDE ensemble means. In contrast to CTRL, the PM 2.5 concentration in INBDDE decreased in southern cities of the megacity cluster (HD, HS, and XT), but increased in northern cities (BJ, TJ, BD, and CZ), roughly from 28 November to 1 December. The dispersion among ensemble members in INBDDE tended to decrease in southern cities. The R of PM 2.5 in XT, HD, and HS (southern cities) decreased from 0.69, 0.29 and 0.42 in CTRL to 0.61, 0.2, and 0.3 in INBDDE. But, the R in BJ, TJ, BD, and CZ (northern cities) increased from 0. 66,0.4,0.47 and 0.48 in CTRL to 0.75,0.6,0.74 and 0.56. Based on these analyses, it can be roughly concluded that the INBDDE ensemble mean improved the prediction skill of the regional transport of atmospheric pollutants, i.e., PM 2.5 from the southern cities to the northern cities, which will reduce the pollution in the southern region but aggravate it in the northern region, and finally lead to an overall good simulation.
CO is a kind of inactive gas, which is a good indicator for analysis of air pollutant transportation. It is difficult for CO to participate in atmospheric chemical reactions near 0°C, which means the observed CO concentration is mainly affected by the co-effects of local source emissions and crossregional transmission (Saide et al. 2011). So, we used CO as a tracer gas to test the hypothesis. In Figure 2 (b), CO shows a similar evolution trend with PM 2.5 in the eight cities from 29 November to 1 December: the CO concentrations in the INBDDE ensemble mean are lower in HD, HS, and XT than those in CTRL, but higher in TJ, BD, and CZ. Therefore, it is possible that INBDDE performs better in capturing the effect of intraregional transmission or variation of near-surface wind direction on PM 2.5 than the CTRL simulation.
We used the spatial difference fields to further understand the above transportation processes (Figure 4). On 26 and 27 November, there was no significant redistribution of PM 2.5 , with fewer surface wind differences. From 28 November, the southerly wind anomaly appeared in the south-to-central part of Hebei Province. The southerly wind anomaly strengthened on 29 November and reached its peak of over 1 m s −1 on 30 November. Due to the sustained anomalous southerly wind, the PM 2.5 concentration in southern Hebei began to reduce on 28 November, and then this phenomena further extended to central Hebei on 29 November. In contrast, the PM 2.5 concentration increased distinctly in central to northern Hebei on the same day. On 30 November, the PM 2.5 concentration in southern Hebei reduced by more than 80 μg m −3 , while it increased by more than 60 μg m −3 in central to northern Hebei, which corresponded to 27% and 32% of the daily mean PM 2.5 concentration predicted by CTRL in HD and BD (288.5 μg m −3 and 185.9 μg m −3 ), respectively. The differences weakened on 1 December and tended to disappear on 2 December. Therefore, in the INBDDE ensemble runs, the pollutants located in southern Hebei Province were transported to the northern region through locally strengthened southerly wind. This redistribution of the PM 2.5 concentration resulted in the overall improvement in the eight cities.

Summary and conclusion
In this paper, we investigated the sensitivity of PM 2.5 concentration evolution to meteorological field variations through a series of numerical experiments with the WRF-Chem model. A seven-day, longlasting extreme haze pollution event over the BTH megacity cluster was selected. Based on ERA5 reanalysis data and 10 EDA members, we carried out (μg m -3 ) a CTRL and three groups of ensemble experiments with different meteorological initial conditions and LBCs. The CTRL run reproduced well the meteorological conditions and the overall life cycle of the haze event over the BTH region. However, it failed to capture the sudden increase or decrease in instantaneous PM 2.5 concentration in the eight selected cities. In general, the ensemble experiments improved the simulationespecially that of PM 2.5 concentration. Compared with CTRL, the RMSE of PM 2.5 concentration decreased by 4.33%, 6.91%, and 8.44% in INDE, BDDE, and INBDDE, respectively. This implies that the PM 2.5 concentration predicted by WRF-Chem is sensitive to the background meteorological field. In fact, there was little improvement in the nearsurface meteorological variables (e.g., 2-m relative humidity, temperature and 10 m wind speed) in the three groups of ensemble experiments, except the 10 m wind direction. It was found that the RMSE decreases of 10-m wind direction (−5.19%, −8.89%, and −9.61%) seemed to be the dominant reason for the improvement in PM 2.5 prediction skill in the three groups of ensemble experiments. For example, in the INBDDE run, the initial conditions and LBC ensemble schemes resulted in an increased southerly wind anomaly (compared to CTRL) in the southern part of Hebei Province, which consequently led to northward transport of PM 2.5 and therefore improved the PM 2.5 concentration spatial distributionespecially in the northern part of the BTH region. In conclusion, based on this case, the initial conditions and LBCs-based ensemble scheme is an effective method to improve the prediction skills of wind direction and PM 2.5 concentration by using the WRF-Chem model.
It should be mentioned that a sharp increase or decrease in PM 2.5 concentration has happened frequently in recent years, which presents a great challenge for both real-time air quality prediction and numerical simulation studies. The ensemble schemes in this paper are expected to provide a possible way for air quality forecasting. Also, it may be a potential way to make ensemble forecasts by simultaneously considering both meteorological conditions and physical/chemical parameterization schemes to reduce the random errors.