Impact of high-frequency observations on fog forecasting: a case study of OSSE

Abstract Fog, the concentration of ice or water droplets in the near-surface air, is an important short-term meteorological phenomenon. Because it determines the visibility of the air environment, it directly affects economic activities and daily life. As more and more high-frequency observations (observations with short time intervals) become available, understanding how to make full use of such data to improve fog forecasting is an important and urgent research topic. Based on the Weather Research and Forecasting (WRF) Model and an observation simulation system experiment (OSSE) framework, this study explores a modified three-dimensional variational (3D-Var) data assimilation (DA) scheme for using high-frequency observations in fog forecasting. In the modified 3D-Var scheme, the large-scale analysis constraint (LSAC) method is applied to the WRF 3D-Var. A dense fog event, which occurred in North China in 2007, is selected for the case study. Experimental results show that coherently combining high-frequency observational information with large-scale analysis information significantly improves the 3D-Var analyses and the initialized model forecasts of fog coverage, especially over areas with coarse observations. The modified scheme is therefore promising for improving the routine forecasting of coastal sea fog. The optimal DA interval for fog forecasting is also discussed in this study.


Introduction
Fog is a type of weather phenomenon that reduces the atmospheric horizontal visibility (AHV) to below 1 km due to the suspension of ice or water droplets in the atmosphere near the surface (Glickman, 2000). With increasingly heavy traffic, demand for accurate fog forecasting has recently increased. However, fog forecasting remains challenging. For example, Zhou (2011) pointed out that the current performance of fog forecasting is much lower than that of precipitation forecasting from the same operational prediction systems at the National Centers for Environmental Prediction (NCEP). Previous studies have shown that improvements in fog forecasting could be achieved by increasing model resolution and by improving physical parameterization and initial conditions (Ballard et al., 1991; Muller, 2006; Hu et al., 2014). Zhou et al. (2012) pointed out that even in an ensemble system, a model with higher horizontal resolution is expected to bring more skilful fog prediction. Besides model resolution, fog forecasting has also been shown to be highly sensitive to initial conditions (e.g. Hu et al., 2014). Thus, a data assimilation (DA) system that provides initial conditions incorporating information at different model-resolved scales is important for enhancing the accuracy of fog forecasting.
The large-scale information in initial conditions could be obtained from the conventional observations of coarse spatial and temporal resolutions, such as globally used data from the Global Telecommunication System (GTS). However, for the model-resolved information with smaller scales, observations of higher spatial and temporal resolutions are required. In our previous study (Hu et al., 2017), the impact of surface data and wind profiler data on fog forecasting has been explored from the aspect of spatial resolutions using the three-dimensional variational (3D-Var) DA system of the Weather Research and Forecasting (WRF) Model with observation simulation system experiments (OSSE). Our results showed that compared with conventional observations, surface data and wind profiler data with higher spatial resolutions could significantly improve fog forecasting with a better dynamical and thermodynamic structure within the planetary boundary layer (PBL). This addresses the potential of observations that could describe information of smaller model-resolved scales for improving fog forecasting from the spatial aspect. Nowadays, due to the establishment of mesoscale observational networks [radiosondes, surface synoptic observations (SYNOP), meteorological terminal aviation routine weather reports (METAR), automated weather stations (AWS), wind profilers, radars, etc.], observations not only with high spatial resolution (i.e. several kilometres) but also with high temporal resolution (i.e. several minutes) gradually become available. Therefore, the impact of observations with short time intervals (referred to as high-frequency observations hereafter) on fog forecasting has arisen as an urgent topic to study. Several studies have been done to examine the impact of observations on fog forecasting from the aspect of temporal resolutions. Liang et al. (2009a) conducted experiments with DA intervals of 6, 3 and 1 h in the cycling DA mode and compared the quality of fog forecasting at the Beijing International Airport. Their results showed that the experiment with a 1-h DA interval outperforms those with 3-h and 6-h intervals, with more realistic analysis increments. Given that more and more high-frequency observations are available now, an interesting question is: How far can we go with minutely data for improving fog forecasting?
To answer this question, we systematically examine the impact of high-frequency observations on fog forecasting, and perform a detailed physical analysis of the sensitivity of fog forecasting to observations of high frequency in this study. Given that the 3D-Var DA system is more widely used in operational prediction systems than more advanced DA systems, such as the four-dimensional variational (4D-Var) DA system or the ensemble Kalman filter (EnKF), this study is carried out based on the 3D-Var DA system of the WRF model. A recent study by Vendrasco et al. (2016) introduced a method of large-scale analysis constraint (LSAC) into the DA system to assimilate high-frequency radar data for precipitation forecasting in the convective system. They found that the DA system with the LSAC method could bring improved performance in their studied cases by making use of high-frequency observations from radar data as fully as possible while maintaining the large-scale balance. In this study, the LSAC method is also applied to the WRF 3D-Var (referred to as modified 3D-Var hereafter) for assimilating high-frequency observations for fog forecasting. To easily analyse the possible physical effects of high-frequency observations on fog forecasting, and to test the effectiveness of the modified 3D-Var scheme on fog forecasting as well, we employ an OSSE framework (the same as in Hu et al., 2017) here. The observational network that consists of the intensive surface AWSs and PBL wind profilers with high spatial resolutions is applied to a dense fog event occurring in 2007 over North China.
This paper is structured as follows. After the introduction, Section 2 gives the methodology, including a brief description of the WRF model and its 3D-Var DA system, the case overview, the OSSE configuration and experimental design, as well as the method of evaluation. Section 3 presents the modified scheme for the assimilation of high-frequency observations, starting from analysing the problem of traditional 3D-Var in assimilating high-frequency observations. Section 4 gives the results of the modified scheme and the corresponding physical analyses. Finally, the summary and discussions are given in Section 5.

Brief description of WRF model and its 3D-Var DA scheme
In this study, the Advanced Research WRF (ARW) version 3.3.1 (Skamarock et al., 2008) is adopted for fog simulation and prediction experiments. Model configurations and physical parameterization schemes are described below. Three two-way nested domains (see Fig. 3 of Hu et al., 2014) are used with horizontal grid spacings of 27 km (D01), 9 km (D02) and 3 km (D03), respectively. In the vertical, there are 40 full sigma levels with 7 sigma levels below 1 km, and the model top is located at the 50-hPa level. To obtain more realistic vegetation information, we use the 500-m land use data as of the year 2000 (Zhang et al., 2007) instead of the conventional 30-s United States Geological Survey (USGS) land use data (Hitt, 1994) for D03. The physical parameterization schemes employed for all three domains are the WRF single-moment (WSM) 6-class microphysics scheme (Hong and Lim, 2006), the rapid radiative transfer model (RRTM) longwave radiation scheme (Mlawer et al., 1997), the Dudhia shortwave radiation scheme (Dudhia, 1989), and the quasi-normal scale elimination (QNSE) PBL and surface layer schemes (Sukoriansky et al., 2005). The Kain-Fritsch cumulus scheme (Kain, 2004) is applied only for D01 and D02.
In this study, version 3.3.1 of the WRF 3D-Var (Barker et al., 2004) is adopted as the assimilation system (referred to as traditional 3D-Var hereafter). The optimal analysis is obtained by iteratively minimizing the cost function, written as:
$$J(\mathbf{x}) = J_b + J_o = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-\mathbf{y}_o)^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{y}_o) \qquad (1)$$
where $J_b$ and $J_o$ are the background and observational terms, respectively. The vectors $\mathbf{x}$, $\mathbf{x}_b$ and $\mathbf{y}_o$ represent the analysis variable, background (or first guess) variable and observation variable, respectively. $\mathbf{B}$ and $\mathbf{R}$ stand for the background and observation error covariance matrices, respectively. Additionally, compared to $\mathbf{y}_o$, $\mathbf{y}$ defined by $\mathbf{y} = H(\mathbf{x})$ is a vector that transforms the gridded analysis $\mathbf{x}$ in the model space into the observation space using the non-linear observation operator $H$.
In traditional 3D-Var, the control variable $\mathbf{v}$ is defined by $\mathbf{x}-\mathbf{x}_b = \mathbf{U}\mathbf{v}$, where $\mathbf{U}$ is the decomposition of the background error covariance $\mathbf{B}$ through $\mathbf{B} = \mathbf{U}\mathbf{U}^{\mathrm{T}}$. If the cost function is defined as a function of the analysis increment (relative to background), the basic cost function in (1) can be written as the incremental formulation (Courtier et al., 1994):
$$J(\mathbf{v}) = \frac{1}{2}\mathbf{v}^{\mathrm{T}}\mathbf{v} + \frac{1}{2}(\mathbf{d}-\mathbf{H}'\mathbf{U}\mathbf{v})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{d}-\mathbf{H}'\mathbf{U}\mathbf{v}) \qquad (2)$$
where the innovation (observation minus background) vector is $\mathbf{d} = \mathbf{y}_o - H(\mathbf{x}_b)$ and $\mathbf{H}'$ is the linearization of $H$.
The control variables adopted in this study are the velocity components (U, V), temperature (T), surface pressure ($P_s$) and pseudo-relative humidity (RH; humidity divided by its background value). Here, the momentum control variables are direct (U, V) instead of the commonly used (ψ, χ) for assimilating large-scale observations. Correspondingly, both T and $P_s$ are full variables without differentiating between 'balance' and 'unbalance' parts. Sun et al. (2016) pointed out that although the original momentum control variables of (ψ, χ) show benefits for improving large-scale features due to the balance between the mass and wind fields, they miss small-scale features. In comparison, without applying balance constraints among different fields, the direct momentum control variables of (U, V) could provide closer fitting to dense observations than that of (ψ, χ) in limited area convective scale DA. Therefore, the momentum control variables of (U, V) are selected in this study.
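The control-variable transform and the incremental cost function can be illustrated with a minimal sketch. This is not the WRFDA implementation: it assumes a linear observation operator, a toy one-dimensional grid, an exponential background covariance, and a Cholesky factor as one possible choice of U. All names and numbers are illustrative assumptions.

```python
import numpy as np

def solve_incremental_3dvar(x_b, B, H, R, y_o):
    """Minimise J(v) = 0.5 v^T v + 0.5 (d - H U v)^T R^-1 (d - H U v), H linear."""
    U = np.linalg.cholesky(B)            # one valid factor with B = U U^T
    d = y_o - H @ x_b                    # innovation: observation minus background
    HU = H @ U
    R_inv = np.linalg.inv(R)
    # Setting the gradient to zero gives (I + (HU)^T R^-1 HU) v = (HU)^T R^-1 d
    hessian = np.eye(x_b.size) + HU.T @ R_inv @ HU
    v = np.linalg.solve(hessian, HU.T @ R_inv @ d)
    return x_b + U @ v                   # analysis x = x_b + U v

# Toy setup (assumed values): 6 grid points, one temperature observation at point 0
n = 6
idx = np.arange(n)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)   # unit variance, length scale 2 points
H = np.zeros((1, n))
H[0, 0] = 1.0
R = np.array([[0.25]])                   # observation error variance (0.5 K std)
x_b = np.full(n, 275.0)                  # background temperature (K)
y_o = np.array([273.0])                  # observation is 2 K colder than the background

x_a = solve_incremental_3dvar(x_b, B, H, R, y_o)
increment = x_a - x_b
print(np.round(increment, 3))
```

In this toy case the single observation produces a cold increment that decays with distance but still reaches the unobserved grid points, because B spreads the innovation over its correlation length.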

Case overview
The fog case selected in this study occurred in North China on 20 February 2007. The fog coverage from NOAA-17 satellite visible image is shown in Fig. 1. At 02:28 UTC (10:28 local standard time) 21 February 2007, fog covered the Liaoning Province, Beijing, Tianjin, the east of the Hebei Province, and most of the Bohai Sea. This fog event lasted for over 24 h, causing 234 flight cancellations, 500 flight delays and more than 30,000 stranded travellers at Beijing International Airport (Liang et al., 2009b). The operational numerical weather prediction (NWP) systems in China failed to predict this fog event. Figure 2 (Fig. 2a4). According to Hu et al. (2014), North China was dominated by a high-pressure ridge at 00:00 UTC 21 February 2007. This high-pressure ridge moved slowly eastwards from a position downstream of the Baikal Lake. Additionally, North China was over the transition zone of a weak horizontal pressure gradient and low horizontal wind speed, with the moisture being transported from the sea surfaces to the east and south of North China. These are favourable for fog formation and maintenance in North China. Truth Run. Usually, the Truth and BG Runs start from different initial conditions. Then the degree, by which the assimilation of 'observations' into the BG Run and the assimilation-initialized model forecast recovers the Truth Run, is a measure of how good is the DA scheme (see i.e. Sugimoto et al., 2009).

Description of OSSE and experimental design
OSSE mainly requires three elements: one simulation that well represents the real case (referred to as Truth Run hereafter), one simulation used as the first guess background (referred to as BG Run hereafter), and simulated 'observations' that sample the

experimental results are mainly verified over this area (defined as DNC; see blue box in Fig. 2b), instead of being verified over the entire D03.

Simulated 'observations'.
In this study, the simulated 'observation' types include the conventional data (rawinsonde and conventional surface data) and the intensive data (AWS and PBL wind profiler data), which are drawn from the Truth Run. For the conventional data, additional experiments have shown that the impact of rawinsonde, SYNOP and METAR data on fog forecasting dominates the whole impacts (figures not shown). Therefore, these three types of data are regarded as the conventional data in this study. Although the ensuing forecasts are conducted with all three domains, DA is performed only in D02 and D03.
The observation variables, horizontal and vertical resolutions, and temporal resolution for each type of 'observations' are summarized in Table 1. For rawinsonde data and conventional surface data (SYNOP and METAR), the variable types and locations ( Fig. 3a) are based on real data from the GTS.
Since AWS and PBL wind profiler data are only available locally for a specific region, in this study these two types of observations are assumed to be available only in China within D02 ( Fig. 3b and c). Furthermore, to be realistic, more developed coastal regions are assumed to have more AWS data and PBL wind profiler data than the inland provinces ( Fig. 3b and c). Similarly, in the generation of PBL wind profiler data (Hu et al., 2017), the definition below is used: where H and ELEV are the height and elevation of observation site, respectively, with units of m. K is the number of vertical levels.
According to the temporal resolutions of real data from GTS, rawinsonde data are available only at 12:00 UTC on 20 February 2007, and conventional surface data (SYNOP and METAR) are available at 06:00, 09:00 and 12:00 UTC on 20 February 2007. For AWSs and PBL wind profilers, data with a 20-min temporal resolution are assumed, since both have high temporal resolutions in real data.
The final 'observations' are obtained by adding a noise (observational error) to the Truth. The noise is assumed to be random with an unbiased normal distribution and standard Hu et al. (2017), the Truth and BG Runs are obtained from 40-member ensemble forecasts. The 40-member ensemble forecast is generated based on the model configuration and physical parameterization schemes described in Section 2.1. First, the ensemble forecasts are integrated from 00:00 UTC 20 February 2007 to 03:00 UTC 21 February 2007. The boundary conditions for all 40 ensemble members are the same and provided by the six-hourly NCEP final analysis (FNL) data with a horizontal resolution of 1° × 1°. The initial conditions of ensemble forecasts are obtained by randomly perturbing the FNL data at 00:00 UTC 20 February 2007 (the initial time of model integration). The perturbations are generated by randomly sampling the background error covariance from the fixed covariance model of WRF DA system (WRFDA) (Barker et al., 2004). The standard deviations of initial ensemble for water vapour mixing ratio (Q v ), U and V and T are roughly 0.3 g kg −1 , 3 m s −1 and 1.2 K, respectively. Second, one member with good performance of fog forecasting ('good member') and one member with bad performance of fog forecasting ('bad member') are selected objectively from the 40-member ensemble forecasts, respectively. The detailed selection method is documented in Hu et al. (2017). Finally, the good member is regarded as the Truth Run, and used to produce the 'observations' and evaluate the performance of assimilation-forecast experiments. The bad member is regarded as the BG Run and used as a first guess in DA experiments. Then, experiments that assimilate the simulated 'observations' are conducted, and the 3D-Var analyses of model variables are verified against the Truth Run. Figure 2 shows the simulated fog coverage from the Truth (shading in Figs. 2a1-a4) and BG Runs (shading in Fig. 
2b1-b4) from 12:00 UTC 20 February to 03:00 UTC 21 February 2007. The main fog process in Shandong Province is captured by both runs in comparison with the surface observations (dots in Fig. 2). However, for the Beijing-Tianjin-Hebei Region (referred as B-T-H hereafter), the simulated fog coverage in the Truth Run is much closer to observations than in the BG Run. In the BG Run, fog coverage in the entire Beijing, a large part of Tianjin, and the northeast to southwest band structure near the coast north-west of the gulf of the Bohai Sea is missed. The North China including Beijing and Tianjin is an important economical belt for China, thus the accuracy of fog prediction playing an important role in societal production. The remaining to Exp_3h, two sensitivity experiments (Exp_1h and Exp_20m) are conducted with DA interval of 1 h and 20 min, respectively. Note that although observations with the temporal resolution of several minutes are available, to simplify the experiments, the minimum DA interval is set to be 20 min. Figure 4 illustrates the initialization and forecast procedures for Exp_3h, Exp_1h and Exp_20m. In Exp_3h (Exp_1h, Exp_20m), a 3D-Var analysis is performed every 3 h (1 h, 20 min) for a total assimilation length of 6 h from 06:00 to 12:00 UTC on 20 February 2007, followed by a 15-h forecast for all experiments with the boundary conditions from the BG Run. The first guess is from the BG Run in the first DA cycle, and from the previous 3-h (1-h, 20min) forecasts in the subsequent DA cycles.
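The cycling schedule of the three experiments can be enumerated with a short sketch. It assumes that an analysis is performed at both ends of the 06:00 to 12:00 UTC window, which is consistent with the text referring to DA at 06:00 UTC and to 12:00 UTC as the last DA time. The function and variable names are illustrative.

```python
from datetime import datetime, timedelta

def analysis_times(start, end, interval_minutes):
    """Return every analysis time from start to end (inclusive) at a fixed interval."""
    step = timedelta(minutes=interval_minutes)
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    return times

da_start = datetime(2007, 2, 20, 6, 0)    # first analysis, background from the BG Run
da_end = datetime(2007, 2, 20, 12, 0)     # last analysis, start of the forecast
intervals = {"Exp_3h": 180, "Exp_1h": 60, "Exp_20m": 20}

schedule = {name: analysis_times(da_start, da_end, m) for name, m in intervals.items()}
counts = {name: len(times) for name, times in schedule.items()}
forecast_end = da_end + timedelta(hours=15)   # 15-h forecast after the last analysis

for name, n_cycles in counts.items():
    print(name, n_cycles, "analyses")
print("forecast ends", forecast_end)
```

Under this assumption the 20-min experiment performs roughly six times as many analyses as the 3-h benchmark within the same 6-h window, which is why any systematic analysis error has more opportunities to accumulate.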

Verification method
For fog forecasts, the prediction accuracy of atmospheric horizontal visibility (AHV) is considered a vital criterion for success. In this study, the calculation of AHV follows the method of Hu et al. (2014), where the visibility-mixed phase water content (MWC) relationship is defined as follows:
$$\mathrm{AHV} = -1000 \times \ln(0.02)/\beta \qquad (4)$$
where AHV is in units of m. The extinction coefficient β is calculated from MWC by:
$$\beta = 144.7\,\mathrm{MWC}^{0.88} \qquad (5)$$
where MWC is the sum of water vapour, cloud ice, cloud water, snow and rain, given in units of g m−3.
To quantitatively evaluate the performance, the ETS (Equitable Threat Score) and BIAS are used as accuracy measures.

deviations of 1 m s−1 for U and V, 1 K for T, 1 g kg−1 for Qv and 1 Pa for P for the surface data. The standard deviations for rawinsonde and PBL wind profiler data, which are used as default values in WRFDA, are shown in Table 2.
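The visibility relationship in Equations (4) and (5) of the verification method can be sketched as follows. The function names are illustrative, the 1000-m fog threshold follows the definition of fog given in the Introduction, and the unit of β (km−1) is an assumption implied by the factor of 1000 in Equation (4).

```python
import math

def extinction_coefficient(mwc):
    """Equation (5): beta from MWC (g m^-3); beta assumed to be in km^-1."""
    return 144.7 * mwc ** 0.88

def visibility_m(mwc):
    """Equation (4): AHV in metres; no condensate is treated as unlimited visibility."""
    if mwc <= 0.0:
        return math.inf
    return -1000.0 * math.log(0.02) / extinction_coefficient(mwc)

def is_fog(mwc, threshold_m=1000.0):
    """Binary fog flag: AHV below 1 km."""
    return visibility_m(mwc) < threshold_m

for mwc in (0.01, 0.1):
    print(f"MWC={mwc} g m^-3 -> AHV={visibility_m(mwc):.0f} m, fog={is_fog(mwc)}")
```

Applying `is_fog` at every grid point turns a model state into the binary fog field that the ETS and BIAS scores are computed from.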

Setup of traditional 3D-Var.
Information of the background error statistics is the key in a variational system. The background error statistics provide information on how to distribute observational adjustments (increments, analysis minus background) on the model space, and how to get physically balanced analyses. In this study, the background error statistics are generated using the National Meteorological Center (NMC) method (Parrish and Derber, 1992) with the GEN_BE utility from the WRFDA system (Skamarock et al., 2008). A 40-day (1 February-12 March 2007) data-set is generated by performing cold-start 24-h forecasts every day, starting at 00:00 UTC and 12:00 UTC. Then, the difference between the 24-h and 12-h forecasts valid at the same time is computed, and the domainaveraged error statistics of the control variables are obtained. Although rescaling the factors of variance and length scales has been suggested (Ingleby, 2001), the sensitivity tests on the scaling factors show no obvious differences in fog forecasts with different scaling factors (figures not shown). Therefore, the variance and length scales are not tuned in this study. Table 3) are performed to assess the impact of highfrequency observations on fog forecasting using the traditional 3D-Var. A benchmark experiment (Exp_3h) is initialized using the traditional 3D-Var with DA interval of 3 h for a total of 6 h of assimilation before a 15-h WRF model forecast. In addition

Problem of traditional 3D-Var in assimilating high-frequency observations
This section first shows the results of traditional 3D-Var (Exp_3h, Exp_1h and Exp_20m designed in Section 2.3.4), and then analyses the physical deficiencies of the traditional 3D-Var.
3.1.1. Results. The evolution of statistical scores, expressed by ETS and BIAS, of the three experiments using traditional 3D-Var is shown in Fig. 5. It is clear that from Exp_3h to Exp_1h to Exp_20m, although information of higher frequency is incorporated into the system, the statistical scores worsen progressively, with temporally averaged ETS (Table 4) decreasing by 11.0 and 25.6%, respectively. Additionally, the BIAS of Exp_1h and Exp_20m is worse than Exp_3h (Fig. 5b).
One typical forecast time is selected to show the performance of simulating fog coverage (Fig. 6). Based on the fog coverage of the Truth Run (Fig. 6a), there is obvious false fog coverage over the Bohai Sea for all three experiments (Exp_3h, Exp_1h and Exp_20m). Furthermore, larger false fog coverage appears over the Bohai Sea in Exp_1h (Fig. 6b2) and Exp_20m (Fig. 6b3) than that of Exp_3h (Fig. 6b1). It indicates that with using the traditional 3D-Var, assimilating higher frequency observations

If fog observations/forecasts are regarded as binary events (1 = true, 0 = false), ETS and BIAS can be calculated as follows:
$$\mathrm{ETS} = \frac{H - R}{F + O - H - R} \qquad (6)$$
and
$$\mathrm{BIAS} = \frac{F}{O} \qquad (7)$$
where F, H and O stand for points with fog forecasts, correct fog forecasts (hits) and fog observations, respectively. R, calculated by R = F × O/N, is a random hit penalty, where N is the total number of grid points in the verification domain (Muller, 2006; Muller et al., 2007; Zhou and Du, 2010; Zhou et al., 2012; Hu et al., 2014). For ETS, a larger value indicates better forecasting performance. For BIAS, the ideal value is 1.0 (the numbers of observation and forecast points are the same), and over-(under-) prediction is indicated by a BIAS greater (less) than 1.0.
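The two scores can be computed from binary fog masks with a few lines of code. This is a minimal sketch using the definitions of F, H, O, R and N given above; the function name and the toy masks are illustrative assumptions.

```python
import numpy as np

def ets_and_bias(forecast_fog, observed_fog):
    """Return (ETS, BIAS) for two binary fog masks of the same shape."""
    f = np.asarray(forecast_fog, dtype=bool)
    o = np.asarray(observed_fog, dtype=bool)
    F = int(f.sum())               # points with fog forecasts
    O = int(o.sum())               # points with fog observations
    H = int((f & o).sum())         # hits: fog both forecast and observed
    N = f.size                     # total grid points in the verification domain
    R = F * O / N                  # random hit penalty
    denom = F + O - H - R
    ets = (H - R) / denom if denom != 0 else float("nan")
    bias = F / O if O != 0 else float("nan")
    return ets, bias

# Toy example: 10 grid points, 3 observed fog points, 2 of them forecast plus 1 false alarm
truth_mask = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
fcst_mask = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ets, bias = ets_and_bias(fcst_mask, truth_mask)
print(round(ets, 3), bias)
```

In the OSSE setting the "observed" mask is the fog field diagnosed from the Truth Run, so both masks come from model output on the same grid.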
The performance of assimilation experiments is evaluated through the scores of predictive skill (ETS and BIAS). The simulated AHV of the Truth Run at each grid point over the verification area (a fixed area within D03 or the entire D03) at each lead forecast time is defined as the reference. Therefore, the area-averaged ETS and BIAS of each assimilation experiment over the corresponding area and time are calculated based on the simulated AHV.

Table 3 (recovered rows), grouped by purpose.
Impact of high-frequency observations on fog forecasting using traditional 3D-Var:
  Exp_1h: As in Exp_3h, but with 1-h DA interval
  Exp_20m: As in Exp_3h, but with 20-minutely DA interval
Impact of high-frequency observations on fog forecasting using the modified scheme:
  Exp_3h_LSAC: As in Exp_3h, but using the modified scheme
  Exp_1h_LSAC: As in Exp_1h, but using the modified scheme
  Exp_20m_LSAC: As in Exp_20m, but using the modified scheme

leads the Bohai Sea to be moister, in terms of relative humidity, in Exp_1h and Exp_20m compared to Exp_3h (Fig. 7b2-b3), given that no obviously different water vapour content is observed in the analysis for these three experiments (figures not shown). The situation of moisture is consistent with the results of fog coverage simulation (false fog coverage) over the Bohai Sea (Fig. 6b1-b3). It indicates that the accumulated errors of temperature (cold biases) dominate the whole impact of DA, resulting in the bad performance of fog forecasting. Therefore, the following investigations focus on the temperature fields.
To analyse how the accumulated cold biases over the Bohai Sea after the last DA time (mentioned above) are formed, the temperature fields of DA processes (background, analysis and increment) for these three experiments are shown from the aspects of map view at two typical times (06:00 UTC 20 February and 06:20 UTC 20 February; Fig. 8) and box-averaged evolutions (Fig. 10), respectively. At 06:00 UTC 20 February (Fig. 8a2-a4), DA processes for all three experiments are identical since they share the same background (from the BG Run). In detail, due to the large warm biases over the Shandong Province in the background (Fig. 8a2), cold increments are formed over this area and the adjacent area (the Bohai Sea) during DA (Fig. 8a3), with the length scale of the temperature increment a bit more than 200 km (Fig. 9). This leads to the correction of the background over the Shandong Province, but results in cold biases (errors) over the area with no observations but within the DA length scale (referred to as NO-OBS-IN-DA-LEN-SCAL area hereafter) (i.e. the Bohai Sea; Fig. 8a4). At 06:20 UTC 20 February, DA processes repeat only for Exp_20m (Fig. 8c2-c4). Hence, temperature over the Bohai Sea for Exp_20m becomes colder than the temperature in Exp_3h and Exp_1h (Fig. 8b2-b4). This could also be seen from the box-averaged temperature in Fig. 10a. Additionally, during the whole DA period (particularly at 12:00 UTC 20 February, the last DA time), from Exp_3h to Exp_1h and Exp_20m, colder

does not improve fog forecasting, and in some areas even makes it worse.

Physical analyses.
Why do more high-frequency observations incorporated into the system degrade the performance of fog forecasting in traditional 3D-Var? To uncover the puzzle, below we examine the 3D-Var results thoroughly from the aspect of physics.
First, to show the cumulative impact of DA, the physical fields in the analysis at the first model vertical level at the last DA time (at 12:00 UTC 20 February 2007) are analysed. For horizontal wind, the northerly wind to the south and the southerly wind to the north converge over the Bohai Sea in the Truth Run (Fig. 7c1). Although this situation of horizontal wind is generally captured by all three experiments (Exp_3h, Exp_1h, Exp_20m, Fig. 7c2-c4), the local horizontal wind over the specific tips of the Southern Bohai Sea and the Mouth of the Yellow Sea for Exp_1h and Exp_20m shows better match to the Truth Run than that of Exp_3h. Given that worse performance of fog forecasting is obtained from Exp_1h and Exp_20m than that of Exp_3h, it indicates that the horizontal advection is not a key ingredient for fog forecasting in this case. For temperature, compared with the Truth Run (Fig. 7a1), when DA interval becomes shorter, obvious larger errors (cold biases) in intensity and coverage exist over the Bohai Sea (Fig. 7a2-a4). This

The design of a modified 3D-Var scheme
From the analysis in last sub-section, we learn that the main problem in traditional 3D-Var of assimilating high-frequency observations for fog forecasting is the lack of constraint for physical fields (mainly the temperature field) over the NO-OBS-IN-DA-LEN-SCAL areas. To constrain the errors over these areas, a similar idea as the LSAC method is employed here. For the imbalance issue due to lack of physical constraint in traditional 3D-Var, Guidard and Fischer (2008) as well as Dahlgren and Gustafsson (2012) have proposed to add innovative information in a limited area model 3D-Var scheme from a large-scale DA system. Later, Vendrasco et al. (2016) applied this idea to the assimilation of high-frequency radar

biases over the Bohai Sea accumulate progressively with shorter DA interval (observations of higher frequency assimilated) and no constraint for errors (Fig. 10a). Note that, from Exp_3h to Exp_1h and Exp_20m, the accumulated errors over the NO-OBS-IN-DA-LEN-SCAL area (i.e. the Bohai Sea) are larger than the improvement over the area with intensive observations (Fig. 10a and b). Therefore, the net impact of high-frequency observations does not show any benefit for fog forecasting, it rather deteriorates it over the NO-OBS-IN-DA-LEN-SCAL area. This is mainly due to the accumulated error of temperature

Fig. 7. The top row: map view of 2-m temperature (a1), the relative humidity (b1) and horizontal wind (c1) at the first model vertical level for the Truth Run; second and third rows: the differences (analysis minus Truth) of temperature (a2-a4) and relative humidity (b2-b4) for experiments Exp_3h (a2, b2), Exp_1h (a3, b3) and Exp_20m (a4, b4); the bottom row: horizontal wind in the analysis for experiments Exp_3h (c2), Exp_1h (c3) and Exp_20m (c4).

Fig. 8. Map view of 2-m temperature for the Truth Run (the top row, a1, b1, c1), the differences (background minus Truth, the second row, a2, b2, c2), the increments (analysis minus background, the third row, a3, b3, c3) and the differences (analysis minus Truth, the bottom row, a4, b4, c4) for experiments (Exp_3h, Exp_1h and Exp_20m) at 06:00 UTC (the left column, a1-a4), experiments (Exp_3h and Exp_1h) at 06:20 UTC (the middle column, b1-b4) and experiment (Exp_20m) at 06:20 UTC (the right column, c1-c4) over the finest domain. Note that, although there is actually no DA for Exp_3h and Exp_1h at 06:20 UTC, the forecast field is regarded as both the 'background' field and the 'analysis' field with the increment of zero to compare with the impact of DA for Exp_20m. The same treatment is used for the following Fig. 11.

13 grids in D02) and vertical resolution of every 3 vertical model levels. Additionally, to simulate the uncertainty of the large-scale model forecast in real case, perturbations with zero mean and a prescribed standard deviation (2.5 m s−1 for U and V, 2 °C for T and 3 g kg−1 for Qv) are added to the thinned data. Note that in comparison with the LSAC method used by Vendrasco et al. (2016), the main modification for the LSAC method in this study is that the large-scale analyses are obtained from a large-domain coarser resolution forecast that is run locally, instead of directly from coarser resolution analyses (i.e. GFS analyses, Vendrasco et al., 2016). This large-scale analysis is easily accessible in a timely manner, avoiding the limitation of obtaining the real-time GFS at desired time. Hence, this modification makes it possible to apply the LSAC-3D-Var scheme to the real-time operational prediction systems in the future, where a home-run larger-domain coarser resolution forecast is required to obtain real-time large-scale analyses.
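The thinning and perturbation of the large-scale data described above can be sketched as follows. The array layout (level, south-north, west-east), the grid size and the function name are assumptions made for illustration. The strides (every 13 grid points on the 9-km D02 grid, about 117 km, and every 3 model levels) and the 2 °C standard deviation for temperature come from the text.

```python
import numpy as np

def thin_and_perturb(field, sigma, h_step=13, v_step=3, rng=None):
    """Subsample a (level, y, x) field and add zero-mean Gaussian noise with std sigma."""
    rng = np.random.default_rng() if rng is None else rng
    thinned = field[::v_step, ::h_step, ::h_step]
    noise = rng.normal(0.0, sigma, size=thinned.shape)
    return thinned + noise

# Hypothetical interpolated temperature field: 40 levels on a 130 x 130 D02-like grid
rng = np.random.default_rng(0)
temperature = np.full((40, 130, 130), 275.0)
pseudo_obs_T = thin_and_perturb(temperature, sigma=2.0, rng=rng)

print(pseudo_obs_T.shape)             # thinned levels and horizontal points
print(13 * 9, "km horizontal spacing")
```

The same helper would be called once per variable with its own standard deviation (2.5 m s−1 for U and V, 3 g kg−1 for Qv), and the resulting values enter the DA system as 'observations'.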
To assess the impact of high-frequency observations on fog forecasting using the modified scheme, three additional experiments (Exp_3h_LSAC, Exp_1h_LSAC and Exp_20m_LSAC, listed in Table 3) are performed in the same manner as the corresponding experiments Exp_3h, Exp_1h and Exp_20m described in Section 2.3.4, but replaced by the modified scheme.

Improvement of fog 'forecasting'
With the same evaluation method as in Section 3.1.1, statistical scores of the three experiments using the modified scheme are shown in Fig. 5. It is clear that, compared to experiments with traditional 3D-Var (Exp_3h, Exp_1h and Exp_20m), the ETSs of the corresponding experiments with the modified scheme (Exp_3h_LSAC, Exp_1h_LSAC and Exp_20m_LSAC) increase significantly (Fig. 5a), especially for the short DA intervals, with the temporally averaged ETSs (Table 4) increasing by 2.6, 22.3 and 39.6%, respectively. Additionally, the corresponding BIASs are much closer to the perfect value of 1.0, with the largest improvements seen in the first few forecast hours (Fig. 5b). The enhanced scores are in agreement with the improvements of simulated fog coverage (Fig. 6). Particularly, the false fog coverage over the Bohai Sea and part of B-T-H in experiments Exp_3h, Exp_1h and Exp_20m (Fig. 6b1-b3) dissipates (Fig. 6c1-c3). This indicates that the modified scheme could effectively improve fog forecasting with assimilation of high-frequency observations.

Physical analyses
To analyse the physical mechanism for the improvement in the experiments using the modified scheme, a detailed comparison is conducted between the modified and traditional 3D-Var. For the cumulative impact of DA (at 12:00 UTC 20 February)

data for improving convective precipitation forecasting. They directly used the large-scale analysis as constraint in the LSAC method. In such a sense, a new term $J_c$, measuring the deviation of the large-scale analysis from coarser resolution for U, V, T and Qv, is added to the traditional 3D-Var cost function equation (2) as:
$$J(\mathbf{v}) = \frac{1}{2}\mathbf{v}^{\mathrm{T}}\mathbf{v} + \frac{1}{2}(\mathbf{d}-\mathbf{H}'\mathbf{U}\mathbf{v})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{d}-\mathbf{H}'\mathbf{U}\mathbf{v}) + \frac{1}{2}(\mathbf{d}_c-\mathbf{H}'\mathbf{U}\mathbf{v})^{\mathrm{T}}\mathbf{R}_c^{-1}(\mathbf{d}_c-\mathbf{H}'\mathbf{U}\mathbf{v}) \qquad (8)$$
where the last term is $J_c$, $\mathbf{H}'$ is the linearized observation operator, and $\mathbf{d}_c$, defined by $\mathbf{d}_c = \mathbf{y}_c - H(\mathbf{x}_b)$, is the innovation vector that measures the departure of the LSAC $\mathbf{y}_c$ from its counterpart computed from the background $\mathbf{x}_b$. $\mathbf{y}_c$ includes information of U, V, T and Qv from the large-scale analyses. $\mathbf{R}_c$ is the error covariance matrix of the large-scale analyses for U, V, T and Qv obtained by considering constant uncorrelated errors for each variable, being 2.5 m s−1 for U and V, 2 °C for T and 3 g kg−1 for Qv (Vendrasco et al., 2016).
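The effect of the additional constraint term can be illustrated with a toy one-dimensional sketch. It is not the WRFDA or Vendrasco et al. (2016) code. It assumes linear selection operators, noise-free observations and large-scale values, an exponential background covariance, and a background that is 2 K too warm only at the observed point; all names and numbers other than the 2 °C large-scale error are illustrative.

```python
import numpy as np

def analyse(x_b, B, terms):
    """Minimise 0.5 v^T v plus one quadratic term per (H, R, y) tuple, H linear."""
    U = np.linalg.cholesky(B)                    # B = U U^T
    hessian = np.eye(x_b.size)
    rhs = np.zeros(x_b.size)
    for H, R, y in terms:
        HU = H @ U
        R_inv = np.linalg.inv(R)
        hessian += HU.T @ R_inv @ HU
        rhs += HU.T @ R_inv @ (y - H @ x_b)      # innovation d or d_c
    return x_b + U @ np.linalg.solve(hessian, rhs)

def selector(points, n):
    """Observation operator that picks the listed grid points."""
    H = np.zeros((len(points), n))
    H[np.arange(len(points)), points] = 1.0
    return H

n = 5
idx = np.arange(n)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)
truth = np.full(n, 273.0)
x_b = truth.copy()
x_b[0] += 2.0                                    # warm bias at the observed point only

H_o, R_o, y_o = selector([0], n), np.array([[0.25]]), truth[[0]]
H_c, R_c, y_c = selector([2, 4], n), 4.0 * np.eye(2), truth[[2, 4]]  # (2 degC)^2

x_trad = analyse(x_b, B, [(H_o, R_o, y_o)])                     # J_b + J_o
x_lsac = analyse(x_b, B, [(H_o, R_o, y_o), (H_c, R_c, y_c)])    # J_b + J_o + J_c

err_trad = np.mean(np.abs(x_trad - truth)[1:])   # error over unobserved points
err_lsac = np.mean(np.abs(x_lsac - truth)[1:])
print(round(err_trad, 3), round(err_lsac, 3))
```

In this idealised case the observation corrects the warm bias at its own location but spreads a spurious cold increment to the unobserved points, and the weak large-scale constraint reduces that spurious increment. This mirrors, in a very simplified way, the role the text attributes to the LSAC term over areas without observations.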
The procedure of applying LSAC method includes interpolating coarser resolution analysis to the finer resolution domain (where DA performs), thinning the interpolated data with an approximate 'large-scale' resolution (i.e. about 100 km in Vendrasco et al., 2016), and assimilating the thinned data into DA system as 'observations'. In this study, the large-scale analyses used are the ones from a single-domain WRF run of coarser resolution. As OSSE is utilized and 'true' atmospheric conditions (Truth) are given in this study, some special treatments have been done to make the procedure of applying LSAC method closer to that in the real situation. For example, the coarser resolution single-domain WRF run is performed with the initial condition from Truth Run of the coarsest resolution domain (D01). The interpolated data is thinned with horizontal resolution of 117 km (every (8)  Figure 5 shows that with the modified scheme, fog analysis is improved when DA interval varies from 3 to 1 h (from similar as in Section 3.1.2, it is clear that the modified scheme produces smaller temperature errors over the NO-OBS-IN-DA-LEN-SCAL area (i.e. the Bohai Sea). Furthermore, the temperature errors over the area with sparse observations are also reduced (Fig. 11). For detailed DA processes, in comparison with experiments using traditional 3D-Var, the modified scheme improves temperature analyses over the areas both with and without observations due to its large-scale analysis constraint (Fig. 12). This can be also seen in the quantitative evolution ( Fig. 10a and c). It indicates that due to the coherent combination of improvements over the area with intensive observations, and constraining errors both over the area with cipitation spots out of the area with high-frequency radar data are related to the imbalance in the analysis field and the spinup issue. 
This is demonstrated by the mean absolute surface pressure tendency (MASPT) that, the experiment with LSAC method has smaller value of MASPT (less noise) than that of experiment without LSAC method in the first few forecast hour. In this study, the MASPT for different experiments have also been analysed. The value of MASPT for Exp_3h is larger (more noise) than that of Exp_1h and Exp_20m (figures not shown). This is opposite to the corresponding performance of fog forecasting. We think the possible reason is that, the formation and maintenance of fog are under calm situation, which is different from the convective system. In the convective system, the precipitation forecasting is significantly sensitive to the pressure field. However, in this fog case, forecasting is more sensitive to the temperature field than the pressure field. Therefore, in this study, the issue of imbalance in the analysis could be demonstrated by the errors in the temperature field (cold biases) instead of the MASPT.

Summary and discussions
The impact of high-frequency observations (observations with short time intervals) on fog forecasting is explored using a modified three-dimensional variational (3D-Var) scheme within an observation simulation system experiment (OSSE) framework.
The idea of large-scale analysis constraint (LSAC) is used to design the modified scheme for assimilation of high-frequency observations. The OSSE is conducted based on a 40-member Exp_3h_LSAC to Exp_1h_LSAC), while the improvement becomes marginal from 1 h to 20 min (from Exp_1h_LSAC to Exp_20m_LSAC). This suggests that shortening DA interval beyond some threshold, where the use of high-frequency observations becomes ineffective, is unnecessary for fog forecasting. Fabry and Sun (2010) pointed out that different variables have their own optimal DA intervals based on the study of mesoscale forecasting and convection. This indicates that the optimal DA intervals also exist when comprehensively considering the optimal DA intervals of different variables. To verify whether the optimal DA interval exists in fog system or not, two additional experiments with DA interval of 2 h and 30 min using the modified scheme are conducted (Exp_2h_LSAC and Exp_30m_ LSAC), respectively. Fig. 13 shows that the performance (in terms of ETS and BIAS) improves from 3-h DA interval to 1-h interval, while decrease from 1-h interval to 20-m interval. This indicates that there is actually an optimal DA interval in fog forecasting. In our system, the optimal DA interval is around 1 h. Given that temperature field is the key state field dominating the performance of fog forecasting in this case, we think this optimal DA interval should be related to the adaptation of temperature. Therefore, although high-frequency observations are available, it is certainly necessary to select an appropriate DA interval based on the characteristic of fog system for improved fog forecasting. What should be mentioned is the spin-up issue when high-frequency observations are assimilated using 3D-Var scheme. In the study of precipitation forecasting in the convective system by Vendrasco et al. 
(2016), the spurious pre- al 3D-Var, although the assimilation of high-frequency observations brings improvements over areas with dense observations, it causes false increment (mainly in the temperature field) over the area with no observations but within the data assimilation (DA) length scale (briefly called NO-OBS-IN-DA-LEN-SCAL area) (e.g. the Bohai Sea), accordingly resulting in the accumulation of temperature errors (cold biases). With the modified scheme, ensemble forecast by selecting a member with good (bad) fog forecasting performance as the Truth (background, BG) run. A dense fog event on 21 February 2007 over North China is studied. Results show that due to coherent information extraction on both large scales and small scales, the modified scheme is able to improve fog forecasting significantly. Further analyses give physical interpretation of this improvement: with tradition- because of the large-scale information incorporated, the NO-OBS-IN-DA-LEN-SCAL areas can be constrained by remote observations. Hence, combined with improvements over the area with dense observations, the general performance of fog forecasting is improved. Additionally, the issue of the optimal DA intervals is discussed using the modified scheme. Note that, based on the utilization of OSSE, the positive impact obtained using LSAC method is fairly optimistic for the real world.
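The mechanism summarised here, false increments spreading from a densely observed area into a neighbouring unobserved area, and their reduction when thinned large-scale data are added, can be illustrated with a one-dimensional toy analysis. The sketch below is not the WRF 3D-Var system used in this study: the grid, the Gaussian background error covariance, the 'land'/'sea' split, all error values, and the choice of a perfect large-scale analysis are illustrative assumptions, and `analysis_increment` is a hypothetical helper.

```python
import numpy as np


def analysis_increment(background_cov, obs_idx, innovations, obs_err_var):
    """Increment B H^T (H B H^T + R)^-1 d for point observations at obs_idx."""
    bht = background_cov[:, obs_idx]                 # B H^T
    hbht = background_cov[np.ix_(obs_idx, obs_idx)]  # H B H^T
    r = np.diag(obs_err_var)                         # uncorrelated observation errors
    return bht @ np.linalg.solve(hbht + r, innovations)


def rmse(field, reference, mask):
    return float(np.sqrt(np.mean((field[mask] - reference[mask]) ** 2)))


n = 60
grid = np.arange(n)
land = grid < 30   # densely observed area (assumption)
sea = ~land        # unobserved area next to the observations (assumption)

truth = np.zeros(n)
background = truth + np.where(land, 2.0, 0.0)  # background is wrong over land only

# Gaussian background error covariance: std 2.0, length scale 5 grid points.
dist = grid[:, None] - grid[None, :]
b_cov = 2.0**2 * np.exp(-0.5 * (dist / 5.0) ** 2)

# Dense observations over land only, error std 0.5.
dense_idx = grid[land]
dense_innov = truth[dense_idx] - background[dense_idx]
dense_var = np.full(dense_idx.size, 0.5**2)

# Pseudo-observations from a large-scale analysis (taken equal to truth here),
# thinned to every 6th grid point, error std 1.0.
lsac_idx = grid[::6]
lsac_innov = truth[lsac_idx] - background[lsac_idx]
lsac_var = np.full(lsac_idx.size, 1.0**2)

plain = background + analysis_increment(b_cov, dense_idx, dense_innov, dense_var)
with_lsac = background + analysis_increment(
    b_cov,
    np.concatenate([dense_idx, lsac_idx]),
    np.concatenate([dense_innov, lsac_innov]),
    np.concatenate([dense_var, lsac_var]),
)

print("sea RMSE, background     :", rmse(background, truth, sea))
print("sea RMSE, dense obs only :", rmse(plain, truth, sea))
print("sea RMSE, dense obs+LSAC :", rmse(with_lsac, truth, sea))
```

In this toy setting the background is already correct over the 'sea', so any increment there is a false increment; the thinned pseudo-observations pull that increment back towards zero while leaving the correction over 'land' largely unchanged.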
This study is a preliminary exploration of the impact of high-frequency observations on fog forecasting, and could provide clues regarding how to effectively incorporate high-frequency observations to improve operational fog forecasts. However, this study reveals three issues that require further work. Firstly, studies with real data are needed to verify the results. Secondly, with highly dense observations in coastal regions and sparse observations over the interior oceans expected in the future, the effectiveness of the LSAC method for the operational forecasting of sea fog needs to be further explored. Thirdly, in the real-data regime, the optimal interval for DA cycling illustrated in Section 4.3 should be further addressed for different variables and/or observational networks to improve fog forecasting.