An analogy-based method for strong convection forecasts in China using GFS forecast data

This paper describes an analogy-based method for producing strong convection forecasts from the conventional outputs of numerical models. The method takes advantage of the good performance of numerical models in predicting synoptic-scale weather situations. It calculates convective parameters as predictors to detect environments favorable for the occurrence of strong convection. Times in the past when the forecast parameters were most similar to those forecast for the current time are identified by searching a large historical numerical dataset. The observed strong convection situations corresponding to those most similar times are then used to form strong convection forecasts for the current time. The method is applied as a postprocess of the NCEP Global Forecast System (GFS) model. The historical dataset in which the analogous situations are sought comprises two years of summer (June–September) GFS 6- to 48-h forecasts. The strong convection forecast is then generated every 6 h over most regions of China, provided that strong convection observations are available. The results show that the method performs well in predicting strong convections in different regions of China. Through comparison with another postprocessing strong convection forecast method, it is shown that the convective-parameter threshold problem can be solved by employing the analogy method, which considers the local historical conditions of strong convection occurrence.


Introduction
On the basis of 151 severe convective precipitation cases causing torrential floods, Maddox, Chappell, and Hoxit (1979) summarized the common synoptic background characteristics and documented three typical meteorological patterns leading to severe convective precipitation. Accordingly, when forecasting floods caused by severe convective precipitation, one may only need to examine which flow pattern is most analogous to the current weather situation. Despite the large variability of the associated meteorological patterns and the parameters of flood-producing rainfall, this experience- and analogy-based approach works well in general (Yu 2011). The approach rests on the premise that, historically, there will always have been a strong convection process that shares similar characteristics with the current one. With this idea in mind, an analogy-based method was developed in this study to use these similarities quantitatively to produce strong convection weather forecasts.
The term 'analogy' here refers to two states of the atmosphere that have been observed to resemble one another, and forecasting methods based on these 'analogues' have long been used (Lorenz 1963). Such methods hypothesize that similar atmospheric patterns will evolve in similar ways (Bergen and Harnack 1982; Zhao, Wang, and Chen 1982). Thus, the forecast for a given period of time for the current situation can be obtained from the evolution of the meteorological conditions observed after a similar past situation (Toth 1991). In recent decades, the analogy-based method has been used as a data-training technique to improve numerical model forecasts. In this regard, the errors of numerical models for similar weather scenarios are also considered to be similar. Current forecasts can be calibrated by observations on past dates where the model reforecasts are similar to the current forecast (Hamill and Whitaker 2006; Ren and Chou 2006).
With different analog criteria for determining the similar past observations or reforecasts, the analog method has been used in various weather-element forecasts. Obled, Bontron, and Garcon (2002) sought analogs in terms of the 1000- and 700-hPa geopotential height to produce daily rainfall forecasts in western Europe. Hamill and Whitaker (2006) directly used the ensemble-mean 24-h precipitation amount as the analog element to produce probabilistic forecasts of the daily precipitation amount. Other applications of the analog technique to precipitation forecasts include studies by Panziera et al. (2011), Zhou and Zhai (2016), amongst others. As well as precipitation forecasts, the analog technique has also been used for surface wind speed forecasts (Frediani et al. 2017; Alessandrini et al. 2015; Vanvyve et al. 2015; Delle Monache et al. 2011), predictions of the tracks and intensities of tropical cyclones (Sievers, Fraedrich, and Raible 2000), downscaling and model error correction (Fernandez-Ferrero et al. 2009; Yu et al. 2014; Keller, Delle Monache, and Alessandrini 2017), PM2.5 (fine particulate matter) forecasts (Djalalova, Delle Monache, and Wilczak 2015), and so on.
In this paper, we present an application of the analogy method in strong convection forecasting using the forecast data of numerical models. As one of the most important parts of operational weather forecasts, strong convection forecasts can generally be obtained by checking for large simulated radar reflectivities from numerical models. However, the accuracy of this approach is influenced considerably by the model errors resulting from the initial fields and from the approximations and parameterizations of the model. An alternative approach is to detect the dynamic and thermodynamic environments for strong convection occurrence using convective parameters, such as convective available potential energy, the lifting index, the bulk Richardson number, and so on. Nevertheless, a major problem for this postprocessing method is that the critical values of the convective parameters used to distinguish the strong convection regions are difficult to determine. These critical values can change according to the region, season, and even the scheme settings of the model. Therefore, applying convective parameters in a given region requires a long-term statistical study, which is usually omitted by forecasters, especially for regions where convective events are rare. Even if these thresholds for a given region are determined, they may change as the climate of that region changes. This simple but unsolved problem results in many missed forecasts of strong convections. Consequently, many methods have been developed to handle this problem (e.g. Lei et al. 2012; Zeng, Wang, and Wu et al. 2015). In this paper, we show that this problem can be readily solved by adopting the analogy method, which to some extent calibrates the model errors as well. The analogy method performs well in predicting convective processes in China.
The paper is structured as follows: Section 2 presents the mathematics that underpins the analogy method. Section 3 reports the results from an application of the method with the forecast fields from the NCEP Global Forecast System (GFS) model. Conclusions are drawn in Section 4.

Method
As mentioned, what will be forecast by the analog method is determined by the analogy criteria; that is, what elements to seek for the analogs and how to produce the forecasts with these analogs. These two aspects are respectively introduced in this section.

Definition of analogy
In this paper, an 'analogy' is defined as a past prediction that matches the strong-convection-occurrence environment of the current forecast. The parameters involved in describing the basic environment producing these phenomena are: vertical velocity at the surface (Wsfc) and 850 hPa (W850), horizontal divergence at the surface (DIVsfc) and 850 hPa (DIV850), divergence of Q-vector at the surface (DIVQsfc) and 850 hPa (DIVQ850), horizontal divergence of moisture flux at the surface (DIVQFsfc) and 850 hPa (DIVQF850), precipitable water (PW), lifting index (LI), convective available potential energy (CAPE), convective inhibition (CIN), K-index, convective stability index (IC), conditional instability index (IL), conditional convective instability index (ILC), deep convective index (DCI), microburst day potential index (MDPI), total temperature (TT), severe weather threat index (SWEAT), energy helicity index (EHI), stability and wind shear index for thunderstorms in Switzerland (SWISS), wind index (WINDEX), bulk Richardson number (BRN), and storm strength index (SSI). These parameters describe different aspects of the atmospheric characteristics. Wsfc, W850, DIVsfc, DIV850, DIVQsfc, and DIVQ850 indicate the lifting conditions for convection. DIVQFsfc, DIVQF850, and PW reflect the humidity of the air. LI, CAPE, MK, IC, IL, and ILC describe the instability of the atmosphere. MDPI, SWEAT, and WINDEX are empirical parameters for strong weather, such as microbursts, gusts, hail, and so on. With these parameters as the criteria, the closer their values are to one another at two different times, the greater the analogy between the atmospheres at those times. To incorporate all the parameters into the analogy-detection method, following Delle Monache et al. (2011), a formula measuring the analogy for a given location is defined as follows:

$$\|F_t, A_{t'}\| = \sum_{i=1}^{N_c} \frac{1}{\sigma_i} \sqrt{\sum_{j=-\Delta t}^{\Delta t} \left(F_{i,t+j} - A_{i,t'+j}\right)^2} \qquad (1)$$

Here, $t$ is the current time; $t'$ is a past time belonging to a long historical time series used to search for analogies; $F_t$ is the forecast at time $t$; $A_{t'}$ is the forecast at time $t'$; $\|F_t, A_{t'}\|$ is called the analogy metric; $N_c$ is the number of convective parameters; $\Delta t$ is half of the time window over which the metric is computed (6 h in this paper); $F_{i,t+j}$ and $A_{i,t'+j}$ are the values of the $i$th convective parameter of the forecast and of its analogy within the time window, respectively; and $\sigma_i$ is the standard deviation of the time series of past forecasts for a given parameter, which is introduced to prevent the dimensions of the parameters from affecting the value of the analogy metric. According to Equation (1), the smaller the value of the analogy metric, the more analogous the forecast fields over the time window (see the next section for an explanation of the specific element of the forecast field).
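As an illustration of how the analogy metric can be evaluated at a single grid point, the following is a minimal sketch rather than the authors' implementation. It assumes the unweighted form of the Delle Monache et al. (2011) metric, a three-step window (t − 6 h, t, t + 6 h), and hypothetical parameter values and standard deviations; the function name `analogy_metric` and the dictionaries are invented for the example.

```python
import math


def analogy_metric(current, past, sigma):
    """Analogy metric between a current and a past forecast at one grid point.

    current, past: dict mapping parameter name -> list of values over the
        time window (same length and order for both forecasts).
    sigma: dict mapping parameter name -> standard deviation of the
        historical forecasts of that parameter (removes the units).
    """
    total = 0.0
    for name, cur_window in current.items():
        past_window = past[name]
        # Squared differences summed over the time window for this parameter.
        squared = sum((f - a) ** 2 for f, a in zip(cur_window, past_window))
        total += math.sqrt(squared) / sigma[name]
    return total


# Hypothetical values for two parameters over a three-step window.
current = {"CAPE": [300.0, 500.0, 400.0], "LI": [-1.0, -3.0, -2.0]}
past_close = {"CAPE": [320.0, 480.0, 410.0], "LI": [-1.5, -2.5, -2.0]}
past_far = {"CAPE": [900.0, 1200.0, 1000.0], "LI": [-6.0, -7.0, -5.0]}
sigma = {"CAPE": 400.0, "LI": 2.0}  # assumed, not taken from the paper

d_close = analogy_metric(current, past_close, sigma)
d_far = analogy_metric(current, past_far, sigma)
print(d_close < d_far)  # the closer past forecast has the smaller metric
```

Dividing each parameter's distance by its standard deviation is what allows quantities with very different units, such as CAPE and LI, to contribute on a comparable scale.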

Analogy forecast of the strong convections
With the analogy metric as the measurement, the current forecast from a numerical model is compared to all past forecasts from the same model to select the nearest matching times. These times are then ranked according to the values of the analogy metric, and it is assumed that the observed weather phenomena for the analogous times are similar to those of the current time. Therefore, strong convection observations at the selected matching times are used to produce the strong convection forecast, as follows:

$$FA_t = \sum_{i=1}^{N_a} r_i \, OA_{i,t_i} \qquad (2)$$

where $FA_t$ is the strong convection forecast at time $t$, $N_a$ is the number of selected best-analogy times, $OA_{i,t_i}$ is the strong convection observation for the $i$th of the selected $N_a$ members, and $r_i$ is the weight associated with each analog, given by Equation (3). According to Equation (3), the smaller the analogy metric between the analogy time and the current time (that is, the closer the analogy), the greater the weight of that time.
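The ranking and weighting steps can be sketched as follows. This is only an illustration under stated assumptions: the weights are taken to be inverse-metric weights normalized to sum to 1, in the spirit of Delle Monache et al. (2011), which may differ from the exact form of Equation (3); the function name `analog_forecast`, the small constant `eps`, and the sample values are hypothetical.

```python
def analog_forecast(metrics, observations, n_analogs=10):
    """Weighted strong convection forecast from the best-matching past times.

    metrics: analogy-metric values, one per past time (smaller = closer).
    observations: 0/1 strong convection observations in the same order.
    n_analogs: number of best analogs to keep (10 in the paper).
    """
    # Rank past times by increasing metric and keep the closest ones.
    ranked = sorted(range(len(metrics)), key=lambda k: metrics[k])[:n_analogs]
    # ASSUMPTION: weights proportional to the inverse metric, summing to 1.
    eps = 1e-12  # avoids division by zero for a perfect match
    inverse = [1.0 / (metrics[k] + eps) for k in ranked]
    total = sum(inverse)
    weights = [value / total for value in inverse]
    return sum(w * observations[k] for w, k in zip(weights, ranked))


# Hypothetical metrics and observations for five past times.
metrics = [0.4, 2.0, 0.5, 3.0, 0.8]
observations = [1, 0, 1, 0, 0]
fa = analog_forecast(metrics, observations, n_analogs=3)
print(round(fa, 3))
```

Because the weights sum to 1 and the observations are 0 or 1, the result always lies between 0 and 1.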

Application to GFS forecast data
The analogy-based method described above is preliminarily applied to the NCEP GFS model, which gives relatively long-term weather forecasts. The forecasts from GFS, with a horizontal resolution of 0.5° × 0.5° and a 6-h temporal resolution, cover the entire globe, but only those in China are used here because of the availability of strong convection observations. The strong convection observations are from the China Meteorological Administration (CMA). CMA issues reports of strong convection weather (including thunderstorms, strong surface wind, hail, and short-term heavy rainfall), called 'Important Weather Reports', over the whole of China. These reports are remapped from station points to the 0.5° × 0.5° GFS grid, with the value 0 denoting no reports of strong convection and 1 denoting the occurrence of strong convection. Therefore, in Equation (2), $OA_{i,t_i}$, which is the observation of strong convection, can take two values: 0 and 1. Accordingly, the values of $FA_t$ range from 0 to 1. To determine the occurrence of strong convection, a threshold value of $FA_t$ needs to be given. An $FA_t$ value larger than the threshold represents strong convection occurrence, while one smaller than the threshold represents no strong convection occurrence. In addition, because the observations take only the two values 0 and 1, $FA_t$ also represents the weighted proportion of members with strong convection (members with value 1) among the selected $N_a$ analogous members (here, $N_a = 10$), and thus denotes the probability of strong convection occurrence. $FA_t$ can therefore also be seen as a probabilistic forecast of strong convection. The strong convection forecasts produced by the method have a horizontal resolution of approximately 55 km and a temporal resolution of four forecasts per day, corresponding to the GFS forecast times (0000 UTC, 0600 UTC, 1200 UTC, and 1800 UTC).
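The remapping of station reports to the grid and the conversion of the forecast value into a yes/no forecast can be sketched as below. The paper does not state how the remapping is done, so assigning each report to the nearest 0.5° grid point is an assumption, as are the grid origin (15°N, 70°E), the function names, and the sample coordinates. The default threshold of 0.3 is the value selected later in the evaluation.

```python
def grid_index(lat, lon, lat0=15.0, lon0=70.0, step=0.5):
    """Index (row, column) of the grid point nearest to a station location.

    ASSUMPTION: nearest-grid-point assignment on a regular grid whose
    origin (lat0, lon0) is hypothetical.
    """
    return (round((lat - lat0) / step), round((lon - lon0) / step))


def reports_to_grid(reports):
    """Set of grid points with at least one report (observation value 1).

    Grid points absent from the returned set carry the value 0.
    """
    return {grid_index(lat, lon) for lat, lon in reports}


def occurrence(fa, threshold=0.3):
    """1 if the forecast value exceeds the threshold, otherwise 0."""
    return 1 if fa > threshold else 0


# Hypothetical station reports as (latitude, longitude) pairs.
reports = [(39.9, 116.4), (39.95, 116.45), (23.1, 113.3)]
cells = reports_to_grid(reports)
print(len(cells))  # the first two reports fall on the same grid point
```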

Performance measures
The performance of the postprocessing method is tested for four months, covering the whole summer of 2016 (from 1 June 2016 to 30 September 2016). In addition, a four-month dataset in 2015 (from 1 June 2015 to 30 September 2015) is used independently as the historical forecasts in which to search for analogs. The basic process through which the analogy-based strong-convection forecast is obtained can be split into two steps. The first step is to search among the historical forecasts for the best analogs to the current forecast at time t. Second, the strong-convection forecast is formed from the strong-convection observations corresponding to the best-analogy times. Here, the historical forecasts include the times right before the current time t and the independent dataset in 2015 described above. Analogs are sought independently among the historical forecasts for each forecast time (6-48 h, 6-h intervals) and location in China.
Aviation, power supply, businesses, and agriculture in northern China suffered heavy losses. Using this case, we examine the capability of the analogy-based method in forecasting strong convections in China. As shown in Figure 1(a), the green area (the area with strong convection reports), characterizing strong convections, covers most parts of northern China (black box in Figure 1(a); approximately (33°-43°N, 102°-120°E); denoted as 'NC'), from Gansu and Ningxia eastward to Shanxi, Hebei, and Shandong provinces. Meanwhile, southern China (approximately (22°-26°N, 104°-118°E); denoted as 'SC') and northeastern China (approximately (40°-47°N, 125°-135°E); denoted as 'NEC') also show strong convection occurrence. These three regions have very distinct climate characteristics (Qian and Lin 2005), which allows the performance of the analogy-based method to be compared across different regions of China. The cloud-top blackbody brightness temperature (TBB) at 1200 UTC 13 June 2016 is also presented.
In Figure 1(b), except for some scattered convective clouds that do not appear in the observations in Figure 1(a), the low-value areas of TBB basically reflect the activity of the observed convection. Figure 1(c,d) presents the distributions of two convective parameters (CAPE and LI) calculated from the 24-h forecast fields of GFS for 0600 UTC 13 June 2016, a time prior to the convection occurrence at 1200 UTC 13 June. As shown in Figure 1(c), the 24-h predicted CAPE shows anomalies in all three strong-convection regions, albeit with significant differences in values. The values of predicted CAPE decrease from South to North China. In the SC region, CAPE is mostly larger than 600 J kg⁻¹. In the NC region, the strong convections occur approximately in areas with CAPE larger than 200 J kg⁻¹, and some areas in the NC region even have CAPE values of less than 200 J kg⁻¹. The CAPE corresponding to the strong convection in the NEC region is similar to that in NC. Nevertheless, between the three observed strong convection regions, CAPE values can even reach 1000 J kg⁻¹ (~33°N, 112°E). The same occurs for LI, which presents large areas of negative values (implying instability) between the NC and SC regions, although no convective weather was observed there. For these two convective parameters, if the conventional threshold values (CAPE > 200 J kg⁻¹; LI < 0) are still used, the 24-h strong-convection forecast with the GFS model will be wrong over large areas of China.
The 6-48-h forecasts of the strong convection case using the analogy-based method with the GFS forecast data are shown in Figure 2. According to the color-shaded areas in Figure 2, which denote the strong convection areas, the strong convection processes in the three regions of concern are all predicted well by the 6-48-h analogy forecasts. The predicted strong-convection areas in Figure 2 show similar patterns to the observations in Figure 1(a).
Meanwhile, no evident strong convection appears between the NC and SC regions, where CAPE and LI show large anomalies (Figure 1(c)). False forecasts mainly happen at the northeast border of China in the NEC region, in the west of the NC region in Gansu and Ningxia provinces, and in the west of the SC region in Yunnan Province (indicated by red dotted rectangles).

Evaluations
The equitable threat score (ETS), critical success index (CSI), probability of detection (POD), and false alarm ratio (FAR) are computed to evaluate the performance of the analogy-based method in forecasting strong convections. Definitions of these scores can be found in Wilks (1995). The scores are calculated over the period from 1 June to 25 September 2016 (the whole summer of 2016; 6-h intervals; 468 times) and then averaged over the spatial range of the whole of China (~300 grid sites). As a reference, the performance of another convective-parameters-ensemble method for strong convection forecasts, called the 'ingredients-based method' (see Appendix A), is also evaluated. This method overlaps 11 convective parameters, with each parameter having a fixed threshold value to indicate the occurrence of strong convections (Li, Gao, and Liu 2004; Yu 2011).
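For reference, the four scores can be computed from a 2 × 2 contingency table of binary forecasts and observations, as in the following sketch. It uses the standard definitions of these scores; the function names and the sample series are hypothetical, and the sketch assumes that at least one event is both forecast and observed so that no denominator is zero.

```python
def contingency(forecasts, observations):
    """Counts of hits, false alarms, misses, and correct negatives."""
    pairs = list(zip(forecasts, observations))
    hits = sum(1 for f, o in pairs if f == 1 and o == 1)
    false_alarms = sum(1 for f, o in pairs if f == 1 and o == 0)
    misses = sum(1 for f, o in pairs if f == 0 and o == 1)
    correct_negatives = sum(1 for f, o in pairs if f == 0 and o == 0)
    return hits, false_alarms, misses, correct_negatives


def scores(hits, false_alarms, misses, correct_negatives):
    """POD, FAR, CSI, and ETS from the contingency counts."""
    n = hits + false_alarms + misses + correct_negatives
    # Hits expected by chance, used by the equitable threat score.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "CSI": hits / (hits + false_alarms + misses),
        "ETS": (hits - hits_random)
        / (hits + false_alarms + misses - hits_random),
    }


# Hypothetical binary forecast and observation series at one grid point.
forecasts = [1, 1, 0, 0, 1, 0, 0, 0]
observations = [1, 0, 1, 0, 1, 0, 0, 0]
table = contingency(forecasts, observations)
result = scores(*table)
print(table, result["CSI"])
```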
The skill of the analogy-based method with different FA values as the thresholds for forecasting strong convections is presented in Figure 3(a-d). FA is produced for different lead times, ranging from 6 to 48 h, at 6-h intervals. As seen in Figure 3(a-d), the performance of the analogy-based method can change significantly with different thresholds of FA. The ETS in Figure 3(a) indicates that the analogy-based method performs best when the threshold is equal to 0.3. The pattern of performance indicated by the CSI is similar, except that the threshold corresponding to the peak CSI is 0.2. However, little difference in CSI is found between the thresholds of 0.2 and 0.3. POD and FAR both decrease at a near-constant rate as the threshold increases, meaning that a high POD is also accompanied by a high FAR. In addition, the best threshold seems to depend little on the forecast lead time, as indicated in Figure 3(a-d). These results support the use of FA = 0.3 as the threshold for determining the occurrence of strong convections in Figure 2.
The analogy-based method (denoted as 'AN') is compared with the ingredients-based methodology (denoted as 'IN') in Figure 3(e-h). As we can see, for all the lead times from 6 to 48 h, AN performs better than IN. It seems that AN outperforms IN mainly due to a low FAR; that is, AN is more capable of discriminating the non-strong-convection times than IN. The main reason is that IN adopts a fixed threshold for each convective parameter in all locations, and this threshold value usually cannot conform to the actual value that determines the strong-convection occurrence (Figure 1(c,d)). In addition, it can also be seen that both AN and IN present decreasing forecast skill with longer lead times. However, the difference seems to be small between different lead times. The CSI for the 6-h AN forecast is 0.239, and for the 48-h forecast it is 0.219. This is reasonable because the basic idea of AN is to search for similar atmospheric environments to the strong-convection occurrence, and numerical models usually have fairly high skill in forecasting atmospheric patterns.

Summary
In this paper, an analogy-based method for the forecasting of strong convections is presented. The method originates from the detection of the dynamic and thermodynamic environment for strong-convection occurrence using convective parameters such as CAPE, LI, SWEAT, and so on. When using these parameters, the main problem is the determination of the threshold values that discriminate whether strong convection will happen. The threshold values can be very different for different locations, seasons, and numerical-model scheme settings. The ambiguity surrounding these threshold values can have a considerable influence on the ability to detect strong convections using convective parameters, especially in regions where strong convections happen relatively infrequently. Our paper shows that the threshold problem can be properly solved by employing the analogy-based method. The method uses the convective parameters as the criteria to search among historical forecasts for strong-convection-occurrence environments similar to that of the current forecast. Forecasts of strong-convection occurrence can then be produced based on the strong-convection observations of those best-analogy times. The method is applied as a postprocess of the GFS model to produce relatively long-term strong convection forecasts. The historical archive in which the search for analogs is made includes two years of summer (June-September in 2015 and 2016) GFS 6-48-h forecasts. Given the availability of strong-convection observations, a strong-convection forecast can be generated every 6 h over most regions of China, with a horizontal resolution of about 55 km. Through a single case study and an evaluation of the method over the whole of the summer in 2016, it is shown that the method performs well in predicting strong convections in different regions of China.
Comparison with an ingredients-based strong convection forecast method, which overlaps convective parameters with fixed thresholds over all regions of China, indicates that the analogy-based approach performs much better. This is largely because the analogy-based method takes into account the local historical conditions of strong-convection occurrence in the process of searching for past analogs.
The performance of the method is determined by two factors: the historical dataset and the analogy criteria. This means that the method can be easily ported to other high-resolution numerical models with proper historical datasets and predictors. In this paper, the processes associated with microphysical parameterizations are not considered in the method. When using this method with mesoscale numerical models, the simulated radar reflectivities and cloud hydrometeor mixing ratios can be added to the strong-convection predictors, and the method may be used as a postprocessing calibration of the strong-convection forecast directly produced by the model.
At present, the method only forecasts whether or not strong-convection weather will happen; the specific phenomena of the weather, such as thunderstorms, strong surface wind, hail, and short-term heavy rainfall, are not considered. As a future line of enquiry, we hope to provide a more specific forecast of the kind of strong-convection weather that will occur by improving the analogy metric and the corresponding observations.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding

Appendix A. An ingredients-based method for strong convection forecasts
The method can be understood as a quantification of the ingredients-based forecast method (Zhang, Tao, and Sun 2010). It overlaps several convective parameters, with each parameter having fixed threshold values to indicate the occurrence of strong convections. In the formula for the method, $FS_t$ is the strong convection forecast at time $t$, $N_s$ is the number of overlapped strong-convection-occurrence conditions determined by the convective parameters, and $TS_{t,i}$ is the possibility of strong-convection occurrence under the $i$th condition. If the strong-convection-occurrence condition is satisfied, $TS_{t,i} = 1$; otherwise, $TS_{t,i} = 0$. The convective parameters and their corresponding thresholds for the thunderstorm-occurrence conditions are listed in Table A1. According to Table A1, 11 convective parameters and 21 strong-convection-occurrence conditions ($N_s = 21$) are used. Some parameters have two thresholds, meaning that the method takes into account the intensity of the strong convection to a certain extent.
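The overlapping of fixed-threshold conditions can be illustrated with the short sketch below. It is not the formula of this appendix: the sketch assumes, purely for illustration, that the forecast value is the fraction of satisfied conditions. The CAPE and LI thresholds are the conventional values quoted in the case study, while the K-index threshold, the parameter names, and the function name are hypothetical and do not come from Table A1.

```python
import operator

# Each condition: (parameter name, comparison, fixed threshold).
CONDITIONS = [
    ("CAPE", operator.gt, 200.0),  # conventional threshold, in J/kg
    ("LI", operator.lt, 0.0),      # conventional threshold
    ("K", operator.gt, 35.0),      # HYPOTHETICAL threshold for illustration
]


def ingredients_forecast(params, conditions=CONDITIONS):
    """Fraction of fixed-threshold conditions satisfied at one grid point.

    ASSUMPTION: the forecast is the mean of the 0/1 condition indicators;
    the paper's own formula may combine them differently.
    """
    indicators = [
        1 if compare(params[name], threshold) else 0
        for name, compare, threshold in conditions
    ]
    return sum(indicators) / len(indicators)


# Hypothetical parameter values at one grid point.
point = {"CAPE": 600.0, "LI": -2.0, "K": 30.0}
fs = ingredients_forecast(point)
print(round(fs, 3))
```

Because the same thresholds are applied everywhere, a grid point in a region where convection typically occurs at lower CAPE would be scored the same way as one in a region where much higher CAPE is common, which is the limitation the analogy-based method is designed to avoid.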