Evaluation of downscaled wind speeds and parameterised gusts for recent and historical windstorms in Switzerland

Assessments of local-scale windstorm hazard require highly resolved spatial information on wind speeds and gusts. In this study, maximum (peak) sustained wind speeds on a 3-km horizontal grid over Switzerland are obtained by dynamical downscaling from the Twentieth Century Reanalysis (20CR) employing the Weather Research and Forecasting (WRF) model. Subsequently, simulated peak gusts are derived using four wind gust parameterisations (WGPs). Evaluations against observations at 63 locations in complex terrain include four high-impact windstorms (occurring in 1919, 1935, 1982, and 1999) and 14 recent windstorms (occurring between 1993 and 2011). Peak sustained wind speeds and directions are generally well simulated, although wind speeds are mostly overestimated. In general, performance and skill measures are best for locations on the Swiss Plateau and inferior for Alpine mountain and valley locations. An independent ERA-Interim WRF downscaling configuration produces overall comparable results, implying that the 20CR ensemble mean is a reliable data set in dynamical downscaling exercises. The four evaluated WGPs largely reproduce the observed gustiness, although the timing and magnitude of the peak gusts are not regularly captured. None of the WGPs stands out as single best for the complex topography of Switzerland. Differences among the WGPs are small compared to the biases inherited from the sustained-wind part in the WGP formulations. All WGPs transform overestimated peak sustained winds into underestimated peak gusts, which points to an underrepresentation of the turbulent part in the WGP formulations. The range of simulated peak gusts from downscaling all 20CR ensemble members does not reliably include the observed peak gust, indicating limited benefit in applying an ensemble approach. Despite the limitations, we infer that with spatial optimisations of the simulation (e.g. by bias correction or adaptation of the WGP schemes), downscaling of 20CR input is an efficient option for high-resolution assessments of windstorm hazard and risk in Switzerland.


Introduction
Winter storms are natural hazards with potentially disastrous socio-economic impacts on regional to continental scales. Damage from extratropical windstorms has increased in Central Europe and Switzerland in recent decades (Munich Re, 2002; Swiss Re, 2000; Usbeck et al., 2010; Imhof, 2011). This has raised public interest in high-precision, local-scale assessments of windstorm hazard and risk for planning, engineering, or insurance applications, among others.
Such wind hazard and risk assessments largely rely on maximum (peak) wind speed during a windstorm event; for example, on peak sustained winds, typically defined as the wind speed averaged over 10–20 minutes, or peak gusts, typically defined as the wind speed averaged over a few seconds (Klawa and Ulbrich, 2003; Hofherr and Kunz, 2010; Usbeck et al., 2010). However, these assessments are challenging because peak gusts are highly variable over space and time, and high-resolution wind information is available for recent decades only. For Switzerland, observations of peak sustained winds and peak gusts become very sparse prior to the implementation of the automated measurement network in 1980. GIS-based regionalisation of point gust observations (e.g. Etienne et al., 2010) is therefore limited in temporal coverage. Other approaches like using pressure differences as proxies for storminess (Wang et al., 2011) cannot capture complex orographic effects.
To overcome some of the caveats, such as the sparsity of observations, wind gusts can be derived from atmospheric reanalysis products by dynamical downscaling to local scales and by subsequent wind gust parameterisations (WGPs) that estimate gust speeds at local (i.e. sub-grid) scales (e.g. Goyette et al., 2003). Dynamical downscaling has been widely used to assess regional to local wind gustiness over complex terrain in Europe, with a variety of input data and downscaling configurations (Goyette et al., 2003; Goyette, 2008; Ágústsson and Ólafsson, 2009; Pinto et al., 2009; Horvath et al., 2011). Thereby, a number of WGPs have been developed to account for the sub-grid turbulence (e.g. Brasseur, 2001; Benjamin et al., 2002; Doms and Schättler, 2002; Jungo et al., 2002). The generic form of a WGP is given by the combination of two components, which are the sustained wind speeds, typically obtained from the regional model simulation, and a turbulent wind component that has to be parameterised. Three types of WGP under non-convective conditions are commonly distinguished (see overview in Sheridan, 2011). The first type estimates the turbulent part from the maximum potential momentum that can be mixed down to the surface from within the Planetary Boundary Layer (PBL; Brasseur, 2001). The second type uses an empirical relation between the turbulent component and a local drag coefficient (based on Panofsky et al., 1977; Panofsky and Dutton, 1984). The third type uses a gust factor, which describes an empirical ratio between sustained wind and gust speeds (e.g. Jungo et al., 2002, for Switzerland). In addition, probabilistic views on wind gust forecasting have been introduced (Friederichs et al., 2009; Born et al., 2012).
For Switzerland, downscaling of windstorms has been performed within the frame of case studies (Goyette et al., 2001, 2003; Stucki et al., 2015) and within a sensitivity study based on a set of recent windstorms (Gómez-Navarro et al., 2015). Stucki et al. (2015) focused on a foehn storm in 1925 and showed that historical storms can be reliably downscaled and associated losses can be estimated using simulated wind gusts (see also Welker et al., 2016). Gómez-Navarro et al. (2015) showed that downscaling recent (1979–2013) windstorms over complex terrain like the Alps delivers reasonable winds; however, no gusts were assessed in that study. To date, the literature lacks a systematic comparison of different gust parameterisations over complex terrain that is based on more than specific cases.
The purpose of this study is to fill some of these gaps and evaluate four commonly used gust parameterisations over the complex terrain of Switzerland using the Twentieth Century Reanalysis (20CR, Compo et al., 2011) as the initial and boundary data set for dynamical downscaling. 20CR extends back to 1871 (the more recent 20CR version 2c to 1851), thus overcoming the above-mentioned temporal limitation of most reanalysis products and allowing analyses of an increased number of extreme, that is, rare events on multi-decadal to centennial scales. Moreover, the 20CR data set comprises 56 ensemble members, reflecting a range of potential initial and boundary atmospheric conditions. The dynamical downscaling from the 20CR is performed with the regional Weather Research and Forecasting model (WRF; Skamarock et al., 2008). We assess our approach of using the 20CR ensemble mean alone as a driving data set for downscaling, in comparison with using information from a subset or all 56 ensemble members in the 20CR. In addition, we take advantage of the independent downscaling configuration by Gómez-Navarro et al. (2015) for comparisons of the downscaled sustained wind speeds. A range of performance and skill measures is used to evaluate the four WGPs. The evaluation is based on a set of recent and historical (i.e. occurring before 1980) high-impact windstorms over Switzerland provided by Stucki et al. (2014).
The article is structured as follows: In Section 2, we specify the sets of recent and historical winter storms and the available wind observations, the global reanalysis data sets, the WRF model configuration as well as the four WGP schemes. In Section 3, we analyse the sustained wind speeds obtained from downscaling 20CR data with the WRF model. In Section 4, we evaluate the simulated peak winds and gusts based on the four different parameterisations. Finally, a summary and concluding remarks are presented in Section 5.

The sets of recent and historical windstorms
The study is based on windstorms that are selected from a catalogue of high-impact windstorms in Switzerland reaching back to the middle of the 19th century (Stucki et al., 2014). Two subsets from this catalogue (Supplementary Table 1) are used for evaluations. A small set of windstorms is used for case studies and comprises four extremely damaging windstorm events. The set includes examples of typical storm situations in Switzerland; two are westerly windstorms and two are foehn storms, one each from the recent and the historical period. These are (1) the westerly windstorm Lothar, which occurred on 26 December 1999 (WSL, 2001; Bründl and Rickli, 2002; Jungo et al., 2002; Wernli et al., 2002), (2) the 'once-in-a-century' foehn storm on 7–8 November 1982 (Frey, 1984), (3) the westerly windstorm on 23 February 1935, and (4) the foehn storm on 4–5 January 1919 (Frey, 1926; Brönnimann et al., 2012). A second, larger subset encompasses the 14 winter storms since 1993 listed in Stucki et al. (2014). For this set, high-resolution wind measurements are available for the evaluation of model performance.
Windstorms that occurred only a few days apart are considered as one windstorm period.

SwissMetNet wind observations
Wind speed observations are available from the automated surface measurements network SwissMetNet (SMN), operated by the Swiss Federal Office of Meteorology and Climatology MeteoSwiss. SMN records undergo a routine quality check. Unrealistic outliers have been removed, but no homogenisation has been applied. Wind observations at 63 locations are selected for evaluating simulated wind and gust speeds (Fig. 1; see also Supplementary Table 2). The selected stations have wind masts which take measurements of 10 m wind at actual mast heights between 6 and 16 m above ground. This limits observation errors and thus ensures comparability of observed and simulated 10 m-wind speeds.
The selected locations are well distributed over Switzerland and are representative of the complex Swiss topography. Sub-regions are categorised into mountain (≥1200 m a.s.l.), valley (in Alpine terrain), and rolling to flat terrain (the Swiss Plateau). The latter category refers to the Swiss Plateau situated between the Jura range and the Alps, but it also includes a number of lower-elevation locations in the Jura range and one low-land location south of the Alps.
Two parameters derived from the SMN are used in this study: peak sustained wind speed and peak gust. The term 'peak' refers to the maximum value recorded during the lifetime of a windstorm at one location. The peak sustained wind speeds are calculated from the 10-minute mean wind speeds in the SMN data set. Of the six values per hour, we consider only the mean from minute 51 to 60 for consistency with the dynamically downscaled output (Section 2.3). The absolute maximum of all these values recorded during a windstorm period is obtained and denoted as SMNm hereafter. Analogously, the peak gusts are based on the hourly maxima of 3-second gusts in the SMN data set.
The absolute maxima of these values during each storm and for each location are considered and denoted as SMNx.
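The derivation of SMNm and SMNx described above can be sketched as follows. This is a minimal illustration rather than the processing actually applied to the SMN records: the function names, the sample values, and the convention that each 10-minute mean is labelled by the end of its averaging interval are assumptions made for the example.

```python
from datetime import datetime, timedelta


def peak_sustained_wind(ten_min_means):
    """Storm-period maximum of the 10-minute means covering minutes 51 to 60.

    ten_min_means: list of (interval_end, speed) tuples. Labelling each
    10-minute mean by the END of its interval is an assumed convention, so
    the minute 51-60 mean is the one whose interval ends on the full hour.
    """
    on_the_hour = [speed for end, speed in ten_min_means if end.minute == 0]
    return max(on_the_hour)


def peak_gust(hourly_gust_maxima):
    """Storm-period maximum of the hourly maxima of 3-second gusts."""
    return max(hourly_gust_maxima)


# Hypothetical two-hour record (m/s), intervals ending 09:10 ... 11:00.
start = datetime(1999, 12, 26, 9, 0)
speeds = [8.0, 9.0, 12.0, 15.0, 11.0, 10.0, 14.0, 21.0, 18.0, 16.0, 13.0, 12.0]
record = [(start + timedelta(minutes=10 * (i + 1)), v) for i, v in enumerate(speeds)]

smnm = peak_sustained_wind(record)    # only the 10:00 and 11:00 values qualify
smnx = peak_gust([25.1, 33.4, 29.0])  # hypothetical hourly gust maxima (m/s)
```

In this made-up record, the hourly subsampling returns 12 m/s although a 10-minute mean of 21 m/s occurs between the full hours, which illustrates that SMNm is a conservative estimate chosen for consistency with the hourly model output.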
SMNm and SMNx measurements are available for 63 stations since 1993, and measurements at 31 stations reach back to 1981 (Supplementary Table 2). Because spatial representativity is favoured over time series length, we use the subset with observations at all 63 stations for the evaluation. It covers the set of 14 high-impact windstorm periods in Switzerland between 1993 and 2011. Hence, the SMNm and SMNx series each contain 14 values at 63 locations in Switzerland; these data are used for comparison with the corresponding simulated values at the nearest model grid points. In addition, measurements of sustained winds and gusts for the remaining 11 windstorms between 1981 and 1992 (in the catalogue of Stucki et al., 2014) are used to derive a constant gust factor for Switzerland (Section 2.4).

Reanalysis products and regional models
The Twentieth Century Reanalysis data set version 2 (20CR; Compo et al., 2011) serves as initial and boundary conditions for the dynamical downscaling. 20CR is a global, four-dimensional data set describing the state of the atmosphere every 6 h on a 2° × 2° latitude–longitude grid (see Stucki et al., 2015, for the representation of the Alps in 20CR). The 20CR reaches back in time to 1871 and encompasses 56 ensemble members; the more recent 20CR version V2c covers 1851 to 2012. The 20CR has proven to be reliable for analysing synoptic-scale, midlatitude weather systems over Europe (Stucki et al., 2012; Trigo et al., 2014). Recent studies explored the potential of 20CR as input data set for downscaling (Michaelis and Lackmann, 2013; Misra et al., 2013; Stucki et al., 2015).
For this study, the 20CR ensemble mean and a number of 20CR ensemble members are downscaled, the latter to assess the range of simulated peak wind speeds and gusts during the four recent and historical events (Section 2.1). A set of 51 ensemble members (five outputs suffered from data corruption) is downscaled for windstorm Lothar in 1999 and 55 ensemble members for the windstorm in 1935. For the 1982 and 1919 foehn storms, two contrasting 20CR ensemble members per event are selected for downscaling. The selection of the 20CR ensemble members is done as follows: For each 20CR ensemble member, the near-surface (0.995-sigma level) wind speed for the six grid boxes covering Switzerland (6°E–10°E, 46°N–48°N; see Welker and Martius, 2014) is averaged with an area weight depending on the approximate Swiss area covered by each grid box (ranging from 5% to 40%). Then, the temporal average over the day with the highest documented impact in Stucki et al. (2014) is calculated, and the two ensemble members delivering the strongest and the weakest near-surface winds are selected. Note that due to limited computer resources, shorter time periods are downscaled for the ensemble members than for the ensemble mean.
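The member selection described above could be sketched as follows. This is an illustration under stated assumptions: the weights, member names, and wind values are hypothetical (the text only gives a range of 5% to 40% for the area weights), and the function names are introduced for the example.

```python
def area_weighted_mean(values, weights):
    """Weighted mean of the grid-box wind speeds for one time step."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def select_contrasting_members(member_winds, weights):
    """Pick the members with the strongest and weakest daily-mean wind.

    member_winds: dict mapping a member name to a list of time steps, each
    time step being a list of near-surface wind speeds for the six grid boxes.
    """
    daily_mean = {}
    for member, time_steps in member_winds.items():
        per_step = [area_weighted_mean(boxes, weights) for boxes in time_steps]
        daily_mean[member] = sum(per_step) / len(per_step)
    strongest = max(daily_mean, key=daily_mean.get)
    weakest = min(daily_mean, key=daily_mean.get)
    return strongest, weakest, daily_mean


# Hypothetical share of the Swiss area covered by each of the six grid boxes.
WEIGHTS = [0.05, 0.40, 0.15, 0.10, 0.20, 0.10]

# Hypothetical winds (m/s): three members, two time steps, six grid boxes.
winds = {
    "member_01": [[10.0] * 6, [12.0] * 6],
    "member_02": [[14.0, 9.0, 9.0, 9.0, 9.0, 9.0]] * 2,
    "member_03": [[8.0, 13.0, 8.0, 8.0, 8.0, 8.0]] * 2,
}

strongest, weakest, daily = select_contrasting_members(winds, WEIGHTS)
```

In this toy example, member_02 has the highest single grid-box value but ends up weakest, because that grid box covers only a small part of Switzerland.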
Analyses by Brönnimann et al. (2012, 2013) and Stucki et al. (2015) have already shown that the 20CR ensemble mean provides a realistic estimate of the mean sea level pressure (MSLP) fields and gradients over Europe for two historical foehn cases in 1919 and in 1925. In addition, the analysis presented in the Supplement shows that synoptic air pressure gradients, and hence the low pressure systems associated with high-impact windstorms in Switzerland, are consistently defined across most of the 20CR ensemble members for the set of four windstorm cases in 1999, 1982, 1935, and 1919, although the 20CR ensemble mean may be biased towards lower wind speeds in the period before around 1950 (Supplementary Figs. 1 and 2; see also Welker et al., 2016).
[Fig. 1. (b) Locations with abbreviated station names within the Swiss river system and with respect to the Swiss topography (coloured boxes). Refer to Supplementary Table S2 for abbreviations.]
The WRF simulations are run with grid sizes decreasing from 45 km in the outermost model domain to 9 km in the intermediate and to 3 km in the innermost domain over Switzerland. The vertical structure of the atmosphere is described by 31 vertical layers. The Mellor–Yamada scheme is used for the parameterisation of the PBL and the Monin–Obukhov scheme for the surface layer. The downscaling output (termed 20CR WRF hereafter) is stored in hourly resolution (instantaneous values). The simulations start about 18 h prior to the documented storm peaks to allow spin-up of the smaller-scale atmospheric features.
Additionally, the 20CR WRF simulation is compared with an independent simulation with WRF (Gómez-Navarro et al., 2015) which is based on a different driving data set, that is, the ERA-Interim reanalysis (Dee et al., 2011). More observational variables are assimilated in ERA-Interim than in 20CR, and the model used to generate the ERA-Interim reanalysis is run at higher resolution (0.75° × 0.75°) than the one for 20CR. Another important difference is the use of a non-local PBL scheme employed in the WRF model, slightly modified to account for the non-resolved topography. The downscaling output (termed ERAi WRF, hereafter) is available at hourly temporal (instantaneous values) and 2-km horizontal resolutions between 1979 and 2013. More details about the parameterisation schemes used for ERAi WRF compared with 20CR WRF are available in Supplementary Table 3.

Wind gust parameterisations
Gusts are produced by eddies within the general air flow, and these eddies are generated by friction near the surface and wind shear or convection further aloft (e.g. Holmes, 2007). These interactions make wind gusts extremely variable in space and time. Mesoscale atmospheric models are not able to simulate such processes explicitly, even if run at spatial resolutions of just a few kilometers. Thus, the sub-grid processes of turbulent winds have to be parameterised. Here, four different WGP schemes are applied to the model output and their results are compared.
The first gust parameterisation is implemented in the Unified Post Processor of the WRF model. The WRF postprocess diagnostic of wind gusts (denoted as WPD hereafter) predicts the maximum potential momentum that is mixed down to the surface from the top of the PBL. The source code (NCO, 1997) and the sparse documentation suggest that it is adapted from a routine in the former NOAA Rapid Update Cycle RUC20 of Post-Processing Diagnosed Variables (RUC20, 2007; see also Benjamin et al., 2002; Zhu et al., 2009; Sheridan, 2011). The wind gust at 10 m height $ffx_{10}$ is calculated as
$$ffx_{10} = ff_{10} + (ff_{PBL} - ff_{10}) \left(1 - \frac{h_{PBL}}{2000\,\mathrm{m}}\right)$$
where $ff_{10}$ is the wind speed at 10 m height and $ff_{PBL}$ is the wind speed at the top of the planetary boundary layer $h_{PBL}$. Deep boundary layers (>1000 m) are reduced to 1000 m, so that $h_{PBL}/2000\,\mathrm{m}$ is limited to ≤0.5.
The second gust parameterisation is implemented in the COSMO model (abbreviated COS; Doms and Schättler, 2002) and described in Schulz and Heise (2003) and Schulz (2008). The COS parameterisation of wind gusts at 10 m height $ffx_{10}$ is defined as
$$ffx_{10} = ff_{10} + 3 \cdot 2.4 \cdot u_*$$
The friction velocity $u_*$ represents the sustained near-surface wind modulated by a drag coefficient $C_d$. The two constant factors (3 and 2.4) of the turbulent part come from empirical estimates for a number of German airports (Schulz and Heise, 2003; Schulz, 2008).
The third wind gust parameterisation used is the Brasseur wind gust estimation (abbreviated BRA; Brasseur, 2001;Goyette et al., 2003). It determines air layers aloft where strong eddies may overcome the buoyancy and transport air parcels with high momentum towards the surface.
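As a hedged sketch, the criterion for a parcel at height $z_p$ to reach the surface can be written in the general form given by Brasseur (2001): the mean turbulent kinetic energy of the layer below the parcel must be at least as large as the buoyant energy the parcel has to overcome. The lower integration limit $z_{10m}$, the normalisation by the layer depth, and the symbol $g$ (gravitational acceleration) are assumptions made for illustration, and may differ in detail from the formulation used in this study.

```latex
\frac{1}{z_p - z_{10m}} \int_{z_{10m}}^{z_p} \mathrm{TKE}(z)\,\mathrm{d}z
\;\ge\;
\int_{z_{10m}}^{z_p} g\,\frac{\Delta\theta_v(z)}{\theta_v(z)}\,\mathrm{d}z
```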
The height of the air parcel is $z_p$ and $z_{10m}$ is 10 m above ground, TKE is the turbulent kinetic energy, $\theta_v$ is the virtual potential temperature, and $\Delta\theta_v$ is the anomaly when the air parcel is deviated from the surface layer. The maximum wind within the layers $z_p$ then predicts the near-surface wind gust.
The fourth WGP uses an empirically deduced, constant gust factor approach (denoted GFC for gust factor constant), in which the sustained wind is multiplied by a constant factor: $ffx_{10} = 1.67 \cdot ff_{10}$. The gust factor is derived from the ratio of observed peak gusts to observed peak sustained winds in an independent data set, namely from measurements at 31 locations in Switzerland during 11 windstorms between 1981 and 1992 (Section 2.2). Considering the spatially unequal distribution of the 31 locations, with most stations located at lower elevations of the Swiss topography, we take the mean of three region-averaged gust factors (2.02 for mountain, 1.52 for valley, and 1.48 for plateau locations). GFC can be seen as a trivial parameterisation for comparisons with the above WGPs, assuming that measurements are available in the region of interest. Although derived for Switzerland in this study, a gust factor of 1.67 is consistent with existing estimates for complex terrain (Jungo et al., 2002; Ágústsson and Ólafsson, 2004; Heneka et al., 2006; Fovell and Cao, 2014).
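Three of the four schemes reduce to simple arithmetic on model output and can be sketched as below. This is an illustration, not the code used in the study: the function names are hypothetical, and the WPD form assumed here is ffx10 = ff10 + (ffPBL − ff10)(1 − hPBL/2000 m), with the boundary-layer depth capped at 1000 m as described in the text. The BRA scheme is omitted because it requires vertical profiles.

```python
def gust_wpd(ff10, ff_pbl, h_pbl):
    """WPD-type gust (assumed form): mix down part of the excess PBL-top wind.

    ff10: 10 m wind speed (m/s); ff_pbl: wind speed at the PBL top (m/s);
    h_pbl: PBL depth (m), capped at 1000 m so that h_pbl / 2000 <= 0.5.
    """
    h_capped = min(h_pbl, 1000.0)
    return ff10 + (ff_pbl - ff10) * (1.0 - h_capped / 2000.0)


def gust_cos(ff10, u_star):
    """COS-type gust: sustained wind plus turbulent part 3 * 2.4 * u_star."""
    return ff10 + 3.0 * 2.4 * u_star


def gust_gfc(ff10, gust_factor):
    """GFC gust: sustained wind multiplied by a constant gust factor."""
    return gust_factor * ff10


# Region-averaged gust factors quoted in the text; their mean gives 1.67.
REGION_GUST_FACTORS = {"mountain": 2.02, "valley": 1.52, "plateau": 1.48}
GUST_FACTOR = sum(REGION_GUST_FACTORS.values()) / len(REGION_GUST_FACTORS)

# Hypothetical inputs: 10 m/s at 10 m, 30 m/s at the PBL top.
deep_pbl = gust_wpd(10.0, 30.0, 1500.0)     # depth capped at 1000 m
shallow_pbl = gust_wpd(10.0, 30.0, 400.0)
cos_gust = gust_cos(10.0, 0.8)
gfc_gust = gust_gfc(10.0, GUST_FACTOR)
```

Averaging the three regional factors, rather than all 31 stations, keeps the many low-elevation stations from dominating the constant gust factor.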

Dynamical downscaling of windstorms based on 20CR
3.1. Downscaled ensemble mean and members in four windstorm cases
Before we evaluate the WGPs over Switzerland in Section 4, we analyse simulated sustained winds in the downscaled 20CR ensemble mean and members, based on two recent (1999, 1982) and two historical (1919, 1935) windstorms at six selected locations (two from each sub-region). The westerly (1999, 1935) and southerly (i.e. foehn; 1982, 1919) windstorms were quite different in nature. The passage of the westerly windstorm Lothar (1999) across Switzerland was relatively short and highly intense (WSL, 2001; Bründl and Rickli, 2002; Wernli et al., 2002; Goyette et al., 2003). This resulted in marked peaks of storminess at the selected plateau locations (Fig. 2; similar in Supplementary Fig. 3 for the 1935 windstorm), and more complicated flow patterns at the mountain and valley locations. In contrast, the high-impact foehn storms in 1982 and 1919 were characterised by a persistent, strong southerly flow over the Alps lasting for 2 days, including a short phase of very strong winds over the Swiss Plateau (Fig. 3).
Focusing on the downscaled ensemble mean, these specific patterns are largely reflected in the temporal evolution of the simulated sustained wind. The observed peaks are well (quite well) reproduced in intensity at the plateau (mountain and valley) locations with deviations of ±5 m s⁻¹ (±10 m s⁻¹) and mostly well in timing (±3 h, closer at the plateau locations). The flow is also remarkably well captured for the Kloten location during the historical windstorm in 1935. In contrast, timing and intensity of peak sustained winds are not well simulated at the high-elevation Jungfrau location during the second phase of the 1999 windstorm (Fig. 2). Comparable flaws in Goyette et al. (2003) for a recent and Stucki et al. (2015) for a historical windstorm in Switzerland suggest that this might partly be explained by smoothed terrain and non-resolved local topography in the 3-km model.
Despite the differing initialisation time of the downscaling (for reasons of computer resources), the bulk of ensemble members reproduce similar temporal evolutions of sustained wind speeds, and the ensemble mean runs mostly within the bulk of the downscaled ensemble members. Interestingly, the simulated peak sustained wind in the downscaled ensemble mean mostly represents a high (0.9) decile of the downscaled ensemble for the 1999 and 1935 cases (Supplementary Fig. 5). In addition, the comparison of 20CR WRF ensemble mean and members with the independent ERAi WRF data (Figs. 2 and 3) shows no substantial difference concerning the temporal behaviour and the peaks at the different locations.

Simulated peak sustained winds for 14 recent windstorms
For the evaluation of peak sustained winds in 14 windstorms between 1993 and 2011, we restrict the analysis to the downscaled 20CR ensemble mean, and we use a pooled sample consisting of the peak values during 14 windstorms at 63 locations. By this pooling of multiple windstorms, potential phase shifts could appear between weaker and stronger windstorms, causing artificial enhancements of correlation, for instance. Some of the evaluation scores may therefore be (slightly) lower than shown in the following analysis (see Supplementary Fig. 6). However, these effects are very small to acceptable. Wind roses of the 20CR WRF output compared to SMNm (Fig. 4a and b) show similar distributions of wind velocities and associated directions. Although southwesterly and northwesterly flow prevails at the expense of westerly flow, the observed flow patterns are generally well reproduced.
Peak sustained winds are overestimated on average in the 20CR WRF output by 2.6 m s⁻¹ (Fig. 4a, b, and c; Table 1). The overestimation is more pronounced for valley and less for mountain locations, which suggests that the overestimation of simulated wind speed is driven by unresolved topography in the model. The variability in the 20CR WRF output is substantially reduced with respect to SMNm, except for mountain locations (Fig. 4d). The Pearson's correlation coefficient between SMNm and the peak sustained wind in [...] The skill score $SS_{MSD} = 1 - \mathrm{MSD}/\mathrm{MSD}_{clim}$, with MSD being the mean squared difference between simulation and observations, illustrates the added value from the simulation of peak sustained wind with respect to the sample climatology $\mathrm{MSD}_{clim}$, which is given here by the average of the squared differences from the mean observation, $s_o^2$. Positive scores between 0 and 1 mean that the simulation is skilful (von Storch and Zwiers, 2003; Wilks, 2006). Low positive skill (0.1) is found for the plateau locations, however only after calculation of the MSD skill score with simulated peak sustained winds from which the additive bias towards SMNm was subtracted.
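The performance and skill measures used here can be sketched in a few lines. This is a minimal illustration with hypothetical values, assuming the skill score takes the standard reduction-of-variance form 1 − MSD/MSD_clim and that population (not sample) standard deviations are used; the function names are introduced for the example.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def additive_bias(sim, obs):
    """Mean simulated value minus mean observed value."""
    return mean(sim) - mean(obs)


def rmsd(sim, obs):
    """Root mean squared difference between simulation and observations."""
    return math.sqrt(mean([(s - o) ** 2 for s, o in zip(sim, obs)]))


def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))


def pearson_r(sim, obs):
    ms, mo = mean(sim), mean(obs)
    cov = mean([(s - ms) * (o - mo) for s, o in zip(sim, obs)])
    return cov / (std(sim) * std(obs))


def msd_skill_score(sim, obs, remove_bias=False):
    """1 - MSD / MSD_clim; optionally subtract the additive bias first."""
    if remove_bias:
        bias = additive_bias(sim, obs)
        sim = [s - bias for s in sim]
    msd = mean([(s - o) ** 2 for s, o in zip(sim, obs)])
    mo = mean(obs)
    msd_clim = mean([(o - mo) ** 2 for o in obs])
    return 1.0 - msd / msd_clim


# Hypothetical peak sustained winds (m/s) at four locations.
obs = [10.0, 14.0, 18.0, 22.0]
sim = [13.0, 16.0, 19.0, 22.0]

bias = additive_bias(sim, obs)
std_ratio = std(sim) / std(obs)
skill_raw = msd_skill_score(sim, obs)
skill_debiased = msd_skill_score(sim, obs, remove_bias=True)
```

In this toy sample the simulation is overestimated and its variability is reduced, and the skill score rises once the additive bias is subtracted, which mirrors the order of operations described for the plateau locations.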
Comparisons with an independent ERAi WRF configuration (Gómez-Navarro et al., 2015; see Table 1 and Supplementary Fig. 6) show that the correlation between simulation and observations in ERAi WRF is similar to 20CR WRF for the total of all stations (0.48) and mountain locations (0.27). It is higher for valley (0.38) and lower for plateau locations (0.26). Additive bias of ERAi WRF compared with SMNm (0.2 m s⁻¹) is lower than for 20CR WRF compared with SMNm. RMSD is almost equal to 20CR WRF output (approx. 6.1 m s⁻¹), and the variability is rather overestimated (ratio of standard deviations 1.19). Biases in the WRF model, like the general overestimation of mean sustained winds, have been described in previous studies (Horvath et al., 2012). To reduce this systematic bias, Gómez-Navarro et al. (2015) selected a PBL scheme that was specifically developed to reduce such an overestimation. One side effect of this choice is the lower performance of ERAi WRF output compared with 20CR WRF for plateau locations. In contrast, the better performance of ERAi WRF for valley locations may largely be attributable to the higher resolution (2-km grid in ERAi WRF vs. 3-km in 20CR WRF). The subtle overall improvement from 3- to 2-km grid sizes and the substantial improvement for valley locations are in agreement with previous studies pointing out the importance of the high spatial resolution for accurate simulations of wind speed in complex topography (e.g. Goyette et al., 2003; Gómez-Navarro et al., 2015, for Switzerland).
In summary, 20CR WRF is able to realistically reproduce wind variability in terms of wind directions and peak sustained wind speeds. The performance measures are comparatively high for plateau locations and gradually lower for mountain and valley locations. Overall, comparable results are found for the independent ERAi WRF configuration, and many of the differences to 20CR WRF are arguably attributable to slightly different model configurations. The analysis identifies the 20CR ensemble mean as a solid driving data set and our 20CR WRF configuration as an efficient option for downscaling. However, the strengths and deficiencies identified in the simulated sustained wind have ramifications for the simulation of peak gusts, as the sustained wind component plays an important role in all four WGP formulations (see Section 4).

Evaluation of the WGPs
4.1. WGPs applied to four windstorm cases (1999, 1982, 1935, and 1919)
Similar to the evaluation of the simulated sustained wind (Section 3.1), the analyses of the simulated gusts based on the four parameterisations show that the temporal evolution is largely in accordance with the observations (Figs. 5 and 6, Supplementary Figs. 7 and 8). The four WGPs coherently produce similar variability in gustiness over time, with occasional outliers. The observed timing of the peak gusts is mostly well captured (±3 h, even closer at the plateau locations) for the two recent windstorms, and the magnitude of peak gusts is well simulated for the selected plateau locations (±5 m s⁻¹). This is less the case for the mountain and valley locations, where magnitudes and timing of the strongest winds are better simulated in case of the very strong foehn episode in 1982 than for windstorm Lothar in 1999. Apparently (for this small windstorm sample), strong and persistent pressure gradients across the Alpine barrier are less challenging for the dynamical downscaling approach than west–east pressure gradients along the Alpine bow that lead to complex and more short-lived gustiness that is hard to simulate (cf. Horvath et al., 2011, for bora cross-mountain flow).
As for the sustained wind, we assess the impact of using the 20CR mean or members as driving data set. The analysis shows that the ensemble range of simulated peaks does incorporate the ensemble mean in our analyses (Fig. 5 and Supplementary Fig. 7, see also Fig. 2 and Supplementary Fig. 5 for sustained wind). However, the strongest and weakest downscaled ensemble members often do not (Fig. 6 and Supplementary Fig. 8, see also Fig. 3 for sustained wind). More importantly, the ensemble range of simulated peaks does not necessarily comprise the observed peak gust. Furthermore, selecting the strongest (weakest) member in 20CR does not necessarily result in strongest (weakest) peak gusts in the 3-km simulation. Supplementary Fig. 9 depicts the transformation of initial wind forces in 20CR to simulated peak gusts, that is, the relationship between the near-surface wind speed over Switzerland in a specific 20CR ensemble member and the median simulated peak gust over the Swiss Plateau (aggregated by the median peak gusts at the plateau locations) in the same, downscaled ensemble member. For windstorm Lothar in 1999 (the westerly windstorm in 1935), the correlation coefficient is 0.27 (0.36) with a p-value of 0.06 (0.01). Hence, downscaling of physically contrasting ensemble members may potentially inform about the consistency of the simulated gustiness, although the scores do not give clear guidance for the choice of an ensemble subset. Further studies, which are beyond the scope of this paper, could possibly lead to an enhanced sampling strategy.
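The reported p-values can be related to the correlation coefficients with a short calculation. The sketch below assumes a two-sided Student t test on a Pearson correlation and a sample size equal to the number of downscaled members (51 for 1999, 55 for 1935); the text does not state which test was applied, so both are assumptions, and the function names are hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic of a Pearson correlation r from n pairs (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log(1.0 + x * x / df))


def two_sided_p(t, df, n_steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    h = upper / n_steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return 1.0 - 2.0 * area


# Correlations from the text; sample sizes are the assumed member counts.
p_1999 = two_sided_p(t_statistic(0.27, 51), df=49)
p_1935 = two_sided_p(t_statistic(0.36, 55), df=53)
```

Under these assumptions the 1999 value falls just above the conventional 0.05 threshold and the 1935 value below 0.01, so the relationship between the driving wind and the downscaled peak gust is weak in both cases even where it is statistically detectable.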
In summary, all four WGPs are able to reproduce the observed temporal evolution of gustiness, and differences among the WGPs are relatively small. Regarding the selection of 20CR ensemble mean versus individual members, we find on the one hand that the downscaled ensemble mean represents well the information of the downscaled ensemble members, whereas the ensemble ranges of simulated peak gusts do not necessarily contain the observed peak gusts. Thus, the additional information from downscaling all ensemble members is limited. This may be relevant for downscaling projects with restricted computational resources. On the other hand, a strong agreement among the 20CR ensemble members (Supplementary Figs. 1 and 2) at the initial, synoptic scale, does not result in unanimously simulated peak winds when applying dynamical downscaling and different gust parameterisations. In fact, the downscaled peak gusts can be quite diverse when using different ensemble members. Depending on the application, it may therefore be reasonable to use a larger subset of ensemble members, if not all, to assess potential effects from such modulated flow patterns (see also Welker et al., 2016).

Simulated peak gusts for 14 recent windstorms
In the following, we evaluate the performance and skill of the four WGPs for simulating peak gusts by comparing them to observations (SMNx) during the 14 recent windstorms, the same sample as in Section 3.2. Overall, performance measures show small differences across the WGPs (Figs. 7 and 8,  in the same range for the subset of plateau locations, larger ( (3.80 to (8.4 m s (1 ) for mountain locations, and there are both negative and positive biases at valley locations. Gradually decreasing performance from plateau to mountain and valley locations is found, the same as for peak sustained wind. After subtraction of additive bias, the skill scores based on MSD (reduction of variance) show larger positive skill of the simulated peak gusts (between 0.2 and 0.3) compared with peak sustained winds (approx. 0.1) for the plateau locations. However, we cannot identify a physical process explaining this increase in skill. The largest differences among the WGPs become evident in the reproduction of variability: It is substantially reduced in the WPD parameterisation, less in the COS, BRA, and GFC parameterisations. This reduction in variability is slightly larger compared with peak sustained winds (e.g. approx. 0.6 vs. approx. 0.8 for the WPD). The influence of the sustained versus turbulent wind components in the WGP formulations is investigated by  simulating wind gusts based on perfect sustained winds. For this, we prescribe the observed peak sustained wind speed by SMNm and only simulate the turbulent part of the WGP. This is done exemplarily for GFC by simple multiplication of SMNm with the gust factor of 1.67, and for COS by adding the term 2.4*3*u * , with u * at the nearest grid point in the model and at the time step of peak wind speed in the model. As expected, prescribing perfect sustained winds results in even larger negative biases. 
For the COS (GFC) parameterisation, wind gusts are lower by 0.2 m s−1 (0.3 m s−1) for mountain, 5.0 m s−1 (3.7 m s−1) for valley, and 3.6 m s−1 (2.7 m s−1) for plateau locations. From these considerations, it follows that the turbulent part in these WGP formulations is substantially underestimated. Of course, prescribing perfect sustained winds leads to an improvement in some of the performance and skill measures (Fig. 8). These expected improvements are particularly large in terms of correlation (mostly increasing to 0.75 or above) and RMSD (decreasing to around 5 m s−1). Hence, the specific differences among the WGPs are small compared with the biases inherited from the sustained winds. Some spatial properties of the WGPs applied to 20CR WRF become apparent when mapping the performance measures for each location on the Swiss topography (Fig. 9). Generally, performance over most of the Swiss Plateau and parts of the Jura range (cf. Fig. 1) is very good, indicated by additive biases within ±5 m s−1 (Fig. 9a), Pearson's correlation coefficients around 0.8 (Fig. 9b) and RMSDs near 5 m s−1 (Fig. 9c). In contrast, the WGPs are less reliable along the (northern) flanks of the Alps, where negative biases are larger and correlations weaker. This region is sensitive to smaller-scale windstorm dynamics, which depend on fine-scale topographical features that are not fully resolved with model grid sizes of 3 km. The performance for very complex and exposed locations in the southeastern half of Switzerland is mixed.
[Figure caption: As Fig. 4, but comparing observed SMNx with simulated peak gusts for the WGPs WPD (red symbols), COS (dark green symbols), BRA (blue symbols), and GFC (yellow symbols). The star symbols indicate parameterisations based on the sustained wind component provided by SMNm, that is, the yellow (dark green) star is the GFC (COS) parameterisation applied to 20CR WRF output.]
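For readers who wish to reproduce measures of the kind mapped in Fig. 9, the following minimal sketch computes additive bias, RMSD and Pearson's correlation coefficient for paired simulated and observed peak gusts. The ratio of standard deviations is included as one possible measure of variability; that choice is an assumption, since the exact definition is not restated in this section. All function names and sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def additive_bias(sim, obs):
    """Mean difference simulated minus observed (negative = underestimation)."""
    return _mean([s - o for s, o in zip(sim, obs)])


def rmsd(sim, obs):
    """Root-mean-square difference between simulated and observed values."""
    return math.sqrt(_mean([(s - o) ** 2 for s, o in zip(sim, obs)]))


def pearson_r(sim, obs):
    """Pearson's linear correlation coefficient."""
    ms, mo = _mean(sim), _mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    ss = sum((s - ms) ** 2 for s in sim)
    so = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(ss * so)


def std_ratio(sim, obs):
    """Ratio of simulated to observed standard deviation (assumed variability measure)."""
    ms, mo = _mean(sim), _mean(obs)
    ss = sum((s - ms) ** 2 for s in sim)
    so = sum((o - mo) ** 2 for o in obs)
    return math.sqrt(ss / so)


# Hypothetical peak gusts (m/s), NOT taken from the study.
sim = [10.0, 12.0, 14.0, 16.0]
obs = [12.0, 13.0, 17.0, 18.0]

bias = additive_bias(sim, obs)  # -2.0, i.e. gusts are underestimated
error = rmsd(sim, obs)
corr = pearson_r(sim, obs)
ratio = std_ratio(sim, obs)     # below 1, i.e. reduced variability
```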
In summary, no clear single best WGP has emerged from our extensive comparison, although differences are found in the performance and skill measures regarding the subregions. The parameterised peak gusts inherit the spatial differences of performance from the peak sustained wind in the WRF outputs. All four WGP schemes transform overestimated peak sustained winds into underestimated peak gusts, indicating that the turbulent part of the WGP is considerably underrepresented across all schemes. The underestimation of peak gusts over mountainous terrain has been highlighted in previous studies (Goyette et al., 2001; Belušić and Klaić, 2004; Ágústsson and Ólafsson, 2009), as well as the critical role of accurately simulated mean wind speeds for good gust estimates (Goyette et al., 2003; Belušić and Klaić, 2004; Ágústsson and Ólafsson, 2009). Finally, it is remarkable that the constant-gust-factor approach is hardly outperformed and is close to an empirical, uniform gust factor of 1.7 found for complex Californian terrain (Fovell and Cao, 2014). However, further investigations are necessary to evaluate whether such a constant factor performs similarly well in other mountainous regions.
The underestimation of the turbulent part of the WGP might differ depending on the turbulence/PBL height parameterisation used for the WRF simulation. We have explored the relationship between peak wind gusts and PBL height, since atmospheric stability plays an important role in the formation of strong winds. The WPD and BRA parameterisations account for this by including PBL height. Although sensitivities of simulated sustained wind to the choice of the PBL scheme are addressed in Gómez-Navarro et al. (2015), their conclusions can hardly be extended to wind gusts. In this study, which focuses on wind gusts, we have used the 20CR WRF simulations with the PBL effects being parameterised using the Mellor-Yamada scheme (Table S1). Apart from phenomena such as nocturnal low-level jets, it is generally expected that a deep PBL leads to strong winds since it extends the potential vertical range for down-mixing of high momentum. At 63 locations during the 14 recent windstorms, we have found a positive, although not statistically significant, correlation between the height of the PBL and simulated peak sustained winds (not shown). The same is found for gusts obtained from the BRA parameterisation. The weak correlations might be partly attributed to uncertainties related to the calculation of the PBL height over complex terrain. Indeed, we tested different definitions of PBL height (e.g., raw PBL height calculated by the scheme, temperature gradient in the PBL and stability information from the turbulence scheme) and the results differed strongly, indicating that the calculation and interpretation of this parameter are problematic, especially in areas of complex topography such as the study region. Further, the sensitivity of wind gusts to the daily cycle was investigated.
Although the PBL height differs significantly between day and night, we could not identify a robust relationship, across parameterisations, between the intensity of peak sustained winds and gusts and the time of day.
The underestimation of the turbulent part of the WGP might furthermore come from the surface drag component used for the WRF simulation. Roughness length in the WRF model is defined depending on land use. Surface roughness is known to have an important influence on surface winds and gusts. In complex terrain, the surface wind field is additionally affected by orography, making the roughness length less dominant than in simple terrain. As another example for potential enhancements, the two factors (3 and 2.4) of the turbulent part in the COS parameterisation [Eq.
The analyses suggest that the WGPs should ideally be recalibrated for specific model configurations and areas of interest in a way that reduces the underestimation we identify in the parameterised component of the wind gust.

Summary and conclusions
In this study, we have employed dynamical downscaling from 20CR data using the WRF model to obtain event-based maxima of sustained wind speeds (termed peak sustained winds) on a 3-km horizontal grid over Switzerland. Subsequently, peak gusts have been derived using four WGPs. We have tested intermediate and final products from this modelling chain against observations, focusing on a small set of two recent (after 1980) and two historical (early 20th century) windstorms and on a larger set of 14 recent (1993 and afterwards) windstorms, for which wind measurements are available for 63 weather stations. By means of several performance and skill measures, we have assessed the performance of simulated (peak) sustained winds and gusts at subsets of these locations, that is, at the Swiss Plateau, at Alpine mountain, and at valley locations.
The comparison of the simulated with the observed peak sustained wind speeds shows that, in general, the sustained wind speeds and wind directions are well reproduced. However, the sustained wind speeds are mostly overestimated with the 20CR WRF downscaling configuration. The variability in 20CR WRF is reduced and the linear correlation is overall moderate (Pearson's correlation coefficient is 0.40 overall, 0.47 for locations on the Swiss Plateau, and 0.3–0.2 for Alpine mountain and valley locations). Regionally, most performance and skill measures are best for locations on the Swiss Plateau and inferior for Alpine mountain and valley locations. An independent WRF configuration (Gómez-Navarro et al., 2015), driven by the ERA-Interim reanalysis and with a grid size of 2 km, produces overall comparable results, indicating the capability of the 20CR data set to drive regional models in dynamical downscaling approaches. Indeed, differences from 20CR WRF, such as the better performance of ERAi WRF at valley locations, where the spatial resolution becomes a limiting factor for model performance, are arguably attributable to the model configurations.
The evaluation of the simulated (peak) gust speeds shows that, in general, all four WGPs reproduce the observed temporal evolution of gustiness, although the timing and magnitude of the peak gusts are not always captured. We find very good performance of the WGPs over most of the Swiss Plateau and the Jura range locations, and gradually decreasing performance from plateau to mountain and valley locations, which is the same spatial pattern as for peak sustained wind. Overall, performance measures show small differences among the WGPs. None of the WGPs stands out as single best in all cases, and the specific differences among the WGPs are arguably smaller than the biases inherited from the sustained winds.
All four WGP schemes transform overestimated peak sustained winds into underestimated peak gusts, indicating that the turbulent part of the WGP is considerably underrepresented across all schemes. While the mechanisms of this braking effect in the BRA parameterisation are hard to identify due to the physics-based approach, the effect is more easily attributable to the dependence on empirically derived constants in the WPD (constant divisor of 2000 m), COS (constant factors 2.4 × 3), and GFC (constant factor of 1.67) parameterisations. These findings call for a careful recalibration of the WGPs for specific applications and areas of interest, for example, by correcting the additive bias or by tuning the constants in the parameterisation formulae to specific regions.
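As an illustration of what such a recalibration could look like, the sketch below tunes a constant gust factor by least squares and, alternatively, removes the additive bias from simulated gusts. It is a hypothetical example under simple assumptions (a purely multiplicative gust model fitted through the origin, hypothetical function names and data) and is not the calibration procedure of this study.

```python
def fit_gust_factor(sustained, observed_gust):
    """Least-squares gust factor k minimising sum((k * sustained - gust)**2)."""
    num = sum(s * g for s, g in zip(sustained, observed_gust))
    den = sum(s * s for s in sustained)
    return num / den


def remove_additive_bias(simulated, observed):
    """Shift simulated gusts so that their mean matches the observed mean."""
    bias = sum(simulated) / len(simulated) - sum(observed) / len(observed)
    return [s - bias for s in simulated]


# Hypothetical regional sample (m/s), NOT taken from the study.
sustained = [10.0, 20.0, 15.0]
observed_gust = [18.0, 36.0, 27.0]

k_regional = fit_gust_factor(sustained, observed_gust)  # about 1.8 for this sample
corrected = remove_additive_bias([10.0, 12.0], [13.0, 15.0])  # [13.0, 15.0]
```

Fitting the factor separately for plateau, mountain and valley locations would be one way to tune the constant to specific regions, at the cost of requiring gust observations in each region.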
Finally, we cannot make unambiguous recommendations regarding the selection of the 20CR ensemble mean versus individual members for downscaling. On the one hand, we find that downscaling the full set of ensemble members does not reliably provide estimates of potential ranges of peak gusts, and selecting the strongest (weakest) member in 20CR does not necessarily result in the strongest (weakest) peak gusts in the 3-km simulation. We conclude that there is limited gain of information from downscaling the full set of 20CR ensemble members. As a guideline, we propose that downscaling projects with restricted computational resources may choose to downscale the 20CR ensemble mean alone. On the other hand, the results show that the simulated peak gusts can differ substantially among the downscaled ensemble members. Depending on the application, using all or a large subset of members may therefore be reasonable to assess potential effects on gustiness from slightly modulated flow patterns in the individual members. As a compromise, downscaling a small subset of ensemble members rather than the ensemble mean alone may inform about the consistency of the simulated wind field, for example, by downscaling ensemble members with large contrasts in pressure gradients and wind direction over the Alps.
In this study, we focus on a region which is well represented in the 20CR, and the evaluations are based on a comparably short period for which wind and gust observations at high resolution are available. Hence, the downscaling and parameterisation procedure should be evaluated for different regions, with higher spatial resolution and statistical adaptations, and should ideally include observations from time periods before 1980. Nevertheless, our approach of dynamical downscaling of the 20CR input is of particular interest to overcome some of the temporal limitations when downscaling early periods not covered by other reanalysis products. Moreover, it offers a complementary method for wind hazard and risk assessments in the fields of engineering, spatial planning, or insurance. It has already been the basis for assessments of potential windstorm-related losses in Switzerland (Welker et al., 2016) and for a new wind hazard map for Switzerland estimating wind gust speeds with respect to a range of return periods (FOEN, 2016).