Resolution, physics and atmosphere–ocean interaction – How do they influence climate model representation of Euro-Atlantic atmospheric blocking?

Abstract Atmospheric blocking events are known to locally explain a large part of climate variability. However, despite their relevance, many current climate models still struggle to represent the observed blocking statistics. In this study, simulations of the global climate model EC-Earth are analysed with respect to atmospheric blocking. Seventeen simulations map the uncertainty space defined by three model characteristics: atmospheric resolution, physical parameterization and complexity of atmosphere–ocean interaction, namely an atmosphere coupled to an ocean model or forced by surface data. Representation of the real-world statistics is obtained from the reanalyses ERA-20C, JRA-55 and ERA-Interim, which agree on Northern Hemisphere blocking characteristics. Blocking events are detected on a central blocking latitude which is individually determined for each simulation. The frequency of blocking events tends to be underestimated relative to ERA-Interim over the Atlantic and western Eurasia in winter and overestimated during spring months. However, only a few model setups show statistically significant differences compared to ERA-Interim, which can be explained by the large inter-annual variability of blocking. Results indicate slightly larger biases relative to ERA-Interim in coupled than in atmosphere-only models, but differences between the two are not statistically significant. Although some resolution dependence is present in spring, the signal is weak and only statistically significant if the physical parameterizations of the model are improved simultaneously. Winter blocking is relatively more sensitive to physical parameterizations, and this signal is robust in both atmosphere-only and coupled simulations, although stronger in the latter. Overall, the model can capture blocking frequency well despite biases in representing the mean state of geopotential height over this area. Blocking signatures of geopotential height are represented more similarly to ERA-Interim, and only weak sensitivities to model characteristics remain.


Introduction
Atmospheric blocking, both the phenomenon itself and its representation in numerical climate models, has been studied for almost seventy years (Elliott and Smith, 1949; Rex, 1950), especially in the Northern Hemisphere. Events occur most frequently at, and downstream of, the jet maxima (Rex, 1950; Croci-Maspoli et al., 2007). Due to spatial and temporal persistence, blocking events explain a large part of the spatial and temporal climate variability (Dawson and Palmer, 2015). Therefore, the over Europe, indicating that improvements in blocking representation benefit other parts of the climate system simultaneously.
Even though many theories on the development and maintenance of blocking exist (e.g. Charney and DeVore, 1979; Frederiksen, 1982; Shutts, 1983; Yamazaki and Itoh, 2012), there is no consensus on one unified theory. Expanding the knowledge on which processes are most important for modelling blocking successfully can at the same time identify the main processes explaining the life cycle of blocking. Only recently, substantial improvements, even eliminating frequency biases, have been achieved. One of the most studied model characteristics known to impact atmospheric blocking representation is the atmospheric resolution, both in horizontal and vertical directions (Anstey et al., 2013; Schiemann et al., 2017). Horizontally, it determines whether small-scale eddies vital for the establishment and maintenance of blocking anticyclones are resolved (Shutts, 1983; Yamazaki and Itoh, 2012). In addition, horizontal grid spacing influences blocking via the representation of orography; specifically, most of the blocking enhancement on finer grids can be explained by higher orography (Berckmans et al., 2013). The underlying mechanism is that increased maximum elevation at higher horizontal resolution leads to more intensely deflected atmospheric flow, a meridionalization of the jet, which enhances Rossby wave breaking and thus increases the blocking occurrence. Improved representation of stratospheric processes and stratosphere-troposphere coupling impacts tropospheric circulation and enhances atmospheric blocking occurrence (Scaife and Knight, 2008; Woollings et al., 2010). On the other hand, Matsueda et al. (2009) and Berckmans et al. (2013) noted blocking overestimation in models, which indicates that resolution alone does not determine the accuracy of blocking representation.
Several studies focused on the influence of parameterization of non-resolved dynamical and physical processes on the blocking performance (Jung et al., 2010; Berner et al., 2012; Pfahl et al., 2015). Jung et al. (2010) reported large improvements of both Pacific and Atlantic blocking in atmosphere-only simulations after the physics package in the European Centre for Medium-Range Weather Forecasts (ECMWF) model was updated. Changes in blocking performance are attributed to improvements in the convection schemes and turbulent orographic form drag, but their relative importance is not analysed. Recent studies applying changes in surface drag show consistent effects on blocking (Lindvall et al., 2016; Pithan et al., 2016).
Finally, the observed climate system is not solely determined by the atmosphere but also driven by the coupling of the different subsystems, which needs to be correctly represented in models. Coupled atmosphere and ocean models often deteriorate the blocking statistics compared to atmosphere-only simulations forced by observed SST and sea ice fields. This has been related to SST biases in the coupled models upstream of blocking (Ringer et al., 2006; Scaife et al., 2011; Neale et al., 2013).
The goal of this work is to study the influence of three model characteristics (atmospheric resolution, physical parameterization suite and type of atmosphere-ocean interaction) on blocking events in simulations by the climate model EC-Earth (Hazeleger, 2012). Further, we analyse the relative importance of the model characteristics for successful blocking representation relative to reanalyses. The focus is on the Euro-Atlantic region (50° W to 120° E) where one of the two blocking occurrence maxima of the Northern Hemisphere is located. This maximum peaks during winter and is analysed here for winter and spring.

Model
The global climate model used in this study is EC-Earth, developed by a European community of academic institutions (Hazeleger, 2012). The atmospheric part of EC-Earth is based on the Integrated Forecasting System (IFS) of the ECMWF. The atmosphere, land (H-TESSEL, Balsamo et al., 2009), ocean (NEMO, Madec, 2008) and sea-ice (LIM, Vancoppenolle et al., 2009) models communicate via a coupling software (OASIS, Valcke, 2006). Two versions of EC-Earth are used in a number of configurations of horizontal and vertical atmospheric resolution based on either prescribed SST or a coupled atmosphere-ocean system. EC-Earth Version 2.3 was developed as a contribution to CMIP5. It is based on the IFS model cycle 31r1, NEMO version 2.0 and LIM2, coupled by OASIS version 3. An evaluation of the atmosphere and ocean is given by Hazeleger (2012) and Sterl et al. (2011), respectively. Koenigk et al. (2012) present an overview of the model performance in the Arctic system. EC-Earth Version 3.01 (Koenigk and Brodeau, 2017) is an interim version prepared for use in CMIP6 and based on the newer IFS cycle 36r1. Changes relative to Version 2.3 were, among others, made to parameterizations of convection, radiation, gravity wave drag and microphysics, now incorporating prognostic ice. The coupled simulations are run with NEMO version 3.3.1 coupled to LIM3. EC-Earth Version 3.01 results are from a non-tuned setup. After a spin-up period of 210 years, a deep ocean drift of 0.05 °C per century remains (Davini et al., 2014a, 2014b). Perennial present day simulations exhibit a global mean cold bias of roughly 1 °C and a radiative imbalance between the top of the atmosphere and the surface of −2.5 W m⁻². Partially compensated by a large positive P-E bias of 0.3 mm per day, the atmospheric heat loss does not lead to a surface cooling trend. In addition, the model produces too large a September sea ice extent. We assume that the discussed biases do not substantially influence the atmospheric blocking statistics presented here.

Model setup and experiments
A set of 17 experiments using EC-Earth Versions 2.3 and 3.01 in coupled and atmosphere-only runs of different atmospheric resolutions are analysed for the years 1979-2005. The simulations are listed in Table 1 with their settings of the three model characteristics: type of atmosphere-ocean interaction, physical parameterization and resolution. Coupled simulations are labelled CM and atmosphere-only simulations AM, followed by the version number of the physical parameterizations (v2 for v2.3 or v3 for v3.01). Suffixes encode the horizontal resolution; in particular -l, -m and -h correspond to the low (T159), medium (T255) and high (T511) resolution. Multiple h's denote the even finer horizontal grids T799 (hh) and T1279 (hhh). Grid spacing at the equator ranges from ∼125 km (T159) to ∼16 km (T1279) and thus spans typical climate and weather model resolutions. Typical time steps vary from 60 min for a spectral horizontal resolution of T159 to 10 min for T1279. Initialized from different states of the pre-industrial control simulation, coupled runs are calculated as ensembles. For setups CMv3-l, CMv3-m and CMv2-l two, five and three realizations are available, respectively. Ocean resolution is 1° in all simulations, but the number of levels has been increased from 42 (v2) to 46 (v3). Lower boundary forcing of the atmosphere-only simulations differs between v2, applying the official forcing for SST, sea-ice extent and temperature following the setup of the Atmospheric Model Intercomparison Project (AMIP), and v3, where ERA-Interim SST and sea ice conditions are used and sea-ice temperatures are calculated by the atmospheric part of the model. Finally, the height of the uppermost model level is increased from 10 hPa (v2.3, 62 levels) to 0.02 hPa (v3.01, 91 levels). The 29 additional levels are added in the stratosphere above the old vertical grid and do not change the vertical resolution in the troposphere. Of v3.01 only one setup, CMv3-l, is run with 62 levels (see Table 1).
For comparison, the ERA-Interim reanalysis (ERA-I, Dee et al., 2011), the twentieth century reanalysis (ERA-20C, Poli et al., 2013) and the Japanese reanalysis (JRA-55, Kobayashi et al., 2015) are analysed over the same period of historical simulation, and ERA-20C is furthermore analysed over earlier periods of the twentieth century.

Blocking detection method
In this study, a one-dimensional index based on Tibaldi and Molteni (1990) is extended twofold. First, it is based on a central blocking latitude ϕ_c varying with day of year and longitude (Pelly and Hoskins, 2003; Barnes et al., 2012). In addition, results are limited to synoptic events following Schalge et al. (2011) and Barnes et al. (2012).
The original algorithm after Tibaldi and Molteni (1990) is based on determining the southern (GHGS) and northern (GHGN) gradients of the 500 hPa geopotential height field Z at time t and longitude λ around a potential blocking latitude ϕ_0 as follows:

$$\mathrm{GHGS}(\lambda,t)=\frac{Z(\lambda,\phi_0,t)-Z(\lambda,\phi_S,t)}{\phi_0-\phi_S},\qquad \mathrm{GHGN}(\lambda,t)=\frac{Z(\lambda,\phi_N,t)-Z(\lambda,\phi_0,t)}{\phi_N-\phi_0}\tag{1}$$

A longitude λ is called blocked at time t if the two conditions GHGS(λ, t) > 0 and GHGN(λ, t) < −10 m/° lat are valid for at least one latitude shift Δ.
Use of the latitude shift Δ allows detection of blocking anticyclones even if they are slightly displaced meridionally from ϕ_0. Here, all model output is interpolated to a common T159 grid. Differences in the blocking frequency for raw and interpolated fields are negligible (not shown). During the blocking detection the values Δ ∈ ±{1.125°, 2.25°, 3.375°} are used. Size selection is done via ϕ_N and ϕ_S; here events of ϕ_N − ϕ_S ≈ 30° latitude are chosen.
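The gradient test described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the study: the function name `blocked_longitudes`, the nearest-grid-point lookup and the symmetric half span of 15° around the shifted latitude are assumptions chosen to match ϕ_N − ϕ_S ≈ 30° and the listed shifts on a 1.125° grid.

```python
import numpy as np

def blocked_longitudes(z, lats, phi_c,
                       shifts=(-3.375, -2.25, -1.125, 1.125, 2.25, 3.375),
                       half_span=15.0, ghgn_max=-10.0):
    """Flag blocked longitudes for one time step (hypothetical helper).

    z     : (nlat, nlon) 500 hPa geopotential height in metres
    lats  : (nlat,) grid latitudes in degrees north
    phi_c : scalar or (nlon,) central blocking latitude in degrees north
    """
    nlon = z.shape[1]
    lon_idx = np.arange(nlon)
    phi_c = np.broadcast_to(np.asarray(phi_c, dtype=float), (nlon,))
    blocked = np.zeros(nlon, dtype=bool)

    def nearest(phi):
        # Index of the grid latitude closest to each requested latitude.
        return np.abs(lats[:, None] - phi[None, :]).argmin(axis=0)

    for delta in shifts:
        phi0 = phi_c + delta
        i0 = nearest(phi0)
        i_n = nearest(phi0 + half_span)   # assumed northern reference latitude
        i_s = nearest(phi0 - half_span)   # assumed southern reference latitude
        z0, zn, zs = z[i0, lon_idx], z[i_n, lon_idx], z[i_s, lon_idx]
        ghgs = (z0 - zs) / (lats[i0] - lats[i_s])
        ghgn = (zn - z0) / (lats[i_n] - lats[i0])
        # Blocked if both conditions hold for at least one latitude shift.
        blocked |= (ghgs > 0.0) & (ghgn < ghgn_max)
    return blocked
```

A purely zonal height field that decreases poleward gives GHGS < 0 everywhere and is never flagged, whereas a ridge centred near ϕ_c reverses the southern gradient and is flagged.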
The central blocking latitude ϕ_0 (Equation (1)) is identified with the latitude of maximum daily variance of the five-day highpass filtered geopotential height at 500 hPa. The advantage of this approach is that the blocking detection can be applied for each data-set at the latitude where the high-frequency variability is largest, thus accounting for the model climate in terms of storm tracks and jet stream position. To ensure that the detection algorithm does not encounter abrupt changes in the central blocking latitude field, ϕ_c is low-pass filtered by retaining the mean, the annual (T = 365.25 days) and semi-annual (T = 0.5 · 365.25 days) cycle and spatially smoothed by a running average over 20° longitude.
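The temporal and spatial smoothing of ϕ_c can be illustrated as follows. This is a sketch under stated assumptions: the harmonics are obtained by a least-squares fit (the text does not say how the cycles are retained), longitude is treated as periodic, and the function names and the 1.125° grid spacing are illustrative.

```python
import numpy as np

def seasonal_lowpass(phi, period=365.25):
    """Keep the mean, annual and semi-annual cycle of phi (nday, nlon).

    Assumption: the retained cycles are estimated by a least-squares
    harmonic fit along the time axis.
    """
    t = np.arange(phi.shape[0])
    w = 2.0 * np.pi * t / period
    design = np.column_stack([
        np.ones_like(w),
        np.cos(w), np.sin(w),           # annual cycle
        np.cos(2 * w), np.sin(2 * w),   # semi-annual cycle
    ])
    coef, *_ = np.linalg.lstsq(design, phi, rcond=None)
    return design @ coef

def smooth_longitude(phi, dlon=1.125, window_deg=20.0):
    """Centred running mean over roughly window_deg of longitude.

    Assumption: the longitude axis is global and periodic, so the window
    wraps around the date line.
    """
    half = int(round(window_deg / dlon)) // 2
    shifted = [np.roll(phi, k, axis=1) for k in range(-half, half + 1)]
    return np.mean(shifted, axis=0)
```

Applying `smooth_longitude(seasonal_lowpass(phi_raw))` to a raw day-of-year by longitude field would then give a ϕ_c field without abrupt jumps in either dimension.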
Table 1. EC-Earth simulations 1-17 and their abbreviations are presented together with information about atmosphere-ocean interaction, physical parameterization and horizontal and vertical atmospheric resolution. Horizontal resolution is given as triangular spectral horizontal truncation and grid spacing at the equator. Coupled simulations were used as three members for CMv2-l, two members for CMv3-l and five for CMv3-m. Note that the number of vertical layers differs, with 62 levels for EC-Earth v2 and 91 for EC-Earth v3 simulations, apart from CMv3-l with 62 levels.

Furthermore, to retain only synoptic events, several threshold criteria are applied to the binary blocking time series in a four-step approach following Schalge et al. (2011) and extended by a criterion from Barnes et al. (2012). First, blocking events that are not more than 10° longitude apart at one time are merged by setting the blocking index to 'blocked' in between. Thereby, the likelihood of detecting omega-shaped blocking structures, of which each individual branch might fail the minimum longitude extent threshold (third criterion), is increased. Second, a quantile filter is applied to eliminate cut-off lows. During a blocking event, geopotential height at the blocking latitude Z(ϕ_0) needs to exceed the Q = 0.7 quantile of the geopotential height distribution calculated over the whole time series individually for each longitude grid point, Z(ϕ_0, λ) > Z_Q(ϕ_0, λ). As a third and fourth criterion, only events extending more than 20° lon and lasting for more than five days are analysed. To ensure that only reasonably persistent and quasi-stationary events are retained, an additional filter is used in extension to Schalge et al. (2011). As in Barnes et al. (2012), a block is perceived as persistent if the centre longitude does not change by more than 45° longitude during its lifetime, independent of the duration of an event. In addition, a blocking anticyclone needs to overlap longitudinally by at least 10° lon with the anticyclone of the previous time step to be counted as the same event. The whole detection algorithm outputs binary information for each time step and longitude grid point, indicating whether a blocking event was detected (1) or not (0).
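The merging step (first criterion) and the minimum-extent filter (third criterion) act on the binary index at a single time step and can be sketched as below. The helper names are hypothetical, and two details are assumptions not specified in the text: distances are measured as the number of grid points times the grid spacing, and the longitude axis is not wrapped.

```python
import numpy as np

def runs(mask):
    """Return (start, stop) index pairs of consecutive True runs (stop exclusive)."""
    padded = np.concatenate([[False], mask, [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))

def merge_and_filter(mask, dlon=1.125, max_gap_deg=10.0, min_extent_deg=20.0):
    """Merge nearby blocked segments, then drop segments that are too narrow."""
    out = np.asarray(mask, dtype=bool).copy()
    segments = runs(out)
    # First criterion: fill gaps of at most max_gap_deg between neighbours.
    for (_, stop0), (start1, _) in zip(segments[:-1], segments[1:]):
        if (start1 - stop0) * dlon <= max_gap_deg:
            out[stop0:start1] = True
    # Third criterion: keep only segments extending more than min_extent_deg.
    for start, stop in runs(out):
        if (stop - start) * dlon <= min_extent_deg:
            out[start:stop] = False
    return out
```

Merging before filtering matters: two branches of an omega-shaped block that are each narrower than 20° can survive only because they are first joined into one wider segment.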

Blocking default analysis and variations
The blocking detection described above is performed on daily mean 500 hPa geopotential height fields between 1979 and 2005. 26 Northern Hemisphere winter (DJF) and spring (MAM) seasons are analysed. The focus is on the Euro-Atlantic region which is divided into the Atlantic (50° W to 0° E), the European continent (0° E to 50° E) and the Asian continent (50° E to 120° E). All fields are interpolated onto the coarsest grid (T159) to facilitate the comparison. The default analysis considers blocking events persistent for more than 5 days and migrating not more than 45° lon during their lifetime. A statistical assessment of blocking frequency variability and model inter-comparison is done by applying a moving block bootstrap (Mudelsee, 2014) to the binary time series of synoptic blocking. Block length is chosen to vary around an arithmetic mean block length of seven days, equal to the mean blocking duration in models. In contrast to the classical bootstrap with a sample length of one, temporal correlation in the time series can be conserved. Applying a mean block length of 10, 15 or 30 days does not change the results considerably (not shown). In addition, spatial correlation is taken into account by bootstrapping all longitudes of the analysis region simultaneously. The sample size from which the bootstrap statistics are calculated is N = 1999. Two blocking time series are estimated to be statistically significantly different, showing a robust signal, if the 95% quantile confidence intervals are not overlapping.
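A block bootstrap of this kind can be sketched as follows. The text states only that block lengths vary around a mean of seven days; drawing them from a geometric distribution and wrapping blocks around the end of the series are assumptions made here for illustration, as are the function names.

```python
import numpy as np

def block_bootstrap_ci(blocked, mean_block=7, n_boot=1999, alpha=0.05, seed=0):
    """Quantile confidence interval of blocking frequency per longitude.

    blocked : (ntime, nlon) binary array (1 = blocked, 0 = not blocked)
    Whole time steps are resampled, so all longitudes are drawn together
    and spatial correlation is preserved.
    """
    rng = np.random.default_rng(seed)
    ntime, nlon = blocked.shape
    freqs = np.empty((n_boot, nlon))
    for b in range(n_boot):
        pieces, drawn = [], 0
        while drawn < ntime:
            length = rng.geometric(1.0 / mean_block)  # assumed length distribution
            start = rng.integers(0, ntime)
            pieces.append((start + np.arange(length)) % ntime)  # circular wrap
            drawn += length
        idx = np.concatenate(pieces)[:ntime]
        freqs[b] = blocked[idx].mean(axis=0)
    lower, upper = np.quantile(freqs, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper

def significantly_different(ci_a, ci_b):
    """True where two (lower, upper) confidence intervals do not overlap."""
    return (ci_a[0] > ci_b[1]) | (ci_b[0] > ci_a[1])
```

With a block length of one this reduces to the classical bootstrap, which ignores the day-to-day persistence of blocking and would therefore give confidence intervals that are too narrow.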
Composite maps are calculated based on daily fields of geopotential height at 500 hPa. Statistical significance is calculated by bootstrapping daily blocking fields, applying in this instance the classical block length of one for 1000 redrawings to both blocked and non-blocked samples. Signals in blocking signature maps are statistically significant if the 95% quantile confidence intervals do not overlap. For calculation of the statistical significance of the deviation of the blocking signature in EC-Earth relative to ERA-I, the blocked events are resampled 1000 times. Then the difference (blocked 95% confidence interval minus non-blocked mean) is required not to overlap between ERA-I and the EC-Earth simulation. In addition to the blocking composite analysis, we calculate mean state biases from daily 500 hPa geopotential height fields.

Reanalyses
Reanalyses exhibit a peak in blocking frequency around 0° E in winter and spring (Fig. 1). Blocking occurrence peaks during winter with frequencies of almost 10% confined to a narrow longitudinal area. In addition, during winter and spring, a weaker secondary maximum over the Asian continent can be discerned.
ERA-20C differs the most from the other two reanalyses, but the difference to ERA-I is not statistically significant (not shown). These small deviations of blocking frequency distribution in high-resolution reanalyses in winter on the Northern Hemisphere are in agreement with other blocking studies (e.g. Davini et al., 2012; Vial and Osborn, 2012) and with results using extratropical cyclones (Hodges et al., 2011), which are dynamically linked to atmospheric blocking. Inter-annual variability, as indicated by bootstrapping, is higher around the main European-Atlantic peak and the secondary maximum over the Eurasian continent but lower at the slopes of the blocking frequency maximum. The range of ERA-I frequencies falling into the 95% confidence interval between 50° W and 120° E is at most 5.2 percentage points in winter and 4.2 percentage points in spring, at a maximum mean blocking frequency of 9.7 and 7.3%, respectively. As another measure of uncertainty the decadal variability can be considered. Figure 1 also shows mean blocking frequencies from three other 26-season periods in the ERA-20C record (1900-1926, 1925-1951 and 1952-1978). Apart from one period in spring giving markedly more events, the variability over decades estimated from the spread of four ERA-20C periods is similar to the inter-annual variability estimated from bootstrapping one 26-season period. Thus, the range of variability is comparably large and the two different approaches agree well. This indicates that only a few statistically significant results are expected from the analysis of model simulations over the 26-season time series.
In the following, ERA-I is chosen as the main proxy for the observed atmospheric evolution.

Model comparison
Daily central blocking latitude.
Figure 2 shows the daily central blocking latitude in ERA-I and biases in ERA-20C and EC-Earth simulations relative to ERA-I. Largest biases emerge in AM-l setups (Fig. 2b, e) but are considerably reduced in AM-h on the finer grid (Fig. 2c, f). The AMv3 data-set reveals that improvements already occur at medium resolution (AMv3-m, Fig. 2h) and that biases do not reduce much on even finer grids (Fig. 2f, i, m). The influence of resolution exceeds that of physical parameterizations in determining the structure of ϕ_c in an atmosphere-only approach (comparing Fig. 2b-c, e-f) for the particular combination of the model characteristics resolution and physical parameterizations used in this analysis. Coupled simulations are sensitive to horizontal resolution during spring, though a finer grid is now detrimental, and are more sensitive to physical parameterization in winter. CMv2 simulations position the central blocking latitude further south, whereas CMv3 ϕ_c fields indicate a northward bias. This northward bias is largest in continental climates during spring. Similar behaviour is also found in AMv3-h, hh and hhh. In summary, AMv2-l exhibits the largest ϕ_c biases and AMv3-m the smallest. When excluding AMv3-l, all EC-Earth v3 blocking latitudes diverge by less than 2° latitude from ERA-I over the Atlantic.

Blocking frequency - overall performance.
A comprehensive comparison of blocking frequency in reanalyses and model simulations can be done by means of Taylor diagrams (Taylor, 2001). In the Taylor diagrams (Fig. 3), showing statistics based on the seasonal-mean climatological blocking frequency distribution, standard deviation and correlation can be translated to amplitude and phasing errors of the blocking frequency distribution relative to ERA-I, respectively. Circles around ERA-I at (1,1) represent constant centred pattern root mean square (RMS) errors, not taking into account mean biases.
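The three quantities shown in a Taylor diagram can be computed from two climatological frequency distributions as in the sketch below. Normalizing by the reference standard deviation (so that the reference sits at standard deviation 1 and correlation 1) is an assumption consistent with the description above; the function name is illustrative.

```python
import numpy as np

def taylor_stats(model, ref):
    """Normalized standard deviation, correlation and centred RMS error.

    model, ref : 1-D arrays of climatological blocking frequency by longitude.
    All quantities are normalized by the reference standard deviation.
    """
    model = np.asarray(model, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sigma_ref = ref.std()
    sigma_norm = model.std() / sigma_ref
    corr = np.corrcoef(model, ref)[0, 1]
    # Centred pattern RMS error: means are removed, so mean biases do not count.
    anomaly_diff = (model - model.mean()) - (ref - ref.mean())
    crmse = np.sqrt(np.mean(anomaly_diff ** 2)) / sigma_ref
    return sigma_norm, corr, crmse
```

The three values are linked by the law of cosines, crmse² = sigma_norm² + 1 − 2 · sigma_norm · corr (Taylor, 2001), which is why they can be displayed in a single two-dimensional diagram.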
In both seasons, the two data-sets with the highest correlation relative to ERA-I are the two other reanalysis systems. The total occurrence of blocking is underestimated by most models in winter and overestimated in spring. The CM ensembles spread by around 0.1, and at most 0.2, in correlation and 0.5 in standard deviation. CMv3 reduces the phasing error in winter compared to CMv2. In spring, all three CM setups are separated, but CMv3-m is performing best, with a reduced phasing error compared to CM-l. The spring blocking frequency distribution of CMv3-m exhibits the smallest standard deviation bias of all model setups, though not at the same time the largest correlation. Apart from AMv2 exceeding CMv2 performance in winter, AM and CM simulation results are close in RMS error space. A finer grid reduces the RMS error from AMv3-l to AMv3-hh during winter, mainly via changes in standard deviation, and the error increases again for the AMv3-hhh climate. The largest bias reduction occurs when refining the grid from AMv3-l to AMv3-m. AMv2 points to a mixed response of correlation and standard deviation to horizontal resolution. In spring, AM simulations improve with respect to amplitude on finer grids but deteriorate in the phasing of the maximum, with the exception of AMv3-hh, which has the highest correlation and largest amplitude bias. However, overall RMS error differences are small. In summary, Taylor statistics show some individual members of CMv3-l/m being closest to ERA-I. Also, AMv3-m/h/hh show good results, with AMv3-m/h performing best in winter and AMv3-hhh in spring.

Fig. 3 (caption fragment). The diagram also shows the correlation coefficient between the climatological blocking distribution of each individual data-set (reanalysis or model simulation) and ERA-I (the angle from the abscissa as indicated on the outermost circle, black-dashed isolines). The radial distance between ERA-I at (1,1) and the respective data-set is the RMS error (light grey isolines). Diagrams for winter (a) and spring (b) contain data of reanalyses (black), EC-Earth v2 (red) and v3 (blue: CMv3-m and AMv3, yellow: CMv3-l) as summarized in Table 1.
Extending the analysis to include the longitudinal variability of the blocking frequency (Fig. 4) reveals one more bias pattern in the Atlantic-Eurasian sector. In addition to the underestimation in winter and overestimation of blocking occurrence during spring found in Taylor diagrams, which is likely related to a seasonal shift in the models, some simulations show a positive bias over the Asian continent leading to a smaller secondary maximum (e.g. CMv3-m during both seasons and AMv3-h/hh/hhh during winter).
AM simulations (Fig. 4b-i) are statistically mostly indistinguishable from ERA-I. During winter, a statistically significant underestimation in peak frequency over parts of the Atlantic is present in AMv2-l. One other signal is an overestimation of blocking occurrence in AMv3 over the Asian continent. This signal is not found in AMv3-l and only over a confined area in AMv3-m.
Blocking climatologies in CM setups locally differ more from ERA-I and thus show more statistically significant biases. Differences among initial-member realizations (Fig. 4k-p) are largest around 0° and for CMv3-m also east of 50° E. Thus, ensemble variability has similar spatial characteristics as the inter-annual blocking variability in ERA-Interim, indicated by the 95% quantile confidence interval obtained from bootstrapping in Fig. 1. Figure 4n the peak by about four to six percentage points, although only statistically significant over a confined area. In contrast, east of 50° E a weak but statistically significant positive bias is found in two of five CMv3-m members. The CM ensemble spread is slightly increased in winter (Fig. 4k-m), which reduces robust signals over the Atlantic. Remaining is a negative bias in winter for CMv2-l. Again, a positive bias occurs over the Asian continent, now in all members of CMv3-l and CMv3-m.

Blocking frequency - intermodel differences.
Replacing the reference reanalysis distribution by AMIP simulations with the same physics system and horizontal resolution as the CM simulations in Fig. 4, the impact of atmosphere-ocean interaction on blocking biases can be analysed (not shown). Considering both seasons, only one member of CMv3-m is statistically significantly different from AMv3-m during spring over the main blocking area. Additional differences are found east of 50° E during winter, where CM setups overestimate blocking occurrence relative to AM setups. Thus, although robust biases relative to ERA-I differ between AM and CM simulations, comparing the influence of a resolved ocean circulation to prescribed SST directly does not show statistically significant differences.
The relative impact of horizontal resolution and physical parameterizations on atmosphere-only blocking statistics is also assessed by bootstrapped differencing (Fig. 5). To isolate these individual factors, an assumption has to be made on the sensitivity to vertical resolution, which varies alongside the other two model characteristics (see Table 1). Anstey et al. (2013) did not detect any statistically significant impact of the height of the uppermost stratospheric level on atmospheric blocking frequency. EC-Earth v2 was part of this study, and because vertical grids only differ above the tropopause, we assume that the vertical grid change in EC-Earth is of minor importance for blocking frequency results.
During spring, the largest and statistically significant difference is found between AMv2-l and AMv3-h (Fig. 5g), with AMv3-h representing blocking better (worse) over the western (eastern) part of the 0° to 50° E segment. Decreased performance of AMv3-h over part of the region is due to an underestimation of blocking over eastern Europe. Largest similarity, with differences of less than two percentage points in blocking frequency, is detected for setups of the same resolution (Fig. 5a and i). Similarly during winter, AMv3-h gives the best peak frequency and produces blocking statistically significantly better than AMv2-l and AMv2-h (panels g and i). However, again AMv3-h is not performing better over the whole longitude range due to overestimation of blocking frequency over the Asian continent. Thus, both resolution and physics updates are beneficial for representing blocking at the Euro-Atlantic maximum. One way of reading this result is that increased resolution is beneficial to capturing blocking, but improving other parts of the model, like physical parameterizations, simultaneously is necessary to distinguish the signal from the noise. Considering only winter and the higher resolution simulations (Fig. 5i), similar improvements are obtained from changing the physical parameterizations. Analysis of all AMv3 simulations available (see Table 1) gives information about the influence of atmospheric grid spacing from low to very high resolution (not shown). The two best simulations in winter, AMv3-h/hh, are statistically significantly different from AMv3-hhh over parts of the Atlantic. During spring, AMv3-h and AMv3-hhh capture the blocking peak frequency best and differ from all other simulations, which overestimate spring blocking, by up to four percentage points. Other simulations do not differ by more than two percentage points.
Comparing results at only two horizontal resolutions indicated that blocking representation improves with finer grid spacing but also that this mainly occurs together with other changes in the model. Considering a wider range of simulations based on the same physical parameterizations confirms this observation. No continuous improvement of blocking representation with finer grid spacing can be observed in a model with a fixed physical parameterization scheme. However, most AMv3 climatologies are within the 95% quantile confidence interval of ERA-I as determined with a moving block bootstrap, and thus a sensitivity to the horizontal grid might be masked by internal variability and the relatively short length of the time series.
Lastly, a bootstrapped comparison for each CM member-member combination is presented in Fig. 6. The dashed line is based on a combination of all members of each CM setup forming one long time series. In contrast to single-member analysis, these results show the improvements possible by combining ensemble information or running a longer model simulation. During spring, CMv3-l is performing similarly to both CMv2-l and CMv3-m (Fig. 6c and f). Comparing CMv2-l and CMv3-m over the European continent (Fig. 6b) shows that improved resolution and physical parameterizations lead to better blocking representation over the western part of this section and to an incorrect overestimation of blocking to the east. Although this signal is only present in two of the 15 member-member comparisons, it is robust over the same area when combining all members (dashed line). This result is very similar to the one found in the AM setup and indicates an impact of resolution, enhanced if physical parameterizations are also changed. During winter, results point towards a larger weight of physical parameterizations in improving the Euro-Atlantic blocking frequency. As indicated by Fig. 6h, results from CMv3-l and CMv3-m differ the least, by up to four percentage points (two percentage points when taking all members as one long realization), and the difference is not statistically significant. On the other hand, CMv2-l is significantly biased with respect to CMv3, missing up to six percentage points in blocking over the main occurrence peak (Fig. 6d, g). This is again contrasted by a better performance of the simulation on the coarser grid with older physical parameterizations over the Asian continent. Both signals are clear features when combining the members into one time series each.
Since biases compared to ERA-I are larger in CM than in AM simulations, more robust sensitivity to model characteristics resolution and physical parameterization is present in CM results. However, in spring blocking biases are reduced in models with both types of atmosphere-ocean interaction when resolution is increased together with updates to physical parameterizations. On the other hand, the positive Asian bias indicates that other mechanisms besides resolution enhancement must impact blocking representation as a limited model resolution is understood to reduce blocking frequency from an upper boundary. Only during winter is the impact of physical parameterizations in determining blocking performance more isolated in coupled simulations.

Spatial structure - mean state biases.
As expected from basic dynamical theory, ERA-I mean geopotential height at 500 hPa decreases poleward in both winter and spring (Figs. 7a and 8a), though the meridional gradient is reduced during spring. A climatological ridge is found around 0° E, at the boundary between Europe and the Atlantic.
EC-Earth exhibits a tendency for a negative bias in geopotential mean climate south of Greenland in all setups, strongest for CMv2-l. Sterl et al. (2011) discuss this bias in EC-Earth CMv2 and relate it to a spatial displacement of the Gulf Stream. SST biases in coupled models can be partly forced by atmospheric circulation biases, and these atmospheric biases can then also develop in AM simulations. A negative bias also exists in newer CMv3 simulations, although in an alleviated form and displaced southward. Similar to the Atlantic bias, most other mean state biases vary little with season but differ by type of atmosphere-ocean interaction and show even larger sensitivity to physical parameterizations. AMv3 simulations show no clear resolution sensitivity, even when including AMv3-hh/hhh (not shown). Characteristic of the EC-Earth v2 climate is an underestimation of geopotential height over the Atlantic, which is accompanied by a positive (negative) bias over parts of the northern Eurasian continent during winter (spring) (Figs. 7 and 8b,c,g). The mean climate of EC-Earth v3 is characterized by a weaker meridional geopotential height gradient over the Atlantic (Figs. 7 and 8d-f and h-i). During both seasons, AMv3-m exhibits the smallest deviations from ERA-I. In addition, a positive bias centred over the Kara Sea, coinciding with the location of the Asian frequency bias for CMv3 and AMv3-h (Fig. 4f, l, m), causes a reduction of the meridional geopotential height gradient also over these longitudes. As the Arctic/northern Russian positive bias occurs in proximity to the ice edge, it could be linked to biases in sea ice extent in coupled models or, independent of the complexity of atmosphere-ocean interaction, connected to a northward bias in the central blocking latitude ϕ_c.
In contrast to the connection discussed between the Asian blocking frequency biases and the mean state bias over this area, the region of European blocking is coinciding with relatively weak mean state biases which, in addition, are spatially not confined to the blocking region.
3.2.5. Spatial structure - blocking signature. The analysis of the geopotential height blocking signature in ERA-I and of absolute value biases in model simulations focuses on the European region (Fig. 9). Results from spring and for the other two regions (Atlantic and Asian continent) are not shown but briefly compared to the results found in winter.
The ERA-I winter blocking signature in the 500 hPa geopotential height field (Fig. 9a) is a dipole with a positive anomaly in the north and a weaker negative anomaly in the south. Blocking signature maxima during spring are weaker, spatially wider and lack a clear negative counterpart (not shown).
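A blocking signature of this kind is commonly obtained as a composite, that is, the mean height field over blocked days minus the climatological mean. The sketch below assumes this simple definition, with each daily field flattened to a list of grid-point heights in metres; the function name and the choice of the all-day mean as reference are assumptions, and the compositing used for Fig. 9 may differ (for example in how the seasonal cycle is removed).

```python
def composite_anomaly(fields, blocked):
    """Mean field over blocked days minus the mean over all days.

    fields  : list of daily fields, each a flat list of heights (m)
    blocked : list of booleans, True where the day counts as blocked
    """
    if len(fields) != len(blocked):
        raise ValueError("fields and blocked must have the same length")
    blocked_fields = [f for f, is_b in zip(fields, blocked) if is_b]
    if not blocked_fields:
        raise ValueError("no blocked days to composite")
    n_points = len(fields[0])
    climatology = [
        sum(f[i] for f in fields) / len(fields) for i in range(n_points)
    ]
    composite = [
        sum(f[i] for f in blocked_fields) / len(blocked_fields)
        for i in range(n_points)
    ]
    return [c - m for c, m in zip(composite, climatology)]


# Two grid points (north, south) on three days, the last two blocked.
fields = [[5500.0, 5300.0], [5600.0, 5250.0], [5700.0, 5200.0]]
print(composite_anomaly(fields, [False, True, True]))  # [50.0, -25.0]
```

In this toy case the northern point shows a positive anomaly and the southern point a weaker negative one, the same dipole structure described for ERA-I in winter.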
Winter blocking over Europe is captured with localized biases of less than 75 m. The main common feature seen in some setups is an eastward shift or weakening of the negative anomaly combined with a weak southward shift of the positive anomaly (Fig. 9b,d,e,g). The only sensitivity to model characteristics in AM simulations is a weak reduction of biases with higher resolution (Fig. 9b-c and d-f). The exception is AMv3-hhh (not shown), which performs no better than AMv3-h/hh and is instead similar to AMv3-l/m. AM-h and AMv3-hh (not shown) give the best model results and show no biases of the main blocking signature.
In spring (not shown), blocking signature biases over Europe are much more uniform but, at up to 100 m, also larger. All simulations have a tendency for a southwestward shift of the positive anomaly or a weakening on the eastern flank. Simulations with lower resolution tend to exhibit the shift pattern, while higher resolution simulations mainly feature a weakening of the blocking signature. This means that blocking signatures in coupled models exhibit a larger sensitivity to resolution than to physical parameterizations, similar to findings based on blocking frequency (Fig. 6b). No atmosphere-only resolution sensitivity can be discerned, and AMv3-m/hh perform best.
Over the Atlantic (not shown), winter model biases are amplified though mainly resembling the structure, and thus the conclusions found over Europe. The springtime Atlantic ERA-I signature is weaker than over Europe, and model biases are smaller in absolute numbers as well, mostly showing a strengthening of the positive anomaly. AMv3-h to hhh exhibit hardly any bias over the Atlantic but AMv3 presents a weak grid spacing sensitivity from low to high resolution.
Blocking signature biases over the Asian continent (not shown) are found for all simulations which exhibit a weaker negative anomaly around 80° N, 100° E for the European blocking signature during both seasons. In these simulations, the blocking anticyclone is shifted northeastward and its position coincides with the second blocking frequency peak found in these simulations over the Asian continent (Fig. 4).
Season-dependent similarities among CM simulations and stronger blocking signature biases over the Asian continent related to increased resolution and newer model version are patterns found in both blocking frequency and blocking signatures. The smallest deviations from ERA-I in all regions and both seasons can be found for medium to high resolution AMv3 simulations, also mirroring blocking frequency results. In addition, some AM resolution sensitivity is present over Europe and the Atlantic, which was not found in blocking frequency results.

Conclusions and discussions
This study examines a number of simulations with a suite of EC-Earth model configurations and shows that blocking frequency, as determined by ERA-I, is generally reproduced well by the model in a statistical sense. The analysis is based on blocking frequency, mean state bias of geopotential height at 500 hPa and blocking signature representation over the Atlantic-Eurasian sector (50° W to 120° E).
Blocking frequency in both atmosphere-only and coupled model simulations (AM and CM, respectively) is influenced by model resolution in spring, though differences are not statistically significant. Robust results are only obtained if combining resolution and more recent physical parameterization schemes. In winter, CM simulations show a robust signal of impact of physical parameterizations and AM simulations are again simultaneously improved by finer resolution and updates to physical parameterizations. The possibility of reducing biases in blocking representation through advances in atmospheric resolution and physical parameterization has been previously reported for AM simulations (e.g. Berner et al., 2012; Jung et al., 2012; Schiemann et al., 2017) but CM simulations, especially in a single-model framework, have been studied less (Anstey et al., 2013). Here, we show that coupling the atmospheric model to the ocean model gives slightly larger biases with different members indicating larger variability than for prescribed surface conditions. Comparing results from the two types of atmosphere-ocean interaction directly, however, shows only minor influence on blocking frequency. Most successful blocking representation was achieved by the coupled newer version EC-Earth V3.01 simulations using ∼78 km resolution (CMv3-m) and atmosphere-only simulations on a similar or finer grid of the same model version (AMv3-m to AMv3-hhh).
The occurrence of only few statistically significant differences, even though blocking frequencies differ by up to four percentage points at an absolute frequency of ten percent, is related to large natural variability of atmospheric blocking (Davini et al., 2012; Barnes et al., 2014; Gollan et al., 2015). Applying statistical tests that underestimate this variability will lead to overconfidence in assessing trends, for example related to global warming, or in finding improvements due to advances in model characteristics. Underestimation might occur if the variability from a limited model ensemble is used in the statistical assessment. Here, the variability is similar when estimated by a moving block bootstrap on ERA-I or from the spread of four different ERA-20C periods but is underestimated in the spread of a coupled model ensemble.
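The idea of a moving block bootstrap can be sketched as follows. The sketch assumes a series of seasonal blocking frequencies (in percent, one value per year) and resamples overlapping blocks of consecutive years, so that some year-to-year dependence is retained. The function name, block length and number of resamples are illustrative assumptions, not the settings applied to ERA-I in this study.

```python
import random
from statistics import fmean, stdev


def moving_block_bootstrap_means(series, block_len, n_boot=1000, seed=0):
    """Bootstrap distribution of the mean using overlapping blocks.

    Each resample concatenates randomly chosen blocks of `block_len`
    consecutive values until the original length is reached.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must lie between 1 and len(series)")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)  # all admissible block start indices
    means = []
    for _ in range(n_boot):
        sample = []
        while len(sample) < n:
            start = rng.choice(starts)
            sample.extend(series[start:start + block_len])
        means.append(fmean(sample[:n]))  # truncate to original length
    return means


# Hypothetical yearly winter blocking frequencies (percent).
frequencies = [10.0, 14.0, 6.0, 9.0, 12.0, 8.0, 11.0, 13.0, 7.0, 10.0]
spread = stdev(moving_block_bootstrap_means(frequencies, block_len=3))
```

The resulting `spread` can be set against the standard deviation across the members of an ensemble; if the ensemble spread is clearly smaller, a test based on it would overstate significance, which is the concern raised above.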
Currently blocking representation is statistically captured correctly, making future improvements in blocking frequency due to further model enhancements unlikely for the analysed regimes. Studying data over a longer time period could reduce the uncertainty of the sensitivity analysis and show further sensitivities as indicated by the combined time series of the CM members.
An overestimation of spring blocking occurrence was found that is atypical of other modelling studies (e.g. Schiemann et al., 2017) and which might be caused by either a temporal shift or an overall different structure of the simulated seasonal cycle. The fact that the blocking frequency is overestimated in some instances, especially also at higher resolution, could furthermore indicate that the model may not be converging to the real atmospheric solution. This might be connected to the incomplete tuning of the model, which, in addition to the large variability of atmospheric blocking, reduces the significance of the results. However, we assume that the use of a not fully tuned model does not influence the blocking results here in a significant way. A study by Davini et al. (2017), in which a tuned setup of EC-Earth v3.2 was used, reports a similar impact of horizontal resolution on blocking frequency, though an exact comparison is not possible due to the use of a different detection index.
The impact of vertical resolution has not been analysed separately but we refer to the study of Anstey et al. (2013), in which EC-Earth v2.3 was included. That study found a weak sensitivity, with small, spatially inconsistent and statistically non-significant impacts of the number of vertical levels on blocking frequency if adaptations to the vertical grid were only made in the stratosphere, as is the case here. Furthermore, spring results presented here for different physical parameterizations and the same horizontal resolution do not differ much. Thus, unless the impact of vertical resolution exactly balances the changes to climate by physical parameterization, the influence of the different vertical resolutions used here can be neglected. However, many processes will influence the occurrence of blocking and it is likely that some counteract each other.
All simulations, even those representing blocking well, show biases of the mean atmospheric state. In comparison, the most pronounced deviations are found in coupled simulations. Despite this, the blocking signature or the blocking frequency in CM simulations are not affected detrimentally compared with AM simulations. Another apparent stratification of the mean state occurs with respect to model version as the older version of EC-Earth tends to underestimate the geopotential height at 500 hPa over the considered area. The more recent version consistently produces a weaker meridional gradient in the 500 hPa geopotential height than ERA-I. A weaker meridional gradient could increase the likelihood of wave breaking and, in turn, of blocking. This weaker meridional pressure gradient can, among other causes, be induced by excessive surface drag (Sandu et al., 2016). However, the drag parameterization is only one part of the physical parameterization suite that has been updated between the two EC-Earth versions and attribution of process changes to the revised model description is complex. Still, more research is required to pin down specific processes explaining atmospheric blocking improvements in detail.
Visible in the mean state bias is furthermore a positive height bias over the Asian continent at higher latitudes, with impacts on blocking signature and blocking frequency. Interestingly, the bias is enhanced in higher resolution simulations and in the newer model version, and likely linked to a northward shift of the daily central blocking latitude in these simulations. This positive bias is an indicator that horizontal resolution cannot be the only modulator of atmospheric blocking as observations are expected to present an upper bound on modelled blocking frequencies. A brief sensitivity study shows that this bias persists if the central blocking latitude from ERA-I is used for blocking detection in all EC-Earth setups (not shown). Blocking detection at a constant central blocking latitude of 60° N (not shown) eliminates the continental frequency bias and points indeed to the importance of a latitude bias. Even if the continental frequency bias is latitude dependent, physical parameterization and weak horizontal resolution impact are similar in the two other latitude selection methods presented here. Assessing blocking based on several blocking detection algorithms seems advantageous.
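To make the role of the central blocking latitude concrete, the following sketch applies the classic one-dimensional height gradient criterion of Tibaldi and Molteni (1990) at a chosen central latitude. It is used here only as an illustrative stand-in: the detection index applied in this study is defined in the methods and may differ. The function name, the callable `z500` (latitude in degrees north to 500 hPa height in metres at one longitude and one day) and the default parameters are assumptions for illustration.

```python
def is_blocked(z500, phi_c=60.0, half_width=20.0,
               deltas=(-4.0, 0.0, 4.0), ghgn_max=-10.0):
    """Instantaneous blocking test at one longitude.

    A longitude counts as blocked if, for at least one latitude offset,
    the southern height gradient is reversed (GHGS > 0) and the northern
    gradient is strongly negative (GHGN < ghgn_max, in m per degree).
    Assumes phi_c + half_width + max(deltas) does not exceed 90.
    """
    for delta in deltas:
        phi_0 = phi_c + delta
        phi_s = phi_0 - half_width
        phi_n = phi_0 + half_width
        ghgs = (z500(phi_0) - z500(phi_s)) / (phi_0 - phi_s)
        ghgn = (z500(phi_n) - z500(phi_0)) / (phi_n - phi_0)
        if ghgs > 0.0 and ghgn < ghgn_max:
            return True
    return False


def zonal_flow(lat):
    """Idealized profile: height decreasing steadily towards the pole."""
    return 5800.0 - 10.0 * (lat - 30.0)


def ridge_at_60(lat):
    """Idealized profile: height maximum centred on 60 N."""
    return 5600.0 - 15.0 * abs(lat - 60.0)


print(is_blocked(zonal_flow))   # False
print(is_blocked(ridge_at_60))  # True
```

Because the gradients are evaluated relative to `phi_c`, moving the central latitude changes which height maxima satisfy the criterion, which is why a latitude bias can translate into a frequency bias.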
The eastern continental blocking bias is also visible in composites of geopotential height in setups with a finer grid and more recent physical parameterizations. Blocking signature biases over Europe and the Atlantic are weaker despite frequency biases, which agrees with other model studies (Davini and D'Andrea, 2016). Apart from a reduction of winter blocking signature bias in atmosphere-only models with finer grids over some regions, no clear signals of bias reduction can be found. Sensitivity to resolution is also present in coupled models during spring.