A new regional climate model operating at the meso-gamma scale: performance over Europe

There are well-known difficulties to run numerical weather prediction (NWP) and climate models at resolutions traditionally referred to as ‘grey-zone’ ( 3 8 km) where deep convection is neither completely resolved by the model dynamics nor completely subgrid. In this study, we describe the performance of an operational NWP model, HARMONIE, in a climate setting (HCLIM), run at two different resolutions (6 and 15 km) for a 10-yr period (1998 2007). This model has a convection scheme particularly designed to operate in the ‘grey-zone’ regime, which increases the realism and accuracy of the time and spatial evolution of convective processes compared to more traditional parametrisations. HCLIM is evaluated against standard observational data sets over Europe as well as high-resolution, regional, observations. Not only is the regional climate very well represented but also higher order climate statistics and smaller scale spatial characteristics of precipitation are in good agreement with observations. The added value when making climate simulations at 5 km resolution compared to more typical regional climate model resolutions is mainly seen for the very rare, high-intensity precipitation events. HCLIM at 6 km resolution reproduces the frequency and intensity of these events better than at 15 km resolution and is in closer agreement with the high-resolution observations.


Introduction
Accurate and reliable projections of future precipitation distributions remain one of the greatest challenges in the climate model community. Current generation climate models, global and regional, struggle to accurately capture the observed intensity and frequency of precipitation, deviating from observations most radically at the extreme high-and/or low-intensity tails of the distribution (e.g. Kjellstro¨m et al., 2010;Wehner, 2013), lowering the confidence in models projections of changes in these features under climate change scenarios. One important reason for this discrepancy is likely the low resolution presently used in climate models, an issue which has been extensively discussed and studied (e.g. Iorio et al., 2004;Hohenegger et al., 2008;Gent et al., 2010;Wehner et al., 2010;Kendon et al., 2012). With the use of an atmosphere-only global climate model (GCM) run in an aqua-planet configuration, Li et al. (2011) could investigate purely atmospheric precipitationproducing processes (i.e. model parametrisations and dynamics) and their dependence across resolutions. The results did not depict any clear convergence when increasing resolution from Â2.88 to 0.358; however, precipitation extremes demonstrated larger sensitivity to resolution than the mean values. Despite the idealistic setup, the result emphasised the importance of model resolution for simulating higher order statistics of rainfall.
Running GCMs at a higher resolution than Â50 km [a resolution often considered a threshold resolution for a realistic simulation of mid-latitude synoptic variability (e.g. Jung et al., 2012)] is still generally too expensive, and therefore dynamical downscaling provides a means to explore finer scale climate information and interactions, as they are able to capture local and regional forcings. Several studies have assessed the horizontal resolution *Corresponding author. email: David.Lindstedt@smhi.se Tellus A 2015. # 2015 D. Lindstedt et al. This is an Open Access article distributed under the terms of the Creative Commons CC-BY 4.0 License (http:// creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license. dependence in regional climate models (RCMs) (e.g. Leung and Qian, 2003;Gao et al., 2006;Rauscher et al., 2010;Sharma and Huang, 2012;Larsen et al., 2013). The majority of these, as for GCM studies, reach the conclusion of an improvement as the resolution increases, especially in regions of complex terrain. However, it is not always evident what is the primary cause of the improvement: the dynamics, the finer representation of local and regional forcings or the model physics. Gao et al. (2006) reported that resolution sensitivity of the latter was more important than surface forcing to accurately represent precipitation over complex terrain in East Asia. In a recent study, Larsen et al. (2013) investigated the sensitivity of simulated precipitation to domain size (where all domains were kept sufficiently large so that the errors caused by the lateral boundary forcings did not dominate the interior), in addition to varied resolution. They concluded that, for Denmark and surrounding areas, the results from a larger domain were superior to those of a smaller domain with higher resolution.
Therefore, just increasing resolution does not necessarily imply better results. Even though a finer grid enables more processes to be resolved, as well as more realistic surface forcing, for example in association with topography and land ocean contrasts, the impact of these on the model response may be modified or even masked by artefacts and errors in the model parametrisation, in some cases even deteriorating the resulting response (Giorgi and Marinucci, 1996;Pope and Stratton, 2002). For example, Wilcox and Donner (2007) concluded that the difference in simulated frequency of extreme rain events using two different convection schemes (in otherwise identical model setup) was larger than the change in the frequency of heavy rain events associated with a 2-K warming using either scheme.
As the grid size is reduced to Â5 km, usually referred to as the 'grey-zone' scale, convective precipitation response often starts to degrade as most convection parametrisation (CP) schemes are not designed for this resolution. The reason for this deficiency is two-fold; first, there is an increase in the risk of double counting convective precipitation and associated diabatic effects via both CP and resolved scale cloud processes (Gerard, 2007); second, through the closure of the scheme, excessive stabilisation at the grid point might arise suppressing the convective activity (Gerard et al., 2009).
Thus, at these spatial scales, parametrisations of convection eventually start to seriously violate underlying statistical assumptions upon which the schemes are based; hence, these have to become more sophisticated to account for the stronger interactions between small-scale convection and mesoscale processes (e.g. mesoscale circulations) (Frank, 1983;Molinari and Dudek, 1992).
In a case study, Yu and Lee (2010) investigated the role of CP in a numerical weather prediction (NWP) setting at resolutions ranging from 3 to 9 km. Simulations of a convective band over the mid-Korean Peninsula showed that a 3 km grid was able to resolve the convective system without CP, which was not possible at a coarser resolution. All simulations with CP gave a too-wide rainfall area and the authors concluded that one of the major issues with this CP was the false triggering of convection. They suggested a modification in the triggering of convection, which was shown to improve the results especially for a 6 km simulation. This shows the importance of the CP scheme at this scale. Fosser et al. (2014) is one of few studies that address the delicate problem of running a climate model in a multiyear transient simulation in the grey-zone regime (over central Europe). Similar to Yu and Lee, Fosser et al. conclude that at 7 km, convection is not explicitly resolved and parametrisation is still required. But their implemented scheme (Tiedtke, 1989) proved not to be applicable at this high resolution, particularly when looking at subdaily statistics. Also Berg et al. (2012) and Lucas-Picher et al. (2012) have carried out multidecadal RCM simulations within the greyzone (7 and 5.5 km, respectively). Berg et al. (2012) showed that the high-resolution simulations are comparable to coarser state-of-the-art simulations. Furthermore, the high end of the precipitation intensity distribution over Germany showed good resemblance to high-resolution observations. However, they did not investigate the validity of using the convective parametrisation scheme at this resolution. The study of Lucas-Picher et al. (2012) aimed to identify added value for a high-resolution simulation over Greenland when contrasted to a coarser one at Â27.75 km. The finer description of the orography led to a change in the precipitation pattern and more physical plausible climate fields. The spatial variability increased in the high-resolution simulation, but due to the sparseness of observations over Greenland it was difficult to fully validate.
The NWP model HARMONIE (Hirlam Aladin Regional Mesoscale Operational NWP In Europe) is especially interesting in this respect as it is specifically developed to be used at a large range of horizontal resolutions within the meso-gamma scale (from around 10 km down to 3Á4 km). Particularly, it has a convection scheme that is designed to operate in the grey-zone regime (Gerard, 2007;Gerard et al., 2009). This is achieved primarily through (1) A clean separation of resolved and subgrid condensates to avoid double counting.
(2) The use of a prognostic closure instead of a diagnostic one, which relieves the need for a quasi-equilibrium hypothesis, the validity of which is questionable at high resolution.
These properties make the CP one of the more sophisticated available today; however, until now it has mostly been used within the NWP community. We believe that the regional climate modelling groups are facing three competing demands in the coming years: (1) having a reasonable sized grid domain for synoptic system to develop, (2) for climate change analysis, particularly for rare events an ensemble of simulations are needed and (3) high enough resolution to properly simulate highly localised events such as extreme rainfall. To address this, we present results from simulations with the HARMONIE model system in a climate model setting over a relatively large domain, namely Europe, for a continuous 10-yr period. In one of the simulations, the resolution is set to Â6 km resolution, to the authors' knowledge, so far unprecedented at least at this spatial scale and length of simulation. We think this is an interesting future combination that fulfils the three points above until high performance computing allows RCMs to operate on a cloud-permitting scale. The aim of this study is two-fold: (1) to investigate to what extent the higher resolution provides an improved representation of higher order regional climate statistics such as intense precipitation events compared to another HARMONIE simulation at the more standard 15 km resolution and (2) to investigate the general climate model performance at grey-zone resolutions.

Description of HARMONIE
HARMONIE (version harmonie-36h1.3) is a seamless NWP model framework where the components have been built through cooperation between several national meteorological services. The model system contains a suite of physical parametrisation packages that are developed to be applicable to different resolutions: (1) Â15 km ALADIN physics (Be´nard et al., 2010), (2) at resolutions traditionally referred as the grey-zone ( Â5 km), ALARO physics (Gerard, 2007;Piriou et al., 2007;Gerard et al., 2009 and (3) at convective permitting resolutions ( Â2 km), AROME physics (Seity et al., 2011).

Dynamics
HARMONIE is based on the ALADIN non-hydrostatic (NH) and spectral dynamical core (Bubnova´et al., 1995;Va´nˇa et al., 2008;Be´nard et al., 2010). It is based on a two-time level semi-implicit Semi-Lagrangian discretisation of the fully elastic equations in the atmosphere. In the vertical, a mass-based hybrid pressure terrain-following coordinate is used (Simmons and Burridge, 1981;Laprise, 1992).

Surface
The surface parametrisation framework in HARMONIE is SURFEX [Surface externalise (Masson et al., 2013)]. It originates from the ISBA (Interactions SoilÁBiosphereÁ Atmosphere) surface scheme (Noilhan and Planton, 1989) and is an external surface modelling system available off-line as well as coupled to atmospheric models. When coupled to an atmospheric model, SURFEX receives variables such as air temperature, humidity, wind and precipitation for every time step and then uses them to compute momentum and surface energy fluxes. Land surface properties in SURFEX are taken from the ECOCLIMAP database (Champeaux et al., 2005). A more thorough description of SURFEX is found in (Le Moigne, 2012).

Radiation
The radiation parametrisation in HARMONIE is a twostream scheme with one band for solar and one for the thermal spectrum. It is developed to be flexible and fast (Ritter and Geleyn, 1992). The cloud geometry assumes maximum overlap between adjacent cloud layers, and clouds separated by clear air are independent (i.e. maximumrandom overlap). The treatment of cloud optical properties is updated according to Masek (2005).

Moist processes
HARMONIE uses the 3MT (Modular Multiscale Microphysics and Transport) (Gerard, 2007;Piriou et al., 2007;Gerard et al., 2009) CP scheme, which is specifically developed towards horizontal resolutions of Â3Á8 km. In standard parametrisations, separate schemes are used for deep convection (DC) and for 'non-convective' (i.e. resolved large-scale) clouds, with microphysical conversion to precipitation treated separately in each scheme. However, at ever-higher resolution, the risk of double counting convective processes increases. 3MT handles this by formerly separating the different contributing processes (see below).
The calculation of cloud fraction follows Xu and Randall (1996). This scheme computes the resolved part of condensation and evaporation. Condensate estimated at this step is saved until the condensate from the convective part of the model is computed and the sum is passed on to the microphysical package. The microphysical processes handle five prognostic water phases, where autoconversion and evaporation are computed level by level (Lopez, 2002;Gerard et al., 2009). Details of evaporation, melting and treatment of sedimentation can be found in Geleyn et al. (1994) and Geleyn et al. (2008), respectively.
The turbulence parametrisation is a pseudo-prognostic Turbulent Kinetic Energy (pTKE) scheme, which is an extension of the Louis type vertical diffusion scheme (Louis, 1979). It is connected to the turbulence scheme from Cuxart, Bougeault and Redelsperger [the CBR scheme, Cuxart et al. (2000)] through the calculation of turbulent mixing lengths, adapted from Redelsperger et al. (2001). Further information of the direct implementation can be found in Geleyn et al. (2006).
The deep CP is a mass-flux scheme based on two prognostic variables: vertical velocity relative to the environment and a mesh fraction of the updraught. Gerard et al. (2009) describe the mesh fraction as approximately the horizontal extension of the updraught. Instead of a mean grid box value, this provides an updraught that is concentrated to a fraction of the grid box. The prognostic variables enable the convection to evolve gradually with time, and precipitation can fall at a later timestep and in a different grid box than the convection process started. The prognostic closure is based on vertically integrated moisture convergence where the resolved moisture in a given timestep either leads to condensation in convectively active layers or increases the mesh fraction of the updraught. There is a clean separation between convective condensation and vertical transport fluxes, where the DC condensates are combined with corresponding large-scale condensates before entering the microphysics. Another feature that enhances the realism of the CP is that the downdraught is handled completely decoupled from the updraught. It is parametrised using the same prognostic variables as the updraught, being largely constrained by the total precipitation enthalpy fluxes (Gerard et al., 2009).

Experiment design and evaluation data
3.1. Model setup HARMONIE in climate mode (from here on HCLIM) was setup, using ALARO physics, on a curvilinear grid with a conformal Lambert projection over Europe ( Fig. 1) at two resolutions, 15 and 6.25 km, HCLIM15 and HCLIM6, respectively. Integrations have been performed from 1998 to 2007 with 4 months of spinup time prior to this. The model is forced by ERA-Interim [ERA-I, Dee et al. (2011)] at the lateral boundaries and by ERA-I sea surface temperature (SST) every 6 hours, with linear interpolation in between. An eight-point wide relaxation zone relaxes the solution towards the boundary data following Davies' (1976) formulation. For both simulations, the hydrostatic option of the dynamical core has been used in order to assess the differences only in resolution. The domain size is 300)320 grid boxes in the horizontal for HCLIM15 and 720)800 for HCLIM6. Both have 60 hybrid coordinate levels in the vertical (see Table 1 for comparative pressure levels), and a time step of 360 and 180 seconds in HCLIM15 and HCLIM6, respectively. In all figures, the extension zone (which extends dynamical fields to ensure periodicity over the whole domain) of 11 grid points has   Table 2 lists the various data sets used for evaluating the HCLIM results. To the extent possible, we compare with gridded observational data sets, primarily the E-OBS data set (Haylock et al., 2008). E-OBS consists of station data of daily precipitation and temperature that have been interpolated to a 0.228 (Â25 km) resolution over Europe.

Evaluation data
Although the gridded data are unique in its spatial extent combined with a high station density, E-OBS is not ideal to evaluate a 6 km model and therefore mainly used to assess the large-scale climate patterns of precipitation and temperature. Therefore, we also use gridded high-resolution subcontinental data sets ( Table 2). As observations of total cloudiness and surface energy fluxes are sparse and associated with large uncertainties, we use both ground-based, satellite-based as well as modelbased (reanalysis) data in the evaluation.
Observations of any climate variable are associated with constraints and limitations for a number of reasons, for example, sampling frequency and coverage limitations, deficiencies in measuring devices as well as calibration issues. As this study emphasises the evaluation of simulated precipitation, we briefly discuss issues specifically associated with precipitation observations. The use of point measurements (gauges), whether or not interpolated to a grid using more or less advanced interpolation methods, is standard in model evaluation, and is also employed here. However, systematic biases in gauge-based observations of precipitation can be substantial, especially in winter at high latitudes and in regions with complex terrain (Adam and Lettenmeier, 2003). The main factors contributing to the biases are wind turbulence induced undercatch, wetting losses, evaporation losses and underestimation of trace amounts (Yang et al., 2005), all conspiring to lower estimates compared to the real values. For the Baltic region, for example, Rubel and Hantel (2001) estimated that uncorrected observations underestimated actual precipitation by 2Á5% in summer and 25Á50% in winter. Furthermore, gauge measurements are, in principal, restricted to land areas. As they only sample events that occur over the gauges themselves, the gauges generally under-sample localised events, a prominent feature during the convectively active summer season. In complex terrain, observations tend to be sparse and gauges are most often placed in valleys (Frei and Scha¨r, 1998). This can lead to substantial underestimation of precipitation, especially extreme events during the winter season. Consequently, comparison of model grid box averages with station-based observations (although employing advanced interpolation methods) should be done with care. Using high-resolution (both in time and space) observational data sets (Table 2) may improve on such issues, by reducing the sampling errors and more realistically represent the spatial and temporal variability. The seasonal and monthly averages calculated in this study are based on 10 yr of daily data. A standard Student's t-test is applied to address the significance of the differences between observations and HCLIM, and in the seasonal maps of precipitation, only statistically significant differences at the 5% level, at each grid point, are shown. Wernli et al. (2008) introduced an object-based quality measure developed as a means to identify, compare and verify relevant features and characteristic attributes of simulated and observed fields of precipitation. The method addresses three distinct aspects of the field, namely Structure, Amplitude and Location, in short SAL. A key property of SAL is that by taking into account the structure of the precipitation field, SAL provides information about the physical nature of the processes involved (e.g. regarding scattered convective cells or frontal rain bands etc.) and how the simulation of these characteristics compare to observations. During the last decade or so, as the resolutions of numerical models used within the NWP community have gradually increased, several methods such as SAL have been developed for verification purposes. The foremost reason is that with higher grid resolution (B10 km) precipitation fields acquire more regional and local structures, especially considering intermittent convective rainfall. Therefore, in a situation where a local feature being forecast is somewhat displaced compared to observations may generate large errors in standard grid point quality measures such as RMSE, an issue commonly referred to as 'double penalty' (Ebert, 2008;Wernli et al., 2008). SAL on the other hand adopts a regional perspective, thereby avoiding a quality assessment of point-wise predictions. For the same reason, even though specifically developed for NWP verification purposes, SAL can be used to assess the quality of climatological data sets, for example provided by climate models (Fru¨h et al., 2007;Wernli et al., 2008). A prerequisite is that the model does not diverge too much in space and time from observed reality in terms of large-scale circulation patterns and dominating weather conditions, and thus forcing by reanalysis and/or applied nudging are necessary for a meaningful SAL analysis.

The SAL method
The amplitude component (A) is the normalised difference of the domain-averaged precipitation amounts. A can take values between 92, where positive values mean an overestimation and negative values an underestimation of precipitation. For example, overestimations of a factor of 1.5 and 2 lead to values of A00.4 and 0.67, respectively. The location (L) and structure (S) components require coherent objects to be identified in both simulated and observed fields. We follow the method used by Wernli et al. (2008), which involves performing contour searching, where the contour is a suitably chosen threshold that leads to distinct objects being identified. In their study, they find a threshold of 1/15 of the maximum precipitation value in each field suitable, and this is used here. To focus on mesoscale structures such as mesoscale convective systems, we remove small-scale objects that are unresolvable by the models analysed here. To achieve this, we apply a convolution procedure that smooths the field before thresholding the data. For unforced convection, Rasmussen et al. (2012) studied how well spatial correlation patterns of summer precipitation for the central United States were simulated by two RCMs. They showed that the models and observations were more strongly correlated when compared on larger scales, and their main conclusion was that model precipitation should be compared with observations on scales of at least around 100 km. Therefore, the convolution here is performed with a square of size 9 )9 grid points replacing the precipitation in one grid point with the average within the square whose centroid is located at that grid point.
The S component compares the volumes of the separate precipitation objects and contains information about the size and shape of the objects. As with amplitude, S can have values between 92, where negative values are indicative of a simulated field having too small and/or too peaked objects, and positive values of the objects are too large and/or 'flat'. For instance, in the case that the model predicts widespread, stratiform-like, precipitation instead of observed small-scale convective events, this would render a large S value. The location (L) consists of two additive subcomponents. The first accounts for the location of the centre of mass for the domain wide precipitation field, and the second considers the averaged, weighted distance between each objects' centre of mass and the centre of mass of the total field. This second subcomponent is necessary to account for the fact that many different precipitation fields can have the same centre of mass. Adding the two subcomponents, L can have values between zero and two, the former indicating a perfect match.

Results
This is the first time HCLIM has been used as a climate model and also one of few studies where an RCM has been run in climate mode at grey-zone resolution at this spatial scale. We therefore present large-scale climate features simulated by the model before we turn into the higher order and grey-zone scale characteristics. First, we compare the two HCLIM runs to standard evaluation data sets frequently used in the literature (including highresolution observations where possible). Also, we contrast the HCLIM15 results to another, well-established, model system, the Rossby Centre RCA4 atmospheric RCM (Kjellstro¨m et al., 2013), run at a horizontal resolution of 15 km. After establishing the general performance in a standard setting, we address the impact of higher model resolution by examining the HCLIM6 and HCLIM15 simulated climate in more detail focussing on precipitation. To put the HCLIM6 results in perspective with regards to the grey-zone resolution, we include a comparison to another version of RCA, namely RCA3 (Samuelsson et al., 2011), which has been run over Europe at 6 km horizontal resolution forced by ERA40 reanalysis at the lateral boundaries.

Large-scale circulation
The large-scale circulation is well represented in HCLIM, as evident from the empirical probability density functions (PDFs) of daily mean sea level pressure (SLP), see Fig. 2 (the European subregions are defined in Fig. 1). The close resemblance to ERA-I in both seasons as well as separately over land and sea (not shown) is remarkable, especially since no spectral nudging has been applied. It is also noteworthy that HCLIM simulates the daily variability with a higher accuracy than RCA4. These results give confidence that HCLIM accurately simulates large-scale synoptic variability over Europe.
In seasonal mean maps over Europe (not shown), biases in HCLIM are at most 92 hPa in any season. The main differences are positive (higher SLP), seen most frequently over southern Europe, except for summer when there is a positive bias over North Europe. The largest positive bias is seen in winter and with maximum amplitude downstream of the Alps, which could in part be related to problems in the model accurately simulating lee cyclogenesis downstream of the Alps (e.g. Smith et al., 2006). This bias is manifested over the Mediterranean region as a small contraction of the daily distribution (not shown); there are more days in HCLIM (than ERA-I) with values in the midrange and an indication of too few in the lower range. In accordance with this, a small shift to higher values in the distribution is seen for southeast Europe (Fig. 2). Otherwise, the PDFs of daily SLP are remarkably accurate in both HCLIM simulations.  given by the monthly inter-annual variability (depicted by the vertical lines in the figure, representing 91 standard deviation) during most months in North Europe, most significantly in summer (more cloud cover) and winter (less cloud cover). The differences are statistically significant in all months except March, April and September. The difference in clt is also statistically significant for East Europe and West Europe in late spring and summer compared to ERA-I. However, the differences over West Europe are not as large when comparing to CM-SAF. There are (statistically significant) lower amounts of clt in summer in HCLIM compared to CM-SAF (instead of a positive bias as compared to ERA-I). The larger values of clt over North Europe compared to CM-SAF during winter (Fig. 3) should be interpreted carefully. Satellite products are highly uncertain at high latitudes during winter (CM-SAF, 2005;Karlsson et al., 2007). A comparison with the CRU data set (Harris et al., 2013), which is based on surface measurements, does not exhibit this large bias during winter (not shown).

Clouds, radiation and surface heat fluxes
Annual cycles of energy fluxes at the surface over Europe are shown in Fig. 3. Compared to ERA-I, HCLIM underestimates short-wave downwelling radiation (SWd) during spring and summer and overestimates in winter, consistent with the corresponding cloud cover (clt) biases. Note again the larger consistency in clt and SWd over primarily West Europe between HCLIM and satellite data (CM-SAF). Also, comparing SWd in West Europe to seven groundbased observations (measuring towers) from the BSRN (Baseline Surface Radiation Network) (Ko¨nig-Langlo et al., 2013), the annual cycle is well simulated in HCLIM (Fig. 4). The model somewhat overestimates SWd throughout the year, with maximum bias of ca 20 W m (2 . The annual cycle of long-wave downwelling radiation (LWd) in HCLIM has a systematic negative bias for all months and regions compared to ERA-I (Fig. 3) and also  compared to a few BSRN stations (not shown), which is most pronounced in the cold season. Furthermore, HCLIM6 shows a somewhat better agreement than HCLIM15, particularly for West Europe and East Europe. The reason for the general negative bias in LWd is not clear, especially not over North Europe in summer when HCLIM actually overestimates mean total cloud cover. A gross comparison of vertically integrated cloud water and ice, which relates to cloud emissivity, shows that HCLIM has more cloud water and ice during summer in North Europe compared to ERA-I (not shown), consistent with the bias pattern in cloud cover. However, errors in LWd also arise from low-level temperature errors; over North Europe HCLIM is too cold in summer (see Section 4.3), and once this cold temperature anomaly has been developed, LWd fluxes from the lower atmosphere towards the surface are too low. To investigate this in more detail, a more in-depth study of the treatment of clouds (covering more aspects of clouds, such as vertical extent, cloud-base heights and cloud temperatures) and clear sky radiation in HCLIM is needed. However, this is beyond the scope of this study.
In West Europe, mostly due to the underestimation of SWd, there is a shortage of net radiative energy (R n ) (i.e. the sum of downwelling and upwelling short-wave and long-wave radiation) at the surface in spring and summer in West Europe (Fig. 3), which partly acts to produce a cold bias in the near-surface temperature in this region (Section 4.3). In North Europe, too much cloud cover in spring and summer (and thus an underestimation of SWd) is one of the main causes for a deficit in net surface radiation. In winter, the low R n is instead caused by the underestimation of LWd. In East Europe, the only significant difference in R n compared to ERA-I is in winter, again mostly due to too small LWd fluxes.
In the lowermost panel in Fig. 3, the annual cycle of anomalies of surface latent (LH) and sensible (SH) heat fluxes is shown. There are some evident differences between the models and ERA-I, which is partly explained by soil moisture in ERA-I to some degree being constrained by assimilation of near-surface humidity. In West Europe [even more so in south-eastern Europe (not shown)], HCLIM has strong concurrent positive and negative anomalies in SH and LH, respectively, in the summer. RCA4 has a similar, and even somewhat stronger, bias pattern. The strong reversal in SH/LH ratios compared to ERA-I could be due to an excessive drying of soils in summer in these regions. As will be seen later, precipitation is lower compared to observations in summer in southern Europe, particularly in south-eastern Europe, and this is associated with an underestimation of cloud amount, at least compared to CRU and CM-SAF (not shown). This, in turn, leads to a corresponding overestimate of SWd. Together these factors combine to accelerate a drying of the soils and later on during the season to an underestimation of evaporation that then feeds back on precipitation production, causing it to be too low. This is a common problem in many RCMs (Christensen et al., 1997) and also contributes to too high summer temperatures in this region, evident also in HCLIM as discussed below.
In East Europe, the situation is reversed in late spring and early summer: considerable higher LH flux than ERA-I ( Â10Á15%) and a corresponding lower SH flux of similar magnitude. This indicates that the soils are wetter in HCLIM during this season, and as the local correlation between surface heat fluxes and precipitation is strongest during convective conditions, the larger LH flux in HCLIM (and RCA) may be related to too much precipitation simulated for eastern Europe (see Section 4.4). In North Europe, HCLIM overestimates both LH and SH fluxes during winter which probably is caused by a warm bias in this region (Section 4.3). However, the strong negative bias of SH during spring and summer (for both HCLIM and RCA) is harder to explain. HCLIM has generally more precipitation than ERA-I (as discussed later). These wetter conditions, combined with an underestimation of SWd at the surface, most pronounced during late spring and early summer (Fig. 3), may be the reason for lower SH fluxes. 4.3. Near-surface temperature 4.3.1. Annual cycle. The 2 m air temperature, T2m, (Fig. 5) as simulated by HCLIM, is approximately within 92 C from E-OBS. The biases are generally on the cold side with RCA4 mostly as cold or even colder than HCLIM. As a result of too wet conditions and underestimated SWd, as discussed above, the biases are most pronounced in spring, over most subregions reaching 2Á3 degrees lower than E-OBS. A significant exception to the cold bias is the quite large warm bias in winter seen in HCLIM over Sweden. In the present version of HCLIM, lake surface temperatures are not prognostic but use a constant value retrieved from the initialised bottom soil layer. A short sensitivity test using soil temperatures from winter climatology removed the warm bias and also revealed that the influence of the bias during winter is confined to the lower atmosphere and the vicinity of major lake covered regions that become frozen and thereby very cold in winter. In the current release of HCLIM, this deficiency is not easily remedied; however, in a future release of HCLIM a prognostic lake model will be included, thus alleviating this problem. Lastly, there is not much difference between the two HCLIM runs for any of the regions; however, where there is, HCLIM6 is generally closer to the reference data (e.g. in Germany, Switzerland, France and Spain). Figure 6 shows the PDFs of T2m for winter and summer. The width and amplitude of the temperature distributions in HCLIM are similar to observations. In particular, HCLIM captures the skewed tail towards cold conditions in winter in Sweden (to a lesser extent also in Switzerland). These low temperatures are likely related to snow covered areas and very stable anticyclonic conditions, and as seen in Fig. 2, such circulation patterns in winter are well captured in the model. The cold bias in the seasonal mean climate (Fig. 5) is evident in the PDFs as small negative shifts in the temperature distributions (except for winter in Sweden where there is a warm bias).

Daily variability.
There are not any notable differences between HCLIM6 and HCLIM15 in either winter or summer, not even in areas that have complex topography. We further investigated this for alpine regions with steep topography in Scandinavia and in the Alps, by calculating PDFs in the HCLIM runs at their native resolution (not shown). HCLIM6 exhibited somewhat larger, but not statistically significant, probabilities for the temperatures far out in the lower and upper tails of the PDFs in both winter and summer, and were closer to high-resolution observations. As we were limited to only study daily means here, differences between HCLIM6 and HCLIM15 that would show up in for example daily maximum and minimum values cannot be assessed.

Precipitation
The large-scale seasonal patterns of the deviations from E-OBS are quite similar in HCLIM6/15 and RCA4 (Fig. 7). Overall, the models produce larger precipitation amounts compared to E-OBS. Also, ERA-I has more precipitation than E-OBS in all seasons with largest deviations in winter and in spring. In winter and summer, the corresponding continental wide mean values are Â20Á30% larger than E-OBS in HCLIM (Table 3). The winter wet bias is most emphasised in eastern Europe, and drierthan-observed conditions are mostly connected to complex terrain. For summer, a dipole pattern can be discerned: wetter in the north and negative or near-neutral in the south (except in connection to steep topography). The dry biases are mostly concentrated to south-eastern Europe (also seen in dry areas of the Iberian Peninsula and northern Africa) and are likely associated with the positive near-surface temperature bias compared to E-OBS. The elevated temperatures suggest too dry soils affecting the partitioning between sensible and latent heat fluxes in the energy budget so as to conspire to further reduce precipitation, all components closely intertwined in a feedback process (see 4.2. and 4.3.). The longterm monthly mean values reveal that the dry biases becomes more pronounced during summer, being small in June and relatively strong and extensive in August. Thus, the dryingwarming feedback is not due to too dry soils at the start of the summer season but is initiated during the season due to limited precipitation amounts. In autumn, differences are of generally smaller amplitudes and to a lesser extent statistically significant. Spring stands out in the models as both HCLIM and RCA clearly produce excessive amounts over large parts of Europe, most significantly in north and eastern Europe, and over mountainous areas. Corresponding continental mean values are Â60% larger than E-OBS in HCLIM. We emphasise again that E-OBS suffers from undercatchment problems, especially in winter and spring as well as in mountainous areas. Also noteworthy is that HCLIM has mostly smaller deviations towards E-OBS than RCA for all seasons.
For the domain as a whole, comparing the time-averaged spatial fields, the normalised standard deviation [or Coefficient of Variance, CV(std)] is within 10% from the observations in both summer and winter (Table 3). Moreover, the spatial correlation coefficient (R) is !0.65 (Â0.9 in summer), and the normalised rmse [or Coefficient of Variance for rmse, CV(rmse)] is Â30Á40% of the observed mean value. For spatially averaged fields (i.e. domain area averaged time series of daily data), CV(std) is within 15% of observed values and R relatively high ( !0.60 in summer, !0.80 in winter) and CV(rmse) is Â30% of the J a n observed time average. These results indicate that both the spatial pattern of seasonal precipitation and the daily variability of continental-scale precipitation are well represented in HCLIM compared to E-OBS. Furthermore, Table 3 shows that HCLIM6 improves the results compared to HCLIM15, mostly in terms of mean and rmse values (mostly with statistically significant improvements, i.e. non-overlapping 95% confidence intervals). Figure 8 shows the phase and amplitude of the annual cycle of precipitation over six subregions of Europe. Both HCLIM simulations and RCA4 show consistently larger precipitation amounts in most months, although HCLIM remains within the monthly inter-annual variability of E-OBS. The exceptions are in spring/early summer in Norway, Sweden, Germany and France where differences are statistically significant. Over Switzerland, with its steep topography, HCLIM is within one standard deviation from E-OBS, for each month RCA4 displays an annual cycle almost the reverse of observations. In Norway, the positive bias compared to E-OBS turns into a negative bias compared to the high-resolution MetNo observations. The deviations in spring and early summer are reduced when including the MetNo data; however, the cold season differences increase. The very large spread between E-OBS, models and MetNo in winter clearly demonstrates the issue of model evaluation of cold season precipitation in areas with steep topography. Correction factors due to undercatchment have been applied to the MetNo data and not in E-OBS which explains a large part of the differences between these data sets. Figure 8 shows that the inclusion of high-resolution data in the analysis somewhat reduces deviations (e.g. in Sweden, France and Spain), although the improvements are mostly statistically insignificant. As all data have been aggregated onto the coarsest grid (E-OBS at 25 km), some information is lost which may mask possible benefits of using high-resolution observations. Figures 9 and 10 present empirical PDFs based on daily mean precipitation for a number of subregions for winter and summer, respectively. Here, we concentrate on wet days. Kjellstro¨m et al. (2010) investigated the sensitivity of threshold for defining wet days. They concluded that because many RCMs have too many days with low drizzle amounts (i.e. B1 mm/d) compared to observations, 1 mm is an appropriate threshold for analysing wet days and we apply the same threshold in our analysis. As we are mostly interested in extreme events, the inset in Figs. 9 and 10 depict days with !50 mm/d in the HCLIM runs at their native resolution to be able to compare to observations before smoothing by aggregating to the E-OBS 25 km grid. The comparison is then made to high-resolution observations, which have been aggregated onto the HCLIM6 grid.

Variability and extremes.
HCLIM reproduces the observed distributions in terms of both frequency and intensity, at least qualitatively. In summer, there is an indication of systematic lower probabilities (note the logarithmic scale) in HCLIM in the frequency of moderate-to-strong events (10Á50 mm/d) most prominently in Sweden, Germany and Switzerland. In winter, the agreement is better compared to summer. The most significant is that HCLIM generates higher probabilities in Sweden than both E-OBS and PTHBV for strong to very strong events ( !50 mm/d).
Multiplying the raw histogram numbers by the mean precipitation rate in each bin yields the total precipitation amount contributed to the seasonal mean by each bin. Differences between models and observations for the rainfall intensities that are contributing most to the seasonal accumulations are clearly seen in Figs. 11 and 12. For instance, although the PDFs indicate that HCLIM has lower rain rate probabilities for moderate-to-strong intensities in summer (Fig. 9), Fig. 12 shows that HCLIM instead overestimates precipitation totals for the low-to-moderate intensities (Â1Á10 mm/d), that is, at the rate that contribute most to the total amount. This has a significant impact on the mean values. An even more prominent overestimation Units are mm/day for mean and root mean square error (rmse). nstd is the normalised standard deviation and R is the coefficient of correlation. R and rmse are calculated between the HCLIM runs and E-OBS. Subscripts S and T refer to spatial and temporal, respectively. The uncertainty range is given by the 95% confidence interval from bootstrap computations using 1000 samples.
of the low-to-moderate rate range is found for spring (not shown), a feature that is clearly reflected in the mean estimate (Fig. 7). In winter, there is a better agreement between HCLIM and observations. The location of the peak of the curves in Figs. 11 and 12 is generally well captured by HCLIM. It indicates that the model distributes the total accumulated precipitation well between the different intensity rates, but as mentioned earlier, because of the larger amplitudes at low-to-moderate intensities, there is a problem with the frequency being too high in this range. An exception is Switzerland during summer where HCLIM simulates the peak at lower intensities ( Â4Á6 and Â8Á10 mm/d, respectively) but with larger amplitude resulting in a relatively high seasonal mean in agreement with observations ( Figs. 7 and 8).
Of primary interest, though, is precipitation extremes, and whether or not there is added value increasing the horizontal resolution into the grey-zone scale. For the very rare events, !50 mm/d, HCLIM6 generally outperforms HCLIM15 (see inset figures for Germany, Switzerland and France in Fig. 10 and Sweden, Germany and France in Fig. 9). HCLIM6 has higher probabilities for extreme events, in closer resemblance to high-resolution observations, and the signal is more coherent. One may question the fairness of comparing HCLIM15 to observations aggregated onto the HCLIM6 grid and not onto the HCLIM15 grid. We tested this by aggregating observations to both resolutions, but the conclusions did not change (not shown). Extreme precipitation is thus better simulated compared to high-resolution observations at the higher model resolution. The added value is a strong indication of the benefit of running HCLIM within the grey-zone resolution range. Included in Figs. 9 and 10 are also results from RCA3. During winter it is clear that RCA3 overestimates precipitation in the higher end of the J a n  distributions, especially in Germany and Switzerland. For the same regions, in summer, RCA3 simulates extremes better than HCLIM6, while in France RCA3 clearly overestimates the precipitation (where HCLIM6 excels). These results suggest that another RCM, RCA3, with a convection scheme not particularly developed for grey-zone resolutions, sometimes is able to give similar results for summer precipitation extremes (when convective events dominate) as HCLIM6. However, the RCA3 results for winter show indications of model deficiencies not apparent in HCLIM.

Spatial characteristics of precipitation
In this section, we use the SAL method (Section 3.3) to evaluate spatial characteristics and attributes of precipitation fields simulated in the model. We focus on the summer season because of a relatively larger proportion of mesoscale convective precipitation, when high model grid resolution will likely be beneficial. Also, these are situations when models using CP may be more prone to experience symptoms of greyzone peculiarities, and thus SAL can provide quantitative information valuable for assessing model differences, especially between HCLIM6 and RCA3. Even though the latter in some regions has better representations of the summer PDFs of precipitation, is the realism greater in HCLIM considering its use of a CP scheme adapted to these higher resolutions?
To assess model performance using SAL, comparisons with 'persistence' and 'random' forecasts are made with the same approach as in Wernli et al. (2008). Here, persistence and random are based on observational datasets and are independent of the different climate models. A persistence forecast is constructed from observations but shifted 1 d forward in time. Random forecast refers to observations which have been randomly shifted around with the constrain that each daily precipitation field from observations is only used once. Comparing models relative to these standard reference forecasts provides a measure of the quality of simulated precipitation. It must be noted here that in contrast to NWP where models are initialised from an observed state, even within the domain, the climate runs here are only constrained by the boundaries and not by any initial state. Therefore, whereas NWP models stand a larger chance of beating persistence and random, as they should at least on short lead times, the same performance cannot be required from climate models which not necessarily aim to accurately forecast time and location of precipitation. In an effort to simplify, we constrain the analysis to areas with predominantly flat topography, thereby reducing the difficulty in interpreting model behaviour in complex terrain. By selecting areas that have altitudes below 800 m (as defined by model topography), we limit the analysis to some extent to unforced convective processes. The statistical significance of the calculated SAL scores is assessed with a bootstrap method (Efron and Tibshirani, 1993). Because precipitation is auto-correlated in time, we apply a block bootstrap, using a block length of 7 d, and the number of bootstrap samples is 5000. This results in 5000 estimates of medians for each SAL score considered, and these were then used to produce 5000 differences in medians between models. If the 0.5Á99.5% confidence interval for the difference does not include zero, then the difference is significant at the 1% level. Figure 13 depicts SAL scatter plots for daily mean precipitation fields for the Alps region (Fig. 1) and for France (see also Table 4). In each individual analysis, the comparison is made on the coarsest grid resolution between the model and observations. As the L component is very similar among the models, we focus on S and A. As a reference, the inter-quartile range (IQR) of random and persistence, depicted as rectangles, is included in the figures. First of all, the models have reduced spread, especially in the A component, compared to random and persistence, giving credit to the model performance in general. The scatter of dots appears quite similar among the models; however, a closer inspection reveals some important distinctions. In France, for example, a larger proportion of dots in RCA3 is assembled in the first (top right) quadrant, reflected in a positive median value of S (0.31) and A (0.62). This indicates that RCA3 typically produces too extensive and/or flat rain spell areas compared to observations. In this respect, HCLIM has better realisations of the spatial characteristics, as it reduces both the S and A median value, becoming more centred around zero. HCLIM6 has a median of S and A of (0.02 and 0.04, respectively, with an IQR of 1.4, the medians being statistically different from RCA3. More or less the same applies for the Alps region; while RCA3 still has too widespread and/or flat rain spells, HCLIM shows improvements in S, while the negative values for amplitude reflect an underestimation of precipitation (Table 4). As seen in Table 4, HCLIM6 has S median values closer to zero in all regions studied, the differences with RCA3 all being statistically significant. A further breakdown of the SAL values into 'dry', moderately wet and wet days, defined as the lowest 25%, middle 50% and top 25% of all the observed wet days, reveals that both models have positive and the largest absolute scores in all three components during 'dry' days (not shown). This is probably related to the issue of drizzle commonly seen in numerical models with CP, and thus days with very small observed amounts become more widespread in the models resulting in poor scores. We note that HCLIM suffers the least with this problem, as the scores in this category have the lowest median values (although the spread is sometimes marginally larger). The scores improve gradually towards wetter days, both in the median and in the IQR.  Scatter plots of the SAL components structure (S, abscissa), amplitude (A, ordinate) and location (L, colour of dots) for two regions. The Alps (left column) and France (right) and for three simulations: HCLIM15 (top row), HCLIM6 (middle) and RCA3 (bottom). The box plots shows the distributions in S and A, representing 25th, 50th and 75th percentiles (the box), and the 5th and 95th percentiles (whisker). Grey and magenta rectangles illustrate the IQR of 'random' and 'persistence' forecasts, respectively.
Based on these results, we draw the conclusion that, at least in the analysed regions, HCLIM6 generally improves on the spatial structure of the precipitation fields (S closer to zero) compared to RCA3 (and also to HCLIM15, although to a lesser extent) by having smaller and/or more peaked rain spells in better agreement with observations. To further concretise the SAL scores, Table 5 shows quantitative information of the spread of dots between the models and compared to persistence and random forecasts. A radius, r, is defined that represents the distance to the origin (i.e. a perfect score) spanned by the three components in the three-dimensional SAL space. This radius is then calculated for the 5%, 10%, 20% and 50% collection of dots closest to the origin for each model and observation separately. Generally, the models produce better results than both persistence and random; the exception is in Norway, where HCLIM6 is closest to persistence. Moreover, 20 and 50% of the model's best scores are better than around 10 and 25% of the best random forecasts. Except for the Alps region, HCLIM6 has smaller radii than RCA3 in nearly all categories, most evidently in France. Even though HCLIM6 has a somewhat larger spread in the Alps compared to RCA3, we reiterate that Fig. 13 shows a clear shift in the distribution of dots in RCA3 towards larger S values not evident in HCLIM6.
A couple of points should be mentioned. First, the inherent uncertainties in the observations; although we have used, as far as possible, high-resolution gridded observations, these are mostly based on point measurements (gauges), and thus the effective resolution may vary over the grid. This may impact the analysis of spatial structure in a negative way, and we have tried to handle this issue to some degree by smoothing the fields before analysis. Second, only daily accumulated fields are used here, which in effect prohibits a proper investigation of spatio-temporal evolution of convective processes that have a characteristic time scale on the order of a few hours. Such an evaluation, studying subdaily summer precipitation statistics in HCLIM will be the core subject in a forthcoming study.

Discussion and conclusions
This study describes the performance of a new extensive system, HARMONIE, in a climate configuration with a resolution within the grey-zone scale. We have shown that not only the large-scale climate is very well represented but also on the regional scale and for higher order climate statistics the model is in overall good agreement with observations and generally shows better performance than another state-of-the-art RCM, RCA4. This leads us to conclude that HCLIM is a suitable tool for investigating future projections over Europe. The value in parenthesis after the median represents the inter-quartile range (IQR). Numbers in bold style indicate that the difference in the median compared to RCA3 is significant at the 1% level. See text for more details. Crucially, we have shown that decreasing the grid box size from Â15 to Â6 km results in a more realistic representation of extreme precipitation occurrences. A few model issues have been addressed for further development. Particularly in North Europe during winter there is a near-surface warm bias of 2Á68C with maxima confined to inland water due to the treatment of skin temperature on these surfaces. For future model revisions, a lake model is being implemented to improve this inadequacy. There is a systematic underestimation of downwelling long-wave radiation at the surface over continental Europe, even in spring and early summer when there is an underestimation of downwelling short-wave radiation consistent with too much cloud cover. This, in combination with too extensive precipitation in this season (as much as Â60% greater amounts than E-OBS), leading to an excessive cooling effect from increased latent heat flux due to wetter soils, probably explains some of the spring cold bias. Also, the overestimation of cloud amount during spring and summer seems to be a real deficiency in the model, but again, cloud amounts are very hard to constrain from observations. A detailed analysis of precipitation reveals that not only the seasonal and monthly temporal and spatial patterns, including the inter-annual variability, are well reproduced in HCLIM, but also the full distributions based on daily averages are similar to observations. Nevertheless, the model has some identifiable problems with precipitation production for certain seasons and regions. There are higher amounts of precipitation compared to E-OBS in winter and spring, which is also seen in RCA4 (even larger bias in this model) and other RCMs. In these seasons, observations like E-OBS suffer from undercatch issues and may artificially amplify apparent model biases (Rauscher et al., 2010). The large excess of precipitation in spring (and to a lesser extent in summer) is attributed to the too frequent lowto-moderate intensity events in the model (which is the range that contributes most to the seasonal accumulation). For moderate-to-strong precipitation events, the model instead has too few events.
Added value with HCLIM at a grey-zone resolution is seen in both the precipitation intensity distributions as well as in the physical consistency, that is, the realism, of the simulated precipitation fields. Compared to more typical RCM resolutions, using high-resolution observational data sets and keeping the models on their native grid, very rare and high-intensity precipitation events are well reproduced in HCLIM. More importantly, HCLIM6 captures the frequency and intensity of these events better than HCLIM15 and in closer agreement to the high-resolution observations. There is also some indication of a decrease in the bias in HCLIM6 compared to HCLIM15 at the other end of the precipitation distribution as well, signified by a higher dry-day probability in HCLIM6 than in HCLIM15 (see also Fig. 12).
Similar conclusions have been demonstrated in other studies (e.g. Berg et al., 2012;Lucas-Picher et al., 2012;Fosser et al., 2014). Rasmussen et al. (2011) reported that in order to represent the spatial and temporal distribution of annual snowfall to a high level of accuracy in Colorado in the United States in the WRF model, a horizontal resolution of 6 km or below was needed. In simulations of a number of winter seasons in Arizona (US), Sharma and Huang (2012) showed significantly increased precipitation amounts increasing resolution from 12 to 6 km (but not so much from 6 to 3 km) in an RCM, and especially extreme events ( !21 mm/d) were seen in the 6 km run but not in the 12 km simulation. However, these studies have been limited to relatively small domains and/or focussed on the cold season when synoptic scale processes dominate compared to the summer season when convective processes are more important. In our results, we can extend these conclusions to apply to larger regions (Europe) which experience a high variety of surface characteristics (e.g. topography) and for both winter and summer. We have shown that a major advantage of higher resolution is the representation of rainfall at the extreme high-intensity tail of the distribution where such events are of high societal importance, due to their association with flooding, erosion and water damage (e.g. Trenberth, 2011).
The object-based SAL analysis performed on HCLIM and RCA3 summer precipitation implies that the realism of rain spell features in terms of size and volume is significantly better represented in HCLIM6 than in RCA3, both run at 6 km resolution. This highlights the importance of using formulations adopted for grey-zone applications. To accurately simulate precipitation and its extremes, one not only needs a model to actually represent the full intensity spectrum in a grid point analysis (like the estimation of PDFs in this study). These should also be represented in a realistic manner, reflected in the full spatial and temporal characteristics of precipitation as a function of the meteorological situation. In this study, we are limited to daily temporal resolution. For short duration, highly localised extreme convective precipitation events, the important processes involved usually have characteristic time scales of a few hours or less. Gerard et al. (2009) showed that the convection scheme in HARMONIE, 3MT, is able to perform well down to 4 km resolution in a NWP setting where more traditional convection schemes fail, especially in subdaily time scales, leading to increased realism of the time evolution of convective processes. Even though no such assessment could be done here, the few aspects we have investigated similarly indicate that HCLIM better simulate the spatial characteristics. This, in addition to be able to capture extreme events better than at coarser resolutions, leads us conclude that running HCLIM within the grey-zone resolution is both useful and suitable. To fully appreciate the benefit of running HCLIM at high horizontal resolution in a climate setting, though, subdaily statistics of extreme precipitation events (primarily of convective origin) should also be analysed, which is the subject of an upcoming study.