A low-cost and repeatable procedure for modelling the regional distribution of Natura 2000 terrestrial habitats

The present paper describes a procedure for mapping the distribution of Natura 2000 terrestrial habitats (Habitats Directive 92/43/EEC) at the regional scale (Lombardy, Northern Italy) by means of open-source software (QGIS and R). The habitat map within Natura 2000 sites was used for modelling the regional distribution of three selected habitats, by applying classification trees on freely available and fine-scale resolution environmental layers. Land use and forest type maps were combined to refine the regional distribution of selected habitats. The statistical validation showed a fairly substantial overall accuracy of predicted habitat distribution, which was used to determine the regional extent of the habitats and to evaluate the regional effectiveness of Natura 2000 network. We provide an easy and inexpensive procedure, replicable in other contexts in which just basic information on Natura 2000 terrestrial habitats are available, and usable for habitats monitoring according to the Habitats Directive. ARTICLE HISTORY Received 21 November 2017 Revised 4 July 2018 Accepted 7 November 2018


Introduction
The Habitats Directive (92/43/EEC) provides the legal basis of Natura 2000 (N2K), a network of protected areas established across all Member States (MS) of the European Union (EU) for the conservation of selected targets, i.e. natural habitat types (to which we will refer from here on as 'EU habitats') and species listed in the Annexes of the Directive. The monitoring and the 6-yearly reporting of the conservation status of EU habitats are among the requests made by the European Commission to all the MS. These requirements involve comprehensive, reliable and updated distribution maps of EU habitats, not just inside the N2K network (Evans & Arvela, 2011). However, these kind of thematic maps are usually lacking in several EU territories, especially outside N2K network; hence, data concerning the spatial distribution of EU habitats are extremely heterogeneous (Gruber et al., 2012;Maiorano et al., 2015;Rosati, Marignani, & Blasi, 2008;Viciani, Lastrucci, Geri, & Foggi, 2016). The most complete available information on EU habitat distribution refers to maps of 10 × 10 km grid resolution or coarser (http://data.copernicus.eu/data-access.html; EEA, 2014;Mücher, Hennekens, Bunce, Schaminée, & Schaepman, 2009), while more detailed information are often available only for some regions, or moreover only for N2K sites (e.g. Bertacchi, 2017;Viciani, Dell'Olmo, et al., 2016;Viciani, Dell'Olmo, Vicenti, & Lastrucci, 2017). To make EU habitat monitoring feasible, mapmaking should be started from fine-resolution spatial data (i.e. at the scale of N2K sites) and involve consistent and inexpensive procedures, respectively based on existing free available data and statistical methods.
Many machine-learning procedures have been used for habitat modelling in the frame of different research topics (e.g. Baselga & Araújo, 2009;Ferrier & Guisan, 2006;Guisan & Zimmermann, 2000;Thuiller, Araújo, & Lavorel, 2003). Among these, classification trees have been identified as one of the most effective techniques in GIS modelling (Muñoz & Felicísimo, 2004). For example, they have been successfully applied to map EU habitat spatial distributions at a coarse scale (10 × 10 km grid scale) across Europe (Mücher et al., 2009).
The aim of the present work was to define a straightforward procedure for implementing a fine scale regional distribution of EU habitats, knowledge of which is necessary and urgent to satisfy requests related to the 6-yearly reporting under articles 11 and 17 of the Habitats Directive. As a case study we selected the administrative region of Lombardy (Northern Italy), where an integrated map of EU habitat distribution is currently available only within the N2K network (Brusa, Cerabolini, Corti, & De Molli, 2016), while outside, which is a large part of the regional territory (∼ 85%), information is completely lacking, The procedure was specifically developed to map the three main structural types of terrestrial EU habitats (grassland, scrub and forest formations) by using freely available maps concerning environmental factors and land use. As an application of our procedure, we estimated the regional distribution of three selected target EU habitats, one for each structural type, widely spread in the study area but also occurring in many EU MS (http://eunis.eea.europa.eu/habitats): Mountain hay meadows (EU habitat 6520) for grasslands, Alpine and Boreal heaths (EU habitat 4060) for scrubs, and Asperulo-Fagetum beech forests (EU habitat 9130) for forest formations.

Study area
The study area corresponds to the whole territory of Lombardy, an administrative region of Northern Italy, which spans a surface area of 23,870 km 2 between 44°40'-46°37'N in latitude and 8°29'-11°25'E in longitude ( Figure 1).
Lombardy spreads across the Alpine and Continental biogeographic regions (ETC/BD, 2006), and its bioclimate types range from continental to oceanic (Pesaresi, Galdenzi, Biondi, & Casavecchia, 2014), with a large variability of mesoclimate due its complex orography. Geological substrates consist of a wide range of litho-types, including silicate or carbonate rocks, morainic and alluvial deposits. The high variability of environmental factors is consistent with the occurrence of a variety of plant communities, ranging from evergreen Mediterranean oak forests by Lake Garda to Alpine tundras of the highest mountains, and justify the large number of EU habitats present in the study area (Brusa, Cerabolini, Dalle Fratte, & De Molli, 2017): 58 on the whole, 49 terrestrial (code = 2, 4, 6, 7, 8 and 9) and 9 aquatic (code = 3).
The Lombardy N2K network occupies approximately 15% of the whole regional surface ( Figure 1) and consists of 193 Sites of Community Importance, or Special Areas of Conservation, and 67 Zones of Special Protection.

Selection of data sources
A schematic diagram of the development of the whole procedure is shown in Figure 2. We selected 13 layers concerning regional maps of EU habitats, environmental factors, land use types and geographic boundaries, all freely available from online data sources (Table 1): the Figure 1. The study area of Lombardy administrative region (Northern Italy) delimited with a bold black line in the inset map, and the actual extension of the N2K network in green. The red line is the limit between Alpine (northward) and Continental (southward) biogeographic regions.
We selected layers concerning environmental factors according to their supposed strength in explaining EU habitat regional distributions (Biondi et al., 2010;Brusa et al., 2017). This latter was obtained from the regional map of land use types (DUSAF, 'Destinazione  d'Uso dei Suoli Agricoli e Forestali' = 'Intended use of agricultural and forest land') and/or from the map of forest types (PIF, 'Piano d'Indirizzo Forestale' = 'Forest management plans'). We used geographic boundaries in the last phase of procedures to impose biogeographic constraints based on regional distribution of plant communities corresponding to EU habitats reported in literature (AA.VV, 2014; Biondi et al., 2010;Brusa et al., 2017).
Original layers in vector format were transformed into raster (adjusted to the DEM 20 m resolution, which was the lowest resolution of the input variables), and georeferenced to the regional coordinate system (UTM32N-WGS84, EPSG 32632), in order to perform computing operations.

Relationship between EU habitats and land use types
To assess the correspondence between one or a group of EU habitats and land use types, we firstly calculated how much of each EU habitat was represented in each land use type within the N2K regional network, by means of GIS intersections. Afterwards, for each land use class we selected the EU habitats most represented and congruent in terms of vegetation and ecology for that land use type Brusa et al., 2017;Del Favero, 2002). Forest EU habitats were selected from woodland land use types and further investigated using the forest types map (PIF). We uniquely assigned some forest types in PIF to a specific forest EU habitats, while for forest types in PIF presenting a wide physionomic and ecological range we modelled EU habitats as detailed in the next paragraph.

Dataset construction and EU habitat modelling
When a specific land use (or forest) type corresponded to a group of EU habitats (the most common case), we computed a model to split EU habitats from one another according to their ecology.
We used the training subsets for modelling EU habitats in the same group (response variables) by means of conditional inference trees (Hothorn & Lausen, 2003;Lausen & Schumacher, 1992), considering environmental factors as predictive variables. This technique estimates a classification relationship by binary recursive partitioning of predictive variables in a conditional inference framework (Hothorn, Hornik, & Zeileis, 2006); conditional nodes were evaluated through Monte Carlo testing (p-value < 0.01). The validation subset was used to check the model prediction efficiency, comparing observed and predicted distribution by means of confusion matrices. For each model, we computed different commonly applied indexes: (i) on the whole model, the proportion of correctly classified (accuracy, and its 95% confidence interval), no-information rate (NIR) and Cohen's Kappa (Cohen, 1960), sensitivity (% of positive effects) and specificity (% of negative effects), and (ii) for each EU habitat, sensitivity and specificity.
We fixed an acceptance threshold for a high overall accuracy at 85% (Foody, 2002). The NIR was used to evaluate whether the accuracy was higher than the proportion in the observed values of the largest class by means of a one-side binomial test, which performs a simple null hypothesis test about the probability of success in a Bernoulli experiment (i.e. a binomial trial) (Hollander, Wolfe, & Chicken, 2015).

Map production
For each EU habitat, we firstly mapped the potential distribution in GIS using the algorithms computed by the classification trees, and subsequently the estimated distribution through intersecting its potential distribution and the corresponding land use class (i.e. DUSAF, and eventually PIF for forest EU habitats). When a single EU habitat was assigned to two or more different land use types, we computed its regional distribution by merging all its estimated distribution in GIS. As an example, the estimated distribution of EU habitat 9130 derived by the sum of (a) direct assignation of PIF classes to the EU habitat, and (b) output from the model applied on the PIF class 'not-classified beech forests'.
The final output was the regional distribution of a given EU habitat, obtained filtering the estimated distribution by its geographic limits, based on well-established technical and scientific knowledge (e.g. data on presence/absence in administrative areas or biogeographic regions) (AA.VV., 2014; Biondi et al., 2010;Brusa et al., 2017). In our example, we applied filters to the estimated distribution of two target EU habitats: habitat 6520 was excluded from administrative province of Varese and from the Apennine belt; habitat 4060 from the low Po plain.
For the validation of regional distributions, we used the cartography of N2K network, as this is currently the only existing source concerning the real distribution of EU habitats in the study area. For each EU habitat, we divided the N2K network into two parts: the area occupied from a target EU habitat and the remaining N2K area (i.e. the area of nontarget habitats). To overcome this unbalanced design, due to extensive differences between areas of a target EU habitat (smaller) and that of non-target habitats (wider), we selected random points (N = 1000) stratified in each of these two areas (Hirzel & Guisan, 2002). For each point we assessed the presence/ absence of target EU habitat, both observed (N2K map) and predicted (regional distribution in N2K). Finally, we computed a 2 × 2 confusion matrix and the corresponding statistics as reported for EU habitat modelling (paragraph 2.4).

Results
Within each of the three structural types of terrestrial EU habitats, a group of EU habitats was assigned to one or two land use (DUSAF, and eventually PIF) classes (Table 2), and then a classification model was calculated for each of these groups (Supplementary material).
One of the advantages of using conditional inference trees (Hothorn et al., 2006) is that they automatically remove variables without a statistically significant association with the response (p > 0.01); in our study, they removed radiation for grasslands, slope for scrubs, pH and soil for forests (Supplementary material). Accordingly, substrate and precipitation were the most effective environmental factors, determining a statistically significant effect in all of the three models. Elevation and radiation were also significant, the former for grasslands and scrubs, the latter for scrubs.
The accuracy values were statistically higher than the NIR values in all models (Table 3). High values of overall accuracy were computed for scrubs and forests. However, the model of forest EU habitats was the most accurate, with a substantial agreement between observed and predicted classes according to the Kappa value. The three models showed high rate of specificity (> 84%) for each EU habitat, while in contrast, sensitivities showed a high variability, in particular for grasslands (between 64% and 89%) and scrubs (between 70% and 96%). TSS values ranged from fairly-moderate (only two cases) to substantial agreement between observed and predicted EU habitats: between 53% and 88% for grasslands, 58% and 95% for scrubs, 75% and 92% for forests.
The accuracy values of each regional distribution were statistically higher than NIR values (Table 4). A moderate or substantial accuracy was generally detected in each estimated distribution according to Kappa values. The specificity was high for all target EU habitats (>94%), but the sensitivity resulted in low values (the lowest for 9130 and the highest for 6520). The values of TSS ranged between 51% (EU habitat 9130) and 60% (EU habitat 6520).
The regional distribution of the three selected EU habitats cover overall 524 Km 2 (ca. 2% of Lombardy; Main Map). The largest area is covered by EU habitat 6520 (254 km 2 ), while the other two target EU habitats are of approximately the same size (147 km 2 for 4060 and 123 km 2 for 9130). EU habitat 4060 is mainly within N2K (93% of its regional distribution), while smaller proportion of EU habitats 9130 and especially 6520 are covered by N2K (respectively 57% and 13%). Legend: Grass = permanent grassland without trees and scrubs (from DUSAF); Meadow = permanent meadows with sparse trees and scrubs (from DUSAF); Scrub = scrublands (from DUSAF); Wood = forest areas (from DUSAF); Beech = not classifiable beech forests (from PIF).

Discussions
Our study provided for the first time an accurate map of the distribution of three EU habitats (6520, mountain hay meadows; 4060, alpine and boreal heaths; 9130, Asperulo-Fagetum beech forests) in the entire territory of the Lombardy administrative region (resolution 20 m; Main Map). Maps such as these are essential for monitoring and reporting the conservation status of EU habitats as required by art. 17 of the Habitats Directive (Evans & Arvela, 2011). A detailed knowledge of EU habitat spatial distribution is necessary to implement standardized procedures for their monitoring (Bunce et al., 2008;Hochkirch et al., 2013;Velázquez, Tejera, Hernando, & Núñez, 2010), as well as to build more effective monitoring designs (Chiarucci, 2007), and to implement management plans and instruments for decision-making in land-use planning Hochkirch et al., 2013;Louette, Adriaens, Paelinckx, & Hoffmann, 2015).
In Italy, the increasing availability of environmental datasets in recent years (Capotorti, Guida, Siervo, Smiraglia, & Blasi, 2012;Pesaresi et al., 2014;Smiraglia et al., 2013) together with the national vegetation and flora database (Gigante et al., 2012) provide a reliable groundwork for EU habitat modelling. In our study, we defined a standard procedure that could be applied to such datasets in developing fine scale resolution maps of EU habitats also occurring outside protected areas. Detailed maps are highly recommended for EU habitat monitoring, since the coarse scale of environmental layers can hide the high local variability of EU habitats .
Mapping EU habitat distributions is essential for evaluating the efficiency of N2K for protecting species and habitats. N2K sites provide a central role in recovery of process of natural ecosystems (Prisco, Carboni, Jucker, & Acosta, 2016), as well as an important additional value for biodiversity conservation (Gruber et al., 2012;Maiorano et al., 2015;Trochet & Schmeller, 2013;van der Sluis et al., 2016;; but see Araújo, Alagador, Cabeza, Nogués-Bravo, & Thuiller, 2011). Despite this, many studies comparing non-protected and protected areas under N2K, highlighted an uneven distribution of EU habitats (Angiolini, Viciani, Bonari, & Lastrucci, 2017;Joppa & Pfaff, 2009;Maiorano, Falcucci, Garton, & Boitani, 2007;Rosati et al., 2008) and protected species populations (Araújo, Lobo, & Moreno, 2007;Bagella, Caria, & Filigheddu, 2013;Rubio-Salcedo, Martínez, Carreno, & Escudero, 2013). Also our analysis revealed  that large parts of EU habitats 9130 and 6520 are not included within the regional N2K network. The percentage of EU habitat 9130 distribution not included in the N2K network is comparable to those identified by gap-analysis for Italy (Rosati et al., 2008), but at the same time, EU habitat 6520 was demonstrated to be underrepresented in the regional N2K network. The Habitats Directive considers that the monitoring of EU habitat conservation status requires full comprehension of the variables driving habitat spatial distribution. Our results underlined the role of precipitation among the most effective environmental factors shaping EU habitat distributions at the regional scale. This confirms and highlights current threats for terrestrial habitats under predicted future climate changes (Barredo, Caudullo, & Dosio, 2016;Buse, Boch, Hilgers, & Griebeler, 2015;EEA, 2017;Stocker et al., 2013).
Predictive models indicate that habitats will dramatically lose range in the future, also within N2K (Barredo et al., 2016;Bittner, Jaeschke, Reineking, & Beierkuhnlein, 2011), with drastic consequences for the provision of services (Essl, Dullinger, Moser, Rabitsch, & Kleinbauer, 2012;Jantke, Müller, Trapp, & Blanz, 2016;Marchetti et al., 2012). Eastwood et al. (2016) reported that protected sites deliver higher levels of ecosystem services compared to non-protected sites. Hence, mapping EU habitat distribution at a fine scale over wide EU territories could help to implement monitoring and management of protected areas, maximizing the benefits of the services they provide.

Conclusions
In our study, we provided a straightforward, but robust, procedure for mapping EU habitat distribution over the entire EU territory. Higher resolution environmental data and classification of land use types will be increasingly available in future; this protocol could provide an easy and low cost procedure to define EU habitat distributions with increasing accuracy. Furthermore, our procedure is suitable for predicting EU habitat distributions at the regional scale, when scenarios of future climate and/or land use changes are introduced in the models. Additionally, when old maps of land use types and corresponding climate data are available, it also can be useful for modelling historical trends of past EU habitat distributions.

Software
For our analysis, we used open source software. Processing and derivation of ancillary environmental variables, spatial analysis of EU habitats and land use maps, random points sampling, as well as the application of models outside N2K, were carried out in QGIS (v.2.14.12;QGIS Development Team, 2009). The classification trees were computed with R software (R Core Team, 2017) through the function 'ctree' available in the package 'party' (Hothorn et al., 2006), and statistics were calculated through the function 'confusionMatrix' (Kuhn, 2008) available in the package 'caret' (Kuhn et al., 2017).