Fine scale population density data and its application in risk assessment

ABSTRACT Population density is one of the key parameters for assessing the magnitude of population exposed to risk, and the better quality data we have, the better the assessment of risk. The aim of this study is to elaborate a high-resolution spatially distributed population density grid, which estimates population at the commune scale with a reliability of over 90%. The novelty of the approach is population density estimation in a regular European grid, based on buildings vector data collected in the national topographic database. Using abductive reasoning in combination with statistics and spatial analysis, the authors extract approximate information about a population from the large-scale topographic data. Moreover, linking the obtained population data with the cadastral data – by unique building identifier – allows for regular, quick and census survey-independent updates of the population surface. A shortcoming of the approach is the issue of the possible existence of two houses per family, which leads to an overestimation of population. However, in the study area it affected only two of the total 14 communes by 7%–9%.


Introduction
Information on the geographical distribution of population is of critical importance in supporting decision-making in risk assessment. Risk prevention and management for natural hazards (e.g. erosion, fires, flooding, storms and drought, earthquakes) and risks linked to human activities (e.g. technological accidents) as well as civil protection and disaster management systems need access to detailed and timely information on the geographical distribution of people. This information, due to confidentiality, privacy issues and nondisclosure requirements, is generally supplied by national census at pre-defined units, mainly administrative units or statistical divisions. The size of census spatial units depends on the number of inhabitants (e.g. 4000-8000 in the U.S.A., around 500 in Poland) and it varies significantly between the dense urban areas and forestry or agricultural areas. The census gives the fullest and the most reliable characteristics of population, however, some drawbacks of this data are well known such as the long intervals between censuses which make the data outdated, or changes in border lines of census dissemination units that make long-term population comparison difficult (Gregory et al. 2010).
An important methodological problem in combining demographic data with other data for the purpose of natural hazards risk assessment is the incompatibility of mapping units used for data analysis and visualization, known in the literature as the Modifiable Areal Unit Problem (MAUP). The MAUP is 'a problem arising from the imposition of artificial units of spatial reporting on continuous geographical phenomena resulting in the generation of artificial spatial patterns' (Heywood et al. 1998). Research on the estimation of population data in user-defined units dates back to the beginning of the twentieth century. The cartographer Wright (1936) popularized a method of presenting population density based upon the division of a given administrative unit into smaller areas complying with different types of geographical environments and called it, after the Russian cartographer Tian-Shansky, dasymetric mapping (Bielecka 2005). Although the technique of disaggregation of census population data was described in the 1930s, it still lacks a standardized methodology. Recent advancements in geographic information systems, as well as increased availability of vector and satellite data, have revitalized the concept of presenting population counts in user-defined units. Based on the literature review, approaches used to map population differ in: techniques: disaggregation or aggregation; mapping unit (e.g. administrative units, enumeration units, geographical units, regular grid); ancillary data used (e.g. topographic maps, land use/land cover vector data-sets, cadastral data, satellite images, LIDAR data, etc.); visualization methods (e.g. choropleth map, isolines, dasymetric map, 3D visualization).
The disaggregation (down scaling) methods, based on areal interpolation, were defined by Mennis (2003, p. 32) as 'a geographic data transformation from source zones to target zones.' The areal interpolation, according to Langford (2013), can be implemented as: (1) basic areal interpolationwithout any ancillary data necessity; (2) dasymetric mappingancillary data indispensable; and (3) statistical modelling which intends to qualify the functional relationship between variables and areal units by extracting patterns embedded in external data-sets (Mennis 2009). By contrast, the aggregation (upscaling) methods are based on summing population counts attributed to address points or buildings (Ural et al. 2011;Zandbergen 2011;Sridharan and Qiu 2013), street blocks (Mennis and Hultgren 2006;Uesugi and Asami 2016) or cadastral parcels (Maantay et al. 2007;Maantay and Maroko 2009;Jia et al. 2014), and detailed land cover and land-use data, derived mostly from satellite image classification (Bielecka 2005;Alahmadi et al. 2013;Gaughan et al. 2013;Dmowska and Stepinski 2017).
The availability of high-resolution satellites imagery, LIDAR and fine-scale vector geographical data with detailed attributed information allows for small-area population distribution studies, but until now the results are characterized by relatively low reliability, fluctuating around 60%-70% (Yuea et al. 2005;GEOSTAT 1A 2011;Hall et al. 2012). The obtained results could be considerably improved by using regression analysis which combines the footprint areas or building volumes to the number of people in it (Wu et al. 2008;Lu et al. 2011;Alahmadi et al. 2013Alahmadi et al. , 2016Wang et al. 2016). However, Biljecki et al. (2016), based on a multi-scale study of population estimation in the Netherlands using a 3D building model, noticed that the accuracy of the obtained results strongly depends on the level of detail of the data used, and the scale of research (level of administrative units) as well as the adopted approach. Moreover, they suggested that both models, statistical and disaggregation, are able to provide truthful and comparable results. This paper introduces and tests a new approach for refining population distribution at high resolution by combining diverse geoinformation on buildings and statistical parameters. The goal of our study is to improve the volume-based method, by combining statistical and aggregation modelling and fusing detailed information on the buildings' location, footprint and volume as well as their structural characteristics. The buildings' volumetric and structural characteristics (number of storeys, building type and functions) are derived from the highly reliable Polish Topographic Database. Our approach refers to both urban and rural areas, where the population distribution is very diversified, which is considered as a further benefit. Moreover, adopting the European quadrilateral grid as a mapping unit allows the comparison of results over time and the analysis together with other geographical data.
The proposed approach is based on the abductive reasoning (inferring a case from a rule and a result) that combines inductive (bottom-up, empirical) and deductive (top-down, theoretical) principles. This high resolution and highly reliable data could be used for risk estimation and management.
The article is structured as follows. Section 2 briefly describes the methods used for population surface creation and provides a short overview of global gridded population data. Sections 3 and 4 present the high-resolution local population surface generation methodology and its experimental creation for a small city and rural area located in the western part of Poland. Section 5 demonstrates the application of the created surface in risk management, namely a flood threat, which is considered the most frequent and destructive natural hazard in Poland. Finally, conclusions are drawn in Section 6.

Gridded population data
The advantages of using squared grid cells instead of other reference units include: simple data comparison over time and space, stable reporting of unit size and area over time, and straightforward integration with other geo-referenced geographical data. Moreover, the hierarchical construction of grid cells allows for an undemanding data aggregation. Due to these reasons census offices in some countries produced grid-based statistics, for example Austria, Finland, Norway (GEOSTAT 1A 2011), Sweden (Tufte 1990) and Great Britain (Gregory and Ell 2005;Martin 2006). In addition, three gridded products compiled over the last decade depict the distribution of human population across the globe: Gridded World Population (GPW), Global Rural-Urban Mapping Project (GRUMPv1) and LandScan. They are briefly described in the following, and summarized in Table 1.
The first data-set, available from the 2004 GPW, provides globally consistent and spatially explicit human population information in 2.5 arc-minutes grid cells (Balk and Yetman 2005). The smooth pycnophylactic interpolation algorithm, elaborated by Tobler (1979), was used to transform population data from census units to a grid. The main assumption of population modelling was based on Tobler's first law of geography (Tobler 1970), whichin the context of population modellingcan be interpreted as follows. Within a given administrative unit, the areas that border on regions with higher population densities are likely to house more people than the areas that neighbour low-population density regions (Balk et al. 2011). Disaggregation of national census data, smooth pycnophylactic (mass-preserving) interpolation (Tobler 1979) Disaggregation of national census data, smooth pycnophylactic (masspreserving) interpolation (Tobler et al. 1995) Disaggregation of national census data, 'smart' interpolation algorithm The GPW database was the main source of data for the GRUMPv1. The additional data-sets used to produce world population density were: settlement point location, the approximate footprint for urban centres in each country and satellite data (Balk and Yetman 2005). Aside from high-resolution gridded population data, the GRUMPv1 provides an urban extent data-set based on NOAA's night-time lights data-set and buffered settlement centroids (where night lights are not sufficiently bright) as well as point data-sets of all urban areas with populations of greater than 1000 persons.
Finally, LandScan Global Population (LandScan) is a world-wide population database in 30 arcsecond resolution, estimating ambient population at risk (Dobson et al. 2000). Each raster cell is attributed with population counts based on multi-variable dasymetric 'smart' interpolation algorithms similar to those recommended by Tobler et al. (1995). The algorithm uses sub-national level census counts and some ancillary data including land cover, roads, slope, urban areas, village locations and high-resolution imagery analysis. The resultant population count is an ambient or average day/night population count (Bhaduri et al. 2007).
An interesting initiative of creating spatial population data-sets for the low-income countries in Africa, Asia and South America is the WorldPop project, initiated in 2013. WorldPop integrates, in open access archives, population surfaces and other demographic data (e.g. population age-structure) elaborated by researchers within the frames of AfriPop, AsiaPop and AmeriPop projects Tatem et al. 2013). The WorldPop gridded population data-sets, with spatial resolution equal to 100 m, portray a reliable representation of population distributions based on census data (Tatem et al. 2007), survey (Alegana et al. 2015), satellite (Linard et al. 2012;Stevens et al. 2015), social media , cell phone (Deville et al. 2014) and other spatial data-sets (Tatem et al. 2007). The method is based on the 'Random Forest' flexible, semiautomatic model to estimate population counts in a grid cell. Moreover, the data are accompanied by metadata and measures of uncertainty (Tatem and Linard 2011), which is perceived as the great advantage of WorldPop. The researchers noticed that the modelling method used in the WorldPop project produce more accurate data-sets than the methods used in the elaborating of global gridded population data products like GRUMPv1, GPW and LandScan (Linard et al. 2012;Tatem et al. 2013).
The 1 sqÁkm European population grid was introduced in 2011 (GEOSTAT 1A 2011) and amended in 2014 (EFSG 2014). The data covers 28 European Union Member States and the four EFTA (European Free Trade Association) countries. Hence, the GEOSTAT model considers differences in available resources and utilizes two methods for allocating a number of inhabitants to each square kilometre cell: aggregation or disaggregation (Mostr€ om and Wardzi nska-Sharif 2016). The aggregation method (bottom-up approach) relies on aggregating geo-referenced micro data (e.g. address points, buildings, centroids of parcels) and census 2011 data-sets. Alternatively, when geo-referenced micro data are not available the disaggregation (top-down) approach was used. The disaggregation method utilizes statistical population data for the lowest administrative units in combination with supplementary spatial data (e.g. land cover/land use or European imperviousness data-set). Countries where geo-referenced address points or buildings are available only partially implemented the hybrid approach, which is aggregation and disaggregation (Dudova and Ahmedov 2013).
The main disadvantage of the existing gridded population data is their low and variable (within one data-set) reliability of population counts attributed to grid cells. Hall et al. (2008Hall et al. ( , 2012 pointed out the large differences between the described world gridded population databases as well as between the reference census and ancillary data. The European gridded population data is also imprecise. The percentage error of population estimation is between 25 for the Netherlands and more than 50 for Norway (Eurostat 2011;Nordbeck 2012). Despite the fact that the pan-European population grid is of low quality the lesson learned allowed the improvement of the population grid model and techniques and awareness of the countries' importance of high quality geo-reference data on address points, buildings or cadastral centroids.

Study area and data
The study area consists of the city of Siedlce and, mostly rural, Siedleckie county located in Mazovia, in the western part of Poland. According to the 2013 Polish statistical data, the population distribution in the study area varies considerably from the highly populated city of Siedlce (2397 people per square kilometre) to the sparsely populated commune of Korczew (27 people/sqÁkm). This is illustrated in Figure 1, by the means of a choropleth map (classification based on manual choice). Moreover, the size of reference units of the officially available census data, representing the third level of Polish administrative divisioncommunesis heterogeneous, and varies from 47 to 123 km 2 .
The building data layer from the Polish Database of Topographic Objects (BDOT10k) was used as the ancillary data. It is a country wide vector database with a thematic scope and a level of detail corresponding to civilian topographic maps at a scale of 1:10,000. The BDOT10k buildings are represented geometrically by polygons and they are described by 25 attributes. Several of them, namely building footprint, building type, its functions and number of storeys, are crucial for this research. The accuracy of the BDOT10k building location is very high; the average displacement between ground survey measurements and the building footprint coordinates, in the topographic database is 2 m (Bielecka 2015). A summary of an overview description of census and auxiliary data used is given in Table 2. Data are provided by Central Statistical Office of Poland.
The study of the building vector layer of the existent topographic data, followed by the field study of the test area, allowed for some gains in information about the types of buildings intended for permanent human habitation. Following the building typology as defined in the Database of Topographic Objects, we distinguished the following three residential building types depending on dwelling number: single-family building, also called single-family detached home or separate housea single dwelling unit inhabited by a family or small group sharing facilities such as a kitchen; two-flat residence, sometimes referred to as a two-flat or duplex or semi-detached buildinga double dwelling unit inhabited by two families, or building consisting of two separate houses typically interconnected side by side; a multi-family residential dwelling, also called an apartment buildinga building with multiple apartments. Within the study area, residential buildings constitute 55.5% of all buildings in the city of Siedlce and 36.3% in Siedleckie county, while the total area occupied by these residential houses equals 6.6% in the city, and 0.9% in the county. Moreover, the multi-family houses predominate in the city (87.3%), while single-family houses (99.7%) predominate in the county (see Table 3).
All non-residential buildings, as well as non-residential levels of the mixed-use buildings (that constitute less than 0.2% of total buildings number), were excluded from the subsequent analysis. The study did not include people living in student hostels, boarding schools, nursing homes, etc., and those living in places not intended for residential purposes, but which for various reasons are inhabited (e.g. garden houses).

Methods
The study aimed at the provision of a high-resolution population density grid, which estimates population density with a reliability of over 90%. The main advantage of grid-based models of population distribution are as follows: (1) population density data is represented as a continuous surface and (2) it is unconstrained by any of the irregular partitioning into arbitrary areal units, i.e. administrative or census enumeration and thus it is not subject to the MAUP and other areal unit-derived problems (Langford 2013;Mennis 2003).
The main assumptions in the presented model are as follows: population density is proportional to the total number of dwellings in each building and housing density, defined after Bender (1981) as the number of dwellings per unit of land. This imposes the second assumption stating that population surface represents population of place of residence at night time, i.e. the number of people staying at home at night. The third assumption is that population total for the study area (countythe second level of administrative unit) should be preserved, while in the commune (the third, lowest level of administrative division) the differences between expected (given by the Central Statistical Office) and computed population are minimized.
The buildings' volumetric and structural characteristics (i.e. building location, footprint, building type, its functions and number of storeys) were extracted from the Polish Database of Topographic Objects, while the required dwelling number was estimated using abductive reasoning in combination with statistical analysis. In particular, a statistical relationship between the apartment building footprint area and the number of its dwellings was established by means of regression analysis based on a sample of randomly distributed field data.  The estimated total number of building dwellings was then multiplied by the average private household size provided by the Central Statistical Office in Poland depending on the commune level of urbanization (see Table 2) to arrive at an initial building population estimate. By summing up these initial figures of each building located within a chosen commune, an initial commune total population approximation was accomplished. The sum of the squares of the residuals, representing the differences between the expected census origin commune population and the approximate one, was then minimized by modifying the average size of private households at the commune level. It resulted in obtaining a refined household average size for each of the communes and, subsequently, it altered the inhabitant number for each residential building, i.e. a more refined building population estimate was attained.
Finally, the refined building population was related to the European geographical quadrilateral grid system. It is a hierarchical, two-dimensional grid geographically based on the ETRS89 Lambert Azimuthal Equal Area coordinate reference system (INSPIRE 2010), established as a spatial data frame for thematic information linking within the European Union. For this analysis, the 100 m resolution grid cell was chosen and the fine-scale population density raster was created. Then, the accuracy of the fine-scale population grid was verified at the commune scale. In the closing step, a choropleth map was created in order to visualize the obtained population density surface.

Number of apartment building dwellings estimation
A study area field reconnaissance revealed the fact that each staircase is associated with two apartments on each floor (Figure 2). The ground truth data of the number of dwellings in each apartment building is unavailable, however it can be estimated on the base of the following equation: where DN abestimated number of dwellings in an apartment building; SNnumber of staircases in an apartment building; FNnumber of floors in an apartment building. As may be noticed from the provided photograph of a standard apartment building in the study area (Figure 2), it is composed of several internal entrances and each entrance has only one staircase. Moreover, each floor has the same area. A sample of 310 residential buildings was used to model a statistical relationship between the number of residential building staircases and the apartment building basic footprint area. This hypothesis was positively verified through a regression analysis (Figure 3).
The relationship between the number of an apartment building staircases and the building footprint (area) can be modelled using a simple linear regression (2): where SNnumber of staircases in an apartment building; BFbuilding basic footprint (m 2 ). The value of coefficient of determination (R 2 ) equals 0.97 what means that the number of staircases in one apartment building is explained in 97% by the building footprint area.

Commune population estimation
Assuming the average size of private households (c i ) is known, the building inhabitant's number in each of the building types follows the simple rule illustrated in Table 4.
The initial number of residents in each building was estimated using the official average size of private households listed in Table 2. Its initial value is equal to 3.00 or 2.53 depending on commune urbanization type.
By summing up the initial estimated numbers of building inhabitants for all buildings geographically located in a commune, the approximate commune total population was obtained. To arrive at a finer approximation ('best fit') of population at the commune level, the sum of the squares of the differences between the expected value, that is census origin commune population, and the initial

Fine-scale population density surface creation and its accuracy verification
The estimated number of building dwellings was then multiplied by the tuned average private households for each building located in a particular grid cell, and summed up to arrive at the cell population value. Here, the authors refer to the location of the building as to the building centroid position.
The final population density attributed to 1 ha grid varies from 0 to 635 as can be seen from histogram ( Figure 4).
In order to verify the accuracy of the resulting population surface, a comparative analysis was performed. The limited reference data availability allowed only for analysis on the third level of the Polish administrative division (i.e. at the commune scale). The obtained root mean square error of population estimation equals to 89 people, and mean absolute deviation of 3.9 (Table 5).
The minimal value of -586.10 (see Table 5), relating to the number of people in the Siedlce commune, is the highest overestimation of the population. It is, however, only 3.3% of the population in the commune (see Figure 5). The maximum value of 485.66 people is the greatest underestimation (the Skorzec commune) and represents 6.3% of the total population in the commune.
The difference in per cents in population counts (MAPE error), shown on Figure 5, varies from ¡6.3 to +8.85, when comparing population by commune given by the Central Statistical Office to the total population derived from the interpreted population surface. The population in the two  communes, namely Skorzec and Wisniew, was underestimated by 4%-6%. The underestimation could be a result of errors in the topographic data, especially with omission of some apartment buildings, misclassification of a building type (e.g. residential, office or other), or because of an incorrect value of the number of storeys. It is worth mentioning that according to the technical specifications of the Polish Database of Topographic Objects (BDOT10k), 2% of buildings omission is acceptable (MIAA 2011).
For the other two peripheral communes, mainly tourist and recreational areas, the population approximation was overestimated around 7%-9%. This is attributed to the existence of summer houses, i.e. used only temporarily as a house, that should not be included in the model. However, they are not possible to identify and subsequently exclude from further calculations, based only on the national topographic data. In the city of Siedlce overestimation could be related to apartments not inhabited yet, e.g. not sold by the developer or intended for rental but currently vacant.

Cartographic visualization
The obtained population surface represents detailed high-resolution population density data. It is a next research challenge to visualize it in the most efficient and easily readable way to take advantage of the spatial distribution of population density. The authors believe that the form of choropleth map proposed below ( Figure 6) possesses these qualities, however, they are more evident if the map is displayed in the appropriate scale depending on user experience and cognition capacities (e.g. as one of the zoomable layers in a geoportal). The population density choropleth map (Figure 6) allows for a quick visual assessment of the population data spatial distribution (Calka et al. 2016). The prevalence of the unpopulated areas is easily perceived. And indeed, they constitute 92.5% of the total study area. The very low populated regions, i.e. 2-19 people per hectare, correspond to rural areas with dispersed settlement, where 46% of the population live. By contrast, only 5% of the total population lives in the most densely populated regions, denominated by 360-635 habitants per hectare, covering only 0.01% of the study area. Those most populated regions correspond to town areas and some county villages areas (most populous ones).

Application of fine-scale population density data in flood risk exposure
Poland, a country located in central-eastern Europe, suffers few natural hazards except for an occasional flooding, with a prevalence of early spring floods. This fact is emphasized by researchers (Zolina 2012;Kundzewicz 2014;Kundzewicz et al. 2016), and international and national bodies involved in natural hazards (CEC 2007;IRFC 2010;CIA 2016). In the last 40-50 years, the annual number of heavy floods and damages caused by inundation have increased considerably (Kundzewicz 2014;Anderson 2007). However, an awareness and preparedness for such hazards have become a serious concern following the dramatic floods in Poland (and neighbouring countries, i.e. Germany and the Czech Republic) in 1997 and 2010. During the last 20 years, several country wide initiatives have come into existence, amongst which the IT system of the Country's Protection Against Extreme Hazards (ISOK) is one of the major ones. Some of the prevention products of ISOK are recently elaborated flood hazard maps and flood risk maps (Nowicki 2007;ISOK 2015), and the vector data of areas at flood risk are available to download by the public (e.g. using the spatial web services).
Sharing data on flood risk along with information about the number of people exposed to danger enables a wide and open access for all citizens (Fiedukowicz et al. 2012) and it facilitates people protection measures both in crisis situations and risk mitigation. Regardless the size and character of the hazard, people are always first and foremost affected locally, that is, where they live and conduct their daily activities (Freire 2010). Unfortunately, the ISOK system does not collect any data on population distribution. The described high-resolution population distribution surface fills this gap.
Usefulness of the data is demonstrated in an example of flood hazard in the study area crossed by two main rivers, the Bug and the Liwiec. The vector data representing areas at flood risk from the rivers, identified by ISOK, can be overlaid and intersected with the final population density grid to derive the number of people exposed to flood risk. The areas marked with the 1% probability of the Bug or Liwiec flooding (i.e. once in 100 years) are inhabited by 1782 people which accounts for 1.14% of the study area population (Figure 7). Residents of eight villages of the Korczew commune live in the highest risk area, in particular those whose houses are on the Bug River floodplain. Those in danger (925 people) constitute 30.04% of Korczew commune residents. In case of flooding 24 coaches and 78 lifeboats will be needed, assuming that one boat can carry 12-15 people. Altogether the number of buildings which may be destroyed by flooding is 949, out of which there are 402 residential houses and 547 other buildings. In this area there is also a primary school with 200 pupils. It is the most vulnerable people needing some form of rapid extraction.
The Liwiec River in the Siedlce commune does not pose as much threat as the Bug River (Nowicki 2007). Out of 17,496 residents, 652 people live in the risk area, which is 3.73% of the total population of the commune. Due to flooding 492 buildings (3.81% of all buildings in the commune) Figure 7. Visualization of the areas threatened by Bug and Liwiec floods in the study area. may be damaged, including 217 residential houses (1.68% of all houses in the commune) and 275 other buildings (2.13% in the commune).
It is also worth mentioning that only 4803 people live within 100 m of all the rivers in the study area, accounting for 3.1% of the population, while up to 13,710 (8.4%) inhabitants live within 300 m of the rivers. Although the modelled population surfaces represent maximum expected night-time densities, the number of people at risk is probably unchanged from night time to daytime due to agricultural character of the most affected areas.
So far, the number of residences affected by flooding is only roughly estimated, based on interviews with local people and total number of people in the commune.

Conclusions
Population grid represents both population distribution and density, and can be easily combined with hazard maps for exposure assessment or modelling. The authors' method of producing a population density surface is a bottom-up approach based on the computation of the number of people in each cell by an estimation of the number of inhabitants in every dwelling spatially joined with grid cells. The fine resolution local population surface is a gridded data product that renders population data at the scale and the extent required to demonstrate the spatial relationship of human population and the environment on a local scale. The presented grid creation method proved to be successful for medium-small regions characterized by uneven population distribution with large differences between the highest and the lowest population densities. The reliability of the fine-scale population density grid in comparison with census counts is over 90%. Moreover, the use of the fine-scale population density grid, where people are assigned to the cell where they actually live, avoids the MAUP issue.
Unlike other studies, population estimation based on the number of dwellings by type and function of building allows for the exclusion of non-residential parts of mixed-use buildings. Another advantage of the elaborated method is the possibility of quick and census survey-independent data updating. Linking the population surface with the cadastral databy unique building identifierallows for regular updates.
The proposed model is scalable and could be applied and customized for other rural-urban area in Poland and other European countries, ultimately benefiting disaster risk assessment and mitigation. However, usability of the fine-scale population density surface method requires an access to high-resolution detailed data on buildings, in particular their location, number of storeys as well as their types, including residential buildings' ones. This kind of information is usually stored in finescale topographic data. Such a detailed database is not available in each country. In some countries (e.g. France, Germany, the Netherlands), the stored data on buildings' height can be used to estimate the number of storeys (Dukaczewski 2013). In such a case, the expected results could be a little less accurate, which is pointed out by Biliecki et al. (2016). The other shortcomings of the presented approach are as follows: the issue of two houses per family, i.e. the permanent and the summer one, and the issue of uninhabited dwellings. Such occurrences lead to an overestimation of number of inhabitants. However, in the tested area it affected only two out of 14 communes by 7%-9%.
It should be emphasized that since 2010 the European Commission, together with Member States, has been working on spatial data (including data on buildings) harmonization within the frame of INSPIRE -Infrastructure for Spatial Information in the European Community. The application schema of buildings data includes information on the number of storeys and the number of dwellings (INSPIRE 2012). Such information allows for more accurate estimation of number of people not only at the grid cell level, but also at the building feature level of detail. However, according to the schedule (INSPIRE Road Map 2015) the data will be available after 2021.