A methodology to downscale water demand data with application to the Andean region (Ecuador, Peru, Bolivia, Chile)

ABSTRACT Mountainous regions are a hotspot for water scarcity and anthropogenic pressure on water resources. Substantial uncertainty surrounds projections of future climate and water availability. Furthermore, quantitative and distributed data on water demand are generally scarce, dispersed, and highly heterogeneous. This forms a major bottleneck to studying water resources issues and developing strategies to improve water resource management. Here we present a methodology to produce and evaluate high-resolution gridded maps of anthropogenic surface water demand with application to the Andean region. These data are disaggregated according to the major types of water demand: domestic users, irrigated area, and hydropower. This dataset was built by homogenizing, integrating, and interpolating data obtained from various national institutions in charge of water resource management as well as relevant global datasets. The maps can be used to research anthropogenic impacts on water resources, and to guide regional decision-making in regions such as the Andes.


Overview
Water resource management and governance are increasingly affected by systemic changes in water availability and socioeconomic development (Wada et al. 2016). This makes quantifying spatiotemporal patterns of water demand a critical but often challenging task (Nazemi and Wheater 2015). Existing datasets are usually scarce, dispersed, heterogeneous, and difficult to access.
As the population of developing countries is expected to increase by two billion over the next 40 years, significant supply-demand deficits are expected, with estimated investments in water infrastructure of over 6.7 trillion USD needed worldwide (Hunt and Watkiss 2011, OECD 2015). Inadequate spatial mapping of anthropogenic surface water demand results in a lack of proper water accounting. For instance, partly as a response to this data scarcity, many prominent hydrological models do not fully incorporate the impact of anthropogenic activities on natural processes, leading to major errors in hydrological outputs (Gleick et al. 2013). Some models do characterize anthropogenic surface water demand but at very coarse spatial resolutions (Döll and Siebert 2002, Van Beek et al. 2011, Bierkens 2015). Furthermore, water footprint studies are often limited to national or international scales, which is incompatible with areas of strong local gradients in water availability and demand, as is particularly the case in mountain environments (Mekonnen and Hoekstra 2018).
These limitations are particularly problematic in areas with large topographical variability and spatial constraints on water sourcing, such as mountains (Buytaert et al. 2009, Viviroli et al. 2011). In fact, water resources in mountainous regions are under substantial stress due to both climate change and increased anthropogenic impacts (Correa et al. 2020). This could have substantial effects on the estimated 1.9 billion people that live in or downstream of mountainous areas, by increasing potential damages due to floods or droughts (Immerzeel et al. 2020). Limited spatial understanding of water demand is a key hindrance to conducting comprehensive water-related risk assessments (Drenkhan et al. 2015, 2019). This can also have negative consequences on adequately allocating surface water abstraction rights, leading to an increase in water-related conflicts (Nazemi and Wheater 2015).
Therefore, the objective of this study is to develop a method that is able to disaggregate spatially lumped data on anthropogenic surface water demand using the best available data for various sectors. We use the following definition of "surface water demand": human water needs that are addressed by means of a direct anthropogenic disruption to natural surface runoff processes within a river network (for example, in the form of water withdrawal from a river) or by artificially altering the flow regime (for example, via a dam). We consider demand here in terms of end purpose, i.e. number of users (inhabitants), irrigated area (ha), and generated hydropower (MW). We do not consider industrial water requirements because of the lack of relevant data and the very specific requirements of different types of industries. We also do not quantify actual volumetric water abstraction, deviation, or storage, because such estimates depend on a number of additional variables (e.g. per capita water requirement; crop water requirements) that are methodologically well understood but require further local information.

Case study
The aforementioned challenges are manifest in the Andes, where climate and the ensuing water availability are extremely variable and affected by various drivers (Garreaud 2018). For example, the eastern slopes of the tropical Andes display high precipitation frequencies and magnitudes due to moist air influx from the Amazon rain forest (Buytaert and De Bièvre 2012), whereas the Pacific coast of Peru and northern Chile is one of the most arid regions of the world (Clarke 2006). Overall, precipitation ranges from above 8000 mm year⁻¹ on the Pacific coast of Colombia to approximately 200 mm year⁻¹ over the Bolivian Altiplano (Garreaud et al. 2003, Garreaud 2018), to less than 5 mm year⁻¹ in the Atacama Desert of northern Chile and southern Peru. Many major population centres are located in highly seasonal and vulnerable environments prone to water scarcity, such as the capital of Peru, Lima, with its 12 million inhabitants. Furthermore, the region is witnessing rapid demographic growth, with an average annual growth rate of 1.5% until 2050 predicted under a medium scenario across all four countries considered here (United Nations Population Division 2019).
The method developed here aims to produce spatially disaggregated maps of observed or estimated domestic demand and irrigation demand as well as hydropower production at 3 arcseconds resolution (approximately 90 m at the equator; see Fig. 1). The geographical extent covers Ecuador, Peru, Bolivia, and Chile down to 45°S latitude, and therefore excludes Patagonia, Chile.
Comprehensive, homogenized datasets of anthropogenic surface water demand are highly valuable for researchers and decision makers, for example as an input in analyses to assess and forecast water availability in the context of population growth and climate change. The data can also be used for research on the impacts of human pressure on water resources or on the potential impacts of environmental changes on human development, adaptation, or resilience, and to guide regional decision-making on water resources in regions such as the Andes.

Methods
In order to derive the datasets mentioned previously, we first compiled existing national databases as well as other globally available datasets on surface water demand across all three sectors. We then developed an algorithm to homogenize, combine, infill, and disaggregate these datasets. Finally, we validated our results by comparing them to actual data for two major Ecuadorian rivers, which are the only systems in our study region for which high-resolution data are available.
Figure 1. Surface water demand maps of the four capital city areas of the study region. The coloured pixels represent river points coded with the number of people and the irrigated area that depend on this pixel for their water supply. In the cases of Quito, Ecuador (a) and La Paz, Bolivia (c), domestic water demand is visualized on top of irrigation demand. In the cases of Lima, Peru (b) and Santiago, Chile (d), irrigation demand is visualized on top of domestic water demand.

National databases
The countries in the study region display varying degrees of data availability, accessibility, and spatiotemporal completeness. We describe here the available data that were obtained as well as how they were incorporated in the dataset for each country (Table 1).
In Ecuador, the National Water Secretariat (SENAGUA) is the highest official authority responsible for maintaining an updated registry of all authorized surface water and groundwater abstraction allocations as well as granting new requests. It houses limited information on the coordinates of given abstraction points for all major sectors: irrigation, domestic and hydropower.
In Peru, hydropower locations and peak power production data were obtained from the Peruvian Ministry of Energy and Mines. The National Water Authority (ANA) provided data on domestic demand numbers of major cities and irrigated areas.
Bolivian water demand data are scarce, with only a 2010 national dam inventory available for hydropower. Domestic demand data for the city of La Paz were provided by the state-run water company (EPSAS).
Chile has the most comprehensive water demand characterization within our study region, with monthly allocations per officially registered surface abstraction point available for all major sectors as of 2015. However, data on irrigated area and population served per abstraction point are not available.

Disaggregation and combination procedure
We discuss here the process of combining or using available data on anthropogenic surface water demand with our estimates. In the case of hydropower, it is typically possible to obtain data from the relevant authorities or other publicly available datasets.
Direct spatially disaggregated data on domestic and irrigation water demand are often inaccessible, unavailable, or incomplete. In those cases, we estimated anthropogenic surface water demand with an algorithm that identifies the most likely water abstraction point for a given subset of population or irrigated area. Here we use 2015 population maps from the WorldPOP project (Sorichetta et al. 2015) at a resolution of 3 arcseconds, in addition to agricultural land-use and land-cover (LULC) maps from the MapSPAM initiative (International Food Policy Research Institute 2019) at 10 km spatial resolution. We combined these data with high-resolution topographical data from the United States Geological Survey (USGS) Hydrosheds maps at 3 arcseconds resolution (Lehner et al. 2008), to identify the most likely abstraction point as follows:

Identification of the river network
We derived the river network from digital elevation data from USGS Hydrosheds at 3 arcseconds resolution using a D8 hillslope flow algorithm.
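As an illustration of this step, the following minimal sketch shows how a D8 scheme can route flow on a small elevation grid and retain only cells whose upstream area reaches a threshold. It is not the GRASS GIS workflow used in the study: the function names, the dictionary-based grid representation, and the toy elevation values are assumptions made for illustration, and depression filling is ignored.

```python
import math

# Row/column offsets of the eight neighbours considered by the D8 scheme.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell (row, col) to its steepest-descent neighbour, or None for pits and outlets."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receivers[(r, c)] = best
    return receivers


def flow_accumulation(dem, receivers):
    """Count the cells draining through each cell, the cell itself included."""
    acc = {cell: 1 for cell in receivers}
    # Visit cells from highest to lowest so every donor is handled before its receiver.
    for cell in sorted(receivers, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        if receivers[cell] is not None:
            acc[receivers[cell]] += acc[cell]
    return acc


def river_cells(dem, cell_area_km2, area_threshold_km2=40.0):
    """Cells whose upstream catchment area reaches the threshold."""
    acc = flow_accumulation(dem, d8_receivers(dem))
    return {cell for cell, n in acc.items() if n * cell_area_km2 >= area_threshold_km2}


# Toy 3 x 3 valley draining towards the bottom-centre cell (hypothetical elevations in m).
dem = [[9.0, 8.0, 9.0],
       [7.0, 5.0, 7.0],
       [6.0, 2.0, 6.0]]
print(sorted(river_cells(dem, cell_area_km2=10.0)))  # → [(1, 1), (2, 1)]
```

The cell area of 10 km² is deliberately unrealistic so that the 40 km² threshold is reached on a toy grid; at 3 arcseconds a cell covers well under 0.01 km².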

Separation of surface water and groundwater abstraction
We obtained data on the domestic and irrigation water demand from surface water (SW) and groundwater (GW) sources at the finest possible level in each country. We then compiled statistics on the percentage of surface water and groundwater use.

Identification of the location of surface water abstractions
For surface water abstractions, we assume that water is sourced from the nearest river or water body that satisfies the following criteria: (a) the size of the associated catchment is above a predetermined threshold a; and (b) the abstraction point lies no more than a threshold elevation difference d below the location of demand.
In our application, we used a catchment area threshold of 40 km² and an elevation threshold of 50 m, which represents the typical elevation difference that can be bridged with small pumping infrastructure. These values are based on our field observations and experience in the Andes, but can be adjusted for specific purposes if needed.
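The allocation rule can be sketched as follows. This is a simplified illustration rather than the scripts used in the study: the function name, the tuple layout, and the use of straight-line distance in grid units are assumptions, as is the reading of criterion (b) as "the river cell may lie at most 50 m below the demand cell". The river cells passed in are assumed to already satisfy the catchment area threshold.

```python
import math


def allocate_demand(demand_cells, river_cells, max_lift_m=50.0):
    """Assign each demand cell to the nearest eligible river cell.

    demand_cells: list of (x, y, elevation_m, amount) tuples.
    river_cells: list of (x, y, elevation_m) tuples above the catchment area threshold.
    Returns (amount per river-cell index, total unallocated amount).
    """
    totals = {}
    unallocated = 0.0
    for x, y, elev, amount in demand_cells:
        # Eligible river cells lie no more than max_lift_m below the demand cell.
        eligible = [
            (math.hypot(x - rx, y - ry), i)
            for i, (rx, ry, relev) in enumerate(river_cells)
            if elev - relev <= max_lift_m
        ]
        if not eligible:
            unallocated += amount
            continue
        _, nearest = min(eligible)
        totals[nearest] = totals.get(nearest, 0.0) + amount
    return totals, unallocated


# Hypothetical example: three population pixels and two river cells.
demand = [(0, 0, 1000.0, 120.0), (5, 0, 1100.0, 30.0), (9, 9, 3000.0, 10.0)]
rivers = [(1, 0, 990.0), (4, 0, 1060.0)]
totals, unallocated = allocate_demand(demand, rivers)
print(totals, unallocated)  # → {0: 120.0, 1: 30.0} 10.0
```

The third pixel sits far above both river cells and therefore remains unallocated, which mirrors the treatment of upland populations that cannot be tied to a mapped river pixel.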

Correction for groundwater use
In order to correct for groundwater use, we multiply both our obtained population served and irrigated area maps by the percentage of surface water use in the relevant administrative unit, following the approach of Gleeson et al. (2012) (Appendix A).
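A minimal sketch of this correction is given below. The function name and the dictionary-based inputs are hypothetical; the 40% surface water share for irrigation in Peru is the national figure quoted in the discussion, while the cell values are invented for illustration.

```python
def surface_water_share(demand_by_cell, unit_of_cell, sw_fraction_by_unit):
    """Scale each cell's demand by the surface water fraction of its administrative unit."""
    return {
        cell: amount * sw_fraction_by_unit[unit_of_cell[cell]]
        for cell, amount in demand_by_cell.items()
    }


# Hypothetical irrigated areas (ha) allocated to two river cells in Peru.
irrigated_ha = {(0, 0): 250.0, (0, 1): 80.0}
unit = {(0, 0): "Peru", (0, 1): "Peru"}
corrected = surface_water_share(irrigated_ha, unit, {"Peru": 0.4})
print(corrected)
```

Because a single fraction applies to every cell in an administrative unit, the correction preserves the spatial pattern within the unit and only rescales its magnitude.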

Domestic demand
We used population maps from the 2015 WorldPOP project (Sorichetta et al. 2015) at 3 arcseconds resolution, which provide an estimated number of inhabitants per pixel.
In Ecuador, we implemented the water allocation algorithm described previously using the 2015 WorldPOP dataset in addition to available data for the city of Quito only. We used available data from two major rivers in the validation process. Provincial-level statistics on groundwater use are available from the SENAGUA database (Table A1).
In Peru, as domestic water demand data for major cities were available, those were masked out and the remaining rural areas were assigned withdrawal points using the water allocation algorithm and the 2015 WorldPOP dataset. National-level statistics on groundwater use are available from the International Groundwater Resource Assessment Centre (IGRAC) (Table A2).
For Bolivia, no domestic water demand data were available except for the city of La Paz. Therefore, we implemented the water allocation algorithm described above for the remainder of the country using the 2015 WorldPOP dataset. National-level statistics on groundwater use are available from the International Groundwater Resource Assessment Centre (IGRAC) (Table A3).
Our data for Chile include coordinates of actual abstraction points but do not provide information on the number of people served by these abstractions. Therefore, we simply ran the algorithm for the entire country using the WorldPOP dataset as well. Provincial-level statistics on groundwater use are available from the Chilean Public Works Ministry (Table A4).

Irrigation
We obtained irrigated area maps by crop type from the MapSPAM initiative (International Food Policy Research Institute 2019), regridded at 3 arcseconds using nearest-neighbour resampling. We combined all individual crop maps to obtain a total irrigated area per pixel of analysis.
In Ecuador, we implemented the water allocation algorithm described previously using the MapSPAM dataset with available data used in the validation process. Provincial-level statistics on groundwater use are available from the SENAGUA database (Table A1).
In Peru and Bolivia, we implemented our water allocation algorithm using data from the MapSPAM initiative. National-level statistics on groundwater use for both countries are available from the International Groundwater Resource Assessment Centre (IGRAC) (Tables A2 and A3).
The data for Chile contain irrigation abstraction coordinates, but no information about the area that they serve. Therefore, we also ran the algorithm over the entire country using the MapSPAM dataset. Provincial-level statistics on groundwater use are also available from the Chilean Public Works Ministry (Table A4).

Hydropower
Hydropower generation locations as well as peak electricity production statistics are available for all four countries as of 2010. Figure 2 summarizes the steps used to develop and validate the final data outputs.

Validation
It is not possible to completely quantify potential errors in the input data because most datasets come without the necessary metadata to analyse those errors. Instead, we examined the performance of our algorithm by selecting two major Ecuadorian river catchments for which data are available, and of sufficiently high quality, from SENAGUA (Fig. 3): the Guayas River, which serves the city of Guayaquil, the largest in Ecuador; and the Esmeraldas River, which passes through the capital Quito (where it is known locally as the Guayllabamba River).
We first examined normalized cumulative plots of simulated and observed abstractions over the normalized distance x along each river transect to assess the performance of our algorithm. To do so, we used a discrepancy factor A_d, defined as the extent to which the simulated (A_sim) and observed (A_obs) cumulative water demand profiles diverge, i.e. the area between the two curves (Equation (1)). A value of 0 indicates a perfect alignment, whereas a value of 1 indicates maximum divergence.
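One form of Equation (1) that is consistent with this verbal definition is given below. The absolute value is an assumption made here so that positive and negative deviations do not cancel; with both axes normalized to the interval [0, 1], the result is bounded between 0 and 1.

```latex
A_d = \int_0^1 \left| A_{\mathrm{sim}}(x) - A_{\mathrm{obs}}(x) \right| \, \mathrm{d}x \qquad (1)
```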
We then compared our results with 10 000 random allocations of the MapSPAM and WorldPOP irrigated area and population served pixels, respectively, to river cells, without accounting for topography or distance. We then computed the discrepancy factor between the cumulative curves generated by the algorithm and the average cumulative river profile of all random allocations in both river cases.
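The validation metric and the random baseline can be sketched as follows. The function names, the trapezoidal approximation of the area between curves, and the assumption that river cells are evenly spaced along the transect are illustrative choices rather than a description of the original scripts.

```python
import random


def cumulative_profile(amounts):
    """Normalized cumulative sum of abstractions ordered along the river transect."""
    total = sum(amounts)
    profile, running = [], 0.0
    for a in amounts:
        running += a
        profile.append(running / total)
    return profile


def discrepancy_factor(sim, obs):
    """Trapezoidal estimate of the area between two profiles sampled at evenly spaced points on [0, 1]."""
    gaps = [abs(s - o) for s, o in zip(sim, obs)]
    dx = 1.0 / (len(gaps) - 1)
    return dx * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


def random_baseline(demand_amounts, n_river_cells, n_runs=10_000, seed=0):
    """Average cumulative profile when each demand pixel is assigned to a random river cell."""
    rng = random.Random(seed)
    mean_profile = [0.0] * n_river_cells
    for _ in range(n_runs):
        per_cell = [0.0] * n_river_cells
        for amount in demand_amounts:
            per_cell[rng.randrange(n_river_cells)] += amount
        for i, value in enumerate(cumulative_profile(per_cell)):
            mean_profile[i] += value / n_runs
    return mean_profile


# Hypothetical abstractions at five river cells, upstream to downstream.
obs = cumulative_profile([0.0, 10.0, 0.0, 0.0, 30.0])
sim = cumulative_profile([0.0, 8.0, 2.0, 0.0, 30.0])
baseline = random_baseline([1.0] * 40, n_river_cells=5, n_runs=200)
print(discrepancy_factor(sim, obs), discrepancy_factor(baseline, obs))
```

In this toy case the random baseline tends towards a straight line, so its discrepancy with the spiky observed profile is much larger than that of the simulated profile.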

Results and discussion
The data and method presented here produce maps of domestic water demand, irrigated area and hydropower production from surface water resources. Table 2 summarizes the main results. We make the following observations. First, the proportion of the population that the algorithm is unable to allocate ranges from 0% in Chile to 32% in Ecuador. These are populations that live in headwater catchments above the highest river pixels in the river map. These highest pixels are determined by the catchment size threshold of 40 km² and represent a trade-off with the density of the river network that the D8 algorithm generates. We chose this threshold based on field observations that rural communities tend to source water from nearby small streams instead of larger rivers because the former tend to have better water quality. However, unallocated populations will need to be allocated manually as these are often small upland communities that draw water from various small rivers and suffer recurring water scarcity.
To evaluate the accuracy of the allocation of domestic and irrigation demand, we compare our results to a baseline that consists of a random allocation (Fig. 4). The comparison visually demonstrates the improvement achieved by our algorithm over this baseline.
A similar trend can be observed in the normalized cumulative plots of simulated and observed abstractions along our selected river profiles (Fig. 5). The results display a good agreement between our results and observed data but a substantial divergence between our results and the random allocation, with A_d decreasing by an order of magnitude between the random allocation and the allocation algorithm (e.g. 0.37 to 0.04 for domestic users in Esmeraldas; Table 3). This provides evidence of the ability of the algorithm to identify the (approximate) location of water use.
The algorithm does not account for the existence of advanced infrastructure such as pumping and inter-basin transfers, nor is it able to represent return flows. This leads to an underestimation of demand in certain locations. For example, several major irrigation projects on the Pacific Coast of Peru rely on bulk water transfers from the Andean highlands. It would be straightforward to implement this in the procedure, conditional on the availability of abstraction and supply points. Additionally, the effect of this problem on our domestic demand estimates is limited as we use direct data from major cities in the region, where major water supply infrastructure is mainly used and water use is generally well documented (McDonald et al. 2014).
The algorithm is also prone to overestimating the number of abstraction points because it allocates each population pixel individually, while in practice larger clusters of users (e.g. a village) will be served by infrastructure drawing water from a single location. As a result, the simulated cumulative abstraction profile is smoother than the actual curve, which shows major spikes where large water abstraction points exist. This again relates to the abovementioned lack of integration of large infrastructure in the methodology as a result of the sparsity of available information.
Lastly, our method to correct for groundwater use is necessarily spatially coarse, because groundwater abstraction data are only available at the national level for Peru and Bolivia, limiting the accuracy of the correction. For example, in Peru, surface water sources account for 40% of total irrigation requirements. However, there is considerable variability within the country, with several major irrigation projects along the Pacific Coast relying on complex infrastructure schemes involving groundwater abstractions whereas certain upland small-scale farmers might rely entirely on surface water. Nevertheless, the approach we have taken in such circumstances is consistent with previous attempts to quantify water use (e.g. Gleeson et al. 2012).
Such information is highly valuable in the context of water resource management and regional assessment of water stress. Especially in mountain regions, water stress can show strong spatiotemporal patterns (Buytaert et al. 2017), which are difficult to identify using maps of population and irrigated area. Additionally, the methodology allows us to set specific surface water abstraction rules depending on local management context, including for instance environmental flows, and allocation priorities during a hydrological drought. We should note that our analysis focuses only on quantifying water demand, irrespective of water availability. As such, we do not account for water scarcity or environmental flow requirements, which are beyond the scope of this study.
Table 2. Summary of results across all four countries. MapSPAM and WorldPOP total irrigated area and population are used as reference points. Total population and total irrigated area statistics were obtained from FAO (2016). Groundwater (GW) use is obtained from the IGRAC database (Gleeson et al. 2012).
Figure 4. [...] (d), respectively. Blue: data estimated using the procedure presented here; green: random allocation of domestic water demand and irrigation demand from WorldPOP and MapSPAM datasets, respectively; orange: independent data obtained from the Government of Ecuador (SENAGUA).

Figure 5. Normalized cumulative profiles of domestic water demand and irrigation demand for the Esmeraldas River: (a) and (c), respectively; Guayas River: (b) and (d), respectively. Blue: data estimated using the procedure here; orange: independent data obtained from the Government of Ecuador (SENAGUA). The difference between observed and simulated data curves (S) is highlighted in grey. A smaller S indicates a better agreement between the datasets.
It is feasible to calculate actual volumetric surface water use (e.g. in m³) from our spatially disaggregated surface water demand maps. This will depend on locally specific technical characteristics as well as particular management and policy constraints. The most straightforward approach to estimating volumetric domestic water use would be to multiply our domestic demand maps by per capita water consumption statistics at the relevant spatial scale. Irrigation water use can be estimated in a similar way by combining our maps with specific crop distribution and water demand information.
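As a worked illustration of these conversions, annual volumes at a river pixel i could be written as below. All symbols are assumptions introduced for this sketch: P_i is the population served (inhabitants), q the per capita consumption (L person⁻¹ day⁻¹), A_{i,c} the irrigated area of crop c (ha), and w_c the annual irrigation requirement of that crop (mm year⁻¹). The factor 10 follows from 1 mm of water over 1 ha being equal to 10 m³.

```latex
V_{\mathrm{dom},i} = \frac{365}{1000}\, P_i \, q
\qquad
V_{\mathrm{irr},i} = 10 \sum_{c} A_{i,c}\, w_c
\qquad [\mathrm{m^3\ year^{-1}}]
```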
Future work should focus on developing actual water abstraction datasets for the countries under consideration, which will necessarily involve the cooperation of the various agencies responsible for maintaining such datasets. There has been progress in this regard, as evidenced by the public availability of the national databases that we have used in this analysis. However, considerable data scarcity still remains, which must be addressed to promote integrated water resource management at both regional and national levels. Focus should be directed initially at regions with substantial demand. Moreover, instead of measuring individual abstraction points, decision makers can set up measurement stations upstream and downstream of a river reach with known significant anthropogenic pressures to get an initial estimate of water use. Such efforts are crucial to assess actual water stress in a region undergoing major demographic changes, with population growth rates up to 2050 projected at 37.7% and 62.4% in Ecuador and Bolivia, respectively (United Nations Population Division 2008). Such data could then be coupled with regional and global hydrological models to determine anthropogenic impacts on water availability. Furthermore, more work needs to be done on understanding water use amongst upper Andean communities, who might rely on various water sources across a hydrological year or use unconventional methods such as rainwater collection or fog harvesting.

Conclusions
This study is intended to help both decision makers and scientists to achieve a better spatial understanding of the impact of surface water demand on water security. Whilst we do not estimate actual volumetric water use, our datasets and methodology complement past studies which do estimate such requirements but fail to allocate them adequately in space, mainly due to their coarse spatial resolution. Therefore, possible specific applications include combining our maps with relevant hydrological data to obtain actual water use, or developing risk and vulnerability analyses considering the cumulative irrigated area located downstream of a given mining project.
We do not consider the data, particularly in Peru and Bolivia, to be suitable for localized applications due to the uncertainties involved in the allocation algorithm. Specifically, as the algorithm assigns a given demand pixel to the nearest river point, it assumes all demand sources use gravity as the main water transport mechanism or use a maximum pumping elevation of 50 m. Engineering solutions such as transport from upstream areas are therefore not accounted for. Various steps limit the uncertainty generated from such structural issues, such as correcting the obtained datasets for groundwater abstraction. Finally, as hydrological extremes increase in frequency and intensity, adequately mapping the full extent of risk will be a key step towards ensuring better societal preparedness.

Data availability
The data are intended to assist in developing more thorough and accurate assessments of water resource availability in the Andean region. The surface water demand data also enable more rigorous decision-making across all management and governance scales. Any raw data obtained from the respective country national databases used in this analysis can be obtained from the corresponding author on reasonable request. The datasets are publicly available at http://dx.doi.org/10.6084/m9.figshare.9168041 (Zogheib et al. 2019).
The scripts used in the analysis are in the form of freely available GRASS GIS scripts located in the Figshare repository. Calculations were done using GRASS GIS (version 7.0) and Python (version 3.6.5), both of which are available as open-source software.
Table 3. Computation of discrepancy factor A_d between normalized cumulative estimates from our algorithm and (a) available observations and (b) the average of 10 000 random allocations.