Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping

Crowdsourced data can effectively observe environmental and urban ecosystem processes. Integrating data produced by untrained people into flood forecasting models may allow Early Warning Systems (EWS) to perform better while supporting decision-making to reduce fatalities and economic losses due to inundation hazard. In this work, we develop a Data Assimilation (DA) method integrating Volunteered Geographic Information (VGI) and a 2D hydraulic model, and we test its performance. The proposed framework seeks to extend the capabilities and performance of standard DA works, which are based on traditional in situ sensors, by assimilating VGI while managing and taking into account the uncertainties related to the quality, location and timing of the entire set of observational data. The November 2012 flood in the Italian Tiber River basin was selected as the case study. Results show improvements of the model in terms of uncertainty, with a significant persistence of the model updating after the integration of the VGI, even when only a few selected observations gathered from social media are used. This will encourage further research in the use of VGI for EWS, considering the exponential increase in the quality and quantity of smartphones and social media users worldwide.
ARTICLE HISTORY Received 26 June 2018 Accepted 29 April 2019


Introduction
Hydrologic and hydraulic models for rainfall-runoff and river flow routing simulations are currently implemented within Early Warning Systems (EWS) to manage and mitigate the devastating impact of floods in urban ecosystems (e.g. Krzhizhanovskaya et al. 2011; Alfieri, Pappenberger, and Wetterhall 2014; Girons Lopez, Di Baldassarre, and Seibert 2017). EWS generally incorporate Data Assimilation (DA) algorithms to manage the uncertainty of physically based river channel flow simulations, towards more accurate and timely forecasting of flood wave propagation along fluvial valleys. DA supports hydrodynamic modeling by making optimal use of diverse water stage observation systems. The assimilation of river flow observations (McLaughlin 2002; Moradkhani et al. 2005; Liu and Gupta 2007) and of remotely sensed data (Matgen et al. 2010; García-Pintado et al. 2013) constitutes the two main components of DA-based EWS. However, while in-channel, levee-protected river high flows are easier to forecast using 1D hydraulic models, especially in gauged systems, several challenges affect DA when dealing with over-bank flow propagation across distributed floodplains. This issue mainly affects ungauged basins, but also gauged rivers, given the lack or, when available, the uncertainty of distributed out-of-channel water level sensors and remotely sensed information. In this regard, remote sensing technology is progressing at an unprecedented pace, with radar and optical images available from Earth Observation (EO) data and platforms that capture flood dynamics from the global to the local scale. Nevertheless, flood modeling is still a pivotal asset of EWS, since EO data are not always available in a timely manner because of satellite orbit revisit times, and optical images are affected by vegetation cover.
As a result, research on novel DA frameworks that integrate out-of-channel flood flow observations, making optimal use of all the diverse potential sources of water observations, is crucial for more effective and accurate numerical simulations supporting EWS in understanding and forecasting hazardous events.
Among the innovative sources of inundation observations, data gathered by citizens, or crowdsourced data, offer great potential but also pose a significant challenge. Georeferenced information produced by citizens is now largely and freely available thanks to the spread of smartphones and social media, even in developing countries (Hilbert 2016). Spatial information is voluntarily collected by citizens and published online through social networks (e.g. Facebook, Instagram). Additionally, the increasing number of custom-made platforms and web applications demonstrates the need for, and importance of, user-driven flood observations (e.g. Horita et al. 2015). This novel source of data seems very promising for supporting flood forecasting from the global to the local scale, but significant efforts are still needed before such data can be used in practice. Several projects have tested the use of water observations taken by citizens for water resource and risk management (e.g. Buytaert et al. 2014; Van Meerveld, Vis, and Seibert 2017), but the use of crowdsourced data for real-time flood forecasting has not been fully explored yet.
Flickr and Twitter seem to be the most used social media for obtaining crowdsourced information related to disasters, since all public data can be found and extracted through their Application Programming Interfaces (APIs). Images captured from YouTube video streams are also used. In this context, the analysis and interpretation of media content for organizing, collecting, positioning and extracting relevant information is an active research topic, particularly regarding the extraction of geotagged information from social media streams (e.g. Schindler et al. 2008; Luo et al. 2011).
Different definitions and acronyms are used to refer to water, territorial and environmental monitoring data gathered and disseminated by citizens. Volunteered Geographic Information (VGI; Goodchild 2007) refers to the general principle of citizens volunteering to collect georeferenced information and is commonly used within the geospatial and GIS communities. Other definitions and acronyms, such as Crowdsourced Data (CD) and User Generated Content (UGC), are also adopted without strictly considering the spatial reference. For a more comprehensive list, see Table 1. In this manuscript, we use the acronym VGI.
Several research works present recent findings concerning the use of VGI for water resource and risk management (e.g. Bonney et al. 2014). Buytaert et al. (2014) give some examples of citizen engagement in hydrology and water science specifying also the type of data that are continuously collected by citizens and that may serve this purpose.
The outcomes of these projects and studies are encouraging, demonstrating the possibility of effectively observing climatic and hydrologic processes (Muller et al. 2015). Social media data have been used for directly creating deterministic or probabilistic flood maps from citizen-observed water levels (McDougall 2011; McDougall and Temple-Watts 2012; Fohringer et al. 2015), stream discharges (Le Boursicaud et al. 2016), flood extent (Cervone et al. 2016; Li et al. 2018; Rosser, Leibovici, and Jackson 2017), or just their geotagged position (Eilander et al. 2016; Brouwer et al. 2017; Poser and Dransch 2010; Triglav-Čekada and Radovan 2013; Sun et al. 2016; Holderness and Turpin 2015).
Crowdsourced data have also been integrated for validating flood models. Aulov, Price, and Halem (2014) developed a platform (AsonMaps) for the rough estimation of water levels and flood/non-flood areas, enabling early validation of surge model forecasts. Fava et al. (2014) integrated prediction models with short-term volunteered information to fill the gaps in traditional observed data and improve flood forecasting. Kutija et al. (2014) gathered information from a web page for validating and calibrating a hydrodynamic model. Smith et al. (2015) validated GPU-accelerated hydrodynamic modeling with crowdsourced information for real-time flood forecasting in an urban environment. Yu, Yin, and Liu (2016) validated a 2D urban hydro-inundation model by integrating traditional observations with localized flood incidents reported by the public at the street or house level. Mazzoleni et al. (2015, 2018) demonstrated the benefits of assimilating both traditional and VGI observations with simplified hydrological and hydraulic models for improving flood prediction. Nevertheless, further research is needed to test the assimilation of crowdsourced data in real scenarios, adopting more advanced models and considering other uncertainties related to VGI, such as those on location and timing.
The principal issue affecting the use of VGI for flood analysis is the uncertainty and low reliability of such data. While VGI are still considered qualitative information, there is significant potential in using VGI as quantitative water observations, provided that their uncertainty is carefully evaluated. The uncertainty analysis is expressed as a function of the citizen's technological equipment, experience and credibility (e.g. random citizens versus trained volunteers), and of any further information characterizing the citizen-driven VGI gathering process that affects the accuracy, completeness and precision of the water observation (Tulloch and Szabo 2012; Bordogna et al. 2014). Reliability assessments require ad hoc statistical tools to evaluate the random error and bias to be assigned to the observation (e.g. Bird et al. 2014). Data reliability can be assessed considering not only the expertise and methodology that characterize the data source, but also the time and position at which the information is sensed and processed. To address this issue, semantic rules governing what can occur at a given location can be used as a filter for observations (Vandecasteele and Devillers 2013), or the mean and standard deviation of compared measurements can be computed over predefined time windows (Mazzoleni et al. 2018).
While an exponentially increasing number of studies are working towards a quantitative use of crowdsourced information (Assumpção et al. 2018), we could find few attempts to integrate VGI into DA frameworks for improving EWS performance. Additionally, the few case studies implementing VGI for DA-based flood forecasting adopt simplified hydrologic and hydraulic models, in particular 1D hydraulic algorithms (Mazzoleni et al. 2015, 2018). Exploiting the full potential of citizen-driven flood hazard and risk management and modeling systems that also integrate VGI requires a comprehensive simulation of the spatially distributed floodplain water dynamics, which 1D hydraulic models cannot provide. VGI and 2D hydraulic models should, thus, be jointly considered and implemented in DA-based EWS frameworks that effectively account for all the diverse data sources, as well as for the interplaying physical and human processes, towards more accurate and efficient flood early warning and inundation hazard management. This research addresses this important missing component of VGI use in flood modeling studies by: (a) implementing and testing a DA framework using VGI and a 2D hydraulic model; (b) using VGI effectively for flood risk management; and (c) using a real case study to evaluate the advantages and limitations of the proposed approach.
The paper is organized as follows: In Section 2 the case study of the Tiber River is presented, with specific subsections on input data and available VGI for the selected flood event. In Section 3 the models and methods are described, with specific regard to the forecasting hydrologic and hydraulic models and the DA methodology, focusing on the modeling and the observational error analysis. Results are, then, presented in Section 4 and conclusions are given in Section 5.

Case study
The selected case study is the Tiber River basin, the second largest catchment in Italy. In particular, the fluvial segment from Orte Scalo to the Castel Giubileo dam, located just upstream of the city center of Rome in central Italy, is selected (the bounding box of the domain ranges between 42.6° and 43.85° N and between 11.8° and 12.9° E, in the Lazio region; Figure 1). The catchment area at the Orte Scalo node is 5881 km², with several minor tributaries contributing downstream to the main river (such as the Treja, Farfa, Corese and Aja Di Poggio) and only one main tributary, the Nera River (4180 km²), leading to an overall basin area of 15,086 km² at the downstream Castel Giubileo node. Elevations for the entire basin range from a maximum of 2470 m above sea level (a.s.l.) to a minimum of -6 m a.s.l., with an average elevation of 524 m a.s.l. Land use is predominantly cultivated land (~55%), with forests (~40%) and urbanized areas (~5%). The climatic and hydrologic regime of the Tiber River basin follows a Mediterranean pattern, characterized by variable and intense precipitation, with most of the annual rainfall (average annual cumulative precipitation of 1020 mm/year; Romano and Preziosi 2013) occurring in autumn and spring. The Tiber River is often subject to flooding, which affects the selected river segment and also the historical city center of Rome (Tauro et al. 2016; Nardi, Annis, and Biscarini 2018a). The hydrologic regime of the upstream boundary condition is governed by the Corbara dam, located 40 km from the Orte Scalo node, and by the Paglia River confluence, just downstream of the Corbara dam, which together control the flood routing dynamics of the Orte Scalo - Castel Giubileo fluvial valley. This river reach is strategically important for flood risk management in the Rome city center, given the flood wave expansion and attenuation within the Orte - Castel Giubileo floodplain.

Available material and data
The increasing frequency of inundation events in the Tiber River basin (Brunetti et al. 2004; Fiseha et al. 2014) motivated the development of a flood hazard modeling and mapping update program (known as PS1), commissioned in 2012 by the Lazio Region and the Tiber River Basin Authority. The hydrologic and hydraulic modeling study consisted of gathering, processing and modeling updated topographic and bathymetric data (Table 2) to create a robust and accurate geospatial dataset representing the geometry of the riverbed. Additional available datasets include river discharge/stage rating tables and observations of recent flood events at the main bridges and weirs, gathered from the Lazio Region Civil Protection agency. Land use data (Corine Land Cover at the fourth level for the whole national territory, provided by the Istituto Superiore per la Protezione e la Ricerca Ambientale, ISPRA) were also gathered for the hydrologic parametrization of the floodplain landscape within the study domain. Hydraulic parameters (e.g. roughness) were calibrated using observations from recent floods to build a consistent flood wave routing model, which was used to simulate the ground effect of synthetic design hydrographs for predefined recurrence intervals (50-, 200- and 500-year return periods). The corresponding maximum potential flooding extents were then identified for each hydrologic forcing scenario for urban planning and regional zoning purposes. The hydrologic data and the implemented hydraulic model, based on a 2D hydraulic algorithm, are used as reference material and methods for this research.

Crowdsourced data
The 2012 flood event was selected as the reference test case. VGI from structured volunteering projects (e.g. phone apps, custom web apps) are not available for the study domain. Nevertheless, social network users are generally very active, given the high population density of the Tiber fluvial valley near the city of Rome. Unconventional data sources, YouTube in particular, were therefore used. The Twitter Search API was also queried, but no information valuable for a DA implementation was found. Among the several available video streams, a small subset of flood videos was found to be of particular interest for the visual observation of water levels at recognizable locations. Detailed georeferencing of the images was possible using the tags of the posts and by matching the crowdsourced pictures with Google Street View. The geocoding of the captured media locations was thus completed, and a small number (fewer than 20) of flood observation nodes were identified within the computational domain in three urbanized areas (Orte Scalo, Torrita Tiberina and Monterotondo) that were severely affected by the Tiber River flooding (Figure 2). Additional crowdsourced images were gathered from the internet, news articles and social networks (Twitter in particular), even though the selected flood event took place six years before this work. This web crawling activity, combining YouTube video extraction with internet searches using the date, the place and common hashtags (e.g. #Tiber, #flood) as keywords, led to the identification of a total of three water observations (Table 3) considered suitable as potentially quantitative observations for the proposed procedure.
The selected VGI are characterized by uncertainties in their spatial and temporal localization. The methodology can be replicated in near real time for the same purpose by using the Application Programming Interfaces (APIs) of social media platforms, which allow geotagged information to be filtered by keywords related to the flood event. This process can be automated using semantic filtering and deep learning. Tkachenko, Jarvis, and Procter (2017) demonstrated the potential of using polysemous tags of images in the Flickr database to support early warning of an event before its outbreak. Brouwer et al. (2017) used the Twitter streaming API to spatially and semantically filter tweets related to the analyzed flood event. Jiang et al. (2018) adopted transfer learning and lasso regression to extract waterlogging depth from video images. Advanced algorithms for extracting information from the Twitter platform for disaster response, such as TAGGS (de Bruijn et al. 2018), have also recently been developed.
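The keyword, space and time filtering described above can be sketched as a post-processing step applied to posts already retrieved from a platform API. This is a minimal illustration, not the authors' workflow: the `Post` record and `filter_flood_posts` function are hypothetical, the bounding box and time window are caller-supplied assumptions, and the keyword test is a plain case-insensitive substring match rather than semantic filtering.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass
class Post:
    """Hypothetical, platform-agnostic record of a social media post."""
    text: str
    lat: Optional[float]  # None when the post carries no geotag
    lon: Optional[float]
    posted_at: datetime


def filter_flood_posts(
    posts: Iterable[Post],
    keywords: Sequence[str],
    bbox: Tuple[float, float, float, float],  # (lat_min, lat_max, lon_min, lon_max)
    start: datetime,
    end: datetime,
) -> List[Post]:
    """Keep geotagged posts inside bbox, posted within [start, end], mentioning a keyword."""
    lat_min, lat_max, lon_min, lon_max = bbox
    kws = [k.lower() for k in keywords]
    selected = []
    for p in posts:
        if p.lat is None or p.lon is None:
            continue  # the spatial filter needs a geotag
        if not (lat_min <= p.lat <= lat_max and lon_min <= p.lon <= lon_max):
            continue
        if not (start <= p.posted_at <= end):
            continue
        text = p.text.lower()
        if any(k in text for k in kws):  # simple case-insensitive substring match
            selected.append(p)
    return selected
```

Posts that pass this filter would still need the manual (or learned) interpretation step described in the text before a water depth can be extracted.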

Flood modeling
The selected inundation model is based on the combination of a hydrological rainfall-runoff model and of a 2D river flow routing algorithm.
The hydrologic model is based on the Instantaneous Unit Hydrograph (IUH) rainfall-runoff model implementing the width function (WF); the WF characterizes the shape of the runoff response to a unit precipitation pulse by analyzing the travel time distribution of surface runoff at the basin scale (Grimaldi, Petroselli, and Nardi). In particular, the WFIUH implements DEM-based terrain analysis, interpolation and processing for simulating the runoff response to the precipitation input using an IUH expressed as a function of the geomorphic structure of the river network and of predefined surface flow dynamics (i.e. channel and hillslope surface flow velocities assigned as a function of the morphologic and land use properties of the catchment) (Rodríguez-Iturbe and Valdes 1979; Bras 2004; Grimaldi, Teles, and Bras 2005; Petroselli and Grimaldi 2018). This hydrologic model, calibrated using real events, transforms rainfall observations gathered from the Tiber basin monitoring network into runoff hydrographs to be used as upstream boundary conditions for the hydraulic routing model. The selected 2D river flow routing algorithm is the FLO-2D PRO software, a physically based model able to simulate flood wave routing over unconfined flow surfaces (2D equations) using the dynamic wave approximation of the momentum equation (O'Brien, Julien, and Fullerton 1993). FLO-2D routes the flood hydrograph over the gridded floodplain surface, simulating channel flow propagation, channel-floodplain overbank exchange and the interaction of floodplain flow with urban features and obstructions (e.g. levees, culverts, bridges, embankments, buildings). FLO-2D is a volume-conservative model that solves the de Saint-Venant equations, using the Manning coefficient to represent terrain roughness.
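The width-function IUH idea above can be illustrated with a short sketch: each basin cell contributes a travel time equal to its hillslope flow length divided by a hillslope velocity plus its channel flow length divided by a channel velocity; the histogram of these travel times, normalized to unit area, is a discrete IUH, and convolving it with effective rainfall gives the outlet hydrograph. This is not the calibrated WFIUH used in the study: the function names, the constant velocities and the assumption that per-cell flow lengths have already been extracted from the DEM are all illustrative.

```python
import numpy as np


def width_function_iuh(hill_len, chan_len, v_hill, v_chan, dt):
    """Discrete IUH ordinates (1/s) from the travel-time distribution of basin cells.

    hill_len, chan_len: per-cell hillslope and channel flow lengths to the outlet (m)
    v_hill, v_chan: assumed constant hillslope and channel velocities (m/s)
    dt: time step (s)
    """
    travel_time = np.asarray(hill_len) / v_hill + np.asarray(chan_len) / v_chan  # s
    n_bins = max(1, int(np.ceil(travel_time.max() / dt)))
    counts, _ = np.histogram(travel_time, bins=n_bins, range=(0.0, n_bins * dt))
    # Normalize so the IUH integrates to one: sum(iuh) * dt == 1
    return counts / (counts.sum() * dt)


def runoff_hydrograph(eff_rain_mm_h, iuh, dt, area_km2):
    """Convolve effective rainfall (mm/h) with the IUH to obtain discharge (m^3/s)."""
    rain_m_s = np.asarray(eff_rain_mm_h, dtype=float) / 1000.0 / 3600.0
    return np.convolve(rain_m_s, iuh) * dt * area_km2 * 1e6
```

Because the IUH integrates to one, a long spell of constant effective rainfall converges to discharge equal to rainfall intensity times basin area, which is a quick sanity check on any such implementation.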
FLO-2D is defined as a Quasi-2D model considering that the channel is represented using cross sections and a 1D geometry linked to the unconfined gridded surface where the flow, overflowing from the 1D channel, is propagated using a fully dynamic 2D wave routing.
For the presented case study, a grid resolution of 150 m was set, interpolating topographic data from a high-resolution 1 m LiDAR dataset. This resolution proved to be a good compromise between computational effort and accuracy of results (Peña and Nardi 2018). In situ surveyed cross sections were adopted to represent the geometry of the river channel. Boundary conditions were assigned considering both the flow measurements in the upstream part of the Tiber River and the flow hydrographs of 15 ungauged basins simulated by the WFIUH hydrologic model. The computational domain was optimized with a hydrogeomorphic model (Nardi, Vivoni, and Grimaldi 2006; Nardi et al. 2018b; Morrison et al. 2018) following the approach of Annis et al. (2019).
The distribution of Manning values for the floodplain surface within the study domain was evaluated using reference values from the literature (ranging between 0.02 and 0.20 m^-1/3 s), associating roughness conditions with the land use classes of the Corine Land Cover project at the fourth level.

DA model
In this work, the Ensemble Kalman Filter (EnKF) method (Evensen 2003) is applied to the selected Quasi-2D hydraulic model. The EnKF is a sequential DA method that estimates the unknown model state (e.g. flood flow depth) from the available observations at each time step. Specifically, the updated probability density function (pdf) of the model state is obtained by combining the data likelihood with the forecast pdf of the model states through a Bayesian update. The forecast (a priori) state error covariance matrix is approximated by propagating the ensemble of model states, including their uncertainties, from the previous time step. At the same time, an ensemble of observations is generated at each update time by adding a noise term drawn from the observation error distribution. If a set of observations $y_{t+1}$ is taken at time $t+1$, these can be assimilated into the model. The observation for the $i$-th ensemble member can be expressed as:

$$y_{t+1}^{i} = h\left(x_{t+1}^{i}, \theta\right) + \eta_{t+1}^{i} \qquad (1)$$

where $h(\cdot)$ is a propagator that links the state variables to the measured variables, providing the expected value of the output given the model state and parameters, and $\eta_{t+1}^{i}$ is the noise, drawn from a normal distribution with zero mean and variance $R_{t+1}^{y}$, which is usually considered time dependent.
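The state variable $x_{t+1}^{i-}$ of the forecast model in the EnKF, for the $i$-th element of the ensemble at time $t+1$, can be expressed as:

$$x_{t+1}^{i-} = f\left(x_{t}^{i+}, I_{t}^{i}, \theta^{i}\right) + w_{t}^{i} \qquad (2)$$

where $f(\cdot)$ is the non-linear model operator, $\theta^{i}$ are the model parameters, $w_{t}^{i}$ is the model error and $I_{t}^{i}$ is the ensemble of forcing inputs, generated starting from the deterministic value and adding a random normal error with zero mean and a prescribed variance. From the a priori estimate $x_{t+1}^{-}$, the posterior estimate $x_{t+1}^{+}$ is calculated using the observation $y_{t+1}$ by applying a linear Kalman filter correction to each forecast ensemble member:

$$x_{t+1}^{i+} = x_{t+1}^{i-} + K_{t+1}\left(y_{t+1}^{i} - H x_{t+1}^{i-}\right) \qquad (3)$$

where $K_{t+1}$ is the Kalman gain matrix, expressed as:

$$K_{t+1} = P_{t+1}^{-} H^{T}\left(H P_{t+1}^{-} H^{T} + R_{t+1}^{y}\right)^{-1} \qquad (4)$$

where $P_{t+1}^{-}$ is the ensemble covariance matrix, $H$ is the observation transition operator and $R_{t+1}^{y}$ is the variance of the observation error.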
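The Kalman gain and linear correction described in the preceding paragraph can be sketched in a few lines of NumPy. This is a generic stochastic (perturbed-observation) EnKF analysis step, assuming that each member's observation is the measured value plus a draw from N(0, R); it is not the authors' implementation, and the function name `enkf_update` is hypothetical.

```python
import numpy as np


def enkf_update(X_f, y, H, R, rng):
    """Stochastic EnKF analysis step.

    X_f: forecast ensemble, shape (n_state, N)
    y:   observation vector, shape (m,)
    H:   observation operator, shape (m, n_state)
    R:   observation error covariance, shape (m, m)
    Returns the analysis ensemble and the Kalman gain.
    """
    n_members = X_f.shape[1]
    # Ensemble anomalies and sample forecast covariance P^-
    A = X_f - X_f.mean(axis=1, keepdims=True)
    P = A @ A.T / (n_members - 1)
    # One perturbed observation per member: y + eta_i, with eta_i ~ N(0, R)
    eta = rng.multivariate_normal(np.zeros(len(y)), R, size=n_members).T
    Y = y[:, None] + eta
    # K = P H^T (H P H^T + R)^-1, computed with a linear solve (P and S are symmetric)
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T
    return X_f + K @ (Y - H @ X_f), K
```

With a single water depth state observed directly, `H` is the 1x1 identity, consistent with the direct relation between state and VGI observation described in the next paragraph; the gain then reduces to P / (P + R).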
In this work, the state variable $x_t$ is the water depth at a specific location of the computational domain. For crowdsourced information, such as a photo depicting water depths or a quantitative description of flood depths in any web content, the state variable can be located either in the channel or, more likely, in the floodplain. The non-linear function $f(\cdot)$ introduced in Equation (2) is the hydraulic model engine, whose forcing term $I_t$ is the ensemble of flow hydrographs and whose parameters $\theta$ are the channel and floodplain roughness. The model error $w_t$ is estimated considering the uncertainties related to the input forcing and the model parameters. The observation $y_t$ is a water depth value gathered from the VGI. For this reason, the observation transition operator $H$ is an identity matrix, since there is a direct relation between state variable and observation. The perturbation $\eta_t$ assigned to the observation ensemble strongly depends on the nature of the observation, as described in Section 3.4. The correction of the model states in the 2D hydraulic domain is applied to the floodplain cell where the observation is located, but also to the closest channel cell and to the floodplain cells hydraulically connected to that channel cell. Since correcting only one channel cell would bring instability to the flood wave routing model, the correction is also propagated upstream and downstream using a gain function similar to that of Madsen and Skotner (2005). This propagation is weighted by a spatially varying correction factor that depends on the distance from the corrected location: the weights are proportional to the inverse of the distance between the analyzed channel cell and the channel cell connected to the floodplain cell where the observation is located.
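The inverse-distance propagation of the correction along the channel can be sketched as follows. The text states only that the weights are proportional to the inverse of the along-channel distance; the normalization (weight 1 within one cell of the corrected cell, then `cell_size / distance`) and the `max_reach` cutoff are assumptions made here to keep the sketch well defined, and the function is hypothetical. Updating the hydraulically connected floodplain cells is not shown.

```python
import numpy as np


def propagate_correction(chainage, j0, delta, cell_size, max_reach):
    """Spread a correction computed at channel cell j0 to neighbouring channel cells.

    chainage: along-channel distance of each channel cell (m)
    delta: correction (e.g. water depth increment, m) at cell j0
    Weights equal 1 within one cell of j0 and decay as cell_size / distance
    beyond it (inverse distance); cells farther than max_reach get no correction.
    """
    d = np.abs(np.asarray(chainage, dtype=float) - chainage[j0])
    # np.maximum keeps the denominator >= cell_size, so there is no division by zero
    w = cell_size / np.maximum(d, cell_size)
    w[d > max_reach] = 0.0
    return w * delta
```

For example, with 150 m channel cells, a correction of 0.4 m at the connected cell becomes 0.2 m two cells away (300 m) and vanishes beyond the assumed 600 m reach.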
The following subsections describe how the model errors (Section 3.3) and the observation errors (Section 3.4) are taken into account to generate the ensembles of forecast state variables and observations used in the EnKF framework. Section 3.4 presents one of the main novelties of the proposed work, namely the approach tested for managing the uncertainties of VGI within a DA-based flood EWS.

Errors of the flood forecast model
The EnKF takes into account the uncertainty related to the model errors through an ensemble of model realizations, obtained by perturbing (a) the forcing input given by the static sensors and by the hydrologic model, and (b) the model parameters, namely the channel roughness expressed by the Manning coefficient. Following Weerts and El Serafy (2006), the $i$-th element of the hydrologic input at time $t$ is expressed as:

$$Q_{t}^{i} = Q_{t}^{S} + N\left(0, R_{t}\right)$$

where $Q_{t}^{S}$ is the flow value given by the observation or by the hydrologic model and $N(0, R_t)$ is a normally distributed noise term with zero mean and variance $R_t$, which depends on the type of hydrologic input and can be expressed as:

$$R_{t} = \left(\alpha_{t}\, Q_{t}^{S}\right)^{2}$$

where $\alpha_t$ is the coefficient of variation related to the uncertainty of the input discharge. The uncertainty related to discharge observations ($\alpha_{StS,t}$) is given by the sum of two components (Clark et al. 2008): the estimation of the water level from the static sensor (EWL) and the transformation of the water level into discharge through the rating curve (ERC). In this case study, $\alpha_{StS,t}$ was set equal to 0.12, with the rating curve component equal to 0.1 (Weerts and El Serafy 2006) and the water level estimation component equal to 0.02 (Mazzoleni et al. 2015).
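The inflow perturbation above, with standard deviation equal to the coefficient of variation times the deterministic discharge, can be sketched as follows. Two choices here are assumptions, not statements from the text: the noise is drawn independently at each time step (temporally uncorrelated), and negative discharges are clipped to zero. The function name is hypothetical.

```python
import numpy as np


def perturb_inflow(q, alpha, n_members, rng):
    """Ensemble of inflow hydrographs: Q_i(t) = Q(t) + N(0, (alpha * Q(t))^2).

    q: deterministic hydrograph (m^3/s), shape (T,)
    alpha: coefficient of variation (e.g. 0.12 for gauged inflow)
    Returns an array of shape (n_members, T); negative values are clipped to zero.
    """
    q = np.asarray(q, dtype=float)
    noise = rng.standard_normal((n_members, q.size)) * alpha * q
    return np.clip(q + noise, 0.0, None)
```

The same function would apply to the hydrologic-model inflows by passing the larger coefficient of variation reported for them.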
The uncertainties affecting the hydrologic model, quantified by the coefficient of variation $\alpha_I$, are related to different factors, such as the measured rainfall and its distribution at the basin scale, the simplified modelling of flow routing, neglected physical processes (e.g. groundwater flow), mud and debris flow, and antecedent soil moisture conditions, among others. After a calibration analysis of the abovementioned hydrologic model for four small gauged basins during the 2012 flood event, not reported here for the sake of brevity, $\alpha_I$ was set equal to 0.3. The uncertainty related to the model parameters is represented by perturbing each parameter with a fractional parameter error (Clark et al. 2008; McMillan et al. 2013), where $p_{i}^{s}$ is the perturbed model parameter for the $i$-th element of the ensemble, $p^{s}$ is the model parameter and $\varepsilon_P$ is the fractional parameter error. In this case, the channel roughness was chosen as the perturbed parameter and $\varepsilon_P$ was set equal to 0.25, which constrains the channel Manning coefficient between 0.30 and 0.50 m^-1/3 s.
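The text does not spell out the distribution of the fractional parameter error, so this sketch assumes a multiplicative perturbation with a uniform factor, n_i = n(1 + ε_P u_i) with u_i in [-1, 1], which is one form that keeps every member within n(1 ± ε_P). The function name and the nominal value 0.4 used in the check are hypothetical, chosen only because 0.4(1 ± 0.25) spans the 0.30 to 0.50 range quoted above.

```python
import numpy as np


def perturb_manning(n_nominal, eps, n_members, rng):
    """Multiplicative roughness perturbation n_i = n * (1 + eps * u_i), u_i ~ U(-1, 1).

    The uniform factor is an assumption; it keeps every member inside
    [n * (1 - eps), n * (1 + eps)].
    """
    u = rng.uniform(-1.0, 1.0, size=n_members)
    return n_nominal * (1.0 + eps * u)
```

A Gaussian factor would also be a reasonable choice, but it would not strictly bound the ensemble, which is why the bounded form is used here.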

Observation errors in crowdsourced data
The observation errors related to VGI are characterized by three different factors: location error, timing error and the water depth estimation error.
Location error ($err_{loc}^{VGI}$)
DA models normally consider the location of observations as certain. This is reasonable for typical oceanographic or hydrologic measurement methods, and potential location errors have therefore received little attention in the literature. The location of VGI, for example tweets, can be uncertain because geotags are available for only a very small number of tweets and may differ significantly from the actual location of the observation (Hahmann, Purves, and Burghardt 2014).
If the VGI is a picture, even if the geotagged position is wrong, the image may contain landmarks that allow the correct position of the observation to be identified, so that the location error can be lower than the resolution of the large-scale hydraulic model and thus negligible. If the VGI is a text message from a social platform, or an image without any recognizable landmark, the accuracy of the geotagged position can vary considerably depending on its type (McClanahan and Gokhale 2015; Brouwer et al. 2017). The perturbation of the VGI observation due to the positioning error for the $i$-th element of the ensemble can be expressed as a normally distributed noise with zero mean and variance $R_{loc}^{VGI}$:

$$err_{loc}^{VGI,i} = N\left(0, R_{loc}^{VGI}\right)$$

In the adopted hydraulic model, this error is implemented by moving the cell to which a VGI observation is assigned by the number of times the location error of the $i$-th ensemble element exceeds the model resolution $\Delta x$, separately for the x and y coordinates (Figure 3):

$$X_{i}^{VGI} = X^{VGI} + \operatorname{int}\left(\frac{err_{loc,x}^{VGI,i}}{\Delta x}\right)\Delta x, \qquad Y_{i}^{VGI} = Y^{VGI} + \operatorname{int}\left(\frac{err_{loc,y}^{VGI,i}}{\Delta x}\right)\Delta x$$

where $\operatorname{int}(\cdot)$ denotes the integer part and $X^{VGI}$ and $Y^{VGI}$ are the North and East coordinates of the geotagged VGI. The variance $R_{loc}^{VGI}$ varies depending on the type of geotagging. Following an approach similar to that suggested by Sengupta et al. (2012), at the time step in which an observation is assimilated, if the location $(X_{i}^{VGI}, Y_{i}^{VGI})$ associated with the $i$-th element of the ensemble differs from $(X^{VGI}, Y^{VGI})$, the observation at the original location is derived as it would be if the observation were assimilated at $(X_{i}^{VGI}, Y_{i}^{VGI})$. Hence, the observation at location $(X^{VGI}, Y^{VGI})$ is given by:

$$y_{i}\left(X^{VGI}, Y^{VGI}\right) = y^{VGI} + \left[WL_{i}^{-}\left(X^{VGI}, Y^{VGI}\right) - WL_{i}^{-}\left(X_{i}^{VGI}, Y_{i}^{VGI}\right)\right]$$

where $WL_{i}^{-}$ is the forecast water level of the $i$-th ensemble member before the update. In other words, the observation at the original location $(X^{VGI}, Y^{VGI})$ is computed starting from the observation at the perturbed location $(X_{i}^{VGI}, Y_{i}^{VGI})$ and considering the water level difference between the two locations before the updating step of the DA.
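The two steps described above, shifting the observation cell by the integer number of cell sizes contained in a random location error and translating the observation back to the original cell using the forecast water level difference, can be sketched as follows. This is a minimal illustration under stated assumptions: the grid is indexed as `(i, j)` with no claim about how these axes map to North and East, indices are clipped to the grid, `r_loc` is treated as a variance in m², and both functions are hypothetical.

```python
import numpy as np


def perturbed_cell(i, j, r_loc, cell_size, shape, rng):
    """Shift the observation cell (i, j) by a random location error.

    The error in each direction is N(0, r_loc), with r_loc a variance (m^2);
    the cell index moves by the integer number of cell sizes contained in the error.
    """
    err_i, err_j = rng.normal(0.0, np.sqrt(r_loc), size=2)
    di, dj = int(err_i / cell_size), int(err_j / cell_size)  # truncation toward zero
    ni = min(max(i + di, 0), shape[0] - 1)  # stay inside the grid
    nj = min(max(j + dj, 0), shape[1] - 1)
    return ni, nj


def obs_at_original(wse_obs, wse_forecast, orig, pert):
    """Translate an observation treated as made at `pert` back to the cell `orig`,
    using the forecast water-level difference between the two cells (before the update)."""
    return wse_obs + (wse_forecast[orig] - wse_forecast[pert])
```

With the location variance used for the selected images (100) and 150 m cells, the error almost never exceeds one cell, so the perturbed cell coincides with the original one, consistent with treating the location error of landmark-bearing images as low.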
The selected VGI data, being images in which landmarks are clearly visible, are affected by a low location error, whose variance was set to $R_{loc}^{VGI} = 100$.

Timing error ($err_{time}^{VGI}$)
The timing error, that is, the error in assigning a specific time to a VGI observation, has two components: (a) the error due to an incorrect time setting on the device, which is typically no more than a few seconds or minutes and can be neglected; and (b) the lag between the acquisition of the information and its posting by the user. For (b), if the VGI contains no text reporting the exact time of the observation, or if this information is imprecise, the delay between acquisition and sharing can be considerable, i.e. several hours. To take this error into account, the proposed approach perturbs the timing of the VGI.
The time step associated with the VGI observation for the $i$-th element of the ensemble can be perturbed using the following expression:

$$t_{i}^{VGI} = t^{VGI} + N\left(0, R_{time}^{VGI}\right)$$

where $N(0, R_{time}^{VGI})$ is a normally distributed noise with zero mean and variance $R_{time}^{VGI}$ [hours]. If at time step $t_k$ the $i$-th element of the ensemble is affected by a VGI observation, its corresponding perturbed observation is directly given by Equation (7). If the $i$-th element is not affected by an observation at time $t_k$ but has already been affected by the observation at time $t_{k-1}$, its water depth observation at time $t_k$ is the value resulting from the correction given by Equation (7) at time $t_{k-1}$. Lastly, if the $i$-th element is not affected by an observation at time $t_k$ but will be affected by it at time $t_{k+1}$, its water depth observation at time $t_k$ is the value assumed by its state variable when no update is performed at time $t_k$ (Figure 4). In this work, the variance $R_{time}^{VGI}$ associated with the timing of each VGI image was set equal to 30 min.
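The timing rules above can be sketched with two small helpers: one draws a perturbed assimilation step for each member, the other builds the observation each member sees at step k. Members whose perturbed step is k receive the VGI value; every other member (already corrected earlier, or not yet reached) receives its own current state, i.e. a zero innovation, so that observation does not update it at k. This reading of the three cases, the rounding to the nearest step and the function names are assumptions; the 30 min variance is passed as 0.5 in hour-based units.

```python
import numpy as np


def assimilation_steps(t_obs_h, r_time_h, dt_h, n_members, rng):
    """Perturbed assimilation time step for each ensemble member.

    t_obs_h: nominal VGI time (h from simulation start)
    r_time_h: timing-error variance (in the hour-based units of R_time)
    dt_h: assimilation time step (h)
    """
    t_i = t_obs_h + rng.normal(0.0, np.sqrt(r_time_h), size=n_members)
    return np.rint(t_i / dt_h).astype(int)


def member_observations(k, steps, y_obs, x_forecast):
    """Observation seen by each member at step k.

    Members whose perturbed step is k receive the VGI value; all the others receive
    their own forecast (zero innovation), so the VGI does not update them at step k.
    """
    return np.where(steps == k, y_obs, x_forecast)
```

The resulting member-wise observations can then be passed to a standard EnKF analysis step.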
3.4.3. Water depth derivation ($err^{VGI}_{wd}$)
Water surface elevation has been derived by adding the water depth observed from VGI data to the local ground elevation (Fohringer et al. 2015; Brouwer et al. 2017). Water depths can be derived either from image interpretation or from text messages describing the flood dynamics. Brouwer et al. (2017) observed that water depths mentioned in tweets are generally higher than those derived from visual interpretation of photographs, with errors lower than 55 cm. However, a statistical test could not confirm that the mean error in water depth differed from zero, so water depth estimation errors have been simulated using a normal distribution with zero mean and a standard deviation of 20 cm.
The water surface elevation has been derived as the sum of the terrain elevation given by the LiDAR DTM and the depth deduced from visual interpretation of the image. The water surface elevation of the i-th ensemble member is then perturbed by adding zero-mean Gaussian noise to both components, where $R_{DEM}$ is the variance related to the DEM error, set to 0.3 m, and $R_{Depth}$ is the variance of the water depth derived from visual interpretation, set to 0.2 m, as in Brouwer et al. (2017). A lower limit of 0.05 m has been imposed on the derived water depth to avoid negative or zero depths. The water depth has been deduced by comparing the images taken during the flood with images of the same locations in dry conditions from Google Street View.
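A minimal sketch of this perturbation is given below. It assumes that the DEM and depth errors are independent and additive and that the 0.05 m floor is applied to the perturbed depth before it is added to the perturbed terrain elevation. Because the text reports both spreads in metres (and the 20 cm of the previous paragraph is a standard deviation), the sketch treats 0.3 m and 0.2 m as standard deviations; this reading, and the function name, are assumptions.

```python
import random

MIN_DEPTH_M = 0.05  # lower limit on the derived water depth, as stated in the text


def perturb_wse(z_dem_m, depth_m, sd_dem_m, sd_depth_m, n_members, seed=None):
    """Ensemble of perturbed water surface elevations for one VGI observation.

    Each member is (z + e_dem) + max(d + e_depth, 0.05), with e_dem and e_depth
    independent zero-mean Gaussian errors (assumed form, see lead-in).
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        z_i = z_dem_m + rng.gauss(0.0, sd_dem_m)                        # terrain error
        d_i = max(depth_m + rng.gauss(0.0, sd_depth_m), MIN_DEPTH_M)    # clipped depth
        members.append(z_i + d_i)
    return members


# Example with hypothetical values: ground at 20.0 m a.s.l., 0.6 m of water in the photo.
wse = perturb_wse(20.0, 0.6, 0.3, 0.2, n_members=50, seed=7)
```

Clipping the depth rather than the elevation keeps the perturbed water surface at least 5 cm above the member's own perturbed ground, which is how the text's lower limit reads.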

Results
Here we present a comparative analysis of the inundation modeling obtained with the proposed approach using the error correction procedure (updated model) with respect to the model run without the correction procedure (non-updated model). Figure 5(b) shows the spatial distribution of the differences between the mean water depths of the updated and non-updated simulations. The vertical correction along the water profile is represented in Figure 5(d), while its temporal variation is shown in Figure 5(c). The correction persists for about 8 h. Further results of the performance analysis are reported in Table 4. These outcomes suggest that VGI data would improve model performance more significantly with a larger number of observations, distributed in space and also in time, at least every few hours, given that the effect of each observation persists for several hours after its assimilation.
The correction of the water levels does not considerably affect the flood extent when the mean values of the water surface elevation are considered. For example, at 9:00 am on 14/11/2012, i.e. at the time of the correction with the VGI data from Torrita Tiberina, the increase in flooded area after the correction is only 0.079 km². Figure 5(b) shows the spatial distribution of the mean water level corrections at the time of the Torrita Tiberina VGI acquisition. The performance of the overall simulation, in terms of Nash-Sutcliffe efficiency and Pearson correlation, is computed at four stage gage stations (Figure 6). Since the corrections are local and persist for a few hours, the improvement of the updated simulation over the non-updated one is almost negligible at the first two stations (in the upstream part of the basin), whereas a slight improvement can be appreciated in the downstream part of the basin.
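For reference, the two scores used at the stage gage stations can be computed from paired observed and simulated stage series as sketched below. These are the standard definitions of the Nash-Sutcliffe efficiency and the Pearson correlation coefficient, written in plain Python; the function names and the example series are illustrative, not the authors' data.

```python
import math


def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((o - s)^2) / sum((o - mean(o))^2); 1 is a perfect fit."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical hourly stages (m) at one gage: observed, non-updated and updated runs.
observed = [3.1, 3.6, 4.4, 5.0, 4.7, 4.1]
non_updated = [3.0, 3.4, 4.1, 4.8, 4.6, 4.0]
updated = [3.0, 3.5, 4.3, 4.9, 4.7, 4.1]
```

Comparing `nash_sutcliffe(observed, updated)` with `nash_sutcliffe(observed, non_updated)` at each station gives the kind of updated vs non-updated comparison described above.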
The effect of the correction procedure is minor, an expected result given the small number of VGI observations injected into the DA procedure, but the outcome of the tests, in terms of computational efficiency and accuracy of the 2D flood routing model, is encouraging. The proposed methodology, implementing a DA-based updating of the state variables, allowed the 2D hydraulic model to remain stable after each correction. This is promising for the adoption of larger, spatially distributed VGI datasets during flood events.
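The state updating referred to here is performed with the EnKF. As an illustration only, the analysis step for a single scalar observation can be sketched as below. This is the textbook stochastic EnKF with perturbed observations, assuming that the state vector holds water surface elevations and that the observation operator simply picks out one element of it. It is not a reproduction of Equation (7) or of the authors' implementation, it uses no localization or inflation, and the function name is hypothetical.

```python
import random


def enkf_update_single_obs(ensemble, obs_index, y_obs, r_obs, seed=None):
    """Stochastic EnKF analysis for one scalar observation of state element obs_index.

    ensemble: list of members, each a list of state values (e.g. water surface elevations).
    r_obs: observation error variance.
    """
    rng = random.Random(seed)
    n = len(ensemble)
    m = len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / n for j in range(m)]
    hx = [member[obs_index] for member in ensemble]  # H x for each member
    hx_mean = mean[obs_index]
    # Sample cross-covariance between every state element and the observed one (P H^T)
    cov_xh = [
        sum((member[j] - mean[j]) * (h - hx_mean) for member, h in zip(ensemble, hx)) / (n - 1)
        for j in range(m)
    ]
    var_h = cov_xh[obs_index]                        # H P H^T
    gain = [c / (var_h + r_obs) for c in cov_xh]     # Kalman gain K
    analysis = []
    for member, h in zip(ensemble, hx):
        y_i = y_obs + rng.gauss(0.0, r_obs ** 0.5)   # perturbed observation for this member
        innovation = y_i - h
        analysis.append([x + k * innovation for x, k in zip(member, gain)])
    return analysis
```

The gain spreads the correction at the observed cell to the other state elements through their ensemble covariance, which is why a single VGI observation can modify water levels in neighbouring cells.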

Conclusions
In this work, VGI data are used in a DA framework to investigate potential improvements in the performance of 2D hydraulic models for flood forecasting.
A new methodology was tested that takes into account all sources of uncertainty in citizen-provided information, including timing and location errors. This work was motivated by the need to investigate new sources of information to cope with the data scarcity that affects river basin flood risk management and mitigation globally. In fact, while expensive ground monitoring systems (e.g. gauges) remain inaccessible, especially in developing countries, citizen data are rapidly spreading in urban areas worldwide (Statista 2017). Nevertheless, assessing the reliability of VGI observations is key to their successful integration into flood hazard forecasting models. A case study was developed to test the proposed approach, and the November 2012 flood event in the Tiber River basin was selected because traditional flood observations and VGI were available, together with a calibrated, detailed topographic, hydrologic and hydraulic model. The available flood model was integrated into a novel DA framework using the EnKF in order to account for the uncertainty related to the VGI observations. Results show slight but consistent improvements in the performance of the flood simulations assimilating VGI, an expected result given the few VGI observations injected into the model. This encourages further tests to evaluate the potential performance gains in case studies with a larger number of VGI data. Integrating spatially distributed VGI in floodplain domains with other sources of information, such as stage gages and satellite remote sensing data, represents a challenge for improving flood forecasting. Moreover, further analysis of the timing and location errors that characterize this kind of information, using a richer data environment for their validation, could better support the modeling of their probability distributions.

Notes on contributors
Antonio Annis is a post-doc researcher at the Water Resources Research and Documentation Centre (WARREDOC) of the University for Foreigners of Perugia. He received a joint PhD degree from the University of Florence and the Technical University of Braunschweig. His research interests include the integration of remotely sensed data and crowdsourced information into hydraulic modelling and hydrogeomorphic models for floodplain mapping.
Fernando Nardi is an associate professor at the University for Foreigners of Perugia, where he also serves as Director of the Water Resources Research and Documentation Centre (WARREDOC). He received his PhD degree in hydrology from Sapienza University of Rome. His research interests pertain to numerical and geospatial modelling for hydrology, hydraulics, geomorphology, Citizen Science and Big Data applications within flood risk and water resource management research projects.