Uncertainties in different leak localization methods for water distribution networks: a review

ABSTRACT In recent decades, research on leak detection and localization in water distribution networks has attracted growing interest in both water management and fault detection. Numerous leak localization techniques have been developed, ranging from model-based methods (such as steady-state and quasi-steady-state) to data-driven/machine learning models (e.g. time series modelling, prediction, and classification). However, the various sources, types, and natures of uncertainty in leak localization modelling processes still need to be defined and enumerated. In the context of steady-state analysis, the main objective of this review is to list the uncertainties related to model-based, data-driven and hybrid methods. The review shows that, for all three classes of methods, the interplay of uncertainties with modelling approximations jointly influences localization performance and is often overlooked. Furthermore, an understanding of modelling assumptions and error propagation is needed for successful real-world implementation.


Introduction
Water utilities worldwide are under increasing stress due to urbanisation, population growth, and climate change, coupled with the loss of revenue and treated water volumes during supply. Globally, Water Distribution Networks (WDNs) lose about 30% of treated water on average during supply (Liemberger and Wyatt 2019). Deteriorating pipes and other network elements, such as valves and joints, are the main contributors to water loss in WDNs. The situation is further exacerbated by financial constraints that lead to improper rehabilitation measures and a lack of systematic pressure management and monitoring. Owing to rising concern about the loss of treated water, the associated loss of revenue, the risk of contamination through intrusion at leaky joints (Kirmeyer et al. 2001), and the additional energy cost of pumping to compensate for the lost water (Colombo and Karney 2002), leak detection and localization has gained substantial interest among researchers in recent decades. When leaks occur in WDNs, it is mostly large leaks and bursts that are repaired, and only after they have been reported by the public or by the water utility. Such approaches are referred to as passive leak detection methods; they generally lack systemic action and result in higher losses (Puust et al. 2010). Background leakage and unreported leaks of smaller magnitude, on the other hand, take longer to detect and repair (Lambert 1994). Over time, these smaller leaks therefore generally cause greater water losses than large pipe bursts. Thornton, Kunkel, and Sturm (2008) note that the amount of water lost to leaks can be minimised if the time taken to detect, localise and repair faults in the network is minimised.
While several methods in the literature aim to reduce leak detection and localization times, it is rather difficult to categorize them within a common framework, and researchers have attempted to do so in different ways. Wang et al. (2001) were among the first to classify the scientific methods developed for leak detection and localization in WDNs into four families: (1) offline methods based on direct inspection and observation, (2) online methods, referring to batch monitoring of large-diameter pipelines using specially designed free-moving pistons (pigs) inside the pipes, (3) acoustic methods based on acoustic signals, and (4) hydraulic methods based on transient and steady-state models derived from first principles. These hydraulic methods use additional offline and online measurement data to systematically detect and locate leaks. Whilst acoustic methods are suitable for detection, they are limited by background noise in the signals, interference in multi-leak scenarios, and damping effects, observed predominantly in plastic pipes. Steady-state and transient hydraulic methods, on the other hand, are the most widely researched techniques in leak detection and localization. While steady-state methods are based on mass and energy conservation laws, transient methods estimate leak location and size through time and frequency domain analysis of pressure waves in high-frequency measurement data recorded in the presence of leaks. Transient methods require highly accurate models, are sensitive to parameter variations, and are also data hungry. Colombo, Lee, and Karney (2009) categorized transient analysis explicitly into direct, inverse and frequency domain analysis of pressure transients. Although these techniques locate leaks with very high accuracy, they are limited to simplified networks or single-leak scenarios, as they are heavily influenced by the model development process, including analytical approximations, data and parameter uncertainties, and the choice of analysis algorithm (Duan 2018). The scalability of transient methods to large real-world networks under realistic operating scenarios needs further research.
For the reasons mentioned above, steady-state methods in combination with acoustic methods for locating leaks were determined to be more accessible and cost efficient. A comprehensive survey of previous literature reviews, of the different types of leak detection and localization methods, and of their classifications is given in Wu and Liu (2017), Chan et al. (2018) and Hu et al. (2021). These authors broadly classified leak localization methods into two categories, model-based methods and data-driven methods, and summarized their limitations. Hybrid modelling that combines these two categories has also been proposed in recent years, and Hu et al. (2021) point out that such hybridization could potentially improve performance. Table 1 provides a sample list of relevant methods that have been validated or tested on benchmark and real-world networks. These studies were conducted on both prominent benchmark networks (e.g. Hanoi, L-Town, Net25) and several real-world networks, mostly across Europe. However, factors such as the range of leaks simulated or tested experimentally, and the ratio of the number of installed sensors to the size of the network, vary between studies. In addition, unlike for leak detection, these studies put forward no common metric for leak localization accuracy. Hence, it is difficult to compare the performance of different leak localization methods and their efficacy in real-world implementation.
From the scientific advancements in leakage detection and localization over the last two decades, five key observations can be made:
(1) Most existing methods in the literature were validated and assessed in simulated environments and are rarely applied to real-world scenarios. Several assumptions made during model development do not account for unexpected fluctuations in water demands or for network changes (both in structure and in parameters). Furthermore, many models assume that leak(s) are present only at nodes and rely on idealized no-leak scenarios for discriminating leak from no-leak events.
(2) Most of the proposed methods focus on leak detection rather than localization. Leak detection problems can be solved even without a hydraulic model, provided that historical consumption data and measurements of reasonable quality are available. Detection therefore does not require highly sophisticated hydraulic modelling, regardless of the volume and quality of the measurements. This is specifically true for data-driven methods that use measurements from critical points in the network to flag the onset of single or multiple leaks. To improve detection and localization performance, these methods require measurement data of sufficient quality and appropriate mathematical preprocessing steps to account for data loss and measurement errors.
(3) Measurements of the transient signals required for transient-based methods experience severe damping and dispersion due to unknown network properties and background noise. The success of transient methods for accurate localization in complex networks is further reduced by the underdetermined nature of the problem, as it is rather difficult to reproduce these damping and dispersion effects from measurements. Transient methods are therefore more suitable for accurate localization in relatively controllable tree-like subnetworks (Che, Wang, and Ghidaoui 2022). Hence, from a localization standpoint, the methods used for leak localization in complex networks have gradually shifted from transient and steady-state methods to data-driven and hybrid methods, particularly in the last decade. This transition is mainly due to the advent of the Internet of Things (IoT) and of efficient storage and computing devices, along with less complex and more computationally efficient soft-sensing techniques. Recent advancements in multivariate data analysis and machine learning for engineering systems have allowed their application as complementary approaches for overcoming the limitations of model-based methods for leak localization.
(4) The performance of localization algorithms is seldom compared, owing to the lack of benchmark networks that closely resemble realistic operating scenarios. Vrachimis et al. (2020) made one of the first attempts to close this gap in terms of localization accuracy and cost.
(5) Despite advancements in sensor technologies, data storage and computation capabilities, methods that address uncertainty evaluation and propagation are still seldom applied in the field of leak localization.
These key observations identify a gap in understanding the performance of leakage localization algorithms and the uncertainties associated with them. In the following, a methodology section describes the systematic data collection process followed for the literature search. A brief background on uncertainties in WDNs is then outlined and discussed in the light of the three classes of leak localization methods: model-based, data-driven and hybrid (Figure 1). To conclude, a summary of the current understanding of uncertainties in leak localization under different modelling approaches is provided.

Methodology
Our review process first involved a systematic search for literature using the tools Web of Science (www.webofknowledge.com) and Scopus (www.scopus.com). We used the identifiers 'Water Distribution Networks' OR 'Water Distribution Systems' AND 'Leak Localization' AND 'Uncertainty' within the search fields author keywords, title and abstract. The search period was set from 1990 to early 2022, when this article was first drafted. As expected, the number of articles explicitly mentioning those keywords was merely 26; these are discussed in detail in this review. Most of the articles published in the last three decades addressed leak detection rather than localization and, where they did address localization, the key findings and suggestions regarding uncertainty analysis in localization performance were implicit. Hence, the authors resorted to manual screening of papers, going through the titles, abstracts and key outcomes of leak localization research that deals with model calibration procedures, model approximations, and uncertainty consideration in one or more entities of the leak localization process. For brevity, and to focus on techniques usable at network level, the methods considered and discussed in this paper are based only on steady-state hydraulic analysis formulated within model-based, data-driven and hybrid localization methods, omitting transient models and frequency domain analysis. This resulted in a total of 174 publications when the literature review was carried out in early 2022. Subsequently, 103 publications were selected based on their primary localization approach (model-based, data-driven or hybrid), together with those that provided definitions of terminology and insightful ideas from other disciplines. The publications were selected and compared based on their leak localization analysis with respect to modelling strategies, function approximations and one or more uncertainty factors.
Walker et al. (2003) defined uncertainty as any deviation from the ultimately unachievable ideal of complete determinism of a relevant system. Along those lines, uncertainty can be described by mapping three dimensions: the nature, source, and type of uncertainty. The nature of uncertainty distinguishes aleatory uncertainty from epistemic uncertainty (Hall 2003; Walker et al. 2003). Aleatory uncertainties are inherent and cannot be reduced further by additional measurements. An example would be measurements of flow or pressure: the measured variable will always carry a random error due to chance or to the inherent nature of the measuring device, even when measurements are taken under ideal conditions. Epistemic uncertainty, on the other hand, arises from our limited capability to understand and describe a process. With respect to WDN modelling, determining the value of pipe roughness parameters with 100% certainty without relaxing any physical laws is infeasible. Such uncertainties, which are reducible with increased understanding of physical laws, are epistemic uncertainties. Sources of uncertainty can be classified into input data, parameter, and model structure uncertainty (Hutton et al. 2004). These definitions are, however, not trivial and can often overlap. For example, pipe roughness can be seen as a model input derived from measurements or from a look-up table based on pipe material specifications, or as a model parameter that needs to be calibrated. The type of uncertainty is distinguished by the knowledge about the possible outcomes of a model and the probability of occurrence of these outcomes (Brown, Heuvelink, and Refsgaard 2005). This results in several types that diverge to a varying degree from determinism and that are increasingly difficult to distinguish in practice.
A more pragmatic approach is therefore used here, which distinguishes uncertainty types only by how they can be implemented in the modelling process: as statistical, scenario or deep uncertainties, as described by Tscheikner-Gratl et al. (2017). The first type is statistical uncertainty, in which all uncertainties can be handled statistically from the probabilities of all possible outcomes. The second type, referred to as scenario uncertainty, arises when the probabilities are difficult to compute but the different possible outcomes can be estimated. The third type, deep uncertainty, arises when both the outcomes and their probabilities are difficult to estimate. Based on these definitions, the major factors that influence the uncertainty of leak localization algorithms and their feasibility for real-world implementation can be described. A ranking of these factors based on numerical evaluation has not yet been carried out in the literature for leak localization problems and is not attempted here.

Network changes
• WDNs undergo several changes due to ageing, scaling, corrosion, and network expansion driven by population growth. This ultimately results in changes in the pipe diameters and roughness coefficients of WDN links, and hence in the model to be used for leak localization.
• The nature of the uncertainty in pipe roughness due to scaling and/or corrosion is epistemic, the source being parametric and the type being statistical, as it can be sufficiently quantified.
• Model calibration hence becomes an important prior step for accurate localization. While many methods based on optimization-calibration are used for leak localization (e.g. Berglund et al. 2017; Blocher, Pecci, and Stoianov 2021; Kapelan, Savic, and Walters 2003; Pudar and Liggett 1992; Sanz et al. 2016; Sophocleous, Savić, and Kapelan 2019), the quantification of parameter uncertainty is largely missing in the literature.
• Although parameter uncertainty quantification has been studied from a modelling standpoint using First Order Second Moments (FOSM) methods (Bush and Uber 1998; Lansey et al. 2001), Monte Carlo (MC) methods (Kang, Pasha, and Lansey 2009) and polynomial chaos expansion theory (Braun et al. 2020), the propagation of errors into leakage localization needs further research.
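To make the contrast between FOSM and MC propagation of roughness uncertainty concrete, the following is a minimal sketch for a single pipe, not a method taken from the cited studies. It assumes the SI form of the Hazen-Williams headloss formula and a normally distributed roughness coefficient; the pipe data, the function names and the sample size are hypothetical choices made for illustration.

```python
import math
import random


def hazen_williams_headloss(flow, length, diameter, c_factor):
    """Headloss in metres (SI form: flow in m^3/s, length and diameter in m)."""
    return 10.67 * length * flow ** 1.852 / (c_factor ** 1.852 * diameter ** 4.8704)


def fosm_headloss_std(flow, length, diameter, c_mean, c_std):
    """First-order estimate: std(h) ~ |dh/dC| * std(C), evaluated at the mean C."""
    h_mean = hazen_williams_headloss(flow, length, diameter, c_mean)
    dh_dc = -1.852 * h_mean / c_mean  # analytical derivative of h with respect to C
    return abs(dh_dc) * c_std


def monte_carlo_headloss(flow, length, diameter, c_mean, c_std, n_samples=20000, seed=1):
    """Sample C from a normal distribution and return (mean, std) of the headloss."""
    rng = random.Random(seed)
    samples = [
        hazen_williams_headloss(flow, length, diameter, rng.gauss(c_mean, c_std))
        for _ in range(n_samples)
    ]
    mean = sum(samples) / n_samples
    var = sum((h - mean) ** 2 for h in samples) / (n_samples - 1)
    return mean, math.sqrt(var)


# Hypothetical pipe: 500 m long, 0.2 m diameter, 20 l/s, C = 100 with std 5.
PIPE = dict(flow=0.02, length=500.0, diameter=0.2, c_mean=100.0, c_std=5.0)
fosm_std = fosm_headloss_std(**PIPE)
mc_mean, mc_std = monte_carlo_headloss(**PIPE)
print(f"FOSM std: {fosm_std:.3f} m, MC mean: {mc_mean:.3f} m, MC std: {mc_std:.3f} m")
```

For a small coefficient of variation the two estimates should be close, since the headloss is only mildly non-linear in the roughness coefficient over that range. Propagating the resulting headloss spread further into a localization result is the step that the text identifies as open.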

Need for iterative calibration
• Like roughness parameters, nodal demands vary stochastically and are uncertain due to the effects of climate change and differing socio-economic characteristics among consumers (Mazzoni et al. 2023; Steffelbauer et al. 2021).
• The nature of nodal demand variations can be regarded as epistemic, as they can be quantified with suitable measurements or historical data. As they are used as input data for solving a system of steady-state equations, they are considered model input uncertainty, and their type can be considered a combination of the statistical and scenario archetypes.
• Iterative calibration is inevitable, as it addresses the model mismatch stemming from roughness and nodal demand parameters in order to reduce structural, parameter and model input uncertainty (Ormsbee and Lingireddy 1997; Walski 2021). Although several methods based on MC methods and Kalman Filters (e.g. Xie, Xiang Xie, and Hou 2017) have been developed to jointly reduce and quantify this uncertainty during model calibration and prediction, its further propagation into the localization process has not been widely studied.

Measurement uncertainty
• Errors in the measurement of pressures, flows and reservoir levels can be classified into systematic and random errors. Regardless of the localization method used (i.e. model-based, data-driven, or hybrid), measurements are fed as inputs to the model. Since random errors are irreducible but quantifiable with statistical measures, measurement uncertainties are aleatory in nature and statistical in type.
• Measurement errors, when not accounted for, affect specific model runs and ultimately propagate to model calibration, model performance assessment and hence leak localization accuracy (Hutton et al. 2014). A guide to the expression of uncertainty in measurements, comprising the estimation of input data uncertainty and ways to propagate it to model estimates through Monte Carlo methods, is discussed in Muste, Lee, and Bertrand-Krajewski (2012). MC and FOSM methods by Sumer and Lansey (2009) and a sensitivity analysis of parameter variations by Piller et al. (2017) are used in WDN model design, calibration and performance assessment but are still rarely applied to leakage localization problems.
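The distinction between systematic and random measurement errors can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions: the true head, the constant offset and the Gaussian noise level are hypothetical values, and a real sensor would need a reference measurement to reveal its offset.

```python
import random
import statistics


def simulate_readings(true_value, bias, noise_std, n, seed=0):
    """Readings = true value + constant systematic offset + zero-mean random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_std) for _ in range(n)]


TRUE_HEAD = 45.0  # hypothetical true pressure head in m
readings = simulate_readings(TRUE_HEAD, bias=0.30, noise_std=0.10, n=5000)

# The sample standard deviation quantifies the random component of the error.
random_scatter = statistics.stdev(readings)
# The systematic component survives averaging and only shows against a reference.
mean_offset = statistics.fmean(readings) - TRUE_HEAD
print(f"scatter: {random_scatter:.3f} m, offset of the mean: {mean_offset:.3f} m")
```

In this setting the scatter of the individual readings is recovered by a simple statistic, whereas the offset is invisible without knowledge of the true value, which is why it tends to propagate silently into calibration and localization.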

Choice of measurement locations
• Measurement locations per se also constitute a source of uncertainty, since they may be intercorrelated through hydraulic phenomena and thus carry different degrees of uncertainty. Hence, the sensor placement problem also plays a vital role in leakage localization.
• The optimal sensor placement problem is underdetermined and therefore has multiple solutions. Different approaches can lead to equifinal solutions for sensor location and number. Steffelbauer and Fuchs-Hanusch (2016) found that uncertainty in demands and model parameters has a greater impact on accurate leak localization than the optimization method and objective function used.
• In addition, once measurements are made at a set of optimal sensor locations chosen from existing model information and measurement data, the knowledge of the WDN changes if the model is recalibrated with the new sensor information. This leads to a different set of optimal sensor locations if the optimization is performed again. Hence, the choice of measurement locations per se constitutes part of the overall uncertainty. Since it is affected by the hydraulic model, prior knowledge of measurement locations and measurement data uncertainty, it can be classified as input and parameter uncertainty (source), scenario (type) and aleatory (nature).

Choice of modelling, objective functions, and constraints in solution strategies
• Most assumptions in leak localization techniques are not sufficiently realistic. For example, many methods assume only a single leak, only leaks of high magnitude, or the presence of leaks only at nodes rather than along pipes. Given that leak detection and localization aim at locating new leaks, assumptions of single-leak scenarios are reasonable, since it is quite unlikely that multiple leaks occur at exactly the same time (Mukherjee and Narasimhan 1996; Sophocleous 2019).
• Methods are highly dependent on the accuracy of the hydraulic model but are not clear about the consequences of using approximated models (e.g. quadratic headloss (Eck and Mevissen 2014) or end-user demand consumption modelling (Berardi et al. 2017; Sanz and Perez 2014)) with assumptions on model structure and model parameters.
• Leak localization depends on the number and location of sensors (Cugueró-Escofet, Puig, and Quevedo 2017), and the effects of large leaks can mask the presence of background leaks (Laucelli et al. 2015).
• Very few studies discuss the choice of optimization methods and function approximations and their effect on localization metrics (Berglund et al. 2017; Blocher, Pecci, and Stoianov 2021; Kabaasha, van Zyl, and Mahinthakumar 2020; Marzola, Alvisi, and Franchini 2022). The effects of optimization frameworks, and the role of assumptions that inhibit the real-world implementation of leakage localization, need to be evaluated for different choices of modelling approach.
• As the model approximations, measurement data and prior assumptions affect the objective function and the optimization results of localization, they fall under three sources of uncertainty: structure, input, and parameter uncertainty. As for the type of uncertainty, these choices fall under the scenario type and are epistemic in nature.

Leak localization methods under uncertainties
Each class of leak localization method is affected by several confounding variables: the choice of methods, data quality and quantity, their uncertainties, and the interplay between them. These factors are often overlooked, and a comprehensive study that connects and compares the uncertainties of different methods is missing in the literature. In this section, a brief overview of these methods is presented, and the uncertainties associated with each step in the different classes are discussed in detail. Leak localization in WDNs consists of four key processes, Measurements (M), Hydraulic Simulation (S), Calibration (C) and Leak detection and Localization (L), which are linked differently depending on the class of method chosen to solve the leak localization problem. A flow chart depicting the interactions among them for the different methods is shown in Figure 2. The input data (Xi) and measurements (M) constitute the sources of input uncertainty. Of the remaining blocks, hydraulic simulation (H) is a system of non-linear mathematical equations derived from first principles that is calibrated (C) for leak detection and localization (L) using some optimization framework. Hence these three blocks together comprise model uncertainty. Figure 2b-c also shows the data flow in the training and testing of meta-models for leak detection and localization. Here, Xc refers to the calibration data, which plays a major role in hyperparameter tuning and is crucial for determining the optimal set of parameters for meta-models. For model-based methods, uncertainty can stem from all four blocks and can also be propagated throughout the whole process, as indicated by the blue arrows. In data-driven methods, by contrast, the models are black boxes that are highly influenced by the quantity and quality of the data, and the uncertainties are propagated through the machine learning methods used. For hybrid methods, the data come directly from measurements, from model states, or from both.
Hence, the source of uncertainty and the propagation path also vary depending on the type of process followed. Furthermore, unlike in model-based methods, the model structure and parameter uncertainty within data-driven and hybrid models are highly influenced by computational constraints (training epochs) and data sufficiency.

Results and discussion
The major features emerging from the literature search for each class of leak localization methods, and their associated uncertainties, are discussed in this section. Assumptions of single-leak scenarios and the propagation of uncertainties from demand models when analyzing localization performance emerged as common features across many studies. Even though many data-driven and hybrid modelling techniques have evolved in recent decades, most of the literature limits its discussion to benchmark networks (e.g. Hanoi, Net25) or to case studies based on established real-world networks with historical data (e.g. the Nova Icaria network). Furthermore, while all the methods assume the availability of calibrated models and measurement data of reasonable quality, model-based and hybrid methods were able to predict localization hotspots. Data-driven methods, as described earlier, were more successful at detecting faults over time than at localization. In the following subsections, a brief background on the characteristics of each method is provided, with relevant discussion of the interplay between uncertainties and each of the modelling entities for leak localization.

Model-based methods
Apart from the literature providing essential background on model-based methods, 16 key publications were reviewed to explore uncertainties and leak localization performance. They were analyzed on networks of varying sizes but mostly focused on uncertainties due to nodal demands and modelling approximations. A brief background and relevant discussion on uncertainties is provided below.
Pudar and Liggett (1992) proposed the use of pressure measurements along with a hydraulic model of the network to estimate leak location and magnitude. The locations of the pressure measurements are selected based on nodal head-flow sensitivity analysis to maximize their effectiveness in finding leaks. Nevertheless, noisy measurements and uncertain nodal demands heavily influence this approach, which requires a highly accurate hydraulic model and parameter estimates for precise identification of leak location and magnitude. In the following years, many derivative methods based on the sensitivity matrix were developed to understand the performance of sensitivity-based methods under the influence of uncertainties (Meseguer et al. 2015; Perez et al. 2014, 2016). However, with the exception of Steffelbauer and Fuchs-Hanusch (2016), these methods consider only uncertainties in consumption patterns under no-leak scenarios. Hence, the combined influence of uncertainties due to existing leaks and other parameters on the leak localization process has yet to be studied. Another subset of the model-based approach, based on optimization to calibrate for leakage size and location, was also developed (Berglund et al. 2017; Blocher, Pecci, and Stoianov 2020, 2021; Daniel et al. 2022; Sophocleous, Savić, and Kapelan 2019; Vrachimis et al. 2021). These methods seek to minimize the errors between measurements and estimates from hydraulic models.
Optimization-based leak localization problems are also heavily underdetermined and suffer from equifinality, like the optimal sensor placement problem described before. The process of model-based leak localization consists of four sub-processes that interact as shown in Figure 2a. An overview of measurement uncertainty was given in the previous section. Uncertainties pertaining to the other sub-blocks are discussed here.

Hydraulic simulation
We limit our scope to steady-state methods that are used for leak localization. These models comprise a system of linear equations that conserves the volumetric flow rate throughout the network and a system of non-linear equations that captures conservation of energy, derived empirically by applying the Hazen-Williams or Darcy-Weisbach headloss relationship. With the lengths (l) and diameters (D) of the pipes and the nodal demands (d) as input data (Xi), along with measurements (M), the hydraulic solver computes all system states. Among the three groups of uncertainty in the hydraulic solver, model structure uncertainty can be reduced to a large extent by using verified model topology data and approximation methods. Model input uncertainty and parameter uncertainty therefore contribute significantly to the uncertainty in the hydraulic solver block.
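The structure of the steady-state equations, a linear mass balance coupled with a non-linear energy balance, can be seen in the smallest possible looped system. The sketch below assumes two parallel pipes between a source and a single demand node, a Hazen-Williams-type headloss of the form h = r Q^1.852 with a lumped resistance r, and a bisection solver; the resistances, demand and function names are hypothetical and this is not how a full network solver is organised.

```python
def headloss(flow, resistance, exponent=1.852):
    """Hazen-Williams-type headloss h = r * Q^n for non-negative flow Q."""
    return resistance * flow ** exponent


def solve_parallel_pipes(demand, r1, r2, tol=1e-12):
    """Split `demand` between two parallel pipes linking a source to one demand node.

    Mass balance (linear):       q1 + q2 = demand
    Energy balance (non-linear): headloss(q1, r1) = headloss(q2, r2)
    The mass balance is substituted into the energy balance, which is
    then solved for q1 by bisection on the interval [0, demand].
    """
    low, high = 0.0, demand
    while high - low > tol:
        q1 = 0.5 * (low + high)
        imbalance = headloss(q1, r1) - headloss(demand - q1, r2)
        if imbalance > 0.0:
            high = q1  # pipe 1 carries too much flow
        else:
            low = q1
    q1 = 0.5 * (low + high)
    return q1, demand - q1


# Hypothetical lumped resistances and a demand of 30 l/s (0.03 m^3/s).
R1, R2 = 2000.0, 6000.0
q1, q2 = solve_parallel_pipes(demand=0.03, r1=R1, r2=R2)
node_head = 60.0 - headloss(q1, R1)  # head at the demand node for a 60 m source head
print(f"q1 = {q1:.5f} m^3/s, q2 = {q2:.5f} m^3/s, node head = {node_head:.2f} m")
```

Here the pipe resistances play the role of the uncertain parameters and the demand that of the uncertain input: perturbing either one changes the computed node head, which is the quantity later compared with pressure measurements.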

Calibration methods
In addition to the uncertainties in roughness parameters and demand models in the calibration process, their effect on calibration techniques in model-based leak localization is of interest. Lippacher et al. (2019) evaluated different calibration techniques using fire flow calibration data from a real-world network. The automatic calibration method that adjusted both roughness and minor loss coefficients achieved the highest localization accuracy. This study emphasizes the need for a detailed model structure and appropriate calibration measurement data to improve localization accuracy. Jun et al. (2022) examined the convergence and smoothness of the objective function surfaces for calibration when uncertainties in model structure, parameters and measurement errors are introduced. Although the response surfaces remained smooth under the different uncertainties, they identified a shift in the optimum values. A novel method that couples least squares optimization with demand calibration, by tracking changes in the geographical distribution of historical demand model parameters, was developed for detecting and locating leaks larger than 1 l/s (Sanz, Meseguer, and Pérez 2017). An evolutionary approach to track future changes in demand components could further improve the localization results. Chew et al. (2022) validated, in a real WDN, a method that combines daily model calibration and leak localization from a model utilization perspective: a systematic approach that accounts for daily non-revenue estimates, adjustment of measurement offsets and pump curves, and calibration of model parameters. In another approach to hydraulic model maintenance, Waldron et al. (2022) applied Principal Component Analysis to evaluate the significance of newly observed measurements and thereby guide sampling, independently of the optimization-calibration process, so as to identify the best hydraulic states for model fitting.
Such methods can be used as a pre-processing step in calibration-based leak localization techniques to reduce localization uncertainty.

Localization methods
Most of the model-based methods proposed in the literature use one of two approaches to solve the localization problem. The first is based on optimization techniques, where the hydraulic model outputs are compared with the measurements and the error is minimized. In the traditional optimization-calibration approach, the deviation between the measurements and the model outputs is minimised subject to mass and energy conservation laws. The objective function is usually the sum of squared differences between the model estimates and the measurements, or the sum of maximum absolute differences. Other researchers have also used genetic algorithms, linear programming, and mixed integer linear programming approaches to reduce the search space and increase the efficiency of leak localization (Berglund et al. 2017; Misiunas et al. 2006; Sanz et al. 2016; Sophocleous, Savić, and Kapelan 2019; Wu and Sage 2008). While the optimization-based approach can detect and locate leaks with great accuracy, it is computationally demanding.
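The optimization-based idea can be sketched on a toy system with an exhaustive search in place of the solvers used in the cited studies. The example assumes a series pipeline fed from one source, leaks located at nodes, noise-free synthetic head measurements, and a headloss of the form h = r Q^1.852; all numerical values, the sensor positions and the function names are hypothetical.

```python
def simulate_heads(demands, resistances, source_head, leak_node=None, leak_flow=0.0):
    """Heads at the nodes of a series pipeline fed from a single source.

    Pipe i connects node i-1 (or the source) to node i, so it carries all
    demand, and any leak, located at node i or further downstream.
    """
    outflows = list(demands)
    if leak_node is not None:
        outflows[leak_node] += leak_flow
    heads, head = [], source_head
    for i, resistance in enumerate(resistances):
        pipe_flow = sum(outflows[i:])
        head -= resistance * pipe_flow ** 1.852
        heads.append(head)
    return heads


def locate_leak(measured, sensors, demands, resistances, source_head, leak_sizes):
    """Exhaustive search over (node, leak size) minimising the sum of squared errors."""
    best = None
    for node in range(len(demands)):
        for size in leak_sizes:
            heads = simulate_heads(demands, resistances, source_head, node, size)
            sse = sum((heads[s] - measured[s]) ** 2 for s in sensors)
            if best is None or sse < best[0]:
                best = (sse, node, size)
    return best


# Hypothetical four-node pipeline with pressure sensors at nodes 1 and 3 (0-based).
DEMANDS = [0.004, 0.003, 0.005, 0.002]          # m^3/s
RESISTANCES = [800.0, 1200.0, 1500.0, 2000.0]   # lumped headloss coefficients
SENSORS = [1, 3]
true_heads = simulate_heads(DEMANDS, RESISTANCES, 50.0, leak_node=2, leak_flow=0.002)
measured = {s: true_heads[s] for s in SENSORS}
candidate_sizes = [0.0005 * k for k in range(1, 9)]
sse, node, size = locate_leak(measured, SENSORS, DEMANDS, RESISTANCES, 50.0, candidate_sizes)
print(f"best candidate: node {node}, leak {size * 1000:.1f} l/s, SSE {sse:.2e}")
```

With this sensor pair, a leak at node 0 and a suitably smaller leak at node 1 lower both measured heads by the same amount, so those two candidates cannot be separated. This is a small instance of the equifinality discussed above, and it appears even before demand or roughness uncertainty is added.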
The second class of model-based leak localization methods is based on sensitivity analysis, wherein pressure or flow residuals are calculated at the measurement locations. The residual vectors are then correlated against the sensitivity matrix, in which the change in pressure corresponding to a change in the network flows is calculated (Okeya, Hutton, and Kapelan 2015; Perez et al. 2014). The sensitivity matrix denotes the node at which there is a maximum deviation in the flow for the given pressure. The correlation of this sensitivity matrix with the residual vector at the measurement nodes indicates the probable locations closest to the leak. However, the residuals are influenced by calibration and measurement errors, which affect the localization accuracy. Readers are referred to Wan et al. (2022) for a detailed review of these two types of model-based localization methods.
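The correlation step of the sensitivity-based approach can be sketched as follows. This is a minimal illustration and not the formulation of any cited work: the sensitivity values, node names, and residual vector are hypothetical, and the Pearson correlation is used here as one common choice of similarity measure.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def rank_candidates(residual, sensitivity):
    """Rank candidate nodes by correlation of their column with the residual."""
    scores = {node: pearson(residual, col) for node, col in sensitivity.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sensitivity matrix, stored column-wise: pressure change at four
# sensors per unit leak flow at each candidate node.
sensitivity = {
    "n1": [-0.8, -0.3, -0.1, -0.2],
    "n2": [-0.3, -0.9, -0.4, -0.2],
    "n3": [-0.1, -0.4, -1.0, -0.5],
}

# Hypothetical residual (measured minus nominal pressure): roughly 1.5 times
# the n3 column plus a small perturbation standing in for measurement noise.
residual = [-0.13, -0.63, -1.49, -0.73]

ranking = rank_candidates(residual, sensitivity)
```

Because the ranking depends only on the shape of the residual vector, calibration or measurement errors that distort this shape can shift the highest-ranked node, which is the sensitivity to errors noted above.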

State estimation and state observability
State estimation and state observability also play a crucial role in efficient modelling and monitoring of WDNs. These techniques are used to estimate the system state (e.g. intermediate flows, pressures, valve states) from a limited number of noisy measurement locations in a large network. A detailed review of state estimation for WDNs is presented by Tshehla, Hamam, and Abu-Mahfouz (2017). As WDNs are highly non-linear, the accuracy of non-linear state estimators for WDNs is highly dependent on the number and location of measurement devices and on the type of numerical optimization method used. Hence, if inaccurate state estimates are used as inputs or parameters for a subsequent leak localization model, the overall accuracy of the localization model is reduced. Table 1 provides a summary of model-based methods that implicitly or explicitly accounted for certain uncertainties and the corresponding methods used for analysis.

Data-driven methods
In the next class of methods, only measurement data, or measurement data in combination with network topology information, is used for leak detection and localization. One distinctive feature observed in the relevant literature is the inability of these methods to localize leaks at fine resolution. Although they often do not require accurate models, their generalizability and need for quality data should be considered for real-world applications. Six key publications that considered uncertainties are discussed here, along with a few other works that provided implicit details on uncertainties and leak localization. Gertler et al. (2010) performed leak localization using principal component analysis on pressure data. To distinguish between WDNs that leak and those that do not, De Silva, Mashford, and Burn (2011) and Cody, Narasimhan, and Tolson (2017) applied classification techniques based on support vector machine models. The reliance of these techniques on explicit no-leak and leak status data during the training phase to detect future leaks is a significant limitation. A few other approaches, which model network topology as connected graphs (Mulholland et al. 2014; Rajeswaran, Narasimhan, and Narasimhan 2018), used linear and mixed integer nonlinear programming techniques to compare the network flows determined from a system of underdetermined flow balance equations with the available flow measurements. With this method, each pipe section must have proper weightings applied to it (depending on the pipe's integrity), or the network must be sequentially divided using valve isolations or intermediate flow measurements. This limits the practical applicability and scalability to larger networks.
In recent years, smart water networks, with the availability of sensing and control layers, have offered the potential to identify leaks early and in real time (Mounce 2021). Artificial neural networks were trained on time series data from smart meters for burst detection in WDNs (Mounce and Machell 2006; Mounce et al. 2013; Mounce, Boxall, and Machell 2010). These pioneering works reported promising potential for real-time application, although they were restricted to large burst events in order to reduce high false positive rates. Recently, methods based on classification techniques such as spatial interpolation (Alves et al. 2021; Romero et al. 2021; Soldevila et al. 2019) or graph neural network methods (Zanfei et al. 2022; Örn Garðarsson, Boem, and Toni 2022) have also been used for leak localization. Uncertainties within the context of data-driven approaches are discussed in detail in the following sections.

Modelling assumptions and limitations
It is understood that data-driven methods are successful for burst detection and for detection of medium to large leaks but are not scalable and reliable for leak localization (Chan et al. 2018; Hu et al. 2021; Kammoun, Kammoun, and Abid 2022). The major factors influencing their applicability are the quality and distribution of the data used for training the data-driven 'meta-model'. The scenario for which the model has been trained also influences the search for the optimum parameters needed for accurate localization. Most of the data-driven models are based on a supervisory approach, wherein the meta-models are trained exclusively for leak-free states or using an ensemble of all prior historic leak events. Anomalies are then attributed only to leaks, so the models fail to distinguish leak-induced behaviour from seasonal variations or from changes in network state caused by changes in valve or pump settings. Such assumptions lead to high false positive rates in detection, poor accuracy in localization and limited generalizability. Due to these limitations, these methods are still seldom used for leakage localization despite the ease of training without expert knowledge.

Data requirements and model assessments
To reduce systematic errors and quantify uncertainty in training and testing data, multivariate statistical techniques such as time series modelling, Principal Component Analysis, and interpolation can be used for detecting missing data, tracking changes in statistical distributions, data imputation and denoising. Detailed information about the data analytics required for pressure and flow measurements in training and validating data-driven methods can be found in Wan et al. (2022). To understand the uncertainties due to the chosen meta-model architecture (type and parameters), a complete understanding of the meta-model behaviour is required to carry out uncertainty propagation and quantification from input (measurement data) to output (leak magnitude, location, and time of leak onset). To assess the quality of the chosen meta-model, techniques such as cross-validation, bagging, boosting, bootstrapping, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be used (Hastie, Friedman, and Tibshirani 2001).
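As a small illustration of the information criteria mentioned above, the sketch below compares two candidate meta-models using the least-squares forms of AIC and BIC, which hold under the assumption of independent Gaussian errors and omit additive constants that are identical for all models fitted to the same data. The residual sums of squares, sample size, and parameter counts are hypothetical values chosen only for demonstration.

```python
import math


def aic_least_squares(rss, n_samples, n_params):
    """AIC for a least-squares fit with Gaussian errors (constants omitted)."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def bic_least_squares(rss, n_samples, n_params):
    """BIC for a least-squares fit with Gaussian errors (constants omitted)."""
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)


# Hypothetical comparison: model B fits slightly better (lower RSS) than
# model A but uses many more parameters.
n = 100
model_a = {"rss": 12.0, "k": 3}
model_b = {"rss": 11.8, "k": 8}

aic_a = aic_least_squares(model_a["rss"], n, model_a["k"])
aic_b = aic_least_squares(model_b["rss"], n, model_b["k"])
bic_a = bic_least_squares(model_a["rss"], n, model_a["k"])
bic_b = bic_least_squares(model_b["rss"], n, model_b["k"])

# Lower values are preferred for both criteria.
preferred_by_aic = "A" if aic_a < aic_b else "B"
preferred_by_bic = "A" if bic_a < bic_b else "B"
```

In this hypothetical case both criteria favour the simpler model, because the small gain in fit does not offset the penalty for the additional parameters.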

Sub-classes and associated inherent uncertainties
Data-driven methods can be sub-divided into statistical multivariate data analytical methods, prediction-classification methods, and clustering-based methods. The prediction-classification methods and clustering-based methods can be formulated in different ways, with or without the use of data from hydraulic models. For the sake of clarity, methods that explicitly use only measurements and/or topological information are considered as data-driven methods here. Soldevila et al. (2019) presented a method of spatial interpolation using Kriging and Bayesian time reasoning. The differences in estimated nodal heads between the testing and nominal conditions are used for flagging leak nodes. A similar approach by Sun et al. (2020), with an additional step prior to Bayesian time reasoning, uses linear discriminant analysis and neural networks to obtain the probability of each node having a leak.
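The idea of Bayesian reasoning over time can be illustrated with a generic recursive Bayes update of per-node leak probabilities. This is a minimal sketch under stated assumptions and not the exact formulation of the cited works: the node names and the per-time-step likelihood values are hypothetical, and in practice the likelihoods would be derived from the interpolated head differences.

```python
def bayes_update(prior, likelihood):
    """One recursive Bayes step: posterior is proportional to prior * likelihood."""
    unnormalized = {node: prior[node] * likelihood[node] for node in prior}
    total = sum(unnormalized.values())
    if total == 0:
        return dict(prior)  # no information in this step; keep the prior
    return {node: value / total for node, value in unnormalized.items()}


# Uniform prior over three hypothetical candidate nodes.
nodes = ["n1", "n2", "n3"]
posterior = {node: 1.0 / len(nodes) for node in nodes}

# Hypothetical likelihoods of the observed evidence at three time steps,
# given a leak at each node.
likelihood_per_step = [
    {"n1": 0.3, "n2": 0.5, "n3": 0.2},
    {"n1": 0.2, "n2": 0.6, "n3": 0.2},
    {"n1": 0.4, "n2": 0.5, "n3": 0.1},
]

for likelihood in likelihood_per_step:
    posterior = bayes_update(posterior, likelihood)

most_probable_node = max(posterior, key=posterior.get)
```

Accumulating evidence over several time steps sharpens the posterior even when each individual observation is only weakly informative, which is the motivation for combining interpolation with temporal reasoning.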
Another approach (Soldevila et al. 2021) uses Dempster-Shafer reasoning as an alternative to Bayesian reasoning to discriminate ambiguous hypotheses between nominal network estimations and existing observed network behaviour. This method outperformed earlier ones in terms of localization accuracy. An approach that compares a leaky state with the nominal network state, using graph-based state interpolation and state comparison, is presented in Romero et al. (2021). This method was able to locate leaks to the closest sensor location even though the measurement nodes made up <1% of all nodes.
While these methods show promising results, they are greatly influenced by the sampling strategies and locations. The need for training data from nominal states in these methods must be assessed with great consideration. This is because, when new nominal observations with uncertainties are compared with prior nominal observations, classifiers trained with a predefined data distribution struggle to differentiate leaks from new nominal observations. Hence, integrating methods like Dempster-Shafer reasoning into data-driven formulations improves their reliability. This argument has also been advocated in other engineering disciplines (e.g. Ma et al. (2021) and Sun and You (2021)). It is also important to obtain an optimal solution for the sampling locations and strategies to be used for interpolation techniques. This is because these techniques are essentially data-driven simulation-optimization problems that are rarely discussed in the literature (Hüllen et al. 2020). The complexity arises from the possibility of slightly different realizations of the same type of interpolation model (e.g. Kriging) from slightly different samples, or of different interpolation models from the same samples (e.g. Kriging or Polynomial Regression).
Among other data-driven methods, applications of deep learning models for WDNs already look promising and are expected to drive a larger transformation towards the 'digital water' infrastructure. A brief review of the fundamentals of deep learning and their applications to the water industry is given by Shen (2018). Applications of Graph Neural Networks (GNN) and Graph Convolutional Networks for spatial interpolation and estimation in leak detection and localization have gained traction recently (Zanfei et al. 2022; Örn Garðarsson, Boem, and Toni 2022). However, these methods are yet to be tested in real-world scenarios and have high false positive rates and longer detection time windows. Here as well, an understanding of different uncertainty quantification methods is crucial to improve their performance. Table 2 provides a summary of data-driven methods that considered uncertainties during the modelling phase.
Within the context of practical applications of machine learning methods, the possibilities and advantages of uncertainty quantification techniques have been explored in other fields (e.g. to estimate uncertainties in rainfall-runoff modelling (Klotz et al. 2022) and in drug design and discovery (Mervin et al. 2021)). These studies outline that there is a significant knowledge gap in identifying the robustness of applied machine learning algorithms and the advantages of exploring different frequentist and Bayesian methods to quantify the parametric and output uncertainties of meta-models. Furthermore, this process guides data sampling strategies, helps in choosing an appropriate meta-model architecture, and constrains assumptions on the model and data to further reduce uncertainties in predicted outputs. To the best of the authors' knowledge, no such uncertainty propagation and quantification studies have been done so far for data-driven leak localization methods.
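One frequentist way to quantify the output uncertainty of a meta-model is the bootstrap, sketched below for a deliberately simple case. The linear meta-model, the synthetic input-output data, and the query point are hypothetical and serve only to show the resampling mechanics; the same pattern would apply to any trainable meta-model, with the line fit replaced by the corresponding training routine.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept) or None."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # degenerate resample: all x values identical
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_prediction(xs, ys, x_new, n_resamples=200, seed=0):
    """Mean and standard deviation of predictions over bootstrap refits."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        fit = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        if fit is None:
            continue
        slope, intercept = fit
        predictions.append(slope * x_new + intercept)
    return statistics.mean(predictions), statistics.stdev(predictions)


# Hypothetical training data: y = 2x + 1 with additive Gaussian noise.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 0.5) for x in xs]

mean_pred, std_pred = bootstrap_prediction(xs, ys, x_new=10.0)
```

The spread `std_pred` reflects only the uncertainty arising from the finite training sample; structural errors in the meta-model or biases in the measurements are not captured by this procedure.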

Hybrid methods
The third class of methods uses knowledge from both measurement data and hydraulic models, combining multivariate statistical methods or machine learning with model-based analysis for leak detection and localization. Currently, there are only a few hybrid approaches tested and validated for leakage localization. Four research works that can be classified as hybrid models are discussed here. Hybrid algorithms share features of both model-based and data-driven models (e.g. assuming leaks to be present only at nodes, single leak scenarios, accurate hydraulic model available for detecting new leaks). Hu et al. (2021) have highlighted that hybrid methods could compensate for the limitations of model-based and data-driven methods and can be promising for research in leak localization. Their sub-classifications are discussed here.

Classification and prediction methods
Classifier-based leak localization techniques that use data from nominal and abnormal steady states are also used for detecting the nodes closest to leak(s) (Ferrandez-Gamot et al. 2015; Quiñones-Grueiro et al. 2021; Soldevila et al. 2016). A few other researchers have proposed different ways of hybridisation for improving leakage localization (Romero-Ben et al. 2022; Steffelbauer et al. 2022). Romero-Ben et al. (2022) proposed two different complementary approaches for detecting leaks in artificially simulated networks. They determined that data-driven methods are well suited for regions with more pressure sensors and model-based methods for regions with fewer pressure sensors. However, the feasibility of using these methods for real-world networks still needs to be investigated due to the limited understanding of uncertainty propagation, measurement noise and the effect of sensor placement in determining the accuracy of localization results. Steffelbauer et al. (2022) proposed an approach wherein the demands are calibrated using multiplicative time series modelling, followed by hydraulic model calibration using the FOSM method and sensitivity matrix-based leak localization.

Series and parallel hybrid methods
It is to be noted that hybridisation could be done in different ways depending on the assumptions. In fields related to chemical engineering, especially in applications of control-theoretic approaches for the optimization and control of energy and chemical process systems, hybrid modelling has been used increasingly to improve system efficiency. Zendehboudi, Rezaei, and Lohi (2018) have reviewed different ways in which hybridisation could be done and how it has helped in advancing the optimization of chemical process systems. Although the review is about applications in chemical systems, the methods per se and their advantages and limitations are also applicable to other engineering systems in general. Hence, they could also be adapted to the monitoring and operation of WDNs. In parallel hybrid methods, the steady state models are assumed to be accurate, and measurements of flow and pressure are assumed to be reliable. The measurements are also used to train a meta-model which is validated using statistically similar data. The differences in estimates from the meta-model and a calibrated hydraulic model (or equivalently, a model with quantifiable uncertainty in one or more parameters) are then used to determine probable locations of uncertainty. Classifier-based approaches fall under this category, wherein the residuals from model-based simulations and observations are used for training a data-driven classification meta-model (Ferrandez-Gamot et al. 2015; Lučin et al. 2021; Santos-Ruiz et al. 2020; Soldevila et al. 2016, 2017, 2022). Another novel method that uses a neural network for training residuals from hydraulic models via Gaussian process estimation for estimating leak size and location is discussed in Quiñones-Grueiro et al. (2021).
Though these methods considered uncertainties in demands (and also leak flow rates in some cases) and roughness parameters during the data generation process for training classifiers, the combined uncertainties in the model-based and data-driven classifier parameters affect the solution to some extent and most likely irreducibly. Table 3 summarizes the uncertainty aspects that are considered in the hybrid models.
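The classifier-based idea described above, in which simulated residuals are used to train a data-driven classifier, can be sketched with a nearest-centroid classifier. This simple classifier is chosen here only for brevity and is not the classifier used in the cited works; the sensitivity values, leak flow range, and noise level are hypothetical, and the function `simulate_residual` is a linear stand-in for a hydraulic model with uncertain leak flow and measurement noise.

```python
import random

# Hypothetical pressure change at four sensors per unit leak flow at each node.
SENSITIVITY = {
    "n1": [-0.8, -0.3, -0.1, -0.2],
    "n2": [-0.3, -0.9, -0.4, -0.2],
    "n3": [-0.1, -0.4, -1.0, -0.5],
}


def simulate_residual(node, leak_flow, rng, noise_std=0.05):
    """Stand-in for model-based residual generation with measurement noise."""
    return [s * leak_flow + rng.gauss(0.0, noise_std) for s in SENSITIVITY[node]]


def train_centroids(samples):
    """Mean residual vector per class (class label = leak node)."""
    centroids = {}
    for label, vectors in samples.items():
        n, dim = len(vectors), len(vectors[0])
        centroids[label] = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return centroids


def classify(residual, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(residual, centroids[label]))


rng = random.Random(0)

# Training set: 50 simulated residuals per node with uncertain leak flow.
training = {
    node: [simulate_residual(node, rng.uniform(1.0, 3.0), rng) for _ in range(50)]
    for node in SENSITIVITY
}
centroids = train_centroids(training)

# New "observation": residual generated by a leak of 2.0 flow units at n3.
predicted_node = classify(simulate_residual("n3", 2.0, rng), centroids)
```

In this sketch the training data inherit every assumption of the simulated model, so any error in the sensitivities propagates directly into the classifier, which illustrates the combined uncertainty discussed above.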
In series hybrid methods, the measurements are used to build an estimator (e.g. time series model, neural networks, graph interpolations) which is then fed as the input to a hydraulic solver for calibration, and pressure and flows are consequently estimated in the network. Leaks are then localised using either statistical measures or sensitivity-based metrics. In the series hybrid approach, the measurements are usually assumed to be reliable. Although hybrid methods are advantageous with respect to fast computation times and modelling complexity, propagation of uncertainty depending on the hybridisation needs further study.

Summary of study outcomes
This paper provides a brief review of the different sources, types, and nature of uncertainties with respect to steady-state hydraulic modelling and data-driven algorithmic approaches for leak localization. Despite recent advancements in WDN modelling strategies and the availability of increased amounts of measurement data, the ability to locate leaks in real-world networks with high accuracy remains a challenging task due to the various sources and interplay of uncertainties. The following five major factors that contribute to uncertainties in the leak localization problem were elucidated in this review: i) network changes, ii) need for iterative calibration procedures, iii) measurement uncertainty, iv) choice of measurements and their locations, and v) choice of modelling, objective functions, and constraints in solution strategies. Furthermore, for each class of localization methods (i.e. model-based, data-driven and hybrid methods), the current review outlined the following major factors that are often overlooked when dealing with uncertainties in leak localization processes.
• Most of the existing model-based methods rely on unrealistic assumptions, such as considering a part of the modelling inputs/parameters to be deterministic (e.g. demands, which are highly stochastic) or stochastic with overly simplistic prior distributions (e.g. Gaussian models for roughness residual analysis), and on approximations in optimization formulations despite the existence of equifinality in choosing appropriate measurement locations and sampling strategies.
• Data-driven methods have been successfully applied for leak detection, but only a limited number of methods exist for localization. Further, unlike the dissemination of knowledge on uncertainty quantification of deep-learning models applied in other engineering disciplines (e.g. sensitivity analysis, neural networks with MC dropout, Variational Inference, Deep ensembles), no such methods exist for quantifying uncertainties in data-driven leak localization methods.
• Hybrid modelling strategies that combine model-based methods with data-driven methods look promising in increasing the accuracy of localization and computational efficiency. However, the assumptions made in linking the two are often overlooked, and a complete understanding of them is crucial for propagating and quantifying the overall uncertainty in localization predictions.
The review also highlighted that for localization methods to be successful in real-world WDNs, all the stakeholders in WDN leakage management should be aware of the applicability and limitations of the selected model and the underlying uncertainties of this choice. This enables practitioners in water utilities to make an informed decision on the model for effective leak localization. Thus, it positively impacts the decision-making process for efficient resource utilization during on-field test campaigns as well as rehabilitation plans. Such systematic decision support systems allow for considering the trade-off between the uncertainties in predicted localization hotspots and the targeted cost-benefit ratio for efficient leakage management.

Conclusion
Leak localization has been one of the key research areas within fault detection in WDNs in recent decades. The success of leak localization algorithms depends heavily on the modelling strategies, the understanding of the underlying modelling approximations and the interplay of uncertainties arising from the confounding variables. Though some earlier attempts have been made to provide a comprehensive review of different methods of solving the leak localization problem, a compendium of information linking the different leak localization modelling strategies, their underlying assumptions and the characteristics of model uncertainties is missing in the literature. This article provided a systematic review of the different sources, types, and nature of uncertainties that influence leak localization performance with respect to the three different classes of methods, i.e. model-based, data-driven and hybrid models.
The review outlined that the five major factors that contribute to uncertainties in leak localization are often overlooked during the modelling process. This is a major research gap in the field and, as the tools are available in comparable fields, one that should be filled. For the different families of methods, different research needs for the future can be highlighted.
• For model-based methods, the interplay of uncertainties across different sub-blocks of the modelling process is often ignored. Many techniques focus on considering uncertainties during roughness or demand calibration, or during optimal sensor placement but tend to overlook their joint influence on localization performance due to significant intricacies among different variables and modelling assumptions.
• For data-driven methods, even though many methods allow for successful leak detection of medium to large magnitude leaks across time, spatial localization is still evolving. Unlike model-based methods, uncertainty quantification of data-driven meta-models has not yet been studied well for leak localization. Further, the generalizability of such data-driven models needs active research for successful realization in real-world networks.
• For hybrid models, the linkage of hydraulic models with data-driven models poses additional challenges in identifying the sources of uncertainties and reducing them further.
In conclusion, it is therefore paramount to realize the modelling assumptions and propagate errors during the modelling phase, irrespective of the class of methods used, for a more accurate and comprehensive leak-localization system.