A Bayesian hindcasting method of levee failures applied to the Breitenhagen slope failure

ABSTRACT Hindcasting of past levee failures enhances insights in the performance and vulnerability of levees. The scarcity of field evidence makes identifying the cause(s) of failure difficult. Under these circumstances, multiple scenarios and model choices are possible to characterise and to model the failure. This paper shows how probabilistic Bayesian techniques advance the procedure of hindcasting of levee failures. In the developed approach, a-priori levee information, and failure observations are systematically taken into account to determine the most likely scenario and the most representative model choices to characterise the failure most accurately. Observations, such as the slip surface, are taken into account in the probability estimates. The levee failure near Breitenhagen, Germany (2013) is used as a case study. The levee failed during river floods due the instability of the landside slope. The levee failure was most likely triggered by locally weak soil conditions and unexpected high water pressures due a connection between a pond on the riverside of the levee and the aquifer. These conditions were likely caused by the occurrence of a previous breach at this location. The approach developed in this paper is expected to support a more systematic and objective method of analysis of other levee failures.


Introduction
Consequences of flood defence failures are very large and it is therefore of great importance to gain insights in the performance and vulnerability of these systems flood defence (Vorogushyn, Merz, and Apel 2009;Jonkman and Schweckendiek 2015). Hindcasting of failed levees provides valuable insight into general levee performance, quality of strength models and dominant factors that contribute to levee failure. Examples of the hindcasting of past failures include analyses of levee failures in New Orleans, USA (Seed et al. 2006), Breitenhagen, Germany (Grubert 2013) and Yabe-river, Japan (Honjo et al. 2015). These insights can be used to advance the design and engineering of flood defences. As part of these hindcasts, various model choices have to be made to calculatively characterise the failure as accurately as possible. However, these model choices are often uncertain. Furthermore, field evidence is scarce: subsoil information is often "washed away" and there are often no direct visual observations during the failure. This makes identification of the cause(s) of failure difficult.
Forensic analysis provides a systematic procedure of analysis for the investigation of failures. The procedure of forensic analysis roughly consists of three stages: (1) collecting and reviewing of evidence, (2) utilising calculative models for a back analysis (hindcasting) to identify the cause, and (3) reporting the findings (Carper 2000). In previous work, it appeared that hindcasting using a deterministic sensitivity analysis identifies several possible causes of failure (Zhang, Tang, and Zhang 2010;. However, the deterministic sensitivity approach of hindcasting did not provide an explicit insight into the likelihood of various failure scenarios. In order to enhance levee hindcasting, probabilistic Bayesian techniques are used in this paper to identify the most likely scenario of failure, the most representative model choices to characterise the failure, and the most dominant parameters triggering the failure. The prior and posterior probabilities reflect the probability that a model best describes the reality and that a scenario of failure was present given the evidence. The combination of the most likely scenario of failure and most representative model choices is referred to as "MLC" in the remainder of this paper. In the past, Bayesian techniques have been used to characterise the lifetime reliability of geotechnical structures (Zhang, Tang, and Zhang 2010;Schweckendiek 2014b;Baecher 2017), and have been applied to the hindcasting of different types of geotechnical related structures (Luckman, Der Kiureghian, and Sitar 1987;Gilbert, Wright, and Liedtke 1998;Zhang, Tang, and Zhang 2010;Gilbert 2016). However, to our knowledge, Bayesian techniques have not been applied to hindcasting of specific levee failures. Another novel element in the proposed approach in this paper is that Bayesian updating is carried out using specific information on the geometry of the observed failure, in this case, the slip surface. This approach allows us to account for the scarcity of evidence, as the available information after failures is generally limited to a small number of observations, photos and videos.
The article is structured as follows: The background of the method for the hindcasting of levee failures using Bayesian techniques is presented in section 2. In section 3 the case study of the levee failure near Breitenhagen in the year 2013 is analysed using the Bayesian hindcasting method. The results are discussed in section 4, and the conclusion and recommendations in section 5.
2. Probabilistic hindcasting of slope instability using Bayesian updating

Method background for Bayesian levee hindcasting
The proposed approach for hindcasting levee failures consists of defining scenarios that could explain the failure and assign prior probabilities of occurrence to these scenarios. Subsequently, the probability of failure given a scenario is calculated (which is used as likelihood) and Bayesian updating is used to update the prior probabilities with the likelihood and failure observation (see e.g. [Gilbert, Wright, and Liedtke 1998;Schweckendiek 2014b]). The objective is to find the most likely scenario (S i ) that has resulted in the failure event (F).

Prior probabilities
The first step of the proposed hindcast approach is to use the collected information for the identification of possible scenarios (S i ) that could have resulted in failure of the levee. A scenario typically represents the loading conditions possibly triggering the failure as well as the subsoil conditions (e.g. pore water pressures, stratigraphy). Prior scenario probabilities (i.e. P(S i )) are assigned in such a way that they total a sum of 1 (e.g. Schweckendiek et al. 2017).
2.1.2. Likelihood function based on failure and observed evidence The Bayes' theorem makes it possible to incorporate evidence in the analyses, such as observed failure information (Schweckendiek 2014a). To apply Bayesian updating of the previously assigned prior scenario probabilities, two pieces of information are incorporated using a likelihood function in the next steps, i.e. (A) the actual occurrence of failure and (B) the geometry of the observed slip surface (when available).
Hence, likelihood function in step (A, referred to as likelihood A) is calculated by determining the probability of failure (F) per scenario P(F|S i ) according to: In this equation, the limit state function Z is used to describe when failure occurs. This is evaluated in this paper by using a slope stability model. The joint probability density function f(x) of the random variables (X) is used to describe Z. Negative values of the limit state function (Z) describe which combinations of random variables (X) combination results in failure of a scenario (S i ). Hence, the likelihood function A shows how likely it was that, according to a model, failure would have occurred given a certain scenario. And thus how likely it was that failure was observed given a certain scenario. This likelihood is expressed as a probability of failure given the scenario. This probability of failure is the results of all the uncertain input parameters (X) in the model. The First Order Reliability Method (FORM) is used in this paper to estimate the probability of failure, Equation (1). Subsequently, Equation (2) is used to find the design point at Z(X) = 0 which is the combination of parameter values that provide the highest probability density (see Rackwitz 2001). Hence, this is the most likely combination of parameters values triggering a failure. When this design point is found, the probability of failure is calculated, using P f = 1−f(β).
The design point is found using a process of iterations in evaluating: Where β is the reliability index and α m is the influence coefficient, or the FORM sensitivity coefficient (for which Σα m 2 = 1) of variable X m ; u m is the standard normally distributed variable representing a normalised stochastic variable. The value of the influence coefficient of a variable provides a measure of its contribution to the reliability. The highest value of the influence coefficients (max α m ) identifies the dominant basic variables. The process of iterations in the used FORM analysis using the Probabilistic Toolkit is described in (Brinkman 2015).
Additionally, for slope instability analyses, the observed shape of the slip surface of slope failure (h') can be combined with the likelihood function (A) and, thereby as a next step, can be used for updating the scenario probabilities. This combination of likelihoods is further referred to as the likelihood (B). Evidence is expressed by an observation function h' and limit state function (Z), as in Equation (3). Negative values of the h' conditioned limit state function (Z) describe which random variable (X) combination result in both potential failure of a scenario (S i ) and observed slip surface shape (h'). The following likelihood function combines the influence of failure and slip surface (h') (into likelihood function (B)): In which Z(X) is the limit state function that includes the observed slip surface in the slope stability computation.

Posterior distribution of scenarios probabilities with observed failure
Subsequently, the earlier assigned prior probabilities of scenario S i are updated using both the information that the slope has failed (A) and the geometry of the slip surface that is observed (B). This posterior distribution is shown in Equation (4): 2.2. Bayesian hindcasting of slope instability 2.2.1. Steps of Bayesian hindcasting of slope instability A Bayesian method for the hindcasting of slope instability is proposed in six steps, based on the method by  and described in Figure 1. The presented Bayesian theory and the model for slope instability analysis (introduced in the following sections) are used to identify the combination of most likely scenario triggering the failure and the most representative model choice (MLC), step by step: (1) System description and identification of possible failure scenarios (e.g. stratigraphy and water pressures) with the help of the collected evidence. Prior probabilities are assigned to all scenarios (S i ), a uniform prior can be chosen in case no prior knowledge is available to discern the scenarios: (2) Introduction of alternatives for model choices (M j ) and parameter choices (X) into the scenarios (S i ) and building slope stability models (noted as (S i ∩M j-)). Prior probabilities are assigned to all model choices: (3) Evaluation of the limit state function using the generated slope stability models (S i ∩M j ), to calculate the likelihood, one by one: A. The conditional probability of event F (likelihood (A)): B. The conditional probability of F (likelihood (B)) including the field observation information when available (observed slip surface information (h')): (4) Update of prior scenario and model choices probabilities into posterior probabilities using likelihood (A) given failure (F) and (B) slip surface information (h'): (5) Identification of the combination of most likely scenario and model choices: the scenario and model choices resulting in the highest posterior probabilities, given failure (F) and slip surface information (h'), are considered the most likely to characterise the failure most accurately (MLC): Please, note that the prior probabilities refer to the probability that a model (M) best describes the reality, and that a scenario (S) was present given the information.
In the next paragraphs step 1 to step 6 will be explained in more detail.

Elaboration of the hindcasting steps
2.3.1. System description and inputs (Step 1. and Step 2.) A general model of slope stability analysis of a levee is used to analyse the failure of the levee. The analysis calculates the Factor of Safety (FoS) which is a function of the resisting moment (M R ) and the driving moment (M S ). Higher pore water pressures increase M S and decrease the available shear strength of the soil and lead to a reduction of M R . The limit state of the slope stability model is used with the purpose to determine the conditional failure of the levee and to incorporate the influence of the observation information (Step 3.A. and 3.B). The performance of a levee depends on various typical variables (van Deen and van Duinen 2016) ( Table 1). Collected evidence is thoroughly studied on the typical variables influencing the performance of the levee with the purpose of building a slope stability model. Further, the collected evidence is used to identify scenarios which might support conditions for triggering the levee to fail.
Generally accepted literature is used to collect possible alternatives to typical variables supplementing the required set of variables in building the slope stability model.
Each discrete variable (Table 1) contributes a finite set of possible choices of model and parameter choices. All possible combinations of choices are used to generate slope stability models covering the possible characteristics of the failure.
A uniform prior can be chosen in case no prior knowledge is available to discern the scenarios of failure and alternatives of typical discrete variables. For example; the evidence is used to construct four scenarios possibly triggering the failure. Each of them is assigned a probability of 0.25 (Equation (5)).

Likelihood given observed failure (Step 3.)
The generated slope stability models (step 1 and step 2) are used in step 3A to determine the conditional probability of failure, which is taken as likelihood given failure (step 3A). The probability of slope instability is calculated using the limit state function Z(X), see Equation (14). This is repeated for each generated slope stability model and its corresponding random FoS(X); with each model based on a combination failure scenario, model and parameter choices: Equation (15) is used to calculate the conditional probability of failure (step 3A.).
The collected evidence is subsequently used in step 3B. to estimate the geometry of the observed slip surface ( Figure 2) or a zone in which the slip surface might have occurred. The slope stability models that produce a slip surface that correspond the observed information (shaded area in Figure 3 and denoted with h') are taken into account of the analysis and expressed by Equation (16) (step 3B.).
In practice, the observations of a slip surface are not exact. Therefore h' represents an estimated area with an upper boundary (S R1 ) and lower boundary (S R2 ), see Figure 2. The yellow area shows the envelope of all possible slip surfaces that is the result of step 3A. The shaded area shows the boundaries that are introduced to incorporate the observations, as done in step 3B.

Posterior probabilities identifying the most likely scenario and model choices (
Step 4. and Step 5.) Posterior probabilities are used to identify the MLC for hindcasting purposes. Equation (4) is calculated by applying a tree diagram in which all combinations of scenario and model choices are shown. For each combination of scenario and model choice, the likelihood is calculated in a slope stability computation by computing the probability that failure would have occurred. Combining these likelihoods with the prior probabilities of the scenarios finally gives the posterior distribution. For explanatory purposes, a simplified example is shown, in this section. The simplified example takes only two discrete choices (n = 2) into account per discrete variable, that is two scenarios of failures ("S 1 ","S 2 ",) and two model choices ("M 1 ","M 2 "). An overview of the resulting four possible slope stability models (noted as (S i ∩M j )) is shown both by an event tree ( Figure 3) and an Edwards-Venn diagram ( Figure 4). The slope stability models are used to determine P(F|S i ∩M j ). Prior probabilities are assumed P(S 1 ) = P(S 2 ) = 0.5 and P(M 1 ) = P (M 2 ) = 0.5; since P(S i ) is independent of P(M j ) result in P(S 1 ∩M 1 ) = P(S 1 )P(M 1 ) = 0.25.
The four slope stability models (S i ∩M j ) are assumed mutually exclusive, and the total probability of failure is the union in event F ( Figure 4). This total probability indicates whether the failure was likely (could be expected) or not (a surprise); the total probability of failure is calculated as: P(F) = P(F|M 1 > S 1 )P(M 1 |S 1 )P(S 1 ) + P(F|M 2 > S 1 )P(M 2 |S 1 )P(S 1 )  Bayes rule is used to update the prior model and scenario probabilities by the information that failure occurred (step 3A). When slip surface information is available, the set of likelihoods given failure and observation information (step 3B) replaces the set of likelihoods given failure (step 3A, Figure 3). The posterior probabilities of the scenarios of failure and model choices are obtained by alternately evaluating Equation (18) and Equation (19): . Event tree with the incorporation of the scenarios of failure ("S 1 ", "S 2 "), alternatives in model choices ("M 1 ", "M 2 "). The slope stability models are put to purpose to determine the related likelihood given failure ("F|S i ∩M j ") and given non-failure ("F|S i ∩M j "). This way, the most likely scenario leading to failure and model choice to represent the levee failure most accurate is obtained by evaluating Equation (11) and Equation (12).

Dominant influence factor (Step 6.)
The MLC is used to determine the most dominant variable leading to the failure of the levee. The influence factor is a by-product from the evaluation of the limit state function using FORM, as is done in step 3A and step 3B, and provides a measure of the contribution of each variable (Σ m α 2 = 1) to failure. The most dominant variable is identified utilising Equation (13).
3. Case study of levee failure near Breitenhagen, Germany in 2013 3.1. The Breitenhagen case: levee failure and input data

General failure information
In 2013, the levee near Breitenhagen (Germany) failed and caused considerable economic damages to the area, see e.g. Figure 5 (Weichel 2013). Previous analyses of the levee failure show that the levee would be calculatively stable (FoS > 1) under conditions of best estimates of levee characteristics (Grubert 2013;). This indicates that possibly an anomaly (e.g. old breach or conductive layer) dominated the outcome (Grubert 2013;. Details on the interpretation of the collected data and the identification of anomalies, which are introduced in the analyses as possible scenarios leading to failure, are discussed by . The collected data are summarised in Table 2. The data that support the findings of this study are available in the International Levee Performance Database (ILPD)a public database with information on levee failure and performance cases (at https://leveefailures.tudelft.nl/, failure-id 121001: Breitenhagen, 2013).

3.1.2.
Step 1: System description, identification of possible failure scenarios 3.1.2.1. General. The collected evidence is examined to identify possible failure scenarios, i.e. loading and subsoil conditions that possibly explaining the failure. The Breitenhagen levee has a typical cross-section of a levee on top of an aquitard on both land-and riverside, consisting of layers of cohesive and alluvial soil, and a high permeable aquifer consisting of sand and gravel ( Figure 6; based on [Grubert 2013]). At the location of the breach, the levee was about 3.50 m high and had a crest width of 3.00 m. A possible failure scenario is caused by the trees growing at the section between the levee and the pond near the toe of the levee at the waterside. Furthermore, this particular section of the levee has likely breached in the past, resulting in a pond in front of the levee (Sixdorf 2016).
3.1.2.2. Failure scenarios. Seepage of water from the riverside causes local high water pressures inside the levee to reduce the shear strength and possibly trigger slope instability. The best-estimated situation is referred to as the "Base case" (BC). Subsequently, three possible scenarios of water pressures are identified as shown in Figure  7. To emphasise the scenario-specific dominant water pressures, the influence of each scenario-specific water pressure is introduced in relative to the BC.  further elaborates and motivates the different scenarios: . Scenario "Base case" (BC): the best estimate of the base case is based on normal levee investigation data (borings, etc.) and best estimates of water pressures, . Scenario "Saturated levee" (S1): higher phreatic as a result of more permeable levee than expected, based on evidence from photos, . Scenario "Conductive layer" (S2): higher water pressures inside levee due to a conductive layer inside the levee as a result of tree roots, based on evidence on photos, . Scenario "Pond connection with aquifer" (S3): higher water pressures in the aquifer under the levee due to a close connection of the outside water level with the aquifer, based on evidence on photos.
3.1.2.3. Water pressures in failure scenarios. This analysis adopts a pragmatic implementation of water pressures using simplified models of quasi-steady-state flow to incorporate the transient flow (Figure 7; [TAW 2004]). Three aspects are considered: A. the phreatic line (PL), B. the aquifer head (HH1) and C. the interpolation between A. and B. (this is done linearly and not further discussed). Whether steady-state conditions are reached depends on the duration of the water level and the hydraulic conductivity of the soil. In this case, the higher water event lasted over a month, of which 12 days the water was relatively close to the maximum (see . Analysis by Drews (2015) shows that the aquifer conditions mostly reach steady conditions for the various scenarios, which is incorporated in the modelling in this study. For the phreatic line, this is highly dependent on the hydraulic conductivity of the levee material, see Drews (2015). For the reported low conductivity, steady-state conditions are not met. However, since photos show wet soils during, the slope failure; indicating a high phreatic line. Hence, a phreatic line that represents (close to) steady-state conditions is used in the base case.
It should be noted that using steady-state conditions is usually pragmatic and conservative for design considerations; for hindcasting, this may lead to the identification of the wrong dominant scenario in case there are significant transient effects. A sensitivity analysis shows that using transient water pressures based on a low conductivity dike body does not influence the outcomes much in this case since the aquifer pressures are much more important, see "Discussion".
3.1.2.4. Modelling water pressures. The water pressures are modelled in each scenario using probability distributions to reflect uncertainty in actually occurred pressures. This is shown in Figure 7 and summarised in Table  3. The red points in Figure 7 are assigned the probability distribution, linear interpolation is used to connect this location with waterside and landside boundaries. The following choices are made: . Base Case (BC): the mean of the PL (dark blue) and aquifer (light blue) are shown in Figure 7 and are based on quasi-steady-state conditions. For the PL, there is a small uncertainty modelled (0.3 m; see (Rozing 2015)). Also, a truncated distribution is used, similar to the dashed line of S1, to reflect the physical boundaries as the water level cannot exit the levee body. The PL can be higher than steady-state conditions though due to e.g. rainfall. For the aquifer, blanket theory (USACE 2000) is used to compute the water pressures (HH1). In this case, a semi-pervious top stratum to calculate the head is shown in Figure 6. The uncertainty is 0.3 m in both x-, and zdirection. Again a truncated distribution is used with upper bound equal to the maximum river level and lower bound equal to the pre-flood ground water table. . Scenario "Saturated levee" (S1): The only difference with BC is the increased standard deviation of the  There is a linear interpolation between HH2 and both the PL and HH1 which have the same properties as the BC (but are not shown in the figure for clarity) . Scenario "Pond connection with aquifer" (S3): The only difference with BC is the increased mean of the aquifer head (HH3) due to the pond connection, resulting in a lower L1 in the blanket equations, see Figure 6. 3.1.3.
Step 2: Alternatives for model choices and parameter choices The possible model choice and parameter choices are derived that, together with the failure scenarios of step 1, will characterise the levee failure most accurately. The different scenarios are implemented with alternatives of Limit Equilibrium Methods (LEM), soil reaction behaviour and properties of soil parameters, to generate a set of slope stability models that include all possible characterisations of the failure ( Table 1). The LEM of Bishop (M Bishop ), Uplift Van (M Uplift ) and Spencer (M Spencer ) are applied as possible analytical slope stability methods (Bishop 1955;Spencer 1967;CIRIA 2013). LEMs are chosen over more complicated methods such as Finite Elements because of their short running time and sufficient accuracy for the not-complex slip surface as observed in this failure. The different LEM mainly differ in the various shapes of slip surfaces that they take into account. Moreover, the LEM of Bishop is considered less accurate than the LEM of Uplift Van and Spencer but is incorporated as a control method. The uncertain typical variables Table 3. Uncertainties in water pressures. The hydraulic head distribution is modelled in the red nodes ( Figure 7) and modelled as a truncated normal distribution. The coordinates correspond with Z and X in Figure 7. *S1, S2, and S3 have the same distribution for PL and HH as the base case, except for what is shown in the "water pressure" column.  (step 1) are the subject of this study as presented in Table 1. Therefore, the overall imperfection of the simulation is accepted and compensatory measures, such as an overall model uncertainty factor, are left out of the assessment. How soils react to loading depends on the rate of the loading and the conductivity of the soil. Often, the shear strength of the soil is modelled as drained or undrained, while in reality the soil likely reacts as partially drained. Initially, drained (SB dr ) and undrained (SB und ) (Schofield and Wroth 1968;Ladd 1991) are used in this paper as these are most commonly used. Partially drained behaviours should be considered in case this is expected to have a large influence on the results; which is not expected for this case. Undrained soil behaviour takes the possibly generated water pressures by deformations into account, in contrast to drained soil behaviour (van Deen and van Duinen 2016).
The undrained soil response for low permeability materials is implemented using the SHANSEP implementation (Ladd 1991)  The report of Grubert (2013) describes the local soil types and characteristics but does not report the probability distributions of the strength parameters that represent the diversity of shear strength in space (TAW 2001;Calle 2008). Based on the descriptions of the soil characteristics by Grubert (2013) and Normcommissie (2011), the corresponding values of soil strength are collected for both drained and undrained soil behaviour (Tables 4 and 5). The probability distributions of the parameter input are adopted from Baker and Calle (2006).
Conform steps 1 and 2, failure scenarios are identified, a finite set of interchangeable alternatives of model choices and parameter choices are systematically collected, which results in 24 combinations of failure scenarios and model choices (S i ∩M j ∩SB n ). Each combination is used to build a slope stability model (see Figure 8), which is evaluated for the probability of failure given the a-priori conditions. When observed slip surface information available (see step 3 below). The observed slip surface information is included in all 24 slope stability models by conditioning the search area of the critical slip surface to the observed slip surface. Both the soil parameters and hydraulic pressures of relevant soil layers are implemented as continuous probabilistic distributions (Tables 3-5). An overview of all  (Table 3).
combinations is presented using an event tree in Figure 8 Appendix A, which is enclosed in the supplemental material of this paper, presents an elaborate overview of all combinations and calculations results, and a calculation example that demonstrates how the posterior probability of S 3 is calculated.

3.1.4.
Step 3: Likelihood given failure and field observational information The assembled 24 slope stability models are used to evaluate the accompanied limit state functions with FORM. This results in the conditional probability of failure of each slope stability model, which is adopted as likelihood in this analysis. The likelihood is determined for two situations: A. Field observation information is not available and the search area of the critical slip surface is not defined in the slope stability models, resulting in P (F|S i ∩M j ∩SB m ) (step 3A.), B. Field observation information is available and the search area of critical slip surface is conditioned to the observed slip surface information in the slope stability models, resulting in P(F∩h'|S i ∩M j ∩SB m ) (step 3B.). Figure 9 shows the field observation information that is available. The slope stability models are conditioned (h') according to the observed shape of the initial slip surface as part of step 3B. (Equation (8)). The exact geometry of the slip surface is difficult to estimate, and therefore estimated by an upper and lower boundary that are 2 m apart, as suggested in Figure 2. 3.1.5.
Step 4, 5, and 6: Systematic analysis of uncertainties Both sets of the 24 likelihoods (step 3A. and 3B.) and the prior probabilities (step 1 and 2) are implemented alternately in Bayes rule and used to determine the posterior probabilities for each scenario of failure and model choice (step 4). The resulting posterior probabilities substantiate the identification of the MLC (step 5) which is used to identify the parameters that contributed to the failure the most.
Evaluating the limit state with FORM enables to estimate the conditional probability of failure, but also determines the contribution of the individual parameters to failure (Sa 2 i = 1). Because FORM might introduce approximation errors on the probability of failure, the results have been verified with Monte Carlo which gave very similar results for the considered cases.

Updated scenario probabilities
The two sets of the 24 likelihoods (step 3A. and 3B.) are used to update the prior probabilities of earlier identified Table 4. The probability distribution of soil properties (parameters of drained soil behaviour), soil weight (γ), wet soil weight (γ wet ), soil friction angle (w), cohesion (c'). * Grubert (2013) used local soil investigations to identify the soil types, **Normcommissie 2011 provides typical mean values of strength parameters which are common in the Netherlands with similar soil description, *** Baker and Calle (2006) provide default typical values of variation coefficients with similar soil description. The probability distributions are lognormal with a mean (μ) and a standard deviation (σ) (Schweckendiek et al. 2017  scenarios that possibly trigger a failure (step 1). The prior and the resulting posterior probabilities are shown in Figure 10, step by step.
Updating the prior probabilities of the scenario BC, S 1 , S 2 , and S 3 , based on the information of the actual failure (step 3A) emphasises S 3 as the most likely scenario (P(S 3 ) = 0.25 to P(S 3 |F) = 0.72). When the observed slip surface information is included in the update (step 3B), the posterior probability of S 3 reduces slightly (from P(S 3 |F) = 0.72 to P(S 3 |h'∩F) = 0.64). The results identify S 3 as the most likely scenario of failure.

Systematic analysis of uncertainties and most dominant parameters
Additionally, the set of the 24 likelihoods (step 3A. and 3B.) is used to update the prior probabilities of alternatives in the model choices (step 2), in the process of identifying the MLC. Both the prior and the resulting posterior probabilities are shown in Figures 11 and 12. Then, the MLC is deployed to determine the most dominant contributor to the failure (step 6). Figure 11 shows that updating of the prior probabilities on the information that the actual failure happened (step 3A), emphasises the LEM of Uplift Van as the most likely choice of LEM (P(M Uplift ) = 0.33 to P(M Uplift |F) = 0.43). However, when the observed slip surface information is included in the update (step 3B), the resulting Figure 10. Prior probability per scenario BC, S 1 , S 2 , and S 3 (equally distributed probabilities (green)), the posterior probabilities per scenario given failure (red) and observed geometry of slip surface (blue).  Figure 12 shows the most likely resulting slip circle (in the design point) when S3: Pond and undrained soil behaviour is assessed using LEM of Uplift Van, Spencer and Bishop without observational information (step 3A, upper row) and with observational information (step 3B, lower row) included. Figure 13 shows that updating prior probabilities of soil response behaviour on the information that the levee failed (step 3A.) emphasises the posterior probabilities of the undrained soil response behaviour as the most likely soil behaviour, i.e. P(SB undr ) = 0.50 to P(SB undr |F) = 0.65. Including the observed slip surface information in the update (step 3B.) has little effect on the posterior probabilities. The posterior probabilities identify the undrained soil response behaviour to characterise the failure most likely most accurately; however, the differences are small and no firm conclusion can be drawn.
The slope stability model implemented with the combination of scenario pond connection with the aquifer (S 3 ), method of Spencer (M Spencer ) and undrained response soil behaviour (S udr ) characterises the failure most likely most accurately. This slope stability model is used to evaluate the limit state and to estimate the contribution of the individual variables to the failure. Figure  14 shows that overall the shear strength ratio and the hydraulic head in the aquifer are considered the most dominant parameters contributing to the failure. The influence of the Pre-overburden pressure seems to decrease as a result of the incorporation of the observed slip surface information.

Framework for probabilistic hindcasting of levee failures
The six steps of hindcasting are based on a framework which explicitly and transparently accounts for significant uncertainties and model choices by quantitative means that reflect on the likelihood of occurrence. Although, consideration of all possible combinations of scenarios of failure and model choices is time-consuming; it does provide a thorough analysis of the failure possibilities.
Even more, the suggested Bayes technique based approach enables the implementation of observational information in the hindcasting. The approach provides insights on how each piece of evidence influences the scenario and model choice related posterior probabilities, building up to the identification of the slope stability model best representing the failure, piece by piece. The proposed slope stability model facilitates the identification of the most dominant parameter contributing to failure. The information that the event of failure actually happened is decisive for the identification of the most likely failure scenario. The information on the shape of the slip surface is decisive for the identification of the most representative model choices and the most dominant parameter contributing to failure.
The First Order Reliability Method (FORM) is used in this paper to estimate the probability of failure for every generated slope stability model. However, FORM gives an approximation of the probability of failure meaning that the exact value of the probability of failure is unknown. FORM has the benefit of providing a quick answer and insight into the influence factors. It also has (potential) drawbacks. For instance, the found local design point might correspond to a non-representative failure probability. In such cases, the calculated influence factors are non-representative as well. Also, the iterative process might not converge to one answer due to numerical problems. However, the software that was used (Probabilistic Toolkit of [Brinkman 2015]) provides the option of detecting numerical problems and this was not an issue for this assessment. Also, several results of FORM calculations have been verified with parallel Monte Carlo simulation, providing very similar results.
Scenarios of failure and alternatives of model choices are assumed equally probable as a first estimate, despite that some of the scenarios and the model choices are considered less likely, such as the introduction of Bishop's method. Experts consider Bishops' method to be less accurate than both Spencer's and Uplift Van method. Moreover, slope stability computations are the bases of the likelihoods and result in posterior probabilities to determine the most likely causes. Field observations can be implemented into the analyses by including these in the slope stability computations, such as the geometry of the slip surface or the actual slope failure. Absence of clear evidence impedes the estimation of prior probabilities, but appears less important for results.
Furthermore, by including the likelihood of occurrence in the hindcasting, the outcome is considered to reflect the actual situation more objectively than a hindcasting based on a deterministic sensitivity analysis. With a deterministic sensitivity analysis, the results are highly determined by the chosen input parameter bounds. Also, despite the equally assigned prior probabilities over the discrete choices as a first estimate, which is considered as a very rough estimate, experts in the field might be able to suggest more appropriate prior probabilities. Altogether, it is still possible that the actual cause was not identified as part of the considered scenarios.

Application of probabilistic hindcasting to the Breitenhagen case
The six steps of hindcasting are applied to the levee failure near Breitenhagen. Several elements in the analysis are based on local data such as the scenarios of water pressure and part of the soil parameters. Other elements rely on generic inputs such as some shear strength parameters; see Tables 4 and 5. The more local data are available, the more accurate the hindcast. In this case, it was not feasible to collect more local data and we believe there was sufficient data available to draw conclusions. But more local data might yield in even more conclusive conclusions.
The connection between the river and the aquifer, as presented in "Pond connection with aquifer" Scenario (S 3 ), results in high water pressures building up underneath and in the toe area of the levee. High water pressures in these areas reduce the local shear strength. These conditions influence the toe area (passive area) of the levee the most, resulting in a non-linear and stretched slip surface when slope instability occurs. Observations, illustrated in Figure 9, confirm that the passive area of the slip surface stretches up to the electrical pole at the toe of the levee and tilts the pole. Thus, the observed stretched passive part of the slip surface indicates high pore pressures underneath the levee. Other scenarios of water pressures concentrate on the active area of the slip surface, leaving the passive area relatively intact and result in a slip surface which is less stretched. Even though the other scenarios are not very probable, they cannot completely be excluded. Also, the method of Spencer is most robust and accurate when calculating critical slip surfaces that are not typically circular shaped. This explains why the posterior probability favours the method of Spencer when incorporating the observed slip surface information (h') ( Figure 11).
Whether the soil is best characterised by drained or undrained behaviour is, in reality, less binary than the models suggest. The soil behaviour is probably best characterised as partially drained which explains the relatively small difference between the resulting posterior probabilities (Figure 13).
The forensic analysis of  using a deterministic sensitivity analysis-based hindcasting, reports a collection of identified dominant parameters contributing to failure: that deviating low shear strength associated with low values of POP or cohesion justify the failure. This is in contrast to the findings of the current analysis, which identifies the low shear strength associated with low values of Shear strength ratio. The difference is explained by: (1) Undrained soil behaviour is identified as the most likely model choice in the analysis and, therefore, automatically excludes the cohesion as the most dominant parameter, (2) The applied range of the values of POP (by  covers situations that are less relevant for this specific situation and results in FoS << 1. This low value of FoS explains why the POP is identified as a dominant contribution to failure when a deterministic sensitivity analysis of hindcasting is used. However, when parameter values are related to their likelihood of occurrence, the Shear strength ratio values resulting in failure are more probable to occur, which explains the high influence factor. In this study, the water pressures are mostly based on steady-state assumptions for the phreatic line and hydraulic head in the aquifer. While a pragmatic and conservative choice for engineering purposes, this assumption can lead to the wrong conclusions for hindcasting. In general, transient effects should be incorporated in a hindcast. In this study, steady-state conditions were likely reached for uplift due to the long duration of the high water and the high conductivity of the aquifer. For the phreatic line, steady-state may be more questionable because of the low conductivity of the levee material (though the pictures do show a mostly saturated levee). A sensitivity analysis with a lower phreatic line (reflecting transient effects) does not yield a different conclusion. This is because the stability is mainly determined by the aquifer head. For cases where the levee body is of more importance for the stability, transient effects also become more important.
LEM's are used in this paper to model slope stability, where a Finite Element Model (FEM) would be better in capturing complex geometries and soil behaviour. However, in this case, a sensitivity analysis shows that the scenarios of water pressures are more important for failure than the soil behaviour. Furthermore, the observed slip surface does not follow a complex shape. Therefore, in combination with its computational efficiency, we chose a LEM as this is deemed sufficiently accurate. FEM might, however, be more appropriate in case of more complex geometries and soil behaviour, and in case water pressure scenarios are less dominant. In order to use the benefits of FEM, more local information would be needed as well (hindcasting this is difficult to collect more data as this mostly vanished), especially if spatial variability is to be incorporated.
The findings of the forensic engineering report by Grubert (2013) identified a collection of causes: unexpected saturation of the levee, steep slope of the levee and, foremost, the influence of the tree roots. However, this present study shows that a pond in front of the levee is the most likely cause of failure. The findings of Grubert (2013) are possible as well, as the computed scenario probabilities are not negligible. Both visual and historical data indicate that the pond in front of the levee is likely to be leftover from a former breach (Sixdorf 2016). Due to the reparations of this breach, the conditions of the soil might deviate from other stretches of the levee. The most likely scenario of a local high hydraulic head in the aquifer and the most dominant parameter identified by the present study can directly be connected to the former breach and the reparations, which would explain the actual location of the breach in 2013.

Conclusions
This paper demonstrates the application of probabilistic Bayesian techniques to the hindcasting of levee failures due to slope instability in six steps. The method provides insights into the most relevant uncertainties of hindcasting by evaluating all possible scenarios of failure and model choices on the likelihood of occurrence. This results in the identification of the most likely scenario and most dominant parameters triggering a failure. The suggested steps of analysis provide a thorough, workable and transparent method of analysis. Furthermore, this method is an improvement of existing deterministic hindcasting methods  in the sense that it provides a better insight in the relative likelihoods of the various possible causes to failure.
The developed approach of hindcasting is applied to the levee failure near Breitenhagen, with the identified most likely scenario of failure, the most representative model choices characterising the failure, and the most dominant parameters triggering the failure as a result, despite the scarcity of evidence. The levee failure near Breitenhagen in Germany in 2013 is most likely triggered by locally weak soil conditions and unexpected high water pressures due to a connection between a pond on the riverside of the levee and the aquifer underneath the levee. The slope stability model that characterises this failure most accurately is implemented with LEM of Spencer and undrained shear strength soil behaviour. Within this combination of a failure scenario and model choices, the shear strength ratio is identified as the most dominant contributor to the failure. The contribution of the high hydraulic head due to the pond connection is identified to be the second dominant. Based on the available evidence, an old levee breach explains both the presence of a pond and the locally weak soil (Sixdorf 2016) and would thus explain the specific location of the breach.

Recommendations
Even though the probabilistic method of hindcasting is successfully applied to the levee failure near Breitenhagen as part of the forensic analysis, the method can be improved by analysing more levee failures. Historical cases such as New Orleans (2005), would make an interesting object of study, and the method can also be further developed for other failure mechanisms, such as piping. Overall, it is expected that the developed approach can support a more systematic analysis of other levee failures.
In this paper, we have used FORM for probabilistic calculations. In some cases, this method could lead to inaccurate estimates of the design point and probability of failure. In future assessments, it is recommended to investigate the (parallel) use of other methods, such as Monte Carlo, especially in case on non-linear limit state functions and complex failures.
This method shows that including observational information in the hindcasting is vital to the identification of the most dominant contributing variable. Therefore, it is recommended to explore whether it is possible to include more evidence in the hindcasting with the help of Bayesian techniques, e.g. past performance information.
In order to characterise failure, LEMs are used in this paper as the slope stability is mainly determined by water pressure scenarios and the slip surface relatively simple. Finite Elements Method (FEM) analyses can lead to a more accurate characterisation of slope instability, and should especially be considered for complex geometries and soil behaviour (see e.g. Varkey, Hicks, and Vardon 2017). However, this comes at the cost of a computational burden.

Disclosure statement
No potential conflict of interest was reported by the author(s).