A causal inference method for canal safety anomaly detection based on structural causal model and GBDT

ABSTRACT It is a common idea to take advantage of as much evidence as possible in engineering safety monitoring data analysis. Compared to the direct use of as many measurement points as possible as input features in prediction tasks, this paper reveals that this habit tends to introduce risk into structural anomaly detection. This paper proposes a machine-learning-based causal inference method, the structural causal model with gradient boosting decision tree (SCMGBDT), within a structural causal model framework to improve the robustness of the model in structural safety anomaly detection tasks in three respects. First, the causal effect generation relationship between environmental measurement points and correlated response measurement points is explained by constructing a common confounder causal graph; then, the GBDT machine learning method is introduced to discover the nonlinear statistical relationship between environmental measurement points and response measurement points; finally, the model parameter estimation results are improved by introducing regularisation constraints and cross-fitting. By comparing model estimation precision and anomaly detection accuracy under different simulated anomaly scenarios, the results show that the proposed SCMGBDT model has a reasonable construction process and maintains good anomaly detection performance under different anomaly scenarios.


Introduction
The timely detection of anomalies through monitoring data is important for project safety and emergency response when structural defects occur in hydraulic engineering (Li et al., 2016). Because hydraulic structures vary widely and many factors affect structural response, professionals are hired during project operation to use advanced mathematical and statistical methods to identify anomalies as effectively as possible (Fisher et al., 2017). Numerous anomaly detection methods are applied in engineering safety monitoring practice, including statistical thresholds (Bukenya et al., 2014), structural simulation (Pyayt et al., 2011), cluster analysis (Chen et al., 2019), and regression analysis (Salazar, Moran, et al., 2017). These methods mine patterns from historical monitoring data, such as statistical features of response magnitude (Yu et al., 2010) and environmental and response distribution patterns (Hu et al., 2018), and then take outlying monitoring data as a basis for recognising possible anomalies in structures (Belcher et al., 2015). Among them, regression-based anomaly detection can construct extrapolatable models through statistical modelling techniques without building physical response mechanisms between environmental factors and response magnitudes, and therefore adapts well to the analysis of hydraulic engineering safety monitoring data.
A common regression-based anomaly detection process (AhmadiNedushan, 2002; Mata et al., 2021; Prakash et al., 2018) is shown in Figure 1. First, the historical safety monitoring data are organised, and the environmental factors affecting structural response are selected based on a priori knowledge or data analysis; then, a response magnitude regression model is constructed with statistical modelling methods and used to estimate the monitoring data to be tested; finally, whether the structure is abnormal is judged from the deviation of the estimated response from the actual measurement. It is not difficult to see that the technical choices in the regression modelling step, such as the design of the regression model and the selection of the learning method, have the greatest impact on anomaly detection performance, and this step has therefore become the focus of anomaly detection research.
To construct a suitable regression model, experts often design the data preparation and feature selection process around engineering characteristics and experience, so that the model better fits the measured data (F. Li et al., 2013; M. Li et al., 2019). In recent years, the rapid development of artificial intelligence technology has brought excellent data-fitting ability (Mata, 2011; Su et al., 2016), while machine learning methods offer adaptive modelling (Dai et al., 2018), which greatly simplifies the regression modelling process and significantly reduces the related human and economic costs. More and more researchers have therefore used machine learning and deep learning methods to build regression models for anomaly detection, which has become a research focus in recent years (Kang et al., 2017).
It is worth noting that, influenced by the conventions of machine learning research on regression modelling, researchers in engineering safety anomaly detection tend to treat regression modelling as a prediction task, i.e. to pursue the highest in-sample prediction accuracy, and rarely consider the applicability of the constructed regression model in the anomaly detection stage (Jung et al., 2013). In particular, when there are multiple response measurement points in the monitoring area, researchers usually treat the response measurement points at neighbouring locations as additional bases for prediction and use them as input features together with environmental measurement points during model training (Salazar, Toledo, et al., 2017). This approach improves the prediction accuracy of the model but ignores the difference in the relations between the different input variables and the output variable, which brings risks to the anomaly detection stage (El Bilali et al., 2022).
Therefore, some scholars have tried to improve out-of-the-box data modelling methods to enhance the generalisation ability and anomaly-identification robustness of regression models. Yuen et al. combined Bayesian inference with ordinary least squares (OLS) modelling (Yuen & Mu, 2012), enabling prior knowledge to be introduced into model parameter identification, especially when modelling with correlated data, where the correlation of the residual terms of different measurement points must be considered (Yuen & Ortiz, 2017). To address the non-stationary nature of the structural evolution process, Mu et al. combined Kalman filtering methods to achieve continuous learning of model parameters (Mu & Yuen, 2015; Nguyen & Goulet, 2018), which can effectively recognise outliers and alleviate their influence on parameter identification. However, the above methods are all based on linear modelling approaches and are difficult to transfer to machine learning regression models. Recently, several studies have attempted to combine causal inference and machine learning techniques (Koutroulis et al., 2022; Lin et al., 2022), enabling domain knowledge to be introduced into regression modelling and anomaly detection tasks, but no such research has been reported for structural safety monitoring. In this paper, we revisit the regression modelling problem in engineering safety anomaly detection from the perspective of causal inference; an engineering safety anomaly detection model based on a structural causal model and gradient boosting decision tree (GBDT) is constructed to improve the robustness of anomaly detection, and a case study of safety monitoring of a canal section of the China South-to-North Water Diversion Project is used for validation.
This paper is structured as follows. After this brief introduction of the problem and background in section 1, section 2 introduces the methods needed to build the canal structure anomaly detection model, including the structural causal model and GBDT; section 3 presents the process of building the model, including the canal safety monitoring status used for the experiments, the anomaly detection modelling design, and the GBDT-based structural causal model learning method; section 4 verifies the effectiveness of the proposed anomaly detection model through experiments and discusses the results; and section 5 summarises the advantages of the suggested models and gives recommendations for future research.

Structural causal model
Since the need for prediction is prevalent in big data applications, prediction is often used as the goal of model estimation in regression modelling. However, research in recent years has found that machine learning models can achieve good results by learning spurious correlations yet fail to generalise well in real-world settings, making such problems more appropriately framed as causal inference tasks. The causal inference task is somewhat similar to the prediction task in that both produce an estimate of a variable from some evidence. The difference lies in the modelling objective. The prediction task learns the conditional probability distribution P(y|x) from historical data by considering only the correlation between features x and variable y, and uses it to estimate the value of y that fits the historical pattern given x. The causal inference task, on the other hand, learns the intervention distribution P(y|do(x)) among variables from historical data under certain causality assumptions, and is usually used to estimate the effect of a change in x on y.
Although both tasks learn models through statistical modelling methods, the learning processes differ, so the parameter estimates of the constructed statistical models differ as well. Structural causal modelling is a modelling approach used to accomplish the causal inference task. A structural causal model includes a causal graph and the probability distribution expressions used to describe the causal relationships between variables. There are three types of variables: the explanatory variable x, the output variable y, and the unobserved variable u, where x and u describe the independent variables in a causal relationship and y describes the dependent variable. When the observed variable x is not sufficient to explain the generation of y, u is introduced as a complementary cause. A change in the value of an independent variable directly leads to a change in the output variable, and the causal relationship between them is often described using a conditional probability distribution, i.e. f_y(x) → y. Constructing a structural causal model has two steps: construction of the causal graph and model training.
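To make the three variable types concrete, the toy data-generating process below simulates a structural causal model in which an observed cause x and an unobserved confounder u both drive a response y, while u also drives a neighbouring response t. All functional forms and coefficients here are illustrative assumptions, not the paper's fitted relationships:

```python
import random

def simulate_scm(n=1000, seed=0):
    """Toy structural causal model: x (observed) and u (unobserved)
    both cause y; u also causes the neighbouring response t.
    The linear forms and coefficients are hypothetical."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)                   # observed environmental cause
        u = rng.gauss(0.0, 1.0)                     # unobserved confounding cause
        t = 0.8 * u + rng.gauss(0.0, 0.1)           # neighbouring response, driven by u
        y = 2.0 * x + 0.8 * u + rng.gauss(0.0, 0.1) # response: f_y(x, u) -> y
        data.append((x, t, y))
    return data

data = simulate_scm()
# t and y are correlated only through u: regressing y on t "works",
# but intervening on t would not move y, so P(y|t) differs from P(y|do(t)).
```

This is exactly the situation where prediction and causal inference diverge: t is an excellent predictive feature for y, yet it has no causal effect on y.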
A causal graph is a directed acyclic graph consisting of nodes and directed edges, where the nodes represent variables and the edges represent the form of the causal effects between them. In Figure 3, the causal relationship between explanatory variables and output variables is drawn with a solid line, while that between unobserved variables and output variables is drawn with a dashed line. The causal graph provides a convenient tool for describing complex causal relationships in a system, and it also reveals the challenge faced in causal inference, namely the potential confounding bias in the constructed statistical model when there is a back-door path between the variables to be inferred. A 'back-door path' refers to the presence of confounding variables that affect both the explanatory variable x and the output variable y in the analysis of P(y|do(x)).
Model training refers to using data to quantify the causal relationship between variables, i.e. the intervention distribution P(y|do(x)). Correlation does not imply causation, and when confounding bias exists P(y|x) ≠ P(y|do(x)), so the training process of a structural causal model does not directly exploit the correlations in the data as a predictive model does, but requires covariate adjustment, matching, or the introduction of instrumental variables to mitigate the confounding bias. In addition, while some simple causal relationships can be described by linear models, more complex causal relationships between variables often require nonlinear modelling or nonparametric models such as machine learning.

Gradient boosting decision tree
GBDT is a decision-tree-based ensemble learning algorithm (Chen & Guestrin, 2016) developed on the basis of the gradient boosting method, which has shown good performance in a large number of classification and regression tasks. Like other ensemble learning models such as random forest, GBDT integrates multiple weak learners into a strong learner, improving the prediction accuracy and generalisation ability of the model. It uses a classification and regression tree (CART) as the weak learner for regression. The construction process of a CART is described in Algorithm (1). By continuously dividing the sample input space, a decision tree containing M leaf nodes is constructed to realise the prediction of the regression function f(x), as in (2).
In the above equation, C_m represents the subsample space divided by the decision tree, and |C_m| represents the number of samples in C_m. The f(x) trained on the sample data minimises a loss function measured by mean square error (MSE), namely:

L(f) = (1/N) Σ_{i=1}^{N} (y_i − f(x_i))²

To satisfy this objective, the training process of the decision tree adopts a greedy algorithm that optimises the output function f(x) by repeatedly selecting the optimal splitting feature j and split point s so that the squared error after the node split is smallest:

min_{j,s} [ Σ_{x_i ∈ C_L(j,s)} (y_i − c̄_L)² + Σ_{x_i ∈ C_R(j,s)} (y_i − c̄_R)² ]

In the above equation, C_L and C_R represent the two subspaces into which the original node r is divided, and c̄_L and c̄_R represent the means of the samples contained in C_L and C_R, respectively.
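The greedy split selection described above can be sketched as follows. This is a minimal single-split routine over a list-of-rows feature matrix, not the full recursive CART algorithm:

```python
def best_split(X, y):
    """Greedy CART split for regression: scan every feature j and every
    candidate threshold s, and keep the pair minimising the summed
    squared error around the two child means (the criterion above)."""
    def sse(vals):
        # sum of squared deviations from the mean of vals
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (None, None, float("inf"))  # (feature j, threshold s, score)
    n_features = len(X[0])
    for j in range(n_features):
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # skip degenerate splits
            score = sse(left) + sse(right)
            if score < best[2]:
                best = (j, s, score)
    return best

# Example: one feature that cleanly separates the low and high targets.
X = [[0.0], [0.1], [0.9], [1.0]]
y = [0.0, 0.0, 1.0, 1.0]
j, s, score = best_split(X, y)
# -> j == 0, s == 0.1, score == 0.0 (a perfect split)
```

In a full CART, this routine is applied recursively to each child node until the depth or leaf-size limits of Algorithm (1) are reached.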
The fitting ability of a single decision tree as a weak learner is very limited. GBDT therefore performs multiple rounds of training based on the boosting mechanism to build a group of decision trees. The goal of each new decision tree is to minimise the gradient of the loss function; for the squared loss, the residual ε_n = y − F_{n−1}(x) is exactly the negative gradient of the previous regression function F_{n−1}(x).

Algorithm 1 (CART construction). Input: training data collection C with input features X and label y, the minimal leaf sample number m_s, and the maximum tree depth m_d. Output: the root node r of the constructed decision tree. Create a root node r; while the depth of r is smaller than m_d and the sample count of C is larger than m_s, find the feature j and value s that minimise the split criterion and divide the node on them.

When constructing predictive models with correlated input features, GBDT can better mine the interaction effects between variables, and decision trees are easier to explain, so it is more interpretable than other machine learning methods, especially given the availability of SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017), a computational framework dedicated to unifying interpretable machine learning around the concept of the Shapley value. The Shapley value is derived from cooperative game theory. In an activity with M players participating together, a subset S of the total player set N forms a coalition with contribution value v(S), and the marginal contribution of a player i to the coalition is v(S ∪ {i}) − v(S). The Shapley value is the average contribution of player i over all possible coalition permutations. For most model-agnostic settings, SHAP reduces the computation time by approximate sampling (Štrumbelj & Kononenko, 2014) and by building local interpretation models. Taking advantage of the natural feature selection process of decision-tree-based machine learning methods during prediction, Lundberg et al. proposed the TreeSHAP method (Lundberg et al., 2020), which realises fast and exact Shapley value calculation for tree-based ensemble learning models.
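The boosting loop for the squared loss can be sketched as below: each round fits a new weak learner to the current residual y − F(x), i.e. the negative gradient. To keep the sketch dependency-free, the weak learner here is a depth-1 stump on a single feature rather than a full CART:

```python
def fit_stump(x, r):
    """Fit a depth-1 regression stump: pick the threshold minimising
    squared error and predict the child means on either side."""
    best = (float("inf"), None, 0.0, 0.0)  # (error, threshold, left mean, right mean)
    for s in sorted(set(x))[:-1]:          # exclude max so the right child is non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if err < best[0]:
            best = (err, s, ml, mr)
    return best[1], best[2], best[3]

def gbdt_fit(x, y, n_rounds=20, lr=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the
    residual y - F_{n-1}(x), the negative gradient mentioned in the text."""
    f0 = sum(y) / len(y)                   # initial constant model
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s, ml, mr = fit_stump(x, resid)
        stumps.append((s, ml, mr))
        pred = [pi + lr * (ml if xi <= s else mr) for xi, pi in zip(x, pred)]
    return f0, stumps

def gbdt_predict(model, xi, lr=0.1):
    f0, stumps = model
    return f0 + sum(lr * (ml if xi <= s else mr) for s, ml, mr in stumps)

# Usage: a step function is learned from four points.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0, 1.0]
model = gbdt_fit(x, y)
# gbdt_predict(model, 0.0) is close to 0 and gbdt_predict(model, 3.0) close to 1
```

Real GBDT implementations add deeper trees, multi-feature splits, and regularisation, but the residual-fitting loop is the same.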

Canal safety anomaly detection modelling
This paper suggests a machine learning regression modelling technique to implement a structural causal model for canal anomaly detection, comprising three major steps, causal graph construction, model training, and anomaly detection, as shown in Figure 2. First, we use domain knowledge to construct a causal graph based on the layout of the measurement points in the monitoring section, which depicts the causal relations between measurement points and hidden physical quantities. Then, the monitoring data are collated and divided into a training set, validation set, and test set; the structural causal model is trained using the training and validation sets; the SHAP method is used to inspect the feature importance of the model; and the causal structural model is reconstructed if the confounding bias is not considered to be mitigated. Finally, anomaly detection is performed using the test set data, and an early warning is issued if abnormal data beyond the discriminant threshold are found.

Canal safety monitoring status
A canal is a common water delivery structure in water diversion projects. We take a canal section of the South-to-North Water Diversion Project as an example. Under the influence of water level, temperature, material ageing, geological movement, and other factors during operation, the surrounding structure may crack and slide, which may result in structural instability. The monitoring equipment layout is shown in Figure 3.
Canal safety monitoring generally focuses on the deformation displacement and seepage pressure of the canal. If the canal is found to have abnormal displacement, this is an indication of possible future settlement or landslide. If abnormal seepage pressure is found, it is an indication of possible leakage. The challenge in canal safety monitoring is that canal displacements and seepage pressures change with environmental factors during project operation, and project safety operation managers need to identify whether there are abnormal changes in these response magnitudes beyond the effects of the environmental factors (X. Li et al., 2019).
It is worth noting that the environmental factors affecting canal displacement and seepage pressure changes during project operation are not limited to temperature and water level, so considering only the effect of the observed environmental quantities on the response quantities is often insufficient to determine whether there are structural safety abnormalities. Although the other environmental factors cannot be directly observed, they tend to act on similar safety monitoring response magnitudes; thus, experienced engineering safety monitoring experts compare the effects of the known environmental quantities on the response magnitudes while also checking whether the changes in the same type of response magnitudes are reasonably consistent. Based on the above considerations, a causality diagram of the monitored quantities at this section is constructed, as shown in Figure 4, to guide the regression modelling process. As shown in Figure 3, the prerequisite for achieving anomaly identification is to construct an estimation model for the response measurement point y. Usually, there are two ways to construct such a model. One is to construct a regression model of y using only the known environmental factors x, i.e. learning the distribution P(y|x); because x has a direct causal relationship with y, this is called the causal model. The other is to construct a regression model of y using the known environmental quantities x together with other response quantities t, which supplement information on the unobserved environment u, i.e. learning the distribution P(y|x, t); because feature t has no direct causal relationship with y, this is called the non-causal model.

A GBDT-based structural causal model
Based on the above analysis, this paper proposes a GBDT-based structural causal model construction method, which attempts to construct a structural anomaly identification model using the idea of causal inference. Following Figure 4, we consider that there is a certain correlation between t and y under normal working conditions, induced by the unobserved environmental variable u; thus, the original causal diagram can be converted into a common confounder causal diagram, as shown in Figure 5, which treats the similar effect of u on t and y as a direct effect of t on y.
The structural equation in Figure 5 can be expressed as a partially linear form. Considering that there is a nonlinear relationship between the environmental factors and the response measurement points, a system of equations is established as follows:

y = θ t + g(x) + ε,  t = m(x) + v

where m and g represent nonlinear transformations, learned in this paper with the GBDT model, and the regression equation of y in t is linear. The model learning process is divided into two steps: (1) learn the estimates m̂ and ĝ with the GBDT model, and compute v̂_i = t_i − m̂(x_i); (2) estimate the partial linear coefficient θ. It is worth noting that, to satisfy the assumption that the effect of t in estimating y is independent of x, the orthogonalised residual v̂_i = t_i − m̂(x_i) is used in place of t_i itself when estimating θ in the linear regression.
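The two-step estimation above can be sketched as follows. The binned-mean nuisance learner is a dependency-free stand-in for the paper's GBDT models, and the synthetic data-generating coefficients (θ = 1.5 and so on) are illustrative assumptions, not values from the canal case study:

```python
import random

def fit_mean_by_bin(x, target, n_bins=5):
    """Stand-in nuisance learner: piecewise-constant fit of target on x.
    The paper uses GBDT here; a binned mean keeps the sketch self-contained."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for xi, ti in zip(x, target):
        b = min(int((xi - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(ti)
    means = {b: sum(v) / len(v) for b, v in bins.items()}
    default = sum(target) / len(target)
    return lambda xi: means.get(min(int((xi - lo) / width), n_bins - 1), default)

def estimate_theta(x, t, y):
    """Two-step orthogonalised estimate of theta in y = theta*t + g(x) + eps:
    1. fit m_hat(x) for t and the x-part of y;
    2. regress the y-residual on v_hat = t - m_hat(x)."""
    m_hat = fit_mean_by_bin(x, t)
    g_hat = fit_mean_by_bin(x, y)
    v = [ti - m_hat(xi) for xi, ti in zip(x, t)]
    w = [yi - g_hat(xi) for xi, yi in zip(x, y)]
    return sum(vi * wi for vi, wi in zip(v, w)) / sum(vi * vi for vi in v)

# Hypothetical illustration with true theta = 1.5: t = 2x + v, y = 1.5 t + x + eps.
rng = random.Random(1)
x, t, y = [], [], []
for _ in range(2000):
    xi = rng.uniform(0.0, 1.0)
    ti = 2.0 * xi + rng.gauss(0.0, 1.0)
    yi = 1.5 * ti + xi + rng.gauss(0.0, 0.1)
    x.append(xi); t.append(ti); y.append(yi)
theta_hat = estimate_theta(x, t, y)
# theta_hat lands close to the true partial linear coefficient 1.5
```

The orthogonalisation step is what removes the influence of x from t before the linear coefficient is read off, matching the independence assumption stated above.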
However, recent research has shown that naive inference based on a direct application of machine learning methods to estimate the causal parameter θ is generally invalid: the machine learning step introduces a bias that arises from regularisation. Sample splitting, i.e. estimating the models g and m with one part of the data (the training data) and estimating θ with the other part (the validation data), overcomes the bias induced by overfitting, and the benefits of cross-fitting can be exploited by switching the roles of the training and validation samples. Based on the GBDT model, the procedure is: (1) construct the regression equation for t using the training set, and record the residuals on the validation set as v̂_i = t_i − m̂_t(x_i); (2) construct the regression equation y = ĝ(x) using the training set and record the estimates ĝ(x_i) on the validation set; (3) estimate the parameter θ of the linear regression equation as:

θ̂ = Σ_i v̂_i (y_i − ĝ(x_i)) / Σ_i v̂_i²
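The cross-fitting procedure can be sketched as below: the nuisance models are trained on one fold and used to residualise the other fold, then the fold roles are swapped and the residuals are pooled. A one-feature OLS stands in for the paper's GBDT nuisance learners, and the simulated coefficients are hypothetical:

```python
import random

def ols1(x, y):
    """Simple one-feature OLS used as a stand-in nuisance learner
    (the paper uses GBDT); returns a predictor xi -> a + b*xi."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda xi: a + b * xi

def cross_fit_theta(x, t, y):
    """Cross-fitting: m_hat and g_hat are trained on one fold and used to
    residualise the *other* fold, then the roles are switched, which
    removes the own-fit (overfitting) bias discussed in the text."""
    n = len(x)
    half = n // 2
    folds = [(range(0, half), range(half, n)), (range(half, n), range(0, half))]
    v_all, w_all = [], []
    for train, held in folds:
        m_hat = ols1([x[i] for i in train], [t[i] for i in train])
        g_hat = ols1([x[i] for i in train], [y[i] for i in train])
        for i in held:
            v_all.append(t[i] - m_hat(x[i]))   # v_hat on held-out fold
            w_all.append(y[i] - g_hat(x[i]))   # y-residual on held-out fold
    return sum(v * w for v, w in zip(v_all, w_all)) / sum(v * v for v in v_all)

# Hypothetical data with true theta = 1.5 and linear nuisance functions.
rng = random.Random(2)
x, t, y = [], [], []
for _ in range(2000):
    xi = rng.uniform(0.0, 1.0)
    ti = 2.0 * xi + rng.gauss(0.0, 1.0)
    yi = 1.5 * ti + 0.5 * xi + rng.gauss(0.0, 0.1)
    x.append(xi); t.append(ti); y.append(yi)
theta_hat = cross_fit_theta(x, t, y)
# theta_hat recovers the true coefficient 1.5 without own-fit bias
```

Because every residual is computed on data the nuisance model never saw, the regularisation bias of the machine learning step does not leak into θ̂.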

Anomaly detection
Once the regression f(x) has been trained, the residual ε = y − f(x) that cannot be explained by f(x) is obtained, and its root mean square (standard deviation) σ is calculated to construct the discriminant interval used to detect anomalies. Specifically, for observed monitoring data (x, y) to be tested, the identification result is abnormal if the distance h between the estimated value ŷ = f(x) and the monitored value y is larger than 3σ, i.e. h = |y − f(x)| > 3σ; otherwise the monitoring data are treated as normal. The statistical meaning of 3σ is that, under a normality assumption, y falls within about the 99.7% confidence interval of f(x), which is a common setting in the literature; some papers use a 2σ threshold (about 95%) instead.
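The discriminant rule above amounts to a few lines of code; the residual list and the example observations below are hypothetical:

```python
def fit_threshold(residuals):
    """Sigma for the discriminant interval: the root-mean-square of the
    training residuals y - f(x)."""
    return (sum(e * e for e in residuals) / len(residuals)) ** 0.5

def detect(y_obs, y_hat, sigma, k=3.0):
    """Flag an anomaly when |y - f(x)| exceeds k*sigma (k = 3 in the text)."""
    return abs(y_obs - y_hat) > k * sigma

# Hypothetical training residuals from a fitted model f(x):
residuals = [0.1, -0.1, 0.05, -0.05]
sigma = fit_threshold(residuals)
flag_far = detect(1.3, 1.0, sigma)    # deviation 0.3 exceeds 3*sigma -> anomaly
flag_near = detect(1.1, 1.0, sigma)   # deviation 0.1 stays inside the band -> normal
```

The choice of k trades missed anomalies against false alarms: lowering k to 2 narrows the band and flags smaller deviations at the cost of more false positives.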
Two characteristics are needed for an anomaly detection model to perform well. On the one hand, f(x) should achieve good predictive performance on the validation dataset, giving a smaller standard deviation of the model residuals and, consequently, a narrower anomaly detection threshold that lets the model detect abnormal loads earlier. On the other hand, f(x) should generalise well, meaning that the estimation basis it learns from the monitoring information remains valid when the structure behaves abnormally.
Based on the above and the available research results, some differences between the causal and non-causal models in estimation and anomaly detection can be foreseen. First, the non-causal model bases its estimate of y on more variables, so its prediction accuracy will be higher than that of the causal model, i.e. it has a smaller residual σ. This means the non-causal model has a narrower anomaly discrimination interval and can more easily identify minor anomalous perturbations.
Secondly, in addition to the observed environmental factors of temperature and water level, the effects of the other environmental factors acting on the structure may be reflected in every response measurement point. Because the project is in a steady state most of the time during operation, the engineering response measurement points in the monitoring data are highly correlated, and a prediction model built on such data can come to rely too heavily on this correlation. This means that when an anomaly affects the structure as a whole, it will have a similar effect on each response measurement point, making the non-causal model unable to effectively recognise the anomaly.
The SCMGBDT model proposed in this paper aims, after introducing the adjacent response measurement points, to make its estimation behaviour more consistent with causality from the perspective of causal inference, i.e. P(y|x, t) = P(y|do(x), do(t)), so as to improve the estimation accuracy while enhancing the generalisation ability and achieving robust performance in the anomaly detection stage.

Discussion of experimental results
To test the performance improvement of the proposed method, two types of experiments were conducted on the anomaly detection models, covering both seepage detection and displacement detection. One type is model estimation and interpretation based on SHAP, to verify the supposition in section 3.2: estimation residuals at the different response measurement points are compared and the feature importance of each model is presented. The other type is an anomaly detection test with simulated data, in which three abnormal scenes were superimposed on real monitoring data: a single-point anomaly, a multi-point asynchronous anomaly, and a multi-point synchronised anomaly. Monitoring data of the canal from August 2019 to June 2022 were used for model training and testing; the same hyper-parameters were used for all model training processes, including an MSE training loss, a maximum iteration number of 50, a minimal leaf sample count of 5, a maximum tree depth of 3, and a learning rate of 0.1.

Estimation and interpretation of models
Estimation accuracy is a basic requirement for anomaly detection, and more input features mean more estimation evidence, which generally leads to lower estimation error. The monitored sections have five response measurement points: three seepage measurement points (P1, P2 and P3) and two displacement measurement points (ES1 and ES2). The estimation precision is measured by the mean absolute error, as shown in Table 1.
As stated in section 3.3, the non-causal model achieves the highest estimation precision; the SCMGBDT model proposed in this paper performs better than the causal model but is less precise than the non-causal model. However, the engineering safety anomaly detection task is closer to a decision task than to a prediction task. When building decision models using complex machine learning methods, causality and correlation should be distinguished as much as possible, and features with causality should be the main basis for decision-making so that the models have better generalisation performance. Thus, a good anomaly detection model should have not only accurate prediction but also a reasonable distribution of feature importance.
To reveal how these models differ in estimation, feature importance is an important type of evidence. Although the internal computation of a machine learning model is complex to describe, SHAP can expose the feature importance of each model: the SCMGBDT estimates draw on the correlated features in a complementary manner, while the causal features stay dominant, as in the causal model.

Anomaly detection performance comparison
As this paper focuses on anomaly detection robustness, model estimation precision is not the main criterion in anomaly detection, because anomaly patterns vary.
To test the models' robustness in anomaly detection, this paper constructs an anomaly dataset by simulating different forms of anomalies in the canal structure on part of the original dataset. Three scenarios of structural anomalies are simulated: a single measurement point local anomaly (Scenario 1), a non-uniform overall anomaly (Scenario 2), and a uniform overall anomaly (Scenario 3). Anomalous data are generated by adding time-linearly correlated deviations to the original data series. Scenario 1 adds deviations only to the response measurement point under test; Scenario 3 adds equal deviations to all response measurement points; Scenario 2 also adds deviations to all response measurement points, but with different magnitudes for different points: the same deviation on the tested measurement point as in the other scenarios and half that deviation on the others.
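The deviation-injection step above can be sketched as follows; the slope, start index, and series values are free parameters of the simulation, chosen here for illustration only:

```python
def add_linear_anomaly(series, start, slope):
    """Superimpose a time-linear deviation on a monitoring series from
    index `start` onward: value_i += slope * (i - start).
    Returns a new list; the original series is left untouched."""
    out = list(series)
    for i in range(start, len(out)):
        out[i] += slope * (i - start)
    return out

# Sketch of the three scenarios for a tested point and its neighbours:
series = [1.0] * 10
s1_test = add_linear_anomaly(series, 5, 0.2)        # Scenario 1: only the tested point drifts
s2_neighbour = add_linear_anomaly(series, 5, 0.1)   # Scenario 2: neighbours get half the deviation
s3_all = add_linear_anomaly(series, 5, 0.2)         # Scenario 3: every point gets the same drift
```

Keeping the deviation linear in time mimics a slowly developing structural defect rather than a sensor spike, which is what makes Scenarios 2 and 3 hard for a correlation-driven model.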
A robust anomaly detection model has a low frequency of both false negatives, where the model incorrectly identifies an anomaly as a normal measurement, and false positives, where the model incorrectly identifies a normal measurement as an anomaly. This paper evaluates the anomaly recognition performance of the models by accuracy, measured as (TP + TN)/(TP + FP + TN + FN). The three structural anomaly scenarios are used to test three models, the causal model, the non-causal model, and SCMGBDT, and the results are shown in Table 2.
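The accuracy metric above is straightforward to compute from the per-sample flags; the example flag and truth vectors are hypothetical:

```python
def accuracy(flags, truth):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN): a flagged true anomaly
    counts as TP, an unflagged normal point as TN."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    return (tp + tn) / len(truth)

# Hypothetical detector output vs. ground truth:
flags = [True, False, True, False]   # model says anomaly?
truth = [True, False, False, False]  # actually anomalous?
acc = accuracy(flags, truth)         # one TP, two TN, one FP -> 0.75
```

Because the simulated test windows mix anomalous and normal samples, accuracy penalises both missed anomalies and false alarms at once.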
As can be seen from Table 2, the model with the highest anomaly detection accuracy differs across scenarios. The non-causal model recognises the most anomalous data in Scenario 1 because it has the best estimation precision, and SCMGBDT achieves comparable anomaly detection accuracy there. However, a single-measurement-point anomaly is an extreme simulation scenario. The value of the proposed SCMGBDT model shows in Scenarios 2 and 3, which are the cases that usually occur in reality. SCMGBDT maintains its anomaly detection performance because it not only has better estimation precision than the causal model but also takes the environmental factors as its main basis for estimation.
In contrast, the non-causal model fails to match its Scenario 1 detection performance in Scenarios 2 and 3 because its estimation takes the correlated measurement points as its main basis, and this correlation may persist under abnormal engineering conditions. Taking Scenario 3 as an example, the details of the anomaly detection process are presented in Figures 8 and 9.
These two figures show the changing estimation interval of the different models as the same deviations are applied to each response measurement point. A counterintuitive phenomenon emerges: the non-causal model's estimate changes almost in step with the anomaly, so the abnormal measurement data always stay inside its estimation interval, even though that interval is the narrowest. In contrast, SCMGBDT also exploits the correlated measurement points to improve estimation precision, yet it behaves more consistently with the causal model because it takes the environmental factors as its main estimation basis. In summary, first, the more reasonable inference computation and higher estimation precision make the proposed SCMGBDT model more robust in recognising different abnormal scenarios. Second, although correlation does not imply causation, this does not mean that correlation cannot be used for decision-making: the experimental results show that correlation and causation can be balanced in the regression modelling process, which requires the model builder to give more consideration to causal inference techniques.

Conclusions
It is a common idea to take advantage of as much evidence as possible in engineering safety monitoring data analysis. Compared to the direct use of many measurement points as input features in prediction tasks, this paper reveals that this habit tends to introduce a risk in regression-based anomaly detection, a risk that stems from the different roles of causation and correlation in decision-making tasks. This paper proposes a machine-learning-based causal inference method, SCMGBDT, within a structural causal model framework to improve the robustness of the model in structural safety anomaly detection tasks, in terms of three aspects. First, the causal effect generation relationship between environmental measurement points and correlated response measurement points is explained by constructing a common confounder causal graph; then, the GBDT machine learning method is introduced to capture the nonlinear statistical relationship between environmental measurement points and response measurement points; finally, the model parameter estimation results are improved by introducing regularisation constraints and cross-checking methods. Comparisons of model estimation precision and anomaly detection accuracy under different simulated anomaly scenarios show that the proposed SCMGBDT model has a reasonable construction process and maintains good anomaly detection performance under different anomaly scenarios.
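The second and third aspects above can be sketched in a few lines. This is an illustrative stand-in, not the paper's code: scikit-learn's `GradientBoostingRegressor` plays the role of the GBDT, the synthetic series replace real monitoring data, and the regularisation (shallow trees, subsampling, small learning rate) and cross-checking (k-fold cross-validation of estimation precision) are generic choices assumed for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
water_level = rng.normal(10, 1, n)    # environmental confounders
temperature = rng.normal(15, 5, n)
time_effect = np.linspace(0, 1, n)
# response driven only by the environmental factors (plus noise)
response = (0.8 * water_level - 0.1 * temperature
            + 0.5 * time_effect + rng.normal(0, 0.1, n))

X_env = np.column_stack([water_level, temperature, time_effect])
# regularisation via shallow trees, subsampling and a small learning rate
model = GradientBoostingRegressor(max_depth=3, subsample=0.8,
                                  learning_rate=0.05, n_estimators=300,
                                  random_state=0)
# cross-checking: k-fold cross-validation of the estimation precision
scores = cross_val_score(model, X_env, response, cv=5,
                         scoring="neg_mean_absolute_error")
print(f"cross-validated MAE: {-scores.mean():.3f}")
```

Restricting the input features to the environmental factors, as here, corresponds to the causal side of the SCMGBDT design; the non-causal alternative would simply append the correlated response measurement points to `X_env`.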
The solution and model proposed in this paper also have shortcomings, and future work should address the following: (1) the structural causal model is constructed without considering the direct effect of loads between adjacent response measurement points, which means the model may not be well suited to engineering with more consistent structural responses, such as dams and culverts; (2) in the anomaly detection robustness experiments, only empirically based simulated anomaly data were used. In the future, simulated anomaly data generated by structural mechanics methods, such as finite element simulation, should be considered to better evaluate the model's anomaly detection performance.

Algorithm 1.
CCT(C, m_s, m_d): Construction of a CART.

Figure 2 .
Figure 2. Flow chart of SCMGBDT construction and anomaly recognition. SCM stands for Structural Causal Model, GBDT stands for Gradient Boosting Decision Tree.

Figure 3 .
Figure 3. Monitoring equipment layout of a canal section.

Figure 4 .
Figure 4. The causal graph of environmental factors of the structure.

Figure 5 .
Figure 5. The modified common confounder graph in normal engineering conditions, based on Figure 4.
Figures 6 and 7 present a direct comparison of model features: the SHAP summary plots of the causal model, non-causal model and SCMGBDT for measurement points P1 and ES1, respectively. SHAP plots of the other measurement points behave similarly in terms of feature importance and are omitted to avoid repetition. A SHAP summary plot ranks the input features from top to bottom by feature importance. Each row depicts the relationship between a feature's value and its SHAP value, with the magnitude of the feature value indicated by colour and the SHAP value by the horizontal coordinate of the sample point. The summary plot of the causal model contains only environmental factors, including water level, temperature and time effect. The differences between the summary plots of the non-causal model and SCMGBDT offer insight into their relative advantages in anomaly detection. Figures 6 and 7 show that when measurement points in the same section are used in estimation, the non-causal model's estimates depend mostly on the correlated features, whereas SCMGBDT still takes the environmental factors as its main estimation basis.

Figure 6.
Figure 6. SHAP summary plot of (a) causal model, (b) non-causal model and (c) SCMGBDT of seepage measurement point P1. SHAP stands for SHapley Additive exPlanations, SCMGBDT stands for Structural Causal Model of Gradient Boosting Decision Tree.

Figure 7 .
Figure 7. SHAP summary plot of (a) causal model, (b) non-causal model and (c) SCMGBDT of displacement measurement point ES1. SHAP stands for SHapley Additive exPlanations, SCMGBDT stands for Structural Causal Model of Gradient Boosting Decision Tree.

Figure 8.
Figure 8. Anomaly detection result of seepage (water pressure) of different models. SCMGBDT stands for Structural Causal Model of Gradient Boosting Decision Tree.

Figure 9.

Algorithm 2. GBDT(C, L, M, λ): Train a GBDT model.
Input: training data collection C with input features X and labels y; loss function L(y, F(x)); max iteration number M; learning rate λ
Output: regression function F_M(x)
Initialise the regression function F_0(x) as the average value of y in C
For m = 1 to M:
    Calculate the negative gradient of the loss at F_{m-1}(x): ε_m = -∂L(y, F_{m-1}(x)) / ∂F_{m-1}(x)
    Construct a CART f_m(x) with Algorithm CCT(C_m, m_s, m_d), where C_m = {(x_i, ε_{m,i})}, i = 1, 2, …, |C|
    Update F_m(x) = F_{m-1}(x) + λ f_m(x)
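The gradient boosting loop of Algorithm 2 can be rendered as runnable code. This is a sketch under simplifying assumptions: the loss is fixed to squared error (so the negative gradient ε_m is simply the residual y - F_{m-1}(x)), and the CART-construction step CCT(C_m, m_s, m_d) is stood in for by scikit-learn's `DecisionTreeRegressor`, with m_d playing the role of `max_depth`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_train(X, y, M=100, lam=0.1, max_depth=3):
    F0 = y.mean()                        # F_0(x): average of y in C
    trees = []
    pred = np.full_like(y, F0, dtype=float)
    for _ in range(M):
        eps = y - pred                   # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     random_state=0).fit(X, eps)
        trees.append(tree)
        pred += lam * tree.predict(X)    # F_m = F_{m-1} + λ f_m
    return F0, trees

def gbdt_predict(F0, trees, X, lam=0.1):
    return F0 + lam * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
F0, trees = gbdt_train(X, y)
print(np.abs(gbdt_predict(F0, trees, X) - y).mean())  # training MAE
```

The learning rate λ and the shallow tree depth are the regularisation levers referred to in the conclusions: smaller λ and shallower trees trade fitting speed for smoother, less overfitted estimates.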

Table 1 .
Mean absolute error of different GBDT models.

Table 2 .
Anomaly detection accuracy of different models in different scenarios.