Modelling and predicting landslide displacements and uncertainties by multiple machine-learning algorithms: application to Baishuihe landslide in Three Gorges Reservoir, China

Abstract Time series landslide displacement is the most critical data set for understanding landslide characteristics and inferring their future development. To predict landslide displacements and their quantitative uncertainties, a mathematical description of the landslide evolution must be established. This paper proposes a novel hybrid machine-learning model to predict landslide displacements and quantify their uncertainties using prediction intervals (PIs). First, wavelet de-noising and Hodrick-Prescott (HP) filters are applied to decompose the original landslide displacement into periodic, trend, and noise components. Second, a module built on the framework of bootstrap and extreme learning machine (ELM) with a hybrid grey wolf optimizer (HGWO) is used to derive a formula for modelling the periodic component of the landslide motion. Another formula for predicting the trend component of the displacement is derived by double exponential smoothing (DES). Grey relational analysis and kernel principal component analysis (KPCA) are used to select the main factors controlling the landslide motions. Finally, the two constructed formulas are used to generate the predictions of landslide displacements and the PIs. Validation and comparison experiments have been carried out on the Baishuihe landslide in the Three Gorges Reservoir of China. The results demonstrate that the proposed method achieves better performance, with higher-quality PIs, than other state-of-the-art methods.


Introduction
Landslides are a prevalent and recurrent geological phenomenon worldwide. They pose serious threats to local communities and can disrupt roads, tunnels, bridges, farmland, and the environment (Huang 2009). In the Three Gorges Reservoir (TGR) of China, over 4200 landslides are distributed throughout the region, and the majority of them present characteristics of multiple triggers and reactivations. Accurate modelling and prediction of landslide displacements are therefore essential for the prevention of landslide hazards.
Numerous approaches have been developed for modelling landslides, broadly divided into physics-based and statistics-based models. Physics-based models are generally built on creep theory describing the constitutive relationship of rock and soil (Miao et al. 2018); these models require a wide range of field observations and laboratory experiments to determine the physical and mechanical parameters, which limits their wide application. Statistical models, in comparison, predict landslide displacement by developing the response relationship between displacement and its associated causal factors through regression analyses of historical displacements and a variety of statistical and machine-learning methods, without requiring the determination of physical parameters. Consequently, statistical models are generally easier to implement and have gained increasing popularity.
In previous literature, statistical models have been vigorously discussed and widely used in forecasting landslide displacement. Representative models include the grey system model (Deng 1982), the Verhulst model (Tianbin and Mingdong 1996), multivariate regression models (Jibson 2007), the autoregressive integrated moving average (ARIMA) (Carla et al. 2016), and others. Over the past decades, considerable progress has been made since the introduction of machine learning (ML) methods into landslide research. ML was first used directly as a landslide prediction model but produced unsatisfactory outputs, mostly because of suboptimal parameter settings, complicated impact factors, and insufficient observations. Optimized ML models with hybrid parameters were therefore developed.
In recent years, ML models such as artificial neural networks (ANN) (Lian et al. 2015; Ma et al. 2018; Guo et al. 2019), support vector machines (SVM) (Pradhan 2013; Zhou et al. 2016), and various kinds of optimized coupling models have gradually become the mainstream as a more powerful approach in landslide research. Generally, SVM performs better than the ANN, but it has some defects of its own, e.g. difficulty in parameter selection. The extreme learning machine (ELM) has overcome the challenge of parameter initialization and has advantages in global minimum optimization and strong generalization; it has therefore been tested successfully in many other fields with promising prediction results (Miche et al. 2015; Cao et al. 2016; Tramer et al. 2017). However, as a single-hidden-layer feed-forward neural network, ELM may introduce additional prediction errors if trained with a small data set. To achieve better performance, ensemble-based bootstrap sampling and hybrid parameter optimization methods are introduced in this study to ensure the optimal performance of the ELM.
Optimization algorithms can be divided into deterministic and stochastic algorithms (Mirjalili 2015); the latter are capable of avoiding local solutions owing to their randomness. Among the stochastic algorithms, the grey wolf optimizer (GWO) proposed by Mirjalili et al. (2014) is a biologically inspired intelligence algorithm based on the hunting behaviour of grey wolves; it yields better-quality solutions than older metaheuristics (Faris et al. 2018; Teng et al. 2019) such as the genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) (Mirjalili et al. 2014; Gao and Zhao 2019; Wang and Li 2019). However, as a swarm intelligence algorithm, GWO is still prone to stagnation when convergence slows during long sequential execution (Zhu et al. 2015). Therefore, a meta-heuristic search method named differential evolution (DE) is introduced to further accelerate convergence and increase the optimization accuracy of GWO; the resulting method is called the hybrid GWO (HGWO). The HGWO updates the previous best position of each grey wolf, helping the GWO escape stagnation through DE's strong searching ability.
Besides model building, the selection of input variables also has a large impact on prediction accuracy in machine learning. Recent studies have begun to analyse the role of impact factors in landslide evolution during ML modelling (Miao et al. 2018; Zhou et al. 2018; Zhu et al. 2018); however, only a few have established the response relationship between causal factors and landslide deformation. Consequently, the physical meanings of the various components of the landslide displacement time series remain unclear, so the associated causal factors cannot be effectively linked with the corresponding components. Moreover, previous research on landslide displacement forecasting has mainly focused on deterministic prediction (H. Zhou et al. 2018), which provides only one fixed forecast value at a given time point for each target. Although the forecasting error at each point can be estimated, point prediction gives limited consideration to the stochastic behaviour of the landslide system and therefore cannot efficiently represent the dynamic uncertainty of landslides.
To solve these problems, this paper proposes a new composite model integrating multiple ML and statistical models to produce unbiased, reliable forecasts of landslide displacement. The model combines multiple algorithms to optimize the ML models and quantifies the uncertainty using prediction intervals (PIs), with the aim of establishing a more accurate response relationship between causal factors and landslide deformation. The landslide displacement time series consists of a long-term trend dominated by internal geological conditions, a short-term periodic fluctuation affected by external triggering factors, and a random noise component. The time series displacements are first de-noised and decomposed. Then, a novel composite framework involving bootstrap resampling, double exponential smoothing (DES), and an ELM optimized by the hybrid grey wolf optimizer (HGWO) is built to obtain formulas for predicting landslide displacements. Experiments on the Baishuihe landslide in the TGR from 2003 to 2013 confirm the effectiveness and utility of the proposed model for long-term practical monitoring and forecasting of landslide displacements.

Time series analysis of landslide displacements
Generally, the dynamic movement of a landslide is subject to internal geological conditions and external environmental factors. The displacement dominated by geological conditions (e.g. topography, geotechnical properties) is generally found to be approximately monotonic over time, indicating the long-term trend. The displacement affected by external triggering factors (e.g. rainfall and reservoir water variation) can be expressed as an approximately periodic function, leading to different deformation features. The displacement pattern of landslides in the TGR of China is normally controlled by the joint effects of geological conditions and environmental factors. The accumulated displacement can be decomposed as follows:
D_t = T_t + P_t + n_t (1)
where D_t is the original accumulated displacement, T_t stands for the trend displacement, P_t is the periodic displacement, and n_t represents the random noise of the displacement.
In this study, a wavelet-based denoising method, which preserves important signals while removing noise by diagnosing features of the data at different frequencies, is applied to remove the random noise in the original displacement. Then, the de-noised displacement time series is divided into the periodic and trend components by the Hodrick-Prescott (HP) filter. The HP filter has been widely used in macroeconomics owing to its strength in deriving a smoothed-curve expression of a time series. Previous studies have shown that the HP filter is more sensitive to the long-term trend evolution than to short-term fluctuations (Ravn and Uhlig 2002). The sensitivity of the long-term trend to short-term variation can be adjusted through a smoothing parameter k. The trend component of a time series is obtained by solving
min_{c_t} { Σ_{t=1}^{T} (y_t − c_t)^2 + k Σ_{t=2}^{T−1} [(c_{t+1} − c_t) − (c_t − c_{t−1})]^2 } (2)
where y_t (t = 1, 2, …, T) denotes the logarithm of the data time series; c_t represents the trend component; and k is the smoothing parameter, controlling the smoothness of the trend component. The objective of Equation (2) is to obtain a smoothed trend component by minimizing both 1) the misfit between the observations and the trend component (the first term in Equation (2)) and 2) the second-order difference of the trend component (the second term in Equation (2)). Thus, if k = 0, no smoothing takes place; the larger the value of k, the smoother the result. As k approaches infinity, the trend component becomes linear. In principle, the appropriate value of k also depends on the periodicity of the original data: k = 100 is suggested for yearly data and k = 14400 for monthly data (Zhu et al. 2018). In this study, the landslide displacement exhibits a sharp acceleration once a year; consequently, the smoothing parameter k = 100 is assigned.
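The HP decomposition in Equation (2) has a closed-form solution, c = (I + k DᵀD)⁻¹ y, where D is the second-difference operator. The sketch below, a minimal illustration not taken from the original study, applies it to a synthetic noisy ramp:

```python
import numpy as np

def hp_filter(y, lamb=100.0):
    """Split a series into trend and cycle with the Hodrick-Prescott filter.

    Minimizes sum((y - c)^2) + lamb * sum(second differences of c)^2,
    whose closed-form solution is c = (I + lamb * D'D)^-1 y,
    where D is the (T-2) x T second-difference operator.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t], D[t, t + 1], D[t, t + 2] = 1.0, -2.0, 1.0
    trend = np.linalg.solve(np.eye(T) + lamb * D.T @ D, y)
    cycle = y - trend
    return trend, cycle

# Toy example: linear trend plus a short-period oscillation.
t = np.arange(50, dtype=float)
y = 2.0 * t + np.sin(t)
trend, cycle = hp_filter(y, lamb=100.0)
```

By construction the two components sum exactly back to the input series, mirroring the additive decomposition of Equation (1) once noise has been removed.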

Analysis of model uncertainty
In the landslide displacement prediction model, the predicted displacement can be expressed as follows:
F_i = T(x_Ti) + P(x_Pi) + e (3)
where F_i is the ith observed target value, T(x_Ti) and P(x_Pi) represent the true trend and periodic regression outputs of a forecasting model, x_i is the vector of inputs, and e is the random error, assumed to be normally distributed with mean zero and variance σ_e^2. In practice, the goal of regression is to produce estimates T̂(x_Ti) and P̂(x_Pi) of the true values. Given that all terms in Equation (3) have associated sources of uncertainty, and assuming they are independent, the total variance of prediction is given by
σ_i^2 = σ_Tmodel^2 + σ_Pmodel^2 + σ_e^2 (4)
where σ_Tmodel^2 and σ_Pmodel^2 stand for the model uncertainties of the forecast trend and periodic displacements, respectively, and σ_e^2 is the random-noise variance. A diverse set of approaches has been developed to quantify model uncertainty, including the Delta method (Chryssolouris et al. 1996), mean-variance estimation (MVE) (Nix and Weigend 1994), Bayesian methods (Dybowski and Roberts 2011), lower upper bound estimation (LUBE) (Khosravi et al. 2011b), and the bootstrap (Khosravi, Nahavandi, Creighton, and Srinivasan 2011). Bootstrapping is a statistics-based technique that resamples the original time series, allowing estimation of the sampling distribution of almost any statistic (Wan et al. 2014). Moreover, this method has good compatibility with other ML training methods (Pearce et al. 2018); it requires no complex matrices or derivatives during calculation, ensuring efficient implementation.
Model uncertainty can be attributed to several factors, including model bias, training-data variance, and parameter uncertainty (Pearce et al. 2018). Model bias arises in two main forms: first, how comprehensive the employed factors affecting typical landslide systems are; second, how representative the model is of the landslide process. Accordingly, to mitigate the bias, grey relational analysis (GRA) and kernel principal component analysis (KPCA) are coupled to analyse the possible triggering factors and determine the potential contribution of each component affecting landslide displacement. An optimized ELM model is then applied to approach the global minimum by minimizing the output errors; consequently, the model misspecification can be assumed to be negligible. Parameter uncertainty results from sub-optimal parameters being assigned to the forecasting model. Moreover, ML algorithms are sensitive to parameter settings; different weighting parameters lead to different outputs even with the same inputs. Therefore, an ensemble of ELMs trained with different parameter initializations (parameter resampling) and an HGWO parameter optimization method are proposed to derive the optimum parameters. Training-data uncertainty is addressed by bootstrapping. The resulting ensemble of ELMs contains a certain level of diversity, which can be used to estimate the model uncertainty and construct the PIs at a given confidence level.
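The bootstrap treatment of training-data uncertainty described above can be sketched as follows. For illustration only, a simple line-fitting learner stands in for the optimized ELM; the function names are hypothetical, not from the original study:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ensemble(x, y, fit, B=20):
    """Train B models on bootstrap resamples of the training pairs.

    `fit` is any routine mapping (x, y) to a prediction function; the
    spread of the resulting ensemble reflects training-data uncertainty.
    """
    n = len(x)
    models = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # sample n pairs with replacement
        models.append(fit(x[idx], y[idx]))
    return models

# Stand-in learner: a degree-1 polynomial fit.
def fit_line(x, y):
    coef = np.polyfit(x, y, 1)
    return lambda q: np.polyval(coef, q)

x = np.linspace(0, 10, 40)
y = 3.0 * x + 2.0 + rng.standard_normal(40)   # noisy linear target
models = bootstrap_ensemble(x, y, fit_line, B=20)
preds = np.array([m(x) for m in models])      # shape (B, N): ensemble spread
```

The row-wise spread of `preds` is the raw material for the model-uncertainty term in Equation (4).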

Forecast model and evaluations
In this study, an integrated model based on the framework of bootstrap and ELM optimized by the HGWO is used to obtain the prediction formula for the periodic component, while the trend prediction formula is calculated with the aid of DES. The corresponding predictions are summed to obtain the forecast cumulative displacements and the prediction intervals (PIs). The main processes are illustrated in Figure 1.

Double exponential smoothing
The DES is designed to process time series exhibiting a certain trend (Holt 2004). The approach applies a weighted combination of past observations, with recent observations given relatively higher weight than older ones (Zhu et al. 2018), and updates the slope component through exponential smoothing. In this study, DES is employed to predict the trend displacement of a landslide with an obvious pattern.
Given an observed time series {x_i}, beginning at time i = 0, let {s_i} denote the smoothed value of the DES at time i and {b_i} the best estimate of the trend displacement at time i. The output of DES is written F_{i+m}, an estimate of x at time i + m (m ≥ 1, m ∈ N), based on the data available before time i. DES is then performed with the following formulas:
s_i = f x_i + (1 − f)(s_{i−1} + b_{i−1}) (5)
b_i = n (s_i − s_{i−1}) + (1 − n) b_{i−1} (6)
F_{i+m} = s_i + m b_i (7)
where i = 1, 2, 3, …; s_1 = x_1 and b_1 = x_1 − x_0; f is the data smoothing factor with 0 < f < 1; and n is the trend smoothing factor with 0 < n < 1.
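The recursions above can be sketched in a few lines; this is a minimal illustration of Holt's double exponential smoothing, with the function name and toy series chosen for the example:

```python
def des_forecast(x, f=0.99, n=0.98, m=1):
    """Double exponential smoothing (Holt's method).

    x : observed series (x[0] is used only to seed the slope b_1).
    f : data smoothing factor, 0 < f < 1.
    n : trend smoothing factor, 0 < n < 1.
    m : forecast horizon; returns the m-step-ahead forecast F_{i+m}.
    """
    s, b = x[1], x[1] - x[0]          # s_1 = x_1, b_1 = x_1 - x_0
    for xi in x[2:]:
        s_prev = s
        s = f * xi + (1 - f) * (s + b)       # level update
        b = n * (s - s_prev) + (1 - n) * b   # slope update
    return s + m * b

# A perfectly linear series is forecast exactly.
x = [10.0, 12.0, 14.0, 16.0, 18.0]
forecast = des_forecast(x, m=1)   # -> 20.0 for this linear input
```

On a purely linear series the level tracks the data and the slope stays constant, so the one-step forecast continues the line exactly; this is why DES suits the near-monotonic trend component.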

Extreme learning machine
ELM is a training method for single-hidden-layer feed-forward neural networks. It has strong generalization ability and can learn thousands of times faster than networks trained using backpropagation (Huang et al. 2006). In this algorithm, the initial parameters, including the weight matrix connecting the input to the hidden layer and the thresholds of the hidden-layer neurons, are randomly assigned. Once the number of hidden neurons is set, the output weights of the hidden layer and the global optimal solution can be obtained during training. Given a set of training data (x_i, t_i) ∈ R^n × R^m (i = 1, 2, …, N), a standard ELM with Ñ hidden neurons and activation function f(x) can be written as follows:
Σ_{i=1}^{Ñ} β_i f(a_i · x_j + b_i) = o_j, j = 1, 2, …, N (8)
where a_i = [a_i1, a_i2, …, a_in]^T is the weight vector connecting the inputs to the ith hidden neuron, b_i is the threshold of the ith hidden neuron, and β_i = [β_i1, β_i2, …, β_im]^T is the weight vector connecting the ith hidden neuron to the output neurons. The above equations can be written compactly as
Hβ = T (10)
where H is the output matrix of the hidden layer (with rows f(a_1 · x_j + b_1), …, f(a_Ñ · x_j + b_Ñ) for j = 1, …, N), β is the output weight matrix, and T is the target matrix. The ELM training process is equivalent to finding the least-squares solution β̂ of Equation (10), i.e. minimizing ||Hβ − T||. The smallest-norm least-squares solution is β̂ = H†T, where H† is the Moore-Penrose generalized inverse of the matrix H.
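A basic ELM is short enough to write out directly; the sketch below (illustrative names, a toy sin-regression task) follows Equations (8) and (10) with a tanh activation and the pseudo-inverse solution:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=20):
    """Train a basic ELM: random input weights a and biases b are fixed,
    and only the output weights beta are solved, beta = pinv(H) @ T."""
    a = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ a + b)           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T     # smallest-norm least-squares solution
    return a, b, beta

def elm_predict(X, a, b, beta):
    return np.tanh(X @ a + b) @ beta

# Toy 1-D regression: y = sin(x).
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
a, b, beta = elm_train(X, T, n_hidden=30)
pred = elm_predict(X, a, b, beta)
```

Because only `beta` is learned, and in closed form, there is no iterative backpropagation; this is the speed advantage the text refers to, at the cost of sensitivity to the random `a` and `b`, which motivates the ensemble and HGWO optimization.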

Hybrid grey wolf optimizer
Grey wolf optimizer (GWO) is a recent meta-heuristic algorithm inspired by the behaviour of grey wolves (Mirjalili et al. 2014). The algorithm embeds biological evolution and nature's 'survival of the fittest' (SOF) principle, which leads to favourable convergence. However, the GWO is prone to stagnation when convergence slows during long sequential execution (Zhu et al. 2015). Thus, a meta-heuristic search method, the differential evolution (DE) algorithm, is introduced to help the GWO escape stagnation; this hybrid GWO is named HGWO.
Assume the population size is N. The ith wolf in a d-dimensional search space can be written as
X_i = (X_i^1, X_i^2, …, X_i^d) (11)
and the position of the pth (p = 1, 2, …, d) component of the ith individual is initialized as
X_i^p = X_p(low) + rand(0, 1) · (X_p(up) − X_p(low)) (12)
where X_p(low) and X_p(up) denote the lower and upper boundaries of individual wolves in the search landscape, respectively, and rand(0, 1) represents a random number in [0, 1]. The encircling behaviour of grey wolves can be modelled as follows:
D = |C · X_p(t) − X(t)| (13)
X(t + 1) = X_p(t) − A · D (14)
where t represents the current iteration, A and C are coefficient vectors, X indicates the position vector of a grey wolf, and X_p stands for the position vector of the prey. The coefficient vectors are A = 2a · r_1 − a and C = 2 · r_2, where r_1 and r_2 are random vectors in the range [0, 1] and a is the control coefficient, which decreases linearly from 2 to 0 over the iterations. The HGWO relocates a solution around another in the n-dimensional search space to simulate the chasing and encircling of prey by grey wolves in nature. During the simulation, the grey wolves first search for and track the prey and then encircle it. In reality, however, the optimal prey position is unknown. To mimic the wolves' internal leadership hierarchy, the wolves are divided into four types: alpha (α), beta (β), delta (δ), and omega (ω). The objective is to find the minimum in the search landscape; thus, the HGWO assumes that the positions of α, β, and δ are likely to be near the prey (optimum) position. In the iterative search, the best three individuals obtained are recorded as α, β, and δ, and the other wolves, denoted ω, relocate their positions according to the locations of α, β, and δ. Figure 2 shows how a search agent updates its position according to α, β, and δ in a 2D search space; the final position falls in a random place within a circle defined by the positions of α, β, and δ.
In other words, α, β, and δ estimate the position of the prey, and the other wolves update their positions randomly around it. The following formulas are used to update the position of an ω wolf:
D_α = |C_1 · X_α − X|, D_β = |C_2 · X_β − X|, D_δ = |C_3 · X_δ − X| (15)-(17)
X_1 = X_α − A_1 · D_α, X_2 = X_β − A_2 · D_β, X_3 = X_δ − A_3 · D_δ (18)
X(t + 1) = (X_1 + X_2 + X_3) / 3 (19)
where X_α, X_β, and X_δ represent the positions of wolves α, β, and δ, respectively, and C_1, C_2, and C_3 are randomly generated vectors. Equations (15)-(17) calculate the distances between the position of the current individual and those of α, β, and δ, and the final location of the current individual is obtained by averaging. DE, first proposed by Storn and Price (1997), is a stochastic algorithm for global optimization. It starts with a randomly generated population, whose next generation is produced through mutation, crossover, and selection operations. DE adopts a novel strategy to produce the variation of individuals: first, randomly select three different individuals; then scale the difference vector of two of them; finally, combine the scaled vector with the third individual to achieve the mutation operation and maintain the diversity of the population:
v_i^{g+1} = x_{r1}^g + F · (x_{r2}^g − x_{r3}^g)
u_{i,j}^{g+1} = v_{i,j}^{g+1} if rand(0, 1) ≤ CR or j = j_rand, otherwise x_{i,j}^g
where r_1 ≠ r_2 ≠ r_3 ≠ i; g is the iteration number; F is the scaling factor; CR represents the crossover probability; j_rand denotes a random integer between 1 and d; and d is the dimension of the solutions (individuals). In DE, a greedy strategy is adopted to select individuals for the next generation using Equation (20):
x_i^{g+1} = u_i^{g+1} if f(u_i^{g+1}) < f(x_i^g), otherwise x_i^g (20)
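The combined update can be sketched as a minimal HGWO-style optimizer: a GWO position update guided by the three leaders, followed by a DE mutation/crossover/greedy-selection pass each iteration. This is an illustrative simplification under assumed defaults (F = 0.5, CR = 0.9, boundary clipping), not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def hgwo_minimize(f, dim=5, n_wolves=20, iters=200, low=-5.0, up=5.0,
                  F=0.5, CR=0.9):
    """Sketch of an HGWO loop: GWO leader-guided moves plus a DE pass."""
    X = rng.uniform(low, up, (n_wolves, dim))
    fit = np.array([f(x) for x in X])
    gbest, gfit = X[fit.argmin()].copy(), fit.min()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                 # control coefficient: 2 -> 0
        leaders = X[np.argsort(fit)[:3]].copy()   # alpha, beta, delta
        for i in range(n_wolves):                 # GWO update, Eqs (15)-(19)
            guided = np.zeros(dim)
            for L in leaders:
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                guided += L - A * np.abs(C * L - X[i])
            X[i] = np.clip(guided / 3.0, low, up)
            fit[i] = f(X[i])
        for i in range(n_wolves):                 # DE pass, Eq (20)
            r = rng.choice([k for k in range(n_wolves) if k != i], 3,
                           replace=False)
            v = X[r[0]] + F * (X[r[1]] - X[r[2]])           # mutation
            jr = rng.integers(dim)
            mask = (rng.random(dim) <= CR) | (np.arange(dim) == jr)
            u = np.clip(np.where(mask, v, X[i]), low, up)   # crossover
            fu = f(u)
            if fu < fit[i]:                                 # greedy selection
                X[i], fit[i] = u, fu
        if fit.min() < gfit:
            gbest, gfit = X[fit.argmin()].copy(), fit.min()
    return gbest, gfit

# Toy check: minimize the sphere function sum(x^2).
best_x, best_f = hgwo_minimize(lambda x: float(np.sum(x ** 2)))
```

The greedy DE pass never accepts a worse individual, which is how the hybrid guards against the stagnation of plain GWO described above.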

Prediction intervals (PIs) and evaluation indices
PIs are common tools used to quantify the uncertainty of prediction models (Mazloumi et al. 2011). Before elaborating on PIs, it is worth distinguishing confidence intervals (CIs) from PIs, because they are not the same and are often confused. A CI is an interval estimate computed from the statistics of the observed data, e.g. for a population mean. CIs consider the distribution Pr(f̂(x) | f(x)) in Equation (3) and hence require estimation only of the model variance. A PI is an interval estimate within which a future observation will fall, with a certain probability, given what has already been observed. PIs consider Pr(F | f̂(x)) in Equation (3) and must also account for σ_e^2 (Dybowski and Roberts 2011; Khosravi et al. 2011a). Therefore, PIs are necessarily wider than CIs.
For a landslide process model in ensemble form, given a set of pairs {(x_i, F_i)}, i = 1, …, N, where x_i represents the model inputs related to the influencing factors of the landslide displacement and F_i denotes the output associated with the displacement prediction, a PI constructed by the bootstrap with a confidence level of (1 − a) × 100% can be expressed as follows:
L̂(F_i) = F̂_i − z_{1−a/2} · σ_i, Û(F_i) = F̂_i + z_{1−a/2} · σ_i (21)
where L̂(F_i) and Û(F_i) are the lower and upper bounds of the ith PI, respectively, F̂_i is the ensemble point forecast, σ_i^2 is the total prediction variance (the sum of the model and noise variances), and z_{1−a/2} is the quantile of the standard normal distribution.
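Equation (21) can be sketched directly from an ensemble of bootstrapped predictions; the function name and toy ensemble below are illustrative assumptions:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_pi(ensemble_preds, noise_var=0.0, alpha=0.05):
    """Construct PIs from B bootstrap-ensemble predictions.

    ensemble_preds : array of shape (B, N), one row per bootstrapped model.
    noise_var      : estimate of the random-noise variance sigma_e^2.
    Returns (lower, upper) bound arrays of shape (N,), per Equation (21).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{1 - alpha/2}
    mean = ensemble_preds.mean(axis=0)             # ensemble point forecast
    model_var = ensemble_preds.var(axis=0, ddof=1) # model uncertainty
    half = z * np.sqrt(model_var + noise_var)
    return mean - half, mean + half

# Toy ensemble: 30 noisy copies of a common signal.
rng = np.random.default_rng(0)
preds = np.sin(np.linspace(0, 3, 50)) + 0.1 * rng.standard_normal((30, 50))
lo, up = bootstrap_pi(preds, alpha=0.05)
```

The interval half-width grows with the disagreement between ensemble members, so inputs on which the bootstrapped models disagree receive wider, more honest intervals.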
In this study, based on the bootstrap framework, we use an HGWO-optimized ELM model to construct the PIs of the landslide displacements. Figure 1 gives a schematic of the model used in the bootstrap method to generate PIs. PIs not only provide intervals where targets are highly likely to occur but also indicate the coverage probabilities. From the perspective of prediction, the constructed PIs should be as narrow as possible whilst capturing a specified portion of the data. Here, two performance indices, the PI coverage probability (PICP) (Dybowski and Roberts 2011; Khosravi et al. 2011a) and the PI normalized root-mean-square width (PINRW) (Lian et al. 2016), are used to assess the performance of a PI:
PICP = (1/N) Σ_{i=1}^{N} c_i, with c_i = 1 if F_i ∈ [L̂(F_i), Û(F_i)] and c_i = 0 otherwise (22)
PINRW = (1/R) sqrt( (1/N) Σ_{i=1}^{N} (Û(F_i) − L̂(F_i))^2 ) (23)
where N is the number of samples and R equals the maximum minus the minimum of the target values. By definition, PICP ranges from 0 to 1. However, a perfect PICP (100% coverage) with an extremely wide PINRW is meaningless for decision-makers; thus, large values of PICP combined with small values of PINRW indicate high-quality PIs. Given a target confidence level for the PICP, the objective is to find the narrowest PINRW. Therefore, a combined index that balances the PICP and PINRW is required to provide a comprehensive assessment of PIs. Here, we use a criterion named the coverage width-based criterion (CWC), based on a Gaussian function, as the comprehensive index (Lian et al. 2016):
CWC = PINRW + j · w · exp(−(PICP − l)^2 / d) (24)
where w denotes a small positive value ranging from 0.1% to 0.5%, and l and d are two hyperparameters whose values are set before the learning process. Generally, l is set to 1 − a and d to a small positive value less than 1; for the testing stage, j is a step function of the PICP. The CWC seeks a trade-off between the informativeness (PINRW) and validity (PICP) of PIs: the smaller the value of the CWC, the higher the quality of the PIs.
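The two basic indices can be computed directly from a set of intervals; the toy data below are illustrative:

```python
import numpy as np

def pi_metrics(y, lower, upper):
    """PICP and PINRW for a set of prediction intervals.

    PICP  : fraction of targets falling inside their intervals.
    PINRW : root-mean-square interval width, normalized by the target range R.
    """
    y, lower, upper = map(np.asarray, (y, lower, upper))
    covered = (y >= lower) & (y <= upper)
    picp = covered.mean()
    R = y.max() - y.min()                              # target range
    pinrw = np.sqrt(np.mean((upper - lower) ** 2)) / R
    return picp, pinrw

# Four targets, each covered by an interval of constant width 1.0; the
# target range is 3.0, so PICP = 1.0 and PINRW = 1/3.
y = np.array([1.0, 2.0, 3.0, 4.0])
picp, pinrw = pi_metrics(y, y - 0.5, y + 0.5)
```

With these two numbers in hand, any of the combined criteria (such as the CWC described above) reduces to a scalar penalty on their trade-off.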

Study area and monitoring data set
The Baishuihe landslide sits in the town of Zigui, Hubei Province, on the south bank of the Yangtze River, into which it extends. The landslide is about 56 km from the Three Gorges Dam, China (Figure 3(a)). A typical recurrent reservoir landslide, the Baishuihe landslide is fan-shaped (Figure 3). The landslide materials are mainly Quaternary deposits, including silty clay and fragmented rubble with a loose, disorderly structure. The underlying bedrock is composed of the Jurassic Xiangxi Formation, containing silty mudstone and sandy muddy siltstone, overlain by the Quaternary deposits. As shown in Figure 4, the elevation of the current slide ranges from 75 to 390 m, with a relatively flat central part and larger gradients in the upper and lower parts of the landslide. As suggested by the field investigation and monitoring data, the landslide can be divided into a relatively stable block (block-B) and an active block (block-A) (Figures 3 and 4).
As a reactivated landslide, the Baishuihe landslide was first triggered by the initial reservoir impoundment in June 2003. Since then, several cracks have been found on the landslide (Figure 3(c)). To monitor its movement, 11 professional GPS stations and three borehole inclinometers were deployed on-site (Figures 3 and 4), with synchronized measurements once a month. In addition, the reservoir water level has been collected by an on-site water-level gauge, and the precipitation has been obtained from a monitoring site installed 9.5 km from the landslide. The monitoring data are presented in Figure 5.
As shown in Figure 5, the displacement of the Baishuihe landslide exhibited an obvious monotonic trend over time as well as seasonal patterns, increasing from April to September each year and remaining stable from October to April of the following year. Thus, the period of the seasonal characteristic is approximately one year, reflecting the joint effect of heavy seasonal precipitation and the fluctuation of the reservoir water level. It can also be inferred from Figure 5 that the combined effects of the triggering factors on the landslide displacements show significant time lags. Among the displacement monitoring stations, ZG93 and ZG118, deployed on the active block-A of the slope, have complete records representative of the dynamic behaviour of the landslide and are selected for the following analysis in this study.

Time series analysis
The raw data set is first processed by outlier removal and missing-value imputation. Then, wavelet-based de-noising is applied to remove random noise from the time series displacements. As shown in Figure 5, the landslide displacement pattern comprises a high-frequency seasonal displacement dominated by external triggering factors and a low-frequency monotonic displacement controlled by geological conditions. Thus, the time series displacements are decomposed so that a response relationship can be established separately for each component; the result is shown in Figure 6.
As mentioned, the reservoir water level and precipitation follow seasonal cycles that generate the seasonal displacement pattern of the landslide. Here, the variations of the precipitation, the reservoir water levels, and the periodic displacements, illustrated in Figure 7, are used to analyse the potential response relations. As shown in Figure 7, the two major triggering factors exhibit significant seasonal patterns in concordance with the periodic displacements, and obvious time lags between the factors and the instant displacement are observable.
To evaluate the complex nonlinear relations, GRA is first used to calculate the significance between two different data sequences. Factors with a grey relational degree (GRD) greater than 0.6 are considered to have a significant impact on the instant displacements. Consequently, six indicators are selected as the initial influencing factors of the landslide displacement: the monthly average reservoir water level, the monthly cumulative rainfall, the cumulative rainfall over the previous two months, the reservoir water level fluctuation per month, the reservoir water level fluctuation per two months, and the reservoir water level change rate per month.
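The GRD screening can be sketched as follows. The distinguishing coefficient rho = 0.5 is a common default assumed here (it is not stated in the text), and the series are min-max normalized before comparison; the toy numbers are illustrative:

```python
import numpy as np

def grey_relational_degree(reference, factor, rho=0.5):
    """Grey relational degree between a reference series (e.g. periodic
    displacement) and one candidate influencing factor."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min())
    x0, xi = norm(reference), norm(factor)
    delta = np.abs(x0 - xi)                     # pointwise deviation
    # Grey relational coefficient at each point, then its mean.
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return coeff.mean()

# Toy example: a displacement series against a co-varying candidate factor.
disp = [3.0, 5.0, 9.0, 12.0, 15.0]
grd = grey_relational_degree(disp, [30.0, 52.0, 88.0, 121.0, 150.0])
```

In the screening step described above, each of the candidate factors would be scored this way against the displacement series, and those with GRD > 0.6 retained.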
After the correlation assessment using GRA, Gaussian KPCA is adopted to select efficient features (without redundancy) for landslide displacement modelling and forecasting, through analysing the six influencing factors and the time series displacements. KPCA, performed in a reproducing kernel Hilbert space, involves multivariate analyses to capture complicated nonlinear relationships between variables and their relevance to the problem being studied. The principal components of KPCA, obtained by converting a set of possibly correlated variables into uncorrelated orthogonal components, realize dimensionality reduction effectively. Generally, the first few components contain most of the signal with a high signal-to-noise ratio, while the later components are dominated mainly by noise and can be discarded without great loss. As shown in Table 1, the first three principal components account for about 90% of the total variance contribution, with the first component contributing the most (about 45%), followed by about 30% from the second component. Therefore, these three components are used as input influencing factors for model training and forecasting.
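The Gaussian KPCA step can be sketched from scratch; the kernel width gamma, the random factor matrix, and the function name below are illustrative assumptions, not values from the study:

```python
import numpy as np

def gaussian_kpca(X, gamma=1.0, n_components=3):
    """Gaussian-kernel PCA: returns the projected data and the fraction of
    kernel-space variance carried by each retained component."""
    # Pairwise squared distances -> RBF kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Centre the kernel matrix in feature space.
    N = len(X)
    one = np.ones((N, N)) / N
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecomposition, eigenvalues sorted in descending order.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1]
    vals, vecs = vals[idx], vecs[:, idx]
    ratios = vals[:n_components] / vals[vals > 0].sum()
    # Projections onto the leading components (scaled eigenvectors).
    proj = vecs[:, :n_components] * np.sqrt(np.abs(vals[:n_components]))
    return proj, ratios

# Toy stand-in for the six candidate influencing factors over 50 months.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
proj, ratios = gaussian_kpca(X, gamma=0.1, n_components=3)
```

The `ratios` array plays the role of the variance-contribution column in Table 1: retaining components until the cumulative ratio reaches roughly 90% yields the three inputs used for training.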

Results
As mentioned, ZG93 and ZG118 on the active block-A have complete records representative of the dynamic behaviour of the landslide and are selected to establish and evaluate the prediction model. Displacements acquired monthly from 2003 to 2013 and the associated daily reservoir water level and precipitation are used in this study. The first 100 groups of the data set (acquired from July 2003 to November 2011) are used as the training data set, and the last 16 groups as the testing data set, to evaluate the performance of the proposed method. During the prediction of the landslide trend displacement, the parameters f and n of the DES are set to 0.99 and 0.98, respectively, to obtain smooth results. As illustrated in Figure 8, the predicted trend displacement is close to the on-site measurements, with an absolute error of less than 3 mm, suggesting that the DES is a promising trend forecasting model.
Before modelling the periodic component of the landslide motion, the principal triggering components and the periodic displacements are normalized into the range [0, 1] to eliminate dimensional effects on the ML models and improve the reliability of the forecasts; the outputs are denormalized after the HGWO-ELM training process. The bootstrap replicate number is set to 20, and the number of hidden-layer neurons of the ELM is set to 12. To further mitigate the effects of model parameter uncertainty, an ensemble of ELMs is trained with different parameter initializations, and the HGWO iterates 100 times to obtain the optimum parameters for each ELM. Then, the regression mean and variance of the bootstrapped HGWO-ELMs are estimated as indicators of the model uncertainty, and the PIs are constructed at a confidence level of (1 − a) × 100% using Equation (21). The predicted periodic displacements with PIs at a nominal confidence level of 95% are shown in Figure 9. The predicted cumulative displacements with PIs are derived by combining the two predicted components, and the constructed PIs with a nominal confidence level of 95% are illustrated in Figure 10. As can be seen, the on-site monitoring displacements, used as either training or testing data, fall within the constructed PIs at the 95% confidence level. Even in periods of faster seasonal variation, the results display satisfactory coverage probability. Thus, the PICP obtained using the proposed method is 100%, demonstrating its predictive ability for landslide displacement. The PINRW values are 0.2116 and 0.2266 for the two on-site stations, respectively. In general, high-quality PIs have a higher PICP and a smaller PINRW; the CWC is used as a trade-off between the two, and the smaller the CWC, the higher the quality of the PIs.
In this study, the CWC values are 0.2126 and 0.2276, respectively, indicating that the proposed model exhibits satisfactory performance in the interval prediction of landslide displacement. Furthermore, the proposed approach accounts for the uncertainties that arise during model training and forecasting and can thus generate reliable PIs.

Comparative analysis
To illustrate the advantage of the proposed method over existing state-of-the-art methods, namely the hybrid DES and PSO-ELM (DES-PSO-ELM) and the hybrid DES and GWO-ELM (DES-GWO-ELM), we compare the results of these models within the same bootstrap framework. Parameters and other inputs associated with the bootstrap and the DES, e.g. the number of bootstrap replicates, are kept identical to facilitate the comparison. The number of hidden-layer nodes and the initial parameters of the ELM are also consistent with those in the proposed method. During the displacement prediction, the cumulative displacements are first de-noised and decomposed; the trend term is then predicted by the DES, while the periodic displacement is predicted separately by the HGWO-ELM (our method), the PSO-ELM, and the GWO-ELM. In this process, three different optimization algorithms, the HGWO, the PSO, and the GWO, are used to optimize the weight matrix connecting the input and hidden layers of the ELM, so the three methods train and predict with different weight matrices.
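The GWO update rule that the compared optimizers build on can be sketched on a toy objective. The paper's HGWO adds a hybrid step that is not reproduced here, so this is plain GWO minimizing a sphere function; the population size, iteration count, and bounds below are illustrative.

```python
import numpy as np

def gwo(fitness, dim, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=1):
    """Minimal grey wolf optimizer: each wolf moves toward the three best
    solutions (alpha, beta, delta), with a coefficient `a` decaying linearly
    from 2 to 0 to shift from exploration to exploitation."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    for t in range(iters):
        scores = np.array([fitness(x) for x in X])
        order = np.argsort(scores)
        leaders = (X[order[0]], X[order[1]], X[order[2]])
        a = 2.0 * (1.0 - t / iters)
        new_X = np.empty_like(X)
        for i in range(n_wolves):
            candidates = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a   # encircling coefficient
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - X[i])
                candidates.append(leader - A * D)
            new_X[i] = np.clip(np.mean(candidates, axis=0), lb, ub)
        X = new_X
    scores = np.array([fitness(x) for x in X])
    return X[np.argmin(scores)]
```

In the paper's setting, each candidate position would encode an ELM input-weight matrix and the fitness would be a training error, rather than the sphere function used here.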
Evaluation indices defined in Section 2.3.4 are used to quantify the performance of the proposed and comparative methods. Table 2 lists the indices of all the studied methods. As can be seen, the PIs generated by the proposed method and the comparison methods display satisfactory coverage probabilities: all the training and testing data lie within the constructed PIs at the 95% confidence level. In landslide prediction, it is preferable that PIs at a high confidence level (e.g. 95%) cover all the training data (i.e. a high PICP); otherwise, it is hard to believe that the forecasting model is well trained, because a data point falling outside the PIs is a low-probability event (5%). Moreover, if several consecutive data points fall outside the PIs, the model performance is even more questionable, because it becomes hard to tell whether the exceedances are caused by prediction error or indicate a new state of motion of the landslide. Although the CWC is a comprehensive index that combines the PICP and the PINRW, for a given PICP the CWC values in Table 2 are mainly controlled by the PINRW: the smaller the PINRW, the smaller the CWC and the higher the quality of the PIs. Results showed that if the DES model were not used in the landslide trend-displacement prediction, the PINRW values would be several times larger, illustrating the advantage of the DES. More importantly, the DES is used in this study because it was designed for time series exhibiting a clear trend and therefore models the trend displacement of a landslide better than alternatives such as polynomial fitting and moving averages.
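The three indices can be computed as follows. Note that CWC formulations vary across the literature; this sketch uses a common multiplicative penalty with assumed hyper-parameters μ = 0.95 and η = 50, which may differ from the exact definitions in Section 2.3.4.

```python
import numpy as np

def pi_metrics(y, lo, hi, mu=0.95, eta=50.0):
    """PI coverage probability (PICP), PI normalized root-mean-square
    width (PINRW), and a coverage-width criterion (CWC)."""
    y, lo, hi = map(np.asarray, (y, lo, hi))
    picp = np.mean((y >= lo) & (y <= hi))
    pinrw = np.sqrt(np.mean((hi - lo) ** 2)) / (y.max() - y.min())
    gamma = 0.0 if picp >= mu else 1.0       # penalize under-coverage only
    cwc = pinrw * (1.0 + gamma * np.exp(-eta * (picp - mu)))
    return picp, pinrw, cwc
```

When the nominal coverage is met (PICP ≥ μ), the penalty vanishes and the CWC reduces to the PINRW, consistent with the observation above that the CWC values in Table 2 are mainly controlled by the PINRW.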
As shown in Table 2, at the monitoring station ZG118, the CWC values of the DES-HGWO-ELM, DES-PSO-ELM, and DES-GWO-ELM methods are 0.2276, 0.4712, and 0.7171, and the corresponding values for ZG93 are 0.2126, 0.4675, and 0.7355, respectively. These indicators suggest that the proposed DES-HGWO-ELM performs best and the DES-GWO-ELM worst among the three. That is, the HGWO provides the best parameter optimization for the ELM compared with the PSO and the GWO, ensuring that the proposed method exhibits satisfactory predictive ability in the interval prediction of landslide displacement. The comparative results are shown in Figure 11.
Figure 11 shows that the widths of the PIs vary with time, indicating that the uncertainty of the displacement prediction varies with time and that the bootstrap-based methods can quantify these uncertainties. The PIs generated by the proposed method are the narrowest, followed by those of the DES-PSO-ELM, while the DES-GWO-ELM ranks last. These results demonstrate the superiority of the proposed method for the interval prediction of landslide displacement.

Discussion
Predicting landslide displacements lays the foundation for the prevention of landslide hazards. The behaviour of a landslide can be characterized by its transient displacements, which can be modelled mathematically. ML-based methods trained on historical monitoring data have gained increasing popularity in landslide displacement prediction. However, most published works provide only deterministic point forecasts and therefore cannot represent the dynamic uncertainty of landslide displacement.
We propose a novel hybrid machine-learning model to predict the landslide displacement and quantify the uncertainties in terms of PIs. By accounting for the uncertainty, our approach enables users to make more informed and appropriate decisions. The variation of the PI width is also important in decision making, because a wide PI represents a high level of uncertainty in the forecasts. Although PIs are a useful complement to point predictions, their calculation is computationally intensive for large data sets.
The failure of a slope is often preceded by an acceleration in the deformation rate. Hence, ML models such as the HGWO-ELM, with strong learning and generalization capacity, can be practically useful in assisting the prevention and mitigation of landslide hazards. For example, the predicted displacements help set warning thresholds and detect anomalous displacements, which might signal an unprecedented acceleration of a landslide and would usually trigger the early-warning procedures. If the detected anomalous displacements occur in the acceleration stage, landslide models built on creep theory, such as the Saito model (Saito 1965), could be used to predict the time of failure.
Displacement monitoring and prediction based on in situ measurements remains an important field of landslide research worthy of further study. However, the cost of ground monitoring devices in terms of time and manpower may limit the application of this method. Developing technologies for Earth observation from space open a new era for landslide displacement monitoring: satellite radar interferometry with millimetre-level accuracy can provide displacement measurements covering a wide area every 6 days using the open-access Sentinel-1 data set (Hu et al. 2017), and may someday serve as the fundamental data set for landslide prediction.

Conclusions
Accurate prediction of landslide displacement is a vital part of landslide early warning and directly provides technical support for decision-makers in emergency responses. However, prediction errors are often inevitable given the dynamic uncertainties of landslide evolution and can be significant in deterministic point-forecasting methods. In this paper, a hybrid machine-learning model has been established to predict landslide displacements and quantify their uncertainties using prediction intervals (PIs). By accounting for the forecasting uncertainty, this approach enables users to make more informed and appropriate decisions in disaster management.
The displacement data set and the associated reservoir water level and precipitation records of the Baishuihe landslide in the TGR, China, have been utilized in this study. After outlier removal and missing-value imputation, the original landslide displacement time series is decomposed into periodic, trend, and noise components by wavelet de-noising and Hodrick-Prescott (HP) filtering. The trend displacement, which exhibits an obvious monotonic feature over time, is predicted by the DES, while the periodic displacement induced by the seasonal triggering factors is predicted by the ELM. The predicted cumulative displacement is then obtained by summing the two components.
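The HP filter used in the decomposition can be sketched directly from its penalized least-squares definition: the trend minimizes the squared fit error plus λ times the squared second differences. The value of λ is a modelling choice that the summary above does not specify; 1600 below is merely the textbook default for quarterly economic data.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott filter: split a series into trend and cyclical
    components by solving (I + lam * D'D) trend = y, where D is the
    second-difference operator."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))               # second-difference matrix
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)
    return trend, y - trend                # trend, cycle (periodic + noise)
```

A perfectly linear series has zero second differences, so it is returned as pure trend with a zero cyclical component, which matches the filter's role of isolating the monotonic part of the displacement.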
During modelling, bootstrap sampling and an HGWO are applied to ensure the optimal performance of the ELM. Because the input variables for model training also strongly affect the prediction accuracy, GRA and KPCA are combined to generate efficient triggering features for the ELM. The results validate the capacity of the HGWO-ELM in forecasting the seasonal displacement of landslides.
To further illustrate the effectiveness and reliability of the proposed method, a comparative analysis against other state-of-the-art methods, namely the hybrid DES and PSO-ELM and the hybrid DES and GWO-ELM, has also been conducted. The results show that the proposed method guarantees a perfect PICP with the narrowest PIs (smallest PINRW) and the minimum CWC. Thus, the proposed model is effective and gives a more satisfactory performance than the other state-of-the-art methods in the interval prediction of landslide displacement.