Improved sensor fault detection, isolation, and mitigation using multiple observers approach

Abstract
Traditional fault detection and isolation (FDI) methods analyze a residual signal to detect and isolate sensor faults. The residual signal is the difference between the sensor measurements and the outputs of the system estimated by an observer. However, traditional residual-based FDI methods have some limitations. First, they require the observer to have reached its steady state. Second, residual-based methods may not detect some sensor faults, such as faults on critical sensors, without which the system is unobservable. Third, the system may be in jeopardy if no action is taken to mitigate the impact of the faulty sensors before they are identified. The contribution of this paper is to propose three new methods to address these limitations. Faults that occur during the observers' transient state can be detected by analysing the convergence rate of the estimation error. Open-loop observers, which do not rely on sensor information, are used to detect faults on critical sensors. By switching among different observers, we can potentially mitigate the impact of the faulty sensor during the FDI process. These three methods are systematically integrated with a previously developed residual-based method to provide an improved FDI and mitigation framework. The overall approach is validated mathematically, and its effectiveness is demonstrated through simulation on a five-state suspension system.


Introduction
Sensors are considered to be the weak link in a system, especially when they transmit data through a vulnerable public network (e.g. the Internet) (Cardenas, Amin, & Sastry, 2008; Silva, Saxena, Balaban, & Goebel, 2012). A sensor fault in a physical system can be a major problem that may degrade the system performance, and even put the system in jeopardy in severe cases. The International Federation of Automatic Control (IFAC) SAFEPROCESS Technical Committee defines a fault as an unpermitted deviation of at least one characteristic property or parameter of the system from the acceptable/usual/standard condition (Isermann, 1997; Schrick, 1997).
Fault detection and isolation (FDI) and fault mitigation mechanisms are crucial for protecting a system that is susceptible to sensor faults. Fault detection makes a binary decision on whether or not a fault has occurred. Fault isolation determines the location of the fault and assesses its extent (Willsky, 1976). Fault mitigation reduces the effect of the fault (Dubey et al., 2007). Fault mitigation differs from fault-tolerant control, which aims at controlling the faulty system in the presence of the fault. In this paper, we propose three new methods to improve the performance of the traditional sensor fault detection, isolation and mitigation methods.
CONTACT Zheng Wang zhengwa@umich.edu

Literature review
A significant amount of research has been carried out on detecting and isolating sensor faults with observer-based methods because of their cost efficiency. The most common approach is to calculate residuals from the difference between the measured outputs of the system and the estimated outputs of the observer, and to compare the residuals with certain thresholds to detect a sensor fault (Hwang, Kim, Kim, & Seah, 2010). For fault detection, a single observer or Kalman filter is sufficient. Fault isolation is usually addressed with a bank of observers, called a dedicated observer scheme (DOS) (Frank and Ding, 1997). In the DOS proposed by , each observer uses only one sensor for state estimation, based on the assumption that the system is observable with any one of the sensors. Similarly, Bouibed, Seddiki, Guelton, and Akdag (2014) design multiple robust sliding mode observers with different subsets of sensor measurements and actuator inputs to generate residuals for detecting both sensor and actuator faults. Each sliding mode observer excludes a particular sensor or actuator and is designed so that its residual is sensitive to a fault on that sensor or actuator, but insensitive to faults on the other sensors and actuators. In addition to observers designed with different inputs and outputs of the physical system, some DOSs consist of unknown input observers. Chadli, Akhenak, Maquin, and Ragot (2008) use a sliding mode observer to detect and isolate faults in nonlinear systems represented by multiple local linear models. This sliding mode observer is a linear combination of several local unknown input observers, which can isolate the unknown disturbances to achieve robust FDI. Instead of isolating unknown disturbances, Methnani, Lafont, Gauthier, Damak, and Toumi (2013) treat a single additive fault as an unknown input and attempt to reconstruct it with a bank of unknown input observers, one for each sensor and actuator.
An observer-based method can also be integrated with other methods for FDI. Rios, Edwards, Davila, and Fridman (2015) propose an approach that combines a high-order-sliding-mode multiple-observer technique and a multiple-model technique. This combined methodology has the advantages of both sliding mode observers and multiple models. The equivalent output injection of a sliding mode observer, which is a function of estimation error, can be used as a residual to detect faults in the system. Multiple models can be designed based on different fault scenarios to isolate faults.
Note that all of the methods mentioned above assume that the observers have reached their steady state so that the effect of the uncertain initial condition on a residual has died out. Otherwise, the methods may generate false alarms or missed alarms.
Some types of sensor faults may not be detected by traditional fault detection methods based on closed-loop observers, including sensor faults caused by certain types of cyber attacks on a networked control system. Liu, Ning, and Reiter (2011) propose a cyber attack that injects false data into the sensor measurements and show that this attack cannot be detected by a static residual-based fault detector. To detect this type of sensor fault with a static residual-based fault detector, Bobba et al. (2010) propose protecting the subset of sensor measurements needed to ensure system observability. Mo and Sinopoli (2010, 2015) propose another kind of cyber attack that can bypass not only a static fault detector but also one that uses the system dynamics, such as a χ² fault detector. Theorem 2 in Mo and Sinopoli (2010) indicates that, for this attack, the system is not detectable once the faulty sensor is removed; as a result, the attacker can impose arbitrarily large errors between the faulty sensor measurements and the actual system outputs. The faulty sensors in Mo and Sinopoli (2010) are a subset of the critical sensors, which are indispensable for system observability. To detect critical sensor faults, a method using open-loop observers, rather than closed-loop observers, is needed.
After a fault is detected and isolated, the control scheme is reconfigured (Edwards and Tan, 2006; Choy and Weyer, 2008). Although the diagnosis of a fault can lead to appropriate maintenance, the physical system may be in jeopardy during the diagnosis. A timely mitigation technique applied during the FDI process may help maintain acceptable performance of the physical system. To the best of our knowledge, fault mitigation techniques that can be applied to sensor faults during the FDI process have not been developed (Lefebvre, 2014).
Based on our literature review, three research gaps are identified: (1) how to detect a sensor fault during the observers' transient state; (2) how to detect a sensor fault that can bypass a closed-loop observer-based method; and (3) how to potentially mitigate the impact of a sensor fault during the FDI process.

Contribution
Given a linear time-invariant discrete-time system with multiple sensors, and assuming that only one sensor is faulty at a time, the general goals of this research are to:
• determine the occurrence of a sensor fault;
• identify the faulty sensor and estimate the fault signal; and
• mitigate the impact of the sensor fault.
With respect to the previously mentioned research gaps, our contribution is to propose three new methods that respectively (1) enable sensor fault detection and reduce false alarms during the observers' transient state; (2) detect faults on critical sensors; and (3) potentially mitigate the impact of the faulty sensor during the FDI process.
These three methods are then systematically integrated with a previously developed residual-based method to create a new FDI and mitigation framework. The first two contributions are shown in Figure 1. The rest of the paper is organized as follows. Section 2 provides an overview of the problem statement and the solution. Section 3 gives the mathematical description of the system. Section 4 introduces three new methods to address the research gaps and integrates them with a previously developed method. Section 5 validates the proposed algorithm with an illustrative example. Section 6 gives the conclusions and future work.

Problem/solution overview
We consider a linear time-invariant discrete-time system with multiple sensors, multiple observers, a state feedback controller and a residual-based fault detector, and we make the following assumption.
Assumption 2.1: Only one sensor is faulty at a time.
The specific goals of this paper are to:
• propose a non-residual-based method for sensor fault detection during the observers' transient state (Contribution 1);
• propose a method for critical sensor FDI (Contribution 2);
• propose a method to potentially mitigate the impact of the faulty sensor during the FDI process (Contribution 3); and
• systematically integrate the three new methods with a previously developed residual-based method for FDI and mitigation.
Based on the single-faulty-sensor assumption, the sensors can be divided into two sets. The first set contains the sensors that are indispensable for system observability; they are called critical sensors in this paper. The second set contains the sensors whose individual removal leaves the system observable; they are called non-critical sensors. Faults on non-critical sensors can be detected and isolated using a closed-loop observer designed without the faulty sensor. Because some sensor faults caused by certain types of cyber attacks on critical sensors (Mo and Sinopoli, 2010) are disguised as sensor noise, we use a bank of open-loop observers, which are artificial copies of the system fed with the same input signal (Bemporad, 2010). Two methods run in parallel to determine which sensor is faulty: one is based on closed-loop observers, and the other on open-loop observers.
To detect faults on non-critical sensors, we design one closed-loop observer that uses all of the sensor measurements and multiple closed-loop observers that each exclude one non-critical sensor. Each observer is compared with every other observer, and the difference between the estimated states of two observers is decoupled to calculate the estimation errors of both observers. Thus, each observer has multiple calculated estimation errors, which are combined to determine its overall estimation error. The convergence ratio (CR) of the estimation error of an observer is determined by the designed state matrix of the observer and is not affected by the uncertain initial condition. However, a sensor fault or a disturbance can change the CR of the estimation error. Based on this property, we propose the CR method for fault detection, which reduces false alarms during the observers' transient state. Bias analysis based on the calculated estimation errors is developed to distinguish a sensor fault from a disturbance. In the ideal case, the biases calculated from the estimation errors of all observers are identical when the system is under disturbance, but differ when it is under sensor fault. With bounded system noise, the difference between a calculated bias and the actual disturbance signal can be bounded. Therefore, a threshold can be selected for each pair of biases and compared with the difference between them. If the difference for any pair exceeds its threshold, the system is under sensor fault; otherwise, it is under disturbance.
To detect and isolate faults on critical sensors, we design multiple open-loop observers (MOLO) and analyse the residuals formed from the difference between the measured outputs of the system and the estimated outputs. This method is applicable only to open-loop stable or marginally stable systems; if the system is open-loop unstable, the estimation error of an open-loop observer could diverge exponentially. To increase estimation accuracy, we periodically update the states of the open-loop observers with the state estimated by the closed-loop observer that uses all of the sensor measurements, as long as no fault is detected. This creates a trade-off between estimation performance and the ability to detect a sensor fault: frequent updates keep the open-loop estimates accurate, but they also reset the open-loop observers to an estimate that may already be corrupted by the faulty sensor. Therefore, we divide the open-loop observers into several groups, and the observers within the same group share the same update frequency. To mitigate the impact of noise, the update time steps of the observers in the same group are distributed evenly within one update period, and the residuals generated by the observers in the group are averaged. The averaged residual is compared with a threshold that depends on the known upper bound of the noise and on the update frequency. If the residual exceeds the threshold, an alarm is triggered, and the open-loop observers of that group are not updated with the closed-loop estimate until the alarm is cleared. Logic is provided to determine, from which groups of open-loop observers trigger alarms, whether the system is under normal operation or under sensor fault. The residuals of the groups that trigger alarms are then analysed to determine which sensor is faulty.
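The group bookkeeping described above (staggered update instants, averaged residual, suspended updates while an alarm is active) can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the class name `OpenLoopGroup`, the integer-division rule for spreading the update instants, and the rule that the alarm clears as soon as the averaged residual falls back below the threshold are all hypothetical, since the text does not give the threshold formula or the alarm-clearing logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpenLoopGroup:
    """Hypothetical bookkeeping for one group of N open-loop observers."""

    period: int       # update period of the group (kappa_{f,g}), in time steps
    n_obs: int        # number of open-loop observers N in the group
    threshold: float  # alarm threshold for the averaged residual (assumed given)
    alarm: bool = False
    offsets: List[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Stagger the update instants of the N observers evenly within one period.
        self.offsets = [(i * self.period) // self.n_obs for i in range(self.n_obs)]

    def due_for_update(self, k: int, i: int) -> bool:
        """True if observer i should be reset to the closed-loop estimate at step k."""
        # While the group's alarm is active, updates from the closed-loop observer are suspended.
        return (not self.alarm) and (k % self.period == self.offsets[i])

    def check(self, residuals: List[float]) -> bool:
        """Average the group's residuals, compare with the threshold, and set or clear the alarm."""
        averaged = sum(residuals) / len(residuals)
        self.alarm = averaged > self.threshold
        return self.alarm


if __name__ == "__main__":
    group = OpenLoopGroup(period=10, n_obs=4, threshold=0.05)
    print(group.offsets)                       # [0, 2, 5, 7]
    print(group.check([0.2, 0.1, 0.1, 0.0]))   # True: averaged residual is about 0.1
    print(group.due_for_update(12, 1))         # False: updates suspended during the alarm
```

Staggering the update instants means that, at any time step, the observers in a group have been running open loop for different lengths of time, so averaging their residuals smooths out the effect of the noise present at any single update instant.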
For fault mitigation, we also need to consider two cases: faults on critical sensors and faults on non-critical sensors. For a fault on a non-critical sensor, the closed-loop observer without the faulty sensor provides a better state estimate, from which a state feedback controller can compute the control input closest to the ideal control input. Thus, pinpointing this observer during the FDI process is the key to fault mitigation. Based on this property, we propose the calculated control input (CCI) method, which switches among different observers to potentially mitigate the impact of a fault on a non-critical sensor during the FDI process. For a fault on a critical sensor, none of the closed-loop observers can provide a good state estimate. If the system is open-loop stable, we can use an open-loop observer for state estimation to mitigate the impact of the sensor fault. If the system is only marginally stable, the only remedy is to replace the faulty sensor.
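A minimal sketch of the CCI selection rule follows. It assumes a state feedback law of the form $u = K\hat{x}$ and measures the distance between control inputs with the ∞-norm (the document's default norm); the function names `control_input` and `cci_select` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def control_input(K: Matrix, x: Vector) -> List[float]:
    """State feedback u = K x."""
    return [sum(k_ij * x_j for k_ij, x_j in zip(row, x)) for row in K]


def cci_select(K: Matrix, closed_loop_states: List[Vector], open_loop_state: Vector) -> int:
    """Index of the closed-loop observer whose calculated control input is closest
    (infinity norm) to the 'ideal' control input computed from the open-loop estimate."""
    u_ideal = control_input(K, open_loop_state)

    def gap(x_hat: Vector) -> float:
        u = control_input(K, x_hat)
        return max(abs(a - b) for a, b in zip(u, u_ideal))

    return min(range(len(closed_loop_states)), key=lambda i: gap(closed_loop_states[i]))


if __name__ == "__main__":
    K = [[-1.0, -2.0]]
    estimates = [[0.5, 0.2], [0.1, 0.0]]   # observer 0 and observer 1
    print(cci_select(K, estimates, [0.12, 0.01]))  # 1: observer 1 is used for feedback
```

Comparing control inputs rather than state estimates weights each state error by its effect on the actuator, which is the quantity that drives the deviation of the physical system.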
We also need a residual-based method built on closed-loop observers for non-critical sensor fault isolation. In this paper, we adopt a method from Bouibed et al. (2014) and call it the calculated outputs (CO) method. The method in Bouibed et al. (2014) consists of several sliding mode observers, each excluding a particular sensor or actuator, and the sliding mode observer without the faulty sensor generates a significant residual signal. In contrast, we use a bank of Luenberger observers (or Kalman filters)¹ for the CO method. In this case, the observers that use the faulty sensor generate significant residuals, and the CO method is not robust to disturbances in the system. Table 1 shows the abilities of the CO, CR, MOLO and CCI methods. Figure 2(a) shows when to use each of these four methods based on its abilities, and we systematically integrate them as shown in Figure 2. During the observers' transient state, we use the CR method for non-critical

Mathematical formulation of the problem
The analysis is carried out based on a linear time-invariant discrete-time system equipped with multiple observers, a state feedback controller and a residual-based fault detector.

Physical system
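We model the physical system as a linear time-invariant discrete-time system of the following form:
$$x(k+1) = Ax(k) + Bu(k) + Dd(k) + w(k), \qquad y(k) = Cx(k) + v(k) + Ff(k), \tag{1}$$
where $x(k) \in \mathbb{R}^{n\times1}$ is the system state, $u(k) \in \mathbb{R}^{p\times1}$ is the control input, $y(k) \in \mathbb{R}^{m\times1}$ is the vector of sensor measurements, $d(k) \in \mathbb{R}^{s\times1}$ is the disturbance, $f(k) \in \mathbb{R}$ is the fault signal added to the sensor measurements, the process noise $w(k) \in \mathbb{R}^{n\times1}$ and the sensor noise $v(k) \in \mathbb{R}^{m\times1}$ are zero-mean random vectors with bounds $\|w(k)\| \le \omega$ and $\|v(k)\| \le \upsilon$, respectively (in this paper, $\|\cdot\|$ denotes $\|\cdot\|_\infty$), $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times p}$, $C \in \mathbb{R}^{m\times n}$ and $D \in \mathbb{R}^{n\times s}$ are real constant matrices, and $F = [0 \ \cdots \ 1_{i_f} \ \cdots \ 0]^T \in \mathbb{R}^{m\times1}$ is a fault vector in which the zero entries correspond to the faultless sensors, the entry $1_{i_f}$ corresponds to the faulty sensor, and $i_f$ is the index of the faulty sensor. Based on Assumption 2.1, $F$ has at most one non-zero element.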
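To make the model concrete, the following Python sketch advances a system of this form by one step. It assumes the model $x(k+1) = Ax(k) + Bu(k) + Dd(k) + w(k)$, $y(k) = Cx(k) + v(k) + Ff(k)$, and it draws the zero-mean bounded noise uniformly within the ∞-norm bounds, which is only one of many admissible choices; `plant_step` is a hypothetical helper, not part of the paper.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product with plain lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def plant_step(A: Matrix, B: Matrix, C: Matrix, D: Matrix, F: Sequence[float],
               x: Sequence[float], u: Sequence[float], d: Sequence[float], f: float,
               w_bound: float, v_bound: float,
               rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (x(k+1), y(k)) for x(k+1) = A x + B u + D d + w and y = C x + v + F f."""
    # Assumption: zero-mean noise drawn uniformly inside the infinity-norm bounds.
    w = [rng.uniform(-w_bound, w_bound) for _ in range(len(A))]
    v = [rng.uniform(-v_bound, v_bound) for _ in range(len(C))]
    # F has a single 1 at the faulty sensor (Assumption 2.1), so f corrupts one output.
    y = [cx + v_i + F_i * f for cx, v_i, F_i in zip(matvec(C, x), v, F)]
    x_next = [ax + bu + dd + w_i
              for ax, bu, dd, w_i in zip(matvec(A, x), matvec(B, u), matvec(D, d), w)]
    return x_next, y


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, 0.0], [0.1, 1.0]]
    B = [[0.1], [0.005]]
    C = [[1.0, 0.0], [0.0, 1.0]]
    D = [[0.0], [0.0]]
    print(plant_step(A, B, C, D, [0, 1], [1.0, 0.0], [0.0], [0.0], 0.5, 0.001, 0.01, rng))
```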

Closed-loop observers and open-loop observers
At each time step, all of the sensor measurements y(k) and the control inputs u(k) are gathered for state estimation. Two different kinds of observers can be utilized: closed-loop observers and open-loop observers.

Closed-loop observers
A closed-loop observer corrects its estimate with feedback from the sensor measurements, as shown in Figure 3. Based on Assumption 2.1, the sensors can be divided into two sets: $S_{nc}$ and $S_c$. $S_{nc}$ contains the $m_o$ non-critical sensors, and $S_c$ contains the critical sensors. To design multiple closed-loop observers, we need the following assumption:
Assumption 3.1: Set $S_{nc}$ contains at least one non-critical sensor, i.e. $m_o > 0$.
We assume without loss of generality that the rows of the output matrix $C$ are ordered such that the first $m_o$ sensors are non-critical. Thus, $m_o + 1$ closed-loop observers can be designed: observer 0 uses all of the sensor measurements, and observer $i$ uses all but sensor $i$ ($i = 1, 2, \ldots, m_o$). For the closed-loop observers, we use Luenberger observers of the following form:
$$\hat{x}_i(k+1) = A\hat{x}_i(k) + Bu(k) + L_i\big(y_i(k) - C_i\hat{x}_i(k)\big), \qquad y_i(k) = C_ix(k) + v_i(k) + F_if(k), \tag{2}$$
where $\hat{x}_i(k) \in \mathbb{R}^{n\times1}$ is the state estimated by closed-loop observer $i$ ($i = 0, 1, 2, \ldots, m_o$), $y_i(k) \in \mathbb{R}^{(m-1)\times1}$ is the vector of sensor measurements used by observer $i$, which does not contain the $i$th element of $y(k)$, $v_i(k)$ does not contain the $i$th sensor noise, $L_i$ is the observer gain, placing the eigenvalues of $E_i = A - L_iC_i$ inside the unit circle, $C_i \in \mathbb{R}^{(m-1)\times n}$ is the output matrix of observer $i$, which does not contain the $i$th row of $C$, and $F_i \in \mathbb{R}^{(m-1)\times1}$ is the fault vector of observer $i$, which does not contain the $i$th element of $F$. For observer 0, $y_0 = y$, $C_0 = C$, $v_0 = v$ and $F_0 = F$. If $i = i_f$, then $F_i = 0_{(m-1)\times1}$, which means that observer $i_f$ does not use the faulty sensor $i_f$ for state estimation. The observer state matrix and observer gain of this observer are $E_{i_f}$ and $L_{i_f}$, respectively.
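The following sketch shows how an observer that excludes one sensor can be assembled and iterated. It assumes the predictor form $\hat{x}_i(k+1) = A\hat{x}_i(k) + Bu(k) + L_i(y_i(k) - C_i\hat{x}_i(k))$, so that $E_i = A - L_iC_i$. As an illustration it borrows the moving-object model of Section 4.1 ($A = [[1, 0], [0.1, 1]]$, $C = I$), and the gain $L_1 = [8.01, 1.79]^T$ is our own hand calculation placing the poles of observer 1 (position sensor only) at 0.1 and 0.11; the paper does not report its gains. Sensor indices are 0-based in the code, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def matmul(P: Sequence[Sequence[float]], Q: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def drop_sensor(C: Matrix, y: List[float], i: int) -> Tuple[Matrix, List[float]]:
    """C_i and y_i for the observer that excludes sensor i (0-based row index)."""
    return [row for r, row in enumerate(C) if r != i], [v for r, v in enumerate(y) if r != i]


def luenberger_step(A, B, C_i, L_i, x_hat, u, y_i) -> List[float]:
    """x_hat(k+1) = A x_hat(k) + B u(k) + L_i (y_i(k) - C_i x_hat(k))."""
    innovation = [a - b for a, b in zip(y_i, matvec(C_i, x_hat))]
    return [p + q + r for p, q, r in zip(matvec(A, x_hat), matvec(B, u), matvec(L_i, innovation))]


def error_matrix(A, L_i, C_i) -> Matrix:
    """E_i = A - L_i C_i, which governs the noise-free estimation error."""
    LC = matmul(L_i, C_i)
    return [[a - lc for a, lc in zip(ra, rl)] for ra, rl in zip(A, LC)]


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.1, 1.0]]
    C = [[1.0, 0.0], [0.0, 1.0]]
    C1, _ = drop_sensor(C, [0.0, 0.0], 0)   # observer 1 drops the velocity sensor
    E1 = error_matrix(A, [[8.01], [1.79]], C1)
    trace = E1[0][0] + E1[1][1]
    det = E1[0][0] * E1[1][1] - E1[0][1] * E1[1][0]
    # Poles are the roots of z^2 - trace z + det; trace 0.21 and det 0.011 give 0.1 and 0.11.
    print(E1, trace, det)
```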

Remark:
Our assumptions imply that the system remains detectable when any one of the sensors in $S_{nc}$ is removed. If the system is detectable and the noise is truncated Gaussian, the time-varying gain of a Kalman filter converges within a few steps. Therefore, Kalman filters with steady-state Kalman gains can also be used as the closed-loop observers (Mo and Sinopoli, 2010).

Open-loop observers
An open-loop observer runs in parallel with the physical system, reproducing its behaviour, as shown in Figure 4. Because the convergence of its estimation error is not guaranteed, the state of each open-loop observer is updated periodically by closed-loop observer 0, which uses all of the sensor measurements. As mentioned in Section 2, we design $M$ groups of open-loop observers, each with $N$ observers, and the observers in the same group have the same update period. An open-loop observer then takes the form of Equation (3) after one update period, where $\bar{x}_{g,i}(k) \in \mathbb{R}^{n\times1}$ is the state estimated by open-loop observer $i$ in group $g$ ($i = 1, \ldots, N$, $g = 1, \ldots, M$), and $\kappa_{f,g}$ is the update period of group $g$.
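A sketch of a single open-loop observer with periodic updates is shown below. It assumes that the observer propagates $\bar{x}(k+1) = A\bar{x}(k) + Bu(k)$ and is overwritten by the estimate of closed-loop observer 0 at the steps where $k \bmod \kappa_{f,g}$ equals an offset of our own choosing; `open_loop_trajectory` is a hypothetical name, and the exact form of Equation (3) may differ.

```python
from typing import List, Sequence


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def open_loop_trajectory(A, B, x_init, u_seq, x0_hat_seq, period: int, offset: int) -> List[List[float]]:
    """Open-loop observer x_bar(k+1) = A x_bar(k) + B u(k), reset to the estimate of
    closed-loop observer 0 whenever k % period == offset. Returns x_bar(k) for each k."""
    x_bar = list(x_init)
    trajectory = []
    for k, (u, x0_hat) in enumerate(zip(u_seq, x0_hat_seq)):
        if k % period == offset:
            x_bar = list(x0_hat)            # periodic update from closed-loop observer 0
        trajectory.append(list(x_bar))
        # Between updates the observer uses only the input: no sensor feedback.
        x_bar = [p + q for p, q in zip(matvec(A, x_bar), matvec(B, u))]
    return trajectory


if __name__ == "__main__":
    traj = open_loop_trajectory([[0.9]], [[1.0]], [5.0], [[0.0]] * 7, [[1.0]] * 7, period=5, offset=0)
    print(traj)  # decays from 1.0 by a factor 0.9 per step, then resets to 1.0 at k = 5
```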

State feedback controller
A state feedback controller calculates a control command based on the system state, and applies it to the input of the system. The following assumption enables the utilization of a state feedback controller.

Assumption 3.2:
The system is controllable.
Since the actual state of the system is unknown, the controller can only use the state estimated by a closed-loop observer, which gives the following control law (Phillips and Nagle, 1994):
$$u(k) = K\hat{x}(k), \tag{4}$$
where $\hat{x}(k)$ is the estimate provided by the closed-loop observer in use and $K \in \mathbb{R}^{p\times n}$ is the controller gain, placing the eigenvalues of $A + BK$ inside the unit circle. Note that, because of system noise, an open-loop observer cannot provide estimation performance as good as that of a closed-loop observer. Therefore, we use a closed-loop observer for the state feedback controller when the system is under normal operation or under a non-critical sensor fault. If an open-loop stable system is under a critical sensor fault, we can switch to an open-loop observer to help mitigate the impact of the sensor fault.

Residual-based fault detector
In this paper, the residual-based fault detector uses the CO method, which is adopted from Bouibed et al. (2014).
In contrast to the method in Bouibed et al. (2014), the CO method consists of multiple Luenberger observers of the form of Equation (2) and generates its residuals from the difference between the sensor measurements $y_i$ (without the $i$th output) and the estimated outputs $C_i\hat{x}_i$, as shown in Equation (5), where $Q_i$ is a real constant weighting matrix for observer $i$² and $r_i(k) \in \mathbb{R}$ is the residual generated by observer $i$. The residual generated by observer 0 is compared with a selected threshold $\theta_{CO}$ to determine whether a sensor fault has occurred. When a sensor fault occurs, closed-loop observer $i_f$, which does not use the faulty sensor, is not affected by the fault and thus provides a better state estimate than the other observers.³ The residual generated by observer $i_f$ is therefore smaller than the residuals generated by the other observers ($i \neq i_f$), so the faulty sensor can be located by finding the smallest residual among observers 1 to $m_o$. After the faulty sensor is located, the estimated fault signal $\hat{f}$ is obtained. Algorithm 1 gives the procedure of the CO method. First, the residuals of the different observers are calculated. Then, the residual of observer 0 is used for fault detection, and the remaining residuals are compared for fault isolation. Note that the CO method cannot distinguish a disturbance from a sensor fault, since Luenberger observers are not robust to disturbances. This issue is addressed by complementing the CO method with the CR method introduced in Section 4.3, which can distinguish a disturbance from a sensor fault.
Algorithm 1: CO method for sensor FDI
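Since the body of Algorithm 1 is not reproduced here, the following Python sketch shows one plausible reading of the CO procedure described above. The residual definition (∞-norm of the weighted output error) is an assumption standing in for Equation (5), and the names `co_residual` and `co_method` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def co_residual(Q_i, y_i, C_i, x_hat_i) -> float:
    """Assumed residual: infinity norm of the weighted output error Q_i (y_i - C_i x_hat_i)."""
    innovation = [a - b for a, b in zip(y_i, matvec(C_i, x_hat_i))]
    return max(abs(z) for z in matvec(Q_i, innovation))


def co_method(residuals: Sequence[float], theta_co: float) -> Tuple[bool, Optional[int]]:
    """residuals[0] comes from observer 0 (all sensors); residuals[i], i = 1..m_o, from the
    observer that excludes non-critical sensor i. Returns (fault detected, faulty sensor)."""
    if residuals[0] <= theta_co:
        return False, None                 # detection uses observer 0 only
    # Isolation: the observer that excludes the faulty sensor has the smallest residual.
    # Assumption 3.1 (m_o > 0) guarantees at least one candidate.
    i_f = min(range(1, len(residuals)), key=lambda i: residuals[i])
    return True, i_f


if __name__ == "__main__":
    print(co_method([0.3, 0.5, 0.02, 0.4], theta_co=0.1))   # (True, 2)
    print(co_method([0.05, 0.5, 0.02, 0.4], theta_co=0.1))  # (False, None)
```

Because a disturbance also inflates residuals[0] and leaves one residual smallest by chance, this logic on its own will report a disturbance as a sensor fault, which is the limitation the CR method is meant to address.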

Notations
Main notations are summarized here. $x$ is the actual system state, $\hat{x}$ is the state estimated by a closed-loop observer, and $\bar{x}$ is the state estimated by an open-loop observer. $e$ is the estimation error between the observer state and the actual system state. $e_{\mu,\nu}$ is the difference between the estimated states of two closed-loop observers $\mu$ and $\nu$. $\tilde{e}_{\mu(\nu)}$ is the calculated estimation error of closed-loop observer $\mu$, obtained from $e_{\mu,\nu}$. Detailed notations are shown in Table A.7.

Framework components description and integration
Throughout this section, a simple system of a moving object is utilized as an illustration. First, we simulate sensor faults on the moving object system equipped with the CO method-based fault detector to understand its limitations. Then, three new methods are introduced and analysed in the deterministic case (noise free). The impact of random system noise is discussed for each method thereafter. The simulation result shows the improvements of the proposed methods compared to the CO method. Finally, we provide an algorithm to integrate the CO method and the three new methods.

Moving object system
The moving object system is a 1 kg mass moving along a horizontal line. Two sensors measure the two outputs, the velocity $y_v$ and the position $y_p$. A state feedback controller applies a horizontal force to the mass. The sampling time is 0.1 s. The system has initial state (0, 0), process noise with bound 0.001 (m/s or m), and sensor noise with bound 0.01 (m/s or m). The initial states of the observers are chosen as (1, 0.5).⁴ The state space representation of the moving object system has the form of Equation (1) with
$$x = \begin{bmatrix} x_v \\ x_p \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 \\ 0.1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0.1 \\ 0.005 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
By checking the rank of the observability matrix, the moving object system is observable with both $y_v$ and $y_p$ or with $y_p$ alone, but unobservable with $y_v$ alone. Therefore, $y_v \in S_{nc}$ and $y_p \in S_c$. Two observers are designed with observer poles placed at 0.1 and 0.11: observer 0 uses both sensor measurements $y_v$ and $y_p$, and observer 1 uses only $y_p$.
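The observability check that sorts sensors into $S_{nc}$ and $S_c$ can be sketched in plain Python as below. It assumes the discretized model $A = [[1, 0], [0.1, 1]]$, $C = I$ with state (velocity, position), which is our reconstruction for a 1 kg mass sampled at 0.1 s. The rank routine uses Gaussian elimination with a fixed tolerance, which is adequate for small, well-scaled examples like this one but is not a robust numerical rank test; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(P: Sequence[Sequence[float]], Q: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def rank(M: Sequence[Sequence[float]], tol: float = 1e-9) -> int:
    """Rank by Gaussian elimination with partial pivoting."""
    if not M:
        return 0
    R = [[float(v) for v in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(R[i][c]))
        if abs(R[pivot][c]) < tol:
            continue                          # no usable pivot in this column
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(r + 1, n_rows):
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def obsv(A: Matrix, C: Matrix) -> Matrix:
    """Observability matrix [C; CA; ...; CA^(n-1)]."""
    blocks, CA_k = [], [list(row) for row in C]
    for _ in range(len(A)):
        blocks.extend(CA_k)
        CA_k = matmul(CA_k, A)
    return blocks


def classify_sensors(A: Matrix, C: Matrix) -> Tuple[List[int], List[int]]:
    """Sensor i is non-critical if (A, C without row i) is still observable."""
    n = len(A)
    non_critical, critical = [], []
    for i in range(len(C)):
        C_without_i = [row for r, row in enumerate(C) if r != i]
        target = non_critical if rank(obsv(A, C_without_i)) == n else critical
        target.append(i)
    return non_critical, critical


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.1, 1.0]]
    C = [[1.0, 0.0], [0.0, 1.0]]
    print(classify_sensors(A, C))  # ([0], [1]): y_v is non-critical, y_p is critical
```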
Two fault scenarios are considered: (1) fault α: a ramp signal with slope 0.05 m/s 2 (0.005 m/s per time step) added to the velocity sensor y v , saturating at 1 m/s; (2) fault β: a ramp signal with slope 0.001 m/s (0.0001 m per time step) added to the position sensor y p , saturating at 1 m.
Both faults start at 10 s and last until the end of the simulation. We consider ramp faults with small slopes because they are harder to detect than ramp faults with large slopes or step faults with large magnitudes.
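The two fault profiles can be generated as follows. The sketch assumes that a ramp "starting at 10 s" is zero up to step 100 (with the 0.1 s sampling time) and then grows linearly until it saturates; the helper names are hypothetical.

```python
SAMPLE_TIME = 0.1                              # s
FAULT_START_STEP = round(10.0 / SAMPLE_TIME)   # both faults start at 10 s (step 100)


def ramp_fault(k: int, slope_per_step: float, saturation: float,
               k_start: int = FAULT_START_STEP) -> float:
    """Zero before k_start, then slope_per_step * (k - k_start), capped at saturation."""
    if k < k_start:
        return 0.0
    return min(slope_per_step * (k - k_start), saturation)


def fault_alpha(k: int) -> float:
    """Fault on the velocity sensor y_v: 0.005 m/s per step, saturating at 1 m/s."""
    return ramp_fault(k, 0.005, 1.0)


def fault_beta(k: int) -> float:
    """Fault on the position sensor y_p: 0.0001 m per step, saturating at 1 m."""
    return ramp_fault(k, 0.0001, 1.0)


if __name__ == "__main__":
    for t in (5.0, 15.0, 40.0):
        k = round(t / SAMPLE_TIME)
        print(t, fault_alpha(k), fault_beta(k))
```

With these numbers, fault α reaches its 1 m/s saturation 20 s after onset, whereas fault β would need 1000 s, so within a typical simulation horizon fault β never saturates.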

The impact of sensor faults
Two fault cases are run on the moving object system equipped with the CO method-based fault detector to show its limitations, and a new method is discussed and proposed to address each limitation. Figures 5-7 show the estimated position states of both observers, $\hat{x}_0$ and $\hat{x}_1$, the actual state $x$, and the sensor measurement $y$ of the system equipped with the CO method-based fault detector under fault α, under fault β, and under normal operation, respectively. In both Figures 5 and 6, the CO method generates false alarms during the observers' transient state (about 0.2 s), when the system is actually under normal operation. Figure 7 shows that the imperfect initial state of the observers causes the CO method to generate false alarms. According to Equation (5), under normal operation the residual $r_i(k)$ of the CO method is a function of the observer's estimation error. A large estimation error makes the residual exceed the threshold, causing false alarms during the observers' transient state. To enable fault detection during the observers' transient state, we apply the CR method (Section 4.3), which uses the CR of the observers' estimation errors.
As shown in Figure 6(b), when the system is under fault β, no alarm is generated because the CO method-based fault detector does not detect the fault. The reason is that the system is not detectable when the position sensor $y_p$ is removed, and the fault signal changes only slightly at each time step, so it causes no significant change in the residuals. An open-loop observer (3) does not use any sensor for state estimation. Thus, this issue can potentially be addressed by the MOLO method introduced in Section 4.4.
As shown in Figure 5(b, c), although the CO method successfully locates the faulty sensor and the system switches to observer 1 for state estimation after 18 s, there is an 8 s detection delay, and the system switches back and forth between the two observers from 13 s to 18 s. This is caused by the fault signal being small relative to the system noise and the threshold, so the faulty sensor cannot be located immediately. This detection delay lets the maximum absolute position of the mass reach 30 cm, as shown in Figure 5(a). The direct cause of this deviation is the error in the control input provided by the observer-based state feedback controller. To address this issue, we need to switch to the closed-loop observer without the faulty sensor as soon as possible and keep using that observer during the FDI process. Thus, we propose the CCI method, which compares the control input calculated from the state estimated by each closed-loop observer with an 'ideal' control input calculated from the state estimated by an open-loop observer, and switches to the observer whose calculated control input is closest to the ideal one. This method has the potential to mitigate the impact of a faulty non-critical sensor during the FDI process. The maximum absolute position of the system under the CO method is compared with that under the CCI method in Section 4.5.

CR method for fault detection during transient and steady state
This method detects the occurrence of an anomaly based on the convergence of the estimation error, which enables fault detection during the observers' transient state. To achieve robust fault detection, a disturbance in the system is distinguished from a sensor fault by analysing the bias of the estimation error. The method is first introduced for an ideal control system, and the impacts of process noise and sensor noise are then discussed.

Ideal system case
Three different situations are considered for this method: normal operation, disturbance, and sensor fault. The estimation error $e_i$ of closed-loop observer $i$ and the difference $e_{\mu,\nu}$ between the estimated states of two closed-loop observers $\mu$ and $\nu$ in these three situations are given in Equations (8) through (13). Under normal operation,
$$e_i(k+1) = E_ie_i(k), \tag{8}$$
$$e_{\mu,\nu}(k+1) = E_\nu e_\nu(k) - E_\mu e_\mu(k). \tag{9}$$
Under disturbance,
$$e_i(k+1) = E_ie_i(k) + Dd(k), \tag{10}$$
$$e_{\mu,\nu}(k+1) = E_\nu e_\nu(k) - E_\mu e_\mu(k). \tag{11}$$
Under sensor fault,
$$e_i(k+1) = E_ie_i(k) - L_iF_if(k), \tag{12}$$
$$e_{\mu,\nu}(k+1) = E_\nu e_\nu(k) - E_\mu e_\mu(k) - \big(L_\nu F_\nu - L_\mu F_\mu\big)f(k). \tag{13}$$
Notice that Equations (9) and (11) are identical. The first step of the CR method is to calculate the estimation error of each closed-loop observer. Under both normal operation and disturbance, the dynamics of $e_{\mu,\nu}$ are determined by the evolution of the estimation errors $e_\mu$ and $e_\nu$ of the two closed-loop observers. Therefore, the estimation errors of both observers can be decoupled over two time steps. However, the dynamics of $e_{\mu,\nu}$ under sensor fault involve the two unknown fault vectors $F_\mu$ and $F_\nu$ and the unknown fault signal $f(k)$, so the estimation errors cannot be correctly decoupled under sensor fault. Lemma 4.1 gives the formulas for decoupling the estimation errors of any two different observers.
(2) Under sensor fault, the evolution of $e_{\mu,\nu}$ in Equation (13) and $e_{\mu,\nu}(k) = e_\nu(k) - e_\mu(k)$ are substituted into Equation (14). Based on Lemma 4.1, $m_o$ estimation errors can be calculated for each observer. In the ideal system case, these $m_o$ estimation errors are averaged to obtain the estimation error $\tilde{e}_i$ of each observer. The combination of the $m_o$ estimation errors for a noisy system is introduced in Section 4.3.2.
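Lemma 4.1 itself is not reproduced here, but the two-time-step decoupling can be illustrated. Assuming $e_{\mu,\nu}(k) = e_\nu(k) - e_\mu(k)$ and the fault-free, noise-free dynamics $e_{\mu,\nu}(k+1) = E_\nu e_\nu(k) - E_\mu e_\mu(k)$, subtracting $E_\nu e_{\mu,\nu}(k)$ gives $(E_\nu - E_\mu)e_\mu(k) = e_{\mu,\nu}(k+1) - E_\nu e_{\mu,\nu}(k)$. The sketch below solves this linear system; it further assumes that $E_\nu - E_\mu$ is nonsingular, it is not necessarily the form used in Lemma 4.1, and `decouple` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def solve(M: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("E_nu - E_mu is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            factor = aug[i][c] / aug[c][c]
            aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def decouple(E_mu, E_nu, e_mn_k, e_mn_k1) -> Tuple[List[float], List[float]]:
    """Recover e_mu(k), e_nu(k) from e_{mu,nu}(k) and e_{mu,nu}(k+1)."""
    diff = [[a - b for a, b in zip(rn, rm)] for rn, rm in zip(E_nu, E_mu)]
    rhs = [a - b for a, b in zip(e_mn_k1, matvec(E_nu, e_mn_k))]
    e_mu = solve(diff, rhs)                          # (E_nu - E_mu) e_mu(k) = rhs
    e_nu = [a + b for a, b in zip(e_mu, e_mn_k)]     # e_nu(k) = e_mu(k) + e_{mu,nu}(k)
    return e_mu, e_nu


if __name__ == "__main__":
    E_mu = [[0.5, 0.1], [0.0, 0.3]]
    E_nu = [[1.0, -8.01], [0.1, -0.79]]
    e_mu_true, e_nu_true = [0.4, -0.2], [1.0, 0.5]
    e_mn_k = [b - a for a, b in zip(e_mu_true, e_nu_true)]
    e_mn_k1 = [p - q for p, q in zip(matvec(E_nu, e_nu_true), matvec(E_mu, e_mu_true))]
    print(decouple(E_mu, E_nu, e_mn_k, e_mn_k1))  # recovers both true errors
```

A disturbance term $Dd(k)$ added to both observers' error updates cancels in $e_{\mu,\nu}(k+1)$, so the same computation still recovers the true errors, whereas observer-specific fault terms do not cancel.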
After obtaining the estimation errors of all of the observers, the next step is to analyze the convergence behaviour of the estimation error of each observer. For each observer, $\tilde{e}_i \in \mathbb{R}^{n\times1}$ contains $n$ states. The evolution matrix $E_i$ of the estimation error of observer $i$ may not be diagonal, which couples the estimation errors of different states and makes the ratio of the estimation error of each state non-constant. Therefore, instead of using the estimation errors directly, we diagonalize the evolution matrix $E_i$ using a basis of its eigenvectors; the resulting diagonal entries are the eigenvalues of $E_i$, which are the time-invariant observer poles. We can then define the CR to specify the convergence of the estimation error of each state.
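For a 2 × 2 error matrix with real, distinct eigenvalues, the diagonalization can be sketched as follows. The example assumes $E_1 = [[1, -8.01], [0.1, -0.79]]$, which corresponds to our hand-computed gain for observer 1 of the moving-object example (poles 0.1 and 0.11). In the modal coordinates $z = V^{-1}e$, each element is simply multiplied by its own pole at every step, which is what makes the per-state ratio constant. The names `eig2_real` and `to_modal` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def eig2_real(E: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Real, distinct eigenvalues of a 2x2 matrix and eigenvectors (as columns of V)."""
    (a, b), (c, d) = E
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc <= 0.0:
        raise ValueError("this sketch assumes real, distinct eigenvalues")
    s = math.sqrt(disc)
    lams = [(tr - s) / 2.0, (tr + s) / 2.0]
    if abs(b) > 1e-12:
        vecs = [[b, lam - a] for lam in lams]      # from (a - lam) v1 + b v2 = 0
    elif abs(c) > 1e-12:
        vecs = [[lam - d, c] for lam in lams]      # from c v1 + (d - lam) v2 = 0
    else:
        return [a, d], [[1.0, 0.0], [0.0, 1.0]]    # already diagonal
    V = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
    return lams, V


def to_modal(V: Sequence[Sequence[float]], e: Sequence[float]) -> List[float]:
    """Modal coordinates z = V^{-1} e, using the closed-form 2x2 inverse."""
    (p, q), (r, s) = V
    det = p * s - q * r
    return [(s * e[0] - q * e[1]) / det, (-r * e[0] + p * e[1]) / det]


if __name__ == "__main__":
    E1 = [[1.0, -8.01], [0.1, -0.79]]
    lams, V = eig2_real(E1)
    e = [0.3, -0.2]
    e_next = [E1[0][0] * e[0] + E1[0][1] * e[1], E1[1][0] * e[0] + E1[1][1] * e[1]]
    z, z_next = to_modal(V, e), to_modal(V, e_next)
    print(lams, [z_next[j] / z[j] for j in range(2)])  # the ratios equal the poles
```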
In Definition 4.2, the CR $\{cr_i\}_j$ is computed from the $j$th element of the diagonalized estimation error of observer $i$ at time $k$, and $\kappa_{CR}$ is a selected integer: the CRs are averaged over $\kappa_{CR}$ time steps. Based on this definition, under normal operation the CR of each estimation error, $\{cr_i\}_j$, equals the corresponding $j$th observer pole, since each diagonalized element evolves independently with its own pole. An anomaly (a disturbance or a sensor fault) can change the CR of the estimation error in two possible ways: it can make the estimation error converge to zero faster, or it can make the estimation error converge more slowly or drive it to some other non-zero value. In the ideal system case, anomalies in both cases can be detected by comparing the CRs with the observer poles: if a CR is larger or smaller than its corresponding observer pole, it indicates the occurrence of an anomaly. Definition 4.2 implies that $(m_o + 1) \times n$ CRs are calculated at each time step. Because of system noise, some of the CRs may indicate an anomaly even when there is none. Therefore, we define the system as anomalous if at least half of the CRs indicate an anomaly. A threshold is selected for the noisy system, as discussed in Section 4.3.2.
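Because the exact expression of Definition 4.2 is not reproduced here, the following sketch uses one plausible form: the CR of modal coordinate $j$ is the average of $z_j(t+1)/z_j(t)$ over the last $\kappa_{CR}$ steps. The tolerance `tol` stands in for the noisy-system threshold of Section 4.3.2, and all names are hypothetical. The final function implements the "at least half of the CRs" majority rule over the pooled flags of all observers.

```python
from typing import List, Sequence


def convergence_ratios(modal_history: Sequence[Sequence[float]], kappa_cr: int) -> List[float]:
    """Assumed CR: per modal coordinate, the average of z_j(t+1) / z_j(t) over the last
    kappa_cr steps. modal_history is ordered oldest first, holds at least kappa_cr + 1
    vectors, and is assumed to contain no exact zeros."""
    window = modal_history[-(kappa_cr + 1):]
    n = len(window[0])
    return [sum(window[t + 1][j] / window[t][j] for t in range(kappa_cr)) / kappa_cr
            for j in range(n)]


def cr_flags(crs: Sequence[float], poles: Sequence[float], tol: float) -> List[bool]:
    """Flag a CR as anomalous when it deviates from its observer pole by more than tol."""
    return [abs(cr - pole) > tol for cr, pole in zip(crs, poles)]


def is_anomalous(all_flags: Sequence[bool]) -> bool:
    """System-level decision: anomalous if at least half of the pooled CRs flag an anomaly."""
    return 2 * sum(all_flags) >= len(all_flags)


if __name__ == "__main__":
    poles = [0.1, 0.11]
    history = [[1.0, 1.0]]
    for _ in range(5):                      # normal operation: z_j(t+1) = pole_j * z_j(t)
        z = history[-1]
        history.append([poles[0] * z[0], poles[1] * z[1]])
    print(cr_flags(convergence_ratios(history, 3), poles, tol=1e-6))  # [False, False]
```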
To achieve robust fault detection, a disturbance should be distinguished from a sensor fault (Hwang et al., 2010). For this purpose, a bias is defined for each observer. Under disturbance, the bias is Dd(k), which is the same for all observers. Under sensor fault, the bias is −L_i F_i f(k), which differs between observers. When the system is under disturbance, the disturbance signal d(k) can be correctly determined because the decoupled estimation error is correct. In contrast, the fault signal cannot be correctly determined, because the decoupled estimation error is incorrect and F_i is unknown. Based on this analysis, the bias is calculated from each observer according to Equation (22), with the following results: (1) when the system is under disturbance, every calculated bias equals the disturbance signal d(k); (2) when the system is under sensor fault, the calculated biases generally differ from one another.

Proof: See Appendix 2
Theorem 4.4 shows that (m o + 1) × m o biases are calculated at each time step. Each bias is compared with other biases. If any two biases disagree with each other, then the system is under sensor fault. If the bias analysis indicates that the system is under disturbance, then we can determine the disturbance signal by averaging all of the biases. The combination of all of the biases for a noisy system is introduced in Section 4.3.2.

Noisy system case
Lemma 4.1 and Theorem 4.4 in Section 4.3.1 show the effectiveness of the CR method for fault detection when the system is ideal. In practice, we also need to consider system noise, namely process noise and sensor noise. When only process noise exists in the system, the output of the system can still be measured correctly, which means the state of the system can be exactly known. Therefore, process noise does not affect the accuracy of the estimation error calculation. However, when sensor noise contaminates the sensor measurements, the estimation error cannot be calculated exactly. The boundedness of the sensor noise ensures that the calculation error ẽ_{μ(ν)} − e_μ is also bounded. Lemmas 4.5 and 4.6 give the impact of process noise and of sensor noise on the estimation error calculation, respectively.
Lemma 4.5: Given a control system (1) with bounded process noise and v(k) = 0, ẽ_{μ(ν)}(k) = e_μ(k) still holds when the system is under normal operation or under disturbance.
Proof: When the system is subject to the process noise w(k), the estimation error of observer i evolves as e_i(k + 1) = E_i e_i(k) + Dd(k) + w(k), with d(k) = 0 under normal operation. Because w(k) enters every observer in the same way, it cancels in the difference of the estimated states of two observers μ and ν, which is therefore the same as Equation (9). By substituting Equation (9) into Equation (14), the calculated estimation error becomes ẽ_{μ(ν)}(k) = e_μ(k).

Lemma 4.6: Given a control system (1) with bounded sensor noise and w(k) = 0, the calculation error ẽ_{μ(ν)}(k) − e_μ(k) is bounded, and its bound depends on the observer gains L_μ and L_ν.

Proof: See Appendix 3.

Lemma 4.6 shows that the impact of sensor noise differs between estimation errors calculated from different pairs of observers. Thus, when calculating the estimation error of each observer, we combine its m_o decoupled estimation errors with different weighting ratios. The weighting ratio is determined by the bound of ẽ_{μ(ν)} − e_μ: the larger the bound, the smaller the corresponding weighting ratio. The combined estimation error is the weighted combination of the m_o decoupled estimation errors using these weighting ratios.

The sensor noise affects the accuracy of estimation error decoupling, and therefore the CRs and anomaly detection. Lemma 4.6 indicates that the impact of sensor noise can be mitigated by choosing observer gains L_μ and L_ν with smaller norms. An observer gain with a smaller norm, however, may reduce the convergence speed of the estimation error, so there is a trade-off in choosing the observer gains. The impact of sensor noise on the CRs can also be mitigated by averaging over κ_CR time steps, as shown in Definition 4.2. In addition, a threshold θ_CR for the CRs should be selected to balance tolerance of system noise against the ability to detect an anomaly. As discussed in Section 4.3.1, in the ideal case the CRs equal the observer poles under normal operation but differ from them under an anomaly. However, the observer poles are usually placed close to 0 to ensure fast convergence of the observers' estimation errors, and noise is present in the system.
So we select an upper threshold θ_CR that is larger than the largest observer pole but less than one. As a result, a sensor fault that makes the estimation error converge faster cannot be detected by the CR method. With the threshold θ_CR, the lower bound of the fault signal that can be detected (for κ_CR = 1) is proportional to the threshold θ_CR and to the bound υ of the sensor noise. Both the process noise and the sensor noise affect the accuracy of the bias calculation, and thus the ability to distinguish a disturbance from a sensor fault. Because the process noise and the sensor noise are bounded, the error of the bias calculation, d̃_{μ(ν)}(k) − d(k), is also bounded when the system is under disturbance. Lemmas 4.7 and 4.8 give the bound of d̃_{μ(ν)}(k) − d(k) under disturbance when the system is subject to the process noise or the sensor noise, respectively.
Combining Lemmas 4.7 and 4.8 gives the bound of the error of the bias calculation. Notice that the bounds differ for biases calculated from different pairs of observers, and that the errors are all zero-mean. Based on these bounds, a specific threshold θ_{d,μ(ν),ζ(η)} (μ, ν, ζ, η = 0, ..., m_o, μ ≠ ν, ζ ≠ η) can be selected for comparison with the difference between any two biases averaged over κ_CR time steps, thus determining whether the system is under disturbance or sensor fault. If the difference between any pair of biases exceeds the corresponding threshold, the system is under sensor fault. Otherwise, the system is under disturbance.
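The pairwise test just described can be sketched as follows. This is a minimal illustration under assumptions: the biases are taken as already averaged over κ_CR steps, the difference between two bias vectors is measured with the infinity norm (the text does not fix a norm here), and the dictionary layout and labels are hypothetical. The thresholds would come from the bounds of Lemmas 4.7 and 4.8.

```python
import itertools

import numpy as np


def classify_by_bias(biases, thresholds):
    """Disturbance-vs-fault test from the bias analysis.

    biases:     dict mapping an observer-pair label to its averaged bias vector.
    thresholds: dict mapping frozenset({label_a, label_b}) to the scalar
                threshold for that pair of biases (hypothetical layout).
    Returns "sensor fault" if any pair differs by more than its threshold,
    otherwise "disturbance".
    """
    for a, b in itertools.combinations(biases, 2):
        # Infinity norm of the bias difference (assumed measure).
        gap = np.max(np.abs(np.asarray(biases[a]) - np.asarray(biases[b])))
        if gap > thresholds[frozenset((a, b))]:
            return "sensor fault"
    return "disturbance"
```

A single violated pair is enough to declare a sensor fault, which matches the "any one pair" rule in the text.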
If the system is under disturbance, the weighted combination of the biases is taken as the disturbance signal. The weighting ratio of each bias is determined by the bound of d̃_{μ(ν)}(k) − d(k): the larger the bound, the smaller the weighting ratio.

Algorithm 2 shows the procedure of the CR method, which contains three steps. First, the estimation error of each observer is calculated. Then, the CRs of the estimation errors are used to detect the occurrence of an anomaly. If an anomaly is detected, biases are calculated and analysed to determine whether the anomaly is a disturbance or a sensor fault. Figure 8 shows the fault alarms generated by the CR method under fault α. During the observers' transient state, the false alarms seen in Figures 5(b), 6(b) and 7(b) are eliminated. When the system is under sensor fault α, there is a detection delay of about 2 s, caused by averaging the CR over κ_CR time steps and by the threshold θ_CR. This is shorter than the 8 s detection delay in Figure 5.
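Both the combined estimation error and the combined bias weight each term by how tight its bound is. The exact weighting formula is not reproduced in this text, so the sketch below assumes one simple choice: weights inversely proportional to the bound and normalized to sum to one. The function names are hypothetical.

```python
import numpy as np


def inverse_bound_weights(bounds):
    """Weights that shrink as the error bound grows and sum to one.

    Assumed form: w_l = (1 / b_l) / sum_q (1 / b_q). The text only states
    that a larger bound gives a smaller weighting ratio.
    """
    inv = 1.0 / np.asarray(bounds, dtype=float)
    return inv / inv.sum()


def combine_estimates(estimates, bounds):
    """Weighted combination of several estimates of the same quantity.

    estimates: array of shape (m_o, n), one row per observer pair
               (decoupled estimation errors or biases).
    bounds:    array of shape (m_o,), bound on each row's calculation error.
    """
    w = inverse_bound_weights(bounds)
    return w @ np.asarray(estimates, dtype=float)
```

For bounds (1, 3), the weights are (0.75, 0.25), so the estimate with the tighter bound dominates the combination.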

MOLO method for critical sensor FDI
The MOLO method has the potential to detect and isolate faults on critical sensors.

[Algorithm 2: CR method. Input: x̂_i(k − κ_CR : k + 1) (i = 0, 1, ..., m_o), from time step k − κ_CR to k + 1. Output: I_A, I_F, I_D, d̂(k − 1).]

In the noise-free case (w(k) = 0, v(k) = 0), the MOLO method only works if the open-loop system is stable or marginally stable. This is because the estimation error of an open-loop observer diverges if the system is unstable, i.e. if eigenvalues of A lie outside the unit circle, according to Equation (29):

$$e_o(k) = A^{k} e_o(0), \tag{29}$$
where e o (0) is the initial estimation error. After introducing system noise, the condition for the estimation error of an open-loop observer to be bounded is given in Proposition 4.9.
Proposition 4.9: Given a control system (1) and an open-loop observer (3), the following results hold: (1) if all of the eigenvalues of A lie inside the unit circle, then the estimation error of the open-loop observer is bounded; (2) if one or more of the eigenvalues of A lie on the unit circle and ‖A‖ = 1, then the estimation error of the open-loop observer is bounded.
Proof: See Appendix 6.
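The two conditions of Proposition 4.9 can be checked directly from A. The sketch below is a minimal illustration assuming the norm in condition (2) is the spectral norm (largest singular value), since the norm type is not stated here; the function name and tolerance are hypothetical.

```python
import numpy as np


def open_loop_error_bounded(A, tol=1e-9):
    """Check the sufficient conditions of Proposition 4.9 for a matrix A.

    (1) all eigenvalues strictly inside the unit circle, or
    (2) some eigenvalue on the unit circle, none outside, and ||A||_2 == 1
        (spectral norm assumed).
    """
    A = np.asarray(A, dtype=float)
    radii = np.abs(np.linalg.eigvals(A))
    if np.all(radii < 1 - tol):
        return True
    on_circle = np.any(np.abs(radii - 1) <= tol)
    none_outside = np.all(radii <= 1 + tol)
    unit_norm = abs(np.linalg.norm(A, 2) - 1) <= tol
    return bool(on_circle and none_outside and unit_norm)
```

A marginally stable A with a Jordan-type coupling, such as [[1, 0.01], [0, 1]], fails condition (2) because its spectral norm exceeds one, which is the situation that calls for periodic updates of the open-loop observer.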
For systems that do not satisfy the conditions in Proposition 4.9, we need to periodically update the state of the open-loop observer with the state estimated by closed-loop observer 0, which uses all of the sensor measurements, whenever no fault is detected. The initial estimation error of the open-loop observer is then the same as the estimation error of the closed-loop observer.
There is a trade-off between the estimation performance and the ability to detect a critical sensor fault. If the update frequency is high, the state estimated by the open-loop observer tracks the state estimated by the closed-loop observer well. Under normal operation, k steps after an update, the open-loop estimation error is

$$e_o(k) = A^{k} e(0) + \sum_{i=0}^{k-1} A^{i} w(k-1-i),$$

where e(0) is the estimation error of closed-loop observer 0 at the update. The smaller k is, the smaller the divergence of $\sum_{i=0}^{k-1} A^{i} w(k-1-i)$, which means better estimation under normal operation. However, a high update frequency can degrade the ability to detect a sensor fault. The ramp fault signal f(k) increases with the time step k, and the fault can be detected only at the time steps at which f(k) has become significant.
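The estimation side of this trade-off can be simulated with the recursion e(k + 1) = A e(k) + w(k) and a reset at every update. The sketch assumes a perfect closed-loop observer at each reset (reset error zero unless `e_reset` is given); the function name and the constant noise used in the example are hypothetical choices that make the result easy to check by hand.

```python
import numpy as np


def open_loop_error(A, e0, w, kappa_f, e_reset=None):
    """Propagate e(k+1) = A e(k) + w(k) for an open-loop observer whose
    state is reset every kappa_f steps to the closed-loop estimate.

    w: array of shape (K, n), one process-noise sample per step.
    e_reset: closed-loop estimation error used at each reset
             (assumed zero, i.e. a perfect closed-loop observer).
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    n = A.shape[0]
    e = np.asarray(e0, dtype=float).reshape(n)
    history = [e.copy()]
    for k, wk in enumerate(w, start=1):
        e = A @ e + wk
        if k % kappa_f == 0:
            e = np.zeros(n) if e_reset is None else np.asarray(e_reset, dtype=float)
        history.append(e.copy())
    return np.array(history)


# Marginally stable scalar system with a worst-case constant noise of 0.001.
w = np.full((200, 1), 0.001)
fast = open_loop_error([[1.0]], [0.0], w, kappa_f=10)
slow = open_loop_error([[1.0]], [0.0], w, kappa_f=100)
```

With A = 1, the peak error grows with the update period (about 0.009 for κ = 10 versus 0.099 for κ = 100), which is why a single update frequency cannot serve both estimation and detection.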
The above discussion shows that multiple open-loop observers are necessary for a marginally stable system with ‖A‖ > 1. In this paper, we divide the open-loop observers into M groups. Group 1 has the lowest update frequency and group M has the highest. Each group has N observers with the same update frequency. Because of the trade-off, if one group triggers an alarm, the groups with lower update frequencies should trigger alarms as well, but the groups with higher update frequencies may not. Thus, if all of the groups detect a sensor fault, the fault signal has a large slope; if only some of the groups detect it, the fault signal has a small slope.
Although the estimated state may diverge for a marginally stable system with ‖A‖ > 1, we can mitigate the impact of the process noise by averaging, because the process noise has zero mean. To average the residuals, we need to find the time steps at which the open-loop observers have similar divergence caused by system noise. Take one open-loop observer as an example: its state is updated every κ_{f,g} time steps and has been updated j_N times. At time step k + (j_N − 1)κ_{f,g}, we average the residuals at time steps k + (j_N − j)κ_{f,g} (j = 1, ..., j_N) to mitigate the impact of system noise. Proposition 4.10 validates the effectiveness of averaging.

Proposition 4.10: Given a control system (1) and an open-loop observer that is updated every κ_{f,g} time steps, the impact of the system noise on the averaged residual (32) is mitigated.

However, the number of time steps needed for averaging is about j_N · κ_{f,g}, which is large. To reduce it, we use N (N ≤ κ_{f,g}) open-loop observers in each group. We evenly distribute the update time steps of the open-loop observers in the same group over one update period, so that κ_{Δ,g} = κ_{f,g}/N, where κ_{Δ,g} is the update interval between two adjacent open-loop observers i and i + 1 in group g. We then average the residuals generated by the open-loop observers in the same group.
In order to average the residuals of the N observers, we need the following definition.

Definition 4.11 (Leading observer): The leading observer is the open-loop observer that has gone the longest without an update among all of the observers in the same group between time steps (j − 1)·κ_{Δ,g} and j·κ_{Δ,g}, where j is a positive integer. Its index H_g is determined from the position of time step k within the current update period; note that if (k − κ_{f,g}⌊k/κ_{f,g}⌋)/κ_{Δ,g} equals N, then H_g is set to 1.
To average the residuals, the first step is to find the leading observer between time steps (j − 1)·κ_{Δ,g} and j·κ_{Δ,g}. Figure 9 illustrates how we average the residuals generated by a group of three observers. Suppose we are at time step k_1, during the first update period κ_{f,g}; we simply average all of the estimated states at time step k_1. Now suppose we are at time step k_2. Observer (g, 1) has not been updated for k_2 − κ_{f,g} time steps, which is longer than observer (g, 2) (k_2 − κ_{f,g} − κ_{Δ,g}) and observer (g, 3) (k_2 − κ_{f,g} − 2κ_{Δ,g}). Therefore, observer (g, 1) is the leading observer at time step k_2. Based on this leading observer, we find the time steps at which the divergence of the other two observers is similar. Using the three corresponding estimated states, we calculate the averaged residual at time step k_2. It can be seen that the averaged residual is generated over 2κ_{f,g} time steps. In general, the averaged residual r_{avg,g}(k) at time step k is the mean of the N residuals aligned in this way.

The average of a finite number (N < ∞) of zero-mean random vectors does not exactly equal the zero vector. Based on the bounds of the system noise and the update period κ_{f,g}, a threshold θ_{MOLO,g} can be set for each group and compared with the averaged residual r_{avg,g}. Notice that θ_{MOLO,g} ∈ R^{m×1} is a vector: each element {r_{avg,g}}_j(k) of r_{avg,g}(k) is compared with the corresponding element {θ_{MOLO,g}}_j. If {r_{avg,g}}_j(k) ≥ {θ_{MOLO,g}}_j, group g triggers a fault alarm. Once the fault alarm is triggered, the states of the open-loop observers in that group are not updated by the closed-loop observer until the alarm is cleared.
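Finding the leading observer amounts to comparing how long each observer in the group has gone without an update. The closed-form index from Definition 4.11 is not reproduced in this text, so the sketch below computes the ages directly under an assumed schedule: observer i is updated at steps (i − 1)·`kappa_delta` + q·`kappa_f` (q = 0, 1, ...), where `kappa_delta` = κ_{f,g}/N is the update interval between two adjacent observers in the same group. The function name is hypothetical.

```python
import numpy as np


def leading_observer(k, kappa_f, kappa_delta, N):
    """1-based index of the observer in a group that has gone longest
    without an update at time step k.

    Assumed schedule: observer i is updated at (i - 1) * kappa_delta
    + q * kappa_f, q = 0, 1, ...; observers not yet updated are treated
    as initialised at step 0. Ties resolve to the lowest index.
    """
    ages = []
    for i in range(1, N + 1):
        offset = (i - 1) * kappa_delta
        if k < offset:
            last = 0
        else:
            last = k - ((k - offset) % kappa_f)
        ages.append(k - last)
    return int(np.argmax(ages)) + 1
```

For a group of three observers with κ_{f,g} = 30 and spacing 10, observer 1 leads at k = 55 (as in the Figure 9 discussion, once all three have been updated in the current period), while observer 2 leads at k = 65.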
Logic is applied to determine whether the system is under sensor fault or under normal operation, based on which groups trigger fault alarms. From the discussion of the trade-off, if a group triggers an alarm, the groups with lower update frequencies should, in theory, also trigger alarms. Therefore, we find the group g* with the highest update frequency among the groups that trigger fault alarms. If the majority of groups 1 to g* trigger fault alarms, i.e. if

$$\frac{1}{g^{*}} \sum_{g=1}^{g^{*}} I_{F,g}(k) \ge \theta_f \tag{36}$$

holds, then the system is under sensor fault; otherwise, the alarms are likely false and the system is under normal operation. Here, θ_f is a selected value in the range 0.5 to 1. The sensor j for which the largest number of alarming groups (g = 1, 2, ..., g*) have {r_{avg,g}}_j(k) ≥ {θ_{MOLO,g}}_j is identified as the faulty sensor.
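The voting logic, including inequality (36) and the isolation rule, can be written compactly with array operations. This is a minimal sketch: the array layout (row g − 1 holds group g, ordered from slowest to fastest update) and the function name are assumptions, and the element-wise comparison follows the text as written.

```python
import numpy as np


def molo_decision(r_avg, theta, theta_f):
    """Group-voting logic of the MOLO method.

    r_avg:   array (M, m), averaged residual of each group; row 0 is group 1
             (slowest update), row M-1 is group M (fastest update).
    theta:   array (M, m), per-group, per-sensor thresholds.
    theta_f: voting ratio in [0.5, 1].
    Returns (fault_detected, faulty_sensor) with a 1-based sensor index or None.
    """
    alarms = np.asarray(r_avg) >= np.asarray(theta)    # element-wise test
    group_alarm = alarms.any(axis=1)                    # I_F,g(k)
    if not group_alarm.any():
        return False, None
    g_star = int(np.flatnonzero(group_alarm).max()) + 1  # fastest alarming group
    if group_alarm[:g_star].mean() < theta_f:            # inequality (36) fails
        return False, None
    votes = alarms[:g_star].sum(axis=0)                  # groups flagging each sensor
    return True, int(np.argmax(votes)) + 1
```

An alarm raised only by the fastest group, with no support from slower groups, is rejected as a likely false alarm, which mirrors the trade-off argument above.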
When the system is subject to a fault on a critical sensor, the averaged residual can be expressed under the assumption that observer N is the leading observer at time step k and was last updated at time step k − k_3, with the sensor fault starting between time steps k − k_3 and k. Theorem 1 in Mo and Sinopoli (2010) provides a bound involving a small positive number related to the system noise and the initial estimation error. Therefore, the fault signal can increase the averaged residual generated by multiple open-loop observers and thus be detected by the MOLO method. If the slope of the ramp fault signal is sufficiently small, however, the fault can still bypass the MOLO method.

Remark:
The fault signal could be designed so that $\sum_{i=1}^{N} F f\big(k-(N-i)\kappa_{\Delta,g}\big) = 0$, where κ_{Δ,g} is the update interval between adjacent observers in group g, in order to bypass the multiple open-loop observers. However, this means that the fault signal must oscillate around zero every κ_{Δ,g} time steps. If the change is small, the impact of the fault is insignificant; if the change is large, the fault signal causes a significant change in the residual generated by a closed-loop observer.
Although this approach cannot guarantee the detection of a sensor fault with arbitrarily small slope, a sensor fault with a small slope would take a long time to disrupt the performance of the system. In addition, if the sensor fault is caused by a cyber attack, this long time increases the cost of the attack implementation. During this time, other techniques, such as sensor fusion, may have already detected the sensor fault.
Algorithm 3 shows the procedure of the MOLO method. At each time step, we first find the leading observer. Then we average the residuals for each group. The averaged residuals are analysed to determine the occurrence of a critical sensor fault and to isolate the faulty sensor.

Algorithm 3: MOLO method for critical sensor FDI
function MOLO
Input: y(k), u(k), x̂_0(k), I_{F,g}(k − 1), x̂_{g,i}
Output: (not recoverable)
  (leading-observer search and residual averaging steps not recoverable)
  // Fault detection and isolation
  tmp = 0  // number of groups that trigger fault alarms
  tmp_sensor,j = 0 for all j  // number of groups that flag sensor j as faulty
  for g = 1 to M do
    for j = 1 to m do
      if {r_avg,g(k)}_j ≥ {θ_MOLO,g}_j then
        I_F,g(k) = 1; g* = g; tmp_sensor,j = tmp_sensor,j + 1
      end
    end
    if I_F,g(k) = 1 then tmp = tmp + 1 end
  end
  if (1/g*) Σ_{g=1}^{g*} I_F,g(k) ≥ θ_f then
    I_F = 1; i_f = argmax_j tmp_sensor,j
  end

After the faulty sensor is detected, if the system is stable, or marginally stable with ‖A‖ = 1, we can directly use the state estimated by an open-loop observer for the state feedback controller, as indicated by Proposition 4.9. Otherwise, we need to replace the faulty sensor. Figure 10 shows the performance of the MOLO method under fault β. In this example, there are two groups of open-loop observers: group 1 has an update period of 8 s and group 2 has an update period of 2 s. Each group contains 20 observers, and their update time steps are distributed evenly within one update period. Figure 10(a) shows the averaged residuals and Figure 10(b) shows the fault alarms of the two groups. After the first update period, the averaged residual is less noisy, so the threshold of each group can be smaller. The fault is detected by group 1 at about 27 s but bypasses group 2, because the update period of group 2 is too short for this slowly growing fault signal to become significant. Overall, fault β is successfully detected by the MOLO method, in contrast to the result in Figure 6.

CCI method for non-critical sensor fault mitigation
The CCI method can potentially mitigate the impact of a fault on a non-critical sensor during the FDI process. At each time step, this method selects the closed-loop observer for which the state feedback controller gives the smallest divergence of the control input, defined as follows.

Definition 4.12 (Divergence of the control input): The divergence of the control input u_i is the absolute difference between the control input computed from closed-loop observer i and that computed from an open-loop observer.
The open-loop observer in the CCI method differs slightly from those used in the MOLO method. Since the CCI method switches among several closed-loop observers over time, the state of the open-loop observer is reset to the state estimated by the closed-loop observer used for feedback at time step k. For example, if closed-loop observer i is used for feedback at time step k, the estimated state x̂(k + 1) of the open-loop observer is calculated from the initial state x̂_i(k).
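A one-step version of this selection rule (horizon of one step) is sketched below. It assumes the state feedback law u = −K x̂ used elsewhere in the paper and measures the divergence of a vector input with the infinity norm, which is an assumption; the argument names are hypothetical.

```python
import numpy as np


def cci_select(K, A, B, u, x_hat_next, x_hat_feedback):
    """Choose the closed-loop observer with the smallest control-input divergence.

    K:              state-feedback gain (u = -K x_hat).
    x_hat_next:     list of x_hat_i(k+1), one per closed-loop observer i.
    x_hat_feedback: x_hat(k) of the observer used for feedback at step k;
                    it initialises the one-step open-loop prediction.
    Divergence: |K x_hat_i(k+1) - K x_hat_o(k+1)|, infinity norm (assumption).
    """
    x_open = A @ x_hat_feedback + B @ u          # open-loop prediction x_hat_o(k+1)
    u_open = -K @ x_open
    divergences = [float(np.max(np.abs(-K @ x - u_open))) for x in x_hat_next]
    return int(np.argmin(divergences)), divergences


K = np.array([[1.0, 1.0]])
A = np.eye(2)
B = np.zeros((2, 1))
candidates = [np.array([0.5, 0.0]), np.array([0.01, 0.0]), np.array([0.0, 0.3])]
best, div = cci_select(K, A, B, np.zeros(1), candidates, np.zeros(2))
```

Here the second observer (index 1) stays closest to the open-loop prediction and would be selected for feedback at the next step.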
First, we analyze this method for an ideal system and derive the lower bound on the fault signal above which the CCI method switches to the observer without the faulty sensor during the FDI process. Then, we analyze the impact of system noise on this lower bound.

Ideal system case
Under normal operation, the divergence of the control input calculated from a closed-loop observer is a function of its estimation error. Under sensor fault, the closed-loop observer without the faulty sensor gives the best state estimate, and thus the smallest divergence. Theorem 4.13 demonstrates that the divergence of the control input u_{i_f}(k + 1) based on closed-loop observer i_f, which excludes the faulty sensor i_f, is smaller than that based on the other closed-loop observers, which include the faulty sensor.
Proof: With a fault on sensor i_f starting at time step k, observer i_f is not affected by the faulty sensor. The estimated state x̂_i(k + 1) of an observer i (i ≠ i_f) that uses the faulty sensor and the estimated state x̂_{i_f}(k + 1) of observer i_f are

$$\hat{x}_i(k+1) = E_i \hat{x}_i(k) + L_i\big(C_i x(k) + F_i f(k)\big) + Bu(k), \qquad \hat{x}_{i_f}(k+1) = E_{i_f} \hat{x}_{i_f}(k) + L_{i_f} C_{i_f} x(k) + Bu(k).$$

Since the initial state of the open-loop observer is the same as the estimated state of the observer used for feedback at time step k, two cases should be considered: (1) at time step k, observer i (i ≠ i_f) is used for feedback, so x̂_o(k + 1) = A x̂_i(k) + Bu(k); (2) at time step k, observer i_f is used for feedback, so x̂_o(k + 1) = A x̂_{i_f}(k) + Bu(k). Under case (1), the divergences of the control input of observer i_f and observer i (i ≠ i_f) are given by Equations (43) and (44), respectively.
Thus, when the fault signal satisfies the lower bound in Equation (39), observer i_f gives the smallest divergence of the control input and is selected to provide feedback for the state feedback controller at time step k + 1. The same result holds for case (2).
Based on Theorem 4.13, when the system is under a non-critical sensor fault and the fault signal satisfies Equation (39), the CCI method can switch to the observer without the faulty sensor before the faulty sensor is identified. If the magnitude or the slope of the fault signal is too small, the CCI method may not be able to select the observer without the faulty sensor to mitigate the impact of the fault. Moreover, the lower bound of the fault signal during the observers' transient state is larger than during steady state, because of the relatively large estimation error. To reduce the lower bound of the fault signal, a horizon size κ_CCI is introduced so that the divergence of the control input reflects the integral of the fault signal over κ_CCI steps. At each time step k, we therefore recalculate the state of the open-loop observer, starting from the estimated state x̂_i(k + 1 − κ_CCI) of the closed-loop observer selected at time step k + 1 − κ_CCI. The resulting divergences of the control input of observers i_f and i yield a lower bound on the integral of the fault signal.

If the fault starts between time steps k + 1 − κ_CCI and k, e_{i_f}(k + 1 − κ_CCI) and e_i(k + 1 − κ_CCI) are very small. In addition, the eigenvalues of E_{i_f} and E_i have magnitudes smaller than 1. Increasing the horizon κ_CCI and placing the observer poles closer to the origin can therefore reduce both estimation-error terms in this bound. However, we need to consider three cases: A is stable, marginally stable, or unstable. If the open-loop system is stable or marginally stable, i.e. the eigenvalues of A lie inside or on the unit circle, the term K A^{κ_CCI} e_{i_f}(k + 1 − κ_CCI) is bounded. Thus, increasing κ_CCI can reduce the lower bound of the fault signal and increase the ability of the CCI method to select the observer without the faulty sensor. If the open-loop system is unstable, i.e. one or more eigenvalues of A lie outside the unit circle, the term K A^{κ_CCI} e_{i_f}(k + 1 − κ_CCI) diverges, which reduces the ability of the CCI method. Therefore, the selection of the optimal horizon κ_CCI depends on the properties of the physical system.
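The effect of the horizon on the term K A^{κ_CCI} e_{i_f}(k + 1 − κ_CCI) can be seen by evaluating its norm for a few horizons. The scalar matrices in the example are hypothetical and only contrast a stable A with an unstable one.

```python
import numpy as np


def horizon_term_norms(K, A, e, horizons):
    """Norm of K A^kappa e for each horizon kappa (kappa_CCI).

    For stable A the term shrinks as kappa grows; for unstable A it grows,
    which limits how large kappa_CCI can usefully be.
    """
    K, A, e = (np.asarray(a, dtype=float) for a in (K, A, e))
    return [float(np.linalg.norm(K @ np.linalg.matrix_power(A, kappa) @ e))
            for kappa in horizons]


stable = horizon_term_norms([[1.0]], [[0.5]], [1.0], [1, 2, 3])
unstable = horizon_term_norms([[1.0]], [[2.0]], [1.0], [1, 2, 3])
```

The stable case halves at each extra step (0.5, 0.25, 0.125), while the unstable case doubles (2, 4, 8), matching the argument that a longer horizon only helps when A is stable or marginally stable.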

Noisy system case
With system noise, the lower bound of the fault signal is increased as shown in Lemma 4.14 (the horizon step κ CCI is not considered in Lemma 4.14).
Lemma 4.14: Given a control system (1), and a sensor fault starting at time step k on sensor i f , observer i f gives the smallest divergence of the control input if the lower bound of the fault signal satisfies Equation (48).
The proof is similar to that of Theorem 4.13. The transient dynamics caused by switching among observers may degrade the performance of the control system (Liberzon and Morse, 1999). To avoid frequent switching, a threshold θ_CCI is used to decide when to enable or disable switching. θ_CCI should be selected to balance the switching frequency against the ability to mitigate the impact of the sensor fault.
Algorithm 4 gives the procedure of the CCI method. At each time step, the CCI method calculates the estimated state of an open-loop observer whose initial state is the estimated state of the observer selected at time step k + 1 − κ_CCI. It then switches to the observer that gives the smallest divergence of the control input, if switching is enabled. Figure 11 shows the system with the CCI method under sensor fault α. The maximum absolute value of the position under sensor fault is 4 cm, which is smaller than that with the CO method shown in Figure 5(a). During the detection delay (2 s), the CCI method has already switched to observer 1 for state estimation at 13 s, thus mitigating the impact of the sensor fault.

Integration of CO, CR, MOLO and CCI methods
The three new methods, CR, MOLO, and CCI, have been introduced above and compared with the CO method through simulation, showing the following improvements:
• The CR method enables fault detection during the observers' transient state and generates no false alarms, unlike the CO method;
• The MOLO method successfully detects the critical sensor fault, while the CO method fails;
• The CCI method switches to the observer without the faulty sensor during the FDI process, and the position of the object during sensor fault is reduced to 0.04 m, compared to 0.3 m with the CO method.

[Algorithm 4: CCI method for non-critical sensor fault mitigation]
We systematically integrate all of the above methods to exploit their respective advantages, improving the overall performance of FDI and fault mitigation. Algorithm 5 shows the integration of the CO, CR, MOLO, and CCI methods. At each time step, the CCI method is used to mitigate the impact of a potential sensor fault. Then the CR method determines whether there is a faulty sensor in the system. If the CR method flags an alarm and the observers have reached their steady state under normal operation (k > k_ss, where k_ss is the number of time steps needed for the observers to reach steady state), the CO method is used to isolate the faulty sensor, and the system switches to the observer that can mitigate the impact of the sensor fault once the faulty sensor is isolated. Meanwhile, the MOLO method detects whether there is a fault on a critical sensor. Robust control design in the presence of a disturbance is not within the scope of this paper.

Algorithm 5 (partially recovered; earlier steps are missing):
  x̂_i(k + 1) = E_i x̂_i(k) + L_i y_i(k) + Bu(k)
  // FDI and mitigation begins
  ...
  [..., I_{F,g}(k), i_f, x̂_{g,i}(k + 1)] = MOLO(y(k), u(k), x̂_0(k), I_{F,g}(k − 1), x̂_{g,i})
  ...
  if I_F = 1 and k ≥ k_ss then
    [I_F, i_f] = CO(y(k), x̂_i)
    I_FB(k + 1) = i_f
    u(k + 1) = −K x̂_{I_FB(k+1)}(k + 1)
  else if I_{F,g} = 1 for any g then
    if A is stable or (A is marginally stable and ‖A‖ ≤ 1) then
      u(k + 1) = −K x̂_{g,1}(k + 1)
    else
      Replace the faulty sensor i_f
    end
  else
    Robust control to tolerate disturbance
  end

Illustrative example
The suspension system shown in Figure 12 has five states: the position h_1 of mass 1, the velocity ḣ_1 of mass 1, the distance h between the two masses, the velocity ḣ, and the integral of h, which is used to achieve zero steady-state error. The five states are measured directly by five sensors, as shown in Table 2. A controller acts on the system through u. A potential disturbance comes from the ground. The control objective is to keep h at 0 m, which is also the reference signal of this system. The system has a sampling time of 0.01 s, a process noise bound of 0.001 (m or m/s), and a sensor noise bound of 0.01 (m or m/s). The observers' transient state lasts about 0.1 s (10 time steps). The initial state of the system is (0, 0, 0, 0, 0), and the initial state of the observers is (0.02, 0.01, 1, 0, 0). Table 3 shows some of the parameters of the four methods.
Four scenarios are considered as examples:
• Scenario 1: a ramp fault signal with slope 1 m/s (0.01 m per time step) added to sensor 3, saturating at 10 m;
• Scenario 2: a ramp fault signal with slope 1 m/s (0.01 m per time step) added to sensor 3, saturating at 10 m;
• Scenario 3: a ramp fault signal with slope 0.01 m/s (0.0001 m per time step) added to sensor 5, saturating at 10 m;
• Scenario 4: a step disturbance from the ground with magnitude 0.2 m, starting at t = 30 s.
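The saturating ramp faults of Scenarios 1 to 3 can be generated as shown below. This is a sketch with a hypothetical function name; it covers only the ramp faults (not the step disturbance of Scenario 4), and the start time is passed in as a parameter because it differs between scenarios.

```python
import numpy as np


def ramp_fault(t, t_start, slope, saturation=10.0):
    """Saturating ramp fault added to one sensor: 0 before t_start, then
    slope * (t - t_start), capped at `saturation` (in the sensor's units)."""
    t = np.asarray(t, dtype=float)
    return np.clip(slope * (t - t_start), 0.0, saturation)


# Whole fault trajectory for a ramp starting at 30 s, sampled every 0.01 s.
t = np.arange(0.0, 60.0, 0.01)
f_fast = ramp_fault(t, 30.0, 1.0)    # slope 1 m/s
f_slow = ramp_fault(t, 30.0, 0.01)   # slope 0.01 m/s
```

With the 0.01 s sampling time, the per-step increments are 0.01 m and 0.0001 m, matching the values listed for the scenarios, and the 1 m/s ramp saturates at 10 m after 10 s.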
The faults in Scenarios 1 and 3 start at t = 30 s; the fault in Scenario 2 starts at t = 0.05 s. Figure 13 shows the system under a non-critical sensor fault occurring during the observers' steady state. During the observers' transient state, the CR method eliminates false alarms, as shown in Figure 13(b). When the sensor fault occurs, the CCI method switches to observer 3 for feedback, as shown in Figure 13(c), allowing more time for FDI. The CR method triggers an alarm after detecting the sensor fault. The CO method isolates the faulty sensor and calculates the fault signal, as shown in Figure 13(d). The proposed algorithm integrating the four methods successfully protects the system from a non-critical sensor fault occurring during the observers' steady state. Figure 14 shows the system under a non-critical sensor fault occurring during the observers' transient state.
The CR method detects the sensor fault with a delay of about 0.06 s, as shown in Figure 14(b); this delay is caused by the relatively large θ_CR (0.9) compared with the observer poles (about 0.1). The CCI method switches to the observer without the faulty sensor later than the time step at which the CR method detects the fault. This is because the observers cannot provide good state estimates during their transient state, so the observer without the faulty sensor may not give the smallest divergence of the control input. This scenario shows the effectiveness of the CR method for fault detection during the observers' transient state. Figure 15 shows the system subject to a critical sensor fault. In Figure 15(b), the averaged residuals are less noisy after the first update period. Group 1 successfully detects the sensor fault, while group 2 does not. This scenario shows the effectiveness of the MOLO method for critical sensor FDI. Figure 16 shows the system under disturbance from the ground. The CR method successfully distinguishes the disturbance from a sensor fault and correctly estimates the disturbance signal.

Conclusions and future work
In this paper, the CO method and three new methods, the CR, MOLO, and CCI methods, are integrated to solve the FDI and mitigation problem using multiple closed-loop and open-loop observers. The closed-loop observers include one that uses all of the sensor measurements for state estimation, and others that each exclude a non-critical sensor. Based on these two types of observers, new methods are proposed and integrated to solve various problems:
• the CR method can detect non-critical sensor faults during the observers' transient state;
• the MOLO method can detect and isolate critical sensor faults; and
• the CCI method can mitigate the impact of non-critical sensor faults during the FDI process.
The CR method uses the CRs of the observers' estimation errors to determine whether or not there is a non-critical sensor fault. The CRs of the observers' estimation errors are not affected by the uncertain initial condition. Therefore, the CR method can reduce false alarms during the observers' transient state. To achieve robust FDI, bias analysis is used to distinguish a sensor fault from a disturbance.
The MOLO method uses a bank of open-loop observers, which do not use sensor measurements for state estimation, to detect and isolate critical sensor faults. The states of the open-loop observers are updated periodically by the closed-loop observer that uses all of the sensor measurements. Because of the trade-off between estimation performance and the ability to detect a sensor fault, the open-loop observers are divided into several groups. Within a group, the open-loop observers share the same update frequency, but their update time steps are evenly distributed over one update period. The residuals generated by the observers in the same group are averaged, and the averaged residuals of the different groups are then analysed to determine the occurrence of a sensor fault and to locate the faulty sensor.
The CCI method switches among different closed-loop observers to potentially mitigate the impact of non-critical sensor faults during the FDI process. At each time step, it selects the closed-loop observer that gives the smallest divergence of the control input for state estimation at the next time step.
The three new methods are integrated with a previously developed residual-based method (the CO method) to collaboratively address the FDI and mitigation problem. The collaboration of the methods is illustrated in Figure 2(a) and Table 1. Besides the CO method, any residual-based method can be integrated into the proposed algorithm. Simulation results show the effectiveness of the proposed framework.
This multi-observer approach can be easily extended to the case of multiple sensor faults, as long as the system remains observable without the faulty sensors. However, at a high level, the proposed framework has some limitations. There is currently no method to detect a critical sensor fault during the observers' transient state, and none of the methods can mitigate the impact of a critical sensor fault.
Other limitations of the proposed framework include an inability to detect a ramp sensor fault with an arbitrary slope, and the requirement of a lower bound on the magnitude of the fault signal for detection. In some cases, our framework cannot distinguish a sensor fault from sensor noise; addressing this issue is a topic of future work. Sensor fusion, statistical analysis, and machine learning methods are potential solutions to this problem.
Each of the proposed methods presents opportunities for future work. The fault detection of the CR method is very sensitive to sensor noise. The threshold $\theta_{CR}$, against which the CR is compared, is selected to be much larger than the observer poles to reduce false alarms during the observers' steady state. Therefore, the CR method cannot detect sensor faults that make the CR smaller than, or only slightly larger than, the observer poles. If the CR method could be made robust to sensor noise, upper and lower bounds on the CRs could be set to address more sensor faults. The bias analysis of the CR method is also sensitive to system noise. Future work could use several robust observers for the CR method, which would require decoupling the estimation errors based on the observers' estimated states.
The MOLO method, used for critical sensor FDI, does not work for open-loop unstable systems. Techniques such as sensor fusion could be exploited to protect unstable systems from critical sensor faults.
The CCI method does not perform well if the sensor fault occurs during the observers' transient state, as shown by the suspension system example in Section 5, because the estimation error is then relatively large. This issue could potentially be addressed by combining the CR and CCI methods, because the CR method can decouple the estimation errors of the observers. Finally, the optimal horizon step, which could reduce the lower bound of the fault signal, is unknown; a cost function for determining it should be proposed in future work.
In general, it may be impossible to detect every kind of sensor fault. The aim of our sensor FDI and mitigation method is to decrease the lower bound on the magnitude of the sensor faults that can be detected, and to allow more time for other techniques to protect the system before it enters a severe condition.

Notes
1. The reason we use Luenberger observers (or Kalman filters) is that they allow us to decouple the observers' estimation errors for the CR method.
2. $Q_i$ should be designed so that the elements $\{y_i(k) - C_i\hat{x}_i(k)\}_j$ ($j \in S_{nc}$), where $j$ corresponds to the critical sensors, have larger weighting ratios than the elements corresponding to non-critical sensors.
3. The demonstration is given in Theorem A.1.
4. The initial estimation errors of the observers are large, to help illustrate the limitations of a residual-based method using closed-loop observers during the observers' transient state.
(2) Under a sensor fault, the estimation error cannot be calculated correctly. Therefore, $\tilde{e}_{\mu(\nu)}$ in Equation (17) and $e_\mu$ in Equation (12) are substituted into Equation (21) to calculate the difference between the two biases based on two observers,

A.7 Proof of Proposition 4.10
Proposition 4.10: Given the control system (1), let an open-loop observer be updated every $\kappa_{f,g}$ time steps. Then the impact of the system noise on the averaged residual (A24) is mitigated,
where $j_N$ is a positive integer.
Proof: Since the process noise and the sensor noise are zero-mean vectors,
The residual generated by a single open-loop observer over one update period is
Then the averaged residual is
$r_{avg,g}(k + (j_N - 1)\kappa_{f,g})$
If $j_N \to \infty$, then
Therefore, the impact of the system noise is mitigated.

A.8 Table of notations in the paper
The difference of estimated states of two observers
$\tilde{e}_{\mu(\nu)}$, $\bar{e}_{\mu(\nu)}$: Estimation error of observer $\mu$ calculated based on observers $\mu$ and $\nu$, and its upper bound
$\tilde{e}_{,\mu(\nu)}$: The calculated estimation error of observer $\mu$ after changing the coordinates
$e_\mu$: Overall estimation error of observer $\mu$, which is a function of $\tilde{e}_{\mu(\nu)}$, $\nu = 0, 1, \ldots, m_o \wedge \nu \neq \mu$
$e_{,\mu}$: Overall estimation error of observer $\mu$ after changing the coordinates
$\{cr_i\}_j$: CR of the $j$th state estimation error of observer $i$
$\tilde{d}_{\mu(\nu)}$: The bias based on the calculated estimation error $\tilde{e}_{\mu(\nu)}$
$d_{,\mu(\nu)}$, $d_{\mu(\nu)}$: The bias based on the calculated estimation error $\tilde{e}_{,\mu(\nu)}$, and its upper bound
$\kappa_{CR}$: Time steps for the CR method
$\theta_{CR}$: Threshold to determine the occurrence of an anomaly
$\theta_{d,\mu(\nu),\zeta(\eta)}$: Threshold to distinguish a sensor fault from a disturbance
$\phi_\nu$: Weighting ratio of the calculated estimation error $\tilde{e}_{\mu(\nu)}$
$\psi_{\mu(\nu)}$: Weighting ratio of the calculated bias $d_{,\mu(\nu)}$

MOLO Method
$M$, $N$: The number of open-loop observer groups and the number of open-loop observers in one group
$\kappa_{f,g}$, $\kappa_{,g}$: Update period, and update interval between two adjacent open-loop observers, for group $g$
$r_{g,i}$: Residual signal of observer $i$ in group $g$
$H_g$, $r_{avg,g}$: Leading observer, averaged residual in group $g$
$\theta_{MOLO,g}$: Threshold for the MOLO method

CCI Method
$\kappa_{CCI}$, $\theta_{CCI}$: Horizon window, threshold for the CCI method
$u_i$: Control input difference of closed-loop observer $i$

$\{\cdot\}_j$: The $j$th element of a vector, the $j$th row of a matrix, the $j$th diagonal element of a diagonal matrix
$\{\cdot\}_{j_1,j_2}$: The element at the $j_1$th row and the $j_2$th column of a matrix
$\|\cdot\|$: The infinity norm $\|\cdot\|_\infty$