Graphical Influence Diagnostics for Changepoint Models

Changepoint models enjoy a wide appeal in a variety of disciplines to model the heterogeneity of ordered data. Graphical influence diagnostics to characterize the influence of single observations on changepoint models are, however, lacking. We address this gap by developing a framework for investigating instabilities in changepoint segmentations and assessing the influence of single observations on various outputs of a changepoint analysis. We construct graphical diagnostic plots that allow practitioners to assess whether instabilities occur; how and where they occur; and to detect influential individual observations triggering instability. We analyze well-log data to illustrate how such influence diagnostic plots can be used in practice to reveal features of the data that may otherwise remain hidden.

Since then, changepoint models have been actively investigated (see Eckley et al., 2011 for an overview) with most studies focusing on the development of changepoint detection algorithms (e.g., the PELT method of Killick et al., 2012 or wild binary segmentation of Fryzlewicz, 2014) and inference methods (e.g., Wu and Matteson, 2020 for univariate changepoint detection in the presence of local outliers, Grundy et al., 2020 for a recent multivariate change in mean and variance method).
Recent literature is starting to consider issues that arise when applying offline changepoint techniques in practice, such as the impact of where changepoints are incorporated into the inference pipeline: pre-modelling as a data cleaning process, within the main modelling framework, or as a post-modelling diagnostic on residuals from a fitted model (Chapman and Killick, 2020). However, influence diagnostics, an integral part of any data analysis, have been overlooked for offline changepoint analyses. Yet, diagnostic work is vital to enable analysts and practitioners to detect (i) potential problems with the changepoint model, (ii) how and where they occur and (iii) what may trigger them. Providing such diagnostic tools is crucial to ensure that the potential of changepoint models is fully realized outside of the academic domain and to help practitioners develop intuition, discover features of the data that may otherwise remain hidden and, in the end, make more informed decisions (Rajaratnam et al., 2019).
In this paper, we present a unified influence framework for offline changepoint models that is fully aligned with this articulated need. The graphical influence diagnostic tools we develop are the first to highlight instabilities in changepoint models and assess the influence of single observations on the stability of the changepoint segmentation and corresponding segment parameters. We devise these plots through automated procedures, available on CRAN in the R (R Core Team, 2017) package changepoint.influence, to bring the importance of influence diagnosis to the attention of researchers in the changepoint community and to stimulate their widespread usage amongst practitioners.
Model instabilities are well-known statistical problems and influence diagnostics are essential to detect them and to investigate the role of single observations that give rise to these instabilities. We call observations whose alteration changes the resulting changepoint segmentation, and which thus give rise to model instabilities, influential. Influence diagnostics have a long-standing history in regression analysis; see early studies by, amongst others, Cook (1979) and Belsley et al. (1980), who assess the effect of single observations on coefficient estimates in low-dimensional settings, or more recent work by Hellton et al. (2019), Rajaratnam et al. (2019) and Zhao et al. (2019), who adapt diagnostic tools to high-dimensional regression settings. Developing influence diagnostics for changepoint analysis is arguably even more compelling than it is for regression analysis since influential observations can affect not only parameter estimates (for instance, segment means) but also the entire changepoint model.
Our contributions to the changepoint literature are twofold. First, we introduce a new framework for diagnosing influential observations within changepoint models. We propose two types of influence diagnostics. Both types alter individual data points and evaluate how, and to what extent, such alterations induce differences in various outputs of the changepoint analysis. They differ in the way the data are altered. In one extreme case, we alter data points by deleting them, one by one, thereby following the intuitive and popular deletion diagnostics (Belsley et al., 1980) for regression analysis. In the other extreme case, we alter data points by contaminating them such that each point forms a segment on its own, thereby building on the idea of empirical influence functions used in robust statistics (e.g., Hampel et al., 2011 for an overview or Pison and Van Aelst, 2004 for diagnostic plots).
In Section 3, we will see that the two proposed types of influence diagnostics provide complementary views. Secondly, we equip researchers, analysts and practitioners working with changepoint models with a set of diagnostic plots. These plots help to visualize the output of the influence diagnostics and identify whether the original segmentation is vulnerable to instabilities. If so, in-depth plots clearly depict how and where these instabilities manifest. More detailed follow-up visual tools then aim to identify single observations that trigger these instabilities to arise and assess their influencing role.
The remainder of this article is structured as follows. In Section 2, we present a motivating example for the development of changepoint influence diagnostics. We introduce the framework for diagnosing changepoint models in Section 3. In Section 4, we present the influence diagnostic plots that guide practitioners in answering various diagnostic questions. We demonstrate the usage of our graphical influence diagnostics on an application to well-log data in Section 5. Finally, in Section 6, we summarize our contributions and propose several directions for future work.

Motivating Example
We present a motivating example to illustrate that changepoint segmentations can be highly sensitive to individual data points, thereby calling for appropriate influence diagnostics to identify and assess these various sources of instability and data influence on them.
Well-log data. We consider the problem of detecting changes in well-log data (Ruanaidh and Fitzgerald, 2012). Figure 1a displays n = 1000 measurements from a probe that is lowered into a bore-hole. The probe takes measurements of the nuclear-magnetic response of the rock that it is passing through. Abrupt changes occur in the measurements as the probe moves from one type of rock strata to another. Changepoint analysis is used to detect these rock strata. While online changepoint detection can be used to modify the settings of the drill in (near-)real time, we focus on influence diagnostics that are suitable for offline changepoint analysis. The well-log data are particularly suited for this purpose as they exhibit several interesting features for influence diagnosis, as discussed below.
Changepoint segmentation. Several changepoint methods have been used to detect changes in the well-log data (Fearnhead, 2006, and Ruanaidh and Fitzgerald, 2012).
[Figure 1, caption fragment: panel (c) highlights the span of changes to the segmentation compared to panel (a).]
Following Fearnhead and Rigaill (2019), we focus throughout the paper on the minimum penalized cost approach to detect changes in the mean and use a Normal likelihood test statistic and the Pruned Exact Linear Time (PELT) algorithm (Killick et al., 2012), available in the R package changepoint (Killick and Eckley, 2014), to detect these changes.
Details about this approach are included in Appendix A of the Supplemental Material.
This is merely an illustrative example of a changepoint model as our framework is more broadly applicable, as will be discussed in Section 6.
In Figure 1a, we visualize in dashed (blue) lines the changepoints detected from a change in mean model: 19 changes are detected. The segments vary considerably in length, ranging from segments containing as many as 171 observations (last segment) to single observation segments (e.g., two segments in between observations 219 and 221). The latter are very low measurements that occur due to malfunctioning of the probe and can be highly influential.
Indeed, if a data point is sufficiently extreme compared to its neighbors, it occurs in a segment of its own (see Fearnhead and Rigaill, 2019 and Proposition B.1 in the Appendix).
Changepoint stability and data influence. While the well-log data have been extensively analyzed through various changepoint methods, the stability of the obtained changepoint segmentation and the influence of single observations on it are less well understood. To illustrate this, consider the influence of two types of data alterations on the obtained segmentation. [...]
Such data alterations raise several general diagnostic questions that practitioners might be concerned with. We present three main diagnostic questions, each motivating the need for a particular type of influence diagnostic, as will be discussed in Section 4:
Q1. Is the output of the changepoint analysis stable or vulnerable to data instabilities?
Q2. If vulnerable, how and where do the instabilities manifest?
Q3. Which single influential observations trigger these instabilities to arise and how so?
The first two questions aim to assess the stability of various outputs of the changepoint analysis: For instance, which changepoints are sensitive to the data at hand and does this sensitivity raise questions on their occurrence and/or their location? The third question digs deeper into the influential role of single observations on the various output measures.
An important remark is that influential observations need not be seen as harmful to the analysis, in the sense of measurement errors or extreme/atypical data points, but can be seen as data points that are highly relevant for obtaining the segmentation at hand (Serneels et al., 2005).

Framework for Diagnosing Changepoint Models
In this paper, we consider observed sequences of data, $y_1, \ldots, y_n$, and assume a changepoint analysis has been performed on such a sequence resulting in $m$ identified changepoints at ordered locations $\tau_1, \ldots, \tau_m$. This segmentation splits the data into $m + 1$ independent segments, the $i$th of which contains the data points $y_{\tau_i + 1}, \ldots, y_{\tau_{i+1}}$, using the convention that $\tau_0 = 0$ and $\tau_{m+1} = n$. Our goal is to develop a general framework for assessing a segmentation's (in)stability, and understanding the role of each observation $y_t$, $t = 1, \ldots, n$, on the estimated segmentation. The framework is presented in this section. In Section 4, we explain how to use the newly constructed influence diagnostic plots for these purposes.
The proposed framework allows us to identify and analyze both global and local instabilities. To detect changepoint (in)stabilities and quantify the effect of a single data point on them, we follow the procedure of consecutively rolling through all data points and at each time either deleting or contaminating a particular observation. This way, we identify (1) whether various output measures of the changepoint analysis are stable or differ substantially after altering individual data points; (2) how and where instabilities arise: locally, only affecting the segment the altered data point is within, or globally, thereby affecting other segments as well; (3) the individual influential data points triggering these instabilities.

Rolling Procedure
For each data point t = 1, . . . , n in this rolling procedure, two new segmentations are obtained. The first new segmentation, we call the "Observed Segmentation". Here, we simply re-run the changepoint method on the altered (deleted or contaminated) data and record the "Observed Segmentation" obtained. The second segmentation, we call the "Expected Segmentation". This segmentation corresponds to our expectation regarding the change in a particular output measure of the changepoint analysis when either deleting a data point or contaminating it. Detailed results for the expected segmentations under a penalized cost approach for a change in mean are provided in Appendix B.
While all influence diagnostic plots are constructed from the observed segmentation, only the Location Stability plot and the Influence Map rely on the comparison between the observed and expected segmentation. The idea is that these plots should directly highlight unusual behavior rather than changes to the segmentation we expect to see due to the data alteration. We therefore do not display the expected changes but instead compare what is observed to what is expected under a penalized cost approach such that only changes beyond the expected ones are displayed. As such, we draw a practitioner's attention to insightful discrepancies that help to assess how and where instabilities manifest (Location Stability plot) and quantify the influence of individual data points on a changepoint's (in)stability (Influence Map). The difference between both segmentations presents evidence of a single data point's influence beyond what is to be theoretically expected, which a practitioner can interpret as influence.
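The rolling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation in changepoint.influence: the function names, the `detect` callback (any routine returning changepoint locations as the 1-based index of the last point of each segment), the size of the contamination and the toy detector are all hypothetical choices made for the example.

```python
from typing import Callable, List, Optional, Sequence


def labels_from_changepoints(n: int, cpts: Sequence[int]) -> List[int]:
    """Convert changepoint locations (last index of a segment, 1-based)
    into segment numbers 1, 2, ... for a series of length n."""
    cset = set(cpts)
    labels, current = [], 1
    for i in range(1, n + 1):
        labels.append(current)
        if i in cset:
            current += 1
    return labels


def rolling_observed(
    y: Sequence[float],
    detect: Callable[[Sequence[float]], Sequence[int]],
    mode: str = "delete",
) -> List[List[Optional[int]]]:
    """Observed segmentation for every altered data point.

    Row t holds the segment numbers obtained after altering y[t];
    a deleted point is marked with None (the NA of the text).
    """
    n = len(y)
    # Assumed contamination size: far outside the observed range.
    shift = 10.0 * (max(y) - min(y)) + 1.0
    rows = []
    for t in range(n):
        if mode == "delete":
            altered = list(y[:t]) + list(y[t + 1:])
            labels = labels_from_changepoints(n - 1, detect(altered))
            labels.insert(t, None)
        elif mode == "outlier":
            altered = list(y)
            altered[t] = max(y) + shift
            labels = labels_from_changepoints(n, detect(altered))
        else:
            raise ValueError("mode must be 'delete' or 'outlier'")
        rows.append(labels)
    return rows


def toy_detect(y: Sequence[float]) -> List[int]:
    """Stand-in detector for illustration only: flags any jump larger than 1."""
    return [i for i in range(1, len(y)) if abs(y[i] - y[i - 1]) > 1]


deleted = rolling_observed([0, 0, 0, 5, 5, 5], toy_detect, mode="delete")
contaminated = rolling_observed([0, 0, 0, 5, 5, 5], toy_detect, mode="outlier")
```

In practice `detect` would wrap the changepoint method under study; the toy detector is only there to make the sketch runnable.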

Deleting Observations
Deletion diagnostics have been the subject of extensive research in the context of regression analysis and date back to Cook's distance (Cook, 1979), which measures the influence of single observations on various aspects of the fitted regression model, including and excluding the observation in question. For changepoint models, a diagnostic analysis based on deleting data points is complicated by two facts: (i) individual data points can, in addition to parameter estimates as in traditional regression analysis, affect the entire changepoint model through the number of changepoints and their location; (ii) while the total number of observations n might be considerable, each segment contains only a (sometimes small) fraction of the total sample size, hence an individual observation can not only have a potentially tremendous local influence on the segment to which it belongs but this influence might also spill over globally to other segments. This calls for new deletion diagnostics for detecting changepoint (in)stabilities and understanding the influential role of single data points on them.
Inspired by these intuitive and popular deletion diagnostics, we alter data points by deleting them, one by one, and assess the relative change in various outputs of the changepoint analysis (i.e., number of changepoints, changepoint location, segment parameters) to demonstrate the influence, or not, of the deleted data points. We show in Appendix B that the segmentation expected under a single data point deletion remains the same as the original one unless the data point belonged to a segment of length one. In practice, we implement the deletion approach as follows. By way of example, consider a changepoint segmentation, 1 1 1 2 3 3 3. When we leave out the first observation the expected segmentation then becomes NA 1 1 2 3 3 3. However, when we leave out the fourth observation, the expected segmentation is 1 1 1 NA 2 2 2. Note that we re-number the segments to ensure that two neighbouring segments differ in their numbering by one.
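The worked example above can be reproduced with a short Python sketch. The function name is hypothetical and positions are 0-based in the code; the rule it encodes is only the one stated in the text (drop the deleted point, mark it as NA, and re-number the remaining segments consecutively).

```python
from typing import List, Optional


def expected_after_deletion(seg: List[int], t: int) -> List[Optional[int]]:
    """Expected segment numbers after deleting the point at 0-based position t.

    The segmentation is unchanged unless the deleted point formed a segment
    of length one, in which case that segment vanishes and later segments
    are re-numbered. The deleted position is returned as None (NA).
    """
    remaining = [s for i, s in enumerate(seg) if i != t]
    out: List[Optional[int]] = []
    label, previous = 0, None
    for s in remaining:
        if s != previous:  # a new segment starts here
            label += 1
            previous = s
        out.append(label)
    out.insert(t, None)
    return out


original = [1, 1, 1, 2, 3, 3, 3]
first_deleted = expected_after_deletion(original, 0)   # NA 1 1 2 3 3 3
fourth_deleted = expected_after_deletion(original, 3)  # 1 1 1 NA 2 2 2
```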

Contaminating Observations
As an alternative to deletion diagnostics, empirical influence functions are commonly used in robust statistics to determine, on a sample-specific basis, the influence of each data point on parameter estimation or prediction (Hampel et al., 2011). To this end, the effect of an infinitesimal contamination at a certain data point on a statistical functional of interest is measured and used as a diagnostic tool to assess its influence. It is hereby crucial to stress the discrepancy between influence and extremeness. While both properties coincide in the detection of influential outliers (i.e., atypical data points), in general, non-outlying influential data points as well as non-influential outliers do exist (Serneels et al., 2005).
Inspired by techniques from robust statistics, we alter data points one by one such that each point is made atypical/outlying and assess the relative change in various changepoint outputs of this contaminated data point. Fearnhead (2006) showed that the segmentation expected under this data alteration corresponds to the segmentation obtained on the original data with two extra changes added before and at the contaminated position. Being different from the bulk of the data, a contaminated point thus warrants its own segment.
However, only one extra change occurs if we are close to an original changepoint. Coming back to our earlier example (i.e., segmentation 1 1 1 2 3 3 3), when we contaminate the first observation the expected segmentation becomes 1 2 2 3 4 4 4, hence there are four segments in total instead of three. However, when we contaminate the second observation, the expected segmentation is 1 2 3 4 5 5 5, thereby including two additional changes.
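The contamination example can be sketched in the same way. The function name is hypothetical and positions are 0-based; the code simply adds a change immediately before and at the contaminated position, so that fewer than two new changes result when one of them coincides with an existing boundary.

```python
from typing import List


def expected_after_contamination(seg: List[int], t: int) -> List[int]:
    """Expected segment numbers after contaminating the point at 0-based
    position t, which is placed in a segment of its own."""
    n = len(seg)
    # Positions at which a new segment starts in the original segmentation.
    starts = {i for i in range(1, n) if seg[i] != seg[i - 1]}
    if t > 0:
        starts.add(t)        # change just before the contaminated point
    if t + 1 < n:
        starts.add(t + 1)    # change at the contaminated point
    out, label = [], 1
    for i in range(n):
        if i in starts:
            label += 1
        out.append(label)
    return out


original = [1, 1, 1, 2, 3, 3, 3]
first_contaminated = expected_after_contamination(original, 0)   # 1 2 2 3 4 4 4
second_contaminated = expected_after_contamination(original, 1)  # 1 2 3 4 5 5 5
```

Contaminating a point that already sits in a segment of length one (position 3 in the example) leaves the segmentation unchanged under this rule.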
These two ways of altering observations provide two extreme perspectives: on the one hand, when deleted, the segmentation reveals what would have happened had the observation not been observed. On the other hand, when contaminated, the point is simultaneously both maximally influential (it has its own segment) and minimally influential (in its own segment it does not directly contribute to other segments). Curiously the contaminated point also forces a shortening of the segments on either side, thus allowing one to identify if a change is sensitive to the length of the segment it is within. Thereby, both present complementary views on the stability of changepoint segmentations and allow us to better grasp the overall influence of single observations.

Influence Diagnostic Plots
We create a set of four diagnostic plots which range from coarse level to detailed, namely the "Stability Dashboard", "Segment Location Stability" and "Segment Parameter Stability" plots, and finally the "Influence Map". The different plots each aim to tackle a specific diagnostic question (see Figure 2), making the choice of an appropriate plot crucial for highlighting a particular aspect of the influence diagnosis for changepoint models. Practitioners should choose the most appropriate level of detail for the data set and question they are considering. All plots rely on the rolling procedure discussed in Section 3 and are constructed for both the case where data points are consecutively deleted and contaminated. The influence diagnostic plots can be created via the R package changepoint.influence.
To illustrate the usage of the plots and provide guidance on how to interpret their various features, we make use of a simulated data example. We generate an ordered sequence of

Segment Location Stability
We next move to the second level to assess how and where instabilities in the location of the changepoints occur. To this end, a Location Stability plot of the changepoint locations across the n altered data points can be used.
We record the number of times a changepoint alteration occurs as well as the location of any moved or additional changepoints. This plot thus allows practitioners to assess how and where instabilities in the changepoint locations occur. For a segmentation which does not vary across altered data points, we expect each of the original changepoints to stay in place when all data points other than itself are altered. For ease of use, we directly display the discrepancy in number of changepoint occurrences from this expected maximum. This way only unstable (dot-dashed orange) or outlying (dotted red) original changepoints may enter the plot with a negative difference.
The latter indicates that the changepoint no longer occurs at its original location for some instances. Either the changepoint moves to another location or it disappears completely.
When a changepoint moves, it will be offset and depicted by a positive difference at another location, thereby leading towards a net balance of zero. Disappearing changepoints, on the other hand, do not appear in the plot but can be deduced from net negative balances.
It can also occur that the original changepoints remain but that additional changepoints occur due to the alterations (net positive) but this is much less common in our experience.
The Location Stability plots for the simulated data example are presented in Figure 5.
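The counting behind such a plot can be sketched as follows. This is an illustrative reading of the description above, not the package code: the function name is hypothetical, and changepoint locations are assumed to be expressed on the original time index for both the observed and the expected segmentations.

```python
from collections import Counter
from typing import Dict, Iterable, List


def location_discrepancy(
    observed_cpts: List[Iterable[int]],
    expected_cpts: List[Iterable[int]],
) -> Dict[int, int]:
    """Observed minus expected number of occurrences of each changepoint
    location across all altered data sets; zero differences are omitted."""
    observed = Counter(c for cpts in observed_cpts for c in cpts)
    expected = Counter(c for cpts in expected_cpts for c in cpts)
    locations = sorted(set(observed) | set(expected))
    return {c: observed[c] - expected[c] for c in locations if observed[c] != expected[c]}


# Four alterations of a segmentation with a single changepoint at location 5:
# it stays once, moves to location 4 twice and disappears once.
expected_sets = [{5}, {5}, {5}, {5}]
observed_sets = [{5}, {4}, {4}, set()]
heights = location_discrepancy(observed_sets, expected_sets)
net_balance = sum(heights.values())
```

Here the negative height at location 5 is partly offset by the positive height at location 4, and the remaining net balance of minus one corresponds to the single alteration under which the changepoint disappears.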

Segment Parameter Stability
This plot complements the previous in tackling the second diagnostic question by considering instabilities in the segment parameters, such as the mean. It is important to investigate the segment parameters separately as the changepoint locations may vary but for a small or uncertain changepoint, the segment parameters may not vary considerably. If one is only interested in inference on the segment parameters and not the changepoint locations then this is important information.
To construct our diagnostic plot we start by depicting the original segment parameters, such as the mean in our example, by solid (red) lines, which correspond to the ones from [...]
For the outlier method (Figure 6b), the darker areas are typically larger in size, especially towards the edges of a segment. This is in line with our expectation, since the data contamination induces two additional changes, thereby making the directly surrounding segments smaller in size. The additional variability then arises from the fewer data points available for estimation of the segment parameters. This results in a "bleeding" effect at the edges of the segments. The most pronounced instability in segment means is again observed for the data points around the last changepoint. Coupled with the stability dashboard, it is clear that the changepoint both moves earlier (producing a lower mean) and disappears (producing a higher mean after 145), although again the plot gives no information as to which observations are responsible for this.

Influence Map
At the most detailed level we have an "Influence Map" of salient differences between the observed and expected segmentation from the deleted or contaminated data points. This final plot is a heat map which identifies single, influential observations that trigger changepoint instabilities, thereby addressing the third diagnostic question. The heat map depicts the difference in segment number between the observed and expected segmentations across each of the altered data points. Analogous influence maps can be made for other outputs of the changepoint analysis.
The horizontal axis of the Influence Map is the standard time index of the original data (1 to n). The vertical-axis indexes the altered data point (1 to n). Each coloured (taupe or blue) pixel marks the difference between the observed and expected segmentation at the specific (x, y) co-ordinate. The data point on the vertical-axis should be understood as the influential data point whose alteration leads to changes in the affected data points on the horizontal axis if any colouring appears. We colour zero difference in the heat map as white, increases in segment number as taupe and decreases as blue. Hence, data points on the vertical axis without a single coloured co-ordinate on the horizontal axis can be considered as non-influential since they do not trigger any changepoint instability. Rows with coloured pixels correspond to data points which are instability triggers. The intensity of the colour then signifies the magnitude of the instability (namely the increase or decrease in segment number). Coloured areas are expected to occur around unstable or outlying (orange or red) changepoints, which are depicted as coloured circles on the diagonal.
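The matrix underlying such a heat map can be sketched as below. The function names are hypothetical, and the observed and expected segmentations are assumed to be stored as one row of segment numbers per altered data point, with None marking a deleted point.

```python
from typing import List, Optional

Row = List[Optional[int]]


def influence_map(observed: List[Row], expected: List[Row]) -> List[List[int]]:
    """Observed minus expected segment number.

    Rows index the altered data point (vertical axis), columns the time
    index (horizontal axis). Positive entries would be drawn in taupe,
    negative entries in blue and zeros in white.
    """
    return [
        [0 if (o is None or e is None) else o - e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


def instability_triggers(diff: List[List[int]]) -> List[int]:
    """0-based indices of altered points whose row contains any colouring."""
    return [t for t, row in enumerate(diff) if any(v != 0 for v in row)]


# Toy deletion example with original segmentation 1 1 1 2 2 2: deleting the
# third point makes the changepoint disappear, deleting the first does not.
expected_rows = [[None, 1, 1, 2, 2, 2], [1, 1, None, 2, 2, 2]]
observed_rows = [[None, 1, 1, 2, 2, 2], [1, 1, None, 1, 1, 1]]
diff = influence_map(observed_rows, expected_rows)
triggers = instability_triggers(diff)
```

In the second row the negative entries run to the last column, which is the pattern a removed changepoint produces.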
Before discussing the Influence Maps for the simulated data example, we describe their important features to aid practitioners in studying the influential role of the individual data points through these maps. These features are summarized in Figure 7. (i) Figure 7a highlights the role of the diagonal: colouring above the diagonal indicates that an alteration of the corresponding data point (on the vertical axis) affects earlier data points, whereas colouring below the diagonal indicates that subsequent data points are affected. (ii) Figure 7b concerns the horizontal span of the colouring: a stop in colouring indicates that changepoints have moved, while a continuation of colouring to the last data point indicates that [...] that appear in the coloured area are influential and assert influence over the corresponding data points on the horizontal axis. The height can be seen as the extent to which instability arises in this influential region.
Relying on these features, we are ready to discuss the Influence Maps for the simulated data example, as presented in Figure 8. Across both maps, few coloured areas appear, each of them characterizing some form of instability. All of them occur around the originally detected unstable or outlying changepoints. The instability triggers (i.e., data points on the vertical axis with colouring) are observations 101 and 129-157; their influential role will be detailed below. Note that most coloured areas are blue, thereby indicating that a particular data point (on the horizontal axis) has a lower segment number in the observed segmentation than expected; in other words, fewer changepoints than expected occur. We subsequently discuss the Influence Maps according to their main features.
(i) In this example, we see that influential data points have a tendency to affect subsequent data points rather than preceding ones since most colouring occurs below the diagonal. Consider the blue colouring in Figure

Well-log Application
We now return to the well-log data, presented in Section 2, and address our main diagnostic questions one by one.

Stability of the Changepoint Analysis
We start by tackling our general diagnostic question "Is the output of the changepoint analysis stable or vulnerable to data instabilities?" through the Stability Dashboards. [...] The results illustrate that influence diagnostics should not be overlooked but rather considered as a much needed natural successor to any changepoint analysis. The mere visualization of one single additional graphic, the Stability Dashboard, can either reassure practitioners about the stability of their analysis or warn them of the occurrence of instabilities. In the latter case, a more detailed influence diagnosis can be performed through our other diagnostic tools, which are discussed next.

Manifestation of the Instabilities
Next, we address the question "How and where do the instabilities manifest?". First, consider the stability of the changepoint locations, as visualized in the Location Stability plots of Figure 10. For the deletion method, the few short positive (black) heights (Figure 10a) immediately highlight that only a minority of location instabilities occur. Hence, while many changes are labelled as potentially unstable (orange dot-dashed lines in Figure 10a), these instabilities only manifest themselves in rare cases. For the outlier method, by contrast, changepoints 368 and 695 in particular are prone to more severe instability, as can be observed from the long negative heights at their locations in Figure 10b. [...] are deleted rather than moved. The analysis of the well-log data is, however, more complex than the simulated data example, which makes it harder to directly associate the changepoint moves (black positive lines) with the original changepoints (coloured negative lines) in the Location Stability plots. Practitioners are therefore advised to consult the more detailed Influence Map to match how various data perturbations affect the original data.
Secondly, consider the stability of the segment parameters, namely the mean, as visualized in the Parameter Stability plots of Figure 11. Due to the (minor) evidence of changepoint location instability for the deletion method, the vast majority of segment means appears very stable in Figure 11a. Some instability can be observed for observa-

Sources of the Instabilities
Finally, we consider our more detailed diagnostic question, namely "Which single influential observations trigger these instabilities to arise and how so?", through the lens of the Influence Maps (Figure 12).
We first discuss the results for the deletion method (Figure 12a). Several potentially unstable (orange) changepoints, such as the one at location 34, hardly have any (clearly) visible coloured pixels of instability surrounding them. This is due to the larger dataset size, n = 1000, compared with our previous example. We recommend considering these changepoints as sufficiently stable. A handful of observations are found to assert notable influence.
Recall that within the data just after time points 200 and 400 there are malfunctions in the observations recorded. Rather than affecting single observations, these result in a quick [...]
Interestingly, the same phenomenon occurs for the instability triggers 326-384: their contamination causes changepoint 368 to disappear. Other similar, though less pronounced, influential regions occur in Figure 12b. All of these are blue, indicating a changepoint removal, and the majority of them affect subsequent observations from the same segment (since the coloured pixels start below the diagonal and continue until the end of the sample).

Conclusion
Motivated by questions from practitioners in applying commonly used changepoint methods, this paper has presented the first approach to considering the influence of the observed data points on changepoint segmentations. We provide a framework for two methods to characterize influence: deletion and contamination. Alongside the framework, three levels of graphics were introduced. The stability dashboard provides an overview of the results which indicates if there are any locations of concern. The location and parameter stability plots provide the second level of detail, indicating how the segmentations are affected. The most detailed level is the influence map, which shows which observations are influential and how they influence the segmentation.
A challenging aspect of the proposed approach is to characterize what a "no problem" situation looks like. The simplest answer is that if all changepoints are stable (dashed green) then there is no problem. However, in reality this is unlikely to be the case, as short segments, small changepoints and clustered changepoints, as seen in our well-log example, are common. We have deliberately not addressed the subjective issue of when a point is "influential enough" to compromise an analysis, as this depends on the downstream decisions to be made based upon the original segmentation. We prefer to leave this evaluation to the judgement of the practitioner.
We illustrated our general approach using the change in Normal mean test statistic coupled with the PELT search method but we stress that our approach can be applied to all changepoint methods. Furthermore, the only aspect of this paper which is specific to the change in Normal mean test statistic is our justification of the expected alterations to the segmentations that feed into the location stability plot and influence map. For other test statistics, these expected alterations either need to be calculated, or one can plot the altered segmentations themselves rather than their difference from the expected ones. This is an important consideration: if one were using a robust test statistic such as that provided in Fearnhead and Rigaill (2019), then the outlier method would not guarantee the creation of two new changepoints. It is still interesting to consider the influence of the data in this robust setting but we leave this for future work. Our aim in this paper is to provide a framework for assessing influence in a general sense, utilising a common test statistic and search method purely as an example.
In future research it would be interesting to explore whether some combinations of test statistic and search method are more or less prone to instabilities than others.
Finally, one may consider that the influence plots characterize information about uncertainty in the changepoint segmentation. Whilst this is true, we are not aiming to provide confidence intervals or similar measures of uncertainty quantification. Akin to regression analyses, there are questions best answered by confidence intervals and others by measures of influence. Similarly, we have advocated questions here that practitioners may wish to answer for which a measure of influence for changepoint segmentations is required.

Appendices

A Background to Changepoint Methods Used
The paper proposes a general influence framework for all changepoint problems. However, the calculation of the expected segmentations under the data alterations requires the assumption of an underlying model and inference framework. We choose to demonstrate the framework on the simplest, but informative, changepoint model: the multiple changepoint problem, where $y_1, \dots, y_n$ is a time series of length $n$ in which each $y_i$ is assumed to follow a Normal distribution with mean $\mu_i$ and variance $\sigma^2$. The $\mu_i$ are assumed to follow a multiple changepoint structure, that is, $\mu_i$ is constant within each segment $\tau_j < i \le \tau_{j+1}$, $j = 0, \dots, m$, where the $\{\tau_i\}_{i=1}^{m}$ are the $m$ ordered changepoint locations, adopting the standard notational convention that $\tau_0 = 0$ and $\tau_{m+1} = n$.
We use a minimum penalized cost approach to infer the number of changepoints and their locations. The cost associated with a segment of data $y_{s:t} = (y_s, \dots, y_t)$ is given by
$$C(y_{s:t}) = \min_{\theta} \sum_{j=s}^{t} \gamma(y_j; \theta),$$
where $\gamma(y; \theta)$ is a loss function for a single observation $y$ and $\theta$ is a segment-specific location parameter. In our simulations and data example, the loss function utilized is twice the negative log-likelihood for a Normal distribution (ignoring terms which are constant across segments),
$$\gamma(y; \theta) = \frac{1}{\sigma^2}(y - \theta)^2.$$
The penalized cost for a segmentation is then
$$Q(y_{1:n}; \tau_{1:m}) = \sum_{j=0}^{m} \left[ C(y_{\tau_j+1:\tau_{j+1}}) + \beta \right], \tag{A1}$$
where $\beta > 0$ is a penalty cost for the introduction of a changepoint and the vector $\tau_{1:m}$ collects all changepoint locations.
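As an illustration of the penalized cost defined above, the following is a minimal Python sketch, not the authors' implementation. It assumes the Normal loss with known variance, so that the minimizing $\theta$ of each segment is the segment mean. The function names `segment_cost` and `penalized_cost` and the argument `sigma2` are hypothetical, and changepoints are given as 1-indexed positions of the last observation of each segment.

```python
import numpy as np

def segment_cost(y, sigma2=1.0):
    """C(y_{s:t}) for the loss (y - theta)^2 / sigma2; the minimizer is the segment mean."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2) / sigma2)

def penalized_cost(y, changepoints, beta, sigma2=1.0):
    """Q(y_{1:n}; tau_{1:m}): sum over the m + 1 segments of (segment cost + beta).

    `changepoints` holds tau_1 < ... < tau_m (1-indexed, last point of each segment);
    tau_0 = 0 and tau_{m+1} = n are added here (assumed convention, as in the text).
    """
    y = np.asarray(y, dtype=float)
    bounds = [0, *sorted(changepoints), len(y)]
    # Segment j covers y_{tau_j + 1 : tau_{j+1}}, i.e. the Python slice y[tau_j:tau_{j+1}].
    return sum(segment_cost(y[a:b], sigma2) + beta
               for a, b in zip(bounds[:-1], bounds[1:]))
```

For example, for the toy series `[0, 0, 0, 5, 5, 5]` and `beta = 2`, placing a changepoint at position 3 gives two zero-cost segments and two penalties, whereas the segmentation without changepoints pays for the full squared deviation around the overall mean.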
In order to estimate the number and location of the changepoints, we need to minimize the penalized cost (A1). There are several approaches which, directly or indirectly, optimize a form of (A1). We choose to use the PELT algorithm (Killick et al., 2012) as it is an exact optimizer of problem (A1). PELT uses dynamic programming to rearrange (A1) to
$$Q(y_{1:n}; \tau_{1:m}) = \min_{j = 1:n} \left\{ Q(y_{1:\tau_j}) + C(y_{\tau_j+1:n}) + \beta \right\}, \tag{A2}$$
where $Q(y_{1:\tau_j})$ is the cost of the optimal segmentation for data $y_1, \dots, y_{\tau_j}$. The final solution in equation (A2) is not calculated directly but is calculated for increasing lengths of the data $y$. Provided that the loss function for a portion of data reduces when a changepoint is added (i.e. more parameters improve the fit), PELT further prunes the minimization over $j$ to reduce computational time. See Killick et al. (2012) for full details.
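The recursion and pruning step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the reference implementation of Killick et al. (2012): it assumes the squared-error loss with $\sigma^2 = 1$, uses a pruning constant of zero, and the function name `pelt_mean` is hypothetical.

```python
import numpy as np

def pelt_mean(y, beta):
    """Exact minimizer of the penalized cost for a change in mean (squared-error loss).

    Returns (changepoints, cost); changepoints are 1-indexed last points of segments.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))       # running sums
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))  # running sums of squares

    def cost(s, t):
        # C(y_{s+1:t}): residual sum of squares around the segment mean
        return s2[t] - s2[s] - (s1[t] - s1[s]) ** 2 / (t - s)

    F = np.zeros(n + 1)               # F[t]: optimal penalized cost of y_{1:t}
    last = np.zeros(n + 1, dtype=int) # last[t]: most recent changepoint before t
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in candidates]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning: drop s if F[s] + C(y_{s+1:t}) already exceeds F[t]
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)

    cps, t = [], n
    while last[t] > 0:                # backtrack through the stored changepoints
        cps.append(int(last[t]))
        t = last[t]
    return sorted(cps), float(F[n])
```

Because each of the $m + 1$ segments carries one penalty in (A1), the sketch initializes the recursion at zero cost for the empty series. Dropping the pruning line gives the plain dynamic programme with the same output but quadratic running time.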

B Results on the Expectation
We provide numerical and theoretical results on the segmentation one expects under the two data alterations: deletion and contamination. We assume the inference is conducted using a penalized cost approach as discussed in Appendix A with the squared error loss $\gamma(y; \theta) = (y - \theta)^2$, which is commonly used for inferring the mean of the data, but other choices can be made. Note that this is a scaled version of our negative log-likelihood loss.

B.1 Deleting Observations
We investigate how one should expect a deletion of a single data point to affect the changepoint segmentation. We generate ordered sequences of length $n \in \{100, 200, 300, 400, 500, 1000\}$ with one change in mean: observations belonging to the first half of the sample are generated from a $N(0, 1)$ distribution; observations belonging to the second half of the sample from a $N(\delta, 1)$ distribution, where we consider different values for the size of the change, $\delta \in \{1, 2, 3, 4, 5\}$, and the variance is kept fixed at one.
We then apply our rolling procedure, deleting each observation $t = 1, \dots, n$ in turn and re-estimating the changepoint location with PELT. For each observation $t = 1, \dots, n$ in each simulation run, we record whether the changepoint moves beyond its original location. We set the number of repetitions of each simulation scenario to 500.
Note that our expectation is that deleting an observation prior to the true changepoint location reduces the estimated changepoint location by one, whereas deleting an observation after the true changepoint location leaves the estimated location unchanged.
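The rolling deletion procedure and its comparison with the expected segmentation can be sketched as below. This is an illustrative simplification: since the simulated data contain a single change, the sketch swaps PELT for a least-squares estimate of one changepoint (the hypothetical helper `best_single_split`), and `deletion_moves` is likewise a hypothetical name.

```python
import numpy as np

def best_single_split(y):
    """Least-squares location of a single change in mean.

    Returns tau in 1..n-1, the 1-indexed last observation of the first segment.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    s1 = np.cumsum(y)
    k = np.arange(1, n)  # candidate changepoint locations
    # Maximizing this gain is equivalent to minimizing the two-segment squared error.
    gain = s1[k - 1] ** 2 / k + (s1[-1] - s1[k - 1]) ** 2 / (n - k)
    return int(k[np.argmax(gain)])

def deletion_moves(y):
    """Delete each observation in turn; flag deviations from the expected location."""
    y = np.asarray(y, dtype=float)
    tau = best_single_split(y)
    moved = []
    for t in range(1, len(y) + 1):
        y_del = np.delete(y, t - 1)
        # Expected: shift by one if the deleted point precedes (or is) the changepoint.
        expected = tau - 1 if t <= tau else tau
        moved.append(best_single_split(y_del) != expected)
    return tau, np.array(moved)
```

Averaging `moved` over the observations of one series gives the proportion of deletions for which the changepoint departs from its expected position; repeating this over simulated series for each $(n, \delta)$ pair would mimic the simulation design described here.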
In Figure A1, we plot the average (taken over all simulation runs) proportion of data points for which the original changepoint moves, for different values of the shift size $\delta$ and sample size $n$. Intervals of length two standard errors are indicated by the dashed lines. As with all changepoint problems, the asymptotics depend on the size of the change and the segment length. As the sample size $n$ and/or the shift size $\delta$ increases, we observe, as expected, almost no changepoint moving beyond its original location. For small mean shifts and/or sample sizes, minor perturbations from the expectation are detected. These correspond exactly to the changepoint instabilities practitioners should be warned about, and they will thus show up in the Influence Map.

B.2 Contaminating Observations
We numerically investigate how one should expect the contamination of a single data point, namely making it outlying, to affect the changepoint segmentation. Theorem 1 in Fearnhead and Rigaill (2019) shows that for a sufficiently large outlier $y_t$, two additional changepoints at $t - 1$ and $t$ reduce the penalized cost. The optimal segmentation is thus expected to have two additional changes at these locations. The next proposition extends Theorem 1 in Fearnhead and Rigaill (2019) to establish how large the outlier needs to be in order for it to introduce two additional changes.
Proposition. Let $y_t$ lie in the segment $y_{s:u}$ of the original segmentation and let $\hat{\theta}_{s:u}$ denote the fitted parameter of that segment. If $2\beta < \gamma(y_t; \hat{\theta}_{s:u})$, the segmentation that minimizes the penalized cost will have changepoints at $t - 1$ and $t$.
Proof. We follow the proof of Theorem 1 in Fearnhead and Rigaill (2019) to show how large the value of an atypical data point $y_t$ needs to be for the optimal segmentation to have changes at both $t - 1$ and $t$ (or only one of these if the original segmentation already has a change at the other time-point).
Consider any segmentation of the data that does not include changepoints at both $t - 1$ and $t$. Let the segment of the original segmentation that contains the outlier $y_t$ be $y_{s:u}$ for $s < t$ and $u > t$. The change in cost between this segmentation and the segmentation with additional changepoints at $t - 1$ and $t$ is
$$\min_{\theta} \sum_{j=s}^{t-1} \gamma(y_j; \theta) + \min_{\theta} \sum_{j=t+1}^{u} \gamma(y_j; \theta) + 2\beta - \min_{\theta} \sum_{j=s}^{u} \gamma(y_j; \theta), \tag{A3}$$
see Fearnhead and Rigaill (2019).
If the change in cost in (A3) is positive, no additional changepoints will be induced. If the change in cost in (A3) is negative, two additional changepoints will be induced. We will show how large $y_t$ needs to be for the latter to occur.
For convenience, we first introduce the following notation: let $\hat{\theta}_{s:u}$ denote the value of $\theta$ that minimizes $\sum_{j=s}^{u} \gamma(y_j; \theta)$. Using the fact that for likelihoods (without penalization) adding a changepoint is always preferred (lower cost), we have that no additional changes will be induced if $2\beta \ge \gamma(y_t; \hat{\theta}_{s:u})$.
Two additional changes will be induced if $2\beta < \gamma(y_t; \hat{\theta}_{s:u})$.
The proposition shows that two additional changes occur if the cost $\gamma(y_t; \hat{\theta}_{s:u})$ of keeping the observation in the current segment is larger than the cost $2\beta$ of introducing two additional changes. Numerical simulations (unreported) confirm this result.
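The step from the change in cost (A3) to the threshold $2\beta$ can be spelled out as follows. This is a sketch of one way to fill in the argument, assuming $\hat{\theta}_{a:b}$ denotes the minimizer of $\sum_{j=a}^{b} \gamma(y_j; \theta)$ and writing $\Delta$ (a symbol introduced here for illustration) for the change in cost in (A3). It covers only the direction in which two changes are induced.

```latex
\begin{align*}
\Delta
 &= \sum_{j=s}^{t-1} \gamma(y_j; \hat{\theta}_{s:t-1})
  + \sum_{j=t+1}^{u} \gamma(y_j; \hat{\theta}_{t+1:u})
  + 2\beta
  - \sum_{j=s}^{u} \gamma(y_j; \hat{\theta}_{s:u}) \\
 &\le \sum_{j=s}^{t-1} \gamma(y_j; \hat{\theta}_{s:u})
  + \sum_{j=t+1}^{u} \gamma(y_j; \hat{\theta}_{s:u})
  + 2\beta
  - \sum_{j=s}^{u} \gamma(y_j; \hat{\theta}_{s:u}) \\
 &= 2\beta - \gamma(y_t; \hat{\theta}_{s:u}).
\end{align*}
```

The inequality uses that each sub-segment is fitted at least as well by its own minimizer as by $\hat{\theta}_{s:u}$, which is the "adding a changepoint is always preferred" property. Hence $\Delta < 0$ whenever $2\beta < \gamma(y_t; \hat{\theta}_{s:u})$.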
Since the values of y t resulting in two additional changepoints depend on the original data at hand, it would be computationally cumbersome to calculate the exact boundary for each contaminated point. Thus, to avoid computational overload in computing the cost of each data point, we set the value of the contaminated data point equal to twice the data range to construct our influence diagnostic plots in practice. Simulation experiments (unreported) confirmed that contaminating data points by adding twice the range of the data is sufficiently large to guarantee the expected number of additional changes across different considered sample sizes.
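The contamination step and the threshold check can be illustrated with a short Python sketch. It rests on several assumptions: the text describes the contamination both as setting the value to twice the data range and as adding twice the range, and the sketch uses the additive form; the squared-error loss is used; and the segment containing the point is taken as known. The names `contaminate` and `induces_two_changes` are hypothetical.

```python
import numpy as np

def contaminate(y, t):
    """Copy of y with observation t (1-indexed) shifted up by twice the data range.

    Assumption: additive contamination; the range is computed on the original data.
    """
    y = np.asarray(y, dtype=float).copy()
    y[t - 1] += 2.0 * (y.max() - y.min())
    return y

def induces_two_changes(y_seg, t, beta):
    """Check 2 * beta < gamma(y_t; theta_hat) under squared-error loss.

    `y_seg` is the segment containing the point; theta_hat is its mean,
    computed with the (possibly contaminated) point included.
    """
    y_seg = np.asarray(y_seg, dtype=float)
    theta_hat = y_seg.mean()
    return bool((y_seg[t - 1] - theta_hat) ** 2 > 2.0 * beta)
```

On the toy segment `[0, 1, 2, 3, 4]` with `beta = 3`, the third observation sits at the segment mean and does not pass the threshold, while its contaminated version does.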