LSTM-certainty as early warning signal for critical transitions

We trained a long-short-term-memory (LSTM)-neural network on time series generated with an agent-based model that was designed to differentiate the drivers of its dynamics into external and internal forces, with the internal ones stemming from neighbourhood interaction considered as ‘social’ influence. The trained LSTM proved capable of predicting changes in the dynamics of time series from systems prone to critical transitions. The probability of the assessment – i.e. the ‘certainty’ of the LSTM for its prediction – thus can be used to indicate qualitative changes in a system's behaviour. In many cases, these certainties announce imminent state changes earlier and also more clearly than the set of statistical methods, which is suggested for predicting critical transitions under the term early warning signals.


Introduction
Critical transitions (CTs) are state changes in a system's dynamics that tend to occur unexpectedly and often change the regime of the system from an accepted to a detrimental condition. These transitions can appear abruptly compared to the rather slow development of a critical parameter that is assumed to drive the system's change (Carpenter et al., 1999;Drake & Griffen, 2010;Scheffer et al., 2001). Additional subtleties in the system's dynamics, such as stochasticity (Boettiger, 2018), hysteresis (Mayergoyz, 2003) and in particular strong non-linear interactions in a system's components, causing rapid equilibrium changes from stable to unstable states or back, make such shifts difficult to predict. An often-cited example is climate change, as it is subjected to strong variance in weather conditions, rising CO 2 -emissions and other non-linear interactions from anthropogenic influences, which is believed to happen faster than we would like it to happen (Lenton, 2011). Similar rapid and hard to predict state changes are observed in a multitude of systems, ranging from physics (Morales et al., 2015), economics (Majdandzic et al., 2014), ecology (Carpenter et al., 2005), via brain states (McSharry et al., 2003) to social systems (Su et al., 2017).
For assessing the possibility of anticipating critical transitions in dynamic systems a set of methods has been suggested, which is commonly referred to as early warning signals (EWS) (Dakos & Bascompte, 2014;Scheffer et al., 2009Scheffer et al., , 2012. Although providing interesting insights CONTACT Manfred Füllsack manfred.fuellsack@uni-graz.at into details of system dynamics and allowing for a certain degree of forecasting CTs in simulated as well as in realworld systems, EWS are subject to a range of problems (Jäger et al., 2017;Reisinger & Füllsack, 2020), which narrow their predictive power. In an attempt to understand these problems better and to test alternative prediction potentials, Füllsack et al. (2021) designed an agent-based model meant to differentiate the drivers of dynamical systems and therewith to allow observation of what could be called an external or 'factual' influence of a critical parameter and an internal or 'social' influence of the system's components interaction separately. This model provides an option for considering different weights for the influence of a particle's neighbourhood interaction and thus allows to investigate details of the system's state changes under varying neighbourhood conditions. Here we used this model to generate a large data set of time series with varying degrees of neighbourhood influence on which a long-short-term-memory (LSTM) neural network was trained to differentiate the strength of neighbourhood influence. As it turned out, this LSTM is capable of detecting significant changes in the behaviour of systems which approach a critical transition. Tests on a variety of time series from systems prone to a CT show that in many cases the LSTM provides earlier and also clearer signals for an imminent CT than the ones gained from standard EWS-analysis. The paper is organized as follows: Section 2 presents results from prediction tests on different time series from simulated and empirical systems. Section 3 introduces the ABM used for the training set generation and discusses some of its implications. Section 4 describes the data generation process and the training of the LSTM. Section 5 discusses some preconditions for the analysis, compares methods and verifies results. Section 6 discusses the findings of the investigation and concludes the paper. Figure 1 shows twelve example results from predictions made by an LSTM (as detailed in Section 4) trained on a large data set of time series generated with five different strength levels of neighbourhood influence (as explained in Section 3). From an extensive survey of various systems known to undergo CTs, six of the shown examples (first two rows) are on data from simulated systems and six are on empirical data (details are provided in Table 1). The predictions were made on a rolling window of fractions of time series from these systems, with the analysed fractions (deep blue in Figure 1) taken in the run-up to the CT up to a short distance before it occurs (indicated with the grey vertical line). To ensure a predictive effect, the forecasts are made at the end of the rolling window (other than in many standard EWS-analyses where the middle of the rolling window is considered). The black lines indicate the predicted strength level of neighbourhood influence in five categories ranging from 0 ('no influence') in  Table 1. Table 1. Data sources and details of systems as analysed in Figure 1. dW is a normal random variable with zero mean and unit variance.

Results
Sources of model-generated data, as analysed in Figure 1, 1st and 2nd row plots 4δ 3 x 2 (x + δ) dt + σ xdW, with b = 0.9, σ = 0.05, and ain{−1, 1} with K = 10, σ = 0.025, and rin{0.2, 0.8} dy = −δ 2 − x dt Patches of ecosystems coupled by movement of organisms (Brummitt et al., 2015) Insect outbreak (Ludwig et al., 1978) w i t h e = 0.02, σ = 0.001, δ = 0.5 Van der Pol relaxation oscillator (Zhang et al., 2015)  with, σ = 0.006, t = 0.1, T = 2.12 and H in {0.2, −0.2} Ricker-model with Holling Type II (Bury et al., 2020) Pitchfork bifurcation in Ising-model (Morales et al., 2015) (Smug et al., 2018) Fold bifurcation in Ising-model (Reisinger & Füllsack, 2020) Sources of empirical data, as analysed in Figure 1, 3rd and 4th row plots End of glaziation 1 End of greenhouse earth End of younger Dryas Vostok ice core d2H (Dakos et al., 2008) ODP tropical Pacific core 1218 CaCO3 (Dakos et al., 2008) Cariaco basin core PL07-58PC ( steps of 0.25-1 ('full influence'). The red curves indicate the probability with which the LSTM predicts the highest (α = 1) category of neighbourhood influence. In contrast to observing each category's probability separately, the one for the highest α-category does not fluctuate whenever the actual prediction (black) changes and is therefore taken as the actual indicator of an imminent CT in this context. We call it 'certainty of the LSTM to see the highest α-category'. To enable a comparison, the most common standard EWS-indicator -autocorrelation at lag-1 (AC-1), calculated using the Python-library ews-tools (https://github.com/ThomasMBury/ewstools) for the same fraction of the time series on which the LSTM has been applied -is shown (green in the plots). In Table  1, details of the analysed systems are provided (further examples are shown in Figures 2-5). EWS-analysis builds on the presence of noise in a system's dynamics. Its core aspect is Critical Slowing Down, 1+x 2 dt + σ xdW, with K = 10 and r driven upwards from r = 0.2 to r = 0.8, showing nearly no LSTM-certainty indication (red curve) in all considered noise levels (σ in {0.001, 0.025, 0.05}), while the EWS-indicator AC-1 (green curve) yields some signal in this case.

Figure 5. CT in the system
showing only a very weak, and with high noise levels no decline in LSTM-certainty for the highest category, regardless whether the system's control parameter is driven upwards (top row) or downwards (bottom row).
indicating declining resilience and thus slower recovery rates from perturbations when the system approaches a CT. With regard to this, small deviations from equilibrium caused by noise can be considered continuous perturbations to which the system recovers on a permanent base, but gets delayed more and more in its recovery activities the closer it is to the CT. This delay is what EWS-analysis detects in terms of statistical signals (for a detailed exposition see a.o. Dakos et al., 2012). Usually, when simulating such systems with equation-based methods (EBM), noise is artificially added when generating time series for EWS-analysis (Boettiger, 2018). EWS-analysis however, is sensitive to differences in noise levels yielding sometimes unreliable or no results. For this reason, all simulated systems in our investigation were tested with different noise levels, added with varying strength in the form of normally distributed random deviations, or White noise (σ xdW, with σ in 0.001, 0.025, 0.05). As an example, Figure 2 shows comparative analyses of the state variables of the System dx = (−x 3 + bx + a)dt + σ xdW (first plot in Figure 1), showing that AC-1 (green curve) provides different signals about the imminence of a CT at different noise levels, while LSTM prediction (black) and LSTM certainty (red) signal state changes in the neighbourhood interaction of the system quite clearly in all three cases considered.
In all considered simulated systems the LSTMprediction (black) as well as the LSTM-certainty (red) decline in the onset of a CT. To some extent, this seems counter-intuitive. One would expect that systems prone to a CT show an increase in feedback activity indicated by the highest α-category corresponding to the strongest social influence. Our results, in contrast, show a decline of this indicator insinuating rising 'factual' influence of the critical parameter when approaching a CT. To make sure that this is not caused by representing neighbourhood interaction as a mere mean-field approximation, as is done in EBMs, these findings were tested against data from several agent-based-models, i.e. ABM-generated systems, among them 2D implementations of the Isingmodel (Reisinger & Füllsack, 2020) and ABM-simulations of a repeated Public Good Game (Hofer et al., 2018).
Results of the later are shown in Figure 3. The LSTMpredictions decline in all considered cases, indicating a drop of neighbourhood influence when approaching the CT. The LSTM-certainty (red) seems to react clearer and earlier than the one of the main standard EWS, i.e. autocorrelation at lag-1 (green).
In the case of the empirical data from real-world systems (rows 3 and 4 in Figure 1) the picture seems less clear. LSTM-predictions rise or decline when the system's state variable approaches the CT. In all considered cases however, the LSTM-predictions show significant changes in the strength of neighbourhood influence when approaching a CT.
However, not all experiments with modelled systems yield unambiguous results. In some systems, unsymmetrical equilibrium states appeared to influence the prediction results dependent on the direction from where a CT is approached. The system dx = rx 1 − x K − x 2 1+x 2 dt + σ xdW, with K = 10 and r {0.2, 0.8} for instance, interpreted as modelling an Insect outbreak in (Ludwig et al., 1978), yields quite distinct LSTM-certainty indications of an imminent CT when r is driven downwards from r = 0.8 to r = 0.2, as shown in Figure 1 (2nd plot top row), but shows nearly no indication in all considered noise levels when r is driven upwards over the same interval (although tipping as abruptly), as can be seen in Figure 4 (red curve). The EWS-indicator AC-1 (green curve) in contrast, yields some signal in this case.
Another example of a limited informative value in the examined models is the system dx dt = a − bx + r x p x p +h p + σ xdW, with b = 1, r = 0.6, p = 8, h = 1and a {0.2, 1.2}, suggested as modelling a lake eco-system in (Scheffer et al., 2001). The LSTM predicts some changes in the categories of neighbourhood influence (α-values, black line in Figure 5) in all considered noise levels. But it shows only a very weak, and with high noise levels no decline in certainty for the highest category, regardless whether the system's control parameter is driven upwards (top row in Figure 5) or downwards (bottom row).
In view of these observed limitations, we checked the significance of our findings by conducting a significance analysis over a range of variants of the modelled systems (N = 302). For this, the Kendall τ value, as serving as a measure of increasing or decreasing trend (Chen et al., 2020), was compared on the obtained autocorrelationat-lag-1 (AC-1) signals and on the LSTM-certainty-signals. A two-sample t-test considering the null and alternative hypotheses H0 : mean(τ of AC − 1) = mean(τ of LSTM − certainty) versus H1: mean(τ of AC − 1) < mean(τ of LSTM -certainty) yielded a p-value of near zero (8.260719 139423966e-33), indicating LSTM-certainty as significantly more indicative at a 5% significance level on the tested data.

The model for generating training data
The model used for generating the data with which the LSTM was trained is inspired by the well-known 2D-Isingmodel for simulating magnetic dipole moments of atomic spins (Onsager, 1944). It was designed to enable a separate observation of the overall spin orientation of the particles (i.e. in physical contexts the magnetization) and the interaction effects before being 'mixed into' the control parameter that drives the system. In a non-physical context, this could be understood as a separation of what could be called a 'factual' and a 'social' (or neighbourhood) influence on the overall spin orientation of a particle population. A timely example to explain the rationale behind this separation might be the distinction of the influence of the 'factual' (i.e. medical) effect of a Covid 19 vaccine from the 'social' influence of rumours about its alleged side effects on the number of people who choose to be vaccinated. The reason for developing the model, as detailed and discussed in (Füllsack et al., 2021), is the assumption that neighbourhood interaction is one of the main drivers of CTs, but EBMs do not adequately represent these feedback processes.
The basic variant of the model is an agent-based model, with agents interacting in a Mooreneighbourhood on a grid by mutually influencing each other's spin orientation, with spins being either −1 or +1. The 'factual' influence is a parameter c varying between 0 and 1 that acts as a probability for a positive spin orientation. The 'social' influence is defined through a function, which compresses the actual spin orientation of a neighbourhood into a relatively small interval around the tie of an agent's neighbours with opposing orientations. This function reads: and corresponds to the tangens hyperbolicus function when a = 2 and b = 1. With a > 2 (a < 2), it narrows (widens) the interval around the tie of neighbours and thus steepens (flattens) the transition. Withb > 1, it shifts the midpoint of the transition to the right, with 0 < b < 1 it shifts it to the left, which could be interesting when social influence is thought to be asymmetrically distributed between conservative and progressive neighbours. In the version of (Füllsack et al., 2021), f (x) is added to the control parameter c (i.e. the 'factual' influence), with both terms being weighted in regard to an α-value representing the predominance of the one term over the other. The sum then is min-max-normalized again to the interval 0, 1 in order to serve as a probability p for a particle's spin to be in state +1. Here in contrast, the α-weighing gets applied solely to social influence f (x), so that the resulting system conditions, and hence the time series generated in that way, vary in regard to how much social influence has been added to (an assumed steady) factual influence c. In mathematical terms: with α in 0, 0.25, 0.5, 0.75, 1 expressing the weight of social influence being added to factual influence, where s is the spin of the considered particle, nb is the sum of the spins of its neighbours, c is the critical parameter (the 'factual' influence), and norm is a min-max-scaling function for the interval 0, 1. To illustrate this briefly with an example case: if an agent with spin = −1 is surrounded by a neighbourhood of two agents with spin = −1 and six other neighbours with spin = +1, the 'factual' influence c = 0.5 and 'social' influence has full weight, hence α = 1, then, according to Equations (1) and (2), the agent's probability to shift spin to +1 will be p = 0.832.
The effect of compressing the neighbourhood interaction with Equation (1) to a small range around the neighbourhood tie is that the overall spin orientation of a population of agents interacting in a Moore-neighbourhood on a grid, when started in an all -1 mode and c increasing linearly from 0 to 1, executes a typical sigmoid transition with a slowly ascending initial phase, a steep ascend once the social feedback sets in and a slow saturation phase when the majority of agents has shifted its spins. What is more, the dynamics also show a hysteresis-like behaviour, causing the transition to occur at different values in dependence of whether c increases from 0 to 1 or decreases from 1 to 0. The analytical separation of social and factual influence allows to observe the reaction of the two terms separately in each step of time. Note that the intensity of social influence (i.e. the combined influence of neighbours with same spin) nevertheless is determined by the history of the system up to the point of observation. At this point however, it can be observed before it gets summed with the critical parameter to provide the next step's spin orientation. This setting thus allows for a second aggregated observable, namely social influence by itself (green curves in Figure 6), in addition to the average spin orientation (corresponding to magnetization in the Ising, blue curves in Figure 6).
As discussed in (Füllsack et al., 2021), the model supports the thesis that the cause of the hysteresis delay appears to be the social influence factor. Varying α shifts and flattens the transition in regard to its original form with social influence in full force (for details see Füllsack et al., 2021).

Training the LSTM
This model was used to generate a large set of time series varying in length and in the α -weight with which social influence was added to c, the 'factual' influence. The original idea was to gain an understanding of how neighbourhood interaction changes the signals gained in standard EWS-analyses in regard to a pure random walk of an external control parameter. Background was the observation that noise, when added to simple exponentially growing or decreasing developments, can induce false positives in EWS (see Jäger & Füllsack, 2019). For this reason, time series were generated so that the control parameter c is initiated with a value of 0.5 and made to perform a random walk with step size dt = 0.01. In each timestep, social influence (according to Equations (1) and (2)) was added with varying α in 0, 0.25, 0.5, 0.75, 1. For each of these αvalues, 20,000 time series of lengths 100 timesteps were generated. The model was implemented with Netlogo (Wilensky, 1999). Time series were generated using Netlogo's behaviour space-function. Figure 7 shows examples of such time series with varying α-values. As can be seen, the α-weighted social influence adds dynamics to the development, so that the possibility to differentiate the time series with a machine learning classifier can be expected.
For classification, a so-called LSTM neural network was deployed. Originally proposed to avoid the vanishing or  exploding gradient problem (Hochreiter & Schmidhuber, 1997), these artificial neural networks are considered efficient learners in the context of long-term patterns in time series (Gers et al., 2002). Their architecture foresees particular components, 'cells' and 'gates', which regulate the flow of information from neuron layer to neuron layer with the effect of enabling the ANN to consider a sort of context knowledge when being trained on new data, thus accounting for a sort of experience from earlier phases in their training and making them particularly sensible for subtle changes in developments.
The LSTM was implemented using keras (version 2.4.3, keras.io) based on tensorflow (version 2.5.0, tensorflow.org), consisting of a dense input layer with 100 neurons and Relu-activation, an LSTM-layer with 100 neurons, another dense layer with sigmoid-activation for the output with five neurons (according to the five α-values), and a dropout-rate of 0.2. In order to fit the model to data, an Adam-optimizer was used over 100 epochs with a batch-size of 64. Note that the considered time series are relatively short, so that the problem of exponential memory decay in LSTMs (Zhao et al., 2020) has not been addressed in this setting. As expectable, training yielded a high test-set-accuracy of about 95%. Figure 8 shows the classification report and the loss development of the LSTM-training on time series with l = 100.

Testing and verifying
For verifying the performance of the LSTM, its results were tested with respect to true and false positive predictions with receiver operator characteristics (ROC). The ROCcurve indicates the ratio of true positives to false positives as a discriminant value which determines whether the classifier predicts a transition correctly. The area under the ROC curve (AUC) indicates the performance in respect to sensitivity (whether true positives are detected) and specificity (whether false positive are kept to a minimum). The AUC is 1 if predictions are perfect and 0.5 if they are no better than random.
For this, we took a random sample of 300 time series of length 2000 time steps taken backwards from a distance of 10 timesteps before the CT from the considered simulated systems and labelled them with a '1' as undergoing a CT. These time series were analysed with a rolling window of size 100 in respect to the EWS-indicators autocorrelation-at-lag-1 (AC-1), standard deviation (STD), and the peak in the power spectrum (Smax) (see for the same procedure (Bury et al., 2021)) and in respect to the LSTM-certainty (that is, its indication for the probability of seeing the highest α-category). To assess the indication of an imminent CT, the Kendall τ value, which serves as a measure of increasing or decreasing trend (Chen et al., 2020), was computed over the resulting indicator curves. In the case of the generic EWS-indicators, an increasing τ > 0.3 was considered as correctly indicating the approach of a CT. In the case of the LSTM-certainty, a decreasing τ < −0.3 was considered as correctly indicating a CT.
An additional set of 300 'neutral' time series of the same length was generated with a linear (f (x) = ax − b + σ xdW) and a non-linear (f (x) = ax b + σ xdW) function,  with a and b in −3, 3 and σ in 0.01, 0.5. These time series were labelled with '0' indicating no imminent CT, and analysed in the same way as the 300 positively labelled time series.
ROC curves for the Kendall τ of the EWS-indicators AC-1, STD and Smax were considered separately (left plot in Figure 9) and in a normalized combination (right plot) and compared with the negative Kendall τ of the certainty of the LSTM. As can be seen, the LSTM-certainty clearly outperforms the considered EWS-indicators.

Discussion
We trained an LSTM neural network on time series generated with an agent-based model that allows to consider different strengths of a system's neighbourhood interactions as drivers of the system's overall dynamics. As it turned out, this trained LSTM is able to assess changing dynamics in a system's state variable when approaching a CT. Its predictions thus appear to be usable as an alternative or a complement to the set of standard EWSs. As our experiments show, in many cases the predictions provide clearer and often even earlier signals than standard EWSs, both for simulations with different noise levels, for agent-based modelled systems, and for empirical systems.
The question that remains is what exactly is it that the LSTM appears to be detecting. The answer suggested by the way the LSTM's training data is generated obviously points at feedback processes in the particles' neighbourhood interactions which agitate the system. This corresponds to common diffusion theories (a.o. Rogers, 1962), which build on a rich-get-richer dynamic of mutually reinforcing social interactions. In this vein, one would assume positive feedbacks that drive the system with increasing influence strength towards a CT. However, in the case of the simulated systems at least, the LSTM clearly indicates a decrease, not an increase of social influence strength. The predicted α-categories decline in the run-up to the CT, which appears to be at odds with the assumption that the drivers in this case are positive feedback effects.
A possible explanation for this decline is insinuated in Füllsack et al. (2021). The authors consider networked implementations of various systems undergoing a CT and differentiate the state variable with respect to whether it is seen as the average of the fraction of nodes with the highest or with the lowest centrality in the network. In this investigation as well, it could be expected that stronger signals for an imminent CT may be gained from the core of the network, that is the nodes with high centrality, since the central nodes would be the ones where positive feedback effects can propagate well due to higher network density. In contrast however, the authors find that a less dense connectivity in the network causes higher variance of spin states at lower values of the critical parameter, and thus stronger and earlier EWSs from the periphery of the network (that is, from the less central nodes). As explanation, the authors suggest that, apart from positive feedbacks in the neighbourhood interactions, there might be also negative feedback effects influencing the transition processes, that is, a sort of conservative social influence, a self-enforcing neighbourhood inertness, that keeps particles from flipping at the point where the critical parameter would suggest it. The sparsely linked network periphery would impede the propagation of these negative feedback effects, thus causing more resonance in those system dynamics that are measured with EWS.
If this proposition holds water, it could imply that what the LSTM is reporting in the case of the examined simulated systems is not a decline of positive, but of negative feedback processes in the run-up to the CT. This in turn would imply that negative feedbacks play a crucially more important role in guiding transition processes than what is usually conveyed in social reinforcement or diffusion theories. Future work will have to focus on this open question and try to pinpoint the actual reasons for the signals seen in these LSTM predictions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data needed to reproduce the findings of this paper are freely available at zenodo with the https://do.org/10. 5281/zenodo.5666744.