Storing and retrieving long-term memories: cooperation and competition in synaptic dynamics

We first review traditional approaches to memory storage and formation, drawing on the literature of quantitative neuroscience as well as statistical physics. These have generally focused on the fast dynamics of neurons; however, there is now an increasing emphasis on the slow dynamics of synapses, whose weight changes are held to be responsible for memory storage. An important first step in this direction was taken in the context of Fusi's cascade model, where complex synaptic architectures were invoked, in particular, to store long-term memories. No explicit synaptic dynamics were, however, invoked in that work. These were recently incorporated theoretically using the techniques used in agent-based modelling, and subsequently, models of competing and cooperating synapses were formulated. It was found that the key to the storage of long-term memories lay in the competitive dynamics of synapses. In this review, we focus on models of synaptic competition and cooperation, and look at the outstanding challenges that remain.


Introduction
Memory [1,2] and its mechanisms have always attracted a great deal of interest [3]. It is well known that memory is not a monolithic construct, and that memory subsystems corresponding to episodic, semantic or working memory exist [4]. We focus here on explicit memory, which is the memory for events and facts.
Models of memory have themselves long been studied in the field of mathematical psychology: the article by Raaijmakers and Shiffrin [5] provides a valuable review of models that existed well before the neural network models with which most physicists are familiar began to appear. Here, memory was assumed to be distributed over a large set of nodes, and an item was defined by the pattern of activation over a set of nodes. This was propagated through a network of links whose geometry and weights determined the output. Such models of storage and retrieval are discussed at length in [5], but in the interests of a historical presentation, we briefly describe the earliest example, known as the 'brain state in a box' model, or BSB [6]. In this model, items are vectors while learning is represented by changes in synaptic strengths. For any pair of input and output items, the synaptic strengths between the input and output layers are modified in such a way that considerable storage and retrieval is possible, even in the presence of noise. In parallel, there have been many suggestions regarding the way in which working memory actually functions: from the point of view of the current review, the most important distinction between these is that forgetting involves temporal decay in the research of Baddeley and coworkers [7,8], and that it does not in the work of Nairne and coworkers [9,10]. Although a detailed discussion of these psychological (and somewhat empirical) models is beyond the scope of this review, they do indeed offer fertile ground for mathematical modellers who would wish to construct quantitative models of working memory.
In general, memories are acquired by the process of learning. Simply put, patterns of neural activity change the strength of synaptic connections within the brain, and the reactivation of these constitutes memory [11]. In this context, we first review the different kinds of learning to which a network can be subjected [12]. These are, respectively, supervised, reinforcement, and unsupervised learning. In supervised learning, the goal is to learn a mapping between given input and output vectors, as, for instance, when we classify the identity of items in a list. In reinforcement learning, the goal is to learn a mapping between a set of inputs or actions in a particular environment and some measure of reward. In unsupervised learning, the network is provided with no feedback at all. Rather, synaptic strength changes occur according to a learning rule based only on pre- and post-synaptic activity, with no reference to any desired output. The pattern of synaptic strengths that results in this case depends on the nature of the learning rule and the statistical structure of the inputs presented. It is this kind of learning with which this review will be chiefly concerned.
The somewhat bland statement above, of memories being acquired by a process of learning, actually sweeps a lot of puzzles under the rug. Why is it that some memories are quickly forgotten, while others last a lifetime? One hypothesis is that important memories are transferred, via their synaptic strengths, to different parts of the brain that are less exposed to ambient noise. In particular, during a process known as synaptic consolidation [13], memories that are first stored in the hippocampus are transferred to other areas of the cortex [14,15]; this transfer can happen while the events are rerun during sleep [16]. The case of the famous patient HM [17], whose hippocampus was removed following epilepsy, reinforces this hypothesis: HM retained old memories from before his surgery, but he could barely acquire any new long-term memories.
There is yet another mechanism for memory consolidation which happens at the synaptic level, involving the mechanism of synaptic plasticity, whereby synapses change their strength. Short-term plasticity occurs when the change lasts up to a few minutes, while long-lasting increases/decreases of synaptic strength are known respectively as long-term potentiation/depression (LTP/LTD); LTP was first discovered experimentally by Bliss and Lomo [18] in 1973. Long-term plasticity is further subdivided into early long-term plasticity (e-LTP), when synaptic changes last up to a few hours, and late long-term plasticity (l-LTP), when they last from beyond typical experimental durations of 10 h to possibly a lifetime. Such l-LTP also falls within the terminology of synaptic consolidation [19]; here, relevant memories are consolidated within the synapses concerned, so that new memories can no longer alter previously consolidated ones. The two most important theoretical models of this second kind of synaptic consolidation involve a process called synaptic tagging [19-21]. The hypothesis is that a single, brief burst of high-frequency stimulation is enough to induce e-LTP, and its expression does not require protein synthesis. On the other hand, l-LTP can be induced by repeated bursts of high-frequency stimulation, which leads to an increase in synaptic strength until saturation is reached. There is also a view [21] that more stimulation does not increase the amount of synaptic weight change at individual synapses, but rather increases the duration of weight enhancement. In this case, it has been shown that protein synthesis is triggered at the time of induction. Also, it was found that e-LTP at one synapse could be converted to l-LTP if repeated bursts of high-frequency stimulation were given to other inputs of the same neuron during a short period before or after the induction of e-LTP at the first synapse [22].
This discovery led to the hypothesis that such stimulation initiates the creation of a 'synaptic tag' at the stimulated synapse, which is thought to be able to capture plasticity-related proteins. The general framework for these heterosynaptic effects is called synaptic tagging and capture, for the details of which the reader is referred to [19,21].
It should be mentioned here that because of the interdisciplinary nature of the field, much of the discussion in the literature [23,24] involves terminology such as 'plasticity induction and maintenance' to refer respectively to short-term and long-term plasticity changes. Specifically, in [24], the author's findings reinforce the intuition that LTP induction and maintenance would lead respectively to short- and long-term memory. Thus in the following, models manifesting short-term memory involve only plasticity induction, while plasticity maintenance is responsible for the manifestation of long-term memory in the models that form the core of this review.
Finally, some of the most recent developments in the modelling of memory acquisition and maintenance involve the concept of engrams [25]; here, memories may be reconstructed by single neuronal activation. The underlying idea is that a big network of neurons is involved in memory acquisition, with several connections being modified; these may be lost over time or in an activity-dependent manner such that memory is virtually supported by a single connection, and later reconstructed. This mechanism suggests that memory reactivation may not rely on the same network involved in its acquisition, but rather on the reconnection of neurons that may have similar responses. The authors of [25] also suggest that memories at the time of acquisition are already stored in the cortex, instead of being transferred from the hippocampus to the cortex as suggested in [14,15].
To sum up, memory formation is a complicated phenomenon related to neural activities, brain network structure, synaptic plasticity [26], and synaptic consolidation [19].
We will provide an overview of some of the more traditional approaches involving neural networks, both those based on detailed biophysical principles and those that were explored by statistical physicists starting from the seminal work of Hopfield [27]. Much of this has already been extensively reviewed, so the focus of the present review comprises questions like: how can short-term and long-term memory coexist in our brains? While it is known that short-term memory is ubiquitous, what are the synaptic mechanisms needed for long-term memory storage?
It is well known that too much plasticity causes the erasure of old memories, while too little plasticity does not allow for the quick storage of new memories. This palimpsest paradox [28,29] has been at the heart of the quandary faced by modellers of synaptic dynamics. While synaptic consolidation does indeed provide some insights into this, neuroscientists [30,31] have typically focused on synaptic plasticity [32], for which increasingly sophisticated models have emerged over the years [33-35]. There are two broad classes: biophysical models, which incorporate details at the molecular level, and phenomenological models, which relate neuronal activity to synaptic plasticity. It is the latter class of models that we will focus on in this review, both because they are more amenable to statistical physical techniques and because they account for higher-level phenomena like memory formation. Such modelling, while it may not include the specific details of chemical and biological processes in the brain, can outline possible mechanisms that take place in simplified structures. For example, the study of neural networks [33-35], while it greatly simplifies biological structures in order to make them tractable, has still been able to make an impact on the parent field. In particular, neural networks such as the Hopfield model [27,36] have been extensively investigated via methods borrowed from the statistical physics of disordered and complex systems [37-39]. In these models, memories are stored as patterns of neural activities, which correspond both to low-energy states and to attractors of the stochastic dynamics of the model.
What this class of phenomenological models lacks in biological detail, it typically makes up for in minimalism. Abbott, one of the pioneers in this field, summed up its virtues thus [40]: "Identifying the minimum set of features needed to account for a particular phenomenon and describing these accurately enough to do the job is a key component of model building. Anything more than this minimum set makes the model harder to understand and more difficult to evaluate. The term 'realistic' model is a sociological rather than a scientific term. The truly realistic model is as impossible and useless a concept as Borges' map of the empire that was of the same scale as the empire and that coincided with it point for point."
Within this class of models, there is yet another divide; there are models which focus on the fast dynamics of neurons, and then those that focus on the slow dynamics of synapses. We will review each one in turn. In particular, in the second case, we will focus on the nature of synaptic dynamics, which involve competition and cooperation [41]. There is abundant evidence that correlation-based rules of synaptic cooperation, which lead to the outcome 'neurons that fire together, wire together', are followed in many organisms; the latter is known as Hebb's rule, due to the pioneering work of Hebb in establishing it [42]. In synaptic cooperation therefore, synapses that work together are rewarded by being strengthened. However, synapses also have a competitive side: while some synapses grow stronger and prosper, others, which left to themselves would also have strengthened, instead weaken. (An example of this can be seen in the process of ocular dominance segregation [43], where competitive correlations ensure that inputs to the left and right eye, though they fire together, do not wire together). Of these two processes, synaptic cooperation is by far the more commonly used in mathematical modelling; however, its unbridled prevalence leads to instabilities, for which synaptic competition provides a cure. From a more biological standpoint, synaptic competition is a concept that has long found favour with the neuroscience community [31]; however, its use is relatively recent in the context of statistical physics models. The present review accordingly emphasises those approaches where synaptic cooperation and competition are key.
We begin this article with a review of the Hopfield model (Section 2), where we describe the model as well as its use in storing and retrieving random patterns. We then turn to phenomenological models of synaptic plasticity (Section 3), which are further classified as rate-based models (Section 3.1) and spike-time-dependent plasticity (STDP) models (Section 3.2), where the synaptic strength is always treated as a continuous variable. A change of key sets in Section 4, when synapses are discretised, with the further possibility (Section 4.1) of occupying a multiplicity of states. In the following section (Section 5), we present an extensive review of the neuroscience literature to do with the perceived need for synaptic competition. These ideas are implemented in Section 6 where, in particular, synaptic strengths are discretised and competitive dynamics embedded, using tools from statistical physics. In the Discussion (Section 7), we summarise the state of the literature and discuss some future challenges.

The Hopfield model
Appropriately for the readership of this journal, we start by reviewing the Hopfield model, both because this is one of the seminal contributions of physics to the field and also because it underlies a large class of models (STDP, cf. Section 3.2).
In 1982, John Hopfield introduced an artificial neural network to store and retrieve memory like the human brain [27,36]. In such a fully connected network of $N$ neurons, there is a connectivity (synaptic) weight $J_{ij}$ between any two neurons $i$ and $j$, which is symmetric, so that $J_{ij} = J_{ji}$, and $J_{ii} = 0$. Such a network is initially trained to store a number of patterns or memories. It is then able to recognise any of the learned patterns by exposure to only partial or even somewhat corrupted information about that pattern; that is, it eventually settles down and returns the closest pattern or the best guess.
We present here a simple picture of memory storage and retrieval along the lines of [44]. Each neuron is characterised by a variable $S$ which takes the value $+1$ if the neuron is firing and $-1$ if the neuron is not firing. At time $t+1$, the neuron labelled by the index $i$, where $i = 1, 2, 3, \ldots, N$ for a system of $N$ cells, fires or does not fire based on whether the total signal it is receiving from other cells to which it is synaptically connected is positive or negative. Thus, the basic dynamical rule is
$$S_i(t+1) = \mathrm{sgn}\Big(\sum_{j} J_{ij}\, S_j(t)\Big), \qquad (1)$$
where $J_{ij}$ is a continuous variable representing the strength of the synapse connecting cell $j$ to cell $i$. The basis of a network associative memory is that the above dynamics can map an initial state of firing and non-firing neurons, $S_i(0)$, to a fixed pattern, $\xi_i$, which remains invariant under it. Various memory patterns $\xi_i^\mu$ for $\mu = 1, 2, 3, \ldots, P$ which do not change under the above transformation act as fixed-point attractors; initial inputs $S_i(0)$ are mapped to an associated memory pattern $\xi_i^\mu$ if the overlap $\sum_i \xi_i^\mu S_i(0)/N$ is close enough to one. How close this overlap must be to one, or equivalently how well the initial pattern must match the memory pattern in order to be mapped to it and thus associated with it, is determined by the radius of the domain of attraction of the fixed point. The issue of domains of attraction associated with a fixed point has never been completely resolved. The sum of all synaptic inputs at site $i$,
$$h_i^\mu = \sum_{j \neq i} J_{ij}\, \xi_j^\mu,$$
known as the local field, is the signal which tells cell $i$ whether or not to fire when $S_j = \xi_j^\mu$ for all $j \neq i$. In order for a memory pattern to be a stable fixed point of the dynamics, the local field must have the same sign as $\xi_i^\mu$, or equivalently
$$h_i^\mu\, \xi_i^\mu > 0.$$
We will call the quantities $h_i^\mu \xi_i^\mu$ the aligned local fields. It seems reasonable to assume that the larger the aligned local fields are for a given value of $\mu$, the stronger the attraction of the corresponding fixed point $\xi_i^\mu$ and so the larger its domain of attraction.
This reasoning is almost right, but it leaves out an important feature of the above dynamics. Multiplying the $J_{ij}$ by any positive constant has absolutely no effect, since the dynamics depends only on the sign and not on the magnitude of the quantity $\sum_j J_{ij} S_j$. Since the quantities $h_i^\mu \xi_i^\mu$ change under this multiplication, they alone cannot determine the size of the basin of attraction. Instead, it has been found that quantities known as stability parameters, given by
$$\gamma_i^\mu = \frac{h_i^\mu\, \xi_i^\mu}{|J_i|},$$
where we define
$$|J_i| = \Big(\sum_{j \neq i} J_{ij}^2\Big)^{1/2},$$
provide an important indicator of the size of the basin of attraction associated with the fixed point $\xi_i^\mu$. Roughly speaking, the larger the values of the $\gamma_i^\mu$, the larger the domain of attraction of the associated memory pattern. In order to construct an associative memory, one must find a matrix of synaptic strengths $J_{ij}$ which satisfies the condition of stability of the memory fixed points and has a specified distribution of values for the $\gamma_i^\mu$, giving the domain of attraction which is desired.
Notice that in the above, the synapses are used for storage and retrieval of memories, as well as a way of updating the neuronal states; in other words, they are not explicitly updated.
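The storage and retrieval picture above can be illustrated with a minimal numerical sketch. It assumes, for concreteness, that the couplings are built with the standard Hebb rule $J_{ij} = N^{-1}\sum_\mu \xi_i^\mu \xi_j^\mu$ (this section does not prescribe a particular construction), that updates are synchronous, and that the network size, number of patterns and noise level are arbitrary illustrative choices. All function names are hypothetical.

```python
import numpy as np

def hebbian_weights(patterns):
    """Assumed Hebb-rule couplings J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with J_ii = 0."""
    n_patterns, n_neurons = patterns.shape
    J = patterns.T @ patterns / n_neurons
    np.fill_diagonal(J, 0.0)
    return J

def update(J, S):
    """One synchronous step of S_i(t+1) = sgn(sum_j J_ij S_j(t)); ties map to +1."""
    return np.where(J @ S >= 0, 1, -1)

def retrieve(J, S0, max_steps=50):
    """Iterate the dynamics until a fixed point is reached (or max_steps elapse)."""
    S = S0.copy()
    for _ in range(max_steps):
        S_new = update(J, S)
        if np.array_equal(S_new, S):
            break
        S = S_new
    return S

def stability_parameters(J, patterns):
    """gamma_i^mu = xi_i^mu h_i^mu / |J_i|, with |J_i| = sqrt(sum_j J_ij^2)."""
    h = patterns @ J.T                      # h[mu, i] = sum_j J_ij xi_j^mu
    norms = np.sqrt((J ** 2).sum(axis=1))   # |J_i| for each row i
    return patterns * h / norms

rng = np.random.default_rng(0)
N, P = 200, 5                               # illustrative sizes only
patterns = rng.choice([-1, 1], size=(P, N))
J = hebbian_weights(patterns)

# Present a corrupted version of pattern 0 (10% of the neurons flipped).
S0 = patterns[0].copy()
flipped = rng.choice(N, size=N // 10, replace=False)
S0[flipped] *= -1

S_final = retrieve(J, S0)
overlap = S_final @ patterns[0] / N         # overlap with the stored pattern
gammas = stability_parameters(J, patterns)
print("final overlap:", overlap)
print("smallest stability parameter:", gammas.min())
```

Rescaling each row of `J` by a positive constant leaves `gammas` unchanged, which is the reason the stability parameters, rather than the aligned local fields, are used as the indicator of the basin of attraction.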

Phenomenological models of synaptic plasticity
We move on now to models where plasticity is invoked, that is, where synapses are explicitly updated. The assumption here is that neuronal firing rates are, in their turn, responsible for synaptic strengthening or weakening. The basic principle at work is Hebb's rule [42], which, as mentioned above, says that 'cells that fire together, wire together'. Another way of viewing this rule is to say that simultaneous events over a period of time suggest a causal link, and many rate-based models of synaptic plasticity have been formulated on this basis. More recently, however, a great deal of attention has been paid to a much stricter definition of causality via the field of STDP [45,46]: here, synaptic strengthening only occurs if one of the neurons is systematically active just before another one. In addition to realising the Hebbian condition that a synapse should be strengthened only if it constitutes a causal link between the firing of pre- and post-synaptic neurons, STDP also leads to the weakening of synapses which connect neurons whose firings are temporally correlated, but where the firing is not causally ordered.
We briefly review these two classes of models below.

Rate-based models
Here, the rate of pre- and post-synaptic activities measured over some time period determines the sign and magnitude of synaptic plasticity. The activities are modelled as continuous variables, corresponding to a suitable average of neuronal firing rates. The rate of change of synaptic strength or weight $J_i$ at synapse $i$ is modelled as a function of the presynaptic input $x_i$ at that synapse, the postsynaptic output activity $y$, the weight itself, and, in the most general case, the weights of other synapses:
$$\frac{\mathrm{d}J_i}{\mathrm{d}t} = F\big(x_i,\, y,\, J_i,\, \{J_j\}_{j \neq i}\big),$$
where $F$ stands for a model-dependent function. Without the competition from other synapses $J_j$, synaptic weights could grow uncontrollably. Before the explicit inclusion of synaptic competition, this instability was combated in two ways. In Oja's model [47], Hebbian plasticity was augmented with a decay term, so that weights equilibrated to the first principal component of the input correlation matrix. Another way forward was shown by the BCM model [48], which explicitly included both LTP and LTD regions, with a sliding threshold separating them; when synaptic weights became too large, the threshold shifted so that any further activation led to synaptic depression. Subsequently, indirect ways of including synaptic competition (Section 5), such as the normalisation of the total synaptic weights, were included in the modelling; more recently, there have been a number of approaches where synaptic weights are discretised (Section 4) and competition explicitly implemented (Section 6).
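As an illustration of how a decay term stabilises Hebbian growth, the following sketch implements Oja's rule in its usual discrete-time form, $\Delta J_i = \eta\, y\,(x_i - y J_i)$ with a linear output $y = \sum_i J_i x_i$. The learning rate, the Gaussian input statistics and the function name are assumptions made for the example only; they are not taken from [47].

```python
import numpy as np

def oja_train(X, eta=0.002, epochs=20, seed=0):
    """Train a single linear unit y = J.x with Oja's rule on the rows of X."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    J = rng.normal(scale=0.1, size=n_inputs)    # small random initial weights
    for _ in range(epochs):
        for x in X[rng.permutation(n_samples)]:
            y = J @ x                           # postsynaptic activity
            # Hebbian growth (y * x) plus a decay term (-y^2 * J) that bounds |J|.
            J += eta * y * (x - y * J)
    return J

rng = np.random.default_rng(1)
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])                      # assumed input covariance
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=2000)

J = oja_train(X)

# Compare with the leading eigenvector of the sample correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X / len(X))
pc1 = eigenvectors[:, -1]                       # eigh sorts eigenvalues ascending
alignment = abs(J @ pc1) / np.linalg.norm(J)
print("alignment with first principal component:", alignment)
print("weight norm:", np.linalg.norm(J))
```

With plain Hebbian growth (dropping the `- y * J` term) the same loop would make the weight norm grow without bound, which is the instability referred to above.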

Models of STDP
STDP provides the answer to the following question: For neurons embedded in a network which are bombarded with millions of inputs, which ones are important? Which information should a given neuron 'listen' to and pass along to downstream neurons? These are the formidable questions that the vast majority of neurons in the brain have to solve during brain development and learning. The crucial link is causality: if one of the cells is active systematically just slightly before another, the firing of the first one might have a causal link to the firing of the second one, and this causal link could be remembered by increasing the wiring of connections. Theoreticians in the mid-1990s realised just how important temporal order was for conveying and storing information in neuronal circuits, and experimenters saw how the synaptic connections of the brain should be acutely sensitive to timing. Thus the field of STDP was born, via the key studies of Markram and Gerstner [45,46]. With STDP, a neuron embedded in a neuronal network can 'determine' which neighbouring neurons are worth connecting with, by potentiating those inputs that predict its own spiking activity, and effectively ignoring the rest [49]. The net result is that the sample neuron can integrate inputs with predictive power and transform this into a meaningful predictive output, even though the meaning itself is not strictly known by the neuron. An early example of using such models in associative memory can be found in [50]. This introduces 'spiking' neurons in a Hopfield [27,36] network: by the term spiking, three main features are implied, which are: (a) a neuron fires when a given threshold is reached; (b) it then undergoes a period of rest, which is referred to as 'refractoriness'; and (c) noise may be added to the firing rates.
The synapses connecting the neurons follow a Hebbian learning rule (with no explicit competition) whereby incoming patterns are learnt, and their retrieval analysed along the lines of Section 2 as a function of various parameters.
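The timing dependence described above is often summarised by a pair-based learning window. The sketch below assumes the commonly used exponential window, with potentiation when the presynaptic spike precedes the postsynaptic one and depression otherwise; the amplitudes, time constants, all-to-all pairing scheme and function names are illustrative assumptions rather than the specific rules of [45,46,50].

```python
import math

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms.

    dt > 0 (pre before post, causal order)  -> potentiation
    dt < 0 (post before pre, acausal order) -> depression
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def total_change(pre_spikes, post_spikes):
    """Sum the pair contributions over all pre/post spike pairs (all-to-all pairing)."""
    return sum(stdp_dw(t_post - t_pre)
               for t_pre in pre_spikes
               for t_post in post_spikes)

# The presynaptic cell fires systematically 5 ms before the postsynaptic cell...
pre = [10.0, 50.0, 90.0]
post = [15.0, 55.0, 95.0]
causal = total_change(pre, post)

# ...and the same spike trains with the roles of the two cells exchanged.
acausal = total_change(post, pre)

print("causal ordering, net change:", causal)
print("acausal ordering, net change:", acausal)
```

The two spike trains are equally correlated in both cases; only the temporal order differs, and it alone decides whether the synapse is strengthened or weakened.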
While models of neurons themselves are the subject of considerable discussion [51], these early models have been greatly refined in recent times and are usefully summarised in [52]. However, as pointed out in [49], these theories are limited by the types of plasticity invoked in the models concerned. Indeed, in [53], it is tacitly acknowledged that without appropriate compensatory mechanisms (referred to there as being 'non-Hebbian'), Hebbian learning alone is not able to account for the reliable storage and recall of memories; the necessary mechanisms invoked in [53] involve, in addition to the Hebbian LTP/LTD, the (implicitly competitive) mechanism of heterosynaptic up- and down-regulation of synapses, as well as transmitter-induced plasticity and consolidation. This indeed reinforces the perceived need for some form of competition, as well as a somewhat more parsimonious form of modelling where possible.
Before concluding, we also mention that most STDP models can be averaged and reduced to rate-based models with certain assumptions: if all nodes interact with each other, they can be reduced to correlation-based models [54], whereas if nearest-neighbour interactions exist, the models that result are similar to the BCM model [55]. However, the fast dynamics of neurons, on which the STDP models are based, continue to attract a lot of research interest. Typically, models of integrate-and-fire neurons on networks have been extensively studied, and their different dynamical regimes explored [56]. In [57], the memory performance of a class of modular attractor neural networks has been examined, where modules are potentially fully-connected networks connected to each other via diluted long-range connections. Interest in this fast dynamical regime has also been fuelled by the discovery of neuronal avalanches in the brain [58], which was followed by several dynamical models of neural networks [59,60], where the statistics of avalanches were investigated [61-66] and reviewed in [67]. In fact, the field of spiking neurons is now so well-established that it is the subject of textbooks, of which an excellent example is the one by two of the most important workers in the field, Gerstner and Kistler [35].

State-based models
An alternative to considering unbounded and continuous synaptic weights, as is done in Sections 2 and 3, is to consider discrete synapses, with a limited number of synaptic states, whose weights are bounded. This has experimental support [26,68] and also has the advantage that binary synapses, say, may be more robust to noise than continuous synapses [69]. An essential property of these models, as well as of real neural networks, is that their capacity is finite. Such bounded synapses have the palimpsest property, that is, new memories are stored at the cost of old ones being overwritten [29]. This is in marked contrast to the case of unbounded synapses, where the overall quality of both old and new memories degenerates as new information is processed. For bounded synapses, therefore, forgetting is an important aspect of continued learning [26,28,29,70-73]. This situation, that of discrete, bounded synapses with an explicit forgetting mechanism, is what we will focus on in the rest of this review.
Van Rossum and coworkers [21,74] have done a body of work on such state-based models; they have shown in particular that there is not an overwhelming reduction in the storage capacity of discrete synapses as compared to continuous ones. In their work, each synapse is described with a state diagram and each state has an associated synaptic weight. The simplest case of binary synapses ('synaptic switches') has been extensively used in earlier mathematical models [21,75-77]. Interactions between synapses are incorporated in the state diagrams. Typically, Markov descriptions are used, and the eigenvalues of the Markov transition matrices give the decay times of the synaptic weights.
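The link between eigenvalues and decay times can be made concrete for the simplest binary synapse. In the sketch below, the per-step switching probabilities `p_up` and `p_down` and the function names are hypothetical; each subleading eigenvalue $\lambda$ of the transition matrix corresponds to a decay time $\tau = -1/\ln|\lambda|$, measured in time steps.

```python
import numpy as np

def binary_synapse_matrix(p_up, p_down):
    """Column-stochastic transition matrix for the states (weak, strong).

    p_up:   probability per time step of switching weak -> strong
    p_down: probability per time step of switching strong -> weak
    """
    return np.array([[1.0 - p_up, p_down],
                     [p_up, 1.0 - p_down]])

def decay_times(M):
    """Decay times -1/ln|lambda| for all eigenvalues except the leading one (= 1)."""
    moduli = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return -1.0 / np.log(moduli[1:])

M = binary_synapse_matrix(p_up=0.05, p_down=0.05)
tau = decay_times(M)[0]
print("decay time (in time steps):", tau)

# Direct check: start in the weak state and watch the relaxation to the
# stationary distribution (0.5, 0.5); the deviation shrinks as exp(-t / tau).
p0 = np.array([1.0, 0.0])
t = 10
p_t = np.linalg.matrix_power(M, t) @ p0
print("deviation after", t, "steps:", p_t[0] - 0.5)
print("prediction 0.5 * exp(-t / tau):", 0.5 * np.exp(-t / tau))
```

For state diagrams with more states the same function returns one decay time per subleading eigenvalue, the slowest of which dominates the late-time forgetting.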
The above mechanism of synaptic plasticity has, however, been shown to be rather inefficient when synapses change permanently [78]. Pure plasticity indeed does not provide a mechanism for protecting some memories while leaving room for other, newer, memories to come in, hence leading to the need for the mechanism of metaplasticity [70-72]. In order to improve performance, Fusi et al. [79] proposed a cascade model of a synapse with many hidden states, which they claimed was able to store long-term memories more efficiently, with a decay that was power-law rather than exponential in time. The pathbreaking idea behind the work of [79] was that the introduction of 'hidden states' for a synapse would enable the delinking of memory lifetimes from instantaneous signal response: while maintaining quick learning, it would also enable slow forgetting. In the original cascade model of [79], this was implemented by the storage of memories at different 'levels': the relaxation times for the memories increased as a function of depth. It was assumed that short-term memories, stored at the uppermost levels, would decay as a consequence of their replacement by other short-term memories ('noise'). On the other hand, longer-lasting memories remained largely immune to such noise as they were stored at the deeper levels, which were accessible only rarely. This hierarchy of timescales models the phenomenon of metaplasticity [80,81] and will be discussed in detail below.

Fusi's cascade model: a quantitative formulation
Fusi's model [79] of a metaplastic binary synapse with infinitely many hidden states was formulated quantitatively and investigated in [82]. Each state is here labelled by its depth $n = 0, 1, \ldots$ At every discrete time step $t$, the synapse is subjected either to an LTP signal (encoded as $\varepsilon(t) = +1$) or to an LTD signal (encoded as $\varepsilon(t) = -1$), where $\varepsilon(t) = \pm 1$ is the instantaneous value of the input signal at time $t$.
The model, portrayed in Figure 1, is defined as follows. The application of an LTP signal can have three effects [82]:
• If the synapse is in its $-$ state at depth $n$, it may climb one level ($n \to n-1$) with probability $\alpha_n$. (This move was absent in the original model.)
• If it is in its $-$ state at depth $n$, it may alternatively hop to the uppermost $+$ state with probability $\beta_n$.
• If it is already in its $+$ state at depth $n$, it may fall one level ($n \to n+1$) with probability $\gamma_n$.
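The three LTP moves listed above translate directly into an update of the occupation probabilities of the two columns of states. The following sketch is a hedged illustration: the cascade is truncated at a finite number of levels (the model of [82] is infinitely deep), the transition probabilities are taken to decay as $2^{-n}$ with hypothetical prefactors `a`, `b` and `g`, and the synapse is started in the uppermost $-$ state rather than in the default state. The function and variable names are likewise assumptions.

```python
import numpy as np

def rates(n_levels, mu_d=np.log(2), a=0.3, b=0.5, g=0.5):
    """Assumed level-dependent probabilities, all proportional to exp(-mu_d * n)."""
    decay = np.exp(-mu_d * np.arange(n_levels))
    alpha = a * decay
    alpha[0] = 0.0       # nothing to climb to from the uppermost level
    beta = b * decay
    gamma = g * decay
    gamma[-1] = 0.0      # truncation: the deepest level kept cannot fall further
    return alpha, beta, gamma

def ltp_step(P, Q, alpha, beta, gamma):
    """One LTP time step; P[n] (Q[n]) is the probability of the - (+) state at depth n."""
    P_new = P * (1.0 - alpha - beta)       # - states that neither climb nor hop
    P_new[:-1] += alpha[1:] * P[1:]        # climb n -> n-1 within the - column
    Q_new = Q * (1.0 - gamma)              # + states that stay put
    Q_new[1:] += gamma[:-1] * Q[:-1]       # fall n -> n+1 within the + column
    Q_new[0] += np.sum(beta * P)           # hop from any - state to the uppermost + state
    return P_new, Q_new

n_levels = 20
alpha, beta, gamma = rates(n_levels)

P = np.zeros(n_levels)
P[0] = 1.0                                 # start in the uppermost - state
Q = np.zeros(n_levels)

P1, Q1 = ltp_step(P, Q, alpha, beta, gamma)    # state after a single LTP event

for _ in range(200):                       # sustained LTP signal of 200 steps
    P, Q = ltp_step(P, Q, alpha, beta, gamma)

D = np.sum(Q - P)                          # total polarisation of the synapse
mean_depth = np.arange(n_levels) @ Q / Q.sum()
print("total probability:", P.sum() + Q.sum())
print("polarisation after sustained LTP:", D)
print("mean depth of the + column:", mean_depth)
```

Under a sustained LTP signal the $-$ column empties and the weight of the $+$ column drifts to deeper levels, which is the qualitative behaviour the cascade is designed to produce.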
Long-term memories will be stored in the deepest levels of the synapse, because of the persistent application of unimodal signals. The effect of noise on such a long-term memory here is to replace a long-term memory by a short-term memory of the opposite kind. If, for example, the signal is composed entirely of $+$ events ($+ + + + + + +$), an isolated $-$ event could be seen to represent the effect of noise. In this case, the Fusi model [79] predicts that the signal is thrown from a deep positive level of the synapse to the uppermost level of the negative pole. Seen differently, this mechanism converts a long-term memory of one kind to a short-term memory of the opposite kind.
Along the lines of [79,82], the transition probabilities of this model are assumed to decay exponentially with level depth $n$; that is, $\alpha_n$, $\beta_n$ and $\gamma_n$ all fall off as $e^{-\mu_d n}$, up to level-independent prefactors. The corresponding characteristic length, $\xi_d = 1/\mu_d$, is one of the key ingredients of the model; it measures the number of fast levels at the top of the synapse and will be referred to as the dynamical length of the problem. The choice made in [79] corresponds to $e^{-\mu_d} = 1/2$, that is, $\mu_d = \ln 2$. A different characteristic length, the static length $\xi_s$, gives a measure of the effective number of occupied levels in the default state [82]. The regime of most interest is where $\xi_s$ is moderately large, so that the default state extends over several levels. The mean level depth is then essentially given by the static length.
[Figure 1: schematic of the cascade model, with the corresponding transition probabilities indicated. In each panel, the left (resp. right) column corresponds to the $-$ (resp. $+$) state. The model studied in this work is actually infinitely deep (after Ref. [82]).]
The level-resolved output signal of level $n$ at time $t$,
$$D_n(t) = Q_n(t) - P_n(t), \qquad (11)$$
and the total output signal at time $t$,
$$D(t) = \sum_{n \ge 0} D_n(t),$$
can be expressed in terms of the probabilities $P_n(t)$ (or $Q_n(t)$) for the synapse to be in the $-$ state (or the $+$ state) at level $n = 0, 1, \ldots$ at time $t = 0, 1, \ldots$ We now describe the effect of an LTP signal, that is, a sustained input of potentiating pulses lasting for $T$ consecutive time steps ($\varepsilon(t) = +1$ for $1 \le t \le T$), on the model synapse. The synapse, assumed to be initially in its default state [82], will get almost totally polarised in response to the persistent signal.
This saturation phenomenon is illustrated in Figure 2, which shows the output signal $D(t)$ for several durations $T$ of the LTP signal. The synapse slowly builds up a long-term memory in the presence of a long enough LTP signal, as the memorized signal moves to deeper and deeper levels. At the end of the learning phase ($t = T$), the polarisation profile will have the form of a sharply peaked travelling wave, around a typical depth which grows according to the logarithmic law [82] $n(T) \approx \xi_d \ln \gamma T$. After the signal is switched off, the total output signal decays. The late stages of the forgetting process are characterized by a universal power-law decay of the output signal, known as power-law forgetting [83][84][85]. The forgetting exponent is always larger than unity and depends on the ratio of the dynamical and static lengths $\xi_d$ and $\xi_s$. As Equation (14) shows, it has no dependence on the duration of the learning phase, in keeping with the requirements of universality.

Comparison of cascade model with experiment
The cascade model and its variants have frequently been criticised for being somewhat abstract; one response has been to come up with ever more sophisticated models for synaptic consolidation which incorporate the multiple timescales inherent in the cascade model. A three-layered model of synaptic consolidation has been proposed that accounts for data across a large range of experimental conditions [86]; while it has a daunting number of parameters (17), it is able to incorporate the retention of long-term memories. Fusi's own recent extension of the cascade model is also rather intricate: memories are stored and retained through complicated coupled processes operating on multiple timescales. This is achieved by combining multiple dynamical processes that initially store memories in fast variables, and then progressively transfer them to slower variables. It has the advantage of a larger memory capacity, while the corresponding disadvantage is that it is even more abstract than his earlier model, so that the biological processes involved have to be explained via systems of communicating vessels [87]. We choose here instead to highlight a link with an experiment [88] whose findings are explained by the complex synaptic architectures of Fusi's original model [79], to counter the proposition that the cascade model is 'too abstract' to be useful. In particular, the experiment involves a single synapse connecting two cells, so that the Fusi model of a single synapse is appropriate. Specifically, in a system comprising an excitatory synapse between Lymnaea pre- and postsynaptic neurons (visceral dorsal 4 (VD4) and left pedal dorsal 1 (LPeD1-Excitatory)), a novel form of short-term potentiation was found, which was use-dependent but not time-dependent [88].
Following a tetanic stimulation (~10 Hz) in the presynaptic neuron with a minimum of seven action potentials, the synapse became potentiated whereby a subsequent action potential triggered in the presynaptic neuron resulted in an enhanced postsynaptic potential. Further, if an inducing tetanic stimulation was activated, but a subsequent action potential was not triggered, the synapse was shown to remain potentiated for as long as 5 h. However, once this action potential was triggered, the authors found that the synaptic strength rapidly returned to baseline levels. It was also shown that this form of synaptic plasticity relied on the presynaptic neuron, and required pre- (but not post-) synaptic $\mathrm{Ca}^{2+}$/calmodulin-dependent kinase II (CaMKII) activity. Hence, this form of potentiation shares induction and de-potentiation characteristics with other forms of short-term potentiation, but exhibits a timeframe analogous to that of long-term potentiation.
In [89], this experiment was interpreted via a variant of the cascade model described above, as follows: after a process of tetanic stimulation, the initial action potentials, interpreted as a non-random signal, cumulatively built up a long-term memory of the signal in the deepest synaptic levels. The synapse dynamics were then frozen so that further discharge was prevented. When a further action potential was applied, the synaptic dynamics restarted ('use'-dependence): the release of the accumulated memory from the deepest levels of the synapse constituted the observed enhancement of the output signal described in [88]. While this enhancement is plausibly accounted for by the model of metaplastic synapses [82], the explanation of the freezing of the synaptic dynamics and its subsequent use-dependence needed the introduction of a stochastic and bistable biological switch to model the role of the kinase (CaMKII) in the actual experiment [88].
Specifically, the synapse (Figure 1), assumed to be initially in its default state, is subjected to a sustained LTP signal of duration $T_1$ (i.e. the application of $T_1$ action potentials), and to a single action potential at a much later time ($T_2 \gg T_1$). It is subjected to a random input at all the other instants of time ($\varepsilon(t) = +1$ for $1 \le t \le T_1$ and for $t = T_2$, else $\varepsilon(t) = 0$). In the regime where the number of action potentials $T_1$ of the initial signal is larger than some characteristic time $T_0$ of the switch, the freezing probability of the switch at the end of the LTP period is very high, that is, very close to unity. During this learning phase, the output signal $D(t)$ grows progressively from $D(0) = 0$ to a large value $D(T_1)$. The high value of the freezing probability at the end of this phase typically freezes the synaptic dynamics, ensuring that this enhanced output signal is not discharged. When the next action potential is applied at time $T_2$, the switch is turned off, and the synapse then relaxes via the full discharge of the stored, enhanced output signal. Figure 3 shows a quantitative comparison of the theoretical predictions of [82] (upper panel) with sharp-electrode electrophysiology recordings of a VD4/LPeD1 synaptic pair (two lower panels) [89]. The black theoretical curve corresponds to three APs triggered during tetanic stimulation, which are insufficient to result in potentiation of a subsequent excitatory postsynaptic potential in the LPeD1 neuron ($T_1 = 3 \ll T_0$, so that the switch remains off). The red theoretical curve corresponds to 11 APs, resulting in a potentiated subsequent response ($T_1 = 11 \gg T_0$, so the switch is turned on and the synapse is frozen).
The model biological switch used to model the action of the kinase in [89] displays an essential bistability, so that the phenomenon described above is observed more or less frequently depending on the difference between the duration $T_1$ of the initial LTP signal and the characteristic time $T_0$ of the switch. Thus, despite its seeming abstraction, the basic ideas of Fusi's cascade model can indeed be related to real experimental data; in fact, such complex synaptic architectures provide fertile ground for the inclusion of multiple timescales which are essential to the modelling of long-term memory.

Synaptic dynamics: the need for competition
In the models of the preceding section, while synapses have been central to the acquisition and recall of long-term memory, there has been no mention of their embedding networks, in particular to do with the neurons that synapses connect. In this section, we return to the concepts of Section 2 and to the explicit mechanisms of synaptic strengthening and weakening that result from neuronal firing within a network. We have already discussed in Section 3 several phenomenological models of synaptic plasticity, where the need for competitive dynamics has been made clear. In the following, we elaborate on several ways in which these have been implemented in the neuroscience literature.
In what follows, we adopt the line of argument of Van Ooyen's excellent review article on synaptic competition [90], where a distinction is first made between independent and interdependent competition. In interdependent competition, victors emerge as a result of interactions between participants, such as in a sporting event. Interdependent competition is frequently considered, for example, in population biology; here, two species are said to compete if each tries to limit the growth of the other's population. In independent competition, on the other hand, the participants do not interact, but are rather chosen on the basis of some sort of contest. This kind of competition is reminiscent of competitive learning, which was introduced by Kohonen [91], and which will form the basis of the rest of this article.
In neural network models based on competitive learning, only synapses connected to the neurons most responsive to stimuli have their strengths changed. What is implicit here is that these stimuli come from presynaptic neurons so that their correlated transmission to postsynaptic neurons causes the corresponding synapses to be strengthened [92]. Such synaptic competition [31] often arises through Hebbian learning so that when the synaptic strength of one input grows, the strength of the others shrinks. Whereas many models phenomenologically enforce competition by requiring the total strength of all synapses onto a postsynaptic cell to remain constant [41], others implement biochemical processes and modified Hebbian learning rules.
To see how competition between input connections can be enforced, consider $n$ inputs, with synaptic strengths $J_i(t)$ ($i = 1, \ldots, n$), impinging on a given postsynaptic cell at time $t$. Simple Hebbian rules for the change $\Delta J_i(t)$ in synaptic strength in a time interval $\Delta t$ state that the synaptic strength should grow in proportion to the product of the postsynaptic activity level $y(t)$ and the activity level $x_i(t)$ of the $i$th input. Thus
$$\Delta J_i(t) \propto y(t)\, x_i(t)\, \Delta t. \qquad (16)$$
If two inputs activate a common target, one needs competition to make one of the synaptic strengths grow at the expense of the other. A common method to achieve this is to constrain the total synaptic strength via synaptic normalisation: this is the constraint that
$$\sum_{i=1}^{n} J_i^{p}(t) = K, \qquad (17)$$
with $K$ constant and the integer $p$ usually taken to be 1 or 2. Specifically, $p = 1$ conserves the total synaptic strength, whereas $p = 2$ conserves the length of the weight vector. At each time interval $\Delta t$, following a phase of Hebbian learning, in which $J_i(t + \Delta t) = J_i(t) + \Delta J_i(t)$, the new synaptic strengths are forced to satisfy the normalization constraint of Equation (17). Typically this can be enforced by one of two processes: multiplicative or subtractive normalisation. These ensure that synaptic strengths do not grow without bounds.
In subtractive normalization [43,93], the same amount is subtracted from each weight to enforce the constraint. In multiplicative normalization [94][95][96][97], on the other hand, each synaptic weight $J_i(t + \Delta t)$ is scaled in proportion to its size. In [94], a two-layer model is proposed, where the stimuli in neurons of the input layer are sent to an output layer of neurons. If the neuronal inputs are above some specified threshold, then the responses in the output layer are calculated, taking into account the pattern of synaptic connections; weights are updated by a Hebbian rule after this neuronal activity stabilises. The final outcome of development may of course differ depending on whether multiplicative or subtractive normalization is used [12,98].
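To make the difference between the two normalisation schemes concrete, here is a minimal sketch of one Hebbian step (Equation (16)) followed by enforcement of the constraint of Equation (17). It assumes a linear postsynaptic unit, $y = \sum_i J_i x_i$, a proportionality constant of 1, and $p = 1$ in the subtractive case; the function names and numerical values are illustrative only and are not taken from the cited works.

```python
def hebbian_step(J, x, dt=0.1):
    """One Hebbian update, J_i -> J_i + y * x_i * dt.

    Assumptions: linear postsynaptic activity y = sum_i J_i x_i,
    and a proportionality constant of 1 in the Hebbian rule.
    """
    y = sum(Ji * xi for Ji, xi in zip(J, x))
    return [Ji + y * xi * dt for Ji, xi in zip(J, x)]


def normalise_multiplicative(J, K=1.0, p=1):
    """Rescale all weights by a common factor so that sum_i J_i**p == K."""
    scale = (K / sum(Ji ** p for Ji in J)) ** (1.0 / p)
    return [Ji * scale for Ji in J]


def normalise_subtractive(J, K=1.0):
    """Subtract the same amount from every weight so that sum_i J_i == K."""
    excess = (sum(J) - K) / len(J)
    return [Ji - excess for Ji in J]


J0 = [0.5, 0.3, 0.2]   # illustrative initial strengths, sum = K = 1
x = [1.0, 0.5, 0.2]    # illustrative input activities

J_hebb = hebbian_step(J0, x)
J_mult = normalise_multiplicative(J_hebb, K=1.0, p=1)
J_sub = normalise_subtractive(J_hebb, K=1.0)

print("after Hebbian step:", J_hebb)
print("multiplicative    :", J_mult)
print("subtractive       :", J_sub)
```

Multiplicative normalisation preserves the ratios between the weights, whereas subtractive normalisation preserves their differences; the latter can drive a weak weight negative unless a lower bound is imposed as well.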
Kohonen [91] proposed a drastic but effective simplification of the approach of [94]. In the latter, a few hotspots of activity typically emerged in the output layer following the iterations of the input activity via the lateral synapses. To obviate the considerable time taken to ensure the convergence of these iterations, Kohonen proposed the centring of the activity in the output layer on the so-called 'winning' neurons, followed by standard Hebbian learning. This important simplification is vital to the statistical physics approaches that will be presented in Section 6.1. Another way of viewing this is to regard it as yet another nonlinear approach to competitive learning; if the layer of output neurons is assumed to be connected by inhibitory synapses, the neuron with the largest initial activity can be said to suppress the activity of all other output neurons.
The competitive approaches described in the above paragraphs are often described as hard, in the sense of being 'winner-take-all'. In soft competitive learning, all neurons in the output layer are updated by an amount that takes into account both their feed-forward activation and the activity of other output neurons. This will also be seen to have equivalences with agent-based learning models in the statistical physics approaches of Section 6.1.
Another approach for achieving competition is to modify the simple Hebbian learning rule of Equation (16) so that both increases in synaptic strength (LTP) and decreases in synaptic strength (LTD) can take place. If we assume that the presynaptic activity level $x_i(t)$ as well as the postsynaptic activity level $y(t)$ must be above some thresholds, respectively $\theta_x$ and $\theta_y$, to achieve LTP (and otherwise yield LTD), then a suitable synaptic modification rule is [41]
$$\Delta J_i(t) \propto [y(t) - \theta_y]\,[x_i(t) - \theta_x]\,\Delta t.$$
Thus, if both $y(t)$ and $x_i(t)$ are above their respective thresholds, LTP occurs; if one is below its threshold and the other is above, LTD occurs. For this to qualify as proper competition, the synaptic strength lost through LTD must roughly equal the strength gained through LTP. This can only be achieved with appropriate input correlations, which makes simple LTD a fragile mechanism for achieving competition [41]. Another mechanism which ensures that when some synaptic strengths increase, others must correspondingly decrease (so that competition occurs) is to make one of the thresholds variable. If the threshold $\theta_x^i$ increases sufficiently as the postsynaptic activity $y(t)$ or synaptic strength $J_i(t)$ increases, conservation of synaptic strength is achievable [41]. Similarly, if the threshold $\theta_y$ increases faster than linearly with the average postsynaptic activity, then the synaptic strengths will adjust to keep the postsynaptic activity near a limiting value [48]. This, however, results in temporal competition between input patterns, rather than spatial competition between different sets of synapses.
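The sign structure of the thresholded rule can be checked with a few lines of code. The sketch below simply evaluates the product of the two thresholded activities; the function name, the thresholds and the activity values are illustrative assumptions.

```python
def threshold_hebbian_update(x_i, y, theta_x, theta_y, dt=1.0, rate=1.0):
    """Change in synaptic strength: rate * (y - theta_y) * (x_i - theta_x) * dt."""
    return rate * (y - theta_y) * (x_i - theta_x) * dt


theta_x = theta_y = 0.5   # illustrative fixed thresholds

cases = {
    "pre high, post high": (0.9, 0.9),
    "pre low,  post high": (0.1, 0.9),
    "pre high, post low ": (0.9, 0.1),
}
for label, (x_i, y) in cases.items():
    dJ = threshold_hebbian_update(x_i, y, theta_x, theta_y)
    kind = "LTP" if dJ > 0 else "LTD"
    print(f"{label}: dJ = {dJ:+.2f} ({kind})")
```

Taken literally, the product is also positive when both activities lie below their thresholds, so an implementation that wants LTD in that case has to treat it separately.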
So far, causal links between seemingly correlated firings of neurons have been assumed. As before, STDP (cf. Section 3.2) makes this explicit via its emphasis on the relative timing of pre- and postsynaptic activity. In the approach of [99], presynaptic activity that precedes postsynaptic spikes strengthens a synapse, whereas presynaptic activity that follows postsynaptic spikes weakens it. As a consequence of the intrinsic nonlinearity of the spike generation mechanisms, and with the imposition of hard limits on synaptic strengths, STDP has the effect of keeping the total synaptic input to the neuron roughly constant, independent of the presynaptic firing rates. This approach, of rewarding truly correlated neuronal activity while penalising its absence, has been taken into account in the models of synaptic dynamics presented in Section 6.2.

Statistical physics models of competing synapses
The emergence of new areas in physics has strongly contributed to the development of analytical tools; this is particularly true for the field of complex systems. A particular area which is of relevance in the context of this review is that of agent-based modelling; here, local interactions among agents may give rise to emergent phenomena on a macroscopic scale [100]. In these models, agents on the sites of appropriately defined lattices interact with each other; their collective behaviour is then analysed in terms of global outcomes. A typical example arises in, say, the context of financial markets; trading rules between different agents at an individual level can result in specific sets of traders, or their representative strategies, winning over their competitors. This makes for interesting analogies with competitive learning; approaches based on this have therefore successfully been used to investigate a wide variety of topics, ranging from the diffusion of innovations [101,102] through gap junction connectivity in the pancreas [103] to the dynamics of competing synapses [104][105][106]. It is the latter which will concern us here, but in the interests of completeness, we first briefly review an agent-based model of competitive learning in the following [101].

An agent-based model of competitive learning
The underlying idea [101] is that the strategy of a given agent is to a large extent determined by what the other agents are doing, through considerations of the relative payoffs obtainable in each case. Agents are located at the sites of a regular lattice, and can be associated with one of two types of strategies. Every agent revises its choice of type at regular intervals, and in this it is guided by two rules: a majority rule, reflecting the tendency of agents to align with their local neighbourhood, followed by an adaptive performance-based rule, via which the agent chooses the type that is more successful locally.
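Assuming that the agents sit at the nodes of a $d$-dimensional regular lattice with coordination number $z = 2d$, the efficiency of an agent at site $i$ is represented by an Ising spin variable:
$$\eta_i(t) = \pm 1, \qquad (19)$$
the sign labelling which of the two strategies ($+$ or $-$) the agent has adopted at time $t$. The evolution dynamics of the lattice is governed by two rules. The first is a majority rule, which consists of the alignment of an agent with the local field (created by its nearest neighbours) acting upon it, according to:
$$\eta_i(t + \tau_1) = \operatorname{sign} h_i(t) \quad \text{for } h_i(t) \neq 0.$$
Here, the local field
$$h_i(t) = \sum_{j} \eta_j(t) \qquad (21)$$
is the sum of the efficiencies of the $z$ neighbouring agents $j$ of site $i$ and $\tau_1$ is the associated time step. Next, a performance rule is applied. This starts with the assignment of an outcome $\sigma_i$ (another Ising-like variable, with values of $\pm 1$ corresponding to success and failure respectively) to each site $i$, according to the following rules:
$$\text{if } \eta_i(t) = +1, \text{ then } \sigma_i(t + \tau_2) = \begin{cases} +1 & \text{w.p. } p_+, \\ -1 & \text{w.p. } 1 - p_+; \end{cases}$$
$$\text{if } \eta_i(t) = -1, \text{ then } \sigma_i(t + \tau_2) = \begin{cases} +1 & \text{w.p. } p_-, \\ -1 & \text{w.p. } 1 - p_-, \end{cases}$$
where $\tau_2$ is the associated time step and $p_\pm$ are the probabilities of having a successful outcome for the corresponding strategy. With $N_i^+$ and $N_i^-$ denoting the total number of neighbours of a site $i$ who have adopted strategies $+$ and $-$ respectively, and $I_i^+$ ($I_i^-$) denoting the number of successful outcomes within the set $N_i^+$ ($N_i^-$), the dynamical rules for site $i$ are:
$$\text{if } \eta_i(t) = +1 \text{ and } \frac{I_i^+(t)}{N_i^+(t)} < \frac{I_i^-(t)}{N_i^-(t)}, \text{ then } \eta_i(t + \tau_3) = \begin{cases} -1 & \text{w.p. } \varepsilon_+, \\ +1 & \text{w.p. } 1 - \varepsilon_+; \end{cases}$$
$$\text{if } \eta_i(t) = -1 \text{ and } \frac{I_i^-(t)}{N_i^-(t)} < \frac{I_i^+(t)}{N_i^+(t)}, \text{ then } \eta_i(t + \tau_3) = \begin{cases} +1 & \text{w.p. } \varepsilon_-, \\ -1 & \text{w.p. } 1 - \varepsilon_-. \end{cases} \qquad (23)$$
Here, the ratios $I_i^\pm(t)/N_i^\pm(t)$ are nothing but the average payoff assigned by an agent to each of the two strategies in its neighbourhood at time $t$ (assuming that success yields a payoff of unity and failure, zero). Also, $\tau_3$ is the associated time step and the parameters $\varepsilon_\pm$ are indicators of the memory associated with each strategy. In their full generality, $\varepsilon_\pm$ and $p_\pm$ are independent variables: the choice of a particular strategy can be associated with either a short or a long memory.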
Once the timescales are set, the above steps of the performance rule are recast as effective dynamical rules involving the efficiencies $\eta_i(t)$ and the associated local fields alone:
$$\text{if } \eta_i(t) = +1, \text{ then } \eta_i(t + 1) = \begin{cases} +1 & \text{w.p. } w_+[h_i(t)], \\ -1 & \text{w.p. } 1 - w_+[h_i(t)]; \end{cases}$$
$$\text{if } \eta_i(t) = -1, \text{ then } \eta_i(t + 1) = \begin{cases} +1 & \text{w.p. } w_-[h_i(t)], \\ -1 & \text{w.p. } 1 - w_-[h_i(t)]. \end{cases}$$
The effective transition probabilities $w_\pm(h)$ are evaluated by enumerating the $2^z$ possible realizations of the outcomes $\sigma_j$ of the sites neighbouring site $i$, and weighting them appropriately. The specific transition probabilities computed will depend on the embedding lattice or network chosen [101]. The above rules are appropriate for cases where the majority rule is clearly definable, that is, where there is a mix of agent types. The situation is less clear when there are large areas of a single species, since then, at least with a sequential update, there is a tendency for any exceptions to revert to the majority type, whatever their performance. The way around this in [101] was to formulate a so-called 'cooperative' model, where, say, a more successful agent surrounded by neighbours who had failed was able to convert all of them to the more successful type, thus stabilising its own success. This hard rule is like the 'winner-takes-all' model of synaptic competition alluded to earlier in this review; analogously to that case, there is also a soft rule, where, while a significant majority of agents were coerced into changing their type, not all were so obliged. In [101], all these models were explored via ordered sequential updates of the agents, and phase diagrams of their extremely different dynamical behaviour in various regimes were presented. The agents were there also deemed to be memoryless, that is, they did not take earlier results into account when they made their choices.
These restrictions were progressively removed in [107,108], so that the behaviour of the model with different updates, different levels of memory, as well as different interactions was explored.
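The majority and performance rules described above can be sketched in a few lines. This is a minimal illustration on a ring (coordination number $z = 2$), not the implementation of [101]: the random tie-breaking when the local field vanishes, the choice to leave an agent unchanged when one strategy is absent from its neighbourhood, the reading of the performance rule as a comparison of average local payoffs, and all parameter values are assumptions made for the sketch.

```python
import random


def majority_update(eta, neighbours, i, rng):
    """Align agent i with its local field h_i, the sum of its neighbours' efficiencies."""
    h = sum(eta[j] for j in neighbours[i])
    if h > 0:
        return +1
    if h < 0:
        return -1
    return rng.choice((+1, -1))   # assumption: ties are broken at random


def assign_outcomes(eta, p_plus, p_minus, rng):
    """Draw sigma_i = +1 (success) w.p. p_+ or p_-, according to the agent's type."""
    return [+1 if rng.random() < (p_plus if e == +1 else p_minus) else -1 for e in eta]


def performance_update(eta, sigma, neighbours, i, eps_plus, eps_minus, rng):
    """Switch agent i to the other type, w.p. eps, if that type pays better locally."""
    plus = [j for j in neighbours[i] if eta[j] == +1]
    minus = [j for j in neighbours[i] if eta[j] == -1]
    if not plus or not minus:
        return eta[i]             # assumption: no comparison possible, no change
    pay_plus = sum(sigma[j] == +1 for j in plus) / len(plus)
    pay_minus = sum(sigma[j] == +1 for j in minus) / len(minus)
    if eta[i] == +1 and pay_plus < pay_minus and rng.random() < eps_plus:
        return -1
    if eta[i] == -1 and pay_minus < pay_plus and rng.random() < eps_minus:
        return +1
    return eta[i]


rng = random.Random(1)
N = 12
neighbours = [[(i - 1) % N, (i + 1) % N] for i in range(N)]   # ring, z = 2
eta = [rng.choice((+1, -1)) for _ in range(N)]

for sweep in range(5):
    for i in range(N):                       # ordered sequential update
        eta[i] = majority_update(eta, neighbours, i, rng)
    sigma = assign_outcomes(eta, p_plus=0.7, p_minus=0.4, rng=rng)
    for i in range(N):
        eta[i] = performance_update(eta, sigma, neighbours, i, 0.5, 0.5, rng)
    print(sweep, "".join("+" if e == +1 else "-" for e in eta))
```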

A minimal model of synaptic dynamics with emergent long-term memory
The diligent reader will have noted the resemblance between Equations (19) and (23) above, and some of the equations governing neuronal and synaptic dynamics earlier presented in this article. Indeed, the detailing of the agent-based model of competitive learning [101] was to motivate just such a comparison. For example, neuronal firings are subject to the kind of local field embodied by Equation (21); the performance in both cases (successful neuronal firings and successful outcomes in the model of [101]) in turn leads to other dynamical changes, and results in global outcomes. These were precisely the lines of thought that led to the use of such agent-based models of competitive learning in some of the early, and somewhat simple-minded, models of synaptic dynamics [104,105].
Let us now recall what is needed for a minimal model of memory, via synaptic dynamics. Both cooperation and competition are needed for a meaningful model of synaptic plasticity [41], with competition acting as a check on the unstable growth of synaptic weights when cooperation alone is invoked [30,109]. Since synapses have finite storage capacities, one should also include a representation of the spontaneous relaxation of synapses when space is created via the spontaneous decay of old memories (cf. the palimpsest effect [28,29]). This is indeed what is done in the model network of synapses and neurons [106] that we will describe in the following. Like the Fusi [79] model, it is a model of discrete rather than continuous synapses; unlike it, however, here, there are explicit mechanisms of synaptic weight change via mechanisms of competing and cooperating synapses that depend intimately on neuronal firing rates.
The dynamical regime chosen in [106] is that of slow synaptic dynamics, where neuronal firings are considered stochastic and instantaneous; the synapses 'see' only the mean firing rates of individual neurons, characterising them as active or inactive on that basis. As a result of this temporal coarse-graining, the overall effect of the microscopic noise can be represented by spontaneous relaxation rates from one type of synaptic strength to the other, so far as the palimpsest mechanism is concerned. Cooperation between synapses is incorporated via the usual Hebbian viewpoint, while the most crucial and original part of the formalism involves synaptic competition where, along the lines of Kohonen's arguments [91], synapses are converted to the type most responsible for neural activity in their neighbourhood [104,105].
The choice of basis is that of a fully connected network, as depicted in Figure 4, so that mean-field theory applies in the thermodynamic limit of an infinitely large network.
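Neurons live on the nodes (sites) of the network, labelled $i = 1, \ldots, N$. The activity state of neuron $i$ at time $t$ is described by a binary activity variable:
$$\nu_i(t) = \begin{cases} 1 & \text{if neuron } i \text{ is active at time } t, \\ 0 & \text{if neuron } i \text{ is inactive at time } t. \end{cases}$$
Active neurons are those whose instantaneous firing rate exceeds some threshold. Synapses live on the undirected bonds of the network. The synapse $(ij)$ lives on the bond joining nodes $i$ and $j$. The strength $J_{ij}$ of synapse $(ij)$ at time $t$ is also described by a binary variable:
$$\sigma_{ij}(t) = \begin{cases} +1 & \text{if synapse } (ij) \text{ is strong at time } t, \\ -1 & \text{if synapse } (ij) \text{ is weak at time } t. \end{cases}$$
Strong synapses are those whose strength $J_{ij}(t)$ exceeds some threshold.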

Neuronal dynamics
Neurons have an instantaneous stochastic response to their environment. The activity of neuron $i$ at time $t$ reads
$$\nu_i(t) = \begin{cases} 1 & \text{w.p. } F(h_i(t)), \\ 0 & \text{w.p. } 1 - F(h_i(t)), \end{cases}$$
where $F(h)$ is an increasing response function of the input field $h_i(t)$. The latter is a weighted sum of the instantaneous activities of all other neurons:
$$h_i(t) = \frac{1}{N} \sum_{j \neq i} \left(a + b\,\sigma_{ij}(t)\right) \nu_j(t).$$
Strong synapses ($\sigma_{ij} = 1$) enter the sum through a synaptic weight $a + b$, while weak ones ($\sigma_{ij} = -1$) have a synaptic weight $a - b$. We assume $a$ and $b$ are constant all over the network. All synapses are therefore excitatory for $b > 0$, and inhibitory for $b < 0$.
Figure 4. The fully connected network for $N = 4$. Neurons with activities $\nu_i = 0, 1$ live on the nodes. Synapses with strength types $\sigma_{ij} = \pm 1$ live on the bonds (after Ref [106]).
In the following, we focus our attention on the slow plasticity dynamics of the synaptic strength variables $\sigma_{ij}(t)$. It will therefore be sufficient to consider the mean activities $\bar\nu_i(t)$ and the mean input fields $\bar h_i(t)$, defined by averaging over a time window which is large w.r.t. the characteristic timescale of neuron firings, but short w.r.t. that of synaptic dynamics. These mean quantities obey
$$\bar\nu_i(t) = F(\bar h_i(t))$$
and
$$\bar h_i(t) = \frac{1}{N} \sum_{j \neq i} \left(a + b\,\sigma_{ij}(t)\right) \bar\nu_j(t).$$
In most of this work we shall consider a spatially homogeneous situation in the thermodynamic limit of a large network. In this case the key quantity is the mean synaptic strength
$$J(t) = \langle \sigma_{ij}(t) \rangle,$$
the average being taken over all synapses, which does not fluctuate anymore. The mean neuronal activity $\bar\nu(t)$ and the mean input field $\bar h(t)$ are related to $J(t)$ by the coupled non-linear equations
$$\bar\nu(t) = F(\bar h(t)) \qquad (33)$$
and
$$\bar h(t) = (a + b J(t))\, \bar\nu(t). \qquad (34)$$
Consider first the case where there are as many strong as weak synapses, so that the mean synaptic strength vanishes ($J = 0$). We have then $\bar h = a \bar\nu$, so that the mean neuronal activity $\bar\nu$ obeys $\bar\nu = F(a \bar\nu)$. We assume that the solution to that equation is $\bar\nu = 1/2$, meaning that there are as many active as inactive neurons on average. We further simplify the problem by linearising the coupled Equations (33) and (34) around this symmetric fixed point. We thus obtain the following expression:
$$\bar\nu(t) \approx f(J(t)), \qquad f(J) = \tfrac{1}{2}\,(1 + \varepsilon J). \qquad (35)$$
The slope of the effective response function,
$$\varepsilon = \frac{b\,F'(a/2)}{1 - a\,F'(a/2)},$$
is one of the key parameters of the model. It has to obey $|\varepsilon| < 1$. It is positive in the excitatory case ($b > 0$), so that $f(J)$ is an increasing function of $J$, and negative in the inhibitory case ($b < 0$), so that $f(J)$ is a decreasing function of $J$.
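The quality of the linearisation can be checked numerically. The sketch below solves the self-consistency condition obtained by combining Equations (33) and (34), $\bar\nu = F((a + bJ)\bar\nu)$, by bisection and compares it with the linear form $(1 + \varepsilon J)/2$, where $\varepsilon = bF'(a/2)/(1 - aF'(a/2))$ follows from a first-order expansion around $J = 0$, $\bar\nu = 1/2$. The logistic form of $F$, its steepness and the values of $a$ and $b$ are assumptions chosen only so that $F(a/2) = 1/2$.

```python
import math

A, B, STEEPNESS = 1.0, 0.2, 2.0   # hypothetical a, b and logistic steepness


def F(h):
    """Assumed logistic response function, shifted so that F(a/2) = 1/2."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (h - A / 2.0)))


def mean_activity(J, tol=1e-12):
    """Solve nu = F((a + b J) nu) for nu in [0, 1] by bisection."""
    def g(nu):
        return F((A + B * J) * nu) - nu

    lo, hi = 0.0, 1.0              # g(0) > 0 and g(1) < 0 for this choice of F
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


F_prime = STEEPNESS / 4.0                  # slope of the logistic F at h = a/2
eps = B * F_prime / (1.0 - A * F_prime)    # slope from linearising around J = 0


def f_linear(J):
    """Linearised effective response function, (1 + eps * J) / 2."""
    return 0.5 * (1.0 + eps * J)


for J in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"J = {J:+.1f}: exact = {mean_activity(J):.4f}, linearised = {f_linear(J):.4f}")
```

With these values the two agree closely for small $|J|$ and drift apart as $|J|$ grows, which is the expected behaviour of a first-order expansion.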

Synaptic plasticity dynamics
Synaptic strengths evolve very slowly in time, compared to the fast timescale of the firing rates of neurons. It is therefore natural to model synaptic dynamics as a stochastic process in continuous time [110], defined in terms of effective jump rates between the two values (strong or weak) of the synaptic strength.
The model includes the following three plasticity mechanisms which drive synaptic evolution:
1. Spontaneous relaxation mechanism. Synapses may spontaneously change their strength type, either from weak to strong (potentiation) or from strong to weak (depression), as a result of noise. This spontaneous relaxation mechanism, illustrated in Figure 5, translates into
$$\sigma_{ij} = -1 \to +1 \ \text{with rate } \Omega, \qquad \sigma_{ij} = +1 \to -1 \ \text{with rate } \omega.$$
2. Hebbian mechanism. When two neurons are in the same state of (in)activity, the synapse which connects them strengthens; when one of the neurons is active and the other is not, the interconnecting synapse weakens. This is the well-known Hebbian mechanism [42], which we implement as follows:
$$\nu_i(t) = \nu_j(t): \ \sigma_{ij} = -1 \to +1 \ \text{with rate } \alpha; \qquad \nu_i(t) \neq \nu_j(t): \ \sigma_{ij} = +1 \to -1 \ \text{with rate } \alpha.$$
3. Polarity mechanism. This is a mechanism of synaptic competition, introduced for the first time in [104,105], which converts a given synapse to the type of its most 'successful' neighbours, i.e. those which augment the firing of an intermediate neuron. Thus, if a synapse $(ij)$ connects two neurons with different activities at time $t$, e.g. $\nu_i(t) = 1$ and $\nu_j(t) = 0$, it will adapt its strength to that of a randomly selected synapse $(ik)$ connected to the active neuron $i$. If the selected synapse is strong, the update $\sigma_{ij} = -1 \to +1$ takes place with rate $\beta$; if it is weak, the update $\sigma_{ij} = +1 \to -1$ takes place with rate $\gamma$. Therefore:
$$\nu_i(t) \neq \nu_j(t),\ \sigma_{ik} = +1: \ \sigma_{ij} = -1 \to +1 \ \text{with rate } \beta; \qquad \nu_i(t) \neq \nu_j(t),\ \sigma_{ik} = -1: \ \sigma_{ij} = +1 \to -1 \ \text{with rate } \gamma.$$
Figure 5. The spontaneous relaxation plasticity mechanism, with its potentiation rate $\Omega$ and depression rate $\omega$ (after Ref [106]).

Mean-field dynamics
For a spatially homogeneous situation in the thermodynamic limit, the mean synaptic strength $J(t)$ obeys a nonlinear dynamical mean-field equation of the form
$$\frac{dJ}{dt} = P(J). \qquad (40)$$
The explicit form of the rate function $P(J)$ is obtained by summing the contributions of the above three plasticity mechanisms.
In the most general situation, the model has five parameters: the slope $\varepsilon$ of the effective response function (35) and the rates involved in the three plasticity mechanisms. The resulting rate function is a polynomial of degree 4 [106]. The spontaneous relaxation mechanism yields a linear rate function, while the Hebbian mechanism is responsible for a quadratic non-linearity and the polarity-driven competitive mechanism is responsible for a quartic non-linearity. This modelling of synaptic competition satisfies the requirement on nonlinearity set out in Section 5 for meaningful synaptic dynamics. The parameter $\varepsilon$ only enters (42) through its square $\varepsilon^2$. The model therefore exhibits an exact symmetry between the excitatory case ($\varepsilon > 0$) and the inhibitory one ($\varepsilon < 0$). (Since none of the plasticity mechanisms distinguishes between these two cases, this symmetry is to be expected.) More generally, the model is invariant if the effective response function $f(J)$ is changed into $1 - f(J)$.

Generic dynamics
The rate function $P(J)$ has an odd number of zeros in the interval $-1 < J < +1$ (counted with multiplicities), that is, either one or three. These zeros correspond to fixed points of the dynamics. As a consequence, the model exhibits two generic dynamical regimes, as shown in Figure 6.
In Regime I (see Figure 6, left), there is a single attractive (stable) fixed point at $J_0$. The mean synaptic strength $J(t)$ therefore converges exponentially fast to this unique fixed point, irrespective of its initial value, according to
$$J(t) - J_0 \sim e^{-t/\tau_0}.$$
The corresponding relaxation time $\tau_0$ reads
$$\tau_0 = -\frac{1}{P'(J_0)},$$
where $\tau_0$ and $J_0$ are obtainable in terms of the model parameters [106]. In Regime II (see Figure 6, right), there are two attractive (stable) fixed points at $J_1$ and $J_2$, and an intermediate repulsive (unstable) one at $J_3$. The mean synaptic strength $J(t)$ converges exponentially fast to either of the attractive fixed points, depending on its initial value, namely to $J_1$ if $-1 < J(0) < J_3$ and to $J_2$ if $J_3 < J(0) < +1$. The corresponding relaxation times read
$$\tau_1 = -\frac{1}{P'(J_1)}, \qquad \tau_2 = -\frac{1}{P'(J_2)}.$$
In other words, Regime II allows for the coexistence of two separate fixed points, leading to network configurations which are composed of largely strong or largely weak synapses. In fact, it is the polarity-driven competitive mechanism which gives rise to the quartic non-linearity, essential for such coexistence.
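The two regimes can be explored numerically with a short script. The rate function used below is a sketch assembled by summing the mean-field contributions of the three mechanisms, under the assumptions that neuronal activities are independent with mean $(1 + \varepsilon J)/2$ and that a randomly selected synapse is strong with probability $(1 + J)/2$; its prefactor conventions may therefore differ from those of [106]. The parameter `g` stands for $\beta - \gamma$, and all numerical values are hypothetical choices made to land in each regime.

```python
def rate_function(J, Omega, omega, alpha, g, eps):
    """Sketch of P(J): spontaneous + Hebbian + polarity contributions (assumed form)."""
    spontaneous = Omega * (1 - J) - omega * (1 + J)               # linear in J
    hebbian = -alpha * J * (1 - eps ** 2 * J)                     # quadratic in J
    polarity = 0.25 * g * (1 - J ** 2) * (1 - eps ** 2 * J ** 2)  # quartic in J
    return spontaneous + hebbian + polarity


def fixed_points(P, n_grid=4000):
    """Zeros of P in (-1, 1), each as (J*, is_stable, relaxation time or None)."""
    found = []
    step = 2.0 / n_grid
    for k in range(n_grid):
        a, b = -1.0 + k * step, -1.0 + (k + 1) * step
        fa, fb = P(a), P(b)
        if fa == 0.0:
            root = a
        elif fa * fb < 0.0:
            for _ in range(60):              # bisection on the bracketing interval
                m = 0.5 * (a + b)
                fm = P(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            root = 0.5 * (a + b)
        else:
            continue
        slope = (P(root + 1e-6) - P(root - 1e-6)) / 2e-6   # numerical P'(J*)
        stable = slope < 0.0
        found.append((root, stable, -1.0 / slope if stable else None))
    return found


regimes = {
    "Regime I ": dict(Omega=0.2, omega=0.1, alpha=1.0, g=0.5, eps=0.5),
    "Regime II": dict(Omega=1.0, omega=0.005, alpha=0.01, g=-4.0, eps=0.9),
}
for name, params in regimes.items():
    points = fixed_points(lambda J: rate_function(J, **params))
    print(name)
    for J_star, stable, tau in points:
        kind = "stable" if stable else "unstable"
        tau_text = f", tau = {tau:.3f}" if stable else ""
        print(f"  J* = {J_star:+.4f} ({kind}{tau_text})")
```

The relaxation time of each stable fixed point is obtained from the linearised dynamics as $-1/P'(J^*)$.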

Critical dynamics
When two of the three fixed points merge at some $J_c$, the dynamical system (40) exhibits a saddle-node bifurcation. In physical terms, the dynamics become critical. We have then
$$P(J_c) = P'(J_c) = 0.$$
There are two kinds of critical points: a left one, where $J_1 = J_3 = J_c^{(L)}$, while $J_2$ remains non-critical, and a right one, where $J_2 = J_3 = J_c^{(R)}$, while $J_1$ remains non-critical. The critical synaptic strength obeys $J_c > 1/3$ [106]. We thus conclude that the critical point is always strengthening, as $J_c$ is always larger than the 'natural' initial value $J(0) = 0$, corresponding to a random mixture of strong and weak synapses in equal proportions.
The mean synaptic strength exhibits a universal power-law relaxation to its critical value, of the form
$$J(t) - J_c \approx \frac{A_c}{t}. \qquad (47)$$
The asymptotic $1/t$ relaxation law (47) holds irrespective of the initial value $J(0)$, provided it is on the attractive side of the critical point, that is, $-1 < J(0) < J_c$ in the left critical case (where $A_c < 0$), or $J_c < J(0) < +1$ in the right critical one (where $A_c > 0$). To sum up, the non-critical fixed points of Regimes I or II are characterised by exponential relaxation; the corresponding relaxation times, whether long or short, are always finite. Anywhere along the critical manifold, on the other hand, one observes a universal power-law relaxation in $1/t$. Such behaviour corresponds to an infinite relaxation time, at least in terms of the mean synaptic strength $J$.
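The contrast between critical and non-critical relaxation can be seen on a toy example. Near a critical point the rate function behaves as $P(J) \approx \tfrac{1}{2} P''(J_c)(J - J_c)^2$, and integrating this normal form gives $J(t) - J_c \approx A_c/t$ with $A_c = -2/P''(J_c)$; this is the standard result for a saddle-node normal form and is not quoted from [106]. The sketch below integrates such a toy critical rate function and, for comparison, a linear (non-critical) one; both rate functions and all numerical values are hypothetical.

```python
def integrate(P, J_init, t_end, dt=0.05):
    """Integrate dJ/dt = P(J) from J(0) = J_init up to t_end with fixed-step RK4."""
    J = J_init
    for _ in range(int(round(t_end / dt))):
        k1 = P(J)
        k2 = P(J + 0.5 * dt * k1)
        k3 = P(J + 0.5 * dt * k2)
        k4 = P(J + dt * k3)
        J += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return J


J_C, C = 0.5, 1.0      # hypothetical critical point and curvature, P''(J_c) = 2 * C
J_0, TAU = 0.5, 2.0    # hypothetical non-critical fixed point and relaxation time


def P_critical(J):
    """Toy critical rate function: double zero at J_C (left critical case)."""
    return C * (J - J_C) ** 2


def P_regular(J):
    """Toy non-critical rate function: simple zero at J_0, exponential relaxation."""
    return -(J - J_0) / TAU


for t in (10.0, 100.0, 1000.0):
    u_crit = integrate(P_critical, 0.0, t) - J_C
    u_reg = integrate(P_regular, 0.0, t) - J_0
    print(f"t = {t:6.0f}: t * (J - J_c) = {t * u_crit:+.4f}, non-critical J - J_0 = {u_reg:+.2e}")
```

In the critical case the product $t\,(J(t) - J_c)$ settles towards $A_c = -1/C$, negative as expected when the critical point is approached from the left, while the non-critical deviation decays exponentially.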
In conclusion, this minimal model is able to show the emergence of power-law relaxation, that is, of long-term memory. Of its ingredients, the most crucial is clearly the mechanism of synaptic competition, which is in reassuring accord with the importance given to such competition by neuroscientists [31] (Section 5). Purely analytical work can, however, give only a flavour of the emergence of long-term behaviour in this model, via the critical behaviour of the mean synaptic strength $J$. If realistic learning and forgetting of patterns are to be implemented with this model, considerable computational work needs to be done. Only the identification of the regions of parameter space where criticality is obtained in response to random input patterns will clarify, at least phenomenologically, the routes to long-term memory in this relatively minimal model.

Discussion
Even quantitative approaches to the subject of memory are truly interdisciplinary; contributions range from mathematical psychology through quantitative neuroscience to statistical physics. The narrowing of focus to physics still provides a huge range of contributions: from the seminal contributions on Hopfield networks with their spin-glass analogies, through the emphasis on causality with spiking neurons, both of which involve fast neuronal dynamics, to the synaptic-dynamics-centred approaches that have followed, with the boundedness of synaptic weights on discrete synapses, involving multiple 'hidden' synaptic states, as well as the attribution of competitive and cooperative dynamics to synapses in model networks. In this review, we have sought to highlight those approaches which generate long-term memory; while short-term memory, characterised by exponential relaxation times, is ubiquitous, long-term memory is characterised by power-law forgetting, a much slower process.
Another emphasis of this review is on synaptic competition, whose importance has long been understood by the neuroscience community, but which has only very recently been explicitly included in model networks. This review has gone into as much detail on the need for this mechanism as on its inclusion in biophysical as well as physics-based modelling. In the latter case, the recent advent of agent-based modelling techniques, derived from game theory [111] and extended to cover nonequilibrium situations, has been particularly useful.
What is still a matter of debate is the extent to which phenomenological models, on which this review has focused, are useful in unravelling the phenomenon of memory storage and recall. While it is certainly true that detailed biophysical models are overall better in matching experimental data point by point, there is a great deal to be said in favour of the formulation of minimal models. These can, unlike the former, at least benefit from a few analytical insights, which can help both experimentalists and theorists identify the parameters that are truly important in what are typically huge parameter spaces, most recently believed to be in 11 dimensions [112]. While these large parameter spaces are indeed inclusive by definition, their inner workings can only be described by computer simulations, which do not always give unambiguous insights on the relative importance of parameters, or answers to physical questions like, what are the crucial mechanisms for memory storage? This is of course not to minimise their importance; we wish only to underscore the complementarity of the insights obtained by minimal physical models to the enigma of memory.

Notes
1. In the literature, this is sometimes referred to as systems consolidation, while synaptic consolidation is traditionally used to describe the molecular mechanism that leads to the maintenance of synaptic plasticity.
2. An older use of the term 'metaplasticity' relates to changes in synapses that are not expressed as changes in synaptic efficacy, but rather alter their responses to subsequent stimuli, an example of this being the sliding threshold of plasticity described in the BCM model [48].
3. Here and throughout the following, primes denote derivatives.