Trajectory reweighting for non-equilibrium steady states

Modern methods for sampling rugged landscapes in state space mainly rely on knowledge of the relative probabilities of microstates, which is given by the Boltzmann factor for equilibrium systems. In principle, trajectory reweighting provides an elegant way to extend these algorithms to non-equilibrium systems, by numerically calculating the relative weights that can be directly substituted for the Boltzmann factor. We show that trajectory reweighting has many commonalities with Rosenbluth sampling for chain macromolecules, including practical problems which stem from the fact that both are iterated importance sampling schemes: for long trajectories the distribution of trajectory weights becomes very broad and trajectories carrying high weights are infrequently sampled, yet long trajectories are unavoidable in rugged landscapes. For probing the probability landscapes of genetic switches and similar systems, these issues preclude the straightforward use of trajectory reweighting. The analogy to Rosenbluth sampling suggests though that path ensemble methods such as PERM (pruned-enriched Rosenbluth method) could provide a way forward.


Introduction
The Boltzmann factor, which describes exactly the relative probability of microstates at equilibrium in systems whose dynamics obeys detailed balance, forms the cornerstone of a plethora of simulation methods in the physical sciences. For example, the seminal Metropolis-Hastings algorithm for Monte-Carlo simulation exploits the Boltzmann factor to generate a trajectory of configurations which sample the Gibbs-Boltzmann distribution [1]. Knowledge of the Boltzmann factor also makes possible a host of biased sampling methods, which allow efficient characterisation of rugged free energy landscapes comprising multiple free energy minima separated by barriers. In these methods, information on a target system of interest is obtained by simulating a reference system, whose microstate probabilities are biased to be different from those of the target system. The results are corrected for the bias by reweighting with, for example, a Boltzmann factor. The reference system is typically easier to sample than the target system. Thus in umbrella sampling [2], an external potential is used to coerce the reference system (or a sequence of such systems) to sample a free energy barrier. The basis of biased sampling schemes is the generic relation

\[
\langle O \rangle_{\rm targ} = \frac{\langle O\,W \rangle_{\rm ref}}{\langle W \rangle_{\rm ref}} , \qquad (1)
\]

where O(x) is some quantity of interest (an order parameter, for example), the brackets ⟨· · ·⟩_ref refer to an average over microstates x for the reference system, the brackets ⟨· · ·⟩_targ refer to an average for the target system, and W is a reweighting factor

\[
W(x) = \frac{P^\infty_{\rm targ}(x)}{P^\infty_{\rm ref}(x)} . \qquad (2)
\]

Here, the ratio P^∞_targ(x)/P^∞_ref(x) is the relative probability of observing the microstate x, where P^∞_ref(x) and P^∞_targ(x) are the steady-state (superscript '∞') probability distributions for the reference and target systems, respectively.
For systems whose dynamics obeys detailed balance this ratio is given analytically by the Boltzmann factor (up to an overall constant of proportionality). Thus, Equation (1) provides a way to compute averages over the target system from a simulation of the reference system: during the simulation, one simply tracks the quantity W and uses it to reweight the average of the quantity of interest O(x). The constant of proportionality does not need to be calculated since it cancels in Equation (1).
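To make Equation (1) concrete in the equilibrium setting, here is a minimal self-contained sketch (our own illustration, not code from the study itself): a 1D harmonic well U(x) = ½kx² is sampled by a Metropolis walker at spring constant k_ref, and averages are reweighted with the Boltzmann factor to a target constant k_targ. All parameter values are illustrative.

```python
import math, random

# Illustration of Equation (1): estimate a target-system average from
# reference-system samples using the Boltzmann reweighting factor
# W = exp(-(U_targ - U_ref)/kT), with kT = 1.  All values are illustrative.

def metropolis_harmonic(k, nsteps, dx=0.5, seed=1):
    """Metropolis sampling of U(x) = 0.5*k*x^2."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(nsteps):
        xnew = x + rng.uniform(-dx, dx)
        if rng.random() < math.exp(-0.5 * k * (xnew**2 - x**2)):
            x = xnew
        samples.append(x)
    return samples

k_ref, k_targ = 1.0, 2.0
xs = metropolis_harmonic(k_ref, 200000)
W = [math.exp(-0.5 * (k_targ - k_ref) * x * x) for x in xs]  # Boltzmann factor
x2_targ = sum(w * x * x for w, x in zip(W, xs)) / sum(W)     # Equation (1)
# analytical target value is kT/k_targ = 0.5
```

Here the reference (k_ref = 1) overlaps well with the target (k_targ = 2); as discussed later in the text, reweighting degrades when this overlap is poor.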
For non-equilibrium systems, whose dynamics does not obey detailed balance, the relative probabilities of microstates are a priori unknown. Hence, one has no analytical expression for the reweighting factor W, precluding the straightforward use of this type of biased sampling scheme. Sampling non-equilibrium steady states is important in a variety of contexts, including statistical mechanical hopping models with driven dynamics [3], sheared soft matter systems [4] and chemical models of gene regulatory circuits [5-7] where failure of detailed balance is arguably responsible for some of the most important biological characteristics [8]. This has motivated recent interest in developing efficient sampling methods for non-equilibrium steady states [9-13] which are not based on Equation (1), but instead take alternative approaches. For example, both the non-equilibrium umbrella sampling [9,11-13] and forward flux sampling [10,14,15] methods involve partitioning state space via a series of interfaces and manipulating the statistics of trajectories between interfaces to enforce sampling of less favourable parts of the state space. While this works for barrier-crossing problems with a well-defined order parameter, it involves significant overhead in terms of defining the interfaces.
In this work, we explore the direct use of biased sampling, via Equation (1), for non-equilibrium steady states. In this approach, the reweighting factor W in Equation (1) is computed numerically, on the fly, using trajectory reweighting. This reweighting concept is not new: it is well established in applied mathematics where it is known as a Girsanov transformation and the trajectory weight is formally a Radon-Nikodym derivative. In applied mathematics, trajectory reweighting is used to calculate the probabilities of rare events [16,17] and for parameter sensitivity analysis [18,19], and in mathematical finance the method is widely used in the context of diffusion equations [16]. In chemical and computational physics, the notion of trajectory weights is also encountered in a host of path-ensemble methods stemming from the seminal works of Jarzynski and Crooks [20-22].
In these existing applications one is typically interested in reweighting trajectories with a fixed (and often relatively small) number of steps n. Here, we explore, using a simple example of a birth-death process, how trajectory reweighting can be used to compute steady-state properties. Interestingly, this approach turns out to be closely related to the Rosenbluth scheme for sampling the configurations of chain macromolecules [1,35-38]. However, we find that it suffers from the same practical problem that afflicts naïve Rosenbluth sampling, in that the distribution of the reweighting factor can become very broad so that microstates that are important in the target system are hard to sample. Moreover, we show that this problem is controlled by the dynamics of the target system, so that it cannot be avoided by a judicious choice of reference system. Thus, we conclude that a straightforward application of trajectory reweighting is unlikely to work, except for trivial examples. However, the analogy to Rosenbluth sampling suggests a possible solution in a prune-and-enrich strategy, which we suggest as a direction for future work.
In the next section we present in more detail the theory of trajectory reweighting, making concrete the analogy to Rosenbluth sampling, and we also explain its practical limitations. We then illustrate these issues using the simple case of a birth-death process, before suggesting ways in which the limitations of the method might be overcome.

Theory of trajectory reweighting
The trajectory weight P_n({x_i}) describes the probability of observing, in a stochastic simulation, a given sequence of n microstates {x_i}, i = 0, · · · , n, given that we start in a prescribed microstate at i = 0. This is a well-defined mathematical object [16] which can be expressed as

\[
P_n(\{x_i\}) = \prod_{i=1}^{n} p(x_{i-1} \to x_i) , \qquad (3)
\]

where the p(x_{i−1} → x_i) are the probabilities of the individual transitions in the underlying simulation algorithm. We assume that these p(x_{i−1} → x_i) are well-defined (and known) quantities, as is the case for simulation schemes such as kinetic Monte-Carlo and Brownian dynamics.
Knowledge of the trajectory weight, Equation (3), makes possible the above-mentioned trajectory reweighting as a biasing approach applied to dynamical trajectories [16-19,23,26-34]. In detail, one simulates a reference system that has altered transition probabilities compared to the target system of interest. For example, in a kinetic Monte-Carlo simulation of a gene regulatory network the reference system might have different chemical rate constants to the target system, or in a simulation of a sheared soft matter system it might have a different shear rate. The relative weight of a given trajectory in the target system, compared to the reference system, is given by

\[
W = \frac{P_n^{\rm targ}(\{x_i\})}{P_n^{\rm ref}(\{x_i\})} = \prod_{i=1}^{n} \frac{p_{\rm targ}(x_{i-1} \to x_i)}{p_{\rm ref}(x_{i-1} \to x_i)} . \qquad (4)
\]

The basic idea is that one can compute averages over trajectories in the target system, from simulations of trajectories in the reference system, using as the reweighting factor the relative trajectory weight given by Equation (4).
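The structure of Equation (4) can be sketched directly: the relative trajectory weight is a running product of per-step ratios p_targ/p_ref. The two-state chain below is a hypothetical toy example of ours, not a system discussed in the text.

```python
# Sketch of Equation (4): the relative trajectory weight W is a running
# product of per-step transition-probability ratios p_targ/p_ref.
# The two-state chain below is a hypothetical toy example.

def relative_trajectory_weight(steps, p_targ, p_ref):
    """steps: list of (x_prev, x_next) transitions; p_*: transition probabilities."""
    W = 1.0
    for x_prev, x_next in steps:
        W *= p_targ(x_prev, x_next) / p_ref(x_prev, x_next)
    return W

def make_p(a):
    """Two-state chain: hop to the other state with probability a, else stay."""
    def p(x_prev, x_next):
        return a if x_next != x_prev else 1.0 - a
    return p

p_ref, p_targ = make_p(0.5), make_p(0.4)
traj = [(0, 1), (1, 1), (1, 0)]
W = relative_trajectory_weight(traj, p_targ, p_ref)
# (0.4/0.5) * (0.6/0.5) * (0.4/0.5) = 0.768
```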

Trajectory reweighting for sampling steady states
To apply this to steady states, we are not so much interested in the properties of short dynamical trajectories, but rather in computing the steady-state properties of systems with rugged landscapes, where long trajectories are required for accurate sampling. To apply standard biased sampling schemes to compute steady-state averages using Equation (1), we require the reweighting factor W(x) in Equation (2), which in turn requires knowledge of the steady-state relative probabilities of microstates, P^∞_targ(x)/P^∞_ref(x). For systems whose dynamics do not obey detailed balance, this quantity is not known a priori, but one can show that it is given by the long-trajectory limit of the relative trajectory weight:

\[
W(x) = \lim_{n\to\infty} \frac{\langle W_n\,\delta(x_n - x) \rangle_{\rm ref}}{\langle \delta(x_n - x) \rangle_{\rm ref}} . \qquad (5)
\]

Here W_n is the relative trajectory weight of Equation (4), and the right-hand side is the average over all trajectories generated in the reference system which end in microstate x, as ensured by the δ-function (for steady-state problems, the starting microstate can be left unspecified). This result is rather obvious, but for completeness it is derived in Appendix 1. In the Appendix, we only prove Equation (5) up to a constant of proportionality, but we can set this constant equal to unity without compromising the result, since it cancels out in Equation (1).
Since we know the transition probabilities in Equation (4), we can compute the quantity W in Equation (5) numerically, on the fly, during a simulation of the reference system, and use it as a 'slot-in' replacement for the factor W(x) in Equation (1) in standard biased sampling schemes. In practice, W is easily accumulated: when a simulation step is taken from x_{i−1} → x_i, one calculates the relative transition probability p_targ/p_ref for this step (which is known for a given simulation algorithm) and multiplies the current estimator for W by this quantity. Appendix 2 contains a practical scheme for achieving this, based on the notion of a circular history array. It is important to note that typically we would not try to record W as a function of x, since in general the space of microstates is very large and each individual microstate is visited infrequently (the birth-death process below is something of an exception). Rather, we would sample both W and any quantities of interest (generically denoted by O(x) above) at periodic intervals, and construct the right-hand average in Equation (1) from these sampled values using

\[
\langle O \rangle_{\rm targ} \approx \frac{\sum_s O_s W_s}{\sum_s W_s} , \qquad (6)
\]

where the sum runs over the sampled values.
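A minimal sketch of this bookkeeping, assuming the circular history array simply stores the last n per-step log-ratios (the class name and interface below are our own invention, not the scheme of Appendix 2 verbatim):

```python
import math
from collections import deque

# Sketch of on-the-fly accumulation of W over the last n steps, using a
# circular history array (class name and interface are our own invention).

class HistoryWeight:
    def __init__(self, n):
        self.log_ratios = deque(maxlen=n)  # oldest entry drops off automatically

    def step(self, p_targ, p_ref):
        """Record the transition-probability ratio for one simulation step."""
        self.log_ratios.append(math.log(p_targ / p_ref))

    @property
    def W(self):
        """Reweighting factor over the last n recorded steps."""
        return math.exp(sum(self.log_ratios))
```

With n = 3, for example, after five recorded steps W covers only the three most recent transition ratios, so the estimator always refers to a trajectory of fixed length ending at the present step.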

Analogy to Rosenbluth sampling
It turns out that trajectory reweighting has many commonalities with the classical Rosenbluth scheme for sampling the configurations of chain macromolecules [1,35-38]. In this scheme, one 'grows' new chain configurations by addition of successive segments, in a system that is typically crowded with surrounding molecules. At each step in the chain growth, the position (and possibly orientation) of the new segment is biased to avoid overlap with the surrounding molecules (which would lead to rejection of the chain configuration). To compensate for this bias, one associates a reweighting factor with the chain configuration; this consists of a product of the relative weights for each chain segment, in the biased simulation, compared to the target system. Averages over chain configurations are then reweighted by this factor.
Returning to Equation (4) for the relative trajectory weight P_n^targ/P_n^ref, we can see directly where the analogy with Rosenbluth sampling arises. In Equation (4), P_n^targ/P_n^ref consists of a product, over all steps in the trajectory, of the relative transition probabilities p_targ/p_ref of the same step taken in the target and reference systems. Thus, making an analogy between the generation of a trajectory and the growth of a macromolecular configuration, the computation of P_n^targ/P_n^ref is equivalent to the computation of the reweighting factor for a configuration of a chain macromolecule of n segments. Both cases can be considered examples of iterated importance sampling schemes, in which the required reweighting factor consists of a product of individual weighting factors associated with a series of steps in a chain.

Sampling inefficiency for long trajectories
The analogy with Rosenbluth sampling suggests that trajectory reweighting is likely to suffer from a practical problem common to iterated importance sampling schemes. For example, it is well known that Rosenbluth sampling can become inefficient when the number of segments in the chain macromolecule becomes large [36-38]. Returning to Equations (4) and (5), we can view the relative trajectory weight W, for a given stochastically generated trajectory, as the product of n 'random' numbers p_targ/p_ref. Hence, the logarithm of the trajectory weight can be viewed as the sum of a series of random numbers:

\[
\ln W = \sum_{i=1}^{n} \ln \frac{p_{\rm targ}(x_{i-1} \to x_i)}{p_{\rm ref}(x_{i-1} \to x_i)} . \qquad (7)
\]

Although successive steps will be correlated, if n is large enough we can apply the central limit theorem to the sum of random numbers in Equation (7). This implies that, for large n, ln W becomes normally distributed [39] with mean m and variance v, both of which are proportional to n. The reweighting factor W is therefore expected to be log-normally distributed [40] with mean e^{m+v/2} and variance (e^v − 1)e^{2m+v}. As a consequence, the coefficient of variation (the standard deviation divided by the mean) of W, sampled over many trajectories, is expected to be \sqrt{e^v - 1} \sim e^{\alpha n}, where α is some constant coefficient. As the trajectory length n increases, the variability in the reweighting factor W for the sampled trajectories increases exponentially. Since the sampled W values are used to compute averages over trajectories, this drives an exponential blow-up of the sampling error: this is the fundamental challenge for iterated importance sampling schemes. Another way to think about this is that, as the distribution of W values becomes very broad, the most important trajectories are sampled very infrequently; in fact, they become exponentially rare. This point is demonstrated in Appendix 3 by considering the extreme value statistics of the sampled W values.
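The exponential growth of the spread of W is easy to demonstrate numerically. In the sketch below, the per-step terms in Equation (7) are replaced by independent Gaussian draws (purely an illustrative assumption, not a model from the text), so that ln W ∼ N(0, nσ²) and the coefficient of variation of W should follow √(e^{nσ²} − 1):

```python
import math, random

# Numerical illustration of the broadening of P(W).  The per-step terms of
# Equation (7) are replaced by independent N(0, sigma^2) draws (purely an
# illustrative assumption), so ln W ~ N(0, n*sigma^2) and the coefficient
# of variation of W should grow like sqrt(exp(n*sigma^2) - 1).

def weight_cv(n, ntraj=20000, sigma=0.3, seed=2):
    """Sample coefficient of variation of W over ntraj trajectories of n steps."""
    rng = random.Random(seed)
    Ws = [math.exp(sum(rng.gauss(0.0, sigma) for _ in range(n)))
          for _ in range(ntraj)]
    mean = sum(Ws) / ntraj
    var = sum((w - mean)**2 for w in Ws) / ntraj
    return math.sqrt(var) / mean
```

For n = 10 the predicted coefficient of variation is √(e^0.9 − 1) ≈ 1.2; by n = 40 it is already ≈ 6, and the sample estimate itself becomes unreliable because it is dominated by rare high-W draws, which is precisely the sampling pathology described above.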

The required trajectory length is set by the target system
It turns out that one cannot avoid sampling long trajectories when using trajectory reweighting to sample steady states in systems with rugged landscapes. For example, in models for genetic switches, the system typically undergoes stochastic flips between alternative stable states on a timescale that is far longer than that of the underlying molecular events [6,7,41]. One might hope that by choosing to simulate a reference system whose flipping rate is much faster than that of the target system, one could sample the entire state space more effectively. To understand why this is not the case, we derive an evolution equation for the reweighting factor W in Appendix 1. This equation (Equation (A3)) shows that the dynamics of W (and hence its relaxation time) is controlled by the transition probabilities of the target system, not the reference system. Thus, even if the reference system does achieve rapid sampling of the whole state space, the quantities that we need to sample to compute averages in the target system (⟨W⟩ and ⟨OW⟩ in Equation (1)) will only converge on a timescale set by the unbiased dynamics of the target system. This implies that long trajectories, with their associated broad distribution of trajectory weights, are needed to compute steady-state averages in systems with rugged landscapes.
For an intuitive explanation of this [42] consider that trajectory reweighting should faithfully reproduce the behaviour of the target system, including transients. This is exploited for instance in the first-passage time problems mentioned in the introduction [30-33]. Therefore, to achieve steady state, one has to wait until all the transients in the target system have decayed away. This implies that the relaxation to steady state is governed by the target system and not the reference system.

Example: birth-death process
We now illustrate the use of trajectory reweighting to sample non-equilibrium steady states, and its associated issues, using a simple example: a toy model of a birth-death process [39] with birth rate λ and death rate μ,

\[
\emptyset \overset{\lambda}{\longrightarrow} {\rm A} , \qquad {\rm A} \overset{\mu}{\longrightarrow} \emptyset . \qquad (8)
\]

This might represent a set of chemical reactions in which molecules of type A are created stochastically in a Poisson process with rate parameter λ and removed from the system stochastically with rate parameter μ. We simulate the stochastic process represented by Equation (8) using the Gillespie kinetic Monte-Carlo algorithm [43]. For this model, the microstate of the system is fully defined by the (discrete) copy number of A, which we denote x; hence x = 0, 1, 2, . . .. The steady-state distribution of x is given analytically by a Poisson distribution,

\[
P^\infty(x) = \frac{k^x e^{-k}}{x!} , \qquad (9)
\]

where k = λ/μ = ⟨x⟩ ≡ Σ_{x=0}^∞ x P^∞(x) is the mean copy number. The fact that analytical results are available for this model allows rigorous testing of the results of our stochastic simulations with trajectory reweighting. We define a timescale for the model by setting λ = 1. We suppose that our target system of interest has death rate μ_targ, and we wish to compute information about the target system by simulating a reference system with a different death rate μ_ref. Trajectory reweighting is implemented in our simulations as described in Appendix 2; the reweighting factor is computed on the fly using a history array of length n.
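A minimal Gillespie simulation of Equation (8) can be sketched as follows (our own illustration; the run length is arbitrary). Time averages are accumulated by weighting each state by its exponentially distributed sojourn time:

```python
import random

# Minimal Gillespie simulation of the birth-death process, Equation (8):
# birth at rate lam, death at rate mu*x.  Run length is illustrative.

def gillespie_birth_death(lam, mu, tmax, seed=3):
    rng = random.Random(seed)
    t, x = 0.0, 0
    samples = []                     # (sojourn time, state) pairs
    while t < tmax:
        rtot = lam + mu * x          # total event rate in state x
        dt = rng.expovariate(rtot)   # exponentially distributed waiting time
        samples.append((dt, x))
        t += dt
        if rng.random() < lam / rtot:
            x += 1                   # birth
        else:
            x -= 1                   # death
    return samples

samples = gillespie_birth_death(1.0, 0.3, 20000.0)
T = sum(dt for dt, _ in samples)
mean_x = sum(dt * x for dt, x in samples) / T   # should approach lam/mu = 10/3
```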
We first demonstrate that trajectory reweighting works, in the sense that it gives correct results for the steady-state properties of the target system. Perhaps the most obvious system property of interest is the mean copy number ⟨x⟩_targ; this can be computed by setting O(x) = x in Equation (1). Using reference and target system parameters μ_ref = 0.3 and μ_targ = 0.2, the value obtained from our trajectory reweighting simulation agrees, within sampling error, with the analytical result ⟨x⟩_targ = 5 for the target system (since λ/μ_targ = 5). For the simulated reference system (for which λ/μ_ref = 10/3) we obtain, as expected, ⟨x⟩_ref = 3.333(2). We can also examine the full probability distribution of the copy number P^∞_targ(x), which can be obtained from Equation (1) by histogramming the sampled values of x with weights W. Figure 1 shows our simulation results for P^∞_targ(x), compared to Equation (9), for the same parameter set. The target system distribution obtained from our trajectory reweighting simulation is indeed in good agreement with the analytical result, although there is some loss of accuracy and an increase in the sampling error in the tail of the reweighted distribution. Plotting also the distribution P^∞_ref(x) for the reference system (which, as expected, also corresponds to a Poisson distribution, with a different parameter k), we see that the increased sampling error for the target distribution occurs in regions of state space where the overlap between the target and reference distributions is small, i.e. values of x which the reference system samples poorly. This kind of problem is common to many reference sampling schemes and is often addressed by a more sophisticated choice of reference system; for example, as in umbrella sampling [1], one might use a series of reference systems designed to split the underlying probability landscape into subregions, each of which can be sampled more thoroughly before being stitched together.
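For completeness, here is a self-contained sketch of how such a reweighted mean could be computed on the fly (our own reconstruction, not the authors' code). For the Gillespie dynamics, the per-step weight ratio combines the jump-selection ratio a_targ/a_ref with the ratio of waiting-time densities, giving the log-ratio ln(a_targ/a_ref) − (r_targ − r_ref)Δt; the run length, history length n and time-weighted sampling scheme are illustrative choices:

```python
import math, random
from collections import deque

# Sketch (our own reconstruction) of trajectory reweighting for the
# birth-death model: a reference system with death rate mu_ref is simulated,
# and the mean copy number of a target system with death rate mu_targ is
# recovered via Equations (4)-(6).  The per-step log weight ratio combines
# the jump-selection ratio with the waiting-time density ratio; nsteps and
# the history length nhist are illustrative choices.

def reweighted_mean_x(lam, mu_ref, mu_targ, nsteps, nhist, seed=4):
    rng = random.Random(seed)
    x = 0
    hist = deque()                   # circular history of log weight ratios
    logW = 0.0
    num = den = 0.0
    for _ in range(nsteps):
        rtot_ref = lam + mu_ref * x
        rtot_targ = lam + mu_targ * x
        dt = rng.expovariate(rtot_ref)
        birth = rng.random() < lam / rtot_ref
        a_ref = lam if birth else mu_ref * x
        a_targ = lam if birth else mu_targ * x
        # ratio of event densities a * exp(-rtot * dt) in target vs reference
        r = math.log(a_targ / a_ref) - (rtot_targ - rtot_ref) * dt
        hist.append(r)
        logW += r
        if len(hist) > nhist:
            logW -= hist.popleft()   # drop the oldest step from the window
            W = math.exp(logW)
            num += W * dt * x        # time-weighted sample of the current state
            den += W * dt
        x += 1 if birth else -1
    return num / den

mx = reweighted_mean_x(1.0, 0.3, 0.2, 300000, 40)
# should approach lam/mu_targ = 5 within sampling error
```

With μ_ref = 0.3 and μ_targ = 0.2 the history length n = 40 corresponds to roughly four target-system relaxation times, consistent with the convergence criterion discussed in the text.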
We next illustrate the fact that the length of trajectory n needed for accurate sampling is governed by the target system, not the reference system. To this end, again using reference and target system parameters μ_ref = 0.3 and μ_targ = 0.2, we vary the size of the history array, i.e. the length of trajectory n that is used in the computation of the reweighting factor W. Figure 2 shows results for the mean target system copy number as a function of n; as n increases, the results converge to the analytical value ⟨x⟩_targ = 5. We can probe the rate of this convergence by fitting the data in Figure 2 to a mono-exponential relaxation,

\[
\langle x \rangle_n = \langle x \rangle_{\rm targ} - A\, e^{-n\,\delta t_{\rm ref}/\tau} , \qquad (10)
\]

where δt_ref is the mean time per simulation step in the reference system (see Note 1), and comparing the fitted relaxation time τ to the characteristic relaxation timescales μ_targ^{-1} and μ_ref^{-1} of the target and reference systems, respectively. The data clearly fit τ = μ_targ^{-1}, rather than τ = μ_ref^{-1}, confirming that it is the target system, not the reference system, whose dynamics controls the convergence rate.
We now test our prediction that, as the length n of the trajectory used to compute the reweighting factor W increases, the distribution of sampled W values broadens, making it hard to compute accurate results. Figure 3 (solid lines) shows histograms of ln W computed from our simulations for n = 20 (left panels) and n = 50 (right panels). As predicted by our theoretical analysis, P(ln W) approaches a normal distribution for large values of n, and this distribution indeed broadens as n increases. Comparing results for different values of the target system death rate μ_targ (top to bottom in Figure 3), we see that the distribution P(ln W) also broadens as μ_targ decreases, i.e. as the target and reference systems become more different from each other. This is because the distribution of p_targ/p_ref values becomes wider as the reference system deviates further from the target system.
When computing weighted averages (as in Equation (1)), what is important is actually W × P(W), or equivalently W × P(ln W), since trajectories with high weight W contribute more to the average (Note 2). We therefore also plot in Figure 3 (dashed lines) the distribution of values of W × P(ln W). Comparing the results for trajectory lengths n = 20 (left panels) and n = 50 (right panels), we see that as the trajectory length increases, not only does P(ln W) broaden (solid lines), but the peak in W × P(ln W) also shifts further into the tails of P(ln W). Thus for longer trajectories we become less likely to sample the relevant parts of W × P(W), where the weight W is large. We also see the same effect upon decreasing μ_targ (top to bottom panels in Figure 3); as the target and reference systems become more dissimilar, sampling trajectories with a high weight in the target system becomes less likely (Note 3). Figure 3 rather dramatically underscores the importance of the extreme high-weight trajectories in calculating weighted averages. Exactly the same kind of phenomenology is seen for Rosenbluth weights [36]; indeed, Figure 3 closely mirrors Figure 2 in Ref. [38] for polymers localised in random media. Figure 4 shows that, for the smaller values of μ_targ, the computed averages deviate systematically from the analytical result; increasing the trajectory length from n = 20 to 50 indeed decreases this deviation. However, we pay a penalty for this in terms of the sampling error: the error bars on our data points become much larger for the longer trajectories. This reflects the increasingly poor sampling of the underlying W distribution (Figure 3). If we attempt to further reduce the systematic error for small values of μ_targ by increasing n still further, for example to n = 100, then we see so few of the important but exponentially rare high-weight trajectories that the results become statistically meaningless.
This illustrates rather clearly the 'Catch-22' practical issue associated with trajectory reweighting for computing steady-state properties: for target systems with slow dynamics, long trajectories are needed; but because trajectory reweighting is an iterated importance sampling scheme, its statistical accuracy decreases, catastrophically, as the trajectory length increases.

Discussion
Trajectory reweighting provides an apparently elegant way to extend biased sampling methods developed for equilibrium steady states to non-equilibrium systems whose dynamics do not obey detailed balance. By simulating a reference system which is easier to sample than the target system of interest, one can obtain averages over the (un-simulated) target system by reweighting averages in the simulated reference system, using reweighting factors computed from the reference system trajectories. The analysis presented here shows that this is possible in principle, and that it does indeed work for the not-too-challenging test case of a birth-death process. For this particular problem, though, it is clear that trajectory reweighting will never beat straightforward sampling: if the target system relaxes more slowly than the reference system (μ_targ < μ_ref in Figure 4), one comes up against the problem of poor sampling of the high-W trajectories (Figure 3); conversely, if the target system relaxes faster than the reference system (μ_targ > μ_ref), it is simply more efficient to simulate the target system directly. However, we expect trajectory reweighting has the potential to be useful for systems whose dynamics makes slow switches between alternative 'basins of attraction' (e.g. a genetic switch), since using a reference system that relaxes faster than the target system should allow better sampling of trajectories that switch between the basins.
Our work reveals two significant practical issues which preclude the straightforward application of trajectory reweighting. First, the timescale to obtain unbiased estimates of reweighted quantities is still set by the target system. For example, to calculate steady-state averages in our toy model, one needs trajectory durations which are 2-3 times the relaxation time μ_targ^{-1} of the target system (see Figures 2 and 4). We expect this rule of thumb to hold generally. Second, for long trajectories, the distribution of trajectory weights becomes extremely broad, and the weighted averages that we need to compute are sensitive to the rarely visited high-weight region. In our toy model this is exemplified in Figure 3. This problem becomes acute for target systems which involve barrier-crossing events, such as genetic switches: since the barrier-crossing frequency is very small, the relaxation times are very long. We then cannot escape the fact that very lengthy trajectories are required to compute steady-state averages in such target systems, even if the reference system equilibrates rapidly.
It is important to note that this problem is specific to the use of trajectory reweighting for steady-state system properties. Trajectory reweighting has been successfully applied in numerous other contexts, including financial mathematics [16], rare event analysis in applied mathematics [16,17], and first-passage time problems [30-33]. The essential difference is that these problems are characterised by short-duration trajectories. For sampling of short trajectories, the measurement statistics of infrequent events can be drastically improved by generating a large number of successful but biased trajectories, even though these may carry a low weight once the bias is removed. This variance-reduction mechanism was cogently argued by Gillespie et al. [30]. In contrast, to sample steady-state properties, long-duration trajectories have to be reweighted, with the associated sampling problems discussed here.
We have noted that there is a close analogy between computing trajectory weights, and Rosenbluth sampling for chain macromolecules. This suggests a possible way forward. In the Rosenbluth method, inefficient sampling of high-weight polymer configurations was resolved by the development of the pruned-enriched Rosenbluth method (PERM) [1,37], and its descendant flat-histogram PERM [15,44]. To apply this in the present situation, one would follow an ensemble of trajectories, and in order to keep the weights in the desired region in P(ln W) (Figure 3), at intermediate times discard trajectories with low weights, and clone trajectories with high weights. The added complications are the increased bookkeeping required to keep track of a variable number of trajectories, and the need to fine-tune the hyperparameters in the algorithm. We note that PERM applied to this problem does not necessarily avoid the issue that the sampling time is set by the target system, thus for instance for a genetic switch we expect one would need to sample for 2-3 times the waiting time to cross the barrier. However, by diminishing or removing the barriers in a rugged probability landscape, we expect that the use of trajectory reweighting will distribute the sampling effort more effectively over state space, leading to more accurate calculation of steady-state probability distributions. Undoubtedly this will only become clear with further and more detailed investigations, which we leave for future work. Another possible avenue to explore is the use of simultaneous trajectory reweighting across multiple systems, as in the extended bridge sampling method [45].
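A PERM-style prune-and-enrich step for an ensemble of weighted trajectories might be sketched as follows (a hypothetical illustration; the thresholds c_lo, c_hi and the factor-of-two prune/clone rule are conventional hyperparameter choices, not prescribed by the text). Pruning with probability ½ while doubling the surviving weight, and cloning while halving the weight, leaves ensemble averages unbiased:

```python
import random

# Hypothetical sketch of one prune-and-enrich (PERM-style) step applied to
# an ensemble of weighted trajectories.  Thresholds c_lo, c_hi and the
# factor-of-two rule are illustrative hyperparameters.

def prune_enrich(ensemble, rng, c_lo=0.1, c_hi=10.0):
    """ensemble: list of (weight, state) pairs; returns the new ensemble."""
    wavg = sum(w for w, _ in ensemble) / len(ensemble)
    out = []
    for w, state in ensemble:
        if w < c_lo * wavg:
            # prune: keep with probability 1/2, doubling the weight (unbiased)
            if rng.random() < 0.5:
                out.append((2.0 * w, state))
        elif w > c_hi * wavg:
            # enrich: clone into two copies, halving the weight (unbiased)
            out.append((0.5 * w, state))
            out.append((0.5 * w, state))
        else:
            out.append((w, state))
    return out
```

In a trajectory-reweighting context, `state` would carry the full simulation state and history array of a running trajectory, and the step would be applied periodically to keep the ensemble's weights in the productive region of P(ln W) (Figure 3), at the cost of the bookkeeping and hyperparameter tuning noted above.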

Notes
1. Note that δt_targ = Σ_{x=0}^∞ P^∞_targ(x)/(λ + μ_targ x) ≈ 0.526 is only 3% different from δt_ref ≈ 0.540, whereas μ_targ and μ_ref differ by 40%, so replacing δt_ref by δt_targ in Equation (10) would make an indiscernible difference to the goodness of the fit.
2. Since we are dealing with probability distributions, P(W) dW = P(ln W) d(ln W). Multiplying through by W shows that W × P(ln W) is the correct quantity to consider in this context.
3. Note that poor sampling of P(ln W) is distinct from the problem that arises when there is poor overlap between the reference and biased probability distributions.