On the convergence of multi-scale free energy simulations

In this work, we employ simple model systems to evaluate the relative performance of two of the most important free energy methods: The Zwanzig equation (also known as ‘Free energy perturbation’) and Bennett’s acceptance ratio method (BAR). Although our examples should be transferable to other kinds of free energy simulations, we focus on applications of multi-scale free energy simulations. Such calculations are especially complex, since they connect two different levels of theory with very different requirements in terms of speed, accuracy, sampling and parallelisability. We try to reconcile all those different factors by developing some simple criteria to guide the early stages of the development of a free energy protocol. This is accomplished by quantifying how many intermediate steps and how many potential energy evaluations are necessary in order to reach a certain level of convergence.

Free energy calculations are among the most important methods in the toolbox of computational chemistry. They provide a direct connection between molecular simulation and macroscopic thermodynamic observables such as binding constants, solubilities or the equilibrium of reactions. Moreover they are rigorous from a statistical mechanics point of view and, overall, provide superior results compared to more approximate methods. Unfortunately, they are also computationally expensive and not necessarily straightforward for the non-expert (or sometimes even the expert) to perform.
Multi-scale models represent another essential set of tools that have made a tremendous impact on our understanding of a wide range of complex chemical systems. This culminated in the 2013 Nobel Prize in Chemistry that was awarded to Martin Karplus, Michael Levitt and Arieh Warshel. Multi-scale models integrate multiple levels of theory that work together to span broad domains in terms of time, space and accuracy. Thus, it seems natural that multi-scale models should be combined with free energy simulations to achieve a desired level of precision and accuracy. However, it is often not clear how to perform such calculations in the most efficient way.
One particularly vexing issue in most applications of free energy simulations arises before the free energy simulation is even performed. When calculating a free energy difference between two states, one implicitly attempts to enumerate the partition functions of the involved end states. In principle this means that one has to sample the complete energy surface of both statesa daunting task for any system that involves more than a couple of degrees of freedom. A different way to state the free energy difference problem is to ask how similar are the potential energy surfaces to each other. After all, the free energy difference corresponds to the ratio of the partition functions. The way to measure this similarity is calculating the potential energy difference between the two end states for a series of points in phase space. Those points in phase space are typically provided by molecular dynamics simulations. The overall measure for the similarity of the partition functions, the free energy difference, will converge quickly if the resulting distribution of the potential energy differences is well behaved. Wild fluctuations of energy differences lead to poor convergence of the free energy calculation.
The predominant technique to improve the convergence of free energy simulations between two very different energy surfaces is the introduction of artificial intermediate states. Commonly, those intermediate states are created by mixing the properties of the two end states according to a mixing ratio λ, where λ = 0 corresponds to the initial state and λ = 1 corresponds to the final state. Figuratively speaking, instead of comparing apples with oranges, one creates a series of apple-orange hybrids and compares those to each other.
Since each λ state requires a separate molecular dynamics simulation, it is necessary to evaluate how many intermediate states are necessary. This decision has a serious impact on the free energy calculation. On the one hand, using insufficient intermediate states can lead to biased free energy estimates which affect the accuracy of the free energy difference. On the other hand, every intermediate state increases the computational costs. Therefore, the introduction of unnecessary intermediate states is a waste of computer time. Moreover, superfluous intermediate states lead to opportunity costs in terms of accuracy. Often, precious computer time is invested in the creation of intermediate states although it would have been more opportune to spend it on longer simulations or additional repetitions of the simulation to evaluate the reproducibility of the results.
While certain criteria are available to decide a posteriori whether sufficient intermediate states have been used in a free energy simulation [1][2][3][4][5][6], there is no such criterion to decide beforehand how many λ intermediate states are required. The underlying reason is simply that the required measure for sufficient similarity in terms of phase space overlap in a free energy simulation is the resulting free energy difference itself (or, more specifically, the entropic contribution to the free energy, as will be discussed later on in the Theory section). Thus, to a certain degree, every free energy simulation involves a chicken-and-egg dilemma.
Clearly, there is no exact solution to the problem of finding an optimal λ protocol for free energy simulations, and any attempt to do so would border on the absurd. However, one can argue that finding a good λ protocol is easier than finding a good approximation to the free energy difference. First, the accuracy and precision demands are very different for the actual free energy result and for the data that is required for estimating a good free energy protocol. Second, it is possible to compensate for mistakes in the λ protocol by running longer simulations. Finally, it is straightforward to detect mistakes in the λ protocol using the aforementioned phase space overlap criteria. Thus errors can also be compensated by introducing more intermediate steps after the fact with the benefit of hindsight. However, our main argument for trying to estimate a good free energy protocol is pragmatic in nature: any reasonable automatic procedure to estimate the required λ protocol is superior to the current state of the art: Guessing followed by trial and error.
In the following, we attempt to provide some basic guidelines to estimate a good λ protocol with the limited information that is available before performing the actual free energy simulation. For this purpose, we borrow some ideas from Marcus theory and linear response theory [7]. In particular, we develop some simple criteria for the convergence of free energy simulations based on the Zwanzig equation [8] and Bennett's acceptance ratio method [6]. We start by introducing the two methods and compare their efficiency in terms of the convergence of some very simple model systems. Then we describe how the efficiency depends on the relative computational costs of the end states. This is relevant for multi-scale free energy simulations, where one end state can be a molecular mechanics simulation and the other end state can be a quantum-mechanical simulation. We also explain in how far the two methods benefit from parallelisation on modern hardware architectures. In the next step, we show how the computational costs depend on the introduction of intermediate states. Based on free energy simulations of twelve molecules in the gas phase, we exemplify the limitations of our simplistic approach. Finally, we present results for multiscale free energy simulations between molecular mechanics and quantum mechanics.

Free energy
For readers that are not necessarily familiar with free energy simulations, we use the following sections to introduce the necessary background and our notation. The free energy difference between two systems at constant volume is given by the Helmholtz free energy, A, which is defined by where k is the Boltzmann constant, T is the temperature and Z is the partition function. The subscript identifies the corresponding end state, where we refer to the initial state as state 0, and the final state as state 1. Based on the work of Boltzmann, partition functions are defined as sums over all possible micro-states, i.e.
where i enumerates all possible combinations of the coordinates r i to generate a potential energy U a based on the potential energy function of state a. Although the enumerator i looks very innocent, it is actually one of the reasons why a computer simulation can never fully converge, because even with the finite machine representation of floating point numbers there are usually still more possible manifestations of i than what can be covered with our computers in reasonable time. Fortunately, not all states are equally likely, therefore our computer simulations can be restricted to analysing states with a high importance, which corresponds to the low-lying energy regions. In practice, this means that each state is generated according to its probability p, which corresponds to and leads to an expression for Boltzmann-weighted ensemble averages of a property X, i.e.
In this context, it is convenient to introduce a short hand notation that is employed in the remainder of this paper. First, reduced units are used to express the potential energy, which means that energies are given in units of kT. Second, we skip writing out the dependence of the potential energies, U, on the actual coordinates r in ensemble averages, because it is implied that the potential energies have to be evaluated for all coordinates encountered in a trajectory. For example, it is a well-known fact that the free energy difference is composed of both an enthalpic and an entropic contribution, i.e.
Since the contribution from the kinetic energy cancels between the end points as long as mass and the number of particles are conserved, the enthalpic contribution is given by which can be calculated with relative ease from the simulations. However, the entropic contribution is given by which means that the entropy difference is Basically, the entropic contribution is a measure for the similarity of the two phase spaces and represents how easy it is to move from one state to the other [9]. If the probability distributions p 0 and p 1 are very similar, the entropy difference is close to zero. If the initial state has more phase space available than the final state, the entropy is negative and, therefore, the free energy contribution from the entropy is positive. Since the determination of the probabilities requires a good estimate of the partition functions to obtain the proper normalisation constant, long simulations have to be performed for convergence [10]. This is the very aspect that makes free energy simulations so expensive.

Free energy methods
Two of the main workhorses in the field of free energy simulations are the Zwanzig equation (also known as 'Thermodynamic Perturbation', 'Free Energy Perturbation' or 'Exponential Formula') [8], as well as Bennett's Acceptance Ratio (BAR) [6]. Notably, the third important free energy method is Thermodynamic Integration (TI) [11], but this method has only played a minor role in multi-scale free energy simulations so far. Therefore, we restrict ourselves to pointing out that TI can be almost as efficient as BAR if it is used correctly [4,12]. Also there are some recent innovations that make TI more efficient [13,14]. Excellent introductions to this matter are also provided in several books and papers [4,12,[15][16][17].
A key innovation since the turn of the millennium is the rise of the non-equilibrium work techniques by Jarzynski [18] and Crooks [19]. These approaches use the non-equilibrium work carried out in short switching simulations between the two end states. In the limit of infinitely fast switching simulations, the non-equilibrium work is the same as the potential energy difference between the end states. Thus, in this limit, the Jarzynski equation is the same as the Zwanzig equation and Crooks Theorem is the same as BAR, which is another reason why we restrict our analysis to the Zwanzig equation and BAR. For the advanced reader, we also want to point out that all the conclusions drawn in the following sections should be equally valid for the respective non-equilibrium methods. One just has to substitute our expressions for the mean potential energy difference by the corresponding mean non-equilibrium work. Unfortunately, we are not aware of a straightforward connection between those two properties, since the non-equilibrium work depends on both the switching speed and the nature of the energy surface. Thus, the presented convergence criteria for Zwanzig and BAR should be considered mere bounds for the non-equilibrium expressions.

Zwanzig equation
The Zwanzig equation is a one-sided free energy method, in the sense that only one of the two end states is used for sampling. This makes it attractive for multi-scale free energy simulations, where potential energy evaluations on the lower level of theory are significantly cheaper than on the higher level of theory. The Helmholtz free energy difference between two states is given by Here, the potential energy, U, is in reduced units of kT. For each frame of the trajectory of the initial state 0, the potential energies are evaluated both with the potential energy function of the initial state (U 0 ) and of the final state (U 1 ). For example, in a multi-scale free energy simulation of a molecule in the gas phase, U 0 corresponds to a potential energy evaluation with molecular mechanics, while U 1 corresponds to the use of quantum mechanics. Of course the Zwanzig equation can also be performed in the reverse direction, going from the final state to the initial state, yielding a A 1→0 based on a trajectory of the final state. While, theoretically, the forward process A 0→1 should yield the same result as the backward process A 1→0 , this is usually not the case in practice due to limited sampling. Thus, the resulting forward and backward Zwanzig results represent bounds to the free energy difference [6,20]. However, as noted by Lu and Kofke, the Zwanzig results are more reliable if the direction goes from a high entropy to a low entropy state due to sampling [21].
Poor convergence of the free energy result is usually addressed by introducing intermediate states. In this case, the total free energy calculation is broken down into sub-steps, i.e.
where λ represents a mixing ratio between the two end states and λ is the step size in λ space at which simulations are conducted. It is usually easier to use a fixed step size λ, but it sometimes turns out that some sub-steps are more challenging in terms of convergence than others. In such situations, it makes sense to introduce more λ states in the recalcitrant part of the simulation, which leads to unequal step sizes. One such challenge is the socalled van der Waals end point problem that arises when the van der Waals interactions of certain atoms are turned to zero [22].
A special case of the use of the Zwanzig equation is so-called Double-Wide Sampling (DWS) [23], where λ states are connected via half-steps, which involves free energy calculations to virtual midpoints. Thus, the total free energy difference becomes Since only half the step size in λ space is used, the phase space overlap between the employed end points of each sub-step is increased, which improves convergence. Another noteworthy variation of the use of the Zwanzig equation is Overlap Sampling [21,24], which considers the basic asymmetry between the forward and the backward direction of the Zwanzig results and optimises the position of the virtual intermediate state that has been introduced here in the context of DWS. By finding the optimal virtual intermediate state between two end points, the use of two Zwanzig equation results becomes equivalent to employing Bennett's acceptance ratio method [21,24].

Bennett's acceptance ratio
BAR [6] is a two-sided free energy estimator, which requires simulations of both end points of a free energy step. The BAR equation can be derived in a number of ways. Bennett's original derivation is based on finding a free energy estimate that minimises the variance of the free energy outcome. Shirts and Pande helped to popularise the method by showing that BAR is equivalent to a maximum likelihood estimator [25]. This makes BAR very attractive from a statistical point of view, since maximum likelihood estimators are known to be wellbehaved, asymptotically unbiased and no other estimator has a lower variance [25]. As already mentioned before, BAR is also the equilibrium equivalent of Crooks non-equilibrium work theorem [19]. Moreover, BAR is connected to multi-state methods like the Weighted Histogram Analysis Method (WHAM) [26,27], the multi-state Bennett method (MBAR) [28], the non-Boltzmann Bennett method (NBB) [29][30][31][32] and the optimal multi-state free energy estimator by Maragakis, Spichty and Karplus [33].
To perform a BAR calculation, the potential energy differences between the two end states have to be calculated for each frame of the two trajectories. Again we resort to the use of reduced units of kT. The free energy difference results from where f (x) denotes the Fermi function ( 1 1+exp (x) ). The unknown variable C is found using an iterative procedure that sets the ratio of the two ensemble averages to unity. A good initial guess for C can be obtained by estimating the free energy difference with Double-Wide Sampling or Overlap Sampling. Alternatively, if histograms of the U distributions for both end states are available, one can also employ the U at which the two probability distributions p 0 ( U) and p 1 ( U) intersect. In our experience, it is advisable to create the histograms of U in any case, since the inspection of the overlap integral of the two distributions provides the necessary information to judge whether the resulting free energy is converged. As a rule of thumb, the overlap should exceed 1% in order to be considered reliable. This aspect is also discussed in the subsequent sections. Commonly an equal number of data points is produced for both end states, but BAR can also be employed if the number of data points differs. In some cases, better results are possible by producing more data points for the end point with higher entropy (cf. Equations (13)- (16) in Ref. 6). Unfortunately, it is usually not known a priori which end state exhibits the higher entropy, therefore it is reasonable to start with equal amounts of sampling. Finally, it is important to point out that BAR and its derivatives are currently the most efficient methods for determining free energy differences [3,4,[34][35][36][37].

Introduction of the model system
As already pointed out in the Introduction, the starting point of our endeavour to predict the convergence properties of a free energy simulation protocol is the use of linear response theory. Linear response theory is very well established in the context of Marcus theory to explain the rates of electron transfer reactions [7]. It also forms the basis of the Linear Interaction Energy (LIE) method, which is sometimes employed in drug design [38][39][40][41][42][43]. However, it is an approximate method as it assumes that the underlying energy fluctuations obey Gaussian statistics. This corresponds to a second-order approximation of the free energy difference. Hummer, Pratt and Garcia demonstrated with hydration free energies of ions that the electrostatic potential energy distribution is Gaussian [44], and Åqvist and Hansson showed that the linear response approximation holds well for monovalent ionic solutes [45]. However, it was also shown by Smith and van Gunsteren that this approach breaks down for particle insertion and deletion due to the coupling between the different free energy contributions [46]. A recent analysis by Heid, Moser and Schröder indicates that the linear response approximation can also break down for electrostatic changes under some circumstances [47]. Nevertheless, this approach has proven to be useful in practice, if one keeps aware of its limitations. We thus follow in this pragmatic tradition, but also dedicate a separate section in Results to the discussion of the limitations of this approach. Our choice of model system is motivated by the fact that most multi-scale and absolute free energy calculations employ a single topology setup with identical numbers of atoms in the initial and final state. This setup entails no change of the general topology of the structure, and no change of the force constants, but still uses different Hamiltonians at the end points. Similar calculations are also used in transfer free energy calculations or binding affinity calculations. Thus, the free energy calculations mostly gauge the response to a change of the environment.
The strongest interactions within a molecule are usually the chemical bonds, whose potential energy surface can be approximated by harmonic potentials. The dominant change of the environment usually involves electrostatic interactions, which are described by the Coulomb potential. Combining a Coulomb potential with a significantly stronger harmonic potential leads to a change of the position and depth of the energy minimum. An example of such a change is shown in Figure 1, where the axis labels reflect the main differences. The difference between the left and the right side in Figure 1 corresponds to the change of the system as attractive electrostatic interactions are introduced. This mimics the transfer of small molecule from the gas phase into a polar solution.
As can be seen in the example of Figure 1, the electrostatic interaction energy is relatively strong with ca. −18 kT. Yet, the introduced anharmonicity and the change of the force constant are rather small. In addition, the probability distributions (blue  (Colour online) Description of two harmonic oscillators. The potential energy, U, is shown relative to an arbitrary reaction coordinate. The initial state is called state 0, while the final state is state 1. The minima reside at a distance d from the midpoint between the two minima. The different well depths define the enthalpy difference H. The energetic cost to visit the energy minimum of the other state is given by the reorganisation energy ( U reorg ).
lines) before and after the change are very similar (compare left and right side of Figure 1, where the main change is the shift of the centre of the Gaussian probability distribution from 0 to ca. 0.04 Å). From the perspective of phase space overlap, this response of the harmonic system can thus be modelled by two shifted harmonic oscillators with identical force constant. This should also approximately be the case in a multi-scale free energy calculation between MM and QM, if the MM force field has been properly parameterised, and the main difference is the impact of polarisation on the charge distribution.
In the following, we define the basic components of our model system, which consists of two harmonic oscillators. Thus, the probability distributions are Gaussian, as required by linear response theory. The underlying potential energies of the harmonic oscillators are defined as U 0 (x) = K(x + d) 2 and x is the reaction coordinate and the subscript indicates the state (0=initial state, 1=final state). K is the force constant, and d is the deviation of the equilibrium position of the harmonic oscillator from the midpoint between the two energy minima. Here, the force constants K include the conventional factor of 1 2 and are in reduced units of kT. H signifies the enthalpy difference, which corresponds to the energy difference between the two energy minima (see Figure 2). The resulting probability distributions are p 0 ( where the partition functions are given by the Gaussian integral with Z 0 = π K and Z 1 = π K e H . The potential energy difference, U, is given by Following Marcus theory, we also define the reorganisation energy, which is the potential energy difference between the coordinates of the energy minima using the potential energy function of the state under consideration, i.e. and (see also Figure 2). In the case of our model system, the reorganisation energies are identical for both states, since the force constants are the same (i.e. U 0→1 reorg = U 1→0 reorg = U reorg ). In practice, U reorg can be obtained by an energy minimisation that starts with the coordinates of the energy minimum of the other state.

Convergence properties of free energy estimates
Equipped with this simple model system, we now attempt to determine under which circumstances it is worthwhile to perform simulations of both end states (as required for BAR) and under which circumstances it is enough to just conduct a simulation for one of the end states (corresponding to the use of the Zwanzig equation). This is especially relevant if sampling one of the end states would incur significantly higher computational expenses, as is the case in multi-scale free energy simulations, where one compares an MM energy surface to a QM or QM/MM energy surface.
The efficiency and bias of the Zwanzig equation and Bennett's acceptance ratio method have been described in great detail by Shirts and Pande, [48] as well as Zuckerman and Woolf [49]. Therefore, we restrict ourselves to a short summary here. We also just discuss the variance of the free energy estimate and not the bias, since the expected bias of the Zwanzig result is half of its variance [48,49]. (19)-(23) of Ref. [48], the variance of a free energy result based on the Zwanzig equation, σ 2 Zwanzig , is proportional to the variance of the underlying potential energy difference distribution, σ 2 U . I.e.,

Zwanzig equation As described in Equations
where n is the number of data points/potential energy difference evaluations used in the free energy estimate. For two shifted harmonic oscillators, there is a direct mapping between the probability distribution of the reaction coordinate position on the one hand, and the probability distribution of the potential energy difference on the other hand. Since the reaction coordinate can be expressed in terms of the potential energy difference, Thus, the variance of the Zwanzig free energy estimate can be connected to the reorganisation energy via A comparison of the theoretical variance with variances from actual simulations of harmonic systems is shown on top in Figure 7 of the Results section. Importantly, it is not necessary to know the exact force constant nor the nature of the underlying reaction coordinate that leads to the reorganisation energy if we assume that the force constants of the end points are identical. Thus, no normal mode analysis is required to obtain the variance estimate. This makes the analysis in practice very convenient, since only two simple energy minimisations are required to obtain the reorganisation energy.

Bennett's acceptance ratio
To the best of our knowledge, there are no amenable estimates for the variance of BAR. Equation (11) of Bennett's original paper [6] specifies that where n lies between the number of data points for the initial state (n 0 ) and the final state (n 1 ). The phase space overlap is given by which depends on the details of the energy function. This corresponds to where f again denotes the Fermi function f (X) Notably, a constant shift of the potential energy difference in form of the enthalpy difference H has no effect on the probability distribution, because it cancels out. The Fermi function is depicted in Figure 3 for different temperatures. Figure 3 indicates that the Fermi function can be approximated by a step function at low temperatures. For our harmonic oscillator example, U − H < 0 if x > 0, which corresponds to the tail of the probability distribution of p 0 . Because of the symmetry of the Fermi function around 0, most of the error resulting from this approximation is going to cancel, except for the asymmetric contributions from p 0 . Thus, the approximation by a step function is most likely underestimating the real overlap, and, consequently, overestimating the variance. With the step function approximation, the phase space overlap becomes The probability distribution of a harmonic oscillator is a Gaussian, where the integral corresponds to the complementary error function, erfc, i.e.
and by employing the Chernoff-Rubin bound [50,51] which is equivalent to Notably, the Chernoff-Rubin bound overestimates the error function, which partially cancels the error that arises from using a step function instead of the Fermi function. We also provide an alternative way to derive this result in the Appendix 1.
Thus, the variance of the BAR free energy estimate for two shifted harmonic oscillators is An interesting feature of Equation 24 is that it is again absolutely independent of the underlying force constant or partition function and can be estimated based on the result of a simple energy minimisation. A comparison between theory and simulation is given at the bottom of Figure 7 in the Results section for verification. Readers that are not necessarily interested in multiscale simulations should feel free to skip the next section and continue reading at Section 1.9.

Comparison of the efficiency of the Zwanzig equation and BAR for multi-scale free energy calculations
A fundamental question of multi-scale free energy calculations is whether it is more efficient to reweight trajectories from a lower level of theory (corresponding to the Zwanzig equation) or whether one should perform additional sampling on the high level of theory energy surface (corresponding to the use of BAR).
The answer to this question depends on details of the method and the target system. Let us assume that the computational costs of MM potential energy evaluations are c MM (for example in CPU time), while the computational costs of a QM potential energy evaluation are c QM . The sampling algorithm and the roughness of the underlying energy surface also have an impact on the choice, since consecutive potential energy evaluations are not statistically independent. We therefore assume that τ energy evaluations are required to produce statistically independent data points, where τ represents the relevant relaxation time in terms of necessary energy evaluations. Equation (17) provides us with an estimate of the required number of data points for the Zwanzig equation (n Zw ) to obtain a certain precision (σ p ) given a particular U reorg , with Thus, the total computational costs, C, for employing the Zwanzig equation in multi-scale free energy calculations to reach a certain level of precision are where n Zw τ c MM represents the costs of the MM simulation and n Zw c QM represents the costs of the post-processing with QM. Similarly, the required number of data points for BAR based on Equation (24) is with the additional requirement that approximately n BAR independent data points have to be sampled from both the initial state (MM) and the final state (QM). 1 A comparison of the number of data points to reach a certain level of precision for the Zwanzig equation and BAR is given in Figure 4. The figure highlights that BAR requires significantly fewer data points than the Zwanzig equation to reach a certain level of precision (unless the difference between the systems is very small). The total computational costs, C BAR , for employing a BAR-like approach in multi-scale free energy calculations are since sampling (n BAR τ c) and post-processing (n BAR c) have to be performed on both energy surfaces. Correspondingly, the relative computational costs (Rel. Costs) of performing multiscaling by post-processing of MM trajectories vs. performing additional sampling of the QM energy surface is For example, if the computational costs of both energy surfaces are the same, i.e. c * = 1, the expression in the parenthesis on the right side of the equation becomes two, which boils down to the ratio of the two curves shown in Figure 4.

The effect of parallelisation
Time is the most precious resource at our disposal. However, the equation above does not properly reflect this aspect, since it neglects the possible parallelisation of tasks. The ability to parallelise is also of eminent importance given modern multicore hardware architectures. The effect of parallelisation is commonly included via a speed up factor, η, which is related to Amdahl's law [52] via where f P is the fraction of the computational costs that can be parallelised and n P is the number of processors that are available. The speed up factor is usually different for the sampling method (η S ) and the post-processing of existing trajectories (η P ), owing to the fact that the energy evaluations for the sampling have to be performed sequentially (e.g. in molecular dynamics simulations). On the other hand, the post-processing of existing trajectories is embarrassingly parallel. To simplify the equations, we define a relative speedup factor, η * = η P η S , which indicates in how far the post-processing is more parallelisable than the sampling stage. In most cases, η * is proportional to the inverse of the number of processors ( 1 n P ). Including the speed up factors from parallelisation into Equation (30), and focusing only on the time limiting steps, leads to the relative wall clock time to perform the tasks This equation already hints that for high levels of parallelisation the costs of the post-processing can be neglected (i.e. η → 0 and η * c * → 0) and the effect of τ cancels out between the numerator and denominator, leaving just the bare relative costs (c * ) to compete with the relative efficiencies of BAR and the Zwanzig equation ( n BAR n Zw ). The effect of parallelisation is illustrated in Figure 5, where the left side shows the use of a single processor (where the wallclock time corresponds to the total computational costs), while the right side shows the required wall-clock time to obtain sufficiently converged results when using 100 processors (i.e. η P = 1 100 ). The three rows represent different relative costs of MM and QM (c * = 1, 1000, 1,000,000). Each plot shows the relative efficiency as a function of the reorganisation energy of our model system. While BAR is more efficient than the Zwanzig equation, the possibility to parallelise the post-processing and the requirement to produce independent samples favours the Zwanzig equation in some cases. For example, for τ = 10 and c = 10 (the violet line on the left side of the top row), the Zwanzig equation is more efficient if U reorg is very small (< kT 2 ). However, by exploiting parallelisation, it can outcompete BAR until a U reorg of ca. 1 kT. Notably, there is a limit to the benefit of parallelisation, since the performance strongly depends on the relative computational costs between MM and QM, as illustrated by the different rows. Since parallelisation is mostly limited to the post-processing, while the costs are dominated by the time spent on sampling, it often has only a minor impact. If the relative costs are increased to c * = 1000 (middle), the Zwanzig equation performs better until a U reorg of ca. 1-4 kT, depending on the autocorrelation time of the system. Because of the exponential relationship between the number of required samples and U reorg , the relaxation time τ has only a limited impact on the relative efficiency if the relative computational costs are small. This is highlighted by the similarity between the plots of τ = 10 3 and τ = 10 6 in the top row, where the lines overlap. The effect of τ is amplified by the relative costs c * , as highlighted by the larger gaps between the different τ as the computational costs increase. However, it is also possible to turn this argument around and say that methods that decrease τ become more relevant as the computational costs increase. Thus, the development of improved sampling algorithms that can decrease the autocorrelation time strongly favours the use of simulations at the higher level of theory.
To summarise, Equation (32) provides a rule of thumb to determine whether multi-scaling via post-processing is worthwhile or not (E MS > 1 indicating that post-processing is preferable over additional sampling of the QM energy surface). The main component of the underlying Equations (25) and (27) is an energy minimisation, which determines the potential energy difference between an MM-optimised structure and a QMoptimised structure ( U reorg ). The same energy minimisation can also be used to estimate the relative computational costs of QM compared to MM (c * ). In addition, one can also include an estimate of the relevant relaxation time of the system in terms of the number of potential energy evaluations to reach statistical independence (τ * ). Finally, the rule of thumb incorporates the parallelisability of the post-processing (η P ), which is mainly determined by the number of available processors. The data indicate that some form of Amdahl's law of diminishing returns also applies to the post-processing of trajectories, leaving the relative costs of MM and QM (c * ) and the potential energy difference U reorg as the main criteria. If the relative computational costs are low, or if U reorg is too high, it is more efficient to run additional simulations on the QM energy surface and employ BAR. On the other hand, post-processing MM trajectories in parallel can yield significant gains under the right conditions.

On the introduction of intermediate states
While simulating so-called λ intermediate states increases the computational costs, they can also significantly increase the phase space overlap, and, therefore, the convergence of free energy calculations [4,12,17]. However, it is often unclear how many intermediate states are necessary in order to reach convergence. Thus, we employ our linear response model system to obtain an approximate solution for the optimal number of λ steps in a free energy calculation (n opt steps ). For simplicity, we are assuming again that the computational costs are the same for the two end points of each sub-step.

Zwanzig equation
Starting with the Zwanzig equation, and assuming that the intermediate states are distributed evenly in U reorg space, the relative computational costs of dividing the free energy calculation into n step sub-steps, are Assuming that e 2 U reorg >> 1, this can be approximated by Notably, the necessary number of λ points with the Zwanzig equation can be further reduced by employing so-called doublewide sampling (DWS) [4,23]. In DWS, the interval between two simulated λ-points, λ i and λ i+1 , is divided in half, and a virtual state λ i+ 1 2 is introduced at the midpoint of the interval. Since the virtual midpoint is not simulated, the computational costs only increase by the extra post-processing for λ i+ 1 2 . Since we can also neglect the costs of post-processing due to parallelisation, this leads to This means that the optimal step size increases as the problem becomes more challenging. For example, if the reorganisation energy is so high that 10 λ steps have to be used, the optimal step size in terms of U reorg is ca. 4.5 kT. Since the precision is supposed to be the same, and the step size of U reorg increases, the only way to achieve the same precision is an increase of the number of steps. Therefore, it follows that it is sometimes more efficient to run longer simulations than to introduce more intermediate steps. This is not necessarily an intuitive result.

Bennett's acceptance ratio
Similarly, the relative computational costs of dividing a BAR free energy calculation into n step steps are This indicates that the optimal step size in terms of U reorg again increases as the reorganisation energy gap widens. Thus, it is again necessary to increase the number of steps to achieve the required precision. The main results of our comparison are shown in Figure 6.

Methods
All calculations were conducted with CHARMM [53,54], using the CHARMM General Force Field [55] unless stated otherwise. The time step was 1 fs. No cut-off radius was employed. Free energy differences were evaluated with the FREN module of CHARMM.

Simulations of harmonic model systems
Following the same guidelines as provided in Refs. [56][57][58], two linear model systems were created, one consisting of 10 atoms and one consisting of 51 atoms. The molecules were simulated in the gas phase using Langevin dynamics with a friction coefficient of 5 ps −1 and a temperature of 300 K. The simulation length was 100 ns, and coordinates were saved every 1000 steps. The initial  Figure 7 is more sparse for high U reorg than the Zwanzig data.

Gas phase free energy simulations of twelve molecules
Based on a previous study [59], 12 molecules were used: water, methanol, ethanol, methanethiol, acetamide, tetrahydrofuran, benzene, aniline, phenol, ethane, n-hexane and cyclohexane. Each molecule was simulated in the gas phase using Langevin dynamics with a friction coefficient of 1 ps −1 and a temperature of 300 K. The simulation time scale was 500 ns and coordinates were saved every 20 ps. λ-Hamiltonian Replica Exchange [60] was employed to enhance sampling by exchanging structures between neighbouring λ points every 20,000 steps based on the REPD module of CHARMM. The free energy calculation was broken down into two parts. First, the charges were scaled to zero in three steps (λ = 0.0, 0.2, 0.55 and 1.0). Second, the van der Waals interactions were turned off using λ = 0.0, 0.15, 0.30, 0.45, 0.60, 0.75, 0.87, 0.96 and 1.00. Soft cores, as implemented with the PSSP command in the PERT module of CHARMM, were used with the default parameters to avoid the end point problem [53,61]. Free energy differences were calculated between all possible combinations of λ states. Standard deviations were calculated from four repetitions of each free energy simulation. U reorg values were determined based on the potential energy difference between the minimised structures of the respective end points using 10,000 steps of the adopted basis Newton-Raphson method [53].
The last 18,000 frames of the physical end points in the gas phase were employed for multi-scale free energy simulations between MM and QM. QM calculations were performed with Q-Chem [62] based on the CHARMM/Q-Chem interface [63]. In particular, potential energy differences were evaluated with four different methods using the 6-31G* basis set: [64] BLYP [65,66], B3LYP [67], Hartree-Fock and M06-2X [68,69]. The MMoptimised structure was obtained with 1000 steps of steepest decent minimisation, followed by 1000 steps with the adopted basis Newton-Raphson method. The MM-optimised structure was then further minimised with the respective QM method, using 450 steps with the adopted basis Newton-Raphson method. The SCF convergence criterion for the gradient calculations was set to 10 −10 . U reorg was calculated from the potential energy difference between the MM and QM optimised structure while using the MM Hamiltonian.

Verification with gas phase simulations
Because Equations (17) and (24) are of fundamental importance for our analysis, caution dictates to verify their results based on actual simulations. Although the equations are derived analytically, they can include some approximations that we are not aware of. For this reason, we resort to linear model systems that only include bonded terms, similar to the work by Boresch and Karplus in Ref. [57]. The linear response approximation should be valid for the benchmark systems because no non-bonded interactions and no force constants are changed.
The top panel of Figure 7 shows a comparison of the predicted variance of the free energy estimate based in Equation (17) (blue line) with variances of actual free energy simulations. The corresponding free energy simulations include both changes of bond lengths (+) and changes of bond angles (×). This is relevant, because the equation dictates that the variance only depends on the reorganisation energy U reorg , but not on U reorg indicates the potential energy difference between the energy minimum of the initial state and the energy minimum of the final state while using the Hamiltonian of the initial state. Each free energy simulation was conducted with 100, 000 potential energy differences (blue). + signs indicate results for mutations of the bond length, while × signs indicate changes of bond angles. The theoretical estimates of the standard deviation based in Equation (15) are shown for 100,000 data points (dp). Interestingly, the correlation between the simulated results and the theoretical results seems to break down once the variance reaches ca. 0.1 kT. A similar behaviour was also observed by Gore et al. in Figure 3A of Ref. [70]. This can probably be attributed to sampling errors. Notably, BAR can handle significantly larger U reorg than the Zwanzig equation. This is highlighted by the U reorg that leads to a variance of 0.1 kT (black horizontal line), which corresponds to 4.6 kT for the Zwanzig equation and 29.2 kT for BAR. the details of the force constants or the partition function. Thus, the equation should be equally valid for changes with a high force constants (like bond stretching) and lower force constants (like angle bending). The data in Figure 7 indicates that this is actually the case since there are no significant discrepancies between the two kinds of mutations. Also, in our mutations, we change one or more bonds at the same time, but this does not seem to affect the accuracy. Therefore, the data suggest that no normal mode analysis is required to make use of Equation (17), which renders its application very simple.
The top panel of Figure 7 makes apparent that the theoretical prediction of the variances is only valid for relatively small differences. As the variance approaches ca. 0.1 kT, which corresponds to a U reorg of ca. 4.6 kT, there are clear discrepancies between the theoretical variances and the variances obtained from simulation. While the theoretical variance estimates grow exponentially, the simulated variances seem to plateau around ca. 10 kT. It is expected that the simulated data will not perfectly agree with the theoretical result due to the finite number of repetitions that were used to estimate the uncertainty, which leads to an uncertainty of the uncertainty. However, the deviations are too large to be explained by random noise. One possible explanation is that the theoretical results assume perfect sampling, since they are directly obtained from a transformation of the Gaussian probability distribution of the micro-states. The simulated data, on the other hand, samples states only according to their Boltzmann probability and in finite time (100 ns). Thus, most of the samples reside close to the energy minimum of the initial state, while the tails of the probability distribution that are responsible for the large variance are not adequately sampled. This introduces a bias in the variance estimate, which could be addressed by employing the 'neglected-tail' bias model by Wu, Lu and Kofke, which was also recently discussed by Boresch and Woodcock [1,5,71]. Another possibility is the use of Umbrella Integration [72] or variational free energy profile calculations [73]. In any case, the observed sampling problems for harmonic systems in Figure 7 should serve as warning that simulated estimates of the variance are not necessarily trustworthy, since they might be affected by sampling problems.
To quantify the agreement, we have to resort to the decadic logarithm of the variances, since we have to evaluate values over several orders of magnitude. For the Zwanzig equation results, we exclude variances above 0.1 kT from this analysis, because they might be affected by sampling problems. The Pearson correlation coefficient of the remaining data with theory is very good with R 2 = 0.97. The mean signed deviation of Log 10 σ 2 Theory and Log 10 σ 2 Simulation is 0.36, which indicates that theory slightly overestimates the variance by a factor of ca. 1.4. The other possibility is that the simulations systematically underestimate the variance because of the limited number of repetitions (10). The root mean square deviation is 0.38, which is very similar to the mean signed deviation. This indicates that there are no major outliers in the considered range.
Turning our attention to the BAR results at the bottom of Figure 7, we do not observe discrepancies between theory and simulation, as in the case of the Zwanzig equation. However, this does not necessarily indicate that there are no sampling problems. A closer inspection of the BAR results relative to the Zwanzig results shows that the data for BAR is significantly more sparse above a U reorg of ca. 29 kT. This is caused by failure of BAR to converge if the phase space overlap between the end points is too low. Without free energy results, it is not possible to calculate a variance, and, therefore, those free energy simulations are not included. While the failure of BAR to converge can be frustrating for the user, it is ultimately beneficial, because it actively forces the user to employ a proper λ protocol by introducing more intermediate states. In this regard, BAR can be considered far more user friendly than the Zwanzig equation or Thermodynamic Integration, which do not provide an equivalent implicit quality control of the free energy results. However, as can be seen from the few outliers with variances higher than 0.1 kT (which, interestingly, all arise from angle mutations with lower force constants), BAR sometimes manages to converge, even if the resulting variance is poor. For this reason, the BAR implementation in the FREN module of CHARMM is even more restrictive and refuses to provide a free energy estimate if the overlap of the potential energy distributions is below 1%. Leaning on Equation (19), the estimated phase space overlap is based on where, in practice, the probabilities are estimated with histograms of the potential energy differences. The cutoff of 1% is based on our past experiences with BAR [3,4,[74][75][76], but with Equation (23) it is possible to relate this overlap to a U reorg of ca. 18 kT, which corresponds to a standard deviation of 0.25 kcal/mol if merely 1000 data points are used in the free energy step. We, therefore, think that this quality criterion is not overly restrictive.
With the available BAR data, the correlation coefficient between the decadic logarithms of the variance of theory and simulation is very good with R 2 = 0.97. The mean signed deviation between theory and simulation is 0.35, and the root mean square deviation is 0.38. This, again, indicates that the theoretical estimate of the variance is higher than the one obtained from simulation, but this can be mostly explained by the uncertainty of the uncertainty. Since parts of the deviation could also arise due to the approximations introduced in Equations (21) and (22), we also provide an alternative derivation in the Appendix, which gives the same exponential behaviour, but corrected with a prefactor. While this alternative given in Equation (A2) is more accurate, we still favour Equation (17) because it simplifies the ensuing derivations.
Even a casual comparison of the BAR data with the Zwanzig data in Figure 7 reveals the fundamental superiority of BAR when it comes to bridging energy gaps between two states. Taking as an example a reasonable variance of 0.1 kT that has to be achieved with 10,000 potential energy difference samples (highlighted by the black horizontal line), the Zwanzig equation can overcome a U reorg of 4.6 kT. With the same requirements, BAR can cross a U reorg of almost 30 kT. In practice, this means that BAR only requires a fraction of the number of λ points compared to the Zwanzig equation. Taking our estimates from Equations (34) and (37), the number of λ points can be significantly reduced, which saves computer time.

Gas phase free energy simulations of 12 molecules
By employing our estimates of the variance of free energy simulations of twelve molecules in the gas phase, we want to highlight some of the limitations of the approach. There are several assumptions that are involved in the estimate, including: (a) The system under consideration is harmonic. (b) The structure that is used for U reorg is representative for the whole energy landscape (i.e. it is the global free energy minimum and the other minima contribute only marginally to the free energy difference). (c) The force constants involved in both end states are approximately the same. Much like the model of the 'homo oeconomicus' in macroeconomics, those underlying assumptions are very weak and almost never fulfilled in practice. However, it can be illustrative from an academic point of view to observe how the linear response toy system fares when compared to data from real molecules.
The variances of free energy differences for the twelve molecules under consideration (water, methanol, ethanol, methanethiol, acetamide, tetrahydrofuran, benzene, aniline, phenol, ethane, n-hexane and cyclohexane) are shown in Figure 8. In particular, we simulate one leg of solvation free energy calculations, where the non-bonded interactions of a molecule are sequentially turned off. In our simulations, electrostatic interactions are turned off by scaling the charges to zero (denoted by 'Elec change'). In addition, the van der Waals interactions are scaled to zero ('VdW change'). To reach higher values of U reorg , also combinations that change both electrostatic and van der Waals interactions at the same time are considered ('VdW + Elec change'). We also differentiate between calculations that were performed in the 'forward' direction (deletion) and in the 'backward' direction (insertion), since it is expected that the insertion direction should be more relevant [21]. A priori, it is expected that the linear response model works better for purely electrostatic changes.
While there are definitely several outliers in Figure 8, the results from the linear response approximation are not as pathetic as one would expect them to be, given the really crude underlying approximations. Like for the harmonic systems, we exclude systems with a variance higher than 0.1 kT because of the expected sampling problems. We also exclude variances below 10 −14 to avoid numerical problems. An evaluation of the decadic logarithms of the variances of the 1341 free energy results leads to an R 2 = 0.91, which is lower than the correlation coefficient for the harmonic systems, but still pretty good. The mean signed deviation is −0.32, which means that the variance is on average higher than predicted by linear response theory. The root mean square deviation is 0.82, which is twice as high as for the purely harmonic systems. However, since we are evaluating the decadic logarithm of the variance, this corresponds to a factor of ca. 6.6. If one tries to translate this ratio into an error in terms of U reorg (ca. 0.94 kT), one ends up with the expected misestimate of the required number of λ points according to Equation (34) (1.8). Thus, if one tries to use the linear response approximation to estimate the number of λ points with the Zwanzig equation in the gas phase, the estimate can be expected to be wrong by about 2 λ points. Since it is better to err on the side of caution than risking unconverged sub-steps, one might consider adding two additional steps to the estimated λ protocol to account for the observed discrepancies.
Rather than indiscriminately adding more sub-steps, one can ask whether it is possible to identify particularly recalcitrant steps in the λ protocol. An obvious and highly recommended strategy is to conduct short simulations to see whether the predicted variances agree with the actual results. This accounts for both anharmonicity and for conformational entropy, which are not considered in the approximation. In addition, short equilibration simulations are required anyway before the actual production is started, and therefore this check does not really incur significant additional computational efforts. However, since we are aspiring to make the best use of the information that is available before trajectories are generated, we also propose two simple alternatives.  Apart from our assumptions of harmonicity and of the starting structure being representative for the ensuing trajectory, the linear response approximation implies that both end points are governed by the same force constants. One possible way to verify this is to use normal mode analysis [53,77] or quasi-harmonic analysis [78]. However, some information about the difference of the force constants can also be extracted from the reorganisation energies. Since the distance between the energy minima of the two end points in terms of our reaction coordinate is the same, the ratio of the resulting reorganisation energies is the same as the ratio of the force constants, i.e.
where U 1→0 min means that the reorganisation energy is determined with the Hamiltonian of the final state, and U 0→1 min uses the potential energy function of the initial state. The resulting ratio α actually represents a measure for the vibrational entropy [57,79,80] since By identifying large changes of the entropy by colour coding the results in Figure 8 based on ln α (left side, where black represents small changes and red represents drastic changes), one can see that several major outliers can be explained by violations of the assumption that the end states have similar force constants. In some cases ln α is 4, which corresponds to a force constants ratio of ca. 55. Thus, the end states have very different phase spaces available. How is it possible to resolve this issue? To some degree, it is possible to mitigate this problem by following the advice of Kofke and coworkers that free energy simulations are supposed to be performed in the insertion direction, as one should generally proceed from the higher entropy state to the lower entropy state [1,2,21,24,81]. Focusing on a comparison of the data points marked in bright red in Figure 8, one can distinguish between free energy calculations that are performed in the forward direction ('fw', marked with stars), which corresponds to a deletion step here, and calculations that are performed in the backward direction ('bw', marked by squares), which correspond to an insertion. The outliers in the forward direction are fully visible and mark clear overestimations of the variance by the theoretical estimate (blue line). On the other hand, the backward processes (red squares marking the insertion) are more difficult to see, because they partially overlap with other data points and are significantly closer to the theoretical estimate. This indicates that the insertion process leads to a more reliable estimate. However, a closer inspection of some of the other outliers at the top of figure also shows that sometimes the backward estimates severely underestimate the variance. A possible compromise to reconcile the conflicting data is to consider the forward and backward processes as bounds to the proper estimate, and either use the mean of the two, or, as a kind of worst-case scenario, use the larger U reorg . Another factor that is not considered by our approximation is the use of so-called dummy atoms. Dummy atoms are placeholder atoms without any non-bonded interactions [57,82]. Since no van der Waals interactions are calculated for dummy atoms, they do not cause steric clashes. However, when evaluating the potential energy difference to some other state, the dummy placeholder atoms are turned into normal atoms, which sometimes causes steric clashes, leading to the so-called end point catastrophe [53,61,83]. The effect of the end point catastrophe depends on the probability of steric clashes, when cross-evaluating potential energies with dummy states. Since the dummy-likeness depends on the λ value, the effect of dummy states can be observed by colour coding the results in Figure 8 based on λ, as is done on the right side with purple colour. In most cases, the use of dummy atoms leads to no noticeable effects, but this is only the case because our test set mostly consists of relatively small and stiff molecules. An interesting column of outliers can be observed at ca. 10 −2 kT where the variance is higher than predicted by linear response theory. All data points of this group correspond to hexane, the longest chain in the data set. In this case, the dummy state of hexane can assume any conformation that is possible in terms of the bonded terms, while the phase space of the non-dummy end state is restricted by excluded volume effects. The excluded volume is of course not considered by linear response theory, but probably could be treated with a form of Flory theory [84]. However, it would be difficult to generalise the resulting theory, therefore suffice it to say that the linear response approach may break down for dummy atoms, especially in the condensed phase. Figure 9 shows the corresponding results for BAR. As already discussed in the context of forward and backward transformations with the Zwanzig equation, there is some ambiguity in terms of which U reorg is to be used. This is especially acute in BAR, since BAR uses the potential energy distributions of both end states for its free energy estimate. As a simple ad hoc solution, we just employ the mean of the two U reorg here. Again, the correlation coefficient is pretty good with R 2 = 0.92, but the mean signed deviation is −0.57 and the root mean square deviation is 0.83. Thus, the systematic error in terms of underestimation of the resulting variances is higher than for the Zwanzig equation. Based on the root mean square deviation, this also leads to an underestimate of the required number of λ points by about two.
As for the Zwanzig equation, the left side of Figure 9 marks mismatching force constants between end points in red. While the BAR free energy estimate itself finds the optimal combination of the forward and reverse potential energy distributions, this is not case for our linear response approximation for estimating the variance based on U reorg . This aspect will, therefore, require more attention in the future. The same can be said about the treatment of dummy atoms. Therefore, it is advisable to restrict the use of our linear response approximation to systems that mostly involve electrostatic changes and no introduction of dummy atoms. This indicates that the theory can be applied to some extent to multi-scale free energy simulations between an MM and a QM end state.

Multi-scale free energy simulations from MM to QM
Our final benchmark consists of multi-scale free energy simulations between MM and QM energy surfaces. In particular, we employ MM trajectories based on CGenFF and attempt to obtain free energy differences with the Zwanzig equation to QM energy surfaces based on BLYP, B3LYP, Hartree-Fock and M06-2X with 6-31G* basis set. We employ the same twelve molecules as in the previous case, which leads to 48 free energy differences. Since performing quantum-mechanical simulations on the nanosecond time scale would be very expensive, we do not consider the use of BAR here. Instead, we attempt to check the basic transferability of our results to multi-scale free energy simulations. The results are shown in Figure 10.
Although the range of the variance data is significantly smaller compared to the more extensive test in Figure 8, the overall metrics are comparable. The R 2 is significantly lower with 0.66 (compared to 0.91), but this reflects both the limited range and the fact that this benchmark is significantly more challenging due to the inherent differences of the underlying physics of both end states. The mean signed deviation is −0.51 kT (compared to 0.32 kT) and the root mean square deviation is 0.83 kT (compared to 0.82 kT). Thus, the root mean square deviation is very similar to the previous results, but the systematic error is higher and tends to underestimate the variance. While the introduction of λ intermediate states is possible in multi-scale free energy simulations [31,85], it is less common than in molecular mechanics. Thus the linear response approximation is probably more valuable to gauge the required number of data points, which is also a way to increase precision. Alternatively, one can also use U reorg to evaluate whether a different force field is more likely to yield better precision.

Conclusions
In this paper, we have attempted to devise a pragmatic solution to one of the dilemmas involved in free energy simulations: How to devise a free energy simulation protocol when the required information for the protocol is the entropic contribution of the resulting free energy difference itself. For this purpose, we borrow some ideas from Marcus theory and linear response theory. An overview of the basic strategy is provided in Figure 11. Starting from some representative initial structures (e.g. a crystal structure), the reorganisation energies ( U reorg ) between the two states are determined. Based on U reorg , the expected computational costs can be derived, which serve as a basis to decide which free energy method is most likely going to be more efficient, and the expected number of λ steps is determined. Ideally, the procedure is supposed to be used recursively on each of the sub-steps to minimise potential errors arising from mismatching force constants.
The proposed approach was tested using free energy simulations in the gas phase of model systems and of a set of twelve molecules, reaching root mean square deviations of the decadic logarithm of the variances between 0.38 and 0.83. This indicates that the λ protocol can be chosen with an accuracy of about ± 2 steps. We also provide some metrics based on phase space overlap, vibrational entropy and the presence of dummy atoms to predict when the linear response approach is likely to break down. As a final verification, we employed multi-scale free energy simulations between molecular mechanics and quantum mechanics to demonstrate the applicability of our simplistic approach.
While it would be rather bold to suggest that the provided theory can be used in all fields of free energy simulations, it is still valuable for educational purposes. The underlying theory allows us to address practical questions such as the relative value of speed and accuracy in molecular simulation. This is highly relevant to appreciate the value of recent developments such as polarisable force fields [86][87][88][89][90][91][92][93][94], semi-empirical methods [95,96], or applications of QM/MM [31,32,32,80,. As illustrated in Section 1.7, the required number of data points in a free energy calculation increases exponentially with the error of the potential energy in terms of the reorganisation energy. Therefore, a slow method can actually out-compete a much faster method if it manages to provide modest gains in accuracy. However, the optimal balance of speed and accuracy depends on the details of the application, since other factors such as parallelisability or the autocorrelation time of sampling play a role. Those factors are quantified here.
To summarise, our results show from a purely statistical mechanics point of view the connections of the efficiency of free energy calculations with underlying concepts like accuracy, speed, parallelisability, sampling and the choice of free energy method. For simple systems, we demonstrate how to make the optimal choice in terms of the free energy protocol. From a theoretical point of view, the use of more accurate polarisable force fields and, ultimately, quantum-mechanical methods for sampling is inevitable for applications of free energy simulations. While fast, approximate methods provide the psychological advantage of more immediate gratification, they are often not competitive if the speed is bought with a notable loss of accuracy. However, since the optimal performance of free energy simulations depends on several factors, it is still up to the user to weight the impact of the different choices. Being able to come up with the best possible strategy for a challenging system is therefore still one of the hallmarks of a good computational chemist. Note 1. While different numbers of data points could in principle be used for the initial state and the final state, it is pointed out in the original BAR paper [6] that the optimal ratio depends on the variance of the Fermi functions (c.f., Equations (13)- (16) and Figure 3 in Ref. [6]) -which is equal in our model. However, spending about equal computer time on both states could also be a viable alternative..