Steering a solute between coexisting solvation states: Revisiting nonequilibrium work relations and the calculation of free energy differences

By analogy with single-molecule pulling experiments, we present a computational framework to obtain free energy differences between complex solvation states. To illustrate our approach, we focus on the calculation of solvation free energies (SFEs). However, the method can be readily extended to cases involving more complex solutes and solvation conditions as well as to the calculation of binding free energies. The main idea is to drag the solute across the simulation box where atomistic and ideal gas representations of the solvent coexist at constant temperature and chemical potential. At finite pulling speeds, the resulting work allows one to extract SFEs via nonequilibrium relations, whereas at infinitely slow pulling speeds, this process becomes equivalent to the thermodynamic integration method. Results for small molecules well agree with literature data and pave the way to systematic studies of arbitrarily large and complex molecules.

Solvation free energies (SFEs), the difference in free energy resulting from considering the solute molecule in a solvent and in the gas phase at a given temperature and pressure, are fundamental to understand various processes in biology, chemistry, and pharmaceutical sciences in view of computer-aided drug design. SFEs are often calculated using free energy methods that join solvated and gas phase states via a careful modulation of physical interactions. [1][2][3][4][5][6][7][8][9][10][11][12][13] This procedure is rather involved and strongly system dependent.
Here, we compute SFEs in terms of the work necessary to pull the molecule from a region within the simulation box containing the solvent (solvated state) to another region containing an ideal gas representation of the solvent (gas phase). A uniform solvent number density is enforced throughout the simulation box. Hence, molecules can freely diffuse while changing their resolution, from interacting to ideal gas, on the fly which consequently guarantees a constant solvent chemical potential.
In particular, let us consider a system of N atoms, where a single solute molecule σ is solvated in Ns solvent molecules of type α. To concurrently simulate the fully atomistic and ideal gas representation of the solvent, we apply the adaptive resolution framework, [14][15][16] particularly the Hamiltonian adaptive resolution H-AdResS method. [17][18][19][20] There, a Hamiltonian of the system includes a switching field λ(r) that defines regions within the simulation box where intermolecular interactions are either fully atomistic [λ(r) = 1] or nonexistent [λ(r) = 0]. A smooth interpolation between the values 0 and 1, and thus a change in molecular resolution, takes place in between these two regions. The Hamiltonian of the system has the form 21 with (r, p) being positions and momenta and K = ∑ N i=1 p 2 i /2mi the total kinetic energy of the system. In this particular case, we write the where the labels intra and inter refer to intra-and intermolecular interactions, respectively. The vector R is the position of the center of mass of a given molecule, and the switching field λ(R), which determines the molecule's identity, takes the value 1 if the molecule is in the atomistic region (AT), 0 is in the ideal gas region (IG), and a smooth interpolation between these values is defined in the hybrid (HY) region (see Fig. 1). It is apparent that the potential given by Eq.
(2) can be generalized to cases where multiple solute molecules are present or the solvent corresponds to a molecular mixture. Moreover, the choice of reference state, in this case the gas phase, is rather arbitrary and can be calibrated depending on the problem at hand. The ΔH terms are free energy compensations (FECs) introduced in the Hamiltonian to eliminate nonphysical contributions to the forces due to the gradients of λ(r) and to guarantee a flat density profile across the simulation box. For all cases considered, there are only two FEC contributions, one for all the solvent molecules, ΔHα, and the other for the solute molecule, ΔHσ. The FEC of the solvent has been identified with the excess (or residual in the case of mixtures) chemical potential 19 and can be easily computed using an iterative on-the-fly procedure 22 (see Sec. I of the supplementary material). In this formalism, we write the following time-dependent Hamiltonian: with a time-dependent harmonic restraint, with constant κ, applied on the x-coordinate of the atom i that belongs to the solute molecule σ, xi∈σ(t), whose initial position is denoted by x 0 i∈σ . This potential drives the solute molecule across the simulation box to sample different solvation conditions between two extreme cases: a fully solvated solute and its gas phase (see Fig. 1). From the Hamiltonian (3), we can compute the thermodynamic work that corresponds to the change in energy of the system, 23 with v being the pulling rate. Finally, the relation between the work defined in Eq. (5) and the free energy difference between two equilibrium solvation states can be established by the Jarzynski equality (JE), [24][25][26] with β = 1/k B T. Shortly after its formulation, the JE found immediate application in various single-molecule pulling experiments. [27][28][29][30] One interesting limiting case of the JE corresponds to the infinitely slow pulling limit where the system equilibrates for every value of the switching field λ(r). In this case, from the JE (6) and the definition of work (5), we obtain an expression for ΔF completely analogous to the thermodynamic integration method, 25,31 where the brackets ⟨⋯⟩ λ indicate the ensemble average for a given value of the field λ(r) and the δ indicates a functional derivative. The terms Ra = r AT + d HY and R b = r AT with r AT and d HY being the linear dimensions of the AT and HY regions, respectively. From the This quantity vanishes upon integration. 21 For the solvent molecules α, this result verifies that the excess chemical potential is zero. For the solute σ, we have two possibilities. ΔHσ(λ(Rσ)) ≠ 0 implies ΔF = 0 and ΔHσ(λ(Rσ)) ≡ 0 implies that, in the infinitely slow pulling limit, the SFE can be obtained as To investigate this, we consider a single component system. The solute is defined as one of the system's molecules such that ΔHα = ΔHσ. We calculate ΔF by using Eq. (7) when ΔHσ ≠ 0 and Eq. (8) when ΔHσ = 0. With this aim, we perform umbrella sampling 32,33 simulations of pure SPC/E 34 water (see Secs. I and II of the supplementary material). In practice, we select a water molecule inside the AT region (schematically indicated in Fig. 1) and restrain the x-coordinate of its oxygen atom using the harmonic potential in Eq. (4) with zero pulling rate and spring constant κ = 209.2 kJ/mol/Å 2 . First, we apply the FEC to all the molecules in the system including the selected water molecule [ΔHσ ≠ 0 in Eq. (2)]. This solute molecule is moved sequentially by Δx 0 oxygen = 0.2 Å and sampled for 20 ps to construct the biased probability distribution of xoxygen. We perform 20 uncorrelated simulation runs to calculate the solvation free energy profile and shift it with respect to the IG region. Averages and standard deviations (error bars) are reported in Fig. 2 (blue symbols) which show, as expected and within error bars, a flat free energy profile across the simulation box (ΔF = 0). Second, we repeat the same umbrella calculation as described before but without applying the FEC to the solute water molecule [ΔHσ = 0 in Eq. (2)]. Results are also reported in Fig. 2 (red symbols), where a flat energy profile is apparent in both the AT and IG regions and the gap between the two amounts for the difference in free energy of −29.45 ± 2.63 kJ/mol. 35 This result agrees within error bars (which are comparable to the ones obtained with stateof-the-art free energy methods) with the excess chemical potential as reported in the literature (−29.52 ± 0.03 kJ/mol 36 and −29.18 ± 0.16 kJ/mol 19 ), and it validates our approach since the solvation free energy of a water molecule in water is the excess chemical potential.
For the second part of this study, we focus on the calculation of the SFE of one urea molecule in water (in the following, we assume ΔHσ = ΔHurea = 0). In this case, we apply JE for different pulling rates, including the infinitely slow pulling case given by Eq. (8). We simulate urea and water molecules using the OPLS 37 and SPC/E 34 force fields, respectively (see the computational details in Sec. I of the supplementary material). In this case, the carbon atom of the urea molecule is restrained and pulled across the simulation box using the harmonic potential in Eq. (4) with zero pulling rate and spring constant κ = 209.2 kJ/mol/Å 2 . The resulting free energy profile is shown in Fig. 3 (blue symbols), and we use it as the benchmark for the next part of the investigation.
We now proceed to perform the steered molecular dynamics of the solute molecule to bring it from the AT to the IG region, and we use the JE to compute the SFE. The initial conditions for all pulling experiments, slow and fast, are extracted from different equilibrated configurations at constant temperature. Thus, the initial positions of the restrained molecule are distributed with Boltzmann weight exp(−κ(x − x 0 ) 2 /2k B T) whose width (standard deviation of the distribution) is given by the inverse of the restraining stiffness, κ −1 . We apply the time-dependent restraining potential in Eq. (4) with κ = 209.2 kJ/mol/Å 2 which is high enough to neglect, within error bars, the contribution of such a potential in the calculation of the free energy. Thus, we effectively calculate the difference in free energy for the unconstrained system. 38 In practice, to make use of the JE (6), it is possible, on the one hand, to express ⟨e −βW ⟩ in terms of a cumulant expansion, 39

FIG. 3.
Free energy profile of one urea molecule transferred between AT and IG regions with pulling rate (a) v = 10 Å/ns and (b) v = 50 Å/ns. The blue curve in each panel is obtained by using the umbrella sampling method described in the text. The free energy profiles calculated by JE using the first, first and second, and exponential averages of work are represented by black, red, and green curves, respectively. The averages and error bars are calculated over (a) six sets of ten and (b) 20 sets of ten independent simulation runs.

ARTICLE scitation.org/journal/jcp
with ⟨W 2 ⟩ − ⟨W⟩ 2 being the mean-squared variation of W. On the other hand, it is also possible to directly use the exponential average, i.e., In both cases, we use Eq. (5) to compute the applied work W on the system. The initial position of the solute is set at x 0 i∈σ = 20 Å, and the simulation total time τ is set such that the final equilibrium position x 0 i∈σ + vτ is 60 Å. We pull the restrained atom in the solute molecule at two constant speeds: v = 10 Å/ns and v = 50 Å/ns (the latter being comparable to the thermal velocity of urea at room temperature, see Sec. IV of the supplementary material). We generate 60 (200 in the case of v = 50 Å/ns) uncorrelated trajectories, and we group them into six (20) blocks of ten trajectories. The statistical analysis (first and second orders of the cumulants as well as the exponential average) is carried out for each block. In Fig. 3(a), the result of computing ⟨W⟩ for a pulling rate of v = 10 Å/ns is, as expected, larger than ΔF [4 kJ/mol (black symbols)]. Nevertheless, we use the JE by either adding the contribution of higher order cumulants as in Eq. (9) (red symbols) or evaluating the exponential average as in Eq. (10) (green symbols). Both estimates increase the accuracy with respect to the benchmark free energy curve. Taking this into account, we report the solvation free energy of one urea molecule as obtained from the exponential average as −52.9 ± 3.6 kJ/mol in good agreement with the result reported in Ref. 40 of the excess chemical potential of aqueous urea (−55.13 kJ/mol) at a mole fraction of 7.657 × 10 −4 . We show that v = 10 Å/ns [ Fig. 3(a)] is an optimum pulling rate (see Sec. V of the supplementary material) and use this value to comment on the efficiency of the method. Since error bars are systematically smaller for the JE than for umbrella sampling results, we report in Sec. VI of the supplementary material, free energy profiles with reduced size of the statistical sample. These results indicate that to obtain comparable uncertainty, it is required to run approximately the same number of molecular dynamics steps in both cases. When calculating free energies, optimized equilibrium methods usually outperform nonequilibrium methods. 31,41,42 However, our particular result might be due to the simplicity of the system under consideration and/or the choice of equilibrium calculation which, in this context, serves only an illustrative purpose and could be further optimized.
The amount of nonreversible work increases upon increasing the pulling rate to 50 Å/ns, as presented in Fig. 3(b). The first order cumulant ⟨W⟩ (black symbols) shows a difference of approximately 30 kJ/mol with respect to the benchmark free energy (blue symbols). In this case, the second order cumulant (red symbols) is more accurate with respect to the umbrella sampling calculation than the logarithmic estimator in Eq. (10) (green symbols). However, the uncertainty in the second order cumulant calculation is larger because of the limited sampling. 38,39 The error in the exponential average is lower, i.e., with smaller fluctuations, but still with a considerable mean deviation of approximately 20 kJ/mol from the reference value, that evidences the well-documented bias of this particular estimator. 38,39 Previously, we have only considered the process to bring the molecule from the AT to the IG region, but it is also possible to compute ΔF by performing the reverse process. In this case, Crooks fluctuation theorem (CFT) combines the information obtained from examining forward (F) and reverse (R) pulling events, 43 where W is the work performed on the system resulting from F and R pulling and β = 1/k B T. Similarly to the JE, the CFT has also been investigated in various experimental studies. [44][45][46] In this case, the difference in free energy ΔF is obtained by evaluating the ratio of the F and R work distributions P F (+βW)/P R (−βW). We identify F with pulling the solute from the AT to the IG region and R with the inverse process. To use the CFT, we perform a number of R processes equal to the number of F processes at a pulling rate of v = 50 Å/ns. Free energy profiles for R events are shown in Fig. 4 where a consistent behavior with respect to the F processes is apparent and rules out the presence of hysteresis in the system. For this case as well, the best estimate with respect to the umbrella sampling profile is given by the second order cumulant expansion (red symbols). Finally, we compute the work distributions P(±βW) F/R for both R and F processes (Fig. 5). We identify the intersection of the distributions, i.e., the point at which W = ΔF [ Fig. 5 (inset)], and report ΔF = 49.67 kJ/mol. The results obtained by using the JE and the CFT agree reasonably well. However, more statistics and data from different pulling rates need to be collected in order to convincingly calculate ΔF from the CFT. We have introduced a framework to compute SFEs via the work resulting from dragging the solute between solvation states where nonequilibrium work relations allow one to compute free energy differences. The method can be extended to cases including more complex solutes, multiple solvation conditions, calculation of absolute binding affinities, and relative binding free energies. Moreover, large solute molecules can be studied because (i) pulling from the AT region depletes the solvent; however, this can be compensated by defining a sufficiently large IG region at minimal computational expenses and (ii) pulling from the IG region gives results in line with the JE. Finally, when equilibrating intermediate states or finding an optimal path between solvation conditions become unfeasible, this pulling protocol provides a sound and straightforward alternative to current SFE methods.
The supplementary material includes the computational details common to all simulations as well as the specifics of the umbrella sampling calculations. In addition, there is a short discussion on the correction needed for an appropriate comparison between results obtained with DSF and conventional Ewald summation for electrostatics. To estimate the pulling speed for the nonequilibrium calculations, it is convenient to compute the mean-squared displacement of a urea molecule in aqueous solution. Moreover, to find the optimal pulling rate among the values considered in this study, we also present the standard deviation of the corresponding work distributions. Finally, we add a section where we compare the efficiency of umbrella sampling and nonequilibrium calculations.