Regularizing binding energy distributions and thermodynamics of hydration. Application to water modeled with classical and ab initio simulations

The high-energy tail of the distribution of solute-solvent interaction energies is poorly characterized for condensed systems, but this tail region is of principal interest in determining the excess free energy of the solute. We introduce external fields centered on the solute to modulate the short-range repulsive interaction between the solute and solvent. This regularizes the binding energy distribution and makes it easy to calculate the free energy of the solute with the field. Together with the work done to apply the field in the presence and absence of the solute, we calculate the excess chemical potential of the solute. We present the formal development of this idea and apply it to study liquid water.

The excess chemical potential, µ ex x , of a solute (x) within a general thermodynamic system (the solvent) is that part of the Gibbs free energy that would vanish if the interaction between the solute and solvent were to vanish. For this reason the excess chemical potential is of most interest in understanding a complex system on a molecular basis. Since the excess chemical potential is measurable, calculating it also serves as an important validation of molecular simulations.
In this communication, we present a new approach to calculate µ ex x , one that sidesteps the current dominant paradigm based on alchemically changing solutes. Our approach, a generalization of the quasichemical organization [1] of the potential distribution theorem [2], rests on appreciating and exploiting the different energies with which a solute interacts with a solvent at different spatial scales. The approach leads to a transparent accounting of the hydration thermodynamics and is readily applicable to systems modeled by ab initio potentials or molecularly complex solutes such as proteins. In this communication we present our results for liquid water modeled by both empirical and ab initio potentials. Studies on a protein modeled by empirical potentials and of aqueous ions modeled by ab initio potentials will be presented in subsequent articles.
To appreciate the need for alchemical approaches, consider the formal relation between µ ex x and solute-solvent interactions [1,2]: is the potential energy of the N + 1 particle system comprising the N solvent molecules and the one solute molecule. U N is the potential energy of the solvent and U x is the potential energy of the solute. P x (ε) is the density distribution of ε in a system in which the solute and the solvent are thermally coupled, the averaging under these conditions is denoted by . . . . P x (ε) is the density distribution when the solute and the solvent are thermally uncoupled, with . . . 0 denoting averaging in this instance.
For condensed systems the high-ε tail of P x (ε) or the low-ε tail of P (0) x (ε) are rarely well sampled and thus using either of Eqs. 1 usually fails. It is for this reason that µ ex x is often calculated by accumulating the work done in slowly transforming a solute from a noninteracting solute to a fully interacting solute. However, such alchemical transformations pose conceptual challenges in the context of ab initio simulations; an intermediate solute with fractional charge (and uncertain spin state) can prove troublesome within quantum chemical calculations. For example, using ab initio simulations, an alchemical approach was used to calculate the hydration free energy of some cations, but the same approach could not be used for a chloride anion [3]. Thus the broader applicability of alchemical approaches within ab initio simulations remains unknown, and in particular, we are unaware of a similar study for the hydration free energy of a water molecule.
In contrast to alchemical transformations, our strategy is to regularize the P x (ε) distribution by imposing a constraint [4]. Here the constraint is an external field φ(r) centered on the particle such that in the presence of this field the solute-solvent interactions are better behaved. Using the well-known rule of averages (for example, see [1]), for any potential u and the mechanical variable F , where . . . u denotes averaging when the solvent evolves in the presence of the additional potential u, we can show that The quantity, ln e −βφ , is the negative of the work done to apply φ in the solute-solvent system; − ln e −βφ 0 is the work done to apply φ in the neat solvent system; and − ln e −βε φ is the interaction free energy of the uncoupled solute with the solvent in the presence of φ. Notice that whereas − ln e −βε 0 can prove challenging to estimate because of close solute-solvent contacts, − ln e −βε φ is expected to be more amenable to a direct estimation. Indeed, if φ is chosen such that the regularized binding energy distribution P The first two terms on the right-hand side of Eq. 2 can be obtained using thermodynamic integration, but the solute-solvent Hamiltonian is unchanged, an obvious advantage in ab initio simulations. For example, if φ(ξ) varies between zero to its final value as the parameter ξ is varied from zero to λ, then [5] − ln e −βφ where the ensemble averaging is in the presence of the field at the current value of ξ. Eq. 2 is a generalization of the quasichemical approach [1] for any external field φ. In the original quasichemical development [1], solvent molecules were strictly excluded from a sphere of radius λ around the solute. In the present notation, this amounts to applying the field where r is the distance between the solvent and the solute. x 0 = e −βφ h is the probability of observing zero solvent molecules in a coordination sphere of radius λ around the solute, p 0 = e −βφ h 0 is the probability of observing an empty cavity of radius λ in neat water, and βµ ex outer = − ln e −βε φ h is the contribution to the free energy due to the interaction of the solute with the solvent outside the coordination sphere. For solutes that interact strongly with water or for solutes with complicated molecular shape, obtaining x 0 and p 0 can prove difficult, especially for a large cavity radius [6]. The flexibility to choose any field and the thermodynamic integration method (Eq. 4) both alleviate this difficulty within the context of Eq. 2. In analogy to the original quasichemical notation, since the fields used here (Eqs. 7) model soft-cavities, we rewrite Eq. 2 as where the solute (x) is a distinguished water (w) molecule.
In this study we tested two fields where a and b are positive constants and r < λ; φ(r; λ) = 0 for r ≥ λ. Before presenting the simulation details, we note that a variant of the quasichemical approach with a soft-cutoff has been presented recently [7]. That approach uses a probabilistic model to partition solvent between the inner shell (r < λ) and the bulk and is thus rather different from the approach here.
We next consider the simulation implementation of Eq. 2.
Classical simulations: Classical simulations were performed with the NAMD code [8]. Using the Tclinterface in NAMD, the force ∂φ/∂λ is applied to the solvent molecules within λ and the ensemble average (integrand in Eq. 4) estimated. From the average force-vs-λ data, the free energies are constructed.
We simulate a cubic box of 512 SPC/E [9] water molecules at a temperature of 298 K using a Langevin thermostat and a pressure of 1 bar using a Langevin barostat [10]. The decay constant for the thermostat was 1 ps −1 . The barostat piston period was 200 fs and the decay time was 100 fs. The SHAKE [11] algorithm was used to constrain the geometry of each water molecule. The solute water molecule was fixed at the center of the simulation cell.
We sample λ between 0 and 5.0Å. First the λ = 2.5Å state was equilibrated for 2.4 ns and then all other λ values were generated by successively changing λ by ±0.1Å. At each λ, we performed simulations for a total of 600 ps and used the last 500 ps for analysis.
For calculating µ ex outer,s , we extend the trajectory for states with λ between 4.5Å and 5.0Å by an additional 600 ps, saving configurations every 0.5 ps. For the last 1100 frames, we inserted a test water molecule in the center of the cavity and calculated its binding energy with the medium. Five randomly chosen orientations of the test water were used per frame. A direct numerical estimate of e −βε φ and modeling the binding energy as a Gaussian gave nearly the same values for µ ex outer,s . Ab initio simulations: We simulated water using the BLYP-D electron density functional and the cp2k code [12]. The parameters for the electronic structure calculation are as in earlier studies [13,14]. In contrast to the specification of the external force ∂φ/∂λ in NAMD, the external potential φ is provided as an input to the cp2k code. Since the code calculates the force by differentiating an interpolation function of the potential, we use a non-zero b parameter in φ ramp (Eq. 7a) to ensure forces are well-behaved at the cavity boundary.
We simulate the liquid at a density of 0.997 g/cm 3 (number density of 33.33 nm −3 ) and a temperature of 350 K. The system contains 64 water molecules. To sample the NVT ensemble, we use the hybrid Monte Carlo method [13]. The initial configuration is taken from a previous MD simulation with the same functional [14]. During the first 500 sweeps, the time step for integrating the equations of motion was adjusted to provide an acceptance ratio of 70%. Then the time step was fixed and the system equilibrated for another 500 sweeps. (One sweep of the HMC comprises 50 molecular dynamics time steps.) For estimating ln x s , a chain of HMC simulation starting at λ = 2.5Å and going up to 3.75Å in increments of 0.25Å was performed. During the construction of the chain, at each λ we performed 300 sweeps. Subsequently, for calculating averages, simulations at each λ were extended for another 1200 sweeps. For estimating ln p s , a similar strategy is used but with λ = 1.75Å to λ = 3.75 A. For both ln x s and ln p s , from the total of 1500 sweeps per λ, we use the last 1000 for analysis.
To compute µ ex outer,s , we chose the last 1000 configurations. We estimate the binding energy distribution using the procedure documented earlier [14], except that we perform 16 test particle insertions per configuration. µ ex outer,s was estimated using the ensemble average e −βε φ and by the Gaussian model. We next present the results and their discussion. Classical Simulations: Figure 1 collects the results for the SPC/E water simulation for different choices of the external field.
As expected, the net chemical potential is nearly independent of λ or the field, as it obviously must be.
From the rule of averages, we can show that where p 0 (r h ) is the probability to observe a hard-sphere of radius r h , f (r h ) is the fraction of configurations where a hard-sphere of radius r h is found in a simulation with an external field φ(r; λ) and . . . |r ≥ r h φ denotes ensemble averaging over such configurations. The location of water molecules is defined by r (Eq. 7) and λ specifies the range up to which the field is active. (Here λ ≥ r h .) Figure 2 shows the results of such a calculation, with r h chosen such that f (r h ) is around 0.25 for a given λ. As the figure shows, using either field (Eqs. 7) we predict the same value of p 0 (r h ) within statistical uncertainties of about 0.2 kcal/mol. Open Symbols: hard-cavity from φ lj (Eq. 7b). Parameters for the φ's are noted in Fig. 1. The inset shows the deviation of the calculated −kBT ln p0 from the revised scaled particle theory [15] prediction (solid black line, main figure).
is negative) relative to the revised scaled particle theory TABLE I. The different contributions to the excess chemical potential of water obtained from the ab initio HMC simulations. µ ex outer,s is calculated as an ensemble average; the result from a Gaussian fit (Fig. 3) to the binding energy data is given in parenthesis. Eq. 7a with b = 0.001Å and a = 10 kcal/mol is used to regularize the interactions. The first three rows are using Eq. 7a and the next three rows are with Eq. 7b. Energies are in kcal/mol. Uncertainties are at 1σ level. (SPT) [15] predictions. This small difference is likely due to a lower bulk surface tension predicted by the SPC/E model [16], whereas the SPT predictions are based on the experimental bulk surface tension. Indeed, using the surface tension of SPC/E, we are able to reparametrize SPT to reproduce the p 0 (r h ) obtained in this study. Ab initio Simulations: Table I collects the estimated µ ex w of water. Using φ ramp (Eq. 7a), the average of µ ex w over the λ values considered here is about −6.9 ± 0.4 kcal/mol). Earlier, using a hard-sphere conditioning and also an order of magnitude more data -110k energy values [14] versus 16k energy values used herefor calculating µ ex outer,s we found µ ex w ≈ −6.0 kcal/mol for a 64-water molecule system at an average temperature of 362 K [14]. The present estimate appears in fair agreement with the earlier result, especially considering the difference in temperatures and also the amount of data collected.
The average µ ex w using φ LJ (Eq. 7b) is −5.0 ± 0.4. The estimates using φ ramp (Eq. 7a) and φ LJ (Eq. 7b) thus bound the earlier estimate of −6.0 kcal/mol [14], but in contrast to results with the classical potential model (Fig. 1), the variation in the numerical estimate of µ ex w with both λ and the field is high (being about 1-2 kcal/mol). Comparing k B T ln x s and k B T ln p s between φ ramp and φ lj (Table I) suggests that the (large) step-size of 0.25Å for integration underlies the observed variation in µ ex w . Identifying optimal integration strategies, perhaps including non-uniform sampling of λ, is left for future work. Figure 3 compares the binding energy of the test particle with the bulk medium for different choices of the field. The value of µ ex outer,s calculated using a hard-sphere conditioning of radius 3.0Å (see [14]) is similar to that for conditioning with φ ramp (λ = 3.25Å); not surprisingly, the low-ε tail is also similar for φ h (λ = 3.0Å) and φ ramp (λ = 3.3Å).
In summary, we have presented a method of obtaining the hydration thermodynamics of the solute that avoids alchemical transformation of the solute. This method  3. Distribution of the interaction energies P (0) (ε|φ) of the test particle in the presence of different regularizing fields φ. The binding energies are translated vertically for clarity. Triangles (red) and circles (blue): φ h is used; data has been taken from our earlier study [14]. Squares (black): φramp with a = 10 kcal/mol and b = 0.001Å. The solid lines are Gaussian fit to the respective distribution. For the radii range considered here, regularizing with Eq. 7a leads to a less well-characterized low-ε tail relative to Eq. 5, as is expected. allows one to readily calculate the hydration thermodynamics of solutes using ab initio simulations. Within the quasichemical approach, using a hard-sphere cutoff to demarcate local and long-range interactions restricts one to small (typically ≤ 4.0Å) cavity sizes, since these are the cavities that can be observed in simulations with sufficient statistical precision. The present development removes this limitation, and it is now possible to probe cavity sizes that can accommodate small globular proteins (Weber and Asthagiri, in preparation). Thus exploring the hydration thermodynamics of proteins in aqueous media with the conceptual clarity afforded by the quasichemical approach appears to be within reach. Finally, the perspective the regularization approach provides on theories of liquids also remains to be explored.