Accurate schemes for calculation of thermodynamic properties of liquid mixtures from molecular dynamics simulations

We explore different schemes for improved accuracy of entropy calculations in aqueous liquid mixtures from molecular dynamics (MD) simulations. We build upon the two-phase thermodynamic (2PT) model of Lin et al . [J. Chem. Phys. 119 , 11792 (2003)] and explore new ways to obtain the partition between the gas-like and solid-like parts of the density of states, as well as the effect of the chosen ideal “combinatorial” entropy of mixing, both of which have a large impact on the results. We also propose a ﬁrst-order correction to the issue of kinetic energy transfer between degrees of freedom (DoF). This problem arises when the effective temperatures of translational, rotational, and vibrational DoF are not equal, either due to poor equilibration or reduced system size/time sampling, which are typical problems for ab initio MD. The new scheme enables improved convergence of the results with respect to conﬁgurational sampling, by up to one order of magnitude, for short MD runs. To ensure a meaningful assessment, we perform MD simulations of liquid mixtures of water with several other molecules of varyingsizes:methanol,acetonitrile, N , N -dimethylformamide,and n -butanol.Ouranalysisshowsthat resultsinexcellentagreementwithexperimentcanbeobtainedwithlittlecomputationaleffortforsomesystems.However,theabilityofthe2PTmethodtosucceedinthesecalculationsisstronglyinﬂuenced bythechoiceofforceﬁeld,theﬂuidicity(hard-sphere)formalismemployedtoobtainthesolid/gas partition,andtheassumedcombinatorialentropyofmixing.Wetestedtwopopularforceﬁelds,GAFF andOPLSwithSPC/Ewater.Forthemixturesstudied,theGAFFforceﬁeldseemstoperformasa slightlybetter“all-around”forceﬁeldwhencomparedtoOPLS+


I. INTRODUCTION
The two-phase thermodynamic (2PT) model, introduced by Lin et al. in 2003, 1 has sparked interest in recent years due to its ability to obtain converged thermodynamic properties from relatively short molecular dynamics (MD) runs.The model builds upon the density of states formalism developed by Berens et al. 2 The central idea of 2PT is to separate the total number of degrees of freedom N of the system under study into (1 -f )N "solid-like" and f N "gas-like" degrees of freedom, for which thermodynamic properties are calculated separately.This partition relies on the critical parameter f, the fluidicity, which is a measure of how the diffusive properties of the real system compare to those of an ideal hard-sphere gas.The original 2PT formalism was developed for monocomponent fluids, where the effects of mixing of different molecular species do not need to be taken into account.Recent work by Lai et al. 3 and Pascal and Goddard 4 has dealt with the derivation of 2PT expressions for multicomponent systems, where many expressions are modified by including molar fractions in the definitions.Their treatment is based upon the assumption of ideal combinatorial mixing.While this can be a valid assumption for mixtures of fully miscible and similarly a) Electronic mail: mcaroba@gmail.comsized molecules, it may not hold accurate when studying, e.g., thermodynamic properties of large solvated molecules such as typical outer-sphere electrochemically active complexes, liquid mixtures where the difference in size of the various molecules is pronounced, and fully immiscible or partially immiscible liquids.
In this paper we present and extensively assess new ways to estimate fluidicities and different mixing schemes, applied to liquid mixtures where the size of the constituent molecules is allowed to differ considerably.In particular, we study the excess entropy of mixing of methanol/water, acetonitrile/water, N,N-dimethylformamide (DMF)/water, and n-butanol/water, where the ratio of molecular weights varies between ∼1.8:1 and ∼4.1:1 (see Fig. 1).We show that while the choice of method to perform the solid/gas partition has a sizable impact on the results, it is the expression to estimate the combinatorial entropy of mixing that has the largest effect on the calculated entropy values.For instance, estimating the combinatorial entropy of mixing from the molar fractions, as given by the expression for a mixture of ideal gases, works reasonably well for the acetonitrile/water and methanol/water mixtures.However, this approximation breaks down quantitatively for DMF/water (where DMF molecules are considerably larger than water molecules) and even leads to qualitatively incorrect results for n-butanol/water, the latter being a mixture of immiscible liquids.The choice of force field also strongly affects the results.We have tested the general Amber force field (GAFF) 5 and the OPLS force field 6 with the SPC/E water model. 7Overall, GAFF performed better across mixtures while OPLS+SPC/E gave quantitatively very accurate results for methanol/water.In addition, when short MD trajectories or small systems are considered, the temperature fluctuations introduced by the specific thermostat used might lead to transient inhomogeneous distribution of the thermal kinetic energy.Here we show that the leading error can be easily accounted for with a firstorder correction consisting of the scaling of the DoS so that the correct number of degrees of freedom is retrieved.
The improved methodology presented here has been implemented in our freely available DoSPT code (http://dospt.org).

II. FORMALISM FOR ENTROPY CALCULATION
For a detailed description of the 2PT formalism, the reader is referred to the original literature on the density of states (DoS) approach 2 and the 2PT model, 1 as well as follow-up work from the 2PT authors. 3,4,8Here we are mostly concerned with the extension to mixtures, which was previously discussed in Refs. 3 and 4, and the solid/gas partition procedure.
Within the context of the 2PT model, an extensive thermodynamic property of the system (e.g., its entropy), to which we refer generically as Φ, can be expressed as a functional of the DoS S(ν), where the DoS has been decomposed into a "gas-like" DoS S g (ν) and a "solid-like" DoS S s (ν).The W Φ are the weighting functions that allow to compute the contribution of each vibrational mode to Φ, and are given in Refs. 2 and 8.The total DoS, S(ν) = S s (ν) + S g (ν), is calculated as the mass-weighted normalized sum of the vibrational modes of all the degrees of freedom present in the system.If the system is a monoatomic fluid, then the total DoS is where T is the temperature, k B is Boltzmann's constant, m j are the atomic masses, and s k j (ν) is given by the squared modulus of the Fourier transform of the atomic velocities where j runs over all the atoms in the system and k refers to each of the Cartesian components of the atomic velocities.τ is the time lapse over which the Fourier transform is performed.
In practice, when the velocities are obtained from an MD simulation, a finite value of τ needs to be used.The integral of S(ν) equals the total number of degrees of freedom.The gas-like DoS is obtained from hard-sphere (HS) theory as 1 S g (ν) = S(0) where N DoF is the total number of degrees of freedom in the system.S g (ν) integrates to f N DoF , that is, the number of gas-like DoF.The solid-like DoS is simply obtained as the difference between the total DoS and the gas-like DoS, S s (ν) = S(ν) − S g (ν), and therefore integrates to (1 − f )N DoF .The extension of Eqs. ( 2) and (3) to polyatomic molecules has been done by Lin et al. 8 and will not be discussed further in this paper.Here we are more interested in how the partition of S(ν) into its "solid-like" and "gas-like" parts is done (i.e., how f is calculated), and how the pure fluid formalism is extended to mixtures.At this point, we also need to bring into discussion the concept of partial properties, which are given on a "percomponent" basis.Since several critical parameters such as mass, moment of inertia, and symmetry number are only well-defined within an ensemble of molecules when all the molecules are of the same type, we use capital greek scripts (Λ, Γ) to denote different molecule types within a mixture, and w to denote the type of degree of freedom (i.e., translational, rotational or vibrational).For instance, in a water/methanol mixture, S Λ,w (ν) ≡ S wat,trn (ν) would denote the total DoS corresponding only to the translational degrees of freedom of the water molecules within the mixture.

A. Gas/solid partition: Fluidicity
The partition between solid-like and gas-like DoS is done via the "fluidicity" parameter f.The fluidicity of a molecular system within the 2PT formalism is defined as the ratio of the system's diffusivity (as calculated from the MD trajectory) over the diffusivity of an auxiliary ideal hard-sphere fluid 1 where N is the number of particles, V is the system's volume, and σ HS is the effective hard-sphere diameter.Using our notation extended to mixtures, we rewrite Eq. ( 5) as where Γ = 1, . . ., N sg and N sg is the number of components in the system (e.g., N sg = 2 for a binary mixture).V Λ is the partial volume of component Λ.Note that Eq. ( 6) slightly differs from the extension to mixtures in Ref. 4, since we have in principle allowed D Λ,HS 0 to depend on all the particle numbers and we are allowing the hard-sphere diameters to differ between components.The latter consideration is important if one intends to study mixtures where the difference in molecular size of the different components is large.Lai et al. 3 provided a discussion on how to estimate V Λ ; we use an approach based on Voronoi partition 9 that has proven to be fairly robust for our purposes, especially because the partial volume data can be extracted directly from the MD information and therefore reflects the possible effect of explicit molecular interactions on the partial volumes (as opposed to, e.g., using tabulated values based on reference molecular or atomic radii).
The "real" diffusivity of the system [the numerator in Eq. ( 6)] is directly calculated from the MD trajectory information 1 where M Λ is the mass of the molecules of type Λ.In order to estimate the "ideal" diffusivity D Λ,HS 0 for multicomponent gases, a mean free path including all the particle types should be taken into account.The different particle types are the different sets of identical molecules present in the simulation, which are characterized by their mass M Λ , effective HS diameter σ Λ , and particle density N Λ /V .The expression for the mean free path for the molecules belonging to component Λ is 10 The summation is performed over all the components present in the system.Assuming the diffusivity for molecules of type Λ to be proportional to the product of their mean velocity vΛ = 3k B T M Λ and their mean free path, Eq. ( 8), leads to the following expression for the hard-sphere diffusivity of component Λ in the zero-pressure limit: where we have derived the prefactor by requiring that Eq. ( 9) reduce to the monocomponent 2PT expression in the limit of only one component present in the system (the "rigorous" selfdiffusion coefficient given by McQuarrie 11 ).The extra term containing the multicomponent correction, Ω Λ , is given by A complication arises due to the fact that Ω Λ depends on the hard-sphere diameters, which cannot be obtained a priori for multicomponent systems.For monocomponent systems, Lin et al. 1 proposed expressions that relate fluidicity to hard-sphere diameter and allow to conveniently compute the fluidicity directly from known properties of the system.When more than one component is present, it is not possible to write down these equations anymore because the compressibility factor depends on more than one hard-sphere diameter, which are not known a priori.Therefore, the hard-sphere diameters must be either obtained by numerical optimization within a multicomponent formalism (the approach that we follow) or the components must be handled separately within the monocomponent formalism (e.g., the approach followed by Lai et al. 3 ).In the Appendix we propose a new self-consistent approach, based on a penalty function, to simultaneously optimize the hardsphere diameters of all the components.Additionally, we also adapt the Stokes-Einstein model of rotational diffusion to be used within the 2PT formalism.In the Appendix we compare these different ways to obtain translational and rotational fluidicities with the original 2PT implementation.In Sec.IV, we will show how the chosen fluidicity formalism has a sizable impact on the calculated entropy values.

B. Entropy of mixing
The 2PT method can accurately describe the changes in entropy arising from molecular interactions: changes in vibrational and diffusive modes can be accounted for by means of the solid/gas partition of the DoS.However, when dealing with mixtures, the increase in entropy due to the additional volume to which each of the components has access also needs to be taken into account.Lai et al. proposed to use the ideal entropy of mixing, 3 that is, the entropy of the mixture is given by Equation (11) assumes that the components are fully miscible, there is no clustering in the mixture and the molecular volumes of both components are the same.Basically, Eq. ( 11) assumes that the two substances mix like an ideal gas.We will show in Sec.IV that this assumption seems to work remarkably well for fully miscible small-sized molecules, such as our methanol/water and acetonitrile/water mixtures, but breaks down for DMF/water and butanol/water.While DMF and water are in principle fully miscible, the disparity in molecular size and the planarity of DMF might lead to some geometrical limitations as to how nearby molecules can be stacked together in the mixture.Butanol/water on the other hand are immiscible, and we chose this system as an extreme example of the breakdown of Eq. (11).Eq. ( 11) is an ad hoc correction to the entropy which requires combinatorial mixing in the mixture to occur similarly as it does in ideal gases.It would be more desirable to incorporate the combinatorial mixing contribution into the partition function and derive the corrections in a more rigorous manner.This has been done by Lazaridis and Paulaitis, 12 who argue that combinatorial mixing is accounted for, to a first approximation, by the partial molar volumes.That is, instead of Eq. ( 11) one should use the following expression: A more in-depth discussion of this issue can be found from Lazaridis and Paulaitis, 12 who argue that Eq. ( 12) provides the correct "one-particle" terms.Further corrections involve incorporating radial distribution functions into the computations.The performance of Eqs. ( 11) and ( 12) was already tested by Meroni et al. 13 They pointed out that Eq. ( 12) leads to results in worse agreement with experiment than Eq. ( 11).We will show in Sec.IV that this tends to also be the case for our calculations.However, as we will see in Sec.IV, the agreement is very much case specific and strongly influenced by the force field used.One needs to take into consideration that the total entropy of mixing arises from two separate contributions.On the one hand, there are the specific molecular interactions that change the vibrational and diffusive modes.On the other hand, there are the combinatorial effects, which are a measure of the extra volume available to each of the components upon mixing and how the molecules spatially rearrange in the new environment.
To compute the latter, we are currently relying on Eqs. ( 11) and (12), which are only simple approximations.In practice, these two effects will either compete or cooperate.The combinatorial term always increases the entropy, except for immiscible substances where the components have no access to "extra space" when put in contact (where it is zero).The interaction term can affect the entropy by either lowering or increasing it.For a specific mixture, good agreement with experiment could be due to either a good description of both terms or to an error cancellation.Finally, note that Eq. ( 12) always leads to a larger calculated entropy than Eq. ( 11), except for when all the partial molar volumes are equal, in which case they yield the same result.
All in all, a more sophisticated approach based on properties directly accessible from the MD trajectories is required to accurately account for non-ideal mixing effects that cannot be traced back to changes in vibrational or diffusive modal changes.Such an approach would ideally be based for instance on radial distribution function analysis, or another type of postprocessing of MD trajectory information, that has access to the deviation of molecule distribution from random.Trying to integrate such a general scheme within the 2PT model is beyond the scope of the present manuscript, but will be the topic of future work.

III. MOLECULAR DYNAMICS SIMULATIONS
All the MD simulations were performed with the Gromacs suite. 14,15We have tested two popular force fields: OPLS 6 and GAFF. 5 The OPLS force field was used in conjunction with the SPC/E rigid water model. 7This choice was motivated by the good results obtained by Pascal and Goddard 4 for methanol/water, and also to allow direct comparison of our implementation of 2PT with theirs.][18] All the simulations were performed, for each mixture and force field, in three steps as follows.(i) To determine the evolution of density with composition, random boxes containing a total of 2400 atoms were generated, after which NPT simulations were run at 1 atm using a stochastic velocity-rescaling thermostat 19 and a box-rescaling barostat (Berendsen) 20 with time constants of 0.1 ps and 1 ps, respectively.To realize the different molar fractions of the mixtures, the molecules in the pure water system (800 water molecules) were replaced by methanol, acetonitrile, DMF and butanol molecules in 1:2, 1:2, 1:4, and 1:5 ratios, respectively (i.e., one DMF molecule replaced 4 water molecules).This approach allowed to keep the overall box size approximately constant.A total of 9 different configurations for the pure liquids and 3 configurations for every other composition were generated.Each NPT simulation was run for a total of 500 ps and the average density over the last 250 ps was chosen as the equilibrium density of that particular configuration.After that, the equilibrium densities were averaged among configurations to obtain the final equilibrium density for each composition.(ii) Once the densities had been determined, the same number of independent configurations as before (9 for the pure liquids and 3 otherwise) was generated within boxes of fixed size, corresponding to the densities estimated in the previous step.Additional NVT dynamics were simulated for 500 ps with the Nosé-Hoover thermostat 21,22 with a time constant of 0.1 ps.(iii) Taking each of the final snapshots of the NVT equilibrations as starting points, additional 100 ps of NVT dynamics were run and subsequently sliced into five 20 ps trajectories for analysis with our 2PT implementation, DoSPT.The atomic positions and velocities were saved every 2 fs.To ensure appropriate resolution of H vibrations, the integration step for all of the simulations was chosen as 0.5 fs.
To test the effect of high-frequency vibrations on the results, we ran calculations both constraining all the Hcontaining bonds and also allowing them to vibrate.The results indicate sizable changes on the predicted equilibrium densities (especially for water) but no significant changes on the calculated entropies of mixing.A possible explanation for this is that hydrogen vibrations are dominated by intra molecular forces which do not change upon mixing.In addition, because hydrogen-containing bonds vibrate several times during the time that it takes a nearby molecule to rearrange according to the felt electrostatic or dispersive interaction, that molecule would only feel an effective interaction which is already well captured with constrained bonds.Therefore, since no remarkable features were observed other than the aforementioned effect on density, the results shown in Sec.IV are mostly for the H-bond constrained simulations.

IV. RESULTS AND DISCUSSION: EXCESS ENTROPY OF MIXING OF BINARY MIXTURES
In this section we compare the results of our simulations to experimental results available in the literature of excess entropy of mixing for the chosen binary mixtures.The excess entropy of mixing S E mix is defined as the difference between the real entropy of the mixture and the entropy calculated assuming ideal mixing.Defining molar entropies of the pure liquids with a bar, SΛ , then S E mix is given by Computationally, S mixture is given by either Eq. ( 11) or (12), depending on the approximation used for the combinatorial entropy of mixing.
In order to interpret the results of our simulations, one needs to keep always in mind the five main sources of error: 1.The choice of force field.Each force field will determine which "version of reality" we are analyzing, and will impose the absolute limit to how well experimental observations can be reproduced.2. The intrinsic ability of the 2PT approximation to correctly compute entropies.3. The particular way to compute fluidicities, i.e., the solid/gas partition, that will serve as input to the 2PT machinery.4. The choice of "ideal" or "combinatorial" entropy of mixing, which acts as a correction to the 2PT values.5. Statistical error arising from the MD sampling.
J. Chem.Phys.145, 244504 (2016) TABLE I. Calculated data for pure liquids and experimental references.In the cases where experimental values of molar entropies were lacking at the desired temperature, they had to be extrapolated from existing data (see table footnotes).For the calculated molar entropies S 0 , "Std" indicates values obtained using the standard 2PT approach for rotational fluidicity 8 whereas "S-E" indicates the Stokes-Einstein-based treatment for rotational fluidicity developed in the Appendix of this manuscript; in both cases translational fluidicity is the same.The mean absolute error (MAE) has been calculated from the shown values, discarding the 313 K and 323 K values for water to avoid biasing it by introducing a particular liquid more than once.Error estimates for the entropies are calculated as the standard deviation of all the 20 ps MD trajectory slices (5 slices per each of the 9 independent trajectories, totaling 45 data points per liquid).In this paper we are assessing the sources of error 1, 3, 4, and 5.For the time being, we disregard source 2 and assume that with optimal partitioning (point 3) and mixing (point 4) schemes, 2PT can yield very accurate results.Statistical error (point 5) is linked to the number of sampled trajectories, their duration, and whether they sample a representative portion of configuration space; it will be discussed further at the end of this section in the context of DoS renormalization.Finally, one could argue that it is not possible to decouple the intrinsic ability of 2PT to yield accurate thermodynamics from the ability to calculate the fluidicity itself (points 2 and 3).To do so, one would need to establish a measure of fluidicity and its connection to the solid/gas partition of the degrees of freedom that makes sense also outside of the 2PT framework.This is not a trivial task and we will not deal with it in the present manuscript.

GAFF
We simulated mixtures of water with other three fully miscible compounds (methanol, acetonitrile, and DMF) and one immiscible compound (butanol), in an attempt to test a wide range of molecular size mismatches, as shown in Fig. 1.The results for standard molar entropies S 0 and densities of the pure liquids are given in Table I.The temperatures for each compound were chosen to match the temperature at which available experimental measurements for the binary mixtures were conducted.The values for the pure liquids serve as the initial benchmark of each force field because the error emanating from the mixing scheme is not present.Overall, GAFF seems to yield entropies in better agreement with experiment for water and acetonitrile, whereas OPLS+SPC/E performs better for DMF and butanol (although the description of butanol is poor for both force fields).Both force fields yield similar entropies for methanol.It is worth noting that the predicted density of DMF is quite off with OPLS, which might help explain that better results can be obtained with GAFF for DMF/water mixtures.The effect of constraining the H-containing bonds is small for the larger molecules, but quite noticeable in the case of water, especially for the OPLS+SPC/E force field (it should be mentioned that the SPC/E model of water is optimized for a rigid description).As previously mentioned, imposing or lifting these constraints does not have a sizable impact on the excess entropies of the different mixtures.In contrast to these results, switching the description of rotational fluidicity between the standard 2PT implementation and the Stokes-Einstein-based treatment does have a slightly larger impact on mixtures than on pure liquids (the comparison for mixtures is shown in Table II).Finally, note from the error estimates given in the table that typical uncertainties are very small, especially for water.One can expect a single 20 ps simulation of liquid water analyzed with the 2PT formalism to yield molar entropy values with a typical error of about 0.4% for flexible water and 0.15% for rigid water models.This is a valuable feature in the context of simulations where sampling becomes computationally expensive (e.g., in ab initio MD simulations).
The results for the different binary mixtures are given in Fig. 2.An error estimate for all the mixtures except for butanol/water is given in Table II.Good qualitative description of methanol/water, acetonitrile/water, and DMF/water is achieved with both force fields tested, but the quantitative agreement depends strongly on the particular mixture, force  11)] tends to yield results in better agreement with experiment than volume-based mixing [Eq.( 12)], a result that agrees with what has been previously reported in the literature for hard sphere mixtures. 13Our self-consistent HS diameter optimization with Stokes-Einstein-based rotational treatment seems to outperform the ∆ (original) treatment only for GAFF methanol/water.For other mixtures it is either equivalent, e.g., methanol/water with OPLS and acetonitrile/water, or inferior to the standard "∆ approach" (based on the original formulation with "normalized diffusivity" ∆ 1 ).Removing the S-E treatment leads to less differing results between the two methods (not shown in Fig. 2, but error estimates available in Table II).This supports the idea, previously discussed in the literature, that HS mixtures can indeed be well approximated as an ensemble of non-interacting HS systems. 4,34It is not surprising that the largest errors appear for the mixtures with the biggest dissimilarity in molecular sizes, and can probably be traced back to the need for a more specific mixing scheme than the two simple approaches explored here.Overall, our results point towards the conclusion that with an accurate force field and the correct mixing scheme (which would be more sophisticated than the simple molar-based and volume-based schemes explored here) the 2PT method is able to produce very accurate entropies not only for pure liquids but also for liquid mixtures.Detailed entropy and fluidicity data, computed for all the combinations of force field, mixing scheme, and hard-sphere formalism explored here, can be obtained from the supplementary material.
To get an idea of the effect of the used approximated combinatorial mixing entropy on the results, we have tabulated the mole-based and volume-based combinatorial entropy contributions to binary mixtures for different molar fractions and representative relative molecular sizes.These values are given in Table III, where for a binary mixture, the combinatorial FIG.Excess entropy of mixing of different liquid mixtures simulated with either GAFF or OPLS+SPC/E force fields.We show the effect of different ways to calculate fluidicities: "∆" for the original 2PT expressions, "SC-σ" for the self-consistent procedure for translational fluidicity explained in the Appendix, and "SE" for the Stokes-Einstein-based estimation of rotational fluidicity, also explained in the Appendix.We also show the effect of using molar-based entropy mixing ("mol") versus partial volumebased entropy mixing ("vol").The experimental results are taken from Ref. 30  molar entropy correction ∆ S to the total entropy of the mixture is where N A is the Avogadro constant.The combinatorial corrections are large and can be in the order of ∼5% or more of the total entropy of the mixture (depending on molecular size and number of degrees of freedom per molecule).For example, for methanol/water at x = 0.5, mole-based and volume-based combinatorial mixing amount to approximately +5.8 J/mol K TABLE III.Amount contributed to the total entropy of binary mixtures by the two mixing schemes explored in this paper [second terms in Eqs.(11)  and ( 12)].Both mole-based and volume-based schemes contribute the same entropy when the partial molar volumes are the same, V1 = V2 .x 1 is the mole fraction of component 1, therefore the molar fraction of component 2 is As a guide for the mixtures explored in this paper, the V1 / V2 ratios are approximately 2.2, 3.2, 4.1, and 5.1 for methanol/water, acetonitrile/water, DMF/water, and butanol/water, respectively.

Mole-based mixing entropy
Volume-based mixing entropy ∆ Smol (J/mol K) ∆ Svol (J/mol K) and +6.4 J/mol K, respectively.These quantities are larger (in modulus) than the corresponding experimental excess entropy of mixing, circa -3.7 J/mol K (Fig. 2).For comparison, the total entropy of the mixture at x = 0.5 is ∼90.8J/mol K (these data for all the mixtures can be retrieved from the supplementary material).Therefore, a large error computing the combinatorial contribution will lead to a large error in the estimated excess entropies.This highlights again the need to develop an accurate approach to incorporate combinatorial mixing entropies into the 2PT formalism.Finally, we want to make an important remark on the convergence of the calculations.Our results for methanol/water for OPLS+SPC/E are in excellent agreement with the corresponding results presented by Pascal and Goddard. 4 However, our error bars (given by the standard deviation of our results) are significantly smaller, even though we use very similar sampling schemes (15 snapshots per composition).The reduced spread of our values comes as the result of a correction (renormalization) that we apply to the DoS, and is justified as follows.As mentioned in the Introduction, for a finite-size system, temperature fluctuations lead to transient inhomogeneous distribution of the thermal kinetic energy among degrees of freedom of different types: this kinetic energy might be distributed in different proportions between the translational, rotational, and vibrational degrees of freedom.In other words, for any given finite time interval, the translational, rotational, and vibrational temperatures might not be equal.This issue may become exacerbated if the system has not been properly equilibrated or the sampled trajectory is short, e.g., because of limited available computational power and/or memory.From the point of view of the calculated density of states, this kinetic energy transfer leads to an effective number of degrees of freedom that does not match the real number of degrees of freedom.For instance, in a system made up of M molecules with N atoms each, the total real number of degrees of freedom amounts to 3MN, of which 3M are translational, 3M are rotational, and the remainder 3M(N -2) (assuming molecules with at least 3 atoms each) are vibrational.For reasonably equilibrated systems where the instantaneous temperature oscillates around the target thermodynamic temperature, the total DoS integrates to values very close to 3MN (especially for constant-energy simulations).However, the decomposed translational, rotational, and vibrational DoS each integrate to values that might deviate considerably from 3M, 3M, and 3M(N -2), respectively.Since, within the 2PT formalism (and also in general), translational, rotational, and vibrational DoF do not contribute equally to the entropy (their weighting functions are different), this means that the convergence of thermodynamic properties calculated from the DoS will suffer accordingly.The following first-order correction, or DoS renormalization to the correct number of DoF, considerably ameliorates the issue and improves the convergence of the 2PT method: FIG. 3. (Top panel) Excess entropy results for methanol/water mixtures.The "renormalized DoS" result with "constrained H-bonds" is the exact same as the corresponding OPLS+SPC/E graph from Fig. 2 with ∆ approximation for the fluidicity, and molar-based mixing.Here we show the effect on statistical variance (standard deviation of results at a given mole fraction, given by error bars) of the DoS renormalization procedure discussed in the text.(Bottom panel) Temperatures for "fully flexible" pure methanol, for a typical 20 ps trajectory, calculated for each type of degree of freedom based on the velocity decomposition. 8The thick green lines give a smoothed-out representation of the data points.
where w ≡ (trn, rot, vib).A comparison between the results obtained using the unnormalized "conventional" DoS S w (ν) and those obtained using the renormalized DoS Sw (ν) are shown in Fig. 3 (top panel) for methanol/water with the OPLS+SPC/E force field.The bottom panel of the figure shows the instantaneous temperatures for pure methanol as they evolve during a typical 20 ps dynamics, calculated from the velocity decomposition. 8Translational and rotational temperatures oscillate with a longer period and larger amplitude than the total temperature.It can be observed how the renormalization approach allows much improved convergence of the entropy values, especially for high methanol mole fractions and in the case of a "fully flexible" simulation, where high frequency H-bond vibrations affect the thermalization of the system.This correction is extremely attractive since it considerably improves the statistical precision of the results at virtually no added computational cost.

V. CONCLUSIONS
In this manuscript we have carried out a detailed and careful assessment of the strengths and limitations of the 2PT method 1 to correctly estimate the entropy of binary liquid mixtures throughout a wide range of dissimilarly sized molecules.
The largest sources of error turned out to be the choice of force field, the assumed "ideal" or "combinatorial" entropy of mixing, and the fluidicity scheme employed to realize the solid/gas partition (not necessarily in that order).While the quality of the force field employed affects any free energy estimation method, the two latter factors affect the 2PT method quite specifically.We have shown that ideal mixing based on molar fractions tends to yield better agreement with experiment than partial volume-based ideal mixing.This possibly occurs because of a systematic overestimation of the entropy of mixing by the 2PT method; volume-based mixing entropy is always higher than molar-based mixing entropy, therefore the molar-based scheme would always be better at correcting a systematic overestimation.The issue with combinatorial entropy of mixing also appears in spite of volume-based mixing constituting the "one-particle" term in the calculation of the total entropy of mixing. 12We hypothesize that inclusion of higher-order interaction terms contained in the radial distribution functions (RDFs), or rather the change in the RDFs moving from the pure liquids to the mixture, could be a successful route in ameliorating this problem.This will be the subject of future work on our part.
The main strengths of 2PT are being able to provide free energies directly from unaltered molecular dynamics (e.g., it does not require a modified Hamiltonian to couple initial and final states) and, especially, the ability to yield converged results at very low computational cost.We have presented an improved convergence scheme, based on the renormalization of the density of states, which allows faster convergence with respect to the number of sampled configurations, thus making 2PT even more attractive than before.Our new methodologies presented here have been implemented, together with the original 2PT approach previously discussed in the literature, into our DoSPT code, available online at http://dospt.org.
All in all, and taking the issues and benefits explained above into consideration, the 2PT method is emerging as a very promising molecular dynamics free energy method.However, at the moment accuracy seems to be highly case specific.In order to become highly flexible, and thus more appealing to the computational chemistry community as a standard method, we need to make further progress on the issues regarding solid/gas partitioning and, especially, combinatorial entropy of mixing.

SUPPLEMENTARY MATERIAL
See supplementary material for this paper contains detailed partial entropy and fluidicity data for all the simulations.

ACKNOWLEDGMENTS
The authors would like to acknowledge the computational resources provided for this project by CSC-IT

APPENDIX: SELF-CONSISTENT HARD-SPHERE MODEL FOR MIXTURES AND ROTATIONAL DIFFUSIVITY
In order to simultaneously optimize the hard-sphere diameters of all the components we can minimize the following penalty function with σ Λ as the variational parameters: where z is the compressibility factor of the hard-sphere mixture, calculated according to the Mansoori-Carnahan-Starling-Leland equation of state for hard-sphere mixtures 34 using the total volume and mole fractions (multiplied by their respective translational fluidicities f Γ trn ) to calculate the individual packing fractions, see Eq. ( 7) of Ref. 34; V Γ is the partial volume of component Γ (that can be accurately calculated using, e.g., Voronoi partitioning) and V is the total volume; ξ Γ trn is the partial packing fraction of component Γ, that is, its packing fraction calculated within its own partial volume, and z Γ trn is the partial compressibility factor calculated according to the Carnahan-Starling equation of state using partial packing fractions, Minimizing the second term in Eq. (A1) ensures that the deviation of real diffusivity from zero-pressure diffusivity for each individual component in the interacting mixture is well described within its own partial volume by the monocomponent HS formalism. 1Minimizing the first term in Eq. (A1) ensures that the weighted sum of non-interacting HS compressibilities z Γ trn is as close as possible to the real compressibility z of the interacting HS mixture.Therefore, by self-consistently solving Eq. (A1), we are obtaining a set of effective HS diameters that optimize the description of the multicomponent system as an ensemble of non-interacting monocomponent systems which resembles the multicomponent system as closely as possible.Within this approach, the "normalized diffusivity" parameter ∆ from the original 2PT model is not required, and the fluidicity is calculated directly from the definition, Eq. ( 6) with w ≡ trn, once the HS diameters have been determined.As a final note, for monocomponent systems the approach outlined above reduces to the original 2PT formalism. 1n Fig. 4 we compare our self-consistent HS diameter approach (SC-σ) with the approach based on the normalized diffusivity ∆. 3 We show results for a water/methanol mixture modeled with the OPLS force field and SPC/E water and a water/DMF mixture modeled with GAFF.As expected, both formalisms predict the same HS diameters for the pure liquids.Our SC-σ approach systematically predicts larger effective HS diameters for water and smaller effective HS diameters for the other molecules, when compared to the approach employed by Lai et al. 3 However, the truly important quantities to perform the solid/gas partition are the translational and rotational fluidicities (the vibrational DoS is purely solid-like).Lin et al. 8 proposed that rotational fluidicity be estimated on the same FIG.4. Hard-sphere diameters for each component of a methanol/water mixture (top) and DMF/water mixture (bottom), calculated using the normalized fluidicity approach (∆) and the simultaneous self-consistent approach (SC-σ).
Error bars indicate statistical variation in the data (standard deviation), which was obtained by averaging several configurations for each composition, as explained in Sec.III.
footing as the translational one.This is achieved by calculating an effective rotational HS diameter which is independent of the translational one.However, the theory of HS diffusion allows to compute rotational diffusion directly taking the translational HS diameter as input.The Stokes-Einstein relation for rotational diffusion of a spherical particle is where η Λ 0 is the "shear viscosity" coefficient (or simply, "viscosity" coefficient).Here we use a tilde to denote the usual rotational diffusivity constant, with units of 1/time , rather than the value calculated with Eq. ( 7) from the rotational DoS as defined by Lin et al., 8 which has units of length 2 /time and is an "effective" rotational diffusivity constant.For a hard-sphere monocomponent fluid, viscosity and translational diffusion are related as follows: 11 where the HS diameter σ Λ is calculated from the translational properties.To correct for non-sphericity of elongated molecules, Eq. (A6) can be rewritten taking into account the principal moments of inertia I i as DΛ,eff 0,rot = 1 3 DΛ 0,rot where the isotropic (symmetrized) moment of inertia I iso is simply Note that in the limiting case of a perfectly spherical molecule (I 1 = I 2 = I 3 ) Eq. (A7) reduces to the correct result, that is, DΛ,eff 0,rot = DΛ 0,rot .Finally, the rotational fluidicity parameter can be calculated from the definition: In Fig. 5 we compare the fluidicities computed for our methanol/water mixture using the different approaches previously outlined.

3 (
FIG.5.Translational and rotational fluidicities obtained for each component of a methanol/water mixture, calculated using the normalized fluidicity approach (∆) and the simultaneous self-consistent approach (SC-σ).The rotational fluidicity obtained with the Stokes-Einstein-based approach from the translational SC-σ hard-sphere diameters is also shown.Error bars are explained in the caption of Fig.4.

TABLE II .
Error estimate for different fluidicity and force field approximations for the aqueous mixtures studied in this work.n-butanol has been omitted because of the lack of experimental data in the full molar range.The error is calculated from the integral of the distance between simulated and experimental excess entropy curves: error = ∫ field, and mixing scheme, as can be seen from the calculated errors in TableII.The case of butanol/water is special because of the miscibility gap and lack of experimental values throughout the full molar range.Overall, molar-based mixing [Eq.( Center for Science.O.L.A. and M.A.C. would like to acknowledge funding from Academy of Finland through the Centres of Excellence program (Grant No. 284621).T.L. and M.A.C. also acknowledge funding from Academy of Finland (Grant Nos.285015 and 285526).M.A.C. would like to thank Chris Rycroft for help linking the Voro++ library to the DoSPT code.