Computational methodology for solubility prediction : Application to the sparingly soluble solutes

The solubility of a crystalline substance in the solution can be estimated from its absolute solid free energy and excess solvation free energy. Here, we present a numerical method, which enables convenient solubility estimation of general molecular crystals at arbitrary thermodynamic conditions where solid and solution can coexist. The methodology is based on standard alchemical free energy methods, such as thermodynamic integration and free energy perturbation, and consists of two parts: (1) systematic extension of the Einstein crystal method to calculate the absolute solid free energies of molecular crystals at arbitrary temperatures and pressures and (2) a flexible cavity method that can yield accurate estimates of the excess solvation free energies. As an illustration, via classical Molecular Dynamic simulations, we show that our approach can predict the solubility of OPLS-AA-based (Optimized Potentials for Liquid Simulations All Atomic) naphthalene in SPC (Simple Point Charge) water in good agreement with experimental data at various temperatures and pressures. Because the procedure is simple and general and only makes use of readily available open-source software, the methodology should provide a powerful tool for universal solubility prediction.


I. INTRODUCTION
Solubility is an important concept in many areas of science.For the development of pharmaceuticals, [1][2][3] it is crucial to know the solubility of drugs, as this influences bioavailability.5][6][7] These are but two examples to illustrate that it is extremely important to understand the precipitation of common materials under normal conditions and predict their solubility under conditions for which experimental information is difficult to obtain.
Numerical simulations provide a powerful tool for solubility prediction.For many users, it would be helpful to have a generic tool to predict solubility, which uses standard, open-source software.There is no lack of solubility calculations in the literature.  Hower, on the whole, simulationbased predictions of solubility are not widely used.We argue that this is due to the fact that generic, open-source tools are lacking.Yet, as we argue below, the prediction of solubility should be a standard test of new force fields.This becomes particularly clear when we consider the cases where existing force-fields have been used to compute solubility: very often, force-fields that seem to perform well for other properties fail the solubility-prediction test.Conceptually, the simplest way to compute solubilities from simulation is to a) This research was performed while Lunna Li was at Department of Chemistry, University of Cambridge, Cambridgeshire CB2 1EW, United Kingdom.][17] This method has the advantage of simplicity, but it generally requires long simulations (in some cases up to microseconds) to achieve solubility equilibrium, even for highly soluble compounds.
5][16][17][18][19][20] The alternative "thermodynamic" approach determines the saturation solute composition by imposing the chemical potential of the solid phase to the solution in a grand canonical style.Such a route has been followed in the osmotic ensemble method/OEMC/OEMD. 10,12,13,21The osmotic ensemble method has typically been applied to relatively small (ionic) solutes and focuses on the simulation of the liquid phase, assuming the chemical potential of the crystal is known from other sources.Also, it exhibits an advantage for multi-electrolyte solutions.However, we find that, in particular for large solutes, there is a need for a robust generic approach that integrates the calculation of the properties of both the crystal and of the solution, and is easily applicable in standard commercially available codes.
In the present work, we describe a generic protocol to predict solubility via computing the absolute chemical potentials of the solid and solution phases in solubility equilibrium.The method can be readily incorporated in standard codes for all-atom molecular dynamics (MD) simulations such as LAMMPS, 30 using classical all-atomic force fields available in the literature.We note that extension to other software packages should be straightforward.9][40][41][42][43][44] The modeling challenge is focused on developing atomistic modeling tools that will make it possible to predict the solubility of simple organic crystals (we use naphthalene as an example) for potential generalization to a broader range of compounds.

A. Theoretical background
As the basic approach to compute solubilities is well known, we only sketch the bare essentials here and refer the reader to Appendices A-C for more details.We start from the standard thermodynamic expressions. 45G, µ, A, and V are the Gibbs free energy, chemical potential, Helmholtz free energy, and volume and number of solute or solvent molecules.p and T are the pressure and temperature.At constant T and p, the condition for equilibrium of a solute between the crystal phase and the solution is the equality of chemical potential in the two phases, µ solute solution (p, T ) = µ solute solid (p, T ). (1) Hence, we need tools to compute the chemical potential in the crystal phase and in solution.
The chemical potential of the solute in the solution is In Eq. ( 2), the only part that requires computation is the last term on the right hand side of the equation (the excess chemical potential ∆µ excess ).ρ solute denotes the number density of the solute, with the common ideal gas reference state ρ 0 solute = 1 (1 molecule in 1 Å 3 = 0.001 nm 3 ).Λ solute is the thermal de Broglie wavelength of the solute, and q solute is the intramolecular (vibrational and electronic) partition function of the solute; R solute is the fixed position of the extra single solute in the solution, and U solute -solution (R solute ) is the interaction energy of the extra solute with the rest of solution.As explained in Appendix A, we can ignore the constant term ln ρ 0 solute Λ 3 solute q −1 solute for both coexisting phases.The chemical potential of the solute in the solid is For the solid phase, we derive the chemical potential from a simulation in which we compute the absolute free energy of the crystal.We compute the free energy of the crystal using an (straightforward) extension of the Einstein crystal method (see Sec. II B).
We describe details of the calculation of the chemical potentials A solid and ∆µ excess in Secs.II B and II C and in Appendices A-C.Equation (1) becomes In the case of a sparingly soluble solute (the example studied in this paper), we may set the second term on the left side to the hydration free energy of a single solute molecule in the solvent, and then solve analytically for ρ solute .In the case of higher solubilities, the hydration free energy becomes a function of composition, and we would need to calculate this dependence and determine the composition that solves the above equation.

B. Extended Einstein crystal method
Figure 1 is a pictorial description of the extended Einstein crystal method to compute the absolute solid free energy A solid of a molecular crystal.][42][43][44] In the present work, it has been adapted to be used in MD simulations in LAMMPS without extra codes.
As the extended Einstein crystal method is similar in spirit to the original Einstein method, we describe the practical details of the approach including the various steps and contributions as illustrated in Fig. 1 as well as finite-size corrections 46,47 in Appendix B.Here we just mention the salient features: in the original Einstein crystal method for atomic systems, the free energy is computed by TI from the fully interaction crystal to an atomic Einstein crystal for which the free energy can be computed analytically.In the extended Einstein crystal method, we transform the molecular crystal into an Einstein crystal of independent molecules that are tethered by harmonic springs to reference positions.The number of springs needed should be sufficient to fix the position and orientation of molecule, but otherwise there is considerable freedom of choice.A second aspect is that we perform TI in stages, as illustrated in Fig. 1.Finally, we should consider ∆A symmetry , the analytic contribution to A solid due to the symmetry of the constituting molecule in the crystal.For this term, we need to consider the molecular point group.Such an explicit consideration of ∆A symmetry would not be necessary if a molecule can occupy all the symmetrically equivalent orientations during a crystal simulation.However, in practice in MD, a molecule in a non-rotator phase only samples one of the symmetrically equivalent orientations, because of kinetic reasons.In contrast, a solute in the solution can explore all orientations.Therefore ∆A symmetry has to be explicitly added for a consistent consideration of the same reference state for the two phases.In the present work, for the compounds that we study, we have β∆A

C. Cavity method for excess solvation free energy
Figure 2 gives a schematic description of the cavity method to compute the excess solvation free energies ∆µ excess of general solutes in isothermal-isobaric NpT ensembles.
In Fig. 2, part label (1) represents a pure solvent with the total potential energy U (1) = U solvent-solvent , with U solvent-solvent = N solvent i u solvent-solvent and N solvent as the total number of solvent molecules.Part label (4) represents the final solution with a fully solvated solute, and the total potential energy is U (4) = U solvent-solvent + U solute-solvent with U solute-solvent = N solvent i u solute-solvent .The free energy difference between these two states is hence the excess chemical potential that we wish to compute, (5) FIG. 1. Pictorial representation of the extended Einstein crystal method to compute the absolute solid free energies A solid of single-component molecular crystal.
(0) The Einstein atomic crystal for which we know the analytic Helmholtz free energy A 0 ; (1) the Einstein orientational atomic crystal where each molecule is freely rotatable, with A 0 = A 1 because the solute is also considered to be freely rotatable in the solution phase (see Appendix B); (4) the real molecular crystal whose A solid we wish to compute.The various ∆A represent different thermodynamically reversible paths and their accompanying free energy changes.Within each molecule, there is one "Central Atom" represented by the blue solid particle.All the "Central Atoms" are attached to their initial positions with the strongest Einstein springs K max represented by the red solid springs throughout (1)- (3).There are also "Orientational Atoms" represented by the green empty particles.
During ( 1) and ( 2), the "Orientational Atoms" are attached with varying K from K min = 0 to K max , represented by the orange dashed springs, for calculating the orientational contribution ∆A 0 .During ( 2) and (3), both the "Central Atoms" and "Orientational Atoms" are attached with the strongest Einstein springs K max , now all represented by the red solid springs, to sample the intermolecular energy contribution ∆A 1 .Finally, during (3) and ( 4), all the red solid springs on "Central Atoms" and "Orientational Atoms" are reduced from K max to K min = 0 to recover the real crystal.Such a central-orientational arrangement is called the "Einstein Arrangement" or "EA" in our work.The yellow background filling in (3) and ( 4) means that the intermolecular interaction is active, but it is absent in (0)- (2).
In principle, we can compute ∆µ excess by switching on the interaction between the solute and the solvent directly.However, growing a solute in solution directly suffers from inaccuracies (mainly due to the well-known, "end-point" singularity associated with Lennard-Jones (LJ) solute-solvent interactions 48,49 ).The cavity method eliminates the "endpoint" singularity by introducing a softly repulsive cavity in FIG. 2. Schematic representation of the cavity method to compute excess solvation free energies.The blue empty particles are solvent molecules, the red solid particle is the solute, and the black empty dashed particle around the solute is the cavity that we grow and shrink.
the solution, inside which the solute will be grown, and the cavity can be removed after the solute is fully grown.This procedure is described by (2) and (3) in Fig. 2. The cavity is attached to the center of the cubic box.
In (1) and (2) in Fig. 2, we slowly switch on the softly repulsive cavity to a predetermined size, and ∆G grow is the Gibbs free energy change of creating the cavity.The potential energy expressions for states (1) and (2) are U (1) = U solvent-solvent and U (2) (λ) = U solvent-solvent + U cavity (λ).
There is no fixed functional form for U cavity (λ) but U cavity (λ) should not suffer the same divergence problem as the LJ potential.For example, Postma et al. 50have studied the thermodynamics of cavity formation in water, in the form of U cavity (λ) = λ(B/r) 12 , where λ is a linear scaling parameter, B a repulsive parameter, and r is the distance from the water oxygen to the cavity center.However, we cannot use this potential form for the cavity method because it is essentially the repulsive part of the Lennard-Jones potential causing the end-point singularity.In the present work, the cavity potential form we use is U cavity (λ) = N solvent i A exp −r solute-i B + λ , where r solute -i is the distance between the center of the cavity and a fixed point of the solvent molecule.The fixed point can be the center of mass of the solvent molecule, but it is more convenient to choose a specific atom, such as the O atom of water.The size of the cavity is indicated by the variable λ.In our simulations, we kept the parameters A and B fixed at A = 1255 kJ/mol and B = 0.1 nm for the typical case of a single solute molecule in N water = 864 water molecules, and the largest range of λ we vary is between 10 and 5.At λ = 10, the interaction between the cavity potential and solvent is negligible, so it is practically equivalent to having no cavity; at λ = 5, the volume of the cavity is roughly comparable to that of the original cubic box of N water = 864.
In summary, the excess solvation free energy is calculated via The computed excess solvation free energies should also include any tail corrections of long-range dispersion energy, which we obtain from our NpT simulations.

III. SIMULATION DETAILS
All simulations were run using the MD software package LAMMPS. 30The van der Waals parameters and intramolecular interactions were based upon the classical all-atomic force OPLS-AA (optimized potentials for liquid simulations all atomic) 51 and SPC (simple point charge) water. 52We used a time step dt between 0.5 and 1.0 fs.The Lorentz-Berthelot combining rules [53][54][55] were used for interatomic mixings.We have chosen the cutoff distances r cut = 1.2 nm for all solution simulations and r cut = 1.0 nm for all solid simulations.We used the default simulation settings in LAMMPS.In LAMMPS, the equations of motion are based on the work of Shinoda et al., 56 Martyna et al., 57 and Parrinello and Rahman. 58For time integration schemes, we have consistently chosen the velocity-Verlet integrator. 59For simulations in isobaric-isothermal NpT ensembles, a Nosé-Hoover 60,61 thermostat and a barostat were used to equilibrate all solution and solid systems.For simulations in canonical NVT ensembles, both Nosé-Hoover and Langevin thermostats 62 were used.We have used N water = 512 for cyclohexane solution simulations.We have used N water = 894 for naphthalene solution simulations between p = 0.1-200 MPa for T = 298 K and N water = 2048 for naphthalene solution at T = 320 K and T = 340 K for p = 0.1 MPa, as well as N naphthalene = 240 for naphthalene crystal simulations.We used real atomic masses.LAMMPS' default PPPM (Particle-Particle-Particle-Mesh) summation 63,64 with tin-foil boundary conditions 65 was used for all long-range electrostatics.Gauss-Legendre quadrature method 66 and the trapezoid rule 67 were used for TI.
Because the Einstein crystal method requires strong harmonic springs, it is important to choose the time step dt carefully for MD simulations of ∆A 2 , ∆A 0 , and ∆A 1 .We have consistently chosen dt that is below 1/20 of the period T p = 2π m (2K) of a harmonic spring u = K(r i − r 0i ) 2 .For example, for the biggest K max = 2 092 000 (kJ/mol)/nm 2 and carbon atomic mass of 12 g/mol, T p = 2π m (2K) = 10.64 fs so T p /20 = 0.53 fs, and we used dt = 0.5 fs for all Einstein crystal simulations.We stress that some individual aspects of the simulations could have been carried out in a more efficient way, but this would have made it difficult to use standard software.
All the site-site interactions in this work are described by the 12-6 Lennard-Jones (LJ) potential, 68 u LJ (r) = 4ε (σ/r) 12 − (σ/r) 6 , with ε as depth of the potential and σ is the finite intermolecular distance where u LJ (r) = 0.
In both the extended Einstein crystal method and the cavity method, statistical errors were calculated from block averaging, 34,69 and errors of calculated values are obtained from error propagations of the individual block averaging errors.

A. Cyclohexane: Excess solvation free energy
We first illustrate the implementation of our cavity method by computing the excess solvation free energies of an OPLS-AA-based cyclohexane solute in 512 SPC water at T = 298 K and p = 0.1 MPa.
Figure 3 shows our computed excess solvation free energies of cyclohexane in water at different cavity sizes λ.We have an average of ∆µ excess = 7.47 (35) kJ/mol, compared with the computed value of 6.86 kJ/mol in the OPLS-AA + SPC simulation in Ref. 37 and the experimental value 29,37,70 of 5.15 kJ/mol.Both simulations overestimate the excess solvation free energy of cyclohexane slightly but they agree within each other within errors.
Figure 3 shows a plot of the variation of our integrand contributions to ∆G vdwl insert , namely, ∂U vdwl solute-solvent ∂λ vdwl λ vdwl = U vdwl solute-solvent λ vdwl λ vdwl with the solute-solvent van der Waal's coupling parameter λ vdwl for three different cavity sizes λ, during the cyclohexane insertion.It is evident that, as the cavity size we used for cyclohexane insertion increases (indicated by the larger λ), the weak singularity of ∂U vdwl solute -solvent ∂λ vdwl λ vdwl as λ vdwl → 0 becomes less pronounced.At λ = 0, using the trapezoid rule, we estimate ∆G vdwl insert = 18.20(4) kJ/mol, while a single-step FEP gives Figure 4 shows the computed excess solvation free energies for cyclohexane in water, computed (i.e., the cyclohexane solute inserted) at different cavity sizes.Figure 4 illustrates that all values for ∆µ excess agree within error, and the cavity at λ = 0.5 seems already large enough to suppress the weak singularity at the beginning of the switching on of the solute-solvent interaction (see Fig. 3).are sufficiently large to suppress the "end-point" singularity as λ vdwl → 0. In Fig. 5(a), at p = 0.1 MPa, we obtain ∆µ excess = 6.93 (42) kJ/mol, in good agreement with the value of 6.82 kJ/mol reported in the OPLS-AA + SPC simulation in Ref. 37. It has been mentioned that we have used slightly longer cutoff distance r cut = 1.2 nm while the simulation reference has used r cut = 1.0 nm, but ∆µ excess should be nearly independent of r cut for reasonable r cut values if tail corrections are included. 37oth numerical results are in fair agreement with the experimental values 29,37,[70][71][72]

B. Naphthalene: Excess solvation free energy
At T = 298 K, we estimate v solute excess = 123.9(3.0)cm 3 /mol, comparable with the experimental 75 value of 122.0 cm 3 /mol.Again, as in the case of cyclohexane, we verify to what extent the cavity method reduces the "end-point" singularity during the naphthalene insertion.Figure 6 shows a plot of ∂U vdwl solute-solvent ∂λ vdwl λ vdwl for four cavity sizes λ = 0.5, 1, 0, and 5.In Fig. 6(a), λ = 1 is apparently still too small for a complete avoidance of the "end-point" singularity because ∂U vdwl solute-solvent ∂λ vdwl λ vdwl still shows divergent behavior as λ vdwl → 0. In Fig. 6(b), we have a larger cavity at λ = 0.5: here, the divergence of ∂U vdwl solute-solvent ∂λ vdwl λ vdwl is already less pronounced.In Fig. 6(c), the "end-point" divergence at λ vdwl = 0.005 is still visible but in Fig. 6(d) (λ = 5), the divergence, although in principle still present, has become unobservably small.We note that whilst divergences in ∂U vdwl solute-solvent ∂λ vdwl λ vdwl may affect the results of TI, they have no influence on FEP.In practice, single-step FEP tends to be less accurate at small λ, but the agreement between singlestep-FEP and multi-steps FEP/TI gets better for larger λ.On the other hand, it is not favorable to grow an "oversized" cavity for small solutes, because this implies larger accumulated statistical errors.
Finally, we have also used the cavity method to compute the excess solvation free energies of naphthalene in water at higher temperatures of T = 320 K and 340 K at p = 0.1 MPa.76,77 All computed excess solvation free energies are summarized in Table I.
From the slope of Fig. 7(d), we can estimate the partial molar entropy of solvation of naphthalene in water via The experimental line in Fig. 7(d) has a slope of 0.12 (kJ/mol)/K, giving ∆s solute excess = 0.12 (kJ/mol)/K according to Eq. ( 7); similarly, our computed line in Fig. 7(d

C. Naphthalene: Absolute solid free energy and solubility estimation
Next, we consider the implementation of the extended Einstein crystal method to compute the free energy of the naphthalene crystals.A 20-point Gauss-Legendre quadrature series was used to evaluate ∆A 2 and ∆A 0 .Obviously, we should use the same naphthalene force field for the solid and the liquid.Unfortunately, not all force fields describe the solid and liquid phases equally well.However, we found that the force field that we have used for the naphthalene solute can also reproduce the naphthalene monoclinic crystal lattice parameters reasonably well, with around 3% discrepancy between measured and simulated densities (see Fig. 8).Of course, this discrepancy could be fixed by improving the force field, but this was not our aim: we wished to investigate how well "off-the-shelf" force fields perform.
Polycyclic aromatic hydrocarbons often occur in several polymorphs. 78At the ambient pressure and temperature, naphthalene has a monoclinic unit cell with two molecules per unit cell.Fortunately, according to X-ray data, naphthalene remains as a single polymorph at room temperature for pressures up to at least 0.5 GPa. 79,80Raman studies also indicate that naphthalene undergoes no structural phase transition 81 at least up to 3.5 GPa and 450 K.We used the ambient naphthalene crystal structure 82  In the extended Einstein crystal method, we tether a number of atoms (either real or virtual) in each molecule, to restrain the molecular position and orientation.As described in Appendix B, within each molecule, these particles can be categorized into two groups, (1) a single "Central Atom" and (2) a small number of "Orientational Atoms."The "Central Atom" is the atom attaching the molecule to its molecular site.The "Orientational Atoms" are the particles that, when tethered, fix the molecular orientation.For non-linear molecules, the number of "Orientational Atoms" per molecule should be at least two, but in practice, more "Orientational Atoms" are often necessary to reduce the effect of molecular flexibility in the extended Einstein crystal method.On the other hand, using too many "Orientational Atoms" increases the total statistical error because more mean-squared displacements have to be sampled for ∆A 0 and ∆A 2 .The specific spring arrangement in the extended Einstein crystal method is called the "Einstein Arrangement" or "EA."In the example of naphthalene, we have used two Einstein arrangements.In Fig. 9, (a) is "EA 1", the first Einstein arrangement that we used for naphthalene; (b) is "EA 2", the second Einstein arrangement.For both Einstein arrangements, the "Central Atom" is the virtual central particle between the two aromatic carbon atoms.For "EA 1," E, F, G, and H are the "Orientational Atoms," and for EA 2, A, B, C, and D are the "Orientational Atoms."Ideally, the results of the simulations do not depend on the choice of the EAs.
Table II shows the computed absolute solid free energies of naphthalene for p = 0.1-200 MPa and T = 298-340 K (see Tables S1-S5 in the supplementary material for details).pV solid corresponds to the densities and target pressures of the corresponding reference structures.
As a test of the internal consistency of our results, we can compare the internal energy U as obtained from ∂ A solid N solid T ∂ 1 T = U solid N solid , with the value obtained directly from the simulations.Figure 10 shows a plot of A solid N solid T versus 1 T with data from Table II, and the least squares fit slope is 65.7(9) kJ/mol.For comparison, we have run independent NVT simulations using the reference structures at these three temperatures, with U solid N solid = 67.38(1)kJ/mol, 66.34(1) kJ/mol, and 65.78 (1) kJ/mol at T = 298 K, 320 K, and 340 K, respectively.These values are all close to the slope of 65.7(9) kJ/mol.It should be noted that U solid N solid consistently includes only the intermolecular interactions.
Similarly, we can compare the molar volume (more precisely, the "volume per molecule") obtained from ∂ G solid N solid ∂p = v solid , with the value obtained directly from NpT simulations.Figure 11   against p with data from Table II.After unit conversions, the gradient gives us v solid = 0.181(1) nm 3 /molecule, compared with v solid = 0.1865 nm 3 , v solid = 0.1827 nm 3 , and v solid = 0.1798 nm 3 of the reference structures at p = 0.1 MPa, 100 MPa, and 200 MPa, respectively.Also, v solid = 0.181(1) nm 3 corresponds to a partial molar volume of v solid = 109.0(6)cm 3 /mol in the crystal phase, compared with the experimental value 75 of v solid = 108.2(3)cm 3 /mol.We are now in a position to estimate the aqueous solubility of naphthalene at these pressures and temperatures according to Eq. ( 4).In Fig. 12, we compare the computed solubilities with the experimental data. 75,85In Fig. 12, our simulated result at p = 0.1 MPa and T = 298 K predicts a (mole fraction) solubility limit of x = 4.74 × 10 6 compared with the experimental value of x = 4.40 × 10 6 .Also, as shown in Fig. 12(a), the negative pressure dependence of naphthalene solubility is reproduced fairly well, and in Fig. 12(b), the positive temperature dependence of naphthalene solubility is again well captured.Considering the simplicity of the model, such an agreement is almost certainly fortuitous.
From the slopes of the least squares fits in Fig. 12(a), we can estimate the experimental and computed volume change accompanying the dissolution, 75 ∆v = −RT ∂lnx ∂p T , (10)   where ∆v is defined as the difference between the partial molar volume at infinite dilution in the solution v solute excess and the partial molar volume in the crystal v solid , The linear least squares fit of the experimental data in Fig. 12(a We have used a "flexible" model for naphthalene, and hence we have to check whether the effect of flexibility is different in the solid and the liquid phase.To this end, we tracked the intramolecular van der Waals plus electrostatic energy per naphthalene E intra /N solid at p = 0.1 MPa and T = 298 K (see the supplementary material).We find that the computed intramolecular energies in solution and in the solid agree within the statistical errors.We have also tracked the intramolecular bond-angle-torsion energy per molecule E mol /N solid throughout the complete route of the extended Einstein crystal method at p = 0.1 MPa and T = 298 K.We find that E mol /N solid can vary by 2 kJ/mol, depending on the spring constants K during the computation of ∆A 2 and ∆A 0 .Also, using different Einstein arrangements seems to have a small but noticeable effect on the value of A solid .For example, the largest difference within the range of computed A solid is around 0.8 kJ/mol, approximately 10 times of the statistical error 0.08 kJ/mol.Since in the current scheme, we compute ∆A 2 and ∆A 0 via sampling atomic mean-squared-displacements from their initial tethering atomic coordinates, it is not surprising that molecular flexibility in the extended Einstein crystal method has an effect.However, it is not straightforward to take these effects into account quantitatively.Fortunately, they have limited impact on the solubility prediction of naphthalene.However, to obtain higher accuracy, we would have to include the intramolecular parts in the Einstein crystal Hamiltonian, so as to compute explicitly the intramolecular contributions to the "Einstein free energy."An alternative is to use rigid molecular models, but for bending and torsion potentials, this would not be realistic.
As explained in Appendix B, we have used final configurations from long equilibrations as reference states for the extended Einstein crystal method.This should have no effect on the free energy estimates.We verified this by computing the absolute solid free energy of another naphthalene reference structure at T = 298 K and p = 0.1 MPa with K max = 1 046 000 (kJ/mol)/nm 2 .The values are A solid /N solid = 46.19(8) kJ/mol and 45.69 (8) kJ/mol for Einstein Arrange ment 2 and Einstein Arrangement 1, respectively, which are within the range of values (see the supplementary material).

V. CONCLUSION
In this work, we have developed a robust computational methodology to enable the solubility prediction of general crystalline solutes.The method is based on the equality of the chemical potentials of the solute in coexisting solution and solid phases.Thus, by separately computing the chemical potentials of the solute in the two phases via the cavity method and the extended Einstein crystal method, we obtain a numerical estimate of its solubility limit.The methodology is formulated by integrating existing alchemical numerical techniques such as free energy perturbation and thermodynamic integration with tailor-made protocols in the molecular dynamics software package LAMMPS. 30Extensions to other software packages or homemade Monte Carlo codes should be straightforward.To demonstrate the methodology, we have estimated the solubility of naphthalene in water for a range of temperatures and pressures using simple all-atomic force fields.These predictions are all in reasonable agreement with experiments within a factor of 2.
Regarding the solution part of the methodology, the computation of excess solvation free energies of general solutes is already a well-established procedure in molecular simulations.The cavity method is computationally cheap when the solubilities of a range of compounds in the same solvent must be computed (the cavity only needs to be created once).For the solid part of the methodology, we have applied simple extensions of the original Einstein crystal method so that the method can be easily used in LAMMPS 30 without extra code.Potentially, our approach can predict the solubility of other polymorphs, not just the one that is most stable.Our current results show that a better treatment of molecular flexibility may be needed to improve the accuracy of the results.The development of computing numerical solubilities has been hampered by two factors: (1) the limited availability of accurate force fields for molecules in crystals and in solution and (2) the fact that computing absolute solid free energies of molecular solids is not very popular, because the method, although straightforward, is cumbersome.In the literature, popular classical force fields, such as OPLS-AA 51 used in the present work, are mostly targeted at liquid phases, and in general, there are fewer accurate parameterizations for crystalline compounds.These problems may not always be relevant for the compound of interest, but they have indeed heightened the barrier for applying the Einstein crystal method and its adaptions to solubility estimation.Another obvious complication is that we have to know the experimental structure of the crystal (polymorph) that we study.In the present work, we have conducted all our simulations in the MD software package LAMMPS, 30 which at present does not allow for isobaric-isothermal NpT simulations of crystals of rigid molecules with a non-cubic simulation cell.We stress that many of the problems that we face are in no way exclusive to the topic of solubility prediction, but are generic to all numerical studies requiring high accuracy.Our key message is that there is no fundamental obstacle in developing a generic computational methodology for solubility prediction.Therefore, with the ever-increasing computational power, improving classical force fields (and hopefully, more experimental validation data appearing in the literature), and more solubility studies in the literature, it is likely that accurate yet general solubility predictions will soon be commonplace.

SUPPLEMENTARY MATERIAL
See supplementary material for details of the LAMMPS protocol to calculate the absolute chemical potentials of the solute in the aqueous solution and solid phases A solid and ∆µ excess at T = 298 K and p = 0.1 MPa, and the discussion of the effect of phase on the intramolecular energy of the solute.

ACKNOWLEDGMENTS
The authors would like to acknowledge the funding and technical support including BP's High Performance Computing facility, from BP through the BP International Centre for Advanced Materials (BP-ICAM) which made this research possible.All the simulations in the work were hence conducted using the HPC resources from BP and computer resources at the Department of Chemistry, Cambridge.

APPENDIX A: CHEMICAL POTENTIAL OF THE SOLUTE
We adopt the definition of the chemical potential of a solute on the work of Widom 86,87 and the subsequent work of Ben-Naim. 73,88n the thermodynamic limit, For the solid phase, with Q solid as the canonical partition function, we can write the absolute Helmholtz free energy as In Eq. (A10), Z solid is the configurational contribution and P solid the momenta and intramolecular (vibrational and electronic) contribution to Q solid .U inter r N solid denotes the intermolecular potential energy.Λ solid is the de Broglie wavelength for the solute in the solid phase, and q solid is the intramolecular (vibrational and electronic) partition function of the solute in the solid phase.
Finally, substituting Eqs.(A10) and (A6) into Eq.( 1) gives If we adopt Λ solid = Λ solute and q solid = q solute (the assumption that the intramolecular contributions in the solid/solution phases cancel each other out), the expression for solubility estimation in the unit of k B T per solute molecule is Going from Eq. (A11) to Eq. (A12), the part of the intramolecular partition function associated with the molecular internal rotational degree of freedom is different for the solution and crystalline phase, because in the solution phase the solute is equally likely to have any orientation, while this is not the case in the crystalline phase.If we take an asymmetric top as an example (a similar argument would work for other types), in principle, we can define the molecular rotation with reference to any reference point because configurational properties do not depend on the choice of reference point.The proper orientational partition function for a solute molecule in the solution is q solution or = ∫ exp (− βu or ) sin θdθdφdγ = 8π 2 , where θ, ϕ, and γ are the Euler angles, and u or is the single molecular orientation Hamiltonian, 40 and we have u or = 0 for a solute molecule that can freely orient in the liquid phase.Then, the important message is that, at one intermediate stage of the extended Einstein crystal method (see Appendix A), each freely rotating solute molecule has the same rotational partition function to the solute in the liquid phase, so the whole crystal has the total rotational partition function as Q solid or = ∫ exp(− βu or ) sin θdθdφdγ N solid = 8π 2 N solid with ln q solid or = N solid ln 8π 2 N solid = ln q solution or .Here the rotational partition function is effectively cancelled out and hence Eq. (A12) is valid.

APPENDIX B: EXTENDED EINSTEIN CRYSTAL METHOD
Prior to the implementation of the extended Einstein crystal method, we need to have a suitable crystalline reference structure.Although it is common to take the minimum energy crystal structure as a reference, this is not necessary, as the computed free energy of the interacting crystal does not depend on this choice.In order to facilitate the use of the Einstein crystal method with standard MD software, we take a final configuration of the crystal after a long isobaric-isothermal NpT equilibration run as a reference.
There are several ways to constrain the position and orientation of molecules in a crystal using harmonic springs.The most elegant approach would be to tether the center of mass of the molecule to a lattice site using a harmonic spring, and to fix the orientation of the molecule using a "harmonic" constraint on the quaternion specifying the molecular orientation.However, as many existing codes use atomic coordinates, it is in practice more convenient (though certainly less elegant) to restrain the orientation of the molecules by tethering three or more atoms harmonically to the corresponding "atomic lattice sites" (see Fig. 1).Again, this choice should not introduce systematic errors in the estimate of the solid free energy.
In Fig. 1, (0) is an ideal Einstein atomic crystal with a "zero total momentum."When computing the partition function, we can, as before, ignore that part due to the momenta, so that the absolute Helmholtz free energy with no constraint on the total momentum or the center of mass is The potential energy for state (0) is 2 .N solid is the total number of "atomic molecules" in (0), which is the same as the total number of molecules.
The constraint of a "zero total momentum" is similar to the requirement of fixing the center of mass in the original Einstein crystal method.We know that this constraint is important in MC simulations in the limit of vanishing spring constants, because the center of mass of the system could diffuse. 34,38owever, precisely in this limit, it is sufficient to carry out MD simulations at zero total momentum to constrain the center of mass.The reason of choosing this condition is that in MD simulations it is straightforward to initiate the total momentum at zero.The particles are initially at their tethering positions, so the net force on the center of mass is zero.If the system is then initialized with zero total momentum, the center of mass will not move.This is strictly true if all atoms are involved in the Einstein crystal simulations and are of equal masses.There might be small fluctuations of the center of mass of the system if: (1) Einstein springs of the same strength K are attached to particles with different masses by using real atomic masses, this is because only when K i m i is constant that we can maintain a fixed total momentum; (2) for molecules with nonlinear molecular structures, during the thermodynamic stage of (1) and (2), "Orientational Atoms" are attached with different spring strength K from "Central Atoms," so K i m i is not constant even when equal atomic masses are used.For practical applicability, we choose to remove the constraint of fixing the center of mass and total momentum from our partition function and simulations.In doing so, the only difference introduced to our computed absolute solid free energies is the analytic finite size correction associated with fixing the center of mass and total momentum.Along with the finite-size correction from harmonic crystal approximation, 43,44 they comprise all the finite-size corrections that are usually important to be included in theoretical studies of, e.g., Lennard-Jones (LJ) or hard spheres (HSs) crystals, where uncertainties can be as small as 0.001 k B T per molecule. 34,38However, for solubility predictions of more complex molecular crystals, these finite-size corrections are negligible compared to other sources of error.This is because the accuracy of solubility estimation is dominated by the statistical errors of the computed excess solvation free energies and solid free energies, as well as the quality of the force field used.For the harmonic approximation, the dominant finite-size correction per molecule is proportional to +lnN solid N solid .For typical system sizes in the present work, this amounts to around +lnN solid N solid = +0.02k B T /molecule for N solid = 240.As we will show, this is comparable with typical statistical errors in our computed solid free energies.For this reason, we simply ignore this term.The finite-size correction due to the constraint on the center of mass and total momentum is again of the order of 1/N and small compared with typical statistical errors.To estimate these magnitudes, we can compare Eq. (A1) βA 0 = 3 2Nln βK max π , the analytic expression for an Einstein atomic crystal at state (0) without the constraints on the center of mass and the total momentum, and βA 0 = 3 2 (N − 1) ln βK max π + ln N V − 3 2ln (N), the analytic expression for an Einstein atomic crystal with these constraints plus the general free energy difference between any two solids with and without these constraints. 41The value of βA 0 − βA 0 N is around 0.1 k B T /molecule for N solid = 240 and K max = 1 046 000 (kJ/mol)/nm 2 , comparable to or smaller than errors of computed excess solvation free energies.
For steps (0) and (1), we have A 1 = A 0 .In (0), each atom is attached to its initial atomic lattice site through its center of mass.We can treat these blue solid atoms as the "Central Atoms" in (1).In (1), each molecule is attached to its initial molecular site through its blue solid "Central Atom," around which the molecule is freely rotatable.Using atomic coordinates is a less elegant way to specify the molecular orientation, but we gain much practical convenience.For (1), the proper orientational partition function for a lattice containing N solid molecules is Q or = ∫ exp (− βu or ) sin θdθdφdγ N solid , where θ, ϕ, and γ are the Euler angles, and u or is the single molecule orientation Hamiltonian. 40For solubility estimation, the important information is that the same orientational partition function applies to a single solute in solution as well.Therefore A 1 = A 0 is valid as long as this same reference of an ideal gas is used for both phases.
In steps ( 1) and ( 2), the springs on the "orientational atoms" gradually grow from K min = 0 to K max to recover the molecular orientation of the real crystal.In (1), the potential energy is U , with the summation over all the "Central Springs" and the total number of molecules in the solid; in (2), the potential energy is (B2) In Eq. (A4), the constant c 1 is added to make a change of the original variable K to allow for more accurate integration. 34,38o compute the free energy change ∆A 1 in steps (2) and (3), we implement a TI procedure, first switching on the electrostatic interactions, going from the non-interacting Einstein molecular crystal with the potential energy U In the classical all-atomic MD, Columbic electrostatics is based on pairwise-additive point (partial) charges.Therefore, for a system of (partial) charges αq i with i indicating the atom types, we can write the total electrostatic potential energy for the system as U electrostatic inter (α) = N solid i j (αq i ) αq j φ r ij , where φ r ij is the position-dependent electrostatic potential, so ∂U electrostatic inter (α) ∂α α = 2α i j (q i ) q j φ r ij The free energy change in (3) and ( 4) is denoted by ∆A 2 .In (3), all the Einstein springs on the "Central Atoms" and "Orientational Atoms" are now equivalent, and thereafter during (3) and ( 4), all of them are reduced from at K max to K min = 0 together.For (4), the potential energy finally corresponds to the real crystal, i.e., U (B6) Again, the constant c 2 is added to make a change of the original variable K to allow for more accurate integration. 34,38n summary, the final expression for the absolute Helmholtz free energy A solid of a molecular crystal is .

(C3)
The total insertion free energy is therefore ∆G insert , ∆G vdwl insert , and ∆G electrostatic insert are all computed in the same cavity λ created by ∆G grow .
For small neutral solutes, or if the size of the cavity is large enough, ∆G grow can be easily obtained by a single-step FEP.In principle, we can always create a cavity that is sufficiently large for the single-step FEP insertion of any large solutes, by using a larger λ.However, our purpose is to reduce the "endpoint" singularity as much as possible, and as long as this is satisfied, there is no need to create a larger cavity, in which case we can use two-steps/multi-steps FEP/TI.
Finally, in (3) and ( 4), we gradually switch off the cavity potential to recover the final state of a fully solvated solute in the solution, Because the removal of "end-point" singularity is a gradual process, there is no strict criterion to determine when a cavity becomes "sufficiently large," but we use as a rule of thumb that the solvent should never experience the repulsive part of the solute-solvent interaction.Of course, the singularity problem can be avoided by using a small FEP calculation to replace the first step in the TI.

N
solid = −ln 4 per molecule for naphthalene, because the point group is D 2h .

Figures 5 (
Figures 5(a)-5(c) show the computed excess solvation free energies of OPLS-AA-naphthalene in SPC water with increasing cavity size at p = 0.1 MPa, 100 MPa, and 200 MPa, respectively, at T = 298K.For all pressures, the values of ∆µ excess (λ) become independent of λ, to within statistical error for λ 0, indicating that the cavities beyond λ = 0

FIG. 4 .
FIG. 4. Computed excess solvation free energy of cyclohexane in water at T = 298 K and p = 0.1 MPa.The red dashed line is the average value of 7.47 kJ/mol.
of 9.58 kJ/mol or 10.00 kJ/mol, with the difference most likely due to deficiencies of the force field.Similarly, in Figs.5(b) and 5(c), we plot the computed excess solvation free energies of naphthalene in water with increasing cavity size at p = 100 MPa and 200 MPa and T = 298K.

Figure 5 (
d) shows the variation of the computed excess solvation free energies with pressure from p = 0.1 MPa to p = 200 MPa at T = 298 K.The pressure derivative of ∆µ excess at fixed T at infinite dilution is the excess partial molar volume at infinite dilution, 73,74 v solute excess , ∂∆µ excess ∂p T = v solute excess .

9 )FIG. 5 .
FIG. 5. Computed excess solvation free energies of naphthalene in water at T = 298 K for p at (a) 0.1 MPa, (b) 100 MPa, and (c) 200 MPa.The dashed lines in (a)-(c) represent the average values for λ 0 for the three pressures.These three average values are also plotted in (d).

β
FIG. 7. Computed excess solvation free energies of naphthalene in water at p = 0.1 MPa and T of (a) 298 K; (b) 320 K; and (c) 340 K.The black dashed lines in (a)-(c) are the average from specific λ for the three temperatures (λ 0 for 298 K and 320 K, and λ 1.0 for 340 K), and they are also plotted (d).In (d), the empty blue squares are experimental data.
FIG. 8. (a) Simulated and experimental unit cell volumes of naphthalene with pressure.The empty blue squares are the averaged simulated results in this work, the empty red circles ("experimental unit cell volume 1" in the figure) are from the experiments of Ref. 81, and finally the empty orange rhombuses ("experimental unit cell volume 2" in the figure) are from the experiments of Ref. 78; (b) variation of simulated and experimental unit cell volumes with temperature.The empty blue squares are the averaged simulated results in this work, and the empty light green triangles ("experimental unit cell volume 3" in the figure) are from the experiments of Ref. 83.We cite the experimental melting point of 353.4 K from Ref. 84; (c) and (d) represent snapshots of the simulated naphthalene at p = 0.1 MPa and T = 298 K (hydrogen atoms are not shown and all bonds/atoms are colored equivalently).
FIG. 10. (A solid /N solid )/T against 1/T.Energies are in kJ/mol and T is in K.

FIG. 12 .
FIG. 12. Computed and experimental mole fraction solubility ln x of naphthalene in water (a) with varying pressure at T = 298 K; (b) with varying temperature at p = 0.1 MPa.The procedure by which we estimated the error bars in this figure is described in the supplementary material.

2 ,− r 0ij 2 K×
with the summation over all the springs.The accompanying free energy change ∆A 0 can be computed via TI, (K + c 1 ) dln (K + c 1 ) .

TABLE II .
Summary of the absolute solid free energies of naphthalene and densities of the reference structures and equilibrated crystals in NpT simulations.Energies are in kJ/mol.