From dimers to the solid-state: Distributed intermolecular force-ﬁelds for pyridine

An anisotropic atom-atom force-ﬁeld for pyridine, using distributed atomic multipoles, polariz-abilities, and dispersion coefﬁcients and an anisotropic atom-atom repulsion model derived from symmetry-adapted perturbation theory (density functional theory) dimer calculations, is used to model pyridine crystal structures. Here we show that this distributed intermolecular force-ﬁeld (DIFF) models the experimental crystal structures as accurately as modelling all but the electrostatic term with an isotropic repulsion-dispersion potential that has been ﬁtted to experimental crystal structures. In both cases, the differences are comparable to the changes in the crystal structure with temperature, pressure, or neglect of zero-point vibrational effects. A crystal structure prediction study has been carried out, and the observed polymorphs contrasted with hypothetical thermodynamically competitive crystal structures. The DIFF model was able to identify the structure of an unreported high pressure phase of pyridine, unlike the empirically ﬁtted potential. The DIFF model approach therefore provides a model of the underlying pair potential energy surface that we have transferred to the crystalline phase with a considerable degree of success, though the treatment of the many-body terms needs improvement and the pair potential is slightly over-binding. Furthermore, this study of a system that exhibits isotopic polymorphism highlights that the use of an empirical potential has partially absorbed temperature and zero-point motion effects as well as the intermolecular forces not explicitly represented in the functional form. This study therefore highlights the complexity in modelling crystallization phenomena from a realistic pair potential energy surface. ' 2017 Author(s). All article


I. INTRODUCTION
The true intermolecular potential energy surface of a molecule should be transferable between all phases, allowing simulation of all the physical properties that depend on the forces between the molecules. A pairwise approximation is suitable enough to describe the gas-phase properties but the many-body (non-additive) terms may affect the condensed phases. 1 This paper explores transferring a theoretically based many-body pair potential for pyridine 2 to the crystalline state.
The holy grail of defining a sufficiently accurate analytical pair potential to account for all the physical properties of the inert gases was achieved in the 1980's. 3 These potentials only included the three-body dispersion term for the condensed phases. Water and other small polyatomic molecules now have very accurate potentials available, though these are often not used in Molecular Dynamics (MD) simulations because of the need to compromise between accuracy, speed of evaluation, and functional forms assumed in the suitable codes. The condensed phases of organic molecules are an area of increasing interest, with the ability to simulate the differences in material properties between polymorphs being highly desirable for the specialty chemical industries. [4][5][6] Polymorphs, which are different crystal structures with identical liquid and vapor phases, usually have a number of differing physical properties, such as solubility, melting point, morphology, shock and mechanical stress responses, surface chemistry, reactivity, and phase transformations. Over 50% of organic molecules are found to be polymorphic in the industrial searches for polymorphs which are routine in developing a pharmaceutical product. 7 Hence all organic specialty industries, from opto-electronic, energetic pigments, agrochemicals as well as pharmaceuticals, wish to establish the behavior of the organic molecular crystals under all functioning conditions. However, no current force-fields are accurate enough for use in predicting the crystal structures of weakly bound organic molecules: all successful methods in the recent blind test of crystal structure prediction 8 (CSP) relied on large numbers of electronic structure calculations for attempting to rank the relative energies of possible crystal structures. Thus, there is a need for force-fields that correctly model the thermodynamics and properties of the observed polymorphs and other hypothetical crystal structures, the liquid, and the gaseous phase if we are to be able to study the phase change behavior of organic molecules.
The intermolecular force-fields that have been most widely used in simulating organic condensed phases have been derived by empirical fitting to crystal structures and properties 9,10 with increasing the sophistication being used for the challenge of modelling crystal structures 6,11,12 to the extent that they rival popular periodic DFT+D calculations in accuracy. 6 The parameterization of such empirically fitted force-fields inevitably absorbs, to some extent, errors, approximations, and assumptions in the experiments and computational methods, such as neglect of thermal effects, or assumed functional form. The problems with selecting the experimental data and the assumptions used in fitting a potential to the data can be avoided by fitting an analytical force-field to an ab initio potential energy surface. 13 This approach of using ab initio derived analytical potential energy surfaces (PESs) for spectra of gas-phase clusters has been extensively applied to water [14][15][16][17][18] and other small molecule complexes, [19][20][21] and various ab initio astrophysical applications. [22][23][24][25] Pioneering studies of the organic solid-state using analytical fits to ab initio interaction energy calculations have mainly concentrated on energetic molecules, 26,27 where predictive modelling requires reliable extrapolation into the repulsive region that is not sampled in fitting empirical potentials. A non-empirical potential, based on an early variant of the Distributed Intermolecular Force-Field (DIFF) approach for C 6 Br 2 ClFH 2 , 28 was successful in predicting its crystal structure in the fourth blind test. It is therefore very timely to assess the barriers to producing a sufficiently realistic model intermolecular potential from dimer calculations that can be used to predict the properties of a molecule in all phases and assess the extent to which the many-body terms are a limitation.
Considerable advances have been made in the ability to calculate intermolecular pair potential energy surfaces accurately. 2,13,[29][30][31][32][33] Computing sufficient number of points to define the PES leads to two challenges: (1) the computational cost incurred, particularly for large molecules, and (2) the need for sufficient accuracy.
Both needs are satisfied by the symmetry-adapted perturbation theory based on density functional theory 11,34-37 SAPT(DFT), which has now become the method of choice for many applications involving weakly bound molecular interactions. 2,28,[38][39][40][41] However, errors are introduced by the choice of an analytical functional form and in the fitting process. Recently a method of automatically generating analytical intermolecular potential energy surfaces in an isotropic atom-atom functional form to SAPT(DFT) numerical points reported a typical fit error of about 0.8 kJ mol 1 in the negative energy region. 13 Also, recently, one of us has been involved in the development of the Slater-type Force-Field (Slater-FF) models, which exhibits average errors of 0.2 kJ mol 1 in the attractive region for a wide range of interacting dimers. 31 However, these model potentials have so far been based on an isotropic atom-atom form, primarily for the ease of use in simulation programs. Here we explore the alternative approach of using anisotropic atom-atom model potentials that are generated from distributed monomer properties and advanced fitting to SAPT(DFT) component energies. This approach, using the program CamCASP, has been described in detail in an application to the pyridine dimer. 2 This type of non-empirical distributed intermolecular force-field (DIFF) uses distributed multipolar models for the electrostatic, polarization, and dispersion interactions and anisotropic atomatom exponential short range terms derived by fitting to SAPT(DFT) interaction energies via a distributed densityoverlap model.
While all three of these models represent considerable progress in accurate force-fields for dimers, it is important to determine how well they perform for the crystalline state. There are two issues that arise here. The first is that the lattice energy differences between 80% of polymorph pairs 7 is less than 4 kJ mol 1 , with many observed polymorph energy differences being close to the reported errors in the dimer potentials. The second issue is that so far these models have only been tested on gas-phase and calculated dimer properties while there are many-body effects in condensed phases.
In this paper, we pioneer transferring this DIFF models to the solid-state phases of pyridine, seeking to explore the issues involved in defining and using analytical force-fields that can transfer between phases.

A. Pyridine
Pyridine has a melting point of 231.6 K, 42,43 and as a liquid at ambient conditions has relatively weak intermolecular interactions. The molecular structure varies so little in the condensed phases that pyridine is assumed to be rigid and to adopt the isolated static molecule structure. However, pyridine has a dipole moment of ∼2.2 D 44 and can be considered capable of forming weak C-H· · · N hydrogen bonds. 45 Thus, dispersion, electrostatic, and polarization forces are competitive with the repulsive forces in determining the structure of the different phases, as has already been shown in the design of an intermolecular potential for simulating liquid pyridine. 46 The first crystal structure of pyridine to be determined is unusually complex, 42,43 with 4 independent molecules in the asymmetric unit (Z = 4) and 16 molecules in the unit cell (form I, Pna2 1 ). After an early crystal structure prediction (CSP) study showed that simpler structures were thermodynamically competitive, a low-temperature form II (P2 1 2 1 2 1 Z = 1) was crystallized from pentane for deuterated-pyridine (d5) though not for protonated-pyridine (h5). 47 The two polymorphs are rather similar, with a large common coordination cluster ( Fig. 1) but differ in the stacking so that a CH· · · π interaction in form I becomes a CH· · · N interaction in the lowtemperature, high-pressure form II. 48 A later CSP study, using DFT-D periodic electronic structure calculations, showed that the lattice energy of form II is slightly more stable than form I by less than 0.1 kJ mol 1 , though the difference increases with pressure, and alternative structures are within 0.1 kJ mol 1 of the observed structures. 49 The crystallization behavior of pyridine is isotope-dependent, as well as depending on pressure and temperature. 45,48 Additionally, towards the conclusion of this work, we became aware that a third form (d5-III) had been crystallized at around 2 GPa pressure 50 and were challenged to identify its structure, providing a test of how the intermolecular potential models the repulsive wall.

A. Analytical intermolecular potential models
This study used the most accurate version of the DIFF approach, referred to as model 3 in the paper which described its development using CamCASP. 2 Here we focus on the aspects of the model that required modification and refitting for use in organic crystal structure modelling code DMACRYS, 51 and the checks used to ensure that the potentials were equivalent. The DIFF model has the following functional form, omitting the standard multipolar expansion of the electrostatic and polarization terms: 1 where the relative position (R ik ) and orientations (Ω ik ) of atom i in molecules M and k in molecule N are defined by the relative position and orientation (R MN , Ω MN ) of the molecules M and N and the rigid molecular structure. The static isolated molecule structure was obtained by optimization using the Perdue-Burke-Emzerhof (PBE) general gradient approximation combined with a portion of exact exchange to form the PBE0 functional [52][53][54] and the cc-pVTZ basis set, and kept rigid in all simulations. The long range contributions are all modelled using distributed molecular properties 1 of the asymptotically corrected 55 (PBE0/AC) charge density calculated with the d-aug-cc-pVTZ Dunning basis 56 and referred to a molecule-fixed axis system, with x along the C3-N symmetry axis and the molecule in the xy plane. The distributed multipoles (Q i lm ) were derived using the basis-space iterated stockholder atoms (ISA) 57 analysis, which is a more effective partitioning method than GDMA 58 as it exhibits more natural charges, nearly spherical shapes, and is, overall, a better model of the atomic anisotropic electron densities. The distributed multipole expansion is evaluated up to rank 4 (hexadecapole) to include the contributions of the atomic anisotropy, such as the lone-pair (dipole) and π-orbital (quadrupole) features, which are important in organic crystals. 59 The atomic polarizabilities (α i lm,l m ) were limited to just the dipolar terms (l = l = 1) and the atomic dispersion coefficients (C ik n ) were isotropic, with both being derived using the Williams-Stone-Misquitta (WSM) polarizability model. [60][61][62] The C 8 and C 10 dispersion terms are included for heavy atoms, to ensure the correct distance dependence for the dispersion, as the relative orientations of molecules in crystals sample the dispersion out to infinity. Both the polarization and dispersion terms are damped with a single atom-atom parameter ( β ind = 1.25 a.u., β disp = 1.67 a.u.) Tang-Toennies damping function 63 to prevent unphysical behavior at close contacts.
The remaining short range contributions to the intermolecular potential were obtained using the CamCASP and ORIENT programs by fitting to a large number of SAPT(DFT) dimer energy calculations via a distributed density-overlap model. The fitting was performed in a hierarchical manner to obtain the atom-atom exponential repulsive model with isotropic preexponential (A ik ) and hardness (B ik ) coefficients. 2 The short range term is made anisotropic by the inclusion of the shape function (ρ ik (Ω ik )) in the exponential. The shape function is a polynomial containing anisotropic coefficients that were defined relative to an atomic local axis system with the y axis normal to the plane of the molecule and the z axis along the C-H bond pointing out from the molecule. The changes to these specific molecule-fixed and local axes are required for most molecules (unlike for pyridine) by the conventions within the crystal structure modelling program DMACRYS. These conventions enable the multipole moments to be transformed to maintain a right-handed axis system when molecules are generated by inversion or other symmetry operators that would otherwise invert the axis definition. The change in axis systems meant that the repulsion part of the potential had to be refitted, but this DIFF model is essentially the same as model 3 from Ref. 2 except for the definitions of axis systems. The parameters of the DIFF model are given in full in the supplementary material in the ORIENT format so that the symmetry relationships are clear. The accuracy of the translation between the gas/dimer and solid state modeling was tested by the calculation of the second virial coefficients and by reproducing the previous ORIENT gas phase results 2 in DMACRYS calculations in which identical dimers were set up within cubic unit cells of side 100 Å. Two other non-empirical variants on this model were also tested (see the supplementary material). DIFF(no-pol) simply omits the polarization term, which is the only many-body contribution included in DIFF. The effect of the anisotropic repulsion was tested by refitting the short-range terms to an isotropic model (ρ ικ (Ω ik ) = 0) to give DIFF(iso-rep).
The empirically fitted potentials being used for comparison are representing all but the electrostatic contribution by an empirical isotropic atom-atom exp-6 potential, where atom i is of type ι (C/H/N) and atom k of type κ. The exp-6 FIT parameters 51 relevant to pyridine were originally fitted to aza-hydrocarbon crystal structures and a limited set of crystal energies 64 whereas the WILL01 set discriminates between different types of N hybridization. 12 The FIT model is combined with the following electrostatic models: the same ISA distributed multipoles as used in DIFF, the point charge only component, and the electrostatic model that has often been combined with FIT in crystal structure prediction studies 65 [i.e., distributed multipoles from the GDMA2 58 analysis of the PBE0 6-31G(d, p) molecular charge density, thus involving a change in the wavefunction ψ used to produce the electrostatic model]. Since the WILL01 potential is defined with the H interaction sites moved slightly into the C-H bond, to reflect the non-spherical nature of the charge distribution around hydrogen atoms, distributed multipoles for these non-atomic sites could only be obtained using GDMA2. 58

B. Simulation methods
The gas phase dimer properties were simulated using ORI-ENT, 66 including the calculation of the classical second virial coefficients by numerical integration. The second virial coefficients are an appropriate test of the intermolecular pair potential in the gas phase and could be compared with experimental values 67,68 derived by vapor compressibility measurements in the 1950s, which showed some deviation from prior measurements, and gave a heat of vaporization of 35.2 kJ mol 1 at the boiling temperature of 115.26°C and 40.4 kJ mol 1 at 25°C.
The CSP study is based on lattice energies E latt , the internal energy of the system relative to infinitely separated molecules in their lowest energy configuration: The total lattice energy can be separated into atom-atom intermolecular interactions and intramolecular (geometry distortion) energies of the molecules, however, in this study, molecules are held rigid, hence ∆E molecule intra = 0 and the lattice energy depends only on the atomatom intermolecular interaction energy. The crystal structures were modelled using DMACRYS2.2.1.1, 51 with lattice summations being carried out to 15 Å followed by a 2 Å splined correction, with the charge-charge, charge-dipole, and dipoledipole electrostatic contributions being evaluated by Ewald summation. Optimization used the analytical first derivatives apart from the forces due to the polarization term.
The polarization term is the only non-additive many-body term in the potential, and its use in the solid state produces new challenges. The induced moments in a crystal structure need to be solved iteratively to consistency 51 because the polarizing field depends on the polarized moments of all the molecules in the crystal. Hence calculating the polarization forces requires taking numerical derivatives of the iterated distributed induced moments. As smooth, numerical forces are needed, optimization with the polarization energy is very computationally demanding. This has been done for the experimental structures (see Table I of the supplementary material), where it was confirmed that the inclusion of polarization forces during optimization changed the lattice parameters from those optimized without the forces due to polarization by less than ±0.04 Å. Thus the geometric changes resulting from including polarization forces resulted in a negligible further change in the lattice energy. A significant effort was required to do the optimization including the polarization forces and since the calculations in Table I of the supplementary material show that the cell geometry with polarization forces included is so close to that obtained by just adding the polarization energy after optimization, all other optimizations and second derivative property calculations did not include the derivatives of the polarization contribution. A single-point polarization energy was added for the DIFF and DIFF(iso-rep) model potentials. This approximation may only be acceptable because the induced moments are small, as shown by a small induced moment electrostatic potential on the van der Waals surface (Fig. 2). However the induced moments do vary with the changes in crystalline environment, with all the four independent molecules in form I having a less symmetric environment, and hence greater polarization, than the single independent molecule in form II (Fig. 2).
The energy difference between the hypothetical static crystal at 0 K and a real crystal can be estimated by lattice dynamics within the harmonic approximation and FIG. 2. The additional electrostatic potential around each pyridine molecule in the two crystals arising from the induced atomic dipole moments, after optimization with the DIFF model. The potential in eV is displayed on the van der Waals surface using ORIENT 4.9.08. 66 The lower symmetry environment in Z = 4 form I produces a marked difference in the induced moments relative to Z = 1 form II.
using the DMACRYS methodology and scripts developed by Nyman. 6, 69 We sample a number of k-points in reciprocal space by computing the phonons frequencies 70 and elastic stiffness tensors 71 for a number of linearly elongated supercells of pyridine to sample the first Brillouin zone. The supercells are generated by selecting a default 0.12 Å 1 k-point spacing 6,69 which samples about 26 k-points. As pyridine is a small molecule, this does not require very large supercells. The Debye frequency contribution to the acoustic phonons and a Gaussian Kernel Density Estimate (KDE) of the optical density of states (using a bandwidth of 3 cm 1 ) are calculated from the elastic tensors and phonons, respectively, and then used to calculate the free energy thermal correction F vib , as the sum of the vibrational zero-point energy and the thermal energy at 190 K. The Helmholtz free energy A free (T ) = E latt + F vib (T ) was evaluated at 190 K as this is the temperature at which there is an experimental crystal structure of both forms I and II but this is only 40 K below the melting point of pyridine. Hence, the use of the harmonic approximation is unlikely to be realistic, as discussed later. The effects of pressure were calculated with DMACRYS by optimizing the cell geometry, including the PV contribution to the energy, to minimise the lattice enthalpy H latt = E latt + PV.
The CSP study used CrystalPredictor 1.8 72,73 to generate a million putative Z = 1 crystal structures of pyridine within the 59 most probable space groups. Optimization using the FIT potential and ISA point charges generated just over five thousand unique structures. This set was re-optimized with the various anisotropic model potentials, any optimized structures whose second derivative properties showed that they were not true minima were discarded, and the remaining structures were re-clustered. The most stable structures were analyzed in detail for their similarities to each other and the isolated dimer structures. This was done using the similarity tool in Mercury 74 which determines how many molecules (n) of a maximum coordination cluster (15 for crystals, 2 for crystal/dimer comparisons) can be matched within a 20% distance in intermolecular atom-atom distances and 20°in interatomic intermolecular angles and reports the optimum RMSD n (root mean square deviation of n molecules) of the overlay, ignoring hydrogen or deuterium atoms.

A. Gas phase properties
The quality of the DIFF model is mainly supported by its theoretical origin and the quality of fit. The intermolecular potential energy surface has eight distinct minima 2 (Fig. 3), the most stable having two C-H· · · N weak hydrogen bonds (referred to as Hb1), others only one, some distorted T shaped geometries stabilized by C-H· · · π interactions, as well as a trio of displaced stacked geometries (Fig. 3). It is notable that the dimers differ markedly in the dominant contribution to the binding energy; the electrostatic plus polarization contribution is similar to the total binding for the hydrogen bonded Hb1 dimer, but this contribution is less 1 kJ mol 1 for the almost iso-energetic stack S1, which has double the dispersion stabilization. There are three distorted T dimers with CH· · · π interactions and another C-H close to nitrogen, which are quite similar in energy and also in structure apart from the relative position of the nitrogen. These T-shaped dimers have all components intermediate between the hydrogen-bonded and stacked dimers.
The classical second virial coefficients (Fig. 4) differ markedly between the empirical and DIFF models, with the latter being closer to the experimental values. The empirical FIT potential underestimates the effect of the intermolecular forces on the gas-phase property by approximately 25%, whereas the DIFF model appears to slightly overestimate the effect by around 10%. The reasons why the DIFF model is slightly over-binding are discussed later.  67,68 The values for the DIFF model differ from those previously reported 2 for model 3 which were affected by a coding error. The agreement between DIFF and model 3 tests the equivalence of the potentials.

B. Reproduction of known crystal structures of pyridine
A comparison of the cell parameters and densities of the lattice energy minima with the various analytical potentials with the three experimental determinations of form I at three temperatures between 190 and 5 K and the two determinations of form II at two pressures (but different temperatures) is given in Fig. 5, with a full set of comparisons in the supplementary material. The majority of the potentials give a satisfactory reproduction of the two crystal structures by lattice energy minimization, which corresponds to a static crystal, in comparison with the variation of structure with temperature. The cell parameters of the non-empirical DIFF models are found to better match the 5 K form h5-I structure (PYRDNA04) than the empirical models, while the empirical models that include a distributed multipole electrostatic model (GDMA/ISA) better match the densities at higher temperatures Fig. 5(a). Accessible variations in pressure generally have a larger effect on the crystal structures than temperature changes, but since pressure experiments are usually done at ambient temperature, it is not possible to de-convolute temperature and pressure effects on pyridine which is only solid at ambient when under pressure. Allowing for this difference in temperatures, optimizing the lattice enthalpy seems to provide a sensible estimate of the structures under modest pressures [1.1 GPa Fig. 5 The space group, lattice parameters, energy, and the corresponding dimer motif for the 30 most stable structures can be found in the supplementary material, linked to the structure identifier pyr #. Fig. 1(b) of the supplementary material] for all the intermolecular potentials.

C. Stability of observed versus hypothetical structures
The CSP lattice energy landscape with DIFF generated many structures competitive in energy with forms I and II (Fig. 6), with there being 52 unique structures within 5 kJ mol 1 of the most stable. The lowest energy structure ( pyr 2) is denser and 1.3 kJ mol 1 more stable than form II. DIFF therefore passes the test of predicting the known structures as sufficiently close to the global minimum to be thermodynamically plausible as polymorphs. Given the experimental uncertainty that the experimental structures are the most thermodynamically stable, and the neglect of temperature effects, as discussed further below, the DIFF model has transferred successfully to the crystalline state. It is therefore worth determining the importance of the different contributions to the relative stability ranking, in light of what parts of the potential surface are sampled by the CSP test.
The ranking of the low energy structures is very sensitive to the model potential (Fig. 7). Removing the polarization energy DIFF(no-pol) (Fig. 3 of the supplementary material) leads to considerable re-ranking, with form II becoming the most stable, with the energy difference between the two forms increasing to nearly 5 kJ mol 1 . The model fitted with an isotropic repulsion model [DIFF(iso-rep)] reverses the stability order of forms I and II and has a different selection of structures (Fig. 3 of the supplementary material) that are slightly more stable than the known forms. The empirical FIT+GDMA potential gives the observed form II as very close in energy (0.14 kJ mol 1 ) to form I, but there is just one lower energy structure, by 0.8 kJ mol 1 , pyr 83, which is significantly different (Fig. 3 of the supplementary material). The empirical potential produces structures that are less dense than with the DIFF model, as found for the observed structures (Fig. 5). This sensitivity to the energy model in ranking the structures requires understanding in terms of the variety of pairwise intermolecular interactions in the crystal structures which are thermodynamically plausible as polymorphs. The gas-phase dimer motifs 2 in the 30 lowest energy structures found in the DIFF search are given in Table IV of the supplementary material along with their lattice energies.
The analysis of the low energy crystal structures (Table IV of the supplementary material) showed that the coordination clusters are not dominated by the gas-phase dimer structures shown in Fig. 3. The most stable crystal structure ( pyr 2) contains the T-shaped dimer T1 and a much distorted version FIG. 7. Relative lattice energies of the observed and selected computer generated crystal structures of pyridine, relative to form I as a function of the model intermolecular potential. The labels for the CSP generated structures pyr # refer to their ranking after the CrystalPredictor 72,73 stage, i.e., with the FIT+POINT model and are given in Table IV   of the singly hydrogen bonded dimer Hb2, whereas the most stable dimer Hb1 occurs in a crystal structure ( pyr 83) that is 1 kJ mol 1 less stable. Thus the low energy crystal structures differ in the relative contribution from each of the van der Waals contact dimers defining the structure, and each of these differs in the partitioning between the various contributions 75 in a similar manner to the dimers (Fig. 3). In Fig. 7, we display the relative energies of a selection of structures within 4 kJ mol 1 of the most stable, together with the three known polymorphs. The hypothetical structures were selected based on the diversity of the gas-phase-like dimers they contained, so as to allow us to more clearly explore the effects of the various energy models on the relative lattice energy and its components. The relative thermodynamic stability order varies significantly as a function of potential (Fig. 7). There is only a slight re-ranking of the structures in changing between the two empirical repulsion-dispersion models (FIT/WILL+GDMA), but there is a dramatic re-ranking caused by omitting the polarization energy or the repulsion anisotropy from the DIFF model. In addition, the global minimum from the DIFF ( pyr 2) is less stable than forms I and II with the empirical models, and being denser, its stability may be an artifact of the DIFF model, as discussed later. There is also some variation with the electrostatic model used, with the re-ranking caused by using a point charge or distributed multipole representation of the same charge distribution (FIT+POINT vs FIT+GDMA) being less severe than that caused by changing the quality of charge distribution and its representation (FIT+GDMA vs FIT+ISA). This is unusual, as the use of distributed multipoles rather than the corresponding potential-derived charges usually makes a considerable improvement in CSP studies, particularly for hydrogen-bonded systems. 59 The lattice energies of the selected crystal structures vary by less than 4 kJ mol 1 using the DIFF model (Table I); however, the various contributions to this energy vary considerably more. The dominant dispersion contribution varies by over 10 kJ mol 1 favoring denser structures that contain stacked or Tshaped dimers whose dimer energies are heavily dominated by the dispersion contribution, but this is often partially balanced out by an increase in the short range repulsion term. The electrostatic contribution varies by 5 kJ mol 1 , favoring the experimental forms and other structures containing the CH· · · N "hydrogen bonds." The polarization energy is a small contribution but stabilizes form I (Z = 4) the most, and the difference in the polarization energy between the two known polymorphs, at 2.8 kJ mol 1 , is larger than their total lattice energy difference. Although this difference is accounted for by the lower symmetry within form I allowing larger induced moments (Fig. 2), the polarization term also stabilizes pyr 2 and all other structures that are lower in lattice energy than form II. Hence the polarization contribution is important, even for pyridine and the low energy structures re-rank substantially in the CSP which neglects this structure-dependent non-additive term by using DIFF(no-pol) (Fig. 3 of the supplementary  material).
Thus the CSP study samples a wide range of relative orientations of the pyridine molecules and finds that the various contributions can balance to within the energy range of plausible polymorphism (<5 kJ mol 1 ) in a variety of different structures.

D. Form III
On hearing of the existence of a new polymorph d5-III at around 2 GPa, the crystal energy landscape was recalculated at 2 GPa. The effect of the pressure-volume (PV ) energy term significantly re-ordered the relative stability of the structures (Table V of the supplementary material) as well as increasing the density by 15%. One structure became significantly more favorable with pressure and was only 1.2 kJ mol 1 less stable than form II. Furthermore, this structure ( pyr 35) had 7 coordinating molecules in in common with form II, and so it is very plausible as the result of a low-barrier transformation (Fig. 8), unlike the other low energy hypothetical structures at 2 GPa. This seemed likely to be form III on grounds of relative energy at 2 GPa and being the structure in the search with the most plausible degree of rearrangement from the known forms. The overlay of this structure ( pyr 35) with the experimental structure of form III at approximately 2 GPa is shown in Fig. 9. In contrast, the structure of form III could not be identified with the empirical FIT potential: the corresponding structure pyr 35 is not amongst the most stable, independent of pressure ( Fig. 3 + Table VI of the supplementary material), and is 3.3 kJ mol 1 less stable and less dense than form II at 2 GPa. Thus, it would not have been possible to propose a candidate structure for form II using the FIT potential solely from the knowledge of its existence at ca. 2 GPa.

IV. DISCUSSION
The non-empirical DIFF model for pyridine, which was derived from theoretical calculations on the monomer and dimer, gives a reasonable account of the structures of the polymorphs and their stability relative to hypothetical structures. The accuracy in the solid-state rivals, and arguably exceeds, that of empirical potentials that had been parametrized to experimental crystal structures and fails badly in reproducing the second virial coefficients or the behavior of the solid under pressure. This is a major advance, but this study does bring into focus both the advantages and disadvantages of having a genuine pair potential energy surface which it was hoped would be transferable between phases with the approximation that the polarization was the only significant many-body term for relative energies.
All the static lattice or enthalpy (P = 1.1 GPa and 2 GPa) structures are in good agreement with the experimental structures, given the variation between the structures that have been determined at different temperatures (5-190 K) and pressures (≤1.1 GPa). The DIFF model reproduces the 5 K h5-I structure of pyridine well but overestimates the density by 5.6%. Nonetheless, perhaps this is not as serious an issue as it may seem as it has recently been noted that a quasi-harmonic modelling of crystalline imidazole using ab initio based potentials gave a 4% increase in molar volume on including just the zeropoint energy. 76 Hence, a part of the error in the density at 5 K is probably due to neglecting the effect of the zero-point TABLE II. The Helmholtz free energy at T = 190 K, p = 0 for both protonated and deuterated isotopes (h5 A free , d5 A free ) calculated using the lattice dynamics and the DIFF model for the observed polymorphs and the global minimum (GM) in the CSP study (Fig. 6) expansion on the structure. Hence, to the extent that the static lattice energy (or enthalpy with a PV term) can model the crystal structures, DIFF models the experimentally unobservable static crystal structure, and the empirically fitted potentials implicitly include some average over zero-point energy effects and thermal expansion. The lattice energies of the crystals are only an approximation to the thermal stability neglecting the isotope-dependent zero-point energy and thermal contributions. The free energy estimates in Table II show that including the zero-point energy and thermal corrections estimated by lattice dynamics reverses the stability of h5-forms I and III and significantly reduces the energy difference between forms I and II by over 0.7 kJ mol 1 . While, these harmonic, rigid-molecule lattice-dynamic estimates are approximate, they highlight that both the zero-point energy and thermal corrections are significant, and affected by deuteration, as seen in Table II. The effect of deuteration is virtually identical for every polymorph within this fixed rigid-molecule approximation. However, using experimental molecular structures can change the lattice energies by a few kJ mol 1 , showing that modelling small changes in molecular structure is important, even for pyridine. Most organic crystals have a significant thermal expansion, which can be very anisotropic and dependent on the specific crystal structure. Recent quasi-harmonic periodic electronic structure methods 77 and empirical potential estimates for a large dataset of organic crystals 78 show that the effects of thermal expansion are thermodynamically significant, producing an underestimation of heat capacities at high temperatures. 79 Moreover a divergence between harmonic-approximation phonon-modes and those in the crystal as modelled by Molecular Dynamics (MD) has been noted at quite low temperatures for imidazole and 5-azauracil. 80 For pyridine, which is a liquid at room temperature, the molecular motions are likely to be large in amplitude at the temperatures of the majority of experimental measurements and therefore should be more realistically modelled by a finite-temperature MD simulation. What Table II demonstrates is that both zero-point and thermal contributions are significant for the relative stability of the crystal structures, with the common observation that their inclusion reduces the energy differences between structures. 81 The success of the DIFF model relative to potentials specifically developed for modelling the organic state derives from having a functional form based on the theory of intermolecular forces and not parameterization from the experiment. The diversity of gas phase dimer motifs in the low energy hypothetical and experimental crystal structures emphasizes that a solid-state CSP samples the intermolecular potential energy surface much more extensively. A dimer structure only samples the potential around one configuration, whereas a crystal structure samples a wider range of relative orientations and close contacts from all the molecules in the nearest neighbor coordination sphere of typically fourteen molecules. The lattice summation samples the potential over a larger distance than a gas-phase dimer and the closest contacts in the crystals sample higher up the repulsive wall because of the attractive force from the second, and higher-order coordination spheres even at ambient pressure. The dispersion contribution is about 150% of the total lattice energy in the low energy crystals (Table I). Hence, the theoretically based distance-dependence of the long range forces is particularly important in the solid-state. Using the DIFF model with a spilt into C 6 , C 8 , C 10 is more theoretically justified than using an empirical C 6 only potential where the higher dispersion coefficients have been absorbed in the parametrization. A small error in C 6 could result in a large overall error due to the lattice summation.
A similar advantage in the distance dependence is seen when the long range electrostatic terms are given by distributed multipoles over atomic point charge molecules, though the main advantage of the higher atomic multipole moments is to correctly model the orientation dependence of the electrostatic forces due to lone pair and π electron density. These non-spherical features in the charge distribution also determine the anisotropy in the repulsion. The slightly closer contacts in solid-state and diversity of contact configurations means that the relative energies of the crystal structures are very sensitive to the anisotropy in the repulsion, as shown by a considerable re-ranking of the crystal structures when this anisotropy is not explicitly fitted in the potential (Fig. 3 of the supplementary  material).
The successful identification of form III as a structure which becomes relatively more stable under pressure is clear evidence of the importance of using a non-empirical anisotropic repulsion potential. It is not surprising that parameterizing an oversimplified functional form of the repulsive wall by fitting to ambient pressure crystal structures fails to extrapolate to the closer contacts sampled at moderate pressure. Applying pressure changes the intermolecular contact distances more than changing temperature and so probes interactions higher up the repulsive wall. 82,83 High-pressure recrystallization is a versatile route to generating new polymorphs, with structural properties modifying significantly with pressures around and above 1 GPa. [84][85][86] Hence more realistic, theorybased, non-empirical intermolecular potentials will be very important for more reliable exploration of structure-property relationships in organic crystals under pressure.
The second virial coefficients show that the DIFF model systematically over-estimates the intermolecular interactions (Fig. 4). We believe that these errors stem not from problems with the fitting procedure described by Misquitta and Stone 2 but from errors in the reference SAPT(DFT) interaction energies to which the DIFF model was fit. While a complete analysis of these problems would take us beyond the scientific scope of this paper, we can state that there are two main causes for the over-binding of the SAPT(DFT) interaction energies: (1) Interaction energy contributions from third to infinite order in the intermolecular interaction operator are approximated by the δ HF int energy. 87,88 This energy includes mainly higher order induction and exchangeinduction contributions, which are known to be important for capturing the effects of hydrogen bonding in water. However, this may not be suitable for configurations of pyridine in which the binding is primarily or dominantly from the dispersion interaction, [89][90][91] where this correction may lead to an overestimation of the binding.
(2) The choice of asymptotic correction to correct the charge density used for calculating the molecular properties, made by Misquitta and Stone, 2 may not be suitable for a strongly anisotropic (in shape) molecule such as pyridine. This is a subtle issue that is being researched in our group.
The first problem is a potentially serious one as the δ HF int term is needed for systems with hydrogen bonds 92 but is known to be inappropriate for dispersion-bound dimers for which it leads to an over-binding. 93,94 This term is sometimes excluded entirely 41,95 if the primary binding is from the dispersion energy as is the case for the benzene crystal. But it is as yet unclear what is to be done for a system such as pyridine which exhibits both hydrogen-bonding as well as dispersion-bonding. Work is currently underway in our groups, using CCSD(T) dimer calculations to better understand both problems, as it is essential for modelling pharmaceuticals where most crystal structures are a balance between hydrogen bonding, π-π stacking, and other dispersion interactions and conformational changes. While a pairwise additive intermolecular potential is sufficient for a complete potential energy surface for the isolated dimer, it only accounts for the dominant contributions in condensed phases. The only explicitly non-pairwise terms included in the DIFF model are in the polarization term which leads to a net attractive many-body contribution to the lattice energy, However, in a weakly bound molecular crystal like pyridine, where most of the binding arises from the two-body dispersion energy, the many-body dispersion energy may be expected to be large and repulsive. For example, it has been shown in Refs. 38 and 96 that the three-body non-additive dispersion contributes repulsively to the benzene crystal by as much as 7%-14% of the total lattice energy. We expect a similar contribution to arise in the pyridine crystal. In particular, the dense, strongly dispersion bound global minimum structure ( pyr 2) will be relatively destabilized by the three-body dispersion.
This study has shown the advantage of using nonempirical potentials for crystal structure prediction, particularly for form III which was obtained under pressure, where the repulsive wall is sampled in regions not used in fitting empirical potentials. However, it also shows how sensitive the relative thermodynamics are to the modelling assumptions. For example, pyr 2 is predicted to be more stable than form II, both at the level of the lattice energy Fig. 6 and taking into account thermal effects Table II, though the difference is considerably reduced on application of pressure (Table V of the  supplementary material) and its stability may be because of the neglect of the many-body dispersion term. However, what would be the consequence of a confident calculation that pyr 2 or any other structure was more thermodynamically stable than the observed polymorphs at accessible temperatures and pressures? The prediction of undiscovered more thermodynamically stable phases is a major justification 97 for the development and testing 8 of CSP methods, as this determines the risk of disappearing polymorphs, 98 helps justify devising experiments for finding new polymorphs 99 and can generally assist in the development of specialty organic materials. When polymorphs switch relative stability with temperature or pressure, it may often not be as obvious as a solid-state phase transformation but only apparent by recrystallization experiments in the presence of seeds of both polymorphs. It is difficult to obtain, let alone structurally characterize, the most stable crystalline forms of molecules that are liquid or gases under ambient conditions, 100 let alone change the experimental conditions sufficiently to vary the kinetics (already implicated in the isotopic polymorphism of pyridine), at low temperatures or at pressure 101 to compete with the nucleation of the known forms. The similarities between the known polymorphs of pyridine (Figs. 1 and 8) and the contrast to the different dimers in the coordination sphere in pyr 2 (Table I) emphasize that the observed structures may be kinetically favored. Hence, pyridine could well have alternative thermodynamically competitive polymorphs that have not yet been found. The DIFF model potential could be used for simulating liquid-phase pyridine, adding confidence to interpreting experimental data, 102,103 as recent simulations show that this is sensitive to the anisotropy of the electrostatic model. 46 Such simulations could suggest why the observed forms crystallize, by revealing a link between the liquid and solid state structures via the most readily formed nucleus.
Non-empirical potentials have considerable advantages over periodic electronic structure methods, or other advanced methods of calculating lattice energies of molecular crystals 104 in that they can be evaluated sufficiently readily to be used in CSP and for estimating the effect of temperature. Empirical potentials have the advantages of simpler functional forms, and the errors in the functional form, transferability assumptions, and method of simulation, such as neglect of zero-point and thermal effects, are partially absorbed into the potential. However, these advantages imply uncertainties in extrapolating to other conditions as required for solid state phase diagrams, 105 and empirical potentials cannot be expected to transfer to the gas phase, as demonstrated by the second virial coefficients (Fig. 4). In contrast, the DIFF and other non-empirical models 13,31 have the advantage that the approximations used are known and can be built upon. Although, the polarization energy is challenging to include in any force-field, this is being tackled in the development of the next generation force-fields and simulation codes. [106][107][108] The improved realism of intermolecular force-fields has to go alongside the further development of simulation methods and codes to include both zero-point effects and realistic temperature dependent dynamics.
In summary, there is clearly a conceptual advantage in using model intermolecular potentials where the approximations are known and controlled. The current potentials are not definitive but can act as a strong framework for the development of force-fields for organic molecules.

V. CONCLUSIONS
This paper has shown that a non-empirical distributed intermolecular pair potential (DIFF), derived from the theory of intermolecular forces and SAPT(DFT) calculations using the CamCASP and ORIENT programs, can be used for modelling the solid-state of pyridine with a realism that exceeds that of transferable empirical potentials that have been derived for modelling the crystalline state. Furthermore, this study stresses the importance of crystal structure prediction as a comprehensive way of testing the robustness of intermolecular model potentials. The DIFF model was particularly effective for studying the effects of pressure on the relative stability of structures, as shown by the identification of form III. Nevertheless, empirical potentials do have the advantage of absorbing many errors, including approximations in the simulation methods, and being more readily implemented in existing codes. Hence, contrasting CamCASP-derived potentials, and other models driven by the theory of intermolecular forces, with more approximate models, should produce a hierarchy of model force-fields that can be used with molecule-specific knowledge of the effects of various approximations.

SUPPLEMENTARY MATERIAL
See supplementary material for tables of CSP generated crystal structures, further calculations, and the potential models including parameter input files.