A non-empirical intermolecular force-field for trinitrobenzene and its application in crystal structure prediction

An anisotropic atom–atom distributed intermolecular force-field (DIFF) for rigid trinitrobenzene (TNB) is developed using distributed multipole moments, dipolar polarizabilities, and dispersion coefficients derived from the charge density of the isolated molecule. The short-range parameters of the force-field are fitted to first-and second-order symmetry-adapted perturbation theory dimer interaction energy calculations using the distributed density-overlap model to guide the parameterization of the short-range anisotropy. The second-order calculations are used for fitting the damping coefficients of the long-range dispersion and polarization and also for relaxing the isotropic short-range coefficients in the final model, DIFF-srL2(rel). We assess the accuracy of the unrelaxed model, DIFF-srL2(norel), and its equivalent without short-range anisotropy, DIFF-srL0(norel), as these models are easier to derive. The model potentials are contrasted with empirical models for the repulsion–dispersion fitted to organic crystal structures with multipoles of iterated stockholder atoms (ISAs), FIT(ISA,L4), and with Gaussian Distributed Analysis (GDMA) multipoles, FIT(GDMA,L4), commonly used in modeling organic crystals. The potentials are tested for their ability to model the solid state of TNB. The non-empirical models provide more reasonable relative lattice energies of the three poly-morphs of TNB and propose more sensible hypothetical structures than the empirical force-field (FIT). The DIFF-srL2(rel) model successfully has the most stable structure as one of the many structures that match the coordination sphere of form III. The neglect of the conformational flexibility of the nitro-groups is a significant approximation. This methodology provides a step toward force-fields capable of representing all phases of a molecule in molecular dynamics simulations.


I. INTRODUCTION
An accurate potential energy surface (PES) for the intermolecular interaction between organic molecules is a key requirement for simulating the properties of the gas phase and the liquid, the solid, and other condensed phases including crystalline polymorphs and the transitions between them. 1 The methods of realistically evaluating many condensed phase properties require an analytical model so that huge assemblies of molecules can be simulated over long time periods.Such analytical models are generally pairwise intermolecular potentials; however, non-additive many-body contributions also need to be considered when studying condensed phases.Furthermore, potentials for flexible molecules may depend on molecular conformation.Many molecules change conformation when packed with other molecules, which can result in changes in the molecular charge distribution that affect the intermolecular interactions.Hence, while model intermolecular potentials that are able to describe the properties in all phases of the inert gases, such as argon, have been available for decades 2 and considerable progress has been made on a highly accurate pair potential for water (see Refs. 3 and 4 and references therein), most atomistic modeling of condensed phases of organic molecules has used empirical forcefields.Force-fields have been heavily developed for biomolecular simulations, but the fitting to experimental data inevitably absorbs system-specific errors and so limits accuracy if they are used for molecules and properties outside the scope of fitting and validation. 5ence, advancing the computer simulation work requires a transferable approach to deriving non-empirical potentials, based on the theory of intermolecular forces, and building on the electronic level modeling of intermolecular interactions.
Since electron correlation is the cause of the dispersion, which is important and sometimes dominant to the long-range attractive force between organic molecules, even calculating dimer interaction energies requires high-level electronic structure treatments. 6,7s the charge densities of closed shell organic molecules are only slightly perturbed by intermolecular interactions, their intermolecular forces are well described by perturbation methods.Indeed, perturbational methods such as symmetry-adapted perturbation theory (SAPT), 4,8,9 and its variant based on density-functional theory [SAPT(DFT)], 6,7 are ideally suited for this problem as they are not only accurate, and also, SAPT(DFT) is computationally efficient.Additionally, perturbation theory provides a rigorous match between the short-range energies and the long-range analytic expansions using multipoles, polarizabilities, and dispersion coefficients.The size of organic molecules means that a density-functional charge distribution of each molecule is the most practical basis for developing a model intermolecular potential using SAPT(DFT) 6,7 to provide the essential link between electronic and atomistic level modeling of the intermolecular forces.
In SAPT(DFT), 10,11 the intermolecular interaction energy between a pair of rigid molecules is defined through the sum of energy components, elst + E (1)   exch + E (2) Here, E elst and E (1) exch are the first order electrostatic and exchangerepulsion energies, respectively.E (2)   IND is the second-order induction energy, which includes the exchange-induction energy, and E (2) DISP is the total second-order dispersion energy, which includes the exchange-dispersion energy, E exch−ind and E (2) The term δ HF int estimates the main error in truncating the expansion to second-order 12 and is dominated by the higher order induction energy contributions.For condensed phases, the usual summation over the intermolecular pair potential between all pairs of molecules in the system is an approximation (the pairwise additive approximation).The induction energy includes contributions from quantum charge-delocalization and classical polarization effects. 13While both terms are likely to be non-additive, it is the classical polarization contribution that is dominant in the condensed phase as it is long-ranged.This term is difficult to evaluate and include in simulations as it requires numerical iteration to self-consistently account for the mutual polarization of all components in the lattice. 14APT(DFT) and related supermolecular partitioning methods have been the basis of various methods of deriving non-empirical intermolecular potentials (see Refs. 15 and 16).Non-empirical potentials are particularly valuable for modeling energetic materials, where the key functional groups, such as NO 2 , have rather different charge distributions from the N and O atoms in well-parameterized biomolecular force-fields.The difficulties in handling explosives naturally leads to a lack of experimental data for empirical parameterization, and this has led to a desire for reliable computational prediction of condensed phase properties.Pioneering studies of the organic solid-state using analytical fits to SAPT(DFT) interaction energy calculations have concentrated on small energetic molecules, 17 such as RDX [18][19][20] and FOX-7 21 where predictive modeling requires reliable extrapolation into the repulsive region that is not sampled in fitting empirical potentials.These force-fields are in an isotropic atom-atom form for ease of use in molecular dynamics simulation programs.
This paper seeks a non-empirical model potential for use in Crystal Structure Prediction (CSP) studies of energetic molecules.The vision behind one of the earliest CSP methods was to predict whether an energetic molecule would crystallize in a sufficiently dense crystal to be a promising material prior to synthesis, 22 although CSP studies can also be used to help solve the crystal structures. 23,24The anisotropy of the atomic charge density arising from features such as lone pairs and π electron densities has proved to be important in realistic CSP studies, 25,26 with one of the leading family of methodologies being based on using a distributed multipole representation of the molecular charge density. 27While most CSP studies have used empirical isotropic exp-6 repulsion-dispersion potentials in conjunction with an anisotropic atom-atom electrostatic model, an early non-empirical potential that additionally included anisotropy in the repulsion 28 for C 6 Br 2 FClH 2 was successful in an international blind test. 29This potential completely neglected the induction contribution to the lattice energy. 28A model for the use of atomic dipolar polarizabilities has since been implemented in the organic solid state modeling code DMACRYS, 14 although the numerical iteration required means that the polarization energy cannot be included in structure optimization.A fully anisotropic nonempirical atom-atom model intermolecular potential for pyridine has been derived using the program CamCASP 30 and used within DMACRYS in CSP studies to study the polymorphs of pyridine. 16he non-empirical potential was successful in identifying the structure of a high-pressure polymorph 31 unlike the empirical repulsiondispersion potential FIT 14 that is often used in CSP studies. 16Thus, this paper seeks to extend this approach to predicting the crystal structures of energetic molecules.The use of an atomistic intermolecular potential is far more computationally efficient than using periodic dispersion-corrected density functional (DFT-D) calculations on hundreds of low energy crystal structures, as it is being increasingly used in CSP studies. 27Thus, further examples of evaluating the use of non-empirical intermolecular pair potentials in CSP are urgently needed.
The energetic molecule chosen for this study was the highexplosive trinitrobenzene (TNB), which is extremely volatile in its dehydrated powder form. 32It has a melting point of ∼400 K, 32,33 and although it was first discovered in 1883, 34 a crystal structure was not determined until 1972. 35The polymorphism of TNB (Fig. 1) was first reported in 2004 after the addition of trisindane in an attempted cocrystallisation 36 that resulted in the serendipitous crystallization of forms II (Z ′ = 2) and III (Z ′ = 1).The experimental data on forms II and III are limited to the melting points and crystal structures, with form III having the highest melting point.This, and a force-field lattice energy estimate, suggests 36 that form III is more thermodynamically stable than kinetically favored form I. Form III is estimated to be more stable than form I by 0.6 kJ mol −1 by periodic PBE-D3(BJ) lattice energy calculations. 37he crystal structures of TNB (Fig. 1) show that the crystalline conformations do not differ much in the NO 2 torsion angles, and consequently, a rigid-molecule intermolecular potential is worth developing as a first approximation.TNB is highly symmetrical in its gas-phase optimized geometry so that there are only five distinct atomic types.This type of rigid-molecule approximation was often reasonably successful for substituted aromatic molecules in early CSP studies. 38,39Incorporating flexibility into the force-field so that it accurately balances the inter and intramolecular forces is highly challenging, as shown by a contemporary study of α-RDX 20 as well as CSP studies on flexible pharmaceuticals. 40,41Indeed, larger changes in the NO 2 torsion angles would affect the charge distribution of the nitro group to an extent that would require reparameterization of the model potential for accurate lattice energies. 42e can develop non-empirical derived intermolecular forcefields (DIFFs) for TNB with an atomic multipolar expansion of long-range components VLR and an exponential model for the shortrange terms VSR.The functional form 1 for the interaction between molecules M and N containing atoms i and k of types ι and κ (types are C H , C N , N, O, or H) at a separation R ik and orientation Ω ik can be written as The electrostatic and induction terms are implicitly summed over the permanent atomic multipoles Q i t (where t = lk; l is the total angular momentum, k is the z-component of angular momentum, and l ≤ 4), induced dipoles ΔQ i t and the interaction tensors T ik tu are defined using the axes (Fig. 1) on the two molecules. 1The induced moments and the isotropic dispersion coefficients, C ικ 6 , C ικ 8 , and C ικ 10 , are calculated from atomic polarizabilities that are derived from the molecular charge distribution using linear-response Kohn-Sham theory together with suitable distribution techniques.The dispersion and polarization expansions are damped using the Tang-Toennies 43 damping function fn.All the short-range terms are modeled by an exponential term where the atomic repulsion anisotropy ρ(Ω ik ) is derived specifically for the molecule by fitting to SAPT(DFT) calculations, with guidance from the distributed density overlap model. 16he partitioning of the interaction energy and molecular properties into atomic contributions is based on the basis-space version 44 of the iterated stockholder atoms approach of Lillestolen and Wheatley. 45,46he DIFF force-field functional form is complex.However, the complication of the anisotropy is likely to be important for many molecules as there is a significant difference in the reported quality of fits to SAPT(DFT) dimer intermolecular potential energy surfaces from using isotropic or anisotropic models.One automated generation of isotropic model potentials reported a typical error of about 0.8 kJ mol −1 in the negative energy region, 47 whereas a different approach that included anisotropy exhibits average errors of 0.2 kJ mol −1 in the attractive region for a different wide range of interacting dimers. 48Given that lattice energy differences between polymorphs are often less than 2 kJ mol −1 , with about 80% calculated to be less than 4 kJ mol −1 , 49 the modeling of the organic solid state requires the best possible accuracy.The solid state also samples the potential energy surface at a much wider set of relative separations and orientations in the known and CSP generated crystal structures than those that are tested in dimer calculations.Thus, the ability to reproduce crystal structures and the relative energies of known and hypothetical structures generated in a CSP study is a severe test of the accuracy of a force-field.
In this paper, a set of DIFF potentials was developed using a combination of CamCASP, 30 Psi4, 50 NWChem, 51 and ORIENT. 52hese potentials have the same long-range terms but increase the cost of development, first by the inclusion of anisotropy in the short range term and then allowing the isotropic short range coefficients to be relaxed by fitting to the total interaction energy, calculated up to second-order SAPT(DFT) interaction energies. 30,52The size of TNB meant that we needed to make some changes in the DIFF potential development method compared to that used for pyridine 16 and The Journal of Chemical Physics ARTICLE scitation.org/journal/jcpwater; 31 however, the philosophy of the approach remains the same.The various DIFF model potentials were tested in a CSP (Z ′ = 1) study to assess whether form III was found and also to study the relative stability of all observed polymorphs.The aim is to advance our ability to model the condensed phases of energetic molecules such as TNB by developing a workflow for deriving a sufficiently realistic non-empirical potential and implementing it in a CSP study.

II. METHODS
The non-empirical potential derivation begins from the isolated molecular structure of TNB obtained by optimizing the isolated molecular conformation using the Gaussian09 program, 53 with the PBE0 functional [54][55][56] and the d-aug-cc-pVTZ Dunning basis. 57This molecular conformation was held rigid for the development of the force-fields and in all simulations, except those used to assess this approximation.The three non-empirical DIFF potentials were constructed using the workflow in Fig. 2.These had the same long-range terms but differed in whether the shortrange terms were isotropic, DIFF-srL0, or anisotropic with terms up to L = 2, DIFF-srL2.Additionally, the model parameters determined using first-order SAPT(DFT) energies were either allowed to relax against the total SAPT(DFT) intermolecular energies, indicated by "(rel)," or kept at their un-relaxed values, indicated by "(norel)."The comparisons were made with the non-empirical isotropic exp-6 model potential (FIT) 14 whose parameters were fitted to organic crystal structures and have widely been used in CSP studies.The FIT model was either combined with multipoles computed using the Gaussian-Distributed Multipole Analysis (GDMA) 58 method with the PBE0/6-31G(d,p) charge density, denoted as FIT(GDMA,L4), or the same ISA distributed multipoles used in the DIFF potentials [FIT(ISA,L4)].Here, "L4" denotes that these multipoles contained terms of maximum rank of 4 on all sites.

A. Numerical details for electronic structure calculations
The molecular properties of TNB were computed using the modified basis-space iterated stockholder atoms (BS-ISA) 44,59 and the ISA-Pol 59 algorithms implemented in CamCASP. 30The Kohn−Sham orbitals and orbital energies needed for these calculations were calculated using the NWChem.6.6 60 code.We used the CS00 (Casida-Salahub) asymptotically corrected 61 PBE0 functional and the augA-Sadlej basis set, 62 a Sadlej-pVTZ basis that has been augmented with additional diffuse functions and functions of f -symmetry.This basis set was chosen as a cost-efficient yet accurate alternative to the larger d-aug-cc-VTZ basis set (see Fig. 1 of the supplementary material).The asymptotic shift (Δ) for TNB was calculated using the experimental ionization potential of 0.4028 a.u. 63nd the energy of the highest occupied molecular orbital, also computed with PBE0/augA-Sadlej, to give Δ = 0.0539 a.u.The linearresponse density functional calculations were performed using the hybrid ALDA+CHF kernel in CamCASP 6.0. 30he SAPT(DFT) dimer interaction energy calculations used to determine the induction energies and regularized induction energies for the polarization models were computed using CamCASP 6.0 with the above numerical details except that we used the augcc-pVTZ basis in the MC+BS format using a 3s2p1d1f midbond basis from the CamCASP basis library.Note that in CamCASP 6.0, the second-order exchange-induction energy is computed without the single-exchange (S 2 ) approximation using a closed-shell implementation of the general energy expression derived by Schäeffer and Jansen. 64The first-order SAPT(DFT) interaction energies were computed using the above basis but with CamCASP 5.9.However, for reasons of computational efficiency, the interaction energies computed up to second-order were performed using the CamCASP with DFT orbitals and orbital energies calculated with the Psi4 1.1 code.In this case, the asymptotic correction was changed to be the GRAC correction 65 as Psi4 does not support the CS00 scheme.The two asymptotic correction schemes use the same long-range form but differ in the manner in which the long-range functional is spliced with the PBE0 functional.Consequently, some differences are to be expected, but these are likely smaller than the overall errors of the approach

B. Long-range terms
The molecular charge density was partitioned into atomic charge densities to obtain the permanent distributed multipole moments Q i t of rank l ≤ 4 using the most recent iterated stockholder atoms algorithm 44 implemented in using CamCASP 6.0, the ISA-A algorithm. 59The ISA-A method used the augA-Sadlej main basis with PBE0/CS00.The additional basis sets needed for the ISA calculation included the aug-cc-pVTZ 66 basis for the density-fitting and the aug-cc-pVQZ-RI basis modified with the ISA Set2 s-functions from the CamCASP basis library.The permanent atomic multipoles Q i t of rank up to hexadecapole were obtained for each atom. 59he ISA-Pol 59 algorithm as implemented in CamCASP was used to derive the anisotropic dipolar (L1) polarizabilities α ik tt ′ for use in modeling the polarization energy and the isotropic L3 atom-atom dispersion model, 67 i.e., the C ικ 6 , C ικ 8 , and C ικ 10 . 14,67The long-range multipolar properties used in the DIFF potentials are given in the supplementary material.

C. Short-range potential parameters
The parameters in the short-range potential include the terms in the exponential part of the Born-Mayer potential and also the damping parameters for the dispersion and polarization models.The latter were determined by direct fitting to selected SAPT(DFT) second-order energies.However, for the Born-Mayer repulsive terms, we adopted the multi-stage fitting technique, which has been extensively described by Misquitta and Stone. 16In this approach, the initial parameter estimates are obtained via the distributed densityoverlap parameterized against a dense set of first-order SAPT(DFT) energies.This provides the potential DIFF-srL2(norel) and a model with only isotropic repulsion DIFF-srL0(norel).
The first-order exchange, E exch , and electrostatic, E elst , interaction energies were computed using a combination of NWChem.and CamCASP 16 for 2000 pseudo-random dimer configurations generated using the quasi-random Sobol sequence and Shoemake's uniform random rotations algorithm, 16 implemented in CamCASP and used previously for molecules as non-spherical as pyridine and C 6 Br 2 ClFH 2. 28 The distributed electrostatic energy, V elst , was then elst ) for each dimer configuration.
The fitting of the first-order short range energies to the sum of atom-atom exponential terms in Eq. ( 3), E (1) allowing for all the symmetry allowed shape-anisotropy terms of rank 1 or 2 (see Table VI of the supplementary material), is an ill-defined problem.Hence, the parameters are initially estimated by using the ISA atomic charge densities and assuming that E The isotropic model was derived by using the fits to the distributed density overlap model to provide the initial parameters.These parameters were relaxed in stages to fit the 2000 E (1) pen energies using constraints to keep the relaxed parameter values close to the initial overlap model values.The anisotropic model was derived in the same way, with the fitting to the distributed density overlap model being done systematically to determine which anisotropic terms made a significant improvement on the isotropic fits.Once the choice of anisotropic terms had been made, these parameter values were tightly constrained in the relaxation stage.Further details are in 2.4 of the supplementary material.
The final potential parameters required for Eq. ( 3) are those required for damping the multipolar induction and dispersion expansion at a short range to allow for molecular overlap using the Tang-Toennies damping function, 43 fn Note that, in principle, the damping parameters are dependent on the sites involved.Indeed, this has been demonstrated for the water dimer, 4 but due to limitations of the DMACRYS 14 code, here, we are restricted to single damping parameters for the polarization and dispersion models.This results in a degradation of the quality of the resulting models and the need for compromises.A preliminary estimate of β disp = 1.8 a.u.can be derived 59 from the ionization potential of TNB.This was investigated by comparing the secondorder dispersion energies E (2) exch−disp and the multipolar dispersion energies V (2) DISP of DIFF-srL2(norel) of both the configuration scan and crystal structure dimer datasets (defined below) for a range of values of β disp .The estimate β disp = 1.8 a.u.gave a good approximation of the dispersion energy at short-range, but a value of β disp = 1.65 a.u.gave the best fit to the data (see 2.3.2 of the supplementary material) and so was used in all the DIFFs.
Defining the polarization damping parameter is more involved as this parameter is more site-dependent than the dispersion damping and also because the polarization damping needs to be determined by fitting not to the second-order induction energy, but to the second-order polarization energy defined by regularized SAPT(DFT).We compared E (2) POL of DIFF-srL2(norel) for various values of β pol .It has been suggested that 68 but this damping is insufficient.No particular value of β pol performed perfectly across all separations, particularly for repulsive configurations (see 2.2.2 of the supplementary material); however, β pol = 1.0 a.u. was selected as it gave the smallest errors around the equilibrium distances of the low energy dimers and for the various two-molecule contacts in the crystal (see Figs.The SAPT(DFT) energies to second-order include a chargedelocalization energy, Both the CD and E (2) POL include second-order exchange-induction effects. 13 CD is very small for TNB except for highly repulsive configurations (E int > 150 kJ mol −1 ) and, in most dimers, is negligible in comparison with the polarization energy (see 2.6 of the supplementary material).Contributions to the interaction energy from third and higher-orders in the interaction operator are usually included via the δ HF int term; however, this term is known to overestimate these effects in complexes with little or no hydrogen bonding, as is the case for TNB.Furthermore, we do include higher-order polarization effects through the iterated polarization model.After investigation (see 2.6 of the supplementary material), we have not included the δ HF int term in this study.Finally, three datasets of TNB dimer intermolecular energies, DISP , were calculated using the dimer configurations described below.The configuration scans and the pseudo-random dataset were used for refining the isotropic short-range parameters ρ ικ 00 and α ικ 00 to absorb errors from the partitioning of the energies to give DIFF-srL2(rel) (see 2.5 of the supplementary material for details).A set of dimer orientations derived from the CSP-generated crystal structures was used as an independent test of the quality of all the model intermolecular potentials for TNB.

Generation of datasets of SAPT(DFT) dimer interaction energies
a. Gas-phase dimer orientation configuration scans.The gas phase dimers corresponding to minima on the DIFF-srL0(norel) were found using the basin-hopping algorithm, a two-part iterative global optimization treatment that employs a global stepping algorithm with local minimizations at each step 69 using ORIENT, 52 as a good way of locating minima on a bumpy potential energy surface.For TNB, an energy corresponding to kT at a temperature of 500 K determined the acceptance criterion for up to 100 000 steps with a maximum displacement Δx = 0.10 and rotation Δθ = 30 ○ for each step.The CLUSTER procedure in CamCASP was used to find the unique dimer geometries orientated by their axes of inertia.
The configuration scan dataset, used for fitting the damping parameters and illustrating the potential fits, comprised up to 33 points for each dimer orientation, with the center of mass separation scaled from the minimum energy structure by a factor between 0.76 and 1.4.The set of dimers in this dataset was considered too correlated to use in the relaxation process, so a truncated dataset of only 9 points per dimer orientation (45 points in all) was used in the relaxation of the short-range parameters.
b. Pseudo-random dataset.A further 1000 random configurations were added to the 2000 generated for the first order energies by restarting the quasi-random Sobol sequence.The energies of these 3000 pseudo-random dimer configurations were evaluated using DIFF-srL2(norel).E

ARTICLE
scitation.org/journal/jcpminima and the 282 pseudo-random structures, as described in 2.5 of the supplementary material.

E. Crystal structure prediction (CSP) and modeling for TNB
The TNB CSP study employed CrystalPredictor 2.0.1 71,72 to use the optimized isolated molecular structure of TNB to generate Z ′ = 1 crystal structures within the 59 most probable space groups 73 using the FIT potential and the point charges from the ISA multipole model [FIT(ISA,L0)].This generated 13 956 unique structures that were lattice energy minimized using DMACRYS 14 with the three non-empirical potentials [DIFF-srL2(rel), DIFF-srL2(norel), and DIFF-srL0(norel)] and the empirical isotropic exp-6 potential FIT 14 with the two distributed multipole electrostatic models [FIT(ISA,L4) and FIT(GDMA,L4)].Starting the lattice energy minimizations from the Crystal Predictor/FIT(ISA,L0), output structures ensured that more of the differences in the model potential energy surfaces were sampled than the usual hierarchical approach in CSP studies of minimizing from the lattice energy minima obtained with the previous model in the hierarchy of expected accuracy.
The structure optimizations were carried out without the polarization term.A Hessian check confirmed that each structure was a minimum in the lattice energy, and the structures clustered to remove duplicates.The polarization term was subsequently included as a single point energy evaluation, iterated so that the induced moments were maintained the crystal symmetry.Corresponding calculations were performed on the experimental crystal structures after the molecular conformation had been replaced by the optimized isolated molecular structure.

Dimer and crystal structure comparisons
The crystal structure similarity tool in Mercury 3.6 74 was used to determine the number of molecules, n, in a maximum coordination cluster (30 for crystals, 2 for dimer comparisons, and 1 for molecular conformations), which matched within a 30% distance in intermolecular atom-atom distances and 30 ○ in interatomic intermolecular angles, and the corresponding root mean square difference, RMSDn, excluding hydrogen atoms.

A. Derivation and validation of the DIFF potentials
The planar optimized structure has 5 unique atoms, all of which have anisotropic charge distributions within the ISA-A partitioning, as shown in Fig. 3.The corresponding long-range potential parameters are given in Tables I-III of the supplementary material.However, the derivation of the short-range anisotropy showed that only anisotropy on O and N made a significant difference, reducing the weighted root mean squared (rms) residuals in reproducing the E (1) pen energies from 1.16 kJ mol −1 to 0.77 kJ mol −1 (see Table V of supplementary material).The required repulsion and its anisotropy can be explained by considering that these are the only atoms where the repulsive wall is sampled over a range of orientations.

TABLE II.
The un-weighted root mean squared deviations (RMSD) in kJ mol −1 of the model potentials from the SAPT(DFT) calculated energies, /N, for various datasets.The 160 configuration scan dimer energies and the 282 pseudo-random dimers energies were used in the relaxation of DIFF-srL2(rel).The crystal dimers points are a validation set, not used in the fitting, that all have with some atoms in van der Waals contact.The configuration scan with the gas-phase minima orientations with R-scalings ≥0.85 also samples the orientations found in condensed phases and excludes the highly repulsive short-range parts of the configuration scans.The RMSDs of the various models need to be considered together with the range of intermolecular energies of the dimers given in the second row.The Journal of Chemical Physics

ARTICLE scitation.org/journal/jcp
The dimer structures that correspond to minima in the intermolecular pair potential are fairly insensitive to the model, with there being four structures that have the planes of the molecules roughly parallel but significantly displaced and one where there is a large angle between the molecular planes (Table I).However, the relative energies and order of the minima differ significantly with the model potential.
The relaxation of the isotropic repulsion parameters to the total second-order energies can be very dependent on the choice of the dimer configurations included if these dimers are not chosen in a balanced way.If the fitting was just done to the configuration scan dataset (with 33 dimers for each of the 5 dimer orientations), there was an improvement representing those energies, but the energies of the crystal dimer dataset became significantly worse (see 2.5 of the supplementary material).This is a reflection of the heavily correlated dimers in the fitting set and that these orientations are not important in the crystalline phase.However, relaxation to the balanced dataset of pseudo-random dimers and some configuration scan points does improve the fit to the total energies, as is just detectable in Fig. 4 for the configuration scans.Each component is well represented by the model potential, apart from the polarization energy at a short range.The real test of the model potentials is in Fig. 5 for the independent test of the crystal dimer dataset.The relaxation has somewhat improved the model potential for the crystal dimers, although it was expensive to generate sufficient total intermolecular energies {Eint[SAPT(DFT)]} so that the pseudo-random dimers covered sufficient orientation space for the relaxation step to be at all worthwhile.
The overall relative success of the different model potentials in representing the second-order energies is shown in Table II.This confirms that the model potentials give a good representation of the SAPT(DFT) energies, including the relative orientations sampled in crystal structures, as well as to the points used in the weighted relaxation process (see 2.5 of the supplementary material), many of which are significantly up the repulsive wall.

B. Modeling the solid phase: Observed and CSP generated crystal structures of TNB
The CSP lattice energy landscape for the final potential DIFF-srL2(rel) is shown in Fig. 6.The most stable and densest structures are all very similar to form III with the same 30-molecule coordination sphere.The global minimum structure is considerably more stable than the lattice energy minima reached by optimizing from the experimental form III structure, by 12 kJ mol −1 , but overlays 30 molecules with an RMSD 30 of 0.6 Å.It seems likely that many, if not all, structures with a 30 molecule overlay with form III are in shallow potentials energy wells and would become the same dynamic structure in a MD simulation under ambient conditions, i.e., correspond to form III if this lattice energy landscape (CSP_0) were reduced by allowing the molecular motions. 75The lowest energy structure that does not approximate to form III (the Pbca structure, TNB No. 55 in Fig. 6) has two layers of molecules in common with form III but differs by the third layer (see Fig. 12 of the supplementary material).
The majority of computer-generated structures are less dense than form III, although denser than form II, and more stable than form I. A selection of the most diverse structures is shown in Table III, which emphasizes that only some of the crystal structures contain a dimer related to the gas phase minima (see Fig. 11 of the supplementary material), and none contain any of the three most stable dimer configurations.Hence, the CSP with DIFF-srL2(rel) is successful, within the limitations that only Z ′ = 1 structures were generated, and so, forms I and II could not be found in the search.The balance of contributions to the lattice energy varies considerably over the crystal structures (Table III), particularly the dominant dispersion energy, although this is partially offset by the variation in the short-range energies, as the denser structures are most stabilized by the dispersion.The polarization energy is significant, approaching half the electrostatic energy in some cases.Differences in the polarization energy account for a large proportion of the variation in lattice energies between the structures that approximate form III.
The relative stability of the CSP generated structures is very dependent on the potential energy model, as shown by the different crystal energy landscapes in Figs.8-10 of the supplementary material and Fig. 7.The CSP study with the model potential commonly used for pharmaceuticals [FIT(GDMA,L4)] is almost useless, with all the known forms being unrealistically less stable than the hypothetical structures.Changing the electrostatic model to use the ISA partitioning together with a better quality charge density brings the structures within a more plausible energy range.
All the non-empirical DIFF potentials give the structures in a plausible energy range and have form I as metastable to forms II and III but differ in the relative rankings of forms II and III and the selected unobserved structures.The introduction of repulsion anisotropy introduces some reranking.The relaxation of the parameters by fitting to the second-order energies makes some difference, changing the relative stabilities of forms II and III and the structures that approximate form III from other hypothetical structures.
All the non-empirical models were lattice energy minimized without the polarization energy, which was then added as a singlepoint correction.Hence, the changes in relative energies include a contribution from small changes in the structure.The exception is the comparison between the final potential DIFF-srL2(rel) before (see Fig. 8 of the supplementary material) and after the polarization term (Fig. 6).This shows that the polarization energy does make a considerable difference in spreading out the energies, including the energy range of the structures that approximate form III.

ARTICLE scitation.org/journal/jcp
However, the improvement of the electrostatic model within the model potential has the greatest effect on the CSP study.

IV. DISCUSSION
The final derived DIFF-srL2(rel) potential is highly successful in that it represents the intermolecular energies as calculated by SAPT(DFT) to a good accuracy over a wide energy range (Table II).The use of distributed multipoles, polarizabilities, and dispersion coefficients guarantees the correct long-range behavior, but the balance of the long and short-range terms is well represented in the potential energy wells for the low energy dimers (Fig. 4).This model potential has successfully translated to the solid state, providing a realistic CSP study for TNB.It is a significant improvement on a CSP study using the semi-empirical model potential [FIT(GDMA,L4)], which has been mainly used for pharmaceuticals.Indeed, early CSP studies of some chloronitrobenzenes 39 struggled to find an acceptable model potential.The improvement in the model potential was most dramatic on improving the distributed electrostatic model as the change in both the CSP (Fig. 7) and gas-phase dimer (Table I) results was greatest in going between FIT(GDMA,L4) and FIT(ISA,L4).However, the relative ranking of the minima in both phases was sensitive to the differences between the DIFF potentials.

A. The approximation of a rigid TNB molecule
This study has assumed that TNB is rigid and remains in the highly symmetrical planar conformation of the ab initio optimized isolated molecule upon crystallization.Analysis of the known polymorphs in Fig. 1 shows that this is an approximation, and the nitro groups twist somewhat to give non-planar structures.The most readily crystallized structure, form I, has a nitro group torsion that is more than 30 ○ from planar in one of the two independent molecules.We estimated the effect of this conformational adjustment on crystallization by extracting the experimental molecular conformations from the crystal structures, recalculating the ISA multipoles and polarizabilities, and reoptimizing the crystal structures keeping the observed conformations rigid.This significantly improved the reproduction of form I and stabilized the lattice energies for forms II and III (see 3.1.1 of the supplementary material).However, even without including any energy penalty for the conformational change, the recently discovered forms II and III were more stable than form I. This sensitivity to the conformational change limits the conclusions that can be drawn from the CSP study of TNB.However, it seems very likely that form I is kinetically favored under most crystallization conditions, but the recently discovered forms II and III are more stable and that further polymorph screening and detailed crystallography may reveal more polymorphs or disorder in TNB.Ideally, a further CSP would be done in which the NO 2 groups of TNB could change conformation in response to the packing forces.This could be done by analytically rotating the multipoles during the structure relaxation, 42 but a final revaluation of at least the multipoles for each crystalline conformation would be needed for the final lattice energies. 42

B. Many-body terms
A second approximation in the use of DIFF pair potentials in condensed phases is the modeling of the non-pairwise additive terms, i.e., the many-body terms that involve three or more molecules.The only non-additive term included in the DIFF potentials is the polarization energy and its exchange counterpart.The polarization contribution to the lattice energy is relatively large (Table III), several times the polarization energy per molecule for the dimers extracted from crystal structures (|E ); see Fig. 2 of the supplementary material}.This is in contrast to simple ionic solids, where the net electrostatic field at each ion in the crystal can be zero by symmetry so that the dipolar polarization energy is zero, although there would be a huge polarization energy for a pair of ions.The variation of the polarization energy between different crystals is significant (Table III).However, the fitting of a molecular damping parameter β pol was not very satisfactory for TNB (see Fig. 2 of the supplementary material), and this could have made the polarization energy sensitive to the distance in the nearest neighbor contacts.This coupled with the uncertainty in the applicability of the δ HF int correction and not including the polarization energy in the lattice minimization could have led to an overestimation of the relative differences in the polarization contribution to the lattice energies of the form III type structures (Fig. 6).
The modeling of the polarization within the specific crystal structure via atomic polarizability tensors 76 is an advance on the isotropic polarizabilities within the AMEOBA force-field or the calculation of the distributed multipoles within a polarizable continuum model with a dielectric constant typical of those of organic crystals.The latter approach is often used in CSP studies, 77 as it often improves the relative energies between structures with different types of hydrogen bonding.We note that the molecular three-body Axilrod-Teller-Muto dispersion contribution was not included in modeling the crystals, and this would have provided a repulsive term that would share the sensitivity to the nearest neighbor atomatom distances and at least partially cancel the attractive polarization term.Hence, it is important to extend our understanding of the many-molecule terms in condensed phase modeling, noting that the physics of many-body (non-additive) contributions limits the accuracy of the lattice energy evaluated from a very accurate two-body model such as the DIFF.

C. The partitioning into atoms (ISA) and effect on relaxation
This CamCASP-derived TNB DIFF model is the first time the ISA partitioning of the molecular charge density into atoms has been used for all the components of the potential for a molecule larger than water. 4These atomic charge densities were used to derive the atomic multipoles, 44 polarizabilities, 59 and dispersion 59 parameters and, via the distributed density overlap model, to provide a first estimate of the repulsion potentials and form of anisotropy. 16The more rigorous ISA partitioning method provides maximally spherical atoms, thus minimizing the number of important anisotropy terms.From this approach, we were able to identify the significant anisotropy on the O and N atoms in TNB (Fig. 3) and thereby construct a model with short-range anisotropy on these two atom types to significantly improve the model potential (see Table V of the supplementary material).The ISA partitioning also allows the use of better wavefunctions and hence improved polarizabilities and C ικ 6 , The Journal of Chemical Physics ARTICLE scitation.org/journal/jcpC ικ 8 , and C ικ 10 dispersion coefficients, which is particularly important given how the dispersion dominates the lattice energies (Table III).The use of the ISA partitioning to represent each individual component of the interaction energy is expected to reduce the errors in each contribution and hence reduce the need to relax the parameters to fit the total energies.Certainly, the unrelaxed model potential was already a good approximation for TNB and resulted in nearly the same relative energies of the crystal structures in the CSP study (Fig. 7).This is a promising finding for the development of potentials for large molecules, as the cost of deriving the DIFF-srL2(rel) potential was significantly more than the DIFF-srL2(norel).This is due to the need to calculate several hundred second-order dimer interaction energies, sampling a wide range of relative orientations, in order for the relaxation to be unbiased and hence an improvement (see 2.5 of the supplementary material).It is noteworthy that there are relatively few TNB gas-phase dimer structures (minima in the dimer potential energy surface, Table I), and so, the properties of TNB in the gas-phase would sample a very limited portion of the dimer intermolecular potential surface.The condensed phases sample a wide range of relative orientations (Table III) although the relative orientations in van der Waals contact are rarely those of the gasphase dimers.Indeed, the non-empirical potentials have the further advantage of accurately representing the repulsive wall, which would be sampled when the crystals are compressed. 5The use of the theory of intermolecular forces in the DIFF approach produces model pair potentials that should describe the intermolecular pair potential surface over the configurations sampled in all phases.[80]

D. Further work
An anticipated advantage of the ISA partitioning of the charge density is that the changes in the atomic potential parameters with conformation may be limited to the physical redistribution of charge with conformation 42 (i.e., the changes that go beyond the geometric changes that can be done using analytical tensor rotation).This could greatly improve the efficiency of modeling the changes in conformation that occur on crystallization.The CrystalPredictor 81,82 and CrystalOptimizer 82,83 approach to CSP models the effects of conformational changes by building up databases of conformational energies, associated forces, and distributed multipoles as a function of the torsion and bond angles that vary between polymorphs and using local approximate model interpolations.The costs of using the more accurate molecular charge densities found necessary in this study may be mitigated by the better transferability of the potential parameters with conformation.It is also possible that a DIFF-style intermolecular force-field and a separately parameterized atom-atom intramolecular force-field for the conformational energy penalty 84 could be used to efficiently generate and rank crystal structures in a CSP study.5][86] ) There is a great need to have force-fields that are accurate enough to give realistic relative stabilities of observable and hypothetical crystal structures generated by CSP and can be used in molecular dynamics calculations to simulate the crystals under ambient temperature and determine the effect of varying the temperature and pressure. 75The DIFF approach could provide such force-fields: DIFF-srL0(norel) potential could, in principle, be used in TINKER 87 after adjusting the atomic multipoles to have a maximum rank of 2.

V. CONCLUSIONS
We have derived a set of progressively more accurate and detailed non-empirical model intermolecular pair potentials for trinitrobenzene using the CamCASP and ORIENT programs.Central to these potentials is the Iterated Stockholder Atoms (ISA) partitioning of the molecular charge density into atomic charge densities using the basis-space ISA algorithm (BS-ISA) implemented in the CamCASP suite of codes.The potentials include analytic long-range parameters from the distributed atomic multipole tensors, distributed dipolar polarizability tensors, and distributed isotropic C ικ 6 , C ικ 8 , and C ικ 10 dispersion coefficients, corresponding to a high-quality molecular charge density, and frequency-dependent molecular response functions.The long-range parameters were derived using the BS-ISA and ISA-Pol algorithms using the CamCASP code.The short-range potential parameters were determined through a hierarchical algorithm based on the distributed density-overlap model using the BS-ISA expansions for the atoms-in-a-molecule densities and SAPT(DFT) interaction energies.This algorithm involves multiple computationally well-defined steps that established a minimal set of molecule-specific atomic anisotropy terms.The initial short-range parameters in the analytical models were determined by fitting to an extensive set of computationally inexpensive SAPT(DFT) first-order exchange and penetration energies, E exch + E (1) pen .The Tang-Toennies short-range damping of the dispersion and polarization used a molecular parameter that was fitted to second-order dispersion and polarization energies, respectively, although these parameters could have been estimated.The resulting model potential, DIFF-srL2(norel), gave a useful crystal structure prediction (CSP) study for TNB and reproduced the dimer interactions energies very well.
The SAPT(DFT) interaction energies up to second-order were then considered, with the charge-delocalization term found to be negligible and the δ HF int term found to be insignificant and therefore neglected.A dataset of SAPT(DFT) dimer energies, E int = E DISP , was used to relax the isotropic short range parameters in order to absorb the errors in modeling the separate contributions and to also absorb the charge-delocalization energy.This final model, DIFF-srL2(rel), further improved the ability to model an independent dataset of total interaction energies of crystalline dimer configurations (not used in the potential derivation) with an RMSD of 0.5 kJ mol −1 .The improvement was sufficiently small to suggest that using ISA partitioning and SAPT(DFT) firstorder calculations to produce a model for each significant contribution to the intermolecular potential could be extended to producing an accurate anisotropic atom-atom model intermolecular potential for much larger molecules than TNB.
The non-empirical DIFF intermolecular potentials were used in successful crystal structure prediction (CSP) studies of TNB, despite sampling a very different set of relative orientations of the molecules The Journal of Chemical Physics ARTICLE scitation.org/journal/jcpthan the minima in the dimer potential energy surface.The relative energies of the observed and computer-generated crystal structures of TNB were reasonable, unlike those generated using an empirical model potential frequently used in CSP for pharmaceuticals.The study supports other evidence that the recently discovered form III of TNB is the most stable polymorph.The chief weaknesses of the intermolecular potential model, from both theory and the results, were in the treatment of the nonpairwise additive (many-molecule) terms and in the assumption that the molecule was rigid.We were unable to include non-additivity in the dispersion model, and the use of a single damping parameter in the polarization model may have resulted in the many-body polarization energy being too sensitive to the crystal structure.However, the rigid molecule approximation was probably the more important limitation for modeling the crystal structures of the known polymorphs and the CSP study, as small changes in the NO 2 torsion angles can significantly improve the packing of the molecules in some crystal structures.
This study makes significant progress toward a methodology for deriving non-empirical anisotropic atom-atom force-fields for organic molecules that would be capable of realistic modeling of all phases in molecular dynamics simulations.

SUPPLEMENTARY MATERIAL
See the supplementary material for the justification for choice of basis set for SAPT(DFT), the TNB atomic properties used in the DIFF potentials, the definition of polarization energy and damping the multipolar polarization energy, the distributed dispersion model, the derivation of the repulsion anisotropy, the refinement of the potential parameters, and discussion of the omitted terms and for further details of the crystal structure modeling of TNB, reproduction of experimental polymorphs, effect of conformational flexibility, sensitivity to the force-field, CSP results with different force-fields, and the CSP generated structures for DIFF-srL2(rel).

DEDICATION
This paper honors Ruth Lynden-Bell FRS, an outstanding theoretical chemist, who has mentored and inspired many students, including two of the authors, to pursue careers in academia during her career at the University of Cambridge and Queen's University Belfast.

FIG. 1 .
FIG.1.The molecular and crystal structures of the energetic polymorphs of trinitrobenzene (TNB).36The form, Cambridge Structural Database reference code, temperature of the structure determination at ambient pressure, density, and NO 2 torsion angles (ϕ) are given below the visual representation of each experimental structure constructed using Mercury 3.6.The molecular diagram defines the atomic types and the molecule-fixed axes (red) and local atom-fixed axes (blue) used for the DIFF models, with the local axes for C N and N and C H and H being parallel.

( 1 )
SR is proportional to the overlap of the charge distributions for each of the 15 different atom-atom interactions in TNB, 3 and 4 of the supplementary material).These values of β disp = 1.65 a.u. and β pol = 1.0 a.u.were used in Eq. (3) to complete the parameterization of the isotropic DIFF-srL0(norel) and anisotropic DIFF-srL2(norel) model intermolecular potentials.

D
. Improving DIFF by relaxing parameters to reproduce SAPT(DFT) interaction energies 1. Definition of SAPT(DFT) total (second order) interaction energies

( 2 )FIG. 3 .
FIG. 3.The anisotropy of the charge distribution for each atomic type in TNB, as shown by the electrostatic potential (eV) arising from the cumulative effect of the ISA multipoles of ranks 1 to 4 on the iso-density surface of 10 −3 e/bohr 3 .The scale is +0.4 eV (red) to −0.4 eV (blue).

TABLE I . FIG. 4 .
FIG. 4.The radial dependence of the E int [SAPT(DFT)] energy (points) for the gas-phase dimer orientations (the configuration scans) and the model potential V int (dashes) for the different contributions in DIFF-srL2(rel).The total E int [SAPT(DFT)] and short range E (1) SR energies are also given before relaxation, i.e., for DIFF-srL2(norel).

FIG. 6 .
FIG. 6.The lattice energy and density of the 200 most stable CSP generated crystal structures of trinitrobenzene calculated with DIFF-srL2(rel).Each point represents a mechanically stable crystal structure classified by its space group.The lattice energy minima obtained by minimizing the experimental structures with the same rigid, planar molecular structure, and intermolecular potential are shown by the filled symbols corresponding to their space group.All CSP generated structures that had a 30/30 molecule overlay with form III are indicated by the pentagon and cross.The details for the 30 most stable structures and their structure identifier TNB number are presented in Table X of the supplementary material and all structures in .resformat are available on request.

FIG. 7 .
FIG. 7. Relative lattice energies of the observed and selected computer generated crystal structures of TNB inTable III relative to form III as a function of the model intermolecular potential.

TABLE III .
Solid-state energy contributions (in kJ mol −1 ) for DIFF-srL2(rel) in the observed and the most diverse hypothetical structures.The structure labels are the order of stability after generation by CrystalPredictor, i.e., with the FIT(ISA,L0) empirical point charge model.