A combined on-the-fly/interpolation procedure for evaluating energy values needed in molecular simulations

We propose an algorithm for molecular dynamics or Monte Carlo simulations that uses an interpolation procedure to estimate potential energy values from energies and gradients evaluated previously at points of a simplicial mesh. We chose an interpolation procedure which is exact for harmonic systems and considered two possible mesh types: Delaunay triangulation and an alternative anisotropic triangulation designed to improve performance in anharmonic systems. The mesh is generated and updated on the fly during the simulation. The procedure is tested on two-dimensional quartic oscillators and on the path integral Monte Carlo evaluation of HCN/DCN equilibrium isotope effect.


Switzerland
(Dated: November 19, 2021)

I. INTRODUCTION
Accurate evaluation of the Born-Oppenheimer potential energy surface of a molecular system is essential for predicting its dynamical and equilibrium properties. Numerous advances in the algorithms used for this problem,1,2 combined with the increasing computational power available to researchers, have made it possible to couple on-the-fly ab initio evaluation of the potential energy even with path integral 3-5 or semiclassical 6-11 dynamics algorithms.
Unfortunately, such approaches are still computationally expensive and, for long simulations requiring a very large number of potential energy values in the same region of configuration space, it is reasonable to instead generate a mesh of points at which accurate ab initio calculations are made and then fit a function to reproduce their potential energy values or some other potential quantities that become bottlenecks of the calculation, such as Hessians of the potential energy. 12,13 For that purpose, a multitude of methods has been proposed, from modified Shepard interpolation 14-17 to more sophisticated approaches, 18 including those based on interpolating moving least squares, 19,20 Gaussian process regression, 21,22 and neural networks. 23-26

We aimed for a procedure that would interpolate energies from stored data evaluated at points of a simplicial mesh and that would be comparable to Shepard interpolation in terms of simplicity and generality. To that end, we investigated interpolation from points of the mesh that constitute a simplex containing the point of interest, an approach already applied to some lower-dimensional systems. 27,28 Compared to Shepard interpolation, the downside of this approach is the necessity to generate a triangulation for the mesh, whose size grows very fast with the number of dimensions, 29 but the upside is the logarithmic scaling of the interpolation procedure with the number of mesh points as well as an extra order of accuracy for a given number of derivatives available at the mesh points. In comparison to the method of Ref. 28, the main differences in the approach presented here are an alternative triangulation of the mesh and a different choice of the interpolant, along with a procedure for updating the mesh during the simulation.
The theory behind the proposed algorithm is explained in Sec. II, while Sec. III presents numerical tests for model anharmonic potentials and for the HCN/DCN equilibrium isotope effect. While we focus on classical Monte Carlo and path integral Monte Carlo applications, similar interpolation procedures can be also used with molecular dynamics or path integral molecular dynamics methods.

II. THEORY
Running a Monte Carlo simulation requires knowledge of the potential energy function V (r), where r is the D-dimensional vector of the system's internal coordinates. Let us assume that we can access a number of previously stored points together with their potential energy and gradient values, as well as a triangulation of their mesh; we want to use that information to estimate the potential energy value at a new point r̃. If the mesh currently contains fewer than D + 1 points, V (r̃) is evaluated exactly and r̃ is added to the mesh, which will be triangulated (in the only possible way) once D + 1 points have been added. If the mesh has already been triangulated, the following algorithm is used for estimating V (r̃):

1. Find the simplex S̃ containing r̃ or verify that r̃ lies outside the convex hull C mesh of all mesh points.

2. If S̃ was found, calculate the value of the interpolant Ṽ (r̃) and estimate whether the interpolation error |Ṽ (r̃) − V (r̃)| is below a predefined threshold.

3. If r̃ ∉ C mesh or if V (r̃) cannot be estimated with sufficient accuracy, add more points to the mesh to allow for an accurate estimate of V (r̃).
We will discuss each part of the algorithm separately in the following subsections.
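Before turning to the individual steps, the top-level lookup can be sketched in code. The following is a minimal Python illustration that uses SciPy's static Delaunay triangulation as a stand-in for the on-the-fly mesh of this paper; the class name MeshInterpolator, the exact_potential stand-in, the rebuild-from-scratch strategy, and the purely linear interpolant (without the gradient corrections introduced below) are all simplifications of this sketch, not the paper's algorithm.

```python
import numpy as np
from scipy.spatial import Delaunay


def exact_potential(r):
    # Stand-in for an expensive ab initio call (illustrative harmonic form).
    return 0.5 * np.dot(r, r)


class MeshInterpolator:
    def __init__(self):
        self.points, self.values = [], []
        self.tri = None

    def _rebuild(self):
        # Sketch only: rebuild from scratch; the paper updates incrementally.
        D = len(self.points[0])
        if len(self.points) >= D + 1:
            self.tri = Delaunay(np.asarray(self.points))

    def add_point(self, r):
        # Exact evaluation plus mesh update (steps taken when r is "new").
        self.points.append(np.asarray(r, float))
        self.values.append(exact_potential(self.points[-1]))
        self._rebuild()
        return self.values[-1]

    def potential(self, r):
        r = np.asarray(r, float)
        if self.tri is not None:
            s = self.tri.find_simplex(r)          # step 1: locate the simplex
            if s >= 0:                            # r is inside the convex hull
                verts = self.tri.simplices[s]
                # step 2: barycentric weights of r in the containing simplex
                T = self.tri.transform[s]
                lam_part = T[:-1] @ (r - T[-1])
                lam = np.append(lam_part, 1.0 - lam_part.sum())
                # linear partial interpolant (gradient terms omitted here)
                return float(lam @ np.asarray(self.values)[verts])
        # step 3: outside the hull (or mesh too small): evaluate exactly
        return self.add_point(r)
```

A typical usage pattern is to route every potential call of the Monte Carlo driver through potential(), so that exact evaluations happen only when the mesh cannot answer the query.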

A. Interpolation procedure and reliability estimate
Suppose r̃ is inside simplex S̃ with vertices r jS̃ (j = 1, . . . , D + 1) and we want to estimate V (r̃) based on the values of the energy and its gradient at the D + 1 points r jS̃. Previously, Clough-Tocher interpolants 30,31 were used for the problem in up to three dimensions; 27,28 these interpolation schemes are exact for cubic potentials and have derivatives that are continuous up to the second order, but they have two disadvantages: they use Hessians, whose evaluation enormously increases the cost of an ab initio calculation, and their generalization to higher-dimensional systems is not straightforward. Perpendicular interpolation 32 is another powerful approach which, for an arbitrary number of dimensions and an arbitrary number of derivatives q available at all vertices, produces an interpolant that is exact for polynomials of order q + 1 and has q continuous derivatives; however, it scales exponentially with dimensionality D, making potential applications to higher-dimensional systems problematic. In this work we used an interpolant that exhibits a better scaling with D at the cost of having discontinuous derivatives. (If this is a problem, the interpolants of Ref. 32 should be used instead.) To define this interpolant we introduce barycentric coordinates λ j (j = 1, . . . , D + 1) of r̃, which are defined by the D + 1 equations

Σ_{j=1}^{D+1} λ j r jS̃ = r̃,    Σ_{j=1}^{D+1} λ j = 1.    (1)

The interpolant we propose is defined in terms of "partial" interpolants Ṽ j, all of which are exact for quadratic potentials. One way to combine them into a single interpolant symmetric with respect to vertex permutations is that proposed in Ref. 33 (based on Refs. 34 and 35). This combination of the Ṽ j, however, would not reproduce the potential energy gradient at the vertices, which is a waste since each Ṽ j reproduces the gradient at vertex j.
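The barycentric coordinates can be obtained by solving a (D + 1) × (D + 1) linear system: the D position equations stacked on top of the normalization condition that the λ j sum to one. A minimal sketch:

```python
import numpy as np


def barycentric_coordinates(vertices, r):
    """Solve for lam: sum_j lam_j r_j = r together with sum_j lam_j = 1.

    vertices : (D+1, D) array of simplex vertices.
    Returns lam of shape (D+1,).
    """
    vertices = np.asarray(vertices, float)
    r = np.asarray(r, float)
    D = vertices.shape[1]
    # D position equations stacked on the normalization row of ones.
    A = np.vstack([vertices.T, np.ones(D + 1)])   # (D+1, D+1) system matrix
    b = np.append(r, 1.0)
    return np.linalg.solve(A, b)
```

Inside the simplex all λ j are non-negative; a negative coordinate signals that the point lies on the far side of the corresponding face, a fact the simplex search of Subsec. II C exploits.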
An alternative expression, the one used in this work, does reproduce the gradients at all vertices. It is impossible to get a reliable estimate of the interpolation error without any knowledge of the third derivatives of V (r) in the simplex, and the application of Bayesian approaches, as in Shepard interpolation, 36 is complicated by Ṽ j (r̃) containing data from all vertices of the simplex at once. One exception is the one-dimensional case, where the definition of δV (r̃) (with D = 1) yields an exact bound |Ṽ (r̃) − V (r̃)| ≤ δV (r̃). The estimate seems to perform qualitatively correctly for a large number of higher-dimensional potentials as well, so we decided to deem an interpolation result reliable if δV (r̃) were below some predetermined threshold δV max. This makes δV max a parameter whose only relation to the interpolation error is that both are proportional to the magnitude of third derivatives in a small enough simplex.
The lack of a more precise relation between the two quantities forced us to estimate the exact interpolation error for a small number of randomly chosen interpolation results, thereby determining whether the chosen δV max is adequate. For small enough simplices, the leading contributions to both the interpolation error and δV (r̃) depend linearly on the tensor of third derivatives of the potential energy, so one possible rule of thumb for fixing an unacceptable interpolation error is a proportional decrease of δV max, e.g., halving δV max if the root mean square error (RMSE) of interpolation should be halved.

B. Updating the mesh and its triangulation
If we either find r̃ to be outside the convex hull C mesh or find that δV ≥ δV max, we add a carefully chosen point r add to the mesh and update the triangulation. Before describing the algorithm, let us introduce several definitions. Firstly, the boundary of C mesh is a set of faces referred to as F mesh. Secondly, we will often use the signed distance n F (r) from the plane containing the face F ∈ F mesh, with the sign defined to be non-negative at the mesh points. Lastly, both systems that will be considered in Sec. III have certain restrictions on the values r can take: in the symmetric two-dimensional quartic oscillator, the symmetry makes it possible to consider values of coordinates in only one quadrant (e.g., both x and y non-negative), while in HCN our choice of internal coordinates implies that two of them only take non-negative values. We thus consider a situation where r must satisfy one or more linear constraints of the form

v i · r ≥ c i,    (9)

where i is the index of the constraint, c i is a scalar constant, and v i is a constant vector.
When r̃ was inside C mesh, we found that adding the point r add = r̃ to the mesh worked well enough. However, when r̃ was outside C mesh, we "pushed" r add further out, i.e., chose r add further away from C mesh than r̃, primarily to avoid creating nearly singular simplices. The procedure, referred to as the "outward push," works as follows:

1. Find F min ∈ F mesh that minimizes n F (r̃).

2. Set r add by shifting r̃ further away from the plane of F min.

3. If r add violates one of the constraints (9), move it onto the corresponding constraining surface.

Step 3 is designed to move the mesh point (pushed away in Steps 1-2) onto one of the "constraining surfaces" defined by Eq. (9) instead of rejecting a move that would violate the constraint. Several such rejections would lead to a mesh that approaches infinitely close to one of the constraining surfaces during the simulation, as illustrated in Subsec. III A.
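A schematic implementation of the outward push is given below. The push factor alpha and the simple orthogonal projection in Step 3 are assumptions of this sketch (the paper's exact push formula is not reproduced here); faces are represented by unit normals n and offsets c chosen so that the signed distance n F (r) = n · r − c is non-negative at the mesh points.

```python
import numpy as np


def outward_push(r, faces, constraints=(), alpha=0.5):
    """Sketch of Steps 1-3.

    faces       : (n, c) pairs with n a unit normal and n.r - c >= 0 inside.
    constraints : (v, c) pairs requiring v.r >= c (Eq. (9)).
    alpha       : illustrative push factor, not a value from the paper.
    """
    r = np.asarray(r, float)
    # Step 1: face of C_mesh with the most negative signed distance n_F(r).
    n_min, c_min = min(faces, key=lambda f: np.dot(f[0], r) - f[1])
    dist = np.dot(n_min, r) - c_min            # negative when r is outside
    # Step 2: push the new point further away from C_mesh along the normal
    # (dist < 0, so the shift points away from the hull).
    r_add = r + alpha * dist * np.asarray(n_min, float)
    # Step 3: if a constraint is violated, project onto its surface v.r = c.
    for v, c in constraints:
        v = np.asarray(v, float)
        norm = np.linalg.norm(v)
        v_hat, c_hat = v / norm, c / norm
        if np.dot(v_hat, r_add) < c_hat:
            r_add = r_add - (np.dot(v_hat, r_add) - c_hat) * v_hat
    return r_add
```

The projection in Step 3 realizes the idea stated above: rather than rejecting a move that violates a constraint, the added point is placed exactly on the constraining surface.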
Once r add is chosen, we need to update the triangulation of the mesh. Here we only present the main ideas of the employed algorithms; the details are in the Appendix. The starting point of this work was the Delaunay triangulation, 37 following its previous successful applications. 27,28 There exist several algorithms 29 that update a Delaunay triangulation at a cost that does not increase with the number of simplices. The approach outlined in this subsection uses Lawson flips 37 defined using "parabolic lifting" 38 instead of the more conventional empty-circumsphere test. 37,38 One considers sets of D + 2 points, which can be triangulated in at most two different ways. 37 Whenever such a set is already triangulated using a simplex array S and an alternative triangulation S′ is available, one compares the values G(S) and G(S′), where the function G sums the cost g(S) weighted by the volume v(S) over the simplices S of the array. If G Delaunay (S) > G Delaunay (S′), which is equivalent to S failing the empty-circumsphere test used to define the Delaunay triangulation, 37,38 the simplices of S are replaced with those of S′.
One performs Lawson flips until they fail to change the triangulation regardless of the initial S. Since each Lawson flip decreases G Delaunay (S mesh), where S mesh is the array of all simplices in C mesh, the algorithm is bound to stop at a certain point, and it can be proven 37 that the resulting final triangulation is unique to the mesh. It can also be shown that G(S) ≠ G(S′) for the two triangulations of D + 2 points unless the points lie on a common sphere or in a hyperplane; the treatment of these singular cases is discussed in the Appendix.
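In two dimensions, the parabolic-lifting version of the empty-circumsphere test reduces to the sign of a 3 × 3 determinant: each point (x, y) is lifted to (x, y, x² + y²), and a fourth point d lies inside the circumcircle of a counterclockwise triangle (a, b, c) exactly when the determinant below is positive, i.e., exactly when the shared edge should be Lawson-flipped. A self-contained sketch:

```python
import numpy as np


def in_circumcircle(a, b, c, d):
    """Empty-circumcircle test via parabolic lifting (2D).

    (a, b, c) must be in counterclockwise order. Returns True when d lies
    strictly inside their circumcircle, i.e., when a Lawson flip is needed.
    """
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        # Third column is the lifted coordinate relative to d.
        rows.append([dx, dy, dx * dx + dy * dy])
    return np.linalg.det(np.array(rows)) > 0.0
```

The determinant formulation avoids explicitly constructing circumcenters, which is what makes the lifting-based flip criterion attractive in higher dimensions as well.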
The expression for g Delaunay (S) underlines one problem with the Delaunay triangulation: it treats all dimensions equivalently, necessitating a choice of internal coordinates that makes the properties of V (r) approximately isotropic, which tends to be non-trivial. A Bayesian approach to bypassing the problem for Shepard interpolation is discussed in Ref. 36, while for simplex interpolation one can use higher-order derivatives to define a Riemannian metric 39 that can then be used to construct the triangulation optimal for the current interpolation procedure. 40-42 Unfortunately, for quadratic interpolation the latter option would involve calculating third derivatives of the potential, which is rather expensive; therefore, we instead used Lawson flips with a modified g(S). Obviously, the procedure still stops at a certain triangulation regardless of the choice of g(S), even though we will not be able to guarantee the triangulation's uniqueness without restrictions on the potential V (r). The anisotropic g(S) proposed in this work, g anisotr, is a qualitative estimate of the upper bound of the interpolation error in a given simplex; unlike g Delaunay, g anisotr is invariant with respect to linear transformations of coordinates. To avoid entering infinite loops for cases when G(S) = G(S′), we modify the flipping criterion to be G(S) − G(S′) > δG min, where δG min is a small predefined parameter.
The cost function g anisotr should be applicable to any simplex interpolant that uses the potential and its gradient and is exact for quadratic potentials. To understand why quadratic interpolation is special in this context, consider interpolating from D + 1 vertices with q derivatives available, which in general can yield an interpolant exact for polynomials up to degree q + 1. In the special case of D = 1, q polynomials of degree q + 1 can be constructed from the 2(q + 1) available parameters; for q = 1, g anisotr arises naturally as the degree to which the two cubic polynomials disagree. The q = 1 case is special because for larger values of q and more than two polynomials of degree q + 1, several analogues of g anisotr are possible, while for q = 0 the (linear) interpolant is uniquely defined, making it impossible to define a similar g anisotr.
From now on, the triangulation that results from using g anisotr with Lawson flips will be referred to as "anisotropic triangulation".

C. Search for the simplex
The last task is finding the simplex S̃ that contains r̃. We used the stochastic walk algorithm, 43 which iteratively updates S̃ from an initial guess S̃ init by calculating the barycentric coordinates λ j [Eq. (1)] of r̃ in the current simplex and, whenever some λ j are negative, moving to the neighboring simplex across a face corresponding to a randomly chosen negative λ j; the walk stops once all λ j are non-negative.
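A sketch of such a walk over a SciPy Delaunay triangulation is given below; the tolerance, the step cap, and the use of a static precomputed triangulation are conveniences of this illustration rather than features of the paper's on-the-fly mesh.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)


def barycentric(tri, s, r):
    # Barycentric coordinates of r in simplex s of a scipy Delaunay object.
    T = tri.transform[s]
    lam = T[:-1] @ (r - T[-1])
    return np.append(lam, 1.0 - lam.sum())


def stochastic_walk(tri, r, s_init=0, max_steps=10_000):
    """Walk from simplex s_init toward the simplex containing r.

    Returns the simplex index, or -1 if the walk steps outside the hull.
    """
    s = s_init
    for _ in range(max_steps):
        lam = barycentric(tri, s, r)
        negative = np.flatnonzero(lam < -1e-12)
        if negative.size == 0:
            return s                    # all lam_j >= 0: simplex found
        # Cross a randomly chosen face with a negative coordinate; the
        # neighbor opposite vertex j shares the face opposite vertex j.
        j = rng.choice(negative)
        s = tri.neighbors[s][j]
        if s == -1:
            return -1                   # left the convex hull
    raise RuntimeError("walk did not terminate")
```

Randomizing the choice among the negative coordinates is what prevents the walk from cycling, which can happen for deterministic face choices in non-Delaunay triangulations.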

III. NUMERICAL TESTS
A. Anharmonic oscillator
A large number of molecular systems are close to harmonic in the most relevant part of their configuration space, so as a model problem we chose a system of two harmonic vibrational modes with a "very anisotropic" anharmonic quartic perturbation, whose magnitude is controlled by the "anharmonicity" parameter ε anharm. In all examples presented here we ran 2^24 (≈ 1.68 · 10^7)-step Monte Carlo simulations with inverse thermodynamic temperature β = 1 and δV max = 3.125 · 10^−2; the mesh was constructed in |x| and |y| rather than x and y to capitalize on the potential's symmetry. The results are presented in Figs. 1-2 and Table I. As seen from Table I, the improvement is definitely less drastic for anharmonic potentials, where the simplex size is also determined by the magnitude of δV (r̃).
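A Metropolis sketch of such a simulation is shown below. Since the exact quartic perturbation is given by Eq. (13) of the paper, the specific form used here (ε anharm multiplying x⁴) is an assumed stand-in, and the proposal step size is likewise illustrative.

```python
import numpy as np


def model_potential(x, y, eps_anharm=0.01):
    # Two harmonic modes plus an ASSUMED anisotropic quartic perturbation;
    # the paper's actual perturbation is its Eq. (13).
    return 0.5 * (x * x + y * y) + eps_anharm * x ** 4


def metropolis(n_steps, beta=1.0, step=1.0, seed=1):
    """Plain Metropolis sampling of exp(-beta * V) for the model above."""
    rng = np.random.default_rng(seed)
    r = np.zeros(2)
    v = model_potential(*r)
    samples = np.empty((n_steps, 2))
    for i in range(n_steps):
        trial = r + rng.uniform(-step, step, size=2)
        v_trial = model_potential(*trial)
        # Accept with probability min(1, exp(-beta * dV)).
        if rng.random() < np.exp(-beta * (v_trial - v)):
            r, v = trial, v_trial
        samples[i] = r
    return samples
```

In the actual method, every model_potential call would be routed through the interpolation procedure of Sec. II, so that only a small fraction of the 2^24 evaluations hits the exact potential.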
The motivation behind the anisotropic triangulation introduced in this work is illustrated by Fig. 2, which compares the meshes and interpolation error distributions obtained using the two different triangulations in Monte Carlo simulations for ε anharm = 0.01. The simplices obtained with the Delaunay and anisotropic triangulations are plotted in panels (a) and (b); switching to the anisotropic triangulation "elongates" triangles along the |x| axis, as it should, judging by the form of the anharmonic part of the potential (13). While in this case the distribution of interpolation errors is not significantly affected, the number of mesh points added is decreased by more than a factor of two (see Table I).
We also checked how the observed tendencies depend on ε anharm; in particular, the mean square interpolation error remained significantly smaller than (δV max)^2, a tendency we observed for a wide range of potentials.
Lastly, recall that running the Monte Carlo simulations presented here involved 2^24 + 1 ≈ 1.7 · 10^7 potential energy evaluations, and instead of an exact calculation at each step we used mere thousands of mesh points to reproduce the exact results with great precision (see Table I). This demonstrates the potential of our method for speeding up practical calculations, a point elaborated further in the next subsection.

B. HCN/DCN equilibrium isotope effect
In this subsection we combine our interpolation procedure with the path integral Monte Carlo method, 46,47 which accounts for nuclear quantum effects by replacing each atom of the simulated molecule with P replicas connected by harmonic forces. 48 We combined the path integral Monte Carlo method with the free energy perturbation approach 49 (direct estimators 50) for isotope fractionation to calculate the HCN/DCN equilibrium isotope effect, defined in terms of the ratio of the partition functions Q of the two isotopologues. The potential energy surface of HCN was taken from Ref. 51. The interpolation algorithm used three internal coordinates x 1, x 2, and x 3, defined in terms of the atomic position vectors r D/H, r C, and r N. It is necessary to use the "outward push" procedure to avoid the mesh approaching infinitely closely the x 3 = 0 surface due to the x 3 ≥ 0 constraint, for reasons illustrated in Subsec. III A. The direct estimator was evaluated only every few Monte Carlo steps (to avoid wasting computational effort on calculating correlated samples); the statistical error of its average was estimated as the root mean square error evaluated with block averaging. 45 The numbers of replicas P were chosen as 256 and 32 for 200 K and 1000 K, respectively; it was verified with separate calculations that doubling P did not change the isotope effect by more than 1%. For the other temperatures, P was assigned by linear interpolation of the P values as a function of 1/T. We set δV max = 10^−4 a.u., and after each successful interpolation the algorithm had a 10^−5 probability to carry out an additional exact potential energy calculation in order to estimate the RMSE of interpolation.
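The assignment of P at intermediate temperatures can be sketched as follows; the two anchor values (256 at 200 K and 32 at 1000 K) are taken from the text, while the rounding to the nearest integer is an assumption of this illustration.

```python
def replicas(T, T_lo=200.0, P_lo=256, T_hi=1000.0, P_hi=32):
    """Number of path integral replicas P, linear in 1/T between anchors.

    Anchors: P = 256 at 200 K and P = 32 at 1000 K (from the text);
    the integer rounding is an assumption of this sketch.
    """
    x, x_lo, x_hi = 1.0 / T, 1.0 / T_lo, 1.0 / T_hi
    P = P_hi + (P_lo - P_hi) * (x - x_hi) / (x_lo - x_hi)
    return int(round(P))
```

For example, this rule assigns P = 88 at 500 K, reflecting the fact that the required number of replicas grows roughly as 1/T at low temperature.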

1. Results and discussion
In Table II, isotope effects calculated with our interpolation algorithm are compared to benchmark values calculated with the original force field and with harmonic approximation 54-56 values. Interpolation allows reproducing the benchmark isotope effect values with an error below 1%; the decent agreement between the harmonic approximation values and the formally exact path integral results is expected, considering that HCN is a fairly harmonic molecule.
The RMSEs of interpolation and the numbers of mesh points generated during the simulations are displayed in Table III. If we used an expensive ab initio procedure for the exact potential, the speedup due to the interpolation method would equal the ratio of the number of interpolated potential energy evaluations to the number of exact potential energy evaluations, which approximately equals the number of mesh points generated in the simulation. In this three-dimensional problem this ratio is always of the order of 10^4, indicating a large potential speedup. As discussed in Subsec. II A, we made an additional number of exact potential energy calculations to verify that the choice of δV max guarantees adequate interpolation accuracy; the number of these additional calculations (approximately the number of potential evaluations during the calculation times 10^−5, see Subsubsec. III B 1) was always small compared to the number of mesh points, but still large enough to estimate the RMSE of interpolation with high precision. As was the case for the quartic oscillator, the mean square interpolation error is significantly smaller than (δV max)^2.
However, because HCN is a very harmonic system, the anisotropic triangulation loses its advantage over the Delaunay triangulation: both approaches behave similarly and, in fact, the anisotropic triangulation yields slightly higher interpolation errors and generates slightly more mesh points. We have also investigated how our method performs if the mesh created during one simulation is reused for simulations at other temperatures. One would expect that the best starting point would be the mesh generated during the highest-temperature simulation, which, in accordance with the classical Boltzmann distribution, tends to visit a larger region of configuration space during a fixed number of simulation steps. However, in our calculations of the HCN isotope effect we found that using the lowest temperature was preferable. 57 The results are presented in Table IV, which shows clearly that the number of extra points that must be added to the mesh generated during the lowest-temperature simulation is relatively small for simulations at all other temperatures.

IV. CONCLUSION
We have proposed an algorithm for interpolating potential energy values from the values of the potential energy and its gradient calculated and stored for points of a mesh generated during a Monte Carlo simulation. The interpolation procedure is exact in harmonic systems, while in anharmonic systems its accuracy depends on the triangulation procedure chosen for the mesh. For the latter, we considered two choices: the previously used Delaunay triangulation and an anisotropic triangulation designed to decrease the interpolation error. Both triangulations combined with subsequent interpolation resulted in a very large reduction of the number of potential energy evaluations in comparison with a purely on-the-fly approach. Moreover, we found that for nearly harmonic systems the two triangulations give similar results, with the Delaunay triangulation demonstrating superior performance in some cases, but for more anharmonic systems the proposed anisotropic triangulation achieves similar interpolation errors with significantly fewer mesh points. The ad hoc procedure used for the construction of such anisotropic triangulations may be used to improve the performance of other interpolants, 31-33,35 even though a different definition of δV (r) [Eq. (6)] may prove more convenient.
To combine our interpolation algorithm with classical or semiclassical molecular dynamics simulations, one may need to use a different interpolant, as mentioned in Subsec. II A; the "outward push" procedure would also need to be extended to points added inside C mesh to avoid forming nearly degenerate simplices. By contrast, as mentioned in Subsec. II C, searching for the simplex used in interpolation should become even simpler.
It is important to discuss the scaling of our interpolation procedure with respect to two parameters: number of points in the mesh and dimensionality. For the former, adding new points to the mesh is done at a cost that does not depend on the number of points already in the mesh and the cost of finding the simplex used in interpolation scales logarithmically with the number of mesh points; calculating the interpolant costs the same regardless of the number of mesh points. This behavior compares favourably to Shepard interpolation and Gaussian process regression, which utilize functions whose evaluation cost is proportional to the number of mesh points; triangulating the mesh is a natural way to avoid the issue.
However, the cost of storing and updating the triangulation increases dramatically with dimensionality 29 even if one does not take into account the increase in the needed number of mesh points (which also grows quickly with dimensionality, at least for lower dimensions).
This problem is likely to be decisive if one wanted to apply our method to systems of dimensionality six and higher (corresponding to molecules with four atoms and more). Yet, in this work we demonstrated a significant potential speedup achieved by such algorithms in the simulations of two-and three-dimensional systems.
This counterintuitive observation can be explained as follows: the lower the temperature, the greater the quantum delocalization of the ring polymer and the greater the explored region of configuration space. Although this delocalization diminishes at higher temperatures, it is eventually replaced by an ever-increasing motion of the center of the ring polymer associated with the classical Boltzmann distribution. However, we cannot see this transition yet in Table III, most likely because the quantum delocalization at lower temperatures is further increased by the use of mass-scaled direct estimators, which stretch the ring polymer.

We still use the expand_convex_hull subroutine to make the triangulation computationally cheaper, but we will use the notion of virtual simplices a little later.
As mentioned in Sec. II, it is beneficial to place some mesh points on the constraining surfaces (9). In this case each point of the mesh is assigned a logical variable array whose ith element indicates whether the point lies in the constraining plane with index i. To account for constraints, the corresponding modifications should be made to Algorithms 1-2.

ALGORITHM 2. Procedure for expanding C mesh once a new point r add is added outside of it.

procedure expand_convex_hull(r add, S new)
    find a face F start such that n F start (r add) < 0
    |* F conf will consist of all "conflicting" faces F such that n F (r add) < 0;
       e bound will consist of edges of F conf not shared by two faces in the array *|
    call expand_conflict_zone(r add, F start, F conf, e bound)
    for each F ∈ F conf do
        create a simplex S from r add and F
        add S to the current triangulation and to S new