Machine learning scheme for fast extraction of chemically interpretable interatomic potentials

We present a new method for a fast, unbiased and accurate representation of interatomic interactions. It is a combination of an artificial neural network and our new approach for pair potential reconstruction. The potential reconstruction method is simple and computationally cheap and gives rich information about interactions in crystals. This method can be combined with structure prediction and molecular dynamics simulations, providing accuracy similar to ab initio methods, but at a small fraction of the cost. We present applications to real systems and discuss the insight provided by our method.


I. INTRODUCTION
In computational chemistry, the problem of fast and accurate reconstruction of the potential energy surface (PES) of a crystal is of central importance, since PES contains essential information about the system and all transition states.Indeed, given PES, one can find out the stable and metastable structures, paths of transitions between these structures, its first derivatives provide forces and second derivatives contain information about lattice vibrations and mechanical properties.Accurate first-principles calculations are computationally very expensive, which prohibits their use for large systems and long-timescale simulations of rare events.Faster methods (e.g.empirical potentials) are usually much less reliable.Ideally, one wants to build an approach that has a speed of empirical potentials and accuracy of first-principles calculations.One of the most promising solutions is to use machine learning (ML) techniques, because ML potentials are more flexible, more general and more accurate than empirical potentials, and they are much faster than first-principles calculations.
One may consider ML as a black box that has input and output; this black box has its own parameters and the process of learning is just an optimization of these parameters in order to correlate the input and output.In our case it translates as: the input is a way to represent a crystal structure (feature-vector), while the output is energy (or another property).When one wants to apply ML to some problem, many crucially important steps are encountered, which include 1) choice of training data, 2) choice of ML algorithm, and 3) structure description.
Because of high importance of the problem of PES reconstruction, much has already been done in this field.  Thihuge work includes applications of many different ML schemes.Among them are artificial neural networks (NN), [1][2][3][4][5][6][7][8][9] kernel ridge regression (KRR), 17 gaussian approximation potentials (GAP), [10][11][12] linear algorithms 13,14 and many others. 15,16,20Mostly, their aim was to build interatomic potentials for molecular dynamics.Also these works include many successfull tests (not only for bulk crystals, but also for other systems, such as molecules, clusters, liquids etc.), and deep analysis of structure descriptions. 18,20,21Here we do not aim to review all the previous works, but mention some important results.Nowadays, there are at least 3 different schemes, which we think to be particularly promising solutions to this problem: NNs, GAP, and Moment Tensor Potentials 14 (MTP), presented in the end of 2015.All 3 schemes showed very good accuracy, and currently it is not clear which is the best.Although NNs seem to be the most flexible, GAP and MTP seem to have better performance.In paper 14 a comparison between GAP and MTP was presented and it was showed that MTP is faster (in CPU time) and more accurate on tungsten.Still, MTP can nowadays work only with pure elements, while both NNs and GAP have a successful schemes for at least binary compounds. 9,10In this paper we decided to follow the artificial NN approach.One of the motivations of this work was to make a successfull scheme for global searches.Currently, such searches can be performed by USPEX, [22][23][24] which is a combination of evolutionary algorithm with e.g.DFT (any code that calculates energies and forces).Altough USPEX is very successful, [25][26][27] important limitations come from the high cost of ab initio calculations: e.g.only zero-Kelvin structure searches can be realistically performed, and very large systems are still problematic.This motivates our choice of training data (see below).Still, we made our scheme as general as possible: e.g. the strategy proposed below can be used in MD simulations, for calculations that require many atoms (defects, grain boundaries, glasses, polymers etc.) and/or long-timescale (diffusion, crystallization, chemical reactions, protein folding etc.).In previous works, data that was used for training contained only crystal structures that were already close to the local energy minima geometrically and energetically.So, the high-energy part of the PES could be lost (which is crucial for us).Here we present a scheme where we also include the high-energy part in training.This increases the chances to better capture the whole PES, and once we encounter high-energy structure, our scheme will still be robust, working in the regime of interpolation rather than extrapolation.This hypothesis is justified by the end results.
One of the disadvantages of NNs (or any other ML algorithm) is that when NN is trained, we are not able to understand what physically is obtained.But our combined scheme proposed below produces a pair potential that can be visualized, and allows separation of the total energy into pairwise and many-body parts, providing us with some insight into the interactions in the crystal.Combination of "black-box" NNs (capturing physically complex, if tractable, effects) with physically motivated analytic functions (capturing the most important basic physics) has been successfully applied for other problems before, 28 and we think it increases robustness of ML models, in addition making physical interpretations possible.
The paper is organised as follows.In Section II we formulate our lattice sums method, the application of which will fully play out only in Section III, where we present the general scheme.In Section IV we show results for several different systems, such as: Al, C, and two noble gases (He and Xe).

II. LATTICE SUMS METHOD
If there are K different types of atoms in the crystal, the number of pair potentials equals to 1 2 K(K + 1).Each potential can be represented in a very general form: The total pairwise energy per unit cell then equals to: where the summation is done over the whole crystal; each unit cell is numbered by index l and l equals to 0 for the current cell; i refers to the atom in the cell with l = 0, j refers to the atom in the cell with number l; A k i, j is the k-th coefficient in the potential of interaction between the atom i and the atom j -it depends only on types of those atoms.Such sums are called lattice sums.
Terms with k = 1, 2, 3 correspond to long-range interactions, because integrals r k diverge.The main problem for these terms is that one has to calculate them through the whole crystal rather than inside a finite sphere.For the Coulomb part, Ewald method 29,30 was developed, where summation is taken both in real and reciprocal spaces.We generalize this approach to terms with k = 2, 3, 4, 5, 6.The derivation of the formulas (see below) is very cumbersome and is almost the same as described in Ref. 30.For the term k = 6 the formula was taken from Ref. 31.The components with k ≥ 7 we calculate within a sphere of sufficiently large radius (typically, 10 Å).Terms k = 4, 5, 6 can be calculated by summation within a sphere of finite radius, but in order to reach the same accuracy as for terms k ≥ 7, the radius of that sphere would have to be too large, making the calculations too expensive.Ewald-like summation of these terms in both real and reciprocal space makes calculations cheaper and better.
For these terms, real-space and reciprocal-space radii R ma x and G ma x , and parameter g (this parameter appears in formulas below) should be chosen in the most computatioally optimal way, reaching high accuracy.The optimal parameters for the Coulomb interaction were borrowed from the GULP code: 31,32 where N is the number of atoms in a unit cell and V is its volume.Our analysis showed that these parameters work equally well for k = 2, 3, 4, 5, 6, so we used them.
It is obvious from the physical considerations (and also from our formulas, since otherwise sums diverge) that the overall charge of the unit cell equals to zero.It turns out that similar sum rules apply also to the components with k = 2, 3 (see formulas below): A k i, j = 0. From this we make an important conclusion: crystals with only one type of atoms have no long-range interactions.That is why they are fundamentally easier to operate with than crystals made of many atomic types.We assume here that all the atoms of one species should have the same charge.It is true only for systems that have only pair interactions.In real systems even in case of only one type of species there can be a significant charge transfer (e.g. in γ-boron 25 ) because of strong many-body interactions.
085318-4 Dolgirev, Kruglov, and Oganov AIP Advances 6, 085318 (2016) 2r 5 i j (l) Usually, one considers the non-linear dependence of E on r, but it is striking that dependence of E on the Ai j coefficients is obviously linear, and therefore, knowing the energy of some structures, we can reconstruct the coefficients of the remaining structures using linear regression.We also note that since in real systems pair interactions are affected by the environment (which leads to many-body interactions), we can expect the dependence of pair potentials on the density of the system, especially for metals.For metals, this can be described as density-dependent screening of pair interactions by the electron gas.
To illustrate our method, we carried out two tests of our algorithm.In the first test we used the Lennard-Jones potential: where ϵ is the depth of the potential well and r m -position of its minimum.In our tests we considered A x B y Lennard-Jones system with the following coefficients: 5σ -it means that we work in ϵ, σ units.Using USPEX we generated 10000 random (with arbitrary numbers of formula units in the unit cell and any space group symmetry) unrelaxed structures and then computed their energies using GULP. 31Having these energies we reconstructed the potentials -see Fig. 1.Energies per atom of the structures were in the range [−30; −2] in ϵ units.Reconstructed model yielded root mean squared error (RMSE) of less than 10 −7 ϵ; Pearson coefficient was equal to 1; the reconstructed and real potentials coincide on the plot -see Fig. 1.
The second test was more complicated: we added Coulomb interactions to this Lennard-Jones potential.We took the same A-B system but with fixed composition A n B n ; this allowed us to put fixed charges on atoms: +1 on atom A and −1 on atom B. After generating these structures with USPEX and calculating their energies with GULP we could reconstruct the potentials.Results were even more encouraging; we also were able to calculate charges and we got this charge of ±1e on each A atom (to be more precise, we are able only to calculate squared charges).
Thus, we showed that our method allows one to reconstruct pair interaction potentials.Even in the systems with many-body interactions, such potentials carry important chemical information, because usually many-body contributions are much smaller.Yet, for quantitatively accurate results, many-body interactions must be taken into account.The next section describes our main model, which is a combination of our lattice sums method for 2-body interactions and machine learning for many-body terms.With this model we can fit the energies of real systems very accurately.

III. MAIN SCHEME
We express the total energy in the following way (similar ideas are reviewed in Ref. 28): i.e. as a sum of some constant, pairwise and many-body terms.
In our algorithm 2-body term is fitted by linear regression and the second term is fitted by NN.The process of fitting is iterative: at first we reconstruct pair potentials and the constant, then we fit the difference between the actual energy and 2-body part; after NN is optimized we reconstruct pair potentials again by lattice sums method using difference of the total energy and many-body part and so on, until convergence.
To confirm our assumption that 2-body terms are very important, we calculate the following number: where the average is calculated over all structures; it means that if this value is close to 1, then the system can be described by pair potentials.

A. Main scheme and data
Data selection is one of the most important steps, and here we describe selection protocol, noting that the selection of data in each case depends on the problem.For example, if a model is to be used for molecular dynamics, the data should be generated by molecular dynamics and cover the energy range accessible to molecular dynamics.For global optimization, where many initial structures have very high energies, data should span a much wider energy range.Here, we operate with data collected for the second, much more challenging, case.
USPEX is an evolutionary algorithm that can efficiently predict global minimum energy structures.With this code we collect our data, which contain 2 parts.At first USPEX generates 10000-20000 random symmetric structures; energies of these structures are evaluated ab initio without relaxation.Training on these data provides us with good initial values of weights.The second part of our data consists of structures that are obtained in the USPEX run.During this run we collect intermediate and final steps of relaxation in 10-20 generations 33 (yielding in total, ∼ 30000 structures).Since the data contains both random and relaxed structures, training on such data provides us with a highly transferable model.After NN is trained, it can be used for prediction/relaxation in further generations.Thanks to a simple classifier, our algorithm is adaptive: if the newly generated structure is very different from those used for training, then it will be included in the training set and NN will be retrained.That classifier is described in Ref. 17: let X be the training data (each row is a feature-vector of one structure), then x min , x ma x are row-vectors with values that correspond to minimum and maximum of each feature in X, respectively; we say that structure x is new only if not true: x min ≤ x ≤ x ma x .

B. Feature-vector
We want to build a descriptor of a crystal structure, that is unambiguous (i.e. 2 feature-vectors should be different for 2 different structures) and invariant with respect to the order of atoms, choice of the unit cell, translation and rotation of the system.In addition, the feature-vector should be compact, i.e. contain as components as possible.Finally, since ML builds a correlation between the input (feature-vector) and output (in this case, energies), we want to find geometric features that are correlated with energies.
Here as a feature-vector we consider some sums/integrals over crystal structure.In particular, one part of our vector refers to 2-body distribution function, while another part corresponds to many-body correlation function in order to describe better many-body interactions.The first part is represented by the following: 1. Lattice sums with 15 ≥ k ≥ 4 (lattice sums with k = 4, 5, 6 catch far neighbors) 2. For each atom i in the unit cell we calculate the following numbers (sums are taken in a sphere of a large radius around atom i): and and average this sums over atoms in the unit cell.Then we fix some list of g (the list is described below) and for every g we calculate 2 such features.Those terms appear as part of lattice sums with k = 1, 2. Hence, we calculate them within a sphere of a radius where both sums converge (the radius for a given parameter g is equal to 3/g -for details and notations see Section II).In this paper we use the following parameters for 2-body terms in all the examples presented in Section IV: g = [0.17;0.18; 0.19; 0.20; 0.21; 0.22; 0.23; 0.24; 0.25; 0.26].
In the second part we include volume per atom and average degree of order described in Refs.34 and 35.The last one has an important correlation for optimised structures: the higher degree of order, the lower energy of the structure. 35Motivated by success of Ref. 3, we also include averaged 3-body symmetry function which for i t h atom in the unit cell is defined as: , where j-th and k-th atoms are inside a sphere of relatively small radius (6Å); λ = ±1; ξ and η are some fixed parameters for each feature -the parameters used here are listed in Table I.So, the final length of our feature-vector is equal to 50.

C. Neural Network and its training
We use simple feed-forward neural network 28,36 with architecture (50-70-1).It means that the size of feature-vector is 50; we have one hidden layer with 70 nodes; 1 corresponds to the output since we fit only energy.We tested several architectures (with one or two hidden layers) on the examples described in Section IV and it turned out that the simplest architecture with one hidden layer showed the best performance: if the number of training samples is large (≥ 30000) we use (50-70-1), if the number of training samples is small we use (50-50-1).
We fixed the same activation function on both hidden and output layers: tanh(x) + γx.
Such an activation function prevents the so-called paralysis of a neural network; linear term is also important in the output layer, because energies may be arbitrary. 36We used γ = 0.1, and choice of that particular parameter was done by making several tests on examples below.The weights and biases are trained by standard back-propagation.We use batch gradient descent with conjugate gradients, but we modified the objective function.Here we are minimizing the following function: where m is the total number of training samples; x i , E i correspond to i t h feature-vector and its energy; F is the NN function (i.e.output) and w are weights which we are optimizing.Such choice of cost-function results in the NN being more sensitive to low-energy structures.Indeed, the higher is β the more accurately we predict low-energy part of PES, but at the cost of slight worsening of the description of the high-energy part.Since we use gradient descent for optimization of weights, it converges much faster if the feature-vector is normalized.In particular, each feature x is replaced by x ′ = x− x σ , where x denotes the average over the training data; σ is standard deviation.In our scheme, the subtraction of 2-body term may be considered as physically motivated normalization of energies (i.e.normalization of the output of the NN).

IV. RESULTS
In this section we consider 4 systems: Al, C, He and Xe.For metallic Al, many-body effects can be largely subsumed into the pair potential (leading to its oscillating form); for covalent C many-body interactions are essential and largely irreducible to pair effects; for noble gases He and Xe one expects a mostly pairwise interaction of Lennard-Jones type.
Data (structures and energies) on Al, C, He and Xe were generated by USPEX code as described above; we considered structures with 8 atoms in the unit cell.Energies were calculated using density functional theory (DFT) as implemented in the Vienna Ab initio Simulation Package (VASP). 37The generalized gradient approximation of Perdew-Burke-Ernzerhof 38 was used to describe Al and C, whereas we used van der Waals function 39 for He and Xe.Projector-augmented wave (PAW) method 40 was employed to describe core electrons.We used plane wave energy cutoffs of 350 eV, 600 eV, 650 eV and 200 eV for Al, C, He and Xe, respectively, and Γ-centered k-point meshes with the resolution of 2π • 0.07Å −1 .

A. Aluminum
As described in Section III, we applied our scheme to Al, where we collected 30 000 training structures (∼12 000 of them correspond to random and the rest are intermediate structures collected during relaxation) and 8 000 were used for test.We summarize our results in Table II.Energies were in the range [−3.8; 0] eV/atom.
One can notice in Table II that results at low energies do not depend much on β here.This is due to the very high contribution of 2-body part to the total energy ≥ 90%.This significantly changes in carbon below, where interactions are much more complex.Reconstructed potential is plotted on Fig. 2. Since the data include structures of different densities, it means that this potential is averaged over different densities.
Since the interaction potential generally depends on the density, we performed the following calculations: about 20 000 random structures with fixed densities were generated and afterwards for each density we reconstructed the potentials using our scheme with architecture (50-50-1) (we fixed β = 0).We used the following densities: 8.62  they have pronounced wiggles which correspond to Friedel oscillations, i.e. reflects the effect of screening of the pair interaction by the electron gas, and they have minima at the same Al-Al distance (also a confirmation of Friedel's theory).This distance is very close to 2.86 Å -the neighbour distance between Al atoms in the fcc structure of Al.Before proceeding further, we would like to notice that 2-body contribution is higher in structures with higher energies (structures far from local energy minima).This means that a given non-optimal structure can be relaxed at first steps only by using the potential reconstruction method.
Interestingly, the higher the density the higher the potential -see Fig. 3.At first sight it may be concluded that the the higher the density the higher relative contribution of many-body terms to the total energy.But for Al calculations show the opposite.We guess that this is because a large part of many-body interaction energy is subsumed into the (now density-dependent) pair potential.

B. Carbon
We applied our scheme to carbon, where we collected 37 000 training structures (∼12 000 of them correspond to random and the rest are intermediate structures collected during relaxations) and 9 000 were used for test.We summarize our results in Table III.Energies of structures covered a wide range [−9.3; 0] eV/atom.We calculated the average 2-body contribution and it turned out to be unexpectedly small -only about 50%.It means that this system is very complex and cannot be treated by any pair model, many-body interactions being essential.decided to test the accuracy of our model by comparing it with the well-known ReaxFF model 41 for test structures (i.e.those that were not used for training of our model).One important question arises: how to compare different models?We want to emphasize that although we want to fit high-energy part of PES, it is more important to describe low-energy part very accurately.We propose to plot RMSE as a function of energy: RMSE(E) gives RMSE for structures with energies below E. Indeed, such plot reflects better the performance of the model.We plot RMSE(E) curves for ReaxFF and for our models in Fig. 4. Clearly, the higher β the better low-energy part of PES is described and the worse is high-energy part.Analysing the curves, we can naturally separate models with different β and chose to use them in different energy ranges, obtaining a very good fit of both low-and high-energy regions.We compare our best model with performance of ReaxFF on Fig. 4 and on Fig. 5.This reflects a clear advantage of our scheme.

C. Noble gases
It is well known that interactions in noble gases can be described by a simple Lennard-Jones potential.Our scheme recovers this shape without using any assumptions -see Fig. 6.
Here we used a simple architecture (50-50-1) (we fixed β = 0) and trained on unrelaxed structures (∼ 20000 structures for each systems).The minima of these potentials are very close to the sum of van der Waals radii: R H e−H e = 2.80 Å and R X e−X e = 4.32 Å. He-He potential has a very shallow minimum, as expected.Moreover, 2-body contribution to the total energy in He is 97%; during the process of training, RMSE on train was only 0.003 eV/atom, while the energy range was [−0.015; 0.87] eV/atom.In Xe the 2-body contribution decreased to 92% and many-body terms turned out to be quite important.Many-body effects in Xe are discussed comprehensively in paper. 42

V. DISCUSSIONS
The new machine learning strategy is proposed for recovering potential energy surfaces, with emphasis on its use for global exploration of the PES (e.g., in crystal structure prediction).The method is a combination of the proposed lattice sums method and an artificial neural network.We reconstructed potentials for a number of systems, and our strategy showed good results.Our reconstructed Al-Al potential showed, as expected for metals, strong density-dependence and oscillatory form originating from Friedel oscillations.Lattice sums with k = 4, 5, 6 are very important for metals.We demonstrate that unlike metals, covalent crystals (e.g.carbon) cannot be adequately described by any pair potential, and many-body interactions are essential.Our scheme, including many-body interactions, shows much better performance than well-known ReaxFF manybody forcefield.Reconstructed pair potentials in noble gases (He-He and Xe-Xe) demonstrate the Lennard-Jones type shape with minima corresponding to sums of van der Waals radii.FIG. 7. Performance of our scheme varying size of data used for training.

FIG. 1 .
FIG. 1. Reconstructed and real potentials in A x B y Lennard-Jones system.

FIG. 5 .
FIG. 5. Comparison of performances of our best mixed model with known ReaxFF method.

TABLE I .
Parameters for 3-body terms used in all the examples in Section IV.

TABLE II .
Results of our scheme for aluminum: r low and RMSE low correspond to the Pearson coefficient and RMSE on test structures with energies not more than 0.5 eV/atom above the minimal energy; r and RMSE correspond to all test structures.

TABLE III .
Results of our scheme for carbon: r low and RMSE low correspond to the Pearson coefficient and RMSE on test structures with energies not more than 1 eV/atom above the minimal energy; r and RMSE correspond to all test structures.