Continuous and Optimally Complete Description of Chemical Environments Using Spherical Bessel Descriptors

Recently, machine learning potentials have been advanced as candidates to combine the high-accuracy of quantum mechanical simulations with the speed of classical interatomic potentials. A crucial component of a machine learning potential is the description of local atomic environments by some set of descriptors. These should ideally be continuous throughout the specified local atomic environment, twice-differentiable with respect to atomic positions and complete in the sense of containing all possible information about the neighborhood. An updated version of the recently proposed Spherical Bessel descriptors satisfies all three of these properties, and moreover is optimally complete in the sense of encoding all configurational information with the smallest possible number of descriptors. The Smooth Overlap of Atomic Position descriptors that are frequently visited in the literature and the Zernike descriptors that are built upon a similar basis are included into the discussion as being the natural counterparts of the Spherical Bessel descriptors, and shown to be incapable of satisfying the full list of core requirements for an accurate description. Aside being mathematically and physically superior, the Spherical Bessel descriptors have also the advantage of allowing machine learning potentials of comparable accuracy that require roughly an order of magnitude less computation time per evaluation than the Smooth Overlap of Atomic Position descriptors, which appear to be the common choice of descriptors in recent studies.


I. INTRODUCTION
Machine learning potentials (MLPs) have recently become a subject of interest in computational materials science 1 , potentially offering the accuracy of electronic structure techniques like density functional theory (DFT) without the associated computational cost. An MLP effectively learns to reproduce the potential energy surface (PES), i.e., the hypersurface that defines the potential energy of an atomic system as a function of the atomic positions.
While the reliability of atomistic simulations including molecular dynamics (MD) depends on the accuracy of the PES, their usefulness to study complex phenomena is limited by the accessible time and length scales; in practice this makes the computational cost of an MD simulation nearly as much a concern as the accuracy. Recent studies [2][3][4] suggest that MLPs can achieve a favorable combination of performance and accuracy that is provided by neither classical force fields nor electronic structure calculations.
Machine learning (ML) algorithms that have been employed to construct MLPs include artificial neural networks (ANNs) 5,6 , support vector machines (SVMs) 7 and Gaussian processes (GPs) 8 . Regardless of the algorithm, MLPs rely on the reasonable assumption that the energy of an atom is a multidimensional function of the relative positions of the neighboring atoms. This atom-centered approach 6 enables the total energy E of a system to be calculated by summing over all individual atomic energies E i as and reduces the problem to one involving a local atomic environment. This environment is usually encoded as a set of scalars, known as descriptors, that serve as the inputs for the atom-centered MLPs. Faber et al. 9 carried out a systematic study of how the choice of descriptors and ML algorithm can affect the accuracy of an MLP by testing a variety of combinations. They found that the choice of descriptors could affect the accuracy more than the regression scheme, justifying the effort spent over the last decade in developing the many competing descriptors available in the literature 6,8,[10][11][12][13][14] . Of these, the Behler-Parinello (BP) symmetry functions 6 and the Smooth Overlap of Atomic Position (SOAP) descriptors 10 are some of the most frequently used, and have been employed in MLPs that achieve the accuracy of electronic structure methods in a variety of applications [15][16][17][18] . Afterwards, Khorshidi et al. proposed to use the Zernike polynomials 11,19 and the neighbor density function of Bartok et al. 8 to construct the Zernike descriptors, and reported comparable results. Recently, Kocer et al. 20 proposed to use the spherical Bessel functions with a closely related procedure to construct the Spherical Bessel (SB) descriptors. These were found to allow construction of MLPs significantly more accurate than those using the BP symmetry functions, and of comparable accuracy to but an order of magnitude faster to evaluate than those using the SOAP descriptors.
Any set of descriptors should satisfy a number of mathematical properties to not constrain the ability of the ML algorithm to approximate the PES. First, it is desirable from a computational standpoint that they be invariant to the symmetries of the physical system (i.e., translations, rotations, inversions and permutation of atomic labels) to reduce the domain of the PES and the number of training examples required. More subtle but perhaps more important is that the descriptors be similar but distinct for similar but distinct atomic environments: If the descriptors are not similar, the MLP would not likely be continuous, and if the descriptors are not distinct, the MLP would not be able to reproduce potentially significant features of some physical systems. This is closely related to the concept of completeness, here defined as the condition that the space of all local atomic environments that are not related by physical symmetries is smoothly embedded into the space of descriptors. This is desireable because, e.g., a set of descriptors that is complete allows the atomic environment to be reconstructed up to symmetry. The stronger condition of optimal completeness requires that the embedding into the space of descriptors always be achieved with the minimum number of descriptors, and is highly desireable for computational reasons. Finally, the descriptors should be twice-differentiable to allow for continuity of forces and elastic constants, contain few adjustable parameters to help with transferrability of the potentials, and be numerically efficient to evaluate. To the extent of our knowledge, none of the descriptors available in the literature fulfills all of these requirements. This paper presents an updated version of the SB descriptors (requiring only a change in indexing) that makes them continuous with respect to atomic displacements. A necessary condition for optimal completeness is then formulated using the Rank Theorem 21 . The SB descriptors are found to satisfy this condition, whereas the power spectrum coefficients used in the construction of the SOAP descriptors do not. Finally, the accuracy and efficiency of the SB descriptors in a proof-of-concept MLP are compared to several of the alternatives available in the literature.

II. SPHERICAL BESSEL DESCRIPTORS
Following a similar procedure to our recent study 20 , an atomic neighbor density function is first defined for a central atom i, wherer ij are the relative position vectors of each neighbor j with respect to i. The weight factor w k ij could be used to specify the species of atoms i and j in a multi-component system, but is assumed to be one in this study. The neighbor density function ρ(r) is projected onto a set of orthonormal basis functions on the ball of radius r c , giving an expansion of the form where g nl (r) is a radial basis function, Y m l (θ, φ) is a spherical harmonic, and n max specifies the order of the approximation. While many functions could be used for the g nl (r), the one for the SB descriptors begins with the linear combination f nl (r) = a nl j l r u ln r c + b nl j l r u l,n+1 r c where a nl and b nl are constants, j l (r) is the lth spherical Bessel function of the first kind, u ln is the (n + 1)th nonzero root of j l (r), and r c is the cutoff radius. The condition f nl (r c ) = 0 is satisfied by definition, and a nl and b nl can surprisingly be chosen to simultaneously satisfy the conditions f nl (r c ) = 0 and f nl (r c ) = 0, i.e., to make the radial basis functions twice differentiable at the cutoff radius. Along with normalization, this leads to The radial basis functions g nl (r) are then obtained by applying a Gram-Schmidt process to the f nl (r) for 0 ≤ n ≤ n max . A detailed derivation of the g nl (r) and explicit recursion relations that allow them to be efficiently evaluated are provided in the supplementary material. Given the radial and angular basis functions in Eq. 2, the expansion coefficients c nlm for the ith atom are calculated from the relative spherical coordinates (r ij , θ ij , φ ij ) of the neighboring atoms as by means of the standard orthogonality relations. The power spectrum p nl obtained from then comprises an infinite set of real-valued numbers that are used as the local structural descriptors. They are invariant to translations by the use of relative spherical coordinates, and to permutations of atomic labels by the construction of the neighbor density function in Eq. 1. Invariance to rotations and inversions can be seen by substituting Eq. 3 into Eq.
4 and reordering the summations to find where the subscript i is suppressed for clarity. The spherical harmonic addition theorem 22 allows this to be reduced to where P l is the Legendre polynomial of order l and γ jk is the triplet angle between atoms i, j and k. Since the radial distances and triplet angles that constitute the independent variables in Eq. 5 are invariant to rotations and inversions, the p nl necessarily have the same property. A second reason to consider Eq. 5 as defining the p nl is that Eq. 5 is much more efficient to evaluate than Eqs. 3 and 4.
The original definition 20 of the radial basis functions g nl (r) included the constraint l = 0, removing the coupling between the angular and radial parts in Eq. 2. While this significantly simplified the evaluation without an observable effect on the accuracy of the MLP for that specific set of training data, this also introduced discontinuities around the origin (near the central atom) in the basis functions in Eq. 2 for odd l (Fig. 1a). Precisely the same issue occurs for the basis functions used in the SOAP descriptors (Fig. 1b), and is likely the motivation for using a superposition of Gaussians in the neighbor density function 10 to smooth over the discontinuity. This approach comes at the price of expensive numerical integrations when evaluating the descriptors though, and introduces additional adjustable parameters. The proposed SB descriptors instead use basis functions that are twice differentiable everywhere (Fig. 1c), allowing the neighbor density function to be written as a superposition of Dirac delta functions and the descriptors to be calculated at least an order The functions n is a normalizing constant are known to form a complete orthonormal basis for square-integrable functions on this domain 27,28 . Since the proposed radial basis functions g nl (r) are derived by projecting the j l (r u ln rc ) onto the space of functions with vanishing first and second derivatives at the boundary and constructing an orthonormal basis from the result, the basis functions in Eq. 2 constitute a complete orthonormal basis for squareintegrable functions with the given boundary conditions as well.
With regard to the descriptors, completeness is usually considered to indicate whether the descriptors can be used to faithfully reconstruct a given local atomic environment up to symmetry. This paper instead defines completeness by whether the space of physicallydistinct atomic environments is smoothly embedded by a map into the space of descriptors.
This necessarily implies that there is an inverse map that allows the atomic environment to be reconstructed up to symmetry, and moreover that the map and its inverse are both continuous and differentiable. Observe that a local atomic environment with ν neighboring atoms is specified by 3ν distinct relative spherical coordinates, but only 3(ν − 1) quantities (e.g., the ν radial coordinates and 2ν − 3 triplet angles) are required to specify the environment up to rotations. This means that the space of physically-distinct atomic environments is 3(ν − 1)-dimensional, and a complete set of descriptors maps this space to a 3(ν − 1)-dimensional submanifold in the space of descriptors. Additionally, if the embedding is achieved using only the first 3(ν − 1) of the descriptors for any ν > 2, then the descriptors are said to be optimally complete. Intuitively, an optimally complete set of descriptors encodes all relevant information (and just this information) about the atomic environment as concisely as possible.
There is limited discussion of completeness in the literature. One exception is the proof by Shapeev 29 that any rotation-and permutation-invariant polynomial can be written as a linear combination of the moment tensor descriptors, implying that these descriptors are complete (though probably not optimally complete). A recent dimensionality-reduction study 30 also touches on this question, attempting to optimize an MLP by reducing the dimension of the feature space. Possibility of such a reduction indicates that the BP and SOAP descriptors considered by the study contain substantial redundant information.
While proving that a set of descriptors is complete using the definition above is quite difficult, there is a necessary (but not necessarily sufficient) condition for completeness and optimal completeness that can be readily evaulated for any set of descriptors. This involves 8 using the rank theorem 21 (a generalization of the implicit function theorem) to establish that the map from the space of physically-distinct atomic environments into the space of descriptors is locally invertible. More precisely, the condition establishes that for any particular atomic environment there is a set of closely-related and physically-distinct atomic environments over which the map into the space of descriptors is invertible, and that the inverse map is continuously differentiable. Practically speaking, this involves finding the rank r of the Jacobian matrix J of the function that transforms the relative atomic coordinates into the vector of descriptors. Assume for the moment that r is constant. If r < 3(ν − 1), then the descriptors discard relevant information and cannot be complete. If r > 3(ν − 1), then the numerical calculation is faulty and should be checked. If r = 3(ν − 1), then the descriptors satisfy the necessary condition to be complete (though this local property does not necessarily extend to a global one). With regard to optimal completeness, let J [q] be the matrix formed by taking the first q rows of J. If the rank of J [3(ν−1)] is 3(ν − 1) for any n > 2, then the descriptors satisfy the necessary condition to be optimally complete.
Consider the p nl for the local atomic environment around the ith atom. From Eq. 5, the p nl can be written as a function of the relative spherical coordinates of the neighboring atoms as p nl = 2l + 1 4π j k g n−l,l (r j )g n−l,l (r k )P l (cos θ j cos θ k + sin θ j sin θ k cos(φ j − φ k )) (6) using various trigonometric identities. This defines a map from the 3ν-dimensional space of relative spherical coordinates into the infinite-dimensional space of the descriptors p nl .
Let the Jacobian matrix of this map be constructed with rows labelled by the pairs (n, l) in lexicographic order, where the (n, l)th row contains the 3ν partial derivatives of p nl with respect to the relative spherical coordinates (provided in the supplementary material). In practice, J [q] is constructed to consider the information content of only the first q descriptors, and the rank of J [q] is found by performing singular value decomposition and counting the singular values that are substantially larger than the machine precision.
As a specific example, we generated an atomic environment with six randomly-positioned neighbors around a central atom, and constructed the J The decay of the s i to the machine precision for i > 15 indicates that the rank of J is 15.
FIG. 3. The logarithmic singular values log(s i ) of J [18] for the p nl descriptors as a function of q. Five configurations were generated by scaling the initial configuration such that the radial coordinate of the most distant atom ranged from 0.967r c to r c .
environments and different numbers of neighbors. This strongly suggests that the p nl satisfy the necessary conditions developed above for completeness and optimal completeness.
The above analysis is somewhat complicated by the differentiability of the p nl as atoms pass through the boundary at r = r c ; the differentiability of the p nl implies that the s i are continuous, and the number of significant s i should be reduced by three as an atom leaves the environment. This situation is considered in Fig. 4, where q is fixed at 18 and the log(s i ) are plotted as the most distant atom approaches the boundary. This shows that the rank behaves as expected, and moreover that the descriptors contain significant information even where 0 ≤ n ≤ n max , 0 ≤ n ≤ n max , and 0 ≤ l ≤ min(n , n). This is motivated by the desire to explicitly couple information from radial basis functions with different indices n and n, and is closely related to the standard construction of the SOAP descriptors 10,23 . The Jacobian matrix of this map is constructed with rows labelled by the triplets (n , n, l) in lexicographic order (partial derivatives are provided in the supplementary material). Figure   4a shows the logarithmic singular values of J [q] for an environment with six randomlypositioned neighbors. While these descriptors satisfy the same completeness condition as the p nl (p nnl = p nl ) they are certainly not optimally complete, with r = 15 only for q ≥ 33. The usual practice when constructing the SOAP descriptors 30 is to not place further restrictions on n and n, but inspection of Eq. 7 indicates that p n nl = p nn l . That is, nearly half of the p n nl and SOAP descriptors are trivially redundant and should be removed by enforcing the constraint n ≤ n . This is done in Fig. 4b, and while the situation is improved (r = 15 for q ≥ 26) there is still a significant number of redundant terms. Our conclusion is that constructing descriptors from the power spectrum with n = n is highly preferable to the n = n case.

IV. PERFORMANCE AND EFFICIENCY
Since the intention for the SB descriptors is to be used as inputs for an MLP, a highdimensional neural network potential (NNP) was constructed for solid-state silicon using the procedure described in Sections IIB and IIC of Kocer et al. 20 The training data consisted of 8500 environments sampled from an MD simulation equilibrated at 0 Pa and 1500 K that used the Stillinger-Weber potential 31 , and the error was defined as the root mean square deviation of the predicted energies of an additional 1500 environments. Corresponding NNPs were constructed using the same data and training procedure for the prior SB descriptors 20 , the SOAP descriptors 23 , and the Zernike descriptors 19 (the BP descriptors having been con- One of the advantages of the SB descriptors is that while they allow for the construction of MLPs of comparable accuracy to those using the SOAP descriptors, the SB descriptors are much faster to evaluate. Apart from the MATLAB implementations used for the timing experiments reported in Sec. II, a highly optimized C library with MATLAB and Python interfaces has been developed to calculate the SB descriptors and is publicly available a .

V. CONCLUSION
The descriptors used to describe a local atomic environment are one of the essential components of a machine learning potential. This paper identified a discontinuity in the basis functions used to construct the prior SB descriptors 20 and the SOAP descriptors 10 , and updated the indexing of the SB descriptors to make them twice-differentiable everywhere.
Moreover, the SB descriptors were shown to satisfy a necessary condition for optimal completeness on the basis of the rank theorem 21 , establishing their ability to encode all relevant physical information about a local atomic environment using the fewest possible descriptors. At present, the SB descriptors are the only descriptors known to satisfy this condition. Moreover, they have been shown to be more than an order of magnitude faster to calculate than the SOAP descriptors, and an optimized code to calculate the SB descriptors has been made available. Finally, the performance of an NNP for solid-state silicon using the SB descriptors was compared to that of NNPs using the prior SB descriptors, the SOAP descriptors, and the Zernike descriptors.