A universal preconditioner for simulating condensed phase materials

We introduce a universal sparse preconditioner that accelerates geometry optimisation and saddle point search tasks that are common in the atomic scale simulation of materials. Our preconditioner is based on the neighbourhood structure and we demonstrate the gain in computational efficiency in a wide range of materials that include metals, insulators and molecular solids. The simple structure of the preconditioner means that the gains can be realised in practice not only when using expensive electronic structure models but also for fast empirical potentials. Even for relatively small systems of a few hundred atoms, we observe speedups of a factor of two or more, and the gain grows with system size. An open source Python implementation within the Atomic Simulation Environment is available, offering interfaces to a wide range of atomistic codes.


I. INTRODUCTION
Geometry optimisation, i.e. finding a nearby local minimum of the potential energy surface is the most common routine task of atomistic modelling, not only used for finding the equilibrium geometries of molecules and crystals but also as a fundamental building block of more complex algorithms for global optimisation, 1 structure prediction by random search 2 and sampling. 3The closely related task of finding saddle points is also used for finding transition states of reactions, global optimisation, and accelerated sampling.
It is well recognised in the optimisation community how important preconditioners are in creating efficient algorithms.An example familiar in the electronic structure community is using the kinetic energy operator as a preconditioner when solving the electronic energy minimisation problem in plane wave pseudopotential density functional theory (DFT) codes. 4Preconditioning in linear algebra and numerical PDE problems is well established, but "universal" preconditioners do not work particularly well, and most practitioners advocate constructing preconditioners specifically designed to suit each problem. 5here is a middle ground, which is to reduce the domain enough to be able to give a good preconditioner, but keep it general enough that many problems that need solving fall into it.
The hallmark of a good preconditioner is that it captures some aspects of the local curvature of the potential a) Electronic mail: j.r.kermode@warwick.ac.uk b) Electronic mail: c.ortner@warwick.ac.uk c) Electronic mail: gc121@cam.ac.uk energy landscape, e.g.some of the directions in which the minimum is much shallower than in other directions.In this way, using the preconditioner enhances the convergence by reducing the condition number (see (2)).For example, it was recognised by many that geometry optimisation with a computationally expensive electronic structure model can be preconditioned using cheap empirical interatomic model.This approach is clearly not feasible for large scale problems in which the modeling method itself is a relatively cheap interatomic model.
A universal goal in preconditioning of condensed phase atomistic systems is to take account of the long wavelength vibrational modes, whose energies tend towards zero as the system size increases, while the eigenvalues corresponding to the high frequency optical modes stay constant.In order to capture this geometry, due to the intrinsic locality of the interaction Hamiltonian, it is enough to build a model that is aware of the neighbourhood structure of the constituent atoms or molecules.
In this work we use the simplest preconditioner that is capable of capturing this structure, the adjacency matrix of the atoms, or a smoothed variant using a distance cutoff.The only requirement of the cutoff is that it is chosen such that all atoms are assigned some neighbours.We choose example systems of current interest which have a wide range of system sizes.
For a steepest descent (SD) or nonlinear conjugate gradient (CG) scheme with preconditioner P one expects that the number n P of iterations required to reach a relative residual τ is 6 where κ P is the condition number of the preconditioned Hessian at equilibrium, and are the largest and smallest eigenvalues.For a material system with a diameter of R atomic spacings, without preconditioning (i.e.P ≡ I), one expects κ I ∼ R while our preconditioner achieves that κ P is independent of R. Therefore the expected efficiency gain is The theory of the most commonly used Broyden-Fletcher-Goldfarb-Shanno (BFGS) and similar quasi-Newton type schemes is less clear, but numerical evidence suggests that a similar conclusion as in the CG case can be drawn.

A. Geometry Optimisation
Throughout, we let f (x) denote the energy for a configuration x.If x k is an iterate of an optimization algorithm then we denote the gradient and Hessian at x k , by The most basic geometry optimisation schemes are steepest descent and (undamped) Newton's method, While the former suffers from slow convergence to equilibrium due to ill-conditioning of the energy-landscape, the latter is usually impractical since (i) analytical Hessians are typically unavailable for complex interatomic potentials and electronic structure methods and (ii) are expensive to invert.Line search is an essential part of all the above gradient descent algorithms, and preconditioning the line search (as opposed to preconditioning the Newton step) can be thought of as a middle ground, replacing H k with an approximate Hessian P k , The usual requirements on P k are that it is (1) cheap to build; (2) cheap to invert; and (3) positive definite to ensure descent in energy.
The most common way to construct P k is via a quasi-Newton approach, typically (L)BFGS.This works poorly for large systems since many iterations are required to "learn the Hessian" to a useful degree of accuracy.Physical intuition and mathematical analysis can been used to develop an improved initial guess for the Hessian to speed up convergence. 7n alternative approach (sometimes used in the electronic structure community 8 ) is to take P k = ∇ 2 f (x k ) to be the Hessian of a surrogate interatomic potential model f .This has considerable potential for performance gains if a good surrogate model f can be found.Downsides of this approach are (i) the challenge of finding or constructing such a surrogate model; (ii) indefiniteness of the surrogate Hessian in the nonlinear regime (and potentially even in the asymptotic regime); (iii) lack of transferability of the preconditioner: changing the system requires the construction of a new surrogate model.

B. Metric preconditioning
Assume, for the moment, that we use the same preconditioner throughout the optimization process, P k ≡ P in (8).An alternative point of view, which is common in the numerical linear algebra and nonlinear optimisation communities, is to think of P as defining a metric on the space of configurations.To see this note that calling −g k = −∇f (x k ) the direction of steepest descent is with reference to the ℓ 2 -norm u I := ( |u i | 2 ) 1/2 (where u is a direction in configuration space).If we measure distances in configuration space with respect to the P -norm, u P = (u T P u) 1/2 , then the direction of steepest descent becomes arg min That is, (8) is the natural steepest descent scheme with respect to the metric P k .The advantage of this point of view is that it frees us from the constraint of aiming to approximate the Hessian.Instead we are now searching for an alternative notion of distance in configuration space, which is a more general concept and a fixed choice of metric may exist that is suitable for a wide range of atomistic systems.Equivalently, we may think of (8) in terms of a change of coordinates.Let xk := P 1/2 x k , and F (x) = f (P −1/2 x), then the "standard" gradient descent scheme xk+1 = xk − α k ∇F (x k ) is equivalent to (8).
Since ∇ 2 F (x) = P −1/2 ∇ 2 f (P −1/2 x)P −1/2 it follows that the rate of convergence x k → x of (8) to a limit x is given by 9 where κ P is the condition number of P −1/2 HP −1/2 .The latter can be computed from the generalised eigenvalue problem Hv = λP v. (10)   While approximating the Hessian would lead us to aim for P such that κ P ≈ 1, we shall be content with a good notion of distance which will lead to a P such that κ P is bounded by some moderate constant for a wide range of systems of interest.
Our final remark in this abstract context is that while the discussion of convergence rates applies strictly to the asymptotic regime of the iteration, preconditioning also improves performance in the pre-asymptotic regime: a moderate upper bound on κ P implies, loosely speaking, that (8) relaxes all wavelength modes simultaneously rather than focusing on short wavelength modes first.

C. Preconditioned LBFGS
The usage of a preconditioner is not restricted to the steepest descent method, but it can be readily applied to improved optimisation algorithms such as nonlinear conjugate gradients.It is particularly effective when combined with the LBFGS scheme 6 , for which we briefly outline the implementation. Using k s k then the action of the inverse Hessian can be efficiently approximated, This formulation of LBFGS does not require the approximate Hessian itself to be stored, only the positions and gradients at previous iterates.For the initial iterate we simply obtain z = P −1 0 ∇f (x 0 ).The boxed step is the only modification needed to the standard algorithm to achieve preconditioning.After obtaining the output p k = z from (11), the LBFGS step takes the form for a suitable choice of step length α k .

D. A simple and general metric for materials
Changes in energy of atomistic systems occur through changes in bonding, for which the simplest measure is change in bond-length.Motivated by this observation we propose the following preconditioner for materials systems: given parameters r cut , r nn , A, µ (we will discuss below how to choose these automatically) we define P via the quadratic form or, written in matrix form Default parameters are discussed in section II E.
Remarks.(i) The exponential form of c ij is for convenience, and has no deeper physical meaning; A = 0 corresponds to using the adjacency matrix with a hard cutoff.(ii) We use this metric even for multi-component systems, however, if the interaction strength and/or distances between different components varies significantly, then it would be straightforward to generalise it by distinguishing different types of bonds.(iii) As shown in the Appendix, for Bravais lattices, phonon stability is equivalent to the lower bound u T Hu ≥ cu T P u for some constant c > 0.
Together with the generic and elementary upper bound u T Hu ≤ Cu T P u and equations ( 2)-( 4) we obtain that for finite periodic supercells in a Bravais lattice state, the condition number κ P for the preconditioned system is bounded above by C/c independently of the system size.In the presence of defects (crystal surfaces, point defects, dislocation lines) or even disorder partial results in this direction likely still hold because P contains the nearest neighbour bonds that dominate in H.

E. Default Parameters
The parameters A and r cut are user inputs, however P is fairly insensitive to their choice, provided their interdependency illustrated in Figure 1 is taken into account.Hence, we suggest generic default parameters below.The parameters µ and r nn are computed in a preprocessing step from the initial configuration of the optimisation.
1.The nearest-neighbour distance r nn is obtained as the maximum of nearest neighbour bond-lengths: if r 2. The exponent A should be large enough to ensure that nearest neighbours dominate, but not so large that small changes in the configuration lead to large changes in P .All our tests are performed with A = 0 and A = 3; with A = 3 giving slightly better performance.3. The cut-off r cut should be larger than r nn , however, then exponential decay of the preconditioner entries ensures that additional entries have a small influence.For A = 0 we choose r cut = 1.1r nn and when A = 3 we use r cut = 2r nn .The latter choice is intuitively preferable since it accommodates the possibility of significant bond stretching.
4. Finally, the energy-scale µ is chosen to ensure that the LBFGS algorithm can choose the unit step-length as the default.We achieve this by equating where P µ=1 is the metric with µ = 1 and v is a test displacement of the form where L i are the lengths of the periodic lattice vectors and M is a user-defined matrix with default value M = 10 −2 r nn I.

F. Implementation details
Preconditioner application.It is important that the cost of applying the preconditioner does not dominate the cost of the calculation over the evaluation of energy and gradient.For inexpensive models (Lennard-Jones, EAM, Stillinger-Weber, etc) the choice of method to solve z = P −1 k q in (11) is crucial.Our implementation uses a smoothed aggregation algebraic multigrid method 10 .As a further optimisation we only rebuild the preconditoner when the maximum atomic displacement since the last update exceeds r nn /2.
Line search.Irrespective of the choice of the search direction used (e.g., SD (8), CG 6 or LBFGS (12)) a line search algorithm must be implemented to choose the length of the step, α k .The standard choice is a bracketing algorithm which enforces sufficient decrease and approximate orthogonality between subsequent directions (Wolfe conditions).We observed in our tests that a backtracking algorithm imposing only sufficient decrease (Armijo condition), although less robust in theory, was more efficient in practise.We give the details of our implementation, and additional discussion, in Appendix A.
Robust energy differences.The computation of the energy differences and inner products in the Wolfe conditions (A1, A2) must be performed with a high degree of accuracy, since the optimization algorithm relies on robustly detecting the change in energy.A common difficulty in implementing a line search strategy based on (A1, A2) is the numerical round-off error that arises for large numbers of atoms (typically 10 5 or higher).Numerically robust inner products are equally important in the inversion of the preconditioner and in the LBFGS algorithm.Numerically robust evaluation of energy differences and inner products may, for example, be implemented using compensated summation algorithms. 11A simpler strategy which proved sufficient in our case is to use 128 bit floating point numbers for these steps.
Stabilisation.If the system contains clamped atoms then the preconditioner defined in ( 13) is strictly positive definite but in order to improve its conditioning and ensure positive definiteness for cases where there are no clamped atoms, we stabilize the preconditioner by adding a diagonal term, In all our results we choose C stab = 0.1.Even when there are clamped atoms, we find that setting C stab = 0.1 improves overall performance.

Variable cell optimisation.
We confirmed that our preconditoner also gives good performance when degrees of freedom associated with the periodic unit cell are included as well as the atomic positions.Following the approach of Tadmor et al. 12 , we consider a combined objective function Φ(x, D) = f (Dx) with 3N + 9 degrees of freedom: 3N for the atomic positions x and 9 components of the deformation tensor D, which is with respect to the original undeformed unit cell.The combined gradient is then given by where V is the cell volume and σ the stress tensor, and we have introduced an additional preconditioner parameter µ c to set the energy scale for the cell degrees of freedom.µ c can be pre-computed at the same time as µ for no additional cost by including a trial perturbation of the cell in (15), with default v(x c ) = M/r nn = 10 −2 I.

III. RESULTS
We have selected a broad range of materials examples to test our preconditioner.The first is a 160 Si atom 1 × 1 × 20 supercell of the cubic diamond structure cell in a slab geometry, with periodic boundary conditions along x and y and free boundaries in z, simulated with the Stillinger-Weber interatomic potential. 13The two halves of the cell (along z) are uniformly displaced toward each other by 0.5 Å, creating a large but very localized strain in the center of the slab.The problem is ill-conditioned because the initial strain is localized, but reaching the relaxed geometry requires all the slab atoms to move out towards the free surfaces.As shown in Fig. 2, both the A = 0 and A = 3 preconditioners dramatically reduce the computational cost of the minimization, by a factor of about 6 compared to the non-preconditioned minimizer.Results using the Pfrommer et al. 7 block-diagonal approximation to the initial inverse Hessian are also shown for comparison.Note that even for this relatively fast interatomic potential the computational cost of applying the preconditioner is nearly negligible, so the reduction in computational time is nearly equal to the reduction in number of energy evaluations.
Next we consider a 33,696-atom Si model of the (111) [11 2] cleavage system (Fig. 3) in a quasi-twodimensional thin strip geometry with dimensions 717 × 242 × 3.84 Å3 .The applied strain was chosen so that the crack is lattice trapped 14 , leading to a stable ground state with the Stillinger-Weber 13 interatomic potential.Strong coupling between length scales makes this a difficult system to optimize and hence a good test of our preconditioner.A complex trade off between local chemical cost and long-range elastic relaxation makes it favourable for a 5-7 crack tip reconstruction to form via a bond rotation. 15Here, we find that both the A = 0 and A = 3 preconditioners lead to a significant speed up over both unpreconditoned LBFGS and the approach of Pfrommer et al. 7 .Fig. 3 also includes a comparison between the Armijo and Wolfe line searches.As noted above, enforcing only the Armijo condition leads to a further increase in performance.
To investigate whether the theoretical independence of the cost of preconditioned minimisations from system size (Eq.5) is achieved in real systems, we carried out tests in a series of N × 1 × 1 Si supercells, again using the Stillinger-Weber potential.The atomic positions were perturbed by random displacements of magnitude 0.1 Å and also subjected to a compressive strain of 0.5% to introduce a long-wavelength deformation.The results shown in Fig. 4 indicate that our preconditioner achieves convergence after an approximately constant number of

FIG. 2.
Convergence of the geometry optimisation of a 160atom silicon slab using the Stillinger-Weber potential in fixed unit cell.The parameters of the preconditioner (or not using a preconditioner) are given in the legend.The lower panel shows the time required to solve the problem in each case, indicating that the overhead of constructing and applying the preconditioner is minimal in comparison to the cost of computing forces with the interatomic potential.
force evaluations as the system is made larger, in contrast to not using a preconditioner or to the approach of Pfrommer et al. 7 which does not use connectivity information.Our new method is therefore expected to be particularily useful for very large systems.
Since large systems inherently have a wide range of displacement wavelengths and corresponding stiffnesses, it is not obvious a priori how much preconditioning will help for a smaller system, for example one that can feasibly be simulated using density functional theory.We therefore simulated a perovskite structure oxide, LaAlO 3 , in a 220atom slab geometry with periodic boundary conditions in-plane and free surfaces separated by a vacuum region in the normal direction.7][18] In this system, as shown in Fig. 5, we find that the preconditioning still significantly reduces the computational cost, but the improvement is not as dramatic as for the larger systems discussed above.With our convergence criterion the reduction is about a factor of two, although the non-preconditioned minimization stagnates just before reaching convergence, and with a slightly looser criterion the reduction would only be a factor of 1.6.Note that the computational cost of the DFT energy and force evaluations is so large that the application of the preconditioner is completely negligible FIG. 3. Convergence of the geometry optimisation of a silicon crack using the Stillinger-Weber potential in a fixed unit cell with 33,696 atoms.Solid and dashed lines correspond to using line searches enforcing Armijo and Wolfe conditions, respectively.The parameters of the preconditioner (or not using a preconditioner) are given in the legend.in comparison.In this case the approach of Pfrommer et al. 7 does not make a significant improvemenet over not using a preconditioner.
For a test of the relaxation of both atomic positions and unit cell size and shape we used a 1 × 1 × 2 supercell of a γ-Al 2 O 3 structure, with methods similar to those described above for LaAlO 3 , except for a 530 eV plane wave cutoff and a Γ-centered k-point mesh.For this system, plotted in Fig. 6, the reduction in computations for both preconditioners is about a factor of 5, a very significant improvement.While the non-preconditioned  FIG. 6. Convergence of the geometry optimisation of a 106atom γ-Al2O3 system in a variable cell.The parameters of the preconditioner (or not using a preconditioner) are given in the legend.minimizer fails to make progress at several points during the relaxation, both our new preconditioners allow the LBFGS minimizer to rapidly and steadily reduce the gradient until convergence.Here, the approach of Pfrommer et al. 7 actually results in slightly worse performance than unpreconditioned LBFGS.This could perhaps be improved by careful tuning of the bulk modulus and optical phonon frequency parameters used to construct the approximation to the inverse Hessian; however, we note that our new preconditioner does not require any user input as all parameters are computed automatically.The addition of the cell degrees of freedom, which are preconditioned in magnitude but not coupled to the positional degrees of freedom, do not reduce the effectiveness of our preconditioners.
Finally we tested the new preconditioner for a molec- FIG. 7. Convergence of the geometry optimisation of a 432atom ice VIII system with fixed (solid lines) and variable (dashed lines) unit cells.The parameters of the preconditioner (or not using a preconditioner) are given in the legend.
ular system, ice VIII.The system contained 432 atoms with an initial cell dimension of 13.65 × 13.65 × 13.16 Å3 .A DFT potential with BLYP exchange-correlation functional was used with DZVP basis set and GTH pseudopotentials.Calculations were performed by the CP2K program package using the QUIP interface. 16,19,20Fig. 7 shows the number of energy evaluations of the different optimisations for fixed and variable cells using a maximum force threshold of 10 −3 eV Å−1 .Similarly to previous systems, the Armijo condition performed better than Wolfe so we present here only the results with the former line search.In the A = 0 case we slightly increased the default cutoff parameter (r cut = 2.25 Å) to include hydrogen bonded neighbours too.For both the fixed and variable cells the computational costs compared to the unpreconditioned optimisation were reduced by 3 and 4 times using A = 0 and A = 3, respectively.

IV. SADDLE SEARCH
To demonstrate the transferability of our preconditioner not only across problem classes but also across algorithms, we apply it to the dimer saddle search algorithm. 21,22A modified variant of the algorithm proposed in Ref. 22 reads a finite-difference approximation of ∇ 2 f (x k )v.Interestingly, naive preconditioning of the orientation steps led to poorer performance in our tests.
We test this preconditioned dimer algorithm by computing the saddle configuration of a vacancy in a Lennard-Jones fcc crystal, with a cubic computational cell.Given two states x (0) , x (1) which have two neighbouring lattice sites removed, we choose the starting configuration x 0 = 1 3 x (0) + 2 3 x (1) and v 0 ∝ x (1) − x (0) .The step-sizes are chosen by hand-optimising for a small setup with 3 3 unit cells: α = 0.01, β = 0.005 for the unpreconditioned variant (P k = I) and α = 0.5, β = 0.01 for our preconditioner with parameters A = 3.0, r rcut = 2r nn .For both variants we chose h = 10 −2 .The results are displayed in Figure 8, demonstrating analogous improvements to the energy minimisation examples.

V. CONCLUSIONS
In summary, we have presented a simple preconditioner for geometry optimisation and saddle search that is universally applicable in a wide range of atomistic and molecular condensed phase systems, offering at least a factor of two in performance gain in our examples of small systems, and up to factor of ten in systems of tens of thousands of atoms.The extra cost of using the preconditioner is small enough that it is worth using even with inexpensive interatomic potentials, while the performance gain is expected to scale as the square root of the system size.A Python implementation within the Atomic Simulation Environment 23 is available at https://gitlab.com/jameskermode/ase,offering interfaces to a wide range of atomistic codes such as VASP 17 , CASTEP 24 , CP2K 19 , LAMMPS 25 , and many others.Conversely, if u T Hu ≥ cu T P u, then phonon stability of P implies phonon stability of H.But the former is an immediate consequence of the fact that the coefficients in the definition of P are positive. 26
FIG. 4. Scaling with system size for geometry optimisations in N × 1 × 1 Si supercells containing from 32 to 512 atoms with the Stillinger-Weber potential, using unpreconditioned LBFGS (red), the inverse Hessian approximation of Pfrommer et al. (magenta), and our new preconditioner (blue).

FIG. 5 .
FIG.5.Convergence of the geometry optimisation of a 220atom LaAlO3 slab using DFT in a fixed unit cell.The parameters of the preconditioner (or not using a preconditioner) are given in the legend.

. 1 3 FIG. 8 .
FIG.8.Performance of the preconditioned dimer method for a vacancy in a Lennard-Jones fcc crystal (solid lines), and without a preconditioner (dashed lines), for two system sizes.