A trust-region augmented Hessian implementation for restricted and unrestricted Hartree-Fock and Kohn-Sham methods

We present a trust-region augmented Hessian implementation (TRAH-SCF) for restricted and unrestricted Hartree-Fock and Kohn-Sham methods. With TRAH-SCF convergence can always be achieved with tight convergence thresholds, which requires just a modest number of iterations. Our convergence benchmark study and our illustrative applications focus on open-shell molecules, also antiferromagnetically coupled systems, for which it is notoriously complicated to converge the Roothaan-Hall self-consistent field (SCF) equations. We compare the number of TRAH iterations to reach convergence with those of Pulay's and Kolmar's (K) variant of the direct inversion of the iterative subspace (DIIS) method and also analyze the obtained SCF solutions. Often TRAH-SCF finds a symmetry-broken solution with a lower energy than DIIS and KDIIS. For unrestricted calculations, this is accompanied by a larger spin contamination, i.e. larger deviation from the desired spin-restrictedexpectation value. However, there are also rare cases in which DIIS finds a solution with a lower energy than KDIIS and TRAH. In rare cases, both TRAH-SCF and KDIIS may also converge to an excited-state determinant solution. For those calculations with negative-gap TRAH solutions, standard DIIS always diverges. If comparable, TRAH usually needs more iterations to converge than DIIS and KDIIS because for every new set of orbitals the level-shifted Newton-Raphson equations are solved approximately and iteratively by means of an eigenvalue problem. Nevertheless, the total runtime of TRAH-SCF is still competitive with the DIIS-based approaches even if extended basis sets are employed, which is illustrated for a large hemocyanin model complex.

convergence can always be achieved with tight convergence thresholds, which requires just a modest number of iterations. Our convergence benchmark study and our illustrative applications focus on open-shell molecules, also antiferromagnetically coupled systems, for which it is notoriously complicated to converge the Roothaan-Hall selfconsistent field (SCF) equations. We compare the number of TRAH iterations to reach convergence with those of Pulay's and Kolmar's (K) variant of the direct inversion of the iterative subspace (DIIS) method and also analyze the obtained SCF solutions. Often TRAH-SCF finds a symmetry-broken solution with a lower energy than DIIS and KDIIS. For unrestricted calculations, this is accompanied by a larger spin contamination, i.e. larger deviation from the desired spin-restricted S 2 expectation value. However, there are also rare cases in which DIIS finds a solution with a lower energy than KDIIS and TRAH. In rare cases, both TRAH-SCF and KDIIS may also converge to an excited-state determinant solution. For those calculations with negative-gap TRAH solutions, standard DIIS always diverges. If comparable, TRAH usually needs more iterations to converge than DIIS and KDIIS because for every new set of orbitals the level-shifted Newton-Raphson equations are solved approximately and iteratively by means of an eigenvalue problem. Nevertheless, the total runtime of TRAH-SCF is still competitive with the DIIS-based approaches even if extended basis sets are employed, which is illustrated for a large hemocyanin model complex.

I. INTRODUCTION
Most of today's quantum chemical calculations use either the Hartree-Fock (HF) or Kohn-Sham (KS) density functional theory (DFT) ansatz to solve the molecular Schrödinger equation. Typically, a solution is obtained with the self-consistent field (SCF) method applied to the Roothaan-Hall (RH) equations 1 that, until convergence is reached, repetitively diagonalizes the Fock matrix in the atomic-orbital (AO) basis to receive a new set of molecular orbitals (MO). The SCF approach often faces convergence issues and thus several tricks become necessary such as mixing old and new density matrices 2 known as damping or shifting the virtual-orbital energies by a constant or varying value. 3 These two techniques are available and used by default in most quantum chemistry codes together with a extrapolation scheme known as the direct inversion of the iterative subspace (DIIS). 4 For most SCF calculations, the DIIS scheme together with damping and/or level shifts facilitates fast convergence within a few iterations. However, following this route convergence cannot be guaranteed and often fails, in particular for open-shell molecules. Open-shell molecules often feature small energetic gaps between the highest occupied (HO) MO and the lowest unoccupied (LU) MO and multiple low-lying energy solutions exists. Those obstacles lead to slow convergence rates or even divergence when following the standard SCF approach.
Alternatively, the HF and KS-DFT solution can be found with so-called second-order methods that employ the first and second derivative of the energy with respect to changes in the MO coefficients to minimize the energy by exploiting the variational principle. A second-order quadratically convergent (QC) SCF method was first proposed and implemented by Bacskay 5,6 that can be still considered as state-of-the-art in many aspects: (i) The orbital updates in every macro iteration were obtained from the eigenvalue equations of the gradient-augmented Hessian (AH) that contains the first and second energy derivative, respectively. (ii) The eigenvalue equations were solved iteratively with the Davidson algorithm 7 and the accuracy of those micro iteration was coupled to the accuracy of the current energy. (iii) The linear transformations of the Hessian with trial vectors for the orbital update were implemented with a Fock matrix-based formulation that circumvents the costly O(N 5 ) scaling transformations of the two-electron integrals from the AO to the MO basis with N being a measure for the system size.
Though it was shown that the QC-SCF method was significantly more robust than the standard RH based SCF approach, there is still no guarantee that QC-SCF will always converge. This issue was later resolved with a restricted step trust-region algorithm 8 for multi-configurational (MC) SCF methods by Jensen and Jørgensen,9 which at least in theory should always converge within a finite number of iteration. Note that for MCSCF methods convergence is often even harder to obtain due to the coupling of the orbital rotation and configuration interaction parameters. For that reason many other variants of second-order MCSCF optimizers have been developed in the course of time. 10 However, that norm extended optimization algorithm of Jensen and Jørgensen 9 is not directly applicable to single-determinant wavefunctions. That is probably why Jørgensen and his co-workers combined the AH approach of Bacskay 5 with their previous restricted-step approach when working on a linearly scaling, completely AO-based trust-region augmented Hessian (TRAH) approach 11 for large closed-shell molecules. 12 In the present work, we present a restricted-step trust-region implementation of the QC-SCF method that we call TRAH-SCF. Our primary focus is to accomplish robust convergence for both restricted closed-shell and unrestricted open-shell SCF calculations that are also still feasible for extended molecules with sufficiently large basis sets. In Sec. II, we will recapitulate the general TRAH method for finding minimum solutions of variational wavefunction models and also give some details for SCF approaches whenever necessary.
After presenting the computational detail in Sec. III, we investigate the convergence behavior of closed-shell restricted and open-shell unrestricted HF and KS-DFT calculations in Sec. IV A with our TRAH-SCF implementation and compare it with the established DIIS and KDIIS methods. For our convergence study, we use a small set of molecules with a notoriously complicated electronic structure and choose a single exchange-correlation (XC) functional for each of the following functional types: local density approximation (LDA), generalized gradient approximation (GGA), hybrid with exact HF exchange, hybrid meta-GGA, and range-separated hybrid. Also we investigate in Sec. IV B the SCF convergence of unrestricted triplet and broken-symmetry singlet calculation for a small metal cluster that was studied previously 13 also in the context of robust convergence. 14 In Sec. IV C we study the SCF convergence of an antiferromagnetically coupled iron dimer system, i.e. Roussin's red dianion, with broken-symmetry unrestricted HF and DFT methods. We also discuss spin contamination and symmetry breaking of the respective SCF solutions obtained with TRAH-SCF and the two DIIS implementations. Then, we demonstrate the efficiency of our TRAH-SCF implementation for a hemocyanin model complex in Sec. IV D and relate it to the conventional DIIS based implementations. Finally, we conclude our work and give a perspective for future work in Sec. V.

II. THEORY AND IMPLEMENTATION
A. Trust-region augmented Hessian SCF method (1) withĤ being the nonrelativistic molecular electronic Hamiltonian that may also include exchange-correlation potentials. The final solution can then be described by a unitary transformation from an arbitrary reference wavefunction |0 In case of closed-shell restricted or open-shell unrestricted HF and KS-DFT methods,κ represents rotations between inactive occupied and unoccupied virtual MOs. So-called secondorder methods expand the E(κ) through second order in the orbital-rotation parameters κ and try to find an optimal solution for such a quadratic model to update the current wavefunction solution by invoking Eq.
for the orbital update. Explicit equations for the electronic gradient g and the linear transformations of the Hessian H are well known 15 and recapitulated in the Appendix.
The energy expansion in terms of orbital rotations does formally not truncated after a finite order and the quadratic model is not necessarily a reasonable approximation, in particular, when being far from the final solution of Eq. (1). This implies that finding optimal orbital rotations κ by solving the NR equations (5) (micro iterations) and the subsequent orbital update Eq. 3 (macro iterations) has to be performed more than once.
The bigger obstacle is that in such situations the Hessian H can be indefinite and/or the NR solution gives a large step veering away from the final solution |0 . A remedy for such NR based optimization procedures is the introduction of a trust region, i.e. a sphere with radius h, in which the solution of the quadratic model is constrained to by To find a solution subject to Eq. (6), the minimum of the second-order Lagrangian has to be found, which results for the boundary condition of Eq. (6) in the level-shifted NR The shift µ is an additional unknown parameter that cannot be obtained by solving the linear equations (8). Instead, a value that is right below the lowest eigenvalue of the Hessian can be chosen to guarantee positive definiteness of the left-hand side of Eq. (8). An approximate determination of the shift µ prior to the solution of the level-shifted NR equations (8) has been successfully attempted. 10 However, it is expected that the number of iterations for such a two-step approach are larger than for a one-step approach that determines all parameters µ and κ simultaneously. This can be achieved by diagonalizing the scaled augmented Hessian for which level shift µ and orbital rotations κ occur as lowest eigenvalue and the corresponding eigenvector, respectively. The lower part of the eigenvalue equations (9) resembles, apart from the scaling factor 1/α, the level-shifted NR equations (8). The length of the update vector κ can be chosen in such a way that lies eihter within or on the surface of a trust-region sphere The constraint (11) can be imposed easily in an iterative Davidson algorithm 7 for finding the lowest root of the AH in Eq. (9) (vide infra).
We note in passing that the particular form of the scaled AH eigenvalue equations is not unique. In the mathematics literature, other eigenvalue equations for the the trust-region subproblem (8) were proposed and successfully implemented. 16

B. The iterative TRAH eigenvalue problem
Concerning solving the TRAH eigenvalue equations, we follow Refs. 11 and choose the following two initial trial vectors This choice of trial vectors separates the orbital update into two contributions -one parallel to the gradient and the other orthogonal to it -and leads to a special structure of the reduced-space TRAH A with To assist convergence to the lowest root, we add another start vector b 2 that is zero for all elements except for the orbital rotation that matches the smallest virtual-occupied orbital energy difference. The latter is a reasonable approximation to the diagonal Hessian Here we only allow α values in a given interval [α min , α max ] (see Tab. I) and perform a bisection search for α that gives the smallest deviation h 2 − 1 α 2 ||κ(α) red || 2 2 with κ(α) red being the corresponding eigenvector of A(α). The equality condition 1 α 2 ||κ(α) red || 2 2 = h 2 giving the solution on the trust-region sphere usually occurs only for the first few macro iterations.
Thereafter, α is located at the lower boundary α min and the step κ(α) is within the trust- When the gradient norm is below a threshold (see Tab. I), our current solution is close to convergence and we switch to an iterative solution of the NR equations (5) without a level shift. We use for this purpose a Davidson-type method customized for systems of linear equations. Since the Hessian is positive definite when being close to convergence, we have also employed the often advocated preconditioned conjugate gradient (PCG) method. 18 But PCG did not offer any advantage for the shift-free NR equations because it turned out to be less robust for critical cases and did not show a faster convergence than the Davidson-type method.

C. Orbital update
Once the TRAH eigenvalue equations are solved approximately, the orbital rotation vector κ is to compute the MOs C of the next macro iteration k + 1, By default, the matrix exponential is evaluated recursively by Taylor expansion. It is truncated once the norm of next expansion order is below 10 −14 . To reduce the operations and hence the numerical error, we make use of the scaling-and-squaring approach 19 with a fixed order (s = 3) and the exponent scale factor 1/2 s . Before the orbital update, a Cholesky orthogonalization of exp(−K) is performed to preserve orthonormality of the updated MOs.
Finally, the inactive occupied and virtual are transformed in such a way that MO Fock matrix becomes diagonal in their respective occupied-occupied and virtual-virtual subblocks. This MO canonicalization usually improves the convergence rate of Davidson micro iterations.

D. Step control
In course of the TRAH optimization the trust radius should be updated to facilitate fast and robust convergence. For this purpose, we employ Fletcher's algorithm 8 that has been also chosen for many other restricted-step second-order implementations. 9,11,20,21 Based on the ratio of the actual energy E actu and the predicted energy by the quadratic model E pred , the trust radius is potentially adjusted and the wavefunction update accepted or rejected: • If r < 0, either the predicted or the actual energy rises while the other falls. The quadratic model is not applicable within the given trust region and the new trust radius is decreased by h k+1 = 0.7 h k . Moreover, the recently updated orbitals are rejected and the micro iterations are repeated from the previous set of orbitals.
• If 0 ≤ r ≤ 0.25, the recent update step was too long. Hence, the step is accepted but the new trust radius is decreased by h k+1 = 0.7 h k .
• If 0.25 < r ≤ 0.75, the step is accepted and the trust radius is left unchanged.
• If r > 0.75, the step is accepted and the new trust radius is increased h k+1 = 1.2 h k .

III. COMPUTATIONAL DETAILS
All calculations were performed with a development version of the ORCA quantum chemistry program package. 22 The new TRAH-SCF implementation will be publicly available in  Note that for reasons of convenience, the maximum norm rather than the Frobenius norm is shown for DIIS and KDIIS in all figures because this norm is also used for those implementations. However, TRAH works entirely with the Frobenius norm.

A. A benchmark on SCF convergence
To demonstrate the robustness and efficiency of the new TRAH-SCF implementation for restricted and unrestricted SCF, we compare the number of iterations and the occurrence of erratic convergence behavior with those of DIIS and KDIIS. For this purpose we chose a test set of small molecules that should be handled rather with multi-configurational methods and was initially proposed by Scuseria and his coworkers. 24,25 The number of SCF iterations needed to reach convergence with of DIIS, KDIIS, and TRAH using the default settings is given in Fig. 1 Fig. 3.
Additionally, the convergence of the triplet calculations is presented in Fig. 3.
As shown in Fig. 3 DIIS and TRAH show similar convergence rates if one also includes the micro iterations in the analysis. The fewer number of SCF iterations in the DIIS calculations is mainly related to a faster convergence in the initial iterations rather convergence rates in a very shallow quadratic convergence region. KDIIS showed the best convergence for triplet case but showed a very slow convergence for the singlet calculation. There the maximum gradient norm went below 10 −3 only after 190 iteration.

C. Application to a Fe 2 S 2 complex
The SCF solution and the convergence behavior of DIIS, KDIIS, and TRAH is investigated for high-spin (M S = 11) and broken-symmetry singlet calculation of the Roussin's red salt dianion (see Fig. 4). The two iron centers are antiferromagnetically coupled as has been revealed in previous computational studies. 43 Concerning our BP86 calculations, the high-spin DIIS calculation diverges and, hence, For this BS singlet solution, the point-group symmetry is not broken as revealed by the Mulliken charges. Since the hybrid meta-GGA TPSSh includes exact HF exchange, we observe a larger spin contamination in particular for the BS singlet calculation (see Tab. IV).
Finally, we have also performed HF calculations. Again, only KDIIS and TRAH converged with default settings. For the high-spin calculation, TRAH breaks the point-group symmetry while converging to a solution with a lower energy than KDIIS. Also for the broken-symmetry singlet calculation, TRAH converges to a solution with a lower energy than KDIIS and a larger deviation from the ideal S 2 value (see Tab. IV). Both highspin and singlet TRAH calculation take much mores iteration then KDIIS which is again attributed to the complicated energy landscape that offers the possibility to find many symmetry-broken solutions.

D. Performance of TRAH for a hemocyanin model complex
After focusing on convergence characteristics of our new TRAH-SCF implementation, we finally investigate the run-time performance in comparison with the other DIIS based convergers. For this purpose, we calculated the unrestricted triplet and broken-symmetry singlet SCF energy of a hemocyanin model complex 40 that has 164 atoms (see Fig. 5).
We employed for those calculations RI approximation for Coulomb matrix 44  Often we observed that TRAH-SCF finds a solution with a lower energy than DIIS and KDIIS. For unrestricted calculations, this is accompanied by a larger spin contamination, i.e. larger deviation from the desired spin-restricted S 2 expectation value. However, we found also rare cases in which DIIS found a solution with a lower energy than KDIIS and TRAH, which had one or more types of broken symmetries. Furthermore, we observed that

VII. ACKNOWLEDGMENTS
The author acknowledges gratefully financial support from the Max Planck Society X −,σ ai = a † aσ a iσ − a † iσ a aσ (22) and the sigma vector are given for spin-unrestricted SCF wavefunctions. We follow the convention that the indices i, j, . . . denote occupied orbitals and a, b, . . . virtual orbitals. Both equations are formulated in terms of Fock and Kohn-Sham potential matrices in the sparse AO basis represented by Greek-letter indeces. The following intermediates are needed to compute the gradient and sigma vectors with an AO-based formulation: For the intermediates above the following densities ρ, density matrices D, and trial vectorcontaining MO coefficients 49 Λ are used: For simplicity reasons, the XC potential matrices are only given for a local density approximation (LDA). All other functionals that are available in ORCA can be employed as well with TRAH-SCF. Though not discussed in this work, please note that solvation effects can also be accounted for in the new TRAH-SCF implementation with the conductor-like polarizable continuum model 50 (C-PCM).