Chemical bonding theories as guides for self-interaction corrected solutions: multiple local minima and symmetry breaking

Fermi--L\"owdin orbitals (FLO) are a special set of localized orbitals, which have become commonly used in combination with the Perdew--Zunger self-interaction correction (SIC) in the FLO-SIC method. The FLOs are obtained for a set of occupied orbitals by specifying a classical position for each electron. These positions are known as Fermi-orbital descriptors (FODs), and they have a clear relation to chemical bonding. In this study, we show how FLOs and FODs can be used to initialize, interpret and justify SIC solutions in a common chemical picture, both within FLO-SIC and in traditional variational SIC, and to locate distinct local minima in either of these approaches. We demonstrate that FLOs based on Lewis' theory lead to symmetry breaking for benzene -- the electron density is found to break symmetry already at the symmetric molecular structure -- while ones from Linnett's double-quartet theory reproduce symmetric electron densities and molecular geometries. Introducing a benchmark set of 16 planar, cyclic molecules, we show that using Lewis' theory as the starting point can lead to artifactual dipole moments of up to 1 Debye, while Linnett SIC dipole moments are in better agreement with experimental values. We suggest using the dipole moment as a diagnostic of symmetry breaking in SIC and monitoring it in all SIC calculations. We show that Linnett structures can often be seen as superpositions of Lewis structures and propose Linnett structures as a simple way to describe aromatic systems in SIC with reduced symmetry breaking. The role of hovering FODs is also briefly discussed.


I. INTRODUCTION
One of the simplest descriptions of chemistry is given by Lewis' theory (LT) of chemical bonding. 1 The main focus of LT is the octet rule, which states that the valence electrons of main group elements are arranged in four pairs that imitate the electron configuration of noble gas atoms. 1 While LT is probably the best-known, most extensively taught, and most widely used theory of chemical bonding, there are several cases in which LT does not suffice for an accurate understanding of chemical bonding. 2 For instance, as LT assumes that the electrons are always paired-thereby forming a closed-shell singlet state-LT is not able to describe molecules with non-singlet ground states, such as the oxygen molecule. This shortcoming of LT, the lack of electronic spin, is solved by Linnett's double-quartet (LDQ) theory 3-7 that explicitly includes the electronic spin in the formalism. The nearly-forgotten 8 LDQ replaces the octet of LT by two quartets-one quartet per spin channel-hence the name of the theory.
Schwalbe et al. 9 revived LDQ theory in the context of the Perdew-Zunger (PZ) self-interaction correction 10 (SIC) to density functional theory (DFT). 11,12 The idea a) Electronic mail: ktrepte@stanford.edu/kai.trepte1987@gmail.com b) Electronic mail: schwalbe@physik.tu-freiberg.de c) Electronic mail: susi.lehtola@alumni.helsinki.fi of SIC is to approximately remove one-electron selfinteraction error from the Kohn-Sham (KS) energy functional 12 where T s [n α , n β ] is the kinetic energy of the noninteracting system, V [n] is the external potential energy, and n σ is the electron density for spin σ, with α and β representing spin up and spin down, respectively. The PZ functional explicitly removes the self-interaction error one electron at a time 10 where n iσ is the electron density of the i-th occupied orbital with spin σ, and E J and E xc denote the Coulomb and exchange-correlation energy functionals. While the usual formulation of DFT is invariant to rotations between occupied orbitals, 13 this invariance is broken by SIC which explicitly depends on the occupied orbitals used to evaluate Eq. (2). 14 The PZ functional then needs to be minimized in terms of the O(N 2 occ ) rotation angles for N occ occupied spatial orbitals; the orbitals tend to become localized upon the optimization. 14 It has been shown more recently that the variational optimization requires complex orbitals; 15 software for such calculations is openly available. 15,16 In the Fermi-Löwdin orbital 17 (FLO) formulation of SIC, [18][19][20][21] the need for occupied-occupied orbital rotations is avoided by the use of Fermi orbitals, 17 obtained from for spin-σ position eigenstates |a σ i localized at the so-called Fermi-orbital descriptor (FOD) a σ i which become the optimized variables in the theory. When the occupied orbitals are known as a basis set expansion where C σ are the molecular orbital coefficients and χ µ are the basis functions, the corresponding Fermi orbital (FO) coefficients c σ FO are given by where the transformation matrix R σ is given by The FOs are non-orthonormal but can be symmetrically orthogonalized 22 to yield the Fermi-Löwdin orbitals (FLOs) with coefficients where T σ and Q σ contain the eigenvectors and eigenvalues of the FO overlap matrix, respectively. These FLOs are then used to evaluate the PZ functional, Eq. (2). In the spin-restricted formalism, the Fermi orbitals are parametrized by a set of 3N occ Fermi orbital descriptors (FODs), which correspond to semiclassical cartesian coordinates of the N occ electrons of each spin that need to be optimized instead of the O(N 2 occ ) rotation angles of the traditional formalism. 14 Analytical energy gradients for the FODs have been derived in Ref. 19, and they are employed in various implementations of the FLO-SIC method; please refer to Schwalbe et al. 23 for a thorough overview on the presently used, openly available implementation of FLO-SIC.
Schwalbe et al. 9 showed that the FODs and the corresponding FLOs have a clear relation to chemical bonding theories. The most common bond patterns in LT are single, double, and triple bonds, and any of these bonds can be represented in terms of FOD arrangements. For instance, single bonds are described by one α and one β FOD in-between the atoms, double bonds are characterized by two α and two β FODs above and below the bond center, and triple bonds are built up by three α and three β FODs placed in triangles around the bond center for each spin. Analogous FOD motifs have also been found for FODs corresponding to core and lone pair orbitals: 9 for example, FODs corresponding to 1s orbitals are typically placed at the nuclear sites, while the FODs for fully occupied 2s and 2p orbitals usually form a tetrahedron.
While LDQ theory includes all the bonding motifs of LT, it also allows a simple and elegant representation of aromatic bonds. As illustrated by Fig. 1, an aromatic bond can be represented in LDQ by two α FODs and one β FOD or vice versa, leading to a formal bond order of 1.5, whereas the LT structures have alternating single and double bonds. Figure 1. Electronic geometries of benzene according to various bonding theories. A single LT structure cannot describe the aromaticity correctly; only a superposition of the two resonance structures reproduces the aromatic character. On the other hand, a single LDQ structure already describes the electronic configuration correctly, since the spin-average of the electronic geometry has a formal bond order of 1.5 in all bonds. Solid lines represent a bond involving one spin-up and one spin-down electron. In LDQ theory, the x symbol denotes a spin-up electron while the • symbol denotes a spindown electron. The arrows indicate transformations between the representations of aromatic bonding.
In this work, we investigate these findings further by studying the usefulness of FODs that follow the bonding theories of Lewis and Linnett for exploring symmetry breaking in SIC calculations. Symmetry breaking in SIC was originally discovered by Lehtola, Head-Gordon, and Jónsson 15 , who showed that the molecular symmetry of benzene is broken by SIC with real (RSIC) or complex orbitals (CSIC). As discussed in Ref. 15, the symmetry breaking arises because SIC calculations on benzene are based on a single LT structure with alternating single and double bonds. The energy of the SIC solution can be decreased by allowing the C-C double bonds in the Lewis structure to contract, leading to more spatial localization which is energetically favored by SIC, while the C-C single bonds of the Lewis structure become elongated. The carbon-carbon bonds in benzene then come out unequal in length, similarly to (but not as strongly as in) the fictitious 1,3,5-cyclohexatriene molecule.
While SIC solutions guided by LT thus break the molecular symmetry of benzene, we show in this work that LDQ theory succeeds in reproducing a symmetric molecular geometry and thereby overcomes symmetry breaking for this molecule. We also show that the symmetry breaking of LT can already be seen in the electron density at the symmetric molecular geometry of benzene.
To highlight the broader scope of these findings for various kinds of SIC, in this work we consider not only the FLO-SIC method, but also include variational SIC in both the RSIC and CSIC variants to show how the choice of the initial orbital guess affects self-consistent SIC solutions.
Moreover, as artifacts arising from the use of a single Lewis structure may also become visible in the electron density, we point out the need to carefully analyze SIC solutions. Following the pioneering work of Hait and Head-Gordon 24 , who suggested using electronic dipole moments (DMs) as a simple global measure of the electron density reproduced by DFT, we investigate with DMs whether the electron density from SIC calculations properly reflects the molecular symmetry. Using a benchmark set of 16 planar cyclic molecules, which extends a database previously suggested by Adhikari et al. 25 , we show that bonding patterns guided by LT lead in some cases to SIC DMs that do not reflect the molecular symmetry, leading to artifactual dipole moments. For example, in the case of phenazine, (C 6 H 4 ) 2 N 2 , which is symmetric and therefore should not have a dipole moment, LT structures result in a huge artifactual dipole moment of 1 Debye.
Motivated by the success with benzene, we also show how LDQ can be used to find alternative bonding patterns that lead to improved DMs in SIC calculations for this database. The LDQ structures for all the studied molecules have the same symmetries as the molecules themselves, at variance to the LT structures. The correct symmetry of the FODs also leads to DMs with the correct symmetry; in the previously discussed case of phenazine, LDQ correctly yields no dipole moment. Based on this victory, we point out that LDQ FODs can be seen as a simple way to model aromatic molecules in SIC. We elaborate that FLOs based on LDQ FODs, which can be interpreted as a superposition of LT structures, can be used as initial guesses for RSIC and CSIC, leading to stable local minima.
This work is structured as follows. The methodology used in this study is discussed in Section II, and details of the electronic structure calculations carried out in this work are provided in Section III. The results are presented in Section IV. First, various local minima and symmetry breaking in benzene are discussed in detail using FLO-SIC, RSIC, and CSIC in Section IV A. Next, the influence of the initial orbital guesses following LT or LDQ arrangements on the DMs arising from SIC calculations on our benchmark set of planar, cyclic molecules is discussed in Section IV B, where calculated DMs are used to explain how the SIC solutions are affected by the chemical bonding situations described by the initial orbital guess; basis set convergence is also investigated. Further investigations into bonding motifs are discussed in Section IV C. The main findings of this study are summarized and conclusions are drawn in Section V.

A. Benzene study
Our goal in this section is to study the effect of various electronic geometries, that is, FOD arrangements, on the molecular structure of benzene. Benzene is the prototypical aromatic molecule, and as will be shown later in this work, any conclusions that can be derived for benzene are likely applicable also to larger aromatic systems as well. As has been already established, 15 SIC prefers a symmetry-broken structure for benzene; the energy landscape of benzene was therefore studied as follows. Average C-C bond lengths d CC in the range between 1.317 Å and 1.417 Å were sampled with a step size of 0.005 Å, yielding 21 values for the average bond length; experimentally, all the C-C bond lengths are the same, d CC = 1.397 Å. 26 All C-H bonds were fixed to their experimental value, d CH = 1.084 Å. 26 To investigate the distorted structures that may be favored by SIC, following Ref. 15 we elongate three C-C bonds by an amount ∆, whereas the other three C-C bonds are shortened; the hydrogens are moved correspondingly. We consider local distortions ∆ ranging from 0 Å (symmetric structure) to 0.075 Å, likewise sampled in 0.005 Å steps, leading to 16 distorted structures for each value of the average C-C bond length and a grand total of 336 candidate structures. A detailed description of how to obtain the nuclear coordinates with this procedure is given in the supporting information (SI); for a visual explanation, see Fig. 2.
The three FOD starting structures (LT1, LT2, and LDQ) shown in Fig. 3 were considered for benzene. In the LT1 structure, the double-bond FODs are placed between C 2 -C 3 , C 4 -C 5 and C 1 -C 6 ; the numbering is shown in Fig. 2. In the LT2 structure, the double bonds are placed between C 1 -C 2 , C 3 -C 4 and C 5 -C 6 , instead. The LDQ structure is staggered: a double-bond FOD structure is picked for one spin channel, while a single-bond FOD structure is used for the other spin channel, and the configurations alternate spins between each bond. The LDQ structure thereby formally corresponds to a brokensymmetry open-shell singlet state; however, it correctly yields a formal bond order of 1.5 for all C-C bonds which are expected to be aromatic.

B. DIP16 benchmark set
The benchmark set used within this study extends an existing benchmark set introduced by Adhikari et al. 25 ; a full list of the molecules is given in Table 1     imental values (see Table 1). In this work, we have included the experimental DMs for the four missing systems: 2,1,3-benzothiadiazole, dichlone, fluorene, and 1,2,5-thiadiazole. In addition, we have included cyclobutadiene to account for anti-aromaticity as well as benzene for a fuller coverage of various chemistries. We will refer to this entire set as DIP16 (DIPoles of 16 molecules).
In the interest for maximal comparability to the Table 1. Description of the DIP16 benchmark set, including the molecules' identifiers (IDs) and point groups (PGs). More details can be found in the SI. a Molecules studied by Adhikari et al. 25 but whose DMs were not compared to experiment in Ref. 25. b Molecules introduced in this work.
study of Adhikari et al. 25 , the molecular structures used in this work were collected from common chemical databases [27][28][29] ; an explicit listing is given in the SI. All the used molecular and FOD structures as well as the corresponding DMs are also available at https: For the entire DIP16 test set, we investigated a total of 51 LT and LDQ electronic geometries as initial starting points for RSIC, CSIC, and FLO-SIC calculations. There are up to 4 LT and 2 LDQ electronic geometries for each molecule; an example will be provided in Section IV, and some more visualizations are available in the SI. The various LT and LDQ configurations for each molecule lead to distinct DMs, as the total density that arises from the orbital optimization is biased by the bonding pattern described by the FOD configuration. The DM components of DFT, RSIC, CSIC, and FLO-SIC calculations are also detailed in the SI.

A. Electronic structure codes
The Gaussian-basis all-electron codes ERKALE 15,16,30,31 , PySCF 32 , and PyFLOSIC 23 were used in this work. As ERKALE and PyFLOSIC are based on similar computational approaches (both employ the standard approach used in almost all Gaussian-basis quantum chemistry programs, which allows routinely reproducing total energies to microhartree accuracy), they enable comparisons of FLO-SIC, RSIC, and CSIC on an equal footing. The codes used in this work have a focus on simple input and output file handling as well as user-friendliness, which makes setting up complex workflows rather easy. Moreover, all three codes are freely and openly available, and can be used without limitations also in industry; see Ref. 33 for thorough discussion.
Furthermore, all three codes evaluate density functionals with the open-source Libxc library, 34 again ensuring comparability of the results. The Perdew-Wang (PW92) correlation functional, 35 which is a local density approximation (LDA), has been used for all calculations described in this work in combination with LDA exchange, 36,37 if not stated otherwise.
Even though the FODs depend on the exchangecorrelation functional, in analogy to the findings of Ref. 15, the results obtained at the LDA level of theory with the PW92 functional suffice to demonstrate symmetry-breaking effects arising from the selfinteraction correction. Indeed, as will be demonstrated in Section IV B 5, calculations performed at the generalized gradient approximation (GGA) as well as the meta-GGA levels of theory yield results that are similar to those obtained at the LDA level with the PW92 functional.
The all-electron calculations in ERKALE, PySCF and PyFLOSIC were carried out with the double-ζ polarization consistent pc-1 basis set. 38,39 While DMs are a textbook example of the need for diffuse basis functions, 40 as the expectation value of the dipole moment operator r may accumulate significant contributions from small changes in the electron density at large r where only diffuse functions contribute, the pc-1 basis set was chosen for comparability to the work of Adhikari et al. 25 that employed the DFO basis set 41 which is limited to d functions, thereby contains only a single shell of polarization functions on main group elements, and is therefore formally of polarized double-ζ quality; this is also supported by numerical evidence. 23 Although the use of such small basis sets is generally not advisable for investigating properties as challenging as DMs, exploratory calculations carried out with larger basis sets in Section IV B 5 show that the findings of the present study are not compromised by the use of such a small basis set that was chosen to optimize the computational cost.

PySCF and PyFLOSIC
A grid level 42 of 7 was used in PySCF and PyFLOSIC. This corresponds to using a (90,974) grid for hydrogen, a (135,1202) grid for second-period atoms and a (140,1202) grid for third-period atoms in Becke's multicenter quadrature scheme. 43 In contrast to the total electron density in normal DFT calculations, the orbital densities considered in SIC are not smooth; this is the main reason for the need for huge integration grids in SIC calculations, and also the rationale for turning off grid pruning in all DFT and SIC calculations of this work. Any linear dependencies in the molecular atomic-orbital basis set were removed by applying the scf.addons.remove_linear_dep_ function on the PySCF DFT object; default PySCF thresholds for this procedure were used. An SCF energy criterion of 10 −8 E h was used in all calculations.
Starting values for the FODs for PyFLOSIC calculations were generated with the Fermi-orbital descriptor Monte-Carlo fodMC code 9 (https://github.com/ pyflosic/fodMC). While there are other ways to initialize the FODs (see Refs. 9 and 23 for discussion), the fodMC code enables a direct translation between specific chemical bonding situations and FOD guesses, making fodMC the ideal approach to study the differences between LT and LDQ within SIC. The starting FODs were optimized in the FLO-SIC calculations performed with PyFLOSIC until the maximum force max i |F i | acting on any FOD fell below 5 · 10 −4 E h /a 0 .
As it is well-known that a vanishing gradient does not necessarily imply convergence onto a local minimum, for verification purposes, we introduce a stability analysis of FLO-SIC solutions in analogy to the method introduced in Ref. 15 for RSIC and CSIC calculations. A FLO-SIC calculation will be deemed to have converged onto a stable solution if the FOD Hessian is positive semidefinite. In this work, the FOD Hessian is formed seminumerically via finite differences of analytical FOD forces, using a two-point stencil with a step size of 0.001 Å. Because of the small number of FODs, the FOD Hessian fits in memory even for large systems and can be diagonalized using dense matrix algebra.

ERKALE
An unpruned (150,1202) quadrature grid was used in ERKALE. The threshold for the gradients of the orbital rotations in the occupied-occupied block was set to 10 −5 E h .
Stability analyses 15 -checks that the SIC solutions correspond to local minima with respect to rotations of the occupied orbitals-were carried out for all RSIC and CSIC calculations; the CSIC calculations were started from stable RSIC minima according to the best practices established in Ref. 15.
The default initial orbital guess in ERKALE for RSIC and CSIC calculations is given by localized orbitals 44 according to the FB criterion. To allow starting RSIC calculations from the same initial guess as PyFLOSIC, ERKALE was extended in this work to be able to initialize RSIC and CSIC calculations from FODs. Thus, instead of performing a FB localization to get initial orbitals for SIC calculations, input FODs are read in alongside a converged DFT wave function; FLOs are then constructed from these data using Eqs. (4)- (6). After the initialization, the Perdew-Zunger functional, Eq. (2), is minimized as usual by variational optimization, using the orbital rotation technique detailed in Ref. 15.

A. Benzene study
As was discussed in Section II A, benzene has three distinct FOD geometries: LT1, LT2, and LDQ, which were already shown in Fig. 3. Because LT1 and LT2 only differ by a rotation of the molecule, it is expected that both LT configurations converge onto equivalent minima. 15 The LT structures are expected to yield symmetry-broken geometries, such that the double-bond FODs prefer shorter C-C bonds. On the other hand, as the LDQ configuration can be thought of as a superposition of the two LT configurations, in which the spin-up FODs are picked from one LT structure while the spin-down FODs are picked from the other LT structure, the LDQ structure is expected to prefer a symmetric molecular geometry. Moving onto FLO-SIC, the calculations with PyFLOSIC started from LT type guesses give analogous results to the RSIC calculations with ERKALE: the molecular structure is distorted, as the minimal-energy structure corresponds to short double bonds and long single bonds. Although FLO-SIC expectedly produces a higher total energy than RSIC, as the latter method has more optimizable degrees of freedom as it allows all possible real-valued occupied-occupied orbital rotations, the optimal bond lengths from RSIC and FLO-SIC match to the precision used in this study.
In contrast, the electronic geometry from LDQ leads to a symmetric molecular structure. The solution obtained from the LDQ initial guess is slightly higher in energy (by about 6 mE h ) than that from the LT initial guesses in the RSIC calculations in ERKALE; a similar energy difference is also observed in the FLO-SIC calculations in PyFLOSIC. The energy difference of the LT and LDQ solutions is smaller in the CSIC calculations performed with ERKALE, measuring about 2.5 mE h . However, these millihartree differences are negligible compared to the overall magnitude of the SIC, which is in the order of 3 E h as evidenced by the data in Table 2.
The C-C bond length optimized with DFT and the PW92 functional is in excellent agreement with the experimental bond length 1.397 Å, with a difference of only 0.005 Å, which is also the resolution of this study. The LDQ calculation underestimates the experimental bond length by about 0.035 Å, whereas the LT structures break the symmetry, predicting one bond to be 0.075 Å shorter and the other to be 0.008 Å longer than in the experiment. As these results show, not only does LDQ resolve the symmetry breaking, but the maximal error in the bond lengths is two times smaller than in the LT structures. Interestingly, the optimal bond length for the LDQ structure, 1.362 Å, is close to the average of the optimal bond lengths for the LT structures, 1.364 Å.
As can be expected from the correspondence of the LDQ structure to broken-symmetry open-shell singlet calculations, the spin symmetry is slightly broken as evidenced by the non-zero Ŝ 2 from the LDQ calculations using RSIC, CSIC and FLO-SIC. In contrast, there is no spin contamination in the LT structures, but they do break the molecular symmetry.
The electron densities resulting from DFT and the RSIC solutions guided by the LT1, LT2 and LDQ FOD starting guesses in ERKALE are shown in Fig. 4. As expected, DFT delivers a symmetric electron density. Strikingly, symmetry breaking is already observable in the electron density of the two LT solutions at the symmetric molecular structure reproduced by DFT (see Table 2): both LT1 and LT2 break the rotational symmetry by placing more electron density onto the double bonds of the corresponding Lewis structure.
LDQ restores a rotationally invariant electron density, like the one from DFT. However, as expected from the open-shell singlet character of LDQ evidenced by the the non-zero value of Ŝ 2 , the LDQ calculation results  in a spin density, which is shown in Fig. 5: the spinalternation of the single and double bond FODs in the LDQ structure (see Fig. 3) shows up as an alternating spin density.

B. DIP16
Having exemplified the existence of several possible types of SIC solutions for benzene, we will next examine the DMs arising from various types of SIC solutions for our benchmark database of 16 planar cyclic molecules. In the interest of keeping the discussion short, the large set of data is only discussed summarily in the main text; thorough analyses of all molecules can be found in the SI. Note again that as discussed in Section II, all calculations in this section are performed at fixed geometries, at variance to the calculations on benzene discussed in Section IV A.   6 shows the DMs of azulene (mol03, C 10 H 8 , C 2v point group) corresponding to DFT and stable RSIC solutions from ERKALE. The calculations for Fig. 6 were started from FLOs generated by three distinct sets of FODs, analogously to the ones for benzene discussed in Section IV A. The three FOD starting points (LT1, LT2, and LDQ) each lead to a stable solution in ERKALE, as verified by the stability analysis with respect to rotations of the occupied orbitals discussed in Section III A 2. Stability analysis was also employed in the FLO-SIC calculations on this molecule (see Section III A 1), and the three solutions were each found to be true local minima.

Aromatic molecules
As can be seen from Fig. 6, the DM from SIC started from an LDQ configuration points along the two-fold symmetry axis; this means that the LDQ structure leads to an electron density that reflects the symmetry of the molecule. In contrast, the electron densities obtained with SIC starting from LT structures violate this symmetry, and as a result the DMs do not point in the direction of the two-fold symmetry axis. The two LT structures are mirror images of one another, and their DMs have an angle of roughly 10 • with respect to the symmetry axis. The asymmetry can already be seen from the LT FODs: counting all C-C bond FODs in the symmetry-equivalent parts of the molecule shows that there are 14 FODs on one side, and 16 on the other. This imbalance is then reflected and reinforced when the electron density is optimized in the self-consistent field procedure, as was already mentioned in Section II B.
Moreover, as mentioned in Section II A, combining the spin-up FODs of LT1 and the spin-down FODs of LT2 results in a LDQ structure. This LDQ structure has 15 C-C bond FODs in both symmetry-equivalent parts of the molecule, and thereby affords an unbiased description for the molecule. Thus, we can introduce LDQ structures as a useful representation of aromatic bonding. The character of the LDQ structure as a superposition of the LT structures is also observable in the DM: the averaged LT DM is µ (LT1+LT2)/2 = 0.71 D, it aligns with the molecular axis like the DMs from DFT and LDQ, and has a similar size to the LDQ DM, µ LDQ = 0.65 D.
There are several other molecules in the DIP16 set where LT FODs likewise lead to incorrect DMs. In acridine (mol01), LT configurations can lead to DMs with an angle of about 14 • with respect to the symmetry axis, while the LDQ DM correctly reflects the symmetry of the molecule. Phenazine (mol12) has LT structures which introduce a significant DM of about 1 D which is oriented perpendicular to the N-N axis, while the DM should be zero according to the molecule's D 2h point group; only the LDQ solution reproduces the correct DM. Similar results are determined for PyFLOSIC, see the SI, and the observations are analogous. For instance, the stability analysis discussed in Section III A 1 was carried out for the FLO-SIC solutions of azulene (mol03), and they were found to be true local minima.
For either ERKALE or PyFLOSIC, the LDQ DMs are typically in best agreement with experimental values for the absolute DM, and always respect the symmetries of the molecules in this study. However, some LT DMs do not do so, leading to artifactual dipole moments as showcased above for azulene.

Non-aromatic molecules
Both LT and LDQ describe identical chemical bonding patterns for non-aromatic parts of the molecules in this study; this means there is only one FOD geometry for such structures. Our test set contains 3 nonaromatic molecules which do not have any aromatic fragments: p-benzoquinone, p-chloranil, and p-fluoranil. These molecules are mol04, mol07, and mol09 in our database; see Table 1. They all have a D 2h molecular geometry; thereby, none should have a DM. This is correctly reflected by all methods (DFT, RSIC, CSIC, FLO-SIC) used in this study.
For molecules which do have aromatic fragments, like dichlone (mol08), fluorene (mol10), and 1,4naphthalenedione (mol11), various electronic geometries can be generated for their aromatic part, allowing for a greater variety in the FOD search space. The resulting DMs point along the C 2 symmetry axis. An exception is the LT3 structure of fluorene (mol10) that has a crooked DM with an angle of about 31 • to the symmetry axis in ERKALE and PyFLOSIC. This is unsurprising per the discussion in Section IV B 1, as the corresponding electronic geometry shows 4 C-C bond FODs on one side of the inner C-ring, while there are 6 C-C bond FODs on the other side; this imbalance leads to a bias that tilts the electron density to one side of the molecule during the optimization.
As before, the DMs that arise from various LT and LDQ FOD arrangements turn out to be different in size. DMs from LDQ FODs yield the best agreement of all SIC DMs with experimental reference values.

Anti-aromatic molecules
The antiaromatic cyclobutadiene (mol15) molecule has two sets of C-C bond lengths: 1.43 Å and 1.35 Å. According to its D 2h point group, the DM should be zero. Similarly to Section IV A, two LT and one LDQ structure can be formed. Given the distinct C-C bond lengths, one can expect the LT structure in which the double-bond FODs are placed at the shorter C-C bonds to be energetically favorable. Our results are in line with these expectations, that is, the DM is zero in all calculations and the LT structures that place the double-bond FODs in the shorter C-C bonds are lower in energy than the structures that place the double-bond FODs on the longer C-C bonds.

Summary
Having discussed the general features of the results on the DIP16 set of molecules, we can proceed with a summary analysis of the DIP16 data. The mean error (ME) and mean absolute error (MAE) are calculated over the entire DIP16 test set, and are analyzed in the following. The explicit values and more details about the used reference values [45][46][47][48][49][50][51] can be found in the SI. The PW92 DFT calculations yield a ME of −0.06 D and a MAE of +0.10 D both in ERKALE and PySCF; the excellent agreement again confirms the reproducibility and comparability of the results. The MAEs for the SIC calculations from various starting points are shown in Table 3; the agreement with experiment is worse with SIC than without it, as the errors in Table 3 are greater than the DFT values given above. In Table 3, LT and LDQ stand for initialization with FODs/FLOs arising from these two methods; as some of the molecules afford many possible LT structures, the LT configuration leading to the lowest energy was chosen in each case. An analogous procedure was also employed for LDQ. Finally, E min refers to using the energetically lowest-lying solution, regardless of whether it represents LT or LDQ type bonding. The RSIC calculations started from FODs in ERKALE result in DMs that agree with the corresponding PyFLOSIC values in most cases. In cases where RSIC and FLO-SIC disagree, we found that the stability analysis and the subsequent re-optimization of the density in ERKALE has taken the RSIC solution away from the FOD starting point; without the stability analysis, the agreement between RSIC and FLO-SIC is recovered. This indicates that FLO-SIC does not correspond to a local minimum of RSIC in such cases.
Interestingly, the best DMs are obtained when starting from LDQ FODs, both in FLO-SIC calculations with PyFLOSIC as well as RSIC calculations ERKALE; in PyFLOSIC, the MAE is close to the DFT value. Worse DMs are obtained when analyzing DMs from FODs based on LT, or the overall energetically lowest solutions (E min ) which likewise results in the use of some LT structures instead of LDQ structures.

Validation of computational methodology
To emphasize that the results obtained with the PW92 functional and the pc-1 basis set are qualitatively correct, additional DMs of azulene (mol03), see Section IV B 1, were computed using PyFLOSIC with other exchangecorrelation functionals and larger basis sets. Starting out with the question of the exchange-correlation functional, additional calculations were performed with the PBEsol 52 and r 2 SCAN 53 functionals, which are GGA and meta-GGA approximations, respectively. PBEsol has been identified as a particularly good functional for SIC calculations, 54 whereas r 2 SCAN exemplifies the state of the art in non-empirical density functional approximations. For these tests, the FODs were re-optimized for each functional. As can be seen from the results in Table 4, the DMs from PBEsol are close to those from PW92, while r 2 SCAN leads to larger dipole moments than PW92 or PBEsol.
However, most importantly for the present study, the orientation of the DMs are similar in each case: the DMs based on LT FODs are noticeably crooked, confirming that the same issues can be found at other levels of theory and validating the findings of Ref. 15 for the case of FLO-SIC. Table 4. DMs µ for azulene (mol03) using various exchangecorrelation functionals, as well as the angle αLT between the DMs from the two LT structures and the molecular symmetry axis, see Fig. 6. The pc-1 basis set was used in all calculations. Next, the basis question. Repeating the calculations in the split-valence, polarized double-ζ, polarized tripleζ, and polarized quadruple-ζ pc-0, pc-1, pc-2, and pc-3 basis sets, respectively, as well as their counterparts augmented with diffuse basis functions, the aug-pc-0, augpc-1, aug-pc-2, and aug-pc-3 basis sets, respectively, we obtain the results in Table 5. As these calculations are computationally costly, and as we are only interested in the basis set convergence, the FODs were not reoptimized; instead, they were frozen to the pc-1 optimized values. Table 5. DMs µ and the angle αLT between the LT DMs and the symmetry axis for azulene (mol03) using various pc-n and aug-pc-n basis sets and the PW92 functional. The DFT DMs arising from the calculations in the augpc-2, pc-3, and aug-pc-3 basis sets are identical, meaning that the DM has converged to the complete basis set limit. Similarly, there is excellent agreement in the SIC DMs between the pc-3 and aug-pc-2 calculations; SIC calculations were not attempted in aug-pc-3 due to the large size of this basis set. We can thus confirm that the data with the augmented triple-ζ aug-pc-2 basis set is (almost) at the complete basis set limit.
Having established the basis set limit, we can comment on the implications of the basis set converge on the present study. Although improving the basis affects the absolute values of the DMs, the angle between the LT DMs to the symmetry axis remains at roughly 10 • . This illustrates that the symmetry-broken solutions are not artifacts of small basis sets and the results obtained with the pc-1 basis are reliable.
The sufficiency of the pc-1 basis for the present study is further supported by PW92 DFT calculations for the whole DIP16 set in the aug-pc-2 basis. The ME and MAE values for pc-1 are in excellent agreement with values from the aug-pc-2 calculations, which are −0.05 D and 0.10 D, respectively. Since the DFT dipole moments coincide with the molecular plane, the importance of diffuse functions is less pronounced, as the molecular plane already has a decent amount of flexibility from the valence basis functions in the pc-1 basis set.
As a note, we tested whether fixing the FODs corresponding to the 1s orbitals to the nuclear sites helps to converge the FLO-SIC calculations without compromising the resulting DMs. As can be seen from the results presented in the SI, the DMs resulting from such a procedure are in excellent agreement with the DMs from the fully optimized FODs. Although this strategy was not exploited in this work, it could be used in future investigations to achieve better computational efficiency.

C. Further investigations on bonding motifs
We would like to point out that while we have investigated several types of FOD configurations for the DIP16 database, we have not made any claims to have found the best possible FOD configurations for any of the systems examined in this work. Even though we have examined several kinds of electronic geometries in our calculations, that is, multiple LT and LDQ structures, it is possible that some other type of FOD configuration yields even lower energies. However, all the FOD configurations employed in this work are fully reproducible and can even be generated automatically, thus delivering reproducible FLO-SIC solutions; the molecular and FOD configurations used in this work have also been included in the SI.
For the case of benzene, fascinating bonding motifs in which all three double-bond FOD pairs hover out-ofplane on the same side of the molecule have been found and described by Pederson and Baruah 20 for a symmetric molecular geometry for benzene; correspondingly, we will term these hovering FODs. It is likely that several of the molecules in the DIP16 database likewise afford hovering FOD configurations, given that they are cyclic, planar molecules like benzene. However, the role of the hovering bonding motifs is still not clearly understood; at present, we also lack the tools to generate them in an automatic manner. Until now, despite thorough efforts, we have only found stable hovering FODs for ring systems. In contrast, hovering double-bond FOD pairs are not suitable solutions for ethene, (C 2 H 4 ), for example, as such FODs converge back to a LT solution upon optimization. The DIP16 database contains molecules that were also considered by Adhikari et al. 25 . Ref. 25 describes using fixed molecular geometries from a chemical database, and optimizing only a single FOD configuration for each molecule. Because of the problem of many local minima first pointed out in Ref. 15 for variational SIC calculations and extended to FLO-SIC in this work, it is unlikely that such a procedure leads to the global minimum for FLO-SIC. Comparing our FOD configurations with the ones of Adhikari et al., obtained directly from the authors, 55 we observed that their FOD configurations place some double-bond FOD pairs above the molecular plane for acridine (mol01) and phenazine (mol12); the FOD structures for these molecules are thus partly hovering in nature.
In order to try to understand such solutions, we return to our favorite model system-benzene-and start out by examining the FOD structure described by Pederson and Baruah 20 . Employing the methodology described in Section II to optimize this configuration, denoted as 3RH in Fig. 7, we obtain a large out-of-plane dipole moment (ca. 0.10 D) and an energy of −233.013171 E h , which is lower than that of our LT solution (−233.007512 E h ).
Obviously, such a dipole moment is completely artifactual, and is at variance to experiment and results of ab initio calculations. The artifactual dipole moment can be eliminated for this molecule by splitting the α and β FODs onto different sides of the molecule; we denote this novel FOD pattern as unrestricted hovering; it is illustrated as 3UH in Fig. 7. However, as the spin symmetry is now broken, this procedure leads to a slightly non-zero spin density and an Ŝ 2 ≈ 5 × 10 −4 .
However, also other kinds of hovering solutions are possible: instead of hovering all three double-bond FOD pairs, one can also hover just one or two of them, and fully relax the FODs in an unrestricted calculation; the structures are presented in Fig. 7. Hovering one pair lowers the energy (1UH, E = −233.009569 E h ), but induces a large dipole moment (0.28 D) in the molecular plane. Next, hovering two pairs lowers the energy further (2UH, E = −233.011030 E h ), still increasing the dipole moment (0.29 D). Only when all three pairs are hovered is the correct dipole moment of zero obtained along with an even lower energy (3UH, E = −233.0132249 E h ).
Next, starting from the lattermost structure, in which the three hovering double-bond FOD pairs form a triangle, one can vary the the size of the triangle, as well as the distance of the triangle from the bonding plane. For an unrestricted calculation, the α and β triangles' sizes and distance from the molecular plane can be varied to obtain even more hovering structures. The currently lowest energy we have found (−233.015872 E h ) is produced by a hovering structure where the hovering α and β triangles are rotated against each other by 60 • ; this motif thereby belongs to the LDQ class of motifs.
As the above discussion demonstrates, determining the lowest possible solution is non-trivial even for benzene. Larger molecules as those included in the DIP16 set may afford several types of solutions with hovering FODs, and finding these solutions requires global optimization techniques due to the problem of multiple local minima: the two hovering structures employed by Adhikari et al. 25 can likely be further tuned to give a significantly lower energy by hovering more or all of the double-bond FOD pairs.
All FLOs corresponding to hovering FODs appear to be less localized than FLOs from the respective LT or LDQ FODs, both according to visual inspection, as well as according to the Foster-Boys localization criterion. This is another aspect which needs to be studied in more detail in the future. More work is thus needed for a full understanding of any potential hovering FOD solutions.
Moreover, it is important to note that each electronic geometry is likely to correspond to a different molecular geometry. 15 When the geometry is optimized for the LT structure, yielding a symmetry-broken molecular geometry with alternating single and double bonds, 15 the hovering FOD solutions no longer yield a decrease in energy.

V. SUMMARY AND DISCUSSION
As originally pointed out in Ref. 15, symmetry breaking is an important problem in self-interaction corrected calculations. We have shown that Fermi-orbital descriptors (FODs) and the corresponding Fermi-Löwdin orbitals (FLOs) can be used to guide self-interaction corrected calculations towards the wanted type of solution, that is, the wanted type of local minimum. In the first part of the manuscript we showed that symmetry-broken solutions for benzene in real (RSIC) and complex orbital SIC (CSIC) can be reproduced starting from Lewis-like solutions. We demonstrated that a symmetric molecular geometry for benzene can be captured with SIC by starting the calculation from a Linnett double-quartet (LDQ) structure in RSIC, CSIC, and in FLO-SIC. While the symmetric LDQ solution is higher in energy than the  Table 1 for the legend of molecules.
Lewis structures when the molecular geometry is allowed to relax, the LDQ structure is the lowest-lying solution for the symmetric molecular geometries considered in this study. Moreover, the millihartree energy difference between Lewis' theory (LT) and LDQ structures is negligible with respect to the overall SIC contribution which is in the order of hartrees.
In the second part of this study, a benchmark set consisting of 16 planar, cyclic molecules we call DIP16 was used to show that Lewis-like SIC solutions often break the symmetry of the electronic density, while LDQ SIC solutions do not. Reasonable LDQ solutions can typically be constructed by combining two Lewis-like solutions, by taking the spin-up FODs from one Lewis structure and the spin-down FODs from another Lewis structure. Thus, we were able to demonstrate that LDQ SIC is able to provide a simple representation of resonance/aromaticity.
Interestingly, when the resulting Linnett structure is optimized, a dipole moment that coincides roughly with the average of the dipole moments resulting from the two employed Lewis structures is obtained. To quantify this, for our benchmark set a linear combination of LT DMs is compared to the corresponding LDQ DM in Fig. 8; the resulting DMs agree well. It shall be noted that there are cases which require more than two Lewis structures to obtain the LDQ structure, which is the case for acridine and fluorene in the DIP16 set (see the SI).
In summary, we have pointed out that FODs guided by Lewis' theory of chemical bonding tend to result in symmetry breaking, while FODs based on Linnett's theory are able to restore qualitatively correct total densities for the systems studied, at the cost of breaking the spin symmetry.
Molecular symmetries appear to break whenever the FOD arrangement does not have the same symmetry as the molecule. Classifying the FODs in a given molecule with respect to its symmetry-equivalent parts, given by mirror planes or rotations, for example, one finds that if the number of FODs for the symmetry-equivalent parts of the molecule differ, the DM from a self-consistent calculation will point towards the part of the molecule that has more FODs; that is, the density follows the FOD arrangement. Any imbalances in the FOD arrangement result in a bias for the electron density in SIC calculations, as the SIC prefers localized orbitals that are determined by the FODs.
Our findings are not limited to the FLO-SIC method, as we were also able to reproduce the main findings of this work also for RSIC and CSIC by starting the calculations from FLOs guided by LT or LDQ FODs.
A careful initialization of SIC calculations is of paramount importance. For instance, various starting points may lead to convergence to different local minima. In many cases, several FOD arrangements have to be evaluated to find the minimum-energy SIC solution. Employing a single type of FOD arrangement may lead to incorrect conclusions.
Concluding the summary, we move on to further discussion. The second part of this study-the examination of the DIP16 dataset described in this work-employed fixed molecular geometries. Optimizing the geometries will likely bias the dipole moments further if the FODs break the molecular symmetry. New advances in theory are warranted in order to improve the usefulness of self-interaction corrected calculations for practical applications, which typically require the determination of optimal molecular geometries, for instance, and we hope to address such topics in future work.
Furthermore, we excluded hovering FOD structures from the current investigations on the DIP16 dataset. Although all presently-known hovering structures can be constructed starting from a LT or LDQ configuration by moving some of the FODs out-of-plane, there is currently no elegant, systematic way to set up reproducible hovering motifs for more complex molecules, such as those included in the DIP16 benchmark set of this work. The phenomenon of hovering FODs is not yet understood, and we hope to find suitable explanations in further research.
As shown in this study, several kinds of FOD configurations and FLO-SIC solutions are possible for molecules as simple as benzene, as could be expected from the results of Ref. 15 for fully variational SIC. We have demonstrated in Section II that the various kinds of FLO-SIC solutions all correspond to local minima at symmetric nuclear geometries for benzene. Importantly, as shown here and in Ref. 15, if the nuclear geometry is relaxed, the resulting optimal molecular geometries may be different for each type of solution, further complicating the problem of locating the lowest-lying solution.
Finally, we would also like to point out that free and open-source software is essential for a fruitful development of computational methods, 33,56 as they enable straightforward one-to-one comparisons as in this work. All authors of this paper are either main developers of, or contributors to various free and opensource software electronic structure programs, making self-interaction correction methods readily available to a broad community of physicists, chemists, and material scientists. ACKNOWLEDGMENT We thank all the sponsors and guests of the "Quo Vadis Self-Interaction Correction?" workshop (Freiberg, Germany, 2019), as this meeting brought some of the present authors together in person for the first time and thereby enabled them to start a fruitful cooperation. Moreover, it was an honor to meet Prof. John P. Perdew, the heart and one of the most important persons in SIC and a living legend in the field of electronic structure. Further thanks goes to Prof. Mark R. Pederson for creating the FLO-SIC method and Prof. Hannes Jónsson for major contributions in the development of the CSIC method, which are both methods discussed within the manuscript. This contribution is part of our Open-SIC project, which aims to develop fresh perspectives in the field of self-interaction corrections using opensource software. J. Kortus and S. Schwalbe have been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) -Project ID 421663657 -KO 1924/9-1. We acknowledge the contributions of Jakob Kraus at an early stage of this project. We furthermore thank the ZIH in Dresden for computational time and support.

SUPPORTING INFORMATION
See the supporting information for thorough information on the used molecular geometries, the full data set of dipole moments, comparison of the LDQ dipole moments as averages of the respective LT dipole moments, the molecule-by-molecule discussion of the electronic geometries and the resulting dipole moments of the DIP16 data set, as well as the analysis of freezing the 1s FODs at the nuclei. The full set of molecular and electronic geometries is also available as part of the supporting information.
Click on the pin for the supporting infomation.

DATA AVAILABILITY STATEMENTS
The data that supports the findings of this study are available within the article and its supporting information. The data are also openly available in https: //gitlab.com/kaitrepte/dip16.

AUTHOR DECLARATIONS
The authors have no conflicts to disclose.