The Adaptive Shift Method in Full Configuration Interaction Quantum Monte Carlo: Development and Applications

In a recent paper, we proposed the adaptive shift method for correcting the undersampling bias of the initiator-FCIQMC. The method allows faster convergence with the number of walkers to the FCI limit than the normal initiator method, particularly for large systems. In its application to strongly correlated molecules, however, the method is prone to overshooting the FCI energy at intermediate walker numbers, with convergence to the FCI limit from below. In this paper, we present a solution to the overshooting problem in strongly correlated molecules, as well as further accelerating convergence to the FCI energy. This is achieved by offsetting the reference energy to a value typically below the Hartree-Fock energy but above the exact energy. This offsetting procedure does not change the exactness property of the algorithm, namely convergence to the exact FCI solution in the large-walker limit, but at its optimal value greatly accelerates convergence. There is no overhead cost associated with this offsetting procedure, and is therefore a pure and substantial computational gain. We illustrate the behavior of this offset adaptive shift method by applying it to the N$_2$ molecule, the ozone molecule at three different geometries (equilibrium open minimum, a hypothetical ring minimum, and a transition state) in three basis sets (cc-pV$X$Z, $X$=D,T,Q), and the chromium dimer in cc-pVDZ basis set, correlating 28 electrons in 76 orbitals. We show that in most cases the offset adaptive shift method converges much faster than both the normal initiator method and the original adaptive shift method.


I. INTRODUCTION
The Schrödinger equation contains, in principle, all the information necessary for predicting the electronic structure of quantum chemistry and solid-state physics. Yet solving the equation accurately remains an extremely challenging task except for small systems. The problem lies in the explosion of the size of the Hilbert space in which the solution lives. This size scales combinatorially with the number of electrons and the size of the basis set. As a result, an exact diagonalization method like Full Configuration Interaction (FCI) rapidly becomes prohibitive in the memory required for the entire wavefunction vector. Full configuration interaction quantum Monte Carlo (FCIQMC) is a stochastic method that circumvents this problem by utilizing a sparse representation of the wavefunction using a set of signed walkers 1 . The walker dynamics is governed by the imaginary-time Schrödinger equation, and with sufficient walkers, converges in the long-time limit to the ground state of the system of interest. In its initiator variant, it has a comparatively low memory requirement, and the method is also highly-parallelizable and has been successfully applied not only to molecules but also a wide variety of systems including solids 2 , the uniform electron gas 3,4 , and the Hubbard model 5 . The utility of the method is not restricted to ground states but has been extended in several different ways, including to excited states 6 , calculating properties by sampling reduced density matrices 7 , a) Electronic mail: k.ghanem@fkf.mpg.de obtaining electronic spectra 8 and full spin-adaptation, targeting specific spin states, using the graphical unitary group approach 9 .
Although the original FCIQMC algorithm can converge to the exact ground state, its convergence requires a minimum number of walkers in order to maintain stable dynamics and in order for the sign structure of the wavefunction to emerge 10 . This minimum number depends on the system under study and is a fraction of the full size of the Hilbert space but still scales with it, making the method impractical for large systems. The initiator approximation (i-FCIQMC) solves this issue at the cost of introducing a systematically-improvable bias. In the initiator approximation, the dynamics of the walkers is constrained such that only walkers on determinants with enough population (called initiators) are allowed to spawn new walkers onto empty determinants. Spawns of a non-initiator to the empty determinants are rejected. This constrained dynamics prevents non-initiators, which are potentially sign-incoherent, from propagating noise and contaminating the wavefunction. Consequently, the simulation remains stable at any number of walkers, allowing i-FCIQMC to tackle much larger systems than the original FCIQMC algorithm. The main drawback is that the initiator criterion effectively scales down certain off-diagonal elements of the Hamiltonian, leading to an undersampling bias of the non-initiators.
In a recent paper 11 , we proposed a remedy to this bias by giving each non-initiator determinant its own local shift, as a proportionately-scaled value of the overall global shift. The scale factors are based on the weighted acceptance probability of the non-initiator successfully arXiv:2010.05743v1 [physics.comp-ph] 12 Oct 2020 spawning a walker, where the weights are computing using perturbation theory. We showed that this method reduces the initiator bias significantly, and leads to convergence to near-FCI energies at a much lower number of walkers than the normal initiator method. However, it was also found that, in some cases, the method rather overcorrects the initiator bias, leading to an overshoot in the energy estimates and a convergence from below. As a result, a global way of controlling the strength of the correction is desired. In this paper, we present a simple way of tuning the adaptive shift correction that results in a faster converge of the method than either the normal initiator and the original adaptive shift.
The paper is structured as follows. We first review the FCIQMC in both its original formulation and its initiator approximation and discuss the bias it introduces. We then explain the adaptive shift method and how it rectifies this bias. We introduce an offset version of the adaptive shift as a simple method which gives further control on the extent to which the local shift at each non-initiator is modified, and demonstrate its effect using the example of N 2 molecule, at equilibrium and stretched geometries in the cc-pVDZ and cc-pVQZ basis sets. Next, we show results using this extended version of the adaptive shift for two challenging highly multi-configurational systems: ozone, and the chromium dimer. We conclude with a summary and a discussion.

II. FCIQMC AND THE INITIATOR APPROXIMATION
FCIQMC is a stochastic projective method in the space of Slater determinants 1 . As a projective method, it starts from some initial state with non-zero overlap with the ground state and evolves it according to the imaginarytime Schrödinger equation Setting E equal to the ground state energy E 0 , contributions from excited states decay exponentially and the wavefunction |ψ(τ ) approaches the ground state in the long-time limit. The ground state energy is not known a priori, so it must be estimated from the wavefunction itself, as will be discussed later. As a determinant method, FCIQMC solves the Schrödinger equation in the Hilbert space spanned by determinants constructed from a finite orthonormal basis of single-particle orbitals. Let C i := D i |ψ be the projection coefficient on determinant |D i , known as Configuration Interaction (CI) coefficient, the Schrödinger equation in this space reads where H ij := D i |Ĥ|D j is the Hamiltonian matrix in the CI basis. Deterministic integration of the above equation would require storing the entire CI vector, which is prohibitive except for small systems; the number of determinants scales combinatorially with the number of electrons and the size of the single-particle basis. This memory bottleneck is reduced in FCIQMC by a stochastic sampling of the wavefunction such that only a fraction of the full Hilbert space is simultaneously occupied. The wavefunction is encoded using a population of walkers, each of which has either a positive or negative sign and occupies one determinant only. Walkers can die or duplicate on their own determinants, and they can spawn walkers on other determinants connected by non-zero matrix elements. A CI coefficient of a determinant C i is represented by the signed sum of walkers on that determinant N i . If the population of walkers is then updated randomly such that the average change at each time step respects the Schrödinger equation i.e.
then due to linearity, the average values themselves satisfy the Schrödinger equation and we get CI coefficients as averages of the walker populations In order for the linearity argument to hold, the energy estimate E must be independent of the walker population. In practice, however, a decomposition into a fixed Hartree-Fock energy term E HF and a varying shift term S(τ ) is used where the shift is updated periodically to keep the total number of walkers roughly constant. As a result, there is a weak correlation between the shift and the walker population leading to what is known as population bias 12 . This bias is typically very small and negligible in comparison to the statistical error that inherently exits in estimates of FCIQMC. In the long-time limit, the shift S fluctuates around the correlation energy and E around the total energy. The master equation (3) is carried out via three steps of walker dynamics: a spawning step corresponding to the non-diagonal elements of the Hamiltonian, a death step corresponding to the diagonal elements, and an annihilation step, where walkers of opposite signs on the same determinant are canceled and removed. The details of these steps are explained in Ref. 1 . Several technical improvements of the original algorithm have been proposed since then: floating-point weights for walkers, energy estimation using trial wavefunctions, semistochastic projection 13,14 , efficient parallelization 15 , and efficient excitation generation 16,17 . These improvements help reduce the statistical errors and the computational cost significantly but do not fundamentally change the dynamics of the walkers.
Maintaining stable dynamics in the original FCIQMC requires a system-dependent minimum number of walkers. Below this number, the signs of low-populated determinants fluctuate randomly, leading to sign-incoherent noise. This noise then propagates via spawning and dominates the wavefunction preventing it from converging to the ground state. Although the minimum number of walkers is typically less than the full size of the Hilbert space, it still scales exponentially with the system size. This is the manifestation of the sign-problem in FCIQMC 10 .
The initiator approximation (i-FCIQMC) overcomes the issue by constraining the dynamics of the walkers 18 . In this method, walkers residing on determinants with enough population (denoted as n a and typically chosen to be three) are called initiators. These walkers are the only ones allowed to spawn onto empty determinants. The other walkers (non-initiators) can spawn only onto an already-occupied determinant. In essence, i-FCIQMC is modifying the non-diagonal matrix elements of the master equation (3) as following This prevents non-initiators, whose sign may not be stable, from propagating sign-incoherent noise and thus allows convergence with arbitrarily small number of walkers. The price, however, is introducing an initiator bias that gets smaller as the number of walkers increases. In the limit of a large population, all determinants become occupied and the full method is recovered, yielding exact results. Nevertheless, it is of interest to be able to mitigate the effects of the initiator bias, either by providing estimates of the initiator bias at a given walker number, or else by accelerating the convergence with walker number towards the exact FCI limit. Regarding the first strategy, perturbative estimates of the initiator bias can be made, to second 19 or more recently third-order 20 Epstein-Nesbet perturbation theory, inspired by similar PT2 corrections to selected CI methodologies 21,22 . Preconditioning methods 23 help in reducing the inherent noise in the PT2 estimators, although the replica-trick 7 is necessary to obtain unbiased energy estimates. Regarding the second strategy of accelerating convergence to the exact FCI limit, the adaptive shift (AS) methodology 11 has been proposed, which aims at improving the sampled wavefunction, not just the energy. Such improved wavefunctions allow calculation of properties 24 via reduced den-sity matrices, including electronic gradients in stochastic CASSCF procedures 25,26 . It has been shown that such improved wavefunctions (for example at the level of triple and quadruple excitations) can be successfully used in other methods, such as externally corrected and tailored Coupled-Cluster theory [27][28][29] . The AS methodology is closer in spirit to coupled electron-pair approximations (CEPA) 30 and also down-folding concepts 31 , effectively modifying the diagonal matrix elements via local shifts at the determinants at which the initiator approximation is occurring.

III. THE ADAPTIVE SHIFT METHOD
The initiator bias is the result of the undersampling of non-initiator determinants. From Eq. (8), we see that Hamiltonian matrix elements between any two initiators or between an initiator and a non-initiator stay unmodified. However, matrix elements between any two noninitiators get effectively reduced in the long-run, because whenever one of them is zero, the other cannot spawn onto it.
To see how one might rectify such a bias, let us first consider the following simpler truncation criterion: Spawns are allowed only to a pre-specified set of determinants A, while spawns to other determinants R are rejected. The rejection of spawns onto set R biases the walker population in the set A due to the lack of subsequent back-spawns from R to A. Nevertheless, one can still, in principle, recover the exact dynamics of the set A, if the exact wavefunction amplitudes C i are known. This can be achieved by rewriting the master equation 3 such that each determinant in A gets its own local shift where the back-spawns of the set R are now folded into the local shift In practice, this formal expression of the local shift should be approximated by some heuristic estimate of the missing back-spawns. Motivated by the above argument, the adaptive shift method uses the idea of local shifts to compensate for the undersampling bias of the initiator criterion. The truncation in the initiator method is, however, more intricate and differs in two aspects from above. First, the set of rejected determinants R is dynamically changing from one time step to the next. As a result, the local shifts should also be updated dynamically to reflect this dynamic truncation. Second, only the spawns of noninitiators onto R are rejected, while initiators can spawn freely. Therefore, we need to keep the shifts of initiators unmodified.
In Ref. 11 , we proposed using local shifts S i (τ ) that are fractions of the total shift S(τ ) where the fraction f i should be proportional to how severely determinant |D i is affected by the undersampling. A systematic and computationally-cheap way of assigning these factors is as follows. We associate a positive weight w ij with each attempted spawn from determinant |D i to determinant |D j , and monitor which of these spawns are accepted and rejected due to the initiator criterion. The factor f i of determinant |D i is then computed as the ratio of accepted weights over all weights Regardless of the exact form of the weights, the above expression ensures that initiators get the total shift automatically. Besides, the local shift of a non-initiator monotonically approaches the total shift as its local Hilbert space becomes more populated. This guarantees that we restore the original master equation of FCIQMC in the large walker limit.
In this study, we use weights w ij derived from perturbation theory 31 where the first-order contribution of |D i to the amplitude of |D j is used as a weight for spawns from |D i to |D j We found these weights to be particularly suitable for ab initio systems because they account for the nonuniform matrix element magnitudes |H ij | in these Hamiltonians and the resulting non-equal contributions in a local Hilbert space.

A. Fine Tuning The Adaptive Shift
In Ref. 11 , we showed that the adaptive shift method is effective in correcting the undersampling bias caused by the initiator criterion. As the number of walkers increases, energy estimates using the adaptive shift method typically converge much faster than the normal initiator method . Still, we found that for some systems, the energy converges from below, indicating that the adaptive shift is rather over-correcting. It is then desirable to be able to tune down this correction.
A key observation is noting that the definition of the total shift S is not unique. In Eq. (6), the shift is taken with respect to the Hartree-Fock energy, but we can equally take it with respect to any other reference energy E Ref ≥ E 0 . Without the adaptive shift method, this change of reference does not affect the results, and the new shift would fluctuate around the residual energy E 0 − E Ref instead of the correlation energy. The choice of Hartree-Fock energy E HF as a reference is driven by convenience since its shift can be used as an independent estimator of the correlation energy. Using the adaptive shift method, however, the choice of the reference energy affects the amount of death rate that gets scaled down by the factors f i . In the extreme case when E Ref = E 0 , the total shift S(τ ) fluctuates around zero and the modification factors have little to no effect on the result.
To account for this degree of freedom, we introduce an offset parameter ∆ := E Ref − E HF ≥ E correlation and redefine the local shift as following This new parameter allows us to control the overall strength of the adaptive shift correction. When ∆ = 0, this definition coincides with the original definition of the local shift (11), and the correction is applied fully. When ∆ = E correlation , the adaptive shift has practically no effect on the result, and we are back to the normal initiator. We therefore expect appropriate values of the offset ∆ to be in the range E correlation < ∆ < 0, and with a guess of the correlation energy (obtained for example from MP2, or coupled cluster or other sources), an educated guess for a good value of ∆ can easily be made. As will transpire from the calculations we will report, ∆ ≈ E correlation /2 seems to be a universally good value. Note that for initiators, i.e., when f i = 1, the local shift is independent of the offset and always equals the total shift.

B. Illustrative Example
To illustrate the effect of varying the offset parameter of the adaptive shift, we consider N 2 molecule in cc-pVDZ basis for a near-equilibrium geometry (2.118 a 0 ) and a stretched geometry (4.2 a 0 ). This system is a toy problem (10 electrons in 26 orbitals) where the exact result is known, and the normal initiator method already converges to chemical accuracy using less than 10 5 walkers.
For the near-equilibrium geometry, the initiator method using n a = 3 and 3 × 10 5 walkers gives −109.278208(6) a.u., which is 0.131 mH above the exact energy of −109.278339. Applying the full adaptive shift correction (i.e., with offset ∆ = 0), we get −109.278454 (5), which is below the exact energy by a similar amount (0.115 mH). As the number of walkers is increased to 3M, the initiator method converges from above, while the full adaptive shift converges from below, bracketing the exact energy to within 0.02 mH. Clearly, the full adaptive shift is over-correcting here, so we next apply the adaptive shift with an offset ∆ = −0.15, which is around half the correlation energy. Using this offset, we get −109.278344(8) using only 3×10 5 walkers, matching the exact energy within 0.005 mH. As the number of walkers increases, the energy stays constant within the statistical error, which shows that the adaptive shift correction for this offset cancels almost exactly the initiator bias. Lowering the offset further to ∆ = −0.2 and ∆ = −0.25 leads to less correction and thus higher energies. The results are shown in Fig. 1.
We observe similar behavior for the stretched geometry as we vary the offsets and the number of walkers (See Fig. 2). The initiator method converges from above, while the full adaptive shift (∆ = 0) overshoots and converges from below. The errors of both methods are larger here than for the previous geometry. For exam-ple, using 3 × 10 5 walkers, the initiator method is 0.449 mH above the exact energy of −108.966950, and the full adaptive shift is 0.433 mH below it. Setting the offset to ∆ = −0.28 reduces the overshoot of the adaptive shift significantly, yet the results still converge from below. By lowering the offset to ∆ = −0.43, the results now converge from above. Using these two offsets, the exact energy is bracketed to within 0.16 mH at already 3 × 10 5 walkers. For the largest number of walkers of 10M, all results are within 0.01 mH of the exact energy. Again for this stretched system, an offset of roughly half of the correlation energy (-0.744 mH) seems to be an appropriate value to select.
We also considered the equilibrium geometry (1.0977 A) in the larger cc-pVQZ basis (108 orbitals) whose results are shown in Fig. 3. Using 3M walkers, the normal initiator gives −109.40522(2) a.u., which is 0.41 mH above the CCSDTQ result of −109.405633. On the other hand, the full adaptive shift gives −109.40637(4) a.u. which is 0.49 mH below CCSDT(Q) result of −109.405885. Increasing the number of walkers to 100M, the normal initiator energy comes down below the CCSDTQ result to −109.405721 (6), while the full adaptive shift energy goes above the CCSDT(Q) to −109.405863 (9). We applied the offset adaptive shift method with three different offsets. Using ∆ = −0.11, the results converge from below faster than the full adaptive shift to the value −109.405821(6) a.u. Using ∆ = −0.31, the results converge from above faster than the normal initiator to the value −109.405761(7) a.u. The best correction is obtained using ∆ = −0.21, where the energy stays roughly constant in the middle as the number of walkers increases. With 3M walkers, this offset gives energy −109.40577(4) a.u., which is only 0.02 mH away from the value −109.405799(7) a.u. obtained with the much larger population of 100M walkers. Since the total correlation energy is approximately −0.415, this again implies that a good value of the offset is simply half of the total correlation energy.

IV. RESULTS
In the following, we report the results of the adaptive shift for two systems well-known to be challenging and have been the subject of recent high-level calculations: the ozone molecule and the chromium dimer. In the case of the ozone molecule, a recent CEEIS study by Ruedenberg et al 32 concluded that accurate energy differences in this system require CI expansions to at least sextuple excitations. High-order coupled cluster calculations (Table I) indicate significant differences between the energy gaps predicted by CCSDT, CCSDT(Q), and CCSDTQ, this being another indication of the intricate level of correlation necessary to describe this system.
The calculations were done using the NECI code, which provides a state-of-the-art and efficient implementation of FCIQMC 33 . In all calculations, the initiator thresh-old is set to n a = 3. The calculations start with 100 walkers on the Hartree-Fock (HF) determinant and grow the population with zero shift till the target number of walkers N w is reached. The shift is then varied to maintain a stable number of walkers. When the adaptive shift method is applied, the local shifts are used only after the target population is reached, and the shift starts varying. After 1000 iterations of reaching the target population, a semistochastic space D and a trial wavefunction space T are constructed out of the topmost occupied determinants. Energy estimates are computed by projecting on the trial wavefunction. Tables of the results can be found in the supplementary material file.

A. Ozone
The ozone molecule was treated in three basis sets: cc-pVDZ (39 orbitals), cc-pVTZ (87 orbitals) and cc-pVQZ (162 orbitals) and three different geometries: an open minimum (OM), a meta-stable ring minimum (RM), and a transition state (TS) between these two minima, correlating 18 valence electrons. RHF orbitals were used. For all cases, we ran normal initiator calculations, full adaptive shift calculations, and offset adaptive shift calculations with offsets chosen to be roughly half of the correlation energies. We set the sizes of the semistocastic and trial spaces respectively as following: |D| = 1000 and |T | = 100. As the number of walkers increases, in all cases, the initiator calculations converge from above, while the full adaptive shift calculations converge from below. The offset adaptive shift energy is bracketed from above by the normal initiator and below by the full adaptive shift method and gives the most well-converged results.
For the cc-pVDZ basis, we ran calculations using 10M, 30M, and 100M walkers. For the OM geometry, the offset adaptive shift energy is right between the CCSDTQ result of −224.91555 and the CCSDTQ(P) result of −224.91641 with the value −224.91614(3) obtained using 100M walkers (Fig. 4). For the RM geometry, the offset adaptive shift energy using 10M walkers coincides with the CCSDTQ(P) result of −224.86317 but then raises to −224.86297(4) using 100M walkers. This is about 0.3 mH below CCSDTQ result of −224.86266 (Fig. 5). For the TS geometry, the offset adaptive shift energy stays relatively stable as the number of walkers increases with value −224.83115(4) obtained using 100M walkers. This value lies again between the CCSDTQ result of −224.82959 and the CCSDTQ(P) result of −224.83211 (Fig. 6).
For the cc-pVTZ basis, we used populations of 10M, 30M, 100M, and 300M walkers. For all geometries, the offset adaptive shift energy converges from above, with all values falling between CCSDT and CCSDT(Q) results. The converged energy of OM geometry is −225.1375(1), which is 5.8 mH below the CCSDT result of −225.1317 and is 2.1 mH above the CCSDT(Q)   result of −225.1396 (Fig. 7). The converged energy of RM geometry is −225.0896(1) which is about 3.7 mH below the CCSDT result of −225.0859 and about 1.3 mH above the CCSDT(Q) result of −225.0909 (Fig. 8). For the TS geometry, the full adaptive shift result increases up to −225.0517(3), while the offset adaptive shift decrease down to −225.0493 (3). We believe the true value to be in between, about 1 mH from our best estimate of −225.050. This value is 3 mH above the CCSDT(Q) result of −225.0533 and 14 mH below the CCSDT result of −225.0360 (Fig. 9).
For the cc-pVQZ basis, we used 30M, 100M, 300M and 1B walkers. The offset adaptive shift energy for the OM geometry converges to −225.2058 (2), which is right between CCSDT result of −225.2005 and the CCSDT(Q) result of −225.2089 (Fig. 10). For the RM geometry, the offset adaptive shift energy converges from above to −225.1558(3), which is 2.7 mH below the CCSDT result of −225.1531 (Fig. 11). For the TS geometry, the offset adaptive shift energy also converges from above down to the value −225.1146(1), while the full adaptive shift converges from below up to −225.1169 (2). The midpoint −225.116 is about 13 mH below the CCSDT result of −225.1025 (Fig. 12).
Interestingly, using CASSCF for the TS geometry in cc-pVQZ basis, we obtained molecular orbitals that give reference energy of −224.2602, 60 mH below that of RHF energy of −224.1998. Using this new set of orbitals, we then applied the full adaptive shift and the offset adaptive shift with the same offset value as before. The full adaptive shift gives almost identical results to the ones using RHF orbitals with differences less than the statistical error bars. Using the offset the adaptive shift, the results are very close to the earlier ones and slightly below. The difference is about 1 mH for 300M walkers and gets smaller as the number of walkers increases. This demonstrates that the adaptive shift results are to a large extent independent of such changes of orbitals.
To our knowledge, these are the first FCI estimates in a cc-pVQZ basis set of the valence-only correlated ozone molecule in these three geometries. The gaps between these states can be compared with those reported by Umrigar et al using the semi-stochastic Heat-Bath CI-PT2 method at the cc-pVTZ level. At the cc-pVTZ level, the agreement between the two is on the order of less than 0.01eV, i.e. very good. The observed change in the gaps with basis set is much larger, highlighting the crucial importance of assessing basis-sets errors, to obtain quantitatively reliable energy differences. Since the memory footprint for treating large basis-sets can be prohibitive, the usefulness of the AS methodology is here shown, whereby the cc-pVQZ problem was treated with the same hardware resource as that for its cc-pVTZ counterpart. The Coupled Cluster gaps show erratic behavior with increasing the level of CC theory (with especially large changes observed in going from CCSDT to CCSDT(Q)), but an excellent agreement is achieved between the AS-FCIQMC gaps and CCSDTQ(P) gaps. Unfortunately, the latter was affordable only at the cc-pVDZ level, because of the memory requirements of storing the CCSDTQ method for larger basis sets.

B. Chromium Dimer
Next, we applied the offset adaptive shift FCIQMC method to the chromium dimer treated in a cc-pVDZ-DK basis with the X2C relativistic Hamiltonian, correlating 28 electrons in 76 orbitals (Ne cores frozen). We used orbitals obtained from a singlet-spin CASSCF (12,12) calculation.   This system is a challenging multi-configurational problem even at the equilibrium geometry (1.68Å), and has been the subject of a recent extensive DMRG study going up to very large bond dimensions 34 , as well as an SHCI study with very large variational spaces of 2-billion Slater determinants 35 . A DMRG variational total energy of −2099.9061 is obtained using a bond dimension M = 16000 in the default scheme, with an extrapolated FCI energy of −2099.9195 (27). The variational energy of SHCI is found to be −2099.906322. Adding the perturbative correction, the energy of the SHCI calculation falls to −2099.919205, and further extrapolation to the zero PT2 limit yields the FCI estimate of −2099.9224 (6)   (see Fig. 13), close to 3 mH below the DMRG estimate.
We started the calculations by using two offsets obtained as the correlation energies of two trial wavefunctions. The first offset ∆ = −0.64 is the correlation energy of the trial wavefunction resulting from diagonalizing the space of the top 100 most-populated determinants in a normal initiator calculation. The other offset ∆ = −0.82 is the correlation energy of the trial wavefunction obtained from diagonalizing the space of the top 1000 most-populated determinants. In each case, the energies are estimated by projecting on the corresponding trial wavefunctions. We also used a semistochastic space with |D| = 50K determinants. As we increase the number of walkers from 100M up to 6.4B, the energy using the first offset increases monotonically from −2099.9239(2) to −2099.92170 (5). On the other hand, the energy using the second offset decreases monotonically from −2099.9170(6) a.u. to −2099.92029(3) bracketing the exact energy to within 1.4 mH. The results using 1.6 billion walkers or larger fall between the extrapolated DMRG energy 34 of −2099.9195 (27) and the extrapolated SHCI energy 35 of −2099.9224(6) (see Fig. 13).
To bracket the exact energy even more accurately, we ran additional adaptive shift calculations using three offsets evenly spaced between the previous two. At all walker populations except the largest (6.4B), we used a recent efficient implementation of the semistochastic method that allowed constructing very large semistochastic spaces with |D| = 1M determinants. For 6.4B walk-ers, we had to restrict the semistochastic space back to 50K to avoid memory issues for these demanding calculations. The size of the trial space used in these calculations was set to |T | = 1000. Using ∆ = −0.74, we get an energy estimate of −2099.92086(3) at 6.4B walkers, while using ∆ = −0.69 we get a slightly lower estimate of −2099.92130(2) at the same number of walkers. Both these offsets appear to saturate for populations larger than 800M walkers, so it is hard to tell which of the two provides a better estimate, although the discrepancy between them is small (0.4mH). Combined with the trends of the other offsets, the true energy is likely to be between these two values, which pins it to within 0.3 mH of our best estimate −2099.921.
To make sure that these offset adaptive shift calculations are ergodic, we repeated the first four calculations using the same two offsets but with different seeds for the random number generator. The results of different seeds agree within the statistical error bars as shown in Fig. 13. It is encouraging that, for the same offset, different simulation trajectories yield essentially identical energies, whilst the spread of energies among the different offsets is also not very large: about 0.5 mH -much smaller than the current discrepancy of 3 mH among the best alternative estimates given by DMRG or SHCI. Furthermore, the number of walkers required to give reliable energies is on the order of a few hundred million walkers -rather smaller than the 2B determinants used in the SHCI calculation.

V. SUMMARY AND DISCUSSION
In this paper, we have reviewed and extended the adaptive shift method for unbiasing the initiator approximation of FCIQMC. The adaptive shift method greatly helps to accelerate the convergence of the initiator approximation with respect to the number of walkers. For some systems, the original adaptive shift method may overcorrect the initiator bias, leading to an overshoot in the energy estimates and a convergence from below. To remedy this, we have presented a simple way of modifying the adaptive shift correction that takes advantage of the inherent flexibility in the definition of the shift. We have shown that adding a fixed offset to the shift reduces the strength of the correction and provides a natural interpolation between the normal initiator method and the full adaptive shift method. In all cases where an offset is needed, we found that using a value, which is about half the correlation energy provides good results. The question of whether an optimal offset exists and how to automatically estimate it is still currently under investigation. We observe a strict ordering between the energy curves of different offsets in all cases: lower offset leads to higher energy at any specific number of walkers. By varying the offset, we can achieve faster convergence to the exact energy and bracket it between the curves of different offsets. Using this approach, we are able to bin down energies of difficult strongly-correlated systems, such as chromium dimer in a cc-pVDZ basis, to a sub-mH accuracy.

SUPPLEMENTARY MATERIAL
Tables of the results can be found in the supplementary material file.