Minimising biases in Full Configuration Interaction Quantum Monte Carlo

We show that Full Configuration Interaction Quantum Monte Carlo (FCIQMC) is a Markov Chain in its present form. We construct the Markov matrix of FCIQMC for a two determinant system and hence compute the stationary distribution. These solutions are used to quantify the dependence of the population dynamics on the parameters defining the Markov chain. Despite the simplicity of a system with only two determinants, it still reveals a population control bias inherent to the FCIQMC algorithm. We investigate the effect of simulation parameters on the population control bias for the neon atom and suggest simulation setups to in general minimise the bias. We show a reweighting scheme to remove the bias caused by population control commonly used in Diffusion Monte Carlo [J. Chem. Phys. 99, 2865 (1993)] is effective and recommend its use as a post processing step.


I. INTRODUCTION
A cheap and accurate computational description of the ground state energy of a chemical system remains one of the principal challenges in electronic structure theory, yet achieving both of these goals systematically remains beyond the grasp of current approximations. Hierarchies of methods of increasing sophistication have been developed in the quantum chemistry community which systematically capture increasing amounts of the electron-electron correlation energy at the expense of additional computational cost. These methods start from Hartree-Fock 1 which scales modestly with the fourth power of the number of electrons to Full Configuration Interaction (FCI) which captures the maximal amount of electronelectron correlation in a finite basis set but scales factorially with the number of electrons. Approximations (which are often very accurate) such as density fitting can potentially reduce the scaling of these methods. 2 If FCI is used with a large enough basis set or an extrapolation to the complete basis set limit, 3,4 energy differences can be obtained to chemical accuracy (1 kcal/mol) providing direct comparison with experiment. Unfortunately, the factorial scaling with the number of electrons makes it unfeasible for studying anything but the smallest of chemical systems.
Full Configuration Interaction Quantum Monte Carlo (FCIQMC) 5 marries FCI with a projector Monte Carlo paradigm but crucially requires no a priori knowledge of the sign structure of the wavefunction. FCIQMC has two principal advantages over conventional FCI: the storage requirements are greatly reduced due to a sparse stochastic representation of the wavefunction, 5 and it can be efficiently parallelised. 6 The storage requirements for FCIQMC depend on the (systemdependent) severity of the Fermion sign problem 7 and are often orders of magnitude less than conventional FCI calculations. FCIQMC with the controllable initiator approximation 8 has allowed molecular systems with Hilbert spaces of 10 29 Slater determinants 9 and the uniform electron gases with Hilbert spaces containing up to 10 108 determinants 10 to be studied. The FCIQMC methodology was subsequently extended to coupled cluster, 11 and we believe stochastic approaches are becoming increasingly important to the quantum chemistry community due to the need for scalable algorithms which are well-suited to modern computer architectures.
Some questions still remain concerning the best way to use Monte Carlo to solve the FCI equations. The FCIQMC algorithm is not a black box, and a choice has to be made about calculation parameters which control the stochastic sampling and hence the systematic and stochastic errors inherent to the simulation for a given amount of computational resources. In this article, we investigate the behaviour of FCIQMC simulations to understand the relationship between parameter choices and errors by investigating the exact distribution obtained from a Markov chain transition matrix.
Section II contains a brief recap of the FCIQMC method. We show in Sec. III that FCIQMC is an example of Markov Chain Monte Carlo (MCMC). We use these ideas to investigate population control bias in the two determinant H 2 system and more realistic calculations on the neon atom in Sec. IV. We draw conclusions and provide suggestions on simulation strategies in Sec. V. Atomic units are used throughout. The many-electron Hamiltonian and all energies have been shifted to be relative to the absolute Hartree-Fock energies of the appropriate system. Error bars signify one standard error, an estimate of the standard deviation, either side of the mean value.

II. FCIQMC
We briefly review the FCIQMC method, which is discussed in more detail in (e.g.) Refs. 5 and 7. The imaginarytime Schrödinger equation is where S is an energy offset, which we shall discuss in the context of FCIQMC later, introduced to control normalisation. The general solution to Eq. (1) is |Ψ(τ)⟩ = e −τ(Ĥ −S) |Ψ(τ = 0)⟩, which in the long-time limit tends to the lowest eigenstate with which the initial wavefunction has a non-zero overlap.
We begin with the Configuration Interaction (CI) ansatz where the wavefunction is a linear combination of Slater determinants: It is convenient (though not necessary 12 ) to represent the coefficients by a discrete set of signed particles which we shall call psips following Anderson. 13 Booth et al. 5 showed that a finite-difference approximation to Eq. (1) could be sampled by allowing a psip on one determinant to create a new psip on another determinant ("spawn") or on the same determinant ("death") with probability proportional to the connecting Hamiltonian matrix element. Pairs of psips with opposite signs on the same determinant are removed ("annihilated") at the end of each timestep. After a sufficient number of such steps, the psip vector becomes a stochastic representation of the eigenvector. The finite difference approximation introduces no timestep errors if the timestep, δτ, satisfies δτ < 2(E max − E 0 ) −1 , where E max (E 0 ) is the highest (lowest) eigenvalue of the Hamiltonian; 7 a property FCIQMC shares with Green's function quantum Monte Carlo. 14 Following an equilibration phase, the shift is periodically updated every A steps to control the psip population using 5 where γ is a damping factor and N(τ) is the total number of psips at time τ. Repeated substitution of Eq. (2) into itself yields where S(0) is the initial value of the shift (in this work, the Hartree-Fock energy), N s is the population at the end of the equilibration phase, and ξ = γ/(Aδτ) is usually fixed during a simulation. Eq. (3) implies that FCIQMC is an example of MCMC, the implications of which we shall discuss in Sec. III. The correlation energy can also be found by where the trial state, |D 0 ⟩, is typically the Hartree-Fock determinant. The variance of the projected estimator is generally smaller than that of the shift and can be reduced further by a multi-determinant trial function. 12 Both estimators are serially correlated as the state of the simulation at one timestep is heavily dependent on the state at the previous timestep. We use an automated iterative blocking algorithm [15][16][17][18] to accurately estimate the stochastic error in all FCIQMC calculations presented in this paper.

A. General Markov chain Monte Carlo theory
A stochastic process is a discrete time Markov chain if the probability of transitioning from one state to another (in one discrete timestep) depends only upon the current state. 19,20 The probability of a given set of psips, {n i }, producing another set of psips, {n ′ i }, at the next timestep depends only on {n i } and the value of the shift, which depends upon the total number of psips at a given shift update. We can therefore describe FCIQMC as a Markov chain taking one step every A timesteps. In order to simplify the mathematical details, we shall henceforth assume that the simulation takes one timestep between shift updates (i.e., A = 1).
We shall denote the Markov states of the simulation using indices α, β . . . and Slater determinants using indices i, j, . . . . The stochastic matrix, Γ, consists of elements Γ α, β which gives the probability that the system transitions from state α to state β in one step in the Markov chain and is in general not symmetric.
We can infer some properties of the eigenvectors and eigenvalues of the stochastic matrix as Γ is non-negative and  β Γ α β = 1 as a FCIQMC calculation must transition from one state to another or remain in the same state. From this, there must exist one or more left eigenvectors, γ α , satisfying where γ α gives the probability that the Markov chain will be in state α if the chain is in equilibrium. The Perron-Frobenius theorem proves that the Γ must have one or more such Perron-Frobenius eigenvectors with a unit eigenvalue and all other eigenvalues must be smaller. The Perron-Frobenius eigenvector is unique, and the chain will converge towards this distribution if (i) all the states are aperiodic, i.e., Γ N α β > 0 for all values of large N; (ii) every state can be reached from every other state. For function f (α) defined for all possible Markov states, its expectation value is The Perron-Frobenius eigenvector specifies the distribution of an ensemble of independent Markov chains taking a single step, and by computing it, we may find expectation values of interest in this system.

B. The FCIQMC Markov chain
A state α in FCIQMC is represented by the signed number of psips on each determinant (n a , n b , . . .), The shift S is not an independent variable as it is simply a function of the total number of psips. The FCIQMC chain is in an absorbing state (i.e., the probability of leaving the state is zero) when there are no psips on any of the determinants, as all events which change the psip population require an existing non-zero population. As the shift is initially held at a constant  15 The bias in both estimates of the correlation energy decays with the inverse of the average number of psips. Linear fits were performed with numpy. 24 Errors were weighted in the fits for the FCIQMC data using the sum of the variance of 1/⟨N ⟩ and variance of the energy estimator. The state with the smallest 1/⟨N ⟩ was removed from the fit as the energy is too large to fit a linear slope. This is caused by the stochastic matrix being truncated at the state with 150 psips on both determinants; states after this truncation have become important at this point.
value during the equilibration phase, the Markov chain changes after the shift is allowed to vary. The estimators of interest in FCIQMC are the shift and the numerator and denominator of the projected energy. Defining The projected energy can hence be evaluated using ⟨E Proj ⟩ = ⟨E Numer ⟩/⟨N Denom ⟩.
Computing the stochastic matrix for an arbitrarily large system of determinants is computationally infeasible as the space scales as the power of the number of determinants, so we restrict ourselves to the simplest possible (interesting) system: two determinants, a and b. In this case the change on one determinant is independent of the change on the other, the elements of the stochastic matrix element are given by where p cn a , n ′ a is the probability that the population on a changes from n a to n ′ a (Appendix). We have constructed Γ α, β for some simple systems and determined (by direct 21 or iterative 22 diagonalisation) the stationary distributions γ α corresponding to the Perron-Frobenius eigenvector. The supplemental material 23 contains further examples of using the stationary distribution to examine the behaviour of the ensemble of FCIQMC states.

IV. POPULATION CONTROL BIAS
In order to achieve a finite population in a simulation, we must resort to population control by introducing a shift which itself is dependent upon the current total population. However, this process introduces a feedback into the propagator and hence a systematic bias. 25 Random fluctuations causing the population to increase (i.e., the psip distribution enters a low energy region of phase space) or decrease (higher energy region of phase space) are moderated by a corresponding decrease or increase in the shift. Both actions lead to an increase in the time-averaged energy estimators.
In DMC, this bias is known 25,26 to scale as ⟨N⟩ −1 , though we know of no previous investigations of population control bias in FCIQMC. We suspect that the effect has most likely been obscured by the stochastic error in all previous FCIQMC studies. With the aid of exact energy estimators from the transition matrix, we are now able to investigate the magnitude of any bias present. We feel it is important to understand where population control bias is likely to cause a problem if a small stochastic error is desired. In addition to the energy estimators from the transition matrix, we shall also investigate them from single chains via blocking analyses of single FCIQMC calculations to compare both methods and use them to quantify the factors controlling population control bias.

A. H 2 in a STO-3G basis set
For different values of N s , transition matrix and single chain calculations were performed on H 2 in a STO-3G basis at the equilibrium geometry of 0.7122 Å, 27 and the energy estimators evaluated. Fig. 1 shows the bias of the projected energy and shift estimators decreases with 1/⟨N⟩. Though the singlechain calculations have relatively large stochastic errors, a similar bias in the energy and decay is also notable, and there is good agreement between the single chain and transition matrix results. The fits for the transition matrix calculations, however, do not exactly intercept the y-axis at the correlation energy. The worst extrapolation is for the projected energy, which The prefactor in the 1/N scaling of the bias in the projected energy is affected by ξ, and damping less hard, i.e., decreasing ξ, reduces the prefactor (Fig. 2). Population control bias also appears to be made worse in strongly correlated systems. Fig. 2 shows population control bias as a function of ξ for both H 2 in a STO-3G basis set at bond lengths of 0.7122 Å and 1.4244 Å. 28 We may explain this by reviewing DMC, where, in the limit of a perfect trial function, there are no branching processes and thus no population control bias. Equivalently in FCIQMC, if there is no spawning there can be no population control bias. Although true only in the limit in which the Hilbert space is the set of eigenvectors of the Hamiltonian, this indicates that in the weakly correlated limit, there will be less population control bias, exactly as we observe.

B. The neon atom in a cc-pVDZ basis
Population control bias is also potentially a significant source of a systematic error in systems which are large enough not to be trivially soluble (rendering transition matrix calculations computationally infeasible). We now turn to the neon atom in a cc-pVDZ basis 29,30 which has a Hilbert space of 50 000 determinants. This is small enough such that it is straightforward to compute the FCI energy via iterative diagonalisation but large enough such that most determinants have a small contribution to the wavefunction. It was necessary to oversample the Hilbert space in H 2 (i.e., more psips than the number of determinants), whereas FCIQMC calculations in this neon system are stable with a significant undersampling of the space. We shall investigate the effect of population control bias in this regime.
Changing the population control parameters affects both estimators of the correlation energy in the same way as H 2 . Fig. 3(a) shows the projected energy decaying towards the FCI energy as ξ decreases until the estimator of the energy becomes within error bars. Fig. 3(b) shows the bias in the projected energy decaying as 1/⟨N⟩. This intercepts the yaxis at −0.192 106 6(12) E h which is within errors of the FCI energy of −0.192 105 578 E h ; suggesting, we converge to the exact ground state as expected. The population control bias is however significant and with about 10 000 psips is about 20 µE h .
As population control bias is clearly a problem, we sought to find a simple indicator of its magnitude in a calculation. An investigation into the relationship between the variance of the shift and the magnitude of the population control bias did not yield any simple relationship. With hindsight, a consideration of the causal relationship between the two makes it likely that not only the extent but also the speed of variation of the shift is important. As such values are considerably more difficult to calculate, we will leave investigation of this connection to a future publication.
Instead, we have adopted a method used in DMC. The population bias can be both quantified and reduced by a reweighting technique based upon the history of the shift. 25 The contribution at a given time, τ, to the numerator and denominator of the projected energy is weighted by taking into account the shift of the preceding W iterations. For S m , denoting the shift m iterations previously, the weight is given by The reweighting is implemented as a post-processing step on the output of a calculation. The population control bias is effectively removed for sufficiently large W at the cost of increasing the stochastic error and, as can be seen in Fig. 4, the residual bias is of the order of the stochastic error bars. 31 The value of W (≈ 250) required for this procedure to converge is of the order of the serial correlation length. We

V. DISCUSSION
To summarise: we have demonstrated that FCIQMC is an example of Markov chain Monte Carlo and computed the stochastic matrix for a two determinant system. Even though a two determinant system is the simplest non-trivial system, it still contains some of the inherent features of FCIQMC including population control bias. A two determinant system cannot have a sign problem unless the timestep is greater than the critical point. It would be interesting to extend these ideas to investigate the sign problem using a three determinant system, though the stochastic matrix may be inaccessible due to its scaling with the size of the Hilbert space.
Recently, Petruzielo et al. proposed to use floating point numbers to represent the population of psips on a determinant. 12 This adaption results in an uncountably infinite state space of the Markov chain. They also proposed to partition the determinant space into deterministic and stochastic subspaces, where the action of the Hamiltonian in the deterministic subspaces is applied exactly using sparse matrix multiplication, and the action in the stochastic subspaces is sampled in the same way as in FCIQMC. Using floating point numbers as walker weights as well as a multideterminantal trial function significantly reduces the prefactor of the 1/⟨N⟩ scaling in population control bias.
The population control algorithm in DMC, as recommended in Ref. 25, is slightly different from that used in FCIQMC: the shift is updated from the "best current estimate" of the energy rather than from the previous value of the shift. Using this population control algorithm would render FCIQMC non-Markovian. Nonetheless, we could use the stochastic matrix technique presented here to calculate the probability distribution of the shift in the limit of convergence of the projected energy. It would be interesting to investigate if this is a better method of population control for FCIQMC.
Using the population control approach given in Ref. 5 (i.e., using Eq. (3) with γ set in the region 0.01 to 0.05) may introduce population control bias, due to the factor of 1 δτ , if the timestep needed to converge a calculation needs to be small. This means population control bias is likely to be more of a problem for calculations which require smaller timesteps, such as strongly correlated systems, or calculations using coupled cluster Monte Carlo. 11 We also note that converging FCIQMC calculations to µE h accuracy has previously been attempted. 32 In this regime, the population control bias could potentially become similar in magnitude to the stochastic error.
We recommend that one reweights the projected energy estimator, as suggested in Ref. 25, as it does not involve multiple expensive runs. If a large enough population is used, the resultant estimate of the energy would be unbiased albeit with a larger stochastic error (though this has appeared negligibly larger in our tests). Alternatively, one should use a large population of psips and set ξ to be as small as possible, such that the number of psips does not drop below the system-dependent critical population. Doubling the number of psips in a simulation increases the equilibration time and possibly also the memory requirements. It is also important to perform enough steps to get an accurate estimate of the error. In choosing an appropriate value of ξ, there is a compromise to be made; it is tempting to increase ξ because it reduces the fluctuations in the total number of psips and, for larger systems, this can reduce the maximum amount of memory used during the calculation. However, too large a ξ will cause population control bias to become significant.
In conclusion, we caution users of FCIQMC and related methods to be aware that population control can introduce a significant bias in calculated energies. We recommend that post-processing reweighting is used to quantify its magnitude and the psip population and damping parameters be modified as suggested in this paper if needed.

ACKNOWLEDGMENTS
The authors thank C. J. Umrigar for several enlightening discussions, facilitated by the unique setting of The Towler Institute. Calculations were performed using the Imperial College High Performance Computing Service 33 and figures were plotted using matplotlib. 34

APPENDIX: TRANSITION PROBABILITIES FOR A TWO DETERMINANT SYSTEM
Consider two determinants, a and b, with states α and β representing two states with a given signed number of psips on each determinant, α = (n a , n b ), β = (n ′ a , n ′ b ).

(A1)
Each psip independently attempts to spawn and die every timestep. The probability that n psips succeed out of N attempts is given by the probability mass function of the binomial distribution, B(n, N, p) = ( N n ) p n (1 − p) N −n , where p is the probability of one psip spawning or dying independently and can be obtained from the FCIQMC algorithm. 5 The change on determinant a is 35 n ′ a − n a = −sgn(H ba )sgn(n b )n sa − sgn(H aa − S)sgn(n a )n da , where n sa (n da ) is the number of psips spawning onto (dying on) a. As the spawning and death events on each determinant are independent, the probability p c n a , n ′ a that the number of psips on a changes from n a to n ′ a via any possible combination of spawning and death is given by p c n a , n ′ a =  n sa B(n sa , n b , P s (a|b))B(n da , n a , P d (a)), where P s (a|b) is the probability for spawning to a from b and P d (a) is the probability of death of a.