A light weight regularization for wave function parameter gradients in quantum Monte Carlo

The parameter derivative of the expectation value of the energy, $\partial E/\partial p$, is a key ingredient in variational quantum Monte Carlo (VMC) wave function optimization methods. In some cases, a na\"ive Monte Carlo estimate of this derivative suffers from an infinite variance which inhibits the efficiency of optimization methods that rely on a stable estimate of the derivative. In this work, we derive a simple regularization of the na\"ive estimator which is trivial to implement in existing VMC codes, has finite variance, and a negligible bias which can be extrapolated to zero bias with no extra cost. We use this estimator to construct an unbiased, finite variance estimation of $\partial E/\partial p$ for a multi-Slater-Jastrow trial wave function on the LiH molecule. This regularized estimator is a simple and efficient estimator of $\partial E/\partial p$ for VMC optimization techniques.


I. INTRODUCTION
Variational Monte Carlo (VMC) is a powerful technique to study correlated many-electron wave functions in materials. Accurate and stable stochastic estimation of the parameter gradient of the energy expectation value is an integral component of VMC wave function energy optimization techniques [1][2][3][4][5][6]. Within these techniques, the parameters p in a wavefunction Ψ( R, p) are optimized to minimize the energy expectation value E( p) ≡ Ψ( p)|Ĥ|Ψ( p) . This is accomplished through iterative approaches where a stochastic estimate of the gradient ∂E( p)/∂p is used in updating the parameter p at each iteration. Low bias and low variance evaluations of the derivative directly improve the efficiency of stochastic wave function optimization techniques.
The simplest estimator for ∂E/∂p is the derivative of the energy expectation value (Eqn 1 in the methods). This estimator is unbiased and has a zero-variance principle for any wave function parameter p if Ψ( R, p) is an eigenstate ofĤ. However, the variance of this estimator diverges when Ψ is not an exact eigenstate and p is a parameter which affects the nodes of the wave function, as occurs in the case of orbital or determinant coefficients [7,8].
Previous to this work, the infinite variance has been removed in several ways. One method is to sample from distributions other than the trial wave function squared. [7,9,10] This method has a small bias [8] due to nonlinearity in the estimator, which is removed as the sample size increases. Alternatively, one can modify the estimator using an auxillary wave function. [11][12][13] However, both of these approaches require a choice of guiding or auxillary functions to implement the algorithm.
In this paper, we derive and test a simple regularized estimator for ∂E/∂p which has finite variance and can * sapatha2@illinois.edu † lkwagner@illinois.edu be efficiently extrapolated to zero bias. Instead of guiding wave functions, the divergent variance of the naïve Monte Carlo estimator is removed via multiplication by a polynomial function within a distance of the nodes of Ψ( p, R). We show that the bias scales as 3 and the variance scales as 1/ . This estimator requires essentially no modification to simple VMC codes. For a realistic system, we test that the predicted scaling is seen in a real calculation, and show that extrapolation to zero bias is straightforward and requires no extra computation.

II. REGULARIZED ESTIMATOR
The naïve Monte Carlo estimate for ∂E/∂p evaluated on a wave function Ψ( R, p) iŝ where the brackets are Monte Carlo expectation values over M configurations drawn from the distribution |Ψ( p, R)| 2 , and E L = (ĤΨ)/Ψ is the local energy with Hamiltonian operatorĤ. This estimator has zero bias for any wave function Ψ and parameter p, but can acquire a divergent variance. The divergent contribution to the variance arises from the evaluation of the integral when Ψ is not an eigenstate ofĤ and the parameter p affects the nodes of Ψ, such as orbital or determinantal coefficients. In this case, to lowest order in distance | R − N | from a nodal point N of Ψ,ĤΨ and ∂ p Ψ const and Ψ ∇Ψ( N ) · ( R − N ), leading to the integrand in Eqn 2 behaving as 1/| R − N | 2 near the node. Since the integration domain includes the nodal surface of Ψ, the 1/| R − N | 2 behavior of the integrand as | R − N | → 0 leads to a divergent integral.
We obtain a finite variance estimator for ∂E/∂p by regularizing the naïve estimator of Eqn 1 by a function f :θ The quantity | x| is the normal distance between R and nearest nodal point of Ψ, is a parameter which defines how far away from the node the regularization should be carried out, and ∇ is the many-body gradient. Details on the construction of f can be found in the Appendix.
In evaluating the variance of Eqn 3, the divergent integral Eqn 2 is replaced by a regularized integral Unlike the integrand of Eqn 2, the regularized integrand does not diverge near the nodes of Ψ. This is seen by Taylor expanding the integrand to lowest non-vanishing order in | R − N |. Substituting the limiting relationships forĤΨ, ∂ p Ψ and Ψ from above and noting as | R − N | → 0, we find the limiting behavior of the regularized integrand near the nodes of Ψ: The regularized estimator thus removes the divergent contribution to the variance present in Eqn 2 and results in a finite variance estimation of ∂E/∂p for any > 0.
To lowest order in , the variance of Eqn 3 decreases as 1/ . This can be seen by carrying out the integral Eqn 5 as → 0. Since the regularization is only present for | x/ | < 1, the lowest order scaling with appears in the integration domain | x/ | < 1. Within this domain, the limit → 0 is equivalent to | R − N | → 0, and the limiting integrand of Eqn 7 can be used. Making the change of variables R → (l, N ) where l is the normal coordinate to the nodal surface and N is the coordinate of the nodal surface, Eqn 5 to lowest order in is The suppressed variance of the estimate comes at a cost, in the form of cubic bias O( 3 ) to lowest order in . The bias of the regularized estimator is Following the analysis of the previous paragraph, we evaluate the bias to lowest non-vanishing order in by Taylor expanding the integrand in powers of | R − N | and calculating the bias at each order in the expansion. The constant term in the Taylor expansion does not contribute to the bias by parity of the integrand. Further, the choice of coefficients for f in Eq 4 ensures the first order term vanishes, as proven in the appendix. The lowest nonvanishing contribution to the bias is due to term in the Taylor expansion ∝ l 2 , yielding an integral which scales as 3 . This cubic bias can be removed by a zero-bias extrapolation carried out in four steps.

Conduct a standard VMC calculation to collect
Evaluate | x i | from Eqn 4 for each configuration.

Calculate the regularized estimate Eqn 3 for a se-
quence of → 0 on the collected configurations.

Fit a function a 3 + b to the estimates; the intercept b is the zero bias, finite variance estimate for ∂E/∂p.
This procedure can be carried out very efficiently as only a single set of VMC configurations must be drawn for the entire extrapolation, resulting in an inexpensive, finitevariance, zero bias estimate of ∂E/∂p.

III. APPLICATION TO LIH MOLECULE
We verify the predicted mathematical behavior ofθ by computing ∂E/∂p for a determinantal coefficient of an unoptimized multi-Slater Jastrow (MSJ) wave function for the LiH molecule: The determinants |D i and coefficients c i were taken from a full configuration interaction (CI) expansion over the Li 1s, 2s, 2p and H 1s orbitals. The orbitals were constructed from a restricted open-shell Hartree Fock (ROHF) calculation using a correlation consistent quadruple-zeta valence basis set [14]. A 2-body Jastrow factor J( α) of the form in [15] was used with α = 0 except for electron-electron cusp conditions. The ROHF and CI calculations were done using the PySCF package [16] and all QMC calculations were carried out using pyqmc [17].
We begin by verifying that the regularized integral in Eqn 5 has finite value across a node for > 0. To do so, we evaluate the regularized integrand for various along a path which passes through a node N ,  from l = −0.1 to 0.1 Bohr and normalized by a factor |Ψ| 2 dR along that path. The predicted cubic bias is observed for six decades in while the O(1/ ) decrease in variance stands for four decades. The increase in the variance after = 10 −2 occurs due to the breakdown of the assumption Ψ( R) ∇Ψ( N ) · ( R − N ), resulting in a linear increase in variance with . This change is not mirrored in the bias since the sub-leading order scaling with appears when the assumptions ofĤΨ and ∂ p Ψ const break down.
We conclude by carrying out the four step, zero-bias, finite variance extrapolation for ∂E/∂p proposed in the previous section, shown in Figure 3. First, a standard VMC calculation with 200,000 steps and 2,000 configurations per step was carried out. Then, ∂E/∂p was estimated using Eqn 3 for between 1. the blue data points. Since the same configurations were used for each evaluation, the estimates have a strong statistical correlation and the predicted 3 bias is clearly present, shown by the orange fit curve. The intercept of the fit curve is shown by the red star and is the zero-bias estimate of ∂E/∂p. The variance can be deduced by the error bar of the blue data points, in this case 0.025 mHa. Since the bias is zero within statistical errors for small values of , most practical calculations can be carried out for a fixed value of between 10 −5 and 10 −2 Bohr without extrapolation. Thus the regularized estimator is quite robust to .

IV. CONCLUSION
In this work, we derived and tested a simple regularized estimator for ∂E/∂p which has finite variance and can be extrapolated to zero bias. The divergent variance present in the naïve Monte Carlo estimator is suppressed via multiplication by a polynomial function within a distance of the nodes of Ψ( p, R). We prove that the regularized estimator manages a finite variance by incurring a cubic bias which can be efficiently extrapolated to zero bias. The extrapolation is carried out, and a finite variance, zerobias estimate of ∂E/∂p is evaluated for a determinantal coefficient in a trial wave function for the LiH molecule.
The regularized estimator stands as an improvement to popular finite-variance estimation techniques for ∂E/∂p. These popular techniques require complex guiding [7,9,10] or auxiliary wave functions [11][12][13] in order to estimate ∂E/∂p for the trial wave function Ψ( R, p). The regularized estimator does not require these additional, complicated wave functions, leading to a simple algorithm for the finite variance, zero-bias, estimation of ∂E/∂p which requires only the trial wave function Ψ( R, p).