Fast Ewald summation for electrostatic potentials with arbitrary periodicity

A unified treatment for fast and spectrally accurate evaluation of electrostatic potentials subject to periodic boundary conditions in any or none of the three spatial dimensions is presented. Ewald decomposition is used to split the problem into a real-space and a Fourier-space part, and the FFT-based Spectral Ewald (SE) method is used to accelerate the computation of the latter. A key component in the unified treatment is an FFT-based solution technique for the free-space Poisson problem in three, two or one dimensions, depending on the number of non-periodic directions. The computational cost is furthermore reduced by employing an adaptive FFT for the doubly and singly periodic cases, allowing for different local upsampling factors. The SE method will always be most efficient for the triply periodic case as the cost of computing FFTs will then be the smallest, whereas the computational cost of the rest of the algorithm is essentially independent of periodicity. We show that the cost of removing periodic boundary conditions from one or two directions out of three will only moderately increase the total runtime. Our comparisons also show that the computational cost of the SE method in the free-space case is around four times that of the triply periodic case. The Gaussian window function previously used in the SE method, is here compared to a piecewise polynomial approximation of the Kaiser-Bessel window function. With a carefully tuned shape parameter that is selected based on an error estimate for this new window function, runtimes for the SE method can be further reduced. Furthermore, we consider different methods for computing the force, and compare the runtime of the SE method with that of the Fast Multipole Method.


I. INTRODUCTION
The task of computing interactions in an N -body problem is the most demanding part of various numerical simulations such as electrostatics in molecular dynamics, gravitational fields in cosmological formation of galaxies, and potentials in Stokes flow simulations. Due to the long-range behavior of the involved kernels, these problems are computationally expensive and therefore, fast and accurate numerical algorithms are required to accelerate simulations. The Ewald technique 1 splits the interactions into a near field (computed in real space) and a far field (computed in Fourier space) contribution. There exist several methods that utilize this decomposition together with the Fast Fourier transform (FFT) in order to accelerate the calculation of the Fourier space sum. [2][3][4][5] These methods belong to a family of Particle-Mesh-Ewald (PME) methods which, applied to a system of N particles, reduce the computational complexity from O(N 2 ) to O(N log(N )) with a prefactor depending on the required accuracy.
The given references are all concerned with the triply periodic case. Different approaches have been suggested to extend to settings with reduced periodicity. One simple idea for the doubly periodic case is to use the triply periodic summation, simply extending the unit cell in the non-periodic direction and thereby creating a gap that separates sheets of charged particles. To increase the accuracy, various methods have been proposed that introduce correction terms to the triply periodic sum. In Arnold et al. 6 , a correction term that allows for highly accurate calculations is derived.
Lekner summation is an alternative to Ewald summation, and the MMM2D method is based on this approach, 7 reducing the computational complexity not to the desired O(N log(N )) but rather O(N 5/3 ). There is also a MMM1D method for singly periodic problems, 8 but with a computational cost of O(N 2 ).
In Ref. 9, O(N log(N )) methods are introduced both for doubly and singly periodic problems. As soon as a non-periodic direction exists, the discrete summation in Fourier space in the Ewald decomposition is substituted with an integral. Evaluating the integrals analytically, the summation in Fourier space involves complementary error functions and Bessel functions for the doubly and singly periodic cases, respectively. The methods introduced in Ref. 9 are based on regularization and periodic extension of such functions to enable the use of FFTs.
In this paper, we present a different approach, that works directly with the numerical discretization of the integrals in Fourier space. The Spectral Ewald method has been developed over the last decade 5,[10][11][12] in order to provide a fast and spectrally accurate approach for evaluating electrostatics problems with different periodicities. The free-space and 1dperiodic versions of the method were developed recently and equipped with a novel technique proposed by Vico et al. which provides a tool for computing volume potentials using FFTs. 13 Together with an adaptive FFT that enables different local upsampling factors, this extension makes it possible to unify the treatment of all modes, as was done in the singly periodic case in Ref. 12. As a result, this case can be evaluated with only a small extra cost as compared to the triply periodic case. 12 In Ref. 11, the free-space version of the SE method is used for accelerating the evaluation of free-space potentials of Stokes flow. It was shown that this method is competitive with the Fast Multipole Method (FMM), which unlike the SE method is most efficient for tackling problems with non-periodic boundary conditions.
In the current work, we extend the recent advances made also to the doubly periodic electrostatic problem, previously considered in Ref. 10, as well as the free-space case, to complete the full range from free space to triply periodic in one unified treatment.
The SE method has so far been using a Gaussian window function to interpolate between point sources and a uniform mesh. In this paper, we replace the Gaussian window by a piecewise polynomial approximation of the Kaiser-Bessel (KB) window function 14 to perform the interpolation. This approximation, inspired by Ref. 15, is accurate enough to retain desired properties of the KB window and is substantially cheaper to evaluate. For both the Gaussian and KB window function, a shape parameter has to be set. We provide an error estimate useful for finding the optimal shape parameter of the KB window function and assess the accuracy of the estimate by means of numerical tests. We show that employing the polynomial approximation of the KB window instead of the Gaussian window, the cost of evaluation using the SE method is reduced significantly. We also provide a systematic approach for selecting parameters based on a given error tolerance, which can be used to automate the parameter selection process.
This paper is organized as follows: In section II, we provide Ewald summation formulas for different types of periodic boundary conditions. In section III, the Spectral Ewald method is constructed for problems with arbitrary periodicity. Section IV introduces different window functions that can be used in PME methods, as well as the piecewise polynomial approxima-tion used in this paper. Truncation and approximation errors together with error estimates are introduced in section V, while section VI is dedicated to selection of parameters. The numerical results in section VII have a threefold focus: (i) comparison of the Gaussian and KB window functions, (ii) computation of forces, which is important in applications, and (iii) a new comparison with the FMM. Finally, conclusions are drawn in section VIII.

II. EWALD SUMMATION
The classical Ewald sum was developed for fast evaluation of potentials in ionic crystals and later the same technique was used for computing long-range interactions in molecular dynamics simulations and potentials in Stokes flow. The resulting formula relies on the Ewald decomposition introduced by Ewald 1 in 1921 for 3d-periodic problems. The Ewald sum in the 2d-periodic case, sometimes referred to as slab/slablike geometry, was derived by Grzybowski et al. 16 using lattice sums. The first derivation of the Ewald sum for the 1d- The prime above the summation symbol denotes that the term with n = m and p = 0 is excluded from the sum. The set P D with D ∈ {0, 1, 2, 3} is defined to impose periodicity, Triply periodic : Doubly periodic : Singly periodic : Free space : The computational domain Ω does not have to be a cube but this assumption simplifies the description and formulation. We also assume that the system is charge-neutral, i.e. n q n = 0. This condition is necessary for the sum to converge in triply, doubly and singly periodic cases 20 ; however, we assume that it also holds for free-space systems. Even for a charge-neutral system, the sum in Eq. (1) is only conditionally convergent in the triply periodic case and therefore the order of summation has to be defined. 21 The classical Ewald summation formula derived in Ref. 1 corresponds to a spherical order of summation and so-called "tin foil" far-field conditions, i.e. a surrounding medium with infinite dielectric constant. 19 The potential (1) is the solution to the Poisson problem with the conditions that ∇ϕ(x) vanishes at infinity in the free (i.e. non-periodic) directions and that where the integral is over R in each free direction and over [0, L) in each periodic direction.
In Eq. (2), ∆ is the Laplace operator and δ is the Dirac delta function. By introducing a screening function γ, f DP is decomposed as where * denotes convolution. Now, ϕ(x m ) can be obtained by solving two Poisson equations with the right-hand sides of f DP,R and f DP,F . The solutions to these two problems are denoted here by ϕ DP,R (x m ) and ϕ DP,F (x m ) respectively. The total solution to the problem in Eq. (2) can then be written as To obtain the classical Ewald sum, the screening function γ, with Fourier transformγ, is selected as Here, the Ewald decomposition parameter ξ > 0 controls how fast the two terms ϕ DP,R (x m ) and ϕ DP,F (x m ) decay, but does not change the final result ϕ(x m ).
The self-contribution term ϕ self m in Eq. (5) is a constant term which is independent of the periodicity. This term is added to the sum in order to exclude the unwanted interaction of charges with themselves which is introduced as a result of the decomposition. The term The real-space sum can be written as where erfc is the complementary error function and and as before D ∈ {0, 1, 2, 3} represents free-space, 1d-, 2d-and 3d-periodic cases. The sum in Eq. (7) decays exponentially fast with |x mnp | and is calculated by introducing a cut-off radius r c > 0 and including only terms s.t. |x mnp | < r c . In practice, a cell list is constructed for each target point x m . Taking into account that the domain is wrapped around periodically in periodic directions, the calculation is restricted to this list.
The term ϕ DP,F (x m ) is smooth, and therefore its Fourier spectrum decays rapidly. This term is treated in Fourier space, and its structure depends heavily on the type of periodicity, i.e. on D. In order to unify the description for different periodicities, we now introduce some non-standard notation. [v, w] = (x, y, z) for the spatial position and k = [k, κ] = (k 1 , k 2 , k 3 ) for the wavenumber vector, where w, κ ∈ R 3−D represent free directions and v ∈ R D , k ∈ K D represent periodic directions with For D = 3, w and κ are not defined and k = k, x = v. For D = 0, v and k are not defined and k = κ, x = w. Also write k := |k| and κ := |κ|. Furthermore, define a functional L that takes a function g : Let f ([v, w]) be a function that is periodic in v and non-periodic in w with Fourier transform given byf Then f andf are related through Using the notation introduced in definition 1, the k-space part of the Ewald sum reads While the compact notation here will help us to unify the treatment of all periodicities, we realize that it may be unfamiliar and therefore write out Eq. (11) explicitly for all periodicities at the beginning of section III, see Eqs. (16)- (19).
For D = 3, the term corresponding to k = 0 in Eq. (11) vanishes under the assumed spherical order of summation, charge neutrality and tin foil conditions. 1 For D = 0, the operator L only includes Fourier integrals defined for all κ ∈ R 3 and the closed form of the integral is nothing but the complement of the real-space sum minus the self term.
For k = 0 these integrals can be evaluated analytically. We have q n e ik·vmn k e kzmn erfc k 2ξ + ξz mn e −kzmn erfc k 2ξ − ξz mn , (12) in which we used the fact that for D = 2, w mn = z mn and for D = 1, v mn = x mn and k = k 1 .
Here, K 0 (·, ·) is the incomplete modified Bessel function of the second kind, defined as Note that, as done in Ref. 22, it is possible to construct a fast method for the sums defined in Eqs. (12) and (13). However, using the integral representation of the sums (11), we are able to construct a fast method that has a minimal deviation from the treatment of the triply periodic SE method 5 while incurring only a small additional cost.
For D ∈ {1, 2} and k = 0, the Fourier integrals in Eq. (11) are singular but have closed- where erf is the error function, E 1 (·) = K 0 (·, 0) and γ = 0.5772156649 . . . is the Euler-Mascheroni constant. The term ϕ 2P,F,k=0 , cf. Eq. (14), is a one-dimensional sum and in Ref. 10 it is computed via Chebyshev interpolation. This approach would be much more expensive if it were to be used for the two-dimensional sum ϕ 1P,F,k=0 . In fact, using Chebyshev interpolation in this case, the cost of the zero-mode term (15) would be comparable to the cost of calculating the rest of the Fourier modes, cf. Eq. (13). Instead, we note that for k = 0, (14) and (15) are solutions to (3−D)-dimensional free-space Poisson problems. Using the idea in Ref. 13, the corresponding forms in Eq. (11) can be replaced by non-singular expressions amenable to numerical integration. This approach has already been used in Ref. 11 and 12 for designing the Spectral Ewald method for the 1d-periodic and free-space cases. In this paper, the same technique is used for the zero mode of the 2d-periodic case as well. In Sec. III A we review the treatment of the k = 0 case in detail.

III. THE SPECTRAL EWALD METHOD
Here, we introduce the unified Spectral Ewald (SE) method, a fast method to accelerate the computation of the Fourier-space part of the electrostatic potential, i.e. Eq. (11). To aid the reader, we first write down Eq. (11) explicitly for the separate cases D = 3, 2, 1, 0, Here, we have taken the liberty to write κ i rather than k i for the wavenumbers in the free directions, to emphasize which directions are free. Note that the kernel (e −|k| 2 /4ξ 2 /|k| 2 )e ik·(xm−xn) is the same for all periodicities; the difference is that wavenumbers are summed in the periodic directions but integrated in the free directions. In the following, we will use the compact form in Eq. (11) to treat all periodicities simultaneously.
However, we first point out some difficulties related to the factor 1/|k| 2 . In the SE method, we will discretize the integrals in Eqs. (17)- (19) using the trapezoidal rule on a uniform grid in k-space, so that we may use the Fast Fourier Transform (FFT) to go between real space and k-space. However, note that when the periodic wavenumber vector k = (k 1 , . . . , k D ) is zero in the D = 1, 2 cases, and always in the D = 0 case, the integrand becomes singular at the point κ = (k D+1 , . . . , k 3 ) = 0. Moreover, when k is non-zero but small in the D = 1, 2 cases, the integrand will vary rapidly when κ is close to zero, and will therefore be hard to resolve. The former problem (singular integrand) is treated by modifying the Green's function 1/|k| 2 as in Ref. 13 to remove the singularity, which we describe in section III A. The latter problem (rapidly varying but non-singular integrand) is treated by adaptive upsampling, described in section III D.
The SE method was first introduced by Lindbo and Tornberg 5 for the 3d-periodic case and was extended by the same authors for the 2d-periodic case 10 . The method follows the same steps as other PME methods, namely: 1. (Gridding) In real space, a uniform grid is introduced and irregular point sources are distributed onto the grid using an interpolating window function. In the SE method, Gaussians have traditionally been used as window functions.

(FFT) An
FFT is applied to compute the Fourier transform of the gridded function.
3. (Scaling) In Fourier space, the result is scaled with a (modified) Green's function.
4. (IFFT) An IFFT is employed to take the result back to real space.
5. (Gathering) Finally, the Fourier-space part of the potential, cf. (11), is evaluated at target points by interpolation with the same window function.
The choice of window function influences the accuracy and runtime of the resulting method.
The main feature of the SE method compared to other PME methods is that the support of the window function can be varied independently of the size of the uniform grid, which allows approximation errors stemming from the window function to be controlled separately from truncation errors (see section V).
In the following, we first introduce the modified Green's functions to treat the case when the integrand is singular (section III A). Then, we give an outline of the SE method and how the window function is introduced in the method (section III B). This is followed by the discretization (section III C), adaptive FFT and upsampling (section III D), precomputation for the free-space case (section III E), and finally a summary of the SE algorithm (section III F) in Algorithm 4.
A. Modified Green's functions for the singular case As noted above, the Fourier integrals in the free directions require special treatment when k = 0 for D = 1, 2, and also when D = 0, since the integrand is singular. In this section, we describe how the singularity can be removed by modifying the Green's function. This is done by considering a free-space Poisson problem. As mentioned in connection to Eqs. Consider the free-space Poisson problem with boundary conditions ϕ(x) → 0 as |x| → ∞. We are interested in the solution ϕ in a box Ω = [0, L) 3 ⊂ R 3 , which corresponds to the box introduced above Eq. (1). The solution to problem (20) can be written in real space as a convolution between the Green's function G(x) and the right hand side f (x), or in Fourier space, as where G(x) = 1/(4π|x|) and G(k) = 1/|k| 2 is the Fourier transform of G. Now assume Noting that x is always contained inΩ in (21), we can, without modifying the value of ϕ(x), replace Then, using the fact that G R (r) is radially symmetric, its Fourier transform can be computed as 13 and in the limit k → 0 we have Note that G R is not singular. The expression here is valid for the case D = 0. Using the same technique, we can derive corresponding Fourier transforms also for the cases D = 1 and D = 2, where the free-space Poisson equations are two-and one-dimensional, respectively.
Dropping the subscript R and recalling that G(k) = 1/|k| 2 in the non-singular case when k = 0, we define modified Green's functions G DP (k) as follows, with R = √ 3 − DL: • For D = 0 (free in all directions), κ = k = (k 1 , k 2 , k 3 ) and k is not defined, and as we showed above.
• For D = 1 (periodic in the x direction and free in the y and z directions), k = k 1 and κ = (k 2 , k 3 ), and 12,13 where J 0 and J 1 are the Bessel functions of the first kind and order 0 and 1, respectively.
• For D = 2 (periodic in the x and y directions and free in the z direction), k = (k 1 , k 2 ) and κ = k 3 , and (see Appendix A) • For D = 3 (periodic in all directions), k = k = (k 1 , k 2 , k 3 ) and κ is not defined, and which is just the standard Green's function modified to take into account that the In the derivation above, we have assumed the right-hand side to be compactly supported in the extended boxΩ. In (4), however, the right-hand side is a superposition of screening functions, i.e. Gaussians, and hence does not have compact support. Nonetheless, in practice, it decays rapidly outside of Ω = [0, L) 3 and the extended domainΩ can be selected such that the magnitude is arbitrarily small (we return to how this is done in section III C).
Having this in mind, we may now replace 1/(k 2 + κ 2 ) by G DP in (11) and define In the triply periodic case, (26) is just another representation of (11), and ϕ 3P, In the cases D = {0, 1, 2}, there is an approximation due to the truncation of the screening functions outsideΩ, and we have ϕ DP,F (x m ) ≈ ϕ DP,F (x m ). This approximation is accurate as long as x m lies in Ω andΩ is large enough, a statement that will be made precise in the following sections.

B. Outline of the method
So far, we have presented a unified formula for the representation of the Fourier space sum with different periodicities, i.e. (26). Later, in section III C, we will introduce a uniform grid and discretize the formulation. Before we do that, we want to give a schematic outline of the SE method, without explicitly introducing the grid. Of particular interest here is the way the window function is introduced into what will become the gridding, scaling and gathering steps, cf. the numbered list at the beginning of section III above. In section IV, we will review four relevant window functions that are used in different Ewald methods.
Let w(x) be a window function with Fourier transform w(k) and consider the trivial We will insert this into the Fourier space sum (26) and use one factor w for the gridding step, the second w for the gathering step, and w −2 in the scaling step. Note that (26) can be rewritten as We start by defining which is the Fourier transform of Here, (·) * denotes that periodicity is implied in the periodic directions. Eq. (30) will become the gridding step in the final method, in which H(x) is evaluated on the uniform grid.
We furthermore define which will become the scaling step. Note that the scaling step will depend on the selected window function. Also recall that G DP here is the modified Green's function defined by Eqs. (25)- (22). Eq. (31) allows us to write (28) as Applying the convolution theorem to (32), we obtain an integral form for the Fourier-space potential evaluated at target point x m , This form, once discretized, will become the gathering step. We will discretize the integral in (33) using the trapezoidal rule. Note that if w is smooth and have compact support, the integral can be computed with spectral accuracy. Furthermore, note that in the case when target points and source points coincide, the window function will be evaluated in the same points in both (30) and (33), so its values can be cached.
In the above procedure, we obtain H from H and H from H via a Fourier transform and inverse Fourier transform, respectively. In the 2d-and 1d-periodic cases these transforms are mixed : a discrete Fourier transform in each periodic direction and an approximation to the continuous Fourier integral in each non-periodic direction. We shall return to this discussion later, in section III D.

C. Discretization
Let us now introduce a uniform Cartesian grid on Ω = [0, L) 3 with M subintervals in each spatial direction (M being an even integer), and define the step size h = L/M . The charges of the point sources are spread to the uniform grid using a suitable window function in the gridding step (cf. (30)). Let the support of the window function have length P h in each spatial direction, where P is a positive even integer such that P ≤ M (i.e. P is the number of grid subintervals within the support of the window).
In the non-periodic directions, the box Ω must be extended to lengthL = L + δ L , as However, the modified Green's functions are oscillatory and thus requires a larger sampling rate thanL/2π to resolve. Furthermore, when k is small but non-zero for D = 1, 2, the Green's function G DP ([k, κ]) = 1/(|k| 2 + |κ| 2 ) varies rapidly for small κ, and thus also needs an increased sampling rate. One option would be to apply a global upsampling factor s g > 1 such that 1/∆κ = s gL /2π for all periodic modes, and the grid would be of size It has been shown that for free-space Poisson problems in d = 3 − D dimensions, s 0 ≥ 1 + √ d is required to account for the oscillatory behaviour of the modified Green's functions and accurately compute the aperiodic convolution. 11 In practice, we select s 0 = 2, 2.5 and 2.8 for D = 2, 1 and 0, respectively.
The following is valid for D = 1, 2 only. To define which non-zero periodic modes to upsample, let n be a positive integer such that n k ∞ := M/2 and define the two sets and The set I contains periodic k-vectors in a box around the zero vector (to be upsampled with factor s), while J contains the vectors outside this box (which will not be upsampled). The selection of s and n is treated later in section VI. The AFT computes the Fourier transform w → κ in the free directions with adaptive upsampling factor     Once again, if n =k ∞ and s = s 0 , all steps can be merged into a three-dimensional inverse FFT followed by truncating the result to M 3 grid points. It is also worth mentioning that for D = 3, 0, the AFT/AIFT is simply a three dimensional FFT/IFFT on a grid of size M 3 and (s 0M ) 3 respectively.

E. Precomputation for the free-space case
In the previous section we discussed how the AFT/AIFT algorithms can be used to accelerate the computations compared to global upsampling, for the D = 1, 2 cases. However, in the free-space case this does not apply since there is no periodic directions (and thus the AFT becomes equivalent to global upsampling). Therefore, in the free-space case, all modes must be upsampled by at least s 0 ≥ 1 + √ 3 ≈ 2.73, applied to the extended grid sizeM = 2 (M + λP )/2 in all three directions. As stated before, upsampling is needed to resolve the oscillatory modified Green's function (22) and to compute an aperiodic convolution. However, the convolution only requires s 0 = 2; the rest is needed to resolve the Green's function. By precomputing an effective Green's function and truncating it in real space, we can thus reduce zero-padding to a factor of 2 in each direction. 11 The precomputation algorithm are given in Algorithm 3. The output G 0P R can be stored and reused in the scaling step. Note that it depends only onL andM , not on the locations or charges of the point sources.

F. Summary of the Spectral Ewald algorithm
In Algorithm 4, we finally present the unified Spectral Ewald method to compute the approximate Fourier-space part of the electrostatic potential, given by (26), with arbitrary periodicity. Note that in the free-space case (D = 0), precomputation of G 0P R is done once according to Algorithm 3; the result is then used in step 3 of Algorithm 4.

IV. WINDOW FUNCTIONS
Here, we review some of the most relevant window functions that appear in electrostatic calculations and specifically in PME methods. We also include a recent window function introduced by Barnett et al. 15 . For a complete survey of the classical window functions, the reader is directed to Ref. 23.
The following window functions are presented in one dimension. In three dimensions, the corresponding window function w(x) can be obtained as a tensor product where w 0 is the one-dimensional window function.
where α > 0 is a shape parameter and w > 0 is the half-width of the window function (w = P h/2 with P as in section III C). (In previous work 5,10-12 , the shape parameter m has been used, related to α through α = m 2 /2.) The shape parameter α controls the truncation level of the Gaussian, described further in section V B. Note that in higher dimensions, the window is truncated outside of a cube.
The Fourier transform of the truncated Gaussian is given by where erf is the error function. In practice, however, this expression is never used; instead, we use the Fourier transform of the untruncated Gaussian, i.e.
which is taken into account in the error analysis (see section V B).
Cardinal B-spline window. Another type of window function that is used in FFTbased methods are cardinal B-splines 4 . The B-spline of order 2 is defined as and for order p > 2 is defined recursively as This window has finite support and is easy and fast to implement. Moreover, its Fourier transform is available analytically. B-splines have polynomial degree of smoothness and consequently, if B-splines of low order are used in an FFT-based method, the FFT grid size must be increased significantly to achieve high accuracy.
Kaiser-Bessel window. The Kaiser-Bessel (KB) window function 14 is defined by where I 0 (·) is the zeroth-order modified Bessel function of the first kind and β > 0 is a shape parameter. The KB window is shown with different β in figure 2. We describe how β should be selected to minimize approximation errors in section V B. The Fourier transform of the window is available in closed form as The KB window is an approximation to a family of prolate-spheroidal wave functions of order zero. It has been shown that the prolate-spheroidal wave functions provide an orthonormal basis which is optimal for the representation of functions whose Fourier transforms are compactly supported 24 . In addition, and to our interest in this paper, they require a significantly smaller width w compared to the Gaussian window to achieve the same target accuracy, thus reducing the computational effort. Potts et al. 25  the KB window function, the resulting algorithm is more accurate than the method using B-splines for homogeneous systems. More recently, Gao et al. 26 used the same window function in their simulation and arrived at the same conclusion for non-homogeneous systems.
The main drawback of the KB window is that it is expensive to compute. We address this by approximating it by a piecewise polynomial, described in section IV A.
Exponential of semicircle window. The "exponential of semicircle" (ES) window was recently introduced by Barnett et al. 15 as an approximation to the KB function that avoids evaluation of Bessel functions. The ES window is defined by This window can achieve nearly as high precision as the KB window with the same width, but is cheaper to evaluate. However, unlike the other window functions introduced in this section, its Fourier transform is not known analytically and thus has to be computed numerically.
Another drawback is that the ES window is in fact not differentiable at the endpoints x = ±w. The main advantage of the ES window is that it is a cheaper-to-evaluate alternative to the KB window. However, using the piecewise polynomial approximation described below, we can evaluate the KB window itself to desired accuracy at low cost. We will therefore have no need to consider the ES window, nor the B-spline window, in the remainder of this paper.

A. Polynomial approximation and the PKB window
The idea is simply to approximate the window function by a piecewise polynomial, with as high accuracy as needed to reach the desired target accuracy in the SE method. The The support of the window function, which is of length 2w = P h (cf. section III C), is divided into P subintervals each of length h. On each subinterval, the window function is interpolated by a polynomial of degree ν (selection of ν is discussed in section VI). The interpolation is performed by first mapping each subinterval [x i , x i +h] to the interval [−1, 1], i.e.
The window function in each subinterval is then sampled in ν + 1 Chebyshev points illustrated in the left part of figure 4, and subsequently the polynomial is fitted to the window function by solving the linear system for the ν +1 coefficients {c j }. The piecewise polynomial is allowed to be discontinuous where two subintervals meet.
When the window function w(x) is to be evaluated in the gridding and gathering steps, the window is centered at one of the point sources and must be evaluated on the uniform grid. Thus, the evaluation points all have the same offset within the subintervals, shown in the right part of figure 4. This fact makes the evaluation efficient, since it means that x in (46) is in fact the same for all subintervals; only the coefficients {c j } differ between subintervals. The polynomial (46) is evaluated using Horner's rule. (In the scaling step, the w −2 factor is evaluated directly using (42), i.e. no polynomial approximation is used.) Since we use the monomial basis {x j } in (46), the matrix of the linear system (47) becomes a (ν + 1) × (ν + 1) Vandermonde matrix evaluated in the Chebyshev points. The condition number of this matrix is approximately 0.6×10 0.38ν , which seems acceptable considering that we will see in section VI that ν = 9 (corresponding to a condition number of 1.6×10 3 ) will be sufficient for full precision in the SE method. In practice, the choice of polynomial basis does not seem to influence the accuracy greatly, as long as Chebyshev sampling points are used with the same offset within its subinterval.
(for example, the Chebyshev basis gives almost the same result as the monomial basis).
However, note that the choice of sampling points is important, and equidistant sampling points would give a significantly more ill-conditioned problem than Chebyshev points.
The PKB window is determined by the number of subintervals P , shape parameter β (both given by the underlying KB window) and degree ν. Examples of PKB windows with different degrees ν are given in figure 5. However, we show in section V B that β can be related to P , and in section VI that ν too can be related to P . Thus, the PKB window will in the end be uniquely determined by P .

A. Truncation error
Truncation errors in Ewald methods are introduced due to the truncation of interactions in the real space sum or truncation of Fourier modes in the k-space sum. As a measure of accuracy, we define the root mean square (rms) error as where ϕ * denotes the exact or well converged potential. The error (48) may be evaluated in the real-space part ϕ DP,R or Fourier-space part ϕ DP,F separately. The magnitude of the truncation errors can be perfectly estimated using error estimates by Kolafa and Perram 27 , which suggest that the error in the real space sum is and the error in the k-space sum is where Q = N n=1 q 2 n andk ∞ = M/2. Therefore, setting an absolute error tolerance ε rms and an Ewald decomposition parameter ξ, the cut-off radius should be selected as and the maximum wavenumber as where W is the Lambert W function (defined as the solution to W (x)e W (x) = x).
Assume that ε rms and ξ are fixed and that the density N/L 3 and charge density Q/L 3 are constant while N increases. Then r c as given by (51) (N log(N )).

B. Approximation error
Approximation errors arise due to (a) the evaluation of (33) with the trapezoidal rule using truncated window functions and (b) approximating Fourier integrals. We have noted earlier that the quadrature error of the Fourier integrals can be controlled by upsampling the grid using upsampling factors s and s 0 and the parameter n. In section VI we will discuss how to select these parameters. In this section, we focus only on approximation errors due to (a).
In Ref. 5, where Gaussians are used as window functions, the authors derive the approximation error estimate where α is the shape parameter of the Gaussian window and P is the size (i.e. number (where m = √ 2α). The constant C G may depend on the solution ϕ DP,F , but not on P or α. In (53), the first term estimates the quadrature error and the second term estimates the window function truncation error. Balancing both terms and using the approximation erfc( √ x) ≈ e −x , one obtains α = (π/2)P c 2 , where c 2 is a heuristically inserted constant which in practice is set slightly below unity (in this paper we use c 2 = 0.91).
Also for the KB window function, the shape parameter β strongly influences the accuracy of the algorithm (but not the runtime for a fixed P ). Selection of the shape parameter for the KB window has been discussed in different contexts for instance in Refs. 14, 25 and 26. There is no universally optimal shape parameter; rather, the optimal choice depends on the context in which the window function is used. Here, we use numerical experiments to determine a near-optimal shape parameter for use in the SE method, in the sense that it nearly minimizes the approximation error for all P . We heuristically find the approximation error estimate for the KB window, where the constant C KB may depend on the solution but not on P or β. Note that this estimate is very similar to the one provided in Eq. (53), and again the first term estimates the quadrature error while the second estimates the window truncation error. Using the approximation erfc( √ x) ≈ e −x , we find that the shape parameter that balances both terms is given by β = √ 2πP ≈ 2.5P . Note that, as for the Gaussian window, the shape parameter is only a function of P .
In figure 6 (left), we plot the rms error in evaluating the triply periodic electrostatic potential (26) as a function of β for P = 6, 10, 14, together with the error estimate (54).
This figure demonstrates that β = 2.5P coincides with the minimum of the error curves for each P . Figure 6 (right) shows the rms error as a function of P for different shape parameters as well as the near-optimal choice β = 2.5P .  Using the optimal shape parameters α = (π/2)P c 2 and β = 2.5P , the error estimates (53) and (54) can be simplified to 2C G e −(π/2)P c and 2C KB e −2.5P , respectively. Thus the approximation error is controlled solely by P . Moreover, in Ref. 12, we showed that C G ≈ Here, we show that this can be refined to C G ≈ B and that We have observed that when P and M are both selected from error estimates, i.e. when 2Be −(π/2)P c ≈ ε rms ≈ (50) for the Gaussian window or 10Be −2.5P ≈ ε rms ≈ (50) for the PKB window, the truncation error and approximation error may pollute each other so that the total error becomes larger than expected. We have heuristically determined that this occurs when P > f w (M/(ξL)), where the function f w depends on the window function and is for the Gaussian window, 0.7x 2 + 0.2x + 1.8 for the PKB window.
(56) This is illustrated in figure 8. In this situation, the error still decreases down to the truncation error level when P is increased, but slower than predicted by the error estimates. We suggest increasing M by 5 % and increasing the window support to P + 4 when P > f w (M/(ξL)), which should be enough to reach the error expected from the estimates.

VI. PARAMETER SELECTION
There are several parameters present already in the standard SE method for the triply periodic case, and we have here added some more parameters related to upsampling in the AFT and the polynomial window function. Here, we provide a systematic approach for selecting these parameters.
Let ξ and an error tolerance ε rms be given. (The parameter ξ is in principle free, and is in practice used to balance the runtime of the real-space and Fourier-space computations.) Select r c andk ∞ using the Kolafa & Perram estimates (51) and (52), respectively. This also sets the grid size in the periodic directions as M = 2k ∞ .
Either the Gaussian window (37) or KB window (41) is selected, the latter evaluated using the PKB window (section IV A). To set P , the error estimates ε rms ≈ 2Be −(π/2)P c and ε rms ≈ 10Be −2.5P are used for the Gaussian and (P)KB window functions, respectively; P is rounded to an even integer. However, to avoid error pollution as explained in section V B, if P > f w (M/(ξL)) where f w is given by (56), then the values of P and M (and thus k ∞ ) determined above are modified by increasing M by 5 % and adding 4 to P (unless M and P were set directly and not through the error estimates). In any case, the window is truncated at w = P h/2 where h = L/M . Based on the discussion in the previous section, the shape parameters are selected as α = (π/2)P c 2 and β = 2.5P for the Gaussian and (P)KB windows, respectively. For the Gaussian window, we use the fast Gaussian gridding approach, see Ref. 5 and references therein. For the PKB window, the polynomial degree ν must be selected. As shown in figure 9, ν = 9 is sufficient to reach the same precision as the KB window for all P , and the rule can be used to select ν.
The remaining parameters only appear when D = 2, 1, 0. In the free directions, the grid is extended toM = 2 (M + λP )/2 and the box is extended toL = hM , as discussed in section III C. The parameter λ depends on the periodicity and window function and is given in table I. The reason that we select λ larger for the (P)KB window compared to the Gaussian window is that the product λP then becomes approximately the same for both windows for a given tolerance, so that the FFT grids (i.e.M andL) and the upsampling parameters below become similar for both windows. Note that λ does not necessarily have to be larger for D = 1, 2 than for D = 0, but increasing λ slightly in the former cases leads to a lower runtime since s and n can be selected smaller, as seen below.
Based on the requirement s 0 ≥ 1 + √ d, where d is the number of free directions, for the upsampling factor in the free-space case and for the zero modes in the 2d-and 1d-periodic cases, we select s 0 equal to 2, 2.5 and 2.8 in the 2d-periodic, 1d-periodic and free-space cases, respectively. Finally, for the 1d-and 2d-periodic cases, we also need to select s and n for the I set of the AFT, cf. section III D. Based on a generalized form of an estimate derived in Ref. 12, we set (see Appendix B) and In practice, s 0 and s are adjusted such that s 0M and sM are both even integers.

VII. NUMERICAL RESULTS
Here, we provide numerical results of three kinds: (i) a comparison between the new PKB window function and the classical Gaussian window in the SE method, (ii) force computation with the SE method, and (iii) a comparison with the FMM. In all cases, we measure the relative rms error, either in the full electrostatic potential (5) or in the Fourierspace part (26). All experiments are performed on a single core of a machine with an Intel Core i7-8700 CPU which runs at 4.6 GHz with 32 GB of memory. Unless otherwise stated, parameters are selected according to section VI.
The unified Spectral Ewald code will soon be available online 28  In figure 10 (left), the relative rms error is plotted as a function of P . The result suggests that n digits of accuracy can be achieved with P ≈ n for the PKB window (and machine precision is reached with P ≈ 14), while the Gaussian window requires P ≈ 1.6n to achieve n digits (and reaches machine precision with P ≈ 24). Thus, we have approximately that The runtime as a function of P is demonstrated in figure 10 (right) for both window functions. The same system as in the left figure is used, and we only include the runtime of the gridding and gathering steps. This figure suggests that the PKB window is slightly faster than the Gaussian window for a given P in most cases. Since the PKB window can also use a smaller P than the Gaussian, it will require less time to reach a given error.
To study the effect of periodicity on the total computational cost of the SE method, let us as a second experiment consider a uniformly distributed system of N = 10 5 particles in a box of size L = 10 (with the same charge distribution as in the previous experiment, i.e. Q = 3.32 × 10 4 ). We select the decomposition parameter ξ = 3 and choose all other parameters from the absolute error tolerance ε rms = 10 −12 , 10 −11 , . . . , 10 −1 , 10 0 as described in section VI.
In figure 11 the Fourier-space runtime, excluding the free-space precomputation, is illustrated curves of the FFT+scaling steps are essentially independent of the window function. (Naturally, this is due to the fact that, as noted in section VI, the size of the FFT grid is kept approximately the same for both window functions.) Moreover, as the number of nonperiodic directions increases, the total number of grid points and consequently the cost of the FFT+scaling steps grows. Nonetheless, the AFT algorithm reduces the runtime significantly compared to full upsampling, such that the 2d-periodic case has a similar (around a factor two larger) computational cost to the 3d-periodic case. In the free-space case, where the AFT algorithm cannot be used, we see the full effect of upsampling in all three nonperiodic directions (although by a factor 2 rather than 2.8 due to the precomputation step).
The right plot in figure 11 demonstrates that the PKB window function reduces the runtime of the gridding+gathering steps by a factor 2-4 in most cases. This is because P can be selected smaller for the PKB window compared to the Gaussian. free-space case where the FFT+scaling steps dominate the cost. Thus, relatively speaking, the total runtime is more affected by periodicity for the PKB window than for the Gaussian window.
In Ref. 12 we showed for the Gaussian window that the total runtime of the 1d-periodic case is only somewhat larger than the 3d-periodic case, due to the employment of AFTs.
The relative difference between the 1d-periodic and 3d-periodic cases is a bit larger for the PKB window, due to the observation in the previous paragraph. With the PKB window, the 1d-periodic case is at most three times more costly than the 3d-periodic case, for very strict error tolerances. In contrast, the runtime of the 2d-periodic case is very close to that of the 3d-periodic case, for both window functions. For the free-space case, the precomputation step reduces the cost of the FFT+scaling steps. Even so, the free-space case is around four times more expensive than the 3d-periodic case for strict tolerances.
The precomputation step for the free-space case takes about as much time as all other steps (gridding+gathering+FFT+scaling) combined. The reason we exclude the precomputation from the total runtime is that, in a time-dependent simulation where the system is simulated for a long time, the precomputation is a one-time cost and will therefore be negligible compared to the other steps of the algorithm which must be carried out in every time step.

B. Force computation
In molecular dynamics simulations, one is usually interested in evaluating the force and energy along with the potential. The Fourier-space part of the energy can be obtained simply by using To compute the Fourier-space force F F (x m ) = −q m ∇ xm ϕ DP,F (x m ), the potential has to be differentiated with respect to x m . This can be done in two ways, both preserving the spectral accuracy.
Algorithm 4 can be used to calculate the force, modifying only step 5 (gathering).
This requires the derivative of the window function, and since the output from (60) is a three-dimensional vector, the integration must be performed three times 12 .
2. Alternatively, the differentiation can be carried out in Fourier space, which amounts to multiplying the scaling step (31) by a factor ik. This modifies the output from step 3 in Algorithm 4 to be a vector, so both step 4 and step 5 must be carried out three times.
While method 2 is very easy to implement, it requires three times as many IFFTs as method 1, and therefore method 2 is generally avoided in molecular dynamics simulations.
Furthermore, method 2 may require M to be increased since the extra factor ik makes the quantity H(k), cf. (31), decay slightly slower in Fourier space. Method 1, on the other hand, requires the gradient of the window function, and since the window function is a tensor product w(x) = w 0 (x)w 0 (y)w 0 (z), this means that w 0 (x) must be available. When the PKB window is used, we do not differentiate the polynomial approximation. Instead, we construct a separate polynomial approximation of w KB (x), cf. (41), using the same procedure described in section IV A and same values for P , β and ν.
Here, we demonstrate both methods to compute the force. We consider a system of N = 10 4 uniformly distributed particles, generated as before (Q = 3.35 × 10 3 ), in a box of size L = 1 with periodicity in all directions, i.e. D = 3. We set ξ = 8 and M = 64, and compute the forces acting on all particles using methods 1 and 2, with the PKB window function. The result is shown in figure 13. The left plot shows that method 2 has the exact same error curve as the potential (since M is here large enough for truncation errors to be negligible in both cases), while the error curve of method 1 is offset by a constant factor.
This reflects the fact that method 1 uses a different window function in the gathering step (namely the derivative of the original window function), which requires a larger P to resolve at the same error level as the original window function. Here, P needs to be increased by 2 in method 1 to reach the same error as method 2. (In principle, it would be enough to increase P in the gathering step, leaving the gridding step unchanged. However, increasing P in both the gridding and gathering steps allows the structure of the scaling step, with the factor w −2 , to be the same, cf. (31).) Despite the fact that P needs to be slightly increased in method 1, it is still faster than method 2 for all tolerances (even more so for less strict tolerances), as shown in the right plot of figure 13. The reason is that method 2 requires three IFFTs rather than one (while both methods require three integrations in the gathering step, unlike the potential computation which only requires one). However, the relative difference in runtime between methods 1 and 2 depends on how dominant the cost of IFFTs are in the total algorithm, which in turn depends on N , M and P (as well as the periodicity).

C. Runtime comparison with FMM3D
In this section, we strive to demonstrate that the SE method is competitive with other state-of-the-art fast summation methods. To that effect, we compare the runtimes of the SE method and the FMM3D library 30 , a recent improved implementation of the Fast Multipole Method (FMM) 31 . While both methods can be applied to problems with different periodicities, the FMM is fastest in the free-space case, whereas the SE method is fastest in the triply periodic case. Therefore, to give each algorithm optimal conditions we compare the SE method applied to a triply-periodic problem with the FMM applied to a free-space problem. The full potential (1) is computed in both cases.
While the FMM is widely used and has been celebrated as one of the top ten algorithms of the 20th century 32 , it is less often used in molecular dynamics simulations e.g. in GROMACS.
The reason is that the FMM gives rise to discontinuities in the potential and force which break the conservation of energy in the system, although this violation of energy conservation is at the selected error level. 33 As shown in Ref. 33, the FMM can be regularized to alleviate this problem, albeit at an extra cost. However, we will not consider this problem here.
It should be noted that the FMM3D library comes in two versions, one "easy to install" version and one high-performance optimized version, said to be up to three times faster than the "easy" version on some CPUs. Since our own Spectral Ewald code is not highly optimized, we here use the "easy" version of FMM3D. Furthermore, all comparisons are performed on a single core, on the same machine as above.
We consider two different types of particle systems, each with N particles in a box of size L = 3. In the first system (left part of figure 14), the particles are uniformly distributed in the box. In the other (right part of figure 14), all particles are concentrated in two corners of the box, thus forming isolated dense clouds surrounded by empty space. Both the SE method and the FMM are expected to be faster for the former, uniform, system than the dense-cloud system, but the FMM is expected to handle the latter system better than the SE method due to adaptivity. In both cases, the charges are uniformly randomly selected from the interval [−1, 1].
We consider both particle systems with N = 10 4 (Q = 3.35 × 10 3 ) and N = 10 5 (Q = 3.32 × 10 4 ) particles. For FMM3D the problem is considered in free-space. For the SE method, the triply-periodic problem is considered, we use the PKB window and select ξ such that the runtimes of the real-space and Fourier-space parts are balanced; for the uniform system we get ξ = 10 for N = 10 4 and ξ = 17 for N = 10 5 , while for the dense-cloud system we get ξ = 11 for N = 10 4 and ξ = 27 for N = 10 5 . For both methods, the absolute error tolerance is varied between 10 −14 and 10 0 . The total runtime versus relative rms error for the full potential is shown in figure 15. For the uniform system, the SE method is slightly faster than FMM3D for all N and tolerances considered here. For the dense-cloud system, the methods are comparable for N = 10 4 , while FMM3D is slightly faster than SE for N = 10 5 , as expected. However, for very loose error tolerances the SE method seems to be faster also in this case.

VIII. CONCLUSIONS
In this work, we have presented a unified approach to compute the Fourier-space part of the Ewald sum for the electrostatic potential, with arbitrary periodicity (triply, doubly, singly periodic as well as free-space). We use the Spectral Ewald method together with the recent idea of Vico et al. 13 to unify the treatment of all Fourier modes and to utilize FFTs to accelerate the calculation. This approach has already been used for the 1d-periodic 12 and free-space cases 11 , but not previously for the 2d-periodic case. In an attempt to compute the zero modes in this case, Lindbo and Tornberg 10 used Chebyshev interpolation and showed that the extra cost of calculating these modes is small compared to the total cost of the SE method. In this paper we unified these modes into the treatment of the other modes and thus completed the framework.
We also extended the idea of the adaptive Fourier transform, first developed in Ref. 12, to the 2d-periodic case. With this, upsampling can be applied to a small fraction of the Fourier modes, thus further reducing the computational cost.
We compared the Gaussian window function, previously used in the SE method, with a piecewise polynomial approximation of the Kaiser-Bessel window function, here referred to as the PKB window. The PKB window accelerates the gridding and gathering steps of the method by reducing the number of points in the support compared to the Gaussian window.
With the PKB window function, the support can be reduced by about 40 % compared to the Gaussian window for the same accuracy. The new window function maintains the spectral accuracy observed with Gaussians, and the method as a whole scales as O(N log(N )) for N sources. Implementing the new window function in the SE method, we compared the resulting method with the existing algorithm that has been developed in a series of papers 5,10-12 .
To compute the optimal shape parameter of the PKB window, we numerically estimated the approximation error due to the truncation of the Kaiser-Bessel window function and the employment of the trapezoidal rule. We then showed that the optimal shape parameter, and therefore the approximation error, can be controlled using a single parameter, namely the number of points in the support of the window function. The same parameter controls the degree of the polynomial approximation.
Our numerical experiments show that the AFT algorithm can reduce the runtime remarkably such that the cost of computing doubly periodic and triply periodic cases are very similar, while the singly periodic case is at most three times more expensive, the most for very strict error tolerances. In order to accurately compute Fourier integrals in the freespace case, the grid has to be upsampled up to two times in each of the three directions.
Our results indicate that, when precomputing Fourier coefficients of the modified Green's function, the free-space case is around four times as expensive as the triply periodic case for strict tolerances. We presented and compared two methods to compute forces using the SE method, and also made a runtime comparison between the SE method and the Fast Multipole Method to indicate that the SE method is competitive.
The unified Spectral Ewald package for solving electrostatic problems with different boundary conditions is accelerated using OpenMP and vector intrinsics and will soon be available online 28 .

ACKNOWLEDGMENTS
We point out that k may be thought of as the periodic directions and κ as the free directions; comparing (B1) with (17), it is clear that this can be directly applied to the doubly periodic case, with k 2 = k 2 1 + k 2 2 , κ = κ 3 , a = 1/4ξ 2 and b = z m − z n . Noting that ∆κ = 2π/(s fL ), cf. section III D, and that |b| ≤ L, we can write wherek = Lk/(2π). Using the approximation cosh(x) ≈ 1 2 e x and assuming that e 2π(L/L)s fk 1, we get For 1 ≤k ≤ n, the value ofH(k, s) is maximal whenk = 1. Demanding thatH(1, s) ≤ εL for some ε > 0, we find that To determine n, we furthermore demand thatH(n + 1, 1) ≤ εL/(n + 1), which yields Noting that ε is a relative error tolerance here, we get an absolute error tolerance ε rms by setting ε = ε rms /B, with B as above Eq. (55). Also noting that L/L = M/M , Eqs. (B7) and (B8) lead to (58) and (59), respectively. While derived for the double periodic case, these estimates are empirically found to work well also in the singly periodic case.