Automatic Differentiation for Orbital-Free Density Functional Theory

Differentiable programming has facilitated numerous methodological advances in scientific computing. Physics engines supporting automatic differentiation have simpler code, accelerating the development process and reducing the maintenance burden. Furthermore, fully-differentiable simulation tools enable direct evaluation of challenging derivatives - including those directly related to properties measurable by experiment - that are conventionally computed with finite difference methods. Here, we investigate automatic differentiation in the context of orbital-free density functional theory (OFDFT) simulations of materials, introducing PROFESS-AD. Its automatic evaluation of properties derived from first derivatives, including functional potentials, forces, and stresses, facilitates the development and testing of new density functionals, while its direct evaluation of properties requiring higher-order derivatives, such as bulk moduli, elastic constants, and force constants, offers more concise implementations compared to conventional finite difference methods. For these reasons, PROFESS-AD serves as an excellent prototyping tool and provides new opportunities for OFDFT.


I. INTRODUCTION
Automatic differentiation (AD) is a family of techniques for algorithmic computation of the derivatives of a function specified by a computer program. Its utility for the physical sciences was recognized soon after its introduction, 1,2 and it has remained a topic of interest for scientific computing in the decades since. [3][4][5] However, its adoption has accelerated recently in conjunction with trends in machine learning and the addition of AD frameworks to more programming platforms. [6][7][8][9] In the paradigm of differentiable physics, 10 AD aids the modelling of physical systems as varied as fluid dynamics 11 and molecular dynamics. 12,13 In the realm of electronic structure, AD has been used for Hartree-Fock basis set optimization 14 and the computation of molecular properties depending on higher-order derivatives of Hartree-Fock or density functional theory (DFT) energies. 15 It has also been used to compute arbitrary-order exchange-correlation functional derivatives to facilitate time-dependent DFT simulations. 16 Further examples include DFT and quantum chemistry codes with AD capabilities [17][18][19][20] and the "Kohn-Sham regularizer", 21,22 which improves machine-learned exchange-correlation functionals.
Orbital-free DFT (OFDFT) [23][24][25][26] is a promising sub-field that would benefit from a fully differentiable implementation. In contrast with conventional Kohn-Sham DFT (KSDFT), 27 whose computational cost increases cubically with system size for large systems, OFDFT offers (quasi-)linear scaling, allowing for million-atom simulations, [28][29][30][31] as well as much faster simulations of moderate size. The reduced computational complexity is achieved because OFDFT determines the system energy directly from the electron density, bypassing any need for individual orbitals and thereby eliminating costs associated with orbital manipulation. Several relatively mature OFDFT codes are in use based on these princi-ples, including PROFESS, [32][33][34] ATLAS, 35 DFTPy, 31 and CONUNDrum. 36 However, while OFDFT is an exact theory, in practice it is typically less accurate than conventional KS-DFT, suffering from limitations of local pseudopotentials and the noninteracting kinetic energy functional approximations necessary for the orbital-free framework. Overcoming these limitations remains an active area of research.
This manuscript demonstrates how differentiable programming can simplify and enhance OFDFT simulations. We begin in Section II with background on the relevant AD strategies, as well as the general OFDFT formulation. In Section III, we introduce PROFESS-AD, an auto-differentiable OFDFT code. Finally, in Section IV, we present applications demonstrating its utility for OFDFT users and developers.

A. Automatic Differentiation
AD is based on the principle that all numerical computations are inherently compositions of a finite set of elementary operations, each of which has a known derivative. 37 By tracking the elementary parts of a complicated calculation, it becomes possible to efficiently, in an automated fashion, combine elementary derivatives into more challenging derivatives using the chain rule. AD is exact and its speed is comparable to that of manually coded derivatives. 38 In fact, the cost of evaluating a scalar-valued function with backward mode AD is bounded by a small multiple of the cost of the function evaluation itself. 39,40 1. Backward Mode Automatic Differentiation Backward mode AD, as implemented in PyTorch, 8 is used for this work. For illustration of the basic principle, consider evaluating the partial derivatives of y(x 1 , x 2 ) = tanh (ax 1 + x 2 ) e −x 2 . When computing y(x 1 , x 2 ), PyTorch constructs a graph tracking the elementary operations, as depicted in Fig. 1. Then, the derivatives ∂ y/∂ x 1 and ∂ y/∂ x 2 may be accumulated from elementary derivatives, in a manner consistent with the chain rule, during backwards traversal of the graph-see

Differentiable Functionals
AD may also be extended to complicated numerical algorithms. For example, PROFESS-AD makes frequent use of ξ -torch 41 (pronounced "sigh-torch"), a PyTorch-based library providing differentiable functionals, including differentiable initial value problem solvers and symmetric matrix eigensolvers. (Note that this use of "functional" is more generic than that used in the DFT context.) A crucial feature of ξtorch is its use of analytical expressions for the backward pass, which is simpler than direct backpropagation through multi-step algorithms, many of which are iterative. 15 The ξtorch minimization functional is of particular relevance to our work. Here, "minimization functional" refers to an iterative, often gradient-based algorithm for solving problems of the form x min = argmin x∈R m f θ (x) by minimizing the function f θ : R m → R, where θ is a parameter vector. The ideal solution for x min obeys the stationary condition Suppose we have a known scalar function of the optimal argument vector, g(x min ). It will depend on θ implicitly through x min and might additionally have explicit θ dependence. ξtorch facilitates seamless computation of the derivative in a backward mode AD framework by determining the nontrivial ∂ x min /∂ θ k term analytically via which can be derived by differentiating the minimization condition (Eq. 1) with respect to θ k . Equivalently, the computation of ∂ x min /∂ θ k involves solving the matrix equation As ξ -torch uses a differentiable linear equation solver for that purpose, whose backward pass is also differentiable, derivatives of arbitrarily high orders are accessible in principle. However, the accumulation of error and magnification of numerical noise during higher-order derivative computation complicates this task in practice.

B. Orbital-free Density Functional Theory
The foundations of OFDFT are the Hohenberg-Kohn theorems, 42 which demonstrate that the ground state properties of a many-electron system may be found by minimizing an energy functional of the electron density E[n], subject to the constraints that the density is nonnegative and integrates to the expected number of electrons, N e . To enforce nonnegativity and normalization of the density, one may minimize with respect to an unconstrained variable χ(r), where withÑ = χ 2 (r )d 3 r . Effectively, this scheme minimizes E[n [χ]] with respect to χ(r), determining the ground state density, n gs from the optimized χ gs , which satisfies the minimization condition The relationship between δ E/δ χ and the more conventional density functional derivative δ E/δ n is This result is derived in the Supplementary Information. Among other benefits, recasting the constrained density optimization problem as a generic unconstrained minimization problem is useful for compatibility with ξ -torch's differentiable minimizer.
To illustrate OFDFT energy minimizations, it is useful to consider a single, one-dimensional electron in a harmonic oscillator potential. An exact orbital-free description is available for this system because the Schrödinger equation, may be converted into the density functional expression, using the relationship ψ = √ n between the single orbital ψ and the electron density n. The variable κ represents the curvature of the harmonic potential. Fig. 3 depicts the evolution of the electron density and energy for this system over numerous gradient descent steps of the form χ i+1 = χ i −γ δ E/δ χ| χ i from an initialized uniform density until convergence of the energy is achieved. (γ controls the magnitude of the gradient descent steps).
C. Automatic Differentiation and Orbital-free Density Functional Theory

Functional Derivatives
Given a density functional F[n], the functional derivative δ F/δ n(r) is defined such that This quantity is crucial for OFDFT energy minimizations; conventionally, the functional derivative of each term contributing to the total energy functional must be derived and coded by hand. With AD, energy minimizations are achievable immediately-one need only implement the base functional, after which its (often complicated) functional derivative is determined automatically. FIG. 3. Illustration of the OFDFT energy minimization procedure for a single electron in a one-dimensional harmonic oscillator potential with κ = 10 (see Eq. 9). The ground state energy of this system is E 0 = (1/2) √ κ. The total energy corresponding to each density profile is written beside it as a factor of E 0 , highlighting the steady decrease in energy as the system evolves to the expected ground state.

Derivatives of the Ground State Energy
With the unconstrained minimization from Section II B in mind, consider an energy functional E[χ] that depends on an external parameter λ . Possible quantities represented by λ include atom coordinates or lattice vector elements. This section investigates derivatives of the ground state energy, E[χ gs ], with respect to λ . The first such derivative is where the second term vanishes due to Eq. 6. This result is a variant of the well-known Hellmann-Feynman theorem. 43,44 Its practical significance is that we may obtain first derivatives without knowledge of the challenging ∂ χ gs /∂ λ (or equivalently ∂ n gs /∂ λ ) term. Accordingly, standard AD suffices.
For the second-order derivative, we differentiate the first derivative with respect to another parameter ν to obtain Here, one can no longer avoid terms of the form ∂ χ gs /∂ ν. For this reason, such second derivatives are conventionally computed by finite differences (or, in some cases, with the aid of perturbation theory). One straightforward AD strategy (at least conceptually) would be direct backpropagation through the iterations, yielding χ gs . However, the analytical backward pass outlined in Section II A 2 offers a more direct route. To gain more insight into the underlying operations performed, one may differentiate Eq. 6 with respect to ν and rearrange to yield an expression for ∂ χ gs /∂ ν, which may be used to re-express Eq. 12 as To further illustrate these ideas, we return to the model system of a single-electron one-dimensional quantum harmonic oscillator. The parameter κ determines the external potential and consequently affects the ground state energy and density. As special cases of Eqs. 11 and 12, the corresponding derivatives for this system are and As with the general case, the first κ derivative of the ground state energy lacks any dependence on δ χ gs /δ κ while the second κ derivative does depend on this nontrivial term. As highlighted earlier, two distinct AD-based methods may be used for this term, direct backpropagation over the minimization iterations or the analytical approach outlined in Section II A 2. The conceptual differences between the two methods are illustrated in Fig. 4. Unlike direct backpropagation, the direct backward pass of ξ -torch's differentiable minimizer is independent of the minimization iterations and does not require caching of intermediate values from the minimization iterations, making it the more favourable option in most cases.

A. Overview
This section introduces PROFESS-AD, an autodifferentiable OFDFT code built using PyTorch. PROFESS- Illustration of the differences between using direct backpropagation and ξ -torch's differentiable minimizer for computing the quantity δ χ gs /δ κ for the single-electron one-dimensional quantum harmonic oscillator system.
AD was designed as a prototyping tool to facilitate the development and testing of new methods, including new density functionals, with the expectation that features developed on this platform will eventually be implemented in mainstream performance-optimized codes, such as PROFESS. [32][33][34] Several use cases for PROFESS-AD are listed next.
• Rapid prototyping. A key indicator of the quality of a kinetic energy functional is its performance during the minimization procedure that yields the ground state energy and density. Conventional testing requires one to derive and implement the functional derivatives of a candidate functional. Furthermore, tests involving geometry optimizations require stresses to be derived and implemented, which becomes a genuine challenge when a functional is complicated. PROFESS-AD provides these derivatives immediately, accelerating the transition from concept to practical calculations.
• Testing hand-coded derivatives. At times, analytical derivatives are required, whether for implementation in mainstream codes or assessment of theoretical properties. PROFESS-AD's AD tools are useful for verifying the correctness of such derivative expressions and code.
• Training highly-parameterized functionals. A fullydifferentiable OFDFT code facilitates the development of highly parameterized functionals by enabling parameter optimization via gradient-based methods. Only an energy functional expression is required for the fitting procedure. Moreover, for kinetic energy functionals, PROFESS-AD provides functions for computing conventional fitting targets such as the kinetic potential and the implied linear response function associated with a free electron gas.
• Higher-order derivatives. PROFESS-AD streamlines the process of obtaining more complicated derivatives such as bulk moduli, elastic constants and force constants for more comprehensive benchmarking. This feature is especially useful given that external packages are often required for such calculations.

Programming Platform
Use of PyTorch 8 is consistent with the features of Python that make it an attractive rapid-prototyping language, including easy-to-learn syntax, polymorphism, and object-oriented programming support. 45 (The OFDFT code DFTPy 31 was written in pure Python for similar reasons.) An additional advantage is seamless GPU-compatibility, such that PROFESS-AD users can benefit from GPU-accelerated computation.
The choice of PyTorch is also consistent with the growing use of machine-learning techniques for density functional development, including machine-learned kinetic energy functionals for three-dimensional systems. 46-53 PROFESS-AD complements these efforts by simplifying incorporation of such machine-learning models in density functionals.

Energy Functionals
For a given configuration of atoms, the total energy functional is where n is the electron density. T S is the noninteracting kinetic energy functional, U ion-elec is the ion-electron interaction energy, U Hartree is a classical mean-field electron-electron interaction energy, E XC is the exchange-correlation (XC) functional, and U ion-ion is the ion-ion electrostatic energy.

Kinetic Energy
Noninteracting kinetic energy functionals can be broadly categorized as semilocal functionals or nonlocal functionals.
Semilocal functionals involve a single integral of an energy density, which at a given point may depend on the electron density and related quantities like the gradient or Laplacian of the density at that point. Two fundamental semilocal functionals yield the Weizsäcker 54 and Thomas-Fermi 55,56 kinetic energies, with the former exact for single-orbital systems and the latter exact for the homogeneous electron gas. Additional semilocal functionals available at present include the vWGTF, 57 Luo-Karasiev-Trickey, 58 and Pauli-Gaussian 59,60 functionals.
Nonlocal functionals have more complicated functional forms, generally involving at least two integrals. One wellknown class of functionals suitable for nearly-free-electron metals was pioneered by Wang and Teter, which includes the Wang-Teter, 61 Perrot, 62 Smargiassi-Madden, 63

Electrostatic Energies
For the ion-electron term, users can choose between a direct quadratic-scaling implementation of the structure factor or a O(N log N) scaling approximation based on the particle-mesh Ewald (PME) scheme. 28,78,79 The Hartree energy is computed via FFTs. Often, the ion-ion interaction term U ion-ion is computed by Ewald summation; 80-82 however PROFESS-AD utilizes the real-space method of Pickard 83 for its simplicity.

Exchange-Correlation Energy
At present, PROFESS-AD offers local density approximation (LDA) XC functionals based on both the Perdew-Zunger 84 and Perdew-Wang 85 parameterizations of the Ceperley-Alder quantum Monte Carlo data for the homogeneous electron gas. 86 Additionally, the nonempirical Chachiyo LDA correlation, derived from second-order Møller-Plesset perturbation theory, is implemented. 87 At the generalized gradient approximation (GGA) level, the Perdew-Burke-Ernzerhof (PBE) functional is available. 88

OFDFT Scheme
The energy minimization procedure presented in Section II B is compatible with any gradient-based minimization algorithm. PROFESS-AD offers the choice between a modified (PyTorch-based) limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) optimizer 89 and a two-point gradient descent algorithm. 90

IV. FEATURES AND APPLICATIONS
This section presents example applications. All OFDFT calculations were performed with PROFESS-AD using a plane-wave cutoff of 2000 eV for the electron density and bulk-derived local pseudopotentials (LPPs). 91 For comparison, KSDFT calculations were performed with CASTEP, 92 using a plane-wave cutoff of 1000 eV for the wavefunctions and Brillouin zone sampling with Monkhorst-Pack 93 grids having maximum distance between k-points of 0.015 × 2π Å −1 . KSDFT calculations were performed with the same LPPs used for the OFDFT calculations, as well as the C19 set of ultrasoft nonlocal pseudopotentials (NLPPs). The KSDFT-LPP results allow for assessment of the accuracy of the approximate kinetic energy functionals used in OFDFT calculations, while the KSDFT-NLPP calculations provide additional information about the suitability of the LPPs. In all calculations, the PBE 88 XC functional was used.
Our test systems comprise the face-centered cubic (fcc), hexagonal close packed (hcp), body-centered cubic (bcc), simple cubic (sc) and diamond cubic (dc) structures of the elemental metals Li, Mg, and Al, as well as the wurtzite GaAs system.

A. Functional Derivatives for Energy Minimization
The functional derivatives required for energy minimization are determined effortlessly with AD, accelerating the functional development process-one can proceed directly from a new functional form to benchmarking.
To illustrate this feature, we performed standard equationof-state tests with several kinetic energy functionals, including a published functional (the Yuk1 functional 94 ) that has not yet been assessed in literature for its performance in energy minimizations (to the knowledge of the authors). The Yukawa kinetic energy functionals 94,95 involve a nonlocal ingredient based on the Yukawa potential, where k F (r) = [3π 2 n(r)] 1/3 is the spatially varying Fermi wavevector. These functionals were developed to respect exact constraints such as the Lindhard constraint 96 and Pauli positivity 97 as best they can. However, while they were tested on converged Kohn-Sham densities for spherical systems, their performance in energy minimzations and for periodic systems is mostly unknown, having been mentioned as avenues for future work. 94 Our tests consist of equilibrium equation-of-state (EOS) fits for the zero-pressure properties of various simple structures (fcc, hcp, bcc, sc, dc) of Li, Mg, and Al. We employed the Yuk1 functional, 94 as well as the PGSLr 60 and Wang-Teter (WT) 61 functionals for comparison. Energy-volume curves were constructed with 11 evenly spaced points within ±5% of its equilibrium volume, and fit with the Birch-Murnaghan EOS. 98 To compute the Yukawa descriptor, y αβ efficiently, we used a similar interpolation scheme as that used for the Huang-Carter functionals. Table I  It is clear from the tables that the WT functional performs best overall in the description of these simple metals, likely because it obeys the Lindhard constraint exactly while the Yuk1 and PGSLr functionals only obey it approximately. As the Lindhard constraint is based on perturbations on the free electron gas, it is particularly relevant to nearly free electron like metallic systems. Of course, there are other considerations for comparing these functionals; nevertheless these calculations highlight the utility of PROFESS-AD for rapid prototyping. Further work could consider subsequent generations of the Yukawa functionals, such as Yuk3. 94,95 B. First Derivatives of the Ground State Energy PROFESS-AD uses AD to compute properties related to first-order derivatives that would conventionally be implemented manually. Such quantities include the pressure, P = −∂ E gs /∂ Ω for cell colume Ω, the stress, σ i j = (1/Ω)∂ E gs /∂ ε i j | ε i j =0 for strain ε i j , and the force, F κ = −∇ R κ E gs , where R κ is the ionic coordinate of the ion indexed by κ. These quantities are relevant for geometry optimizations, which minimize the ground state energy with respect to the lattice vectors and ionic coordinates. Additionally, PROFESS-AD provides direct access to the the derivatives ∂ E gs /∂ h i j , where h i j are elements of the matrix whose columns are the lattice vectors.
Furthermore, PROFESS-AD facilitates constrained geometry optimizations, where the lattice vectors and ion coordinates depend on a separate set of parameters. This feature is demonstrated by structure optimization of hcp Mg and wurtzite GaAs.
For hcp Mg, the ground state energy is minimized with respect to the c/a ratio and cell volume, with the corresponding derivatives obtained directly with AD. The RPROP algorithm 99 and the convergence condition that the maximum force and stress component must be below 10 −4 eV/Å and 10 −4 eV/Å 3 respectively were used for this example. Fig. 5 depicts the optimization path from the initial geometry G init of volume = 25.50 Å per atom and c/a = 1.42 to the optimized geometry G opt of volume = 23.04 Å per atom and c/a = 1.63. The second example involves wurtzite GaAs, where the energy is minimized with respect to its three degrees of freedom, the volume, the c/a ratio, and an additional fractional coordinate parameter, u. The Huang-Carter functional 71 (with the parameters λ = 0.01177 and β = 0.7143) and revised Huang-Carter functional 72 (with the parameters a = 0.45, b = 0.10 and β = 2/3) were used for this calculation. Forces and stresses were converged to be below 5 × 10 −2 eV/Å and 6 × 10 −4 eV/Å 3 respectively for both OFDFT and KSDFT.
The resulting structural parameters from OFDFT and KSDFT calculations, as well as experiment, are presented in Table II. While reasonable agreement is observed, the errors between the OFDFT, KSDFT-LPP, and KSDFT-NLPP equilibrium volumes highlight the limitations of the kinetic energy functionals and local pseudopotentials for this system.

Phonon Calculations with Ionic Coordinate Derivatives
Within the harmonic approximation, the calculation of phonon properties depends on the second order force con- where u i ( κ) is an atomic displacement, i, j ∈ {x, y, z} represent the Cartesian directions, , are unit cell labels, and κ, κ index the atoms in each unit cell. The force constants are used to construct the dynamical matrix, which is diagonalized to yield a phonon band structure that can be post-processed to estimate thermal properties.
Conventionally, such force constants are computed via finite difference approximations, where forces are obtained from separate calculations for a set of perturbations of the ionic positions. PROFESS-AD provides an exact method for determining force constants, whereby one need not worry about the magnitude of the finite difference step.
As an example, the phonon band structure for bcc Li was generated using force constants computed with both finite differences and AD with the Wang-Teter kinetic functional. 61 These calculations were facilitated by Phonopy, 101 which generated the supercells required for both methods, and the supercells with displacements necessary for the finite difference calculations. For comparison, similar phonon calculations were performed with KSDFT using the nondiagonal supercell method. 102 In all calculations, the phonon band structures were converged to within 5 cm −1 when a 5 × 5 × 5 supercell was used. The phonon band structures obtained from these calculations are presented in Fig. 6.
The agreement between the OFDFT band structures obtained from both finite differences and AD confirms that the two methods yield the same force constants. Furthermore, the agreement between the KSDFT-LPP and OFDFT band structures indicates that the Wang-Teter functional can describe bcc-Li quite well, while deviation from the experimental data likely conveys limitations in the local pseudopotentials or the XC functional used.
FIG. 6. Zero pressure phonon band structure for bcc-Li computed using Kohn-Sham (KS) and orbital-free (OF) DFT. The force constants from the OFDFT calculations were obtained using AD and finite differences (FD). Experimental data from Ref. 103 is presented for reference.

Elastic Properties with Lattice Vector Derivatives
The bulk modulus, K, requires the second volume derivative of the ground state energy, Furthermore, the second-order elastic constants, specifically the Birch coefficients, 98,104 are defined as derivatives of the Cauchy stress σ i j with respect to the infinitesimal Cauchy strains ε k , where the Cauchy stress tensor is This definition of C i jk only displays i ↔ j and k ↔ symmetry but does not possess complete Voigt symmetry since C i jk = C k i j in general. Under conditions of an arbitrary initial isotropic pressure however, the C i jk coefficients do exhibit complete Voigt symmetry, and are the coefficients measured in wave propagation experiments for materials under isotropic pressure. 104 The conversion of the stress and elastic constants from strain derivatives to lattice vector derivatives is presented in the Supplementary Information.
PROFESS-AD provides methods for computing the bulk modulus and elastic constants based on these definitions.
The procedure is as simple as calling system.elastic_constants() as AD provides these quantities directly, in contrast to more complicated multi-step procedures performed ordinarily. 105,106 PROFESS-AD also provides tools for post-processing the elastic constants into other elastic quantities including the Reuss and Voigt bulk modulus and shear modulus, as well as Young's modulus and Poisson's ratio.
As an illustration, we investigate the elastic constants of fcc Al using the Foley-Madden (FM) 66 kinetic energy functional, whose complicated and lengthy functional form has historically rendered its stress contribution intractable. We compare the elastic constants obtained via AD to the more conventional stress-strain method, where the elastic constants are obtained by fitting the stress-strain data to the generalized Hooke's Law, σ i j = C i jk ε k . Specifically, nine stress-strain data points were evaluated (with the central data point corresponding to the unstrained structure), then fit with a quadratic polynomial from which the linear coefficient was extracted as the elastic constant. For comparison, KSDFT calculations were also performed with the stress-strain method.
The zero pressure elastic property predictions are presented in Table III, along with experimental data, while Fig. 7 presents OFDFT and KSDFT data at pressures up to 100 GPa. These results verify that the OFDFT elastic constants obtained with AD and the stress-strain method are practically identical, validating the much simpler AD method. We also note that the OFDFT predictions for fcc Al's elastic constants agree well with the KSDFT predictions. ample is a function for determining the quantity where n 0 is a uniform density, k F = (3π 2 n 0 ) 1/3 is the Fermi wavevector, and η = |q|/2k F is a dimensionless wavevector. The relevance of this quantity derives from a known constraint on the noninteracting kinetic energy functional, namely that G −1 (η) should equal the Lindhard linear response function for a free electron gas, 109 given by These AD-based tools are useful for verifying analytically derived derivatives and/or for debugging hand-coded derivatives. They also simplify development of parameterized functionals via gradient-based optimization, as their outputs are themselves auto-differentiable, allowing them to become training targets. For example, the kinetic potential is an increasingly popular loss function ingredient target for machinelearned functional developers. 52,110 An example based on a mini-problem faced by Imoto et al. 52 in the development of their machine-learned semilocal kinetic energy functional is presented in the Supplementary Information, which uses G −1 (η) as the training target. Furthermore, PROFESS-AD trivially allows for less-traditional fitting targets such as the kinetic stress to be included in the loss function, and the availability of nontrivial ground state density derivatives ∂ n gs /∂ λ , enabled by ξ -torch, opens the door to even more sophisticated training procedures which could include fitting for the ground state densities and other ground state properties after a density optimization.

V. DISCUSSION
The advantages AD provides do not come completely for free; the extra machinery does incur some overhead. However, in simple tests on fcc Al supercells with up to 1000 atoms, we compared the time required for AD-based functional differentiation against that required for manually coded derivatives, observing an additional cost never greater than twenty percent (and typically less).
Future work could improve the performance of PROFESS-AD, making it competitive even for large-scale performanceintensive applications. One straightforward avenue would be just-in-time compilation, but there are algorithmic aspects that could be optimized in tandem. For example, the present implementation of ξ -torch's differentiable minimizer does not utilize the knowledge that certain terms will vanish, such as in Eq. 11.

VI. CONCLUSION
This work introduced AD to the field of OFDFT in the form of the PROFESS-AD code. As illustrated by numerous examples, PROFESS-AD is suitable for practical calculations. In addition, it is especially promising as a prototyping tool for prompt and convenient testing. Additionally, we expect the ability to differentiate with respect to arbitrary external parameters will expand the reach of OFDFT into new application areas. Work of this kind is ongoing. This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the EPSRC (capital grant EP/T022159/1), and DiRAC funding from the STFC (www.dirac.ac.uk).
For the purpose of open access, the corresponding author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

DATA AVAILABILITY STATEMENT
Code for PROFESS-AD can be found in the repository https://github.com/profess-dev/profess-ad. The various examples are reproducible by following the tutorial examples in the repository and in the documentation (link in the repository). As highlighted in the main text, the goal of orbital-free density functional theory software is to compute the ground state energy and density for a given external potential. To ensure proper normalization of the density, it is common to formulate this problem in terms of a Lagrangian where n(r) ≥ 0 and N e is the number of electrons. The ground state density is hence a solution of the Euler-Lagrange equation where µ is the chemical potential. While other methods exist for solving the Euler-Lagrange equation, ? ? ? ? ? we employ direct minimization of the energy functional.

B. Density Optimization with Unconstrained Variables
As explained in the main text, we can recast a constrained density optimization problem as a generic unconstrained minimization problem by minimizing the energy functional with respect to an unconstrained variable χ(r), related to the positive and normalized density by n[χ](r) = Ñ N χ 2 (r), whereÑ = χ 2 (r )d 3 r .
Hence, we obtain δ E δ χ(r ) = d 3 r δ E δ n(r) δ n(r) δ χ(r ) This expression allows for transformations between the more commonly known density functional derivatives δ E/δ n and the functional derivative required for the unconstrained optimization, δ E/δ χ.