Machine-learning free-energy functionals using density profiles from simulations

The formally exact framework of equilibrium Density Functional Theory (DFT) is capable of simultaneously and consistently describing thermodynamic and structural properties of interacting many-body systems in arbitrary external potentials. In practice, however, DFT hinges on approximate (free-)energy functionals from which density profiles (and hence the thermodynamic potential) follow via an Euler-Lagrange equation. Here we explore a relatively simple Machine Learning (ML) approach to improve the standard mean-field approximation of the excess Helmholtz free energy functional of a 3D Lennard-Jones system at a supercritical temperature. The learning set consists of density profiles from grand-canonical Monte Carlo simulations of this system at varying chemical potentials and external potentials in a planar geometry only. Using the DFT formalism we nevertheless can extract not only very accurate 3D bulk equations of state but also radial distribution functions using the Percus test-particle method. Unfortunately, our ML approach did not provide very reliable Ornstein-Zernike direct correlation functions for small distances.


I. INTRODUCTION
Given the massive present-day availability of computer power and data, the grown general interest in machine learning (ML) should not come as a big surprise.This interest also extends to physics, which shares some common goals with ML such as gathering, organizing, and analyzing data in order to predict and model the behavior of systems with many degrees of freedom.An important difference is that physicists tend to understand, model, and predict the systems of their interest via a stepwise chain of reasoning from cause to effect, whereas ML algorithms tend to "only" directly relate cause to effect without necessarily understanding (in the traditional "human" sense) the steps in between.In other words, ML can often be regarded as a black box that is as incomprehensible as the initial raw data itself.
Here we will also suffer, at least to some extent, from this black-box character of ML applied to a problem in classical Density Functional Theory (DFT) 1,2 , although only in a limited way because we can build on the foundations of physics to exploit, in this case, a few ingredients of the DFT formalism.As we will explain in full detail below, DFT is an exact framework to describe thermodynamic and structural properties of interacting many-body systems, and involves the solution of Euler-Lagrange equations for the equilibrium density profile for a particle system in an external potential.However, DFT hinges for given particle-particle interactions on approximate free-energy density functionals.By comparison with Monte Carlo simulations of density profiles in a learning set of external potentials, a free-energy functional can be constructed during an ML process.The additional physics that can be extracted beyond the learning set not only includes density profiles for external potentials outside the learning set but also (i) thermodynamic bulk quantities (because the minimal value of the functional corresponds to the thermodynamic potential at equilibrium from which for instance the bulk pressure follows) and (ii) the two-body direct correlations (because it is related to the second functional derivative of the functional) from which the radial distribution functions follow.Our work is strongly inspired by recent ML work to construct a classical DFT for the Lennard-Jones (LJ) fluid in one spatial dimension 3,4 , which we here extend to the threedimensional LJ fluid.Similar to Refs. 3 and 4 we use grandcanonical Monte Carlo (MC) simulations at a learning set of chemical potentials and external potentials, however in a planar geometry (that should be distinguished from the strictly 1D geometry).We will show to be able to "learn" a freeenergy functional that predicts density profiles of this system at chemical and external potentials outside the training set, but also to extract several system properties that were not at all present in the data of the training set, or at least not explicitly.In particular, we will show that from a learning set in a planar geometry a machine-learned functional can be constructed that is capable of predicting the 3D mechanical bulk equation of state of the homogeneous fluid (the pressure-densitychemical potential relations), the 3D radially symmetric direct correlation function and the radial distribution function at any density, and (in principle) the prediction of Lennard-Jones density profiles in an arbitrary external potential in 3D.The agreement of these predictions against simulations varies from excellent (equation of state, radial distributions from the Percus test particle method) and very good (density profiles outside learning set) to, admittedly, rather poor (direct correlation).The poor prediction for the latter is probably due to the rather simple form (and in retrospect perhaps an overly-simple form compared to Ref. 4) for the free-energy functional.The main thrust of our findings at this point, therefore, is not the construction of the Lennard-Jones free-energy functional that compares "best" with MC simulations, but rather the notion that free-energy functionals for 3D systems can be constructed from relatively simple geometries (here planar) in the learning set.Extensions to other systems, for instance electrolytes and ionic liquids forming an electric double layer in contact with planar electrodes, could be a next step with actual applications in modelling the osmotic equation of state, the differential capacitance, and the adsorption in porous geometries.
This paper is organised as follows.We start in section II with an extensive introduction into classical DFT -that can easily be skipped by readers familiar with this framework.In section III and IV we discuss the system and the (simulation and machine-learning) methods that we use, and in section V we discuss the resulting kernels, density profiles, equations of state, and pair correlation functions.We end in section VI with a discussion and outlook.

A. Formalism
We consider a classical one-component system of N spherical particles with linear momenta p i and center-of-mass positions r i with i = 1, • • • , N the particle label.The particles interact with each other via an isotropic pair potential φ (r i j ), where r i j = |r i − r j | is the distance between particle i and j.All particles are subject to a static external potential V ext (r i ), such that the Hamiltonian of the system reads where m denotes the mass of the particles.Here we note that Eq. ( 1) can also describe macroscopic bulk systems by considering the external potential to be zero in a box of volume V at a homogeneous density ρ = N/V and temperature T .For these homogeneous systems, typical thermodynamic equilibrium quantities of interest include the caloric and mechanical equations of state u(ρ, T ) and p(ρ, T ) for the internal energy and pressure, respectively.Also structural quantities such as the radial distribution function g(r) (at particle-particle separation r) and the structure factor S(q) (at wavenumber q) are of interest for homogeneous systems 2 .Equilibrium statistical mechanics offers a variety of techniques to calculate (approximations to) these quantities, for instance systematic low-ρ or high-T expansions, integral equations based on the Ornstein-Zernike equation, or computer simulations.However, the situation is more complicated in the case of a nontrivial external potential due to, for instance, the Earth's gravity, an attractive or repulsive substrate, or a porous matrix that may confine the particles of interest.In this case, the system described by Eq. ( 1) becomes heterogeneous in thermodynamic equilibrium, such that the local density ρ(r) varies in space.Consequently, the energy density u and the pressure p become ill-defined (except, of course, within a local density approximation), and the broken translation invariance causes the radial distribution function to be of the form g(r, r ′ ) rather than g(|r − r ′ |).Nevertheless, the formalism of Density Functional Theory (DFT) can provide a consistent picture of the thermodynamic and structural properties of inhomogeneous fluids in an external potential.Although DFT finds its roots in the quantum-many body description of electrons, it has also found many applications in the (essentially) classical context of soft-matter systems to describe molecular liquids, electrolytes, colloidal dispersions, etc. 1,[5][6][7][8][9][10][11][12] .
DFT is essentially a grand-canonical framework in which the temperature T and the chemical potential µ of the particles are fixed to characterise the heat bath and the particle bath with which the system is in thermal and diffusive equilibrium.The corresponding thermodynamic potential is the grand potential Ω 0 defined by , where β −1 = k B T , k B the Boltzmann constant, and h an arbitrary constant with the same dimension as the Planck constant.From Ω 0 essentially all thermodynamic properties would follow, for instance the pressure of the homogeneous system equals −Ω 0 /V and the internal energy is ∂ β Ω 0 /∂ β .Of course, this involves the immense problem of evaluating the 6Ndimensional phase-space integral in the definition of Ω 0 .The key of classical DFT is that it circumvents this highdimensional phase-space integral by a proof 1 of the existence of a grand-potential functional Ω[ρ] of the variational onebody density profile ρ(r), with the properties that (i) the equilibrium density profile ρ 0 (r) minimizes the functional Ω[ρ], and (ii) this minimum equals the equilibrium grand potential Ω 0 .This implies that The problem is thus reduced to finding the functional Ω[ρ], and after that to solve the 3D Euler-Lagrange equation (2), which amounts to a huge reduction of the problem compared to the high-dimensional phase-space integral.
One can also prove rigorously 13 that the grand potential functional Ω[ρ] can always be written as where F [ρ] is the intrinsic Helmholtz free-energy functional that, and this is crucial for our machine-learning approach, only and uniquely depends on the particle-particle interactions (here the pair potential φ (r)) and on the temperature, and not on µ and V ext (r).In other words, the same and unique functional F [ρ] for a given φ (r) applies at any chemical and external potential.That F [ρ] is a Helmholtz free-energy functional follows straightforwardly from the thermodynamic relation Ω 0 = F 0 − µN 0 with N 0 = drρ 0 (r) the equilibrium number of particles and F 0 the equilibrium Helmholtz free energy, which can be decomposed into the sum of the potential energy drρ 0 (r)V ext (r) due to the external field and the remaining intrinsic free energy is not explicitly known in most cases.An exception is the ideal-gas case of φ (r) ≡ 0, where it is possible to construct the intrinsic free-energy functional as F id [ρ] = k B T drρ(r) ln ρ(r)Λ 3 − 1 , with Λ = h/ √ 2πmk B T the thermal wavelength.The common practise in DFT is now to split the intrinsic free energy into the ideal and the excess-over-ideal part, and to find an explicit (usually approximate) expression for F exc [ρ].Once such an explicit expression has been found, we can cast the minimum condition for ρ 0 (r) of Eq. (2) in the explicit form Note that Eq. ( 4) is a self-consistency relation for interacting systems, that usually takes the form of a nonlinear integrodifferential equation that needs to be solved numerically for a given µ and V ext (r) for a system of interest with pair potential φ (r) at temperature T -and hence with a given excess functional F exc [ρ].In relatively simple geometries, for instance with planar or radial symmetry, a numerical solution of Eq. ( 4) can be found at relatively low computational cost by means of e.g. a Picard iteration scheme.
Thus, the remaining problem of DFT lies in constructing an explicit form for F exc [ρ], for which no universal recipe is available -not unlike the case of partition functions of interacting systems.There is, however, one more exact relation that can be and has been exploited, and involves the second functional derivative −β δ 2 F exc [ρ]/δ ρ(r)δ ρ(r ′ ), which equals by definition the Ornstein-Zernike direct correlation function c(r, r ′ ) and is hence directly related to the two-body structure of the system.In particular, in a homogeneous bulk system the direct correlation function is of the form c b (|r−r ′ |) and its Fourier transform ĉb (q) yields the structure factor S(q) = (1 − ρ ĉb (q)) −1 , from which the radial distribution function g(r) follows by an inverse Fourier transformation.
In this manuscript we will focus on a Lennard-Jones fluid.In the DFT treatment, we split the pair potential φ (r) = φ 0 (r) + φ 1 (r) into a steep repulsion φ 0 (r) at short distances and an attractive tail φ 1 (r) of well depth −ε < 0 in accordance with Barker-Henderson theory, as further detailed in section III.On the basis of the vast body of knowledge on the thermodynamics and the two-body structure of the hard-sphere system, extremely accurate approximations have been constructed for its intrinsic excess Helmholtz freeenergy functional F exc HS [ρ], for which we will use the White-Bear mark II 12 version of the fundamental measure theory 6,11 throughout this paper.It is common practise in liquid-state theory to treat the attractions as a perturbation on the hardsphere system, and a popular version results in the Van der Waals like mean-field (MF) approximation The high-temperature limit of Eq. ( 5) returns the hard-sphere limit, but the MF approximation fails to give accurate results for lower temperatures where the attractions play a more prominent role 2,3,14 .We therefore seek improved excess freeenergy functionals in terms of corrections to the mean-field functional of Eq. ( 5) of the quadratic and cubic form where the labels ML2 and ML3 refer to the fact that we will use machine learning (ML) to find the optimal form of the kernels Ω 2 (r) and Ω 3 (r).We note that ML2 reduces to the mean-field form for Ω 2 (r) ≡ 0 and that ML3 reduces to ML2 for Ω 3 (r) ≡ 0. We also emphasize that the ML3 form of the functional is not compatible with the third-order virial-type expansion which would have entailed an additional spatial integration (say over r ′′ ) and a kernel of the triple product form ; finding the optimal kernel f (r) proved to be computationally too demanding and inconvenient at the exploration phase of this project and hence we settled for the simpler form of the cubic term ML3.

B. Planar geometry
Interestingly, we can exploit the fact that the optimal kernels Ω 2 (r) and Ω 3 (r) that we seek must be independent of µ and V ext (r) to determine them in systems with a planar geometry, i.e. systems that have translation invariance in the x-and y-direction with external potentials V ext (z) and density profiles ρ(z) that only depend on the normal coordinate z.One easily checks that the non-HS part of the MF functional of Eq. ( 5) then reduces to where A = dxdy is the (macroscopically large) area of the planar surface and φ 1,z (|z is the laterally-integrated pair potential φ 1 (r).Likewise the non-HS contributions to the functionals of Eq. ( 6) can be cast in the form (10) where the laterally-integrated kernels ω 2 and ω 3 can be writ- ten as Interestingly, Eq. ( 11) can be inverted, such that we find In other words, once we find ω 2 (z) and ω 3 (z) from calcula- tions in planar geometry, we can determine Ω 2 (r) and Ω 3 (r) from Eq. ( 12) such that the direct correlation function follows by taking second functional derivatives of Eq. ( 6).Hence, we have access to thermodynamic as well as structural properties in bulk, in any geometric confinement, in any external potential, at any chemical potential, solely on the basis of input in a planar geometry.

III. SYSTEM
In this paper, we consider a 3D fluid in which the particles interact with a truncated and shifted Lennard-Jones (LJ) interaction given by where ε > 0 denotes the well depth and σ is the LJ particle diameter.The full LJ potential is truncated at r cut = 4σ and shifted upwards by ε cut = 0.98 The splitting of φ LJ (r) into a hard-sphere reference and an attractive tail in the DFT treatment is performed on the basis of the well-known Barker-Henderson theory 15,16 that leads to an effective and temperature-dependent hard-core diameter 0 < d ≤ σ that does not depend on the bulk density as explained in Refs.15 and 17.The resulting expression for φ 1 (r) then reads At the temperature k B T /ε = 2 of our main interest the effective diameter is given by d = 0.9568σ .The external potentials V ext (z) that we consider in this manuscript all mimic a planar slit geometry.The slit is translationally invariant in the lateral x-y plane and is mirrorsymmetric in the midplane z = 0 such that V ext (z) = V ext (−z).We employ a family of external wall-particle potentials that is repulsive and parameterised by where the dimensionless strength s = βV ext (L/2) ≥ 40 characterises the potential at |z| = L/2, w ∈ [0, 1] denotes the width of the central part of the slit where βV ext (z) = 0, and p > 0 the power that characterises the steepness of the potential.Fig. 1 illustrates the external potential for general s, w, and L and for steepness parameters p = 2 (black) and p = 8 (blue).

A. Simulations
To generate the training and validation data sets to "learn" the density functional, we perform grand-canonical Monte FIG. 1.A general visualization of the external potential described in Eq. ( 15).This external potential is applied in the training data and is given by the parameters w, s, p and L. In this figure two values of p are considered, namely p = 2 (black) and p = 8 (blue).
Carlo (MC) simulations of the 3D truncated and shifted Lennard-Jones (LJ) fluid confined between two planar softrepulsive walls described by the external potential V ext (z) of Eq. ( 15).Here, we only consider highly repulsive walls with s ≥ 40 to ensure that the density reduces to essentially zero at |z| = L/2.We measure the equilibrium density profile ρ MC (z) in a cubic simulation box of volume V = L 3 with L = 10σ .We impose periodic boundary conditions in the xand y-directions, and equilibrate the system for at least 10 5 MC cycles before the measurements start.Each MC cycle consists of N trial moves with N denoting the instantaneous number of particles.Each trial move can either displace a particle or insert/remove a particle.The probability of selecting a trial move to displace a particle instead of an insertion/deletion move is set to 90%.The sampling of the density profile ρ MC (z) is performed by dividing the volume in 320 equidistant bins that represent planar slices normal to the zaxis, each of width σ /32 such that the interval z ∈ [−5σ , 5σ ] is fully covered.The density in each bin is measured and stored after every fourth MC cycle.

B. Machine Learning Methods
With an optimization process that uses several techniques from the field of ML, we will construct intrinsic free-energy functionals of the form of Eqs. ( 9) and (10) such that the density profiles ρ ML (z) that follow from this machine-learned functional are an "optimal" approximation to the corresponding MC densities ρ MC (z).We recall that ρ ML (z) is to be de- termined as a solution of the Euler-Lagrange equation (4), not only for Λ = σ at a given temperature, chemical potential, and external potential, but also for a given excess functional In other words, we are interested in optimal kernels ω 2 (z) and ω 3 (z) (where ω 3 (z) ≡ 0 for ML2).
In order to quantify "optimal" we define a so-called loss function L that characterises the difference between ML and MC profiles, and that we will minimize with respect to ω 2 (z) and ω 3 (z).Here we define L = L 1 + L 2 to consist of a dom- inant contribution L 1 and a regularization term L 2 .The dominant loss term is defined by the mean-square error 18 between the MC and ML profiles, where j = 1, • • • , n labels the n = 24 × 8 = 192 combinations of 24 external potentials and the 8 chemical potentials of the training set as identified above.We normalise the difference between the MC and ML profiles by the MC bulk density at the chemical potential µ j of training set j, for which we performed separate bulk simulations.This scaling promotes equal weights to high-and low-density states during the learning process.The regularization term L 2 is independent of the MC and ML profiles and defined by where f (z) is given by It accounts for the constraint that ω 2 (z) and ω 3 (z) must de- cay smoothly to zero for z ≫ σ , where our statistics is poor.Moreover, L 2 also suppresses undue high-wavenumber undulations that tend to develop at |z| < σ .We tune the (positive) regularization parameter λ by trial and error such that it contributes less to the minimization procedure than L 1 , while not being too small to be irrelevant.The minimization of the total loss function L is performed with the stochastic and iterative optimization method Adam 19 with the default input parameters β 1 = 0.9, β 2 = 0.999 and α = 0.001 as proposed by Kingma and Ba in Ref. 19.During each iteration of the minimization process the gradient of the loss function w.r.t. the kernels ω i (z) is required, which are straightforwardly derived for L 2 to be the functional derivatives The functional derivatives of L 1 with respect to ω i (z) for i = 2, 3 is more involved and stems from the dependence of L 1 on the ML density profiles ρ ML j (z) for j = 1, • • • , n, such that the functional chain-rule yields where we replaced the dummy integration variable z of Eq. ( 16) by z ′ .From the Euler-Lagrange equation ( 4) for the DFT equilibrium profiles ρ 0 -which are represented by ρ ML j in Eq. ( 20) -one checks that δ ρ 0 (z ′ )/δ ω i (z) = −ρ 0 (z ′ ) × δ 2 β F exc /δ ρ(z ′ )δ ω i (z).Upon considering ω i and ρ independent variables in Eqs. ( 9) and (10) for the ML2 and ML3 excess functional, respectively, the second (cross) derivative for i = 2 equals ρ 0 (z + z ′ ), and for ML3 and i = 3 it equals 1 3 [ρ 2 0 (z+ z ′ )+ 2ρ 0 (z ′ )ρ 0 (z+ z ′ )].Hence, within this approximation a numerical integration of z ′ suffices to evaluate Eq. (20), and in combination with Eqs.(19) we can numerically calculate δ L /δ ω i (z) for the grid points z of our sys- tem.Thus, we have all ingredients to minimize L by means of Adam 19 .

C. The Training Process
The training process starts with an initial guess for the two kernels, for which we take the MF approximation ω 0 2 (z) = ω 0 3 (z) ≡ 0, where the superscript 0 denotes the 0-th iteration in the training process.Next, we use these kernels to calculate the n density profiles ρ ML j,k (z) for learning sample j = 1, • • • , n and iteration label k = 0 by solving the Euler-Lagrange equation Eq. ( 4) using a Picard iteration scheme with the MC profile ρ MC j (z) as the initial guess.On the basis of Eqs. ( 16)- (20) we can then evaluate L and δ L /δ ω i (z) for i = 1, 2, from which improved kernels ω k i (z) are constructed for k = 1 by employing Adam 19 as we have seen, which will give rise to improved density profiles ρ ML j,1 (z), etc.For k ≥ 2 we take ρ ML j,k−1 (z) as initial guess in the Picarditeration of ρ ML j,k .The iteration process is repeated until the loss function has converged.
Although Adam is already an efficient algorithm for the learning process, its computational cost can be significantly reduced by making use of stochastic optimization.Rather than using all n elements of the training set in every iteration, which involves the addition of all n terms in Eq. ( 20) at every iteration level k, we consider mini batches with only 20 randomly selected elements of the training set during each Picard iteration k.The gradient of the loss function L 1 is computed by only taking into account this mini batch, thus the summation over the n density profiles of Eq. ( 20) changes to a summation over 20 randomly-selected density profiles and the normalisation factor 1/n becomes 1/20.A new mini batch is randomly selected during every iteration in the ML process.

V. RESULTS FOR THE LENNARD-JONES SYSTEM
We perform MC simulations of the LJ system to generate MC density profiles with 24 different external potentials, described in section III, and 8 equi-distant different chemical potentials, β µ ∈ {−3.0, −2.5, • • • , 0.0, 0.5}, for the temperature k B T /ε = 2.We describe the kernels, the resulting density profiles, the mechanical equations of state of the bulk fluid, and some radial distribution functions that follow from the functionals ML2 and ML3.

A. The Kernels
For several choices of the regularisation parameter λ in Eq. ( 17) we determined the kernel ω 2 (z) for ML2 and ω 2 (z) and ω 3 (z) for ML3.Without a significant L 2 contribution, λ ≤ 10 −3 , we found spurious peaks in both ω 2 (z) and ω 3 (z) for |z| > 8σ , i.e. at the largest separations (with the poorest statistics) we considered in the learning set; these spurious peaks disappeared and ω 2 (z) smoothly decayed to zero for λ ≥ 10 −2 , and throughout we settle for λ = 10 −2 as a reasonable compromise between error-correction and minimization of the actual loss function of interest L 1 .In Fig. 3 we present the evolution of the loss functions L 1 (blue) and L 2 (red) during the training process with iteration label k, in (a) for ML2 and in (b) for ML3; the grey curves in (a) and (b) represent a moving average of L 1 over 15 iterations.We observe good convergence after, say k = 5000 iterations, respectively.We note that the minimized loss function L 1 of ML2 is as small as 5 × 10 −4 , and for ML3 it is even about four times smaller.We also note that L 2 < L 1 for ML2, as desired for a regularisation term that is (naively) supposed to be a small correction to the total loss function.However, for ML3 we find L 1 to be so small that it has dropped below the regularization term L 2 , which in retrospect should be seen as a consequence of the good accuracy of the ML3 functional rather than as a problem for the relative magnitude of the two contributions to the loss function.
In Fig. 4(a) we show the MF (scaled) kernel β φ 1,z (z) (dashed blue line) and its ML2 correction β φ 1,z (z) + ω 2 (z) (black solid line), as obtained after 5000 iterations.For all z the ML2 kernel is more negative than the MF kernel, as if there is actually more cohesive energy in the system than predicted by MF.We see that ω 2 (z) develops a peculiar and unexpected small "bump" close to z = 0.For ML3 a similar feature occurs close to z = 0 in both ω 2 (z) and ω 3 (z), as can be seen in Fig. 4(b) where we plot ω 2 (z) (green solid line) and ω 3 (z) (green dotted line) for the ML3 case as obtained af- ter 5000 iterations, together with the ML2 kernel ω 2 (z) (black solid line) for comparison.We see that ω 2 from ML3 is again essentially negative (except for a tiny positive feature at z = 0 and |z| ≃ 2σ ), and contains a "bump" similar to the ML2 case.We also see that ω 3 (z) has a structure that is quite similar to ω 2 (z), however more pronounced with a higher peaks and lower valleys.Below we will investigate the thermodynamic and structural properties that follow from DFT based on these kernels.

B. The Density Profiles
The first test of the quality of the ML functionals is a comparison of their resulting density profiles with the simulated ones from the training set.In Fig. 5 this comparison is illustrated for the external potential parameterised by w = 0.65, p = 2.0, and s = 40 (shown in red) and the four chemical potentials β µ ∈ {−2.5, −1.5, −0.5, 0.5}; for symmetry reasons we only plot the regime 0 < z < L/2, and for comparison we also show the MF profiles.Clearly, the MF predictions are substantially worse than ML2 and ML3, except at the lowest FIG. 5. Density profiles of a (truncated) Lennard-Jones fluid confined in a planar slit characterised by a repulsive external potential given by Eq. ( 15) with parameters w = 0.65, p = 2.0, s = 40 at temperature k B T /ε = 2 and at four chemical potentials β µ ∈ {−2.5, −1.5, −0.5, 0.5} from bottom to top.Symbols stem from the grand-canonical MC simulations, and curves from the MF (blue dashed), ML2 (black solid), and ML3 (green solid) functionals; all four state points are part of the training set.
µ, and ML3 constitutes a small but clear improvement over ML2, especially at the peaks of the profiles at the highest µ.
In fact we can also conclude from Fig. 5 that the main improvement of ML2 and ML3 over MF compared to the simulations concerns the bulk density ρ b that is approached in the center of the slit at z = 0, as will be made more explicit below.
In Fig. 6 we consider a comparison of MC simulations with MF, ML2, and ML3 density profiles in a particular external potential outside the training set, at β µ = −1.The external potential consists of hard walls at z = 0 and z = 20σ , and for z ∈ [0, 20σ ] the potential varies irregularly with wells and barriers between ±k B T as shown by the solid red curve in Fig. 6.We see again that both ML2 and ML3 are largely of comparable quality and substantially more accurate than MF.

C. Mechanical equation of state of the bulk
The (isothermal) mechanical bulk equation of state provides relations between the bulk density ρ b , the pressure p, and the chemical potential µ, satisfying the constraint of the Gibbs-Duhem equation dp = ρ b dµ such that we can equivalently consider ρ b (µ), p(µ), or p(ρ b ).Within DFT the bulk density ρ b (µ) that follows from a particular freeenergy excess functional follows from the solution of the Euler-Lagrange equation (4) for the homogeneous bulk case V ext ≡ 0, which reduces for ML2 and ML3 to a nonlinear algebraic equation with coefficients that depend on dz ω i (z).Hence ρ b (µ) is straightforwardly solved numerically for the the three functionals ML2, ML3, and MF of our interest here.The bulk pressure follows as p(µ) = −Ω[ρ b ]/V , from which p(ρ b ) follows upon inversion of ρ b (µ).For the temperature of interest, k B T /ε = 2, these three representations of the equation of state are shown in Fig. 7 tionals MF (blue dashed line), ML2 (black solid line), and ML3 (green solid line) together with the MC data (purple symbols).The regime of the training set is hatched grey.In the µ-dependent curves of (a) ρ b (µ) and (b) p(µ), we find agreement in the low-density limit β µ < −3, as expected, since all functionals include the ideal-gas limit properly.In the regime of the training set we also see ML2 and ML3 outperforming MF by a large margin in (a) and (b), with a small but hardly noticable improvement of ML3 compared to ML2, as we could have expected on the basis of the density profiles of Fig. 6 and the loss functions of Fig. 3.At the high-µ side outside the training set, Fig. 7(a) shows an increasingly deteriorating quality of the ML2 and especially the ML3 prediction, which are systematically higher than the MC data, although they are still much more accurate than the predictions based on the MF functional.Interestingly, however, the picture that emerges from the p(ρ b ) representation shown in 7(c) is much more forgiving for the MF functional, which is now of comparable good agreement in the complete regime of the training set and deviates as much as ML2 (and even less than ML3) from the MC data.Clearly, this relatively good MF and ML2 performance is due to a fortunate cancellation of errors occurring in the process of eliminating the dependence on the chemical potential.
It is perhaps remarkable that rather accurate bulk equations of state in a complete density interval can be obtained from MC simulations at only a few chemical potentials in only a few external potentials.Here it is crucial to appreciate the DFT formalism, which includes the statement that the intrinsic excess free-energy functional F exc [ρ] that we construct by the ML Ansätze of Eqs. ( 9) and ( 10) is independent of the external and chemical potential, and hence can also be applied at any µ in the homogeneous bulk where V ext ≡ 0.

D. The structure of the bulk fluid
A key feature of DFT is that it provides not only thermodynamic but also structural information, where we have seen that the first functional derivative δ F exc [ρ]/δ ρ(r) plays a key role in the Euler-Lagrange equation ( 4) for the equilibrium one-body distribution function.We have also seen already that the second functional derivative −β δ 2 F exc [ρ]/δ ρ(r)δ ρ(r ′ ) ≡ c(r, r ′ ) equals the direct correlation function and governs the two-body distribution function 1,2 .In a homogeneous and isotropic bulk fluid symmetry dictates that the direct correlation takes the bulk form c b (|r − r ′ |), and the radial distribution function g(r) follows from the Ornstein-Zernike equation g(r) − 1 = c b (r) + ρ b dr ′ (g(r ′ ) − 1)c b (|r − r ′ )|).Since the ML2 and ML3 func- tionals have been fully determined in terms of ω i (z) in planar geometry and its conversion to Ω i (r) according to Eq. ( 12), we can write from Eqs. ( 9) and ( 10) where for c HS (r) we used the White-Bear mark I direct correlation function reported in Ref. 20 (and where Ω 3 ≡ 0 for ML2).Consequently, our ML2 and ML3 functionals (and likewise also the MF functional) give direct access to the twobody structure encoded in c b (r) and g(r).
In Fig. 8 we plot the resulting c b (r) for bulk density ρ b σ 3 = 0.39, and find fairly good agreement between MF, ML2, and ML3, except close to r = 0 where the c b (r) from ML2 and especially ML3 become deeply negative.This can be traced back directly to Fig. 4, which reveals that (i) there is close structural similarity between the functionals outside the hard core, and (ii) that the "bumps" of ω 2 (z) and ω 3 (z) close to z = 0 give rise to a substantial slope dω 3 (z)/dz for z/σ ∈ [0, 0.05] that translates via Eq.( 12) in a relatively large effect in c b (r) in the vicinity of r = 0. Note also that c b (r) vanishes for d < r ≤ σ due to the Barker-Henderson splitting, and that all three versions of c b (r) agree pretty accurately outside the hard core, at least on the scale of the plot.
Upon insertion of c b (r) into the (Fourier transform of the) Ornstein-Zernike equation, we find (after an inverse Fourier transform) the radial distribution functions g(r) that we com-FIG.8.The Ornstein-Zernike direct correlation function c b (r) of the bulk Lennard-Jones system at temperature B T /ε = 2 and bulk density ρ b σ 3 = 0.39, as predicted by the second functional derivative of the excess free-energy functional within the mean-field (MF) approxand its quadratic (ML2) and cubic (ML3) Machine Learning corrections.
pare with canonical MC simulations a density ρ b (at the temperature of interest k B T /ε = 2).The three lines in Fig. 9 show these radial distribution MF, and at bulk density (a) ρ b σ 3 = 0.39 and (b) ρ b σ 3 = 0.837, together with the MC simulation results (sym- bols).For the lower density in (a) and the higher one in (b) we find reasonably good overall agreement outside the hard core (r > with and ML2 actually outperforming ML3 close to contact.Inside the hard core our prediction for the g(r) is poor in all cases, which is not surprising given that the underlying c HS (r) is constructed such as to cause a vanishing g(r) inside the hard core; any tampering of the direct correlation (such as adding terms as we do in Eq. ( 21)) will give rise to spurious nonzero contributions to g(r) for r < d 21 .
Interestingly, DFT provides another procedure to calculate the radial distribution function of a bulk fluid.This so-called "Percus test-particle method" 22 is based on the identification of ρ b g(r) with the equilibrium density profile ρ 0 (r) that sur- rounds a given (test) particle that is fixed in the origin of an otherwise homogeneous fluid at bulk density ρ b ≡ ρ 0 (∞).In other words, g(r) = ρ 0 (r)/ρ 0 (∞) with ρ 0 (r) the spherically symmetric density profile of the fluid in an external potential that equals the pair potential, V ext (r) = φ (r), scaled such that g(∞) = 1.For a given chemical potential µ and a given functional F exc [ρ] one thus obtains g(r) through the solution ρ 0 (r) of the Euler-Lagrange equation (4).For the same two state points as used in Fig. 9 we present the resulting radial distributions in Fig. 10(a) and (b).For both densities the agreement between simulation and all three DFTs is substantially better than obtained from the Ornstein-Zernike route shown in Fig. 9, not only for r < σ where the Boltzmann factor of the external potential in Eq. ( 4) ensures a vanishingly small contribution to g(r) but also at larger distances where the oscillations in the MC data are rather accurately captured by all three DFTs.Interestingly, however, the peaks of the oscillations in (b) are actually better accounted for by MF and ML2 than by ML3, which overestimates them especially at close contact.The relatively good performance of MF in predicting g(r) via the Percus test-particle method compared to its relatively poor prediction of the equation of state ρ b (µ) is due to the scaling-out of ρ b in the density profile ρ 0 (r) such that g(∞) = 1 by construction.Clearly, an overall comparison of Fig. 9 and Fig. 10 shows that g(r) based on the test-particle method is much more accurate compared to the MC simulations than those based on the Ornstein-Zernike equation.

VI. SUMMARY, DISCUSSION, AND OUTLOOK
In this article we combine the formalism of classical density functional theory (DFT) with machine learning (ML) density profiles from Monte Carlo (MC) simulations to construct approximations to the excess Helmholtz functional F exc [ρ] of a (truncated and shifted) Lennard-Jones fluid at the supercritical temperature k B T /ε = 2.This functional consists of a well-known and accurate hard-sphere contribution, a standard Van der Waals type account of the attractions, and new machine-learned corrections that FIG. 10.The radial distribution function g(r) for the Lennard-Jones system as obtained from the Percus test-particle method for a bulk density (a) ρ b σ 3 = 0.39 and (b) ρ b σ 3 = 0.837.System, legends, and the MC data are identical to those in Fig. 9. are, for simplicity, either of a quadratic (ML2) or an additional cubic (ML3) form in the density.The kernels of ML2 and ML3 are radially symmetric and translation-invariant twopoint functions of the form Ω i (|r − r ′ |) for i = 2 and 3, see Eqs. ( 9) and (10).By comparing DFT predictions of the equilibrium density profiles ρ 0 (z) with grand-canonical MC sim- ulations at a learning set of chemical potentials µ and external potentials V ext (z) in a 3D planar geometry, we can construct the optimal planar kernels ω i (|z − z ′ |) using Adam to minimize a suitable loss function, from which we can reconstruct the full radially symmetric kernels Ω i (|r − r ′ |).Given that F exc [ρ] is independent from the external potential and the chemical potential, the functional and its Euler-Lagrange equation ( 4) for ρ 0 (r) can be applied to any µ and any V ext (r).By comparisons with density profiles obtained from grandcanonical MC simulations, for conditions within and outside of the learning set, we find that the ML2 and ML3 functionals generally outperform MF by far because the latter predicts densities that are systematically too low; ML3 improves ML2 somewhat on some of the details at higher µ, at least within the training set.A similar picture emerges from the resulting representations of the mechanical equations of state, viz. the bulk density ρ b (µ) and the pressure p(µ), where MF is too low by a large margin and ML3 performs only slightly better than ML2 within the training set, while showing a slightly poorer performance outside.The functional F exc [ρ] can also be used to calculate the direct pair correlation function, from which the radial distribution g(r) of a bulk fluid follows via the Ornstein-Zernike equation.At the relatively low bulk density ρ b σ 3 = 0.39 this yields, outside the hard core, an (almost) equally satisfying result for MF, ML2, and ML3, see Fig. 9. Inside the hard core, and also close to contact, say σ < r < 1.3σ , the prediction for g(r) is poor in all cases.However, all three functionals give a rather good account of the simulated g(r) at these two state points if the Percus testparticle method is employed, although here ML3 overestimates the peaks at the higher density slightly.The reason for the relatively good MF performance for g(r) compared to the equation of state ρ b (µ) and density profiles stems from the imposed asymptotic normalisation g(r → ∞) = 1.
A disadvantage of the ML approach is its black-box character, and the associated difficulty to interpret the outcome.In particular the "hump" in the ML2 and ML3 kernels ω i (z) close to z = 0 shown in Fig. 4 and the associated deeply negative direct correlation c b (r) close to r = 0 for ML3 in Fig. 8 are actually rather suspicious.In retrospect, we expect these features to be the result of some degree of overfitting the data in the learning process.This is also borne out by closer inspection of the bulk equations of state ρ b and in Fig. 7(a) and (b), respectively, where ML3 hardly improves upon ML2 in the (hatched) regime of the learning set while performing even poorer outside, and likewise for the g(r) of Figs.9(a) and 10(a) at the ρ b σ 3 = 0.39 that lies com- fortably in the middle of the training set.Of course, ML3 does outperform ML2 somewhat for the density profiles of Fig. 5. Nevertheless, some more caution could or should have been exercised in the diversity of the training set of external potentials, perhaps with attractive components and discontinuities.We leave studies along these lines for future work.
Although there is room for improvement and extensions, we have shown here anyway that it is in principle possible to construct a free-energy functional for an atomic fluid by an ML process that takes data from grand-canonical MC simulations at a variety of chemical and external potentials, from which further predictions outside the training set can be made.Interestingly, even data taken in a planar geometry can suffice to construct the full functional, at least for the (relatively simple) functional forms that we considered here which are linear in the kernels Ω i (|r − r ′ |); nonlinear forms probably require a different treatment.It is important to realise that we fixed the temperature, and although F exc [ρ] is independent of µ and V ext (r) it is dependent on T , so strictly speaking a new functional is to be constructed at every temperature of interest.We leave the T -dependence of the functional to future work.We also expect that it is possible to extend studies of this type to other systems with spherically symmetric particles, also to mixtures such as electrolytes.Systems of particles with orientation degrees of freedom are probably challenging in practise because of their larger number of variables, although one could imagine first attempts based on truncated expansions in spherical harmonics or an initial focus on homogeneous bulk states (nematics).We hope that this paper will stimulate further explorations of the combination of DFT, ML, and MC simulation.

FIG. 3 .
FIG. 3. The loss function contributions L 1 (blue) and L 2 (red) as function of the iteration label k, in (a) for ML2 and in (b) for ML3.The grey traces in (a) and (b) represent moving average of L 1 , and the insets only zoom in.
FIG.6.The equilibrium density profile for a LJ fluid at chemical potential β µ = −1 and temperature k B T /ε = 2 in the external potential V ext (z) (red solid line) outside the ML training set.Symbols stem from grand-canonical Monte Carlo (MC) simulations, and lines are DFT predictions based on the mean-field (MF) approximation (blue dashed) and on the quadratic (ML2, black solid) and cubic (ML3, green solid) corrections with machine-learned kernels.

FIG. 7 .
FIG. 7. The relations between (a) the bulk density ρ b and (b) the bulk pressure p as a function of the chemical potential µ for the (truncated) Lennard-Jones fluid at temperature k B T /ε = 2 as obtained from grand-canonical Monte Carlo simulations (MC, symbols), the machine learning functionals ML2 (black) and ML3 (green), and the mean-field function (MF, blue dashed).In (c) the corresponding pressure-density relation p(ρ b ) are shown, obtained by elimination of µ from ρ b (µ) of (a) and p(µ) of (b).

FIG. 9 .
FIG. 9.The radial distribution function g(r) of a truncated Lennard-Jones fluid at bulk density (a) ρ b σ 3 = 0.39 and (b) ρ b σ 3 = 0.837, as obtained from the Ornstein-Zernike equation with a direct correlation function c b (r) that follows from the free-energy functionals ML2 (black), ML3 (green), and MF (blue dashed).The symbols denote g(r) as obtained from canonical Monte Carlo simulations at the same bulk density and temperature.