Nucleation barrier reconstruction via the seeding method in a lattice model with competing nucleation pathways

We study a three-species analogue of the Potts lattice gas model of nucleation from solution in a regime where partially disordered solute is a viable thermodynamic phase. Using a multicanonical sampling protocol, we compute phase diagrams for the system, from which we determine a parameter regime where the partially disordered phase is metastable almost everywhere in the temperature-fugacity plane. The resulting model shows non-trivial nucleation and growth behaviour, which we examine via multidimensional free energy calculations. We consider the applicability of the model in capturing the multi-stage nucleation mechanisms of polymorphic biominerals (e.g., CaCO3). We then quantitatively explore the kinetics of nucleation in our model using the increasingly popular "seeding" method. We compare the resulting free energy barrier heights to those obtained via explicit free energy calculations over a wide range of temperatures and fugacities, carefully considering the propagation of statistical error. We find that the ability of the "seeding" method to reproduce accurate free energy barriers is dependent on the degree of supersaturation, and severely limited by the use of a nucleation driving force Δμ computed for bulk phases. We discuss possible reasons for this in terms of underlying kinetic assumptions, and those of classical nucleation theory.


I. INTRODUCTION
2][3] To date, very few studies have computed quantitative absolute nucleation rates from such simulations. 4 particularly challenging example where nucleation kinetics remain well beyond the reach of atomistic simulation is biomineralisation.Here nucleation and growth, mediated by organic components, appears to progress via a multistage process with a high degree polymorph control. 5This serves as a natural example of self-assembly -a phenomenon of great technological interest. 68][9] Recent in situ transition electron microscopy (TEM) experiments 10 have provided a striking visualisation of crystallisation of calcite -the most stable polymorph of CaCO 3 at ambient conditions -via vaterite -a metastable polymorph whose molecular structure is characterised by a degree of structural disorder. 113][14] However, the expense of detailed atomistic models and the low solubility of CaCO 3 prohibit modelling of the entire multi-stage process.More quantitative studies of multi-stage processes [15][16][17] are possible using simple lattice models.In common with biomineral nucleation, these show existence of amorphous precursors to the assembly of anisotropic particles.To our knowledge, however, pathways proceeding via partially disordered phases or featuring the dissolution-regrowth mechanism 10 have not been previously captured.
Lattice models remain a useful tool in nucleation theory, yielding insights into nonclassical phenomena, [15][16][17] multistep pathways, 18,19 heterogeneous nucleation, 20,21 and limitations of calculation methodology. 22,23n the following contribution, we present a lattice model of nucleation from solution, where the solute may form disordered, ordered, or semi-ordered solids.We map the phase diagram of the system and show that, in the limit of slow growth, the transition from solvent rich to ordered crystalline state proceeds via the two metastable solute phases.The temperature dependence of the barrier to nucleation of these three phases indicates the existence of a parameter regime where heights of these barriers become comparable.However, we also find that the barriers to solid state transformation between the three solute phases, once nucleated, are too low to allow our model to capture the dissolution and regrowth pathway to crystallisation.
We then study the kinetics of nucleation in our model using two distinct choices of microscopic dynamics over a broad range of parameters.We use the increasingly popular "seeding" approach, 4,[24][25][26] which uses information extracted from simulations to parametrise a classical nucleation theory (CNT) expression for the nucleation rate.We consider the propagation of statistical error and critically examine the approximations made in this method.We show that the reconstructed nucleation free energy barrier is limited by the use of a CNT driving force parameter ∆µ calculated for bulk phases rather than finite-size nuclei.As a result, barriers can be inconsistent with those obtained via explicit free energy calculation.
The definition of the lattice model is given in Sec.II.In Sec.III we specify the details of the free energy calculations presented in Sec.IV, where we examine the phase behaviour and the possible nucleation pathways in the model.In Sec.V we move on to the discussion of kinetics of nucleation under the different choices of microscopic dynamics (Sec.V A), developing an error estimation approach for the "seeding" method in Sec.V B and discussing generation of appropriately structured seeds in Sec.V C, before presenting the obtained by the method results in Secs.V D and V E. Finally, in Sec.VI we summarise and discuss the results presented in Sections IV and V.

II. MODEL
Our model is an extension of that introduced by Duff and Peters. 16,27We study a three component system of anisotropic particles on a cubic lattice, where each particle i is characterised by species s i ∈ {1, 2, 3} and orientation q i ∈ {1, . . ., 24}.We label species 1 as solvent and species 2 and 3 as solute.The nearest neighbouring particles (i, j) interact isotropically with strength K(s i , s j ), while diagonally neighbouring particles (k, l) interact anisotropically with strength A(s k , s l ), giving us the following lattice energy E: where the first and second summations are performed, respectively, over the unique nearest and diagonal neighbour pairs, δ(., .) is the Kronecker delta function, and the lattice structure is primitive cubic with 6 nearest and 12 diagonal neighbours.We sample lattice configurations from the semigrand (µVT) ensemble, i.e.,  3 s=1 N s = N, where N s is the number of particles of species s and N = L 3 is the number of lattice sites, using the Metropolis algorithm with the moves: (1) Transmutation s i → s ′ i .(2) Reorientation q i → q ′ i .The two Monte Carlo (MC) moves are attempted with equal probability and accepted with probability, where is the ratio of fugacities of species s ′ and s, β = (k B T) −1 is the inverse temperature of the system, and ∆E is the change in energy of the lattice configuration due to the proposed move.In Secs.V and VI of the article, we will refer to the MC move set defined here as transmutation-reorientation kinetics.For convenience, we define f to be the solute to solvent fugacity ratio f 12 .
FIG. 1. Visualisations of bulk forms of the three solute phases on a cubic lattice of length L = 6, with solute species s = 2 and s = 3 drawn as cubes and tetrahedrons, respectively.Due to the nature of isotropic interaction, solute particles assemble into checkered structures.The shapes are colour coded according to the corresponding orientation label q.
We study the phase behaviour of the system as a function of temperature k B T and fugacity ratio f in the parameter regime where µ 2 = µ 3 with the following interaction strengths: For c = 0.5, i.e., where anisotropic interactions of species 2 are stronger than those of species 3, we observe, for k B T ≤ 1.5, three distinct energetically stable solute rich states (Fig. 1) and a solvent rich state for low values of f .

III. FREE ENERGY METHODS
We employ a multicanonical "flat histogram" method, refined using the Wang-Landau recursion 28 algorithm in conjunction with a histogram reweighting procedure. 29,30ur implementation of the method relies on constructing functions η N 1 (N 1 ) and η E (E), which, when used as bias energies in our Metropolis acceptance criteria, allow our MC scheme to sample uniformly in N 1 and E, where N 1 is the total number of solvent particles in the system.It can be shown that uniform sampling is achieved when η N 1 (N 1 ) = ln P(N 1 ) and η E (E) = ln P(E), where P(N 1 ) and P(E) are the equilibrium probability distributions of quantities N 1 and E.
We proceed by segmenting the ranges of the two quantities into overlapping intervals of equal width, ranging between 40 and 80 bins.We use bin widths of 1 and 6 for N 1 and E, respectively.Within each interval, the sampling scheme performs a biased Metropolis random walk while simultaneously refining the bias function using increments ∆η ∈ {2 −10 , 2 −11 , . . ., 2 −26 } and generating a histogram h of the quantity of interest.Once the histogram achieves the flatness criteria max(h) ≤ 1.1 h and min(h) ≥ 0.9 h, where h is the mean value of the histogram across all bins, the increment ∆η is reduced and the histogram is discarded.After obtaining the estimates of η N 1 (N 1 ) and η E (E), we sample histograms of N 1 and E with the, now fixed, bias.][33] The obtained distributions are reweighed with respect to parameters f and β to produce approximate coordinates of the points where two or more states, e.g., solvent rich (N 1 ≈ N) and solute rich (N 1 ≈ 0), are equally probable.We rerun the scheme 20 times for each coexistence point to obtain error bars.
To explore the nucleation pathways in our model, we implement an equilibrium path sampling (EPS) algorithm, 16,34 which produces an estimate of the equilibrium probability distribution of a desired quantity without biasing the MC.We first define an order parameter (n, υ, χ) which characterises the size and degree of orientational order within the largest cluster of solute particles on the lattice, where n is the cluster size, and υ and χ are cluster orientational order parameters defined as follows: where P i and p i are, respectively, the total number and the number of aligned diagonally neighbouring pairs of particles of species i, present in the cluster.To avoid singularities we only consider configurations where P i ≥ 1.We argue that, for a cluster of n > 20, the (υ, χ) coordinate is sufficiently representative of the orientational ordering of the cluster to classify it as amorphous (I), semi-ordered (II), or crystalline (III).We employ the standard "geometric" definition of clustering, where a solute particle is considered to be part of a cluster if it is a nearest neighbour of at least one other solute particle.The quantity n is taken as the size of the largest such cluster on the lattice.This definition is known to be problematic at high temperatures above the surface roughening transition; 35 however, in this work we operate in the low temperature regime where the roughening effects are negligible.
We estimate the equilibrium joint probability distribution P(n, υ, χ) for n ∈ [22, 322] by, once again, segmenting the range of the order parameter into overlapping windows of 4 bins wide along each axis.We use a bin width of 1 along n, and 1/16 along υ and χ.In each window, the EPS algorithm performs path MC, 36 where the state of the Markov chain is some path ⃗ σ = (σ 1 , . . ., σ τ ), with σ i ∈ {(s, q)} N being a lattice configuration.In our implementation, a path is generated from a seed lattice state by propagating it a random number m ∈ {1, . . ., τ − 1} of sweeps forward and τ − m − 1 sweeps backward, where a sweep is equivalent to N Metropolis MC moves and, for our purposes, the MC is time symmetric.We fix the path length τ = 7.The generated path is accepted with probability 1 if at least one of the comprising states falls within the range of (n, υ, χ), as specified by the window, and rejected otherwise.To generate a new path ⃗ σ ′ from an already accepted path ⃗ σ, EPS selects a random snapshot σ k of the old path and uses it as a seed for the new path.Upon seeding the path MC, we allow a relaxation stage which terminates after accepting 5 × 10 4 new configurations (up to 7 configurations per path).This is followed by a sampling stage, which aims to accept at least 10 5 new configurations.
All EPS calculations presented in this work were carried out in cubic systems with linear dimension of L = 32 lattice sites.For all conditions studied, this is sufficient to capture nuclei at the final stage of the nucleation pathway without self-interaction between periodic images.For computation of phase diagrams via multicanonical sampling with Wang-Landau recursion, the structural details of the mixed-phase region are unimportant allowing use of smaller system sizes L ∈ {4, 8} for rapid convergence of bias energies.

IV. PHASE BEHAVIOUR AND NUCLEATION PATHWAYS
With the help of the multicanonical sampling method, as described in Sec.III, we verify that the solute-solvent, I-II and II-III phase transitions in our model are first-order, as is consistent with the current understanding of the Q ≥ 3 3D Potts models [37][38][39] as well as previous studies of the Potts Lattice Gas (PLG) model. 16We obtain phase diagrams (Fig. 2) of our system for c ∈ {0.5, 1}, showing that the stability of the partially disordered solute phase is determined by the relative strength of anisotropic interactions of solute species.By considering the solute rich state of our model as an interweave of two decoupled Q = 24 state Potts lattices with 12 nearest neighbours (due to diagonal interactions), we show (Fig. 2(c)) that the coordinates β † of solute phase coexistence points are in close agreement with the existing mean field result, 40 where R is the strength of anisotropic interaction between aligned particles and N nbr is the maximum number of neighbours with which a solute particle can interact anisotropically.In our case, N nbr = 12 due to diagonal interactions.While we observe finite widths of the stability region of the partially disordered phase for c = 1 on small FIG. 2. Phase diagrams for different anisotropic interaction strengths, where f is the parameter controlling the solute saturation.Coexistence lines shown: (i) II and III.(ii) I and II.(iii) Solute and solvent.In (a) and (b), the regions of stability of the solvent rich state are marked by an asterisk ( * ).Shown in (c) is the comparison of mean field predictions (A) and (B) as given by ( 6), and computational results for the coordinates of the solute phase coexistence points.The equilibrium distributions of nuclei were sampled at parameter values indicated by black crosses in (b).Lines (i) and (ii) were estimated via the multicanonical approach described in Sec.III applied to the model on a cubic lattice with a linear dimension L = 8.Lines (iii) were estimated using the same approach but in a cubic system with L = 4.In (b), line (iii) is in good agreement with the mean field approximation (7) shown by (C).
lattices, we find that the proximity of the two solute phase coexistence lines increases with system size.Hence, in accordance with the mean field prediction, we expect the partially ordered phase to be metastable everywhere but the disorder-order coexistence line in the thermodynamic limit.
We derive an approximation to the solute-solvent coexistence line for c = 1 by considering the lattice adaptation 41 of the Widom expressions 42,43 for solute and solvent chemical potentials in the semigrand ensemble.Assuming that, at conditions of solute-solvent coexistence, the free energy change due to insertion of one solvent particle into a system with N solute particles is equal to that due to insertion of one solute particle into a system with N solvent particles, we arrive at the ideal gas like approximation for the coordinates f * ( β) of solute-solvent coexistence points, where the inverse temperature of the order-disorder transition.The estimates of solute-solvent coexistence points, obtained via the multicanonical sampling of ln P(N 1 ) in a cubic L = 4 system, appear in good agreement with Eq. ( 7) (Fig. 2(b)).Additionally, we find that β * (Q) is in reasonable agreement with Eq. ( 6) for Q ≥ 24, converging to β † (Q) as a power law for large Q.
We study the energetics of transition of our model between solvent rich and solute rich states at conditions where the system is most stable in the crystalline phase (Fig. 2(b)).Using EPS along with nearest neighbour interpolation based on Delaunay triangulation, we obtain free energy surfaces F (n, υ, χ) = −k B T ln P(n, υ, χ) for k B T ∈ {0.6, 0.65, 0.7} (respectively, 33%, 28%, and 22% undercooling with respect to the order-disorder transition), at a constant value of the fugacity ratio f = 2.25 (167%, 160%, and 155% supersaturation for the respective temperatures).By examining the local minima (υ, χ) ‡ (n) = min (υ, χ) F (n, υ, χ), we note that the preferred orientational ordering of the solute nuclei varies with nucleus size.Thus, we argue that thermodynamic transition pathways, i.e., those limited to slow nuclei growth, starting in the solvent rich phase and leading to the solute crystal, proceed via amorphous and partially disordered precursors.Defining the critical nucleus size n ‡ as the maximum of free energy , we further show that the preferred orientational ordering (υ, χ) ‡ (n ‡ ) of the critical nuclei is temperature dependent (Fig. 3).
We compute the conditional probability distributions energy barrier to nucleation of the ordered phase is lowest.The picture changes, however, at temperatures k B T ≥ 0.7, where the ordered phase has the highest free energy barrier.Hence, we anticipate a crossover regime at intermediate temperatures 0.6 < k B T < 0.7, where the heights of all three free energy barriers are comparable.This hints at the existence of a parameter regime where nuclei of all three solute phases can nucleate on similar time scales, with nuclei of the most stable phase growing at the expense of the others via dissolution and reprecipitation.However, as illustrated by the sparsity of contours between minima in Fig. 3, we find that barriers to solid state transformation between the solute phases are typically comparable to thermal energy at the critical nucleus size, or absent entirely, in either case vanishing completely for large n.This implies that the most probable route to reaching the thermodynamically stable phase is via transformations within the solute rich phase.
The vanishing of the barriers to solid state transformation within the larger nuclei is not surprising, since we study the model in a temperature regime far below the FIG. 4. Numerically obtained free energy curves F X (n) (markers) for nuclei of the three phases at f = 2.25, showing the corresponding CNT least squares fits (lines).For reference, shown by the dashed lines are estimates of the free energies F (n) obtained via a one-dimensional analogue of the EPS procedure described in Sec.III.We find both variants of the EPS approach in excellent agreement with respect to the estimates of F (n).
thermodynamic and kinetic (for L ≥ 32 cubic systems) metastability limits of bulk disordered and partially disordered solute states.In addition, we are able to qualitatively reproduce the temperature dependence of the preferred orientational ordering within the small nuclei by studying the system in the solute rich state on small (L < 10) cubic lattices with reflecting boundary conditions, demonstrating that the relative thermodynamic stability of the three solute phases is strongly size dependent.Thus, the stability of disordered and partially disordered structures in small nuclei can be interpreted as analogous to a finite size effect.
The shapes of the three barriers in Fig. 4, with the exception of faceting effects, are well fitted by the classical nucleation theory (CNT) expression of the form F(n) = F 0 + An 2/3 − Bn, where A and B are related to, respectively, the surface and volume free energy densities of the nucleus and F 0 relaxes the fit by allowing the free energy of the metastable solvent-rich phase to deviate from zero. 16he obtained least squares fits yield parameter values which are consistent with the energetics of our model-the cost of forming solute-solvent interface and the gain associated with the growth of solute domain both increase with the degree of solute ordering.
We observe a dramatic reduction in the quality of the CNT fits if setting F 0 = 0, i.e., fitting the commonly used expression ∆F(n) = F(n) − F(0), consistent with other reports of poor quantitative performance of the functional form ∆F(n) = An 2/3 − Bn in representing explicitly computed free energy barriers. 22,44,45Fitting ∆F(n) to the overall nucleation barriers F (n) in the range n ∈ [n ‡ − 50, n ‡ + 50], we find the barrier heights adequately captured by the CNT fits, suggesting that the standard CNT framework alone is sufficient to formulate an effective description of the energetics of nucleation in the present model, despite the evident multi-stage character of the process.This poses an interesting test case for the increasingly popular "seeding" method for nucleation barrier and rate estimation, which aims to parametrise a CNT based model of nucleation kinetics by extracting the necessary quantities from trajectories of the nucleus size coordinate.Below, we will assess the capability of the "seeding" method to formulate an effective description of the multi-stage nucleation process under different kinetic regimes, thus evaluating the sensitivity of the method to variations in kinetic growth pathways and violation of the Markov assumption on the kinetics of the nucleus size coordinate.

V. SEEDING METHOD
In the framework of CNT, the nucleation rate J is given by 1,46 where ρ is the solute monomer density, J + † is the rate of monomer attachment to the critical cluster, F(n) = −k B T ln P(n) is the free energy of nucleus of size n, ∆F(n) = F(n) − F(0), and Z =  − βF ′′ (n † )/2π is the Zeldovich factor proportional to the square root of the second n derivative F ′′ of F(n) evaluated at n † -the size of the critical nucleus, i.e., n † = argmax n {F(n)}.We distinguish between the empirical value n ‡ of the critical nucleus, as used in Sec.IV, and the value n † maximising the standard CNT formula for F(n) since the two may not agree in general.
A closely related expression 47 to (8) can be derived by assuming the continuum approximation to time evolution of nucleus size n(t) to obey the Langevin equation in the overdamped limit: ṅ(t) = − βD n ∇ n F[n(t)] + ξ n (t), where ξ n (t) is a random process satisfying ⟨ξ n (t)⟩ = 0 and ⟨ξ n (t)ξ n (t ′ )⟩ = 2D n δ(t − t ′ ), with δ(t) being the Dirac delta function, and D n -the diffusivity of the nucleus size coordinate at the top of the barrier ∆F(n) -is taken to be equal to the monomer attachment rate J + † = D n .The "seeding" approach to nucleation rate calculations builds on the above equations and has been reported to yield accurate rate estimates from atomistic simulations when applied to nucleation from the melt. 48Seeding, in this context, amounts to preparing the initial state of the system to contain a nucleus of size n 0 and observing the time evolution n(t) with the initial condition n(0) = n 0 .Taking the ensemble average on both sides of the overdamped Langevin equation yields which, in principle, allows one to reconstruct the free energy profile F(n) from the initial drift velocities ⟨ ṅ0 ⟩ n 0 .In practice, however, accurate estimation of drift velocities for the whole range of the relevant n 0 is often computationally expensive and the free energy barrier ∆F(n † ) is instead estimated by fitting the n derivative of the CNT expression, where ∆µ can be defined as the difference per particle between the free energies of solute and solvent rich states and γ is the product of the nucleus shape factor and the free energy per unit area of the nucleus surface, to a handful of estimates of ⟨ ṅ0 ⟩ n 0 in the vicinity of n † , where D n is assumed to be approximately independent of n.In both cases, the mean initial drift velocities must be expressed in units of βD n , leading to propagation of error in estimates of D n to estimates of the barrier height, which is exponentially amplified in the rate calculation.We, therefore, pay particular attention to error analysis in this work.

A. Microscopic kinetics
In principle, the outcome of the nucleation barrier reconstruction method should be independent of the microscopic kinetics, provided that the relevant statistical ensemble of the system's configurations is sampled appropriately and the solute chemical potentials are correctly maintained.The transmutation-reorientation (TR) MC move set, as defined in Sec.II, provides kinetics which correctly maintain the chemical potentials of all particle species at the expense of incorrectly representing mass transport due to the unphysical transmutation move.A more realistic representation of mass transport can be provided by replacing the transmutation move, in the TR move set, with nearest neighbour particle exchange move, i.e., analogous to Kawasaki dynamics in the Reuse of AIP Publishing content is subject to the terms: https://publishing.aip.org/authors/rights-and-permissions.Downloaded to IP: 212.219.220.116On: Thu, 27 Oct 2016 14:18:54 kinetic Ising model, 49,50 while keeping the reorientation move and the move attempt probabilities unchanged.We will refer to this alternative move set as diffusion-reorientation (DR) kinetics.Due to the conservation of particle counts under the DR kinetics, solute depletion effects can occur during the nucleation process; however, as detailed in Sec.V C, part of the "seeding" approach is to ensure that these depletion effects are negligible.
Apart from being a more realistic model of particle transport, the DR kinetics differ from TR in two ways: (1) The Markov assumption on n(t), as made by the "seeding" method, is violated more strongly due to the stronger memory effects in local particle density fluctuations. 23,51(2) The rates of nuclei growth are much lower, due to the diffusion limited character of the particle attachment process, allowing substantially greater time scales for structural relaxation of the growing nuclei between successive solute attachment events.By applying the "seeding" approach under both sets of kinetics, we are, therefore, able to assess the sensitivity of the barrier reconstruction method to deviations of n(t) from the assumed Langevin dynamics description as well as differences in the observed nuclei growth pathways.

B. Fitting and error estimation procedures
In order to estimate the quantities ⟨ ṅ0 ⟩ n 0 and D n , one typically samples M independent trajectories n j (t) : n j (0) = n 0 at some number W of uniformly spaced time points t i = i∆t, i ∈ {1, . . .,W } for a range of initial cluster sizes n 0 .The estimator for the mean drift velocity ⟨ ṅ0 ⟩ n 0 is given simply by the gradient of the least squares fit to the time series of the mean displacement ⟨∆n(n 0 ,t)⟩ = ⟨n(t) − n 0 ⟩.In accordance with CNT, one expects that trajectories n(t) : n(0) = n † , starting at the critical cluster size, will yield ⟨∆n(n † ,t)⟩ = 0, allowing the diffusivity D n to be estimated via the halved gradient of the linear least squares fit to the time series of the mean squared displacement.In practice, however, deviations from the expected zero drift behaviour are common, and average squared deviation from the mean ⟨SD n (n 0 ,t)⟩ = ⟨[n(t) − ⟨n(t)⟩] 2 ⟩ is used instead.Both ⟨∆n(n 0 ,t)⟩ and ⟨SD n (n 0 ,t)⟩ can be expected to satisfy ⟨∆n(n 0 , 0)⟩ = 0 and ⟨SD n (n 0 , 0)⟩ = 0, allowing the use of a linear model g(t) ∝ t with zero vertical offset for extraction of gradients.The mean drift velocity and diffusivity estimators are then given by, respectively, v n (n 0 ) and ζ n (n † ), which, if writing out the averaging and least squares fitting operations explicitly, can be written as where ∆n j (n 0 ,t) and SD ( j) n (n 0 ,t) are obtained from independent trajectories n j (t) with n j (0) = x 0 .
Assuming the CNT form of the nucleus free energy, as given by Eq. ( 10), we can extract the CNT critical nucleus size n † and the height of the free energy barrier ∆F(n † ), by estimating γ via a fit derived from Eq. ( 9), For known ∆µ, the fitting procedure can be simplified by rearranging Eq. ( 14) for γ.If Brownian motion is a good approximation to n(t) and Eq. ( 10) holds, one should find that γ † (n 0 ) = γ, where hence the least squares estimator γ LS of γ is given by an average over the range n 0 ∈ [a, b], where For normally distributed estimates v n (n 0 ) of ⟨ ṅ0 ⟩ n 0 with known parameters, the parameters of the normal distribution of the sum  b n 0 =a v n (n 0 )n 1/3 0 can be deduced via the scaling and addition properties of normal distributions.If estimates ζ * n of D n also follow a normal distribution, then the quantity  b n 0 =a v n (n 0 )n 1/3 0 /ζ * n follows the distribution of the quotient of noncentral normal variates, 52,53 for which the cumulative distribution function (CDF) is known and can be computed accurately by appropriately integrating the bivariate normal probability density function. 54Thus, knowing the parameters of the normal distributions of ζ n (n † ) and v n (n 0 ), we can compute the confidence interval on γ LS as well as on ∆F(n † ).
The above error estimation approach makes a number of assumptions, in particular, that (1) estimators v n (n 0 ) and ζ * n are unbiased and normally distributed and (2) estimators v n (n 0 ) and ζ * n are independent.While ( 2) is straightforward to guarantee by using independent sets of trajectories for computation of v n (n 0 ) and ζ * n , we find that (1) is not guaranteed even if the dynamics of n(t) exactly follow the overdamped limit of the Langevin equation.In the Appendix, we show that ζ * n may not follow the normal distribution for small M and can be a biased estimator of D n if n(t) is a trajectory of a 1D Brownian particle in a bistable potential.We, therefore, recommend use of M ≥ 100 for estimation of D n ; however, explicit verification of normality of ζ * n for large M may be infeasible in many applications of the "seeding" method.

C. Seed generation
We employ the "seeding" method in our lattice model to estimate the nucleation barrier heights in the range of k B T ∈ {0.6, 0.65, 0.7} and f ∈ {2, 3, . . ., 7}.At each parameter point, we sample M = 10 2 independent trajectories of the largest cluster size n j (t) at time values t i = i∆t, i ∈ {1, . . .,W } with n j (0) = n 0 ∈ [3, 498], ∆t = 10 MCS and W = 10 2 , where a unit of time represents a single MC sweep (MCS) of the whole lattice.
The initial configurations for each independent trajectory were generated by equilibrating a randomly grown compact cluster of solute particles in pure solvent.The initial compact clusters were generated iteratively by, starting with a single solute particle on the lattice, placing a solute particle into a randomly chosen neighbouring site of a randomly chosen member particle of the cluster until the desired cluster size is reached.The term "equilibration" here is used in reference to a statistical ensemble of configurations where the number of solute particles belonging to the constructed cluster is conserved.To achieve correct sampling from the said ensemble, we restricted the MC to the set of sites S, which included solute sites belonging to the grown solute cluster and solvent sites in the cluster's nearest neighbourhood.The equilibration procedure employed transmutation and reorientation moves on solute sites in S as well as non-local particle swap moves on pairs of solute and solvent sites in S, with all move types having equal attempt probabilities.Acceptance of a non-local particle swap move may lead to a change in S, which we took into account in the calculation of the move acceptance probability to assure detailed balance.Moves leading to a change in the size of the grown cluster were rejected with probability 1.
We find that after a burn-in period of 10 3 n iterations of the procedure, the nuclei attain compact shapes and the corresponding distributions of internal orientational order parameter are consistent with those obtained via the threedimensional EPS approach applied to the µVT ensemble.The agreement between the two methods of sampling distributions of cluster orientational order parameter is not surprising, since the difference between the effects of pure solvent and saturated solution on the structure of the nuclei is negligible due to the short range nature of particle interactions and the low counts of solute particles in the supersaturated solution.Upon equilibration of the nucleus, we fill the lattice uniformly with N ρ(k B T, f ) solute particles, where ρ(k B T, f ) is the fugacity and temperature dependent solute concentration, which we compute numerically as the mean fraction of solute particles in the system in the metastable state.
Following seeding, the trajectories n j (t) were generated under composition conserving unconstrained DR dynamics, 49,50 i.e., equally probable reorientation and nearest neighbour particle exchange moves, as well as the composition nonconserving TR dynamics described in Sec.II.Since DR dynamics samples system configurations from the NVT ensemble, conserving the solute particle count, the resultant evolution of nucleus size can be affected by depletion or augmentation of solute-an unphysical effect in open systems. 4To avoid this effect in our sample of trajectories n j (t), we employ the larger cubic lattice sizes of L = 64.In addition, we verify that the maximum change in nucleus size, along all obtained trajectories for DR dynamics, is negligible in comparison to the total number of the available solute particles in the system.

D. Drift velocities and diffusivities
Typical realisations of ⟨∆n(n 0 ,t)⟩ = ⟨n(t) − n 0 ⟩ are shown in Fig. 5, where we see that, consistent with the intuition FIG. 5. Plots of mean displacements ⟨∆n(n 0 , t)⟩ against time t for DR [(a) n * = 82] and TR [(b) n * = 78] dynamics at k B T = 0.7, f = 3. Error bars represent the 95% confidence intervals based on 10 2 observations assuming normal distribution.The three sets of points correspond to trajectories initialised with n 0 given by (i) n * − 60, (ii) n * , and (iii) n * + 10 2 .Solid line segments represent the linear fits with gradients v n (n 0 ) used as estimates of initial drift velocity ⟨ ṅ0 ⟩ n 0 along the cluster size coordinate.
of Sec.V A, nuclei growth rates for TR kinetics are approximately two orders of magnitude faster than those for DR.We observe roughly linear behaviour of ⟨∆n(n 0 ,t)⟩ on short time scales t ≤ 100MCS; however, based on 10 2 observations, we expect considerable relative error margins on initial drift velocities measured under DR dynamics.Estimates v n (n 0 ) of initial drift velocities ⟨ ṅ0 ⟩ n 0 along the cluster size coordinate were calculated by fitting straight lines with zero vertical offset (Eq.( 11)) to time series ⟨∆n(n 0 ,t)⟩ over the range t ≤ 100 MCS.Estimates ζ n (n 0 ) of cluster size dependent diffusivity D n along n were extracted from time series ⟨SD n (n 0 ,t)⟩ = ⟨[n(t) − ⟨n(t)⟩] 2 ⟩, n(0) = n 0 (Fig. 6(a)) in a similar fashion [Eq.( 12)].
In accordance with CNT, trajectories starting at the critical cluster size n † are expected to yield v n (n † ) ≈ 0. Computing the value n † , however, requires knowledge of the value γ LS , which, in itself, requires the estimate ζ * n and, hence, the coordinate of the barrier top.Thus, we estimate the coordinate of the barrier top as n * = argmax n  n n 0 =3 −v n (n 0 ), which exploits the proportionality of dF(n)/dn to the mean initial drift velocity in Eq. ( 9).Again, it is important to stress that n * is an empirical value of the critical cluster size, which may differ from the fitted CNT value n † .We observe that within the range n 0 ∈ [n * − 20, n * + 20] of initial cluster sizes n 0 , the gradient estimates ζ n (n 0 ) of the mean squared displacements Reuse of AIP Publishing content is subject to the terms: https://publishing.aip.org/authors/rights-and-permissions.Downloaded to IP: 212.219.220.116On: Thu, 27 Oct 2016 14:18:54 vary slowly with n 0 , as shown in Fig. 6(b), which is consistent with the intuition that diffusivity along the n coordinate is approximately constant close to the top of the free energy barrier. 1 We therefore use the data for ζ n (n 0 ) in the range n 0 ∈ [n * − 20, n * + 20] to estimate the statistical uncertainty in our measurement ζ n (n * ) of the diffusivity D n along the cluster size coordinate.

E. Choice of ∆µ
In our model the degree of saturation of the solution is dependent only on the difference between solute and solvent chemical potentials.Thus we can set µ 1 = 0 without loss of generality, allowing the chemical potential µ 2 = µ 3 of solute to fully determine the chemical composition of the system.Therefore, the standard choice of the value of the CNT parameter ∆µ, in our case, would be with f * ( β) given by Eq. ( 7), corresponding to the difference between the bulk solute chemical potentials in states of metastable supersaturation and solute-solvent coexistence. 1,4,55It is, however, understood that usage of the bulk value ∆µ = ∆µ coex as the driving force to nucleation may be unsuitable for the description of the microscopic particle attachment process. 56An alternative value ∆µ fit ( β, f ) can be obtained via a two parameter fit of Eq. ( 14), allowing the appropriate value of ∆µ to be determined by the microscopic dynamics of the system.Due to the comparatively low computational cost of free energy calculations in our model, we are able to compute the nucleation barriers over the specified range of parameter values explicitly with high accuracy by utilizing the onedimensional analogue of the EPS procedure, described in EPS against ∆µ −2 coex given by Eq. ( 17).
Sec. III, applied to cubic systems of length L = 32 sites.
To assure absence of finite size effects in the obtained barrier estimates, we verify that the EPS data are reproducible in smaller systems with L = 16.We obtain data which are adequately described by Eq. ( 10) (Fig. 7), yielding the set of values ∆µ EPS ( β, f ) via a two parameter fit of Eq. ( 10) to the EPS estimates of ∆F(n) over a range of 100 values of n near the top of the barrier (Fig. 7(d)).
We now consider the effect of the choice of ∆µ on the capacity of the "seeding" method to reconstruct the explicitly computed free energy barriers by carrying out the fitting procedure [Eq.( 16)] for each of the three possible choices: ∆µ coex , ∆µ fit , and ∆µ EPS .We first obtain estimates of ∆µ fit via two parameter fits of Eq. ( 14 15)] for ∆µ ∆µ coex (Figs.8(b) and 8(c)), while the usage of the bulk value ∆µ coex often yields a clear n dependence of γ † (n) (Fig. 8(a)).Despite the comparable quality of the three fits of Eq. ( 14) (Fig. 8(d)), with occasionally exceptionally poor fits using ∆µ coex (Fig. 8(e)), we find that barrier reconstruction via the "seeding" approach is highly sensitive to the choice of ∆µ (Figs.8(f)-8(h)), only yielding a consistent agreement with EPS for the two sets of dynamics if using ∆µ = ∆µ EPS .
To illustrate the sensitivity of the "seeding" method further, we plot all barrier height estimates in Fig. 9.For f ≥ 3, the EPS barrier height estimates fall on a straight line ∆F(n † ) ∝ ∆µ −2 coex (Fig. 9(a)) as is consistent with CNT. 1 Closer to coexistence ( f = 2) for k B T ∈ {0.65, 0.7}, however, the linear trend does not hold, which cannot be accounted for by statistical errors or finite size effects in our free energy calculations, and, therefore, may be due to the errors in the mean field approximation to f * ( β) in Eq. (7).We find that usage of the bulk values of ∆µ yields a systematic error of at least a factor of 2 in the barrier height estimates due to the "seeding" method (Fig. 9(b)).Although the approach employing a two parameter fit of Eq. ( 14) can yield reasonable agreement between EPS and the "seeding" method data, usage of ∆µ = ∆µ fit leads to barrier estimates which vary greatly with temperature, saturation, and the choice of the system's kinetics (Fig. 9(c)).Setting ∆µ = ∆µ EPS , on the other hand, recovers barrier height estimates which are in excellent agreement with the explicit free energy calculations for f ≥ 3 independent FIG. 9. Comparison of nucleation barrier height estimates obtained via EPS and the "seeding" method.In (a) we plot the EPS estimates of barrier heights against ∆µ −2 coex , yielding good agreement with the linear trend ∆F(n † ) ∝ ∆µ −2 coex for f ≥ 3 at the three values of B T .In (b)-(d), we show the barrier height estimates obtained via the "seeding" method under DR (circles) and TR (squares) dynamics, taking the ∆µ values as, respectively, ∆µ coex , ∆µ fit , and ∆µ EPS , in relation to the linear fits for f ≥ 3 to the EPS data.Error bars represent the 95% confidence intervals obtained via the CDF of noncentral normal quotients.
of the system's kinetics, yet deviate by, roughly, 43% for k B T ∈ {0.6, 0.65}, f = 2 (Fig. 9(c)).Thus, we argue that, even for a well chosen ∆µ, the "seeding" method cannot guarantee an accurate estimate of the nucleation barrier height for low supersaturations in our model.In our case, particularly, the 43% error in barrier height can lead to underestimation of the nucleation rate by up to 10 orders of magnitude when using Eq. ( 8).
The kinetic prefactor ρD n Z of Eq. ( 8) can readily be obtained from our calculations, with estimates ζ n (n * ) of D n being, as to be expected, the only significant kinetics dependent contribution to the rate J for ∆µ = ∆µ EPS .For the explicitly computed nucleation barriers, the Zeldovich factor can be estimated via a parabolic fit near n † , which allows us to compare rate estimates due to EPS and the "seeding" method.Taking into account the statistical errors in our estimates, we arrive at the same observations analysing J as a function of ∆µ coex as outlined above, obtaining reasonable agreement with the CNT prediction ln J ∝ ∆µ −2 coex .

VI. CONCLUSIONS
We have introduced a novel multicomponent lattice model, in which the slow growth limited solute crystallisation pathway proceeds via the metastable disordered and partially disordered solute phases.We have shown that the heights of the barriers to nucleation of the metastable phases in relation to that of the stable crystal vary with temperature, leading to a parameter regime where the free energy barriers to nucleation of all three phases are equal.Due to the low barriers to solid state transformation between the three solute phases, we argue that the present model cannot expect to favour the dissolution-regrowth pathway relevant to homogeneous nucleation of calcite from solution. 10Design of minimal models to capture this process needs to incorporate large energy barriers to direct transformation between solute-rich phases.We hope to report on such models in a future communication.
Turning our attention to nucleation kinetics and the "seeding" method, given that the nucleation barrier height is exponentiated in the CNT rate expression, the largest contribution to error in rate estimates via the "seeding" method lies in the barrier reconstruction.We have shown how to estimate statistical uncertainties in this procedure, and hence demonstrated statistically significant deviation of nucleation barrier height estimates from those obtained via explicit free energy calculations, subject to the definition of the CNT parameter ∆µ.We found that the discrepancies between the two barrier estimation methods vanish over a broad range of parameter space, with the exception of conditions of lower supersaturation, if using values of ∆µ informed by the shape of the explicitly computed free energy barriers.At lower supersaturations, however, the discrepancies between the two methods remain significant, which cannot be explained by the errors in our calculations.A possible source for these discrepancies lies in our choice of reaction coordinate.It is known that the choice of the reaction coordinate plays a major role in quantitative treatment of nucleation. 57While we cannot completely rule out the possibility of having chosen a suboptimal definition of n, results of the committor test 58 suggest the quality of our definition is at worst comparable to those used in off-lattice simulations.
Additional explanations for the failure of the "seeding" method to correctly reconstruct F(n) at weak supersaturation lie in the underlying assumptions of CNT.The functional form of ∆F(n) remains a subject of debate. 22,59In this work we have not made any assumptions regarding the shape of nuclei except that surface area scales as n 2/3 .The single cluster assumption underlying this statement is potentially problematic close to n = 0 where the entropic benefit of multiple clusters may offset the increased interfacial free energy.This trade-off is system size dependent and not corrected for in our free energy calculations.
The "seeding" method also relies on the assumption that the dynamics of n(t) can be approximated by the over-damped Langevin equation.This assumption enters into the calculation of the barrier height as well as the kinetic prefactor.The validity of the Markov assumption for n(t) is expected to depend on the choice of microscopic kinetics.It is known that TR dynamics yield only approximately Markovian n(t), while the behaviour of n(t) under the more realistic mass transport of DR dynamics completely violates the Markov assumption. 23espite the apparent differences between the two sets of kinetics, we observed a remarkable agreement between the two corresponding free energy barrier reconstructions, particularly at higher supersaturations, suggesting that the Markov assumption does not play a great role here.The agreement also suggests that the "seeding" method can recover consistent barrier estimates in different regimes of nuclei growth, provided that the structures of the initial seeds are correctly prepared.In future communications, we will report on direct calculations of nucleation rates via path sampling methods, with the hope to examine more closely the role of the Markov assumption in computing accurate nucleation rates.FIG.11.Mean displacements (a) and mean squared displacements (b) for x(t) against time at k B T = 0.15, M = W = 10 2 .Error bars represent the 95% confidence intervals, the width of which is typically smaller than that of the markers.The solid lines in (a) are drawn with the gradient given by ∇ x U (x 0 ).In (b), the solid lines represent the linear least squares fits to the data. of ζ x between 1% and 4% higher than k B T. Factoring in the error of barrier height and D x estimates due to increasingly high drift contributions to ⟨SD x (x 0 ,t)⟩ away from the barrier top, we observe good agreement of Eq. (A1) with brute force estimates of inverse mean first passage times 63 (Fig. 12(c)).It is important to highlight that the derivation of Eq. (A1) relies on harmonic approximations of the potential U(x) near the top and bottom of the barrier.These approximations, in our case, lead to small errors (Fig. 12(c)) which are cancelled by the systematic errors in estimates of the diffusion coefficient.A more accurate value of j x can be obtained by numerically integrating the Kramers' 60 expression for the stationary current: j x = D x P(−1)e βU (−1) /  1 −1 dxe βU (x) , out of the quasi-stationary state P(x) near x = −1, of the appropriate Fokker-Planck equation.
FIG. 12. Mean drift velocity (a), diffusion coefficient (b), and rate (c) estimates for the Brownian dynamics in U(x) = x 4 − 2x 2 .In (a), the mean drifts are in excellent agreement with the gradient of the potential shown as a solid line.In (b), the estimates of the diffusion coefficient vary parabolically in the vicinity of the barrier top-the solid line shows a parabolic fit to the data to guide the eye.In (c), the points represent brute force computed inverse mean first passage time estimates, dashed lines show the range of rate estimates obtained via the procedure analogous to the "seeding" method, while the solid grey line gives the more accurate values of j x computed via numerical integration of the Kramers' stationary current equation.The dashed-dotted line in (c) is the plot of Eq. (A1) taking β D x = 1.
FIG. 3. Free energy surfaces F ‡ (υ, χ) for critical nuclei n ‡ at three different temperatures and constant fugacity ratio f = 2.25.Contour lines are drawn at intervals of 1.0k B T and the coordinates (υ, χ) ‡ (n ‡ ) of the local minima F ‡ (υ ‡ , χ ‡ ) = 0 are indicated by the crosses.The plots demonstrate the temperature dependence of preferred ordering (υ, χ) ‡ (n ‡ ) of critical nuclei n ‡ .The three regimes shown are as follows: (a) critical nuclei most stable when ordered; (b) critical nuclei most stable when partially ordered; and (c) critical nuclei most stable when disordered.

FIG. 6 .
FIG. 6. Plots of ⟨SD n (n 0 , t)⟩ for n 0 = n * (a) and diffusion coefficient estimates ζ n for n 0 in the range n 0 ∈ [n * − 20, n * + 20] (b) obtained under TR (main panels) and DR (insets) kinetics at k B T = 0.7, f = 3.In (a), the solid line segments represent linear fits used in estimation of cluster size diffusivity ζ n (n * ).In (b), solid lines represent linear fits used for detrending the data for estimation of statistical uncertainty in ζ n (n * ).