Reconstruction of effective potential from statistical analysis of dynamic trajectories

The broad incorporation of microscopic methods is yielding a wealth of information on the atomic- and mesoscale dynamics of individual atoms, molecules, and particles on surfaces and in open volumes. Analysis of such data necessitates statistical frameworks that convert observed dynamic behaviors into effective materials properties. Here we develop a method for the stochastic reconstruction of effective acting potentials from observed trajectories. Using the silicon vacancy defect in graphene as a model system, we develop a statistical framework to reconstruct the free energy landscape from calculated atomic displacements.


I. INTRODUCTION:
Molecular Dynamics (MD) simulation has emerged as one of the primary computational tools giving insight into atomic-scale processes in condensed matter physics, chemistry, and materials science. MD is a method for computationally evaluating the thermodynamic and dynamic properties of a system by integrating the equations of motion of its constituent atoms. However, one of the characteristic aspects of MD is the large amount of information it yields, necessitating the compression of the full set of time-dependent positions and conjugate force fields into a small number of physically relevant reduced variables. The complexity of many-particle systems is frequently the motivation for introducing mean-field approximations that simplify the descriptions given by complicated equations of motion. For example, the linear response of an equilibrium system to a small external influence can be derived using the classical Kubo formula 8 . This approach can be used, e.g., to calculate diffusion coefficients via the space-time integral of the flux-flux equilibrium correlation function 9 .
Other classical examples include order parameters in physics 10 , slow variables in proteins 11 , and the whole concept of reaction paths and reaction coordinates 12,13 . Until now, most of these information-compression methods analyzed the full dynamics of the system parametrized as the time evolution of particle coordinates and conjugate variables (e.g., velocities). At the same time, atomic-scale dynamics have been observed directly in experiment [20][21][22][23] , and the dynamics of Si clusters have also been observed [24][25][26] . Recent progress in automated analytical techniques leveraging machine learning has provided a path forward for detailed analysis of large data sets, extracting and tracking features of interest and placing the study of atomic dynamics on a more comprehensive and statistically robust footing [27][28][29] . Analysis of such systems necessitates the development of pathways to compress MD simulation results to reconstruct effective potentials acting on the atoms, using the collective dynamics of the atomic neighborhood in the analysis.
Here we develop a method for the stochastic reconstruction of effective acting potentials from observed trajectories. For this, we perform a molecular dynamics simulation of the silicon vacancy defect in graphene under a known time-dependent excitation force and reconstruct the effective autonomous (time-independent) free energy landscape from the calculated atomic displacements.

II. MOLECULAR DYNAMICS SIMULATION:
The MD simulations were performed to predict the response of the silicon vacancy defect to an applied time-dependent force. The initial structure was allowed to relax at the beginning of the simulation using an NPT (isothermal-isobaric) ensemble for 0.5 ns, followed by an NVT ensemble at a temperature of 300 K for another 0.5 ns to achieve a steady temperature (temperature fluctuations < 3 K) for the entire system. After the system reached a steady temperature, a periodic force of the form F = A·m was applied, where F is the effective force applied to the Si atom in the -z direction, m is a randomly generated number in the range [0, 1), and A is a factor defining the force magnitude. The force was applied as a short, delta-function-like pulse with a duration of 100 fs, repeated every 2.5 ps. The time gap between successive force applications is long enough that the effect of the previous pulse is damped out by the system. Fig. 1 b shows an example of the applied force distribution. For the femtosecond-scale dynamics of the MD simulation, on the order of the decorrelation time of the underlying process, it makes sense to utilize dynamical information in the reconstruction.
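The pulse protocol described above can be sketched numerically. This is a minimal illustration, not the authors' simulation code: the per-pulse magnitude is taken as A·m with m ~ U[0, 1), and all function and parameter names are assumptions.

```python
import numpy as np

def force_schedule(t_total_ps, dt_ps=0.005, period_ps=2.5,
                   pulse_ps=0.1, A=1.0, rng=None):
    """Impulsive -z force: a 100 fs (pulse_ps) pulse every 2.5 ps (period_ps).

    Each period draws a fresh random magnitude A*m with m ~ U[0, 1);
    the exact functional form of the magnitude is an assumption here.
    Returns (times in ps, force values).
    """
    rng = np.random.default_rng(rng)
    n = int(round(t_total_ps / dt_ps))
    t = np.arange(n) * dt_ps
    phase = np.mod(t, period_ps)            # time since start of current period
    mags = A * rng.random(int(np.ceil(t_total_ps / period_ps)) + 1)
    idx = (t // period_ps).astype(int)      # which period each step belongs to
    f = np.zeros(n)
    f[phase < pulse_ps] = -mags[idx[phase < pulse_ps]]  # -z direction
    return t, f
```

The schedule keeps the force zero between pulses, so each pulse decorrelates before the next one arrives.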
We leverage the sequential nature of the data and the dynamical information it carries by using a sequential Monte Carlo sampling method to process the observed atomic trajectories and perform parameter estimation. First, the diffusion constant, D, is fit to the quadratic variation of the process.
As mentioned above, it is assumed that the potential has a specific form and is governed by a set of parameters θ. It is furthermore assumed that the data arise as exact observations from a discretized first-order Langevin stochastic differential equation (SDE) in this potential,

dX_t = −∇U(X_t; θ) dt + √(2D) dW_t,

where X_t is the state and ε = k_B T is the characteristic energy and carries the dimension of energy. Here a somewhat non-standard statistical model is introduced which simplifies the problem of static parameter estimation for SDEs. In particular, discretization error is ignored, and observations are assumed exact, which yields a tractable posterior distribution. A sequential Monte Carlo sampler is then implemented for inference from the proposed posterior distribution.
We consider the following statistical model,

π(θ | X_0, …, X_N) ∝ π_0(θ) ∏_{n=0}^{N−1} p_δ(X_{n+1} | X_n, θ),

where π_0 is a prior on the parameter θ ∈ ℝ^g, and p_δ is a δ time-step Euler approximation of the following SDE,

dX_t = a(X_t; θ) dt + √(2D) dW_t,

where D ∈ ℝ^{d×d} is a known diffusion constant (this will be revisited below), W_t is Brownian motion on ℝ^d, X_t ∈ ℝ^d, and a: ℝ^{d+g} → ℝ^d is Lipschitz. That is,

p_δ(X_{n+1} | X_n, θ) = N(X_{n+1}; X_n + δ a(X_n; θ), 2Dδ).

We cannot sample from this distribution directly, but we can obtain a convergent estimator using sequential Monte Carlo samplers, 36 as described below.
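Because the posterior is a product of Gaussian transition densities, it can be evaluated pointwise. The sketch below assumes a one-dimensional state, scalar D, drift a = −∇U, and user-supplied `grad_U` and `log_prior` functions (illustrative names, not from the source):

```python
import numpy as np

def euler_log_posterior(theta, X, delta, D, grad_U, log_prior):
    """Log posterior (up to a constant) for theta under the exact-observation
    Euler model: X_{n+1} | X_n ~ N(X_n - delta*grad_U(X_n, theta), 2*D*delta).

    X is a 1-d array of observed states; grad_U(x, theta) returns dU/dx.
    """
    incr = X[1:] - X[:-1]                                    # observed increments
    drift = -delta * np.array([grad_U(x, theta) for x in X[:-1]])
    resid = incr - drift
    var = 2.0 * D * delta
    # Gaussian log-likelihood summed over all steps
    ll = -0.5 * np.sum(resid**2) / var \
         - 0.5 * resid.size * np.log(2.0 * np.pi * var)
    return log_prior(theta) + ll
```

For an Ornstein-Uhlenbeck-type drift, grad_U(x, θ) = θ·x, the log posterior peaks near the stiffness used to generate the data.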
First, we analyze the system for a priori known D. We define the sequence of partial posteriors

π_n(θ) ∝ π_0(θ) ∏_{k=0}^{n−1} p_δ(X_{k+1} | X_k, θ),

so that the incremental weight is

γ_n(θ) := π_{n+1}(θ)/π_n(θ) = p_δ(X_{n+1} | X_n, θ).
Let K_n denote an MCMC kernel such that π_n K_n = π_n. Let θ_0^(i) ∼ π_0 for i = 1, …, M. For n = 1, …, N, repeat the following steps for i = 1, …, M: (i) reweight each sample by the incremental weight γ_{n−1}(θ_{n−1}^(i)) and resample; (ii) move each sample with the kernel K_n. We further define the estimator π̂_n(φ) := (1/M) ∑_{i=1}^{M} φ(θ_n^(i)). Under mild conditions it is well known that as M → ∞, one has that π̂_n(φ) → π_n(φ) almost surely. Rates, central limit theorems, and large deviations estimates can also be obtained. 37 In practice, we do not have access to D, and we must estimate it from the data as well.
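A minimal one-parameter version of such a sampler, using a random-walk Metropolis kernel invariant for each partial posterior π_n, could look like the following (the kernel choice, the N(0,1) initial draws, and all names are illustrative assumptions; the initial draws must match the supplied `log_prior`):

```python
import numpy as np

def smc_sampler(X, delta, D, log_prior, grad_U, M=200, step=0.3, rng=0):
    """SMC sampler over the partial posteriors
    pi_n(theta) ∝ pi_0(theta) * prod_{k<n} p_delta(X_{k+1} | X_k, theta):
    reweight by the incremental likelihood, resample, then apply a
    random-walk Metropolis move invariant for pi_{n+1}."""
    rng = np.random.default_rng(rng)

    def log_incr(theta, n):  # log p_delta(X_{n+1} | X_n, theta), up to a constant
        mean = X[n] - delta * grad_U(X[n], theta)
        return -0.5 * (X[n + 1] - mean) ** 2 / (2.0 * D * delta)

    def log_target(theta, n):  # log pi_n(theta), up to a constant
        return log_prior(theta) + sum(log_incr(theta, k) for k in range(n))

    thetas = rng.normal(0.0, 1.0, size=M)  # draws from an assumed N(0,1) prior
    for n in range(len(X) - 1):
        logw = np.array([log_incr(th, n) for th in thetas])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        thetas = thetas[rng.choice(M, size=M, p=w)]  # resample
        prop = thetas + step * rng.normal(size=M)    # MH proposal for pi_{n+1}
        for i in range(M):
            a = log_target(prop[i], n + 1) - log_target(thetas[i], n + 1)
            if np.log(rng.random()) < a:
                thetas[i] = prop[i]
    return thetas
```

The returned particles approximate the full posterior π_N; any π̂_N(φ) is then a particle average. This naive sketch re-evaluates the full target at each move and so scales as O(N²M).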
This is done offline in a preprocessing step before implementing the above method. Observe that X_{n+1} − X_n ∼ N(−δ∇U(X_n; θ), 2Dδ). It is clear then that the drift term is higher order in δ when computing an approximation of the quadratic variation of the limiting SDE,

Q̂ := (1/N) ∑_{n=0}^{N−1} (X_{n+1} − X_n)(X_{n+1} − X_n)^T.

Indeed, this gives a very good approximation to 2Dδ, and we define D̂ := Q̂/(2δ). As δ → 0, one has that D̂ → D.
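The quadratic-variation estimator is straightforward to implement; a minimal sketch (function name is illustrative):

```python
import numpy as np

def estimate_D(X, delta):
    """Estimate the diffusion constant from the empirical quadratic variation:
    (1/N) * sum (X_{n+1} - X_n)^2 ≈ 2*D*delta, hence D ≈ QV / (2*delta).
    The O(delta) drift contribution is higher order and vanishes as delta -> 0."""
    incr = np.diff(X, axis=0)          # increments X_{n+1} - X_n
    qv = np.mean(incr**2, axis=0)      # per-coordinate quadratic variation
    return qv / (2.0 * delta)
```

Applied per coordinate, this yields the diagonal entries of D directly (e.g. D_x and D_z below).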
An alternative approach to parameter inference is to introduce a pseudo-dynamic on the parameter, θ_{n+1} ∼ N(θ_n, σ²), and then solve the filtering problem for θ_n | X_0, …, X_n, 35 using a standard particle filter. We find that this approach can provide a reasonable approximate reconstruction with an appropriately chosen σ² ∝ δ.
Now the long MD trajectory is used to fit an effective Langevin dynamics. 33 It is assumed that in this small-mass and long-time regime it is reasonable to ignore inertial effects and employ a first-order Langevin equation for fitting. 33 First, (10) is employed to predict the diffusion constant from the quadratic variation. It is observed that the dynamics is smooth on a femtosecond timescale, and so a timestep of δ = 0.025 ps is considered. An additional difficulty arises here due to the deterministic time-dependent forcing in the z direction, denoted by F(t) = (0, 0, f(t))^T, as well as the difference in effective diffusion coefficients in the (x, y) and z directions, denoted D_x and D_z, respectively. The Langevin dynamics now take the form

dX_t = −Γ^{−1}[∇U(X_t) − F(t)] dt + √(2D) dW_t,

where X = (x, y, z)^T, and for simplicity we assume D = diag{D_x, D_x, D_z}. In this equation, (2) has been multiplied through by Γ^{−1}, the inverse of the fluctuation-dissipation coefficient, which is now a matrix, also assumed diagonal, Γ = diag{γ_x, γ_x, γ_z}. With a known temperature, Γ is determined by the fluctuation-dissipation relation. It is furthermore assumed that the potential is separable as U(x, y, z) = U_{xy}(x, y) + U_z(z), where the following parametric ansätze are employed for the individual terms:

U_{xy}(x, y) = a(x² + y²)[1 + b cos(4 tan^{−1}(y/x))], U_z(z) = c z² + d z⁴,

and so the invariant measure is proportional to exp(−(U_{xy}/ε_x + U_z/ε_z)).
The method is used to fit (a, b, c, d). Note that degeneracy may occur if b is allowed to change sign, and ill-posedness will result from a, c < 0. So, it is assumed that a = e^ā and c = e^c̄, and the unconstrained parameters (ā, b, c̄, d) are inferred.
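One possible reading of the separable ansatz can be evaluated as follows; the precise form of U_z (taken here as cz² + dz⁴) is an assumption, and the function name is illustrative:

```python
import numpy as np

def potential(x, y, z, a, b, c, d):
    """Separable ansatz U = U_xy + U_z, with
    U_xy = a*(x^2 + y^2) * (1 + b*cos(4*arctan(y/x))) and U_z = c*z^2 + d*z^4.
    arctan2 handles the full angular range of the in-plane term."""
    U_xy = a * (x**2 + y**2) * (1.0 + b * np.cos(4.0 * np.arctan2(y, x)))
    U_z = c * z**2 + d * z**4
    return U_xy + U_z
```

The cos(4·arctan(y/x)) factor gives the in-plane term a four-fold angular modulation, which is why flipping the sign of b (a 45° rotation of the modulation) produces the degeneracy noted above.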

IV. CONCLUSION:
To summarize, here we propose an approach to derive an effective single-particle potential from observations of particle dynamics in multiple-particle systems. The approach is developed for the specific case where observations of the full state are available, but it can be adapted, with additional machinery, to more complex cases, including partial and noisy observations. The latter is in fact the standard setting in the literature on hidden Markov models, where it is referred to as a state-space model. This case is more challenging, as one must also infer (or marginalize over) the state.

ACKNOWLEDGEMENTS:
AYN would like to thank Dr. Seungha Shin and Jiaqi Wang for their expertise and assistance at the Nano-Heat lab at the University of Tennessee. This work was also supported by The Alan Turing