Variational Principle for Stochastic Mechanics Based on Information Measures

Stochastic mechanics is regarded as a physical theory that explains quantum mechanics in classical terms, such that some of the paradoxes of quantum mechanics can be avoided. Here we propose a new variational principle to uncover more insights into stochastic mechanics. According to this principle, information measures, such as relative entropy and Fisher information, are imposed as constraints on top of the least action principle. This principle not only recovers Nelson's theory and, consequently, the Schrödinger equation, but also resolves an open issue in stochastic mechanics: why multiple Lagrangians can be used in the variational method and yield the same theory. The concept of forward and backward paths provides an intuitive physical picture for stochastic mechanics. Each path configuration is considered as a degree of freedom and has its own law of dynamics. Thus, the variational principle proposed here can be a new tool to derive more advanced stochastic theories by including additional degrees of freedom. The structure of the Lagrangian developed here shows that some terms in the Lagrangian originate from information constraints. This suggests that a Lagrangian may need to include both physical and informational terms in order to give a complete description of the dynamics of a physical system.


I. INTRODUCTION
Although quantum mechanics is one of the most successful physical theories and has been extensively confirmed by experiment, many fundamental questions are still left unanswered. For instance, the origin of probability in quantum mechanics is not clearly understood. It is still a curiosity why the probability is calculated as the absolute square of a complex number. The meaning of the wave function, especially the interpretation of wave function collapse in a measurement, has always been a debated topic. These questions were not fully addressed by the traditional Copenhagen Interpretation. Over the modern history of quantum physics, many more theories and interpretations have been developed 1,2. Among them, stochastic mechanics is of particular interest because it aims to derive quantum mechanics from classical physics concepts with the additional assumption that a physical system is constantly undergoing a stochastic process 3,4. There is no need to introduce concepts such as probability amplitude, wave function, or Born's rule as fundamental elements of the quantum theory. Instead, they are secondary and can be derived.
Historically, the investigation of the connection between quantum mechanics and diffusion processes started in the early days of quantum mechanics. Schrödinger initiated this line of investigation in 1932 by asking a question regarding Brownian motion that was later termed the Schrödinger Bridge Problem (SBP) 5. The SBP essentially seeks the most likely random evolution that Brownian particles have taken from an initial end point to another end point, provided the probability densities at the two end points are given. In the context of Markov diffusion processes, the solution of the SBP was found to admit a form that is similar to Born's rule in quantum mechanics 7,8. A more ambitious attempt to formulate quantum mechanics directly based on diffusion processes was put forward by Nelson in 1966 10, based on earlier work by Fényes, who recognized that the Schrödinger equation can be understood as a partial differential equation (PDE) for a Markov process 9. Nelson's theory became the most well-known formulation of stochastic mechanics. In this theory, a rather arbitrary definition of the mean stochastic acceleration was postulated. Then, together with the Fokker-Planck equation, Nelson derived two non-linear PDEs which, when combined through a set of mathematical transformations, lead to the Schrödinger equation. Subsequent research 12,17,18 focuses on the variational principle that would serve as the underlying foundation to recover Nelson's theory. Among these works, Yasue's approach 12 is of particular importance. It not only successfully derives the stochastic acceleration from the least action principle, but also proposes a generic stochastic calculus of variations. Guerra and Morato 17 give another variational approach that can recover Nelson's theory, but with a different structure of the Lagrangian, leaving a mystery as to why multiple Lagrangians can result in the same theory.
a) Electronic mail: jianhao.yang@alumni.utoronto.ca
More recent research on stochastic mechanics is motivated on two fronts. First, since the source of the randomness that causes the system to perform Brownian motion is not yet completely known, it is desirable to investigate this source. One promising conjecture is that the space metric itself is stochastic 20,21. A particle in such a stochastic space is shown to constantly perform Brownian motion. Given that the effect of the randomness of the spacetime metric is universal, the resulting quantum effect is also universal to all systems in the spacetime. Another proposal, which is less universal since it only applies to charged systems, is that the vacuum electromagnetic field at absolute zero temperature is a real radiation field. A charged particle constantly interacts with this electromagnetic field. The quantum effects are produced by the electromagnetic noise combined with classical dynamics 22,23. Second, in the wake of advances in quantum information, there is a considerable amount of interest in exploring the informational foundation of quantum mechanics. Information concepts such as entropy, Bayes' theorem, Fisher information, entanglement, etc., can be considered as foundational elements in constructing quantum theory [29][30][31][35][36][37]39. Some efforts 18,19 have been made to recover Nelson's theory with entropy as a key element. However, these formulations depend on other assumptions such as conservation of energy 19 or rather arbitrary constraints in the variation process 18.
There are open questions on how stochastic mechanics can explain quantum phenomena such as entanglement, locality, and the 2π periodicity of the wave function. Readers are referred to the well-known review papers 3,4 and a more recent paper 26. One of the subtleties is to what extent stochastic mechanics can rely on classical probability theory to explain phenomena specific to quantum mechanics, as exemplified by the multitime correlations issue 42,43. One should be very cautious in equating quantum mechanics to classical probability theory. Nevertheless, these challenges will continue to inspire future research that brings new physical insights into stochastic mechanics.
In summary, stochastic mechanics remains a promising theory to explain quantum mechanics classically. The theory can be further developed on multiple fronts. This paper is motivated by exploring a possible informational foundation of stochastic mechanics. By doing this we wish to uncover additional physical insights into quantum mechanics. The goal of this work is to take a new look at the variational principle by examining the structure of the Lagrangian and the constraints related to information measures. We will show that this effort is indeed fruitful. By defining a proper Lagrangian and imposing a relative entropy constraint for both the forward and backward path configurations, we are able to derive the time dynamics for both path configurations using the Yasue stochastic calculus. From them, the Nelson theory and the Schrödinger equation are recovered. Furthermore, by adding the Fisher information production to the variational approach, we can also derive the Nelson theory using the Guerra and Morato version of the Lagrangian, but with the Yasue stochastic calculus. This clears up the mystery mentioned earlier. Our derivation brings in several new physical insights. First, the concept of forward and backward paths provides an intuitive physical picture of the dynamics of a diffusing particle. It gives more insight into the difference between classical and quantum mechanics in terms of the degrees of freedom needed to describe a system completely. Second, from a methodological perspective, the variational approach presented here can be naturally extended to include new degrees of freedom. It paves a possible way to derive the Dirac equation using the stochastic variational method once the theory is formulated in a relativistic setting. Third, the constraint of zero relative entropy imposed on the least action principle shows that the forward and backward paths, although both are needed for a complete description of a diffusing particle, are indistinguishable through measurement. This echoes the idea of interfering alternatives proposed in the path integral formulation of quantum mechanics. Lastly, the structure of the Lagrangian developed in this work shows that there is an intrinsic connection between the physical variables and quantities related to information measures. It is subtle to distinguish them when writing down a Lagrangian that consists of many terms. This suggests that it is possible to reconsider, from the information measure perspective, the meaning of certain terms in the Lagrangian density used in classical or quantum field theory.
Although the formulation presented here is mathematically equivalent to the Nelson theory, we believe the method and the conceptual insights brought up in this work can be valuable for future investigation on the foundation of quantum mechanics.
The paper is organized as follows. Section II briefly reviews Nelson's theory and the stochastic calculus of variations proposed by Yasue. In Section III we present the main results. An information measure, the relative entropy for the forward and backward path configurations, is introduced. We then prove that for a Markov diffusion process, the relative entropy must be zero. Using this as a constraint in the least action principle, we derive the Nelson theory in Section III B. In Section III C, another information measure, the Fisher information, is introduced. This allows us to recover Nelson's theory using the Guerra and Morato version of the Lagrangian. Section III D compares three sets of Lagrangians that can be used to derive Nelson's theory when coupled with proper information measures. In Section IV, we discuss the physical insights brought in by our formulation, point out the limitations of our derivation, and summarize the conclusions.

II. STOCHASTIC MECHANICS AND VARIATION
This section briefly reviews stochastic mechanics and emphasizes the formulation that is referred to in later sections. For convenience we adopt the mathematical notation in Nelson's works 11.

A. Nelson's Theory
The basic assumption of stochastic mechanics is that a system is a point particle that constantly undergoes Brownian motion. Let $\xi(t)$ be a Markov diffusion process 1 such that
$$d\xi^i(t) = b_+^i(\xi(t), t)\,dt + dW_+^i(t), \tag{1}$$
where $i = 1, 2, 3$ is the spatial index. $b_+(\xi(t), t)$ is a vector-valued function whose meaning will be given shortly. $W_+^i(t)$ is the standard, independent Wiener process with properties $E[dW_+^i(t)] = 0$ and $E[dW_+^i(t)\,dW_+^j(t)] = 2\nu\,\delta_{ij}\,dt$, with $dW_+(t)$ independent of $\xi(s)$ for $s \le t$. Here $E[\cdot]$ denotes the absolute expectation value. The diffusion coefficient $\nu$ is set to be inversely proportional to the mass of the system, $\nu = \hbar/2m$, where the constant $\hbar$ is determined later.
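As an illustration of the forward stochastic differential equation, the following minimal sketch integrates a one-dimensional diffusion with the Euler-Maruyama scheme, using increments of variance 2ν dt. The linear drift b_+(x) = −ωx, the parameter values, and all function names are assumptions chosen for the example and do not come from the text. For this drift the stationary variance of the exact process is ν/ω, which the simulation should approximately reproduce.

```python
import math
import random


def simulate_forward_sde(b_plus, nu, x0, dt, n_steps, rng):
    """Euler-Maruyama integration of d(xi) = b_plus(xi, t) dt + dW, E[dW^2] = 2 nu dt."""
    sigma = math.sqrt(2.0 * nu * dt)  # standard deviation of one Wiener increment
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * dt
        x = x + b_plus(x, t) * dt + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def stationary_variance_estimate(omega=1.0, nu=0.5, dt=0.01, n_steps=200_000, seed=0):
    """Sample variance of the path after discarding an initial transient.

    The drift -omega * x is a hypothetical choice made only for this example.
    """
    rng = random.Random(seed)
    path = simulate_forward_sde(lambda x, t: -omega * x, nu, 0.0, dt, n_steps, rng)
    tail = path[n_steps // 10:]
    mean = sum(tail) / len(tail)
    return sum((x - mean) ** 2 for x in tail) / len(tail)


if __name__ == "__main__":
    print("estimated variance:", stationary_variance_estimate())
    print("nu / omega        :", 0.5 / 1.0)
```

The estimate carries both statistical noise and a small time-discretization bias, so only approximate agreement with ν/ω should be expected.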
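Given that the stochastic process $\xi(t)$ is not differentiable, forward and backward derivatives are defined to replace the regular derivative. For a real-valued stochastic process $f(\xi(t))$ (also denoted as $F(t)$ for simpler notation), its forward derivative $D_+$ and backward derivative $D_-$ are defined as 10
$$D_+ F(t) = \lim_{\Delta t \to 0^+} E_t\left[\frac{F(t+\Delta t) - F(t)}{\Delta t}\right], \qquad D_- F(t) = \lim_{\Delta t \to 0^+} E_t\left[\frac{F(t) - F(t-\Delta t)}{\Delta t}\right], \tag{2}$$
where $E_t[\cdot]$ is a conditional expectation operator with respect to the configuration at time $t$. More precisely, the conditional expectation should be denoted as $E_{\xi(t)}[\cdot]$; we use $E_t[\cdot]$ for simpler notation. For instance, given $t' \neq t$,
$$E_t[f(\xi(t'))] = \int f(\xi')\,p(\xi', t' \mid \xi, t)\,d\xi', \tag{3}$$
where $p(\xi', t' \mid \xi, t)$ is the conditional probability density. With this definition, it becomes clear that $D_+\xi^i(t) = b_+^i(\xi(t), t)$ is the mean forward velocity. The diffusion process can also be written as
$$d\xi^i(t) = b_-^i(\xi(t), t)\,dt + dW_-^i(t), \tag{4}$$
where $dW_-^i(t)$ has the same properties as $dW_+^i(t)$ except that it is independent of $\xi(s)$ for $s \ge t$. By the definition in (2), $D_-\xi^i(t) = b_-^i(\xi(t), t)$, which is the mean backward velocity. With the definitions (1), (2), (4), one obtains the following explicit expressions, acting on smooth functions of $(\xi(t), t)$,
$$D_\pm = \frac{\partial}{\partial t} + b_\pm^i\nabla_i \pm \nu\Delta, \tag{5}$$
where $\Delta = \nabla_i\nabla_i$ is the Laplacian. Let $\rho(x, t)$ be the probability density of the diffusion process at position $x$ at time $t$. Nelson's theory assumes that $\rho(x, t)$ satisfies both the forward and backward Fokker-Planck equations. From the forward and backward Fokker-Planck equations, the continuity equation for $\rho(x, t)$ is obtained,
$$\frac{\partial\rho}{\partial t} + \nabla_i\left[\tfrac{1}{2}\left(b_+^i + b_-^i\right)\rho\right] = 0. \tag{6}$$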
More crucially, the following identity is also derived,
$$b_+^i - b_-^i = 2\nu\,\nabla_i\ln\rho. \tag{7}$$
A more elegant derivation of (7) is given based on Bayes' theorem 19 and the definitions (2) and (3), by taking $x = \xi(t)$ and . This approach has its advantage since Bayes' theorem is more general in probability theory and there is no need to depend on the Fokker-Planck equation.
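The difference between the forward and backward mean velocities can be seen numerically. The sketch below is an illustration under stated assumptions, not the derivation used in the text: it takes a one-dimensional stationary diffusion with the hypothetical drift b_+(x) = −ωx, whose stationary density is Gaussian with variance ν/ω, so that the standard relation b_+ − b_− = 2ν∇ln ρ of Nelson's theory predicts b_−(x) = +ωx. The conditional averages of the forward and backward difference quotients, restricted to samples in a narrow position window, estimate the two velocities. All names and parameter values are illustrative.

```python
import random


def conditional_mean_velocities(omega=1.0, nu=0.5, dt=0.01, n_steps=400_000,
                                x_lo=0.8, x_hi=1.2, seed=1):
    """Estimate D_+ xi and D_- xi conditioned on xi(t) lying in [x_lo, x_hi].

    Assumes the illustrative drift b_plus(x) = -omega * x (not from the paper).
    Returns (forward estimate, backward estimate, mean position in the window).
    """
    rng = random.Random(seed)
    sigma = (2.0 * nu * dt) ** 0.5  # std of one increment, since E[dW^2] = 2 nu dt
    x_prev = 0.0
    x_curr = x_prev - omega * x_prev * dt + rng.gauss(0.0, sigma)
    fwd_sum = bwd_sum = pos_sum = 0.0
    count = 0
    burn_in = n_steps // 10  # discard the transient before the stationary regime
    for step in range(n_steps):
        x_next = x_curr - omega * x_curr * dt + rng.gauss(0.0, sigma)
        if step >= burn_in and x_lo <= x_curr <= x_hi:
            fwd_sum += (x_next - x_curr) / dt  # forward difference quotient
            bwd_sum += (x_curr - x_prev) / dt  # backward difference quotient
            pos_sum += x_curr
            count += 1
        x_prev, x_curr = x_curr, x_next
    return fwd_sum / count, bwd_sum / count, pos_sum / count


if __name__ == "__main__":
    fwd, bwd, xbar = conditional_mean_velocities()
    print("mean position in window:", xbar)
    print("forward estimate (expect about -omega * x):", fwd)
    print("backward estimate (expect about +omega * x):", bwd)
```

The individual difference quotients are very noisy (their spread scales like the square root of 2ν/dt), so a large number of samples is needed before the opposite signs of the two averages become clearly visible.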
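The drift velocity is defined as $v^i = (b_+^i + b_-^i)/2$, and the so-called "osmotic velocity", a somewhat misleading name that will be discussed further later, is defined as
$$u^i = \tfrac{1}{2}\left(b_+^i - b_-^i\right) = \nu\,\nabla_i\ln\rho. \tag{8}$$
Multiplying both sides of the continuity equation (6) by $1/\rho$ and taking the gradient of both sides, one gets Nelson's first equation
$$\frac{\partial u^i}{\partial t} = -\nu\,\nabla_i\left(\nabla_j v^j\right) - \nabla_i\left(v^j u^j\right), \tag{9}$$
where repeated indices are summed. The mean acceleration is defined somewhat arbitrarily as
$$a^i = \tfrac{1}{2}\left[D_+D_- + D_-D_+\right]\xi^i = \tfrac{1}{2}\left[D_+ b_-^i + D_- b_+^i\right]. \tag{10}$$
Assuming Newton's second law holds, $F = ma = -\nabla\phi$, where $\phi$ is the external potential, and applying (5) to $b_-^i$ and $b_+^i$ in (10) leads to Nelson's second equation
$$\frac{\partial v^i}{\partial t} = -\frac{1}{m}\nabla_i\phi - v^j\nabla_j v^i + u^j\nabla_j u^i + \nu\Delta u^i. \tag{11}$$
Eqs. (9) and (11) give the complete description of the dynamics of a Brownian particle in the context of stochastic mechanics. By introducing a series of mathematical transformations, Eqs. (9) and (11) can be combined into a single linear PDE with a complex variable. Let $\nabla_i R = u^i/(2\nu)$, $\nabla_i S = v^i/(2\nu)$, and $\psi = e^{R+iS}$; one can verify that (9) and (11) are equivalent to
$$i\hbar\frac{\partial\psi}{\partial t} = \left[-\frac{\hbar^2}{2m}\Delta + \phi\right]\psi, \tag{12}$$
which is the Schrödinger equation. Given $\nabla_i R = u^i/(2\nu)$ and $u^i = \nu\nabla_i\ln\rho$, we have $\rho = e^{2R}$. Thus $\rho = |\psi|^2$. Born's rule is naturally derived rather than being postulated.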
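A quick numerical check can make the role of the osmotic velocity concrete. The sketch below is an illustrative example only: it assumes a one-dimensional harmonic potential φ = mω²x²/2 and its ground-state density ρ ∝ exp(−ωx²/(2ν)), for which v = 0. It computes u = ν d(ln ρ)/dx by finite differences and evaluates the residual of the stationary form of Nelson's second equation, 0 = −(1/m) dφ/dx + u du/dx + ν d²u/dx², which should vanish up to numerical error. The function names, step sizes, and parameter values are assumptions made for the example.

```python
def central_diff(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_diff(f, x, h):
    """Second-order central difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def osmotic_velocity(ln_rho, nu, x, h=1e-4):
    """u = nu * d(ln rho)/dx in one dimension."""
    return nu * central_diff(ln_rho, x, h)


def static_residual(ln_rho, phi_grad_over_m, nu, x, h=1e-3):
    """Residual of 0 = -(1/m) dphi/dx + u du/dx + nu d2u/dx2 for a state with v = 0."""
    def u(y):
        return osmotic_velocity(ln_rho, nu, y)
    return -phi_grad_over_m(x) + u(x) * central_diff(u, x, h) + nu * second_diff(u, x, h)


# Hypothetical parameters: harmonic oscillator ground state, nu = hbar / (2 m).
omega, nu = 2.0, 0.5


def ln_rho(x):
    # Logarithm of the ground-state density up to an additive normalization constant.
    return -omega * x * x / (2.0 * nu)


def phi_grad_over_m(x):
    # (1/m) dphi/dx for phi = m * omega^2 * x^2 / 2.
    return omega ** 2 * x


if __name__ == "__main__":
    x = 0.7
    print("u(x)     :", osmotic_velocity(ln_rho, nu, x), "expected", -omega * x)
    print("residual :", static_residual(ln_rho, phi_grad_over_m, nu, x))
```

The normalization constant of ρ is omitted because it drops out of the gradient of ln ρ.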
The description of a diffusing particle in stochastic mechanics differs from that in classical mechanics in that it requires two non-linear PDEs rather than just one. An extra degree of freedom is introduced through the forward and backward velocities. Note that (9) is essentially derived from the Fokker-Planck equations for ρ and the identity (7). This indicates that a component of the Schrödinger equation comes from probability theory itself, which we will explore later in terms of information quantities. For now, we turn back to the definition of the mean acceleration (10), which is rather arbitrary. It is desirable to justify (10) from a first principle. This motivates the development of a variational approach.

B. Stochastic Calculus of Variations
Multiple variational methods have been proposed to recover Nelson's theory based on the least action principle 12,17,18 or the conserved energy constraint 19. Yasue's variational method is of particular interest here, since we will use its stochastic calculus extensively; we give a brief overview of it below. A more rigorous description of the stochastic calculus can be found in Ref. 12.
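In Yasue's stochastic calculus, the three variables $(\xi, b_+, b_-)$, or equivalently $(\xi, D_+\xi, D_-\xi)$, are considered independent during the variation process. Thus, the Lagrangian is denoted as $L(\xi, D_+\xi, D_-\xi)$. Suppose a particle moves from point a (i.e., $\xi(t_a) = x_a$) to point b (i.e., $\xi(t_b) = x_b$); the stochastic action is defined as
$$J_{ab} = E\left[\int_{t_a}^{t_b} L(\xi(t), D_+\xi(t), D_-\xi(t))\,dt\right]. \tag{13}$$
Here $E[\cdot]$ denotes the absolute expectation along the path $\xi(t_a) \to \xi(t_b)$ and $t_a < t_b$. Now let the path $\xi$ vary but keep the end points fixed as $x_a$ and $x_b$. The variation itself is a stochastic process, denoted as $z(t)$ for $t_a < t < t_b$, with $z(t_a) = z(t_b) = 0$. The variation of the stochastic action due to the variation of $\xi$ is $\delta J_{ab} = J_{ab}(\xi + z) - J_{ab}(\xi)$. Let $\|\cdot\|$ denote the Euclidean vector norm. With a norm $\|z\|$ of the variation built from it, the process $\xi$ is called stationary if $\delta J_{ab} = o(\|z\|)$ for arbitrary variation $z(t)$, with the condition that $J_{ab}$ has finite energy. To obtain the explicit expression of $\delta J_{ab}$, one performs a Taylor expansion of $L(\xi, D_+\xi, D_-\xi)$,
$$\delta J_{ab} = E\left[\int_{t_a}^{t_b}\left(\frac{\partial L}{\partial\xi^i}z^i + \frac{\partial L}{\partial(D_+\xi^i)}D_+z^i + \frac{\partial L}{\partial(D_-\xi^i)}D_-z^i\right)dt\right] + o(\|z\|). \tag{16}$$
To proceed further, we need the following identity 11 for stochastic processes $f(t)$ and $z^i(t)$,
$$\frac{d}{dt}E\left[f(t)z^i(t)\right] = E\left[(D_+f(t))\,z^i(t) + f(t)\,D_-z^i(t)\right] = E\left[(D_-f(t))\,z^i(t) + f(t)\,D_+z^i(t)\right]. \tag{17}$$
Due to the property of the Markov process, the expectation operator $E[\cdot]$ and the integration over $t$ can be exchanged (see Appendix B). This allows us to apply (17) to (16); since $z(t_a) = z(t_b) = 0$, the boundary terms vanish and
$$\delta J_{ab} = E\left[\int_{t_a}^{t_b}\left(\frac{\partial L}{\partial\xi^i} - D_-\frac{\partial L}{\partial(D_+\xi^i)} - D_+\frac{\partial L}{\partial(D_-\xi^i)}\right)z^i\,dt\right] + o(\|z\|). \tag{18}$$
Since $z(t)$ is an arbitrary stochastic process, $\delta J_{ab} = o(\|z\|)$ if and only if
$$\frac{\partial L}{\partial\xi^i} - D_-\frac{\partial L}{\partial(D_+\xi^i)} - D_+\frac{\partial L}{\partial(D_-\xi^i)} = 0. \tag{19}$$
Defining the Lagrangian for the diffusion process $\xi(t)$ as
$$L_Y = \tfrac{1}{2}\left[\tfrac{1}{2}m\,(D_+\xi^i)(D_+\xi^i) + \tfrac{1}{2}m\,(D_-\xi^i)(D_-\xi^i)\right] - \phi(\xi), \tag{20}$$
(19) gives
$$\tfrac{m}{2}\left[D_+D_- + D_-D_+\right]\xi^i = -\nabla_i\phi. \tag{21}$$
This justifies the definition of the mean acceleration in (10). The choice of Lagrangian in (20) appears to be intuitive, as it assumes that the kinetic energy is the average of the kinetic energies associated with the forward and backward velocities. However, there are multiple variables related to velocities, such as $b_+$, $b_-$, $v$ and $u$. There is flexibility in defining the kinetic energy and consequently the Lagrangian. For instance, Guerra and Morato suggested a different Lagrangian 17,
$$L_G = \tfrac{1}{2}m\,b_+^i b_-^i - \phi = \tfrac{1}{2}m\left(v^iv^i - u^iu^i\right) - \phi. \tag{22}$$
Using $L_G$ with a different variation approach, they are able to recover Nelson's theory as well.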
It has been a mystery why two different Lagrangians lead to the same result 2. We will later show that both versions of the Lagrangian can be unified in our variational method. Another interesting point is that the Lagrangian in (20) can be split into two parts, one for the forward path configuration and the other for the backward path configuration. One might ask whether a dynamics equation can be derived for each path configuration using the variational calculus. These are the questions to be answered next.

A. Relative Entropy in Path Space
To search for the dynamics equations of the forward and backward path configurations, we first introduce the relative entropy of the forward and backward paths, as this in the end leads to a constraint in the stochastic variational principle we will propose. Note that the terms forward and backward do not refer to the direction of time. Instead, each refers to a path configuration that is described by either the forward mean velocity or the backward mean velocity.
Suppose a particle undergoes a Markov diffusion pro- ) is defined as a backward sample path. It is important to note that $\gamma^+_{ab}$ and $\gamma^-_{ab}$ share the same position variable $x(t)$ but differ in their velocity variables. If one plots the trajectories of $\gamma^+_{ab}$ and $\gamma^-_{ab}$ in four-dimensional space-time, both paths overlap. But if one plots the paths in the phase spaces defined next, they are different. In the phase space for a diffusing point particle, the particle is identified by Γ : We can envision the phase space Γ comprises two classical phase subspaces Γ + : The probability density within a space measure ..., x n , t n ). In other words, the regular probability measure on the path space is $dP = \rho\,\mathcal{D}x$. By the Markov property, the probability density for the forward path can be written as 3 so that $dP_+ = \rho_+\,\mathcal{D}x$. Similarly the probability density for the backward path is and $dP_- = \rho_-\,\mathcal{D}x$. The relative entropy of the forward path probability to the backward path probability, i.e., the Kullback-Leibler divergence, is
$$H(\rho_+\|\rho_-) = \int\rho_+\ln\frac{\rho_+}{\rho_-}\,\mathcal{D}x.$$
Similarly,
$$H(\rho_-\|\rho_+) = \int\rho_-\ln\frac{\rho_-}{\rho_+}\,\mathcal{D}x.$$
The following theorem gives an explicit expression of the relative entropy.
Theorem 1 For a Markov diffusion process $\xi(t)$ from $t_a \to t_b$, the relative entropy can be written as where $\rho(x(t))$ is the probability density at $x(t)$, and Applying (7) to (27) Proof of the Theorem and the Corollary is given in Appendix A.
One can immediately observe that $H(\rho_+\|\rho_-) + H(\rho_-\|\rho_+) = 0$. Then, by the non-negativity property of relative entropy 44, we have $H(\rho_+\|\rho_-) = H(\rho_-\|\rho_+) = 0$. Since $H(\rho_+\|\rho_-) = 0$ if and only if $\rho_+ = \rho_-$, the two path probability densities are the same, which is consistent with the results in Ref. 11. Corollary 1.2 gives two constraints in terms of relative entropy. These two constraints are in fact the same as the transport equations 17. The significance of Corollary 1.2 is that we recognize them as relative entropies. It says that the information encoded in the forward path configuration and in the backward path configuration is identical. The dynamics of the Brownian motion needs to comply with this constraint. If we define the total relative entropy as $H_{rel} = H(\rho_+\|\rho_-) + H(\rho_-\|\rho_+)$, then by Theorem 1, $H_{rel} = 0$ identically, and it does not result in a constraint. We will next see how these constraints play important roles in the stochastic variation.
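The argument above rests on two general properties of relative entropy: it is non-negative, and it vanishes only when the two distributions coincide. Hence two relative entropies that sum to zero must each be zero. The sketch below illustrates these properties for discrete distributions; it is a finite-dimensional analogue chosen for the example, not the path-space quantity defined in the text, and the distributions p and q are arbitrary.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence H(p || q) for discrete distributions.

    Terms with p_i = 0 contribute zero; q_i must be positive wherever p_i is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Two arbitrary example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

if __name__ == "__main__":
    print("H(p||p) =", relative_entropy(p, p))
    print("H(p||q) =", relative_entropy(p, q))
    print("H(q||p) =", relative_entropy(q, p))
    # Both directions are strictly positive when p != q, so their sum can
    # vanish only if the two distributions are identical.
    print("sum     =", relative_entropy(p, q) + relative_entropy(q, p))
```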

B. Variations with Relative Entropy Constraint
In this subsection, we show how the dynamics equations for the forward and backward mean velocities are derived through the variational method. From them the Schrödinger equation is recovered.
The Yasue version of the Lagrangian (20) already suggests that it can be split into two parts, a forward and a backward Lagrangian. Let
$$L_Y^\pm = \tfrac{1}{2}m\,(D_\pm\xi^i)(D_\pm\xi^i) - \phi(\xi), \tag{30}$$
so that $L_Y = (L_Y^+ + L_Y^-)/2$. The corresponding forward and backward actions are defined as
$$A^\pm_{ab} = E\left[\int_{t_a}^{t_b}L_Y^\pm\,dt\right], \tag{31}$$
where $E[\cdot]$ denotes the absolute expectation along the path $x_a \to x_b$. In Appendix B, we show that for a Markov diffusion process, one can swap the order of integration and expectation. Thus,
$$A^\pm_{ab} = \int_{t_a}^{t_b}E\left[L_Y^\pm\right]dt, \tag{32}$$
where $E[\cdot]$ now denotes the absolute expectation at time $t$. Now we combine the forward and backward actions with the constraints in Corollary 1.2 by defining the Lagrangian functional where $\beta$ is a Lagrange multiplier. The definition of the Lagrangian functional means that we seek to minimize the forward and backward actions subject to the relative entropy constraints. Substitute (29) and (30) into (33), Next we apply the stochastic calculus described in Section II B. Suppose we vary the diffusion process $\xi(t)$ to $\xi'(t) = \xi(t) + z(t)$ with $z(t_a) = z(t_b) = 0$, and demand that $\delta J^+_{ab} = o(\|z\|)$. Substituting (34) into (16), and noticing that $H_a$ and $H_b$ are fixed values, we have To proceed further, we need the following identity, which is derived from integration by parts. Let f be a smooth vector function of the diffusion process, Let . Substituting it into (35), and replacing δb i Applying (17) to (37), we obtain This is the PDE for the dynamics derived from the forward path. Repeating the same variational method on $J^-_{ab}$, we obtain the PDE for the dynamics of the backward path, 0 which is the same as (21) and leads to Nelson's second equation (11). (39) − (40) gives In Appendix D, we prove that if $\beta = \hbar$, (41) is the same as Nelson's first equation (9). Note that Nelson's equations are written in terms of the time dynamics of $v^i$ and $u^i$.
Here, however, using (7) and (5), we can express the time dynamics equation (40) in terms of $b^i_\pm$ as follows, Similarly the time dynamics of the backward mean velocity is derived from (39) as In summary, (39) and (40), or equivalently, (42) and (43), recover Nelson's theory. From them, the Schrödinger equation (12) can be derived through a similar set of mathematical transformations. The variational method presented here can be considered as an extension of Yasue's variational method 12. In fact, Yasue's variational method is a special case if we define the total Lagrangian functional as $J_{ab} = (J^+_{ab} + J^-_{ab})/2 = \int E[L_Y]\,dt + \beta H_{rel}$. But the total relative entropy $H_{rel}$ gives no constraint, as mentioned in Section III A. This is why there is no constraint term in Yasue's variational method.
It is worth noting that in our derivation of Nelson's equations, and consequently the Schrödinger equation, we rely only on a limited number of definitions and assumptions. Specifically, there are three key definitions: 1.) The definitions of the forward and backward derivatives (2), and of the forward and backward mean velocities; 2.) The identity (7), which in turn is derived from the definitions of the forward and backward mean velocities and Bayes' theorem; 3.) The definition of the Lagrangian $L^\pm_Y$ for both forward and backward path configurations. The important assumptions include: 1.) The system is constantly performing Markov diffusion; 2.) The variational principle of extremizing the action with constraints on relative entropy; 3.) The Lagrange multiplier $\beta$ is set to be the Planck constant $\hbar$. The third assumption is non-trivial and is an important feature of stochastic mechanics. Its possible justification has been discussed in Refs. 33,34. The rest of the definitions and assumptions can be naturally understood from classical physics and probability theory. The fact that the Schrödinger equation can be derived from this limited list of definitions and assumptions is striking. It manifests the original goal of stochastic mechanics, to explain quantum mechanics from classical physics and principles as much as possible 4. In addition, the variational method presented here gives richer physics, since it derives PDEs for both forward and backward path configurations. The relative entropy constraint requires that the information encoded in the forward and backward path configurations is identical, indicating a kind of information symmetry. The methodological implications of our approach will be discussed further in Section IV. Another interesting feature is that there is no need to depend on the forward and backward Fokker-Planck equations, which are needed in Nelson's original derivation 10 and in earlier variational approaches 12,[17][18][19] in order to derive the Schrödinger equation.
On the contrary, in our variational approach, one can actually derive the Fokker-Planck equations from (41) and (7), as shown in Appendix D.
4 It is likely that the list of definitions and assumptions needs to be expanded in order to recover other parts of quantum theory such as quantum measurement or quantum entanglement. These are future research topics.

C. Variations with Fisher Information
Now we turn to the question of why the Guerra and Morato version of the Lagrangian (22) can also lead to the same Nelson theory. We wish to derive the theory using a variational method similar to that in Section III B, i.e., through a functional that combines both the action and a certain information measure of the diffusion process. Clearly such an information measure cannot be the relative entropy, so our first step is to find out what the suitable information measure might be. It turns out that the answer is related to Fisher information.
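Traditionally, Fisher information is defined to measure the amount of information that a probability distribution of a random variable $x$ carries about an observable parameter $\theta$. Let $f(x, \theta)$ be the probability density function for $x$ conditioned on $\theta$; the Fisher information is defined as 29
$$I_\theta = \int f(x,\theta)\left(\frac{\partial\ln f(x,\theta)}{\partial\theta}\right)^2dx. \tag{44}$$
If we are interested in the Fisher information about the observable of position for a probability density function, we can set the parameter $\theta$ as the position variable itself, i.e., let $\theta = x^i$ and $f(x, \theta) = \rho(x)$; the Fisher information can then be rewritten as 30,31
$$I = \int\rho\,(\nabla_i\ln\rho)(\nabla_i\ln\rho)\,d^3x. \tag{45}$$
Since $\nabla\ln\rho = u/\nu$ and $\nu = \hbar/2m$, $I$ can be rewritten as 5
$$I = \frac{1}{\nu^2}E\left[u^iu^i\right] = \frac{4m^2}{\hbar^2}E\left[u^iu^i\right]. \tag{46}$$
We further define the Fisher information production along the diffusion path $\xi(t)$ from $t_a$ to $t_b$ as
$$I_{ab} = \nu\int_{t_a}^{t_b}I\,dt. \tag{47}$$
The constant $\nu$ multiplies $I$ so that $I_{ab}$ is dimensionless. With the help of Theorem 1, we have the following theorem.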
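As a concrete check of the position form of the Fisher information, the following sketch evaluates the integral of ρ (d ln ρ/dx)² numerically in one dimension. The Gaussian test density, the integration range, and the function names are assumptions made for the example; for a Gaussian of standard deviation σ the exact value is 1/σ².

```python
import math


def fisher_information(ln_rho, x_min, x_max, n=4001, h=1e-5):
    """Trapezoidal estimate of I = integral of rho * (d ln rho / dx)^2 dx.

    ln_rho is the logarithm of a normalized one-dimensional density; working
    with the logarithm avoids taking log of an underflowed tail value.
    """
    dx = (x_max - x_min) / (n - 1)
    total = 0.0
    for k in range(n):
        x = x_min + k * dx
        score = (ln_rho(x + h) - ln_rho(x - h)) / (2.0 * h)  # d ln rho / dx
        weight = 0.5 if k in (0, n - 1) else 1.0             # trapezoid end weights
        total += weight * math.exp(ln_rho(x)) * score * score
    return total * dx


def gaussian_ln_rho(sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    def ln_rho(x):
        return -x * x / (2.0 * sigma ** 2) - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return ln_rho


if __name__ == "__main__":
    sigma = 2.0
    print("numerical I  :", fisher_information(gaussian_ln_rho(sigma), -20.0, 20.0))
    print("exact 1/s^2  :", 1.0 / sigma ** 2)
```

A narrower density carries more information about position, which is reflected in the 1/σ² scaling.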
Theorem 2 For a Markov diffusion process $\xi(t)$ from $t_a \to t_b$, the Fisher information production can be written as
5 It is worth noting that the Fisher information defined here is also related to the Bohm quantum potential. Let $Q = -\hbar^2\nabla_i\nabla_i\sqrt{\rho}/(2m\sqrt{\rho})$ be the Bohm quantum potential; it can be shown that 32 $E[Q] = \hbar^2I/(8m)$. The Bohm potential is considered to be nonlocal. The non-locality issue of stochastic mechanics is further discussed in Section IV.D.
Proof of Theorem 2 is given in Appendix E. The Guerra and Morato version of the Lagrangian is given as 17 The corresponding actions can be defined as However, it is verified 11 . Thus the two expressions in (50) are equivalent. Furthermore, one can observe that the difference between the forward action defined in (50) and the forward action $A^+_{ab}$ defined in (31) is a term related to $E[\nabla_ib^i_+]\,dt$. This indicates that the difference is related to the Fisher information production. Thus, instead of minimizing the actions alone, we seek to minimize the actions and the Fisher information production together, with the same relative entropy constraints. With this consideration, we define the Lagrangian functional as where $\beta$ is the Lagrange multiplier. Here, we seek to minimize the combination of actions and Fisher information production with the relative entropy constraints for the forward and backward paths, respectively. Substituting (29), (48), (50) into (51), we have The Lagrange multiplier has been set as $\beta = \hbar$ in Section III B. Let $\alpha = \hbar/2$; the forward functional is then simplified to 6 The forward functional defined in (51) differs from $J^+_{ab}$ defined in (33) only by a constant term 2 (H b − H a ). Applying the stochastic calculus described in Section II B by varying the diffusion process $\xi(t)$ to $\xi(t) + z(t)$ with $z(t_a) = z(t_b) = 0$, the variation of the forward functional can be calculated as (54) Comparing (54) to (35), we see that the variation of the functional defined in (51) is the same as $\delta J^+_{ab}$. The subsequent calculations in (37)-(38) are applicable here and result in the same PDE (39) for the dynamics of the forward path. Similarly, one can derive the expression for J Again, the backward functional defined in (51) differs from $J^-_{ab}$ defined in (33) only by a constant 2 (H b − H a ). Variations of both functionals give the same PDE (40) for the backward path. Thus, even though the functionals in (33) and (51) are defined quite differently, the variations over both Lagrangian functionals converge to the same outcomes.
Essentially, compared to $L^\pm_Y$, the Guerra and Morato version of the Lagrangian $L^\pm_G$ includes a term related to the Fisher information production. We define the corresponding Lagrangian functional (51) by reversing the effect of this term. Then, applying the same variational method with the relative entropy constraint, we can also recover Nelson's theory.

D. Effective Lagrangian
If both Lagrangians $L^\pm_Y$, defined in (30), and $L^\pm_G$, defined in (49), can lead to the same PDEs for the forward and backward paths, one may ask why there can be multiple choices of Lagrangian for the same physical process. In classical mechanics, one typically writes down the kinetic energy $K$ and defines the Lagrangian as $L = K - \phi$, where $\phi$ is the potential energy. But in stochastic mechanics, there is no first principle that can guide the definition of the kinetic energy, since there are multiple variables related to velocity.
A further complication is that the Lagrangian functional in the variational method consists of the action, which is a functional of the Lagrangian, and the relevant constraints from information measures. Technically one can choose an effective Lagrangian that factors in the constraints, as long as the variation of the Lagrangian functional without the constraint gives the desired dynamics PDEs. For instance, define The corresponding actions are Then, by applying the same variational method as in the previous section to minimize the actions $A^\pm_{ab}$ without any other constraints, one can obtain the same PDEs as (39) and (40). Table 1 summarizes that, for the three sets of Lagrangians, one can apply the variational calculus to derive the same Nelson theory by combining them with the correct choices of Fisher information production and (or) relative entropy constraint.
By taking the absolute expectation of these three forms of Lagrangian, we may obtain additional insight. Re- Using identity (36), we can express the expectations of the three forms of Lagrangian in terms of $L_Y$, $v$ and $u$, Yasue's initial variational approach using $L_Y$ as the Lagrangian does not require a constraint 12. (58) shows that Choosing this form of Lagrangian, the variational method does not require a constraint to derive the Nelson theory, consistent with Yasue's approach. However, using $L^\pm_E$ as the Lagrangian gives the advantage of distinguishing the dynamics of the forward and backward path configurations, which is missing in Yasue's approach. The difference between $E[L^\pm_Y]$ and $E[L^\pm_E]$ is the term $E[v^iu^i]$, which is related to the rate of entropy production. Thus, to use $L^\pm_Y$ as the Lagrangian, the relative entropy constraint is introduced in the variational method. On the other hand, the difference between which is related to Fisher information. Therefore, the Fisher information production is needed in the Lagrangian functional.

A. Degrees of Freedom
In previous sections, we recast the theory of stochastic mechanics into two PDEs for the forward and backward path configurations, using a variational approach that combines the least action principle with information measure constraints. What are the physical implications of this derivation? The concept of forward and backward paths gives a physical picture of the dynamics of a diffusing particle in stochastic mechanics. Recall that the term forward or backward does not refer to the direction of time. Instead, it refers to the path configurations $\gamma^{\pm}_{ab}$, which are described by the forward and backward mean velocities, respectively. This implies that extra degrees of freedom are needed to completely describe the dynamics of the system. For the time being, we assume the system under study is a point particle in three-dimensional space. As mentioned earlier, we can envision the phase space Γ as comprising two classical phase subspaces $\Gamma^{\pm}$, as shown in Fig. 2b. The path configurations $\gamma^{\pm}_{ab}$ follow their own dynamics in $\Gamma^{\pm}$, respectively, but they are connected by the relative entropy constraint. The overall dynamics is described by the Lagrangian functional defined in (33). Applying the variational method gives (39) for the forward path configuration and (40) for the backward path configuration.

FIG. 2: (a) In classical mechanics, a point particle moves from point a to point b with a path configuration $\gamma_{ab}$ determined by the least action principle. (b) In stochastic mechanics, diffusion of a point particle is described with forward and backward path configurations $\gamma^{\pm}_{ab}$. Each path configuration follows its own stochastic differential equation, but the two are connected through the relative entropy constraint. Combining the two PDEs results in the Schrödinger equation. (c) Conjecture: By introducing rotational degrees of freedom $\sigma^{\pm}$, there are four path configurations $\gamma_i$ in the phase spaces $\Gamma_i$ (i = 1, 2, 3, 4). Can the variational approach developed here lead to the Dirac equation for spin once it is extended into the relativistic framework?
The Schrödinger equation essentially combines two non-linear PDEs into a linear PDE through a mathematical transformation. In this perspective, the wave function in quantum mechanics is just a mathematical tool that makes the calculation much easier. But the underlying physics is stochastic mechanics, which demands extra degrees of freedom to completely describe a diffusing point particle.
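For orientation, the transformation mentioned above can be sketched in the standard textbook notation of Nelson's theory. This notation is an assumption here and may differ from that of (39) and (40): v is the current velocity, u the osmotic velocity, ν = ħ/2m the diffusion coefficient, and S a phase function; the forward and backward mean velocities are v + u and v − u in this convention. The coupled equations for ρ and v are non-linear, while the resulting equation for ψ is linear.

```latex
% Textbook form of Nelson's equations (assumed notation, not the paper's (39)-(40))
\begin{align}
u &= \nu \nabla \ln \rho, \qquad v = \frac{1}{m}\nabla S, \qquad \nu = \frac{\hbar}{2m},\\
\frac{\partial \rho}{\partial t} &= -\nabla\cdot(\rho\, v),\\
\frac{\partial v}{\partial t} &= -\frac{1}{m}\nabla\phi - (v\cdot\nabla)v + (u\cdot\nabla)u + \nu \nabla^{2} u,\\
\psi &= \sqrt{\rho}\, e^{iS/\hbar}
\;\Longrightarrow\;
i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^{2}}{2m}\nabla^{2}\psi + \phi\,\psi .
\end{align}
```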
By replacing the classical velocity with forward and backward mean velocities, one obtains two PDEs that lead to the Schrödinger equation. We may naturally ask whether this approach can be extended further. Although the assumption that the system is a point particle with both forward and backward mean velocities is a step forward compared to describing the system as a classical point particle with a single velocity, it is still an oversimplification. Suppose we add two new degrees of freedom, $\sigma^{+}$ and $\sigma^{-}$, to describe the clockwise and counterclockwise rotation of the system, respectively. The total phase space Γ is expanded accordingly. In a similar way, we envision that Γ comprises four classical phase subspaces $\Gamma_i$ (i = 1, 2, 3, 4), as shown in Fig. 2c. There are four path configurations $\gamma_i$ (i = 1, 2, 3, 4) in these phase subspaces. We would expect that more constraints will be imposed, and that the variational method can still be applied to give four PDEs, one for each path configuration. Will this lead to, or recover, a more advanced quantum theory, possibly the Dirac equation for spin? To this end, one will first need to construct a relativistic stochastic mechanics [24][25][26], and also extend the variational approach to a relativistic setting. This is an interesting conjecture for future research.
Furthermore, other degrees of freedom can be added for a more sophisticated physical modeling of a quantum system.

B. Probability for Indistinguishable Alternatives
The constraint of zero relative entropy that we impose in the variational principle essentially requires that the probabilities for the forward and backward paths be identical, i.e., $\rho^{+} = \rho^{-}$. By measuring the probability that a particle diffuses from point a to point b, one cannot tell whether the underlying paths are forward or backward paths. In other words, we can view the two path configurations as indistinguishable alternatives that are needed to complete the description of the diffusion dynamics. It is not clear whether it is even possible to design an experiment to distinguish the two path configurations. But it is clear that, without a specific measurement, the forward and backward paths cannot be distinguished. They are not exclusive alternatives.
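The statement that zero relative entropy forces the two probabilities to coincide can be illustrated with a toy discrete example. The sketch below uses the standard discrete relative entropy D(p||q) = Σ p ln(p/q); the three-point distributions and all names are hypothetical, and the path-space relative entropy used in the variational principle is not reproduced here.

```python
import math

def relative_entropy(p, q):
    """Discrete relative entropy D(p||q) = sum_i p_i * ln(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / qi)
    return total

# Hypothetical toy distributions standing in for the forward/backward densities.
rho_forward = [0.2, 0.5, 0.3]
rho_backward_same = [0.2, 0.5, 0.3]
rho_backward_diff = [0.3, 0.4, 0.3]

d_same = relative_entropy(rho_forward, rho_backward_same)
d_diff = relative_entropy(rho_forward, rho_backward_diff)
print("identical distributions:", d_same)
print("different distributions:", d_diff)
```

The relative entropy vanishes for the identical pair and is strictly positive for the differing pair, which is why imposing it as a zero-valued constraint amounts to requiring equal probabilities.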
A similar idea was proposed in the Feynman path integral formulation of quantum mechanics 40. In the path integral, a particle can move from point a to point b through an infinite number of alternative paths. These paths are indistinguishable, but they must all be counted in order to have a complete description of the quantum behavior of the particle. The law for computing the probability of the particle moving from point a to point b is not simply to add up the probability of each path. Instead, one first adds up the probability amplitudes of all paths, then takes the modulus square of the sum. According to Feynman 40,41, the law for computing the probability differs from the classical law because the alternatives are interfering alternatives instead of exclusive alternatives. The concept of interfering alternatives is unique to quantum mechanics.
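The difference between the two laws of combining alternatives can be made concrete with two paths only. The following is a minimal sketch with hypothetical, unnormalized amplitudes; it compares summing probabilities (exclusive alternatives) with summing amplitudes before taking the modulus square (interfering alternatives).

```python
import cmath

def exclusive_probability(amplitudes):
    """Classical rule: add the probabilities |a_k|^2 of each alternative."""
    return sum(abs(a) ** 2 for a in amplitudes)

def interfering_probability(amplitudes):
    """Path-integral rule: add the amplitudes first, then take |.|^2."""
    return abs(sum(amplitudes)) ** 2

# Two hypothetical paths with equal magnitude and a relative phase of 0 or pi.
a = 0.5
in_phase = [a, a]
out_of_phase = [a, a * cmath.exp(1j * cmath.pi)]

p_exclusive = exclusive_probability(in_phase)
p_constructive = interfering_probability(in_phase)
p_destructive = interfering_probability(out_of_phase)
print(p_exclusive, p_constructive, p_destructive)
```

The exclusive rule gives the same value regardless of the relative phase, whereas the interfering rule ranges from complete cancellation to twice the exclusive value.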
In our formulation, there is no concept of probability amplitude. Thus, we cannot straightforwardly use the same law for calculating the probability as in the path integral. But there is still a similarity, in that the forward and backward paths are non-exclusive alternatives. The two paths interfere with each other, as shown by the two PDEs (39) and (40). The probability that a particle diffuses from point a to point b is not simply the sum of the probabilities of each path. The characteristic of interfering alternatives that is unique to quantum mechanics is manifested here, but in the context of a stochastic process.

C. Physical Variable versus Information Measure
As pointed out in Section III D, using the same stochastic calculus of variations, one can derive the Nelson theory, and thus the Schrödinger equation, from three different forms of Lagrangian. The action derived from each form of Lagrangian needs to be combined with appropriate constraints related to information measures in the variational method. Effectively, one can construct a Lagrangian without being aware of the corresponding constraint, as long as the effective Lagrangian leads to the correct form of the Euler-Lagrange equation. This is possible because there is no clear principle to guide the construction of the correct Lagrangian, and some of the constraints may not be obvious to recognize. This is the case for $L^{\pm}_{E}$. The terms in the Lagrangian can come from a proper choice of physical variables, such as kinetic energy and potential energy, or from information measures, such as entropy production rate and Fisher information. In the case of stochastic mechanics, the Planck constant appears to be the Lagrange multiplier that converts an information-measure-related term into one that acts like a physical variable term in the variation process.
In both classical and quantum field theory (QFT), the first step in developing a field theory is to construct a proper Lagrangian density. This important first step is a creative one, allowing trial and error. As long as the Lagrangian density leads to the correct form of the Euler-Lagrange equation, it is accepted as an appropriate form. Many complicated Lagrangian density functions constructed in QFT comprise many terms, and some of the terms are introduced intuitively. If we extend the variational principle developed here for stochastic mechanics to field theory, we can reasonably ask the following question for future investigation: From the perspective of stochastic mechanics, do some of the terms in the Lagrangian density functions of classical or quantum field theory actually reflect certain information-related constraints?
At the philosophical level, the structure of the Lagrangian developed here implies that a physical theory can embrace both an ontological component and an epistemic component. A physical theory essentially describes how well a physical phenomenon can be observed. This observation echoes ideas raised previously by other authors 27,28.

D. Comparison with the Earlier Variational Methods
The variational principle presented here utilizes the stochastic calculus of variations from Yasue 12 and significantly extends Yasue's results. In particular, we are able to derive the PDEs for both forward and backward paths. Yasue's derivation is a special case of our formulation. Guerra and Morato 17 proposed a different variational approach in which the end point of the path varies. Again, they do not derive different PDEs for the forward and backward paths. Unlike our approach, neither Yasue's nor Guerra and Morato's variational approach involves information measures.
Ref. 18 introduces a saddle-point entropy production principle, where one seeks to extremize the entropy production in the diffusion process with a constraint that essentially comes from the Fokker-Planck equation. In this work, we recognize that the equation on the entropy production is actually a constraint on the relative entropy in the path space, with no need to depend on the Fokker-Planck equation. Our approach is more intuitive and, more importantly, we are able to derive PDEs for both forward and backward paths, which is not shown in Ref. 18.
A more recent development of variation-based entropic dynamics is described in Ref. 19, where the probability distribution is derived from a variation of relative entropy. Such a probability distribution implies a trajectory of Brownian motion. Then, a variation based on the principle of conservation of energy, together with the Fokker-Planck equation, gives the Schrödinger equation. This approach requires two variation processes and does not give the dynamics of both forward and backward paths. Furthermore, the relative entropy defined in Ref. 19 is very different from the relative entropy in the present work. The method of variation with relative entropy has been investigated in many other contexts. For instance, the original Schrödinger Bridge Problem was later reformulated as a problem of minimizing relative entropy 6, which leads to a connection to mass transportation theory 8. In this reformulation, the relative entropy is defined on the bridging path probability measure relative to a reference path probability measure. Variation over the dynamic probability measure to minimize the relative entropy gives the solution of the Schrödinger Bridge Problem 8. However, the way relative entropy is used in our formulation is very different. Here, the relative entropy is defined based on the probability densities for both forward and backward path configurations, and is used as a constraint in the least action principle.
Variation of the combination of action and Fisher information production to derive the Schrödinger equation was also studied in Refs. 30,31. However, the derivation was primarily mathematical rather than based on a physical dynamics model such as Brownian motion. Thus, it did not provide the dynamics of both forward and backward paths. Nevertheless, the justifications given in Ref. 31 for choosing Fisher information in the variational method may well be applicable to the stochastic variational approach here.
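For readers less familiar with Fisher information, a small numerical sketch may help. It assumes the common one-dimensional definition I = ∫ (ρ')² / ρ dx for a probability density ρ, which may differ by constant factors from the Fisher information production term used in this paper, and checks it against the known value 1/σ² for a Gaussian of standard deviation σ. All function names are hypothetical.

```python
import math

def gaussian(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def fisher_information(density, lo, hi, n=20000):
    """Approximate I = integral of (rho')^2 / rho dx on [lo, hi].

    Uses a midpoint rule for the integral and a central finite
    difference for the derivative rho'.
    """
    dx = (hi - lo) / n
    h = 1e-5
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        rho = density(x)
        if rho > 0.0:
            drho = (density(x + h) - density(x - h)) / (2.0 * h)
            total += drho * drho / rho * dx
    return total

sigma = 0.5
fi = fisher_information(lambda x: gaussian(x, sigma), -8.0 * sigma, 8.0 * sigma)
print("numerical:", fi, "analytic 1/sigma^2:", 1.0 / sigma ** 2)
```

A narrower density has a larger Fisher information, which is the sense in which this quantity measures how sharply the density is localized.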
In summary, the novelty of the variational method proposed here comes from its capability of deriving PDEs for both forward and backward paths, and from the fruitful interplay between actions and information measures in the Lagrangian functional. These results enhance our understanding of stochastic mechanics.

E. Limitations
The rigor of the derivation presented here depends on the stochastic calculus described by Yasue 12. Thus, it is only as rigorous as the stochastic calculus can be. One limitation to point out is the calculation in (16). There, the variation of $J_{ab}$ is calculated as $\delta J_{ab} = E[\int L(\xi + z)\,dt] - E[\int L(\xi)\,dt] = E[\int (L(\xi + z) - L(\xi))\,dt]$. Essentially, this assumes that the variation due to the change of probability density along the path is negligible compared to the variation due to the change of the integrand, so that the two expectation operators are considered the same. A rigorous treatment to confirm this is desirable, even though this stochastic calculus has been successfully applied to derive Nelson's theory, Noether's theorem 12, the results in the present work, and many other applications [14][15][16].
The origin of the randomness of the diffusion process is not investigated here. One explanation is that the space metric tensor is itself given stochastically with some appropriate distribution function 21. A point particle in such a stochastic metric space is described as Brownian motion. However, modeling a system as a point particle is an oversimplification. As mentioned earlier, an intuitive extension is to introduce rotational degrees of freedom into the variation and investigate what new physical dynamics can be derived. In addition, our construction here is limited to Markov diffusion processes. The connection between quantum mechanics and a more general reciprocal stochastic process, called the Bernstein-Markov process, was investigated in Ref. 14, which confirms not only the connection between the Bernstein process and Nelson's theory, but also the connection between the Bernstein process and the imaginary-time version of the Schrödinger equation. Whether the variational principle proposed in the present work can be applied to the Bernstein reciprocal process is an interesting topic for further research.
Stochastic mechanics has its own limitation in explaining the locality problem. Nelson has described a scenario 11 in which the behavior of a particle described by Markovian stochastic mechanics depends on the process of a second particle that is uncoupled and separated arbitrarily far away. This scenario led him to believe that stochastic mechanics is not a tenable physical theory of reality 11. However, it is not clear whether stochastic mechanics should be regarded as a "non-local" hidden variable theory that violates the Bell inequalities. Recent investigation shows that the derivation of the Bell inequalities depends on three assumptions: outcome independence, statistical locality, and measurement independence 45. Violation of the Bell inequalities just means that at least one of the three assumptions does not hold. The violation does not necessarily imply that the theory is non-local. Instead, it means the statistical correlation is non-separable 46. The non-locality issue that concerned Nelson in stochastic mechanics may well be due to a similar non-separability of statistical correlation, as exhibited in quantum correlations through entanglement.
There are other open questions on how stochastic mechanics can explain quantum phenomena; readers are referred to the well-known review papers 3,4 and a recent paper 26. One of the subtleties is to what extent stochastic mechanics can rely on classical probability theory to explain phenomena specific to quantum mechanics. For instance, multi-time correlations cannot be straightforwardly calculated using classical probability theory, since it predicts results different from those of quantum mechanics 42. Instead, to obtain the correct multi-time correlations, one should take into consideration that the diffusion process is reset after a measurement 43.
The challenges and limitations mentioned in this subsection will continue to inspire future research that brings new physical insights into stochastic mechanics. The present work does not intend to address these open issues in stochastic mechanics. But we believe the variational principle described here gives the new insight that stochastic mechanics is coupled with information measure constraints, such as Fisher information, which may give rise to the property of non-separability. It might therefore help explain the locality issue and the entanglement phenomenon. This is currently under further investigation.

F. Conclusions
A new variational principle is proposed here to derive the dynamical equations for the forward and backward paths, which, when combined, result in the non-relativistic Schrödinger equation. According to this principle, an appropriate Lagrangian must be chosen, together with constraints related to information measures such as relative entropy or Fisher information. The derivation method is based on the stochastic calculus. We show that three different forms of Lagrangian can lead to the same Nelson theory. The advantages of this variational principle over others are its ability to resolve the issue of multiple Lagrangians, and the fact that the Fokker-Planck equation is derived as a by-product rather than relied upon when deriving the Schrödinger equation.
The variational principle developed in this work not only mathematically recovers Nelson's stochastic mechanics and the non-relativistic Schrödinger equation, but also brings new insights into the underlying physics. First, the concept of forward and backward paths provides an intuitive physical picture of the dynamics of a diffusing particle. It gives more insight into the difference between classical and quantum mechanics in terms of the degrees of freedom needed to describe a system completely. Methodologically, one can include new degrees of freedom to expand the theory and derive more advanced theories. It is natural to conjecture that this idea can potentially be generalized to derive the Dirac equation if rotational degrees of freedom are included in the stochastic variation. The constraint of zero relative entropy imposed on the least action principle shows that the forward and backward paths, although both are needed for a complete description of a diffusing particle, are indistinguishable through measurement. This echoes the idea of interfering alternatives proposed in the path integral formulation of quantum mechanics. Finally, when constructing a Lagrangian, it is a subtle matter to distinguish terms coming from physical variables, such as kinetic energy and potential energy, from terms coming from constraints on certain information measures, such as relative entropy and Fisher information. It is natural to speculate that, if stochastic mechanics is extended to field theory, some of the Lagrangian terms in the field theory may turn out to actually reflect certain information constraints.

ACKNOWLEDGMENTS
The author would like to thank the anonymous referees for their valuable comments, which help to clarify the physical implications of the present work, and the connection of the present work with the history of the stochastic mechanics.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available within the article.

Appendix A: Proof of Theorem 1
To prove Theorem 1, one substitutes (23) and (24) into (25) and rearranges the terms inside the logarithm function. The first term, labeled $T_1$, is then expanded. Applying two identities, $T_1$ is simplified as $T_1 = \int \rho(x_1)\ln\rho(x_1)\,dx_1 - \int \rho(x_n)\ln\rho(x_n)\,dx_n = H(x_n) - H(x_1)$.

Appendix B: Swapping order of taking expectation and integration
In the definition of action in (31), E[·] is understood as the absolute expectation along the path $x_a \to x_b$. Given the property of a Markov process, the probability density is defined in (23). Thus,

$$A^{+}_{ab} = \int \rho(x_1)\,p(x_2|x_1)\,p(x_3|x_2)\cdots p(x_n|x_{n-1}) \Big[\sum_i L^{+}(x_i)\,\Delta t\Big]\, dx_1 \cdots dx_n = \sum_i \Delta t \int \rho(x_1)\,p(x_2|x_1)\,p(x_3|x_2)\cdots p(x_n|x_{n-1})\, L^{+}(x_i)\, dx_1 \cdots dx_n .$$

For each integration term in the summation over i, we apply the identities (A3) to the integration over $dx_1\,dx_2 \cdots dx_n$ except $dx_i$, and end up with

$$A^{+}_{ab} = \sum_i \Delta t \int \rho(x_i)\, L^{+}(x_i)\, dx_i = \int E[L^{+}]\, dt .$$

Here E[·] denotes the absolute expectation over the configuration at time t. Thus, the order of integration and expectation can be swapped, as long as the meaning of the expectation is understood correctly. The same argument goes for $A^{-}_{ab}$.
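The swap of expectation and time integration argued above can be checked numerically on a toy model. The sketch below assumes a hypothetical three-state Markov chain and a hypothetical state-dependent function standing in for the Lagrangian (the Lagrangians in this paper also depend on velocities, which is not modeled here). It compares the expectation taken over all discrete paths with the time sum of single-time expectations.

```python
import itertools

# Hypothetical toy model: 3 states, initial density rho1, transition matrix P.
rho1 = [0.5, 0.3, 0.2]
P = [[0.7, 0.2, 0.1],   # P[a][b] = p(b | a); each row sums to 1
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]]
L = [1.0, -2.0, 0.5]    # stand-in "Lagrangian" value for each state
n_steps = 4
dt = 0.1

def path_expectation():
    """E over whole paths of sum_i L(x_i) dt, using the Markov path density."""
    total = 0.0
    for path in itertools.product(range(3), repeat=n_steps):
        prob = rho1[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        total += prob * sum(L[x] * dt for x in path)
    return total

def marginal_expectation():
    """sum_i dt * E_i[L], with E_i taken over the marginal density at step i."""
    rho = list(rho1)
    total = 0.0
    for _ in range(n_steps):
        total += dt * sum(r * l for r, l in zip(rho, L))
        # Propagate the marginal one step: rho_next(b) = sum_a rho(a) p(b|a).
        rho = [sum(rho[a] * P[a][b] for a in range(3)) for b in range(3)]
    return total

print(path_expectation(), marginal_expectation())
```

The two numbers agree up to floating-point rounding because summing the path density over every variable except $x_i$ leaves the marginal density at step i, which is the discrete analogue of the integration step in the argument above.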