Data Driven Learning of Mori-Zwanzig Operators for Isotropic Turbulence

Developing reduced-order models for turbulent flows, which contain dynamics over a wide range of scales, is an extremely challenging problem. In statistical mechanics, the Mori-Zwanzig (MZ) formalism provides a mathematically formal procedure for constructing reduced-order representations of high-dimensional dynamical systems, where the effects of the unresolved dynamics are captured in the memory kernel and orthogonal dynamics. Turbulence models based on the MZ formalism have been scarce due to the limited knowledge of the MZ operators, which originates from the difficulty in deriving MZ kernels for complex nonlinear dynamical systems. In this work, we apply a recently developed data-driven learning algorithm, which is based on Koopman's description of dynamical systems and Mori's linear projection operator, to a set of fully-resolved isotropic turbulence datasets to extract the Mori-Zwanzig operators. With data augmentation using known turbulence symmetries, the extracted Markov term, memory kernel, and orthogonal dynamics are statistically converged and the Generalized Fluctuation-Dissipation Relation can be verified. The properties of the memory kernel and orthogonal dynamics, and their dependence on the choice of observables, are investigated to address the modeling assumptions that are commonly used in MZ-based models. A series of numerical experiments is then constructed using the extracted kernels to evaluate the memory effects on predictions. Results show that the prediction errors are strongly affected by the choice of observables and can be further reduced by including the past history of the observables in the memory kernel.


I. INTRODUCTION
Direct and accurate computation of complex nonlinear dynamical systems in physical sciences and engineering applications, such as turbulent flows, is in general prohibitively expensive due to the existence of a wide range of length and time scales. This challenge has motivated the development of reduced-order models (ROMs) to achieve fast and efficient solutions in practical applications. In the field of turbulence simulations, Reynolds-Averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES) have been widely adopted as alternatives to Direct Numerical Simulation (DNS).
The computational complexity is reduced through coarse-graining, which effectively narrows the range of scales that needs to be resolved 1 . High-order moments and their dynamical equations, which account for the effects from unresolved scales, naturally emerge during the coarse-graining process. The high-order moments are then truncated, and surrogate models, commonly referred to as subgrid-scale models, have been developed to account for the contributions of the high-order moments. Such surrogate models are usually developed under the assumptions of scale similarity and homogeneity, and the resulting simplified models are usually Markovian in nature. Kraichnan 2,3 introduced the Direct Interaction Approximation (DIA) as a non-Markovian closure model for turbulence statistics. The evolution of a new quantity, the infinitesimal response function, is employed to model the response of turbulent flows to infinitesimal perturbations. However, this method suffers from certain theoretical (inability to reproduce inertial-range behavior) and practical (difficulty in calculating long-time statistics) weaknesses 4 . The Mori-Zwanzig (MZ) formalism was originally developed in statistical physics for constructing low-dimensional, non-Markovian models for high-dimensional nonlinear dynamical systems 5,6 . It provides a mathematically exact procedure for developing reduced-order models for high-dimensional systems. The resulting lower-dimensional model, referred to as the Generalized Langevin Equation (GLE), consists of a Markovian term, a memory term, and a noise term. The GLE as derived in the framework of the MZ formalism is an exact representation of the dynamics of the reduced-order model. In the context of RANS and LES modeling, the truncated high-order moments in the high-dimensional closed dynamical systems can be accounted for in the memory integral and noise terms under the Mori-Zwanzig formalism.
However, in most previous approaches to subgrid-scale modeling, the evolution equations are treated as Markovian.
Modeling turbulence under the MZ formalism is extremely challenging due to the limited understanding of the memory kernels and orthogonal dynamics. Obtaining the structure of the memory kernel requires the solution of the unresolved orthogonal dynamics, which is itself another high-dimensional nonlinear dynamical system. Theoretically obtaining the memory kernel is a very difficult task, especially for complex nonlinear systems. Despite this difficulty, the optimal prediction framework developed in Chorin, Hald, and Kupferman 7 , Givon, Kupferman, and Hald 8 , and Chorin and Stinis 9 provided a formal procedure for analyzing the memory effects and developing surrogate models based on the MZ formalism. One of the major modeling difficulties is determining the structure and the length of the memory integral. The widely used t-model approximates the memory length as equal to the simulation time t and has been used by Bernstein 10 , Hald and Stinis 11 , and Chandy and Frankel 12 for predictions of the Burgers, Euler, and Navier-Stokes (N-S) equations. Parish and Duraisamy 13 later proposed a dynamic-τ model to approximate the memory length using the similarity between two coarse-graining levels. The renormalized MZ models 14,15 embed a larger class of models that share similar functional forms with the MZ formalism but use different coefficients to approximate the memory integral. Stinis 16 and Parish and Duraisamy 17 further used a finite-order expansion of the orthogonal dynamics and cast it into a set of differential equations that represent the effects of the memory integral with a finite memory length. These MZ-based turbulence models rely on simplifying assumptions and observations of the turbulent flow, most of which have not been verified due to the difficulty in deriving or extracting the MZ operators. Gouasmi, Parish, and Duraisamy 18 proposed a method for estimating the memory integrals using pseudo orthogonal dynamics, which is only exact for linear dynamics.
The above-mentioned models approach the dimensional reduction starting with the original nonlinear systems. The challenges in approximating the memory kernel come from the nonlinearity of the equations. For the same dynamical system, there exists another formulation, as proposed by Koopman 19 and Koopman and Neumann 20 . In Koopman's description, the system is characterized by a collection of observables, which are functions of the original phase-space coordinates. The Koopmanian formulation describes how observables evolve in an infinite-dimensional Hilbert space, which is composed of all possible observables. The advantage of this formulation is that the evolution of the observables, each a vector in the infinite-dimensional Hilbert space, is always linear, even for systems that are nonlinear in the phase-space picture. The disadvantage of this formulation is that the state space of the system, which consists of all possible observables, is infinite-dimensional. Based on the Koopman description of dynamical systems, approximate learning methods, such as Dynamic Mode Decomposition (DMD) 21 and Extended Dynamic Mode Decomposition (EDMD) 22 , have been developed for data-driven modeling of dynamical systems. By combining the Koopman description with the MZ formalism, it is possible to perform a dimensional reduction of the infinite-dimensional Koopmanian linear formulation to a finite, low-dimensional dynamical system with memory kernels and orthogonal dynamics. Since the observables evolve in a linear space, the learning problem is convex, which can greatly simplify the learning of MZ operators. Lin et al. 23 proposed a data-driven learning framework for the MZ memory kernel and noise term under the generalized Koopman formulation, and analyzed the properties of these terms for a Lorenz '96 model. That was the first study to successfully extract the MZ operators for a nonlinear dynamical system.
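The idea behind DMD-type approximations can be illustrated with a minimal sketch (this is an illustration of fitting a linear operator to snapshot pairs, not the full algorithm of Refs. 21 and 22): the best linear operator mapping each snapshot to the next is obtained by least squares.

```python
import numpy as np

# Minimal DMD-style sketch: fit the best linear operator A mapping snapshot
# pairs x -> y in a least-squares sense; for exactly linear dynamics with
# sufficiently rich data, the true operator is recovered.
def fit_linear_operator(X, Y):
    # X, Y: (dim, n_samples) snapshot matrices with Y approx A_true @ X
    return Y @ np.linalg.pinv(X)

rng = np.random.default_rng(1)
A_true = np.linalg.qr(rng.standard_normal((4, 4)))[0]  # orthogonal "dynamics"
X = rng.standard_normal((4, 50))
Y = A_true @ X
A_fit = fit_linear_operator(X, Y)
```

For nonlinear systems, EDMD replaces the raw snapshots with a dictionary of nonlinear observables before the same least-squares fit.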
In this work, we take the first step of applying the learning algorithm of Lin et al. 23 to a homogeneous isotropic turbulence DNS database to extract the Markov, memory, and orthogonal (noise) terms for the coarse-grained Navier-Stokes system. To the authors' best knowledge, there has been no study using data-driven methods to accurately extract MZ terms for Navier-Stokes turbulence, despite the fact that understanding the properties of the memory kernels and orthogonal dynamics is crucially important not only for quantitatively addressing the assumptions in MZ-based turbulence models but also, more generally, for understanding the memory effects of the truncated N-S dynamics. The manuscript is organized as follows: in Section II, the MZ formalism, as a generalized Koopman learning framework, is introduced. The data-driven learning framework for MZ kernels is introduced in Section III. The DNS database and post-processing procedures are then explained in Section IV. The results and discussions are presented in Section V and the conclusions are drawn in Section VI.

II. MORI-ZWANZIG FORMALISM
Consider the following semi-discrete high-dimensional system of Ordinary Differential Equations (ODEs) for the full set of state variables:

$$ \frac{d}{dt}\Phi(x,t) = R(\Phi(x,t)), \qquad \Phi(x,0) = x, \qquad (1) $$

where R : R N → R N is an N-dimensional vector of nonlinear real functions defined on R N . Due to the difficulties in simulating and analyzing high-dimensional nonlinear dynamical systems, it is generally desirable to develop low-dimensional representations of the same system. In order to achieve this reduction of complexity, we consider the evolution of a set of observables u(x,t) := g(Φ(x,t)), where g : R N → R D is a D-dimensional vector of observables (functions of the phase-space variables Φ(x,t)) and, in general, D < N. Here, we use Φ(x,t) to denote the solution to equation (1) with the initial condition Φ(x,0) = x, and g(x,t) to denote the observables g(Φ(x,t)) at time t. One way to define the observables is to decompose the state variables into resolved and unresolved components. The observables satisfy the linear Partial Differential Equation (PDE)

$$ \frac{\partial}{\partial t} u(x,t) = \mathcal{L} u(x,t), \qquad u(x,0) = g(x), \qquad (2) $$

where L is the Liouville operator,

$$ \mathcal{L} = \sum_{i=1}^{N} R_i(x) \frac{\partial}{\partial x_i}. \qquad (3) $$

Thus, the PDE (2) becomes the ODE (1) along the characteristic curves. A special choice of g is g i (x) = x i , i ∈ 1, ..., D, which extracts the i-th component of the state of the system, x. Using the semigroup notation, the solution of the linear PDE can be written as u(x,t) = e tL g(x), where e tL is referred to as the evolution operator. It can be shown that e tL g(x) = g(e tL x) = g(Φ(x,t)) since e tL x = Φ(x,t). Alternatively, this can be written as e tL Φ(x, 0) = Φ(x,t), and we recognize that the operator K t = e tL is the one-parameter family of Koopman operators. We further remark that the evolution operator (Koopman operator) e tL and the Liouville operator L commute, that is, L e tL = e tL L . Hence, Eq. (2) becomes

$$ \frac{\partial}{\partial t} e^{t\mathcal{L}} g(x) = e^{t\mathcal{L}} \mathcal{L} g(x). \qquad (4) $$

In order to construct a reduced-order representation of the linear PDE using the reduced D-dimensional vector of observables g, with the initial condition u(x, 0) = g(x), a projection operator, P, needs to be specified that maps functions f (x) defined in the Hilbert space H into a subspace of H . The particular choice of projection operator determines the functional form of the Mori-Zwanzig formulation. Examples of projection operators include the nonlinear projection operator that relies on the marginalization of the under-resolved observables 6 and the finite rank projection operator that relies on the inner product in the Hilbert space 5 . After the projection operator P is defined, its complement Q is denoted as Q = I − P and satisfies PQ = QP = 0, where I is the identity operator. We then substitute the Dyson identity 24 :

$$ e^{t\mathcal{L}} = e^{t\mathcal{Q}\mathcal{L}} + \int_0^t e^{(t-s)\mathcal{L}} \mathcal{P}\mathcal{L}\, e^{s\mathcal{Q}\mathcal{L}} \, ds \qquad (5) $$

in Eq. (4) and arrive at:

$$ \frac{\partial}{\partial t} e^{t\mathcal{L}} g(x) = e^{t\mathcal{L}} \mathcal{P}\mathcal{L} g(x) + e^{t\mathcal{Q}\mathcal{L}} \mathcal{Q}\mathcal{L} g(x) + \int_0^t e^{(t-s)\mathcal{L}} \mathcal{P}\mathcal{L}\, e^{s\mathcal{Q}\mathcal{L}} \mathcal{Q}\mathcal{L} g(x) \, ds, \qquad (6) $$

which can be written in terms of the observables:

$$ \frac{\partial}{\partial t} u(x,t) = e^{t\mathcal{L}} M(x) - \int_0^t e^{(t-s)\mathcal{L}} K(x,s) \, ds + F(x,t). \qquad (7) $$

The specific selection of the projection operators will be discussed below. The above equation is the Generalized Langevin Equation (GLE), which contains a Markov transition term, orthogonal dynamics, and a memory kernel that are defined as:

$$ M(x) = \mathcal{P}\mathcal{L} g(x), \qquad (8a) $$
$$ F(x,t) = e^{t\mathcal{Q}\mathcal{L}} \mathcal{Q}\mathcal{L} g(x), \qquad (8b) $$
$$ K(x,t) = -\mathcal{P}\mathcal{L} F(x,t), \qquad (8c) $$

with the orthogonality condition PF(x,t) = 0. These three components represent the key constructs of the Mori-Zwanzig formalism, and the GLE is exact in describing the dynamics of the observables g.
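The Dyson identity is an operator identity that also holds verbatim for finite matrices, which allows a quick numerical sanity check. The sketch below (an illustration, with matrices standing in for the operators) uses an anti-symmetric L, mimicking an anti-self-adjoint Liouville operator, and a rank-2 projection P.

```python
import numpy as np
from scipy.linalg import expm

# Numerical check of the Dyson identity
#   e^{tL} = e^{tQL} + int_0^t e^{(t-s)L} P L e^{sQL} ds
# with small matrices in place of the operators.
rng = np.random.default_rng(0)
N = 4
L = rng.standard_normal((N, N))
L = 0.5 * (L - L.T)          # anti-symmetric "Liouville" matrix
P = np.zeros((N, N))
P[:2, :2] = np.eye(2)        # rank-2 projection
Q = np.eye(N) - P

t, ns = 1.0, 2001
s = np.linspace(0.0, t, ns)
h = s[1] - s[0]
vals = np.stack([expm((t - si) * L) @ P @ L @ expm(si * (Q @ L)) for si in s])
memory = h * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])  # trapezoid rule
err = np.abs(expm(t * L) - (expm(t * (Q @ L)) + memory)).max()
```

The residual `err` is limited only by the quadrature of the memory integral, confirming the identity to numerical precision.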
Eq. (8c) is also referred to as the nonlinear Generalized Fluctuation-Dissipation relation 6 . Note that here we use a negative sign in front of the memory term, following the convention in Mori 5 . We remark that Eq. (7) is general; the specific forms of the MZ operators depend on the choice of the projection operator:
• In Mori's formulation 5 , the projection operator relies on an inner product defined as:

$$ ( f, g ) = \int f(x)\, g(x) \, d\mu(x), \qquad (9) $$

where f and g are L 2 -integrable functions and x is drawn from the probability distribution µ. In this work, we adopt a stationary measure for x. With the inner product, Mori's projection operator 5 , or the finite rank projection operator, can be defined onto the span of a set of linearly independent basis functions g i (x), i ∈ 1, ..., D:

$$ \mathcal{P} f = \sum_{i,j=1}^{D} ( f, g_i ) \, [C_0^{-1}]_{ij} \, g_j, \qquad (10) $$

where C 0 −1 is the inverse of the covariance matrix [C 0 ] i, j = ( g i , g j ), i, j ∈ 1, ..., D. For a special set of orthonormal basis functions h i (x), the covariance matrix becomes the identity matrix, so that the projection operator can be simplified:

$$ \mathcal{P} f = \sum_{i=1}^{D} ( f, h_i ) \, h_i. \qquad (11) $$

In general, one can use the Gram-Schmidt (G-S) procedure to identify the set of orthonormal basis functions.
• In Zwanzig's formulation 6 , the observables are chosen to be a subset of the variables, g(x) = x̂, and the projection operator is defined using direct marginalization of the unresolved variables. If the probability distribution µ for the phase-space variable x = (x̂, x̃) is written for the resolved/unresolved variables as ρ(x̂, x̃), the projection operator is then defined as:

$$ (\mathcal{P} f)(\hat{x}) = \int f(\hat{x}, \tilde{x}) \, \rho(\tilde{x} \,|\, \hat{x}) \, d\tilde{x}. \qquad (12) $$

The resulting function P f is generally nonlinear, so this projection operator is also termed nonlinear projection 25 or infinite rank projection 26 . This nonlinear projection operator has been adopted by Parish and Duraisamy 13,17 to construct MZ-based models for turbulence, where the initial conditions were assumed to be fully resolved (i.e., x̃ = 0 at t = 0) and the unresolved observables were assumed to remain centered at 0 and delta-distributed.
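Mori's projection (Eq. (10)) amounts to solving a small linear system built from inner products, which can be estimated by Monte Carlo averaging. A minimal sketch, assuming a standard-Gaussian measure and the toy choice f(x) = x1³ with linear basis g = (x1, x2) (for which the exact projection is 3 x1, since E[x1⁴] = 3):

```python
import numpy as np

# Monte Carlo estimate of Mori's projection coefficients:
#   P f = sum_ij (f, g_i) [C0^{-1}]_ij g_j
rng = np.random.default_rng(42)
x = rng.standard_normal((1_000_000, 2))   # samples from the measure mu
g = x                                     # basis functions evaluated on samples
f = x[:, 0] ** 3                          # observable to be projected

C0 = g.T @ g / len(g)                     # covariance matrix (g_i, g_j)
b = g.T @ f / len(g)                      # inner products (f, g_i)
coeffs = np.linalg.solve(C0, b)           # P f = coeffs[0]*x1 + coeffs[1]*x2
```

The estimated coefficients converge to (3, 0) as the sample count grows; the same construction underlies the correlation-matrix estimates used later in the learning algorithm.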
• A recently proposed Wiener projection links the Nonlinear Auto-Regressive Moving Average with eXogenous input (NARMAX) models to the MZ formalism 27 , where the basis functions g also embed information from the past history. Let f n and g n be two discrete-time, zero-mean, wide-sense stationary processes, where the subscript n denotes the index of the time step; the Wiener projection operator can then be written as:

$$ \mathcal{P} f_n = \sum_{i=0}^{\infty} h_i \, g_{n-i}, \qquad (13) $$

where the sequence h i is the Wiener filter.
In this work, we focus on Mori's finite rank projection operator and the correspondingly constructed MZ kernels, which form the foundation of the data-driven algorithm proposed in Lin et al. 23 .

A. Discrete-time Mori-Zwanzig Formalism
Even though the dynamical system discussed above is formulated in continuous time, it is common that in high-resolution simulations or experimental measurements the outputs are discrete-time snapshots, for which the temporal derivative is not readily available. For completeness, in this section we introduce the discrete formulation of the dynamical system and the corresponding MZ formulation, following Lin and Lu 27 . We write the dynamical equation for the full set of discrete solution vectors Φ(n∆) ∈ R N as:

$$ \Phi((n+1)\Delta) = S_\Delta(\Phi(n\Delta)), \qquad (14) $$

where n and ∆ are the time step and time interval of the discrete-time snapshots, respectively.
Similar to the continuous-time derivation, we define the observables as g(Φ(n∆)) ∈ R D , where the components g i are functions in L 2 (µ) that map the original solution Φ to a physically observed quantity of interest: R N → R. For simplicity, we use g(n∆) to denote g(Φ(n∆)). To describe the evolution of the observables g, we introduce the discrete-time Koopman operator K ∆ that satisfies K ∆ g = g ◦ S ∆ , where the symbol ◦ denotes function composition. By operating with the Koopman operator on the functions g and applying the result to the solution variable at the current state, we obtain the observables at the next time step. We apply the Koopman operator n + 1 times and derive the evolution of the observables:

$$ g((n+1)\Delta) = \left( \mathcal{K}_\Delta^{\,n+1}\, g \right)(\Phi(0)). \qquad (15) $$

With a given projection operator P on the Hilbert space H and its complement Q = I − P, we can then write the Dyson identity for the Koopman operator 27 :

$$ \mathcal{K}^{n+1} = (\mathcal{Q}\mathcal{K})^{n+1} + \sum_{l=0}^{n} \mathcal{K}^{n-l} \, \mathcal{P}\mathcal{K} \, (\mathcal{Q}\mathcal{K})^{l}. \qquad (16) $$

We substitute Eq. (16) in Eq. (15) and arrive at the evolutionary equation for the observables g:

$$ g((n+1)\Delta) = \sum_{l=0}^{n} \left( \Omega^{(l)} g \right)(\Phi((n-l)\Delta)) + W_{n+1}(\Phi(0)). \qquad (17) $$

The above equation is the discrete-time GLE and can be understood as the discrete counterpart of Eq. (6). The corresponding Mori-Zwanzig operators can then be derived for the components in Eq. (17):

$$ \Omega^{(l)} = \mathcal{P}\mathcal{K} (\mathcal{Q}\mathcal{K})^{l}, \quad l = 0, 1, 2, \ldots, \qquad W_{n+1} = (\mathcal{Q}\mathcal{K})^{n+1} g, \qquad (18) $$

with the orthogonality condition PW n+1 = 0, ∀ n ∈ N. Eq. (17) is also general and the specific forms of the MZ operators depend on the choice of projection operator.
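Like its continuous counterpart, the discrete Dyson identity (Eq. (16)) is a purely algebraic statement: it holds for any square matrices K and P with Q = I − P, which gives a direct finite-dimensional check (an illustration with matrices standing in for the operators):

```python
import numpy as np

# Check of the discrete Dyson identity
#   K^{n+1} = (QK)^{n+1} + sum_{l=0}^{n} K^{n-l} (PK) (QK)^l
# for a random K and a diagonal rank-2 projection P.
rng = np.random.default_rng(3)
D = 5
K = 0.3 * rng.standard_normal((D, D))
P = np.zeros((D, D))
P[:2, :2] = np.eye(2)
Q = np.eye(D) - P

n = 6
mp = np.linalg.matrix_power
rhs = mp(Q @ K, n + 1) + sum(
    mp(K, n - l) @ (P @ K) @ mp(Q @ K, l) for l in range(n + 1)
)
err = np.abs(mp(K, n + 1) - rhs).max()
```

The identity is exact, so `err` is at the level of floating-point round-off.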

B. Reduced-order Construction of Navier-Stokes Equation Based on MZ Formalism
In the previous section, we derived and discussed the key components of the MZ formalism and their dependence on the projection operator. In this section, we first demonstrate their application to Navier-Stokes turbulence modeling and establish the link between the MZ formulation with nonlinear projection operators and the classical turbulence modeling approach, i.e., LES. Due to the challenges in extracting the properties of the corresponding nonlinear MZ operators, we then propose the MZ formulation based on Mori's finite rank projection operator as a generalization of the Koopman learning framework, which greatly simplifies the learning task and allows us to quantitatively address the modeling assumptions.

LES and MZ Formulation Based on the Nonlinear Projection Operator
Consider the three-dimensional discretized velocity field in a fully-resolved numerical simulation, v i (t, n x , n y , n z ), where i ∈ 1, 2, 3 is the direction of velocity and n x , n y , n z ∈ 1, .., N x , N y , N z are the spatial coordinates of the velocity field. We can stack the discretized solution of the velocity into a single state vector v(t) ∈ R 3N x N y N z and write the semi-discrete system as

$$ \frac{d v(t)}{dt} = R(v(t)), \qquad (19) $$

where R are nonlinear functions that can be viewed as the spatially-discretized form of the right-hand side of the Navier-Stokes equations for a given numerical scheme.
For the spatially discretized N-S equations, we can then derive the discrete-time formulation for the temporally-discretized velocity vector v(n∆) and arrive at Eq. (14). Here, S ∆ encodes the information of the temporal scheme; for example, S ∆ (v) = v + ∆ · R(v) for the Euler method.
Fully resolving the dynamics of the Navier-Stokes equations requires prohibitively large amounts of computational resources due to the wide range of scales of practically relevant problems. In the classic approach of reduced-order modeling for turbulence, the velocity field is coarse-grained by applying a spatial filter to reduce the range of scales that need to be resolved. Here, we denote the solution vector of the filtered discretized velocity field as v̄(t), where the overline denotes spatial filtering. Commonly used spatial filters include the Gaussian filter, box filter, and spectral filter. The size of the computational mesh required to fully resolve v̄ is significantly reduced because of the reduced range of scales. As a result of spatially filtering the nonlinear N-S equations, high-order moments emerge and the system for the resolved variable v̄ is not closed.
Dynamical equations for the filtered velocities can be written as:

$$ \frac{d \bar{v}(t)}{dt} = R(\bar{v}(t)) + \tau^{sgs}(v(t)), \qquad (21) $$

where R(v̄(t)) takes the same form as in the original N-S equations but is numerically discretized on a coarser grid and is fully closed, while τ sgs (v(t)) denotes the unclosed sub-grid scale contributions.
Transport equations for the higher-order moments can also be derived, such as the equations for the sub-grid stress (SGS) in LES, but they depend on even higher-order unclosed terms. In practice, the resulting infinite-dimensional system is truncated to include only the resolved variable Φ̂ = v̄, and a sub-grid model, which is usually Markovian, is used to compensate for the effects from the unresolved moments/scales: τ sgs (v(t)) ≈ τ̃ sgs (v̄(t)).
In the framework of the MZ formalism, we can write the evolution equation of the reduced-order variable v̄ as a GLE that consists of a Markov term, a memory term, and orthogonal dynamics:

$$ \frac{d \bar{v}(t)}{dt} = M(\bar{v}(t)) - \int_0^t K(\bar{v}(t-s), s) \, ds + F(t). \qquad (22) $$

We remark that there may exist a projection operator that one can apply to the filtered N-S equations to establish the connection between the terms in Eq. (22) and those in the LES equations (Eq. (21)). We also remark that the difficulty of extracting the memory kernel and orthogonal dynamics originates from two aspects: the nonlinearity of the Navier-Stokes equations and that of the projection operator. In the next section, we address this issue by introducing Mori's finite rank projection operator and discuss its relation to the Koopman learning framework, which lays the foundation for the extraction of MZ kernels using a data-driven algorithm. We also remark that only Smagorinsky-type sub-grid models 28 are discussed here.

MZ Formulation Based on Mori's Projection Operator
In the Koopman learning framework 19 , the same dynamical system in Eq. (1) can be characterized by a collection of observables g, which are functions of the physical-space variables Φ.
The system can then be cast from a finite-dimensional system of nonlinear ODEs describing the physical variables to an infinite-dimensional system of linear ODEs that describes all possible observables. In Koopman's formulation, the observables evolve on an infinite-dimensional Hilbert space H , which is composed of all possible observables.
In the Koopman framework, deriving a closed-form solution is equivalent to identifying a set of observables whose dynamics are invariant in a subspace linearly spanned by the set of observables. In general, it is very challenging to identify a finite set of observables that closes the dynamics, and one has to resort to approximation methods. Naturally, we can leverage the Mori-Zwanzig formalism and the inner product equipped in the Hilbert space to construct the dynamical equations for a finite set of observables. Lin et al. 23 showed that by using Mori's finite rank projection operator (Eq. (10)), which depends on the inner product of two functions, both the Koopman 20 and MZ formulations operate in a shared Hilbert space. The advantage of using Mori's projection operator is that the projected low-dimensional functions are linear, which significantly simplifies the derivation/learning of the MZ kernels. This is in contrast to the MZ construction based on nonlinear projection operators discussed in the previous section. Following Lin et al. 23 , the MZ formulation using Mori's projection operator with linearly independent basis functions g can be written as:

$$ \frac{d}{dt} g(t) = M \, g(t) - \int_0^t K(s) \, g(t-s) \, ds + F(t), \qquad (24) $$

where M and K are D × D matrices. Similarly, the discrete counterpart of the MZ formulation based on Mori's projection operator for the discrete-time observables g(n∆) = g(Φ(n∆)) can be written as 23 :

$$ g((n+1)\Delta) = \Omega_\Delta \, g(n\Delta) + \sum_{l=1}^{n} \Omega^{(l)}_\Delta \, g((n-l)\Delta) + W_{n+1}. \qquad (25) $$

Note that in the discrete form, Ω ∆ is the Markov operator, Ω (l) ∆ , l ∈ 1, 2, 3, ..., are the memory kernels, W n+1 is the orthogonal dynamics, and ∆ represents the discrete time step 23 . There is also a switch of sign of the memory term between the two formulations in Eqs. (24) and (25), which is due to the respective conventions of the continuous and discrete formulations.
In the rest of the paper, the MZ formulation will mainly be discussed in its discrete form, which simplifies the calculations by replacing the memory integral with a summation and temporal gradients with discrete values.
We remark that Eq. (24) (or Eq. (25) for the discrete-time formulation) is a special case of the general MZ formulation (Eqs. (6) and (17)) where the projection operator is chosen to be the finite rank projection operator. The basis functions g are not limited to the original physical-space variables, g(Φ) = Φ, but can be any linearly independent functions of Φ. Naturally, we can construct an MZ formulation for the Navier-Stokes equations using Mori's projection operator.
The form of the MZ formulation follows that in Eq. (24). Given the simplicity of the linear Markov operator and memory kernels under Mori's projection operator, we will show that the Mori-Zwanzig operators can be extracted in a relatively straightforward manner using this formulation, unlike the one based on a nonlinear projection operator. We will also continue the discussion using the discrete-time formulation because: (i) observations and simulation results are usually discrete, and fully resolved temporal gradients are usually not readily available, and (ii) the discrete-time formulation avoids the errors induced by numerically integrating its continuous-time counterpart.

C. The Evolution of Time Correlation Matrix
In this section, we derive the evolution equation of the time correlation matrix C, which is the foundation of the learning algorithm. For Mori's projection operator, we choose the initial condition g(0) as the basis of the projected linear subspace. We then apply the inner product ⟨·, g(0) T ⟩ to Eq. (25) and obtain an evolution equation for the two-time correlation matrix

$$ C(n\Delta) = \langle g(n\Delta), \, g(0)^T \rangle, \qquad (26) $$

namely:

$$ C((n+1)\Delta) = \Omega_\Delta \, C(n\Delta) + \sum_{l=1}^{n} \Omega^{(l)}_\Delta \, C((n-l)\Delta). \qquad (27) $$

Note that with this procedure, we exploit the orthogonality between the basis functions g(0) and the noise W n+1 , i.e., ⟨W n+1 , g(0) T ⟩ = 0, to remove the complex orthogonal dynamics from the evolution equation of C. The resulting formula (Eq. (27)) builds the foundation for the data-driven learning algorithm.

D. Generalized Fluctuation Dissipation Relation
The relation between the memory kernel and the orthogonal dynamics, i.e., Eq. (8c), is commonly referred to as the Generalized Fluctuation-Dissipation relation (GFD). There exist different interpretations of this relation but, in general, it imposes a structural relation between the memory kernel and the orthogonal dynamics. Thus, these operators cannot be approximated by models in an arbitrary manner. This relation has been used to estimate the memory kernel when a model for the orthogonal dynamics is proposed 18 .
When constructing the MZ formulation using Mori's projection operator, a specific form of the GFD can be derived for the two-time correlation matrix if the Liouville operator L is anti-self-adjoint with respect to the chosen inner product:

$$ \langle \mathcal{L} f, h \rangle = - \langle f, \mathcal{L} h \rangle \qquad (28) $$

for any functions f and h of the physical-space variable Φ. Note that the anti-self-adjoint property depends on the choice of the inner product. Lin et al. 23 showed that the Liouville operator of a dynamical system is anti-self-adjoint if the inner product is defined as the temporally averaged value of the product of the test functions evaluated on a long trajectory, provided the observables are bounded along the trajectory. The specific GFD for the discrete-time MZ formulation with Mori's projection operator is 23 :

$$ \langle W_{n+1}, \, W_1^T \rangle = - \Omega^{(n)}_\Delta \, C(-\Delta), \quad n \ge 1, \qquad (29) $$

where C(−∆) = C T (∆). This non-trivial relation should be satisfied if the kernels are correctly extracted from the data and will be verified using numerical data.
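The discrete GFD can be verified exactly on a finite-dimensional toy problem. The sketch below (an illustration, not the turbulence data of this work) assumes a linear map x_{k+1} = A x_k with A orthogonal, so the Koopman operator is unitary with respect to the invariant standard-Gaussian measure, and linear observables g(x) = B x; with E[x x^T] = I, every MZ quantity reduces to exact matrix algebra.

```python
import numpy as np

# Exact check of the discrete GFD  <W_{n+1} W_1^T> = -Omega^{(n)} C(-Delta)
# for linear observables g(x) = B x of the orthogonal map x -> A x.
rng = np.random.default_rng(7)
N, D = 8, 3
A = np.linalg.qr(rng.standard_normal((N, N)))[0]   # orthogonal dynamics
B = rng.standard_normal((D, N))                    # linear observables

C0 = B @ B.T                                  # C(0) = <g, g^T> with E[x x^T] = I
Pi = B.T @ np.linalg.solve(C0, B)             # Mori projection in state space
Z = A @ (np.eye(N) - Pi)                      # matrix representation of (QK)
C_minus = B @ A.T @ B.T                       # C(-Delta) = C(Delta)^T

mp = np.linalg.matrix_power
max_err = 0.0
for n in range(1, 5):
    W_np1 = B @ mp(Z, n + 1)                  # W_{n+1}, as a matrix acting on x
    W_1 = B @ Z
    lhs = W_np1 @ W_1.T                       # <W_{n+1} W_1^T>
    omega_n = B @ mp(Z, n) @ A @ B.T @ np.linalg.inv(C0)   # memory kernel
    rhs = -omega_n @ C_minus
    max_err = max(max_err, np.abs(lhs - rhs).max())
```

Both sides agree to floating-point precision for every n ≥ 1, consistent with the GFD.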

III. LEARNING ALGORITHM FOR MORI-ZWANZIG OPERATORS AND ORTHOGONAL DYNAMICS
In this section, we briefly describe the learning algorithm proposed in Lin et al. 23 . The learning procedure is based on Eq. (27) and starts by calculating the two-time correlation matrix C(n∆) = ⟨g(n∆), g(0) T ⟩. As mentioned in the previous section, the evaluation of the inner product requires taking the expectation value against the stationary distribution dµ or temporally and uniformly sampling/averaging along a long trajectory. Given a long and evenly spaced trajectory of the physical-space variable (velocity field) from the fully resolved Direct Numerical Simulation, Φ(n∆), n ∈ 0, 1, ..., N t − 1, the two-time correlation matrix C(n∆) can be calculated as:

$$ C(n\Delta) \approx \frac{1}{N_t - n} \sum_{m=0}^{N_t - n - 1} g((m+n)\Delta) \, g(m\Delta)^T, \qquad (30) $$

where N t is the total number of snapshots and N t ∆ ≫ T l , where T l is the integral time scale of turbulence. It is also beneficial to implement known symmetries of the physical system for data augmentation, which can further improve the accuracy of the extracted Mori-Zwanzig operators without generating more data. For isotropic turbulence with triply-periodic boundaries, periodicity and rotational symmetries are satisfied and can be used to facilitate data augmentation. This can be implemented in the calculation of the two-time correlation matrix. Suppose there exist N s symmetric representations of the same physical-space variable, S n s (Φ), n s = 1, ..., N s , where S n s is a symmetry operator that preserves the statistics of the original physical-space variables Φ. One such operator is the rotation of the velocity field such that u 1 → u 2 , u 2 → u 3 , u 3 → u 1 ; the dynamics are the same for all the symmetric representations. Naturally, Eq. (30) can then be modified to:

$$ C(n\Delta) \approx \frac{1}{N_s (N_t - n)} \sum_{n_s=1}^{N_s} \sum_{m=0}^{N_t - n - 1} g\big(S_{n_s}(\Phi((m+n)\Delta))\big) \, g\big(S_{n_s}(\Phi(m\Delta))\big)^T. \qquad (31) $$

After the calculation of the two-time correlation matrix C(n∆), we can set n = 0 in Eq. (27) to obtain the Markov operator Ω ∆ :

$$ \Omega_\Delta = C(\Delta) \, C(0)^{-1}. \qquad (32) $$

We can then recursively solve Eq. (27) for the memory kernels Ω (n) ∆ :

$$ \Omega^{(n)}_\Delta = \left[ C((n+1)\Delta) - \Omega_\Delta \, C(n\Delta) - \sum_{l=1}^{n-1} \Omega^{(l)}_\Delta \, C((n-l)\Delta) \right] C(0)^{-1}. \qquad (33) $$

After we obtain the memory kernels, the orthogonal dynamics can be extracted for each section i of the trajectory using Eq. (25):

$$ W^{(i)}_{n+1} = g^{(i)}((n+1)\Delta) - \Omega_\Delta \, g^{(i)}(n\Delta) - \sum_{l=1}^{n} \Omega^{(l)}_\Delta \, g^{(i)}((n-l)\Delta). $$

After obtaining W (i) n+1 , the properties of the orthogonal dynamics can be studied, such as the two-time correlation ⟨W n+1 , W 1 T ⟩, etc.

IV. GROUND TRUTH TURBULENCE DATA
The "Ground-Truth" data are generated from the Eulerian DNS solution of the incompressible Navier-Stokes equations:

$$ \frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j} = - \frac{\partial p}{\partial x_i} + \nu \frac{\partial^2 u_i}{\partial x_j \partial x_j} + f_i, \qquad \frac{\partial u_i}{\partial x_i} = 0, $$

where the pressure p is obtained by solving a Poisson equation, ν is the kinematic viscosity, and f i is the large-scale forcing term.
The isotropic turbulence is generated on a 128³ grid using the pseudo-spectral method. A large-scale forcing term is applied to prevent the turbulence from decaying. Time advancement is achieved through the Adams-Bashforth-Moulton method. The Taylor Reynolds number when the turbulence reaches a statistically steady state is ≈ 100. See Petersen and Livescu 31 and Daniel, Livescu, and Ryu 32 for more details on the numerical method.
After the turbulent flow becomes fully developed, 3D snapshots of the flow field are stored at consecutive time steps to generate a long trajectory of turbulence data. The total length of the trajectory is approximately 3000 T l , where T l is the integral time scale. We then apply the post-processing procedure to obtain the low-dimensional coarse-grained observables. The choice of observables can significantly affect the properties of the Markov operator, memory kernels, and noise; therefore, in this study, we select several sets of observables that are closely related to the canonical turbulence modeling approach (LES) and turbulence theory. To summarize, the following procedure is used in this work to obtain the observables: 1) similar to LES, we first apply spatial filters to the velocity field at each time step, with a wide range of filter sizes l ∆ , to obtain the filtered velocity v̄ i ; 2) various types of observables, such as filtered velocity, pressure, sub-grid stress, and kinetic energy, are then computed from the spatially-filtered field; 3) the selected observables on the fine mesh (128 × 128 × 128) are then uniformly sampled onto a coarse mesh (4 × 4 × 4); and 4) the resulting observables at each time step are stacked into a single vector. Following these coarse-graining steps, a dimension reduction from 3 × 128³ physical-space variables to n obs × 4³ observables is achieved, where n obs is the total number of function/variable types. We include the filtered velocities in all sets of observables. In this work, we consider the following four sets of observables for extracting the MZ kernels.
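Steps 1, 3, and 4 of the pipeline can be sketched as follows, assuming a sharp spectral (cutoff) filter; the grid is shrunk from 128³ to 16³ here purely for illustration.

```python
import numpy as np

def spectral_filter(v, k_cut):
    # step 1: low-pass filter a triply periodic field in Fourier space
    n = v.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    mask = np.sqrt(kx**2 + ky**2 + kz**2) <= k_cut
    return np.real(np.fft.ifftn(np.fft.fftn(v) * mask))

def observable_vector(fields, stride):
    # steps 3-4: uniformly sample each field onto a coarse mesh, stack into
    # a single vector, and append the constant observable g0 = 1
    coarse = [f[::stride, ::stride, ::stride].ravel() for f in fields]
    return np.concatenate(coarse + [np.ones(1)])

rng = np.random.default_rng(2)
velocity = [rng.standard_normal((16, 16, 16)) for _ in range(3)]
filtered = [spectral_filter(v, k_cut=4.0) for v in velocity]
g = observable_vector(filtered, stride=4)     # length 3 * 4**3 + 1 = 193
```

Repeating this at every stored snapshot produces the discrete-time observable trajectory g(n∆) used by the learning algorithm.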
Note that in observable set 3, not all components of the strain-rate tensor S i j are chosen as basis functions for observables because the diagonal components are linearly dependent through the incompressibility condition. In addition to the above-mentioned functions, a constant function g 0 = 1 is added to every set of observables, making the total number of observables n obs × 4³ + 1.

V. RESULTS
In this section, we present the results and properties of the extracted MZ kernels. We first address the statistical convergence of the learned MZ operators from "Ground-Truth" data. The non-trivial GFD relation between the memory kernels and orthogonal dynamics is also verified.
The properties of the Markov operator, memory kernels, and noise, and their dependence on different choices of observables, are then analyzed. Lastly, we conduct numerical experiments using the extracted kernels to investigate the effects of the memory kernel on prediction.
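The prediction experiments can be sketched as a simple time-marching loop: the observables are advanced with the extracted Markov operator plus a truncated memory sum, while the orthogonal dynamics, unknown at prediction time, is dropped (the function below is an illustrative sketch of this idea, not the exact experimental setup).

```python
import numpy as np

def predict(omega, history, n_steps):
    # omega:   [Markov operator, memory kernel 1, memory kernel 2, ...]
    # history: list of past observable vectors, oldest first,
    #          with len(history) >= len(omega)
    g = list(history)
    for _ in range(n_steps):
        nxt = omega[0] @ g[-1]
        for l in range(1, len(omega)):        # truncated memory contribution
            nxt = nxt + omega[l] @ g[-1 - l]
        g.append(nxt)
    return np.array(g)

# With a purely Markovian kernel, the scheme reduces to g(n) = Omega^n g(0).
rng = np.random.default_rng(4)
A = np.linalg.qr(rng.standard_normal((3, 3)))[0]
g0 = rng.standard_normal(3)
traj = predict([A], [g0], n_steps=5)
```

Including more memory kernels lengthens the required history but lets the past states of the observables contribute to each update, which is the effect examined in the experiments below.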

A. Statistical Convergence
The statistical convergence of the computed two-time correlation matrix C and the learned kernels is an important factor in the proposed learning algorithm 23 and needs to be confirmed in order to reduce the effects of statistical variability on the analysis. Accurate computation of the two-time correlation matrix C in an ergodic system requires averaging over a long trajectory, which in turn requires a large number of data samples representing the distribution of the stationary system.
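To make the role of the two-time correlation matrix concrete, the following sketch estimates C(k∆) = E[g(t + k∆) g(t)^T] by time averaging and then extracts the Markov operator and memory kernels from the correlation hierarchy C(n+1) = Σ_{k=0}^{n} Ω^(k) C(n−k), which follows from the discrete GLE because the orthogonal dynamics are uncorrelated with the initial observables. This is our schematic reading of the learning algorithm; variable names and implementation details are assumptions.

```python
import numpy as np

def correlation(G, k):
    """C(k) = E[g(t + k*Delta) g(t)^T], time-averaged over the trajectory.
    G: (n_steps, dim) array of observable vectors spaced Delta apart."""
    n = G.shape[0]
    return G[k:].T @ G[:n - k] / (n - k)

def extract_kernels(G, n_mem):
    """Recursively extract the Markov operator Omega[0] and the memory
    kernels Omega[1..n_mem] from the correlation hierarchy
        C(n+1) = sum_{k=0}^{n} Omega[k] C(n-k)."""
    C = [correlation(G, k) for k in range(n_mem + 2)]
    C0_inv = np.linalg.inv(C[0])
    Omega = [C[1] @ C0_inv]                      # Markov operator
    for n in range(1, n_mem + 1):
        residual = C[n + 1] - sum(Omega[k] @ C[n - k] for k in range(n))
        Omega.append(residual @ C0_inv)          # memory kernel Omega^(n)
    return Omega
```

For a purely Markovian linear system, this recursion recovers the transition matrix as Ω^(0) and near-zero memory kernels, which is a useful sanity check on any implementation.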
As described in Section IV, we performed fully resolved simulations to generate a long trajectory of 3D turbulence data and stored a total of N_t 3D snapshots of the velocity fields. In the convergence test, three different sampling methods are used to sample from the database:
• Method 1: The convergence test data are randomly sampled from the total N_t snapshots.
• Method 2: The total N_t snapshots are first coarsely sampled in time to ensure that any two snapshots are at least one integral timescale apart. Since an integral timescale T_l ≈ N_int dt, this corresponds to approximately N_t/N_int snapshots. The convergence test data are then randomly sampled from these N_t/N_int snapshots. This procedure ensures that the data are not temporally correlated and are truly independent samples from the stationary distribution.
After obtaining the samples from the database, we apply the post-processing procedure discussed in Section IV to the 3D simulation data to generate the vectors of observables g. In the convergence test, observable set 1 is chosen with a spatial filtering length π/8. Data augmentation based on periodicity and rotational symmetry is also performed for the statistical convergence test. The discrete time step ∆ is chosen to be 10 dt. Figure 1 shows the percentage change of the Frobenius norm of the learned Markov operator and memory kernels for different numbers of samples, where the percentage change is calculated as the relative change in the Frobenius norm as the number of samples n used for calculating the kernels is increased. It is also interesting to note that there is no difference in the rate of convergence among the three sampling methods. Considering this, we use Method 1 for sampling, as it uses the largest amount of data, thus resulting in the smallest error in the learned kernels. Figure 2 (a) shows the convergence of a representative component [Ω^(0)_∆]_{1,2} of the Markov operator; further increasing the number of samples would result in negligible improvement in the accuracy of the learned kernels. Ideally, one could inject known physics into the learning framework to impose constraints on the learned kernels. However, in this first attempt at extracting MZ operators, we only apply the physical symmetries in the learning and let the data inform/reveal the structure of the MZ operators.
Figure 2 (b) shows similar results for the memory kernel Ω^(1)_∆, which provides further evidence for the convergence of the learned memory kernel.
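A convergence diagnostic of this kind can be sketched as follows, monitoring the percentage change of the Frobenius norm as the sample count grows. The exact formula used for figure 1 may differ; the callback name and the metric below are illustrative stand-ins.

```python
import numpy as np

def pct_change_frobenius(estimate, sample_counts):
    """Percentage change of the Frobenius norm of an estimated operator
    between successive sample counts. `estimate(n)` returns the operator
    computed from n samples (hypothetical callback)."""
    # np.linalg.norm on a matrix defaults to the Frobenius norm
    norms = [np.linalg.norm(estimate(n)) for n in sample_counts]
    return [abs(b - a) / a * 100.0 for a, b in zip(norms, norms[1:])]
```

A shrinking sequence of percentage changes indicates statistical convergence of the learned kernel.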

B. Generalized Fluctuation-Dissipation Relation
The Generalized Fluctuation-Dissipation relation (GFD) refers to the subtle self-consistent relationship between the learned memory kernels Ω^(k)_∆ and the two-time correlation function of the extracted orthogonal dynamics.

C. Properties of the Learned Mori-Zwanzig Operators
There have been few attempts to quantitatively analyze the memory kernels and orthogonal dynamics of the Navier-Stokes equations, due to the difficulty of developing tools for accurately and efficiently extracting them with nonlinear projection operators. This lack of knowledge makes it difficult to justify various assumptions in MZ-based turbulence models. In this section, we unveil some important properties of the memory kernels and orthogonal dynamics extracted with the current learning framework, in order to lay foundations for future turbulence model development.
In the discrete Mori-Zwanzig formulation, a hyper-parameter that is not related to the physical system is the discrete time interval ∆. Given a database for a long trajectory with a fixed temporal step dt, one can extract the corresponding Markov operator and memory kernels with the discrete time interval ∆ chosen as an integer multiple of dt.

The Effects of Spatial Filters
One of the most important assumptions in MZ-based models concerns the length of the memory kernel. In the popular t-model 11 , which has been applied to many nonlinear dynamical systems with some success, the convolution integral of the memory term is carried out using a simple left-hand quadrature rule, which can be interpreted as assuming an infinitely long memory. Other MZ-based models 13,16,17,33 also make various assumptions and approximations on the length and shape of the memory kernel.
In figure 5, we present the Frobenius norm of the Markov operator and the temporal decay of the memory kernel. Two types of spatial filters that are commonly used in LES are used here to compute the coarse-grained observables, in order to understand the effects of the filter type on the memory length. In addition, the coarse-graining length scales, as reflected by the filtering length l_∆, are also examined. Figure 5 (a) shows the dependence of the Markovian contribution on the spatial filter size (normalized by the integral length scale L). It is noted that the Markovian contribution decreases as the filter size increases. In figure 5 (b), the Frobenius norm of the memory kernel (normalized by its corresponding Markov operator) is plotted against the normalized time delay.
From figure 5 (b), we can make a few important observations. First, the Frobenius norm of the memory kernel does not decrease to zero within a finite time delay; however, it becomes 2-3 orders of magnitude smaller at time delays of around several Kolmogorov timescales. This indicates that using finite support in the memory integral can be a reasonable modeling assumption, because the contributions from large time delays are generally negligible. Second, the difference between the two spatial filter types is small, but the effect of the filtering length scale is significant.
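The finite-support observation can be illustrated with a toy exponentially decaying kernel: truncating the discrete memory convolution after a few decay times changes the result by well under one percent. The kernel shape and all parameters below are illustrative assumptions, not extracted values.

```python
import numpy as np

def memory_sum(K, g_hist, dt, n_keep=None):
    """Discrete memory convolution dt * sum_k K[k] @ g(t - k*dt),
    optionally truncated to the most recent n_keep delays (finite support)."""
    n = len(g_hist) if n_keep is None else min(n_keep, len(g_hist))
    return dt * sum(K[k] @ g_hist[-1 - k] for k in range(n))

# toy kernel decaying over a timescale tau (illustrative only)
dt, tau, dim, steps = 0.01, 0.05, 4, 200
K = [np.exp(-k * dt / tau) * np.eye(dim) for k in range(steps)]
g_hist = [np.ones(dim)] * steps
full = memory_sum(K, g_hist, dt)                              # all 200 delays
short = memory_sum(K, g_hist, dt, n_keep=int(5 * tau / dt))   # keep only 5*tau
# truncation error stays below 1% because contributions beyond ~5 tau are tiny
```

This mirrors the observation that contributions from large time delays are negligible once the kernel has decayed by a few orders of magnitude.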
With larger filtering sizes, the temporal decay of the memory kernel becomes slower, making the finite memory length longer, which indicates a shift of dynamical contributions from the Markov term to the memory integral. Generally, when the filtering length scale increases, the range of scales that can be resolved by the coarse-grained observables becomes smaller. In this case, more past history of the observables needs to be included in the prediction, because the memory kernel formally characterizes the interactions between the coarse-grained dynamical variables and the under-resolved degrees of freedom. These observations suggest a qualitative statement: the more we coarse-grain the observables, the less Markovian the coarse-grained model should be. With the extracted memory kernels in the current study, we are able to calculate the timescales of the memory kernel and use them to verify the assumptions in these models. Figure 6 shows the extracted timescales (normalized by the Kolmogorov timescale) for various spatial filtering sizes (normalized by the Kolmogorov scale). Two methods are employed for the calculation: 1) the integral of the memory kernel divided by its value at the smallest time delay, and 2) the time delay at which the memory kernel drops to 10% of its maximum value. Note that we use a mean timescale based on the Frobenius norm to quantify the finite memory length, but the timescales may vary for different components of the memory kernel. We observe that the memory length is generally short, within the range of several Kolmogorov timescales. The memory length increases with the coarse-graining size, in agreement with the estimates of previously proposed models. There exists a weak dependence on the type of filter, which should be taken into account for modeling 13 .
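The two timescale estimates can be sketched as follows, operating on the Frobenius norm of the memory kernel at each discrete delay. The implementation details (explicit trapezoidal integration, threshold handling) are our assumptions.

```python
import numpy as np

def memory_timescales(kernel_norms, dt):
    """Two memory-length estimates from ||Omega^(k)||_F sampled at delays
    k*dt, mirroring the two methods in the text:
      1) integral of the norm divided by its value at the smallest delay,
      2) first delay at which the norm drops below 10% of its maximum."""
    norms = np.asarray(kernel_norms, dtype=float)
    # trapezoidal rule written out explicitly
    integral = dt * (norms.sum() - 0.5 * (norms[0] + norms[-1]))
    t_integral = integral / norms[0]
    below = np.nonzero(norms < 0.1 * norms.max())[0]
    t_10pct = below[0] * dt if below.size else np.inf
    return t_integral, t_10pct
```

For an exponentially decaying norm exp(-t/τ), method 1 returns approximately τ and method 2 approximately τ ln 10, so the two estimates agree up to an O(1) factor.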
We remark that, while including the memory integral may seem to significantly increase the storage overhead, when conducting reduced-order simulations using MZ-based models it may not be necessary to store all of the past information to evaluate the memory integral.
Methods have been proposed, based on quadrature rules, to model the memory integral as an additional set of ODEs 33 , which are solved alongside the main coarse-grained equations. One of the aims of this work is to provide validation for constructing MZ-based memory closure models. Addressing the noise/orthogonal dynamics is another essential component in developing models in the Mori-Zwanzig formalism. Similar to the memory kernel, it is extremely difficult to solve the orthogonal dynamics theoretically due to their high dimensionality and complexity, so the understanding of their properties is still limited. To circumvent this difficulty, previous models based on the MZ formalism have focused on the projected image, where the noise term vanishes due to its orthogonality to the projection operator. Under the current learning framework, the noise term can be numerically extracted using the DNS database. By analyzing its properties, we hope to shed some light on the modeling of the orthogonal dynamics.

The Effects of Observables on the Extracted Operators
Similar to Koopman learning, choosing an appropriate set of observables that best closes the original dynamical system in the linear space is an important task in the current MZ learning framework, with another layer of complexity due to the existence of the memory kernel and orthogonal dynamics. Here, we investigate the effects of different sets of observables on the properties of the learned kernels. These sets are described in Section IV and correspond to different aspects of the N-S equations and turbulence variables. Figure 8 shows the Frobenius norm of the learned memory kernels for the four sets of observables and the Frobenius norm of the corresponding orthogonal dynamics.
It is evident from figure 8 that the magnitudes and decay rates of the memory kernels differ across the four sets of observables. The memory kernels shown in figure 8 contain components from all the observables in each set, making it difficult to study how the quantities of interest, e.g., the filtered velocities, are affected by the different choices of observables. To understand this, we extract from the learned kernels the subset of the matrices that corresponds to the filtered velocity and show them in figure 9. We observe that the properties of the memory kernel for the filtered velocity are not significantly modified by including more observables in sets 2 and 3, even though the memory kernels for the full sets of observables exhibit significant differences. On the other hand, for observable set 4, the contribution from the memory kernel is reduced and the decay of the memory kernel becomes smoother.
A similar trend is observed for the noise, with a smaller magnitude and smoother rate of decay.
We conclude from these observations that by using observable set 4, which contains the RHS of the governing equations, the contribution to dynamics shifts from memory kernel to the Markov term. By comparing figure 8 and 9, we note that different sets of observables may exhibit different memory kernel structure and memory length. This implies that the modeling strategy may need to be changed for different observables. We also remark that adding more observables may not be beneficial for improving the properties of memory kernel for modeling, as shown by observable set 3 results.

D. Numerical Experiments on Memory Effects
In this section, we conduct numerical experiments to illustrate the advantage of including memory kernels in prediction. Consider the following procedure:
a) After applying the learning algorithm to a set of observables g with a time interval ∆, we generate additional samples (> 2 × 10^4) of trajectories of the same observables using the fully resolved simulation. These samples were not used in the learning of the corresponding kernels. Within each sample, the snapshots are evenly spaced in time with the same time interval ∆.
b) The total length of the trajectories is longer than the timescale of the memory kernel.
c) We then use the additional snapshots to predict δ = n∆ into the future by recursively applying g_{n+1} = Σ_{k=0}^{n} Ω^(k)_∆ g_{n−k}, which is the discrete GLE (Eq. (25)) under the assumption that the orthogonal dynamics W_{n+1} = 0, and calculate the prediction errors.
d) The errors are then averaged over samples to reduce statistical variability.
In this work, we choose the L2-norm of the difference, ‖g_pred − g_DNS‖_2, as the measure of prediction error, where g_pred and g_DNS denote the observables from the predictions and the "ground truth" DNS simulations, respectively. It is evident from figure 10 that the prediction errors decrease when past histories are included in the memory kernel. As the memory length increases further past one Kolmogorov timescale, its effect on the prediction errors varies for different discrete time intervals ∆: for the smallest time interval, the prediction error increases and then saturates when the memory length is larger than 4τ_η. Our numerical results suggest the existence of an optimal memory length when the MZ operators are extracted with small time intervals ∆. When the discrete time interval increases, the optimal memory length disappears and the prediction errors decrease and saturate after a certain memory length. The smallest prediction errors are achieved when the discrete time interval ∆ matches the prediction horizon.
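The recursive prediction in step c) can be sketched as follows, advancing the observables with the discrete GLE under W = 0 and measuring the L2 error. Function names and the error normalization are our choices, not the paper's.

```python
import numpy as np

def predict(Omega, g_hist, n_steps):
    """Advance the observables with the discrete GLE and W = 0:
        g_{n+1} = sum_{k} Omega[k] @ g_{n-k}.
    Omega  : list [Markov operator, memory kernel 1, ...] of (dim, dim) arrays.
    g_hist : list of past observable vectors, most recent last."""
    hist = list(g_hist)
    for _ in range(n_steps):
        nxt = sum(Omega[k] @ hist[-1 - k]
                  for k in range(min(len(Omega), len(hist))))
        hist.append(nxt)
    return hist[-1]

def l2_error(g_pred, g_dns):
    """Relative L2 prediction error (normalization is our choice)."""
    return np.linalg.norm(g_pred - g_dns) / np.linalg.norm(g_dns)
```

Passing only the first matrix in Omega gives the Markovian-only prediction; appending more kernels corresponds to including more past history in the memory integral, which is the quantity varied in these experiments.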
For a longer prediction horizon δ = 0.2, similar observations can be made on the improvement of prediction errors by including past history. On the other hand, when the discrete time interval ∆ matches the prediction horizon, the error actually becomes larger than that of the smaller discrete time interval, which shows that there also exists an optimal discrete time interval for the selected prediction horizon. For the larger prediction horizon, the overall improvement from including past history is around 23%. Note that the prediction errors depend on the accumulated magnitude of the orthogonal dynamics, which is neglected in the current prediction method. When the discrete time interval ∆ is small, more steps are needed to advance the observables, so the magnitude of the accumulated orthogonal dynamics that is missed in the prediction is larger. On the other hand, when the discrete time interval is too large (compared to the timescale of the memory kernel), the projected image across such a large step can become small, which in turn increases the magnitude of the orthogonal dynamics. This may explain why the prediction error is large for the discrete time interval ∆ = 0.2. Additionally, one should also take into account the different timescales of the chosen observables, which may further complicate the choice of an optimal discrete interval. Ideally, when proper models for the orthogonal dynamics are included in the prediction, this concern would be alleviated. The effects of the choice of observables on prediction are explored next. Figure 11 compares the prediction errors of the quantities of interest (filtered velocity) across the four sets of observables presented above. The prediction horizon is δ = 0.05, the same as the discrete time interval ∆. Note that in the current numerical experiments, we neglect an important component of the MZ formalism, namely the orthogonal dynamics/noise.
The orthogonal dynamics encode the unresolved initial conditions that are important for reproducing the correct statistical properties of the original system. In practice, it is more desirable to select a set of observables with a smaller magnitude of the orthogonal dynamics and a shorter temporal correlation, which adds another layer of complexity to selecting observables for prediction. Our extracted noise exhibits a complicated/nontrivial two-time correlation, meaning that the orthogonal dynamics can only be modeled by highly nontrivial colored noise.

VI. CONCLUSIONS
Developing reduced-order models for turbulence is a challenging problem due to the existence of a wide range of scales. Traditional modeling strategies based on moment closure (RANS, LES) are derived based on physical intuition. On the other end of the spectrum, the Mori-Zwanzig (MZ) formalism provides a formal mathematical procedure for deriving low-dimensional representations of high-dimensional nonlinear dynamical systems. The outcome of applying the MZ formalism is the emergence of a memory term and orthogonal dynamics, which make solving the full low-dimensional system computationally expensive. Proper models for the memory kernel and orthogonal dynamics need to be developed, which requires a comprehensive understanding of their mathematical properties. However, efforts to directly extract the memory kernel and orthogonal dynamics in turbulent flows have been scarce, so the understanding of their properties is limited. In this work, we are the first to apply a data-driven algorithm to a fully resolved turbulence simulation dataset to extract the memory kernel and orthogonal dynamics and analyze their properties. This provides a foundation for developing accurate MZ-based turbulence models that include contributions from memory and orthogonal dynamics.
With data augmentation using known turbulence symmetries, the Markov term, memory kernels, and orthogonal dynamics can be successfully extracted using a reasonable amount of data and are shown to be statistically converged. The subtle Generalized Fluctuation-Dissipation relation, which is a natural outcome of the MZ formalism, is verified numerically using the extracted kernels and the two-time correlation function of the orthogonal dynamics. This confirms the accuracy and correctness of the learning procedure and the learned MZ kernels. The properties of the memory kernels are shown to have a strong dependence on the spatial filtering size and a weak dependence on the filtering type. The Frobenius norm of the memory kernel exhibits a fast decay, indicating that the finite-memory-length modeling assumption is reasonable. The timescales of the memory kernels are then calculated, and qualitative agreement with previous studies is observed. The two-time correlation matrix of the orthogonal dynamics exhibits a dependence on the spatial filtering size similar to that of the memory kernel. With a larger filter size, the PDF of the orthogonal dynamics becomes more Gaussian-like. The effects of different observables on the extracted kernels are then examined. We observe that expanding the set of observables using nonlinear functions may significantly influence the decay of the memory kernel. On the other hand, the memory lengths of the added nonlinear observables are different, implying that a multi-timescale model may be needed for the expanded observable set. By using observable set 4, which includes the right-hand side of the filtered N-S equation, the magnitude of the memory kernel decreases. This is explained as a shift of contributions from the memory term to the Markov term.
The advantages of including past history in prediction using MZ-based models are studied by comparing the prediction error with that of the Markovian-only model. Results show that the L2 prediction error is lowered by including the memory integral, especially for longer prediction horizons.
When using kernels extracted at a smaller discrete time interval, there exists an optimal memory length for calculating the memory integral. On the other hand, for larger discrete time intervals, the improvement in prediction saturates when a long enough past history is used. The optimal discrete time interval may also change for different prediction horizons, which is related to the relative magnitudes of the discrete time interval and the memory length. Finally, the influence of the observables on prediction is investigated. It is shown that the improvement in the prediction of the filtered velocity is marginal when physics-based observables are included. By using equation-based observables (observable set 4), the improvement of the Markovian model is significant. In addition, including the memory integral further improves the prediction accuracy, and the percentage increase in accuracy is also larger for observable set 4. In this study, we only show the improvements in prediction due to the inclusion of memory effects, without the orthogonal dynamics. When proper models for the orthogonal dynamics are proposed and used for prediction, the predictions should improve significantly and the concerns about choosing optimal prediction parameters could be alleviated. Future work will be dedicated to developing MZ-based turbulence models from the following perspectives: (a) discovering observables with properties suitable for modeling (finite memory length and a simple profile of the memory kernel) and (b) devising stochastic models for the orthogonal dynamics that satisfy the learned statistical properties.