Investigation of nonlocal data-driven methods for subgrid-scale stress modelling in large eddy simulation

A nonlocal subgrid-scale stress (SGS) model is developed based on the convolution neural network (CNN), a powerful supervised data-driven approach. The CNN is an ideal approach to naturally consider nonlocal spatial information in prediction due to its wide receptive field. The CNN-based models used here only take primitive flow variables as input, then the flow features are automatically extracted without any $priori$ guidance. The nonlocal models trained by direct numerical simulation (DNS) data of a turbulent channel flow at $Re_{\tau}=178$ are accessed in both the $priori$ and $posteriori$ test, providing physically reasonable flow statistics (like mean velocity and velocity fluctuations) closing to the DNS results even when extrapolating to a higher Reynolds number $Re_{\tau}=600$. In our model, the backscatter is also predicted well and the numerical simulation is stable. The nonlocal models outperform local data-driven models like artificial neural network and some SGS models, e.g. the Smagorinsky model in actual large eddy simulation (LES). The model is also robust since stable solutions can be obtained when examining the grid resolution from one-half to double of the spatial resolution used in training. We also investigate the influence of receptive fields and suggest using the two-point correlation analysis as a quantitative method to guide the design of nonlocal physical models. To facilitate the combination of machine learning (ML) algorithms to computational fluid dynamics (CFD), a novel heterogeneous ML-CFD framework is proposed. The present study provides the effective data-driven nonlocal methods for SGS modelling in the LES of complex anisotropic turbulent flows.


Introduction
Large eddy simulation (LES) is a powerful tool for turbulence simulation. The flow fields in LES are decomposed into the resolved and unresolved parts. The resolved large-scale turbulent motions are directly solved from the Navier-Stokes equations, while the unresolved part is modelled by subgrid-scale (SGS) stress models. In the recent decades, a variety of SGS models are developed to establish mappings from the resolved flow variables to the SGS stress tensor. Based on the Boussinesq hypothesis, the first SGS model proposed by Smagorinsky (1963) represents that the SGS stress is linearly related to the resolved strain rate tensor. This model is simple and effective but purely dissipative since no backward energy transfer from the subgrid-scale to the resolved scale is allowed. The dynamic models are motivated by the defect of the Smagorinsky model (Germano et al. 1991;Lilly 1992;Meneveau et al. 1996). The local value of Smagorinsky coefficient is dynamically determined in different flow regimes and the backscatter can be predicted. In reality, the backward energy transfer from residual motions to the resolved flow field is clipped sometimes by setting the negative eddy viscosity ν t to zero considering the numerical stability (Zang et al. 1993;Vreman et al. 1997). The dynamic models are generally time-consuming because of the additional filtering operation. On the other hand, the gradient model (Clark et al. 1979;Liu et al. 1994) and the similarity model (Bardina 1983;Domaradzki & Saiki 1997) are not sufficiently dissipative and prone to be numerically unstable in actual LES.
Recently, data-driven approaches are introduced to LES modelling (Yang et al. 2019;Sirignano et al. 2020;Yuan et al. 2020;Subel et al. 2021). Gamahara & Hattori (2017) attempted to construct SGS stress models from filtered DNS (fDNS) data of turbulent channel flow using the artificial neural network (ANN). In their work, the strain tensor, rotation tensor, and the distance from the wall are selected as the input features of ANN, while the performance of ANN shows no advantage over the Smagorinsky model in the posteriori test. Wang et al. (2018) compared the performance of two machine learning (ML) algorithms, i.e. random forests and ANN, they found that ANN is better in SGS modelling, while random forests are helpful for input feature selection. Besides, Maulik et al. (2018) used the blind deconvolution method to recover unfiltered variables from the corresponding filtered ones. In compressible flow field, Xie et al. (2019b) applied an ANN mixed model to predict the SGS stress and the SGS heat flux of compressible isotropic turbulence.
Most models mentioned above are working in a pointwise manner, i.e. only local information is considered in prediction. Some researchers (Maulik et al. 2019;Xie et al. 2019a) choose a multiple points stencil as input of ANN to include non-local spatial information. The research of Park & Choi (2021) shows the model with multiple grid points type input can obtain larger correlation coefficients in contrast to the one with single grid point type input in the priori test but suffers numerically unstable in posteriori simulation. In the framework of ANN as shown in figure 1(a), the number of input features multiplied with the increase of the number of stretch grid points, which leads to larger networks and lower efficiency. Besides, feature engineering is a common issue for ANN-based models no matter how many grid points are included.
Previous researches have shown that input feature selection is crucial to the success of ANN-based models and extra spatial information helps improve the model's performance. In the machine learning field, the convolutional neural network (CNN) is another powerful tool that is good at feature extraction and naturally contains spatial topology information. Using CNN, we can simply use primitive variables as input, and let the machine learn a model from DNS data automatically without any assumption. Apart from that, CNN has a much wider scope in contrast to ANN with stencil-type input. As depicted in figure 1(b), the scope size increases with the deepening of the network, any prediction made on the last layer is based on the information from a block of the first layer rather than a point.
Resulting from the above-mentioned advantages, CNN has started to be used in the subgrid-scale modelling (Bolton & Zanna 2019;Zanna & Bolton 2020 Figure 1. (a) A schematic of ANN and (b) CNN with two-dimensional convolutional kernels whose kernel size is 3 × 3. The rectangular boxes of 3-by-3 grid size in the feature maps are the receptive fields where the filter kernels work on. The 1-by-1 grid size boxes are the corresponding output. The activation functions and bias terms are omitted for clarity.
point mapping, ANN with neighbouring stencil data mapping and CNN) for LES of two-dimensional Kraichnan turbulence. They found that CNN can provide the most accurate predictions with less computational overhead. Although these priori tests have demonstrated the potential of CNN-based models, further test in practical simulations is absent. Some obstacles prevent CNN from being used in actual LES (the posteriori test), mainly about efficiency and manoeuvrability. Firstly, CNN would be quite slow when serially working on the CPU. Only when running in a highly parallelized way, it shows a speed advantage. However, most CFD programs are written for CPU which cannot meet the demand of CNN. Secondly, Python is the most widely used computational language in constructing machine-learning models like CNN, various Python open-source libraries make programming convenient. Nevertheless, CFD codes are mainly written in Fortran or C++, so mixed-language programming may be an irritating issue when embedding CNN models to realistic LES. One solution is writing both the ML and CFD codes in the same programming language, but it may take a lot of time to rewrite the source code and the efficiency problem still exists. Another solution is using mix-language programming, while the CFD process and ML process have to run on the same device or machine in this way. In this work, we aim to develop a concise, non-local, efficient SGS stress model using CNN. To overcome the obstacles mentioned above, we propose the heterogeneous Machine-learning CFD (HML-CFD) framework based on inter-process communication technology which breaks the limitation of programming language and running machine. In this framework, the ML process (usually runs on Graphics Processing Unit (GPU), written in Python) and the CFD process (usually runs on Central Processing Unit (CPU), written in Fortran/C++) implemented in different programming languages can work on different devices to leverage the concise and non-local property of CNN while maintaining efficiency.
The rest of the paper is organized as follows. In § 2, we will introduce the numerical methods of ML and CFD, as well as the details of the HML-CFD framework. The results of priori test and posteriori test in turbulent channel flow are discussed in § 3.1 and § 3.2, following the investigation of numerical stability and efficiency in § 3.3. Finally, conclusions are addressed.

Numerical methods
The purpose of this work is to establish a mapping from the filtered velocity field (u, v, w) to SGS stress τ ij using a CNN-based model. In this paper, the datasets used for training and testing ML models come from the DNS of turbulent channel flow. There are various flow characters in such wall-bounded flow, which increases the difficulty of modelling. Besides, the progress of ML-based SGS models in channel flow is slow and demand for more research.

Details of DNS
The governing equations for DNS of turbulent channel flow are given by where u is the velocity vector, p is the pressure and ν is the kinematic viscosity. The simulation is performed with periodic boundary conditions in the longitudinal (x) and transverse (z) directions, the no-slip boundary condition is employed at the top and bottom walls. A third-order Runge-Kutta and six-order compact schemes are applied for time integration and spatial derivatives, respectively. A constant mass flux is maintained in the channel. Simulations are performed at Re τ = 178 and 600 with an open-source flow solver Xcompact3d (Bartholomew et al. 2020). Here, Re τ is the friction Reynolds number defined by wall fraction velocity u τ , the kinematic viscosity ν and the channel half-with δ, while the bulk Reynolds number is denoted as Re b . The computational domain sizes of DNS in the three dimensions are L x = 4π, L y = 2, and L z = 2π. The other computational parameters are listed in table 1. The validity of the DNS is shown in figure 2. To obtain the filtered data from DNS result, the sharp spectral filter is used in the wall-parallel (x and z) directions as in the preview study (Park & Choi 2021). The filter kernel of sharp spectral filter in the spectral space isĜ(k) = H(k c,x − |k|)H(k c,z − |k|), where H is the Heaviside function. The cut-off wavenumbers in the x and z directions are k c,x = 24(2π/L x ) and k c,z = 24(2π/L z ) respectively, which correspond to the filter size (∆x + , ∆z + ) = (46.7, 23.4) in Re τ = 178 case. The training dataset is sampled every 8 and 4 grid points in the x and z direction respectively, so that CNN can be trained with the similar grid resolution as the actual LES. All the grid points in the y direction are sampled except for that on the wall. The grid sizes used in the DNS of Re τ = 178 case are N x , N y , N z = 384, 129, 192 in the three directions. Using the above mentioned sampling method, totally 292608 (N x /8 × (N y − 2) × N z /4) grid points are sampled at each instant. We collect 200 instantaneous fDNS fields from DNS data at Re τ = 178 (80% for training, 10% for validation and 10% for the priori test in § 3.1). Based on our test, we also identify that no noticeable improvement of the performance reaches if using more training data.

CNN-based SGS stress model
Artificial neural network and convolutional neural network are the most widely used tools in the machine learning field. As shown in figure 1(a, b), there are multiple layers Reτ Nx, Ny, Nz Lx, Ly, Lz ∆x + , ∆y + min , ∆z + 4200 178 384, 129, 192 4π, 2, 2π 4.4, 1.0, 4.4 16800 600 384, 321, 384 2π, 2, π 9.8, 1.0, 4.9 Table 1. Parameter values of DNS. Here, Lx, Ly, Lz are the computational domain sizes in the streamwise (x), wall-norm (y) and spanwise (z) direction, respectively. Nx, Ny, Nz are the corresponding grid sizes. ∆x + and ∆z + denote the grid resolution in wall units, ∆y + min is the finest resolution in the wall-normal direction. between the input and output layers for both networks, each layer's input comes from the output of its former layer. In ANN, each neuron in one layer is usually connected to all neurons in the layer before it, the procedure of computing layer output can be formalized as where X l i denotes the ith output in layer l, φ is the nonlinear activation function, f ′ n is the number of hidden neurons in layer l − 1, W l and b l are the trainable parameters called weights and bias, respectively. For CNN, each neuron only receives a restricted region of data from the previous layer called the neuron's receptive field. Trainable weights are called filters in CNN, which are shared across neurons in the same layer and convolves on receptive fields in an iterative way to get the feature maps. The convolution operation is expressed as where f h and f w denotes height and width of filters, respectively. f ′ n is the depth of filters in the previous layer (layer l − 1), X l i,j,k is the output of the neuron located in row i, column j in the feature map k of the convolution layer l. W l m,n,k ′ ,k is the connect weight between feature map k ′ (layer l − 1) and feature map k (layer l), while b l k denotes the corresponding bias term. Weight sharing makes the extraction of shift-invariance and hierarchical features possible in deep neural networks. Because of the distinct character, CNN is ideal for data with a grid-topology such as image and CFD data.
As shown in figure 1(b), the prediction on a point in feature map X l+1 totally relies on its receptive field (e.g. 3 × 3) on the feature map X l , while the prediction of a point in X l depends on its receptive fields (e.g. 3 × 3) on X l−1 , which means a 5 × 5 size receptive fields in layer l − 1 contributes to the prediction of the point in layer l + 1. With the deepen of the neural network, the receptive field gets larger. Generally, the receptive field of a CNN with kernel size f h × f w and L hidden layers is regime around a point directly or indirectly influences the final prediction on the point. Therefore, we could use CNN to develop a nonlocal modelling method.
In terms of the SGS stress modelling problem, a CNN-based model can directly handle the raw field variables u, v, and w without destroying the spatial topology and the flow field feature extraction is automatic. However, the feature selection is hand-engineered and depends on priori knowledge for ANN. On the other hand, the ANN-based model can only include spatial information by adding more neighbouring gird points into input features, which multiplies the number of input features and enlarges the model size.
The training data are the same for ANN and CNN but organized in different ways. ANN makes predictions in a point-to-point way, i.e. taking physical values on a grid point (data shape N f eature ) and predicts τ ij on this grid point, while CNN works in a block-to-block way. Here, N f eature is the number of input features. CNN can take flow variables on the x-z plane (data shape N LES x × N LES z × N f eature corresponding to the two-dimensional (2D) convolution kernel) or a whole three-dimensional (3D) flow field (data shape N LES x × N LES y × N LES z × N f eature corresponding to the 3D convolution kernel) as the input. The grid of turbulent channel flow is usually uniform in the x and z direction but nonuniform in the y direction, the traditional 3D convolution kernel cannot represent uniform grid spacings in different directions. Besides, the 2D convolution kernel requires fewer parameters, therefore it is more efficient than the 3D one. Therefore, we choose the x-z plane type input in this work. Specifically, the training data are fed into the CNN in the shape of 48 × 48 × 3 (= N x /8 × N z /4 × N f eature ). The information of the y direction is contained in the training data sampled at different y locations. This kind of input can be applied to different grid resolutions in the y direction, thus more flexible.
The CNN used in this work is a ResNet (He et al. 2016) with 5 hidden layers shown in figure 3. The input is the three components of filtered velocity (N f eature = 3), while the output is the corresponding six components of SGS stress τ ij . The filter kernels in each hidden layer consist of f n feature maps with size f h × f w . Here f h and f w represent the sizes of receptive fields in each layer which can be regarded as a measure of non-local effect. Generally, the square filter (f h = f w ) is the most widely used in practice, which is also adopted in this work. A series of CNN-based models with different filter depth and size are examined in § 3.1. As a comparison, an ANN with five hidden layers (f n neurons per hidden layer) is also employed. Note that the input feature selection is crucial for ANN-based models, but it is not the focus of this article. The velocity gradient tensor ∂u i /∂x j is chosen as the input feature of ANN (N f eature = 9), since it is widely used in preview works ( Figure 3. Schematic of the ResNet. The inputs and outputs are the primitive quantitiesū,v, andw and six components of τij, respectively. For each convolutional layer, the depth of feature maps is fn and the filter kernel size is f h × fw. the output is also τ ij . The activation function φ for both models is the exponential linear unit (ELU) defined as where α = 1.0 here. There is no activation function in the last layer of both ANN and CNN.
For both the ML models, the objective function to minimize is the mean square error (MSE) loss function denoted as is computed from fDNS data using equation τ f DN S ij = u i u j −ū iūj and N is the total number of training data. The Adam optimizer (Kingma & Ba 2014) is employed to minimize the loss function. We use the friction velocity u τ to norm the training data.

HML-CFD framework
It is known that the ML algorithms (e.g. ANN and CNN) written mostly in Python are suitable for devices with a highly parallel structure like GPU, while most CFD algorithms written in Fortran/C++ run on CPU. When applying ML models to realistic numerical simulations, we may meet the problem of devices selection and mix-language programming. The HML-CFD framework is designed to make the combination of ML models and CFD simulations concise and efficient. As depicted in figure 4, the framework consists of a client and a server. The client is the CFD simulation process demanding unclosed terms or undetermined coefficients, while the server is the ML process waiting for the request from the client. The client sends fluid information to the server first and then waiting for the prediction of pre-trained ML models, the CFD process goes on after the predicted result is returned. The two processes exchange messages through an inter-process communication technology called sockets. A socket is one endpoint of a two-way communication link between two programs running on the network. Sockets allow communications between two different processes on the same or different machines. With the HML-CFD framework, we can focus on the development of the CFD process, while the ML process can work on efficient parallel devices like GPU or cloud computing resources.
In the case of CNN-based SGS model, the CFD process firstly sends the fluid field data u, v, and w to the ML process, then ML model predicts the SGS stress τ ij according to the received messages and returns the predictions to the CFD process. This procedure  repeats until the end of CFD simulation. Owing to the independence of the two program processes, the CFD program and ML program can run on different machines using different languages. In this way, we can take advantages of parallel computing in the computation of ML models and avoid the trouble of mixed-language programming. Hence, our proposed technique is applicable to the actual LES efficiently.
In this work, the CFD simulations are implemented by Fortran on Intel Core i7-9700K

Results
To examine the performance of ML-based models, a priori and a posteriori test are conducted in § 3.1 and § 3.2, respectively. In the priori test, the SGS stresses are predicted by ML models (ANN and CNN) with the input variables from fDNS at Re τ = 178. Two traditional models, i.e. Smagorinsky (SM) model, and Wall-Adapting Local Eddy-Viscosity (WALE) model are also performed as comparison. The WALE model is a kind of algebraic eddy viscosity model, it can return the correct wall-asymptotic behaviour, thus suitable for wall-bounded flows like channel flow. In the posteriori test, all the models are applied in actual LESs. Finally, the numerical stability and computational efficiency are discussed in § 3.3.

A priori test
The pre-trained models are examined with test data from fDNS at Re τ = 178 firstly. The correlation coefficients between the models' prediction τ pred ij and τ f DN S ij , i.e.
is used to assess models' performance. Totally 4 CNN-based models, i.e. CNN-K1, CNN-K3, CNN-K5, CNN-K7, are examined here. CNN-K3 denotes the size of filters used in this network is f h × f w = 3 × 3, CNN-K5 and CNN-K7 have the similar meanings. CNN-K1 is equivalent to an ANN with raw quantities u, v, and w as input, since the filter size is f h × f w = 1 × 1, which means nonlocal information is not considered.
To identify the relatively optimal parameters for different data-driven models, the effect of filter depth f n (or hidden neurons numbers per layer for ANN) is firstly investigated.  Figure 5. Correlation coefficients between the true and predicted τij from different models. Correlation coefficients are averaged over the whole domain (a) and x-z plane (b), respectively.
As presented in figure 5(a), with the increase of f n , ρ τ slightly raises till a critical f n , after which ρ τ shows no noticeable change, indicating larger f n is unhelpful. ANN and CNN-K1 are local methods, they only have different input features, i.e. ∂u i /∂x j and u i , respectively. ρ τ of ANN (about 0.5) is much larger than that of CNN-K1 (about 0.24), exhibiting the importance of feature selection in the local method. For the nonlocal CNNbased model with primitive flow variables u, v, and w as input, i.e. CNN-K3, CNN-K5, and CNN-K7, their predictions show relatively high correlation (ρ τ above 0.8) with the reference data and significantly higher than that of ANN and CNN-K1, indicating the CNN successfully extracted more effective features than the velocity gradient tensor related to the SGS stress from the primitive flow variables. From the comparison of the nonlocal CNN-based models, we could find the larger kernel size, including more nonlocal information in prediction, generally leads to better results but the tread is not so obvious after f h and f w greater than 3. The performance of CNN-K5 and CNN-K7 is nearly the same (marginally outperforms CNN-K3), which demonstrates moderately introducing nonlocal information is beneficial for the LES modelling.
Considering the balance of accuracy of efficiency, CNN-K3, CNN-K5, and CNN-K7 with f n = 32, CNN-K1 and ANN with f n = 16 are selected as the tested models in the following work. The variation of ρ τ with y is depicted in figure 5(b). It is seen that ρ τ of all models is relatively uniform in the outer layer but varies dramatically in the inner layer. The nonuniform performance may result from the fact that there exist distinct flow characters in different flow regions, while here only one model is used to represent features in the whole flow regime. The ρ τ from nonlocal data-driven models keep in a relatively high level in all the wall-normal distances especially for the near-wall vicinity. The results of CNN-K5 and CNN-K7 are quite close, slightly larger than that of CNN-K3 in the outer layer, which is consistent with the result displayed in figure 5(a). Therefore, the CNN-based models successfully extracted the key flow features from primitive variables, the involved nonlocal information contributes to the excellent performance in the priori test. However, high correlation coefficients of ρ τ in priori test can not guarantee the success in the actual LES (Park & Choi 2021), it is necessary to access the accuracy, stability and efficiency of SGS stress models in posteriori test.

A posteriori test
LES of turbulent channel flow with a constant mass flow at Re b = 4200 (denoted as LES178) and Re b = 16800 (denoted as LES600) will be performed using different models. All the LESs are carried out with the same numerical methods as those of DNS    (Moin & Kim 1982) (multiplying the SGS stress by (1 − e −y + /A + ) 2 with A + = 25) is used in the posteriori test of the SM model, no extra treatment is needed in the other models. Re τ obtained from LES with nonlocal models are close to that of DNS (less than 3 % error) in LES178 case. ANN with ∂u i /∂x j as the input shows similar performance as traditional models (around 5% error), while CNN-K1 underpredicts Re τ , indicating the friction velocity u τ is underestimated seriously. Figure 6(a) compares the mean velocity profiles from LES simulations at Re b = 4200. CNN-K3 and CNN-K5 show excellent predictions of the mean velocity, while local methods and the two traditional models overestimate the profile because of the underestimated u τ . Among all the models, only CNN-K7 predicts a lower mean velocity profile. In terms of velocity fluctuations, CNN-K3 and CNN-K5 also perform good, providing fairly well agreements of result with the reference in the x and z direction but noticeably deviation in the y direction. It is difficult to capture fluctuations in the y direction since the value is too small compared to the streamwise fluctuations, especially in near-wall regions. Note that CNN-K7 does not well predict the velocity fluctuations, the root-mean-square (r.m.s.) in the x direction decays too slow with the increase of the wall-normal distances after the peak value. CNN-K3 surpasses CNN-K7, even ρ τ of CNN-K3 is lower in the prior test. On the other hand, the ANN-based model generates fewer fluctuations than the DNS result while CNN-K1 and traditional models provide more. In general, all the SGS models generate reasonable solutions and the nonlocal data-driven models (except for CNN-K7) shows advantages over traditional models, while local ML models (i.e. CNN-K1 and ANN) do not.
The previous test is conducted at the same Reynolds number (Re b = 4200) as that of training data. To evaluate the extrapolation ability of ML models, the posteriori test is also carried out at a higher Reynolds number Re b = 16800. The ANN-based model diverges in this case, numerical stability is a big issue for the local data-driven method like the ANN-based model. As demonstrated in Park & Choi (2021), most ANNbased models face the numerical divergence problem without special treatments (e.g. clipping backscatter), while the extra treatments introduce human intervention and break the intention of data-driven methods to some extend. The SM and WALE models are numerically stable in a wide range of Reynolds numbers, which is the advantage of traditional models with no backscatter. All the CNN-based models generate physically reasonable solutions just as their performance in LES178 case. Re τ from LES600 are well predicted by the nonlocal models (less than 2% error), while CNN-K1 and the two traditional models underpredict it. Figure 7 (a, b) shows the CNN-K3 well capture the mean velocity profile and the r.m.s of velocity fluctuations, while CNN-K5 and CNN-K7 generate less fluctuation than the reference data but still reasonable. The result of CNN-K1 closes to that of the SM model, both overpredicting the mean flow and u rms . LES with the WALE model is better than that of the SM model, the subgrid eddy viscosity ν t of the WALE model tends towards zero as approaching the wall, which is more physically reasonable contrast to constant ν t in all the wall-normal distances. Figure 8 (a, b) compares the prediction of the mean SGS dissipation (ǫ SGS = −τ ij S ij ) and mean backscatter ( ǫ − SGS = 1 2 ǫ SGS − |ǫ SGS | ). The SM model dramatically overestimates the SGS dissipation, which indicates much more energy is transported from resolved scale to unresolved scale. On the contrary, the nonlocal CNN-based models slightly underestimate ǫ SGS in the near-wall regions, while ǫ SGS predicted by WALE model are too small, indicating the dissipation is not enough. The backward energy transfer from unresolved scale to resolved scale is the main cause of numerical instability in LES. From figure 8(b), we could find that the nonlocal models produce slightly less backscatter compared to the reference data without incurring numerical instability, CNN-K5 is the best among CNN-based models. The mean backscatter predicted by CNN-K1 is much lower, which may contribute to its numerical stability in this high Reynolds case. For SM and WALE, only positive ν t can be predicted, thus no backscatter is allowed. From previous tests, we could conclude that the nonlocal CNN-based model can well predict the SGS stresses in turbulent channel flow at LES178 and LES600 cases, their performance surpasses both the local data-driven methods (ANN and CNN-K1) and traditional models (SM and WALE model). The extra included spatial information contributes to the success of the SGS stress modelling. However, from the comparison of CNN-K3 and CNN-K7, we could also found that too large receptive fields seem unhelpful for the modelling. To analyze the effect of receptive fields qualitatively and quantitatively, the two-point correlations R in the streamwise and spanwise directions calculated from DNS data at Re τ = 178 are presented in figure 9. Two y + locations are selected, one close to the wall (y + = 5) and the other close to the centerline (y + = 148). As shown in figure 9, with the separation distance increases, the two-point correlations tend to zero, R fall off steeper in the spanwise direction than that of the streamwise direction, since the streamwise vortices are longer.
In CNN, the final prediction totally depends on the information from its receptive fields, as described in § 2.2, the receptive field of a CNN with f h × f w size kernels and L hidden layers is [L(f h − 1) + 1] × [L(f w − 1) + 1]. The receptive field for CNN-K3 is 11 × 11 here. Because the training data is organized in the shape of 48 × 48 × 3, the physical domain size corresponding to the receptive field is 11 12 π × 11 24 π (4π × 11 48 = 11 12 π, 2π × 11 48 = 11 24 π). In other word, the prediction of τ ij on a grid point is affected by information from a 11 12 π × 11 24 π size flow region around it. The receptive field of CNN-K3, as well as that of CNN-K5 and CNN-K7, is indicated by rectangles in figure 9. As shown in figure 9(a), the receptive field of CNN-K3 only contains the region of physical distance less than 11 24 π in the x direction, which includes the most related information to the predicted point in the flow field. For CNN-K5, the scope is larger and includes extra less related information (R ii < 0.3). CNN-K7 considers nearly 2 3 of the flow field to make predictions on a grid point, while most of the flow region is irrelative to the grid point, which means much more irrelevant flow field information is included. It increases the difficulty of modelling and risk of overfitting. It is more obvious in the z directions as can be seen in figure 9 (b), the two-point correlations R nearly all approach zero when the two-point distance is larger than 0.8 except for R 11 at y + = 148, the extra scopes of CNN-K5 and CNN-K7 do not provide more useful information, which accounts for the reason why CNN-K7 cannot surpass the other nonlocal CNN-based models even with more accessible information. Therefore, introducing moderate nonlocal information is beneficial for the SGS stress modelling, while excessive nonlocal information is unnecessary. The perspective of correlation analysis provides a quantitative method to guide the design of nonlocal data-driven models. On the other hand, the larger kernel size generally leads to more trainable weights and a larger network, thus affecting models' efficiency, which will be discussed below.

The numerical stability and computational efficiency
From the preceding analysis, the ML-based models are examined in the aspect of accuracy, while numerical stability and computational efficiency are also crucial for SGS models. Here, an extra posteriori test is carried out for the nonlocal CNN-K3 model to access its numerical stability and investigate the influence of grid resolutions. We select a series of grid resolutions from nearly one half to double the spatial resolution in the x and z direction used in models' training ((∆x + , ∆z + ) = (46.7, 23.4)), the simulations are conducted in the channel flow at Re b = 16800 with domain size L x , L y , L z = π, 2, π/2. The grid numbers N x and N z in the x and z direction are varied. Hereafter, we use N x ×N z denotes the grid resolutions, e.g. 32×32 representing N x = 32 and N z = 32. The grid numbers in the y direction are set as N y = 65 for the coarser mesh cases (21 × 21 and 32 × 32) and N y = 97 for the finer mesh cases (63 × 63 and 84 × 84). Note that the CNN-K3 model examined here is the same as that tested in § 3.2, only DNS data at Re τ = 178 is used for training, while this test is conducted at a higher Reynolds number.
As can be seen from figure 10(a), the result from the resolution 42 × 42, which is the same as the resolution used in training, provides fairly well agreement with the DNS result. As the resolution becomes coarser (resolution 32 × 32), the result of CNNbased model begins to deviate from the DNS result but is still in the rational range. When the resolution is as coarse as half of the resolution of training data (21 × 21), the grid becomes too coarse for the CNN-based model to generate a correct result, r.m.s. velocity fluctuations are dramatically overpredicted and the Reynolds shear stress is underestimated. Although not so accurate, the model still keeps numerical stability and does not diverge in such coarse resolution. On the other hand, the simulation in finer resolutions is more stable and authentic compared to the coarser ones, but a decline in performance is also clearly seen from the contrast of velocity fluctuations obtained in resolution 42 × 42, 63 × 63, and 84 × 84. The difference is not obvious for the predicted Reynolds shear stresses as shown in figure 10(b). The previous tests indicate the CNNbased models do not show mesh convergence trends as traditional models, i.e. a finer grid can not guarantee a better result. It results from that the ML-based models are trained with data at specific filter size, the models prone to obtain better results in the similar grid resolutions as training data rather than finer ones or the coarser ones. Overall, the CNN-based model is able to generate stable and reasonable solutions in a wide range of resolutions, exhibiting excellent numerical robustness. Finally, table 3 evaluates the models' complexity by comparing the number of parameters, multiply-accumulate (MAC) operations and the time consumption in realistic simulations of LES600 case (grid sizes N x , N y , N z = 42, 97, 42). The number of parameters determines the model size, while MAC denotes the number of calculations in a forward pass of the neural networks. The parameters numbers of CNN-K1 and ANN are quite small compared to the nonlocal models. CNN-K3 has about 30 times more parameters than CNN-K1. With the increase of kernel size, the number of parameters multiplies, so as the number of MAC operations. Note that the MACs for data-driven models (measured in billion) are much larger than the traditional models like SM, which means the models would be inefficient when working serially in practice and parallel running is necessary. When embedding these data-driven models into the realistic LESs, the discrepancy of time consumption for local models and nonlocal models is not so large as the MACs, since the convolution operation is well optimized by the mainstream machine learning libraries and parallel devices. Here, the ML algorithms and CFD simulations are performed on different devices (as introduced in § 2.3) of the same machine in the HML-CFD framework.  Table 3. Comparison of model size and computing efficiency for the LESs with different models in LES600 case. The results of time consumption are scaled by the time required by LES with no model for comparison convenience. Here, the unit K and G denote 10 3 and 10 9 respectively.
under the same condition is utilized to norm other models' time consumptions. Through the parallelization of ML algorithms, the two most accurate models in the posteriori test, i.e. CNN-K3 and CNN-K5, merely spent 39% and 44% extra cost respectively, that are comparable to the traditional models. These results show the potential of MLbased models for replacing traditional models as an efficient tool in reality. The current simulations conducted here are small-scale, the HML-CFD framework will have a great advantage in large-scale computing.

Conclusions
Based on the convolutional neural network, several SGS stress models are developed and examined in turbulent channel flow. CNN can automatically exploit flow features from raw flow variables, thus avoiding the trouble of feature selection. The naturally nonlocal property of CNN enables the prediction process accounting for spatial correlations in a wide flow region and contributes to the numerical stability in realistic LES. These nonlocal models were trained only with DNS data at Re τ = 178, and tested in both Re τ = 178 and 600 case. In the priori test, the correlation coefficients between τ ij predicted by the nonlocal CNN-based models and τ f DN S ij is as large as around 0.87, which is much larger than that of ANN. In the posteriori test, the CNN-based model accurately predicted the mean flow and root-mean-square of velocity fluctuations, showing an advantage over ANN and traditional models in terms of accuracy. These nonlocal datadriven models also well predicted the backscatter without incurring numerical instability. The influence of filter kernel size (or receptive fields) of CNN is also investigated, the receptive field with appropriate size includes spatial information into the prediction process which is beneficial for the SGS modelling, while too large scope size is unnecessary. The two-point correlation analysis explained the impact of kernel size in a physical perspective, providing a quantitative method to guide the design of nonlocal models. On the other hand, the CNN-based model is well fitted to different grid resolutions and shows excellent numerical robustness. Due to our HML-CFD framework, the CNN-based model can be as efficient as traditional models in practical simulations. This framework is suitable for various machine learning algorithms and physical problems, not limited to the LES modelling problem.
Although CNN has the advantage in feature extraction and information perception, it still has some shortcomings. One is that the CNN can be readily applied on a structured grid but suffers in configuration with complex geometry or unstructured grids because traditional convolutional kernels can not contain unstructured topology information. Data transformations or specially designed convolutional filters may be helpful for the solution of this problem. The ANN-based model is more flexible in this aspect since it works in a point-to-point manner. Besides, we have successfully carried out the extrapolation test in the Reynolds numbers and grid resolutions different from those of training data, but it is still a challenge for ML-based models to extrapolate to different flow types that are not contained in the training process. Interpretability is also a big issue for all data-driven approaches, which will be a target in our future work.