Multi-body effects in a coarse-grained protein force field

The use of coarse-grained (CG) models is a popular approach to study complex biomolecular systems. By reducing the number of degrees of freedom, a CG model can explore long time- and length-scales inaccessible to computational models at higher resolution. If a CG model is designed by formally integrating out some of the system's degrees of freedom, one expects multi-body interactions to emerge in the effective CG model's energy function. In practice, it has been shown that the inclusion of multi-body terms indeed improves the accuracy of a CG model. However, no general approach has been proposed to systematically construct a CG effective energy that includes arbitrary orders of multi-body terms. In this work, we propose a neural-network-based approach to address this point and construct a CG model as a multi-body expansion. By applying this approach to a small protein, we evaluate the relative importance of the different multi-body terms in the definition of an accurate model. We observe a slow convergence in the multi-body expansion, where up to five-body interactions are needed to reproduce the free energy of an atomistic model.


I. INTRODUCTION
Molecular dynamics (MD) is a well-established tool for studying biomolecular systems. [6] However, despite this progress, MD is still limited to relatively fast and localized processes when considering biological length- and timescales.
Various methods have been developed to push the boundaries of the time- and length-scales accessible by MD. Enhanced sampling methods promote the exploration of conformation space and the transition between long-lived (metastable) states in order to obtain converged estimates of free energy landscapes that are beyond the reach of naive MD simulations. [7] By using coarse-graining to reduce the degrees of freedom in the molecular system, simulations can be significantly sped up. [18][19][20][21] Because coarse-graining necessarily omits some physicochemical details, it is crucial to choose a CG strategy designed to preserve the properties of interest to the researcher. Even if some chemical details are missing in the CG representation, one can argue that a successful CG model allows us to focus on the most important physical factors associated with the system behavior. 19 In practice, the definition of a CG model consists of two steps: first, multiple atoms are mapped onto CG sites, often referred to as "beads." Then, an effective CG Hamiltonian is defined as a function of the bead coordinates. These steps are interconnected and are both important for the success of a CG model. 22 Although multiple algorithmic strategies have been proposed for the CG mapping, [23][24][25][26] they are most commonly based on physical and chemical intuition, and the optimization of the CG mapping is still an open area of research. 25 The definition of an effective Hamiltonian for a given CG mapping depends on the goal of the coarse-grained model. As some of the information is necessarily lost upon coarse-graining, CG models must be designed such that certain targeted properties of the molecular system are retained and can be computed from both the all-atom and the CG ensembles. Depending on the subject of study, this can be accomplished by using top-down, bottom-up, or knowledge-based models.
[28][29][30] In contrast, bottom-up CG methods are designed to preserve specific microscopic properties of an atomistic or first-principles model while coarse-graining, e.g., thermodynamic properties 16,18,[31][32][33][34] or kinetic properties. 35,37,38 Traditionally, CG Hamiltonians for protein systems have been defined as a combination of functional forms similar to the ones used in atomistic force fields, that is, harmonic bond, angle, and dihedral terms, and non-bonded two-body interactions (see, e.g., Refs. 29, 39, and 40). Multi-body terms have been added in CG models as a correction [41][42][43][44][45] or by considering the physical nature of the corresponding interactions, as, for instance, water-mediated potentials. 20,46 However, the multi-body terms are often crucial to reproduce the system's behavior correctly. For instance, an early study with a CG water model including three-body interactions showed the importance of many-body terms. 47 Larini et al. designed a CG model by parameterizing specific forms of two-body and three-body energy functions and have shown that it performs significantly better than models parameterized with only two-body potentials. 42 A more general and systematic approach, which does not require the choice of a specific three-body functional form, was developed by Das and Andersen. 48,49 When tested on the modeling of SPC/E water, this approach obtained a significant improvement in the model accuracy. 49 Additionally, Andrienko and co-workers have shown that the inclusion of many-body terms in a CG model can result in substantial changes in the two-body interactions, making them much more attractive at short distances. 50 By first parameterizing the two-body potential and then introducing three-body terms as a correction, these authors have demonstrated that three-body interactions are essential to reproduce structural properties of liquid water.
Many-body terms can also be included in a CG potential by using kernel-based machine learning methods, such as the Gaussian approximation potentials pioneered by Csányi and co-workers. [55] It has been shown that CG molecular models designed using this framework are able to describe many-body interactions and are much more accurate than models only using pair potentials. 56 Scherer and co-workers have also described a number of kernel-based strategies to parameterize the traditional force field of molecular liquids, and they have shown that a model with two- and three-body machine-learned potentials is computationally efficient and correctly recovers two- and three-body distribution functions. 57 Alternative approaches have been proposed to take into account multi-body terms. For instance, multi-body corrections can be included by utilizing virtual sites in a CG mapping scheme. 58 In the ultra-coarse-grained (UCG) theory, standard single-state CG beads are mixed with "special" CG beads with rapidly adjusting internal states. In theory, this approach can effectively account for multi-body effects in the CG model, but it is limited by the choices for the functional forms of the interactions. 59 UCG representations are equivalent to interacting-particle reaction dynamics (iPRD), 60 which were obtained as fine-grained representations of particle-based reaction dynamics rather than coarse-grained representations of molecular systems.
In recent work, by our group 61,62 and others, 26,63,64 a different philosophy has been employed to take into account multi-body effects in CG modeling: namely, taking advantage of the ability of modern machine learning techniques to approximate arbitrarily complex multi-body functions. Given the recent success in the use of these techniques for the definition of the classical energy function from quantum mechanical calculations, [65][66][67][68][69][70][71][72][73][74][75][76][77][78][79][80][81][82][83][84] a similar idea has been applied for coarse-graining. To this end, we have used both neural networks (CGnets) 61,62 and kernel methods 85 as universal function approximators that can represent complex many-body terms on top of lower order terms. We have demonstrated on several simple systems and a mini-protein that, thanks to the general modeling of the full n-body interaction potential allowed by these techniques, it is possible to design CG models that accurately reproduce the free energy landscape of atomistic models. 61,62,85
In the present work, we employ general CGnets and a multi-body CGnet architecture in order to analyze to which degree multi-body interactions are required to represent accurate coarse-grained force fields. We find that, on a test miniprotein, correction terms limited to three-body interactions are not sufficient for a CG model to reproduce the free energy of an atomistic model. Even if they are very small with respect to the two-body and three-body contributions, four-body terms and chirality information are needed to at least qualitatively reproduce the free energy landscape of the miniprotein considered here. Furthermore, five-body terms are necessary to quantitatively reproduce the free energy. Surprisingly, not only do additional terms beyond five-body not further improve the CG model (according to the mean square distance from the reference free energy landscape), but a model including up to five-body interactions outperforms our original CGnet model, which in theory allows for interactions at any order. 61 We believe that this architectural restriction of the multi-body interactions up to a certain order acts as an implicit regularization on the CG model that reduces overfitting, producing a smoother free energy landscape and better agreement with the atomistic model.
Although this approach is applied to a single system and we cannot directly generalize the results to different systems, the multi-body decomposition presented here opens the way to the formulation of more general and accurate CG models and to the understanding of the key physical ingredients shaping the energy landscape of a CG protein model.

II. THEORY AND METHODS

A. Coarse-graining with thermodynamic consistency
Bottom-up methods are said to be "thermodynamically consistent" if the free energy landscape of the resulting CG model matches the corresponding free energy landscape of the fine-grained model (when projected in the same space). One approach to enforce thermodynamic consistency is the so-called multi-scale coarse-graining (MS-CG) method proposed by Noid, Voth, and colleagues. 36,86 It has been proven that, under certain restrictions of the CG map, the thermodynamically consistent CG model can be uniquely identified from the set of all possible CG energy functions by minimizing the mean square error (MSE) between the instantaneous atomistic forces projected onto the CG space and the CG forces. 22 This procedure, originally developed in the atomistic context for ab initio data, 87 is called "force matching" and was first employed in the CG context in Ref. 18. However, the force matching MSE can never be reduced to 0. This is because each point in CG space corresponds to an ensemble of atomistic configurations, and the CG force cannot match each instantaneous force of an all-atom configuration mapping to the same CG configuration. Thus, the forces calculated from the thermodynamically consistent CG model correspond to the mean atomistic force computed on that ensemble of atomistic configurations weighted by their Boltzmann factors. 36 Therefore, the minimum MSE obtained through force matching in the CG context is strictly larger than 0 for any non-trivial CG mapping. In a statistical machine learning framework, this minimum MSE corresponds to the estimator noise. 61 In this section, we briefly describe the theory behind the force matching method for coarse-graining.
A configuration of an all-atom protein system consisting of Na atoms can be represented by a vector r ∈ R^(3Na). If the representation is coarse-grained, the atomistic configuration is mapped into a lower dimensional vector x ∈ R^(3n) via the mapping function

x = ξ(r),    (1)

where n < Na is the number of beads in the CG system. The mapping scheme ξ depends on the specific system under study. 37 Here, we assume that the CG mapping is given and that it captures the most important structural features of the molecule. We further assume that ξ is linear, i.e., the mapping is a linear transformation of variables defined by the coarse-graining matrix Ξ ∈ R^(3n×3Na), so that x = Ξr. In the following, we use a matrix Ξ whose elements are only zeros and ones: in other words, the CG mapping is a slicing of the configurational space of the atomistic model. We indicate the CG energy function as U(x; θ), where θ are parameters that need to be optimized. The parameterization of U(x; θ) can be realized in different ways, i.e., through the combination of fixed functional forms or through machine learning approaches. 61,62,85 Here, we employ a neural network architecture to represent U(x; θ), and we seek to optimize the network parameters θ to obtain the thermodynamically consistent energy function for a given CG system. This means that we wish to identify the parameters that best satisfy the following equation:

U(x; θ) = −kBT ln pCG(x) + const,    (2)

where kB is the Boltzmann constant, T is the temperature, and pCG(x) is the probability density distribution in the CG space, given by the marginalization of the probability density of the atomistic model,

pCG(x) = ∫ μ(r) δ(ξ(r) − x) dr / ∫ μ(r) dr,    (3)

where μ(r) = exp(−V(r)/kBT) is the Boltzmann weight associated with the atomistic energy V(r). 88 It is important to note that even if the atomistic energy V(r) contains mostly pairwise interactions, by enforcing Eq. (2), multi-body terms emerge in the thermodynamically consistent energy of a CG model, by effect of the dimensionality reduction.
Various methods have been proposed to satisfy Eq. (3) as best as possible, such as relative entropy 34 and force matching. 18,37 In this work, we design protein CG models by means of the force matching method.
In practice, force matching optimizes the parameters θ in the CG potential U(x; θ) through the minimization of the functional 18,37

χ²(θ) = ⟨‖ΞF F(r) + ∇U(ξ(r); θ)‖²⟩r,    (4)

where −∇U(x; θ) is the CG force field, ΞF F(r) is the instantaneous all-atom force projected onto the CG space, 88 and ⟨⋅⟩r is the weighted average over the equilibrium distribution of the atomistic model, i.e., r ∼ μ(r). With our assumptions on the CG mapping, the force projection operator becomes ΞF = Ξ.
It can be proven that the CG potential minimizing (4) in the space of all possible functions satisfies thermodynamic consistency (2) in the limit of infinite sampling. 18,37
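The force-matching residual of Eq. (4) can be written down compactly for a linear slicing map. The following is a minimal numpy sketch, with illustrative names and a toy three-atom/two-bead map; the zero-force "model" only demonstrates that the residual is strictly positive for any model that does not match the projected forces.

```python
import numpy as np

def force_matching_mse(cg_forces, atom_forces, Xi):
    """Mean square error between CG forces and projected atomistic forces.

    cg_forces:   (T, 3n) array, -grad U(xi(r_t); theta) for each frame t
    atom_forces: (T, 3Na) array, instantaneous all-atom forces F(r_t)
    Xi:          (3n, 3Na) linear CG map (here a 0/1 slicing matrix)
    """
    projected = atom_forces @ Xi.T      # Xi F(r_t) for every frame
    residual = projected - cg_forces
    return float(np.mean(np.sum(residual**2, axis=1)))

# toy slicing map keeping atoms 0 and 2 of a 3-atom system
Xi = np.zeros((6, 9))
Xi[0:3, 0:3] = np.eye(3)   # bead 1 <- atom 0
Xi[3:6, 6:9] = np.eye(3)   # bead 2 <- atom 2
F = np.random.default_rng(0).normal(size=(100, 9))

# even a model matching the *mean* projected force keeps a nonzero MSE
# (the noise floor discussed above); a zero-force model is clearly worse
mse_zero_model = force_matching_mse(np.zeros((100, 6)), F, Xi)
```

In the actual method, `cg_forces` would come from differentiating the trained network potential; only the projection and the MSE are shown here.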

B. Multi-body terms in the CG potential
In principle, the MS-CG approach allows us to find the correct thermodynamically consistent CG energy function if the MSE is minimized in the space of all possible functions, including multi-body terms. However, in practice, the CG model is usually optimized variationally in the space spanned by only two-body (or few-body) functions. 86 For an all-atom system with Na atoms, the atomistic potential energy of the system V(r) is usually expressed as the sum of non-bonded pairwise interactions (e.g., Lennard-Jones and Coulomb) and local terms such as harmonic bond, angle, and dihedral potentials. In general, the CG energy U(x, θ) can then be decomposed as follows:

U(x, θ) = Σ_{k=2}^{n} U^(k)(x, θ),    (5)

where x_i, with i ∈ {1, 2, ..., n}, indicates the coordinate of the ith CG bead. U^(k)(x, θ) indicates a functional form involving k-body interactions and can be further decomposed as

U^(k)(x, θ) = Σ_{{i_1, i_2, ..., i_k} ⊂ {1, 2, ..., n}} f_{i_1 i_2 ... i_k}(x_{i_1}, x_{i_2}, ..., x_{i_k}),    (6)

where {i_1, i_2, ..., i_k} ⊂ {1, 2, ..., n} indicates all possible combinations of k indexes chosen from the full set {1, 2, ..., n} and the f_{i_1 i_2 ... i_k} are non-decomposable k-body functions (i.e., they cannot be written as the sum of lower order functions). Note that we do not include k = 1 in (5) as (i) only relative energies matter, and thus, the energy U is defined up to an additive constant, and (ii) we assume the absence of external forces, and thus, the energy only depends on internal coordinates.
Traditional CG approaches only include lower order terms for the non-bonded interactions, that is, the effective CG potential usually contains only U^(2) and sometimes U^(3) terms in the decomposition (6), as discussed in the Introduction. In this work, we focus on the effect of higher order terms in CG protein models and extend our previously proposed deep learning framework (CGnet) 61 to explicitly learn the terms U^(k) for any k in a thermodynamically consistent potential U(x, θ).
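The sum over index subsets in Eq. (6) can be sketched directly. In this illustration, the non-decomposable function `f` is a hypothetical stand-in for a trained k-body network unit; only the combinatorial bookkeeping follows the decomposition above.

```python
import itertools
import numpy as np

def k_body_energy(x, k, f):
    """Sum the k-body term U^(k) of Eq. (6): one non-decomposable
    function f per k-subset of the n bead coordinates.

    x: (n, 3) array of CG bead positions
    f: callable taking a (k, 3) array (stand-in for a k-body unit)
    """
    n = len(x)
    return sum(f(x[list(idx)]) for idx in itertools.combinations(range(n), k))

# toy 3-body function of the three pairwise distances within a triplet
def toy_f(beads):
    pairs = itertools.combinations(range(len(beads)), 2)
    return float(sum(np.linalg.norm(beads[i] - beads[j]) for i, j in pairs))

x = np.random.default_rng(1).normal(size=(10, 3))  # 10 CG beads, as for chignolin
u3 = k_body_energy(x, 3, toy_f)                    # sums C(10, 3) = 120 unit outputs
```

The full potential of Eq. (5) would then be the sum of `k_body_energy(x, k, f_k)` over orders k = 2, ..., n.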

C. Constructing a multi-body CG model using neural networks
In our previous work, 61 we proposed CGnet, a deep learning framework, to model a thermodynamically consistent CG potential from all-atom molecular dynamics trajectories.In this work, we extend CGnet to extract the different n-body contributions explicitly, i.e., by means of a multi-body expansion.
We create a set of several different models to explore the contributions of the multi-body terms in the CG energy. We start the multi-body expansion (5) by considering the two-body contribution. A similar two-body network was also employed in our previous work 61 to define what we called "the spline model," which was used as a comparison to CGnet. In practice, as shown in Fig. 1(a), the Cartesian coordinates x of the CG system are first transformed into a set of roto-translationally invariant features y. Then, each single feature is passed separately through an individual network [indicated as "two-body unit" in Fig. 1(a)]. The features considered here consist of all the pairwise distances and of cos(ϕ) and sin(ϕ) for each dihedral angle ϕ spanned by four consecutive beads. In addition, all the angles defined by three adjacent beads, all the bonds between two adjacent beads, and excluded volume terms are passed through the prior energy unit, which precomputes a prior potential as previously described. 61 The excluded volume repulsive terms take the form Σ_ij (σ/r_ij)^c, where r_ij is the distance between CG beads i and j, for all pairs (i, j) connected by ⩾3 bonds. The excluded volume radius σ and exponent c are fixed to the values obtained in previous work: 61 σ = 5.5 Å and c = 6. The prior energy terms serve as a physical constraint and act as a regularization, ensuring that trajectories generated by the learned potential lie within physically meaningful regions. The sum of the energy resulting from all the two-body units, the dihedral units, and the prior energy is the CG energy of the two-body model.
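The excluded-volume prior described above can be sketched as follows, using the stated values σ = 5.5 Å and c = 6; the function name and random test coordinates are illustrative.

```python
import itertools
import numpy as np

def excluded_volume_prior(x, sigma=5.5, c=6, min_bond_sep=3):
    """Repulsive prior sum_ij (sigma / r_ij)^c over bead pairs (i, j)
    separated by at least `min_bond_sep` bonds along the chain.
    sigma (Angstrom) and c are the values reported in the text."""
    n = len(x)
    energy = 0.0
    for i, j in itertools.combinations(range(n), 2):
        if j - i >= min_bond_sep:              # linear chain: |i - j| bonds apart
            r_ij = np.linalg.norm(x[i] - x[j])
            energy += (sigma / r_ij) ** c
    return energy

x = np.random.default_rng(2).normal(scale=10.0, size=(10, 3))  # 10 C-alpha beads
prior = excluded_volume_prior(x)
```

For a 10-bead chain, 28 of the 45 bead pairs are separated by three or more bonds and contribute to the prior; the remaining short-range pairs are handled by the bond and angle prior terms instead.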
We then consider the next term in expansion (5) by including three-body terms. That is, once the two-body CGnet is trained, we fix its weights, and three-body contributions are added to the fixed two-body CGnet. The three-body contributions are added by defining three-body unit networks (see Fig. 1, with k = 3): each unit network takes as input the three pairwise distances among a set of three beads, for each of the ( n 3 ) sets that can be selected from the total n beads of the system. The training of the network is similar to that of the two-body model. However, now, only the three-body unit networks are trained, while the pre-trained two-body network is kept fixed, in order to obtain a three-body correction to the two-body model. The whole network defines the U^(n,2)(x) + U^(n,3)(x) terms of the multi-body expansion (5), and we refer to it as the "2, 3-body model" in the following. Note that in the 2, 3-body model, we do not include a prior energy unit explicitly because it is already included in the two-body model.
We continue to extend the network in this way to model expansion (5) by adding higher order corrections, as shown in Fig. 1(b). At the k-body order, the purple colored blocks in Fig. 1(b) indicate the previously trained networks, up to order k − 1, that are kept fixed when training the k-body network units. At the kth order, there are ( n k ) k-body units in the network [light blue blocks in Fig. 1(b)].
Each of the units takes as input the k(k − 1)/2 pairwise distances among a set of k CG beads selected among the n beads of the system. The sum of the outputs of all the k-body network units captures the U^(n,k)(x) term in Eq. (5), and each k-body unit captures a non-decomposable function f_{i_1 i_2 ... i_k} in Eq. (6). The entire model captures the U^(n,2)(x) + ⋯ + U^(n,k)(x) terms of the multi-body expansion, Eq. (5). We refer to the entire model as the "2, ..., k-body model" in the following.
For the special case of 2, 3, 4-body models, we define both a non-chiral model and a chiral model. If we consider the six pairwise distances corresponding to a group of four beads (quadruplet), they do not uniquely define the configuration of the beads, as arrangements of different chiralities are consistent with the same set of distances. We therefore define a chiral four-body network including up to four-body interactions but with additional information that encodes chirality: we also consider the dihedral angles defined by each set of four CG beads. For each dihedral angle ϕ spanned by four beads, we include cos(ϕ) and sin(ϕ) as additional features entering the corresponding unit. We refer to the four-body chiral model as the "2, 3, 4C-body model" in the rest of this paper.
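The degeneracy of distance-only features under reflection can be verified numerically. The sketch below computes a signed dihedral from four bead positions (a standard construction, not code from the paper) and mirrors the quadruplet: all six pairwise distances are unchanged while the dihedral flips sign, which is why sin(ϕ) is needed as a chirality-sensitive feature.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) spanned by four bead positions."""
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m = np.cross(n1, b2 / np.linalg.norm(b2))
    return float(np.arctan2(np.dot(m, n2), np.dot(n1, n2)))

quad = np.array([[0.0, 0.0, 0.0],
                 [1.5, 0.0, 0.0],
                 [2.0, 1.5, 0.0],
                 [1.0, 2.0, 1.0]])
mirror = quad * np.array([1.0, 1.0, -1.0])  # reflection through the xy plane

phi, phi_mirror = dihedral(*quad), dihedral(*mirror)
# cos(phi) is identical for both enantiomers; only sin(phi) tells them apart
chiral_features = (np.cos(phi), np.sin(phi))
```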
With this approach, we could, in principle, construct models at any order, up to k = n, as shown in Fig. 1: once a 2, 3, ..., (k − 1)-body model is trained, we keep it fixed and add ( n k ) different k-body unit networks to it. However, the number of k-body units increases rapidly with k, and the memory requirements for the training quickly become prohibitive. In this work, we consider up to five-body terms in expansion (5) and compare the results with the "vanilla" CGnet, where interactions up to any order are included (as all the input features enter one large dense neural network).
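The combinatorial growth of the unit count can be made concrete for the 10-bead system studied here:

```python
from math import comb

n = 10  # number of CG beads (chignolin C-alpha model)

# number of k-body unit networks at each order: C(n, k)
units_per_order = {k: comb(n, k) for k in range(2, 6)}
# {2: 45, 3: 120, 4: 210, 5: 252}; C(n, k) peaks at k = n // 2,
# so for n = 10 the unit count keeps growing all the way to order 5

# number of pairwise-distance inputs per unit: k(k - 1)/2
inputs_per_unit = {k: k * (k - 1) // 2 for k in range(2, 6)}
```

Note that the total input count also grows, since each of the C(n, k) units at order k processes k(k − 1)/2 distances.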

D. Training and simulation of the multi-body models
For the two-body CG model, a three-stage fivefold cross-validation is conducted to find the optimal hyperparameters (network depth, number of nodes per layer, and Lipschitz regularization strength) of the network units as follows. In the first stage, we fix the number of nodes per layer and the Lipschitz regularization strength at finite values and sweep only the number of network layers. For each hyperparameter combination, we conduct fivefold cross-validation, and we identify the optimal network depth as the one associated with the smallest cross-validation score. In the second stage, we fix the number of network layers to this optimal value and sweep the number of nodes per layer, while keeping the Lipschitz regularization strength fixed; we identify the optimal network width as the one with the minimum cross-validation score. In the third stage, we proceed similarly by fixing the network depth and width to their optimal values, sweeping the value of the Lipschitz regularization strength, and identifying its optimal value as the one associated with the minimum cross-validation score. The Adam optimizer with mini-batch stochastic gradient descent is used to train the network units. 89,90 The hyperparameters resulting from cross-validation are reported in Table I. Once the optimal hyperparameters are selected, the final energy model is defined as the average of the five models corresponding to the folds of the cross-validation at the optimal values of the hyperparameters.
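The three-stage, one-hyperparameter-at-a-time selection is a coordinate-descent grid search. A minimal sketch, with a mock scoring function standing in for the fivefold cross-validation error (all names and grid values are illustrative, not the paper's):

```python
import numpy as np

def staged_search(score, grids, defaults):
    """Sweep one hyperparameter at a time (depth, then width, then
    Lipschitz strength), fixing each to the value with the lowest
    cross-validation score before moving to the next stage."""
    best = dict(defaults)
    for name, values in grids.items():          # dict order = stage order
        scores = [score({**best, name: v}) for v in values]
        best[name] = values[int(np.argmin(scores))]
    return best

# mock CV score with a unique optimum at depth=3, width=64, lip=4.0
def mock_cv_score(h):
    return (h["depth"] - 3) ** 2 + (h["width"] - 64) ** 2 / 100 + (h["lip"] - 4.0) ** 2

grids = {"depth": [1, 2, 3, 4, 5], "width": [32, 64, 128], "lip": [1.0, 4.0, 10.0]}
best = staged_search(mock_cv_score, grids, {"depth": 2, "width": 32, "lip": 1.0})
```

Unlike a full grid search over all combinations, this procedure evaluates only the sum of the grid sizes, at the cost of assuming the hyperparameters are roughly independent.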
For the multi-body models, we follow the same cross-validation procedure as for the two-body model to obtain the optimal values of the hyperparameters. However, at each order, only the hyperparameters for the network units that are added are optimized by cross-validation, while the underlying lower order networks are kept the same as in the lower order models. For example, the 2, 3-body model contains the previously trained two-body network and additional three-body units (see Fig. 1). In the 2, 3-body model, the hyperparameters of the additional three-body network units are optimized (the same hyperparameters are used in each unit), while the two-body component is kept fixed. As higher order units describe more complex interactions, their network structures are not necessarily the same at every order.
After a multi-body CGnet model has been obtained, we simulate it by numerically integrating the overdamped Langevin dynamics equation with the corresponding CG potential to generate trajectories and explore the free energy landscape,

x_{t+τ} = x_t − (τD/kBT) ∇U(x_t; θ) + √(2τD) η_t,    (7)

where x_t (x_{t+τ}) is the CG configuration at time t (t + τ), τ is the time step, D is the diffusion constant, and η_t is a Gaussian random variable with zero mean and unit standard deviation. As in our previous work, 61 to sample the effective multi-body potential more efficiently, we generate 100 independent trajectories in parallel, with initial configurations randomly sampled from the original dataset.
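This update rule can be sketched as a plain Euler–Maruyama integrator. In the sketch below, `grad_U` stands in for the gradient of the trained CG network potential; the harmonic toy potential and all parameter values are illustrative only, chosen so that the sampled variance can be checked against the known equilibrium value kBT/k.

```python
import numpy as np

def overdamped_langevin(grad_U, x0, n_steps, tau=1e-3, D=1.0, kBT=1.0, seed=0):
    """Euler-Maruyama integration of overdamped Langevin dynamics:
    x_{t+tau} = x_t - (tau D / kBT) grad U(x_t) + sqrt(2 tau D) eta_t,
    with eta_t standard normal."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        drift = -(D * tau / kBT) * grad_U(x)
        noise = np.sqrt(2.0 * D * tau) * rng.normal(size=x.shape)
        x = x + drift + noise
        traj.append(x.copy())
    return np.array(traj)

# toy potential U(x) = 0.5 k |x|^2; equilibrium variance is kBT / k per dof
k_spring = 4.0
traj = overdamped_langevin(lambda x: k_spring * x, x0=np.ones(3), n_steps=20000)
```

Running 100 such trajectories in parallel from configurations drawn from the dataset, as described above, amounts to vectorizing `x0` over an extra batch dimension.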
In order to visualize and compare the results, in the following, we project the trajectories of each model onto the space spanned by two collective coordinates: the first two TICA coordinates 91,92 of the all-atom system. Free energy surfaces (Fig. 2) are then computed as the negative logarithm of a two-dimensional histogram over the TICA coordinates.
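The histogram-based free energy estimate can be sketched in a few lines; the Gaussian test data here merely stand in for projected TICA coordinates.

```python
import numpy as np

def free_energy_2d(c1, c2, bins=40):
    """Free energy surface (in kBT units) as the negative logarithm of a
    normalized 2D histogram over two collective coordinates; empty bins
    get infinite free energy, and the minimum is shifted to zero."""
    H, xedges, yedges = np.histogram2d(c1, c2, bins=bins)
    with np.errstate(divide="ignore"):
        F = -np.log(H / H.sum())
    F = F - F[np.isfinite(F)].min()
    return F, xedges, yedges

rng = np.random.default_rng(3)
c1, c2 = rng.normal(size=100000), rng.normal(size=100000)
F, xe, ye = free_energy_2d(c1, c2)
```

By construction, the most populated histogram bin maps to the free energy minimum.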

III. RESULTS
We apply the multi-body decomposition described above to study a series of CG models for chignolin, a 10 amino acid mini-protein [Fig. 2(a)]. 93 The reference all-atom trajectories were obtained by simulating the system with 1881 water molecules for a total of 5820 atoms. All the CG models studied here consist of 10 beads, located at the positions of the Cα atoms along the protein backbone [Fig. 2(a)]. The reference all-atom free energy, shown in Fig. 2(a), exhibits three minima, where minimum A corresponds to the folded state, B corresponds to the unfolded state, and C corresponds to the misfolded state. Typical configurations from these three states are shown in Fig. 2(a).
Figure 3 reports the cross-validation error, the free energy MSE, and the KL divergence (computed as in our previous work 62 ) of each model with respect to the reference atomistic model [Fig. 2(a)]. Several notable results are apparent from Fig. 3. First, the multi-body CG expansion converges slowly: while the 2, 3-body model presents a significant improvement over the two-body model, the errors remain larger than the corresponding ones for the "vanilla" CGnet up to the 2, 3, 4C-body model. More quantitatively, while the addition of three-body interactions lowers the MSE by a little more than 1 kBT, the difference in MSE between the 2, 3, 4C, 5-body model and the 2, 3-body model is still about 1/2 kBT. The slow convergence is also quite evident in the gradual reduction of the KL divergence with the addition of multi-body terms [Fig. 3(b)].
Additionally, in all three measures of the error reported, the errors associated with the 2, 3, 4C, 5-body model are smaller than for the vanilla CGnet model: the neural network potential including up to five-body terms appears to outperform the neural network potential where higher order interactions are included. The same trend appears in the learning curves of these models: the validation error for the 2, 3, 4C, 5-body model is smaller than the validation error of CGnet at every step of the training. This result suggests that the original CGnet overfits the training data and that the multi-body expansion acts as a useful form of implicit regularization or inductive bias.
Free energy landscapes associated with the different models are obtained by means of overdamped Langevin simulations (Sec. II D) and are shown in Fig. 2, together with the reference free energy landscape of the all-atom model [Fig. 2(a)] and of the original CGnet [Fig. 2(b)]. The comparison of these free energy landscapes echoes the results illustrated by Fig. 3. While the two-body model [Fig. 2(c)] does not show a separation between the folded and unfolded states of the protein, the addition of three-body terms in the 2, 3-body model allows us to clearly identify the folded, unfolded, and misfolded states on the free energy landscape [Fig. 2(d)]. The free energy landscape becomes progressively closer to the reference one when four-body [Fig. 2(e)] and chiral four-body interactions [Fig. 2(f)] are added, and it is in good quantitative agreement for the 2, 3, 4C, 5-body model [Fig. 2(g)]. The free energy landscape of the latter is smoother and visually closer to the reference than that of the original CGnet model, supporting the hypothesis of a regularizing role of the multi-body expansion.
It is interesting to consider the CG potential energy contribution to the free energy for the different models. In Fig. 4, we report the values of the CG energy for all configurations sampled by the all-atom model, projected onto the same two-dimensional space of the first two TICA coordinates obtained from the all-atom data. Figures 4(a)-4(c), 4(f), 4(i), and 4(l) show the total CG energy of the different models, including the original CGnet [Fig. 4(a)]. The different energy landscapes all appear surprisingly similar, with only the two-body model being clearly different from the others: all energy surfaces show a significant energy minimum corresponding to the folded state and an additional minimum corresponding to the misfolded state, while the configurations in the unfolded state have significantly higher energy. However, the small differences among these energy landscapes are associated with markedly different free energy landscapes. Figures 4(d), 4(g), 4(j), and 4(m) show the incremental differences in CG energy corresponding to the different terms in the multi-body expansion (5). As the energy differences are relatively small with respect to the energy gap between the folded and unfolded states, Figs. 4(e), 4(h), 4(k), and 4(n) show the same CG energies with a color scale zooming in on a smaller energy range.

IV. CONCLUSIONS
We have presented the results of a multi-body expansion of a CG model for a small protein, chignolin. This is made possible by constructing a neural network architecture for the CG potential in a manner resembling a multi-body energy expansion. Using this approach, we can separate the different terms in the multi-body expansion and evaluate their contributions to the energy and free energy landscapes. Not surprisingly, CG potentials including only pairwise interactions (in addition to angle and dihedral terms between adjacent CG beads) fail to reproduce the correct folding landscape of the protein, even at the qualitative level.
Perhaps more surprisingly, the CG multi-body expansion converges slowly for our model miniprotein: only when the CG potential includes up to five-body terms does the free energy associated with the CG model appear remarkably similar to the reference all-atom free energy as a function of the same collective coordinates. As only one model system is studied here, we cannot easily generalize these results. However, at least for the system considered here, such a slow convergence of the multi-body expansion is in contrast to the fast convergence of the multi-body expansion capturing the behavior of the Born-Oppenheimer potential energy surface (PES). In the latter, three-body terms can be as large as 15%-20% of the total interaction energy, while four-body terms provide on average only a 1% energy contribution. 94 This fast convergence has allowed for the development of very accurate analytical models for the PES of water from high-level quantum mechanical calculations for low-order interactions. On the other hand, if a slow convergence of the multi-body expansion also holds for other CG protein models, it makes it more challenging to obtain explicit analytical expressions for their effective energy functions. In principle, we expect that, when extended to the recently proposed transferable neural network architectures for the design of CG models, 62,63 a multi-body expansion could disentangle the different contributions of the interactions between different groups of residues, and analytical expressions could then be obtained (see, e.g., Ref. 95). However, in addition to the slow convergence of the multi-body expansion, the large number of combinations for the different residues' clusters makes this task much more daunting than what has been possible for the characterization of the bulk water PES. 94,96
It is important to note the different nature of the multi-body terms in a Born-Oppenheimer PES and in CG models. In a PES, such terms are directly linked to the quantum mechanical description of the system and the delocalized nature of the electronic structure. In CG models, they emerge as a result of the renormalization of atomistic degrees of freedom. We expect multi-body terms to play a more significant role when the dimensionality reduction is strong. Here, we have used a rather aggressive coarse-graining scheme, from a solvated atomistic description to a Cα-only resolution. It is likely that, even for the same system, a different coarse-graining mapping could change the relative importance of the multi-body terms. We believe that the absence or limited presence of multi-body terms in traditional CG models has hindered the design of transferable CG models. It remains to be seen how the inclusion of multi-body terms in the form of neural network potentials changes the delicate balance between accuracy and transferability in CG models.
It is also worth noting that, at least for the chignolin system considered here, a CG neural network model truncating the multi-body expansion at five-body terms performs better than a model with a CG energy built as a fully connected neural network, which, in principle, includes interactions at any order. This result suggests a possible overfitting of the CGnet potential and an implicit regularizing effect of the multi-body expansion.
In terms of computational cost, MD simulations with CG neural network potentials are still slower (by about a factor of 10) than MD simulations with conventional CG potentials, where the potential energy is composed of pairwise interactions with a given functional form or tabulated values. However, CG neural network models are already faster (by a factor of 5-50, depending on the system and on the network architecture) than the corresponding atomistic models in explicit solvent. The studies presented so far with CG neural network potentials are still exploratory in nature, and no significant effort has been devoted to code optimization for simulation speed. We believe that this problem will be addressed in the near future as this field matures.
This study was performed using the small protein chignolin as a model system.We can only speculate if the conclusions from this study will also apply to larger proteins; future work will address this question.

FIG. 1. (a) Neural network structure for the two-body CG potential. The input to a given two-body unit is a single pairwise distance between a pair of CG beads. (b) Neural network structure for a multi-body CG potential up to order k. The inputs to a given k-body unit are all pairwise distances within a given set of k CG beads.

FIG. 2. Free energy landscapes associated with the trained multi-body CG models as a function of the two slowest time-lagged independent components of the all-atom simulation. (a) Reference free energy from the all-atom model and representative molecular structures for each of the three metastable states. (b) Full CGnet, (c) two-body model, (d) 2, 3-body model, (e) 2, 3, 4-body model with no chirality, (f) chiral 2, 3, 4C-body model, and (g) 2, 3, 4C, 5-body model. The free energies are reported in units of kBT.

FIG. 3. (a) Cross-validation error (blue bars) and free energy mean square error, MSE (red bars), for the different CG models studied. The units for the CV error are [kcal/(mol ⋅ Å)]², while the free energy MSE is measured in (kBT)². (b) KL divergence between the equilibrium distribution of the different CG models studied and the reference atomistic model.

TABLE I.
Hyperparameters of the neural network for different models.
J. Chem. Phys. 154, 164113-8. Published under license by AIP Publishing.