Adversarial reverse mapping of condensed-phase molecular structures: Chemical transferability

Switching between different levels of resolution is essential for multiscale modeling, but restoring details at higher resolution remains challenging. In a previous study we introduced deepBackmap: a deep-neural-network-based approach to reverse-map equilibrated molecular structures for condensed-phase systems. Our method combines data-driven and physics-based aspects, leading to high-quality reconstructed structures. In this work, we expand the scope of our model and examine its chemical transferability. To this end, we train deepBackmap solely on homogeneous molecular liquids of small molecules and apply it to a more challenging polymer melt. We augment the generator's objective with different force-field-based terms as a prior to regularize the results. The best-performing physical prior depends on whether we train for a specific chemistry or transfer our model. Our local-environment representation, combined with the sequential reconstruction of fine-grained structures, helps reach transferability of the learned correlations.


I. INTRODUCTION
The demand to further expand the accessible length and time scales in computer simulations of molecular systems remains consistently high. Breaking the limits of molecular dynamics (MD) simulations is therefore still an area of active research. State-of-the-art approaches include enhanced-sampling techniques,1 dedicated hardware,2 and hierarchical multiscale modeling.3-5 Multiscale modeling aims at linking different levels of resolution. The reduced resolution in a coarse-grained (CG) model smooths the energy landscape and thereby effectively accelerates the simulation. On the other hand, atomistic details are sometimes necessary for a thorough investigation of processes on smaller scales. The goal is therefore to use a CG model with a reduced number of degrees of freedom where possible and to switch back to a higher resolution where needed.6,7 However, the process of reintroducing lost degrees of freedom is challenging, as it requires reinserting details with the correct statistical weight: given the CG configuration, the generated atomistic structure should follow the Boltzmann distribution of atomistic microstates.
Existing backmapping schemes typically consist of the following steps: at first, an initial atomistic structure is proposed for the given CG configuration.8 A generic approach is to randomly place the atoms close to their corresponding CG bead center.9,10 Subsequent energy minimization is needed to relax the structures, and some (typically position-restrained) MD simulations have to be performed to obtain the correct Boltzmann distribution. The computational effort for the energy-minimization and MD-simulation schemes can become significant. Further, poorly initialized structures can get trapped in local minima with high energy barriers. Therefore, human intervention is still required for more complex molecular structures to obtain a reasonable initial structure. The computational cost of subsequent energy-minimization and MD-simulation procedures can be reduced significantly when presampled fragments of correctly sampled all-atom structures are used to generate the initial backmapped configuration.5,7,11,12

While reverse-mapping of molecular structures is still tackled largely by classical methods like energy minimization and MD simulation, recent approaches leveraging machine learning (ML) are receiving growing attention. Wang and Gómez-Bombarelli used a variational auto-encoder (VAE) to learn a mapping from an all-atom representation to coarse-grained variables, parametrizing the coarse-grained force field and decoding back to atomistic detail.13 Other approaches, including our previous study and the work by Li et al., used convolutional conditional generative adversarial networks (convolutional cGANs) to learn the correspondence between CG and fine-grained configurations.14,15 cGANs originate from computer-vision applications and have shown the ability to model highly complex and detailed probability distributions.16 Li et al. used a convolutional cGAN in their study on backmapping cis-1,4-polyisoprene melts, using an image representation that converts the XYZ components of vectors into RGB values.15 Other approaches for generating low-energy geometries of molecular compounds, but not specifically designed for reverse-mapping, include autoregressive models,17,18 invertible neural networks,19 Euclidean distance matrices,20 and graph neural networks.21

In our previous work we introduced deepBackmap (DBM):14 an approach based on cGANs to directly predict equilibrated molecular structures for condensed-phase systems. In contrast to the work by Li et al., we aim to improve the quality of generated structures by incorporating prior knowledge into the input representation as well as the loss function of the generator. We use a voxel representation to encode spatial relationships and make use of different feature channels, typical for convolutional neural networks, to encode information about the molecular topology. The loss function of the generator is augmented with a term penalizing configurations with high potential energy.
A regular discretization of 3D space prohibits scaling to larger spatial structures. Therefore, we use an autoregressive approach that reconstructs the fine-grained structure incrementally, atom by atom. In each step, we provide the convolutional generator only with local information, making the method scalable to larger system sizes and applicable to condensed-phase systems.
In this work we explore the model's capability with respect to chemical transferability: we probe model generalization beyond the chemistry used for training. We recycle the learned local correlations to make predictions for molecules absent from the training set. We argue that our sequential approach combined with the local-environment representation is well suited to achieve chemical transferability, as long as the generation of one atom relies only on short-range, force-field-related features. We hypothesize that these atomic environments strongly overlap across chemistry, as suggested by the successes of ML for various electronic properties.22 We train the model on molecular liquids of small molecules: octane and cumene. After training, we deploy the model on a more challenging polymer melt: syndiotactic polystyrene (sPS). sPS is well suited for our study as it is sufficiently complex but still has some features in common with octane and cumene, and therefore allows for a better understanding of the limits of generalization. The pertinent but imperfect match between the small molecules and the polymer makes for a more stringent backmapping exercise. Furthermore, we insert two different physical priors, based on the molecular force field, into the generator's objective. We compare their impact on the performance of the model, especially regarding chemical transferability.

II. MACHINE LEARNING MODEL
In the following, we briefly summarize the approach and then focus on new extensions and applications of our model DBM. For a more detailed description of the model, the reader is referred to our recent publication.14

A. Setup

We recall the notation for the coarse-grained and atomistic resolutions, as well as the backmapping procedure:

Coarse-grained resolution: Let $\{A_I = (R_I, C_I) \mid I = 1, \dots, N\}$ denote the set of $N$ coarse-grained beads, where $I(i)$ is the index of the bead that contains the atom with index $i$. Each bead has position $R_I \in \mathbb{R}^3$ and bead type $C_I$.

Atomistic resolution: Let $\{a_i = (r_i, c_i) \mid i = 1, \dots, n\}$ denote the set of $n$ atoms, with position $r_i \in \mathbb{R}^3$ and atom type $c_i$. We denote by $\varphi_I \subset \{a_i \mid i = 1, \dots, n\}$ the set of atoms contained in the coarse-grained bead $A_I$.
Backmapping: Backmapping requires us to generate a set of $n$ atom positions $r_1, \dots, r_n$ conditional on the coarse-grained (CG) structure, given by the $N$ beads $A_1, \dots, A_N$, as well as the atom types $c_1, \dots, c_n$. We express this problem as a conditional probability $p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N)$.
Our ML technique takes examples of corresponding coarse- and fine-grained configurations as input and, from this training data, learns to generate further samples from the conditional distribution $p$.
Learning to sample from $p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N)$ directly causes several problems: (1) The trained model is fixed to the system size and atom ordering used during training. (2) The model becomes specific to the given molecules, and thus chemical transferability cannot be achieved. (3) A dense system has an overwhelming number of degrees of freedom. A one-shot method, which generates all coordinates of the system at once, would have to solve an unreasonably high-dimensional problem. Applications to the condensed phase are therefore limited.
To avoid these problems, we factorize p in terms of atomic contributions, where the generation of one specific atom becomes conditional on CG beads as well as all the atoms previously reconstructed. 18 We therefore train a generative network, G, to generate and refine the atom positions sequentially.
The backmapping scheme consists of two steps: (i) An initial structure is generated using the factorization

$p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N) = \prod_{i=1}^{n} p\big(r_{S(i)} \mid r_{S(1)}, \dots, r_{S(i-1)}, c_1, \dots, c_n, A_1, \dots, A_N\big),$

where $S$ sorts the atoms in the order of reconstruction and $\{r_{S(1)}, \dots, r_{S(i-1)}\}$ corresponds to the atoms that have already been reconstructed. The dependence on earlier predictions of $G$ makes our approach autoregressive. (ii) In a second step, we refine the generated structures, since the autoregressive pass can still leave misplaced atoms that blow up the potential energy of the system. To this end, we perform a sampling scheme inspired by Gibbs sampling, which iteratively resamples along the sequence $S$ several times.23 Each further iteration still updates one atom at a time, but uses the knowledge of all other atoms. Our experiments confirmed that such sampling leads to a good approximation of $p$, even with a small number of iterations and a fixed atom ordering.
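To make the two-step scheme concrete, the following minimal sketch mimics it with toy stand-ins: `toy_generator`, `local_env`, and `Z_DIM` are hypothetical placeholders (not from the released code) for the trained cGAN generator, the voxelized local-environment builder, and the noise dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM = 64  # illustrative noise dimensionality

def toy_generator(u, z):
    # hypothetical stand-in for the trained cGAN generator G(u, z)
    return u.mean(axis=0) + 0.05 * z[:3]

def local_env(atom, coords, bead_centers):
    # hypothetical stand-in for the voxelized local-environment builder;
    # here simply the positions that condition the prediction
    prev = [r for a, r in coords.items() if a != atom]
    return np.array(prev + [bead_centers[atom]])

def backmap(sequence, bead_centers, generator=toy_generator, n_refine=4):
    """Two-step DBM scheme: (i) autoregressive initialization,
    (ii) Gibbs-like resampling sweeps along the same fixed order."""
    coords = {}
    for atom in sequence:                              # (i) initial placement
        u = local_env(atom, coords, bead_centers)
        coords[atom] = generator(u, rng.normal(size=Z_DIM))
    for _ in range(n_refine):                          # (ii) refinement
        for atom in sequence:                          # one atom at a time,
            u = local_env(atom, coords, bead_centers)  # given all others
            coords[atom] = generator(u, rng.normal(size=Z_DIM))
    return coords

# usage: four atoms, one per CG bead
beads = {i: rng.normal(size=3) for i in range(4)}
positions = backmap(sequence=list(beads), bead_centers=beads)
```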

B. Local representation and feature embedding
We use deep convolutional neural networks (deep CNNs), motivated by the impressive developments for generative tasks in computer vision.16,24 In order to leverage CNNs for our task, an explicit spatial discretization of ambient space is required. To this end, we use a voxel-based representation. One-hot encoding of the atom positions, where each atom is assigned to its nearest voxel, leads to severe sparsity that hinders learning. Therefore, we encode atoms and CG beads with smooth densities, $\gamma(x)$ and $\Gamma(x)$, respectively, modeled using Gaussian distributions, $\gamma_i(x) \propto \exp\left(-\|x - r_i\|^2 / 2\sigma^2\right)$, where $x$ is the spatial location in Cartesian coordinates expressed on a discretized grid. The density is centered around the particle position $r_i$, with the Gaussian width $\sigma$ treated as a hyperparameter.
For each atom to be placed, a unique representation is generated from the density of the particles around it. We assume locality by limiting the information about the environment to a cutoff $r_\text{cut}$ and sum over all atoms or beads within a cubic environment of edge length $2 r_\text{cut}$ centered on the current atom of interest.
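A minimal sketch of this encoding, assuming unnormalized Gaussian blobs on a cubic grid of edge $2 r_\text{cut}$; the grid resolution, cutoff, and width below are illustrative values, not those used in the paper:

```python
import numpy as np

def voxel_density(positions, center, r_cut=0.7, n_vox=16, sigma=0.05):
    """Sum of Gaussian blobs on a cubic grid of edge 2*r_cut centered on
    the atom of interest; grid size, cutoff, and width are illustrative."""
    axis = np.linspace(-r_cut, r_cut, n_vox)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    density = np.zeros((n_vox, n_vox, n_vox))
    for r in positions:
        d = np.asarray(r) - np.asarray(center)
        if np.all(np.abs(d) <= r_cut):     # locality: skip atoms outside
            density += np.exp(-np.sum((grid - d) ** 2, axis=-1)
                              / (2 * sigma ** 2))
    return density
```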
In our previous work, we motivated the local-environment representation by the high computational cost of regular 3D grids, which makes representing the whole structure at once infeasible, and by the scalability to larger system sizes. In this study, we also emphasize the gains of a local-environment description for achieving chemical transferability. It only encodes small-scale features that are not necessarily unique to a given molecule, and thus the learned local correlations are more likely to generalize.
Another key to the chemical transferability of DBM is the feature embedding. Similar to the three feature channels of RGB images, we store a number of feature channels in each voxel that represent the presence of other atoms or beads of a certain kind. In our current implementation, the feature mapping is flexible and can be defined individually by the user. Atom types can be distinguished not only by element but additionally by chemical similarity, i.e., atoms of a given type are treated as identical in the MD simulation. Furthermore, the user can add channels to distinguish the functional form of the interaction with the current atom of interest; interaction types can include bond, bending angle, torsion, and Lennard-Jones. Similarly, separate channels can be used to encode the different coarse-grained bead types. This feature representation is permutationally invariant with respect to the ordering of the atoms in the local environment. It is well suited for achieving chemical transferability, as it focuses on local geometries determined by the underlying force field rather than on features specific to the molecule.
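For illustration, a hypothetical channel layout combining atom/bead types with interaction flags, reusing `voxel_density` from the sketch above (the labels and their number are assumptions; the actual mapping is user-defined):

```python
import numpy as np

# hypothetical channel layout: atom/bead types plus interaction flags
CHANNELS = {"C": 0, "H": 1, "bead_A": 2, "bead_B": 3,
            "bond": 4, "angle": 5, "dihedral": 6, "lj": 7}

def encode_channels(particles, center, n_vox=16, r_cut=0.7, sigma=0.05):
    """Add each particle's Gaussian blob to every channel it belongs to
    (e.g. labels ("C", "bond")); summing the blobs makes the result
    invariant to the ordering of particles in the environment."""
    rep = np.zeros((len(CHANNELS), n_vox, n_vox, n_vox))
    for pos, labels in particles:
        blob = voxel_density([pos], center, r_cut=r_cut,
                             n_vox=n_vox, sigma=sigma)
        for label in labels:
            rep[CHANNELS[label]] += blob
    return rep
```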

C. Generative model
We train our model using the generative adversarial approach. GANs yield strong results for high-quality generative tasks, in particular on high-dimensional and hard-to-model image spaces.16 The generator, G, is trained against a second network, the critic, C: while the critic is trained to learn a distance metric between generated and reference data, the generator is trained to minimize this distance.
We use a cGAN to generate new atom positions. For atom $i$ contained in bead $I$, the input for G is made up of a random noise vector $z \sim \mathcal{N}(0, 1)$ and the conditional input $u_i := \{\xi_i, \Xi_{I(i)}, c_i\}$, consisting of the local-environment representations for atoms, $\xi_i$, and for beads, $\Xi_{I(i)}$, as well as the current atom type $c_i$. The output of the generator is a smooth-density representation $\tilde\gamma_i = G(u_i, z)$. The critic network C is trained to distinguish between reference densities $\gamma_i$ related to the conditional input $u_i$ and generated densities $\tilde\gamma_i$. The basic loss function for the critic can be written as

$L_C = \mathbb{E}_z\big[C(G(u_i, z), u_i)\big] - \mathbb{E}\big[C(\gamma_i, u_i)\big],$

and for the generator we obtain a loss function purely affected by the generated data,

$L_G = -\mathbb{E}_z\big[C(G(u_i, z), u_i)\big].$
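In code, the two objectives reduce to a few lines; this sketch assumes a critic module that takes a density tensor together with its conditional input (network definitions omitted):

```python
import torch

def critic_loss(critic, gamma_real, gamma_fake, u):
    # Wasserstein-style objective: the critic scores reference densities
    # high and generated densities low
    return critic(gamma_fake, u).mean() - critic(gamma_real, u).mean()

def generator_loss(critic, gamma_fake, u):
    # the generator maximizes the critic's score on its own output
    return -critic(gamma_fake, u).mean()
```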

D. Extensions
We reimplemented the model using the Python package PyTorch.25 The code can be found at https://github.com/mstieffe/deepBM. In the following, we focus on the differences of the model compared to our previous study.

Regularization and normalization
We use a variant of adversarial models where the Wasserstein distance, which arises from the idea of optimal transport, serves as a metric to measure the similarity between the target and the generated distributions. 26 Computing the Wasserstein distance directly is intractable, as it involves computing the infimum over the set of all possible joint probabilities of the target and generated distribution. Instead, we are able to use a dual representation of the Wasserstein distance, which is based on the Kantorovich-Rubinstein duality, and use the critic C to approximate it. 27 For this purpose, the critic C has to be constrained to the set of 1-Lipschitz functions. Two major approaches for achieving this are regularization and normalization.
Gradient penalty: A differentiable function is 1-Lipschitz if and only if it has gradients with norm at most one everywhere. A soft version of this constraint is enforced with a penalty on the gradient norm,28

$L_{gp} = \lambda_{gp}\, \mathbb{E}\left[\big(\|\nabla_{\tilde\gamma_i} C(\tilde\gamma_i, \tilde u_i)\|_2 - 1\big)^2\right],$

where $(\tilde u_i, \tilde\gamma_i)$ is interpolated linearly between pairs of points $(u_i, \gamma_i)$ and $(u_i, G(u_i, z))$. The prefactor $\lambda_{gp}$ scales the weight of the gradient penalty. The additional term in the loss function may be considered a regularizer for the complexity of the critic C.

Spectral normalization: The Lipschitz constant of a linear function is its largest singular value (spectral norm). The 1-Lipschitz constraint can therefore be achieved by applying spectral normalization to all the weights in the network,

$W_{SN} = \frac{W}{\sigma(W)},$

where $\sigma(W)$ is the largest singular value of $W$.29

In our previous study we used only regularization, but in this study we found that combining regularization and normalization leads to the best results.
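A minimal PyTorch sketch of both constraints, assuming 5D voxel tensors of shape (batch, channels, depth, height, width); the value $\lambda_{gp} = 0.1$ anticipates the implementation details below:

```python
import torch
from torch import nn

def gradient_penalty(critic, gamma_real, gamma_fake, u, lambda_gp=0.1):
    """WGAN-GP term: push critic gradients toward unit norm at points
    interpolated between reference and generated densities."""
    eps = torch.rand(gamma_real.size(0), 1, 1, 1, 1)  # per-sample mixing
    mix = (eps * gamma_real + (1 - eps) * gamma_fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(mix, u).sum(), mix,
                                create_graph=True)
    return lambda_gp * ((grad.flatten(1).norm(dim=1) - 1) ** 2).mean()

# spectral normalization is applied layer by layer, e.g.:
conv = nn.utils.spectral_norm(nn.Conv3d(8, 16, kernel_size=3))
```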

Physical prior
We collapse the generated smooth-density representation $\tilde\gamma_i$ back to point coordinates by computing a weighted average, discretized over the voxel grid,

$\tilde r_i = \frac{\sum_x x\, \tilde\gamma_i(x)}{\sum_x \tilde\gamma_i(x)}.$

This density-collapse step is differentiable, and thus the point coordinates can be used to incorporate a physical prior, $p$, in the loss function of the generator. The prior $p$ is built on force-field-based energy contributions and penalizes high-energy structures, thereby effectively narrowing down the functional space of the generator. Adding $p$ with an appropriately low weight to the loss function helps steer the optimization and regularizes the generator; it aims at improving generalization and accelerating convergence. The prior depends on the set of atoms corresponding to a coarse-grained bead, $\varphi_I$ for reference atoms and $\tilde\varphi_I$ for generated atoms, as well as the reference atoms $N_I$ in the local neighborhood belonging to different beads. In the following, $\varepsilon_t$ refers to the potential energy of specific intra- and intermolecular interactions, where $t$ runs over the interaction types: intramolecular bond, angle, and dihedral, and non-bonded Lennard-Jones. While bonded interactions are expressed via harmonic (bond, angle, improper dihedral) or periodic (proper dihedral) potentials, non-bonded interactions follow the Lennard-Jones potential,

$V_{LJ}(r) = 4\epsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right],$

where $\epsilon$ is the depth of the potential well and $\sigma$ its characteristic distance. In this study we compare two different prior types:

Energy minimizing: The first prior, $p_1$, aims at minimizing the potential energy of generated structures,

$p_1 = \sum_t \lambda_t\, \varepsilon_t(\tilde\varphi_I \cup N_I).$

Energy matching: The second prior, $p_2$, penalizes discrepancies between the potential energies of generated and reference structures,

$p_2 = \sum_t \lambda_t \left| \varepsilon_t(\tilde\varphi_I \cup N_I) - \varepsilon_t(\varphi_I \cup N_I) \right|.$

The prefactor $\lambda_t$ scales the weight of a given interaction term. Overall, we use the following loss function for the generator,

$L_G = -\mathbb{E}_z\left[\sum_{i \in \theta_I} C\big(G(u_i, z), u_i\big)\right] + p,$

where $\theta_I = \{i \mid a_i \in \varphi_I\}$ is the set of indices for atoms contained in $\varphi_I$ and $p$ is one of the prior terms defined above.
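A sketch of the differentiable collapse and of the Lennard-Jones term entering $p_1$; pair lists and force-field parameters are assumed given, and tensor shapes are noted in the comments:

```python
import torch

def collapse_density(gamma, grid):
    """Differentiable density collapse: density-weighted average of the
    voxel coordinates. gamma: (B, V) densities, grid: (V, 3) positions."""
    w = gamma / gamma.sum(dim=1, keepdim=True)
    return w @ grid                      # (B, 3) point coordinates

def lj_prior(r, pairs, eps, sig):
    """Lennard-Jones contribution to the energy-minimizing prior p1;
    `pairs` holds index pairs, `eps`/`sig` the force-field parameters."""
    d = (r[pairs[:, 0]] - r[pairs[:, 1]]).norm(dim=1)
    s6 = (sig / d) ** 6
    return (4 * eps * (s6 ** 2 - s6)).sum()
```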

E. Implementation details
We use 3D convolutional neural networks (CNNs) with residual connections, similar to our previous work.30 The model is trained for 60 epochs in total using a batch size of 64. We start training with $\lambda_t = 0$ and increase it in small increments during training, up to $\lambda_t = 4 \cdot 10^{-2}$ for the non-bonded Lennard-Jones term and $\lambda_t = 4 \cdot 10^{-3}$ for bonded interactions. The final values for $\lambda_t$ were obtained in a hyperparameter search optimizing the overall performance of the model. Treating all interaction terms equally (e.g., setting $\lambda_t = 4 \cdot 10^{-2}$ for all terms) led to only marginal improvement for covalent interactions but significantly higher Lennard-Jones energies. On the other hand, setting $\lambda_t = 0$ for all covalent terms while keeping $\lambda_t = 4 \cdot 10^{-2}$ for Lennard-Jones makes the training unstable. We use the Adam optimizer with learning rates of $5 \cdot 10^{-5}$ for the generator and $10^{-4}$ for the critic. The prefactor scaling the weight of the gradient-penalty term is set to $\lambda_{gp} = 0.1$. The critic C is trained five times in each iteration, while the generator G is trained just once.
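Gathered as a configuration sketch, the stated hyperparameters read as follows (the key names are illustrative, not taken from the released code):

```python
# Training hyperparameters stated in the text, as a config sketch
DBM_CONFIG = {
    "epochs": 60,
    "batch_size": 64,
    "lr_generator": 5e-5,
    "lr_critic": 1e-4,
    "critic_steps_per_generator_step": 5,
    "lambda_gp": 0.1,
    "lambda_lj_final": 4e-2,       # ramped up from 0 during training
    "lambda_bonded_final": 4e-3,   # ramped up from 0 during training
    "refinement_iterations": 4,
}
```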
As in our previous work, we train the model recurrently on atom sequences containing either all heavy (carbon) or all light (hydrogen) atoms corresponding to a single coarse-grained bead. While CG force fields might lead to the sharing of an atom between two neighboring beads (see Sec. III), the reconstruction of the atom is assigned to only one of the two beads. This assignment has no impact on the local-environment representation of the atoms (except for a shift of the center), but might affect the order of reconstruction. For heavy atoms, we remove intramolecular hydrogens from the local-environment representation. In training mode, the initial local neighborhood for a sequence is generated from training data. After each step, the generated atom density is added to the local-environment representation for the next atom in the sequence, until all atoms of the sequence are generated. In evaluation mode, no training data are used and all environment atoms are generated autoregressively.
We reduce the rotational degrees of freedom by aligning the local environment according to the position of the central bead and the difference vector to a bonded bead. This leaves one rotational degree of freedom around the director axis, for which we augment the training set by means of rotations. To further improve the quality of reconstructed structures, we feed different orientations about said axis during prediction and choose the structure with the lowest energy from the generated ensemble. We use four iterations to refine the structures.
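A sketch of the alignment step, assuming the bead-to-bonded-bead vector is rotated onto the z axis via a Rodrigues-type construction; the paper does not specify the axis convention, so this is one possible realization:

```python
import numpy as np

def align_environment(center_bead, bonded_bead, positions):
    """Rotate positions so the bead-to-bonded-bead vector points along z,
    removing two rotational degrees of freedom; the remaining rotation
    about z is covered by data augmentation."""
    d = bonded_bead - center_bead
    d = d / np.linalg.norm(d)
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(d, z), d @ z
    if np.isclose(c, -1.0):              # antiparallel: rotate pi about x
        R = np.diag([1.0, -1.0, -1.0])
    else:                                 # Rodrigues-type construction
        vx = np.array([[0, -v[2], v[1]],
                       [v[2], 0, -v[0]],
                       [-v[1], v[0], 0]])
        R = np.eye(3) + vx + vx @ vx / (1.0 + c)
    return (positions - center_bead) @ R.T
```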

III. COMPUTATIONAL METHODS
This study is based on two molecular liquids, octane and cumene, as well as a syndiotactic polystyrene (sPS) melt. All data were generated using the molecular dynamics package GROMACS (version 4.6 for sPS and version 5.0 for octane and cumene; the version does not affect the outcome of the simulations).31 Molecular dynamics simulations were performed in the NPT ensemble using the velocity-rescaling thermostat and the Parrinello-Rahman barostat. An integration time step of 1 fs was used.
The atomistic data for sPS were reported in Liu et al.;32 the underlying force field is based on the work of Mueller-Plathe.33 Replica-exchange MD simulation, a temperature-based enhanced-sampling technique, was used to sample the system. Each snapshot contains 36 chains consisting of 10 monomers. For additional details regarding the simulations, the reader is referred to the work of Liu et al.32 The coarse-grained model of sPS was developed by Fritz et al.34 It represents a polymer as a linear chain, where each monomer is mapped onto two CG beads of different types, denoted A for the chain backbone and B for the phenyl ring (see Fig. 1). The center of bead A is the center of mass of the CH2 group and the two neighboring CH groups, which are weighted with half of their masses. Bead B is centered at the center of mass of the phenyl group. Bonds are created between the backbone beads A-A and between backbone and phenyl-ring beads A-B.
The atomistic data for the octane and cumene liquids were generated using the Gromos force field, with topologies generated by the Automated Topology Builder.35 Notably, the Gromos and sPS force fields differ in their parametrization strategies, leading to evident inconsistencies for both intra- and intermolecular interactions. The octane and cumene simulation boxes contained 215 and 265 molecules, respectively. Both systems were sampled at 350 K. Similar to the sPS mapping, cumene was mapped onto three CG beads: two beads of type A for the backbone, each containing a methyl group and sharing the CH group connected to the phenyl ring, and one bead of type B for the phenyl ring. Octane was mapped onto four beads of type A (Fig. 1), where neighboring A beads share a CH2 group.
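Since shared atoms enter neighboring beads with half their mass, the bead centers are weighted centers of mass; a minimal sketch for bead A of sPS (the mass values are illustrative):

```python
import numpy as np

def bead_A_center(r_ch2, r_ch_left, r_ch_right, m_ch2=14.0, m_ch=13.0):
    """Weighted center of mass of bead A: the CH2 group plus the two
    neighboring CH groups, each entering with half its mass because it
    is shared with the adjacent bead."""
    masses = np.array([m_ch2, 0.5 * m_ch, 0.5 * m_ch])
    coords = np.array([r_ch2, r_ch_left, r_ch_right])
    return masses @ coords / masses.sum()
```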

IV. RESULTS
In the present work we probe the chemical transferability of DBM. To this end, we train the model solely on a dataset consisting of small molecules, but validate it on a challenging polymer melt. The training set contains 3,225 octane and 2,120 cumene molecules in the liquid state, both simulated at T = 350 K. After training was completed, we applied DBM to 720 chains (7,200 monomers) of an sPS melt at T = 568 K. The temperature discrepancy between the test and training sets arises from the different boiling and melting points of the molecules, as we wish to probe the model's chemical transferability in the liquid state. However, as we have shown in our previous work, DBM is robust against temperature changes: the learned local correlations are weakly sensitive to temperature.14 Furthermore, we investigate the impact of the different force-field-based priors used during training. We evaluate and compare the performance of the different models regarding their ability to reproduce structural and energetic features of the sPS reference atomistic configurations.
A. Local structural and energetic features

Figures 2-4 show distribution functions for structural and energetic properties of sPS reference structures ("AA") and of structures generated with DBM. The model was trained using three different prior configurations: prior $p_1$ ("energy minimizing"), prior $p_2$ ("energy matching"), and no prior. For a thorough comparison, we show results for the chemically-specific models, i.e., trained directly on sPS (left), and the chemically-transferred models, i.e., trained solely on octane and cumene configurations (right).
We first analyze the angle distributions shown in Fig. 2. The largest discrepancy between chemically-specific and chemically-transferred models is found in the C-C-C backbone-angle distributions (panels a and b). While models trained directly on sPS are able to reproduce the distribution with high accuracy, models trained solely on octane and cumene lead to overly broad distributions. Surprisingly, for the other angles shown (panels c-h) the opposite holds: models trained on sPS generate slightly too narrow distributions compared to the reference configurations, whereas distributions generated with models trained on octane and cumene are remarkably close to the reference system. The prior type used during training appears to have only a marginal effect on the angle distributions.

Next, we focus on the dihedral distributions displayed in Fig. 3. Again, the largest discrepancy between models trained on the different training sets is found for the distributions of the C-C-C-C backbone dihedral (panels a and b), which is well reproduced by models trained directly on sPS, whereas models trained on octane and cumene fail to reproduce the height of the main peak and are not able to reproduce the side peak. Similarly, the performance for the C-C-C-H backbone dihedral (panels e and f) is also slightly worse for models trained only on the octane and cumene dataset. On the other hand, the improper dihedrals of the phenyl rings (panels c, d, g, and h) are reproduced virtually equally well for both training sets. While the generated improper dihedrals are slightly too narrow compared to the reference distributions, we emphasize the small range of angles due to the imposed planarity of the ring. The prior used does not influence the generated dihedral distributions. Performance varies between scenarios and interactions, as reported by the Jensen-Shannon divergence between reference and backmapped distributions (see Table S1 in the SI). We did not observe clear trends in the quality of reconstruction between bending angles and dihedrals.
The impact of the prior is most significant for the Lennard-Jones energies, shown in Fig. 4 (a-d), obtained for each sPS chain separately. Regarding models trained on sPS, both prior $p_2$ and no prior lead to a good match with the reference distribution. While the carbon-only Lennard-Jones energies match well, the generated distributions show slightly larger high-energy tails. On the other hand, prior $p_1$ over-stabilizes the system, leading to a significant shift of the distributions toward lower energies. This is reasonable, since $p_1$ aims at minimizing the energy of generated structures during training and therefore might not account for the diversity of generated microstates. For models trained on octane and cumene these findings are reversed: while either prior $p_2$ or no prior leads to significantly too high Lennard-Jones energies, prior $p_1$ dramatically improves the performance of the model. Training the model to minimize the energy seems to help it learn more general features that transfer better across chemistry. On the other hand, the energy-matching prior $p_2$ and the absence of a prior (purely data-driven) encourage the model to reproduce the features found in the training set, making it less transferable. This is especially relevant in the context of possible force-field inconsistencies.

B. Large-scale structural features
Next, we evaluate large-scale structural features. A first impression can be gained by looking at the pair correlation function, g(r), shown in Fig. 4 (e-f), obtained for pairs of non-bonded carbon atoms. All models reproduce the pair correlation function remarkably well, indicating that the pair statistics of the sPS chains are reproduced with great accuracy. It more broadly suggests good agreement regarding local packing.
Beyond pair statistics, we seek to probe and compare the accuracy of the generated configurations of the different models at higher order. The higher dimensionality prevents us from directly visualizing the space, and we turn to dimensionality reduction instead. We build a two-dimensional map representing proximity relationships between sPS monomers and their environments. The pairwise distance between two such environments is encoded using a similarity kernel based on the many-body smooth overlap of atomic positions (SOAP) representation.36 We neglect hydrogens in the representation. Taking the pairwise similarity of N monomers into account, we derive an N×N similarity matrix. We further apply sketch-map to project this high-dimensional representation of conformational space onto a two-dimensional embedding.37,38

We analyze the local environments of 720 sPS monomers to infer landmarks (gray) for the two-dimensional map. We use these landmarks to further project 1,440 additional AA monomer representations (black). In Fig. 5 (b) we project the backmapped structures from corresponding coarse-grained configurations, applying a model trained on sPS (red) or on octane and cumene (blue). Both models were trained using prior $p_1$; the projections for models trained with the other priors can be found in the SI.

An in-depth analysis of the low-dimensional maps is shown in Fig. 6. We identify cluster centers using a k-means algorithm. We then assign each point of the two-dimensional projection to its closest cluster. This allows us to compute a confusion matrix comparing the cluster assignments of reference and backmapped structures. The diagonal of the confusion matrix hereby refers to reference and backmapped structures that are mapped into the same cluster, indicating closeness in conformational space. The results for the chemically-specific and chemically-transferred models can be found in Fig. 6 (a-c) and Fig. 6 (d-f), respectively. Interestingly, the confusion matrix becomes most diagonal for both training sets if we train DBM without any prior (Fig. 6 c, f). However, the reduced CG resolution implies that a single CG configuration corresponds to an ensemble of atomistic microstates. This ensemble might span a broad region in conformational space, and two microstates corresponding to the same CG structure do not necessarily fall into the same cluster. Therefore, it is not clear to what extent backmapped structures should map to the same clusters as their corresponding reference structures. More importantly, the relative cluster populations should match, as this would indicate an accurate coverage of conformational space. Regarding models trained on sPS, we find that both prior $p_2$ and no prior lead to an excellent match of the relative populations. For models trained on octane and cumene, all priors lead to comparable accuracy, with limited reproduction of the relative populations.
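A minimal sketch of this clustering analysis using scikit-learn, assuming row-wise paired 2D projections for reference and backmapped monomers (the number of clusters is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_confusion(proj_ref, proj_gen, n_clusters=5, seed=0):
    """k-means on the 2D sketch-map projections of reference monomers;
    backmapped monomers (paired row-wise) are assigned to the nearest
    cluster, and the confusion matrix counts the pairings."""
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(proj_ref)
    conf = np.zeros((n_clusters, n_clusters), dtype=int)
    for a, b in zip(km.labels_, km.predict(proj_gen)):
        conf[a, b] += 1
    return conf   # diagonal: both structures fall in the same cluster
```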

V. CONCLUSION
In this study we probed the chemical transferability of our machine-learning (ML) model deepBackmap (DBM), designed for backmapping condensed-phase molecular structures. To this end, we trained DBM solely on liquid-phase configurations of small molecules, specifically octane and cumene, and then tested its performance on the more complex system of syndiotactic polystyrene (sPS) melts. Furthermore, we tested different priors and their impact on the quality of the backmapped structures.
We observe that the local correlations learned from the small-molecule training set (i.e., octane and cumene) transfer remarkably well to sPS. Despite discrepancies for some structural distributions, such as the angles of the backbone carbon atoms, the overall quality of the backmapped structures is encouraging. Importantly, the model performs well in a challenging condensed-phase environment and is able to reproduce the distribution of Lennard-Jones energies with high accuracy. Non-bonded structural features, in particular the pair correlation function, match the reference distributions virtually identically. A higher-order investigation, facilitated by the sketch-map algorithm, also indicates high structural fidelity. Although backmapped structures are not necessarily mapped onto the same cluster as their corresponding reference structures, as shown by the confusion matrices, DBM is able to cover the correct regions of conformational space. The relative statistical weight of generated microstates leaves further room for improvement.
The results shown here indicate that a sequential reconstruction combined with a local-environment representation are well suited toward chemical transferability. However, generalization shows its limits. For example, the orientation of the phenyl ring with respect to the backbone cannot be learned from the octane and cumene structures, leading to misplaced atoms. This likely explains the limited quality of the carbon-backbone structures (Fig. 2 a,b and Fig. 3 a,b). In addition, force-field inconsistencies between molecules will evidently lead to incoherent conformational spaces, directly affecting the transferability of the backmapping.
We also investigated the role of the prior. The different priors have only a marginal impact on the quality of the covalent interaction terms. By contrast, the non-bonded Lennard-Jones interaction is more sensitive to the prior, as can be seen in the distributions of Lennard-Jones energies. This can be explained by the functional form of the interactions: while the harmonic or periodic potentials for the bonded interactions respond moderately to shifts in the atomic arrangement, the Lennard-Jones potential is more sensitive, and small shifts can rapidly change the energy by several orders of magnitude. The energy-minimizing prior $p_1$ leads to high-quality configurations for the chemically-transferred DBM, but yields too low energies when trained directly on sPS. The energy-matching prior $p_2$ has an overall negligible impact compared to training without any prior. We believe that prior $p_1$ encourages the model to learn more general aspects, such as increasing the distance between non-bonded atoms, while prior $p_2$ and no prior (purely data-driven) let the model focus on more specific features, making them less generalizable.
In general, our approach offers the perspective of efficiently recycling local correlations learned from small, easy-to-sample molecules and deploying them for the backmapping of more complex systems. This can be of tremendous use for generating high-resolution configurations of complex systems without necessarily simulating the fine-grained system first.