Domain Rearrangement and Denaturation in Ebola Virus Protein VP40

The VP40 protein plays a critical role in coordinating the virion assembly, budding, and replication of the Ebola virus. Efforts have been made in recent years to understand various aspects of VP40 structure, dynamics, and function such as assembly of the protein and its roles in virus replication and penetration of the protein into the plasma membrane. A major conformational transformation is necessary for VP40 to form some of its oligomeric structures and to perform various functions. This conformational change from a compact structure with the N-terminal domain (NTD) and C-terminal domain (CTD) closely associated involves a dissociation or springing-out of the CTD from the NTD. We perform investigations using computational molecular dynamics simulations as well as knowledge-based Monte Carlo simulations. We find that a sharp springing of the CTD from the NTD in a free VP40 protein cannot occur solely by random thermal fluctuations without intermediate oligomerized segments, and therefore is likely triggered by additional molecular events.


INTRODUCTION
The VP40 protein plays a critical role in coordinating the virion assembly, budding, and replication of the Ebola virus. 1- 6 Enormous efforts have been made in recent years to understand various aspects of VP40 structure, dynamics, and function such as assembly of the protein 1 and its roles in virus replication 2 and penetration of the protein into the plasma membrane. 3 These molecular investigations on VP40 may help in designing therapeutics 4 against the Ebola virus. As part of these investigations, it is helpful to understand the structure and dynamics of a VP40 monomer in order to assess how its various oligomeric states perform their functions.
A VP40 monomer ( Fig. 1) is composed of a distinct C-terminal domain (CTD) connected by a linker segment to an N-terminal domain (NTD). A major conformational transformation is necessary for VP40 to form some of its oligomeric structures and to perform various functions. This transformer conformational change from a compact structure with the NTD and CTD closely associated (Fig. 1a) involves a dissociation or springing-out of the CTD (Fig. 1b) from the NTD. Figure 1a is the initial configuration used in the molecular dynamics computational investigations discussed below. The information for this initial, compact configuration of Fig. 1a was obtained from the Protein Data Bank (PDB ID 1ES6). The crystal structure for the sprung configuration of Fig. 1b is not available. In order to create Fig. 1b, we pulled gently on the atoms of the CTD to 'spring' the CTD away from the NTD at 300K without disrupting the CTD's folded domain structure. We note that the image in Fig. 1b is used only to pictorially display the 'sprung' configuration and that this arrangement is not used in any of the calculations.
In the sprung configuration, the CTD and NTD remain connected by the linker segment, but have little interfacial contact. The causation of this domain dissociation is under investigation. Silva et al. 1 have pointed out that "VP40 proceeds through intermediate states of assembly (e.g. octamers) but it remains unclear how these intermediates are coordinated with the various stages of the life cycle" particularly the extreme N and C terminals in oligomerization processes. They investigated the 'induced structural and conformational changes' of VP40 particularly aggregation and urea induced denaturation using hydrogen-deuterium exchange mass spectroscopy. They observed only a single conformational transition with an onset concentration of 2 M urea and concluded that "the full length monomer is remarkably stable in presence of high concentration of urea and incapable of selfassembly". With computational modeling it may be easier to analyze the functional conformational springing transition in terms of the interplay between amino acid residue-residue interactions versus thermal agitation. Our investigations make use of computational molecular dynamics simulations as well as knowledge-based Monte Carlo simulations. Data presented in this study shows how the thermal agitations control the oligomerization as the protein undergoes structural transformation. We find that the springing of the CTD from the NTD of a free monomer without bridging via segmental oligomerization cannot occur by random thermal fluctuations alone, and therefore is likely triggered by additional and perhaps specific molecular events as proposed by Ref. 7.

METHODS
In order to investigate large scale structural changes in the VP40 monomer, we use computer simulations over a range of temperatures. By simulating the system at higher temperatures, thermally induced structural changes can occur in time-scales short enough to observe in computational simulations. In all simulations, the initial configuration of the VP40 monomer has the NTD and CTD in close contact.

Molecular dynamics simulations
The initial, compact configuration of the VP40 with the NTD and CTD closely associated was obtained from the Protein Data Bank (PDB ID 1ES6). Missing residues were added using the Modeler software. 8 We performed coarse-grained molecular dynamic (CGMD) simulations using the Martini force field in which approximately four heavy atoms are mapped onto a single interaction site (bead). [9][10][11] The monomer system was generated using the Martini-maker plugin in the Charmm-gui web server [12][13][14] with the Martini22 non-polarizable water model with 0.15 mM NaCl so that counterions were available to neutralize the system. The total system of VP40 monomer and water had a total of 5944 beads. All of our CGMD simulations were run using Gromacs-5.1.1(gpu-version). 15,16 A time step of 15 fs was used for the coarse-grained simulations. The Lennard-Jones and coulomb potential cutoff was set at 12 Å. The Verlet neighbor scheme was used with a straight cutoff to keep track of particles within specified distances. Coulomb interactions were treated using a PME. Ring systems and stiff bonds were controlled using the LINCS algorithm. The pressure was maintained at 1 bar using the Berendsen barostat during equilibration runs and the Parrinello-Rahman barostat during production runs. Pressure coupling was isotropic with a compressibility of 4.5 × 10 −4 bar −1 . The temperature coupling was done using a velocity rescale algorithm. In order to compare our CG computations with the MC results, as well as to allow conformational changes to occur, we did not restrain the protein with an elastic network. The conformational changes that occur without constraining the secondary structure with the elastic network, as well as the use of the non-polarizable water place limitations on the accuracy of the fine-scale representation of the structural dynamics of the protein.
We ran simulations at eight different temperatures: 300, 450, 550, 600, 700, 800, 1000 and 1100 K. Each system was run for 5 ns of minimization, 200 ns of equilibration, and 500 ns of production run. We used the VMD 17 visualization tool for analysis of our CGMD runs. The analysis for the radius-of-gyration (discussed below) was performed on the final 100 ns of the production run.

Monte Carlo simulations
Monte Carlo (MC) computational simulations involve a coarse-grained representation of amino acids by a node on a cubic lattice with several degrees of freedom. 18,19 Even though the atomic scale structure of each residue is ignored, their specificity is captured via unique residue-specific interactions. The protein VP40 is a chain of 326 residues ( Figure S1) tethered together by covalent/peptide bonds; the length of the covalent bond between consecutive residues in the MC simulations varies between 2 and √ (10) in units of the lattice constant. The bond-fluctuation approach used here is one of the most efficient and effective methods to investigate large-scale complex problems 20 in polymers, soft matter, and proteins. 18,19 The degrees of freedom for nodes to move and bonds to fluctuate, though limited, are well-defined on a cubic lattice in comparison to off-lattice, continuum space generally used in all-atom MD simulations such as the one used here. The degrees of freedom can be easily enhanced by fine-graining, 19 and allows extension of a computationally efficient 20 approach to protein modeling.
A generalized Lennard-Jones potential 18,19 involving knowledge-based residue-residue interactions 21,22 as input is used for the residue-residue interaction with a range (r c ). It is worth pointing out that the knowledge based residue-residue interactions 21,22 involves a huge ensemble of protein structures from the protein data bank (PDB). Of 400 residue-residue contact elements of a 20×20 matrix of 20 amino acids, there are 210 independent pair interactions. Each interaction pair is unique with attractive (negative) and repulsive (positive) matrix elements 21,22 that capture their specificity. Since the structures (i.e. X-ray snapshots) in the PDB involve proteins in real laboratory samples, interactions of each residue with the solvent and other constituents of the underlying sample are implicitly included in the estimates of the residue-residue contact interactions. Even though the generalized LJ potential 18,19 used here is phenomenological, it captures the specificity of residues somewhat effectively. Each residue performs their stochastic movement guided by the Metropolis algorithm at each temperature (T ). A series of attempts to move each residue defines one Monte Carlo step (MCS). Simulations are performed for sufficiently long times (t), typically t = 10 7 MCS time-steps, so that the system reaches equilibrium. Starting with an initial random configuration, simulations are performed on a 350 3 lattice with 100 independent samples, at each temperature to estimate average values of local and global physical quantities. MC parameters are measured in arbitrary scales (i.e. T in the reduced unit of the Boltzmann constant, sample size in lattice constant, etc.). Connection between the arbitrary reduced units and the realistic units (order of magnitude) can be made by comparing and calibrating the results of the coarse-grained MC approach with the data from MD simulations (see below). Different lattice sizes are also used to make sure that our results (qualitative trends) are independent of lattice size. Figure 2 displays the results of the MD simulations. The radius-of-gyration (R g ) of the VP40 monomer is given as a function of temperature. As can be seen in the figure, as the temperature increases, the VP40 monomer undergoes a structural transition from a compact configuration to an extended configuration. Upon examination of the trajectory, we found that the high-temperature extended configuration of the VP monomer was not due to dissociative springing of the NTD from the CTD, but instead was due to denaturation of the protein. In order to better understand the effects of residue-residue interactions versus thermal fluctuations, we analyzed the conformational response using knowledge-based MC simulations.

RESULTS AND DISCUSSION
In order to use the knowledge-based MC simulations together with the MD simulations to understand the dynamics of the VP40 monomer, we first checked if both computational methods resulted in similar behavior. Figure 3 shows that the R g as a function of temperature displays similar behavior in the MC simulations as it does in the MD simulations. The MD values for temperature and R g provide approximate calibration of the units for the MC simulations.
Snapshots of VP40 conformations from the knowledge-based MC simulations at representative temperatures are presented in Figures 4 and S2 (supplementary material). These snapshots also show that thermal fluctuations do not lead to springing of the CTD from the NTD, but instead result in segmental oligomerization and denaturation. Clustering of self-assembled residues occurs at low temperatures (i.e., T = 0.019-0.022) and gradually reduces as the size of protein expands on raising the temperature. Some degree of domain formation persists at the intermediate temperatures but disappears upon raising the temperature further (see Figure S2 at T = 0.025).
Trends in the evolution of segmental domain growth and decay can be better clarified by examining the contact maps of Fig. 5 which shows how the coordinated segmental structures disappear as the temperature is increased. For example, at a low temperature (T = 0.013), segmental organization, a cooperative phenomenon resulting from interactions among the neighboring residues along the protein backbone, shows a phase-separation with three distinct regions ( 1 M -120 Y, 125 F -223 G, 225 K -326 K, Figure S1). Increasing the temperature leads to the appearance of larger loop segments and decay of globular density in the intermediate temperature region, T=0.021, which may be due to the onset of denaturation and further investigated below. Note that some degree of segmental globular structure still exists in the center region (involving 111 T -228 S) of the protein at T = 0.021. All organized structure vanishes at high temperature (T = 0.025, see Figure S2). Note that the contact map (Fig. 5) and snapshots ( Fig. 4 and Fig. S2) represent transient structures. In order to assess the overall contacts, one has to quantify the degree of contacts of each residue and evaluate its average values over many configurations of the protein as follows. The average number N n of residues around each residue within the range of interaction is a measure of the contact density and provides insight into the segmental stability of the local structures. The contact density profiles are presented in Fig. 6 and Fig. S3 at representative low, medium, and high temperatures. The contact density of the N-tail, residues 1 M− 13 Y, the main part of the NTD ( 70 S− 185 P), and the C-tail ( 290 P− 326 K) (Fig. S3) are noteworthy as the extreme N and C terminal tails seem to 'stabilize the monomeric state through a direct association' as reported by Silva et al. 1 and Gc et al. 23  the domain associated configuration, and play a role in the binding of VP40 to the host cell membrane and the subsequent budding of new virus-like particles. We find that the domains evolve with the temperature as a result of its redistribution and growth and decay of its size (Fig. 6). For example at a low temperature (T=0.013, Fig. S3), each of the three clusters (N-tail 1 M− 40 E, NTD 72 A − 195 T, CTD 201 G− 305 M) has a relatively high contact density as the residues self-assemble into segmental globules (see Fig. S2 for the corresponding snapshot). The segmental clusters almost disappear at high temperature (Fig. S2 at T=0.025) due to denaturation. Note that the high contact density of the extreme N-tail ( 1 M− 13 Y) persists over a relatively large range of temperature (T=0.013−0.023) while the globular state of the C-terminal disappears. The large segment of the NTD ( 70 S− 185 P) phaseseparates into two domains ( 72 A− 105 T and 145 H− 185 P) on raising the temperature from T=0.013 to T=0.021. The segmental mobility profile appears to complement the contact map profile, i.e., lower mobility at higher contact density (Fig. S4). Denaturation versus domain dissociation can also be quantified through the effective dimension D of the protein chain, which is a measure of how the mass is distributed at various length scales. Compact structures, such as domains have D∼3, whereas denatured, random-coil structures have D≤2. The structural dimensionality can be determined 5 by calculating the structure factor S(q) as a function of the wave vector |q| = 2π/λ, with the wavelength λ used for q in Fig. 7 is given in MC lattice units. Variation of the structure factor S(q) with the wave vector q is presented in Fig. 7 for representative temperatures. From Fig. 7, we can determine the scaling of the structure factor S(q) with the wave vector q, S(q) ∝ q -1/γ to get a value for the exponent. Fitting the data for the structure factor at different length scales (λ) comparable to R g , one can estimate the effective dimension D ≈ 1/γ for the mass distribution N ∝ R g 1/γ . Estimates for -1/γ at various T and for various length scales are provided in Fig. 7, which shows that the overall protein structure is globular (D ∼ 3) at the low temperature (T=0.013). The effective dimension D decreases on increasing the temperature, i.e., D ∼ 2.5 at T= 0.021, D∼ 2 at T ≥ 0.23 which implies that the conformation of the protein becomes a denatured random-coil at higher temperatures. The globular (D ∼ 3) to random coil (D∼ 2) transition appears to be continuous as a function of temperature.

CONCLUSION
A crucial aspect of the structural-function behavior of the Ebola virus protein VP40 is its ability to dissociate its CTD from its NTD to perform different functions. To determine if this springing apart of the domains occurs due to random thermal motions, we have used both molecular dynamics and knowledge-based Monte Carlo computational simulations to investigate the thermally induced behavior of a monomer of VP40. Conformation of the protein clearly shows that domains appear as the residues self-assemble in globules at low temperatures. Domains are localized around N-tail, the intermediate NTD region, and C-terminal at low temperature, but the domain structures disappear at high temperature where the protein denatured into a randomcoil conformation. Therefore, the springing of the CTD from the NTD cannot occur by random thermal fluctuations alone without bridging via segmental oligomerization, and that instead, the protein is more likely to denature. Thus, the functionally important springing apart of the domains is likely triggered by additional mechanism such as a specific molecular event as proposed by Bornholtd. 7