Self-Assembly Dynamics for the Transition of a Globular Aggregate to a Fibril Network of Lysozyme Proteins via a Coarse-Grained Monte Carlo Simulation

The self-organizing dynamics of lysozymes (an amyloid protein with 148 residues) with different numbers of protein chains, Nc = 1,5,10, and 15 (concentration 0.004 – 0.063) is studied by a coarse-grained Monte Carlo simulation with knowledge-based residue-residue interactions. The dynamics of an isolated lysozyme (Nc = 1) is ultra-slow (quasi-static) at low temperatures and becomes diffusive asymptotically on raising the temperature. In contrast, the presence of interacting proteins leads to concentration induced protein diffusion at low temperatures and concentration-tempering sub-diffusion at high temperatures. Variation of the radius of gyration of the protein with temperature shows a systematic transition from a globular structure (at low T) to a random coil (high T) conformation when the proteins are isolated. The crossover from globular to random coil becomes sharper upon increasing the protein concentration (i.e. with Nc = 5,10), with larger Rg at higher temperatures and concentration; Rg becomes s...


I. INTRODUCTION
A diverse range of morphologies, from a solid aggregate to a tenuous fibril network can appear due to self-and directed (by an underlying matrix environment) assembly of proteins and peptides.][3][4][5][6][7][8][9][10][11][12] Many different human proteins such as beta-amyloid, alpha-synuclein, lysozyme, and insulin, with different sequences and structures can form insoluble amyloid fibrils with somewhat similar morphology. 13Enormous efforts have been made in attempting to understanding the underlying mechanisms.For example, beta sheet formation of amyloid proteins appears to be one of the main causes in amyloid diseases. 12,13The understanding of the morphological evolution from an isolated protein to fibrils is far from complete, particularly in such proteins as lysozyme which is involved in hereditary non-neuropathic systemic amyloidosis. 122][3][4] The protein dynamics remain an open problem. 152][3][4][5] In order to study the self-organizing structures involved in fibril formation, aggregation, dispersion, etc. resulting from the cooperative and competing mechanisms, one has to consider many proteins in a matrix.Therefore, simplifications via appropriate coarse-grained approximations is necessary.Using a coarse-grained approach 5 we investigate the self-organizing structures of lysozyme C ( 1 M 2 K 3 A . . . 147G 148 V), a protein with 148 residues. 16everal attempts [1][2][3][4] have already been made to study aggregation of proteins without reference to specific proteins via simplified models with relatively small samples.Some of these studies 1,3,4 include 'minimalist' models involving a few short chains, and MD simulations 2 (replica exchange Langevin dynamics) with a coarse-grained representation (e.g.27 peptides each with 7 residues to study fibrils, β-barrel, and aggregate structures). 2Using a coarse-grained Monte Carlo simulation, Pandey and Farmer 5 have recently examined the multi-scale structures in self-assembly of histones (H3.1, a protein with 136 residues) with as many as 20 proteins in the simulation box.With the large-scale simulations, they have identified the appearance of a range of morphologies e.g from a solid-aggregate at low temperatures to the onset of long-range fibrous networks at high temperatures as a function of protein density.Using the same coarse-grained method, 5 we study the multi-scale morphology of lysozyme as a function of temperature and concentration.The model is presented in the next section, followed by results and discussion with a concluding remark towards the end.

II. MODEL
Residue-residue interactions play a critical role in modeling the structure and dynamics of a protein in coarse-grained descriptions of proteins.Though the fine-scale details of atom-atom interactions are not included, coarse-grained models allow for computational simulations of larger systems and for longer times.In our model, the lysozyme 16 protein is represented by a set of 148 nodes (each node representing a residue) tethered together by flexible covalent (peptide) bonds in a specific sequence on a cubic lattice.Unlike minimalist lattice models, there are several degrees of freedom for each residue to move and covalent bonds to fluctuate which can be enhanced further by fine-graining. 17Even though the intra-molecular (atomistic) details of a residue are not explicitly included, amino acid specificity is captured via unique residue-residue interactions.9][20][21][22][23][24][25] The enormity of the literature available on this subject does not allow us to cite an exhaustive list; we therefore cite only a few for historical perspective and that are especially relevant to the work presented here.We have employed KBRR interactions 5,14 such as the classical Miyazawa-Jernigan contact matrix 19 in studying the structure and dynamics of individual proteins as well as self-assembly of peptides 14 and proteins. 5n our all-residue description of the protein chain, a residue is represented by a node that occupies a unit cube on a cubic lattice and can move to one of its 26 adjacent neighboring sites.The bond length between consecutive nodes varies between 2 and √ (10) in unit of the lattice constant, while maintaining the excluded volume constraints.Note that √ (7) is not feasible on such a cubic lattice and √ (8) is excluded to avoid the constraint of bond-crossing. 26The constraint of the minimum distance of two lattice units between two nodes (inter-and intra-chain) requires excluded volume of the hard-core of a node to be 2 3 in unit of lattice constant.In an initial random configuration, a protein chain can occupy the lattice sites on the trail of a random walk of a cubic node with excluded volume constraints.Such a bond-fluctuation model has been used extensively in understanding the structure and dynamics (including the viscoelastic properties) of complex polymer systems; advantages and pitfall of such approaches are well explored in computational polymer research. 26We have used the bond-fluctuation representation of the peptide bonds with the input of KBRR interactions in several studies for analyzing the structure and dynamics of proteins, 5 peptides, 14 and bio-functionalized soft materials. 27Because of the efficiency of the approach and the complexity of the lysozyme protein, 12,13,16 we adopt it here to explore the self-assembly of lysozymes by a large-scale simulation.Each residue interacts with the surrounding residues (both inter-and intra-chain) as described by a generalized Lennard-Jones (LJ) potential, 5 where r ij is the distance between the residues at site i and j; r c = √ 8 and σ = 1 in units of the lattice constant.Note that the range of interactions as defined by the cut-off r c spans over many lattice sites to interact with surrounding residues (see figure S1 28 ).To provide amino acids with a specific identity, the strength of the LJ potential ε ij is chosen uniquely 5 for each residue-residue interaction pair, with appropriate positive (repulsive) and negative (attractive) values.As described above, we use the classic MJ interaction matrices 5,19 for the residue-residue pair interactions (ε ij ) included in the supplement. 28e analyze the structure and dynamics of protein chain aggregation for simulations with different numbers Nc (=1, 5, 10, 15) of chains in the simulation box.Protein chains (in their random configurations) are initially inserted in the box randomly and are moved around with the excluded volume constraints to randomize their distribution in preparing the sample. 5Note that for such a long protein chain (148 residues), it becomes difficult 5 to insert many chains beyond a certain number as we approach a quasi-jamming physical limit.The computer processing unit time also increases with increasing the number of protein chains.Therefore, we restrict the number of chains that can be feasibly simulated with reasonable simulation time while capturing their cooperative and competing effects.The unit Monte Carlo step (MCS) 26 is defined as the attempt to move each residue (node) once, and varies depending on the size of the system, i.e., the number Nc × L c of attempts to move randomly selected nodes in a simulation box with Nc protein chains each with L c residues defines the unit MCS.
The Metropolis algorithm is used to move each node stochastically.For example, in order to move a randomly selected residue in a randomly selected protein chain from a site i to a site j, the excluded volume constraint and the limitations on changes in bond length are checked first for the proposed move.An attempt is then made to move the residue from site i to site j with the Boltzmann probability exp(−∆E ij /T), where ∆E ij is the change in energy between its new (E j ) and old (E i ) configuration, ∆E ij = E j − E i .T is the temperature in reduced units (ε ij /k B ) of the Boltzmann constant k B and the residue-residue interaction energy energy (ε ij ).
As in previous studies, 5,14 we analyze a number of local and global physical quantities such as the energy of each residue and protein chain, their mobility, mean square displacement of the center of mass of the protein, radius of gyration, and its structure factor.These physical quantities are in arbitrary units, i.e., the length is in units of the lattice constant 5,14 which is different 5 from many all-atom simulations where realistic units for size and time scales are used via calibration of σ and ε ij from experimental data for simplified systems.It is difficult to quantify physical quantities in absolute units due to the phenomenological nature of the interactions (Eq.( 1)) used here.However, the trend in response of the physical quantities to such parameters as the temperature and network density should be qualitatively comparable with appropriate experimental observations.Most of our simulations are performed on a 64 3 lattice, although different lattice sizes are used to assure that there is no significant finite size effect to alter our qualitative findings.Simulations are performed for 10 million MCS time steps with many independent simulations (10-100) depending on Nc.For example, 10 simulations with long runs (10 7 MCS) were performed for the largest Nc = 15, whereas 100 independent simulations with 10 7 MCS were performed with one chain (Nc = 1).Note that the chain number N c corresponds to the occupied volume fraction (i.e., the concentration c) of protein: c = 0.004 (Nc = 1), 0.021 (Nc = 5), 0.042 (Nc = 10), and 0.063 (Nc = 15).These volume fractions appear relatively low, but are necessary because for chains of size L c = 148 (with minimum bond length 2 in units of the lattice constant) the percolation threshold and the jamming limit would be very low.

III. RESULTS AND DISCUSSION
We first examine the structure of a single lysozyme in the simulation box before analyzing the multi-scale structures and dynamics of many interacting protein chains.Figure 1 shows a set of illustrative snapshots in a temperature range with appreciable variations in the radius of gyration (see below) of the protein.Though reliable predictions cannot be based upon a few snapshots, they provide pictures of some of the conformational variability of the protein, segmental interaction, and local structures.For example, at the low temperatures (T = 0.015, 0.017), the protein remains relatively compact with globular segments.The protein expands on increasing the temperature which, reduces the size of the segmental globules considerably.
We have analyzed the simulations to ascertain information about how the dynamics of the protein chain as it performs its stochastic movement through a relatively large ensemble of configurations.Variation of the root mean square (RMS) displacement (R c ) of the protein chain with the time steps, as presented in Figure 2, provides an estimate of the dynamics for a range of temperatures (T = 0.015 − 0.027).At the low temperatures (T = 0.015, 0.016), the protein appears to reach a quasi-static state (with extremely slow motion) in the asymptotic time regime as it settles into its globular conformations after about two million time steps.At high temperatures (T = 0.020 − 0.027) on the other hand, it continues to diffuse with its fluctuating extended conformations.Raising the temperature from low to high enhances the motion of the protein systematically from an ultra-sub-diffusive (almost standstill) towards diffusive dynamics.The variation of the overall changes in the RMS displacement of the protein over the entire time steps with the temperature is provided in the inset of Figure 2, which seems consistent with the interpretation of fluorescence intensity variation with the temperature by Ow and Dunstan. 10The variation of overall changes in R c with the temperature (T) (inset Figure 2) also echoes the global size of the protein (see below) which shows that the structure and dynamics are correlated.
We have carried out extensive simulations with Nc = 1, 5, 10, and 15 of lysozyme chains to examine their cooperative and competing effects on self-organizing structures at a range of temperatures (T = 0.015 − 0.021).Figure 3 shows representative snapshots of the self-assembly with Nc = 10.Globular structures, similar to that of a single protein chain (Figure 1), form networks at the low temperatures.Increasing the temperature opens up individual protein structures and allows them to form spanning networks of fibrous and smaller globules.A network spanning all 10 lysozyme chains appears at the high temperature T = 0.021.
As the self-organizing structures appear, how do individual protein chains move, relax, and conform? Figure 4 shows the variation of the average RMS displacement of the center of mass of a protein with the time steps for the entire temperature range (T = 0.015 − 0.021).Despite some fluctuations, it is possible to identify the trend in dynamics of each protein chain on average, i.e. by examining the power-law dependence R c ∝ t ν in the asymptotic time (t) regime.To our surprise, the protein chains begins to diffuse (ν ∼ 1/2) at low temperature (T = 0.015) in contrast to the extremely slow (nearly standstill ν ∼ 0.1) motion of a free protein chain at low temperature seen in Figure 2.This means that adding protein chains increases their mobility at low temperature while maintaining the network of globular structures.At high temperatures (T = 0.021) on the other hand, proteins slow down considerably with sub-diffusive (ν ∼ 1/4) dynamics as the protein chains elongate and entangle, constraining their movements.Again this is in contrast to diffusive dynamics of a single chain at the high temperatures (see Figure 2).
The variation of the equilibrium radius of gyration (R g ) with the temperature is included in the inset of Figure 5.The radius of gyration exhibits a continuous increase with the temperature before its saturation.There is a non-monotonic trend as the number of protein chains increases from 10 to 15 due to interference of fibrils at moderate to high temperatures.We see that the protein chains elongate on increasing the temperature in the low temperature regime.The radius of gyration also increases by increasing the number of protein chains (concentration) i.e.Nc = 1, 5, 10 until a certain limit beyond which (onset of jamming) it decreases, as is particularly evident with Nc = 15 in the high temperature regime.Concentration-induced elongation of protein chains enhances the formation of fibrils in the high temperature regime below a certain volume fraction, above which the radius of gyration decreases due to steric constraints i.e. crowding.
The global dynamics of protein chains emerges from a collective movement of the constitutive residues.The movement of each residue is controlled by its local surrounding, i.e. the interacting residues.In order to gain an insight into the stochastic movement of a residue along the backbone of a protein, one may estimate the probability (M n ) of its successful move per unit time step; M n is a measure of its local mobility.The mobility profiles of residues at a low (T = 0.015) and a high (T = 0.021) temperatures are presented in Figures 6 and 7 with Nc = 1, 5, 10, and 15.It is rather easy to identify the residues that are least mobile at low temperature.For example, the least mobile residues in an isolated protein (Nc = 1, Figure 6) 22   The pattern of the mobility profiles remain nearly the same (Figure 6) even with the additional interacting protein chains (Nc = 5, 10, 15) with a somewhat reduced mobility with Nc = 15.
The mobility profile at the high temperature (Figure 7) is very different from its low temperature counterpart (Figure 6).Almost all residues appear to be very mobile, few a few exceptions.Two residues ( 51 K, 53 E) appear to be quite immobile.Even though the mobility decreases with increasing number of chains and protein concentration, the identities of the relatively immobile residues remain the same as that of an isolated protein chain in this thermal-driven mobility profile.
The global physical properties such as overall size of the protein conformations depend on the local structures.Residue contact maps can provide some insight into the local surrounding that dictate the segmental movement and structures.Contact maps with Nc = 1, 5, 10, and 15 at low T = 0.015 (Figure 8) and high T = 0.021 (Figure 9) are presented.At the low temperature (Figure 7) we can see significant off-diagonal contacts for Nc = 1, signifying that residues from the opposite ends of the protein chain come in contact and form globular structures (see Figure 1).Adding more protein chains appears to reduce intra-chain looping, which may result in enhancing the elongation, i.e. larger radius of gyration.With many chains (Nc = 5, 10, 15), there may be many loops of various sizes (scatter maps in Figure 8) emerging from the dynamic network of protein chains which contribute to large scale morphology.
The scattering in the residue map reduces considerably at high temperature (Figure 9) for all Nc.For higher Nc (=10, 15), this is consistent with the formation of an extended, spanning network of protein structures involving fibrils as seen in Figure 3 (e.g.; residues from different proteins assemble into a heterogeneous structure with multi-scale structures including tenuously connected fractals.) In order to examine structural evolution that spans multiple length scales (i.e., beyond the size (R g ) of a protein in the simulation box) we analyze the structure factor S(q), where r j is the position of each residue regardless of its affiliation with the protein chain and |q| = 2π/λ is the wave vector of wavelength λ.Using a power-law scaling of the structure factor, one may study the spread of residues over the length scale λ by evaluating the exponent γ which describes the mass (residue) distribution.The effective dimension (D) of the distribution of mass M (number of correlated residues) over its radius of gyration (R) may be evaluated from a scaling, R ∝ M γ with D = 1/γ .Figure 10 shows the variation of the structure factor S(q) with the wave vector q with Nc = 1, 5, 10, and 15 for a range of temperatures (T = 0.015 − 0.027).As seen in Figure 1, the structure of an isolated protein chain (Nc = 1) is globular at low temperature and approaches random coil (self-avoiding walk, SAW) at higher temperature.This is confirmed through the calculation of D using S(q).The radius of gyration R g ∼ 12 at T = 0.015, and R g ∼ 20 at T = 0.021 (Figure 5).At T = 0.015, the scaling of S(q) with the wave vector q around q ∼ 0.5 − 0.6 (λ ∼ R g ) gives D = 1/γ ∼ 3.46 (Figure 10) which shows that the structure of the protein is solid-like (i.e.globular).At the high temperature T = 0.021, on the other hand, the scaling of S(q) around q ∼ 0.3 (λ ∼ R g ) leads to D ∼ 1.78, an exponent associated with SAW linear structure.
The structure factor pattern changes systematically on increasing the protein concentration (Nc = 5, 10, 15).The appearance of oscillations in the variation of S(q) with q in particular is an indication of ordering which set-in as the proteins elongate and form the network (due to both interaction and entanglement).The mass distribution over the entire sample (T = 0.021, Nc = 5, 10, 15) appears to be tenuous (Figure 10), with an effective dimension D ∼ 1.7 similar to that of an SAW.However, unlike a standard linear structure which dominates the large-scale connectedness of the fibrils, there are small aggregates (self-assembled structures with many residues that appear to glue together via non-covalent interactions) (see Figure 3, 8).With many interacting proteins at the low temperatures (T = 0.015, Nc = 15), our data (Figure 10) shows a dense (globular) structure at small scale (D ∼ 2.9) and relatively less dense morphology at large scale (D ∼ 2.3).A smooth transition from small scale dense structure to larger scale tenuous network of fibrils seems to persist FIG. 10.Variation of the structure factor S(q) with the wave vector q for a range of temperature (T = 0.015 − 0.021) with Nc = 1, 5, 10, and 15.Inset in the plots for Nc = 10 is to guide the conversion between the wave vector (q) and the spatial (r ) length scales.Simulations are performed on a 64 3 lattice for 10 7 time steps each with 10-100 independent samples (smaller number of samples with larger number of protein chains).Slopes of the regressive fits of representative data sets are included: at the low temperature T = 0.015, with Nc = 1, slope of the data sets in the wave vector range comparable to its radius of gyration is about -3.46 and with Nc = 15, the slope is about -2.90 for the data sets in the wave vector range comparable to its radius of gyration and -2.30 on a larger scale.Slope at the high temperature (T = 0.021) is about -1.7 over the entire length scale with Nc = 1, 5, 10, and 15.
on increasing the temperature with many interacting protein chains.The amplitude of structural oscillation (a measure of correlation) appears to increase with the protein density.

IV. CONCLUSION
The self-organizing dynamics of lysozymes (an amyloid protein) are studied by a coarsegrained Monte Carlo simulation with input from knowledge-based residue-residue interactions. 19ecause of the efficiency of our coarse-grained approach that involves phenomenological residueresidue interactions, we are able to explore the effects of temperature and protein concentrations on a fairly large system (chains of 148 residues, and as many as 15 chains) for time scales long-enough to observe self-organizing aggregations in a reasonable span of time (on the order of a couple of weeks on a standard contemporary cluster).The simulations are carried out for 10 7 time steps, each at a range of temperatures (T = 0.015 − 0.021) involving a number of independent samples (10-100).A number of interesting observations are made concerning: (i) the difference in dynamics of an isolated chain versus many interacting chains (see below) and (ii) the onset of long range correlation with the self-assembly of a network spanning all chains.
Based on the variation of the RMS displacement of the center of mass of the protein chain with the time steps (i.e.R c ∝ t ν in the asymptotic time (t) regime), the isolated lysozyme appears to move extremely slowly (quasi-static, ν → 0) at low temperatures and speeds up systematically with increasing temperature towards its diffusive dynamics asymptotically.In contrast, the presence of interacting proteins (i.e. with Nc = 10) leads to a concentration-induced protein chain diffusion (ν ∼ 1/2) at low temperatures (e.g.T = 0.015) and concentration-tempered sub-diffusion ((ν ∼ 1/4)) at high temperatures (e.g.T = 0.021).
The radius of gyration of the isolated protein (Nc = 1) exhibits a systematic transition from a globular (low T) to a random coil (high T) conformation with increasing temperature.The crossover from globular to random coil becomes sharper upon increasing the protein concentration (i.e. with Nc = 5, 10), with larger R g at higher temperatures where elongation of protein conformation appears to be concentration driven.Adding more protein chains (e.g. with Nc = 15, crowded regime) leads to a decrease in the magnitude of R g , but still larger than that of an isolated protein.
Visual inspections of the snapshots and animations reveal a range of diverse structures over multiple scales that depend on both the temperature and the protein concentration.Self-assembly of proteins' segments into globular form persists at the low temperatures (i.e.T = 0.015, 0.016) with isolated aggregates at dilute protein concentration, and a connected network of globular aggregates at relatively high protein concentrations (Nc = 10, 15).Increasing the temperature causes appreciable multi-scale responses in structures of a protein (expanded protein with larger R g ) to large-scale (much larger than the size of a protein) morphology of its self-assembled network.At high temperature (T = 0.021), the self-assemble fibril network is easy to identify visually (snapshots) at higher protein concentrations (Nc = 10, 15).
Detailed analysis of the structure factor S(q) provides a quantitative measure of mass distribution in the self-organized heterogeneous network over multiple length scales.Scaling of the structure factor with the wave vector is used to estimate the effective dimension D of the network mass over these length scales.We know the magnitude of the index D for standard structures, i.e.D = 3(solid), 5/3 (random walk).We have found that an isolated lysozyme chain conforms to a globular structure (D ≥ 3) at low temperature, and a random coil (D ∼ 1.7) at high temperatures.With many interacting proteins, at the low temperature T = 0.015, we find different mass distributions at different length scales (implying different structures), i.e.D ∼ 2.9 on the scale comparable to the radius of gyration of a single chain, and D ∼ 2.3 at the large scale over the entire sample.The network of fibrils, at high temperature (T = 0.021) is rather tenuous, i.e.D ∼ 1.7 on a large-scale with a heterogeneous distribution of small (micro) globules that may be very important in propagating the long range correlation over the spanning cluster.
Aggregation of fibrils is the hallmark of amyloids.We have identified the structures that span multiple scales from the radius of gyration of an individual lysozyme to the global amyloid network as a function of temperature and the protein concentration.We hope this study will stimulate further investigations and help in interpreting and understanding experimental data.

FIG. 1 .
FIG. 1. Snapshots of the Lysozyme chain at the temperatures T = 0.015, 0.017, 0.019, and 0.021 (from left to right) towards the end of 10 7 time steps in a 64 3 simulation box.Spheres represent the interacting residues (other than adjacent residues along the backbone) within the range of interaction.

FIG. 3 .
FIG. 3. Snapshots of the self-organized structures of 10 lysozyme chains at the temperatures T = 0.015 (top left), 0.017 (top right), 0.019 (bottom left), and 0.021 (bottom right) towards the end of 10 7 time steps in a 64 3 simulation box.Spheres represent the interacting residues within the range of interaction.

FIG. 4 .
FIG. 4. Variation of the RMS displacement R c of the center of mass of the protein chain with the time steps at temperatures T = 0.015 − 0.021 with Nc = 10; data sets with Nc = 15 for two temperatures (T = 0.015, 0.021) are included to see the trend.Simulations are performed on a 64 3 lattice for 10 7 time steps each with 10 -20 independent samples (lower samples with larger number of protein chains).

FIG. 6 .
FIG.6.Mobility (M n , probability of a successful move) for each residue per unit time step, at low temperature T = 0.015 for Nc = 1, 5, 10, and 15.These data represents the average mobility of each residue.Simulations are performed on a 64 3 lattice for 10 7 time steps each with 10-100 independent samples (lower samples with larger number of protein chains).