Submitted: 10 August 2016 Accepted: 05 October 2016 Published Online: 26 October 2016
J. Chem. Phys. 145, 164104 (2016); https://doi.org/10.1063/1.4965440
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
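As a rough illustration of how cluster labels can feed a core-set discretization, the sketch below assumes that a density-based clustering has already labelled every trajectory frame with a core index, and that frames outside all cores carry the noise label -1 (the convention used, for example, by scikit-learn's DBSCAN). Each frame is then assigned to the core visited most recently, and transitions between cores are counted at a fixed lag. The function names, the toy label sequence, and the lag value are hypothetical; this is a simplified, milestoning-style bookkeeping sketch, not the authors' implementation, and it does not compute the metric matrix mentioned above.

```python
from collections import Counter

# Assumed label for frames that lie outside every core (noise), following
# the convention of scikit-learn's DBSCAN. This is an assumption of the sketch.
NOISE = -1


def milestoning_assignment(labels, noise=NOISE):
    """Assign each frame to the core it visited last.

    Frames before the first core visit have no defined core and get None.
    """
    assigned = []
    last_core = None
    for label in labels:
        if label != noise:
            last_core = label  # the trajectory is inside a core: update
        assigned.append(last_core)
    return assigned


def count_transitions(assigned, lag=1):
    """Count pairs (core at time t, core at time t + lag), skipping undefined frames."""
    counts = Counter()
    for i, j in zip(assigned, assigned[lag:]):
        if i is not None and j is not None:
            counts[(i, j)] += 1
    return counts


# Toy label sequence (hypothetical): two cores, 0 and 1, separated by noise frames.
labels = [NOISE, 0, 0, NOISE, 0, NOISE, NOISE, 1, 1, NOISE, 1, 0]
assigned = milestoning_assignment(labels)
counts = count_transitions(assigned, lag=1)

print(assigned)
print(dict(counts))
```

In this sketch, excursions into the noise region that return to the same core do not register as transitions; only arrival in a different core changes the assigned state. This reflects why the cores have to be identified before the model can be built.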
This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through Grant No. CRC 1114. O.L. acknowledges support by the DFG collaborative research center CRC 765. The computer facilities of the Freie Universität Berlin (ZEDAT) are acknowledged for computer time.
© 2016 Author(s). Published by AIP Publishing.