Submitted: 05 September 2018 Accepted: 17 December 2018 Published Online: 10 January 2019
J. Chem. Phys. 150, 024307 (2019)
Predicting molecular properties with machine learning (ML) methods is gaining interest among researchers because it offers quantum chemical accuracy at molecular mechanics speed. The prediction is performed by training an ML model on a set of reference data [mostly from Density Functional Theory (DFT)] and then using the model to predict properties. In this work, kernel-based ML models are trained (using Bag of Bonds as well as the many-body tensor representation, MBTR) on datasets containing non-equilibrium structures of six molecules (water, methane, ethane, propane, butane, and pentane) to predict their atomization energies and to perform a Metropolis Monte Carlo (MMC) run with simulated annealing to optimize molecular structures. The optimized structures and energies of the molecules are found to be comparable with DFT-optimized structures, energies, and forces. Thus, this method offers the possibility of using a trained ML model to perform a classical simulation such as MMC without any force field, thereby improving the accuracy of the simulation at low computational cost.
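The workflow described above (train a kernel model on non-equilibrium structures, then run MMC with simulated annealing on the model's energies) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: a one-dimensional Morse potential stands in for the DFT reference energies, the bond length itself serves as the descriptor instead of Bag of Bonds or MBTR, and the function names (`reference_energy`, `train_krr`, `anneal`) and all hyperparameters are hypothetical.

```python
import numpy as np


def reference_energy(r):
    """Toy Morse potential standing in for DFT reference energies (assumption)."""
    depth, width, r_eq = 1.0, 2.0, 1.0
    return depth * (1.0 - np.exp(-width * (r - r_eq))) ** 2


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1-D arrays of descriptors."""
    diff = a[:, None] - b[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


def train_krr(x_train, y_train, sigma=0.3, lam=1e-8):
    """Fit kernel ridge regression and return a prediction function."""
    k = gaussian_kernel(x_train, x_train, sigma)
    # Regression weights solve (K + lam * I) alpha = y.
    alpha = np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)

    def predict(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return gaussian_kernel(x, x_train, sigma) @ alpha

    return predict


def anneal(energy, x0, bounds, t_start=1.0, t_end=1e-4,
           cooling=0.99, step=0.05, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = np.random.default_rng(seed)
    x, e = x0, float(energy(x0)[0])
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)
        # Reject moves that leave the sampled region, where the model is untrained.
        if bounds[0] <= x_new <= bounds[1]:
            e_new = float(energy(x_new)[0])
            # Metropolis criterion: always accept downhill moves, and accept
            # uphill moves with probability exp(-dE / T).
            if e_new <= e or rng.random() < np.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # lower the temperature after every trial move
    return best_x, best_e


# Train on non-equilibrium geometries, then optimize using only the ML energies.
r_train = np.linspace(0.6, 2.0, 30)
model = train_krr(r_train, reference_energy(r_train))
r_opt, e_opt = anneal(model, x0=1.8, bounds=(0.6, 2.0))
print(f"optimized bond length: {r_opt:.3f}, predicted energy: {e_opt:.5f}")
```

In this sketch the annealing loop never calls the reference potential, only the trained model, which mirrors the idea of running a classical simulation without a force field. Restricting trial moves to the sampled range is a design choice of the sketch: a Gaussian-kernel model decays toward zero far from its training data, so its predictions there would be meaningless.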
We would like to thank Dr. Matthias Rupp for providing us with the MBTR code and essential tutorials to get started with MBTR. We would also like to thank Software for Chemistry and Materials for providing us with the DFT software used in this work.
© 2019 Author(s). Published under license by AIP Publishing.