A machine learning workflow for molecular analysis: Application to melting points

Ganesh Sivaraman, Nicholas E. Jackson, Benjamin Sanchez-Lengeling, Alvaro Vazquez-Mayagoitia, Alan Aspuru-Guzik, Venkatram Vishwanath, Juan J. De Pablo

Research output: Contribution to journal › Article › peer-review

Abstract

Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular data sets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single molecule structure. This application serves as a case study for how to employ the MOLAN workflow in the context of molecular property prediction.

Original language: English (US)
Article number: 025015
Journal: Machine Learning: Science and Technology
Volume: 1
Issue number: 2
State: Published - Jun 2020
Externally published: Yes

Keywords

  • Machine learning
  • Materials
  • Melting point
  • Workflow

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Software
