TY - JOUR
T1 - A machine learning workflow for molecular analysis
T2 - Application to melting points
AU - Sivaraman, Ganesh
AU - Jackson, Nicholas E.
AU - Sanchez-Lengeling, Benjamin
AU - Vazquez-Mayagoitia, Alvaro
AU - Aspuru-Guzik, Alan
AU - Vishwanath, Venkatram
AU - De Pablo, Juan J.
N1 - Publisher Copyright: © 2020 The Author(s).
PY - 2020/6
Y1 - 2020/6
N2 - Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular data sets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single molecule structure. This application serves as a case study for how to employ the MOLAN workflow in the context of molecular property prediction.
KW - Machine learning
KW - Materials
KW - Melting point
KW - Workflow
UR - http://www.scopus.com/inward/record.url?scp=85121485413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121485413&partnerID=8YFLogxK
U2 - 10.1088/2632-2153/ab8aa3
DO - 10.1088/2632-2153/ab8aa3
M3 - Article
AN - SCOPUS:85121485413
SN - 2632-2153
VL - 1
JO - Machine Learning: Science and Technology
JF - Machine Learning: Science and Technology
IS - 2
M1 - 025015
ER -