TY - JOUR
T1 - The Experimentalist’s Guide to Machine Learning for Small Molecule Design
AU - Lindley, Sarah E.
AU - Lu, Yiyang
AU - Shukla, Diwakar
N1 - The authors would like to express their gratitude for the fruitful discussions and extensive suggestions from all Shukla Group members, especially Austin T. Weigle, Krishna K. Narayanan, and Diego Kleiman. S.E.L. and D.S. acknowledge support from Herman Frasch Foundation for Chemical Research, Bank of America, N.A., Trustee. D.S. acknowledges support from the National Institutes of Health, under Award No. R35GM142745. Y.L. and D.S. acknowledge support from the project Realizing Increased Photosynthetic Efficiency (RIPE), which is funded by the Bill & Melinda Gates Foundation, Foundation for Food and Agriculture Research (FFAR), and the UK Foreign, Commonwealth and Development Office, under Grant No. OPP1172157.
PY - 2024
Y1 - 2024/2/19
N2 - Initially part of the field of artificial intelligence, machine learning (ML) has become a booming research area since branching out into its own field in the 1990s. After three decades of refinement, ML algorithms have accelerated scientific developments across a variety of research topics. The field of small molecule design is no exception, and an increasing number of researchers are applying ML techniques in their pursuit of discovering, generating, and optimizing small molecule compounds. The goal of this review is to provide simple, yet descriptive, explanations of some of the most commonly utilized ML algorithms in the field of small molecule design along with those that are highly applicable to an experimentally focused audience. The algorithms discussed here span across three ML paradigms: supervised learning, unsupervised learning, and ensemble methods. Examples from the published literature will be provided for each algorithm. Some common pitfalls of applying ML to biological and chemical data sets will also be explained, alongside a brief summary of a few more advanced paradigms, including reinforcement learning and semi-supervised learning.
KW - QSAR
KW - data analysis
KW - drug design
KW - experimentalist friendly
KW - machine learning
KW - small molecule design
UR - http://www.scopus.com/inward/record.url?scp=85168482616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168482616&partnerID=8YFLogxK
U2 - 10.1021/acsabm.3c00054
DO - 10.1021/acsabm.3c00054
M3 - Review article
C2 - 37535819
AN - SCOPUS:85168482616
SN - 2576-6422
VL - 7
SP - 657
EP - 684
JO - ACS Applied Bio Materials
JF - ACS Applied Bio Materials
IS - 2
ER -