The Experimentalist’s Guide to Machine Learning for Small Molecule Design

Sarah E. Lindley, Yiyang Lu, Diwakar Shukla

Research output: Contribution to journal › Review article › peer-review


Initially part of the field of artificial intelligence, machine learning (ML) has become a booming research area since branching out into its own field in the 1990s. After three decades of refinement, ML algorithms have accelerated scientific developments across a variety of research topics. The field of small molecule design is no exception, and an increasing number of researchers are applying ML techniques in their pursuit of discovering, generating, and optimizing small molecule compounds. The goal of this review is to provide simple, yet descriptive, explanations of some of the most commonly utilized ML algorithms in the field of small molecule design, along with those that are highly applicable to an experimentally focused audience. The algorithms discussed here span three ML paradigms: supervised learning, unsupervised learning, and ensemble methods. Examples from the published literature are provided for each algorithm. Some common pitfalls of applying ML to biological and chemical data sets are also explained, alongside a brief summary of a few more advanced paradigms, including reinforcement learning and semi-supervised learning.

Original language: English (US)
Pages (from-to): 657-684
Number of pages: 28
Journal: ACS Applied Bio Materials
Issue number: 2
State: Published - Feb 19 2024


Keywords

  • QSAR
  • data analysis
  • drug design
  • experimentalist friendly
  • machine learning
  • small molecule design

ASJC Scopus subject areas

  • General Chemistry
  • Biochemistry, medical
  • Biomedical Engineering
  • Biomaterials

