Abstract
Molecular design benefits from a strong partnership between chemical intuition and machine learning. Given the proliferation of machine learning in the small-molecule catalyst space, several ideas on how best to apply and interpret various 2D or 3D methods are under discussion across the field. We undertook an investigation of several methods for modeling a catalyst system with a large training set. The synthesis of the drug letermovir involves a key conjugate addition that is promoted asymmetrically by a cinchonidine-derived "bis-quat" phase-transfer catalyst. An initial data set acquired from 177 catalysts was used to drive five additional rounds of optimization based on machine learning approaches. For this specific data set, random forest with 2D molecular descriptors outperformed all other 2D methods tested, alternative descriptor combinations, and 3D-based approaches. Model performance improved over time, and a high-throughput approach for the synthesis of new catalysts was key to iterating through larger rounds of optimization. Optimizing the reaction conditions for one of the best catalysts identified during the machine learning work improved the enantioselectivity to 89%.
Original language | English (US) |
---|---|
Pages (from-to) | 670-682 |
Number of pages | 13 |
Journal | Organic Process Research and Development |
Volume | 26 |
Issue number | 3 |
State | Published - Mar 18 2022 |
Keywords
- PTC
- QSAR
- cinchona
- cinchonidine
- machine learning
- phase transfer catalysis
- random forest
ASJC Scopus subject areas
- Physical and Theoretical Chemistry
- Organic Chemistry