TY - JOUR
T1 - Development of a Computer-Guided Workflow for Catalyst Optimization. Descriptor Validation, Subset Selection, and Training Set Analysis
AU - Henle, Jeremy J.
AU - Zahrt, Andrew F.
AU - Rose, Brennan T.
AU - Darrow, William T.
AU - Wang, Yang
AU - Denmark, Scott E.
N1 - Publisher Copyright: Copyright © 2020 American Chemical Society.
PY - 2020/7/1
Y1 - 2020/7/1
AB - Modern, enantioselective catalyst development is driven largely by empiricism. Although this approach has fostered the introduction of most of the existing synthetic methods, it is inherently limited by the skill, creativity, and chemical intuition of the practitioner. Herein, we present a complementary approach to catalyst optimization in which statistical methods are used at each stage to streamline development. To construct the optimization informatics workflow, a number of critical components had to be subjected to rigorous validation. First, the critically important molecular descriptors were validated in two case studies to establish the importance of conformation-dependent molecular representations. Next, with a large data set available, it was possible to investigate the amount of data necessary to make predictive models with different modeling methods. Given the commercial availability of many catalyst structures, it was possible to compare models generated with algorithmically selected training sets and commercially available training sets. Finally, the augmentation of limited data sets is demonstrated in a method informed by unsupervised learning to restore the accuracy of the generated models.
UR - http://www.scopus.com/inward/record.url?scp=85087385760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087385760&partnerID=8YFLogxK
U2 - 10.1021/jacs.0c04715
DO - 10.1021/jacs.0c04715
M3 - Article
C2 - 32568531
AN - SCOPUS:85087385760
SN - 0002-7863
VL - 142
SP - 11578
EP - 11592
JO - Journal of the American Chemical Society
JF - Journal of the American Chemical Society
IS - 26
ER -