TY - JOUR
T1 - molli
T2 - A General Purpose Python Toolkit for Combinatorial Small Molecule Library Generation, Manipulation, and Feature Extraction
AU - Shved, Alexander S.
AU - Ocampo, Blake E.
AU - Burlova, Elena S.
AU - Olen, Casey L.
AU - Rinehart, N. Ian
AU - Denmark, Scott E.
N1 - We are grateful to the National Science Foundation for financial support (NSF CHE 2154237) as well as for the Molecule Maker Laboratory Institute (NSF CHE 2019897). We thank the W. M. Keck Foundation for contributing to the purchase of system 4 and system 5 and Merck & Co. for contribution to the purchase of system 1. Blake Ocampo thanks the Alfred P. Sloan Foundation’s Minority Ph.D. Program for funding. Alexander Shved, Blake Ocampo, Casey Olen, and Ian Rinehart thank the University of Illinois for graduate fellowships. We thank Sara Lambert and Matthew Berry (UIUC NCSA) for their assistance with the GitHub workflow and insightful discussions. The authors acknowledge Austin Douglas for assistance with establishing the documentation system and Ethan G. M. Mattson for prototyping some of the conformer generation code. We thank Mark Hewitt for his assistance with the cluster computing resources. Finally, we thank Dr. Jeremy J. Henle and Dr. Andrew F. Zahrt for the design of the ccheminfolib library that inspired the creation of molli.
PY - 2024/11/11
Y1 - 2024/11/11
N2 - The construction, management, and analysis of large in silico molecular libraries is critical in many areas of modern chemistry. Herein, we introduce the MOLecular LIbrary toolkit, “molli”, which is a Python 3 cheminformatics module that provides a streamlined interface for manipulating large in silico libraries. Three-dimensional, combinatorial molecule libraries can be expanded directly from two-dimensional chemical structure fragments stored in CDXML files with high stereochemical fidelity. Geometry optimization, property calculation, and conformer generation are executed by interfacing with widely used computational chemistry programs such as OpenBabel, RDKit, ORCA, NWChem, and xTB/CREST. Conformer-dependent grid-based feature calculators provide numerical representation and interface to robust three-dimensional visualization tools that provide comprehensive images to enhance human understanding of libraries with thousands of members. The package includes a command-line interface in addition to Python classes to streamline frequently used workflows. Parallel performance is benchmarked on various hardware platforms, and common workflows are demonstrated for different tasks ranging from optimized grid-based descriptor calculation on catalyst libraries to an NMR chemical shift prediction workflow from CDXML files.
AB - The construction, management, and analysis of large in silico molecular libraries is critical in many areas of modern chemistry. Herein, we introduce the MOLecular LIbrary toolkit, “molli”, which is a Python 3 cheminformatics module that provides a streamlined interface for manipulating large in silico libraries. Three-dimensional, combinatorial molecule libraries can be expanded directly from two-dimensional chemical structure fragments stored in CDXML files with high stereochemical fidelity. Geometry optimization, property calculation, and conformer generation are executed by interfacing with widely used computational chemistry programs such as OpenBabel, RDKit, ORCA, NWChem, and xTB/CREST. Conformer-dependent grid-based feature calculators provide numerical representation and interface to robust three-dimensional visualization tools that provide comprehensive images to enhance human understanding of libraries with thousands of members. The package includes a command-line interface in addition to Python classes to streamline frequently used workflows. Parallel performance is benchmarked on various hardware platforms, and common workflows are demonstrated for different tasks ranging from optimized grid-based descriptor calculation on catalyst libraries to an NMR chemical shift prediction workflow from CDXML files.
UR - http://www.scopus.com/inward/record.url?scp=85208772964&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85208772964&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.4c00424
DO - 10.1021/acs.jcim.4c00424
M3 - Article
C2 - 39441186
AN - SCOPUS:85208772964
SN - 1549-9596
VL - 64
SP - 8083
EP - 8090
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 21
ER -