TY - GEN
T1 - GLaD
T2 - 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
AU - Nguyen, Thao
AU - Torres-Flores, Tiara
AU - Hwang, Changhyun
AU - Edwards, Carl
AU - Diao, Ying
AU - Ji, Heng
N1 - This work is supported by the Molecule Maker Lab Institute: an AI research institute program supported by NSF under award No. 2019897. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied of, the National Science Foundation, and the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
PY - 2024/10/21
Y1 - 2024/10/21
N2 - This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox and SIDER [45]), not limited to those concerning OPV materials. Especially, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
AB - This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox and SIDER [45]), not limited to those concerning OPV materials. Especially, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
KW - graph neural network
KW - large language models
KW - organic photovoltaics
KW - power conversion efficiency prediction
UR - http://www.scopus.com/inward/record.url?scp=85208153353&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85208153353&partnerID=8YFLogxK
U2 - 10.1145/3627673.3680103
DO - 10.1145/3627673.3680103
M3 - Conference contribution
AN - SCOPUS:85208153353
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 4777
EP - 4785
BT - CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 21 October 2024 through 25 October 2024
ER -