TY - JOUR
T1 - Towards overcoming data scarcity in materials science
T2 - unifying models and datasets with a mixture of experts framework
AU - Chang, Rees
AU - Wang, Yu Xiong
AU - Ertekin, Elif
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - While machine learning has emerged in recent years as a useful tool for the rapid prediction of materials properties, generating sufficient data to reliably train models without overfitting is often impractical. Towards overcoming this limitation, we present a general framework for leveraging complementary information across different models and datasets for accurate prediction of data-scarce materials properties. Our approach, based on a machine learning paradigm called mixture of experts, outperforms pairwise transfer learning on 14 of 19 materials property regression tasks, performing comparably on four of the remaining five. The approach is interpretable, model-agnostic, and scalable to combining an arbitrary number of pre-trained models and datasets to any downstream property prediction task. We anticipate the performance of our framework will further improve as better model architectures, new pre-training tasks, and larger materials datasets are developed by the community.
UR - http://www.scopus.com/inward/record.url?scp=85142172156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142172156&partnerID=8YFLogxK
U2 - 10.1038/s41524-022-00929-x
DO - 10.1038/s41524-022-00929-x
M3 - Article
AN - SCOPUS:85142172156
SN - 2057-3960
VL - 8
JO - npj Computational Materials
JF - npj Computational Materials
IS - 1
M1 - 242
ER -