Abstract
Data-driven approaches to materials exploration and discovery are building momentum due to emerging advances in machine learning. However, parsimonious representations of crystals for navigating the vast materials search space remain limited. To address this limitation, we introduce a materials discovery framework that utilizes natural language embeddings from language models as representations of compositional and structural features. The contextual knowledge encoded in these language representations conveys information about material properties and structures, enabling both similarity analysis to recall relevant candidates based on a query material and multi-task learning to share information across related properties. Applying this framework to thermoelectrics, we demonstrate diversified recommendations of prototype crystal structures and identify under-studied material spaces. Validation through first-principles calculations and experiments confirms the potential of the recommended materials as high-performance thermoelectrics. Language-based frameworks offer versatile and adaptable embedding structures for effective materials exploration and discovery, applicable across diverse material systems.
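The similarity-based recall described in the abstract can be illustrated with a minimal sketch. It assumes each material is already represented by a fixed-length embedding vector and that candidates are ranked by cosine similarity to the query; the abstract does not state the metric, so that choice is an assumption. The function names (`cosine_similarity`, `recall_candidates`), the placeholder material labels, and the toy 3-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def recall_candidates(query, embeddings, top_k=3):
    """Return the top_k materials most similar to the query material."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query  # do not recommend the query itself
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors invented for illustration; real language-model embeddings
# would have hundreds of dimensions.
toy_embeddings = {
    "material_A": [1.0, 0.0, 0.0],
    "material_B": [0.9, 0.1, 0.0],
    "material_C": [0.0, 1.0, 0.0],
    "material_D": [0.5, 0.5, 0.0],
}

ranked = recall_candidates("material_A", toy_embeddings, top_k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setting the query `material_A` recalls `material_B` first and `material_D` second, because their vectors point in the most similar directions. A real pipeline would replace the toy dictionary with embeddings produced by a language model from text descriptions of composition and structure.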
Original language | English (US) |
---|---|
Article number | 58 |
Journal | npj Computational Materials |
Volume | 10 |
Issue number | 1 |
State | Published - Dec 2024 |
Externally published | Yes |
ASJC Scopus subject areas
- Modeling and Simulation
- General Materials Science
- Mechanics of Materials
- Computer Science Applications