Leveraging language representation for materials exploration and discovery

Jiaxing Qu, Yuxuan Richard Xie, Kamil M. Ciesielski, Claire E. Porter, Eric S. Toberer, Elif Ertekin

Research output: Contribution to journal › Article › peer-review


Data-driven approaches to materials exploration and discovery are building momentum due to emerging advances in machine learning. However, parsimonious representations of crystals for navigating the vast materials search space remain limited. To address this limitation, we introduce a materials discovery framework that utilizes natural language embeddings from language models as representations of compositional and structural features. The contextual knowledge encoded in these language representations conveys information about material properties and structures, enabling both similarity analysis to recall relevant candidates based on a query material and multi-task learning to share information across related properties. Applying this framework to thermoelectrics, we demonstrate diversified recommendations of prototype crystal structures and identify under-studied material spaces. Validation through first-principles calculations and experiments confirms the potential of the recommended materials as high-performance thermoelectrics. Language-based frameworks offer versatile and adaptable embedding structures for effective materials exploration and discovery, applicable across diverse material systems.
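The similarity-based recall step described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the metric (the abstract does not name one) and uses toy three-dimensional vectors in place of real language-model embeddings. The function names `cosine_similarity` and `recall_similar` and the labels `material_A`, `material_B`, and `material_C` are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_similar(query, candidates, top_k=2):
    """Return the top_k (name, score) pairs most similar to the query."""
    scored = [
        (name, cosine_similarity(query, vec))
        for name, vec in candidates.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings (assumed values, not real data).
toy_embeddings = {
    "material_A": [1.0, 0.0, 0.0],
    "material_B": [0.9, 0.1, 0.0],
    "material_C": [0.0, 1.0, 0.0],
}

# Query with a vector identical to material_A's toy embedding.
ranked = recall_similar([1.0, 0.0, 0.0], toy_embeddings, top_k=2)
```

In a setup like this, the query vector would be the embedding of a known material and the candidates would be embeddings of materials from a database; the ranking then serves as a shortlist for further screening.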

Original language: English (US)
Article number: 58
Journal: npj Computational Materials
Issue number: 1
State: Published - Dec 2024

ASJC Scopus subject areas

  • Modeling and Simulation
  • General Materials Science
  • Mechanics of Materials
  • Computer Science Applications

