
MatterChat: A Multi-Modal LLM for Material Science

Yingheng Tang, Wenbin Xu, Jie Cao, Jian Ma, Weilu Gao, Steven Farrell, Benjamin Erichson, Michael W. Mahoney, Andy Nonaka, Zhi Yao
Lawrence Berkeley National Laboratory, University of Colorado at Boulder, The University of Utah, University of California at Berkeley
arXiv.org (2025)
MM Reasoning RAG Benchmark

📝 Paper Summary

Material Property Prediction, Multi-modal Large Language Models, AI for Science
MatterChat integrates a pretrained universal interatomic potential with a frozen Large Language Model via a trainable bridge module to enable structure-aware material property prediction and scientific reasoning.
Core Problem
Existing methods either lack language understanding (graph-based models) or lose structural resolution by relying on text descriptions like SMILES/CIF (LLM-based methods).
Why it matters:
  • High-fidelity methods like DFT are computationally prohibitive for large-scale screening
  • Pure graph models cannot handle user prompts, literature context, or explainable reasoning
  • Text-only LLMs fail to capture precise atomic interactions, leading to inferior quantitative predictions
Concrete Example: When predicting properties for a material like Yttrium Iron Garnet (YIG), a standard graph model gives a number without context, while a text-only LLM might hallucinate the structure. MatterChat takes the specific atomic graph, predicts it is magnetic, and generates a detailed synthesis protocol (mixing ratios, sintering conditions) grounded in that specific structure.
Key Novelty
Structure-Aware Multi-Modal Bootstrapping
  • Uses a pretrained universal machine learning interatomic potential (uMLIP) as a frozen graph encoder to extract physically meaningful atomic embeddings
  • Employs a BLIP-2-style transformer bridge that uses trainable queries to map these dense atomic embeddings into the LLM's token space, avoiding expensive full-model retraining
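The bridge idea can be illustrated with a minimal sketch: a fixed set of trainable query vectors cross-attends over per-atom embeddings and is then projected to the LLM's embedding width. This is not the paper's implementation. It is a single-head, single-layer simplification in pure Python, and all dimensions, weight matrices, and function names (`bridge`, `random_matrix`) are hypothetical stand-ins. The random `atom_embeddings` stand in for the output of the frozen graph encoder.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned or encoded values."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def bridge(atom_embeddings, queries, w_k, w_v, w_out):
    """Single-head cross-attention: queries attend over atom embeddings."""
    keys = matmul(atom_embeddings, w_k)            # (n_atoms, d_query)
    values = matmul(atom_embeddings, w_v)          # (n_atoms, d_query)
    keys_t = [list(col) for col in zip(*keys)]     # (d_query, n_atoms)
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, keys_t)               # (n_queries, n_atoms)
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, values)             # (n_queries, d_query)
    return matmul(attended, w_out), weights        # (n_queries, d_llm)

# Hypothetical sizes, chosen only for illustration.
rng = random.Random(0)
n_atoms, d_atom, n_queries, d_query, d_llm = 5, 8, 4, 6, 10

atom_embeddings = random_matrix(n_atoms, d_atom, rng)  # stand-in for frozen encoder output
queries = random_matrix(n_queries, d_query, rng)       # trainable in a real bridge
w_k = random_matrix(d_atom, d_query, rng)              # trainable
w_v = random_matrix(d_atom, d_query, rng)              # trainable
w_out = random_matrix(d_query, d_llm, rng)             # trainable projection to LLM width

soft_tokens, attention = bridge(atom_embeddings, queries, w_k, w_v, w_out)
print(len(soft_tokens), len(soft_tokens[0]))
```

The number of output vectors depends only on the number of queries, not on the number of atoms, so structures of any size yield a fixed-length prefix for the LLM. In a setup like the one described, only the queries and bridge weights would receive gradients while the encoder and LLM stay frozen.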
Architecture
Figure 1(a): The architecture of MatterChat, detailing the data flow from material structure and text prompts to the final text output.
Evaluation Highlights
  • Outperforms general-purpose LLMs (GPT-4o, Gemini, DeepSeek) on formation energy estimation for GNoME-discovered materials
  • Surpasses physics-oriented graph-based baselines (SchNet, CHGNet) on numerical tasks such as band gap and energy prediction by leveraging multi-modal reasoning
  • Demonstrates effective retrieval-augmented generation (RAG) capabilities, improving robustness by retrieving similar materials during inference
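The retrieval step mentioned in the last highlight can be sketched as a nearest-neighbor lookup over structure embeddings. The summary does not say which similarity measure or index the authors use, so cosine similarity and a brute-force scan are assumptions here. The material names and three-dimensional vectors are hypothetical toy values, not data from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_similar(query_embedding, database, k=2):
    """Return the names of the k database entries most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in database.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy database: name -> structure embedding.
database = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.0, 1.0, 0.0],
    "material_C": [0.7, 0.7, 0.0],
    "material_D": [-1.0, 0.0, 0.0],
}

query = [1.0, 0.0, 0.0]  # stand-in for the embedding of the queried structure
neighbors = retrieve_similar(query, database, k=2)
print(neighbors)  # ['material_A', 'material_C']
```

In a retrieval-augmented setup, the known properties of the returned neighbors would be added to the model's context or combined with its own prediction. How MatterChat combines them is not specified in this summary.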
Breakthrough Assessment
8/10
Successfully bridges the gap between precise physical potentials and reasoning-capable LLMs without retraining the backbone models. The ability to outperform specialized physical models on numerical tasks while retaining chat capabilities is significant.