
PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs

Jinyue Li, Yuci Liang, Qiankun Li, Xinheng Lyu, Jiayu Qian, Huabao Chen, Kun Wang, Zhigang Zeng, Anil Anthony Bharath, Yang Liu
Affiliations: not listed
arXiv (2026)
MM Memory KG RAG

📝 Paper Summary

Computational Pathology · Multimodal Large Language Models (MLLMs) · Memory-Augmented Generation
PathMem introduces a memory-centric framework for pathology MLLMs that dynamically selects and grounds structured medical knowledge from a literature-derived long-term memory into working memory for accurate diagnosis.
Core Problem
Existing pathology MLLMs operate as parametric black boxes lacking explicit mechanisms to integrate structured expert knowledge (grading criteria, taxonomy) with visual evidence, leading to inconsistent diagnostic reasoning.
Why it matters:
  • Pathology is knowledge-intensive; accurate diagnosis requires linking visual morphology with formal diagnostic standards, not just pattern recognition
  • Current retrieval-augmented methods use static pipelines that fail to model the dynamic, adaptive memory selection process used by human experts
  • Without interpretable memory control, models struggle to reliably incorporate evolving clinical evidence and complex disease taxonomies
Concrete Example: When diagnosing a slide, a standard MLLM might identify tumor cells but fail to apply the specific grading criteria found in recent literature. PathMem retrieves the exact grading rules from its knowledge graph and explicitly conditions its reasoning on that retrieved standard.
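A minimal sketch of what "conditioning on a retrieved standard" could look like in practice, assuming the knowledge graph is stored as (head, relation, tail) triples. The `Triple` class, the `retrieve` and `build_prompt` helpers, and the placeholder entity and criterion strings are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge (hypothetical schema)."""
    head: str
    relation: str
    tail: str


def retrieve(graph, entity):
    """Return every triple whose head matches the given entity."""
    return [t for t in graph if t.head == entity]


def build_prompt(question, facts):
    """Prepend retrieved facts so the model reasons against them explicitly."""
    lines = ["Retrieved knowledge:"]
    lines += [f"- {t.head} | {t.relation} | {t.tail}" for t in facts]
    lines += ["", f"Question: {question}",
              "Answer using only the criteria listed above."]
    return "\n".join(lines)


# Placeholder content only; no real grading criteria are encoded here.
graph = [
    Triple("tumor_type_A", "graded_by", "<grading criterion text>"),
    Triple("tumor_type_B", "graded_by", "<other criterion text>"),
]
facts = retrieve(graph, "tumor_type_A")
prompt = build_prompt("What grade does this slide show?", facts)
print(prompt)
```

The point of the sketch is only that the retrieved criterion appears verbatim in the model input, so the answer can be checked against it.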
Key Novelty
Dynamic LTM-to-WM Transformation via Memory Transformer
  • Constructs a high-quality pathology knowledge graph (LTM) via deep semantic search over PubMed, simulating expert-level accumulated domain knowledge
  • Uses a 'Memory Transformer' to dynamically select relevant knowledge using both static (cosine similarity) and dynamic (joint projection) activation mechanisms
  • Explicitly models the cognitive process of transferring only highly relevant knowledge entries from Long-Term Memory to Working Memory for the final reasoning step
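The selection step described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the summary only names a static (cosine similarity) and a dynamic (joint projection) activation, so the sigmoid-of-projected-dot-product form of `dynamic_activation`, the mixing weight `alpha`, the `top_k` cutoff, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Static activation: cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dynamic_activation(query, entry, w_query, w_entry):
    """Assumed form of a joint projection: map both vectors into a shared
    space, take their dot product, and squash it with a sigmoid."""
    q = project(w_query, query)
    e = project(w_entry, entry)
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(q, e))))


def select_working_memory(query, ltm, w_query, w_entry, alpha=0.5, top_k=2):
    """Score every long-term memory entry and keep the top_k names."""
    scored = []
    for name, emb in ltm.items():
        static = cosine(query, emb)
        dynamic = dynamic_activation(query, emb, w_query, w_entry)
        scored.append((alpha * static + (1 - alpha) * dynamic, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy 2-D embeddings; identity matrices stand in for learned projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
ltm = {
    "entry_a": [1.0, 0.0],
    "entry_b": [0.0, 1.0],
    "entry_c": [0.9, 0.1],
}
working_memory = select_working_memory([1.0, 0.0], ltm, identity, identity)
print(working_memory)  # ['entry_a', 'entry_c']
```

In a trained system the projection matrices would be learned, which is what would let the dynamic score diverge from plain cosine similarity; with identity matrices, as here, the two scores rank the entries identically.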
Architecture
Figure 1 (implied): Overview of the PathMem framework, illustrating the LTM construction from PubMed and the runtime Memory Transformer mechanism.
Evaluation Highlights
  • +12.8% improvement in WSI-Precision and +10.1% in WSI-Relevance on WSI-Bench report generation compared to prior WSI-based models
  • +9.7% gain in open-ended diagnosis accuracy on WSI-Bench compared to baselines
  • Zero-shot generalization demonstrated on three external datasets (WSI-VQA, SlideBench-VQA, CPTAC-NSCLC) without additional fine-tuning
Breakthrough Assessment
8/10
Significant quantitative gains in specialized pathology tasks by effectively bridging the gap between static knowledge bases and dynamic visual reasoning, moving beyond standard RAG.