
Kosmos: An AI Scientist for Autonomous Discovery

Ludovico Mitchener, Angela Yiu, Benjamin Chang, M. Bourdenx, Tyler Nadolski, Arvis Sulovari, E. Landsness, Dániel L. Barabási, Siddharth Narayanan, Nicky Evans, S. Reddy, M. Foiani, Aizad Kamal, Leah P. Shriver, F. Cao, A. Wassie, Jon M. Laurent, Edwin Melville-Green, M. C. Ramos, Albert Bou, Kaleigh F. Roberts, Sladjana Zagorac, Timothy C. Orr, Miranda E. Orr, K. Zwezdaryk, Ali E. Ghareeb, L. McCoy, B. Gomes, Euan A Ashley, K. Duff, et al.
Not reported in the paper
arXiv.org (2025)
Agent Memory Reasoning Benchmark

📝 Paper Summary

Autonomous AI Scientists; Multi-agent Systems
Kosmos is an autonomous AI system that coordinates parallel agents via a structured world model to conduct end-to-end scientific research, from hypothesis generation to report writing.
Core Problem
Existing AI research assistants are either limited to specific domains (e.g., therapeutics, ML) or unable to combine extensive literature search with deep exploratory data analysis.
Why it matters:
  • Accelerating scientific discovery requires integrating vast literature with complex data analysis, a bottleneck for human researchers
  • Prior systems like Robin or AI Scientist are constrained to single domains or lack context sharing between search and analysis agents
  • Siloed agents often fail to trace reasoning back to primary data sources, reducing the transparency and rigor required for scientific trust
Concrete Example: In analyzing metabolomics data for neuroprotection, a standard analysis might identify metabolite changes but fail to link them to specific biological pathways. Kosmos autonomously identified a 'nucleotide salvage' pathway by running parallel literature searches on the observed metabolite inversion patterns, successfully matching the conclusions of a human expert study.
Key Novelty
Structured World Model for Multi-Agent Coordination
  • Uses a central 'world model' to synthesize outputs from parallel instances of literature search and data analysis agents, enabling context sharing across tasks
  • Decouples task execution (reading/coding) from reasoning, allowing the system to run massive parallel rollouts (reading 1,500 papers, writing 42k lines of code) while maintaining a coherent research narrative
  • Links every claim in the final report directly to specific data analysis outputs or literature sources stored in the world model for full traceability
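The coordination pattern described above can be sketched roughly as follows. This is a minimal illustration, not the Kosmos implementation: the names `Finding`, `WorldModel`, `literature_agent`, `analysis_agent`, and `run_cycle` are hypothetical, the agents are stubs that return canned results, and the thread pool merely stands in for whatever parallel execution the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One agent output stored in the world model (hypothetical schema)."""
    finding_id: str
    kind: str      # "literature" or "analysis"
    summary: str
    source: str    # e.g. a paper identifier or a notebook path


@dataclass
class WorldModel:
    """Shared store that every agent rollout writes into (assumed design)."""
    findings: dict = field(default_factory=dict)

    def add(self, finding: Finding) -> None:
        self.findings[finding.finding_id] = finding

    def trace(self, finding_ids):
        """Resolve the sources behind a report claim."""
        return [self.findings[i].source for i in finding_ids]


def literature_agent(task_id: int, query: str) -> Finding:
    # Stub standing in for a real literature-search rollout.
    return Finding(f"lit-{task_id}", "literature",
                   f"papers on {query}", f"paper:{task_id}")


def analysis_agent(task_id: int, question: str) -> Finding:
    # Stub standing in for a real data-analysis rollout.
    return Finding(f"ana-{task_id}", "analysis",
                   f"analysis of {question}", f"notebook:{task_id}.ipynb")


def run_cycle(world: WorldModel, tasks) -> None:
    """Run one batch of tasks in parallel, then merge results into the world model."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(agent, i, arg)
                   for i, (agent, arg) in enumerate(tasks)]
        for future in futures:
            world.add(future.result())


world = WorldModel()
run_cycle(world, [
    (literature_agent, "nucleotide salvage"),
    (analysis_agent, "metabolite changes"),
])

# A report claim carries the IDs of the findings that support it,
# so it can be traced back to its sources.
claim_evidence = ["lit-0", "ana-1"]
print(world.trace(claim_evidence))
```

Keeping agent execution separate from the shared store is what lets many rollouts run at once while the reasoning step reads one consistent record; the `trace` lookup illustrates how a report claim could point back to its evidence.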
Architecture
Architecture Figure: Figure 1a
Schematic of the Kosmos workflow involving the World Model and parallel agents
Evaluation Highlights
  • Reproduced findings from 3 unpublished/preprinted manuscripts and made 4 novel discoveries across diverse fields (neuroscience, cardiology, materials science)
  • Executed the equivalent of ~4.1 expert-months of research per run (based on time estimates for 1,500 papers read and 166 analysis rollouts)
  • Achieved 85.5% reproducibility for data analysis statements and 82.1% validation for literature statements in generated reports
Breakthrough Assessment
9/10
Demonstrates genuine autonomous discovery across widely different domains (biology, materials, genetics), surpassing previous single-domain AI scientists. The scale of operation (4.1 expert-months/run) and successful novel findings suggest a step-change in utility.