
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

Jiacheng Miao, Joe R. Davis, J. Pritchard, James Zou
Departments of Genetics, Biomedical Data Science, Electrical Engineering, Biology, and Computer Science, Stanford University
arXiv.org (2025)
Agent Benchmark

📝 Paper Summary

Automated code generation, Scientific discovery agents, Tool-use post-training
Paper2Agent is an automated framework that turns static research papers and their codebases into interactive AI agents by building verified Model Context Protocol (MCP) servers.
Core Problem
Research papers are passive artifacts; reproducing computational methods requires substantial effort to locate code, install complex dependencies, and understand API hierarchies, creating barriers to adoption.
Why it matters:
  • Biologists and non-experts cannot easily leverage advanced computational tools (e.g., AlphaGenome) due to technical setup barriers
  • Static code repositories often require significant manual adaptation to work on new data
  • Existing 'executable papers' or notebooks still require technical familiarity to configure and run successfully
Concrete Example: To use AlphaGenome, a user must normally install environments, import modules, manage API keys, and construct specific chromosome objects. With Paper2Agent, a user simply asks 'Generate AlphaGenome predictions for these variants,' and the agent handles the underlying API complexity automatically.
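The contrast in this example can be sketched as a single tool function. The function accepts variants as plain text and hides object construction behind one call. This is a minimal sketch under stated assumptions. `Variant`, `parse_variant`, `predict_variants`, the `chrom:pos:REF>ALT` input format, and the `backend` callable are all hypothetical and do not reproduce the real AlphaGenome API. `backend` stands in for the model call and whatever key handling it needs.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Variant:
    """Hypothetical structured variant object (not the real AlphaGenome class)."""
    chromosome: str
    position: int
    ref: str
    alt: str


# Hypothetical user-facing format, e.g. "chr22:36201698:A>C".
_VARIANT_RE = re.compile(r"^(chr[0-9XYM]+):(\d+):([ACGT]+)>([ACGT]+)$")


def parse_variant(text: str) -> Variant:
    """Turn a plain-text variant string into the structured object a backend expects."""
    match = _VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)


def predict_variants(
    variant_strings: list[str], backend: Callable[[Variant], float]
) -> dict[str, float]:
    """One tool call: parse each variant string and score it with `backend`.

    `backend` is a hypothetical stand-in for the real prediction API, so the
    caller never handles environment setup or object construction directly.
    """
    return {s: backend(parse_variant(s)) for s in variant_strings}
```

For instance, `predict_variants(["chr22:36201698:A>C"], lambda v: float(v.position % 10))` returns `{"chr22:36201698:A>C": 8.0}` with this toy backend. In a Paper2Agent-style server, the backend would presumably wrap the paper's own code.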
Key Novelty
Automated Paper-to-MCP Conversion Pipeline
  • Systematically analyzes a paper's text and codebase using specialized agents (Environment, Extraction, Testing) to construct a Model Context Protocol (MCP) server
  • Validates the generated tools against the paper's reported results to ensure reproducibility and guard against 'code hallucination' before exposing them to users
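The validation step can be sketched as a gate. Each candidate tool is run on an example taken from the paper, and only tools whose output matches the reported value get registered. This is a minimal sketch that assumes a single numeric result per tool and a relative tolerance. `ToolCandidate`, `validate`, and `build_registry` are hypothetical names, not Paper2Agent's code, and the real Testing agent's checks may be richer.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCandidate:
    """Hypothetical record: an extracted tool plus a test case drawn from the paper."""
    name: str
    func: Callable[..., float]
    example_kwargs: dict[str, Any]  # input used in the paper or its tutorial
    reported_value: float           # value the paper reports for that input


def validate(candidate: ToolCandidate, rel_tol: float = 1e-6) -> bool:
    """Return True only if the tool reproduces the reported value within rel_tol."""
    try:
        observed = candidate.func(**candidate.example_kwargs)
    except Exception:
        # A tool that crashes on the paper's own example fails validation.
        return False
    return math.isclose(observed, candidate.reported_value, rel_tol=rel_tol)


def build_registry(candidates: list[ToolCandidate]) -> dict[str, Callable[..., float]]:
    """Expose only validated tools; failing candidates are withheld from users."""
    return {c.name: c.func for c in candidates if validate(c)}
```

Consider a correct `mean` tool checked against a reported value of 2.0 for `[1.0, 2.0, 3.0]`: it is registered. A variant that divides by `len(values) + 1` returns 1.5 and is withheld, and so is any tool that raises on the example input.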
Architecture
Figure 1. The Paper2Agent workflow, which identifies the codebase, invokes construction agents, validates the tools through testing, and deploys them as an MCP server.
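In the deployment step, tools are exposed through MCP. Clients invoke a tool by sending a JSON-RPC 2.0 request with method `tools/call` and `name`/`arguments` parameters. The standard-library sketch below answers only that one method. It is not the official MCP SDK, and it omits initialization, `tools/list`, and the transport layer. The `add` tool is a hypothetical placeholder for a tool extracted from a paper's codebase.

```python
import json
from typing import Any, Callable

# Hypothetical tool table: tool name -> Python callable taking keyword arguments.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}


def handle(raw_request: str) -> str:
    """Answer one JSON-RPC 2.0 request; only the 'tools/call' method is supported."""
    req = json.loads(raw_request)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        # Unknown tool names are reported as a protocol-level error.
        error = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    try:
        output, is_error = str(tool(**params.get("arguments", {}))), False
    except Exception as exc:
        # Tool failures are returned inside the result so the calling model can react.
        output, is_error = f"Tool error: {exc}", True
    result = {"content": [{"type": "text", "text": output}], "isError": is_error}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

Tool failures come back as a normal result with `isError` set, while an unknown tool name becomes a JSON-RPC error. This mirrors how the MCP specification separates tool execution errors from protocol errors.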
Evaluation Highlights
  • 100.0% accuracy on novel AlphaGenome queries (unseen variants/tissues), significantly outperforming Claude + Repo (80.0%) and Biomni (60.0%)
  • Median runtime reduced by 3.2x compared to Claude + Repo and 4.6x compared to Biomni on novel benchmark tasks
  • Automatically reproduced original paper results for TISSUE (prediction intervals) and Scanpy (clustering workflows) without human intervention
Breakthrough Assessment
9/10
Introduces a paradigm shift from static PDFs to interactive agents built on the Model Context Protocol (MCP), a widely adopted open standard. Demonstrates high reliability (100% accuracy on novel AlphaGenome queries) and enables automated scientific collaboration.