โ† Back to Paper List

Towards a Neural Debugger for Python

Maximilian Beck, Jonas Gehring, Jannik Kossen, Gabriel Synnaeve
Johannes Kepler University Linz, Institute for Machine Learning, Meta FAIR CodeGen Team
arXiv (2026)
Agent Pretraining Reasoning Benchmark

๐Ÿ“ Paper Summary

Self-evolving Agentic reasoning Tool-use post-training
Neural debuggers are language models trained on execution traces to simulate interactive debugging actions like stepping and breakpoints, enabling both forward and inverse program execution prediction.
Core Problem
Existing neural interpreters execute code strictly line-by-line, failing to model the interactive, non-sequential usage of debuggers (breakpoints, stepping over) that developers rely on.
Why it matters:
  • Developers rarely execute programs sequentially; they jump between relevant states to isolate bugs, a behavior current models cannot simulate
  • Traditional debugging requires a live execution environment, which is often unavailable or restricted in synthesis and testing scenarios
  • Agentic coding systems lack a world model for debugging that allows them to plan, verify, and repair code without expensive re-execution
Concrete Example: A developer sets a breakpoint at a specific line inside a loop to inspect a variable. A standard neural interpreter must predict every intermediate line before reaching that point, whereas a developer (and a neural debugger) jumps directly to the state of interest.
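The contrast in this example can be made concrete with a real trace. The following is a minimal sketch, not the paper's pipeline: it uses Python's standard `sys.settrace` hook to record every executed line of a toy function, then picks out only the states at a chosen line, which is what a breakpoint exposes. The names `collect_trace`, `states_at`, and `running_sum` are hypothetical and introduced only for illustration.

```python
import sys


def collect_trace(func, *args):
    """Run func and record (line offset, locals snapshot) for each executed line.

    Offsets are relative to the `def` line of func. Snapshots are taken
    before the line runs, as a debugger would show them.
    """
    trace = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other frames
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            trace.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace


def states_at(trace, offset):
    """Return only the snapshots taken at one line, like hits of a breakpoint."""
    return [snapshot for line, snapshot in trace if line == offset]


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, trace = collect_trace(running_sum, 3)
hits = states_at(trace, 3)  # offset 3 is the line `total += i`
```

A line-by-line interpreter has to produce every entry of `trace` in order, whereas a breakpoint query only needs the entries in `hits`. The sketch obtains them from real execution; a neural debugger would have to predict them instead.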
Key Novelty
Neural Debugger as an MDP
  • Models the debugger as a Markov Decision Process where states are program locations/variables and actions are debugger commands (step_into, step_over, breakpoint)
  • Constructs a state tree from execution traces to define valid transitions, allowing the model to learn non-sequential jumps and call-stack navigation
  • Supports inverse execution (predicting past states or inputs from a current state) by reversing the transition tree and repurposing actions
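The transition structure described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the authors' implementation: a flat trace annotated with call depth stands in for the state tree, the `State` fields and the `step` function are hypothetical, and the inverse mode simply scans the trace backwards with the same action rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class State:
    func: str   # name of the function being executed
    line: int   # source line about to run
    depth: int  # call-stack depth (0 = outermost frame)


def step(trace: List[State], idx: int, action: str,
         line: Optional[int] = None, inverse: bool = False) -> Optional[int]:
    """Return the index of the state reached from trace[idx], or None.

    step_into   -> the adjacent state in the trace
    step_over   -> nearest state that is not deeper than the current frame
    step_return -> nearest state in a shallower frame
    breakpoint  -> nearest state at the given line
    With inverse=True the same rules are applied walking backwards.
    """
    depth = trace[idx].depth
    order = range(idx - 1, -1, -1) if inverse else range(idx + 1, len(trace))
    for j in order:
        s = trace[j]
        if action == "step_into":
            return j
        if action == "step_over" and s.depth <= depth:
            return j
        if action == "step_return" and s.depth < depth:
            return j
        if action == "breakpoint" and s.line == line:
            return j
    return None


# Toy trace: main calls helper on line 2, then continues on line 3.
trace = [
    State("main", 1, 0),
    State("main", 2, 0),
    State("helper", 10, 1),
    State("helper", 11, 1),
    State("main", 3, 0),
]
```

On this toy trace, `step(trace, 1, "step_into")` enters `helper` (index 2), `step(trace, 1, "step_over")` skips the call entirely (index 4), and `step(trace, 4, "step_over", inverse=True)` moves back to the call site (index 1). Each action maps one state to one target state, which is the kind of transition the model is trained to predict without running the program.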
Architecture
Figure 1: The data pipeline for creating neural debuggers, from execution traces to state trees, trajectory sampling, and final tokenization.
Evaluation Highlights
  • The 32B-parameter neural debugger achieves >90% accuracy in predicting the next state across key actions (step into, step over, step return, breakpoint)
  • The fine-tuned 32B model achieves 83.2 pass@1 on CRUXEval output prediction, significantly outperforming base models
  • A 1.8B model trained from scratch achieves 53.6 pass@1 on CRUXEval input prediction, demonstrating strong inverse execution capabilities
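To make the input-prediction metric concrete, the sketch below shows how such a score is commonly computed. It assumes the usual convention for this kind of benchmark, namely that a predicted input counts as correct if running the function on it reproduces the given output, even when it differs from the original input. The function `f` and the helper names are hypothetical and are not taken from the benchmark or the paper.

```python
def f(xs):
    # Toy benchmark function: deduplicate and sort.
    return sorted(set(xs))


def input_prediction_passes(func, predicted_args, expected_output):
    """True if func(*predicted_args) reproduces expected_output without error."""
    try:
        return func(*predicted_args) == expected_output
    except Exception:
        return False


def pass_at_1(results):
    """Percentage of problems solved with a single prediction each."""
    return 100.0 * sum(results) / len(results)


results = [
    input_prediction_passes(f, ([3, 1, 3],), [1, 3]),  # valid: output matches
    input_prediction_passes(f, ([1, 3],), [1, 3]),     # also valid: different input, same output
    input_prediction_passes(f, ([2, 1],), [1, 3]),     # wrong output
    input_prediction_passes(f, (5,), [1, 3]),          # raises TypeError, counted as a failure
]
score = pass_at_1(results)
```

Because many inputs can map to the same output, inverse execution is scored by re-running the function rather than by string match against a reference input.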
Breakthrough Assessment
8/10
Novel formulation of execution prediction as an interactive debugging MDP. Enables capabilities like inverse execution and constant-time jumps that standard neural interpreters lack.