Mid-training (MT): An intermediate training stage between pre-training and fine-tuning, using domain-specific data at scale to shift model capabilities
Supervised Fine-Tuning (SFT): Training on high-quality demonstrations to teach specific behaviors
Pull Request (PR): A proposal to merge code changes in version control systems like GitHub, containing commits, descriptions, and code diffs
SWE-Bench Verified: A benchmark for evaluating software engineering agents on real-world GitHub issues, consisting of a human-validated subset of SWE-Bench tasks
Contextually-native trajectories: Training data reconstructed from PRs that preserves the logical flow: Issue → Context Retrieval → Edits, simulating an agent's information state
Environmentally-native trajectories: Training data recorded from live agent interactions with a compiler/interpreter, capturing real tool outputs and execution feedback (e.g., test failures)
Pass@1: The percentage of problems solved correctly on the first attempt
Scaffold: The software framework wrapping the LLM that handles tool execution, memory management, and environment interaction (e.g., SWE-agent)
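
As a small illustration of the Pass@1 entry above, the metric can be computed from a list of per-problem outcomes. This is a minimal sketch, assuming exactly one attempt is recorded per problem; the function name `pass_at_1` and the boolean-list input format are hypothetical and not taken from any benchmark's tooling.

```python
from typing import Sequence


def pass_at_1(first_attempt_solved: Sequence[bool]) -> float:
    """Return the percentage of problems solved on the first attempt.

    Assumption: each entry is True if the single recorded attempt for
    that problem was correct, and False otherwise.
    """
    if not first_attempt_solved:
        raise ValueError("need at least one problem to compute Pass@1")
    solved = sum(1 for ok in first_attempt_solved if ok)
    return 100.0 * solved / len(first_attempt_solved)


# Hypothetical outcomes for four problems: three solved, one failed.
example_outcomes = [True, False, True, True]
example_score = pass_at_1(example_outcomes)
print(f"Pass@1: {example_score:.1f}%")
```

If several samples are drawn per problem, a different estimator is needed; the sketch covers only the single-attempt case described in the definition.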