semi-formal reasoning: A prompting methodology requiring agents to construct explicit premises, trace execution paths, and derive formal conclusions in a structured template
patch equivalence: Determining whether two different code patches produce identical pass/fail outcomes on a test suite
fault localization: The task of identifying the specific lines of code responsible for a software bug given a failing test case
RL: Reinforcement learning, a training method in which agents learn from the rewards they receive for their actions
Chain-of-Thought: A prompting technique where the model is encouraged to generate intermediate reasoning steps before the final answer
hunk: A contiguous block of changes (insertions/deletions) in a diff or patch file
interprocedural reasoning: Analyzing code behavior by tracing control flow across function and file boundaries
Defects4J: A database of reproducible real-world bugs from open-source Java projects, used to benchmark software testing and repair techniques
SWE-bench: A benchmark for evaluating LLMs on real-world software engineering issues collected from GitHub