IAD: Iterative Agent Decoding, a sequential inference framework that uses feedback to iteratively refine agent outputs.
BoN: Best-of-N, a sampling strategy in which N candidate solutions are generated independently and the best one is selected by a verifier.
Inference-time alignment: Techniques to improve model outputs during generation (test time) rather than during training, often using extra compute.
Scalar feedback: Numerical scores or binary pass/fail signals provided by a verifier.
Textual feedback: Natural language critiques or instructions describing specific errors or improvements.
Sketch2Code: A benchmark task converting wireframe sketches into functional HTML code.
Text2SQL: A task mapping natural language questions to executable SQL queries.
Verifier: A function or model that evaluates the quality of a generated response, used to guide selection or feedback.
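The contrast between BoN and iterative decoding, and between scalar and textual feedback, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Feedback` record, `best_of_n`, `iterative_decoding`, and the toy number-guessing task are illustrative stand-ins, not the IAD authors' implementation. In practice the generator would be a model call and the verifier might, for example, execute a SQL query or score rendered HTML. Keeping the best-scoring response seen so far is an assumption of this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    score: float             # scalar feedback: a numeric score (1.0 means pass here)
    critique: Optional[str]  # textual feedback: a natural-language hint, or None


# Hypothetical signatures: a generator maps (prompt, previous feedback) to a
# response, and a verifier maps a response to Feedback.
GenerateFn = Callable[[str, Optional[Feedback]], str]
VerifyFn = Callable[[str], Feedback]


def best_of_n(prompt: str, generate: GenerateFn, verify: VerifyFn, n: int) -> str:
    """BoN: sample n candidates independently, return the highest-scoring one."""
    candidates: List[str] = [generate(prompt, None) for _ in range(n)]
    return max(candidates, key=lambda c: verify(c).score)


def iterative_decoding(prompt: str, generate: GenerateFn, verify: VerifyFn,
                       max_steps: int, target: float = 1.0) -> Optional[str]:
    """Sequential refinement: each new response is conditioned on the last feedback."""
    feedback: Optional[Feedback] = None
    best, best_score = None, float("-inf")
    for _ in range(max_steps):
        response = generate(prompt, feedback)
        feedback = verify(response)
        # Assumption: keep the best-scoring response seen so far.
        if feedback.score > best_score:
            best, best_score = response, feedback.score
        if feedback.score >= target:  # stop early once the verifier is satisfied
            break
    return best


# --- Hypothetical toy task: guess a hidden integer in [0, 100]. ---
TARGET = 42


def toy_verify(response: str) -> Feedback:
    guess = int(response)
    if guess == TARGET:
        return Feedback(score=1.0, critique=None)
    hint = "too low" if guess < TARGET else "too high"
    return Feedback(score=1.0 - abs(guess - TARGET) / 100, critique=hint)


def make_toy_generator(seed: int) -> GenerateFn:
    """Random guesses without feedback; binary search when a textual hint is given."""
    rng = random.Random(seed)
    lo, hi, last = 0, 100, 0

    def generate(prompt: str, feedback: Optional[Feedback]) -> str:
        nonlocal lo, hi, last
        if feedback is None:
            last = rng.randint(0, 100)
        else:
            # Narrow the search interval using the verifier's textual hint.
            if feedback.critique == "too low":
                lo = last + 1
            elif feedback.critique == "too high":
                hi = last - 1
            last = (lo + hi) // 2
        return str(last)

    return generate


if __name__ == "__main__":
    budget = 8  # same number of generator/verifier calls for both strategies
    print("BoN:", best_of_n("guess", make_toy_generator(0), toy_verify, n=budget))
    print("IAD:", iterative_decoding("guess", make_toy_generator(0), toy_verify,
                                     max_steps=budget))
```

With the same budget, BoN only exploits the scalar score to pick among independent samples, whereas the iterative loop feeds the textual hint back into the next generation. In this toy, the iterative strategy always reaches 42 within eight steps, because after one random guess a binary search over at most 100 remaining values needs no more than seven further guesses. A fresh generator is created for each run because the toy generator keeps its search interval in closure state.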