GRPO: Group Relative Policy Optimization, a reinforcement learning algorithm that samples a group of outputs for the same prompt and scores each one relative to the others in the group, rather than relying on a learned value function (critic)
RLVR: Reinforcement Learning with Verifiable Reward, RL training in which the reward comes from an objective, automatically checkable success criterion (such as whether the final answer is correct)
Self-contained: The property of a generated structure (like a table) containing all necessary information to answer the query without needing to reference the original source text
Information Density: A metric defined in the paper measuring the ratio of relevant semantic information to the total number of tokens in a sequence
Format-aware prompting: A prompting strategy that uses explicit tags (e.g., <format>) to separate reasoning, structuring, and answering phases
Lost in the middle: A failure mode where LLMs struggle to access information located in the middle of a long context window
Structure-R1: The proposed framework that transforms retrieved content into structured representations optimized for reasoning
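
To make the GRPO and RLVR entries concrete, here is a minimal sketch of how a verifiable reward can feed a group-relative advantage. It is not the paper's implementation. The function names, the exact-match reward, and the use of the population standard deviation are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, gold: str) -> float:
    """Hypothetical RLVR-style reward: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of samples drawn for the same prompt.

    Each sample's advantage is its reward minus the group mean, divided by the
    group standard deviation, so no learned critic is needed. Population
    standard deviation is an assumption; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question whose reference answer is "Paris".
group = ["Paris", "paris ", "Lyon", "Marseille"]
rewards = [verifiable_reward(a, "Paris") for a in group]
advantages = group_relative_advantages(rewards)

for answer, reward, adv in zip(group, rewards, advantages):
    print(f"{answer!r}: reward={reward:.1f}, advantage={adv:+.3f}")
```

Correct answers receive positive advantages and incorrect ones negative advantages. When every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal.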