STITCH: Solve with RL, Then Imitate To Close Holes—a training loop alternating between RL, hinted RL, and imitation learning to train the planner
Planner: A module that predicts the optimal minimum and maximum chunk granularity (abstraction level) for a specific query
Compressor: A neural module that aggregates embeddings of fine-grained chunks into a single coarse-grained embedding without generating intermediate text summaries
GRPO: Group Relative Policy Optimization—an RL algorithm used here to update the planner policy
RAG: Retrieval-Augmented Generation—providing LLMs with external evidence to improve factual accuracy
SFT: Supervised Fine-Tuning—training a model on labeled examples
RL: Reinforcement Learning—training agents to take actions that maximize a reward signal
Pseudo-labels: Automatically generated supervision signals used when human-annotated ground truth is unavailable
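As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes each sampled output's reward against the mean and standard deviation of its own group. It assumes a group is a set of candidate planner outputs sampled for the same query. The function name, the `eps` constant, and the reward values are hypothetical, chosen for illustration only. It does not reproduce the planner's actual reward or the full policy update.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the other samples in its group.

    rewards: scalar rewards for outputs sampled for the same query.
    eps: small constant (an assumption) that avoids division by zero
         when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four plans sampled for one query.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean get a positive advantage and those below get a negative one, so no separately trained value model is needed to provide a baseline.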