NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for the position of relevant items
BM25: Best Matching 25—a probabilistic retrieval function used to rank documents based on query terms appearing in each document
BGE: BAAI General Embedding, a family of dense embedding models from BAAI that map text to vectors for retrieval
GRPO: Group Relative Policy Optimization—an RL algorithm that normalizes rewards within a group of outputs generated from the same input to reduce variance
REINFORCE++: A variant of the REINFORCE algorithm that uses batch-wide normalization instead of group-wise normalization
rollout: A single execution path of the model (generating an augmentation) used to estimate rewards
sparse retrieval: Retrieval based on matching specific keywords (tokens) between query and document
dense retrieval: Retrieval based on semantic similarity between vector representations of query and document
RLHF: Reinforcement Learning from Human Feedback—fine-tuning models using rewards derived from human preferences
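
Worked sketch for NDCG, GRPO, and REINFORCE++: the definitions above can be made concrete with a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any particular system. It assumes the linear-gain form of DCG (some works use 2^rel - 1 instead), and it reduces REINFORCE++ to its batch-wide normalization step only, leaving out other parts of the algorithm. All function names (dcg, ndcg, normalize, group_normalize, batch_normalize) are hypothetical and chosen for illustration.

```python
import math
from statistics import mean, pstdev


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def normalize(rewards, eps=1e-8):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def group_normalize(groups):
    """GRPO-style: each group (rollouts from one input) is normalized alone."""
    return [normalize(g) for g in groups]


def batch_normalize(groups):
    """REINFORCE++-style (simplified): one mean and std for the whole batch."""
    flat = [r for g in groups for r in g]
    normed = iter(normalize(flat))
    # Restore the original group structure after normalizing the flat batch.
    return [[next(normed) for _ in g] for g in groups]


if __name__ == "__main__":
    # Relevance labels in ranked order; the relevant item sits at position 2.
    print(ndcg([0, 1], k=2))

    # Two inputs, two rollouts each; rewards for the second input run higher.
    rewards = [[1.0, 3.0], [10.0, 14.0]]
    print(group_normalize(rewards))
    print(batch_normalize(rewards))
```

With group-wise normalization, each input's rollouts are compared only against each other, so both groups end up with one positive and one negative value. With batch-wide normalization, every rollout for the low-reward input falls below the batch mean, so both of its values are negative.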