RAG: Retrieval-Augmented Generation; enhancing LLMs by retrieving external documents to help answer queries
Self-RAG: A baseline method that trains models to retrieve, generate, and critique their own outputs using special reflection tokens
PPL: Perplexity, a measure of how well a probability model predicts a sample; lower PPL indicates the model is less 'surprised' by the text
Control tokens: Special tokens added to the vocabulary (e.g., SPECIAL_rewrite) to trigger specific model behaviors like rewriting or decomposing a query
Tree decoding: An inference strategy where the model explores multiple possible action sequences (branches) before selecting the best final output
DuckDuckGo: An internet search engine used here as the retrieval source
Single-hop QA: Questions that can be answered with a single piece of evidence
Multi-hop QA: Questions requiring reasoning across multiple documents or steps to derive an answer
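
As a rough illustration of the PPL entry above, perplexity is commonly computed as the exponential of the average negative log-probability the model assigns to each token. The sketch below assumes natural-log token probabilities are already available. The function name `perplexity` and the probability values are hypothetical, chosen only to show that confidently predicted text scores lower.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities for two four-token texts.
confident = [math.log(p) for p in (0.9, 0.8, 0.9, 0.7)]
surprised = [math.log(p) for p in (0.2, 0.1, 0.3, 0.05)]

ppl_confident = perplexity(confident)
ppl_surprised = perplexity(surprised)

# The text the model predicts well gets the lower perplexity.
print(ppl_confident < ppl_surprised)
```

A model that assigns probability 0.5 to every token has a perplexity close to 2, which matches the intuition of choosing between two equally likely options at each step.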