RAG: Retrieval-Augmented Generation, an approach in which an AI system answers questions by first retrieving relevant documents and then generating a response conditioned on them
Counterfactual Noise: Retrieved documents containing false information or factual errors relative to the ground truth
Illegal Sentence Noise: Context containing grammatically broken or meaningless word combinations (e.g., 'history transform cover managed')
Prior Noise: Questions based on false assumptions (e.g., asking about an event that never happened)
Datatype Noise: Context mixing text with other data formats like URLs or code snippets
Orthographic Noise: Text containing spelling mistakes or typos
Supportive Noise: Documents that are semantically relevant to the query but do not contain the answer
Semantic Noise: Documents that are off-topic or have low semantic relevance to the query
NLI: Natural Language Inference, the task of determining whether a premise entails, contradicts, or is neutral toward a hypothesis
Golden Context: The correct, factually accurate retrieved document containing the answer
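
To make a few of the definitions above concrete, the sketch below shows one way Orthographic Noise and Illegal Sentence Noise could be synthesized and mixed with a Golden Context. It is only an illustration under stated assumptions: the function names (`orthographic_noise`, `illegal_sentence_noise`, `build_context`), the adjacent-letter swap used to imitate typos, and the `rate` and `seed` parameters are all hypothetical and are not taken from the source.

```python
import random
from typing import List, Sequence


def orthographic_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Imitate typos by swapping adjacent letters with probability `rate`.

    Assumption: adjacent-letter swaps stand in for spelling mistakes.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def illegal_sentence_noise(vocabulary: Sequence[str], length: int = 4, seed: int = 0) -> str:
    """Join randomly sampled words into a meaningless word combination."""
    rng = random.Random(seed)
    return " ".join(rng.choice(vocabulary) for _ in range(length))


def build_context(golden: str, noise_docs: Sequence[str], seed: int = 0) -> List[str]:
    """Shuffle the golden context in among the noise documents."""
    rng = random.Random(seed)
    docs = [golden] + list(noise_docs)
    rng.shuffle(docs)
    return docs


if __name__ == "__main__":
    golden = "The Eiffel Tower was completed in 1889."
    noisy = [
        orthographic_noise("The tower is located in Paris.", rate=0.3),
        illegal_sentence_noise(["history", "transform", "cover", "managed"]),
    ]
    for doc in build_context(golden, noisy):
        print(doc)
```

Fixing the seed keeps the generated noise reproducible, so that differences in model output can be attributed to the noise type and not to sampling variation.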