D-F1: Disambiguation F1 score—a metric evaluating how well the generated answer covers all unique disambiguated answers for an ambiguous question
Pseudo-interpretations: Inferred potential meanings of an ambiguous question generated by an LLM to guide diverse retrieval
Iterative RAG: A RAG approach that repeatedly retrieves and generates to refine answers, often at high computational cost
ToC: Tree of Clarifications—a state-of-the-art iterative RAG method that builds a tree of disambiguations
ColBERT: A dense retrieval model that uses late interaction to match query and document tokens
SentenceBERT: A modification of the BERT network that derives semantically meaningful sentence embeddings, which can be compared using cosine similarity
ASQA: Answer Summaries for Questions which are Ambiguous, a dataset of ambiguous factoid questions paired with long-form answers that cover their different interpretations
SituatedQA: A QA dataset in which the correct answer depends on temporal or geographical context, which introduces ambiguity when that context is left unspecified
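
As a rough illustration of the "late interaction" named in the ColBERT entry and the cosine similarity named in the SentenceBERT entry, the sketch below scores a document by taking, for each query token, its best cosine match among the document tokens and summing those maxima. The function names and the toy 2-dimensional embeddings are hypothetical and chosen only for illustration; this is not the ColBERT library's API, and a real model would produce learned, higher-dimensional token vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)


def late_interaction_score(query_vecs, doc_vecs):
    """Sum over query tokens of the best cosine match among document tokens."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings, made up for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for both

score_a = late_interaction_score(query, doc_a)
score_b = late_interaction_score(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher than `doc_b` because every query token finds an exactly aligned document token, whereas a single-vector comparison would collapse each text to one embedding before comparing.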