Query Expansion (QE): The process of adding related terms to a user's query to improve the chances of matching relevant documents
PLM: Pre-trained Language Model, a large neural network trained on vast text data, used here to generate expansion terms
Sparse Retrieval: Retrieval methods like BM25 that match documents based on exact word overlap, as opposed to dense vector similarity
Pseudo-relevant: Describes documents retrieved in a first pass that are assumed, rather than known, to be relevant and are used for feedback or optimization; here a model classifies which documents qualify
Cross-encoder: A model that processes the query and document together to output a relevance score, typically more accurate but slower than bi-encoders
Hit@k: The percentage of queries for which at least one correct answer appears in the top-k retrieved documents
EM (Exact Match): A metric measuring whether the predicted answer string exactly matches the ground truth
ODQA: Open-Domain Question Answering, the task of answering questions using a large collection of documents (like Wikipedia) without a pre-specified context
SPLADE: Sparse Lexical and Expansion Model, a neural retrieval method that learns sparse representations for queries and documents
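
The two metrics defined above, Hit@k and EM, can be sketched in a few lines of Python. This is a minimal illustration and not the evaluation code of any particular paper: the function names, the boolean-flag input format, and the sample data are all assumptions made for the example. EM is shown as strict string equality; many evaluation scripts normalize case and punctuation first, which is not done here.

```python
from typing import Sequence


def hit_at_k(contains_answer: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of queries with at least one answer-bearing document in the top k.

    contains_answer[i][j] is True when the j-th ranked document for query i
    contains a correct answer (hypothetical input format for this sketch).
    """
    if not contains_answer:
        return 0.0
    hits = sum(any(flags[:k]) for flags in contains_answer)
    return 100.0 * hits / len(contains_answer)


def exact_match(prediction: str, ground_truth: str) -> bool:
    """Strict EM: True only when the two strings are identical."""
    return prediction == ground_truth


# Made-up ranking outcomes for three queries, best-ranked document first.
ranked_flags = [
    [False, True, False],   # query 1: first answer-bearing document at rank 2
    [False, False, True],   # query 2: first answer-bearing document at rank 3
    [False, False, False],  # query 3: no answer-bearing document retrieved
]

for k in (1, 2, 3):
    print(f"Hit@{k} = {hit_at_k(ranked_flags, k):.2f}%")

print(exact_match("Paris", "Paris"))  # identical strings
print(exact_match("paris", "Paris"))  # differs in case
```

Because Hit@k only asks whether any of the top-k documents contains an answer, it can never decrease as k grows, while strict EM is sensitive to even a one-character difference between prediction and ground truth.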