RAG: Retrieval-Augmented Generation, an approach in which a system first retrieves relevant documents and then generates its answer conditioned on them
BM25: A probabilistic ranking function that scores documents by term frequency and inverse document frequency, normalized for document length; used here for first-stage retrieval
MusWikiDB: The authors' proposed vector database containing 3.2M music-specific Wikipedia passages
ArtistMus: The authors' proposed benchmark dataset containing 1,000 questions about 500 globally diverse music artists
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices
pp: Percentage points—the arithmetic difference between two percentages
Contextual reasoning: Questions requiring synthesis or inference across multiple pieces of information within a passage, rather than simple fact lookup
Reranker: A second-stage model that re-scores retrieved documents to improve the quality of the context provided to the generator
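
To make the BM25 entry above concrete, here is a minimal sketch of the common Okapi BM25 scoring formula. It is illustrative only: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions, not details of the authors' retrieval setup.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the source.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy corpus (made up for this sketch).
docs = [
    "the beatles were an english rock band".split(),
    "miles davis was an american jazz trumpeter".split(),
    "the band released a jazz album".split(),
]
scores = bm25_scores("jazz trumpeter".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

In this toy corpus the second document matches both query terms, so it receives the highest score; a document sharing no terms with the query scores zero. In a two-stage pipeline, the top-scoring passages from a step like this would then be passed to the reranker.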