RAGProbe: The proposed framework for automating RAG evaluation by generating scenario-based QA pairs.
Evaluation Scenario: A structured definition including sampling strategies, prompts, and metrics designed to test a specific RAG capability (e.g., reasoning across documents).
RAGAS: A state-of-the-art framework for RAG evaluation that provides metrics and data generation, used as a baseline comparison.
Chunking: The process of breaking down large documents into smaller text segments for indexing and retrieval.
Vector Database: A storage system for high-dimensional vectors (embeddings) used to perform semantic search.
CI/CD: Continuous Integration/Continuous Deployment, a set of software engineering practices for automating the delivery of applications.
One-shot prompting: Providing an LLM with a single example of the desired input-output format within the prompt.
S4 (Scenario 4): A question that combines several sub-questions whose answers are all found in a single document.
S5 (Scenario 5): A question that combines several sub-questions whose answers span multiple documents.
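
To make two of the terms above concrete, here is a minimal Python sketch of chunking and one-shot prompting. It is an illustration only, not RAGProbe's or RAGAS's implementation: the function names, the character-based window, the overlap parameter, and the prompt wording are all assumptions chosen for brevity.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: real pipelines often chunk by tokens or sentences.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def one_shot_prompt(example_context: str, example_question: str,
                    example_answer: str, context: str) -> str:
    """Build a prompt containing exactly one worked example (one-shot).

    The wording is an assumed template, not taken from any framework.
    """
    return (
        "Generate a question and its answer from the given context.\n\n"
        f"Context: {example_context}\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}\n\n"
        f"Context: {context}\n"
        "Question:"
    )


if __name__ == "__main__":
    pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
    print(pieces)
    print(one_shot_prompt("The bridge opened in 1932.",
                          "When did the bridge open?",
                          "1932",
                          pieces[0]))
```

The overlap keeps text that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some characters twice.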