FreshQA: A novel dynamic QA benchmark with 600 questions divided into four categories: never-changing, slow-changing, and fast-changing (grouped by how often the answer changes), plus false-premise (questions built on an incorrect assumption)
FreshPrompt: A few-shot prompting method that injects search engine results (snippets, answer boxes, related questions) into the LLM context to improve factuality
STRICT evaluation: An evaluation mode where a response is only credited if the main answer is correct AND it contains zero hallucinations or outdated information
RELAXED evaluation: An evaluation mode where a response is credited if the primary answer is correct, even if it contains minor hallucinations or outdated details
hallucination: Plausible but factually incorrect information generated by an LLM
CoT: Chain-of-Thought, a prompting technique in which the model is encouraged to generate intermediate reasoning steps before giving the final answer
Self-Ask: A prompting method that teaches an LLM to decompose questions into sub-questions and answer them using search results
knowledge cutoff: The date up to which an LLM's training data extends; the model generally lacks knowledge of events after this date
organic results: The standard list of web page links and snippets returned by a search engine, excluding special features like ads or answer boxes
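
The difference between the STRICT and RELAXED evaluation modes defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's actual scoring code: the `Judgment` record, its field names, and the `is_credited` and `accuracy` helpers are all hypothetical, and the per-response labels are assumed to come from an evaluator who has already judged each response.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Judgment:
    """Hypothetical per-response annotation; field names are illustrative only."""
    main_answer_correct: bool
    has_hallucination: bool = False
    has_outdated_info: bool = False


def is_credited(judgment: Judgment, mode: str) -> bool:
    """Return True if the response earns credit under the given mode."""
    if mode == "RELAXED":
        # Only the primary answer matters; minor errors elsewhere are tolerated.
        return judgment.main_answer_correct
    if mode == "STRICT":
        # The primary answer must be correct AND the response must be free of
        # hallucinated or outdated information.
        return judgment.main_answer_correct and not (
            judgment.has_hallucination or judgment.has_outdated_info
        )
    raise ValueError(f"unknown evaluation mode: {mode!r}")


def accuracy(judgments: Sequence[Judgment], mode: str) -> float:
    """Fraction of responses credited under the given mode (0.0 if empty)."""
    if not judgments:
        return 0.0
    return sum(is_credited(j, mode) for j in judgments) / len(judgments)


# Illustrative data: one clean correct answer, one correct answer that also
# contains a hallucination, and one wrong answer.
sample = [
    Judgment(main_answer_correct=True),
    Judgment(main_answer_correct=True, has_hallucination=True),
    Judgment(main_answer_correct=False),
]
relaxed_score = accuracy(sample, "RELAXED")
strict_score = accuracy(sample, "STRICT")
```

On this sample, two of three responses are credited under RELAXED and only one under STRICT. Under these definitions any response credited under STRICT is also credited under RELAXED, so STRICT accuracy can never exceed RELAXED accuracy.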