RCT: Randomized Controlled Trial—a scientific study design that randomly assigns participants to an experimental group or a control group to measure the effectiveness of an intervention
PICO: Population, Intervention, Comparator, Outcome—the four key components used to structure clinical evidence questions and define trial characteristics
PICO-R: PICO elements plus Results/Evidence Inference—the combination of trial characteristics and the findings concerning them
plain language summarization: The task of generating summaries that simplify technical content for lay readers, often involving elaboration and explanation of difficult concepts
Evidence Inference: The task of determining whether an intervention yielded a significant difference compared to a control group with respect to a specific outcome
hallucination: In this context, non-factual information added by the model that is not supported by the source text or general medical knowledge
Kendall's τb: A rank correlation coefficient that measures the ordinal association between two measured quantities (e.g., human ratings vs. metric scores) and adjusts for tied ranks
Spearman's ρ: A nonparametric measure of rank correlation assessing how well the relationship between two variables can be described using a monotonic function
ROUGE-L: Recall-Oriented Understudy for Gisting Evaluation (Longest Common Subsequence), a metric that measures text overlap using the longest common subsequence of words, which must appear in the same order in both texts but need not be contiguous
Flesch-Kincaid Grade Level: A readability metric that estimates the US school grade level required to understand a text, computed from average sentence length and average syllables per word
BERTScore: An automatic evaluation metric that computes similarity between candidate and reference text using contextual embeddings
zero-shot prompting: Asking a model to perform a task without providing any examples of that task in the prompt
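
As an illustration of two of the metrics defined above, the following is a minimal sketch of ROUGE-L and the Flesch-Kincaid Grade Level. It rests on assumptions not taken from the source: tokens are lowercased and split on whitespace, ROUGE-L is reported as a balanced F-measure (the original ROUGE-L formulation uses a weighted F-measure), and the word, sentence, and syllable counts for the readability formula are supplied by the caller. The function names are hypothetical and do not correspond to any particular library.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the subsequence ending just before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a token from one side or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, F1) based on the LCS of whitespace tokens.

    Assumption: simple lowercasing and whitespace tokenization; published
    implementations may tokenize, stem, and weight the F-measure differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level from pre-computed counts.

    Assumption: the caller supplies the counts; syllable counting itself
    is heuristic and is left out of this sketch.
    """
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


if __name__ == "__main__":
    p, r, f = rouge_l(
        "the drug reduced pain",
        "the drug significantly reduced pain",
    )
    print(f"ROUGE-L precision={p:.2f} recall={r:.2f} F1={f:.2f}")
    print(f"FKGL={flesch_kincaid_grade(100, 5, 150):.2f}")
```

In the usage example, the candidate matches the reference in order except for one skipped word, so the subsequence still covers every candidate token even though the match is not contiguous.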