LLM: Large Language Model—an advanced AI model capable of understanding and generating human-like text
NLG: Natural Language Generation—the subfield of AI focused on generating text
BLEU: Bilingual Evaluation Understudy—a metric for evaluating text quality by counting matching n-grams between candidate and reference text
ROUGE: Recall-Oriented Understudy for Gisting Evaluation—a set of metrics used to evaluate automatic summarization and translation
Meta-evaluation: The process of evaluating the quality of an evaluation method itself, typically by measuring its correlation with human judgments
BiasedMF: Biased Matrix Factorization—a collaborative filtering algorithm used to generate the movie recommendations in the dataset
Zero-shot: Prompting the model to perform a task without providing any examples
One-shot: Prompting the model with a single example of the task to guide its output
Likert scale: A psychometric rating scale commonly used in questionnaires (e.g., 1 to 5) to measure agreement or satisfaction
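
To make the BLEU and Meta-evaluation entries above concrete, here is a minimal Python sketch. It is not the full BLEU metric (it omits the brevity penalty and the geometric mean over n-gram orders); it computes only the clipped n-gram precision that BLEU builds on, then a Pearson correlation between metric scores and human ratings. The function names and the toy sentences and ratings are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" used by BLEU).
    Tokenization is naive whitespace splitting (an assumption).
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant scores
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference = "a heartfelt drama about family and loss"
    candidates = [
        "a heartfelt drama about family and loss",
        "a drama about family",
        "an action film with car chases",
    ]
    human_likert = [5, 4, 1]  # hypothetical 1-to-5 ratings

    metric_scores = [clipped_precision(c, reference) for c in candidates]
    # Meta-evaluation: how well does the metric track the human ratings?
    print(pearson(metric_scores, human_likert))
```

Rank correlations such as Spearman's or Kendall's are also common choices for meta-evaluation, since Likert ratings are ordinal; Pearson is used here only to keep the sketch dependency-free.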