Submodular Optimization: A method for selecting subsets that naturally models diminishing returns, ensuring diversity and coverage (like selecting locations for facilities to cover a city)
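Diminishing returns has a precise meaning here: the marginal gain from adding an element never increases as the selected set grows. A minimal sketch with a toy coverage objective (the `covers` data is invented for illustration):

```python
def coverage(selected, covers):
    """Submodular objective f(S): number of distinct items covered by S."""
    covered = set()
    for name in selected:
        covered |= covers[name]
    return len(covered)

# Toy data: each candidate covers some set of items
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}

# Marginal gain of adding "b" to the empty set vs. to {"a"}:
gain_small = coverage(["b"], covers) - coverage([], covers)          # 2
gain_large = coverage(["a", "b"], covers) - coverage(["a"], covers)  # 1
# gain_small >= gain_large: the same element helps less once "a" is chosen
```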
PMI: Pointwise Mutual Information—a measure of how much more (or less) often two events co-occur than would be expected if they were independent
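In symbols, PMI(x, y) = log p(x, y) / (p(x)·p(y)): zero when the events are independent, positive when they co-occur more than chance. A quick sketch:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits."""
    return math.log2(p_xy / (p_x * p_y))

# Independent events carry no information about each other:
pmi(0.25, 0.5, 0.5)  # 0.0
# Events that always co-occur have positive PMI:
pmi(0.5, 0.5, 0.5)   # 1.0 (one bit)
```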
In-Context Learning (ICL): The ability of LLMs to perform tasks by seeing examples within the prompt context without weight updates
Facility Location: A specific submodular function that maximizes the sum of similarities between each data point and its most similar representative in the selected subset
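As a formula, f(S) = Σᵢ max over j in S of sim(i, j). A minimal sketch, assuming a precomputed similarity matrix (the values below are invented):

```python
def facility_location(selected, sim):
    """f(S): each point contributes its similarity to its closest representative in S."""
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))

# Toy similarity matrix: sim[i][j] = similarity of data point i to candidate j
sim = [
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
facility_location([0, 2], sim)  # 1.0 + 0.3 + 1.0 = 2.3
```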
KL divergence: A measure of how one probability distribution differs from a second, reference probability distribution; it is asymmetric, so it is not a true distance metric
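For discrete distributions, D_KL(P ‖ Q) = Σᵢ pᵢ log(pᵢ / qᵢ). A small sketch:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats, for discrete distributions (assumes q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
kl_divergence(p, p)  # 0.0: identical distributions
# Note the asymmetry: D_KL(P||Q) != D_KL(Q||P) in general
kl_divergence(p, q), kl_divergence(q, p)
```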
Teacher Forcing: A training method where the model is fed the ground-truth previous tokens as input for the next prediction, rather than its own generated guesses
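A toy sketch of the idea, with a hypothetical stand-in `model` that returns a next-token distribution; the loop is what matters: every step conditions on the ground-truth prefix, never on the model's own samples:

```python
import math

def uniform_model(prefix, vocab_size=4):
    # Hypothetical stand-in for a real network: uniform next-token distribution.
    return [1.0 / vocab_size] * vocab_size

def teacher_forced_loss(model, tokens):
    """Average cross-entropy, conditioning each step on the true prefix."""
    loss = 0.0
    for t in range(len(tokens) - 1):
        probs = model(tokens[: t + 1])   # ground-truth prefix, not generated tokens
        loss += -math.log(probs[tokens[t + 1]])
    return loss / (len(tokens) - 1)

teacher_forced_loss(uniform_model, [0, 1, 2])  # log(4), the uniform baseline
```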
Greedy Heuristic: An algorithm that makes the locally optimal choice at each stage (picking the single best next item) to approximate a global optimum
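For submodular objectives, this greedy loop carries the classic (1 − 1/e) approximation guarantee. A minimal sketch against a toy coverage objective (data invented for illustration):

```python
def greedy_select(candidates, k, f):
    """Pick k items, each round adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: f(selected + [c]) - f(selected))
        selected.append(best)
    return selected

covers = {"a": {1, 2, 3}, "b": {3, 4, 5, 6}, "c": {1, 2}}

def coverage(s):
    return len(set().union(*(covers[x] for x in s))) if s else 0

greedy_select(list(covers), 2, coverage)  # ["b", "a"]: all six items covered
```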