User Agent: An LLM-based agent ($\mathcal{U}$) that simulates a human user with specific demographics, preferences, and goals
Judge Agent: An LLM-based agent ($\mathcal{J}$) that evaluates the assistant's performance based on the dialogue history and the user profile
Situational Context: Dynamic, task-specific factors (e.g., location, time, device) that influence user needs during a specific interaction
LLM-as-a-Judge: The practice of using a large language model to score or evaluate the outputs of another model
TOD: Task-Oriented Dialogue, i.e., conversational systems designed to help users accomplish specific goals such as booking tickets or scheduling appointments
PRISM Alignment: A dataset of diverse real-world user profiles, used in PersonaLens to ground the generation of user demographics
Lexical Diversity: A measure of the variety of vocabulary in a text, used here to validate the richness of the generated dialogues
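The glossary does not state which lexical-diversity metric is used. As an illustration only, the sketch below computes two common ones: type-token ratio (unique words divided by total words) and distinct-n (unique n-grams divided by total n-grams across a set of texts). The function names and the simple regex tokenizer are assumptions made for this example, not the method used in PersonaLens.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical lowercase word tokenizer; a real evaluation may use a different one.
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text: str) -> float:
    # Fraction of tokens in a single text that are unique (word types / word tokens).
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def distinct_n(texts: List[str], n: int = 2) -> float:
    # Fraction of n-grams pooled over all texts that are unique (distinct-n).
    ngrams = []
    for text in texts:
        toks = tokenize(text)
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

For example, `type_token_ratio("the cat saw the dog")` gives 4 unique words out of 5, i.e. 0.8, and `distinct_n(["book a table", "book a flight"], n=2)` gives 3 unique bigrams out of 4, i.e. 0.75. Higher values indicate less repetitive vocabulary; note that type-token ratio tends to fall as texts get longer, which is one reason corpus-level measures such as distinct-n are often reported alongside it.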