CRS: Conversational Recommender Systems—AI systems that converse with users to elicit preferences and provide recommendations
Safe-GDPO: Safe Group reward–Decoupled Normalization Policy Optimization—an RL training method that normalizes safety and utility rewards separately to ensure stable multi-objective optimization
Latent Traits: Hidden user sensitivities (e.g., phobias, trauma triggers) inferred by the model from conversational cues
DDD: DoesTheDogDie—a crowdsourced database tracking detailed content triggers (e.g., 'Does a dog die?', 'Are there needles?') in media
IPG: IMDb Parents Guide—a structured set of severity ratings for content categories like Violence, Sex/Nudity, and Profanity
ESRB: Entertainment Software Rating Board—the standard age and content rating system for video games in North America
GRPO: Group Relative Policy Optimization—an RL method that scores each sampled response relative to the other responses sampled for the same prompt, which removes the need for a separate critic model
RLHF: Reinforcement Learning from Human Feedback—fine-tuning models using rewards derived from human preferences
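
To make the GRPO and Safe-GDPO entries concrete, the sketch below contrasts group-relative normalization of a single reward with separate ("decoupled") normalization of a safety reward and a utility reward. It is only an illustration based on the one-line definitions above, not the method's actual implementation: the function names, the weights w_safety and w_utility, the weighted-sum combination, and the eps constant are all assumptions.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_advantages(rewards):
    """GRPO-style advantages: each response is scored relative to its group."""
    return group_normalize(rewards)


def decoupled_advantages(safety, utility, w_safety=1.0, w_utility=1.0):
    """Assumed reading of decoupled normalization (hypothetical helper).

    Each reward stream is normalized on its own before the two are combined,
    so a stream with a larger raw scale cannot dominate the other.
    """
    if len(safety) != len(utility):
        raise ValueError("safety and utility must have one entry per response")
    s = group_normalize(safety)
    u = group_normalize(utility)
    return [w_safety * a + w_utility * b for a, b in zip(s, u)]


if __name__ == "__main__":
    safety = [0.0, 1.0, 0.0, 1.0]       # e.g. 1.0 = no trigger violated
    utility = [10.0, 20.0, 30.0, 40.0]  # e.g. a relevance score on a larger scale
    print(grpo_advantages(utility))
    print(decoupled_advantages(safety, utility))
```

Because each stream is standardized separately, multiplying every utility reward by a constant leaves the combined advantages essentially unchanged. Summing the raw rewards first and normalizing afterwards would instead let the larger-scale reward dominate.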