over-personalization: When an agent uses personal information inappropriately, resulting in irrelevant, sycophantic, or repetitive responses.
sycophancy: Excessive deference to the user, where the model prioritizes agreeing with the user's beliefs or memories over factual accuracy.
memory hijacking: A phenomenon where retrieved memories receive disproportionately high attention from the model, overshadowing the actual user query and reasoning.
OP-Bench: A benchmark of 1,700 instances designed to diagnose three types of over-personalization: Irrelevance, Sycophancy, and Repetition.
Self-ReCheck: A proposed lightweight module that filters retrieved memories based on their relevance to the current query to prevent over-personalization.
LoCoMo: A long-context, multi-session dialogue dataset used as the source for generating user profiles in this paper.
baiting prompts: Queries designed to look superficially related to a user's profile to trick the model into unnecessary personalization.
repetition score: A metric measuring the cosine similarity between embeddings of responses to distinct queries; higher scores indicate better diversity (less repetition).