Soft Prompts: Learnable continuous vectors optimized during training to guide the LLM, as opposed to discrete, human-readable text templates
Vector Quantization (VQ): A technique to map continuous inputs to a discrete set of codebook vectors, used here to select personalized prompts
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning method that trains only small low-rank matrices added alongside the model's weights while keeping the main model frozen
InfoNCE: A contrastive loss function used to pull positive user-item pairs closer and push negative pairs apart in the embedding space
Cross-attention: An attention mechanism where the Query comes from one source (e.g., prompt) and Key/Value from another (e.g., LLM output), used to extract specific features
Gradient Masking: A technique to selectively block gradient updates, used here so that the LLM is updated only when its output is contradictory or harmful
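
To make two of the terms above concrete, here is a minimal sketch of a vector-quantization lookup and an InfoNCE loss for a single anchor. It is illustrative only and not the implementation described in this document: the function names `quantize` and `info_nce`, the use of squared L2 distance and cosine similarity, the temperature of 0.1, and the toy vectors are all assumptions chosen for the example.

```python
import math


def quantize(vector, codebook):
    """Return (index, code) for the codebook entry nearest to `vector`.

    Nearest is measured by squared L2 distance (an assumption for this sketch).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    # The positive's logit goes first, followed by one logit per negative.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]

    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy data (hypothetical): a 3-entry codebook and 2-d user/item embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
index, code = quantize([0.9, 0.8], codebook)

user = [1.0, 0.0]
clicked_item = [0.9, 0.1]
other_items = [[0.0, 1.0], [-1.0, 0.0]]

# Loss when the true positive is the item close to the user embedding...
loss_aligned = info_nce(user, clicked_item, other_items)
# ...versus treating a dissimilar item as the positive.
loss_misaligned = info_nce(user, other_items[0], [clicked_item, other_items[1]])
```

In this toy setup the continuous input snaps to the nearest codebook entry, and the InfoNCE loss is lower when the designated positive is the item most similar to the user embedding, which is the "pull positives closer, push negatives apart" behavior the definition describes.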