HMAS: Hierarchical Multi-Agent System—a structured collaboration of Planner, Experts, and Arbiter agents to decompose and solve recommendation tasks
Atomized Entity Compression: A technique to map multi-token entity descriptions into a single vector representation (atomic unit) to reduce LLM input context length
CRS: Constrained Reward Shaping—a reinforcement learning strategy where secondary objectives act as hard constraints that must be satisfied before the primary reward is optimized
Agent-as-a-Judge: An evaluation framework where an LLM agent decomposes quality assessment into multi-step reasoning rather than predicting a single score directly
MFU: Model FLOPs Utilization—the ratio of the floating-point throughput a model actually achieves to the hardware's theoretical peak throughput, used here to gauge efficiency during model inference
XQA kernel: A highly optimized GPU kernel for attention computation, supporting FP8 precision on H20 GPUs
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that normalizes advantages within a group of sampled outputs to stabilize training
IPV: Item Page Views—a metric counting how many times users view specific item detail pages
CTR: Click-Through Rate—the ratio of users who click a specific link to the total number of users who view the page
NER: Novelty Exposure Rate—a metric measuring the system's ability to expose new or less popular items to users
SFT: Supervised Fine-Tuning—training a model on labeled examples to establish baseline capabilities before applying reinforcement learning
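
To make the GRPO entry above concrete, here is a minimal sketch of group-relative advantage normalization: each sampled output's reward is centered on the group mean and scaled by the group standard deviation. The function name `group_relative_advantages`, the `eps` stabilizer, the use of the population standard deviation, and the sample rewards are all illustrative assumptions, not details taken from the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of outputs sampled for the same prompt.

    Assumptions (not from the source): population standard deviation is used,
    and eps is a small constant that avoids division by zero when every
    reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Outputs that beat the group average get a positive advantage,
    # outputs below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled from a single prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Because the advantages are computed relative to the group itself, no separate learned value function is needed to supply a baseline; a group whose rewards are all equal yields zero advantage for every output.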