REST-PG: Reasoning-Enhanced Self-Training for Personalized Text Generation—the proposed framework
SFT: Supervised Fine-Tuning—training a model on labeled examples
LLM: Large Language Model—a deep learning model trained on vast amounts of text
EM: Expectation-Maximization—an iterative method to find maximum likelihood estimates, used here to alternate between generating data (E-step) and training on it (M-step)
LongLaMP: Long-form Language Model Personalization benchmark—a dataset for evaluating personalized text generation
RL: Reinforcement Learning—training models to make sequences of decisions to maximize a reward
Reasoning Path: Intermediate text in which the model explicitly analyzes the user's preferences and writing style before generating the final answer
ROUGE: Recall-Oriented Understudy for Gisting Evaluation—a set of metrics for evaluating automatic summarization and translation
Gemma: A family of open-weight LLMs developed by Google DeepMind
Seq2Seq Loss: Sequence-to-Sequence Loss—typically cross-entropy loss used to train models to map input sequences to output sequences