SFT: Supervised Fine-Tuning—training the model on a labeled dataset (matched CV-JD pairs) to learn the basic task format
RLRF: Reinforcement Learning from Recruiter Feedback—aligning the model with market needs using a reward model trained on recruiter acceptance/rejection data
PPO: Proximal Policy Optimization—an RL algorithm used to update the generator's policy to maximize the reward signal while maintaining training stability
CV: Curriculum Vitae—a document detailing a person's career history and qualifications
JD: Job Description—a text document outlining the responsibilities and requirements of a specific job role
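KL divergence: Kullback-Leibler divergence, an asymmetric measure of how much one probability distribution differs from another (not a true distance), used in RL as a penalty that keeps the fine-tuned model from deviating too far from the initial supervised model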
BLEU: Bilingual Evaluation Understudy—a precision-oriented metric that scores generated text by its n-gram overlap with one or more reference texts
ROUGE: Recall-Oriented Understudy for Gisting Evaluation—a set of metrics used to evaluate automatic summarization and translation in NLP
AUC: Area Under the Curve (typically the ROC curve)—a single number summarizing a classifier's performance across all decision thresholds; 1.0 is a perfect ranking and 0.5 is chance level
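
As a rough illustration of two of the quantities defined above, the following minimal sketch computes KL divergence for discrete distributions and AUC from its pairwise-ranking interpretation. The function names and the toy inputs are hypothetical and chosen only for this example; they are not taken from the system described here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # KL divergence is asymmetric: swapping the arguments changes the value.
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
    # Three of the four positive/negative pairs are ranked correctly.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise form of AUC is quadratic in the number of examples, which is fine for illustration; a sort-based computation would be the usual choice for large evaluation sets.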