Generative Job Recommendations with Large Language Model

📝 Paper Summary

Recommender Systems, Generative LLMs

GIRL is a generative job recommendation system that creates personalized job descriptions from resumes, refined by recruiter feedback via reinforcement learning.

Core Problem

Traditional job recommendation systems rely on opaque 'black-box' matching scores and are limited to ranking existing database entries, failing to provide explainable guidance or synthesized career advice.

Why it matters:

Job seeking is a high-stakes scenario where user trust and explainability are critical, but black-box neural networks lack transparency.
Discriminative models can only retrieve existing jobs, limiting their ability to act as comprehensive AI advisors that suggest ideal career paths or synthesized roles.
A significant semantic gap often exists between the language in CVs and Job Descriptions (JDs), hindering effective matching.

Concrete Example: A traditional model might output a 0.8 matching score for a candidate and a job without explanation. In contrast, GIRL generates a full Job Description specifically tailored to the candidate's CV, showing exactly what an ideal role looks like for them.

Key Novelty

Generative Paradigm for Job Recommendation (GIRL)

Instead of ranking existing jobs, the model generates a hypothetical 'perfect' Job Description (JD) based on a candidate's CV.
Uses a three-stage training pipeline (SFT, Reward Modeling, RL) to align the LLM's generation not just with language patterns, but with actual recruiter preferences (market demand).
The generated description serves two purposes: providing interpretable career advice to the user and acting as a data augmentation feature to improve traditional matching models.

Architecture

The overall framework of GIRL, illustrating the three-step training process: Supervised Fine-Tuning (SFT), Reward Model Training, and Reinforcement Learning (RL).

Evaluation Highlights

Outperforms state-of-the-art baselines on generation metrics, achieving higher BLEU and ROUGE scores compared to vanilla LLMs.
Improves traditional matching tasks: using generated JDs as auxiliary features boosts the AUC of a BERT-based matching model.
RL training aligns model output with recruiter preferences, yielding higher reward scores compared to SFT-only models.

Breakthrough Assessment

7/10

Novel application of Generative AI to job recommendation (generation vs. retrieval). The proposed pipeline effectively adapts RLHF to the recruitment domain, though the evaluation relies heavily on internal datasets.

⚙️ Technical Details

Problem Definition

Setting: Generative Job Recommendation and Generation-Enhanced Job Recommendation

Inputs: A job seeker's Curriculum Vitae (CV) represented as a sequence of words

Outputs: A generated Job Description (JD) text; optionally, a matching score for a CV-JD pair using the generated JD as augmentation

Pipeline Flow

Input Processing: CV text → Prompt Template
Generation: LLM generates a personalized Job Description (JD)
Optional Enhancement: Generated JD + Original CV + Target JD → Discriminative Model → Matching Score
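The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the prompt wording, the `recommend` function, and the two stub callables (`fake_generate`, `fake_match`) are all hypothetical stand-ins for the fine-tuned generator and the discriminative matcher.

```python
from typing import Callable, Optional

# Hypothetical prompt wording; the paper's actual template is not reproduced here.
PROMPT_TEMPLATE = (
    "Below is a job seeker's CV. Write a job description that suits this candidate.\n"
    "CV: {cv}\n"
    "Job description:"
)


def build_prompt(cv_text: str) -> str:
    """Step 1: wrap the raw CV text in the prompt template."""
    return PROMPT_TEMPLATE.format(cv=cv_text.strip())


def recommend(
    cv_text: str,
    generate: Callable[[str], str],
    match: Optional[Callable[[str, str, str], float]] = None,
    target_jd: Optional[str] = None,
) -> dict:
    """Run generation, then the optional generation-enhanced matching step."""
    prompt = build_prompt(cv_text)
    generated_jd = generate(prompt)  # Step 2: the LLM writes a personalized JD
    result = {"prompt": prompt, "generated_jd": generated_jd, "score": None}
    if match is not None and target_jd is not None:
        # Step 3: generated JD + original CV + target JD -> matching score
        result["score"] = match(cv_text, generated_jd, target_jd)
    return result


def fake_generate(prompt: str) -> str:
    """Stub generator (assumption): returns a fixed JD instead of calling an LLM."""
    return "Backend engineer role requiring Python"


def fake_match(cv_text: str, generated_jd: str, target_jd: str) -> float:
    """Stub matcher (assumption): Jaccard overlap of generated and target JD tokens."""
    a, b = set(generated_jd.lower().split()), set(target_jd.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    out = recommend(
        "5 years of Python backend work",
        fake_generate,
        fake_match,
        target_jd="Backend engineer with Python",
    )
    print(out["generated_jd"], out["score"])
```

Passing the generator and matcher as callables keeps the sketch independent of any particular model library; in a real system they would wrap the trained generator G and matcher M.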

System Modules

Generator (G)

Generates a personalized JD based on the CV

Model or implementation: LLM initialized from a BERT-based model (exact generator variant not named; likely a standard Transformer)

Reward Model (U)

Predicts matching score between CV and JD to simulate recruiter feedback

Model or implementation: Transformer-based encoder with linear prediction head
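As a rough illustration of how such a reward model could be trained, the sketch below implements one common pairwise ranking loss, -log sigmoid(s_matched - s_mismatched). The exact loss form is an assumption; the summary only states that the loss widens the gap between matched and mismatched scores. The function names are hypothetical.

```python
import math
from typing import Sequence


def pairwise_ranking_loss(score_matched: float, score_mismatched: float) -> float:
    """-log(sigmoid(margin)), computed as softplus(-margin) in a stable form."""
    margin = score_matched - score_mismatched
    if margin >= 0:
        # log(1 + exp(-margin)); exp(-margin) cannot overflow here
        return math.log1p(math.exp(-margin))
    # Same quantity rewritten so that exp() receives a negative argument
    return -margin + math.log1p(math.exp(margin))


def batch_loss(matched: Sequence[float], mismatched: Sequence[float]) -> float:
    """Mean pairwise loss over aligned (matched, mismatched) score pairs."""
    if len(matched) != len(mismatched) or not matched:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = zip(matched, mismatched)
    return sum(pairwise_ranking_loss(p, n) for p, n in pairs) / len(matched)
```

The loss shrinks toward zero as the matched pair's score exceeds the mismatched pair's score by a growing margin, and grows roughly linearly when the ordering is wrong.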

Discriminative Matcher (M)

Calculates matching score between a user and a real job, using the generated JD as an extra feature

Model or implementation: BERT-based matching model

Novel Architectural Elements

Integration of generated synthetic JDs as an auxiliary feature input for traditional discriminative matching models

Modeling

Base Model: BERT (bert-base-chinese, per the experiment details)

Training Method: Three-step process: Supervised Fine-Tuning (SFT), Reward Model Training (RMT), Reinforcement Learning from Recruiter Feedback (RLRF)

Objective Functions:

Purpose: SFT - Learn to generate a JD from a CV.
Formally: Negative log-likelihood of the target JD tokens given the CV.

Purpose: Reward Model - Distinguish matched CV-JD pairs from mismatched ones.
Formally: Pairwise ranking loss that maximizes the difference between the scores of matched and mismatched pairs.

Purpose: RL - Align the generator with recruiter feedback.
Formally: PPO objective that maximizes the reward (the reward model's matching score) minus a KL-divergence penalty.
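One plausible way to write the three objectives is given below. The notation is introduced here and is not taken from the paper: C is the CV, J the target JD with tokens j_t, J^+ and J^- a matched and a mismatched JD, G_θ the generator, U_φ the reward model, σ the logistic sigmoid, and β a hypothetical KL weight. The sigmoid form of the ranking loss is one common choice, and the PPO clipped surrogate that would be used to optimize the last expression is omitted.

```latex
\begin{align}
\mathcal{L}_{\mathrm{SFT}}(\theta)
  &= -\sum_{t=1}^{|J|} \log G_\theta\!\left(j_t \mid C,\, j_{<t}\right) \\
\mathcal{L}_{\mathrm{RM}}(\phi)
  &= -\log \sigma\!\left( U_\phi(C, J^{+}) - U_\phi(C, J^{-}) \right) \\
\max_{\theta}\;
  & \mathbb{E}_{\hat{J} \sim G_\theta(\cdot \mid C)}
    \left[ U_\phi(C, \hat{J}) \right]
    - \beta\, \mathrm{KL}\!\left( G_\theta(\cdot \mid C) \,\middle\|\, G_{\mathrm{SFT}}(\cdot \mid C) \right)
\end{align}
```

Here G_SFT denotes the generator after supervised fine-tuning, which serves as the reference policy for the KL term.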

Training Data:

Large-scale real-world dataset from BOSS Zhipin (online recruitment platform)
Recruitment logs from July to September 2022
Filtered to active users with complete profiles
SFT Data: 100,000 matched CV-JD pairs
Reward Model Data: 100,000 samples (50k matched, 50k mismatched)
RL Training Data: 10,000 CVs

Key Hyperparameters:

learning_rate: 2e-5
batch_size: 16
max_sequence_length: 512
training_epochs: Not explicitly reported in the paper
optimizer: AdamW
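The reported settings can be collected into a small configuration object. The sketch below is illustrative only: `TrainingConfig` and `steps_per_epoch` are hypothetical names, the unreported epoch count is left as `None`, and the step arithmetic assumes a single device with no gradient accumulation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 2e-5
    batch_size: int = 16
    max_sequence_length: int = 512
    optimizer: str = "AdamW"
    training_epochs: Optional[int] = None  # not reported in the paper


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps needed for one pass over the data (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    cfg = TrainingConfig()
    # Dataset sizes as listed under Training Data
    for name, size in [("SFT", 100_000), ("Reward Model", 100_000), ("RL", 10_000)]:
        print(name, steps_per_epoch(size, cfg.batch_size))
```

Under these assumptions, one pass over the 100,000 SFT pairs at batch size 16 takes 6,250 steps, and one pass over the 10,000 RL CVs takes 625.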

Compute: Not reported in the paper

Comparison to Prior Work

vs. PJFNN/APJFNN/IPJF: GIRL is generative, creating new JDs rather than just scoring existing pairs.
vs. BERT (Discriminative): GIRL uses a generation-enhanced framework where synthetic JDs augment the input features.
vs. General LLMs (e.g., GPT-3): GIRL is fine-tuned with domain-specific recruitment data and aligned via RL with recruiter feedback.

Limitations

Evaluation relies on a proprietary dataset from BOSS Zhipin, hindering direct reproducibility.
The base model is BERT-based, and BERT is usually an encoder-only architecture. Using it for generation is less common than using a decoder-only model such as GPT, and the summary source gives few details on the generation mechanism (for example, the decoder structure).
No human evaluation with actual recruiters was performed; the evaluation relies on ChatGPT and proxy metrics.

Reproducibility

No code is provided. The dataset is proprietary (BOSS Zhipin) and not public. Prompt templates are provided in the paper (in Chinese, with English translation).

📊 Experiments & Results

Evaluation Setup

Job Description Generation quality assessment and Generation-Enhanced Job Matching performance

Benchmarks:

BOSS Zhipin Dataset (Real-world recruitment data) [New]

Metrics:

BLEU-1, BLEU-2, BLEU-3, BLEU-4
ROUGE-1, ROUGE-2, ROUGE-L
AUC (Area Under Curve)
F1 Score
Accuracy
Statistical methodology: Not explicitly reported in the paper

Key Results

Generation quality results comparing GIRL against SFT-only and standard pre-trained models.
Benchmark	Metric	Baseline	This Paper	Δ
BOSS Zhipin Dataset	BLEU-1	0.3342	0.3512	+0.0170
BOSS Zhipin Dataset	ROUGE-L	0.3621	0.3854	+0.0233

Performance of 'Generation-Enhanced' matching, where the generated JD is added as a feature to a BERT matcher.
Benchmark	Metric	Baseline	This Paper	Δ
BOSS Zhipin Dataset	AUC	0.7812	0.8045	+0.0233
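To put the absolute deltas in context, the short script below recomputes them from the table and adds the relative improvement over each baseline. The relative figures are derived here and do not appear in the source; the `improvement` helper is a hypothetical name.

```python
# (metric, baseline, this paper) copied from the Key Results table
ROWS = [
    ("BLEU-1", 0.3342, 0.3512),
    ("ROUGE-L", 0.3621, 0.3854),
    ("AUC", 0.7812, 0.8045),
]


def improvement(baseline: float, ours: float) -> tuple:
    """Return (absolute delta, relative improvement in percent)."""
    delta = ours - baseline
    return delta, 100.0 * delta / baseline


if __name__ == "__main__":
    for metric, baseline, ours in ROWS:
        delta, rel = improvement(baseline, ours)
        print(f"{metric}: delta={delta:+.4f}, relative={rel:.2f}%")
```

The recomputed deltas agree with the Δ column, and the relative gains come out at roughly 5.1% for BLEU-1, 6.4% for ROUGE-L, and 3.0% for AUC.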

Experiment Figures

Prompt template design for the generation task.

Main Takeaways

RLRF (Reinforcement Learning from Recruiter Feedback) improves generation quality metrics (BLEU, ROUGE) compared to SFT alone, suggesting the model learns better alignment with professional standards; no statistical significance testing is reported.
Generated JDs contain valuable semantic information that boosts the performance of traditional discriminative matching models when used as auxiliary input.
The RL-tuned model captures the market preferences encoded in the reward model, as evidenced by its higher average reward scores compared to the SFT baseline.

📚 Prerequisite Knowledge

Prerequisites

Basic understanding of Large Language Models (LLMs) and Transformers
Familiarity with Recommender Systems (collaborative filtering, content-based filtering)
Knowledge of Reinforcement Learning, specifically PPO and Reward Modeling

Key Terms

SFT: Supervised Fine-Tuning—training the model on a labeled dataset (matched CV-JD pairs) to learn the basic task format

RLRF: Reinforcement Learning from Recruiter Feedback—aligning the model with market needs using a reward model trained on recruiter acceptance/rejection data

PPO: Proximal Policy Optimization—an RL algorithm used to update the generator's policy to maximize the reward signal while maintaining training stability

CV: Curriculum Vitae—a document detailing a person's career history and qualifications

JD: Job Description—a text document outlining the responsibilities and requirements of a specific job role

KL divergence: A statistical distance measure used in RL to prevent the fine-tuned model from deviating too far from the initial supervised model

BLEU: Bilingual Evaluation Understudy—a metric for evaluating the quality of text which counts the overlap of n-grams between the candidate and reference text

ROUGE: Recall-Oriented Understudy for Gisting Evaluation—a set of metrics used to evaluate automatic summarization and translation in NLP

AUC: Area Under the Curve—a performance measurement for classification problems at various threshold settings
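For readers who want a concrete sense of the overlap metrics defined above, here is a minimal sketch of the clipped n-gram precision that underlies BLEU and of an LCS-based ROUGE-L F1. It is a simplification under stated assumptions: whitespace tokenization, a single reference, no BLEU brevity penalty or geometric mean across n-gram orders, and equal weighting of precision and recall for ROUGE-L. The function names are illustrative and are not from the paper or any library.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate: Sequence[str], reference: Sequence[str], n: int = 1) -> float:
    """BLEU-style modified precision: candidate counts are clipped by reference counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return overlap / total


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "design and maintain backend services".split()
    target = "design backend services in python".split()
    print(clipped_precision(generated, target, n=1))
    print(clipped_precision(generated, target, n=2))
    print(rouge_l_f1(generated, target))
```

In the example, three of the five generated unigrams and one of the four generated bigrams appear in the target, and the longest common subsequence ("design backend services") has length three.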