GRPO: Group Relative Policy Optimization—an RL algorithm that improves a policy by comparing a group of outputs generated for the same input, eliminating the need for a separate critic model
MOS: Mean Opinion Score—a numerical measure of the perceived quality of an image, usually obtained by averaging human ratings
PLCC: Pearson Linear Correlation Coefficient—a metric measuring the linear correlation between predicted scores and ground truth
SRCC: Spearman Rank-Order Correlation Coefficient—a metric measuring the monotonic relationship (ranking order) between predicted scores and ground truth
SFT: Supervised Fine-Tuning—training a model on a dataset of input-output pairs to adapt it to a specific task
OOD: Out-Of-Distribution—data that differs significantly from the data seen during training
AIGC: AI-Generated Content—media generated by artificial intelligence models
KL divergence: Kullback-Leibler divergence—a statistical distance measuring how one probability distribution differs from a second, reference distribution
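
Several of the terms above (PLCC, SRCC, the group-relative comparison in GRPO, and KL divergence) are defined by short formulas, so a small illustration may help. The following is a minimal sketch in plain Python, not an implementation taken from any particular paper or library. The function names, the sample scores, and the choice to normalize GRPO rewards by the group mean and standard deviation (a commonly used formulation) are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """PLCC: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage (assumed form): each reward in a group of outputs
    for the same input is centered on the group mean and scaled by the group
    standard deviation, so no separate critic is needed."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, in nats; q is the reference."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data: predictions that preserve the MOS ranking but are non-linear.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
print("PLCC:", pearson(predicted, mos))
print("SRCC:", spearman(predicted, mos))

# Hypothetical rewards for a group of three outputs sampled for one input.
print("advantages:", group_relative_advantages([1.0, 2.0, 3.0]))

# KL divergence is asymmetric: swapping the arguments changes the value.
p, q = [0.5, 0.5], [0.9, 0.1]
print("KL(p||q):", kl_divergence(p, q), "KL(q||p):", kl_divergence(q, p))
```

In this example the predictions keep the ranking of the MOS values exactly, so SRCC reaches its maximum while PLCC stays below it, which is the practical difference between the two metrics. The KL example also shows that the divergence depends on which distribution is treated as the reference.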