DPO: Direct Preference Optimization, a method that aligns a model to pairwise preferences (e.g., 'response A is better than B') by training directly on chosen/rejected response pairs, without a separate reward model
SFT: Supervised Fine-Tuning—training a model on high-quality input-output pairs (here, profile+context -> response)
Psychological Profile: A structured set of attributes (e.g., 'Gender: Female', 'Symptom: Insomnia', 'Severity: Severe') extracted from clinical literature (DSM-V) to define the simulated patient
Profile Noise Augmentation: A technique where the model generates a 'bad' response from a slightly altered (noisy) profile; that response serves as the negative (rejected) sample for DPO training
Cognitive Distortions: Biased ways of thinking common in depression (e.g., catastrophizing), adapted from Beck's theory for the profiles
DSM-V: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (officially abbreviated DSM-5), the standard classification of mental disorders used by mental health professionals
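
The Profile Noise Augmentation and DPO entries above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The attribute vocabulary `ATTRIBUTE_VALUES`, the helper names `add_profile_noise` and `build_preference_pair`, the `generate` callable, and the choice to flip exactly one attribute are all assumptions made for illustration. `dpo_loss` is the standard per-pair DPO objective, -log sigmoid(beta * margin), computed from summed log-probabilities under the policy and a frozen reference model.

```python
import copy
import math
import random

# Hypothetical attribute vocabulary; the real attribute set is not given here.
ATTRIBUTE_VALUES = {
    "Gender": ["Female", "Male"],
    "Symptom": ["Insomnia", "Fatigue", "Loss of appetite"],
    "Severity": ["Mild", "Moderate", "Severe"],
}


def add_profile_noise(profile, rng, n_changes=1):
    """Return a copy of `profile` with `n_changes` attributes switched
    to a different value (assumed noise scheme)."""
    noisy = copy.deepcopy(profile)
    for key in rng.sample(sorted(noisy), k=n_changes):
        alternatives = [v for v in ATTRIBUTE_VALUES[key] if v != noisy[key]]
        noisy[key] = rng.choice(alternatives)
    return noisy


def build_preference_pair(profile, context, generate, rng):
    """Build one DPO sample: 'chosen' comes from the true profile,
    'rejected' from the noisy one. `generate` is a hypothetical
    callable (profile, context) -> response text."""
    noisy = add_profile_noise(profile, rng)
    return {
        "prompt": {"profile": profile, "context": context},
        "chosen": generate(profile, context),
        "rejected": generate(noisy, context),
    }


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss falls below log 2 once the policy favors the chosen response more strongly than the reference model does, which is the signal that pushes the simulated patient toward responses consistent with its true profile.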