Evaluation Setup
Depression severity estimation using the E-DAIC dataset (text modality only).
Benchmarks:
- E-DAIC (Depression Severity Regression)
Metrics:
- Concordance Correlation Coefficient (CCC)
- Mean Absolute Error (MAE)
- Statistical methodology: Not explicitly reported in the paper
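As a point of reference for the two metrics listed above, the following is a minimal sketch of how CCC and MAE are commonly computed. It uses Lin's standard definition of CCC with population (1/n) variance and covariance. The function names and the toy scores are illustrative assumptions, not the paper's evaluation code.

```python
from statistics import fmean


def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC: 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)
    # Population (1/n) moments, the usual convention for CCC.
    var_t = fmean([(t - mean_t) ** 2 for t in y_true])
    var_p = fmean([(p - mean_p) ** 2 for p in y_pred])
    cov = fmean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


if __name__ == "__main__":
    # Made-up severity scores, for illustration only.
    labels = [2.0, 10.0, 5.0, 17.0, 8.0]
    predictions = [3.0, 9.0, 7.0, 14.0, 8.0]
    print("CCC:", concordance_correlation_coefficient(labels, predictions))
    print("MAE:", mean_absolute_error(labels, predictions))
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels, so a model that ranks participants correctly but is systematically biased still scores below 1. Higher CCC is better; lower MAE is better.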
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
| Comparison with traditional multimodal baselines showing LLMs with CoT outperform deep learning fusion methods. |
| E-DAIC | CCC | 0.583 | 0.732 | +0.149 |
| Ablation studies demonstrating the impact of the proposed CoT prompting strategy on Standard LLMs (without inherent CoT). |
| E-DAIC | CCC | 0.550 | 0.637 | +0.087 |
| E-DAIC | MAE | 4.33 | 4.07 | -0.26 |
| E-DAIC | CCC | 0.696 | 0.732 | +0.036 |
| Ablation studies showing that even models with inherent CoT capabilities benefit from the specific Emotion-to-Reasoning structured framework. |
| E-DAIC | CCC | 0.597 | 0.705 | +0.108 |
| E-DAIC | MAE | 4.23 | 3.55 | -0.68 |
| E-DAIC | CCC | 0.625 | 0.677 | +0.052 |
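The Δ column is the "This Paper" value minus the baseline, so a positive Δ is an improvement for CCC while a negative Δ is an improvement for MAE. The short sketch below recomputes each Δ from the values transcribed from the table and checks the direction of improvement. The names `ROWS`, `HIGHER_IS_BETTER`, `delta`, and `is_improvement` are illustrative and not taken from the paper.

```python
# (metric, baseline, this_paper, reported_delta), transcribed from the table above.
ROWS = [
    ("CCC", 0.583, 0.732, 0.149),
    ("CCC", 0.550, 0.637, 0.087),
    ("MAE", 4.33, 4.07, -0.26),
    ("CCC", 0.696, 0.732, 0.036),
    ("CCC", 0.597, 0.705, 0.108),
    ("MAE", 4.23, 3.55, -0.68),
    ("CCC", 0.625, 0.677, 0.052),
]

# CCC is a score (higher is better); MAE is an error (lower is better).
HIGHER_IS_BETTER = {"CCC": True, "MAE": False}


def delta(baseline, proposed):
    """Signed change from baseline, rounded to the table's precision."""
    return round(proposed - baseline, 3)


def is_improvement(metric, baseline, proposed):
    """True if the change moves the metric in its favorable direction."""
    change = proposed - baseline
    return change > 0 if HIGHER_IS_BETTER[metric] else change < 0


if __name__ == "__main__":
    for metric, baseline, proposed, reported in ROWS:
        status = "improves" if is_improvement(metric, baseline, proposed) else "worsens"
        print(f"{metric}: {baseline} -> {proposed} "
              f"(delta {delta(baseline, proposed):+}, reported {reported:+}, {status})")
```

Every row in the table moves in the favorable direction for its metric, and the recomputed differences agree with the reported Δ values.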
Main Takeaways
- Structured CoT prompting consistently improves depression severity estimation for both standard LLMs and reasoning-enhanced LLMs, with CCC gains of +0.036 to +0.108 in the ablations.
- The 'Emotion-to-Reasoning' framework effectively bridges the gap between raw text processing and clinical diagnostic standards.
- LLMs using this text-only strategy can outperform complex multimodal systems that use audio and video (CCC 0.732 vs. 0.583 on E-DAIC), highlighting the density of diagnostic information in linguistic cues.
- The method improves auditability by generating explicit lists of depressive factors (social, biological, psychological) alongside the score.