| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| EMG-RAG consistently outperforms baselines on standard QA tasks using GPT-4. | ||||
| Personalized QA | BLEU | 64.16 | 75.99 | +11.83 |
| Personalized QA | ROUGE-L | 84.74 | 88.06 | +3.32 |
| In scenarios with continuous memory updates (weeks 1-4), EMG-RAG shows superior robustness due to its editable graph structure. | ||||
| Personalized QA | ROUGE-L | 86.39 | 96.93 | +10.54 |
| Autofill Forms | Exact Match | 88.89 | 95.83 | +6.94 |
| Ablation studies confirm the necessity of both the Warm Start (WS) and Policy Gradient (PG) training stages. | ||||
| Personalized QA | BLEU | 65.65 | 75.99 | +10.34 |
| Personalized QA | BLEU | 65.07 | 75.99 | +10.92 |