| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MEGa substantially outperforms continual learning baselines on the Fictional Character dataset, keeping recall and QA accuracy high where the reported baseline drops to 0.10. | ||||
| Fictional Character Dataset | Recall Cosine Similarity | 0.10 | 0.95 | +0.85 |
| Fictional Character Dataset | QA Accuracy (GPT Judge) | 0.10 | 0.98 | +0.88 |
| On the Wikipedia dataset, MEGa maintains high performance while baselines degrade, though RAG remains the ceiling. | ||||
| Wikipedia 2024 Events | QA Accuracy (GPT Judge) | 0.38 | 0.98 | +0.60 |
| Wikipedia 2024 Events | Log Probability | -0.20 | -0.05 | +0.15 |
| MEGa preserves general capabilities (MMLU) better than full fine-tuning approaches. | ||||
| MMLU | Macro Accuracy | 0.38 | 0.66 | +0.28 |