| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Offline analysis of LLM fine-tuning shows the model learns to format outputs and align with user behavior. | ||||
| Internal Log Data | Match Rate | 0 | 99 | +99 |
| Internal Log Data | Recall (Test Set) | Not reported in the paper | Not reported in the paper | Not reported in the paper |