| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Re-evaluating reported MAS gains with a modern frontier model (Gemini-2.0-Flash) shows a smaller gap over SAS than the original papers reported (those often used GPT-3.5). | ||||
| Average across tasks | Accuracy Improvement (MAS - SAS) | 10 | 3 | -7 |
| Cost analysis shows that MAS consumes far more input tokens than SAS. | ||||
| Average across 7 datasets | Input Token Multiplier (MAS / SAS) | 1× | 4× | +3× |
| Average across 7 datasets | Input Token Multiplier (MAS / SAS) | 1× | 220× | +219× |
| Performance of the proposed hybrid architecture. | ||||
| Various agentic applications | Accuracy Improvement | Not reported in the paper | Not reported in the paper | +1.1% to +12% |
| Various agentic applications | Cost Reduction | 100 | 11.9 | -88.1 |
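
The Δ column appears to be the "This Paper" value minus the "Baseline" value. The minimal sketch below recomputes it for the rows that have numeric entries, under that assumption. The helper names `delta` and `check_rows` are hypothetical and do not come from the paper, and the accuracy row that reports only a range is skipped because its baseline is not given.

```python
# Rows copied from the table: (metric, baseline, this_paper, reported_delta).
rows = [
    ("Accuracy Improvement (MAS - SAS)", 10, 3, -7),
    ("Input Token Multiplier (MAS / SAS)", 1, 4, 3),
    ("Input Token Multiplier (MAS / SAS)", 1, 220, 219),
    ("Cost Reduction", 100, 11.9, -88.1),
]


def delta(baseline, this_paper):
    """Assumed definition of the Δ column: this-paper value minus baseline."""
    # Round to one decimal to match the table's precision.
    return round(this_paper - baseline, 1)


def check_rows(table_rows):
    """Return the metrics whose reported Δ differs from the recomputed one."""
    return [m for m, b, p, d in table_rows if delta(b, p) != d]


if __name__ == "__main__":
    for metric, base, paper, reported in rows:
        print(f"{metric}: {delta(base, paper):+g} (reported {reported:+g})")
    print("mismatches:", check_rows(rows))
```

If `check_rows` returns an empty list, the reported Δ values agree with the subtraction assumed here.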