Evaluation Setup
25 expert-curated market research queries across 4 categories (Performance, Competitive, Financial, Strategic)
Benchmarks:
- Market Research Queries (Complex Query Resolution) [New]
Metrics:
- Completeness (1-5 scale)
- Source Quality (1-5 scale)
- Statistical methodology: Not explicitly reported in the paper
Key Results
VMAO outperforms Single-Agent and Static Pipeline baselines across both quality metrics.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Market Research Queries | Completeness (1-5) | 3.1 | 4.2 | +1.1 |
| Market Research Queries | Source Quality (1-5) | 2.6 | 4.1 | +1.5 |
| Strategic Assessment Queries | Completeness Improvement | Not reported in the paper | Not reported in the paper | +53% |
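The Δ values for the two Market Research rows are absolute differences on the 1-5 scale, while the +53% figure for Strategic Assessment queries is a percentage whose underlying scores are not reported. As a rough aid for comparing the two kinds of figures, the sketch below recomputes the absolute gains from the reported scores and also derives relative gains. The relative percentages are derived here, not reported in the paper, and the paper may compute its +53% differently; the function name `gains` is illustrative.

```python
# Scores copied from the Key Results rows: (baseline, this paper), 1-5 scale.
results = {
    "Completeness": (3.1, 4.2),
    "Source Quality": (2.6, 4.1),
}


def gains(baseline, system):
    """Return (absolute gain, relative gain in percent), each rounded to 1 decimal."""
    absolute = system - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for metric, (baseline, system) in results.items():
    absolute, relative = gains(baseline, system)
    # The relative figure is a derived value, not one stated in the paper.
    print(f"{metric}: +{absolute} points, {relative}% relative to baseline")
```

Because the baseline scores differ, the same absolute gain corresponds to different relative gains, so absolute and percentage figures should not be compared directly.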
Main Takeaways
- Orchestration-level verification improves both completeness (+1.1) and source quality (+1.5) on the 1-5 scales by catching gaps that single agents or static pipelines miss
- The largest gains (+53%) occur in open-ended 'Strategic Assessment' queries, while gains are smaller for well-defined 'Performance Analysis' queries
- Most queries (>75%) terminate via resource-based stop conditions (diminishing returns, token budget), indicating the system effectively trades off cost vs. quality
- Replanning mostly triggers retries of existing questions rather than generating new ones, suggesting that execution variance (tool failure) is a bigger issue than flaws in the initial plan
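The resource-based stop conditions mentioned above (diminishing returns, token budget) can be sketched as a single check run after each verification round. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `StopConfig` and `stop_reason`, all threshold values, and the quality-target condition are hypothetical, since the summary does not say which non-resource conditions exist or how any condition is parameterized.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StopConfig:
    # All thresholds are illustrative assumptions, not values from the paper.
    target_quality: float = 4.5   # assumed quality-based stop threshold
    min_gain: float = 0.1         # smallest score gain still worth another round
    token_budget: int = 100_000   # maximum tokens the run may consume


def stop_reason(scores: List[float], tokens_used: int, cfg: StopConfig) -> Optional[str]:
    """Return the reason to stop, or None to keep iterating.

    `scores` holds one verification score per completed round.
    """
    if scores and scores[-1] >= cfg.target_quality:
        return "quality_target"        # assumed quality-based stop
    if tokens_used >= cfg.token_budget:
        return "token_budget"          # resource-based stop
    if len(scores) >= 2 and scores[-1] - scores[-2] < cfg.min_gain:
        return "diminishing_returns"   # resource-based stop
    return None


# Example: the score barely moved between rounds, so the loop stops early.
print(stop_reason([3.0, 3.05], tokens_used=20_000, cfg=StopConfig()))
```

Logging the returned reason for every query is one way to obtain a breakdown like the ">75% resource-based" figure reported above.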