Evaluation Setup
Analysis of SEC 10-K filings from publicly listed U.S. companies.
Benchmarks:
- Financial Document Analysis (Risk factor extraction, Financial summarization, Regulatory QA) [New]
Metrics:
- Factual Coverage (against analyst-curated reference)
- Compliance Accuracy (vs. gold-standard responses)
- Revision Rate (frequency of downstream rejections)
- Redundancy Penalty (repeated/contradictory info)
- Likert Scale Human Ratings (Coherence, Relevance)
Statistical methodology: Not explicitly reported in the paper
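The paper does not give formulas for these metrics. As a rough illustration only, the sketch below assumes Factual Coverage is the share of analyst-curated reference facts that appear in the output, and Revision Rate is the share of agent handoffs that a downstream agent rejects. The function names and the set-based fact matching are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def factual_coverage(reference_facts: set[str], output_facts: set[str]) -> float:
    """Fraction of reference facts recovered by the output (assumed definition)."""
    if not reference_facts:
        raise ValueError("reference set must not be empty")
    return len(reference_facts & output_facts) / len(reference_facts)


def revision_rate(rejected_handoffs: int, total_handoffs: int) -> float:
    """Fraction of handoffs rejected by a downstream agent (assumed definition)."""
    if total_handoffs <= 0:
        raise ValueError("total_handoffs must be positive")
    return rejected_handoffs / total_handoffs
```

Under these assumed definitions, higher Factual Coverage is better and lower Revision Rate is better. A real evaluation would also need a fact-matching step that is more tolerant than exact string equality.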
Key Results
Comparison of the proposed Full System against Static and Adaptive-only baselines.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| SEC 10-K Analysis | Compliance Accuracy | 0.74 | 0.94 | +0.20 |
| SEC 10-K Analysis | Factual Coverage | 0.76 | 0.92 | +0.16 |
| SEC 10-K Analysis | Revision Rate | Not reported in the paper | Not reported in the paper | Not reported in the paper |
| SEC 10-K Analysis | Redundancy Penalty | Not reported in the paper | Not reported in the paper | Not reported in the paper |
| SEC 10-K Analysis | Compliance Accuracy | Not reported in the paper | Not reported in the paper | Not reported in the paper |
Main Takeaways
- Parallel agent evaluation is critical for handling ambiguity: redundant execution with selection outperforms single-path execution at detecting nuanced risks (e.g., off-balance-sheet arrangements).
- Feedback loops sever error chains: allowing downstream agents to reject inputs reduced the Redundancy Penalty by 73%.
- Dynamic routing enables specialization: the system offloads technical legal parsing to compliance agents while keeping summarizers focused on narrative, improving workflow speed by 14%.
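The three mechanisms named in these takeaways can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper's actual agent interfaces, scoring function, and routing policy are not described in this summary, so the names `run_with_selection`, `run_with_feedback`, and `route`, as well as the keyword-based router, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence

# All names below are hypothetical; the paper's interfaces are not given here.
Agent = Callable[[str], str]


def run_with_selection(agents: Sequence[Agent], text: str,
                       score: Callable[[str], float]) -> str:
    """Run every agent on the same input in parallel; keep the best-scoring output."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        candidates = list(pool.map(lambda agent: agent(text), agents))
    return max(candidates, key=score)


def run_with_feedback(upstream: Callable[[str, Optional[str]], str],
                      review: Callable[[str], Optional[str]],
                      text: str, max_retries: int = 2) -> Optional[str]:
    """Let a downstream reviewer reject upstream output and request a revision.

    `review` returns None to accept, or a rejection reason that is passed back
    to `upstream`. Returns None if nothing is accepted within the retry budget.
    """
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        output = upstream(text, feedback)
        feedback = review(output)
        if feedback is None:
            return output
    return None


def route(section_title: str) -> str:
    """Toy keyword router: legal or regulatory sections go to the compliance agent."""
    legal_markers = ("legal", "regulat", "compliance")
    title = section_title.lower()
    return "compliance" if any(m in title for m in legal_markers) else "summarizer"
```

In this sketch, selection picks one output among redundant candidates, the feedback loop stops a rejected output from propagating further downstream, and routing decides which specialist sees a section. A production system would likely replace the keyword router with a learned or rule-based classifier.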