Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research

📝 Paper Summary

Multi-agent Agentic RAG pipeline

This paper proposes a multi-agent system with configurable collaboration structures (vertical, horizontal, hybrid) for financial research, demonstrating that complex tasks like risk analysis benefit from agent ensembles while simple tasks favor single agents.

Core Problem

Existing financial AI tools typically rely on single-agent systems that fail to leverage collaborative intelligence, while standard multi-agent debate methods are impractical for complex, structured corporate workflows.

Why it matters:

Financial decision-making requires integrating diverse perspectives (risk, sentiment, fundamentals), which single models struggle to balance
Applying unstructured multi-agent debates (MAD) to large groups is inefficient and lacks the clear role definitions needed for rigorous investment research
There is a lack of empirical validation regarding which agent topology (hierarchy vs. flat team) works best for specific financial sub-tasks

Concrete Example: In a risk analysis task, a single agent might overlook a subtle liability in a 10-K form because it is overwhelmed by the context. A 'Vertical' multi-agent group assigns a leader to direct a subordinate specifically to 'analyze liquidity risks,' ensuring deeper coverage.

Key Novelty

Configurable Multi-Agent Collaboration Topologies for Finance

Defines three distinct collaboration structures (Horizontal, Vertical, Hybrid) that dictate how agents share information and authority, moving beyond simple 'more agents is better' logic
Implements a 'Vertical' structure via a nested chat mechanism where leaders issue hidden commands to subordinates, simulating corporate hierarchy within LLM interactions
Treats RAG as a unified tool function callable by agents, allowing them to autonomously refine query parameters rather than relying on fixed retrieval settings
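
The idea of exposing RAG as a tool whose parameters the agent can refine might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: `rag_search`, its `top_k` and `section` parameters, the `TOOLS` registry, and the JSON call format are all hypothetical names chosen for this example.

```python
import json
from typing import Any, Dict

def rag_search(query: str, top_k: int = 3, section: str = "all") -> Dict[str, Any]:
    # Hypothetical placeholder: a real tool would query a vector store of 10-K chunks.
    return {"query": query, "top_k": top_k, "section": section}

# Hypothetical tool registry: the callable plus the argument types it accepts.
TOOLS = {
    "rag_search": {
        "fn": rag_search,
        "required": {"query": str},
        "optional": {"top_k": int, "section": str},
    }
}

def dispatch(tool_call_json: str) -> Any:
    """Validate a JSON tool call emitted by an agent, then run the tool."""
    call = json.loads(tool_call_json)
    spec = TOOLS[call["name"]]
    args = call.get("arguments", {})
    allowed = {**spec["required"], **spec["optional"]}
    for name in spec["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in allowed:
            raise ValueError(f"unexpected argument: {name}")
        if not isinstance(value, allowed[name]):
            raise TypeError(f"argument {name} must be {allowed[name].__name__}")
    return spec["fn"](**args)

# First attempt uses the defaults; a follow-up call refines the parameters.
first = dispatch('{"name": "rag_search", "arguments": {"query": "liquidity risk"}}')
refined = dispatch(
    '{"name": "rag_search", "arguments": '
    '{"query": "liquidity risk", "top_k": 8, "section": "Item 1A"}}'
)
```

The point of the design is that retrieval settings live in the tool call rather than in fixed configuration, so an agent that finds the first result set too thin can reissue the call with different arguments.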

Architecture

Overview of the agent structures (Single, Dual, Vertical, Horizontal, Hybrid) and the unified RAG/Tool calling mechanism.

Evaluation Highlights

Ensemble multi-agent structure achieves 66.7% accuracy in 'buy or not' investment decisions on Dow Jones stocks
Achieves a low 2.35% average difference in one-week target price predictions using the optimal agent configuration
Demonstrates that single agents actually outperform multi-agent groups on simpler tasks like fundamental and sentiment analysis

Breakthrough Assessment

7/10

Provides a practical, empirically grounded framework for structuring multi-agent teams in finance. While the underlying models are standard (GPT-4), the structural analysis of agent collaboration topologies is valuable.

⚙️ Technical Details

Problem Definition

Setting: AI-powered investment research analyzing SEC 10-K forms to predict stock movements and make investment recommendations

Inputs: Company ticker symbol and 2023 SEC 10-K form (PDF converted to text)

Outputs: Investment decision (Buy/Not Buy) and Target Price (1-week forecast)

Pipeline Flow

User Input (Company Ticker)
Leader Agent (Planning & Coordination)
Sub-Agents (Execution via Tools)
Tools (RAG, YFinance, Reddit API)
Final Report Generation
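
The five stages above can be sketched as a small orchestration loop. This is a minimal illustration with no LLM calls: the tool functions are stubs, the class names `SubAgent` and `Leader` are hypothetical, and the pairing of each analyst role with a particular tool is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Stub tools standing in for RAG, YFinance, and the Reddit API.
def rag_tool(ticker: str) -> str:
    return f"[10-K excerpts for {ticker}]"

def price_tool(ticker: str) -> str:
    return f"[price history for {ticker}]"

def reddit_tool(ticker: str) -> str:
    return f"[Reddit posts about {ticker}]"

@dataclass
class SubAgent:
    role: str
    tool: Callable[[str], str]

    def run(self, task: str, ticker: str) -> str:
        # A real sub-agent would pass the tool output to an LLM; here we just echo it.
        evidence = self.tool(ticker)
        return f"{task}: used {evidence}"

@dataclass
class Leader:
    team: List[SubAgent]

    def plan(self, ticker: str) -> List[Tuple[SubAgent, str]]:
        # Planning step: one task per analyst role.
        return [(a, f"Analyze {ticker} from the {a.role} perspective") for a in self.team]

    def research(self, ticker: str) -> str:
        # Delegation, execution via tools, then synthesis into a final report.
        sections = [agent.run(task, ticker) for agent, task in self.plan(ticker)]
        return f"Report for {ticker}\n" + "\n".join(f"- {s}" for s in sections)

leader = Leader(team=[
    SubAgent("Fundamentals", price_tool),
    SubAgent("Sentiment", reddit_tool),
    SubAgent("Risk", rag_tool),
])
report = leader.research("AAPL")
print(report)
```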

System Modules

Leader Agent

Global planning, task delegation, and final synthesis of reports

Model or implementation: GPT-4-1106-vision-preview

Analyst Agents

Execute specific analyses (Fundamentals, Sentiment, Risk) using tools

Model or implementation: GPT-4-1106-vision-preview

RAG Tool

Retrieve context from 10-K filings

Model or implementation: all-MiniLM-L6-v2 (Embedding model)
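
The retrieval step can be illustrated with a toy example. To stay dependency-free, the sketch below swaps the all-MiniLM-L6-v2 embedding model for plain word counts, so it shows only the embed, score, and rank pattern and not the paper's retriever. The function names and the sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    # Stand-in for a sentence-embedding model: a bag of lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k chunks ranked by cosine similarity to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:top_k]

# Made-up sample text in the style of a 10-K filing.
chunks = [
    "Liquidity risk: the company relies on short-term credit facilities.",
    "Revenue grew in the cloud segment during the fiscal year.",
    "Legal proceedings may result in material liabilities.",
]
hits = retrieve("liquidity risk from credit facilities", chunks, top_k=1)
```

With a real embedding model only `embed` would change; the ranking logic stays the same.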

Novel Architectural Elements

Nested chat mechanism for Vertical Collaboration: Leader output triggers a separate, isolated chat loop with a specific subordinate that is invisible to others
Hybrid Collaboration structure: Maintains leader authority for final decisions but allows shared communication among subordinates
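
One way to reason about these structures is to model each as a set of chat channels and ask who can read whom. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the exact channel layout for the hybrid case (private leader lines plus one subordinate-only shared chat) is a guess consistent with the description above, not a detail confirmed by the paper.

```python
from typing import FrozenSet, List

def build_channels(topology: str, leader: str, subordinates: List[str]) -> List[FrozenSet[str]]:
    """Return the chat channels (sets of participants) implied by a topology."""
    if topology == "horizontal":
        # Flat team: one shared chat; the "leader" is just another participant.
        return [frozenset([leader, *subordinates])]
    if topology == "vertical":
        # One isolated nested chat per leader-subordinate pair.
        return [frozenset([leader, s]) for s in subordinates]
    if topology == "hybrid":
        # Assumed layout: private leader lines plus a shared subordinate chat.
        private = [frozenset([leader, s]) for s in subordinates]
        return private + [frozenset(subordinates)]
    raise ValueError(f"unknown topology: {topology}")

def can_see(channels: List[FrozenSet[str]], reader: str, sender: str) -> bool:
    # A reader sees a sender's messages only if they share at least one channel.
    return any(reader in c and sender in c for c in channels)

team = ["risk", "sentiment"]
vertical = build_channels("vertical", "lead", team)
hybrid = build_channels("hybrid", "lead", team)
horizontal = build_channels("horizontal", "lead", team)

print(can_see(vertical, "risk", "sentiment"))  # subordinates are isolated
print(can_see(hybrid, "risk", "sentiment"))    # subordinates share a chat
```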

Modeling

Base Model: GPT-4-1106-vision-preview (OpenAI API)

Training Method: Inference-only prompting and orchestration

Compute: Not reported in the paper (relies on external API)

Comparison to Prior Work

vs. StockAgent/FinAgent: Uses structured multi-agent collaboration (Vertical/Hybrid) rather than single-agent workflows
vs. MAD: Focuses on structured role-based collaboration (leader-subordinate) rather than unstructured debate/consensus mechanisms

Limitations

Relies on closed-source GPT-4 API, making costs high and reproducibility dependent on OpenAI
Analysis limited to 30 Dow Jones companies and 2023 10-K forms
Multi-agent configurations underperform single agents on simple tasks (Fundamentals, Sentiment), so the added complexity is not always justified

Reproducibility

Code: https://github.com/AI4Finance-Foundation/FinRobot

The code is publicly available at the repository listed above. The paper relies on closed-source models (GPT-4) and third-party APIs (FMP, FinnHub, Reddit), so exact replication depends on API access and version stability.

📊 Experiments & Results

Evaluation Setup

Financial analysis of 30 Dow Jones companies using 2023 annual reports

Benchmarks:

Dow Jones 30 Analysis: investment research on real-world data [New]

Metrics:

Target Price Prediction (Average Difference %)
Buy/Not Buy Accuracy
Statistical methodology: Not explicitly reported in the paper
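
The summary does not give formulas for these metrics. The sketch below assumes that "Average Difference %" means the mean absolute percentage difference between predicted and realized prices, and that accuracy is the share of Buy/Not Buy calls that match the outcome. Both definitions and the function names are assumptions, and the numbers are toy values, not results from the paper.

```python
from typing import Sequence

def average_difference_pct(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of |predicted - actual| / actual, expressed as a percentage."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / a for p, a in zip(predicted, actual))
    return 100.0 * total / len(predicted)

def decision_accuracy_pct(decisions: Sequence[bool], outcomes: Sequence[bool]) -> float:
    """Share of Buy (True) / Not Buy (False) calls that match the outcome, in percent."""
    if len(decisions) != len(outcomes) or not decisions:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(d == o for d, o in zip(decisions, outcomes))
    return 100.0 * correct / len(decisions)

# Toy values only: two price forecasts and four buy decisions.
diff = average_difference_pct([102.0, 49.0], [100.0, 50.0])
acc = decision_accuracy_pct([True, False, True, True], [True, True, True, True])
print(round(diff, 2), round(acc, 2))
```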

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Dow Jones 30 Analysis	Target Price Prediction Avg Diff	Not reported in the paper	2.35%	Not reported in the paper
Dow Jones 30 Analysis	Buy/Not Buy Accuracy	Not reported in the paper	66.7%	Not reported in the paper

Main Takeaways

Task complexity dictates optimal agent structure: Simple tasks (Fundamentals, Sentiment) are best handled by Single Agents, while complex tasks (Risk Analysis) benefit from Multi-Agent groups.
The 'Ensemble' structure, which combines different agent groups, outperforms individual configurations, achieving 66.7% accuracy on Buy/Not Buy decisions.
Vertical collaboration (strict hierarchy) optimizes efficiency for execution-heavy tasks, whereas Horizontal collaboration (shared chat) promotes better information exchange for simple cooperative tasks.
Adding agents does not reliably improve performance; on simpler tasks, extra agents introduce noise and reduce efficiency.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Models (LLMs) and function calling
Familiarity with RAG (Retrieval-Augmented Generation)
Basic financial concepts (10-K forms, market sentiment, risk analysis)

Key Terms

RAG: Retrieval-Augmented Generation, a technique that supplies LLMs with external data (such as financial reports) to improve accuracy

SEC 10-K: A comprehensive annual report filed by public companies providing a detailed picture of financial performance

Vertical Collaboration: A hierarchical agent structure where a 'Leader' agent sends private commands to 'Subordinate' agents via nested chats

Horizontal Collaboration: A flat structure where all agents communicate in a shared group chat using a round-robin speaking order

Text2Param: The capability of an LLM to convert natural language instructions into structured parameters for function/API calls

Function Calling: A feature allowing LLMs to execute external code or APIs (e.g., fetching stock prices) rather than just generating text
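
The round-robin speaking order mentioned under Horizontal Collaboration can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the "message" each agent produces is a placeholder string rather than an LLM response.

```python
from itertools import cycle, islice
from typing import List, Tuple

def round_robin_speakers(agents: List[str], turns: int) -> List[str]:
    """Return the speaker for each turn, cycling through agents in fixed order."""
    return list(islice(cycle(agents), turns))

def run_group_chat(agents: List[str], turns: int) -> List[Tuple[str, str]]:
    transcript: List[Tuple[str, str]] = []
    for speaker in round_robin_speakers(agents, turns):
        # In a shared chat, each speaker can read everything said so far.
        message = f"{speaker} replies after reading {len(transcript)} earlier message(s)"
        transcript.append((speaker, message))
    return transcript

order = round_robin_speakers(["fundamentals", "sentiment", "risk"], 5)
chat = run_group_chat(["fundamentals", "sentiment", "risk"], 4)
```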