Evaluation Setup
AutoAgent is evaluated across retrieval-augmented reasoning, tool-use benchmarks, and embodied task environments.
Benchmarks:
- Retrieval-Augmented Reasoning tasks (Information seeking and reasoning)
- Tool-Augmented Agent benchmarks (Complex tool usage)
- Embodied Task Environments (Interactive environment tasks)
Metrics:
- Task success rate
- Tool-use efficiency
- Collaborative robustness
- Statistical methodology: Not explicitly reported in the paper
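As a rough illustration of how the first two metrics might be computed from per-episode logs, here is a minimal sketch. The `Episode` record, the definition of tool-use efficiency as mean tool calls per successful episode, and the Wilson interval are all assumptions made for this example; the paper does not specify its formulas or its statistical methodology.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple


@dataclass
class Episode:
    """Hypothetical log record for one evaluation episode."""
    success: bool
    tool_calls: int


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def calls_per_success(episodes: List[Episode]) -> Optional[float]:
    """Assumed efficiency proxy: mean tool calls over successful episodes only."""
    wins = [e for e in episodes if e.success]
    if not wins:
        return None
    return sum(e.tool_calls for e in wins) / len(wins)


# Toy data, invented for illustration only (not results from the paper).
episodes = [Episode(True, 3), Episode(False, 7), Episode(True, 5), Episode(True, 4)]
rate = success_rate(episodes)
low, high = wilson_interval(sum(e.success for e in episodes), len(episodes))
efficiency = calls_per_success(episodes)
print(f"success rate = {rate:.2f}, interval = ({low:.2f}, {high:.2f})")
print(f"mean tool calls per successful episode = {efficiency:.1f}")
```

Reporting an interval next to the point estimate is one way to make comparisons against baselines easier to interpret when the number of episodes is small.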
Main Takeaways
- AutoAgent achieves consistently higher task success rates than static and memory-augmented baselines.
- The framework demonstrates higher tool-use efficiency, likely due to the evolving Internal Cognition that refines tool preconditions.
- Collaborative robustness is enhanced through External Cognition, which lets agents adapt to peer capabilities better than fixed-role systems do.
- Elastic Memory Orchestration effectively reduces token overhead while retaining critical information for long-horizon reasoning.
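To make the last takeaway more concrete, the sketch below shows one simple way a memory store could cut token overhead while keeping high-value entries. It is not AutoAgent's Elastic Memory Orchestration: `MemoryEntry`, the `importance` score, the whitespace token estimate, and the greedy selection are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryEntry:
    """Hypothetical memory record; importance is an assumed score in [0, 1]."""
    text: str
    importance: float


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); a real tokenizer would differ."""
    return len(text.split())


def trim_to_budget(entries: List[MemoryEntry], budget: int) -> List[MemoryEntry]:
    """Greedily keep the most important entries that fit within the token budget,
    then return them in their original chronological order."""
    ranked = sorted(range(len(entries)),
                    key=lambda i: entries[i].importance,
                    reverse=True)
    kept = set()
    used = 0
    for i in ranked:
        cost = approx_tokens(entries[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [entries[i] for i in sorted(kept)]


# Toy memory, invented for illustration only.
memory = [
    MemoryEntry("user wants flight to Oslo", 0.9),
    MemoryEntry("weather chit chat about rain today", 0.1),
    MemoryEntry("budget limit is 400 euros", 0.8),
]
trimmed = trim_to_budget(memory, budget=10)
for entry in trimmed:
    print(entry.text)
```

A greedy pass like this is easy to reason about but ignores dependencies between entries; a system aimed at long-horizon reasoning would presumably also summarize or compress entries rather than only dropping them.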