AGENTKB: A universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining
disagreement gate: A mechanism that integrates feedback only when the refined plan differs significantly from the original plan, as measured by embedding similarity, which keeps plans stable when the refinement is nearly identical
pass@k: An evaluation metric measuring the percentage of problems for which at least one of k attempts produces a correct solution
hybrid retrieval: A search strategy combining lexical (keyword-based, e.g., BM25) and semantic (embedding-based) similarity scores
heterogeneous agent frameworks: Different software architectures for building AI agents (e.g., smolagents, OpenHands) that typically have incompatible internal representations
smolagents: A lightweight library for building agentic systems
OpenHands: An open-source platform for software development agents
GAIA: A benchmark for General AI Assistants covering reasoning and tool use
SWE-bench: A benchmark for evaluating large language models on software engineering tasks drawn from real GitHub issues
HLE: Humanity's Last Exam, a difficult multi-modal reasoning benchmark
GPQA: A challenging dataset of graduate-level questions in biology, physics, and chemistry
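
The disagreement gate, pass@k, and hybrid retrieval entries above each describe a small computation, so a minimal Python sketch may help make them concrete. Everything here is illustrative rather than the AGENTKB implementation: the function names, the `alpha` mixing weight, and the `threshold` value are assumptions, the lexical and semantic scores are assumed to be already normalized to a comparable range such as [0, 1], and `pass_at_k` uses the simple empirical definition (any of the first k attempts succeeds) rather than a statistical estimator.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_score(lexical: float, semantic: float, alpha: float = 0.5) -> float:
    """Hybrid retrieval: weighted mix of a lexical score (e.g., normalized BM25)
    and a semantic score. `alpha` is a hypothetical mixing weight."""
    return alpha * lexical + (1.0 - alpha) * semantic


def disagreement_gate(original_emb: Sequence[float],
                      refined_emb: Sequence[float],
                      threshold: float = 0.9) -> bool:
    """Return True when the refined plan differs enough from the original
    (similarity below the assumed `threshold`) for feedback to be integrated."""
    return cosine(original_emb, refined_emb) < threshold


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems where at least one of the first k attempts is correct.
    `results[i][j]` is True if attempt j on problem i succeeded."""
    solved = sum(any(attempts[:k]) for attempts in results)
    return solved / len(results)
```

For example, `pass_at_k([[False, True], [False, False], [True, False]], k=2)` counts two of the three problems as solved, and `disagreement_gate` leaves the original plan untouched when the two plan embeddings are identical.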