LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions

📝 Paper Summary

Agent Safety and Reliability
Hallucination Detection and Mitigation

This survey proposes a comprehensive taxonomy for hallucinations in LLM-based agents, identifying five distinct types across the agent workflow (Reasoning, Execution, Perception, Memorization, Communication) and reviewing mitigation strategies.

Core Problem

LLM-based agents suffer from hallucinations that are more complex than simple text generation errors, involving multi-step reasoning failures, tool misuse, and incorrect environmental perception.

Why it matters:

Unlike static LLM hallucinations, agent hallucinations involve 'physically consequential' errors where incorrect actions directly affect real-world task execution and system devices
Existing surveys focus on Natural Language Generation (NLG) hallucinations (factuality/faithfulness), overlooking the compound errors arising from agent modules like perception, memory, and tool use
Errors propagate through long chains: a hallucination in reasoning can cascade into execution and memory, compounding over time

Concrete Example: In a tool-use scenario, an agent might hallucinate a tool call by inventing parameter values that don't exist (Execution Hallucination), or it might correctly select a tool but fail to decompose the user's intent into the necessary sub-steps due to logical fallacies (Reasoning Hallucination).
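
The Execution Hallucination in the example above (a tool call with invented parameters or values) can be illustrated by checking a proposed call against a tool description. This is a minimal sketch and not a method from the survey: `ToolSpec`, `check_tool_call`, the `set_thermostat` tool, and its parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description: parameter name -> allowed values (None = any value)."""
    name: str
    params: dict[str, Optional[set]]


def check_tool_call(call: dict[str, Any], registry: dict[str, ToolSpec]) -> list[str]:
    """Return the problems found in a proposed tool call; an empty list means none were found."""
    spec = registry.get(call["tool"])
    if spec is None:
        return [f"unknown tool: {call['tool']}"]

    args = call.get("args", {})
    problems: list[str] = []
    for key, value in args.items():
        if key not in spec.params:
            # The agent invented a parameter the tool does not accept.
            problems.append(f"unknown parameter: {key}")
            continue
        allowed = spec.params[key]
        if allowed is not None and value not in allowed:
            # The parameter exists, but the value was made up.
            problems.append(f"invalid value for {key}: {value!r}")
    for key in spec.params:
        if key not in args:
            problems.append(f"missing parameter: {key}")
    return problems


# Hypothetical smart-home tool, assumed only for this example.
registry = {
    "set_thermostat": ToolSpec(
        name="set_thermostat",
        params={"room": {"living_room", "bedroom"}, "temperature_c": None},
    )
}
call = {
    "tool": "set_thermostat",
    "args": {"room": "garage", "temperature_c": 21, "mode": "turbo"},
}
problems = check_tool_call(call, registry)
print(problems)
```

A check like this only catches calls that contradict the tool description. It cannot detect a Reasoning Hallucination, where every call is well-formed but the plan itself does not match the user's intent.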

Key Novelty

Internal-External Decomposition Taxonomy

Decomposes agent architecture into 'Internal State' (Belief State) and 'External Behaviors' (Reasoning, Execution, Perception, Memorization, Communication)
Maps specific hallucination types to these workflow stages, distinguishing between cognitive errors (internal) and action/sensory errors (external)
Identifies 18 specific triggering causes underlying these hallucination types, such as 'Tool Documentation Limitation' or 'Sub-intention Disorder'

Evaluation Highlights

Identifies 5 major categories of agent hallucinations: Reasoning, Execution, Perception, Memorization, and Communication
Catalogs 18 distinct triggering causes, including 'Inadequate Subjective Comprehension' and 'Deficient Dependency Modeling'
Reviews over 200 related papers to summarize mitigation strategies across the agent lifecycle

Breakthrough Assessment

9/10

This is the first comprehensive survey specifically targeting hallucinations in agents (vs. general LLMs). It provides a structural framework (the taxonomy) that will likely shape future research in agent safety.

⚙️ Technical Details

Problem Definition

Setting: Partially Observable Markov Decision Process (POMDP) defined as an 8-tuple (S, A, T, G, O, Z, R, γ)

Inputs: Goal g, Observation o, Belief State b_t

Outputs: Action a_t (content generation or tool use)
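
To make the role of the belief state b_t concrete, the sketch below implements the textbook discrete POMDP belief update, b'(s') ∝ Z(o | s', a) · Σ_s T(s' | s, a) · b(s). It assumes the usual reading of the symbols (T as the transition function, Z as the observation function), which is an assumption about the paper's notation. The two-state door example and its probabilities are invented for illustration. An LLM-based agent typically keeps its belief implicitly in context rather than as an explicit distribution, so this shows the formalism, not an agent implementation.

```python
def update_belief(belief, action, observation, T, Z):
    """Textbook discrete belief update.

    belief: dict mapping state -> probability
    T[(s, a)]: dict mapping next state -> probability
    Z[(s_next, a)]: dict mapping observation -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of landing in s_next after taking the action.
        predicted = sum(T[(s, action)].get(s_next, 0.0) * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized[s_next] = Z[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


# Illustrative toy problem (assumed values): looking does not change the door,
# and the sensor reports the true door state 80% of the time.
T = {
    ("open", "look"): {"open": 1.0},
    ("closed", "look"): {"closed": 1.0},
}
Z = {
    ("open", "look"): {"sees_open": 0.8, "sees_closed": 0.2},
    ("closed", "look"): {"sees_open": 0.2, "sees_closed": 0.8},
}
prior = {"open": 0.5, "closed": 0.5}
posterior = update_belief(prior, "look", "sees_open", T, Z)
print(posterior)
```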

Pipeline Flow

Goal Understanding (Reasoning)
Planning Generation (Reasoning)
Tool Selection & Calling (Execution)
Environment Transition & Perception
Memorization & Belief Update
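
The five stages above can be sketched as a single loop. This is a simplified illustration under stated assumptions: `AgentTrace` and `run_episode` are hypothetical names, each stage is passed in as a plain callable standing in for the LLM backbone, tools, or environment, and the plan is generated once instead of being revised after every observation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentTrace:
    """Hypothetical container for what the agent remembers during one episode."""
    goal: str
    history: list = field(default_factory=list)  # (action, observation) pairs


def run_episode(
    goal: str,
    understand: Callable[[str], str],
    plan: Callable[[str], list],
    execute: Callable[[str], str],
    perceive: Callable[[str], str],
) -> AgentTrace:
    trace = AgentTrace(goal=goal)
    intent = understand(goal)                      # 1. Goal Understanding (Reasoning)
    steps = plan(intent)                           # 2. Planning Generation (Reasoning)
    for step in steps:
        feedback = execute(step)                   # 3. Tool Selection & Calling (Execution)
        observation = perceive(feedback)           # 4. Environment Transition & Perception
        trace.history.append((step, observation))  # 5. Memorization & Belief Update
    return trace


# Toy stand-ins for each stage, assumed only for this example.
trace = run_episode(
    goal="Turn on the lamp",
    understand=lambda g: g.lower(),
    plan=lambda intent: ["find_lamp", "switch_on"],
    execute=lambda step: f"raw:{step}:ok",
    perceive=lambda raw: raw.split(":")[-1],
)
print(trace.history)
```

Because each stage consumes the previous stage's output, a wrong value produced early (for example, a misread intent) flows unchanged into every later stage.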

System Modules

Reasoning Module

Infers user intention, decomposes goals into sub-intentions, and generates plans

Model or implementation: LLM backbone

Execution Module

Translates plans into executable actions, selecting tools and populating parameters

Model or implementation: LLM backbone + External Tools

Perception Module

Converts multi-modal environmental feedback into observations

Model or implementation: Multi-modal Encoders / LLM

Memorization Module

Stores actions and observations to update the agent's history

Model or implementation: Vector DB / Context Window
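
As an illustration of the context-window style of Memorization Module, the sketch below keeps only the most recent (action, observation) pairs. `ContextWindowMemory` is a hypothetical class, and the drop-oldest policy is an assumption made for the example, not a design described in the survey.

```python
from collections import deque


class ContextWindowMemory:
    """Hypothetical bounded memory holding the most recent (action, observation) pairs."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._entries = deque(maxlen=max_entries)

    def store(self, action: str, observation: str) -> None:
        # Appending to a full deque silently discards the oldest entry.
        self._entries.append((action, observation))

    def history(self) -> list:
        return list(self._entries)


memory = ContextWindowMemory(max_entries=2)
memory.store("open_app", "app opened")
memory.store("search", "3 results")
memory.store("click_first", "page loaded")
print(memory.history())
```

In this sketch, once the window is full the earliest step is no longer available to later reasoning, so the agent's history is incomplete from that point on.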

Novel Architectural Elements

Taxonomy framework dividing agent into Internal State (Belief) vs. External Behaviors (Reasoning, Execution, Perception, Memorization, Communication)

Comparison to Prior Work

vs. Ji et al.: Focuses on 'Agent' hallucinations (actions/planning) rather than just 'NLG' (text generation) errors
vs. Huang et al.: Expands scope from linguistic factuality to include tool-use errors, perception failures, and multi-agent communication issues
vs. Xi et al. [not cited in paper]: While Xi et al. survey agent architectures generally, this paper specifically targets the *failure modes* (hallucinations) within those architectures

Limitations

Survey nature means no empirical benchmarks are introduced or tested directly in this paper
Taxonomy is theoretical; real-world agents may exhibit hybrid hallucinations that blur these categories
Mitigation strategies are summarized from existing literature rather than proposed as novel algorithms

Reproducibility

Code: https://github.com/ASCII-LAB/Awesome-Agent-Hallucinations

The paper is a survey and does not propose a specific new model to train. However, it provides a GitHub repository (https://github.com/ASCII-LAB/Awesome-Agent-Hallucinations) containing the curated list of 200+ papers reviewed.

📊 Experiments & Results

Evaluation Setup

Qualitative review and taxonomy construction based on existing literature

Metrics: None reported (qualitative survey; no benchmarks are run)

Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Agent hallucinations are fundamentally different from LLM hallucinations: they involve 'human-like behaviors' (misjudgments, fabricated actions) rather than just linguistic errors
The consequences of agent hallucinations are more severe because they involve embodied actions (e.g., smart home control) rather than just text output
Errors propagate: A hallucination in the 'Reasoning' stage (e.g., wrong plan) makes the subsequent 'Execution' stage incorrect even if the tool is used technically correctly
Mitigation requires a holistic approach covering the entire POMDP loop: improving the 'Brain' (reasoning), 'Perception' (grounding), and 'Action' (tool interfaces)
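
The error-propagation takeaway above can be made quantitative with a simple model. The sketch assumes that every step hallucinates independently with the same probability and that a single hallucination spoils the whole chain. Both are simplifying assumptions, and the numbers are illustrative, not results from the survey.

```python
def chain_success_probability(per_step_error: float, steps: int) -> float:
    """Probability that a chain of `steps` steps contains no hallucination,
    assuming independent steps with identical error probability."""
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - per_step_error) ** steps


# Illustrative values only: a 5% per-step error rate over chains of growing length.
for n in (1, 5, 20):
    print(n, round(chain_success_probability(0.05, n), 3))
```

Under these assumptions, a 5% per-step error rate leaves roughly a 36% chance that a 20-step chain completes without any hallucination, which is why long agent workflows are far more fragile than single-turn generation.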

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based Agent architectures (Brain, Perception, Action)
Markov Decision Processes (MDP) and POMDPs
Basic concepts of LLM hallucinations (factuality vs. faithfulness)

Key Terms

POMDP: Partially Observable Markov Decision Process, a mathematical framework for modeling decision-making where the agent cannot directly observe the full state of the environment

Belief State: The agent's internal, subjective representation of the environment state, updated over time based on observations and actions

NLG: Natural Language Generation, the subfield of AI focused on producing human-like text

Factuality Hallucination: Discrepancies between generated content and verifiable real-world facts

Faithfulness Hallucination: Deviations of the generated output from the user's original input or source context

Intention Decomposition: The process where an agent breaks down a complex high-level goal into a sequence of manageable sub-intentions

Tool Solvability: Whether a generated plan can actually be executed given the available tools and current conditions

Broadcasting: In multi-agent systems, the process of an agent sending messages to neighboring nodes/agents according to a plan

Structure Evolution: The dynamic update of the communication structure among agents in a multi-agent system
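
The Tool Solvability term defined above can be illustrated with a small check that a plan only uses available tools and that each step's required conditions currently hold. This is a hedged sketch: `find_unsolvable_steps`, the plan format (a list of dicts with `tool` and optional `requires` keys), and the flight-booking tools are hypothetical and assumed only for this example.

```python
def find_unsolvable_steps(plan, available_tools, conditions):
    """Return (step index, reason) for every plan step that cannot be executed.

    plan: list of dicts with a "tool" key and an optional "requires" list
    available_tools: set of tool names the agent can actually call
    conditions: set of conditions that currently hold
    """
    problems = []
    for index, step in enumerate(plan):
        if step["tool"] not in available_tools:
            problems.append((index, f"tool not available: {step['tool']}"))
            continue
        missing = sorted(set(step.get("requires", ())) - conditions)
        if missing:
            problems.append((index, "unmet conditions: " + ", ".join(missing)))
    return problems


# Hypothetical plan and environment, assumed only for this example.
plan = [
    {"tool": "search_flights", "requires": ["network"]},
    {"tool": "book_flight", "requires": ["network", "payment_method"]},
    {"tool": "send_fax"},
]
issues = find_unsolvable_steps(
    plan,
    available_tools={"search_flights", "book_flight"},
    conditions={"network"},
)
print(issues)
```

A plan with an empty result is solvable in the sense of this sketch. Any non-empty result marks steps the agent would have to replan before execution.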