QSAF: A Novel Mitigation Framework for Cognitive Degradation in Agentic AI

📝 Paper Summary

Agentic AI Safety, Runtime Security, Cognitive Architecture

The paper defines Cognitive Degradation as a distinct vulnerability class in AI agents and proposes QSAF Domain 10, a lifecycle-aware framework with runtime controls to detect and mitigate internal system failures.

Core Problem

Autonomous AI agents suffer from internal runtime failures, such as memory starvation, planner recursion, and context flooding, that lead to silent drift and hallucinations. Defenses designed for external prompt injection do not detect these failures.

Why it matters:

Current defenses focus on external threats (prompt injection) while internal cognitive failures (e.g., logic loops, memory poisoning) remain largely unaddressed.
Agentic frameworks like LangChain and AutoGPT introduce complex dependencies where a failure in one module (e.g., memory latency) cascades into systemic collapse.
There is no existing structured lifecycle model to identify the progressive stages of agent degradation before total system failure occurs.

Concrete Example: When LLaMA3 is prompted with 'You must keep refining this task until it is perfect. Don't stop,' it enters a recursive loop generating self-referential subtasks. Without the proposed starvation detection, the planner degrades until memory and output modules fail completely.
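The recursive-refinement failure in this example can be illustrated with a small guard that watches the subtasks a planner emits. This is only a sketch under stated assumptions: the paper does not publish its detection logic, and `PlannerLoopGuard`, `normalize`, and both thresholds are hypothetical names and values chosen for illustration.

```python
from collections import Counter
import re


def normalize(subtask: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical subtasks match."""
    return re.sub(r"[^a-z0-9]+", " ", subtask.lower()).strip()


class PlannerLoopGuard:
    """Hypothetical guard: signals a stop when subtasks repeat or depth runs away."""

    def __init__(self, max_repeats: int = 3, max_depth: int = 25):
        self.max_repeats = max_repeats  # illustrative threshold, not from the paper
        self.max_depth = max_depth      # illustrative threshold, not from the paper
        self.seen = Counter()
        self.depth = 0

    def allow(self, subtask: str) -> bool:
        """Return False once the planner should be halted."""
        self.depth += 1
        key = normalize(subtask)
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats and self.depth <= self.max_depth


guard = PlannerLoopGuard(max_repeats=2)
steps = ["Refine the draft", "refine the draft!", "Refine  the draft", "Ship it"]
decisions = [guard.allow(s) for s in steps]
print(decisions)  # [True, True, False, True]
```

In practice the caller would stop planning at the first `False` rather than continue, and exact-match counting would miss paraphrased loops; a similarity measure over subtask text would be one way to extend it.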

Key Novelty

Qorvex Security AI Framework (QSAF) Domain 10

Defines a formal six-stage 'Cognitive Degradation Lifecycle' (from Trigger Injection to Systemic Collapse) to model how internal agent faults evolve.
Introduces seven specific runtime controls (QSAF-BC-001 to BC-007) that act as a resilience overlay, monitoring subsystems for signals like latency spikes or entropy drift to trigger fallback logic.
Maps agentic architectures to human cognitive analogs to enable behavioral introspection, moving beyond simple input/output filtering.
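One of the signals named above is entropy drift. The summary does not say how the paper measures it, so the following sketch assumes a simple definition: the change in Shannon entropy of the output token distribution relative to a baseline window. The function names and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


def token_entropy(tokens: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_drift(baseline: list[str], current: list[str]) -> float:
    """Signed change in entropy of the current window relative to the baseline."""
    return token_entropy(current) - token_entropy(baseline)


# Whitespace tokenization is a stand-in for the model's real tokenizer.
healthy = "the report covers revenue costs and staffing for each region".split()
looping = "refine refine refine refine the task refine refine refine refine".split()
drift = entropy_drift(healthy, looping)
print(drift < 0)  # True: repetitive output has lower entropy than the baseline
```

A monitor built on this would compare `drift` against a tuned threshold; a strongly negative value suggests repetitive output of the kind produced by looping.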

Architecture

The QSAF Domain 10 Architecture, illustrating the overlay of security controls on top of standard agent subsystems.

Evaluation Highlights

Identified critical 'Planner Entrapment' vulnerability in LLaMA3 where recursive goals caused infinite logic loops, undetected by default safety layers.
Demonstrated 'Persistent Memory Drift' in Mixtral and Claude, where hallucinated content was stored in vector memory and reused across sessions (Cross-Session Memory Poisoning).
Uncovered 'Output Suppression' risks in ChatGPT, which failed to warn users when toolchains returned null responses due to rate-limiting.
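The output-suppression finding suggests a simple countermeasure: never let an empty tool result pass silently. The wrapper below is a hedged sketch, not the paper's control; `call_tool_with_notice` and `rate_limited_tool` are hypothetical names, and the warning strings are placeholders.

```python
from typing import Callable, Optional


def call_tool_with_notice(tool: Callable[[str], Optional[str]], query: str) -> str:
    """Run a tool and surface empty or failed results instead of dropping them."""
    try:
        result = tool(query)
    except Exception as exc:  # e.g. a rate-limit error raised by a client library
        return f"[tool warning] call failed: {exc}"
    if result is None or not result.strip():
        return "[tool warning] tool returned no data; answer may be incomplete"
    return result


def rate_limited_tool(query: str) -> Optional[str]:
    """Stand-in for a throttled tool that returns None (hypothetical)."""
    return None


message = call_tool_with_notice(rate_limited_tool, "latest sales figures")
print(message)
```

The design choice here is that the warning travels in the same channel as the result, so downstream prompt assembly cannot omit it by accident.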

Breakthrough Assessment

7/10

Establishes a necessary new vulnerability class for agents and a structured defense framework. However, the paper is qualitative, lacking code or quantitative performance metrics (e.g., overhead, detection accuracy) for the proposed controls.

⚙️ Technical Details

Problem Definition

Setting: Runtime security monitoring of autonomous agentic systems composed of perception, memory, planning, and tool execution modules.

Inputs: Telemetry signals from agent subsystems (latency, token usage, output entropy, memory access logs).

Outputs: Real-time mitigation actions (fallback routing, starvation detection, memory integrity enforcement, session reset).

Pipeline Flow

Agent Subsystems (Perception, Memory, Planning, Tool, Output) generate telemetry
QSAF Monitoring Layer (Health Probes, Starvation Monitors, Token Guards) analyzes signals
Lifecycle State Monitor classifies current degradation stage (1-6)
QSAF-BC Control Layer triggers specific mitigation (e.g., BC-004 for loops)
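The four pipeline steps above can be wired together in a few lines. This is a sketch under heavy assumptions: the field names, every threshold, and the mapping from signal count to stage are placeholders, and only the pairing of BC-004 with loops comes from the summary. With four illustrative signals, only stages 1 to 4 are reachable; a real classifier would cover all six.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Telemetry:
    """Signals emitted by agent subsystems; field names and units are assumptions."""
    latency_ms: float
    tokens_used: int
    output_entropy: float
    repeated_subtasks: int


def tripped_signals(t: Telemetry) -> list[str]:
    """Monitoring layer: rule-based checks with illustrative thresholds."""
    checks = {
        "latency_spike": t.latency_ms > 2000,
        "token_pressure": t.tokens_used > 7000,
        "entropy_drift": t.output_entropy < 1.5,
        "planner_loop": t.repeated_subtasks >= 3,
    }
    return [name for name, hit in checks.items() if hit]


def classify_stage(signals: list[str]) -> Optional[int]:
    """Lifecycle monitor: placeholder mapping from signal count to a stage (1-6)."""
    return min(6, len(signals)) if signals else None


def select_controls(stage: Optional[int], signals: list[str]) -> list[str]:
    """Control layer: only the BC-004/loop pairing is taken from the summary."""
    controls = []
    if "planner_loop" in signals:
        controls.append("QSAF-BC-004")
    if stage is not None and stage >= 4:
        controls.append("session_reset")  # one of the listed mitigation outputs
    return controls


sample = Telemetry(latency_ms=3500, tokens_used=7600, output_entropy=0.9, repeated_subtasks=4)
signals = tripped_signals(sample)
stage = classify_stage(signals)
actions = select_controls(stage, signals)
print(stage, actions)  # 4 ['QSAF-BC-004', 'session_reset']
```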

System Modules

Health Probes (Monitoring Layer)

Perform continuous liveness checks and monitor timeouts

Model or implementation: Heuristic/Rule-based monitor
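A liveness check with a timeout, as described for the health probes, might look like the sketch below. The `probe` helper and the two example checks are hypothetical; the paper does not describe its probe implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def probe(check: Callable[[], bool], timeout_s: float) -> str:
    """Run one liveness check; report 'ok', 'failed', or 'timeout'."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        return "ok" if future.result(timeout=timeout_s) else "failed"
    except FutureTimeout:
        return "timeout"
    except Exception:
        return "failed"
    finally:
        # Do not block on a hung check; the worker finishes in the background.
        pool.shutdown(wait=False)


def fast_check() -> bool:
    """Stand-in for a healthy subsystem (hypothetical)."""
    return True


def slow_check() -> bool:
    """Stand-in for a subsystem that responds too slowly (hypothetical)."""
    time.sleep(0.3)
    return True


statuses = {"memory": probe(fast_check, 1.0), "tool_api": probe(slow_check, 0.05)}
print(statuses)  # {'memory': 'ok', 'tool_api': 'timeout'}
```

A thread cannot be forcibly cancelled in Python, so a check that hangs forever would leak a worker; a production probe would more likely rely on timeouts native to the client library being checked.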

Starvation Monitors (Monitoring Layer)

Detect latency spikes and request bottlenecks in memory/API calls

Model or implementation: Heuristic/Rule-based monitor
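Latency-spike detection of the kind attributed to the starvation monitors can be sketched with a rolling baseline and a z-score test. The class name, window size, and threshold are assumptions; the summary states only that the monitors are heuristic or rule-based.

```python
from collections import deque
from statistics import mean, pstdev


class StarvationMonitor:
    """Flags a latency sample that sits far above the recent rolling baseline."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, latency_ms: float) -> bool:
        """Record a sample; return True if it looks like a spike."""
        spike = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            spread = pstdev(self.samples) or 1.0  # avoid division by zero on flat data
            spike = (latency_ms - baseline) / spread > self.z_threshold
        if not spike:
            # Keep spikes out of the baseline so sustained starvation stays visible.
            self.samples.append(latency_ms)
        return spike


monitor = StarvationMonitor()
readings = [40, 42, 39, 41, 43, 40, 900, 41]
flags = [monitor.observe(r) for r in readings]
print(flags)  # [False, False, False, False, False, False, True, False]
```

Excluding flagged samples from the window is a deliberate choice: if spikes were absorbed into the baseline, a memory backend that stays slow would stop being reported after a few calls.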

Lifecycle State Monitor

Map telemetry signals to one of six degradation stages

Model or implementation: Classifier (Unspecified architecture)

QSAF-BC Control Layer

Trigger controls (BC-001 to BC-007) based on lifecycle state

Model or implementation: Policy Engine

Novel Architectural Elements

Overlay architecture that does not modify the base agent model but wraps it in a 'lifecycle-aware' security layer
Formal six-stage degradation state machine driving mitigation logic
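A six-stage state machine driving mitigation could be sketched as follows. Only the first and last stage names appear in this summary, so the middle stages are left as numbered placeholders. The escalation rule (one stage per anomalous tick) and the stage-to-action mapping are assumptions; the action names are taken from the Outputs line above.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Six lifecycle stages; only the first and last names are given in the summary."""
    TRIGGER_INJECTION = 1
    STAGE_2 = 2
    STAGE_3 = 3
    STAGE_4 = 4
    STAGE_5 = 5
    SYSTEMIC_COLLAPSE = 6


class DegradationStateMachine:
    """Hypothetical driver: escalates one stage per anomalous tick, resets on demand."""

    def __init__(self) -> None:
        self.stage: Optional[Stage] = None  # None means no degradation observed

    def tick(self, anomaly: bool) -> Optional[Stage]:
        """Advance by at most one stage when an anomaly is reported."""
        if anomaly:
            if self.stage is None:
                self.stage = Stage.TRIGGER_INJECTION
            elif self.stage < Stage.SYSTEMIC_COLLAPSE:
                self.stage = Stage(self.stage + 1)
        return self.stage

    def mitigation(self) -> str:
        """Illustrative policy mapping a stage to a mitigation action."""
        if self.stage is None:
            return "none"
        if self.stage <= Stage.STAGE_2:
            return "starvation detection"
        if self.stage <= Stage.STAGE_4:
            return "fallback routing"
        return "session reset"

    def reset(self) -> None:
        """Return to the healthy state, e.g. after a session reset."""
        self.stage = None


machine = DegradationStateMachine()
ticks = [False, True, True, True, False, True, True]
history = [(machine.tick(a), machine.mitigation()) for a in ticks]
print(history[-1])
```

After the seven ticks above the machine sits at stage 5 and recommends a session reset; calling `machine.reset()` returns it to the healthy state.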

Modeling

Base Model: Model-agnostic framework (failure modes observed on LLaMA3, Mixtral 8x7B, ChatGPT, and Claude)

Compute: Not reported in the paper

Comparison to Prior Work

vs. OWASP/Constitutional AI: QSAF addresses internal runtime degradation (loops, starvation) rather than just input/output safety.
vs. OpenAI SafetyKit: QSAF operates in real-time with lifecycle awareness, whereas SafetyKit often relies on post-hoc moderation.
vs. NeMo Guardrails [not cited in paper]: NeMo enforces dialog flows; QSAF focuses on lower-level cognitive resource management (memory, tokens, planner health).

Limitations

No quantitative performance metrics (detection accuracy, false positive rates) provided for the proposed controls.
The framework is proprietary, limiting external validation of the architecture.
Testing was limited to qualitative observations of failure modes on 400+ prompts, without large-scale statistical benchmarking.

Reproducibility

No replication artifacts mentioned in the paper. The framework (QSAF) is described as 'proprietary, enterprise-grade'. Code, weights, and specific prompt templates for the controls are not provided.

📊 Experiments & Results

Evaluation Setup

Structured testing across 400+ prompts to induce cognitive degradation and observe failure modes.

Benchmarks:

Custom Degradation Prompts [New]: adversarial capability testing (Context Flooding, Tool Starvation, Planner Entrapment)

Metrics:

Qualitative observation of system behavior (Drift, Collapse, Hallucination)
Presence of recursive loops
Persistence of memory poisoning
Statistical methodology: Not explicitly reported in the paper

Experiment Figures

The Six-Stage Cognitive Degradation Lifecycle.

Main Takeaways

Cognitive degradation is a replicable vulnerability class: LLaMA3, Mixtral, and Claude all exhibited specific failures (loops, poisoning) when subjected to resource pressure or recursive prompts.
Existing platforms lack internal introspection: Major LLM platforms (ChatGPT, Claude) failed to detect 'silent' failures like output suppression or cross-session memory drift.
Memory is a critical attack surface: Hallucinations generated during degraded states were successfully stored and retrieved in subsequent sessions (Mixtral/Claude), confirming the need for memory integrity controls (BC-007).
Planner collapse is distinct from prompt injection: Recursive logic failures in LLaMA3 occurred due to internal planning defects, not adversarial role-play, validating the need for the QSAF planner monitoring (BC-004).
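The memory takeaway above points to a write gate in front of persistent memory. The sketch below is one possible shape for such a gate, not the paper's BC-007 control: it refuses model-generated content while the agent is flagged as degraded and signs each entry so later sessions can verify it. `GuardedMemory`, `SECRET`, and the `source` labels are hypothetical, and a plain list stands in for a vector store.

```python
import hashlib
import hmac
from dataclasses import dataclass

SECRET = b"demo-key"  # assumption: a per-deployment key kept outside agent memory


@dataclass
class MemoryEntry:
    text: str
    source: str      # e.g. "tool:billing" or "model" (labels are illustrative)
    signature: str


def sign(text: str, source: str) -> str:
    """HMAC-SHA256 over the source label and the text."""
    return hmac.new(SECRET, f"{source}\n{text}".encode(), hashlib.sha256).hexdigest()


class GuardedMemory:
    """Hypothetical write gate in front of a vector store (a plain list here)."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def write(self, text: str, source: str, degraded: bool) -> bool:
        """Refuse unverified model output produced while the agent is degraded."""
        if degraded and source == "model":
            return False
        self.entries.append(MemoryEntry(text, source, sign(text, source)))
        return True

    def read_verified(self) -> list[str]:
        """Return only entries whose signature still matches their content."""
        return [e.text for e in self.entries
                if hmac.compare_digest(e.signature, sign(e.text, e.source))]


memory = GuardedMemory()
accepted = memory.write("Invoice 17 is paid", source="tool:billing", degraded=False)
blocked = memory.write("Invoice 18 is paid", source="model", degraded=True)
memory.write("Customer prefers email", source="model", degraded=False)
memory.entries[1].text = "Customer prefers fax"  # simulate out-of-band tampering
trusted = memory.read_verified()
print(accepted, blocked, trusted)  # True False ['Invoice 17 is paid']
```

Signing catches tampering after the write but not a hallucination stored while the agent looked healthy, which is why the gate also consults the degradation flag at write time.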

📚 Prerequisite Knowledge

Prerequisites

Understanding of Agentic AI architectures (e.g., LangChain, AutoGPT)
Knowledge of RAG (Retrieval-Augmented Generation) systems
Familiarity with cybersecurity vulnerability classes (e.g., Prompt Injection)

Key Terms

Cognitive Degradation: A vulnerability class where agentic systems progressively fail due to internal resource exhaustion, logic loops, or memory corruption, distinct from external attacks.

QSAF: Qorvex Security AI Framework—an enterprise-grade security framework with specific domains for AI protection.

RAG: Retrieval-Augmented Generation—systems that fetch external data to ground LLM responses.

Planner Recursion: A failure mode where the planning module generates infinite or circular subtasks, preventing task completion.

Memory Starvation: A state where the memory module (e.g., Vector DB) becomes unresponsive or slow to respond, forcing the agent to hallucinate or skip context retrieval.

LPCI: Logic-layer Prompt Control Injection—attacks embedding delayed payloads in memory or tool outputs to evade filters.

MAESTRO: A tactical framework for adversarial attacks on AI systems, referenced in the paper for classifying attack vectors.