
Reflection-Driven Control for Trustworthy Code Agents

Bin Wang, Jiazheng Quan, Xingrui Yu, Hansen Hu, Yuhao, Ivor Tsang
arXiv (2025)
Agent Memory RAG Reasoning

📝 Paper Summary

Agentic Security, Code Generation Agents, Memory-augmented Agents
Reflection-Driven Control enhances agent safety by embedding a standardized reflection loop that uses lightweight checks and retrieved repair examples to intercept and fix unsafe code during generation.
Core Problem
Autonomous LLM agents often generate unsafe, unconstrained, or hallucinated code, and existing safety controls are typically post-hoc patches that are not integrated into the agent's internal reasoning process.
Why it matters:
  • Jailbreaks and prompt injections in autonomous agents can lead to system-level risks like hazardous tool calls or agent worms
  • Current workflows lack auditability, making it difficult to trace the evidential basis of an agent's decision or repair logic
  • Agents need to balance autonomy with strict safety compliance without incurring prohibitive computational overhead
Concrete Example: When an agent generates code containing a SQL injection vulnerability, a standard agent might commit the code as-is or depend on an external scanner to catch it later. The proposed system instead issues an internal 'UNSAFE' verdict, retrieves a secure coding guideline from memory, and forces the agent to rewrite the query in parameterized form before final output.
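The before/after of this example can be sketched as follows. This is an illustration only, assuming Python's standard sqlite3 module and a hypothetical users table; the function names are invented here and do not come from the paper.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated directly into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Repaired: the value is bound as a parameter and never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # injected condition matches every row
safe_rows = find_user_safe(conn, payload)      # payload is compared as a literal name
```

With the injected payload, the concatenated query returns both rows, while the parameterized query returns none because no user has that literal name.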
Key Novelty
Standardized Reflex Module (Plan–Reflect–Verify)
  • Elevates reflection from an external post-processing step to a first-class internal control circuit that interrupts the generation loop when risks are detected
  • Utilizes a dual-layer Reflective Memory (dynamic past repairs + static security standards) to ground self-correction in verifiable evidence
  • Implements a 'Lightweight Self-Checker' to route only risky code through the expensive reflection process, minimizing overhead for safe outputs
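The three points above can be sketched as one control loop. This is a rough illustration under stated assumptions, not the paper's implementation: the regex rules, the ReflectiveMemory class, the repair callback standing in for the LLM's self-correction, and the 'SAFE' and 'UNRESOLVED' labels are all hypothetical; only the 'UNSAFE' verdict and the dynamic/static memory split come from the summary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules for the lightweight check: risk label -> regex over code.
RISK_RULES = {
    "sql_injection": re.compile(r"execute\([^)]*\+"),   # string concatenation in execute()
    "shell_injection": re.compile(r"shell\s*=\s*True"),
}

@dataclass
class ReflectiveMemory:
    static_rules: dict                                   # static layer: security standards
    dynamic_repairs: dict = field(default_factory=dict)  # dynamic layer: past repairs

    def retrieve(self, risk):
        # Past repairs come first, then the static guideline for this risk.
        evidence = list(self.dynamic_repairs.get(risk, []))
        if risk in self.static_rules:
            evidence.append(self.static_rules[risk])
        return evidence

    def record(self, risk, repaired_code):
        self.dynamic_repairs.setdefault(risk, []).append(repaired_code)

def light_check(code):
    """Cheap first pass: return the first matching risk label, or None."""
    for label, pattern in RISK_RULES.items():
        if pattern.search(code):
            return label
    return None

def reflex_step(code, memory, repair, max_rounds=2):
    """Route only risky code through reflection; return (code, audit trail)."""
    audit = []
    for _ in range(max_rounds):
        risk = light_check(code)
        if risk is None:
            audit.append(("SAFE", None, []))
            return code, audit
        evidence = memory.retrieve(risk)
        audit.append(("UNSAFE", risk, evidence))
        code = repair(code, risk, evidence)   # reflect: evidence-grounded rewrite
        if light_check(code) is None:         # verify before storing the repair
            memory.record(risk, code)
    verdict = "SAFE" if light_check(code) is None else "UNRESOLVED"
    audit.append((verdict, None, []))
    return code, audit

def toy_repair(code, risk, evidence):
    # Stand-in for the model: always returns a parameterized query.
    return 'cur.execute("SELECT * FROM users WHERE name = ?", (name,))'

memory = ReflectiveMemory(static_rules={"sql_injection": "Use parameterized queries."})
unsafe = 'cur.execute("SELECT * FROM users WHERE name = \'" + name + "\'")'
fixed, audit = reflex_step(unsafe, memory, toy_repair)
```

Safe code exits at the first check without any retrieval or repair call, which is where the overhead saving would come from. Each audit entry keeps the verdict together with the evidence that was retrieved, so the basis for a repair can be traced afterwards.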
Architecture
Figure 1: The Reflex Agent architecture, contrasting the standardized module (left) with the integrated agent workflow (right).
Breakthrough Assessment
7/10
Proposes a practical, architectural solution to agent safety that balances cost and control. While the core concept of reflection is known, the standardized modular implementation and evidence-grounded memory loop are strong contributions to trustworthy AI.