
ProgAgent: A Continual RL Agent with Progress-Aware Rewards

Jinzhou Tan, Gabriel Adineera, Jinoh Kim
University of California, San Diego; Texas A&M University-Commerce
arXiv (2026)
Agent, RL, Memory, MM

📝 Paper Summary

Continual Reinforcement Learning (CRL), Visual Reward Learning, Robotic Manipulation
ProgAgent unifies progress-based visual reward learning with a high-throughput JAX architecture to enable scalable continual robot learning without manual rewards or catastrophic forgetting.
Core Problem
Lifelong robotic learning suffers from catastrophic forgetting of past skills and the impracticality of manually designing dense rewards for every new task.
Why it matters:
  • Adapting to new tasks typically overwrites prior capabilities, preventing long-term autonomy in dynamic environments
  • Crafting dense, shaped rewards for complex manipulation is labor-intensive and does not scale to open-world settings
  • Prior methods treat reward learning and continual learning systems as separate problems, leading to inefficiencies and brittleness under distribution shift
Concrete Example: In a sequence of manipulation tasks, an agent might learn to open a door but forget this skill when it later learns to pick up an object. Furthermore, existing visual reward models often assign confidently high rewards (false positives) to novel, non-expert states encountered during exploration, which derails learning.
Key Novelty
Unified Progress-Aware JAX-Native Agent
  • Conceptualizes reward as a learned state-potential function derived from video progress, ensuring theoretically grounded shaping that aligns with expert trajectories
  • Incorporates an adversarial push-back mechanism that regularizes the reward model on exploratory data, preventing overconfidence on out-of-distribution states
  • Embeds the entire loop—data collection, reward updates, and policy optimization—into a fully JIT-compiled JAX pipeline for massive parallelization
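The push-back idea in the second bullet can be illustrated with a toy loss. This is a minimal sketch under stated assumptions, not the paper's objective: the function name `progress_loss`, the squared-error form, and the `pushback_weight` parameter are all hypothetical. It only shows the general pattern of fitting progress targets on expert frames while penalizing high predicted progress on exploratory frames.

```python
from typing import Sequence


def progress_loss(
    expert_pred: Sequence[float],
    expert_target: Sequence[float],
    explore_pred: Sequence[float],
    pushback_weight: float = 0.5,  # hypothetical hyperparameter
) -> float:
    """Toy progress-model loss with a push-back regularizer (illustrative only)."""
    # Fit term: regress predicted progress onto targets for expert video frames.
    fit = sum((p - y) ** 2 for p, y in zip(expert_pred, expert_target)) / len(expert_pred)
    # Push-back term (assumed form): penalize positive predicted progress on
    # frames collected by the exploring policy, discouraging overconfidence
    # on out-of-distribution states.
    push = sum(max(p, 0.0) ** 2 for p in explore_pred) / len(explore_pred)
    return fit + pushback_weight * push


# Expert frames are fit perfectly, so only the push-back term contributes.
expert_pred = [0.0, 0.5, 1.0]
expert_target = [0.0, 0.5, 1.0]

overconfident = progress_loss(expert_pred, expert_target, explore_pred=[0.9, 0.8])
cautious = progress_loss(expert_pred, expert_target, explore_pred=[0.1, 0.0])
```

In this sketch, a model that predicts high progress on exploratory frames (`overconfident`) incurs a larger loss than one that stays low on them (`cautious`), which is the qualitative behavior the bullet describes.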
Breakthrough Assessment
8/10
Proposes a strong theoretical link between visual progress and potential-based shaping, combined with a modern systems approach (JAX) that enables computationally expensive continual learning techniques.
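The link between progress and potential-based shaping can be made concrete with the standard shaping form r_t = γ·Φ(s_{t+1}) − Φ(s_t), where the potential Φ is taken to be the estimated task progress. The sketch below assumes that form; the progress values, the discount factor, and the helper names are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def shaped_rewards(progress: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Potential-based shaping with progress as the potential.

    progress[t] plays the role of Phi(s_t); the reward for the transition
    s_t -> s_{t+1} is gamma * Phi(s_{t+1}) - Phi(s_t).
    """
    return [gamma * progress[t + 1] - progress[t] for t in range(len(progress) - 1)]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical progress estimates in [0, 1] along one trajectory.
# The dip from 0.35 to 0.30 represents a step that moves away from the goal.
progress = [0.0, 0.1, 0.35, 0.3, 0.7, 1.0]
gamma = 0.99

rewards = shaped_rewards(progress, gamma)
total = discounted_return(rewards, gamma)

# The discounted sum telescopes to gamma**T * Phi(s_T) - Phi(s_0), so the
# return depends only on the start and end potentials, not on the path.
telescoped = gamma ** (len(progress) - 1) * progress[-1] - progress[0]
```

Steps that lose progress receive a negative reward (the fourth entry of `progress` here), while the total return telescopes to a function of the endpoints, which is the property that makes potential-based shaping well behaved.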