
Provably Efficient CVaR RL in Low-rank MDPs

Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee
Princeton University, The Chinese University of Hong Kong, Cornell University
arXiv (2023)
RL

📝 Paper Summary

Risk-Sensitive Reinforcement Learning · Low-Rank MDPs · Exploration in RL
ELA and ELLA are the first algorithms to efficiently optimize Conditional Value at Risk (CVaR) in Low-Rank MDPs, balancing exploration, representation learning, and risk-averse planning.
Core Problem
Standard RL maximizes expected return and ignores catastrophic tail risks, while existing risk-sensitive algorithms that optimize CVaR are restricted to small tabular settings and cannot handle the large state spaces that require function approximation.
Why it matters:
  • High-stakes applications like autonomous driving, finance, and healthcare require avoiding rare but catastrophic failures, which expected value maximization ignores
  • Current risk-sensitive theoretical guarantees do not scale to real-world problems with large or infinite state spaces where function approximation is necessary
  • Planning for CVaR is computationally harder than standard RL due to the non-linear objective and the need to manage a dynamic risk budget
Concrete Example: In autonomous driving, a standard RL agent might maximize average speed by occasionally making risky overtakes that rarely crash but are fatal when they do. A CVaR-optimizing agent would avoid these actions to improve the worst-case tail outcomes, but existing methods can't learn this in complex visual environments (large state space).
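The driving example above can be made concrete with a small numerical sketch. The snippet below (illustrative only; the policy names and crash probabilities are invented for this example, not taken from the paper) estimates the empirical CVaR at level τ as the average of the worst τ-fraction of sampled returns, and shows how a policy with a higher mean can still have a far worse tail:

```python
import numpy as np

def empirical_cvar(returns, tau):
    """Average of the worst tau-fraction of sampled returns.

    For a maximization problem, CVaR_tau focuses on the lower tail:
    it is approximately the mean of the ceil(tau * N) smallest samples.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(tau * len(returns))))
    return returns[:k].mean()

# Toy example: a "fast but risky" policy vs a "safe" policy.
rng = np.random.default_rng(0)
risky = np.where(rng.random(10_000) < 0.01, -100.0, 10.0)  # 1% catastrophic crash
safe = np.full(10_000, 8.0)                                # slower, never crashes

# The risky policy wins on expected return (~8.9 vs 8.0),
# but its 5%-CVaR is dominated by the rare crashes and is negative.
print("risky:", empirical_cvar(risky, 0.05), risky.mean())
print("safe: ", empirical_cvar(safe, 0.05), safe.mean())
```

A CVaR-optimizing agent at τ = 0.05 would prefer the safe policy even though it loses on the mean, which is exactly the behavior expected-value RL cannot express.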
Key Novelty
Representation Learning for CVaR (ELA) & CVaR-LSVI Planning
  • Extends the augmented MDP framework (treating risk budget as a state) to Low-Rank MDPs, learning unknown transition representations via Maximum Likelihood Estimation (MLE)
  • Introduces a bonus-driven exploration mechanism specifically designed for the CVaR objective to balance discovering new states and mitigating worst-case risks
  • Proposes a computationally efficient planning oracle (CVaR-LSVI) that discretizes the risk budget and uses Least Squares Value Iteration to avoid enumerating the infinite state space
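The augmented-MDP idea in the bullets above can be sketched in a few lines. This is a minimal, hypothetical illustration of the bookkeeping only (the actual CVaR-LSVI oracle also involves learned low-rank features, an MLE-fit transition model, and exploration bonuses): the remaining risk budget is carried alongside the state, shrinks by the reward earned at each step, and is snapped to a finite grid so value iteration never has to handle a continuous budget dimension:

```python
import numpy as np

def discretize_budget(b, grid):
    """Snap a continuous risk budget b to the nearest point of a finite grid."""
    return grid[np.argmin(np.abs(grid - b))]

# Hypothetical setup: horizon H, rewards in [0, 1], so budgets live in [0, H].
H = 5
grid = np.linspace(0.0, H, num=51)  # 51 grid points, spacing 0.1

b = 3.0   # current budget: return threshold still to be covered
r = 0.7   # reward received at this step

# Augmented-state transition: the budget component decreases by the reward,
# then is projected back onto the grid before the next planning step.
b_next = discretize_budget(b - r, grid)
```

Planning over (state, discretized budget) pairs is what lets the oracle use Least Squares Value Iteration instead of enumerating an infinite state space, at the cost of a controlled discretization error in the budget dimension.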
Evaluation Highlights
  • Achieves sample complexity of Õ(H^7 A^2 d^4 / (τ^2 ϵ^2)) to find an ϵ-optimal policy (H the horizon, A the number of actions, d the representation dimension, τ the CVaR risk tolerance), the first such bound for CVaR RL with function approximation
  • Proves that the proposed ELLA algorithm requires only polynomial running time and polynomially many calls to an MLE oracle
  • Demonstrates that sample complexity depends on representation dimension d rather than the cardinality of the state space |S|, enabling scaling to infinite state spaces
Breakthrough Assessment
8/10
Significant theoretical advance as the first provably efficient algorithm for CVaR RL in the function approximation setting (Low-Rank MDPs). Bridges the gap between risk-sensitive RL theory and modern RL with large state spaces.