Evaluation Setup
Theoretical analysis of sample-complexity and regret bounds (no empirical benchmarks; results are proved, not measured)
Metrics:
- Sample Complexity (number of episodes needed to reach an ϵ-optimal policy)
- Regret (cumulative suboptimality accumulated over K episodes)
- Statistical methodology: mathematical proofs yielding bounds that hold with probability at least 1 − δ
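To make the two metrics concrete, here is a minimal sketch (not from the paper) of how empirical CVaR at level τ and cumulative regret would be computed from observed episode returns:

```python
import numpy as np

def empirical_cvar(returns, tau):
    """Empirical CVaR at level tau: mean of the worst tau-fraction of returns."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(tau * len(returns))))
    return returns[:k].mean()

def cumulative_regret(optimal_value, episode_values):
    """Cumulative suboptimality over K episodes against the optimal value."""
    return float(sum(optimal_value - v for v in episode_values))

# Returns from K = 8 episodes; tau = 0.25 keeps the worst 2 returns.
vals = [1.0, 3.0, 2.0, 5.0, 0.5, 4.0, 2.5, 3.5]
print(empirical_cvar(vals, 0.25))    # mean of {0.5, 1.0} = 0.75
print(cumulative_regret(5.0, vals))  # sum of (5 - v) over episodes = 18.5
```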
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Theoretical Bound | Episodes | Not reported in the paper | Õ(H^7 A^2 d^4 / (τ^2 ϵ^2)) | Not reported in the paper |
| Theoretical Bound | Cumulative Regret | Not reported in the paper | Õ(τ^(-1) H^3 A d^2 √K) | Not reported in the paper |
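The scaling of the two bounds above can be read off numerically. In the sketch below, the function names are illustrative and the logarithmic factors hidden by Õ(·) are dropped:

```python
import math

def sample_complexity(H, A, d, tau, eps):
    # O~(H^7 A^2 d^4 / (tau^2 eps^2)); log factors dropped, illustrative only.
    return H**7 * A**2 * d**4 / (tau**2 * eps**2)

def regret_bound(H, A, d, tau, K):
    # O~(tau^-1 H^3 A d^2 sqrt(K)); log factors dropped, illustrative only.
    return H**3 * A * d**2 * math.sqrt(K) / tau

# Doubling the rank d multiplies the episode bound by 2^4 = 16,
# while quadrupling K only doubles the regret bound (sqrt scaling).
base = sample_complexity(H=10, A=5, d=4, tau=0.1, eps=0.1)
print(sample_complexity(H=10, A=5, d=8, tau=0.1, eps=0.1) / base)  # 16.0
```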
Main Takeaways
- The proposed algorithm ELA is provably sample-efficient for CVaR optimization in large state spaces, breaking the 'curse of dimensionality' associated with tabular methods.
- The sample complexity scales polynomially with the rank of the MDP (d) and the size of the action space (A), matching the dependencies of risk-neutral algorithms such as Rep-UCB, though with a higher horizon (H) dependence.
- Computational efficiency is achieved via the ELLA algorithm, which uses a discretized Least-Squares Value Iteration, showing that planning for CVaR in low-rank models is tractable.
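The details of ELLA's discretized planner are not reproduced here, but the role of discretization can be illustrated with the standard Rockafellar–Uryasev representation, CVaR_τ(Z) = max_b { b − E[(b − Z)_+]/τ }, maximized over a finite grid of budgets in [0, H]. This is a hedged sketch of the general technique, not the paper's algorithm:

```python
import numpy as np

def cvar_via_grid(returns, tau, H, n_grid=200):
    """Rockafellar-Uryasev: CVaR_tau(Z) = max_b { b - E[(b - Z)_+] / tau }.
    Discretizing the budget b over [0, H] reduces the inner maximization
    to a finite search, which is what makes grid-based CVaR planning cheap."""
    z = np.asarray(returns, dtype=float)
    grid = np.linspace(0.0, H, n_grid)                 # candidate budgets b
    shortfall = np.maximum(grid[None, :] - z[:, None], 0.0).mean(axis=0)
    objective = grid - shortfall / tau                 # one value per budget
    best = int(np.argmax(objective))
    return grid[best], objective[best]

# With tau = 0.25, the maximizer recovers the empirical lower-tail CVaR.
b_star, cvar = cvar_via_grid([1.0, 3.0, 2.0, 5.0, 0.5, 4.0, 2.5, 3.5], 0.25, H=10)
print(round(cvar, 6))  # ~0.75, the mean of the worst 25% of returns
```

The grid size trades accuracy for computation, mirroring how the discretization level in ELLA is chosen against the target accuracy ϵ.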