Evaluation Setup
Theoretical sample complexity analysis
Benchmarks:
- Tabular MDP Sample Complexity (Theoretical Bound)
Metrics:
- Sample Complexity (number of episodes K to reach epsilon-optimality)
- Statistical methodology: Not explicitly reported in the paper
Key Results
| Benchmark |
Metric |
Baseline |
This Paper |
Δ |
| The main result is a theoretical upper bound on sample complexity derived in Theorem 1. |
| Tabular MDP Sample Complexity |
Episodes (K) |
H^3 * S * A / epsilon^2 |
H^3 * S * A * min(H*sigma, 1) / epsilon^2 + H^3 * S * C*(sigma) / epsilon^2 |
Reduction proportional to uncovered fraction sigma
|
Main Takeaways
- Hybrid RL allows sample complexity to scale with the 'uncovered' part of the state space (σ) rather than the whole space.
- The algorithm is robust: if the offline data is useless (σ=1), it recovers the performance of pure online RL.
- The algorithm is strictly better than pure offline RL when the dataset has partial coverage (C*(0) = infinity), as it can explore the missing parts.