Evaluation Setup
The contribution is primarily a theoretical derivation, validated on simple illustrative examples; specific results for these examples are not included in the provided text.
Metrics:
- Statistical methodology: Not explicitly reported in the paper
Main Takeaways
- Theoretically establishes that simulator accuracy is not necessary for optimal real-world performance: adapting simulator parameters to maximize the policy's real-world return is sufficient.
- Provides a general recipe for differentiating through the stochastic policy gradient, extending bi-level simulator optimization to the policy-gradient family of RL algorithms, a wider class than previously supported.
- Identifies that the sensitivity of the in-sim-trained policy involves two critic-sensitivity terms: one with respect to the simulator parameters and one with respect to the policy parameters.
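The bi-level structure above can be illustrated with a minimal toy sketch (all names, dynamics, and hyperparameters here are hypothetical, not from the paper): an inner loop trains a policy by policy-gradient ascent inside a simulator parameterized by `psi`, and an outer loop adapts `psi` to maximize the *real-world* return of the trained policy, differentiating through the inner training loop (finite differences stand in for the paper's analytic derivative):

```python
# Toy bandit setting: the policy is a single scalar action mean `theta`;
# the simulated return is -(theta - psi)^2 and the real return is
# -(theta - A_STAR)^2 for an optimum A_STAR unknown to the simulator.
A_STAR = 2.0          # hypothetical real-world optimal action
INNER_LR, INNER_STEPS = 0.1, 20

def sim_return_grad(theta, psi):
    # Gradient of the expected simulated return -(theta - psi)^2 w.r.t. theta.
    return -2.0 * (theta - psi)

def inner_train(psi, theta0=0.0):
    # Inner loop: policy-gradient ascent on the simulated return.
    theta = theta0
    for _ in range(INNER_STEPS):
        theta += INNER_LR * sim_return_grad(theta, psi)
    return theta

def real_return(theta):
    return -(theta - A_STAR) ** 2

def outer_objective(psi):
    # Bi-level objective: real return of the policy *after* in-sim training.
    return real_return(inner_train(psi))

def outer_grad(psi, eps=1e-5):
    # Derivative of the outer objective w.r.t. psi, taken *through* the
    # inner policy-gradient loop (finite differences stand in for autodiff).
    return (outer_objective(psi + eps) - outer_objective(psi - eps)) / (2 * eps)

# Outer loop: gradient ascent on the simulator parameter.
psi = 0.0
for _ in range(200):
    psi += 0.2 * outer_grad(psi)
```

In this toy problem the outer loop converges to a `psi` that does *not* equal `A_STAR` (the simulator remains slightly "inaccurate"), yet the policy trained inside it attains the real-world optimum, mirroring the first takeaway above that return-maximizing adaptation, not simulator accuracy, is what matters.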