
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

Susobhan Ghosh, Raphael Kim, Prasidh Chhabria, Raaz Dwivedi, Predrag Klasnja, Peng Liao, Kelly Zhang, Susan Murphy
Department of Computer Science, Harvard University, Department of Electrical Engineering and Computer Science, MIT, School of Information, University of Michigan
arXiv (2023)
P13N RL

📝 Paper Summary

Assessment of Personalization in Online RL Digital Health Interventions
A resampling-based framework to determine if apparent personalization in online RL algorithms is genuine learning or merely an artifact of the algorithm's inherent stochasticity.
Core Problem
Stochastic online RL algorithms can produce user trajectories that appear to be 'personalized' (consistently selecting specific actions) purely by chance, even when no actual learning has occurred.
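To see how stochasticity alone can mimic personalization, here is a minimal Python sketch under simplifying assumptions. It uses a toy two-armed Gaussian Thompson sampler (not the HeartSteps algorithm, which is more elaborate) in which sending and not sending have exactly the same expected reward. The function name simulate_user, the horizon of 90 decisions, the number of users, and the prior and noise settings are all hypothetical choices for illustration.

```python
import numpy as np


def simulate_user(T, rng, prior_var=1.0, noise_sd=1.0):
    """Toy two-armed Gaussian Thompson sampler whose arms have the SAME true mean,
    so the true advantage of sending a suggestion is exactly zero.
    Returns the 0/1 action (1 = send) chosen at each of T decision times."""
    post_mean = np.zeros(2)           # posterior mean reward for action 0 and action 1
    post_var = np.full(2, prior_var)  # posterior variance for each action
    actions = np.empty(T, dtype=int)
    for t in range(T):
        # Draw one plausible mean per action and act on the larger draw.
        draws = rng.normal(post_mean, np.sqrt(post_var))
        a = int(draws[1] > draws[0])
        reward = rng.normal(0.0, noise_sd)  # identical reward distribution for both actions
        # Conjugate normal update of the chosen action (noise variance assumed known).
        precision = 1.0 / post_var[a] + 1.0 / noise_sd**2
        post_mean[a] = (post_mean[a] / post_var[a] + reward / noise_sd**2) / precision
        post_var[a] = 1.0 / precision
        actions[t] = a
    return actions


rng = np.random.default_rng(0)
# Fraction of 'send' decisions for 20 simulated users who gain nothing from suggestions.
send_rates = np.array([simulate_user(90, rng).mean() for _ in range(20)])
print(np.round(np.sort(send_rates), 2))
```

Because the true advantage is zero by construction, any simulated user whose send rate drifts far from 0.5 looks "personalized" purely by chance.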
Why it matters:
  • Researchers need to verify whether expensive RL algorithms actually deliver value over simpler methods before deploying them in optimized real-world interventions
  • False impressions of personalization can lead to incorrect scientific conclusions about which user features (e.g., location, mood) are relevant for treatment
  • Distinguishing signal from noise helps refine future algorithm designs by identifying which features truly drive advantageous decisions
Concrete Example: In the HeartSteps trial, the algorithm's decisions for User 2 showed a clear pattern: it consistently favored 'send suggestion' when step variation was low and 'do not send' when variation was high. The researchers needed to know: did the algorithm actually learn this preference, or did random sampling just happen to pick these actions repeatedly?
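The User 2 pattern can be summarized by a single number. The Python sketch below uses a hypothetical differential score, the send rate at low-variation decision times minus the send rate at high-variation ones; the function name and the toy data are illustrative assumptions, and the paper defines its own interestingness scores, which may differ.

```python
import numpy as np


def differential_interestingness(actions, low_variation):
    """Hypothetical score: send rate when variation is low minus send rate when it is high.
    Values near +1 mean 'send when low, don't send when high'; near 0 means no pattern."""
    actions = np.asarray(actions, dtype=float)
    low = np.asarray(low_variation, dtype=bool)
    if low.all() or (~low).all():
        raise ValueError("need decision times in both variation states")
    return actions[low].mean() - actions[~low].mean()


# Toy history: 1 = send suggestion, 0 = do not send.
actions = [1, 1, 1, 0, 0, 1, 0, 0]
low_variation = [True, True, True, True, False, False, False, False]
print(differential_interestingness(actions, low_variation))  # 0.75 - 0.25 = 0.5
```

A large score on its own says nothing about whether the pattern was learned; it only becomes evidence once compared with the scores the same algorithm produces when no such feature effect exists.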
Key Novelty
Resampling-based 'Truth-in-Advertising' for RL Personalization
  • Defines 'interestingness' scores that quantify visual patterns of personalization (e.g., consistent action selection in specific states)
  • Constructs a null hypothesis world where no advantage exists (or no feature-specific advantage exists) using generative models fitted to user data
  • Resimulates the RL algorithm hundreds of times in this null world to build a reference distribution of 'interestingness' arising solely from stochasticity, then compares the real user's score to this distribution
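Putting the three steps together, a minimal resampling test might look like the following Python sketch. Everything here is a stand-in chosen for illustration: run_algorithm is a toy Thompson sampler, interestingness (the fraction of decision times with a positive estimated advantage) is a hypothetical score, and the null world simply draws rewards for both actions from one Gaussian fitted to the user's rewards, which forces the advantage to zero. The paper's generative models and scores are more detailed.

```python
import numpy as np


def run_algorithm(reward_fn, T, rng, prior_var=1.0, noise_sd=1.0):
    """Toy two-armed Gaussian Thompson sampler (a stand-in, not the HeartSteps algorithm).
    Returns the actions, the rewards, and the posterior-mean advantage after each step."""
    mean, var = np.zeros(2), np.full(2, prior_var)
    actions, rewards, adv = np.empty(T), np.empty(T), np.empty(T)
    for t in range(T):
        # Thompson sampling: draw each arm's mean from its posterior, act on the larger draw.
        a = int(rng.normal(mean[1], np.sqrt(var[1])) > rng.normal(mean[0], np.sqrt(var[0])))
        r = reward_fn(a, rng)
        # Conjugate normal update of the chosen arm (noise variance assumed known).
        prec = 1.0 / var[a] + 1.0 / noise_sd**2
        mean[a] = (mean[a] / var[a] + r / noise_sd**2) / prec
        var[a] = 1.0 / prec
        actions[t], rewards[t], adv[t] = a, r, mean[1] - mean[0]
    return actions, rewards, adv


def interestingness(adv):
    """Hypothetical score: fraction of decision times with positive estimated advantage."""
    return float(np.mean(adv > 0))


def resampling_p_value(observed_rewards, observed_score, T, B, rng):
    """Monte Carlo p-value of the observed score under a zero-advantage null world."""
    # Null world (assumption): both actions share one Gaussian fitted to the user's rewards.
    mu, sd = observed_rewards.mean(), observed_rewards.std(ddof=1)

    def null_reward(a, g):
        return g.normal(mu, sd)  # ignores the action, so the true advantage is zero

    null_scores = np.array(
        [interestingness(run_algorithm(null_reward, T, rng)[2]) for _ in range(B)]
    )
    # Share of null runs at least as interesting as the real one, with the usual +1 correction.
    return (1 + np.sum(null_scores >= observed_score)) / (1 + B)


def real_reward(a, g):
    # Synthetic "real" user whose expected reward rises by 0.5 when a suggestion is sent.
    return g.normal(0.5 * a, 1.0)


rng = np.random.default_rng(1)
T = 90
_, rewards, adv = run_algorithm(real_reward, T, rng)
score = interestingness(adv)
p = resampling_p_value(rewards, score, T, B=200, rng=rng)
print(f"interestingness={score:.2f}, p-value={p:.3f}")
```

Under this +1 convention the smallest attainable p-value is 1/(B+1), so reporting a p-value below 0.002 requires at least 500 resimulations, consistent with resimulating the algorithm hundreds of times.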
Evaluation Highlights
  • Found that for 18 of the 63 HeartSteps users, the apparent personalization pattern (a consistently positive advantage) could be explained by the algorithm's stochasticity alone
  • For User 1, the high 'interestingness' score (1.0) was statistically unlikely to arise by chance (p-value < 0.002), providing strong evidence of genuine personalization
  • Found no evidence that the 'variation' feature drove personalization for User 2 (p-value ≈ 0.53); the observed differential treatment pattern is consistent with a stochastic artifact
Breakthrough Assessment
7/10
Provides a crucial methodological sanity check for the growing field of RL in digital health. While not a new RL algorithm itself, it addresses a significant evaluation gap in real-world deployments.