PPO: Proximal Policy Optimization—a popular on-policy reinforcement learning algorithm that restricts policy updates to a small trust region to ensure stability
colored noise: Noise signals where the power spectral density is not constant but varies with frequency (e.g., pink noise, red noise), creating temporal correlations between samples
PSD: Power Spectral Density—a measure of a signal's power content versus frequency; for colored noise, the PSD is proportional to 1/f^beta
beta: The exponent in 1/f^beta that determines the 'color' of the noise; beta=0 is white noise, beta=1 is pink, beta=2 is red (Brownian motion)
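As a minimal sketch of how such noise can be generated, the snippet below shapes the Fourier amplitudes of white noise by f^(-beta/2), which yields a PSD proportional to 1/f^beta (the function name and normalization are illustrative, not from the original text):

```python
import numpy as np

def powerlaw_noise(n, beta, seed=None):
    """Sample a length-n signal whose PSD is proportional to 1/f^beta.

    beta=0 gives white noise, beta=1 pink, beta=2 red (Brownian).
    Method: scale random-phase Fourier amplitudes by f^(-beta/2),
    since power (amplitude squared) must fall off as 1/f^beta.
    """
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)               # frequencies in [0, 0.5]
    amps = np.ones_like(freqs)
    amps[1:] = freqs[1:] ** (-beta / 2)      # skip f=0 to avoid dividing by zero
    amps[0] = 0.0                            # drop the DC component
    phases = np.exp(2j * np.pi * rng.random(len(freqs)))
    signal = np.fft.irfft(amps * phases, n=n)
    return signal / signal.std()             # normalize to unit variance

eps = powerlaw_noise(1024, beta=1.0, seed=0)  # a pink-noise sequence
```

With beta=0 the amplitude scaling is flat and the output is ordinary white noise; larger beta concentrates power at low frequencies, producing smoother, temporally correlated samples.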
re-parameterization trick: A technique to sample from a distribution (like Gaussian) by separating the deterministic parameters (mean, std) from the stochastic element (noise), allowing gradients to flow through the sampling step
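A minimal sketch of the re-parameterization trick for a Gaussian policy (the values of mu and std below are hypothetical stand-ins for network outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical policy outputs standing in for a network's mean and std.
mu, std = 0.5, 0.2

# Re-parameterized sampling: a = mu + std * eps, with eps ~ N(0, 1).
# The stochastic part (eps) is separated from the deterministic
# parameters, so in an autograd framework d(a)/d(mu) = 1 and
# d(a)/d(std) = eps are well defined, and gradients flow through
# the sampling step.
eps = rng.standard_normal()
action = mu + std * eps
```

Note that swapping eps for a colored-noise sample changes the exploration behavior while leaving the gradient path through mu and std untouched.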
white noise: Uncorrelated noise with constant power spectral density (beta=0), used as the default in standard PPO
pink noise: Noise with power spectral density inversely proportional to frequency (beta=1), found effective for off-policy RL
Brownian motion: Red noise (beta=2), equivalent to a random walk, i.e., white noise integrated (cumulatively summed) over time
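The random-walk equivalence can be sketched in two lines: cumulatively summing uncorrelated white noise (beta=0) produces a red-noise trajectory (beta=2):

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal(1000)  # beta = 0: uncorrelated samples
red = np.cumsum(white)             # beta = 2: random walk / Brownian motion
```

Each red-noise sample is the previous one plus a fresh white-noise increment, which is what makes consecutive samples strongly correlated.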
on-policy: RL algorithms that learn strictly from data collected by the current policy (e.g., PPO), unlike off-policy methods that can learn from historical data