RAN Slicing: Partitioning physical radio network resources into multiple virtual networks (slices) to serve different service requirements simultaneously
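KTO: Kahneman-Tversky Optimization, a loss function for aligning LLMs with preferences expressed as binary (desirable/undesirable) feedback; because it does not need paired preference data, it supports unbalanced datasets, and its objective is built on a prospect-theoretic value function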
Reflective MDP: A decision process formalism where the agent outputs linguistic reflections and analyses alongside actions, replacing scalar rewards with language feedback
PRB: Physical Resource Block, the basic unit of resource scheduling in LTE/5G networks: a block of 12 consecutive subcarriers on the time-frequency grid, allocated per scheduling interval
MOOP: Multi-Objective Optimization Problem—optimizing for multiple conflicting goals simultaneously (e.g., speed vs. energy)
Hallucination: In this context, when an LLM generates plausible but incorrect resource allocations or analyses not grounded in the environment state
Actor-Reflector: Proposed architecture replacing the RL 'Critic' (value estimator) with a 'Reflector' (linguistic evaluator) to guide policy updates
RfR: Refine-from-Reflection—the proposed fine-tuning framework that creates preference datasets from self-reflected trajectories
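To make the KTO entry more concrete, the sketch below computes a per-example KTO loss in the form L = λ_y − v(x, y), where the implied reward r is the log-ratio between the policy and a frozen reference model, and v applies a sigmoid value function around a reference point z0. The separate weights λ_D and λ_U are what let KTO cope with unbalanced desirable/undesirable data. This is a minimal illustration under stated assumptions, not the RfR training code: the function names are hypothetical, z0 is passed in as a fixed constant (the KTO paper estimates it from the batch as a KL term and does not backpropagate through it), and the source does not say how RfR sets β, λ_D, or λ_U.

```python
import math
from typing import Iterable, Tuple


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_example_loss(
    policy_logp: float,
    ref_logp: float,
    desirable: bool,
    z0: float = 0.0,
    beta: float = 0.1,
    lambda_d: float = 1.0,
    lambda_u: float = 1.0,
) -> float:
    """KTO loss for one (x, y) pair labelled desirable or undesirable (hypothetical helper).

    policy_logp / ref_logp: log pi(y|x) under the policy and the frozen reference model.
    z0: reference point (a KL estimate); treated as a fixed constant in this sketch.
    """
    r = policy_logp - ref_logp  # implied reward: log-ratio of policy to reference
    if desirable:
        # Gain side: value grows as r rises above the reference point z0
        return lambda_d - lambda_d * sigmoid(beta * (r - z0))
    # Loss side: value grows as r falls below the reference point z0
    return lambda_u - lambda_u * sigmoid(beta * (z0 - r))


def kto_batch_loss(batch: Iterable[Tuple[float, float, bool]], **kwargs) -> float:
    """Mean KTO loss over (policy_logp, ref_logp, desirable) triples (hypothetical helper)."""
    losses = [kto_example_loss(p, q, d, **kwargs) for p, q, d in batch]
    return sum(losses) / len(losses)


def balancing_weights(
    n_desirable: int, n_undesirable: int, target_ratio: float = 1.0
) -> Tuple[float, float]:
    """Choose (lambda_d, lambda_u) so that lambda_d * n_d / (lambda_u * n_u) == target_ratio.

    Hypothetical helper; the KTO paper suggests keeping this ratio roughly in [1, 4/3].
    """
    lambda_u = 1.0
    lambda_d = target_ratio * n_undesirable / n_desirable
    return lambda_d, lambda_u
```

For example, a preference set with 100 desirable and 400 undesirable trajectories would get `balancing_weights(100, 400) == (4.0, 1.0)`, up-weighting the scarce desirable examples so that both sides contribute comparably to the loss.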