Qiskit: An open-source Software Development Kit (SDK) for working with quantum computers
QPU: Quantum Processing Unit—the physical hardware chip that performs quantum computations
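GRPO: Group Relative Policy Optimization, a reinforcement learning algorithm that samples a group of outputs for each prompt and computes each output's advantage by normalizing its reward against the group's mean and standard deviation, which stabilizes training without a separate value (critic) network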
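DPO: Direct Preference Optimization, a method that aligns a model to preferences (e.g., correct vs. incorrect code) by training directly on pairs of preferred and rejected outputs, without fitting an explicit reward model or running reinforcement learning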
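SLERP: Spherical Linear Interpolation, a technique for merging the weights of two models by interpolating along the arc between their weight vectors rather than along a straight line, which preserves vector norms and angular relationships in the parameter space better than simple averaging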
Transpilation: The process of rewriting a quantum circuit to match the specific constraints (connectivity, basis gates) of a target quantum device
Estimator/Sampler: Qiskit Runtime primitives; 'Estimator' computes expectation values of observables (operators) for the states prepared by a quantum circuit, while 'Sampler' executes the circuit and returns the measured output bitstrings
SFT: Supervised Fine-Tuning—training the model on high-quality input-output pairs
EPT: Extended Pretraining—continuing the pretraining phase on domain-specific data
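The group-relative advantage step in the GRPO entry can be sketched in a few lines, assuming each sampled output for a prompt has already been given a scalar reward (for example, 1.0 if generated code passes its tests, else 0.0). The function name and the `eps` guard are illustrative assumptions, and the policy-gradient update that consumes these advantages is omitted. This sketch uses the population standard deviation; some implementations use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: GRPO-style advantages for one group of sampled outputs.

    Each reward is normalized against the mean and standard deviation of its own
    group, so no learned value (critic) network is needed as a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every output in the group got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt: two pass the tests, two fail
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
```

Outputs that beat their group's average get a positive advantage and are reinforced; a group where every output scores the same contributes zero advantage and therefore no learning signal.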
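The DPO entry can be made concrete with the standard DPO objective for a single preference pair. This is a minimal sketch assuming the summed sequence log-probabilities of the preferred output (e.g., correct code) and the rejected output (e.g., incorrect code) have already been computed under both the model being trained and a frozen reference model. The function names and the `beta = 0.1` default are illustrative choices, not values taken from this document.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Hypothetical helper: DPO loss for one (preferred, rejected) pair.

    All inputs are summed log-probabilities of a full output sequence.
    """
    # Implicit rewards: how much more likely each output is under the policy than the reference
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


# Policy has shifted probability toward the correct code and away from the incorrect code
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))  # below log(2), the loss when nothing has changed
```

Because the reference log-probabilities enter only as a baseline, the loss pushes the policy to widen the gap between preferred and rejected outputs relative to the reference, with `beta` controlling how strongly that gap is rewarded.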
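The SLERP entry can be illustrated with a minimal sketch of spherical interpolation between two models' weights, assuming each weight tensor has been flattened into a plain list of floats and that two models are merged at a time (merging tools typically apply this per tensor). The angle is measured between the normalized vectors, and the interpolation weights are then applied to the original vectors. The `eps` threshold and the fallback to linear interpolation for (anti)parallel or zero vectors are illustrative choices.

```python
import math


def slerp(t, v0, v1, eps=1e-6):
    """Hypothetical helper: spherically interpolate between flattened weights v0 and v1.

    t = 0 returns v0, t = 1 returns v1; intermediate values follow the arc between them.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp  # angle is undefined for a zero vector
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        return lerp  # (anti)parallel vectors: SLERP is ill-conditioned, fall back to LERP
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Midpoint of two orthogonal unit vectors
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # about [0.7071, 0.7071], still unit norm
```

For comparison, plain averaging of the same two vectors gives [0.5, 0.5], whose norm has shrunk to about 0.707; this norm shrinkage is what the SLERP entry means by averaging distorting the geometry of the parameter space.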
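The two constraints named in the Transpilation entry, connectivity and basis gates, can be shown with a toy pure-Python pass. This is not Qiskit's transpiler: it assumes a hypothetical device whose qubits sit on a line (qubit i couples only to i - 1 and i + 1), a toy basis of {rz, sx, cx}, and gates written as `(name, qubits)` tuples. All function names are hypothetical. Real transpilers also choose an initial layout, use smarter routing heuristics, and run optimization passes.

```python
def route_on_line(circuit, num_qubits):
    """Toy routing: insert SWAPs so every two-qubit gate acts on neighbouring qubits."""
    layout = list(range(num_qubits))  # layout[logical] = physical qubit
    routed = []
    for name, qubits in circuit:
        if len(qubits) == 2:
            a, b = qubits
            # Move logical qubit a toward b one SWAP at a time until they are adjacent
            while abs(layout[a] - layout[b]) > 1:
                pa = layout[a]
                step = 1 if layout[b] > pa else -1
                neighbour = layout.index(pa + step)  # logical qubit on the next physical site
                routed.append(("swap", (pa, pa + step)))
                layout[a], layout[neighbour] = pa + step, pa
            routed.append((name, (layout[a], layout[b])))
        else:
            routed.append((name, tuple(layout[q] for q in qubits)))
    return routed, layout


def translate_to_basis(circuit):
    """Toy basis translation into {rz, sx, cx}; other gates pass through unchanged."""
    out = []
    for name, qubits in circuit:
        if name == "h":
            # H equals RZ(pi/2) SX RZ(pi/2) up to a global phase
            (q,) = qubits
            out += [("rz(pi/2)", (q,)), ("sx", (q,)), ("rz(pi/2)", (q,))]
        elif name == "swap":
            # A SWAP is three alternating CNOTs
            a, b = qubits
            out += [("cx", (a, b)), ("cx", (b, a)), ("cx", (a, b))]
        else:
            out.append((name, qubits))
    return out


def transpile_toy(circuit, num_qubits):
    """Route first, then translate (SWAPs added by routing also need translating)."""
    routed, final_layout = route_on_line(circuit, num_qubits)
    return translate_to_basis(routed), final_layout


compiled, final_layout = transpile_toy([("h", (0,)), ("cx", (0, 2))], 3)
print(compiled)
print(final_layout)  # [1, 0, 2]: logical qubit 0 now sits on physical qubit 1
```

The CNOT between qubits 0 and 2 cannot run directly on the line 0-1-2, so one SWAP is inserted and then expanded into three CNOTs, which is why routing on sparsely connected hardware can noticeably increase circuit depth.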
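The split between the two primitives in the Estimator/Sampler entry can be mimicked with a toy statevector in pure Python. This is not the Qiskit API: the function names are hypothetical, and the hard-coded two-qubit Bell state and ZZ observable are assumptions chosen for the example. The Estimator-style function returns a single expectation value, while the Sampler-style function returns measured bitstrings drawn with Born-rule probabilities.

```python
import math
import random


def bell_state():
    """Toy statevector of (|00> + |11>)/sqrt(2), keyed by bitstring."""
    amp = 1 / math.sqrt(2)
    return {"00": amp, "01": 0.0, "10": 0.0, "11": amp}


def estimate_zz(state):
    """Estimator-style (hypothetical): exact expectation value of the ZZ observable."""
    # Each basis state is a ZZ eigenstate: eigenvalue +1 for even parity, -1 for odd parity
    return sum(abs(a) ** 2 * (1 if bits.count("1") % 2 == 0 else -1)
               for bits, a in state.items())


def sample_bitstrings(state, shots, seed=None):
    """Sampler-style (hypothetical): draw measured bitstrings with probability |amplitude|^2."""
    rng = random.Random(seed)
    bitstrings = list(state)
    weights = [abs(state[b]) ** 2 for b in bitstrings]
    return rng.choices(bitstrings, weights=weights, k=shots)


state = bell_state()
print(estimate_zz(state))  # approximately 1.0: the two qubits are perfectly correlated
counts = {}
for bits in sample_bitstrings(state, shots=1000, seed=7):
    counts[bits] = counts.get(bits, 0) + 1
print(counts)  # only '00' and '11' appear
```

On real hardware the Estimator value is itself estimated from repeated measurements, so it carries shot noise; the exact sum here stands in for that estimate.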