PDDL: Planning Domain Definition Language—a standard encoding for AI planning problems involving states, actions, and goals
STRIPS: Stanford Research Institute Problem Solver—a formal language for automated planning problems whose expressiveness corresponds to a basic subset of PDDL
TPTP: Thousands of Problems for Theorem Provers—a standard library and format for testing automated theorem provers
PCFG: Probabilistic Context-Free Grammar—a grammar where each production rule has a probability, used here for generating structured data
RLVR: Reinforcement Learning with Verifiable Rewards—training models using objective success signals (like passing a test case) rather than human preference labels
NLL: Negative Log Likelihood—a standard loss metric in language modeling representing how well the model predicts the correct next token
bushiness factor: A parameter in the generation algorithm that forces derivation trees to expand laterally (width) alongside vertical growth (depth) to ensure structural complexity
balancing key: A mechanism to cap the frequency of specific features (e.g., answer labels) in a batch to prevent degenerate distributions
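stochastic rounding: Randomly rounding a float to one of its two neighboring integers, with the probability of each set by proximity (e.g., 2.3 becomes 2 with 70% chance, 3 with 30%), so that the expected result equals the original value; this allows continuous control over discrete parameters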
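
The stochastic rounding entry can be illustrated with a minimal Python sketch. The function name `stochastic_round` and the seeded `random.Random` generator are assumptions made for illustration only; they are not taken from the source and may differ from the actual implementation.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to floor(x) or floor(x) + 1 (hypothetical helper).

    The probability of rounding up equals the fractional part of x,
    so the expected value of the result equals x.
    """
    lower = math.floor(x)
    frac = x - lower  # in [0, 1); this is the chance of rounding up
    return lower + (1 if rng.random() < frac else 0)


# Usage: 2.3 should come out as 2 about 70% of the time and 3 about 30%.
rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [stochastic_round(2.3, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # approaches 2.3 as the sample grows
share_of_twos = samples.count(2) / len(samples)  # approaches 0.7
```

Because the average of many draws converges to the original float, a setting such as "2.3 children per node" still has a well-defined effect even though each individual draw must be an integer.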