Reasoning Boundary (RB): The maximum problem difficulty at which a model's accuracy stays at or above a specified threshold (e.g., 90%).
CFRB: Completely Feasible Reasoning Boundary—the difficulty range where model accuracy is ≥ 90%, implying mastery without extensive aid.
PFRB: Partially Feasible Reasoning Boundary, the difficulty range where accuracy is between 10% and 90%, so the model needs support such as consensus across multiple samples or clearer prompts to answer reliably.
CIRB: Completely Infeasible Reasoning Boundary—the difficulty range where accuracy is ≤ 10%, implying the task is beyond the model's current capacity.
Combination Law: A formula estimating a model's performance on a complex task as the weighted harmonic mean of its performance on sub-tasks (e.g., planning vs. calculation).
MARP: Minimum Acceptable Reasoning Path—a prompting strategy that simplifies the reasoning process to the minimum necessary steps to reduce error accumulation.
BigGSM: A dataset constructed by the authors that involves more complex calculations and longer reasoning chains than the standard GSM8K benchmark.
Self-Consistency: A decoding strategy that samples multiple reasoning paths and selects the final answer that appears most often among them (a majority vote) to improve accuracy.
PAL: Program-Aided Language models—a method using code generation to solve reasoning problems.
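
Several of the terms above can be illustrated with a minimal Python sketch. It assumes the 10% and 90% thresholds given in the definitions, a plain weighted harmonic mean for the Combination Law (the authors' exact formula may include additional constants not listed here), and a simple majority vote for Self-Consistency. The function names classify_boundary, weighted_harmonic_mean, and self_consistency_vote are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify_boundary(accuracy, low=0.10, high=0.90):
    """Map an accuracy in [0, 1] to CFRB, PFRB, or CIRB.

    Assumes the thresholds from the glossary: >= 90% is CFRB,
    <= 10% is CIRB, and anything strictly in between is PFRB.
    """
    if accuracy >= high:
        return "CFRB"
    if accuracy <= low:
        return "CIRB"
    return "PFRB"


def weighted_harmonic_mean(values, weights):
    """Generic weighted harmonic mean: sum(w) / sum(w / v).

    Illustrative only; this is not claimed to be the authors'
    exact Combination Law, which may use extra fitted constants.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


def self_consistency_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(classify_boundary(0.95))                      # CFRB
    print(weighted_harmonic_mean([1.0, 4.0], [1, 1]))   # dominated by the smaller value
    print(self_consistency_vote(["18", "18", "20"]))    # 18
```

Because a harmonic mean is pulled toward its smallest input, the weaker sub-task (for example, calculation when planning is strong) limits the combined result under this sketch.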