Syllogism: A form of deductive argument that derives a conclusion from two premises (a major and a minor premise)
Middle Term: The term that appears in both premises but not in the conclusion, which must be 'eliminated' during the reasoning process to link the subject and predicate
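The elimination of the middle term can be sketched in a few lines. This is an illustrative toy (not from the paper): each categorical premise "All X are Y" is represented as a pair `(X, Y)`, and chaining two premises drops the shared middle term.

```python
# Toy illustration of middle-term elimination in a syllogism.
# A premise ("X", "Y") is read as "All X are Y" (or "X is a Y").
def eliminate_middle(major, minor):
    """Combine a minor premise (subject, middle) with a major
    premise (middle, predicate), eliminating the shared middle term."""
    subj, mid_minor = minor
    mid_major, pred = major
    assert mid_minor == mid_major, "premises must share a middle term"
    return (subj, pred)

# "All men are mortal" + "Socrates is a man" -> "Socrates is mortal"
print(eliminate_middle(("man", "mortal"), ("Socrates", "man")))
# → ('Socrates', 'mortal')
```

In a multi-step deduction, this operation is applied repeatedly: each step eliminates one middle term until only the subject and final predicate of the conclusion remain.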
RLHF: Reinforcement Learning from Human Feedback—a method to align language models using reward models trained on preference data
MuseD: Multi-step Deduction—the authors' proposed method for synthesizing multi-step logical-deduction training data and scoring intermediate reasoning steps
Categorical Proposition: A proposition that asserts or denies that all or some of the members of one category (the subject term) are included in another (the predicate term)
PPO: Proximal Policy Optimization—a reinforcement learning algorithm used here to fine-tune the model against the reward model
Step Score: A metric defined in this paper as the ratio of correctly eliminated middle terms in a generated reasoning chain
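A minimal sketch of computing such a ratio, assuming the gold middle terms and the middle terms eliminated at each generated step are available as ordered lists (the function name and exact matching rule are assumptions, not the paper's implementation):

```python
# Hypothetical sketch of a Step Score: the fraction of gold middle
# terms that the generated reasoning chain eliminates correctly.
def step_score(gold_middle_terms, generated_middle_terms):
    """gold_middle_terms: middle terms that should be eliminated, in order.
    generated_middle_terms: middle term eliminated at each generated step.
    Returns the fraction of gold terms matched at the right position."""
    if not gold_middle_terms:
        return 0.0
    correct = sum(
        1 for gold, gen in zip(gold_middle_terms, generated_middle_terms)
        if gold == gen
    )
    return correct / len(gold_middle_terms)

# Two of three middle terms eliminated correctly:
print(step_score(["B", "C", "D"], ["B", "C", "X"]))  # → 0.6666666666666666
```

Because the score is a ratio over the gold chain, it gives partial credit to reasoning chains that are correct up to some step, rather than scoring only the final answer.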
FOLIO: A benchmark dataset for natural language reasoning annotated with first-order logic
ProofWriter: A synthetic dataset for logical reasoning over natural language rules