SAL: Soundness-Aware Level—a metric measuring how well a model's internal probability distributions distinguish between sound and unsound logic rules
RLVR: Reinforcement Learning with Verifiable Rewards—a training method where models are optimized using objective feedback (e.g., correct/incorrect math answers)
SAE: Sparse Autoencoder—a neural network trained to decompose an LLM's dense hidden states into a sparse set of interpretable features
Horn Clause: A logical rule in which a conjunction of premises implies a single conclusion, e.g., 'If A and B, then C'; used here to represent internal reasoning steps between features
JSD: Jensen-Shannon Divergence, a symmetric and bounded measure of how much two probability distributions differ (0 means the distributions are identical)
LLM Judge: An evaluation setup in which a high-capability LLM (DeepSeek-R1) annotates the semantic quality of extracted rules based on the descriptions of the features they connect
Strict Rule: A logic rule representing necessary truths (e.g., mathematical theorems)
Plausible Rule: A logic rule representing strong heuristics that are usually but not universally true
Noise Rule: A logic rule representing spurious correlations or nonsensical connections
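To make the JSD entry concrete, here is a minimal, generic Python sketch of the standard Jensen-Shannon divergence between two discrete distributions. It is not taken from the SAL work. The function names are hypothetical. The sketch assumes the inputs are already-normalized probability vectors over the same support and uses log base 2, so the result lies in [0, 1].

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two discrete distributions, in [0, 1] (base 2)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical)
print(jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint support)
```

SciPy's `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon *distance*, which is the square root of this divergence. Its output, computed with `base=2`, must therefore be squared before it can be compared with the values above.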