Knowledge Graph (KG): A structured representation of data using a network of entities and relationships, typically formatted as triples (subject, predicate, object).
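As a rough illustration of the triple format, the sketch below stores a toy graph as (subject, predicate, object) records and answers a two-hop question by chaining lookups. The `Triple` type, the helper names, and the sample facts are all hypothetical and are not taken from the benchmark.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy graph for illustration only (hypothetical data).
KG = [
    Triple("Marie Curie", "born_in", "Warsaw"),
    Triple("Warsaw", "capital_of", "Poland"),
    Triple("Marie Curie", "field", "Physics"),
]


def objects(kg, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in kg if t.subject == subject and t.predicate == predicate]


def two_hop(kg, subject, first, second):
    """Follow two predicates in sequence, e.g. born_in then capital_of."""
    return [o2 for o1 in objects(kg, subject, first) for o2 in objects(kg, o1, second)]


print(two_hop(KG, "Marie Curie", "born_in", "capital_of"))
```

Multi-hop questions over a KG reduce to chained lookups of this kind, which is one reason triples are a convenient storage format.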
R-LLM: Reasoning LLM; a model explicitly optimized for reasoning, often using techniques such as Chain-of-Thought or reinforcement learning (e.g., o1, DeepSeek-R1).
OneEval-Hard: A subset of the benchmark containing samples that a majority of the tested LLMs answered incorrectly, further filtered by human experts for reasoning complexity.
F1 score: The harmonic mean of precision and recall, measuring the overlap between the predicted answer and the ground truth.
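One common way to compute this overlap is token-level F1, as used in many question-answering evaluations. The sketch below assumes whitespace tokenization and lowercasing; the benchmark's exact normalization and matching rules are not stated here, so treat `token_f1` as a hypothetical helper rather than the official scorer.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # share of predicted tokens that are correct
    recall = overlap / len(ref)      # share of reference tokens that were found
    return 2 * precision * recall / (precision + recall)


print(token_f1("the capital is Paris", "Paris"))
```

In this example precision is 1/4 and recall is 1/1, so a verbose but correct answer is penalized on precision while still getting full recall.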
ISM@1: Identifier Sequence Match at 1; a code-evaluation metric that scores how closely the sequence of identifiers in the top generated solution matches that of the reference solution.
Dense retrieval: A method using vector embeddings to find relevant knowledge chunks based on semantic similarity rather than keyword matching.
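A minimal sketch of the idea: each chunk and the query are represented as vectors, and chunks are ranked by cosine similarity to the query. The three-dimensional vectors below are hand-made stand-ins for the output of a real embedding model, and `retrieve` is a hypothetical helper, not part of any particular library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector.

    chunks: list of (text, vector) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hand-made vectors (assumption): a real system would obtain these from an encoder.
CHUNKS = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("Python is a programming language.", [0.0, 0.2, 0.9]),
    ("France borders Spain.", [0.7, 0.6, 0.1]),
]
QUERY = [1.0, 0.2, 0.0]  # stands in for an embedded question about France

print(retrieve(QUERY, CHUNKS, k=2))
```

Because ranking depends only on vector geometry, a chunk can be retrieved even when it shares no keywords with the query, which is the contrast with keyword matching drawn above.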
Logic Base: A formal specification of a domain using concepts, properties, and axioms (rules), requiring deductive reasoning.
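To make the deductive step concrete, the sketch below applies forward chaining to a tiny rule set: starting from known facts, it repeatedly fires any rule whose premises all hold until nothing new can be derived. It is deliberately simplified to ground (variable-free) atoms; real logic bases are typically more expressive, and the facts, rules, and `forward_chain` helper are hypothetical.

```python
def forward_chain(facts, rules):
    """Derive every atom entailed by the rules from the starting facts.

    facts: iterable of atoms (strings).
    rules: list of (premises, conclusion) pairs, where premises is a list of atoms.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule only if it adds something new and all premises hold.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Toy domain (assumption): concepts as unary predicates over one individual.
FACTS = {"penguin(tux)"}
RULES = [
    (["bird(tux)"], "animal(tux)"),
    (["penguin(tux)"], "bird(tux)"),
]

print(sorted(forward_chain(FACTS, RULES)))
```

Note that `animal(tux)` is never stated directly; it follows only by chaining two axioms, which is the kind of inference a retrieval-only approach would miss.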