Process Mining: Techniques to analyze process data (event logs) to discover, monitor, and improve real processes
Conformance Checking: A process mining task that compares an observed event log with a reference process model to find deviations
Event Log: A hierarchical data structure recording the execution of a process, consisting of traces and events with attributes
Trace: A sequence of events corresponding to a single execution (here, one reasoning attempt)
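Alignment: A pairing of the events in a trace with the steps of a run through the process model, chosen to minimize the total cost of deviations (moves that occur only in the trace or only in the model)
Inductive Miner (IM): A process discovery algorithm that derives a process model (a process tree, which can be converted to a Petri net) from an event log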
Fitness: A metric measuring how much of the observed behavior (trace) can be explained by the model
Precision: A metric measuring how much of the behavior allowed by the model was actually observed in the event log; low precision indicates an overly permissive (underfitting) model
GRPO: Group Relative Policy Optimization, a critic-free RL method, typically used with sparse rewards, that normalizes rewards within a group of samples to reduce variance
GSPO: Group Sequence Policy Optimization, a variant of GRPO that computes importance ratios and clipping at the sequence level rather than per token
RLOO: REINFORCE Leave-One-Out, a policy gradient method whose baseline for each sample is the mean reward of the other samples drawn for the same prompt, which reduces variance
DeepSeek R1: A strong reasoning model used here as the 'teacher' to generate reference traces
Think tags: XML-style tags (<think>...</think>) used to enclose the reasoning process in model outputs
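
To make the GRPO and RLOO entries above concrete, here is a minimal sketch of how each turns the raw rewards of one sample group into per-sample advantages. The function names, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration, not details taken from this document or from any particular library.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - group mean) / (group std + eps).

    Assumption: population std and a small eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each sample's baseline is the mean
    reward of the *other* samples in the same group."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Hypothetical group of four reasoning attempts with binary (sparse) rewards.
group_rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(group_rewards))
print(rloo_advantages(group_rewards))
```

Neither function needs a learned critic: the baseline comes entirely from the other samples in the group, which is the property both glossary entries describe.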