LRM: Large Reasoning Model; a model such as DeepSeek-R1 or OpenAI o1 that is explicitly trained to generate long intermediate reasoning steps (chain-of-thought) before producing an answer
y_CoT: The reasoning trace (chain-of-thought) generated by the model before the final answer
y_ans: The final answer generated by the model after the reasoning trace
ZeroThink: A decoding strategy that forces the model to output an empty thought block (<think></think>), effectively bypassing the reasoning process
MoreThink: A decoding strategy that forces the model to extend its reasoning process by suppressing the end-of-thought token
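The two decoding strategies above can be sketched at the string level (in practice they operate on token IDs inside the decoding loop; the `"Wait"` continuation cue and the tag strings are illustrative assumptions, not the paper's exact implementation):

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def zerothink_prompt(user_prompt: str) -> str:
    """ZeroThink: prefill an empty thought block so decoding
    starts directly at the final answer, skipping reasoning."""
    return f"{user_prompt}\n{THINK_OPEN}\n{THINK_CLOSE}\n"

def morethink_step(token: str, budget_left: int) -> str:
    """MoreThink: while a reasoning budget remains, suppress the
    end-of-thought token and substitute a continuation cue
    (e.g. "Wait"), forcing the model to keep reasoning."""
    if token == THINK_CLOSE and budget_left > 0:
        return "Wait"  # assumed continuation cue
    return token
```

A real implementation would apply the MoreThink substitution inside the sampler, e.g. via a logits processor that masks the `</think>` token until the budget is spent.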
Safe@K: A binary metric equal to 1 if ALL K responses generated for a given input are safe, and 0 otherwise
ConsSafe@K: A consensus metric equal to 1 if at least K/2 of the K generated responses are safe, and 0 otherwise
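The two metrics above reduce to simple aggregations over per-response safety labels; a minimal sketch (function names are my own, and ties at exactly K/2 are counted as safe here, an assumption):

```python
def safe_at_k(safe_flags: list[bool]) -> int:
    """Safe@K: 1 if all K responses are judged safe, else 0."""
    return int(all(safe_flags))

def cons_safe_at_k(safe_flags: list[bool]) -> int:
    """ConsSafe@K: 1 if at least K/2 of the K responses are safe."""
    return int(2 * sum(safe_flags) >= len(safe_flags))
```

For example, with K=4 and labels `[True, True, False, True]`, Safe@K is 0 (one unsafe response) while ConsSafe@K is 1 (3 of 4 safe).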
StrongReject: A dataset of policy-violating queries used to evaluate LLM refusal capabilities
WildJailbreak: A dataset of adversarial jailbreak prompts constructed from jailbreak tactics observed in real user-model interactions