
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

Tianshi Zheng, Yixiang Chen, Chengxi Li, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Y. Wong, Simon See
The Hong Kong University of Science and Technology, NVIDIA
Trans. Mach. Learn. Res. (2025)
Reasoning Benchmark

📝 Paper Summary

In-Context Learning (ICL) Prompt Engineering Reasoning
Chain-of-Thought prompting degrades performance in pattern-based in-context learning because the generated rationales disrupt the contextual continuity needed for implicit learning while failing to correctly infer explicit rules.
Core Problem
While Chain-of-Thought (CoT) typically improves reasoning, it consistently underperforms Direct Answering (DA) in pattern-based in-context learning tasks where models must induce rules from examples.
Why it matters:
  • Challenges the prevailing assumption that explicit reasoning (CoT) is universally beneficial for Large Language Model (LLM) problem-solving
  • Reveals a fundamental trade-off: explicit reasoning steps increase 'contextual distance,' disrupting the model's ability to implicitly pattern-match from demonstrations
  • Highlights the fragility of current LLMs in abstract pattern induction (e.g., symbolic or numerical rules) compared to their execution capabilities
Concrete Example: In a task like List Functions (e.g., input [1, 2] -> output [2, 3]), Direct Answering outputs [4, 5] for input [3, 4] by implicitly matching the 'add 1' pattern. With CoT, the model may hallucinate a complex, incorrect mathematical rule (explicit failure), and the lengthy rationale pushes the demonstrations far from the final answer, weakening implicit pattern matching (implicit failure) and yielding a wrong prediction.
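The contrast above can be sketched as two prompt builders for the hypothetical 'add 1' List Functions task. This is an illustrative sketch, not the paper's exact prompt templates; the function names and prompt wording are assumptions.

```python
# Illustrative sketch (not the paper's templates) of Direct Answering vs.
# Chain-of-Thought prompting on the 'add 1' List Functions example.

DEMONSTRATIONS = [
    ([1, 2], [2, 3]),
    ([5, 6], [6, 7]),
]
TEST_INPUT = [3, 4]

def direct_answer_prompt(demos, test_input):
    """Direct Answering: demonstrations sit immediately before the query,
    so the model can implicitly pattern-match from the nearby context."""
    lines = [f"Input: {x} -> Output: {y}" for x, y in demos]
    lines.append(f"Input: {test_input} -> Output:")
    return "\n".join(lines)

def cot_prompt(demos, test_input):
    """Chain-of-Thought: the model is asked to state a rule first.
    Its generated rationale will then sit between the demonstrations and
    the final answer, increasing the paper's 'contextual distance'."""
    lines = [f"Input: {x} -> Output: {y}" for x, y in demos]
    lines.append(f"Input: {test_input}")
    lines.append("First, describe the rule mapping inputs to outputs. "
                 "Then apply it to the last input and give the output.")
    return "\n".join(lines)
```

Under Direct Answering the query line directly follows the demonstrations, whereas under CoT whatever rationale the model emits is interposed before the answer.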
Key Novelty
Explicit-Implicit Hybrid Mechanism Failure
  • Proposes that CoT reasoning is not a pure explicit process but a hybrid of explicit rule-following and implicit pattern matching
  • Identifies 'Contextual Distance' as a negative factor: inserting rationales physically separates demonstrations from the test query, weakening the attention mechanism's ability to perform implicit learning
  • Demonstrates that LLMs often get the right answer with CoT despite wrong reasoning (implicit success), but CoT's structure hampers this implicit success compared to Direct Answering
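The 'contextual distance' idea can be made concrete with a toy proxy: count the tokens separating the last demonstration from the position where the answer appears. This is an assumption-laden sketch (whitespace token count, hypothetical transcripts), not the paper's actual metric.

```python
# Toy proxy for 'contextual distance': whitespace-token count between the
# end of the last demonstration and the position of the final answer.

def contextual_distance(transcript, last_demo, answer):
    """Tokens between the last demonstration and the answer string."""
    start = transcript.index(last_demo) + len(last_demo)
    end = transcript.index(answer)
    return len(transcript[start:end].split())

demo = "Input: [1, 2] -> Output: [2, 3]"

# Direct Answering: the answer immediately follows the query line.
da_transcript = demo + "\nInput: [3, 4] -> Output: [4, 5]"

# CoT: a generated rationale is interposed before the answer.
cot_transcript = (demo + "\nInput: [3, 4]\n"
                  "Reasoning: each element appears to be incremented by one, "
                  "so the rule is f(x) = x + 1 applied elementwise.\n"
                  "Output: [4, 5]")

# The rationale strictly increases the distance between demos and answer.
assert contextual_distance(cot_transcript, demo, "[4, 5]") > \
       contextual_distance(da_transcript, demo, "[4, 5]")
```

On this toy measure the CoT transcript's rationale pushes the answer roughly three times farther from the demonstrations, which is the separation the attention-based implicit-learning argument appeals to.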
Evaluation Highlights
  • Direct Answering outperforms Chain-of-Thought (CoT) by a relative 20.42% (absolute 5.10%) across 9 diverse benchmarks
  • On symbolic tasks (e.g., ARC-AGI, RAVEN), Direct Answering outperforms CoT by a relative 41.88%, the most significant gap observed
  • Implicit reasoning contributes 7.5x more to CoT success than explicit reasoning on the List Function dataset, confirming the hybrid mechanism hypothesis
Breakthrough Assessment
8/10
Provides strong counter-evidence against the 'CoT is always better' narrative, with robust empirical backing and a novel theoretical mechanism (Explicit-Implicit Hybrid) explaining the failure modes.