
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

Qianou Ma, Hua Shen, K. Koedinger, Tongshuang Wu
Carnegie Mellon University
International Conference on Artificial Intelligence in Education (2023)
Agent Reasoning Benchmark

📝 Paper Summary

LLMs for Education (EdTech), Human-AI Collaboration, Intelligent Tutoring Systems
HypoCompass reverses the traditional roles by having students act as Teaching Assistants to debug imperfect LLM-generated code, improving their own hypothesis construction and testing skills.
Core Problem
Novice programmers lack sufficient practice in debugging and hypothesis construction because creating specialized debugging exercises is time-consuming for instructors.
Why it matters:
  • LLMs (like Copilot) are now ubiquitous 'AI pair programmers' but frequently make subtle mistakes (up to 17% error rates in basic tasks), requiring students to have strong evaluation skills
  • Debugging is often overlooked in CS1 curricula due to the high logistical cost of creating materials
  • Students currently learn debugging inefficiently by struggling with their own code, mixing hypothesis formation with the cognitive load of code writing
Concrete Example: A student struggling to debug their own code must simultaneously understand the logic, write the syntax, and hypothesize bugs. In HypoCompass, the student delegates code writing to the LLM and focuses purely on creating test cases (hypotheses) to identify why the LLM's code fails.
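To make the idea of a test case as a hypothesis concrete, here is a minimal sketch. The function `buggy_max`, its bug, and the helper `failing_tests` are hypothetical illustrations, not material from the paper. The student's hypothesis ("the code fails when every value is negative") is written as an input paired with its expected output.

```python
from typing import Callable, List, Tuple

# A hypothesis is written down as a test case: (input, expected output).
TestCase = Tuple[List[int], int]


def buggy_max(values: List[int]) -> int:
    """Hypothetical LLM-written solution that should return the largest value.

    Bug (assumed for illustration): the running maximum starts at 0,
    so a list containing only negative numbers returns 0.
    """
    largest = 0
    for v in values:
        if v > largest:
            largest = v
    return largest


def failing_tests(program: Callable[[List[int]], int],
                  tests: List[TestCase]) -> List[TestCase]:
    """Return the test cases whose expected output the program does not produce."""
    return [(inp, expected) for inp, expected in tests if program(inp) != expected]


# The first test checks ordinary input; the second encodes the hypothesis
# that all-negative input breaks the program.
tests: List[TestCase] = [
    ([1, 5, 3], 5),
    ([-4, -2, -9], -2),
]
exposed = failing_tests(buggy_max, tests)
```

In this sketch only the all-negative test fails, which confirms the hypothesis and localizes the bug to the initial value of `largest`.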
Key Novelty
LLM as a Teachable Agent (Role Reversal)
  • Simulates a 'reverse' classroom where the LLM plays a confused student and the human user plays the Teaching Assistant (TA)
  • Uses 'over-generate-then-select' prompting to create diverse, naturally buggy programs from a single problem description
  • Disentangles learning objectives: students focus on high-level hypothesis testing while the LLM handles low-level code completion and bug fixing based on student instructions
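The 'over-generate-then-select' step can be pictured roughly as follows. This is a sketch under stated assumptions, not the paper's implementation: the candidate programs stand in for sampled LLM outputs (no LLM API is called), and the selection rule (drop correct programs, keep one program per distinct pattern of failing tests) is an assumed way of obtaining diverse bugs.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[List[int]], int]
TestCase = Tuple[List[int], int]


def pass_pattern(program: Program, tests: List[TestCase]) -> Tuple[bool, ...]:
    """Record which reference tests a candidate passes; a crash counts as a failure."""
    results = []
    for inp, expected in tests:
        try:
            results.append(program(inp) == expected)
        except Exception:
            results.append(False)
    return tuple(results)


def select_buggy(candidates: List[Program], tests: List[TestCase]) -> List[Program]:
    """Keep one candidate per distinct failure pattern and drop correct ones."""
    chosen: Dict[Tuple[bool, ...], Program] = {}
    for program in candidates:
        pattern = pass_pattern(program, tests)
        if all(pattern):
            continue  # a fully correct program gives the student nothing to debug
        chosen.setdefault(pattern, program)  # the first program with this pattern wins
    return list(chosen.values())


# Hypothetical stand-ins for over-generated LLM samples of "return the maximum".
def correct(values: List[int]) -> int:
    return max(values)

def zero_init(values: List[int]) -> int:
    return max([0] + values)  # wrong for all-negative input

def skip_last(values: List[int]) -> int:
    return max(values[:-1])  # ignores the final element

def zero_init_again(values: List[int]) -> int:
    return max(values + [0])  # same bug as zero_init, written differently


reference_tests: List[TestCase] = [([1, 5, 3], 5), ([-4, -2, -9], -2), ([2, 7], 7)]
selected = select_buggy([correct, zero_init, skip_last, zero_init_again], reference_tests)
```

Grouping by failure pattern is only one possible notion of diversity; a real pipeline might also compare the programs' text or ask the LLM to describe each bug.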
Architecture
Figure 4: The pipeline for generating practice materials using LLMs.
Evaluation Highlights
  • HypoCompass generates high-quality training materials (bugs, fixes, tests) 4x faster than human Teaching Assistants
  • Students using HypoCompass improved their debugging performance by 12% from pre-test to post-test
  • Students reduced their task completion time by 14% after training with the system
Breakthrough Assessment
7/10
Strong application of LLMs to solve a specific pedagogical bottleneck (debugging practice). The role-reversal design is clever and the efficiency gains over human material generation are significant.