
CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Mark H. Liffiton, Brad E. Sheese, Jaromir Savelka, Paul Denny
Illinois Wesleyan University, Carnegie Mellon University, The University of Auckland
European Conference on Modelling and Simulation (2023)
Agent, QA, Reasoning, Factuality

📝 Paper Summary

AI for Education, Human-Computer Interaction (HCI)
CodeHelp employs a multi-stage LLM prompting pipeline with explicit guardrails to provide on-demand programming assistance while actively preventing the generation of direct solution code.
Core Problem
Standard LLMs (like ChatGPT) and code generators often provide direct solutions to programming assignments, leading to student over-reliance and hindering learning.
Why it matters:
  • Human TAs and instructors cannot scale to large classes, so many students do not get timely support
  • Students using raw LLMs may bypass the productive struggle required for learning CS concepts
  • Static hint systems are labor-intensive to author and rarely cover all possible student errors
Concrete Example: When a student asks an LLM 'Write a while loop that starts at the last character...', a standard model outputs the exact code. CodeHelp instead provides a conceptual explanation of `len()` and `range()` without writing the solution block.
Key Novelty
3-Stage Guardrailed Pipeline
  • Decomposes the help process into three distinct LLM calls: Sufficiency Check, Main Response Generation, and Code Removal
  • Uses a 'Code Removal' agent specifically prompted to rewrite responses if the main agent violates instructions and leaks code (a failure mode common in standard instruction-tuned models)
  • Scores multiple generated completions against an instructor-defined 'avoid set' (forbidden keywords) to select the most pedagogically appropriate response
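The three stages above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`answer_query`, `avoid_score`, `contains_code`), the prompt wording, the "OK" convention for the sufficiency check, the fence-based code detection, and the sequential ordering of the calls are all assumptions made for the example. The `llm` argument is a stand-in for any function that maps a prompt string to a completion string.

```python
from typing import Callable, Sequence

# Hypothetical type: any callable that maps a prompt to a completion.
LLM = Callable[[str], str]

# Marker used to detect fenced code blocks in a response (assumed heuristic).
FENCE = "`" * 3


def avoid_score(text: str, avoid_set: Sequence[str]) -> int:
    """Count occurrences of instructor-forbidden keywords (lower is better)."""
    lowered = text.lower()
    return sum(lowered.count(term.lower()) for term in avoid_set)


def contains_code(text: str) -> bool:
    """Crude check for leaked code: does the response contain a code fence?"""
    return FENCE in text


def answer_query(query: str, llm: LLM, avoid_set: Sequence[str],
                 n_completions: int = 2) -> str:
    terms = list(avoid_set)

    # Stage 1: Sufficiency Check. Ask whether the request is answerable as is.
    verdict = llm("SUFFICIENCY: Reply OK if this request has enough detail; "
                  "otherwise ask the student for what is missing.\n\n" + query)
    if not verdict.strip().upper().startswith("OK"):
        return verdict  # Return the clarification request to the student.

    # Stage 2: Main Response Generation. Sample several completions and keep
    # the one that uses the fewest forbidden keywords.
    main_prompt = ("MAIN: Explain the relevant concepts and guide the student, "
                   "but do not write solution code. Avoid these keywords: "
                   + ", ".join(terms) + "\n\n" + query)
    candidates = [llm(main_prompt) for _ in range(n_completions)]
    best = min(candidates, key=lambda c: avoid_score(c, terms))

    # Stage 3: Code Removal. Only triggered if code leaked into the response.
    if contains_code(best):
        best = llm("REMOVE_CODE: Rewrite this response so it contains no code, "
                   "keeping the explanation.\n\n" + best)
    return best
```

Keeping code removal as a separate, conditional call means the guardrail does not depend on the main prompt being obeyed, which is the point of the pattern.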
Architecture
Figure 4: The logic flow of the CodeHelp response pipeline.
Evaluation Highlights
  • 95% of surveyed students (n=45) agreed they would like to use CodeHelp in future Computer Science courses
  • 80% of students agreed or strongly agreed that the tool helped them complete their work successfully
  • Low deployment cost: roughly $0.002 per query, or an estimated total under $10 per semester for a 50-student class
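As a sanity check on the cost figures above: at $0.002 per query, a $10 budget covers 5,000 queries, which is about 100 queries per student in a 50-student class. The short sketch below reproduces that arithmetic. The helper names are hypothetical, and the usage level of 60 queries per student is an illustrative assumption, not a number from the paper.

```python
COST_PER_QUERY_USD = 0.002   # approximate per-query cost reported above
CLASS_SIZE = 50              # class size used in the estimate above
BUDGET_USD = 10.0            # upper bound on the semester cost reported above


def semester_cost(queries_per_student: float,
                  class_size: int = CLASS_SIZE,
                  cost_per_query: float = COST_PER_QUERY_USD) -> float:
    """Total cost in USD if every student sends the same number of queries."""
    return queries_per_student * class_size * cost_per_query


def max_queries_per_student(budget: float = BUDGET_USD,
                            class_size: int = CLASS_SIZE,
                            cost_per_query: float = COST_PER_QUERY_USD) -> float:
    """Queries each student can send before the class exceeds the budget."""
    return budget / (class_size * cost_per_query)


if __name__ == "__main__":
    # Illustrative usage level (an assumption, not a figure from the paper).
    print(f"Cost at 60 queries/student: ${semester_cost(60):.2f}")
    print(f"Queries/student within budget: {max_queries_per_student():.0f}")
```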
Breakthrough Assessment
7/10
While not a fundamental architectural advance in ML, the 'Code Removal' pipeline is a practical, effective pattern for enforcing negative constraints (guardrails) where standard prompting often fails.