
Automated Unit Test Improvement using Large Language Models at Meta

N. Alshahwan, Jubin Chheda, Anastasia Finogenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang
Meta Platforms Inc.
SIGSOFT FSE Companion (2024)

📝 Paper Summary

LLM-based Software Engineering; Automated Test Generation
TestGen-LLM automatically improves existing human-written unit tests at Meta. It generates new test cases and keeps only those that pass a strict filter: each must build, pass reliably, and increase code coverage.
Core Problem
Automated test generation in large industrial codebases faces challenges regarding trust, hallucination, and regression, often producing flaky or duplicate tests that waste engineering resources.
Why it matters:
  • LLM hallucinations make generated code unreliable for production without rigorous verification
  • Industrial scale (millions of lines of code) requires automated solutions that integrate into existing workflows without increasing maintenance burden
  • Regressions and flaky tests disrupt Continuous Integration (CI) systems, costing significant developer time
Concrete Example: An LLM might generate a test case that calls a non-existent method (a hallucination) or that passes on some runs and fails on others (flakiness). Added without filtration, such a test would break the build or destabilize CI. TestGen-LLM filters these out, keeping only tests that build, pass consistently, and cover previously missed lines (e.g., a specific 'early return' statement).
Key Novelty
Assured Offline LLM-Based Software Engineering (Assured Offline LLMSE)
  • Treats LLM output not as final code but as candidate suggestions that must pass a rigorous, automated filtration pipeline before being recommended to humans
  • Guarantees improvement by discarding any test that does not measurably increase coverage over the existing suite
  • Guarantees non-regression by only adding new test cases to existing classes, never modifying or deleting existing stable tests
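The filtration idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Meta's implementation: the `Candidate` type, the `filter_candidates` function, the default of five repeated runs, and the choice to accumulate coverage across accepted candidates are all assumptions made for the example. Coverage is modeled as a plain set of line numbers.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one LLM-generated test case."""
    name: str
    builds: bool                   # did the extended test class compile?
    run: Callable[[], bool]        # executes the test once; True means pass
    covered_lines: FrozenSet[int]  # lines of the code under test it executes


def passes_reliably(candidate: Candidate, repeats: int) -> bool:
    """Treat a test as reliable only if every repeated run passes."""
    return all(candidate.run() for _ in range(repeats))


def filter_candidates(
    candidates: Iterable[Candidate],
    baseline_coverage: Set[int],
    repeats: int = 5,  # assumed repeat count, chosen for illustration
) -> List[Candidate]:
    """Keep candidates that build, pass on every run, and add coverage."""
    coverage = set(baseline_coverage)
    accepted = []
    for candidate in candidates:
        if not candidate.builds:
            continue  # e.g. calls a hallucinated method
        if not passes_reliably(candidate, repeats):
            continue  # flaky or failing
        new_lines = candidate.covered_lines - coverage
        if not new_lines:
            continue  # no measurable improvement over the existing suite
        accepted.append(candidate)
        coverage |= new_lines  # assumption: later candidates must add more
    return accepted


# Existing suite covers lines 1, 2 and 4; line 3 is the missed early return.
baseline = {1, 2, 4}
flaky_outcomes = iter([True, False, True, True, True])
candidates = [
    Candidate("calls_missing_method", False, lambda: False, frozenset({3})),
    Candidate("flaky", True, lambda: next(flaky_outcomes), frozenset({3})),
    Candidate("duplicate", True, lambda: True, frozenset({1, 2})),
    Candidate("covers_early_return", True, lambda: True, frozenset({2, 3})),
]
accepted = filter_candidates(candidates, baseline)
print([c.name for c in accepted])  # ['covers_early_return']
```

Existing tests are never touched in this sketch: accepted candidates are only ever appended, which mirrors the non-regression property described in the list above.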
Architecture
Figure 1: Top-level architecture of TestGen-LLM as an Assured Offline LLMSE system
Evaluation Highlights
  • 73% of TestGen-LLM's recommendations were accepted by Meta engineers for production deployment during test-a-thons
  • 25% of generated test cases increased coverage (after building correctly and passing reliably) in an evaluation on Instagram's Reels and Stories products
  • Improved 11.5% of all classes to which it was applied during Meta's Instagram and Facebook test-a-thons
Breakthrough Assessment
8/10
High score due to the scale of industrial deployment and the high acceptance rate (73%) of LLM-generated tests, which supports the viability of 'Assured LLMSE' in production environments.