TestSmith: Coding agent role responsible for converting the product specification into executable tests (visible and hidden)
PromptSmith: Coding agent role responsible for iteratively refining the agent prompt until the visible tests pass
MutationSmith: Coding agent role responsible for generating plausible faulty prompt variants to evaluate the strength of the test suite
MFT: Minimum Functionality Test. Checks the basic required action for a specific leaf node in the decision tree
INV: Invariance Test. Checks that the agent's behavior remains consistent when user inputs are paraphrased
DIR: Directional Expectation Test. Checks that altering a specific input condition (e.g., the order value) changes the output as expected
canary values: Unique identifiers (e.g., specific fake SSNs) embedded in mock data; their appearance in the agent's output indicates a security failure
HPR: Hidden Pass Rate. The fraction of held-out tests (not seen by PromptSmith) that the compiled agent passes; measures generalization
SURS: Spec Update Regression Score. The fraction of v1 invariant tests that still pass after the agent is compiled for the v2 requirements
activation probe: A targeted test case used by MutationSmith to verify that a generated mutant prompt actually exhibits the intended faulty behavior before running the full test suite
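
As an illustration of the two ratio metrics and the canary check defined above, the following is a minimal Python sketch. It assumes test outcomes are available as a mapping from test name to pass/fail; the function names, test names, and the sample canary string are hypothetical and do not come from the original system.

```python
from typing import Iterable, List, Mapping


def pass_rate(results: Mapping[str, bool]) -> float:
    """Fraction of tests in `results` that passed (True = pass)."""
    if not results:
        raise ValueError("no test results supplied")
    return sum(results.values()) / len(results)


def hidden_pass_rate(hidden_results: Mapping[str, bool]) -> float:
    """HPR: pass rate over held-out tests that PromptSmith never saw."""
    return pass_rate(hidden_results)


def spec_update_regression_score(v1_invariant_results: Mapping[str, bool]) -> float:
    """SURS: pass rate of v1 invariant tests, run against the v2-compiled agent."""
    return pass_rate(v1_invariant_results)


def leaked_canaries(agent_output: str, canaries: Iterable[str]) -> List[str]:
    """Return every canary value that appears verbatim in the agent's output.

    A non-empty result is treated as a security failure. Plain substring
    matching is an assumption of this sketch.
    """
    return [c for c in canaries if c in agent_output]


if __name__ == "__main__":
    # Hypothetical outcomes for four hidden tests.
    hidden = {"mft_refund": True, "inv_paraphrase": False,
              "dir_order_value": True, "mft_escalate": True}
    print(hidden_pass_rate(hidden))

    # Hypothetical outcomes for v1 invariant tests after compiling for v2.
    v1_after_v2 = {"mft_refund": True, "mft_escalate": False}
    print(spec_update_regression_score(v1_after_v2))

    # Hypothetical fake SSN planted in mock data as a canary.
    print(leaked_canaries("Your SSN on file is 000-12-3456.", ["000-12-3456"]))
```

Both metrics reduce to the same ratio and differ only in which test set is scored and which compiled agent is run, which is why the sketch shares a single helper.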