Foundation Model (FM): A large-scale machine learning model (such as GPT-4) trained on broad, large-scale data that serves as the 'brain' for AI agents.
Agentic Application: Software that uses an FM to perceive, reason, and act to achieve goals, often using tools and memory.
SUT (Subject Under Test): The specific part of the software (function, class, module) being verified by a test.
AAA Pattern (Arrange-Act-Assert): A standard structure for unit tests: set up the state, execute the function under test, and verify the result.
JaCaMo: A well-established framework for multi-agent systems that combines Jason (agents), CArtAgO (environment artifacts), and Moise (organizations); used here as a reference architecture.
Non-determinism: The property of FMs where the same input may produce different outputs, complicating traditional equality-based testing.
Resource Artifacts: Deterministic tools or APIs the agent uses, such as a calculator or database connection.
Trigger: The component responsible for initiating an agent's plan, typically the prompt sent to the FM.
DeepEval: A specialized testing framework designed to evaluate LLM outputs using metrics like hallucination or faithfulness scores.
Mock Assertion: A testing pattern where the test verifies that a dependency was called (e.g., 'tool was invoked') rather than checking the actual output.
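
Example: Several of the terms above can be seen together in one small test. The sketch below is illustrative only: `run_agent`, the plan dictionary format, and the calculator tool are hypothetical names invented for this example, and the FM is replaced by a stub from Python's standard `unittest.mock` module so that the test is deterministic.

```python
from unittest.mock import Mock


def run_agent(prompt, fm, calculator):
    """Hypothetical SUT: ask the FM for a plan, then call the tool it names."""
    plan = fm(prompt)  # Trigger: the prompt sent to the FM
    if plan["tool"] == "calculator":
        # Resource artifact: a deterministic tool
        return calculator(plan["expression"])
    return plan.get("answer")


def test_agent_invokes_calculator():
    # Arrange: stub the non-deterministic FM and the calculator tool
    fm = Mock(return_value={"tool": "calculator", "expression": "2 + 3"})
    calculator = Mock(return_value=5)

    # Act: run the SUT once
    result = run_agent("What is 2 + 3?", fm, calculator)

    # Assert: mock assertions check that the dependencies were called
    # with the expected arguments, not what a real FM would have said
    fm.assert_called_once_with("What is 2 + 3?")
    calculator.assert_called_once_with("2 + 3")
    assert result == 5


test_agent_invokes_calculator()
```

Because the FM is stubbed, the equality check on `result` is safe here; against a live FM, non-determinism would usually call for the mock assertion alone or a metric-based evaluation instead.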