RAG: Retrieval-Augmented Generation—AI systems that answer questions by first searching for relevant documents
STARA: Statutory Research Assistant—a specialized retrieval system that parses legal codes preserving hierarchy and definitions before applying semantic search
F1 score: The harmonic mean of precision (are the returned answers correct?) and recall (are all the correct answers returned?); it is high only when both are high
UI: Unemployment Insurance—the specific legal domain of the benchmark
DOL: U.S. Department of Labor—the federal agency whose manual statutory surveys serve as the initial ground truth
LaborBench: A benchmark dataset of 1,647 questions on state unemployment insurance laws derived from DOL reports
RegEx: Regular Expressions—patterns used to filter text; here used to narrow the search space before semantic analysis
False Positive: A result where the AI reports a law that the ground truth says does not exist (though many of these proved to be valid laws missed by humans)
False Negative: A result where the AI fails to find an existing law
Recall: The percentage of all relevant laws that actually exist which the system finds
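
The metric terms above (false positive, false negative, recall, F1 score) fit together as in the minimal sketch below. It assumes each answer can be reduced to a set of items, here state codes; the function name `precision_recall_f1` and the toy data are hypothetical illustrations, not taken from LaborBench or STARA.

```python
def precision_recall_f1(predicted: set, actual: set) -> tuple:
    """Compare a system's answers against ground truth (both as sets)."""
    true_pos = len(predicted & actual)    # laws found that really exist
    false_pos = len(predicted - actual)   # laws claimed but absent from ground truth
    false_neg = len(actual - predicted)   # existing laws the system missed

    # Guard against empty sets so the ratios are always defined.
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if actual else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made-up data): states the system says have a given provision
# versus states listed in the ground truth.
predicted_states = {"CA", "NY", "TX", "WA"}
actual_states = {"CA", "NY", "TX", "OH", "FL"}
precision, recall, f1 = precision_recall_f1(predicted_states, actual_states)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case three of four predictions are correct (precision 0.75) and three of five existing laws are found (recall 0.60). If the ground truth itself is incomplete, as the False Positive entry notes, some counted false positives are really correct answers and the measured precision understates the true value.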