intrinsic hallucination: A type of error where the model's output contradicts the provided source context (e.g., a table in the prompt), as opposed to contradicting external world knowledge
masked span prediction: A task where specific parts of a text are hidden (masked) and the model must predict the original content based on context
S&P 500: A stock market index tracking the stock performance of 500 of the largest companies listed on stock exchanges in the United States
10-K report: A comprehensive annual report on a company's financial performance that publicly traded companies file with the U.S. Securities and Exchange Commission
precision-relaxed evaluation: An evaluation method that normalizes numbers and compares them based on their significant digits to avoid penalizing valid formatting differences (e.g., 1M vs 1,000,000)
unit groups: Sets of aliased units (e.g., {$, USD, dollars}) used to match predicted units with ground truth regardless of specific phrasing
Fleiss' Kappa: A statistical measure of chance-corrected agreement among a fixed number of raters assigning categorical ratings to a set of items
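
To make the precision-relaxed evaluation and unit groups entries concrete, here is a minimal sketch of how such matching could work. It is an illustration under stated assumptions, not the original evaluation code: the function names, the suffix table, the second unit group, and the rule of rounding both numbers to the smaller count of significant digits are all hypothetical choices made for this example.

```python
import re
from decimal import Decimal, Context, ROUND_HALF_UP

# Hypothetical magnitude suffixes; a real evaluator may support more.
SUFFIXES = {"k": Decimal(10) ** 3, "m": Decimal(10) ** 6, "b": Decimal(10) ** 9}

# Unit groups: each set lists aliases treated as the same unit.
# The first group comes from the glossary; the second is an assumed example.
UNIT_GROUPS = [
    {"$", "usd", "dollars"},
    {"%", "percent"},
]


def normalize_number(text: str) -> Decimal:
    """Parse strings such as '1M' or '1,000,000' into a Decimal."""
    cleaned = text.strip().lower().replace(",", "")
    match = re.fullmatch(r"([-+]?\d*\.?\d+)\s*([kmb]?)", cleaned)
    if match is None:
        raise ValueError(f"not a recognizable number: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2):
        value *= SUFFIXES[match.group(2)]
    return value


def significant_digits(value: Decimal) -> int:
    """Count significant digits, ignoring trailing zeros."""
    return len(value.normalize().as_tuple().digits)


def numbers_match(pred: str, gold: str) -> bool:
    """Compare two numbers at the coarser of their two precisions (assumed rule)."""
    p, g = normalize_number(pred), normalize_number(gold)
    n = min(significant_digits(p), significant_digits(g))
    ctx = Context(prec=n, rounding=ROUND_HALF_UP)
    return ctx.create_decimal(p) == ctx.create_decimal(g)


def units_match(pred: str, gold: str) -> bool:
    """Two units match if they are identical or share a unit group."""
    p, g = pred.strip().lower(), gold.strip().lower()
    if p == g:
        return True
    return any(p in group and g in group for group in UNIT_GROUPS)
```

With these definitions, numbers_match("1M", "1,000,000") and units_match("USD", "$") both accept the prediction, while numbers_match("3.2", "3.14") rejects it because the two values differ at the shared two-digit precision.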