Continued Pre-training: Training an already pre-trained model on a specific domain corpus (e.g., legal text) to adapt its internal knowledge before fine-tuning
RAG: Retrieval-Augmented Generation—systems that improve model answers by retrieving relevant documents from an external database and supplying them to the model as additional context
SFT: Supervised Fine-Tuning—training a model on labeled instruction-response pairs to teach it how to follow user commands
Needle in a Haystack (NIAH): A benchmark testing if a model can find a specific fact hidden within a very large amount of unrelated text
OAB: Ordem dos Advogados do Brasil—the Brazilian Bar Association exam, used here as a benchmark for legal reasoning and drafting
TPU: Tensor Processing Unit—specialized hardware by Google designed to accelerate machine learning workloads
JAX: A high-performance numerical computing library from Google that pairs NumPy-style APIs with automatic differentiation and JIT compilation, widely used for machine learning research, particularly on TPUs
Function Calling: The ability of an LLM to generate structured outputs (like JSON) that can be executed by external code or APIs
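To make the last entry concrete, here is a minimal sketch of the function-calling pattern: the model emits a structured JSON "call", and the host application parses it and dispatches to real code. The tool name `search_case_law` and its arguments are hypothetical, invented for illustration, not from any specific API.

```python
import json

def search_case_law(query: str, max_results: int) -> list[str]:
    # Stand-in for a real legal-database lookup (hypothetical tool).
    return [f"result {i} for {query!r}" for i in range(max_results)]

# Registry mapping tool names the model may call to actual functions.
TOOLS = {"search_case_law": search_case_law}

# The model's structured output: JSON naming a tool and its arguments.
model_output = (
    '{"name": "search_case_law",'
    ' "arguments": {"query": "habeas corpus", "max_results": 2}}'
)

# The host parses the JSON and executes the requested call.
call = json.loads(model_output)
results = TOOLS[call["name"]](**call["arguments"])
```

Real frameworks add schema validation and error handling around this loop, but the core idea is the same: the LLM never executes code itself, it only emits structured data that external code acts on.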