RAG: Retrieval-Augmented Generation, an approach in which a system retrieves relevant documents from an external store and conditions the model's generated answer on them
GraphRAG: A structured RAG approach that builds a knowledge graph (entities/relationships) from documents to enable better reasoning and global summarization
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains only small low-rank matrices added to selected weight matrices
CoT: Chain-of-Thought—a prompting technique encouraging the model to generate intermediate reasoning steps
LLM-native memory: Information stored directly in the model's parameters via fine-tuning, rather than accessed via external retrieval
local queries: Questions targeting specific, fine-grained details within a small chunk of text
global queries: Questions requiring synthesis or aggregation of information across the entire memory/corpus
SFT: Supervised Fine-Tuning—training a model on labeled examples to adapt it to a specific task
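
To make the LoRA entry above concrete, here is a minimal sketch of the low-rank update and of why it is parameter-efficient. It assumes the common formulation W' = W + (alpha / r) * B A, where W stays frozen and only B and A are trained. The function names, the 4096 x 4096 layer size, and the rank of 8 are illustrative assumptions, not values taken from this document.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: updating every entry of the d_out x d_in matrix W.
    lora: training only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def matmul(a, b):
    """Multiply two matrices given as lists of rows (pure Python)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, b, a, alpha, rank):
    """Combine the frozen W with the scaled low-rank update (alpha / rank) * B A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Illustrative sizes (assumed): one 4096 x 4096 projection, rank 8.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full, lora)  # 16777216 65536

# Tiny rank-1 example: W is the 2 x 2 identity, B is 2 x 1, A is 1 x 2.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[0.5, 0.0]]
print(lora_effective_weight(w, b, a, alpha=2, rank=1))  # [[2.0, 0.0], [2.0, 1.0]]
```

With these assumed sizes, the low-rank factors hold 1/256 of the parameters of the full matrix, which is where the "parameter-efficient" label comes from.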