MRAMG: Multimodal Retrieval-Augmented Multimodal Generation, the task of generating answers that combine text with retrieved images placed where they are relevant
Interleaved content: Data format where text paragraphs and images appear in a specific sequence (e.g., Text1, Image1, Text2...)
CoT: Chain-of-Thought—a prompting strategy where the model generates intermediate reasoning steps before the final answer
MinHash: A technique for quickly estimating the Jaccard similarity between two sets from compact hash signatures, used here for deduplicating images
MinerU: A tool used to parse PDF documents into markdown format while preserving the structure of text and images
MRAMG-Bench: The proposed benchmark consisting of datasets across Web, Academia, and Lifestyle domains
RAG: Retrieval-Augmented Generation, an approach in which a system first retrieves relevant documents and then generates an answer conditioned on them
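
To make the MinHash entry concrete, the following is a minimal sketch of the general technique, not the benchmark's actual deduplication pipeline. It assumes each image has already been reduced to a set of string tokens; how that reduction is done is not specified here, and the function names and the `num_perm` parameter are hypothetical.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm: int = 128) -> list:
    """Return a MinHash signature: the minimum hash value per seeded hash function.

    `items` is an iterable of string tokens (assumed to describe one image).
    """
    tokens = set(items)
    if not tokens:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical token sets standing in for two partially overlapping images.
    img_a = {f"tok{i}" for i in range(0, 100)}
    img_b = {f"tok{i}" for i in range(50, 150)}
    est = estimate_jaccard(minhash_signature(img_a), minhash_signature(img_b))
    print(f"exact={exact_jaccard(img_a, img_b):.3f} estimated={est:.3f}")
```

In a deduplication setting, pairs whose estimated similarity exceeds a chosen threshold would be treated as near-duplicates; the threshold is a tuning choice and none is implied here.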