RAG: Retrieval-Augmented Generation, an approach in which an AI system answers a question by first retrieving relevant documents and then generating a response conditioned on them
ScreenshotVQA: A new benchmark introduced in this paper, comprising ~20,000 high-resolution screenshots, that tests multimodal memory recall
LOCOMO: A long-context benchmark requiring reasoning over long-form, multi-turn conversations
Episodic Memory: Memory component storing time-stamped, specific events and experiences
Semantic Memory: Memory component storing abstract facts, concepts, and entities independent of specific events
Procedural Memory: Memory component storing step-by-step instructions and workflows
Gemini API: A multimodal LLM API from Google used here for processing visual data
React-Electron: The combination of React (a JavaScript UI library) and Electron (a framework for building cross-platform desktop applications using web technologies)
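The glossary entries for RAG and the three memory components describe a structure that a short sketch can make concrete. The code below is a minimal illustration under stated assumptions, not the paper's implementation: it stores episodic, semantic, and procedural entries as separate record types, ranks them against a query by simple word overlap (a stand-in for the embedding search a real RAG system would typically use), and places the retrieved text into a prompt. No LLM is called. All class and function names (`EpisodicEvent`, `SemanticFact`, `ProceduralEntry`, `MemoryStore`, `build_prompt`) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set, Tuple, Union


@dataclass
class EpisodicEvent:
    """Episodic memory: a specific, time-stamped event (hypothetical schema)."""
    timestamp: datetime
    description: str

    def text(self) -> str:
        return f"[{self.timestamp.isoformat()}] {self.description}"


@dataclass
class SemanticFact:
    """Semantic memory: an abstract fact about an entity, independent of events."""
    entity: str
    fact: str

    def text(self) -> str:
        return f"{self.entity}: {self.fact}"


@dataclass
class ProceduralEntry:
    """Procedural memory: a named workflow stored as ordered steps."""
    task: str
    steps: List[str]

    def text(self) -> str:
        numbered = "; ".join(f"{i}. {s}" for i, s in enumerate(self.steps, 1))
        return f"How to {self.task}: {numbered}"


Memory = Union[EpisodicEvent, SemanticFact, ProceduralEntry]


def tokenize(s: str) -> Set[str]:
    """Lowercase and split into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


@dataclass
class MemoryStore:
    items: List[Memory] = field(default_factory=list)

    def add(self, item: Memory) -> None:
        self.items.append(item)

    def retrieve(self, query: str, k: int = 2) -> List[Memory]:
        """Return up to k memories sharing the most words with the query.

        Assumption: word overlap replaces embedding similarity for brevity.
        Ties are broken by insertion order (earlier items first).
        """
        q = tokenize(query)
        scored: List[Tuple[int, int, Memory]] = []
        for idx, item in enumerate(self.items):
            overlap = len(q & tokenize(item.text()))
            if overlap > 0:
                scored.append((overlap, -idx, item))
        scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
        return [item for _, _, item in scored[:k]]


def build_prompt(store: MemoryStore, question: str, k: int = 2) -> str:
    """RAG step: retrieve first, then put the retrieved text into the LLM prompt."""
    context = "\n".join(m.text() for m in store.retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = MemoryStore()
    store.add(EpisodicEvent(datetime(2024, 5, 1, 9, 30),
                            "User opened the quarterly report in a spreadsheet"))
    store.add(SemanticFact("Alice", "Alice is the user's manager"))
    store.add(ProceduralEntry("export a report",
                              ["open the report", "click File", "choose Export as PDF"]))
    print(build_prompt(store, "Who is Alice?"))
```

Keeping the three memory types as distinct record types mirrors the glossary's separation of events, facts, and workflows; a fuller system would likely index each type separately and use a learned retriever rather than word overlap.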