PIR: Proactive Intent Recommendation—a task where agents anticipate user needs from context without explicit prompts.
GUI: Graphical User Interface—the visual display (icons, windows) users interact with on devices.
POMDP: Partially Observable Markov Decision Process, a mathematical framework for sequential decision-making in which the agent receives only partial observations and cannot directly see the full state of the world.
MLLM: Multimodal Large Language Model—AI models capable of processing and reasoning over both text and images.
Hallucination: In this context, when an agent predicts an intent or action that the user does not actually have or need, often triggered by noise.
PIRF: Proactive Intent Recommendation Framework—the baseline architecture proposed in the paper featuring memory and reflection.
F1 score: The harmonic mean of precision and recall, used here to measure how accurately the predicted intents match the ground-truth intents.
FPS: False Positive Score—measures the frequency of hallucinated intents when the agent should have remained silent.
Interleaved intents: Multiple distinct tasks occurring in a mixed sequence (e.g., switching between chatting and studying).
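
The two metrics defined above (F1 score and FPS) can be illustrated with a minimal sketch. It rests on assumptions not stated in this glossary: intents are compared by exact string match (the paper may use a softer, semantic match), an empty prediction against an empty ground truth counts as a perfect score, and FPS is taken to be the fraction of "should stay silent" steps on which the agent still emitted at least one intent. The function names `intent_f1` and `false_positive_rate` are hypothetical and chosen only for illustration.

```python
def intent_f1(predicted, truth):
    """F1 between predicted and ground-truth intents (exact-match assumption)."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        # Assumption: correctly staying silent counts as a perfect score.
        return 1.0
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(steps):
    """Share of silent-expected steps where the agent still predicted an intent.

    `steps` is a list of (predicted_intents, ground_truth_intents) pairs.
    This is an assumed reading of FPS, not the paper's exact formula.
    """
    silent_steps = [pred for pred, truth in steps if not truth]
    if not silent_steps:
        return 0.0
    hallucinated = sum(1 for pred in silent_steps if pred)
    return hallucinated / len(silent_steps)


if __name__ == "__main__":
    print(intent_f1(["book flight", "reply to chat"], ["book flight", "set alarm"]))
    print(false_positive_rate([(["open maps"], []), ([], []), (["set alarm"], ["set alarm"])]))
```

In the first call, one of two predicted intents is correct and one of two true intents is recovered, so precision and recall are both 0.5 and the F1 is 0.5. In the second call, two steps expect silence and the agent speaks on one of them, giving a rate of 0.5.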