
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

Yuxiang Chai, Shunye Tang, Han Xiao, Rui Liu, Hongsheng Li
Nankai University, Huawei Research
arXiv (2026)
MM Agent Benchmark Memory P13N

📝 Paper Summary

Proactive AI Assistants, GUI Automation
PIRA-Bench evaluates the ability of multimodal agents to shift from reactive instruction-following to proactively inferring future user intents from continuous, noisy GUI visual streams and user profiles.
Core Problem
Current GUI agents are reactive, requiring explicit, detailed instructions from users who may forget context or find prompting tedious, and they fail to handle real-world interleaved multitasking.
Why it matters:
  • Explicit prompting imposes a high cognitive burden on users, interrupting natural workflows.
  • Reactive agents fail in dynamic scenarios where users omit crucial details (e.g., time or location mentioned earlier).
  • Real-world screen activity is non-linear and noisy; agents must distinguish between active tasks, background browsing, and idle distractions.
Concrete Example: If a user chats with a friend about a weekend meal, a reactive agent waits for a command like 'Book restaurant'. A proactive PIRA agent observes the chat, anticipates the need, and autonomously recommends booking the table, setting a reminder, and adding a calendar event.
Key Novelty
Proactive Intent Recommendation (PIR) Benchmark & Framework
  • Defines a new task (PIR) where agents must predict latent future goals from passive screen history rather than executing explicit current commands.
  • Introduces a dataset (PIRA-Bench) that includes 'negative' pure-noise trajectories to test operational restraint, i.e., whether the agent avoids hallucinating an intent when no action is needed.
  • Proposes PIRF (Proactive Intent Recommendation Framework), which uses a memory module with 'reflection-based auto-deletion' to keep track of interleaved tasks and remove outdated intents.
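A minimal sketch of what a memory with reflection-based auto-deletion could look like. It is an illustration under stated assumptions, not the PIRF implementation: the names `Intent`, `IntentMemory`, and `toy_relevance` are hypothetical, and the reflection step (which this summary does not specify) is stood in for by a simple keyword rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Intent:
    description: str   # e.g. "Book a restaurant for the weekend"
    created_step: int  # index of the screenshot that triggered the intent


@dataclass
class IntentMemory:
    # Hypothetical reflection callback: returns True if the intent is still
    # relevant given the latest observation. Any callable can be plugged in.
    is_still_relevant: Callable[[Intent, str], bool]
    intents: List[Intent] = field(default_factory=list)

    def add(self, description: str, step: int) -> None:
        self.intents.append(Intent(description, step))

    def reflect(self, observation: str) -> List[Intent]:
        """Drop intents judged outdated; return the ones that were removed."""
        kept: List[Intent] = []
        dropped: List[Intent] = []
        for intent in self.intents:
            if self.is_still_relevant(intent, observation):
                kept.append(intent)
            else:
                dropped.append(intent)
        self.intents = kept
        return dropped


def toy_relevance(intent: Intent, observation: str) -> bool:
    # Toy rule (an assumption): a booking intent becomes stale once the
    # screen shows a booking confirmation.
    is_booking = "book" in intent.description.lower()
    confirmed = "booking confirmed" in observation.lower()
    return not (is_booking and confirmed)


memory = IntentMemory(is_still_relevant=toy_relevance)
memory.add("Book a restaurant for the weekend", step=3)
memory.add("Set a reminder for the meal", step=3)
dropped = memory.reflect("Booking confirmed for Saturday 7pm")
```

After the `reflect` call, the booking intent is removed while the reminder intent stays in memory, which mirrors the idea of tracking interleaved tasks and pruning outdated ones.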
Architecture
Figure 1 (Conceptual)
Illustration of the Proactive Intent Recommendation (PIR) Agent concept versus a standard Reactive Agent.
Evaluation Highlights
  • Constructed 100 real-world trajectories averaging 32 sequential screenshots each, designed to test long-horizon visual understanding.
  • Each trajectory is paired with 3 distinct user profiles (300 evaluation instances total) to assess personalization capabilities (e.g., suggesting luxury vs. budget options).
  • Includes specific 'Negative Rejection Samples' composed entirely of noise to strictly penalize agents that fail to remain idle when no intent exists.
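The instance count and the rejection check described above can be sketched as follows. This is a hedged illustration: the function names, the ID format, and the all-or-nothing rejection score are assumptions, since the benchmark's actual metric is not given in this summary.

```python
from typing import List, Tuple

PROFILES_PER_TRAJECTORY = 3  # from the summary: 3 profiles per trajectory


def build_instances(trajectory_ids: List[str]) -> List[Tuple[str, int]]:
    """Pair every trajectory with each of its profile slots (0, 1, 2)."""
    return [
        (trajectory_id, profile_index)
        for trajectory_id in trajectory_ids
        for profile_index in range(PROFILES_PER_TRAJECTORY)
    ]


def rejection_score(predicted_intents: List[str]) -> float:
    """Assumed scoring for a pure-noise trajectory.

    The agent earns credit only if it recommends nothing at all; any
    recommendation counts as a hallucinated intent.
    """
    return 1.0 if not predicted_intents else 0.0


# Hypothetical IDs standing in for the 100 real-world trajectories.
trajectory_ids = [f"traj_{i:03d}" for i in range(100)]
instances = build_instances(trajectory_ids)
```

With 100 trajectories this yields 100 × 3 = 300 evaluation instances, matching the count stated above.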
Breakthrough Assessment
8/10
Significant conceptual shift from reactive to proactive GUI agents. The inclusion of pure noise trajectories and profile-dependent ground truths addresses critical gaps in agent reliability and personalization.