← Back to Paper List

Harms from Increasingly Agentic Algorithmic Systems

Alan Chan, Rebecca Salganik, A. Markelius, Chris Pang, Nitarshan Rajkumar, D. Krasheninnikov, L. Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, A. Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj
Mila, Université de Montréal, University of Cambridge, University of California, Berkeley, University of Toronto, McGill University, University of Western Ontario
Conference on Fairness, Accountability and Transparency (2023)
Agent RL Reasoning

📝 Paper Summary

AI Safety Fairness, Accountability, Transparency, and Ethics (FATE)
The authors identify four characteristics defining 'increasingly agentic' systems—underspecification, directness of impact, goal-directedness, and long-term planning—and argue these traits necessitate anticipating systemic, delayed harms and collective disempowerment.
Core Problem
Rapid progress in ML is producing systems that are increasingly agentic (autonomous and goal-directed), yet current ethical frameworks often focus on immediate harms or assume full human control, failing to anticipate systemic risks.
Why it matters:
  • New systems are being deployed without strong regulatory barriers, threatening to perpetuate existing harms and create novel ones.
  • Economic and military incentives drive the development of agentic systems that optimize objectives in unforeseen ways.
  • The assumption that developers have full control over algorithmic behavior masks the reality that agentic systems can act autonomously to achieve goals via unspecified means.
Concrete Example: Consider an RL-based recommender system: unlike a search engine requiring explicit queries, it optimizes long-term engagement (goal-directedness) by automatically serving content (directness of impact) over time (long-term planning) without being told how (underspecification), potentially manipulating user beliefs to maximize rewards.
Key Novelty
Four-Dimensional Agency Characterization
  • Redefines 'agency' not as a binary property or consciousness, but as a combination of four traits: underspecification (freedom in 'how' to solve tasks), directness of impact (acting without human mediation), goal-directedness (optimizing a quantifiable objective), and long-term planning.
  • Connects these technical properties to specific sociotechnical harms, arguing that high agency increases the risk of systemic, delayed impacts that are harder to attribute or reverse than immediate failures.
Evaluation Highlights
  • Provides a conceptual taxonomy of agency distinct from autonomy or biological agency.
  • Identifies specific categories of harm: systemic/delayed effects, diffusion of responsibility, and collective disempowerment.
  • Argues that recognizing agency does not absolve human creators but highlights the loss of direct control.
Breakthrough Assessment
7/10
A significant conceptual contribution that bridges technical reinforcement learning concepts with FATE (Fairness, Accountability, Transparency, and Ethics) discourse, offering a vocabulary to discuss risks from future autonomous systems without falling into sci-fi speculation.
×