
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering

Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen
arXiv (2025)
P13N Memory RL

📝 Paper Summary

Edge-Cloud Collaborative Inference, Privacy-Preserving Personalization
CoSteer enables privacy-preserving personalization by using a local small model to calculate steering signals from user data, which then guide a frozen cloud LLM's generation without transmitting private context.
Core Problem
Deploying personalized LLMs involves a difficult trade-off: cloud models compromise privacy by requiring user data transmission, while local models lack the computational power for high-quality generation.
Why it matters:
  • Sending private user context (profiles, history) to cloud LLMs violates privacy and data residency requirements
  • Local devices cannot host state-of-the-art LLMs, leading to inferior personalized content if relying solely on on-device models
  • Existing training-based personalization is too resource-intensive for edge devices and difficult to update in real-time as user preferences evolve
Concrete Example: A user asks 'Recommend a dinner spot.' A cloud LLM, lacking context, suggests a generic steakhouse. A local small model knows the user is vegetarian but generates incoherent text. Current methods either force the user to upload their 'vegetarian' profile to the cloud (privacy risk) or leave them with the poor local output.
Key Novelty
Collaborative Decoding-Time Personalization via Local Delta Steering
  • Treats personalization as an online learning problem where a local device iteratively 'steers' the cloud model's output distribution
  • Calculates a 'delta' vector locally by comparing a small model's predictions with and without personal context (e.g., 'prediction given profile' minus 'prediction given query only')
  • Fuses this local delta with the cloud model's logits using a closed-form update rule, ensuring the cloud model never sees the raw private data
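The three steps above can be illustrated with a toy sketch. This is not the paper's closed-form update rule, which this summary does not spell out. It assumes a simple additive fusion in log-probability space with a hypothetical weight `alpha`. The function names, the three-word vocabulary, and all logit values are invented for illustration and reuse the dinner example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def local_delta(small_with_ctx, small_without_ctx):
    """Delta computed on-device: small model's log-probs with the private
    context minus its log-probs given the query only."""
    lp_ctx = log_softmax(small_with_ctx)
    lp_base = log_softmax(small_without_ctx)
    return [a - b for a, b in zip(lp_ctx, lp_base)]

def steer(cloud_logits, delta, alpha=1.0):
    """ASSUMPTION: additive fusion with scalar weight alpha. This stands in
    for the paper's closed-form rule and is not taken from it."""
    return [c + alpha * d for c, d in zip(log_softmax(cloud_logits), delta)]

def greedy_token(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy next-token logits over a hypothetical three-word vocabulary.
VOCAB = ["steakhouse", "vegetarian", "cafe"]
cloud_logits = [2.0, 1.0, 0.5]       # cloud LLM, sees the query only
small_without_ctx = [1.0, 0.5, 0.2]  # local model, query only
small_with_ctx = [0.2, 2.5, 0.3]     # local model, query + private profile

delta = local_delta(small_with_ctx, small_without_ctx)
steered = steer(cloud_logits, delta, alpha=1.0)

print(VOCAB[greedy_token(cloud_logits)])  # cloud model's unsteered choice
print(VOCAB[greedy_token(steered)])       # choice after local delta steering
```

In this sketch the private profile only ever feeds the local small model. The fusion step consumes the cloud model's logits and the delta, never the raw profile text. Setting `alpha=0` recovers the cloud model's unsteered distribution.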
Architecture
Figure: Algorithm 1 (concept description)
The edge-cloud collaborative inference procedure
Breakthrough Assessment
8/10
Addresses the privacy-utility bottleneck in personalization by decoupling context processing (local) from generation capability (cloud) through a mathematically grounded steering mechanism.