
LLMs Can Infer Political Alignment from Online Conversations

Byunghwee Lee, Sangyeon Kim, Filippo Menczer, Yong-Yeol Ahn, Haewoon Kwak, Jisun An
School of Data Science, University of Virginia; Center for Complex Networks and Systems Research, Indiana University; Division of Communication and Media, Ewha Womans University
arXiv (2026)
P13N Benchmark Factuality

📝 Paper Summary

User modeling Privacy and Societal Impact
Large language models can accurately infer a user's political alignment from seemingly innocuous, general-interest online conversations (like music or cars) by leveraging latent socio-cultural linguistic correlations.
Core Problem
Seemingly harmless public preferences (e.g., music taste, car choice) correlate with private traits like political alignment, but the extent to which off-the-shelf LLMs can exploit these correlations for mass profiling without bespoke training is unknown.
Why it matters:
  • Enables large-scale political micro-targeting and manipulation (similar to the Cambridge Analytica scandal) using easily accessible public data
  • Demonstrates a fundamental privacy risk where opting out of political discussions does not protect users from being politically profiled
  • Reduces the barrier to entry for invasive psychological profiling, moving it from data experts to anyone with access to standard LLMs
Concrete Example: A user discusses 'Taylor Swift' in a music forum or 'Tesla' in a car forum. While these are not explicit policy debates, an LLM infers the user is likely Democratic or Republican, respectively, because these cultural symbols have become politicized signals.
Key Novelty
Zero-shot Inference of Political Alignment from General Discourse
  • Demonstrates that LLMs pre-trained on web-scale data natively encode subtle socio-cultural correlations (homophily), allowing them to predict political alignment from non-political text (e.g., 'Health', 'Science') without task-specific fine-tuning
  • Introduces confidence-based aggregation methods (Max-Confidence) that significantly boost user-level prediction accuracy by filtering for texts where the LLM detects strong partisan signals
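
The summary does not spell out how Max-Confidence aggregation is computed. As a minimal sketch, assume each text yields a label plus a confidence score in [0, 1], and that the user-level prediction is simply the label of the single most confident text. The names `TextPrediction`, `max_confidence_label`, and `majority_label`, along with the toy values, are hypothetical and not taken from the paper; the majority vote is included only as a point of comparison.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class TextPrediction(NamedTuple):
    """One text-level inference (hypothetical structure, not from the paper)."""
    label: str         # e.g. "Democratic" or "Republican"
    confidence: float  # assumed to be a model-reported score in [0, 1]


def max_confidence_label(predictions: Sequence[TextPrediction]) -> str:
    """Return the label of the single most confident text-level prediction."""
    if not predictions:
        raise ValueError("need at least one text-level prediction")
    return max(predictions, key=lambda p: p.confidence).label


def majority_label(predictions: Sequence[TextPrediction]) -> str:
    """Comparison baseline: the most frequent label across all of a user's texts."""
    if not predictions:
        raise ValueError("need at least one text-level prediction")
    return Counter(p.label for p in predictions).most_common(1)[0][0]


# Toy data: two weak signals one way, one strong signal the other way.
user_predictions = [
    TextPrediction("Democratic", 0.55),
    TextPrediction("Democratic", 0.52),
    TextPrediction("Republican", 0.95),
]

print(max_confidence_label(user_predictions))  # Republican
print(majority_label(user_predictions))        # Democratic
```

In this toy case the two rules disagree: the single high-confidence text overrides two low-confidence ones, which is the filtering behavior the bullet above describes.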
Evaluation Highlights
  • GPT-4o achieves an F1 score of 0.799 on Reddit general-interest texts using maximum-confidence aggregation, an improvement of 0.193 over text-level inference
  • LLMs outperform traditional supervised machine learning baselines (max F1 ~0.612) at identifying political alignment from Debate.org data
  • Per-category inference performance on Reddit and Debate.org is strongly correlated (r=0.673), suggesting stable discourse-level leakage of political signals
Breakthrough Assessment
8/10
Strongly demonstrates a significant privacy-invading capability of vanilla LLMs, which outperform traditional supervised methods. The finding that general, innocuous text leaks high-fidelity political signals has major implications for privacy and user modeling.