Enhancing Debunking Effectiveness through LLM-based Personality Adaptation

📝 Paper Summary

User-profile-based personalization · Combating misinformation

This study uses Large Language Models to tailor fake news debunking messages to specific Big Five personality profiles, finding that personalized content is generally more persuasive than generic fact-checking.

Core Problem

Generic fact-checking messages often fail to persuade users because they overlook individual psychological differences, cognitive styles, and pre-existing beliefs.

Why it matters:

Generic debunking is suboptimal because personality traits such as Extraversion or Openness significantly moderate how individuals process information and accept corrections
Manual fact-checking is not scalable against the volume of AI-generated disinformation, necessitating automated but effective counter-narrative strategies
One-size-fits-all approaches ignore cognitive biases like confirmation bias, reducing the impact of corrections on susceptible groups

Concrete Example: A generic debunking message might simply state the facts, which may fail to resonate with a highly Neurotic reader who responds better to reassurance, or with an Extravert who is more engaged by social rewards. The proposed system rewrites the verdict to match the reader's psychological profile (e.g., emphasizing social aspects for Extraverts).

Key Novelty

Big Five-Aligned LLM Debunking

Systematically prompt an LLM to rewrite generic debunking verdicts into 32 distinct variations corresponding to binarized Big Five personality profiles (e.g., High Extraversion, Low Agreeableness)
Employ a persona-based evaluation framework in which an LLM judge, prompted to adopt a specific personality profile, rates the persuasiveness of matched vs. mismatched messages

Architecture

The workflow for generating and evaluating personalized debunking messages

Evaluation Highlights

Personalized (Matched) verdicts achieved higher persuasiveness scores than Generic verdicts across all judge profiles and models (t-statistic > 15, p < 0.05)
With Qwen3-8B as the judge, the verdict tailored to the judge's exact profile was rated most persuasive in 88.64% of cases, well above Llama3 (70.59%) and Qwen3-32B (68.78%)
All models showed high accuracy (>86%) in preferring verdicts tailored to the exact profile or a 'close neighbor' (differing by only one trait) over generic content

Breakthrough Assessment

7/10

Demonstrates a practical, automated pipeline for psychological targeting in misinformation correction. While the methodology is sound and results are positive, it relies entirely on synthetic evaluation (LLM-as-a-judge) without human validation.

⚙️ Technical Details

Problem Definition

Setting: Automated rewriting of debunking verdicts to maximize persuasiveness for specific psychological profiles

Inputs: Fake news claim, generic debunking verdict, full debunking context, target personality profile (5-bit binary code)

Outputs: Personalized debunking verdict text
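A minimal sketch of how one rewriting instance might be represented, assuming Python 3.9+. The class `RewriteTask`, the helper `build_rewrite_prompt`, the OCEAN bit order, and the prompt wording are all hypothetical; the paper's actual prompt templates are not reproduced in this summary.

```python
from dataclasses import dataclass

# ASSUMPTION: bit i of the profile code follows OCEAN order.
TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


@dataclass(frozen=True)
class RewriteTask:
    """One rewriting instance holding the four inputs listed above (hypothetical)."""
    claim: str            # the fake news claim
    generic_verdict: str  # the fact-checker's original verdict
    context: str          # full debunking context
    profile: str          # target profile as a 5-bit code, e.g. "10101"

    def __post_init__(self) -> None:
        # Reject anything that is not exactly five '0'/'1' characters.
        if len(self.profile) != len(TRAITS) or set(self.profile) - {"0", "1"}:
            raise ValueError(f"profile must be a 5-bit code, got {self.profile!r}")


def build_rewrite_prompt(task: RewriteTask) -> str:
    """Assemble an illustrative rewriting prompt (not the paper's template)."""
    levels = ", ".join(
        f"{'high' if bit == '1' else 'low'} {name}"
        for name, bit in zip(TRAITS, task.profile)
    )
    return (
        f"Claim: {task.claim}\n"
        f"Fact-check context: {task.context}\n"
        f"Original verdict: {task.generic_verdict}\n\n"
        f"Rewrite the verdict for a reader with {levels}. "
        "Keep the factual conclusion unchanged; adapt only tone and framing."
    )
```

For example, a task with profile `"10101"` yields a prompt aimed at a reader high in Openness, Extraversion and Neuroticism and low in Conscientiousness and Agreeableness. Validating the code when the task is constructed keeps malformed profiles out of the later stages.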

Pipeline Flow

Profile Definition (32 binarized Big Five profiles)
Tailored Debunking Generation (LLM rewrites generic verdict for target profile)
Persona-Based Evaluation (LLM judge adopts profile to rate persuasiveness)
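The three stages can be sketched as a single loop. This is a simplified illustration, not the paper's code: `run_pipeline`, the two callable signatures, and the choice to average a judge's scores over all 31 non-matching verdicts to form the Mismatched condition are assumptions. The rewriter here also sees only the generic verdict, whereas the paper additionally passes the claim and the debunking context. Any real LLM client could be wrapped to fit `rewrite` and `judge`.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict

# Stage 1: the 32 binarized profiles ("00000" ... "11111").
PROFILES = ["".join(bits) for bits in product("01", repeat=5)]

RewriteFn = Callable[[str, str], str]    # (generic_verdict, profile) -> tailored verdict
JudgeFn = Callable[[str, str], float]    # (judge_profile, verdict) -> 1-7 score


def run_pipeline(generic_verdict: str, rewrite: RewriteFn,
                 judge: JudgeFn) -> Dict[str, Dict[str, float]]:
    # Stage 2: one tailored verdict per target profile.
    tailored = {p: rewrite(generic_verdict, p) for p in PROFILES}
    results = {}
    # Stage 3: each judge persona scores every tailored verdict plus the generic one.
    for judge_profile in PROFILES:
        scores = {p: judge(judge_profile, v) for p, v in tailored.items()}
        results[judge_profile] = {
            "matched": scores[judge_profile],
            "mismatched": mean(s for p, s in scores.items() if p != judge_profile),
            "generic": judge(judge_profile, generic_verdict),
        }
    return results


if __name__ == "__main__":
    # Toy stand-ins for the LLM calls, for illustration only.
    def toy_rewrite(verdict: str, profile: str) -> str:
        return f"[{profile}] {verdict}"

    def toy_judge(judge_profile: str, verdict: str) -> float:
        if verdict.startswith(f"[{judge_profile}]"):
            return 6.0   # verdict written for this persona
        if verdict.startswith("["):
            return 4.0   # verdict written for another persona
        return 3.0       # generic verdict

    out = run_pipeline("The claim is false.", toy_rewrite, toy_judge)
    print(out["10101"])  # {'matched': 6.0, 'mismatched': 4.0, 'generic': 3.0}
```

Under this all-pairs design, each claim costs 32 rewrites and 32 × 33 = 1,056 judge calls, which adds up quickly over a 933-instance dataset.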

System Modules

Profile Generator

Define target personas using 5-digit binary codes representing high/low values for Big Five traits

Model or implementation: N/A (Rule-based)
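The rule-based Profile Generator amounts to enumerating every 5-bit code. A minimal sketch, assuming the bits follow OCEAN order (the summary does not state the bit order); `all_profiles` and `describe` are hypothetical names:

```python
from itertools import product

# ASSUMPTION: bits follow OCEAN order; the summary does not state it.
TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


def all_profiles() -> list:
    """Return all 2**5 = 32 binarized profiles as 5-character bit strings."""
    return ["".join(bits) for bits in product("01", repeat=len(TRAITS))]


def describe(code: str) -> dict:
    """Map a code such as '10101' to 'high'/'low' for each trait."""
    if len(code) != len(TRAITS) or set(code) - {"0", "1"}:
        raise ValueError(f"expected a 5-bit code, got {code!r}")
    return {t: ("high" if b == "1" else "low") for t, b in zip(TRAITS, code)}


if __name__ == "__main__":
    print(len(all_profiles()))  # 32
    print(describe("10101"))
    # {'Openness': 'high', 'Conscientiousness': 'low', 'Extraversion': 'high',
    #  'Agreeableness': 'low', 'Neuroticism': 'high'}
```

`product("01", repeat=5)` yields the codes in lexicographic order, from `00000` (low on every trait) to `11111` (high on every trait).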

Debunking Rewriter

Rewrite generic verdicts to align with specific personality traits

Model or implementation: Qwen3-32B

Persona Judge

Simulate a human with specific traits to evaluate message persuasiveness

Model or implementation: Llama3-8B-Instruct, Qwen3-8B, Qwen3-32B
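The judge stage needs a persona prompt and a robust way to pull a 1-7 rating out of free-text replies. A sketch under stated assumptions: `build_judge_prompt`, `parse_likert`, the OCEAN bit order, and the prompt wording are hypothetical, not the paper's templates. If the judge runs in a mode that emits a `<think>...</think>` block (as Qwen3's thinking mode does), that block is stripped first so numbers inside the reasoning are not mistaken for the rating.

```python
import re
from typing import Optional

# ASSUMPTION: bit i of the profile code follows OCEAN order.
TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


def build_judge_prompt(profile: str, claim: str, verdict: str) -> str:
    """Illustrative persona prompt asking for a 1-7 persuasiveness rating."""
    persona = ", ".join(
        f"{'high' if bit == '1' else 'low'} {name}"
        for name, bit in zip(TRAITS, profile)
    )
    return (
        f"You are a person with {persona}.\n"
        f"Claim: {claim}\n"
        f"Debunking message: {verdict}\n"
        "How persuasive is this message to you, from 1 (not at all) "
        "to 7 (extremely)? Answer with a single number."
    )


def parse_likert(reply: str, low: int = 1, high: int = 7) -> Optional[int]:
    """Return the first integer within [low, high] in the reply, else None."""
    # Drop a <think>...</think> reasoning block if the model emitted one.
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    for match in re.finditer(r"\d+", reply):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None
```

Returning `None` instead of a default score lets unparseable or out-of-range replies (e.g. "9/10") be counted or retried rather than silently shifting the mean scores.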

Modeling

Base Model: Qwen3-32B (for generation); Llama3-8B-Instruct, Qwen3-8B, Qwen3-32B (for evaluation)

Comparison to Prior Work

vs. Generic Fact-Checking: Incorporates psychological profiling (Big Five) to adjust tone and framing
vs. Rule-based Personalization: Uses LLM generative capabilities to fluently rewrite content rather than selecting from pre-written templates

Limitations

Relies entirely on simulated LLM judges; no human evaluation to confirm if real users find the messages more persuasive
Binarization of personality traits simplifies the continuous nature of human personality
Ethical concerns regarding manipulation and privacy are raised but not technically resolved
Different LLM families (Llama vs. Qwen) exhibit inherent biases in evaluation (e.g., Llama3 is more 'generous' and Agreeable)
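The binarization limitation can be made concrete with a generic threshold rule. This is an illustration only: the paper defines its 32 profiles directly rather than binarizing measured scores, and the 1-5 scale, the midpoint threshold of 3.0, the OCEAN bit order, and the name `binarize` are all assumptions.

```python
# ASSUMPTION: OCEAN bit order and a 1-5 trait scale split at the midpoint.
TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


def binarize(scores: dict, threshold: float = 3.0) -> str:
    """Map continuous trait scores to a 5-bit profile code (1 = high, 0 = low).

    A score exactly at the threshold counts as high (an arbitrary choice).
    """
    missing = [t for t in TRAITS if t not in scores]
    if missing:
        raise KeyError(f"missing trait scores: {missing}")
    return "".join("1" if scores[t] >= threshold else "0" for t in TRAITS)


if __name__ == "__main__":
    a = {"Openness": 3.1, "Conscientiousness": 2.1, "Extraversion": 3.6,
         "Agreeableness": 1.9, "Neuroticism": 3.4}
    b = dict(a, Openness=2.9)   # 0.2 points lower on Openness
    c = dict(a, Openness=5.0)   # 1.9 points higher on Openness
    print(binarize(a), binarize(b), binarize(c))  # 10101 00101 10101
```

A 0.2-point difference (a vs. b) flips the Openness bit, while a 1.9-point difference (a vs. c) leaves the code unchanged: exactly the information loss noted above.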

Reproducibility

Prompt templates for both generation and evaluation are provided in the paper. The filtered subset of the FullFact dataset (933 instances) is described but no direct download link is provided. Model weights for Llama3 and Qwen3 are publicly available open-source artifacts.

📊 Experiments & Results

Evaluation Setup

LLM-as-a-judge simulation of 32 distinct personality profiles evaluating debunking messages

Benchmarks:

FullFact Subset (Fake news debunking; 933 instances) [New]

Metrics:

Persuasiveness Score (1-7 Likert scale)
Accuracy_p (Exact Profile Accuracy)
Accuracy_cn (Close Neighbor Accuracy)
Statistical methodology: Paired-sample t-tests (p-value < 0.05)
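The two accuracy metrics can be computed as below, assuming each judge trial yields a single preferred verdict, recorded as the profile it was written for. How the paper resolves ties between equal Likert scores is not stated in this summary, so the input format (pairs of judge profile and preferred-verdict profile) and the names `hamming` and `profile_accuracies` are assumptions.

```python
from typing import Iterable, Tuple


def hamming(a: str, b: str) -> int:
    """Number of traits (bit positions) on which two profile codes differ."""
    if len(a) != len(b):
        raise ValueError("profile codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def profile_accuracies(trials: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (Accuracy_p, Accuracy_cn) over (judge_profile, preferred_profile) pairs.

    Accuracy_p counts exact matches; Accuracy_cn also accepts close
    neighbors, i.e. profiles at Hamming distance <= 1.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("no trials given")
    exact = sum(judge == pick for judge, pick in trials)
    close = sum(hamming(judge, pick) <= 1 for judge, pick in trials)
    return exact / len(trials), close / len(trials)


if __name__ == "__main__":
    demo = [("10101", "10101"),   # exact match
            ("10101", "10100"),   # close neighbor (one bit differs)
            ("00000", "11000"),   # two bits differ: a miss for both metrics
            ("11111", "11111")]   # exact match
    print(profile_accuracies(demo))  # (0.5, 0.75)
```

By construction Accuracy_cn is never lower than Accuracy_p, since every exact match is at distance 0.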

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Comparison of persuasiveness scores: personalized messages consistently outperform generic ones across all models. (For t-statistics, the Baseline of 0 is the null-hypothesis value of no difference.)
FullFact Subset	t-statistic (Matched vs Generic)	0	59.81	+59.81
FullFact Subset	t-statistic (Matched vs Mismatched)	0	15.01	+15.01
Accuracy metrics: how often the judge prefers the verdict tailored to its own profile (or to a close neighbor).
FullFact Subset	Accuracy_p (Exact Match, %)	Not reported	88.64	Not reported
FullFact Subset	Accuracy_cn (Close Neighbor, %)	Not reported	96.39	Not reported
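The t-statistics above come from paired-sample t-tests: each pair is presumably the same judge profile and claim scored under two conditions (the exact pairing unit is an assumption here). The statistic is the mean per-pair difference divided by its standard error. A dependency-free sketch; `paired_t` is a hypothetical name and the scores are made-up toy values:

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired-sample t-statistic t = mean(d) / (stdev(d) / sqrt(n)), with d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(d) / (sd / sqrt(len(d)))


if __name__ == "__main__":
    matched = [6, 5, 6, 7]   # toy scores, not data from the paper
    generic = [4, 4, 5, 5]
    print(round(paired_t(matched, generic), 3))  # 5.196
```

If SciPy is available, `scipy.stats.ttest_rel(a, b)` computes the same statistic together with a p-value. Because the standard error shrinks with sqrt(n), large t-values such as 59.81 partly reflect the number of paired observations, not only the size of the score gap.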

Experiment Figures

Mean persuasion scores for each of the 32 judge profiles across three conditions: Matched, Mismatched, and Generic

Heatmap or distribution of persuasion scores across personality traits

Main Takeaways

Matched verdicts score significantly higher than Mismatched verdicts (paired t-test, p < 0.05), and both score well above Generic verdicts for every evaluator model
Personality traits influence persuadability: Profiles with high Openness and Conscientiousness are generally easier to persuade, while high Neuroticism lowers persuasion scores
Model bias affects evaluation: Llama3 simulates 'generous' judges (higher scores, less discrimination), while Qwen3 models are more 'cautious' and discriminating
The 'Close Neighbor' effect suggests that personalization is robust: even when the exact-profile verdict is not the top pick, the verdict written for a profile differing by a single trait usually is

📚 Prerequisite Knowledge

Prerequisites

Understanding of the Big Five personality traits (OCEAN model)
Familiarity with Large Language Models (LLMs) and prompting techniques
Concept of LLM-as-a-judge for evaluation

Key Terms

Big Five: A psychological taxonomy describing personality via five traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism

LLM-as-a-judge: Using an LLM to evaluate the quality or properties of text (e.g., persuasiveness) instead of human annotators

Binarization: Simplifying continuous personality traits into high (1) or low (0) categories to create discrete profiles (e.g., 10101)

Persona-based prompting: Instructing an LLM to adopt a specific persona or role (e.g., 'You are an Extraverted expert') to guide generation or evaluation

Hallucination: A phenomenon where LLMs generate plausible but factually incorrect or fabricated information

Matched verdict: A debunking message specifically generated to align with the personality traits of the evaluator

Mismatched verdict: A debunking message generated for a personality profile different from that of the evaluator

Close neighbor: A personality profile that differs from the target profile by only one trait (one bit in the binary representation)
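Every 5-bit profile has exactly five close neighbors, obtained by flipping one bit at a time, so the close-neighbor criterion accepts 6 of the 32 tailored verdicts. Under a hypothetical chance baseline (not reported in the paper) where a judge picks one of the 32 tailored verdicts uniformly at random, the expected Accuracy_p would be 1/32 ≈ 3.1% and Accuracy_cn 6/32 ≈ 18.8%, a useful reference point for the reported 88.64% and 96.39%. A short sketch; `close_neighbors` is a hypothetical name:

```python
def close_neighbors(code: str) -> list:
    """All profiles that differ from `code` in exactly one trait (bit)."""
    flip = {"0": "1", "1": "0"}
    return [code[:i] + flip[code[i]] + code[i + 1:] for i in range(len(code))]


if __name__ == "__main__":
    print(close_neighbors("10101"))
    # ['00101', '11101', '10001', '10111', '10100']
    n_profiles = 2 ** 5
    # Chance rates if a judge picked uniformly among the 32 tailored verdicts.
    print(1 / n_profiles, (1 + 5) / n_profiles)  # 0.03125 0.1875
```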