Personalized Attacks of Social Engineering in Multi-turn Conversations: LLM Agents for Simulation and Detection

📝 Paper Summary

Multi-agent simulation, AI Safety & Security

SE-VSim is a multi-agent framework that simulates chat-based social engineering attacks by modeling victim personality traits to analyze susceptibility to LLM-driven manipulation.

Core Problem

Existing simulations of social engineering lack grounding in psychological frameworks (like victim personality) and overemphasize immediate data theft rather than realistic trust-building strategies.

Why it matters:

LLMs can generate highly convincing, context-aware attacks at scale, which makes them harder to detect than traditional static phishing
Victims' psychological profiles (e.g., Agreeableness, Neuroticism) significantly alter their vulnerability, yet prior simulations model victims as generic agents
Real-world attacks often involve long-term trust establishment before information extraction, which single-turn or ungrounded simulations fail to capture

Concrete Example: A traditional simulation might simply demand a password. A real attacker (and SE-VSim) might impersonate a recruiter, compliment a 'High Agreeableness' victim's work history over several turns to build rapport, and then subtly request sensitive PII.

Key Novelty

Personality-Aware Dual-Agent Simulation (SE-VSim)

Models the interaction between an Attacker Agent (with specific roles/intents) and a Victim Agent (conditioned on Big Five personality traits) to generate diverse attack trajectories
Integrates psychological theory into the generation pipeline, allowing the study of how traits like Neuroticism or Conscientiousness affect attack success
Prioritizes 'attack strategies' (persuasion, influence) as annotation targets alongside simple success/failure labels

Architecture

The SE-VSim framework showing the interaction loop between the Attacker Agent and Victim Agent.

Evaluation Highlights

Generated a dataset of 1,350 multi-turn conversations (900 malicious, 450 benign) covering 3 attacker roles and 5 victim personality traits
Achieved a Fleiss' Kappa of 0.796 between human annotators and the LLM judge (GPT-4o-mini) when labeling attack success
Simulated attacks across 3 distinct professional scenarios: Funding Agencies, Journalists, and Recruiters

Breakthrough Assessment

7/10

Significant contribution in grounding security simulations in psychology (Big Five). The dataset generation methodology is robust, though the provided text lacks the downstream defense performance results.

⚙️ Technical Details

Problem Definition

Setting: Multi-turn Chat-based Social Engineering (CSE) simulation and detection

Inputs: Attacker goal (Role + Intent) and Victim Persona (Big Five Trait Description)

Outputs: Multi-turn conversation transcript labeled with malicious/benign intent and success level
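The inputs and outputs above can be pictured as a small data schema. The sketch below is illustrative only: the class names, field names, and the idea of enumerating a full role × information type × trait grid are assumptions made for this summary, not structures taken from the paper. The role, information-type, and trait values are the ones listed elsewhere in this summary.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Tuple

# Values named in this summary; how the paper combines them is not specified here.
ROLES = ("Funding Agency", "Journalist", "Recruiter")
INFO_TYPES = ("PII", "Financial", "Patents")
TRAITS = ("Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism")


@dataclass(frozen=True)
class AttackerGoal:          # hypothetical name
    role: str                # one of ROLES
    intent: str              # "malicious" or "benign"
    info_type: str           # one of INFO_TYPES (assumed to be part of the goal)


@dataclass(frozen=True)
class VictimPersona:         # hypothetical name
    trait: str               # one of TRAITS
    description: str         # free-text trait description fed to the victim agent


@dataclass
class LabeledConversation:   # hypothetical name
    goal: AttackerGoal
    persona: VictimPersona
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    success_label: Optional[str] = None  # filled in by the annotation step


def scenario_grid() -> List[Tuple[str, str, str]]:
    """Enumerate every (role, info_type, trait) combination (assumed full cross product)."""
    return list(product(ROLES, INFO_TYPES, TRAITS))
```

Under the full-cross-product assumption, `scenario_grid()` yields 3 × 3 × 5 = 45 distinct configurations. How the 1,350 conversations are distributed over those configurations is not stated in this summary.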

Pipeline Flow

Attacker Agent Initialization (Goal: Role + Intent)
Victim Agent Initialization (Persona: Big Five Trait)
Conversation Generation Loop (Turn-taking up to budget)
Annotation (GPT-4o-mini + Humans)
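The four pipeline steps can be sketched as a simple turn-taking loop. This is a minimal sketch under stated assumptions, not the authors' implementation: agents are modeled as plain callables that map the transcript so far to the next message, the attacker is assumed to speak first, and the stub functions stand in for real LLM calls. All names (`run_conversation`, `stub_attacker`, `stub_victim`, `stub_annotator`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]                 # (speaker, message)
Agent = Callable[[List[Turn]], str]    # transcript so far -> next message


def run_conversation(attacker: Agent, victim: Agent, max_turns: int) -> List[Turn]:
    """Alternate attacker and victim messages until the turn budget is spent."""
    transcript: List[Turn] = []
    for _ in range(max_turns):
        # Assumption: the attacker opens each exchange and the victim replies.
        transcript.append(("attacker", attacker(transcript)))
        transcript.append(("victim", victim(transcript)))
    return transcript


# Deterministic stand-ins for LLM-backed agents (hypothetical).
def stub_attacker(history: List[Turn]) -> str:
    return f"attacker message {len(history) // 2 + 1}"


def stub_victim(history: List[Turn]) -> str:
    return f"victim reply {(len(history) + 1) // 2}"


def stub_annotator(transcript: List[Turn]) -> Dict[str, object]:
    """Placeholder for the annotation step; a real annotator would call an LLM judge."""
    return {"num_turns": len(transcript), "success_label": None}


if __name__ == "__main__":
    convo = run_conversation(stub_attacker, stub_victim, max_turns=3)
    for speaker, text in convo:
        print(f"{speaker}: {text}")
    print(stub_annotator(convo))
```

Keeping the agents as interchangeable callables means the same loop can be driven by any chat model or by deterministic stubs for testing.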

System Modules

Attacker Agent (Simulation Agents)

Emulate malicious actor seeking sensitive info

Model or implementation: Open-source LLMs (specific name not provided in text snippet)

Victim Agent (Simulation Agents)

Emulate target user with specific personality

Model or implementation: Open-source LLMs (specific name not provided in text snippet)

Annotator

Label conversation success and attack strategies

Model or implementation: GPT-4o-mini

Novel Architectural Elements

Psychological conditioning of Victim Agents using Big Five personality descriptions to vary susceptibility
Dual-agent loop specifically grounded in SE effect mechanisms (Trust building -> Info extraction)
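The psychological conditioning described above amounts to placing a trait description in the Victim Agent's prompt. The sketch below shows one way this could look. The prompt wording and the function name `build_victim_system_prompt` are hypothetical; the paper's actual prompts (Appendix B) and trait descriptions (Table 6) are not reproduced in this summary, so the description is passed in as an argument rather than invented here.

```python
def build_victim_system_prompt(trait: str, trait_description: str) -> str:
    """Assemble a system prompt that conditions a victim agent on one Big Five trait.

    The template text is illustrative only and is not the paper's prompt.
    """
    if not trait_description.strip():
        raise ValueError("trait_description must not be empty")
    return (
        "You are role-playing a professional taking part in an online chat.\n"
        f"Dominant personality trait: {trait}.\n"
        f"Trait description: {trait_description.strip()}\n"
        "Stay in character and respond as this person plausibly would."
    )


if __name__ == "__main__":
    # The description below is a placeholder, not text from the paper.
    print(build_victim_system_prompt(
        "Agreeableness",
        "<trait description taken from the paper's Table 6>",
    ))
```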

Modeling

Base Model: Open-source LLMs for agents (specific model name not found in text); GPT-4o-mini for annotation

Training Method: In-context learning (Prompt Engineering)

Adaptation: None (Prompt-based simulation)

Trainable Parameters: None (Inference only)

Compute: Not reported in the paper

Comparison to Prior Work

vs. Ai et al. (2024): SE-VSim adds the 'Victim' dimension with psychological grounding, whereas prior work focused primarily on the dual role of LLMs without personality variation
vs. Traditional SE datasets: SE-VSim generates multi-turn, context-aware dialogues rather than static phishing templates

Limitations

Relies on the assumption that LLMs accurately simulate human personality traits based on descriptions
Success labels focus on information extraction and trust building, which can be subjective
Manual annotation was limited to a subset (implied by use of GPT-4o-mini for scale)
Specific open-source model used for generation is not named in the provided text snippet

Reproducibility

Code and dataset promised upon acceptance. Prompts for Attacker and Victim are referenced in Appendix B (not fully provided in text). Victim personality descriptions referenced in Table 6. GPT-4o-mini used for annotation (closed source).

📊 Experiments & Results

Evaluation Setup

Simulation of 1,350 conversations across varied attack scenarios and victim profiles

Benchmarks:

SE-VSim Generated Dataset (Social Engineering Simulation) [New]

Metrics:

Number of conversations
Fleiss' Kappa (Annotation Agreement)
Statistical methodology: Fleiss' Kappa calculated for inter-annotator agreement.
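For readers unfamiliar with the metric, the following is a minimal sketch of the standard Fleiss' Kappa computation. The toy rating matrix is invented for illustration and is not data from the paper. Treating the human annotators and the LLM judge as one pool of raters is an assumption about how the reported 0.796 was obtained.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa for a matrix where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs per item, averaged.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy data: 4 conversations, 3 raters, 2 categories (e.g. success / no success).
    toy = [[2, 1], [1, 2], [3, 0], [0, 3]]
    print(round(fleiss_kappa(toy), 3))  # 0.333
```

In the toy matrix the observed agreement is 2/3 and the chance agreement is 1/2, giving (2/3 − 1/2) / (1 − 1/2) = 1/3.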

Key Results

The paper primarily presents the construction and validation of the SE-VSim dataset.

Benchmark	Metric	Baseline	This Paper	Δ
SE-VSim Dataset	Total Conversations	0	1350	+1350
Attack Success Labeling	Fleiss' Kappa	0	0.796	+0.796

Experiment Figures

Distribution of the 1,350 generated conversations across Attacker Roles, Information Types, and Victim Traits.

Main Takeaways

Generated a balanced dataset covering three professional attacker roles (Recruiter, Journalist, Funding Agency) and three target information types (PII, Financial, Patents).
Demonstrated that an LLM judge (GPT-4o-mini) agrees closely with human annotators on social engineering success labels (Fleiss' Kappa = 0.796), which supports using it to label conversations at scale.
Established a simulation framework that successfully integrates Big Five personality traits into victim agents, allowing for diverse conversation trajectories.

📚 Prerequisite Knowledge

Prerequisites

Social Engineering (SE) concepts
Large Language Models (LLMs)
Psychological personality frameworks (Big Five)

Key Terms

Social Engineering (SE): Psychological manipulation to deceive people into performing actions or divulging confidential information

Chat-based Social Engineering (CSE): SE attacks conducted via multi-turn conversations (e.g., LinkedIn, Slack) rather than single emails

Big Five: A psychological model describing personality via five traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism

Fleiss' Kappa: A statistical measure of how reliably a fixed number of raters agree when assigning categorical labels, corrected for the agreement expected by chance

PII: Personally Identifiable Information, i.e., sensitive data that can be used to identify a specific individual

In-context learning: Prompting an LLM with instructions and examples within the input context to guide its behavior without fine-tuning weights