FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models

📝 Paper Summary

Multimodal Large Language Models (MLLMs) · Hallucination Mitigation · Creative Generation

FlexAC is a training-free framework that modulates MLLM associative reasoning by extracting steering vectors from hallucinated responses and injecting them into middle layers during inference.

Core Problem

Current MLLMs face a trade-off: methods that reduce hallucination (improving faithfulness) inadvertently suppress associative reasoning (harming creativity), and none offers flexible control over the balance between the two.

Why it matters:

Existing hallucination mitigation techniques like VCD and DPO often degrade performance on creative tasks (e.g., storytelling, event planning)
MLLMs need to adaptively switch between convergent thinking (factual) and divergent thinking (creative) based on task demands, similar to human cognition
Enhancing creativity in a controllable, task-specific manner remains underexplored compared to faithfulness

Concrete Example: Existing hallucination mitigation techniques improve faithfulness (lowering CHAIR scores by 14.0) but reduce associative reasoning strength (lowering VDAT scores by 1.78), causing poor performance on creative tasks like event planning.

Key Novelty

Flexible Association Control (FlexAC)

Identifies that middle layers are the primary locus of associative behavior and that hallucinated responses encode strong associative directions useful for steering
Constructs steering vectors by contrasting hidden states of hallucinated (high-association) vs. grounded (low-association) responses
Applies these vectors at inference time with intensity calibration to dynamically amplify or suppress associative reasoning based on input alignment

Architecture

The FlexAC framework pipeline, illustrating both Offline Control Vector Construction and Inference-Time Control phases.

Evaluation Highlights

Achieves up to 5.8× improvement in creativity on Creation-MMBench compared to baselines
Reduces hallucination rate by 29% on CHAIR benchmark while maintaining general capabilities
Outperforms VCD and Ha-DPO on both faithfulness (CHAIR) and creativity (VDAT) metrics by flexibly adjusting the steering coefficient

Breakthrough Assessment

8/10

Strong mechanistic insight linking hallucination and creativity to specific layer representations. The method effectively solves the trade-off between faithfulness and creativity without retraining.

⚙️ Technical Details

Problem Definition

Setting: Inference-time modulation of Multimodal Large Language Models

Inputs: Image I and text prompt T

Outputs: Generated text response R modulated by steering vector v

Pipeline Flow

Offline: Generate grounded/hallucinated pairs → Extract middle-layer diffs → Average into general vector
Offline (Optional): Generate task-specific vector using GPT-4o examples
Inference: Calculate steering intensity based on current state alignment
Inference: Inject scaled vector into middle layers → Normalize → Output

System Modules

Vector Constructor

Compute the steering vector by averaging differences between hallucinated and grounded hidden states

Model or implementation: Same as base MLLM (e.g., LLaVA-1.5)
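
The averaging step described for the Vector Constructor can be sketched in a few lines. This is a minimal illustration, assuming each response has already been reduced to one middle-layer hidden-state vector; the function name `mean_difference_vector` and the toy numbers are hypothetical and are not taken from the FlexAC code.

```python
from typing import List, Sequence

Vector = List[float]


def mean_difference_vector(
    hallucinated: Sequence[Vector], grounded: Sequence[Vector]
) -> Vector:
    """Average (hallucinated - grounded) over paired hidden states."""
    if not hallucinated or len(hallucinated) != len(grounded):
        raise ValueError("need a non-empty, equal-length set of pairs")
    dim = len(hallucinated[0])
    total = [0.0] * dim
    for h, g in zip(hallucinated, grounded):
        for i in range(dim):
            total[i] += h[i] - g[i]
    return [t / len(hallucinated) for t in total]


# Toy 3-dimensional "hidden states" for two response pairs (made-up values).
hallucinated_states = [[1.0, 2.0, 0.0], [3.0, 2.0, 1.0]]
grounded_states = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
steering_vector = mean_difference_vector(hallucinated_states, grounded_states)
print(steering_vector)  # [1.5, 1.0, 0.0]
```

In a real model the vectors would have the hidden size of the base MLLM, and one such vector would be built per intervention layer.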

Intensity Calibrator (Inference-Time Control)

Dynamically adjust steering strength to prevent over-steering if input is already aligned

Model or implementation: Algebraic calculation
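
One way to picture the calibration is as a scaling factor that shrinks when the hidden state already points where the intervention would push it. The summary does not give the paper's exact formula, so the rule below (a cosine-based factor in [0, 1]) is an assumption chosen for illustration, and `calibrated_intensity` is a hypothetical name.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def calibrated_intensity(
    hidden: Sequence[float], steering: Sequence[float], alpha: float
) -> float:
    """Shrink alpha when `hidden` already points where alpha would push it."""
    h_norm = math.sqrt(dot(hidden, hidden))
    v_norm = math.sqrt(dot(steering, steering))
    if h_norm == 0.0 or v_norm == 0.0:
        alignment = 0.0  # no direction to compare against
    else:
        alignment = dot(hidden, steering) / (h_norm * v_norm)  # cosine in [-1, 1]
    direction = 1.0 if alpha >= 0 else -1.0
    # Assumed rule (not from the paper): factor is 0 when the state is fully
    # aligned with the push and 1 when it is fully opposed to it.
    remaining = (1.0 - direction * alignment) / 2.0
    return alpha * remaining


steering = [1.0, 0.0]
print(calibrated_intensity([2.0, 0.0], steering, 1.0))   # already aligned
print(calibrated_intensity([0.0, 3.0], steering, 1.0))   # orthogonal
print(calibrated_intensity([2.0, 0.0], steering, -1.0))  # pushing the other way
```

With this rule, a state that is already aligned with the steering direction receives no additional push in that direction, which is the over-steering case the calibrator is meant to prevent.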

Feature Injector (Inference-Time Control)

Add the steering vector to the hidden states of specific middle layers

Model or implementation: Base MLLM Middle Layers
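
The injection step can be illustrated on a single hidden-state vector. The sketch below assumes that the "Normalize" step in the pipeline means rescaling the shifted state back to its original norm; that reading, the function name `inject`, and the toy vectors are assumptions. The layer indices are the LLaVA values listed under Key Hyperparameters.

```python
import math
from typing import List, Sequence, Set

LLAVA_INTERVENTION_LAYERS: Set[int] = {11, 12, 13}  # from the hyperparameter list


def inject(
    hidden: Sequence[float],
    steering: Sequence[float],
    scale: float,
    layer: int,
    layers: Set[int],
) -> List[float]:
    """Add scale * steering at chosen layers, then restore the original norm."""
    if layer not in layers:
        return list(hidden)  # untouched outside the intervention layers
    shifted = [h + scale * v for h, v in zip(hidden, steering)]
    original_norm = math.sqrt(sum(h * h for h in hidden))
    shifted_norm = math.sqrt(sum(s * s for s in shifted))
    if shifted_norm == 0.0:
        return shifted  # nothing to rescale
    # Assumed normalization: keep the direction change, keep the magnitude fixed.
    return [s * original_norm / shifted_norm for s in shifted]


steered = inject([3.0, 4.0], [1.0, 0.0], 1.0, 12, LLAVA_INTERVENTION_LAYERS)
skipped = inject([3.0, 4.0], [1.0, 0.0], 1.0, 5, LLAVA_INTERVENTION_LAYERS)
print(steered, skipped)
```

In an actual MLLM this operation would run inside the forward pass of each selected decoder layer, for example through a framework-level hook, rather than on plain Python lists.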

Novel Architectural Elements

Steering Intensity Calibration mechanism that dynamically scales the intervention vector based on the projection of the current state onto the steering direction
Dual-vector integration combining a general associative vector (from hallucinations) with a task-specific vector (from GPT-4o examples)
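
The dual-vector idea can be sketched as a weighted blend of two directions. The summary does not state how the paper combines the general and task-specific vectors, so the convex mix of unit vectors below is an assumed scheme, and `integrate_directions` and `task_weight` are hypothetical names.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale v to unit length; a zero vector has no direction to keep."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def integrate_directions(
    general: Sequence[float], task: Sequence[float], task_weight: float = 0.5
) -> List[float]:
    """Blend two steering directions and return a unit-length result."""
    g, t = unit(general), unit(task)
    # Assumed scheme (not from the paper): convex combination of unit vectors.
    mixed = [(1.0 - task_weight) * a + task_weight * b for a, b in zip(g, t)]
    return unit(mixed)


combined = integrate_directions([2.0, 0.0], [0.0, 3.0], task_weight=0.5)
general_only = integrate_directions([2.0, 0.0], [0.0, 3.0], task_weight=0.0)
print(combined, general_only)
```

Normalizing both inputs first keeps the blend from being dominated by whichever vector happens to have the larger magnitude.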

Modeling

Base Model: Evaluated on LLaVA-1.5-7b, Qwen-VL-Chat, Deepseek-VL-7b

Training Method: Training-free inference-time intervention (activation steering)

Adaptation: None (weights frozen)

Trainable Parameters: 0

Training Data:

Uses 2000 images from COCO2014 for vector construction
Selects 50 images via Instance Selection to build the general association vector

Key Hyperparameters:

steering_coefficient_alpha: -1.0 for the faithfulness variant (FlexAC-P) and 1.0 for the creativity variant (FlexAC-C), as stated in the paper's Implementation Details. This agrees with Section 2.2, which reports that increasing alpha raises CHAIR (more hallucination). Section 3.2 instead describes setting alpha to 1 as selecting the precision-optimized variant, so the paper's sign convention is inconsistent between sections.
top_k_pairs: 50
intervention_layers_qwen: 15, 16, 17
intervention_layers_llava: 11, 12, 13
intervention_layers_deepseek: 4, 5, 6

Compute: 8x RTX 4090 GPUs used for experiments (inference only)

Comparison to Prior Work

vs. VCD: FlexAC allows bidirectional control (faithfulness AND creativity) whereas VCD only focuses on faithfulness
vs. Ha-DPO: FlexAC is training-free and lightweight, whereas Ha-DPO requires retraining/fine-tuning
vs. CAA (Contrastive Activation Addition) [not cited in paper]: FlexAC uses hallucinated responses as the steering source for creativity, whereas CAA typically derives general-purpose steering vectors from engineered contrastive prompts

Limitations

Relies on the assumption that hallucination and creativity share underlying associative mechanisms
Requires determining optimal intervention layers per model architecture empirically
Over-steering can still occur despite calibration if parameters are not tuned

Reproducibility

Code: https://github.com/ylhz/FlexAC

Uses COCO2014 for vector construction. Intervention layers are specified for all three models.

📊 Experiments & Results

Evaluation Setup

Multimodal tasks covering hallucination (low association), creativity (high association), and general capability.

Benchmarks:

CHAIR (hallucination evaluation, image captioning)
POPE (object hallucination probing, VQA)
VDAT (Visual Divergent Association Test) [New]
Creation-MMBench (Creative Generation)
MME / MMMU / MMStar (General Multimodal Capabilities)

Metrics:

CHAIR_S (Sentence-level Hallucination)
CHAIR_I (Instance-level Hallucination)
VDAT Score (Associative Strength)
VFS (Visual Fidelity Score)
Creativity Reward
Statistical methodology: Not explicitly reported in the paper
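
The two CHAIR metrics listed above differ only in what they count: CHAIR_S is the share of captions containing at least one hallucinated object, and CHAIR_I is the share of mentioned objects that are hallucinated. A minimal sketch, assuming objects have already been extracted from each caption and mapped to the annotation vocabulary (the real benchmark handles synonym matching against COCO categories); `chair_scores` and the toy captions are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def chair_scores(
    mentioned: Sequence[List[str]], ground_truth: Sequence[Set[str]]
) -> Tuple[float, float]:
    """Return (CHAIR_S, CHAIR_I) as percentages."""
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for objects, truth in zip(mentioned, ground_truth):
        wrong = [o for o in objects if o not in truth]
        total_mentions += len(objects)
        hallucinated_mentions += len(wrong)
        if wrong:
            captions_with_hallucination += 1
    chair_s = 100.0 * captions_with_hallucination / len(mentioned) if mentioned else 0.0
    chair_i = 100.0 * hallucinated_mentions / total_mentions if total_mentions else 0.0
    return chair_s, chair_i


# Two toy captions; "bench" is not in the first image's annotations.
mentioned_objects = [["dog", "frisbee", "bench"], ["cat", "sofa"]]
annotated_objects = [{"dog", "frisbee"}, {"cat", "sofa"}]
chair_s, chair_i = chair_scores(mentioned_objects, annotated_objects)
print(chair_s, chair_i)  # 50.0 20.0
```

Lower is better for both scores, which is why the faithfulness rows in the results report negative deltas as improvements.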

Key Results

Hallucination mitigation results (Faithfulness) showing FlexAC reduction in hallucination rates compared to baselines.
Benchmark	Metric	Baseline	This Paper	Δ
CHAIR	CHAIR_S	40.6	19.2	-21.4
CHAIR	CHAIR_S	50.8	36.6	-14.2
POPE	F1-score	85.8	87.9	+2.1

Creativity enhancement results showing FlexAC improves associative reasoning and creative output.
Benchmark	Metric	Baseline	This Paper	Δ
VDAT	Score	86.89	88.49	+1.60
Creation-MMBench	Reward	0.00	10.92	+10.92

Ablation study demonstrating the contribution of Instance Selection (IS), Steering Intensity Calibration (SIC), and Directional Integration (DI).
Benchmark	Metric	Baseline	This Paper	Δ
CHAIR	CHAIR_S	30.4	19.2	-11.2

Experiment Figures

Layer-wise analysis of cosine and Euclidean distances between grounded and hallucinated features, and the effect of layer intervention.

Scatter plot of CHAIR (Faithfulness) vs VDAT (Creativity) scores as the steering coefficient alpha is swept from -1.5 to 1.5.

Main Takeaways

Middle layers (10-15 in LLaVA) are the primary source of associative behavior; manipulating these layers effectively controls the output's faithfulness vs. creativity.
Hallucination mitigation methods like Ha-DPO and VCD actively harm associative reasoning capabilities (lower VDAT scores), whereas FlexAC can enhance them.
FlexAC provides a unified control knob (alpha) that can be swept to trade off between hallucination reduction and creativity enhancement without model retraining.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Transformer architecture (hidden states, layers)
Familiarity with steering vectors / activation engineering
Basics of MLLM hallucination (object existence vs. fabrication)

Key Terms

Steering Vector: A direction in the model's activation space that, when added to hidden states, biases the model's behavior (e.g., towards creativity or factuality)

CHAIR: Captioning Hallucination Assessment with Image Relevance—a metric measuring the percentage of generated objects not present in the image

POPE: Polling-based Object Probing Evaluation—a benchmark testing whether a model answers 'yes' or 'no' correctly regarding object existence

VDAT: Visual Divergent Association Test—a new benchmark proposed in this paper to measure associative reasoning by asking models to generate nouns unrelated to an image

VCD: Visual Contrastive Decoding—a baseline method that reduces hallucination by contrasting logits from original and distorted visual inputs

Ha-DPO: Hallucination Direct Preference Optimization—a training-based baseline that aligns models to prefer grounded over hallucinated responses

Associative Reasoning: The cognitive process of connecting ideas; 'convergent' for facts (faithfulness) and 'divergent' for creativity

Cosine Distance: A metric measuring the directional difference between two vectors

Euclidean Distance: A metric measuring the straight-line magnitude difference between two vectors
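
The difference between the two distance measures is easiest to see on small vectors: cosine distance ignores magnitude, while Euclidean distance does not. The helper names below are illustrative only.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


same_direction = ([1.0, 0.0], [2.0, 0.0])
orthogonal = ([1.0, 0.0], [0.0, 1.0])
print(cosine_distance(*same_direction), euclidean_distance(*same_direction))
print(cosine_distance(*orthogonal), euclidean_distance(*orthogonal))
```

Two hidden states with the same direction but different norms have zero cosine distance and nonzero Euclidean distance, which is why a layer-wise analysis can report both.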