Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations

📝 Paper Summary

Conversational Recommender Systems (CRS)
Knowledge Graph-augmented LLMs

COMPASS integrates Knowledge Graphs with LLMs using a graph-to-text alignment strategy to generate human-readable user preference summaries that enhance transparency and recommendation accuracy.

Core Problem

Existing CRSs use opaque latent vectors for user preferences, while LLMs lack domain-specific item knowledge; integrating structured KGs with unstructured LLM dialogue creates a difficult modality gap.

Why it matters:

Latent embeddings (hidden vectors) make it impossible to verify why a system made a specific recommendation, reducing user trust
LLMs hallucinate or miss specific item attributes without access to up-to-date structured domain knowledge
Standard approaches cannot effectively perform cross-modal reasoning to synthesize dialogue history with complex graph relationships

Concrete Example: In movie recommendation, a standard CRS might encode a user's love for 'Inception' as an opaque vector `[0.4, -0.1, ...]`, which cannot explain that the user prefers 'sci-fi directed by Nolan'. COMPASS instead generates the text 'The user enjoys complex sci-fi films by Christopher Nolan,' providing a transparent rationale.

Key Novelty

Compact Preference Analyzer and Summarization System (COMPASS)

Bridges the modality gap by pre-training the LLM on a 'graph entity captioning' task, teaching it to translate structured graph embeddings into natural language descriptions
Uses 'knowledge-aware instruction fine-tuning' to guide the LLM in synthesizing dialogue history with KG-augmented context to output structured preference summaries
Integrates generated text summaries back into base CRS models via a BERT-based encoder and adaptive gating mechanism, requiring no architectural changes to the base model

Architecture

The overall architecture and two-stage training process of COMPASS.

Breakthrough Assessment

7/10

The entity-captioning approach to the modality gap is novel and has significant potential for explainability, but the provided text lacks the quantitative results needed to confirm state-of-the-art performance.

⚙️ Technical Details

Problem Definition

Setting: Conversational Recommendation where a system estimates user preferences from dialogue history H_t to recommend items I_t and generate the next response

Inputs: Dialogue history H_t, Knowledge Graph G (Entities, Relations, Descriptions)

Outputs: Textual user preference summary P_t, Recommended items I_t

Pipeline Flow

Graph Encoder (R-GCN) processes KG
Graph-to-Text Adapter projects embeddings
LLM generates Preference Summary
Preference Encoder (BERT) processes Summary
Adaptive Gating mixes Summary with Base Model
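
The last step of the flow above, adaptive gating, can be illustrated with a minimal sketch. The provided text does not give the gating formula, so the form used here (a sigmoid gate computed from the concatenated vectors, then a per-dimension convex blend) is an assumption, and the names `adaptive_gate`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_gate(base_vec, pref_vec, gate_weights, gate_bias):
    """Blend a base-CRS user vector with a preference-summary vector.

    Assumed form (not taken from the paper):
        g = sigmoid(W [base; pref] + b)
        fused = g * base + (1 - g) * pref
    gate_weights has one row per output dimension, each of length 2 * d.
    """
    if len(base_vec) != len(pref_vec):
        raise ValueError("base and preference vectors must share a dimension")
    concat = list(base_vec) + list(pref_vec)
    fused = []
    for i, (b, p) in enumerate(zip(base_vec, pref_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)  # gate value in (0, 1) for dimension i
        fused.append(g * b + (1.0 - g) * p)
    return fused


# With zero weights and bias the gate is 0.5 everywhere, so the
# result is the plain average of the two vectors.
d = 2
base = [1.0, 0.0]
pref = [0.0, 1.0]
weights = [[0.0] * (2 * d) for _ in range(d)]
bias = [0.0] * d
fused = adaptive_gate(base, pref, weights, bias)
print(fused)
```

Because the blend happens on the user representation only, a sketch like this leaves the base recommender's own layers untouched, which is consistent with the summary's claim that no architectural changes are required.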

System Modules

Graph Encoder (Input Processing)

Capture structural relationships and item attributes from the Knowledge Graph

Model or implementation: Relational Graph Convolutional Network (R-GCN)
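
As a rough illustration of what an R-GCN layer computes, the sketch below implements the standard propagation rule (a self-loop transform plus relation-specific, degree-normalized neighbor messages, followed by ReLU) in plain Python. The layer sizes, relation names, and toy graph are assumptions for illustration; the provided text does not describe the configuration used in COMPASS.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rgcn_layer(node_feats, edges, rel_weights, self_weight):
    """One R-GCN propagation step.

    node_feats:  dict node_id -> feature vector
    edges:       list of (src, relation, dst); messages flow src -> dst
    rel_weights: dict relation -> weight matrix W_r
    self_weight: weight matrix W_0 for the self-loop
    """
    # c_{i,r}: number of incoming neighbors of node i under relation r
    counts = {}
    for _, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1

    # Self-loop term W_0 h_i
    out = {n: matvec(self_weight, h) for n, h in node_feats.items()}

    # Add normalized, relation-specific messages W_r h_j / c_{i,r}
    for src, rel, dst in edges:
        msg = matvec(rel_weights[rel], node_feats[src])
        norm = counts[(dst, rel)]
        out[dst] = [o + m / norm for o, m in zip(out[dst], msg)]

    # ReLU non-linearity
    return {n: [max(0.0, v) for v in h] for n, h in out.items()}


# Toy graph with hypothetical relation names.
identity = [[1.0, 0.0], [0.0, 1.0]]
feats = {"Inception": [1.0, 0.0], "Nolan": [0.0, 1.0], "SciFi": [1.0, 1.0]}
edges = [("Nolan", "director_of", "Inception"),
         ("SciFi", "genre_of", "Inception")]
rel_w = {"director_of": identity, "genre_of": identity}
updated = rgcn_layer(feats, edges, rel_w, identity)
print(updated["Inception"])
```

After one step, the 'Inception' vector mixes in information from its director and genre neighbors, which is the structural signal the graph encoder is meant to capture.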

Graph-to-Text Adapter (Input Processing)

Map graph embeddings into the LLM's semantic space

Model or implementation: Linear Projection Layer
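
A linear projection adapter can be sketched as a single affine map from the graph embedding size to the LLM hidden size. The dimensions, the random initialization, and the choice to prepend the projected vectors to the prompt's token embeddings are all assumptions made for illustration; the provided text only states that a linear projection layer is used.

```python
import random


def make_adapter(graph_dim, llm_dim, seed=0):
    """Create a randomly initialized projection (weight, bias).

    The initialization scheme here is a placeholder, not the paper's.
    """
    rng = random.Random(seed)
    scale = 1.0 / graph_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(graph_dim)]
              for _ in range(llm_dim)]
    bias = [0.0] * llm_dim
    return weight, bias


def project(entity_embedding, weight, bias):
    """Map one graph embedding into the LLM's embedding space."""
    if len(entity_embedding) != len(weight[0]):
        raise ValueError("embedding size does not match adapter input size")
    return [sum(w * x for w, x in zip(row, entity_embedding)) + b
            for row, b in zip(weight, bias)]


def build_llm_inputs(prompt_token_embeddings, entity_embeddings, weight, bias):
    """Prepend projected entity vectors to the prompt's token embeddings.

    Prepending is an assumed placement; the vectors could equally be
    spliced in at placeholder positions inside the prompt.
    """
    soft_tokens = [project(e, weight, bias) for e in entity_embeddings]
    return soft_tokens + prompt_token_embeddings


graph_dim, llm_dim = 4, 8
weight, bias = make_adapter(graph_dim, llm_dim)
entities = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5]]
prompt_tokens = [[0.0] * llm_dim for _ in range(3)]
inputs = build_llm_inputs(prompt_tokens, entities, weight, bias)
print(len(inputs), len(inputs[0]))
```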

Preference Analyzer LLM

Reason over dialogue and KG context to generate explainable preference summaries

Model or implementation: LLM (architecture not specified in the provided text; possibly Llama or similar)

Preference Encoder

Convert natural language summary into a vector for the recommender

Model or implementation: BERT

Novel Architectural Elements

Graph Entity Captioning Pre-training Loop: A specific pipeline branch used during training where the LLM reconstructs entity descriptions from graph embeddings to align modalities

Modeling

Base Model: Large Language Model (generic term in the text; the specific variant is not detailed in the provided excerpt)

Training Method: Two-stage training: (1) Graph Entity Captioning, (2) Knowledge-Aware Instruction Fine-tuning

Objective Functions:

Purpose: Align graph embeddings with text by minimizing the negative log-likelihood of generating entity captions.

Formally: L_pre = - sum log P(C_e | h_tau_e, I_c)

Purpose: Optimize preference reasoning by minimizing the negative log-likelihood of generating preference summaries.

Formally: L_tune = - sum log P(P_t | H_t, E_t, I_p)

Purpose: Optimize recommendation accuracy in the base CRS.

Formally: Cross-entropy loss L_rec = - sum y_ij log(p_ij)
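
To make the three objectives concrete, the sketch below evaluates them on toy numbers. It assumes the usual autoregressive reading of the two generation losses, where the log-probability of a caption or summary is the sum of the log-probabilities of its tokens; the provided text does not spell this out. The function names and the probabilities are illustrative only.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of one target sequence.

    token_probs holds the model's probability for each gold token,
    conditioned on the inputs and the preceding gold tokens. Under the
    assumed factorization this is one term of L_pre or L_tune.
    """
    return -sum(math.log(p) for p in token_probs)


def recommendation_cross_entropy(labels, probs):
    """L_rec = - sum_i sum_j y_ij * log(p_ij).

    labels[i][j] is 1 if item j is the target for example i, else 0;
    probs[i][j] is the predicted probability of item j.
    """
    total = 0.0
    for y_row, p_row in zip(labels, probs):
        for y, p in zip(y_row, p_row):
            if y:  # skip zero labels so log is only taken where needed
                total -= y * math.log(p)
    return total


# A two-token caption whose gold tokens get probability 0.5 and 0.25.
caption_loss = sequence_nll([0.5, 0.25])

# One example with three candidate items; the second is the target.
rec_loss = recommendation_cross_entropy([[0, 1, 0]], [[0.2, 0.5, 0.3]])

print(round(caption_loss, 4), round(rec_loss, 4))
```

Both generation stages share the same loss shape and differ only in what the model is conditioned on: a graph embedding plus a captioning instruction in stage 1, and the dialogue history, entity embeddings, and a preference instruction in stage 2.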

Training Data:

Stage 1: Input-output pairs constructed for each entity in the KG (Embedding -> Caption)
Stage 2: Dialogue history + Mentioned Entity Embeddings -> Structured Preference Summary
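
The two kinds of training pairs described above could be assembled roughly as follows. This is a sketch under stated assumptions: the `<graph>` placeholder token, the instruction wording, and the `TrainingExample` container are hypothetical and are not the paper's actual templates.

```python
from dataclasses import dataclass

GRAPH_TOKEN = "<graph>"  # hypothetical slot where a projected embedding goes


@dataclass
class TrainingExample:
    prompt: str             # instruction text with graph placeholders
    graph_embeddings: list  # one embedding per placeholder, in order
    target: str             # text the LLM is trained to generate


def captioning_example(entity_embedding, caption):
    """Stage 1: entity embedding -> entity caption."""
    prompt = f"{GRAPH_TOKEN}\nDescribe this entity."
    return TrainingExample(prompt, [entity_embedding], caption)


def preference_example(dialogue_turns, mentioned_embeddings, summary):
    """Stage 2: dialogue history + mentioned entities -> preference summary."""
    slots = " ".join(GRAPH_TOKEN for _ in mentioned_embeddings)
    history = "\n".join(dialogue_turns)
    prompt = (f"Dialogue:\n{history}\n"
              f"Mentioned entities: {slots}\n"
              "Summarize the user's preferences.")
    return TrainingExample(prompt, list(mentioned_embeddings), summary)


stage1 = captioning_example(
    [0.1, 0.2],
    "Inception is a 2010 sci-fi film directed by Christopher Nolan.",
)
stage2 = preference_example(
    ["User: I loved Inception.", "System: What did you like about it?"],
    [[0.1, 0.2]],
    "The user enjoys complex sci-fi films by Christopher Nolan.",
)
print(stage2.prompt)
```

Keeping one placeholder per embedding makes it easy to check that the prompt and the embedding list stay aligned before the embeddings are spliced into the LLM input.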

Compute: Not reported in the provided text

Reproducibility

The provided text does not contain a link to code or datasets. The methodology for prompt construction (templates for items vs non-items) is described in detail.

📊 Experiments & Results

Evaluation Setup

Conversational Recommendation using Knowledge Graphs

Benchmarks:

Benchmark datasets (Conversational Recommendation)

Metrics:

Recommendation Performance (implied)
Explainability (implied)

Statistical methodology: Not reported in the provided text

Main Takeaways

The paper proposes a 'plug-and-play' framework, meaning the generated summaries can enhance various existing CRS models without altering their internal architecture.
The two-stage training approach is designed to address the modality gap, letting the LLM treat graph embeddings as if they were language tokens.
Qualitative examples show the model generates structured summaries containing: Reasoning, Overall Preferences, Current Interests, and Recommendations.
Note: The provided text ends before the Experiments section; therefore, specific quantitative results (tables, metrics) are not available for extraction.

📚 Prerequisite Knowledge

Prerequisites

Conversational Recommender Systems (CRS)
Knowledge Graphs (KG)
Large Language Models (LLM)
Graph Neural Networks (GNN)

Key Terms

CRS: Conversational Recommender System—a system that elicits user preferences through multi-turn natural language dialogue

KG: Knowledge Graph—a structured representation of data with entities as nodes and relationships as edges

R-GCN: Relational Graph Convolutional Network—a type of GNN specifically designed to handle knowledge graphs with multiple relation types

COMPASS: Compact Preference Analyzer and Summarization System—the proposed framework

Modality Gap: The difficulty in combining data from different representations, specifically structured graph data and unstructured natural language text

Entity Captioning: A pre-training task where the model learns to generate a natural language description from a graph entity embedding

Instruction Fine-tuning: Training an LLM on specific tasks using natural language instructions to guide its behavior