Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback

📝 Paper Summary

Topics: Conversational personalization; Knowledge internalization

KGT personalizes LLMs by modifying an external knowledge graph based on user feedback during inference, maximizing the probability of retrieving and reasoning over personalized facts without updating model parameters.

Core Problem

Existing personalization methods like parameter-efficient fine-tuning or knowledge editing require computationally expensive back-propagation and lack interpretability, making real-time adaptation to extensive personalized knowledge difficult.

Why it matters:

Back-propagation incurs GPU memory and compute costs that are impractical for everyday on-device LLM use
Directly modifying model parameters can cause 'catastrophic forgetting' or corruption, adversely affecting responses to unrelated queries
In-context learning becomes inefficient and unscalable as the context length increases with accumulated personalized knowledge

Concrete Example: If a user tells an LLM 'My dog is vegetarian', parameter-based methods might overfit or damage general knowledge about dogs. KGT simply deletes the triple (Dog, Enjoy, Meat) and adds (Dog, Enjoy, Vegetable) to the user's graph, allowing the model to retrieve this specific fact for future planning without retraining.
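The graph edit in this example reduces to set operations on the user's triples. The following is a minimal sketch, assuming the personal KG is stored as a Python set of (subject, relation, object) tuples. The `Triple` type and `apply_feedback` helper are hypothetical and not taken from the KGT code. In particular, the "same subject and relation, different object" conflict rule is a simple stand-in for KGT's probability-driven removal of triples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def apply_feedback(graph: set[Triple], new: Triple) -> set[Triple]:
    """Return a copy of `graph` in which `new` replaces every triple that has
    the same subject and relation but a different object (hypothetical rule)."""
    conflicts = {t for t in graph
                 if t.subject == new.subject and t.relation == new.relation
                 and t.obj != new.obj}
    return (graph - conflicts) | {new}


user_kg = {Triple("Dog", "Enjoy", "Meat"), Triple("Dog", "Is", "Pet")}
user_kg = apply_feedback(user_kg, Triple("Dog", "Enjoy", "Vegetable"))
print(sorted(user_kg))
```

No model weights are touched. The unrelated fact ("Dog", "Is", "Pet") survives, which is the interpretability argument in miniature.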

Key Novelty

Optimization of Knowledge Graphs via Evidence Lower Bound (ELBO) Maximization

Treats personalization as optimizing the structure of an external knowledge graph (KG) rather than the LLM's weights
Uses a heuristic algorithm to iteratively add relevant personalized triples and remove conflicting ones until the model's reasoning probability for the correct answer is maximized
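The add/remove heuristic can be pictured as a greedy local search over single-triple edits. The following is a minimal sketch, not the paper's algorithm. It assumes a black-box `score(graph)` that returns the log-probability of the user's answer given the knowledge retrieved from `graph`, which KGT would obtain from forward passes of the frozen LLM. A hypothetical `toy_score` stands in for the model here, and the paper's exact acceptance rules may differ.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def greedy_tune(graph: Graph, candidates: Iterable[Triple],
                score: Callable[[Graph], float], max_rounds: int = 10) -> Graph:
    """Apply the best single add/remove edit while it strictly raises score."""
    candidates = set(candidates)
    current = score(graph)
    for _ in range(max_rounds):
        # Every single-triple edit: add an unused candidate or drop a triple.
        edits = [graph | {t} for t in candidates - graph]
        edits += [graph - {t} for t in graph]
        if not edits:
            break
        best = max(edits, key=score)
        best_score = score(best)
        if best_score <= current:
            break  # local optimum: no single edit improves the objective
        graph, current = best, best_score
    return graph


def toy_score(g: Graph) -> float:
    # Hypothetical stand-in for log P(answer | query, retrieved triples):
    # rewards the fact the user asserted and penalises the conflicting one.
    s = 2.0 if ("Dog", "Enjoy", "Vegetable") in g else 0.0
    s -= 1.5 if ("Dog", "Enjoy", "Meat") in g else 0.0
    return s


start = frozenset({("Dog", "Enjoy", "Meat"), ("Dog", "Is", "Pet")})
tuned = greedy_tune(start, [("Dog", "Enjoy", "Vegetable")], toy_score)
print(sorted(tuned))
```

Only strict improvements are accepted, so the search first adds the new fact and then drops the conflicting one. It leaves ("Dog", "Is", "Pet") alone because removing it does not change the score.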

Architecture

Figure: The KGT framework workflow, showing the interaction between the User, the LLM, and the Knowledge Graph.

Evaluation Highlights

Reduces inference latency by up to 84% compared to parameter-based tuning methods like LoRA
Cuts GPU memory costs by up to 77% compared to gradient-based approaches
Improves personalization accuracy over baseline personalization methods on GPT-2, Llama-2, and Llama-3 backbones while maintaining interpretability through explicit graph edits

Breakthrough Assessment

7/10

Offers a highly efficient, interpretable alternative to fine-tuning for personalization. While it relies on existing retrieval mechanisms, the formulation of graph editing as an ELBO optimization problem is a clever, practical shift.

⚙️ Technical Details

Problem Definition

Setting: Online personalization where an LLM must adapt to a sequence of user queries and feedback without parameter updates

Inputs: User query q_t and feedback/answer a_t at time t

Outputs: Optimized Knowledge Graph G

Pipeline Flow

Relation Extraction (LLM predicts relations between query/answer entities)
Candidate Construction (Build candidate personalized triples)
Heuristic Optimization (Iteratively add/remove triples from KG based on reasoning probability)
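The first two pipeline stages can be sketched as follows. The sketch assumes the frozen LLM's relation extraction is wrapped in a callable that returns relation strings. `build_candidates` and `stub_extractor` are hypothetical names, and the stub returns fixed relations in place of real model output. The resulting list is what the heuristic optimizer would then try adding to the graph.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def build_candidates(query_entity: str, answer_entity: str,
                     extract_relations: Callable[[str, str], List[str]]) -> List[Triple]:
    """Pair each predicted relation with the query and answer entities."""
    seen = set()
    triples: List[Triple] = []
    for rel in extract_relations(query_entity, answer_entity):
        rel = rel.strip()
        if not rel:
            continue  # skip empty predictions
        triple = (query_entity, rel, answer_entity)
        if triple not in seen:  # deduplicate, keeping extractor order
            seen.add(triple)
            triples.append(triple)
    return triples


def stub_extractor(subject: str, obj: str) -> List[str]:
    # Hypothetical stand-in for the frozen LLM's relation predictions.
    return ["Enjoy", "Eat", "Enjoy", " "]


candidates = build_candidates("Dog", "Vegetable", stub_extractor)
print(candidates)
```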

System Modules

Relation Extractor

Identify potential relationships between query subject and answer object

Model or implementation: Llama-2 or Llama-3 (frozen)

KG Optimizer

Modify the graph structure to maximize ELBO

Model or implementation: Heuristic Algorithm (Non-parametric)

Reasoning Engine

Generate answer or calculate probability of feedback

Model or implementation: Llama-2 or Llama-3 (frozen)

Novel Architectural Elements

Inference-only optimization loop: Replaces back-propagation with a search process that modifies external memory (KG) structure based on forward-pass probabilities

Modeling

Base Model: GPT-2, Llama-2-7b-chat, Llama-3-8b-instruct

Training Method: Inference-time Knowledge Graph Tuning (heuristic search)

Objective Functions:

Purpose: Maximize the likelihood of generating the user's feedback by optimizing the retrieved knowledge.

Formally: Maximize ELBO = E_Q[log P(a|q,z)] - KL(Q(z)||P(z|q))
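A small numeric check makes the objective concrete. The following is a minimal sketch with hypothetical toy probabilities. It treats z as one of three candidate triples (one-depth retrieval), P(z|q) as the retrieval distribution, and P(a|q,z) as the frozen model's probability of the user's answer. It then evaluates the ELBO above for two choices of Q.

```python
import math

# Hypothetical toy numbers: three candidate triples z (one-depth retrieval).
prior = {"Dog-Enjoy-Meat": 0.6, "Dog-Enjoy-Vegetable": 0.3, "Dog-Is-Pet": 0.1}  # P(z|q)
lik = {"Dog-Enjoy-Meat": 0.05, "Dog-Enjoy-Vegetable": 0.9, "Dog-Is-Pet": 0.2}   # P(a|q,z)


def elbo(q_dist: dict) -> float:
    """E_Q[log P(a|q,z)] - KL(Q(z) || P(z|q)) for a discrete distribution Q."""
    expected_ll = sum(p * math.log(lik[z]) for z, p in q_dist.items() if p > 0)
    kl = sum(p * math.log(p / prior[z]) for z, p in q_dist.items() if p > 0)
    return expected_ll - kl


evidence = sum(prior[z] * lik[z] for z in prior)              # P(a|q), marginalising z
log_evidence = math.log(evidence)
posterior = {z: prior[z] * lik[z] / evidence for z in prior}  # P(z|q,a)

print(f"log P(a|q)          = {log_evidence:.4f}")
print(f"ELBO, Q = prior     = {elbo(prior):.4f}")
print(f"ELBO, Q = posterior = {elbo(posterior):.4f}")
```

Setting Q equal to the prior leaves a gap below log P(a|q), and setting Q to the exact posterior closes it. In KGT the frozen model fixes P(a|q,z), and the thing being optimized is the graph behind the retrieved knowledge. The toy only illustrates the bound itself.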

Compute: Reduces GPU memory cost by up to 77% compared to gradient methods; requires only inference capability

Comparison to Prior Work

vs. ROME: KGT edits external graph instead of internal weights, avoiding model degradation
vs. LoRA: KGT requires no back-propagation, significantly lowering memory/compute
vs. Z-ICL: KGT scales better as knowledge accumulates; ICL hits context window limits
vs. GRACE [not cited in paper]: GRACE edits a codebook of activations for sequential editing; KGT edits symbolic triples, offering better interpretability

Limitations

Relies on the LLM's intrinsic ability to extract relations and reason over triples; weak base models may fail
Currently focuses on one-depth triple retrieval; multi-hop reasoning is not explicitly handled in the optimization
Heuristic optimization (add/remove) is a greedy approximation of the full objective
Requires entities in query and answer to be labeled/identifiable

Reproducibility

Code: https://github.com/W-R-Liu/KGT

The paper describes the heuristic algorithm and instruction templates in detail.

📊 Experiments & Results

Evaluation Setup

Personalized Question Answering using modified knowledge bases

Benchmarks:

ZsRE (Relation Extraction / QA)
CounterFact (Counterfactual QA)

Metrics:

Accuracy (Personalization Performance)
GPU Memory Cost
Latency (Time Cost)
Statistical methodology: Not explicitly reported in the paper

Key Results

Performance metrics demonstrating KGT's efficiency advantages over parameter-tuning methods. Raw baseline and KGT values are not stated in the text; only the relative reductions are.
Model	Metric	Baseline	This Paper	Δ
Llama-2-7b-chat	Latency	Not reported	Not reported	84% reduction
Llama-2-7b-chat	GPU memory cost	Not reported	Not reported	77% reduction

Main Takeaways

KGT significantly improves personalization performance over baseline methods on GPT-2, Llama-2, and Llama-3 while reducing computational overhead.
The method scales better than In-Context Learning (ICL) for long-term use because it doesn't linearly increase context length with accumulated knowledge.
Interpretability is inherently higher than parameter-editing methods because all changes are visible as added/removed graph triples.
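The scaling argument against ICL can be illustrated by counting what each approach puts into the prompt. The following is a minimal sketch under stated assumptions. It counts triples rather than tokens and uses a hypothetical workload. Exact subject matching stands in for KGT's retriever, and `icl_context` and `kg_context` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def icl_context(history: List[Triple]) -> List[Triple]:
    # In-context learning: every accumulated fact is replayed in the prompt.
    return list(history)


def kg_context(index: Dict[str, List[Triple]], query_entity: str) -> List[Triple]:
    # KG retrieval: only one-hop triples about the queried entity are used.
    return index.get(query_entity, [])


# Hypothetical workload: 1,000 personal facts spread over 100 entities.
history = [(f"entity{i % 100}", f"rel{i}", f"value{i}") for i in range(1000)]
index: Dict[str, List[Triple]] = defaultdict(list)
for t in history:
    index[t[0]].append(t)

print(len(icl_context(history)), len(kg_context(index, "entity7")))
```

The ICL context grows with every fact the user ever provided. The retrieved context grows only with the number of facts about the entity being asked about.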

📚 Prerequisite Knowledge

Prerequisites

Knowledge Graphs (entities, relations, triples)
Retrieval-Augmented Generation (RAG)
Evidence Lower Bound (ELBO) / Variational Inference
Causal Language Modeling

Key Terms

ELBO: Evidence Lower Bound, a lower bound on the log-likelihood (log-evidence) of the observed data; variational inference maximizes it as a tractable proxy when the likelihood itself is difficult to compute
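The "lower bound" follows from Jensen's inequality. The following is a standard derivation for the objective stated under Modeling, assuming a discrete latent z that ranges over candidate knowledge triples:

```latex
\begin{aligned}
\log P(a \mid q) &= \log \sum_{z} P(z \mid q)\, P(a \mid q, z) \\
&= \log \sum_{z} Q(z)\, \frac{P(z \mid q)\, P(a \mid q, z)}{Q(z)} \\
&\ge \sum_{z} Q(z) \log \frac{P(z \mid q)\, P(a \mid q, z)}{Q(z)} \\
&= \mathbb{E}_{Q}\left[\log P(a \mid q, z)\right] - \mathrm{KL}\left(Q(z) \,\|\, P(z \mid q)\right)
\end{aligned}
```

Equality holds when Q(z) = P(z | q, a), the posterior over triples given the query and the user's feedback.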

Knowledge Triple: A structured representation of a fact consisting of a subject, predicate (relation), and object, e.g., (Dog, Enjoy, Meat)

Posterior Distribution: The probability distribution of knowledge triples given the user's query and feedback, representing which facts are most likely true for this specific user

PEFT: Parameter-Efficient Fine-Tuning, methods such as LoRA that train a small number of additional parameters instead of the full model

Knowledge Editing: Techniques aimed at modifying specific facts within a pre-trained model's weights

Knowledge Retrieval: The process of selecting relevant triples from the graph based on the query

Knowledge-Enhanced Reasoning: The process where the LLM generates an answer conditioned on both the query and the retrieved triples
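Retrieval and knowledge-enhanced reasoning meet at the prompt. The following is a minimal sketch of that final step, assuming retrieved triples are verbalised into a plain-text prompt for the frozen LLM. The template wording and `build_prompt` are hypothetical; the actual instruction templates are the ones described in the paper and repository.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def build_prompt(query: str, retrieved: List[Triple]) -> str:
    """Condition the answer on the query plus the verbalised retrieved triples."""
    facts = "\n".join(f"- ({s}, {r}, {o})" for s, r, o in retrieved)
    return (
        "Use the following facts about the user when answering.\n"
        f"Facts:\n{facts}\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_prompt("What should I cook for my dog tonight?",
                      [("Dog", "Enjoy", "Vegetable")])
print(prompt)
```

The probability the LLM assigns to the user's feedback after such a prompt is the forward-pass signal that the graph optimization relies on.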