
TimeR4: Time-aware Retrieval-Augmented Large Language Models for Temporal Knowledge Graph Question Answering

X Qian, Y Zhang, Y Zhao, B Zhou, X Sui, L Zhang…
Nankai University, China; Tiangong University, China
Proceedings of the 2024 …, 2024 (2024)
RAG KG QA Reasoning Factuality

📝 Paper Summary

Graph-based RAG pipeline
TimeR4 enhances LLMs for temporal question answering by rewriting implicit questions into explicit ones with retrieved facts and using a time-aware retrieval-rerank pipeline.
Core Problem
LLMs struggle with temporal reasoning because they hallucinate on implicit time questions (e.g., 'after the ministry') and standard retrievers miss time constraints.
Why it matters:
  • Standard retrieval methods (like BM25) focus only on semantic matching, often retrieving facts with incorrect timestamps that mislead the LLM
  • Implicit temporal questions (lacking specific dates) cause severe hallucinations in LLMs, which cannot infer the hidden timeline without external knowledge
  • Existing TKGQA (Temporal Knowledge Graph Question Answering) methods based on graph embeddings fail to capture the semantic nuances of natural language questions
Concrete Example: For the question 'After the Danish Ministry, who was the first to visit Iraq?', an LLM that does not know when the Danish Ministry's visit took place might guess incorrectly. TimeR4 retrieves the fact (Danish Ministry, visit, Iraq, 2016-01-05), rewrites the question to 'After 2016-01-05...', and then retrieves the correct answer 'Jack Straw' (visit date 2016-01-06) while filtering out irrelevant visits such as Evan Bayh's on 2016-01-04.
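The example above can be traced with a few lines of Python. This is only a minimal sketch: the `Fact` type and the helper names `rewrite_question` and `first_after` are hypothetical, and plain string substitution stands in for the LLM-based rewriting and the learned retriever that the summary describes.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    """A temporal KG fact as a (subject, relation, object, timestamp) quadruple."""
    subject: str
    relation: str
    obj: str
    when: date


# The three facts mentioned in the example.
FACTS = [
    Fact("Danish Ministry", "visit", "Iraq", date(2016, 1, 5)),
    Fact("Jack Straw", "visit", "Iraq", date(2016, 1, 6)),
    Fact("Evan Bayh", "visit", "Iraq", date(2016, 1, 4)),
]


def rewrite_question(question: str, implicit_phrase: str, anchor: Fact) -> str:
    """Replace the implicit time phrase with the anchor fact's explicit date.

    Assumption: string replacement is a stand-in for the LLM rewriting step.
    """
    return question.replace(implicit_phrase, f"After {anchor.when.isoformat()}")


def first_after(facts: List[Fact], anchor: Fact) -> Optional[Fact]:
    """Return the earliest fact with the same relation and object after the anchor."""
    later = [
        f for f in facts
        if f.relation == anchor.relation and f.obj == anchor.obj and f.when > anchor.when
    ]
    return min(later, key=lambda f: f.when, default=None)


question = "After the Danish Ministry, who was the first to visit Iraq?"
anchor = FACTS[0]  # background fact retrieved for the implicit time phrase
explicit_question = rewrite_question(question, "After the Danish Ministry", anchor)
answer = first_after(FACTS, anchor)
```

Here the visit dated 2016-01-04 is excluded because it precedes the anchor date, which leaves the 2016-01-06 visit as the only candidate.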
Key Novelty
Retrieve-Rewrite-Retrieve-Rerank Framework (TimeR4)
  • Rewrites implicit temporal questions by retrieving background facts (e.g., event dates) and asking an LLM to substitute them into the query as explicit timestamps
  • Trains a specific Time-Aware Retriever using contrastive learning with negatives that have perturbed timestamps, ensuring the embedding model learns time sensitivity
  • Applies a hard temporal filter during reranking to explicitly penalize retrieved facts that violate the question's time constraints (e.g., filtering events before a 'start' date)
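The retriever training and reranking ideas listed above can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function names, the day-level perturbation range, the InfoNCE-style loss, the temperature value, and the choice to drop (rather than down-weight) violating facts are all assumptions made for illustration.

```python
import math
import random
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple

Quad = Tuple[str, str, str, date]  # (subject, relation, object, timestamp)


def time_perturbed_negatives(fact: Quad, k: int, max_shift_days: int = 30,
                             seed: int = 0) -> List[Quad]:
    """Build k negatives that differ from `fact` only in the timestamp."""
    s, r, o, t = fact
    offsets = [d for d in range(-max_shift_days, max_shift_days + 1) if d != 0]
    rng = random.Random(seed)
    # sample() draws without replacement, so all negatives have distinct dates.
    return [(s, r, o, t + timedelta(days=d)) for d in rng.sample(offsets, k)]


def info_nce(pos_score: float, neg_scores: Sequence[float],
             temperature: float = 0.05) -> float:
    """Contrastive loss: -log softmax of the positive among all candidates."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def rerank_with_time_filter(scored: Sequence[Tuple[Quad, float]],
                            after: Optional[date] = None,
                            before: Optional[date] = None) -> List[Tuple[Quad, float]]:
    """Drop facts outside the (after, before) window, then sort by score."""
    kept = [
        (f, s) for f, s in scored
        if (after is None or f[3] > after) and (before is None or f[3] < before)
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


positive = ("Jack Straw", "visit", "Iraq", date(2016, 1, 6))
negatives = time_perturbed_negatives(positive, k=4)

candidates = [
    (("Evan Bayh", "visit", "Iraq", date(2016, 1, 4)), 0.92),
    (positive, 0.88),
]
ranked = rerank_with_time_filter(candidates, after=date(2016, 1, 5))
```

Because the negatives share subject, relation, and object with the positive, the only signal the loss can use to separate them is the timestamp, which is the intuition behind time-sensitive embeddings. In the reranking call, the higher-scoring but too-early fact is removed before sorting.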
Architecture
Figure 2: The four-module architecture of TimeR4: Fact Retrieval -> Rewriting -> Time-aware Retrieval -> Reasoning
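The data flow between the four modules can be sketched as a simple composition of callables. The function name `timer4_pipeline`, the stage signatures, and the stub stages are hypothetical placeholders, since the summary does not specify the interfaces of the actual components.

```python
from typing import Callable, List


def timer4_pipeline(
    question: str,
    retrieve_facts: Callable[[str], List[str]],
    rewrite: Callable[[str, List[str]], str],
    time_aware_retrieve: Callable[[str], List[str]],
    reason: Callable[[str, List[str]], str],
) -> str:
    """Chain the four stages in the order given by the architecture."""
    background = retrieve_facts(question)          # 1. Fact Retrieval
    explicit_q = rewrite(question, background)     # 2. Rewriting
    evidence = time_aware_retrieve(explicit_q)     # 3. Time-aware Retrieval
    return reason(explicit_q, evidence)            # 4. Reasoning


# Stub stages (assumptions) that only record the order in which they run.
calls: List[str] = []


def _retrieve(q: str) -> List[str]:
    calls.append("fact_retrieval")
    return ["(Danish Ministry, visit, Iraq, 2016-01-05)"]


def _rewrite(q: str, facts: List[str]) -> str:
    calls.append("rewriting")
    return "After 2016-01-05, who was the first to visit Iraq?"


def _time_retrieve(q: str) -> List[str]:
    calls.append("time_aware_retrieval")
    return ["(Jack Straw, visit, Iraq, 2016-01-06)"]


def _reason(q: str, evidence: List[str]) -> str:
    calls.append("reasoning")
    return "Jack Straw"


result = timer4_pipeline(
    "After the Danish Ministry, who was the first to visit Iraq?",
    _retrieve, _rewrite, _time_retrieve, _reason,
)
```

Passing the stages in as arguments keeps the sketch independent of any particular retriever or LLM, so each stub could be swapped for a real component.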
Evaluation Highlights
  • +47.8% improvement in Hits@1 on the MultiTQ dataset compared to the ChatGPT-based baseline ARI
  • +22.5% relative improvement on TimeQuestions dataset compared to the best baseline TwiRGCN
  • Achieves 72.8% Hits@1 on MultiTQ, significantly outperforming LLaMA2 (18.5%) and ChatGPT (10.2%) in zero-shot settings
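For readers unfamiliar with the metrics, the sketch below shows one common way to compute Hits@1 and a relative improvement. The helper names and the toy predictions are made up for illustration and are not taken from the paper's data; the exact evaluation protocol used by the authors is not given in this summary.

```python
from typing import List, Set


def hits_at_1(ranked_predictions: List[List[str]], gold: List[Set[str]]) -> float:
    """Fraction of questions whose top-ranked prediction is a gold answer."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("predictions and gold answers must align")
    correct = sum(
        1 for preds, answers in zip(ranked_predictions, gold)
        if preds and preds[0] in answers
    )
    return correct / len(gold)


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Toy data (hypothetical): three questions, two answered correctly at rank 1.
toy_predictions = [["Jack Straw", "Evan Bayh"], ["2016"], ["Iraq", "Iran"]]
toy_gold = [{"Jack Straw"}, {"2015"}, {"Iraq"}]
toy_score = hits_at_1(toy_predictions, toy_gold)
```

Note that a relative improvement divides the gain by the baseline score, so it is not the same as a difference in percentage points.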
Breakthrough Assessment
8/10
Significant performance jumps on standard TKGQA benchmarks. Effectively addresses the specific 'implicit time' problem in RAG, though the scope is limited to structured temporal knowledge graphs.