Fangye Wang, Haowen Lin, Yifang Yuan, Siyuan Wang, Xiaojiang Zhou, Song Yang, Pengjie Wang
AMAP, Alibaba Group
arXiv
(2026)
Recommendation, P13N, Pretraining
📝 Paper Summary
Next Point-of-Interest (POI) Recommendation, Generative Retrieval, Spatio-Temporal Modeling
GeoGR adapts Large Language Models for next-POI prediction by encoding locations into hierarchical semantic IDs that explicitly capture spatio-temporal collaborative patterns, then training the model via continued pre-training and supervised fine-tuning.
Core Problem
Existing LLM-based POI recommenders rely on non-semantic identifiers or purely textual embeddings that fail to capture collaborative cross-category relationships (e.g., airport→hotel→parking) and struggle with the sparsity of real-world navigation data.
Why it matters:
Accurate prediction is critical for large-scale navigation platforms serving billions of users with diverse needs (dining, tourism, fueling).
Traditional sequential models miss the semantic reasoning of LLMs, while standard LLM approaches miss the structured spatio-temporal dependencies inherent in mobility data.
Concrete Example: A user searches for 'dinner' near a specific location. A standard LLM might recommend a generic popular restaurant based on text similarity. GeoGR, which takes the user's trajectory into account (e.g., arriving from an airport), recommends a hotel restaurant with parking, leveraging learned collaborative signals between these distinct categories.
Key Novelty
Geo-Aware Generative Recommendation Framework
Constructs 'Semantic IDs' (SIDs) for POIs not just from text, but by explicitly modeling geographically constrained co-visitation patterns using contrastive learning.
Aligns the LLM with these new SIDs through a two-stage process: Continued Pre-Training (CPT) on template-based tasks to learn the 'language' of SIDs, followed by Supervised Fine-Tuning (SFT) for the specific next-POI prediction task.
Architecture
The overall framework of GeoGR, split into two main stages: (1) Geo-aware SID Construction and (2) Generative POI Recommendation Training.
Evaluation Highlights
Online A/B testing on the AMAP platform (millions of users) showed significant gains on multiple online metrics.
Offline experiments on real-world datasets are claimed to outperform state-of-the-art baselines; specific numbers are not provided in the snippet.
Breakthrough Assessment
8/10
Strong industrial application with a novel approach to 'grounding' LLMs in spatio-temporal data via specialized tokenization. Successfully deployed on a massive scale (AMAP).
⚙️ Technical Details
Problem Definition
Setting: Next POI recommendation formulated as a conditional probability maximization problem.
Inputs: User interaction history T_u (sequence of POIs, times, conditions) and current context con_u (time, location, query).
Outputs: The next POI p_{n+1} (represented as a sequence of Semantic ID tokens).
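Written out, the setting above can be sketched as follows. The SID token notation (s_1, ..., s_L), the SID length L, and the parameter symbol \theta are assumptions introduced here for illustration; the summary only states that the next POI is predicted as a sequence of Semantic ID tokens.

```latex
\hat{p}_{n+1} = \arg\max_{p} \; P_\theta\left(p \mid T_u, con_u\right),
\qquad
P_\theta\left(p \mid T_u, con_u\right) = \prod_{l=1}^{L} P_\theta\left(s_l \mid s_{<l},\, T_u,\, con_u\right)
```

Here (s_1, ..., s_L) is the SID of candidate POI p. Under this factorization, taking the negative logarithm of the product yields a token-level negative log-likelihood over the SID tokens.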
Pipeline Flow
Group 1: SID Construction: POI Representation Learning → Tokenization → Refinement
Group 2: Generative Training: Continued Pre-Training (CPT) → Supervised Fine-Tuning (SFT) → Inference
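To make the hand-off between the two groups concrete, the sketch below renders a hierarchical SID as special tokens and assembles one prompt/completion pair for the fine-tuning stage. The token format `<a_12><b_45><c_7>`, the prompt template, and the helper names `sid_to_tokens` and `build_sft_example` are all hypothetical; the summary does not specify how GeoGR formats its SIDs or templates.

```python
def sid_to_tokens(sid):
    """Render a hierarchical SID such as (12, 45, 7) as '<a_12><b_45><c_7>'.

    The level prefix (a, b, c, ...) is a hypothetical convention, not the paper's.
    """
    return "".join(f"<{chr(ord('a') + level)}_{code}>" for level, code in enumerate(sid))


def build_sft_example(history_sids, context, target_sid):
    """Build one prompt/completion pair; the template is illustrative only."""
    history = " ".join(sid_to_tokens(s) for s in history_sids)
    prompt = f"Context: {context}\nVisited POIs: {history}\nNext POI:"
    return {"prompt": prompt, "completion": " " + sid_to_tokens(target_sid)}


# Made-up SIDs and context, used only to show the shape of one example.
example = build_sft_example(
    history_sids=[(3, 1, 2), (0, 2, 2)],
    context="18:30, query=dinner",
    target_sid=(3, 0, 1),
)
```

In a setup like this, each SID token would typically be added to the model vocabulary as a new token, which is one reason an adaptation stage before task fine-tuning is plausible.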
System Modules
POI Encoder (SID Construction)
Generate dense embeddings for POIs incorporating text and spatial context.
Model or implementation: Qwen 4B embedding (fine-tuned)
Tokenizer (RQ-Kmeans) (SID Construction)
Convert dense POI embeddings into discrete hierarchical Semantic IDs.
Model or implementation: Hierarchical K-means clustering
SID Refiner (SID Construction)
Iteratively improve SIDs to be more predictable by the LLM.
Model or implementation: EM-style optimization algorithm
Generative Recommender
Predict the next POI's SID sequence given user context.
Model or implementation: Qwen 4B (CPT + SFT)
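The tokenizer module can be illustrated with a minimal residual-quantization k-means sketch: cluster the embeddings, subtract each point's assigned centroid, and cluster the residuals again at the next level. This is a generic version of the technique, not GeoGR's implementation. The number of levels, the codebook size, the random stand-in embeddings, and the function names are assumptions.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for c in range(k):
            members = x[assign == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def rq_kmeans(embeddings, levels=3, k=4):
    """Residual quantization: cluster, subtract the centroid, repeat."""
    residual = embeddings.astype(float).copy()
    codebooks, codes = [], []
    for level in range(levels):
        centroids, assign = kmeans(residual, k, seed=level)
        codebooks.append(centroids)
        codes.append(assign)
        residual = residual - centroids[assign]
    return codebooks, np.stack(codes, axis=1)  # codes: (num_pois, levels)


# Random vectors stand in for the POI encoder's output embeddings.
poi_embeddings = np.random.default_rng(0).normal(size=(64, 8))
codebooks, sids = rq_kmeans(poi_embeddings, levels=3, k=4)
```

Each row of `sids` is one POI's hierarchical code, coarse to fine. POIs that share a prefix fall in the same coarse cluster, which is what makes the IDs "semantic".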
Novel Architectural Elements
Geo-aware SID tokenization pipeline that injects spatio-temporal collaborative signals directly into the ID creation process via contrastive learning on co-visit pairs.
EM-style iterative refinement loop where the LLM and the SID codebook mutually update to maximize learnability.
Modeling
Base Model: Qwen 4B
Training Method: Continued Pre-Training (CPT) followed by Supervised Fine-Tuning (SFT)
Objective Functions:
Purpose: Learn collaborative POI representations.
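Formally: NCE loss L_{contrast} = -\log( \exp(sim(e_i, e_j)/\tau) / \sum_k \exp(sim(e_i, e_k)/\tau) ), where (e_i, e_j) are the embeddings of a co-visit pair, k ranges over the contrast candidates, and \tau is a temperature.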
Purpose: Align LLM with SID tokens (CPT) and Learn Next-POI prediction (SFT).
Formally: Negative Log-Likelihood (NLL) loss over the autoregressive generation of SID tokens.
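The contrastive objective above can be sketched numerically as follows. This is a minimal sketch under stated assumptions: cosine similarity for sim, a temperature of 0.07, a denominator that includes the positive, and toy two-dimensional vectors. None of these choices is specified in the summary.

```python
import numpy as np


def info_nce(anchor, positive, negatives, tau=0.07):
    """-log( exp(sim(a, p)/tau) / sum_k exp(sim(a, k)/tau) ).

    k ranges over the positive plus the negatives (an assumption);
    sim is cosine similarity (also an assumption).
    """
    candidates = np.vstack([positive[None, :], negatives])  # positive at index 0
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor)
    )
    logits = sims / tau
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob_positive)


# Toy embeddings: "hotel" is close to "airport", the others are not.
airport = np.array([1.0, 0.0])
hotel = np.array([0.9, 0.1])
unrelated = np.array([[0.0, 1.0], [-1.0, 0.2]])

loss_covisit = info_nce(airport, hotel, unrelated)
loss_mismatch = info_nce(airport, unrelated[0], np.vstack([hotel, unrelated[1]]))
```

The loss is small when the co-visited POI is the most similar candidate and large otherwise. Minimizing it pulls co-visited POIs together in embedding space, even when they belong to different categories.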
vs. TIGER/GNPR-SID: GeoGR explicitly incorporates spatio-temporal collaborative signals (geo-constrained co-visits) into the SID construction via contrastive learning, rather than relying solely on text or pure quantization.
vs. LLM4POI: GeoGR uses discrete Semantic IDs instead of full text generation, improving efficiency and handling the specific vocabulary of locations better.
vs. Standard GR: Uses an EM-style iterative refinement to align the SIDs with the LLM's capability, rather than keeping SIDs fixed after quantization.
Limitations
Relies on proprietary data and platform (AMAP), limiting reproducibility.
Specifics of the offline experimental results (tables/numbers) are not provided in the snippet.
Reproducibility
No replication artifacts mentioned in the paper. The system is deployed on a proprietary platform (AMAP), and the dataset appears to be internal/proprietary real-world data.
📊 Experiments & Results
Evaluation Setup
Next-POI prediction on real-world datasets and online A/B testing on AMAP.
Benchmarks:
Internal/Real-world datasets (Next POI Prediction) [New]
Metrics:
Online engagement metrics (not specified; CTR/conversion implied)
Offline accuracy metrics (likely Hit@K and NDCG@K; not explicitly listed in the snippet)
Statistical methodology: Not explicitly reported in the paper
Key Results
Benchmark | Metric | Baseline | This Paper | Δ
AMAP Platform | Online metrics | Not reported in the paper | Not reported in the paper | Positive boosting
Main Takeaways
GeoGR successfully integrates LLMs into a high-throughput industrial navigation platform.
The geo-aware SID tokenization effectively captures cross-category associations (e.g., airport→hotel) that standard text embeddings miss.
The two-stage alignment (CPT + SFT) is crucial for adapting the LLM to non-native SID tokens.
📚 Prerequisite Knowledge
Prerequisites
Generative Retrieval (GR) paradigms
Vector Quantization (RQ-VAE / RQ-Kmeans)
Large Language Model fine-tuning (CPT, SFT)
Contrastive Learning (NCE loss)
Key Terms
SID: Semantic ID—a short sequence of discrete tokens representing an item (POI) in a generative retrieval system, often derived from hierarchical clustering.
CPT: Continued Pre-Training—an intermediate training stage to adapt a pre-trained LLM to a new domain or vocabulary before specific task fine-tuning.
SFT: Supervised Fine-Tuning—training the model on labeled input-output pairs (instruction tuning) to perform the specific downstream task.
RQ-Kmeans: Residual Quantization K-means—a method to discretize continuous vectors into hierarchical discrete codes by iteratively clustering residuals.
Co-visit pairs: Pairs of POIs that appear together in user trajectories within a short time window, indicating a behavioral relationship.
Spatio-temporal collaborative signals: Information derived from the collective movement patterns of users over time and space, revealing relationships between locations beyond just semantic similarity.
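The co-visit pair definition above can be illustrated with a short sketch that mines pairs from one trajectory, keeping only visits that are close in both time and distance. The one-hour window, the 5 km radius, the coordinates, and the function names are hypothetical; the summary says only that pairs fall within a short time window and are geographically constrained.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius taken as 6371 km)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def covisit_pairs(trajectory, max_gap_s=3600, max_dist_km=5.0):
    """trajectory: list of (poi_id, unix_ts, lat, lon), sorted by time.

    Both thresholds are illustrative values, not taken from the paper.
    """
    pairs = []
    for i, (p_i, t_i, lat_i, lon_i) in enumerate(trajectory):
        for p_j, t_j, lat_j, lon_j in trajectory[i + 1:]:
            if t_j - t_i > max_gap_s:
                break  # sorted by time, so later visits are even further apart
            if p_i != p_j and haversine_km(lat_i, lon_i, lat_j, lon_j) <= max_dist_km:
                pairs.append((p_i, p_j))
    return pairs


# Made-up trajectory: three nearby stops within an hour, one distant stop a day later.
trajectory = [
    ("airport", 0, 39.5098, 116.4105),
    ("hotel", 1800, 39.5200, 116.4300),
    ("parking", 2400, 39.5210, 116.4310),
    ("museum", 90000, 39.9163, 116.3972),
]
pairs = covisit_pairs(trajectory)
```

Pairs mined this way could serve as positives for a contrastive objective, with other POIs acting as negatives.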