Geography-Aware Large Language Models for Next POI Recommendation

📝 Paper Summary

Next Point-of-Interest (POI) Recommendation, Spatial Representation Learning

GA-LLM improves next POI recommendation by transforming GPS coordinates into hierarchical quadkey embeddings and aligning external graph-based transition knowledge with the LLM's semantic space.

Core Problem

LLMs struggle with spatial tasks because they tokenize high-precision GPS coordinates inefficiently (leading to hallucinations) and lack global knowledge of POI transition patterns due to limited context windows.

Why it matters:

Text-only LLMs often predict hallucinated POIs that are semantically plausible but geographically implausible (far from the user's location)
High-precision GPS coordinates (e.g., >10 decimal places) generate excessive tokens, increasing computational cost and confusing semantic modeling
Limited context windows cannot fit entire user histories, so LLMs miss implicit POI-to-POI transition patterns that are evident in long-term trajectories

Concrete Example: When a user visits 'AirTrain JFK station', a text-only LLM might predict a previously mentioned POI again. However, the ground truth is 'Kennedy Airport' (POI 404), which is not in the recent context but is a logical transition. GA-LLM captures this via transition patterns where text-only models fail.

Key Novelty

Geography-Aware Large Language Model (GA-LLM)

Geographic Coordinate Injection Module (GCIM): Discretizes GPS into hierarchical quadkeys and applies Fourier positional encoding to capture multi-scale spatial dependencies within the LLM.
POI Alignment Module (PAM): Projects pre-trained embeddings from external graph-based models (which capture global transition patterns) directly into the LLM's high-dimensional semantic space.

Architecture

The GA-LLM framework architecture, detailing how user trajectories are processed into prompts and how the two specialized modules (GCIM and PAM) inject information.

Breakthrough Assessment

7/10

Addresses a critical weakness of LLMs (spatial numeracy) with a principled encoding scheme (Quadkeys + Fourier). The alignment of graph embeddings is a logical step for hybrid recommendation.

⚙️ Technical Details

Problem Definition

Setting: Given a user's historical trajectory T_u(t) containing a sequence of POIs with categories, timestamps, and coordinates, predict the next POI p_{k+1}.

Inputs: User trajectory tuples q = (user, POI, category, timestamp, GPS coordinates)

Outputs: Next POI p_{k+1}
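A minimal sketch of this input/output format, assuming one record per check-in. The `CheckIn` field names and the `split_trajectory` helper are hypothetical illustrations, not taken from the paper's code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CheckIn:
    """One trajectory tuple q = (user, POI, category, timestamp, GPS coordinates)."""
    user: int
    poi: int
    category: str
    timestamp: datetime
    lat: float
    lon: float


def split_trajectory(checkins: list[CheckIn]) -> tuple[list[CheckIn], int]:
    """Order a user's check-ins by time. The first k form the history T_u(t);
    the POI of the (k+1)-th check-in is the prediction target p_{k+1}."""
    ordered = sorted(checkins, key=lambda q: q.timestamp)
    if len(ordered) < 2:
        raise ValueError("need at least one history check-in and one target")
    return ordered[:-1], ordered[-1].poi
```

In practice each prefix of a long trajectory can yield one training sample; how GA-LLM slices trajectories is not stated in the provided text.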

Pipeline Flow

Coordinate Transformation (GPS → Quadkey)
Spatial Encoding (GCIM)
POI Alignment (PAM)
LLM Inference
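The first step (GPS → Quadkey) can be sketched with the standard Web Mercator tile scheme used by map tile systems: project the point, find its tile at a given zoom level, and interleave the tile's x/y bits into base-4 digits. The zoom level GA-LLM uses is not given in the provided text, so `level` here is a free parameter.

```python
import math


def gps_to_quadkey(lat: float, lon: float, level: int) -> str:
    """Map a GPS point to a base-4 quadkey string of length `level`
    using Web Mercator tiles (digit 0 = top-left, 3 = bottom-right)."""
    # Web Mercator is undefined at the poles, so clip latitude.
    lat = max(min(lat, 85.05112878), -85.05112878)
    sin_lat = math.sin(math.radians(lat))
    # Normalized Mercator coordinates in [0, 1]; y grows southwards.
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << level  # number of tiles per axis at this level
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    # Interleave the bits of tile_x and tile_y, most significant first.
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

Because digits are emitted from coarse to fine, a point's quadkey at a lower level is a prefix of its quadkey at a higher level, and nearby points share long prefixes. This prefix structure is what makes the representation hierarchical.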

System Modules

Geographic Coordinate Injection Module (GCIM) (Input Processing)

Transform continuous GPS coordinates into compact, hierarchical spatial representations compatible with the LLM

Model or implementation: Quadkey Grid + Fourier Encoding + Self-Attention
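The provided text does not give GCIM's exact layer design. The NumPy sketch below is one plausible reading under stated assumptions: the quadkey is split into overlapping n-grams, each n-gram gets a lookup embedding plus a Fourier encoding of its position (with `freq` standing in for the learnable Fourier matrix), one self-attention layer mixes them, and the mean-pooled result is projected to the LLM hidden size. All weights are random placeholders for trained parameters, and `gcim_embed`, `n`, `d`, and `llm_dim` are hypothetical names.

```python
import numpy as np


def quadkey_ngrams(quadkey: str, n: int) -> list[int]:
    """Overlapping n-grams of the quadkey, each read as a base-4 integer id."""
    return [int(quadkey[i:i + n], 4) for i in range(len(quadkey) - n + 1)]


def fourier_features(positions: np.ndarray, freq: np.ndarray) -> np.ndarray:
    """Fourier encoding [cos(p * w), sin(p * w)] for each position p."""
    angles = positions[:, None] * freq[None, :]
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gcim_embed(quadkey: str, n: int = 3, d: int = 32,
               llm_dim: int = 128, seed: int = 0) -> np.ndarray:
    """Encode one quadkey (len >= n) into a single llm_dim vector. d must be even."""
    rng = np.random.default_rng(seed)
    ids = np.array(quadkey_ngrams(quadkey, n))
    # Placeholder parameters; these would be trained in practice.
    ngram_table = rng.normal(0, 0.02, size=(4 ** n, d))
    freq = rng.normal(0, 1.0, size=(d // 2,))
    w_q, w_k, w_v = (rng.normal(0, d ** -0.5, size=(d, d)) for _ in range(3))
    w_out = rng.normal(0, d ** -0.5, size=(d, llm_dim))

    # n-gram embedding + Fourier encoding of the n-gram's position (coarse to fine).
    positions = np.arange(len(ids), dtype=float)
    x = ngram_table[ids] + fourier_features(positions, freq)

    # Single-head scaled dot-product self-attention over the n-gram sequence.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    h = softmax(q @ k.T / np.sqrt(d)) @ v

    # Pool to one spatial vector and project to the LLM hidden size.
    return h.mean(axis=0) @ w_out
```

Reading n-grams as base-4 integers gives a fixed vocabulary of 4^n entries, so every quadkey substring has an embedding without a learned tokenizer.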

POI Alignment Module (PAM) (Input Processing)

Integrate global POI transition knowledge into the LLM by projecting external graph embeddings

Model or implementation: Multi-Layer Perceptron (MLP)
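A minimal sketch of such a projection, assuming a two-layer MLP with GELU applied to a table of pre-trained POI embeddings (for example, exported from ROTAN or MTNet). The layer sizes, activation, and the `PoiAlignmentMLP` name are assumptions. Only the MLP would be trained; treating the graph embeddings as frozen is also an assumption.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))


class PoiAlignmentMLP:
    """Hypothetical PAM: maps graph-model POI embeddings into the LLM hidden space."""

    def __init__(self, graph_dim: int, hidden_dim: int, llm_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random placeholder weights; these would be learned during fine-tuning.
        self.w1 = rng.normal(0, graph_dim ** -0.5, size=(graph_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0, hidden_dim ** -0.5, size=(hidden_dim, llm_dim))
        self.b2 = np.zeros(llm_dim)

    def __call__(self, poi_emb: np.ndarray) -> np.ndarray:
        # poi_emb: (num_pois, graph_dim) -> (num_pois, llm_dim)
        return gelu(poi_emb @ self.w1 + self.b1) @ self.w2 + self.b2
```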

Large Language Model (LLM)

Process the sequence of semantic, spatial, and transition embeddings to predict the next POI

Model or implementation: Not explicitly named in the provided text

Novel Architectural Elements

Hierarchical Quadkey-based embedding strategy injected directly into LLM input sequence
Hybrid Fourier + n-gram attention mechanism for processing spatial strings (quadkeys)
Explicit projection module (PAM) to inject pre-trained graph dynamics into the LLM context
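Since GCIM and PAM both emit vectors in the LLM's embedding space, one common way to inject them into the input sequence is to overwrite the text embeddings at reserved placeholder positions in the prompt. The sketch below assumes hypothetical placeholder ids `GEO_TOKEN_ID` and `POI_TOKEN_ID`; the provided text does not say how GA-LLM marks these positions.

```python
import numpy as np

# Hypothetical reserved token ids marking where module outputs go.
GEO_TOKEN_ID = 50_001
POI_TOKEN_ID = 50_002


def splice_embeddings(token_ids: np.ndarray, token_emb: np.ndarray,
                      geo_vecs: np.ndarray, poi_vecs: np.ndarray) -> np.ndarray:
    """Replace placeholder rows of the text embeddings with GCIM / PAM outputs.

    token_ids: (seq_len,); token_emb: (seq_len, llm_dim);
    geo_vecs / poi_vecs: (num_placeholders, llm_dim), in prompt order.
    """
    out = token_emb.copy()  # leave the original embeddings untouched
    geo_pos = np.flatnonzero(token_ids == GEO_TOKEN_ID)
    poi_pos = np.flatnonzero(token_ids == POI_TOKEN_ID)
    if len(geo_pos) != len(geo_vecs) or len(poi_pos) != len(poi_vecs):
        raise ValueError("number of placeholders and vectors must match")
    out[geo_pos] = geo_vecs
    out[poi_pos] = poi_vecs
    return out
```

The spliced sequence is then fed to the LLM in place of the plain token embeddings, so spatial and transition signals sit next to the textual description of each check-in.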

Modeling

Base Model: Large Language Model (specific variant not reported in the provided text)

Training Method: Fine-tuning with specialized modules (GCIM and PAM)

Adaptation: Projection layers (an MLP for PAM, a learnable matrix for the Fourier encoding) and likely LLM fine-tuning

Key Hyperparameters:

Note: Specific hyperparameters (learning rate, batch size) not reported in the provided text

Comparison to Prior Work

vs. LLM4POI: GA-LLM adds explicit spatial encoding (Quadkeys) and graph-based transition alignment, whereas LLM4POI relies mostly on text prompts/retrieval
vs. ROTAN/MTNet: GA-LLM uses these as sources for transition embeddings (PAM) but processes the final recommendation via an LLM for better semantic reasoning
vs. GeoSAN: GA-LLM adapts the quadkey idea for LLM token space using Fourier encoding and self-attention, rather than just for a specialized RNN/Transformer

Limitations

Relies on external models (ROTAN, MTNet) to provide pre-trained POI embeddings for the PAM module
Converting GPS coordinates to quadkeys adds a preprocessing step
Effectiveness depends on the quality of the underlying graph embeddings used in PAM

Reproducibility

Code: https://anonymous.4open.science/r/GA-LLM-D2408

Source code is publicly available at the link above. The provided text does not specify the base LLM, exact training data splits, or hyperparameters.

📊 Experiments & Results

Evaluation Setup

Next POI recommendation based on historical user trajectories

Benchmarks:

Three public real-world datasets (Next POI Recommendation)

Metrics:

Error distance (in kilometers)
Statistical methodology: Not explicitly reported in the provided text
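The provided text reports error distance in kilometers but not its exact computation. A natural assumption is the great-circle (haversine) distance between the predicted and ground-truth POI coordinates, averaged over test cases; the function names and the mean-radius constant below are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two GPS points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_error_distance(preds, truths) -> float:
    """Average distance (km) between predicted and ground-truth (lat, lon) pairs."""
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return sum(dists) / len(dists)
```

For scale, one degree of latitude is about 111.2 km under this radius, so a hallucinated POI in another city shows up as a large error even when its category is plausible.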

Main Takeaways

GA-LLM addresses the 'hallucination' problem where text-only LLMs predict POIs that are semantically relevant but geographically distant.
The combination of GCIM (spatial) and PAM (transition) allows the model to handle cases where the ground truth POI is absent from the immediate context window (cold-start/exploration).
Qualitative analysis (Figure 1) suggests GA-LLM produces lower error distances compared to text-only baselines.

📚 Prerequisite Knowledge

Prerequisites

Understanding of tokenization issues with numerical data in LLMs
Basics of the Mercator projection and spatial indexing
Knowledge of sequential recommendation and graph embeddings

Key Terms

Quadkey: A spatial indexing system that maps 2D grid tiles to a hierarchical base-4 string, where string length corresponds to zoom level/granularity

Mercator projection: A map projection converting spherical GPS coordinates (latitude/longitude) into planar Cartesian coordinates (x, y)

Fourier positional encoding: A method using sinusoidal functions at different frequencies to represent positions, allowing the model to capture both fine-grained and global spatial patterns

n-gram: A contiguous sequence of n items from a given sample (here, overlapping substrings of the quadkey digit sequence)

Hallucination (Spatial): When a model generates a location that doesn't exist or is geographically impossible (e.g., recommending a restaurant in a different city)