ItiNera: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning

📝 Paper Summary

LLM-based recommendation | Agentic AI (Planning)

ItiNera integrates Large Language Models for intent understanding with mathematical spatial optimization to generate personalized, feasible urban travel itineraries without hallucinations or inefficient routing.

Core Problem

Existing itinerary planners are either purely optimization-based (lacking personalization and flexibility) or purely LLM-based (prone to POI hallucinations and spatially incoherent routes).

Why it matters:

Pure LLMs (like GPT-4) cannot reliably restrict themselves to a specific POI list, which leads to recommendations of non-existent destinations
LLMs lack spatial reasoning capabilities, often generating 'zig-zag' routes that waste travel time
Traditional operations research methods (like TSP solvers) cannot understand nuanced natural language preferences (e.g., 'a quiet cafe with a retro vibe')

Concrete Example: A user asks for a 'citywalk including art spots and coffee.' An LLM might suggest a museum that is closed or route the user back and forth across the city inefficiently. ItiNera instead retrieves valid, open spots and orders them using spatial clustering.

Key Novelty

LLM-Solver-LLM Sandwich Architecture

Decomposes the problem: Uses LLMs to understand *what* to visit (intent decomposition) and *how* to describe it (final generation), but delegates *where* and *when* to a mathematical solver
Introduces a 'Hierarchical TSP' module in the loop: Clusters retrieved points of interest spatially and solves the Traveling Salesman Problem to ensure the route is physically logical before the LLM writes the final narrative

Architecture

The complete inference pipeline of ItiNera.

Evaluation Highlights

Achieves ~30% improvement in rule-based metrics (Recall, Spatial Margin) over the best baselines (including GPT-4 CoT)
Maintains spatial coherence by generating itineraries only ~100 meters longer per POI than the theoretical shortest path (TSP)
Outperforms GPT-4 CoT on 'Match' (alignment with user request) in human-aligned LLM evaluations

Breakthrough Assessment

7/10

Strong engineering integration of symbolic AI (optimization) and connectionist AI (LLMs) for a practical application. Solves the specific 'spatial hallucination' problem of LLMs effectively.

⚙️ Technical Details

Problem Definition

Setting: Open-domain Urban Itinerary Planning (OUIP)

Inputs: User request r in natural language and a user-owned POI database P

Outputs: A coherent travel itinerary I (ordered list of POIs) maximizing alignment with r and spatial efficiency

Pipeline Flow

Request Decomposition (LLM)
Preference-aware POI Retrieval (Vector Search)
Cluster-aware Spatial Optimization (Math Solver)
Itinerary Generation (LLM)
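The four stages above form the LLM-Solver-LLM sandwich. A minimal sketch of that control flow, assuming each stage is passed in as a plain function; the names `plan_itinerary`, `POI`, and the stub lambdas are hypothetical and not taken from the ItiNera repository:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class POI:
    name: str
    lat: float
    lon: float


def plan_itinerary(
    request: str,
    decompose: Callable[[str], List[str]],         # 1. LLM: request -> sub-requests
    retrieve: Callable[[List[str]], List[POI]],    # 2. embeddings + vector DB -> candidate POIs
    optimize: Callable[[List[POI]], List[POI]],    # 3. spatial solver -> ordered POIs
    generate: Callable[[str, List[POI]], str],     # 4. LLM -> natural-language itinerary
) -> str:
    sub_requests = decompose(request)
    candidates = retrieve(sub_requests)
    ordered = optimize(candidates)
    # The final LLM only narrates the solver's order; it never picks or reorders POIs.
    return generate(request, ordered)


# Usage with stub stages (a real system would call an LLM, a vector DB, and a solver).
pois = [POI("Cafe A", 31.23, 121.47), POI("Gallery B", 31.22, 121.46)]
result = plan_itinerary(
    "citywalk with art spots and coffee",
    decompose=lambda r: ["art spots", "coffee"],
    retrieve=lambda subs: pois,
    optimize=lambda ps: sorted(ps, key=lambda p: p.lat),  # toy stand-in for the solver
    generate=lambda r, ps: " -> ".join(p.name for p in ps),
)
print(result)  # Gallery B -> Cafe A
```

Passing the solver in as its own callable makes the central design choice explicit: the generation LLM receives an already-ordered list and only describes it.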

System Modules

Request Decomposition (RD)

Break user request into atomic sub-requests classified by granularity, specificity, and attitude (pos/neg)

Model or implementation: LLM (e.g., GPT-3.5/4)
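The decomposition output schema is not given here. The sketch below assumes the LLM is prompted to return a JSON list, and the field names and label values (`granularity`, `specificity`, `attitude`) are hypothetical. It shows how the attitude label splits sub-requests into positive and negative queries for the retrieval step:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SubRequest:
    text: str          # atomic preference, e.g. "quiet cafe"
    granularity: str   # hypothetical labels, e.g. "poi" or "area"
    specificity: str   # hypothetical labels, e.g. "specific" or "vague"
    positive: bool     # attitude: True = wanted, False = to avoid


def parse_sub_requests(llm_output: str) -> List[SubRequest]:
    """Parse the JSON list that a (hypothetical) decomposition prompt asks the LLM to emit."""
    return [
        SubRequest(
            text=item["text"],
            granularity=item["granularity"],
            specificity=item["specificity"],
            positive=item["attitude"] == "pos",
        )
        for item in json.loads(llm_output)
    ]


# What an LLM might return for "a citywalk with art spots and coffee, but no crowded malls"
raw = """[
  {"text": "art spots", "granularity": "poi", "specificity": "vague", "attitude": "pos"},
  {"text": "coffee", "granularity": "poi", "specificity": "vague", "attitude": "pos"},
  {"text": "crowded malls", "granularity": "poi", "specificity": "vague", "attitude": "neg"}
]"""
subs = parse_sub_requests(raw)
positives = [s.text for s in subs if s.positive]
negatives = [s.text for s in subs if not s.positive]
print(positives, negatives)  # ['art spots', 'coffee'] ['crowded malls']
```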

Preference-aware POI Retrieval (PPR)

Retrieve candidate POIs matching positive sub-requests while avoiding negative ones

Model or implementation: Embedding Model + Vector DB

Cluster-aware Spatial Optimization (CSO)

Select final POI subset and determine optimal visiting order

Model or implementation: Hierarchical TSP Solver (Algorithm)
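A minimal two-level sketch of the ordering half of CSO, assuming planar coordinates in meters, plain k-means clustering, and brute-force search (adequate only for a handful of POIs per cluster). These are stand-in choices; the paper's exact clustering method, solver, and POI-subset selection are not reproduced here:

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in meters: a simplifying assumption


def centroid(points: Sequence[Point], idxs: Sequence[int]) -> Point:
    return (sum(points[i][0] for i in idxs) / len(idxs),
            sum(points[i][1] for i in idxs) / len(idxs))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[List[int]]:
    """Plain Lloyd's k-means; returns non-empty clusters as lists of point indices."""
    centers = [points[i] for i in range(k)]  # naive deterministic init: first k points
    clusters: List[List[int]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(i)
        centers = [centroid(points, cl) if cl else centers[c] for c, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def best_open_path(points: Sequence[Point], idxs: Sequence[int],
                   start_from: Optional[Point] = None) -> List[int]:
    """Brute-force shortest open path through idxs, optionally entered from start_from."""
    best: List[int] = []
    best_len = math.inf
    for perm in itertools.permutations(idxs):
        length = sum(math.dist(points[a], points[b]) for a, b in zip(perm, perm[1:]))
        if start_from is not None:
            length += math.dist(start_from, points[perm[0]])
        if length < best_len:
            best, best_len = list(perm), length
    return best


def hierarchical_route(points: Sequence[Point], k: int) -> List[int]:
    clusters = kmeans(points, k)
    centers = [centroid(points, cl) for cl in clusters]
    # Level 1: order clusters by a small open TSP over their centroids.
    cluster_order = best_open_path(centers, range(len(clusters)))
    # Level 2: route inside each cluster, entering from the previous cluster's last POI.
    route: List[int] = []
    for c in cluster_order:
        prev = points[route[-1]] if route else None
        route += best_open_path(points, clusters[c], start_from=prev)
    return route


# Two neighborhoods about 1 km apart; POIs deliberately listed in interleaved order.
pois = [(0, 0), (1000, 0), (10, 5), (1005, 10), (5, 10), (995, 5)]
route = hierarchical_route(pois, k=2)
print(route)
```

The route visits all three western POIs before crossing to the eastern group. Fixing the cluster order first keeps each brute-force search small, and finishing one neighborhood before moving to the next is what rules out back-and-forth routes.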

Itinerary Generation (IG)

Generate the final natural language itinerary respecting the optimized order and time constraints

Model or implementation: GPT-4
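One way to keep the generation LLM from reordering the solver's route is to state the order explicitly in the prompt and check the draft afterwards. The prompt wording and the `follows_order` check below are hypothetical illustrations, not the paper's prompt, and no API call is made:

```python
from typing import List


def build_generation_prompt(request: str, ordered_pois: List[str], start: str = "09:00") -> str:
    """Hypothetical Itinerary Generation prompt: narrate a fixed route, do not change it."""
    stops = "\n".join(f"{i + 1}. {name}" for i, name in enumerate(ordered_pois))
    return (
        f"User request: {request}\n"
        f"Start time: {start}\n"
        f"Visit these places in exactly this order:\n{stops}\n"
        "Write a friendly itinerary. Do not add, remove, or reorder places."
    )


def follows_order(text: str, ordered_pois: List[str]) -> bool:
    """Cheap post-check: every POI is mentioned and first mentions follow the route order."""
    positions = [text.find(name) for name in ordered_pois]
    return all(p >= 0 for p in positions) and positions == sorted(positions)


route = ["Gallery B", "Cafe A"]
prompt = build_generation_prompt("citywalk with art spots and coffee", route)
print(prompt)
draft = "Start at Gallery B for the morning exhibits, then walk to Cafe A for coffee."
print(follows_order(draft, route))  # True
```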

Novel Architectural Elements

Integration of a deterministic Hierarchical TSP solver strictly *between* the retrieval and generation phases to force spatial coherence
Dual-embedding retrieval mechanism that explicitly reranks based on the delta between Positive and Negative preference vectors
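The rerank rule can be sketched as scoring each candidate by its similarity to the positive preference vector minus its similarity to the negative one. This assumes cosine similarity and a hypothetical `neg_weight` factor; the toy three-dimensional vectors (quiet, coffee, crowded) stand in for real embedding-model output:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(poi_vecs: Dict[str, Sequence[float]], pos_vec: Sequence[float],
           neg_vec: Sequence[float], neg_weight: float = 1.0, top_k: int = 3) -> List[str]:
    """Score = sim(POI, positive) - neg_weight * sim(POI, negative); highest first."""
    scores = {name: cosine(v, pos_vec) - neg_weight * cosine(v, neg_vec)
              for name, v in poi_vecs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:top_k]


# Toy 3-d "embeddings" along (quiet, coffee, crowded); real vectors would come
# from an embedding model and the candidates from a vector-DB search.
pois = {
    "Retro Cafe": [0.9, 0.8, 0.1],
    "Mall Food Court": [0.1, 0.6, 0.9],
    "Library": [0.9, 0.0, 0.1],
}
positive = [1.0, 1.0, 0.0]   # "a quiet cafe"
negative = [0.0, 0.0, 1.0]   # "no crowded places"
ranking = rerank(pois, positive, negative)
print(ranking)  # ['Retro Cafe', 'Library', 'Mall Food Court']
```

The food court drops to last even though it partly matches "coffee", because its strong similarity to the negative vector outweighs that match.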

Modeling

Base Model: GPT-4 (for final generation), GPT-3.5 (for intermediate steps)

Training Method: Prompt Engineering + Classical Optimization

Compute: Not reported in the paper (Inference-only system using APIs)

Comparison to Prior Work

vs. TravelPlanner: ItiNera focuses on fine-grained single-day urban routing with specific spatial optimization, whereas TravelPlanner is a broad multi-day benchmark
vs. IP: ItiNera handles open-domain natural language requests and dynamic POIs, whereas IP requires structured inputs and fixed utilities
vs. Ernie-Bot 4.0: ItiNera uses external tools (retrieval + solver) to fix hallucinations and routing, whereas Ernie is a standalone LLM

Limitations

Relies on the quality and freshness of the underlying POI database scraped from social media
Current implementation focuses on single-day planning; multi-day extension is possible but not demonstrated
Dependent on commercial LLM APIs (GPT-4) for high-quality final generation

Reproducibility

Code: https://github.com/YihongT/ITINERA

Source code is available at the repository above. The paper mentions a dataset of 1,233 itineraries and 7,578 POIs collected from the Chinese social media platform Little Red Book (Xiaohongshu). The system relies on the OpenAI API (GPT-3.5/4).

📊 Experiments & Results

Evaluation Setup

Comparison against baselines on a collected dataset of 1233 real-world urban itineraries from 4 Chinese cities.

Benchmarks:

Real-world Urban Itinerary Dataset (Itinerary Generation) [New]

Metrics:

Recall Rate (RR)
Average Margin (AM - spatial deviation from TSP)
Overlaps (OL - route intersections)
Fail Rate (FR - hallucinated POIs)
LLM-evaluated Match/Quality
Statistical methodology: Not explicitly reported in the paper
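The exact metric formulas are not reproduced in this summary. The functions below are plausible definitions under stated assumptions: AM as (route length minus shortest open path) divided by the number of POIs, matching the "meters longer per POI" reading; OL as the number of properly crossing, non-adjacent route segments; RR as the fraction of reference POIs recovered; and FR as the fraction of generated POIs missing from the database. Coordinates are assumed planar, in meters:

```python
import itertools
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]  # planar coordinates in meters (assumption)


def route_len(pts: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def average_margin(pts: Sequence[Point]) -> float:
    """AM sketch: (route length - shortest open path) / #POIs. Brute force, small routes only."""
    shortest = min(route_len(p) for p in itertools.permutations(pts))
    return (route_len(pts) - shortest) / len(pts)


def _cross(a: Point, b: Point, c: Point) -> float:
    """Sign tells which side of the line a->b the point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def overlaps(pts: Sequence[Point]) -> int:
    """OL sketch: count pairs of non-adjacent route segments that properly cross."""
    segs = list(zip(pts, pts[1:]))
    count = 0
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):  # skip adjacent segments sharing an endpoint
            (p1, p2), (q1, q2) = segs[i], segs[j]
            if (_cross(p1, p2, q1) * _cross(p1, p2, q2) < 0
                    and _cross(q1, q2, p1) * _cross(q1, q2, p2) < 0):
                count += 1
    return count


def recall_rate(generated: List[str], reference: List[str]) -> float:
    """RR sketch: share of reference POIs that appear in the generated itinerary."""
    ref = set(reference)
    return len(ref & set(generated)) / len(ref)


def fail_rate(generated: List[str], database: Set[str]) -> float:
    """FR sketch: share of generated POIs that do not exist in the POI database."""
    return sum(name not in database for name in generated) / len(generated)


zigzag = [(0, 0), (100, 100), (100, 0), (0, 100)]  # crosses itself once
print(overlaps(zigzag), round(average_margin(zigzag), 2))  # 1 20.71
print(recall_rate(["A", "B", "X"], ["A", "B", "C", "D"]))  # 0.5
print(fail_rate(["A", "B", "X"], {"A", "B", "C", "D"}))    # 0.3333333333333333
```

For the zig-zag route, the shortest open path around three sides of the square is 300 m, while the crossing route is about 382.8 m, giving a margin of about 20.7 m per POI.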

Key Results

ItiNera demonstrates superior performance in both rule-based metrics (spatial accuracy, recall) and LLM-based metrics (quality, match) compared to purely LLM-based or purely optimization-based baselines.

Benchmark	Metric	Baseline	This Paper	Δ
Real-world Dataset	Rule-based metrics (Recall, etc.)	Best baseline	~30% better	~+30% (relative)
Real-world Dataset	Average Margin vs. TSP shortest path (per POI, lower is better)	TSP optimum (0 m)	~100 m	~+100 m per POI
Real-world Dataset	Match (LLM-eval)	Lower (qualitative)	Higher (qualitative)	Positive

Main Takeaways

Integrating a mathematical solver (TSP) into the pipeline prevents the 'spaghetti routing' problem common in pure LLM planners.
The 'User-owned POI Database' approach effectively eliminates hallucinated venues (low Fail Rate), whereas pure LLMs frequently invent them.
Decomposing user requests into positive/negative embedding queries significantly improves the alignment (Match) of the retrieved POIs with user intent.
Ablation studies show that removing the Cluster-aware Spatial Optimization (CSO) module forces the LLM to do routing, which degrades spatial coherence metrics.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Models (LLMs) and RAG
Basic knowledge of Combinatorial Optimization (TSP)
Familiarity with Embedding-based Retrieval

Key Terms

OUIP: Open-domain Urban Itinerary Planning—generating personalized travel plans from natural language requests

POI: Point of Interest—a specific location (restaurant, park, museum) in the database

TSP: Traveling Salesman Problem—an optimization problem to find the shortest possible route visiting a set of locations exactly once

Citywalk: A form of urban tourism focusing on wandering streets and immersing in local culture rather than just visiting famous landmarks

Hierarchical TSP: A spatial optimization approach that first clusters points into groups and then optimizes the route within and between clusters

CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps

RAG: Retrieval-Augmented Generation—providing external data to an LLM to improve accuracy