Personalized Autonomous Driving with Large Language Models: Field Experiments

📝 Paper Summary

Personalized autonomous driving Episodic memory in agents Linear memory organization

Talk2Drive integrates LLMs and a memory module into real vehicles to translate verbal commands into executable control codes, adapting to driver preferences by learning from historical feedback.

Core Problem

Traditional autonomous driving systems fail to understand abstract verbal commands (e.g., 'I'm in a hurry') and lack mechanisms to personalize driving behaviors based on historical driver preferences.

Why it matters:

Standard systems rely on rigid, numerical configurations that cannot adapt to the nuanced, context-dependent needs of different human drivers.
Current personalization frameworks often struggle with unseen scenarios and lack the semantic understanding to interpret indirect speech acts (hints) effectively.
Most LLM-based driving research is confined to simulation, leaving a gap in understanding how these models perform with real-world vehicle dynamics and safety constraints.

Concrete Example: A driver saying 'I am really in a hurry now' requires the car to increase speed and aggressiveness. A traditional system ignores this abstract hint. Talk2Drive interprets the urgency, checks safety limits, and generates code to increase target velocity.

Key Novelty

Talk2Drive Framework with Verbal-to-Code Memory

Translates spoken commands into 'Language Model Programs' (executable parameter adjustments) rather than just selecting pre-defined discrete actions.
Implements a memory module that stores interaction triples (Command, Policy, Feedback) to refine future generations based on past user satisfaction.
Demonstrates the first known end-to-end implementation of an LLM-based personalization system controlling a real-world full-scale autonomous vehicle.

Architecture

The flowchart of the Talk2Drive system processing a command from speech to execution.

Evaluation Highlights

Reduces driver takeover rate by 75.9% in real-world scenarios compared to the baseline autonomous system, indicating significantly higher trust.
The memory module further reduces the takeover rate by up to 65.2% compared to the LLM system without memory, demonstrating effective personalization.
Successfully handles variable command directness, from explicit instructions ('drive faster') to non-conventional hints ('I hope we're not late'), in highway and parking settings.

Breakthrough Assessment

8/10

High score for being the first field deployment of LLMs for AD personalization on a real vehicle. The results in takeover reduction are significant, though the scope is limited to specific scenarios.

⚙️ Technical Details

Problem Definition

Setting: Real-world autonomous driving where an agent maps verbal commands and context to executable control parameters

Inputs: Verbal command sequence I, Contextual data C (weather, traffic, rules), System messages S, Historical interactions H

Outputs: Language Model Programs (P) containing executable ROS (Robot Operating System) codes

Pipeline Flow

Input Processing: Voice → Text (Whisper) + Context Generation
Reasoning & Generation: LLM (GPT-4) + Memory Retrieval → Code Generation
Execution: Safety Check → ECU execution → Vehicle Actuation

System Modules

Voice Recognition (Input Processing)

Convert raw audio of human commands into text

Model or implementation: Whisper API

Context Generator (Input Processing)

Convert numerical sensor data into descriptive text prompts

Model or implementation: Predefined Structured Language Generator

Policy Generator (Reasoning & Generation)

Interpret commands and generate executable control code

Model or implementation: GPT-4 (via ChatGPT API)

Memory Module (Reasoning & Generation)

Store and retrieve interaction history for personalization

Model or implementation: Text-based Log/Storage

Safety & Execution

Validate and execute generated code on the vehicle

Model or implementation: Rule-based Checker + Vehicle ECU

Novel Architectural Elements

Integration of a text-based Memory Module directly into the prompt chain for real-time control adaptation
Cloud-to-Vehicle closed loop where LLM generates ROS-compatible code (LMPs) executed by a local drive-by-wire system

Modeling

Base Model: GPT-4 (accessed via ChatGPT API)

Training Method: In-context learning with Chain-of-Thought prompting

Compute: Not reported in the paper (Cloud-based inference, Latency reported as metric)

Comparison to Prior Work

vs. GPT-Driver: Talk2Drive is deployed on a real vehicle (Lexus RX450h), whereas GPT-Driver is simulation-focused.
vs. DiLu: Talk2Drive incorporates a specific personalization memory module to adapt to driver preferences over time, which DiLu lacks.
vs. Traditional AD: Talk2Drive handles abstract/indirect verbal commands via LLM reasoning, whereas traditional systems require explicit numerical inputs.

Limitations

Reliance on cloud-based LLM APIs introduces latency concerns for time-critical driving maneuvers.
Field experiments are limited to specific scenarios (highway, intersection, parking) and may not cover complex edge cases.
Safety checks are rule-based and might not catch all semantically dangerous but syntactically correct LMPs.

Reproducibility

Code not provided. Experiment video available at https://www.youtube.com/watch?v=4BWsfPaq1Ro. Uses commercial APIs (GPT-4, TomTom, OpenWeather, OpenStreetMap). Specific prompt templates are alluded to but full text not released.

📊 Experiments & Results

Evaluation Setup

Real-world field experiments on a Lexus RX450h autonomous vehicle

Benchmarks:

Highway Scenario (Real-world driving) [New]
Intersection Scenario (Real-world driving) [New]
Parking Scenario (Real-world driving) [New]

Metrics:

Takeover Rate (R)
Time to Collision (tau)
Speed Variance
Mean Absolute Acceleration
Mean Absolute Jerk
LLM Latency
Statistical methodology: Not explicitly reported in the paper

Experiment Figures

Visualization of the real-world experiments in highway, intersection, and parking scenarios.

Main Takeaways

The integration of the Talk2Drive framework reduced the driver takeover rate by 75.9% overall compared to the baseline, indicating improved trust and capability.
The Memory Module is critical for personalization: adding it reduced the takeover rate by an additional 65.2% compared to the Talk2Drive system without memory.
The system effectively handles commands at varying levels of directness, from explicit instructions to mild hints, translating them into safe vehicle behavior.
Safety checks successfully prevented the execution of dangerous LMPs (e.g., speeding limits), ensuring the system remains within regulatory bounds.

📚 Prerequisite Knowledge

Prerequisites

Autonomous driving software stacks (Autoware)
Large Language Models (In-context learning)
Control theory basics (Pure pursuit)

Key Terms

LMPs: Language Model Programs—executable code snippets generated by the LLM to adjust vehicle control parameters (e.g., speed, look-ahead distance)

Takeover Rate: The frequency with which a human driver must manually intervene and disengage the autonomous mode due to safety or comfort concerns

ROS: Robot Operating System—a middleware framework used for robot software development, handling communication between vehicle sensors and actuators

Pure Pursuit: A path tracking algorithm that calculates the steering angle to move the vehicle towards a look-ahead point on the reference path

In-context Learning: A method where the LLM performs a task based on examples and instructions provided in the prompt without updating its weights

Chain-of-thought: A prompting strategy that encourages the LLM to generate intermediate reasoning steps before producing the final answer

NDT: Normal Distributions Transform—a scan matching algorithm used for vehicle localization and mapping