Coloring Between the Lines: Personalization in the Null Space of Planning Constraints

📝 Paper Summary

Robot Personalization · Constraint-based Planning · Active Learning

CBTL enables robots to personalize safely by learning user-specific preferences as constraints that operate strictly within the valid solution space of pre-defined safety and competency rules.

Core Problem

Robots face a tension between safety and flexibility: over-constrained systems cannot adapt to user preferences, while under-constrained systems risk dangerous behavior.

Why it matters:

Generalist robots in homes or hospitals must adapt to unique user needs (e.g., dietary restrictions, mobility limits) without requiring expert reprogramming.
Existing personalization methods often sacrifice safety guarantees or require users to fully specify preferences upfront rather than learning continually over time.
Current approaches struggle to balance exploration (finding what the user likes) with the strict requirements of physical safety.

Concrete Example: In an assisted feeding scenario, a robot needs to know that a user loves tacos but hates cilantro, or has a limited range of motion. A standard robot might treat all solutions (e.g., any edible bite) as equal, potentially serving unwanted food or moving the arm in a way that startles the user.

Key Novelty

Coloring Between the Lines (CBTL)

Treats the set of all safe plans (the 'null space' of safety constraints) as a canvas for personalization, selecting only the safe options that also satisfy learned user preferences.
Uses active learning to deliberately select, among the safe plans, those whose outcome under the current preference model is most uncertain, rapidly narrowing down the personalized constraints without violating safety rules.

Architecture

Overview of the Coloring Between the Lines (CBTL) approach.

Evaluation Highlights

Web-based user study (N=60) shows participants significantly prefer CBTL choices over a non-personalized baseline (p < 0.005, Wilcoxon signed-rank test).
Demonstrates zero-shot generalization on a real robot: occlusion preferences learned during feeding were successfully applied to a new drinking task without additional training.
Consistently achieves more effective personalization with fewer interactions than baselines (Free Explore, Epsilon-Greedy) across three simulation environments (Cooking, Cleaning, Books).

Breakthrough Assessment

8/10

A strong conceptual advance unifying safety and personalization via constraint null spaces. Effectively combines TAMP, LLMs, and active learning for practical robotics.

⚙️ Technical Details

Problem Definition

Setting: Continual robot learning from a single trajectory of experience (no resets) with unknown transition/observation models

Inputs: History of observations and actions h_t = (o_0, a_1, ..., o_t)

Outputs: Action a_t derived from a solution-conditioned policy

Pipeline Flow

CSP Generator (Produce safe base CSP)
Personalized Constraint Generator (Add user-specific constraints)
Solver (Find solution in null space)
Policy Execution (Execute solution)
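The four stages above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the variables (`speed`, `distance`), the numeric bounds, the `max_comfortable_speed` parameter, and the rejection-sampling solver are all assumptions made for the example, and the policy-execution stage is left out.

```python
import random
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, float]
Constraint = Callable[[Assignment], bool]


def base_csp() -> List[Constraint]:
    """Hypothetical fixed safety/competency constraints (stage 1)."""
    return [
        lambda v: 0.0 <= v["speed"] <= 0.5,     # assumed speed cap
        lambda v: 0.1 <= v["distance"] <= 0.6,  # assumed reachable band
    ]


def personalized_constraints(max_comfortable_speed: float) -> List[Constraint]:
    """Hypothetical learned preference appended to the base CSP (stage 2)."""
    return [lambda v: v["speed"] <= max_comfortable_speed]


def solve(constraints: List[Constraint], rng: random.Random,
          max_samples: int = 10_000) -> Optional[Assignment]:
    """Rejection sampling: return the first sample that satisfies every constraint (stage 3)."""
    for _ in range(max_samples):
        v = {"speed": rng.uniform(0.0, 1.0), "distance": rng.uniform(0.0, 1.0)}
        if all(c(v) for c in constraints):
            return v
    return None  # the null space is empty or too small to hit by sampling


def plan(max_comfortable_speed: float, seed: int = 0) -> Optional[Assignment]:
    """Compose base and personalized constraints, then solve."""
    rng = random.Random(seed)
    constraints = base_csp() + personalized_constraints(max_comfortable_speed)
    return solve(constraints, rng)
```

Because the personalized constraints are only ever appended, any returned solution still satisfies every base constraint; an unsatisfiable preference (for example `plan(-1.0)`) yields `None` rather than an unsafe plan.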

System Modules

CSP Generator

Generates the fundamental Constraint Satisfaction Problem (CSP) enforcing safety and competency

Model or implementation: Domain-specific TAMP generator (e.g., PDDLStream-inspired)

Personalized Constraint Generator

Generates additional constraints reflecting user preferences to reduce the null space

Model or implementation: Ensemble of generators (Learned Classifiers or LLM-based)

Active Learning Solver

Finds a solution satisfying safety constraints that maximizes uncertainty about personalized parameters

Model or implementation: Sampling-based CSP Solver with Entropy Maximization

Novel Architectural Elements

Compositional constraint generation where safety constraints are fixed/hard-coded while personalization constraints are learned and appended dynamically
Null space exploitation: treating the set of valid TAMP solutions as the search space for personalization rather than optimizing a reward function directly

Modeling

Base Model: Large Language Model (specific variant not named in snippet) for natural language constraints; Standard classifiers for other constraints

Training Method: Online Active Learning (Entropy-based)

Objective Functions:

Purpose: Maximize information gain about user preferences during exploration.

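Formally: maximize H(P(C_p(v) = True)) subject to C(v) = True, where v is a candidate solution, C is the base safety/competency CSP, C_p is the personalized constraint, and H is entropy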
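A minimal sketch of this objective over a finite candidate set, assuming the preference model exposes a probability that the personalized constraint holds. The names `most_informative_safe`, `is_safe`, and `p_preferred` are hypothetical, and the summary does not say how the paper's solver searches continuous spaces.

```python
import math
from typing import Callable, Iterable, Optional, TypeVar

V = TypeVar("V")


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_informative_safe(candidates: Iterable[V],
                          is_safe: Callable[[V], bool],
                          p_preferred: Callable[[V], float]) -> Optional[V]:
    """Among candidates with C(v) = True, return the one maximizing H(P(C_p(v) = True))."""
    safe = [v for v in candidates if is_safe(v)]
    if not safe:
        return None
    return max(safe, key=lambda v: binary_entropy(p_preferred(v)))


# Illustrative, made-up belief over four candidate arm speeds.
belief = {0.1: 0.95, 0.3: 0.6, 0.45: 0.2, 0.8: 0.5}
choice = most_informative_safe(belief, lambda v: v <= 0.5, belief.get)
```

Here the candidate `0.8` has the highest entropy but fails the safety check, so the selection falls to `0.3`: safety is a hard filter applied before uncertainty is compared.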

Adaptation: Continual update of constraint parameters (theta) from interaction history

Training Data:

Datasets derived online from history h_t: (v, label) for classifiers, or (v, observation, text) for LLMs
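One way the (v, label) data could feed an ensemble that outputs P(C_p(v) = True) is sketched below. It assumes a scalar v and a one-sided threshold preference ("values up to t are acceptable"), with ensemble members fit on bootstrap resamples; these are simplifications for illustration, not the classifiers used in the paper, and the LLM-based path is not covered.

```python
import random
from typing import List, Tuple

Example = Tuple[float, bool]  # (v, label): scalar plan parameter, did the user approve?


def fit_threshold(data: List[Example]) -> float:
    """Return the cut t that minimizes errors of the rule 'v <= t is preferred'."""
    values = sorted({v for v, _ in data})
    midpoints = [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    cuts = [float("-inf")] + midpoints + [float("inf")]

    def errors(t: float) -> int:
        return sum((v <= t) != label for v, label in data)

    return min(cuts, key=errors)


def fit_ensemble(data: List[Example], n_members: int = 25, seed: int = 0) -> List[float]:
    """Fit one threshold per bootstrap resample of the interaction history."""
    rng = random.Random(seed)
    return [fit_threshold([rng.choice(data) for _ in data]) for _ in range(n_members)]


def p_preferred(ensemble: List[float], v: float) -> float:
    """Fraction of ensemble members that consider v acceptable."""
    return sum(v <= t for t in ensemble) / len(ensemble)
```

Disagreement between members is what makes the estimate informative: values of v on which the resampled thresholds split receive probabilities away from 0 and 1, and those are the plans an entropy-based selector would try next.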

Compute: Not reported in the paper

Comparison to Prior Work

vs. Thumm et al. & Wang et al.: CBTL learns online over time and generalizes between tasks, whereas prior work requires full preference specification at the start of every task
vs. Reinforcement Learning (RL): CBTL does not require a reward function and uses constraints for safety guarantees, avoiding the 'unsafe exploration' problem typical of RL
vs. Standard TAMP: Uses the null space of solutions for personalization rather than just picking the first valid plan

Limitations

Relies on the existence of a non-empty null space; if safety constraints are too tight, personalization is impossible
Requires domain-specific CSP generators to be implemented by engineers upfront
Assumes the user provides accurate feedback or natural language signals for the learning step

Reproducibility

Code: https://emprise.cs.cornell.edu/cbtl/

Code is publicly available at https://emprise.cs.cornell.edu/cbtl/. Simulation environments (Cooking, Cleaning, Books) and web study interface are described. Specific LLM prompt templates are referenced in Appendix B (not in snippet).

📊 Experiments & Results

Evaluation Setup

Comparison of personalization efficiency and effectiveness in simulation and real-world studies

Benchmarks:

Cooking Simulation (2D stove-top meal preparation) [New]
Cleaning Simulation (Robotic arm dusting surfaces) [New]
Books Simulation (Fetching and handing over books) [New]
Assisted Feeding Study (Web-based user preference study & Real robot demo) [New]

Metrics:

User satisfaction (Likert scale)
Prediction accuracy (categorical choices)
Statistical methodology: Wilcoxon signed-rank test for user study results
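For readers unfamiliar with the test, the sketch below computes a Wilcoxon signed-rank statistic for paired ratings in plain Python. It uses the normal approximation without tie or continuity correction, which is an assumption of this example; the summary does not state which variant the authors used, and in practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice. The ratings in the usage lines are made up for illustration and are not data from the study.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (W, approximate two-sided p) for paired samples x and y.

    W is the smaller of the positive and negative rank sums. Zero differences
    are dropped and tied absolute differences share their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of the standard normal
    return w, p


# Hypothetical 5-point Likert ratings from five participants (not study data).
personalized = [5, 4, 5, 3, 4]
baseline = [4, 3, 2, 3, 5]
w_stat, p_value = wilcoxon_signed_rank(personalized, baseline)
```

The test is a natural fit for within-subject Likert comparisons because it uses only the signs and ranks of the paired differences and does not assume the ratings are normally distributed.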

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Assisted Feeding (Web Study)	p-value (Wilcoxon signed-rank)	Not reported in the paper	<0.005	Significant preference

Experiment Figures

Web-based user study results comparing CBTL to baselines.

Real-robot generalization demonstration.

Main Takeaways

CBTL consistently outperforms baselines (No Personalization, Free Explore, Exploit Only, Epsilon-Greedy) across three diverse simulation domains, avoiding overfitting while exploring efficiently.
Active learning (entropy maximization) is critical; naive exploration (Free Explore) wastes time, while purely greedy methods (Exploit Only) overfit to initial successes (e.g., reusing the first successful book handover pose forever).
The method generalizes real-world constraints: occlusion preferences learned during a 'feeding' task were successfully transferred to a 'drinking' task without additional training, demonstrating compositional generalization.

📚 Prerequisite Knowledge

Prerequisites

Constraint Satisfaction Problems (CSPs)
Task and Motion Planning (TAMP)
Active Learning (Entropy maximization)

Key Terms

CSP: Constraint Satisfaction Problem—a mathematical set of variables and rules (constraints) where the goal is to find values for variables that satisfy all rules

Null Space: In this paper, the set of all valid solutions to a CSP; effectively the 'wiggle room' where the robot can choose different actions that are all considered safe

CBTL: Coloring Between the Lines—the proposed method for learning personalized constraints within the safe null space

TAMP: Task and Motion Planning—algorithms that combine high-level logic (what to do) with low-level geometry (how to move)

SE(2)/SE(3): Special Euclidean groups representing rigid body transformations (position and rotation) in 2D and 3D space

LLM: Large Language Model—used here to generate natural language constraints based on user feedback

Active Learning: A machine learning approach where the model actively chooses which data points to learn from (here, which plans to execute) to learn as fast as possible