Test-Time Personalization with Meta Prompt for Gaze Estimation

📝 Paper Summary

Unsupervised Domain Adaptation, Source-Free Domain Adaptation, Test-Time Adaptation

TPGaze enables fast, efficient test-time personalization of gaze estimation by updating only a small set of prompt parameters, whose initialization is meta-learned so that minimizing an unsupervised loss also improves gaze accuracy.

Core Problem

Existing personalized gaze estimation methods require labels or calibration, while source-free unsupervised domain adaptation methods are too computationally expensive for edge devices and lack guarantees that minimizing unsupervised losses reduces gaze error.

Why it matters:

Personalizing gaze estimation is crucial for user experience on portable devices, but collecting labeled data is impractical for end users
Full-model fine-tuning on edge devices is computationally prohibitive and prone to overfitting on limited personal data
Without labels, minimizing proxy losses (like symmetry) does not automatically guarantee improved gaze estimation accuracy

Concrete Example: A standard ResNet-18 gaze model trained on public data suffers performance degradation on a new user due to appearance shifts. Fine-tuning the whole model on the user's unlabeled face images is slow and risky. Simply minimizing a symmetry loss might collapse to trivial solutions that satisfy symmetry but fail to estimate gaze direction correctly.

Key Novelty

Test-time Personalized Gaze estimation (TPGaze) with Meta-Learned Prompts

Treats convolutional padding as a learnable 'prompt' parameter, freezing the backbone to reduce tunable parameters to <1% of the model
Uses meta-learning to find an optimal prompt initialization such that subsequent test-time updates using an unsupervised proxy loss (symmetry) reliably lead to lower gaze estimation error

Architecture

Illustration of the Prompt mechanism using tunable padding in convolutional layers.

Evaluation Highlights

Adapts 10x faster than standard domain adaptation baselines
Outperforms state-of-the-art unsupervised source-free domain adaptation methods (like RUDA and CRGA) on cross-dataset benchmarks
Tunes less than 1% of the parameters of a ResNet-18 model, whereas full fine-tuning updates all of them

Breakthrough Assessment

7/10

Novel application of prompt tuning (via padding) to gaze estimation and a clever meta-learning formulation to bridge unsupervised losses and supervised goals. Significant efficiency gains.

⚙️ Technical Details

Problem Definition

Setting: Source-free test-time personalization, i.e., adapting a pre-trained model f_theta to a specific target person j using only that person's unlabeled images A_j

Inputs: Unlabeled face images x_i from the target person

Outputs: Personalized gaze direction predictions y_i

Pipeline Flow

Input Image -> Modified ResNet-18 Backbone (Frozen Weights + Tunable Padding Prompts) -> Gaze Prediction

System Modules

Prompt (Tunable Padding)

Modifies feature maps at boundaries to guide the frozen backbone towards personalized features

Model or implementation: Learnable tensors replacing standard padding
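
As a rough illustration of the tunable-padding idea, the sketch below pads a single 2D feature map with a learnable border instead of zeros. It is framework-free Python written for this summary, not the authors' implementation; the function names, the border layout (`top`, `bottom`, `left`, `right`), and the padding width of 1 are all assumptions.

```python
def pad_with_prompt(feature, prompt):
    """Pad an H x W feature map by one pixel using learnable border values.

    `prompt` is a hypothetical container: 'top' and 'bottom' hold W + 2 values
    each, 'left' and 'right' hold H values each. Zero-padding is the special
    case where every prompt value is 0.
    """
    h, w = len(feature), len(feature[0])
    assert len(prompt["top"]) == w + 2 and len(prompt["bottom"]) == w + 2
    assert len(prompt["left"]) == h and len(prompt["right"]) == h

    padded = [list(prompt["top"])]
    for r in range(h):
        padded.append([prompt["left"][r]] + list(feature[r]) + [prompt["right"][r]])
    padded.append(list(prompt["bottom"]))
    return padded


def num_prompt_values(h, w):
    """Number of learnable values in a one-pixel border around an H x W map."""
    return (h + 2) * (w + 2) - h * w


feature = [[1.0, 2.0], [3.0, 4.0]]
prompt = {
    "top": [0.1, 0.1, 0.1, 0.1],
    "bottom": [0.2, 0.2, 0.2, 0.2],
    "left": [0.3, 0.3],
    "right": [0.4, 0.4],
}
padded = pad_with_prompt(feature, prompt)
```

In a real network the border would presumably be a trainable tensor per padded convolutional layer, updated by gradient descent while the convolution weights stay frozen. The border grows with the perimeter of the feature map rather than its area, which helps explain why the prompt is small relative to the backbone.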

Backbone

Extracts gaze features from images

Model or implementation: ResNet-18 (Frozen)

Novel Architectural Elements

Replacement of static zero-padding in convolutional layers with learnable prompt parameters for gaze estimation
Meta-initialization scheme specifically designed to align unsupervised symmetry loss gradients with supervised gaze error reduction

Modeling

Base Model: ResNet-18

Training Method: Meta-learning (bi-level optimization) for prompt initialization; Test-time optimization via gradient descent

Objective Functions:

Purpose: Pre-training supervision.
Formally: Standard L1 or MSE loss on source gaze labels

Purpose: Test-time adaptation (and inner loop of meta-learning).
Formally: Symmetry loss L_per (enforcing consistent predictions for original and flipped images)

Purpose: Meta-learning objective (outer loop).
Formally: Minimize supervised gaze error on source data after updating prompt via symmetry loss
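
The inner and outer objectives can be written compactly as a bi-level update. The notation below is a sketch reconstructed from the descriptions above, not copied from the paper: p denotes the prompt, theta the frozen backbone weights, L_gaze the supervised gaze loss on labeled source data (x, y), L_per the symmetry loss, and lambda_1, lambda_2 the inner and outer learning rates. A single inner step is assumed for simplicity.

```latex
\begin{aligned}
\text{inner (adaptation) step:}\quad
  & p' = p - \lambda_1 \,\nabla_{p}\, \mathcal{L}_{\mathrm{per}}(x;\, \theta, p) \\
\text{outer (meta) step:}\quad
  & p \leftarrow p - \lambda_2 \,\nabla_{p}\, \mathcal{L}_{\mathrm{gaze}}(x, y;\, \theta, p')
\end{aligned}
```

Under this reading, only the inner step runs at test time, using the target person's unlabeled images A_j; the outer step needs labels and is confined to the source data.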

Adaptation: Prompt tuning (updating only padding parameters)

Trainable Parameters: Less than 1% of ResNet-18 parameters (the prompts)

Key Hyperparameters:

meta_inner_learning_rate: lambda_1 (value not explicitly in text)
meta_outer_learning_rate: lambda_2 (value not explicitly in text)
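
To show how the two learning rates interact, here is a toy scalar example, assuming a single inner step and quadratic stand-in losses. Everything in it is hypothetical (the targets `a` and `b`, the values chosen for `lambda_1` and `lambda_2`, and the function names); it only illustrates the idea of learning an initialization such that one step on a proxy loss lands on the minimum of the supervised loss.

```python
def proxy_grad(p, a):
    """Gradient of the stand-in unsupervised loss (p - a)**2."""
    return 2.0 * (p - a)


def inner_step(p, a, lambda_1):
    """One test-time style update of the prompt on the proxy loss."""
    return p - lambda_1 * proxy_grad(p, a)


def meta_train(p, a, b, lambda_1, lambda_2, steps):
    """Outer loop: tune the initial p so the adapted p' minimizes (p' - b)**2."""
    for _ in range(steps):
        p_adapted = inner_step(p, a, lambda_1)
        # Chain rule through the inner step: d p' / d p = 1 - 2 * lambda_1.
        outer_grad = 2.0 * (p_adapted - b) * (1.0 - 2.0 * lambda_1)
        p = p - lambda_2 * outer_grad
    return p


a, b = 0.0, 1.0                # proxy-loss and supervised-loss minima (made up)
lambda_1, lambda_2 = 0.1, 0.5  # illustrative values, not from the paper
p_init = meta_train(0.0, a, b, lambda_1, lambda_2, steps=100)
p_adapted = inner_step(p_init, a, lambda_1)
```

The two minima differ on purpose: from an arbitrary starting point, a step toward `a` need not move closer to `b`. The meta-learned `p_init` compensates, so that the same proxy step ends at the supervised optimum.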

Compute: Adaptation is 10x faster than domain adaptation baselines that fine-tune the full model

Comparison to Prior Work

vs. CRGA/RUDA: TPGaze updates only prompts (efficient) vs. full model fine-tuning; TPGaze targets personalization (single user) vs. domain adaptation (generic target domain)
vs. Standard Prompt Tuning: TPGaze uses meta-learning to initialize prompts to ensure unsupervised loss effectiveness, rather than random or fixed initialization
vs. MAML: TPGaze aligns an auxiliary unsupervised task (symmetry) with the primary supervised task, rather than only learning fast adaptation for the supervised task (this comparison is implied by the methodology rather than stated directly in the paper)

Limitations

Relies on the assumption that symmetry loss is a sufficient proxy for gaze correctness after meta-learning
Evaluation primarily on cross-dataset adaptation, assuming this mimics personalization
Requires access to source data for the meta-learning phase (though test-time is source-free)

Reproducibility

Code: https://github.com/hmarkamcan/TPGaze

The paper explicitly mentions using ResNet-18. The meta-learning learning rates appear only as symbols (lambda_1, lambda_2); their numeric values are not given in the main text.

📊 Experiments & Results

Evaluation Setup

Cross-dataset validation: models are trained on a source dataset and then personalized on person-specific subsets of a target dataset

Benchmarks:

MPIIGaze (Gaze Estimation)
Gaze360 (Gaze Estimation)
ETH-XGaze (Gaze Estimation)

Metrics:

Gaze Estimation Error (angular error in degrees)
Adaptation Speed / Time
Statistical methodology: Not explicitly reported in the paper
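
For reference, the angular error metric listed above is commonly computed as the angle between the predicted and ground-truth gaze directions. The sketch below assumes gaze is given as 3D vectors; the function name is illustrative, and the paper's exact evaluation code may differ.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    cos_sim = dot / (norm_pred * norm_target)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return math.degrees(math.acos(cos_sim))


# Parallel vectors of different length, then two orthogonal vectors.
err_same = angular_error_deg([0.0, 0.0, -1.0], [0.0, 0.0, -2.0])
err_right_angle = angular_error_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Because the vectors are normalized inside the function, only direction matters, not magnitude.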

Key Results

The method adapts faster than existing UDA methods while tuning far fewer parameters.

Benchmark	Metric	Baseline	This Paper	Δ
Not specified (general finding)	Adaptation speed (relative to baseline, ×)	1.0	10.0	+9.0
ResNet-18	Trainable parameters (% of model)	100	<1	below -99

Experiment Figures

A preview of the performance comparison (apparently gaze error versus number of samples or adaptation time), showing TPGaze reaching lower error faster than other methods.

Main Takeaways

Meta-learning effectively aligns the unsupervised symmetry loss with the supervised gaze error, allowing the model to improve using only unlabeled data at test time.
Updating only the prompt (tunable padding) is sufficient for personalization and significantly more efficient than fine-tuning the entire backbone.
The method generalizes well across different datasets (MPIIGaze, Gaze360, ETH-XGaze) in cross-dataset validation settings.

📚 Prerequisite Knowledge

Prerequisites

Gaze estimation basics (appearance-based)
Unsupervised Domain Adaptation (UDA)
Prompt tuning (specifically visual prompting)
Meta-learning (MAML-style optimization)

Key Terms

prompt: In this paper, a small set of learnable parameters kept separate from the frozen backbone weights; concretely, tunable padding in convolutional layers

meta-learning: Learning to learn; here, optimizing the initial values of the prompt so that a few gradient steps on a proxy loss (symmetry) lead to better task performance (gaze error)

symmetry loss: An unsupervised loss based on the physical property that flipping an eye image horizontally should result in a horizontally flipped gaze vector

test-time personalization: Adapting a deployed model to a specific user's data distribution during inference/deployment, without access to the original training data

tunable padding: Replacing standard zero-padding in convolutions with learnable parameters that serve as the prompt
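
A minimal sketch of the symmetry loss defined above, assuming gaze is represented as (pitch, yaw) so that a horizontal flip negates yaw, and assuming an L1 penalty. `model` is any callable from image to (pitch, yaw); the helper names and both toy models are hypothetical and exist only to exercise the loss.

```python
def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def symmetry_loss(model, image):
    """L1 gap between the prediction on the image and the mirrored-back
    prediction on the flipped image."""
    pitch, yaw = model(image)
    pitch_f, yaw_f = model(hflip(image))
    # Mirroring the image should keep pitch and negate yaw.
    return abs(pitch - pitch_f) + abs(yaw - (-yaw_f))


def toy_model(image):
    """Stand-in predictor: yaw from left/right brightness imbalance, constant pitch."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[-half:]) for row in image)
    return 0.1, right - left


def biased_model(image):
    """Same predictor with a constant yaw offset, which breaks the symmetry."""
    pitch, yaw = toy_model(image)
    return pitch, yaw + 0.5


image = [[0.0, 1.0, 3.0], [0.0, 1.0, 2.0]]
loss_consistent = symmetry_loss(toy_model, image)
loss_biased = symmetry_loss(biased_model, image)
```

A model that always predicts zero yaw also scores a loss of zero, which is the kind of trivial solution that motivates meta-learning the prompt initialization rather than minimizing symmetry alone.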