VTG: Video Temporal Grounding—the task of finding exact start and end times for a text query within a video
RLVR: Reinforcement Learning with Verifiable Rewards—a training method where the model is optimized using a ground-truth verifier (like IoU) rather than a learned reward model
IoU: Intersection over Union—a metric measuring the overlap between the predicted time segment and the ground truth segment
SFT: Supervised Fine-Tuning—standard training on labeled data before applying RL
thinking-free: A modeling approach in which the model outputs its answer directly, without first generating intermediate 'reasoning' or 'thought' tokens
interleaved textual encoding: Representing time by inserting text tokens (e.g., '<0.5>') directly into the sequence, rather than using special learned embeddings
TimeLens-Bench: The paper's newly curated, high-quality evaluation suite, built by re-annotating existing datasets
TimeLens-100K: The paper's newly created training dataset, generated by automatically correcting labels in existing corpora
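
The IoU entry above can be made concrete with a small worked example. The following is a minimal sketch of temporal IoU between two time segments, assuming each segment is a (start, end) pair in seconds. The function name `temporal_iou` and the handling of empty segments are illustrative assumptions, not taken from the source.

```python
def temporal_iou(pred, gt):
    """Intersection over Union of two time segments given as (start, end) in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt

    # Assumption: a segment with non-positive duration scores zero.
    if pred_end <= pred_start or gt_end <= gt_start:
        return 0.0

    # Overlap length is clamped at zero when the segments are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union is the sum of both durations minus the shared overlap.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union


# A prediction of 2s-6s against a ground truth of 4s-8s overlaps for 2s
# out of a 6s union.
score = temporal_iou((2.0, 6.0), (4.0, 8.0))
```

Because this score is computed directly from the annotated segment, it is the kind of quantity that can serve as a verifiable reward in the RLVR sense defined above, with no learned reward model involved.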