reflection tokens: Special vocabulary tokens generated by the model to signal decisions (e.g., [Retrieve]) or assessments (e.g., [Relevant], [Supported])
critic model: An auxiliary model (initialized from Llama 2-7B) trained on data distilled from GPT-4; it annotates the training corpus with reflection tokens
Retrieve token: A decision token indicating whether external documents are needed to answer the current query
IsRel token: A critique token indicating whether a retrieved document provides useful information for the input
IsSup token: A critique token indicating how well the generated response is supported by the retrieved evidence (fully, partially, or not at all)
IsUse token: A critique token rating the overall utility of the response to the input on a five-point scale
segment-level beam search: A decoding strategy in which the model generates candidate segments (typically one sentence each), scores each candidate using reflection token probabilities, and keeps only the highest-scoring continuations
control tokens: Special tokens used to guide generation style or content; here, reflection tokens serve as dynamic control tokens
knowledge distillation: Transferring capabilities from a large model (teacher, here GPT-4) to a smaller model (student, here the critic model)
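
To make the segment-level beam search entry concrete, the following is a minimal sketch of how reflection token probabilities could rerank candidate segments. It assumes each segment's score is its log-probability plus a weighted sum of critique token probabilities. The names `Candidate`, `segment_score`, `beam_step`, and `toy_expand`, as well as the weights and probabilities, are hypothetical and chosen only for illustration; a real system would obtain candidates and probabilities from the generator model.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str        # one generated segment (roughly a sentence)
    logprob: float   # log-probability of the segment under the generator
    # Probability of the desirable value for each critique type,
    # e.g. {"IsRel": 0.9} (illustrative numbers, not model output).
    critique: dict = field(default_factory=dict)


def segment_score(cand, weights):
    # Assumption of this sketch: a linear, weighted combination of critique scores.
    critique_bonus = sum(w * cand.critique.get(name, 0.0) for name, w in weights.items())
    return cand.logprob + critique_bonus


def beam_step(beams, expand, weights, beam_width):
    # beams: list of (segments_so_far, cumulative_score) pairs.
    scored = []
    for segments, total in beams:
        for cand in expand(segments):
            scored.append((segments + [cand.text], total + segment_score(cand, weights)))
    # Keep only the beam_width highest-scoring partial outputs.
    scored.sort(key=lambda beam: beam[1], reverse=True)
    return scored[:beam_width]


def toy_expand(segments):
    # Hypothetical stand-in for the generator: always proposes the same two segments.
    return [
        Candidate("grounded claim.", -1.0, {"IsRel": 0.9, "IsSup": 0.8, "IsUse": 0.7}),
        Candidate("fluent but unsupported claim.", -0.5, {"IsRel": 0.2, "IsSup": 0.1, "IsUse": 0.3}),
    ]


weights = {"IsRel": 1.0, "IsSup": 1.0, "IsUse": 0.5}
beams = [([], 0.0)]
for _ in range(2):
    beams = beam_step(beams, toy_expand, weights, beam_width=2)
best_segments, best_score = beams[0]
print(best_segments, round(best_score, 2))
```

In this toy setup the unsupported segment has the higher log-probability, but the critique terms push the grounded segment to the top of the beam. Changing the weights at inference time shifts that trade-off without retraining, which is the sense in which reflection tokens act as dynamic control tokens.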