ReSet: Rejection Sampling for Continued Self-instruction Tuning, the proposed method, which filters model generations to create a high-quality fine-tuning dataset
Faithfulness: The degree to which a model's response is grounded in and supported by the provided source context, rather than hallucinated
Instruction Following: The ability of a model to adhere to open-ended user requests, style constraints, and formatting rules
MTL: Multi-Task Learning, i.e., training a model simultaneously on mixed datasets (here, both instruction-following and context-dependent data)
LLM-as-a-Judge: Using a strong LLM (such as GPT-4) to evaluate the quality of outputs from a smaller model
Rejection Sampling: A technique where multiple samples are generated, evaluated against a criterion, and only valid/high-quality samples are retained for training
Supercharge: In this paper, a variant of ReSet (ReSet-S) that uses more aggressive sampling and filtering to create a smaller but higher-quality dataset
SummaC-ZS: A zero-shot, NLI-based metric that checks whether a summary or answer is entailed by the source text; used here to measure faithfulness
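
The Rejection Sampling entry above describes a generate, evaluate, retain loop. A minimal Python sketch of that loop follows. It is an illustration under stated assumptions, not the paper's pipeline: the names `rejection_sample`, `toy_generate`, and `toy_score`, the `num_samples` and `threshold` parameters, and the keep-only-the-best-candidate policy are all hypothetical. In practice the generator would be the model being tuned and the scorer would be a quality signal such as a faithfulness metric or an LLM judge.

```python
from typing import Callable, Dict, List


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    num_samples: int = 4,
    threshold: float = 0.5,
) -> List[Dict[str, str]]:
    """For each prompt, keep the best-scoring candidate if it clears the threshold."""
    kept: List[Dict[str, str]] = []
    for prompt in prompts:
        candidates = generate(prompt, num_samples)
        if not candidates:
            continue
        # Score every candidate and pick the highest-scoring one.
        scored = [(score(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        # Reject the prompt entirely if even the best candidate is too weak.
        if best_score >= threshold:
            kept.append({"prompt": prompt, "response": best})
    return kept


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str, n: int) -> List[str]:
    """Return up to n prefixes of the prompt as fake 'model samples'."""
    words = prompt.split()
    return [" ".join(words[: i + 1]) for i in range(min(n, len(words)))]


def toy_score(prompt: str, response: str) -> float:
    """Fraction of the prompt's words that the response covers."""
    return len(response.split()) / len(prompt.split())


dataset = rejection_sample(
    ["the cat sat on the mat", "a b c d e f g h i j"],
    toy_generate,
    toy_score,
    num_samples=4,
    threshold=0.5,
)
```

With these toy functions, only the first prompt yields a retained example; the second is dropped because none of its candidates reaches the threshold. Raising `num_samples` or `threshold` trades dataset size against quality, which is the kind of trade-off the Supercharge entry refers to.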