Know-Filter: A small auxiliary model (MonoT5) trained to predict whether a specific piece of generated knowledge will help the solver LLM answer the question correctly.
Utility Score: The probability assigned by the solver LLM to the correct answer option when provided with a specific piece of context knowledge.
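The utility score can be sketched as a softmax over the solver's per-option log-likelihoods, reading off the mass on the correct option. This is a minimal illustration, not the paper's exact implementation; `option_logprobs` stands in for whatever scores the solver LLM assigns to each answer option with the knowledge in context.

```python
import math

def utility_score(option_logprobs, correct_idx):
    """Utility of a knowledge snippet: the solver's probability mass on the
    correct answer option, via a numerically stable softmax over the options'
    log-likelihoods (a sketch; the paper's exact scoring may differ)."""
    m = max(option_logprobs)
    exps = [math.exp(lp - m) for lp in option_logprobs]
    return exps[correct_idx] / sum(exps)

# Toy example: three answer options scored by the solver with a knowledge
# snippet in context; option 0 is the correct one.
print(round(utility_score([-1.0, -2.0, -3.0], 0), 3))
```

A snippet that shifts probability toward the correct option gets a higher utility score; a distracting snippet lowers it.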
UWC Loss: Utility-Weighted Classification Loss—a custom loss function that aligns the Know-Filter's predictions with the actual utility score (probability of correct answer) rather than just binary labels.
SLFG: Sentence-Level Fusion Generation—a decoding strategy in which the LLM generates candidate sentences one step at a time, the Know-Filter scores each candidate, and the best-scoring sentences are fused into the context that prompts the next sentence.
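The SLFG loop can be sketched with the LLM and the Know-Filter abstracted as callables. Here `generate_candidates` and `score` are hypothetical stand-ins for the sentence generator and the filter, and "fusion" is modeled as concatenating the top-scored sentences into the next prompt; the paper's actual fusion and beam settings may differ.

```python
def slfg_decode(generate_candidates, score, n_sentences=3, beam=2):
    """Sentence-Level Fusion Generation (sketch). Each step: the LLM proposes
    candidate next sentences for the current context, the Know-Filter scores
    them, and the top `beam` sentences are fused into the context that
    prompts the next step."""
    context = []
    for _ in range(n_sentences):
        candidates = generate_candidates(" ".join(context))
        best = sorted(candidates, key=score, reverse=True)[:beam]
        context.extend(best)  # fuse the best sentences into the prompt
    return " ".join(context)

# Toy stand-ins: a fixed candidate pool and a filter that prefers "good".
cands = lambda ctx: ["a good sentence.", "a weak sentence."]
score = lambda s: 1.0 if "good" in s else 0.0
print(slfg_decode(cands, score, n_sentences=2, beam=1))
```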
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer.
MonoT5: A T5 (Text-to-Text Transfer Transformer) model fine-tuned as a point-wise ranker/classifier, often used in information retrieval.
Greedy Decoding: A decoding method that selects the most probable token at each step.
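Greedy decoding reduces to a one-line argmax at each step. In this self-contained sketch the language model is replaced by a toy lookup table mapping a token prefix to next-token probabilities; `next_token_probs` is an assumed stand-in for the real model.

```python
def greedy_decode(next_token_probs, start, max_len=5, eos="<eos>"):
    """Greedy decoding: at every step append the single most probable next
    token, stopping at the end-of-sequence marker or the length cap."""
    seq = list(start)
    for _ in range(max_len):
        probs = next_token_probs(tuple(seq))
        token = max(probs, key=probs.get)  # argmax over the next-token distribution
        if token == eos:
            break
        seq.append(token)
    return seq

# Toy table-based "language model".
lm = {
    ("the",): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("the", "cat", "sat"): {"<eos>": 0.9, "down": 0.1},
}
print(greedy_decode(lambda s: lm[s], ["the"]))
```

Because it commits to the locally best token, greedy decoding can miss globally higher-probability sequences, which is one motivation for candidate-and-filter strategies like SLFG.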
Vicuna: An open-source chatbot trained by fine-tuning LLaMA on user-shared conversations.
Alpaca: An instruction-following language model fine-tuned from LLaMA on instruction-following demonstrations.