ASMR: Aggregated Semantic Matching Retrieval—the proposed method that generates open-ended answers first to retrieve relevant choices
MCP: Multiple Choice Prompting—a baseline method where the question and all answer choices are directly provided to the LLM
ZS-SC: Zero-Shot Self-Consistency—a method that samples multiple reasoning paths and aggregates answers via majority vote
SimCSE: Simple Contrastive Learning of Sentence Embeddings—a framework for learning sentence embeddings used here to measure similarity between generated answers and choices
CSQA: CommonsenseQA—a dataset for commonsense reasoning
SIQA: SocialIQA—a dataset for social and emotional intelligence reasoning
ARC: AI2 Reasoning Challenge—a dataset of grade-school science questions (Easy and Challenge sets)
Open-Ended Question Answering: Prompting the model with just the question (no choices) to generate a free-text response
Cosine Similarity: The cosine of the angle between two non-zero vectors (here, text embeddings), used as a similarity score ranging from -1 to 1
Beam Search: A decoding strategy that keeps several of the most probable partial sequences at each step instead of committing to a single one
Greedy Search: A decoding strategy that selects the most probable token at each step
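
To make a few of the terms above concrete, the following is a minimal Python sketch of cosine similarity, a majority vote as used in self-consistency, and one possible way to match generated answers to choices. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the vectors are toy stand-ins for sentence embeddings such as those SimCSE would produce, and summing similarities across generated answers is an assumed aggregation rule.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pick_choice(answer_vecs, choice_vecs):
    """Return the index of the choice most similar to the generated answers.

    Assumption: similarities are aggregated by summing over all generated
    answers; the glossary does not specify the aggregation rule.
    """
    totals = [
        sum(cosine_similarity(a, c) for a in answer_vecs)
        for c in choice_vecs
    ]
    return max(range(len(totals)), key=totals.__getitem__)


def majority_vote(answers):
    """Return the most frequent answer among sampled reasoning paths."""
    return Counter(answers).most_common(1)[0][0]


# Toy 2-D vectors standing in for embeddings of generated answers and choices.
generated = [[1.0, 0.0], [0.9, 0.1]]
choices = [[0.0, 1.0], [1.0, 0.0]]
best_index = pick_choice(generated, choices)

# Final answers from three hypothetical sampled reasoning paths.
voted = majority_vote(["B", "A", "B"])
```

In this toy setup the second choice points in nearly the same direction as both generated answers, so it receives the highest summed similarity. Real embeddings would come from a sentence encoder rather than hand-written vectors.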