Relevance Generation: A pointwise ranking method where an LLM is prompted to output 'Yes' or 'No' for a query-passage pair, using the probability of 'Yes' as the score.
Demonstration Reranker (DReranker): A cross-encoder model that takes a query, passage, and a sequence of already selected demonstrations to predict the next best demonstration.
List-pairwise training: A training method where the model compares two demonstration lists that differ only in their last element to learn the optimal sequential selection.
NDCG: Normalized Discounted Cumulative Gain, a measure of ranking quality that takes into account the position of relevant items.
Bi-encoder: A model architecture that encodes two inputs (e.g., query and document) separately into vectors and computes their similarity (usually dot product).
Cross-encoder: A model architecture that concatenates two inputs and processes them together through the network layers, allowing for full interaction between them.
RankNet: A pairwise learning-to-rank loss function that optimizes the probability that a relevant document is ranked higher than an irrelevant one.
NP-hard: A class of problems that are at least as hard as the hardest problems in NP; here, it means that no efficient (polynomial-time) algorithm is known for finding the optimal permutation of demonstrations.
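
Three of the definitions above (Relevance Generation, NDCG, and RankNet) reduce to short formulas, so a minimal sketch may help make them concrete. The function names are hypothetical and the code is illustrative only; it is not taken from the work this glossary accompanies. It assumes linear gain in DCG (an exponential-gain variant, 2^rel - 1, is also common), and it assumes the 'Yes' probability is normalized over only the 'Yes' and 'No' token logits, which is one possible convention.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain (assumed variant).

    The item at 0-based position i is discounted by log2(i + 2),
    so the top-ranked item is divided by log2(2) = 1.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant items at all, the score is defined here as 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def ranknet_loss(score_pos, score_neg):
    """Pairwise RankNet loss: -log sigmoid(score_pos - score_neg).

    Written as max(-d, 0) + log1p(exp(-|d|)) to avoid overflow for large |d|.
    """
    d = score_pos - score_neg
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def yes_probability(logit_yes, logit_no):
    """Relevance score as P('Yes'), normalized over the two answer tokens only.

    Restricting the softmax to 'Yes'/'No' is an assumption of this sketch.
    """
    return 1.0 / (1.0 + math.exp(logit_no - logit_yes))
```

For instance, `ndcg([3, 2, 1])` is 1.0 because the list is already in ideal order, while `ndcg([1, 3])` is below 1 because the more relevant item is ranked second. `ranknet_loss` shrinks as the relevant item's score rises above the irrelevant one's.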