Mixture-of-Agents (MoA): A layered architecture where multiple LLMs generate responses, which are then aggregated and refined by subsequent layers of LLMs
Pre-inference: Executing a model to get its output before deciding whether to use it; RouteMoA avoids this to save cost
SLM: Small Language Model; used here as a lightweight scorer (86M parameters) that predicts how well each LLM will perform on a given query
Self-assessment: A model evaluating its own confidence or output quality
Cross-assessment: One model evaluating the quality of another model's output
mDeBERTaV3-base: A small, pre-trained language model used to encode queries into embeddings for the scorer
Prior knowledge: Information available before model execution (e.g., query content)
Posterior knowledge: Information available after model execution (e.g., generated response, confidence score)
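To make the MoA, pre-inference, and prior-knowledge entries concrete, here is a minimal toy sketch. It is not RouteMoA's implementation. The stub "LLMs", the `toy_scorer` scores, and the helpers `make_stub_llm`, `route_by_prior`, and `run_moa` are all hypothetical. A real scorer (e.g., an mDeBERTaV3-base encoder with a prediction head) is replaced here by a fixed dictionary of predicted scores. The sketch selects proposers from query-only scores, so unselected models are never executed (no pre-inference). It then passes each layer's responses to the next layer, which is the layered MoA pattern. For simplicity it routes once, up front, rather than per layer.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical toy stand-in: prompt -> response


def make_stub_llm(name: str, call_log: List[str]) -> LLM:
    """Toy 'LLM' that records each call and echoes the prompt tagged with its name."""
    def llm(prompt: str) -> str:
        call_log.append(name)
        return f"{name}({prompt})"
    return llm


def route_by_prior(query: str, scorer: Callable[[str], Dict[str, float]], k: int) -> List[str]:
    """Select the top-k proposers from predicted scores.

    Uses prior knowledge only (the query); no candidate model is run here,
    so there is no pre-inference.
    """
    scores = scorer(query)
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def run_moa(query: str, agents: Dict[str, LLM],
            scorer: Callable[[str], Dict[str, float]],
            k: int = 2, num_layers: int = 2, aggregator: str = "agg") -> str:
    """Layered MoA: selected proposers answer; each later layer sees earlier responses."""
    selected = route_by_prior(query, scorer, k)  # routed once, before any model runs
    prompt = query
    for _ in range(num_layers):
        responses = [agents[name](prompt) for name in selected]
        # The next layer's prompt carries the original query plus the previous responses.
        prompt = f"{query} [prev: {' | '.join(responses)}]"
    return agents[aggregator](prompt)  # final aggregation step


if __name__ == "__main__":
    calls: List[str] = []
    agents = {n: make_stub_llm(n, calls) for n in ["a", "b", "c", "agg"]}
    toy_scorer = lambda q: {"a": 0.9, "b": 0.2, "c": 0.5}  # hypothetical predicted scores
    print(run_moa("q", agents, toy_scorer, k=2, num_layers=1))
    # agg(q [prev: a(q) | c(q)])
    print(calls)  # ['a', 'c', 'agg']; model "b" was never executed
```

The call log shows the cost argument in miniature. Because selection uses only prior knowledge, the low-scoring model is never run. A posterior approach (self- or cross-assessment of generated responses) would have to execute every candidate first.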