NMT: Neural Machine Translation—automated translation using deep neural networks, known for preserving sentence structure but sometimes lacking nuance
GAIA: A benchmark for General AI Assistants that tests reasoning, tool use, and multi-modality in real-world scenarios
SWE-Bench: Software Engineering Benchmark—evaluates an agent's ability to resolve GitHub issues via code generation and editing
ASB: Agent Security Benchmark—evaluates agent robustness against adversarial attacks and safety violations
answerability: A custom metric measuring whether a translated task preserves enough meaning for a human expert to solve it correctly
Multilingual Effect: The measurable degradation in AI performance or safety when a system processes non-English inputs, relative to its results on equivalent English inputs
adequacy: A translation quality metric assessing whether the meaning of the source text is fully preserved in the target
fluency: A translation quality metric assessing the grammatical and stylistic naturalness of the translated text