
Tiny Aya: Bridging Scale and Multilingual Depth

Alejandro R. Salamanca, Diana Abagyan, Daniel D'souza, Ammar Khairi, David Mora, Saurabh Dash, Viraat Aryabumi, Sara Rajaee, Mehrnaz Mofakhami, Ananya Sahu, Thomas Euyang, Brittawnya Prince, Madeline Smith, Hangyu Lin, Acyr Locatelli, Sara Hooker, Tom Kocmi, Aidan Gomez, Ivan Zhang, Phil Blunsom, Nick Frosst, Joelle Pineau, Beyza Ermis, Ahmet Üstün, Julia Kreutzer, Marzieh Fadaee
Cohere
arXiv (2026)
Topics: Pretraining · Benchmark · Reasoning · QA

📝 Paper Summary

Keywords: Multilingual Language Modeling · Small Language Models (SLMs)
Tiny Aya is a family of 3.35-billion-parameter multilingual models that achieves strong performance across 70 languages through balanced data curation and region-aware post-training rather than brute-force scaling.
Core Problem
Current multilingual models prioritize high-resource languages such as English because data availability is skewed, and the dominant strategies for improving multilingual performance rely on massive scale, which limits accessibility and practical deployment.
Why it matters:
  • Performance gains track data availability, reinforcing disparities between high-resource and underrepresented linguistic communities
  • Dominant scaling strategies raise barriers for researchers and limit adaptability for practical deployment in varied regions
  • Existing small models often fail to maintain consistent quality or safety across diverse languages relative to their English performance
Concrete Example: A standard multilingual model might tokenize underrepresented scripts like Khmer inefficiently, using many tokens per word, which degrades performance and increases cost. In contrast, Tiny Aya uses a balanced tokenizer that compresses these scripts effectively.
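To make the tokenizer-efficiency point concrete, the sketch below measures tokens per character for a short Khmer and English string, assuming the Hugging Face transformers tokenizer API; the model IDs and sample text are placeholders rather than the paper's actual checkpoints or evaluation data.

```python
# Minimal sketch: compare how efficiently different tokenizers compress text in a
# given script. Model names and sample strings are illustrative placeholders, not
# the checkpoints or data used in the paper.
from transformers import AutoTokenizer

def tokens_per_char(tokenizer, text: str) -> float:
    """Tokens emitted per character: a simple compression proxy for scripts
    (such as Khmer) that do not delimit words with spaces."""
    tokens = tokenizer.encode(text, add_special_tokens=False)
    return len(tokens) / max(len(text), 1)

khmer_sample = "ភាសាខ្មែរ"            # "Khmer language" (placeholder text)
english_sample = "the Khmer language"

for name in ["google/gemma-3-4b-it", "CohereLabs/aya-expanse-8b"]:  # placeholder model IDs
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: Khmer={tokens_per_char(tok, khmer_sample):.2f} tok/char, "
          f"English={tokens_per_char(tok, english_sample):.2f} tok/char")
```

Lower tokens-per-character means better compression of the script, which translates into shorter sequences, less truncation, and lower inference cost.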
Key Novelty
Balanced Multilingual Small Language Model Family
  • Constructs a tokenizer and data mixture explicitly weighted to balance 70 languages, ensuring equitable representation rather than following the natural data distribution (see the sampling sketch after this list)
  • Introduces region-specialized model variants (Earth, Fire, Water) optimized for specific linguistic clusters (e.g., South Asia, Africa) alongside a general global model
  • Uses a 'Fusion-of-NN' (FusioNN) pipeline in which multiple teacher models generate synthetic data, which is then aggregated and filtered to improve quality for low-resource languages
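One common way to implement the balanced data mixture described in the first bullet is temperature-based language sampling; the sketch below illustrates that generic technique under assumed token counts, and the paper's actual weighting scheme may differ.

```python
# Minimal sketch of temperature-based language sampling, a common way to rebalance
# a multilingual pretraining mixture toward under-represented languages. Token
# counts and the temperature value are illustrative assumptions, not the paper's.

def sampling_weights(token_counts: dict[str, float], temperature: float = 0.3) -> dict[str, float]:
    """Raise each language's natural share to the power `temperature`, then renormalize.
    temperature=1.0 reproduces the natural distribution; lower values flatten it."""
    total_tokens = sum(token_counts.values())
    shares = {lang: n / total_tokens for lang, n in token_counts.items()}
    powered = {lang: share ** temperature for lang, share in shares.items()}
    norm = sum(powered.values())
    return {lang: p / norm for lang, p in powered.items()}

# Hypothetical corpus sizes (billions of tokens) for three languages.
natural = {"English": 500.0, "Hindi": 20.0, "Khmer": 1.0}

print(sampling_weights(natural, temperature=1.0))  # natural, English-dominated mixture
print(sampling_weights(natural, temperature=0.3))  # flattened, more balanced mixture
```

Lower temperatures upsample low-resource languages (at the cost of repeating their data more often), which is the kind of rebalancing the summary refers to.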
Evaluation Highlights
  • Tiny Aya Global outperforms Gemma 3 4B in translation quality on 46 of 55 languages in the WMT24++ benchmark
  • Region-specialized variants significantly improve translation quality, with gains of up to +5.5 ChrF points in South Asia over the base global model (ChrF computation sketched after this list)
  • Achieves highest mean safe response rate (91.1%) on MultiJail compared to baselines, while reducing safety disparities across languages
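ChrF, the translation metric cited in the highlights, can be computed with the sacrebleu library; a minimal sketch with made-up strings (not WMT24++ data) is shown below.

```python
# Minimal sketch: compute ChrF with sacrebleu. The hypothesis/reference strings are
# made-up examples, not WMT24++ data or the paper's system outputs.
from sacrebleu.metrics import CHRF

hypotheses = ["यह एक परीक्षण वाक्य है।"]    # system outputs, one per segment (placeholder)
references = [["यह एक परीक्षण वाक्य है।"]]  # outer list: reference sets; inner: one per segment

chrf = CHRF()  # default settings: character n-grams up to order 6
score = chrf.corpus_score(hypotheses, references)
print(score)  # e.g. "chrF2 = 100.00" for an exact match
```

A gain of +5.5 ChrF, as reported for the South Asia variant, means the corpus-level ChrF score is 5.5 points higher than the global base model's on the same test set.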
Breakthrough Assessment
8/10
Significantly advances the capability of small models (sub-4B) in multilingual settings, demonstrating that careful data curation and regional specialization allow small models to compete with larger ones across diverse languages.