Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian

TM Buonocore, S Rancati, E Parimbelli
Dept. of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
arXiv, July 2024
Pretraining · QA · Benchmark

📝 Paper Summary

Biomedical NLP · Low-resource language modeling
Igea adapts the Italian-English Minerva foundation model into a specialized biomedical generator for Italian by continually pre-training on a diverse corpus of translated abstracts, textbooks, and web data.
Core Problem
General-purpose Italian language models lack the specialized terminology required for medical accuracy, while existing biomedical language models remain predominantly English-centric.
Why it matters:
  • Medical communication requires high precision and clarity; generic models often hallucinate or misuse terminology.
  • Significant disparity exists in NLP resources between English and other languages like Italian, hindering clinical adoption in non-English speaking regions.
  • Previous Italian biomedical efforts (e.g., BioBIT) were small BERT-based encoders unsuitable for generative tasks.
Concrete Example: A general Italian LLM might describe a medical condition using lay terms or inaccurate translations, whereas Igea is trained to use formal scientific lexicon appropriate for clinical documentation or research.
Key Novelty
Continual Pre-training for Italian Biomedical Generation
  • Continually pre-trains a general-purpose Italian/English model (Minerva) on a curated 5-billion-word corpus of Italian medical text.
  • Combines diverse data sources—formal textbooks, translated PubMed abstracts, and layman web discussions—to capture both technical and patient-facing registers.
  • Uses a cosine learning rate schedule with warmup to adapt the model to the new domain without catastrophically forgetting general language capabilities.
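The warmup-plus-cosine schedule mentioned above can be sketched as a simple function of the training step. This is a minimal illustration, not the authors' training code; the step counts and learning-rate values below are hypothetical placeholders, not hyperparameters reported in the paper.

```python
import math

def cosine_lr_with_warmup(step: int, total_steps: int, warmup_steps: int,
                          base_lr: float, min_lr: float = 0.0) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay to min_lr.

    A common schedule for continual pre-training: the warmup phase avoids
    large early updates that would disrupt the pretrained weights, and the
    cosine decay gradually reduces the step size over the domain corpus.
    """
    if step < warmup_steps:
        # Linear ramp: 0 -> base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay: base_lr -> min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Illustrative values only:
lr_start = cosine_lr_with_warmup(0, 10_000, 500, base_lr=1e-4)      # 0.0
lr_peak = cosine_lr_with_warmup(500, 10_000, 500, base_lr=1e-4)     # 1e-4
lr_end = cosine_lr_with_warmup(10_000, 10_000, 500, base_lr=1e-4)   # ~0.0
```

Keeping the peak learning rate modest during continual pre-training is one common way to reduce catastrophic forgetting of the base model's general-language abilities.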
Evaluation Highlights
  • Igea 3B achieves 31.3% accuracy on MedMCQA-ITA, outperforming the base Minerva 3B model (29.3%) on this domain-specific task.
  • Retains general knowledge with competitive scores on Italian MMLU (34.3% vs Minerva's 36.2%) despite heavy domain adaptation.
  • Shows consistent scaling behavior: the 3B model outperforms the smaller 350M and 1B variants on both medical and general benchmarks.
Breakthrough Assessment
5/10
Significant as the first generative biomedical LLM for Italian, filling a major resource gap. The methodology (continual pre-training) is standard, but the released model artifacts and the MedMCQA-ITA dataset are valuable contributions.