
Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia

LCL Gamboa, M Lee
School of Computer Science, University of Birmingham; Department of Information Systems and Computer Science, Ateneo de Manila University
arXiv, December 2024
Benchmark Pretraining

📝 Paper Summary

Bias Evaluation Low-Resource NLP Multilingual Models
The authors introduce Filipino CrowS-Pairs and WinoQueer—7,074 culturally adapted prompt pairs—to expose significant sexist and homophobic biases in masked and causal multilingual language models.
Core Problem
Most bias benchmarks are English-centric, failing to account for linguistic differences (like gender neutrality in Filipino) and distinct cultural concepts of queerness in Southeast Asia.
Why it matters:
  • Multilingual models are increasingly deployed in Southeast Asia, but their potential social harms in local contexts remain unmeasured
  • English benchmarks rely on gendered pronouns (he/she) which do not exist in Filipino (siya), making direct translation ineffective for bias probing
  • Indigenous Filipino queer identities (e.g., bakla, tomboy) do not map 1:1 onto Western LGBTQ+ labels, rendering English bias datasets culturally irrelevant
Concrete Example: Directly translating 'He/She is a programmer' into Filipino fails because both 'he' and 'she' translate to the gender-neutral 'siya', yielding identical sentences that cannot measure bias. The authors instead use the descriptors 'lalaki' (man) and 'babae' (woman) to reintroduce the gender signal.
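The descriptor-substitution idea can be sketched in a few lines: a template with a person slot is filled with 'lalaki' and 'babae' to produce a minimal pair whose only difference is the gender descriptor. The template, slot marker, and helper name below are illustrative, not taken from the paper's released data.

```python
def make_minimal_pair(template: str, slot: str = "{PERSON}") -> tuple[str, str]:
    """Fill the slot with 'lalaki' (man) and 'babae' (woman).

    Filipino's third-person pronoun 'siya' is gender-neutral, so bias probes
    swap explicit descriptors instead of pronouns. The two sentences returned
    differ only in that descriptor, which is what a minimal pair requires.
    """
    return (template.replace(slot, "lalaki"), template.replace(slot, "babae"))

pair = make_minimal_pair("Ang {PERSON} ay isang programmer.")
# pair[0] = "Ang lalaki ay isang programmer."
# pair[1] = "Ang babae ay isang programmer."
```

Scoring each sentence of the pair under the model then reveals whether the gender descriptor alone shifts the model's preference.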
Key Novelty
Filipino CrowS-Pairs and Filipino WinoQueer
  • 7,074 culturally adapted prompt pairs derived from English CrowS-Pairs and WinoQueer, specifically addressing Filipino's gender-neutral grammar and local queer terminology
  • First application of bias benchmarks to causal multilingual models (e.g., SeaLLM, Merak-7B) developed specifically for the Southeast Asian context
  • Systematic documentation of cultural adaptation challenges (e.g., removing 'Thanksgiving', adapting 'social justice warrior' to 'fighting for too many causes')
Evaluation Highlights
  • Released 7,074 new Filipino bias evaluation challenge pairs (1,424 for CrowS-Pairs, 5,650 for WinoQueer)
  • Evaluated masked models (XLM-RoBERTa, mBERT) and causal models (XGLM, BLOOM, SeaLLM, Merak, Llama-3, Aya-23), confirming the presence of bias across all of them
  • Found that a multilingual model's bias magnitude correlates with the volume of pretraining data it saw in the target language
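CrowS-Pairs-style benchmarks aggregate pair-level comparisons into a single score: for each minimal pair, the model's (pseudo-)log-likelihood of the more stereotypical sentence is compared with the anti-stereotypical one, and the reported metric is the percentage of pairs where the stereotypical sentence wins. An unbiased model would land near 50%. A minimal sketch of that aggregation, with hypothetical toy scores (the function name and numbers are illustrative, not the paper's results):

```python
def bias_score(pll_stereo: list[float], pll_antistereo: list[float]) -> float:
    """Percent of pairs where the stereotypical sentence scores higher.

    Inputs are per-sentence (pseudo-)log-likelihoods, one entry per pair.
    50.0 indicates no systematic preference; values above 50.0 indicate
    bias toward the stereotypical phrasing.
    """
    assert len(pll_stereo) == len(pll_antistereo)
    wins = sum(s > a for s, a in zip(pll_stereo, pll_antistereo))
    return 100.0 * wins / len(pll_stereo)

# Toy log-likelihoods for four pairs (higher = more probable to the model):
print(bias_score([-10.2, -8.5, -9.1, -7.7], [-11.0, -8.9, -9.0, -8.1]))  # 75.0
```

The same aggregation applies to both model families: masked models use a pseudo-log-likelihood over masked tokens, while causal models can use the sentence log-probability directly.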
Breakthrough Assessment
7/10
Significant contribution to low-resource and Southeast Asian NLP fairness. While the methodology adapts existing English frameworks rather than inventing new metrics, the cultural rigor and dataset release fill a critical gap.