Cross-Lingual Knowledge Augmentation for Mitigating Generic Overgeneralization in Multilingual Language Models

📝 Paper Summary

Generic overgeneralization (GOG), Multilingual commonsense reasoning, Knowledge injection

Combining commonsense (ConceptNet) and encyclopedic (DBpedia) knowledge via graph attention networks reduces the tendency of multilingual models to interpret generic statements like 'lions have manes' as universal truths.

Core Problem

Language models and humans exhibit 'generic overgeneralization' (GOG), incorrectly treating generic statements (e.g., 'ducks lay eggs') as universal ('all ducks lay eggs'), a bias exacerbated in low-resource languages.

Why it matters:

Models fail to capture nuanced semantics, treating exceptions (e.g., male ducks don't lay eggs) as false rather than valid variations
Low-resource languages like isiZulu and Sepedi lack digital corpora and have distinct morphological features (e.g., obligatory plural marking) that may amplify this bias
Prior mitigation work focused only on English and limited knowledge bases (ASCENT KB), leaving cross-lingual patterns and broader knowledge sources unexplored

Concrete Example: When given the true generic 'ducks lay eggs', models incorrectly accept the universal claim 'all ducks lay eggs' despite male ducks not laying eggs. In isiZulu, 'amabhubesi' (lions) uses a prefix 'ama-' that inherently marks plurality, potentially biasing models toward universal interpretations.

Key Novelty

Dual-Source Cross-Lingual Knowledge Injection

Projects large-scale knowledge graphs (ConceptNet for commonsense, DBpedia for facts) into low-resource languages using alignment tools
Demonstrates that commonsense knowledge specifically helps 'minority' generics (subset properties) while encyclopedic knowledge helps 'majority' generics (exceptions)
Uses a Graph Attention Network (QA-GNN) to reason over these projected knowledge subgraphs to correct the model's semantic interpretation

Evaluation Highlights

Combined knowledge (ConceptNet + DBpedia) reduces generic overgeneralization by ~67% (relative MRR reduction) in mT5-large across 5 languages
ConceptNet alone reduces overgeneralization for minority characteristic generics by 45-52% relative to baseline
Nguni languages (isiZulu, isiXhosa) show 4-7% higher baseline overgeneralization than Sotho-Tswana languages, suggesting morphological influence

Breakthrough Assessment

7/10

Strong empirical evidence for cross-lingual knowledge transfer addressing a specific semantic bias. First study of this phenomenon in African languages, though limited by reliance on translated evaluation data.

⚙️ Technical Details

Problem Definition

Setting: Quantifier prediction and generic classification, i.e., predicting appropriate quantifiers for masked generic statements and determining whether a statement is generic or universal

Inputs: A generic statement with a masked quantifier (e.g., '[MASK] lions have manes') or a universally quantified statement

Outputs: Probability distribution over quantifiers (all, every, some, most, etc.) or binary classification (Generic/Non-Generic)
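
A minimal sketch of the input/output format described above, assuming a small candidate set and made-up scores. The helper names (mask_quantifier, quantifier_distribution) and the score values are hypothetical and not taken from the paper; a real model would supply the scores.

```python
import math

# Assumed candidate set; the summary lists "all, every, some, most, etc."
QUANTIFIERS = ["all", "every", "some", "most"]

def mask_quantifier(generic: str, mask_token: str = "[MASK]") -> str:
    """Prefix a bare generic with a mask slot for the quantifier."""
    return f"{mask_token} {generic}"

def quantifier_distribution(scores: dict) -> dict:
    """Softmax over raw scores, restricted to the candidate quantifiers."""
    kept = {q: scores[q] for q in QUANTIFIERS if q in scores}
    top = max(kept.values())  # subtract the max for numerical stability
    exps = {q: math.exp(s - top) for q, s in kept.items()}
    total = sum(exps.values())
    return {q: e / total for q, e in exps.items()}

# Illustrative scores only; not results from the paper.
example_scores = {"all": 2.0, "every": 1.0, "some": 0.5, "most": 1.5}
prompt = mask_quantifier("lions have manes")
dist = quantifier_distribution(example_scores)
```

In this toy case the highest probability falls on 'all', which is the overgeneralizing behavior the paper tries to reduce.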

Pipeline Flow

Input Generic Statement
Entity Extraction & Subgraph Retrieval (from ConceptNet/DBpedia)
Knowledge Projection (English KB to Target Language)
Graph Reasoning (QA-GNN)
Quantifier Prediction / Classification

System Modules

Knowledge Projection

Translates/aligns English knowledge graphs to target African languages

Model or implementation: LeNS-Align (from prior work)

Graph Encoder

Reasons over retrieved knowledge subgraphs to identify exceptions or subset properties

Model or implementation: QA-GNN (Graph Attention Network)
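
To illustrate what attention-based reasoning over a subgraph involves, here is a simplified sketch of one aggregation step. It is not QA-GNN itself: scores are plain dot products between feature vectors, whereas a Graph Attention Network learns its scoring function, and the node names and features are invented for illustration.

```python
import math

def attention_aggregate(node, neighbors, features):
    """Return (pooled, weights): the attention-weighted sum of neighbor
    features for `node`, and the softmax weights used to form it.

    Simplification: scores are dot products, not a learned attention function.
    """
    h = features[node]
    scores = [sum(a * b for a, b in zip(h, features[n])) for n in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [
        sum(w * features[n][d] for w, n in zip(weights, neighbors))
        for d in range(len(h))
    ]
    return pooled, weights

# Toy subgraph with made-up 2-d features (illustrative only).
features = {
    "lion": [1.0, 0.0],
    "mane": [1.0, 0.0],
    "male_lion": [0.0, 1.0],
}
pooled, weights = attention_aggregate("lion", ["mane", "male_lion"], features)
```

Neighbors whose features align with the query node receive larger weights, so their information dominates the pooled representation.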

Language Model

Predicts masked tokens or classifies statements using text + knowledge context

Model or implementation: mT5-large

Modeling

Base Model: mT5-large (for multilingual experiments); BERT-large/RoBERTa-large (for English baselines)

Training Method: Knowledge injection via QA-GNN (multilingual) or KEPLER (English)

Objective Functions:

Purpose: Minimize error in identifying correct quantifiers or classifying generic statements.

Formally: Standard cross-entropy loss for masked token prediction / classification.
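
One common way to write this loss, with notation that is assumed here rather than taken from the paper: x_i is the input statement, G_i the retrieved knowledge subgraph, y_i the gold quantifier token or class label, and N the number of training examples.

```latex
\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(y_i \mid x_i, G_i\right)
```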

Training Data:

5884 minority characteristic generics
8750 majority characteristic generics
60368 training generics
Translated to 4 languages (isiZulu, isiXhosa, Sepedi, SeSotho) via Google Translate API

Key Hyperparameters:

batch_size: 16 (mT5), 32 (BERT/RoBERTa)
learning_rate: 1e-4 (mT5), 2e-5 (BERT/RoBERTa)
epochs: 10 (mT5), 5 (BERT/RoBERTa)
subgraph_hops: 2
max_nodes_per_subgraph: 50
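
The two subgraph settings above can be read as bounds on a graph expansion. The sketch below uses a plain breadth-first search as a stand-in; the paper's actual retrieval procedure may differ, and the function name and toy graph are hypothetical.

```python
from collections import deque

def retrieve_subgraph(adjacency, seeds, hops=2, max_nodes=50):
    """Breadth-first expansion from seed entities, capped by hops and node count."""
    visited, seen, queue = [], set(), deque()
    for seed in seeds:
        if seed not in seen and len(visited) < max_nodes:
            seen.add(seed)
            visited.append(seed)
            queue.append((seed, 0))
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor in seen:
                continue
            if len(visited) >= max_nodes:
                return visited  # node budget exhausted
            seen.add(neighbor)
            visited.append(neighbor)
            queue.append((neighbor, depth + 1))
    return visited

# Toy graph (illustrative only; not drawn from ConceptNet or DBpedia).
toy_graph = {
    "duck": ["bird", "egg"],
    "bird": ["animal"],
    "animal": ["organism"],
    "egg": [],
}
nodes = retrieve_subgraph(toy_graph, ["duck"])
```

With the default of 2 hops, 'organism' (3 hops from 'duck') is left out of the retrieved nodes.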

Compute: Google Cloud Compute Engine with 2 x NVIDIA A100 80GB GPUs, 340GB memory. Training time: ~8 hours (BERT/RoBERTa), ~12 hours (mT5).

Comparison to Prior Work

vs. ASCENT KB: ConceptNet/DBpedia provide broader conceptual/encyclopedic coverage (450k+ triples vs 400k faceted triples), resulting in nearly double the reduction in overgeneralization (67% vs 30-40%)

Limitations

Relies on translated data for low-resource languages rather than native corpora, introducing potential noise
Classification accuracy remains low (<40%) even with knowledge enhancement, indicating persistent difficulty
Study limited to two language groups within the Bantu family (Nguni, Sotho-Tswana), restricting typological generalizability
Does not eliminate the performance gap between the two groups; Nguni languages still overgeneralize more

Reproducibility

Code: https://github.com/sello-ralethe/Multilingual_Generics

The code is publicly available, and the dataset includes the translated generics, which were created with the standard Google Translate API. The knowledge bases (ConceptNet, DBpedia) are public, but the projected versions depend on LeNS-Align.

📊 Experiments & Results

Evaluation Setup

Cross-lingual evaluation on English plus 4 South African languages (isiZulu, isiXhosa, Sepedi, SeSotho)

Benchmarks:

Generic Overgeneralization Dataset: quantifier prediction via masked language modeling [New]
Generic Classification Dataset: binary classification, generic vs. non-generic [New]

Metrics:

Mean Reciprocal Rank (MRR) (Lower is better for universal quantifiers in prediction task)
Precision@5 (P@5) (Lower is better for universal quantifiers)
Classification Accuracy (Higher is better)
Statistical methodology: Not explicitly reported in the paper
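
A sketch of how the two ranking metrics could be computed for universal quantifiers, where lower values mean less overgeneralization. The universal set and the exact P@5 definition (share of the top 5 predictions that are universal quantifiers) are assumptions, since the summary does not spell them out; the rankings are invented.

```python
UNIVERSAL = {"all", "every"}  # assumed set of universal quantifiers

def reciprocal_rank(ranked, targets=UNIVERSAL):
    """1 / rank of the first universal quantifier, or 0.0 if none appears."""
    for rank, token in enumerate(ranked, start=1):
        if token in targets:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, targets=UNIVERSAL):
    """Average reciprocal rank over a list of ranked prediction lists."""
    return sum(reciprocal_rank(r, targets) for r in rankings) / len(rankings)

def precision_at_k(ranked, k=5, targets=UNIVERSAL):
    """Share of the top-k predictions that are universal quantifiers."""
    return sum(1 for token in ranked[:k] if token in targets) / k

# Illustrative model rankings (not from the paper).
rankings = [
    ["most", "all", "some", "many", "few"],    # first universal at rank 2
    ["some", "many", "most", "few", "every"],  # first universal at rank 5
]
mrr = mean_reciprocal_rank(rankings)
p5 = precision_at_k(rankings[0])
```

A model that pushes 'all' and 'every' further down its ranking gets a lower MRR, which is why a decrease counts as an improvement here.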

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
English results comparing knowledge sources show ConceptNet excels at minority generics while DBpedia excels at majority generics.
Generic Overgeneralization Dataset (Minority)	MRR (Lower is better)	0.217	0.158	-0.059
Generic Overgeneralization Dataset (Majority)	MRR (Lower is better)	0.257	0.180	-0.077
Generic Overgeneralization Dataset (Combined)	MRR (Lower is better)	0.329	0.108	-0.221
Cross-lingual results show consistent reduction across all languages, with Nguni languages starting at a worse baseline.
Generic Overgeneralization Dataset (isiZulu - Minority)	MRR (Lower is better)	0.347	0.151	-0.196
Generic Overgeneralization Dataset (Sepedi - Minority)	MRR (Lower is better)	0.324	0.144	-0.180
Universally Quantified Statements	Accuracy (%, higher is better)	10.5	37.3	+26.8
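
The Δ column reports absolute differences, while the highlights quote relative reductions. The conversion can be checked directly from the MRR rows in the table; the function name and row labels below are ours.

```python
def relative_reduction(baseline, enhanced):
    """Fraction of the baseline value removed by the knowledge-enhanced model."""
    return (baseline - enhanced) / baseline

# (baseline MRR, knowledge-enhanced MRR) copied from the table rows.
rows = {
    "Combined": (0.329, 0.108),
    "isiZulu - Minority": (0.347, 0.151),
    "Sepedi - Minority": (0.324, 0.144),
}

for name, (baseline, enhanced) in rows.items():
    print(f"{name}: {relative_reduction(baseline, enhanced):.1%}")
```

The Combined row works out to about 67%, matching the headline figure in the Evaluation Highlights.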

Main Takeaways

Commonsense knowledge (ConceptNet) is superior for 'minority' generics (e.g., lions have manes) by explaining subset properties.
Encyclopedic knowledge (DBpedia) is superior for 'majority' generics (e.g., tigers have stripes) by providing factual exceptions (e.g., white tigers).
Knowledge augmentation is effective cross-lingually (via projection), but morphological features (like Nguni plural prefixes) cause persistent higher baseline overgeneralization.
Models struggle to distinguish universally quantified ('all') statements from generic ones: baseline accuracy is ~10% and reaches only ~37% with knowledge, suggesting that deep semantic confusion remains.

📚 Prerequisite Knowledge

Prerequisites

Understanding of 'generics' in linguistics (statements about kinds tolerating exceptions)
Knowledge Graph structures (triples, relations)
Multilingual language models (mT5)
Graph Neural Networks (specifically Graph Attention Networks)

Key Terms

generic overgeneralization: The cognitive and computational bias where a system incorrectly interprets a generic statement (e.g., 'birds fly') as a universal truth ('all birds fly')

minority characteristic generics: Statements true for only a subset of a kind (e.g., 'lions have manes' - only males do)

majority characteristic generics: Statements true for most but not all of a kind (e.g., 'tigers have stripes' - albinos do not)

ASCENT KB: A baseline knowledge base of animal-related triples with faceted information (e.g., 'young lions do not have manes')

ConceptNet: A commonsense knowledge graph capturing prototypical relations like 'CapableOf' or 'HasProperty'

DBpedia: An encyclopedic knowledge base extracted from Wikipedia, containing factual details and exceptions

QA-GNN: Question Answering Graph Neural Network—a method that retrieves relevant subgraphs and reasons over them using attention mechanisms

KEPLER: Knowledge Embedding and Pre-trained Language Representation, a framework for injecting knowledge into pre-trained language models (PLMs) by training on verbalized triples

MRR: Mean Reciprocal Rank—a metric used here to evaluate how highly the model ranks incorrect universal quantifiers (lower is better for this specific task)

Nguni languages: A group of Bantu languages including isiZulu and isiXhosa, characterized by specific noun class systems

Sotho-Tswana languages: A group of Bantu languages including Sepedi and SeSotho, distinct from Nguni in morphology