
Targeted Lexical Injection: Unlocking Latent Cross-Lingual Alignment in Lugha-Llama via Early-Layer LoRA Fine-Tuning

Stanley Ngugi
arXiv (2025)
Pretraining Benchmark

📝 Paper Summary

Cross-lingual alignment · Low-resource languages (LRLs) · Parameter-Efficient Fine-Tuning (PEFT)
The paper improves Swahili-English lexical alignment in a Llama-based model by showing that early layers already encode these translation equivalences with near-perfect similarity, then fine-tuning the model so that this knowledge is preserved through to the final output.
Core Problem
Multilingual LLMs often show poor output-level alignment for low-resource languages like Swahili, despite possessing strong latent knowledge of these translations in their internal layers.
Why it matters:
  • Low-resource languages like Swahili are underserved by current LLMs, hindering equitable access to technology.
  • Suboptimal lexical alignment degrades performance in critical downstream tasks like translation and cross-lingual information retrieval.
  • Current methods often assume the model lacks the knowledge, when in fact the knowledge exists but is lost as information propagates through the network's deeper layers.
Concrete Example: The model Lugha-Llama inherently knows that Swahili 'mkate' means 'bread' (near-perfect similarity in Layer 2), but by the time the information reaches the final output layer (Layer 31), the representation has degraded to a low similarity (~0.32), failing to reflect this knowledge.
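The layer-wise probe behind this example can be illustrated with a short sketch. The vectors below are synthetic stand-ins (in the paper they would be Lugha-Llama hidden states for 'mkate' and 'bread' at Layers 2 and 31); the dimension and noise scales are illustrative assumptions, chosen only to reproduce the qualitative pattern of early alignment degrading by the final layer.

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine similarity between two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

random.seed(0)
dim = 512  # hypothetical hidden size, for illustration only

# Stand-in for the English ('bread') representation.
english = [random.gauss(0, 1) for _ in range(dim)]

# Early layer: Swahili ('mkate') representation nearly identical to English.
swahili_layer2 = [e + random.gauss(0, 0.05) for e in english]
# Final layer: the representation has drifted; alignment degrades.
swahili_layer31 = [e + random.gauss(0, 2.0) for e in english]

sim_early = cosine_similarity(swahili_layer2, english)
sim_late = cosine_similarity(swahili_layer31, english)
print(f"Layer 2 similarity:  {sim_early:.3f}")   # near 1.0
print(f"Layer 31 similarity: {sim_late:.3f}")    # much lower
```

Running the same probe at every layer, as the paper does, traces where in the network the alignment is strongest and where it decays.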
Key Novelty
Targeted Lexical Injection (TLI)
  • Empirically identifies a specific early layer (Layer 2) where cross-lingual alignment is naturally maximal before it degrades deeper in the network.
  • Uses LoRA (Low-Rank Adaptation) fine-tuning with a contrastive objective specifically on embeddings from this optimal early layer to reinforce and propagate this existing knowledge to the output.
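The contrastive objective over early-layer embeddings can be sketched with an InfoNCE-style loss: each Swahili embedding in a batch should be closer to its own English translation than to any other English embedding in the batch. This is a minimal stdlib sketch, not the paper's exact formulation; the batch size, dimension, and temperature are illustrative assumptions.

```python
import math
import random

def cos(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce_loss(sw_embs, en_embs, temperature=0.07):
    """Symmetric-in-spirit InfoNCE over one direction: for each Swahili
    embedding, the matching English embedding is the positive and all
    other English embeddings in the batch are negatives."""
    n = len(sw_embs)
    total = 0.0
    for i in range(n):
        logits = [cos(sw_embs[i], en_embs[j]) / temperature for j in range(n)]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)  # cross-entropy with target i
    return total / n

random.seed(0)
n, dim = 8, 64  # toy batch size and embedding dimension

# English-side embeddings (stand-ins for layer-2 hidden states).
en = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
# Aligned Swahili embeddings: small perturbations of their translations.
sw_aligned = [[x + random.gauss(0, 0.05) for x in v] for v in en]
# Misaligned baseline: unrelated random embeddings.
sw_random = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

loss_aligned = info_nce_loss(sw_aligned, en)
loss_random = info_nce_loss(sw_random, en)
print(f"aligned loss: {loss_aligned:.4f}, random loss: {loss_random:.4f}")
```

During TLI training, this loss would be computed on layer-2 embeddings while gradients flow only through the LoRA adapter weights, keeping the update parameter-efficient.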
Evaluation Highlights
  • +28.08% improvement in average cosine similarity for trained Swahili-English word pairs (from 0.3211 to 0.4113).
  • +28.32% improvement generalizes to unseen control word pairs (from 0.3143 to 0.4033), showing the method improves the mechanism rather than just memorizing pairs.
  • Statistical significance confirmed with extremely low p-values (p < 10^-240) for both trained and control sets.
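Extremely small p-values of this kind typically come from a paired test over per-pair similarity changes across many word pairs. The stdlib sketch below computes the paired t-statistic on synthetic data mimicking the reported shift (means 0.3211 to 0.4113); the sample size and noise scales are illustrative assumptions, not the paper's.

```python
import math
import random

def paired_t_statistic(before, after):
    """t statistic for a paired test on per-pair similarity changes."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

random.seed(1)
# Synthetic per-pair cosine similarities before and after TLI fine-tuning.
before = [random.gauss(0.3211, 0.05) for _ in range(1000)]
after = [b + random.gauss(0.0902, 0.02) for b in before]

t = paired_t_statistic(before, after)
print(f"t = {t:.1f}")  # a very large t implies a vanishingly small p-value
```

A consistent per-pair improvement with low variance, as here, is what drives the t-statistic (and hence the p-value) to such extremes even for moderate sample sizes.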
Breakthrough Assessment
7/10
Novel insight about layer-wise alignment degradation and a targeted, parameter-efficient fix. Strong empirical results for Swahili, though limited to lexical alignment on one language pair so far.