LA-UCL: LLM-Augmented Unsupervised Contrastive Learning Framework for Few-Shot Text Classification

📝 Paper Summary

Data Augmentation for Few-Shot Learning, Contrastive Learning

LA-UCL enhances few-shot text classification by using retrieval-guided LLMs to generate diverse augmented samples and optimizing them via novel group-level and sample-level unsupervised contrastive losses.

Core Problem

Few-shot text classification suffers from overfitting and poor class discrimination because existing data augmentation methods (such as simple paraphrasing) generate samples with little diversity and lack the cognitive ability that an LLM brings to generation.

Why it matters:

Traditional augmentation models generate samples too similar to the original, failing to expand the feature space effectively
Lack of diversity in augmented data exacerbates overfitting in low-resource settings
Models struggle to distinguish between semantically similar classes without richer, more discriminative training signals

Concrete Example: When augmenting the question 'Who is the Prime Minister of Russia?', traditional models produce repetitive variants like 'Is Vladimir Putin a prime minister?'. In contrast, a retrieval-augmented LLM can generate diverse, high-quality variants by leveraging external knowledge, preventing the model from overfitting to simple surface patterns.

Key Novelty

Retrieval-Guided LLM Augmentation with Dual Contrastive Losses

Uses retrieval-based in-context prompts to guide an LLM (ChatGPT) in generating data. For labeled data, it retrieves similar negative samples to force the LLM to generate *discriminative* positives (avoiding confusion).
For unlabeled data, it retrieves external web knowledge to help the LLM generate *diverse* and accurate paraphrases, expanding the semantic space.
Introduces two specific unsupervised contrastive losses: Group-Level (interacts with base classes to improve discrimination) and Sample-Level (pulls diverse augmentations of the same sample together to reduce overfitting).

Architecture

The overall LA-UCL framework, illustrating the two data augmentation strategies (Self-augmented and External-augmented) and the corresponding contrastive learning losses.

Evaluation Highlights

Outperforms ContrastNet by +1.59% in 5-shot setting on HWU64 dataset
Achieves +22.09% improvement over MLADA on HuffPost 1-shot classification
Surpasses state-of-the-art ContrastNet by +3.54% on HuffPost 1-shot setting

Breakthrough Assessment

7/10

Solid combination of LLM generation and contrastive learning. The retrieval-guided prompting to fix specific augmentation weaknesses (discrimination vs. diversity) is clever, though the underlying components (CL + LLM aug) are established concepts.

⚙️ Technical Details

Problem Definition

Setting: N-way K-shot text classification

Inputs: A support set S of labeled examples and a query set Q of unlabeled examples

Outputs: Predicted class labels for the query set Q
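
To make the N-way K-shot setting above concrete, here is a minimal sketch of how one episode might be sampled from a labeled pool. The function name `sample_episode` and the per-class query count `n_query` are hypothetical choices for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

def sample_episode(examples, n_way=5, k_shot=1, n_query=5, seed=None):
    """Sample one N-way K-shot episode from (text, label) pairs.

    Returns (support, query), each a list of (text, label) pairs.
    n_query is an assumed per-class query count, not a value from the paper.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)
    # Only classes with enough examples for both support and query qualify.
    eligible = [c for c, texts in by_label.items() if len(texts) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")
    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_label[c], k_shot + n_query)
        support += [(t, c) for t in picked[:k_shot]]
        query += [(t, c) for t in picked[k_shot:]]
    return support, query

# Toy data: 6 classes with 8 utterances each.
data = [(f"utterance {i} of class {c}", c) for c in "abcdef" for i in range(8)]
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=5, seed=0)
print(len(support), len(query))  # 5 25
```

The support set holds N x K labeled examples; the query examples keep their labels here only so that accuracy can be computed after prediction.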

Pipeline Flow

Data Augmentation (Offline/Pre-processing): Generate augmented samples using LLM with retrieval-based prompts
Training: Encoder maps text to embeddings
Loss Calculation: Compute Batch Supervised CL + Group-Level Unsupervised CL + Sample-Level Unsupervised CL

System Modules

LLM Augmenter (Self-Augmented) (Data Augmentation)

Generate discriminative positive samples for labeled data

Model or implementation: ChatGPT

LLM Augmenter (External-Augmented) (Data Augmentation)

Generate diverse paraphrases for unlabeled/external data to prevent overfitting

Model or implementation: ChatGPT

Text Encoder

Convert text to vector representations

Model or implementation: BERT-base

Novel Architectural Elements

Retrieval-based In-context Prompt Scheme: Injecting retrieved negative neighbors (for discrimination) or web knowledge (for diversity) into prompts
Group-Level Contrastive Loss: Interaction between current batch support set and 'base class groups' (augmented data as query set) to improve class separation
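
The retrieval-based in-context prompt scheme listed above could look roughly like the sketch below for the self-augmented (labeled) case. Everything here is an assumption for illustration: the prompt wording, the helper names, the toy intent labels, and the use of token-overlap (Jaccard) similarity as a stand-in retriever. The paper's actual prompt templates and retriever may differ.

```python
def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_negatives(target, label, pool, k=2):
    """Return the k pool items most similar to `target` whose label differs."""
    negatives = [(t, l) for t, l in pool if l != label]
    negatives.sort(key=lambda p: jaccard(target, p[0]), reverse=True)
    return negatives[:k]

def build_self_aug_prompt(target, label, pool, k=2, n_outputs=3):
    """Assemble a hypothetical prompt that shows the LLM hard negatives."""
    lines = [
        f"Rewrite the sentence below in {n_outputs} different ways.",
        f"Every rewrite must clearly express the intent '{label}'.",
        "It must not be confusable with these similar sentences from other intents:",
    ]
    for text, neg_label in retrieve_negatives(target, label, pool, k):
        lines.append(f"- ({neg_label}) {text}")
    lines.append(f"Sentence: {target}")
    return "\n".join(lines)

# Toy pool of labeled utterances (made up for this example).
pool = [
    ("how do i activate my new card", "activate_card"),
    ("my card has not arrived yet", "card_arrival"),
    ("what is the exchange rate today", "exchange_rate"),
    ("when will my new card arrive", "card_arrival"),
]
prompt = build_self_aug_prompt("i want to activate my card", "activate_card", pool)
print(prompt)
```

The resulting string would then be sent to the LLM. Showing the nearest other-class sentences in the prompt is what pushes the generated positives away from the classes most likely to be confused with the target.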

Modeling

Base Model: BERT-base (for encoding), ChatGPT (for augmentation)

Training Method: Contrastive Learning

Objective Functions:

Purpose: Supervised Contrastive Loss (Batch).

Formally: Standard SupCon loss pulling same-class samples together in the batch.

Purpose: Group-Level Unsupervised Contrastive Loss (Self-Augmented).

Formally: Contrastive loss between original support set and LLM-augmented queries, treating augmented versions as positive pairs.

Purpose: Sample-Level Unsupervised Contrastive Loss (External-Augmented).

Formally: Contrastive loss pulling unlabeled sample and its web-augmented variants together.
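
All three objectives above share the same basic shape: pull embeddings that belong together toward each other and push the rest apart. The sketch below is a generic SupCon-style loss in plain Python, not the paper's exact formulation. It assumes cosine similarity and a single `temperature` argument. The `group_ids` argument is a hypothetical device: it would hold class labels for the batch supervised loss, or the index of the source sample for the sample-level loss, so that a sample and its augmentations count as positives.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(embeddings, group_ids, temperature=1.0):
    """SupCon-style loss: items sharing a group id are positives of each other."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and group_ids[j] == group_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Similarity of the anchor to every other item, scaled by temperature.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Two tight clusters in 2-D (toy embeddings).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = contrastive_loss(emb, [0, 0, 1, 1])     # groups match the clusters
misaligned = contrastive_loss(emb, [0, 1, 0, 1])  # groups cut across clusters
print(aligned < misaligned)  # True
```

The loss is lower when the grouping agrees with the geometry of the embeddings, which is the signal that fine-tuning the encoder would exploit.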

Adaptation: Fine-tuning of BERT encoder

Trainable Parameters: All BERT parameters + scalar weights alpha, beta for losses
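
Given the three losses and the two scalar weights listed above, one plausible form of the combined objective is a weighted sum. Which weight attaches to which auxiliary loss is an assumption here, as are the subscript names:

```latex
\mathcal{L} = \mathcal{L}_{\text{batch}} + \alpha\,\mathcal{L}_{\text{group}} + \beta\,\mathcal{L}_{\text{sample}}
```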

Key Hyperparameters:

learning_rate: 1e-6
optimizer: Adam
temperature_batch: 3.0 to 8.0
temperature_group: 3.0 to 8.0
temperature_sample: 3.0 to 8.0
N_G (groups): 10
N_S (unlabeled samples): 10

Compute: NVIDIA Tesla V100 PCIe 32GB GPU

Comparison to Prior Work

vs. ContrastNet: LA-UCL adds LLM-based augmentation and auxiliary unsupervised losses; ContrastNet relies on standard supervised contrastive learning
vs. PROTAUGMENT: LA-UCL uses LLM (ChatGPT) + Retrieval for augmentation instead of a smaller paraphrasing model; LA-UCL targets discrimination/diversity specifically via prompting
vs. ChatAug: Uses ChatGPT for augmentation but without the specific retrieval-based negative/external context injection strategies [not cited in paper]

Limitations

Reliance on commercial LLM (ChatGPT) and web retrieval (Bing) during data preparation
External knowledge retrieval is optional and is omitted for the intent datasets because intent utterances are subjective
Sequence expansion for extremely long texts remains challenging for LLMs with limited context
No cost analysis of using LLM API for augmentation

Reproducibility

No code release is provided. Prompt templates are described in the paper. Pre-trained BERT models are standard. Datasets (Banking77, HWU64, etc.) are public.

📊 Experiments & Results

Evaluation Setup

5-way 1-shot and 5-way 5-shot classification

Benchmarks:

Banking77 (Intent Classification)
HWU64 (Intent Classification)
Clinic150 (Intent Classification)
Liu57 (Intent Classification)
HuffPost (News Headlines Classification)
Reuters (News Classification)

Metrics:

Accuracy
Statistical methodology: Average accuracy over 600 (intent) or 1000 (news) samples; 5 runs using re-split datasets; standard deviation reported.
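
The reporting described above (mean accuracy over 5 runs with standard deviation) amounts to the small calculation sketched below. The per-run accuracies are made-up numbers for illustration, not results from the paper, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
import statistics

def summarize_runs(run_accuracies):
    """Return (mean, sample standard deviation) across independent runs."""
    mean = statistics.mean(run_accuracies)
    std = statistics.stdev(run_accuracies)  # divides by n - 1
    return mean, std

# Hypothetical accuracies (%) from 5 runs on re-split data.
runs = [87.4, 86.9, 88.1, 87.2, 87.8]
mean, std = summarize_runs(runs)
print(f"{mean:.2f} +/- {std:.2f}")
```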

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Comparison against baselines on intent classification datasets (Banking77, HWU64, Liu57, Clinic150).
HWU64	Accuracy (5-way 1-shot)	86.56	89.46	+2.90
HWU64	Accuracy (5-way 5-shot)	92.57	94.04	+1.47
Liu57	Accuracy (5-way 1-shot)	85.89	87.49	+1.60
Banking77	Accuracy (5-way 1-shot)	91.18	92.63	+1.45
Comparison on news classification datasets (HuffPost, Reuters).
HuffPost	Accuracy (5-way 1-shot)	53.06	54.94	+1.88
Reuters	Accuracy (5-way 1-shot)	86.42	87.70	+1.28
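Ablation study on Liu57 showing the impact of removing components; in these rows the Baseline column is the full LA-UCL model and the This Paper column is the ablated variant.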
Liu57	Accuracy (5-way 1-shot)	87.49	81.39	-6.10
Liu57	Accuracy (5-way 1-shot)	87.49	86.14	-1.35

Experiment Figures

t-SNE visualization of sample representations for 5 similar classes from Liu57 dataset (ContrastNet vs. LA-UCL).

Confusion matrix heatmaps comparing error rates on specific similar class pairs.

Main Takeaways

Consistent improvements across all 6 datasets (intent and news) in both 1-shot and 5-shot settings.
Ablation studies confirm that both the group-level (discrimination) and sample-level (diversity) losses are necessary; in the Liu57 ablation, removing a component lowers 5-way 1-shot accuracy by 1.35 to 6.10 points.
LLM augmentation outperforms traditional paraphrasing models (PROTAUGMENT), but retrieval guidance is key: removing retrieval drops accuracy by about 2% on Reuters.
Visual analysis (t-SNE and error heatmaps) shows reduced confusion between semantically similar classes compared to ContrastNet.

📚 Prerequisite Knowledge

Prerequisites

Contrastive Learning (supervised and unsupervised)
Few-shot Learning (N-way K-shot setup)
Large Language Models for Data Augmentation
In-context Learning / Prompting

Key Terms

Mixup: A data augmentation technique where new examples are created by linear interpolation of input pairs and their labels; here used conceptually to inspire prompts that combine positive and negative sample information
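
For reference, the standard Mixup interpolation of two training pairs $(x_i, y_i)$ and $(x_j, y_j)$ can be written as below. In the original Mixup formulation, $\lambda$ is drawn from a Beta distribution; that detail is general background and not something this summary attributes to LA-UCL, which only borrows the idea at the prompt level.

```latex
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad
\lambda \in [0, 1]
```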

BM25: A ranking function used in information retrieval to estimate the relevance of documents to a given search query
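
As an illustration of the ranking function defined above, here is a minimal Okapi BM25 scorer in plain Python. It is a sketch under stated assumptions: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant with `+ 1` inside the logarithm. How the paper configures its retriever is not specified in this summary.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the prime minister of russia met reporters",
    "stock markets fell sharply today",
    "russia elects a new prime minister",
]
scores = bm25_scores("prime minister of russia", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the prime minister of russia met reporters
```

The top-scoring documents are the ones a retrieval step would pass into the prompt as context.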

Support Set: In few-shot learning, the small set of labeled examples available for each class during training/inference

Query Set: The set of examples to be classified using the information from the support set

In-context prompt: Providing examples (demonstrations) within the LLM's input prompt to guide its generation style and format