
Optimising Language Models for Downstream Tasks: A Post-Training Perspective

Z Shi
University College London
arXiv, June 2025

📝 Paper Summary

Tags: Parameter-Efficient Fine-Tuning (PEFT) · Semi-supervised Learning · Instruction Tuning · Reasoning Benchmarks
This thesis proposes a suite of post-training methods (prompt-based continued pre-training, decomposed prompt tuning, and instruction modelling) to adapt language models efficiently and robustly to downstream tasks with limited data.
Core Problem
Standard fine-tuning of Large Language Models (LLMs) often fails to leverage unlabelled data effectively, incurs high computational costs, and struggles with instruction following in low-resource settings.
Why it matters:
  • Fine-tuning large models on small datasets leads to overfitting and poor generalisation.
  • The computational cost of full fine-tuning or even standard prompt tuning becomes prohibitive for real-time or resource-constrained applications.
  • Current evaluation benchmarks often fail to capture specific cognitive abilities like multi-hop spatial reasoning, masking model limitations.
Concrete Example: When adapting a model to a sentence-pair task, standard continued pre-training on task-related text can actually degrade performance compared to no pre-training. Similarly, prompt tuning increases inference latency due to longer input sequences.
Key Novelty
Unified suite of efficient adaptation techniques (PCP, DePT, IM)
  • Prompt-based Continued Pre-training (PCP): Reformulates continued pre-training as a prompt-based task to better align unlabelled data with downstream fine-tuning formats.
  • Decomposed Prompt Tuning (DePT): Splits the soft prompt into a shorter soft prompt plus a pair of low-rank matrices that additively update the frozen input embeddings, shortening the input sequence (and hence inference latency) while preserving expressivity.
  • Instruction Modelling (IM): Applies the language-modelling loss to instruction/prompt tokens in addition to output tokens during tuning, which acts as a regulariser against overfitting, especially when instructions are long and outputs are short.
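The DePT decomposition above can be sketched in a few lines of numpy. This is a toy illustration with made-up dimensions and variable names (not the thesis's actual implementation, which trains these parameters inside a transformer): a vanilla soft prompt of length m is replaced by a shorter prompt of length s plus a rank-r additive update to the frozen token embeddings, so the total sequence fed to the model is shorter.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16   # embedding dimension (toy size; real models use 768+)
n = 10   # input sequence length
m = 8    # vanilla soft-prompt length
s = 4    # shortened soft-prompt length in DePT
r = 2    # rank of the low-rank embedding update

X = rng.normal(size=(n, d))            # frozen input token embeddings

# Vanilla prompt tuning: prepend m trainable vectors -> sequence length n + m.
P_vanilla = rng.normal(size=(m, d))
seq_vanilla = np.concatenate([P_vanilla, X])

# DePT: a shorter trainable prompt plus a trainable low-rank update A @ B
# added to the embeddings -> sequence length n + s, with s < m.
P_short = rng.normal(size=(s, d))
A = rng.normal(size=(n, r))
B = rng.normal(size=(r, d))
seq_dept = np.concatenate([P_short, X + A @ B])

print(seq_vanilla.shape)  # (18, 16)
print(seq_dept.shape)     # (14, 16) -- shorter sequence, lower latency
```

Because attention cost grows with sequence length, the shorter DePT input is what yields the memory and training-time savings reported below, while the low-rank update recovers the capacity lost by shrinking the prompt.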
Evaluation Highlights
  • Decomposed Prompt Tuning (DePT) reduces memory usage by ~20% and training time by ~15% compared to vanilla Prompt Tuning while maintaining or exceeding performance.
  • Instruction Modelling (IM) improves AlpacaEval 1.0 win rates by more than 100% relative to standard instruction tuning in low-resource settings.
  • Prompt-based Continued Pre-training (PCP) consistently improves prompt-based fine-tuning performance in semi-supervised settings, outperforming standard self-training methods.
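The instruction-modelling objective behind the IM results above comes down to which tokens receive the language-modelling loss. A minimal numpy sketch (toy per-token losses and a hypothetical helper name; real implementations mask framework tensors): standard instruction tuning zeroes out the loss on instruction tokens, while IM keeps it.

```python
import numpy as np

def masked_lm_loss(nll, n_instruction, instruction_modelling):
    """Average per-token negative log-likelihood over the tokens that
    receive loss. `nll` covers instruction tokens followed by output tokens."""
    mask = np.ones_like(nll)
    if not instruction_modelling:
        # Standard instruction tuning: no loss on the instruction tokens.
        mask[:n_instruction] = 0.0
    return float((nll * mask).sum() / mask.sum())

# Toy example: 4 instruction tokens, then 2 output tokens.
nll = np.array([2.0, 2.0, 2.0, 2.0, 1.0, 1.0])
print(masked_lm_loss(nll, 4, instruction_modelling=False))  # 1.0 (outputs only)
print(masked_lm_loss(nll, 4, instruction_modelling=True))   # 10/6 ~ 1.667
```

When instructions are long and outputs are short, the output-only objective trains on very few tokens per example; including the instruction tokens supplies extra training signal, which is the regularising effect the thesis credits for IM's low-resource gains.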
Breakthrough Assessment
7/10
Offers practical, model-agnostic improvements for efficiency and low-resource robustness. While not a fundamental architecture shift, the methods (especially DePT and IM) provide significant utility for deploying LLMs.