PAIGE converts textbook chapters into personalized dual-host podcasts using generative AI, demonstrating that tailoring content to student interests enhances learning outcomes compared to generalized audio or reading.
Core Problem
Traditional textbooks are often perceived as boring or irrelevant by students, leading to low engagement, while manual creation of engaging alternative formats like podcasts is tedious for educators.
Why it matters:
University students increasingly prefer podcasts/media over reading but lack high-quality, curriculum-aligned audio resources
Existing personalization often focuses only on difficulty adjustment rather than engaging students through interest-based context
Standard text-to-speech lacks the engaging, conversational dynamic of human-hosted educational podcasts
Concrete Example: A psychology student reading a generic government textbook might find it dry. PAIGE uses their profile to generate a podcast in which the hosts explain government concepts through psychology analogies, making the material more relevant.
Generates conversational podcast scripts from textbooks using a 'Skeleton of Thought' approach to manage structure and length
Integrates user profiles (major, interests, learning style) into the generation context to tailor examples and dialogue
Uses a dual-speaker architecture (Host and Expert personas) with high-quality neural audio to simulate natural educational dialogue
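As a rough illustration of how a learner profile might be folded into the generation context, the sketch below bundles the three profile attributes listed above with the chapter text. The `UserProfile` field names, the `build_context` helper, and the prompt wording are all hypothetical; the paper's actual prompts are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Learner attributes described for PAIGE (field names are illustrative)."""
    major: str
    interests: list[str] = field(default_factory=list)
    learning_style: str = "unspecified"


def build_context(chapter_text: str, profile: UserProfile) -> str:
    """Assemble a hypothetical generation context: tailoring instructions plus the chapter."""
    interests = ", ".join(profile.interests) if profile.interests else "none given"
    return (
        "You are writing a two-person educational podcast (Host and Expert).\n"
        f"Listener major: {profile.major}\n"
        f"Listener interests: {interests}\n"
        f"Preferred learning style: {profile.learning_style}\n"
        f"Use examples and analogies drawn from {profile.major} where they fit.\n\n"
        "CHAPTER:\n"
        f"{chapter_text}"
    )


profile = UserProfile(major="Psychology", interests=["sports"], learning_style="examples-first")
print(build_context("Federalism divides power between national and state governments.", profile))
```

Keeping the profile in a typed structure rather than free text makes it easy to reuse the same attributes at every later stage of the pipeline.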
Architecture
Figure: The PAIGE generation pipeline, showing how context flows into transcript creation.
Evaluation Highlights
Large-scale user study (n=180) across three subjects (Philosophy, Psychology, Government) comparing Textbooks, Generalized Podcasts, and Personalized Podcasts
Personalized podcasts led to significantly improved learning outcomes compared to generalized podcasts (specific numeric scores are not reported in this summary)
Students rated AI-generated podcasts as more enjoyable than traditional textbook reading regardless of personalization
Breakthrough Assessment
7/10
Novel application of GenAI for end-to-end educational content transformation. Strong study design (n=180), though reliance on proprietary models (Gemini/AudioLM) limits reproducibility.
⚙️ Technical Details
Problem Definition
Setting: Transforming static educational text into engaging audio formats tailored to learner profiles
Inputs: A textbook chapter plus a learner profile (major, interests, learning style)
Outputs: Audio podcast file with two distinct voices (Host and Expert)
Pipeline Flow
Context Ingestion (Chapter + Profile)
Configuration Generation
Outline Generation
Transcript Generation
Fact Checking
Audio Synthesis
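The stages above can be sketched as a chain of calls in which each stage receives the outputs of the earlier ones. This is a minimal sketch under stated assumptions: `fake_llm` is a hypothetical stand-in for a text model (the paper uses Gemini 1.5 Pro), the stage prompts are invented, fact checking is reduced to a single model pass, and audio synthesis is omitted.

```python
from typing import Callable

# Hypothetical model interface: prompt in, text out.
LLM = Callable[[str], str]


def run_pipeline(chapter: str, profile: str, llm: LLM) -> dict[str, str]:
    """Run the text stages in order; every later prompt carries earlier outputs forward."""
    context = f"CHAPTER:\n{chapter}\nPROFILE:\n{profile}"                        # 1. context ingestion
    config = llm(f"Write a podcast config for:\n{context}")                       # 2. configuration
    outline = llm(f"Outline the episode.\n{config}\n{context}")                   # 3. outline
    transcript = llm(f"Write the dialogue.\n{outline}\n{config}\n{context}")      # 4. transcript
    checked = llm(f"Fact-check and correct.\n{transcript}\nSOURCE:\n{chapter}")  # 5. fact checking
    # 6. Audio synthesis would consume `checked`; it is omitted from this sketch.
    return {"config": config, "outline": outline, "transcript": transcript, "checked": checked}


def fake_llm(prompt: str) -> str:
    """Stand-in model that echoes the instruction line, so the data flow is visible."""
    return "[" + prompt.splitlines()[0] + "]"


stages = run_pipeline("Federalism divides power between levels of government.",
                      "major=Psychology; interests=sports", fake_llm)
print(stages["checked"])
```

Passing the chapter back into the fact-checking prompt reflects the idea that the final transcript should be checked against the source material rather than against the model's own earlier outputs.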
System Modules
Configuration Generator (Content Planning)
Create a config file defining host/expert personas and tailoring content strategy to user profile
Model or implementation: Gemini 1.5 Pro
Outline Generator (Content Planning)
Generate a structural skeleton of the conversation to ensure logical flow
Model or implementation: Gemini 1.5 Pro
Transcript Generator
Write the full dialogue script between Host and Expert
Model or implementation: Gemini 1.5 Pro
Speech Synthesizer
Convert transcript to audio using distinct voices
Model or implementation: AudioLM-based TTS
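Before synthesis, a two-speaker transcript has to be split into turns and each turn routed to the right voice. A minimal sketch, assuming a `Host:` / `Expert:` line-prefix format and placeholder voice IDs (`voice_a`, `voice_b`); the paper's actual transcript format and the interface of its AudioLM-based TTS are not described here.

```python
import re

# Hypothetical voice identifiers; a real TTS backend would define its own.
VOICES = {"Host": "voice_a", "Expert": "voice_b"}
TURN = re.compile(r"^(Host|Expert):\s*(.+)$")


def to_segments(transcript: str) -> list[tuple[str, str]]:
    """Split 'Speaker: text' lines into (voice_id, text) pairs, merging continuation lines."""
    segments: list[tuple[str, str]] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            segments.append((VOICES[match.group(1)], match.group(2)))
        elif segments:
            # A line without a speaker tag continues the previous turn.
            voice, text = segments[-1]
            segments[-1] = (voice, f"{text} {line}")
        else:
            raise ValueError(f"Transcript must start with a speaker tag: {line!r}")
    return segments


demo = """Host: Why do states and the federal government share power?
Expert: Think of it like a split-attention task in psychology.
Each level handles different demands."""
for voice, text in to_segments(demo):
    print(voice, "->", text)
```

Each `(voice_id, text)` pair could then be synthesized separately and concatenated in order.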
Novel Architectural Elements
Multi-stage generation pipeline (Config → Outline → Transcript) specifically designed to inject user profile constraints at the structural level before content generation
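To make "injecting profile constraints at the structural level" concrete, the sketch below fixes the personas, the analogy domain, and a length budget in a config before any dialogue is written, then derives outline slots that inherit those constraints. In PAIGE the configuration is produced by Gemini 1.5 Pro; the deterministic functions here, their keys, and the 150-words-per-minute speaking rate are hypothetical assumptions.

```python
import json


def make_config(major: str, interests: list[str], target_minutes: int = 10) -> dict:
    """Hypothetical config: personas and tailoring strategy fixed before content generation."""
    return {
        "host": {"name": "Host", "role": "asks questions a curious student would ask"},
        "expert": {"name": "Expert", "role": "explains concepts accurately"},
        "analogy_domain": major,
        "hooks": interests[:2],               # at most two interest-based hooks
        "word_budget": target_minutes * 150,  # assumes roughly 150 spoken words per minute
    }


def outline_slots(config: dict, topics: list[str]) -> list[dict]:
    """Give each outline point an equal share of the budget and the same analogy constraint."""
    per_topic = config["word_budget"] // max(len(topics), 1)
    return [
        {"topic": t, "words": per_topic, "analogy_from": config["analogy_domain"]}
        for t in topics
    ]


cfg = make_config("Psychology", ["basketball", "music", "gaming"])
print(json.dumps(outline_slots(cfg, ["federalism", "checks and balances"]), indent=2))
```

Because every outline slot carries the constraint, the transcript stage cannot drift away from the profile without explicitly ignoring its own structure.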
Modeling
Base Model: Gemini 1.5 Pro (Text), AudioLM-based model (Audio)
Compute: Not reported in the paper
Comparison to Prior Work
vs. Interactive Tutors: PAIGE transforms the *core material* itself rather than just answering questions about it
vs. News Podcasts: PAIGE incorporates deep personalization (user major/interests) rather than just content summarization
vs. Selection-based Personalization: Generates content in real time rather than selecting from pre-defined passages
Limitations
Evaluation limited to three specific chapters/subjects
Relies on proprietary models (Gemini/AudioLM) which may not be accessible to all researchers
Study does not measure long-term retention, only immediate recall
Reproducibility
Not yet released. The paper mentions using Gemini 1.5 Pro and an AudioLM-based model (likely internal Google tools, given the authors' affiliation). No code URL provided.
📊 Experiments & Results
Evaluation Setup
Between-subjects user study (n=180) comparing three modalities across three subjects
Benchmarks:
Introduction to Philosophy (OpenStax) (Knowledge Retention Quiz) [New]
Psychology 2e (OpenStax) (Knowledge Retention Quiz) [New]
American Government 3e (OpenStax) (Knowledge Retention Quiz) [New]
Statistical methodology: 3×3 ANOVA (modality × subject) with Tukey’s HSD post hoc analysis
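For reference, the F statistics behind a balanced 3×3 between-subjects ANOVA (modality × subject) can be computed from sums of squares as below. This pure-Python sketch assumes equal group sizes, uses made-up quiz scores that are not from the study, and leaves out the Tukey's HSD step; in practice a statistics package (for example statsmodels, or SciPy's `scipy.stats.tukey_hsd` for the post hoc comparisons) would be used.

```python
from statistics import mean


def two_way_anova(cells: list[list[list[float]]]) -> dict[str, float]:
    """F statistics for a balanced two-way between-subjects ANOVA.

    cells[i][j] holds the n scores for level i of factor A (e.g. modality)
    and level j of factor B (e.g. subject). Every cell must have the same n.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("design must be balanced")
    grand = mean(x for row in cells for c in row for x in c)
    cell_means = [[mean(c) for c in row] for row in cells]
    # With equal n, the mean of cell means equals the marginal mean.
    a_means = [mean(row_means) for row_means in cell_means]
    b_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b  # interaction
    ss_w = sum((x - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])
    ms_w = ss_w / (a * b * (n - 1))  # within-group mean square
    return {
        "F_A": (ss_a / (a - 1)) / ms_w,
        "F_B": (ss_b / (b - 1)) / ms_w,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_w,
    }


# Made-up scores, NOT study data: rows = textbook, generalized, personalized; columns = three subjects.
toy = [
    [[6, 7, 5], [7, 6, 6], [5, 6, 7]],
    [[7, 8, 6], [7, 7, 8], [6, 7, 7]],
    [[8, 9, 8], [8, 9, 7], [7, 8, 9]],
]
print(two_way_anova(toy))
```

F_A tests the modality effect, F_B the subject effect, and F_AB whether the modality effect differs by subject, which is the kind of pattern "subject-specific effects" refers to.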
Main Takeaways
Personalized AI-generated podcasts lead to significantly improved learning outcomes compared to generalized podcasts (subject-specific effects observed)
Students find the AI-generated podcast format significantly more enjoyable and engaging than reading traditional textbooks
Personalization enhances content relevance, addressing common student complaints of 'boredom' and 'irrelevance' associated with textbooks
📚 Prerequisite Knowledge
Prerequisites
Generative AI for text and speech
Basic understanding of Prompt Engineering
Personalized Learning theory
Key Terms
UEQ: User Experience Questionnaire—a widely used tool for measuring user satisfaction along dimensions like Attractiveness and Stimulation
Skeleton of Thought: A prompting technique in which the LLM first generates a structural outline (skeleton) and then expands each point into full content, which can improve coherence and reduce generation time because points can be expanded in parallel
AudioLM: A language modeling approach to audio generation that maps audio to discrete tokens and models them as a sequence, producing high-fidelity speech
TTS: Text-to-Speech—technology that converts written text into spoken audio
RAG: Retrieval-Augmented Generation—using external data to ground LLM responses (mentioned in related work as a contrast to this system's approach)
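The Skeleton of Thought pattern can be sketched in two phases: one call produces a short numbered list of points, then each point is expanded independently, which is what makes parallel expansion possible. `outline_llm` and `expand_llm` are hypothetical stand-ins for model calls; exactly how PAIGE prompts its outline and transcript stages is not specified in this summary.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Matches skeleton lines such as "1. Define federalism".
POINT = re.compile(r"^\s*\d+\.\s*(.+)$")


def skeleton_of_thought(
    question: str,
    outline_llm: Callable[[str], str],
    expand_llm: Callable[[str, str], str],
    max_workers: int = 4,
) -> str:
    """Phase 1: request a numbered skeleton. Phase 2: expand every point concurrently."""
    skeleton = outline_llm(f"List 3-5 short numbered points answering: {question}")
    points = [m.group(1) for m in map(POINT.match, skeleton.splitlines()) if m]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so sections follow the skeleton order.
        sections = list(pool.map(lambda point: expand_llm(question, point), points))
    return "\n\n".join(sections)


def fake_outline(prompt: str) -> str:
    """Stand-in outline model returning a fixed skeleton."""
    return "1. Define federalism\n2. Give a psychology analogy\n3. Summarize"


def fake_expand(question: str, point: str) -> str:
    """Stand-in expansion model; a real one would write a full dialogue segment."""
    return f"[{point}]"


print(skeleton_of_thought("What is federalism?", fake_outline, fake_expand))
```

Threads suit this pattern because real model calls are I/O-bound network requests; the ordered `map` keeps the final text in skeleton order even when expansions finish out of order.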