COMIC: Agentic Sketch Comedy Generation

📝 Paper Summary

Agentic Video Generation, Computational Humor, Creative Content Generation

COMIC is a multi-agent framework that generates sketch comedy videos by evolving scripts through competitive tournaments and iteratively refining visual shots using critics aligned with YouTube viewer engagement.

Core Problem

Generating funny, long-form video content is difficult because humor is subjective and context-dependent, while current video models struggle with narrative consistency over long durations.

Why it matters:

Standard LLMs often produce 'dad jokes' or cliché puns rather than genuine comedy.
Fixed objective functions fail for creative tasks because humor has no single ground truth and evolves with exposure (jokes get stale).
Existing video generation pipelines typically produce short, disconnected clips lacking the structural coherence needed for storytelling.

Concrete Example: Asked to write a sketch, a standard model might output generic, unfunny dialogue. COMIC instead simulates a writers' room in which 'island' populations of scripts compete. A losing script about a mundane topic might be rewritten to incorporate a surreal twist from a winning script, and over successive rounds evolve into a high-quality sketch.

Key Novelty

Content Optimization via Multi-agent Iterative Competition (COMIC)

Replaces fixed reward functions with relative fitness determined by pairwise tournaments, in which losing scripts are revised using feedback from winners (simulating a writers' room).
Uses distinct 'islands' of script populations, each governed by different critic personas, to preserve diversity in comedic styles (e.g., slapstick vs. dry wit).
Introduces a 'Generate-and-Select' method for critics, creating a pool of diverse evaluator agents and retaining only those that correctly predict real-world YouTube engagement statistics.
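
The tournament idea in the first two points can be illustrated with a minimal sketch. It is not the paper's implementation: the `Script` class, the `Judge` and `Rewrite` signatures, the random pairing, and the "bye" for an odd script out are all assumptions made for illustration, and the toy judge and rewriter stand in for what would be LLM calls. Each island gets its own judge, which mirrors the per-island critic personas.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Script:
    text: str
    wins: int = 0


# Hypothetical signatures; in a real system both would be LLM calls.
Judge = Callable[[Script, Script], Script]    # returns the preferred script
Rewrite = Callable[[Script, Script], Script]  # (loser, winner) -> revised loser


def run_round(island: List[Script], judge: Judge, rewrite: Rewrite,
              rng: random.Random) -> List[Script]:
    """One round on one island: pair scripts at random, keep each winner,
    and replace each loser with a revision informed by the winner."""
    order = island[:]
    rng.shuffle(order)
    next_gen: List[Script] = []
    for a, b in zip(order[0::2], order[1::2]):
        winner = judge(a, b)
        loser = b if winner is a else a
        winner.wins += 1
        next_gen.append(winner)
        next_gen.append(rewrite(loser, winner))
    if len(order) % 2 == 1:
        next_gen.append(order[-1])  # odd script out gets a bye (assumption)
    return next_gen


def evolve(islands: List[List[Script]], judges: List[Judge], rewrite: Rewrite,
           rounds: int, seed: int = 0) -> List[List[Script]]:
    """Run several rounds; each island is scored only by its own judge."""
    rng = random.Random(seed)
    for _ in range(rounds):
        islands = [run_round(isl, j, rewrite, rng)
                   for isl, j in zip(islands, judges)]
    return islands


# Toy stand-ins so the sketch runs without any model.
def prefer_longer(a: Script, b: Script) -> Script:
    return a if len(a.text) >= len(b.text) else b


def borrow_from_winner(loser: Script, winner: Script) -> Script:
    return Script(f"{loser.text} [twist from: {winner.text}]")
```

Because fitness here is only ever relative (who beat whom in a pair), no absolute humor score is needed, which is the point of replacing a fixed reward function.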

Architecture

[Figure: The COMIC framework pipeline, from script evolution to video realization.]

Evaluation Highlights

Outperforms 'Single Best' critic baseline on Studio C, VLDL, and SNL engagement prediction tasks (e.g., +6.5% accuracy on Studio C top-vs-bottom).
Achieves state-of-the-art performance in agentic video generation, producing results approaching the quality of professionally produced sketches.
Demonstrates effective test-time scaling: increasing the number of rendering iterations directly improves visual quality without retraining.
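
One plausible reading of the engagement prediction task in the first point, combined with the 'Generate-and-Select' idea for critics, is sketched below. The summary does not specify how critics are scored or filtered, so the pairwise formulation, the `min_accuracy` threshold, and all names here are assumptions; the toy critics are stand-ins for LLM-based evaluators.

```python
from typing import Callable, List, Sequence, Tuple

Critic = Callable[[str], float]  # maps a sketch transcript to a predicted score
Pair = Tuple[str, str]           # (higher-engagement sketch, lower-engagement sketch)


def pairwise_accuracy(critic: Critic, pairs: Sequence[Pair]) -> float:
    """Fraction of pairs in which the critic scores the higher-engagement
    sketch strictly above the lower-engagement one."""
    if not pairs:
        return 0.0
    hits = sum(1 for top, bottom in pairs if critic(top) > critic(bottom))
    return hits / len(pairs)


def select_critics(pool: Sequence[Critic], pairs: Sequence[Pair],
                   min_accuracy: float = 0.5) -> List[Critic]:
    """Keep only critics that beat the threshold (hypothetical selection rule)."""
    return [c for c in pool if pairwise_accuracy(c, pairs) > min_accuracy]


# Toy stand-ins: one critic likes long sketches, the other likes short ones.
def likes_long(sketch: str) -> float:
    return float(len(sketch))


def likes_short(sketch: str) -> float:
    return -float(len(sketch))
```

With labeled pairs such as `[("long funny sketch", "meh")]`, `select_critics([likes_long, likes_short], pairs)` would retain only `likes_long`, since only that critic ranks the higher-engagement item first.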

Breakthrough Assessment

8/10

Significant advance in applying agentic workflows to highly subjective creative tasks. The alignment of critics to real-world YouTube engagement data to serve as a proxy for 'humor' is a clever, impactful methodological contribution.

⚙️ Technical Details

Problem Definition

Setting: Automated generation of a coherent, humorous video V* from character specs X and background assets B.

Inputs: List of character descriptions (text, image, voice) and background references.

Outputs: A full sketch comedy video comprising a sequence of consistent shots with dialogue and audio.
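
The inputs and outputs above could be modeled roughly as follows. This is only an illustrative sketch: every class and field name is hypothetical, and file paths are assumed as the way images, voices, and rendered clips are referenced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CharacterSpec:
    """One entry of the character specs X (text, image, voice)."""
    name: str
    description: str
    image_path: str
    voice_path: str


@dataclass(frozen=True)
class Shot:
    """One rendered shot with its dialogue and audio."""
    index: int
    dialogue: str
    video_path: str
    audio_path: str


@dataclass
class SketchVideo:
    """The output video V*: an ordered sequence of shots."""
    shots: List[Shot] = field(default_factory=list)

    def is_ordered(self) -> bool:
        # Shots should be numbered 0, 1, 2, ... in playback order.
        return [s.index for s in self.shots] == list(range(len(self.shots)))
```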

Pipeline Flow

Global Script Evolution (Islands of writer/critic agents) -> Final Script
Scene Director (Script -> Storyboard/Scene Directions)
Iterative Visual Refinement (Renderer + Critic -> History of shots -> Tournament selection)
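
The third stage can be sketched as a render-critique loop that keeps a history of candidate shots and then picks one by pairwise comparison. This is a simplified illustration under stated assumptions: the `Render`, `Critique`, and `Prefer` signatures are hypothetical, shots are represented as plain strings, and the selection is a sequential "winner stays" comparison rather than whatever bracket the paper actually uses.

```python
from typing import Callable, List

Render = Callable[[str, List[str]], str]  # (scene direction, critic notes so far) -> shot
Critique = Callable[[str], str]           # shot -> textual note
Prefer = Callable[[str, str], str]        # pairwise comparison; returns the preferred shot


def refine_shot(direction: str, render: Render, critique: Critique,
                prefer: Prefer, iterations: int) -> str:
    """Render a shot repeatedly, feeding critic notes back into the renderer,
    then select one shot from the history by pairwise comparison."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    history: List[str] = []
    notes: List[str] = []
    for _ in range(iterations):
        shot = render(direction, notes)
        history.append(shot)
        notes.append(critique(shot))
    best = history[0]
    for candidate in history[1:]:
        best = prefer(best, candidate)  # winner stays, challenger rotates
    return best


# Toy stand-ins: each re-render is tagged with the number of notes it saw.
def toy_render(direction: str, notes: List[str]) -> str:
    return f"{direction} v{len(notes)}"


def toy_critique(shot: str) -> str:
    return f"tighten framing in {shot}"


def toy_prefer(a: str, b: str) -> str:
    return max(a, b)  # pretend later versions are better
```

Raising `iterations` enlarges the candidate history without touching any model weights, which is one way to picture the test-time scaling described in the evaluation highlights.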

System Modules

Writer Agent (Script Evolution)

Generates initial concepts and expands them into full dialogues; revises scripts based on feedback.