T2I: Text-to-Image generation—creating images from text descriptions.
LoRA: Low-Rank Adaptation—a technique to fine-tune large models by training small rank-decomposition matrices instead of all weights.
DreamBooth: A method for personalizing text-to-image models to generate specific subjects or styles given a few images.
Stable Diffusion: A popular open-source latent text-to-image diffusion model used as the base for this work.
MotionLoRA: The authors' proposed method to fine-tune the motion module for specific camera movements using Low-Rank Adaptation.
Domain Adapter: A LoRA layer, used only during training, that absorbs the visual defects of the video data (blur, watermarks) so that the main motion module does not learn them.
ControlNet: A neural network structure to control diffusion models by adding extra conditions (like edge maps or depth maps).
Temporal Transformer: Attention blocks applied along the time axis of video data to model how content changes between frames.
Inflation: Extending 2D image layers to handle video input, a 5D tensor (Batch x Channels x Frames x Height x Width), by reshaping the input so that each frame passes through the layer as an independent image.
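
The reshaping behind Inflation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the frame axis is folded into the batch axis before a 2D layer runs and unfolded afterwards, uses nested Python lists in place of real tensors, and every name in it (fold_frames, unfold_frames, scale_image, apply_inflated) is hypothetical.

```python
# Illustrative sketch only: nested lists stand in for tensors.
# Assumed layout: video[b][c][f] is one H x W frame (a list of rows).

def fold_frames(video):
    """(B, C, F, H, W) -> (B*F, C, H, W): move the frame axis into the batch axis."""
    B, C, F = len(video), len(video[0]), len(video[0][0])
    return [
        [video[b][c][f] for c in range(C)]
        for b in range(B)
        for f in range(F)
    ]

def unfold_frames(images, num_frames):
    """(B*F, C, H, W) -> (B, C, F, H, W): inverse of fold_frames."""
    B = len(images) // num_frames
    C = len(images[0])
    return [
        [[images[b * num_frames + f][c] for f in range(num_frames)]
         for c in range(C)]
        for b in range(B)
    ]

def scale_image(image, factor=2):
    """Hypothetical stand-in for a 2D image layer; sees a single (C, H, W) image."""
    return [[[px * factor for px in row] for row in channel] for channel in image]

def apply_inflated(video, layer_2d):
    """Run a per-image layer independently on every frame of a video."""
    num_frames = len(video[0][0])
    images = fold_frames(video)
    processed = [layer_2d(img) for img in images]
    return unfold_frames(processed, num_frames)

# Tiny example with B=1, C=2, F=3, H=1, W=2.
# Each pixel value encodes its own position as c*100 + f*10 + w.
video = [
    [
        [[[c * 100 + f * 10 + w for w in range(2)]] for f in range(3)]
        for c in range(2)
    ]
]
out = apply_inflated(video, scale_image)
```

Because the 2D layer only ever receives single images, it needs no changes and mixes no information across frames; in this scheme, modeling change over time is left to separate temporal blocks.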