arXiv preprint - Lumiere: A Space-Time Diffusion Model for Video Generation

In this episode, we discuss Lumiere: A Space-Time Diffusion Model for Video Generation by Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri. The paper presents Lumiere, a text-to-video diffusion model designed to generate realistic, coherent motion by producing the entire temporal duration of a video in a single pass through a Space-Time U-Net architecture. Existing video models typically synthesize distant keyframes and then fill in the frames between them with temporal super-resolution, which makes global temporal consistency hard to achieve. Lumiere instead down- and up-samples the video in both space and time, so the network processes the whole clip at multiple space-time scales. The authors report state-of-the-art text-to-video results and show that the model extends to other content creation tasks, including image-to-video conversion, video inpainting, and stylized video generation.
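To make the idea of spatial and temporal down- and up-sampling concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the helper names (`downsample`, `upsample`, `shape`), the factor of 2, the average pooling, the nearest-neighbour upsampling, and the 16-frame 8x8 clip are all assumptions chosen for illustration. It only shows how a clip shrinks along the time axis as well as the spatial axes, and how it is restored to its full length.

```python
def downsample(video, ft=2, fs=2):
    """Average-pool video[t][y][x] by factor ft in time and fs in space."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out = []
    for t in range(0, T - ft + 1, ft):
        frame = []
        for y in range(0, H - fs + 1, fs):
            row = []
            for x in range(0, W - fs + 1, fs):
                # Collect one space-time block and replace it with its mean.
                block = [video[t + dt][y + dy][x + dx]
                         for dt in range(ft)
                         for dy in range(fs)
                         for dx in range(fs)]
                row.append(sum(block) / len(block))
            frame.append(row)
        out.append(frame)
    return out


def upsample(video, ft=2, fs=2):
    """Nearest-neighbour upsampling: repeat each value fs times per spatial
    axis and each frame ft times along the time axis."""
    out = []
    for frame in video:
        big = []
        for row in frame:
            wide = [v for v in row for _ in range(fs)]
            big.extend(list(wide) for _ in range(fs))
        out.extend([list(r) for r in big] for _ in range(ft))
    return out


def shape(video):
    """Return (frames, height, width) of a nested-list video."""
    return (len(video), len(video[0]), len(video[0][0]))


# Hypothetical toy clip: 16 frames of 8x8 pixels, frame t filled with value t.
video = [[[float(t) for _ in range(8)] for _ in range(8)] for t in range(16)]

coarse = downsample(downsample(video))   # two levels of space-time pooling
restored = upsample(upsample(coarse))    # back to the full frame count

print(shape(video), shape(coarse), shape(restored))
# prints: (16, 8, 8) (4, 2, 2) (16, 8, 8)
```

In this toy version the coarsest level holds only 4 frames of 2x2 pixels, so each coarse value summarizes a block spanning several frames. A real Space-Time U-Net would use learned layers rather than fixed pooling, but the shape arithmetic is the part this sketch is meant to illustrate.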
