arXiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

In this episode, we discuss "MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation" by Sihyun Yu, Meera Hahn, Dan Kondratyuk, Jinwoo Shin, Agrim Gupta, José Lezama, Irfan Essa, David Ross, and Jonathan Huang. The paper introduces MALT Diffusion, a diffusion model for generating long videos. It divides a video into short segments and uses recurrent attention to maintain a memory latent vector that carries long-term context across segments. The paper also presents training techniques that keep quality consistent over many frames, and it reports superior performance on long-video benchmarks, with significantly improved FVD (Fréchet Video Distance) scores. In addition, MALT shows strong results in text-to-video generation and can produce longer videos than existing methods.
