arxiv preprint - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

In this episode, we discuss StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation by Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, and Qibin Hou. The paper introduces two techniques that help diffusion-based generative models produce consistent image sequences and continuous video. “Consistent Self-Attention” maintains content consistency across the generated images, and a “Semantic Motion Predictor” handles motion prediction to support coherent long-range video generation. Combined in the StoryDiffusion framework, these components generate detailed, coherent visual narratives from textual stories, showing potential to significantly advance visual content creation.
