arXiv Preprint – VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning


In this episode we discuss VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
by Han Lin, Abhay Zala, Jaemin Cho, and Mohit Bansal. The paper presents VideoDirectorGPT, a framework that uses large language models to plan and generate consistent multi-scene videos. It consists of a video planner LLM (GPT-4) that expands a single text prompt into a “video plan”, and a video generator, Layout2Vid, that renders the scenes from this plan while maintaining spatial and temporal consistency. The framework achieves competitive performance on single-scene video generation, allows dynamic control over the strength of layout guidance, and can generate videos with user-provided images.
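
To make the two-stage design concrete, below is a minimal, hypothetical Python sketch of a planner-then-generator pipeline of the kind the summary describes. The plan schema, function names (plan_video, generate_video), and the hard-coded example plan are illustrative assumptions, not the paper's released code or API; in the actual framework, stage 1 is handled by GPT-4 and stage 2 by the Layout2Vid module.

```python
# Hypothetical sketch of the planner -> generator pipeline described above.
# The schema and stub functions are illustrative, not the paper's real API.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in normalized image coordinates


@dataclass
class Scene:
    description: str                       # per-scene text prompt
    background: str                        # background description
    entity_layouts: Dict[str, List[Box]]   # entity name -> bounding box per frame


@dataclass
class VideoPlan:
    scenes: List[Scene]
    consistency_groups: Dict[str, List[int]]  # entity -> indices of scenes where it recurs


def plan_video(prompt: str) -> VideoPlan:
    """Stage 1: a planner LLM (GPT-4 in the paper) expands one prompt into a
    structured multi-scene plan. Here we return a hard-coded stub plan."""
    return VideoPlan(
        scenes=[
            Scene("A chef kneads dough on a counter", "kitchen",
                  {"chef": [(0.3, 0.2, 0.4, 0.7)]}),
            Scene("The chef slides the bread into an oven", "kitchen",
                  {"chef": [(0.1, 0.2, 0.4, 0.7)], "oven": [(0.6, 0.3, 0.3, 0.5)]}),
        ],
        consistency_groups={"chef": [0, 1]},
    )


def generate_video(plan: VideoPlan, layout_guidance: float = 1.0) -> List[str]:
    """Stage 2: a layout-guided generator (Layout2Vid in the paper) renders each
    scene, reusing shared entity identities across scenes. Here we only trace
    what would be generated for each scene."""
    clips = []
    for i, scene in enumerate(plan.scenes):
        shared = [e for e, idxs in plan.consistency_groups.items() if i in idxs]
        clips.append(
            f"scene {i}: '{scene.description}' | background={scene.background} "
            f"| layout_guidance={layout_guidance} | shared_entities={shared}"
        )
    return clips


if __name__ == "__main__":
    for clip in generate_video(plan_video("A chef bakes bread"), layout_guidance=0.7):
        print(clip)
```

The adjustable layout_guidance argument stands in for the dynamically controllable layout guidance strength mentioned above, and the shared consistency_groups illustrate how entity identity could be carried across scenes.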

