Arxiv paper - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

In this episode, we discuss VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation by Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, and Yanwei Fu. The paper presents VidCRAFT3, a new framework for image-to-video generation that allows simultaneous control over camera motion, object movement, and lighting direction. To address the limitations of previous methods, it introduces the Spatial Triple-Attention Transformer, which decouples and integrates lighting, text, and image inputs. The result is more precise and versatile control over multiple visual elements in generated videos.
