arXiv preprint - Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

In this episode we discuss Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation by Shuai Yang, Yifan Zhou, Ziwei Liu, and Chen Change Loy. The paper introduces a framework for zero-shot, text-guided video-to-video translation that adapts pretrained image diffusion models to video. The framework has two parts: key frame translation and full video translation. In the first part, an adapted diffusion model generates the key frames, and hierarchical cross-frame constraints keep their shapes, textures, and colors coherent. In the second part, the translated key frames are propagated to the remaining frames using temporal-aware patch matching and frame blending. The framework achieves temporal consistency in both global style and local texture without re-training or optimization, and the authors report that it produces higher-quality, more coherent videos than existing methods.
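As a rough illustration of the two-stage structure, the Python sketch below picks key frames at a fixed interval and, for each non-key frame, mixes the two candidates propagated from its neighboring key frames using weights based on temporal distance. The fixed interval, the linear distance weighting, and the helper names (`keyframe_schedule`, `blend_weights`, `blend_frames`) are assumptions made for illustration only. The paper itself propagates key frames with temporal-aware patch matching and uses its own frame-blending scheme, which this sketch does not reproduce.

```python
import bisect


def keyframe_schedule(num_frames: int, interval: int) -> list[int]:
    """Hypothetical helper: key frame indices every `interval` frames, plus the last frame."""
    if num_frames <= 0 or interval <= 0:
        raise ValueError("num_frames and interval must be positive")
    keys = list(range(0, num_frames, interval))
    # Always include the final frame so every frame has a key frame on each side.
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)
    return keys


def blend_weights(frame: int, keys: list[int]) -> tuple[int, int, float, float]:
    """Hypothetical helper: neighboring key frames and linear weights for `frame`.

    Assumes simple linear weighting by temporal distance, not the paper's method.
    """
    if frame < keys[0] or frame > keys[-1]:
        raise ValueError("frame lies outside the key frame range")
    i = bisect.bisect_right(keys, frame) - 1
    left = keys[i]
    if left == frame:
        # Key frames are used directly: full weight on themselves.
        return left, left, 1.0, 0.0
    right = keys[i + 1]
    w_right = (frame - left) / (right - left)
    return left, right, 1.0 - w_right, w_right


def blend_frames(from_left: list[float], from_right: list[float],
                 w_left: float, w_right: float) -> list[float]:
    """Weighted per-pixel mix of two candidates propagated from neighboring key frames.

    The propagated candidates are assumed to be given; in the paper they come
    from temporal-aware patch matching.
    """
    return [w_left * a + w_right * b for a, b in zip(from_left, from_right)]


if __name__ == "__main__":
    keys = keyframe_schedule(10, 4)
    print(keys)                    # [0, 4, 8, 9]
    print(blend_weights(5, keys))  # (4, 8, 0.75, 0.25)
    print(blend_frames([0.0, 1.0], [1.0, 0.0], 0.75, 0.25))  # [0.25, 0.75]
```

Weighting by distance means a frame close to one key frame mostly inherits that key frame's appearance, which is one simple way to avoid abrupt jumps between segments.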
