arXiv preprint – VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

In this episode, we discuss VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time by Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, and Baining Guo. VASA is a new framework for creating realistic talking faces from a static image and an audio clip, with synchronized lip movements, facial expressions, and head movements. It uses a diffusion-based model that operates in a face latent space to generate facial dynamics and head motion, which improves the authenticity and liveliness of the resulting avatars. VASA-1 generates high-quality video in real time at up to 40 FPS and outperforms existing methods in realism and responsiveness, making it suitable for real-time avatar interaction. Project page: https://www.microsoft.com/en-us/research/project/vasa-1/

