VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Microsoft Research has introduced VASA-1, a framework that generates lifelike talking-face videos of virtual characters from a single static image and a speech audio clip. VASA-1 closely synchronises lip movements with the audio, captures a wide range of facial nuances, and produces natural head motion. Together, these give the generated faces a convincing sense of realism and liveliness. Its key innovations are a holistic model that generates facial dynamics and head movement in a face latent space, and an expressive, disentangled face latent space learned from videos. The authors report that the method substantially outperforms existing approaches across a range of metrics and can generate high-quality, realistic talking-face videos in real time.