Stable Video Portraits | Neural Capture and Synthesis – Max Planck Institute for Intelligent Systems

Stable Video Portraits

SVP: Personalized video diffusion with 3D control.

Stable Video Portraits is a hybrid 2D/3D generation method that outputs photorealistic videos of talking faces. It leverages a large pre-trained text-to-image prior (the 2D component) and is controlled via a 3D morphable model, or 3DMM (the 3D component). The method is based on a personalized image diffusion prior, which allows us to generate new videos of the subject and also to edit the subject's appearance by blending the personalized image prior with a general text-conditioned model.
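
The page does not say how the personalized prior and the general text-conditioned model are blended. As an illustration only, the sketch below assumes the simplest reading: at each denoising step, the noise predictions of the two models are linearly interpolated with a weight `alpha`. The function name `blend_noise_predictions`, the parameter `alpha`, and the toy arrays are hypothetical and are not taken from the method itself.

```python
import numpy as np


def blend_noise_predictions(eps_personal, eps_general, alpha):
    """Linearly interpolate two noise predictions of identical shape.

    ASSUMPTION: linear interpolation is an illustrative choice, not the
    blending rule confirmed by the source.
    alpha = 0.0 keeps only the personalized prediction,
    alpha = 1.0 keeps only the general text-conditioned prediction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    eps_personal = np.asarray(eps_personal, dtype=np.float64)
    eps_general = np.asarray(eps_general, dtype=np.float64)
    if eps_personal.shape != eps_general.shape:
        raise ValueError("both predictions must have the same shape")
    return (1.0 - alpha) * eps_personal + alpha * eps_general


# Toy stand-ins for the outputs of the two denoisers at one step.
eps_personal = np.zeros((2, 2))
eps_general = np.ones((2, 2))
blended = blend_noise_predictions(eps_personal, eps_general, alpha=0.25)
```

Under this assumption, a small `alpha` stays close to the subject's captured appearance, while a larger `alpha` lets the text-conditioned model contribute more to the edit.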