[ECCV 2020 Neural Voice Puppetry] [Lip Sync.]
Neural Voice Puppetry: Audio-driven Facial Reenactment
Figure: NVP (ECCV 2020)
Task: Predict per-frame expression coefficients that drive an expression blendshape basis (see the sketch after this block).
Motivation: Synthetic audio (e.g., digital voice assistants, text-to-speech) has no matching photo-realistic talking face; the visual counterpart is largely missing.
Motion: 3DMM Coefficients.
Dataset: 116 videos with an average length of 1.7 min (302,750 frames in total).
Problem: Leveraging explicit facial structural priors can accumulate errors when predicting such intermediate representations, and these errors propagate to the final rendering.
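For context on how predicted coefficients "drive an expression blendshape basis", here is a minimal sketch of the standard linear 3DMM expression model. It assumes a generic morphable model with hypothetical dimensions and array names (`N_VERTS`, `N_EXP`, `exp_basis`); it is not NVP's actual basis or network, only the per-frame coefficient-to-mesh step.

```python
import numpy as np

# Hypothetical dimensions for a generic 3DMM (illustrative only).
N_VERTS = 5023          # number of mesh vertices
N_EXP = 76              # number of expression blendshapes

mean_shape = np.zeros((N_VERTS, 3))                      # neutral face geometry
exp_basis = np.random.randn(N_EXP, N_VERTS, 3) * 1e-3    # expression blendshape basis

def apply_expression(exp_coeffs: np.ndarray) -> np.ndarray:
    """Deform the neutral mesh with one frame of predicted expression coefficients.

    exp_coeffs: (N_EXP,) coefficients, e.g. regressed from an audio window.
    """
    # Linear blendshape model: mesh = mean + sum_k coeff_k * basis_k
    offset = np.tensordot(exp_coeffs, exp_basis, axes=1)  # (N_VERTS, 3)
    return mean_shape + offset

# Per-frame usage: the audio-to-expression network predicts one coefficient
# vector per video frame; each vector deforms the shared neutral mesh.
frame_coeffs = np.zeros(N_EXP)
frame_coeffs[3] = 0.8   # e.g. open the jaw (index is illustrative only)
mesh = apply_expression(frame_coeffs)   # (N_VERTS, 3)
```

Because the model is linear in the coefficients, any error in the predicted coefficients translates directly into a geometric error on the mesh, which is the error-accumulation concern noted above.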
[CVPR 2023 SadTalker] [Lip Sync.] [Context Expression] [Head Pose]
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Figure: SadTalker (CVPR 2023)
Task: Generate 3D motion coefficients (expression and head pose) of a 3DMM from audio.
Motivation: Prior one-shot talking-face methods suffer from unnatural head movement, distorted expressions, and identity modification.
Motion: Expression coefficients and head pose (see the sketch after this block).
Dataset:
Views:
- An early attempt to use lip-only 3DMM expression coefficients as the audio-driven target.
- Generates 3D motion coefficients (expression and head pose) from audio, yielding realistic head movement and facial expressions.
Problems:
- Approaches that rely on 3D intermediate representations struggle to capture subtle expressions and realistic motion, which limits the quality of the generated portrait animation.
- A recurring challenge is the limited capacity of the 3D mesh to capture intricate details, constraining overall dynamism and realism; omitting the intermediate representation may improve naturalness.
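To make "expression and head pose" concrete, the sketch below shows how the two kinds of 3DMM motion coefficients are typically composed into a posed mesh: a linear expression offset followed by a rigid transform from an Euler-angle pose. This is an illustrative, generic composition under assumed conventions (Euler order, 6-DoF pose vector), not SadTalker's actual ExpNet/PoseVAE or its image-space renderer.

```python
import numpy as np

def euler_to_rotation(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Rotation matrix from Euler angles in radians (axis order is illustrative)."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def pose_and_express(mean_shape, exp_basis, exp_coeffs, pose):
    """Compose expression and head-pose coefficients into a posed mesh.

    mean_shape: (V, 3) neutral geometry
    exp_basis:  (K, V, 3) expression blendshapes
    exp_coeffs: (K,) audio-driven expression coefficients (lip sync)
    pose:       (6,) array = (pitch, yaw, roll, tx, ty, tz) per frame
    """
    # Non-rigid part: expression offset on the neutral mesh.
    expressed = mean_shape + np.tensordot(exp_coeffs, exp_basis, axes=1)
    # Rigid part: head rotation and translation applied to the expressed face.
    R = euler_to_rotation(*pose[:3])
    t = pose[3:]
    return expressed @ R.T + t
```

Decoupling the two streams, as in this sketch, is what lets an audio-to-coefficient model handle lip sync and head motion separately; the limitation noted above is that the final detail and realism are still bounded by what this coefficient space can represent.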