[ECCV 2020 Neural Voice Puppetry] [Lip Sync.]

Neural Voice Puppetry: Audio-driven Facial Reenactment

Figure: NVP (ECCV 2020)

Task: Predict expression coefficients that drive an expression blendshape basis.
Motivation: Speech-driven interfaces are increasingly common, but their visual counterpart, a face animated in sync with the audio, is largely missing.
Motion: 3DMM Coefficients.
Dataset: 116 videos with an average length of 1.7 min (302,750 frames in total).
Problem: Relying on explicit facial structural priors means that errors made when predicting the intermediate representation can accumulate in the final result.
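
To make the Task line concrete, here is a minimal sketch of how expression coefficients drive a linear blendshape basis: the deformed mesh is the neutral mesh plus a coefficient-weighted sum of per-vertex offsets. This is the generic linear 3DMM formulation, not NVP's actual code; the function name `apply_blendshapes`, the array shapes, and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def apply_blendshapes(mean_shape, expr_basis, expr_coeffs):
    """Return deformed vertices: mean + sum_k coeff_k * basis_k.

    mean_shape:  (V, 3) neutral face vertices
    expr_basis:  (K, V, 3) per-vertex offsets for each of K blendshapes
    expr_coeffs: (K,) expression coefficients (e.g. predicted from audio)
    """
    mean_shape = np.asarray(mean_shape, dtype=float)
    expr_basis = np.asarray(expr_basis, dtype=float)
    expr_coeffs = np.asarray(expr_coeffs, dtype=float)
    if expr_basis.shape != (expr_coeffs.shape[0], *mean_shape.shape):
        raise ValueError("expr_basis must have shape (K, V, 3)")
    # Contract the coefficient axis: (K,) x (K, V, 3) -> (V, 3)
    return mean_shape + np.tensordot(expr_coeffs, expr_basis, axes=1)

# Toy example (hypothetical numbers): 2 vertices, 2 blendshapes.
mean = np.zeros((2, 3))
basis = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # blendshape 0 moves vertex 0 along y
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # blendshape 1 moves vertex 1 along x
])
coeffs = np.array([0.5, 2.0])
deformed = apply_blendshapes(mean, basis, coeffs)
```

Because the mapping from coefficients to vertices is linear, any error in the predicted coefficients is passed straight through to the mesh, which is the error-accumulation concern noted under Problem.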


[CVPR 2023 SadTalker] [Lip Sync.] [Context Expression] [Head Pose]

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation


Figure: SadTalker (CVPR 2023)

Task: Generate 3D motion coefficients.
Motivation: Prior methods suffer from unnatural head movement, distorted expressions, and identity modification.
Motion: Expression and head pose.
Dataset:

Views:

  • An early attempt at using lip-only 3DMM coefficients.
  • Generates 3D motion coefficients from audio for realistic head movement and facial expressions.
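
As a rough illustration of what "motion coefficients" covering expression and head pose can look like, the sketch below splits one per-frame vector into expression, rotation, and translation parts and applies the rigid head pose to a set of vertices. The layout (64 expression values followed by 3 Euler angles and 3 translation values), the rotation convention, and the helper names are assumptions made for this example, not SadTalker's actual interface.

```python
import numpy as np

def split_motion(coeffs, n_exp=64):
    """Split a per-frame vector into (expression, rotation, translation).

    Assumed layout: n_exp expression values, then 3 Euler angles (radians),
    then 3 translation values.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.shape[-1] != n_exp + 6:
        raise ValueError("expected n_exp + 6 values per frame")
    return coeffs[:n_exp], coeffs[n_exp:n_exp + 3], coeffs[n_exp + 3:]

def euler_to_matrix(angles):
    """Build R = Rz @ Ry @ Rx from (x, y, z) angles (assumed convention)."""
    x, y, z = angles
    cx, sx = np.cos(x), np.sin(x)
    cy, sy = np.cos(y), np.sin(y)
    cz, sz = np.cos(z), np.sin(z)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

def apply_head_pose(vertices, angles, translation):
    """Rotate then translate (V, 3) vertices: v' = R v + t."""
    rotation = euler_to_matrix(angles)
    return np.asarray(vertices, dtype=float) @ rotation.T + translation

# Toy frame (hypothetical values): neutral expression, 90-degree turn about y.
frame = np.zeros(70)
frame[65] = np.pi / 2          # rotation about the y axis
frame[67:] = [0.0, 0.0, 2.0]   # translation along z
expression, angles, translation = split_motion(frame)
posed = apply_head_pose(np.array([[1.0, 0.0, 0.0]]), angles, translation)
```

Keeping pose separate from expression in this way lets head movement be modeled independently of lip motion, in line with the Motion line, which lists the two as separate components.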

Problems:

  • Approaches that rely on 3D intermediate representations typically struggle to capture subtle expressions and realistic motion, which significantly limits the quality of the generated portrait animations.
  • A recurring challenge is the limited capacity of a 3D mesh to capture intricate details, which constrains the dynamism and realism of the result. Omitting intermediate representations may improve naturalness.


© 2025 - Zhihao Li Created using Stellar