3D Talking Head: Generative Models

[ICLR 2023 GeneFace] [Lip Sync.] [General Animation]

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Figure: GeneFace (ICLR 2023)

Task: Select 68 facial keypoints and predict their offsets to animate the NeRF rendering.
Motivation: Generalization is limited by the small scale of the training data.
Motion: Keypoint Offsets.
Dataset: LRS3-TED
Views:

It attempts to reduce NeRF artifacts by translating speech features into facial landmarks, but this often results in inaccurate lip movements.
It’s also hard to reproduce actions such as blinking and eyebrow-raising.
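
The offset-based motion representation noted above can be illustrated with a minimal sketch. It is not GeneFace's implementation: the neutral template, the 3D coordinate layout, the helper name `apply_offsets`, and the sample values are all hypothetical, chosen only to show how a predicted per-keypoint offset displaces 68 landmarks for one frame.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
NUM_LANDMARKS = 68  # number of facial keypoints stated in the notes


def apply_offsets(template: List[Point], offsets: List[Point]) -> List[Point]:
    """Add a per-keypoint displacement to a neutral landmark template."""
    if len(template) != NUM_LANDMARKS or len(offsets) != NUM_LANDMARKS:
        raise ValueError("expected 68 landmarks and 68 offsets")
    return [
        (tx + ox, ty + oy, tz + oz)
        for (tx, ty, tz), (ox, oy, oz) in zip(template, offsets)
    ]


# Hypothetical data: a flat neutral template and a uniform offset along y.
neutral = [(float(i), 0.0, 0.0) for i in range(NUM_LANDMARKS)]
predicted = [(0.0, 0.5, 0.0)] * NUM_LANDMARKS

# Landmarks for one animation frame; a renderer would be conditioned on these.
frame = apply_offsets(neutral, predicted)
```

In a real pipeline the offsets would come from an audio-to-motion model and would differ per keypoint and per frame; the uniform offset here only keeps the example easy to check.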

[arXiv 2025 KDTalker] [Lip Sync.] [Context Expression] [Head Pose]

Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait

Figure: KDTalker (arXiv 2025)

Task: Integrate unsupervised implicit 3D keypoints (K) with a spatiotemporal diffusion model.
Motivation: 3DMM keypoints are fixed and lack flexibility.
Motion: Keypoint Positions.
Dataset:

Training: VoxCeleb
Evaluation: HDTF
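
As a rough illustration of what diffusing over keypoint positions means, the sketch below applies the standard DDPM forward (noising) step to a flattened set of 3D keypoints. It is a sketch under stated assumptions, not KDTalker's code: the linear beta schedule, the step count, the keypoint count `K = 21`, and the helper names `alpha_bar` and `q_sample` are placeholders chosen for the example.

```python
import math
import random
from typing import List


def alpha_bar(t: int, num_steps: int,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t, linear schedule."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def q_sample(x0: List[float], t: int, num_steps: int,
             rng: random.Random) -> List[float]:
    """Noise clean keypoint coordinates x0 to diffusion step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar(t, num_steps)
    return [
        math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for v in x0
    ]


K = 21  # hypothetical number of keypoints, for illustration only
clean = [0.1 * i for i in range(K * 3)]  # flattened (x, y, z) per keypoint
noisy = q_sample(clean, t=500, num_steps=1000, rng=random.Random(0))
```

A denoising network would be trained to reverse this process conditioned on audio; that part is omitted because the notes do not describe the architecture.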