[ICCV 2021 AD-NeRF] [Lip Sync.] [Specific Animation]

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis


Figure: AD-NeRF (ICCV 2021)

Task: Generating high-fidelity talking-head video that is synchronized with the input audio sequence.
Motivation: The intermediate representations used by existing methods cause information loss, whereas a Neural Radiance Field (NeRF) offers an implicit scene representation that can be conditioned directly on audio features (a minimal sketch follows this entry's problem list).
Motion: NeRF
Dataset: Self-collected; videos average 3–5 minutes in length, all at 25 fps.
Problem:

  • Heavy Training: Requires several hours of training time, hindering rapid transfer to other individuals.
  • Less Control: Due to the lack of a unified representation, these methods fail to generate videos driven by multiple conditions.
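
Since "Motion: NeRF" here means conditioning an implicit radiance field directly on audio, below is a minimal PyTorch sketch of that idea: an MLP maps a positional-encoded 3D point, a view direction, and a per-frame audio feature to color and density. The class name `AudioConditionedNeRF`, the 64-d audio feature, and the layer sizes are illustrative assumptions, not AD-NeRF's released architecture (which, among other things, trains separate head and torso fields).

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    # Standard NeRF frequency encoding: [sin(2^k * pi * x), cos(2^k * pi * x)] for k < num_freqs.
    freqs = (2.0 ** torch.arange(num_freqs, dtype=torch.float32, device=x.device)) * math.pi
    angles = x[..., None] * freqs                       # (..., 3, num_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class AudioConditionedNeRF(nn.Module):
    """Illustrative audio-driven radiance field: F(x, d, a) -> (rgb, sigma)."""
    def __init__(self, audio_dim=64, pos_freqs=10, dir_freqs=4, hidden=256):
        super().__init__()
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        pos_dim, dir_dim = 3 * 2 * pos_freqs, 3 * 2 * dir_freqs
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # view-independent density
        self.rgb_head = nn.Sequential(                  # view-dependent color
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir, audio_feat):
        # xyz: (N, 3) sample points; view_dir: (N, 3); audio_feat: (N, audio_dim) per-frame audio code.
        h = self.trunk(torch.cat([positional_encoding(xyz, self.pos_freqs), audio_feat], dim=-1))
        sigma = torch.relu(self.sigma_head(h))
        rgb = self.rgb_head(torch.cat([h, positional_encoding(view_dir, self.dir_freqs)], dim=-1))
        return rgb, sigma

# Example query: 1024 ray samples driven by one frame's audio feature (volume rendering omitted).
model = AudioConditionedNeRF()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3), torch.rand(1024, 64))
```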

[IJCV 2022 RAD-NeRF] [Lip Sync.] [Specific Animation]

RAD-NeRF: Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition


Figure: RAD-NeRF (IJCV 2022)

Task: Modeling high-dimensional audio-driven facial dynamics with low-dimensional feature grids (keypoints are used to smooth the sampled features); a minimal sketch follows this entry's problem list.
Motivation: Dynamic NeRF exhibits slow training and inference speed.
Motion: NeRF
Dataset: Talking-portrait videos collected by previous works.
Problem:

  • Requires a complex MLP-based grid encoder to implicitly learn regional audio-motion mapping, limiting convergence and reconstruction quality.
  • Identity-dependent: Generalizes poorly when applied to a different person.
  • Limited controllability: Cannot explicitly control facial expressions and poses, sometimes producing unsatisfactory results.
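
To make the grid-based decomposition concrete, here is a simplified PyTorch sketch: spatial features are read from a small dense 3D grid and audio features from a 2D grid indexed by a learned low-dimensional audio coordinate, so the per-sample decoder MLP can stay tiny. The dense grids, the resolutions, the 64-d audio input, and the name `GridTalkingField` are assumptions for illustration; RAD-NeRF itself builds on multiresolution hash-grid encoders rather than dense grids.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridTalkingField(nn.Module):
    """Simplified audio-spatial decomposition: features come from learnable grids
    instead of a deep coordinate MLP, so the per-sample decoder stays small."""
    def __init__(self, spatial_res=64, spatial_dim=16, audio_res=32, audio_dim=8,
                 raw_audio_dim=64, hidden=64):
        super().__init__()
        # Dense 3D spatial feature grid (stand-in for a multiresolution hash grid).
        self.spatial_grid = nn.Parameter(
            0.01 * torch.randn(1, spatial_dim, spatial_res, spatial_res, spatial_res))
        # 2D audio feature grid, queried at a learned low-dimensional audio coordinate.
        self.audio_grid = nn.Parameter(0.01 * torch.randn(1, audio_dim, audio_res, audio_res))
        self.audio_to_coord = nn.Linear(raw_audio_dim, 2)   # raw audio feature -> 2D grid coordinate
        self.decoder = nn.Sequential(
            nn.Linear(spatial_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                           # (rgb, sigma)
        )

    def forward(self, xyz, audio_feat):
        # xyz in [-1, 1]^3, shape (N, 3); audio_feat shape (N, raw_audio_dim).
        n = xyz.shape[0]
        # Trilinear lookup into the spatial grid.
        spat = F.grid_sample(self.spatial_grid, xyz.view(1, n, 1, 1, 3), align_corners=True)
        spat = spat.reshape(-1, n).t()                      # (N, spatial_dim)
        # Bilinear lookup into the audio grid at the learned 2D coordinate.
        a_coord = torch.tanh(self.audio_to_coord(audio_feat)).view(1, n, 1, 2)
        aud = F.grid_sample(self.audio_grid, a_coord, align_corners=True)
        aud = aud.reshape(-1, n).t()                        # (N, audio_dim)
        out = self.decoder(torch.cat([spat, aud], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3:])   # rgb, sigma

# Example query: 1024 samples with a 64-d audio feature per sample.
field = GridTalkingField()
rgb, sigma = field(torch.rand(1024, 3) * 2 - 1, torch.rand(1024, 64))
```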
