[ECCV 2024 TalkingGaussian] [Lip Sync.] [Specific Animation]
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Figure: TalkingGaussian (ECCV 2024)
Task: Predict a point-wise deformation for each Gaussian primitive, and decompose the face and inside-mouth regions to avoid mutual corruption.
Motivation: Directly modifying point appearance may lead to distortions in dynamic regions.
Motion: Gaussian Primitive Offset
Dataset: Four high-definition speaking-video clips, each a centered portrait, averaging about 6,500 frames at 25 FPS.
Views:
- First to address facial distortion in radiance-field-based methods by using 3DGS.
- Introduces a 3D Gaussian method that improves visual quality and lip-sync accuracy.
- Enables real-time inference.
Problem: Requires per-identity video data for training, so it lacks generalization to unseen identities.
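The point-wise deformation above can be sketched as follows. This is a toy illustration, not TalkingGaussian's actual model: the real predictor is an MLP conditioned on tri-plane and audio features, whereas here a single linear map (hypothetical names throughout) turns a per-frame audio feature into a position offset for every canonical Gaussian center.

```python
import numpy as np

rng = np.random.default_rng(0)

# Canonical 3D Gaussian centers (N primitives); a toy stand-in for the
# persistent canonical field that TalkingGaussian keeps structure-stable.
N, audio_dim = 64, 16
canonical_xyz = rng.normal(size=(N, 3))

# A tiny linear "deformation predictor" (assumption: the real model is an
# MLP); it maps a per-frame audio feature to a point-wise offset.
W = rng.normal(scale=0.01, size=(audio_dim, N * 3))

def deform(audio_feat):
    """Predict per-primitive offsets and add them to the canonical centers."""
    offsets = (audio_feat @ W).reshape(N, 3)
    return canonical_xyz + offsets

frame_audio = rng.normal(size=(audio_dim,))
deformed_xyz = deform(frame_audio)
```

Because only small offsets are applied to a fixed canonical structure (rather than rewriting point appearance), the geometry persists across frames, which is the distortion-avoidance argument in the Motivation field.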
[ACM MM 2024 GaussianTalker] [Lip Sync.] [Specific Animation]
GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Figure: GaussianTalker (ACM MM 2024)
Task: Learn canonical 3D Gaussians with a triplane representation and predict offsets to the Gaussian attributes via cross-attention.
Motivation: Real-time generation of pose-controllable talking heads.
Motion: Gaussian Primitive Offset
Dataset: Public NeRF-based video datasets, ~6000 frames each at 25 FPS.
Problems:
- Lower image quality and lip-sync accuracy, with frequent lip flicker.
- Uses monocular video, lacking multi-view synthesis.
- Sparse audio features drive dense Gaussian clouds, causing artifacts.
- Simple attention module leads to jitter and gaps between face and background.
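The triplane-plus-cross-attention step can be sketched as below. This is a minimal numpy sketch under stated assumptions: `gauss_feat` stands in for triplane features looked up at each canonical Gaussian position, the Gaussian features act as queries over a short window of audio tokens, and the attended context is projected to a per-primitive position offset (the real model also offsets rotation, scale, and opacity; all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 32, 8   # Gaussian primitives, feature dim (toy sizes)
T = 4          # audio tokens in the conditioning window

# Hypothetical per-primitive features, standing in for a triplane lookup
# at each canonical Gaussian center.
gauss_feat = rng.normal(size=(N, d))
audio_feat = rng.normal(size=(T, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-attention: each Gaussian queries the audio tokens, and the
# attended context is projected to an attribute offset.
Wo = rng.normal(scale=0.1, size=(d, 3))
attn = softmax(gauss_feat @ audio_feat.T / np.sqrt(d))  # (N, T) weights
offsets = (attn @ audio_feat) @ Wo                      # (N, 3) position offsets
```

The listed problems map onto this sketch directly: only T audio tokens drive N ≫ T Gaussians (sparse conditioning), and a single shallow attention layer gives each primitive an independent, weakly regularized offset, which is consistent with the jitter and face/background gaps noted above.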
[SIGGRAPH 2025 LAM] [Lip Sync.] [Animation Efficiency]
LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Figure: LAM (SIGGRAPH 2025)
Task: Animate Gaussian points using blendshape and LBS (Linear Blend Skinning) functions.
Motivation: Reduce reliance on extra neural networks to support real-time rendering on low-power platforms.
Motion: Gaussian Primitive
Dataset:
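The blendshape-then-LBS animation that lets LAM avoid extra networks at inference time can be sketched as below. This is a toy illustration with hypothetical names, assuming each Gaussian point is rigged like a mesh vertex (FLAME-style) with per-point blendshape deltas and skinning weights; the actual rig and parameter values come from the model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, B, J = 16, 4, 2   # Gaussian points, blendshapes, joints (toy sizes)

# Hypothetical rig: rest-pose centers, per-point blendshape deltas, and
# skinning weights that sum to 1 over the joints.
rest_xyz = rng.normal(size=(N, 3))
blend_deltas = rng.normal(scale=0.05, size=(B, N, 3))
skin_w = rng.random((N, J))
skin_w /= skin_w.sum(axis=1, keepdims=True)

def animate(expr_coeffs, joint_transforms):
    """Blendshapes first, then Linear Blend Skinning; no neural network."""
    # 1) Expression blendshapes: linear combination of per-point deltas.
    shaped = rest_xyz + np.einsum("b,bnd->nd", expr_coeffs, blend_deltas)
    # 2) LBS: blend each joint's rigid 4x4 transform by the skin weights.
    homog = np.concatenate([shaped, np.ones((N, 1))], axis=1)      # (N, 4)
    per_joint = np.einsum("jrc,nc->njr", joint_transforms, homog)  # (N, J, 4)
    blended = np.einsum("nj,njr->nr", skin_w, per_joint)
    return blended[:, :3]

# Sanity check: identity joints and zero expression leave points at rest.
identity_joints = np.tile(np.eye(4), (J, 1, 1))
out = animate(np.zeros(B), identity_joints)
```

Since both steps are fixed linear operations in the rig parameters, per-frame animation is a handful of matrix multiplies, which is why this design suits real-time rendering on low-power platforms.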