Anchor-based Dynamic Gaussian Splatting with Gradient Disentanglement for Efficient and High-Fidelity Dynamic Scene Reconstruction
Code: github.com/I2-Multimedia-Lab/GrainGS
Overview of GrainGS. Input SfM points are encoded into canonical anchors and lifted into canonical base Gaussians by a static prediction module. A deformation network (DeformNet) predicts per-Gaussian temporal offsets [Δx, Δs, Δr] from the canonical Gaussian and a time embedding γ(t), while a canonical view-MLP and an appearance-residual MLP model time-varying color and opacity. A stop-gradient on the canonical→deformation path keeps the canonical base Gaussians clean.
GrainGS is a dynamic-scene Gaussian Splatting pipeline built on the anchor-based Scaffold-GS representation. It extends the static scaffold to 4D dynamic reconstruction with a canonical anchor representation, Gaussian-level temporal deformation, and temporal appearance modeling.
The central observation behind GrainGS is gradient entanglement: in a naive anchor-based dynamic formulation, the gradients of the deformation network flow back into the canonical (base) Gaussians and gradually corrupt the canonical representation. GrainGS resolves this with two ingredients — a warm-up phase that first establishes a clean canonical base, and a stop-gradient that decouples the deformation branch from the canonical branch during joint training.
The main dynamic training entry is train_dynamic.py. The code supports D-NeRF style Blender datasets, COLMAP-style dynamic scenes, and Nerfies/HyperNeRF-style dynamic data loaders.
Core design:
- Input SfM points become canonical anchors A = {x_a, s_a, f_a} (position, influence radius, feature), and a static prediction module generates the canonical base Gaussians.
- A deformation network predicts Gaussian-level temporal offsets for position, rotation, and scale.
- View-dependent MLPs remain time-independent for canonical appearance.
- Optional temporal appearance embeddings model time-varying color and opacity residuals.
- A stop-gradient and a warm-up schedule prevent deformation gradients from corrupting the canonical base.
- Anchor growing and pruning follow the Scaffold-GS style densification process.
Relevant files:
- train_dynamic.py: dynamic-scene training.
- render_dynamic.py: dynamic-scene rendering.
- metrics.py: PSNR, SSIM, and LPIPS evaluation on rendered results.
- scene/dynamic_dataset_loader.py: dynamic dataset readers with temporal fid support.
- scene/deform_model.py and utils/time_utils.py: deformation model.
- scene/canonical_cell.py: canonical anchor / cell representation.
- gaussian_renderer/__init__.py: rendering and deformation integration.
GrainGS represents the static structure of a dynamic scene with anchors derived from the input SfM points. Each anchor A^i = {x_a^i, s_a^i, f_a^i} stores a position, an influence radius, and a learned feature. A static prediction module decodes the anchors into the canonical base Gaussians x_can, which act as the time-independent spatial backbone shared across all frames.
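The anchor-to-Gaussian step can be pictured with a small pure-Python sketch. It is not the repository's code: the Anchor class, decode_positions, and unit_offsets are hypothetical names, and placing each Gaussian at the anchor position plus an offset scaled by the influence radius is an assumption borrowed from the Scaffold-GS convention.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Anchor:
    position: Vec3        # x_a
    radius: float         # s_a, influence radius
    feature: List[float]  # f_a, learned feature vector


def decode_positions(anchor: Anchor, unit_offsets: List[Vec3]) -> List[Vec3]:
    """Place one canonical Gaussian per offset, scaled by the influence radius.

    `unit_offsets` stands in for the output of the static prediction module,
    which in a real model would be predicted from `anchor.feature`.
    """
    ax, ay, az = anchor.position
    r = anchor.radius
    return [(ax + r * ox, ay + r * oy, az + r * oz) for ox, oy, oz in unit_offsets]


anchor = Anchor(position=(1.0, 2.0, 3.0), radius=0.5, feature=[0.0] * 4)
x_can = decode_positions(anchor, [(1.0, 0.0, 0.0), (0.0, -1.0, 0.0)])
print(x_can)
```

Scaling offsets by the radius keeps each anchor's Gaussians inside its own neighborhood, which is why the canonical positions can be shared across all frames.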
Warm-up phase. GrainGS first establishes a clean canonical base Gaussian from anchors (position, influence radius, and feature) before enabling temporal deformation, so the canonical geometry is well-formed when the deformation branch is switched on.
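One way to read the warm-up is as a gate on the temporal offsets. The sketch below is an illustration only: deformation_enabled and time_offsets are hypothetical helpers, and the 1-based iteration count and the exact boundary are assumptions rather than the behavior of train_dynamic.py.

```python
from typing import List, Sequence


def deformation_enabled(iteration: int, warm_up: int) -> bool:
    """Return True once the canonical warm-up has finished.

    Assumption: iterations are 1-based and the first `warm_up` iterations
    train only the canonical representation.
    """
    return iteration > warm_up


def time_offsets(iteration: int, warm_up: int, predicted: Sequence[float]) -> List[float]:
    # During warm-up every Gaussian keeps its canonical parameters,
    # which is equivalent to applying zero temporal offsets.
    if not deformation_enabled(iteration, warm_up):
        return [0.0 for _ in predicted]
    return list(predicted)


print(time_offsets(10, 3000, [0.1, 0.2]))
print(time_offsets(3001, 3000, [0.1, 0.2]))
```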
When the deformation network is trained jointly with the canonical branch from the start, its gradients backpropagate through the selected Gaussians and into the canonical base. This entangles the canonical and dynamic objectives and corrupts the canonical representation — the canonical model is forced to absorb motion it should not represent.
Gradient entanglement. Without intervention, gradients from the deformation network flow back into the canonical base Gaussians and produce a corrupted canonical representation (note the ghosting in the canonical figure).
GrainGS addresses this by (1) running a warm-up phase (--warm_up) that optimizes only the canonical/static representation for the first iterations, and (2) applying a stop-gradient on the canonical→deformation path so that the per-Gaussian temporal offsets are learned without back-propagating into and degrading the canonical base. The result is a clean canonical base that the deformation and appearance-residual networks operate on, as shown in the pipeline figure above.
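The effect of the stop-gradient can be seen in a toy example. The sketch below uses a tiny forward-mode autodiff class in pure Python rather than PyTorch, so it runs without a GPU stack; Dual, stop_gradient, and toy_deform are hypothetical names, and the linear toy_deform is only a stand-in for DeformNet. It also assumes that the direct path x_can + Δx keeps its gradient and only the input to the deformation network is cut. In PyTorch the analogous operation would be calling detach() on the tensor fed to the deformation network.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative w.r.t. the canonical position."""
    val: float
    grad: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.grad + other.grad)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule.
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)


def stop_gradient(x: Dual) -> Dual:
    # Keep the value, drop the derivative.
    return Dual(x.val, 0.0)


def toy_deform(x: Dual, t: float) -> Dual:
    # Hypothetical stand-in for DeformNet: offset = 0.5 * t * x.
    return Dual(0.5 * t, 0.0) * x


x_can = Dual(2.0, 1.0)  # d(x_can)/d(x_can) = 1
t = 0.8

entangled = x_can + toy_deform(x_can, t)
decoupled = x_can + toy_deform(stop_gradient(x_can), t)

print("entangled:", entangled.val, entangled.grad)
print("decoupled:", decoupled.val, decoupled.grad)
```

Both variants produce the same deformed position, but only the entangled one lets the deformation branch contribute to the derivative with respect to x_can. That extra term is the path through which motion would leak into the canonical base.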
The provided environment uses Python 3.10 and PyTorch with CUDA 12.1. Other CUDA/PyTorch versions may also work, but may require editing environment.yml.
Clone this repository:
git clone https://github.com/I2-Multimedia-Lab/GrainGS.git
cd GrainGS
Create and activate the conda environment:
conda env create -f environment.yml
conda activate GrainGS
Install the local CUDA extensions:
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn
If extension compilation fails, check that the CUDA toolkit, PyTorch CUDA version, and compiler version are compatible.
Create a data/ folder under the project root:
mkdir -p data
The default dynamic dataset root used by the scripts is:
data/
+-- DynamicScene/
    +-- D-Nerf/
    +-- DG-Mesh/
D-NeRF scenes should follow the Blender/NeRF synthetic format:
data/
+-- DynamicScene/
    +-- D-Nerf/
        +-- bouncingballs/
        |   +-- train/
        |   |   +-- r_000.png
        |   |   +-- r_001.png
        |   |   +-- ...
        |   +-- test/
        |   |   +-- r_000.png
        |   |   +-- r_001.png
        |   |   +-- ...
        |   +-- transforms_train.json
        |   +-- transforms_test.json
        |   +-- transforms_val.json
        |   +-- points3d.ply
        +-- hellwarrior/
        +-- hook/
        +-- jumpingjacks/
        +-- lego/
        +-- mutant/
        +-- standup/
        +-- trex/
For Blender-style data, pass --is_blender --white_background. The loader reads timestamps from each frame's time field when present; otherwise it assigns normalized time by frame order.
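The timestamp rule can be sketched as follows. This is an illustration, not the loader in scene/dynamic_dataset_loader.py: frame_times is a hypothetical helper, the fallback spacing i / (n - 1) is an assumed normalization, and treating the time field as all-or-nothing across frames is also an assumption.

```python
from typing import Dict, List


def frame_times(frames: List[Dict]) -> List[float]:
    """Return one timestamp per frame.

    Uses each frame's "time" field when every frame has one; otherwise
    falls back to evenly spaced values in [0, 1] by frame order.
    """
    if frames and all("time" in f for f in frames):
        return [float(f["time"]) for f in frames]
    n = len(frames)
    if n <= 1:
        return [0.0] * n
    return [i / (n - 1) for i in range(n)]


with_time = [{"file_path": "./train/r_000", "time": 0.0},
             {"file_path": "./train/r_001", "time": 0.5}]
without_time = [{"file_path": "./train/r_000"},
                {"file_path": "./train/r_001"},
                {"file_path": "./train/r_002"}]

print(frame_times(with_time))
print(frame_times(without_time))
```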
The training scripts expect DG-Mesh scenes under:
data/
+-- DynamicScene/
    +-- DG-Mesh/
        +-- beagle/
        +-- bird/
        +-- duck/
        +-- girlwalk/
        +-- horse/
        +-- torus2sphere/
For generic COLMAP-style dynamic scenes, the loader recognizes this structure:
scene_name/
+-- images/
|   +-- 00000.png
|   +-- 00001.png
|   +-- ...
+-- sparse/
    +-- 0/
        +-- cameras.bin
        +-- images.bin
        +-- points3D.bin
Text COLMAP files are also supported:
sparse/0/
+-- cameras.txt
+-- images.txt
+-- points3D.txt
If points3D.ply does not exist, it is generated from COLMAP points the first time the scene is loaded.
The dynamic loader also supports Nerfies/HyperNeRF-style data when the scene contains:
scene_name/
+-- scene.json
+-- metadata.json
+-- dataset.json
+-- points.npy
+-- camera/
+-- rgb/
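To check which of the three layouts a scene folder matches before training, a small helper like the one below may be useful. It is a hypothetical sketch: detect_scene_layout is not part of the repository, and the files it tests for and the order of the checks are assumptions based on the directory listings in this README, not the loader's actual dispatch logic.

```python
import tempfile
from pathlib import Path


def detect_scene_layout(scene_dir: str) -> str:
    """Guess the dataset layout of a scene folder from its marker files."""
    root = Path(scene_dir)
    if (root / "transforms_train.json").is_file():
        return "blender"   # D-NeRF / Blender synthetic format
    if (root / "dataset.json").is_file() and (root / "metadata.json").is_file():
        return "nerfies"   # Nerfies / HyperNeRF format
    if (root / "sparse").is_dir():
        return "colmap"    # COLMAP sparse reconstruction
    return "unknown"


# Small demonstration on a temporary COLMAP-style folder.
with tempfile.TemporaryDirectory() as tmp:
    scene = Path(tmp) / "scene_name"
    (scene / "sparse" / "0").mkdir(parents=True)
    (scene / "images").mkdir()
    layout = detect_scene_layout(str(scene))

print(layout)
```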
Use train_single.sh for common dynamic datasets:
bash train_single.sh dnerf bouncingballs
bash train_single.sh dgmesh beagle
This script writes logs and checkpoints to:
outputs/<dataset_type>/<scene_name>/
For example:
outputs/dnerf/bouncingballs/
+-- outputs.log
+-- cfg_args
+-- cameras.json
+-- input.ply
+-- results.json
+-- point_cloud/
To call train_dynamic.py directly on a D-NeRF scene:
python train_dynamic.py \
    -s data/DynamicScene/D-Nerf/bouncingballs \
    -m outputs/dnerf/bouncingballs \
    --is_blender \
    --white_background \
    --eval \
    --iterations 30000 \
    --save_iterations 7000 15000 30000 \
    --warm_up 3000
For a DG-Mesh scene:
python train_dynamic.py \
    -s data/DynamicScene/DG-Mesh/beagle \
    -m outputs/dgmesh/beagle \
    --eval \
    --iterations 30000 \
    --save_iterations 7000 15000 30000 \
    --warm_up 3000
To train every scene in a dataset:
bash scripts/train_all_dnerf.sh
bash scripts/train_all_dgmesh.sh
The all-scene scripts use the dataset lists defined inside each script.
For a quick check of the setup:
bash test_dynamic_train.sh
This runs a shorter D-NeRF training job on bouncingballs and writes results to outputs/test_bouncingballs.
Training includes periodic rendering and metric evaluation when --eval is enabled. The best PSNR, SSIM, and LPIPS values are saved in results.json, and detailed logs are saved in outputs.log.
Manual rendering:
python render_dynamic.py \
    -s data/DynamicScene/D-Nerf/bouncingballs \
    -m outputs/dnerf/bouncingballs \
    --iteration -1 \
    --skip_train
Compute metrics after rendering:
python metrics.py -m outputs/dnerf/bouncingballs
Rendered images are saved under:
outputs/dnerf/bouncingballs/
+-- test/
    +-- ours_<iteration>/
        +-- renders/
        +-- gt/
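The metrics compare each image in renders/ with its counterpart in gt/. As a reminder of what the PSNR number means, here is a minimal pure-Python sketch of the standard definition over pixel values in [0, 1]. The psnr function below is illustrative and operates on flat lists of floats; metrics.py may differ in averaging and data types.

```python
import math
from typing import Sequence


def psnr(render: Sequence[float], gt: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if len(render) != len(gt) or not render:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(render, gt)) / len(render)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.0, 0.0], [0.1, 0.1]))
print(psnr([0.5, 0.5], [0.5, 0.5]))
```

Higher PSNR and SSIM are better, while lower LPIPS is better.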
The FPS reported by the renderer is measured with CUDA synchronization around rendering calls:
torch.cuda.synchronize()
t_start = time.time()
# rendering
torch.cuda.synchronize()
t_end = time.time()
Measure FPS and checkpoint storage size without saving rendered images:
bash scripts/benchmark_dynamic_fps_storage.sh \
    data/DynamicScene/D-Nerf/lego \
    outputs/dnerf_7/lego \
    outputs/benchmark_results.csv \
    test \
    5
Batch benchmark scripts:
bash scripts/benchmark_all_dnerf.sh
bash scripts/benchmark_all_dgmesh.sh
The repository also keeps static Scaffold-GS style training scripts. For example:
bash train.sh \
    -d blending/drjohnson \
    -l scaffold \
    --gpu 0 \
    --voxel_size 0.001 \
    --update_init_factor 16 \
    --appearance_dim 32 \
    --ratio 1
Static scenes are expected under:
data/
+-- StaticScene/
Important dynamic training arguments:
- -s, --source_path: input scene path.
- -m, --model_path: output model path.
- --eval: enable train/test split and evaluation.
- --is_blender: use Blender/D-NeRF settings.
- --white_background: use white background for Blender synthetic data.
- --iterations: total training iterations.
- --save_iterations: checkpoint save iterations.
- --test_iterations: metric evaluation iterations.
- --warm_up: canonical warm-up iterations before deformation training.
- --time_appearance_dim: temporal appearance embedding dimension; set to 0 to disable.
- --use_canonical_cell: enable the canonical cell architecture.
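To see how these flags combine on a command line, the sketch below rebuilds a subset of them with argparse and parses a D-NeRF-style invocation. It is not the parser used by train_dynamic.py: the flag names come from the list above, but the types and defaults are assumptions taken from the example commands in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative subset only; types and defaults are assumed.
    p = argparse.ArgumentParser(description="Subset of dynamic training flags")
    p.add_argument("-s", "--source_path", required=True)
    p.add_argument("-m", "--model_path", required=True)
    p.add_argument("--eval", action="store_true")
    p.add_argument("--is_blender", action="store_true")
    p.add_argument("--white_background", action="store_true")
    p.add_argument("--iterations", type=int, default=30000)
    p.add_argument("--save_iterations", type=int, nargs="+", default=[])
    p.add_argument("--warm_up", type=int, default=3000)
    return p


args = build_parser().parse_args([
    "-s", "data/DynamicScene/D-Nerf/bouncingballs",
    "-m", "outputs/dnerf/bouncingballs",
    "--is_blender", "--white_background", "--eval",
    "--iterations", "30000",
    "--save_iterations", "7000", "15000", "30000",
    "--warm_up", "3000",
])
print(args.source_path, args.save_iterations, args.warm_up)
```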
GrainGS inherits components from 3D Gaussian Splatting and Scaffold-GS, so their original license terms apply where relevant. Please refer to the license files included with the bundled submodules under submodules/.
This project is built on top of the excellent work of 3D Gaussian Splatting and Scaffold-GS. The dynamic reconstruction components are inspired by Deformable 3D Gaussian Splatting, 4D Gaussian Splatting, and 4D Scaffold-GS. We thank the authors for releasing their code.


