FSGS

Real-Time Few-Shot View Synthesis using Gaussian Splatting

Zehao Zhu*   Zhiwen Fan*   Yifan Jiang   Zhangyang Wang
University of Texas at Austin
ECCV 2024

Overview

Novel view synthesis from limited observations remains an important and persistent task. However, existing NeRF-based few-shot methods often compromise efficiency to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time, photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse SfM point initialization with a carefully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations and subsequently fills in local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator into the Gaussian optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from the sparse points observed from limited input viewpoints, FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. Code will be made available.
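
As a rough illustration of the Gaussian Unpooling idea described above, below is a minimal PyTorch sketch (our own simplification under stated assumptions, not the released FSGS code): wherever a Gaussian's neighborhood is sparse, a new Gaussian is spawned at the midpoint to its closest neighbor, copying the appearance attributes and shrinking the scale. The function name, the proximity_thresh heuristic, and the attribute layout are illustrative assumptions.

# A minimal sketch of proximity-based Gaussian Unpooling (an illustrative
# simplification, not the released FSGS code). New Gaussians are grown where
# existing coverage is sparse, so later optimization can fill in local detail.
import torch

def unpool_gaussians(xyz, scales, rotations, opacities, colors,
                     k=3, proximity_thresh=0.1):
    """xyz: (N, 3) centers; the other tensors are per-Gaussian attributes."""
    dists = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
    dists.fill_diagonal_(float('inf'))
    knn_dist, knn_idx = dists.topk(k, dim=1, largest=False)

    # Grow only where the average distance to the k nearest neighbors is large.
    grow = (knn_dist.mean(dim=1) > proximity_thresh).nonzero(as_tuple=True)[0]
    nbr = knn_idx[grow, 0]                               # closest neighbor of each grown Gaussian

    new_xyz = 0.5 * (xyz[grow] + xyz[nbr])               # place the new Gaussian at the midpoint
    new_scales = 0.5 * scales[grow]                      # start smaller than the parent
    new_rot, new_opac, new_color = rotations[grow], opacities[grow], colors[grow]

    return (torch.cat([xyz, new_xyz]), torch.cat([scales, new_scales]),
            torch.cat([rotations, new_rot]), torch.cat([opacities, new_opac]),
            torch.cat([colors, new_color]))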

Method

FSGS is initialized from SfM points computed from a few input images (black cameras). Starting from these sparsely placed Gaussians, we densify the scene by unpooling existing Gaussians into new ones with properly initialized attributes, enhancing scene coverage. Monocular depth priors, reinforced by sampling unobserved views (red cameras), guide the optimization of the grown Gaussians towards a reasonable geometry. The final loss consists of a photometric term and a geometric regularization term computed as a depth correlation (sketched below the pipeline figure).

Pipeline Image
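
To make the geometric regularization term concrete, here is a minimal sketch of a depth-correlation loss in PyTorch (our own simplification; the names and the weighting are assumptions, not the official implementation). It penalizes low Pearson correlation between the depth rendered from the Gaussians and the monocular depth estimate for the same (training or pseudo) view, so only relative depth structure is enforced and the unknown scale and shift of the monocular prior do not matter.

# A minimal sketch of a depth-correlation regularizer (illustrative only).
import torch

def depth_correlation_loss(rendered_depth, mono_depth, eps=1e-6):
    """Both inputs are (H, W) depth maps of the same (training or pseudo) view."""
    x = rendered_depth.reshape(-1)
    y = mono_depth.reshape(-1)
    x = x - x.mean()
    y = y - y.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + eps)   # Pearson correlation
    return 1.0 - corr                                    # small when the depth maps agree

# Hypothetical use inside the training loop:
#   loss = photometric_loss + lambda_depth * depth_correlation_loss(d_render, d_mono)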

Baseline Comparisons

We demonstrate the visual improvement of FSGS over FreeNeRF and SparseNeRF on both forward-facing scenes and the 360-degree dataset. NeRF-based methods generate floaters and aliased results due to the limited observations, whereas our proposed FSGS produces more consistent and solid surfaces with geometric coherence.

FreeNeRF
SparseNeRF
3D-GS
FSGS

Visualizations on LLFF dataset

On the LLFF dataset, we visualize results from 8 different scenes trained with 3, 6, and 9 views respectively. FSGS produces a pleasing appearance while preserving detailed thin structures.

Trained with 3 Views

fern
flower
fortress
horns
leaves
orchids
room
trex

Trained with 6 Views

fern
flower
fortress
horns
leaves
orchids
room
trex

Trained with 9 Views

fern
flower
fortress
horns
leaves
orchids
room
trex

Visualizations on Mip-NeRF360 dataset

On the Mip-NeRF360 dataset, we visualize results from 6 scenes trained with 24 and 16 views respectively.

Trained with 24 Views

bicycle
bonsai
counter
garden
kitchen
room

Trained with 16 Views

bicycle
bonsai
counter
garden
kitchen
room

Citation

@misc{zhu2023FSGS,
    title={FSGS: Real-Time Few-Shot View Synthesis using Gaussian Splatting},
    author={Zehao Zhu and Zhiwen Fan and Yifan Jiang and Zhangyang Wang},
    year={2023},
    eprint={2312.00451},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}