Disentangled Generation and Aggregation for Robust Radiance Fields

ECCV 2024


Shihe Shen1*, Huachen Gao1*, Wangze Xu1, Rui Peng1,2, Luyang Tang1,2, Kaiqiang Xiong1,2, Jianbo Jiao3, Ronggang Wang1,2

1School of Electronic and Computer Engineering, Peking University
2Peng Cheng Laboratory
3School of Computer Science, University of Birmingham
* denotes equal contribution

Abstract


[Teaser figure]

Triplane-based radiance fields have gained attention in recent years for their ability to effectively disentangle 3D scenes with a high-quality representation at low computation cost. A key requirement of this representation is precise input camera poses. However, due to the local-update property of the triplane, joint estimation in the style of previous joint pose-NeRF optimization works easily falls into local minima. To this end, we propose the Disentangled Triplane Generation module, which introduces global feature context and smoothness into triplane learning and thereby mitigates the errors caused by local updating. We then propose Disentangled Plane Aggregation (DPA) to mitigate the entanglement that the common triplane feature aggregation causes during camera pose updating. In addition, we introduce a two-stage warm-start training strategy to reduce the implicit constraints imposed by the triplane generator. Quantitative and qualitative results demonstrate that our method achieves state-of-the-art performance in novel view synthesis with noisy or unknown camera poses, together with efficient convergence of optimization.
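For reference, the "common triplane feature aggregation" mentioned above combines bilinearly interpolated features from the three axis-aligned planes, typically by element-wise product (as in K-Planes). The sketch below is a minimal PyTorch illustration, not released code; the function names and tensor shapes are hypothetical. Its comments also annotate the local-update property discussed above.

import torch
import torch.nn.functional as F

def sample_plane(plane, uv):
    # plane: (C, H, W) learnable feature grid
    # uv:    (N, 2) plane coordinates normalized to [-1, 1]
    # Gradients reach only the four texels around each sample; this
    # local-update property is what makes joint pose-triplane
    # optimization prone to local minima.
    grid = uv.view(1, -1, 1, 2)                      # (1, N, 1, 2)
    out = F.grid_sample(plane[None], grid, mode="bilinear",
                        align_corners=True)          # (1, C, N, 1)
    return out.view(plane.shape[0], -1).t()          # (N, C)

def aggregate(planes, xyz):
    # Common triplane aggregation: element-wise product of features
    # sampled from the XY, XZ, and YZ planes.
    f_xy = sample_plane(planes["xy"], xyz[:, [0, 1]])
    f_xz = sample_plane(planes["xz"], xyz[:, [0, 2]])
    f_yz = sample_plane(planes["yz"], xyz[:, [1, 2]])
    return f_xy * f_xz * f_yz                        # input to the MLP decoder

Under this product aggregation, the pose gradient at each sampled point flows through all three planes jointly; this is the entanglement that DPA is designed to mitigate.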


Method


[Pipeline overview figure]

Our method consists of two stages. In the first stage, we feed random triplane noise to the proposed triplane generator, which produces the feature grids for scene representation while the camera poses are optimized jointly. Features interpolated from the different planes are aggregated through the proposed DPA, and colors and densities are decoded by an MLP. In the second stage, we discard the generator and switch to direct updates on the triplane feature grids. This warm-start training strategy provides both efficient training and a better scene representation.
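The following is only an illustrative PyTorch sketch of the two-stage warm-start schedule described above. The generator architecture, pose parameterization, step counts, and learning rates are hypothetical placeholders, and the volumetric rendering loss is reduced to a stub.

import torch
import torch.nn as nn

class TriplaneGenerator(nn.Module):
    # Hypothetical generator mapping a noise code to three feature
    # planes; shared weights give every texel a global gradient path,
    # unlike direct per-texel grid updates.
    def __init__(self, c=32, z_dim=64, res=128):
        super().__init__()
        self.c, self.res = c, res
        self.fc = nn.Linear(z_dim, c * 16 * 16)
        self.up = nn.Sequential(
            nn.Upsample(size=res, mode="bilinear", align_corners=True),
            nn.Conv2d(c, 3 * c, 3, padding=1),
        )

    def forward(self, z):
        x = self.fc(z).view(1, self.c, 16, 16)
        return self.up(x).view(3, self.c, self.res, self.res)  # XY, XZ, YZ

def render_loss(planes, poses):
    # Stand-in for the photometric rendering loss: a real implementation
    # would cast rays from `poses`, aggregate plane features via DPA,
    # and decode colors/densities with the MLP.
    return planes.square().mean() + poses.square().mean()

generator = TriplaneGenerator()
poses = nn.Parameter(torch.zeros(20, 6))   # e.g. 20 views, se(3) residuals
z = torch.randn(1, 64)                     # fixed random triplane noise

# Stage 1: jointly optimize the generator and the camera poses.
opt = torch.optim.Adam([*generator.parameters(), poses], lr=1e-3)
for _ in range(1000):                      # placeholder step count
    loss = render_loss(generator(z), poses)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2 (warm start): discard the generator and continue with direct
# updates on the generated grids, removing its implicit constraints.
planes = nn.Parameter(generator(z).detach())
opt = torch.optim.Adam([planes, poses], lr=1e-2)
for _ in range(1000):
    loss = render_loss(planes, poses)
    opt.zero_grad(); loss.backward(); opt.step()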



NVS Results on LLFF Dataset


[LLFF results figure]

Naive denotes the reference K-Planes baseline trained with unknown camera poses.


NVS Results on NeRF-Synthetic Dataset


[NeRF-Synthetic results figure]

Naive denotes the reference K-Planes baseline trained with noisy camera poses.





Citation


@inproceedings{DiGARR,
  author = {Shen, Shihe and Gao, Huachen and Xu, Wangze and Peng, Rui and Tang, Luyang and Xiong, Kaiqiang and Jiao, Jianbo and Wang, Ronggang},
  title = {Disentangled Generation and Aggregation for Robust Radiance Fields},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2024}
}