Efficient Perspective-Correct 3D Gaussian Splatting
Using Hybrid Transparency

¹TU Braunschweig  ²FAU Erlangen-Nürnberg  ³University College London

Abstract


3D Gaussian Splats (3DGS) have proven to be a versatile rendering primitive, both for inverse rendering and for real-time exploration of scenes. In these applications, coherence across camera frames and multiple views is crucial, be it for robust convergence of a scene reconstruction or for artifact-free fly-throughs. Recent work has begun to mitigate artifacts that break multi-view coherence, addressing popping artifacts caused by inconsistent transparency sorting and introducing perspective-correct outlines of (2D) splats. At the same time, real-time requirements have forced such implementations to accept compromises in how the transparency of large assemblies of 3D Gaussians is resolved, in turn breaking coherence in other ways.
In our work, we aim to achieve maximum coherence by rendering fully perspective-correct 3D Gaussians while using hybrid transparency, a high-quality per-pixel approximation of accurate blending, to retain real-time frame rates. Our fast, perspective-accurate approach to evaluating 3D Gaussians requires no matrix inversions, which ensures numerical stability and eliminates the need for special handling of degenerate splats; our hybrid transparency formulation for blending maintains quality similar to fully resolved per-pixel transparency at a fraction of the rendering cost.
We further show that each of these two components can be integrated independently into existing Gaussian splatting systems. In combination, they achieve up to 2× higher frame rates, 2× faster optimization, and equal or better image quality with fewer rendering artifacts than traditional 3DGS on common benchmarks.

Accurate Splat Bounding and Evaluation



Although the affine approximation that 3DGS uses to project 3D Gaussians onto the image plane performs well on benchmark datasets, it fails to model perspective distortion correctly, especially when parts of the scene are viewed at close distance. The result is visually disturbing artifacts in which the projected Gaussians take on extreme, distorted shapes that severely degrade rendering quality. We propose a fast, differentiable method for perspective-accurate evaluation of 3D Gaussian splats at the point of maximum contribution along per-pixel viewing rays. It avoids matrix inversion entirely by extending established techniques [SWBG06, WHA*07].

The perspectively correct screen-space bounding box of a splat (a) is given by the projection of its bounding frustum in view space (b). When transformed into local splat coordinates, the frustum planes align with tangential planes of the unit sphere (c). Our approach for splat evaluation along viewing rays makes use of the Plücker coordinate representation (𝒅 : 𝒎). In local splat coordinates, the point along the ray that maximizes the Gaussian’s value corresponds to the point 𝒙 that minimizes the perpendicular distance ∥𝒙∥ to the origin (d). Parts (a–c) courtesy of Weyrich et al. [WHA*07]; used with permission.
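To make this concrete, the following NumPy sketch evaluates a Gaussian’s peak contribution along a viewing ray using exactly this construction. It is our illustration under stated assumptions (the function and parameter names are ours, and the rotation/scale parametrization is the usual 3DGS one), not the paper’s differentiable CUDA implementation:

import numpy as np

# Illustrative sketch: evaluate a 3D Gaussian's maximum contribution
# along a ray without inverting its covariance matrix.
def max_gaussian_along_ray(ray_o, ray_d, mean, R, scale):
    # mean:  Gaussian center, shape (3,)
    # R:     rotation matrix, shape (3, 3); columns are the splat axes
    # scale: per-axis standard deviations, shape (3,)

    # Transform the ray into local splat coordinates, where the Gaussian
    # becomes the standard (unit-sphere) Gaussian. Only the diagonal
    # scale is "inverted" (an elementwise division); no general matrix
    # inversion is required.
    o = R.T @ (ray_o - mean) / scale
    d = R.T @ ray_d / scale

    # Plücker moment of the transformed ray.
    m = np.cross(o, d)

    # Point x on the ray that minimizes the perpendicular distance
    # ||x|| to the origin ...
    x = np.cross(d, m) / np.dot(d, d)

    # ... which maximizes exp(-||x||^2 / 2) along the ray.
    return np.exp(-0.5 * np.dot(x, x))

A ray through the splat center yields a value of 1, and the value falls off with the ray’s whitened perpendicular distance; since ∥𝒙∥ = ∥𝒎∥/∥𝒅∥, the exponent can even be computed from the Plücker coordinates alone.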

Temporally-Stable Rendering via Hybrid Transparency



We propose to use the established rendering paradigm of hybrid transparency [MCTB13], which provides high quality and performance while avoiding the global depth presorting used in 3DGS. By alpha-blending the first 𝐾 fragments per pixel (the core) in correct depth order and accumulating the remaining contributions (the tail) into an order-independent residual, our method mitigates popping artifacts while maintaining superior performance.
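For illustration, the following Python sketch composites a single pixel’s fragments in this manner. It follows the hybrid transparency idea of Maule et al. [MCTB13]; the function name and the particular tail approximation (an alpha-weighted average color with a product-based total opacity) are our illustrative choices, not the paper’s exact GPU kernel:

import numpy as np

def composite_pixel(fragments, K=4):
    # fragments: iterable of (depth, rgb, alpha) tuples for one pixel
    # K:         core size, i.e. number of nearest fragments blended exactly

    # A GPU implementation keeps the core via a bounded per-pixel
    # insertion sort; a full sort suffices for illustration.
    fragments = sorted(fragments, key=lambda f: f[0])
    core, tail = fragments[:K], fragments[K:]

    # Core: exact front-to-back alpha blending of the K nearest fragments.
    color = np.zeros(3)
    transmittance = 1.0
    for _, rgb, alpha in core:
        color += transmittance * alpha * np.asarray(rgb, dtype=float)
        transmittance *= 1.0 - alpha

    # Tail: order-independent residual, composited behind the core.
    if tail:
        weight = max(sum(a for _, _, a in tail), 1e-8)
        tail_rgb = sum(a * np.asarray(rgb, dtype=float) for _, rgb, a in tail) / weight
        tail_alpha = 1.0 - np.prod([1.0 - a for _, _, a in tail])
        color += transmittance * tail_alpha * tail_rgb
        transmittance *= 1.0 - tail_alpha

    return color, transmittance

Because the tail reduces to a sum and a product, it needs no fragment ordering; this is what removes the global depth presorting and, with it, the frame-to-frame popping caused by changing sort orders.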

Visual comparisons of different configurations of our hybrid transparency approach. A smaller core size 𝐾 causes issues for reflective surfaces, as radiance fields commonly model these using semi-transparency. Disabling the order-independent tail at render time reduces quality only slightly, most noticeably in the sky, whereas omitting it during optimization results in catastrophic failure.

Quantitative Results


Quantitative comparisons on the Mip-NeRF360 and Tanks and Temples datasets. Combining perspective-correct splat evaluation with hybrid transparency significantly reduces training and rendering times while keeping image quality on par with the baselines. Excluding Zip-NeRF, the three best results are highlighted in green, in descending order of saturation.

Visual Comparisons


Interactive before/after comparisons: 3DGS vs. Ours, 2DGS vs. Ours, and StopThePop vs. Ours.

Concurrent Work


EVER also addresses limitations of 3D Gaussian splatting: it shows that the 3D Gaussians can be replaced with constant-density ellipsoids, which permits exact volume rendering. Compared to our approach, its rendering is slower because it relies on a ray tracing framework, but the exact volumetric rendering improves image quality significantly. Similarly, the recent Taming 3DGS proposes a controllable densification strategy alongside several improvements that reduce training times. We believe that future work could combine these ideas with our approach for even better results.

Citation


@article{hahlbohm2024htgs,
    title={Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency},
    author={Florian Hahlbohm and Fabian Friederichs and Tim Weyrich and Linus Franke and Moritz Kappel and Susana Castillo and Marc Stamminger and Martin Eisemann and Marcus Magnor},
    journal={arXiv},
    year={2024}
}

Acknowledgements


We would like to thank Timon Scholz and Carlotta Harms for their help with comparisons and the supplemental material.
The authors gratefully acknowledge financial support from the German Research Foundation (DFG) for the projects “Real-Action VR” (ID 523421583) and “Increasing Realism of Omnidirectional Videos in Virtual Reality” (ID 491805996), as well as from the L3S Research Center, Hanover, Germany.
Linus Franke was supported by the 5G innovation program of the German Federal Ministry for Digital and Transport under the funding code 165GU103B.

All scenes shown above are from the Mip-NeRF360 and Tanks and Temples datasets. The website template was adapted from Zip-NeRF. For the comparison sliders, we use img-comparison-slider and the video comparison tool from Ref-NeRF.