INPC: Implicit Neural Point Clouds for Radiance Field Rendering
3DV 2025 (Oral Presentation)

TU Braunschweig · FAU Erlangen-Nürnberg · University of New Mexico

Abstract


We introduce a new approach for reconstruction and novel view synthesis of unbounded real-world scenes.
In contrast to previous methods using either volumetric fields, grid-based models, or discrete point cloud proxies, we propose a hybrid scene representation, which implicitly encodes the geometry in a continuous octree-based probability field and view-dependent appearance in a multi-resolution hash grid. This allows for extraction of arbitrary explicit point clouds, which can be rendered using rasterization.
In doing so, we combine the benefits of both worlds and retain favorable behavior during optimization: Our novel implicit point cloud representation and differentiable bilinear rasterizer enable fast rendering while preserving the fine geometric detail captured by volumetric neural fields. Furthermore, this representation does not depend on priors like structure-from-motion point clouds.
Our method achieves state-of-the-art image quality on common benchmarks. In addition, we achieve fast inference at interactive frame rates and can convert our trained model into a large, explicit point cloud to further enhance performance.

Pipeline



We introduce the implicit point cloud, a combination of a point probability field stored in an octree and implicitly stored appearance features. To render an image for a given viewpoint, we sample the representation by estimating point positions and querying the multi-resolution hash grid for per-point features. This explicit point cloud, together with a small background MLP, is then rendered with a bilinear point splatting module and processed by a CNN. During training, both the neural networks and the implicit point cloud are optimized, efficiently reconstructing the scene.
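To make the pipeline concrete, the following is a minimal PyTorch sketch of how the stages could compose. It is an illustration under simplifying assumptions rather than our actual implementation: the appearance field is a plain MLP standing in for the multi-resolution hash grid, the background MLP is reduced to a global additive term, and opacity handling, depth ordering, and multi-resolution feature maps are omitted. The argument pixel_xy denotes precomputed 2D projections of the sampled points.

import torch
import torch.nn as nn

class INPCSketch(nn.Module):
    """Toy stand-in for the per-view rendering pass; not the actual implementation."""

    def __init__(self, feat_dim=16, image_size=(128, 128)):
        super().__init__()
        self.image_size = image_size
        # Plain MLP standing in for the multi-resolution hash grid (appearance).
        self.appearance_field = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Small background MLP evaluated on ray directions.
        self.background_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # CNN that converts the splatted feature map into an RGB image.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    def splat(self, pixel_xy, features):
        """Bilinear point splatting: each point adds to its 4 nearest pixels."""
        h, w = self.image_size
        fmap = features.new_zeros(features.shape[-1], h * w)
        x = pixel_xy[:, 0].clamp(0, w - 1.001)
        y = pixel_xy[:, 1].clamp(0, h - 1.001)
        x0, y0 = x.floor().long(), y.floor().long()
        fx, fy = x - x0, y - y0
        corners = [(0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                   (0, 1, (1 - fx) * fy), (1, 1, fx * fy)]
        for dx, dy, wgt in corners:
            idx = (y0 + dy) * w + (x0 + dx)
            fmap.index_add_(1, idx, (features * wgt[:, None]).t())
        return fmap.view(1, -1, h, w)

    def forward(self, positions, pixel_xy, ray_dirs):
        feats = self.appearance_field(positions)        # per-point features
        fmap = self.splat(pixel_xy, feats)              # rasterized feature map
        bg = self.background_mlp(ray_dirs).mean(dim=0)  # crude global background term
        return self.decoder(fmap + bg.view(1, -1, 1, 1))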

Point Cloud Sampling



To sample a point cloud for a given viewpoint, we first determine which voxels lie inside the viewing frustum and downscale their probabilities based on voxel size as well as distance to the camera. Next, we generate a set of positions using multinomial sampling with replacement, where each point is randomly offset inside its corresponding voxel. Lastly, we query a neural field for per-point appearance features. A rough sketch of this step is shown below.
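The sketch below illustrates the sampling step under assumed data layouts: flattened arrays of voxel centers, sizes, and probabilities together with a precomputed frustum mask, rather than an actual octree traversal. The probability rescaling shown is only one plausible choice and not necessarily the exact formula used by the method.

import torch

def sample_view_point_cloud(voxel_centers, voxel_sizes, voxel_probs,
                            in_frustum, cam_pos, num_points):
    """Draw a viewpoint-specific point cloud from per-voxel probabilities."""
    # 1) Keep only voxels inside the viewing frustum.
    centers = voxel_centers[in_frustum]                 # (M, 3)
    sizes = voxel_sizes[in_frustum]                     # (M,)
    probs = voxel_probs[in_frustum]                     # (M,)

    # 2) Rescale probabilities with voxel size and camera distance; the
    #    projected-size weighting below is an assumption, not the paper's formula.
    dist = torch.linalg.norm(centers - cam_pos, dim=-1).clamp_min(1e-6)
    weights = probs * (sizes / dist) ** 2

    # 3) Multinomial sampling with replacement over the visible voxels.
    idx = torch.multinomial(weights, num_points, replacement=True)

    # 4) Offset each point uniformly at random inside its voxel.
    offsets = torch.rand(num_points, 3, device=centers.device) - 0.5
    positions = centers[idx] + offsets * sizes[idx].unsqueeze(-1)
    return positions  # per-point appearance features are then queried from the neural field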

Results


User Study


We complement our evaluation with a perceptual experiment in which we compare INPC against Zip-NeRF, as the latter achieves the best quality metrics among the compared methods. We followed a fully randomized, within-participants experimental design with a two-alternative forced-choice (2AFC) task. Our 17 participants saw the results of both methods side-by-side (one pair at a time, in random order and with random screen sides, using a different order per participant) and were instructed to select the image they preferred. The 55 stimuli covered all 17 evaluated scenes, with at least 3 frames per scene. Participants favored our method in 69.41% of cases on average, and every participant preferred our results at a rate above chance.

Click here to try our web version of the user study

Comparisons


Interactive side-by-side comparisons of our results against 3DGS, Zip-NeRF, and TRIPS.

Sampling during Inference


View-Specific Multisampling

Global Pre-Extraction

To achieve the best image quality during inference, we sample multiple viewpoint-specific point clouds for each image and average the rasterized feature maps. Alternatively, we pre-extract a global point cloud that can be used for every viewpoint, which boosts frame rates at the cost of some image quality.
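A schematic sketch of the multisampling path is given below; the callables sample_fn, feature_fn, splat_fn, and decoder are placeholders for the components described above, not actual API names.

import torch

@torch.no_grad()
def render_multisampled(view, sample_fn, feature_fn, splat_fn, decoder,
                        num_samples=4):
    """Average rasterized feature maps over several viewpoint-specific samples."""
    fmaps = []
    for _ in range(num_samples):
        positions = sample_fn(view)            # fresh viewpoint-specific sample
        features = feature_fn(positions)       # per-point appearance lookup
        fmaps.append(splat_fn(positions, features, view))
    fmap = torch.stack(fmaps).mean(dim=0)      # average in feature space
    return decoder(fmap)                       # CNN runs once on the averaged map

In this sketch, averaging happens in feature space before the decoder, so the CNN only runs once per frame regardless of the number of samples.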

Related Work


Please also check out RadSplat, which likewise improves upon best-quality baselines in terms of both quality and inference frame rates. It optimizes a 3D Gaussian model with NeRF-based supervision and achieves high-fidelity novel-view synthesis at remarkably high frame rates. Similarly, check out TRIPS, which uses trilinearly splatted points to render crisp images in real time.

Citation


@inproceedings{hahlbohm2025inpc,
  title     = {{INPC}: Implicit Neural Point Clouds for Radiance Field Rendering},
  author    = {Hahlbohm, Florian and Franke, Linus and Kappel, Moritz and Castillo, Susana and Eisemann, Martin and Stamminger, Marc and Magnor, Marcus},
  booktitle = {International Conference on 3D Vision},
  doi       = {tba},
  year      = {2025},
  url       = {https://fhahlbohm.github.io/inpc/}
}

Acknowledgements


We would like to thank Peter Kramer for his help with the video, Timon Scholz for his help with the implementation of our viewer, and Fabian Friederichs and Leon Overkämping for their valuable suggestions.
This work was partially funded by the DFG (“Real-Action VR”, ID 523421583) and the L3S Research Center, Hanover, Germany. We thank the Erlangen National High Performance Computing Center (NHR@FAU) for the provided scientific support and HPC resources under the NHR project b162dc. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the DFG (ID 440719683).
Linus Franke was supported by the Bavarian Research Foundation (AZ-1422-20) and the 5G innovation program of the German Federal Ministry for Digital and Transport under the funding code 165GU103B.

All scenes shown above are from the Mip-NeRF360 and Tanks and Temples datasets. The website template was adapted from Zip-NeRF, who borrowed from Michaël Gharbi and Ref-NeRF. For the comparison sliders we follow RadSplat and use img-comparison-slider.