INPC: Implicit Neural Point Clouds for Radiance Field Rendering
3DV 2025 (Oral Presentation)
Abstract
We introduce a new approach for reconstruction and novel view synthesis of unbounded real-world scenes.
In contrast to previous methods using either volumetric fields, grid-based models, or discrete point
cloud proxies, we propose a hybrid scene representation, which implicitly encodes the geometry in a
continuous octree-based probability field and view-dependent appearance in a multi-resolution hash grid.
This allows for extraction of arbitrary explicit point clouds, which can be rendered using rasterization.
In doing so, we combine the benefits of both worlds and retain favorable behavior during optimization:
Our novel implicit point cloud representation and differentiable bilinear rasterizer enable fast
rendering while preserving the fine geometric detail captured by volumetric neural fields.
Furthermore, this representation does not depend on priors like structure-from-motion point clouds.
Our method achieves state-of-the-art image quality on common benchmarks.
In addition, we achieve fast inference at interactive frame rates and can convert our trained model
into a large, explicit point cloud to further enhance performance.
Pipeline

We introduce the implicit point cloud, a combination of a point probability field stored in an
octree and implicitly stored appearance features. To render an image for a given viewpoint, we sample
the representation by estimating point positions and querying the multi-resolution hash grid for
per-point features. This explicit point cloud – together with a small background MLP – is then rendered
with a bilinear point splatting module and processed by a CNN. During training, the neural networks
and the implicit point cloud are jointly optimized, efficiently reconstructing the scene.
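As a rough illustration of this pipeline, the PyTorch-style sketch below walks through one rendering pass. All names (prob_field, hash_grid, splatter, background_mlp, cnn, and the camera helpers) are illustrative placeholders under assumed tensor shapes, not our actual implementation.

import torch

def render_view(camera, prob_field, hash_grid, background_mlp, splatter, cnn,
                num_points=2**20):
    # 1. Sample an explicit, viewpoint-specific point cloud from the octree-based
    #    probability field (see "Point Cloud Sampling" below).
    positions = prob_field.sample_points(camera, num_points)            # (N, 3)

    # 2. Query the multi-resolution hash grid for per-point appearance features;
    #    view directions account for view-dependent appearance.
    view_dirs = torch.nn.functional.normalize(positions - camera.origin, dim=-1)
    features = hash_grid(positions, view_dirs)                          # (N, C)

    # 3. Bilinearly splat the features into a screen-space feature map.
    feature_map, alpha = splatter(positions, features, camera)          # (C, H, W), (1, H, W)

    # 4. Composite with the background predicted by a small MLP.
    background = background_mlp(camera.ray_directions())                # (C, H, W)
    feature_map = feature_map + (1.0 - alpha) * background

    # 5. A CNN decodes the feature map into the final RGB image.
    return cnn(feature_map.unsqueeze(0)).squeeze(0)                     # (3, H, W)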
Point Cloud Sampling
To sample a point cloud for a given viewpoint, we first determine which voxels lie inside the viewing
frustum and downscale their probabilities based on voxel size as well as distance to the camera. Next, we
generate a set of positions using multinomial sampling with replacement, where each sampled point is
randomly offset inside its corresponding voxel. Lastly, we query a neural field for per-point appearance features.
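For concreteness, a minimal sketch of this sampling step is given below, assuming the octree leaves are flattened into per-voxel tensors and that probabilities are rescaled proportionally to voxel size and inversely to camera distance; camera.in_frustum, camera.origin, and appearance_field are hypothetical helpers.

import torch

def sample_point_cloud(voxel_centers, voxel_sizes, voxel_probs, camera,
                       appearance_field, num_points=2**20):
    # Keep only voxels that lie inside the viewing frustum.
    visible = camera.in_frustum(voxel_centers)                # (V,) boolean mask
    centers = voxel_centers[visible]                          # (V', 3)
    sizes = voxel_sizes[visible]                              # (V',)

    # Downscale probabilities based on voxel size and camera distance
    # (here: proportional to size, inversely proportional to distance).
    distances = torch.linalg.norm(centers - camera.origin, dim=-1)
    weights = voxel_probs[visible] * sizes / distances.clamp(min=1e-6)

    # Multinomial sampling with replacement selects a voxel for each point.
    idx = torch.multinomial(weights, num_points, replacement=True)

    # Randomly offset every point inside its corresponding voxel.
    offsets = (torch.rand(num_points, 3, device=centers.device) - 0.5) * sizes[idx, None]
    positions = centers[idx] + offsets

    # Query the neural field (multi-resolution hash grid) for per-point features;
    # the view-direction input used for view-dependent appearance is omitted for brevity.
    features = appearance_field(positions)
    return positions, features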
Results
User Study
We complement our evaluation with a perceptual experiment in which we compare INPC against Zip-NeRF, as the latter achieves the best quality metrics among the compared methods. We followed a fully randomized, within-participants experimental design with a 2AFC task. Our 17 participants saw the results of both methods side-by-side (one pair at a time, in random order and screen side, with a different order per participant) and were instructed to select the image they preferred. The 55 stimuli covered all 17 evaluated scenes, with at least 3 frames per scene. Participants favored our method in 69.41% of cases on average, and every participant preferred our results more often than chance.
Comparisons
Interactive comparison sliders against 3DGS, Zip-NeRF, and TRIPS.
Sampling during Inference
We support two inference modes: view-specific multisampling and global pre-extraction. To achieve the best image quality, we sample multiple viewpoint-specific point clouds for each image and average the rasterized feature maps. Alternatively, we pre-extract a single global point cloud that can be reused for every viewpoint, which boosts frame rates at the cost of some image quality.
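The sketch below contrasts the two inference modes, reusing the placeholder naming from the pipeline sketch above; model.splat_view and model.splat stand in for the sample-query-splat-composite steps and do not reflect our actual API.

import torch

@torch.no_grad()
def render_multisampled(camera, model, num_samples=4):
    # View-specific multisampling: rasterize several independently sampled,
    # viewpoint-specific point clouds and average the feature maps before the
    # CNN decodes the final image (best quality, slower).
    feature_maps = [model.splat_view(camera) for _ in range(num_samples)]
    mean_map = torch.stack(feature_maps, dim=0).mean(dim=0)
    return model.cnn(mean_map.unsqueeze(0)).squeeze(0)

@torch.no_grad()
def render_global(camera, model, global_positions, global_features):
    # Global pre-extraction: a large point cloud extracted once is reused for
    # every viewpoint (higher frame rates, slightly lower quality).
    feature_map = model.splat(global_positions, global_features, camera)
    return model.cnn(feature_map.unsqueeze(0)).squeeze(0)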
Related Work
Please also check out RadSplat, which likewise improves upon the best-quality baselines in terms of both quality and inference frame rates. They optimize a 3D Gaussian model with NeRF-based supervision and achieve high-fidelity novel-view synthesis at remarkably high frame rates. Similarly, check out TRIPS, a work that uses trilinearly splatted points to render crisp images in real time.
Citation
@inproceedings{hahlbohm2025inpc,
  title     = {{INPC}: Implicit Neural Point Clouds for Radiance Field Rendering},
  author    = {Hahlbohm, Florian and Franke, Linus and Kappel, Moritz and Castillo, Susana and Eisemann, Martin and Stamminger, Marc and Magnor, Marcus},
  booktitle = {International Conference on 3D Vision},
  doi       = {tba},
  year      = {2025},
  url       = {https://fhahlbohm.github.io/inpc/}
}
Acknowledgements
We would like to thank Peter Kramer for his help with the video, Timon Scholz for his help with the implementation of our viewer, and Fabian Friederichs and Leon Overkämping for their valuable suggestions.
This work was partially funded by the DFG (“Real-Action VR”, ID 523421583) and the L3S Research Center, Hanover, Germany. We thank the Erlangen National High Performance Computing Center (NHR@FAU) for the provided scientific support and HPC resources under the NHR project b162dc. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the DFG (ID 440719683).
Linus Franke was supported by the Bavarian Research Foundation (AZ-1422-20) and the 5G innovation program of the German Federal Ministry for Digital and Transport under the funding code 165GU103B.
All scenes shown above are from the Mip-NeRF360 and Tanks and Temples datasets. The website template was adapted from Zip-NeRF, who borrowed from Michaël Gharbi and Ref-NeRF. For the comparison sliders we follow RadSplat and use img-comparison-slider.