Re-direct to the full PAPER and CODE

Abstract

Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict their scalability to arbitrary output resolutions and hinder the recovery of geometric detail. This paper introduces InfiniDepth, which represents depth as neural implicit fields. Through a simple yet effective local implicit decoder, we can query depth at continuous 2D coordinates, enabling arbitrary-resolution and fine-grained depth estimation. To better assess our method’s capabilities, we curate a high-quality 4K synthetic benchmark from five different games, spanning diverse scenes with rich geometric and appearance details. Experiments demonstrate that InfiniDepth achieves state-of-the-art performance on both synthetic and real-world benchmarks across relative and metric depth estimation tasks, particularly excelling in fine-detail regions. It also benefits novel view synthesis under large viewpoint shifts, producing high-quality results with fewer holes and artifacts.

Methodology

Pipeline

Pipeline of InfiniDepth:

Feature Query: Given an input image and a continuous 2D query coordinate, we extract feature tokens from multiple layers of the ViT encoder and query the local feature for that coordinate at each scale through bilinear interpolation.
Depth Decoding: Given the multi-scale local features queried at the continuous coordinate, we hierarchically fuse them from high to low spatial resolution and decode the fused feature into a depth value through an MLP head.
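
The two steps above can be illustrated with a minimal pure-Python sketch. It is not the released implementation. The function names (`bilinear_query`, `decode_depth`), the normalized pixel-center coordinate convention, the concatenate-then-linear fusion rule, and the two-layer head are all assumptions made for illustration. Only the overall flow (bilinear query per scale, fusion from high to low resolution, MLP decoding) follows the description above.

```python
import math


def bilinear_query(feat, u, v):
    """Sample a C-dim feature from an H x W x C grid at normalized (u, v) in [0, 1].

    Assumed convention: u runs along the width, v along the height, and
    grid cell (i, j) has its center at ((j + 0.5) / W, (i + 0.5) / H).
    """
    h, w = len(feat), len(feat[0])
    # Continuous grid position, clamped so border queries reuse edge values.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    return [
        (1 - ay) * ((1 - ax) * feat[y0][x0][c] + ax * feat[y0][x1][c])
        + ay * ((1 - ax) * feat[y1][x0][c] + ax * feat[y1][x1][c])
        for c in range(len(feat[0][0]))
    ]


def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * t for w, t in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, t) for t in x]


def decode_depth(feature_maps, u, v, fuse_layers, head):
    """Query every scale at (u, v), fuse high -> low resolution, decode depth.

    feature_maps: grids ordered from high to low spatial resolution.
    fuse_layers:  one (weight, bias) pair per fusion step (hypothetical rule).
    head:         two (weight, bias) pairs forming a small MLP (hypothetical).
    """
    local = [bilinear_query(f, u, v) for f in feature_maps]
    fused = local[0]
    for feat, (w, b) in zip(local[1:], fuse_layers):
        # List concatenation of the running feature and the next coarser one.
        fused = relu(linear(fused + feat, w, b))
    (w1, b1), (w2, b2) = head
    return linear(relu(linear(fused, w1, b1)), w2, b2)[0]


# Toy usage with hand-picked weights (real weights would be learned).
high_res = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2 x 2 grid, 1 channel
low_res = [[[10.0]]]                         # 1 x 1 grid, 1 channel
fuse_layers = [([[1.0, 0.1]], [0.0])]
head = (([[2.0]], [0.0]), ([[1.0]], [0.5]))
depth = decode_depth([high_res, low_res], 0.5, 0.5, fuse_layers, head)
```

Because the query coordinate is continuous, the same function can be evaluated at any location, which is what allows the output resolution to be decoupled from the feature-grid resolution.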

Qualitative Visualization

The following are sample scene examples from InfiniDepth. The project page provides fully interactive versions where you can zoom into 8K depth maps, rotate 3D point clouds, and explore Gaussian Splatting scenes in real time.

Depth Map

Fine-grained depth maps predicted by InfiniDepth at high resolution. Each pair shows the RGB input (left) and the predicted depth map (right).

For an interactive depth map viewer with zoom and pan, visit the InfiniDepth project page.
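
As a rough illustration of how a continuous depth field can be rendered at any chosen resolution, the sketch below builds a grid of normalized pixel-center coordinates and evaluates a depth function at each one. The names `make_query_grid` and `render_depth`, the coordinate convention, and the toy `predict_depth` callable are hypothetical stand-ins, not part of the InfiniDepth code.

```python
def make_query_grid(out_h, out_w):
    """Normalized pixel-center coordinates (u, v) for an out_h x out_w output."""
    return [
        [((j + 0.5) / out_w, (i + 0.5) / out_h) for j in range(out_w)]
        for i in range(out_h)
    ]


def render_depth(predict_depth, out_h, out_w):
    """Evaluate a continuous depth function at every query coordinate."""
    return [
        [predict_depth(u, v) for (u, v) in row]
        for row in make_query_grid(out_h, out_w)
    ]


# Toy stand-in for a learned depth field: a tilted plane.
def toy_field(u, v):
    return 1.0 + u + 2.0 * v


# The same field rendered at two different resolutions.
coarse = render_depth(toy_field, 2, 2)
fine = render_depth(toy_field, 4, 8)
```

Only the query grid changes between the two calls; the underlying field is untouched.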

Point Cloud

Point clouds reconstructed from predicted depth maps on diverse scenes. Below are sample scene examples and the rendered point cloud visualizations.

Datasets shown: DIODE, ETH3D, and NYU.

For an interactive 3D point cloud viewer with rotation, zoom, and pan, visit the InfiniDepth project page.
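
For readers unfamiliar with how a depth map becomes a point cloud, the following is a minimal sketch of standard pinhole back-projection. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and depth measured along the optical axis. The page does not state the camera model or conventions used for these visualizations, so the function name `backproject` and all parameters here are illustrative assumptions.

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift an H x W depth map to a list of 3D points in camera coordinates.

    Assumes a pinhole model where pixel (u, v) = (column, row) and the
    principal point (cx, cy) is expressed in the same pixel convention.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Skip pixels with no valid depth.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2 x 2 depth map with one invalid pixel.
toy_depth = [[2.0, 2.0], [0.0, 4.0]]
cloud = backproject(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

With a continuous depth field, the input depth map could be sampled at a higher resolution before back-projection, which would yield a denser point cloud from the same image.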

Interactive Gaussian Splatting

3D Gaussian Splatting results from single-view depth prediction by InfiniDepth. Below are sample scene examples (input views).

For an interactive Gaussian Splatting viewer where you can freely navigate the 3D scenes, visit the InfiniDepth project page.

Qualitative Comparison

Depth Map Comparison

Predicted depth maps from different methods on the same input. Blue and pink boxes highlight regions with fine-grained geometric details.

Point Cloud Comparison

Predicted point clouds from different methods on the same input. Orange boxes highlight regions with fine-grained geometric details.

Novel View Synthesis

Single-view novel view synthesis (NVS) results under large viewpoint changes, e.g., bird’s-eye views (BEV).

Cite

If you find this work useful in your research, please cite:

@inproceedings{yu2026infinidepth,
    title={InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields},
    author={Hao Yu and Haotong Lin and Jiawei Wang and Jiaxin Li and Yida Wang and Xueyang Zhang and Yue Wang and Xiaowei Zhou and Ruizhen Hu and Sida Peng},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2026}
}
