Perception and Grounding

ZoeDepth

ZoeDepth estimates metric depth from a single RGB image by combining relative depth priors with metric depth prediction, enabling zero-shot transfer across indoor and outdoor scenes.

Tool Introduction

Core parameters, trigger timing, and visual before/after demo references.

Short Explanation

Upload one RGB image and ZoeDepth predicts a metric depth map that can be used for 3D perception, obstacle reasoning, or scene geometry estimation.
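Using the depth map for 3D perception usually means back-projecting pixels into camera coordinates with a pinhole camera model. ZoeDepth does not estimate camera intrinsics, so the focal lengths `fx`, `fy` and principal point `cx`, `cy` below are caller-supplied assumptions. The depth map is shown as plain nested lists of meters for illustration, and `backproject` is a hypothetical helper, not part of ZoeDepth.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]

def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Convert a metric depth map (rows of meters) into camera-frame 3D points.

    Assumes an ideal pinhole camera with image x to the right and y down;
    intrinsics are supplied by the caller, not predicted by ZoeDepth.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v = pixel row
        for u, z in enumerate(row):          # u = pixel column
            if z <= 0:                       # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

if __name__ == "__main__":
    # Hypothetical 2x2 depth map and intrinsics, for illustration only.
    demo = [[2.0, 2.0], [4.0, 0.0]]
    print(backproject(demo, fx=500.0, fy=500.0, cx=0.5, cy=0.5))
```

The resulting points can feed obstacle checks, such as finding the nearest point inside a region of interest, or a point-cloud viewer.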

Input: RGB image
Output: Metric depth map
Trigger timing: Triggered on demand from the source demo or a local example command.
Runtime: Python / PyTorch / Torch Hub / Gradio UI
Before: RGB image

Prepare a single RGB image of the scene. Apart from the image, the only choices are the model variant, the weight source, and the output path.

After: Metric depth map

Inspect the predicted per-pixel metric depth map and, optionally, its colored visualization.

Preset Example

A quick-run style example for the documentation page.

Input: tools/zoedepth/examples/input.jpg
Prompt: Use ZoeD_N (NYU-trained, indoor-oriented) for single-image RGB depth estimation
Expected: A metric depth map and an optional colored depth visualization saved in the run folder.

Parameters And Output

Readable controls and the meaning of each returned artifact.

Parameter Explanation

image (file)

Single RGB image to convert into a metric depth prediction.

model (select, default: ZoeD_N)

Checkpoint variant. ZoeD_N is fine-tuned on NYU Depth V2 for indoor scenes, ZoeD_K on KITTI for outdoor driving scenes, and ZoeD_NK on both for mixed indoor/outdoor transfer.

pretrained_resource (select, default: Torch Hub)

Chooses where the model weights come from: Torch Hub, a local checkpoint, or the source specified in the repository config.
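In the upstream repository, the `pretrained_resource` config value is a string prefixed with `url::` (download from a URL) or `local::` (checkpoint on disk). The helper below is a hypothetical sketch that mirrors that convention, so a local wrapper can validate the setting before loading anything. It does not call ZoeDepth itself, and the checkpoint path in the test is a placeholder.

```python
from typing import Tuple

SUPPORTED_PREFIXES = ("url::", "local::")

def parse_pretrained_resource(resource: str) -> Tuple[str, str]:
    """Split e.g. 'local::./checkpoints/model.pt' into ('local', './checkpoints/model.pt').

    Hypothetical helper mirroring the url::/local:: convention; raises
    ValueError for any other form.
    """
    for prefix in SUPPORTED_PREFIXES:
        if resource.startswith(prefix):
            location = resource[len(prefix):]
            if not location:
                raise ValueError(f"empty location in resource {resource!r}")
            return prefix[:-2], location     # strip the trailing '::'
    raise ValueError(f"unsupported resource {resource!r}; expected url:: or local::")
```

Failing early on a malformed string gives a clearer error than a failed checkpoint load deep inside model construction.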

output (path)

Where to save the raw depth and rendered visualization.
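Raw depth is often stored as a 16-bit PNG with depth in meters multiplied by a fixed scale factor. The upstream repository ships a `save_raw_16bit` helper for this purpose. The sketch below assumes a factor of 256, which should be checked against the checkout in use. It shows only the encode/decode arithmetic on plain lists, and writing the actual PNG file is left to an image library. With this factor, the resolution is 1/256 m (about 3.9 mm) and the largest storable depth is 65535/256, or about 256 m.

```python
from typing import List

DEPTH_SCALE = 256.0      # assumed meters -> uint16 scale factor
UINT16_MAX = 65535

def encode_depth_uint16(depth_m: List[List[float]]) -> List[List[int]]:
    """Scale metric depth (meters) to 16-bit integers, rounding and clipping to 0..65535."""
    return [[min(max(round(z * DEPTH_SCALE), 0), UINT16_MAX) for z in row]
            for row in depth_m]

def decode_depth_uint16(depth_u16: List[List[int]]) -> List[List[float]]:
    """Recover meters from the 16-bit encoding (quantized to 1/256 m)."""
    return [[v / DEPTH_SCALE for v in row] for row in depth_u16]
```

Decoding must use the same factor as encoding. Reading the PNG pixel values directly as meters overstates every depth by a factor of 256.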

Output Explanation

depth_map

Per-pixel metric depth values, typically in meters after model-specific scaling.

visualization

A colored image for inspection; color is for readability, not the raw numeric result.
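Because the visualization is rescaled for display, it cannot be read back as meters. The sketch below uses a plain min/max normalization to 8-bit gray, a simplified stand-in and not ZoeDepth's own colorization code, to show why. Two depth maps that differ only by a constant factor produce identical images.

```python
from typing import List

def normalize_for_display(depth: List[List[float]]) -> List[List[int]]:
    """Map depth linearly onto 0..255 using the map's own min and max.

    Near pixels become 0 and far pixels 255; absolute metric scale is discarded.
    """
    flat = [z for row in depth for z in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[round(255 * (z - lo) / span) for z in row] for row in depth]
```

For example, `[[1, 2], [3, 5]]` and `[[2, 4], [6, 10]]` normalize to the same gray image, even though the second scene is twice as deep.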

model_variant

The checkpoint used, which affects indoor/outdoor generalization.

How To Use

Official resources, deployment steps, academic context, citation, and source-reported benchmark numbers.

Deployment Notes

  1. Install the official repository requirements, then run the sanity check to confirm checkpoints load.
  2. Use Torch Hub or download official checkpoints before offline inference.
  3. Run single-image inference or the Gradio UI for quick inspection.
  4. For reproducible evaluation, use the repository's evaluate.py commands and official dataset splits.
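Before running the sanity script, a quick preflight can confirm that the Python packages the workflow relies on are installed. This is a hypothetical helper that uses only the standard library. The package list (torch, timm, gradio) is an assumption based on the runtime listed above and ZoeDepth's MiDaS-style backbones. The authoritative list is the repository's environment/requirements file.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed package set; check the repository's environment file for the real one.
DEFAULT_PACKAGES = ("torch", "timm", "gradio")

def preflight(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if it is not installed}."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report

if __name__ == "__main__":
    for name, ver in preflight().items():
        print(f"{name:8s} {ver or 'MISSING'}")
```

A missing package shows up here in one line, rather than as an import error partway through checkpoint loading.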

Relative Path Example

# Relative-path local entry for the ZoeDepth tool folder
python tools/zoedepth/sanity.py

# Torch Hub inference path:
python tools/zoedepth/examples/infer_depth.py \
  --image tools/zoedepth/examples/input.jpg \
  --model ZoeD_N \
  --output tools/zoedepth/runs/depth_output.png

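# Gradio UI path (python -m expects a dotted module name, so tools/ and
# tools/zoedepth/ must be importable packages):
python -m tools.zoedepth.ui.app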

# Evaluation examples:
python tools/zoedepth/evaluate.py -m zoedepth -d nyu
python tools/zoedepth/evaluate.py -m zoedepth_nk -d nyu

# Suggested repository layout when adding local files:
# tools/zoedepth/README.md
# tools/zoedepth/sanity.py
# tools/zoedepth/evaluate.py
# tools/zoedepth/ui/app.py
# tools/zoedepth/examples/input.jpg
# tools/zoedepth/runs/depth_output.png

# This page documents the commands only; it does not execute ZoeDepth.
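The `infer_depth.py` script used in the Torch Hub command above belongs to the local tool folder, not to the upstream repository. Below is a minimal sketch of what it could contain. It assumes the Torch Hub entry points `ZoeD_N`, `ZoeD_K`, and `ZoeD_NK` from the `isl-org/ZoeDepth` hub repository and the model's `infer_pil` method described in the upstream README. The function names are hypothetical. torch and Pillow are imported only when inference actually runs, and the first run needs network access to fetch code and weights.

```python
import argparse
from pathlib import Path
from typing import List, Optional

HUB_REPO = "isl-org/ZoeDepth"            # upstream Torch Hub repository
VARIANTS = ("ZoeD_N", "ZoeD_K", "ZoeD_NK")

def build_parser() -> argparse.ArgumentParser:
    """Flags mirror the example command: --image, --model, --output."""
    parser = argparse.ArgumentParser(description="Single-image ZoeDepth inference")
    parser.add_argument("--image", required=True, help="input RGB image")
    parser.add_argument("--model", default="ZoeD_N", choices=VARIANTS,
                        help="Torch Hub entry point to load")
    parser.add_argument("--output", required=True,
                        help="path for the 16-bit depth PNG")
    return parser

def run(args: argparse.Namespace) -> Path:
    """Load the model via Torch Hub, predict depth, and save it as a 16-bit PNG."""
    # Deferred imports: argument parsing works even where torch is not installed.
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.hub.load(HUB_REPO, args.model, pretrained=True).to(device).eval()

    image = Image.open(args.image).convert("RGB")
    with torch.no_grad():
        # output_type="pil" returns the depth as a 16-bit PIL image.
        depth_png = model.infer_pil(image, output_type="pil")

    out = Path(args.output)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. create tools/zoedepth/runs/
    depth_png.save(out)
    return out

def main(argv: Optional[List[str]] = None) -> None:
    out = run(build_parser().parse_args(argv))
    print(f"saved depth map to {out}")
```

In the script file itself, end with the usual `if __name__ == "__main__": main()` guard. Keeping argument parsing separate from `run` lets the command-line interface be checked without downloading any weights.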

Expected Result Shape

{
  "tool": "zoedepth",
  "status": "ok",
  "depth_map": [
    {
      "label": "Metric depth estimation",
      "score": 0.87,
      "output": "Metric depth map"
    }
  ],
  "timing": {
    "runtime": "PyTorch inference depends on backbone and resolution; the paper reports ZoeDepth variants from 42M parameters (Swin2-T) to 345M parameters (BEiT-L).",
    "device": "documented in source benchmark when available"
  },
  "artifacts": {
    "visualization": "tools/zoedepth/runs/visualization.png",
    "raw_predictions": "tools/zoedepth/runs/predictions.json"
  }
}
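Downstream code can check the result envelope before trusting its artifacts. Below is a minimal sketch that uses only the standard library. The key names come from the shape above, while `load_result` and its validation rules are illustrative assumptions.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = ("tool", "status", "depth_map", "artifacts")

def load_result(text: str) -> Dict[str, Any]:
    """Parse a ZoeDepth result envelope and check its basic structure."""
    result = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in result]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if result["tool"] != "zoedepth":
        raise ValueError(f"unexpected tool {result['tool']!r}")
    if result["status"] != "ok":
        raise RuntimeError(f"run failed with status {result['status']!r}")
    return result
```

After validation, the artifact paths (for example `result["artifacts"]["visualization"]`) can be opened without further guessing about the envelope.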
Paper figure

Academic Info

Paper identity and contribution summary.

Title: ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
Authors: Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller
Venue: arXiv:2302.12288, 2023
Contribution: Combines strong relative depth estimation with metric depth heads, allowing monocular depth prediction to generalize across datasets such as NYU Depth V2 and KITTI.
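To make the "metric depth head" idea concrete: bin-based heads in the AdaBins family predict a set of depth-bin centers and a probability for each bin at every pixel. The output depth is the probability-weighted average of those centers. The sketch below uses a plain softmax over hypothetical logits. Per the paper, ZoeDepth's metric bins module builds on this idea but adds attractor layers and replaces softmax with a log-binomial formulation, so this illustrates the principle only and is not ZoeDepth's head.

```python
import math
from typing import Sequence

def expected_depth(bin_centers: Sequence[float], logits: Sequence[float]) -> float:
    """Depth (meters) as the softmax-weighted average of bin centers for one pixel."""
    if len(bin_centers) != len(logits):
        raise ValueError("one logit per bin is required")
    m = max(logits)                               # subtract max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return sum(c * w for c, w in zip(bin_centers, weights)) / total
```

With equal logits the result is the mean of the bin centers. As one logit dominates, the result approaches that bin's center.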

Citation

@misc{zoedepth2023,
  title={ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth},
  author={Bhat, Shariq Farooq and Birkl, Reiner and Wofk, Diana and Wonka, Peter and M{\"u}ller, Matthias},
  year={2023},
  eprint={2302.12288},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2302.12288}
}

Benchmark

Only compact, source-reported numbers are shown here.

Dataset | Metric | Value | Runtime / model size | Source
NYU Depth V2 | delta1 / REL / RMSE (m) / log10 | 0.955 / 0.075 / 0.270 / 0.032 (ZoeD-M12-N) | 42M to 345M parameters, depending on backbone | ZoeDepth paper
KITTI | REL | 0.057 (universal ZoeD-M12-NK) | Single-image PyTorch inference | ZoeDepth paper
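The metrics in the table follow the usual monocular-depth definitions over pixels with valid ground truth:

- delta1 is the fraction of pixels where max(pred/gt, gt/pred) < 1.25.
- REL is the mean of |pred - gt| / gt.
- RMSE is the root-mean-square error in meters.
- log10 is the mean of |log10(pred) - log10(gt)|.

Below is a minimal sketch on flat lists. It applies no image cropping or depth-range masking, which official evaluation scripts typically add, and the prediction clamp value is an assumption.

```python
import math
from typing import Dict, Sequence

MIN_DEPTH = 1e-3  # assumed lower clamp for predictions, in meters

def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """delta1, REL, RMSE (m), and log10 over pixels whose ground truth is positive."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    pairs = [(max(p, MIN_DEPTH), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground truth")
    n = len(pairs)
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    return {"delta1": delta1, "rel": rel, "rmse": rmse, "log10": log10}
```

Higher is better for delta1, and lower is better for the other three. This is why the NYU row pairs a delta1 near 1 with small REL, RMSE, and log10 values.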

Artifacts

ZoeDepth paper, NYU/KITTI quantitative tables, Torch Hub entries, sanity scripts, evaluation scripts, Gradio UI, and model configs.

Demo Images

Visual references from the original tool.